Definition: TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a numerical statistic that indicates how important a word is to a document within a collection (corpus) of documents.

In information retrieval and text mining, the TF-IDF score is used to measure the relevance of a term (e.g., a keyword or phrase) to a document. It is calculated as the product of two factors:

  1. Term Frequency (TF): The number of times a term appears in a document, normalized by the total number of terms in the document.
  2. Inverse Document Frequency (IDF): The logarithm of the total number of documents in the corpus divided by the number of documents that contain the term. This penalizes terms that appear in many documents and thus have low discriminative power.
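
The two factors above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the whitespace tokenization, the toy corpus, and the choice to score a term that appears nowhere in the corpus as 0 are all assumptions made for the example. Real libraries often use smoothed or otherwise adjusted variants of these formulas.

```python
import math
from collections import Counter

def term_frequency(term, doc_tokens):
    """Occurrences of `term` divided by the total number of terms in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)

def inverse_document_frequency(term, corpus_tokens):
    """log(N / df), where N is the corpus size and df counts documents containing `term`."""
    df = sum(1 for doc in corpus_tokens if term in doc)
    if df == 0:
        return 0.0  # assumption: a term absent from the whole corpus scores 0
    return math.log(len(corpus_tokens) / df)

def tf_idf(term, doc_tokens, corpus_tokens):
    """Product of the two factors."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus_tokens)

# Hypothetical toy corpus, tokenized by splitting on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

score_the = tf_idf("the", corpus[0], corpus)  # "the" occurs in every document
score_mat = tf_idf("mat", corpus[0], corpus)  # "mat" occurs only in the first document
print(score_the, score_mat)
```

Because "the" appears in all three documents, its IDF is log(3/3) = 0 and its score vanishes no matter how often it occurs, whereas "mat" appears in only one document and receives a positive weight.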

TF-IDF is used in search engine ranking, text classification, and other information retrieval tasks. Because it assigns a weight to each term in a document, it can surface the keywords and topics that distinguish that document from the rest of the corpus. Documents can also be ranked against a query by combining the weights of the query's terms in each document.
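
As a rough illustration of query ranking, the sketch below scores each document by summing the TF-IDF weights of the query terms it contains. The function name `rank_documents`, the lowercase whitespace tokenization, the sample documents, and the use of a plain sum are assumptions for this example. Production search engines typically use more elaborate scoring schemes.

```python
import math
from collections import Counter

def rank_documents(query, documents):
    """Return (index, score) pairs sorted by descending summed TF-IDF of the query terms."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if not tokens or doc_freq[term] == 0:
                continue  # assumption: terms unseen in the corpus contribute nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang a song",
]
ranking = rank_documents("the cat", documents)
for index, score in ranking:
    print(index, round(score, 4))
```

In this toy corpus the query term "the" contributes nothing because it occurs in every document, so the ranking is driven entirely by "cat". The second document ranks first because it is shorter, which gives "cat" a higher normalized term frequency there.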
