Term Frequency, Inverse Document Frequency (TF-IDF)
- Natural Language Processing (NLP), Natural Language Inference (NLI) and Recognizing Textual Entailment (RTE)
- Scikit-learn Machine Learning in Python, Simple and efficient tools for data mining and data analysis; Built on NumPy, SciPy, and matplotlib
- Bag-of-Words (scikit-learn: Count Vectorizer)
- Word2Vec
- Doc2Vec
- Skip-Gram
- Global Vectors for Word Representation (GloVe)
The TF-IDF statistic scores a word's importance in each document of a collection. A word's frequency within a document is used as a proxy for its importance: if "football" is mentioned 25 times in a document, it is likely more important there than if it were mentioned only once. The word's document frequency (the number of documents that contain it) is then used to measure how common the word is across the collection; dividing by this value down-weights terms that appear everywhere. This minimizes the effect of stop-words such as pronouns, and of domain-specific terms that add little information (for example, a word such as "news" that might be present in most documents).
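A common way to write the weight of a term t in document d is tf(t, d) × log(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing t: a high count within the document raises the weight, while appearing in many documents lowers it. As a minimal sketch (assuming a recent version of scikit-learn, which is listed above, and an invented three-sentence toy corpus), the snippet below computes these weights with TfidfVectorizer; scikit-learn's defaults add idf smoothing and L2 normalisation, so the exact numbers differ slightly from the plain formula.

<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, purely for illustration.
docs = [
    "the football match was played on a grass field",
    "the news reported the football score",
    "the news covered the election news",
]

# Default settings: smoothed idf = ln((1 + N) / (1 + df)) + 1, rows L2-normalised.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix of shape (n_docs, n_terms)

# Print each term's weight in the first document: a ubiquitous word like "the"
# gets the lowest weight, while document-specific terms like "match" score higher.
terms = vectorizer.get_feature_names_out()
for term, weight in zip(terms, tfidf.toarray()[0]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
</syntaxhighlight>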