Difference between revisions of "Term Frequency, Inverse Document Frequency (TF-IDF)"

<youtube>aCdg-d_476Y</youtube>
<youtube>4vT4fzjkGCQ</youtube>
<youtube>OGK9SHt8SWg</youtube>
<youtube>6HuKFh0BatQ</youtube>
<youtube>hXNbFNCgPfY</youtube>
<youtube>bPYJi1E9xeM</youtube>

Revision as of 01:21, 6 November 2018

Youtube search...

TF-IDF is a statistic that estimates how important a word is to a document within a collection of documents. A word's frequency in a document (the term frequency) serves as a proxy for its importance: if "football" is mentioned 25 times in a document, it is probably more important to that document than if it were mentioned only once. The document frequency (the number of documents containing a given word) measures how common the word is across the collection. Weighting each term by the inverse of its document frequency reduces the effect of stop-words such as pronouns, and of domain-specific words that add little information (for example, a word such as "news" that might be present in most documents).
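
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes raw counts for the term frequency and idf = log(N / df) for the inverse document frequency, which is only one of several common variants. The function name `tf_idf` and the three toy documents are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs is a list of token lists. Assumes tf = raw count and
    idf = log(N / df); other variants smooth or normalize these.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        scores.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return scores


# Toy corpus (illustrative only): "news" appears in every document.
docs = [
    "football news football match".split(),
    "election news vote".split(),
    "weather news rain".split(),
]
scores = tf_idf(docs)
```

Under these assumptions, "news" scores zero in every document because it appears in all three, while "football", which occurs twice in the first document and nowhere else, receives the highest score there.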