Difference between revisions of "Natural Language Processing (NLP)"

Revision as of 20:30, 23 September 2018

Youtube search...

NLP covers speech recognition, speech translation, understanding complete sentences, matching synonyms of words, sentiment analysis, and writing complete, grammatically correct sentences and paragraphs.

Pipeline

[Figure: a global model of the Power Workbench]

Regular Expressions (Regex)

Use regular expressions to search for text patterns, validate emails and URLs, and capture information. Expressing a match as a pattern saves development time.

[Figure: regex example]
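
The uses listed above can be sketched with Python's standard re module. The patterns and function names below are illustrative assumptions, not taken from this page, and the email pattern is deliberately simple rather than standards-complete.

```python
import re

# Deliberately simple patterns (assumptions for illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r'https?://[^\s<>"]+')
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def is_email(text):
    """Validate: True only if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(text) is not None


def find_urls(text):
    """Search: return every http(s) URL found in the text."""
    return URL_RE.findall(text)


def capture_date(text):
    """Capture: pull year, month, and day out of the first ISO-style date."""
    match = DATE_RE.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(is_email("alice@example.com"))
    print(find_urls("see http://books.google.com/ngrams for details"))
    print(capture_date("Revision as of 2018-09-23"))
```

Named groups such as (?P<year>...) keep the captured fields readable when a pattern grows.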

Tokenization / Sentence Splitting

An n-gram is a contiguous sequence of n items from a given sample of text or speech. Depending on the application, the items can be phonemes, syllables, letters, words, or base pairs.

[Figure: Google Ngram Viewer time series comparing some literature for children]
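
A minimal sketch of word tokenization, sentence splitting, and n-gram extraction, using only the standard library. The splitting rules are naive assumptions (for example, abbreviations such as "Dr." would be split incorrectly), and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ngrams(items, n):
    """Return every contiguous run of n items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


if __name__ == "__main__":
    words = tokenize("The cat sat on the mat.")
    print(ngrams(words, 2))          # word bigrams
    print(ngrams(list("text"), 3))   # letter trigrams
    print(split_sentences("Hello there. How are you?"))
```

The same ngrams function works for any item type, which matches the definition above: pass a list of words, letters, or phonemes.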

Stop Words

Stemming (Morphological Similarity)

Stemming refers to a crude heuristic process that chops off the ends of words in the hope of reducing each word to a common base form correctly most of the time. It often includes the removal of derivational affixes.
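
To make "chopping off the ends of words" concrete, here is a toy suffix stripper. It is a sketch under stated assumptions: the suffix list and the minimum stem length are invented for illustration, and this is not the Porter algorithm or any library's stemmer.

```python
# Hypothetical suffix list, ordered longest first so "es" is tried before "s".
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["jumping", "jumped", "cats", "quickly", "running", "ponies"]:
        print(w, "->", crude_stem(w))
```

The heuristic gets "jumping" and "cats" right but leaves "running" as "runn" and "ponies" as "poni", which is the kind of error a crude process accepts in exchange for speed and simplicity.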

Part-of-Speech (POS) Tagging

Chunking

Chinking

Named Entity Recognition (NER)

Named entity recognition (also known as entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages. Most research on NER systems has been structured as taking an unannotated block of text and producing an annotated block of text that highlights the names of entities.
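
The "unannotated text in, annotated text out" shape can be illustrated with a toy tagger that combines a name list with regular expressions. This is only a sketch: the gazetteer entries, labels, and tag format are assumptions, and real NER systems are typically trained statistical or neural models rather than lookups.

```python
import re

# Hypothetical name list (gazetteer) mapping surface forms to categories.
GAZETTEER = {"Google": "ORG", "Stanford": "ORG", "London": "LOC"}

# Pattern-based categories for values that follow a regular shape.
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
]


def find_entities(text):
    """Return sorted (start, end, label) spans for every match."""
    spans = []
    for label, pattern in PATTERNS:
        spans += [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def annotate(text):
    """Wrap each entity in <LABEL>...</LABEL>, skipping overlapping spans."""
    pieces, last = [], 0
    for start, end, label in find_entities(text):
        if start < last:
            continue
        pieces.append(text[last:start])
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


if __name__ == "__main__":
    print(annotate("Google paid $1,500 in London, up 12%."))
```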

Neural Coreference

Coreference occurs when two or more expressions in a text, such as pronouns or nouns, refer to the same person or thing. Resolving coreference is a classical natural language processing task that has seen a revival of interest in recent years, as several research groups applied deep-learning and reinforcement-learning techniques to it. It is also one of the key building blocks of conversational artificial intelligence.

Hierarchical Classifier

Classification approaches:

  • Flat - the possible categories have no inherent hierarchy (or we choose to ignore it). Train either a single classifier to predict all of the available classes or one classifier per category (one-vs-all).
  • Hierarchical - organize the classes into a tree or DAG (directed acyclic graph) of categories and exploit the relationships among them. There are several types of hierarchical classification; the review by Silla and Freitas (2011) includes illustrations that make the difference between the flat and hierarchical modes easy to see. The most widely used approach, though not the only one, is top-down: train a classifier per level (or node) of the tree, so that each decision leads down a different classification path.

[Figures: flat and hierarchical classification, from Silla and Freitas (2011)]
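
The two approaches can be contrasted in a small sketch. Everything here is a hypothetical stand-in: the category tree and keyword sets are invented, and keyword overlap takes the place of the trained per-node classifiers that a real system would use.

```python
# Hypothetical category tree: each internal node lists its children.
TREE = {
    "root": ["sports", "science"],
    "sports": ["football", "tennis"],
    "science": ["physics", "biology"],
}

# Keyword sets stand in for trained classifiers (illustration only).
KEYWORDS = {
    "sports": {"match", "player", "goal", "serve", "team"},
    "science": {"experiment", "theory", "cell", "quantum", "energy"},
    "football": {"goal", "striker", "penalty"},
    "tennis": {"serve", "racket", "volley"},
    "physics": {"quantum", "energy", "particle"},
    "biology": {"cell", "gene", "protein"},
}

LEAVES = ["football", "tennis", "physics", "biology"]


def score(node, tokens):
    """Stand-in classifier score: number of keywords present in the text."""
    return len(tokens & KEYWORDS[node])


def classify_flat(text):
    """Flat: choose among all leaf categories at once, ignoring the tree."""
    tokens = set(text.lower().split())
    return max(LEAVES, key=lambda leaf: score(leaf, tokens))


def classify_top_down(text):
    """Top-down: pick the best child at each node until a leaf is reached."""
    tokens = set(text.lower().split())
    node, path = "root", []
    while node in TREE:
        node = max(TREE[node], key=lambda child: score(child, tokens))
        path.append(node)
    return path


if __name__ == "__main__":
    text = "the striker scored a goal in the match"
    print(classify_flat(text))
    print(classify_top_down(text))
```

In the top-down version an early wrong decision cannot be undone further down the tree, which is the main trade-off against the flat approach.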

Lemmatization

Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma. If confronted with the token "saw", stemming might return just "s", whereas lemmatization would attempt to return either "see" or "saw" depending on whether the token was used as a verb or a noun. The two may also differ in that stemming most commonly collapses derivationally related words, whereas lemmatization commonly only collapses the different inflectional forms of a lemma. (Source: Stemming and lemmatization | Stanford.edu)
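
The "saw" example can be sketched as a lookup keyed by token and part of speech. The lexicon below is a tiny hypothetical stand-in for the vocabulary and morphological analysis a real lemmatizer relies on, and the part-of-speech labels are assumed to come from a separate tagger.

```python
# Tiny hypothetical lexicon: (token, part of speech) -> lemma.
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("saws", "NOUN"): "saw",
    ("seeing", "VERB"): "see",
    ("mice", "NOUN"): "mouse",
}


def lemmatize(token, pos):
    """Return the dictionary form for (token, pos); fall back to the token."""
    key = (token.lower(), pos)
    return LEMMAS.get(key, token.lower())


if __name__ == "__main__":
    print(lemmatize("saw", "VERB"))
    print(lemmatize("saw", "NOUN"))
    print(lemmatize("Mice", "NOUN"))
```

Because the lookup depends on the part of speech, lemmatization usually runs after POS tagging in a pipeline.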

Corpora

Topic Modeling

Word Embeddings

Summarizer

Ontologies

An ontology (also known as a knowledge graph) can incorporate computable descriptions that bring insight to a wide set of applications, including more precise knowledge capture, semantic data integration, sophisticated query answering, and powerful association mining, thereby delivering key value for health care and the life sciences.

Natural Language Inference (NLI) and Recognizing Textual Entailment (RTE)

NLI and RTE identify whether one piece of text can be plausibly inferred from another. Related work covers the automatic acquisition of paraphrases, lexical semantic relationships, inference methods, and knowledge representations for applications such as question answering, information extraction, and summarization.

Semantic Role Labeling (SRL)

Semantic role labeling identifies shallow semantic information in a given sentence. An SRL tool labels verb-argument structure, identifying who did what to whom by assigning roles that indicate the agent, patient, and theme of each verb to the constituents of the sentence that represent entities related by the verb.

Deep Learning Algorithms

Evaluation Measures - Classification Performance

Common measures include the confusion matrix, precision, recall, F score, and ROC curves, which show the trade-off between the true positive rate and the false positive rate.
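
A minimal sketch of how these measures follow from the four confusion-matrix counts for a binary classifier. The function names and the sample labels are assumptions for illustration. A full ROC curve would repeat the last step across many decision thresholds; only a single point is computed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_point(tp, fp, fn, tn):
    """One ROC point: (false positive rate, true positive rate)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(tp, fp, fn, tn)
    print(precision_recall_f1(tp, fp, fn))
    print(roc_point(tp, fp, fn, tn))
```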

Capabilities

Sentiment Analysis

Wikifier
