Difference between revisions of "Natural Language Tools & Services"

Revision as of 08:30, 10 July 2020


Natural Language Processing (NLP)
NLP Tools | Wikipedia
Auto-tagging
AI researchers create testing tool to find bugs in NLP from Amazon, Google, and Microsoft | Khari Johnson - Venture Beat ...Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | M. Ribeiro, T. Wu, C. Guestrin, and S. Singh
Machine Translation reading list & open-source toolkits | Tsinghua Natural Language Processing Group
Build or Buy for Natural Language Processing? | Tim Mohler - Lexalytics
Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford
Software Packages | Cognitive Computation Group, led by Prof. Dan Roth
PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM | GitHub
NLP Tool Finder | NIH University of Utah DBMI
API-based Services
Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium
Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI
Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review | K. Kreimeyer, M. Foster, A. Pandey, N. Arya, G. Halford, S. Jones, R. Forshee, M. Walderhaug, and T. Botsis - Journal of Biomedical Informatics (covers 71 natural language processing systems)
8 great Python libraries for natural language processing | Serdar Yegulalp - InfoWorld

Capability with Javascript

TensorFlow.js for training and deploying ML models in the browser and on Node.js (was called Deeplearnjs)
- Keras.js No longer active - capability now is in TensorFlow.js
NLP.js NLP Manager: built on top of several other NLP libraries, including Franc and Brain.js providing Text Classification, Sentiment Analysis, Stemming (Morphological Similarity), Named Entity Recognition (NER), and natural language generation. (nodejs)
Compromise modest natural-language processing (NLP) interprets and pre-parses English and makes some reasonable decisions
Natural provides Tokenization / Sentence Splitting, Stemming (Morphological Similarity) (reducing a word to a not-necessarily morphological root), Text Classification, phonetics, Term Frequency–Inverse Document Frequency (TF-IDF), WordNet, string Similarity, some inflections, and more. (nodejs)
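
Term Frequency–Inverse Document Frequency (TF-IDF), listed above among the features of Natural, can be illustrated with a short standard-library sketch. This is not the implementation used by Natural or any other library named here; the function names `tokenize` and `tf_idf` are hypothetical, the tokenizer is deliberately naive, and real libraries may apply their own smoothing variants of the formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
# "the" occurs in every document, so its weight is 0;
# "sat" occurs in only one document, so it outweighs "cat".
```

Terms that appear in every document carry no weight under this plain formula, which is why production toolkits usually add smoothing.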

Capability (other)

Google:
- TensorFlow
- Google Natural Language - AutoML Natural Language
- Cloud Speech-to-Text | Google
- Google Semantic Reactor a Google Sheets add-on
Amazon - AWS:
- Textract | Amazon in the Elastic Stack Architecture
- Comprehend | Amazon
- Transcribe | Amazon
- Amazon Polly | Amazon
- Kendra | Amazon
Microsoft - Azure:
- Azure Cognitive Services | Microsoft
- Azure Bing Speech API | Microsoft
Stanford:
- CoreNLP | Stanford The Stanford Natural Language Processing Group Toolkit (Java)
- Stanza - a Python NLP Library for Many Human Languages ...Python NLP Toolkit. Stanza features both a language-agnostic, fully neural pipeline for text analysis (supporting 66 human languages) and a Python interface to Stanford's CoreNLP Java software
Natural Language Toolkit (NLTK) (Python) implements Text Classification, Tokenization / Sentence Splitting, Stemming (Morphological Similarity), Part-of-Speech (POS) Tagging, Syntax (Parsing), and semantic reasoning
- TextBlob (Python) builds on NLTK and exposes many of its functions through a simplified interface; it also includes functionality from the Pattern library
SpaCy (Python and Cython) treats everything as an object rather than a string, which simplifies the interface for building applications. It extracts useful information from free text with built-in features such as a word tokenizer, named entity recognition, and part-of-speech detection. SpaCy supports more than 55 languages.
- textacy (Python) focuses primarily on the tasks that come before and follow after SpaCy
scikit-learn NLP toolkit
Apache OpenNLP
fastText | Facebook's AI Research representations and text classifiers (Python)
MALLET a Java-based package
Intel NLP Architect (Python)
Gensim fast Vector Space Modelling, Topic Modeling, LDA implementation (Python)
flair uses pretrained BERT (PyTorch)
AllenNLP an Apache 2.0 licensed NLP research library (PyTorch)
Pytorch-NLP (PyTorch)
Matlab
Sintelix
H2O Driverless AI
Dandelion API
VoxSigma API
Speech Recognition | Twilio
Automatic Speech Recognition (ASR) | Speechmatics
Voice API | Nexmo
wit.ai
Meaning Cloud
Haven OnDemand
Aylien
Lexalytics
Dialogflow
Indico
TextRazor
Intellexer
LSTM and QRNN Language Model Toolkit for PyTorch | GitHub
Thematically Discover
Pattern (Python)
Polyglot
Speech to Text | IBM
Text to Speech | IBM
Watson Natural Language Understanding | IBM
Project Debater | IBM
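
Several toolkits listed above (NLTK, SpaCy, Stanza) provide Tokenization / Sentence Splitting. The sketch below shows the idea with regular expressions from the Python standard library only. It is a swapped-in, rule-based approximation, not how any of those toolkits work internally, and the function names `split_sentences` and `tokenize` are hypothetical. It will mis-split abbreviations such as "Dr." or "e.g.", which is one reason trained models are preferred in practice.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when the mark is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Return words (keeping simple contractions whole) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


text = "NLP is fun. Isn't it? Yes!"
sentences = split_sentences(text)
tokens = [tokenize(sentence) for sentence in sentences]
# sentences -> ["NLP is fun.", "Isn't it?", "Yes!"]
# tokens[1] -> ["Isn't", "it", "?"]
```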

Text Labeling

Bella an open tool aimed at simplifying and speeding up text data labeling. A dataset labeled in a CSV file or Google spreadsheet usually has to be converted to an appropriate format before model training. Bella’s features and simple interface make it a good substitute for spreadsheets and CSV files; its main features are a graphical user interface (GUI) and a database backend for managing labeled data.
Tagtog offers three approaches: annotate text manually, hire a team to label the data, or use machine learning models for automated annotation.
Dataturks provides training data preparation tools. Using its products, teams can perform such tasks as parts-of-speech tagging, named-entity recognition tagging, text classification, moderation, and summarization.
Brat rapid annotation tool, a web-based tool for text annotation; that is, for adding notes to existing text documents. It is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
Yedda a lightweight collaborative text span annotation tool for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji.
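
The Bella entry above notes that data labeled in a CSV file usually has to be converted before model training. A minimal sketch of such a conversion follows, assuming a CSV with `text` and `label` columns and a JSON Lines target. Both the column names and the output format are assumptions for illustration, and `csv_to_jsonl` is a hypothetical helper, not part of Bella or any other tool listed here.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, text_column="text", label_column="label"):
    """Convert labeled CSV rows into JSON Lines, one example per line.

    Rows missing either the text or the label are skipped so that
    unlabeled examples do not reach the training set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        text = (row.get(text_column) or "").strip()
        label = (row.get(label_column) or "").strip()
        if not text or not label:
            continue  # not fully labeled
        lines.append(json.dumps({"text": text, "label": label}))
    return "\n".join(lines)


sample = (
    "text,label\n"
    "Great phone,positive\n"
    "Battery died fast,negative\n"
    "No label here,\n"
)
jsonl = csv_to_jsonl(sample)
```

Keeping one JSON object per line lets the result be streamed or appended to without reparsing the whole file.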

Automated Scoring

@@ Line 10: / Line 10: @@
 * [[Natural Language Processing (NLP)]]
 * [http://en.wikipedia.org/wiki/Outline_of_natural_language_processing#Natural_language_processing_tools NLP Tools | Wikipedia]
+* [[Data Augmentation#Auto-tagging|Auto-tagging]]
+* [http://venturebeat.com/2020/07/09/ai-researchers-create-testing-tool-to-find-bugs-in-nlp-from-amazon-google-and-microsoft/ AI researchers create testing tool to find bugs in NLP from Amazon, Google, and Microsoft | Khari Johnson - Venture Beat] ...[http://www.aclweb.org/anthology/2020.acl-main.442/ Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | M. Ribeiro, T. Wu, C. Guestrin, and S. Singh]
+* [http://github.com/THUNLP-MT Machine Translation reading list & open-source toolkits | Tsinghua Natural Language Processing Group]
+* [http://www.lexalytics.com/lexablog/build-or-buy-natural-language-processing Build or Buy for Natural Language Processing? | Tim Mohler - Lexalytics]
 * [http://nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford]
 * [http://cogcomp.org/page/software/ Software Packages | Cognitive Computation Group, led by Prof. Dan Roth]
@@ Line 17: / Line 21: @@
 * [http://medium.com/@daffl/natural-language-processing-and-machine-learning-in-javascript-249181a3b721 Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium]
 * [http://blog.rapidapi.com/best-nlp-api/ Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI]
+* [http://www.sciencedirect.com/science/article/pii/S1532046417301685 Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review | K. Kreimeyer, M. Foster, A. Pandey, N. Arya, G. Halford, S. Jones, R. Forshee, M. Walderhaug, and T. Botsis - Journal of Biomedical Informatics] covers 71 natural language processing systems
+* [http://www.infoworld.com/article/3519413/8-great-python-libraries-for-natural-language-processing.html 8 great Python libraries for natural language processing | Serdar Yegulalp - InfoWorld]
 ==== Capability with [[Javascript]] ====
-* [[TensorFlow.js]] for training and deploying ML models in the browser and on [http://nodejs.org/en/ Node.js] (was called Deeplearnjs)
+* [[TensorFlow.js]] for training and deploying ML models in the browser and on [[Javascript#Node.js|Node.js]] (was called Deeplearnjs)
 ** [http://transcranial.github.io/keras-js/#/ Keras.js] No longer active - capability now is in TensorFlow.js
-* [http://www.npmjs.com/package/node-nlp NLP.js] NLP Manager: a tool able to manage several languages (nodejs)
+* [http://www.npmjs.com/package/node-nlp NLP.js] NLP Manager: built on top of several other NLP libraries, including [http://github.com/wooorm/franc Franc] and [http://brain.js.org/#/ Brain.js] providing [[Natural Language Processing (NLP)#Text Classification|Text Classification]], [[Natural Language Processing (NLP)#Sentiment Analysis|Sentiment Analysis]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]], [[Natural Language Processing (NLP)#Named Entity Recognition (NER)|Named Entity Recognition (NER)]], and natural language generation. (nodejs)
 * [http://compromise.cool/ Compromise] modest natural-language processing (NLP) interprets and pre-parses English and makes some reasonable decisions
-* [http://github.com/NaturalNode/natural Natural] provides tokenizing, stemming (reducing a word to a not-necessarily morphological root), classification, phonetics, tf-idf, WordNet, string similarity, some inflections, and more. (nodejs)
+* [http://github.com/NaturalNode/natural Natural] provides [[Natural Language Processing (NLP)#Tokenization / Sentence Splitting|Tokenization / Sentence Splitting]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]] (reducing a word to a not-necessarily morphological root), [[Natural Language Processing (NLP)#Text Classification|Text Classification]], phonetics, [[Term Frequency–Inverse Document Frequency (TF-IDF)]], [http://wordnet.princeton.edu/ WordNet], string [[Natural Language Processing (NLP)#Similarity|Similarity]], some inflections, and more. (nodejs)
 ==== Capability (other) ====
-* [[TensorFlow]]
+* [[Google]]:
-* [[Google Natural Language]]
+** [[TensorFlow]]
-* [http://cloud.google.com/speech-to-text/ Cloud Speech-to-Text  | Google]
+** [[Google Natural Language]] - AutoML Natural Language
-* [http://github.com/zalandoresearch/flair flair] use pretrained BERT (PyTorch)
+** [http://cloud.google.com/speech-to-text/ Cloud Speech-to-Text  | Google]
-* [http://stanfordnlp.github.io/CoreNLP/ CoreNLP | Stanford] (Python)
+** [[Google_Semantic_Reactor]] a [http://www.google.com/sheets/about/ Google Sheets] add-on
-* [[Natural Language Toolkit (NLTK)]] (Python)
+* Amazon - AWS:
+** [[Textract]] | Amazon in the Elastic Stack Architecture
+** [http://aws.amazon.com/comprehend/ Comprehend | Amazon]
+** [http://aws.amazon.com/transcribe/ Transcribe | Amazon]
+** [http://aws.amazon.com/polly/ Amazon Polly | Amazon]
+** [http://aws.amazon.com/kendra/ Kendra | Amazon]
+* Microsoft - Azure:
+** [http://azure.microsoft.com/en-us/services/cognitive-services/ Azure Cognitive Services | Microsoft]
+** [http://azure.microsoft.com/en-us/services/cognitive-services/speech-services/ Azure Bing Speech API | Microsoft]
+* Stanford:
+** [http://stanfordnlp.github.io/CoreNLP/ CoreNLP | Stanford] The Stanford Natural Language Processing Group Toolkit (Java)
+** [http://stanfordnlp.github.io/stanza/ Stanza - a Python NLP Library for Many Human Languages] ...[http://www.infoq.com/news/2020/03/stanza-nlp-toolkit/ Python NLP Toolkit]. Stanza features both a language-agnostic, fully neural pipeline for text analysis (supporting 66 human languages) and a Python interface to Stanford's CoreNLP Java software
+* [[Natural Language Toolkit (NLTK)]] ([[Python]]) implements [[Natural Language Processing (NLP)#Text Classification|Text Classification]], [[Natural Language Processing (NLP)#Tokenization / Sentence Splitting|Tokenization / Sentence Splitting]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]], [[Natural Language Processing (NLP)#Part-of-Speech (POS) Tagging|Part-of-Speech (POS) Tagging]], [[Natural Language Processing (NLP)#Syntax (Parsing)|Syntax (Parsing)]], and semantic reasoning
+** [http://textblob.readthedocs.io/en/dev/ TextBlob] ([[Python]]) builds on [[Natural Language Toolkit (NLTK)| NLTK]] and exposes many of its functions through a simplified interface; it also includes functionality from the Pattern library
+* [[SpaCy]] ([[Python]] and Cython) treats everything as an object rather than a string, which simplifies the interface for building applications. It extracts useful information from free text with built-in features such as a word tokenizer, named entity recognition, and part-of-speech detection. SpaCy supports more than 55 languages.
+** [http://github.com/chartbeat-labs/textacy textacy] ([[Python]]) focuses primarily on the tasks that come before and follow after [[SpaCy]]
+* [[Python#scikit-learn|scikit-learn]] NLP toolkit
 * [http://opennlp.apache.org/ Apache OpenNLP]
+* [http://fasttext.cc/ fastText | Facebook's AI Research] representations and text classifiers ([[Python]])
 * [http://mallet.cs.umass.edu/ MALLET] a Java-based package
-* [http://www.intel.ai/nlp-architect-by-intel-ai-lab-release-0-2/ Intel NLP Architect] (Python)
+* [http://www.intel.ai/nlp-architect-by-intel-ai-lab-release-0-2/ Intel NLP Architect] ([[Python]])
-* [[SpaCy]] (Python and Cython)
+* [[Gensim]] fast Vector Space Modelling, Topic Modeling, LDA implementation ([[Python]])
-* [http://pypi.org/project/gensim/ Gensim] fast Vector Space Modelling, Topic Modeling, LDA implementation (Python)
+* [http://github.com/zalandoresearch/flair flair] use pretrained BERT (PyTorch)
 * [http://allennlp.org/ AllenNLP] an Apache NLP research library (PyTorch)
+* [http://pytorchnlp.readthedocs.io/en/latest/ Pytorch-NLP] (PyTorch)
 * [[Matlab]]
 * [[Sintelix]]
 * [[H2O]] Driverless AI
-* [[Textract]] in the Elastic Stack Architecture
+* [http://dandelion.eu/ Dandelion API]
-* [http://aws.amazon.com/comprehend/ Comprehend | Amazon]
-* [http://aws.amazon.com/transcribe/ Transcribe | Amazon]
-* [http://aws.amazon.com/polly/ Amazon Polly | Amazon]
-* [http://azure.microsoft.com/en-us/services/cognitive-services/ Azure Cognitive Services | Microsoft]
-* [http://azure.microsoft.com/en-us/services/cognitive-services/speech-services/ Azure Bing Speech API | Microsoft]
 * [http://www.programmableweb.com/api/voxsigma VoxSigma API]
 * [http://www.twilio.com/speech-recognition Speech Recognition |Twilio]
@@ Line 63: / Line 82: @@
 * [http://github.com/salesforce/awd-lstm-lm LSTM and QRNN Language Model Toolkit for PyTorch | GitHub]
 * [http://www.thematically.com Thematically Discover]
+* [http://github.com/clips/pattern Pattern] ([[Python]])
+* [http://github.com/aboSamoor/polyglot Polyglot]
 * [http://www.ibm.com/watson/services/speech-to-text/ Speech to Text | IBM]
 * [http://www.ibm.com/watson/services/text-to-speech/ Text to Speech | IBM]
 * [http://www.ibm.com/watson/services/natural-language-understanding/   Watson Natural Language Understanding | IBM]
+* [http://www.research.ibm.com/artificial-intelligence/project-debater/ Project Debater | IBM]
+===== Text Labeling =====
+* [http://github.com/dennybritz/bella Bella] an open tool aimed at simplifying and speeding up text data labeling. A dataset labeled in a CSV file or Google spreadsheet usually has to be converted to an appropriate format before model training. Bella’s features and simple interface make it a good substitute for spreadsheets and CSV files; its main features are a graphical user interface (GUI) and a database backend for managing labeled data.
+* [http://www.tagtog.net/ Tagtog] offers three approaches: annotate text manually, hire a team to label the data, or use machine learning models for automated annotation.
+* [http://dataturks.com/index.php Dataturks] provides training data preparation tools. Using its products, teams can perform such tasks as parts-of-speech tagging, named-entity recognition tagging, text classification, moderation, and summarization.
+* [http://brat.nlplab.org/ Brat rapid annotation tool] a web-based tool for text annotation; that is, for adding notes to existing text documents. It is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
+* [http://github.com/jiesutd/YEDDA Yedda] a lightweight collaborative text span annotation tool for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji.
 <youtube>Y2wgQjxrPD8</youtube>
