Difference between revisions of "Natural Language Tools & Services"
(→Capability (other)) |
|||
(43 intermediate revisions by the same user not shown) | |||
Line 10: | Line 10: | ||
* [[Natural Language Processing (NLP)]] | * [[Natural Language Processing (NLP)]] | ||
* [http://en.wikipedia.org/wiki/Outline_of_natural_language_processing#Natural_language_processing_tools NLP Tools | Wikipedia] | * [http://en.wikipedia.org/wiki/Outline_of_natural_language_processing#Natural_language_processing_tools NLP Tools | Wikipedia] | ||
+ | * [[Data Augmentation#Auto-tagging|Auto-tagging]] | ||
+ | * [http://venturebeat.com/2020/07/09/ai-researchers-create-testing-tool-to-find-bugs-in-nlp-from-amazon-google-and-microsoft/ AI researchers create testing tool to find bugs in NLP from Amazon, Google, and Microsoft | Khari Johnson - Venture Beat] ...[http://www.aclweb.org/anthology/2020.acl-main.442/ Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | M. Ribeiro, T. Wu, C. Guestrin, and S. Singh] | ||
+ | * [http://github.com/THUNLP-MT Machine Translation reading list & open-source toolkits | Tsinghua Natural Language Processing Group] | ||
+ | * [http://www.lexalytics.com/lexablog/build-or-buy-natural-language-processing Build or Buy for Natural Language Processing? | Tim Mohler - Lexalytics] | ||
* [http://nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford] | * [http://nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford] | ||
* [http://cogcomp.org/page/software/ Software Packages | Cognitive Computation Group, led by Prof. Dan Roth] | * [http://cogcomp.org/page/software/ Software Packages | Cognitive Computation Group, led by Prof. Dan Roth] | ||
Line 17: | Line 21: | ||
* [http://medium.com/@daffl/natural-language-processing-and-machine-learning-in-javascript-249181a3b721 Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium] | * [http://medium.com/@daffl/natural-language-processing-and-machine-learning-in-javascript-249181a3b721 Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium] | ||
* [http://blog.rapidapi.com/best-nlp-api/ Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI] | * [http://blog.rapidapi.com/best-nlp-api/ Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI] | ||
+ | * [http://www.sciencedirect.com/science/article/pii/S1532046417301685 Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review | K. Kreimeyera, M. Foster, A. Pandeya, N. Aryaa, G. Halford, S. Jones, R. Forsheea, M. Walderhauga, and T. Botsisa - Journal of Biomedical Informatics] 71 natural language processing systems | ||
+ | * [http://www.infoworld.com/article/3519413/8-great-python-libraries-for-natural-language-processing.html 8 great Python libraries for natural language processing | Serdar Yegulalp - InfoWorld] | ||
==== Capability with [[Javascript]] ==== | ==== Capability with [[Javascript]] ==== | ||
− | * [[TensorFlow.js]] for training and deploying ML models in the browser and on [ | + | * [[TensorFlow.js]] for training and deploying ML models in the browser and on [[Javascript#Node.js|Node.js]] (was called Deeplearnjs) |
** [http://transcranial.github.io/keras-js/#/ Keras.js] No longer active - capability now is in TensorFlow.js | ** [http://transcranial.github.io/keras-js/#/ Keras.js] No longer active - capability now is in TensorFlow.js | ||
− | * [http://www.npmjs.com/package/node-nlp NLP.js] NLP Manager: | + | * [http://www.npmjs.com/package/node-nlp NLP.js] NLP Manager: built on top of several other NLP libraries, including [http://github.com/wooorm/franc Franc] and [http://brain.js.org/#/ Brain.js] providing [[Natural Language Processing (NLP)#Text Classification|Text Classification]], [[Natural Language Processing (NLP)#Sentiment Analysis|Sentiment Analysis]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]], [[Natural Language Processing (NLP)#Named Entity Recognition (NER)|Named Entity Recognition (NER)]], and natural language generation. (nodejs) |
* [http://compromise.cool/ Compromise] modest natural-language processing (NLP) interprets and pre-parses English and makes some reasonable decisions | * [http://compromise.cool/ Compromise] modest natural-language processing (NLP) interprets and pre-parses English and makes some reasonable decisions | ||
− | * [http://github.com/NaturalNode/natural Natural] provides | + | * [http://github.com/NaturalNode/natural Natural] provides [[Natural Language Processing (NLP)#Tokenization / Sentence Splitting|Tokenization / Sentence Splitting]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]] (reducing a word to a not-necessarily morphological root), [[Natural Language Processing (NLP)#Text Classification|Text Classification]], phonetics, [[Term Frequency–Inverse Document Frequency (TF-IDF)]], [http://wordnet.princeton.edu/ WordNet], string [[Natural Language Processing (NLP)#Similarity|Similarity]], some inflections, and more. (nodejs) |
==== Capability (other) ==== | ==== Capability (other) ==== | ||
− | * [[TensorFlow]] | + | * [[Google]]: |
− | * [[Google Natural Language]] | + | ** [[TensorFlow]] |
− | * [http://cloud.google.com/speech-to-text/ Cloud Speech-to-Text | Google] | + | ** [[Google Natural Language]] - AutoML Natural Language |
− | * [http:// | + | ** [http://cloud.google.com/speech-to-text/ Cloud Speech-to-Text | Google] |
− | * [http://stanfordnlp.github.io/CoreNLP/ CoreNLP | Stanford] (Python) | + | ** [[Google_Semantic_Reactor]] a [http://www.google.com/sheets/about/ Google Sheets] add-on |
− | * [[Natural Language Toolkit (NLTK)]] (Python) | + | * Amazon - AWS: |
+ | ** [[Textract]] | Amazon in the Elastic Stack Architecture | ||
+ | ** [http://aws.amazon.com/comprehend/ Comprehend | Amazon] | ||
+ | ** [http://aws.amazon.com/transcribe/ Transcribe | Amazon] | ||
+ | ** [http://aws.amazon.com/polly/ Amazon Polly | Amazon] | ||
+ | ** [http://aws.amazon.com/kendra/ Kendra | Amazon] | ||
+ | * Microsoft - Azure: | ||
+ | ** [http://azure.microsoft.com/en-us/services/cognitive-services/ Azure Cognitive Services | Microsoft] | ||
+ | ** [http://azure.microsoft.com/en-us/services/cognitive-services/speech-services/ Azure Bing Speech API | Microsoft] | ||
+ | * Stanford: | ||
+ | ** [http://stanfordnlp.github.io/CoreNLP/ CoreNLP | Stanford] The Stanford Natural Language Processing Group Toolkit ([[Python]]) | ||
+ | ** [http://stanfordnlp.github.io/stanza/ Stanza - a Python NLP Library for Many Human Languages] ...[http://www.infoq.com/news/2020/03/stanza-nlp-toolkit/ python NLP Toolkit]. Stanza features both a language-agnostic fully neural pipeline for text analysis (supporting 66 human languages), and a python interface to Stanford's CoreNLP java software | ||
+ | * [[Natural Language Toolkit (NLTK)]] ([[Python]]) implements [[Natural Language Processing (NLP)#Text Classification|Text Classification]], [[Natural Language Processing (NLP)#Tokenization / Sentence Splitting|Tokenization / Sentence Splitting]], [[Natural Language Processing (NLP)#Stemming (Morphological Similarity)|Stemming (Morphological Similarity)]], [[Natural Language Processing (NLP)#Part-of-Speech (POS) Tagging|Part-of-Speech (POS) Tagging]], [[Natural Language Processing (NLP)#Syntax (Parsing)|Syntax (Parsing)]], and semantic reasoning | ||
+ | ** [http://textblob.readthedocs.io/en/dev/ TextBlob]([[Python]]) is kind of an extension of [[Natural Language Toolkit (NLTK)| NLTK]]. You can access many of [[Natural Language Toolkit (NLTK)| NLTK]]'s functions in a simplified manner; includes functionality from the Pattern library | ||
+ | * [[SpaCy]] ([[Python]] and Cython) treats everything as an object rather than a string, which simplifies the interface for building applications. It extracts useful information from free text with built-in features to assist analysis, such as a word tokeniser, named entity recognition, and part-of-speech detection. spaCy supports more than 55 languages. | ||
+ | ** [http://github.com/chartbeat-labs/textacy textacy] ([[Python]]) focuses primarily on the tasks that come before and follow after [[SpaCy]] | ||
+ | * [[Python#scikit-learn|scikit-learn]] NLP toolkit | ||
* [http://opennlp.apache.org/ Apache OpenNLP] | * [http://opennlp.apache.org/ Apache OpenNLP] | ||
+ | * [http://fasttext.cc/ fastText | Facebook's AI Research] representations and text classifiers ([[Python]]) | ||
* [http://mallet.cs.umass.edu/ MALLET] a Java-based package | * [http://mallet.cs.umass.edu/ MALLET] a Java-based package | ||
− | * [http://www.intel.ai/nlp-architect-by-intel-ai-lab-release-0-2/ Intel NLP Architect] (Python) | + | * [http://www.intel.ai/nlp-architect-by-intel-ai-lab-release-0-2/ Intel NLP Architect] ([[Python]]) |
− | * [[ | + | * [[Gensim]] fast Vector Space Modelling, Topic Modeling, LDA implementation ([[Python]]) |
− | * [http:// | + | * [http://github.com/zalandoresearch/flair flair] use pretrained BERT (PyTorch) |
* [http://allennlp.org/ AllenNLP] an Apache NLP research library (PyTorch) | * [http://allennlp.org/ AllenNLP] an Apache NLP research library (PyTorch) | ||
+ | * [http://pytorchnlp.readthedocs.io/en/latest/ Pytorch-NLP] (PyTorch) | ||
* [[Matlab]] | * [[Matlab]] | ||
* [[Sintelix]] | * [[Sintelix]] | ||
* [[H2O]] Driverless AI
− | + | * [http://dandelion.eu/ Dandelion API] | |
− | * [http:// | ||
− | |||
− | |||
− | |||
− | |||
* [http://www.programmableweb.com/api/voxsigma VoxSigma API] | * [http://www.programmableweb.com/api/voxsigma VoxSigma API] | ||
* [http://www.twilio.com/speech-recognition Speech Recognition |Twilio] | * [http://www.twilio.com/speech-recognition Speech Recognition |Twilio] | ||
Line 63: | Line 82: | ||
* [http://github.com/salesforce/awd-lstm-lm LSTM and QRNN Language Model Toolkit for PyTorch | GitHub] | * [http://github.com/salesforce/awd-lstm-lm LSTM and QRNN Language Model Toolkit for PyTorch | GitHub] | ||
* [http://www.thematically.com Thematically Discover] | * [http://www.thematically.com Thematically Discover] | ||
+ | * [http://github.com/clips/pattern Pattern] ([[Python]]) | ||
+ | * [http://github.com/aboSamoor/polyglot Polyglot] | ||
* [http://www.ibm.com/watson/services/speech-to-text/ Speech to Text | IBM] | * [http://www.ibm.com/watson/services/speech-to-text/ Speech to Text | IBM] | ||
* [http://www.ibm.com/watson/services/text-to-speech/ Text to Speech | IBM] | * [http://www.ibm.com/watson/services/text-to-speech/ Text to Speech | IBM] | ||
* [http://www.ibm.com/watson/services/natural-language-understanding/ Watson Natural Language Understanding | IBM] | * [http://www.ibm.com/watson/services/natural-language-understanding/ Watson Natural Language Understanding | IBM] | ||
+ | * [http://www.research.ibm.com/artificial-intelligence/project-debater/ Project Debater | IBM] | ||
+ | |||
+ | ===== Text Labeling ===== | ||
+ | * [http://github.com/dennybritz/bella Bella] open tool aimed at simplifying and speeding up text data labeling. Usually, if a dataset was labeled in a CSV file or Google spreadsheets, specialists need to convert it to an appropriate format before model training. Bella’s features and simple interface make it a good substitution to spreadsheets and CSV files. A graphical user interface (GUI) and a database backend for managing labeled data are Bella’s main features. | ||
+ | * [http://www.tagtog.net/ Tagtog] choose three approaches: annotate text manually, hire a team that will label data for them, or use machine learning models for automated annotation. | ||
+ | * [http://dataturks.com/index.php Dataturks] provides training data preparation tools. Using its products, teams can perform such tasks as parts-of-speech tagging, named-entity recognition tagging, text classification, moderation, and summarization. | ||
+ | * [http://brat.nlplab.org/ Brat] rapid annotation tool, a web-based tool for text annotation; that is, for adding notes to existing text documents, designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer. | ||
+ | * [http://github.com/jiesutd/YEDDA Yedda] a lightweight Collaborative Text Span Annotation Tool developed for annotating chunk/entity/event on text (almost all languages including English, Chinese), symbol and even emoji. | ||
<youtube>Y2wgQjxrPD8</youtube> | <youtube>Y2wgQjxrPD8</youtube> |
Revision as of 08:30, 10 July 2020
- Natural Language Processing (NLP)
- NLP Tools | Wikipedia
- Auto-tagging
- AI researchers create testing tool to find bugs in NLP from Amazon, Google, and Microsoft | Khari Johnson - VentureBeat ...Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | M. Ribeiro, T. Wu, C. Guestrin, and S. Singh
- Machine Translation reading list & open-source toolkits | Tsinghua Natural Language Processing Group
- Build or Buy for Natural Language Processing? | Tim Mohler - Lexalytics
- Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford
- Software Packages | Cognitive Computation Group, led by Prof. Dan Roth
- PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM | GitHub
- NLP Tool Finder | NIH University of Utah DBMI
- API-based Services
- Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium
- Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI
- Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review | K. Kreimeyer, M. Foster, A. Pandey, N. Arya, G. Halford, S. Jones, R. Forshee, M. Walderhaug, and T. Botsis - Journal of Biomedical Informatics; reviews 71 natural language processing systems
- 8 great Python libraries for natural language processing | Serdar Yegulalp - InfoWorld
Capability with Javascript
- TensorFlow.js for training and deploying ML models in the browser and on Node.js (formerly deeplearn.js)
- Keras.js no longer active; its capability is now in TensorFlow.js
- NLP.js NLP Manager: built on top of several other NLP libraries, including Franc and Brain.js, providing Text Classification, Sentiment Analysis, Stemming (Morphological Similarity), Named Entity Recognition (NER), and natural language generation. (Node.js)
- Compromise modest natural-language processing (NLP); interprets and pre-parses English and makes some reasonable decisions
- Natural provides Tokenization / Sentence Splitting, Stemming (Morphological Similarity) (reducing a word to a not-necessarily morphological root), Text Classification, phonetics, Term Frequency–Inverse Document Frequency (TF-IDF), WordNet, string Similarity, some inflections, and more. (Node.js)
Capability (other)
- Google:
- TensorFlow
- Google Natural Language - AutoML Natural Language
- Cloud Speech-to-Text | Google
- Google Semantic Reactor a Google Sheets add-on
- Amazon - AWS:
- Textract | Amazon in the Elastic Stack Architecture
- Comprehend | Amazon
- Transcribe | Amazon
- Amazon Polly | Amazon
- Kendra | Amazon
- Microsoft - Azure:
- Azure Cognitive Services | Microsoft
- Azure Bing Speech API | Microsoft
- Stanford:
- CoreNLP | Stanford The Stanford Natural Language Processing Group Toolkit (Python)
- Stanza - a Python NLP Library for Many Human Languages ...Python NLP Toolkit. Stanza features both a language-agnostic, fully neural pipeline for text analysis (supporting 66 human languages) and a Python interface to Stanford's CoreNLP Java software
- Natural Language Toolkit (NLTK) (Python) implements Text Classification, Tokenization / Sentence Splitting, Stemming (Morphological Similarity), Part-of-Speech (POS) Tagging, Syntax (Parsing), and semantic reasoning
- SpaCy (Python and Cython) treats everything as an object rather than a string, which simplifies the interface for building applications. It extracts useful information from free text with built-in features to assist analysis, such as a word tokeniser, named entity recognition, and part-of-speech detection. spaCy supports more than 55 languages.
- scikit-learn NLP toolkit
- Apache OpenNLP
- fastText | Facebook's AI Research representations and text classifiers (Python)
- MALLET a Java-based package
- Intel NLP Architect (Python)
- Gensim fast Vector Space Modelling, Topic Modeling, LDA implementation (Python)
- flair uses pretrained BERT (PyTorch)
- AllenNLP an Apache-licensed NLP research library (PyTorch)
- Pytorch-NLP (PyTorch)
- Matlab
- Sintelix
- H2O Driverless AI
- Dandelion API
- VoxSigma API
- Speech Recognition | Twilio
- Automatic Speech Recognition (ASR) | Speechmatics
- Voice API | Nexmo
- wit.ai
- Meaning Cloud
- Haven OnDemand
- Aylien
- Lexalytics
- Dialogflow
- Indico
- TextRazor
- Intellexer
- LSTM and QRNN Language Model Toolkit for PyTorch | GitHub
- Thematically Discover
- Pattern (Python)
- Polyglot
- Speech to Text | IBM
- Text to Speech | IBM
- Watson Natural Language Understanding | IBM
- Project Debater | IBM
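Several of the libraries listed above (Natural, scikit-learn, Gensim) offer TF-IDF weighting. As a rough illustration of what that feature computes, here is a minimal standard-library sketch. It assumes whitespace tokenization and the plain `tf * log(N / df)` variant; the function names `tokenize` and `tf_idf` are hypothetical, and real libraries apply their own smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; real tokenizers do more)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is the unsmoothed textbook variant, not any particular library's formula.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["the cat sat", "the dog barked"]
    for doc, scores in zip(docs, tf_idf(docs)):
        print(doc, scores)
```

A term that appears in every document (here "the") receives weight zero under this variant, which is the property that makes TF-IDF useful for down-weighting common words.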
Text Labeling
- Bella an open tool aimed at simplifying and speeding up text data labeling. Usually, if a dataset was labeled in a CSV file or Google spreadsheet, specialists need to convert it to an appropriate format before model training. Bella's features and simple interface make it a good substitute for spreadsheets and CSV files. Its main features are a graphical user interface (GUI) and a database backend for managing labeled data.
- Tagtog lets users choose among three approaches: annotate text manually, hire a team to label the data for them, or use machine learning models for automated annotation.
- Dataturks provides training data preparation tools. Using its products, teams can perform such tasks as parts-of-speech tagging, named-entity recognition tagging, text classification, moderation, and summarization.
- Brat rapid annotation tool, a web-based tool for text annotation; that is, for adding notes to existing text documents, designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
- Yedda a lightweight collaborative text span annotation tool developed for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji.
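To make "structured annotation with a fixed form" concrete, the sketch below parses span annotations written in a tab-separated layout loosely modeled on brat's standoff format (`ID<TAB>Label start end<TAB>text`). It is an illustration under assumptions, not brat's own parser: it handles only contiguous text-bound spans, and the names `SpanAnnotation`, `parse_standoff_line`, and `matches_document` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpanAnnotation:
    ann_id: str   # e.g. "T1"
    label: str    # e.g. "Organization"
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    text: str     # the covered text, stored redundantly for checking


def parse_standoff_line(line):
    """Parse one 'ID<TAB>Label start end<TAB>text' line into a SpanAnnotation."""
    ann_id, label_and_offsets, text = line.rstrip("\n").split("\t")
    label, start, end = label_and_offsets.split(" ")
    return SpanAnnotation(ann_id, label, int(start), int(end), text)


def matches_document(annotation, document):
    """Check that the offsets really point at the annotated text."""
    return document[annotation.start:annotation.end] == annotation.text


if __name__ == "__main__":
    document = "Sony opened an office in Tokyo."
    lines = [
        "T1\tOrganization 0 4\tSony",
        "T2\tLocation 25 30\tTokyo",
    ]
    for line in lines:
        ann = parse_standoff_line(line)
        print(ann, matches_document(ann, document))
```

Because each annotation has a fixed shape, a program can validate it against the source text and convert it to whatever format a model-training pipeline expects.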