Natural Language Tools & Services
Revision as of 08:30, 10 July 2020
- Natural Language Processing (NLP)
- NLP Tools | Wikipedia
- Auto-tagging
- AI researchers create testing tool to find bugs in NLP from Amazon, Google, and Microsoft | Khari Johnson - Venture Beat ...Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | M. Ribeiro, T. Wu, C. Guestrin, and S. Singh
- Machine Translation reading list & open-source toolkits | Tsinghua Natural Language Processing Group
- Build or Buy for Natural Language Processing? | Tim Mohler - Lexalytics
- Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources | Chris Manning - Stanford
- Software Packages | Cognitive Computation Group, led by Prof. Dan Roth
- PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM | GitHub
- NLP Tool Finder | NIH University of Utah DBMI
- API-based Services
- Natural Language Processing and Machine Learning in JavaScript | David Luecke - Medium
- Top 22 NLP (Natural Language Processing) APIs for Developers in 2018 | RapidAPI
- Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review | K. Kreimeyer, M. Foster, A. Pandey, N. Arya, G. Halford, S. Jones, R. Forshee, M. Walderhaug, and T. Botsis - Journal of Biomedical Informatics 71
- 8 great Python libraries for natural language processing | Serdar Yegulalp - InfoWorld
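The CheckList paper listed above argues for behavioral tests of NLP models alongside accuracy. Below is a minimal plain-Python sketch of one such test, an invariance test: changing a person's name in a sentence should not change a sentiment prediction. It assumes a hypothetical toy model, `predict_sentiment`, in place of a real classifier. It does not use the CheckList library's API, and `invariance_test` is an illustrative helper.

```python
from typing import Callable, List, Tuple


def predict_sentiment(text: str) -> str:
    """Hypothetical toy model (not a real library): counts a few cue words."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().replace(".", "").split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def invariance_test(
    model: Callable[[str], str], template: str, fillers: List[str]
) -> List[Tuple[str, str]]:
    """Fill `template` with each filler and return the (text, prediction)
    pairs whose prediction differs from the first filler's prediction."""
    texts = [template.format(name=f) for f in fillers]
    preds = [(t, model(t)) for t in texts]
    baseline = preds[0][1]
    return [(t, p) for t, p in preds[1:] if p != baseline]


failures = invariance_test(predict_sentiment, "{name} had a great flight.", ["Mary", "Ahmed", "Wei"])
print(failures)  # []
```

An empty list means the model passed this test. Any returned pair is a concrete failing input that can be inspected, which is the point CheckList makes about going beyond a single accuracy number.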
Capability with JavaScript
- TensorFlow.js for training and deploying ML models in the browser and on Node.js (formerly deeplearn.js)
- Keras.js: no longer active; its capability is now in TensorFlow.js
- NLP.js NLP Manager: built on top of several other NLP libraries, including Franc and Brain.js, providing Text Classification, Sentiment Analysis, Stemming (Morphological Similarity), Named Entity Recognition (NER), and natural language generation (Node.js)
- Compromise: modest natural-language processing (NLP) that interprets and pre-parses English and makes some reasonable decisions
- Natural provides Tokenization / Sentence Splitting, Stemming (Morphological Similarity) (reducing a word to a root that is not necessarily a morphological one), Text Classification, phonetics, Term Frequency–Inverse Document Frequency (TF-IDF), WordNet, string similarity, some inflections, and more (Node.js)
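Several libraries in this list, such as Natural, offer TF-IDF. Below is a minimal from-scratch sketch in Python of one common formulation: tf(t, d) × log(N / df(t)). Here tf is the term's count divided by the document length, N is the number of documents, and df(t) is the number of documents that contain the term. Libraries differ in their smoothing and normalization, so this is not Natural's exact formula, and `tf_idf` is an illustrative helper.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / document length
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    # Count each term once per document to get document frequencies.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
```

In the first document, "cat" outweighs "the" even though "the" occurs twice. This is because "the" also appears in another document, which lowers its idf. A term that occurs in every document gets idf log(1) = 0 and therefore a weight of zero.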
Capability (other)
- Google:
- TensorFlow
- Google Natural Language - AutoML Natural Language
- Cloud Speech-to-Text | Google
- Google Semantic Reactor, a Google Sheets add-on
- Amazon - AWS:
- Textract | Amazon in the Elastic Stack Architecture
- Comprehend | Amazon
- Transcribe | Amazon
- Amazon Polly | Amazon
- Kendra | Amazon
- Microsoft - Azure:
- Azure Cognitive Services | Microsoft
- Stanford:
- CoreNLP | Stanford: The Stanford Natural Language Processing Group Toolkit (Java)
- Stanza - a Python NLP Library for Many Human Languages ...Python NLP Toolkit. Stanza features both a language-agnostic, fully neural pipeline for text analysis (supporting 66 human languages) and a Python interface to Stanford's CoreNLP Java software
- Natural Language Toolkit (NLTK) (Python) implements Text Classification, Tokenization / Sentence Splitting, Stemming (Morphological Similarity), Part-of-Speech (POS) Tagging, Syntax (Parsing), and semantic reasoning
- TextBlob (Python) is a kind of extension of NLTK: it gives simplified access to many of NLTK's functions and includes functionality from the Pattern library
- spaCy (Python and Cython) represents everything as an object rather than a string, which simplifies the interface for building applications. It extracts useful information from free text with built-in features that assist analysis, such as a word tokenizer, named entity recognition, and part-of-speech detection. spaCy supports more than 55 languages.
- textacy (Python) focuses primarily on the tasks that come before and after spaCy
- scikit-learn NLP toolkit
- Apache OpenNLP
- fastText | Facebook AI Research library for learning word representations and text classifiers (Python)
- MALLET a Java-based package
- Intel NLP Architect (Python)
- Gensim: fast vector space modelling and topic modelling, including an LDA implementation (Python)
- flair: uses pretrained BERT (PyTorch)
- AllenNLP: an open-source (Apache 2.0 licensed) NLP research library (PyTorch)
- Pytorch-NLP (PyTorch)
- Matlab
- Sintelix
- H2O Driverless AI
- Dandelion API
- VoxSigma API
- Speech Recognition | Twilio
- Automatic Speech Recognition (ASR) | Speechmatics
- Voice API | Nexmo
- wit.ai
- Meaning Cloud
- Haven OnDemand
- Aylien
- Lexalytics
- Dialogflow
- Indico
- TextRazor
- Intellexer
- LSTM and QRNN Language Model Toolkit for PyTorch | GitHub
- Thematically Discover
- Pattern (Python)
- Polyglot
- Speech to Text | IBM
- Text to Speech | IBM
- Watson Natural Language Understanding | IBM
- Project Debater | IBM
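Text classification appears in many of the toolkits above, including NLTK, scikit-learn, and fastText. Below is a minimal from-scratch Python sketch of one classic approach: multinomial Naive Bayes with add-one (Laplace) smoothing over lowercased, whitespace-split tokens. It is not the API of NLTK or scikit-learn. `NaiveBayesTextClassifier` is a hypothetical class name, and the four training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesTextClassifier:
    """Hypothetical illustrative class: multinomial Naive Bayes over
    lowercased, whitespace-split tokens with add-one smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesTextClassifier":
        self.class_counts: Counter = Counter()  # documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in examples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(label) + sum of log P(token | label), add-one smoothed
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("great plot and great acting", "pos"),
    ("loved the soundtrack", "pos"),
    ("boring plot and bad acting", "neg"),
    ("hated the ending", "neg"),
]
clf = NaiveBayesTextClassifier().fit(train)
```

The classifier works in log space so that products of many small probabilities do not underflow. Add-one smoothing keeps an unseen token from forcing a class's probability to zero.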
Text Labeling
- Bella: an open tool aimed at simplifying and speeding up text data labeling. Usually, if a dataset was labeled in a CSV file or Google spreadsheet, specialists need to convert it to an appropriate format before model training. Bella's features and simple interface make it a good substitute for spreadsheets and CSV files. Its main features are a graphical user interface (GUI) and a database backend for managing labeled data.
- Tagtog lets users choose among three approaches: annotate text manually, hire a team to label data for them, or use machine learning models for automated annotation.
- Dataturks provides training data preparation tools. With its products, teams can perform tasks such as part-of-speech tagging, named-entity recognition tagging, text classification, moderation, and summarization.
- Brat rapid annotation tool: a web-based tool for text annotation, that is, for adding notes to existing text documents. It is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
- Yedda: a lightweight collaborative text span annotation tool for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji.
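brat's "fixed form" is its standoff format. Annotations live in an `.ann` file next to the plain-text document. A text-bound annotation line has the form `T1<TAB>Type start end<TAB>covered text`, where start and end are character offsets into the `.txt` file and the end offset is exclusive.

Below is a minimal Python sketch that reads only these text-bound (`T`) lines and checks each offset pair against the source text. The sketch makes these assumptions:
- It ignores relations, events, attributes, notes, and discontinuous spans (offsets separated by `;`).
- `parse_text_bound` and `Span` are hypothetical names.
- The sample document is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    ann_id: str
    label: str
    start: int
    end: int  # exclusive character offset
    text: str


def parse_text_bound(ann: str, source_text: str) -> List[Span]:
    """Hypothetical helper: parse brat text-bound ('T') lines only."""
    spans = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), events (E), attributes (A), notes (#), blanks
        ann_id, type_and_offsets, covered = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous span: not handled in this sketch
        start, end = map(int, offsets.split())
        # Offsets must reproduce the covered text exactly.
        if source_text[start:end] != covered:
            raise ValueError(f"{ann_id}: offsets {start}-{end} do not match {covered!r}")
        spans.append(Span(ann_id, label, start, end, covered))
    return spans


text = "Alice joined Acme Corp in Berlin."
ann = (
    "T1\tPerson 0 5\tAlice\n"
    "T2\tOrganization 13 22\tAcme Corp\n"
    "T3\tLocation 26 32\tBerlin\n"
)
spans = parse_text_bound(ann, text)
```

Checking the covered text against the offsets is a cheap way to catch annotation files that have drifted out of sync with their source documents. That mismatch is a common problem when labels are converted between tools.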