- Latent Dirichlet Allocation (LDA)
- Natural Language Processing (NLP)
- Beautiful Soup: a Python library designed for quick-turnaround projects such as screen scraping
- Term Frequency–Inverse Document Frequency (TF-IDF)
- Probabilistic Latent Semantic Analysis (PLSA)
- Topic model | Wikipedia
- Hierarchical Dirichlet Process | Wikipedia
- Latent Semantic Analysis | Wikipedia
- Explicit semantic analysis | Wikipedia
- Topic Modeling with LSA, PLSA, LDA & lda2Vec | Joyce Xu
Topic modeling is a method for finding groups of words (i.e., topics) in a collection of documents that best represent the information in the collection. It can also be viewed as a form of text mining: a way to find recurring patterns of words in textual material.
In machine learning and Natural Language Processing (NLP), a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structures in a body of text. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear more often in documents about cats, and "the" and "is" will appear about equally often in both.
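The word-frequency intuition can be illustrated with a small sketch. The corpus below is a made-up toy example and the topic labels are supplied by hand. A real topic model such as LDA would have to infer the topics from unlabeled text, so this shows only the signal such a model exploits, not the model itself. The function name `relative_frequencies` is an assumption chosen for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: two documents about dogs, two about cats.
docs = {
    "dogs": [
        "the dog chewed the bone",
        "the dog is burying a bone",
    ],
    "cats": [
        "the cat is asleep",
        "the cat said meow to the dog",
    ],
}


def relative_frequencies(texts):
    """Return each word's share of all tokens in the given texts."""
    tokens = [word for text in texts for word in text.lower().split()]
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


freqs = {label: relative_frequencies(texts) for label, texts in docs.items()}

# Topical words differ between the groups; function words do not.
for word in ("dog", "bone", "cat", "meow", "the", "is"):
    row = ", ".join(
        f"{label}: {freqs[label].get(word, 0.0):.2f}" for label in freqs
    )
    print(f"{word:>5}  {row}")
```

In this toy corpus, "dog" and "bone" take a larger share of the dog documents, "cat" and "meow" appear only in the cat documents, and "the" and "is" have the same share in both groups.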
A topic map is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information. Topic maps were originally developed in the late 1990s as a way to represent back-of-the-book index structures so that multiple indexes from different sources could be merged. However, the developers quickly realized that with a little additional generalization, they could create a meta-model with potentially far wider application. The ISO standard is formally known as ISO/IEC 13250:2003.
A topic map represents information using:
- topics, representing any concept, from people, countries, and organizations to software modules, individual files, and events,
- associations, representing hypergraph relationships between topics, and
- occurrences, representing information resources relevant to a particular topic.
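The three constructs above can be sketched as a small in-memory data structure. This is a minimal illustration only, not the ISO/IEC 13250 data model or any existing library's API. All class names, method names, topic identifiers, and the `example.org` URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    identifier: str
    name: str


@dataclass(frozen=True)
class Association:
    kind: str
    # (role, topic identifier) pairs; an association may link more than
    # two topics at once, which is what makes it a hypergraph edge.
    members: tuple


@dataclass(frozen=True)
class Occurrence:
    topic_id: str
    resource: str  # e.g. a URL or file path relevant to the topic


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)

    def add_topic(self, identifier, name):
        self.topics[identifier] = Topic(identifier, name)

    def associate(self, kind, **roles):
        # Every member of an association must be a known topic.
        unknown = [t for t in roles.values() if t not in self.topics]
        if unknown:
            raise KeyError(f"unknown topics: {unknown}")
        self.associations.append(Association(kind, tuple(sorted(roles.items()))))

    def add_occurrence(self, topic_id, resource):
        if topic_id not in self.topics:
            raise KeyError(topic_id)
        self.occurrences.append(Occurrence(topic_id, resource))

    def resources_for(self, topic_id):
        return [o.resource for o in self.occurrences if o.topic_id == topic_id]


# Hypothetical example: a person, an organization, and a software module.
tm = TopicMap()
tm.add_topic("alice", "Alice")
tm.add_topic("acme", "Acme Corp")
tm.add_topic("parser", "Parser module")
tm.associate("maintains", maintainer="alice", employer="acme", module="parser")
tm.add_occurrence("parser", "https://example.org/docs/parser")
print(tm.resources_for("parser"))
```

The single `maintains` association links three topics at once, which an ordinary graph edge between two nodes could not express directly.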
Topic maps are similar to concept maps and mind maps in many respects, though only topic maps are an ISO standard. Topic maps are a form of semantic web technology similar to RDF.