https://courses.nvidia.com/dashboard
* [https://www.google.com/search?q=Yuval+Mazor+%40nividia.com&oq=Yuval+Mazor+%40nividia.com&aqs=chrome..69i57j69i58.9126j0j8&sourceid=chrome&ie=UTF-8 Yuval Mazor | NVIDIA]
Jupyter notebooks: Shift+Enter to run a cell

Linguistic Concepts
* coreference - anaphors
* gang of four design
* null subject
* recursion

= Word [[Embedding]]s =

* HMMs, CRFs, PGMs
** CBoW - Bag of Words / n-grams - one feature per word or n-gram
** 1-hot sparse input - create a vector the size of the entire vocabulary
* Stop Words
* TF-IDF
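
A minimal sketch of these sparse representations on a toy corpus, assuming scikit-learn (CountVectorizer for the bag-of-words/n-gram counts, TfidfVectorizer for TF-IDF); the course notebooks may use different tooling:

 # Sketch: bag-of-words counts and TF-IDF features on a toy corpus.
 # Assumes scikit-learn; the corpus and feature names are illustrative only.
 from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
 
 corpus = [
     "stocks rallied on strong earnings",
     "earnings missed and stocks fell",
     "the market was flat today",
 ]
 
 # Bag of words: one feature per word (ngram_range=(1, 2) would add bigrams)
 bow = CountVectorizer(stop_words="english")
 X_bow = bow.fit_transform(corpus)            # sparse matrix: documents x vocabulary
 print(bow.get_feature_names_out())
 print(X_bow.toarray())
 
 # TF-IDF: re-weights the same counts, down-weighting words common to many documents
 tfidf = TfidfVectorizer(stop_words="english")
 X_tfidf = tfidf.fit_transform(corpus)
 print(X_tfidf.toarray().round(2))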

== Word2Vec ==
Skip-Gram

* Firth 1957 Distributional Hypothesis - "You shall know a word by the company it keeps"

* Word Cloud
** Text Classification
** Text/Machine Translation (NMT)
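
A minimal Word2Vec skip-gram sketch, assuming the gensim 4.x API (sg=1 selects skip-gram); the tiny corpus and parameters are placeholders, not the course's settings:

 # Sketch: train a tiny skip-gram Word2Vec model with gensim.
 # Real use would feed a large tokenized corpus, not three sentences.
 from gensim.models import Word2Vec
 
 sentences = [
     ["stocks", "rallied", "on", "strong", "earnings"],
     ["earnings", "missed", "and", "stocks", "fell"],
     ["the", "market", "was", "flat", "today"],
 ]
 
 model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
 
 print(model.wv["stocks"].shape)          # dense 50-dimensional vector for "stocks"
 print(model.wv.most_similar("stocks"))   # nearest words by cosine similarity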

= Financial News =

Tools:
* GloVe
** dot product
* FastText
** Skip-gram
** Continuous bag of words

Multi-channel LSTM network
Keras with TensorFlow
Utilizing the GloVe and FastText skip-gram pretrained [[embedding]]s allows the underlying network to access a larger feature space to build complex features on top of.

Combinations of different corpora and [[embedding]] methods can be used for better performance.

A bidirectional LSTM network is used to encode sequential information on top of the [[embedding]] layers.

A dense layer projects the final output classification.
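
A minimal Keras/TensorFlow sketch of this architecture; VOCAB_SIZE, MAX_LEN, the random placeholder matrices, and the layer sizes are assumptions standing in for the real GloVe/FastText weights and the course's actual hyperparameters:

 # Sketch: two frozen embedding channels (GloVe-style and FastText-style),
 # concatenated, encoded by a bidirectional LSTM, classified by a dense layer.
 import numpy as np
 from tensorflow.keras import layers, models, initializers
 
 VOCAB_SIZE, EMB_DIM, MAX_LEN, NUM_CLASSES = 20000, 100, 60, 3
 glove_matrix = np.random.rand(VOCAB_SIZE, EMB_DIM)     # stand-in for real GloVe vectors
 fasttext_matrix = np.random.rand(VOCAB_SIZE, EMB_DIM)  # stand-in for real FastText vectors
 
 tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")
 
 # Channel 1: GloVe embeddings, frozen so the pretrained features are preserved
 glove = layers.Embedding(VOCAB_SIZE, EMB_DIM,
                          embeddings_initializer=initializers.Constant(glove_matrix),
                          trainable=False)(tokens)
 # Channel 2: FastText skip-gram embeddings, also frozen
 fasttext = layers.Embedding(VOCAB_SIZE, EMB_DIM,
                             embeddings_initializer=initializers.Constant(fasttext_matrix),
                             trainable=False)(tokens)
 
 # Concatenating the channels gives the network the larger feature space
 merged = layers.Concatenate()([glove, fasttext])
 
 # Bidirectional LSTM encodes sequential information over the embeddings
 encoded = layers.Bidirectional(layers.LSTM(64))(merged)
 
 # Dense layer projects to the final output classification
 output = layers.Dense(NUM_CLASSES, activation="softmax")(encoded)
 
 model = models.Model(tokens, output)
 model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
 model.summary()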

Use [[embedding]]s...
Using pretrained [[embedding]]s is a form of transfer learning.

CNN vs. BI-LSTM (RNN)? With this approach, the BI-LSTM does not need a lot of data.

Attention mechanism -- in translation you can look back at the source sequence, rather than relying on a fixed-size vector.

* [https://nlp.stanford.edu/projects/glove/ GloVe]
* [https://fasttext.cc/ FastText]
* [https://www.slideshare.net/chartbeat/mockup-infographicv4-27900399 News articles per day]
* [https://github.com/philipperemy/financial-news-dataset News data source]
* [https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/ Word embeddings]
* [https://en.wikipedia.org/wiki/Natural-language_processing Natural Language Processing]
* [https://en.wikipedia.org/wiki/Sentiment_analysis Sentiment Analysis]

= Deep Autoencoders for Anomaly Detection =

Variational Autoencoders

* clustering - the latent layers may tell you what number of clusters to use
* anomaly detection

https://courses.nvidia.com/courses/course-v1:DLI+L-FI-06+V1/info

PCA or t-SNE
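
A minimal sketch of the reconstruction-error idea, assuming Keras and a purely synthetic feature matrix; the layer sizes and 3-sigma threshold are illustrative, not the course's values:

 # Sketch: dense autoencoder; unusually high reconstruction error marks anomalies.
 import numpy as np
 from tensorflow.keras import layers, models
 
 X = np.random.normal(size=(1000, 20)).astype("float32")   # stand-in "normal" data
 
 inputs = layers.Input(shape=(20,))
 latent = layers.Dense(3, activation="relu")(inputs)        # bottleneck / latent layer
 outputs = layers.Dense(20)(latent)
 
 autoencoder = models.Model(inputs, outputs)
 autoencoder.compile(optimizer="adam", loss="mse")
 autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)  # learn to reconstruct "normal" data
 
 # Per-sample reconstruction error; large values suggest anomalies
 recon = autoencoder.predict(X, verbose=0)
 errors = np.mean((X - recon) ** 2, axis=1)
 threshold = errors.mean() + 3 * errors.std()
 print("flagged rows:", np.where(errors > threshold)[0])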

== Statistical Arbitrage ==
arbitrage - currencies, stocks (the price deviates from what it should be - fair market value); how right, or how rich?
Mean reversion
An autoencoder learns the fair market value; then feed in the current value.

The reconstruction error is a signal - just one signal; consider a basket of signals.

Backtesting - if we run this on previous historical events, how well does our algorithm work? (Don't use the training data!)
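
A minimal sketch of turning such a deviation-from-fair-value signal into a mean-reversion rule and checking it only on held-out data; the synthetic price series, the stand-in "fair value", and the thresholds are all illustrative assumptions:

 # Sketch: deviation from a modeled "fair value" as a trading signal,
 # evaluated only on data never used for fitting (a toy backtest).
 import numpy as np
 
 prices = 100 + np.cumsum(np.random.normal(0, 1, 500))   # synthetic price series
 train, test = prices[:400], prices[400:]                # hold out the last 100 points
 
 fair_value = train.mean()                       # stand-in for an autoencoder's fair value
 z = (test - fair_value) / (train - fair_value).std()    # standardize with training stats only
 
 # Mean-reversion rule: short when far above fair value, long when far below
 position = np.where(z > 1, -1, np.where(z < -1, 1, 0))
 returns = position[:-1] * np.diff(test)
 print("toy backtest PnL on held-out data:", round(float(returns.sum()), 2))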

Using Pandas...
 ori_dataset_categ_transformed.head(10)
 for i, val in enumerate(list(ori_dataset_categ_transformed.iloc[1])):
     if val == 1:
         print("Got 1 at {}".format(i))
