{{#seo:
|title=PRIMO.ai
 
|titlemode=append
|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, Prompt Engineering, LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
 
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-4GCWLBVJ7T');
</script>
 
}}
[https://www.youtube.com/results?search_query=ai+End-to-End+speech+nlp YouTube]
[https://www.quora.com/search?q=ai%20End-to-End%20~speech%20nlp ... Quora]
[https://www.google.com/search?q=ai+End-to-End+speech+nlp ...Google search]
[https://news.google.com/search?q=ai+End-to-End+speech+nlp ...Google News]
[https://www.bing.com/news/search?q=ai+End-to-End+speech+nlp&qft=interval%3d%228%22 ...Bing News]
  
* [[End-to-End Speech]] ... [[Synthesize Speech]] ... [[Speech Recognition]] ... [[Music]]
* [[Video/Image]] ... [[Vision]] ... [[Enhancement]] ... [[Fake]] ... [[Reconstruction]] ... [[Colorize]] ... [[Occlusions]] ... [[Predict image]] ... [[Image/Video Transfer Learning]] ... [[Art]] ... [[Photography]]
* [[Recurrent Neural Network (RNN)]]
 
 
* [[Autoencoder (AE) / Encoder-Decoder]]
* [[Sequence to Sequence (Seq2Seq)]]
* [[Large Language Model (LLM)]] ... [[Natural Language Processing (NLP)]] ... [[Natural Language Generation (NLG)|Generation]] ... [[Natural Language Classification (NLC)|Classification]] ... [[Natural Language Processing (NLP)#Natural Language Understanding (NLU)|Understanding]] ... [[Language Translation|Translation]] ... [[Natural Language Tools & Services|Tools & Services]]
* [[Conversational AI]] ... [[ChatGPT]] | [[OpenAI]] ... [[Bing/Copilot]] | [[Microsoft]] ... [[Gemini]] | [[Google]] ... [[Claude]] | [[Anthropic]] ... [[Perplexity]] ... [[You]] ... [[phind]] ... [[Ernie]] | [[Baidu]]
* [[Agents]] ... [[Robotic Process Automation (RPA)|Robotic Process Automation]] ... [[Assistants]] ... [[Personal Companions]] ... [[Personal Productivity|Productivity]] ... [[Email]] ... [[Negotiation]] ... [[LangChain]]
<youtube>3MjIkWxXigM</youtube>
<youtube>WTB2p4bqtXU</youtube>

== Translatotron ==
[https://www.youtube.com/results?search_query=Translatotron YouTube search...]
[https://www.google.com/search?q=Translatotron ...Google search]
  
* [https://google-research.github.io/lingvo-lab/translatotron/ Y. Jia, R. Weiss, F. Biadsy, W. Macherey, M. Johnson, Z. Chen, and Y. Wu - Google AI]

Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language, and it can also retain the source speaker's voice in the translated speech. It demonstrates that a single sequence-to-sequence model can translate speech directly, without relying on an intermediate text representation in either language, as cascaded systems require. Translatotron is based on a sequence-to-sequence network that takes source spectrograms as input and generates spectrograms of the translated content in the target language. It also makes use of two separately trained components: a neural vocoder that converts output spectrograms to time-domain waveforms and, optionally, a speaker encoder that can be used to maintain the character of the source speaker's voice in the synthesized translated speech. During training, the sequence-to-sequence model uses a multitask objective to predict source and target transcripts at the same time as generating target spectrograms; no transcripts or other intermediate text representations are used during inference. [https://ai.googleblog.com/2019/05/introducing-translatotron-end-to-end.html Translatotron | Ye Jia and Ron Weiss - Google AI]
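The inference path described above (source spectrogram → seq2seq translation → optional speaker conditioning → vocoder → waveform) can be sketched in a few lines. This is only a toy illustration of the component structure: random weights stand in for the trained networks, and all function names, shapes, and the hop size are assumptions, not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def speech_to_speech(source_spectrogram, speaker_embedding=None):
    """Toy sketch of Translatotron-style inference: spectrogram in,
    translated spectrogram out, then a vocoder stage to a waveform."""
    n_frames, n_mels = source_spectrogram.shape
    # 1. Sequence-to-sequence "translation" in spectrogram space.
    #    A real model uses attention; a random linear map stands in here.
    W = rng.standard_normal((n_mels, n_mels)) * 0.01
    target_spectrogram = source_spectrogram @ W
    # 2. Optional speaker encoder output conditions the decoder so the
    #    translated speech keeps the source voice (here: an additive bias).
    if speaker_embedding is not None:
        target_spectrogram = target_spectrogram + speaker_embedding
    # 3. A neural vocoder converts spectrogram frames to a time-domain
    #    waveform (stand-in: per-frame noise scaled by frame energy).
    hop = 160
    waveform = np.zeros(n_frames * hop)
    for t, frame in enumerate(target_spectrogram):
        waveform[t * hop:(t + 1) * hop] = np.abs(frame).mean() * rng.standard_normal(hop)
    return target_spectrogram, waveform

spec = rng.standard_normal((50, 80))   # 50 frames x 80 mel bins
out_spec, wav = speech_to_speech(spec)
print(out_spec.shape, wav.shape)       # (50, 80) (8000,)
```

Note that the multitask transcript losses mentioned above exist only at training time; this inference sketch never touches text, which is the defining property of the end-to-end approach.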
  
[http://google.github.io/seq2seq/ Seq2seq | GitHub]
 
  
We essentially have two different recurrent neural networks tied together here — the encoder RNN (bottom left boxes) listens to the input tokens until it gets a special <DONE> token, and then the decoder RNN (top right boxes) takes over and starts generating tokens, also finishing with its own <DONE> token. The encoder RNN evolves its internal state (depicted by light blue changing to dark blue while the English sentence tokens come in), and then once the <DONE> token arrives, we take the final encoder state (the dark blue box) and pass it, unchanged and repeatedly, into the decoder RNN along with every single generated German token. The decoder RNN also has its own dynamic internal state, going from light red to dark red.
 
Voila! Variable-length input, variable-length output, from a fixed-size architecture. [http://medium.com/@devnag/seq2seq-the-clown-car-of-deep-learning-f88e1204dac3 seq2seq: the clown car of deep learning | Dev Nag - Medium]
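The two tied-together RNNs described above can be sketched as follows. This is a minimal illustration with random, untrained weights (the hidden size, vocabulary, and greedy decoding are assumptions): the encoder folds a variable-length input into one fixed-size state, and the decoder receives that state, unchanged, at every generation step until it emits its own <DONE> token.

```python
import numpy as np

rng = np.random.default_rng(1)
H, V = 16, 12                                # hidden size, toy vocabulary size
E = rng.standard_normal((V, H)) * 0.1        # shared token embeddings
W_enc = rng.standard_normal((2 * H, H)) * 0.1
W_dec = rng.standard_normal((3 * H, H)) * 0.1  # decoder also sees the encoder state
W_out = rng.standard_normal((H, V)) * 0.1
DONE = 0                                     # id of the special <DONE> token

def encode(tokens):
    """Encoder RNN: evolve an internal state over the input tokens."""
    h = np.zeros(H)
    for tok in tokens:
        h = np.tanh(np.concatenate([h, E[tok]]) @ W_enc)
    return h                                 # the final ("dark blue") state

def decode(enc_state, max_len=20):
    """Decoder RNN: generate tokens until it emits its own <DONE>.
    The encoder state is fed in, unchanged, at every single step."""
    h, tok, out = np.zeros(H), DONE, []
    for _ in range(max_len):
        h = np.tanh(np.concatenate([h, E[tok], enc_state]) @ W_dec)
        tok = int(np.argmax(h @ W_out))      # greedy choice of next token
        if tok == DONE:
            break
        out.append(tok)
    return out

src = [3, 7, 5, DONE]                        # variable-length input token ids
print(decode(encode(src)))                   # variable-length output token ids
```

Because the only bridge between the two networks is that single fixed-size vector, long inputs get squeezed hard; that bottleneck is exactly what the attention mechanism was later added to relieve.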
 
  
<img src="https://1.bp.blogspot.com/-hn0CfmmL2Jg/XNxKKDspftI/AAAAAAAAEIA/SCLFCB9wWPESuPcg28C3MOTL66kFTbluQCLcBGAs/s400/image1.png" width="800">
  
http://cdn-images-1.medium.com/max/1080/1*yG2htcHJF9h0sohcZbBEkg.png
 
  
<youtube>38ZXwJj6j8k</youtube>
<youtube>tKwK8GHLYOo</youtube>
<youtube>KFY6m7j9v04</youtube>
<youtube>G5RY_SUJih4</youtube>
 
<youtube>_Sm0q_FckM8</youtube>
 
<youtube>oiNFCbD_4Tk</youtube>
 
<youtube>RA5oJnKbyB4</youtube>
 

Latest revision as of 08:29, 23 March 2024
