Bidirectional Encoder Representations from Transformers (BERT)

Here is how BERT works, using a fun MadLibs example. BERT is like a super smart language model that understands words and sentences really well. Imagine you're playing MadLibs, and you have a sentence with a missing word. Let's say the sentence is:


I went to the _______ to buy some delicious ice cream.


You need to fill in the blank with the right word to make the sentence make sense. BERT does something similar, but with much more complex sentences. Here's how it works:

  • Tokenization: BERT first breaks down the sentence into smaller pieces, called tokens. In our MadLibs example, each word is a token. So, "I," "went," "to," "the," "_______," "to," "buy," "some," "delicious," "ice," and "cream" are all tokens (see the tokenization sketch after this list).
  • Word Representation: BERT gives each token a special code to represent it. Just like in MadLibs, you might have a list of possible words that could fit in the blank, and each word has a number next to it. BERT does something similar by giving each token a unique code that helps it understand what the word means.
  • Context Matters: BERT doesn't just look at one token at a time; it pays attention to all the tokens in the sentence. This is like looking at the whole MadLibs sentence to figure out what word fits best in the blank. BERT considers the words that come before and after the blank to understand the context.
  • Prediction: Now, the magic happens. BERT tries to predict what the missing word is by looking at the surrounding words and their codes. It's like guessing the right word for the MadLibs sentence based on the words you've already filled in.
  • Learning from Data: BERT got really good at this by studying lots and lots of sentences from the internet. It learned to understand language by seeing how words are used in different contexts. So, it's like you getting better at MadLibs by playing it over and over.
  • Results: BERT gives a list of possible words for the blank, along with how confident it is about each one. It can even give more than one option, just like you might have a few choices in MadLibs (see the prediction sketch after this list).
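
A minimal sketch of the tokenization and word-representation steps, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint (neither is named in the article). It simply shows the sentence being split into tokens and each token getting a numeric code.

# Tokenization sketch (assumes: pip install transformers).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# BERT writes the MadLibs blank as the special [MASK] token.
sentence = "I went to the [MASK] to buy some delicious ice cream."

# Step 1: break the sentence into tokens (WordPiece pieces).
tokens = tokenizer.tokenize(sentence)
print(tokens)
# e.g. ['i', 'went', 'to', 'the', '[MASK]', 'to', 'buy', 'some', 'delicious', 'ice', 'cream', '.']

# Step 2: give every token its "special code", a numeric vocabulary ID.
ids = tokenizer.convert_tokens_to_ids(tokens)
print(list(zip(tokens, ids)))

Common words usually map to a single token, while rarer words can be split into several sub-word pieces, which is why the explanation above talks about tokens rather than words.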
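
A minimal sketch of the prediction step, again assuming the Hugging Face transformers package and bert-base-uncased (an assumption, not something the article specifies). The fill-mask pipeline asks BERT for its best guesses for the blank, each with a confidence score, matching the Prediction and Results points above.

# Prediction sketch (assumes: pip install transformers torch).
from transformers import pipeline

fill_blank = pipeline("fill-mask", model="bert-base-uncased")

# Ask BERT to fill in the MadLibs blank, written as [MASK].
guesses = fill_blank(
    "I went to the [MASK] to buy some delicious ice cream.",
    top_k=5,  # return several candidate words instead of just the single best one
)

for guess in guesses:
    # 'token_str' is the candidate word, 'score' is BERT's confidence in it.
    print(f"{guess['token_str']:>10}  {guess['score']:.3f}")

Running this typically puts everyday place words such as "store" or "shop" near the top of the list, each with a probability, which is BERT's way of offering "a few choices" for the blank.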





BERT Research | Chris McCormick