Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurrence, training corpora contain other valuable lexical, syntactic and semantic information, such as named entities, semantic closeness and discourse relations. To extract this lexical, syntactic and semantic information to the fullest extent, we propose a continual pre-training framework named ERNIE 2.0, which incrementally builds pre-training tasks and learns them through continual multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms Bidirectional Encoder Representations from Transformers (BERT) and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several common tasks in Chinese.
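The continual pre-training idea described above can be illustrated with a small scheduling sketch. It is not the ERNIE 2.0 implementation: the function name `continual_multitask_schedule`, the even split of the step budget, and the placeholder task names are all assumptions made for illustration. It only shows the general pattern in which each stage introduces one new task while all earlier tasks keep being trained alongside it.

```python
from typing import Dict, List


def continual_multitask_schedule(
    tasks: List[str], steps_per_stage: int
) -> List[Dict[str, int]]:
    """Return, for each stage, how many training steps each active task gets.

    Hypothetical helper: stage k introduces tasks[k - 1] and keeps training
    every earlier task, so previously learned tasks are rehearsed rather
    than dropped when a new one arrives.
    """
    schedule = []
    for stage in range(1, len(tasks) + 1):
        active = tasks[:stage]
        # Assumption: split the stage budget evenly across active tasks.
        base, extra = divmod(steps_per_stage, len(active))
        allocation = {task: base for task in active}
        # The newest task absorbs any remainder from the integer division.
        allocation[active[-1]] += extra
        schedule.append(allocation)
    return schedule


if __name__ == "__main__":
    # Placeholder task names, not the tasks used in the paper.
    plan = continual_multitask_schedule(
        ["lexical_task", "syntactic_task", "semantic_task"], steps_per_stage=100
    )
    for stage_number, allocation in enumerate(plan, start=1):
        print(stage_number, allocation)
```

A real training loop would interleave mini-batches from the active tasks within each stage; the sketch only computes the per-task step budget.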

On the GLUE benchmark, where a full score is 100, the average person scores around 87 points. Baidu is now the first team to surpass 90 with its model, ERNIE. When Baidu researchers began developing their own language model, they wanted to build on the masking technique, but they realized they needed to tweak it to accommodate the Chinese language. In English, the word serves as the semantic unit, meaning that a word pulled completely out of context still contains meaning. The same cannot be said for characters in Chinese. While certain characters do have inherent meaning, like fire (火, huǒ), water (水, shuǐ), or wood (木, mù), most do not until they are strung together with others. The character 灵 (líng), for example, can mean either clever (机灵, jīlíng) or soul (灵魂, línghún), depending on the character it is paired with. And the characters in a proper noun like Boston (波士顿, bōshìdùn) or the US (美国, měiguó) do not mean the same thing once split apart. So the researchers trained ERNIE on a new version of masking that hides strings of characters rather than single ones. They also trained it to distinguish between meaningful and random strings so it could mask the right character combinations accordingly. As a result, ERNIE has a greater grasp of how words encode information in Chinese and is much more accurate at predicting the missing pieces. This proves useful for applications like translation and information retrieval from a text document.

Baidu has a new trick for teaching AI the meaning of language | Karen Hao - MIT Technology Review
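The masking of character strings described above can be sketched in a few lines of Python. This is a minimal illustration, not Baidu's code: the function names, the `[MASK]` token string, and the use of a fixed lexicon with greedy longest-match segmentation are assumptions. In particular, the lexicon stands in for ERNIE's learned ability to tell meaningful strings from random ones.

```python
import random
from typing import List, Sequence, Set, Tuple

MASK = "[MASK]"  # assumed placeholder token


def find_spans(
    chars: Sequence[str], lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int]]:
    """Segment characters into (start, end) spans by greedy longest match.

    Characters that do not start a lexicon entry become single-character spans.
    """
    spans = []
    i = 0
    while i < len(chars):
        end = i + 1
        # Try the longest candidate first, down to two characters.
        for length in range(min(max_len, len(chars) - i), 1, -1):
            if "".join(chars[i:i + length]) in lexicon:
                end = i + length
                break
        spans.append((i, end))
        i = end
    return spans


def mask_one_span(
    chars: Sequence[str], lexicon: Set[str], rng: random.Random
) -> Tuple[List[str], Tuple[int, int]]:
    """Hide one whole span, so a multi-character unit is never split."""
    start, end = rng.choice(find_spans(chars, lexicon))
    masked = list(chars)
    masked[start:end] = [MASK] * (end - start)
    return masked, (start, end)


if __name__ == "__main__":
    sentence = list("波士顿在美国")  # "Boston is in the US"
    units = {"波士顿", "美国"}
    print(find_spans(sentence, units))
    print(mask_one_span(sentence, units, random.Random(0)))
```

With this segmentation, 波士顿 and 美国 are each masked as a unit or not at all, whereas character-level masking could hide 士 alone and leave the rest of the name visible.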