Google’s flagship language model


This natural language processing AI is available as open source. With strong French versions in CamemBERT and FlauBERT, its main applications are chatbots and sentiment analysis.

What is Bert?

Bert, or Bidirectional Encoder Representations from Transformers, is a deep learning model oriented toward natural language processing (NLP). This artificial neural network was released by Google as open source, under the Apache license, in the fall of 2018.

With 100 million parameters in its base version and 335 million in its large version, Bert is designed to power chatbots, support sentiment analysis, predict text as it is typed, and even produce summaries.

Bert relies on transformer technology. Like recurrent neural networks (RNNs), transformers are tailored to ingest sequential data, which makes them particularly well suited to natural language processing.

Unlike RNNs, however, transformers do not process data as a continuous stream that respects the order of the words in a sentence. As a result, they can split up the processing and parallelize the computations of the learning phase, which makes them much faster to train.
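To make the contrast concrete, here is a minimal PyTorch sketch (an illustration added here, not from the original article, and assuming PyTorch is installed): an RNN must walk through the sequence step by step, while a transformer encoder layer attends to every position in a single pass.

```python
import torch
import torch.nn as nn

seq_len, d_model = 8, 16
x = torch.randn(1, seq_len, d_model)  # one 8-token sequence of 16-dimensional embeddings

# An RNN consumes the sequence step by step: each hidden state depends on the
# previous one, so the time steps cannot be computed in parallel.
rnn = nn.RNN(input_size=d_model, hidden_size=d_model, batch_first=True)
rnn_out, _ = rnn(x)

# A transformer encoder layer applies self-attention over all 8 positions at once,
# so the per-position computations can be parallelized during training.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
transformer_out = encoder_layer(x)

print(rnn_out.shape, transformer_out.shape)  # both: torch.Size([1, 8, 16])
```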

What is special about Bert?

Bert’s key innovation is its bidirectional transformer training. Unlike unidirectional language models, which process input text sequentially from left to right or from right to left, the encoder implemented by Bert reads the entire sequence of words at once. “This feature allows the model to ingest the global context of a term, simultaneously taking into account the words to its left and to its right,” explains Houssam AlRachid, lead data scientist at Devoteam.
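The effect can be seen with a quick masked-word test. The sketch below (an added illustration, not part of the article) uses the Hugging Face transformers library and Google's publicly released bert-base-uncased checkpoint; because Bert reads both sides of the mask, changing only the words to the right of the mask typically changes the top prediction.

```python
from transformers import pipeline

# Load a pre-trained BERT checkpoint for masked-word prediction.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Same left context, different right context: the top prediction typically
# differs because BERT attends to both sides of the mask at once.
for sentence in [
    "He went to the [MASK] to deposit his paycheck.",
    "He went to the [MASK] to catch some fish.",
]:
    best = fill_mask(sentence)[0]
    print(sentence, "->", best["token_str"], round(best["score"], 3))
```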

Bert is pre-trained on two different but related natural language processing (NLP) tasks: masked language model (MLM) and next sentence prediction (NSP).


“MLM works as follows: the algorithm begins by hiding a word in a sentence, then tries to predict which words have the highest probability of being good replacements for the hidden word, given the context. NSP, for its part, works by predicting whether or not two sentences have a logical (or sequential) connection. For example: Peter is sick. He has the flu. In the second sentence, the pronoun ‘he’ refers to Peter, so there is a logical or causal connection between the two sentences,” explains Houssam AlRachid of Devoteam. It is by combining MLM and NSP that Bert manages to grasp context.
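Both pre-training tasks can be tried with pre-trained checkpoints. As a minimal sketch (added here, not from the article), the next-sentence prediction head of the bert-base-uncased checkpoint can be queried through the Hugging Face transformers library to score the Peter/flu pair from the example above:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Encode the two sentences as one pair: [CLS] sentence A [SEP] sentence B [SEP]
inputs = tokenizer("Peter is sick.", "He has the flu.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "sentence B follows sentence A", index 1 = "it does not".
probs = torch.softmax(logits, dim=-1)[0]
print(f"P(B follows A) = {probs[0].item():.3f}")
```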

How much server power does Bert need?

To leverage Bert in both training and deployment phases, AWS recommends its entry-level G4 instance. With 4 vCPUs and 16 GB of memory, it is priced at $0.526 per hour.

FlauBERT and CamemBERT: the French versions of Bert

CamemBERT and FlauBERT are variations of Bert tailored to support French.

  • CamemBERT is itself based on another version of Bert, RoBERTa (for Robustly Optimized BERT Pretraining Approach), which optimizes Bert’s training process and thus achieves improved performance on several tasks. CamemBERT was trained on a 138 GB French text corpus.
  • FlauBERT was trained on a very large and heterogeneous French corpus of 71 GB of text using the CNRS’s Jean Zay supercomputer. FlauBERT introduces FLUE, an evaluation setup for French NLP systems similar to the well-known GLUE benchmark. Like GPT-2, FlauBERT relies on the BPE tokenization algorithm (byte pair encoding, sketched just below) to compress the training data set and thus accelerate the training phase.
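As a rough illustration of the BPE idea, here is a toy Python sketch (added for illustration; it is not FlauBERT's actual tokenizer): BPE repeatedly merges the most frequent pair of adjacent symbols, so that common character sequences become single vocabulary units.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy byte pair encoding: learn merge rules from a list of words."""
    corpus = [list(w) for w in words]  # start with each word split into characters
    merges = []
    for _ in range(num_merges):
        # Count every pair of adjacent symbols across the corpus.
        pairs = Counter()
        for symbols in corpus:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Replace every occurrence of that pair with a single merged symbol.
        merged_corpus = []
        for symbols in corpus:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_corpus.append(out)
        corpus = merged_corpus
    return merges, corpus

merges, corpus = bpe_merges(["lower", "lowest", "newer", "newest"], num_merges=5)
print(merges)  # learned merge rules, most frequent pair first
print(corpus)  # words re-segmented into learned subword units
```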
BERT, CamemBERT and FlauBERT in brief
  • BERT (2018). License: open source (Apache license). Vendor: Google AI. Number of parameters: 100 million (base model), 335 million (large model). Machine learning mode: bidirectional training that ingests the text to the left and right of a word to determine its context. Use cases: chatbots, sentiment analysis, information retrieval, auto-complete, summarization.
  • CamemBERT (2019). License: open source (MIT license). Vendor: Facebook AI Research and Inria. Number of parameters: 100 million (base model), 335 million (large model). Machine learning mode: French language model based on BERT and RoBERTa, pre-trained on the OSCAR multilingual corpus. Use cases: fill-mask tasks, i.e. hiding certain words in a sentence in order to predict them.
  • FlauBERT (2019). License: open source (Creative Commons Attribution-NonCommercial 4.0). Vendor: CNRS. Number of parameters: 137 million (base model), 373 million (large model). Machine learning mode: French BERT trained on a very large and heterogeneous corpus. Use cases: text classification, paraphrasing, natural language inference, syntactic parsing, word sense disambiguation.

Source: Devoteam

Bert can be downloaded from GitHub, as can its French versions CamemBERT and FlauBERT.
