October 27, 2019
Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture. Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I ...

![[RB] Complex video analysis made easy with Videoflow](https://datascienceathome.com/wp-content/themes/betheme/images/cookies.png)