• Scaling Transformers: Perform at your best

Okay, so it took a bit longer to wrap up than I expected … but here is the second part of my Scaling Transformers series 🤗 If you didn’t catch the first part, in which I talked about the Reformer, you can find it over here. As you may recall, in the previous part I decided not to implement one of the model’s key innovations: the LSH-based approximation of attention. The reason is simple: a couple of months ago a new approach was introduced that clearly outperformed previous work. So, without further ado, let’s jump right in and talk about …

  • Scaling Transformers: Reform your ways

After showing you my Python setup in the previous post, I wanted to showcase it in a second post with a project. However, it got out of hand: I chose to write a post on scaling Transformers, implemented ideas from two NLP papers, and … ended up with a super long post 😅 Since the topic is close to my heart, I decided to start a series of articles on it, focusing on one paper at a time. In any case, let’s dive in!

  • Concise guide to efficient Python tooling

Although I intend to mostly write about AI, for my first post I am simply going to share my current setup for coding in Python. Tooling and “good practices” are a pretty contentious topic, so I don’t expect you, dear reader, to agree with everything I’ll be covering - hell, I might even disagree with it myself in a couple of years’ time. However, I strongly believe that the following makes coding in Python much easier, so if you haven’t given much thought to your tooling, this post might be of interest to you - and even if you have, I’ll welcome any criticism or recommendations 👌