ayan@website $ (show-list 'blogs) _

Following is an exhaustive list (latest first) of the blogs/articles I write in my spare time. I write and publish posts on topics related to my area of research or on any general topic of interest. If you have suggestions, corrections or comments on any of them, feel free to ping me on social media or just email me at

[1] Differentiable Programming: Computing source-code derivatives

If you have been following recent developments in the field of Deep Learning, you might recognize the new buzzword “Differentiable Programming” doing the rounds on social media (promoted by prominent researchers like Yann LeCun and Andrej Karpathy) for a year or two. Differentiable Programming (let’s shorten it to “DiffProg” for the rest of this article) is essentially a system proposed as an alternative to tape-based Backpropagation, which runs a *recorder* (often called a “Tape”) that builds a computation graph *at runtime* and propagates the error signal from the end towards the leaf nodes (typically weights and biases). DiffProg is very different from an *implementation perspective* - it doesn’t really “propagate” anything. It consumes a “program” in the form of *source code* and produces the...

[2] Energy Based Models (EBMs): A comprehensive introduction

We talked extensively about Directed PGMs in an earlier article and also described one particular model following the principles of Variational Inference (VI). There exists another class of models, conveniently represented by *Undirected* Graphical Models, which is practiced relatively less in the research community than modern methods of Deep Learning (DL). They are also characterized as **Energy Based Models (EBMs)** because, as we shall see, they rely on something called *Energy Functions*. In the early days of this Deep Learning *renaissance*, we discovered a few extremely powerful models which helped DL gain momentum. The class of models we are going to discuss has far more theoretical support than modern day Deep Learning, which, as we know, largely relied on...
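
To make “energy function” concrete before diving in: an EBM assigns every configuration a scalar energy and turns it into a probability via the Boltzmann distribution p(x) = exp(-E(x))/Z. A tiny runnable sketch (the energy function here is an illustrative choice, not a model from the article):

```python
import math
from itertools import product

# Toy energy over binary vectors: low energy when neighbouring bits agree.
def energy(x):
    return sum((a - b) ** 2 for a, b in zip(x, x[1:]))

states = list(product([0, 1], repeat=3))

# Boltzmann distribution: p(x) = exp(-E(x)) / Z, where the partition
# function Z sums over ALL states - the part that becomes intractable
# for large models.
Z = sum(math.exp(-energy(s)) for s in states)
probs = {s: math.exp(-energy(s)) / Z for s in states}

print(max(probs, key=probs.get))  # an all-equal bit string (energy 0)
```

For 3 bits we can enumerate the 8 states; real EBMs need clever training precisely because Z cannot be enumerated.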

[3] Introduction to Probabilistic Programming

Welcome to another tutorial about probabilistic models, after a primer on PGMs and VAE. However, I am particularly excited to discuss a topic that doesn’t get as much attention as traditional Deep Learning does. The idea of **Probabilistic Programming** has long been there in the ML literature and got enriched over time. Before it creates confusion, let’s clear it up right now - it’s not really about writing traditional “programs”; rather, it’s about building Probabilistic Graphical Models (PGMs) *equipped with an imperative programming style* (i.e., iteration, branching, recursion, etc.). Just like Automatic Differentiation allowed us to compute derivatives of arbitrary computation graphs (in PyTorch, TensorFlow), black-box methods have been developed to “solve” probabilistic programs. In this post, I will provide...
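
To see what “a program that is also a probabilistic model” means, here is a tiny generative program with branching, “solved” by the bluntest black-box method of all, rejection sampling (a generic toy of my own, not an example from the article):

```python
import random

random.seed(0)

def model():
    """A generative 'program': random choices plus ordinary control flow."""
    cloudy = random.random() < 0.5
    rain_prob = 0.8 if cloudy else 0.1   # branch on a random choice
    rain = random.random() < rain_prob
    return cloudy, rain

# Inference by rejection sampling: estimate P(cloudy | rain) by keeping
# only the runs where the observation (rain) came out True.
accepted = [cloudy for cloudy, rain in (model() for _ in range(100_000)) if rain]
print(sum(accepted) / len(accepted))  # close to 0.8 / (0.8 + 0.1) ≈ 0.889
```

Real probabilistic programming systems replace this brute-force loop with far more efficient black-box inference, but the program/inference separation is the same.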

[4] Artistic patterns from mathematics

Welcome folks! This is an article I was planning to write for a long time. I finally managed to get it done while locked at home due to the global COVID-19 situation. So it’s basically something fun, interesting, attractive and hopefully understandable to most readers. To be specific, my plan is to dive into the world of finding visually appealing patterns in different sections of mathematics. I am going to introduce you to four distinct mathematical concepts by means of which we can generate artistic patterns that are very soothing to human eyes. Most of these use random numbers as the underlying principle of generation. These are not necessarily very useful in real-life problem solving but widely loved by...
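
The simplest possible taste of randomness-driven pattern making (an illustrative sketch of my own, not one of the four techniques from the article): a 2-D random walk, whose visited points already look organic when plotted.

```python
import random

random.seed(42)

def random_walk(steps):
    """Generate the points visited by a lattice random walk."""
    x = y = 0
    points = [(0, 0)]
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

path = random_walk(1000)
print(len(path))  # 1001 points, ready to hand to any plotting library
```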

[5] Neural Ordinary Differential Equation (Neural ODE)

Neural Ordinary Differential Equation (Neural ODE) is a very recent and first-of-its-kind idea that emerged at NeurIPS 2018. The authors, four researchers from the University of Toronto, reformulated the parameterization of deep networks with differential equations, particularly first-order ODEs. The idea evolved from the fact that ResNet, a very popular deep network, possesses quite a bit of similarity with ODEs in its core structure. The paper also offered an efficient algorithm to train such ODE structures as part of a larger computation graph. The architecture is flexible and memory efficient for learning. Being a bit non-trivial from a deep network standpoint, I decided to dedicate this article to explaining it in detail, making it easier for everyone to understand....
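
The ResNet-ODE similarity the paper builds on fits in a few lines: a residual block computes x + f(x), which is exactly one Euler step of dx/dt = f(x) with step size 1. A sketch (the function `f` below is a made-up stand-in for a learned layer):

```python
def f(x):
    # Stand-in for a learned residual layer (illustrative only)
    return -0.5 * x

def resnet_forward(x, num_blocks):
    # Stacked residual blocks: x_{t+1} = x_t + f(x_t)
    for _ in range(num_blocks):
        x = x + f(x)
    return x

def euler_solve(x, t0, t1, steps):
    # Euler integration of dx/dt = f(x)
    h = (t1 - t0) / steps
    for _ in range(steps):
        x = x + h * f(x)
    return x

# With step size 1, N residual blocks compute the same thing as
# N Euler steps on the interval [0, N].
print(resnet_forward(1.0, 4), euler_solve(1.0, 0.0, 4.0, 4))
```

Neural ODEs take the continuous limit of this picture: hand f to a black-box ODE solver instead of stacking discrete blocks.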

[6] Foundation of Variational Autoencoder (VAE)

In the previous article, I started with Directed Probabilistic Graphical Models (PGMs) and a family of algorithms for doing efficient approximate inference on them. Inference problems in Directed PGMs with continuous latent variables are intractable in general and require special attention. The family of algorithms, namely **Variational Inference (VI)**, introduced in the last article is a general formulation for approximating the intractable posterior in such models. **Variational Autoencoder**, famously known as **VAE**, is an algorithm based on the principles of VI and has gained a lot of attention in the past few years for being extremely efficient. With a few more approximations/assumptions, VAE established a clean mathematical formulation which has later been extended by researchers and used in numerous applications....
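
One of the approximations that makes VAE trainable is the reparameterization trick: instead of sampling z ~ N(mu, sigma²) directly, write z = mu + sigma·eps with eps ~ N(0, 1), so gradients can flow through mu and sigma. A minimal numerical sketch (not the article’s derivation):

```python
import random

random.seed(0)

def sample_z(mu, sigma):
    # Noise is drawn independently of the parameters, so mu and sigma
    # only enter through a differentiable transformation.
    eps = random.gauss(0.0, 1.0)
    return mu + sigma * eps

# Sanity check: samples really are distributed as N(mu, sigma^2)
samples = [sample_z(2.0, 0.5) for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # ≈ 2.0 and 0.25
```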

[7] Directed Graphical Models & Variational Inference

Welcome to the first part of a series of tutorials about Directed Probabilistic Graphical Models (PGMs) & Variational methods. Directed PGMs (or Bayesian Networks) are very powerful probabilistic modelling techniques in the machine learning literature and have been studied rigorously by researchers over the years. Variational methods are a family of algorithms that arise in the context of Directed PGMs when solving an intractable integral is involved. Doing inference on a set of latent variables (given a set of observed variables) involves such an intractable integral. Variational Inference (VI) is a specialised form of variational method that handles this situation. This tutorial is NOT for absolute beginners as I assume the reader to have basic-to-moderate knowledge of Random Variables, probability theory and PGMs....
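
The intractable integral in question, and the bound VI uses to sidestep it, can be written out (this is the standard textbook form, not necessarily the article’s exact notation):

```latex
% Marginal likelihood: intractable for general p(x|z) p(z)
p(x) = \int p(x \mid z)\, p(z)\, dz

% VI replaces the true posterior p(z|x) with a tractable q(z) and
% maximizes the Evidence Lower BOund (ELBO) instead:
\log p(x) \;\ge\; \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]
                  \;-\; \mathrm{KL}\big(q(z)\,\|\,p(z)\big)
```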

[8] TeX & family : The Typesetting ecosystem

Welcome to the very first and an introductory article on `typesetting`. If you happen to be from the scientific community, you must have gone through at least one document (maybe in the form of a `.pdf` or a printed paper) which is the result of years of developments in typesetting. If you are from a technical/research background, chances are that you have even *typeset* a document before using something called `LaTeX`. Let me assure you that `LaTeX` is neither the beginning nor the end of the entire typesetting ecosystem. In this article, I will provide a brief introduction to what typesetting is and what modern tools are available for use. Specifically, the most...
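
For readers who have never seen typeset source, a minimal `LaTeX` document looks like this (a generic hello-world of my own, not an excerpt from the article):

```latex
\documentclass{article}

\begin{document}
Hello, typesetting! An inline formula: $e^{i\pi} + 1 = 0$.
\end{document}
```

Compiling it (e.g. with `pdflatex`) produces a fully typeset PDF; the source itself stays plain text.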

[9] Intermediate C++ : Speeding up iostreams in C++

We often come across situations where we need to process large files, and clearly interactive input-output is not helpful in all of them. It is common practice to use **cin** and **cout** for input-output in C++ because of their flexibility and ease of use. But there is quite a big problem with iostreams: by default, they are much slower than the standard IO functions of other languages. In this tutorial, I will quantitatively demonstrate the slowness of iostreams in C++, explain some of the reasons for it and share some tips to speed it up.

[10] Intermediate C++ : Static and Dynamic linking

C++ is a general-purpose, multi-paradigm programming language designed and developed by Bjarne Stroustrup which, along with C, forms the backbone of the majority of the programming industry today. A very important concept to be grasped by beginner programmers is the idea of “linking”. Linking basically refers to the process of bundling library code into an archive and using it later when necessary. It turns out to be an idea that is used extensively in production. The following article aims at presenting a broad view of “linking”.

[11] Advanced Python: Bytecodes and the Python Virtual Machine (VM) - Part I

Over the years, **Python** has become one of the major general-purpose programming languages that industry and academia care about. But even with a vast community around the language, very few people are aware of how Python is actually executed on a computer system. Some have only a vague idea of how Python is executed, partly because it’s totally possible to know nothing about it and still be a successful Python programmer. They believe that unlike C/C++, Python is “interpreted” instead of “compiled”, i.e., programs are *executed one statement at a time* rather than being converted down to some kind of machine code. **That is not entirely correct**. This post is targeted towards programmers with a...
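
You can see the compilation step for yourself with the standard-library `dis` module: CPython first compiles a function body to bytecode, and “interpreted” really means that this bytecode is executed by the Python VM.

```python
import dis

def add_one(x):
    return x + 1

# List the bytecode instructions the VM will execute.
# (Exact opcode names vary across CPython versions.)
ops = [ins.opname for ins in dis.get_instructions(add_one)]
print(ops)
```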

[12] Deep Learning at scale: The "torch.distributed" API

In the last post, we went through the basics of `Distributed computing` and `MPI`, and also demonstrated the steps of setting up a distributed environment. This post will focus on the practical usage of distributed computing strategies to accelerate the training of Deep learning (DL) models. To be specific, we will focus on one particular distributed training algorithm (namely `Synchronous SGD`) and implement it using `PyTorch`’s distributed computing API (i.e., `torch.distributed`). I will use 4 nodes for demonstration purposes, but it can easily be *scaled up* with minor changes. This tutorial assumes the reader has working knowledge of Deep learning model implementation as I won’t go over typical concepts...
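
The core of `Synchronous SGD` is a gradient all-reduce: every worker computes a gradient on its own data shard, the gradients are averaged, and all workers apply the identical averaged update. A dependency-free sketch with simulated workers (in `torch.distributed` the averaging is done by `all_reduce`; everything else below is made up for illustration):

```python
# Fit w in y = w * x by least squares, split across 4 "workers".
def local_gradient(w, shard):
    # d/dw of mean (w*x - y)^2 over this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    # Stand-in for torch.distributed.all_reduce + divide by world size
    return sum(values) / len(values)

data = [(x, 3.0 * x) for x in range(1, 9)]          # ground truth w = 3
shards = [data[i::4] for i in range(4)]             # one shard per worker

w, lr = 0.0, 0.01
for _ in range(200):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel
    w -= lr * all_reduce_mean(grads)                # identical update everywhere

print(round(w, 3))  # converges to 3.0
```

Because every worker sees the same averaged gradient, all replicas stay bit-identical after each step - the defining property of the synchronous variant.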

[13] Deep Learning at scale: Setting up distributed cluster

Welcome to an in-depth tutorial on **Distributed Deep learning** with some standard tools and frameworks available to everyone. From the very beginning of my journey with DL as an undergrad student, I realized that it’s not as easy as it seems to achieve what the mainstream industry has achieved with Deep learning, even though I was quite confident about my knowledge of “DL algorithms”. Clearly, algorithms weren’t the only driving force the industry survives on - it’s also the *scale* at which they execute their well-planned implementations on high-end hardware, which was next to impossible for me to get access to. So it’s extremely important to understand the concept of *scale* and the consequences that come...

[14] Intermediate Python: Generators, Decorators and Context managers - Part II

In my previous post, I laid out the plan for a couple of tutorials in the same series about intermediate-level Python. This series of posts is intended to introduce some of the intermediate concepts to programmers who are already familiar with the basics of Python. Specifically, I planned to elaborately describe `Generator`s, `Decorator`s and `Context Manager`s, among which I have already dedicated a full-fledged post to the first one - `Generator`s. This one will be all about `Decorator`s and some of their lesser-known features/applications. Without further ado, let’s dive into it.
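
As a quick refresher before the post itself, a decorator is just a function that wraps another function (a generic illustration, not necessarily one of the article’s examples):

```python
import functools

def count_calls(func):
    """Decorator: count how many times the wrapped function is called."""
    @functools.wraps(func)              # preserve name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                            # same as: greet = count_calls(greet)
def greet(name):
    return f"Hello, {name}!"

greet("Ada")
greet("Alan")
print(greet.calls)  # 2
```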

[15] Intermediate Python: Generators, Decorators and Context managers - Part I

Welcome to the series of *Intermediate* Python tutorials. Before we begin, let me make it very clear that this tutorial is **NOT** for absolute beginners. This is for `Python` programmers who are familiar with the standard concepts and syntax of Python. In this three-part tutorial, we will specifically look at three features of Python, namely `Generators`, `Decorators` and `Context Managers`, which, in my opinion, are not heavily used by the average Python programmer. In my experience, these features are less known to programmers whose primary purpose for using Python is not to focus on the language itself but just to get their own applications/algorithms working. This leads to very...
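
To fix ideas, here is the first of the three features in its simplest form - a generator produces values lazily, one at a time, instead of building a whole list in memory (a generic illustration of my own):

```python
def running_total(numbers):
    """Lazily yield the cumulative sum of an iterable."""
    total = 0
    for n in numbers:
        total += n
        yield total          # execution pauses here between values

# Nothing is computed until a value is requested, so this works even
# on very large (or infinite) input streams.
print(list(running_total([1, 2, 3, 4])))  # [1, 3, 6, 10]
```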

[16] CapsNet architecture for MNIST

I recently wrote an article explaining the intuitive idea of `capsule`s proposed by Geoffrey Hinton and colleagues, which created a buzz in the deep learning community. In that article, I explained in simple terms the motivation behind the idea of `capsule`s and its (minimal) mathematical formalism. It is highly recommended that you read that article as a prerequisite to this one. In this article, I would like to explain the specific `CapsNet` architecture proposed in the same paper, which managed to achieve state-of-the-art performance on MNIST digit classification.
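
A small taste of what makes `CapsNet` different: capsule outputs are vectors, and the paper’s “squashing” non-linearity scales a vector’s length into [0, 1) while preserving its direction, so that length can act as the probability that an entity exists. A plain-Python sketch of that one formula:

```python
import math

def squash(v):
    """CapsNet squashing: v -> (|v|^2 / (1 + |v|^2)) * (v / |v|)."""
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in v]

# Long vectors approach unit length; short ones shrink towards zero.
print([round(x, 3) for x in squash([3.0, 4.0])])   # length ≈ 0.962
print([round(x, 3) for x in squash([0.3, 0.4])])   # length = 0.2
```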

[17] An intuitive understanding of Capsules

Recently, Geoffrey Hinton, the godfather of deep learning, argued that one of the key principles of the ConvNet model is flawed, i.e., ConvNets don’t work the way the human brain does. Hinton also proposed an alternative idea (namely `capsules`), which he thinks is a better model of the human brain. In this post, I will try to present an intuitive explanation of this new proposal by Hinton and colleagues.