Generative models are of immense interest in fundamental research due to their ability to model the "all important" data distribution. A large class of generative models falls into the category of Probabilistic Graphical Models, or PGMs. PGMs (e.g. the VAE) usually train a parametric distribution (encoded in the form of a graph structure) by maximizing the log-likelihood, and sample from it via ancestral sampling. GANs, another popular class of generative models, take a different approach to both training and sampling. Both classes of models, however, suffer from several drawbacks, e.g. difficult log-likelihood computation, unstable training, etc. Recently, efforts have been made to craft generative models that inherit the good qualities of the existing ones. One of the rising classes of generative models is called "Score based Models". Rather than explicitly maximizing the log-likelihood of a parametric density, they learn a map for navigating the data space. Once the map is learned, sampling is done by Langevin Dynamics, an MCMC-based method that navigates the data space using the map and lands on regions with high probability under the empirical data distribution (i.e. regions of real data). In this article, we describe the fundamentals of Score based models along with a few of their variants.
The next part of this two-part blog is Diffusion Probabilistic Models
Traditional maximum-likelihood estimation (MLE)
Traditional log-likelihood based approaches define a parametric generative process in terms of a graphical model and maximize the joint density of the observed data under the model

$$\theta^* = \arg\max_{\theta} \; \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} \big[ \log p_{\theta}(\mathbf{x}) \big] \tag{1}$$
The joint density is often quite complex and sometimes intractable. For intractable cases, we maximize a surrogate objective based on, e.g., Variational Inference. In practice, we achieve the above by repeatedly moving the parameters in the direction in which the expected log-likelihood increases the most

$$\theta_{t+1} = \theta_t + \eta \, \nabla_{\theta} \, \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} \big[ \log p_{\theta}(\mathbf{x}) \big] \Big|_{\theta = \theta_t} \tag{2}$$
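To make the training loop in (1) and (2) concrete, here is a minimal sketch of MLE by gradient ascent in PyTorch. It assumes a toy tractable density (a diagonal Gaussian) purely for illustration, and `sample_minibatch()` is a hypothetical helper returning a batch of real data:

```python
import math
import torch

# Toy tractable density p_theta(x): a diagonal Gaussian (illustration only).
mu = torch.zeros(2, requires_grad=True)          # learnable mean
log_sigma = torch.zeros(2, requires_grad=True)   # learnable log std-dev
optimizer = torch.optim.SGD([mu, log_sigma], lr=1e-2)

def log_prob(x):
    # log N(x; mu, diag(sigma^2)), summed over data dimensions
    sigma = log_sigma.exp()
    return (-0.5 * ((x - mu) / sigma) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum(dim=-1)

for step in range(1000):
    x = sample_minibatch()              # hypothetical helper: (B, 2) batch of real data
    loss = -log_prob(x).mean()          # maximizing E[log p_theta(x)] = minimizing its negative
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                    # the gradient-ascent step of Eq. 2
```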
With a trained set of parameters $\theta^*$, we generate new samples from $p_{\theta^*}(\mathbf{x})$, typically by ancestral sampling over the graphical model.
There is one annoying requirement in both (1) and (2): the parametric model $p_{\theta}(\mathbf{x})$ must be a valid (i.e. normalized) probability density, which severely restricts the family of functions we are allowed to use.
Score based models (SBMs)
A new and emerging class of generative models, namely "Score based Models (SBMs)", entirely sidesteps log-likelihood modelling and approaches the problem in a different way. Specifically, SBMs attempt to learn a navigation map over the data space that guides any point toward regions of high probability under the data distribution $p(\mathbf{x})$. This map is the so-called score function

$$s(\mathbf{x}) \triangleq \nabla_{\mathbf{x}} \log p(\mathbf{x}) \tag{3}$$
Please be careful and notice that the quantity on the right hand side of (3), i.e. $\nabla_{\mathbf{x}} \log p(\mathbf{x})$, is a gradient with respect to the data point $\mathbf{x}$ itself, not with respect to any parameters.
Given any point in the data space, the score tells us in which direction to move if we would like to see a region with higher likelihood. Unsurprisingly, if we take a small step in the direction suggested by the score, we land on a point with (slightly) higher likelihood than the one we started from.
As simple as it might sound, we construct a regression problem in which a parametric score network $s_{\theta}(\mathbf{x})$ is trained to match the true score

$$J(\theta) = \mathbb{E}_{\mathbf{x} \sim p(\mathbf{x})} \Big[ \tfrac{1}{2} \big\lVert s_{\theta}(\mathbf{x}) - \nabla_{\mathbf{x}} \log p(\mathbf{x}) \big\rVert^2 \Big]$$
This is known as Score Matching. Once trained, we simply keep moving in the direction suggested by $s_{\theta}(\mathbf{x})$, injecting a small amount of Gaussian noise at each step so that we explore the distribution rather than collapse onto a single mode. This sampling procedure is known as Langevin Dynamics

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \frac{\delta}{2}\, s_{\theta}(\mathbf{x}_t) + \sqrt{\delta}\, \mathbf{z}_t, \qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{4}$$
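As a small illustration (not the original implementation), here is what a Langevin Dynamics sampler might look like in PyTorch; `score_net` stands for any trained score network $s_{\theta}(\mathbf{x})$:

```python
import torch

@torch.no_grad()
def langevin_dynamics(score_net, x, step_size=1e-4, n_steps=1000):
    """Iterate Eq. 4: follow the learned score, plus a bit of Gaussian noise."""
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + 0.5 * step_size * score_net(x) + (step_size ** 0.5) * z
    return x

# Usage sketch: start from pure noise and let the score field pull the
# samples towards high-density regions of the data distribution.
# samples = langevin_dynamics(score_net, torch.randn(64, 2))
```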
Looks all good. But there are two problems with optimizing the objective above.
Problem 1: The very obvious one; we don't have access to the true scores $\nabla_{\mathbf{x}} \log p(\mathbf{x})$. No one knows the exact form of $p(\mathbf{x})$.

Problem 2: The not-so-obvious one; the expectation $\mathbb{E}_{\mathbf{x} \sim p(\mathbf{x})}[\,\cdot\,]$ is a bit problematic. Ideally, the objective should encourage learning the scores all over the data space (i.e. for every $\mathbf{x}$). But this isn't possible with an expectation taken only over the data distribution: regions of the data space that are unlikely under $p(\mathbf{x})$ do not receive enough supervisory signal.
Implicit Score Matching (ISM)
Aapo Hyvärinen, 2005 solved the first problem quite elegantly and proposed the Implicit Score Matching (ISM) objective

$$J_{ISM}(\theta) = \mathbb{E}_{\mathbf{x} \sim p(\mathbf{x})} \left[ \frac{1}{2} \big\lVert s_{\theta}(\mathbf{x}) \big\rVert^2 + \mathrm{tr}\big( \nabla_{\mathbf{x}} s_{\theta}(\mathbf{x}) \big) \right]$$
The reason it's known to be "remarkable" is the fact that

$$J_{ISM}(\theta) = J(\theta) + C$$
where $C$ is a constant that does not depend on $\theta$, and $\nabla_{\mathbf{x}} s_{\theta}(\mathbf{x})$ denotes the Jacobian of the learned score with respect to its input. In other words, minimizing $J_{ISM}$ is equivalent to explicit score matching without ever requiring the true score $\nabla_{\mathbf{x}} \log p(\mathbf{x})$. The price we pay is the term $\mathrm{tr}\big(\nabla_{\mathbf{x}} s_{\theta}(\mathbf{x})\big)$: computing this trace requires roughly one backward pass per data dimension, which scales poorly to high-dimensional data such as images.
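A sketch of the ISM objective in PyTorch is shown below (an illustrative sketch with my own function names, not a reference implementation); the loop over data dimensions makes the Jacobian-trace cost explicit:

```python
import torch

def ism_loss(score_net, x):
    """Implicit Score Matching: E[ 0.5 * ||s_theta(x)||^2 + tr(ds_theta(x)/dx) ]."""
    x = x.requires_grad_(True)
    s = score_net(x)                                  # (B, D) predicted scores
    norm_term = 0.5 * (s ** 2).sum(dim=-1)

    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):                       # one backward pass per dimension
        grad_i = torch.autograd.grad(s[:, i].sum(), x, create_graph=True)[0]
        trace = trace + grad_i[:, i]                  # accumulate ds_i/dx_i

    return (norm_term + trace).mean()
```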
Denoising Score Matching (DSM)
In a different approach, Pascal Vincent, 2011 investigated the "unsuspected link" between Score Matching and Denoising Autoencoders. This work led to a very efficient and practical objective that is used even in cutting-edge Score based models. Termed "Denoising Score Matching (DSM)", this approach mitigates both problems 1 and 2 described above, and does so quite elegantly.
To get rid of problem 2, DSM proposes to simply use a noise-perturbed version of the dataset, i.e. replace $p(\mathbf{x})$ with the smoothed distribution $q_{\sigma}(\tilde{\mathbf{x}})$ obtained by corrupting each data point with Gaussian noise

$$q_{\sigma}(\tilde{\mathbf{x}} \mid \mathbf{x}) = \mathcal{N}\big(\tilde{\mathbf{x}};\, \mathbf{x},\, \sigma^2 \mathbf{I}\big), \qquad \tilde{\mathbf{x}} = \mathbf{x} + \sigma \boldsymbol{\epsilon}, \;\; \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
The above equation basically tells us to create a perturbed/corrupted version of the original dataset by adding simple isotropic Gaussian noise whose strength is controlled by $\sigma$.
With a crucial proof shown in the appendix of the original paper, we arrive at an equivalent version of the score matching objective in which the intractable true score is replaced by the perfectly tractable score of the corruption kernel

$$J_{DSM}(\theta) = \mathbb{E}_{\mathbf{x} \sim p(\mathbf{x}),\; \tilde{\mathbf{x}} \sim q_{\sigma}(\tilde{\mathbf{x}} \mid \mathbf{x})} \Big[ \tfrac{1}{2} \big\lVert s_{\theta}(\tilde{\mathbf{x}}) - \nabla_{\tilde{\mathbf{x}}} \log q_{\sigma}(\tilde{\mathbf{x}} \mid \mathbf{x}) \big\rVert^2 \Big], \qquad \nabla_{\tilde{\mathbf{x}}} \log q_{\sigma}(\tilde{\mathbf{x}} \mid \mathbf{x}) = -\frac{\tilde{\mathbf{x}} - \mathbf{x}}{\sigma^2} \tag{5}$$
Note that we now need original-corrupt data pairs $(\mathbf{x}, \tilde{\mathbf{x}})$ for training, but these are trivially cheap to generate: just sample noise and add it to the data.
The score function we learn this way isn't actually for our original data distribution $p(\mathbf{x})$, but for its noise-perturbed version $q_{\sigma}(\tilde{\mathbf{x}})$. As long as the noise level $\sigma$ is small enough, however, $q_{\sigma}$ stays close to $p$ and the learned score is a good approximation.
Moreover, Eq. 5 has a very intuitive interpretation, and this is where Pascal Vincent, 2011 uncovered the link between DSM and Denoising Autoencoders. A closer look at Eq. 5 reveals that the regression target $-\frac{\tilde{\mathbf{x}} - \mathbf{x}}{\sigma^2}$ points from the corrupted sample back towards its clean counterpart; the score network is therefore effectively learning to denoise its input, exactly what a Denoising Autoencoder does.
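Putting Eq. 5 into code is almost a one-liner. Below is a minimal sketch of the DSM loss for a single noise level $\sigma$ (names of my own choosing, `score_net` being any score network):

```python
import torch

def dsm_loss(score_net, x, sigma=0.1):
    """Denoising Score Matching (Eq. 5): regress the score network at the corrupted
    point onto the score of the corruption kernel, -(x_tilde - x) / sigma^2."""
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise              # corrupted sample
    target = -(x_tilde - x) / sigma ** 2     # equals -noise / sigma, points back towards x
    pred = score_net(x_tilde)
    return 0.5 * ((pred - target) ** 2).sum(dim=-1).mean()
```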
Noise Conditioned Score Network (NCSN)
The idea presented in Song et al., 2020 is to have a sequence of noise levels $\sigma_1 > \sigma_2 > \cdots > \sigma_L$ instead of a single one, together with a single score network $s_{\theta}(\mathbf{x}, \sigma)$ that is shared across, i.e. conditioned on, the noise level (hence the name Noise Conditioned Score Network).
We finally learn the shared score function from the ensemble of noise levels by minimizing a weighted sum of DSM objectives (Eq. 5), one per noise level

$$J_{NCSN}(\theta) = \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\; \mathbb{E}_{\mathbf{x},\; \tilde{\mathbf{x}} \sim q_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x})} \left[ \frac{1}{2} \left\lVert s_{\theta}(\tilde{\mathbf{x}}, \sigma_i) + \frac{\tilde{\mathbf{x}} - \mathbf{x}}{\sigma_i^2} \right\rVert^2 \right]$$
where $\lambda(\sigma_i) > 0$ is a per-level weighting coefficient. Song et al., 2020 choose $\lambda(\sigma_i) = \sigma_i^2$, which keeps the magnitude of every term in the sum roughly comparable.
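A minimal sketch of this training objective might look like the following (an illustration under my own naming; in practice one noise level is sampled per example rather than summing over all $L$ levels, and `score_net(x, sigma)` is assumed to be conditioned on the noise level):

```python
import torch

def ncsn_loss(score_net, x, sigmas):
    """Noise-conditional DSM loss with lambda(sigma) = sigma^2 weighting.
    `sigmas` is a 1-D tensor sigma_1 > ... > sigma_L."""
    idx = torch.randint(0, len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, 1)                     # one noise level per example

    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise
    target = -noise / sigma                             # score of the corruption kernel
    pred = score_net(x_tilde, sigma)

    per_example = 0.5 * ((pred - target) ** 2).sum(dim=-1)
    return (sigma.squeeze(-1) ** 2 * per_example).mean()   # lambda(sigma) = sigma^2
```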
In order to sample, Song et al., 2020 proposed a modified version of Langevin Dynamics termed "Annealed Langevin Dynamics". The idea is simple: we start from a random sample and run Langevin Dynamics (Eq. 4) using $s_{\theta}(\mathbf{x}, \sigma_1)$, the score at the largest noise level, for a fixed number of steps; we then switch to the next (smaller) noise level and continue from where we left off, progressively annealing both the noise level and the step size until we reach the smallest level $\sigma_L$.
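Below is a sketch of Annealed Langevin Dynamics (following the step-size schedule reported by Song et al., 2020, with my own function names); `score_net(x, sigma)` is the trained noise-conditioned score network:

```python
import torch

@torch.no_grad()
def annealed_langevin_dynamics(score_net, x, sigmas, eps=2e-5, n_steps_each=100):
    """Run Eq. 4 at each noise level in turn, from the largest sigma to the smallest,
    re-using the end point of one level as the starting point of the next."""
    for sigma in sigmas:                                # sigmas sorted large -> small
        step_size = eps * (sigma / sigmas[-1]) ** 2     # anneal step size with the noise
        for _ in range(n_steps_each):
            z = torch.randn_like(x)
            x = x + 0.5 * step_size * score_net(x, sigma) + (step_size ** 0.5) * z
    return x
```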
Connection to Stochastic Differential Equations
Recently, Song et al., 2021 have established a surprising connection between Score based models, Diffusion Models and Stochastic Differential Equations (SDEs). Diffusion Models are another rising class of generative models, fundamentally similar to score based models but with some notable differences. Since we did not discuss Diffusion Models in this article, we cannot fully explain the connection and how to properly utilize it. However, I would like to show a brief preview of where exactly SDEs show up within the material discussed in this article.
Stochastic Differential Equations (SDEs) describe stochastic dynamical systems whose state $\mathbf{x}(t)$ evolves in continuous time according to

$$d\mathbf{x} = f(\mathbf{x}, t)\, dt + g(t)\, d\mathbf{w} \tag{6}$$
where $f(\mathbf{x}, t)$ is the deterministic drift term, $g(t)$ is the diffusion coefficient and $\mathbf{w}(t)$ is a standard Wiener process (Brownian motion) that injects noise into the dynamics.
To find a connection now, it is only a matter of comparing Eq. 6 with Eq. 4. The sampling process defined by Langevin Dynamics is essentially a time-discretized SDE with drift $f(\mathbf{x}, t) = \frac{1}{2} s_{\theta}(\mathbf{x})$, constant diffusion $g(t) = 1$ and time step $\delta$; the noise term $\sqrt{\delta}\, \mathbf{z}_t$ is exactly a discretized increment of the Wiener process.
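To see the correspondence explicitly, here is a generic Euler-Maruyama discretization of Eq. 6 (a sketch of my own, not from the paper); plugging in $f(\mathbf{x}, t) = \frac{1}{2} s_{\theta}(\mathbf{x})$ and $g(t) = 1$ recovers the Langevin update of Eq. 4 with step size $dt$:

```python
import torch

def euler_maruyama(drift, diffusion, x, n_steps=1000, dt=1e-4):
    """Discretize dx = f(x, t) dt + g(t) dw with a fixed step size dt."""
    t = 0.0
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + drift(x, t) * dt + diffusion(t) * (dt ** 0.5) * z
        t += dt
    return x

# Recovering Langevin Dynamics (Eq. 4) as a special case:
# samples = euler_maruyama(lambda x, t: 0.5 * score_net(x), lambda t: 1.0,
#                          torch.randn(64, 2))
```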
In a follow-up article, An introduction to Diffusion Probabilistic Models, we explore Diffusion Models along with their connection to SDEs.
Citation
@online{das2021,
author = {Das, Ayan},
title = {Generative Modelling with {Score} {Functions}},
date = {2021-07-14},
url = {https://ayandas.me/blogs/2021-07-14-generative-model-score-function.html},
langid = {en}
}