Robert Eisele
Systems Engineer, Architect and DBA

The log-sum-exp trick in Machine Learning

It is commonly not taught explicitly, but in machine learning you quite often come across problems which contain the following quantity and knowing the trick can help a lot. Let's say we have an \(n\)-dimensional vector and want to calculate:

\[y = \log\sum\limits_{i=1}^n e^{x_i}\]

For example in filtering problems for which the posterior is calculated recursively:

\[p(h_t\vert v_{1:t}) \equiv \alpha(h_t) = p(v_t\vert h_t)\sum_{h_t}\alpha(h_{t-1})p(h_t\vert h_{t-1})\]

As \(\alpha\) can be quite small, it's a common method to solve the problem in log-space:

\[\log \alpha(h_t) = \log p(v_t\vert h_t)+ \log \sum_{h_t}\exp(\log \alpha(h_{t-1}) + \log p(h_t\vert h_{t-1}))\]

Another example is a multinomial distribution which you want to parameterize with softmax, like e.g. logistic regression with more than 2 onordered categories. If you now want to calculate the log-likelihood you get the quantity due to the normalization constant.

Both problems have in common, that if you try to calculate it naively, you quite quickly will encounter underflows or overflows, depending on the scale of \(x_i\). Even if you work in log-space, the limited precision of computers is not enough and the result will be INF or -INF. So what can we do?

We can show, that the following equation holds:

\[\log\sum\limits_{i=1}^n e^{x_i} = a + \log\sum\limits_{i=1}^n e^{x_i-a}\]

For an arbitrary \(a\). This means, you can shift the center of the exponential sum. A typical value is setting \(a\) to the maximum, which forces the greatest value to be zero and even if the other values would underflow, you get a reasonable result:

\[a=\underset{i}{max} x_i\]

Proof of log-sum-exp trick

\[\begin{array}{rll}& y &= \log\sum\limits_{i=1}^n e^{x_i} \\\Rightarrow & e^y &= \sum\limits_{i=1}^n e^{x_i} \\\Rightarrow & e^{-a} e^y &= e^{-a} \sum\limits_{i=1}^n e^{x_i} \\\Rightarrow & e ^{y-a} &= \sum\limits_{i=1}^n e^{-a} e^{x_i} \\\Rightarrow & e ^{y-a} &= \sum\limits_{i=1}^n e^{-a} e^{x_i} \\\Rightarrow & y-a &= \log\sum\limits_{i=1}^n e^{x_i-a} \\\Rightarrow & y &= a+\log\sum\limits_{i=1}^n e^{x_i-a}\end{array}\]

You might also be interested in the following

Leave a comment

25 / 5