# Optimized trend analysis

When you look at your website's visitor numbers, only the data from previous days is complete enough for a serious analysis. That is logical, but it would also be useful to know how much traffic you can expect today. A simple average would give you an expected value, but it carries little real significance. A better approach incorporates all available information.

For example, you could determine the general direction of the trend with a **linear regression**. If you fit a straight line to this data, you could theoretically predict even longer periods. However, long-term predictions of internet traffic quickly lose touch with reality. By the way, if your crystal ball has a USB port, this is the point where you can feed in that additional information. So we stay with the present day: today's traffic is determined by the trend of the previous days plus possible outside influences that give us a traffic high or low for the day.
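The regression idea mentioned above can be sketched as an ordinary least-squares fit over daily visitor counts. The helper name `linearTrend` and the sample numbers are made up for illustration and are not part of the original snippets.

```javascript
// Hypothetical helper: least-squares line through (0, data[0]), (1, data[1]), ...
function linearTrend(data) {
  var n = data.length;
  var sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (var i = 0; i < n; i++) {
    sumX += i;
    sumY += data[i];
    sumXY += i * data[i];
    sumXX += i * i;
  }
  var slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  var intercept = (sumY - slope * sumX) / n;
  return { slope: slope, intercept: intercept };
}

// Made-up daily visitor counts, oldest first
var visits = [100, 110, 120, 130, 140];
var fit = linearTrend(visits);

// Extrapolate the line one day ahead (index = visits.length)
var nextDay = fit.intercept + fit.slope * visits.length; // 100 + 10 * 5 = 150
```

A straight line like this only captures the overall direction; it knows nothing about weekly rhythms or one-off events, which is why the article moves on to a different method.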

A naive implementation would, as already mentioned, simply use an arithmetic average, which has the usual problem of averages: important details get smoothed away. You could combine the average with the linear regression and try to interpolate a little, but I use a better solution that delivers a reasonably accurate prediction from a small sample of periodic data points: exponential smoothing. A simple form of exponential smoothing is defined by the following recursion:

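$$
\begin{aligned}
r_1 &= \alpha \cdot d_1 + (1 - \alpha) \cdot r_0 \\
r_2 &= \alpha \cdot d_2 + (1 - \alpha) \cdot r_1 \\
r_3 &= \alpha \cdot d_3 + (1 - \alpha) \cdot r_2 \\
&\;\;\vdots
\end{aligned}
$$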

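Here the d values are the data points and the r values are the smoothed results. α is a smoothing factor between 0 and 1 that shifts the weight toward the beginning or the end of the series: the closer α is to 1, the more the most recent values count. An implementation of the function might look like this in JavaScript: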

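    function smooth(data, alpha) {
      // Use the first data point as the initial smoothed value
      var res = data[0];
      for (var i = 0; i < data.length; i++) {
        // Blend the previous result with the current data point
        res = res * (1 - alpha) + data[i] * alpha;
      }
      return res;
    }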
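To get a feeling for the influence of α, here is a small sketch that runs the same recursion over a made-up series with two different factors. The series and the variable names `sluggish` and `reactive` are assumptions for illustration only.

```javascript
// Same recursion as in the article, restated so the snippet runs on its own.
// The loop starts at 1 because blending data[0] with itself changes nothing.
function smooth(data, alpha) {
  var res = data[0];
  for (var i = 1; i < data.length; i++) {
    res = res * (1 - alpha) + data[i] * alpha;
  }
  return res;
}

// Made-up series: flat at 100, then a jump to 200 on the last day
var series = [100, 100, 100, 200];

var sluggish = smooth(series, 0.25); // 100 * 0.75 + 200 * 0.25 = 125
var reactive = smooth(series, 0.75); // 100 * 0.25 + 200 * 0.75 = 175
```

A small α barely reacts to the jump on the last day, while a large α follows it closely.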

Putting the weight in the upper third (for example α = 2/3), you can make quite a good prediction based on the previous days. However, the dependence on the time of day is still missing. One idea is to use the intraday volume: it tells me what proportion of the day's traffic I can expect by, say, 3 p.m. I created a little visualization where you can play with the expectation by clicking into the gray square below (the x-axis is the time and the y-axis the actual traffic at that point in time). The trend is calculated from 30 random data points:

As you can see, a simple ratio equation is behind the whole thing: in the morning the calculated trend outweighs the measured value, and over the course of the day the result approaches the real traffic volume more and more. However, since the day's total traffic is not yet known while the day is still running, we project the anticipated traffic:

cur + cur * (1 - x) / x

Here, *cur* is the actual value at the current time and *x* is the fraction of the day's traffic that has already passed. This formula can be simplified to:

cur / x
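For readers who want the intermediate step, the simplification follows from putting both terms over the common denominator *x*. The numbers in the example sentence are made up.

$$
\mathit{cur} + \mathit{cur} \cdot \frac{1 - x}{x}
= \frac{\mathit{cur} \cdot x + \mathit{cur} \cdot (1 - x)}{x}
= \frac{\mathit{cur}}{x}
$$

For instance, with 300 visitors counted so far and *x* = 0.25, the projection for the whole day is 300 / 0.25 = 1200 visitors.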

As mentioned, *x* is the share of the intraday traffic that is already behind us. It is calculated as the ratio *sup / sum*, namely the partial sum divided by the overall intraday sum. Combining this with the already calculated trend yields the following equation:

trend + (sup / sum) * ((cur / (sup / sum)) - trend)

This in turn can be simplified to:

cur - sup * trend / sum + trend
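As a quick sanity check, the sketch below evaluates the long and the simplified form with the same inputs. The function names `blendLong` and `blendShort`, the intraday profile, and all numbers are made up for illustration.

```javascript
// Long form from the text: trend + x * (cur / x - trend), with x = sup / sum
function blendLong(cur, sup, sum, trend) {
  var x = sup / sum;
  return trend + x * (cur / x - trend);
}

// Simplified form: cur - sup * trend / sum + trend
function blendShort(cur, sup, sum, trend) {
  return cur - sup * trend / sum + trend;
}

// Made-up intraday profile: 10 units per hour at night (hours 0-7),
// 40 units per hour from 8 to 23
var weight = [];
for (var h = 0; h < 24; h++) weight.push(h < 8 ? 10 : 40);

// Partial sum up to 3 p.m. (sup) and total over the whole day (sum)
var hour = 15, sup = 0, sum = 0;
for (var i = 0; i < 24; i++) {
  if (i < hour) sup += weight[i];
  sum += weight[i];
}
// sup = 360 and sum = 720, so half of the day's traffic is expected to be behind us

// 300 visits counted so far, smoothed trend of 1000 visits per day
var a = blendLong(300, sup, sum, 1000);  // 1000 + 0.5 * (600 - 1000) = 800
var b = blendShort(300, sup, sum, 1000); // 300 - 500 + 1000 = 800
```

Besides being shorter, the simplified form stays defined at the very start of the day, when *sup* is 0 and the long form would divide by zero.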

In the end we arrive at the following JavaScript function, which calculates the trend quite accurately for every hour of the running day. Of course, if you have intraday information at minute resolution, you can improve the result further.

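    function trend(data, weight) {
      var current = data.pop(),        // today's traffic so far (removed from data)
          trend = smooth(data, 2 / 3), // smoothed trend of the previous days
          hour = new Date().getHours();

      // sup: weight of the hours already passed, sum: weight of the whole day
      for (var sup = 0, sum = 0, i = 24; i--; ) {
        sup += weight[i] * (i < hour);
        sum += weight[i];
      }
      return current - sup / sum * trend + trend;
    }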
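To try the function without depending on the clock, here is a sketch of a variant that takes the hour as a parameter and does not modify the input array. The name `trendAt`, the extra `hour` argument, and all sample data are assumptions for illustration; they are not part of the original snippets.

```javascript
// Exponential smoothing as in the article, restated so the snippet runs on its own
function smooth(data, alpha) {
  var res = data[0];
  for (var i = 1; i < data.length; i++) {
    res = res * (1 - alpha) + data[i] * alpha;
  }
  return res;
}

// Hypothetical variant of trend(): the hour is passed in explicitly and
// `data` is left untouched (slice() copies instead of pop() removing)
function trendAt(data, weight, hour) {
  var current = data[data.length - 1];         // today's traffic so far
  var base = smooth(data.slice(0, -1), 2 / 3); // smoothed trend of the previous days
  var sup = 0, sum = 0;
  for (var i = 0; i < 24; i++) {
    if (i < hour) sup += weight[i];
    sum += weight[i];
  }
  return current - sup / sum * base + base;
}

// Made-up input: four finished days at 1000 visits, 300 visits so far today
var days = [1000, 1000, 1000, 1000, 300];

// Made-up flat intraday profile: every hour carries the same weight
var flat = [];
for (var h = 0; h < 24; h++) flat.push(1);

var atSix = trendAt(days, flat, 6);  // about 300 - 0.25 * 1000 + 1000 = 1050
var atEnd = trendAt(days, flat, 24); // about 300, the whole day is accounted for
```

With a quarter of the day gone, the estimate still leans on the smoothed trend; once all 24 hours are counted, only the measured value remains.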

If you need something to fork, check out my JavaScript snippets file, where I've added the functions from this article. You can also find the license information for the code there.
