Calculating the standard deviation in MySQL is a no-brainer by using the build-in aggregate function STDDEV(). If you don't need the original data and only want to save aggregated values in your database, the whole matter is getting more complicated - but is worth from a space and performance point of view.
You often see databases with huge dynamic text fields, such as VARCHAR(255), TEXT, or as I recently was allowed to see the blanket use of LONGTEXT (max 4GiB) in order to be invulnerable from all contingencies. Things getting even worse when an index is used over such columns, because hey, there is an index. It makes things fast :-) Okay, jokes aside. Often you can save a lot of space and time, MySQL spends traversing the index when using a proper column type and index size.
Even in times of a growing market of specialized NoSQL databases, the relevance of traditional RDBMS doesn't decline. Especially when it comes to the calculation of aggregates based on complex data sets that can not be processed as a batch like Map&Reduce. MySQL is already bringing in a handful of aggregate functions that can be useful for a statistical analysis. The best known of this type are certainly:
Since I use MySQL for the statistical analysis on a project, I wanted to optimize the database queries and learned a lot about stuff like number theory, set theory and partial sums. I took my MySQL UDF, I've published two years ago, for this purpose and added new functions for a deeper statistical analysis. The project is around for a while, so it's time to share things with the public to start a discussion of how things could be further optimized. The source and a small documentation can be found on Github:
Since March, I can use Spotify officially in Germany, and also do this extensively every day. Albeit, from my initial exuberant joy on the new toy is not much left after nearly two months of use. I frequently find myself again using SoundCloud and YouTube for the musical accompaniment while I'm working on the computer. Since I don't feel so much obliged to music as Spotify does and since I don't have the intention to start an own project in this direction, I'd like to offer a few thoughts on the quality of Spotify and also a few ideas on how such an app could look like.
I was asked for help in optimizing a MySQL query where flags are stored in a database and references should be counted based on the flag value. Sounds not complicated at first, but there are several flags that should be counted and also just once per reference. A lot of food for GROUP BY you may think. Having said this, search and group for flags in this table would be really slow due to a very poor cardinality. But let's start with the actual problem. The example is fictitious, but I did my best to find a general use case for this problem.
Dealing with large data sets makes it necessary to pick out only the newest or the hottest elements and not displaying everything. In order to have older items still available, Pagination navigation's have become established. However, implementing a Pagination with MySQL is one of those problems that can be optimized poorly with MySQL and certainly other RDBM systems. However, knowing the underlying database can also help in optimizing pagination queries, because there is no real copy and paste solution.
If you look at the number of visitors of your website, you can only take the data of the previous days for a serious analysis. In itself logical, but it would also be desirable to know which traffic you can expect today, based on the expectation of the day. For the statement about the expected traffic, you can use a simple average, but this has no real significance. A better approach would incorporate all available information.