Good code has to satisfy a lot of quality factors, such as readability, reusability but also performance. Especially online users aren't known for their patience. That's why writing code with very low level algorithms can speed up things a lot. To abstract all this hacking a bit, I publish a package called BitSet.js today.
Calculating the standard deviation in MySQL is a no-brainer by using the build-in aggregate function STDDEV(). If you don't need the original data and only want to save aggregated values in your database, the whole matter is getting more complicated - but is worth from a space and performance point of view.
You often see databases with huge dynamic text fields, such as VARCHAR(255), TEXT, or as I recently was allowed to see the blanket use of LONGTEXT (max 4GiB) in order to be invulnerable from all contingencies. Things getting even worse when an index is used over such columns, because hey, there is an index. It makes things fast :-) Okay, jokes aside. Often you can save a lot of space and time, MySQL spends traversing the index when using a proper column type and index size.
Even in times of a growing market of specialized NoSQL databases, the relevance of traditional RDBMS doesn't decline. Especially when it comes to the calculation of aggregates based on complex data sets that can not be processed as a batch like Map&Reduce. MySQL is already bringing in a handful of aggregate functions that can be useful for a statistical analysis. The best known of this type are certainly:
Since I use MySQL for the statistical analysis on a project, I wanted to optimize the database queries and learned a lot about stuff like number theory, set theory and partial sums. I took my MySQL UDF, I've published two years ago, for this purpose and added new functions for a deeper statistical analysis. The project is around for a while, so it's time to share things with the public to start a discussion of how things could be further optimized. The source and a small documentation can be found on Github:
I was asked for help in optimizing a MySQL query where flags are stored in a database and references should be counted based on the flag value. Sounds not complicated at first, but there are several flags that should be counted and also just once per reference. A lot of food for GROUP BY you may think. Having said this, search and group for flags in this table would be really slow due to a very poor cardinality. But let's start with the actual problem. The example is fictitious, but I did my best to find a general use case for this problem.
Dealing with large data sets makes it necessary to pick out only the newest or the hottest elements and not displaying everything. In order to have older items still available, Pagination navigation's have become established. However, implementing a Pagination with MySQL is one of those problems that can be optimized poorly with MySQL and certainly other RDBM systems. However, knowing the underlying database can also help in optimizing pagination queries, because there is no real copy and paste solution.