raw database

MySQL - My new playground

Robert Eisele

Note: It was my first attempt using the MySQL UDF Api back then. The library is still maintained and got some new features. Check out the new MySQL Infusion UDF.

The business-logic should not be the task of a database server, because hardware of database servers is mostly more expensive than the sheet metal of some stupid application servers. The other problem are inconsistencies if you change the code-base on a large database cluster very often. You could reduce the risk of inconsistency with batches, but that does not solve the problem completely. However, I've developed a handful of stored functions to simplify queries, reduce index sizes and to shorten strings where only a small portion is needed, which in turn saves a lot of bandwidth.

When I spoke in my previous article about transparent scaling of databases, I also mentioned that this approach can run on a single machine and will scale to hundreds of servers, by initially simulating other machines with multiple databases. The problem with stored functions here is that these functions must be defined for each new database, because stored functions can not be defined in the global scope but only under the scope of a database. A workaround to circumvent this problem is another database just for stored functions and than call these bunch of functions with the prefixed database name:

CREATE DATABASE fn;
CREATE FUNCTION fn.TEST() ...;
SELECT fn.TEST() FROM t1;

Maybe I hereby open the Pandora's box, but I would like to work on MySQL internals in the future in order to realize my wishlist and when I can "hear the sound of MySQL internals", I'll go ahead to take care of stability and performance issues. I've also begun to develop a storage engine that is designed to solve a problem, what is being solved with NoSQL solutions these days - but more about that will come later, when I have something to show.

I started to make me familiar with MySQL's internal structure by developing an UDF that implements some useful functions I've previously only had at hand as slower stored functions. Some of these functions were also injected into the PHP core with my PHP infusion extension. For that reason, I call this UDF also infusion, but it has some differences to the PHP version.

The UDF API is not perfect, it is not possible e.g. to determine which collation is used for strings and it is not possible to process other values than integers, reals and strings without the need to parse the values yourself. But most of the functions, which are useful inside the database were transfered from the PHP pendant.

A significant advantage of UDF's over stored functions is besides the opportunity to define global server functions, that especially string functions are from a performance point of view far superior - how it was already proven from the guys of Percona.

New functions for MySQL

isbit

Check if a bit-flag is set in the number.

<strong>bool isbit(int mask, int n);</strong>

mysql> SELECT isbit(5, 2);
+-------------+
| isbit(5, 2) |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)

setbit

Set a bit-flag in the number to 1 or another boolean.

<strong>int setbit(int mask, int n[, bool init=1]);</strong>

mysql> SELECT setbit(8, 4, 1);
+-----------------+
| setbit(8, 4, 1) |
+-----------------+
|              24 |
+-----------------+
1 row in set (0.00 sec)

invbit

Toggle/invert a bit-flag in the given number.

<strong>int invbit(int mask, int n);</strong>

mysql> SELECT invbit(8, 2);
+--------------+
| invbit(8, 2) |
+--------------+
|           12 |
+--------------+
1 row in set (0.00 sec)

numbit

Count the number of bits set in the number.

<strong>int numbit(int mask[, flag=1]);</strong>

mysql> SELECT numbit(2);
+-----------+
| numbit(2) |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

msbit

Get the most significant bit of a number. This equals the log base 2 of the number.

<strong>int msbit(int mask);</strong>

mysql> SELECT msbit(15);
+-----------+
| msbit(15) |
+-----------+
|         3 |
+-----------+
1 row in set (0.00 sec)

bround

Round to the next multiple of a base.

<strong>int bround(int num, int base);</strong>

mysql> SELECT bround(13, 3);
+---------------+
| bround(13, 3) |
+---------------+
|            15 |
+---------------+
1 row in set (0.00 sec)

xround

Round to the next power of 10. This breaks down 10ceil(log(n) / log(10))

<strong>int xround(int num);</strong>

mysql> SELECT xround(55);
+------------+
| xround(55) |
+------------+
|        100 |
+------------+
1 row in set (0.00 sec)

bound

Limits a number to a specified lower min- and/or a upper max value. Not used min/max can be ignored with NULL

<strong>double bound(double num, double min, double max);</strong>

mysql> SELECT bound(12, 0, 4);
+-----------------+
| bound(12, 0, 4) |
+-----------------+
|      4.00000000 |
+-----------------+
1 row in set (0.00 sec)

cut

Cuts a string if it's longer then a max value and appends "...". Words are not chopped.

<strong>string cut(string str, int num[, string x='...']);</strong>

mysql> SELECT cut('This is the funny world of MySQL...', 15);
+------------------------------------------------+
| cut('This is the funny world of MySQL...', 15) |
+------------------------------------------------+
| This is the...                                 |
+------------------------------------------------+
1 row in set (0.00 sec)

slug

The old name of this function was makeurl but I renamed it to slug, because this seems to be the more official term for this one. The Wordpress Codex says this: A slug is a few words that describe a post or a page. Slugs are usually a URL friendly version of the post title [...], but a slug can be anything you like. Slugs are meant to be used with permalinks as they help describe what the content at the URL is.

<strong>string slug(string str);</strong>

mysql> SELECT slug('Max Müller Straße!');
+----------------------------+
| slug('Max Müller Straße!') |
+----------------------------+
| max-mueller-strasse        |
+----------------------------+
1 row in set (0.00 sec)

ngram

Generates a list of ngrams from a given string.

<strong>string ngram(string str[, int size=2]);</strong>

mysql> SELECT ngram('Lorem ipsum dolor');
+-------------------------------------------------------+
| ngram('Lorem ipsum dolor')                            |
+-------------------------------------------------------+
| _l lo or re em m_ _i ip ps su um m_ _d do ol lo or r_ |
+-------------------------------------------------------+
1 row in set (0.01 sec)

thumbscale

Calculates a scaled weight on base of a rating system with a number of thumbs up and thumbs down.

<strong>int thumbscale(int up, int down, int scale);</strong>

mysql> SELECT thumbscale(66, 33, 6);
+-----------------------+
| thumbscale(66, 33, 6) |
+-----------------------+
|                     4 |
+-----------------------+
1 row in set (0.00 sec)

thumbratio

Calculates a ratio on base of a rating system with a number of thumbs up and thumbs down.

<strong>int thumbratio(int up, int down);</strong>

mysql> SELECT thumbratio(66, 33);
+--------------------+
| thumbratio(66, 33) |
+--------------------+
|         2.91176471 |
+--------------------+
1 row in set (0.00 sec)

starratio

Calculates a simple average of a rating system which uses stars (for example 5 stars). The result is in the interval [0, num_arguments]

<strong>int starratio(int a, int b[, ...]);</strong>

mysql> select starratio(333,666, 99, 10);
+----------------------------+
| starratio(333,666, 99, 10) |
+----------------------------+
|                 1.80685921 |
+----------------------------+
1 row in set (0.00 sec)

group_first

Get's the first element of an aggregated group.

<strong>string group_first(string str);</strong>

mysql> SELECT group_first(Num) FROM test GROUP BY C_ID;
+------------------+
| group_first(Num) |
+------------------+
| 25               |
| 8                |
| 11               |
+------------------+
3 rows in set (0.01 sec)

group_last

Get's the last element of an aggregated group.

<strong>string group_last(string str);</strong>

mysql> SELECT group_last(Num) FROM test GROUP BY C_ID;
+-----------------+
| group_last(Num) |
+-----------------+
| 88              |
| 11              |
| 34              |
+-----------------+
3 rows in set (0.01 sec)

The binary setting functions from store small numbers in big ones are also implemented in the infusion UDF, but I spare the example at this point. If you want to install this function enhancement UDF, you can download it from Github, compile it with the provided Makefile and create the binding by using the create.sql.