raw database

MySQL Wishlist

Robert Eisele

While I worked on several web projects I collected a few wishes for MySQL (this collection is more user specific - a MySQL internal wishlist will come later) - which I use in most cases. I still fixed a few things, but don't know if they are good enough for production (because I just started hacking MySQL).

AFTER trigger

One of my biggest wishes is, that the AFTER trigger become more intelligence on accessing some things. Because the AFTER trigger should be accessed AFTER the operation is done, it should be possible to access LAST_INSERT_ID() if there is one auto_increment column in that table. The other thing is, that you should be able to run DML operations on the same table in such a trigger. Imagine the case, that you want to implement a linked list. If you would have the possebility of accessing the own table in an AFTER trigger, you could manage everything behind the scenes.

SQL_NO_TRIGGER

To continue trigger issues, it should be possible to disable triggers on statements. For example by:

UPDATE SQL_NO_TRIGGER t1 SET a=123;

Trigger as ON DUPLICATE KEY replica

As the title suggests, it should be possible to create a own ON DUPLICATE KEY handler with a trigger even for non unique keys.

Enhanced trigger functions in DDL

I think this is still a little bug, if you rename a table all depending triggers should also be update their references.

I already mentioned SQL_NO_TRIGGER, but it should also be possible to disbale a trigger completeley without SHOW TRIGGERS, search for the trigger, copy it in a text file, DROP TRIGGER, run all operations and after that create the trigger again. A simple

ALTER TRIGGER trg ENABLE|DISABLE;

should be much comfortablier. To go ahead with this and to be compatible with other CREATE's like CREATE VIEW, it should possible to run CREATE OR REPLACE TRIGGER.

Virtual columns

You see, I like triggers. I would be glad about virtual columns to be able to pass things to triggers. Sure one could say why not use a SP? A stored procedure is not the best solution in every case. Using a normal INSERT statement is furthermore more intuitive. I'm not sure, if it should be possible to only pass virtual colums like this:

INSERT INTO t1 (a, b, VIRTUAL c) VALUES (1, 2, 3);

Or by creating tables with virtual columns like this:

CREATE TABLE t1(
  a INT,
  b INT,
  c VIRTUAL INT
);

This is not as clean as using virtual columns with DML, because a explicit projection of column "c" would produce a NULL (or the virtual column should be implemented with a view functionality to link another ressource)

The idea of virtual columns could also be a good thing for MyISAM MERGE tables to dynamic select a destination table for INSERT's. Imagine, you have the following structure:

t1 \
    > t3
t2 /

...and INSERT in table t3. The merge table have a fixed destination, but using a virtual column and a trigger would be nice to change the underlying destination.

By the by, a merge table should also have an own auto_increment to become the role of a global table manager.

InnoDB behind a MERGE table

At any time, I thought, it would be a good idea bringing a InnoDB table behind a merge table (don't ask why). So I played, read the source of the MERGE storage engine and found a bug, with that it is possible to create a MyISAM -> MERGE construct, and with a established connection alter the underlying tables to InnoDB is allowed.

I used this construct for a while with no problems and the featureset of InnoDB. But now, I would say, if there is no particularly function of building a InnoDB-merge use MyISAM!

Syntax simplification

Short and crisp: I'd be glad about a TRUNCATE syntax for more than one table per statement. Another wish is a multiple INSERT statement with a optional comma seperation (this frittered enough time away in the past only to handle the comma). A desired syntax for both should be:

TRUNCATE t1, t2, t3, ...;
INSERT INTO t1 (a,b) VALUES (1, 1)(2, 1)(3, 1)(4, 1)(5, 1);

Merging QL with DML

Most peoples using strict standards shall say, that this is a bad idea. Merging a select with a DELETE or UPDATE is not so good. I see it a little different, that it should be possible to access data at every time we have it selected. An UPDATE and DELETE also selects before another operation is done on the set. So why should we not be able to return a clean result with an UPDATE?

Because I only had the case of DELETE + SELECT one time, I would say let's focus on UPDATE. A temporary fix would be, to allow the following syntax:

UPDATE t1 SET a=501, @tmp:= b WHERE id=9;

I know, it is possible to write this statement as

UPDATE t1 SET a=501 WHERE id=9 AND @tmp:= b;

...but this query does only work for b <> 0. We need more code to handle this special case, so it would be a good idea to handle this in a different way.

To come back to the main idea, of let UPDATE return a result, it would be nice, having a new method - let's call it UPSEL or a new statement parameter like this:

UPDATE SQL_RETURN_PREV t1 SET a=1, b=2;

... and the returned resultset is a and b before the update were performed.

Views with user variables

Based on the idea of using results from previous operations, I had the idea of using parameters from previous tasks. You can access user variables set by triggers or stored procedures, but not a view exporting a user variable. Consider the following:

CREATE VIEW test AS
  SELECT @a:= a, b, c FROM table;

I also see the problem here, that for every row the set operation is made is slow, but if you only select one row, this is a good practice. It would also be a good practice, if you want to get the last value of a column using a view.

The restriction "View's SELECT contains a variable or parameter" haves another problem, if you want to pass a parameter from the outside like this:

CREATE VIEW test AS
  SELECT a, (b + @c) / 2 FROM t1

Double assign columns

If you want to assign the same value to two columns, you have to use two assigns like this:

UPDATE t1 SET a=123, b=123;

This is a really rarely example, but what if you want to update two tables at the same time with that value (I love denormalization) like:

UPDATE t1 JOIN t2 ON id=ref SET a=123, x=123

This is a more general problem, but what, if the value is not 123 but a string of the length 1024? At the moment you have to send this string twice - or use user variables.

It would be good to use a better syntax like:

UPDATE t1 JOIN t2 ON id=ref SET a=x=123

At the moment x is set to 123 and a is set to 1 because the second assign is trade as compare. A simple idea was to open the assign operator := also for columns to be able to force assigns:

UPDATE t1 JOIN t2 ON id=ref SET a:= x:= 123

Swap columns

Swapping two columns is not as easy. In general you need a temporary variable to swap two variables/columns. If you have luck, you have a third column in the schema to swap two others like this:

UPDATE t1 SET tmp=a, a=b, b=tmp;

It would be a good thing, setting variables in UPDATE's like I mentoined in "Merging QL with DML" to run this:

UPDATE test SET @tmp:=a, a=b, b=@tmp

Yeah. While I wrote the previous lines of the article I had an idea of swapping integers without a temporary variable or column. I tried this idea and...it works :-)

UPDATE t1 SET a=a^b, b=a^b, a=a^b

List Operator

With list operator I mean a virtual table made from a set of values. This is still possible using UNION but this is a really ugly thing. You are able to write:

SELECT * FROM (
  SELECT 1
  UNION
  SELECT 2
  UNION
  SELECT 3
  UNION
  SELECT 4
  UNION
  SELECT 5
  UNION
  SELECT 6
)X

Wouldn't it be nice, to write this as:

SELECT * FROM LIST(1,2,3,4,5,6)

Update from 09.04.2010

PostgreSQL knows a similar way to create pseudo tables by using the generate_series() operator.

Date and Time

I think, there should be a way of getting a RFC822 date from a TIMESTAMP or a DATETIME without passing a ugly format string to DATE_FORMAT(). The other thing is, that UNIX_TIMESTAMP() should return a negative integer, if the date is before the start of this epoch from 1970. PHP does this too.

Optimization

It would be a good idea, if a query runs longer than a specified time to search the processlist for another long waiting query by using the same approach like the query cache by simply compare the query string and give the result to all other waiting querys without running the slow procedere many times. Sure, the query cache could handle this, but only if the result is not too big.

The other point is both, a question and a wish at the same time. Consider the following query:

(
  SELECT *
  FROM t1
) UNION (
  SELECT *
  FROM t2
)
LIMIT 10

Let's think about this construct as a select of top news for example. You have a denormalized table for news (table t1), but you want to fill up the result with older entries if there are not enough new ones. So my question is, is the second query executed even if query one covers the limit of the total result or are both querys executed? If not, could this be so in future releases?

Update from 29.08.2010

The following snippet is a good proof, that both queriers are executed before the result is truncated:

(
  SELECT id
  FROM t1
) UNION (
  SELECT BENCHMARK(1000000, ENCODE('hello', 'goodbye'))
)
LIMIT 2;
Which produces the following output:
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (10.05 sec)

It's a wrap. I hope the one or another wish will be implemented in the future.