raw database

In database modeling, a m:n relationship is usually resolved by an additional table. But what if this relation is used only for archiving and the number of links in the resulting table is not too high? In that context, I got the idea to store all referring ID's as CSV string directly into a TEXT column of one of the referring tables. I came to this idea, because otherwise I would have to build complicated foreign keys and this way I also save one additional table. Certainly, this only makes sense if the data is not frequently accessed as foreign key. Nevertheless, I would like to tackle the problem, even if the implementation is very MySQL-oriented.

Suppose now that we have the following table structure:

CREATE TABLE relation (
  id INT UNSIGNED KEY,
  refCSV TEXT NOT NULL
);

Then we could simulate the m:n relation with the following simple INSERT statement:

INSERT INTO relation SET id=$id, refCSV=$ref
  ON DUPLICATE KEY UPDATE refCSV=IF(0=FIND_IN_SET(VALUES(refCSV), refCSV), CONCAT(refCSV, ',', VALUES(refCSV)), refCSV);

After a few inserts the table looks like this:

mysql> select * from relation;
+----+---------+
| id | refCSV  |
+----+---------+
|  1 | 2,3,4   |
|  2 | 4,2,6,1 |
|  3 | 3,4     |
|  4 | 3,4     |
+----+---------+
4 rows in set (0.00 sec)

Now I've figured out how I can create a join-able table from a given CSV string. The simplest idea would be to build a one-column table with serial numbers (without gaps) and afterwards copying the CSV string between the braces of the IN() operator through the used scripting language. But I wanted to find a pure MySQL-based solution with no need to return to the calling script. For now, my resulting query looks like this:

SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(refCSV, ',', @pos:= @pos + 1), ',', -1) AS UNSIGNED) ref
FROM (
  SELECT @pos:= 0, @cnt:= 0
)X
JOIN (
  SELECT X.refCSV
  FROM relation
  JOIN (
    SELECT refCSV, LENGTH(refCSV) - LENGTH(REPLACE(refCSV, ',', '')) + 2 lim
    FROM relation
    WHERE id=2
    LIMIT 1
  )X
  WHERE (@cnt:= @cnt + 1) < lim
)Y;

The result of this query is as follows:

+------+
| ref  |
+------+
|    4 |
|    2 |
|    6 |
|    1 |
+------+
4 rows in set (0.01 sec)

As you can see, it does not necessarily need an additional table to resolve a m:n relation. The only real drawback is a lack of indexing. For this reason, such a solution should only be used in special cases, as in my case with the creation of too complicated foreign keys. In general, a m:n relation should be resolved with an additional table (as Codd has suggest it). Accept this approach as proof of concept in a more general manner.

But there is really a more useful example, that one can remove duplicates from a CSV string with MySQL. This can be done with a modified version of the query shown above, so that the resulting table is immediately shrinked together by GROUP_CONCAT(DISTINCT).

The examples from above using SUBSTRING_INDEX() to obtain the desired element at a certain position of the given CSV string. In theory, one could also dissect more complicated strings with quotes and escape sequences embedded. It would only be necessary to find a better solution to determine "lim" and to extract an element from the CSV string without SUBSTRING_INDEX(). But I think, with a stored function, this should also be no particular problem.