Resolve many-to-many relations a bit different with MySQL
In database modeling, a m:n relationship is usually resolved by an additional table. But what if this relation is used only for archiving and the number of links in the resulting table is not too high? In that context, I got the idea to store all referring ID's as CSV string directly into a TEXT column of one of the referring tables. I came to this idea, because otherwise I would have to build complicated foreign keys and this way I also save one additional table. Certainly, this only makes sense if the data is not frequently accessed as foreign key. Nevertheless, I would like to tackle the problem, even if the implementation is very MySQL-oriented.
Suppose now that we have the following table structure:
CREATE TABLE relation ( id INT UNSIGNED KEY, refCSV TEXT NOT NULL );
Then we could simulate the m:n relation with the following simple INSERT statement:
INSERT INTO relation SET id=$id, refCSV=$ref ON DUPLICATE KEY UPDATE refCSV=IF(0=FIND_IN_SET(VALUES(refCSV), refCSV), CONCAT(refCSV, ',', VALUES(refCSV)), refCSV);
After a few inserts the table looks like this:
mysql> select * from relation; +----+---------+ | id | refCSV | +----+---------+ | 1 | 2,3,4 | | 2 | 4,2,6,1 | | 3 | 3,4 | | 4 | 3,4 | +----+---------+ 4 rows in set (0.00 sec)
Now I've figured out how I can create a join-able table from a given CSV string. The simplest idea would be to build a one-column table with serial numbers (without gaps) and afterwards copying the CSV string between the braces of the IN() operator through the used scripting language. But I wanted to find a pure MySQL-based solution with no need to return to the calling script. For now, my resulting query looks like this:
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(refCSV, ',', @pos:= @pos + 1), ',', -1) AS UNSIGNED) ref FROM ( SELECT @pos:= 0, @cnt:= 0 )X JOIN ( SELECT X.refCSV FROM relation JOIN ( SELECT refCSV, LENGTH(refCSV) - LENGTH(REPLACE(refCSV, ',', '')) + 2 lim FROM relation WHERE id=2 LIMIT 1 )X WHERE (@cnt:= @cnt + 1) < lim )Y;
The result of this query is as follows:
+------+ | ref | +------+ | 4 | | 2 | | 6 | | 1 | +------+ 4 rows in set (0.01 sec)
As you can see, it does not necessarily need an additional table to resolve a m:n relation. The only real drawback is a lack of indexing. For this reason, such a solution should only be used in special cases, as in my case with the creation of too complicated foreign keys. In general, a m:n relation should be resolved with an additional table (as Codd has suggest it). Accept this approach as proof of concept in a more general manner.
But there is really a more useful example, that one can remove duplicates from a CSV string with MySQL. This can be done with a modified version of the query shown above, so that the resulting table is immediately shrinked together by GROUP_CONCAT(DISTINCT).
The examples from above using SUBSTRING_INDEX() to obtain the desired element at a certain position of the given CSV string. In theory, one could also dissect more complicated strings with quotes and escape sequences embedded. It would only be necessary to find a better solution to determine "lim" and to extract an element from the CSV string without SUBSTRING_INDEX(). But I think, with a stored function, this should also be no particular problem.
You might also be interested in the following
- Running Standard Deviation in MySQL
- Optimal index size for variable text in MySQL
- Fast circular buffer in MySQL
- MySQL Wishlist
Sorry, comments are closed for this article. Contact me if you want to leave a note.
3 Comments on „Resolve many-to-many relations a bit different with MySQL”
Reading the first lines here I just though "hm, this disables index usage, not good". But as you say that may make sense in situations with smaller sets of rows. Different situations take different solutions. For database beginners, this of course sounds totally unsharp and not satisfying. But well, the right answer to database problems is most of the time "it depends on the situation".
This is the same solution proposed this week in this blog post:
http://www.bencurtis.com/2010/08/excessive-use-of-redis/
Like all non-relational designs, it optimizes for one particular use case, at the expense of all other possible uses for the data.
This comes in very handy at the moment... Thanks for sharing!