raw code

Imagine we have a user table in MySQL and similarly, a products table. Each user can now busily buy items. But this is not what this article is all about. This article is more about, that users can also favor articles they like. From a database point of view, this represents a m:n relation, which is resolved by an additional table. We now want to display a list of all products and all users who favor a specific article. In real world scenarios, this can be a really big amount of data, but let's bring an approximation later into play and focus on the main idea for the moment. In addition we want to know, if the person who is actually logged in into the system, already favors a certain product.

The procedural assessed developer would certainly commit such a solution to the paper very quickly:

<?php

$stack = array();
$res = $db->query('SELECT * FROM product ORDER BY product_id');
while (null !== ($row = $res->fetch_assoc())) {

	$users =&$row['Users'];
	$users = array();

	$sub = $db->query('
		SELECT *
		FROM user
		JOIN product_user ON userid=user_id
		WHERE productid=' . $row['product_id']);
 	while (null !== ($user = $res->fetch_assoc())) {
		$users[] = $user;
	}

	$row['Fav'] = $db->query('
		SELECT 1
		FROM product_user
		WHERE productid='.$row['product_id'].'
		AND userid='.$_SESSION['user_id'].'
		LIMIT 1')->num_rows == 1;

	$stack[] = $row;
}

If you have a developer in your team, who considers such a solution is suitable, make sure that he flies out! But how could we optimize this code? If you would only need one attribute of the subquery, you could use GROUP_CONCAT() and would have no additional headaches. But we want several attributes from both tables. My idea was to extend the GROUP_CONCAT() approach and dynamically generate a JSON package with all needed columns. In times of PHP4, I've already made similar things, but used various separators and had to parse the package in the userspace. If you are wondering, how it was possible to simulate GROUP_CONCAT() for < MySQL5, consider the following snippet:

SELECT @prev:= NULL;

SELECT MAX(@ids:= IF(@prev = ID, CONCAT_WS(',', @ids, sub_ID), sub_ID)), @prev:= ID
FROM t1
GROUP BY ID;

But back to the actual problem of using GROUP_CONCAT() to generate JSON. When I created a JSON packet the first time in this way, I quickly got bogged down by all that commas, parenthesis and quotes. For that reason, I've written a small auxiliary function:

<?php

function generateJson($fields) {

	$str = "CONCAT('[',GROUP_CONCAT(CONCAT('{\"";

	for ($i = 0, $l = count($fields); $i < $l; ++$i) {
		$str.= $fields[$i] . "\":','\"'," . $fields[$i];

		if ($i != $l - 1) {
			$str.= ",'\",\"";
		}
	}
	$str.= ",'\"}')),']')";
	return $str;
}

I think, the most complicated part is actually done. Now we need to write only one query that retrieves all necessary data and aggregate it accordingly:

SELECT product.*, $json_magic Users
FROM product_user
JOIN product ON product_id=productid
LEFT JOIN user ON user_id=userid
GROUP BY productid

Where $json_magic is defined as follows:

<?php

$json_magic = generateJson(array('userid', 'user_name', 'user_age'));

This solution requires the SQL mode ONLY_FULL_GROUP_BY to be turned off, but since this is the standard behavior, it should be no problem so far. Actually, the way from the idea to the implementation was simple, but the previous solution isn't quite complete, because we also wanted to check if a certain user already exists in the generated JSON package. We could add another join or a slower subselect but it is much easier, since we already have all we need to reason to that information. Please consider the following statement:

SELECT product.*, $json_magic Users, <strong>BIT_OR(user_id = $user) Fav</strong>
FROM product_user
JOIN product ON product_id=productid
LEFT JOIN user ON user_id=userid
GROUP BY productid

$user is the PHP variable containing the user ID we want to check (of the logged in user). For those of you, who have no idea what the statement actually does: The aggregate function BIT_OR() does what it's name suggests - it OR-s together all numbers falling into a group accrued by the GROUP BY statement. The parameter isn't to be an aggregated column, but a boolean expression which is expressed as an integer 0 or 1. As a reminder, the result of a Boolean OR relationship is always true if either operand is true, which means as much as if the previous number once fell to true, it remains true forever, and we have the result, whether our number is included in the subset.

Incidentally, the PHP function should be used only if no escape sequences are needed. Otherwise you have to make sure, that the JSON parsing does not fail in the user space, for example by using the QUOTE() function which escapes illegal characters, or you store directly an escaped version into the database. As mentioned above, it would also be an opportunity to speed up the whole a little by storing a serialized subset of data directly into the table. But then you have to take care that the cache is refreshed accordingly, like any other cache too. Also notice the variable group_concat_max_len, so that a large package do not stop the query execution.