raw database

Store a tag-cloud in MySQL

Robert Eisele

There was a time when tag-clouds were the thing for website owners to fancy oneself. These clouds are mostly gone, but seen from the perspective of how to implement such a thing, one can learn quite a lot, especially with large amounts of links. Anyway, imagine you publish some articles on your website, which are stored in a table "post" and you want to to add tags to every post in order to print a tag-cloud.

We'll store the tags in a table "tag", which is represented by an id, the tag name, the tag slug/url, the number of uses, the timestamp of the last use of a tag and a crc hash for fast lookup of the tag:

CREATE TABLE tag (
  TID INT UNSIGNED AUTO_INCREMENT KEY,         # The tag id
  TTag VARCHAR(48) NOT NULL,                   # The actual tag label
  TUrl VARCHAR(48) NOT NULL,                   # The url slug for the tag
  TNum MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,  # The number of usages of this tag
  THash INT UNSIGNED NOT NULL DEFAULT 0,       # A CRC32 Hash of TUrl
  TUpdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,     # The last time this tag was used
  KEY(THash)
) ENGINE=InnoDB;

As we have to resolve a many-to-many relationship, we need an additional table storing the link between tags and posts:

CREATE TABLE post_tag (
  T_ID INT UNSIGNED NOT NULL,                  # The Tag ID
  P_ID INT UNSIGNED NOT NULL,                  # The Post ID
  KEY(T_ID),
  KEY(P_ID)
) ENGINE=InnoDB;

Adding tags to the cloud

I tried quite some tricks to make it as easy as possible to add new tags to a post. However, a lot of roundtrips between client and server would reduce the execution time, so I decided to manage the whole thing inside the MySQL server with a stored procedure:

delimiter $$

CREATE PROCEDURE addtag(tag VARCHAR(48), ref INT UNSIGNED)
BEGIN
  DECLARE url VARCHAR(48);
  DECLARE xid INT UNSIGNED;
  SET url = slug(tag);       # uses slug() function of udf_infusion
  SET hash = CRC32(url);
  UPDATE tag SET TNum=TNum+1 WHERE THash=hash AND TUrl=url LIMIT 1;
  IF ROW_COUNT() <> 1 THEN
    INSERT INTO tag (TTag, TUrl, THash, TNum) VALUES (tag, url, hash, 1);
    SET xid=LAST_INSERT_ID();
  ELSE
    SELECT TID INTO xid FROM tag WHERE THash=hash AND TUrl=url LIMIT 1;
  END IF;
  INSERT INTO post_tag (T_ID, P_ID) VALUES (xid, ref);
END$$
delimiter ;

As mentioned in the comments, the used function to create the slug is the one from my udf_infusion UDF.

Adding new tags is as easy as this:

call addtag('OpenSource', 9);
call addtag('Math', 9);
call addtag('Data', 9);
call addtag('Math', 3);

Okay, now to the interesting part. Math! I make it a bit more fancy by coloring every tag by its age and the font size is determined by the number of occurrences.

\[\text{font-size} = \text{min-size} + \frac{\text{TNum} - \text{TNum}_\text{min}}{\text{TNum}_\text{max} - \text{TNum}_\text{min}} (\text{max-size} - \text{min-size})\]

\[\text{color} = \text{min-color} + \frac{min(now() - \text{TUpdate}, \text{max-age})}{\text{max-age}} (\text{max-color} - \text{min-color})\]

If you have quite huge numbers, you could work in the log space as well, which can help. Anyway, if you want to print the tag cloud now, all you need is get all the tags via a post-id:

SELECT TTag, TNum, TUpdate
FROM tag
JOIN post_tag ON TID=T_ID
WHERE P_ID=9;

Simply collect the minimum and maximum of TNum and print the tags according to the equations above.