Robert Eisele
Systems Engineer, Architect and DBA

Field size as power of two

The string data types of MySQL CHAR, BINARY, VARCHAR and VARBINARY are defined with a mandatory size parameter M, which allows a value in the range from 0 to 255. VARCHAR and VARBINARY also allows to overflow these 255 by adding a second byte for the length. But please keep in mind, that if you only want to raise the length to a large value and think, that 256 is a good choice, then you will store one additional byte for every entry where it isn't necessary (expect, that you really want to store many 256 byte values, but in this case you should think about choosing another type like TEXT ). In every other case you should use 255 or even less.

But this shouldn't be the topic of this article. At any time, I started to use powers of two for the size parameter M of string data types. So I use 31, 63, 127 and so on instead of 112, 100 or something else. However, I ask myself, if this way makes any difference in performance. If you think about the acccruing work, there is nothing different. MySQL has to allocate the memory and also has to copy the contents independent from the size. Okay, we could state out, that a shorter buffer is faster than a large buffer because the operating system can find free memory even faster and the copy process will also finish earlier. But is there a difference in the memory management or memory alignment of the operating system, when you define the buffer as a power of two. Somewhere in my head is a scrap, that have heared about a difference.

I seem to remember, that I read that the operating system can allocate a power of two even faster. Every other size has to be normalized to a power of two which needs extra CPU cycles. But maybe this is only on fixed size datatypes behaivour like INT with 4 bytes, DOUBLE with 8 bytes and so on. These data types already use a size of power of two. Unfortunally, I can not find information about this behavior. Can anyone help to put me in the right way? I hope, this is not only a geeky habit.

So let's assume, that there is really a benefit by using powers of two. Should the parameter be a 2n-1 or 2n? I use the first one because the maximum value of 255 should fit into this system, but maybe i'm wrong doing this.

You might also be interested in the following

1 Comment on „Field size as power of two”

Roland Bouman

Hi!

there are a few things that I think you should be aware of. First of all, the size parameter for the string types CHAR and VARCHAR is specified as a number of characters (BINARY and VARBINARY are allocated as a number of bytes). One character may occupy more than one byte, so your math will only work for single-byte character sets.

I also like to point out that in most cases, the TEXT (and BLOB) types are typically stored outside the main record structure, which means it will generally be much slower than a CHAR/VARCHAR (which is store inline in the main record structure).

Finally, I don't think MySQL does individual malloc's whenever it needs some memory. Rather, at startup, it will pre-allocate an amount of memory (mem_root) which is used for internal allocation requests.

So in short, i don't think allocating in multiples of 2 will buy you that much.

 

Sorry, comments are closed for this article. Contact me if you have an inventive contribution.