
Field size as power of two

Robert Eisele

MySQL's string data types CHAR, BINARY, VARCHAR and VARBINARY are defined with a size parameter M. For CHAR and BINARY, M can range from 0 to 255. VARCHAR and VARBINARY can go beyond 255, at the cost of a second byte for the length prefix. So keep in mind that if you only want to raise the limit and think that 256 is a good choice, you will store one additional byte with every entry without needing it (unless you really want to store many 256-byte values, but in that case you should think about another type such as TEXT). In every other case you should use 255 or less.
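To make the extra byte concrete, here is a minimal sketch of the length-prefix rule as the MySQL manual describes it for VARCHAR and VARBINARY: one prefix byte if the column's values need at most 255 bytes, two bytes otherwise. The function names are made up for this illustration, and the sketch assumes a single-byte character set (so M characters equal M bytes) and ignores all other row overhead.

```python
def length_prefix_bytes(max_bytes: int) -> int:
    """Bytes used for the length prefix of a VARCHAR/VARBINARY column.

    Assumption: max_bytes is the column's maximum length in bytes,
    which equals M only for a single-byte character set.
    """
    if not 0 <= max_bytes <= 65535:
        raise ValueError("max_bytes must be between 0 and 65535")
    return 1 if max_bytes <= 255 else 2


def stored_bytes(value: bytes, max_bytes: int) -> int:
    """Approximate size of one stored value: length prefix plus data."""
    if len(value) > max_bytes:
        raise ValueError("value does not fit into the column")
    return length_prefix_bytes(max_bytes) + len(value)


if __name__ == "__main__":
    value = b"x" * 100
    for m in (255, 256):
        # The same 100-byte value costs one byte more in the 256 column.
        print(m, stored_bytes(value, m))
```

With a multi-byte character set the two-byte prefix can kick in at a smaller M, because the rule counts bytes and not characters.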

But that is not really the topic of this article. At some point I started to use powers of two for the size parameter M of string data types. So I use 31, 63, 127 and so on instead of 112, 100 or something else. However, I ask myself whether this makes any difference in performance. If you think about the work involved, nothing is different: MySQL has to allocate the memory and copy the contents regardless of the size. Okay, one could argue that a shorter buffer is faster than a larger one, because the operating system can find free memory more quickly and the copy finishes earlier. But is there a difference in the operating system's memory management or memory alignment when the buffer size is a power of two? Somewhere in the back of my head is a scrap of memory that I once heard about such a difference.

I seem to remember reading that the operating system can allocate a power of two faster, because every other size first has to be rounded up to a power of two, which costs extra CPU cycles. But maybe this only applies to fixed-size data types such as INT with 4 bytes or DOUBLE with 8 bytes, which already have a power-of-two size. Unfortunately, I cannot find any information about this behavior. Can anyone point me in the right direction? I hope this is not just a geeky habit.
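For illustration only, this is what "normalizing to a power of two" would look like for a hypothetical allocator that rounds every request up to the next power of two. Nothing here claims that MySQL or any particular operating system actually works this way; the helper name is an assumption made for this sketch.

```python
def next_power_of_two(size: int) -> int:
    """Smallest power of two that is greater than or equal to size."""
    if size < 1:
        raise ValueError("size must be positive")
    # (size - 1).bit_length() is the exponent of the next power of two.
    return 1 << (size - 1).bit_length()


if __name__ == "__main__":
    for size in (100, 112, 127, 128):
        rounded = next_power_of_two(size)
        # Show the rounded size and the bytes that would be left unused.
        print(size, rounded, rounded - size)
```

Under that assumption a request for 100 or 112 bytes would occupy the same 128-byte block as a request for 127 or 128 bytes, so the only difference would be the unused remainder.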

So let's assume that there really is a benefit in using powers of two. Should the parameter be 2^n - 1 or 2^n? I use the first one, because the maximum value of 255 fits into that scheme, but maybe I'm wrong about this.
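One way to look at the choice between the two forms is to count the bits needed to store the largest possible length. The sketch below (with a helper name invented for this example) shows that 2^n - 1 is the largest value that still fits into n bits, while 2^n needs one bit more. That is the same boundary as between 255 and 256.

```python
def bits_needed(max_length: int) -> int:
    """Number of bits needed to store any length from 0 to max_length."""
    if max_length < 0:
        raise ValueError("max_length must not be negative")
    return max(1, max_length.bit_length())


if __name__ == "__main__":
    for n in (5, 6, 7, 8):
        # Columns: n, bits for 2^n - 1, bits for 2^n
        print(n, bits_needed(2**n - 1), bits_needed(2**n))
```

This only describes how the length itself is encoded. It says nothing about whether the buffer for the data is allocated any faster.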