Codesignal: TwoFive

Original Problem

Base 32 is often chosen to represent binary data in a compressed form that can be easily read and handled by humans (for example, product keys or activation codes). It's more compact than Base 10 and easier to comprehend than Base 16 (hexadecimal) or Base 64.

To generate Base 32, the input data is processed as a bit stream. Output values for each group of five bits are produced until all input bits are processed. The last 5-bit group is extended with '0' bits if it is incomplete.

Finally, each 5-bit value is mapped into a dictionary of 32 alphanumeric characters to generate a human readable string.

The dictionary omits ambiguous characters such as '0', '1', 'O', and 'I', so only the following characters remain:

d = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ".

Your task is to generate Base 32 from the given input data a.

Example

Input/Output

Solution

The first idea that came to my mind was creating a linear mapping from the indexes of the 8 bit digits to the 5 bit digits. That means that I was looking for mapping that was looking like this for a 2 byte value:

    7654321076543210
43210432104321043210

I created slicing windows from \(a\) to \(b\). The first window is from 0 to 4, the next from 5 to 7 and 0 to 1 and so on. The following algorithm produces exactly this:

a = 0
b = a + 5 - 1
n = 0

for (var i=0; i < 8; i++) {

  if (a < b)
    console.log(n, a, b)
  else
    console.log(n++, 7, a, "##",n,b, 0)

  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

With this index creating algorithm, it's easy to create the masks to compute the actual base 32 values and map the array finally to the given dictionary:

arr = [170, 170]
r = []

a = 0
b = a + 5 - 1
n = 0

for (; n < arr.length; ) {

  if (a < b) {
    r.push((arr[n] >> a) & 31)
    console.log(n, a, b, r)
    if (b == 7) n++
  } else {
    v = (arr[n] >> a) & 31
    if (n+1<arr.length)
      v|=(arr[n+1] << (7-a+1)) & 31
    r.push(v)
    console.log(n, 7, a, "##",n+1,b, 0, r)
    n++;
  }
  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

r.map(x => "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[x]).join``

When simplified, this algorithm looks as follows:

TwoFive = X => {
  r = ""
  for (a = n = 0; X[n]; ) {
    b = (a + 4) % 8
    v = X[n] >> a & 31
    b == 7 && n++
    b < a && X[++n] && (v|= X[n] << 8 - a & 31)
    a = (b + 1) % 8
    r+= "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[v]
  }
  return r
}

As this algorithm is not readable anymore and not as small as I want it to be, I started all over. The only thing we need is a little buffer to overcome the downgrade from 8 to 5 bit. So why not writing 8 bit to a buffer, flush 5 bit, read in another 8 bit and flush all 5-digit chunks again. I wanted to give this idea a trial, which worked quite well:

TwoFive = a => {

  s = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"
  n = m = 0
  r = ""

  for (x of a) {
    m|= x << n
    n+= 8
    for (; n > 4; )
      r+= s[m & 31],
      m>>= 5,
      n-= 5
  }
  return n ? r + s[m & 31] : r
}

« Back to problem overview