Robert Eisele
Engineer, Systems Architect and DBA

Codefights: TwoFive

Original Problem

Base 32 is often chosen to represent binary data in a compressed form that can be easily read and handled by humans (for example, product keys or activation codes). It's more compact than Base 10 and easier to comprehend than Base 16 (hexadecimal) or Base 64.

To generate Base 32, the input data is processed as a bit stream. Output values for each group of five bits are produced until all input bits are processed. The last 5-bit group is extended with '0' bits if it is incomplete.

Finally, each 5-bit value is mapped into a dictionary of 32 alphanumeric characters to generate a human readable string.

The dictionary omits ambiguous characters such as '0', '1', 'O', and 'I', so only the following characters remain:

d = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ".

Your task is to generate Base 32 from the given input data a.

Example

  • For a = [1, 2], the output should be

    TwoFive(a) = "3J22".

    Represented as bits right-to-left, the input data is transformed as follows:

    Index: 1 0
    Value: 2 1
    Bits: 00000010 00000001
    

    For Base 32 we're interpreting the same bits as groups of five, padding the last group with 0 bits:

    Index: 3 2 1 0
    Value: 0 0 16 1
    Bits: 00000 00000 10000 00001
    

    We get the result sequence [1, 16, 0, 0], which we can now map into the dictionary:

    1 => '3'
    16 => 'J'
    0 => '2'
    0 => '2'
    

    The resulting Base 32 string is "3J22".

  • For a = [170, 170], the output should be

    TwoFive(a) = "CPC3".

    Represented as bits right-to-left, the input data is transformed as follows

    Index: 1 0
    Value: 170 170
    Bits: 10101010 10101010
    

    For Base 32 we're interpreting the same bits as groups of five, again padding the last group with 0 bits:

    Index: 3 2 1 0
    Value: 1 10 21 10
    Bits: 00001 01010 10101 01010
    

    The result sequence is [10, 21, 10, 1]. Mapped into the dictionary:

    10 => 'C'
    21 => 'P'
    10 => 'C'
     1 => '3'
    

    The resulting Base 32 string is "CPC3".

Input/Output

  • [time limit] 4000ms (js)
  • [input] array.integer a

    Array of 8-bit unsigned integers.

    Constraints:

    0 ≤ a.length < 16.

  • [output] string

    Character-mapped Base 32 representation of the input data.

Solution

The first idea that came to my mind was creating a linear mapping from the indexes of the 8 bit digits to the 5 bit digits. That means that I was looking for mapping that was looking like this for a 2 byte value:

    7654321076543210
43210432104321043210

I created slicing windows from \(a\) to \(b\). The first window is from 0 to 4, the next from 5 to 7 and 0 to 1 and so on. The following algorithm produces exactly this:

a = 0
b = a + 5 - 1
n = 0

for (var i=0; i < 8; i++) {

  if (a < b)
    console.log(n, a, b)
  else
    console.log(n++, 7, a, "##",n,b, 0)

  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

With this index creating algorithm, it's easy to create the masks to compute the actual base 32 values and map the array finally to the given dictionary:

arr = [170, 170]
r = []

a = 0
b = a + 5 - 1
n = 0

for (; n < arr.length; ) {

  if (a < b) {
    r.push((arr[n] >> a) & 31)
    console.log(n, a, b, r)
    if (b == 7) n++
  } else {
    v = (arr[n] >> a) & 31
    if (n+1<arr.length)
      v|=(arr[n+1] << (7-a+1)) & 31
    r.push(v)
    console.log(n, 7, a, "##",n+1,b, 0, r)
    n++;
  }
  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

r.map(x => "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[x]).join``

When simplified, this algorithm looks as follows:

TwoFive = X => {
  r = ""
  for (a = n = 0; X[n]; ) {
    b = (a + 4) % 8
    v = X[n] >> a & 31
    b == 7 && n++
    b < a && X[++n] && (v|= X[n] << 8 - a & 31)
    a = (b + 1) % 8
    r+= "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[v]
  }
  return r
}

As this algorithm is not readable anymore and not as small as I want it to be, I started all over. The only thing we need is a little buffer to overcome the downgrade from 8 to 5 bit. So why not writing 8 bit to a buffer, flush 5 bit, read in another 8 bit and flush all 5-digit chunks again. I wanted to give this idea a trial, which worked quite well:

TwoFive = a => {

  s = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"
  n = m = 0
  r = ""

  for (x of a) {
    m|= x << n
    n+= 8
    for (; n > 4; )
      r+= s[m & 31],
      m>>= 5,
      n-= 5
  }
  return n ? r + s[m & 31] : r
}

« Back to problem overview