Here's one that I put together as a testbed for some mainframe-to-unix tools I
was working on. I used this C code as a model for a COBOL program that
manipulated base64 encodings.
int encode(unsigned s_len, char *src, unsigned d_len, char *dst)
Could make src a const char *; and in theory it would be better to use
size_t for the lengths.
{
unsigned triad;
for (triad = 0; triad < s_len; triad += 3)
{
unsigned long int sr;
unsigned byte;
for (byte = 0; (byte<3)&&(triad+byte<s_len); ++byte)
{
sr <<= 8;
sr |= (*(src+triad+byte) & 0xff);
}
This uses sr uninitialized; in practice unsigned ints won't have trap
representations or even padding, but it's still unclean.
I assume/hope you do (most) array references as *(ptr+sub) instead of
ptr[sub] for alignment with the COBOL; it's still ugly.
sr <<= (6-((8*byte)%6))%6; /* leftshift to 6bit align */
Yuck. Confusing *and* inefficient. Why not
sr <<= (3-byte)*(8-6); /* leftshift for skipped bytes less skipped
output chars */
/* determine which sextet value a Base64 character represents */
int tlu(int byte)
{
int index;
for (index = 0; index < 64; ++index)
if (base64[index] == byte)
break;
if (index > 63) index = -1;
return index;
}
Much more natural in C to use strchr, or even memchr; or set up and
use a reverse translation table. COBOL again?
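The reverse-table version would look something like this (the names
rev64 and init_rev64 are mine, not from your program):

```c
#include <string.h>

static const char base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Built once at startup; -1 marks bytes that are not Base64 chars. */
static signed char rev64[256];

static void init_rev64(void)
{
    int i;
    memset(rev64, -1, sizeof rev64);
    for (i = 0; i < 64; ++i)
        rev64[(unsigned char)base64[i]] = (signed char)i;
}

/* Drop-in replacement for tlu(): one array index instead of a
   64-entry linear scan. */
static int tlu(int byte)
{
    return rev64[byte & 0xff];
}
```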
/* Decode source from Base64 encoded string into raw data */
int decode(unsigned s_len, char *src, unsigned d_len, char *dst)
Similarly.
{
unsigned six, dix;
dix = 0;
for (six = 0; six < s_len; six += 4)
{
unsigned long sr;
unsigned ix;
sr = 0;
This time you do initialize sr.
for (ix = 0; ix < 4; ++ix)
{
int sextet;
if (six+ix >= s_len)
return 1;
if ((sextet = tlu(*(src+six+ix))) < 0)
break;
sr <<= 6;
sr |= (sextet & 0x3f);
Don't need this &, a valid char decode never exceeds 6 bits.
}
switch (ix)
{
case 0: /* end of data, no padding */
return 0;
Or padding of a full group of 4 '='s, which at least one of the
standards allows(!), and which your decode does not distinguish from
garbage. If that matters. And of course you don't check the padding
'='s at all; are you requiring your caller(s) to do that? It will be
harder for them, because you don't return any indication of how many
chars were validly decoded, or even into how many bytes.
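If you did want callers to be able to tell pad from garbage, one option
is to give the lookup a third answer; this classify() and its return
codes are my own convention, not anything in your program:

```c
#include <string.h>

/* Classify a Base64 input byte: 0..63 for a data character, -2 for
   the pad character '=', -1 for garbage. */
static int classify(int byte)
{
    static const char base64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    const char *p;
    if (byte == '=')
        return -2;
    if (byte == '\0')
        return -1;          /* strchr would match the terminator */
    p = strchr(base64, byte);
    return p ? (int)(p - base64) : -1;
}
```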
case 1: /* can't happen */
return 2;
(Can't happen *legally*.)
case 2: /* 1 result byte */
sr >>= 4;
if (dix > d_len) return 3;
dix >= d_len, or if you prefer, dix+1 > d_len. Unless your d_len
already allows for at least one additional (perhaps terminator?) byte.
*(dst+dix) = (sr & 0xff);
++dix;
break;
Similarly for the 2-byte and 3-byte cases.
In encode you have an offset stepping through the data but adjust the
pointer and count for output chars; in decode you use offsets on both.
I would prefer to be consistent; in C I think I would adjust the
pointers in all cases; and also use names consistent between the two
directions.
In practice I would probably also loop over only full groups with
their more regular logic, and then handle the more complicated partial
leftovers once, but you don't need and might not even want that for a
reference version.
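For what that shape might look like on the encode side, here is a
sketch with my own names and my own error convention (returns
(size_t)-1 on overflow), not something lifted from your post:

```c
#include <stddef.h>

static const char base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Encode full 3-byte groups with straight-line logic, then handle
   the 1- or 2-byte tail once.  Returns the number of output chars
   written, or (size_t)-1 if dst is too small. */
size_t encode3(size_t s_len, const char *src, size_t d_len, char *dst)
{
    size_t si = 0, di = 0;
    unsigned long sr;

    if (d_len < 4 * ((s_len + 2) / 3))
        return (size_t)-1;

    while (s_len - si >= 3) {          /* full groups, no conditionals */
        sr = ((unsigned long)(src[si]   & 0xff) << 16)
           | ((unsigned long)(src[si+1] & 0xff) << 8)
           |  (unsigned long)(src[si+2] & 0xff);
        dst[di++] = base64[(sr >> 18) & 0x3f];
        dst[di++] = base64[(sr >> 12) & 0x3f];
        dst[di++] = base64[(sr >> 6)  & 0x3f];
        dst[di++] = base64[sr & 0x3f];
        si += 3;
    }
    switch (s_len - si) {              /* partial leftover, handled once */
    case 1:
        sr = (unsigned long)(src[si] & 0xff) << 16;
        dst[di++] = base64[(sr >> 18) & 0x3f];
        dst[di++] = base64[(sr >> 12) & 0x3f];
        dst[di++] = '=';
        dst[di++] = '=';
        break;
    case 2:
        sr = ((unsigned long)(src[si]   & 0xff) << 16)
           | ((unsigned long)(src[si+1] & 0xff) << 8);
        dst[di++] = base64[(sr >> 18) & 0x3f];
        dst[di++] = base64[(sr >> 12) & 0x3f];
        dst[di++] = base64[(sr >> 6)  & 0x3f];
        dst[di++] = '=';
        break;
    }
    return di;
}
```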
- David.Thompson1 at worldnet.att.net