Clint O
Hi:
I know the FAQ recommends using text where possible, but I've always
been somewhat intrigued by the idea of writing/reading binary data in
a portable way.
Rob Pike published a paper about this and other topics in:
http://plan9.bell-labs.com/sys/doc/comp.html
In there he suggested an implementation that manages to read an
unsigned long assuming that such a type holds 4 bytes and sidesteps
issues of endianness:
ulong getlong(void)
{
    ulong l;
    l = (getchar()&0xFF)<<24;
    l |= (getchar()&0xFF)<<16;
    l |= (getchar()&0xFF)<<8;
    l |= (getchar()&0xFF)<<0;
    return l;
}
It got me thinking how tractable it would be doing things in this way.
So, to check whether I understood the nuances, I did it myself, this
time doing it the way *I* think is the most intuitive: reading the
least significant byte first instead of the most significant:
#include <stdio.h>

unsigned long getlong(void)
{
    unsigned long l;

    /* The UL mask widens each byte to unsigned long before it is shifted. */
    l = (getchar() & 0xFFUL);
    l |= (getchar() & 0xFFUL) << 8;
    l |= (getchar() & 0xFFUL) << 16;
    l |= (getchar() & 0xFFUL) << 24;
    return l;
}

int main(void)
{
    printf("Received %lu\n", getlong());
    return 0;
}
Likewise I wrote something that serialized an unsigned long on the
transmitting side:
#include <stdio.h>

int main(void)
{
    unsigned long l = 0xdeadbeef;

    /* Emit the low 4 bytes, least significant first. */
    putchar(l & 0xFFUL);
    putchar((l >> 8) & 0xFFUL);
    putchar((l >> 16) & 0xFFUL);
    putchar((l >> 24) & 0xFFUL);
    fprintf(stderr, "Outputting %lu\n", l);
    /* sizeof yields a size_t, so cast it to match %lu. */
    fprintf(stderr, "Unsigned long is %lu bytes\n",
            (unsigned long)sizeof(unsigned long));
    return 0;
}
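On the receiving side, I did run into one caveat. With a plain 0xFF
mask, the intermediate expression (getchar() & 0xFF) << 24 has type
int (32 bits here), so a top byte of 0x80 or more shifts into the sign
bit (strictly speaking, undefined behavior) and the negative result
gets extended with 1s when it is converted to unsigned long. Masking
with 0xFFUL avoids this. Note that on this Linux platform unsigned
long is 8 bytes, not 4.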
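One way to avoid the sign-extension caveat above is to widen each byte to unsigned long before shifting it. Here is a minimal sketch that decodes from a byte buffer rather than stdin so it is easy to test; the name getlong_le is hypothetical and not from Pike's paper:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode 4 bytes, least significant byte first. */
unsigned long getlong_le(const unsigned char *p)
{
    assert(p != NULL);

    /* Cast each byte to unsigned long BEFORE shifting, so the shift is
       never carried out on a signed int. */
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}
```

Given the bytes EF BE AD DE, this should return 0xDEADBEEF whether unsigned long is 4 or 8 bytes wide.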
What I found interesting is that he alludes to the fact that you could
use this technique to send structures etc. and issues of padding and
alignment would not be a problem. I assume this is because you'd
transmit every struct member individually using this scheme? I could
imagine a scheme where you took an array of structure offsets and a
pointer to a struct and somehow naively transmitted them using helper
functions.
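To make the member-by-member idea concrete, here is a rough sketch. The struct record, its fields, and all function names are made up for illustration; the point is only that the wire format is 6 bytes no matter how the compiler pads the struct in memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; its in-memory layout may contain padding. */
struct record {
    uint16_t id;
    uint32_t value;
};

/* 2 + 4 bytes on the wire, independent of sizeof(struct record). */
enum { RECORD_WIRE_SIZE = 6 };

static void put_u16_le(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
}

static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

static uint16_t get_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write each member at a fixed wire offset; padding is never sent. */
void record_encode(unsigned char out[RECORD_WIRE_SIZE], const struct record *r)
{
    assert(out != NULL && r != NULL);
    put_u16_le(out, r->id);
    put_u32_le(out + 2, r->value);
}

/* Rebuild the struct from the same fixed wire offsets. */
void record_decode(struct record *r, const unsigned char in[RECORD_WIRE_SIZE])
{
    assert(r != NULL && in != NULL);
    r->id = get_u16_le(in);
    r->value = get_u32_le(in + 2);
}
```

A table-driven variant (an array of offsets and sizes walked by one generic routine) would follow the same pattern; writing one encode/decode pair per struct is just the simplest form of it.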
Also, he does not say how you'd handle floating point data. I assume
there's no way to do that strictly in binary form regardless of
architecture. I also hadn't mentally sorted out how you'd accommodate
differences in the sizes of basic types like unsigned long and
int.
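For what it's worth, here is one possible sketch for floats and for the size question. It is NOT fully portable: it assumes float is IEEE 754 binary32 on both ends and that the host stores floats with the same byte order as 32-bit integers. The fixed-width uint32_t from <stdint.h> pins down the size, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: float is IEEE 754 binary32 on sender and receiver. */
void put_f32_le(unsigned char out[4], float f)
{
    uint32_t bits;

    assert(out != NULL);
    assert(sizeof bits == sizeof f);   /* fails if float is not 32 bits */

    /* Copy the bit pattern into an integer, then emit it byte by byte. */
    memcpy(&bits, &f, sizeof bits);
    out[0] = (unsigned char)(bits & 0xFFu);
    out[1] = (unsigned char)((bits >> 8) & 0xFFu);
    out[2] = (unsigned char)((bits >> 16) & 0xFFu);
    out[3] = (unsigned char)((bits >> 24) & 0xFFu);
}

float get_f32_le(const unsigned char in[4])
{
    uint32_t bits;
    float f;

    assert(in != NULL);
    bits = (uint32_t)in[0] | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);

    /* Copy the reassembled bit pattern back into a float. */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under those assumptions, 1.0f goes out as the bytes 00 00 80 3F. A machine with a different floating-point format would need a real conversion step instead of the memcpy.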
I'm curious whether anyone on here has solved this problem in one way
or another and how they chose to handle it.
Thanks,
-Clint