packed structs

Nick Keighley

Having worked on numerous on-the-wire protocols, I've encountered this
problem many times.  The most practical solution for all but the most
trivial cases is to code generate the formatting code.  The source for
the code generator can either be C (or C++ if inheritance helps) code or
some other easy to parse format.  This is one case where I prefer XML
(often in the form of an OpenOffice document) as the "other easy to
parse format".

there's ASN.1 too...
 
Stephen Sprunk

JohnF said:
I'd actually finished (what I think is) a more elegant solution,
that I called smemf() that's like memcpy() but under format
control, including additional format specifiers for hex, for bits,
and for other stuff. The code actually works fine, but still
uncompleted is 723 lines (though that includes >>many<< comments),
which is somewhat of a tail-wagging-dog situation which I also
want to avoid.

Your smemf() looks interesting, but I'm curious why you went with that
syntax (which, despite claiming to be similar to printf, doesn't seem to
end up looking much like it) rather than leveraging the syntax of an
existing system for the same purpose, eg. Perl's pack/unpack.

S
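
For readers who have not met Perl's pack, here is a minimal C sketch of the same idea. The function name pack_bytes() is hypothetical and is not smemf(). Only three of Perl's template codes are handled (C for one byte, v for 16-bit little-endian, n for 16-bit big-endian), and each value is assumed to be passed as an unsigned int.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper, not smemf(): pack the variadic arguments into buf
   as directed by fmt. Codes borrowed from Perl's pack:
     C = one byte, v = 16-bit little-endian, n = 16-bit big-endian.
   Every value must be passed as unsigned int.
   Returns the number of bytes written, or 0 on an unknown code. */
size_t pack_bytes(unsigned char *buf, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        unsigned int v;
        switch (*fmt) {
        case 'C':                           /* one byte */
            v = va_arg(ap, unsigned int);
            buf[n++] = (unsigned char)(v & 0xFF);
            break;
        case 'v':                           /* 16-bit, low byte first */
            v = va_arg(ap, unsigned int);
            buf[n++] = (unsigned char)(v & 0xFF);
            buf[n++] = (unsigned char)((v >> 8) & 0xFF);
            break;
        case 'n':                           /* 16-bit, high byte first */
            v = va_arg(ap, unsigned int);
            buf[n++] = (unsigned char)((v >> 8) & 0xFF);
            buf[n++] = (unsigned char)(v & 0xFF);
            break;
        default:                            /* unknown code: give up */
            va_end(ap);
            return 0;
        }
    }
    va_end(ap);
    return n;
}
```

A call such as pack_bytes(buf, "Cvn", 0x2Cu, 0x1234u, 0x1234u) writes five bytes. The caller is responsible for making buf large enough, since this sketch does no bounds checking.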
 
JohnF

Stephen Sprunk said:
Your smemf() looks interesting, but I'm curious why you went with that
syntax (which, despite claiming to be similar to printf, doesn't seem to
end up looking much like it) rather than leveraging the syntax of an
existing system for the same purpose, eg. Perl's pack/unpack.
S

You didn't read all the followups -- I wasn't aware of perl's
(or python's) pack/unpack, but they were brought to my attention
(thanks again, guys). Then I indeed said I'd be reading up more
carefully about them all, and trying to re-spec smemf() (and maybe
rename it -- pack or spackf???) based on all that stuff,
as the best C variant I can come up with.
By the way, compared with those pack/unpack formats, I think
smemf()'s format string syntax is a lot more C/sprintf-like
(could you be more specific?). That's, of course, the trick:
access all functionality from a format string syntax that's
immediately intuitively sensible to both (a)people already
familiar with perl and/or python pack/unpack and maybe C/sprintf,
as well as (b)people only familiar with C/sprintf. Since I'm a
b-type myself, and wasn't even aware of the a-types until
brought to my attention here, the C/sprintf look-alike was
my sole original goal (which I'd thought was pretty successful,
modulo the minimal unavoidable syntax differences due to
fundamental functional requirements differences).
 
Ben Bacarisse

JohnF said:
By the way, compared with those pack/unpack formats, I think
smemf()'s format string syntax is a lot more C/sprintf-like
(could you be more specific?). That's, of course, the trick:
access all functionality from a format string syntax that's
immediately intuitively sensible to both (a)people already
familiar with perl and/or python pack/unpack and maybe C/sprintf,
as well as (b)people only familiar with C/sprintf. Since I'm a
b-type myself, and wasn't even aware of the a-types until
brought to my attention here, the C/sprintf look-alike was
my sole original goal (which I'd thought was pretty successful,
modulo the minimal unavoidable syntax differences due to
fundamental functional requirements differences).

The main departure from sprintf is that literal characters are not
copied to the destination. That's going to look very odd at first
glance. The main departure from pack/unpack is the lack of support for
alternative byte orderings. As a result, I'm not sure it's all that
close to either familiar "model".
 
JohnF

Ben Bacarisse said:
The main departure from sprintf is that literal characters are not
copied to the destination. That's going to look very odd at first
glance.

Impossible to avoid (I think): what are you supposed to do with
smemf(mem,"deaf"); ?
Is that 4 ascii chars or two hex bytes? Somehow, the
user has to specify what he wants. My solution:
smemf(mem,"deaf %s"); or smemf(mem,"deaf %x");
where a format specification preceded by a literal
is applied to that literal rather than eating the next arg.
If you've got a better idea, please follow up... consider
that both a request and a challenge. I tried and failed
to think of any better idea. But I'd be grateful for one.
I can (probably) code it, but I can't think of it.

Ben Bacarisse said:
The main departure from pack/unpack is the lack of support for
alternative byte orderings. As a result, I'm not sure it's all that
close to either familiar "model".

It does have little/big-endian support as an extra %d flag.
In the middle of the state machine parsing the format string...
/* --- endian option flags (for %d) --- */
case 'l': case 'L': endian = (-1); break; /* little-endian */
case 'b': case 'B': endian = (+1); break; /* or big-endian */
case 'e': case 'E': endian = ENDIAN; break; /*or whatever machine uses*/
I just didn't document it very well in the comments yet.
Sorry about that. I have a back-and-forth technique between
writing up the functional specs as comments in enough detail
to get going, then writing some code to see how it works,
then returning to the comments to fix them up or add some more
or whatever, etc, etc.
It was primarily smemf()'s general idea I wanted to get across
by posting that comment block. Obviously, any shortcomings
in the specific functional details can be corrected and coded.
Of course, when I originally posted those comments, I wasn't
yet aware of python/perl pack and unpack. In that case, I could
have gotten across the general idea just by mentioning those.
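
As an illustration of what such an endian flag has to do, here is a small sketch that stores an integer byte by byte in a chosen order. The name put_uint() is hypothetical. The sign convention (negative for little-endian, positive for big-endian) follows the snippet above, 8-bit bytes are assumed, and the "whatever the machine uses" case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store the low nbytes bytes of value into buf.
   endian < 0 writes the least significant byte first (little-endian),
   endian > 0 writes the most significant byte first (big-endian).
   Because the value is taken apart with shifts, the result does not
   depend on the byte order of the machine running the code. */
void put_uint(unsigned char *buf, unsigned long value,
              size_t nbytes, int endian)
{
    size_t i;

    assert(buf != NULL && endian != 0 && nbytes <= sizeof value);
    for (i = 0; i < nbytes; i++) {
        /* byte i of the value, counting from the least significant end */
        size_t pos = (endian < 0) ? i : nbytes - 1 - i;
        buf[pos] = (unsigned char)((value >> (8 * i)) & 0xFF);
    }
}
```

With this, put_uint(buf, 0x1234UL, 2, -1) stores 0x34 then 0x12, and the same call with +1 stores 0x12 then 0x34.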
 
Ben Bacarisse

JohnF said:
Impossible to avoid (I think): what are you supposed to do with
smemf(mem,"deaf"); ?
Is that 4 ascii chars or two hex bytes?

My point was that it's 4 chars (the encoding is whatever your C
implementation decides, it need not be ASCII) in sprintf and not doing
that causes a difference. You said you'd aimed for (or achieved)
"minimal unavoidable syntax differences".

JohnF said:
Somehow, the
user has to specify what he wants. My solution:
smemf(mem,"deaf %s"); or smemf(mem,"deaf %x");
where a format specification preceded by a literal
is applied to that literal rather than eating the next arg.

In sprintf et al. the number of required arguments is equal to the
number of format specifiers and everything else is literal bytes. I am
not saying this is wrong, I am saying that it does not match my notion
of "minimal unavoidable syntax differences" from sprintf.

<snip>
 
JohnF

Ben Bacarisse said:
My point was that it's 4 chars (the encoding is whatever your C
implementation decides, it need not be ASCII) in sprintf and not doing
that causes a difference. You said you'd aimed for (or achieved)
"minimal unavoidable syntax differences".


In sprintf et al. the number of required arguments is equal to the
number of format specifiers and everything else is literal bytes. I am
not saying this is wrong, I am saying that it does not match my notion
of "minimal unavoidable syntax differences" from sprintf.
<snip>

Well, part of what you snipped was my request for an alternative.
Lacking that, it matches the notion of "minimal unavoidable
syntax differences" because of the "unavoidable" part.
You have to somehow deal with "deaf". And, even worse, with "10",
which could be ascii/decimal/hex/bits, and which I deal with
in one consistent way, "10%s"/"10%d"/"10%x"/"10%b".
For someone familiar and comfortable with sprintf formats,
that's the most intuitively sensible way I could think of to
deal with the problem. And it >>must<< be dealt with somehow.
If you've got a better idea, I'd be happy to code it.
Otherwise, it's an unavoidable difference.
 
BartC

JohnF said:
It was primarily smemf()'s general idea I wanted to get across
by posting that comment block. Obviously, any shortcomings
in the specific functional details can be corrected and coded.
Of course, when I originally posted those comments, I wasn't
yet aware of python/perl pack and unpack. In that case, I could
have gotten across the general idea just by mentioning those.

Actually this seems a much better idea to me now, than it did at first ...
if you forget about the purpose of using it in place of C's packed structs.

It's just an alternative, possibly simpler way of writing to a binary stream
than trying to use -printf() functions, or sequences of function calls. And
presumably to read from them too.

But you didn't explain clearly how it would be used. In the GIF example,
presumably you'd have an *unpacked* struct representing the header
information, which allows the program to access fields, now properly
aligned, using the conventional forms of 'p.a' or 'p->a'.

smemf() (and its -scanf() counterpart) would simply be used to write to the
proper packed form, or to read from it.

So they are not a plug-in replacement for '#pragma pack()'.
 
JohnF

BartC said:
Actually this seems a much better idea to me now, than it did at first ...
if you forget about the purpose of using it in place of C's packed structs.

It's just an alternative, possibly simpler way of writing to a binary stream
than trying to use -printf() functions, or sequences of function calls. And
presumably to read from them too.

But you didn't explain clearly how it would be used. In the GIF example,
presumably you'd have an *unpacked* struct representing the header
information, which allows the program to access fields, now properly
aligned, using the conventional forms of 'p.a' or 'p->a'.

Well, you'd use it however you liked. But for my gif situation,
I'd envisioned >>doing away with structs entirely<<, packed or not.
The smemf >>format string totally replaces<< the need for any struct.
And, of course, I prototyped that for myself, i.e., wrote some
pseudocode using the as-yet-uncompleted smemf, just to make sure
that idea seems to work.
You can see exhaustively complete comments about the gif block
formats at forkosh.com/gifsave89.html by clicking the Listing link
along the left-hand side under "Related Pages", and scrolling down
to line#493 for the GIFIMAGEDESCRIP struct. My "pseudocode" for that
is just one smemf statement that totally replaces the struct,

nbitsinbuffer = /* whitespace in smemf format string is ignored */
smemf(buffer, " 2C %x " /* Image Descriptor identifier is hex 2C */
" %2ld" /* 2-byte little-endian word for X-pos */
" %2ld" /* 2-byte little-endian word for Top */
" %2ld" /* 2-byte little-endian word for Width */
" %2ld" /* 2-byte little-endian word for Height */
/* following is the "Packed" Byte consisting of five bit fields */
" %3b " /* 3-bits #colorbits */
" 0 %2b " /* 2-bits "reserved bits" */
" 0 %1b " /* 1-bit local colortable sort flag */
" 0 %1b " /* 1-bit interlace flag */
" %1b ", /* 1-bit local colortable flag */
col0,row0, ncols,nrows, ncolorbits, (colortable==NULL?0:1) );

And smemf() returns the size, in #bits, of the buffer it constructs.
That would usually be a multiple of 8, in which case you can just
fwrite(buffer,etc), or do whatever you want with it.
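
Since smemf() is not finished, here is a hand-written sketch of the ten bytes that format string describes, for comparison. The function name and parameter names are hypothetical. The bit order inside the packed byte follows the order of the %b fields above (first field in the lowest bits), and 8-bit bytes are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Store a 16-bit value low byte first, as the GIF format requires. */
static void put_le16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}

/* Hypothetical helper: fill buf (at least 10 bytes) with an image
   descriptor block. Reserved, sort and interlace bits are written as 0,
   matching the literal 0s in the format string.
   Returns the number of bytes written, which is always 10. */
size_t gif_image_descriptor(unsigned char *buf,
                            unsigned int left, unsigned int top,
                            unsigned int width, unsigned int height,
                            unsigned int ncolorbits, int has_colortable)
{
    assert(buf != NULL);
    buf[0] = 0x2C;                       /* Image Descriptor identifier */
    put_le16(buf + 1, left);             /* X-pos  */
    put_le16(buf + 3, top);              /* Top    */
    put_le16(buf + 5, width);            /* Width  */
    put_le16(buf + 7, height);           /* Height */
    buf[9] = (unsigned char)((ncolorbits & 0x07)          /* bits 0-2 */
                           | (has_colortable ? 0x80 : 0)); /* bit 7    */
    return 10;
}
```

The result can be handed to fwrite(buf, 1, 10, fileptr) with no struct and no padding involved, which is the effect the single smemf() statement is after.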
 
Ben Bacarisse

JohnF said:
Well, part of what you snipped was my request for an alternative.
Lacking that, it matches the notion of "minimal unavoidable
syntax differences" because of the "unavoidable" part.

I gave you my suggestion which is why I snipped the request -- literal
text just represents those characters. That's what sprintf does and
therefore offers the least "surprise".

JohnF said:
You have to somehow deal with "deaf". And, even worse, with "10",
which could be ascii/decimal/hex/bits, and which I deal with
in one consistent way, "10%s"/"10%d"/"10%x"/"10%b".
For someone familiar and comfortable with sprintf formats,
that's the most intuitively sensible way I could think of to
deal with the problem. And it >>must<< be dealt with somehow.

Doing what sprintf does must surely be closer than doing something new.
This problem that you insist must be dealt with seems to me to be an
invented one: it's intuitive (to me at least) that "deaf" in a format
means copy a 'd' an 'e' an 'a' and then an 'f' to the destination.
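
A short sketch of the sprintf behaviour being described: the literal characters in the format go to the destination unchanged, and only the conversion specification consumes an argument. The wrapper name format_deaf() is hypothetical and exists only to make the example callable.

```c
#include <assert.h>
#include <stdio.h>

/* Format the literal text "deaf " followed by one value in hex,
   exactly as sprintf understands the format string.
   out must have room for at least 5 + 2 + 1 characters when
   value fits in one byte. Returns what sprintf returns. */
int format_deaf(char *out, unsigned int value)
{
    assert(out != NULL);
    /* 'd', 'e', 'a', 'f' and the space are copied as characters;
       %02x is replaced by the argument as two lowercase hex digits. */
    return sprintf(out, "deaf %02x", value);
}
```

Called with 0x2C, this leaves the seven characters "deaf 2c" in out. Under the smemf() convention discussed in this thread, the same format string would instead apply %x to the preceding literal and take no argument.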
 
JohnF

Ben Bacarisse said:
it's intuitive (to me at least) that "deaf" in a format
means copy a 'd' an 'e' an 'a' and then an 'f' to the destination.

Okay, good to know. Thanks so much for your thoughts.
 
Stephen Sprunk

JohnF said:
You didn't read all the followups -- I wasn't aware of perl's (or
python's) pack/unpack, but they were brought to my attention (thanks
again, guys). Then I indeed said I'd be reading up more carefully
about them all, and trying to re-spec smemf() (and maybe rename it --
pack or spackf???) based on all that stuff, as the best C variant
I can come up with.

Fair enough.

JohnF said:
By the way, compared with those pack/unpack formats, I think
smemf()'s format string syntax is a lot more C/sprintf-like (could
you be more specific?).

Aside from your use of percent signs for format specifiers, I don't see
much in common, and in fact I find your use of percent signs to be quite
misleading since they behave rather differently than printf()'s.

JohnF said:
That's, of course, the trick: access all functionality from a format
string syntax that's immediately intuitively sensible to both
(a)people already familiar with perl and/or python pack/unpack and
maybe C/sprintf, as well as (b)people only familiar with C/sprintf.

I don't see how (a) could have been a goal if you hadn't been aware of
Perl's pack()/unpack() when you developed your smemf() interface.

JohnF said:
Since I'm a b-type myself, and wasn't even aware of the a-types until
brought to my attention here, the C/sprintf look-alike was my sole
original goal (which I'd thought was pretty successful, modulo the
minimal unavoidable syntax differences due to fundamental functional
requirements differences).

Some differences were unavoidable, but others were not.

For instance, sprintf() copies the format string verbatim to the output
except for format specifiers, which are replaced by formatted arguments.
smemf() discards whitespace and puts literal arguments in the format
string, which are consumed by subsequent format specifiers.

More detail in my response to your example.

S
 
Stephen Sprunk

JohnF said:
Well, you'd use it however you liked. But for my gif situation,
I'd envisioned >>doing away with structs entirely<<, packed or not.
The smemf >>format string totally replaces<< the need for any struct.
And, of course, I prototyped that for myself, i.e., wrote some
pseudocode using the as-yet-uncompleted smemf, just to make sure
that idea seems to work.
You can see exhaustively complete comments about the gif block
formats at forkosh.com/gifsave89.html by clicking the Listing link
along the left-hand side under "Related Pages", and scrolling down
to line#493 for the GIFIMAGEDESCRIP struct. My "pseudocode" for that
is just one smemf statement that totally replaces the struct,

nbitsinbuffer = /* whitespace in smemf format string is ignored */
smemf(buffer, " 2C %x " /* Image Descriptor identifier is hex 2C */
" %2ld" /* 2-byte little-endian word for X-pos */
" %2ld" /* 2-byte little-endian word for Top */
" %2ld" /* 2-byte little-endian word for Width */
" %2ld" /* 2-byte little-endian word for Height */
/* following is the "Packed" Byte consisting of five bit fields */
" %3b " /* 3-bits #colorbits */
" 0 %2b " /* 2-bits "reserved bits" */
" 0 %1b " /* 1-bit local colortable sort flag */
" 0 %1b " /* 1-bit interlace flag */
" %1b ", /* 1-bit local colortable flag */
col0,row0, ncols,nrows, ncolorbits, (colortable==NULL?0:1) );

And smemf() returns the size, in #bits, of the buffer it constructs.
That would usually be a multiple of 8, in which case you can just
fwrite(buffer,etc), or do whatever you want with it.

This example shows how far you have diverged from sprintf(). From such
a claim, I would have expected something more like this:

nbitsinbuffer = smemf(buffer,
"%1lu" /* Image Descriptor, 8-bit little-endian unsigned int */
"%2lu" /* X-Pos, 16-bit little-endian unsigned int */
"%2lu" /* Top, 16-bit little-endian unsigned int */
"%2lu" /* Width, 16-bit little-endian unsigned int */
"%2lu" /* Height, 16-bit little-endian unsigned int */
/* following is the "Packed" Byte consisting of five bit fields */
"%3b" /* #colorbits, 3-bit field */
"%2b" /* reserved, 2-bit field */
"%1b" /* local colortable sort flag, 1-bit field */
"%1b" /* interlace flag, 1-bit field */
"%1b", /* local colortable flag, 1-bit field */
0x2C, col0, row0, ncols, nrows, ncolorbits, 0, 0, 0,
(colortable==NULL?0:1) );

Notice that whitespace is _not_ ignored and that there are ten arguments
corresponding to ten format specifiers. I also used %u rather than %d
since I'm pretty sure those ints are supposed to be unsigned (but I'm
not sure that makes a difference here).

Of course, your use of "%ld" to mean "little-endian word" rather than
"long signed integer" is a major difference as well, though that's
forgivable since you obviously need more control over representation
than sprintf()'s specifiers offer. When thinking it through, though,
that was the point at which I decided that trying to reuse those
specifiers was probably more trouble than it was worth.

S
 
JohnF

Stephen Sprunk said:
This example shows how far you have diverged from sprintf(). From such
a claim, I would have expected something more like this:

nbitsinbuffer = smemf(buffer,
"%1lu" /* Image Descriptor, 8-bit little-endian unsigned int */
"%2lu" /* X-Pos, 16-bit little-endian unsigned int */
"%2lu" /* Top, 16-bit little-endian unsigned int */
"%2lu" /* Width, 16-bit little-endian unsigned int */
"%2lu" /* Height, 16-bit little-endian unsigned int */
/* following is the "Packed" Byte consisting of five bit fields */
"%3b" /* #colorbits, 3-bit field */
"%2b" /* reserved, 2-bit field */
"%1b" /* local colortable sort flag, 1-bit field */
"%1b" /* interlace flag, 1-bit field */
"%1b", /* local colortable flag, 1-bit field */
0x2C, col0, row0, ncols, nrows, ncolorbits, 0, 0, 0,
(colortable==NULL?0:1) );

Thanks for the alternative spec. I'll keep it in mind.
I agree that your format's more sprintf-like. Maybe I
should "deprecate" that literal requirement. More
about why in response to your specific remarks below...

Stephen Sprunk said:
Notice that whitespace is _not_ ignored and that there are ten arguments
corresponding to ten format specifiers.

Both of which I'd consider bad news. Certainly, ignoring
whitespace is different. But notice that your format string
has none. And likely never would, because how often is
whitespace going to be part of a binary packet format
specification? So it's basically no use for this particular
purpose.
And you also don't get constants in your formats.
Instead, you've got that 0x2C and 0,0,0 in the arguments
which "force-feeds" the "%1b" specifiers. And which
the user has to know about. Instead, I'm moving those
same 0x2C and 0's into the format string
preceding the %1b. Different, again, but serves
the purpose better. The user shouldn't have to
remember magic numbers like 0x2C, and where to put
them.
Instead, maybe some .h file with all the format
strings describing the packets for some particular
protocol/whatever would embed all that info and
hide it from the user. He'd just say something like,
#include "packetformatstrings.h"
unsigned char *thispacketbuffer = malloc(whatever);
...
smemf(thispacketbuffer, thispacketformatstring,
all,the,numbers,I,actually,care,about,and,no,others);
And that would do the entire job, with the user taking
care of his business only, and the binary packet formatting
pretty much entirely handled for him.
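
One way to sketch that "layout lives in a header, caller supplies only the values" idea without a format string is a small descriptor table. Everything here is hypothetical: struct field, demo_packet and pack_fields() are invented for illustration, fields are limited to 1 or 2 bytes, and 2-byte fields are written little-endian.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor. A constant field carries its own value,
   so the caller never has to remember magic numbers such as 0x2C. */
struct field {
    size_t nbytes;          /* 1 or 2; 2-byte fields are little-endian */
    int is_constant;        /* nonzero: emit 'constant', take no value */
    unsigned int constant;
};

/* Made-up packet layout: magic byte 0x2C, then two 16-bit values. */
static const struct field demo_packet[] = {
    { 1, 1, 0x2C },
    { 2, 0, 0 },
    { 2, 0, 0 },
};

/* Pack nfields fields into buf. 'values' supplies one entry for each
   non-constant field, in order. Returns the number of bytes written. */
size_t pack_fields(unsigned char *buf, const struct field *f,
                   size_t nfields, const unsigned int *values)
{
    size_t n = 0, i, b;

    assert(buf != NULL && f != NULL);
    for (i = 0; i < nfields; i++) {
        unsigned int v = f[i].is_constant ? f[i].constant : *values++;
        for (b = 0; b < f[i].nbytes; b++)       /* low byte first */
            buf[n++] = (unsigned char)((v >> (8 * b)) & 0xFF);
    }
    return n;
}
```

The caller passes only the numbers it cares about, for example an array holding two values for demo_packet, and the constant byte is filled in from the table.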

Stephen Sprunk said:
I also used %u rather than %d
since I'm pretty sure those ints are supposed to be unsigned (but I'm
not sure that makes a difference here).

Of course, your use of "%ld" to mean "little-endian word" rather than
"long signed integer" is a major difference as well, though that's
forgivable since you obviously need more control over representation
than sprintf()'s specifiers offer. When thinking it through, though,
that was the point at which I decided that trying to reuse those
specifiers was probably more trouble than it was worth.

Forget the characters %u vs %d and length modifier, flags, etc.
That's easily/trivially changed. In fact, I'd already changed
my mind to %<d (or %<u if you prefer) for little endian,
and the obvious for big, but might change it again.
Just focus on the functionality. The format string syntax,
at least so far as specifiers, flags, modifiers, etc are
concerned, can be decided later. For now, they're just
illustrative -- you have to write something while discussing it.
 
Stephen Sprunk

JohnF said:
Thanks for the alternative spec. I'll keep it in mind. I agree that
your format's more sprintf-like. Maybe I should "deprecate" that
literal requirement. More about why in response to your specific
remarks below...

Note that I'm not saying that my syntax is better; I'm just saying that
it is more like a sprintf() format string, which may actually turn out
to be worse than coming up with something completely new.

JohnF said:
Both of which I'd consider bad news. Certainly, ignoring whitespace
is different. But notice that your format string has none. And likely
never would, because how often is whitespace going to be part of a
binary packet format specification? So it's basically no use for this
particular purpose.

sprintf() copies everything in the format string (including whitespace)
to its output, replacing format specifiers as it goes. If you're going
to claim to be similar to sprintf(), then you need to do the same.

JohnF said:
And you also don't get constants in your formats. Instead, you've got
that 0x2C and 0,0,0 in the arguments which "force-feeds" the "%1b"
specifiers. And which the user has to know about. Instead, I'm moving
those same 0x2C and 0's into the format string preceding the %1b.
Different, again, but serves the purpose better. The user shouldn't
have to remember magic numbers like 0x2C, and where to put them.

I can see cases where I might want to include literal bytes in the
format string, such as the "GIF89a" at the beginning of GIF files,
without having to use a format specifier. In some cases, those literal
bytes might happen to include "whitespace".

Instead, rather than doing the same thing as sprintf(), you chose to
have format specifiers modify the preceding literal text in the format
string and treat whitespace specially, which is completely unlike sprintf().

JohnF said:
Forget the characters %u vs %d and length modifier, flags, etc.
That's easily/trivially changed. In fact, I'd already changed my mind
to %<d (or %<u if you prefer) for little endian, and the obvious for
big, but might change it again.

I'm curious how you'd specify middle-endian. Presumably, without one of
those three modifiers, it would use native byte order?

JohnF said:
Just focus on the functionality. The format string syntax, at least
so far as specifiers, flags, modifiers, etc are concerned, can be
decided later. For now, they're just illustrative -- you have to
write something while discussing it.

My point was merely to illustrate why I said your syntax didn't seem to
be similar to sprintf() as you had claimed. I'm not debating whether it
is a good syntax without such a claim.

S
 
W Karas

Any >>portable<< way to accomplish that in c?
Don't want to use __attribute__((__packed__))
or #pragma pack, etc, nor #ifdef's to choose
among whatever alternatives I happen to know
about. It's a requirement that the code remain
portable.

In particular, I'm trying to write blocks that
conform to a binary file format (gif), and can
set up structs for them easily enough, but can't
fwrite(blockstruct,sizeof(blockstruct),1,fileptr),
or the like, due to blockstruct's inevitable
padding (which indeed occurs for gif format blocks).

At the moment, I just have a different func for
each block type that writes out the members of that
particular struct individually... b..o..r..i..n..g.
A generalization of that idea (if portable packing's
not possible) would also be fine: >>if<< there's some
way to reference the members of a struct, passed as
an argument but of unknown (to the func) type, in a
loop, i.e., for(i=0;i<nmembers;i++)thisstruct->member.
Then I could offsetof() and sizeof() each member, and
write it out, so just one (much less boring) func
could handle all the different block type structs.

But, afaik, I don't think that thisstruct->member
thing is possible, nor portable packing. So is there
any "one size fits all" way to handle this problem?
I'm sure people must come across it frequently enough
that it's been thought about, and the best possible
approach (among, possibly, several bad alternatives)
has been identified.


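#include <limits.h> /* CHAR_BIT */

/* A 2-byte "packable int": an array of bytes, so no alignment padding. */
typedef unsigned char P2b[2];

/* Store VAL big-endian: high byte in [0], low byte in [1]. */
#define SETP2(PTR, VAL) \
((PTR)[0] = (VAL) >> CHAR_BIT, (PTR)[1] = (VAL) & ((1 << CHAR_BIT) - 1))

/* Read the value back: [0] is the high byte, [1] is the low byte. */
#define GETP2(PTR) \
(((PTR)[0] << CHAR_BIT) | (PTR)[1])

Larger "packable ints" are analogous. My choice of big-endian here is arbitrary.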
 
