C question

  • Thread starter Kenny McCormack

Kenny McCormack

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...
 
BartC

Kenny McCormack said:
I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

This doesn't look to be a very difficult project,

On the contrary, it seems to me that it *is* difficult, since you'd have to
create half a compiler for it to do the job (for example, needing to analyse
and expand a few thousand lines of headers in order to work out the size of
a typedef used for a particular struct member).

Slightly simpler, but not much, is for the analyser to take the C struct
definition, and to create a program using it which when run, prints out all
the members and offsets involved. It depends also on how these definitions
are presented: whether an arbitrary C program is an input, and the output is
a detailed list of all the structs encountered.

but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

I'd be interested too because I have a similar problem, but in my case the
other language does have structs. I solve it by manually constructing a
matching struct in this other language which exactly corresponds, in member
types, sizes, order and offsets, to the original C version. That's not so
simple; it can involve running a dummy C program to display the sizes and
offsets where they are not obvious.

(I guess gcc doesn't have an option to print out such a list, or you would
have known of it.)
 
Ian Collins

Kenny said:
I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

If they're your structs, generate both the C and other language code
from an alternative source.

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

It would probably involve a good percentage of gcc....
 
Shao Miller

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

I started tackling something possibly related to this:


http://git.zytor.com/?p=users/sha0/...7;hb=95e4b5dedc01d6392ea4401fa1016a4c1418c467

But haven't yet finished it. The goal was to use macro magic to both
generate the structure type definitions as well as to generate
"descriptors" that could be used to serialize and deserialize such
structures.

Unfortunately, it means that IDEs or tools which try to parse such
source code in order to offer auto-completion information like:

foo.
    ^ - bar
      - baz

are less likely to be able to, due to the macro obfuscation. If I
recall correctly, several people have suggested to me that they would
actually prefer a tool which:

1. Parses a file whose format is _your_ design
2. Generates C source code from that file
   - Including the structure type definition, in a header
   - And maybe even a non-header file for serializing/deserializing
     such structures

That way, the C source looks like C source, instead of a submission for
IOCCC, and that way, tools which can process mundane C source can
stomach it.
 
Barry Margolin

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

Are you sure you're not reinventing the wheel? Because C is so
ubiquitous, most other languages have tools for "foreign calling", so
that they can be linked to C libraries.

What's the other language?
 
Kenny McCormack

Ian Collins said:
If they're your structs, generate both the C and other language code
from an alternative source.

I'm primarily interested in the system structs - the ones used in system
calls - e.g., "stat()".

It would probably involve a good percentage of gcc....

The overall plan is to use cc (specifically, gcc, although there's probably
not much specific to gcc here) to do as much of the work as possible. I
can use something like "gcc -E /usr/include/whatever.h" to get the fully
preprocessed version of the struct. Then I can use a scripting language
(e.g., AWK, Perl, etc.) to turn that into a C program that uses offsetof(3)
on each struct member and prints the result. Finally, I compile and run the
C program to generate a table of "member, offset" for each member.

It is, of course, the middle part that requires work on my part.

--
Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
an 8 bit operating system based on a 4 bit architecture from a 2 bit company
that can't stand 1 bit of competition.

Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
operating system based on a 4 bit architecture from a 2 bit company that
can't stand 1 bit of competition.
 
BartC

Kenny McCormack said:
I'm primarily interested in the system structs - the ones used in system
calls - e.g., like "stat()".

How many system structs are there likely to be? It might be simpler to
hardcode, by hand, a list of struct and member names, and use the scripting
language on that. If you're not interested in member types and sizes, that
simplifies it further.

(I'm assuming system structs aren't going to change much.)
 
Richard Kettlewell

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C
structs from another language that doesn't have structs - it only has
offsets. I can use the offsetof(3) macro to generate the offsets; the
actual problem is generating the list of all the members. In particular, if
members are themselves structs, then you will need to recursively expand
them out.

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

I suggest starting with gcc -fdump-translation-unit. That will give you
GCC’s parse tree. I’m not sure if the syntax is documented anywhere but
it doesn’t look particularly unclear.

It does actually include the offsets but those will be tied to the code
generation target of the GCC you use; if you’re generating anything
intended to be even slightly portable that won’t be of any use to you
and your plan to use offsetof is better.

You don’t want to write a GCC-compatible C parser if you can reasonably
avoid it.
 
shivshankar.dayal

The first step is to preprocess the code. Standalone C preprocessors are available, or you can use the compiler's own preprocessor. The second step is to analyze the structures, which can be tedious: you would have to parse the code against a BNF grammar, much as a compiler does. I am sure you want to avoid that, however. If you are not opposed to using GCC, it exposes the entire AST of a program to you, and you can write a plugin that walks that AST to find the offsets.

Does not look like a very difficult task.


Best regards,
Shiv

http://libreprogramming.org
 
Kenny McCormack

The first step you need to do is pre-process code. There are c
pre-processors available or you can use preprocessor of compiler itself.
Second is to analyze the structures. Now this can be tedious. You will
have to parse the code by BNF grammar much like a compiler. However, I
am sure you want to avoid that. If you are not opposed to using GCC then
GCC exposes entire AST to you of a program. From that AST you can write
a plugin to find offset.

Does not look like a very difficult task.

That was my take, as you can see from the OP. And it may come to that - to
re-inventing this wheel, one more time (myself).

But I'm sure that this task has been done thousands of times, by hundreds of
people, for hundreds of reasons. The trick is finding their work. That was
the reason for the NG post.
 
Kenny McCormack

Richard Kettlewell said:
I suggest starting with gcc -fdump-translation-unit. That will give you
GCC’s parse tree. I’m not sure if the syntax is documented
anywhere but it doesn’t look particularly unclear.

I see the length of the field, but not the offset. Does this mean I have to
manually add up the lengths, as I go, to find the offset?

@3140 identifier_node strg: XX_YYYY lngt: 7

It does actually include the offsets but those will be tied to the code
generation target of the GCC you use; if you’re generating anything
intended to be even slightly portable that won’t be of any use to you
and your plan to use offsetof is better.

Not sure I get your drift here, but, no, portability is not an issue.

You don’t want to write a GCC-compatible C parser if you can reasonably
avoid it.

Indeed not.

--
Religion is regarded by the common people as true,
by the wise as foolish,
and by the rulers as useful.

(Seneca the Younger, 65 AD)
 
Jorgen Grahn

["Followup-To:" header set to comp.lang.c.]

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C ....
This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

Look at c2ph/pstruct which comes with Perl.

/Jorgen
 
Richard Kettlewell

I see the length of the field, but not the offset. Does this mean I have to
manually add up the lengths, as I go, to find the offset?

@3140 identifier_node strg: XX_YYYY lngt: 7

That’s just the name, you need to go up one level to the corresponding
field_decl and follow the bpos link to find the offset (in bits). Or
rather, you walk the tree from the root down to the structure definition
(record_type) and follow the chain of fields through. (But as discussed
below you’re better off only extracting the names and treating the
compiler’s idea of the offset as hearsay anyway.)

Not sure I get your drift here, but, no, portability is not an issue.

Suppose your struct is this:

struct foo {
    void *a;
    int b;
};

If you ask a compiler targeting a 64-bit ISA then the offset of ‘b’ is
8; for a 32-bit ISA it’ll be 4. If you build those values directly into
your code then it’ll only work on platforms with the same bitness (and
possibly more narrowly than that). Any numbers you extract from
compiler intermediate output will suffer from this problem.
 
Kenny McCormack

["Followup-To:" header set to comp.lang.c.]

I'm thinking of writing a tool to analyze C structs - specifically to
generate a mapping between each struct member and its offset (from the
beginning of the struct). The reason for this is that I need to access C ...
This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

Look at c2ph/pstruct which comes with Perl.

Thank you for this. This seems (*) to be exactly what I am looking for and
precisely in line with my reason for posting. I was sure that this had been
done before. Note that the Perl need is pretty much in line with my own
need - that is, a need to access structs by offset rather than by name.

(*) I say "seems" because, unfortunately, in the test case that I did, it
didn't work right. Which is strange, because I think the Perl guys (TC in
particular) do good work. Strange that it would break (give wrong results)
in my first and only test case. Oh well, this is a QOI issue; I may or may
not investigate further.

In any case, if anyone has any other pointers-to-existing-work, please
continue to send them in. I.e., the Perl solution *is* the sort of thing
I'm looking for - but I'd like something that actually works correctly.

--
Religion is regarded by the common people as true,
by the wise as foolish,
and by the rulers as useful.

(Seneca the Younger, 65 AD)
 
Kenny McCormack

This doesn't look to be a very difficult project, but I'm curious if there
is already something out there that does it. I.e., to avoid wheel
re-invention...

pahole(1)

http://lwn.net/Articles/365844/

pahole does indeed look good. As I've said all along, I'm sure someone has
done this - so it just needed to be found. Thanks for pointing me to it.

Two comments:

1) I wish there were more documentation. The man page, in true Unix style,
doesn't tell you anything unless you already understand what most of the
options do.

2) It doesn't do the one thing I had most hoped for - namely, recursively
expanding out the structs. I.e., my struct contains some other struct and
all I get in the output of pahole is a reference to the other struct (and
its correct length) - not an explicit enumeration of the elements of the
sub-struct. Oh well. I can live with this.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...
 
