question on structs and memory blocks

Alfonso Morra

Hi,

I am having some probs with copying memory blocks around (part of a
messaging library) and I would like some confirmation to make sure that
I'm going about things the right way.

I have some data types defined thus:

typedef enum {
    ONE,
    TWO
} EnumOne;

typedef enum {
    VAL_LONG,
    VAL_DOUBLE,
    VAL_STRING,
    VAL_DATASET
} DataType;

typedef union {
    long lval;
    double fval;
    char *sval;
    void *ptr;
} Value;


typedef struct {
    /* Header */
    int x;
    int y;
    int z;
    char *s1; /* Always set to NULL for this discussion */
    char *s2; /* Always set to NULL for this discussion */

    /* Payload */
    EnumOne e1;
    DataType e2;
    Value v;
    char *s3;
    size_t size;
} MyStruct;


I have two questions:

1). Is my calculation of the number of bytes used by MyStruct correct?

MyStruct *ptr = NULL ;
....
/* calculate size */
ptr->size = 3*sizeof(int) + 2*sizeof(char*) +
sizeof(EnumOne) + sizeof(DataType)+ sizeof_value(val,valType) +
((strlen(ptr->s3)+1)* sizeof(char)) + sizeof(size_t) ;


where the function sizeof_value is defined as follows:

size_t sizeof_value(Value x, DataType y) {
    if (y == VAL_LONG) return sizeof(long);
    else if (y == VAL_DOUBLE) return sizeof(double);
    else if (y == VAL_STRING) return (strlen(x.sval) + 1) * sizeof(char);
    else return 0; /* you're f***ed (only 3 types supported for now) */
}

Assuming my size calculation above is correct, I will be able to copy
data from my (suitably populated) data structure to another variable,
using the value of the size element of the struct as the number-of-bytes
argument of either memcpy or memmove - right?

i.e.

MyStruct *newBlock = calloc(1, sizeof(MyStruct));

/* The line below can cause a crash if oldBlock has a string value -
how do I overcome this? */

memcpy( newBlock, oldBlock, oldBlock->size) ;
 
ajm

Hi Alfonso,

1) No, you do not need to dissect your struct and compute the sum of the
component sizes in this fashion! You can just apply sizeof to
MyStruct, and in fact it is better to do so, since sizeof includes any
padding etc. necessary to accommodate the structure components. Also, I
see a strlen in there (?). Your structure does not contain the strings
but merely pointers to them; it is the cost of these pointers that
contributes to the size of the structure rather than the length of the
strings.

2) You need to be clear about what you mean by "copying" a structure: do
you want to "copy the pointers" (after which both structures contain
pointers to the same strings in memory), or do you want to "copy what
the pointers point to" (after which each structure has pointers to its
own copies of the strings, which could change independently over time)?
In the former case a memcpy of size sizeof(MyStruct) suffices; in the
latter a member-by-member copy (perhaps using strdup for the char
pointers) might be what you need.
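
To make the first kind of copy concrete, here is a minimal sketch. It assumes a cut-down, hypothetical struct called Small standing in for MyStruct; the names Small, shallow_copy and shares_string are made up for illustration. It shows that copying sizeof(Small) bytes copies the pointer value only, so both structures end up referring to the same string:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down stand-in for MyStruct. */
typedef struct {
    int   x;
    char *s3;   /* points at a string stored elsewhere */
} Small;

/* Shallow copy: copies sizeof(Small) bytes, i.e. the int and the
   pointer value itself. The string it points at is not duplicated. */
void shallow_copy(Small *dst, const Small *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 if both structures point at the same string storage. */
int shares_string(const Small *a, const Small *b)
{
    return a->s3 == b->s3;
}
```

After shallow_copy(&b, &a), shares_string(&a, &b) is true, and a change made to the string through a.s3 is visible through b.s3. Note also that sizeof(Small) is the same whether the string is 16 or 1024 characters long. A plain assignment, b = a, would have the same effect as the memcpy.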

hth,
ajm.
 
Alfonso Morra

ajm said:
Hi Alfonso,

1) No, you do not need to dissect your struct and compute the sum of the
component sizes in this fashion! You can just apply sizeof to
MyStruct, and in fact it is better to do so, since sizeof includes any
padding etc. necessary to accommodate the structure components. Also, I
see a strlen in there (?). Your structure does not contain the strings
but merely pointers to them; it is the cost of these pointers that
contributes to the size of the structure rather than the length of the
strings.

Think about it. If one structure contains a string (char*) of length
1024, and another variable of the same data type contains a string
(char*) of length 16, sizeof (DataStructure) will not report any
difference in size between the two variables. This is patently wrong
(see below).

2) You need to be clear about what you mean by "copying" a structure: do
you want to "copy the pointers" (after which both structures contain
pointers to the same strings in memory), or do you want to "copy what
the pointers point to" (after which each structure has pointers to its
own copies of the strings, which could change independently over time)?
In the former case a memcpy of size sizeof(MyStruct) suffices; in the
latter a member-by-member copy (perhaps using strdup for the char
pointers) might be what you need.

I may not have asked the question correctly. But put simply, I need to
make a copy of a nested structure. A 'proper' copy, so that the
structure can be sent to another process on a different machine - you
may recall that I DID say that I was writing a messaging library. It
would be the second type of copying that I am interested in, since the
data is being sent across process boundaries and any raw pointers would
be meaningless in another process space.
 
Eric Sosman

Alfonso Morra wrote On 09/28/05 17:21,:
ajm wrote:




Think about it. If one structure contains a string (char*) of length
1024, and another variable of the same data type contains a string
(char*) of length 16, sizeof (DataStructure) will not report any
difference in size between the two variables. This is patently wrong
(see below).

I can't find anything called DataStructure in the
code you posted (see up-thread). Are you referring to
MyStruct? That seems likely, because the first question
you asked was

1). Is my calculation of the number of bytes used by MyStruct correct?

For the usual C meaning of "number of bytes used" the
answer is "No." The size of MyStruct is sizeof(MyStruct),
not the sum of the sizes of its elements plus the size of
other things in other places that some of its elements
happen to point at.

What you're really after, it seems, is not the size of
a MyStruct instance, but the number of bytes in a "serialized
version," that is, the length of some kind of encoding of
the data represented by a MyStruct and its dependents. This
latter count depends on the "on the wire" representation you
choose; if you like, it depends on your protocol.

I may not have asked the question correctly. But put simply, I need to
make a copy of a nested structure. A 'proper' copy, so that the
structure can be sent to another process on a different machine - you
may recall that I DID say that I was writing a messaging library. It
would be the second type of copying that I am interested in, since the
data is being sent across process boundaries and any raw pointers would
be meaningless in another process space.

Put simply, this is not what is usually called a "copy."
(Nor do I see anything in your code that I'd call a "nested
structure:" There's only one struct type shown, and it contains
no other struct types.)

You'll probably find it helpful to think not of "copies"
but of internal and external representations of your data.
The two machines may use entirely different representations
even if the MyStruct declarations are identical (the struct
padding may differ, the sizes of ints and enums and whatnot
may differ, the endianness of numbers may differ, and so on).

To reconcile their idiosyncratic internal representations,
the two machines must agree on a common external representation
or "wire format." The sender converts its internal data to
the agreed-upon wire format and transmits it, and the receiver
converts the incoming bytes to its own internal representation.

Since the two internal representations may use different
byte counts to represent "the same" data, the number of bytes
used by one machine is pretty much irrelevant to the other,
and pretty much useless as a component of the wire format.
You might be interested in the number of bytes in the external
representation of some data -- for example, if your protocol
involves a header that declares how many payload bytes follow
it -- but that's not the same thing as the number of bytes
the local in-memory representation requires.
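
As a rough illustration of the distinction, here is a sketch of one possible wire format. The format is an assumption invented for this example, not anything the poster's library is known to use: three integers sent as 32-bit big-endian values, followed by a 32-bit string length and the string bytes without a terminator. The names put_u32, wire_size and wire_encode are hypothetical, and the sketch assumes the values fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in big-endian order, whatever the host's
   own byte order happens to be. Returns the next write position. */
static unsigned char *put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Size of the external representation: three 32-bit integers,
   a 32-bit length field, then the string bytes (no terminator).
   This has nothing to do with sizeof any in-memory struct. */
size_t wire_size(const char *s)
{
    return 4 * 4 + strlen(s);
}

/* Encode x, y, z and s into buf. Returns the number of bytes
   written, or 0 if buf (of capacity cap) is too small. */
size_t wire_encode(unsigned char *buf, size_t cap,
                   int x, int y, int z, const char *s)
{
    size_t len, need;
    unsigned char *p = buf;

    assert(buf != NULL && s != NULL);
    len = strlen(s);
    need = wire_size(s);
    if (cap < need)
        return 0;

    p = put_u32(p, (uint32_t)x);
    p = put_u32(p, (uint32_t)y);
    p = put_u32(p, (uint32_t)z);
    p = put_u32(p, (uint32_t)len);
    memcpy(p, s, len);
    return need;
}
```

Under this made-up format, encoding the values 1, 2, 258 and the string "hi" takes 18 bytes on every machine, regardless of how large int, pointers, or struct padding are locally. A header that declares how many payload bytes follow would carry the value returned by wire_size, not sizeof(MyStruct).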
 
Alfonso Morra

Eric said:
Alfonso Morra wrote On 09/28/05 17:21,:



I can't find anything called DataStructure in the
code you posted (see up-thread). Are you referring to
MyStruct? That seems likely, because the first question
you asked was




For the usual C meaning of "number of bytes used" the
answer is "No." The size of MyStruct is sizeof(MyStruct),
not the sum of the sizes of its elements plus the size of
other things in other places that some of its elements
happen to point at.

What you're really after, it seems, is not the size of
a MyStruct instance, but the number of bytes in a "serialized
version," that is, the length of some kind of encoding of
the data represented by a MyStruct and its dependents. This
latter count depends on the "on the wire" representation you
choose; if you like, it depends on your protocol.




Put simply, this is not what is usually called a "copy."
(Nor do I see anything in your code that I'd call a "nested
structure:" There's only one struct type shown, and it contains
no other struct types.)

You'll probably find it helpful to think not of "copies"
but of internal and external representations of your data.
The two machines may use entirely different representations
even if the MyStruct declarations are identical (the struct
padding may differ, the sizes of ints and enums and whatnot
may differ, the endianness of numbers may differ, and so on).

To reconcile their idiosyncratic internal representations,
the two machines must agree on a common external representation
or "wire format." The sender converts its internal data to
the agreed-upon wire format and transmits it, and the receiver
converts the incoming bytes to its own internal representation.

Since the two internal representations may use different
byte counts to represent "the same" data, the number of bytes
used by one machine is pretty much irrelevant to the other,
and pretty much useless as a component of the wire format.
You might be interested in the number of bytes in the external
representation of some data -- for example, if your protocol
involves a header that declares how many payload bytes follow
it -- but that's not the same thing as the number of bytes
the local in-memory representation requires.

Hmmm, ok, at least we're on the right track here. Thanks for pointing out
the fact that the term "copying" here may have been "overloaded" and
thus potentially confusing.

What I mean by copying is making a "complete and distinct" copy of a
memory block used by a data structure variable. The term 'DataStructure'
used above is just a placeholder for an 'abstract data type'. Finally,
'nested' in this case refers to the fact that I have several pointers
(char*) and a union, "nested" within the MyStruct struct.

Regarding your low-level "close to the wire" concerns, I am using a
library which abstracts away the low-level stuff like endian
differences between machines etc. It provides a simple API like this:

CreateMessage(DataStructureToCopy, SizeofBinaryDataBlock, MessageStructure);

that allows creation of the message packet. I am shielded from the low
level stuff you mentioned earlier. So my question still remains:

Given a struct like MyStruct defined which contains "nested" members
like a char*, a union etc,

1). How can I calculate the size of the binary data block (to be used
internally by the function above in a call to memmove())? Is my
calculation in my OP correct?


MTIA
 
Keith Thompson

Alfonso Morra said:
Hmmm, ok, at least we're on the right track here. Thanks for pointing
out the fact that the term "copying" here may have been "overloaded"
and thus potentially confusing.

What I mean by copying is making a "complete and distinct" copy of a
memory block used by a data structure variable. The term
'DataStructure' used above is just a placeholder for an 'abstract
data type'. Finally, 'nested' in this case refers to the fact that I
have several pointers (char*) and a union, "nested" within the
MyStruct struct.

The common term for this is a "deep copy".
 
Alfonso Morra

Keith said:
The common term for this is a "deep copy".

Tks for the clarification of the term - but do you know how I can
actually implement it in code?
 
Flash Gordon

Alfonso Morra wrote:

Hmmm, ok, at least we're on the right track here. Thanks for pointing out
the fact that the term "copying" here may have been "overloaded" and
thus potentially confusing.

What I mean by copying is making a "complete and distinct" copy of a
memory block used by a data structure variable. The term 'DataStructure'
used above is just a placeholder for an 'abstract data type'. Finally,
'nested' in this case refers to the fact that I have several pointers
(char*) and a union, "nested" within the MyStruct struct.

That is a "deep copy".

Regarding your low-level "close to the wire" concerns, I am using a
library which abstracts away the low-level stuff like endian
differences between machines etc. It provides a simple API like this:

CreateMessage(DataStructureToCopy, SizeofBinaryDataBlock,
MessageStructure);

that allows creation of the message packet. I am shielded from the low
level stuff you mentioned earlier. So my question still remains:

Given a struct like MyStruct defined which contains "nested" members
like a char*, a union etc,

1). How can I calculate the size of the binary data block (to be used
internally by the function above to a call to memmove() )? - is my
calculation in my OP correct?

The problem is that we don't know how CreateMessage works in detail so I
think you would get better answers from the author of the library unless
you can provide us with the full source for the CreateMessage function.

Given
struct struct_name struct_variable;
the size you would pass to memcpy to copy struct_variable is "sizeof
struct_variable". This would correctly handle any unions or nested
structures, but if there are pointers to other memory blocks (e.g. a
pointer to a string) and you want a deep copy, then you need to write
additional code that knows about such fields, copies what they point to
separately, and then fixes up the pointers in the new structure. If the
"CreateMessage" function does this, then that's all well and good, but I
would be very surprised, since your "MessageStructure" parameter would
probably be a pointer to a tree (after all, you could pass the root node
of a tree to CreateMessage) which you would have to spend significant
effort to construct. Then there is the question of how it is handled on
the wire, which *will* affect the SizeofBinaryDataBlock if it can handle
the sort of thing you are trying.

My *guess* is that if you need to send a complex structure such as the
one you showed earlier, you will have to do it as several messages.
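
For the in-memory deep copy itself, independent of any messaging library, the "copy, then fix up the pointers" approach might look roughly like the sketch below. It is only a sketch under stated assumptions: Record is a hypothetical cut-down version of MyStruct, the helper names dup_string, record_free and record_deep_copy are made up, and since the layout behind a VAL_DATASET pointer is unknown, that pointer is merely copied as a value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { VAL_LONG, VAL_DOUBLE, VAL_STRING, VAL_DATASET } DataType;
typedef union { long lval; double fval; char *sval; void *ptr; } Value;

/* Hypothetical cut-down version of MyStruct. */
typedef struct {
    int      x;
    DataType e2;   /* says which member of v is in use */
    Value    v;
    char    *s3;
} Record;

/* Duplicate a string using standard C only; NULL stays NULL. */
static char *dup_string(const char *s)
{
    char *copy;
    if (s == NULL) return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}

/* Free a Record produced by record_deep_copy, strings included. */
void record_free(Record *r)
{
    if (r == NULL) return;
    if (r->e2 == VAL_STRING) free(r->v.sval);
    free(r->s3);
    free(r);
}

/* Deep copy: returns a new Record owning its own strings, or NULL
   if memory runs out. */
Record *record_deep_copy(const Record *src)
{
    Record *dst;

    assert(src != NULL);
    dst = malloc(sizeof *dst);
    if (dst == NULL) return NULL;

    *dst = *src;          /* plain members and the union are copied here */
    dst->s3 = NULL;       /* drop the borrowed pointers before fixing up */
    if (src->e2 == VAL_STRING) dst->v.sval = NULL;

    if (src->e2 == VAL_STRING && src->v.sval != NULL) {
        dst->v.sval = dup_string(src->v.sval);
        if (dst->v.sval == NULL) { record_free(dst); return NULL; }
    }
    if (src->s3 != NULL) {
        dst->s3 = dup_string(src->s3);
        if (dst->s3 == NULL) { record_free(dst); return NULL; }
    }
    return dst;
}
```

The important point is that the code has to know which members are pointers and, for the union, which member is currently live; the type tag e2 supplies that. The result is still full of pointers, so it is a deep copy for use within one process, not a block that can be handed to memmove or sent to another machine.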
 
Eric Sosman

Alfonso said:
[...]
1). How can I calculate the size of the binary data block (to be used
internally by the function above to a call to memmove() )? - is my
calculation in my OP correct?

The calculation in your O.P. produces nothing that seems
useful in connection with memmove().

As for the rest, while I recognize the words you are using,
it appears you are using them in ways unfamiliar to me. I sort
of thought I knew what you were trying to do, but the more you
write and the more I read the less I understand. Sorry.
 
Lawrence Kirby

On Thu, 29 Sep 2005 00:35:26 +0000, Alfonso Morra wrote:

....
Hmmm, ok, at least we're on the right track here. Thanks for pointing out
the fact that the term "copying" here may have been "overloaded" and
thus potentially confusing.

What I mean by copying is making a "complete and distinct" copy of a
memory block used by a data structure variable. The term 'DataStructure'
used above is just a placeholder for an 'abstract data type'. Finally,
'nested' in this case refers to the fact that I have several pointers
(char*) and a union, "nested" within the MyStruct struct.

You need code that "knows about" the layout of your data structure and can
make copies of all of the relevant parts.

Regarding your low-level "close to the wire" concerns, I am using a
library which abstracts away the low-level stuff like endian
differences between machines etc. It provides a simple API like this:

CreateMessage(DataStructureToCopy, SizeofBinaryDataBlock, MessageStructure);

The answers to your questions depend on how this function works, so the
only way to get answers to your questions is by reading the documentation.
However, for it to be able to deal with representation issues like byte
order, it must know about the layout of your structure.

that allows creation of the message packet. I am shielded from the low
level stuff you mentioned earlier. So my question still remains:

Given a struct like MyStruct defined which contains "nested" members
like a char*, a union etc,

1). How can I calculate the size of the binary data block (to be used
internally by the function above to a call to memmove() )? - is my
calculation in my OP correct?

That depends on what representation the function above uses to store the
serialised data. It must document a means to determine how much memory
is needed.
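
A familiar example of an interface that documents how to find the required size is C99's snprintf: called with a null buffer and a size of zero, it returns the number of characters the output would need. The sketch below uses that to size a buffer for a textual encoding. The "x,y,z,s" format and the name encode_text are made up for illustration and say nothing about what CreateMessage expects.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build the text "x,y,z,s" in freshly allocated memory.
   Returns NULL on failure; the caller frees the result. */
char *encode_text(int x, int y, int z, const char *s)
{
    int need;
    char *out;

    assert(s != NULL);

    /* Pass 1: measure. With a null buffer and size 0, snprintf
       writes nothing and returns the length that would be needed. */
    need = snprintf(NULL, 0, "%d,%d,%d,%s", x, y, z, s);
    if (need < 0) return NULL;

    out = malloc((size_t)need + 1);   /* +1 for the terminating '\0' */
    if (out == NULL) return NULL;

    /* Pass 2: write into a buffer that is known to be large enough. */
    snprintf(out, (size_t)need + 1, "%d,%d,%d,%s", x, y, z, s);
    return out;
}
```

The same measure-then-write pattern applies to a binary format: the size passed along with the block is whatever the encoder reports, not a sum of sizeof expressions over the in-memory struct.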

Lawrence
 
