Aditya said:
Hi ALL
I need to find the best way to copy data in shared memory which is
shared by different processors.
same architecture or different architectures?
same OS or different OS's?
local machine, or over a network?
....
Will there be any issue due to the byte packing strategy used by
different processors?
it depends on the above questions.
For example
typedef struct
{
unsigned char a;
unsigned short b;
unsigned char c;
int d;
} my_struct;
it would help knowing C's syntax (the struct as posted was missing its
semicolons)...
Question:
1. If I need to copy this structure to shared memory, and I calculate the
sizeof of this structure and memcpy that many bytes to shared memory, will
this be a good approach, keeping in mind that Processor B could be the same
as well as different?
again, the original question...
for example, sharing memory between different processes on the same computer
(or, potentially, processors if the computer is multi-core or SMP), it is
probably fairly safe (about the only thing likely to be different in each
case is the physical address of the shared region, but this can be overcome
if one knows what they are doing).
2. Alternatively, if I copy the individual elements one by one to the
shared memory and use a similar approach while reading, will it create
any impact on performance?
as another has noted:
that the code works in the first place matters a whole hell of a lot more
than how quickly it works...
after all, the fastest code is to do nothing at all, but then all bets are
off as to whether it is correct...
Please suggest the best alternative for this solution. I want to
adopt an approach which is general across different processors.
if it is a simplistic multi-computer shared memory (as in, shared memory,
but not distributed shared memory), then likely:
develop a generalized representation, which is neutral to alignment,
packing, and location in the address space.
a simple way to do this is to define structures purely in terms of small
arrays of bytes, so that things like the size, endianness, ... are explicit
(and the compiler is left without reason to try to align anything).
similarly, address location can be avoided using differential pointers
rather than absolute pointers.
typedef unsigned char byte;
typedef struct
{
byte a;
byte b[2];
byte c;
byte d[4];
} my_struct; //packed
or:
typedef struct
{
byte a;
byte pad0_;
byte b[2];
byte c;
byte pad1_[3];
byte d[4];
} my_struct; //aligned
however, usually as a matter of being clean, struct members "should" be
organized such that padding is not needed.
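for example, the earlier layout could be reordered largest-field-first so
that every field falls on its natural offset without explicit pad members
(an illustrative sketch; other orderings also work):

```c
typedef unsigned char byte;

/* reordered so each field lands on its natural alignment
   without pad members: 4-byte field first, then 2-byte,
   then the single bytes */
typedef struct
{
    byte d[4];  /* offset 0 */
    byte b[2];  /* offset 4 */
    byte a;     /* offset 6 */
    byte c;     /* offset 7 */
} my_struct2;   /* 8 bytes, aligned, no padding */
```

since every member is a char array, sizeof is exactly 8 on any conforming
compiler.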
now, consider all accesses are done via special functions, such as:
int FOO_GetInt16BE(byte *p);
int FOO_GetInt32BE(byte *p);
void FOO_SetInt32BE(byte *p, int v);
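a minimal sketch of how such accessors could be implemented (the setters
here return void, and the 32-bit get shifts through unsigned to keep the
top byte away from the sign bit):

```c
typedef unsigned char byte;

/* read a big-endian 16-bit value from a byte array */
int FOO_GetInt16BE(byte *p)
{
    return((p[0]<<8)|p[1]);
}

/* read a big-endian 32-bit value; shift through unsigned so
   the top byte does not overflow a signed shift */
int FOO_GetInt32BE(byte *p)
{
    return((int)(((unsigned)p[0]<<24)|((unsigned)p[1]<<16)|
        ((unsigned)p[2]<<8)|p[3]));
}

void FOO_SetInt16BE(byte *p, int v)
{
    p[0]=(v>>8)&0xFF;
    p[1]=v&0xFF;
}

void FOO_SetInt32BE(byte *p, int v)
{
    p[0]=(v>>24)&0xFF;
    p[1]=(v>>16)&0xFF;
    p[2]=(v>>8)&0xFF;
    p[3]=v&0xFF;
}
```

because each accessor touches individual bytes, the stored form is
independent of the host's endianness and alignment rules.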
note that with pointers, one can perform the differential encoding/decoding
while reading/writing the pointer.
void *FOO_GetPtrRel32BE(byte *p)
{
int v;
v=FOO_GetInt32BE(p);
return(p+v+4); //end relative
//return(p+v); //base relative
}
void FOO_SetPtrRel32BE(byte *p, void *q)
{
int v;
v=((byte *)q)-p-4; //end relative
//v=((byte *)q)-p; //base relative
FOO_SetInt32BE(p, v);
}
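to sanity-check the round trip, a self-contained sketch using the
end-relative form (the Int32 accessor bodies here are assumptions, not
from the original post):

```c
typedef unsigned char byte;

static int FOO_GetInt32BE(byte *p)
{
    return((int)(((unsigned)p[0]<<24)|((unsigned)p[1]<<16)|
        ((unsigned)p[2]<<8)|p[3]));
}

static void FOO_SetInt32BE(byte *p, int v)
{
    p[0]=(v>>24)&0xFF; p[1]=(v>>16)&0xFF;
    p[2]=(v>>8)&0xFF;  p[3]=v&0xFF;
}

/* end-relative: the stored value is relative to the end
   of the 4-byte pointer field */
static void *FOO_GetPtrRel32BE(byte *p)
{
    return(p+FOO_GetInt32BE(p)+4);
}

static void FOO_SetPtrRel32BE(byte *p, void *q)
{
    FOO_SetInt32BE(p, (int)(((byte *)q)-p-4));
}
```

note that because the encoding is self-relative, the whole buffer can be
copied (or mapped at a different address in another process) and the
pointer remains valid, so long as its target moves along with it.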
all of this also works well when designing file formats (although,
typically, file formats use addresses relative to the file base rather than
to the pointer's own location; self-relative encoding does, however, have a
few merits with some kinds of usage patterns).
end-relative vs base relative is a subtle difference, but has a few uses.
end-relative is internally used more by x86 (for example, in jumps and
calls, ...).
base relative is likely to be slightly more efficient if implemented in C
(as the endless +4 and -4 can be eliminated).
....
as a cost, however, this simple scheme does not work so well with either
distributed memory, or with heaps larger than 2GB for the 32-bit case
(unless special trickery is used).
DSM typically uses 64 or 128 bit "segmented absolute" pointers, where part
of the pointer designates the segment (a particular "node" or "store"), and
the rest designates a store-relative offset.
this scheme is also much more difficult to work with.
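one possible shape for such a segmented pointer, assuming a hypothetical
16/48 split (the split itself is an assumption, not from the post):

```c
#include <stdint.h>

/* 64-bit "segmented absolute" DSM pointer sketch: the high 16
   bits pick the store/node, the low 48 bits give a
   store-relative offset */
typedef uint64_t dsm_ptr;

static dsm_ptr dsm_make(uint16_t store, uint64_t offs)
{
    return(((uint64_t)store<<48)|(offs&0xFFFFFFFFFFFFULL));
}

static uint16_t dsm_store(dsm_ptr p)
{
    return((uint16_t)(p>>48));
}

static uint64_t dsm_offs(dsm_ptr p)
{
    return(p&0xFFFFFFFFFFFFULL);
}
```

dereferencing one of these then requires a lookup from store id to a
locally mapped base address, which is part of why the scheme is more
difficult to work with.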
granted, there "is" a tradeoff, namely in that a 32-bit relative scheme
could be used for most of the pointers, but that a part of the space is
designated for pointers outside the space (either, outside the +-2GB range,
or into different store segments).
in this case, the pointer would map either to a specific region within the
space (for example, "absolute" addresses 0x7F000000 .. 0x7FFFFFFF are
designated for non-local access, or all references in a special range
outside the store, say -0x00000001 .. -0x00FFFFFF), or certain values of the
pointers themselves (say, 0x80000000 .. 0x80FFFFFF, which is right near
the -2GB mark).
either way, this can avoid having to use 64 or 128 bits for all of the
pointers in the DSM case.
further note:
unless one is running on a 64 bit OS, there is no real way around the CPU's
address space limitations in the DSM case (AKA: a 32 bit process will not be
able to have "direct" access to more than about 1-2GB total of shared
heaps/stores...).
the more traditional raw 64 or 128 bit DSM pointers partly get around this,
because not all of the address space needs to actually fit within the host
process.
either way, the Rel32 scheme will allow somewhat "compressing" the shared
memory, regardless of whether native pointers or 64/128 bit DSM pointers are
used in the parent process.
the cost though is a slightly increased complexity in this case, since the
special "non-local" pointers need to be detected and handled specially.
or such...