Aditya said:
Hi ALL
I need to find the best way to copy data in shared memory which is
shared by different processors.
same architecture or different architectures?
same OS or different OS's?
local machine, or over a network?
....
Will there be any issue due to the byte packing strategy used by
different processors?
it depends on the above questions.
For example
typedef struct
{
unsigned char a;
unsigned short b;
unsigned char c;
int d;
} my_struct;
it would help knowing C's syntax (the struct as posted was missing its
semicolons)...
Question:
1. If I need to copy this structure to shared memory, and I calculate the
sizeof of this structure and memcpy that many bytes to shared memory, will
this be a good approach, keeping in mind that Processor B could be the same
as well as different?
again, the original question...
for example, sharing memory between different processes on the same computer
(or, potentially, processors if the computer is multi-core or SMP), it is
probably fairly safe (about the only thing likely to be different in each
case is the physical address of the shared region, but this can be overcome
if one knows what they are doing).
2. Alternatively, if I copy the individual elements one by one to the
shared memory and use a similar approach while reading, will it create
any impact on performance?
as another has noted:
that the code works in the first place matters a whole hell of a lot more
than how quickly it works...
after all, the fastest code is to do nothing at all, but then all bets are
off as to whether it is correct...
Please suggest the best alternative for this solution. I want to
adopt an approach which is general across different processors.
if it is a simplistic multi-computer shared memory (as in, shared memory,
but not distributed shared memory), then likely:
develop a generalized representation, which is neutral to alignment,
packing, and location in the address space.
a simple way to do this is to define structures purely in terms of small
arrays of bytes, so that things like the size, endianness, ... are explicit
(and the compiler is left without reason to try to align anything).
similarly, address location can be avoided using differential pointers
rather than absolute pointers.
typedef unsigned char byte;
typedef struct
{
byte a;
byte b[2];
byte c;
byte d[4];
} my_struct; //packed
or:
typedef struct
{
byte a;
byte pad0_;
byte b[2];
byte c;
byte pad1_[3];
byte d[4];
} my_struct; //aligned
however, usually as a matter of being clean, struct members "should" be
organized such that padding is not needed.
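for example, the earlier layout could be reordered largest-field-first so
that every field falls on its natural offset without explicit pad members
(an illustrative sketch; other orderings also work):

```c
typedef unsigned char byte;

/* reordered so each field lands on its natural alignment
   without pad members: 4-byte field first, then 2-byte,
   then the single bytes */
typedef struct
{
    byte d[4];  /* offset 0 */
    byte b[2];  /* offset 4 */
    byte a;     /* offset 6 */
    byte c;     /* offset 7 */
} my_struct2;   /* 8 bytes, aligned, no padding */
```

since every member is a char array, sizeof is exactly 8 on any conforming
compiler.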
now, consider all accesses are done via special functions, such as:
int FOO_GetInt16BE(byte *p);
int FOO_GetInt32BE(byte *p);
void FOO_SetInt32BE(byte *p, int v);
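a minimal sketch of how such accessors could be implemented (the setters
here return void, and the 32-bit get shifts through unsigned to keep the
top byte away from the sign bit):

```c
typedef unsigned char byte;

/* read a big-endian 16-bit value from a byte array */
int FOO_GetInt16BE(byte *p)
{
    return((p[0]<<8)|p[1]);
}

/* read a big-endian 32-bit value; shift through unsigned so
   the top byte does not overflow a signed shift */
int FOO_GetInt32BE(byte *p)
{
    return((int)(((unsigned)p[0]<<24)|((unsigned)p[1]<<16)|
        ((unsigned)p[2]<<8)|p[3]));
}

void FOO_SetInt16BE(byte *p, int v)
{
    p[0]=(v>>8)&0xFF;
    p[1]=v&0xFF;
}

void FOO_SetInt32BE(byte *p, int v)
{
    p[0]=(v>>24)&0xFF;
    p[1]=(v>>16)&0xFF;
    p[2]=(v>>8)&0xFF;
    p[3]=v&0xFF;
}
```

because each accessor touches individual bytes, the stored form is
independent of the host's endianness and alignment rules.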
note that with pointers, one can perform the differential encoding/decoding
while reading/writing the pointer.
void *FOO_GetPtrRel32BE(byte *p)
{
int v;
v=FOO_GetInt32BE(p);
return(p+v+4); //end relative
//return(p+v); //base relative
}
void FOO_SetPtrRel32BE(byte *p, void *q)
{
int v;
v=((byte *)q)-p-4; //end relative
//v=((byte *)q)-p; //base relative
FOO_SetInt32BE(p, v);
}
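to sanity-check the round trip, a self-contained sketch using the
end-relative form (the Int32 accessor bodies here are assumptions, not
from the original post):

```c
typedef unsigned char byte;

static int FOO_GetInt32BE(byte *p)
{
    return((int)(((unsigned)p[0]<<24)|((unsigned)p[1]<<16)|
        ((unsigned)p[2]<<8)|p[3]));
}

static void FOO_SetInt32BE(byte *p, int v)
{
    p[0]=(v>>24)&0xFF; p[1]=(v>>16)&0xFF;
    p[2]=(v>>8)&0xFF;  p[3]=v&0xFF;
}

/* end-relative: the stored value is relative to the end
   of the 4-byte pointer field */
static void *FOO_GetPtrRel32BE(byte *p)
{
    return(p+FOO_GetInt32BE(p)+4);
}

static void FOO_SetPtrRel32BE(byte *p, void *q)
{
    FOO_SetInt32BE(p, (int)(((byte *)q)-p-4));
}
```

note that because the encoding is self-relative, the whole buffer can be
copied (or mapped at a different address in another process) and the
pointer remains valid, so long as its target moves along with it.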
all of this also works well when designing file formats (although,
typically, file formats use addresses relative to the file base rather than
to the pointer's own location; self-relative encoding does, however, have a
few merits with some kinds of usage patterns).
end-relative vs base relative is a subtle difference, but has a few uses.
end-relative is internally used more by x86 (for example, in jumps and
calls, ...).
base relative is likely to be slightly more efficient if implemented in C
(as the endless +4 and -4 can be eliminated).
....
as a cost, however, this simple scheme does not work so well with either
distributed memory, or with heaps larger than 2GB for the 32-bit case
(unless special trickery is used).
DSM typically uses 64 or 128 bit "segmented absolute" pointers, where part
of the pointer designates the segment (a particular "node" or "store"), and
the rest designates a store-relative offset.
this scheme is also much more difficult to work with.
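one possible shape for such a segmented pointer, assuming a hypothetical
16/48 split (the split itself is an assumption, not from the post):

```c
#include <stdint.h>

/* 64-bit "segmented absolute" DSM pointer sketch: the high 16
   bits pick the store/node, the low 48 bits give a
   store-relative offset */
typedef uint64_t dsm_ptr;

static dsm_ptr dsm_make(uint16_t store, uint64_t offs)
{
    return(((uint64_t)store<<48)|(offs&0xFFFFFFFFFFFFULL));
}

static uint16_t dsm_store(dsm_ptr p)
{
    return((uint16_t)(p>>48));
}

static uint64_t dsm_offs(dsm_ptr p)
{
    return(p&0xFFFFFFFFFFFFULL);
}
```

dereferencing one of these then requires a lookup from store id to a
locally mapped base address, which is part of why the scheme is more
difficult to work with.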
granted, there "is" a tradeoff, namely in that a 32-bit relative scheme
could be used for most of the pointers, but that a part of the space is
designated for pointers outside the space (either, outside the +-2GB range,
or into different store segments).
in this case, the pointer would map either to a specific region within the
space (for example, "absolute" addresses 0x7F000000 .. 0x7FFFFFFF are
designated for non-local access, or all references in a special range
outside the store, say -0x00000001 .. -0x00FFFFFF), or certain values of the
pointers themselves (say, 0x80000000 .. 0x80FFFFFF, which is right near
the -2GB mark).
either way, this can avoid having to use 64 or 128 bits for all of the
pointers in the DSM case.
further note:
unless one is running on a 64 bit OS, there is no real way around the CPU's
address space limitations in the DSM case (AKA: a 32 bit process will not be
able to have "direct" access to more than about 1-2GB total of shared
heaps/stores...).
the more traditional raw 64 or 128 bit DSM pointers partly get around this,
because not all of the address space needs to actually fit within the host
process.
either way, the Rel32 scheme will allow somewhat "compressing" the shared
memory, regardless of whether native pointers or 64/128 bit DSM pointers are
used in the parent process.
the cost though is a slightly increased complexity in this case, since the
special "non-local" pointers need to be detected and handled specially.
or such...