nameless struct / union


Bryan Parkoff

I hate writing struct / union member access with a dot between two names. How
can I use one name instead of two? I want the source code to read clearly.
Three variables share the storage of one variable, so changing the 8-bit data
also changes the 16-bit and 32-bit data.
For example:

union
{
    struct _Byte
    {
        U_BYTE AAL;
        U_BYTE AAH;
    } Byte;

    struct _Word
    {
        U_WORD AAW;
    } Word;

    struct _DWORD
    {
        U_DWORD AA;
    } DWord;
};

int main()
{
    // I hate dot between 2 words.
    Byte.AAL = 0xFF;
    Byte.AAH = 0x20;
    Byte.AAL += 0x0A;
    Byte.AAH += 0x01;
    Word.AAW += 0xFF;
    DWord.AA += 0xFFFF;

    // It is easy reading variable inside struct / union.
    AAL = 0xFF;
    AAH = 0x20;
    AAL += 0x0A;
    AAH += 0x01;
    AAW += 0xFF;
    AA += 0xFFFF;
}
 

Victor Bazarov

Bryan said:
I hate writing struct / union member access with a dot between two names. How
can I use one name instead of two? I want the source code to read clearly.
Three variables share the storage of one variable, so changing the 8-bit data
also changes the 16-bit and 32-bit data. For example:

union
{
    struct _Byte
    {
        U_BYTE AAL;
        U_BYTE AAH;
    } Byte;

    struct _Word
    {
        U_WORD AAW;
    } Word;

    struct _DWORD
    {
        U_DWORD AA;
    } DWord;
};

int main()
{
    // I hate dot between 2 words.
    Byte.AAL = 0xFF;
    Byte.AAH = 0x20;
    Byte.AAL += 0x0A;
    Byte.AAH += 0x01;
    Word.AAW += 0xFF;
    DWord.AA += 0xFFFF;

    // It is easy reading variable inside struct / union.
    AAL = 0xFF;
    AAH = 0x20;
    AAL += 0x0A;
    AAH += 0x01;
    AAW += 0xFF;
    AA += 0xFFFF;

Uh... I'm a bit rusty on unnamed unions. Does an unnamed union
create a global instance?

Also, your use of first assigning one part of the union and then
using another part has undefined behaviour, IIRC.

V
 

Rolf Magnus

Victor said:
Uh... I'm a bit rusty on unnamed unions. Does an unnamed union
create a global instance?

I didn't even know it's allowed outside a struct. Is it actually?
Also, your use of first assigning one part of the union and then
using another part has undefined behaviour, IIRC.

Yes. You must always only read the member you last wrote to, otherwise the
behavior is undefined.
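
On the question of whether an unnamed union is allowed outside a struct: as far as I can tell it is, but at namespace scope an anonymous union has to be declared static, and it may contain only plain data members (so nested struct definitions like _Byte in the original post would not be accepted by a strict compiler). A minimal sketch, with member and function names invented for illustration, that only ever reads the member it last wrote:

```cpp
// Anonymous union at namespace scope: it must be declared static.
// byte_value and word_value are hypothetical names for this sketch.
static union {
    unsigned char  byte_value;
    unsigned short word_value;
};

// Writes word_value and reads the same member back, which is well defined.
unsigned short store_and_load_word(unsigned short v) {
    word_value = v;      // word_value is now the member last written
    return word_value;   // reading the member last written is fine
}

// Same idea for the byte member.
unsigned char store_and_load_byte(unsigned char v) {
    byte_value = v;      // byte_value is now the member last written
    return byte_value;
}
```

The members are used without any prefix, which is the "one word" syntax the original post asks for. What stays undefined is writing byte_value and then reading word_value.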
 

Craig Scott

I hate writing struct / union member access with a dot between two names. How
can I use one name instead of two? I want the source code to read clearly.
Three variables share the storage of one variable, so changing the 8-bit data
also changes the 16-bit and 32-bit data.
For example:

union
{
struct _Byte
{
U_BYTE AAL;
U_BYTE AAH;
} Byte;

struct _Word
{
U_WORD AAW;
} Word;

struct _DWORD
{
U_DWORD AA;
} DWord;

};

Sorry, it's not answering your original post (others seem to be doing
that already), but using a type name which starts with an underscore
and is followed by an uppercase letter is not allowed by the C++
standard (unless your code is part of the compiler implementation
itself). A name starting with an underscore and NOT followed by an
uppercase letter cannot be used in the global namespace, but
presumably could be elsewhere (but I'd recommend against it to avoid
confusion). See section 17.4.3.1.2 of the standard for details.
 

Bryan Parkoff

Uh... I'm a bit rusty on unnamed unions. Does an unnamed union
create a global instance?

I didn't even know it's allowed outside a struct. Is it actually?


Yes. You must always only read the member you last wrote to, otherwise the
behavior is undefined.

I want to follow up. I feel a nameless union/struct is necessary. I want
three variables to share one big variable. When you work on the two bytes, the
word data is modified automatically because the storage is shared. For
example, you define Low_Byte and High_Byte and compute 0xFF + 0x03. Low_Byte
becomes 0x02. High_Byte is not modified when the bit that falls off Low_Byte
becomes Carry; Carry can then be added to High_Byte. That is easier because I
do not have to write Word &= 0x00FF. If I want to work on the word data, I do
not need Carry, and the addition can be 0x20FF + 0x0003. Word becomes 0x2102.
It makes my source code clearly readable. Here is an example below.
Please let me know what you think.

static union
{
    U_BYTE B[4];
    U_WORD W[2];
    U_DWORD DW;
};

#define Low_Byte B[0]
#define High_Byte B[1]
#define Low_Byte2 B[2]
#define High_Byte2 B[3]
#define Low_Word W[0]
#define High_Word W[1]
#define DWord DW

int main()
{
    Low_Byte = 0xFF;
    High_Byte = 0x20;
    Low_Byte += 0x03;
    High_Byte += 0x01;
    Low_Word += 0x00FF;
    DWord += 0x0000FFFF;
    return 0;
}
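
The carry arithmetic described above (0xFF + 0x03 leaving 0x02 in the low byte with a carry, and 0x20FF + 0x0003 giving 0x2102) can also be written without a union, using only masks and shifts on one integer. This is a rough sketch, not the poster's code: ByteSum, add_bytes and add_to_low_byte are hypothetical names, and it assumes a C++11 compiler that provides std::uint8_t and std::uint16_t.

```cpp
#include <cstdint>

// Result of an 8-bit addition: the wrapped sum and the carry out of bit 7.
struct ByteSum {
    std::uint8_t value;
    bool carry;
};

// Adds two bytes the way an 8-bit ALU would.
ByteSum add_bytes(std::uint8_t a, std::uint8_t b) {
    unsigned wide = static_cast<unsigned>(a) + b;   // at most 0x1FE
    ByteSum r;
    r.value = static_cast<std::uint8_t>(wide & 0xFFu);
    r.carry = wide > 0xFFu;
    return r;
}

// Adds to the low byte of a 16-bit word and propagates the carry
// into the high byte, independent of the machine's byte order.
std::uint16_t add_to_low_byte(std::uint16_t word, std::uint8_t addend) {
    ByteSum low = add_bytes(static_cast<std::uint8_t>(word & 0xFFu), addend);
    unsigned high = ((word >> 8) + (low.carry ? 1u : 0u)) & 0xFFu;
    return static_cast<std::uint16_t>((high << 8) | low.value);
}
```

Because the bytes are computed from the value rather than from its memory layout, the result does not depend on which byte the platform stores first.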
 

Victor Bazarov

Bryan said:
[..] I feel nameless union/struct is necessary. I want three variables to
share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word. That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.

V
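
A minimal sketch of the "array of char" approach mentioned above. One detail worth noting: between unrelated pointer types the cast is normally spelled reinterpret_cast (or two static_casts through void*), not a single static_cast. The function names are made up for illustration, and which index holds the low byte depends on the platform.

```cpp
#include <cstddef>

// Overwrites byte `index` of the object representation of `word`.
// Which index is the "low" byte depends on the platform's byte order.
void set_raw_byte(unsigned short& word, std::size_t index, unsigned char value) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&word);
    bytes[index] = value;   // caller must keep index < sizeof(unsigned short)
}

// Reads byte `index` of the object representation of `word`.
unsigned char get_raw_byte(const unsigned short& word, std::size_t index) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&word);
    return bytes[index];    // caller must keep index < sizeof(unsigned short)
}
```

Access through unsigned char* is the one form of this kind of aliasing the language permits for any object, which is why it is suggested here in place of the union.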
 

Bryan Parkoff

Bryan said:
[..] I feel nameless union/struct is necessary. I want three variables
to share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word. That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.

Please explain why you think a union should not be used. I want two bytes
to share one word. I can modify the low byte at one time and the high byte
the next time. Then the word gets its data from the 2 bytes. You can define
one array with two elements or two separate bytes. They are the same. Please
reread my previous post so you can compare the union using #define without a
struct with the union with a struct here below.

union
{
struct _B
{
BYTE L;
BYTE H;
} B;
WORD W;
};

You can also store two bytes into one word using pointers, like below. It
is identical to the union above. The problem is that after the C++ compiler
converts the C++ source code into x86 / non-x86 machine language, there are
1-2 extra instructions, because it needs to read the memory address first
before accessing the variable through the pointer. The union does not have
the extra instructions. It needs only one instruction to access the variable
instead of going through a pointer. The union is the best choice.
Please explain why you claim that I made a mistake. Please show your
example of the static_cast<> keyword. It looks like putting a pointer in
static_cast<>.

WORD W = 0;
BYTE* L = (BYTE*)&W;
BYTE* H = (BYTE*)&W + 1;
*L = 0xFF;
*H = 0x20;

Bryan Parkoff
 

Victor Bazarov

Bryan said:
Bryan said:
[..] I feel nameless union/struct is necessary. I want three
variables to share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word. That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.

Please explain why you think that union is not to be used.

I don't have to. The language Standard forbids it. If I had to
speculate it's because you either need to explicitly allow certain
combinations (thus making a relatively long set of pairs that are
OK to share the memory and let you read *not* what you wrote), or
you disallow everything (like the Standard does) because there are
combinations (like chars and a pointer, for instance) which are by
*no* means OK. You cannot write a bunch of chars and then expect
them to form a valid pointer, and even _reading_ (loading into
an address register) an invalid pointer can cause hardware fault
on some systems.

V
 

Bryan Parkoff

Bryan said:
Bryan Parkoff wrote:
[..] I feel nameless union/struct is necessary. I want three
variables to share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word. That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.

Please explain why you think that union is not to be used.

I don't have to. The language Standard forbids it. If I had to
speculate it's because you either need to explicitly allow certain
combinations (thus making a relatively long set of pairs that are
OK to share the memory and let you read *not* what you wrote), or
you disallow everything (like the Standard does) because there are
combinations (like chars and a pointer, for instance) which are by
*no* means OK. You cannot write a bunch of chars and then expect
them to form a valid pointer, and even _reading_ (loading into
an address register) an invalid pointer can cause hardware fault
on some systems.

OK, I understand. It looks like it takes a non-standard C++ compiler to accept
four byte variables linked into one dword variable using a union. I
always decide to allow overcoming non-standard C++ Compiler. I hope that it
will be compatible with all C++ compilers like Microsoft, GNU, Mac OS X, and
others.
static_cast<> is used only if I want to convert a small size to a big size,
but not for shared / linked small / big sizes. Hopefully, C++ compilers will
be able to support this non-standard C++ in the near future so this
code can have very good portability.
Thank you for your comment. Smile...

Bryan Parkoff
 

Fred Zwarts

Bryan Parkoff said:
Bryan said:
[..] I feel nameless union/struct is necessary. I want three variables
to share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word. That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.

Please explain why you think that union is not to be used. I want to
share two bytes into one word. I can modify one low byte at this time and
high byte next time. Then word gets data from 2 bytes. You can define one
array with two elements or two bytes. They are the same. Please reread my
previous thread post so you can compare union using #define without struct
and union with struct here below.

union
{
struct _B
{
BYTE L;
BYTE H;
} B;
WORD W;
};

You can store two bytes into one word using pointer like this below. It
is identical to union above. The problem is that after C++ Compiler
converted C++ source code into x86 / non x86 machine language. It has extra
1-2 instructions because it needs to read memory address first before
accessing variable through pointer. Union does not have extra instructions.
It has only one instruction to access variable instead of using pointer.
Union is the best choice.
Please explain why you claim that I made my mistake. Please show your
example of static_cast<> keyword. It is like to put pointer in
static_cast<>.

WORD W = 0;
BYTE* L = (BYTE*)&W;
BYTE* H = (BYTE*)&W + 1;
*L = 0xFF;
*H = 0x20;

Bryan Parkoff

The standard says that using a union in this way results in undefined behavior.
I don't know the rationale behind this, but I think that normally for a POD it is not a problem.
However, a union can be a complex thing and its members can be complex classes.
E.g., members which may be accessed only using their member functions. Even the assignment
operator may be overloaded, so that it is unclear what happens if you assign one member
of a union and read another member of a union.
Probably, this is the rationale behind the standard.
The same problems will show up in your second strategy if WORD and BYTE are complex classes.
 

Rolf Magnus

Bryan said:
OK, I understand.  It looks like non-standard C++ Compiler to accept
four byte variables to be linked into one dword variable using union.

The compiler is right to accept it. The standard just says that the behavior
is undefined, which means the compiler can make anything it wants out of
it. Doing what you expected is just one possible outcome of many. You
should always avoid undefined behavior because it's ... well, not defined
what happens.
I always decide to allow overcoming non-standard C++ Compiler.  I hope
that it should be compatible to all C++ Compiler like Microsoft, GNU, Mac
OSX, and others.

If you want to be compatible with all compilers, you have to stick to the
standard. That's what it's there for.
static_cast<> is used only if I want to convert small size to big size,
but not shared / linked small / big sizes.

You can static_cast between pointer types. The standard guarantees that you
can access any object as array of char that way. However, the internal
layout of the object might again be different between compilers. One
example from the real world is endianness. The order of the bytes in bigger
size integers is not the same on all CPUs.
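
To make the endianness point concrete, here is a small sketch that looks at the first stored byte of an integer through an unsigned char pointer. The function names are hypothetical, and the code assumes unsigned short is wider than one byte, which is the case on common platforms.

```cpp
// Returns the byte stored at the lowest address of `word`.
unsigned char first_stored_byte(unsigned short word) {
    return *reinterpret_cast<const unsigned char*>(&word);
}

// True when the least significant byte is stored first (as on x86).
bool is_little_endian() {
    return first_stored_byte(0x0001) == 0x01;
}
```

On a little-endian machine first_stored_byte(0x20FF) yields 0xFF; on a big-endian machine with a 16-bit unsigned short it yields 0x20. Any code that maps "low byte" to index 0 of the char array is therefore tied to one byte order.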
 

James Kanze

Bryan said:
Bryan Parkoff wrote:
[..] I feel nameless union/struct is necessary. I want three
variables to share one big variable. You want to work
two byte data. It causes word data to be modified automatically
because it is shared.
That is your mistake. The language explicitly states that
you only can read the same data you wrote. You cannot
write byte and then read word. That's not what the unions
are for. To accomplish that you need to 'static_cast' your
word into an array of char and then change each char as you
want.
Please explain why you think that union is not to be used.
I don't have to. The language Standard forbids it. If I had
to speculate it's because you either need to explicitly allow
certain combinations (thus making a relatively long set of
pairs that are OK to share the memory and let you read *not*
what you wrote), or you disallow everything (like the Standard
does) because there are combinations (like chars and a
pointer, for instance) which are by *no* means OK.
You cannot write a bunch of chars and then expect them to form
a valid pointer, and even _reading_ (loading into an address
register) an invalid pointer can cause hardware fault on some
systems.

That would be a valid objection for casting to char* and writing
as well.

I don't know the actual motivation for the restriction. What
the OP is asking for is usually called type punning. In
pre-standard (C standard) days, there were two widespread
techniques of type punning: using a union (as he does), and
casting the address to a different type. Most compilers
supported both, but at least one didn't support either without
special options.

For whatever reasons, the C standards committee decided to only
support the casting of the pointer, and then only if one of the
types involved was a character type (char* or unsigned char*).
In practice, compilers still differ with regards to what they
support. (Note that regardless of what the compiler supports,
it's likely that both will work if optimization is turned off.)
 

Tomás Ó hÉilidhe

That is your mistake. The language explicitly states that you only
can read the same data you wrote. You cannot write byte and then
read word.


Although the Standard does not define the behaviour of what happens
when you write to one union member and then read from a different one,
it certainly does not restrict the implementation from defining the
behaviour.

I've never come across a system where this union practice didn't do
exactly what you want it to do. Never.

I doubt you'll be sacrificing portability if you go ahead with this
method. If you want your code to get the 100% portable stamp of approval
tho, you might consider finding a different way of doing it.

That's not what the unions are for. To accomplish that
you need to 'static_cast' your word into an array of char and then
change each char as you want.


If I wanted to do something like:

union Foo {
char unsigned bytes[sizeof(long)];
long unsigned x;
};

Foo bar;

bar.x = 27892;
bar.bytes[0] = 5;

, then I'd probably do something like:


struct Foo {
long unsigned x;

char unsigned *const bytes;

Foo() : bytes(reinterpret_cast<char unsigned*>(&x)) {}
};


Unfortunately tho this increases the size of Foo... plus you can no
longer move the object in memory... and it's also not a POD... plus
sizeof bytes won't give you what you want.

The thing is tho, that if you're making assumptions about the amount of
bytes in a certain integer type, and also about whether there's padding
bits in those integer types, then portability's already been thrown out
the window, so I'd say just go with the union method.
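
One possible way around the drawbacks listed above (larger object, not movable, pointer member) is to compute the byte pointer on demand in a member function instead of storing it. This is only a sketch under that assumption; bytes() and byte_count() are names invented here.

```cpp
#include <cstddef>

struct Foo {
    unsigned long x;

    // View of the object representation of x, computed on demand.
    // Nothing is stored, so copying or moving a Foo cannot leave a
    // stale pointer behind, and Foo is typically no larger than x.
    unsigned char* bytes() {
        return reinterpret_cast<unsigned char*>(&x);
    }
    const unsigned char* bytes() const {
        return reinterpret_cast<const unsigned char*>(&x);
    }

    // Number of bytes reachable through bytes().
    static std::size_t byte_count() { return sizeof(unsigned long); }
};
```

Usage looks like `Foo bar; bar.x = 27892; bar.bytes()[0] = 5;`. Which part of x that write changes still depends on the byte order of the machine.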
 

Tomás Ó hÉilidhe

I don't have to. The language Standard forbids it.


I don't see anything about a constraint violation in the Standard, nor do I
see any explicit forbidding of the implementation to define the behaviour
of what happens when you use unions like this.
 

red floyd

Tomás Ó hÉilidhe said:
Although the Standard does not define the behaviour of what happens
when you write to one union member and then read from a different one,
it certainly does not restrict the implementation from defining the
behaviour.

I've never come across a system where this union practice didn't do
exactly what you want it to do. Never.

I doubt you'll be sacrificing portability if you go ahead with this
method. If you want your code to get the 100% portable stamp of approval
tho, you might consider finding a different way of doing it.

I doubt the OP is even concerned about portability. Looks like he's
trying to map the x86 register set.
 

Bryan Parkoff

red floyd said:
I doubt the OP is even concerned about portability. Looks like he's
trying to map the x86 register set.

I prefer to standardize my source code on ANSI C/C++ and
keep it portable. Sometimes Microsoft Visual C++ 9.0 has extensions
that a standard C/C++ compiler does not accept. For example, you use
the comments "//" and "/* */" in file.c (not file.cpp). A standard C compiler does
not accept "//". I have to live with standard C/C++ compilers because I want
to port my source code from the Microsoft C/C++ compiler to GNU, to Mac OS X, to
Linux, to Unix, etc.
What do you think? You try to map the x86 register set. Then how do you
map the register set on a non-x86 machine? Maybe some machines do not have four
sizes shared in one register, meaning the register set does not accept
byte, word, and dword. It does accept qword. Then you are still able to use
char, word, dword, and qword in C++ source code, and the C/C++ compiler will
convert all sizes to the 64-bit size, using a mask like "AND" to clear all
upper bits.
What choice do I have? Should I use a union if I want to share four sizes
in one 64-bit variable? Should I avoid the union and use the big size with "AND"
and right shift? Compare the two examples below.

union _Size
{
struct _B
{
unsigned char L;
unsigned char H;
} B;
unsigned short W;
} Size;

// Example 1 == I want to modify only word before I read two individual bytes.
Size.B.L = 0xFF;
Size.B.H = 0x20;
Size.W = 0x0A; // 1 bit as carry fell off Size.B.L and add carry to Size.B.H
unsigned char L = Size.B.L;
unsigned char H = Size.B.H;

// Example 2 -- I want to avoid "AND" and right shift.
Size.B.L = 0xFF;
Size.B.H = 0x20;
Size.W = 0x0A; // 1 bit as carry fell off Size.B.L and add carry to Size.B.H
unsigned char L = Size.W & 0xFF;
unsigned char H = Size.W >> 8;

Reading the two bytes directly instead of using "AND" and right shift may take
only two machine-language instructions. Using "AND" and right shift may take
more than two instructions. You have to decide which of example 1 or 2 is best
for you. You can have the C/C++ compiler optimize both and see which is
faster. I should port this to a non-x86 machine and test it.
Please state your opinion. Should I use "AND" and shift? Should I use a
union? Which is best practice when writing C++?

Bryan Parkoff
 

James Kanze

Although the Standard does not define the behavior of
what happens when you write to one union member and then read
from a different one, it certainly does not restrict the
implementation from defining the behavior.
I've never come across a system where this union practice
didn't do exactly what you want it to do. Never.

Really. G++ does define it (I think), but it's about the only
one that does. I've had real problems with it in the
past---with the Microsoft C compiler.

Note that depending on the optimization options, and what you
are doing around it, it may also seem to work even though the
compiler doesn't guarantee it. Seeming to work is one of the
possible behaviors of undefined behavior. And it may stop
working as a result of modifying some totally unrelated
statement. (That was, in fact, the behavior I encountered with
Microsoft C.)
I doubt you'll be sacrificing portability if you go ahead with
this method.

You definitely will be. In practice, as well as in theory.
 

James Kanze

I prefer to make sure to standardize my source code to ANSI C/C++ and
accepts portability.

Regretfully, conforming to the relevant ISO standard doesn't
guarantee portability.
Sometimes, Microsoft Visual C++ 9.0 has extended standard that
standard C/C++ Compiler does not accept. For example, you use
comment "//" and "/* */" on file.c (not file.cpp). Standard C
Compiler does not accept "//".

If they conform to the C standard, they do. "//" is just as
valid in C as in C++.
What choice do I have? Should I use union if I want to share
four sizes into one 64 bit variable?

Of course. But of course, portably, you can only access the
last value written. Portably, it can't be made to work
otherwise anyway---type punning, even when it works, is never
portable.
Should I avoid union and use big size using "AND"
and right shift? Try to compare two examples below.
union _Size
{
struct _B
{
unsigned char L;
unsigned char H;
} B;
unsigned short W;
} Size;
// Example 1 == I want to modify only word before I read two individual bytes.
Size.B.L = 0xFF;
Size.B.H = 0x20;
Size.W = 0x0A; // 1 bit as carry fell off Size.B.L and add carry to Size.B.H
unsigned char L = Size.B.L;
unsigned char H = Size.B.H;

The problem is that even if the type punning worked, the values
you get in L and H will vary. You're not even really guaranteed
that one will be 0x0A, and the other 0x00 (although it's hard to
imagine an implementation where this wouldn't be the case).

(Also, I can't make any sense of your comment.)
// Example 2 -- I want to avoid "AND" and right shift.
Size.B.L = 0xFF;
Size.B.H = 0x20;
Size.W = 0x0A; // 1 bit as carry fell off Size.B.L and add carry to Size.B.H
unsigned char L = Size.W & 0xFF;
unsigned char H = Size.W >> 8;
Read two bytes directly instead of "AND" and right shift can only have
two instructions of machine language.

I would expect just about any reasonable compiler to generate
the same code for your two examples. And if the code is
different, the second will probably be faster, since it will
only require one memory read.
Using "AND" and right shift may have more than two
instructions.

Or not. It depends on the compiler and the architecture.
Masking with 0xFF and shifting right 8 bits are common enough
idioms for accessing bytes that the compiler will recognize
them, and generate the byte access instructions, if that is the
fastest way to do it.
You have to decide which example 1 or 2 is best
for you. You tell C/C++ Compiler to test optimization and see which is
faster. I should use this to be ported to non-x86 machine and test it.
Please state your opinion. Should I use "AND" and shift? Should I use
union? Which is best practice writing C++?

Please state what you are trying to accomplish. Until we know
that, we can't very well say what the best way to do it is.
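
For reference, the mask-and-shift idiom discussed above can be wrapped in a small class so that calling code stays short without relying on a union. This is a sketch under assumptions, not a recommendation from the thread: Reg16 and its member names are hypothetical, and std::uint8_t / std::uint16_t assume a C++11 compiler.

```cpp
#include <cstdint>

// A 16-bit value with byte-wise accessors built from masks and shifts.
class Reg16 {
public:
    explicit Reg16(std::uint16_t v = 0) : value_(v) {}

    std::uint16_t word() const { return value_; }
    std::uint8_t low() const {
        return static_cast<std::uint8_t>(value_ & 0xFFu);
    }
    std::uint8_t high() const {
        return static_cast<std::uint8_t>(value_ >> 8);
    }

    void set_word(std::uint16_t v) { value_ = v; }
    void set_low(std::uint8_t b) {
        // Keep the high byte, replace the low byte.
        value_ = static_cast<std::uint16_t>((value_ & 0xFF00u) | b);
    }
    void set_high(std::uint8_t b) {
        // Keep the low byte, replace the high byte.
        value_ = static_cast<std::uint16_t>(
            (value_ & 0x00FFu) | (static_cast<unsigned>(b) << 8));
    }

private:
    std::uint16_t value_;
};
```

With this, the quoted examples become `r.set_low(0xFF); r.set_high(0x20); r.set_word(0x0A);`, after which r.low() is 0x0A and r.high() is 0x00 on every platform, since nothing depends on memory layout.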
 

Rolf Magnus

You could separate portability into several parts: The hardware
architecture, the operating system and the compiler. If we are talking
about x86, the hardware is pretty much fixed, but there are still different
compilers that may handle things differently.
I prefer to make sure to standardize my source code to ANSI C/C++ and
accepts portability.

That's basically a good idea, but for those parts that are as low-level as
CPU registers, you can't get, nor do you need, full portability.
Sometimes, Microsoft Visual C++ 9.0 has extended
standard that standard C/C++ Compiler does not accept. For example, you
use
comment "//" and "/* */" on file.c (not file.cpp). Standard C Compiler
does not accept "//".

This has actually been part of standard C for almost 9 years now.
How do you think? You try to map x86 register set. Then, how do you
map register set on non-x86 machine? Maybe some machine do not have four
sizes to be shared on one register meaning register set does not accept
byte, word, and dword.

x86 is the only architecture I heard of that does this. There are many
differences between CPU architectures concerning registers. Many other
architectures have a lot more registers, but some of those might have
special meaning. Other architectures don't even have registers. And there
are architectures which can dynamically switch between several modes, with
register sets behaving differently depending on the mode. Hardware
registers are as unportable as it gets.
It does accept qword. Then, you are able to use char, word, dword, and
qword on C++ source code. Then, C/C++ Compiler will convert all sizes to
64-bit size using mask to clear all upper bits like "AND".
What choice do I have? Should I use union if I want to share four
sizes into one 64 bit variable? Should I avoid union and use big size
using "AND" and right shift? Try to compare two examples below.

If you want to use 64 bit values, you already have to sacrifice portability.
In standard C++, there is no portable type that is guaranteed to be 64 bits
wide. Most compilers seem to offer such a type, but under different
compiler-specific names.
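
The statement about 64-bit types describes the C++ standard in force when this was written. Since C++11, unsigned long long is required to hold at least 64 bits and <cstdint> provides std::uint_least64_t (plus std::uint64_t where an exact 64-bit type exists). A small sketch assuming such a compiler; word_at is a hypothetical helper name:

```cpp
#include <climits>
#include <cstdint>

// unsigned long long must be able to hold at least 64 bits since C++11.
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long holds at least 64 bits");

// Extracts 16-bit word number `index` (0 = least significant, 3 = most
// significant of the low 64 bits) using shift and mask only.
std::uint_least16_t word_at(std::uint_least64_t value, unsigned index) {
    return static_cast<std::uint_least16_t>((value >> (16u * index)) & 0xFFFFu);
}
```

The caller is expected to keep index in the range 0 to 3.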
union _Size
{
struct _B
{
unsigned char L;
unsigned char H;
} B;
unsigned short W;
} Size;

Don't use names starting with an underscore followed by an uppercase letter.
Those are reserved for the compiler/standard library.
// Example 2 -- I want to avoid "AND" and right shift.
Size.B.L = 0xFF;
Size.B.H = 0x20;
Size.W = 0x0A; // 1 bit as carry fell off Size.B.L and add carry to Size.B.H
unsigned char L = Size.W & 0xFF;
unsigned char H = Size.W >> 8;

Read two bytes directly instead of "AND" and right shift can only have
two instructions of machine language. Using "AND" and right shift may
have more than two instructions.

It may or may not, depending on the optimization capabilities of your
compiler. Don't speculate what the compiler might produce out of your code.
If you want to know, look into assembler output. It might surprise you.
Also, there is still the cast option. Unions are not the language element
that is meant to be used for this kind of thing.
 

Pete Becker

I doubt the OP is even concerned about portability. Looks like he's
trying to map the x86 register set.

Even so, portability may be a legitimate concern. There is more than
one compiler that targets the x86.
 
