How to extract bytes from long?


Samuel Barber

Hello? The point is that -1 is ***being used as*** a bit pattern. The
intent is to get "all 1s", which is true if the integer representation
is 2's complement; that's an implicit assumption of the code. (This is
the best reason not to use -1: the intent is not perfectly clear).

(Quoting myself)

Please disregard this part of my reply. I misinterpreted what Pete was saying.

Sam
 

Samuel Barber

Arthur J. O'Dwyer said:
Of course it can! (Why wouldn't it? And do modern digital computers
perform any operations that *aren't* bitwise, anyway?)

The C bitwise operators are &, |, ^, and ~ (>> and << are also
included in this category, although bitwise is a misnomer in the case
of shifts). "Trap" in the context of this discussion seems to mean
"detect an illegal value"; well, there are only two values of interest
to bitwise operators (0 and 1), and they are both legal. So how can it
trap?

Sam
 

Samuel Barber

Sheldon Simms said:
6.2.6.2 Integer types
2 ... (It is implementation-defined) whether the value with ...
sign bit 1 and all value bits 1 (for one's complement), is a trap
representation or a normal value. In the case of ... one's
complement, if this representation is a normal value it is called
a negative zero.
...
3 If the implementation supports negative zeros, they shall be
generated only by: the &, |, ^, ~, <<, and >> operators with
arguments that produce such a value;
...
4 If the implementation does not support negative zeros, the
behavior of the &, |, ^, ~, <<, and >> operators with arguments
that would produce such a value is undefined.

Okay, but if we are to believe this hocus pocus, there is no way to
avoid the hypothetical trapping. It makes no difference whether you
use (unsigned char)-1, (unsigned char)~0, or UCHAR_MAX, since they all
evaluate to the same thing. All are equally right or equally wrong.

Sam
 

Irrwahn Grausewitz

Samuel Barber said:
The C bitwise operators are &, |, ^, and ~ (>> and << are also
included in this category, although bitwise is a misnomer in the case
of shifts).

Not from the standard's POV. In fact, they are referred to as "bitwise
shift operators" explicitly.

"Trap" in the context of this discussion seems to mean
"detect an illegal value"; well, there are only two values of interest
to bitwise operators (0 and 1), and they are both legal. So how can it
trap?

Read C99 6.2.6.2, Sheldon Simms already quoted the relevant parts in
reply to your other post.

Regards
 

Chris Torek

Samuel Barber said:
Okay, but if we are to believe this hocus pocus, there is no way to
avoid the hypothetical trapping. It makes no difference whether you
use (unsigned char)-1, (unsigned char)~0, or UCHAR_MAX, since they all
evaluate to the same thing. All are equally right or equally wrong.

You appear to have an incorrect "mental model" of how C works.
This is not surprising; I suspect most people do. (One needs to
have worked with those oddball ones'-complement machines to really
have a feel for this stuff.)

The language is not defined in terms of "what happens on a PDP-11",
nor "what happens on a VAX", nor even "what happens on an x86 or
other CPU produced within the last few years". Rather, it is defined
in terms of an "abstract machine". A C compiler writer must map
from "abstract machine" to "real machine" in some way.

The section quoted above (along with others) defines how the abstract
machine is to work. In the abstract machine, writing:

~0

means:

- make an int with the value 0
- now, flip all the bits

This process *can* give rise to a "trap representation" on a ones'
complement machine.

On the other hand, writing:

-1

means:

- make an int with the value 1
- now, negate it

This process *must* produce the (ordinary signed int) value -1.
On a ones' complement machine, this value in binary is a sequence
of 1 bits followed by a zero, e.g., 111111111111111110 -- 17 1 bits
and then a 0 -- on an 18-bit-int ones' complement CPU. (The CPU
I am using as a model here is the Univac 11xx, which has 9, 18,
and 36 bit integers and *does* use ones' complement.)
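
The bit patterns above can be reproduced on any machine by simulating an 18-bit ones' complement word inside an unsigned long. This is only a sketch: the names OC_WIDTH, OC_MASK, oc_encode and oc_not are invented for illustration, and it models the encoding only, not any trapping behaviour of real hardware.

```c
#include <assert.h>

#define OC_WIDTH 18                         /* assumed word size, as in the Univac example */
#define OC_MASK  ((1ul << OC_WIDTH) - 1ul)  /* eighteen 1 bits: 0x3FFFF */

/* Encode value as an 18-bit ones' complement pattern.
   Precondition: -131071 <= value <= 131071. */
unsigned long oc_encode(long value)
{
    assert(value >= -131071L && value <= 131071L);
    if (value >= 0)
        return (unsigned long)value;
    /* A negative number is the bitwise complement of its magnitude. */
    return ~(unsigned long)(-value) & OC_MASK;
}

/* Flip all 18 bits, as ~ would on the simulated machine. */
unsigned long oc_not(unsigned long pattern)
{
    return ~pattern & OC_MASK;
}
```

Here oc_encode(-1) gives 0x3FFFE (seventeen 1 bits and then a 0), while oc_not(oc_encode(0)) gives 0x3FFFF, the all-ones pattern that ones' complement reads as negative zero.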

Converting any ordinary signed int to type "unsigned char" *must*
produce a valid unsigned char bit pattern and value -- these are
defined as more or less the same thing in the abstract machine --
and the process by which a negative signed int is transformed into
a (positive) unsigned char is defined mathematically. If the
signed int has value -1, the result must be UCHAR_MAX, which is
a valid bit pattern that consists of all-1-bits, e.g., 111111111
(9 ones) on a 9-bit-byte ones' complement CPU.
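
A minimal sketch of the conversion rule just described, assuming a C99 compiler; the helper name all_ones_uchar is made up for illustration. The conversion works on values, not bit patterns, so the result should be UCHAR_MAX whatever the signed representation is.

```c
#include <limits.h>

/* Hypothetical helper: converts the int value -1 to unsigned char.
   The result is defined as -1 + (UCHAR_MAX + 1), i.e. UCHAR_MAX. */
unsigned char all_ones_uchar(void)
{
    int minus_one = -1;               /* an ordinary value, never a trap representation */
    return (unsigned char)minus_one;  /* value-based (modular) conversion */
}
```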

At this point you are probably ready to hit your "post follow-up"
key or mouseable button or whatnot, saying: "What?! HOLD ON! JUST
A CONSARNED MINUTE! That all-1-bits pattern, you just said it's
a trap representation, now you say it's a valid value?!?" Yep.
How can it be both?

The answer lies in the *type* of the value. When the *type* of
the value is "signed int", an all-one-bits pattern is allowed to
be a "trap representation". When the type is "unsigned int", this
is *not* allowed. If the target CPU makes this a royal pain in
the butt, well, too bad for the C compiler implementor and/or user
-- "unsigned"s are going to be difficult and/or slow. But if you
*need* all-one-bits patterns, you -- as a C programmer -- should
use "unsigned" arithmetic, which is well-behaved and avoids all
these "trap representation" things. Moreover, given:

unsigned int ui = UINT_MAX;

the sequence:

ui++;

is *guaranteed* to cause ui to "roll over" to zero, without trapping
at runtime with an overflow error. With ordinary signed ints there
is no guarantee -- they may "roll over" (from positive to negative
or vice versa) or they may trap at runtime, whichever the implementor
finds easier or "better".
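
The difference can be sketched as follows. The function names are hypothetical; the signed version tests for INT_MAX first because the overflow itself has no defined result.

```c
#include <limits.h>

/* Unsigned addition is reduced modulo UINT_MAX + 1, so this cannot trap. */
unsigned int next_unsigned(unsigned int ui)
{
    return ui + 1u;
}

/* Signed overflow is undefined, so check before incrementing.
   Returns 1 and stores i + 1 in *out if it fits, otherwise returns 0. */
int next_signed(int i, int *out)
{
    if (i == INT_MAX)
        return 0;
    *out = i + 1;
    return 1;
}
```

next_unsigned(UINT_MAX) yields 0, whereas next_signed(INT_MAX, &r) refuses to compute anything and reports failure.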

At the edges, the rules for C can get pretty complicated, but
there *are* simple answers for the common cases:

- If you need an ordinary signed integer and do not believe
you will overflow it, use an ordinary signed integer. (Use
"long" if your range is -2 billion to +2 billion; in C99, use
"long long" if your range is -9 quintillion to +9 quintillion.
Numerically these are 2147483647 and 9223372036854775807
respectively, in case you-the-reader are someone who uses
"milliard". :) Ordinary "int" is only guaranteed to handle
[-32767..+32767], even though it often handles the 2 billion
number.)

- If you need modular "clock arithmetic", use an unsigned integer.

- If you need to do bitwise operations, use an unsigned integer.

- If you need exact, precisely defined behavior in *all* cases,
use an unsigned integer, synthesizing your own signed values
from these if desired. (In other words, build your own ones'
or two's complement or sign-and-magnitude system.)
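
As one illustration of the last point, here is a sketch that builds a 16-bit two's complement system on top of unsigned long. The names encode_s16 and decode_s16 are assumptions for this example, not anything from the standard library.

```c
/* Store a value in the range -32768..32767 as a 16-bit two's complement
   pattern. Conversion to unsigned long is modular, so masking the result
   leaves the low 16 bits of the two's complement form. */
unsigned long encode_s16(long value)
{
    return (unsigned long)value & 0xFFFFul;
}

/* Interpret the low 16 bits of bits as a two's complement number. */
long decode_s16(unsigned long bits)
{
    bits &= 0xFFFFul;
    if (bits & 0x8000ul)
        return -(long)(0x10000ul - bits);  /* magnitude is 1..32768, fits in long */
    return (long)bits;
}
```

All the arithmetic is done on unsigned operands, so the behaviour is the same on every conforming implementation.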

Incidentally, one trick proposed (but not actually used on the
Univac) for unsigned integers vs. trap representations is, e.g.,
to have "unsigned int" be only 17 bits, while ordinary signed int
is 18 bits. Then UINT_MAX and INT_MAX are the same number (!),
and "unsigned"ness is achieved mainly by forcing the sign bit to
stay off. This appears to be allowed by the C standard. It is
therefore possible that the "simple rules" *still* do not achieve
the desired effect, depending on what that desired effect might
be.
 

cody

Arthur J. O'Dwyer said:
However, it *does* produce the right answer, which is a point in
its favor. ~0 might trap, and in any case I think (unsigned char)-1
has a bit more aesthetic value to it (YMMV, of course).

Why should ~0 trap??? It results in the 1's complement of 0, which means all
bits are 1's. Padding bits are not affected by this operation; however, the
values of padding bits should never be of interest to you.

A trap representation can *only* be generated when manipulating the value
using pointers which aren't of the value's type or don't start at the
exact address of the value. Wrong usage of a union can also result in a
trap representation.

But no arithmetic or bitwise operation can result in a
trap representation.
 

CBFalconer

pete said:
I was addressing the more general subject,
in the subject line of this thread: "How to extract bytes from long?",
rather than how to extract bytes from 0x12345678
or any other number which doesn't require more than 32 bits.

You were not extracting bytes. You were extracting 8 bit
quantities, least significant part first. In other words you are
expressing the _value_ in base 256, so why not say so in the code?

If the compiler knows that it can improve the code by using
shifts, it may do so.
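
Saying so in the code might look like the sketch below, which writes the base-256 digits of a value, least significant first. The name to_base256 and its parameters are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Write the n least significant base-256 digits of value into out,
   least significant digit first. Each digit is in the range 0..255,
   so it fits in an unsigned char on every implementation. */
void to_base256(unsigned long value, unsigned char *out, size_t n)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        out[i] = (unsigned char)(value % 256u);  /* current digit */
        value /= 256u;                           /* move to the next digit */
    }
}
```

For 0x12345678 and n = 4 the digits come out as 0x78, 0x56, 0x34, 0x12. The result depends only on the value, not on CHAR_BIT or byte order.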
 

Arthur J. O'Dwyer

Arthur J. O'Dwyer wrote:
...

You must have miscalculated somewhere. (uc)-1 should be UCHAR_MAX.

Augh! I do that every time! Thanks.

(uc)-1 == (uc)(0 - 1) == (-1+UCHAR_MAX+1) == UCHAR_MAX

-Arthur
 

Arthur J. O'Dwyer

cody said:
Why should ~0 trap??? It results in the 1's complement of 0, which
means all bits are 1's

...which may be a trap representation on a ones'-complement
architecture.

Padding bits are not affected by this operation; however, the
values of padding bits should never be of interest to you.

Well, technically padding bits *might* be affected by the ~
operation, but the effect on the padding bits alone cannot create
a trap representation -- the system has to remember to do the
Right Thing with them in this case.

A trap representation can *only* be generated when manipulating the
value using pointers which aren't of the value's type or don't start
at the exact address of the value.

Wrong. Signed integer overflow may create a trap value, for instance.

Wrong usage of a union can also result in a trap representation.

But no arithmetic or bitwise operation can result in a
trap representation.

Wrong.

-Arthur
 

Sheldon Simms

cody said:
Why should ~0 trap??? It results in the 1's complement of 0, which means all
bits are 1's. Padding bits are not affected by this operation; however, the
values of padding bits should never be of interest to you.

A trap representation can *only* be generated when manipulating the value
using pointers which aren't of the value's type or don't start at the
exact address of the value. Wrong usage of a union can also result in a
trap representation.

But no arithmetic or bitwise operation can result in a
trap representation.

Go back and read the whole thread.
 

pete

CBFalconer said:
You were not extracting bytes. You were extracting 8 bit
quantities, least significant part first. In other words you are
expressing the _value_ in base 256, so why not say so in the code?

I don't know what you're talking about.
Which code do you think is mine?
 

Samuel Barber

Chris Torek said:
The section quoted above (along with others) defines how the abstract
machine is to work. In the abstract machine, writing:

~0

means:

- make an int with the value 0
- now, flip all the bits

This process *can* give rise to a "trap representation" on a ones'
complement machine.

Thanks, I get it now. The correct usage is therefore ~0u or ~0U.

Sam
 

Samuel Barber

You're just repeating words, without any understanding. There's no
such thing as "negative zero" (or negative anything) in the context of
bitwise operations. How can a bitwise operation trap? It can't.

I was wrong about everything. ~0 is "wrong" (in terms of the abstract
C machine); the correct expression is ~0u or ~0U.
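
A short sketch of the unsigned form, assuming nothing beyond <limits.h>; the helper names all_ones_uint and uchar_mask are made up for this example. Because the operand of ~ is unsigned, no signed representation (and so no negative zero) is ever involved.

```c
#include <limits.h>

/* ~0u flips the bits of an unsigned zero, so the result is UINT_MAX. */
unsigned int all_ones_uint(void)
{
    return ~0u;
}

/* Converting UINT_MAX to unsigned char reduces it modulo UCHAR_MAX + 1,
   which leaves UCHAR_MAX: an all-ones byte mask. */
unsigned char uchar_mask(void)
{
    return (unsigned char)~0u;
}
```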

Sam
 

Micah Cowan

cody said:
Why should ~0 trap??? It results in the 1's complement of 0, which means all
bits are 1's. Padding bits are not affected by this operation; however, the
values of padding bits should never be of interest to you.

A trap representation can *only* be generated when manipulating the value
using pointers which aren't of the value's type or don't start at the
exact address of the value. Wrong usage of a union can also result in a
trap representation.

But no arithmetic or bitwise operation can result in a
trap representation.

This is not true. You should read the relevant portions of the
standard before making such assertions.

-Micah
 

Micah Cowan

Arthur J. O'Dwyer said:
...which may be a trap representation on a ones'-complement
architecture.


Well, technically padding bits *might* be affected by the ~
operation, but the effect on the padding bits alone cannot create
a trap representation -- the system has to remember to do the
Right Thing with them in this case.

I don't believe this is true. Can you back this up with a quote
from the standard? I cannot remember any instance where the
standard says this can't be: and I can think of a specific spot
where the standard says (non-normatively) that this *can* be (see
footnote 45 to 6.2.6.2#5). The lack of exclusion means it
certainly is possible.

-Micah
 

Arthur J. O'Dwyer

Micah Cowan said:
I don't believe this is true. Can you back this up with a quote
from the standard?

From N869, section 6.2.6.2, footnote 39:

Some combinations of padding bits might generate trap
representations, for example, if one padding bit is a
parity bit. Regardless, no arithmetic operation on valid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
values can generate a trap representation other than as
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
part of an exception such as an overflow, and this cannot
occur with unsigned types. All other combinations of
padding bits are alternative object representations of
the value specified by the value bits.

(The index to N869 clearly puts "bitwise operators" under "arithmetic
operators," even if the text doesn't explicitly say so.)

This is obviously the Right Thing for the standard to say, too,
since if padding bits' values *could* create trap representations
out of thin air, how could programmers on those platforms ever
compute anything?

I cannot remember any instance where the
standard says this can't be: and I can think of a specific spot
where the standard says (non-normatively) that this *can* be (see
footnote 45 to 6.2.6.2#5).

N869 doesn't have a 6.2.6.2#5. Could you post that paragraph and
footnote, please?

-Arthur
 

Micah Cowan

Arthur J. O'Dwyer said:
From N869, section 6.2.6.2, footnote 39:

Some combinations of padding bits might generate trap
representations, for example, if one padding bit is a
parity bit. Regardless, no arithmetic operation on valid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
values can generate a trap representation other than as
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
part of an exception such as an overflow, and this cannot
occur with unsigned types. All other combinations of
padding bits are alternative object representations of
the value specified by the value bits.

(The index to N869 clearly puts "bitwise operators" under "arithmetic
operators," even if the text doesn't explicitly say so.)

The above quote is footnote 45 (*and* 44: typo!) of
6.2.6.2 in the final version.

The same grouping appears in the index of the final version; but
the index isn't normative (nor are the footnotes). However, the
footnote you quote above still seems to be borne out by normative
text. But I don't think the quote quite supports your assertion
that "the padding bits alone cannot create a trap
representation": The fact that "proper" arithmetic operations
cannot affect the padding bits in such a way as to create a trap
representation doesn't imply that affecting the padding bits can
not be the sole cause of trap representations; it means that
implementations can't cause padding bits to somehow find
themselves in that representation in the course of well-defined
arithmetic operations. But this is obvious, because the only way
to affect padding bits in the first place is to use arithmetic
operations in such a way as to cause exceptional conditions (such
as overflows). I see no reason why using ~ on a signed integer
type can't cause such an exceptional condition: Obviously,
overflow itself can easily occur, and the very act of flipping a
padding bit could be considered an "exceptional condition",
AFAICT.

This is obviously the Right Thing for the standard to say, too,
since if padding bits' values *could* create trap representations
out of thin air, how could programmers on those platforms ever
compute anything?

This is not a valid conclusion. Assuming padding bits' values
*can* create trap representations out of thin air, you can only
conclude that an implementation can't arrange padding bits to
such a representation when arithmetic operators are being
*properly* used. Can you think of a well-defined way to directly
affect padding bits? But some ways that don't involve arithmetic
operators (directly) would be to read an uninitialized value
(which could already *be* a trap representation), or you could
do:

int foo;
unsigned char *bar = (unsigned char *)&foo; /* Well-defined */

bar[0] = bar[0] ^ 1; /* Well-defined: flips the lowest bit of the first
                        byte, which might be a padding bit of foo. */
foo; /* Not well-defined: might be a trap representation. */

N869 doesn't have a 6.2.6.2#5. Could you post that paragraph and
footnote, please?

The paragraph doesn't tell you anything you don't know (it's got
to be in N869), and the footnote is the very same one you've
quoted from N869. But here it is, anyway (6.2.6.2#5:)

The values of any padding bits are unspecified.45) A valid
(non-trap) object representation of a signed integer type where
the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same
value.
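
Whether an implementation has padding bits at all can be estimated from within a program. The sketch below, with the hypothetical names uint_value_bits and uint_padding_bits, counts the value bits of unsigned int and compares that with the size of its storage.

```c
#include <limits.h>
#include <stddef.h>

/* Count the value bits of unsigned int by shifting UINT_MAX down to zero. */
size_t uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    size_t bits = 0;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Storage bits minus value bits gives the number of padding bits. */
size_t uint_padding_bits(void)
{
    return sizeof(unsigned int) * CHAR_BIT - uint_value_bits();
}
```

On mainstream desktop compilers one would expect uint_padding_bits() to return 0, but that is an observation about those platforms, not something the standard promises.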

-Micah
 

Irrwahn Grausewitz

Micah Cowan said:
The paragraph doesn't tell you anything you don't know (it's got
to be in N869)
<snip>

Chapter 6.2.6.2:

C99 final        N869
   #1      -->    #1
   #2      -->    #2
   #3      -->    --
   #4      -->    --
   #5      -->    #3
   #6      -->    #4

Regards
 
