Size of bool unspecified: why?

JBarleycorn

James said:
No. Unspecified means that you cannot know or depend on the
size; it might change from one compilation to the next.

So is the list of unspecified things in the standard very long? I'd like
to know what those things are. I surely hope bool is not unspecified, but
even if it is, I don't care, because I don't use it directly. I know that
bool is broken, I just don't know how broken it is.
Implementation defined means that each implementation must
define it, and document how it is defined.



C didn't have bool, so there can't be an issue of C
compatibility.
Indeed.

As for data transfer and storage, you treat it
like any other type: you define the external format, and
implement the conversion.

Unnecessary tedium. Use bool though, and it becomes necessary tedium.
In practice, it's probably simpler to
handle than most other types, since it's relatively trivial to
handle all possible values separately.



If the representation is a single byte, and both the caller
and the callee agree, there's no need for extension.
Extension
is only necessary if you have a smaller size, and need to pass a
larger one.

I have a feeling that "most" compilers by default expand all arguments
less than the "word" size to the "word" size. Surely for simplicity of
implementation... stack offsets, etc.
Depending on the architecture, it may be necessary, or more
efficient, to pass via a register, rather than to push directly
onto the stack, if the bool is in an array (and thus
misaligned).



Have you actually verified that?

I said "probably", as I wouldn't expect that, and I don't know why, but I
must have had something in mind when I wrote that. Certainly easy enough
to check: just look for the extending instructions instead of the
non-extending instructions. "movzx" vs. "mov" on x86, for example.
I would expect almost every
compiler to do so when it is advantageous.



Even common practice depends on the platform.

Not always, and not necessarily usually (it depends on the area of
interest) and surely I was asking about cross-platform commonality *if*
there was any.
 
JBarleycorn

Ian said:
Anyone who is writing for a specific platform, or who doesn't care
about the sizes of natural types.


OK, give one concrete example where the size of bool matters.


So you are writing embedded code or drivers? There are very few other
places where the size of a type matters.


You keep saying bool is broken, but you still haven't provided any
real world examples to back it up.


How would your design cope with both 8 bit MCUs and 32 bit DSPs (where
CHAR_BIT = 32)?

*I* don't have to for I consider anything where char is not 8 bits to be
an exotic.
Please name one.

On the other stuff, we're going around in circles so I'll just leave it
at that. Please see my replies to JK in this thread for more insight into the
"bool is broken" thing. I've already noted that having some kind of
specification of fixed-width integral types in the standard shows that
the size of a type does matter. In addition, I'll add here that
recognition that type size matters was the motivation for modifying enums
so that the type could be specified. Before that, enums were also broken.
In the same vein, bool is broken, because there is no way to rely on the
size of it (though in practice maybe it doesn't vary so much across
platforms... I don't know).
 
JBarleycorn

Ian said:
Can anyone here say definitively whether bool is this "unspecified"
thing or "implementation defined" or... ? (!)

From

5.3.3 1

"[Note: in particular, sizeof(bool) and sizeof(wchar_t) are
implementation-defined]"

Thx. That's still not good enough to make bool unbroken, though.
It *is* what makes bool broken.
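
For reference, a minimal sketch of how one might inspect, or pin down, what a particular implementation chose for sizeof(bool). The names kBoolBytes and kBoolBits are illustrative and not from the thread, and the commented-out assertion is only one assumption a project could choose to enforce.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Implementation-defined: may differ between compilers, but is fixed and
// documented for any given one.
constexpr std::size_t kBoolBytes = sizeof(bool);
constexpr std::size_t kBoolBits  = sizeof(bool) * CHAR_BIT;

// Always true: no complete object type is smaller than one byte.
static_assert(kBoolBytes >= 1, "bool occupies at least one byte");

// Hypothetical project-level assumption; uncomment to make the build fail
// on any implementation where bool is not exactly one byte.
// static_assert(sizeof(bool) == 1, "this code assumes a one-byte bool");
```

Because the check runs at compile time, a changed size after a compiler upgrade would show up as a build failure rather than as unreadable data.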
 
Ian Collins

*I* don't have to for I consider anything where char is not 8 bits to be
an exotic.

How can something that exists in the hundreds of millions be considered
exotic?
On the other stuff, we're going around in circles so I'll just leave it
at that. Please see my replies to JK in this thread for more insight into the
"bool is broken" thing.

I assume you won't give a real world example why bool is "broken"
because you don't have one.
I've already noted that having some kind of
specification of fixed-width integral types in the standard shows that
the size of a type does matter.

For specialised cases, where bool wouldn't be appropriate.
In addition, I'll add here that
recognition that type size matters was the motivation for modifying enums
so that the type could be specified. Before that, enums were also broken.
In the same vein, bool is broken, because there is no way to rely on the
size of it (though in practice maybe it doesn't vary so much across
platforms... I don't know).

Yet again, in most cases, the size doesn't matter.
 
JBarleycorn

Ian said:
How can something that exists in the hundreds of millions be
considered exotic?


I assume you won't give a real world example why bool is "broken"
because you don't have one.

That you wrote that and keep asking that shows you don't know what the
discussion is all about. I refer you to read the other posts.
For specialised cases, where bool wouldn't be appropriate.


Yet again, in most cases, the size doesn't matter.

Just the opposite, to me (as I have noted in *another post*): in most
cases, type size does matter. That you may choose not to take advantage
of the capabilities that the language offers is up to you. I, OTOH,
exploit it as required and as can be had.
 
Ian Collins

That you wrote that and keep asking that shows you don't know what the
discussion is all about. I refer you to read the other posts.

I keep asking because you keep making unsubstantiated claims. Time to
put up.
 
Nick Keighley

Ian Collins wrote:


Why would anyone expect sizeof(int) to be anything specific? int is not
an issue. The fixed-width sizes are an issue, but you can test for it and
do whatever is necessary to maintain consistency.

If you want a struct with predictable layout you have to replace all
the ints with int32_t (or int16_t). Just do the same with bools. I'm
sorry I can't see the difference.

<snip>

Why is bool in any way special?
 
Nick Keighley

The only fixed-width types I know are optional, and aren't
supported by implementations which don't have immediate hardware
support for them.  Their use is fairly limited, and I can't
think of a case where I'd use them in a struct.  (They are
convenient for certain operations, however, when their
restricted portability isn't an issue.)


You keep saying this, but it flies in the face of actual facts.
People use bool in structs all the time, with no problems.


You've mentioned serialization several times, but you've yet to
explain how the size of bool being implementation
defined affects it in any way.  The internal representation
doesn't generally affect serialization.  And it varies, not just
for bool, but for int.

I'm guessing he wants to write raw structs to disk or to comms links.
Hence he cares about layout. I've spent too much time debugging
programs that did this to be keen on the idea. So I consider
serialization a necessary evil. Since it is rather tedious code, its
generation can be mechanised.

I'm still baffled as to why 'bool' is special but 'int' isn't. I
wonder what he does with floating point... Oh wait everyone uses IEEE
so portability is trivial! :)
 
Nick Keighley

How doesn't it?


Well, there are structs, and then there are structs.




On VC++ 2010 it is one byte. In some version long ago, it was the size of
int, but probably because that was before the standard declared it could
be unspecified. Let's play a game ("Would you like to play a game? How
about a nice game of chess?" ;) ). If you could have your bool and eat it
too... let me rephrase that: if you could create your own boolean type,
describe it. Describe 2 if you want to: one for C++, and one for your
ideal language that does not yet exist.

define-type bool
bool is-element-of {symbol-true, symbol-false}
abbrev symbol-true=T symbol-false=F -- just within this
definition

operators
and: T and T = T, T and F = F, F and T = F, F and F = F
...

I believe bools can be defined in terms of lattices (the hour I spent
on that comp sci lecture I will never get back)
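
One way the truth-table description above might be rendered in C++11 or later, as a sketch only: a scoped enum with an explicitly chosen one-byte underlying type. The names Boolean, T and F are illustrative, and only "and", "or" and "not" are shown.

```cpp
// A two-valued type whose storage size is chosen by the programmer.
// sizeof(unsigned char) is 1 by definition, so sizeof(Boolean) is 1 too.
enum class Boolean : unsigned char { F = 0, T = 1 };

// and: T and T = T, every other combination = F
constexpr Boolean operator&(Boolean a, Boolean b) {
    return (a == Boolean::T && b == Boolean::T) ? Boolean::T : Boolean::F;
}

// or: F or F = F, every other combination = T
constexpr Boolean operator|(Boolean a, Boolean b) {
    return (a == Boolean::T || b == Boolean::T) ? Boolean::T : Boolean::F;
}

// not: swaps T and F
constexpr Boolean operator!(Boolean a) {
    return a == Boolean::T ? Boolean::F : Boolean::T;
}

static_assert(sizeof(Boolean) == 1, "underlying type is unsigned char");
```

Unlike bool, this type has no implicit conversion to int. The trade-off is that conditions such as `if (x == Boolean::T)` must be spelled out, and the comparison itself still yields a plain bool.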
 
James Kanze

Why would anyone expect sizeof(int) to be anything specific?
int is not an issue. The fixed-width sizes are an issue, but
you can test for it and do whatever is necessary to maintain
consistency.
A different numeric type, which there are a number to choose
from. Who writes a program that doesn't have a header that
tests for assumptions based upon the platform?

Who does? In over thirty years of experience in C and C++, I've
yet to see any application which uses such a header. I'm
tempted to say that someone who uses such a header doesn't
understand the basic philosophy behind C and C++.
If bool doesn't meet your expectations, you don't have the option to
change to a similar type of different width, because there is no similar
type of different width.

Sure there is: signed char, short, int and long are all similar
types, with potentially different widths. (Don't ask me why
bool is a signed integral type, rather than an unsigned one.)
You have to change compilers or something. *Or*, avoid bool
from the get go noting from the start that you can't rely on
the width of bool.

Exactly like every other integral type.
*Very* often.

For example? You keep saying it's important, but I've only
encountered the case in a few rare instances. (For starters, it
can't be important in portable code.)
*Very* often.

Then there's something wrong with your programming style.
*Very* much more often than using non-fixed-width integral
types.

Then there's something very wrong with your programming style.
Well there you go then.

Drivers are a bit special, but even in drivers, it's not rare to
use the standard types. And of course, unless things have
changed enormously since I was writing drivers, the fixed width
types don't occur in structs.

[...]
No, not likely for then bool would be eliminated from a lot of places
where it could be used to good effect. I think multiple-width boolean
types can be made to work, but the complexity is not worth it, so 1-byte
bools, even though there may be a performance penalty on most platforms,
are probably the best choice. I have pretty much decided that where a
"fast bool" is needed it's a profile/optimization thing and trying to
find an optimum boolean type size as the best compromise across a set of
platforms falls into the category of premature optimization. Therefore,
the overriding concern is suitability for use as a struct field. So, my
boolean class uses a 1-byte type as the underlying type.

That sounds like the most rational decision. Or rather, you use
the bool the implementation gives you, until you need something
faster for your application. (As I've mentioned earlier, I
expect that bool is a single byte on most, if not all, byte
addressed platforms.)
 
James Kanze

If you want a struct with predictable layout you have to replace all
the ints with int32_t (or int16_t). Just do the same with bools. I'm
sorry I can't see the difference.

That still doesn't give you a struct with predictable layout.
int32_t has a different layout on a Sparc than on a PC. And
doesn't even exist on some mainframes. (The standard requires
int32_t to be a 2's complement representation. And at least two
mainframe architectures don't use 2's complement.) Machines
without int32_t can probably be considered exotic, and I suspect
that most programmers don't have to take them into account. On
the other hand, Intel's layout of int32_t is not the same as
that of a Sparc or a Power PC, both widespread architectures.
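
A minimal sketch of the usual way around the layout difference: define the external byte order and build it with shifts, so the host's in-memory layout never enters into it. The function names are illustrative; std::uint_least32_t is used because, unlike int32_t, it is available on every conforming implementation. Requires C++14 for the constexpr array access.

```cpp
#include <array>
#include <cstdint>

// Encode the low 32 bits of v as four octets, most significant first,
// regardless of how the host lays out integers in memory.
constexpr std::array<unsigned char, 4> toBigEndian(std::uint_least32_t v) {
    return {{ static_cast<unsigned char>((v >> 24) & 0xFFu),
              static_cast<unsigned char>((v >> 16) & 0xFFu),
              static_cast<unsigned char>((v >> 8) & 0xFFu),
              static_cast<unsigned char>(v & 0xFFu) }};
}

// Inverse of toBigEndian: rebuild the value from the four octets.
constexpr std::uint_least32_t fromBigEndian(const std::array<unsigned char, 4>& b) {
    return (static_cast<std::uint_least32_t>(b[0]) << 24) |
           (static_cast<std::uint_least32_t>(b[1]) << 16) |
           (static_cast<std::uint_least32_t>(b[2]) << 8) |
            static_cast<std::uint_least32_t>(b[3]);
}
```

The same source produces the same four octets on an Intel, a Sparc or a Power PC, which is what a raw struct write cannot promise.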
 
James Kanze

James Kanze wrote:
[...]
Their use is fairly limited, and I can't
think of a case where I'd use them in a struct. (They are
convenient for certain operations, however, when their
restricted portability isn't an issue.)
Your view of the concept seems to be exactly the opposite of mine. Your
view seems more appropriate, to me, for a 4GL rather than for C/C++.

My view is based on Kernighan and Ritchie. It was current thirty
years ago, and dominates things like Unix kernel code (which is
hardly written in the spirit of a 4GL).
It depends on how you program. When I say that, I mean for the way I
program and use the language.

I'd still be interested in seeing an actual example where the
size of bool makes a difference. Suppose it was required to be
1: XDR requires it to be four bytes, so if you're serializing in
XDR format, you write:

void
writeBool( std::ostream& dest, bool value )
{
    writeInt( dest, value ? 1 : 0 );
}

or something along those lines. Code which, of course, works
equally well if the size of bool is something other than 1.
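
A slightly fuller sketch of the same idea, assuming the XDR convention of a 4-byte big-endian integer holding 0 or 1. The names xdrEncodeBool, xdrDecodeBool and writeBoolXdr are hypothetical, and the lenient decoding (any non-zero octet counts as true) is a choice made for this sketch, not something XDR mandates. Requires C++14.

```cpp
#include <array>
#include <ostream>

// External form of a boolean: four octets, big-endian, value 0 or 1.
// Nothing here depends on sizeof(bool).
constexpr std::array<unsigned char, 4> xdrEncodeBool(bool value) {
    return {{ 0, 0, 0, static_cast<unsigned char>(value ? 1 : 0) }};
}

// Lenient decoding: any non-zero octet is treated as true.
constexpr bool xdrDecodeBool(const std::array<unsigned char, 4>& bytes) {
    return (bytes[0] | bytes[1] | bytes[2] | bytes[3]) != 0;
}

// Stream wrapper; dest is assumed to have been opened in binary mode.
inline void writeBoolXdr(std::ostream& dest, bool value) {
    const std::array<unsigned char, 4> bytes = xdrEncodeBool(value);
    dest.write(reinterpret_cast<const char*>(bytes.data()),
               static_cast<std::streamsize>(bytes.size()));
}
```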
If I had to be continually faced with having to think about what the
effect of integral type size changes would be in C++, I would not use the
language. Though maybe I wouldn't program at all then!

Who thinks of it? Ever. About the only thing one might
consider is whether the type is large enough: I've seen a lot of
code which uses long, rather than int, because INT_MAX is only
guaranteed to be 32767. This even in code that will clearly
never run on a 16 bit machine. *IF* I felt I had to take this
into consideration, I'd use things like int_least32_t or
int_fast32_t. (But if the code clearly will never be used on a
16 bit machine, I'll just go with int.)
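
A short sketch of what "large enough" looks like in code, using the least and fast types mentioned above. The aliases Count and FastCount and the helper intCanHold are illustrative names.

```cpp
#include <climits>   // INT_MAX
#include <cstdint>   // std::int_least32_t, std::int_fast32_t
#include <limits>

// Guaranteed on every conforming implementation.
static_assert(INT_MAX >= 32767, "int has at least 16 bits");

// Smallest type with at least 32 bits, and a fast type with at least
// 32 bits. Both always exist, unlike the exact-width int32_t.
using Count     = std::int_least32_t;
using FastCount = std::int_fast32_t;

static_assert(std::numeric_limits<Count>::max() >= 2147483647L,
              "int_least32_t holds any 32-bit value");
static_assert(std::numeric_limits<FastCount>::max() >= 2147483647L,
              "int_fast32_t holds any 32-bit value");

// True when plain int is already wide enough for values up to limit.
constexpr bool intCanHold(long limit) { return limit <= INT_MAX; }
```

Only the minimum range is stated; the actual width is left to the implementation.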
int and bool are in the same boat in that respect. The *only* time I use
int, or its unsigned counterpart, is when I consciously *want* a type to
expand across 32-bit and 64-bit platforms, and yes, I test in a header if
my assumptions are valid on a given platform. I view fixed-width integral
types, even if I have to synthesize them as I did before there was
stdint.h, as the "normal" ones and the "expanding" types as the special
ones. Try it sometime and you'll probably never go back to the other way
of thinking.

Why? It sounds like a major step backwards to me, and it
certainly isn't how the language was designed to be used.
 
James Kanze

[...]
First, I don't think it's unspecified.
This "unspecified" designation is something new to me. I don't really
understand the point of it or how to determine if something is
"implementation defined" or that.

How you determine is by reading the standard. The distinction
is formally important: something that is implementation defined
must be fully defined and specified by the implementation, so if
you are writing for a specific implementation, you can count on
it. The size of an int is a good example: it may vary between
different implementations, but for a given implementation, it
will be the same every time you compile your code (at least with
the same options), and (at least in theory), you can find out
exactly what it is in the documentation. Something which is
unspecified need not be documented, and can vary. The classical
example is order of evaluation: if you write something like 'f()
+ g()', and f() and g() have side effects, the order those side
effects occur is unspecified, and might be different from one
compilation to the next (and often will be different if you
change the optimization options).
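
A small sketch of the 'f() + g()' case, assuming C++14. The Trace struct is a hypothetical helper that records which call ran first; f and g here take it as a parameter only so the effect can be observed.

```cpp
// Records which function ran first (1 for f, 2 for g) and how many ran.
struct Trace { int first; int calls; };

constexpr int f(Trace& t) { if (t.calls++ == 0) t.first = 1; return 10; }
constexpr int g(Trace& t) { if (t.calls++ == 0) t.first = 2; return 20; }

// The sum is always 30, but whether f or g runs first is unspecified:
// the two calls are indeterminately sequenced, so t.first may legitimately
// come back as 1 or as 2.
constexpr Trace evaluate() {
    Trace t{0, 0};
    const int sum = f(t) + g(t);
    return sum == 30 ? t : Trace{-1, -1};
}
```

Code that is correct for either value of `first` is fine; code that depends on one of them is relying on something no implementation has to document.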
There are a few different integral types, that are not boolean, from
which to synthesize "fixed-width" integral types.

'bool' is an integral type. In the rare cases where you need
fixed-width types, you can specify int32_t, and assign a bool to
it, with no problem.
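
A minimal sketch of that suggestion. WireRecord, makeRecord and recordIsValid are hypothetical names, and the code assumes std::int32_t exists on the target, which the standard does not require.

```cpp
#include <cstdint>

// A record whose flag lives in a fixed-width field instead of a bool.
struct WireRecord {
    std::int32_t isValid;  // holds 0 or 1
    std::int32_t value;
};

// bool converts implicitly to the integer 0 or 1; no cast is needed.
constexpr WireRecord makeRecord(bool valid, std::int32_t value) {
    return WireRecord{ valid, value };
}

// Going back the other way: any non-zero value reads as true.
constexpr bool recordIsValid(const WireRecord& r) {
    return r.isValid != 0;
}
```

Fixing the field widths says nothing about byte order, so this alone does not make the struct safe to write out raw.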
APIs are only half of the story.

So what's the other half?

[...]
Right, they didn't, but had they, they would have stood at
least some chance at not implementing a boolean type that is
broken.

Well, the C++ definition of bool isn't broken. IMHO, the C
definition is, but the C committee obviously doesn't agree with
me on that:). In both cases, the type has been "weakened" by
considerations of backwards compatibility: implicit conversions
in C++ (so that e.g. 'if (p)', where 'p' is a pointer, still
works---the original proposal would have deprecated this, but
the final version didn't), and in C, the need to include a
special header to "activate" bool, and the fact that it is still
a typedef to another integral type.
[...]
Are not 64-bit ints indeed more
efficient (in time) than 32-bit ints, say on x86 vs. x64? Anyone
have a good simple test for this?
It depends on how you use them, but I'd be very surprised if
there were any use cases where 64 bit ints would be faster than
32 bit ints.
I was thinking that the register size and instructions on a 64-bit
machine would somehow favor 64-bit operations.

I can't speak for all 64 bit machines, of course, but at least
on an Intel or a Sparc, there should be no speed difference for
an isolated instance.
In usage, those probably cover enough of the territory. From a language
implementation standpoint, some of those 12 others you have in mind may
be important. What are the 12 others you have in mind?

The most important one is probably the cost of testing one, i.e.
the execution time for 'if ( aBooleanValue )'. In some
contexts, the cost of handling large arrays (where the number of
instances you can fit in a cache line is important) may be
significant.

Note that on most of the systems I currently use, the effective
size of a bool will vary: although the value is effectively
maintained on a single byte, they will pass a bool into a
function as a 4 or 8 byte value, and in a struct, the effective
size (including padding) will depend on what comes after the
bool in the struct, and may be one, two, four or eight bytes.
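
The struct case can be observed directly with offsetof. A sketch, with illustrative struct and constant names; the actual padding values depend on the platform and are deliberately not asserted.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct FlagThenChar   { bool flag; char   c; };
struct FlagThenDouble { bool flag; double value; };

// Bytes of padding the compiler inserted between flag and the next member.
// On many current desktop platforms the first is 0 and the second is 7,
// but neither figure is guaranteed.
constexpr std::size_t kPadBeforeChar   = offsetof(FlagThenChar, c) - sizeof(bool);
constexpr std::size_t kPadBeforeDouble = offsetof(FlagThenDouble, value) - sizeof(bool);

// The only portable statement: the struct is at least as big as its members.
static_assert(sizeof(FlagThenDouble) >= sizeof(bool) + sizeof(double),
              "members plus any padding");
```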
Do you know of any test suites to analyze such? That is, not the cache
line thing, but the integral type thing, for the latter is the more
abstract thing.

No. In practice, it tends to vary, and even depend on the exact
model of the chip being used. Still, in my current work (which
includes a lot of high performance number crunching), we
consistently find that reducing the size of each element in an
array improves performance across the board, on all of our
different machines. In one particular case, for example,
replacing:

struct Something { bool isValid; double value; };
std::vector<Something> theData;

with:

std::vector<double> theValues;
std::vector<unsigned char> theIsValidFlags; // 8 flags per entry

resulted in a significant speed-up, despite the additional
effort needed to extract the flag. (Note that I'm not
recommending this in general. It just happened to improve
performance in one particular critical case in our code.)
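
A sketch of how the "8 flags per entry" layout might be wrapped, so the bit extraction stays in one place. Samples, flagByte and flagMask are hypothetical names; the original code is not shown in the post, so this is only one plausible shape for it.

```cpp
#include <cstddef>
#include <vector>

// Which byte of the flag array holds flag i, and which bit within it.
constexpr std::size_t flagByte(std::size_t i) { return i / 8; }
constexpr unsigned char flagMask(std::size_t i) {
    return static_cast<unsigned char>(1u << (i % 8));
}

struct Samples {
    std::vector<double> values;
    std::vector<unsigned char> validFlags;  // 8 flags per entry

    // Append a value together with its validity flag.
    void push(double v, bool valid) {
        const std::size_t i = values.size();
        values.push_back(v);
        if (flagByte(i) >= validFlags.size()) validFlags.push_back(0);
        if (valid) validFlags[flagByte(i)] |= flagMask(i);
    }

    // Read back the flag for element i (i must be less than values.size()).
    bool isValid(std::size_t i) const {
        return (validFlags[flagByte(i)] & flagMask(i)) != 0;
    }
};
```

The doubles are now contiguous with no padding between them, and eight flags share a byte, which is where the cache benefit described above would come from. Whether it pays off is something to measure, not assume.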
 
James Kanze

Try it sometime. Your paradigm may be holding back your potential.

I suspect the reason Nick argues against it is that he has tried
it. I can remember when Microsoft changed the layout of a long
between versions of a compiler. Update the compiler, and none
of your data was readable. Unless, of course, you'd written the
serialization code correctly. And of course, a lot of companies
today are migrating from Solaris Sparc or IBM AIX to Linux on a
PC.
 
James Kanze

[...]
No. Unspecified means that you cannot know or depend on the
size; it might change from one compilation to the next.
So is the list of unspecified things in the standard very long?

Definitely more than I'd like. The two which seem to cause the
most problems in the code I've seen are order of evaluation:
something like:

std::cout << f() << ' ' << g() << std::endl;

, for example, where both f() and g() use and modify the same
global variable, and whether intermediate values in a floating
point expression are in extended precision or not: the trickiest
case I saw of that was where someone had defined:

bool operator<( MyClass const& lhs, MyClass const& rhs )
{
return lhs.doubleCalule() < rhs.doubleCalule();
}

and used it as an ordering operator for 'sort'. (The compiler
he was using returned the floating point value in a register
with extended precision, but truncated to double when it spilled
to memory. Which resulted in the function returning true when
lhs and rhs designated the same object.)

[...]
Unnecessary tedium. Use bool though, and it becomes necessary tedium.

Only unnecessary if you don't need to read the data later (in
which case, writing it out in any format is unnecessary tedium).

Of course, normally, all serialization code will be generated by
other programs anyway. The only thing that is hand written is
the low level code to handle the basic types.
I have a feeling that "most" compilers by default expand all arguments
less than the "word" size to the "word" size. Surely for simplicity of
implementation... stack offsets, etc.

Most compilers do insert padding, since not doing so will either
slow the code down significantly (Intel) or cause the program to
crash (most other processors) because of misaligned data.
Inserting padding is not expanding an argument to word size;
when a compiler inserts padding, the bytes in the padding have
unspecified (and often random) values. If I have something
like:

void f( char ch );

char aChar;
f( aChar );

I expect a compiler on an Intel to generate either:

push aChar

(if `aChar` is correctly aligned), or

mov al, aChar
push eax

to pass the argument. Four bytes end up on the stack, but only
one is significant, and read by the called code. (On a Sparc,
of course, arguments are passed in registers, and there are no
byte registers, so you end up with something like:

ldsb aChar, %o0

---load signed byte to register o0.)
 
James Kanze

James Kanze wrote:
[...]
Anything that implements char as anything other than 8-bits is an exotic,

I tend to agree, but I've heard that 32 bit char is fairly
common on embedded platforms. (But I've no concrete knowledge
of this---when I was working on embedded platforms, they were
still all 8 bit processors.)
IMO, and therefore does not deserve to taint all other code for its
existence.

It doesn't taint the code. I suspect that well over 90% of the
code I've written would run on a 36 bit ones' complement processor
(Unisys 2200 mainframe) without any changes. Not because I
wrote it with the intent of being portable to such a platform,
but because I wrote it cleanly, in the philosophy of C/C++ (as
explained by Kernighan and Ritchie). A lot of my modern code
will not run on a 16 bit processor, because it assumes that an
int is at least 32 bits (or at least considerably more than 16
bits); this seems a reasonable compromise for the environments I
currently work in. But the size of bool has never been an
issue.
Special circumstances require special handling. Code for the
common case, to some defined "common", and not for the exceptional case.
Imagine telling your client that you didn't take advantage of the fact
that most platforms define char as 8-bits and then telling them the
statistics and applicability to their needs of those exotic platforms.
They would run you out of town on a rail.. or *should*!

But the size of an int or a char makes no difference in most of
the code, as long as it is big enough to hold all of the values
I need.
 
JBarleycorn

James Kanze said:
That still doesn't give you a struct with predictable layout.
int32_t has a different layout on a Sparc than on a PC. And
doesn't even exist on some mainframes. (The standard requires
int32_t to be a 2's complement representation. And at least two
mainframe architectures don't use 2's complement.) Machines
without int32_t can probably be considered exotic, and I suspect
that most programmers don't have to take them into account. On
the other hand, Intel's layout of int32_t is not the same as
that of a Sparc or a Power PC, both widespread architectures.

If you meant endianness, why not just say so? (And don't those allow
choice of endianness upon system setup?).
 
JBarleycorn

James Kanze said:
Who does? In over thirty years of experience in C and C++, I've
yet to see any application which uses such a header. I'm
tempted to say that someone who uses such a header doesn't
understand the basic philosophy behind C and C++.

Now we're getting somewhere! You're saying that you code against a
*"philosophical"* paradigm. The word "religion" will surface soon, I'm
sure. Then you can say that my designs are unholy and yours are holy and
then the congregation you belong to will ... something.
Sure there is: signed char, short, int and long are all similar
types, with potentially different widths. (Don't ask me why
bool is a signed integral type, rather than an unsigned one.)

Like I *said*: there is no *similar* type.

1. That C++ bool is a half-assed integral type adds to its brokenness.
2. Full-blown integral behavior is not *similar* to boolean behavior.

(Surely it's signed so it converts to what the standard likes to view as
ubiquitous (a broken line of reasoning IMO): int.)
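
For anyone who wants to check the promotion and signedness behaviour being argued about, a compile-time sketch using only the standard type traits. The constant names are illustrative.

```cpp
#include <type_traits>

// Arithmetic on bool operands is carried out in int.
static_assert(std::is_same<decltype(true + true), int>::value,
              "bool promotes to int");
static_assert(true + true == 2, "true converts to 1");

// Relational expressions yield bool, not int.
static_assert(std::is_same<decltype(1 < 2), bool>::value,
              "a comparison yields bool");

// How the standard type traits classify bool: integral, and not signed.
constexpr bool kBoolIsIntegral = std::is_integral<bool>::value;
constexpr bool kBoolIsSigned   = std::is_signed<bool>::value;
```

Here kBoolIsIntegral is true and kBoolIsSigned is false; the conversion to int comes from the integral promotion rules rather than from bool being a signed type.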
Exactly like every other integral type.

I'm not going to repeat myself. If you conveniently want to ignore what
I've said before, fine, but don't try to keep the discussion going then
just to "listen to your own voice". I'm giving you a convenient platform
to propagandize? Fool me once, shame on you, fool me twice, shame on me!
For example? You keep saying it's important, but I've only
encountered the case in a few rare instances. (For starters, it
can't be important in portable code.)



Then there's something wrong with your programming style.


Then there's something very wrong with your programming style.

Au contraire: you are limited by your paradigm (or choose to promote it for
ulterior reasons). (Or I'm so far ahead of you that you can't even see
the dust anymore... Beep! Beep!).
Well there you go then.

Drivers are a bit special, but even in drivers, it's not rare to
use the standard types. And of course, unless things have
changed enormously since I was writing drivers, the fixed width
types don't occur in structs.

[...]
No, not likely for then bool would be eliminated from a lot of places
where it could be used to good effect. I think multiple-width boolean
types can be made to work, but the complexity is not worth it, so 1-byte
bools, even though there may be a performance penalty on most platforms,
are probably the best choice. I have pretty much decided that where a
"fast bool" is needed it's a profile/optimization thing and trying to
find an optimum boolean type size as the best compromise across a set of
platforms falls into the category of premature optimization. Therefore,
the overriding concern is suitability for use as a struct field. So, my
boolean class uses a 1-byte type as the underlying type.

That sounds like the most rational decision. Or rather, you use
the bool the implementation gives you, until you need something
faster for your application.

1. The size of C++ bool can't be relied on.
2. C++ bool has brain-damaged behavior (promotion to int).

The only reason bool is needed at all is because relational expressions
are of type bool. Yes, that broken, brain-damaged little boolean type is
inextricably interleaved throughout all C++ source code. :(
(As I've mentioned earlier, I
expect that bool is a single byte on most, if not all, byte
addressed platforms.)

Now it is, but historically, I have an inkling that on all those
platforms, bool was synonymous with int. In that case, you need only
two versions (old and new) of the compiler to encounter a size
change in the type (or synonym) bool.
 
