Typecast clarification

Roger Tombey

Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if (*(char *)&num == 1)
    printf("\n Little Endian");
else
    printf("\n Big endian");

I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?
2. Does the above typecast convert an integer to a char (1 byte) in
memory?
For e.g if I used a variable ch, to store the result of the above
typecast

3. In general, when can we safely do typecasts ? Is such code
portable ?

Thanks a lot for your help. Appreciate it.
 
Ben Bacarisse

Roger Tombey said:
To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");
else
printf("\n Big endian");

That's one way. There are others and some of them make it possible to
detect other byte-orderings (there are more than 2 possibilities).
I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

No. Did you try? What did your compiler say?
2. Does the above typecast convert an integer to a char (1 byte) in
memory?

No, it doesn't. It converts one pointer type to another. By the way,
the operator is called a cast. A cast is an operator that explicitly
converts a value from one type to another. Calling it a "typecast" is a
bit like talking about a "PIN number".
For e.g if I used a variable ch, to store the result of the above
typecast

Try it. Your compiler should complain -- if not look at how to turn up
the warning level on the compiler. Note that

char ch = *(char *)&num;

(which is perfectly legal) does not "store the result of the above
typecast". It stores the result of the indirection operator, *.
3. In general, when can we safely do typecasts ? Is such code
portable ?

That's too huge a question. I can't think of a good definition of
"safely" so I don't think I can hazard an answer. Even the term
"portable" is not well defined. For example, is the code you showed
portable? It is safe on every machine, but it exposes a difference
between machines which is usually best avoided if you want portable
code.

Also, casts are only conversions. C will do some conversions without a
cast and some of them can be "dangerous" (or their consequences can be
dangerous). For example:

void example(void *ptr, unsigned int u)
{
    int x = u;     /* may raise a signal */
    int *ip = ptr; /* ptr may not be properly aligned for an int */
    /* ... */
}

No casts in sight but two potentially dangerous conversions. Don't
focus on the cast -- understand which conversions are safe and which
aren't.
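One way to guard the two conversions above, as a minimal sketch only. The helper names to_int_checked and read_int_unaligned are made up for illustration; the first refuses values that do not fit in an int, and the second copies the bytes with memcpy instead of converting the pointer, so alignment stops mattering.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: converts u to int only when the value fits.
   Returns 1 and stores the result on success, 0 otherwise. */
int to_int_checked(unsigned int u, int *out)
{
    if (u > (unsigned int)INT_MAX)
        return 0;          /* out of range: the caller decides what to do */
    *out = (int)u;         /* value is representable, so this is safe */
    return 1;
}

/* Hypothetical helper: reads an int from memory that may not be
   suitably aligned for an int, by copying its bytes. */
int read_int_unaligned(const void *ptr)
{
    int value;
    memcpy(&value, ptr, sizeof value);
    return value;
}
```

Neither function needs a cast to be safe; the range check and the byte copy do the work.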
 
Keith Thompson

Roger Tombey said:
To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

That will probably work, but it's not guaranteed. And before you use
it, you should think about why you care whether your system is
little-endian or big-endian. There are cases where you need to know,
but I think there are more cases where people *think* they need to know
but really don't.
I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

No. You can't dereference a void* pointer.
2. Does the above typecast convert an integer to a char (1 byte) in
memory?

No, it's a pointer conversion. &num computes the address of num; that's
a value of type int*. The cast converts that value from int* to char*.
The outer "*" operator then dereferences the char* value, yielding a
value of type char.
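The same steps written out one at a time, as a sketch. The function name is made up for illustration, and it uses unsigned char rather than plain char so the byte value cannot come out negative.

```c
/* Hypothetical helper: returns the byte at the lowest address of n. */
unsigned char lowest_addressed_byte(int n)
{
    int *ip = &n;                            /* &n has type int*                   */
    void *vp = ip;                           /* int* to void*: implicit, no cast   */
    unsigned char *cp = (unsigned char *)ip; /* int* to unsigned char*: needs cast */

    (void)vp;   /* *vp would not compile: a void* cannot be dereferenced */
    return *cp; /* the indirection, not the cast, is what reads the byte */
}
```

The cast only changes the pointer's type; no int is turned into a char anywhere in this function.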

If you wanted to convert an int to a char, you could just write:
(char)num

Conversions work on values, not on representations. A pointer
conversion converts a pointer value, yielding a new pointer that
(typically) points to the same location in memory but treats it as
having a different type.

What you're doing is a form of "type-punning", treating an object of one
type as if it were of a different type.

To understand the difference between value conversion and type-punning,
consider this:

#include <stdio.h>
int main(void)
{
    float f = 12.34;
    int i;
    unsigned int rep;

    printf("f = %f\n", f);

    i = f; /* No cast is necessary; the (value) conversion is implicit */
    printf("i = %d\n", i);

    rep = *(unsigned int*)&f; /* type-punning */
    printf("rep = %u = 0x%x\n", rep, rep);

    return 0;
}

On my system, the output is:

f = 12.340000
i = 12
rep = 1095069860 = 0x414570a4
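Reading a float through an unsigned int pointer as above is not guaranteed to work by the standard. A sketch of the same inspection using memcpy, which only copies bytes; it assumes float is 32 bits wide (checked at compile time), and the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical helper: returns the bytes of f reinterpreted as a uint32_t. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* copies the representation, no conversion */
    return bits;
}

/* Hypothetical helper: the inverse, rebuilding a float from its bytes. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The value conversion (int)12.34f still yields 12, while float_bits(12.34f) yields whatever bit pattern the implementation uses for that float.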
For e.g if I used a variable ch, to store the result of the above
typecast

3. In general, when can we safely do typecasts ? Is such code
portable ?

The correct term is "casts", not "typecasts". A cast is an operator
that specifies an explicit conversion. Implicit conversions are
very common.

Casts are used a lot more often than they should be. There are
a handful of cases where they make some sense (certain
arguments to variadic functions, some arithmetic conversions,
even pointer conversions where you really know what you're doing
and aren't very concerned about portability).

I don't know of a general rule for when casts are safe and/or
portable, but a good rule of thumb is that any cast operator should
be viewed with suspicion.
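A sketch of the variadic case, where a cast does earn its place: arguments matched by "..." get no conversion to the type the format expects, so the caller has to supply exactly that type. The function describe and its parameters are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: formats a pointer and a count into buf and
   returns snprintf's result. */
int describe(char *buf, size_t bufsize, int *p, size_t n)
{
    /* %p expects a void* and %lu an unsigned long. Nothing converts the
       arguments automatically here, so the casts are doing real work. */
    return snprintf(buf, bufsize, "%p holds %lu item(s)",
                    (void *)p, (unsigned long)n);
}
```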
 
Eric Sosman

Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

You're right that code like this "is used" to make the test,
but it's unfortunate that you're right because the code is not
reliable. For example, it will always announce "Little Endian"
on a machine with 1-byte integers (like some DSP's). Also, it
considers only two possible outcomes; on a machine with 4-byte
int's there are twenty-four (of which at least three have been
used on actual, non-hypothetical hardware). Finally, there is a
(hypothetical) possibility that the correspondence between bit
values in an int and bit values in its constituent bytes is not
so simple; for a 32-bit int there are, in theory, 32 possible
ways the value 1 could be represented. All but one of those would
produce the reassuring but untrustworthy "Big Endian" answer.
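A sketch of a test that reports the whole byte layout instead of forcing a two-way answer. It tags each byte of an unsigned int by significance and prints the tags in address order. The function byte_layout is made up for illustration, and the sketch assumes unsigned int has no padding bits.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Writes the bytes of a tagged unsigned int into out
   as hex digits, lowest address first. out must have room for
   2 * sizeof(unsigned int) + 1 chars. */
void byte_layout(char *out)
{
    unsigned int probe = 0;
    unsigned char bytes[sizeof probe];
    size_t i;

    /* Tag each byte by significance: least significant is 01, next is 02, ... */
    for (i = 0; i < sizeof probe; i++)
        probe |= (unsigned int)(i + 1) << (i * CHAR_BIT);

    /* Copy out the object representation and print it in address order. */
    memcpy(bytes, &probe, sizeof probe);
    for (i = 0; i < sizeof probe; i++)
        sprintf(out + 2 * i, "%02x", (unsigned int)bytes[i]);
}
```

With a 4-byte unsigned int, a little-endian machine gives 01020304 and a big-endian one gives 04030201; any other ordering shows up as a different permutation rather than being misreported.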

As an early computer scientist put it, "There are more things
in heaven and earth, Horatio, than are dreamt of in your philosophy."
I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

No. (Try it!)
2. Does the above typecast convert an integer to a char (1 byte) in
memory?
For e.g if I used a variable ch, to store the result of the above
typecast

I don't understand your question. The cast (not "typecast")
converts an int* pointer value to a char* pointer value. There is
no way to tell whether this conversion happens "in memory" or in
a CPU register or in one of the watchtowers of Elsinore. You would
not try to store the cast's result (a pointer) in a char variable
(again, try it!), but you might do so with the value you get by
dereferencing that pointer.
3. In general, when can we safely do typecasts ? Is such code
portable ?

You can cast data pointer values to data pointer values of
different (or the same) types. You can cast function pointer
values to function pointer types. You can cast arithmetic values
to arithmetic types. And as a special case you can cast integers
(all kinds) to and from data pointers (all kinds).

BUT not all such conversions are meaningful. If you start
with a char* pointing at the 'd' in "Hello, world!", convert that
to a short, convert that to a double*, and try to use the pointer
that results, you are likely to get into trouble. The nature of
trouble you get into is non-portable (it might not even seem at
first to *be* trouble). Similarly, if you cast 1.2e30 to an int
you may be unhappy with the non-portable outcome.
 
Roger Tombey

Keith said:
That will probably work, but it's not guaranteed. And before you use
it, you should think about why you care whether your system is
little-endian or big-endian. There are cases where you need to know,
but I think there are more cases where people *think* they need to know
but really don't.

For networking you need to know whether to swop the incoming and outgoing
byte order...
No. You can't dereference a void* pointer.

Yes, this is dumb. A void* and char* are exactly the same under the hood.
A void* you can convert without a typecast but you can't dereference. A
char* you can dereference but not convert automatically. Why not have one
pointer type with both features??
No, it's a pointer conversion. &num computes the address of num; that's
a value of type int*. The cast converts that value from int* to char*.
The outer "*" operator then dereferences the char* value, yielding a
value of type char.

But what I mean is that the int's memory is being regarded as a char,
isn't it?
Casts are used a lot more often than they should be. There are a
handful of cases where they make some sense (certain arguments to
variadic functions, some arithmetic conversions, even pointer
conversions where you really know what you're doing and aren't very
concerned about portability).

I don't know of a general rule for when casts are safe and/or portable,
but a good rule of thumb is that any cast operator should be viewed with
suspicion.

It would make life simpler if all typecasts were either disallowed or
guaranteed portable!

Rgds
 
Eric Sosman

For networking you need to know whether to swop the incoming and outgoing
byte order...

They're not C library functions, but many or most networking
environments provide ntohl(), htonl(), ntohs(), htons() that will
take care of this without effort on your part.
Yes, this is dumb. A void* and char* are exactly the same under the hood.
A void* you can convert without a typecast but you can't dereference. A
char* you can dereference but not convert automatically. Why not have one
pointer type with both features??

Because then the compiler wouldn't notice the error (or at any
rate, wouldn't be required to notice the error) in

char *ptr = malloc(SIZE);
if (ptr == NULL) {
    fprintf ("Out of memory: file %s, line %d\n",
             __FILE__, __LINE__);
    exit (EXIT_FAILURE);
}

(Do people actually make this mistake? Not any more, but Back In
The Day they certainly did. It's my belief that the mistake is still
made, but caught by the compiler before the author is humiliated
publicly.)
But what I mean is that the int's memory is being regarded as a char,
isn't it?

Some part of the int's memory is being treated as a char, yes.
That treatment involves no "conversion."
It would make life simpler if all typecasts were either disallowed or
guaranteed portable!

Your wish is granted: I have it on excellent authority that the
"C1x" version of the C Standard will have no typecasts whatsoever!
That is, it will follow the precedents established by C99, C90, C89,
and K&R, none of which have typecasts.

They all do -- and will -- have casts, though. (And you've been
told the difference more than once.)

(Because computers are so plaguey literal-minded, programming is
an activity that requires unusual precision and attention to detail.
It continues to astonish me that programmers, so able to see the
necessity of getting every little niggly bit Just So, should be so
gawdawful sloppy when they write or speak natural languages. What's
going on? Does exactitude tire out some insufficiently exercised blob
of grey matter? Do the neurons lose elasticity with overuse? Or has
programming been made so seemingly simple that semi-intelligent semi-
literates imagine they can do it? Considering the sorry state of
software, maybe the last explanation is plausible.)
 
Roger Tombey

Eric said:
They're not C library functions, but many or most networking
environments provide ntohl(), htonl(), ntohs(), htons() that will take
care of this without effort on your part.

Yes but you need to know that your host byte order is little endian in
order to know whether to invoke these functions. If your host is big
endian and you call one of the functions, you'll swop it to little endian,
which will be wrong for the network!
Because then the compiler wouldn't notice the error (or at any
rate, wouldn't be required to notice the error) in

char *ptr = malloc(SIZE);
if (ptr == NULL) {
fprintf ("Out of memory: file %s, line %d\n",
__FILE__, __LINE__);
exit (EXIT_FAILURE);
}

I don't see any error here.
Some part of the int's memory is being treated as a char, yes.
That treatment involves no "conversion."

I would call taking part of an int and transforming it to a char, a
"conversion".
They all do -- and will -- have casts, though. (And you've been
told the difference more than once.)

My textbook and many others use the term typecast. I think both forms are
correct but typecast is more expressive.

Rgds
 
Eric Sosman

Yes but you need to know that your host byte order is little endian in
order to know whether to invoke these functions. If your host is big
endian and you call one of the functions, you'll swop it to little endian,
which will be wrong for the network!


I don't see any error here.

Okay: Slap it inside a suitable framework like

#include <stdio.h>
#include <stdlib.h>
int main(void) {
#define SIZE 42

    /* paste the code here */

    return 0;
}

.... and ask your compiler to take a gander at it. If the compiler
detects the error I put there (or think I put there), and if you did
not detect it even after being told it was there, would that change
your opinion about the desirability of a freely-convertible char*?
I would call taking part of an int and transforming it to a char, a
"conversion".

You might, but C doesn't. In C's view, any data object (other
than a bit field or a register variable) can be viewed as an array
of char; the elements of that array are the object's representation.
Reading or writing a single char in an array of char -- a single X
in an array of X -- involves no conversion at all.
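A sketch of that view in code: an object copied one unsigned char at a time comes out intact, because each step is a plain read and write of an array element. The function copy_bytes is made up for illustration; it does what memcpy already does.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst, one at a time. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;       /* void* converts implicitly, no cast needed */
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;              /* element-wise copy of the representation */
}
```

Copying a double or an int this way and reading it back gives the original value; the bytes were never converted to anything.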
My textbook and many others use the term typecast. I think both forms are
correct but typecast is more expressive.

More prolix, you mean. Confused speakers (or those who desire
to confuse, like politicians) frequently pump up their language with
excess syllables just to sound more impressive. Professor Peter
Schickele demonstrated the technique by coining the delightful term
"musicologicallywise-speaking."
 
Seebs

For networking you need to know whether to swop the incoming and outgoing
byte order...

No, you don't.

You need to hand things to the implementation, which knows.

Look up the (admittedly somewhat Unixy) "htonl()" and "ntohl()".
Yes, this is dumb. A void* and char* are exactly the same under the hood.
A void* you can convert without a typecast but you can't dereference. A
char* you can dereference but not convert automatically. Why not have one
pointer type with both features??

Because they represent different things. A "char *" is a pointer to a byte
of memory. A "void *" is a pointer to a region of memory of unknown size.

The point of "void *" is to allow you to pass an untyped address, without
committing to what it will be the address of.
But what I mean is that the int's memory is being regarded as a char,
isn't it?

No. The "first" (whatever that means) byte of the int's memory is being
regarded as a char. You're looking through a pointer to a character; you
get a character. You do not get the character that you would get by
converting some other object to a character-range value, you get the character
that happens to be the first byte of that object.
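The two things side by side, as a sketch; both function names are made up for illustration. The first result depends only on the value of n, the second on how the implementation lays an int out in memory.

```c
/* Value conversion: works on the value of n and ignores how n is stored. */
unsigned char by_conversion(int n)
{
    return (unsigned char)n;
}

/* Inspection: reads whichever byte of n sits at the lowest address. */
unsigned char by_inspection(int n)
{
    return *(unsigned char *)&n;
}
```

With 8-bit bytes, by_conversion(0x1234) is 0x34 on every machine, while by_inspection(0x1234) is 0x34 only where the least significant byte happens to be stored first.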
It would make life simpler if all typecasts were either disallowed or
guaranteed portable!

Simpler, but also untenable.

Casts are very useful, but you have to know what you're doing. You can't
"fix" this. It's essential to C's combination of being basically portable
but capable of doing machine-specific stuff.

-s
 
Seebs

Yes but you need to know that your host byte order is little endian in
order to know whether to invoke these functions. If your host is big
endian and you call one of the functions, you'll swop it to little endian,
which will be wrong for the network!

1. "swap", not "swop".
2. No, you won't. RTFM. If no "swap" is needed, then they're no-ops.

The entire point is to guarantee, regardless of what host and network byte
orders are, whether or not the host is EITHER little endian or big endian
(it could be middle-endian!), that you end up with the correct conversion.

Each implementation, then, implements them according to its native byte
ordering.
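A sketch of using these functions without ever testing the host's byte order. It assumes a POSIX system, where htonl() and ntohl() are declared in <arpa/inet.h>; the helper names put_network_u32 and get_network_u32 are made up for illustration.

```c
#include <arpa/inet.h> /* POSIX, not standard C */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores value into buf[0..3] in network byte order. */
void put_network_u32(unsigned char buf[4], uint32_t value)
{
    uint32_t wire = htonl(value); /* a no-op where host order already matches */
    memcpy(buf, &wire, sizeof wire);
}

/* Hypothetical helper: reads a network-order value back into host order. */
uint32_t get_network_u32(const unsigned char buf[4])
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}
```

The calling code is identical on every host: it always calls the function, and the implementation decides whether any bytes need to move.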
I would call taking part of an int and transforming it to a char, a
"conversion".

You would be incorrect. It's not being transformed. It's being exactly
what it was all along -- the same byte of memory, no change in its value
occurring. A transformation would be a conversion from one type or range
or something to another, but none is occurring.

-s
 
Keith Thompson

Eric Sosman said:
On 5/26/2010 3:40 PM, Roger Tombey wrote: [...]
Yes, this is dumb. A void* and char* are exactly the same under the hood.
A void* you can convert without a typecast but you can't dereference. A
char* you can dereference but not convert automatically. Why not have one
pointer type with both features??

Because then the compiler wouldn't notice the error (or at any
rate, wouldn't be required to notice the error) in

char *ptr = malloc(SIZE);
if (ptr == NULL) {
fprintf ("Out of memory: file %s, line %d\n",
__FILE__, __LINE__);
exit (EXIT_FAILURE);
}

(Do people actually make this mistake? Not any more, but Back In
The Day they certainly did. It's my belief that the mistake is still
made, but caught by the compiler before the author is humiliated
publicly.)

The error doesn't involve void*, so I'm not sure it's relevant to your
point.

[...]
 
Eric Sosman

Eric Sosman said:
On 5/26/2010 3:40 PM, Roger Tombey wrote: [...]
Yes, this is dumb. A void* and char* are exactly the same under the hood.
A void* you can convert without a typecast but you can't dereference. A
char* you can dereference but not convert automatically. Why not have one
pointer type with both features??

Because then the compiler wouldn't notice the error (or at any
rate, wouldn't be required to notice the error) in

char *ptr = malloc(SIZE);
if (ptr == NULL) {
fprintf ("Out of memory: file %s, line %d\n",
__FILE__, __LINE__);
exit (EXIT_FAILURE);
}

(Do people actually make this mistake? Not any more, but Back In
The Day they certainly did. It's my belief that the mistake is still
made, but caught by the compiler before the author is humiliated
publicly.)

The error doesn't involve void*, so I'm not sure it's relevant to your
point.

He stated a desire for a char* that would convert implicitly to
any other data pointer type. My example illustrates a situation
(one I've actually encountered) where that implicit conversion could
hide an error that would almost certainly produce a crash.

Actually, he didn't ask that char* be the magical pointer type
that converts freely and dereferences as char -- but he clearly
wanted "one pointer type" to fill both roles. Array-of-char must
decay to *some* kind of pointer, and if there's only one pointer
type that dereferences to char (because both roles are combined in
that one pointer type), array-of-char must decay to that freely-
convertible type, no matter how he chooses to spell it.
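As far as I can tell, the slip in the quoted snippet is that fprintf is called without its stream argument, so the format string lands where a FILE * is expected. A minimal sketch of the call with the stream supplied; report_out_of_memory and its parameters are names made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reports an allocation failure on the given stream
   and returns fprintf's result. */
int report_out_of_memory(FILE *stream, const char *file, int line)
{
    /* The stream comes first; the format string is the second argument. */
    return fprintf(stream, "Out of memory: file %s, line %d\n", file, line);
}
```

A typical call would be report_out_of_memory(stderr, __FILE__, __LINE__). Because char * does not convert implicitly to FILE *, a compiler is required to diagnose the version without the stream.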
 
Ben Bacarisse

Eric Sosman said:
Because then the compiler wouldn't notice the error (or at any
rate, wouldn't be required to notice the error) in

char *ptr = malloc(SIZE);
if (ptr == NULL) {
fprintf ("Out of memory: file %s, line %d\n",
__FILE__, __LINE__);
exit (EXIT_FAILURE);
}

(Do people actually make this mistake? Not any more, but Back In
The Day they certainly did. It's my belief that the mistake is still
made, but caught by the compiler before the author is humiliated
publicly.)

I suspect there is another less deliberate error here, but I am not
sure. Ironically it is a place where I think a cast is needed for
portability. What is your opinion of

printf("%d\n", (int)__LINE__);

? The standard says that __LINE__ is "an integer constant" (not an int)
so presumably it could behave as if it were 42ULL which can't printed
with a %d format without undefined behaviour.

If I'm right, I suspect it is an oversight rather than a deliberate
attempt to allow implementations this freedom, but then the exact type
is given for some predefined macros (such as __STDC_VERSION__ for
example) so maybe it is intended.

<snip>
 
Ben Bacarisse

Roger Tombey said:
My textbook and many others use the term typecast. I think both forms are
correct but typecast is more expressive.

Language changes and I fear this battle is being slowly lost. However,
it is probably worth disagreeing, at least for a while longer.

As a verb, type-cast (I think it should be hyphenated) has a meaning
which has no sensible metaphorical connection to what a cast operator
does; and to use the word as a noun involves a completely new meaning
since the word has been, historically, only a verb and an adjective.

Cast, on the other hand, does have exactly the right metaphorical
meaning (both as a verb and as a noun). I think it expresses the new
meaning very well, and rather poetically at that:

From the OED: cast (n): The form into which any work is thrown
.... /fig/. Mould. cast (v): To put 'into shape' or into order; to
dispose, arrange.

I won't argue that type-cast is not expressive, but for many people (and
most people here, I'd venture) it is expressive of the wrong idea. Cast
is an admirably expressive name for this operator and I commend it to
you.
 
Keith Thompson

Ben Bacarisse said:
I suspect there is another less deliberate error here, but I am not
sure. Ironically it is a place where I think a cast is needed for
portability. What is your opinion of

printf("%d\n", (int)__LINE__);

? The standard says that __LINE__ is "an integer constant" (not an int)
so presumably it could behave as if it were 42ULL which can't be printed
with a %d format without undefined behaviour.

If I'm right, I suspect it is an oversight rather than a deliberate
attempt to allow implementations this freedom, but then the exact type
is given for some predefined macros (such as __STDC_VERSION__ for
example) so maybe it is intended.

Interesting. Requiring it to be of type int could cause problems
for implementations where int is 16 bits and source files are
allowed to be longer than 32767 lines.

I suspect that on most implementations it's equivalent to a decimal
literal with the appropriate value -- which means its type will
vary for very large line numbers. (You don't need a huge source
file to demonstrate this; you can use a #line directive.)
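A sketch of the #line trick. It gives __LINE__ a value that would not fit in a 16-bit int and casts it to long, so the argument matches %ld whatever type the expansion has. The function format_line is made up for illustration, and choosing long here is an assumption, not something the standard prescribes.

```c
#include <stdio.h>

/* Hypothetical helper: formats the current line number into buf. */
int format_line(char *buf, size_t bufsize)
{
/* The directive below makes the line that follows it line 100000. */
#line 100000
    return snprintf(buf, bufsize, "%ld", (long)__LINE__);
}
```

After the call, buf holds "100000". Without the cast, the type passed to snprintf would depend on how the implementation expands __LINE__.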
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse writes:

Interesting. Requiring it to be of type int could cause problems
for implementations where int is 16 bits and source files are
allowed to be longer than 32767 lines.

Yes, I was not suggesting it /should/ be int!
I suspect that on most implementations it's equivalent to a decimal
literal with the appropriate value -- which means its type will
vary for very large line numbers. (You don't need a huge source
file to demonstrate this; you can use a #line directive.)

I notice that the specification of #line limits the digit sequence to
specifying a number less than or equal to 2147483647. Given that limit,
__LINE__ could have been defined to expand to an integer constant of
type long int.

Anyway, given that it isn't so defined, the limit does at least mean
that using (long)__LINE__ is reasonable for all but the most unnatural
source files.
 
Keith Thompson

Ben Bacarisse said:
I notice that the specification of #line limits the digit sequence to
specifying a number less than or equal to 2147483647.

Ah, I didn't know that. Note that it's specified by a "shall"
outside a constraint, so violating it invokes undefined behavior.
Given that limit,
__LINE__ could have been defined to expand to an integer constant of
type long int.

Well, sort of.
Anyway, given that it isn't so defined, the limit does at least mean
that using (long)__LINE__ is reasonable for all but the most unnatural
source files.

Depends on what you mean by "unnatural". If you write
#line 2147483647
the lines following it will have even larger numbers.

#include <stdio.h>
int main(void)
#line 2147483647
{
    printf("__LINE__ = %ld\n", (long)__LINE__);
    return 0;
}

On my system, this prints:

__LINE__ = -2147483648
 
Phil Carmody

Keith Thompson said:
The correct term is "casts", not "typecasts". A cast is an operator
that specifies an explicit conversion.

But not necessarily of type. There may be only a conversion of
the qualified type, but not of the type, such as by adding or
casting away const. I think it's still useful to have a term
that specifies that a type conversion has taken place, and there
seem to be few better terms for that than "type casts".

Phil
 
