64 bit porting

Mohanasundaram

Hi All,

We are working on porting a product written in C and C++ from 32 bit
to 64 bit. We need to maintain both 32 bit and 64 bit versions in the
future. We took the 32 bit source code, compiled it with a 64 bit
compiler, and fixed all the compilation warnings. Compilation went
through fine, but the product breaks in lots of places. We now understand
that porting 32 bit code to a 64 bit platform is not just a matter of
compilation. We have to handle the problems which will not be caught
by the compiler due to the change from ILP32 to LP64. So we are trying
to list all the possible problems that might occur due to the
size of long and pointers changing from 4 bytes to 8 bytes, with int
still being 4 bytes. We have listed a few of the places
where bugs might creep in. We want to validate the correctness of these
points and would like to add more to this list. Please help us.

1. Change all the long to integers blindly. But not integers to long.
We think this might solve the following problems
(a) Code written using bitwise operators assuming that the size of
long is 4 will create problems
(b) Getting the offsets of the fields in structures by not using
OFFSET macro will create problems when the structure has longs
(c) Manipulating the long data bytewise by breaking them using
pointers like long i = 1; char a = ((char *)&i)[0];
etc.....

2. Check all the library functions which return long, like atol,
and make sure no code is written assuming that the return value
is 4 bytes. Or consider changing atol to atoi or similar functions.

3. If C style memory allocation is used instead of "new" then there is
a possibility of bugs.
long *ptr = malloc(4*2);
in 32 bit compilation the above statement will allocate 8 bytes of
memory and ptr can be used as an array of two elements.
But in 64 bit compilation it will allocate 8 bytes and the number of
elements in the array is one. So if code is written
assuming that the number of elements is two then it will break. So
all "malloc"s "calloc"s and "realloc"s should be checked.

4. If pointers are cast to integers anywhere, it has to be
checked.
For example
int a = 10;
int *ptr = &a;
int b = reinterpret_cast<int>(ptr);
The above code will create problems in 64 bit compilation since
a pointer is 8 bytes and int is 4 bytes. So it has to be
changed to
long b = reinterpret_cast<long>(ptr);

5. Getting the offsets of the structure fields by assuming the size of
the fields and not using OFFSET macro. How does this stuff
work in case of unions or classes

6. size_t is a 32 bit quantity in 32 bit compilation wherein it grows
to 64 in 64 bit compilation.

7. The format specifiers should be checked for example
in printfs and scanfs
long i = <some expression>;
printf("%d",i);
will not be a big problem as far as the result is concerned but it
will print wrong values when the value of i is very
big and exceeds the limit of int.

Thanks a lot for your time.

Regards,
Mohan.
 
Dan Pop

Mohanasundaram said:
We are working on porting a product written in C and C++ from 32 bit
to 64 bit. We need to maintain both 32 bit and 64 bit versions in the
future.

If you do your job right, you will have only one version to maintain, which
will work on both 32 and 64-bit platforms. This is usually called
64-bit clean code.
1. Change all the long to integers blindly. But not integers to long.

Don't change *anything* blindly. Try to understand *all* the implications
of *each and every* change you make.
We think this might solve the following problems
(a) Code written using bitwise operators assuming that the size of
long is 4 will create problems

Fix such code instead.
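
A minimal sketch of what fixing such code can look like: derive the width from the type instead of hard-coding 32. The macro and function names here are made up for illustration, and the sketch assumes unsigned long has no padding bits.

```c
#include <limits.h>

/* Number of bits in unsigned long: 32 on ILP32, 64 on LP64
   (assumption: no padding bits in the type). */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: mask selecting the top bit, whatever the width. */
unsigned long top_bit_mask(void)
{
    return 1UL << (ULONG_BITS - 1);
}

/* Hypothetical helper: rotate left by n bits, for 0 < n < ULONG_BITS. */
unsigned long rotl_ulong(unsigned long x, unsigned n)
{
    return (x << n) | (x >> (ULONG_BITS - n));
}
```

The same source then behaves consistently on both platforms, because nothing in it mentions 4 bytes or 32 bits.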
(b) Getting the offsets of the fields in structures by not using
OFFSET macro will create problems when the structure has longs

Doing that is sheer stupidity in the first place. If you need such
offsets, offsetof() or pointer arithmetic are the ONLY ways to go.
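
For illustration, a small sketch of the offsetof() route. The struct and its member names are hypothetical; only offsetof() from <stddef.h> is standard.

```c
#include <stddef.h>

/* Hypothetical record; the member names are made up for illustration. */
struct record {
    char tag;
    long count;   /* typically 4 bytes on ILP32, 8 bytes on LP64 */
    int  flags;
};

/* Return a pointer to the 'flags' member given only a pointer to the
   start of the struct, using offsetof instead of a hand-computed offset. */
int *flags_of(void *base)
{
    return (int *)((unsigned char *)base + offsetof(struct record, flags));
}
```

Because the compiler supplies the offset, the code keeps working when the size or padding of count changes between platforms.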
(c) Manipulating the long data bytewise by breaking them using
pointers like long i = 1; char a = ((char *)&i)[0];
etc.....

This is not affected by 32 vs 64 bit issues, but may be affected by byte
order issues. Switching from long to int buys you nothing. And you
really want to use unsigned char for this purpose.
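
A rough sketch of the two approaches, with hypothetical function names: inspecting memory through unsigned char depends on byte order, while extracting bytes by shifting the value does not.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if the least significant byte of an unsigned long is stored
   first in memory (little-endian), 0 otherwise. Works for any sizeof(long). */
int long_is_little_endian(void)
{
    unsigned long v = 1UL;
    const unsigned char *p = (const unsigned char *)&v;
    return p[0] == 1;
}

/* Extract byte number i (0 = least significant) by value rather than by
   address, so the result does not depend on byte order.
   Caller must keep i < sizeof(unsigned long). */
unsigned char byte_of(unsigned long v, size_t i)
{
    return (unsigned char)((v >> (i * CHAR_BIT)) & UCHAR_MAX);
}
```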
2. Check all the library functions which return long, like atol,
and make sure no code is written assuming that the return value
is 4 bytes. Or consider changing atol to atoi or similar functions.

Much better, remove *all* the dependencies of the C types sizes in the
code, if reasonably possible.
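
One possible way to avoid depending on the size of long when parsing numbers, sketched under the assumption that the caller really wants an int: use strtol and check the range explicitly, rather than swapping atol for atoi. The function name parse_int is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal int from s.
   Returns 0 on success, -1 on malformed input or a value outside int's range. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;                      /* no digits, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                      /* does not fit, on any platform */
    *out = (int)v;
    return 0;
}
```

The range check gives the same answer whether long is 4 or 8 bytes, which is the point.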
3. If C style memory allocation is used instead of "new" then there is
a possibility of bugs.
long *ptr = malloc(4*2);
in 32 bit compilation the above statement will allocate 8 bytes of
memory and ptr can be used as an array of two elements.
But in 64 bit compilation it will allocate 8 bytes and the number of
elements in the array is one. So if code is written
assuming that the number of elements is two then it will break. So
all "malloc"s "calloc"s and "realloc"s should be checked.

Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

which is correct *everywhere*.
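
Wrapped up as a small helper, this idiom might look as follows. The function name and the overflow check are additions for illustration, not part of the one-liner.

```c
#include <stdlib.h>

/* Allocate an array of n longs, sized from the pointed-to type rather than
   a hard-coded byte count. Returns NULL on failure; the caller frees it. */
long *alloc_longs(size_t n)
{
    long *ptr;

    /* Refuse requests whose byte count would overflow size_t. */
    if (n > (size_t)-1 / sizeof *ptr)
        return NULL;

    ptr = malloc(n * sizeof *ptr);
    return ptr;
}
```

If the element type later changes, only the declaration of ptr needs editing; the size expression follows automatically.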
4. If pointers are cast to integers anywhere, it has to be
checked.
For example
int a = 10;
int *ptr = &a;
int b = reinterpret_cast<int>(ptr);
The above code will create problems in 64 bit compilation since
a pointer is 8 bytes and int is 4 bytes. So it has to be
changed to
long b = reinterpret_cast<long>(ptr);
^^^^^^^^^^^^^^^^^^^^^^
This is not valid C syntax, so I don't know what you're talking about.
If you need to convert pointers to integers, the type unsigned long is
the best choice on both 32 (ILP32) and 64-bit (I32LP64) platforms.
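
A hedged sketch of such a conversion in C. It uses uintptr_t from C99's <stdint.h>, which is not mentioned above and is an optional type, though widely available; unsigned long gives the same result on ILP32 and LP64, but not on platforms where long stays 32 bits while pointers are 64 (LLP64). The function names are hypothetical.

```c
#include <stdint.h>   /* C99; uintptr_t is an optional type */

/* Round-trip a pointer through an integer type wide enough to hold it. */
int *roundtrip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* Returns 1 if unsigned long is at least as wide as a pointer on this
   platform (true on ILP32 and LP64), 0 otherwise. */
int ulong_holds_pointer(void)
{
    return sizeof(unsigned long) >= sizeof(void *);
}
```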
5. Getting the offsets of the structure fields by assuming the size of
the fields and not using OFFSET macro.

Deja vu (point 1b above).
How does this stuff work in case of unions or classes

It is not needed for unions (all members have offset 0) and there are
no classes in C.
6. size_t is a 32 bit quantity in 32 bit compilation wherein it grows
to 64 in 64 bit compilation.

Why should your code care about the size of size_t?
7. The format specifiers should be checked for example
in printfs and scanfs
long i = <some expression>;
printf("%d",i);
will not be a big problem as far as the result is concerned but it
will print wrong values when the value of i is very
big and exceeds the limit of int.

This code is already broken and it works by pure accident. If i has type
long, %d is NOT an option. %ld will correctly work on both 32 and 64-bit
platforms.
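
A short sketch of matching specifiers to types. The wrapper names are made up; snprintf and the %zu specifier for size_t are C99, so on an older compiler one would instead cast the size_t to unsigned long and print it with %lu.

```c
#include <stdio.h>

/* Format a long with the matching %ld specifier.
   Returns what snprintf returns: the length of the formatted text. */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}

/* Format a size_t with %zu (C99). */
int format_size(char *buf, size_t size, size_t value)
{
    return snprintf(buf, size, "%zu", value);
}
```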

It looks like your code was severely broken even on 32-bit platforms and
it worked by luck/accident. Once you fix it, if you do the job right,
it will work equally well on both 32 and 64-bit platforms, without needing
separate versions.

If you need to share binary files between 32 and 64-bit platforms, pay
extra attention to the definition of the data that gets written into the
files.
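
One common way to pin down such a definition, sketched here as an assumption rather than a prescription: write each field as a fixed number of bytes in a fixed order, instead of dumping a long or a struct straight from memory. The helper names and the choice of little-endian order are illustrative; uint32_t comes from C99's <stdint.h>.

```c
#include <stdint.h>

/* Store a 32-bit value as four bytes, least significant byte first. */
void put_u32_le(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read the value back; the result is the same on any platform. */
uint32_t get_u32_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

The four bytes can then be passed to fwrite and fread, and the file layout no longer depends on sizeof(long) or on byte order.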

Dan
 
Tim Prince

Dan Pop said:
Why should your code care about the size of size_t?
Some "C/C++" (sic) customers are adamant that there has to be a way to bury
size_t stuff in the middle of a struct without padding or breaking
alignments between platforms, or that any compiler which barfs at storing
int and size_t interchangeably is broken. When that comes up, it's a
probable sign that I should go back to projects which don't have C++ in
them.
 
Igmar Palsenberg

Mohanasundaram said:
1. Change all the long to integers blindly. But not integers to long.
We think this might solve the following problems
(a) Code written using bitwise operators assuming that the size of
long is 4 will create problems
(b) Getting the offsets of the fields in structures by not using
OFFSET macro will create problems when the structure has longs
(c) Manipulating the long data bytewise by breaking them using
pointers like long i = 1; char a = ((char *)&i)[0];
etc.....

If you're on a *NIX system, use sys/types.h, and use things like
u_int32_t, int32_t, etc, etc. If you're on Windows, use something that
looks like it.
2. Check all the library functions which return long, like atol,
and make sure no code is written assuming that the return value
is 4 bytes. Or consider changing atol to atoi or similar functions.

Fix the library I would say.
3. If C style memory allocation is used instead of "new" then there is
a possibility of bugs.
long *ptr = malloc(4*2);
in 32 bit compilation the above statement will allocate 8 bytes of
memory and ptr can be used as an array of two elements.
But in 64 bit compilation it will allocate 8 bytes and the number of
elements in the array is one. So if code is written
assuming that the number of elements is two then it will break. So
all "malloc"s "calloc"s and "realloc"s should be checked.

use malloc(sizeof(long) * 2) for that.
4. If pointers are cast to integers anywhere, it has to be
checked.
For example
int a = 10;
int *ptr = &a;
int b = reinterpret_cast<int>(ptr);
The above code will create problems in 64 bit compilation since
a pointer is 8 bytes and int is 4 bytes. So it has to be
changed to
long b = reinterpret_cast<long>(ptr);

Fix the code. Casting pointers to ints is a sign of problems in the design.
5. Getting the offsets of the structure fields by assuming the size of
the fields and not using OFFSET macro. How does this stuff
work in case of unions or classes

The compiler knows the offset. Since you can use members of structs
directly, I hardly see a reason to use them.
6. size_t is a 32 bit quantity in 32 bit compilation wherein it grows
to 64 in 64 bit compilation.

size_t can be a 64 bits variable in 32 bits platforms. It is common
these days, since offsets need 64 bits when dealing with large files.

7. The format specifiers should be checked for example
in printfs and scanfs
long i = <some expression>;
printf("%d",i);
will not be a big problem as far as the result is concerned but it
will print wrong values when the value of i is very
big and exceeds the limit of int.

Replace long by a type that indicates what variable and length you
actually mean. That saves tons of headaches, and makes the code easier to
read.

Thanks a lot for your time.

Regards,
Mohan.



Igmar
 
Stephen Sprunk

Igmar Palsenberg said:
size_t can be a 64 bits variable in 32 bits platforms. It is common
these days, since offsets need 64 bits when dealing with large files.

Why would size_t be 64b on a 32b platform? size_t is the maximum size of a
single allocated object _in memory_, so where is the need for it to exceed
the size of the address space?

There's at least one well-known case where size_t is smaller than the
address space size, but on what implementations can it be larger?

S
 
jacob navia

Stephen Sprunk said:
There's at least one well-known case where size_t is smaller than the
address space size, but on what implementations can it be larger?

In the bloated ones :)
 
Igmar Palsenberg

Stephen said:
Why would size_t be 64b on a 32b platform? size_t is the maximum size of a
single allocated object _in memory_, so where is the need for it to exceed
the size of the address space?

Never mind the size_t remark : That should be off_t.
There's at least one well-known case where size_t is smaller than the
address space size, but on what implementations can it be larger?

None it seems :)



Igmar
 
Randy Howard

Never mind the size_t remark : That should be off_t.


None it seems :)

I suppose if you had a magic compiler that supported PAE extension for
Intel, it would be possible to have "magic pointers" and offsets that
were outside the range of 32-bit registers by themselves. As may, or
may not be known, 32-bit operating systems such as the W2K and W2K3
server platforms plus Linux distributions can see much more than the
expected 4GB limitation, in some cases as much as 64GB of RAM using
the "PAE" hack Intel came up with. However, most of the time, there
is still a 2GB per process address space limitation, so I'm not sure
how this magical compiler could get around such a limit in all cases,
perhaps using something akin to the MS "AWE API" on your behalf.

I suspect such a compiler would always be buggy, and cost more than
anyone cares to imagine.

It's quite a bit easier to buy a motherboard and AMD 64-bit CPU
for < $500 and go on your merry way. :)
 
Rupert Pigott

Dan Pop wrote:

[SNIP]

Weird. Since my very first malloc program I've been using sizeof() to
work out how big I want stuff.

/* Single ptr */
long* ptr = malloc( sizeof( long* ) );
Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

Erm... Wouldn't something like the following be a little safer and
easier explain for an array of two or more pointers ?

/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );

The rationale for this approach is that you're taking into account
any weird array element alignment stuff that the compiler might
want to do. I did come unstuck with this many moons ago when I
wrote the following :

short* array = malloc( sizeof( short ) * 42 );

.... The compiler liked to pad shorts up to a word boundary (so
there was practically zero point in them). Therefore I had not
allocated enough memory. When I wrote to that array I ended up
corrupting other stuff and the program died a fiery death after
some entertaining but wrong results... After much hair pulling
I changed the line to take this into account, it became :

short* array = malloc( sizeof( short[42] ) );
which is correct *everywhere*.

The big question is : Does ANSI C permit compilers to pad array
elements up to some other size like it does with structures ?
My guess is *no* given the amount of code that runs fine with
the more intuitive "sizeof( type ) * n" approach.

Cheers,
Rupert
 
Richard Bos

Rupert Pigott said:
Dan said:
Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

Erm... Wouldn't something like the following be a little safer and
easier explain for an array of two or more pointers ?

/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );

The rationale for this approach is that you're taking into account
any weird array element alignment stuff that the compiler might
want to do.

The implementation is not allowed to do any weird array alignment stuff,
unless it also does it in an array of one element, aka the base type.
I.e., sizeof (long*[2]) _must_ be 2*sizeof (long)
I did come unstuck with this many moons ago when I
wrote the following :

short* array = malloc( sizeof( short ) * 42 );

... The compiler liked to pad shorts up to a word boundary (so
there was practically zero point in them). Therefore I had not
allocated enough memory.

If it did that for the array, but _not_ for individual shorts, it was
not a C compiler.
The big question is : Does ANSI C permit compilers to pad array
elements up to some other size like it does with structures ?

No. Not more so than the individual elements.

Richard
 
Eric Sosman

Rupert said:
Dan Pop wrote:

[SNIP]

Weird. Since my very first malloc program I've been using sizeof() to
work out how big I want stuff.

/* Single ptr */
long* ptr = malloc( sizeof( long* ) );

If this sample is representative, you've been using
it incorrectly ...
Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

Erm... Wouldn't something like the following be a little safer and
easier explain for an array of two or more pointers ?

/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );

Same error as in the first sample. The snippet described
as "the proper fix" may be less easy to explain (to some), but
it has the virtue of being correct. "Things should be as
simple as possible, and no simpler."
The rationale for this approach is that you're taking into account
any weird array element alignment stuff that the compiler might
want to do. I did come unstuck with this many moons ago when I
wrote the following :

short* array = malloc( sizeof( short ) * 42 );

This one's correct.
... The compiler liked to pad shorts up to a word boundary (so
there was practically zero point in them). Therefore I had not
allocated enough memory.

The snippet you've shown allocates enough memory for
forty-two `short's, padding or no. If `sizeof(short)'
failed to include the padding, the compiler was broken --
and broken so badly that it's hard to imagine it surviving
even the most rudimentary set of tests. Although you were
there and I wasn't, it seems more likely that you've mis-
remembered some aspect of the problem than that the compiler
could be so seriously and obviously defective.
When I wrote to that array I ended up
corrupting other stuff and the program died a fiery death after
some entertaining but wrong results... After much hair pulling
I changed the line to take this into account, it became :

short* array = malloc( sizeof( short[42] ) );

This has exactly the same meaning as the previous line.
If the implementation behaved differently, it was broken.
The big question is : Does ANSI C permit compilers to pad array
elements up to some other size like it does with structures ?
My guess is *no* given the amount of code that runs fine with
the more intuitive "sizeof( type ) * n" approach.

The `sizeof' an array of N elements is N times the
`sizeof' a single element. The `sizeof' an array element
of type T is equal to the `sizeof' a free-standing object
of that type. If there's any padding involved, it's part
of each and every T object.
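
These rules can be checked directly. The function names in this sketch are hypothetical; the first two compute the same number of bytes on any conforming implementation, and the third shows how one extra asterisk changes what is being measured.

```c
#include <stddef.h>

/* Room for 42 shorts, computed as count times element size. */
size_t bytes_by_multiplication(void)
{
    return 42 * sizeof(short);
}

/* Room for 42 shorts, computed from the array type. Same value as above. */
size_t bytes_by_array_type(void)
{
    return sizeof(short[42]);
}

/* Not room for two longs: this is the size of an array of two
   pointers to long. */
size_t bytes_two_long_pointers(void)
{
    return sizeof(long *[2]);
}
```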
 
Arthur J. O'Dwyer

Rupert Pigott said:
Dan said:
Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

Erm... Wouldn't something like the following be a little safer and
easier explain for an array of two or more pointers ?

/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );

Nope. Two problems. The obvious one is that you now have one more
type dependency in your program, and one more place that will need to
be changed if you decide that really '*ptr' ought to be a 'long long'
or a 'ptrdiff_t' or something.
The less obvious mistake is the more serious: you're allocating the
wrong amount of space! You meant

long* ptr = malloc(sizeof (long[2]));

but the extra verbosity got in the way of your reading comprehension.
You added an asterisk, thus changing the meaning of the expression.
If sizeof(long) != sizeof(long*), then your program has a very subtle
bug. This is why we always recommend the canonical form,

long *ptr = malloc(2 * sizeof *ptr);

No room for mistakes there; any more or fewer asterisks than exactly
two, and we instantly see that there is a bug. </hyperbole, but I
think you get the idea>
The rationale for this approach is that you're taking into account
any weird array element alignment stuff that the compiler might
want to do.

The implementation is not allowed to do any weird array alignment stuff,
unless it also does it in an array of one element, aka the base type.
I.e., sizeof (long*[2]) _must_ be 2*sizeof (long)

See how easy that bug is to miss? For 'sizeof(long)', read
'sizeof(long*)'.

No. Not more so than the individual elements.

...Well, it *could*, but it would have to very carefully hide that
fact from the programmer. In your case, I'd say that 'malloc' was
buggy --- it ought to have realized that when you said you wanted
room for an array of a funny size, it needed to give you a little
extra to account for that invisible padding.

But as I said, it's all invisible to the programmer:
'sizeof(foo[m][n][p][q])' must be exactly 'm*n*p*q*sizeof(foo)',
where 'foo' is a type.

-Arthur
 
Rupert Pigott

Arthur said:
Rupert Pigott said:
Dan Pop wrote:

Indeed, and the proper fix is:

long *ptr = malloc(2 * sizeof *ptr);

Erm... Wouldn't something like the following be a little safer and
easier explain for an array of two or more pointers ?

/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );


Nope. Two problems. The obvious one is that you now have one more
type dependency in your program, and one more place that will need to
be changed if you decide that really '*ptr' ought to be a 'long long'
or a 'ptrdiff_t' or something.
The less obvious mistake is the more serious: you're allocating the
wrong amount of space! You meant

long* ptr = malloc(sizeof (long[2]));

Bugger. My fault for trusting on-the-hoof thinking and not double
checking what I was *actually* typing. I was thinking of some code
I fixed back in 97 that allocated an array of pointers. :/
...Well, it *could*, but it would have to very carefully hide that
fact from the programmer. In your case, I'd say that 'malloc' was
buggy --- it ought to have realized that when you said you wanted
room for an array of a funny size, it needed to give you a little
extra to account for that invisible padding.

Nah, the malloc implementation was correct (one of the things I
checked). What wasn't correct was my concept of an array of
shorts being 'packed' (in PASCAL parlance) and the actual reality
of them being padded.

Thanks for the correction.

More proof that code-review works. :)

Cheers,
Rupert
 
Rupert Pigott

Eric said:
Rupert Pigott wrote:
[SNIP]
/* Array of 2 ptrs */
long* ptr = malloc( sizeof( long*[2] ) );
Same error as in the first sample. The snippet described

The extra asterisk in the sizeof( long*[2] ) has been pointed
out to me. Combination of thinko and typo I'm afraid. :(

[SNIP]
forty-two `short's, padding or no. If `sizeof(short)'
failed to include the padding, the compiler was broken --
and broken so badly that it's hard to imagine it surviving
even the most rudimentary set of tests. Although you were

It was nearly 15 years ago, standards were different back
then. C compilers have come on a long way during that time.
there and I wasn't, it seems more likely that you've mis-
remembered some aspect of the problem than that the compiler
could be so seriously and obviously defective.

It was a "one-off" completed before the ink of C89 had time
to dry.

[SNIP]
The `sizeof' an array of N elements is N times the
`sizeof' a single element. The `sizeof' an array element
of type T is equal to the `sizeof' a free-standing object
of that type. If there's any padding involved, it's part
of each and every T object.

That's what I thought, it seemed like "common sense" to me.

Cheers,
Rupert
 
Eric Sosman

Rupert said:
Eric Sosman wrote:
[...]
forty-two `short's, padding or no. If `sizeof(short)'
failed to include the padding, the compiler was broken --
and broken so badly that it's hard to imagine it surviving
even the most rudimentary set of tests. Although you were

It was nearly 15 years ago, standards were different back
then. C compilers have come on a long way during that time.

"Nearly 15 years," eh?

  2004     or since      2004
-   15     you said    -   15-
  ====     "nearly"      ====
  1989                   1989+

Something about that date seems vaguely familiar ;-)

My own experience of C started in 1978, and I'm quite
aware that things were pretty wild and wooly before, oh,
about 1992 or so. (For all its peculiarities, most of
them probably historical, the Standard has made things
far easier for C programmers than beforehand. Sometimes
we forget just how bad it was.) But even in the Bad Old
Days it would have been passing strange to find a C compiler
for which sizeof(T[N]) != N * sizeof(T). It would have been
akin to finding a C compiler that didn't support arrays (in
fact, s/akin/equivalent/ might state the case better).
 
Rupert Pigott

Eric said:
Rupert said:
Eric Sosman wrote:
[...]
forty-two `short's, padding or no. If `sizeof(short)'
failed to include the padding, the compiler was broken --
and broken so badly that it's hard to imagine it surviving
even the most rudimentary set of tests. Although you were


It was nearly 15 years ago, standards were different back
then. C compilers have come on a long way during that time.


"Nearly 15 years," eh?

  2004     or since      2004
-   15     you said    -   15-
  ====     "nearly"      ====
  1989                   1989+

Something about that date seems vaguely familiar ;-)

Does the phrase "draft-ANSI" come to mind ? ;)
about 1992 or so. (For all its peculiarities, most of
them probably historical, the Standard has made things
far easier for C programmers than beforehand. Sometimes

I welcomed the standard myself. However, some time later I
actually got hold of a copy of the standard and read it
with dismay...

Being a head-strong young utopian I felt that it gave far
too much wiggle room to vendors. That said I did know *why*
it did that...
we forget just how bad it was.) But even in the Bad Old
Days it would have been passing strange to find a C compiler
for which sizeof(T[N]) != N * sizeof(T). It would have been

Compilers targeted at machines with vector units might
have had that 'feature'.
akin to finding a C compiler that didn't support arrays (in
fact, s/akin/equivalent/ might state the case better).


Cheers,
Rupert
 
Mohanasundaram

Hi All,

Thanks a lot for your wonderful input. Can you suggest any other
possible problems which I have not listed?

Regards,
Mohan.
 
