size_t problems

jacob navia

I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.


I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

jacob
 
Malcolm McLean

jacob navia said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.


I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?
There's a very obvious answer to that one. As a compiler-writer, you are in
a position to do it.
 
Kenneth Brody

jacob navia wrote:
[... "64-bit compiler" with 64-bit size_t ...]
The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.

I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

Well, some people will probably claim that those hundreds of warnings
are a good thing, as strlen() returns size_t and not int. However,
if you are bombarded with hundreds of such warnings, many people will
simply start ignoring all of the warnings, and the "real" ones will
be lost in the noise.

Perhaps a flag that says "only display the first N instances of this
warning"?

Perhaps you could make int 64 bits as well?

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Ben Pfaff

jacob navia said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

I'd suggest fixing the code that does this to use size_t instead
of int. size_t is correct. int is, at best, an approximation to
correct. We've just had a pretty long thread with Malcolm McLean
discussing this very topic; perhaps you should refer to that
thread, if you're not already aware of it.
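
A minimal sketch of that suggestion, assuming a hypothetical helper named count_char that is not taken from the code being ported. The length and the loop index both stay size_t, so there is no narrowing conversion for the compiler to warn about:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how often c occurs in str.
   The length and the index are both size_t, the type strlen returns,
   so no value is ever narrowed to int. */
size_t count_char(const char *str, char c)
{
    assert(str != NULL);

    size_t len = strlen(str);
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (str[i] == c)
            n++;
    }
    return n;
}
```

Calling count_char("hello", 'l') yields 2. The result is itself a size_t, which is where the cascade of changes mentioned elsewhere in this thread tends to start.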
 
Keith Thompson

jacob navia said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.


I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

Why didn't you get the same warnings in 32-bit mode? If int and
size_t are both 32 bits, INT_MAX < SIZE_MAX, and there are values of
size_t that cannot be stored in an int. If the "narrowing conversion"
warning is based on the sizes of the types rather than the ranges, I'd
say you've just discovered a compiler bug.

If you're getting hundreds of warnings, it's because you have hundreds
of instances of potential loss of information.

Note that a conversion to a signed type of a value that doesn't fit in
that type yields an implementation-defined result (or, in C99, raises
an implementation-defined signal). In theory, the result could be
more than just a loss of information.

The problem is to distinguish cases where the conversion can't
actually overflow at execution time from the cases where it can.

Sufficiently clever dataflow analysis in the compiler might eliminate
some of the warnings. If, given
int s = strlen(str);
the compiler knows enough about the value of str to be
sure it's no longer than INT_MAX bytes, it can eliminate the warning.
But I don't know if it's practical, or even possible, to eliminate
enough of the warnings this way. Doing this in most cases is hard;
doing it in all cases might be equivalent to solving the halting
problem. (The latter is only a guess.)

(Making int 64 bits won't solve the problem, since INT_MAX will still
be less than SIZE_MAX.)

You can filter the compiler's output to eliminate warnings about
narrowing implicit conversions (or, if available, use a compiler
option to turn off that particular warning), but that could miss cases
that could actually overflow.

In my opinion, the warnings are legitimate. The ideal solution is not
to suppress them, but to fix the code, assigning the result of
strlen() to a size_t rather than to an int. (Or I suppose you could
use a cast to shut up the compiler if you're *certain* the result can
never exceed INT_MAX, but that's not what I'd do.)
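
For call sites that really do need an int, the cast can at least be paired with a range check. The sketch below is only an illustration: size_to_int and strlen_int are hypothetical names, and returning -1 on overflow is an assumed convention, not something any compiler or library prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a size_t to int.
   Returns -1 if the value does not fit, instead of relying on the
   implementation-defined result of an out-of-range conversion.
   Assumes size_t is at least as wide as int, as on mainstream platforms. */
int size_to_int(size_t n)
{
    if (n > (size_t)INT_MAX)
        return -1;
    return (int)n;   /* explicit cast: the range was checked above */
}

/* Hypothetical wrapper for call sites that want the length as an int. */
int strlen_int(const char *str)
{
    assert(str != NULL);
    return size_to_int(strlen(str));
}
```

With this, s = strlen_int(str) keeps s as an int, the narrowing happens in exactly one audited place, and an impossible length shows up as -1 rather than as a silently wrapped value.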

By compiling the code in 64-bit mode, you've discovered a number of
dormant bugs in the code.
 
Keith Thompson

Malcolm McLean said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.
[...]
int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.


I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?
There's a very obvious answer to that one. As a compiler-writer, you
are in a position to do it.

I presume the solution you're suggesting is to make int 64 bits. How
does this help? strlen() still returns size_t, and if int and size_t
are both 64 bits, there will still be size_t values that cannot be
stored in an int.
 
Malcolm McLean

Ben Pfaff said:
I'd suggest fixing the code that does this to use size_t instead
of int. size_t is correct. int is, at best, an approximation to
correct. We've just had a pretty long thread with Malcolm McLean
discussing this very topic; perhaps you should refer to that
thread, if you're not already aware of it.
Yup. As I said, if people would use size_t consistently for every single
calculation that ultimately ends up in an array index there wouldn't be such
a problem. The reality is that people won't, and lots of code doesn't.
 
Malcolm McLean

Keith Thompson said:
Malcolm McLean said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.
[...]
int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.


I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?
There's a very obvious answer to that one. As a compiler-writer, you
are in a position to do it.

I presume the solution you're suggesting is to make int 64 bits. How
does this help? strlen() still returns size_t, and if int and size_t
are both 64 bits, there will still be size_t values that cannot be
stored in an int.
Yes, but then you'd need an extremely long string to break the code, so the
warning can be suppressed with some confidence that it won't cause a
malfunction.
 
jacob navia

Keith said:
Why didn't you get the same warnings in 32-bit mode? If int and
size_t are both 32 bits, INT_MAX < SIZE_MAX, and there are values of
size_t that cannot be stored in an int. If the "narrowing conversion"
warning is based on the sizes of the types rather than the ranges, I'd
say you've just discovered a compiler bug.

Under the 32-bit Windows memory layout, 2GB is the longest string you can get.
If you're getting hundreds of warnings, it's because you have hundreds
of instances of potential loss of information.

Yes, "*POTENTIALLY*" I could be missing all those strings longer
than 4GB (!!!). But I do not care about those :)
Note that a conversion to a signed type of a value that doesn't fit in
that type yields an implementation-defined result (or, in C99, raises
an implementation-defined signal). In theory, the result could be
more than just a loss of information.

Only for strings >2GB Keith. Let's keep it realistic!
The problem is to distinguish cases where the conversion can't
actually overflow at execution time from the cases where it can.

Sufficiently clever dataflow analysis in the compiler might eliminate
some of the warnings. If, given
int s = strlen(str);
the compiler knows enough about the value of str to be
sure it's no longer than INT_MAX bytes, it can eliminate the warning.
But I don't know if it's practical, or even possible, to eliminate
enough of the warnings this way. Doing this in most cases is hard;
doing it in all cases might be equivalent to solving the halting
problem. (The latter is only a guess.)

(Making int 64 bits won't solve the problem, since INT_MAX will still
be less than SIZE_MAX.)

You can filter the compiler's output to eliminate warnings about
narrowing implicit conversions (or, if available, use a compiler
option to turn off that particular warning), but that could miss cases
that could actually overflow.

In my opinion, the warnings are legitimate. The ideal solution is not
to suppress them, but to fix the code, assigning the result of
strlen() to a size_t rather than to an int. (Or I suppose you could
use a cast to shut up the compiler if you're *certain* the result can
never exceed INT_MAX, but that's not what I'd do.)

By compiling the code in 64-bit mode, you've discovered a number of
dormant bugs in the code.

There isn't any string longer than a few K in this program!
Of course it is a potential bug, but it is practically impossible!
 
jacob navia

Ben said:
I'd suggest fixing the code that does this to use size_t instead
of int. size_t is correct. int is, at best, an approximation to
correct. We've just had a pretty long thread with Malcolm McLean
discussing this very topic; perhaps you should refer to that
thread, if you're not already aware of it.

The problem is that if I change those ints into size_t's they
are unsigned, and they will produce problems when comparing them with
signed quantities, making MORE modifications necessary in a cascade
of modifications that would surely introduce bugs...

I have *already* introduced (int)strlen(...) in many places...
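
One concrete instance of that cascade, as a hedged sketch with a hypothetical function name (last_index): a backwards loop written as for (i = len - 1; i >= 0; i--) stops working if i simply becomes size_t, because an unsigned i >= 0 is always true. The loop has to be restructured, and the "not found" result can no longer be a plain -1:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed convention for this sketch: the largest size_t means "not found". */
#define NOT_FOUND ((size_t)-1)

/* Hypothetical helper: index of the last occurrence of c in str.
   With an unsigned index, "i >= 0" would always be true, so the loop
   tests i > 0 before decrementing instead. */
size_t last_index(const char *str, char c)
{
    assert(str != NULL);

    size_t i = strlen(str);
    while (i-- > 0) {        /* i takes the values len-1 down to 0 */
        if (str[i] == c)
            return i;
    }
    return NOT_FOUND;
}
```

Every caller that used to test the result with < 0 then has to compare against NOT_FOUND instead, which is the kind of follow-on change described above.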
 
jacob navia

Malcolm said:
There's a very obvious answer to that one. As a compiler-writer, you are
in a position to do it.

???

(Please excuse my stupidity but I do not see it...)
 
christian.bau

I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.

I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

So the compiler is giving a warning when a 64 bit value is assigned to
a 32 bit variable, but not when a 32 bit unsigned value is assigned to
a 32 bit signed variable.

Well, just because you changed size_t to 64 bits doesn't make strings
any longer. strlen ("hello") still returns 5 and it will fit into an
int just as well as before. So you _could_, possibly as a compiler
option, mark certain functions as returning small(ish) values that
don't require a warning when stored in an int.

But maybe you should look at it from the point of view of a developer
who is switching from a 32 bit to a 64 bit compiler (or most likely
wants to write code that runs fine on a 32 bit and a 64 bit system),
and who _wants_ to fix problems. That programmer would _want_ the
warning and change the variable from int to something else.

Here is the approach that Apple takes: Define two typedefs, Int and
Uint (they actually use different names, but that doesn't matter).
These are used for almost all integer values. On a 32 bit system (32
bit int/long/size_t) they are equal to int/unsigned int, on a 64 bit
system (32 bit int, 64 bit long/size_t) they are equal to long/
unsigned long. Your warning problem goes away. Different types are
used on purpose so that if you mismatch int*/Int* or long*/Int* either
the 32 bit or 64 bit version will give you a compiler error.

Situations where you don't use these types: If you definitely need 64
bit, use long long. If you want to save space, use char/short/int as
suitable.
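
A rough sketch of the typedef scheme described above, using the placeholder names Int and Uint from this post (not the names Apple actually uses). It assumes an LP64 system where long is 64 bits; on a platform where long stays 32 bits in 64-bit mode, such as 64-bit Windows, the condition would have to select a different type.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Placeholder typedefs: int-sized where long is no wider than int,
   long-sized where long is wider (the LP64 case described above). */
#if ULONG_MAX > UINT_MAX
typedef long Int;
typedef unsigned long Uint;
#else
typedef int Int;
typedef unsigned int Uint;
#endif

/* Hypothetical helper: Int has the same width as size_t on the systems
   described above, so the conversion changes only the signedness. */
Int string_length(const char *str)
{
    assert(str != NULL);

    Int len = (Int)strlen(str);
    return len;
}
```

Because Int is a distinct type from int on one kind of system and from long on the other, passing an int * or long * where an Int * is expected is diagnosed by at least one of the two builds.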
 
Malcolm McLean

jacob navia said:
???

(Please excuse my stupidity but I do not see it...)
The campaign for 64 bit ints T-shirts obviously didn't generate enough
publicity. I still have a few left. XXL, one size fits all.

There are some good reasons for not making int 64 bits on a 64 bit machine,
which as a compiler-writer you will be well aware of. However typical
computers are going to have 64 bits of main address space for a very long
time to come, so it makes sense to get the language right now, and keep it
that way for the foreseeable future, and not allow decisions to be dominated
by the need to maintain compatibility with legacy 32 bit libraries.
 
Kelsey Bjarnason

Yup. As I said, if people would use size_t consistently for every single
calculation that ultimately ends up in an array index there wouldn't be such
a problem. The reality is that people won't, and lots of code doesn't.

And lots of people do and lots of code does, and those people don't get
those problems on that code.

Which just goes to show, doing the right thing - using size_t - makes
perfect sense, and ignoring the right thing - as you persist in doing -
makes for problems.
 
user923005

I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than 4GB
long.

The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Now the problem:

Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.

I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

Make your default int 64 bits, and be done with it.
Ought to be 64 bits on a 64 bit platform anyway.
 
user923005

Malcolm McLean said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.
The problem appears now that size_t is now 64 bits.
[...]




int s;
// ...
s = strlen(str) ;
Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.
This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...
Now the problem:
Since I warn each time a narrowing conversion is done (since
that could lose data) I end up with hundreds of warnings, one for each
construct like int a = strlen(...). This clutters
everything, and important warnings get lost.
I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?
There's a very obvious answer to that one. As a compiler-writer, you
are in a position to do it.

I presume the solution you're suggesting is to make int 64 bits. How
does this help? strlen() still returns size_t, and if int and size_t
are both 64 bits, there will still be size_t values that cannot be
stored in an int.

If strlen() returns a number bigger than 9,223,372,036,854,775,807
then there are bigger fish to fry.
Sure, Bill Gates supposedly said that nobody will ever need more than
640K of RAM, and so someday it may be true that strings longer than 9
quintillion bytes are common. But I guess it will be a minor problem
until he can get around to fully correcting the code the right way by
assigning size_t values to the return from strlen() and other things
that return a size_t.
 
Richard Tobin

jacob navia said:
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Clearly with strlen() the chance of it being an error is negligible.
And I think this is true of other size_t->int assignments. For example,
int s = sizeof(whatever) is almost never a problem.

Ideally, I would suggest not generating a warning unless some option
is set for it. (There should always be a "maximally paranoid" option
to help track down obscure errors.) But that only applies to
size_t->int assignments. Other 64->32 assignments may be more likely to be
in error. At the point you generate the warning, can you still tell
that it's a size_t rather than some other 64-bit int type?

-- Richard
 
Keith Thompson

Malcolm McLean said:
Yes, but then you'd need an extremely long string to break the code,
so the warning can be suppressed with some confidence that it won't
cause a malfunction.

That's assuming you're able to suppress the warning for 64-bit
unsigned to 64-bit signed conversions without suppressing warnings for,
say, 8-bit unsigned to 8-bit signed conversions. I don't know of any
compiler that allows that kind of fine-grained control.

It's better to fix the code. It's even better to write it correctly
in the first place.
 
Richard Tobin

Make your default int 64 bits, and be done with it.
Ought to be 64 bits on a 64 bit platform anyway.

A compiler for an existing operating system needs to fit in with the
system's libraries, so he may not have that choice.

-- Richard
 
Richard Tobin

Keith Thompson said:
It's better to fix the code. It's even better to write it correctly
in the first place.

But int s = sizeof(char *) is not broken, even though sizeof() returns
a size_t.

-- Richard
 
