size_t problems


Malcolm McLean

Ben Bacarisse said:
Malcolm McLean said:
That's why Basic Algorithms is absolutely consistent in using
int. Otherwise I would either have to translate everything to size_t,
or you would rapidly risk a mess.

I can't understand why, since you acknowledge that part of the problem
is old code that uses int[1], you choose to perpetuate the problem in a
new book.
Two things will happen.
Probably there will be a howl of protest as desktop programs move from 32 to
64 bits, and the implications of size_t being no longer the same size as an
int (give or take a sign bit) become obvious. So something will be done, and
people will look at code saying size_t i and say "Oh, that garbage the
committee insisted on back in 2007? What obsolete code."

The other possibility is that the committee will have its way, and we've all
got to write size_t for practically every array index. This makes C a
difficult language, OK for the specialist, but not very good for beginner
use. So it is no longer a good choice for a beginning book. Either use a
different language, or use a cut down, simplified version of the existing
language, with a note to say what you've done.

Either way, it is a bad idea to always follow the latest fashion in
programming. That way you've got to keep on rewriting things.
 

Richard Heathfield

Malcolm McLean said:

Probably there will be a howl of protest as desktop programs move from
32 to 64 bits,

Why? Surely everyone has learned their lesson from the early 1990s -
"don't rely on exact-size types, or your code will break one day" -
haven't they?
and the implications of size_t being no longer the same
size as an int (give or take a sign bit) become obvious.

It has never been the case that size_t is the same size as an int,
except by coincidence. ints are sizeof(int) bytes big, whereas size_ts
are sizeof(size_t) bytes big. If these values are the same, that's an
interesting coincidence, but nothing more.
So something
will be done, and people will look at code saying size_t i and say
"Oh, that garbage the committee insisted on back in 2007? What obsolete
code."

(a) far from being garbage, size_t is a useful type;
(b) the committee codified size_t in 1989, not 2007;
(c) far from being obsolete, code that uses proper types in the proper
way is more likely to survive and flourish than code that does not.

Either way, it is a bad idea to always follow the latest fashion in
programming.

Such as, say, 64-bit ints.
That way you've got to keep on rewriting things.

Precisely. Whereas, if you use the proper types in the right way, you
are less likely to have to do that.
 

Keith Thompson

CBFalconer said:
pete said:
Richard said:
An array of char can potentially have an index range of
0...SIZE_MAX. An array of any larger object type has a more
limited index range. Therefore, size_t is always a suitable
type for representing an array index.

For a sufficiently restricted interpretation of array index.
p[-3] can be perfectly legal.

If (&p) is the address of an object of an array type,
then p[-3] isn't defined.

Disproof:

int aone[10];
int *const atwo = &aone[3];
/* atwo is now effectively an array of indices -3 thru 6 */
...
int i;
for (i = -3; i < 7; i++) atwo = i; /* legal */


No, there's no such thing as an array with indices -3 through 6 -- and
atwo is a pointer, not an array. But a good case could be made that
atwo points to the first element of an array of length 7 (that happens
to overlap the last 7 elements of aone). I'm not sure just how good a
case can be made; it depends on the exact wording of the standard
(which I don't have handy at the moment).
 

Ian Collins

Malcolm said:
Ben Bacarisse said:
I can't understand why, since you acknowledge that part of the problem
is old code that uses int[1], you choose to perpetuate the problem in a
new book.
Two things will happen.
Probably there will be a howl of protest as desktop programs move from
32 to 64 bits, and the implications of size_t being no longer the same
size as an int (give or take a sign bit) become obvious. So something
will be done, and people will look at code saying size_t i and say "Oh,
that garbage the committee insisted on back in 2007? What obsolete code."
Those of us with decent desktops have been in the 64 bit world for well
over a decade and I haven't heard any howls yet.
The other possibility is that the committee will have its way, and we've
all got to write size_t for practically every array index.

They've had their way since 1989, where have you been? 64 bit desktops
started to appear shortly after.
This makes C
a difficult language, OK for the specialist, but not very good for
beginner use. So it is no longer a good choice for a beginning book.

Are you really saying too hard for windows programmers?
Either way, it is a bad idea to always follow the latest fashion in
programming. That way you've got to keep on rewriting things.
You must be behind the times Malcolm, there have been plenty of fashions
that have been and gone in the past 18 years.
 

pete

CBFalconer said:
pete wrote:
If (&p) is the address of an object of an array type,
then p[-3] isn't defined.

Disproof:

int aone[10];
int *const atwo = &aone[3];
/* atwo is now effectively an array of indices -3 thru 6 */
...
int i;
for (i = -3; i < 7; i++) atwo = i; /* legal */


(&aone[3]) is the address of an object of type int.
Your disproof is irrelevant to my statement.
 

CBFalconer

Malcolm said:
.... snip ...

The other possibility is that the committee will have its way, and
we've all got to write size_t for practically every array index.
This makes C a difficult language, OK for the specialist, but not
very good for beginner use. So it is no longer a good choice for a
beginning book. Either use a different language, or use a cut down,
simplified version of the existing language, with a note to say
what you've done.

You don't type an array index because it's indexing an array. You
type it according to the values it has to hold. Similarly for
anything else. If an index has to hold any value returned from
strlen (which is a size_t) then it must be a size_t. If it has to
hold "sizeof double" it can be a char, a short, an int, a long, or
a size_t, and unsigned versions of all. I don't think anyone will
take you to task for assuming "sizeof double" is no larger than
127.

If you had ever had the training of using Pascal correctly, you
would be aware of this. There you first type the variable that
indexes an array (lower and upper bounds). Then you build an array
indexed by that type. Now the error detection will catch you
anytime you exceed the preset bounds in the index, and use of the
index involves no checks.
 

Richard Tobin

For a sufficiently restricted interpretation of array index.
p[-3] can be perfectly legal.
No, there's no such thing as an array with indices -3 through 6 -- and
atwo is a pointer, not an array.

That's why I said "for a sufficiently restricted interpretation of
array index". How the standard defines array is unimportant; the
point is that in indexing, both sizes (which "should" be unsigned
size_ts) and offsets (both negative and positive) are used and
combined and compared. So I find the fact that sizes are inherently
positive unconvincing as an argument for their being unsigned.

The *real* reason for their being unsigned is that the good sizes for
signed ints have in the past been inadequate for addressing all
objects. At the risk of sounding like Mr Gates, I suggest that 63
bits will be quite adequate for object sizes throughout the future
life of C.

-- Richard
 

Richard Tobin

int aone[10];
int *const atwo = &aone[3];
(&aone[3]) is the address of an object of type int.

I realise this is just pedantry, but who can complain?

Is the following legal:

typedef int array_type[7];
array_type *atwo = (array_type *)&aone[3];

and if so, what is the type of *atwo? And is not (*atwo)[-1] legal?

-- Richard
 

CBFalconer

Richard said:
Keith Thompson said:
For a sufficiently restricted interpretation of array index.
p[-3] can be perfectly legal.
No, there's no such thing as an array with indices -3 through 6
-- and atwo is a pointer, not an array.

That's why I said "for a sufficiently restricted interpretation of
array index". How the standard defines array is unimportant; the
point is that in indexing, both sizes (which "should" be unsigned
size_ts) and offsets (both negative and positive) are used and
combined and compared. So I find the fact that sizes are inherently
positive unconvincing as an argument for their being unsigned.

No, you miss the point. The index type has nothing to do with the
array, it only has to do with the span of that array's index. Make
that type suit the required index.

You snipped my example (and the attributions - which is bad. Don't
do that for material you quote) which showed the construction.
Following that the use of atwo is an adequate substitute for aone,
except for sizeof, and can be passed to functions in the same
manner. However atwo requires a signed index, while aone requires
an unsigned one. C doesn't have the strict 'subrange' kind of
typing available in other languages, so individual indexes have to
be checked.
 

Joe Wright

Richard said:
CBFalconer said:


No, at which point his code won't even compile.


I think he should start with something a little easier to understand.
This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
    char *p = s;
    if (p) while (*p) p++;
    return p - s;
}

#define strlen Strlen

int main(void) {
    char line[80] = "Are you kidding me?";
    printf("The length of string \"%s\" is %d bytes.\n",
           line, (int)strlen(line));
    return 0;
}

Is there anything wrong with it?
 

Harald van Dijk

Joe said:
This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
char *p = s;
if (p) while (*p) p++;
return p - s;
}

#define strlen Strlen

int main(void) {
char line[80] = "Are you kidding me?";
printf("The length of string \"%s\" is %d bytes.\n",
line, (int)strlen(line));
return 0;
}

Is there anything wrong with it?

No, ignoring style, there is nothing wrong with it, as long as <string.h> is
not included.
 

Richard Heathfield

<much snippage>

Joe Wright said:
This compiles just fine for me.

Look at his code more closely. Much more closely. Vewy vewy cwosewy, in
fact. I have re-quoted the relevant line.
 

Joe Wright

Richard said:
<much snippage>

Joe Wright said:

Look at his code more closely. Much more closely. Vewy vewy cwosewy, in
fact. I have re-quoted the relevant line.
I see it (;) now. The admonishment to compile even your snippets before
posting is valid.
 

Ed Jensen

Richard Heathfield said:
Why? Surely everyone has learned their lesson from the early 1990s -
"don't rely on exact-size types, or your code will break one day" -
haven't they?

Sometimes (most of the time?) C developers need to choose between:

1. Writing 100% portable code. This can be non-trivial and really
slow down your development. (However, I'm sure writing 100% portable
code doesn't slow down any of the geniuses HERE. I'm talking strictly
about MORTAL developers.)

2. Writing code that's portable to the platforms they're currently
targeting. And perhaps keeping in mind platforms they're likely to
need to support in the future.

It's easy to be smug and regurgitate the ivory tower attitude:

"Well, just write your C code so it's 100% portable in the first
place. Easy! Problem solved! Only dummies don't do that!"

The reality of the situation is that many developers who choose the
"100% portable path" may end up (1) being unemployed, because their
productivity is low compared to more pragmatic developers, or (2)
working 80+ hours per week in order to keep pace with those more
pragmatic developers.

Of course, what's really happened in the market is that more and more
projects have abandoned C (and C++) for evil, horrible, limited,
short-sighted languages that made pragmatic choices like: fixed size
primitive types.

I personally try to walk the fine line between "TOO pragmatic" and
"TOO ivory tower". I'm not quite ready to look down my nose and
admonish those developers that sometimes used "int" when they should
have used "size_t". (I do agree, however, that they should fix their
code the right way, if they decide to continue to use C.)

The problem, if you care to see it as such, is really that C is aging,
and some of the choices made decades ago made sense then, but perhaps
don't make so much sense now, for an ever increasing number of
applications.

Because C was one of the first languages I knew well, and I had done
it for such a long time, it'll always have a soft spot in my heart.
However, I can't help but yearn for something very much in the C
tradition but updated and refreshed. A "C2" language, perhaps, where
there's no need for an alphabet soup of types, and where a "size_t"
type becomes unnecessary.
(a) far from being garbage, size_t is a useful type;
(b) the committee codified size_t in 1989, not 2007;
(c) far from being obsolete, code that uses proper types in the proper
way is more likely to survive and flourish than code that does not.

In my opinion, (c) is only true in a limited sort of way. It's more
likely that developers will flee to languages that offer increased
productivity AND portability through fixed size types. Those
languages make (a) almost entirely, if not entirely, irrelevant.
Precisely. Whereas, if you use the proper types in the right way, you
are less likely to have to do that.

When programming in C, it is, quite simply, too much of a productivity
killer to always make sure you're using the proper types in the right
way 100% of the time. This comment shouldn't be mistaken as an excuse
for developers to use "int" when they should use "size_t", though.
It's just a comment on C in general.
 

Joe Wright

Harald said:
Joe said:
This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
char *p = s;
if (p) while (*p) p++;
return p - s;
}

#define strlen Strlen

int main(void) {
char line[80] = "Are you kidding me?";
printf("The length of string \"%s\" is %d bytes.\n",
line, (int)strlen(line));
return 0;
}

Is there anything wrong with it?

No, ignoring style, there is nothing wrong with it, as long as <string.h> is
not included.

Style? Anyway, what changes if I include <string.h> after <stdio.h> and
before #define strlen Strlen ?
 

Harald van Dijk

Joe said:
Harald said:
Joe said:
This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
char *p = s;
if (p) while (*p) p++;
return p - s;
}

#define strlen Strlen

int main(void) {
char line[80] = "Are you kidding me?";
printf("The length of string \"%s\" is %d bytes.\n",
line, (int)strlen(line));
return 0;
}

Is there anything wrong with it?

No, ignoring style, there is nothing wrong with it, as long as <string.h>
is not included.

Style?

Defining your own functions with the same name as standard library functions
or macros is not something I would ever consider good style. Not even the
times when it's allowed and actually useful.
Anyway, what changes if I include <string.h> after <stdio.h> and
before #define strlen Strlen ?

If <string.h> already defines strlen as a macro, you will get a complaint
from your compiler that you're redefining the macro. If you make sure to
use #undef first, the behaviour is undefined.
 

jacob navia

Joe said:
I see it (;) now. The admonishment to compile even your snippets before
posting is valid.

I do not think so.

Snippets are intended for people, not machines. Besides this, that guy
can only be satisfied with things like that:
missing semicolons, missing this or that.
 

jacob navia

Ed said:
Sometimes (most of the time?) C developers need to choose between:

1. Writing 100% portable code. This can be non-trivial and really
slow down your development. (However, I'm sure writing 100% portable
code doesn't slow down any of the geniuses HERE. I'm talking strictly
about MORTAL developers.)

2. Writing code that's portable to the platforms they're currently
targeting. And perhaps keeping in mind platforms they're likely to
need to support in the future.

It's easy to be smug and regurgitate the ivory tower attitude:

"Well, just write your C code so it's 100% portable in the first
place. Easy! Problem solved! Only dummies don't do that!"

And then, like Heathfield, they discover that they published a book
(C Unleashed) with, on one page, the assumption that
sizeof(int) == sizeof(int *).

It is easy to play the guru here. More difficult in reality.
The reality of the situation is that many developers who choose the
"100% portable path" may end up (1) being unemployed, because their
productivity is low compared to more pragmatic developers, or (2)
working 80+ hours per week in order to keep pace with those more
pragmatic developers.

Who cares about the job?
comp.lang.c is the only thing that counts.
Of course, what's really happened in the market is that more and more
projects have abandoned C (and C++) for evil, horrible, limited,
short-sighted languages that made pragmatic choices like: fixed size
primitive types.

I personally try to walk the fine line between "TOO pragmatic" and
"TOO ivory tower". I'm not quite ready to look down my nose and
admonish those developers that sometimes used "int" when they should
have used "size_t". (I do agree, however, that they should fix their
code the right way, if they decide to continue to use C.)

The problem, if you care to see it as such, is really that C is aging,
and some of the choices made decades ago made sense then, but perhaps
don't make so much sense now, for an ever increasing number of
applications.

Because C was one of the first languages I knew well, and I had done
it for such a long time, it'll always have a soft spot in my heart.
However, I can't help but yearn for something very much in the C
tradition but updated and refreshed. A "C2" language, perhaps, where
there's no need for an alphabet soup of types, and where a "size_t"
type becomes unnecessary.


In my opinion, (c) is only true in a limited sort of way. It's more
likely that developers will flee to languages that offer increased
productivity AND portability through fixed size types. Those
languages make (a) almost entirely, if not entirely, irrelevant.


When programming in C, it is, quite simply, too much of a productivity
killer to always make sure you're using the proper types in the right
way 100% of the time. This comment shouldn't be mistaken as an excuse
for developers to use "int" when they should use "size_t", though.
It's just a comment on C in general.


more about that later
 

CBFalconer

Ed said:
.... snip ...

"Well, just write your C code so it's 100% portable in the first
place. Easy! Problem solved! Only dummies don't do that!"
True.


The reality of the situation is that many developers who choose the
"100% portable path" may end up (1) being unemployed, because their
productivity is low compared to more pragmatic developers, or (2)
working 80+ hours per week in order to keep pace with those more
pragmatic developers.

False. The portable writer reaches for the earlier developed
routines, all tested, and incorporates them. Then he goes
swimming, or something else enjoyable.
 

CBFalconer

Joe said:
This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
char *p = s;
if (p) while (*p) p++;
return p - s;
}

AFAICS this has the same action as strlen.
#define strlen Strlen

This leads to undefined behaviour.
int main(void) {
char line[80] = "Are you kidding me?";
printf("The length of string \"%s\" is %d bytes.\n",
line, (int)strlen(line));
return 0;
}

Is there anything wrong with it?

Yes. See above.
 
