size_t problems

Ben Pfaff · Sep 1, 2007

Malcolm McLean said:
No it's not. It's 4 times faster, which makes it O(N), which means it
is about as fast as the canonical loop.

4 times faster *is* a hell of a lot faster. Asymptotic
performance is not what the world is all about. In the end it's
all about how fast you can finish a particular task. The
asymptotic complexity of me adding numbers by hand is the same as
if the computer does it, but I tend to let the computer do it.
It's faster.

Peter J. Holzer · Sep 1, 2007

No it's not. It's 4 times faster,

Probably less.

which makes it O(N), which means it is about as fast as the canonical
loop.

By that kind of reasoning a snail is about as fast as a jet.

hp

jacob navia · Sep 1, 2007

Peter said:
Probably less.

By that kind of reasoning a snail is about as fast as a jet.

hp

Most of the strings in this application are less than 80 bytes long.

The difference is zero!

It is all swamped in the overhead of function call, and loop setup!

jacob

Mark McIntyre · Sep 1, 2007

Standard C doesn't have

1) Any serious i/o. To do anything fast you need system specific stuff.
2) Any notion of the keyboard. To handle the keyboard you need system
specific stuff.
3) Any graphics. Ditto.
4) Any networking.
5) Any timers with reasonable accuracy.

So? In any typical application, all the above interface-specific stuff
can (and should) be separated from the meat of the programme.

It would be possible to at least do something reasonably portable if the
standard would specify a reasonable string library, a common container
library, a common base for use in day-to-day programming.

Hey, didn't someone invent a new language cos they had similar issues?
Remind us what it's called?

Or they do not use the network, nor do they do any graphics, nor do they
use any i/o, etc etc.

or they practice good programming technique and isolate interface code
into different (and replaceable) libraries.
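
The separation described above might look roughly like the following. This is only a sketch under assumptions: the names `byte_source`, `count_lines`, `stdio_read` and `mem_read` are hypothetical and do not come from anyone's code in this thread. The portable core sees a small interface, and each backend that implements it can be swapped out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: the portable core only sees this operation. */
struct byte_source {
    /* Reads up to size bytes into buffer; returns the number read (0 at end). */
    size_t (*read)(void *ctx, unsigned char *buffer, size_t size);
    void *ctx;
};

/* Portable core: counts newline bytes coming from any source. */
size_t count_lines(struct byte_source *src)
{
    unsigned char buf[256];
    size_t lines = 0;
    size_t n;

    assert(src != NULL && src->read != NULL);
    while ((n = src->read(src->ctx, buf, sizeof buf)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n') {
                lines++;
            }
        }
    }
    return lines;
}

/* Backend 1: standard stdio; ctx is a FILE *. */
size_t stdio_read(void *ctx, unsigned char *buffer, size_t size)
{
    return fread(buffer, 1, size, (FILE *)ctx);
}

/* Backend 2: an in-memory buffer, handy for testing the core. */
struct mem_ctx {
    const unsigned char *p;  /* next unread byte */
    size_t left;             /* bytes remaining */
};

size_t mem_read(void *ctx, unsigned char *buffer, size_t size)
{
    struct mem_ctx *m = ctx;
    size_t n = size < m->left ? size : m->left;

    memcpy(buffer, m->p, n);
    m->p += n;
    m->left -= n;
    return n;
}
```

A serial-port or socket backend would be a third function with the same signature, living in its own system-specific file; `count_lines` would not need to change.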

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Joe Wright · Sep 1, 2007

CBFalconer said:
Joe said:

This compiles just fine for me.

#include <stdio.h>

size_t Strlen(char *s) {
    char *p = s;
    if (p) while (*p) p++;
    return p - s;
}


AFAICS this has the same action as strlen.

#define strlen Strlen


This leads to undefined behaviour.

int main(void) {
    char line[80] = "Are you kidding me?";
    printf("The length of string \"%s\" is %d bytes.\n",
           line, (int)strlen(line));
    return 0;
}

Is there anything wrong with it?


Yes. See above.

Not quite the same. See 'if (p)' checking for NULL.

Saying it doesn't make it so. The preprocessor does its thing early on
and by the time anything gets to the compiler, there is no reference to
strlen to be found, only to Strlen.

I suppose you don't like '#define strlen Strlen'. It has the effect of
removing a reference to a standard library function and replacing it
with the name of a local function before compilation. Harmless.
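
Whichever side of the macro argument one takes, the dispute can be sidestepped by giving the NULL-tolerant version its own name and calling it directly. A minimal sketch, in which the name `safe_strlen` is hypothetical and not from the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: behaves like strlen(), but returns 0 for a null
   pointer. Returning early also avoids subtracting two null pointers. */
size_t safe_strlen(const char *s)
{
    const char *p = s;

    if (s == NULL) {
        return 0;
    }
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* Usage check with the string from the program above. */
void safe_strlen_check(void)
{
    assert(safe_strlen("Are you kidding me?") == 19);
    assert(safe_strlen(NULL) == 0);
}
```

No standard library name is redefined here, so the question of whether the `#define` is allowed does not arise.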

Malcolm McLean · Sep 1, 2007

Peter J. Holzer said:
and the admission of long, double, long long or any other type.

Let's face it, admitting types to C was a mistake.
We should go back to B.

The campaign for 64 bit ints wants int to be 64 bits. Then basically it's
ints for everything - no need for unsigned, 63 bits hold a number large
enough to count most things. Other types will be kept for special purposes.
Audio samples will be 16 bits for the foreseeable future, and you might need
a 32 bit type for interfacing with legacy libraries, and 128 bit longs for
cryptography. But basically everything non-special can be an int, and the
problems disappear.

You've still got the problem of real numbers of course. The existence of two
and now three formats creates inefficiencies enough. But at least we'll have
the integers sorted out.

Keith Thompson · Sep 1, 2007

jacob navia said:
Most of the strings in this application are less than 80 bytes long.

The difference is zero!

It is all swamped in the overhead of function call, and loop setup!

Oh? Have you measured it?

Even if you have, your measurements apply only to your application.

strlen() is simple enough that re-inventing it isn't a huge deal; if
that's what you want to do, go ahead. But in general, predefined
functions are likely to be at least as fast as anything you can write
in portable C. (qsort() probably imposes significant overhead because
it uses an indirect function call for each comparison, so a
custom-written sorting routine may be faster. But a custom-written
routine that does what qsort() does is unlikely to be faster than your
implementation's qsort().)

Even with small strings, a word-at-a-time version of strlen() might be
significantly faster if you invoke it enough times.
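
For anyone who wants to measure rather than guess, a rough harness could look like the sketch below. The names `naive_strlen` and `time_strlen` are hypothetical, and clock() is often coarse, so the repetition count has to be large before the numbers mean anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* The "canonical loop": one byte per iteration. */
size_t naive_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* Hypothetical timing helper: calls fn(s) reps times and returns the
   CPU time used, in seconds, as reported by clock(). */
double time_strlen(size_t (*fn)(const char *), const char *s, long reps)
{
    /* volatile so the compiler cannot discard the calls */
    volatile size_t sink = 0;
    clock_t start, end;

    assert(fn != NULL && s != NULL && reps >= 0);
    start = clock();
    for (long i = 0; i < reps; i++) {
        sink += fn(s);
    }
    end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing `time_strlen(strlen, line, 10000000L)` with `time_strlen(naive_strlen, line, 10000000L)` on strings typical of the application gives a figure for that application and that implementation only, which is the point made above.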

Note that I'm not advocating micro-optimization, i.e., obfuscating
your source code for the sake of some small performance increase. In
this case, the simplest code (calling the predefined strlen()) is both
simpler and likely faster than any replacement.

Of course, you could always re-write the application to use some other
representation for strings, so you don't have to call strlen() at all.
It might (or might not) give you a significant improvement in
performance and/or reliability if strlen() calls are a bottleneck, and
it's doable in purely standard C.
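
One such representation, as a minimal sketch in standard C: a string that carries its own length. The type `counted_str` and its functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored, not recomputed. */
struct counted_str {
    size_t len;   /* number of bytes before the terminator */
    char  *data;  /* len + 1 bytes, always null-terminated */
};

/* Builds a counted string from a C string. Returns 0 on success and -1 on
   allocation failure. strlen() runs once here; later code reads cs->len. */
int counted_str_init(struct counted_str *cs, const char *src)
{
    size_t n;
    char *buf;

    assert(cs != NULL && src != NULL);
    n = strlen(src);
    buf = malloc(n + 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, src, n + 1);  /* copies the terminator as well */
    cs->len = n;
    cs->data = buf;
    return 0;
}

/* Releases the storage and leaves the struct in an empty state. */
void counted_str_free(struct counted_str *cs)
{
    assert(cs != NULL);
    free(cs->data);
    cs->data = NULL;
    cs->len = 0;
}
```

The cost is that every operation that changes the string must keep `len` in step with `data`, which is where the reliability gain or loss would come from.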

The performance difference between the predefined strlen() and your
re-implementation of it may not be significant, but you seem to be
offended by the idea of calling strlen(), and I have no idea why.

Malcolm McLean · Sep 1, 2007

Peter J. Holzer said:
By that kind of reasoning a snail is about as fast as a jet.

The snail, going West, is moving towards the Andromeda galaxy at 50.000001
km/s. The jet, going East, is moving towards Andromeda at about 49.660 km/s,
assuming it's a Concorde.

So to two decimal places, the snail is about as fast as the jet.

Richard Heathfield · Sep 1, 2007

Malcolm McLean said:

The snail, going West, is moving towards the Andromeda galaxy at
50.000001 km/s. The jet, going East, is moving towards Andromeda at
about 49.660 km/s, assuming it's a Concorde.

If it's a Concorde, it isn't going East, and it's travelling rather
slower than the snail.

Peter J. Holzer · Sep 1, 2007

The campaign for 64 bit ints wants int to be 64 bits.

I think somebody's irony detector needs adjusting.

You've still got the problem of real numbers of course. The existence of two
and now three formats creates inefficiencies enough. But at least we'll have

^^^

Now? "long double" has existed at least since C89. I think some pre-ANSI
compilers I used had it, too. Oh, I forgot. You are the guy who knows
that C in a hundred years will look like C 30 years ago, and everything
added in between is just a short-lived fashion which will eventually be
removed again.

hp

Ian Collins · Sep 1, 2007

Martin said:
gcc.

Nope, your example compiles cleanly without the cast with -Wall -ansi
-pedantic

Chris Torek · Sep 1, 2007

On the larger issue of "write portable code in the first place",
Martin Wells and Craig Gullixson are correct (in my opinion) and
I will not add more than that.

On the specifics of mixing signed and unsigned...

Signed/unsigned numbers have different ranges. Why is it a big deal to
compare these two types of values? Is it because one type can store a
value that does not exist in the other? That's also a problem with
short and long ints. Anyway the solution can be simple, such as
converting the numbers into a type that accommodates both ranges.

Indeed.

I think the "big deal" is that people get confused about the
possible problems. It helps, I think, to take a step or two
back and think about the actual inputs.

Suppose that you have two variables denoted "x" and "y", which
have differing types, but which are otherwise comparable with
relational operators.

The possible range for x is X_MIN to X_MAX, and the possible
range for y is Y_MIN to Y_MAX.

If there is a common type Z, for which numbers in X_MIN to X_MAX
and Y_MIN to Y_MAX always fit within Z_MIN to Z_MAX, then the
C code:

(Z)x < (Z)y

suffices. For example, if x and y are "signed char" and "unsigned
char" respectively, and we can be reasonably sure that INT_MAX meets
or exceeds UCHAR_MAX, then a simple:

(int)x < (int)y

suffices. If x is near SCHAR_MIN, say -125, and y is a value such
as (say) 200, we just get -125 < 200, which is true.
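
As a compilable sketch of that case (the function name `schar_less_uchar` is hypothetical, and the `#if` makes the "reasonably sure" assumption explicit):

```c
#include <assert.h>
#include <limits.h>

/* The common-type approach assumes int can hold every unsigned char value. */
#if UCHAR_MAX > INT_MAX
#error "int cannot represent every unsigned char value on this platform"
#endif

/* Compares in the common type int, so the result is the mathematical one. */
int schar_less_uchar(signed char x, unsigned char y)
{
    return (int)x < (int)y;
}

/* Usage check with the values from the paragraph above. */
void schar_less_uchar_check(void)
{
    assert(schar_less_uchar(-125, 200));   /* -125 < 200 is true */
    assert(!schar_less_uchar(100, 50));    /* 100 < 50 is false */
}
```

For these two particular types the integer promotions would convert both operands to int even without the casts; the casts simply spell out the common type Z.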

If there is no such common type -- for instance, if the type for
x is "signed long long" and the type for y is "unsigned long long"
-- then we have a *slightly* thornier problem. In this particular
case, we must decide whether negative values of "x" are less than
all values of "y". If so:

x < 0 || (unsigned long long)x < y

will do the trick. Even if x is near LLONG_MIN, so that forcing
x to "unsigned long long" produces a number very near ULLONG_MAX,
the first test takes care of the problem.
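
A small sketch of the difference, with hypothetical function names. `converted_less` does explicitly what a bare `x < y` does once the usual arithmetic conversions have turned x into an unsigned long long.

```c
#include <assert.h>
#include <limits.h>

/* Treats every negative x as less than every y, then compares the
   non-negative remainder in unsigned long long. */
int ll_less_ull(long long x, unsigned long long y)
{
    return x < 0 || (unsigned long long)x < y;
}

/* Same result as writing x < y directly: a negative x wraps around to a
   value near ULLONG_MAX before the comparison happens. */
int converted_less(long long x, unsigned long long y)
{
    return (unsigned long long)x < y;
}

/* Usage check: the two functions disagree only for negative x. */
void ll_less_ull_check(void)
{
    assert(ll_less_ull(-1, 0ULL));        /* -1 < 0 mathematically */
    assert(!converted_less(-1, 0ULL));    /* but ULLONG_MAX < 0 is false */
    assert(ll_less_ull(LLONG_MIN, 0ULL));
    assert(ll_less_ull(3, 5ULL) && converted_less(3, 5ULL));
}
```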

Ian Collins · Sep 1, 2007

Mark said:
So? In any typical application, all the above interface-specific stuff
can (and should) be separated from the meat of the programme.

Um, I've just finished a little application to interface to a serial
port through a socket (make it look like an Ethernet to serial adapter).
I think there might be a portable line or two (the argument checking),
but the bulk is target specific. That's not uncommon for system code.

Malcolm McLean · Sep 1, 2007

Ian Collins said:
Um, I've just finished a little application to interface to a serial
port through a socket (make it look like an Ethernet to serial adapter).
I think there might be a portable line or two (the argument checking),
but the bulk is target specific. That's not uncommon for system code.

IO is hardware dependent, the rest is not.
I call subroutines that do IO "procedures" and those that don't "functions".
I know these words are used in other ways by other people.
I give my procedures capital letters, and keep the functions in lower case.
Everything in lower case is portable. Except main(), of course, as Richard
Heathfield once pointed out.
(This system isn't used on the website or book. stdio IO is also in
lowercase, because it is reasonably portable).

Flash Gordon · Sep 1, 2007

Malcolm McLean wrote, On 01/09/07 20:04:

It is also a lot easier to find errors in books than to write one.

It is even harder to write a good book.

Having been through the same process I won't criticise Heathfield too
much. They can creep in during formatting as well as in development and
testing. My book had some errors as well.

I just checked and your book STILL has errors since it has not been
updated. Please use the correct tense. Unless, of course, you are
deliberately trying to mislead.

Richard · Sep 2, 2007

Flash Gordon said:
Malcolm McLean wrote, On 01/09/07 20:04:

It is even harder to write a good book.

You really are quite a nasty person.

I just checked and your book STILL has errors since it has not been
updated. Please use the correct tense. Unless, of course, you are
deliberately trying to mislead.

I suspect he will do it in his good time.

Mark McIntyre · Sep 2, 2007

Um, I've just finished a little application to interface to a serial
port through a socket (make it look like an Ethernet to serial adapter).
I think there might be a portable line or two (the argument checking),
but the bulk is target specific. That's not uncommon for system code.

*shrugs*.
System code is by, um, definition, system-specific. You can't write it
in C. I'm not sure where you're going with this - are you suggesting
that C should include a superset of all possible system-specific
interfaces? If so, feel free to write that library and propose it to
the Committee.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Ian Collins · Sep 2, 2007

Mark said:
*shrugs*.
System code is by, um, definition, system-specific. You can't write it
in C.

Oh it's C all right, it just uses system specific libraries.

There's nothing "not C" about "unsigned s = read( buffer, size);" for a
given definition of read.

Calling system specific functions does not prevent code from being C.

Ed Jensen · Sep 2, 2007

Martin Wells said:
Now you're just preaching about your own incompetence. Sorry to sound
hostile, but it's the truth.

Don't worry about it, Martin. To be honest, I was expecting that kind
of response much sooner. It's just sort of the...personality...of
this newsgroup. Since I've been online since about 1979, I've had
ample time to marvel at this kind of fascinating emergent behavior in
online communities.

There are very few regulars here in comp.lang.c that'll admit that
writing 100% portable C code is non-trivial. People get awfully
religious about strange things, even computer programming languages.
Your religion of choice is C. Hey, that's cool.

Therefore, I knew before I walked down this path, that the response
would ultimately be, "The problem can't possibly be that it's
non-trivial to write 100% portable C code; the problem must be you."
I've seen the denizens of comp.lang.c use this response on several
people. Why should I be immune?

But, instead of pointless and unfounded insults, let's try a real
world test for a change. You paste one or two thousand lines of C
code you've written from your most recent project, and we'll see if
anyone on the newsgroup can identify any code that's not 100% portable
C code.

Since you've made the claim that writing 100% portable C code isn't
just easy, but VERY easy, I'm quite sure you're up to the challenge.
It's time to put your code where your mouth is.

And while we're on the topic, I'd like to present a little poll: Is
there anyone else here that agrees with Martin when he says that
writing 100% portable C code is VERY easy? Keep in mind the question
isn't whether or not it's possible or desirable, just whether or not
it's VERY easy.

Chris Torek · Sep 2, 2007

Ed Jensen said:
There are very few regulars here in comp.lang.c that'll admit that
writing 100% portable C code is non-trivial. ...

Since you've made the claim that writing 100% portable C code isn't
just easy, but [in a later post that is not quoted above] VERY easy ...

... I'd like to present a little poll: Is
there anyone else here that agrees with Martin when he says that
writing 100% portable C code is VERY easy?

I would say "often easy enough, rarely VERY easy", although of
course the precise meaning of "enough" and "VERY" is tough to
pin down.

Furthermore, the easy-ness of portability varies with the goal of
the code. Clearly, something like "calculate mortgage payments"
or "concatenate files specified by argv[] elements" is going to be
easier than "write an operating system" or "do bitmapped graphics":
the latter two *require* at least *some* non-portable code.

The trick in this case is to know when to make the tradeoff -- but
this in turn requires being able to write portable code (or even
"extremely" or "100%" portable code, whatever that may mean)
in the first place. That, of course, requires knowing what is
portable, i.e., at least some degree of study of Standard C. This
is where the comp.lang.c newsgroup comes in: here in comp.lang.c,
you can find out what is "portable", or how to take any given chunk
of code with "not very portable" parts and rewrite it to have large
"portable parts", and thus learn when to make tradeoffs.
