strcpy overlapping memory

Eric Sosman

But with crashing software *nobody* would get paid. And if an application
used "", then the field would be blank anyway.

Consider the consequences. If the program crashes, there's a
panic page in the middle of the night and a team of sleepy programmers
get rousted out of bed to fix the bug and get the checks out. Too bad
for the programmers who'd rather have slept, but they shouldn't have
written the bug that caused the pointer to be NULL in the first place.

Or, the alternative: The program runs, no alarms ring, the checks
print and are distributed. And the first five recipients to notice
their opportunity drain the entire payroll account and vanish to
Vanuatu. Nobody else gets paid because all the checks bounce.

If a blank field is an error, it will be obvious when the field appears
'blank'. Not so obvious when there's a seg fault or whatever.

A crashed program is about a jillion times more "obvious" than a
program that runs silently and produces bad output.

If. Usually a string is a string, you might not do anything different with a
0-length string compared with an N-length one.

What is the temperature in your home's underwater bowling alley?
You seem unable to distinguish between "My home has no underwater
bowling alley" and "Zero Kelvin." Others see a difference.

But it is a pain then to have to check for NULLs (which I tend to do with
functions taking string arguments).

If a function is *supposed* to behave differently with NULL than
it does with "", or differently with "" than with "purple", the tests
are a necessary part of its ordinary operation. How could it be
otherwise?

At the moment I might write:

if (s==NULL) return NULL;
slen = strlen(s);
if (slen==0) return NULL;

And mention in the specs that NULL is an acceptable argument equivalent to
an empty string.

But wouldn't it be useful to be able to omit that first check?

Impossible to say. What is the function's purpose? What should
it return for "purple", and why? Fragments don't define interfaces.
 
tm

     Consider the consequences.  If the program crashes, there's a
panic page in the middle of the night and a team of sleepy programmers
get rousted out of bed to fix the bug and get the checks out.  Too bad
for the programmers who'd rather have slept, but they shouldn't have
written the bug that caused the pointer to be NULL in the first place.

     Or, the alternative: The program runs, no alarms ring, the checks
print and are distributed.  And the first five recipients to notice
their opportunity drain the entire payroll account and vanish to
Vanuatu.  Nobody else gets paid because all the checks bounce.

You can have endless discussions like this. Some people just
fear error messages. The fear is so big that they prefer to
get wrong results.

This fear of error messages exists for compile-time error
messages and for runtime error messages.

I have a totally different view than these fearful people. I am
happy when I get an error message (compile-time or runtime),
since it gives me the opportunity to improve my program. The
error will not go away when I don't hear from it.


Greetings Thomas Mertes

--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 
BartC

Eric Sosman said:
On 12/10/2010 8:56 AM, BartC wrote:

What is the temperature in your home's underwater bowling alley?
You seem unable to distinguish between "My home has no underwater
bowling alley" and "Zero Kelvin." Others see a difference.

Your analogy is not meaningful; better would be the length of such an alley.

Then an alley with a length of 0 metres, might require 0 square metres of
carpet (or gallons of water; whatever). Exactly the same as no alley.

You wouldn't expect a machine that calculates areas of carpet, to blow up
when someone enters 'no bowling alley' into it.
 
Eric Sosman

[...]
You can have endless discussions like this. Some people just
fear error messages. The fear is so big that they prefer to
get wrong results.

This fear from error messages exists for compile-time error
messages and for runtime error messages.

I have a totally different view than this fearful people. I am
happy when I get an error message (compile-time or runtime),
since it gives me the opportunity to improve my program. The
error will not go away when I don't hear from it.

I can't say whether "fear" is the right word, and I can't say
that error messages make me "happy," but I agree wholeheartedly
that an error message is vastly preferable to incorrect output
offered with a pretense of accuracy.

If a program requires that a certain pointer aims at the start
of a string and it happens that the pointer is NULL, this is proof
that the program has already erred. Pretending that NULL is "" does
not solve the problem, does not repair the error. Most likely, it
makes the error harder to find by deferring its discovery until the
back-trail has grown both longer and colder. It's treating a symptom,
not treating the disease -- or, as a long-ago manager of mine used to
say, it's "throwing a blanket over the problem."
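
A minimal sketch of the "report it early" policy described above, as opposed to quietly treating NULL as "". The helper name `payee_line` and its return convention are hypothetical, made up for illustration; nothing like it appears in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one payee line into buf.
 * Returns 0 on success, -1 if name is NULL or the text did not fit.
 * A NULL name is reported as a caller bug instead of being printed
 * as a blank field. An empty name ("") is a valid string and is
 * formatted like any other. */
int payee_line(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && size > 0);   /* contract for the output buffer */

    if (name == NULL) {
        fprintf(stderr, "payee_line: name is NULL (caller bug)\n");
        return -1;
    }

    int n = snprintf(buf, size, "Pay to: %s", name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Whether the right reaction to the -1 is a log entry, an abort, or a page to the on-call programmer is a policy question; the point of the sketch is only that NULL and "" take different paths.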
 
Jorgen Grahn

On Mon, 2010-12-06, JohnF wrote:

[strcpy()
And, like you guys said, I >>am<< surprised such runtime optimizations
work, i.e., that, on average, they save more time than they cost.

I'm surprised too when I look at a typical memcpy() implementation
which chooses between a whole bunch of different strategies before
starting with the actual copying.

But I trust the library programmers; that they did some thinking and
measurements on real-world programs before coming up with that one. In
modern systems hitting memory is expensive, so the gain from copying
16, 32 or 64 bits at a time probably outweighs the cost of the
"detection" phase pretty soon.

It also somewhat tarnishes my picture of C the way it's often
described as a "portable assembly language". In that picture, I'd
kind of hope that strcpy would just assemble to some straightforward
move instruction, along with whatever '\000' end-of-string check
is available in the particular instruction set. If they want
to add optimizations, they could at least reserve them for -O3,
or something like that.

If it makes you feel any better, I'd expect the copy in

void foo(const struct Bar* bar)
{
    struct Bar baz = *bar;
    ...
}

to be well-optimized: just a few 32-bit inlined MOVEs if Bar is
sufficiently small, or else a call to memcpy(). But strcpy() is
trickier; the compiler usually has no idea if the string is 0, 1 or
1 000 000 characters long.

/Jorgen
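
For comparison, here is a sketch of the "straightforward" byte-at-a-time copy that the portable-assembler picture suggests. The name `naive_strcpy` is hypothetical, chosen to avoid clashing with the library function; real library implementations are typically more elaborate, as discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src, including its terminating '\0', to dst one byte at a
 * time and returns dst. The loop cannot know the length in advance:
 * it has to inspect every byte for the terminator, which is what
 * makes strcpy() harder to optimize than a fixed-size struct copy.
 * As with strcpy(), the two arrays must not overlap. */
char *naive_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;   /* the copy happens in the loop condition */
    return dst;
}
```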
 
BartC

Eric Sosman said:
[...]
You can have endless discussions like this. Some people just
fear error messages. The fear is so big that they prefer to
get wrong results.

This fear from error messages exists for compile-time error
messages and for runtime error messages.

I have a totally different view than this fearful people. I am
happy when I get an error message (compile-time or runtime),
since it gives me the opportunity to improve my program. The
error will not go away when I don't hear from it.

I can't say whether "fear" is the right word, and I can't say
that error messages make me "happy," but I agree wholeheartedly
that an error message is vastly preferable to incorrect output
offered with a pretense of accuracy.

If a program requires that a certain pointer aims at the start
of a string and it happens that the pointer is NULL, this is proof
that the program has already erred. Pretending that NULL is "" does
not solve the problem, does not repair the error. Most likely, it
makes the error harder to find by deferring its discovery until the
back-trail has grown both longer and colder. It's treating a symptom,
not treating the disease -- or, as a long-ago manager of mine used to
say, it's "throwing a blanket over the problem."

I make use of many C standard functions from an interpreted language.

But calling C functions directly is fraught with problems: I represent a
string as (pointer,length), where the pointer part is passed to a C
function.

But when length is 0, pointer is usually zero too. This can cause a crash
when calling certain functions with NULL for certain arguments. Such
behaviour is out of place in an interpreted language (and the programmer
can't do much about the implementation of an empty string), so that wrapper
functions have to be used, slowing things down (as often these are
interpreted too).

The choices of using NULL and/or "" for No String, and NULL and/or "" for
Empty String, just makes things fiddlier than they need to be. Having system
functions support NULL without crashing (applying whatever documented
meaning they assume), would have been useful.
 
Eric Sosman

Your analogy is not meaningful; better would be the length of such an
alley.

Then an alley with a length of 0 metres, might require 0 square metres
of carpet (or gallons of water; whatever). Exactly the same as no alley.

You wouldn't expect a machine that calculates areas of carpet, to blow
up when someone enters 'no bowling alley' into it.

I'd hope that the machine would not blow up, true. I'd also hope
it wouldn't give a meaningless answer like "zero." (The answer I'd
*really* like is "Carpet on a bowling alley? Are you mad?")

But let's indulge your odd taste in interior decor and push ahead
just a little. Your program informs you

Amount of carpet needed: 0.00 square cubits

What kind of carpet? Well, how high up the quality ladder can your
allocated budget take you? Divide budget by area

Maximum carpet price: Inf splonders / square cubit

The program then adds in the known industry-standard per-unit price
for underlayment, installation, and so on

Installation, etc: 27.33 splonders / square cubit
Maximum per-area price: Inf splonders / square cubit

.... and multiplies by the area to get a ceiling on the total cost
of the carpeting

Carpet sub-project cost ceiling: NaN splonders

.... and adds that to the already-computed cost of the other sub-projects
for carpentry, painting, bribing the housing inspector, and so on:

GRAND TOTAL PROJECT COST CEILING: NaN splonders

Golly gee, but that program sure has been a big help!

At some point, BartC, at some point in any decision chain *somebody*
has to wake up to the fact that a meaningless question has been asked or
a meaningless answer given. I maintain that the earlier the moment of
recognition occurs, the better. The program *should* have noticed that
there was no bowling alley at all, and skipped all these meaningless
calculations. Instead, it's allowed the zeroes, infinities, and NaN's
of the non-existent alley to poison the entire remodeling project and
make the entire exercise useless, or worse. (What might an unprincipled
carpet supplier do with a purchase order specifying "not to exceed Inf
splonders?")
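
The Inf-to-NaN chain above can be reproduced in a few lines. This is a sketch that assumes IEEE 754 floating-point arithmetic (the common case, but not something the C standard requires of every implementation); the function names and the error convention are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Follows the steps of the carpet example without any checks.
 * With area == 0 and a positive budget, IEEE 754 arithmetic gives
 * budget / area = +Inf, Inf + install_rate = Inf, Inf * 0 = NaN. */
double naive_cost_ceiling(double budget, double area, double install_rate)
{
    double max_price = budget / area;
    double max_rate = max_price + install_rate;
    return max_rate * area;
}

/* Notices the missing alley before doing any arithmetic.
 * Returns 0 and stores the ceiling in *out, or -1 if there is no
 * positive, finite area to carpet. */
int checked_cost_ceiling(double budget, double area, double install_rate,
                         double *out)
{
    assert(out != NULL);

    if (!isfinite(area) || area <= 0.0)
        return -1;
    *out = (budget / area + install_rate) * area;
    return 0;
}
```

The checked version is the "earlier moment of recognition": the caller gets an error it has to handle instead of a NaN that propagates into the grand total.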
 
tm

[...]
You can have endless discussions like this. Some people just
fear error messages. The fear is so big that they prefer to
get wrong results.
This fear from error messages exists for compile-time error
messages and for runtime error messages.
I have a totally different view than this fearful people. I am
happy when I get an error message (compile-time or runtime),
since it gives me the opportunity to improve my program. The
error will not go away when I don't hear from it.
    I can't say whether "fear" is the right word, and I can't say
that error messages make me "happy," but I agree wholeheartedly
that an error message is vastly preferable to incorrect output
offered with a pretense of accuracy.
    If a program requires that a certain pointer aims at the start
of a string and it happens that the pointer is NULL, this is proof
that the program has already erred.  Pretending that NULL is "" does
not solve the problem, does not repair the error.  Most likely, it
makes the error harder to find by deferring its discovery until the
back-trail has grown both longer and colder.  It's treating a symptom,
not treating the disease -- or, as a long-ago manager of mine used to
say, it's "throwing a blanket over the problem."

I make use of many C standard functions from an interpreted language.

But calling C functions directly is fraught with problems: I represent a
string as (pointer,length), where the pointer part is passed to a C
function.

But when length is 0, pointer is usually zero too. This can cause a crash
when calling certain functions with NULL for certain arguments. Such
behaviour is out of place in an interpreted language (and the programmer
can't do much about the implementation of an empty string), so that wrapper
functions have to be used, slowing things down (as often these are
interpreted too).

You use C standard functions directly from your interpreter,
but your wrapper functions need to be interpreted?

That sounds strange. Why don't you write the wrapper
functions in C?

The Seed7 interpreter has wrapper functions written in C for
almost all primitive actions. How can an interpreter directly
call e.g. strcpy without checking that the destination is
allocated and is big enough? Seed7 does not do such things.
Seed7 manages string memory automatically and does other
things that the C standard library functions do not care about.

Please don't tell me that you want to change the C standard
library functions just to speed up your interpreter. That is
just the wrong approach. You get advantages from the fact that
a C standard exists. Your C programs are portable because of
this. When you really want performance compile your language
to C. This is at least the approach I use for Seed7. And it
works great.


Greetings Thomas Mertes

 
James Dow Allen

I'm surprised too when I look at a typical memcpy() implementation ...
...
But I trust the library programmers; that they did some thinking ...

I hate to be Contrary Harry again, but I had an experience circa 1987
that was startling. I was working with a Sun-3 workstation, wrote
my own memcpy() (just a trivial unrolled loop in C), compared it with the
library memcpy() and discovered mine was ... much faster !!?!?!?!??!!

Delving into Sun-3 source code I found that memcpy() was a very
tiny loop in assembly, with a comment clearly stating
"optimized for the [tiny] Sun-2 instruction cache."
Please understand that the Sun-2 was quite obsolescent by then;
indeed the Sun-4 was soon to arrive.

I'm speaking of memcpy() for heaven's sakes! If one is supposed to
remember to optimize *any* function, that function is memcpy()!!!

(I suppose some c.l.c'ers will boast that the best compiler/library
designers weren't even alive in 1987. ;-) But 1987 did seem
high-tech at the time, to those of us who'd been programming
plugboards in the 1960's. :)

James Dow Allen
 
BartC

tm said:
You use C standard functions directly from your interpreter,
but your wrapper functions need to be interpreted?

That sounds strange. Why don't you write the wrapper
functions in C?

For some functions, they *are* written in the same compiled language as the
interpreter itself. That's fine, I'm the implementer.

But in general, somebody writing in my interpreted language does not want to
mess about with modifying and building the interpreter, or writing C code,
which then needs to be placed in a DLL file, which then calls the C function
in turn.

To take a very simple example, I have a function ival(string) => int, based
on C's atoi() function. But I have to write it, in interpreted code, like
this:

function ival(s) =
if s="" then return 0 fi
return atoi(s)
end

Because, if I called atoi() with s="" (i.e. NULL), it would likely crash
(depending on the C compiler and runtime library; gcc seems to be good at
checking for NULLs, and behaving exactly as I've suggested; with other
compilers, the behaviour is unpredictable).

Maybe this function could be 'compiled', by having it as a built-in function
to the interpreter, but you don't really want to do that unless it's
impossible to write otherwise, or is highly speed-critical; the idea is to
do as much as possible in the 'easy' language.
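
For readers wondering what the compiled alternative might look like, here is a sketch of an ival() wrapper written in C. The `struct istr` descriptor is a hypothetical stand-in for the interpreter's (pointer, length) string, which uses (NULL, 0) for the empty string; the real interpreter's types are not shown in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interpreter string: a pointer to a zero-terminated
 * char array plus its length. The empty string is (NULL, 0). */
struct istr {
    const char *ptr;
    size_t len;
};

/* Maps the interpreter's empty-string representation onto a real
 * C string. A NULL pointer with a non-zero length is a genuine bug
 * and is not papered over. */
static const char *istr_c_str(struct istr s)
{
    assert(s.ptr != NULL || s.len == 0);
    return s.ptr != NULL ? s.ptr : "";
}

/* ival(string) => int, built on atoi(). atoi("") returns 0. */
int ival(struct istr s)
{
    return atoi(istr_c_str(s));
}
```

The NULL check happens once, at the boundary between the two string models, so atoi() itself only ever sees a valid C string.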
The Seed7 interpreter has wrapper functions written in C for
almost all primitive actions. How can an interpreter directly
call e.g. strcpy without checking that the destination is
allocated and is big enough? Seed7 does not do such things.
Seed7 manages string memory automatically and does other
things that the C standard library functions do not care about.

Not all C functions are practical to use directly from interpreted code. I
just have the ability to call any DLL function, but it doesn't always make
sense, such as low-level functions that are not needed. One that can work
well is:

printf("Hello, World!\n")

which looks remarkably like its C version. But try this:

s:=""
printf(s) # or printf("%s",s)

and it might crash (depending on whether the msvcrt or crtdll library was in
use). And printf with its variable arguments is awkward to wrap.

Please don't tell me that you want to change the C standard
library functions just to speed up your interpreter. That is
just the wrong approach.

That was just an example of a specific problem where someone is using NULL
to represent an empty string instead of "".

Sometimes NULL is treated sensibly by standard functions, and sometimes it
isn't. It seems that if you're running gcc, as well as the extra check for
NULL in your application, there might be one in the runtime too.

If library functions were to standardise on what happens with NULL
arguments, I think that would be helpful.
 
Keith Thompson

BartC said:
If library functions were to standardise on what happens with NULL
arguments, I think that would be helpful.

I disagree, but let's assume for the moment that you're right.

At best, such a change to the standard might be approved by the
committee for the upcoming C201X standard. It would require a
non-trivial amount of work just to define what each standard function
does with a null pointer argument. Ok, you want strlen(NULL) to return
0. Here are a few other things that would have to be defined:

memcpy(buf, NULL, 0)
memcpy(buf, NULL, 42)
strcpy(s, NULL)
strcpy(NULL, s)
strcmp(NULL, "")
strcmp(NULL, "foo")
strstr(NULL, NULL)
memset(NULL, 0, 0)
memset(NULL, 0, 42)

And what about things other than string functions?
rename("foo.txt", NULL)
fopen(NULL, NULL)
fclose(NULL)
printf("%s", NULL)
sprintf(NULL, "%s", "hello")
fprintf(NULL, NULL)
printf("%n", NULL)

I suppose you could define a coherent set of rules for dealing with
null pointers for all these things, but I'm not convinced it would
be possible to reach universal agreement on what makes sense.

But let's assume the committee agrees with you and makes the
necessary changes to the C201X standard. How long do you think it
will be before you can assume that any implementation you're using
follows the new behavior? I'd be astonished if the answer were as
low as a decade.
 
Eric Sosman

Eric Sosman said:
[...]
You can have endless discussions like this. Some people just
fear error messages. The fear is so big that they prefer to
get wrong results.

This fear from error messages exists for compile-time error
messages and for runtime error messages.

I have a totally different view than this fearful people. I am
happy when I get an error message (compile-time or runtime),
since it gives me the opportunity to improve my program. The
error will not go away when I don't hear from it.

I can't say whether "fear" is the right word, and I can't say
that error messages make me "happy," but I agree wholeheartedly
that an error message is vastly preferable to incorrect output
offered with a pretense of accuracy.

If a program requires that a certain pointer aims at the start
of a string and it happens that the pointer is NULL, this is proof
that the program has already erred. Pretending that NULL is "" does
not solve the problem, does not repair the error. Most likely, it
makes the error harder to find by deferring its discovery until the
back-trail has grown both longer and colder. It's treating a symptom,
not treating the disease -- or, as a long-ago manager of mine used to
say, it's "throwing a blanket over the problem."

I make use of many C standard functions from an interpreted language.

But calling C functions directly is fraught with problems: I represent a
string as (pointer,length), where the pointer part is passed to a C
function.

But when length is 0, pointer is usually zero too. This can cause a crash
when calling certain functions with NULL for certain arguments.

Sounds to me like a bug in the interpreter. Probably not a bug
in the implementation as such, but in the design: (pointer, length)
doesn't translate easily to C's notion of "string" at all, so the
interpreter shouldn't be trying to pretend that those are "strings."
[...] so that wrapper
functions have to be used, slowing things down (as often these are
interpreted too).

Oh, puh-leeze! You've got a buggy interpreter design, and you're
worried about running its bugs too slowly? Oh, puh-leeze!

The choices of using NULL and/or "" for No String, and NULL and/or "" for
Empty String, just makes things fiddlier than they need to be. Having system
functions support NULL without crashing (applying whatever documented
meaning they assume), would have been useful.

You want to abolish the distinction between No String and Empty
String? You want strcmp(getenv("NO_SUCH_VARIABLE"), "") to yield 0?
You want strcmp(strchr("abcd", 'x'), "") to yield 0? If you can't
perceive the difference between No String and Empty String, I bet you
had trouble understanding the difference between "The Emperor wears
transparent clothing" and "The Emperor is nekkid as a jaybird."

I give up. Most people seem to see utility in being able to tell
the difference between "No mountain is both taller than Everest and
shorter than Fuji" and "The mountain taller than Everest and shorter
than Fuji has height zero." If you don't grasp why this is a useful
notion, I fear I can't explain it to you.
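
A small sketch of the distinction being argued here, using the getenv() case from the post. The enum and the function names are hypothetical; the only library fact relied on is that getenv() returns a null pointer when the variable is not found.

```c
#include <assert.h>
#include <stdlib.h>

/* Three different answers that "NULL means empty" would collapse
 * into two. */
enum var_state { VAR_UNSET, VAR_EMPTY, VAR_HAS_VALUE };

/* Classifies a value as returned by getenv(): NULL means the
 * variable does not exist, "" means it exists and is empty. */
enum var_state classify_value(const char *value)
{
    if (value == NULL)
        return VAR_UNSET;
    return value[0] == '\0' ? VAR_EMPTY : VAR_HAS_VALUE;
}

/* Convenience wrapper; name must be a valid string. */
enum var_state env_state(const char *name)
{
    assert(name != NULL);
    return classify_value(getenv(name));
}
```

A caller can then write `env_state("NO_SUCH_VARIABLE") == VAR_UNSET` and fall back to a default, which is a different decision from the one it would make for a variable deliberately set to the empty string.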
 
BartC

Keith Thompson said:
I disagree, but let's assume for the moment that you're right.

At best, such a change to the standard might be approved by the
committee for the upcoming C201X standard. It would require a
non-trivial amount of work just to define what each standard function
does with a null pointer argument. Ok, you want strlen(NULL) to return
0. Here are a few other things that would have to be defined:

Some of these don't deal with strings (ie. zero-terminated sequences of
chars).

Where a string argument is needed as a destination, then a non-NULL argument
is still needed for it to work properly. If NULL is supplied, then it cannot
be written to. (What happens now, is it undefined?)

A few of these might be controversial, for example whether strcmp(NULL,"")
will match the strings or not (I would say Yes).
memcpy(buf, NULL, 0)
memcpy(buf, NULL, 42)

Not string functions. But, no copy is performed as no source data is
provided. (This must be what it does at the minute; surely these will not
crash with null arguments?)
strcpy(s, NULL)

One of the controversial ones; should s be set to "", or left unmodified?
strcpy(NULL, s)

No copy.
strcmp(NULL, "")

Return 0.
strcmp(NULL, "foo")

Return -1.
strstr(NULL, NULL)

Return NULL (perhaps "" is better, but NULL is OK provided other functions
now deal with NULL too)
memset(NULL, 0, 0)
memset(NULL, 0, 42)

Not string functions, but again, no action is performed as no address is
provided. (NULL works better as "" when it is an input, rather than an
output, which needs a tangible address to work with.)

And what about things other than string functions?
rename("foo.txt", NULL)

Since rename("foo.txt","") is not useful (and probably a no-op), then this
can be a no-op too.
fopen(NULL, NULL)

Return NULL (as the operation will fail).
fclose(NULL)

This is nothing to do with strings. But no reason why it cannot be a no-op
(I believe this can crash at the moment).
printf("%s", NULL)

This sometimes prints "(null)". If NULL already has an expected behaviour
here, perhaps it should be left alone.
sprintf(NULL, "%s", "hello")

No destination, so no action.
fprintf(NULL, NULL)

Error, because no file handle provided. Probably it can just do nothing.
printf("%n", NULL)

That's a weird one; I've never come across %n before. Obviously, it should
just do nothing, as no destination is provided.
I suppose you could define a coherent set of rules for dealing with
null pointers for all these things, but I'm not convinced it would
be possible to reach universal agreement on what makes sense.

Many are obvious. Some not so clear, and some debatable. What seems obvious
(to me anyway), is that library functions shouldn't cause a crash just
because they have been deliberately passed a null pointer, that the function
hasn't checked for. (We're not talking about invalid pointers, just ones
with a null value.)
But let's assume the committee agrees with you and makes the
necessary changes to the C201X standard. How long do think it
will be before you can assume that any implementation you're using
follows the new behavior? I'd be astonished if the answer were as
low as a decade.

Originally I was just expressing an opinion on how NULL strings might have
been handled.

But if this change was made, it might well be a long time before
conventional NULL checking could be omitted from code. Some advantages can
be realised sooner by writing conditional code (#ifdef C201X..), so that
generated code is shorter, but would mean even more clutter than now
(checking for NULL *and* C201X...)

The only practical workaround would be to use different names for the
functions:

strlen_1x() would accept NULL strings, and might be implemented as a wrapper
for strlen(); or be implemented directly, and strlen() is an alias for it.
Eventually strlen() could be used by itself.
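
A sketch of what the wrapper variant might look like. Only the name strlen_1x() comes from the post; strcmp_1x() is an additional hypothetical name, following the rule proposed earlier in this reply that a NULL argument compares as "". Neither function exists in any C standard or library that I know of.

```c
#include <string.h>

/* NULL-tolerant strlen(): a null pointer counts as the empty string. */
size_t strlen_1x(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/* NULL-tolerant strcmp(): a null pointer compares as "".
 * Like strcmp(), only the sign of the result is meaningful. */
int strcmp_1x(const char *a, const char *b)
{
    return strcmp(a != NULL ? a : "", b != NULL ? b : "");
}
```

Note that these wrappers bake in one particular answer to the "controversial" cases; a caller who needs to tell NULL from "" still has to test for NULL before calling them.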
 
BartC

Eric Sosman said:
On 12/11/2010 10:10 AM, BartC wrote:

Sounds to me like a bug in the interpreter. Probably not a bug
in the implementation as such, but in the design: (pointer, length)
doesn't translate easily to C's notion of "string" at all, so the
interpreter shouldn't be trying to pretend that those are "strings."

In that design, the pointer is a pointer to a zero-terminated char array,
just like C. But because I also have a length, I've chosen to use (null,0)
for empty strings.

That could also have been a viable choice in a C program.

(Why didn't I just stick in a pointer to a dummy zero byte somewhere? I
thought that was the way it did work, then investigated why it was crashing,
and realised it didn't! But a zero pointer for empty data mirrors the way
other data structures are handled; an array has 0 elements, therefore its
pointer is 0 -- where would it point to otherwise?

And I use a different scheme for No String, by having a tag saying whether
something is a string, or something else.)
You want to abolish the distinction between No String and Empty
String? You want strcmp(getenv("NO_SUCH_VARIABLE"), "") to yield 0?
You want strcmp(strchr("abcd", 'x'), "") to yield 0? If you can't
perceive the difference between No String and Empty String, I bet you
had trouble understanding the difference between "The Emperor wears
transparent clothing" and "The Emperor is nekkid as a jaybird."

Except that many C library functions don't seem to recognise the concept of
No String (as personified by a NULL argument where a char* one is expected);
they assume it IS a pointer to a string, empty or otherwise.
 
Mark Wooding

Keith Thompson said:
Here are a few other things that would have to be defined:

memcpy(buf, NULL, 0)
memset(NULL, 0, 0)

I certainly think the world would be a better place if these had defined
(trivial) behaviour -- if just to be consistent with the permission
granted to malloc to return a null pointer when a zero-sized block is
requested. Similarly for memcpy(NULL, buf, 0), memmove, and maybe
memchr and so on. Note that -- unlike the str* functions -- this isn't
introducing some new magical meaning for a null pointer; rather, it's
saying that the mem* functions should behave as if they only access the
areas of memory specified by their arguments.
memcpy(buf, NULL, 42)
strcpy(s, NULL)

And so on should continue to be undefined, I think.

-- [mdw]
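
Under C99 and C11 the pointer arguments to memcpy() must be valid even when the count is zero, so portable code that may hold a null pointer for an empty block has to guard the call itself. A minimal sketch of such a guard follows; the name `memcpy_z` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* memcpy() variant with the trivial zero-length behaviour discussed
 * above: when n is 0 nothing is accessed, so null pointers are
 * acceptable. For n > 0 a null pointer remains a caller bug. */
void *memcpy_z(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;                     /* touch no memory at all */

    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

This matches the "behave as if they only access the areas of memory specified by their arguments" reading: memcpy_z(buf, NULL, 0) is a no-op, while memcpy_z(buf, NULL, 42) is still an error.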
 
JohnF

Jorgen Grahn said:
JohnF wrote:

[strcpy()
And, like you guys said, I >>am<< surprised such runtime optimizations
work, i.e., that, on average, they save more time than they cost.

I'm surprised too when I look at a typical memcpy() implementation
which chooses between a whole bunch of different strategies before
starting with the actual copying.
But I trust the library programmers; that they did some thinking and
measurements on real-world programs before coming up with that one. In
modern systems hitting memory is expensive, so the gain a lot by
copying 16, 32 or 64 bits at a time probably shadows the cost of the
"detection" phase pretty soon.

Well, you subtly changed the discussion from strcpy() to memcpy().
As I pointed out to JDA in a preceding followup, strcpy() is mostly
used just to parse input, format output, that kind of stuff, i.e.,
not usually in the middle of computationally intensive tasks.
But I've often used memcpy() in contexts where a whole lot of bytes
would get moved during the course of execution.
So I'm more inclined to accept memcpy() optimizations as worthwhile,
whereas even if a strcpy() optimization saves 99.9% of the time it uses,
that'll still represent only, like, 0.000001% of the program's total
execution time. Seems penny wise, pound foolish (i.e., the compiler
implementers could probably better spend their own time optimizing
other things).
If it makes you feel any better, I'd expect the copy in
void foo(const struct Bar* bar) {
struct Bar baz = *bar;
... }
to be well-optimized: just a few 32-bit inlined MOVEs if Bar is
sufficiently small, or else a call to memcpy(). But strcpy() is
trickier; the compiler usually has no idea if the string is 0, 1 or
1 000 000 characters long.

Well, I wouldn't really have many expectations (based on viewing
C as a portable assembler) about how struct Bar baz = *bar; would
compile. That's clearly a higher level construct to begin with.
But memcpy() has a clear correspondence to the underlying instruction
set of any processor I've ever seen. And I'd prefer that to compile
in a way which respects expectations based on that correspondence,
e.g., if the compiler insists on optimizing memcpy, it should at least
be done in a way so that memcpy(s,s+k,n) (or strcpy(s,s+k)) works as
expected on that kind of overlap.
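
For the overlapping case itself, the standard leaves memcpy(s, s+k, n) and strcpy(s, s+k) undefined when the objects overlap, and provides memmove() for exactly this situation. A minimal sketch, with `drop_prefix` as a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Deletes the first k characters of s in place and returns s.
 * Source and destination overlap, so memmove() is used: unlike
 * memcpy() and strcpy(), it is defined for overlapping objects. */
char *drop_prefix(char *s, size_t k)
{
    assert(s != NULL);

    size_t len = strlen(s);
    if (k > len)
        k = len;                        /* never read past the terminator */
    memmove(s, s + k, len - k + 1);     /* +1 moves the '\0' as well */
    return s;
}
```

This costs one strlen() pass compared with a hand-rolled strcpy(s, s+k), but it does not depend on how a particular library happens to implement the copy.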
 
J. J. Farrell

Keith said:
Only because SGI *and everyone else in the world* didn't do it the way
HPUX did, and because the HPUX behavior isn't guaranteed by the
standard.

No, a great many other OSes and systems did it that way; I suspect most
commercial UNIXes before SVR4, for example, and I've heard various DEC
machines. I'd guess most of the systems on which C was used in its
earlier days had this "feature". UNIX System V utilities in the 80s
frequently had the assumption that a NULL pointer represented an empty
string, implemented by the NULL pointer having value 0, and a page of
zeroes being mapped at 0.
 
Eric Sosman

In that design, the pointer is a pointer to a zero-terminated char array,
just like C. But because I also have a length, I've chosen to use (null,0)
for empty strings.

Consider the string "abcd", in your interpreter (ptr to 'a', 4).
To make a C string out of this, you've got to tack on a '\0' after
the payload characters; the `4' doesn't translate to C. So already
there's a disconnect that you must paper over.

Now take the substring consisting of the first two characters of
the string mentioned above. Representation: (ptr to 'a', 2). Except
that in order to translate this to a C string you need to put a '\0'
where the 'c' is; that is, you need to make a copy somewhere else.
Again, you're doing extra work to deal with the disconnect.

Now consider storing '\0' where the 'c' is. Your descriptor is
still (ptr to 'a', 4), and you can still arrange to append a '\0' after
the four-char payload, but the C string will not be (*can* not be)
"ab\0d", no matter how hard you try. You're papering over the gap and
not quite meeting both edges thereof.
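
To make those three cases concrete, here is a minimal sketch of the conversion a start-and-length descriptor needs before it can be handed to a C string function. The struct strview type and strview_to_cstring() are hypothetical names chosen for illustration, not anything from the interpreter under discussion:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string descriptor: start pointer plus length. */
struct strview {
    const char *ptr;
    size_t      len;
};

/* Return a freshly allocated NUL-terminated copy of v, or NULL if
   allocation fails. The caller frees the result. */
char *strview_to_cstring(struct strview v)
{
    char *out;
    assert(v.ptr != NULL || v.len == 0);  /* (NULL, 0) is the only null form */
    out = malloc(v.len + 1);
    if (out == NULL)
        return NULL;
    if (v.len > 0)
        memcpy(out, v.ptr, v.len);        /* payload; may contain '\0' bytes */
    out[v.len] = '\0';                    /* the terminator C functions need */
    return out;
}
```

The substring (ptr to 'a', 2) comes out as "ab", but only by copying. A four-character payload with '\0' in the third position is copied in full, yet strlen() on the result still reports 2, because C string functions stop at the first '\0'.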

In short, C's strings are *not* suitable for a start-and-length
style. There's been some hot-headed discussion of this, and I do not
wish to re-ignite it; let us simply take it as read that C strings as
they are -- regardless of what they "should" be -- just aren't adequate
vehicles for a start-and-length representation. If your language thinks
of a string as "The *many* characters beginning *here*," it is a mistake
to attempt to map this notion onto C strings.

And no amount of "NULL is empty string," even if you were to get the
requisite legislation through Congress, would alter the situation. Even
*with* your longed-for change, the mapping from start-and-length to
C-strings-or-NULLs *still* wouldn't work. Design thinko.
That could also have been a viable choice in a C program.

(Why didn't I just stick in a pointer to a dummy zero byte somewhere? I
thought that was the way it did work, then investigated why it was crashing,
and realised it didn't! But a zero pointer for empty data mirrors the way
other data structures are handled; an array has 0 elements, therefore its
pointer is 0 -- where would it point to otherwise?

You have an array of zero elements? In a C program? How did
you manage that? Was this before or after you squared the circle?
 

BartC

Eric Sosman said:
On 12/11/2010 5:53 PM, BartC wrote:

Consider the string "abcd", in your interpreter (ptr to 'a', 4).
To make a C string out of this, you've got to tack on a '\0' after
the payload characters; the `4' doesn't translate to C. So already
there's a disconnect that you must paper over.

I've already said (see above) these strings are *already* zero-terminated.
Now take the substring consisting of the first two characters of
the string mentioned above. Representation: (ptr to 'a', 2). Except
that in order to translate this to a C string you need to put a '\0'
where the 'c' is; that is, you need to make a copy somewhere else.
Again, you're doing extra work to deal with the disconnect.

Embedded substrings cannot be zero-terminated. However, in this code
fragment from the language (s.[1..2] is leftmost 2 characters), which does a
direct call to C's printf():

s:="abcd"

printf("%s %s\n", s.[1..2], s)

actual output:

ab abcd

it does appear to work (not sure how it does it, I'm not going to delve
into it now).
In short, C's strings are *not* suitable for a start-and-length
style. There's been some hot-headed discussion of this, and I do not
wish to re-ignite it; let us simply take it as read that C strings as
they are -- regardless of what they "should" be -- just aren't adequate
vehicles for a start-and-length representation. If your language thinks
of a string as "The *many* characters beginning *here*," it is a mistake
to attempt to map this notion onto C strings.

In a newer design, I've done away with the zero-terminator. Internally,
interfaces to foreign functions will sort out whatever is needed, but it is
not a big deal. For example, for non-embedded strings, spare memory
allocation means that in at least 95% of cases, there is space to insert a
zero byte.

And printf()-family functions, for example, can work happily with
counted strings anyway:

printf("%.*s", length, ptr); # represents a call in the runtime
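
On the C side, that call might be wrapped roughly as follows. This is only a sketch: format_counted() is a hypothetical name, and snprintf() is used instead of printf() so the result can be inspected.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Format a counted (not necessarily NUL-terminated) string into buf.
   "%.*s" takes its precision as an int, so the length is checked
   against INT_MAX before the cast. Returns snprintf()'s result. */
int format_counted(char *buf, size_t bufsize, const char *ptr, size_t len)
{
    assert(ptr != NULL);     /* %s requires a valid pointer, even for len 0 */
    assert(len <= INT_MAX);
    return snprintf(buf, bufsize, "%.*s", (int)len, ptr);
}
```

The precision caps how many bytes are read, so the source does not need a terminator as long as it really holds len bytes. Output still stops early at an embedded '\0', and an empty string has to be passed as a valid pointer with length 0 rather than as (null, 0).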

For direct calls in the language itself (which will now be native code) to C
or Windows functions expecting zero-terminated strings, probably internal
mechanisms will help out here too, otherwise it might mean writing:

s:="Hello, World!\n"
printf(s + "\z")

or:

printf(cstring(s))

or some other ugly workaround.
And no amount of "NULL is empty string," even if you were to get the
requisite legislation through Congress, would alter the situation. Even
*with* your longed-for change, the mapping from start-and-length to
C-strings-or-NULLs *still* wouldn't work. Design thinko.

No. You have C (or its runtime) on one hand, with its nul-terminated
strings. And *any* language on the other, with *any* implementation of
strings it likes.

And there's a bit in the middle, e.g. the language runtime, which will do
whatever it takes to make it work.
You have an array of zero elements? In a C program? How did
you manage that? Was this before or after you squared the circle?

It was just after I wrote this:

#include <stdio.h>
#include <stdlib.h>

/* Print length ints starting at p; p is never dereferenced when length is 0. */
void print_intarray(int *p, int length) {
    printf("(");
    while (length--) {
        printf("%d", *p++);
        if (length) printf(", ");
    }
    printf(")\n");
}

int main(void) {
    int *array = 0;
    int array_length = 0;

    print_intarray(array, array_length);

    array = malloc((array_length = 4) * sizeof *array); /* assume success */
    array[0] = 1000;
    array[1] = 1111;
    array[2] = 1222;
    array[3] = 1333;

    print_intarray(array, array_length);

    free(array);
    return 0;
}

The array is zero length at the time of the first print_intarray() call.
 

Eric Sosman

Eric Sosman said:
On 12/11/2010 5:53 PM, BartC wrote:
[...] an array has 0 elements, therefore its
pointer is 0 -- where would it point to otherwise?

You have an array of zero elements? In a C program? How did
you manage that? Was this before or after you squared the circle?

It was just after I wrote this:
[...]
int *array=0;
[...]
The array is zero length at the time of the first print_intarray() call.

"The array?" What array? Perhaps you should read Section 6
of the FAQ. (If you've already read it, re-read it. This time,
read for comprehension.)
 
