strndup: RFC

Richard Heathfield

jacob navia said:

Since there isn't in C a portable way to determine if a memory
block is the result of malloc() we are screwed...

There is actually such a way. It's called "programming". When we call
malloc, obviously we know we're calling malloc, so we're in an ideal
position to record the fact that this particular pointer was returned by
malloc. The fact that there isn't some magical "is_malloc" function in ISO
C doesn't mean that there is no portable way to achieve what we want.

It's the same as with block sizes - "if you need to know this, well, at one
point in the program you *do* know it, so the answer is simply: DON'T
FORGET".
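
A minimal sketch of the "don't forget" idea, not anyone's actual code from this thread: the struct and the function names (buffer_alloc, buffer_wrap, buffer_release) are hypothetical. The program records the size and the origin of a block at the one point where it knows both.

```c
#include <stdlib.h>

/* Hypothetical record: the program remembers what it allocated. */
struct buffer {
    char  *data;        /* NULL if nothing is held */
    size_t size;        /* number of bytes available at data */
    int    from_malloc; /* nonzero only if data came from malloc */
};

/* Allocate n bytes and record both facts while we still know them.
   Returns 1 on success, 0 on failure. */
int buffer_alloc(struct buffer *b, size_t n)
{
    b->data = malloc(n);
    b->size = (b->data != NULL) ? n : 0;
    b->from_malloc = (b->data != NULL);
    return b->data != NULL;
}

/* Wrap storage we did not allocate (an array, say); never freed by us. */
void buffer_wrap(struct buffer *b, char *storage, size_t n)
{
    b->data = storage;
    b->size = n;
    b->from_malloc = 0;
}

/* Free only what the record says came from malloc. */
void buffer_release(struct buffer *b)
{
    if (b->from_malloc)
        free(b->data);
    b->data = NULL;
    b->size = 0;
    b->from_malloc = 0;
}
```

No "is_malloc" query is needed: buffer_release consults the record instead of inspecting the pointer.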
 
Richard Heathfield

Richard Tobin said:

Um, you seem to have mixed up strndup and strncpy here.

Mea culpa.

But looking at my own posting, I seem to have done the same thing in the
last sentence.

Youa culpa!

If you know the string is null-terminated but don't know
how big it is, just use strdup.

If you don't know how big it is, you might want to find out before calling
strdup. (I'm thinking of possible Denial of Memory attacks.)

I can't imagine why you'd use strndup
unless you had a counted string or wanted to just copy a prefix of the
string.

Yes - splitting up a CSV line into tokens, perhaps (where you've answered
the "how long" question with something like strchr and ptr arithmetic).
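
A rough sketch of that CSV case, assuming a hypothetical dup_prefix() as a stand-in for strndup (which is not part of ISO C as discussed here) and a hypothetical first_field() helper. The "how long" question is answered with strchr and pointer subtraction, then only that prefix is copied.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strndup: copy at most n characters of s
   into newly allocated storage. Returns NULL if malloc fails. */
char *dup_prefix(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    while (len < n && s[len] != '\0')
        len++;
    copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Return a malloc'd copy of the first comma-separated field of line.
   *rest is set to the text after the comma, or NULL if there is none.
   The caller frees the result. */
char *first_field(const char *line, const char **rest)
{
    const char *comma = strchr(line, ',');
    size_t len = comma ? (size_t)(comma - line) : strlen(line);

    *rest = comma ? comma + 1 : NULL;
    return dup_prefix(line, len);
}
```

Calling first_field repeatedly on *rest walks the line one field at a time; quoting and escaping rules of real CSV are deliberately ignored here.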
 
Barry Schwarz

Joe Wright said:
jacob navia wrote:
[ snip ]
stdlib.h defines
calloc(size_t,size_t)
malloc(size_t)
qsort(void *,size_t,size_t,int (*)(...etc));
realloc(size_t);
and many others, so I do not see how size_t could be unknown after
including stdlib.h...
Obviously in other implementations they could have defined size_t
several times in several files.

Lazy? Headers tend to declare rather than define. All four of your
examples fail as prototypes.

How so? A prototype is a function declaration that declares the types
(not necessarily the names) of its parameters. Only the definition
needs the parameter names.

All the samples seem to be missing the return type.


 
CBFalconer

Richard said:
Richard Tobin said:

Mea culpa.

Youa culpa!

We'alla culpa!

.... snip ...
Yes - splitting up a CSV line into tokens, perhaps (where you've
answered the "how long" question with something like strchr and
ptr arithmetic).

Which is basically my function 'toksplit', which can be thought of
as a combination of strchr and strncpy, plus fol-de-rol:

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
    if (src) {
        while (' ' == *src) src++;

        while (*src && (tokchar != *src)) {
            if (lgh) {
                *token++ = *src;
                --lgh;
            }
            src++;
        }
        if (*src && (tokchar == *src)) src++;
    }
    *token = '\0';
    return src;
} /* toksplit */
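
For illustration only, here is one way the function above might be driven. toksplit is repeated so the block compiles on its own; count_fields is a hypothetical helper, and the 15-character field limit is an arbitrary assumption.

```c
#include <stddef.h>

/* toksplit as posted above, repeated so this sketch is self-contained. */
const char *toksplit(const char *src, char tokchar,
                     char *token, size_t lgh)
{
    if (src) {
        while (' ' == *src) src++;          /* skip leading blanks */

        while (*src && (tokchar != *src)) {
            if (lgh) {                      /* copy while room remains */
                *token++ = *src;
                --lgh;
            }
            src++;                          /* excess is silently dropped */
        }
        if (*src && (tokchar == *src)) src++;
    }
    *token = '\0';
    return src;
}

/* Hypothetical helper: count the fields in a delimited line, letting
   toksplit truncate each field to 15 characters. */
size_t count_fields(const char *line, char delim)
{
    char field[16];
    size_t n = 0;
    const char *p = line;

    while (p != NULL && *p != '\0') {
        p = toksplit(p, delim, field, sizeof field - 1);
        n++;
    }
    return n;
}
```

Note that the token buffer must hold lgh + 1 characters, and that a trailing delimiter does not produce a final empty field with this loop.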
 
CBFalconer

Stan said:
Why can't you test the arguments for valid conditions and set errno
to EINVAL when one of the condition fails, and return NULL. Now,
if the validations are good and the memory allocation fails errno
will be set to ENOMEM. Now you know why you failed and can get the
error message with strerror().

In part because I disapprove of global variables.
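
One possible alternative to errno, sketched under assumptions: the enum, its constant names, and dup_checked() are all hypothetical. The reason for failure comes back as the return value and the new string through an out-parameter, so no global is involved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, returned instead of being stored in errno. */
enum dup_status { DUP_OK, DUP_BAD_ARG, DUP_NO_MEMORY };

/* Copy at most max characters of src into new storage, stored in *out.
   On any failure *out is NULL (when out itself is usable). */
enum dup_status dup_checked(const char *src, size_t max, char **out)
{
    size_t len = 0;

    if (out == NULL)
        return DUP_BAD_ARG;
    *out = NULL;
    if (src == NULL)
        return DUP_BAD_ARG;

    while (len < max && src[len] != '\0')
        len++;

    *out = malloc(len + 1);
    if (*out == NULL)
        return DUP_NO_MEMORY;

    memcpy(*out, src, len);
    (*out)[len] = '\0';
    return DUP_OK;
}
```

The cost is a clumsier call site than a function that simply returns the pointer.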
 
Richard Heathfield

Barry Schwarz said:
Joe Wright said:
jacob navia wrote:
[ snip ]
stdlib.h defines
calloc(size_t,size_t)
malloc(size_t)
qsort(void *,size_t,size_t,int (*)(...etc));
realloc(size_t);
and many others, so I do not see how size_t could be unknown after
including stdlib.h...
Obviously in other implementations they could have defined size_t
several times in several files.

Lazy? Headers tend to declare rather than define. All four of your
examples fail as prototypes.

How so? A prototype is a function declaration that declares the types
(not necessarily the names) of its parameters. Only the definition
needs the parameter names.

All the samples seem to be missing the return type.

Guys, guys, what's the matter with you? Mr Navia was just showing how often
size_t crops up in <stdlib.h>, for heaven's sake! He has a perfectly valid
point, which is borne out by the Standard. If you want <stdlib.h> you know
where to find it.
 
Richard Tobin

If you know the string is null-terminated but don't know
how big it is, just use strdup.

If you don't know how big it is, you might want to find out before calling
strdup. (I'm thinking of possible Denial of Memory attacks.)

It wouldn't be a very *good* denial of memory attack. strdup only
produces a string as big as one you've already got. You've already
made the mistake by reading in or creating the existing string.

It's not impossible for it to be a real consideration I suppose. You
might not want to call strdup inside your operating system kernel on a
string passed in by the user. But strndup is probably not the place
to fix the problem in that case either.

-- Richard
 
Roland Pibinger

I've written code like that, yes, though more often with <OT>open() /
close()</OT> than fopen() / fclose(). It's also pretty standard when
working with <OT>sockets</OT>; because they require so much f'ing work
to create, people tend to put all that in another function to avoid
clutter.

IMO, there is no problem when you write your own symmetric open
(create, connect, ...) and close (cleanup, disconnect, ...) functions,
e.g.

Handle h = my_open (...);
// ...
my_close (h);

This whole "who deallocates the returned string" argument is one of the
largest problems I have with C; yes, you can always find the correct
answer if you look at the function specs (assuming they exist), but it's
not obvious and is therefore prone to errors.

So just don't do it, i.e. don't return a string that has to be freed
by the caller. Even the Windows API uses that convention, AFAIK.

Best wishes,
Roland Pibinger
 
stmilam

Richard said:
Stan Milam said:



Except that neither EINVAL nor ENOMEM is defined by the Standard.

Ah, another glaring deficiency of the standard. Every implementation I
have used for over 20 years has had EINVAL and ENOMEM defined.

Regards,
Stan Milam.
 
Richard Heathfield

(e-mail address removed) said:
Ah, another glaring deficiency of the standard. Every implementation I
have used for over 20 years has had EINVAL and ENOMEM defined.

I don't see why that means the Standard is deficient. By the same reasoning,
someone who has only ever used Turbo C 2.0 could claim that the Standard is
glaringly deficient in its omission of initgraph() from the library section
despite its being present in every implementation that person has used for
over 20 years - but this would not say so much about the Standard as it
would about the person making the claim!
 
Keith Thompson

Richard Heathfield said:
(e-mail address removed) said:
Richard Heathfield wrote: [...]
Except that neither EINVAL nor ENOMEM is defined by the Standard.

Ah, another glaring deficiency of the standard. Every implementation I
have used for over 20 years has had EINVAL and ENOMEM defined.

I don't see why that means the Standard is deficient. By the same reasoning,
someone who has only ever used Turbo C 2.0 could claim that the Standard is
glaringly deficient in its omission of initgraph() from the library section
despite its being present in every implementation that person has used for
over 20 years - but this would not say so much about the Standard as it
would about the person making the claim!

Yes, but EINVAL ("Invalid argument") and ENOMEM ("Not enough space")
are much more generic, perhaps to the point of being universal, than
something like initgraph().

The only E* macros defined by the standard in <errno.h> are EDOM,
EILSEQ, and ERANGE. I don't see the lack of EINVAL and ENOMEM as
"glaring deficiencies", but it would have been perfectly reasonable to
include them.

On the other hand, standardizing more E* macros would encourage the
use of the ugly errno mechanism. On the other other hand, it would be
nice if the standard offered something better. On the other**3 hand,
defining a decent error handling mechanism for C without turning it
into C++ would be non-trivial.
 
jacob navia

Richard said:
jacob navia said:




There is actually such a way. It's called "programming". When we call
malloc, obviously we know we're calling malloc, so we're in an ideal
position to record the fact that this particular pointer was returned by
malloc. The fact that there isn't some magical "is_malloc" function in ISO
C doesn't mean that there is no portable way to achieve what we want.

It's the same as with block sizes - "if you need to know this, well, at one
point in the program you *do* know it, so the answer is simply: DON'T
FORGET".

Please....

You really mean that for users to use this getline()
function they should set up a complicated system of maintaining a list
of memory allocated with their sizes???
 
Richard Heathfield

jacob navia said:
Please....

You really mean that for users to use this getline()
function they should set up a complicated system of maintaining a list
of memory allocated with their sizes???

No, I only mean that it is possible for a program to keep track of such
things and, if it is *necessary*, then of course it should be done. If
getline() makes memory management too much of a pain for a particular task,
well, there is no shortage of alternatives.
 
Stephen Sprunk

Roland Pibinger said:
IMO, there is no problem when you write your own symmetric open
(create, connect, ...) and close (cleanup, disconnect, ...) functions,
e.g.

Handle h = my_open (...);
// ...
my_close (h);

Except that, in the case of <OT>sockets</OT>, all you need is a simple
close(fd) to get rid of them. I suppose I could create a my_close(fd)
function, but in reality all it'd do is close(fd), and people would end
up not using it because it'd add overhead.

So just don't do it, i.e. don't return a string that has to be freed
by the caller. Even the Windows API uses that convention, AFAIK.

.... and the result is that any time you call a huge number of
<OT>Windows API</OT> functions, you have to call them twice: once with a
NULL argument to find out how big of a buffer to create, and again after
you've created the buffer to actually get the data you wanted. That's
not so great an idea when you consider the overhead of having to do all
your syscalls twice, and it leads to people taking shortcuts that end up
biting them later.
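
The same two-call shape can be shown in standard C, since C99 snprintf reports the length it would have written when given a null buffer and a size of zero. This is a sketch only: get_greeting() and greeting_alloc() are hypothetical names, not Windows API functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical API in the caller-supplies-the-buffer style. Returns the
   number of bytes needed, including the terminator (0 on a formatting
   error). Writes to buf only if it is large enough. */
size_t get_greeting(const char *name, char *buf, size_t size)
{
    int n = snprintf(NULL, 0, "Hello, %s!", name);   /* measure only */

    if (n < 0)
        return 0;
    if (buf != NULL && size > (size_t)n)
        snprintf(buf, size, "Hello, %s!", name);
    return (size_t)n + 1;
}

/* The caller's side: every request costs two calls. */
char *greeting_alloc(const char *name)
{
    size_t need = get_greeting(name, NULL, 0);        /* call 1: how big? */
    char *buf = need ? malloc(need) : NULL;

    if (buf != NULL)
        get_greeting(name, buf, need);                /* call 2: the data */
    return buf;
}
```

The shortcut mentioned above usually amounts to skipping the first call and guessing a buffer size.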

S
 
Richard Bos

jacob navia said:
Roland Pibinger wrote:

Actually, TTBOMK they were not included for three reasons:
- There was not enough prior art for them back then, as explained in the
Rationale;
- There would have to be an argument about whether they belonged in
<string.h>, because they deal with strings, or in <stdlib.h>, because
they deal with memory allocation, and no matter which of those were
chosen, there would always have been accusations of the Committee
getting it horribly wrong (because this is precisely the kind of anal-
retentive detail about which some people who should get a hobby Have
Opinions);
- They are remarkably simple for the user-programmer to write.

1) What's wrong with the user deallocating?

For a programmer with more than a year's experience, nothing.

2) Maybe this view is changing since that technical report is there...

That TR looks like a follow-up on that other "safer" function report.
It, too, was broken.

Richard
 
Richard Bos

Wrong analogy again. Would you write a function like the following
(probably not):

/* user must call fclose() on the returned FILE* */
FILE *do_something (int i);

Yes, I would definitely write a function like that. It is very useful
for, say, opening a log file and writing a standardised header to it.
The calling function(s) then write log entries to it, and close it when
it's no longer needed.

Richard
 
Richard Tobin

By that reasoning malloc, calloc, and realloc should also be
omitted.

Why? Do those functions force you to free something you haven't
allocated?

Um, why do you consider that you have done the allocating in the
case of calloc() but not in the case of strdup()? Both allocate
some memory and store values in it.

Would you write a function like the following
(probably not):

/* user must call fclose() on the returned FILE* */
FILE *do_something (int i);

Whyever not?

-- Richard
 
Keith Thompson

Yes, I would definitely write a function like that. It is very useful
for, say, opening a log file and writing a standardised header to it.
The calling function(s) then write log entries to it, and close it when
it's no longer needed.

That's a good example. In effect, the function is a wrapper around
fopen(). If you absolutely insist on having strict pairings of
functions for allocation and deallocation, you could write
FILE *open_log(int i);
int close_log(FILE *log);
but close_log would do exactly the same thing as fclose().
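
A sketch of that pairing, with assumptions stated: the parameter here is a file path rather than the int in the declaration above, and the header text is invented for the example.

```c
#include <stdio.h>

/* Hypothetical wrapper around fopen(): open a log file and write a
   standard header. The caller must later call close_log() (or fclose())
   on the result. Returns NULL on failure. */
FILE *open_log(const char *path)
{
    FILE *log = fopen(path, "w");

    if (log != NULL && fputs("# log format 1\n", log) == EOF) {
        fclose(log);            /* header failed: do not leak the stream */
        return NULL;
    }
    return log;
}

/* Exists only for symmetry; it does exactly what fclose() does. */
int close_log(FILE *log)
{
    return fclose(log);
}
```

If closing a log ever needs extra work, such as writing a trailer line, close_log() gives that work a home without touching the callers.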

In general, some functions allocate resources, and other functions
deallocate those same resources. Having a strict one-to-one mapping
between the two can be helpful, I suppose, but IMHO it's not strictly
necessary. Even the standard memory allocation functions don't do
this; malloc() and calloc() both allocate memory, free() deallocates
it, and realloc() potentially does both. The clean symmetry of
fopen() and fclose() is broken by freopen().

Programmers simply have to *read the documentation* to find out which
function does what. The interface described by the documentation
should be as simple as possible, but no simpler. If it makes sense to
have a strict pairing of allocators and deallocators, by all means do
that; if it doesn't, don't.

(In OO terms, there's no requirement to have exactly one constructor
and exactly one destructor for a given type.)
 
Stan Milam

Richard said:
(e-mail address removed) said:


I don't see why that means the Standard is deficient. By the same reasoning,
someone who has only ever used Turbo C 2.0 could claim that the Standard is
glaringly deficient in its omission of initgraph() from the library section
despite its being present in every implementation that person has used for
over 20 years - but this would not say so much about the Standard as it
would about the person making the claim!

I haven't used Turbo C since around 1992. I never cared for it either.
I've used C on several platforms, mostly UNIX these past few years,
but also DOS and some embedded stuff, using a wide variety of compilers.
EINVAL and ENOMEM have always been defined so why not standardize
them? Now as for the initgraph() analogy (this shows you know more
about Turbo C than I do), well, that is just silly. Of course a
compiler vendor's graphic library is not very likely to be portable,
ergo not a good candidate for standardization, but every C program I
know has functions using arguments, and dynamic memory allocation is a
way of life in C (and one of the truly great things about C). Why not
define values for errno when invalid argument values are used and when
no memory is available for allocation? It seems such a simple and
reasonable idea. Moreover, why isn't there a required compiler flag
that when used will cause the compiler to generate an error when
undefined behavior is encountered? Oh, yeah, it's called LINT!

Now I suppose I am in over my head discoursing such weighty issues with
an erudite C programmer such as yourself. I am by my own admission just
an old country programmer trying to make a living. But I think there is
some sensibility to what I suggest.

I am not finished yet, but I am feeling ill - like I might be having a
stroke - seriously!

Got to run.

--
Regards,
Stan Milam
=============================================================
Charter Member of The Society for Mediocre Guitar Playing on
Expensive Instruments, Ltd.
=============================================================
 
Richard Heathfield

Stan Milam said:

Why not
define values for errno when invalid argument values are used and when
no memory is available for allocation? It seems such a simple and
reasonable idea.

Implementors do not introduce extensions that they consider complicated and
unreasonable - at least, not often! Nevertheless, the mere introduction of
a simple and reasonable extension by one implementor does not oblige ISO to
adopt that extension and standardise it. They might do so, but they might
not. If you disagree with their decision, by all means take it up with
them. If you are sufficiently persuasive, maybe EINVAL and ENOMEM will
finally make it into the Standard.

Moreover, why isn't there a required compiler flag
that when used will cause the compiler to generate an error when
undefined behavior is encountered?

What should the compiler do with the following code, if such a flag is in
operation?

#include <stddef.h>
void foo(int *p, int *q, size_t len)
{
    while(len--)
    {
        *p++ = *q++;
    }
}

Do we need a compiler error ("Error: undefined behaviour in foo()") here, or
not? The answer is: it all depends. Specifically, it depends on how the
function is called. And the compiler might never see this code and the
calling code in the same invocation.
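
To make "it depends on how the function is called" concrete, here is a sketch with two hypothetical callers, copy_ok() and copy_bad(). The second is disabled with #if 0 because running it would be undefined behaviour.

```c
#include <stddef.h>

/* foo() exactly as above. */
void foo(int *p, int *q, size_t len)
{
    while (len--)
    {
        *p++ = *q++;
    }
}

/* Well defined: both arrays hold at least len elements. */
int copy_ok(void)
{
    int src[3] = {1, 2, 3};
    int dst[3] = {0, 0, 0};

    foo(dst, src, 3);
    return dst[0] + dst[1] + dst[2];
}

#if 0
/* Undefined: len is larger than either array, so foo() reads and
   writes past the end. Nothing inside foo() itself has changed. */
void copy_bad(void)
{
    int src[3] = {1, 2, 3};
    int dst[3];

    foo(dst, src, 4);
}
#endif
```

If foo() and its callers live in separate translation units, a compiler looking at foo() alone has nothing to diagnose.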
 
