[C] simple string question

Oleg Melnikov · Feb 6, 2004

B. v Ingen Schenau said:
AirPete said:

Which is just a waste of space in a constant length string.


But in C, the term 'string' is defined as 'a contiguous sequence of
characters terminated by and including the first null character.' (C
standard clause 7.1.1/1).
Without the nul-terminator, you can not call it a string, regardless of what
you do with it.
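
To make the distinction concrete, here is a minimal sketch of a check for whether a fixed-size buffer holds a string in the standard's sense. The helper name holds_string is made up for illustration; memchr is the ordinary standard library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first size bytes of buf contain
   a null character (so buf holds a string), 0 otherwise. */
int holds_string(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

An array such as char full[4] = {'a','b','c','d'} would fail this check, while char text[4] = "abc" would pass it.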

- Pete
Bart v Ingen Schenau


With a fixed-length array such as char[80], you should use the 'n' functions
from the standard library, such as strncmp and strncpy. Note that
s="Test string" is already null-terminated.
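
A rough sketch of that advice for a 4-character field like the one in the question. The names FIELD_LEN, field_set and field_equals are invented for this example; only strncpy and strncmp come from the standard library.

```c
#include <assert.h>
#include <string.h>

#define FIELD_LEN 4   /* assumed field width, taken from the question */

/* Hypothetical helper: store value in a fixed-width field. Shorter values
   are padded with '\0'; longer values are cut off at FIELD_LEN characters,
   in which case the field holds no terminator. */
void field_set(char field[FIELD_LEN], const char *value)
{
    assert(field != NULL && value != NULL);
    strncpy(field, value, FIELD_LEN);
}

/* Hypothetical helper: compare only the first FIELD_LEN characters, so a
   field without a terminator is never read past its end. */
int field_equals(const char field[FIELD_LEN], const char *value)
{
    return strncmp(field, value, FIELD_LEN) == 0;
}
```

Note that a field filled this way is only a string when the value is shorter than FIELD_LEN, so it should not be passed to strlen, strcpy or printf("%s").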

Melnikov Oleg, (e-mail address removed)


Joona I Palaste · Feb 6, 2004

Mac scribbled the following:

Mac said:

E. Robert Tisdale wrote:
Alan wrote:
I want to define a constant length string, say 4
then, in a function at some time,
I want to set the string to a constant value,
say a below is my code but it fails.
What is the correct code?

char string[4] = {0};

string = 'a '; /* <-- failed */

I'm going to assume that you really meant an array of characters.
In C, a *string* must be terminated by a nul character '\0'.

The Alan's code does terminate the string.


Why did you change "The OP's" to "The Alan's?" I'm sure I make grammatical
mistakes from time to time, but I would appreciate it if you did not add
new ones when quoting me.

You're lucky he only did that. He's been known to alter other people's
arguments, and to alter their code, introducing errors that weren't
present in the original code.

Jerry Coffin · Feb 6, 2004

[ ... ]

what I want to do is,
copy characters from some fixed positions at a source file, and then write
those fixed length characters to a new binary file. And there are times that
I assign values directly to those fixed length characters instead of reading
from a source file and then write to the binary file.
After writing the binary file, I will read the fixed length characters base
of the length of characters.
Here is how I do it:

Roughly 90% of the code you've given is more or less pointless. First
of all, to accomplish this you don't need to (and generally don't want
to) make extra copies of the characters at all. To copy N characters
from one file to another, you can do something like this:

char buffer[N];
FILE *infile, *outfile;

// ... open files, fseek to starting points in files.

// fread and fwrite take the buffer first and the stream last.
size_t characters_read = fread(buffer, 1, N, infile);

if (characters_read != N)
    ; // couldn't read that many characters.

fwrite(buffer, 1, characters_read, outfile);

If you want to create and write the data, you just set up the contents
of buffer, and then write it out. In the end, it comes down to this:
you're using NUL-terminated strings, and string I/O functions to work
with non-string data. You may be able to make that work, but even at
best it probably won't work very well -- at the very least, it leads the
reader to believe that you're dealing with string-like data, which
apparently isn't the case.
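
For reference, the fread/fwrite approach can be fleshed out into a complete function along these lines. This is only a sketch: copy_bytes and CHUNK are names made up here, and the streams are assumed to be open and already positioned.

```c
#include <assert.h>
#include <stdio.h>

enum { CHUNK = 4096 };   /* assumed buffer size; any positive value works */

/* Hypothetical helper: copy up to n bytes from in to out, one chunk at a
   time. Returns the number of bytes copied, which is less than n if the
   input ran out or a read or write failed. */
size_t copy_bytes(FILE *in, FILE *out, size_t n)
{
    char buffer[CHUNK];
    size_t total = 0;

    assert(in != NULL && out != NULL);
    while (total < n) {
        size_t want = n - total < sizeof buffer ? n - total : sizeof buffer;
        size_t got = fread(buffer, 1, want, in);

        if (got == 0)
            break;                       /* end of file or read error */
        if (fwrite(buffer, 1, got, out) != got)
            break;                       /* write error */
        total += got;
    }
    return total;
}
```

When the result is smaller than n, feof and ferror on the two streams tell you which of the three cases you hit.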

CBFalconer · Feb 6, 2004

Joona said:
You're lucky he only did that. He's been known to alter other
people's arguments, and to alter their code, introducing errors
that weren't present in the original code.

I think it's a new trick he learned in the past few months. I
don't recall him performing it earlier than that. He may be
playing the game "See how many I can sneak in without getting
caught".

Jerry Coffin · Feb 6, 2004

[ ... ]

It would be hard to argue that the coder thought the problem through. If he
only wants the first 50 characters, why can the source buffer hold more
than 50? (If it can't, strcpy will work fine.) And why not just nail it:

source[49] = '\0';
strcpy(dest, source);

I don't seem to have the original post, so it's hard for me to guess at
all the details here, but I can imagine situations where it's legitimate
to have a larger string, and under some circumstances you want a shorter
version (e.g. with column headers, it's fairly common to truncate or
wrap a name at the width of the column).

The code you've given above truncates the original string, which is
rarely what's wanted. If you want to copy the first N characters of a
string, strncat usually does the job quite nicely:

dest[0] = '\0';
strncat(dest, source, N);

strncat requires that you start with a terminated string, but otherwise
it mostly does what most people expect strncpy to do.
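
Wrapped up as a function, the strncat idiom might look like the sketch below. The name copy_prefix is invented for illustration, and the caller is assumed to provide a destination with room for n + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n characters of src into dst and
   always terminate the result. dst must have room for n + 1 bytes.
   Unlike the source[49] = '\0' approach, src is left untouched. */
void copy_prefix(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    dst[0] = '\0';          /* strncat needs a terminated destination */
    strncat(dst, src, n);   /* appends up to n chars plus a '\0' */
}
```

Unlike strncpy, this writes nothing beyond the terminator, so a short source does not cost a full buffer's worth of padding.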

CBFalconer · Feb 6, 2004

Jerry said:

[ ... ]

what I want to do is, copy characters from some fixed positions
at a source file, and then write those fixed length characters
to a new binary file. And there are times that


.... snip ...

char buffer[N];
FILE *infile, *outfile;

// ... open files, fseek to starting points in files.

// fread and fwrite take the buffer first and the stream last.
size_t characters_read = fread(buffer, 1, N, infile);

if (characters_read != N)
    ; // couldn't read that many characters.

fwrite(buffer, 1, characters_read, outfile);

If you want to create and write the data, you just set up the
contents of buffer, and then write it out. In the end, it
comes down to this: you're using NUL-terminated strings, and
string I/O functions to work with non-string data. You may be
able to make that work, but even at best it probably won't
work very well -- at the very least, it leads the reader to
believe that you're dealing with string-like data, which
apparently isn't the case.

Why the complications? Assuming the streams infile and outfile
are open and appropriately positioned, and you want to copy n
chars where n is non-negative and in a variable, all you need is:

int ch, n;
....
while (n-- && (EOF != (ch = fgetc(infile))))
    fputc(ch, outfile);

No need to think about nul bytes, cr, lf, tabs, whatever. Should
there happen to be efficiency problems in the final application,
that is the time to consider further. Not now.

A few more benefits: If it fails (with n non-negative) the values
in ch, n, and ferror allow you to diagnose things fairly well.
You don't need to think about buffer sizes. Odds are high that
the above will be more efficient than the buffered version on many
implementations.
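
Spelled out as a function, the character-at-a-time loop could be sketched as follows. The name copy_chars and the choice of return value are assumptions made for this example; the loop is restructured slightly so that the count left over is easy to report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy n characters from in to out one at a time.
   Returns how many characters were NOT copied, so 0 means success.
   On a non-zero result, feof and ferror say why the loop stopped. */
long copy_chars(FILE *in, FILE *out, long n)
{
    int ch;

    assert(n >= 0);
    while (n > 0 && (ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            break;          /* write error: ch was read but not written */
        n--;
    }
    return n;
}
```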

Todd · Feb 6, 2004

Good Lord, it was a joke. (Well, it was supposed to be. Clearly all ng's
have some teaching value, except for those that purposely misspell stuff,
like warez, etc.)

Humour is off topic here, and thus will not be tolerated.

Begone!

nrk · Feb 6, 2004

Richard said:
nrk said:

Richard said:

Leor Zolman wrote:

On Thu, 5 Feb 2004 19:01:08 +0000 (UTC), Richard Heathfield

In the general case when
you don't know the length of the source string, it is safer than
using strcpy.

No, it isn't. The only safe and correct thing to do, if you don't know
the length of the source string, is to ***find out***.

Yes, I believe I see what you mean now re. strncpy, after taking
another look at the Standard's description of it. If the length of the
source text is greater than the capacity of the destination as
conveyed via the size argument, a NUL won't get appended. You're
right, sorry.

That's one of the problems with strncpy. There are plenty more. For a
start, what if you incorrectly specify the third parameter? (It happens,
believe me.)


The first problem is a strawman. Since the third argument is well known,
and due to the way the standard specifies strncpy must behave, you only have
to check and see if dst[n-1] is '\0' or not, to tackle this situation.


That doesn't /tackle/ the problem - it merely /detects/ it.

Well, detecting the problem is all you can do if you don't want to find out
the source length before copying. However, it is important to note that
detecting the problem is both easy and safe in this case. Harken back to
where you say:

If you do understand strncpy, then using it is a perfectly safe and valid
alternative to trying and finding out the length of the source string, then
doing a malloc and then doing a strcpy.

If you think about it, the fact that strncpy doesn't put a terminating null
character when the source is longer, and that it fills the rest of the
target buffer with nulls when the source is shorter is unavoidable, since
the return value is useless. Without that, it is impossible with a single,
simple last element check on the target to find out whether you got all of
the source or not. A better design of course is the strlcpy in *BSD. For
no discernible reason, someone decided that strncpy's return value should
be absolutely useless, and therefore we have the tricky null termination
semantics.
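
The last-element check described here can be sketched as a small wrapper. The name copy_fits and its return convention are invented for illustration; the behaviour relied on (no terminator on overflow, null padding on a short source) is what the standard specifies for strncpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dst_size bytes). Returns 1 if
   all of src, including its terminator, fitted; returns 0 if it was
   truncated, in which case dst is terminated by hand. */
int copy_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;
    strncpy(dst, src, dst_size);
    /* strncpy pads a short source with '\0' up to dst_size, so the last
       byte is '\0' exactly when the whole source fitted. */
    if (dst[dst_size - 1] != '\0') {
        dst[dst_size - 1] = '\0';
        return 0;
    }
    return 1;
}
```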

Again, if you always wanted all of the source regardless of the source
length, you should go for the malloc+strcpy route. But think of situations
where:

If my input is larger than x, it is an error and I simply quit. Here, I
don't want to see if the source is larger than x before issuing the copy.
For instance, such a large source potentially indicates a malicious input
and I wouldn't want to trust it to be a well-formed string with a null
terminator. I simply use strncpy, and see if my destination has a null at
position x or not. If not, I can't handle that input and report it as such
to the user. While this is not bullet-proof, it is at least better than
running through a possible malicious string in search of a non-existent
null character.

99% of the time, I know that my input is exactly of length x +/- epsilon.
Also, I find that dynamic memory allocation overheads are prohibitive. One
can then think of devising a string pool. Since I know my input profile, I
would design my pool so that by default it is capable of storing strings of
length x+epsilon or less. The usage of this pool would be to get a default
object from the pool, use strncpy and see if you get all of the source, if
not ask for an object big enough to hold the source. You may ask why not
use strlen to start with. Well you see, this happens to be a frequent
operation and I don't want to traverse the source twice all the time. And
I know, my strncpy is not wasted 99% of the time, and is a good choice
provided epsilon isn't significant.
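
A stripped-down sketch of that second scenario, with a single caller-supplied buffer standing in for the pool's default object and malloc standing in for "ask for an object big enough". The function name store_string and its interface are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: try to copy src into the default buffer; if it
   does not fit, fall back to a heap copy. Returns small in the common
   case, a malloc'd block (which the caller must free) in the rare case,
   or NULL if the allocation fails. */
char *store_string(char *small, size_t small_size, const char *src)
{
    size_t len;
    char *big;

    assert(small != NULL && src != NULL);
    if (small_size > 0) {
        strncpy(small, src, small_size);
        if (small[small_size - 1] == '\0')
            return small;        /* common case: one pass over src */
    }
    /* Rare case: only now pay for strlen and an allocation. */
    len = strlen(src);
    big = malloc(len + 1);
    if (big != NULL)
        memcpy(big, src, len + 1);
    return big;
}
```

The caller can tell the two cases apart by comparing the result with the buffer it passed in.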

I know. I don't intend to take it very far; I'm just pointing out that
/any/ library function - including strcpy AND strncpy - can be misused,
and that many such functions will be unsafe if misused, including both
those two.

Yes. But you also seemed to be implying that malloc+strcpy is superior and
strncpy was in some way more unsafe (at least that was my reading). IMHO, it
is the other way around and strncpy is safer than strcpy, if you know how
to use it. Too often, you see something like:

char str[64];

...
/* no sanity check on haxorinput */
strcpy(str, haxorinput);

which is no better than gets.

Sure, of course it can. And so can strcpy. The objection I am making in
this thread is not to strncpy per se, but strncpy as "the safe equivalent
of the unsafe strcpy function". That is what I consider to be wrong.

I agree. strncpy is not a replacement for strcpy. But it is a safe
alternative when you don't want to go through the strlen+malloc+strcpy
route, or take that route only if strncpy fails to fit the bill. Look at
the argument from a maintenance POV again:

char str[64];

...

strcpy(str, haxorinput);
...
strncpy(str, haxorinput, sizeof str);
assert(str[sizeof str - 1] == 0);

For the strcpy, when I look at that code, I have to make sure that
haxorinput has been properly validated to fit into str before that point.
This may or may not be close to the strcpy statement itself. It may even
be done in some other function in some other file (This happens more
frequently than an incorrect 3rd argument to strncpy in my limited
experience).
However, for the strncpy+assert (or strncpy+some other validation), I don't
need to know anything about *haxorinput* except that it is a valid pointer
(which we assume normally). If the validation doesn't immediately follow
that strncpy, you'd have to strongly suspect that something must be wrong,
for there is no logical reason to not validate the result of a function
call immediately afterwards.

That depends how robust and correct you want your program to be on those
occasions when the input is longer.

It can be made just as robust and just as correct as any alternative that
you suggest with strlen+malloc+strcpy.

I would go for an array of 21
characters, try to strncpy 21 characters into it, and check if array[20]
is '\0' or not after the strncpy to see if I've hit the rare case.


Would you not find it easier just to handle the rare case /all/ the time,
since that would result in shorter code than "other cases + rare case"?

No. There can be legitimate reasons to optimize for the common case. It is
not a question of ease of coding.

I qualify my input parameters with const as far as possible. Modifying
the source unnecessarily is not only not an option, but is also bad style
in my books.


I agree. My preferred solution would be to make sure the target buffer
/is/ big enough.

Also, if this is a solution, so is:
dst[49] = 0;
strncpy(dst, src, 49);


Yes, but it takes longer to type.

Barring that bogeyman argument, if I used strncpy, I *don't* have to
check. All I have to check is that the src was no longer than I expected,
which can be done in a very straight-forward and simple manner. In fact,
you can (and I do), wrap these operations into a function and use it
safely. IMHO, creating a buffer overrun with strncpy is less likely than
with strcpy. YMMSTV. Of course, if you always wanted all of the source
regardless of size, well, that's what strcpy is for


Right! And if you didn't want all the source, why did you bother to
capture it?

Well, maybe it wasn't me (a library) that captured it. There can be several
layers between user input and your code, not all under your control.

I'll buy all those objections to my objections - because they apply
equally to similar objections to strcpy (that is, the arguments against
strcpy are equally strawlike).

strcpy makes me look at code harder to see if things are really ok. Other
than that, I don't have objections to its use. I only have objections to
objections to strncpy (unless those objections are accompanied by a
suggestion to strlcpy or like alternatives). strncpy is a perfectly safe
function to use.

sizeof target is all very well, but doesn't guarantee you a
null-terminated string at the end, whereas sizeof target - 1 does.

No. sizeof target - 1 doesn't guarantee a null terminated target either
(not unless you said target[sizeof target - 1] = 0 before or after). You
can see by checking the last character in your target whether your target
buffer was big enough or not.

Thanks for the vote of confidence.

Anytime

This discussion is merely an effort to learn more. I have an
opinion. By airing it somewhat stridently, I am trying to provoke you and
other clueful regulars into expanding my knowledge

Agreed, but code doesn't exist in a vacuum. In typical code that I've
written, code will exist /before/ the strcpy, that makes sure the target
is large enough.

See argument above. Your code might be well written so that you don't have
to search long and hard for the pre-condition validation. My limited view
suggests that a lot of people tend to spread their pre-condition
validations somewhat more arbitrarily (in time and space) than they would
validate the result of a function call.

I've recently been doing a lot of work on a portable code library called
CLINT. I just grepped the latest source for strcpy, and sure enough, in
over 17000 lines of code, there /is/ a call - one call - to strcpy. Here
it is, in context:

for(i = 0; i < sizeof objname / sizeof objname[0]; i++)
{
    assert(sizeof wnn_GlobalConfig->ObjectCount.ObjectName >
           strlen(objname));

    strcpy(wnn_GlobalConfig->ObjectCount.ObjectName,
           objname);
}

(Please understand that we're dealing with fixed size arrays here, arrays
that are not accessible to the user-programmer under normal circumstances.
So I adjudged an assertion to be appropriate.)

Why only one call to strcpy (and no calls to strncpy) in 17000+ lines?
Well, that's because CLINT includes a stretchy string library with its own
string copying routines. But, in the one place I do use it, I think it's
fair to say that I use it appropriately.

I think it is fair to say that you missed a good opportunity to investigate
a strncpy (or like) alternative

for(i = 0; i < sizeof objname / sizeof objname[0]; i++)
{
    /* this dst is merely because your original name is too long */
    char *dst = wnn_GlobalConfig->ObjectCount.ObjectName;
    /* this len for same reasons as above */
    size_t len = sizeof wnn_GlobalConfig->ObjectCount.ObjectName;

    strncpy(dst, objname, len);
    assert(dst[len-1] == 0);
}

The only argument against strncpy here would be the fact that you will
always write len characters. As long as this is less expensive than an
additional function call and traversing the source once more, the strncpy
alternative is better. An even better alternative is to use something akin
to the non-standard strlcpy. There are no real arguments against using an
alternative like strlcpy here (you could easily roll your own if it is not
part of your platform already, or if you plan to distribute your code wider
than your current platform). Here's the strlcpy alternative:

size_t ret = strlcpy(wnn_GlobalConfig->ObjectCount.ObjectName,
                     objname,
                     sizeof wnn_GlobalConfig->ObjectCount.ObjectName);
assert(ret < sizeof wnn_GlobalConfig->ObjectCount.ObjectName);

If you'd given strncpy a fair think, you might've even thought of a strlcpy
like alternative
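
For platforms without strlcpy, a roll-your-own version might look like this sketch. The name my_strlcpy is made up here, and the behaviour is modelled on the usual BSD description: copy at most size - 1 characters, always terminate when size is non-zero, and return the length of the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BSD strlcpy. Returns strlen(src); a result
   >= size means the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL);
    src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;

        assert(dst != NULL);
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Because the return value is the full source length, this version still walks the whole source, so it assumes src really is a terminated string.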

With strcpy, you already know, because you already checked.

I can buy that.

Not as idiotic as gets().

Similarly, strncpy may or may not leave a null terminator in your output,
and may or may not copy the entire source string.


Yes. But unlike fgets where there is no easy way to tell other than
travelling through the string again, strncpy gives you an easy fool-proof
way to resolve all those may/may not issues.

-nrk.

ps: Apologies. I didn't realize that in my original post, clc was removed
from the follow-up list.

Jerry Coffin · Feb 6, 2004

[ ... ]

Why the complications? Assuming the streams infile and outfile
are open and appropriately positioned, and you want to copy n
chars where n is non-negative and in a variable, all you need is:

int ch, n;
....
while (n-- && (EOF != (ch = fgetc(infile))))
    fputc(ch, outfile);

At least to me, this looks substantially more complicated than one call
to fread followed by one to fwrite.

No need to think about nul bytes, cr, lf, tabs, whatever.

What code do you see that DOES force one to think about nul bytes, cr,
lf, tabs, etc?

A few more benefits: If it fails (with n non-negative) the values
in ch, n, and ferror allow you to diagnose things fairly well.
You don't need to think about buffer sizes. Odds are high that
the above will be more efficient than the buffered version on many
implementations.

I doubt it'll often matter, and your implementation isn't nearly as much
slower as many people expect, but I've yet to see an implementation with
which it's really as fast as using fread and fwrite to work in big
chunks. In theory I can see reasons it _could_ be, but I've yet to
actually see it in reality.

CBFalconer · Feb 7, 2004

Jerry said:

[ ... ]

[ ... ]

Why the complications? Assuming the streams infile and outfile
are open and appropriately positioned, and you want to copy n
chars where n is non-negative and in a variable, all you need is:

int ch, n;
....
while (n-- && (EOF != (ch = fgetc(infile))))
    fputc(ch, outfile);


At least to me, this looks substantially more complicated than one
call to fread followed by one to fwrite.

The primary advantage is that the buffer is one integer, rather
than some possibly humungous array of char. For efficiency let's
use getc/putc in place of fgetc/fputc. The run-time is probably
able to optimize buffers much better than you can.

What code do you see that DOES force one to think about nul bytes,
cr, lf, tabs, etc?

For this particular usage, none.

I doubt it'll often matter, and your implementation isn't nearly as
much slower as many people expect, but I've yet to see an
implementation with which it's really as fast as using fread and
fwrite to work in big chunks. In theory I can see reasons it
_could_ be, but I've yet to actually see it in reality.

When you get to embedded systems and minimum load modules, the
difference may really show up. Any code inefficiencies are almost
certainly lost in the actual i/o time.

Martijn Lievaart · Feb 7, 2004

When you get to embedded systems and minimum load modules, the
difference may really show up. Any code inefficiencies are almost
certainly lost in the actual i/o time.

I've used implementations where the fgetc approach was substantially slower
than using fread/fwrite, say by a factor between 10 and 100. It was so
much slower that the inefficiencies really were not only noticeable, but
even prohibitive.

So I think the approach one should take is very much dependent on the
context, but in general if you can spare the memory fread/fwrite is imo
the way to go. Why? Because it is never slower, and often much faster.

HTH,
M4

Jerry Coffin · Feb 7, 2004

CBFalconer said:
The primary advantage is that the buffer is one integer, rather
than some possibly humungous array of char.

It's only as large as you make it -- yes, it can be huge if you decide
there's a good reason for that, but there's certainly nothing that
mandates it.

For efficiency let's
use getc/putc in place of fgetc/fputc. The run-time is probably
able to optimize buffers much better than you can.

Would that it were so. Unfortunately, both I and _many_ others have
long and widespread experience that indicates otherwise.

When you get to embedded systems and minimum load modules, the
difference may really show up. Any code inefficiencies are almost
certainly lost in the actual i/o time.

You're starting with an assumption that seems (to me) to be false: that
getc/putc (or fgetc/fputc) are fundamentally simpler than fread/fwrite.
In point of fact, the opposite can be true: getc/putc typically use a
buffer, and it's not entirely out of line for the implementation to use
fread and fwrite to fill an empty read buffer or flush a full write
buffer. IOW, as far as load modules go, using fread and fwrite directly
can actually be a savings rather than a cost.

In the end, at least in my experience, the bottom line is fairly simple:
for bulk data transfers, fread and fwrite are sometimes a considerable
gain, and I've yet to see a situation where they were a significant
loss.
