gcc initialization better or extra line of execution

A

anish singh

So here is the problem:
Progarm 'A'

char array[128] = {0};
memcpy(array, "testing", strlen("testing"));

Progarm 'B'

char array[128];
memcpy(array, "testing", strlen("testing"));
array[strlen("testing")] = '\0';

Which one is better program?Definition of better: A program is better if it has less code size and executes in less cycle.
Argument 1: Program 'B' is better as it doesn't have to initialize the whole array and just need to append '\0' at the end which is just one more cycle.
Please answer this question with concrete data in any architecture i.e. ARM, x86 or anything which is available to you.
Please use disassembly to explain your point w.r.t. cycles or code size.Strictly guess is not allowed.



I believe the question might be phrased, "Is it better to initialize

all array elements to zero prior to initializing some of them to

something other than zero or is it better to leave it uninitialized

and wait until runtime?"



Program A takes the time and processor effort to initialize all 128

members and then immediately overwrites that initialization with a new

one, counting on the first initialization to provide the zero

terminator for a string. I would call this "bad practice". It violates

locality of reference and is less maintainable. What happens to

maintenance if the initialization is widely separated from the usage

of the memcpy?



Program B provides the allocation of the buffer space and then

initializes the string and provides it's own explicit termination.



Esthetically, program B is preferable, since it explicitly

zero-terminates the string and is more maintainable. The array will

typically be initialized to some value by the OS at runtime, usually

to zero for security reasons in OS like Windows or Linux, sometimes

garbage left over from other allocations in other OS. In the case of

Microsoft's compilers, they initialize memory such as this to

pre-defined values in Debug mode (i.e., 0xCC) and this can be useful

for detecting buffer overruns or programmer errors like forgetting to

zero-terminate your string and you lose this capability when you zero

the array members. They initialize to zero in Release mode. In B, the

programmer is saying, "I know this is a C string and I am handling it

as such".



Operationally, program B is preferable since it doesn't waste time

doing something the OS may already have done, doesn't waste time doing

something the implementation may have done, and only writes initial

data that is required for correct operation.
Hey Geoff,
My colleague also said exactly same thing and who was supporting B implementation.So we are on the same page here.
Specially about "I know this is a C string and I am handling it as such".
 
G

glen herrmannsfeldt

Geoff said:
On Fri, 8 Nov 2013 23:47:30 -0800 (PST), anish singh
So here is the problem:
Progarm 'A'
main()
{
char array[128] = {0};
memcpy(array, "testing", strlen("testing"));
}
(snip)

I believe the question might be phrased, "Is it better to initialize
all array elements to zero prior to initializing some of them to
something other than zero or is it better to leave it uninitialized
and wait until runtime?"

It is an auto array, so initialized at run time anyway.

Personally, I wouldn't worry about it for arrays less than about 1000
elements, except for embedded systems with small memory.
Program A takes the time and processor effort to initialize all 128
members and then immediately overwrites that initialization with a new
one, counting on the first initialization to provide the zero
terminator for a string. I would call this "bad practice". It violates
locality of reference and is less maintainable. What happens to
maintenance if the initialization is widely separated from the usage
of the memcpy?

If you decided to write the whole array out to disk, it would be nice
to have it zeroed all the way out. Also, there are some security
considerations with uncleared buffers. Doesn't seem bad to me.
Program B provides the allocation of the buffer space and then
initializes the string and provides it's own explicit termination.
Esthetically, program B is preferable, since it explicitly
zero-terminates the string and is more maintainable. The array will
typically be initialized to some value by the OS at runtime, usually
to zero for security reasons in OS like Windows or Linux, sometimes
garbage left over from other allocations in other OS.

On the stack, if it is in main then I suppose there is a good chance
that the stack space is still zero. Other than main, there could be
almost anything on the stack.

(snip)
Operationally, program B is preferable since it doesn't waste time
doing something the OS may already have done, doesn't waste time doing
something the implementation may have done, and only writes initial
data that is required for correct operation.

For static arrays, it is different. C always zeros static data before
starting the program. On many systems, static data is read from the
object program. That is, large static arrays result in large object
programs. Many now will not store the zeros, but any other value, even
a whole array full, will be stored. I don't know that C has anything
like the Fortran:

INTEGER X(1000000)
DATA X/100000*1/
All these points are architecturally irrelevant.

Different systems have different ways to initialize the array,
some fast, some slow. Some do strcpy(), memcpy() and/or memmove()
inline, others as function calls.

-- glen
 
S

Seebs

Yes, I know it is wrong.This was not written to compile
but more of a pseudocode.

That sort of undermines the goal of evaluating it, no? I mean,
if the code won't compile, how exactly is someone supposed to
compare the assembly output?
Yes, this is the problem.This question gets its origin from a silly
code review where the reviewer commented what i wrote as argument 1
in my original question.

Okay. Then the answer is "this definition is wrong". And once you've
established that, the rest is irrelevant.
Yes, as the code in question is not posted here but rather the pseudocode
to get the comments from the experts present here.

No, not just the code, but also the definition of "better". What
you said is "better" is not in fact better.
This is the best part till now which just confirms my understanding.

This disturbs me.
Yes,I told that this question stemmed from a silly code review by a so-called
expert senior.
Fascinating.

I wrote that code for just asking the question about not initialized and initialized array.The code written was not to demonstrate my expertise in C
language.I offer my apologies if it appeared that way.

The problem here is that those errors make it impossible to discern
which parts of the sample code you think are relevant. Are you really
asking about the initialized array, or about the repeated use of
strlen()? We can't tell!
Reason why i was asking this was to get a concrete idea about the
above said question.So as i understand it depends on several factors
which includes code optimization.

Yes. But also the question itself is probably wrong.
I guess not you get my point.However if you still feel otherwise
please let me know and I will do my best to explain more.

It's very hard to tell, because you appear to be trying to evaluate
code under a set of standards which are objectively wrong.

For the sample code given, there is no reason I can imagine to do
something other than using a string literal initializer for the array.
Unless the parts you changed to make an example change the intended
semantics, which they might well.

If you want a technical opinion on a piece of code, it's very important
to use the actual code you want to know about. The entire point of asking
experts is that you presumably don't yourself *know* what parts of the
code are or aren't going to be relevant to them. If you knew, you'd be
the expert.

-s
 
A

anish singh

That sort of undermines the goal of evaluating it, no? I mean,

if the code won't compile, how exactly is someone supposed to

compare the assembly output?








Okay. Then the answer is "this definition is wrong". And once you've

established that, the rest is irrelevant.







No, not just the code, but also the definition of "better". What

you said is "better" is not in fact better.






This disturbs me.








The problem here is that those errors make it impossible to discern

which parts of the sample code you think are relevant. Are you really

asking about the initialized array, or about the repeated use of

strlen()? We can't tell!








Yes. But also the question itself is probably wrong.







It's very hard to tell, because you appear to be trying to evaluate

code under a set of standards which are objectively wrong.



For the sample code given, there is no reason I can imagine to do

something other than using a string literal initializer for the array.

Unless the parts you changed to make an example change the intended

semantics, which they might well.



If you want a technical opinion on a piece of code, it's very important

to use the actual code you want to know about. The entire point of asking

experts is that you presumably don't yourself *know* what parts of the

code are or aren't going to be relevant to them. If you knew, you'd be

the expert.
Ok, I got what you wanted to tell me.Next time I will exactly produce the code where I have problem or doubt.Anyway this is what I was looking for.

int main()
{
/* I can rewrite in several ways but that is not the purpose here */
char ch[128];
char *string = "anish";
strncpy(ch, string, 5);
ch[6] = '\0';
printf("%s\n", ch);
return 0;
}

gcc -g file.c

gdb a.out

(gdb) disassemble /m
Dump of assembler code for function main:
2 {
0x0804848c <+0>: push %ebp
0x0804848d <+1>: mov %esp,%ebp
0x0804848f <+3>: and $0xfffffff0,%esp
0x08048492 <+6>: sub $0xa0,%esp
=> 0x08048498 <+12>: mov %gs:0x14,%eax
0x0804849e <+18>: mov %eax,0x9c(%esp)
0x080484a5 <+25>: xor %eax,%eax

3 char ch[128];
4 char *string = "anish";
0x080484a7 <+27>: movl $0x8048598,0x18(%esp)

5 strncpy(ch, string, 5);
0x080484af <+35>: movl $0x5,0x8(%esp)
0x080484b7 <+43>: mov 0x18(%esp),%eax
0x080484bb <+47>: mov %eax,0x4(%esp)
0x080484bf <+51>: lea 0x1c(%esp),%eax
0x080484c3 <+55>: mov %eax,(%esp)
0x080484c6 <+58>: call 0x8048390 <strncpy@plt>

6 ch[6] = '\0';
0x080484cb <+63>: movb $0x0,0x22(%esp)

7 printf("%s\n", ch);
0x080484d0 <+68>: lea 0x1c(%esp),%eax
0x080484d4 <+72>: mov %eax,(%esp)
0x080484d7 <+75>: call 0x8048360 <puts@plt>

8 return 0;
0x080484dc <+80>: mov $0x0,%eax

9 }
0x080484e1 <+85>: mov 0x9c(%esp),%edx
0x080484e8 <+92>: xor %gs:0x14,%edx
0x080484ef <+99>: je 0x80484f6 <main+106>
0x080484f1 <+101>: call 0x8048350 <__stack_chk_fail@plt>
0x080484f6 <+106>: leave
0x080484f7 <+107>: ret

End of assembler dump.

Same program with initilization as below:

int main()
{
/* I can rewrite in several ways but that is not the purpose here */
char ch[128] = {0};
char *string = "anish";
strncpy(ch, string, 5);
printf("%s\n", ch);
return 0;
}

gcc -g file.c

gdb a.out

(gdb) disassemble /m
Dump of assembler code for function main:
2 {
0x0804848c <+0>: push %ebp
0x0804848d <+1>: mov %esp,%ebp
0x0804848f <+3>: push %edi
0x08048490 <+4>: push %ebx
0x08048491 <+5>: and $0xfffffff0,%esp
0x08048494 <+8>: sub $0xa0,%esp
=> 0x0804849a <+14>: mov %gs:0x14,%eax
0x080484a0 <+20>: mov %eax,0x9c(%esp)
0x080484a7 <+27>: xor %eax,%eax

3 /* I can rewrite in several ways but that is not the purpose here*/
4 char ch[128] = {0};
0x080484a9 <+29>: lea 0x1c(%esp),%ebx
0x080484ad <+33>: mov $0x0,%eax
0x080484b2 <+38>: mov $0x20,%edx
0x080484b7 <+43>: mov %ebx,%edi
0x080484b9 <+45>: mov %edx,%ecx
0x080484bb <+47>: rep stos %eax,%es:(%edi)

5 char *string = "anish";
0x080484bd <+49>: movl $0x80485a8,0x18(%esp)

6 strncpy(ch, string, 5);
0x080484c5 <+57>: movl $0x5,0x8(%esp)
0x080484cd <+65>: mov 0x18(%esp),%eax
0x080484d1 <+69>: mov %eax,0x4(%esp)
0x080484d5 <+73>: lea 0x1c(%esp),%eax
0x080484d9 <+77>: mov %eax,(%esp)
0x080484dc <+80>: call 0x8048390 <strncpy@plt>

7 printf("%s\n", ch);
0x080484e1 <+85>: lea 0x1c(%esp),%eax
0x080484e5 <+89>: mov %eax,(%esp)
0x080484e8 <+92>: call 0x8048360 <puts@plt>

8 return 0;
0x080484ed <+97>: mov $0x0,%eax
9 }
0x080484f2 <+102>: mov 0x9c(%esp),%edx
0x080484f9 <+109>: xor %gs:0x14,%edx
0x08048500 <+116>: je 0x8048507 <main+123>
0x08048502 <+118>: call 0x8048350 <__stack_chk_fail@plt>
0x08048507 <+123>: lea -0x8(%ebp),%esp
0x0804850a <+126>: pop %ebx
0x0804850b <+127>: pop %edi
0x0804850c <+128>: pop %ebp
0x0804850d <+129>: ret

End of assembler dump.


Code I ran on my ubuntu 32bit system.
As you can see that there is more code generated for initialized code alongwith more instructions.From the above disassembled code looks like initialized code is having more code + more cycles(assuming each line is executingin one cycle which may not be the case).
I may have assumed many other things.So my expectation from the experts is to point out my assumption/mistakes.
 
S

Seebs

int main()
{
/* I can rewrite in several ways but that is not the purpose here */
char ch[128];
char *string = "anish";
strncpy(ch, string, 5);
ch[6] = '\0';
printf("%s\n", ch);
return 0;
}

The big distinction between the previous code and this is that
you're hand-entering the value 6 (which is wrong, btw, it should be 5).

It's not obvious why you're using strncpy and a terminator rather than
using memcpy, given that the string already has a terminator. For
instance:
memcpy(ch, string, 6);

Or you could look at:
char ch[128] = "anish";

And I guess my point is: Rewriting it in several ways SHOULD be your
purpose if you want to understand this, because as is, the results you're
seeing are entirely dominated by random coincidence and things which
are in no way part of what you're trying to compare.
As you can see that there is more code generated for initialized
code along with more instructions.From the above disassembled code
looks like initialized code is having more code + more cycles(assuming
each line is executing in one cycle which may not be the case).

That's not even CLOSE to the case. Most noticably, you're ignoring the
function call. The function call into strncpy is going to execute some
significant hunk of code. Heck, if it's the only call to strncpy in
the program, you could be talking about *hundreds or thousands* of
instructions for the initial reference setup from a dynamic library.
I may have assumed many other things.So my expectation from the
experts is to point out my assumption/mistakes.

The entire thing is fundamentally confused. You're introducing multiple
possible points for there to be errors, and indeed, there's an outright
error in the sample quoted above: You're leaving one uninitialized byte
after the "h" but before the null byte.

Assuming this really is a fixed string known at compile time, there is
no reason I can comprehend to do anything but:
char ch[128] = "anish";
Anything else will, at the very *least*, expose you to a much higher
risk of severe bugs.

But past that... You are treating a call into a library function which
is quite likely an indirect call through a lookup table, may require
runtime resolution of symbols from libraries, and which then does
complicated things which you clearly do not accurately comprehend,
as though it's "one instruction".

If you want a fair comparison, I think good starting points would
be (assume each of these is a function body or some such):

/* #1 */
char ch[128] = "anish";

/* #2 */
char ch[128] = { 0 };
strcpy(ch, "anish");

/* #3 */
char ch[128];
strncpy(ch, "anish", 128);

The key to this last one is that if you want to fairly compare two
things which zero out the entire array, you should zero out the entire
array both times.

Unless you're quite sure you don't need the whole array zeroed out.

And really, this... this is just too tiny to be spending time on. That's
by far the biggest mistake here. Unless you've got a real program where
profiling shows that this function is consuming a noticeable amount of CPU
time, the only thing you should be doing is writing the *clearest* possible
code.

If you get a string at runtime, and want to store it in an array,
nul-terminated, the right choice would usually be something like strlcpy(),
only that may not be portable enough yet. If you can be confident that
it's short enough to fit, use strcpy(). Circumstances where strncpy()
is the right choice are exceedingly rare.

If I had to do this, and it needed to be null-terminated, and the string
was a parameter:

int example(char *s) {
char ch[128];
snprintf(ch, sizeof(ch), "%s", s);
/* ... */
}

It may not be the fastest, but it is *completely clear* what it does, and
we can be confident that the resulting string is correctly null-terminated.

(Unless I screwed it up in a funny way. This happens sometimes.)

-s
 
M

Malcolm McLean

If you get a string at runtime, and want to store it in an array,
nul-terminated, the right choice would usually be something like strlcpy(),
only that may not be portable enough yet. If you can be confident that
it's short enough to fit, use strcpy(). Circumstances where strncpy()
is the right choice are exceedingly rare.
That's why I try to make a distinction between a "buffer" and an "array",
an array contains data, a buffer doesn't necessarily. Since the code is
obviously a fragment, it may be that for some reason the data is only valid
if padded out to 128 zero bytes, maybe later on you hash it or something.
But most likely "array" is misnamed, and it should be called "buffer", and
it doesn't matter what it contains after the end of the data.

strncpy() is the right choice for setting to a string then padding to zero.
It isn't such a rare thing to need to do, because it can make string
comparisons easier - if you know that fields are always a multiple of 8 and
zero-padded, you can do a wordwise rather than a bytewise compare, for
example.
 
G

glen herrmannsfeldt

anish singh said:
On Sunday, November 10, 2013 1:15:35 AM UTC-8, Seebs wrote:
(snip)
(snip)
Ok, I got what you wanted to tell me.Next time I will exactly
produce the code where I have problem or doubt.
Anyway this is what I was looking for.
int main()
{
/* I can rewrite in several ways but that is not the purpose here */
char ch[128];
char *string = "anish";
strncpy(ch, string, 5);
ch[6] = '\0';
printf("%s\n", ch);
return 0;
}
gcc -g file.c (snip)
Dump of assembler code for function main:

I usually like the output from gcc -S better.

First, it is more important to write correct code than to save a few
cycles. Sometimes clearing the whole array is a good thing to do,
other times it isn't needed.

Often, though, it is better to explicitly clear the array, such as
with memset(), and not rely on the initialization. (It might be in
a loop.)

Code reviews are a good way to learn, though not all code reviewers
know what they are talking about.

For most modern systems, you should only worry about cycles when the
problem could be O(n**2), but not when it is O(n) or O(n log n).

Note that you might be surprised sometimes, though. Some years ago
I had to find out why a program was running so slow. It turned out
to be something like (not trying to get it exactly right):

char buf[10000000], line[100];

buf[0]=0;
while(fgets(line,sizeof line, infile)) strcat(buf, line);

as well as I remember, the program did enough checking to avoid
buffer overflow, but note that this loop is O(n**2).

It is easy to recognize O(n**2) problems when there are nested for
loops, but here there are no visible for loops!

Otherwise, you might have had better response if you had said that it
was a code review from the beginning. The replies for code review are
different from those in general.

-- glen
 
A

anish singh

int main()

/* I can rewrite in several ways but that is not the purpose here */
char ch[128];
char *string = "anish";
strncpy(ch, string, 5);
ch[6] = '\0';

Oops! Writing at 2 in the night doesn't always bring the best out of me.
printf("%s\n", ch);
return 0;



The big distinction between the previous code and this is that

you're hand-entering the value 6 (which is wrong, btw, it should be 5). Yes,



It's not obvious why you're using strncpy and a terminator rather than

using memcpy, given that the string already has a terminator. For

instance:

memcpy(ch, string, 6);



Or you could look at:

char ch[128] = "anish";



And I guess my point is: Rewriting it in several ways SHOULD be your

purpose if you want to understand this, because as is, the results you're

The reason why I have written the way it is written here is that i want it to closely match what was being reviewed and suggested.There were two suggestions which closely matches the two programs I mentioned above.I can't do this way:
char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if the buffer provided is null terminated or not.
seeing are entirely dominated by random coincidence and things which

are in no way part of what you're trying to compare.









That's not even CLOSE to the case. Most noticably, you're ignoring the

function call. The function call into strncpy is going to execute some

significant hunk of code. Heck, if it's the only call to strncpy in

the program, you could be talking about *hundreds or thousands* of

instructions for the initial reference setup from a dynamic library.
I didn't understand this.In the above two programs I wrote both call the same strncpy function and code is generated for both of them.So if we remove out the common factors i.e. common code; what we are left with is the initialized code and explicit null termination and that is what I was referring here.Disassembled output of both is given to compare the extra amount of code which gets generated and the more code means more cycles.No?
I may have assumed many other things.So my expectation from the
experts is to point out my assumption/mistakes.



The entire thing is fundamentally confused. You're introducing multiple

possible points for there to be errors, and indeed, there's an outright

error in the sample quoted above: You're leaving one uninitialized byte

after the "h" but before the null byte. Agreed.



Assuming this really is a fixed string known at compile time, there is

no reason I can comprehend to do anything but:

char ch[128] = "anish";

Anything else will, at the very *least*, expose you to a much higher

risk of severe bugs.
Yes, but my situation is different as explained above.
But past that... You are treating a call into a library function which

is quite likely an indirect call through a lookup table, may require

runtime resolution of symbols from libraries, and which then does

complicated things which you clearly do not accurately comprehend,

as though it's "one instruction".



If you want a fair comparison, I think good starting points would

be (assume each of these is a function body or some such):



/* #1 */

char ch[128] = "anish"; Can't do this.



/* #2 */

char ch[128] = { 0 };

strcpy(ch, "anish");
This is what I did in one of the program but in kernel strcpy doesn't existand this API is not liked by many people because it opens the gateway for stack overflow attacks.
/* #3 */

char ch[128];

strncpy(ch, "anish", 128); This I can't do.



The key to this last one is that if you want to fairly compare two

things which zero out the entire array, you should zero out the entire

array both times.
Situation is different in my case.There were two suggestions as two programs posted here.So here I am not after fair comparison but rather comparison in whatever form it is.
Unless you're quite sure you don't need the whole array zeroed out.
What I am looking for is a buffer which has the null terminated string onceit is sent by a different function.
And really, this... this is just too tiny to be spending time on. That's

by far the biggest mistake here. Unless you've got a real program where

profiling shows that this function is consuming a noticeable amount of CPU

time, the only thing you should be doing is writing the *clearest* possible

code. Sure.



If you get a string at runtime, and want to store it in an array,

nul-terminated, the right choice would usually be something like strlcpy(),

only that may not be portable enough yet. If you can be confident that

it's short enough to fit, use strcpy(). Circumstances where strncpy()

is the right choice are exceedingly rare.



If I had to do this, and it needed to be null-terminated, and the string

was a parameter:



int example(char *s) {

char ch[128];

snprintf(ch, sizeof(ch), "%s", s);
I can't do this as I am not sure if example is called with a null terminated string.
 
K

Keith Thompson

Malcolm McLean said:
strncpy() is the right choice for setting to a string then padding to
zero. It isn't such a rare thing to need to do, because it can make
string comparisons easier - if you know that fields are always a
multiple of 8 and zero-padded, you can do a wordwise rather than a
bytewise compare, for example.

Many programmers assume that strncpy() is just a "safer" version of
strcpy(). It isn't. It's a bizarre function that can easily leave
the target array without a terminating '\0' (i.e., not containing a
string) if you're not very careful. If you add an explicit check
to ensure that the target is null-terminated, you probably might
as well use strcpy() or memcpy().

http://the-flat-trantor-society.blogspot.com/2012/03/no-strncpy-is-not-safer-strcpy.html
 
J

James Kuyper

char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if the buffer provided is null terminated or not.


Then your examples should have been structured to reflect that fact.
There's nothing in your example to suggest any distinction between a
kernal and userspace.
Assuming this really is a fixed string known at compile time, there is
no reason I can comprehend to do anything but:

char ch[128] = "anish";

Anything else will, at the very *least*, expose you to a much higher
risk of severe bugs.
Yes, but my situation is different as explained above.

It would have been better to explain that at the beginning.

....
This is what I did in one of the program but in kernel strcpy doesn't exist and this API is not liked by many people because it opens the gateway for stack overflow attacks.

For that to be a problem, your program would have to a part of the
kernel. Again, this is something pretty important to know, which should
have been mentioned in your original question.

....
/* #3 */

char ch[128];

strncpy(ch, "anish", 128);
This I can't do.

A simple statement that you can't do it isn't very useful. You'll need
to explain why. The reason why you can't do it isn't obvious, even after
the additional explanations you gave in this message.

....
Situation is different in my case.There were two suggestions as two programs posted here.So here I am not after fair comparison but rather comparison in whatever form it is.

You may only be interested in those two suggestions, but you've not made
it clear why. Why are you only interested in which of two bad ways to
write something is faster? Why are you not interested in the best way to
do the task?

....
What I am looking for is a buffer which has the null terminated string once it is sent by a different function.

So, in other words - you don't need the whole array zeroed out.

....
If I had to do this, and it needed to be null-terminated, and the string
was a parameter:



int example(char *s) {

char ch[128];

snprintf(ch, sizeof(ch), "%s", s);
I can't do this as I am not sure if example is called with a null terminated string.

Both of your original examples called strlen(). That's not a safe
function to call if you don't absolutely know that the argument is a
null terminated string.

Your most recent re-writes both use strncpy(), which is unsafe with a
string that isn't null-terminated, unless you know that it has a minimum
length, if not null-terminated, that is greater than the limit that you
pass to strncpy(). That's exactly the same thing you can say about this
use of snprintf() - so why do you think it's safe to use strncpy(), but
dangerous to snprintf()?

If the argument does not point to a null-terminated string, how does
your program know how much can safely be copied?
 
S

Seebs

strncpy() is the right choice for setting to a string then padding to zero.

No, it isn't. It doesn't guarantee null termination. So it's not the right
choice for setting something to a string, because sometimes it produces a
non-string.
It isn't such a rare thing to need to do, because it can make string
comparisons easier - if you know that fields are always a multiple of 8 and
zero-padded, you can do a wordwise rather than a bytewise compare, for
example.

I have not yet had a circumstance where I could reasonably expect to
beat strcmp().

-s
 
S

Seebs

The reason why I have written the way it is written here is that i want it to closely match what was being reviewed and suggested.There were two suggestions which closely matches the two programs I mentioned above.I can't do this way:
char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if the buffer provided is null terminated or not.

See, this just tells us that what you're doing does NOT match
what was being reviewed.

Because that suggestion is 100% applicable to the code you
provided. So if it isn't applicable to your real problem,
you need to find a way to express what you are really trying to
do.

You appear to be trying to ensure that a buffer is made to contain
a fixed set of bytes, and you don't care about the bytes after that.
But this comment about the buffer being "provided by userspace"
suggests that you are receiving the address of a buffer somewhere
else.

But if that's the case, then
char ch[128] = { 0 };
is also inapplicable, because that's declaring a local buffer, not
initializing a buffer somewhere else.
I didn't understand this.In the above two programs I wrote both
call the same strncpy function and code is generated for both of
them.

First off, that's already bad -- strncpy() is the wrong function
for almost every task.
So if we remove out the common factors i.e. common code; what
we are left with is the initialized code and explicit null termination
and that is what I was referring here.Disassembled output of both
is given to compare the extra amount of code which gets generated
and the more code means more cycles.No?

Not necessarily. First off, it's very easy to come up with cases where
more code means fewer cycles. Look up "loop unrolling".

Secondly, you're spending an inordinate amount of time looking at
what is probably a miniscule fraction of the runtime, because strcnpy
is much more complicated than everything you're doing.

Thirdly, *the entire concept is fundamentally severely broken*. You
should not be thinking about this *at all*.

Imagine that someone is building a house. What you are doing here is
roughly analagous to that person spending three days with laser levels
and fancy tools trying to make sure that the height of one of the door
latches is precise to within a ten thousandth of an inch, without
even considering things like "is this latch on the side opposite the
hinges, rather than the same side as the hinges"?
Yes, but my situation is different as explained above.

Then no advice you get can possibly be relevant to your actual situation.
/* #1 */

char ch[128] = "anish";
Can't do this.

Then show us ACTUAL CODE EXAMPLES THAT HAVE ANYTHING TO DO WITH WHAT
YOU ARE DOING.

If this is not equivalent to the intent of your other code samples, then
they are wrong in *fundamental* ways which mean that *the generated code
you are looking at has nothing whatsoever to do with the actual code you
care about*.
This is what I did in one of the program but in kernel strcpy doesn't exist and this API is not liked by many people because it opens the gateway for stack overflow attacks.

There is no gateway for stack overflow attacks when using a fixed-length
string literal. (Also, I have no idea what kernel you're working on,
but if it's Linux, pretty sure you could be using str*l*cpy, which has
the distinctive advantage that, unlike str*n*cpy, it's not stupid.)

And if you're not using a fixed-length string literal in your real code,
the most important thing by far is going to be for you to come up with
an example that shows the actual use case and the place your string
actually comes from.

Modern compilers will sometimes do HORRIBLE THINGS to code that
includes string operations and string literals. You should not use
string literals if you want to see what will happen with real code
that isn't using them.
/* #3 */

char ch[128];

strncpy(ch, "anish", 128);
This I can't do.

I like the way you provide a detailed explanation of why you can't
do this.
Situation is different in my case.There were two suggestions as
two programs posted here.So here I am not after fair comparison but
rather comparison in whatever form it is.

And yet, everything an expert can conclude from studying those suggestions
is "impossible" for reasons not adequately explained.

This means those "suggestions" are *not actually your real code*. So nothing
we tell you about them is useful.

If you really had a function:
void foo(void) {
char ch[128];
strncpy(ch, "anish", 5);
ch[5] = '\0';
}
it would be totally legit for a compiler to optimize it into *nothing at
all* because it has no effects that can be observed by a conforming program.

Also, if you're doing kernel code, and allocating a buffer on the stack,
you should perhaps be initializing the whole thing, because you know
what matters more than cycle counts? LEAKING KERNEL STACK SPACE MATTERS
MORE THAN CYCLE COUNTS. If you declare a 128-byte buffer, then write
over the first 6 bytes, you may be returning 122 bytes of kernel stack
data to userspace. That's Very Bad.
What I am looking for is a buffer which has the null terminated string once it is sent by a different function.

Maybe give us an example of an *actual complete function* which does what
you want it to do? Something like:
void populate_buffer(char *src);
char ch[128];
strncpy(ch, src, strlen(src) + 1);
}

or
void populate_buffer(void);
char ch[128];
char *src = other_fn();
strncpy(ch, src, strlen(src) + 1);
}

Because how you get the other string matters. Also it matters how
*absolutely* sure you are that the thing you are getting is actually
a string, and that it is under 128 characters.

See, everything so far has been predicated on the notion that there was
even a tenuous connection between the code you posted and your actual
question. So we were talking about a fixed initializer, known at runtime,
which was a string literal.

If you are getting a buffer from another source, everything changes.

Consider what happens with your happy little program here if the
string you're given is 129 characters long.
I can't do this as I am not sure if example is called with a null terminated string.

....

Dude.

DUDE.

WHAT. THE. ****.

Let us look back at your original example:

void foo()
{
char array[128] = {0};
memcpy(array, "testing", strlen("testing"));
}

Now let us imagine that we are calling this with a string, not using a
string literal:

void foo(char *src)
{
char array[128] = {0};
memcpy(array, src, strlen(src));
}

Similarly:
void foo(char *src)
{
char array[128];
memcpy(array, src, strlen(src));
array[strlen(src)] = '\0';
}

Now.

You just said you *do not know* whether src is null-terminated.

What. Do. You. Think. Happens. If. It's. Not?

The reason I went all the way to "what the ****" with this is:

1. You are working on kernel code.
2. You are apparently calling strlen() on something that *may not be
a null-terminated string*.
3. You are responding to review comments.
4. The review comments are arguing about the cycle count for the
initialization, not the fact that you are apparently calling strlen()
on something which may not be a string.

This is, frankly, absolutely terrifying.

-s
 
K

Keith Thompson

James Kuyper said:
On Sunday, November 10, 2013 3:05:35 AM UTC-8, Seebs wrote: ...
The reason why I have written the way it is written here is that i
want it to closely match what was being reviewed and suggested.There
were two suggestions which closely matches the two programs I
mentioned above.I can't do this way:
char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if
the buffer provided is null terminated or not.

Then your examples should have been structured to reflect that fact.
There's nothing in your example to suggest any distinction between a
kernal and userspace.

And since the C language itself has no concept of kernel and userspace,
this might not be the best place for your question. I'm not sure what
is. comp.unix.programmer *might* be ok, but something with "linux" in
its name might be better.

Also, I see you're posting using the badly broken Google Groups
interface. It encourages writing paragraphs as long lines with no
explicit breaks; it's better to keep your lines down to about 72 columns
(don't count on other readers' software to cleanly break text at word
boundaries). And it tends to insert gratuitous blank lines into quoted
text, resulting in text that appears to be double- or quadruple-spaced.

If you must use Google Groups, please copy-and-paste your article into a
decent text editor, fix it up, and then copy-and-paste it back. Better
yet, get an account on a real Usenet server (I use and recommend
news.eternal-september.org) and use a Usenet client program (Mozilla
Thunderbird isn't bad; I use Gnus, which you'll like if and only if you
like Emacs).
 
K

Keith Thompson

Keith Thompson said:
James Kuyper said:
On Sunday, November 10, 2013 3:05:35 AM UTC-8, Seebs wrote: ...
The reason why I have written the way it is written here is that i
want it to closely match what was being reviewed and suggested.There
were two suggestions which closely matches the two programs I
mentioned above.I can't do this way:
char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if
the buffer provided is null terminated or not.

Then your examples should have been structured to reflect that fact.
There's nothing in your example to suggest any distinction between a
kernal and userspace.

And since the C language itself has no concept of kernel and userspace,
this might not be the best place for your question. I'm not sure what
is. comp.unix.programmer *might* be ok, but something with "linux" in
its name might be better.

And that's assuming the kernel in question is Linux, which I don't think
you've actually mentioned.

You are not giving us enough information to help you.
 
A

anish singh

Keith Thompson said:
On 11/10/2013 02:20 PM, anish singh wrote:
The reason why I have written the way it is written here is that i
want it to closely match what was being reviewed and suggested.There
were two suggestions which closely matches the two programs I
mentioned above.I can't do this way:
char ch[128] = "anish";
As the buffer is provided by userspace to the kernel.We don't know if
the buffer provided is null terminated or not.

Then your examples should have been structured to reflect that fact.
There's nothing in your example to suggest any distinction between a
kernal and userspace.
And since the C language itself has no concept of kernel and userspace,
this might not be the best place for your question. I'm not sure what
is. comp.unix.programmer *might* be ok, but something with "linux" in
its name might be better.



And that's assuming the kernel in question is Linux, which I don't think

you've actually mentioned.



You are not giving us enough information to help you.
I apologize for causing this mess by giving a
lot of messed up code and asking something which
doesn't make sense.So I have decided to drop this
topic of deciding which is better code as it
doesn't make sense.However I have this below code
which i think will answer everyone's query.

#define ERROR 1
#define OK 0

/*userspace wants us to copy size bytes*/
int foo(char *buf, int size)
{
char string[128];
if(size >= 127) {
return -ERROR;
}
/*
* what I want here is the null terminated string
* eventhough the caller of foo has not provided
* a null terminated string.If size is 126 then
* I will set 127 byte to 0.
*/
snprintf(string, size+1, "%s", buf);
printf("%s\n", string);
return OK;
}

int main()
{
char *string;
int size=0;
string = malloc(256*sizeof(char));
strcpy(string, "anish"); /*set in userspace*/
/*set in userspace--anything can happen*/
size=5;
foo(string, size); /* check the return value*/
size=6;
strcpy(string, "anishkumar");
foo(string, size);/* check the return value*/
return OK;
}

Is it a good code?
 
G

glen herrmannsfeldt

(snip on strncmp)
Many programmers assume that strncpy() is just a "safer" version of
strcpy(). It isn't. It's a bizarre function that can easily leave
the target array without a terminating '\0' (i.e., not containing a
string) if you're not very careful. If you add an explicit check
to ensure that the target is null-terminated, you probably might
as well use strcpy() or memcpy().

When I use strncmp(), I usually follow the call by a store of zero
(null) into the last array element. I suppose strncmp() could have been
designed to do that, but maybe there are some cases where it isn't
needed or wanted.

-- glen
 
K

Kenny McCormack

(snip on strncmp)


When I use strncmp(), I usually follow the call by a store of zero
(null) into the last array element. I suppose strncmp() could have been
designed to do that, but maybe there are some cases where it isn't
needed or wanted.

You mean, of course, strncpy(), not strncmp().

Two comments:

1) I've heard that strncpy() was originally designed for applications
that needed exactly that - that is, exactly 'n' bytes copied, neither
more nor less. One example is filling the filename blocks on the
PDP11.

2) When I first started working with strncpy(), I, too, found its
default behavior weird, so I wrote a Strncpy() that was a wrapper
around it that, as you suggest, adds in the null byte afterwards.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...
 
J

James Kuyper

In the following "you" refers to Anish, not me - right? In context, it
seems obvious, but I ordinarily would have expected the use of the third
person when referring to him in a reply to me.
 
S

Seebs

I apologize for causing this mess by giving a
lot of messed up code and asking something which
doesn't make sense.So I have decided to drop this
topic of deciding which is better code as it
doesn't make sense.However I have this below code
which i think will answer everyone's query.
Thanks!

#define ERROR 1
#define OK 0
/*userspace wants us to copy size bytes*/
int foo(char *buf, int size)

This makes a huge difference, I'd point out: That the size
of the provided string is precomputed changes all sorts of
things. For instance, consider the earlier versions:
/* 1 */
memcpy(space, buf, strlen(buf));
/* 2 */
memcpy(space, buf, strlen(buf));
buf[strlen(buf)] = '\0';

Here, the second one makes two consecutive calls to strlen()
on the same string, so it's going to need computational time
on the close order of 3x the length of the string. The first
is only making one strlen call, and one memcpy call.

But if we change them to:
/* 1 */
memcpy(space, buf, size);
/* 2 */
memcpy(space, buf, size);
buf[size] = '\0';

Now each is only iterating over the string once. Very different!
{
char string[128];
if(size >= 127) {
return -ERROR;
}
/*
* what I want here is the null terminated string
* eventhough the caller of foo has not provided
* a null terminated string.If size is 126 then
* I will set 127 byte to 0.
*/
snprintf(string, size+1, "%s", buf);
printf("%s\n", string);
return OK;
}

My biggest concern about this is that typically, when kernel code
is using a local buffer, there is some possibility that the buffer may
later be exposed to user code. If you don't zero out the whole
buffer, you can be exposing stack contents to user code.
int main()
{
char *string;
int size=0;
string = malloc(256*sizeof(char));
strcpy(string, "anish"); /*set in userspace*/
/*set in userspace--anything can happen*/
size=5;
foo(string, size); /* check the return value*/
size=6;
strcpy(string, "anishkumar");
foo(string, size);/* check the return value*/
return OK;
}
Is it a good code?

It makes a little more sense, anyway. :)

There are still some issues here.
int foo(char *buf, int size)
{
char string[128];
if(size >= 127) {
return -ERROR;
}
/*
* what I want here is the null terminated string
* eventhough the caller of foo has not provided
* a null terminated string.If size is 126 then
* I will set 127 byte to 0.
*/
snprintf(string, size+1, "%s", buf);
printf("%s\n", string);
return OK;
}

This has at least a couple of serious potential bugs lurking. The most
obvious is that "int" is a signed type, which means size can be negative.
A negative size isn't >= 127, but snprintf(string, (-2)+1, "%s", buf);
would be at risk of overrunning the buffer fairly dramatically, unless
the snprintf implementation is helpfully realizing you can't possibly
have meant that.

There's another, perhaps more subtle, bug which is that your test should
probably be ">127", not ">=127". You can copy 127 characters, plus a
null byte, into a 128-byte string.

That said, I think you are misunderstanding the point of snprintf.
You should ALWAYS specify the amount of space available to you, not
the amount you think you have, for snprintf. So that line should be:
snprintf(string, sizeof(string), "%s", buf);

If you do this, you are guaranteed that you will not overrun "string",
no matter what value size has. So let's see:

int foo(char *buf, int size)
{
char string[128];
int copied;
if (size > (sizeof(string) - 1)) {
return -ERROR;
}
/* use .* to make snprintf stop immediately if it reaches the end
* of the buffer, so if the user hands us a 32kb string, we don't
* spend a lot of time not-copying it to determine how many bytes
* we'd need to copy.
*/
copied = snprintf(string, sizeof(string), "%.*s", sizeof(string), buf);
if (copied != size) {
/* user lied to us: string was shorter
* or longer than expected. */
return -ERROR;
}
printf("%s\n", string);
return OK;
}

That said... I'd still initialize the local variable. I don't know what
you plan to do with string after copying the user's data into it, but if
there is ANY chance that it would ever be part of something returned to
a user function, it should not have uninitialized stack contents in it.

You'll note a lot of use of sizeof(string). That's because it is a really
bad habit to rewrite that number. Let's say you start with:
char string[128];
if(size >= 127) {
return -ERROR;
}

Well, somewhere along the line, someone will add an additional test:
char string[128];
if (!buf) {
return -ERROR;
}
if(size >= 127) {
return -ERROR;
}

Now the "127" and "128" aren't adjacent. Then someone else comes along
and says "hang on, we know what's in this string, and it's impossible
for it to ever exceed 60 bytes. Let's leave a little slop value just
in case, but not that much":
char string[80];
if (!buf) {
return -ERROR;
}
if(size >= 127) {
return -ERROR;
}

And now what happens when someone inserts an 87-byte string? *BOOM*.

I guess the key point I want to get at here is:

C does not care what you are trying to illustrate, or what you think is
important, or anything else. C does what you told it to, exactly, whether
or not that is what you want. If you are going to work in C, remember that
C *does not care*. It is not trying to help you. It is not going to catch
your harmless mistakes, it is not going to divine your intent, it is not
going to do what you meant.

It does EXACTLY what you said. Mostly. Unless it can do something totally
different and be sure that, if you are not cheating, you won't be able to
tell.

Consider the following code:
int main(void) {
char buf[128];
strcpy(buf, "hello, world!");
printf("%s\n", buf);
return 0;
}

Do you think the code generated by the compiler will include a reference
to strcpy()?

I'll give you a hint: On the implementation I tried on, there was no
call to either strcpy or printf.

I'll give you two lines of assembly, and see whether you can figure out
why:

movabsq $8583909746840200552, %rax ## imm = 0x77202C6F6C6C6568
[...]
callq _puts

What do you think that big number ($85839...) represents?

-s
 
J

James Kuyper

On 11/10/2013 05:45 PM, anish singh wrote:
....
I apologize for causing this mess by giving a
lot of messed up code and asking something which
doesn't make sense.So I have decided to drop this
topic of deciding which is better code as it
doesn't make sense.However I have this below code
which i think will answer everyone's query.

#define ERROR 1
#define OK 0

/*userspace wants us to copy size bytes*/
int foo(char *buf, int size)
{
char string[128];
if(size >= 127) {
return -ERROR;
}

If 1 means "ERROR", as suggested by the macro name, what does -1 mean?
If -1 is your actual error return, why not use

#define ERROR (-1)

Also, if you're going to take a signed size value, you should also be
checking whether it's negative. Since the size argument to snprintf()
has the type size_t, which is unsigned, very interesting and probably
undesirable things could happen if size happens to be a negative number.
If this is indeed part of the kernel, it's even more important than
usual to be careful about such issues.
/*
* what I want here is the null terminated string
* eventhough the caller of foo has not provided
* a null terminated string.If size is 126 then
* I will set 127 byte to 0.
*/
snprintf(string, size+1, "%s", buf);

In this context, snprintf() seems the right tool to use.
printf("%s\n", string);
return OK;
}

int main()
{
char *string;
int size=0;
string = malloc(256*sizeof(char));
strcpy(string, "anish"); /*set in userspace*/
/*set in userspace--anything can happen*/
size=5;
foo(string, size); /* check the return value*/

That's a rather interesting comment to put on a line where you very
explicitly fail to check the return value. I think that what you mean is
something I would ordinarily express by a comment saying "error handling
suppressed for the sake of clarity".
size=6;
strcpy(string, "anishkumar");
foo(string, size);/* check the return value*/
return OK;
}

Is it a good code?

The 'size' variable in main() is pretty pointless. This example code
would have been slightly simpler and a little bit clearer by just
directly passing 5 or 6 to foo().
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,123
Messages
2,570,735
Members
47,289
Latest member
KathrynSta

Latest Threads

Top