Something wrong in my program

  • Thread starter Dominique =?ISO-8859-1?Q?L=E9ger?=
  • Start date
R

Régis Troadec

I'm sorry to post such programming misbehaviours,
but it drives me to ask two questions...See below

August Derleth said:
exit(EXIT_SUCCESS); /* 0 is fine, too */
/* or you could have return 0; just as well */

ugghhh...I forgot to return
char* trim(char* s)
{
char * res;
int cnt = 0;
int cnt2 = 0;

while (isspace(s[cnt]) != 0)
cnt++;

res = (char*)malloc(sizeof(char)*(strlen(s)-cnt+1));
Not testing the return value of malloc() is VERY bad, because if
malloc() returns NULL you will attempt to access memory you don't own.
Which is a source of undefined behavior. Which means your program, OS,
and hardware can literally do /anything/ after that point.

It is important enough that even trivial code should test for a failure,
and some coders (myself included) have created a version of malloc that
will quit the program with an error message instead of returning NULL.
(That can be deeply stupid in some cases, which is why I only use it in
the cases where it makes sense.)

Shame on me, I'll provide much more safe-code next time.
I don't always think to test the retval of malloc
in trivial programs ...I've surely acquired this bad practice by programming
at school
Casting the retval of malloc() is bad:
1. Can hide a failure to #include <stdlib.h> because if a prototype is
not in scope, malloc() implicitly returns an int. This can be dangerous
on some machines.
2. Makes code harder to change later, when you decide to make malloc()
allocate something besides a buffer of char.
3. Demonstrates a lack of knowledge about how a pointer to void works.
4. Introduces unneeded visual clutter.

2 questions :

1. I've heard that explicit conversion from void* to the type of lvalue
was sometimes necessary for portability because the adresses of used types
could
have different formats on the target system, is it true ?
2. I read the FAQ (7.7), and another explanation (surely good) is given :
casting malloc() was required before ANSI/ISO introduced the void* pointer,
only to silence warnings of assignements. Is it the only reason ?
sizeof(char) is unneeded, as sizeof(char) == 1 by definition.
while (s[cnt] != '\0')
res[cnt2++] = s[cnt++];

res[cnt2] = '\0';

return &res[0];

Why not just return res; ?

I find &tab quite easy to understand and interpret as the "i-th element
adress"

Best regards, régis
 
I

Irrwahn Grausewitz

CBFalconer said:
With all this foofaraw, nobody seems to point out that all trim
requires (for leading blanks) is:

const char *trim(const char *s)
{
while ((' ' == *s) || ('\t' == *s)) s++;
return s;
}

You must've missed my first reply to the OP upthread
(ID: (e-mail address removed)), which
contained the following code snippet:

IG: #include <ctype.h>
IG: [...]
IG: char *skipspace( char *s )
IG: {
IG: while ( isspace( *s ) )
IG: s++;
IG: return s;
IG: }

However, some notes:

- I admit that throwing in some const qualifiers at appropriate
places would be an improvement.

- In C99 isblank() could be used instead of isspace(), which
would make the code behave like your version, modulo locale
specifics, which are accounted for by isblank() but not by
your code.

Regards
 
C

Chris Torek

What happens if you try to use a VLA to allocate more memory than your
OS will let your program have?

The Standard does not say.

Several "real world" systems rudely terminate your program with a
SIGILL, often subsequently reported as an "Illegal instruction".
This is rather misleading, as the actual machine instructions
involved were entirely legal ("subtract N from stack pointer") --
it was the resulting stack overflow that was problematic. :) A
termination message of the form "this program tried to use more
memory than you instructed me to allow it" would be considerably
nicer.

(On at least some of these systems, you can arrange to catch
SIGILL signals on an alternate stack, and do something about it.
Or, of course, you can just use malloc() instead of VLAs, but
then you should free() the memory when you are done with it.)
 
M

Mark McIntyre

2 questions :

1. I've heard that explicit conversion from void* to the type of lvalue
was sometimes necessary for portability because the adresses of used types
could have different formats on the target system, is it true ?

In C. this is never true - void* converts implicitly to any other
pointer type. C++ is different.
2. I read the FAQ (7.7), and another explanation (surely good) is given :
casting malloc() was required before ANSI/ISO introduced the void* pointer,
only to silence warnings of assignements. Is it the only reason ?

Yes, in C. The other reason often given is that C++ needs the casts,
so if you are a C++ programmer you get used to them. This is a bad
reason - its like not using the clutch in a manual gearbox because you
habitually drive an automatic. It may be harmless but....
return &res[0];

Why not just return res; ?

I find &tab quite easy to understand and interpret as the "i-th element
adress"


Sure, for you and me, with our decades of experience. But you don't
write code for /your/ ease of understanding, you write it for the
maintenance programmer who comes after you. This maintainer may be a
junior programmer, and you need to make it easy for them. So use the
simplest expression that is correct.

In this case, returning the address of the 0-th element merely
obfuscates the code and makes maintenance harder.
 
D

Dominique =?ISO-8859-1?Q?L=E9ger?=

Bruno said:
Dominique said:
Hello guys,

I'm kinda new to C, and I'm having a hard time with strings. What I'm
trying to do is a simple function that trims spaces & tabs at the
beginning of a
given string. For example, I want this: " Hello World" to become this:
"Hello World". At first glance, my function seems to work, but returns
some strange values...

Here's my code (please pardon the mess):

#include <stdio.h>
#include <string.h>

int main(void){
char *trimbegin(char *text);

Move the function definition above the main(), and you won't have to
declare its prototype here.
char *str = " Hello World!";
char *result = trimbegin(str);
printf("What the function returns: \"%s\"\n", result);
return 0;
}

char *trimbegin(char *text){
int i = 0, j = 0, ok = 0;
int size = strlen(text);
should be a size_t, not an int.
char buffer[size + 1];
gcc -Wall -ansi -pedantic :
warning: ISO C89 forbids variable-size array `buffer'

char *ptr;

printf("Original text is: \"%s\"\n", text);
printf("That's %d characters long...\n", size);
printf("Now, our text buffer can contain %d characters\n", size +
1);
for (i = 0; i <= size; i++){
if (ok == 1){
buffer[j] = text;
j++;
}
else if (isspace(text) == 0 && ok == 0){

gcc -Wall -ansi -pedantic :
warning: implicit declaration of function `isspace'
buffer[j] = text;
j++;
ok = 1;
}
}


Are you sure your algorithm is ok and as simple as it could be ?
printf("What the result is supposed to be: \"%s\"\n", buffer);
ptr = buffer;
return ptr;

Which can be shortened to :
return &buffer[0];
and further to :
return buffer;

in which case you've got an additionnal, and very annoying warning :

gcc -Wall -ansi -pedantic :
warning: function returns address of local variable

You understand that, even if the value returned is an effective memory
address, what becomes of the memory block starting at this address is no
more under your control as soon as the function returns ? This memory
may be reclaimed and reused by the system at any time, so trying to read
at this at address may have any unpredictable result - not talking about
*writing* at this address.
And here's the output:

[dom@localhost C]$ ./a.out
Original text is: " Hello World!"
That's 14 characters long...
Now, our text buffer can contain 15 characters
What the result is supposed to be: "Hello World!"
What the function returns: "Hello World! @8@$÷ÿ¿¸öÿ¿PS@ý@Èöÿ¿"


Why does it return "Hello World! @8@$÷ÿ¿¸öÿ¿PS@ý@Èöÿ¿" and not "Hello
World!"?

First you forgot to copy the terminating '\0' to buffer, so buffer is
char array, but not a string. In C, a string is a char array *terminated
by the char '\0'*. Note that this character is *not* counted by
strlen(), since it's not part of the string, but merely a 'sentinel'
indicating where the string ends (the 'effective' string may be shorter
than the memory block it is contained in).

Then, you return the address of a local automatic variable, which
invokes UB (Undefined Behavior) as soon as you read or write at this
address. Anything can happen, even that it *seems* to work correctly.

Solutions are :
- either dynamically allocate memory for buffer in trimbegin(), or
declare buffer outside trimbegin() and pass its address to trimbegin
(for now you'd better try the second solution)
- make sure that buffer terminates with a '\0' (the simplest way to do
it being to copy the terminating '\0' of the source string)
- eventually rewrite your algorithm to make it simpler

tip 1 : you dont need any test in the for loop body nor the ok flag
tip 2 : the for loop syntax is :
for (<initialisation>;<test>;<on_each_iteration>)
{
<body>
}
where any of <initialisation>, <test>, <on_each_iteration>, and
<body> can be empty.


Feel free to post amended code for review !-)

HTH
Bruno


Here's the amended code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char *trim(const char *text);
int main(void){
char *result = trim(" Hello World!");
printf("What the function returns: \"%s\"\n", result);
return 0;
}

char *trim(const char *text){
int i = 0;
static char buffer[20];
while( isspace(text) ) i++;
strcpy(buffer, &text);
printf("What the result is supposed to be: \"%s\"\n", buffer);
return buffer;
}

A little better, isn't it? :) However, dynamically allocating memory for
'buffer' would definitely be more interesting. I'll try to figure it out on
my own and post the results later...

I have one last question: Everybody here seems to despise the new standard,
C99. Why is that so?

-Dom
 
S

Sidney Cadot

Dominique said:
big snip...
I have one last question: Everybody here seems to despise the new standard,
C99. Why is that so?


I don't think you are gauging opinion on this accurately. While opinions
differ, I think that most people here would agree that C99 has a number
of nice improvements over C89, such as

- introduction of the "restrict" keyword
- locally declared-and-scoped loop variables
- introduction of >= 64-bit ints, and boolean/complex types

Most would probably also agree that '//' type comments should not have
been introduced, but that's another matter :)

Unfortunately, C99 compilers are few and far between; e.g. the GNU C
compiler supports only a subset of the new features. Basically if you
need to write code that must be portable to multiple compilers /
platforms, C89 is the way to go at this point.

One thing that raises my eyebrows once in a while when following the
traffic here, is that the "standards thumpers" usually quote
chapter-and-verse from the C99 standard (this seems to be the consensus
on the 'de jure' standard), while code snippets using new C99 features
usually elicit snide comments - C89 is preferred as the 'de facto'
standard. The latter is perhaps not in small part because C99 code can
look a lot like C++ code :)


Best regards,

Sidney
 
C

CBFalconer

August said:
those who know me have no need of my name wrote:
.... snip ...

What happens if you try to use a VLA to allocate more memory than
your OS will let your program have?

I would expect about the same reaction as to a stack overflow. If
the OS can't simply allocate more stack the program normally goes
boom.
 
C

CBFalconer

those said:
isspace() may be better to use, as it is locale sensitive. then
again some may dislike that carriage control, line termination,
and vertical white-space are also included.

And if I had used it I would have needed to mention some includes,
complicate with casts to unsigned char, etc. to fend off the
ravening hordes of c.l.c nitpickers. I would also have had to
point out the above isspace failings, and the routine would have
been performing calls to isspace, and thus probably be less
efficient. I suspect blanks and tabs are precisely what the OP
wanted to skip, so I said so.
 
C

CBFalconer

Sidney said:
I don't think you are gauging opinion on this accurately. While
opinions differ, I think that most people here would agree that
C99 has a number of nice improvements over C89, such as

In addition the C99 standard is readily available, while the C89
standard is not. So far, if one adheres to C89 the result is
usually C99 valid, but not the reverse. Once GCC becomes fully
C99 compliant, and people built suitable libraries, I think the
rush will begin.
 
A

August Derleth

CBFalconer said:
... snip ...



I would expect about the same reaction as to a stack overflow. If
the OS can't simply allocate more stack the program normally goes
boom.

Eh, ugly. Very ugly.

Between what you and Chris Torek wrote, I'd rather use malloc(): At
least the library function gives me some chance to respond to an error,
even if I need to keep track of my malloc()-free() pairs.

But I wasn't on the C99 committee, and I don't have a Real C99 Compiler
anyway. ;)
 
I

Irrwahn Grausewitz

Irrwahn Grausewitz said:
(ID: (e-mail address removed))
IG: while ( isspace( *s ) )
IG: s++;
<snip>

D'oh! I wonder why no-one objected to the missing cast
to unsigned char; will I ever get this right?!?
Well, at least the code matches the subject line... :)

Regards
 
B

Bruno Desthuilliers

Dominique said:
Bruno Desthuilliers wrote:

(snip)
Feel free to post amended code for review !-)


Here's the amended code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char *trim(const char *text);
int main(void){
char *result = trim(" Hello World!");
printf("What the function returns: \"%s\"\n", result);
return 0;
}

char *trim(const char *text){
int i = 0;
static char buffer[20];
while( isspace(text) ) i++;
strcpy(buffer, &text);


woops ! What happens if buffer is too small ?
printf("What the result is supposed to be: \"%s\"\n", buffer);
return buffer;
}

A little better, isn't it? :)

I see you've carefully read the other answers !-)
But now you have a big security hole in your code (strcpy to a
fixed-size char array), and the code is not thread-safe (static char array).
However, dynamically allocating memory for
'buffer' would definitely be more interesting.

As a learning exercise, certainly. As a practical solution, I'm not sure
this is the better thing to do...
I'll try to figure it out on
my own and post the results later...

I have one last question: Everybody here seems to despise the new standard,
C99. Why is that so?

It's more a matter of C99-compliant compilers availability than 'despise'.

Bruno
 
R

Randy Howard

Eh, ugly. Very ugly.

Between what you and Chris Torek wrote, I'd rather use malloc(): At
least the library function gives me some chance to respond to an error,
even if I need to keep track of my malloc()-free() pairs.

I agree. VLA's don't really provide anything you can't have otherwise
with a lot less downside risk of demon invoking.
 
R

Régis Troadec

Thank you for the answers.

Mark McIntyre said:
2 questions :

1. I've heard that explicit conversion from void* to the type of lvalue
was sometimes necessary for portability because the adresses of used types
could have different formats on the target system, is it true ?

In C. this is never true - void* converts implicitly to any other
pointer type. C++ is different.
2. I read the FAQ (7.7), and another explanation (surely good) is given :
casting malloc() was required before ANSI/ISO introduced the void* pointer,
only to silence warnings of assignements. Is it the only reason ?

Yes, in C. The other reason often given is that C++ needs the casts,
so if you are a C++ programmer you get used to them. This is a bad
reason - its like not using the clutch in a manual gearbox because you
habitually drive an automatic. It may be harmless but....
return &res[0];

Why not just return res; ?

I find &tab quite easy to understand and interpret as the "i-th element
adress"


Sure, for you and me, with our decades of experience. But you don't
write code for /your/ ease of understanding, you write it for the
maintenance programmer who comes after you. This maintainer may be a
junior programmer, and you need to make it easy for them. So use the
simplest expression that is correct.

In this case, returning the address of the 0-th element merely
obfuscates the code and makes maintenance harder.


--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>


----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption

=---
 
R

Richard Heathfield

Mark said:
In C. this is never true - void* converts implicitly to any other
pointer type.

But not without possible loss of information when used with function pointer
types. Caution is indicated.
 
P

pete

Richard said:
But not without possible loss of information
when used with function pointer types. Caution is indicated.

void* doesn't point to functions.
What do you mean ?
 
R

Richard Heathfield

pete said:
void* doesn't point to functions.
What do you mean ?

I mean that it's not safe to point void * to functions. Since function
pointers are certainly of pointer type, the statement "void* converts
implicitly to *any other* pointer type" is a dangerous one IMHO.
 
P

pete

Richard said:
I mean that it's not safe to point void * to functions. Since function
pointers are certainly of pointer type, the statement "void* converts
implicitly to *any other* pointer type" is a dangerous one IMHO.

You're talking about "possible loss of information"
I thought it was undefined behavior and a constraint violation.
 
A

Arthur J. O'Dwyer

You're talking about "possible loss of information"
I thought it was undefined behavior and a constraint violation.

I think pete is right. I suggest the use of the term 'object pointer',
as contrasted with 'function pointer' -- a void* MAY safely be converted
to any other OBJECT POINTER type, as far as I'm aware.

-Arthur
 
R

Richard Heathfield

pete said:
You're talking about "possible loss of information"
Yes.

I thought it was undefined behavior and a constraint violation.

It is certainly undefined behaviour (I wrongly assumed that this would be
obvious from the fact that a loss of information is involved - a rather
drastic thing where a function address is concerned!). But I can't find a
constraint that it violates, but that doesn't mean there isn't one. Do you
have a reference?

Of course, allowing it is a common extension (see informative Annex J). If
it /is/ a constraint violation, then clearly a diagnostic is required. If
not, then it is possible that a newbie might fall into this trap unawares,
so I still think it was worth pointing out the issue.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,135
Messages
2,570,783
Members
47,341
Latest member
hanifree

Latest Threads

Top