Parsing a string

I

Ian Collins

If you don't initialize a variable then I see the only other way is to
make sure you assign some value to the variable before you use it. Is
that right way ?

Some will say yes, some will say no. I say declare it and initialise it
when it's first needed.
 
M

Malcolm McLean

Some will say yes, some will say no.  I say declare it and initialise it
when it's first needed.
The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
int answer;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}
This will often appear to work when you test it.

This
int str_count(char *str, char ch)
{
int answer = 1;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}

will always fail in the same way.
 
I

Ian Collins

The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
int answer;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}
This will often appear to work when you test it.

This
int str_count(char *str, char ch)
{
int answer = 1;

while(*str)
if(*str++ == ch)
answer++;

return answer;
}

will always fail in the same way.

Because answer is declared and initialised when it's first needed.
 
A

arnuld

Some will say yes, some will say no. I say declare it and initialise it
when it's first needed.

You meant declare it and *assign* to it when it's first needed:

int i; /* declaration */
i = 0; /* assignment */

int i = 0; /* declaration and initialization */
 
A

arnuld

The argument for is that it makes bugs deterministic.

Consider this

int str_count(char *str, char ch)
{
  int answer;

  while(*str)
    if(*str++ == ch)
      answer++;

  return answer;}

This will often appear to work when you test it.

No assignment to variable before its used. Thats what I consider a
spottable bug.

This
int str_count(char *str, char ch)
{
  int answer = 1;

  while(*str)
    if(*str++ == ch)
      answer++;

  return answer;

}

will always fail in the same way.

Okay, Now I got what Eric Sosman was pointing out in my code.
Initialized values in my code were nowhere blonging to some reason.
(like here answer should start from zero, reason for its assignment/
initialization). My initialized values were just for the sake of
initialization.
 
E

Eric Sosman

Thanks for clarification else I kept on thinking you were talking about
initializing the array. My use of arrays is clumsy ... hummm....... I
used arrays like this only:

(1) initialize
(2) pass it to function to do the job
(3) print it

So what is clumsy here ?

First, the very existence of two arrays. The code as shown can
certainly do without at least one of them, perhaps both. A variable
that isn't needed (especially a "large" variable) is as clumsy as the
kitchen sink in your car's back seat.

Second, the way you use one of them strikes me as clumsy. You
initialize the entire array to zeroes, just so you can then memcpy()
some characters into the beginning and know that they're followed
by a zero terminator. The initial part of the array is overwritten
by the memcpy(), the tail is never used at all, and there's only
one byte that you actually need to have as a zero. Why not zero
that one necessary byte instead of clumsily zeroing a lot of bytes
you'll never care about?

Parsimony is the soul of programming.
 
E

Eric Sosman

If you don't initialize a variable then I see the only other way is to
make sure you assign some value to the variable before you use it. Is
that right way ?

The variable must be given a value (by whatever means) before you
attempt to use its value. But what you've done is, paraphrasing,

int variable = 42;
variable = 17;

.... and the initialization is pointless.
 
K

Keith Thompson

arnuld said:
You meant declare it and *assign* to it when it's first needed:

int i; /* declaration */
i = 0; /* assignment */

int i = 0; /* declaration and initialization */

What makes you think that was what he meant?

I won't speak for Ian, but my own preference is to use initialization
whenever a sensible initial value can be computed at the point
where the object is declared, and assignment otherwise. In C90,
I might occasionally introduce an inner block so I can delay the
declaration and use an initializer; in C99 <OT>or C++</OT> I might
just declare the object just before it's needed.

If every possible code path assigns a value to a given variable,
you don't actually *need* to initialize it. You might choose to
do so anyway, just for the sake of robustness, to guarantee that
it has some defined value. On the other hand, in such a case if
the initial value is used, it indicates a bug in your program --
one that the compiler might have detected for you if you had omitted
the initializer.

Consider this horribly contrived program:

#include <stdio.h>
#include <time.h>
int main (void)
{
int n;
switch (time(NULL) % 4) {
case 0:
n = 10;
break;
case 1:
n = 42;
break;
case 2:
n = 500;
break;
}
printf("n = %d\n", n);
return 0;
}

25% of the time, it uses the value of n without having initialized it,
which is clearly a bug. When I compile this with the right
optimizations and warnings enabled, I get:

c.c: In function `main':
c.c:5: warning: `n' may be used uninitialized in this function

If I changed the declaration "int n;" to "int n = -1;", I'd get more
predictable behavior (it would print "n = -1" 25% of the time),
but I wouldn't get the warning, since the compiler has no way of
knowing that -1 isn't a perfectly valid value.
 
B

BartC

Eric Sosman said:
On 11/16/2010 11:46 PM, arnuld wrote:
Second, the way you use one of them strikes me as clumsy. You
initialize the entire array to zeroes, just so you can then memcpy()
some characters into the beginning and know that they're followed
by a zero terminator. The initial part of the array is overwritten
by the memcpy(), the tail is never used at all, and there's only
one byte that you actually need to have as a zero. Why not zero
that one necessary byte instead of clumsily zeroing a lot of bytes
you'll never care about?

What's the problem?

During development, it might be useful to know that the array is all-zeros.
It also means any bugs cause the program to fail in a more consistent,
repeatable manner.

And if it's proven the initialisation is not necessary, then perhaps it can
be removed after development is over, as an optimisation step.
 
B

BartC

Eric Sosman said:
The variable must be given a value (by whatever means) before you
attempt to use its value. But what you've done is, paraphrasing,

int variable = 42;
variable = 17;

... and the initialization is pointless.

Until you change it to this:

int variable = 42;
// variable = 17;

or possibly:

if (cond) variable = 17;

(Yes I know this is the same point I made elsewhere in the thread, but is
another example.)
 
I

Ian Collins

What makes you think that was what he meant?

I won't speak for Ian, but my own preference is to use initialization
whenever a sensible initial value can be computed at the point
where the object is declared, and assignment otherwise. In C90,
I might occasionally introduce an inner block so I can delay the
declaration and use an initializer; in C99<OT>or C++</OT> I might
just declare the object just before it's needed.

That's also my preference.
 
E

Eric Sosman

Until you change it to this:

int variable = 42;
// variable = 17;

or possibly:

if (cond) variable = 17;

(Yes I know this is the same point I made elsewhere in the thread, but
is another example.)

Let me get this straight: Do you object to an assessment of
Program A because it might not apply to a completely different
Program B?
 
K

Keith Thompson

Eric Sosman said:
On 11/17/2010 12:11 PM, BartC wrote: [...]
During development, it might be useful to know that the array is
all-zeros. It also means any bugs cause the program to fail in a more
consistent, repeatable manner.

(Shrug.) It might be even more useful if the compiler's data
flow analysis could produce a warning about an uninitialized variable,
which of course it cannot if the variable is initialized. You pays
your money and you takes your choice: You silence the compiler and
gain a repeatable (though not necessarily correct) outcome, or you
leave the compiler the opportunity to warn (it may or may not do so),
with the drawback that the behavior of the erroneous code might not
be reproducible.
[...]

It might be nice to have a language feature that lets you initialize
an object while telling the compiler that you never want to use
the initial value. The generated code would initialize the object
(though the initialization could be removed if the compiler can
prove that the value is never used), but the compiler could still
perform data flow analysis and warn you if it determines that the
value might be used.
 
B

Barry Schwarz

I wrote this program (parser ?) to extract two values from a string
separated by a comma. I extract the values without any problem. Error
checking is there with different kinds of input but in comparison to clc
folks my error checking always looks like Calvin and Hobbes plan for
saving themselves form world war 3.

As usual, any advice om improving the program ? (I wonder why the heck
the value of *pname never changes even when pointer pname is going to
next element for each loop)


/* A program that searches for 2 comma separated values inside a string:
name and number. Name can be anything while number can only be
* an integer greater than zero.
*
* VERSION 0.0
*/

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

enum { SIZE_NAME = 10 };

void get_values(const char*, char*, const int, int*);

int main(void)
{
const char* arr1 = "Arnul@23,1";
const char* arr2 = "00#$%^&@23,Arnuld";
const char* arr3 = "Arnuld,";
char name[SIZE_NAME+1] = {0};
int num = -1;

printf("arr: %s\tname: %s\tnum: %d\n", arr3, name, num);
printf("\n");

get_values(arr3, name, SIZE_NAME, &num);
printf("name = %s\tnum = %d\n", name, num);

return 0;
}



void get_values(const char* p, char* pname, const int pname_size, int*
pnum)
{
int idx;

for(idx = 0; (pname_size != idx) && *p && (',' != *p); ++p, ++idx, +
+pname)
{
printf("*pname = %c, *tmp = %c\n", *pname, *p);

Since there is no variable tmp, your choice of text is just
misleading.

What is the value of *pname (arr3[0]) when you enter this function?
What would you expect it to look like on your display?
*pname = *p;

You store a value in *pname, and then increment pname as part of the
for statement. What is the value of *pname immediately after the
first increment (arr3[1])? After the second increment (arr3[2])?
}

errno = 0;
*pnum = (int) strtol(p+1, NULL, 10);
if(errno)
{
perror("***ERROR***, can not convert string \"Max Phone Lines\"
into integer\n");

The only value guaranteed to be stored in errno is ERANGE. If that is
the case, then the previous assignment statement already invoked
implementation defined behavior which could preclude reaching the call
to perror.
 
B

Barry Schwarz

Here is the strchr() and memcpy() version, any issues in this:

Why do you post multiple versions of a program without waiting for or
including the comments on previous versions?
 
A

arnuld

(Shrug.) It might be even more useful if the compiler's data
flow analysis could produce a warning about an uninitialized variable,
which of course it cannot if the variable is initialized. You pays your
money and you takes your choice: You silence the compiler and gain a
repeatable (though not necessarily correct) outcome, or you leave the
compiler the opportunity to warn (it may or may not do so), with the
drawback that the behavior of the erroneous code might not be
reproducible.

I want to make some things clear here before I speak more. There are 2
things you can learn from clc or any programming language newsgroup:

(1) Genuine/Proper/Good/Real/(fill your favorite word here) way of
programming in that language. Its only because of clc that I *always*
compile with -ansi (-std=c99) -pedantic -Wall -Wextra flags to gcc.

(2) Style of Programming: This is personal. e.g. Richard Heathfield
considers it as a bug if his function contains more than one return
statement, while others consider having return statements at 3/4/N places
in the functions as a fine idea.


So, what you are saying belongs to (1) or (2) ? I think its (1) but I
want to be sure.



Ah. We can all be heroes and collect large bonuses for speeding
up code by umpty-huge percent. All we need to do is write a useless
delay loop in the original, and later remove it as an "optimization."
Ten out of ten for gaming the system, but zero points for honesty.

Thats why I love clc :)
 
A

arnuld

Why do you post multiple versions of a program without waiting for or
including the comments on previous versions?

The comments from one person made me realize that program was designed in
a very bad way, so I had to correct it. Is correcting after one comment
is wrong ?
 
T

Tim Rentsch

arnuld said:
I want to make some things clear here before I speak more. There are 2
things you can learn from clc or any programming language newsgroup:

(1) Genuine/Proper/Good/Real/(fill your favorite word here) way of
programming in that language. Its only because of clc that I *always*
compile with -ansi (-std=c99) -pedantic -Wall -Wextra flags to gcc.

(2) Style of Programming: This is personal. e.g. Richard Heathfield
considers it as a bug if his function contains more than one return
statement, while others consider having return statements at 3/4/N places
in the functions as a fine idea.


So, what you are saying belongs to (1) or (2) ? I think its (1) but I
want to be sure.

I consider it (2), ie, there are reasonable arguments on both
sides (even though, at the same time, most often I would agree
with the practice Eric Sosman advocates). There isn't any single
right answer because it depends on what compilers and tools are
available, not to mention numerous other potential factors.
 
B

BartC

Let me get this straight: Do you object to an assessment of
Program A because it might not apply to a completely different
Program B?

OK. I just see it differently; ie. as providing a known default value in a
program structure that might be subject to change, or that might have an as
yet unperceived bug. So it's just good practice.

And if that's obvious the initialisation is unnecessary, perhaps the
compiler will remove it.

(BTW why is it that uninitialized static variables are generally all set to
zero, rather than some completely arbitrary bit-pattern that is different on
every run?)
 
C

Chris M. Thomasson

arnuld said:
I wrote this program (parser ?) to extract two values from a string
separated by a comma. I extract the values without any problem. Error
checking is there with different kinds of input but in comparison to clc
folks my error checking always looks like Calvin and Hobbes plan for
saving themselves form world war 3.

As usual, any advice om improving the program ? (I wonder why the heck
the value of *pname never changes even when pointer pname is going to
next element for each loop)


/* A program that searches for 2 comma separated values inside a string:
name and number. Name can be anything while number can only be
* an integer greater than zero.

[...]

What about something simple like this:


http://codepad.org/soY31mYl
_____________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <limits.h>




#define STRTOL_VALIDATE(mp_number) \
(! ((! (mp_number) \
|| (mp_number) == LONG_MAX \
|| (mp_number) == LONG_MIN) \
&& errno))




struct name_and_number
{
char* name;
int number;
};




struct name_and_number*
name_and_number_parse(
char const* string
){
long int number;
size_t name_size;
struct name_and_number* self;
char const* name = strchr(string, ',');

if (! name) return NULL;
errno = 0;
number = strtol(name + 1, NULL, 10);
if (! STRTOL_VALIDATE(number) || ((int)number) < 1) return NULL;
name_size = name - string;
if (! name_size) return NULL;

if ((self = malloc(sizeof(*self) + name_size + 1)))
{
self->name = (char*)(self + 1);
memcpy(self->name, string, name_size);
self->name[name_size] = '\0';
self->number = (int)number;
}

return self;
}


#define name_and_number_destroy free


void
name_and_number_parse_and_display(
char const* string
){
struct name_and_number* parsed;

if ((parsed = name_and_number_parse(string)))
{
printf("origin: %s\nname: %s\nnumber: %d\n\n",
string, parsed->name, parsed->number);

name_and_number_destroy(parsed);
}

else
{
printf("failed to parse: %s\n\n", string);
}

puts("------------------------------------------");
}




int
main(void)
{
name_and_number_parse_and_display("Arnul@23,1");
name_and_number_parse_and_display("00#$%^&@23,Arnuld");
name_and_number_parse_and_display("Arnuld,");
name_and_number_parse_and_display("Chris Thomasson, 12.65");
name_and_number_parse_and_display("John Doe,-1234");
name_and_number_parse_and_display("Jane Doe");
name_and_number_parse_and_display("");
name_and_number_parse_and_display("1234,34345,6");
name_and_number_parse_and_display("lzkfjclkdsjfaj,asd123");

puts("\n\n\n___________________________________________\n"
"The program has completed; hit <ENTER> to exit...");

getchar();

return 0;
}
_____________________________________________________________



?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,079
Messages
2,570,575
Members
47,207
Latest member
HelenaCani

Latest Threads

Top