String parsing program

pereges

Hi, I have a string input and I have to parse it in such a way that
there can be only white space until a digit is reached and, once a digit
is reached, there can be only digits or white space until the string
ends. Am I doing this correctly?

Code:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[50];
    int i = 0;

    gets(s);

    while (isspace(s[i]))
        i++;
    while (isdigit(s[i]))
        i++;
    while (isspace(s[i]))
        i++;
    if (s[i] != '\0')
        printf("\nIncorrect string\n");

    return (0);
}

I actually want to convert a string to unsigned long. So this kind of
check should be carried out before calling strtoul, to work around
some of the weaknesses of strtoul, such as converting 123aaaaa to 123
or -123 to some unsigned value. It will also ensure that when you have
a string like:

1234 78

1234 is not returned but an error message is printed, because a
string should contain only one number in my program.
 
santosh

pereges said:
Hi, I have a string input and I have to parse it in such a way that
there can be only white space until a digit is reached and, once a digit
is reached, there can be only digits or white space until the string
ends. Am I doing this correctly?

Code:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[50];
    int i = 0;

    gets(s);

    while (isspace(s[i]))
        i++;
    while (isdigit(s[i]))
        i++;
    while (isspace(s[i]))
        i++;
    if (s[i] != '\0')
        printf("\nIncorrect string\n");

    return (0);
}


A string completely of whitespace will pass your test.

<snip>
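
One way to close that gap, as a rough sketch only: count the digits while scanning and insist on at least one. The name is_single_number is made up for illustration, and the unsigned char casts are an addition that is not in the posted code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s consists of optional white
   space, one or more digits, then optional white space, and nothing
   else. */
bool is_single_number(const char *s)
{
    size_t i = 0;
    size_t digits = 0;

    while (isspace((unsigned char)s[i]))
        i++;
    while (isdigit((unsigned char)s[i])) {
        i++;
        digits++;
    }
    while (isspace((unsigned char)s[i]))
        i++;

    /* Require at least one digit and no trailing junk. */
    return digits > 0 && s[i] == '\0';
}
```

With this, "  123  " is accepted, while "   ", "1234 78", "123aaaaa" and "-123" are all rejected.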
 
pereges

santosh said:
A string completely of whitespace will pass your test.

It will be caught by strtoul, I think. But anyway one can extend it
for that case as well:

char s[50];
int i = 0;

gets(s);

while (isspace(s[i]))
    i++;
if (s[i] == '\0')
    printf("Invalid string\n");
 
santosh

pereges said:
A string completely of whitespace will pass your test.

It will be caught by strtoul, I think. But anyway one can extend it
for that case as well:

char s[50];
int i = 0;

gets(s);

while (isspace(s[i]))
    i++;
if (s[i] == '\0')
    printf("Invalid string\n");


Also isspace will return true for whitespace characters like vertical
tab, newline, carriage return and form feed. If you only want to allow
space and horizontal tab in input then consider isblank.
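
To make the difference concrete, here is a small sketch (the helper names are invented for this example, and it assumes the default "C" locale and a C99 library, since isblank was added in C99):

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if every character of s satisfies isspace. */
bool only_isspace(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isspace((unsigned char)*s))
            return false;
    return true;
}

/* Hypothetical helper: true if every character of s satisfies isblank,
   i.e. is a space or a horizontal tab in the "C" locale. */
bool only_isblank(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isblank((unsigned char)*s))
            return false;
    return true;
}
```

A string such as " \t\n" passes only_isspace but fails only_isblank, because of the newline.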
 
pereges

santosh said:
Also isspace will return true for whitespace characters like vertical
tab, newline, carriage return and form feed. If you only want to allow
space and horizontal tab in input then consider isblank.

Thanks for the suggestion but from what I see, it works with isspace
as well. Btw here's my program for parsing doubles/floats (not in
exponential form) :

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char s[50];
    int i;

    gets(s);

    i = 0;

    while (isblank(s[i]))
    {
        i++;
    }

    if (s[i] == '+' || s[i] == '-')
    {
        i++;
    }

    if (isdigit(s[i]))
    {
        while (isdigit(s[i]))
        {
            i++;
        }

        if (s[i] == '.')
        {
            i++;

            if (isdigit(s[i]))
            {
                while (isdigit(s[i]))
                {
                    i++;
                }
                while (isblank(s[i]))
                {
                    i++;
                }
                if (s[i] != '\0')
                {
                    printf("Invalid String\n");
                    return (EXIT_FAILURE);
                }
            }
            else
            {
                printf("Invalid String\n");
                return (EXIT_FAILURE);
            }
        }
        else
        {
            printf("Invalid string\n");
            return (EXIT_FAILURE);
        }
    }
    else
    {
        printf("Invalid string\n");
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}
 
santosh

pereges said:
Thanks for the suggestion but from what I see, it works with isspace
as well. Btw here's my program for parsing doubles/floats (not in
exponential form) :

<snip code>

Your code exhibits undefined behaviour because you have failed to
include ctype.h where the declarations for the is* functions are.
Didn't your compiler warn you about missing declarations? If not, then
set it to the highest possible ISO C conformance and diagnostic levels.
 
pereges

santosh said:
Your code exhibits undefined behaviour because you have failed to
include ctype.h where the declarations for the is* functions are.
Didn't your compiler warn you about missing declarations? If not, then
set it to the highest possible ISO C conformance and diagnostic levels.

I'm using the Digital Mars compiler. Yes, I have included ctype.h now
and it still works (for the few examples I tried).
 
santosh

pereges said:
I'm using the Digital Mars compiler. Yes, I have included ctype.h now
and it still works (for the few examples I tried).

One of the first things to do after installing a compiler is to read
its documentation and find out which switches enable the strictest
conformance to ISO C and the maximum possible diagnostics. That is of
the greatest help when attempting to write robust, standard C
programs.

For Digital Mars you would probably want to use the '-A95' or '-A99'
option. All warnings are apparently enabled by default. Also consider
the '-r' option, which would have warned you about not including
ctype.h in the above code. You can also use the '-p' switch to turn off
the "autoprototyping" feature, which can be dangerous for newbies.
 
Ben Bacarisse

pereges said:
Hi, I have a string input and I have to parse it in such a way that
there can be only white space until a digit is reached and, once a digit
is reached, there can be only digits or white space until the string
ends.
I actually want to convert a string to unsigned long. So this kind of
check should be carried out before calling strtoul, to work around
some of the weaknesses of strtoul, such as converting 123aaaaa to 123
or -123 to some unsigned value.

I am not a fan of pre-scanning. It seems like duplicating the effort
already put in by the library author! I think you have been
(slightly) led astray -- in part because people have just answered the
questions you've asked, and in part because I am ignorant! (See
below...)

pereges said:
This will also ensure that when you have a string like :

1234 78

1234 is not returned but an error message will be printed. Because a
string should only contain 1 number in my program.

The simplest way to scan for a number whilst reporting bad input is to
use the signed strtol function. You check that errno (set to 0 before
the call) has not been set to ERANGE and that the end-pointer is not
the string you passed in. If you like, you can now check that nothing
but white space is left in the string. Finally, you confirm the input
is in the range your program expects. The signed version lets you
detect input like -123. The downside is that you lose half the range
of possible inputs. If that matters, you can (probably) go up to
strtoll.
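
A minimal sketch of those steps, assuming base 10 and that any non-negative long is acceptable (the function name parse_nonnegative and its interface are invented for this illustration):

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse exactly one non-negative decimal number
   from s. On success, stores it in *out and returns true. */
bool parse_nonnegative(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                    /* so a later ERANGE is ours */
    value = strtol(s, &end, 10);

    if (end == s)                 /* nothing was converted */
        return false;
    if (errno == ERANGE)          /* does not fit in a long */
        return false;
    if (value < 0)                /* input such as "-123" */
        return false;

    /* Allow trailing white space, but nothing else. */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')             /* e.g. "123aaaaa" or "1234 78" */
        return false;

    *out = value;
    return true;
}
```

Note that this sketch still accepts a leading '+' and "-0", because strtol does; whether that matters depends on the input rules.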

[Aside. I feel I must "come clean". Until today I did not know that
strtoul accepted "-123" as a valid number[1]. Of course it does the
right thing with it but you can't tell, from the result alone, that
the input was not 4294967173[2]. If I'd been more clued up on that at
the start, I'd have advised the use of strtol right from the get-go.]

[1] Well, I might have known. It seems a strangely familiar
discovery, but it was not up there at the front of my brain where it
was needed to give the best advice. The OP is validating input and,
for most application end users, C's interpretation of (unsigned
long)-123 is just baffling. strtoul is not the right tool.

[2] YMMV
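
For anyone who wants to see the effect, a tiny sketch (the wrapper name convert_unchecked is made up here). The C standard says a leading minus sign makes strtoul negate the converted value in the return type, so the result depends on the width of unsigned long:

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: converts s with strtoul and no checking at
   all, to show what happens to input that starts with a minus sign. */
unsigned long convert_unchecked(const char *s)
{
    return strtoul(s, NULL, 10);
}
```

convert_unchecked("-123") yields ULONG_MAX - 122, which is 4294967173 where unsigned long is 32 bits wide and a much larger number where it is 64 bits wide.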
 
Ben Bacarisse

pereges said:
Thanks for the suggestion but from what I see, it works with isspace
as well. Btw here's my program for parsing doubles/floats (not in
exponential form) :

STOP!

This way madness lies. If you need to enforce a simplified input
syntax then, OK, I see the point but otherwise strtod will do it all
for you. strtoul has an oddity in that it accepts some strings that
might confuse your users, but I don't think strtod has any similar
problems. Of course, I am hardly an authority in the area now!

I think you are making work for yourself.
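
As a rough sketch of letting strtod do the work (the name parse_double and its interface are assumptions made for this example):

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse exactly one floating-point number from s.
   On success, stores it in *out and returns true. */
bool parse_double(const char *s, double *out)
{
    char *end;
    double value;

    errno = 0;
    value = strtod(s, &end);

    if (end == s)                 /* nothing was converted */
        return false;
    if (errno == ERANGE)          /* overflow (or possibly underflow) */
        return false;

    /* Allow trailing white space, but nothing else. */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return false;

    *out = value;
    return true;
}
```

Unlike the hand-written scanner, strtod also accepts plain integers, exponents such as 1e3 and, in C99, hexadecimal floats, "inf" and "nan". If the input syntax really must exclude those, some extra checking is still needed.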
 
pereges

<snip>
The simplest way to scan for a number whilst reporting bad input is to
use the signed strtol function. You check that errno has not been set
to ERANGE and that the end-pointer is not the string you passed in.
If you like, you can now check that nothing but white space is left in
the string. Finally, you confirm the input is in the range your program
expects. The signed version lets you detect input like -123. The
downside is that you lose half the range of possible inputs. If that
matters, you can (probably) go up to strtoll.
<snip>

My input is of the following format:

45 5666 16000

^^ All of that is just a single string. I need to read the three
numbers into 3 different size_t variables. There can be white space
between them but no letters or any other characters. It is possible to
check that everything from *endp onwards is nothing but white space
(this must be done when errno != ERANGE and s != endp, i.e. when one
usually expects correct output); any other character means the data is
erroneous. With unsigned long, you can check whether the first
non-white-space character is a '-'. This should solve the problem of
negative numbers as well, and prevent their conversion to some unsigned
value beforehand.
 
Default User

santosh said:
Your code exhibits undefined behaviour because you have failed to
include ctype.h where the declarations for the is* functions are.

Not really. The default declarations will do for those. It's not good
practice, of course.




Brian
 
Ben Bacarisse

pereges said:
My input is of the following format:

45 5666 16000

^^ All of that is just a single string. I need to read the three
numbers into 3 different size_t variables. There can be white space
between them but no letters or any other characters. It is possible to
check that everything from *endp onwards is nothing but white space
(this must be done when errno != ERANGE and s != endp, i.e. when one
usually expects correct output); any other character means the data is
erroneous. With unsigned long, you can check whether the first
non-white-space character is a '-'. This should solve the problem of
negative numbers as well, and prevent their conversion to some unsigned
value beforehand.

Here is one way based on using the widest signed type for input. It
is not ideal, but then without very detailed specs, what could be? If
you need to accept input right up to SIZE_MAX and you have an
implementation where intmax_t can't hold that value, then you will
need to use strtoumax and check for the - manually, so to speak. That
would not be a big change to parse_size.

#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>
#include <stdint.h>
#include <errno.h>
#include <ctype.h>

size_t parse_size(const char *num, const char **endp, bool *error)
{
    char *ep;
    errno = 0;
    intmax_t imax = strtoimax(num, &ep, 10);
    if (errno == ERANGE || imax < 0 || (uintmax_t)imax > SIZE_MAX) {
        /* Skip leading space so the message shows only the number. */
        while (isspace((unsigned char)*num))
            num++;
        /* The precision argument for %.*s must be an int. */
        fprintf(stderr, "Input \"%.*s\" out of range.\n",
                (int)(ep - num), num);
        if (error)
            *error = true;
    }
    else if (ep == num) {
        /* Skip to the next space-delimited portion of the string. */
        while (isspace((unsigned char)*ep))
            ep++;
        num = ep;
        while (*ep != '\0' && !isspace((unsigned char)*ep))
            ep++;
        fprintf(stderr, "Input \"%.*s\" could not be converted.\n",
                (int)(ep - num), num);
        if (error)
            *error = true;
    }
    if (endp)
        *endp = ep;
    return imax;
}

void parse_three_sizes(const char *input)
{
    bool errors = false;
    const char *ep;
    size_t s1 = parse_size(input, &ep, &errors);
    size_t s2 = parse_size(ep, &ep, &errors);
    size_t s3 = parse_size(ep, &ep, &errors);

    if (!errors) {
        /* Check that everything parsed. */
        const char *save_ep = ep;
        while (isspace((unsigned char)*ep))
            ep++;
        if (*ep != '\0')
            fprintf(stderr, "Superfluous input found: \"%s\"\n", save_ep);
        printf("Got: %zu %zu %zu\n", s1, s2, s3);
    }
}

int main(int argc, char **argv)
{
    if (argc > 1)
        parse_three_sizes(argv[1]);
    return 0;
}
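
The strtoumax variant mentioned before the code might look roughly like this. It is only a sketch: the names parse_size_unsigned and parse_three are invented, error reporting is reduced to a bool, and rejecting a leading '+' as well as '-' is an assumption about what the input should allow.

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of parse_size: uses strtoumax so that values up
   to SIZE_MAX can be accepted, and checks for a sign by hand. */
bool parse_size_unsigned(const char *num, const char **endp, size_t *out)
{
    char *ep;
    uintmax_t umax;

    while (isspace((unsigned char)*num))
        num++;
    /* Rejects '-', '+', letters and empty input before converting. */
    if (!isdigit((unsigned char)*num))
        return false;

    errno = 0;
    umax = strtoumax(num, &ep, 10);
    if (errno == ERANGE || umax > SIZE_MAX)
        return false;

    if (endp)
        *endp = ep;
    *out = (size_t)umax;
    return true;
}

/* Hypothetical driver: exactly three numbers, white space only between
   and around them. */
bool parse_three(const char *input, size_t out[3])
{
    const char *p = input;

    for (int k = 0; k < 3; k++)
        if (!parse_size_unsigned(p, &p, &out[k]))
            return false;

    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0';
}
```

For "45 5666 16000" parse_three fills the array with 45, 5666 and 16000; it returns false for "45 -5666 16000", for fewer or more than three numbers, and for stray characters such as "45 56a6 16000".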
 
Keith Thompson

Default User said:
Not really. The default declarations will do for those. It's not good
practice, of course.

That works only in C90. Even though there are few full C99 compilers,
it doesn't hurt to write good C90 code that's also valid C99 code.

And the is* and to* functions are very likely to be implemented as
macros, with better performance than the function calls. By not
including the header, you miss out on the macro definitions.
 
Default User

Keith said:
That works only in C90. Even though there are few full C99 compilers,
it doesn't hurt to write good C90 code that's also valid C99 code.

So? In that case there's a required diagnostic for the missing
declaration.

Keith said:
And the is* and to* functions are very likely to be implemented as
macros, with better performance than the function calls. By not
including the header, you miss out on the macro definitions.

But not undefined behavior.




Brian
 
Peter Nilsson

pereges said:
Hi, I have a string input and I have to parse it in such a way that
there can be only white space until a digit is reached and, once a digit
is reached, there can be only digits or white space until the string
ends. Am I doing this correctly?

Code:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[50];
    int i = 0;

    gets(s);

Real bad example.

pereges said:
    while (isspace(s[i]))

    while (isspace((unsigned char) s[i]))

pereges said:
        i++;
    while (isdigit(s[i]))
        i++;

This does not _require_ a digit to be present.

pereges said:
    while (isspace(s[i]))
        i++;
    if (s[i] != '\0')
        printf("\nIncorrect string\n");
    return (0);
}

I want to actually convert a string to unsigned long. So this
kind of algorithm should be carried out prior to strtoul
function to ensure that some of the weakness from which
the strtoul function suffers like converting 123aaaaa to 123

That is not a weakness but a strength.

    x = strtoul(str, &endp, 0);

    if (endp != str)
        while (isspace((unsigned char) *endp))
            endp++;

    if (endp != str && *endp == 0)
        /* all good */;

pereges said:
for eg or -123 to some unsigned value is removed.

    if (endp != str && *endp == 0 && strchr(str, '-') == 0)
        /* all good */;
 
Keith Thompson

Default User said:
So? In that case there's a required diagnostic for the missing
declaration.


But not undefined behavior.

You're right, there's no undefined behavior in either C90 or C99.

(Well, there's a constraint violation in C99; if the implementation
accepts the program in spite of that, after issuing the required
diagnostic, then the behavior is undefined. But that's stretching the
point.)

Failing to include <ctype.h> when using the is* or to* functions is
still a bad idea, of course.
 
Default User

Keith said:
You're right, there's no undefined behavior in either C90 or C99.

(Well, there's a constraint violation in C99; if the implementation
accepts the program in spite of that, after issuing the required
diagnostic, then the behavior is undefined. But that's stretching the
point.)

Failing to include <ctype.h> when using the is* or to* functions is
still a bad idea, of course.


Which is probably why I said, "It's not good practice, of course."





Brian
 
