get number at end of string

J

jake1138

Here is a function I have to get a number at the end of a string. I'm
posting this in case it proves helpful to someone. Comments are
welcome.

int getnum(char *str)
{
char buffer[BUFSIZE];
char *buf = buffer;
int i;

for (i = 0; i < BUFSIZE-1; i++) {
if (str == '\0') break;
if (!isdigit(str)) continue;
*(buf++) = str;
}
*buf = '\0';

if (buffer[0] != '\0') {
return atoi(buffer);
} else {
return -1;
}
}
 
A

Andrey Tarasevich

jake1138 said:
Here is a function I have to get a number at the end of a string. I'm
posting this in case it proves helpful to someone. Comments are
welcome.

int getnum(char *str)
{
char buffer[BUFSIZE];
char *buf = buffer;
int i;

for (i = 0; i < BUFSIZE-1; i++) {
if (str == '\0') break;
if (!isdigit(str)) continue;
*(buf++) = str;
}
*buf = '\0';

if (buffer[0] != '\0') {
return atoi(buffer);
} else {
return -1;
}
}


Firstly, I don't see what it has to do with "getting a number at the end
of a string". This function is not targeted specifically at the "end of
a string" in any way.

Secondly, this function inspects at most 'BUFSIZE' first character of
the input sequence, which is more than strange. What it the input string
(it is supposed to be a string, right?) is longer than 'BUFSIZE'?
 
U

Ulrich Eckhardt

jake1138 said:
Here is a function I have to get a number at the end of a string. I'm
posting this in case it proves helpful to someone. Comments are
welcome.

int getnum(char *str)
{
char buffer[BUFSIZE];
char *buf = buffer;
int i;

for (i = 0; i < BUFSIZE-1; i++) {
if (str == '\0') break;
if (!isdigit(str)) continue;
*(buf++) = str;
}
*buf = '\0';

if (buffer[0] != '\0') {
return atoi(buffer);
} else {
return -1;
}
}


Comments (you asked for 'em):
- Undocumented. This really needs its semantics documented, in particular
the valid types of inputvalues and the behaviour on parsing failure.
- It doesn't modify 'str', use const.
- I'd put the 'buf++' from the loop's body into the header, just as the
check 'if (str == '\0')'. This is my personal preference though.
- Prone to failure due to size of 'buffer'. You check that you don't
overflow it, but if the number doesn't fit, you return faulty results.
- Too complicated. You (try to) copy the number at the end of 'str' to a
temporary buffer and then feed that buffer to atoi(). Instead, search the
beginning of that number and feed that pointer to atoi() directly.
- Your copying is faulty, it concats all digits and then parses the result
as number.

You need to do testcases according to how you want the function to work
(i.e. according to the yet undocumented semantics):

assert( getnum(NULL)==-1);
assert( getnum("")==-1);
assert( getnum("aeu123htn456)==456);
assert( getnum("x:-1")==-1);
....

Uli
 
J

Jens.Toerring

jake1138 said:
Here is a function I have to get a number at the end of a string. I'm
posting this in case it proves helpful to someone. Comments are
welcome.

A bit of nitpicking:
int getnum(char *str)

Sine you don't modify what 'str' is pointing to you might make that

int getnum( const char *str )
{
char buffer[BUFSIZE];
char *buf = buffer;
int i;

Some people think that it is a good habit to check the arguments of
a function as far as possible, and you could test is 'str' is a NULL
pointer and report an error in this case.
for (i = 0; i < BUFSIZE-1; i++) {
if (str == '\0') break;
if (!isdigit(str)) continue;


What's wrong with negative numbers? Your function will return a positive
number for them... And, contrary to the description you gave, your
function will return the first number it finds in the string, which
is not necessarily the one at the end of the string, think of e.g.
"blabla12blabla42blabla".
*(buf++) = str;


You can leave off the parentheses around the 'buf++'. But why do you
think you need to copy the string at all? You can easily use 'str'
itself as a pointer to the current position in the string and later
pass that to atoi() or whatever you're going to use.
}
*buf = '\0';
if (buffer[0] != '\0') {
return atoi(buffer);

While you can safely assume that what's in 'buffer' starts with a digit,
you still don't know if the number there isn't too large to be stored
in an integer - and atoi() gives you no chance to figure that out. For
that reason it is usually more prudent to use strtol().
} else {
return -1;

If you modify the function to deal with negative numbers also you
will have to use a different way to report errors...

Here's a version of your function that takes care of a few of the
issues mentioned (but which still does not handle negative values
correctly!):

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

int getnum( const char *s )
{
long val;

if ( s == NULL )
return -1;

while ( *s && ! isdigit( *s ) )
s++;
if ( ! *s )
return -1 ;

val = strtol( s, NULL, 10 );
if ( errno == ERANGE || val > INT_MAX )
return -1;
return val;
}
Regards, Jens
 
E

Eric Sosman

jake1138 said:
Here is a function I have to get a number at the end of a string. I'm
posting this in case it proves helpful to someone. Comments are
welcome.

int getnum(char *str)

`const char *str' would be a little better, as it
indicates your intention not to modify the string.
{
char buffer[BUFSIZE];
char *buf = buffer;

I can't see any reason to make an extra copy of the
digits; you could just as well convert them directly
from the `str' string (but see below). Why bother with
the copy, especially since it places an artificial limit
on the length of the string you can handle?
int i;

for (i = 0; i < BUFSIZE-1; i++) {
if (str == '\0') break;
if (!isdigit(str)) continue;


Make that `isdigit((unsigned char)str)' to defend
against negative-valued character codes in the string.
*(buf++) = str;


Harmless but unnecessary and funny-looking parentheses.
}
*buf = '\0';

This loop doesn't quite match your "number at the end
of a string" description. For example, consider what it
does with a string like "abc123xyz456". If you actually
intend to extract "123456" from such an input, then the
copy into buffer[] may be justified -- but if you intend
to convert just the "456" the copy is unnecessary and you
need to change the way this loop works.
if (buffer[0] != '\0') {
return atoi(buffer);

Despite its name, atoi() is not a good choice of a
conversion function. The biggest drawback is that it has
no way to report errors in the input (in fact, erroneous
input provokes undefined behavior). And yes: a string
consisting entirely of digits can be erroneous; consider
the string consisting of one million '9' digits ... Try
strtol() instead, so you can check for errors after the
conversion. Alternatively, do the conversion yourself,
one digit at a time -- this may actually be easier.
} else {
return -1;

I was also going to remark on the fact that you've
ignored the possibility of a minus sign before the digit
string, but this `return' makes me think you probably don't
intend to process a sign: "abc-123" is thought of as "abc-"
plus "123", not as "abc" plus "-123". That's fine, but you
ought to say so explicitly in your description of what the
function is supposed to do.

Putting all this together leads to something like
(untested):

#include <stdlib.h>
#include <ctype.h>
#include <limits.h>
#include <errno.h>

int getnum(const char *str) {
const char *ptr;
char *end;
int dseen;
long value;

/* Advance `str' to start of all-digits suffix.
* (Alternating strspn() and strcspn() calls might
* be slicker, but this loop has the virtue of
* simplicity.)
*/
dseen = 0;
for (ptr = str; *ptr != '\0'; ++ptr) {
if (isdigit((unsigned char)*ptr)) {
if (dseen == 0) {
/* Digit string starts here */
str = ptr;
dseen = 1;
}
}
else {
/* Non-digit: suffix not found yet */
dseen = 0;
}
}

/* Can't proceed if no such suffix found */
if (dseen == 0)
return -1;

/* Try to convert the suffix */
errno = 0;
value = strtol(str, &end, 10);

/* Fail if unable to convert or if result is
* too large for an `int'
*/
if (errno == ERANGE || end == str || value > INT_MAX)
return -1;

/* Huzzah! */
return value;
}
 
K

Kiru Sengal

jake1138 said:
Here is a function I have to get a number at the end of a string.

Your function attempts to take all the numeric digits in the string
from left to right, not just the number the end of the string.
"number" as you say, implies both negative and positive variations.
I'm
posting this in case it proves helpful to someone. Comments are
welcome.

int getnum(char *str)

getnum's signature implies that it returns a number (positive or
negative) and takes a pointer to a modifiable stirng (although leaving
out 'const' is not a big deal).
{
char buffer[BUFSIZE];

You are hiding some peculiar "limits" to your function inside the
function. Where did/do you define BUFSIZE? What if the string is too
long, do you signal an error?
char *buf = buffer;
int i;

for (i = 0; i < BUFSIZE-1; i++) {
^^^^^
You are testing i, which is used to index into str not buffer, to
compare against BUFSIZE. This means, buf could still be pointing at
the beginning of your buffer (your buffer is empty) yet the limits of
the for loop can still be breached. You are limitting your function
greatly without need. Although you don't and shouldn't really use an
automatic char array in this situation, if for some reason you had to,
use some other index variable, like j, to point into buffer, and use
that to test against BUFSIZE (that way you know when exactly buffer is
filling up).
if (str == '\0') break;
if (!isdigit(str)) continue;
*(buf++) = str;
}
*buf = '\0';


There is no support for negative numbers or signs in your fuction. It
grabs all the numeric digits from left to right and plops it into a
limitted size array, which is actually not even limitted by it's own
use (see above) but rather by str's use.
if (buffer[0] != '\0') {
return atoi(buffer);

Where is atoi declared?

atoi has it's limitations, there are better functions for conversion
out there (strtol). Why don't you scan your string from right to left
with a pointer, wait until a non numeric value is reached, check the
next left char, if it's a - or +, move the pointer another unit left,
otherwise don't. Then you can pass this pointer into strtol and let it
do the conversion. Once that done, you should check for all types of
failure (associated with strtol/errno.h).
} else {
return -1;

If and when you do add support for negative numbers, this return value
will add confusion. Better let the caller supply the address of some
variable, and use it to signal errors (or some other technique).

Take care
 
A

Andrey Tarasevich

...
Here's a version of your function that takes care of a few of the
issues mentioned (but which still does not handle negative values
correctly!):

But this is not the same. The OP's function was collecting (possibly
non-adjacent) digits from the input string. You function looks for (and
converts) the first continuous sequence of digits in the input string.

Of course, I can only guess what was the OP's real intent.
 
J

Jens.Toerring

But this is not the same. The OP's function was collecting (possibly
non-adjacent) digits from the input string. You function looks for (and
converts) the first continuous sequence of digits in the input string.

Nicely spotted, I did overlook that completely! Perhaps that explains
the copying of the string to a new buffer.
Of course, I can only guess what was the OP's real intent.

I have a hard time coming up with an example where it would make sense
to do that (except perhaps in cases where one wants to skip separators
between digits like in "total population: 1,812,587", but then I would
expect a test for the correct separator character)...

Regards, Jens
 
A

Andrey Tarasevich


BTW, unless I'm missing something, the entire body of your version of
'getnum' function can be replaced with a single 'sscanf' call that will
do essentially the same thing :)

sscanf(s, "%*[^0123456789]%ld", &val);
 
C

Chris Torek

BTW, unless I'm missing something, the entire body of your version of
'getnum' function can be replaced with a single 'sscanf' call that will
do essentially the same thing :)

sscanf(s, "%*[^0123456789]%ld", &val);

No, this will fail when the string consists entirely of digits
(because %[ directives must match at least one character). You
could use:

if (sscanf(s, "%*[^0123456789]%ld", &val) != 1 &&
sscanf(s, "%ld", &val) != 1) ... do something about no-digits-at-all

Note that when scanning something like "-123", the first sscanf()
call does succeed and return 1, so the second call never occurs.
(The negated, assignment-suppressed %[ directive matches the '-'
character and therefore succeeds.)
 
J

Jens.Toerring

BTW, unless I'm missing something, the entire body of your version of
'getnum' function can be replaced with a single 'sscanf' call that will
do essentially the same thing :)
sscanf(s, "%*[^0123456789]%ld", &val);

Beside the problem Chris Torek pointed out I can't see a way to check
for cases where the number in the string is too large to be stored in
an int. And for an all-purpose routine that seems to me to be a rather
important aspect.
Regards, Jens
 
C

CBFalconer

Andrey Tarasevich said:
BTW, unless I'm missing something, the entire body of your version
of 'getnum' function can be replaced with a single 'sscanf' call
that will do essentially the same thing :)
sscanf(s, "%*[^0123456789]%ld", &val);

Beside the problem Chris Torek pointed out I can't see a way to
check for cases where the number in the string is too large to be
stored in an int. And for an all-purpose routine that seems to me
to be a rather important aspect.

Starting at the rear of the string, skip blanks. While char is a
digit, accumulate an integer, checking for overflow. If terminal
char is a sign act accordingly and back up one more. If terminal
char is '.' keep count of digits to form fractional part, and
continue digit scan into integer part. Also watch out for 'e' or
'E'. This should develop a suitable syntax for a backward scan
that yields a number of some form. It may be convenient to place
some form of sentinel at the start of the string.

I don't see any real use for it, however.
 
J

Jens.Toerring

CBFalconer said:
Andrey Tarasevich said:
BTW, unless I'm missing something, the entire body of your version
of 'getnum' function can be replaced with a single 'sscanf' call
that will do essentially the same thing :)
sscanf(s, "%*[^0123456789]%ld", &val);

Beside the problem Chris Torek pointed out I can't see a way to
check for cases where the number in the string is too large to be
stored in an int. And for an all-purpose routine that seems to me
to be a rather important aspect.
Starting at the rear of the string, skip blanks. While char is a
digit, accumulate an integer, checking for overflow. If terminal
char is a sign act accordingly and back up one more. If terminal
char is '.' keep count of digits to form fractional part, and
continue digit scan into integer part. Also watch out for 'e' or
'E'. This should develop a suitable syntax for a backward scan
that yields a number of some form. It may be convenient to place
some form of sentinel at the start of the string.

Sorry, I meant checking for overflow while using sscanf() with some
construct like the one Andrey proposed.

Regards, Jens
 
A

Andrey Tarasevich

BTW, unless I'm missing something, the entire body of your version of
'getnum' function can be replaced with a single 'sscanf' call that will
do essentially the same thing :)
sscanf(s, "%*[^0123456789]%ld", &val);

Beside the problem Chris Torek pointed out I can't see a way to check
for cases where the number in the string is too large to be stored in
an int. And for an all-purpose routine that seems to me to be a rather
important aspect.

Yes, my suggestion indeed suffers miserably from these two problems.
Trying to salvage at least some of it :) I can suggest reducing it to a
mere replacement of the 'while (!is_digit())' cycle. I.e. instead of doing

while ( *s && ! isdigit( *s ) )
s++;

one can use

int d = 0;
sscanf(s, "%*[^0123456789]%n", &d);
s += d;

although I myself would probably stick with the 'while' version (at
least for such a compact test as 'is_digit').
 
J

jake1138

Of course, I can only guess what was the OP's real intent.

Thanks for all the responses. I should have posted how my function was
being used. You all made some very good points.

The current purpose of this function is to get a positive integer at
the end of a string, where the expected string would be something like
"WAN1", "WAN2", "WAN3" ... "WANn" (where n <= 20).

I wasn't sure how to "search backwards" from the end of the string to
do this, so I came up with the function I posted. What would be nice,
for full validation, would be to search forward, specifically looking
for 'W', 'A', 'N', then taking the rest as a number, but I guess I
hadn't thought too much on how to do that.

I could use strstr() to see if "WAN" is in the string, but it doesn't
tell me it's at the beginning of the string and even if it does find
it, it only gives me a pointer to the beginning (but then I could count
up 3 from there). Hmmm, I guess if the return pointer of strstr() is
equal to the string pointer, then it was found at the beginning. I'll
have to think about this a little more.
 
P

Peter Nilsson

jake1138 said:
I should have posted how my function was being used. ...
The current purpose of this function is to get a positive
integer at the end of a string, where the expected string
would be something like "WAN1", "WAN2", "WAN3" ... "WANn"
(where n <= 20).

char ignore;
int n, r = sscanf(blah, "WAN%2d%c", &n, &ignore);
if (r == 1 && 1 <= n && n <= 20)
... WAN n ...
else
...bad input...
 
E

Eric Sosman

jake1138 said:
Thanks for all the responses. I should have posted how my function was
being used. You all made some very good points.

The current purpose of this function is to get a positive integer at
the end of a string, where the expected string would be something like
"WAN1", "WAN2", "WAN3" ... "WANn" (where n <= 20).

I wasn't sure how to "search backwards" from the end of the string to
do this, so I came up with the function I posted. What would be nice,
for full validation, would be to search forward, specifically looking
for 'W', 'A', 'N', then taking the rest as a number, but I guess I
hadn't thought too much on how to do that.

I could use strstr() to see if "WAN" is in the string, but it doesn't
tell me it's at the beginning of the string and even if it does find
it, it only gives me a pointer to the beginning (but then I could count
up 3 from there). Hmmm, I guess if the return pointer of strstr() is
equal to the string pointer, then it was found at the beginning. I'll
have to think about this a little more.

If you'd revealed what you were trying to do in the
first place, this thread would have been a lot shorter.
One word: sscanf().
 
J

jake1138

Peter said:
char ignore;
int n, r = sscanf(blah, "WAN%2d%c", &n, &ignore);
if (r == 1 && 1 <= n && n <= 20)
... WAN n ...
else
...bad input...

Well, that's quite simple. I guess it helps to have a good
understanding of the C standard library. I hadn't used sscanf before.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,161
Messages
2,570,891
Members
47,423
Latest member
henerygril

Latest Threads

Top