How to check if a string "is" an int?

Erik Max Francis

Steven said:
In that case, the name is misleadingly wrong. I suppose it is not likely
that it could be changed before Python 3?

Why?

The primary purpose of the .isdigit, etc. methods is to test whether a
single character has a certain property. There is, however, no special
character data type in Python, and so by necessity those methods must be
on strings, not characters.

Thus, you have basically two choices: Have the methods throw exceptions
for strings with a length different from one, or have them just iterate
over every character in a string. The latter is clearly a more useful
functionality.
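
The two choices can be sketched roughly as follows. The helper names `isdigit_strict` and `isdigit_each` are hypothetical, made up for illustration; only `str.isdigit` is a real method.

```python
def isdigit_strict(s):
    """Hypothetical variant: reject anything that is not exactly one character."""
    if len(s) != 1:
        raise TypeError("expected a single character, got length %d" % len(s))
    return s.isdigit()


def isdigit_each(s):
    """Hypothetical variant: test every character of a non-empty string."""
    return len(s) > 0 and all(ch.isdigit() for ch in s)


# The second variant agrees with the built-in method on these inputs.
for text in ("7", "15", "1x"):
    print(repr(text), isdigit_each(text), text.isdigit())
```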
 
David Rasmussen

Daniel said:
others already answered, this is just an idea

I guess, if we want to avoid the exception paradigm for a particular
problem, we could just do something like:

def isNumber(n):
    try:
        dummy = int(n)
        return True
    except ValueError:
        return False

and use that function from wherever in the program.
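
A short usage sketch of such a helper on a few sample strings (the inputs are chosen only for illustration). Note that int() tolerates a leading sign and surrounding whitespace, and that a float argument such as 1.5 would be truncated rather than rejected.

```python
def isNumber(n):
    """Return True if int(n) succeeds, False if it raises ValueError."""
    try:
        int(n)
        return True
    except ValueError:
        return False


# Print the verdict for a handful of sample inputs.
for text in ["42", "-7", " 12 ", "1.5", "abc", ""]:
    print(repr(text), isNumber(text))
```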

/David
 
Paul Rubin

Erik Max Francis said:
The primary purpose of the .isdigit, etc. methods is to test whether a
single character has a certain property. There is, however, no
special character data type in Python, and so by necessity those
methods must be on strings, not characters.

Right, those two sentences contradict each other. There's no
character data type so .isdigit can only test whether a string has a
certain property. That certain property is whether the string is a digit,
which is to say, a single-character string with one of a certain set
of values.

Thus, you have basically two choices: Have the methods throw
exceptions for strings with a length different from one, or have them
just iterate over every character in a string. The latter is clearly
a more useful functionality.

There is a third choice which is the natural and obvious one: have the
function do what its name indicates. Return true if the arg is a
digit and false otherwise. If iterating over the whole string is
useful (which it may be), then the function should have been named
differently, like .isdigits instead of .isdigit.

FWIW, I've usually tested for digit strings with re.match. It never
occurred to me that isdigit tested a whole string.
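
For reference, a minimal sketch of the regex route, assuming a current Python 3 where compiled patterns have fullmatch(). The name `is_digit_string` is hypothetical. Using fullmatch avoids two traps with plain re.match: it only anchors at the start, and a trailing `$` also matches just before a final newline.

```python
import re

DIGITS = re.compile(r"[0-9]+")  # one or more ASCII digits


def is_digit_string(s):
    """Hypothetical helper: True only if the whole string is ASCII digits."""
    return DIGITS.fullmatch(s) is not None


for text in ("123", "12x", "", "12\n"):
    print(repr(text), is_digit_string(text))

# With re.match and "$", a trailing newline slips through:
print(re.match(r"[0-9]+$", "12\n") is not None)
```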
 
Dave Hansen

There is a third choice which is the natural and obvious one: have the
function do what its name indicates. Return true if the arg is a
digit and false otherwise. If iterating over the whole string is
useful (which it may be), then the function should have been named
differently, like .isdigits instead of .isdigit.

Following your logic to its conclusion, had the name isdigits been
chosen, '1'.isdigits() should return False. It's only one digit, not
more than one, as the plural would imply.

I, for one, don't see any utility in the dichotomy. We only need
(should only have) one function. I do agree that isdigits might have
been a better name, but we're stuck with isdigit for hysterical
raisins. And it's logical that string functions work over a string
rather than its first character.

FWIW, I've usually tested for digit strings with re.match. It never
occurred to me that isdigit tested a whole string.

Someone's been trotting out that old jwz chestnut about regular
expressions and problems... Not that I agree with it, but ISTM that
regular expressions are vast overkill for this problem.

Regards,
-=Dave
 
Peter Otten

Steven said:
How do I check if a string contains (can be converted to) an int? I
want to do one thing if I am parsing an integer, and another if not.

/David

others already answered, this is just an idea
def isNumber(n):
...     import re
...     if re.match("^[-+]?[0-9]+$", n):
...         return True
...     return False

This is just a thought experiment, right, to see how slow you can make
your Python program run?

Let's leave the thought experiments to the theoretical physicists and
compare a regex with an exception-based approach:

~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456")'
1000000 loops, best of 3: 1.24 usec per loop
~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456x")'
1000000 loops, best of 3: 1.31 usec per loop

~ $ python -m timeit -s'def isNumber(n):' -s' try: int(n); return True' -s ' except ValueError: pass' 'isNumber("-123456")'
1000000 loops, best of 3: 1.26 usec per loop
~ $ python -m timeit -s'def isNumber(n):' -s' try: int(n); return True' -s ' except ValueError: pass' 'isNumber("-123456x")'
100000 loops, best of 3: 10.8 usec per loop

A tie for number-strings and regex as a clear winner for non-numbers.
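
The same kind of comparison can be run from a script with the timeit module. This is only a sketch: the function names are hypothetical, the regex is the optional-sign pattern from the quoted snippet, the lambda wrapper adds call overhead, and the numbers will differ by machine and Python version.

```python
import re
import timeit

_int_re = re.compile(r"^[-+]?[0-9]+$")  # optional sign, then ASCII digits


def is_number_regex(s):
    """Regex-based check."""
    return _int_re.match(s) is not None


def is_number_except(s):
    """Exception-based check."""
    try:
        int(s)
        return True
    except ValueError:
        return False


CALLS = 100000
for text in ("-123456", "-123456x"):
    for func in (is_number_regex, is_number_except):
        seconds = timeit.timeit(lambda: func(text), number=CALLS)
        usec = seconds / CALLS * 1e6
        print("%-17s %-11r %.3f usec per call" % (func.__name__, text, usec))
```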

Peter
 
Steven D'Aprano

Why?

The primary purpose of the .isdigit, etc. methods is to test whether a
single character has a certain property. There is, however, no special
character data type in Python, and so by necessity those methods must be
on strings, not characters.

Thus, you have basically two choices: Have the methods throw exceptions
for strings with a length different from one, or have them just iterate
over every character in a string. The latter is clearly a more useful
functionality.

*shrug*

If your argument was as obviously correct as you think, shouldn't
ord("abc") also iterate over every character in the string, instead of
raising an exception?
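
To make the contrast concrete, a small sketch: ord() refuses a multi-character string, so any iteration has to be written out, whereas the "is" methods iterate implicitly.

```python
try:
    ord("abc")
    raised = False
except TypeError as exc:
    raised = True
    print("ord('abc') raised TypeError:", exc)

codes = [ord(ch) for ch in "abc"]  # explicit per-character iteration
print(codes)
print("abc".isalpha())  # the string method checks every character itself
```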

But in any case, I was arguing that the *name* is misleading, not that the
functionality is not useful. (Some might argue that the functionality is
harmful, because it encourages Look Before You Leap testing.) In English,
a digit is a single numeric character. In English, "123 is a digit" is
necessarily false, in the same way that "A dozen eggs is a single egg" is
false.

In any case, it isn't important enough to break people's code. I'd rather
that the method isdigit() were called isnumeric() or something, but I can
live with the fact that it is not.
 
Dennis Lee Bieber

In any case, it isn't important enough to break people's code. I'd rather
that the method isdigit() were called isnumeric() or something, but I can
live with the fact that it is not.

Yet, I would expect an isnumeric() to accept something like
"-1.23e-45j" (it's a valid complex number)
 
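A sketch of what such a broader test might look like, using the same try/except idea with complex(). The helper name `is_complex_literal` is hypothetical. In current Python a str.isnumeric method does exist, but it is again a per-character test and rejects this string.

```python
def is_complex_literal(s):
    """Hypothetical helper: True if complex(s) accepts the string."""
    try:
        complex(s)
        return True
    except ValueError:
        return False


text = "-1.23e-45j"
print(is_complex_literal(text))  # complex() parses it
print(text.isdigit())            # the per-character test rejects it
print(text.isnumeric())          # so does the per-character isnumeric()
```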
Steven D'Aprano

Steven said:
How do I check if a string contains (can be converted to) an int? I
want to do one thing if I am parsing an integer, and another if not.

/David


others already answered, this is just an idea

def isNumber(n):
...     import re
...     if re.match("^[-+]?[0-9]+$", n):
...         return True
...     return False

This is just a thought experiment, right, to see how slow you can make
your Python program run?

Let's leave the thought experiments to the theoretical physicists

Didn't I have a smiley in there?

and compare a regex with an exception-based approach:

~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456")'
1000000 loops, best of 3: 1.24 usec per loop

But since you're going to take my protests about regexes more seriously
than I intended you to, it is ironic that you supplied a regex that
is nice and fast but doesn't work:

>>> re.compile(r"^[-+]\d+$").match("123456") is None
True

Isn't that the point of Jamie Zawinski's quote about regexes? I too can
write a regex that doesn't solve the problem -- and this regex is a dead
simple case, yet still easy to get wrong.
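
A side-by-side sketch of the pattern as posted and a version with the sign made optional (the variable names are chosen here for illustration):

```python
import re

strict_sign = re.compile(r"^[-+]\d+$")     # sign required, as posted
optional_sign = re.compile(r"^[-+]?\d+$")  # sign optional

# Columns: input, matches as posted, matches with optional sign.
for text in ("123456", "-123456", "+7", "12x"):
    print(
        repr(text),
        strict_sign.match(text) is not None,
        optional_sign.match(text) is not None,
    )
```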

BTW, you might find it informative to run timeit on the code snippet
provided by Daniel before reflecting on the context of my "how slow"
comment.
 
Peter Otten

Steven said:
But since you're going to take my protests about regexes more seriously
than I intended you to, it is ironic that you supplied a regex that
is nice and fast but doesn't work:

I think you said that "exceptions are cheap" elsewhere in this thread and
I read your post above as "regular expressions are slow". I meant to set
these statements into proportion.

Those who snip the Zawinski quote are doomed to demonstrate it in their
code, though it wouldn't have taken this lapse for me to grant you that
regexes are error-prone.

BTW, you might find it informative to run timeit on the code snippet
provided by Daniel before reflecting on the context of my "how slow"
comment.

I'm getting about 10 usec for both cases, i.e. roughly the same as the
worst-case behaviour for try...except.

Peter
 
Fredrik Lundh

Grant said:
But that is "obviously" wrong, since '15' is not a digit.

no, but all characters in the string belong to the "digit" character
class, which is what the "is" predicates look for.

cf.
True

and so on.
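
A few calls of this kind, as a sketch. The sample strings are chosen here for illustration and are not necessarily the ones from the post.

```python
samples = [
    ("150", str.isdigit),
    ("abc", str.isalpha),
    ("abc150", str.isalnum),
    (" \t\n", str.isspace),
]

# Each predicate holds because every character is in the named class.
for text, predicate in samples:
    print(repr(text), predicate.__name__, predicate(text))
```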

</F>
 
Paul Rubin

Fredrik Lundh said:
no, but all characters in the string belong to the "digit" character
class, which is what the "is" predicates look for.

That description is not quite right. All characters in the empty
string belong to the "digit" character class, but isdigit returns
false (which it probably should).

Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
False
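
The difference between the method and a literal "all characters" reading can be checked directly. A sketch:

```python
empty = ""

method_result = empty.isdigit()                     # False
all_result = all(ch.isdigit() for ch in empty)      # True: all() over nothing
any_result = any(ch.isdigit() for ch in empty)      # False

print(method_result, all_result, any_result)
```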
 
Fredrik Lundh

Paul said:
That description is not quite right. All characters in the empty
string belong to the "digit" character class

A: are there any blue cars on the street?
B: no. not a single one.
A: you're wrong! all cars on the street are blue!
B: no, the street is empty.
A: yeah, so all the cars that are on the street are blue!
B: oh, please.
A: admit that you're wrong! admit that you're wrong! admit that you're wrong!
*smack*
B: (muttering) moron.

</F>
 
Paul Rubin

Fredrik Lundh said:
A: are there any blue cars on the street?
B: no. not a single one.
A: you're wrong! all cars on the street are blue!

B and A are both correct. It's just logic ;-).
 
Duncan Booth

Fredrik said:
no, but all characters in the string belong to the "digit" character
class, which is what the "is" predicates look for.

then gave examples including:

I don't see how istitle() matches your definition of what the "is"
predicates look for.
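
A sketch of the mismatch: istitle() depends on where each character sits within a word, so it cannot be reduced to a per-character test.

```python
text = "Hello World"

whole = text.istitle()                          # True
per_char = all(ch.istitle() for ch in text)     # False: "e" alone is not titlecased
lower_second = "Hello world".istitle()          # False: second word starts lowercase

print(whole, per_char, lower_second)
```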
 
Grant Edwards

no, but all characters in the string belong to the "digit"
character class, which is what the "is" predicates look for.

I know.

My point was that '15'.isdigit() returning True is in my
opinion "surprising" since '15' is not a digit in the most
obvious meaning of the phrase. In language design, "surprise"
is a bad quality.

It's like saying that [1,2,3,4] is an integer.
 
Alex Martelli

Paul Rubin said:
B and A are both correct. It's just logic ;-).

Charles Lutwidge Dodgson spent his professional life arguing against
this, as I mentioned in
<http://mail.python.org/pipermail/python-list/2001-July/052732.html> --
but, mostly, "mainstream" logic proceeded along the opposite channel you
mention. Good thing he had interesting hobbies (telling stories to
children, and taking photographs), or today he perhaps might be
remembered only for some contributions to voting-theory;-).

I don't know of any "complete and correct" logic (or set-theory) where
there is more than one empty-set, but I'm pretty sure that's because I
never really delved into the intricacies of modern theories such as
modal logic (I would expect modal logic, and intensional logic more
generally, would please Dodgson far better than extensional logic...
but, as I said, I don't really understand them in sufficient depth)...


Alex
 
