Convert '165.0' to int


Frank Millman

Hi all

I want to convert '165.0' to an integer.

The obvious method does not work -

>>> int('165.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '165.0'

If I convert to a float first, it does work -

>>> int(float('165.0'))
165

Is there a short cut, or must I do this every time (I have lots of them!) ?
I know I can write a function to do this, but is there anything built-in?

Thanks

Frank Millman
 

SigmundV

Is there a short cut, or must I do this every time (I have lots of them!)?
I know I can write a function to do this, but is there anything built-in?

I'd say that we have established that there is no shortcut, no built-
in for this. You write your own function:

string_to_int = lambda s: int(float(s))

Then you apply it to your list of strings:

list_of_integers = map(string_to_int, list_of_strings)

Of course, this will be horribly slow if you have thousands of
strings. In such a case you should use an iterator (assuming you use
python 2.7):

import itertools as it
iterator = it.imap(string_to_int, list_of_strings)
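(A note for readers on Python 3, where itertools.imap no longer exists: the built-in map is itself lazy there, so the equivalent sketch, with illustrative data, is:)

```python
# Python 3 equivalent: map() is already lazy; itertools.imap is gone.
string_to_int = lambda s: int(float(s))
iterator = map(string_to_int, ['165.0', '42.0'])  # illustrative data
print(list(iterator))  # [165, 42]
```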


Regards,

Sigmund
 

Billy Mays

Of course, this will be horribly slow if you have thousands of
strings. In such a case you should use an iterator (assuming you use
python 2.7):

import itertools as it
iterator = it.imap(string_to_int, list_of_strings)

if the goal is speed, then you should use generator expressions:

list_of_integers = (int(float(s)) for s in list_of_strings)
 

Chris Angelico

if the goal is speed, then you should use generator expressions:

list_of_integers = (int(float(s)) for s in list_of_strings)

Clarification: This is faster if and only if you don't actually need
it as a list. In spite of the variable name, it's NOT a list, and you
can't index it (e.g. you can't work with list_of_integers[7]). However,
you can iterate over it to work with the integers in sequence, and for
that specific (and very common) use, it will be faster and use less
memory than actually creating the list. It's also going to be a LOT
faster than creating the list, if you only need a few from the
beginning of it; the generator evaluates lazily.
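(That lazy, bail-out-early behaviour can be sketched like this, with illustrative data; note that the deliberately bad last string is never parsed because we stop consuming the generator before reaching it:)

```python
from itertools import islice

# Illustrative data: the last entry is deliberately not a number.
list_of_strings = ['165.0', '42.0', '7.5', 'not a number']
gen = (int(float(s)) for s in list_of_strings)

# Only the first two strings are ever parsed; the bad one is never touched.
first_two = list(islice(gen, 2))
print(first_two)  # [165, 42]
```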

Personally, I'd just create a tiny function and use that, as has been suggested.

ChrisA
 

Steven D'Aprano

On 7/24/2011 2:27 PM, SigmundV wrote:

if the goal is speed, then you should use generator expressions:

list_of_integers = (int(float(s)) for s in list_of_strings)


I'm not intending to pick on Billy or Sigmund here, but for the beginners
out there, there are a lot of myths about the relative speed of map, list
comprehensions, generator expressions, etc.

The usual optimization rules apply:

We should forget about small efficiencies, say about 97% of
the time: premature optimization is the root of all evil.
-- Donald Knuth

More computing sins are committed in the name of efficiency
(without necessarily achieving it) than for any other single
reason - including blind stupidity. -- W.A. Wulf

and of course:

If you haven't measured it, you're only guessing whether it is
faster or slower.

(And unless you're named Raymond Hettinger, I give little or no credibility
to your guesses except for the most obvious cases. *wink*)

Generators (including itertools.imap) include some overhead which list
comprehensions don't have (at least in some versions of Python). So for
small sets of data, creating the generator may be more time consuming than
evaluating the generator all the way through.

For large sets of data, that overhead is insignificant, but in *total*
generators aren't any faster than creating the list up front. They can't
be. They end up doing the same amount of work: if you have to process one
million strings, then whether you use a list comp or a gen expression, you
still end up processing one million strings. The only advantage to the
generator expression (and it is a HUGE advantage, don't get me wrong!) is
that you can do the processing lazily, on demand, rather than all up front,
possibly bailing out early if necessary.

But if you end up pre-processing the entire data set, there is no advantage
to using a gen expression rather than a list comp, or map. So which is
faster depends on how you end up using the data.

One other important proviso: if your map function is a wrapper around a
Python expression:

map(lambda x: x+1, data)
[x+1 for x in data]

then the list comp will be much faster, due to the overhead of the function
call. List comps and gen exprs can inline the expression x+1, performing it
in fast C rather than slow Python.

But if you're calling a function in both cases:

map(int, data)
[int(x) for x in data]

then the overhead of the function call is identical for both the map and the
list comp, and they should be equally as fast. Or slow, as the case may be.

But don't take my word on this! Measure, measure, measure! Performance is
subject to change without notice. I could be mistaken.
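(A minimal sketch of how one might measure, using the stdlib timeit module; the data and repeat count are illustrative, and since timings vary by machine and Python version, no winner is asserted here:)

```python
from timeit import timeit

data = list(range(10000))

# Both forms produce identical results; only the timings differ.
assert list(map(lambda x: x + 1, data)) == [x + 1 for x in data]

t_map  = timeit(lambda: list(map(lambda x: x + 1, data)), number=200)
t_comp = timeit(lambda: [x + 1 for x in data], number=200)
print(f"map+lambda: {t_map:.4f}s   list comp: {t_comp:.4f}s")
```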

(And don't forget that everything changes in Python 3. Whatever you think
you know about speed in Python 2, it will be different in Python 3.
Generator expressions become more efficient; itertools.imap disappears; the
built-in map becomes a lazy generator rather than returning a list.)
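(A quick illustration of that Python 3 behaviour, run on Python 3 only:)

```python
# In Python 3, map() returns a lazy map object, not a list.
m = map(int, ['1', '2', '3'])
assert not isinstance(m, list)
print(next(m))   # values are produced one at a time, on demand
print(list(m))   # consuming the rest yields the remaining values
```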
 

SigmundV


I would like to thank Steven for his enlightening (at least for me)
post.

In the OP's case I'd keep everything as lists initially. If speed then
is an issue other constructs can be considered. The use of map in the
example only reflects my inherently mathematical way of thinking.

Generally, I'd say

1) write code that works, i.e. does what it's intended to do in all
cases, and
2) if speed is an issue, try to sort out the main culprits.

Coding style is a different issue altogether, but in general I'd say
that one should use self-explanatory variable names.


Sigmund
 

Billy Mays

But if you're calling a function in both cases:

map(int, data)
[int(x) for x in data]

I am aware that premature optimization is a danger, but it's also
incorrect to ignore potential performance pitfalls.

I would favor a generator expression here, if only because I think it's
easier to read. In addition, it handles large amounts of data well by
not materializing a second list. For very long input sequences, a genexp
would be the proper thing to do (assuming you don't need to index into
the results, in which case it's the wrong choice).

I think the fastest way to solve the OP's problem is the following: ;)

def convert_165_0_to_int(arg):
    return 165
 
