What is wrong in my list comprehension?

Hussein B · Feb 1, 2009

Hey,
I have a log file that doesn't contain the word "Haskell" at all, I'm
just trying to do a little performance comparison:
++++++++++++++
from datetime import time, timedelta, datetime
start = datetime.now()
print start
lines = [line for line in file('/media/sda4/Servers/Apache/
Tomcat-6.0.14/logs/catalina.out') if line.find('Haskell')]
print 'Number of lines contains "Haskell" = ' + str(len(lines))
end = datetime.now()
print end
++++++++++++++
Well, the script is returning the total number of lines in the file!
What is wrong in my logic?
Thanks.

Peter Otten · Feb 1, 2009

"""
find(...)
S.find(sub [,start [,end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.
"""

a.find(b) returns -1 if b is not found. -1 evaluates to True in a boolean
context.

Use

[line for line in open(...) if line.find("Haskell") != -1]

or, better

[line for line in open(...) if "Haskell" in line]

to get the expected result.
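
To make the difference concrete, here is a minimal sketch in Python 3 syntax. The `log_lines` list is a made-up stand-in for the log file (an assumption, not the OP's data), so the snippet runs without any file on disk.

```python
# Hypothetical sample standing in for the log file; none of the lines
# contains "Haskell", matching the OP's description.
log_lines = [
    "INFO: Starting Servlet Engine\n",
    "INFO: Server startup in 1234 ms\n",
    "WARNING: something happened\n",
]

# Original test: find() returns -1 for every line, and -1 is truthy,
# so every line passes the filter.
buggy = [line for line in log_lines if line.find("Haskell")]

# Corrected tests: compare against -1, or use the `in` operator.
with_find = [line for line in log_lines if line.find("Haskell") != -1]
with_in = [line for line in log_lines if "Haskell" in line]

print(len(buggy), len(with_find), len(with_in))
```

With this sample the buggy filter keeps all three lines, while both corrected filters keep none.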

Peter

Chris Rebert · Feb 1, 2009

Hussein said:
lines = [line for line in file('/media/sda4/Servers/Apache/
Tomcat-6.0.14/logs/catalina.out') if line.find('Haskell')]

From the help() for str.find:

find(...)
<snip>
Return -1 on failure.

~ $ python
Python 2.6 (r26:66714, Nov 18 2008, 21:48:52)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bool(-1)
True

str.find() returns -1 on failure (i.e. if the substring is not in the
given string).
-1 is considered boolean true by Python.
Therefore, your list comprehension will contain all lines that don't
*start* with "Haskell" rather than all lines that *contain*
"Haskell".

Use `if "Haskell" in line` instead of `if line.find("Haskell")`. It's
even easier to read that way.
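
A small sketch of the three cases, in Python 3 syntax. The sample strings are invented for illustration; only `str.find`, `bool`, and the `in` operator are real.

```python
# Hypothetical sample lines, chosen to cover the three possible cases:
# match at the start, match later in the string, and no match.
samples = ["Haskell rocks", "I like Haskell", "no match here"]

results = []
for text in samples:
    index = text.find("Haskell")
    # Record the index, its truth value, and the result of the `in` test.
    results.append((index, bool(index), "Haskell" in text))

for row in results:
    print(row)
```

Only the line that starts with "Haskell" gives a falsy `find()` result, which is why the truth value of `find()` disagrees with the `in` test in the first and last case.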

Cheers,
Chris

J Kenneth King · Feb 2, 2009

Chris Rebert said:
Python 2.6 (r26:66714, Nov 18 2008, 21:48:52)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bool(-1)
True

str.find() returns -1 on failure (i.e. if the substring is not in the
given string).
-1 is considered boolean true by Python.

That's an odd little quirk... never noticed that before.

I just use regular expressions myself.

Wouldn't this be something worth cleaning up? It's a little confusing
for failure to evaluate to boolean true even if the relationship isn't
direct.

Peter Otten · Feb 2, 2009

J said:
Chris Rebert said:

Python 2.6 (r26:66714, Nov 18 2008, 21:48:52)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bool(-1)
True

str.find() returns -1 on failure (i.e. if the substring is not in the
given string).
-1 is considered boolean true by Python.

That's an odd little quirk... never noticed that before.

I just use regular expressions myself.

Wouldn't this be something worth cleaning up? It's a little confusing
for failure to evaluate to boolean true even if the relationship isn't
direct.

Well, what is your suggested return value when the substring starts at
position 0?
0

By the way, there already is a method with a cleaner (I think) interface:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
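
The two behaviours can be reproduced with a short sketch in Python 3 syntax. The example string is an assumption chosen for illustration, not the input used in the session above.

```python
text = "Haskell is fun"  # hypothetical example string

# A match at the very start gives index 0, a legitimate result that
# happens to be falsy.
start_index = text.find("Haskell")

# str.index() behaves like find() on success but raises ValueError
# instead of returning -1 when the substring is missing.
try:
    text.index("Python")
    outcome = "found"
except ValueError as exc:
    outcome = "ValueError: %s" % exc

print(start_index)
print(outcome)
```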

Peter

Stephen Hansen · Feb 2, 2009

str.find() returns -1 on failure (i.e. if the substring is not in the
given string).

That's an odd little quirk... never noticed that before.

I just use regular expressions myself.

Wouldn't this be something worth cleaning up? It's a little confusing
for failure to evaluate to boolean true even if the relationship isn't
direct.

But what would you clean it up to?

str.find can return 0 ... which is a *true* result as that means it
finds what you're looking for at position 0... but which evaluates to
boolean False. The fact that it can also return -1 which is the
*false* result which evaluates to boolean True is just another side of
that coin.

What are the options to clean it up? It can return None when it doesn't
match and you can then test str.find("a") is None... but while that
kinda works it also goes a bit against the point of having boolean
truth/falsehood not representing success/failure of the function. 0
(boolean false) still is a success.
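
A rough sketch of that None-returning variant, in Python 3 syntax. The helper name `find_or_none` is hypothetical; nothing like it exists on `str`.

```python
def find_or_none(haystack, needle):
    """Hypothetical helper: like str.find(), but returns None on failure."""
    index = haystack.find(needle)
    return None if index == -1 else index


hit_at_start = find_or_none("Haskell is fun", "Haskell")
miss = find_or_none("no match here", "Haskell")

# The explicit `is None` test distinguishes the two cases...
print(hit_at_start is None, miss is None)
# ...but a bare truth test still cannot: both 0 and None are falsy.
print(bool(hit_at_start), bool(miss))
```

So the caller would still have to write an explicit comparison, just against None instead of -1.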

Raising an exception would be a bad idea in many cases, too. You can
use str.index if that's what you want.

So there's not really a great solution to "cleaning it up". I
remember there was some talk in py-dev of removing str.find entirely
because there was no really c, but I have absolutely no idea if they
ended up doing it or not.

--S

MRAB · Feb 2, 2009

J said:
Chris Rebert said:

Python 2.6 (r26:66714, Nov 18 2008, 21:48:52)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> bool(-1)
True

str.find() returns -1 on failure (i.e. if the substring is not in the
given string).
-1 is considered boolean true by Python.

That's an odd little quirk... never noticed that before.

I just use regular expressions myself.

Wouldn't this be something worth cleaning up? It's a little confusing
for failure to evaluate to boolean true even if the relationship isn't
direct.

str.find() returns the index (position) where the substring was found.
Because string indexes start at 0 the returned value is -1 if it's not
found.

In those languages where string indexes start at 1 the returned value is
0 if not found.

J Kenneth King · Feb 2, 2009

Stephen Hansen said:
But what would you clean it up to?

str.find can return 0 ... which is a *true* result as that means it
finds what you're looking for at position 0... but which evaluates to
boolean False. The fact that it can also return -1 which is the
*false* result which evaluates to boolean True is just another side of
that coin.

(Sorry all for the multiple post... my gnus fudged a bit there)

That's the funny thing about integers having boolean contexts I
guess. Here's a case where 0 actually isn't "False." Any returned value
should be considered "True" and "None" should evaluate to "False." Then
the method can be used in both contexts of logic and procedure.

(I guess that's how I'd solve it, but I can see that implementing it
is highly improbable)

I'm only curious if it's worth cleaning up because the OP's case is one
where there is more than one way to do it.

However, that's not the way the world is and I suppose smarter people
have discussed this before. If there's a link to the discussion, I'd
like to read it. It's pedantic but fascinating nonetheless.

rdmurray · Feb 2, 2009

Quoth Stephen Hansen said:
I just think at this point ".find" is just not the right method to use;
"substring" in "string" is the way to determine what he wants is all.
".find" is useful for when you want the actual position, not when you just
want to determine if there's a match at all. The way I'd clean it is to
remove .find, personally. I don't remember the outcome of their discussion
on py-dev, and haven't gotten around to loading up Py3 to test it out.

Python 3.0 (r30:67503, Dec 18 2008, 19:09:30)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Help on built-in function find:

find(...)
S.find(sub[, start[, end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.

Jason Scheirer · Feb 2, 2009

Peter Otten said:
Use

[line for line in open(...) if line.find("Haskell") != -1]

or, better

[line for line in open(...) if "Haskell" in line]

to get the expected result.

Or better, group them together in a generator:

sum(line for line in open(...) if "Haskell" in line)

and avoid allocating a new list with every line that contains Haskell
in it.

http://www.python.org/dev/peps/pep-0289/

Peter Otten · Feb 2, 2009

Jason said:
Or better, group them together in a generator:

sum(line for line in open(...) if "Haskell" in line)

You probably mean

sum(1 for line in open(...) if "Haskell" in line)

if you want to count the lines containing "Haskell", or

sum(line.count("Haskell") for line in open(...) if "Haskell" in line)

if you want to count the occurrences of "Haskell" (where the if clause is
logically superfluous, but may improve performance).

and avoid allocating a new list with every line that contains Haskell
in it.

But note that the OP stated that there were no such lines.

Peter
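
For reference, the two counting variants can be tried on a small in-memory sample. This is a sketch in Python 3 syntax; the `io.StringIO` object and its contents are assumptions standing in for the log file.

```python
import io

# Hypothetical in-memory stand-in for the log file.
log = io.StringIO(
    "Haskell and more Haskell\n"
    "nothing to see\n"
    "one Haskell here\n"
)
lines = log.readlines()

# Number of lines that contain the word at least once.
matching_lines = sum(1 for line in lines if "Haskell" in line)

# Total number of occurrences across all lines.
occurrences = sum(line.count("Haskell") for line in lines)

# Summing the lines themselves fails: sum() starts from the integer 0
# and cannot add a string to it.
try:
    sum(line for line in lines if "Haskell" in line)
    summing_strings = "ok"
except TypeError:
    summing_strings = "TypeError"

print(matching_lines, occurrences, summing_strings)
```

Neither generator expression builds an intermediate list, which is the saving the generator suggestion was after.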
