Something weird about re.finditer()

Gilles Ganault

Hello

I stumbled upon something funny while downloading web pages and
trying to extract one or more blocks from a page. Even though Python
seems to return at least one block, it never actually enters the for
loop:

======
re_block = re.compile('before (.+?) after',re.I|re.S|re.M)

#Here, get web page and put it into "response"

blocks = None
blocks = re_block.finditer(response)
if blocks == None:
    print "No block found"
else:
    print "Before blocks"
    for block in blocks:
        #Never displayed!
        print "In blocks"
======

Since "blocks" is no longer set to None after calling finditer()...
but doesn't contain a single block... what does it contain then?

Thank you for any tip.
 
Peter Otten

Gilles said:
Since "blocks" is no longer set to None after calling finditer()...
but doesn't contain a single block... what does it contain then?

This is by design. When there are no matches, re.finditer() returns an empty
iterator, not None.

Change your code to something like

has_matches = False
for match in re_block.finditer(response):
    if not has_matches:
        has_matches = True
        print "before blocks"
    print "in blocks"
if not has_matches:
    print "no block found"

or

match = None
for match in re_block.finditer(response):
    print "in blocks"
if match is None:
    print "no block found"
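
For readers on Python 3, where print is a function, the same idea can be sketched by peeking at the first match with next(). This is only an illustration under stated assumptions: extract_blocks is a hypothetical helper name, and the sample strings stand in for the downloaded page.

```python
import re
from itertools import chain

re_block = re.compile(r"before (.+?) after", re.I | re.S | re.M)


def extract_blocks(text):
    """Return the captured blocks, or an empty list if nothing matched."""
    matches = re_block.finditer(text)
    # next() with a default returns None only when the iterator is empty.
    first = next(matches, None)
    if first is None:
        print("no block found")
        return []
    print("before blocks")
    blocks = []
    # Put the peeked match back in front of the remaining ones.
    for match in chain([first], matches):
        print("in blocks")
        blocks.append(match.group(1))
    return blocks


# Sample inputs (assumptions, not real page content).
found = extract_blocks("before A after ... BEFORE b After")
missing = extract_blocks("nothing to see here")
```

Peeking keeps the "before blocks" message ahead of the loop without needing a flag variable, at the cost of one extra import.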

Peter
 
John Machin

Gilles said:
Since "blocks" is no longer set to None after calling finditer()...
but doesn't contain a single block... what does it contain then?

Thank you for any tip.

Tip 0: contemplate what type you could infer from the name findITER
Tip 1: Read the manual to see what type is returned by re.finditer
(or do import re; help(re.finditer))
Tip 2: Append
, type(blocks)
to the relevant print statements in your above code, and inspect the
output.

Metatip 0: Following the tips can be done rapidly without any need for
an internet connection.

Meta**2tip 0: The Tips and the Metatip can be applied to many things,
not just re.finditer.
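
Tips 1 and 2 might look roughly like this in Python 3 syntax (the thread itself uses Python 2 print statements). The pattern and input string are placeholders chosen so that nothing matches.

```python
import re
from collections.abc import Iterator

# Placeholder pattern and text: "foo" does not occur in "bar".
blocks = re.compile("foo").finditer("bar")

print(type(blocks))                  # some iterator type, not NoneType
print(isinstance(blocks, Iterator))  # True
print(blocks is None)                # False

# help(re.finditer) shows the documentation in an interactive session.

remaining = list(blocks)
print(remaining)                     # [] because there was nothing to match
```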

HTH,
John
 
Justin Ezequiel

Gilles said:
Since "blocks" is no longer set to None after calling finditer()...
but doesn't contain a single block... what does it contain then?

Thank you for any tip.

Because finditer returns an iterator, which in your case just happens
to be empty:

import re
patt = re.compile('foo')
gen = patt.finditer('bar')
gen is None     # False
gen == None     # False
gen             # an iterator object, not None
list(gen)       # []
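
Two related points may be worth spelling out: the iterator object is truthy even when it will yield nothing, so a plain `if blocks:` test is no better than the None test, and it can only be consumed once. A small Python 3 sketch, reusing the names from the session above (the second input string is made up for illustration):

```python
import re

patt = re.compile("foo")

empty = patt.finditer("bar")
empty_is_truthy = bool(empty)   # True: the object exists even though it yields nothing
print(empty_is_truthy)
print(list(empty))              # []

gen = patt.finditer("foo and foo")
first_pass = [m.group(0) for m in gen]
second_pass = [m.group(0) for m in gen]   # the iterator is already exhausted
print(first_pass)               # ['foo', 'foo']
print(second_pass)              # []
```

If the matches are needed more than once, storing `list(patt.finditer(...))` or using `patt.findall(...)` avoids the single-pass limitation.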
 
Steven D'Aprano

Gilles said:
Since "blocks" is no longer set to None after calling finditer()... but
doesn't contain a single block... what does it contain then?

It probably took you twenty times more time and effort to ask the
question than it would have to look for yourself.

[]

BTW, testing for None with == is not recommended, because one day
somebody might pass your function some strange object that compares equal
to None. Although it wouldn't have solved your problem, the recommended
way to test if an object is None is with the `is` operator.
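
To make the "strange object" concrete, here is a deliberately contrived Python 3 sketch. EqualsEverything is a hypothetical class invented for this illustration; real code rarely does this on purpose.

```python
class EqualsEverything:
    """Hypothetical 'strange object' whose __eq__ always claims equality."""

    def __eq__(self, other):
        return True


strange = EqualsEverything()

print(strange == None)   # True: the equality test is fooled
print(strange is None)   # False: identity cannot be overridden
```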
 
Lawrence D'Oliveiro

Steven said:
BTW, testing for None with == is not recommended, because one day
somebody might pass your function some strange object that compares equal
to None.

Presumably if it compares equal to None, that is by design, precisely so it
would work in this way.
 
Steven D'Aprano

Lawrence said:
Presumably if it compares equal to None, that is by design, precisely so
it would work in this way.

In context, no. We're not talking about somebody creating an object which
is equivalent to None when treated as a value, but using None as a
sentinel. Sentinels are markers, and it is important that nothing else
can be mistaken for that marker or breakage will occur.

Of course, if the caller knows how the sentinel is used, then he might
choose to duplicate that usage but pass some other object. But that would
be stupid and should be discouraged. I mean, what would be the point? I
can think of use-cases for creating something that returns equal to None
-- the Null object pattern comes to mind. But what would be the point of
creating an object that was not None but would fool a function into
treating it as the same sentinel as None?
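
When None is itself a legitimate value, a dedicated marker object compared with `is` sidesteps the ambiguity altogether. A minimal Python 3 sketch of that sentinel idiom; lookup and _MISSING are hypothetical names used only for this example.

```python
_MISSING = object()   # private marker; no other object is this object


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default only if one was supplied."""
    if key in mapping:
        return mapping[key]
    if default is _MISSING:   # identity test, so nothing can impersonate it
        raise KeyError(key)
    return default


print(lookup({"a": 1}, "a"))        # 1
print(lookup({}, "a", None))        # None, passed explicitly as the default
```

Because the check uses identity, a caller can pass None (or any object with an unusual `__eq__`) as a real default without being mistaken for "no default given".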
 
Aaron Brady

Steven said:
Of course, if the caller knows how the sentinel is used, then he might
choose to duplicate that usage but pass some other object. But that would
be stupid and should be discouraged. I mean, what would be the point? I
can think of use-cases for creating something that returns equal to None
-- the Null object pattern comes to mind. But what would be the point of
creating an object that was not None but would fool a function into
treating it as the same sentinel as None?

In that case, they could use separate sentinels, that are instances of
a class or classes that have defined behavior for comparing to each
other.

It might get as bad as setting a flag on the class or sentinel, though
you'd have to be careful about concurrency and especially nested
calls.

You'd have to rely on the user function to use equality instead of
identity testing, since 'sentinel is None' won't return true no matter
what you do to it.
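
A rough Python 3 sketch of what such a sentinel class might look like. Sentinel, MISSING, and describe are made-up names, and this is one reading of the suggestion rather than an established recipe.

```python
class Sentinel:
    """Hypothetical marker type: all instances compare equal to each other."""

    def __eq__(self, other):
        return isinstance(other, Sentinel)

    def __hash__(self):
        return hash(Sentinel)


MISSING = Sentinel()


def describe(value=MISSING):
    # Equality test: any Sentinel instance counts as "not supplied".
    if value == MISSING:
        return "no value"
    return "value: %r" % (value,)


print(describe())             # no value
print(describe(Sentinel()))   # no value: a different instance, but equal
print(describe(None))         # value: None
print(Sentinel() is None)     # False: identity with None cannot be faked
```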
 
