How do you print a string after it's been searched for an RE?

John Salerno

After I've run the re.search function on a string and no match was
found, how can I access that string? When I try to print it directly,
it's an empty string, I assume because it has been "consumed." How do
I prevent this?

It seems to work fine for this 2.x code:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

But when I try it with my own code (3.2), it won't print the text of
the page:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

P.S. I plan to clean up my code, I know it's not great right now. But
my immediate goal is to just figure out why the 2.x code can print
"text", but my own code can't print "page," which are basically the
same thing, unless something significant has changed with either the
urllib.request module, or the way it's decoded, or something, or is it
just an RE issue?

Thanks.
 
Ian Kelly

> After I've run the re.search function on a string and no match was
> found, how can I access that string? When I try to print it directly,
> it's an empty string, I assume because it has been "consumed." How do
> I prevent this?

This has nothing to do with regular expressions. It would appear that
page.read() is letting you read the response body multiple times in
2.x but not in 3.x, probably due to a change in buffering. Just store
the string in a variable and avoid calling page.read() multiple times.
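
A minimal sketch of that advice, runnable without network access. Here io.BytesIO is only a stand-in for the file-like object that urlopen() returns, and the helper name first_number_or_text is hypothetical, not something from the original code:

```python
import io
import re

pattern = re.compile(r'[0-9]+')

def first_number_or_text(response):
    """Return (number, None) on a match, else (None, full_text)."""
    # Read and decode the body exactly once, then reuse the variable.
    text = response.read().decode()
    match_obj = pattern.search(text)
    if match_obj:
        return match_obj.group(), None
    return None, text

# Stand-in for a urlopen() response (an assumption made for this demo).
fake_page = io.BytesIO(b'no digits here')
number, text = first_number_or_text(fake_page)

# The stream is now exhausted, so a second read() yields empty bytes.
leftover = fake_page.read()
print(repr(text), repr(leftover))
```

In the original loop, the same change would mean assigning page.read().decode() to a variable once per iteration and printing that variable in the else branch.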
 
John Salerno

> This has nothing to do with regular expressions. It would appear that
> page.read() is letting you read the response body multiple times in
> 2.x but not in 3.x, probably due to a change in buffering. Just store
> the string in a variable and avoid calling page.read() multiple times.

Thank you. That worked, and as a result I think my code will look
cleaner.
 
Thomas L. Shinnick

There is also
    print(match_obj.string)
which gives you a copy of the string searched. See end of section
6.2.5. Match Objects
 
John Salerno

> There is also
>     print(match_obj.string)
> which gives you a copy of the string searched. See end of section
> 6.2.5. Match Objects

I tried that, but the only time I wanted the string printed was when
there *wasn't* a match, so the match object was a NoneType.
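
A short illustration of why that fails, using made-up sample strings (assumptions, not pages from the site): search() returns None when nothing matches, so there is no match object to read .string from.

```python
import re

pattern = re.compile(r'[0-9]+')

# Sample inputs invented for this sketch.
hit = pattern.search('and the next nothing is 44827')
miss = pattern.search('no digits here')

# On a match, .string is the text that was passed to search().
hit_string = hit.string
hit_number = hit.group()

# On a miss there is no match object at all, only None.
miss_is_none = miss is None
print(hit_number, miss_is_none)
```

So in the no-match branch the searched text has to come from a variable saved before the search.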
 
