raw strings in regexps

Mike

I've been having trouble with a regular expression, and I finally simplified
things down to the point that (a) my example is very simple, and (b) I'm
totally confused. There are those who would say (b) is normal, but that's
another thread.

I finally simplified my problem down to this simple case:

re.match(r'\\this', r'\\this')

Both the pattern and the string to match are identical raw strings, yet they
don't match. What does match is this:

re.match(r'\\\\this', r'\\this')

Below are outputs from two versions of Python on two different machines,
with identical outputs, so it's probably not a compiler problem or a version
bug.

What's going on here? Am I missing something obvious?

linux25> python
Python 2.2.3 (#1, Feb 2 2005, 12:22:48)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-49)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

C:\Dev\python>python
ActivePython 2.4.2 Build 10 (ActiveState Corp.) based on
Python 2.4.2 (#67, Jan 17 2006, 15:36:03) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
-- Mike --
 
Carl Banks

Mike said:
I've been having trouble with a regular expression, and I finally simplified
things down to the point that (a) my example is very simple, and (b) I'm
totally confused. There are those who would say (b) is normal, but that's
another thread.

I finally simplified my problem down to this simple case:

re.match(r'\\this', r'\\this')

Both the pattern and the string to match are identical raw strings, yet they
don't match.

In the pattern, the first backslash escapes the second; thus the two
backslashes in the pattern match one backslash in the string. Another
example:

re.match(r"\[",r"\[") => None (no match)
re.match(r"\[",r"[") => match object
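To make the escaping rule concrete, here is a small sketch that can be pasted into an interpreter. The variable name `pattern` is only illustrative; the calls are the standard `re.match`, which returns None when there is no match and a match object otherwise.

```python
import re

# r'\\this' is six characters: backslash, backslash, t, h, i, s.
# The regex engine reads the two backslashes as ONE escaped, literal backslash.
pattern = r'\\this'

# Subject starts with two backslashes, pattern asks for one followed by 't'.
print(re.match(pattern, r'\\this'))   # None

# Subject starts with a single backslash, so the pattern matches.
print(re.match(pattern, r'\this') is not None)   # True

# The same rule applies to any other metacharacter, such as '['.
print(re.match(r'\[', r'\['))               # None: subject starts with a backslash
print(re.match(r'\[', '[') is not None)     # True: pattern means a literal '['
```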


Carl Banks
 
Fredrik Lundh

Mike said:
I've been having trouble with a regular expression, and I finally simplified
things down to the point that (a) my example is very simple, and (b) I'm
totally confused. There are those who would say (b) is normal, but that's
another thread.

I finally simplified my problem down to this simple case:

re.match(r'\\this', r'\\this')

Both the pattern and the string to match are identical raw strings, yet they
don't match.

the regular expression engine matches a pattern against a string, not a
string against a string. in a pattern, "\\" matches a single backslash.
in your raw string, you have *two* backslashes. try this instead:

re.match(r'\\this', '\\this')
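A quick way to see the difference between the two literals is to check their lengths. This is just an illustrative sketch; the names `raw` and `cooked` are made up for the example.

```python
import re

raw = r'\\this'     # raw literal: both backslashes are kept as written
cooked = '\\this'   # normal literal: '\\' collapses to a single backslash

print(len(raw), len(cooked))   # 6 5

# The pattern r'\\this' means "one literal backslash, then 'this'".
print(re.match(r'\\this', cooked) is not None)   # True
print(re.match(r'\\this', raw) is not None)      # False
```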

</F>
 
Gabriel Genellina

I finally simplified my problem down to this simple case:

re.match(r'\\this', r'\\this')

Both the pattern and the string to match are identical raw strings, yet they
don't match. What does match is this:

re.match(r'\\\\this', r'\\this')

Perhaps you can understand better with a simpler example without backslashes:
<_sre.SRE_Match object at 0x00C5CDB0>

You have to quote metacharacters if you want to match them. The
re.escape function is useful for this:
'\\(a\\)'
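Applied to the original problem, a rough sketch of the same idea might look like this. The names `literal` and `pattern` are hypothetical; `re.escape` is the standard library function that backslash-quotes regex metacharacters in a string.

```python
import re

literal = r'\\this'             # the text we want to match exactly
pattern = re.escape(literal)    # each backslash gets quoted with another one

print(pattern == r'\\\\this')                    # True
print(re.match(pattern, literal) is not None)    # True

# The parenthesis example from this post:
print(re.escape('(a)'))   # \(a\)
```

Note that the interactive prompt shows the repr of the result, which is why the escaped string appears there with doubled backslashes.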


--
Gabriel Genellina
Softlab SRL

 
Mike

....
You have to quote metacharacters if you want to match them. The escape
method is useful for this:

'\\(a\\)'

Doh! Of course! Thanks everyone.

-- Mike --
 
