andrew said:
Is the third case here surprising to anyone else? It doesn't make
sense to me...
Python 2.6.2 (r262:71600, Oct 24 2009, 03:15:21)
[GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
'a\x62c' is a string literal which is the same as 'abc', so re.compile
receives the characters:
abc
as the regex, which matches the string:
abc
'a\\x62c' is a string literal which represents the characters:
a\x62c
so re.compile receives these characters as the regex.
The re module has its own set of escape sequences, most of
which are the same as Python's string escape sequences. The re module
treats \x62 like the string escape, i.e. it represents the character 'b',
so this regex is the same as:
abc
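A quick way to check the first two cases is to look at what each string literal actually contains before re sees it. This is only a sketch; the variable names are mine, not from the original session:

```python
import re

# Case 1: Python's string-literal parser turns \x62 into 'b',
# so re.compile only ever sees the three characters a b c.
literal_escape = 'a\x62c'
assert literal_escape == 'abc'

# Case 2: the doubled backslash survives as one real backslash, so
# re.compile sees the six characters a \ x 6 2 c and decodes \x62 itself.
regex_escape = 'a\\x62c'
assert len(regex_escape) == 6

matches_case1 = re.compile(literal_escape).match('abc') is not None
matches_case2 = re.compile(regex_escape).match('abc') is not None
print('case 1: %s, case 2: %s' % (matches_case1, matches_case2))
```

Both patterns match 'abc'; they just get there by different routes.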
'a\\\x62c' is a string literal which is the same as 'a\\bc', so
re.compile receives the characters:
a\bc
as the regex.
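The re module treats the \b in a regex as representing a word boundary,
unless it's in a character set, e.g. [\b], where it means a backspace.
A word boundary only occurs between a word character and a non-word
character (or the start or end of the string), so this regex, which
asks for a word boundary sandwiched between two letters, can never match.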
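The third case can be checked the same way. Again just a sketch with names I made up; the [\b] line is only there to illustrate the character-set exception:

```python
import re

# 'a\\\x62c' is: a, an escaped backslash, then \x62 (= 'b'), then c.
pattern_text = 'a\\\x62c'
assert pattern_text == 'a\\bc'
assert list(pattern_text) == ['a', '\\', 'b', 'c']

# re reads \b as a word boundary, so the pattern is "a, boundary, c".
word_boundary = re.compile(pattern_text)
abc_result = word_boundary.search('abc')   # no boundary between letters
ac_result = word_boundary.search('ac')     # same problem, still no boundary

# Inside a character set, \b means backspace (\x08) instead.
backspace_set = re.compile('a[\\b]c')
backspace_result = backspace_set.search('a\x08c')

print('abc: %r, ac: %r, backspace matched: %r'
      % (abc_result, ac_result, backspace_result is not None))
```

Neither 'abc' nor 'ac' matches, while the character-set version does match a string containing a real backspace.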