rejecting newlines with re.match

r0g

Hi,

I want to use a regex to match a string "poo" but not "poo\n" or
"poo"+chr(13) or "poo"+chr(10) or "poo"+chr(10)+chr(13)

According to http://docs.python.org/library/re.html:

'.' (Dot.) In the default mode, this matches any character except a
newline. If the DOTALL flag has been specified, this matches any
character including a newline.
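A quick Python 3 check (an illustration, not part of the original thread) shows what this rule means for the characters in the question. In the default mode '.' rejects only "\n", so a carriage return (chr(13)) still counts as an ordinary character:

```python
import re

# Default mode: '.' refuses only '\n' (chr(10)).
print(re.match(r'.', '\n'))              # None
print(re.match(r'.', '\r') is not None)  # True: chr(13) is an ordinary character to '.'

# With DOTALL, '.' accepts '\n' as well.
print(re.match(r'.', '\n', re.DOTALL) is not None)  # True
```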


So I tried:

    import re
    a = re.compile(r'^.{1,50}$')
    print a.match("poo\n")

which printed:

    <_sre.SRE_Match object at 0xb7767988>

:-(

The library says...

'$' Matches the end of the string or just before the newline at the end
of the string, and in MULTILINE mode also matches before a newline. foo
matches both ‘foo’ and ‘foobar’, while the regular expression foo$
matches only ‘foo’. More interestingly, searching for foo.$ in
'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
searching for a single $ in 'foo\n' will find two (empty) matches: one
just before the newline, and one at the end of the string.
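The documented '$' examples can be replayed directly. This Python 3 sketch simply reproduces them, using match positions to show where the two empty '$' matches in 'foo\n' fall:

```python
import re

# Without MULTILINE, '$' may match just before a trailing '\n'...
print(re.search(r'foo.$', 'foo1\nfoo2\n').group())                # foo2
# ...and with MULTILINE it also matches before every other '\n'.
print(re.search(r'foo.$', 'foo1\nfoo2\n', re.MULTILINE).group())  # foo1
# A lone '$' in 'foo\n' matches twice: before the newline and at the very end.
print([m.start() for m in re.finditer(r'$', 'foo\n')])            # [3, 4]
```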


So that explains it, but what am I to do then? I assume it isn't matching
the newline itself, as the returned string does not contain one. Is
there a switch that can stop $ matching 'just before the newline at the
end of the string', or is there another character class I could use here?
Any ideas greatly appreciated!
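A short Python 3 sketch confirms the assumption above: the match succeeds, but the matched text stops before the newline. Comparing m.end() with the string length is one illustrative way to tell the two cases apart:

```python
import re

a = re.compile(r'^.{1,50}$')
m = a.match("poo\n")

print(repr(m.group()))          # 'poo': the trailing newline is not part of the match
print(m.end())                  # 3: matching stopped before the '\n'
print(m.end() == len("poo\n"))  # False: the match does not reach the end of the string
```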

Thanks,


Roger.

MRAB

r0g said:
> Hi,
>
> I want to use a regex to match a string "poo" but not "poo\n" or
> "poo"+chr(13) or "poo"+chr(10) or "poo"+chr(10)+chr(13)

"\n" is the same as chr(10).
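For reference, the escape sequences and character codes in the question line up as follows (Python 3 syntax). A Windows line ending is "\r\n", i.e. chr(13)+chr(10):

```python
# Escape sequences versus character codes
print("\n" == chr(10))              # True: line feed
print("\r" == chr(13))              # True: carriage return
print("\r\n" == chr(13) + chr(10))  # True: Windows-style line ending
```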
> So that explains it but what am I to do then? I assume it isn't matching
> the newline itself as the returned string does not contain one but is
> there a switch that can stop $ matching 'just before the newline at the
> end of the string' or is there another character class I could use here?
> Any ideas greatly appreciated!

There is also "\Z", which matches only at the end of the string.
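A minimal Python 3 sketch of the "\Z" suggestion, assuming the 1 to 50 character limit from the original pattern. Because '.' still accepts "\r", the second pattern uses the character class [^\r\n] to reject both line-ending characters. On Python 3.4+, re.fullmatch gives the same whole-string anchoring without "\Z":

```python
import re

# '\Z' matches only at the very end of the string, never before a trailing '\n'.
a = re.compile(r'^.{1,50}\Z')
print(a.match("poo") is not None)    # True
print(a.match("poo\n") is not None)  # False

# '.' still accepts '\r', so exclude both line-ending characters explicitly.
b = re.compile(r'[^\r\n]{1,50}\Z')
for s in ["poo", "poo\n", "poo\r", "poo\r\n", "poo\n\r"]:
    print(repr(s), b.match(s) is not None)

# Python 3.4+ alternative: fullmatch must consume the entire string.
print(re.fullmatch(r'[^\r\n]{1,50}', "poo\r") is not None)  # False
```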

I don't know what your use case is, but do you actually need to use a
regex? Sometimes it's simpler and faster if you don't.
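For the non-regex route, a plain string check is enough. The helper name is_single_short_line and the max_len=50 default are illustrative choices mirroring the {1,50} in the original pattern, not anything from the thread:

```python
def is_single_short_line(s, max_len=50):
    """Hypothetical helper: True if s has 1..max_len chars and no CR or LF."""
    return 1 <= len(s) <= max_len and "\n" not in s and "\r" not in s


for s in ["poo", "poo\n", "poo\r", "poo\r\n", ""]:
    print(repr(s), is_single_short_line(s))
```

This accepts exactly the same strings as the [^\r\n]{1,50} pattern, while keeping the length limit and the forbidden characters visible at a glance.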