[regex] How to check for non-space character?

Gilles Ganault · Mar 21, 2009

Hello

Some of the addresses are missing a space between the street name and
the ZIP code, e.g. "123 Main Street01159 Someville"

The following regex doesn't seem to work:

#Check for any non-space before a five-digit number
re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)

I also tried ([^ ].), to no avail.

What is the right way to tell the Python re module to check for any
non-space character?

Thank you.

Tim Chase · Mar 21, 2009

Gilles said:
Hello

Some of the addresses are missing a space between the street name and
the ZIP code, e.g. "123 Main Street01159 Someville"

The following regex doesn't seem to work:

#Check for any non-space before a five-digit number
re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)
-----------------------------------^

I also tried ([^ ].), to no avail.
------------------^

What is the right way to tell the Python re module to check for any
non-space character?

It looks like it's these periods that are throwing you off. Just
remove them. For a 3rd syntax:

(\S)(\d{5})

the \S (capital, instead of "\s") is "any NON-white-space character"
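A minimal sketch of that pattern applied to the sample address from the question. The variable names and the second, correctly spaced address are made up for illustration, and re-inserting the space with sub() goes one step beyond what was asked:

```python
import re

# \S matches one non-whitespace character; \d{5} matches exactly five digits.
missing_space = re.compile(r'(\S)(\d{5})')

bad = "123 Main Street01159 Someville"    # sample from the question
good = "123 Main Street 01159 Someville"  # assumed well-formed counterpart

bad_match = missing_space.search(bad)
good_match = missing_space.search(good)

print(bad_match.group(0))  # t01159
print(good_match)          # None

# Put the missing space back between the two captured groups.
fixed = missing_space.sub(r'\1 \2', bad)
print(fixed)               # 123 Main Street 01159 Someville
```

Note that a run of six or more digits would also match, since its first digit counts as a non-whitespace character, so real data may need a tighter pattern.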

-tkc

John Machin · Mar 21, 2009

Gilles Ganault said:
Hello

Some of the addresses are missing a space between the street name and
the ZIP code, e.g. "123 Main Street01159 Someville"

This problem appears very similar to the one you had in a previous episode,
where you were deleting <br /> in address contexts where it obviously should
have been treated as importantly as a comma or even (would you believe) a line
break.

The example botched output was "... St Johns WoodLondon ..." IIRC.

Prevention is better than cure; try to find out if your earlier code is causing
this problem.

The following regex doesn't seem to work:

Regexes do work. If the outcome is not what you expected, it is your
expectation-to-regex translator that is not working.

What does it do? Does it match zero addresses, all addresses, many addresses
that contain a 5-digit number /followed/ by a space, something else? Could you
use the answer to that question to narrow in on the problem with your regex?

#Check for any non-space before a five-digit number
re_bad_address = re.compile('([^\s].)(\d{5}) ',re.I | re.S | re.M)

The comment is quite incorrect. After removing the fog of useless parentheses,
the regex says:
[^\s] -- one non-whitespace character (better written as \S)
. -- any character (more or less, see later) (why?)
\d{5} -- 5 digits
" " -- a space (why?)

Then there's a hail of flags:
re.I (ignore case) -- irrelevant
re.S (DOTALL) -- makes your pointless . match any character (instead of any
character except newline). Do you have any newlines in your addresses?
re.M (MULTILINE) -- I'm 99% sure you don't need this either.
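To see what the original pattern actually does, here is a small sketch that runs it against the sample address and against a correctly spaced one (the second address and the variable names are assumptions for illustration):

```python
import re

# The pattern from the question, written as a raw string but otherwise unchanged.
original = re.compile(r'([^\s].)(\d{5}) ', re.I | re.S | re.M)

bad = "123 Main Street01159 Someville"    # sample from the question
good = "123 Main Street 01159 Someville"  # assumed well-formed counterpart

m_bad = original.search(bad)
m_good = original.search(good)

# Group 1 is "one non-whitespace character, then any character".
print(repr(m_bad.group(1)))   # 'et'
print(repr(m_good.group(1)))  # 't '
```

Both addresses match, because the extra . happily consumes the space in the well-formed one, so the pattern cannot tell good addresses from bad ones.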

I also tried ([^ ].), to no avail.

If not-whitespace doesn't match, changing it to not-space doesn't help.

What is the right way to tell the Python re module to check for any
non-space character?

r'[^ ]' -- but that's NOT the question you should be asking.

HTH,
John

Gilles Ganault · Mar 22, 2009

Tim Chase said:
It looks like it's these periods that are throwing you off. Just
remove them. For a 3rd syntax:

(\S)(\d{5})

the \S (capital, instead of "\s") is "any NON-white-space character"

Thanks guys for the tips.
