Regular Expression Help

Jean-Claude Neveu · Apr 12, 2009

Hello,

I was wondering if someone could tell me where
I'm going wrong with my regular expression. I'm
trying to write a regexp that identifies whether
a string contains a correctly-formatted currency
amount. I want to support dollars, UK pounds and
Euros, but the example below deliberately omits
Euros in case the Euro symbol gets mangled
anywhere in email or listserver processing. I
also want people to be able to omit the currency symbol if they wish.

My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"

Here's how I think it should work (but clearly
I'm wrong, because it does not actually work):

^\$\£? Require zero or one instance of $ or £ at the start of the string.
d{0,10} Next, require between zero and ten alpha characters.
(\.\d{2})? Optionally, two characters can
follow. They must be preceded by a decimal point.

Examples of acceptable input should be:

$12.42
$12
£12.42
$12,482.96 (now I think about it, I have not catered for this in my regexp)

And unacceptable input would be:

$12b.42
blah
$blah
etc

Here is my Python script:

#
import re

def is_currency(str):
    rex = "^\$\£?\d{0,10}(\.\d{2})?$"
    if re.match(rex, str):
        return 1
    else:
        return 0

def test_match(str):
    if is_currency(str):
        print str + " is a match"
    else:
        print str + " is not a match"

# All should match except the last two
test_match("$12.47")
test_match("12.47")
test_match("£12.47")
test_match("£12")
test_match("$12")
test_match("$12588.47")
test_match("$12,588.47")
test_match("£12588.47")
test_match("12588.47")
test_match("£12588")
test_match("$12588")
test_match("blah")
test_match("$12b.56")

AND HERE IS THE OUTPUT FROM THE ABOVE SCRIPT:
$12.47 is a match
12.47 is not a match
£12.47 is not a match
£12 is not a match
$12 is a match
$12588.47 is a match
$12,588.47 is not a match
£12588.47 is not a match
12588.47 is not a match
£12588 is not a match
$12588 is a match
blah is not a match
$12b.56 is not a match

Many thanks in advance. Regular expressions are not my strong suit.

J-C

rurpy · Apr 12, 2009

> My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> Here's how I think it should work (but clearly
> I'm wrong, because it does not actually work):
>
> ^\$\£? Require zero or one instance of $ or £ at the start of the string.

The "or" in "$ or £" above is a vertical bar. You
want ^(\$|£)? here.
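
To see the difference, here is a minimal sketch, assuming Python 3 (where string literals are Unicode). The names ORIGINAL and FIXED are made up for illustration. The original pattern reads as "a literal $, then an optional £", which would explain why only the inputs starting with $ matched.

```python
import re

# Original pattern: a mandatory "$" followed by an optional "£".
# (The backslash before £ is dropped here; it has no effect on the match.)
ORIGINAL = re.compile(r"^\$£?\d{0,10}(\.\d{2})?$")

# With alternation: an optional "$" or "£".
FIXED = re.compile(r"^(\$|£)?\d{0,10}(\.\d{2})?$")

for text in ["$12.47", "£12.47", "12.47", "$12b.56"]:
    print(text, bool(ORIGINAL.match(text)), bool(FIXED.match(text)))
```

Only "$12.47" passes the original pattern, while the alternation version accepts everything except "$12b.56".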

John Machin · Apr 12, 2009

> The "or" in "$ or £" above is a vertical bar. You
> want ^(\$|£)? here.

Best not to use a capturing group (blah) when you don't need to
capture ... use (?:blah) instead.

When the alternatives are all single characters, for greater typing
efficiency and computing efficiency use a character class:

^[\$£]?
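
As a rough illustration of the three spellings (assuming Python 3; the digit part is simplified to \d+ to keep the sketch short, and the variable names are hypothetical), all three accept the same inputs and differ only in how many groups they capture:

```python
import re

capturing = re.compile(r"^(\$|£)?\d+$")        # one capturing group
non_capturing = re.compile(r"^(?:\$|£)?\d+$")  # grouping without capture
char_class = re.compile(r"^[\$£]?\d+$")        # character class, no group

samples = ("$12", "£12", "12", "x12")
for pattern in (capturing, non_capturing, char_class):
    # .groups is the number of capturing groups in the compiled pattern.
    print(pattern.pattern, pattern.groups,
          [bool(pattern.match(s)) for s in samples])
```

The capturing form reports one group and the other two report none, which matters once you start indexing into match.group(n).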

Graham Breed · Apr 13, 2009

Jean-Claude Neveu said:
> Hello,
>
> I was wondering if someone could tell me where I'm going wrong with my
> regular expression. I'm trying to write a regexp that identifies whether
> a string contains a correctly-formatted currency amount. I want to
> support dollars, UK pounds and Euros, but the example below deliberately
> omits Euros in case the Euro symbol gets mangled anywhere in email or
> listserver processing. I also want people to be able to omit the
> currency symbol if they wish.

If Euro symbols can get mangled, so can Pound signs.
They're both outside ASCII.

> My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> Here's how I think it should work (but clearly I'm wrong, because it
> does not actually work):
>
> ^\$\£? Require zero or one instance of $ or £ at the start of the
> string.

^[$£]? is correct. And, as you're using re.match, the ^ is
superfluous. (A previous message suggested ^[\$£]? which
will also work. You generally need to escape a Dollar sign
but not here.)
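
Both points can be checked with a small sketch (assuming Python 3; the digit part is shortened to \d+ and the names are hypothetical). re.match only ever tries position 0, so the ^ adds nothing there, whereas re.search scans forward and the ^ does make a difference:

```python
import re

# Inside a character class, "$" is an ordinary character and needs no escape.
with_caret = re.compile(r"^[$£]?\d+")
without_caret = re.compile(r"[$£]?\d+")

for text in ["$12", "£12", "12", "abc $12"]:
    # re.match anchors at the start, so both patterns agree on every input.
    print(text, bool(with_caret.match(text)), bool(without_caret.match(text)))

# re.search is not anchored, so here the ^ matters.
print(bool(without_caret.search("abc $12")))
print(bool(with_caret.search("abc $12")))
```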

You should also think about the encoding. In my terminal,
"£" is identical to '\xc2\xa3'. That is, two bytes for a
UTF-8 code point. If you assume this encoding, it's best to
make it explicit. And if you don't assume a specific
encoding it's best to convert to unicode to do the
comparisons, so for 2.x (or portability) your string should
start u"
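
A small sketch of why the encoding matters, assuming Python 3 (where str is Unicode and bytes is a separate type) and using a deliberately tiny pattern. In a byte pattern the "?" applies only to the last byte of the encoded pound sign, while in a Unicode pattern it applies to the whole character:

```python
import re

pound = "£"
as_utf8 = pound.encode("utf-8")
print(as_utf8, len(pound), len(as_utf8))  # b'\xc2\xa3' 1 2

# Byte pattern: "?" binds only to \xa3, so \xc2 is still mandatory.
byte_pattern = re.compile(b"^\xc2\xa3?12$")
# Unicode pattern: "£" is one character, so "?" makes all of it optional.
text_pattern = re.compile("^£?12$")

print(bool(byte_pattern.match(b"12")))  # False: the \xc2 byte is required
print(bool(text_pattern.match("12")))   # True: the symbol can be omitted
```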

> d{0,10} Next, require between zero and ten alpha characters.

There's a backslash missing, but not from your original
expression. Digits are not "alpha characters".

> (\.\d{2})? Optionally, two characters can follow. They must be preceded
> by a decimal point.

That works. Of course, \d{2} is longer than the simpler \d\d.

Note that you can comment the original expression like this:

rex = u"""(?x)
    ^[$£]?       # Zero or one instance of $ or £
                 # at the start of the string.
    \d{0,10}     # Between zero and ten digits
    (\.\d{2})?   # Optionally, two digits.
                 # They must be preceded by a decimal point.
    $            # End of line
    """

Then anybody (including you) who comes to read this in the
future will have some idea what you were trying to do.
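
For anyone wanting to run this, here is one way the commented pattern could be wired into the original test script. It is only a sketch, assuming Python 3, a raw string with re.VERBOSE instead of the inline (?x) flag, and a non-capturing group for the decimals:

```python
import re

CURRENCY = re.compile(r"""
    ^[$£]?        # zero or one of $ or £ at the start of the string
    \d{0,10}      # between zero and ten digits
    (?:\.\d{2})?  # optionally, a decimal point followed by two digits
    $             # end of the string
""", re.VERBOSE)

def is_currency(text):
    """Return True if text looks like a currency amount (sketch only)."""
    return CURRENCY.match(text) is not None

for text in ["$12.47", "12.47", "£12.47", "£12588",
             "$12,588.47", "blah", "$12b.56"]:
    print(text, is_currency(text))
```

Because every part of the pattern is optional, an empty string or a bare "$" also passes; changing \d{0,10} to \d{1,10} would tighten that if it is not wanted.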

> Examples of acceptable input should be:
>
> $12.42
> $12
> £12.42
> $12,482.96 (now I think about it, I have not catered for this in my
> regexp)

Yes, you need to think about that.
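
One possible way to cater for it, offered as a sketch rather than a finished answer: allow either plain digits or digits grouped in threes with commas. This assumes Python 3, drops the ten-digit limit for simplicity, and uses the hypothetical name CURRENCY_WITH_COMMAS:

```python
import re

CURRENCY_WITH_COMMAS = re.compile(r"""
    ^[$£]?                  # optional currency symbol
    (?:
        \d{1,3}(?:,\d{3})+  # comma-grouped digits, e.g. 12,482
      |
        \d+                 # or plain digits with no separators
    )
    (?:\.\d{2})?            # optional decimal point and two digits
    $
""", re.VERBOSE)

for text in ["$12.42", "$12", "£12.42", "$12,482.96",
             "$1,23.00", "$1234,567", "$12b.42"]:
    print(text, bool(CURRENCY_WITH_COMMAS.match(text)))
```

The first four inputs match and the last three do not, since misplaced commas such as "$1,23.00" fit neither alternative.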

Graham
