Getting pyparsing to backtrack

John Nagle

I'm working on street address parsing again, and I'm trying to deal
with some of the harder cases.

Here's a subparser, intended to take in things like "N MAIN" and
"SOUTH", and break out the "directional" from the street name.

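from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

# A directional keyword, optionally followed by a period ("N" or "N.").
direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                    Optional(".").suppress())

# Parenthesized so the expression can continue across lines.
streetNameParser = (Optional(direction.setResultsName("predirectional"))
                    + Combine(OneOrMore(Word(alphanums)),
                              adjacent=False, joinString=" ").setResultsName("streetname"))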



This parses something like "N WEBB" fine; "N" is the "predirectional",
and "WEBB" is the street name.

"SOUTH" (which, when not followed by another word, is a streetname,
not a predirectional), raises a parsing exception:

Street address line parse failed for SOUTH : Expected W:(abcd...)
(at char 5), (line:1, col:6)

The problem is that "direction" matched SOUTH. Even though "direction"
is within an "Optional", and the grammar requires at least one more word
after it, the parser didn't back up and retry without the directional
when it hit the end of the input without satisfying the OneOrMore clause.
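
The failure can be reproduced in a few lines. The sketch below is a minimal, self-contained version of the grammar above; the names street_name_parser and parses are made up for illustration. It should report that "N WEBB" parses and "SOUTH" does not, because the optional directional keeps the only word and is never retried as empty.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, ParseException, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

street_name_parser = (
    Optional(direction.setResultsName("predirectional"))
    + Combine(OneOrMore(Word(alphanums)),
              adjacent=False, joinString=" ").setResultsName("streetname"))


def parses(text):
    """Return True if the whole of text matches the grammar."""
    try:
        street_name_parser.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(parses("N WEBB"))   # a directional followed by a name
print(parses("SOUTH"))    # Optional(direction) consumes the only word
```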

Pyparsing does some backup, but I'm not clear on how much,
or how to force it to happen. There's some discussion at
"http://www.mail-archive.com/[email protected]/msg169559.html".
Apparently the "Or" operator will force some backup, but it's not
clear how much lookahead and backtracking is supported.
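
On the lookahead side, pyparsing does have a FollowedBy class, which checks that an expression matches at the current position without consuming any input. One possible way to use it here, offered as a sketch rather than a recommended fix, is to accept the directional only when another word follows it. The names word, street_name and lookahead_parser are assumptions made for this example.

```python
from pyparsing import (CaselessKeyword, Combine, FollowedBy, MatchFirst,
                       OneOrMore, Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

word = Word(alphanums)
street_name = Combine(OneOrMore(word), adjacent=False,
                      joinString=" ").setResultsName("streetname")

# The directional counts only if another word comes after it; FollowedBy
# looks for that word but leaves it in the input for street_name.
lookahead_parser = (
    Optional(direction.setResultsName("predirectional") + FollowedBy(word))
    + street_name)

for text in ("N WEBB", "SOUTH"):
    result = lookahead_parser.parseString(text, parseAll=True)
    print(text, "->", result.get("predirectional"), "/", result["streetname"])
```

With this arrangement the Optional fails as a unit when nothing follows the directional, so the lone word is left for the street name.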

John Nagle
 
Dennis Lee Bieber

> I'm working on street address parsing again, and I'm trying to deal
> with some of the harder cases.

Hasn't it been suggested before, that the sanest method to parse
addresses is from the end backwards...

So that:

123 N South St.

is parsed as

St. South N 123
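
A rough sketch of that idea in plain Python, without pyparsing, might look like the following. Everything in it is an assumption made for illustration: the function name parse_backwards, the tiny SUFFIXES table, and the rule that a leading directional only counts when another word is left over to be the street name.

```python
# Illustrative lookup tables, not complete lists of real-world values.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST",
                "NORTHEAST", "NORTHWEST", "SOUTHEAST", "SOUTHWEST"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD", "BLVD"}


def normalize(token):
    """Upper-case a token and drop a trailing period ("St." -> "ST")."""
    return token.rstrip(".").upper()


def parse_backwards(address):
    """Split an address line by consuming its tokens from the right."""
    tokens = address.split()[::-1]   # "123 N South St." -> St. South N 123
    parts = {"number": None, "predirectional": None,
             "streetname": None, "suffix": None}

    # 1. A trailing street-type suffix, if something else remains after it.
    if len(tokens) > 1 and normalize(tokens[0]) in SUFFIXES:
        parts["suffix"] = tokens.pop(0)

    # 2. Everything up to the house number belongs to the street part.
    street = []
    while tokens and not tokens[0].isdigit():
        street.append(tokens.pop(0))
    if tokens:
        parts["number"] = tokens.pop(0)

    # 3. street[-1] is the leftmost word. Treat it as a directional only
    #    when at least one other word is left to serve as the name.
    if len(street) > 1 and normalize(street[-1]) in DIRECTIONALS:
        parts["predirectional"] = street.pop()

    parts["streetname"] = " ".join(reversed(street))
    return parts


print(parse_backwards("123 N South St."))
print(parse_backwards("SOUTH"))
```

Because the street name is collected before the directional is decided, a lone "SOUTH" ends up as the street name rather than as a directional with nothing after it.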
 
John Nagle

> I'm working on street address parsing again, and I'm trying to deal
> with some of the harder cases.

The approach below works for the cases given. The "Or" operator ("^")
supports backtracking, but "Optional()" apparently does not.


direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                    Optional(".").suppress())

streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# Try "directional + name" and "name only"; Or ("^") keeps the longest match.
streetNameParser = (
    (direction.setResultsName("predirectional") + streetNameOnly)
    ^ streetNameOnly)
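
To check this grammar against the two cases from the original post, it can be wrapped in a small self-contained script. This is only a sketch: the imports and the directionals list are carried over from the earlier post, and the loop at the end is added purely for illustration.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# Both alternatives are attempted from the same starting position.
streetNameParser = (
    (direction.setResultsName("predirectional") + streetNameOnly)
    ^ streetNameOnly)

for text in ("N WEBB", "SOUTH"):
    result = streetNameParser.parseString(text, parseAll=True)
    print(text, "->", result.get("predirectional"), "/", result["streetname"])
```

Or evaluates every alternative and keeps the one that matches the most text. For "N WEBB" both alternatives match the whole string, and as far as I can tell the first one listed is kept on a tie, which is why the predirectional is still reported there.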



John Nagle
 
Thomas Jollans

> Hasn't it been suggested before, that the sanest method to parse
> addresses is from the end backwards...
>
> So that:
>
> 123 N South St.
>
> is parsed as
>
> St. South N 123

You will of course need some trickery for that to work with

Hauptstr. 12
 
Cousin Stanley

> I'm working on street address parsing again,
> and I'm trying to deal with some of the harder cases.
> ....

For yet another test case
my actual address includes ....

... East South Mountain Avenue


Sometimes written as ....

... E. South Mtn Ave
 
