prefix matching

C

Christian Gudrian

Hello, there!

Given a list with strings.

What is the most pythonic way to check if a given string starts with one of
the strings in that list?

I started by composing a regular expression pattern which consists of all
the strings in the list separated by "|" in a for loop. Then I used that
pattern to do a regexp match.

Seems rather complicated to me. Any alternatives?

Christian
 
J

Jeff Epler

Here's another simple approach:
def startswith_one(s, prefixes):
for p in prefixes:
if s.startswith(p): return True
return False
If speed is important and "prefixes" doesn't change frequently, then
coding a FSM in C is the way to go. The last time this question went
around, I learned that REs like
a|b|c
essentially tries the alternatives one after another, rather than
compiling into an FSM like I learned in school, so the amount of time
taken is proportional to the length of the total RE, not the longest
alternative.

Jeff
 
S

Stefan Meier

Must be something like
l=["pref1","pref2"]
s="String"
if filter(lambda x:s[:len(x)] == x, l):
# do something

Cheers,
Stefan

Hello, there!

Given a list with strings.

What is the most pythonic way to check if a given string starts with one of
the strings in that list?

I started by composing a regular expression pattern which consists of all
the strings in the list separated by "|" in a for loop. Then I used that
pattern to do a regexp match.

Seems rather complicated to me. Any alternatives?

Christian

--
Stefan Meier
Computational Biology & Informatics
Exelixis, Inc.
170 Harbor Way, P.O. Box 511
South San Francisco, CA 94083-511
fon. (650)837 7816
 
P

Peter Otten

Christian said:
What is the most pythonic way to check if a given string starts with one
of the strings in that list?

I started by composing a regular expression pattern which consists of all
the strings in the list separated by "|" in a for loop. Then I used that
pattern to do a regexp match.

Seems rather complicated to me. Any alternatives?

Taken from the itertools examples at
http://docs.python.org/lib/itertools-example.html
import itertools
True in itertools.imap("so what".startswith, ["so", "what", "else"]) True
True in itertools.imap("so what".startswith, ["what", "else"]) False

It's wrapped in a function - any() - there.

Peter
 
C

Christian Gudrian

Jeff Epler said:
def startswith_one(s, prefixes):
for p in prefixes:
if s.startswith(p): return True
return False

Thanks! startswith sounds good.
If speed is important and "prefixes" doesn't change frequently, then
coding a FSM in C is the way to go.

The prefixes don't change very often but can be configured by the user. So a
FSM is not the preferred approach here I think. And speed does not really
matter.

Christian
 
F

Fredrik Lundh

Christian said:
I started by composing a regular expression pattern which consists of all
the strings in the list separated by "|" in a for loop. Then I used that
pattern to do a regexp match.

Seems rather complicated to me. Any alternatives?

does it work? is it fast enough?

(if the answer is yes and yes, what's wrong with you ;-)

</F>
 
?

=?ISO-8859-1?Q?Holger_T=FCrk?=

Christian said:
Given a list with strings.

What is the most pythonic way to check if a given string starts with one of
the strings in that list?

I started by composing a regular expression pattern which consists of all
the strings in the list separated by "|" in a for loop. Then I used that
pattern to do a regexp match.

Seems rather complicated to me. Any alternatives?

I'd use something like this:

reduce (operator.or_,
[string_to_test.startswith (x) for x in list_with_strings])

Greetings,

Holger
 
F

Fredrik Lundh

Jeff said:
The last time this question went around, I learned that REs like
a|b|c
essentially tries the alternatives one after another, rather than
compiling into an FSM

that depends somewhat on what a, b, and c happens to be.

for example,

if all alternatives consist of a single literal character, the entire
subexpression is replaced with a character set ("a|b|c" becomes
"[abc]")

for alternatives that start with literal text or a character set, the
engine never checks alternatives that cannot possible match (if
you feed "aha" to "a...|b...|c...", the second and third alternative
are never checked).

if all alternatives share a common prefix, that prefix will be checked
before any alternative is tried; if the prefix matches, only the suffixes
will be checked for each alternative. ("aa|ab|ac" becomes "a(?:a|b|c)"
becomes "a[abc]")

when searching, the engine uses a KMP-style overlap table to skip
over places where the prefix cannot possibly match. (which explains
why re.search can sometimes run faster than string.find)

</F>
 
C

Christian Gudrian

Fredrik Lundh said:
does it work? is it fast enough?

(if the answer is yes and yes, what's wrong with you ;-)

Well, um, the answer is indeed ("yes", "yes"). But if intended to write code
that just works and runs fast enough I wouldn't be here. :)

Just like many of us I'm quite new to Python. Learning a high level
programming language is a matter of some hours nowadays. What really takes
time is digging into the libraries. And that's what my question aimed at:
learning some new functions and methods.

Christian
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,201
Messages
2,571,047
Members
47,646
Latest member
xayaci5906

Latest Threads

Top