string search function

gyromagnetic

Hi,
I have written a function that searches a text string for various
words. The text is searched using a boolean 'and' or a boolean 'or' of
the input list of search terms.

Since I need to use this function for many long strings and many
search words, I would like to use as efficient a method as possible.

Are there improvements that can be made to the code below? Are there
better alternatives?
I am currently using Python 2.3.

Thanks.

-g


-----

def bsearch(fterms, stext, btype='AND'):
    if btype == 'AND':  # boolean 'and' search
        found = True
        for f in fterms:
            if f not in stext:
                found = False
                break
    else:  # boolean 'or' search
        found = False
        for f in fterms:
            if f in stext:
                found = True
                break

    return found



if __name__ == '__main__':
    stext = "hello to you all"
    terms = ['hello', 'goodbye']
    btype = 'OR'

    if bsearch(terms, stext, btype):
        print 'found'
    else:
        print 'not found'
 

Christopher T King

> Are there improvements that can be made to the code below? Are there
> better alternatives?

There are two ways I can think of:

1)

You can replace the 'or' search with a regular expression:

import re

def bsearch(fterms, stext, btype='AND'):
    if btype == 'AND':  # boolean 'and' search
        ...               # unchanged from the original
    else:  # boolean 'or' search
        # note: re.escape() each term if it may contain regexp metacharacters
        found = bool(re.search('|'.join(fterms), stext))

    return found

Assuming Python's regexps are somewhat optimized, this should be about the
fastest search you can get with arbitrary strings. Unfortunately, there's
no direct 'and' equivalent with regexps (it can be done, but it's helluva
ugly). If you want to optimize 'and', you'll need to use whatever
advanced algorithms our friends at Google use ;)
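For the curious, here is a rough sketch of what that "helluva ugly" AND
regexp could look like, using lookahead assertions (the and_pattern()
helper below is my own illustration, not part of the original code):

import re

def and_pattern(fterms):
    # One lookahead per term; the combined pattern matches only if every
    # term occurs somewhere in the text.  re.escape() guards against
    # metacharacters in the search terms.
    return ''.join(['(?=.*%s)' % re.escape(f) for f in fterms])

stext = "hello to you all"
print bool(re.match(and_pattern(['hello', 'you']), stext, re.DOTALL))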

2)

Assuming your search terms don't contain whitespace, and you want to
compare only entire words (not substrings), replace the string search with
a list search, simply by adding the line 'stext = stext.split()' to the
top of the search function. This will greatly speed up the search by
avoiding checking partial words. The same effect can be achieved with the
regexp above by adding '\s' to the beginning and end of it.
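As a concrete sketch of both variants (the whole-word list search, and a
word-bounded regexp; note I've used '\b' word boundaries instead of '\s',
an adjustment on my part so the first and last words of the text still
match):

import re

def bsearch_or_words(fterms, stext):
    # Whole-word 'or' search: split the text into words first, so
    # 'hell' no longer matches inside 'hello'.
    words = stext.split()
    for f in fterms:
        if f in words:
            return True
    return False

def bsearch_or_regexp(fterms, stext):
    # Word-bounded 'or' regexp version of the same idea.
    pattern = r'\b(?:%s)\b' % '|'.join([re.escape(f) for f in fterms])
    return bool(re.search(pattern, stext))

print bsearch_or_words(['hello', 'goodbye'], "hello to you all")   # True
print bsearch_or_regexp(['hell'], "hello to you all")              # False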

Continuing along this line (whole words, no whitespace), you can replace
the lists with sets for an even greater speedup. The resulting code would
look like this (with a bit of code stolen from sets.py):

from sets import Set
from itertools import ifilter

def ispartialsubset(self, other):
    """Report whether another set contains at least one member of this set."""
    self._binary_sanity_check(other)
    for elt in ifilter(other._data.has_key, self):
        return True
    return False

Set.ispartialsubset = ispartialsubset

def bsearch(fterms, stext, btype='AND'):
    stext = stext.split()
    if btype == 'AND':  # boolean 'and' search
        found = Set(fterms).issubset(Set(stext))
    else:  # boolean 'or' search
        found = Set(fterms).ispartialsubset(Set(stext))
    return found

If the same stext or fterms is used multiple times, you could move the
.split() and/or the Set() transformations outside of the function, so they
only have to be done once.
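For example, a minimal sketch of building the term set once and reusing
it against many texts (the names here are my own):

from sets import Set

# Build the term set once, outside the search function.
term_set = Set(['hello', 'goodbye'])

def bsearch_and(term_set, stext):
    # 'and' search against a precomputed term set.
    return term_set.issubset(Set(stext.split()))

for stext in ["hello to you all", "goodbye cruel world", "say nothing"]:
    print stext, '->', bsearch_and(term_set, stext)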

Again, this method (using sets) only works if your words contain no
whitespace, and you're not interested in matching parts of words. It's
/much/ faster than a string search, though.
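If you want to verify that on your own data, a quick timeit harness along
these lines would do it (this assumes bsearch() is defined in the script
you run it from, and the sample text is made up):

import timeit

setup = """
from __main__ import bsearch
stext = 'hello to you all ' * 1000
terms = ['hello', 'goodbye']
"""

timer = timeit.Timer("bsearch(terms, stext, 'OR')", setup)
print timer.timeit(number=1000)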

Hope this helps!
 
