A question about searching with multiple strings

googleboy

Hi there.

I have defined a class called Item with several (about 30 I think)
different attributes (is that the right word in this context?). An
abbreviated example of the code for this is:

class Item(object):

    def __init__(self, height, length, function):
        params = locals()
        del params['self']
        self.__dict__.update(params)

    def __repr__(self):
        all_items = self.__dict__.items()
        return '%s,%s,%s' % (self.height, self.length, self.function)



I have a csv file that I use to store and retrieve all the info about
each Item, one item per line.

I have written a little piece of Python that lets me search through all
Items (after reading them into a variable called all_items) and will
return matching results:



for item in all_items:
    strItem = str(item)
    m = re.search(p, strItem, flags = re.I)
    if m:
        height = getattr(item, "height")
        length = getattr(item, "length")
        function = getattr(item, "function")
        print "height is %s, length is %s and function is %s" % (
            height, length, function)



This has the limitation of only working over a single search item. I
want to be able to search over an uncontrollable number of search
strings because I will have people wanting to search over 2, 3 or even
(maybe) as many as 5 different things.

I was thinking that I would try to write a function that created a
sublist of Items if it matched and then run subsequent searches over
the subsequent search strings using this sublist.

I am not entirely sure how to store this subset of Items in such a way
that I can make searches over it. I guess I have to initialize a
variable of type Item, which I can use to add matching Items to, but
I have no idea how to do that. (If it was just a list I could say
"sublist = []"; what do I use for self-defined classes?) I am also
unsure how to go about creating a function that will accept any number
of parameters.

Any assistance with these two questions will be greatly appreciated!

Thanks!

googleboy
 
Mike Meyer

googleboy said:
for item in all_items:
    strItem = str(item)
    m = re.search(p, strItem, flags = re.I)
    if m:
        height = getattr(item, "height")
        length = getattr(item, "length")
        function = getattr(item, "function")
        print "height is %s, length is %s and function is %s" % (
            height, length, function)


This has the limitation of only working over a single search item. I
want to be able to search over an uncontrollable number of search
strings because I will have people wanting to search over 2, 3 or even
(maybe) as many as 5 different things.

I was thinking that I would try to write a function that created a
sublist of Items if it matched and then run subsequent searches over
the subsequent search strings using this sublist.

I am not entirely sure how to store this subset of Items in such a way
that I can make searches over it. I guess I have to initialize a
variable of type Item, which I can use to add matching Items to, but
I have no idea how to do that. (If it was just a list I could say
"sublist = []"; what do I use for self-defined classes?) I am also
unsure how to go about creating a function that will accept any number
of parameters.

Any assistance with these two questions will be greatly appreciated!


Don't use a real list, use an iterator. In particular,
itertools.ifilter takes an arbitrary sequence and returns an
iterator over the items for which a function returns true.

import re
from itertools import ifilter

for item in ifilter(lambda i: re.search(p, str(i), flags = re.I),
                    all_items):
    print "height is %s, length is %s and function is %s" % \
        (item.height, item.length, item.function)

The trick is that ifilter returns an iterator, so you can nest the calls:

for item in ifilter(filter1, ifilter(filter2, ifilter(filter3, all_items))):
    ...
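The nesting does not have to be written out by hand. Here is a rough sketch of building the chain in a loop, one filter per search string. It is written for Python 3, where the built-in filter is already lazy and takes the place of ifilter. The Item class is a cut-down stand-in, and search_all and the sample data are made-up names for illustration only.

```python
import re


class Item(object):
    # Cut-down stand-in for the Item class from the question.
    def __init__(self, height, length, function):
        self.height = height
        self.length = length
        self.function = function

    def __repr__(self):
        return '%s,%s,%s' % (self.height, self.length, self.function)


def search_all(items, patterns):
    """Keep only the items whose string form matches every pattern."""
    result = iter(items)
    for pattern in patterns:
        regex = re.compile(pattern, re.I)
        # Bind regex as a default argument so each lambda keeps its own
        # pattern; the filters are lazy and only run when iterated.
        result = filter(lambda i, regex=regex: regex.search(str(i)), result)
    return result


# Hypothetical sample data standing in for the contents of the CSV file.
all_items = [Item(10, 20, "shelf"), Item(10, 35, "table"), Item(7, 20, "Shelf")]

matches = list(search_all(all_items, ["shelf", "20"]))
for item in matches:
    print("height is %s, length is %s and function is %s"
          % (item.height, item.length, item.function))
```

Each pass wraps the previous iterator in another filter, so two search strings give two nested filters and five give five. Nothing is searched until the final loop pulls items through the chain.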

<mike
 
Steven D'Aprano

googleboy said:
Hi there.

I have defined a class called Item with several (about 30 I think)
different attributes (is that the right word in this context?). An
abbreviated example of the code for this is:

class Item(object):

    def __init__(self, height, length, function):
        params = locals()
        del params['self']
        self.__dict__.update(params)

I get very worried when I see code like that. It makes me stop and think
about what it does, and why you would want to do it. I worry about hidden
side effects. Instead of just grokking the code instantly, I've got to stop
and think. You're taking a copy of the locals, deleting self from it, and
then updating self's dictionary with them... why? What do you hope to
achieve?

If I were project manager, and one of my coders wrote something like this,
I would expect him or her to have a really good reason for it. I'd be
thinking not only of hidden bugs ("what if there is something in locals
you don't expect?"), but every time a developer has to work on this class,
they have to stop and think about it.

Joel (of Joel On Software fame) talks about code looking wrong and
smelling dirty. This code might work. It might be perfectly safe. But
there's a whiff to this code.

http://www.joelonsoftware.com/articles/Wrong.html
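To make the "something in locals you don't expect" worry concrete, here is a small sketch. The scratch variable is invented purely for illustration and is not in the original class. It is written for Python 3, but the same thing happens in Python 2.

```python
class Item(object):

    def __init__(self, height, length, function):
        # Hypothetical temporary added later by some other developer.
        scratch = height * length
        params = locals()          # now also contains 'scratch'
        del params['self']
        self.__dict__.update(params)


item = Item(2, 3, "shelf")
# The temporary has silently become an attribute of every Item.
print(sorted(item.__dict__))
```

Any local name that exists at the moment locals() is called ends up as an instance attribute, which is exactly the kind of hidden side effect that is hard to spot in review.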


googleboy said:
    def __repr__(self):
        all_items = self.__dict__.items()
        return '%s,%s,%s' % (self.height, self.length, self.function)

You aren't using all_items. Why waste a lookup fetching it?

googleboy said:
I have a csv file that I use to store and retrieve all the info about
each Item, one item per line.

Would you like to give us a couple of examples of items from the CSV file?

googleboy said:
I have written a little piece of Python that lets me search through all
Items (after reading them into a variable called all_items) and will
return matching results:

for item in all_items:
    strItem = str(item)
    m = re.search(p, strItem, flags = re.I)
    if m:
        height = getattr(item, "height")
        length = getattr(item, "length")
        function = getattr(item, "function")
        print "height is %s, length is %s and function is %s" % (
            height, length, function)

And here we see why global variables are Bad: without knowing what p is, how
are we supposed to understand this code?

googleboy said:
This has the limitation of only working over a single search item.

So you are searching items for items... I think you need to use a better
name for your class. What does class Item actually represent?

googleboy said:
I want to be able to search over an uncontrollable number of search
strings because I will have people wanting to search over 2, 3 or even
(maybe) as many as 5 different things.

I was thinking that I would try to write a function that created a
sublist of Items if it matched and then run subsequent searches over
the subsequent search strings using this sublist.

That might work.

googleboy said:
I am not entirely sure how to store this subset of Items in such a way
that I can make searches over it.

How about in a list?

googleboy said:
I guess I have to initialize a
variable of type Item, which I can use to add matching Items to, but
I have no idea how to do that. (If it was just a list I could say
"sublist = []"; what do I use for self-defined classes?)

See my next post (to follow).

googleboy said:
I am also unsure how to go about creating a function that will accept
any number of parameters.

def func1(*args):
    for arg in args:
        print arg

def func2(mandatory, *args):
    print "Mandatory", mandatory
    for arg in args:
        print arg

Does that help?
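Applied to the search problem, the same *args idea might look roughly like this. It is only a sketch for Python 3; matches_all and matches_any are made-up names, and the sample string simply mimics what the __repr__ in the question produces.

```python
import re


def matches_all(text, *patterns):
    """Return True if every pattern is found in text (case-insensitive)."""
    return all(re.search(p, text, flags=re.I) for p in patterns)


def matches_any(text, *patterns):
    """Return True if at least one pattern is found in text."""
    return any(re.search(p, text, flags=re.I) for p in patterns)


# Any number of search strings can be passed directly...
print(matches_all("10,20,shelf", "shelf", "20"))

# ...or unpacked from a list built at run time.
terms = ["SHELF", "20", "10"]
print(matches_all("10,20,shelf", *terms))
```

The star in the call (*terms) is the mirror image of the star in the definition: it spreads a list out into separate positional arguments, so the caller never needs to know in advance how many search strings there will be.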
 
Steven D'Aprano

googleboy said:
Hi there.

I have defined a class called Item with several (about 30 I think)
different attributes (is that the right word in this context?).

Generally speaking, attributes shouldn't be used for storing arbitrary
items in an object. That's what mapping objects like dicts are for. I
would change your class so that it no longer mucked about with its
internal __dict__:

class Item(object):
    def __init__(self, height, length, function, **kwargs):
        # assumes that ALL items will have height, length, function
        # plus an arbitrary number (may be zero) of keyword args
        self.height = height
        self.length = length
        self.function = function
        self.data = kwargs  # store custom data in an instance attribute,
                            # NOT in the object __dict__


You would use it something like this:

def create_items():
    all_items = []
    # WARNING WARNING WARNING
    # pseudo-code -- this doesn't work because I don't
    # know what your input file looks like
    open input file
    for record in input file:
        h = read height
        l = read length
        f = read function
        D = {}
        for any more items in record:
            D[item key] = item value
        newitem = Item(h, l, f, **D)  # unpack D into keyword arguments
        all_items.append(newitem)
    close input file
    return all_items
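For what it is worth, here is one way the pseudo-code could be made runnable, under a big assumption: that the CSV file starts with a header row naming every column, including height, length and function. The real file layout is unknown, so treat the column names and the sample data below as invented. The sketch is written for Python 3 and reads from an in-memory string so it can run without a file.

```python
import csv
import io


class Item(object):
    def __init__(self, height, length, function, **kwargs):
        self.height = height
        self.length = length
        self.function = function
        self.data = kwargs  # everything that is not a fixed attribute


def create_items(csv_file):
    """Build a list of Items from an open CSV file with a header row."""
    all_items = []
    for record in csv.DictReader(csv_file):
        # Pull out the three fixed columns; whatever is left in the
        # record becomes the keyword arguments stored in item.data.
        h = record.pop("height")
        l = record.pop("length")
        f = record.pop("function")
        all_items.append(Item(h, l, f, **record))
    return all_items


# Made-up sample standing in for the real file, which would be opened
# with something like open(filename, newline="").
sample = io.StringIO(
    "height,length,function,colour\n"
    "10,20,shelf,red\n"
    "7,35,table,blue\n"
)
all_items = create_items(sample)
print(all_items[0].height, all_items[0].data)
```

Note that csv.DictReader hands every value back as a string, so numeric columns would need converting (for example with int or float) before comparing them with numbers.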

Now you have processed your input file and have a list of Items. So let's
search for some!

Firstly, create a function that searches a single Item:

def SearchOneOr(source, height=None, length=None,
                function=None, **kwargs):
    """Performs a short-circuit OR search for one or more search terms."""
    if height is not None:
        if source.height == height: return True
    if length is not None:
        if source.length == length: return True
    if function is not None:
        if source.function == function: return True
    for key, value in kwargs.items():
        if key in source.data and source.data[key] == value:
            return True
    return False

def SearchOneAnd(source, height=None, length=None,
                 function=None, **kwargs):
    """Performs a short-circuit AND search for one or more search terms."""
    if height is not None:
        if source.height != height: return False
    if length is not None:
        if source.length != length: return False
    if function is not None:
        if source.function != function: return False
    for key, value in kwargs.items():
        # a missing key or a different value both count as a mismatch
        if key not in source.data or source.data[key] != value:
            return False
    return True


Now create a function that searches all items:

def SearchAll(source_list, flag, height=None, length=None,
              function=None, **kwargs):
    # a true flag selects the OR search, a false flag the AND search
    found = []
    if flag:
        search = SearchOneOr
    else:
        search = SearchOneAnd
    for source in source_list:
        if search(source, height, length, function, **kwargs):
            found.append(source)
    return found

Now pass all_items to SearchAll as the first argument, and it will search
through them all and return a list of all the items which match your
search terms.
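As a rough illustration of how such a search might be called, here is a condensed Python 3 variant that uses the built-ins any() and all() in place of the two hand-written functions. This is a different arrangement of the same idea, not the code above: lookup, search_all, match_any and the sample items are all made-up names, and unlike the original it treats every keyword passed in as a search term, even if its value is None.

```python
class Item(object):
    def __init__(self, height, length, function, **kwargs):
        self.height = height
        self.length = length
        self.function = function
        self.data = kwargs


_MISSING = object()  # sentinel that never equals a real search value


def lookup(source, key):
    """Find a search key among the fixed attributes, else in source.data."""
    if key in ("height", "length", "function"):
        return getattr(source, key)
    return source.data.get(key, _MISSING)


def search_all(source_list, match_any, **terms):
    """Return the items matching any (OR) or all (AND) of the terms."""
    combine = any if match_any else all
    return [source for source in source_list
            if combine(lookup(source, key) == value
                       for key, value in terms.items())]


# Hypothetical items; the colour key is invented for the example.
all_items = [
    Item(10, 20, "shelf", colour="red"),
    Item(10, 35, "table", colour="blue"),
    Item(7, 20, "shelf"),
]

either = search_all(all_items, True, height=7, colour="blue")     # OR search
both = search_all(all_items, False, height=10, function="shelf")  # AND search
print(len(either), len(both))
```

The result of either call is an ordinary list of Item objects, so it can be fed straight back into another search to narrow the results further.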

Hope this helps.
 
