Robert Watkins
Has anyone ever run into code that will parse a Boolean query into a
tree structure? I'm working on a search application, one of the features
of which is that it should be able to store a set of queries, each of
which can refer to other queries, e.g.:
1. cat OR dog
2. apples AND oranges AND (NOT bananas)
3. fred NEAR barney
4. #1 OR #2
5. #4 AND #3
Users should be able to edit and delete queries, so that, using the
above example, if one were to delete query #3, query #5 would have to
change also (becoming "#4").
The given example is rather simple, but a set of queries can run upwards
of 300 individual queries, of any level of complexity. And when a query
is deleted, all queries that reference it must change to accommodate the
deletion, and -- here's the fun part -- these queries must be
reformatted so that the Boolean syntax is not broken. For example, if we
start with the query element:
n. bottles AND (#3 AND (NOT #2)) OR (plates AND (#2 OR #4))
and we delete #2, we must end up with:
n. bottles AND #3 OR (plates AND #4)
The example may be convoluted, but I think you can see what I mean.
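
A minimal sketch of the tree-based approach, in plain Java with no Lucene classes involved. Everything here is an assumption made for illustration: the class name QueryTree, the helper deleteRef, the precedence (NEAR binds tighter than AND, which binds tighter than OR), and the rule that a NOT whose operand disappears is dropped along with it. Terms are single whitespace-separated words; quoted phrases are not handled. Each node remembers whether the source text wrapped it in parentheses, so the output keeps the original grouping wherever a group still has more than one operand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parse a Boolean query into a tree, delete a "#n" reference, print it back. */
final class QueryTree {
    /** A leaf has op == null and holds a term; otherwise op is NOT, NEAR, AND or OR. */
    static final class Node {
        final String op;
        final String term;
        final List<Node> kids = new ArrayList<>();
        boolean parens; // true if the source text wrapped this node in ( )
        Node(String op, String term) { this.op = op; this.term = term; }
    }

    // Binary operators from loosest to tightest binding (an assumed precedence).
    private static final String[] LEVELS = {"OR", "AND", "NEAR"};
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private QueryTree(String query) {
        // Tokens are single parentheses or runs of non-space, non-parenthesis characters.
        Matcher m = Pattern.compile("[()]|[^\\s()]+").matcher(query);
        while (m.find()) tokens.add(m.group());
    }

    static Node parse(String query) {
        QueryTree p = new QueryTree(query);
        Node root = p.binary(0);
        if (p.pos != p.tokens.size())
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        return root;
    }

    private boolean peekIs(String s) { return pos < tokens.size() && tokens.get(pos).equals(s); }

    // Parses "operand (OP operand)*" for the operator at this precedence level.
    private Node binary(int level) {
        if (level == LEVELS.length) return unary();
        Node first = binary(level + 1);
        if (!peekIs(LEVELS[level])) return first;
        Node node = new Node(LEVELS[level], null);
        node.kids.add(first);
        while (peekIs(LEVELS[level])) {
            pos++;
            node.kids.add(binary(level + 1));
        }
        return node;
    }

    private Node unary() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of query");
        String t = tokens.get(pos++);
        if (t.equals(")")) throw new IllegalArgumentException("Unexpected )");
        if (t.equals("NOT")) {
            Node n = new Node("NOT", null);
            n.kids.add(unary());
            return n;
        }
        if (t.equals("(")) {
            Node inner = binary(0);
            if (!peekIs(")")) throw new IllegalArgumentException("Missing )");
            pos++;
            inner.parens = true;
            return inner;
        }
        return new Node(null, t);
    }

    /** Returns the tree without any reference to "#id", or null if nothing is left. */
    static Node delete(Node n, int id) {
        if (n.op == null) return n.term.equals("#" + id) ? null : n;
        List<Node> kept = new ArrayList<>();
        for (Node kid : n.kids) {
            Node k = delete(kid, id);
            if (k != null) kept.add(k);
        }
        if (kept.isEmpty()) return null; // e.g. NOT #2 vanishes together with #2
        if (kept.size() == 1 && !n.op.equals("NOT")) {
            Node only = kept.get(0); // a binary operator with one operand collapses to it
            only.parens = only.parens || n.parens; // keep the grouping if the survivor is compound
            return only;
        }
        n.kids.clear();
        n.kids.addAll(kept);
        return n;
    }

    static String format(Node n) {
        if (n.op == null) return n.term; // leaves never need parentheses
        StringBuilder sb = new StringBuilder();
        if (n.op.equals("NOT")) {
            sb.append("NOT ").append(format(n.kids.get(0)));
        } else {
            for (Node kid : n.kids) {
                if (sb.length() > 0) sb.append(' ').append(n.op).append(' ');
                sb.append(format(kid));
            }
        }
        return n.parens ? "(" + sb + ")" : sb.toString();
    }

    /** Convenience wrapper: returns "" when the whole query disappears. */
    static String deleteRef(String query, int id) {
        Node pruned = delete(parse(query), id);
        return pruned == null ? "" : format(pruned);
    }
}
```

Under these assumptions, QueryTree.deleteRef("#4 AND #3", 3) gives "#4", and deleting #2 from the longer query above gives "bottles AND #3 OR (plates AND #4)". Whether a dangling NOT should really be dropped, rather than flagged for the user, is a design decision this sketch does not settle.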
I've been playing around with Jakarta's Lucene 1.2, which is intended as
a complete full-text search solution, but the QueryParserTokenManager
class allows me to fiddle with query elements quite nicely. However,
it still works on a flat list of tokens, if you will, and I think (perhaps
wrongly) that query reformatting might be easier with some sort of
hierarchical structure. And while, yes, it would be possible to write
this myself, I'd rather see if someone smarter has done it already! I've
also been fiddling with Lucene 1.3-RC2, trying, for example, to override
the toString() method of the BooleanQuery class, but it's become quite a
mess, and I don't feel like I've moved any closer to a solution.
Help! (and thank you)
-- Robert