Ben Jessel
I am doing the technical design for a news syndication system that:
1) Reads news feeds (XML/RSS) from user-defined sources.
2) Filters the news items by applying user-defined search
expressions to the subject and body XML elements.
3) Stores this in a database so that people can view the filtered
news.
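The three steps above can be sketched roughly as follows, using only the Python standard library. The sample feed, the function names and the table layout are all assumptions made for illustration, and downloading the feed is left out:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real system would download this
# from a user-defined source URL.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Java Programmer wanted</title>
<description>UML skills a plus</description></item>
<item><title>Coffee prices rise</title>
<description>Java beans cost more</description></item>
</channel></rss>"""


def read_items(rss_text):
    """Step 1: parse RSS 2.0 text into (subject, body) pairs."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def filter_items(items, predicate):
    """Step 2: keep items whose subject or body satisfies the predicate."""
    return [(s, b) for s, b in items if predicate(s) or predicate(b)]


def store_items(conn, items):
    """Step 3: store the filtered items (assumed two-column table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS news (subject TEXT, body TEXT)")
    conn.executemany("INSERT INTO news VALUES (?, ?)", items)
    conn.commit()


conn = sqlite3.connect(":memory:")
wanted = filter_items(read_items(SAMPLE_RSS),
                      lambda text: "programmer" in text.lower())
store_items(conn, wanted)
stored = conn.execute("SELECT subject FROM news").fetchall()
```

The predicate passed to `filter_items` is the part the three options below differ on; here it is a trivial substring test.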
I've had a look at the options:
1) Write the whole thing from scratch; devise an algorithm for text
searching. This would have to deal with boolean logic (e.g. "must match
Java AND Programmer but not Coffee" OR "must match java AND UML") and
possibly regular expressions (these can be dropped out of scope).
Advantages
Totally meets requirements.
Disadvantages
Complex coding.
Time-intensive.
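To give a feel for how much code option 1 involves, here is a minimal sketch of a boolean matcher written as a small recursive-descent parser. The query syntax (upper-case AND, OR, NOT plus parentheses, with "but not" written as AND NOT) and the function names are assumptions, and matching is on whole words, case-insensitively:

```python
import re


def parse(query):
    """Turn 'a AND (b OR NOT c)' into a function: text -> bool."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def expr():  # expr := term (OR term)*
        left = term()
        while peek() == "OR":
            take()
            right = term()
            left = (lambda l, r: lambda ws: l(ws) or r(ws))(left, right)
        return left

    def term():  # term := factor (AND factor)*
        left = factor()
        while peek() == "AND":
            take()
            right = factor()
            left = (lambda l, r: lambda ws: l(ws) and r(ws))(left, right)
        return left

    def factor():  # factor := NOT factor | '(' expr ')' | WORD
        tok = take()
        if tok == "NOT":
            inner = factor()
            return lambda ws: not inner(ws)
        if tok == "(":
            inner = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if tok in (")", "AND", "OR"):
            raise ValueError("unexpected token: " + tok)
        word = tok.lower()
        return lambda ws: word in ws

    matcher = expr()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    # Compare against the set of lower-cased words in the text.
    return lambda text: matcher(set(re.findall(r"\w+", text.lower())))


match = parse("(Java AND Programmer AND NOT Coffee) OR (java AND UML)")
```

`match("Senior Java programmer needed")` is true, while `match("Java programmer loves coffee")` is false. Phrase search, stemming and regular expressions are not covered, which is where the extra effort of this option would go.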
2) Use XPath - this would involve creating stylesheets on the fly
that contain the appropriate logic. Some translation between what
the user enters and XPath's search syntax may be required.
Advantages
Less Flexible
Disadvantages
May not be flexible enough (could you do "must match Java AND
Programmer but not Coffee" OR "must match java AND UML" in XPath?).
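On the question in the parentheses: XPath 1.0 does have `and`, `or`, `not()` and `contains()`, so the example can be written as a predicate. A rough sketch of the translation step, building the expression as a string; the helper names are hypothetical, and `item` and `title` are the RSS 2.0 element names:

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def has(word, field="title"):
    """XPath 1.0 test: field contains word, ignoring ASCII case.

    contains() is case-sensitive, so translate() lower-cases the field.
    """
    if "'" in word:
        raise ValueError("quotes in search words are not handled here")
    return "contains(translate({f}, '{u}', '{l}'), '{w}')".format(
        f=field, u=UPPER, l=LOWER, w=word.lower())


def all_of(*tests):
    return "(" + " and ".join(tests) + ")"


def any_of(*tests):
    return "(" + " or ".join(tests) + ")"


def none_of(test):
    return "not(" + test + ")"


predicate = any_of(
    all_of(has("Java"), has("Programmer"), none_of(has("Coffee"))),
    all_of(has("java"), has("UML")),
)
xpath = "//item[" + predicate + "]"
```

The resulting string would go into a generated stylesheet or be passed to a full XPath 1.0 engine. Note that `contains()` does substring matching, so "java" also matches "javascript"; whole-word matching is awkward in XPath 1.0.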
3) Save the whole lot to the database and use database Full Text
Retrieval.
Advantages
Simple and easy.
Disadvantages
May be slow.
Bit of a hacky workaround.
Databases are not search engines!
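For comparison, this is roughly what option 3 looks like with one particular database. The sketch assumes SQLite built with the FTS5 extension (not necessarily the database in question); the table name, columns and sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed.
conn.execute("CREATE VIRTUAL TABLE news USING fts5(subject, body)")
conn.executemany(
    "INSERT INTO news (subject, body) VALUES (?, ?)",
    [
        ("Java Programmer wanted", "Backend role in London"),
        ("Java programmer reviews coffee machines", "A light-hearted piece"),
        ("Modelling tips", "Using UML with Java"),
        ("Coffee prices rise", "Market report"),
    ],
)

# FTS5 query syntax: "x NOT y" means rows matching x but not y.
query = "(java AND (programmer NOT coffee)) OR (java AND uml)"
rows = conn.execute(
    "SELECT subject FROM news WHERE news MATCH ? ORDER BY rowid",
    (query,),
).fetchall()
```

With this approach every item is stored first and the user's expression is applied at query time, so the boolean logic is handled by the database's own query parser. Other databases have their own full-text syntax, so the expression would need translating for each one.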
I'd really appreciate some comments as going down the wrong route
could be a world of pain!
Thanks,
Ben