Lucene: add a field to the index, based on an HTML meta tag

Keith Beef

I have a question about building an index file.

I've been using the Lucene demo from
http://lucene.apache.org/java/2_1_0/demo.html

I want to add a field named "category" to my HTML documents, and ideally I
would like to do this by reading a meta tag in the HTML document, so that
when searching I can use a term like "category:spare_parts" to limit the
hits returned.

E.g., when indexing file123456789.html, the tag <meta name="category"
content="spare_parts"> would put the value "spare_parts" in the "category"
field.
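
Roughly, the extraction step I have in mind would look like the sketch below. It uses only the JDK and a simple regular expression rather than the demo's HTML parser; the class name MetaCategoryExtractor and the method extractCategory are hypothetical names of my own, and the regex assumes the name attribute comes before content. What I don't know is the right way to hook this into the demo's indexing code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the "category" meta tag value out of raw HTML.
public class MetaCategoryExtractor {

    // Assumes the attribute order name=... content=..., quoted with " or '.
    private static final Pattern CATEGORY_META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']category[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns the content value of the first matching meta tag, if any.
    public static Optional<String> extractCategory(String html) {
        Matcher m = CATEGORY_META.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"category\" content=\"spare_parts\">"
            + "</head><body>...</body></html>";

        String category = extractCategory(html).orElse("(none)");
        System.out.println(category);

        // Assumption, not verified against the demo: with Lucene 2.1 on the
        // classpath, the value would presumably be added to the Document as
        // an untokenized field so that category:spare_parts matches exactly:
        //   doc.add(new Field("category", category,
        //                     Field.Store.YES, Field.Index.UN_TOKENIZED));
    }
}
```
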

So how could I do this?


Regards,
Keith.
 
