Dan Fitzpatrick
I am trying to build an indexing structure on some phrases. Most phrases
will have 2 - 5 parts (words). The resulting array will be dumped into
an index to find the matching phrases. I don't want to do wildcard
searching on the resulting array to find the phrase.
I would like to turn "This is some text" into
["This",
"This is",
"This is some",
"This is some text",
"is",
"is some",
"is some text",
"some",
"some text",
"text"]
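Every entry in that list is a run of consecutive words, so one compact way to build it is to collect every window of 1, 2, ... n adjacent words. Here is a minimal sketch, assuming words are separated by non-word characters. Enumerable#each_cons(n) yields each window of n adjacent elements. The method name `subphrases` is only a name chosen for this example.

```ruby
# Return every run of consecutive words in +text+, shortest runs first.
# Splitting on /\W+/ (not /\W/) keeps repeated spaces or punctuation
# from producing empty strings.
def subphrases(text)
  words = text.split(/\W+/).reject(&:empty?)
  (1..words.size).flat_map do |n|
    # each_cons(n) yields every window of n adjacent words.
    words.each_cons(n).map { |run| run.join(' ') }
  end
end

p subphrases("This is some text")
# => ["This", "is", "some", "text", "This is", "is some", "some text",
#     "This is some", "is some text", "This is some text"]
```

A phrase of k words yields k(k+1)/2 entries, so the 2-5 word phrases described above produce at most 15 entries each.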
The order of the resulting array doesn't matter. When someone searches
for "is some" or "some text", I want it to find this phrase. I don't
want a search for "is text" to find this phrase though.
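The matching rule here is "the query's words appear in the phrase as one contiguous run". As a sketch of just that rule, without any index, the check can compare word arrays directly. The method name `contains_phrase?` is hypothetical, and the comparison is case-sensitive as written.

```ruby
# True if the words of +query+ occur as one contiguous run in +text+.
def contains_phrase?(text, query)
  words = text.split(/\W+/).reject(&:empty?)
  query_words = query.split(/\W+/).reject(&:empty?)
  return false if query_words.empty?
  # Any window of query length that equals the query is an exact match;
  # if the query is longer than the text, each_cons yields nothing.
  words.each_cons(query_words.size).include?(query_words)
end

contains_phrase?("This is some text", "is some")   # => true
contains_phrase?("This is some text", "some text") # => true
contains_phrase?("This is some text", "is text")   # => false
```

This scans every phrase on each search, which is what the precomputed array is meant to avoid; it only pins down the matching rule.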
My solution so far finds everything except the middle elements, in this
case "is some". The longer the original phrase, the more middle parts are
missing from the array.
text = "This is some text"
#=> "This is some text"
ws = ''; text.split(/\W/).collect{|w| ws = (ws+' '+w).strip; ws}
#=> ["This", "This is", "This is some", "This is some text"]
ws = ''; text.split(/\W/).reverse.collect{|w| ws = (w+' '+ws).strip; ws}
#=> ["text", "some text", "is some text", "This is some text"]
text.split(/\W/)
#=> ["This", "is", "some", "text"]
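The prefix-building loop above already produces every run that starts at the first word. A minimal sketch of one way to get the middle runs too, keeping the same accumulator style: run that loop once for every starting word. The variable names are chosen for this example only.

```ruby
text = "This is some text"
words = text.split(/\W+/)
phrases = []

# Apply the prefix-building loop to every suffix of the word list:
# start index 0 gives the prefixes, and later start indexes supply
# the middle runs such as "is some".
words.each_index do |start|
  ws = ''
  words[start..-1].each do |w|
    ws = (ws + ' ' + w).strip
    phrases << ws
  end
end

p phrases
# => ["This", "This is", "This is some", "This is some text", "is",
#     "is some", "is some text", "some", "some text", "text"]
```

This yields the list in the same order as the example at the top of the post.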
Is there a better Ruby way to do this? Or is there a better data
structure for retrieving a word or an exact phrase within a
phrase/sentence without wildcard searching?
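On the data-structure side, one common option is a hash keyed by every sub-phrase, so an exact-phrase lookup is a single hash access with no wildcards. This is a sketch under assumptions: the `PhraseIndex` class and `subphrases` helper are hypothetical names, and lookups are case-sensitive and normalize whitespace and punctuation by splitting on /\W+/.

```ruby
require 'set'

# Every run of consecutive words in +text+.
def subphrases(text)
  words = text.split(/\W+/).reject(&:empty?)
  (1..words.size).flat_map { |n| words.each_cons(n).map { |run| run.join(' ') } }
end

# Maps each sub-phrase to the set of full phrases that contain it.
class PhraseIndex
  def initialize
    # The default block creates an empty Set the first time a key is written.
    @index = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def add(phrase)
    subphrases(phrase).each { |sub| @index[sub] << phrase }
    self
  end

  # Exact-phrase lookup; fetch with a default does not call the
  # default block, so unknown queries do not add keys to the hash.
  def search(query)
    key = query.split(/\W+/).reject(&:empty?).join(' ')
    @index.fetch(key, Set.new).to_a
  end
end

index = PhraseIndex.new
index.add("This is some text").add("some text here")
index.search("some text") # both phrases
index.search("is some")   # => ["This is some text"]
index.search("is text")   # => []
```

Storage grows with k(k+1)/2 keys per k-word phrase, which stays small for 2-5 word phrases. For longer text, a positional index (word mapped to positions, then checking that positions are consecutive) avoids storing every sub-phrase.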
Thanks,
Dan