setar
I store expressions in some natural language in a TreeSet. This can be
any natural language. The contents of the set are sorted for this
language by the language's collator:
Collator collator = Collator.getInstance(locale);
collator.compare(string1, string2);
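A minimal sketch of such an index, assuming the set is built by passing the Collator straight to the TreeSet constructor (this works because Collator implements Comparator<Object>). The class and method names are hypothetical and used only for illustration; the word lists are the examples from this post.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class CollatorIndex {
    // Builds an index whose order comes from the locale's Collator rather than
    // from String.compareTo (which compares raw UTF-16 code units).
    static TreeSet<String> buildIndex(Locale locale, List<String> expressions) {
        // Collator implements Comparator<Object>, so TreeSet accepts it directly.
        TreeSet<String> index = new TreeSet<>(Collator.getInstance(locale));
        index.addAll(expressions);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(buildIndex(Locale.ENGLISH, Arrays.asList("ear", "dog", "cat")));
        // The escape below is the Polish letter c with an acute accent, written as an
        // escape so the source file stays ASCII-only.
        System.out.println(buildIndex(Locale.forLanguageTag("pl"),
                Arrays.asList("echo", "duma", "\u0107ma", "cena")));
    }
}
```

Using the Collator as the set's comparator means every later range query (subSet, tailSet) is also interpreted in collation order, not in code-point order.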
Sorting works correctly. For example, in English the letters have the
following order: ..., c, d, e, ..., so the words cat, dog, ear are
sorted in that order.
Sorting also works correctly in other languages. Non-English languages
can have additional letters. For example, Polish has the letter 'ć'
(c with an acute accent). Such letters change the order of the other
letters. In Polish the order is: ..., c, ć, d, e, ..., and, for example,
the following words are in this order: cena (en. price), ćma (en.
moth), duma (en. pride), echo (en. echo).
So the index is built correctly for any language, and it is sorted
correctly.
Now I want to retrieve a range of its sorted values: the strings
beginning with a given string.
For English it is easy. To get all expressions beginning with the
string "cat", I can invoke the subSet method with the first argument
equal to this string and the second argument equal to the same string
with its last letter replaced by its SUCCESSOR. Successors in English
are easy to determine:
TreeSet<String> index = new TreeSet<String>(collator);
SortedSet<String> catExpressions = index.subSet("cat", "cau");
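The successor approach can be sketched as follows, assuming the successor of a letter is simply the next UTF-16 code unit ('t' becomes 'u'). The helper naiveSuccessor and the sample index are hypothetical; the assumption only holds where code-point order and alphabet order agree, as for lowercase English letters.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class SuccessorRange {
    // Hypothetical helper: replaces the last character of the prefix with the next
    // UTF-16 code unit. Matches collation order only where code-point order and
    // alphabet order agree (e.g. lowercase English letters).
    static String naiveSuccessor(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Entries in [prefix, successor): lower bound inclusive, upper bound exclusive.
    static List<String> prefixRange(TreeSet<String> index, String prefix) {
        return new ArrayList<>(index.subSet(prefix, naiveSuccessor(prefix)));
    }

    static TreeSet<String> englishIndex() {
        TreeSet<String> index = new TreeSet<>(Collator.getInstance(Locale.ENGLISH));
        index.addAll(Arrays.asList("bat", "cat", "catch", "cats", "caught", "dog"));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(prefixRange(englishIndex(), "cat")); // [cat, catch, cats]
    }
}
```

Because subSet excludes its upper bound, "caught" (which starts with "cau") is left out while everything starting with "cat" is kept.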
But for other languages, determining the successor of a letter isn't
easy. For example, in Polish the successor of the letter 'c' is 'ć',
not 'd'. I could create tables with the letter order for every
language, but there are too many languages... I looked in the Java API
for a method that returns the successor of a given character, but I
found nothing. Does anybody know whether such a method exists?
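The Polish problem can be made concrete with a small sketch. It assumes the naive code-point successor ('d' for 'c') is used with a Polish collator; the class name and sample index are hypothetical. Since the collator places 'ć' (U+0107) between 'c' and 'd', the range ["c", "d") also returns a word that does not start with "c".

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class PolishSuccessorPitfall {
    static List<String> range(String from, String to) {
        TreeSet<String> index = new TreeSet<>(Collator.getInstance(Locale.forLanguageTag("pl")));
        // "\u0107ma" is the Polish word for moth, written with an escape to keep the source ASCII.
        index.addAll(Arrays.asList("cena", "\u0107ma", "duma", "echo"));
        return new ArrayList<>(index.subSet(from, to));
    }

    public static void main(String[] args) {
        // 'd' is the code-point successor of 'c', but Polish collation sorts the
        // accented letter between them, so it ends up inside this range.
        List<String> result = range("c", "d");
        System.out.println(result);
        for (String s : result) {
            System.out.println(s + " startsWith(\"c\"): " + s.startsWith("c"));
        }
    }
}
```

The word with 'ć' lands in the range even though String.startsWith("c") is false for it, which is why the upper bound has to be the collation successor rather than the code-point successor.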
A second way of retrieving expressions beginning with a given string is
to invoke subSet with the following arguments:
TreeSet<String> index = new TreeSet<String>(collator);
SortedSet<String> catExpressions = index.subSet("cat", "cat" + Character.MAX_VALUE);
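A sketch of this second approach on a small hypothetical index. It rests on an assumption worth stating explicitly: that the collator sorts Character.MAX_VALUE (U+FFFF) after every character that can follow the prefix, so prefix + U+FFFF is an upper bound for all strings starting with the prefix.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class MaxValueRange {
    // Upper bound = prefix followed by Character.MAX_VALUE (U+FFFF). Assumes the
    // collator sorts U+FFFF after every character that may follow the prefix.
    static List<String> prefixRange(TreeSet<String> index, String prefix) {
        return new ArrayList<>(index.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    static TreeSet<String> englishIndex() {
        TreeSet<String> index = new TreeSet<>(Collator.getInstance(Locale.ENGLISH));
        index.addAll(Arrays.asList("bat", "cat", "catch", "cats", "caught", "dog"));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(prefixRange(englishIndex(), "cat"));
    }
}
```

Unlike the successor approach, this needs no per-language table of letter successors; its correctness depends entirely on where the collator places U+FFFF.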
I tested it with a few strings and everything seemed to work. But
recently I tried to return all expressions starting with the word
"cat", not just the string. So I want to get the string "cat is
running", but I don't want to get the expression "catch". So I invoked
subSet with a space after the string "cat":
TreeSet<String> index = new TreeSet<String>(collator);
SortedSet<String> catWordExpressions = index.subSet("cat ", "cat " + Character.MAX_VALUE);
Unfortunately, the method also returned the word "catch". I don't know
why subSet behaves this way.
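One plausible explanation, which is an assumption here rather than something established by the post, is that Java's default collation rules treat the space as ignorable at the primary level. "cat " would then be compared with "catch" almost as if it were "cat", so "catch" sorts after the lower bound, while the U+FFFF in the upper bound still sorts after the 'c' of "catch". The hypothetical diagnostic below reproduces the query on a small index and prints the relevant comparisons so the behaviour can be checked on a given JVM.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class SpaceBoundDiagnostic {
    static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    // Reproduces the query from the post: range ["word ", "word " + U+FFFF).
    static List<String> wordRange(String word) {
        TreeSet<String> index = new TreeSet<>(COLLATOR);
        index.addAll(Arrays.asList("cat", "cat is running", "catch", "dog"));
        String from = word + " ";
        return new ArrayList<>(index.subSet(from, from + Character.MAX_VALUE));
    }

    public static void main(String[] args) {
        System.out.println("range: " + wordRange("cat"));
        // Negative result: first argument sorts before the second; 0: equal; positive: after.
        System.out.println("compare(\"cat \", \"catch\") = " + COLLATOR.compare("cat ", "catch"));
        // At PRIMARY strength only base-letter differences count; a result of 0 here
        // would indicate that the space is ignored at that level.
        Collator primary = Collator.getInstance(Locale.ENGLISH);
        primary.setStrength(Collator.PRIMARY);
        System.out.println("PRIMARY compare(\"cat \", \"cat\") = " + primary.compare("cat ", "cat"));
    }
}
```

If the PRIMARY comparison prints 0, the space cannot separate "cat is running" from "catch" in collation order, which would account for the result described above.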
Does anybody know how to correct the ideas described above, or can the
problem be solved in another way?