I had some system maintenance..
The texts are long, from at least 15 KB up to a few megabytes. It shouldn't
be hard to find out which language a text is in, because it is long enough.
About 70% of the texts are labeled with their language, the rest are not.
Among the remaining 30%, at least 90% are English, some are German, a few
French and a few Italian, and maybe some Dutch, Chinese, .. Russian.. At the
very least I need a tool that will filter out the English texts as
accurately as possible.
So I think that some kind of statistical approach would be fine...
About 100% accuracy.. yes.. I know that is impossible...
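
To make the idea concrete, here is a minimal sketch of one possible statistical approach: count how often very common function words of each language occur in the text. Everything in it is an assumption for illustration only: the short word lists, the names language_scores and is_english, and the min_score and margin thresholds are made up here and would need tuning on the 70% of texts that are already labeled. Chinese and Russian texts simply match none of the lists, so they are rejected as non-English rather than identified.

```python
import re

# Small hand-picked lists of very frequent function words.
# ASSUMPTION: these lists are illustrative only; a real tool would use
# larger lists or character n-gram profiles built from the labeled texts.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "it", "was", "for", "with", "as"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "ein", "zu", "den", "mit", "von", "sich"},
    "french": {"le", "la", "les", "et", "des", "est", "une", "que", "dans", "pour", "pas", "qui"},
    "italian": {"il", "di", "che", "e", "la", "per", "non", "una", "con", "sono", "del", "gli"},
    "dutch": {"de", "het", "een", "en", "van", "dat", "niet", "op", "zijn", "met", "voor", "te"},
}


def language_scores(text):
    """Return, per language, the fraction of words that are in its stopword list."""
    # Runs of Unicode letters only (no digits, underscores or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {lang: 0.0 for lang in STOPWORDS}
    return {
        lang: sum(word in stopwords for word in words) / len(words)
        for lang, stopwords in STOPWORDS.items()
    }


def is_english(text, min_score=0.15, margin=1.5):
    """Guess whether text is English.

    ASSUMPTION: min_score and margin are arbitrary starting values, not
    tuned thresholds.
    """
    scores = language_scores(text)
    english = scores["english"]
    best_other = max(score for lang, score in scores.items() if lang != "english")
    # English must be frequent enough in absolute terms and clearly ahead
    # of the best competing language.
    return english >= min_score and english >= margin * best_other
```

With texts of 15 KB or more the word frequencies should be fairly stable, so even a crude score like this may separate English from the rest reasonably well; checking it against the labeled texts would show how far it is from the wished-for 100%.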