mike b.
Hi all,
I have to parse about 2000 files that are written in multiple
languages (some English, some Korean, some Arabic and some Japanese).
I have to split these UTF-8 encoded files into individual sentences. Has
anyone written a good parser that can handle all of these languages,
including the non-Latin scripts, or can someone give me some advice on
how to go about writing a parser that can handle such fairly different
languages?
Thank you,
Mike