How to identify Cyrillic characters in a String?


t.javast

Hi everyone,

How can I tell whether my String contains Cyrillic characters?

Thanks in advance :)
 

Karl Uppiano

I know it's a little weird, but you might iterate through your string,
testing each character, using something like this:

    boolean isCyrillic(char c) {
        // True when c falls in the basic Cyrillic block (U+0400 to U+04FF).
        return Character.UnicodeBlock.CYRILLIC.equals(Character.UnicodeBlock.of(c));
    }

Perhaps not the most efficient, but you don't have to recreate and maintain
parts of the Unicode character database this way.
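
To test a whole String rather than a single char, the same idea can be applied to every code point. The following is only a minimal sketch, assuming Java 8 or later; the class name CyrillicCheck and the method containsCyrillic are made up for illustration. It swaps in Character.UnicodeScript (available since Java 7) for the single CYRILLIC block, so it should also catch letters from the supplementary Cyrillic blocks:

```java
final class CyrillicCheck {
    // Hypothetical helper: true if at least one code point of s is in the Cyrillic script.
    static boolean containsCyrillic(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.CYRILLIC);
    }

    public static void main(String[] args) {
        // The second literal spells a Russian greeting using Unicode escapes.
        System.out.println(containsCyrillic("Hello"));                                // false
        System.out.println(containsCyrillic("\u041F\u0440\u0438\u0432\u0435\u0442")); // true
    }
}
```

Iterating over code points instead of chars avoids splitting surrogate pairs, which does not matter for Cyrillic itself but keeps the loop safe for arbitrary input.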
 

Andreas Leitgeb

Thanks a lot karl !!
This will help. :)

Excuse me, I'm just curious as to why one would need
this: recognizing only Cyrillic, but not e.g. Chinese, etc.
Does Cyrillic include both Greek and Russian?

If it's about language specifics, would you also
recognize e.g. "Greeklish" (Greek written with Roman
letters, like 'ellhnika')?

Please help me widen my horizon.
 

Thomas Fritsch

Andreas said:
Excuse me, I'm just curious as to why one would need
this: recognizing only Cyrillic, but not e.g. Chinese, etc.
Does Cyrillic include both Greek and Russian?

No, Cyrillic is just Cyrillic, and has nothing to do with Greek
characters. The Cyrillic characters are '\u0400' to '\u04FF'. The Greek
characters are '\u0370' to '\u03FF'. However, the 'A' in Latin, Greek,
and Cyrillic all look the same, although they are three different
characters ('\u0041', '\u0391', '\u0410').
It seems you are confusing character ranges (like Cyrillic) with
languages (like Russian, Bulgarian, Serbian) which use these characters.
Maybe the confusion arises because there is only one language in the
world (the Greek language) which uses Greek characters.

If it's about language specifics, would you also
recognize e.g. "Greeklish" (Greek written with Roman
letters, like 'ellhnika')?

Recognizing "Greeklish" is a completely different story (probably much
more difficult).
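
The point about the three look-alike letters can be checked directly. This is just a small sketch; the class name LookalikeLetters and the helper blockOf are invented for the example, and it relies only on Character.UnicodeBlock.of from the standard library:

```java
final class LookalikeLetters {
    // Hypothetical helper: the Unicode block that contains c.
    static Character.UnicodeBlock blockOf(char c) {
        return Character.UnicodeBlock.of(c);
    }

    public static void main(String[] args) {
        char latinA = '\u0041';     // LATIN CAPITAL LETTER A
        char greekAlpha = '\u0391'; // GREEK CAPITAL LETTER ALPHA
        char cyrillicA = '\u0410';  // CYRILLIC CAPITAL LETTER A

        // Each comparison prints true: the glyphs look alike, the blocks differ.
        System.out.println(blockOf(latinA) == Character.UnicodeBlock.BASIC_LATIN);
        System.out.println(blockOf(greekAlpha) == Character.UnicodeBlock.GREEK);
        System.out.println(blockOf(cyrillicA) == Character.UnicodeBlock.CYRILLIC);

        // The characters themselves are not equal either.
        System.out.println(latinA == cyrillicA); // false
    }
}
```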
 

Andreas Leitgeb

Thomas Fritsch said:
No, Cyrillic is just Cyrillic, and has nothing to do with Greek
characters.

I think there is a misunderstanding... Greek and Russian
character sets are generally referred to as "Cyrillic", and
so are the scripts of some more countries (you named Serbia
yourself), whose people use a script whose "R" looks like a
Latin "P" (*)

Now that it's clear that you mean the subset of Unicode chars
used for the Russian language, I'd still, out of pure curiosity,
like to know what difference it makes in your application if
a user types Russian letters as opposed to writing
Chinese, Vietnamese, Xhosa, accented Latin letters or just
plain US Latin... unless, of course, telling me that
would conflict with any non-disclosure agreements...


(*): yes, I'm aware that this is not a definition of Cyrillic
scripts worthy of linguistics :)
 

Alan Morgan

I think there is a misunderstanding... Greek and Russian
character sets are generally referred to as "Cyrillic",

By whom? I can see the Russian alphabet being referred to as
Greek or Hellenic as it is derived mostly from Greek, but I've
never heard of the Greek alphabet referred to as Cyrillic. That
would be like referring to the Latin alphabet as Turkic.

Alan
 

Andreas Leitgeb

Alan Morgan said:
By whom? I can see the Russian alphabet being referred to as
Greek or Hellenic as it is derived mostly from Greek, ...

So it's two to one in this thread...
I'll do some research on what "Cyrillic" really is.
Maybe I'm wrong after all. You know, there are always
things that one believes one knows for sure, and many
years later one might find out it was wrong all the time...
Whether Greek has got anything to do with Cyrillic wasn't
really my point...

Anyway, I'd like to know any reason why one would try to
detect characters of any particular alphabet in a user's input.
What difference in an application's behaviour would such a detection
reasonably trigger?
 

John W. Kennedy

Andreas said:
So it's two to one in this thread...
I'll do some research on what "Cyrillic" really is.

The Cyrillic alphabet is the alphabet used for Russian, Serbian,
Bulgarian, Ukrainian, Belarusian, and some other Slavic languages,
plus some non-Slavic languages in the traditional area of Russian
hegemony. The Greek alphabet is something different. The Cyrillic
alphabet is named for St. Cyril, who, along with his brother, St.
Methodius, first brought Christianity to the Slavs.
 

Andreas Leitgeb

John W. Kennedy said:
[ about Cyril and his script ]

fine, learnt a thing.

Still I'm eager to learn about possible reasons to determine
existence of any particular script within a user's input, as
the original poster wanted to do.
 
