Ola K said:
Hi,
I am pretty new to Python and I want to make a script that will search
for the following options:
1) words made of uppercase characters -only- (like "YES")
2) words made of lowercase characters -only- (like "yes")
3) and words with only the first letter capitalized (like "Yes")
* and I need to do all these considering the fact that not all letters
are indeed English letters.
I went through different documentation sections but couldn't find the right
condition, function or method for it.
Suggestions will be very much appreciated...
--Ola
Ola,
You may be new to Python, but are you new to regular expressions too? I am
no wiz at them, but here is a script that takes a stab at what you are
trying to do. (For more regular expression info, see
http://www.amk.ca/python/howto/regex/.)
The script has these steps:
- create strings containing all unicode chars that are considered "lower"
and "upper", using the unicode.is* methods
- use these strings to construct 3 regular expressions (or "re"s), one for
words of all lowercase letters, one for words of all uppercase letters, and
one for words that start with an uppercase letter followed by at least one
lowercase letter.
- use each re to search the string u"YES yes Yes", and print the found
matches
I've used unicode strings throughout, so this should be applicable to your
text consisting of letters beyond the basic Latin set (since Outlook Express
is trying to install Israeli fonts when reading your post, I assume these
are the characters you are trying to handle). The unicode.isupper and
unicode.islower methods use Python's built-in Unicode character tables rather
than your locale (only the 8-bit str versions are locale-dependent), so I hope
this gives you a jump start on your problem.
-- Paul
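import sys
import re

# Collect every code point that Python considers uppercase or lowercase.
# The "+ 1" makes sure sys.maxunicode itself is included in the scan.
uppers = u"".join( unichr(i) for i in range(sys.maxunicode + 1)
                   if unichr(i).isupper() )
lowers = u"".join( unichr(i) for i in range(sys.maxunicode + 1)
                   if unichr(i).islower() )

# Build one character-class regex per kind of word.
allUpperRe = ur"\b[%s]+\b" % uppers
allLowerRe = ur"\b[%s]+\b" % lowers
capWordRe = ur"\b[%s][%s]+\b" % (uppers,lowers)

regexes = [
    (allUpperRe, "all upper"),
    (allLowerRe, "all lower"),
    (capWordRe, "title case"),
    ]

for reString,label in regexes:
    # re.UNICODE makes \b treat non-ASCII letters as word characters
    reg = re.compile(reString, re.UNICODE)
    result = reg.findall(u" YES yes Yes ")
    print label,":",result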
Prints:
all upper : [u'YES']
all lower : [u'yes']
title case : [u'Yes']
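For anyone reading this on Python 3, where unichr and the ur"" prefix no longer exist, the same steps might be sketched roughly as below. This is an assumed port, not part of the original post: the names CASE_PATTERNS and find_words are made up for illustration, and the re.escape call is only a precaution in case a cased character ever turns out to be special inside a character class.

```python
import re
import sys

# Every code point that Python 3 reports as uppercase or lowercase.
uppers = "".join(chr(i) for i in range(sys.maxunicode + 1) if chr(i).isupper())
lowers = "".join(chr(i) for i in range(sys.maxunicode + 1) if chr(i).islower())

# Character classes built from those strings (escaped as a precaution).
upper_class = "[" + re.escape(uppers) + "]"
lower_class = "[" + re.escape(lowers) + "]"

# Hypothetical lookup table: label -> compiled pattern.
# In Python 3, \b is Unicode-aware by default for str patterns.
CASE_PATTERNS = {
    "all upper": re.compile(r"\b" + upper_class + r"+\b"),
    "all lower": re.compile(r"\b" + lower_class + r"+\b"),
    "title case": re.compile(r"\b" + upper_class + lower_class + r"+\b"),
}

def find_words(text):
    """Return a dict mapping each label to the words it matches in text."""
    return {label: rx.findall(text) for label, rx in CASE_PATTERNS.items()}

print(find_words(" YES yes Yes "))
# {'all upper': ['YES'], 'all lower': ['yes'], 'title case': ['Yes']}
```

Building the two strings scans the whole code point range once, so it is probably best done a single time at import and reused.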