a -very- case sensitive search

Ola K · Nov 25, 2006

Hi,
I am pretty new to Python and I want to make a script that will search
for the following options:
1) words made of uppercase characters -only- (like "YES")
2) words made of lowercase characters -only- (like "yes")
3) and words with only the first letter capitalized (like "Yes")
* and I need to do all these considering the fact that not all letters
are indeed English letters.

I went through different documentation sections but couldn't find the right
condition, function or method for it.
Suggestions will be very much appreciated...
--Ola

Goofy666 · Nov 25, 2006

Hi

> * and I need to do all these considering the fact that not all letters
> are indeed English letters.

You mean letters from the English alphabet (derived from the Latin/Roman
alphabet, fyi)? I'm sorry for the nitpicking, but 'English letters'
sounds a bit too awkward to me.

> I went through different documentation sections but couldn't find the right
> condition, function or method for it.
> Suggestions will be very much appreciated...

I'm still (trying to) learn(ing) it myself, but you can try looking into
using regular expressions. There's a standard module for it (re), see
the PyLib Reference for details; http://docs.python.org/lib/module-re.html.

--Laurens
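Following Laurens's pointer to the re module, here is a minimal sketch, assuming Python 3 (where str patterns are Unicode-aware by default), that pulls out letter-only words. The character class [^\W\d_] means "a word character that is neither a digit nor an underscore", i.e. a letter in any script. The helper name letter_words is hypothetical.

```python
import re

# [^\W\d_] = word characters minus digits and underscore, i.e. letters.
# In Python 3, str patterns follow Unicode rules, so accented letters count.
WORD_RE = re.compile(r"\b[^\W\d_]+\b")

def letter_words(text):
    """Return the words in text that consist of letters only."""
    return WORD_RE.findall(text)

print(letter_words("YES yes Yes, été 42 abc1"))
```

A token such as "abc1" is skipped entirely, because the trailing \b cannot match between "c" and "1".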

Dustan · Nov 25, 2006

I'm not sure exactly what you mean by "considering the fact that not
all letters are indeed English letters"; you could mean you don't care
about the non-English characters, or you could mean you don't want any
non-English characters at all (so the function should return False in
that case). If it's the former, there's a simple test for each:
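A minimal sketch of such tests, assuming Python 3; the classify helper is hypothetical and not necessarily the code Dustan posted:

```python
def classify(word):
    """Classify a word by case using the built-in str methods.

    Uncased characters (digits, punctuation, Hebrew letters, ...) are
    ignored by these methods, which matches the "don't care about the
    other characters" reading of the question.
    """
    if word.isupper():
        return "upper"   # e.g. "YES"
    if word.islower():
        return "lower"   # e.g. "yes"
    if word.istitle():
        return "title"   # e.g. "Yes"
    return None          # mixed case such as "yEs", or no cased letters at all

for w in ["YES", "yes", "Yes", "yEs"]:
    print(w, classify(w))
```

Because isupper(), islower() and istitle() skip uncased characters, "YES!" and "yes42" still classify as upper and lower respectively.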

Dustan · Nov 25, 2006

If you're using Google Groups, for some reason it treated my example
code as 'quoted text', which it certainly isn't, seeing as it's not
found anywhere prior to my message.

Dustan · Nov 25, 2006

Forget what I said; I didn't know about the str.is* methods.

Steven D'Aprano · Nov 25, 2006

At the command prompt:

>>> dir('')
# result edited for clarity
[ ... 'isalnum', 'isalpha', 'isdigit', 'islower',
'isspace', 'istitle', 'isupper', ... ]

Then do this:

>>> help(''.islower)

and read the text it provides. Then experiment on the command line:

>>> 'abcd1234'.islower()
True
>>> 'aBcd1234'.islower()
False

Then come back to us if they aren't suitable, and tell us WHY they aren't
suitable.
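Steven's experiment shows that 'abcd1234'.islower() is True: the case methods only look at cased characters. If digits or other non-letters should disqualify a word (the second reading Dustan describes), one hedged option, assuming Python 3, is to combine the case test with str.isalpha(). The strict_* helper names are hypothetical.

```python
def strict_lower(word):
    """True if word contains only letters and its cased letters are all lowercase."""
    return word.isalpha() and word.islower()

def strict_upper(word):
    """True if word contains only letters and its cased letters are all uppercase."""
    return word.isalpha() and word.isupper()

def strict_title(word):
    """True if word contains only letters and is in title case, like "Yes"."""
    return word.isalpha() and word.istitle()

print('abcd1234'.islower(), strict_lower('abcd1234'))
print(strict_lower('abcd'), strict_upper('YES'), strict_title('Yes'))
```

Note that isalpha() accepts letters from any script, Hebrew included; if only ASCII letters should count, str.isascii() (Python 3.7+) could be added to each test.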

John Machin · Nov 25, 2006

Dustan said:
Forget what I said; I didn't know about the str.is* methods.

... or about the unicode.is* [same bunch of] methods, which may
possibly address the OP's "not all letters are indeed English letters"
concerns. If he's stuck with 8-bit str objects, he may need to ensure
the locale is set properly, as vaguely hinted at in the docs that Steven
pointed him to.

Cheers,
John
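To illustrate John's point about the unicode methods (the unicode type became str in Python 3), here is a small check, assuming Python 3, where these methods consult the Unicode character database rather than the current locale. The sample words are illustrative only.

```python
# Python 3 str methods use the Unicode character database,
# not the current locale, so no locale setup is needed here.
samples = ["ÉTÉ", "été", "Été", "ΑΘΗΝΑ", "straße", "שלום"]

for word in samples:
    print(word, word.isupper(), word.islower(), word.istitle(), word.isalpha())
```

The Hebrew word reports False for every case test but True for isalpha(), since Hebrew letters have no case.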

John Machin · Nov 26, 2006

Dustan said:
If you're using google groups, it for some reason thought my example
code was 'quoted text', which it certainly isn't, seeing as it's not
found anywhere prior to my message.

Sigh. And if we're NOT using Google Groups, it still thinks so ...

The reason is that your "example code" was in fact a screen-dump of a
Python interactive session, in which lines are preceded by ">>>" which
Google Groups simplistically thinks is quoted text from a previous
message.

HTH,
John

Dennis Lee Bieber · Nov 26, 2006

John Machin said:
The reason is that your "example code" was in fact a screen-dump of a
Python interactive session, in which lines are preceded by ">>>" which
Google Groups simplistically thinks is quoted text from a previous
message.

So does Agent, though I've learned to live with it...

Paul McGuire · Nov 26, 2006

Ola,

You may be new to Python, but are you new to regular expressions too? I am
no whiz at them, but here is a script that takes a stab at what you are
trying to do. (For more regular expression info, see
http://www.amk.ca/python/howto/regex/.)

The script has these steps:
- create strings containing all unicode chars that are considered "lower"
and "upper", using the unicode.is* methods
- use these strings to construct 3 regular expressions (or "re"s), one for
words of all lowercase letters, one for words of all uppercase letters, and
one for words that start with an uppercase letter followed by at least one
lowercase letter.
- use each re to search the string u"YES yes Yes", and print the found
matches

I've used unicode strings throughout, so this should be applicable to your
text consisting of letters beyond the basic Latin set (since Outlook Express
is trying to install Israeli fonts when reading your post, I assume these
are the characters you are trying to handle). You may have to do some setup
of your locale for proper handling of unicode.isupper, etc., but I hope this
gives you a jump start on your problem.

-- Paul

import sys
import re

uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
                   if unichr(i).isupper() )
lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
                   if unichr(i).islower() )

allUpperRe = ur"\b[%s]+\b" % uppers
allLowerRe = ur"\b[%s]+\b" % lowers
capWordRe = ur"\b[%s][%s]+\b" % (uppers,lowers)

regexes = [
    (allUpperRe, "all upper"),
    (allLowerRe, "all lower"),
    (capWordRe, "title case"),
    ]
for reString,label in regexes:
    # re.UNICODE makes \b treat non-ASCII letters as word characters
    reg = re.compile(reString, re.UNICODE)
    result = reg.findall(u" YES yes Yes ")
    print label,":",result

Prints:
all upper : [u'YES']
all lower : [u'yes']
title case : [u'Yes']
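For readers on Python 3, a hedged port of Paul's approach: unichr becomes chr, the u/ur prefixes go away, print becomes a function, and range is already lazy. It also tries a non-ASCII sample. The names UPPER_CLASS and LOWER_CLASS are hypothetical.

```python
import re
import sys

# Collect every code point that Python 3 considers uppercase or lowercase.
# range() is lazy in Python 3, so no million-element list is built.
uppers = []
lowers = []
for i in range(sys.maxunicode + 1):
    ch = chr(i)
    if ch.isupper():
        uppers.append(ch)
    elif ch.islower():
        lowers.append(ch)

# re.escape is only a precaution: cased letters are not special inside [...].
UPPER_CLASS = re.escape("".join(uppers))
LOWER_CLASS = re.escape("".join(lowers))

# str patterns are Unicode-aware in Python 3, so \b also works next to é, Ж, etc.
regexes = [
    (re.compile(r"\b[%s]+\b" % UPPER_CLASS), "all upper"),
    (re.compile(r"\b[%s]+\b" % LOWER_CLASS), "all lower"),
    (re.compile(r"\b[%s][%s]+\b" % (UPPER_CLASS, LOWER_CLASS)), "title case"),
]

for reg, label in regexes:
    print(label, ":", reg.findall(" YES yes Yes "))
    print(label, ":", reg.findall(" ÉTÉ été Été "))
```

Building the two classes scans all 1.1 million code points, so expect a short delay at startup; like Paul's version, the title-case pattern needs at least one lowercase letter after the capital.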

John Machin · Nov 26, 2006

Paul said:
> I've used unicode strings throughout, so this should be applicable to your
> text consisting of letters beyond the basic Latin set (since Outlook Express
> is trying to install Israeli fonts when reading your post, I assume these
> are the characters you are trying to handle).

I'd guessed the OP was in Israel from his e-mail address. If that's
what Outlook Express is doing, then that's conclusive proof.

An aside to the OP: Pardon my ignorance, but does Hebrew have upper and
lower case?

> You may have to do some setup
> of your locale for proper handling of unicode.isupper, etc.,

Whatever gave you that impression?

> but I hope this
> gives you a jump start on your problem.
>
> -- Paul
>
> import sys
> import re
>
> uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                    if unichr(i).isupper() )
> lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                    if unichr(i).islower() )

Just in case the OP is running a 32-bit unicode implementation, you
might want to make that xrange, not range.

Cheers,
John

Paul McGuire · Nov 26, 2006

John -

Thanks for the updates. Comments below...

-- Paul

> Whatever gave you that impression?

Nothing. Just my own ignorance of unicode and i18n. This post really is
just string mechanics and re's; I wasn't sure I had all the underlying
unicode stuff right.

> Just in case the OP is running a 32-bit unicode implementation, you
> might want to make that xrange, not range

Good tip. I rarely use xrange; it seems like such a language wart. Isn't
"range" going to become what "xrange" is in Py3k?
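On Paul's question: in Python 3, range returns a lazy range object (the role xrange played in Python 2), and xrange no longer exists. A small check, assuming Python 3.3 or later, where sys.maxunicode is always 0x10FFFF:

```python
import sys

# In Python 3, range() returns a lazy sequence object (what xrange was in
# Python 2), so covering every code point costs almost no memory.
all_code_points = range(sys.maxunicode + 1)

print(type(all_code_points).__name__)  # range
print(len(all_code_points))            # 1114112 on Python 3.3+
print(sys.getsizeof(all_code_points))  # small and independent of the length
```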

Ola K · Dec 2, 2006

Thank you! This was really helpful. Also the bit about .istitle()
was the missing piece of the puzzle for me... So now my script is nice
and working.

And, beside the point: yes, I am from Israel, and no, we don't have
upper case and lower case letters. Hebrew has only one set of letters.
So my script was actually for the English letters inside the Hebrew
text...

--Ola
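Since the goal turned out to be English words inside Hebrew text, here is a hedged sketch, assuming Python 3 and assuming "English letters" means the ASCII letters A-Z/a-z. latin_words_by_case is a hypothetical helper.

```python
import re

# Assumption: "English letters" means ASCII A-Z/a-z.
# Hebrew letters are word characters too, so \b only matches where a Latin
# word is separated from its neighbours by spaces or punctuation.
LATIN_WORD_RE = re.compile(r"\b[A-Za-z]+\b")

def latin_words_by_case(text):
    """Group the ASCII-letter words found in text by their case pattern."""
    groups = {"upper": [], "lower": [], "title": []}
    for word in LATIN_WORD_RE.findall(text):
        if word.isupper():
            groups["upper"].append(word)
        elif word.islower():
            groups["lower"].append(word)
        elif word.istitle():
            groups["title"].append(word)
        # mixed-case words such as "mIxEd" fall through and are ignored
    return groups

print(latin_words_by_case("שלום YES עולם yes, Yes. mIxEd"))
```

A Latin word glued directly to Hebrew letters (no space between) is not matched, because there is no word boundary between the two scripts.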

Similar Threads

Case tagging and python	3	Jul 31, 2008
how to search multiple textfiles ?	12	Sep 26, 2008
Boids, a use case	6	Mar 22, 2010
PyWart: PEP8: a seething cauldron of inconsistencies.	1	Jul 28, 2011
PyWart: PEP8: A cauldron of inconsistencies.	7	Jul 27, 2011
I'm tempted to quit out of frustration	1	Aug 13, 2023
[SUMMARY] Word Search Generator (#159)	3	Apr 18, 2008
A text lossy compression scheme	1	Sep 1, 2012
