trying to match a string

arnimavidyarthy

Hi,

I am taking a string as input from the user, and it should contain only
the characters L, M or R.

I tried the following in Kodos, but they are still not perfect:

[^A-K,^N-Q,^S-Z,^0-9]
[L][M][R]
[LRM]?L?[LRM]? etc., but they do not exactly meet what I need.

For example, LRLRLRLRLM is OK, but LRLRLRNL is not, as it has an 'N'.

regards,
SZ

The string may or may not have all three characters.
 
John Machin

re.match(r'[LMR]+\Z', your_string)

In English: one or more of L, M, or R, followed by the end of the
string.
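
A minimal sketch of that check wrapped in a function, for anyone who wants to try it against the two strings from the question. The helper name is_lmr is made up for illustration, and the code uses print() so that it runs on Python 3 as well:

```python
import re

# Hypothetical helper name; the pattern is the one given in the post above.
def is_lmr(text):
    """Return True if text is one or more of L, M, R and nothing else."""
    # re.match anchors at the start; \Z anchors at the very end of the string.
    return re.match(r'[LMR]+\Z', text) is not None

print(is_lmr("LRLRLRLRLM"))  # True
print(is_lmr("LRLRLRNL"))    # False, because of the 'N'
print(is_lmr(""))            # False, because + needs at least one character
```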
 
oj

With regular expressions, [^LRM] matches a character that isn't L, R
or M. So:

import re

var = "LRLRLRLNR"

if re.search(r'[^LRM]', var):
    print "Invalid"
 
John Machin

oj said:
if re.search(r'[^LRM]', var):
    print "Invalid"

Fails if var refers to the empty string.
 
oj

John Machin said:
Fails if var refers to the empty string.

No it doesn't, it succeeds if var is an empty string. An empty string
doesn't contain characters that are not L, R or M.

The OP doesn't specify whether an empty string is valid or not. My
interpretation was that an empty string would be valid.
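
The disagreement comes down to how each pattern treats the empty string. A small sketch that puts the two approaches side by side (the function names are made up for this example; written with print() so it also runs on Python 3):

```python
import re

def valid_by_search(text):
    # oj's approach: valid unless some character outside L, R, M is found.
    return re.search(r'[^LRM]', text) is None

def valid_by_match(text):
    # John Machin's approach: one or more of L, M, R up to the end of the string.
    return re.match(r'[LMR]+\Z', text) is not None

for text in ["LRLRLRLRLM", "LRLRLRNL", ""]:
    print(repr(text), valid_by_search(text), valid_by_match(text))
```

The two agree on the strings from the question and differ only on "", which the search version accepts and the match version rejects.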
 
Andrew Freeman

oj said:
The OP doesn't specify whether an empty string is valid or not. My
interpretation was that an empty string would be valid.

Why not just use * instead of + like:

if re.search(r'^[^LRM]*$', var): # note: ^ outside [] is start of string; $ means end of string
    print "Invalid"

This will *only* print invalid when there is a character other than L, R, or M or an empty string.
 
Andrew Freeman

Sorry, forget the beginning and ending markers; I just tried it out and it
doesn't work. Use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"
 
arni

I was using Kodos to check the regex. I should have used the IDE
instead. Thanks a lot again.
 
John S

Andrew said:
use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"

This won't work -- every string in the universe contains 0 or more
characters which are not 'L', 'R', or 'M'. That is, the regular
expression X* could match the empty string, which can be found in all
strings.
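
This is easy to confirm: the search succeeds for valid, invalid and empty input alike, and in each case below what it finds is the empty string at position 0. A short sketch, using print() so it also runs on Python 3:

```python
import re

pattern = re.compile(r'[^LRM]*')

for text in ["LRM", "LRLRLRNL", ""]:
    m = pattern.search(text)
    # m is never None here; the match is a zero-length one at the start.
    print(repr(text), m is not None, repr(m.group()))
```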
 
oj

Andrew said:
use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"

No, that's broken.

That searches for any number of invalid characters. Even 0, so it
ALWAYS matches, no matter what string you give it.

My regex worked in the first place, you're complicating it needlessly.
The presence of one invalid character makes the string invalid, so why
not just search for one? Similarly, there's no need to stick in the
beginning and end markers - you're not trying to match the entire
string, just find part of it that is invalid.

However, I think the sets solution by Scott David Daniels is the most
elegant method put forward.
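
Scott David Daniels's post is not reproduced in this thread, so the following is only a guess at the kind of set-based check being referred to. The names ALLOWED and only_lrm and the exact comparison are assumptions, not his code:

```python
# Assumed reconstruction of a set-based check; not taken from the original post.
ALLOWED = set("LRM")

def only_lrm(text):
    # True when every character of text is in ALLOWED (also True for "").
    return set(text) <= ALLOWED

print(only_lrm("LRLRLRLRLM"))  # True
print(only_lrm("LRLRLRNL"))    # False
```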
 
Andrew Freeman

I see your point after rereading the question: he only wants to match L,
R and M. Let me revise it, please.

To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

To show if invalid:

if re.search(r'^[^LRM]*$', '0'):
    print 'Invalid'

Example session:
>>> import re
>>> def match(var):
...     if re.search(r'^[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...
Invalid
 
Andrew Freeman

John said:
To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

A couple of points:
(1) Instead of search(r'^blahblah', ...) use match(r'blahblah', ...)
(2) You need to choose your end-anchor correctly; your pattern is
permitting a newline at the end:

>>> re.search(r'^[LRM]*$', 'LM\n')
<_sre.SRE_Match object at 0x00B9E528>

Thanks for your pointers; however, I have a question regarding the first:

>>> import re
>>> def my_match(var): # my original
...     if re.search(r'^[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...
>>> def other_match(var): # your suggestion, I believe
...     if re.search(r'[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...
>>> eg = 'LRLRLRLRLM'
>>> fg = 'LRLRLRNL'
>>> my_match(eg)
Valid # correct!
>>> my_match(fg)
Invalid # correct!
>>> other_match(eg)
Valid # correct!
>>> other_match(fg)
Valid # INCORRECT, please explain

I believe other_match was your suggestion, i.e. to remove the ^;
my_match is just a renamed version of my original function.
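
A short sketch of why other_match accepts fg. re.search may start at any position, so without ^ the pattern [LRM]*$ is satisfied by the trailing 'L' alone, whereas re.match only tries position 0. That is why point (1) swaps search for match when dropping the ^, rather than just dropping the ^. Written with print() so it also runs on Python 3:

```python
import re

fg = 'LRLRLRNL'

# search() slides along the string until the pattern fits somewhere.
m = re.search(r'[LRM]*$', fg)
print(m.group(), m.span())       # the match is just the final 'L'

# match() only tries at position 0, so the 'N' makes it fail.
print(re.match(r'[LRM]*$', fg))  # None
```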

Point 2:
Yes, I totally agree with you on point 2. Let me try to combine my
knowledge and make a satisfactory function.

>>> def final_match(var):
...     if re.search(r'^[LRM]*\Z', var): # replace $ with \Z to limit newlines
...         print 'Valid'
...     else:
...         print 'Invalid'
...
Invalid

So, in conclusion, is this function satisfactory?

def match(var):
    if re.search(r'^[LRM]*\Z', var):
        print 'Valid'
    else:
        print 'Invalid'
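
The difference between the two end anchors can be checked directly on a string with a trailing newline. This sketch assumes Python 3.4 or later for the last line, since re.fullmatch does not exist in the Python 2 used elsewhere in this thread:

```python
import re

text = 'LM\n'

# $ also matches just before a newline that ends the string.
print(bool(re.match(r'[LRM]*$', text)))     # True

# \Z matches only at the very end of the string.
print(bool(re.match(r'[LRM]*\Z', text)))    # False

# Python 3.4+: fullmatch requires the whole string to match, no anchors needed.
print(bool(re.fullmatch(r'[LRM]*', text)))  # False
```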
 
Andrew Freeman

I forgot to change search to match. This should be better:

def match(var):
    if re.match(r'[LRM]*\Z', var):
        return True
    else:
        return False

I was also thinking if you had a list of these items needing to be
verified you could use this:
>>> l = ['LLMMRR', '00thLL', 'L', '\n']
>>> out = []
>>> map(lambda i: match(i)==False or out.append(i), l)
>>> print out
['LLMMRR', 'L']
 
John Machin

Andrew said:
I forgot to change search to match. This should be better:

def match(var):
    if re.match(r'[LRM]*\Z', var):
        return True
    else:
        return False

A bit wordy ...

if blahblah:
    return True
else:
    return False

can in total generality be replaced by:

return blahblah

Andrew said:
I was also thinking if you had a list of these items needing to be
verified you could use this:

You could, but I suggest you don't use it in a job interview :)

>>> l = ['LLMMRR', '00thLL', 'L', '\n']

(1) Don't use 'L'.lower() as a name; it slows down reading as people
need to fire up their mental parser to distinguish it from the result
of 3 - 2

>>> out = []
>>> map(lambda i: match(i)==False or out.append(i), l)

(2) Read PEP 8
(3) blahblah == False ==> not blahblah
(4) You didn't show the output from map() i.e. something like [None,
True, None, True]
(5) or out.append(...) is a baroque use of a side-effect, and is quite
unnecessary. If you feel inexorably drawn to following the map way,
read up on the filter and reduce functions. Otherwise learn about list
comprehensions and generators.

>>> print out
['LLMMRR', 'L']

Consider this:

>>> import re
>>> alist = ['LLMMRR', '00thLL', 'L', '\n']
>>> zeroplusLRM = re.compile(r'[LRM]*\Z').match
>>> filter(zeroplusLRM, alist)
['LLMMRR', 'L']
>>> [x for x in alist if zeroplusLRM(x)]
['LLMMRR', 'L']

Cheers,
John
 
Andrew Freeman

Thank you for the pointers!
(1) Depending on the typeface I totally agree, Courier New has a nearly
indistinguishable 1 and l, I'm using Dejavu Sans Mono (Bitstream Vera
based). I was just thinking of it as a generic variable name for some
input. I'm fairly new to python and programming in general, it's more of
a hobby.

(2-3) This is actually the first time I've used map, maybe I should not
give extra examples, I was actually using it as a learning tool for
myself. I'm very thankful the mailing list has such skilled
contributors, such as yourself, but I assume that it can't hurt to give
working code, even though the style is less than perfect.

(3) Personally I think map(lambda i: match(i)==False or out.append(i),
l) is a little more readable than map(lambda i: not match(i) or
out.append(i), l) even if "baroque", your use of filter is obviously
much clearer than either.

(4) I highly doubt that this code was actually to be used in an
interactive session, the False/True output was truncated intentionally,
it's an obvious, but superfluous output (unless you were to rely on this
by attaching it to a variable which might lead to sorting issues).

(5) Thank you very much, I've read of the filter and reduce functions,
but haven't used them enough to recognize their usefulness.

I did realize that a list comprehension would be useful, but wanted to
try map().

I put together a generic matcher that returns either a list of True data
(if the input is a list or tuple) or a boolean value:

def match(ex, var):
    """ex is the regular expression to match for, var the iterable or
    string to return a list of matching items or a boolean value respectively."""
    ex = re.compile(ex).match
    if isinstance(var, (list, tuple)):
        return filter(ex, var)
    else:
        return bool(ex(var))

I believe this is fairly clean and succinct code, it would help my
learning immensely if you feel there is a more succinct, generic way of
writing this function.
 
John Machin

Andrew said:
(4) I highly doubt that this code was actually to be used in an
interactive session,

The offending code is a nonsense wherever it is used.

Andrew said:
the False/True output was truncated intentionally,

What meaning are you attaching to "truncated"?

Andrew said:
it's an obvious, but superfluous output (unless you were to rely on this
by attaching it to a variable which might lead to sorting issues).

I put together a generic matcher that returns either a list of True data
(if the input is a list or tuple) or a boolean value:

def match(ex, var):
    """ex is the regular expression to match for, var the iterable or
    string to return a list of matching items or a boolean value respectively."""
    ex = re.compile(ex).match

You lose clarity by rebinding ex like that, and you gain nothing.

Andrew said:
    if isinstance(var, (list, tuple)):
        return filter(ex, var)
    else:
        return bool(ex(var))

I believe this is fairly clean and succinct code, it would help my
learning immensely if you feel there is a more succinct, generic way of
writing this function.

You have created a function which does two quite different things
depending on whether one of the arguments is one of only two of the
many kinds of iterables and which has a rather generic (match what?)
and misleading (something which filters matches is called "match"??)
name. The loss of clarity and ease of understanding caused by the
readers having to find the code for the function so that they can
puzzle through it means that the most succinct, generic and
*recommended* way of writing this function would be not to write it at
all.

Write a function which returns a MatchObject. In the unlikely event
that anyone really wants to put bools in a list and sort them,
then they can wrap bool() around it. Give it a meaningful name e.g.
match_LRM.

You want to check if a single variable refers to a valid LRM string?
Use match_LRM(the_variable). Nice and clear.

You want to filter out of some iterable all the occurrences of valid
LRM strings? Use filter (whose name indicates its task) or a generator
or list comprehension ... what [x for x in some_iterable if
match_LRM(x)] does should be screamingly obvious i.e. have less chance
than filter of needing a trip to the manual.
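
A minimal sketch of what that advice could look like in practice. The name match_LRM comes from the text above; the_variable, some_iterable and their sample values are illustrative, and print() is used so the code also runs on Python 3:

```python
import re

# Returns a match object for a valid (possibly empty) LRM string, else None.
match_LRM = re.compile(r'[LRM]*\Z').match

# Checking a single value.
the_variable = 'LLMMRR'
if match_LRM(the_variable):
    print('valid')

# Selecting the valid strings from an iterable.
some_iterable = ['LLMMRR', '00thLL', 'L', '\n']
print([x for x in some_iterable if match_LRM(x)])  # ['LLMMRR', 'L']
```

On Python 3, filter(match_LRM, some_iterable) returns a lazy iterator rather than a list, so it would need wrapping in list() to print the same result.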

HTH,
John
 
Andrew Freeman

John said:
The offending code is a nonsense wherever it is used.



What meaning are you attaching to "truncated"?

I'm attaching the meaning of "deleted the line (manually (not in
python))" to truncated. I'm actually using IPython, but thought it would
be good practice to type it out as if it came from the standard
interpreter. I also thought it would be OK to leave out some output which
I considered unnecessary.
 
oj

Andrew said:
let me revise it please:

To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

Fine, this works, although match instead of search blah blah blah as
has already been mentioned. I still think searching for one invalid
character is more elegant than trying to match the entire string, but
that's just personal preference, I guess.

Andrew said:
To show if invalid,

if re.search(r'^[^LRM]*$', '0'):
    print 'Invalid'

No. This is wrong. This only matches strings that consist entirely of
characters that are not L, R or M:
>>> import re
>>> if re.search(r'^[^LRM]*$', 'ZZZLZZZ'):
...     print "Invalid"
...
This doesn't print "Invalid" because there is one non-invalid
character there, which is clearly not what the OP wanted.
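
The same point as a small comparison: the anchored pattern asks whether every character is invalid, while the unanchored one asks whether any character is. The function names are made up for this sketch, and print() is used so it also runs on Python 3:

```python
import re

def all_invalid(text):
    # Andrew's revised pattern: every character must be outside L, R, M.
    return re.search(r'^[^LRM]*$', text) is not None

def any_invalid(text):
    # oj's pattern: at least one character outside L, R, M.
    return re.search(r'[^LRM]', text) is not None

for text in ['ZZZ', 'ZZZLZZZ', 'LRM']:
    print(repr(text), all_invalid(text), any_invalid(text))
```

Only any_invalid flags 'ZZZLZZZ', which is the string the OP would want rejected.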
 
Fredrik Lundh

oj said:
Fine, this works, although match instead of search blah blah blah as
has already been mentioned. I still think searching for one invalid
character is more elegant than trying to match the entire string, but
that's just personal preference, I guess.

The drawback is that it's a lot easier to mess up the edge cases if you
do that (as this thread has shown). The small speedup you get in
typical cases is quickly offset by extra debugging/testing time (or, for
that matter, arguing with c.l.py:ers over more or less contrived ways to
interpret the original post).

Guess it's up to personal preferences for how to best help others.
Unless the OP explicitly asks for something else, I prefer to use simple
and straight-forward solutions with reasonable execution behaviour over
clever tricks or odd-ball solutions; it's not a JAPH contest, after all.

</F>
 
