trying to match a string

arnimavidyarthy · Jul 18, 2008

Hi,

I am taking a string as input from the user, and it should contain only
the characters L, M, or R.

I tried the following in Kodos, but they are still not perfect:

[^A-K,^N-Q,^S-Z,^0-9]
[L][M][R]
[LRM]?L?[LRM]? etc., but they do not exactly meet what I need.

For example, LRLRLRLRLM is OK, but LRLRLRNL is not, because it contains 'N'.

regards,
SZ

The string may or may not have all the three chars.

John Machin · Jul 18, 2008

re.match(r'[LMR]+\Z', your_string)

In English: one or more of L, M, or R, followed by the end of the
string.
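
A minimal sketch of how that pattern behaves on the two examples from the question. It is written for Python 3, and the helper name is_lmr is made up for illustration:

```python
import re

def is_lmr(s):
    """Return True if s consists of one or more of L, M, R and nothing else."""
    # re.match anchors at the start of the string; \Z anchors at its very end.
    return re.match(r'[LMR]+\Z', s) is not None

print(is_lmr("LRLRLRLRLM"))  # the OP's valid example
print(is_lmr("LRLRLRNL"))    # the OP's invalid example (contains 'N')
```

Because of the +, this version rejects the empty string.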

oj · Jul 18, 2008

With regular expressions, [^LRM] matches a character that isn't L, R
or M. So:

import re

var = "LRLRLRLNR"

if re.search(r'[^LRM]', var):
    print "Invalid"
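
The same idea wrapped in a function, as a rough Python 3 sketch. The name has_invalid_char is hypothetical and only used here for illustration:

```python
import re

def has_invalid_char(s):
    """Return True if s contains any character other than L, R or M."""
    # re.search scans the string for the first character outside the set.
    return re.search(r'[^LRM]', s) is not None

for candidate in ("LRLRLRLRLM", "LRLRLRLNR"):
    print(candidate, "Invalid" if has_invalid_char(candidate) else "Valid")
```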

John Machin · Jul 18, 2008

Fails if var refers to the empty string.

oj · Jul 18, 2008

No it doesn't, it succeeds if var is an empty string. An empty string
doesn't contain characters that are not L, R or M.

The OP doesn't specify whether an empty string is valid or not. My
interpretation was that an empty string would be valid.
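
The disagreement comes down to how each pattern treats the empty string. A short Python 3 sketch comparing the approaches mentioned so far:

```python
import re

empty = ""

# Searching for one invalid character: nothing is found, so "" passes.
print(re.search(r'[^LRM]', empty))                # None

# One or more valid characters: + needs at least one, so "" fails.
print(re.match(r'[LMR]+\Z', empty))               # None

# Zero or more valid characters: * lets "" pass.
print(re.match(r'[LMR]*\Z', empty) is not None)   # True
```

Which behaviour is right depends on whether an empty input should count as valid, which the original question leaves open.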

Andrew Freeman · Jul 18, 2008

Why not just use * instead of + like:

if re.search(r'^[^LRM]*$', var): # note: ^ outside [] is start of string; $ means end of string
    print "Invalid"

This will *only* print invalid when there is a character other than L, R, or M or an empty string.

Andrew Freeman · Jul 18, 2008

Sorry, forget the beginning and ending markers. I just tried it out, and it
doesn't work. Use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"

arni · Jul 18, 2008

I was using Kodos to check the regex. I should have used the IDE
instead. Thanks a lot again.

John S · Jul 18, 2008

Andrew said:
use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"

This won't work -- every string in the universe contains 0 or more
characters which are not 'L', 'R', or 'M'. That is, the regular
expression X* could match the empty string, which can be found in all
strings.
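
A quick way to see this for yourself, sketched in Python 3: the search never returns None, because a zero-width match at position 0 is always available.

```python
import re

for s in ("LRM", "LRNL", ""):
    m = re.search(r'[^LRM]*', s)
    # m is never None; here it is always an empty match at position 0.
    print(repr(s), repr(m.group()), m.span())
```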

oj · Jul 18, 2008

Andrew said:
use this instead:

if re.search(r'[^LRM]*', var):
    print "Invalid"

No, that's broken.

That searches for any number of invalid characters. Even 0, so it
ALWAYS matches, no matter what string you give it.

My regex worked in the first place, you're complicating it needlessly.
The presence of one invalid character makes the string invalid, so why
not just search for one? Similarly, there's no need to stick in the
beginning and end markers - you're not trying to match the entire
string, just find part of it that is invalid.

However, I think the sets solution by Scott David Daniels is the most
elegant method put forward.
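
The post by Scott David Daniels is not reproduced in this thread, so the following is only a guess at what a set-based check might look like. The function name only_lrm is hypothetical:

```python
def only_lrm(s):
    """Return True if every character of s is L, R or M (empty string included)."""
    # set(s) is the set of distinct characters; <= tests for "is a subset of".
    return set(s) <= set("LRM")

print(only_lrm("LRLRLRLRLM"))
print(only_lrm("LRLRLRNL"))
```

Like the "search for one invalid character" regex, this treats the empty string as valid.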

Andrew Freeman · Jul 19, 2008

I see your point after rereading the question: he only wants to match L,
R, and M. Let me revise it, please.

To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

To show if invalid:

if re.search(r'^[^LRM]*$', '0'):
    print 'Invalid'

Example session:

>>> import re
>>> def match(var):
...     if re.search(r'^[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...

John Machin · Jul 19, 2008

To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

A couple of points:
(1) Instead of search(r'^blahblah', ...) use match(r'blahblah', ...)
(2) You need to choose your end-anchor correctly; your pattern is
permitting a newline at the end:

>>> re.search(r'^[LRM]*$', 'LM\n')
<_sre.SRE_Match object at 0x00B9E528>
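
The end-anchor difference in point (2) can be checked directly. A small Python 3 sketch:

```python
import re

s = "LM\n"

# $ matches at the end of the string and also just before a trailing newline.
print(bool(re.match(r'[LRM]*$', s)))    # True

# \Z matches only at the very end of the string.
print(bool(re.match(r'[LRM]*\Z', s)))   # False
```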

Andrew Freeman · Jul 19, 2008

Thanks for your pointers; however, I have a question regarding the first:

>>> import re
>>> def my_match(var): # my original
...     if re.search(r'^[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...
>>> def other_match(var): # your suggestion, I believe
...     if re.search(r'[LRM]*$', var):
...         print 'Valid'
...     else:
...         print 'Invalid'
...
>>> eg = 'LRLRLRLRLM'
>>> fg = 'LRLRLRNL'
>>> my_match(eg)
Valid # correct!
>>> my_match(fg)
Invalid # correct!
>>>
>>> other_match(eg)
Valid # correct!
>>> other_match(fg)
Valid # INCORRECT, please explain

I believe other_match was your suggestion; to remove the ^
my_match is just a renamed version of my original function
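
The result above follows from how the two functions differ. re.match only tries the pattern at the start of the string, while re.search without ^ may begin anywhere, so [LRM]*$ can succeed on just the tail of the string. A Python 3 sketch of the difference:

```python
import re

fg = 'LRLRLRNL'

# search without ^ : succeeds on the final 'L' that follows the 'N'.
m = re.search(r'[LRM]*$', fg)
print(repr(m.group()), m.span())

# match (implicitly anchored at the start): fails, since 'N' blocks the way to $.
print(re.match(r'[LRM]*$', fg))
```

So dropping the ^ is only safe when search is also changed to match.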

Point 2:
Yes, I totally agree with you on point 2, let me try to combine my
knowledge and make a satisfactory function.

>>> def final_match(var):
...     if re.search(r'^[LRM]*\Z', var): # replace $ with \Z to limit newlines
...         print 'Valid'
...     else:
...         print 'Invalid'
...

So, in conclusion, is this function satisfactory?

def match(var):
    if re.search(r'^[LRM]*\Z', var):
        print 'Valid'
    else:
        print 'Invalid'

Andrew Freeman · Jul 19, 2008

I forgot to change search to match. This should be better:

def match(var):
    if re.match(r'[LRM]*\Z', var):
        return True
    else:
        return False

I was also thinking if you had a list of these items needing to be
verified you could use this:

>>> l = ['LLMMRR', '00thLL', 'L', '\n']
>>> out = []
>>> map(lambda i: match(i)==False or out.append(i), l)
>>> print out
['LLMMRR', 'L']

John Machin · Jul 19, 2008

I forgot to change search to match. This should be better:

def match(var):
    if re.match(r'[LRM]*\Z', var):
        return True
    else:
        return False

A bit wordy ...

    if blahblah:
        return True
    else:
        return False

can in total generality be replaced by:

    return blahblah
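
Applied to the function in question, the shortening might look like the Python 3 sketch below. The names match_verbose and match_short are made up here. One caveat: returning the re.match(...) result directly gives back a match object or None, not True or False, so the sketch wraps it in bool() for callers that need a strict boolean.

```python
import re

def match_verbose(var):
    if re.match(r'[LRM]*\Z', var):
        return True
    else:
        return False

def match_short(var):
    # bool() turns the match object (or None) into True (or False).
    return bool(re.match(r'[LRM]*\Z', var))

for s in ('LLMMRR', '00thLL', 'L', '\n', ''):
    print(repr(s), match_verbose(s), match_short(s))
```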

I was also thinking if you had a list of these items needing to be
verified you could use this:

You could, but I suggest you don't use it in a job interview

l = ['LLMMRR', '00thLL', 'L', '\n']

(1) Don't use 'L'.lower() as a name; it slows down reading as people
need to fire up their mental parser to distinguish it from the result
of 3 - 2

out = []
map(lambda i: match(i)==False or out.append(i), l)

(2) Read PEP 8
(3) blahblah == False ==> not blahblah
(4) You didn't show the output from map() i.e. something like [None,
True, None, True]
(5) or out.append(...) is a baroque use of a side-effect, and is quite
unnecessary. If you feel inexorably drawn to following the map way,
read up on the filter and reduce functions. Otherwise learn about list
comprehensions and generators.

['LLMMRR', 'L']

Consider this:

>>> import re
>>> alist = ['LLMMRR', '00thLL', 'L', '\n']
>>> zeroplusLRM = re.compile(r'[LRM]*\Z').match
>>> filter(zeroplusLRM, alist)
['LLMMRR', 'L']
>>> [x for x in alist if zeroplusLRM(x)]
['LLMMRR', 'L']
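
For readers on Python 3: filter there returns a lazy iterator rather than a list, so it needs list() around it to show the same output. A sketch reusing the names from the session above:

```python
import re

alist = ['LLMMRR', '00thLL', 'L', '\n']
zeroplusLRM = re.compile(r'[LRM]*\Z').match

# filter keeps the items for which the bound match method returns a match object.
print(list(filter(zeroplusLRM, alist)))        # ['LLMMRR', 'L']

# The list comprehension behaves the same in Python 2 and 3.
print([x for x in alist if zeroplusLRM(x)])    # ['LLMMRR', 'L']
```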

Cheers,
John

Andrew Freeman · Jul 20, 2008

Thank you for the pointers!
(1) Depending on the typeface I totally agree, Courier New has a nearly
indistinguishable 1 and l, I'm using Dejavu Sans Mono (Bitstream Vera
based). I was just thinking of it as a generic variable name for some
input. I'm fairly new to python and programming in general, it's more of
a hobby.

(2-3) This is actually the first time I've used map; maybe I should not
give extra examples. I was actually using it as a learning tool for
myself. I'm very thankful the mailing list has such skilled
contributors, such as yourself, but I assume that it can't hurt to give
working code, even though the style is less than perfect.

(3) Personally I think map(lambda i: match(i)==False or out.append(i),
l) is a little more readable than map(lambda i: not match(i) or
out.append(i), l) even if "baroque", your use of filter is obviously
much clearer than either.

(4) I highly doubt that this code was actually to be used in an
interactive session, the False/True output was truncated intentionally,
it's an obvious, but superfluous output (unless you were to rely on this
by attaching it to a variable which might lead to sorting issues).

(5) Thank you very much, I've read of the filter and reduce functions,
but haven't used them enough to recognize their usefulness.

I did realize that a list comprehension would be useful, but wanted to
try map().
I put together a generic matcher that returns either a list of True data
(if the input is a list or tuple) or a boolean value:

def match(ex, var):
    """ex is the regular expression to match for, var the iterable or
    string to return a list of matching items or a boolean value respectively."""
    ex = re.compile(ex).match
    if isinstance(var, (list, tuple)):
        return filter(ex, var)
    else:
        return bool(ex(var))

I believe this is fairly clean and succinct code, it would help my
learning immensely if you feel there is a more succinct, generic way of
writing this function.

John Machin · Jul 20, 2008

John Machin wrote:

(4) I highly doubt that this code was actually to be used in an
interactive session,

The offending code is a nonsense wherever it is used.

the False/True output was truncated intentionally,

What meaning are you attaching to "truncated"?

it's an obvious, but superfluous output (unless you were to rely on this
by attaching it to a variable which might lead to sorting issues).

I put together a generic matcher that returns either a list of True data
(if the input is a list or tuple) or a boolean value:

def match(ex, var):
    """ex is the regular expression to match for, var the iterable or
    string to return a list of matching items or a boolean value respectively."""
    ex = re.compile(ex).match

You lose clarity by rebinding ex like that, and you gain nothing.

    if isinstance(var, (list, tuple)):
        return filter(ex, var)
    else:
        return bool(ex(var))

I believe this is fairly clean and succinct code, it would help my
learning immensely if you feel there is a more succinct, generic way of
writing this function.

You have created a function which does two quite different things
depending on whether one of the arguments is one of only two of the
many kinds of iterables and which has a rather generic (match what?)
and misleading (something which filters matches is called "match"??)
name. The loss of clarity and ease of understanding caused by the
readers having to find the code for the function so that they can
puzzle through it means that the most succinct, generic and
*recommended* way of writing this function would be not to write it at
all.

Write a function which returns a MatchObject. In the unlikely event
that anyone really wants to put bools in a list and sort them,
then they can wrap bool() around it. Give it a meaningful name e.g.
match_LRM.

You want to check if a single variable refers to a valid LRM string?
Use match_LRM(the_variable). Nice and clear.

You want to filter out of some iterable all the occurrences of valid
LRM strings? Use filter (whose name indicates its task) or a generator
or list comprehension ... what [x for x in some_iterable if
match_LRM(x)] does should be screamingly obvious i.e. have less chance
than filter of needing a trip to the manual.
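
One way this advice could be read, as a Python 3 sketch. Only the name match_LRM comes from the post; binding it to a compiled pattern's match method is an assumption borrowed from the earlier zeroplusLRM example, and the sample data is illustrative:

```python
import re

# Hypothetical implementation: returns a match object or None.
match_LRM = re.compile(r'[LRM]*\Z').match

# Checking a single value.
the_variable = 'LRLRLRLRLM'
if match_LRM(the_variable):
    print('Valid')

# Filtering an iterable with a list comprehension.
some_iterable = ['LLMMRR', '00thLL', 'L', '\n']
print([x for x in some_iterable if match_LRM(x)])
```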

HTH,
John

Andrew Freeman · Jul 20, 2008

John said:
The offending code is a nonsense wherever it is used.

What meaning are you attaching to "truncated"?

I'm attaching the meaning of "deleted the line (manually (not in
python))" to truncated. I'm actually using ipython, but thought it would
be good practice to type it out as if it came from the standard
interpreter. I also thought it would be OK to leave out some output which
I considered unnecessary.

oj · Jul 21, 2008

let me revise it please:

To show if valid:

if re.search(r'^[LRM]*$', 'LM'):
    print 'Valid'

Fine, this works, although match instead of search blah blah blah as
has already been mentioned. I still think searching for one invalid
character is more elegant than trying to match the entire string, but
that's just personal preference, I guess.

To show if invalid,

if re.search(r'^[^LRM]*$', '0'):
    print 'Invalid'

No. This is wrong. This only matches strings that consist entirely of
characters that are not L, R or M:

>>> import re
>>> if re.search(r'^[^LRM]*$', 'ZZZLZZZ'):
...     print "Invalid"
...
This doesn't print "Invalid" because there is one non-invalid
character there, which is clearly not what the OP wanted.
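
To make the contrast concrete, here is a Python 3 sketch that runs both patterns on a mixed string and on an all-invalid string. The variable names are illustrative only:

```python
import re

for s in ('ZZZLZZZ', 'ZZZ'):
    # Anchored negated class: asks "are ALL characters invalid?"
    all_invalid = re.search(r'^[^LRM]*$', s) is not None
    # Unanchored negated class: asks "is ANY character invalid?"
    any_invalid = re.search(r'[^LRM]', s) is not None
    print(s, all_invalid, any_invalid)
```

Only the second question matches what the OP asked for.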

Fredrik Lundh · Jul 21, 2008

oj said:
Fine, this works, although match instead of search blah blah blah as
has already been mentioned. I still think searching for one invalid
character is more elegant than trying to match the entire string, but
that's just personal preference, I guess.

The drawback is that it's a lot easier to mess up the edge cases if you
do that (as this thread has shown). The small speedup you get in
typical cases is quickly offset by extra debugging/testing time (or, for
that matter, arguing with c.l.py:ers over more or less contrived ways to
interpret the original post).

Guess it's up to personal preferences for how to best help others.
Unless the OP explicitly asks for something else, I prefer to use simple
and straight-forward solutions with reasonable execution behaviour over
clever tricks or odd-ball solutions; it's not a JAPH contest, after all.

</F>
