re sub help

s99999999s2003

hi

i have a string:

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

inside the string there are "\n" characters. I don't want to replace the '\n'
between [startdelim] and [enddelim] with ''. I only want to get rid of the
'\n' everywhere else.

i have read the tutorial and came across negative/positive lookahead,
and i think it can solve the problem, but i am confused about how to use it.
can anyone give me some advice? or is there a better way than
lookaheads? thanks.
 
Mike Meyer

Well, I'm not an expert on re's - I've only been using them for three
decades - but I'm not sure this can be done with a single re, as the
pattern you're interested in depends on context, and re's don't handle
that well.

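>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> sd = '[startdelim]'
>>> ed = '[enddelim]'
>>> s, r = a.split(sd, 1)   # text before the start delimiter, and the rest
>>> m, e = r.split(ed, 1)   # text between the delimiters, and the text after
>>> # strip newlines outside the delimiters only; the middle stays intact
>>> a = s.replace('\n', '') + sd + m + ed + e.replace('\n', '')
>>> a
'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'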
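
Since the question asked about lookaheads, here is a rough sketch of how one might express the context in a single pattern. It is not taken from the thread, and it assumes the delimiters are properly paired and never nested. The name outside_newline is made up for illustration.

```python
import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

# A newline counts as "inside" a block when, scanning forward, [enddelim]
# is reached before any [startdelim]. The negative lookahead rejects
# exactly those newlines, so only the outside ones are matched.
outside_newline = re.compile(
    r"\n(?!(?:(?!\[startdelim\]).)*?\[enddelim\])",
    re.DOTALL,  # let "." cross newlines while scanning ahead
)

result = outside_newline.sub("", a)
print(repr(result))
# 'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
```

The lookahead rescans the rest of the string for every newline, so on large inputs the split-based approach is likely to be both faster and easier to read.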

<mike
 
s99999999s2003

thanks for the reply.

i am still interested in using re, i find it useful. am still
learning its uses.
so i did something like this for a start, trying to get everything
between [startdelim] and [enddelim]:

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

t = re.compile(r"\[startdelim\](.*)\[enddelim\]")

t.findall(a)

but it gives me []. it's the "\n" that prevents the results.
why can't (.*) work in this case? Or am i missing some step to "read"
in the "\n"?
thanks.
 
Fredrik Lundh

i am still interested in using re, i find it useful. am still
learning its uses.
so i did something like this for a start, trying to get everything
between [startdelim] and [enddelim]:

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

t = re.compile(r"\[startdelim\](.*)\[enddelim\]")

"*" is greedy (=searches backwards from the right end), so that won't
do the right thing if you have multiple delimiters

to fix this, use "*?" instead.
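
A small illustration of the difference, as a sketch only: the sample string below is hypothetical (two delimiter pairs, no newlines, so the DOTALL question does not get in the way), and the names greedy and lazy are made up here.

```python
import re

# Hypothetical sample: two delimiter pairs on one line.
text = "x[startdelim]A[enddelim]y[startdelim]B[enddelim]z"

greedy = re.compile(r"\[startdelim\](.*)\[enddelim\]")
lazy = re.compile(r"\[startdelim\](.*?)\[enddelim\]")

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(greedy_matches)  # ['A[enddelim]y[startdelim]B']
print(lazy_matches)    # ['A', 'B']
```

The greedy version runs from the first start delimiter to the last end delimiter, swallowing everything in between.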
t.findall(a)

but it gives me []. it's the "\n" that prevents the results.
why can't (.*) work in this case? Or am i missing some step to "read"
in the "\n"?

http://docs.python.org/lib/re-syntax.html

"." (Dot.) In the default mode, this matches any character except
a newline. If the DOTALL flag has been specified, this matches any
character including a newline.

to fix this, pass in re.DOTALL or re.S as the flag argument, or
prepend (?s) to the expression.
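
For what it's worth, the three spellings should behave the same. A quick sketch to check that, using the string from the question (the variable names are made up for illustration):

```python
import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

pattern = r"\[startdelim\](.*?)\[enddelim\]"

default_mode = re.findall(pattern, a)            # "." stops at "\n"
with_dotall = re.findall(pattern, a, re.DOTALL)  # flag argument
with_s = re.findall(pattern, a, re.S)            # short alias for DOTALL
inline = re.findall("(?s)" + pattern, a)         # inline flag in the pattern

print(default_mode)  # []
print(with_dotall)   # ['this\nis\nanother']
```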

</F>
 
Mike Meyer

Newlines are magic to regular expressions: by default, . won't match
one. You use the flags in re to change that. In this case, you want .
to match them, so you use the DOTALL flag:

>>> import re
>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> t = re.compile(r"\[startdelim\](.*)\[enddelim\]", re.DOTALL)
>>> t.findall(a)
['this\nis\nanother']

<mike
 
Kent Johnson

Here is a solution using re.sub and a class that maintains state. It works when the input text contains multiple startdelim/enddelim pairs.

import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2

class subber(object):
    """Replacement callable that remembers whether we are between delimiters."""
    def __init__(self):
        self.delimiterSeen = False

    def __call__(self, m):
        text = m.group()
        if text == 'startdelim':
            self.delimiterSeen = True
            return text

        if text == 'enddelim':
            self.delimiterSeen = False
            return text

        # text is a newline: keep it between delimiters, drop it elsewhere
        if self.delimiterSeen:
            return text

        return ''

delimRe = re.compile('\n|startdelim|enddelim')

newText = delimRe.sub(subber(), a)
print(repr(newText))
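
A possible stateless variation on the same idea, offered only as a sketch: let the pattern consume each delimited block whole, so the replacement function needs no memory. This assumes the delimiters are paired and not nested; block_or_newline and keep_block are names invented for this example.

```python
import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2

# Alternative 1 consumes a whole delimited block (captured as group 1);
# alternative 2 matches a lone newline, which can then only be outside a block.
block_or_newline = re.compile(r"(\[startdelim\].*?\[enddelim\])|\n", re.DOTALL)

def keep_block(m):
    # A block is returned unchanged; for a bare newline group 1 is None,
    # so the newline is replaced by the empty string.
    return m.group(1) or ""

new_text = block_or_newline.sub(keep_block, a)
print(repr(new_text))
```

Because the block alternative is tried first at every position, newlines inside a block are never seen by the second alternative.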


Kent
 
Bengt Richter

Sometimes splitting and processing the pieces selectively can be a solution, e.g.,
if delimiters are properly paired, splitting (with parens to keep matches) should
give you a repeating pattern modulo 4 of outside text, start delimiter, inside
text, end delimiter:
>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> import re
>>> splitter = re.compile(r'(?s)(\[startdelim\]|\[enddelim\])')
>>> sp = splitter.split(a)
>>> sp
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n']
>>> ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis

I haven't checked for corner cases, but HTH
Maybe I'll try two pairs of delimiters:
>>> a += "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n"
>>> sp = splitter.split(a)
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis222233455555555[startdelim]6666
77
8888888[enddelim]999900

which came from
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n2222\n33\n4\n55555555', '[startdelim]', '6666\n77\n8888888', '[enddelim]', '9999\n00\n']

The replacement was applied wherever not i%4 was True:
...
True: 'this\nis\na\nsentence'
False: '[startdelim]'
False: 'this\nis\nanother'
False: '[enddelim]'
True: 'this\nis\n2222\n33\n4\n55555555'
False: '[startdelim]'
False: '6666\n77\n8888888'
False: '[enddelim]'
True: '9999\n00\n'
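
The same split-and-select idea could be wrapped in a small function, which may be easier to read than the indexed pair of lambdas. This is only a sketch under the same assumption of properly paired delimiters; strip_outside is a made-up name.

```python
import re

# The capturing group keeps the delimiters in the split result.
splitter = re.compile(r"(\[startdelim\]|\[enddelim\])")

def strip_outside(text):
    """Remove newlines from the pieces outside the delimiters (index % 4 == 0)."""
    pieces = splitter.split(text)
    cleaned = [
        piece.replace("\n", "") if index % 4 == 0 else piece
        for index, piece in enumerate(pieces)
    ]
    return "".join(cleaned)

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
print(repr(strip_outside(a)))
# 'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
```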

Regards,
Bengt Richter
 
