re sub help

s99999999s2003 · Nov 5, 2005

hi

I have a string:

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"

The string contains "\n" characters. I don't want to replace the '\n'
between [startdelim] and [enddelim] with ''. I only want to get rid of
the '\n' everywhere else.

I have read the tutorial and came across negative/positive lookahead,
and I think it can solve the problem, but I am confused about how to use
it. Can anyone give me some advice? Or is there a better way than
lookaheads? Thanks.

Mike Meyer · Nov 5, 2005

Well, I'm not an expert on re's - I've only been using them for three
decades - but I'm not sure this can be done with a single re, as the
pattern you're interested in depends on context, and re's don't handle
that well.

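a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
sd = '[startdelim]'
ed = '[enddelim]'
s, r = a.split(sd, 1)   # s: text before the start delimiter
m, e = r.split(ed, 1)   # m: delimited text, e: text after the end delimiter
# Strip newlines outside the delimiters; keep the delimited text as is.
a = s.replace('\n', '') + sd + m + ed + e.replace('\n', '')
# a is now 'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'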
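
Since the question asked about lookaheads: a single pattern does seem workable when the delimiters are properly paired and not nested. The sketch below is one way to do it, not the only one. The function name strip_by_lookahead is made up for illustration, and the inner "tempered" lookahead can get slow on long inputs.

```python
import re

# Delimiters taken from the question; assumed properly paired and not nested.
START = re.escape("[startdelim]")
END = re.escape("[enddelim]")

# A newline is "inside" a block when an end delimiter can be reached without
# first passing a start delimiter. The negative lookahead rejects exactly
# those newlines, so only the outside ones are matched and removed.
OUTSIDE_NEWLINE = re.compile(
    r"\n(?!(?:(?!%s).)*%s)" % (START, END),
    re.DOTALL,
)

def strip_by_lookahead(text):
    """Remove newlines that are not between a start/end delimiter pair."""
    return OUTSIDE_NEWLINE.sub("", text)

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
print(repr(strip_by_lookahead(a)))
```

Unlike the split-based code above, this also copes with more than one delimiter pair in the string.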

<mike

s99999999s2003 · Nov 5, 2005

Thanks for the reply.

I am still interested in using re; I find it useful and am still
learning its uses. So I did something like this for a start, trying to
get everything between [startdelim] and [enddelim]:

import re
a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
t = re.compile(r"\[startdelim\](.*)\[enddelim\]")
t.findall(a)

But it gives me []. It's the "\n" that prevents the match. Why can't
(.*) work in this case? Or am I missing some step to "read" in the
"\n"?
Thanks.

Fredrik Lundh · Nov 5, 2005

> i am still interested about using re, i find it useful. am still
> learning its uses.
> so i did something like this for a start, trying to get everything in
> between [startdelim] and [enddelim]
>
> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>
> t = re.compile(r"\[startdelim\](.*)\[enddelim\]")

"*" is greedy (=searches backwards from the right end), so that won't
do the right thing if you have multiple delimiters.

to fix this, use "*?" instead.

> t.findall(a)
> but it gives me []. it's the "\n" that prevents the results.
> why can't (.*) work in this case? Or am i missing some steps to "read"
> in the "\n"..?

http://docs.python.org/lib/re-syntax.html

    "." (Dot.) In the default mode, this matches any character except
    a newline. If the DOTALL flag has been specified, this matches any
    character including a newline.

to fix this, pass in re.DOTALL or re.S as the flag argument, or
prepend (?s) to the expression.
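
A small sketch of both fixes side by side. The sample string with two delimiter pairs and the variable names are made up for illustration; only the re calls come from the standard library.

```python
import re

# Two delimited blocks, each containing a newline (illustrative data).
text = "x[startdelim]one\ntwo[enddelim]y\n[startdelim]three\nfour[enddelim]z"

no_flag = re.compile(r"\[startdelim\](.*?)\[enddelim\]")
greedy = re.compile(r"\[startdelim\](.*)\[enddelim\]", re.DOTALL)
lazy = re.compile(r"\[startdelim\](.*?)\[enddelim\]", re.DOTALL)
inline = re.compile(r"(?s)\[startdelim\](.*?)\[enddelim\]")

print(no_flag.findall(text))  # "." stops at newlines, so nothing matches
print(greedy.findall(text))   # one match, from the first start to the last end
print(lazy.findall(text))     # one match per delimiter pair
print(inline.findall(text) == lazy.findall(text))  # (?s) behaves like re.DOTALL
```

With a single delimiter pair the greedy and non-greedy patterns return the same result, which is why the difference only shows up on input like the above.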

</F>

Mike Meyer · Nov 5, 2005

Newlines get special treatment in regular expressions: by default, .
matches any character except a newline. You use the flags in re to
change that. In this case, you want . to match newlines, so you use the
DOTALL flag:

import re
a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
t = re.compile(r"\[startdelim\](.*)\[enddelim\]", re.DOTALL)
t.findall(a)   # ['this\nis\nanother']

<mike

Kent Johnson · Nov 5, 2005

Here is a solution using re.sub and a class that maintains state. It works when the input text contains multiple startdelim/enddelim pairs.

import re

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2

class subber(object):
    """Callable for re.sub that remembers whether we are inside a block."""

    def __init__(self):
        self.delimiterSeen = False

    def __call__(self, m):
        text = m.group()
        if text == 'startdelim':
            self.delimiterSeen = True
            return text

        if text == 'enddelim':
            self.delimiterSeen = False
            return text

        # text is a newline: keep it inside a block, drop it elsewhere
        if self.delimiterSeen:
            return text

        return ''

delimRe = re.compile('\n|startdelim|enddelim')

newText = delimRe.sub(subber(), a)
print(repr(newText))
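
A stateless variation on the same re.sub-with-a-callable idea, offered as a sketch rather than a replacement: let the pattern match either a whole delimited block or a lone newline, and have the callback return the block unchanged. The names BLOCK_OR_NEWLINE and strip_by_alternation are hypothetical, and the sketch assumes blocks are not nested.

```python
import re

# Either a whole delimited block (captured in group 1) or a single newline.
# At a start delimiter the first alternative consumes the entire block, so
# newlines inside it never reach the second alternative.
BLOCK_OR_NEWLINE = re.compile(r"(\[startdelim\].*?\[enddelim\])|\n", re.DOTALL)

def keep_block_drop_newline(m):
    # group(1) is None when the bare-newline alternative matched.
    return m.group(1) or ""

def strip_by_alternation(text):
    """Remove newlines outside delimiter pairs; leave the blocks untouched."""
    return BLOCK_OR_NEWLINE.sub(keep_block_drop_newline, text)

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n" * 2
print(repr(strip_by_alternation(a)))
```

One difference from the class above: a start delimiter with no matching end delimiter is not treated as opening a block here, so newlines after it are still removed.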

Kent

Bengt Richter · Nov 6, 2005

Sometimes splitting and processing the pieces selectively can be a solution, e.g.,
if delimiters are properly paired, splitting (with parens to keep matches) should
give you a repeating pattern modulo 4 of outside text, start delimiter,
delimited text, end delimiter:

>>> a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
>>> import re
>>> splitter = re.compile(r'(?s)(\[startdelim\]|\[enddelim\])')
>>> sp = splitter.split(a)
>>> sp
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n']
>>> ''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)])
'thisisasentence[startdelim]this\nis\nanother[enddelim]thisis'
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis

I haven't checked for corner cases, but HTH
Maybe I'll try two pairs of delimiters:

>>> a += "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n"
>>> sp = splitter.split(a)
>>> print(''.join([(lambda s:s, lambda s:s.replace('\n',''))[not i%4](s) for i,s in enumerate(sp)]))
thisisasentence[startdelim]this
is
another[enddelim]thisis222233455555555[startdelim]6666
77
8888888[enddelim]999900

which came from
['this\nis\na\nsentence', '[startdelim]', 'this\nis\nanother', '[enddelim]', 'this\nis\n2222\n33\n4\n55555555', '[startdelim]', '6666\n77\n8888888', '[enddelim]', '9999\n00\n']

The replacement was applied to the pieces where not i%4 was true
...
True: 'this\nis\na\nsentence'
False: '[startdelim]'
False: 'this\nis\nanother'
False: '[enddelim]'
True: 'this\nis\n2222\n33\n4\n55555555'
False: '[startdelim]'
False: '6666\n77\n8888888'
False: '[enddelim]'
True: '9999\n00\n'
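
The same split-and-rejoin idea written out as a small function, as a sketch. The name strip_by_split is made up for illustration, and like the one-liner above it assumes the delimiters are properly paired.

```python
import re

# The capturing group makes split() keep the delimiters in the result.
SPLITTER = re.compile(r"(\[startdelim\]|\[enddelim\])")

def strip_by_split(text):
    """Remove newlines from the pieces that lie outside delimiter pairs."""
    pieces = SPLITTER.split(text)
    # With properly paired delimiters the pieces repeat as: outside text,
    # start delimiter, delimited text, end delimiter. So every index that
    # is divisible by 4 holds outside text.
    return "".join(
        piece.replace("\n", "") if i % 4 == 0 else piece
        for i, piece in enumerate(pieces)
    )

a = "this\nis\na\nsentence[startdelim]this\nis\nanother[enddelim]this\nis\n"
a += "2222\n33\n4\n55555555[startdelim]6666\n77\n8888888[enddelim]9999\n00\n"
print(strip_by_split(a))
```

A conditional expression stands in for the tuple-of-lambdas indexing; the selection rule (index divisible by 4) is unchanged.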

Regards,
Bengt Richter
