splitting perl-style find/replace regexp using python

John Pye

Hi all

I have a file with a bunch of perl regular expressions like so:

/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic

These are all find/replace expressions written as '/search/replace/ # comment',
where 'search' is the regular expression we're searching for and 'replace' is
the replacement expression.

Is there an easy and general way to split these Perl-style find-and-replace
expressions into something I can use with Python, e.g.
re.sub('search', 'replace', str)?

I thought it would generally be good enough to split on '/', but as you can
see the <\/b> messes that up. I really don't want to learn Perl here :)

Cheers
JP
 
James Stroud

This could be more general: in principle a Perl regex could end with a
backslash, e.g. "\\/", but I'm guessing that won't happen here.

py> for p in perlish:
...     print(p)
...
/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
py> import re
py> splitter = re.compile(r'[^\\]/')
py> for p in perlish:
...     print(splitter.split(p))
...
['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']

(I'm hoping this doesn't wrap!)

James
 
Peter Otten

How about matching all escaped chars and '/', and then throwing away the
former:

import re

def split(s):
    # Match escaped characters and bare slashes; keep only the bare slashes.
    breaks = re.compile(r"(\\.)|(/)").finditer(s)
    left, mid, right = [b.start() for b in breaks if b.group(2)]
    return s[left+1:mid], s[mid+1:right]

Peter
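
For anyone trying the tokenizing idea above on the original lines, here is a minimal sketch of how it might be used. The name split_perl_subst is made up for illustration, and taking the first three unescaped slashes (so that a '/' in the trailing comment is tolerated) is an assumption, not part of the posted function.

```python
import re

# Match either an escaped character (backslash plus any one char) or a bare '/'.
TOKEN = re.compile(r"(\\.)|(/)")

def split_perl_subst(line):
    """Return (search, replace) from a '/search/replace/ # comment' line."""
    # Keep only the positions of unescaped slashes (group 2 matched).
    slashes = [m.start() for m in TOKEN.finditer(line) if m.group(2)]
    if len(slashes) < 3:
        raise ValueError("expected three unescaped '/' delimiters: %r" % line)
    # Assumption: anything after the third delimiter is a comment.
    left, mid, right = slashes[:3]
    return line[left + 1:mid], line[mid + 1:right]

line = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = split_perl_subst(line)
print(search)
print(replace)
```

The replacement still contains Perl's $1 and \/ at this point, so it is not yet a valid template for re.sub.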
 
James Stroud

I realized that threw away the character before each slash, including the
closing parentheses. This is the corrected version:

py> splitter = re.compile(r'(?<!\\)/')
py> for p in perlish:
...     print(splitter.split(p))
...
['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

James
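
To get from the split pieces to something re.sub accepts, the replacement also has to be translated, because Python templates write group references as \1 or \g<1> rather than $1. The following is a rough sketch under some assumptions: to_python is a hypothetical helper name, only $N references and \/ escapes are handled, and no pattern ends in an escaped backslash.

```python
import re

SPLITTER = re.compile(r"(?<!\\)/")   # a '/' not preceded by a backslash

def to_python(line):
    """Turn '/search/replace/ # comment' into a (pattern, template) pair."""
    # The first piece is the empty string before the leading '/'.
    _, search, replace = SPLITTER.split(line)[:3]
    # The slash needs no escaping in Python, so drop the Perl-side escape.
    search = search.replace(r"\/", "/")
    replace = replace.replace(r"\/", "/")
    # Perl writes group references as $1; Python templates use \g<1>.
    replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return search, replace

line = r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold"
pattern, template = to_python(line)
print(re.sub(pattern, template, "some *bold* text"))   # prints: some '''bold''' text
```

Other Perl-only constructs (modifiers after the last slash, $& and so on) are not covered by this sketch.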
 
Peter Otten

There is another problem with escaped backslashes:
['', 'abc\\\\/def', '']

Peter
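
The list above is what the lookbehind splitter returns for the input /abc\\/def/, where the search part ends in an escaped backslash. A small sketch of the difference between the two approaches on that input (the constant names are chosen here only for illustration):

```python
import re

LOOKBEHIND = re.compile(r"(?<!\\)/")
TOKEN = re.compile(r"(\\.)|(/)")

line = r"/abc\\/def/"   # search part is abc followed by an escaped backslash

# The lookbehind sees a backslash before the middle '/' and refuses to split there.
print(LOOKBEHIND.split(line))

# Pairing each backslash with the character it escapes finds all three slashes.
slashes = [m.start() for m in TOKEN.finditer(line) if m.group(2)]
left, mid, right = slashes
print((line[left + 1:mid], line[mid + 1:right]))
```

The tokenizing pattern consumes the two backslashes together as one escaped character, so the slash after them is correctly seen as a delimiter.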
 
James Stroud

Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.

James
 
Peter Otten

Sorry for not tracking the context.

Peter
 
John Pye

Thanks all for your suggestions on this. The 'splitter' idea was
particularly good, not something I'd thought of. Sorry for my late
reply.
 
