Conversion of perl based regex to python method

A

Andrew Robert

I have two Perl expressions


If windows:

perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge" somefile.txt

If posix

perl -ple 's/([^\w\s])/sprintf("%%%2X", ord $1)/ge' somefile.txt



The [^\w\s] is a negated expression stating that any character
a-zA-Z0-9_, space or tab is ignored.

The () captures whatever matches and throws it into the $1 for
processing by the sprintf

In this case, %%%2X which is a three character hex value.

How would you convert this to a python equivalent using the re or
similar module?

I've begun reading about using re expressions at
http://www.amk.ca/python/howto/regex/ but I am still hazy on implementation.

Any help you can provide would be greatly appreciated.

Thanks,
Andy
 
A

Andrew Robert

Andrew said:
I have two Perl expressions


If windows:

perl -ple "s/([^\w\s])/sprintf(q#%%%2X#, ord $1)/ge" somefile.txt

If posix

perl -ple 's/([^\w\s])/sprintf("%%%2X", ord $1)/ge' somefile.txt



The [^\w\s] is a negated expression stating that any character
a-zA-Z0-9_, space or tab is ignored.

The () captures whatever matches and throws it into the $1 for
processing by the sprintf

In this case, %%%2X which is a three character hex value.

How would you convert this to a python equivalent using the re or
similar module?

I've begun reading about using re expressions at
http://www.amk.ca/python/howto/regex/ but I am still hazy on implementation.

Any help you can provide would be greatly appreciated.

Thanks,
Andy
Okay.. I got part of it..

The code/results below seem to do the first part of the expression.

I believe the next part is iterating across each of the characters,
evaluate the results and replace with hex as needed.


# Import the module
import re

# Open test file
file=open(r'm:\mq\mq\scripts\testme.txt','r')

# Read in a sample line
line=file.readline()

# Compile expression to exclude all characters plus space/tab
pattern=re.compile('[^\w\s]')

# Look to see if I can find a non-standard character
# from test line #! C:\Python24\Python

var=pattern.match('!')

# gotcha!
print var
<_sre.SRE_Match object at 0x009DA8E0

# I got
print var.group()

!

# See if pattern will come back with something it shouldn't
var =pattern.match('C')
print var

#I got
None



Instead of being so linear, I was thinking that this might be closer.
Got to figure out the hex line but then we are golden


# Evaluate captured character as hex
def ret_hex(ch):
return chr((ord(ch) + 1) % )

# Evaluate the value of whatever was matched
def eval_match(match):
return ret_hex(match.group(0))

# open file
file = open(r'm:\mq\mq\scripts\testme.txt','r')

# Read each line, pass any matches on line to function
for line in file.readlines():
re.sub('[^\w\s]',eval_match, line)
 
P

Peter Otten

Andrew Robert wrote:

Wanted:
perl -ple 's/([^\w\s])/sprintf("%%%2X", ord $1)/ge'  somefile.txt
Got:

# Evaluate captured character as hex
def ret_hex(ch):
return chr((ord(ch) + 1) % )

Make it compile at least before posting :)
# Evaluate the value of whatever was matched
def eval_match(match):
return ret_hex(match.group(0))

# open file
file = open(r'm:\mq\mq\scripts\testme.txt','r')

# Read each line, pass any matches on line to function
for line in file.readlines():
re.sub('[^\w\s]',eval_match, line)

for line in file:
...

without readlines() is better because it doesn't read the whole file into
memory first. If you want to read data from files passed as commandline
args or from stdin you can use fileinput.input():

import re
import sys
import fileinput

def replace(match):
return "%%%2X" % ord(match.group(0))

for line in fileinput.input():
sys.stdout.write(re.sub("[^\w\s]", replace, line))

Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top