Replacing and inserting strings within .txt files with the use of regex

MRAB

Νίκος said:
src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )

like this?

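The fourth positional argument of re.sub is count, not flags, so the
re.DOTALL there is taken as a substitution count. (Before Python 2.7
re.sub has no flags argument at all; from 2.7 on it has to be passed as
flags=re.DOTALL.) You can put the flag inside the regex itself like this: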

src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons! :))
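
A small sketch of the difference, assuming Python 2.7 or later (where re.sub takes a flags keyword). The sample string is made up for illustration and is not from the original project:

```python
import re

# Hypothetical sample: two PHP blocks, the second one spanning lines.
src_data = "<p>a</p><? echo 1; ?><p>b</p><?\necho 2;\n?><p>c</p>"

# Pitfall: the 4th positional argument is count, so re.DOTALL (the
# integer 16) only limits the number of substitutions; no flag is set.
# Newer Python 3 releases may emit a DeprecationWarning for this call.
wrong = re.sub(r'<\?(.*?)\?>', '', src_data, re.DOTALL)

# Inline flag: '.' now matches newlines too, so multi-line blocks go.
inline = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

# Keyword form, equivalent to the inline flag.
keyword = re.sub(r'<\?(.*?)\?>', '', src_data, flags=re.DOTALL)

print(repr(wrong))
print(repr(inline))
```

In the first call only the single-line block is removed; the block that spans lines survives because '.' still stops at '\n'.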
 
MRAB

Νίκος said:
Now the code looks as follows:

=============================
#!/usr/bin/python

import re, os, sys

id = 0 # unique page_id

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.endswith('php'):
[snip]

I just tried to test it. I created a folder named 'test' on my 'd:\'
drive.
Then I put two .php files from the original inside it, to test whether
it would work OK for those two files before running it on the whole copy
and afterwards on the original project.

So I opened a 'cli' from my Win7 and tried

D:\>convert.py

D:\>

It just printed an empty line and nothing else. Why didn't it even try
to open the folder and the files within?
Syntactically it doesn't give me an error!
Something with the os.walk() method perhaps?

Can you help with this too please?

Now I am able to convert just a single file, 'd:\test\index.php'

But this needs to be done for ALL the php files in every subfolder.
for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.endswith('php'):

Should the above lines enter the folders and find the php files in each
folder so that they can be edited?

I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.
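
One way to follow this advice is a dry run that only reports what the walk finds and changes nothing. This is a sketch, not the script from the thread: find_php_files is a made-up helper name, and the tuple order (dirpath, dirnames, filenames) is the one documented for os.walk.

```python
import os

def find_php_files(top):
    """Return the paths of all .php files below top without changing them."""
    found = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per folder.
    for dirpath, dirnames, filenames in os.walk(top):
        print('in %r: %d subfolders, %d files'
              % (dirpath, len(dirnames), len(filenames)))
        for name in filenames:
            if name.lower().endswith('.php'):
                found.append(os.path.join(dirpath, name))
    return found

if __name__ == '__main__':
    # 'test' is the folder name used in the thread; it is resolved
    # relative to the current working directory printed here.
    print('cwd: %s' % os.getcwd())
    for path in find_php_files('test'):
        print('would convert: %s' % path)
```

If nothing at all is printed after the cwd line, the walk never found the folder; if folders are listed but no files are reported, the problem is in the loop itself.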
 
Νίκος

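The fourth positional argument of re.sub is count, not flags, so the
re.DOTALL there is taken as a substitution count. (Before Python 2.7
re.sub has no flags argument at all; from 2.7 on it has to be passed as
flags=re.DOTALL.) You can put the flag inside the regex itself like this: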

     src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons! :))

This is for the '.' to match any character, including '\n' too, right?
So even if the php start tag and the end tag are on different lines,
they will still be matched, correct?

Do we need the 'raw' string as well? Why? The regex doesn't contain
backslashes.
 
Νίκος

I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.

I did that, but it doesn't even get to the 'test' folder to search for
the files!
 
Νίκος

D:\>convert.py
File "D:\convert.py", line 34
SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
34, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for
details

D:\>

What is it referring to? Which character cannot be identified?

Line 34 is:

src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

Also,

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.lower().endswith("php"):

in the above lines

should I state os.walk('test') or os.walk('d:\test')?
 
MRAB

Νίκος said:
This is for the '.' to match any character, including '\n' too, right?
So even if the php start tag and the end tag are on different lines,
they will still be matched, correct?

Do we need the 'raw' string as well? Why? The regex doesn't contain
backslashes.

Yes it does; two of them!
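
To make the two backslashes visible, here is a short sketch comparing the raw string with its ordinary-string spelling, and showing what each '\?' does in the pattern. The variable names and the '<b>' sample text are only illustrative:

```python
import re

raw = r'(?s)<\?(.*?)\?>'
# The same pattern as an ordinary string needs doubled backslashes.
escaped = '(?s)<\\?(.*?)\\?>'

same = (raw == escaped)
backslashes = raw.count('\\')
print(same, backslashes)

# With the backslash, '\?' is a literal question mark, so '<\?' only
# matches the two characters '<?'. There is no '<?' in '<b>'.
literal = re.search(r'<\?', '<b>')

# Without it, '?' makes the '<' optional, so the pattern matches '<'.
optional = re.search(r'<?', '<b>')
print(literal, optional.group())
```

The raw prefix keeps Python's string parser from touching the backslashes, so they reach the regex engine unchanged.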
 
MRAB

Νίκος said:
D:\>convert.py
File "D:\convert.py", line 34
SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
34, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for
details

D:\>

What is it referring to? Which character cannot be identified?

Line 34 is:

src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )
Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

# -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.
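
If the editor does not show the encoding it saves in, a rough check can be scripted. This is only a sketch: is_utf8 is a hypothetical helper, and a clean decode is strong evidence rather than proof, since pure ASCII files also pass.

```python
import io

def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with io.open(path, 'rb') as handle:
        data = handle.read()
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        # Typical for Greek text saved in a legacy 8-bit code page.
        return False
    return True
```

Calling is_utf8('convert.py') on the script from the thread would show whether the Greek string in it was saved as UTF-8 or in a Greek code page.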
Also,

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.lower().endswith("php"):

in the above lines

should I state os.walk('test') or os.walk('d:\test')?

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

'd:/test'

(Windows should accept a slash as well as a backslash.)
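
The reason the plain 'd:\test' is risky can be seen by inspecting the strings. A minimal sketch; ntpath is the Windows flavour of os.path and is used here only so the last line behaves the same on any platform:

```python
import ntpath

plain = 'd:\test'      # '\t' is read as a single TAB character
raw = r'd:\test'       # raw string: the backslash is kept literally
doubled = 'd:\\test'   # escaped backslash, same result as the raw string
forward = 'd:/test'

print(repr(plain))
print(len(plain), len(raw))

# Normalising the forward-slash spelling gives the backslash form.
normalised = ntpath.normpath(forward)
print(normalised == raw)
```

So os.walk('d:\test') would be asked to walk a folder whose name contains a tab, which does not exist, and os.walk then silently yields nothing.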
 
Νίκος

Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

     # -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

     r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

     'd:/test'

(Windows should accept a slash as well as a backslash.)

I will try it as soon as I make another change that I missed:

The ID number of each php page was contained in the old php code
within this string

PageID = some_number

So instead of creating a new ID number for each page, I have to pull out
this number and store it at the beginning of the file as a comment line,
because it is directly related to the mysql database: it tracks the
number of each webpage and is used to find its counter.

# Grab the PageID contained within the php code and store it in id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and
store it in the variable id for later use in the program?

Also, I made another change. Would something like this work:

===============================
# open same php file for storing modified data
print ( 'writing to %s' % dest_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===============================

Instead of creating a new .html file and inserting the desired data from
the old php, and so ending up with two files (old php and new html), I
decided to open the same php file for writing that data and then
rename it to html.
Would the above code work?
 
Νίκος

Please help me with these last changes before I try to perform an
overall change.
It's almost done!
 
MRAB

Νίκος wrote:
[snip]
The ID number of each php page was contained in the old php code
within this string

PageID = some_number

So instead of creating a new ID number for each page, I have to pull out
this number and store it at the beginning of the file as a comment line,
because it is directly related to the mysql database: it tracks the
number of each webpage and is used to find its counter.

# Grab the PageID contained within the php code and store it in id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and
store it in the variable id for later use in the program?
If the part of the file you're trying to match looks like this:

PageID = 12

then the regex should look like this:

PageID = (\d+)

and the code should look like this:

page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.
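
A slightly more defensive sketch of the same idea, for files that may not contain a PageID at all. extract_page_id is a hypothetical helper name, and converting to int is an assumption about how the number will be used later:

```python
import re

def extract_page_id(src_data):
    """Return the number after 'PageID = ' as an int, or None if absent."""
    found = re.search(r'PageID = (\d+)', src_data)
    if found is None:
        # re.search returns None when nothing matches, so calling
        # .group(1) on the result directly would raise AttributeError.
        return None
    return int(found.group(1))

print(extract_page_id('<? PageID = 12 ?>'))
print(extract_page_id('<p>no id here</p>'))
```

Checking for None first matters in a batch run, where one page without a PageID would otherwise stop the whole conversion.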
Also, I made another change. Would something like this work:

===============================
# open same php file for storing modified data
print ( 'writing to %s' % dest_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===============================

Instead of creating a new .html file and inserting the desired data from
the old php, and so ending up with two files (old php and new html), I
decided to open the same php file for writing that data and then
rename it to html.
Would the above code work?

Why wouldn't it?
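
One edge case worth knowing about: str.replace changes every '.php' in the path, including one inside a folder name, and it leaves an upper-case '.PHP' untouched. A sketch of an alternative built on os.path.splitext; html_name is a made-up helper and the folder name below is hypothetical:

```python
import os

def html_name(src_f):
    """Return src_f with only its final extension swapped for .html."""
    root, ext = os.path.splitext(src_f)
    return root + '.html'

# Hypothetical path whose folder name also contains '.php'.
src_f = os.path.join('site.php.d', 'index.php')

print(src_f.replace('.php', '.html'))   # also rewrites the folder name
print(html_name(src_f))                 # only the extension changes
```

For a tree whose folder names never contain '.php', both spellings give the same result.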
 
Νίκος

If the part of the file you're trying to match looks like this:

     PageID = 12

then the regex should look like this:

     PageID = (\d+)

and the code should look like this:

     page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.

Thank you very much for helping me with the syntax.
Why wouldn't it?

I thought perhaps I had done something wrong with the code.

=========================================
for currdir, files, dirs in os.walk('d:\\test'): # 'd:/test' doesn't track the folder either
    for f in files:
        if f.lower().endswith("php"):
            print currdir, files, dirs, f
=========================================

As you advised me in a previous post of yours, I need to find out why
the converting code, although it works for a single file, for some reason
doesn't enter folders and subfolders to grab files from there to convert.

So, as you said, I should comment out all other statements to find the
culprit in the above lines.

Well, those lines are supposed to print the current working folder and
files, but when I run the above code it gives me nothing in response,
not even 'f'.

So does that mean that the os.walk() method cannot enter Windows 7
folders?

* One more thing: instead of trying to run the above script from the
'cli', wouldn't it be better to run it as a cgi script and see the
results in the browser, with the addition of this line?

print ( "Content-type: text/html; charset=UTF-8 \n" )

Or for some reason does this have to be run from the shell on both the
local (Windows 7) and the remote hosting (Linux) servers?
 
Νίκος

Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

     # -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

Actually it's for currdir, dirs, files in os.walk('test'): that's why
it couldn't run!! :)

After changing this and making some other modifications, my conversion
script finally ran!

Here it is for anyone who might want similar functionality.

======================================================================

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re, os, sys


count = 520

for currdir, dirs, files in os.walk('d:\\akis'):

    for f in files:

        if f.lower().endswith("php"):

            # get abs path to filename
            src_f = os.path.join(currdir, f)

            # open php src file
            f = open(src_f, 'r')
            src_data = f.read()
            f.close()

            # Grab the id number contained within the php code and insert it above all other data
            found = re.search( r'PageID = (\d+)', src_data )
            if found:
                id = found.group(1)
            else:
                # no PageID in this file: use the next number from the counter
                # (the original 'id = count =+ 1' set both names to 1)
                count += 1
                id = count
            src_data = ( '<!-- %s -->\n\n' % id ) + src_data

            # replace php tags and contents within
            src_data = re.sub( r'(?s)<\?(.*?)\?>', '', src_data )

            # add template variables
            src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

            # open same php file for storing modified data
            f = open(src_f, 'w')
            f.write(src_data)
            f.close()

            # rename edited .php file to .html extension
            dst_f = src_f.replace('.php', '.html')
            os.rename( src_f, dst_f )
            print ( "renaming: %s => %s\n" % (src_f, dst_f) )
 
