Regular expression worries


CSUIDL PROGRAMMEr

Folks,
I am new to Python, so excuse me if I am asking stupid questions.

I have a .txt file; here are some lines from it:

Document<Keyword<date:2006-08-19> Keyword<time:11:00:43>
> Keyword<date:2006-08-19> Keyword<time:11:00:44> Keyword<sender:>
Keyword<receiver:> Keyword<data::+iwx> Keyword<mode::+iwx

I am writing a Python program to replace the tags and the word Document
with Doc.

Here is my Python program:

#! /usr/local/bin/python

import sys
import string
import re

def replace():
    filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
    try:
        fh=open(filename,'r')
    except:
        print 'file not opened'
        sys.exit(1)
    for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
        l=l.replace("Document", "DOC")
        fh.close()

if __name__=="__main__":
    replace()

But it does not replace Document with Doc in the .txt file.

Is there anything I am doing wrong?

Thanks
 

johnzenger

You are opening the same file twice, reading its contents line-by-line
into memory, replacing "Document" with "Doc" *in memory*, never writing
that to disk, and then discarding the line you just read into memory.

If your file is short, you could read the entire thing into memory as
one string using the .read() method of fh (your file object). Then,
call .replace on the string, and then write to disk.
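A minimal sketch of that small-file approach, written for Python 3 (the thread itself uses Python 2 syntax). The function name replace_whole_file is hypothetical:

```python
def replace_whole_file(path, old, new):
    """Read the whole file, replace every occurrence of old with new, write it back."""
    # Read everything into one string; only sensible if the file fits in memory.
    with open(path) as fh:
        text = fh.read()
    # str.replace returns a *new* string, so keep the result.
    text = text.replace(old, new)
    # Reopening with "w" truncates the file before the new contents are written.
    with open(path, "w") as fh:
        fh.write(text)
```

Called as replace_whole_file('/root/Desktop/project/chatlog_20060819_110043.xml.txt', "Document", "DOC"), it rewrites the file in place. If the program dies between the truncate and the write, the contents are lost, which is one reason the temporary-file approach is preferred.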

If your file is long, then you want to do the replace line by line,
writing as you go to a second file. You can later rename that file to
the original file's name and delete the original.
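For the large-file case, here is one way to do it in Python 3, assuming it is fine to create a temporary file in the same directory as the original. Rather than renaming and deleting in two separate steps, it uses os.replace, which overwrites the original with the new file in a single rename. The function name is hypothetical:

```python
import os
import shutil
import tempfile


def replace_line_by_line(path, old, new):
    """Copy path line by line into a temp file with old -> new, then swap it in."""
    # Same directory as the original, so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    tmp_name = None
    try:
        with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False
        ) as tmp:
            tmp_name = tmp.name
            # Only one line is held in memory at a time.
            for line in src:
                tmp.write(line.replace(old, new))
        # The temp file is created owner-only; copy the original's permission bits.
        shutil.copymode(path, tmp_name)
        # Overwrite the original with the rewritten copy.
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the partial temp file and leave the original alone.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Until os.replace runs, the original file is never touched, so an error halfway through leaves it intact.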

Also, you aren't using regular expressions at all, so you don't need
the re module.
 

Bruno Desthuilliers

CSUIDL said:
> folks
> I am new to python, so excuse me if i am asking stupid questions.

From what I see, you seem to be new to programming in general !-)
> I have a txt file and here are some lines of it
>
> Document<Keyword<date:2006-08-19> Keyword<time:11:00:43>
>
> Keyword<receiver:> Keyword<data::+iwx> Keyword<mode::+iwx
>
> I am writing a python program to replace the tags and word Document
> with Doc.
>
> Here is my python program
>
> #! /usr/local/bin/python
>
> import sys
> import string
> import re
>
> def replace():
>     filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
>     try:
>         fh=open(filename,'r')
>     except:
>         print 'file not opened'
>         sys.exit(1)

You open your file a first time, and bind the reference to the file
object to fh.
>     for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):

And then you open the file a second time...
>         l=l.replace("Document", "DOC")

This builds a new string from the one referenced by l (talk about a bad
name) and rebinds it to that same name. Python strings are immutable, so
nothing is modified in place.
>         fh.close()

Then you close fh... and discard the modifications to l.
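The point about l is easy to check in isolation. In this small sketch a plain list stands in for the file (an assumption made only to keep the example self-contained): rebinding the loop variable leaves the list untouched, and the replaced strings survive only if you store them somewhere.

```python
lines = ["Document one", "Document two"]

for l in lines:
    # str.replace returns a new string; this only rebinds the name l.
    l = l.replace("Document", "DOC")

print(lines)    # ['Document one', 'Document two'] (unchanged)

# To keep the results, collect the new strings explicitly.
changed = [line.replace("Document", "DOC") for line in lines]
print(changed)  # ['DOC one', 'DOC two']
```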
> if __name__=="__main__":
>     replace()
>
> But it does not replace Document with Doc in the txt file

Why should it? You didn't ask for it !-)
> Is there anything wrong i am doing

Yes.

The canonical way to modify a text file is to read from the original / do
the transformations / *write the modifications to a tmp file* / replace the
original with the tmp file.
 

Tim Chase

>     for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
>
>         l=l.replace("Document", "DOC")
>         fh.close()
>
> But it does not replace Document with Doc in the txt file

In addition to closing a file handle *within* the loop (and not even
the one the loop is reading from), you're changing "l" (side note: a
bad choice of name, since in most fonts it's hard to tell apart from
the number "1"), but you're not writing it back out anywhere. One
would do something like

outfile = open('out.txt', 'w')
infile = open(filename)
for line in infile:
    outfile.write(line.replace("Document", "DOC"))
outfile.close()
infile.close()

You could even let garbage collection take care of the file
handle for you:


outfile = open('out.txt', 'w')
for line in open(filename):
    outfile.write(line.replace("Document", "DOC"))
outfile.close()


If needed, you can then move the 'out.txt' overtop of the
original file.
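Letting garbage collection close the input file relies on CPython freeing the file object as soon as its last reference disappears; other Python implementations may close it later. In modern Python a with statement closes both files deterministically. A minimal sketch, with a hypothetical function name:

```python
def copy_with_replacement(src_path, dst_path, old, new):
    """Write src_path to dst_path, replacing old with new on every line."""
    # Both files are closed when the block exits, even if an exception occurs.
    with open(src_path) as infile, open(dst_path, "w") as outfile:
        for line in infile:
            outfile.write(line.replace(old, new))
```

For the case in this thread, that would be copy_with_replacement(filename, 'out.txt', "Document", "DOC").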

Or, you could just use

sed 's/Document/DOC/g' $FILENAME > out.txt

or, with a sed that supports the -i option (GNU sed does), do it in place with

sed -i 's/Document/DOC/g' $FILENAME

if you have sed available on your system.
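Without sed, Python's standard fileinput module has a comparable in-place mode: with inplace=True it moves the original aside and redirects standard output into a new file of the same name, so whatever the loop prints becomes the new contents. A minimal Python 3 sketch (the function name is hypothetical); note that, unlike sed, it does a plain substring replacement rather than a regex substitution:

```python
import fileinput


def sed_like_replace(path, old, new):
    """Rough Python analogue of sed -i 's/old/new/g' path, without regex matching."""
    # inplace=True: print() output inside the loop replaces the file's contents.
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            # Each line keeps its own newline, so suppress print's.
            print(line.replace(old, new), end="")
```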

Oh...and it doesn't look like your code is using regexps for
anything, despite the subject-line of your email :) I suspect
they'll come in later for the "replace the tags" portion you
mentioned, but that ain't in the code.
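The question doesn't say what the tags should become, so this sketch only covers the regular-expression side: a word-bounded version of the Document replacement, and one possible pattern for matching tags. The tag pattern is an assumption based on the sample lines in the question, not a specification of the log format:

```python
import re

# Two of the sample lines from the question.
sample = (
    "Document<Keyword<date:2006-08-19> Keyword<time:11:00:43>\n"
    "Keyword<receiver:> Keyword<data::+iwx>\n"
)

# \b stops words such as "Documentation" or "MyDocument" from being changed.
print(re.sub(r"\bDocument\b", "DOC", sample))

# Assumed tag shape: Keyword<name:value>, where value may be empty or
# contain further colons (as in data::+iwx).
TAG = re.compile(r"Keyword<(\w+):([^>]*)>")
print(TAG.findall(sample))
# [('date', '2006-08-19'), ('time', '11:00:43'), ('receiver', ''), ('data', ':+iwx')]
```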

-tkc
 
