Python and stale file handles

tgiles

Hi, All!

I started programming in Python again after a hiatus of several
years and have run into a sticky problem that I can't seem to fix,
regardless of how hard I try. It starts with tailing a log file.

Basically, I'm trying to tail a log file and send the contents
elsewhere in the script (here, I call it processor()). My first
iteration below works perfectly fine, as long as the log file itself
(logfile.log) keeps getting written to.

I have a shell script that constantly writes to logfile.log. If I
happen to kill it off and restart it (overwriting the log file with
more entries), then the Python script stops sending anything at all.

import time, os

def processor(message,address):
    #do something clever here
    pass

#Set the filename and open the file
filename = 'logfile.log'
file = open(filename,'r')

#Find the size of the file and move to the end
st_results = os.stat(filename)
st_size = st_results[6]
file.seek(st_size)

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line, # already has newline
        data = line
        if not data:
            break
        else:
            # addr is not defined in this excerpt
            processor(data,addr)
            print "Sending message '",data,"'....."

someotherstuffhere()

===

This is perfectly normal behavior, since the same thing happens when I
do a tail -f on the log file. However, I was hoping to build a bit
of cleverness into the Python script, so that it would notice a change
in the log file and compensate for it.

So, I wrote up a new script that opens the file to begin with,
attempts a quick measurement of the file size (to see if it's
suddenly stuck) and then reopens the log file if there's something
dodgy going on.

However, it's not quite working the way that I intended it to.
It will either start reading the file from the beginning (instead of
tailing from the end) or just sit there confuzzled until I kill it
off.

===


import time, os

filename = 'logfile.log'

def processor(message):
    # do something clever here
    pass

def checkfile(filename):
    file = open(filename,'r')
    print "checking file, first pass"
    pass1 = os.stat(filename)
    pass1_size = pass1[6]

    time.sleep(5)

    print "file check, 2nd pass"
    pass2 = os.stat(filename)
    pass2_size = pass2[6]
    if pass1_size == pass2_size:
        print "reopening file"
        file.close()
        file = open(filename,'r')
    else:
        print "file is OK"
        pass

while 1:
    checkfile(filename)
    where = file.tell()
    line = file.readline()
    print "reading file", where
    if not line:
        print "sleeping here"
        time.sleep(5)
        print "seeking file here"
        file.seek(where)
    else:
        # print line, # already has newline
        data = line
        print "readying line"
        if not data:
            print "no data, breaking here"
            break
        else:
            print "sending line"
            processor(data)

So, do you have any thoughts on how to keep a Python script from
bugging out after a tailed file has been refreshed? I'd love to hear
any thoughts you may have on the matter, even if they're of the
'that's the way things work' variety.

Cheers, and thanks in advance for any ideas on how to get around the
issue.

tom
 
bockman


Possibly, restarting the program that writes the log file creates a
new file rather than appending to the old one?

I think you should always reopen the file between the first and the
second pass of your checkfile function, and then:
- if the file has the same size, it is probably the same file (but it
  would be better to check the update time!), so seek to the end of it
- otherwise, it's a new file, so start reading it from the beginning

To reduce the number of seeks, you could perform checkfile only if you
did not get any data for N cycles.
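
As a rough illustration of that idea, here is a minimal sketch in Python 3 (the thread's own code is Python 2). It is not the poster's script: the function name follow and the parameters poll_interval, idle_checks and max_polls are hypothetical, and it swaps the two-pass size comparison for a simpler test, namely whether the file on disk is now shorter than the current read offset. That test is only run after several empty reads in a row.

```python
import os
import time


def follow(path, poll_interval=1.0, idle_checks=5, max_polls=None):
    """Return a generator of lines appended to `path`.

    All parameter names are hypothetical. The file is opened right away,
    positioned at its end (like tail -f), and reopened from the start
    when it turns out to be shorter than the current read position.
    max_polls only exists so that the loop can terminate.
    """
    f = open(path, "rb")  # binary mode, so tell() is a plain byte offset
    f.seek(0, os.SEEK_END)

    def lines():
        nonlocal f
        idle = polls = 0
        try:
            while max_polls is None or polls < max_polls:
                polls += 1
                where = f.tell()
                line = f.readline()
                if line:
                    idle = 0
                    yield line.decode("utf-8", errors="replace")
                    continue
                idle += 1
                if idle >= idle_checks:
                    # Only stat the file after several empty reads in a row.
                    idle = 0
                    try:
                        size = os.stat(path).st_size
                    except FileNotFoundError:
                        size = where  # file is briefly missing; retry later
                    if size < where:
                        # Shorter than our offset: the log was overwritten.
                        f.close()
                        f = open(path, "rb")
                        continue
                time.sleep(poll_interval)
        finally:
            f.close()

    return lines()
```

It could be used as "for line in follow('logfile.log'): processor(line)". One known gap: if the overwritten log has already grown past the old offset by the time of the check, the size test will not notice, which is why checking the update time as well was suggested.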

Ciao
 
colas.francis


Possibly, restarting the program that writes the log file creates a
new file rather than appending to the old one??

It seems that, at the very least, the OP should reopen the file:

# create a file
In [322]: f1 = open("test.txt", 'w')
In [323]: f1.write("test\n")
In [324]: f1.close()

# check content of file
In [325]: f_test1 = open("test.txt")
In [326]: f_test1.readline()
Out[326]: 'test\n'
# check twice, we never know
In [327]: f_test1.seek(0)
In [328]: f_test1.readline()
Out[328]: 'test\n'

# rewrite over the same file
In [329]: f1 = open("test.txt", 'w')
In [330]: f1.write("new test\n")
In [331]: f1.close()

# check if ok
In [332]: f_test2 = open("test.txt")
In [333]: f_test2.readline()
Out[333]: 'new test\n'

# first file object has not seen the change
In [334]: f_test1.seek(0)
In [335]: f_test1.readline()
Out[335]: 'test\n'
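
One way to decide when a reopen is needed is to compare the identity of the file behind the open handle with whatever the path points to now. The sketch below (Python 3) is only an illustration under that assumption; the helper name handle_is_stale is hypothetical and does not come from the thread.

```python
import os


def handle_is_stale(fobj, path):
    """Return True if `path` no longer names the file that `fobj` has open."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True  # the path is gone, so the handle points at an orphan
    opened = os.fstat(fobj.fileno())
    # Same device and inode number means it is still the same file.
    return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)
```

This check only catches a file that was deleted and recreated, or replaced by a rename. A file truncated in place, as open(..., 'w') does in the session above, keeps its identity, so that case needs a size check instead.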
 
