fileinput module dangerous?

C

Chris Connett

I just started working with the fileinput module, and reading the bit
about inplace modification, it seems very dangerous:

From the doc:
---
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file

**(if a file of the same name as the backup file already exists, it
will be replaced silently)**.

This makes it possible to write a filter that rewrites its input file
in place. If the keyword argument backup='.<some extension>' is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is '.bak' and it is
deleted when the output file is closed. In-place filtering is disabled
when standard input is read.
---[Emphasis mine]

This seems like very dangerous behavior, as .bak is a very common and
useful extension for old versions of files users would like to keep
around, and this module obliterates files one would not expect it to
touch. This behavior is also very un-Pythonic. The Zen says "Errors
should never pass silently." There seems to be no way to get useful
behavior either, as the backup keyword just specifies a backup
extension, which you can never guarantee will not be in use, and it
leaves the file around in the directory as well.

It seems like this module should use tmpfile in the os module for it's
temp file needs.

Is there a rational reason it's this way, or should I start working on
a patch?
 
J

Jorgen Grahn

I just started working with the fileinput module, and reading the bit
about inplace modification, it seems very dangerous:

From the doc:
---
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file

**(if a file of the same name as the backup file already exists, it
will be replaced silently)**. ....

This seems like very dangerous behavior, as .bak is a very common and
useful extension for old versions of files users would like to keep
around, and this module obliterates files one would not expect it to
touch. This behavior is also very un-Pythonic.

(The end user probably doesn't care if this is Pythonic or not when it
clobbers her files ...)

I had no idea it worked like this -- thanks for pointing it out!
It was such an obvious import of the Perl idiom:
perl -pi -e 's/a/b/' file1 file2 ...
perl -pi.bak -e 's/a/b/' file1 file2 ...
which clobbers existing .bak files -- /if you tell it to, not otherwise/.
Anything else is insane.

(Perl will happily overwrite write-protected files, at least under Unix, but
that's a different story...)
Is there a rational reason it's this way, or should I start working on
a patch?

I can't see how there could be a reason. I'd welcome a patch.

/Jorgen
 
C

Chris Connett

Jorgen Grahn wrote:
[...]
I can't see how there could be a reason. I'd welcome a patch.

OK, I've gone ahead and made a fix. It makes use of os.tmpfile if it is
available, but implements the same (now safe) behavior if it is
unavailable through other means.

I also searched through old postings about this module, and went over
various complaints. I found that unicode awareness was missing and
wanted, and since there is no other effective way to mimic it in client
code, I added encoding support. All my changes *should* be backwards
compatible, which brings me to my next question.

As someone who has never posted a patch, what are next steps from here.
Is there a standard test suite that I need to run? What about
documentation updates?

This patch should be considered alpha quality, especially the encoding
support, though if the module is used without specifying an encoding, it
should work exactly as before. Please beat on it, though do note that I
haven't run any standard test suites if they exist, so bugs may not
necessarily be obscure ones. I will continue to run more tests myself
in the coming days.

Note:
- Due to the buffering nature of fileinput, universal newlines only
works when using an encoding whose StreamReader's readlines() method
performs universal newlining.

--
Chris Connett

190d189
< self._encoding = encoding
200a200
self._encoding = encoding 211a212,218
# need parameterized file modes, since the files backing
# encodings should be raw bytestreams
self._readmode = 'r'
self._writemode = 'w'
if self._encoding:
self._readmode += 'b'
self._writemode += 'b'
248c255
< self._savestdout = 0
---
self._savestdout = None
253c260
< self._output = 0
---
self._output = None
290c297
< self._backupfilename = 0
---
self._backupfilename = None
297,307c304,305
< self._backupfilename = (
< self._filename + (self._backup or os.extsep+"bak"))
< try: os.unlink(self._backupfilename)
< except os.error: pass
< # The next few lines may raise IOError
< os.rename(self._filename, self._backupfilename)
< self._file = open(self._backupfilename, "r")
< try:
< perm = os.fstat(self._file.fileno()).st_mode
< except OSError:
< self._output = open(self._filename, "w")
---
if self._backup:
self._backupfilename = self._filename + self._backup
309,312d306
< fd = os.open(self._filename,
< os.O_CREAT | os.O_WRONLY | os.O_TRUNC,
< perm)
< self._output = os.fdopen(fd, "w")
314,316c308,331
< if hasattr(os, 'chmod'):
< os.chmod(self._filename, perm)
< except OSError:
---
self._file = os.tmpfile()
# copy our input into the tmpfile to keep the same
# backup/write relationship
self._file.write(open(self._filename,'rb').read())
self._file.seek(0)
except NameError: # tmpfile not available
import random
self._backupfilename = self._filename
while os.path.exists( self._backupfilename ):
self._backupfilename += (
'%x' % random.randint(0, 16))
# The next few lines may raise IOError
if not self._file: # _file may already be a tmpfile
os.unlink(self._backupfilename)
# self._backupfilename will only exist if the
# user explicitly specified an extension to
# use, i.e., it's their fault if we clobber
# something now
os.rename(self._filename, self._backupfilename)
self._file = open(self._backupfilename, self._readmode)
try:
perm = os.stat(self._backupfilename).st_mode
os.chmod(self._filename,perm)
except (OSError, NameError): 317a333,334

self._output = open(self._filename, self._writemode)
322c339,347
< self._file = open(self._filename, "r")
---
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,209
Messages
2,571,088
Members
47,686
Latest member
scamivo

Latest Threads

Top