python 3, subclassing TextIOWrapper.

lambertdw

'''
A python 3 question.
Presume this code is in file p.py.
The program fails.

$ python3 p.py
...
ValueError: I/O operation on closed file.

Removing the comment character to increase the stream
reference count fixes the program, at the expense of
an extra TextIOWrapper object.

Please, what is a better way to write the class with
regard to this issue?
'''

import re
import io

class file(io.TextIOWrapper):

    '''
    Enhance TextIO. Streams have many sources,
    a file name is insufficient.
    '''

    def __init__(self,stream):
        #self.stream = stream
        super().__init__(stream.buffer)

    def seek_pattern(self,pattern):
        '''
        A motivational method, otherwise inconsequential to the
        problem.
        '''
        search = re.compile(pattern).search
        while True:
            line = next(self)
            if (not line) or search(line):
                return line


print(file(open('p.py')).read())
 
Gabriel Genellina

You're taking a shortcut (the open() builtin) that isn't valid here.

open() creates a "raw" FileIO object, then a BufferedReader, and finally
returns a TextIOWrapper. Each of those has a reference to the previous
object, and delegates many calls to it. In particular, close() propagates
down to FileIO to close the OS file descriptor.

In your example, you call open() to create a TextIOWrapper object that is
discarded as soon as the open() call finishes - because you only hold a
reference to the intermediate buffer. The destructor calls close(), and
the underlying OS file descriptor is closed.
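
The layering and the close() propagation described above can be checked directly. This is a minimal sketch; layer_names and close_propagates are hypothetical helper names chosen for illustration, not part of the io module.

```python
def layer_names(path):
    """Return the class names of the three objects open() stacks in text mode."""
    with open(path) as text:
        return (type(text).__name__,
                type(text.buffer).__name__,
                type(text.buffer.raw).__name__)


def close_propagates(path):
    """Close only the outer wrapper and report whether the inner layers closed."""
    text = open(path)
    buffered = text.buffer   # keep references to the inner layers
    raw = buffered.raw
    text.close()             # the destructor of a discarded wrapper does the same
    return buffered.closed, raw.closed
```

Holding a reference to text.buffer does not keep the file open: once the outer wrapper is closed, explicitly or by its destructor, the buffer and the raw file are closed as well.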

So, if you're not interested in the TextIOWrapper object, don't create it
in the first place. That means, don't use the open() shortcut and build
the required pieces yourself.
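
One way to build the pieces by hand is sketched below. PatternFile and open_pattern_file are hypothetical names introduced for illustration, and the utf-8 default is an assumption; the io classes themselves are the ones named above.

```python
import io


class PatternFile(io.TextIOWrapper):
    """Hypothetical stand-in for the 'file' subclass in the question."""


def open_pattern_file(path, encoding="utf-8"):
    """Stack the three layers explicitly instead of calling open()."""
    raw = io.FileIO(path, "r")           # owns the OS file descriptor
    buffered = io.BufferedReader(raw)    # buffering layer over the raw file
    # Only one text layer is ever created, so nothing is discarded early.
    return PatternFile(buffered, encoding=encoding)
```

Closing the returned object still closes the buffer and the raw file, so it can be used in a with statement like any other file.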

---

There is another alternative that relies on undocumented behaviour: use
open to create a *binary* file and wrap the resulting BufferedReader
object in your own TextIOWrapper.

import io

class file(io.TextIOWrapper):
    def __init__(self, buffer):
        super().__init__(buffer)

print(file(open('p.py','rb')).read())
 
R. David Murray

I'm wondering if what we really need here is either some way to tell open
to use a specified subclass(s) instead of the default ones, or perhaps
an 'open factory' function that would yield such an open function that
otherwise is identical to the default open.

What's the standard python idiom for when consumer code should be
able to specialize the classes used to create objects returned from
a called package? (I'm tempted to say monkey patching the module,
but that can't be optimal :)
 
Gabriel Genellina

On Sun, 22 Mar 2009 15:11:37 -0300, R. David Murray wrote:
What's the standard python idiom for when consumer code should be
able to specialize the classes used to create objects returned from
a called package? (I'm tempted to say monkey patching the module,
but that can't be optimal :)

I've seen:
- passing the desired subclass as an argument to the class constructor /
  factory function.
- setting the desired subclass as an instance attribute of the factory object.
- replacing the f_globals attribute of the factory function (I wouldn't
  recommend this, but sometimes it is the only way).
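
The first two idioms might look like the sketch below. All names here (open_with, Opener, UpperText) are hypothetical and exist only for illustration; none of this is an existing hook in the builtin open().

```python
import io


def open_with(path, text_class=io.TextIOWrapper, encoding="utf-8"):
    """Idiom 1: the caller passes the class to instantiate."""
    return text_class(open(path, "rb"), encoding=encoding)


class Opener:
    """Idiom 2: the class to instantiate is an attribute of the factory object."""
    text_class = io.TextIOWrapper

    def __call__(self, path, encoding="utf-8"):
        return self.text_class(open(path, "rb"), encoding=encoding)


class UpperText(io.TextIOWrapper):
    """Hypothetical subclass, used only to show that the override takes effect."""

    def read(self, size=-1):
        return super().read(size).upper()
```

A caller would then write open_with(path, UpperText), or set opener.text_class = UpperText on an Opener instance before calling it.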

In the case of builtin open(), I'm not convinced it would be a good idea
to allow subclassing. But I have no rational arguments - just don't like
the idea :(
 
Benjamin Peterson

Gabriel Genellina said:
There is another alternative that relies on undocumented behaviour: use
open to create a *binary* file and wrap the resulting BufferedReader
object in your own TextIOWrapper.

How is that undocumented behavior? TextIOWrapper can wrap any buffer which
follows the io.BufferedIOBase ABC. BufferedReader is a subclass of
io.BufferedIOBase.
 
lambertdw

For D. Murray's suggestion---I think that we programmers have to learn
the idiom. We don't always control open, such as subprocess.Popen().

Thank you. I hope these thoughts help with issue 5513 and the related
questions to follow about complete removal of file in python3.
Opening the file in binary mode for text behavior was not obvious to
me, but makes good sense now that you've explained the further
nesting.
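
For streams whose creation the program does not control, a possible approach is to take the buffer away from the existing text wrapper with detach(). Note the assumption: TextIOWrapper.detach() was added in Python 3.1, so it is not available in 3.0. PatternFile is a hypothetical name for the class from the question.

```python
import io


class PatternFile(io.TextIOWrapper):
    """Hypothetical variant of 'file' that takes over an existing text stream."""

    def __init__(self, stream):
        encoding = stream.encoding   # read this before detaching
        # detach() separates the buffer from the old wrapper, so the old
        # wrapper's destructor no longer closes the underlying file.
        super().__init__(stream.detach(), encoding=encoding)
```

After detach() the original text stream is unusable, so this only suits cases where the new object fully replaces the old one.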
 
R. David Murray

Gabriel Genellina said (replying to R. David Murray's message of Sun, 22 Mar 2009 15:11:37 -0300; the original post is dated Sat, 21 Mar 2009 23:58:07 -0300):
In the case of builtin open(), I'm not convinced it would be a good idea
to allow subclassing. But I have no rational arguments - just don't like
the idea :(

When 'file' was just a wrapper around C I/O, that probably made as much
sense as anything. But now that IO is more Pythonic, it would be nice
to have Pythonic methods for using a subclass of the default classes
instead of the default classes. Why should a user have to reimplement
'open' just in order to use their own TextIOWrapper subclass?

I should shift this thread to Python-ideas, except I'm not sure I'm
ready to take ownership of it (yet?). :)
 
Gabriel Genellina

On Sun, 22 Mar 2009 16:37:31 -0300, Benjamin Peterson wrote:
How is that undocumented behavior? TextIOWrapper can wrap any buffer which
follows the io.BufferedIOBase ABC. BufferedReader is a subclass of
io.BufferedIOBase.

The undocumented behavior is relying on the open() builtin to return a
BufferedReader for a binary file.
 
Benjamin Peterson

Gabriel Genellina said:
The undocumented behavior is relying on the open() builtin to return a
BufferedReader for a binary file.

I don't see the problem. open() will return some BufferedIOBase implementor,
and that's all that TextIOWrapper needs.
 
Gabriel Genellina

On Sun, 22 Mar 2009 21:03:38 -0300, Scott David Daniels wrote:
Gabriel said:
On Sun, 22 Mar 2009 19:12:13 -0300, Benjamin Peterson wrote:

How do you know? AFAIK, the return value of open() is completely
undocumented:
http://docs.python.org/3.0/library/functions.html#open
And if you open the file in text mode, the return value isn't a
BufferedIOBase.

OK, it is documented, but not so clearly. I went first to the io
module, rather than the open function documentation, and looked at
what io.TextIOWrapper should get as its first arg:
[...]
The type of file object returned by the open() function depends on
the mode. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to
open a file in a binary mode, the returned class varies: in read
binary mode, it returns a BufferedReader; in write binary and append
binary modes, it returns a BufferedWriter, and in read/write mode,
it returns a BufferedRandom.

Aha! it is documented. If you have some good ideas on how to make
this more obvious, I'm sure we'd be happy to "fix" the documentation.
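
The documented mapping from mode to returned class can be confirmed with a short check. This is a sketch only; opened_types is a hypothetical helper, and it assumes the path names an existing scratch file, because the 'wb' mode truncates it.

```python
def opened_types(path):
    """Map each open() mode to the class name of the object it returns."""
    names = {}
    # 'wb' comes last because it truncates the file.
    for mode in ("r", "rb", "r+b", "ab", "wb"):
        with open(path, mode) as f:
            names[mode] = type(f).__name__
    return names
```

On CPython this gives TextIOWrapper for 'r', BufferedReader for 'rb', BufferedRandom for 'r+b', and BufferedWriter for both 'ab' and 'wb', matching the quoted paragraph.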

Ah, yes. Hmm, so the same description appears in three places: the open()
docstring, the docs for the builtin functions, and the docs for the io
module. And all three are different :(
Perhaps open.__doc__ == documentation for io.open, and documentation for
builtin.open should just tell the basic things and refer to io.open for
details...
 
