object.enable() anti-pattern

  • Thread starter Steven D'Aprano

Roy Smith

There is no sensible use-case for creating a file OBJECT unless it
initially wraps an open file pointer.

OK, I guess that's a fair statement. But mostly because a python file
object only exposes that subset of operations you can do on file
descriptors which deal with reading and writing the contents of a
file.

It would not be true if python file objects included methods for
querying and manipulating file metadata. It's not hard to imagine a
file class which could be used like:

f = file("/path/to/my/file")
f.delete()

That would be a totally different model from the current python file
object. And then there would be plenty of things you might want to do
to a file other than open it...

file("/path/to/my/directory").chdir()
file("/dev/sdf").mount("/var/lib/whatever")
file("/mnt/swapfile").swapon()

The standard C I/O library doesn't support creating a file
descriptor unless it is a file descriptor to an open file [...]
there is no corresponding function to create a *closed* file
descriptor. (Because such a thing would be pointless.)

What about sockets? From the python standard library:

s = socket.socket()

Now what? You can't do much with your shiny new socket until you call
bind() or connect(), or a few other things. At least not for TCP.
This is essentially the two-phased construction pattern. Of course,
the python socket module just exposes the semantics of the underlying
OS sockets, so there's not a lot of choice there.
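
As a rough illustration of the two styles, the standard library also provides socket.create_connection(), which builds and connects a socket in a single call. The loopback listener below is only scaffolding so the sketch runs without a real network; the address and the backlog value are assumptions made for the example.

```python
import socket

# Scaffolding: a throwaway listener on the loopback interface.
# Port 0 asks the OS to pick any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(5)
address = server.getsockname()

# Two-phase construction: the socket exists but is not connected yet.
two_phase = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
two_phase.connect(address)

# One-step alternative: the returned socket is already connected.
one_step = socket.create_connection(address)

two_phase_peer = two_phase.getpeername()
one_step_peer = one_step.getpeername()

for s in (two_phase, one_step, server):
    s.close()
```

Both sockets end up in the same state; the difference is only whether a caller can ever observe the unconnected one.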
 

Chris Angelico

It's not hard to imagine a
file class which could be used like:

f = file("/path/to/my/file")
f.delete()

That would be a totally different model from the current python file
object. And then there would be plenty of things you might want to do
to a file other than open it...

file("/path/to/my/directory").chdir()
file("/dev/sdf").mount("/var/lib/whatever")
file("/mnt/swapfile").swapon()

Sure, you can imagine it. But what does it do that can't be done with
a moduleful of flat functions accepting strings? This makes sense in
Java, I guess, but why do it in Python?
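
For comparison, here is a small sketch of both styles side by side. The object style uses pathlib, which joined the standard library later (Python 3.4) and names its class Path rather than file. The scratch directory and file name are assumptions made up for the example.

```python
import os
import tempfile
from pathlib import Path

scratch = tempfile.mkdtemp()
name = os.path.join(scratch, "example.txt")   # hypothetical file name

# Flat functions that accept strings.
flat_before = os.path.exists(name)
with open(name, "w") as f:
    f.write("hello")
flat_after = os.path.exists(name)
os.remove(name)

# Object style: a Path is only a name, so building one touches nothing on disk.
p = Path(name)
obj_before = p.exists()
p.write_text("hello")
obj_after = p.exists()
p.unlink()

os.rmdir(scratch)
```

Neither version needs an object that is constructed first and opened later; the Path is fully usable as a name from the moment it exists.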

ChrisA
 

Greg Ewing

Cameron said:
You open a file with "0" modes, so
that it is _immediately_ not writable. Other attempts to make the
lock file thus fail because of the lack of write,

I don't think that's quite right. You open it with
O_CREAT+O_EXCL, which atomically fails if the file
already exists. The read/write modes don't really
come into it, as far as I know.
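
A minimal sketch of that locking idiom from Python, assuming a POSIX-like system. The helper name try_lock and the lock file name are hypothetical; os.open() and the O_CREAT and O_EXCL flags are the real interface.

```python
import os
import tempfile

def try_lock(path):
    """Return True if we created the lock file, False if it already existed."""
    try:
        # O_CREAT | O_EXCL makes creation fail atomically if the path exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True

lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "demo.lock")  # hypothetical lock name

first = try_lock(lock_path)    # creates the file
second = try_lock(lock_path)   # the file exists, so creation fails

os.remove(lock_path)
os.rmdir(lock_dir)
```

The permission bits passed as the third argument play no part in the exclusion; the second attempt fails because of O_EXCL alone.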
 

Cameron Simpson

| On Thu, 09 May 2013 18:23:31 +1000, Cameron Simpson wrote:
|
| > | Steven D'Aprano wrote:
| > | > There is no sensible use-case for creating a file WITHOUT OPENING
| > | > it. What would be the point?
| > |
| > | Early unix systems often used this as a form of locking.
| >
| > Not just early systems: it's a nice lightweight method of making a
| > lockfile even today if you expect to work over NFS, where not that many
| > things are synchronous. You OPEN A FILE with "0" modes
|
| [emphasis added]
| This is all very well and good, but for the life of me, I cannot see how
| opening a file is a good example of not opening a file. Perhaps it is a
| Zen thing, like the sound no spoon makes when you don't tap it against a
| glass that isn't there.

Because a file usually does not exist in isolation (yes sometimes
we want an isolated file). Files usually exist in the filesystem,
which is a namespace. And this is effectively a namespace operation,
not a data storage operation.

Of course, I can take this the other way: just because I opened it
with a 0 mode field doesn't mean _I_, the opener, cannot read/write
it. I've got an open file handle... A race-free way to make a scratch
file in a shared area, for example.

The point is probably that a file isn't merely a feature-free byte
storage container; in the real world they usually come with all
sorts of features like names and permissions. Those features will
always imply creative uses.

Anyway, this has little to do with your antipattern (about which
I'm not totally convinced anyway unless it is a rule of thumb or
code smell). It might apply to a Platonically ideal file, but real
files have more than one use case.

Cheers,
 

Gregory Ewing

Wayne said:
You don't ever want a class that has functions that need to be called in
a certain order to *not* crash.

That seems like an overly broad statement. What
do you think the following should do?

f = open("myfile.dat")
f.close()
data = f.read()
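
For reference, a small sketch of what CPython does with that sequence today: the file object survives the close() and refuses further I/O with a ValueError rather than crashing. The temporary file is scaffolding added so the example is self-contained.

```python
import os
import tempfile

# Scaffolding: create a small file to read.
fd, path = tempfile.mkstemp()
os.write(fd, b"payload")
os.close(fd)

f = open(path)
f.close()
try:
    data = f.read()
    outcome = "read succeeded"
except ValueError:
    # The closed file object rejects the call instead of misbehaving.
    outcome = "ValueError"

os.remove(path)
```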
 

Michael Speer

By his reasoning it simply shouldn't exist. Instead you would access the
information only like this:

with open("myfile.dat") as f:
    data = f.read()

Which is my preferred way to work with resources requiring cleanup in
python anyways, as it ensures I have the least chance of messing things up,
and that all of my resources are disposed of properly during the unwind.
It's a hell of a lot cleaner than remembering to call the appropriate
cleanup functions at every callsite, and the resultant values can never be
in a bad state ( at least due to programmer function-ordering fault,
underlying file i/o errors and things can still cause errors, but not due
to API mis-usage ).

Python 3's multiple-target with statement syntax was backported to 2.7 as
well, flattening the deep nests of with statements otherwise needed to
implement this pattern:
http://docs.python.org/dev/whatsnew/2.7.html#other-language-changes
 

Roy Smith

Michael Speer said:
By his reasoning it simply shouldn't exist. Instead you would access the
information only like this:

with open("myfile.dat") as f:
    data = f.read()

The problem with things like file objects is they model external
real-world entities, which have externally-imposed real-world behaviors.

f.close() can fail, most commonly because some buffered output couldn't
be written when being flushed as part of the close(). Sometimes it's
important to find out about that.
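
A failing close() is hard to provoke on demand, so the sketch below fakes one. FullDisk is a hypothetical raw stream invented for this example; it rejects every write the way a full disk might. Only io.RawIOBase and io.BufferedWriter are real library classes.

```python
import errno
import io

class FullDisk(io.RawIOBase):
    """Hypothetical raw stream that rejects every write, like a full disk."""

    def writable(self):
        return True

    def write(self, data):
        raise OSError(errno.ENOSPC, "No space left on device")

stream = io.BufferedWriter(FullDisk())
stream.write(b"important data")   # only buffered so far, so no error yet

try:
    stream.close()                # the flush happens here, and so does the failure
    close_error = None
except OSError as exc:
    close_error = exc
```

The write() call appears to succeed; the error only surfaces at close(). As far as I know, a with block does not swallow it either, since leaving the block calls close() and the exception propagates from there.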
 

Steven D'Aprano

You might want to do this:

f = File(path)
if f.exists():
    ...

This would be an alternative to:

if os.path.exists(path):
    ...

Sure, but your made-up File object does not represent a file, it
represents a pathname which *may or may not* exist. Pathnames are not
files. Not all files have a pathname that refers to them, and not all
pathnames point to an actual file. Since it has an exists() method that
can return False, there are so-called "File" objects that aren't files
and the name is a misnomer. A much better name for the class would be
"Path", not "File".

I'm not saying that there is never a use-case for some objects to have an
"enable" or "start" or "open" method. That would clearly be a silly thing
to say. In the real world, we design many objects to have a start switch.
It would be pretty horrible if your car was running all the time. But
that's because there are actual use-cases for having cars *not* run, and
"make it stop" is the safe default behaviour.

Your fridge, on the other hand, doesn't have a "make it go" button. So
long as power is applied to it, your fridge automatically runs. Likewise,
your watch is an "always on" device, provided it hasn't wound down or
got a flat battery. Your fire alarm is "always on".

I must admit I am astonished at how controversial the opinion "if your
object is useless until you call 'start', you should automatically call
'start' when the object is created" has turned out to be.
 

Chris Angelico

I must admit I am astonished at how controversial the opinion "if your
object is useless until you call 'start', you should automatically call
'start' when the object is created" has turned out to be.

I share your astonishment. This is a very simple point: If, after
constructing an object, the caller MUST call some method on it prior
to the object being of use, then better design is to embed that call
directly into the constructor. As always, it has its exceptions, but
that doesn't stop it being a useful rule.

The Path() equivalent would be:

p = Path()
p.set_path("/foo/bar")
if p.exists():
    pass

Even if you have a set_path() method, it makes good sense to symlink
it to __init__ to avoid this anti-pattern.
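
A minimal sketch of that suggestion, assuming a toy Path class made up for illustration (it is not pathlib.Path): the constructor simply delegates to set_path(), so no instance ever exists without a path.

```python
import os

class Path:
    """Hypothetical stand-in for the Path class discussed above."""

    def __init__(self, path):
        # The constructor performs the mandatory setup, so a caller
        # never sees a half-initialised instance.
        self.set_path(path)

    def set_path(self, path):
        self._path = os.fspath(path)

    def exists(self):
        return os.path.exists(self._path)

# One step instead of two; the current directory is used so the
# example has something that certainly exists.
p = Path(os.curdir)
found = p.exists()
```

Keeping set_path() public still allows the path to be changed later; what disappears is the window in which exists() could be called on an object with no path at all.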

C level APIs often have these sorts of initialization requirements.

fd_set selectme;
FD_ZERO(&selectme);

This is because declaring a variable in C cannot initialize it.
Anything that *has* constructors should be using them to set objects
up... that's what they're for.

Where's the controversy?

ChrisA
 

Roy Smith

Steven D'Aprano said:
I must admit I am astonished at how controversial the opinion "if your
object is useless until you call 'start', you should automatically call
'start' when the object is created" has turned out to be.

I'm sorry. I thought you were here for an argument.

I think where things went pear shaped is when you made the statement:

That's a pretty absolute point of view. Life is rarely so absolute.
 

Mark Janssen

I think where things went pear shaped is when you made the statement:
That's a pretty absolute point of view. Life is rarely so absolute.

In the old days, it was useful to have fine-grained control over the
file object because you didn't know where it might fail, and the OS
didn't necessarily give you good status codes. So being able to
step through the entire process was the job of the programmers.

Now, with languages so high like python and hardware so common, it
almost is never necessary, so he has some point. A closed file
pointer is useful from an OS-programming point-of-view though because
you generally never want to leave files open where they'd block other
I/O.
 

Chris Angelico

In the old days, it was useful to have fine-grained control over the
file object because you didn't know where it might fail, and the OS
didn't necessarily give you good status codes. So being able to
step through the entire process was the job of the programmers.

I don't know what you mean by the "old days", but a couple of decades
ago, there were no such things as "file objects". You call a function
to open a file, you get back a number. You explicitly close that by
calling another function and passing it that number. In fact, there is
no way to have a "file object" that doesn't have an open file
associated with it, because it's simply... a number.

Now, with languages so high like python and hardware so common, it
almost is never necessary, so he has some point. A closed file
pointer is useful from an OS-programming point-of-view though because
you generally never want to leave files open where they'd block other
I/O.

I'm beginning to wonder if you and Dihedral are swapping notes.
Dihedral's been sounding fairly coherent lately.

ChrisA
 

Mark Janssen

In the old days, it was useful to have fine-grained control over the
I don't know what you mean by the "old days", but a couple of decades
ago, there were no such things as "file objects".

My apologies. I used the word "object" when I shouldn't have.

I'm beginning to wonder if you and Dihedral are swapping notes.
Dihedral's been sounding fairly coherent lately.

Dihedral... That's my dream-self. Where did you encounter him? heh
 

Steven D'Aprano

On 09May2013 11:30, Steven D'Aprano
| On Thu, 09 May 2013 18:23:31 +1000, Cameron Simpson wrote:
|
| > | Steven D'Aprano wrote:
| > | > There is no sensible use-case for creating a file WITHOUT
| > | > OPENING it. What would be the point?
| > |
| > | Early unix systems often used this as a form of locking.
| >
| > Not just early systems: it's a nice lightweight method of making a
| > lockfile even today if you expect to work over NFS, where not that
| > many things are synchronous. You OPEN A FILE with "0" modes
| [emphasis added]

| This is all very well and good, but for the life of me, I cannot see
| how opening a file is a good example of not opening a file. Perhaps it
| is a Zen thing, like the sound no spoon makes when you don't tap it
| against a glass that isn't there.

Because a file usually does not exist in isolation (yes sometimes we
want an isolated file). Files usually exist in the filesystem, which is
a namespace. And this is effectively a namespace operation, not a data
storage operation.

Of course, I can take this the other way: just because I opened it with
a 0 mode field doesn't mean _I_, the opener, cannot read/write it. I've
got an open file handle... A race-free way to make a scratch file in a
shared area, for example.


But you are opening the file. Therefore, it cannot possibly be an example
of not opening the file.

Unlike Pascal, there is no way to create a C file descriptor in a closed
state. Such a thing does not exist. If you have a file descriptor, the
file is open. Once you close it, the file descriptor is no longer valid.
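
The same lifecycle can be sketched from Python through the os module, which exposes raw descriptors. This assumes a single-threaded program on a POSIX-like system, so that nothing else reuses the descriptor number between the close and the failed write.

```python
import errno
import os
import tempfile

fd, path = tempfile.mkstemp()   # fd refers to an open file from the start
os.write(fd, b"abc")
os.close(fd)                    # after this, fd no longer names an open file

try:
    os.write(fd, b"more")
    failure = None
except OSError as exc:
    failure = exc.errno         # the kernel rejects the stale descriptor

os.remove(path)
```

The failed call reports EBADF ("bad file descriptor"): there is no closed-but-still-valid state to be in.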

But even if C allowed you to do so, doesn't mean that it is a good idea.
At least some variants of Pascal force you to do the following:

# Pseudo-code.
f = open(filename)
really_open(f)
data = read(f) # or write, or *any other operation at all*

Surely we can agree that having to call both open() and really_open()
before you get an actually opened file that you can use is one call too
many? There is *nothing* you can do with f before calling really_open().
So why require it?

(For the record, "really_open" is spelled "reset" or "rewrite" depending
on whether you want to read or write to the file.)

The point is probably that a file isn't merely a feature-free byte
storage container; in the real world they usually come with all sorts of
features like names and permissions. Those features will always imply
creative uses.

There are at least three related but separate things here.

* file objects, or their low-level equivalent, file descriptors;

* pathnames;

* files on disk, which are an abstraction for data in a particular kind
of data structure.

They are obviously related, but they are also obviously independent. You
can have a pathname that doesn't refer to any file on disk; you can have
a file on disk that has been lost, and therefore is no longer accessible
via a file name. (If you run fsck or equivalent, the file may be
recoverable, but the name will be lost.) You can open a file object, then
unlink it so it no longer points to a file on disk, but does still accept
read or write calls.
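
The last case, an open file object whose name has been removed, can be sketched as follows. This assumes POSIX semantics; on Windows, unlinking a file that is still open normally fails.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
f = os.fdopen(fd, "w+")

os.unlink(path)                      # the pathname is gone...
name_exists = os.path.exists(path)

f.write("still here")                # ...but the open file object still works
f.seek(0)
contents = f.read()
f.close()
```

Here the file object, the pathname and the data on disk visibly come apart: the name disappears first, while the object and its data persist until close().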

The counter-examples given so far apply to pathnames or files on disk.
They don't apply to file objects and file descriptors. I have tried
really hard to be clear that I am talking about file objects. To the
extent that I have failed, I am sorry.
 

Steven D'Aprano

I'm sorry. I thought you were here for an argument.

No, I'm here for the abuse.

I think where things went pear shaped is when you made the statement:


That's a pretty absolute point of view. Life is rarely so absolute.

So far the only counter-examples given aren't counter-examples. One
involves opening the file. The other involves something which isn't a
file, but a string instead. If there are any counter-examples, they are
impossible in Python and C: you cannot create a file object in Python
without opening it, and you cannot create a file descriptor in C without
opening it. But not in Pascal, which actually supports my claim that this
is an anti-pattern: while some Pascal implementations do allow you to
create a non-open file, you cannot do *anything* with it until you open
it, except generate bugs.
 

Roy Smith

Steven D'Aprano said:
there is no way to create a C file descriptor in a closed state. Such
a thing does not exist. If you have a file descriptor, the file is
open. Once you close it, the file descriptor is no longer valid.

Of course there is.

int fd = 37;

I've just created a file descriptor. There is not enough information
given to know if it corresponds to an open file or not.

Before you protest that "it's just an int, not a file descriptor", I
should point out that they're the same thing. It's pretty common to do
something like:

for (int fd = 0; fd <= MAX_FD; fd++) {
    close(fd);
}

before forking, to make sure all file descriptors are closed.
 

Steven D'Aprano

Of course there is.

int fd = 37;

I've just created a file descriptor. There is not enough information
given to know if it corresponds to an open file or not.

No, you haven't created a file descriptor. You've made up a number which
C will allow you to use as an index into the file descriptor table,
because C is a high-level assembler with very little in the way of type
safety, and what little there is you can normally bypass. What you
haven't done is create the record in the file descriptor table. You can't
expect that read(fd) or write(fd) will work, although both should fail
safe rather than segfault if 37 happens to not be an actual file
descriptor.

What you've done is the moral equivalent of choosing an integer at
random, coercing it to a pointer, then dereferencing it to peek or poke
at some memory address. (Although fortunately much safer.)

It's a nice hack, but not one that takes away from what I'm saying.
 

Steven D'Aprano

So far the only counter-examples given aren't counter-examples ...

Well, sure, if you discount operations like "create this file" and
queries like "could I delete this file if I wanted to?" [0] as methods
of the file system rather than of a hypothetical file object.

What about a distributed system? Suppose I want to create a file object
in one place, and send that object to another place for the file to
be read from or written to [1]? Suppose that for security reasons, I
have to do it that way, because the first place can only create the
objects, and the second place can only access the underlying file contents
through an existing object?

Unless you have re-implemented the file I/O system, it doesn't matter. If
your file objects are based on C I/O, then even if the first server
cannot read or write to the files it still creates file objects in an
open state, because that is how C works.

Or maybe the first server only creates some sort of proxy to the real
underlying file object. Or maybe you've re-implemented the I/O system,
and aren't following C's design. Since you're just making this up as a
thought experiment, anything is possible.

But either way, that's fine. You've found an object where it does make
sense to have an explicit "make it go" method: first one entity has
permission to construct the object, but not to open the underlying file.
Another entity has permission to open the underlying file, but not to
create the object. I have no idea whether this is a reasonable security
design or not, it actually sounds a bit rubbish to me but what do I know?
So let's treat it as a reasonable design.

As I've said, repeatedly, that's not what I'm talking about.

When you DON'T have useful things that can be done with the object before
calling "enable", then it is an anti-pattern to require a separate call
to an "enable" method, and the enable functionality should be moved into the
object constructor. If you DO have useful things that can be done, like
pass the object to another entity, for security, then that's a whole
'nuther story.

Really, what I'm describing is *normal* behaviour for most objects. We
don't usually design APIs like this:

n = int("42")
n.enable()
m = n + 1
m.enable()
x = m*2 + n*3
print(x - 1)  # oops, raises because I forgot to call x.enable()

That's a rubbish API, and for simple data-like objects, we all agree it
is a rubbish API. So why accept the same rubbish API just because the
object is more complicated? If you don't have a good, reasonable, non-
contrived use-case for a separate "make it go" method, don't use one.


For my next controversial opinion, I'm going to argue that we should do
arithmetic using numbers rather than by inserting lists inside other
lists:

# Do this:

count = 0
count += 1

# Not this:

count = []
count.insert(0, [])


*wink*
 

Robert Kern

But either way, that's fine. You've found an object where it does make
sense to have an explicit "make it go" method: first one entity has
permission to construct the object, but not to open the underlying file.
Another entity has permission to open the underlying file, but not to
create the object. I have no idea whether this is a reasonable security
design or not, it actually sounds a bit rubbish to me but what do I know?
So let's treat it as a reasonable design.

As I've said, repeatedly, that's not what I'm talking about.

When you DON'T have useful things that can be done with the object before
calling "enable", then it is an anti-pattern to require a separate call
to an "enable" method, and the enable functionality should be moved into the
object constructor. If you DO have useful things that can be done, like
pass the object to another entity, for security, then that's a whole
'nuther story.

I'd be curious to see in-the-wild instances of the anti-pattern that you are
talking about, then. I think everyone agrees that entirely unmotivated "enable"
methods should be avoided, but I have my doubts that they come up very often. Do
programmers have a natural tendency to make an extra, completely unnecessary
method? I would think that they have a natural tendency to the opposite.

In my experience, everyone has a reason in mind when they follow a
pattern/anti-pattern. It is pretty rare that someone just does some specific,
nameable thing for no reason at all. There is no need to call out an
anti-pattern for which no one has a reason to do it. But there is a continuum of
reasons. Some reasons are better than others. Some reasons only apply in a small
set of circumstances but seem like they would apply more generally, at least to
novice programmers. Programmers can be wrong about what they think the
(anti-)pattern actually achieves. The whole point of naming an anti-pattern is
to discuss those reasons, show where they are misapplied, where YAGNI, why
novices overuse it, other patterns that should be used instead, and also the
circumstances where it is actually a good pattern instead.

To artificially limit the discussion of the anti-pattern to the trivial,
entirely unmotivated case forbids most of the interesting and instructive parts
of the conversation.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Roy Smith

int fd = 37;

I've just created a file descriptor. There is not enough information
given to know if it corresponds to an open file or not.

No, you haven't created a file descriptor. You've made up a number which
C will allow you to use as an index into the file descriptor table,
because C is a high-level assembler with very little in the way of type
safety, and what little there is you can normally bypass.

No, I've created a file descriptor, which is, by definition, an integer.
It has nothing to do with C. This is all defined by the POSIX
interface. For example, the getdtablesize(2) man page says:

"The entries in the descriptor table are numbered with small integers
starting at 0. The call getdtablesize() returns the size of this table."

So, I am now guaranteed that fds will be ints. I also know the
guaranteed minimum and maximum values.

The system even makes certain guarantees which let me predict what file
descriptor I'll get next in certain situations. For example, from the
dup(2) page on my OSX box:

"The new descriptor returned by the call is the lowest numbered
descriptor currently not in use by the process."

What you haven't done is create the record in the file descriptor table.

That's correct. But, as described above, the system makes certain
guarantees which allow me to reason about the existence or non-existence
of such entries.

You can't expect that read(fd) or write(fd) will work

I can expect that they will work if I have reasoned correctly about the
POSIX-guaranteed semantics. For example, POSIX says(*) that this C
program is guaranteed to print, "hello, fd world" (assuming the
assertion passes):

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

int main(int argc, char** argv) {
    int max_files = getdtablesize();
    assert(max_files >= 4);

    for (int i = 3; i < max_files; ++i) {
        close(i);
    }

    dup(2);
    char* message = "hello, fd world\n";
    write(3, message, strlen(message));
}

What you've done is the moral equivalent of choosing an integer at
random, coercing it to a pointer, then dereferencing it to peek or poke
at some memory address. (Although fortunately much safer.)

No, what I've done is taken advantage of behaviors which are guaranteed
by POSIX.
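
The lowest-numbered-descriptor rule can also be observed from Python, sketched here under the assumption of a single-threaded process on a POSIX-like system (another thread opening a file in between would spoil the prediction). os.devnull is used only as a convenient thing to open.

```python
import os

src = os.open(os.devnull, os.O_RDONLY)   # a descriptor to duplicate later
fd = os.open(os.devnull, os.O_RDONLY)    # takes the lowest free slot right now
os.close(fd)                             # free that slot again

# With no other descriptor activity in between, dup() has to hand the
# same number back, because it is once more the lowest one not in use.
new_fd = os.dup(src)
reused = (new_fd == fd)

os.close(new_fd)
os.close(src)
```

Nothing in the program stored the "next" descriptor; it was predicted purely from the allocation rule quoted from the dup(2) page.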

But, we're going off into the weeds here. Where this started was you
said:
There is no sensible use-case for creating a file WITHOUT OPENING
it. What would be the point?

I agree with you, in general, that it is usually poor design to have
classes which require instances to be initialized after they are created.

The problem is, you chose as your example a particular domain where the
underlying objects being modeled have unusual semantics imposed by an
interface that's 40 years old. And then you made absolute statements
about there not possibly ever being certain use cases, when clearly
there are (for that domain).
 
