Kirk Strauser
This is a long post, but I tried to keep it clean and concise. Please don't
just skip over it because it has a lot of stuff - I really need some help.
I want to get a project off on the right foot but lack the experience to be
sure I'm doing it as efficiently [0] as possible.
I'm creating a set of classes to implement an API [1]. It looks something
like below, with the exception that I'm writing this from home and am not
posting the several thousand lines of code. Suffice it to say that the
program works alright, but I'm looking for a way to organize it for clean
future expansion:
FileRetriever.py:

    class DataSource:
        def __init__(self):
            self.containers = []
            for container in remoteSource():
                self.containers.append(Container(container))

    class Container:
        def __init__(self, container):
            self.files = []
            for subfile in container:
                self.files.append(DataStore(subfile))

    class DataStore:
        def __init__(self, subfile):
            self.param1 = someTransform(subfile)
            self.param2 = someOtherTransform(subfile)
            self.param3 = YetAnotherTransform(subfile)

        def classMethodOne(self):
            pass

        ...

        def classMethodTwenty(self):
            pass
Now, the problem is that I plan to subclass the heck out of each of these
classes, with overloading appropriate to the type of data source being
represented. For example, a DataSource that retrieves images from a POP3
mailbox might be defined like:
POP3Retriever.py:

    import FileRetriever

    class POP3DataSource(FileRetriever.DataSource):
        def __init__(self):
            self.containers = []
            for message in getPop3MessageList():
                self.containers.append(POP3Container(message))

    class POP3Container(FileRetriever.Container):
        def __init__(self, message):
            self.files = []
            for attachment in message:
                self.files.append(FileRetriever.DataStore(attachment))
Such a class will be further subclassed into modules like POP3TiffFile,
POP3ZipArchive, etc., the goal being to keep all functionality as high in
the inheritance hierarchy as possible, so that the "leaf" modules define
nothing more than the bare minimum possible to distinguish each other. I'd
like to carry this to the point of not defining any classes that are the
same between siblings (the DataSource class is identical between all of the
different POP3Retriever subclasses, for example).
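To make that goal concrete, here is a minimal sketch of one way it might be approached: each base class names its collaborator in a class attribute, so a leaf class overrides one attribute instead of re-implementing __init__. The attribute names (container_class, store_class), the fetch() and subfiles() methods, and the Tiff/FakePOP3 classes are all hypothetical and invented for illustration; fetch() returns canned data instead of talking to a real mailbox.

```python
class DataStore:
    def __init__(self, subfile):
        self.subfile = subfile


class Container:
    # Hypothetical hook: the class used to wrap each subfile.
    store_class = DataStore

    def __init__(self, container):
        self.files = [self.store_class(s) for s in self.subfiles(container)]

    def subfiles(self, container):
        # Default assumption: the raw container is already iterable.
        return container


class DataSource:
    # Hypothetical hook: the class used to wrap each raw container.
    container_class = Container

    def __init__(self):
        self.containers = [self.container_class(c) for c in self.fetch()]

    def fetch(self):
        # Subclasses supply the raw containers (messages, archives, ...).
        raise NotImplementedError


# "Leaf" classes: only what differs from the parents is defined.
class TiffStore(DataStore):
    pass


class TiffContainer(Container):
    store_class = TiffStore


class FakePOP3DataSource(DataSource):
    container_class = TiffContainer

    def fetch(self):
        # Stand-in for a real POP3 fetch: two messages with attachments.
        return [["a.tif", "b.tif"], ["c.tif"]]


source = FakePOP3DataSource()
```

With this shape, the loops live only in the base classes, and siblings that share a Container or DataSource simply do not redefine it.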
I've only been heavily using Python for about a year and haven't leaned too
heavily on inheritance yet, so I want to do this the right way.  First, a
question on file layout. I've thought about several ways to classify these
modules:
1) Stick each set of classes in a file in the same directory. That is,
FileRetriever.py, POP3Retriever.py, POP3TiffFile.py, etc. are all in
the same place.
2) Create a tree like:

    + FileRetriever
    +-- __init__.py
    +-- DataSource.py
    +-- Container.py
    +-- DataStore.py
    +-- POP3Retriever
    |   +-- __init__.py
    |   +-- DataSource.py
    |   +-- Container.py
    |   +-- DataStore.py
    |   +-- POP3TiffFile
    |   |   +-- __init__.py
    |   |   +-- DataStore.py
    |   +-- POP3ZipArchive
    |       +-- __init__.py
    |       +-- Container.py
    +-- SFTPRetriever
        +-- __init__.py
        ...
    ...

3) Just kidding. I only have two ideas.
The first layout has the advantage that it's simple and involves a minimum
of files, but it has annoying quirks.  For example, if I define a DataSource
subclass before a Container subclass, then that DataSource will use the
parent's Container class, since the local one hasn't been defined yet when
the local DataSource definition is being read.
The second layout has more files to deal with, but (hopefully?) avoids that
dependency on defining things in a particular order.
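The ordering concern can be sketched in a single file. This is only an illustration under stated assumptions: the first Container stands in for one imported from a parent module, and EarlyBound, LateBound, make() and kind are hypothetical names. As far as I understand Python's name resolution, a name used in a class body is bound when the class statement runs, while a name used inside a method body is looked up when the method is called.

```python
class Container:
    # Stand-in for the parent's Container (e.g. imported from FileRetriever).
    kind = "parent"


class EarlyBound:
    # Evaluated right now, while only the parent Container exists.
    container_class = Container

    def make(self):
        return self.container_class()


class LateBound:
    def make(self):
        # Looked up in the module namespace each time make() is called.
        return Container()


class Container(Container):
    # The local subclass, defined after both classes above; it rebinds
    # the module-level name "Container".
    kind = "local"


early_kind = EarlyBound().make().kind
late_kind = LateBound().make().kind
```

In this sketch early_kind ends up as "parent" and late_kind as "local", so the order of definitions matters only where the class is captured at definition time.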
Second, what's a good way to name each of the classes? Again, I see two
main possibilities:
1) Name all of the DataSource classes "DataSource", and explicitly name
the parent class:

    class DataSource(FileRetriever.DataSource):

2) Name all of the DataSource classes with some variation:

    class POP3ZipArchive(POP3Retriever):

The first seems preferable, in that whenever a client program wants to use
one of the leaf classes, it will always be named DataSource. However, that
seems like a whole lotta namespace confusion that could come back to bite me
if I didn't do it right ("What do you mean I accidentally inherited
CarrierPigeonDataSource and nuked all of the files our customer
uploaded?!?").
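If every class really is named DataSource, one way to guard against inheriting from the wrong parent might be to print the module-qualified ancestry of a class. The sketch below assumes a hypothetical helper named lineage(); the two classes are built with type() only so that a single file can imitate classes living in separate FileRetriever and POP3Retriever modules.

```python
def lineage(cls):
    """Return module-qualified names of cls and its ancestors, nearest first."""
    return [
        f"{c.__module__}.{c.__qualname__}"
        for c in cls.__mro__
        if c is not object
    ]


# Stand-ins for same-named classes that would live in different modules.
BaseDataSource = type("DataSource", (), {"__module__": "FileRetriever"})
Pop3DataSource = type(
    "DataSource", (BaseDataSource,), {"__module__": "POP3Retriever"}
)

names = lineage(Pop3DataSource)
# names == ["POP3Retriever.DataSource", "FileRetriever.DataSource"]
```

A check like this could sit in a unit test for each leaf module, so an unexpected parent shows up as a failing assertion instead of a surprise in production.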
I ask all of this because the project is still relatively young and
malleable, and this is something that will have to be maintained and
expanded for years to come. I want to take the time now to build a solid
foundation, but I don't have enough experience with Python to have a good
grasp on recommended styles.
[0] "Efficient" being hereby defined as "easy for me to understand when I
revisit the code six months from now".
[1] We receive files from our customers via many means - fax, email, ftp,
you name it.  I'm developing delivery-method-agnostic tools to manipulate
those files and flatly refuse to write n tools to handle n methods.