A design problem I met again and again.

一首诗 · Apr 1, 2009

Hi all,

I am a programmer who works with several programming languages, such
as Python, C++ (in COM), ActionScript, C#, etc.

Today I realized that, whatever language I use, I always meet the same
problem, and I think I have never solved it very well.

The problem is: how to break my app into functional pieces?

I know it's important to break an application into lots of pieces to
make it flexible. But it's easier said than done. I can split an
application into 4 or 5 pieces based on "programming functions", for
example, logging, socket, string, math, ...

When it comes to the business logic, I find I always provide a big
class with many methods, and it grows bigger when new functions are
added.

Recently I used Twisted to write a server. It has several protocol
classes, which decode and encode different kinds of network protocols,
and a protocol-independent service class, which handles requests from
clients according to the business logic.

Protocol classes receive a message from the client, decode it, call a
method of the service, encode the result and send it back to the client.
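
A minimal sketch of that layering, written in plain Python rather than Twisted so that it runs on its own. The names `Service`, `LineProtocol` and `handle`, the `echo`/`add` methods and the line-based wire format are all hypothetical and not taken from the actual server. The point is only that the protocol class knows about encoding while the service knows about business logic.

```python
class Service:
    """Protocol-independent business logic (hypothetical methods)."""

    def echo(self, text):
        return text

    def add(self, a, b):
        return str(int(a) + int(b))


class LineProtocol:
    """Decodes 'method arg1 arg2' lines, calls the service, encodes the reply."""

    def __init__(self, service):
        self.service = service

    def handle(self, raw):
        # decode: bytes -> method name and string arguments
        name, *args = raw.decode("ascii").split()
        # call the matching service method; a real server would check
        # the name against a list of allowed methods first
        result = getattr(self.service, name)(*args)
        # encode: result -> bytes terminated by a newline
        return (result + "\n").encode("ascii")


protocol = LineProtocol(Service())
reply = protocol.handle(b"add 2 3")
```

Every new business function in this layout becomes one more method on `Service`, which is how the class ends up growing.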

There are also some utility packages, such as logging, as I mentioned
before.

So far so good; everything is clear.

Until one day I found that the service had nearly 100 methods and 6000
lines of code. I don't need to read any programming book to know that
it's too big.

But I cannot find an easy way to split it. Here are some
solutions I found:

1. Add several business classes, and move code from the service into
them. But this means that although the service will contain much less
code, it still has to keep lots of methods, and the only function of
these methods is to call the corresponding methods in the business
classes. The number of methods in the service will keep growing forever.

2. Move the code in the service completely into business classes
containing only classmethods. The protocol classes call these
classmethods directly instead of calling the service. But this pattern
doesn't look very OO.

3. Move the code in the service completely into business classes.
Instantiate these classes and pass the instances to the protocol
classes. The protocol classes call these instances of the business
classes instead of calling the service. This means that whenever I add
a new business class, I have to add a parameter to the __init__ method
of every protocol class. Not very clear either.
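
To make option 3 concrete, here is a minimal sketch. Everything in it (`AccountLogic`, `ChatLogic`, `Protocol`, the `handlers` mapping) is hypothetical. It also shows one possible way around the growing `__init__` signature, which is an assumption of this sketch rather than part of the original design: the business objects are passed in as a single mapping, so adding a business class means adding an entry, not a parameter.

```python
class AccountLogic:
    """Hypothetical business class for account operations."""

    def login(self, user):
        return f"{user} logged in"


class ChatLogic:
    """Hypothetical business class for chat operations."""

    def say(self, user, text):
        return f"{user}: {text}"


class Protocol:
    def __init__(self, handlers):
        # handlers maps a business area name to a business object
        self.handlers = handlers

    def dispatch(self, area, method, *args):
        # look up the business object, then call the requested method
        return getattr(self.handlers[area], method)(*args)


handlers = {"account": AccountLogic(), "chat": ChatLogic()}
protocol = Protocol(handlers)
```

With this shape, `protocol.dispatch("chat", "say", "bob", "hi")` reaches `ChatLogic.say` without the protocol class naming `ChatLogic` anywhere.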

==========================================

I get the same problem when writing C#/C++, when I have to provide a
lot of methods to my code's users. So I create a big class as the entry
point of my code. Although these big classes don't contain much
logic, they do grow bigger and bigger.

Lawrence D'Oliveiro · Apr 1, 2009

In message <48506803-a6b9-432b-acef-

Until one day I found that the service had nearly 100 methods and 6000
lines of code. I don't need to read any programming book to know that
it's too big.

The question is not how many lines or how many methods, but whether it makes
sense to remain as one piece or not. In one previous project, I had one
source file with nearly 15,000 lines in it. Did it make sense to split that
up? Not really.

一首诗 · Apr 1, 2009

I also think that's my best choice. Before I wrote my mail, I
already knew that this is not a good question. It lacks details, and
it is too big.

But I think the first step to resolve a problem is to describe it. In
that way, I might find the answer myself.

一首诗 · Apr 1, 2009

In message <48506803-a6b9-432b-acef-

The question is not how many lines or how many methods, but whether it makes
sense to remain as one piece or not. In one previous project, I had one
source file with nearly 15,000 lines in it. Did it make sense to split that
up? Not really.

What is the average size of the source files in your project? If it's
far lower than 15,000 lines, don't you feel it's a little unbalanced?

Carl Banks · Apr 1, 2009

I get the same problem when writing C#/C++, when I have to provide a
lot of methods to my code's users. So I create a big class as the entry
point of my code. Although these big classes don't contain much
logic, they do grow bigger and bigger.

This seems to be a classic result of "code-based organization", that
is, you are organizing your code according to how your functions are
used. That's appropriate sometimes. Procedural libraries are often
organized by grouping functions according to use. The os module is a
good example.

However, it's usually much better to organize code according to what
data it acts upon: "data-based organization". In other words, go
through your big class and figure out what data belongs together
conceptually, make a class for each conceptual set of data, then
assign methods to classes based on what data the methods act upon.

Consider the os module again. It's a big collection of functions, but
there is a group of functions in os that all act on a particular
piece of data, namely a file descriptor. This suggests that all the
functions that act upon file descriptors (os.open, os.close, os.lseek,
etc.) could instead be methods of a single class, with the file
descriptor as a class member.

(Note: the os library doesn't do that because functions like os.open
are supposed to represent low-level operations corresponding to the
underlying system calls, but never mind that. Ordinarily a bunch of
functions operating on common data should be organized as a class.)
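
A minimal sketch of what such a class could look like. `FileDescriptor` and `demo` are hypothetical names, not part of the standard library; the `os` calls underneath are the real ones (note that the seek call in the `os` module is `os.lseek`).

```python
import os
import tempfile


class FileDescriptor:
    """Hypothetical wrapper: groups the os functions that act on one fd."""

    def __init__(self, path, flags, mode=0o600):
        # the shared piece of data that every method operates on
        self.fd = os.open(path, flags, mode)

    def write(self, data):
        return os.write(self.fd, data)

    def seek(self, offset, whence=os.SEEK_SET):
        return os.lseek(self.fd, offset, whence)

    def read(self, n):
        return os.read(self.fd, n)

    def close(self):
        os.close(self.fd)


def demo():
    # write a few bytes to a scratch file, rewind, and read them back
    path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    f = FileDescriptor(path, os.O_RDWR | os.O_CREAT)
    try:
        f.write(b"hello")
        f.seek(0)
        return f.read(5)
    finally:
        f.close()
```

Each method takes no descriptor argument, because the descriptor lives on the instance. That is the practical difference between the two ways of organizing the same functions.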

Carl Banks

Lawrence D'Oliveiro · Apr 2, 2009

In message <158986a9-b2d2-413e-9ca0-

What is the average size of the source files in your project? If it's
far lower than 15,000 lines, don't you feel it's a little unbalanced?

Why?

Steven D'Aprano · Apr 2, 2009

Why?

If you have too much code in one file, it will upset the balance of the
spinning hard drive platter, and it will start to wobble and maybe even
cause a head-crash.

Tim Rowe · Apr 2, 2009

2009/4/1 一首诗 said:
Hi all,

I am a programmer who works with several programming languages, such
as Python, C++ (in COM), ActionScript, C#, etc.

Today I realized that, whatever language I use, I always meet the same
problem, and I think I have never solved it very well.

The problem is: how to break my app into functional pieces?

One approach is to go through the specification of the program,
underline all of the significant nouns and try to implement each of
the nouns as a class. That won't take you all the way to a good design
-- some of the resulting classes will be too trivial, and it won't
give you the derived classes you need, but it's a good first step to
breaking a problem down, and might help break your one big class
habit.
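
As an illustration of the noun exercise, take a made-up one-line specification: "A user joins a room and posts messages to it." The nouns are user, room and message, so a first cut might look like the sketch below. The specification and all class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str


@dataclass
class Message:
    author: User
    text: str


@dataclass
class Room:
    title: str
    members: list = field(default_factory=list)
    messages: list = field(default_factory=list)

    def join(self, user):
        self.members.append(user)

    def post(self, user, text):
        # only members of the room may post to it
        if user not in self.members:
            raise ValueError("user must join the room before posting")
        self.messages.append(Message(user, text))


room = Room("lobby")
alice = User("alice")
room.join(alice)
room.post(alice, "hello")
```

`User` comes out nearly empty, which is the "too trivial" outcome mentioned above, and the verbs (join, post) end up as methods on the noun whose data they change.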

Steven D'Aprano · Apr 2, 2009

one reason is that it becomes inefficient to find code. if you
structure code as a set of nested packages, then a module, and finally
classes and methods, then you have a tree structure. and if you divide
the structure along semantic lines then you can efficiently descend the
tree to find what you want. if you choose the division carefully you
can get a balanced tree, giving O(log(n)) access time. in contrast a
single file means a linear scan, O(n).

What's n supposed to be? The number of lines in a file? No, I don't think
so -- you said it yourself: "if you divide the structure along semantic
lines then you can efficiently descend the tree to find what you want".
Not "arbitrarily divide the files after n lines". If one semantic
division requires 15,000 lines, and another semantic division requires 15
lines, then the most efficient way to divide the code base is 15,000
lines in one module and 15 lines in another.

Admittedly, I'd expect that any python module with 15,000 lines
(approximately 900KB in size) could do with some serious refactoring into
modules and packages, but hypothetically it could genuinely make up a
single logical, semantic whole. That's "only" four and a half times
larger than decimal.py.

I can't imagine what sort of code would need to be that large without
being divided into modules, but it could be possible.

一首诗 · Apr 2, 2009

You get it. Sometimes I feel that my head is trained to work in a
procedural way. I use a big class just as a container of functions.

About the "data-based" approach, what if these functions all share a
little data, e.g. a socket, but nothing else?

Jorgen Grahn · Apr 2, 2009

[top-posting fixed]

....

You get it. Sometimes I feel that my head is trained to work in a
procedural way. I use a big class just as a container of functions.

If that's true, then your problems are not surprising.
A real class normally doesn't get that big.

About the "data-based" approach, what if these functions all share a
little data, e.g. a socket, but nothing else?

If that is true, then those functions *are* the Python socket class
and everything has already been done for you.

Turn your question around and it makes more sense (to me, at least).
You don't primarily work with functions: you work with data, a.k.a.
state, a.k.a. objects. The functions follow from the data.

To me, if I can find something with a certain lifetime, a certain set
of invariants, and a suitable name and catchphrase describing it, then
that's probably a class. Then I keep my fingers crossed and hope it
works out reasonably well. If it doesn't, I try another approach.
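
A small sketch of what that test can look like in code. `Session` is a hypothetical example, not something from the server discussed above: its lifetime runs from login to logout, its invariant is that a closed session never handles another request, and its catchphrase is "one logged-in client".

```python
class Session:
    """One logged-in client, from login until logout."""

    def __init__(self, user):
        self.user = user
        self.open = True
        self.requests_handled = 0

    def handle(self, request):
        # invariant: a closed session never handles a request
        if not self.open:
            raise RuntimeError("session is closed")
        self.requests_handled += 1
        return f"{self.user} -> {request}"

    def close(self):
        # end of the lifetime; handle() refuses to work from here on
        self.open = False


session = Session("alice")
first = session.handle("ping")
session.close()
```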

/Jorgen

Carl Banks · Apr 2, 2009

You get it. Sometimes I feel that my head is trained to work in a
procedural way. I use a big class just as a container of functions.

About the "data-based" approach, what if these functions all share a
little data, e.g. a socket, but nothing else?

Then perhaps your problem is that you are too loose with the
interface. Do you write new functions that are very similar to
existing functions all the time? Perhaps you should consolidate, or
think about how existing functions could do the job.

Or perhaps you don't have a problem. There's nothing wrong with large
classes per se, it's just a red flag. If you have all these functions
that really all operate on only one piece of data, and really all do
different things, then a large class is fine.

Carl Banks

Emile van Sebille · Apr 3, 2009

一首诗 said:
Hi all,

I am a programmer who works with several programming languages, such
as Python, C++ (in COM), ActionScript, C#, etc.

Today I realized that, whatever language I use, I always meet the same
problem, and I think I have never solved it very well.

The problem is: how to break my app into functional pieces?

My question would be why? Refactoring adds nothing to a functioning app
but clarity and maintainability -- both admirable qualities, granted,
and both unnecessary until needed. When I need to update an app is when
I start refactoring, and then just those areas that need it. Certainly
I refactor constantly during development to avoid code reuse through
cut-n-paste, but once I've got it going, whether it's 1000 or 6000
lines, it doesn't matter as long as it works. I'll tease it out when
the upgrades are needed, new applications can reuse pieces, or sooner if
business refactoring requires it.

Emile, writing in the role of sole developer and maintainer of 500k
lines of code dating back 35 years...

Steven D'Aprano · Apr 3, 2009

My question would be why? Refactoring adds nothing to a functioning app
but clarity and maintainability -- both admirable qualities, granted,
and both unnecessary until needed.

But they're always needed, except possibly for use-once throw-away
scripts.

When I need to update an app is when
I start refactoring, and then just those areas that need it. Certainly
I refactor constantly during development

Well, that pretty much disproves your assertion that refactoring is only
needed when updating an application.

to avoid code reuse through
cut-n-paste, but once I've got it going, whether it's 1000 or 6000
lines, it doesn't matter as long as it works.

If you've been refactoring during development, and gotten to the point
where it is working, clear and maintainable, then there's very little
refactoring left to do. I don't think anyone suggests that you refactor
code that doesn't need refactoring. Once it is already split into
functional pieces, there's no need to continue breaking it up further.

Emile van Sebille · Apr 3, 2009

Steven said:
If you've been refactoring during development, and gotten to the point
where it is working,

yes, but

clear and maintainable,

not necessarily

then there's very little refactoring left to do.

Again, not necessarily. I often find it easier to refactor old code
when I'm maintaining it to better understand how to best implement the
change I'm incorporating at the moment. The refactoring certainly may
have been done when the code was originally written, but at that time
refactoring would have only served to pretty it up as it already worked.

I don't think anyone suggests that you refactor
code that doesn't need refactoring.

That's exactly what I read the OP as wanting to do. That's why I was
asking why. So, I think the question becomes, when does code need
refactoring?

Emile

Michele Simionato · Apr 3, 2009

So, I think the question becomes, when does code need
refactoring?

I would say that 99.9% of the time a single class with 15,000
lines of code is a signal that something is wrong,
and refactoring is needed.

M. Simionato

一首诗 · Apr 3, 2009

Consolidate existing functions?

I've thought about it.

For example, I have two functions:

#=========================

def startXXX(id):
    pass

def startYYY(id):
    pass
#=========================

I could turn them into one:

#=========================
def start(type, id):
    if type == "XXX":
        pass
    elif type == "YYY":
        pass
#=========================

But isn't the first style clearer for my code's users?

That's one reason why my interfaces grow fast.
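
The two styles do not have to exclude each other. A minimal sketch, with hypothetical bodies for `startXXX` and `startYYY`: the explicit functions stay as the public interface, and the generic `start` is built on a dispatch table, so there is no `if`/`elif` chain to maintain.

```python
def startXXX(id):
    return f"XXX {id} started"


def startYYY(id):
    return f"YYY {id} started"


# maps a type name to the function that starts it
_STARTERS = {"XXX": startXXX, "YYY": startYYY}


def start(kind, id):
    try:
        starter = _STARTERS[kind]
    except KeyError:
        raise ValueError(f"unknown type: {kind!r}") from None
    return starter(id)
```

Adding a new kind then means writing one function and one table entry. Whether that is an improvement depends on how similar the `start*` functions really are.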

Steven D'Aprano · Apr 3, 2009

yes, but

not necessarily

If it's not clear and maintainable, then there *is* refactoring left to
do. Whether you (generic you) choose to do so or not is a separate issue.

Again, not necessarily. I often find it easier to refactor old code
when I'm maintaining it to better understand how to best implement the
change I'm incorporating at the moment. The refactoring certainly may
have been done when the code was originally written, but at that time
refactoring would have only served to pretty it up as it already worked.

That's exactly what I read the OP as wanting to do. That's why I was
asking why. So, I think the question becomes, when does code need
refactoring?

(1) When the code isn't clear and maintainable.

(2) When you need to add or subtract functionality which would leave the
code unclear or unmaintainable.

(3) When refactoring would make the code faster, more efficient, or
otherwise better in some way.

(4) When you're changing the API.

Emile van Sebille · Apr 3, 2009

Steven said:
If it's not clear and maintainable, then there *is* refactoring left to
do.

Agreed.

Whether you (generic you) choose to do so or not is a separate issue.

Also agreed - and that is really my point. Doing so feels to me like
continuing to look for a lost object once you've found it.

(1) When the code isn't clear and maintainable.

(2) When you need to add or subtract functionality which would leave the
code unclear or unmaintainable.

(3) When refactoring would make the code faster, more efficient, or
otherwise better in some way.

(4) When you're changing the API.

Certainly agreed on (2) and (4). (1) follows directly from (3). And (3)
only after an issue has been observed.

Emile

Emile van Sebille · Apr 3, 2009

andrew said:
i can see your point here, but there's two things more to consider:

1 - if you do need to refactor it later, because there is a bug say, it
will be harder to do so because you will have forgotten much about the
code.

Yes, I generally count on it. Refactoring at that time is precisely
when you get the most benefit, as it will concisely focus your
attentions on the sections of code that need to be clearer to support
the debugging changes. Face it, you'll have to get your head around the
code anyway, be it 1, 5, or 10k lines and all beautifully structured or
not. Remember, proper refactoring by definition does not change
functionality -- so that bug in the code will be there regardless.

so if it is likely that you will need to refactor in the future, it
may pay to do some of that work now.

Certainly -- and I envy those who know which sections to apply their
attentions to and when to stop. Personally, I stop when it works and
wait for feedback.

2 - if someone else needs to work with the code then the worse state it is
in - even if it works - the harder time they will have understanding it.
which could lead to them using or extending it incorrectly, for example.

Assuming you're talking about non-refactored code when you say worse,
consider Zope vs Django. I have no doubt that both meet an acceptable
level of organization and structure intended in part to facilitate
maintenance. I've got multiple deployed projects of each. But I'll hack
on Django if it doesn't do what I want and I find that easy, while
hacking on Zope ranks somewhere behind having my mother-in-law come for
a three-week stay on my favorite-things-to-do list. Refactored code
doesn't necessarily relate to easier understanding.

both of the above fall under the idea that code isn't just a machine that
produces a result, but also serves as documentation. and working code
isn't necessarily good documentation.

Here I agree. Once I've got it working and I have the time I will add
minor clean up and some notes to help me the next time I'm in there.
Clean up typically consists of dumping unused cruft, relocating imports
to the top, and adding a couple lines of overview comments. On the
other hand, I do agree with Aahz's sometimes tag line quote accepting
all comments in code as lies. It's akin to believing a user -- do so
only at your own peril. They're really bad witnesses.

i don't think there's a clear, fixed answer to this (i don't think "stop
refactoring as soon as all tests work" can be a reliable general rule any
more than "refactor until it is the most beautiful code in the world" can
be). you need to use your judgement on a case-by-case basis.

Well said.

in fact, the thing i am most sure of in this thread is that 15000 lines of
code in one module is a disaster.

Agreed. I took a quick scan and the largest modules I'm working with
look to be closer to 1500 lines. Except tiddlywiki of course, which
comes in at 9425 lines in the current download before adding anything to
it. I bet I'd prefer even hacking that to zope though.

One programmer's disaster is another programmer's refactoring dream

Emile
