Copy / Paste in software development

James Kanze

Using pejorative descriptors such as "sordid mess" isn't an
argument,

It's not an argument, it's a qualifier to his argument. Note
the use of "spaghetti state machine", later. His point
(perfectly valid, IMHO) is that if the state machine is
structured (not spaghetti), it will lend itself to translation
into structured code, and if it's not, well, you can't make a
silk purse out of a sow's ear, and you're better off fixing the
problem at the source, and redesigning the state machine so that
it's not spaghetti.
although it may be your opinion (e.g., you don't like
licorice). State machines, regardless of how they are
implemented, are damn pretty compared to ad hoc alternatives.

State machines are a form of a program. They can be structured,
or they can be spaghetti. Most things that are "ad hoc", rather
than designed, will end up spaghetti; a structured state machine
is certainly easier to understand and maintain than spaghetti
coded C or C++.
A common construction of a state machine in C/C++ is a
switch statement on that "one" integral variable, with a
set of cases, right? So the basic form is:

    enum State { ... } state;
    while (true) {
        switch (state) {
        case state1: ...
            break;
        case stateN: ...
            if (...) state = nextstate;
            break;
        ...
        }
    }

Somehow I don't see the real coding difference between that
and:

    state1: ...
        ...
    stateN: ...
        if (...) goto nextstate;

It's readable and maintainable. It makes the state transitions
clearer.
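
For a concrete side-by-side, here is a minimal sketch of the two forms being compared. The word-counting task and the names `count_words_switch` and `count_words_goto` are illustrative assumptions, not code from either poster; both functions implement the same two-state machine (between words, inside a word).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Form 1: an explicit state variable driven by a switch inside a loop.
std::size_t count_words_switch(const std::string& text) {
    enum class State { Between, InWord };
    State state = State::Between;
    std::size_t words = 0;
    for (unsigned char c : text) {
        switch (state) {
        case State::Between:
            if (!std::isspace(c)) { ++words; state = State::InWord; }
            break;
        case State::InWord:
            if (std::isspace(c)) state = State::Between;
            break;
        }
    }
    return words;
}

// Form 2: no state variable; the current label is the state and
// every transition is a goto.
std::size_t count_words_goto(const std::string& text) {
    std::size_t words = 0;
    std::size_t i = 0;
between:
    if (i == text.size()) return words;
    if (!std::isspace(static_cast<unsigned char>(text[i++]))) {
        ++words;
        goto in_word;
    }
    goto between;
in_word:
    if (i == text.size()) return words;
    if (std::isspace(static_cast<unsigned char>(text[i++]))) goto between;
    goto in_word;
}
```

Both functions return the same count for any input; which one reads better is exactly what the posters disagree about.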

But Alf's point is well taken: if the state machine isn't
spaghetti, you can translate it into really nicely structured
code, without even the switch. The switch is useful in two
cases: when the state machine is part of the external
specification (so you can't clean it up), and when it is
constructed dynamically (so you don't actually know what it will
be until runtime).
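
As an illustration of "structured code, without even the switch", here is a minimal sketch for a simple two-state machine (between words, inside a word). The word-counting task and the name `count_words_structured` are assumptions made for the example, not something from the thread; each state becomes an ordinary loop, and the transitions are just the order of the loops.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts whitespace-separated words with plain nested loops:
// no state variable, no switch, no goto.
std::size_t count_words_structured(const std::string& text) {
    const std::size_t n = text.size();
    auto is_space = [&text](std::size_t k) {
        return std::isspace(static_cast<unsigned char>(text[k])) != 0;
    };
    std::size_t words = 0;
    std::size_t i = 0;
    while (i < n) {
        // "Between words" state: skip whitespace.
        while (i < n && is_space(i)) ++i;
        if (i < n) {
            // Transition into the "in word" state.
            ++words;
            while (i < n && !is_space(i)) ++i;
        }
    }
    return words;
}
```

This only works because the machine is well structured; a machine with arbitrary cross-transitions would not fall out into nested loops this neatly.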
So I don't understand your objection in terms of code clarity
or maintainability.
[I'd prefer there to be a comment to tell me it's a state
machine with both. I note the goto-coded, variable-less version
is actually less stuff to write, and less stuff to read, so in
fact I think it is clearer. But the amount of code isn't a lot,
so that doesn't matter much to me.]
If you have modest performance needs, the switch version is a
fine implementation. If you are processing lots of data in
which the tests and operations in each state are simple (lexical
analysis, message filtering, ...) the switch version's overhead
gets to be interesting and one wants it to go away. (I build
tools that read the source code for large source code bases,
so this matters). Each switch state tests some condition,
and does an assignment to the state variable. It then
(implicitly) jumps back to the beginning of the switch, the
switch does an indirect branch, the processor blows its
pipeline, and then the next state makes progress. The
goto-version doesn't have that extra overhead. So your
alternative is to always choose the slower implementation, to
what end?

Is the extra overhead measurable? I've used the switch a lot,
and I've never found a measurable difference in performance with
respect to using goto. The only difference is that the reader
can clearly see when a state transition occurs. (Another
difference is that it is possible to decide the next state, then
do some sort of additional processing. That can sometimes be
useful.)
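
The parenthetical point (decide the next state, then do additional processing) can be sketched as follows. This is a hypothetical example, not code from the thread: the two-state word machine and the name `transition_offsets` are assumptions. Because the switch only computes `next`, a single hook after it sees every transition in one place.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class State { Between, InWord };

// Returns the input offsets at which the machine changed state.
std::vector<std::size_t> transition_offsets(const std::string& text) {
    State state = State::Between;
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool space =
            std::isspace(static_cast<unsigned char>(text[i])) != 0;

        // Step 1: decide the next state, without committing to it yet.
        State next = state;
        switch (state) {
        case State::Between:
            if (!space) next = State::InWord;
            break;
        case State::InWord:
            if (space) next = State::Between;
            break;
        }

        // Step 2: shared processing for every transition (here, recording it).
        if (next != state) offsets.push_back(i);
        state = next;
    }
    return offsets;
}
```

With the goto form, the same bookkeeping would have to be repeated at, or factored out of, each `goto` site.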
 
Jorgen Grahn

Actually, it isn't, and it never has been an acceptable method.
The procedure in well run shops is called refactoring, and it
doesn't involve copy/paste.
....
... I've yet to see any competent
programmer use copy/paste.

Depends on what you mean, as always ...

Personally:

When I write a new command-line tool, I happily copy & paste the
previous one's main() and modify it, so I get the getopt() and error
handling right and readable.

When I need to do a certain thing in application Foo, and I know a very
similar thing already works well in application Bar, I don't create a
separate library "libThing" and rewrite Bar to use it -- I copy the
code and add a comment saying the code comes from Bar version X.
Later, I can go into the version control system and see if either of
them has been improved.

Maybe the original poster is thinking of something else -- duplicated
logic inside a piece of code, even if it isn't copied. I like fixing
things like that, but only if the code is more elegant afterwards. It
is easy to take two similar, fairly readable functions and merge them
into one unreadable function ... or to merge two classes into a
polymorphic monstrosity.

/Jorgen
 
James Kanze

Depends on what you mean, as always ...

When I write a new command-line tool, I happily copy & paste
the previous one's main() and modify it, so I get the getopt()
and error handling right and readable.

Hmmm. Do you really write a new main for each command-line
tool? I just include one from the library.
When I need to do a certain thing in application Foo, and I
know a very similar thing already works well in application
Bar, I don't create a separate library "libThing" and rewrite
Bar to use it -- I copy the code and add a comment saying the
code comes from Bar version X. Later, I can go into the
version control system and see if either of them has been
improved.

But most professionals do refactor; if you only need something
once, there's no point in making it generic. But once you need
it twice, there's a good chance you'll need it three times, or
more.
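
One way the "include one from the library" idea might look is sketched below. Everything here is hypothetical: the name `run_tool`, its signature, and the exit code 2 for failures are assumptions for illustration, not James Kanze's actual library. The shared function owns argument collection and error reporting, so each tool supplies only its body.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical shared entry point: each tool's main() forwards to this
// instead of carrying its own copy of the boilerplate.
int run_tool(int argc, const char* const* argv,
             const std::function<int(const std::vector<std::string>&)>& body) {
    try {
        // Collect the arguments after the program name.
        std::vector<std::string> args(argv + (argc > 0 ? 1 : 0), argv + argc);
        return body(args);
    } catch (const std::exception& e) {
        // Uniform error reporting for every tool that uses this helper.
        std::cerr << (argc > 0 ? argv[0] : "tool") << ": " << e.what() << '\n';
        return 2;  // assumed convention for "failed with an exception"
    }
}
```

A tool's `main` would then reduce to something like `return run_tool(argc, argv, my_body);`, and a fix to the error handling reaches every tool at once.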
 
Andreas Huber

jacob navia said:
(e-mail address removed) wrote:

It seems you keep trying to justify code duplication in designed systems
with the duplication found in biological organisms:

[snip]
Well, how do you explain this proliferation of copy/paste operations in
the genes that control our brain 6 million years ago?
[snip]

Simple: In environments with random mutation and natural selection,
duplication is relatively cheap. All it costs is the added time to copy the
genetic code. Also, duplication actually helps the organism to adapt. When
the same mechanism is used in two different places then changing just one
instance will often be necessary. If code was shared then the same change
would require a *designer* that is able to step back and say "ok we need to
vary the behavior here, here and here, which is best achieved by ...".

Change in a selective environment is a relatively slow process as
modifications to the genetic code take place randomly. For most software
(there are exceptions) written today, you simply cannot afford to *randomly*
change the application and retest it in the environment. Until we have that
sheer processing power (which will probably not happen in your lifetime), we
have to resort to humans writing code *and* humans maintaining that code.
Especially the maintenance bit is tremendously simplified and much cheaper
when you avoid duplication of any sort.

Conclusion: Justifying code duplication in designed systems with code
duplication in organisms exposed to random mutation and natural selection
is, well, like comparing apples to, um, bananas.

Regards,
 
DerTopper

I think nobody here can deny that copy and paste is an established
method of software development.

Unfortunately, you are right. But does the existence of a method
necessarily mean that it is a good method? Just an interesting
analogy: Until the mid 90's most programming code was written in
Cobol. Does this make Cobol a superior programming language?
Apparently not, but Cobol was one of the first that somehow got
elected as THE standard software in the financial industry.

[snip]
The copy/paste is considered harmful.

But... I was surprised when I read this article in PLOS: [1]

<quote> [snip]
"... humans showed the highest number of genes with increased copy
numbers, at 134. Many of these duplicated human genes are implicated in
brain structure and function"

All those millions of years later, the descendants of those apes, still
running the same code, start to wonder...

Just a thought: I'm hardly an expert, but the way proteins are
created by reading the genetic information is quite complicated. AFAIK,
there are some enzymes that block individual genes, so that the gene
will only be read when the concentration of some liquid reaches
a certain level (this way some sort of equilibrium is maintained). I
think that the kind of redundancy is not redundancy in the way how
computer scientists would think about it, but may in fact be necessary
so that a certain gene is read more often than another. For example if
you have two proteins A and B which are encoded by two genes gA and
gB, and the proteins A and B should appear in relation 2:3, it would
make sense that gA appears 2 times for each 3 appearance of gene gB.
This way the so called "redundancy" would actually matter.

Regards,
Stuart
 
Pascal J. Bourguignon

Unfortunately, you are right. But does the existence of a method

This is not a method. It's a practice. There's absolutely nothing
methodic in copy-and-pasting, it's anarchic.
 
