It also makes them exceedingly difficult to maintain.
Bollocks. If the code has to change, in my world the header changes and
N source files that use it rebuild. In yours, you edit all N places you
copied and pasted and N source files rebuild. I do 1/N times as much
work and the compiler does the same in both cases.
this is the issue of "synchronization".
if the code is such that a change would need to be propagated everywhere
it is used, then it is not as good of a choice for copy/paste.
most cases where copy/paste is sensibly used are not cases where such
synchronization is likely to be necessary.
typically, the copies will drift apart, each becoming functionally
absorbed into whatever new use-case it is put into.
Now as for bloat, please explain how N function calls cause more bloat
than N copy and pastes.
if the functions are contained directly in the headers, the following
happens:
all of the code has to be included every time the header is included;
one will often end up with a copy of every used function in *every*
compilation unit which uses the header, which may well be far more
copies than there would have been with sane copy-pasting (probably
once per library/component, which is really not as much of an issue if
one breaks the app up into "libraries" of maybe 20-50 kloc...).
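for example (a contrived sketch, names made up):

/* util.h: a hypothetical header which carries function bodies */
#ifndef UTIL_H
#define UTIL_H

/* 'static' gives each compilation unit its own private copy;
   every .c file which includes this header and calls the
   function ends up with a duplicate of its machine code. */
static int clamp_int(int x, int lo, int hi)
{
    if(x < lo) return lo;
    if(x > hi) return hi;
    return x;
}

#endif

if, say, foo.c and bar.c both include this header and call clamp_int,
then foo.o and bar.o each carry their own compiled copy of it (a
compiler may inline or discard unused static functions, but each unit
which actually uses one still pays for it).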
say, a 1 Mloc project is broken into twenty 50-kloc components.
now, if we copy-paste maybe 1 kloc of code to each, this only amounts
to an additional 20 kloc over the entire project.
now, if that 1 kloc is put in a header, and say, 100 loc worth of
functions are used in each compilation unit, and each compilation unit
is 1 kloc (so, 50 compilation units per library), this works out to
around 100 kloc worth of overhead in the project.
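spelling out that arithmetic:

copy-paste: 20 libraries x 1 kloc pasted into each       ~= 20 kloc extra
header:     20 libraries x 50 units x ~100 loc per unit  ~= 100 kloc extra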
granted, IRL it is not likely to be so cleanly organized or predictable.
Boy you sure like making work for yourself, don't you?
in practice, this has rarely been too big of an issue...
after the copy/paste/edit process, most such code is entirely
independent of its siblings/ancestors.
it then gains a separate identity, and often its functional role may be
notably altered in the process.
Just like the standard library!
a project may end up looking much like a bunch of libraries.
as-is, my project has 10MB worth of DLLs, spread over 22 libraries, and
a codebase of ~ 820 kloc last I checked (it was around 1.25 Mloc, prior
to the "great GPL purge" where I removed all code which was GPL, which
at the time dropped the codebase to ~ 750 kloc).
as for EXEs, there are 12MB spread over 50 files, many of these being
"front ends" (which organize the libraries above into a particular
"application" with a particular user interface).
but, it is not like copy-paste is always the preferred option, and code
may also be put into a shared location when:
a good candidate for a shared location exists;
the code is sufficiently independent of how it is being used (its
functional behavior does not depend on external context, ...);
doing this will not otherwise compromise the functional independence of
the components involved;
....
an example of compromising functional independence would be: one
suddenly finds that using the assembler also requires one
to link against the C frontend, or the XML libraries, or the GUI
subsystem, or the 3D renderer.
so, the goal would be, say, that the assembler can be used
independently, and doesn't require also using other components in order
for it to be able to do its thing.
likewise, the GUI widgets shouldn't require also linking against the
assembler "just because...", even despite possibly some similar-looking
code having been shared between them.
sometimes, some basic level of, yes, copy-pasting crap is necessary to
help keep down such dependencies.
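a contrived example (names hypothetical): suppose both the assembler
and the GUI code want a small hex-dump helper. putting it in, say, the
GUI library would drag that whole library in as a link dependency of
the assembler, so each component just keeps its own copy:

#include <stdio.h>

/* in asm_util.c: the assembler's private copy of the helper */
static void hex_dump(const unsigned char *buf, int n)
{
    int i;
    for(i = 0; i < n; i++)
        printf("%02X%c", buf[i], ((i & 15) == 15) ? '\n' : ' ');
    printf("\n");
}

a near-identical copy can live in the GUI code; a dozen duplicated
lines is a lot cheaper than a spurious link dependency.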
another (macro-scale) example of such copy-pasting, are what are
traditionally called "project forks".
That's called a function template...
or a library function in general...
Those documents are all structured differently, so forcing the same
parser logic on each would be folly.
hence, why the logic is copy/pasted and edited to each particular use
case...
one doesn't have to write, say, a new fresh tokenizer each time, as an
existing tokenizer may be "fairly close", and one can edit it for the
particular types and rules of tokens one needs to parse (which types of
tokens there are, which combinations of characters there are, ...).
likewise goes for any particular logic related to the particular syntax.
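as a rough sketch of the kind of tokenizer core that gets cloned and
tweaked (token types and character rules here are placeholders, not
from any particular project):

#include <ctype.h>

enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_OP };   /* edited per use */

/* fetch one token starting at *s into buf, returning its type.
   the character-class rules below are typically what gets edited
   for each new syntax (allowed identifier characters, number
   formats, multi-char operators, ...). */
static int next_token(const char **s, char *buf)
{
    const char *p = *s;
    char *b = buf;

    while(*p && isspace((unsigned char)*p))
        p++;
    if(!*p) { *s = p; *b = 0; return TOK_EOF; }

    if(isalpha((unsigned char)*p) || *p == '_') {
        while(isalnum((unsigned char)*p) || *p == '_')
            *b++ = *p++;
        *b = 0; *s = p; return TOK_IDENT;
    }
    if(isdigit((unsigned char)*p)) {
        while(isdigit((unsigned char)*p))
            *b++ = *p++;
        *b = 0; *s = p; return TOK_NUMBER;
    }
    *b++ = *p++; *b = 0; *s = p;
    return TOK_OP;
}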
however, for sufficiently similar syntax, a shared parser may be used.
for example, my C, Java, C#, and BGBScript2 compiler frontends all
shared the same core parser and compiler logic, mostly using conditional
logic to gloss over most language specific issues (enabling and
disabling features, using alternate declaration-handling logic, ...).
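the conditional-logic approach might look vaguely like this (the flags
and names are invented here for illustration, not my actual code):

/* one parser core; per-language differences handled via flags */
enum { LANG_C, LANG_JAVA, LANG_CSHARP, LANG_BS2 };

struct parse_ctx { int lang; /* ... */ };

static int lang_has_pointers(struct parse_ctx *ctx)
    { return ctx->lang == LANG_C; }

static void parse_declarator(struct parse_ctx *ctx)
{
    if(lang_has_pointers(ctx))
    {
        /* parse '*' prefixes, only enabled for the C frontend */
    }
    if(ctx->lang == LANG_BS2)
    {
        /* alternate 'name:type' declaration-handling logic */
    }
    /* ... common declaration logic shared by all frontends ... */
}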
never mind that my C compiler was itself mostly a fork of my BGBScript
VM (which discarded the bytecode interpreter and focused solely on the
JIT), although each has developed independently. the idea with my now
dead "BGBScript2" effort was essentially to try to re-unify my compiler
and VM technology, leading to a single unified VM core which could
handle languages ranging between plain C and a highly-dynamic
JavaScript-like language, meanwhile incorporating some amount of
architecture similar to the JVM and .NET.
however, this effort had design complexity issues which quickly got out
of control at the time, and effort was shifted to a far more
conservative strategy: work on bolting a lot of the new planned
features onto the pre-existing BGBScript VM core (rather than invest
all the time and effort in what was essentially a ground-up
reimplementation of many core parts of the VM). I ended up calling this
more conservative effort "BGBScript 1.5", and it has implemented many
of the planned language features for BS2 (although with me eventually
opting for a more ActionScript3-like syntax, rather than the more
Java+C#-like syntax planned for BS2).
or such...