Generally, are programs written in C++ 10% slower than those written in C?


Nick Keighley

of one's own code...

I don't really care whose code. It was copy-paste /itself/ I was
railing against.
copy/paste may well be justified in some cases, such as to avoid
creating a physical dependency between unrelated components,

but creating a copy-paste coupling. What you call the "synchronisation
problem". I'd have thought the avoidance of physical coupling was
quite rare but you use the phrase "subjecting it to endless copy-
paste" implying copy-paste is pretty widespread in your world.

whereas
traditional code-reuse creates dependencies by extension.

what?

however, copy-paste from others' code is plagiarism and may also be
illegal.

quite. But it wasn't what I was talking about.


typically because this creates dependencies between one's components.

bung it in a shared library. Didn't they invent this concept in about
1955?

factoring things out to eliminate dependencies while still having
code-reuse may often introduce a fair amount of additional architectural
complexity (reuse via "mutual 3rd parties", ...).

I don't understand why it's *that* hard.
copy-paste allows physical decoupling without the need to create "3rd
parties".

a problem though is when copy-paste introduces "synchronization issues",
which yes, can be a nasty problem.

indeed. I think it's a major malaise.

ideally code should not be
copy-pasted in this case, or any such synchronization should be defined
as part of the "formal model" of whatever code or data is in question
(an example is things like codecs, where it is assumed that whatever
comes back out is whatever the encoder intended to come out, ...).

lost me. Are you saying the encode side shares code with the decode
side?

actually, several major implementations (glib, newlib, ...), are "GPL
with linking exception" and similar.

so you *can* use qsort()

hell, who knows, I go wherever the topic takes me, and have never really
been so good with keeping on-topic anyways...

or such...

I wasn't complaining about topicality. Just that your frequent segues
make it very difficult to work out what you are trying to say.

I almost get the idea you use copy-paste a lot and are now making up
reasons for it.

The REAL reason people use copy-paste is because it's expedient. It's
quicker (right now) to CP and hang the longer term consequences. In
the long run we're all dead anyway.
 

BGB

It also makes them exceedingly difficult to maintain.


Bollocks. If the code has to change, in my world the header changes and
N source files that use it rebuild. In yours, you edit all N places you
copied and pasted and N source files rebuild. I do 1/N times as much
work and the compiler does the same in both cases.

this is the issue of "synchronization".

if the code is such that a change would need to be propagated everywhere
it is used, then it is not as good of a choice for copy/paste.

most cases where copy/paste is sensibly used, are not cases where such
synchronization is likely to be necessary.

typically, each will drift apart, becoming functionally absorbed into
whatever new use-case they are put into.

Now as for bloat, please explain how N function calls cause more bloat
than N copy and pastes.

if the functions are contained directly in the headers, the following
happens:
all of the code has to be included every time the header is included;
one will often end up with a copy of any used function for *every*
compilation unit which uses the header, which may well be far larger
than the number of times it would have been sanely copy-pasted (probably
once per library/component, which is really not as much of an issue if
one breaks up their app into "libraries" of maybe 20-50 kloc...).

say, a 1Mloc project is broken into 20 50 kloc components.

now, if we copy-paste, maybe, 1 kloc of code to each, this only amounts
to an additional 20 kloc over the entire project.

now, if that 1 kloc is put in a header, and say, 100 loc worth of
functions are used in each compilation unit, and each compilation unit
is 1 kloc (so, 50 compilation units per library), this works out to
around 100 kloc worth of overhead in the project.
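
The two estimates above can be checked with a few constants. This is only a sketch of the arithmetic as stated in the post (20 components, 50 compilation units each, 1 kloc copied per component, 100 loc of header functions used per unit). The constant and function names are made up for illustration, and a real project will not be this regular.

```cpp
// Back-of-envelope model of the estimate above. All figures are the
// illustrative numbers from the post, not measurements.
constexpr long components          = 20;    // 1 Mloc split into 50 kloc components
constexpr long units_per_component = 50;    // 50 kloc component / 1 kloc per unit
constexpr long copied_loc          = 1000;  // code copy-pasted once per component
constexpr long used_loc_per_unit   = 100;   // header functions used by each unit

// Copy-paste: one extra copy of the code per component.
constexpr long copy_paste_overhead_loc() {
    return components * copied_loc;
}

// Functions in a header: each compilation unit carries the code it uses.
constexpr long header_overhead_loc() {
    return components * units_per_component * used_loc_per_unit;
}

static_assert(copy_paste_overhead_loc() == 20000, "20 kloc from copy-paste");
static_assert(header_overhead_loc() == 100000, "100 kloc from header inclusion");
```

How much of the header figure survives into the final binaries depends on how the functions are declared: `static` functions are duplicated per compilation unit, while `inline` functions with external linkage are usually merged by the linker.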


granted, IRL it is not likely to be so cleanly organized or predictable.

Boy you sure like making work for your self, don't you?

in practice, this has rarely been too big of an issue...

after the copy/paste/edit process, most such code is entirely
independent of its siblings/ancestors.

it then gains a separate identity, and often its functional role may be
notably altered in the process.

Just like the standard library!

a project may end up looking much like a bunch of libraries.

as-is, my project has 10MB worth of DLLs, spread over 22 libraries, and
a codebase of ~ 820 kloc last I checked (it was around 1.25 Mloc, prior
to the "great GPL purge" where I removed all code which was GPL, which
at the time dropped the codebase to ~ 750 kloc).

as for EXEs, there are 12MB spread over 50 files, many of these being
"front ends" (which organize the libraries above into a particular
"application" with a particular user interface).


but, it is not like copy-paste is always the preferred option, and code
may also be put into a shared location when:
a good candidate for a shared location exists;
the code is sufficiently independent of how it is being used (its
functional behavior does not depend on external context, ...);
doing this will not otherwise compromise the functional independence of
the components involved;
....


an example of compromising functional independence would be, for
example, one suddenly finds that using the assembler also requires one
to link against the C frontend, or the XML libraries, or the GUI
subsystem, or the 3D renderer.

so, the goal would be, say, that the assembler can be used
independently, and doesn't require also using other components in order
for it to be able to do its thing.

likewise, the GUI widgets shouldn't require also linking against the
assembler "just because...", even despite possibly some similar-looking
code having been shared between them.

sometimes, some basic level of, yes, copy-pasting crap, is necessary to
help keep down such dependencies.


another (macro-scale) example of such copy-pasting, are what are
traditionally called "project forks".


That's called a function template...

or a library function in general...

Those documents are all structured differently, so forcing the same
parser logic on each would be folly.

hence, why the logic is copy/pasted and edited to each particular use
case...


one doesn't have to write, say, a new fresh tokenizer each time, as an
existing tokenizer may be "fairly close", and one can edit it for the
particular types and rules of tokens one needs to parse (which types of
tokens there are, which combinations of characters there are, ...).

likewise goes for any particular logic related to the particular syntax.


however, for sufficiently similar syntax, a shared parser may be used.

for example, my C, Java, C#, and BGBScript2 compiler frontends all
shared the same core parser and compiler logic, mostly using conditional
logic to gloss over most language specific issues (enabling and
disabling features, using alternate declaration-handling logic, ...).

nevermind that my C compiler was itself mostly a fork of my BGBScript VM
(which discarded the bytecode interpreter, and focused solely on the
JIT), although each has developed independently (the idea with my now
dead "BGBScript2" effort was to essentially try to re-unify my compiler
and VM technology, leading to a single unified VM core which could
handle languages ranging between plain C and a highly-dynamic
JavaScript-like language, meanwhile, incorporating some amount of
architecture similar to the JVM and .NET).

however, this effort had design complexity issues which quickly ran out
of control at the time, and effort was shifted to a far more
conservative strategy: work on beating a lot of the new planned features
onto the pre-existing BGBScript VM core (rather than invest all the time
and effort in what was essentially a ground-up reimplementation of many
core parts of the VM). for this more conservative effort, I ended up
calling it "BGBScript 1.5", and it has implemented many of the planned
language features for BS2 (although with me eventually opting with a
more ActionScript3-like syntax, rather than the more Java+C# like syntax
planned for BS2).


or such...
 

BGB

No doubt, but it does make sense in many cases to at least prefer one
or another, because _mixing_ them can be problematic and/or
inefficient.

probably...

but, yeah, I, unsurprisingly, fall more on the stdio side of this issue,
and personally don't like iostreams as much on an aesthetic basis
(visually, it still seems a bit much like, someone, long ago, was like
"wow, cool, I have operator overloading" and decided to make heavy use
of it for the console interface, and whether or not this is a good idea?
who really knows...).
 

Nick Keighley

<sarcasm>
which copy-paste doesn't do at all...
It also makes them exceedingly difficult to maintain.

the first duty of the software engineer is to maintain the
maintainability of the source base


why not?

as does copy-paste. copy-paste is the ultimate bloat.
quite.


 If the code has to change, in my world the header changes and
N source files that use it rebuild.  In yours, you edit all N places you
copied and pasted and N source files rebuild.  I do 1/N times as much
work and the compiler does the same in both cases.

plus he has to get all those edits correct. His code must be a
nightmare to work with
Now as for bloat, please explain how N function calls cause more bloat
than N copy and pastes.


Boy you sure like making work for your self, don't you?

you say this like it's a bad thing...

<snip>
 

Nick Keighley

No doubt, but it does make sense in many cases to at least prefer one
or another, because _mixing_ them can be problematic and/or
inefficient.

and iostreams are extensible. It's hard to make printf() print a user
defined type but relatively easy to make operator<< do it
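
To make the extensibility point concrete, here is a minimal sketch. `Point` and both helper functions are hypothetical names invented for the example. With iostreams the type teaches the stream how to print it once; with stdio every call site has to unpack the fields into a format string by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, for illustration only.
struct Point {
    int x;
    int y;
};

// iostreams route: overload operator<< once, then any ostream can print it.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

std::string format_point_iostream(const Point& p) {
    std::ostringstream out;
    out << p;  // uses the overload above
    return out.str();
}

// stdio route: printf-style functions know nothing about Point, so the
// caller spells out the fields and the format at every call site.
std::string format_point_stdio(const Point& p) {
    char buf[64];
    const int written = std::snprintf(buf, sizeof buf, "(%d, %d)", p.x, p.y);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buf);
    return std::string(buf);
}
```

Both helpers produce the same text for a given `Point`; the difference is where the knowledge of the type's layout lives.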
 

Nick Keighley

probably...

but, yeah, I, unsurprisingly, fall more on the stdio side of this issue,
and personally don't like iostreams as much on an aesthetic basis
(visually, it still seems a bit much like, someone, long ago, was like
"wow, cool, I have operator overloading" and decided to make heavy use
of it for the console interface, and whether or not this is a good idea?
who really knows...).

I have to agree it gets pretty ugly as soon as you want to format
anything. I seem to read the manual a lot more than I remember doing
with printf() (but maybe I forget how much of a PITA printf() was in
the beginning)
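
As a small illustration of the formatting complaint, the sketch below prints a double right-aligned in 8 columns with 3 decimals both ways. The function names are invented for the example; the manipulators are the standard ones from `<iomanip>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// printf-style: width and precision live in one compact format string.
std::string format_double_stdio(double value) {
    char buf[64];
    const int written = std::snprintf(buf, sizeof buf, "%8.3f", value);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buf);
    return std::string(buf);
}

// iostream-style: the same request spelled with three manipulators.
// std::fixed and std::setprecision stay in effect on the stream;
// std::setw applies only to the next item written.
std::string format_double_iostream(double value) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(3) << std::setw(8) << value;
    return out.str();
}
```

For 3.14159 both return the 8-character string `"   3.142"`.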
 

BGB

I don't really care whose code. It was copy-paste /itself/ I was
railing against.


but creating a copy-paste coupling. What you call the "synchronisation
problem". I'd have thought the avoidance of physical coupling was
quite rare but you use the phrase "subjecting it to endless copy-
paste" implying copy-paste is pretty widespread in your world.

well, it is not exactly unheard of...

for example, reasonably recently I was working on making my 3D engine
have a "server end" (in the Quake sense, where the "client" manages 3D
rendering and user input, and the "server" manages all of the world
physics, AIs, ...).

so, one has an issue:
both the 3D renderer, and server, have a need for things like
map-loading/BSPs/...

one option would have been to split the BSP code off from the rendering
code, leaving both the renderer and server depending on it.

however, there was no good choice of a library shared directly between
them, so one would have to be created.

also, one didn't want a direct dependency (say, server needs 3D renderer
to work), because they may be used independently (and "dedicated server
needs OpenGL window" would be silly).


great solution:
copy-paste this whole region of the 3D renderer into the server, and
then remove any 3D rendering stuff from the server copy.

in all, it has worked out fairly well.

but, there are other examples (parsers, codegen logic, ...).


one annoyance that has popped up a few times is that the code often
looks similar enough that when working on both at the same time, one may
end up applying edits to one meant for the other. but, this is a minor
issue.

this was partly because, annoyingly, adding support for bezier patches
and Doom3 map files did imply applying edits to both.

however, this case is fairly rare, as usually the versions are
independent (one does not synchronize changes, but only
modifies/fixes/... the version one is dealing with at the moment).


everyone who uses the code has to also link against it.


if a project gets largish, then this becomes a much bigger issue, like
which library depends on which, ...

these issues become far more apparent when building for Linux, as every
library dependency needs an "-l/name/" option, which may need to be
added to every Makefile which uses said library, ...


so, one could almost make a big flowchart of which libraries may depend
on which other libraries, ... (if one is into flowcharts and so on).

quite. But it wasn't what I was talking about.

fair enough, but someone mentioned copy-pasting others' code.

bung it in a shared library. Didn't they invent this concept in about
1955?

shared libraries and DLLs have their own drawbacks, and managing them
gets more complex into the double-digits of libraries, so it is
preferable to avoid creating new ones when possible.

I don't understand why it's *that* hard.

adding a new library (3rd party) may often involve much editing of
Makefile's.

when using the traditional "parallel trees of per-target Makefile's"
strategy, one generally wants to avoid any non-trivial changes to the
build-tree when possible, because it is a hassle.


granted, I guess some projects also end up tooling up their Makefile
tree as well, or switching to non-Makefile alternatives (such as "GNU
autoconf", "CMake", ...).

indeed. I think it's a major malaise.

not as big of an issue though, if following the copy, the code is
treated independently.

lost me. Are you saying the encode side shares code with the decode
side?

nope, typically, the specification for the codec will specify things
like rounding behavior and many other subtle details.


another example would be, say, two pieces of code assume that given
entities will be numbered according to a certain algorithm, so, for sake
of interaction one writes it up in the spec what is the algorithm for
numbering things, ...

some algorithms may also depend on the hash function as well, so the
hash function is also a part of the spec.
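
As a sketch of what "the hash function is part of the spec" can look like: two independently written components agree only on a written definition and each implements it separately. FNV-1a (32-bit) is used here purely as a stand-in; the post does not say which hash is involved, and the function name is made up.

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a, used only as an example of a hash that is fully pinned
// down by a short spec: start from the offset basis, then for each byte
// XOR it in and multiply by the prime (arithmetic modulo 2^32).
constexpr std::uint32_t fnv1a_32(const char* data, std::size_t length) {
    std::uint32_t hash = 2166136261u;               // offset basis
    for (std::size_t i = 0; i < length; ++i) {
        hash ^= static_cast<unsigned char>(data[i]);
        hash *= 16777619u;                          // FNV prime
    }
    return hash;
}

// Known values that any conforming copy must reproduce.
static_assert(fnv1a_32("", 0) == 0x811c9dc5u, "empty input gives the offset basis");
static_assert(fnv1a_32("a", 1) == 0xe40c292cu, "single byte 'a'");
```

With the constants and byte order fixed in the spec, separately maintained copies stay interoperable as long as each passes the same known-value checks.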

....

so you *can* use qsort()

yes, but it doesn't mean qsort is *good*...

I wasn't complaining about topicality. Just that your frequent segues
make it very difficult to work out what you are trying to say.

I almost get the idea you use copy-paste a lot and are now making up
reasons for it.

The REAL reason people use copy-paste is because it's expedient. It's
quicker (right now) to CP and hang the longer term consequences. In
the long run we're all dead anyway.

why not do whatever is more expedient?...

it is more a big question of time-saving option A vs time-saving option
B, and how each will impact the codebase, and also, which will be faster
and less effort.


but, anyways, why should one care about distant future stuff, when they
have all the stuff going on now to worry about?...

like, tomorrow creates its own worries, and one can deal with today's
worries today.

years from now?... who knows, who cares... maybe all the code will be
dead, or maybe its problems will have been fixed, but really it doesn't
matter because it is a long ways off, and one gains little by sitting
around and twiddling their thumbs worrying about the distant future.

granted, yes, it is pointless worrying about the past as well ("glory
days", "lost loves", ... bleh...).


or such...
 

BGB

Let's see: adding -l/name/ in a Makefile: 30 seconds; copy-pasting source
code from other project, adding new files to the project, adding new
files to the version control system: 15 minutes; fixing M bugs in N-1
diverged copies of source code and testing that the fix worked: M*(N-1)
hours or more... I think I stick with the 30 seconds option!

unless it applies to roughly 150 Makefiles, which is where the issue
comes in (then one has to go through a cycle of adding "-l/name/" to
each of them).

adding a few source files into a Makefile is also a lot easier than
adding a new library to the project, since the source files may often
only affect a single Makefile, whereas the addition of a library or
moving some code from one library to another, may potentially affect
*many* Makefiles.

in a few cases, I ended up putting several conceptually disjoint
"libraries" into a single DLL/SO, mostly as to avoid the problem of
having to go and update all of the Makefiles which depended on it.

Of course the build system must know the dependencies of the libraries,
otherwise it could not build them in the right order. BTW, the Makefile
format is a convenient way to mark these dependencies. About flowcharts -
I'm pretty sure the flowchart of our nightly build would not fit any
printable sheet of paper, but fortunately no one has ever needed such a
thing.

typically, in my case, it is done recursively as several stages:
"includes", which rebuilds headers;
"libs", which rebuilds any libraries;
"apps", which rebuilds any binaries;
"meta", which rebuilds any metadata databases.

currently, source-processing/generating tools building is handled in the
"includes" stage, meaning that any such tools can't depend on external
headers or libraries (they need to be entirely self-contained, and are
often implemented as single large source files and may be scattered
throughout the project).

trying to address the tools issue would likely end up introducing
additional stages, which would again, require going and making a bunch
of edits to the Makefile tree.

at each level, one may have a command like "all: includes libs apps
meta", to allow invoking make from various locations (rebuilding this
particular branch of the toplevel).

also, in some cases, special build targets exist generally to allow
omitting rebuilding parts of the project tree (mostly to save time, say,
if I am only working on the 3D engine, I don't want to also be
rebuilding all of the compiler and script-VM stuff...).


typically, each library also has its own little source tree, with a set
of directory organization rules:
"base" generally for source code (others may be used);
"include" for headers;
"docs" for any library-specific documentation;
....


the overall build process is generally a big recursive make with each
project directory level having its own set of Makefiles (sort of like in
the Linux kernel or similar, or at least that is how it was back when I
was messing with it, dunno if it is still this way now).


running a line counter for Makefiles:
10.84 kloc in 153 files.

yep...
 

Noah Roberts

unless it applies to roughly 150 Makefiles, which is where the issue
comes in (then one has to go through a cycle of adding "-l/name/" to
each of them).

adding a few source files into a Makefile is also a lot easier than
adding a new library to the project, since the source files may often
only affect a single Makefile, whereas the addition of a library or
moving some code from one library to another, may potentially affect
*many* Makefiles.

This sounds to me like a rather absurd response. Certainly at a point
at which one is tempted to copy/paste and is evaluating the cost of
maintaining that copy/pasted code vs. refactoring into a new source
file, the risk of cascading dependencies is limited. Certainly also
when one is tempted to copy paste large sections of code and
evaluating the cost of maintaining that disaster vs. refactoring into
a library, a limited number of dependencies upon that library exist.
Thus claiming that you'd have to then modify an inordinately large
amount of Makefiles to satisfy the refactor seems to me to be nothing
but a bunch of hand waving and panic.

Clearly if refactoring one source file out of one library into another
causes you to have to edit ALL the Makefiles in your project your
project's build system is configured quite poorly.

On the other hand, once the decision to copy/paste rather than
refactor has been made, and the rot has permeated the entire source
tree, it becomes a much larger project to refactor that mess away.
That of course must be granted but I think it's a red herring here
since the crux of the argument is that you don't succumb to these
temptations and thus don't pay these extraordinary costs.

The standard approach is to recognize that you are tempted to use copy/
paste coding, recognize that succumbing to that temptation can come at
a very high maintenance cost, and instead factor the code you want to
reuse into a library as a FIRST step. At that point there's only one
dependency on the new library and the effort to add it to the build
system should be quite minimal. Compile, test...pass...use the new
library at the new location where you're adding a single, new
dependency and again the impact on the build system should be
negligible.

I might also suggest that if you're actually coding in Make by
hand...you're living in the dark ages. But then one who sees C++ as
nothing but "convenient" extensions to C is probably quite comfortable
there.
 

Nick Keighley

I'll probably give up. What you think is important is different from
what I think is important. And just banging on about it isn't going to
change anyone's mind. I just thought people stopped writing code the
way you do sometime last century! I hope we never share a code base.

well, it is not exactly unheard of...

but is it incredibly common? ("yes" appears to be the answer)

for example, reasonably recently I was working on making my 3D engine
have a "server end" (in the Quake sense, where the "client" manages 3D
rendering and user input, and the "server" manages all of the world
physics, AIs, ...).

sounds a reasonable architecture

so, one has an issue:
both the 3D renderer, and server, have a need for things like
map-loading/BSPs/...

one option would have been to split the BSP code off from the rendering
code, leaving both the renderer and server depending on it.

yes. This is exactly what I would have done. It sounds like an
invocation of both the Single Responsibility Principle (SRP) *and*
Factor Mercilessly

however, there was no good choice of a library shared directly between
them, so one would have to be created.

and so? You make it sound like creating a library is some sort of big
deal! In full coding frenzy I must do it several times a day!

also, one didn't want a direct dependency (say, server needs 3D renderer
to work), because they may be used independently (and "dedicated server
needs OpenGL window" would be silly).

yes. Bonkers solution.

great solution:
copy-paste this whole region of the 3D renderer into the server, and
then remove any 3D rendering stuff from the server copy.

crap solution. You've bloated your code with two copies of the same
code. You've violated SRP and DRY (Don't Repeat Yourself) and you now
have to debug the code twice. A maintenance nightmare. Go and read a
book on structured programming (I like Constantine and Yourdon)

in all, it has worked out fairly well.

but, there are other examples (parsers, codegen logic, ...).

and all have the same simple obvious answer. If you need the same
thing in more than one place then package it up in re-usable form and
reuse it. Simple maintenance. Reduced bloat. Reduced debugging time.

one annoyance that has popped up a few times is that the code often
looks similar enough that when working on both at the same time, one may
end up applying edits to one meant for the other. but, this is a minor
issue.

well woop-y-doop. Who would have guessed *that* might happen. The
Database people talk about Normalised Form for damn good reasons. Just
as your data shouldn't be stored more than once (except for backup
purposes) so your code should not be in more than one place.

[licenses] is a reason for writing stuff yourself. it is /not/ a reason for
copy-paste. If you've written the code yourself why not package it in
reusable form?

typically because this creates dependencies between one's components.

bung it in a shared library. Didn't they invent this concept in about
1955?

shared libraries and DLLs have their own drawbacks,

their drawbacks pale into insignificance compared with copy-paste-edit-oops
drawbacks

and managing them
gets more complex into the double-digits of libraries, so it is
preferable to avoid creating new ones when possible.

I think you're fooling yourself if you think you've avoided these
problems

adding a new library (3rd party) may often involve much editing of
Makefile's.

when using the traditional "parallel trees of per-target Makefile's"
strategy, one generally wants to avoid any non-trivial changes to the
build-tree when possible, because it is a hassle.

granted, I guess some projects also end up tooling up their Makefile
tree as well, or switching to non-Makefile alternatives (such as "GNU
autoconf", "CMake", ...).

I've used makefiles in the past and I don't recall them being that
much of a pain. maybe you need to refactor your build system. Oh wait
do you copy-paste makefiles as well...

These days I use studio and adding a new project is a piece of piss.
And yes there are at least double digit number of libraries on the
software I usually work on

nope, typically, the specification for the codec will specify things
like rounding behavior and many other subtle details.

another example would be, say, two pieces of code assume that given
entities will be numbered according to a certain algorithm, so, for sake
of interaction one writes it up in the spec what is the algorithm for
numbering things, ...

some algorithms may also depend on the hash function as well, so the
hash function is also a part of the spec.

all these decisions should be represented in code (or config files)
and should be shared between the two halves of the codec.

yes, but it doesn't mean qsort is *good*...

how do you know it's *bad*? I'm arguing you use qsort as your first choice
and iff it's shown to be inadequate then you use something else. If I'm
sorting 20 items it probably doesn't matter what I use to sort things.
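
A minimal sketch of the "reach for qsort first" approach for a small array of ints. The helper names are made up for the example; `std::qsort` itself is the standard C library routine as exposed through `<cstdlib>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Comparator in the form std::qsort expects: negative, zero or positive.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids the overflow risk of lhs - rhs
}

// Sort a small array in place using the library routine.
void sort_ints(int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    std::qsort(values, count, sizeof *values, compare_ints);
}
```

If profiling later shows the sort actually matters, `std::sort` from `<algorithm>` is the usual C++ replacement and needs no `void*` comparator.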

"Premature Optimisation is the root of all evil"

Probably applies more widely than software.

why not do whatever is more expedient?...

because it bites you in the bum later when your code base decays into
an unmaintainable morass. I've seen a large program (all in one file!)
where the same thing was done in at least three different ways. As the
code grew larger (as copy-paste made inevitable) it grew harder to
understand so more copy-pasting ensued. The compiler began to generate
bad code (the compiler writers used "short jumps" and the code was
just too big for this to work). There was a horrid positive feedback
causing the program to grow like Topsy. Many "happy" hours spent
debugging it.

it is more a big question of time-saving option A vs time-saving option
B, and how each will impact the codebase, and also, which will be faster
and less effort.

but, anyways, why should one care about distant future stuff, when they
have all the stuff going on now to worry about?...

like, tomorrow creates its own worries, and one can deal with today's
worries today.

years from now?... who knows, who cares... maybe all the code will be
dead, or maybe its problems will have been fixed, but really it doesn't
matter because it is a long ways off, and one gains little by sitting
around and twiddling their thumbs worrying about the distant future.

granted, yes, it is pointless worrying about the past as well ("glory
days", "lost loves", ... bleh...).

or such

when you've spent some time debugging someone else's code maybe you'll
change your mind.

Anyone remember "The Software Crisis". Is it back?
 

BGB

This sounds to me like a rather absurd response. Certainly at a point
at which one is tempted to copy/paste and is evaluating the cost of
maintaining that copy/pasted code vs. refactoring into a new source
file, the risk of cascading dependencies is limited. Certainly also
when one is tempted to copy paste large sections of code and
evaluating the cost of maintaining that disaster vs. refactoring into
a library, a limited number of dependencies upon that library exist.
Thus claiming that you'd have to then modify an inordinately large
amount of Makefiles to satisfy the refactor seems to me to be nothing
but a bunch of hand waving and panic.

Clearly if refactoring one source file out of one library into another
causes you to have to edit ALL the Makefiles in your project your
project's build system is configured quite poorly.

well, the great issue is that one can't just be like (to make) "hey,
this code moved from over here to over there and now anything which
depends on X also needs to depend on Y".

hence, why one does need to edit the makefiles, especially for changes
near the project core (which may affect many potential clients of the
library).

On the other hand, once the decision to copy/paste rather than
refactor has been made, and the rot has permeated the entire source
tree, it becomes a much larger project to refactor that mess away.
That of course must be granted but I think it's a red herring here
since the crux of the argument is that you don't succumb to these
temptations and thus don't pay these extraordinary costs.

The standard approach is to recognize that you are tempted to use copy/
paste coding, recognize that succumbing to that temptation can come at
a very high maintenance cost, and instead factor the code you want to
reuse into a library as a FIRST step. At that point there's only one
dependency on the new library and the effort to add it to the build
system should be quite minimal. Compile, test...pass...use the new
library at the new location where you're adding a single, new
dependency and again the impact on the build system should be
negligible.

for a codebase written by myself and having been maintained since around
the late 90s, it has probably achieved roughly a state of equilibrium...

I might also suggest that if you're actually coding in Make by
hand...you're living in the dark ages. But then one who sees C++ as
nothing but "convenient" extensions to C is probably quite comfortable
there.

it is hand-written Makefiles, and actually parallel Makefile trees for
each build target. I think originally I got the idea from NASM or
something, and used it more generally.

hence, a "Makefile.msvc" and "Makefile.lnx" build tree (Windows/MSVC and
Linux).
others, like "Makefile.cyg" and "Makefile.mingw" were past, but no
longer maintained, trees (for cygwin and mingw).

there is also "Makefile.lnxcc" which was for Linux cross-compiling to
ARM (because rebuilding the project inside an ARM emulator was very slow).


generally, it is less effort to do things in source than in Makefiles,
so due to the way my system works, I usually include all of the
target-specific source-files in every build target (then one can have
the list of source files be equivalent between all targets), but use
'#ifdef' to exclude the contents of files which are not needed.
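
A rough sketch of that scheme, with three target-specific "files" shown back to back in one listing. The file names and `platform_name()` are hypothetical; `_WIN32` and `__linux__` are the usual compiler-predefined macros, though a project may well use its own defines instead.

```cpp
// Every file below would be listed in every target's Makefile; on the
// wrong target its guard makes it compile to nothing.
// (constexpr is used only so this single-listing sketch can be checked at
// compile time; in separate source files these would be plain functions.)

// --- platform_win32.cpp (hypothetical name) ---
#if defined(_WIN32)
constexpr const char* platform_name() { return "windows"; }
#endif

// --- platform_linux.cpp (hypothetical name) ---
#if defined(__linux__)
constexpr const char* platform_name() { return "linux"; }
#endif

// --- platform_other.cpp (hypothetical name) ---
#if !defined(_WIN32) && !defined(__linux__)
constexpr const char* platform_name() { return "other"; }
#endif

// Exactly one definition survives preprocessing on any given target.
static_assert(platform_name()[0] != '\0', "one variant must be active");
```

The trade-off is that the choice of target moves out of the build files and into the sources, so the list of source files can stay identical across the parallel Makefile trees.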


in general though, it all works fairly well...
 

Noah Roberts

well, the great issue is that one can't just be like (to make) "hey,
this code moved from over here to over there and now anything which
depends on X also needs to depend on Y".

Except that this is not how things work on any platform I know of.
Clients of a library only need to link to the dependencies of that
library if it uses them. The library links to those parts that it
needs already.

In other words, you have component A, which depends on component B.
You pull some internal parts out from component B into component C so
that they can also be used by component D (they are internal because
if they were public you could just link to component B itself). The
changed component B links to the new component C and those parts that
are used by it are included in the library. The original component A
dependencies do not change.
 
BGB

Except that this is not how things work on any platform I know of.
Clients of a library only need to link to the dependencies of that
library if it uses them. The library links to those parts that it
needs already.

on some platforms, this is how it works, but not on all of them.

In other words, you have component A, which depends on component B.
You pull some internal parts out from component B into component C so
that they can also be used by component D (they are internal because
if they were public you could just link to component B itself). The
changed component B links to the new component C and those parts that
are used by it are included in the library. The original component A
dependencies do not change.

Windows does this with DLLs...
in this way, everything "just works" on Windows.


but, for whatever reason, on Linux with shared-objects, one seems to
have to link against *every* dependency when linking the final binary,
otherwise apparently "ld.so" or similar will throw a fit that it
couldn't find a bunch of functions when trying to load the program.


this means, say, if they give "-lm" or "-ldl" to one of the libraries,
they may also have to be sure to supply "-lm" or "-ldl" with the main
binary (even if the main binary itself does not directly use the math
functions or dlopen or similar, ...).

everything will link, so the linker seems to be happy, but it will barf
once one tries to load/run the program.


why?... who knows, it just seems to work this way, unless there is a bug
somewhere?... (but, all of the other programs work, so this is likely
the intended design). for all I know it could also have something to do
with "-Wl,-rpath,..." as well (needed as I tend not to install my apps
into "/usr/bin" and "/usr/lib64" or similar, which is generally assumed
by the OS, rather than, say, Windows, which will check the current
directory followed by everything in the PATH variable).


similar applies to the case where one is using static libraries, except
in this case "ld" will generally barf ("undefined symbol ..."), rather
than the program failing to load.

the eventual result generally being that adding a dependency "somewhere
down the line" may require propagating the dependency up-the-tree.


granted, I guess most people seem to use autoconf on Linux, which seems
to do this by itself (but I never really liked autoconf personally, and
it tends not really to work well on Windows anyways...).

I had looked at CMake and scons, and had partly considered the former,
but haven't done much with it yet due to the effort which would be
required to replace all the Makefiles with CMake scripts.


and, meanwhile, IRL, I was just faced with the issue of a bunch of
dirt needing to be dug to reconnect one's driveway with the road,
following some rain, and some people having torn up the road fairly badly
trying to get their car unstuck (they just used the "use lots of
gas, dig a big hole" strategy) during said rain...


or such...
 
Miles Bader

BGB said:
but, for whatever reason, on Linux with shared-objects, one seems to
have to link against *every* dependency when linking the final binary,
otherwise apparently "ld.so" or similar will throw a fit that it
couldn't find a bunch of functions when trying to load the program.

What? That sounds completely wrong (so if you think it isn't, please
give more detail).

Shared libraries in linux have dependencies, just like programs, so
the end program only needs to link with the libraries it uses
directly; any "transitive" dependencies are pulled in automatically.

-Miles
 
Nobody

but, for whatever reason, on Linux with shared-objects, one seems to
have to link against *every* dependency when linking the final binary,
otherwise apparently "ld.so" or similar will throw a fit that it
couldn't find a bunch of functions when trying to load the program.

This shouldn't happen for correctly-built shared libraries (i.e. those
which have dependency information).

OTOH, static libraries don't have dependency information, so you need to
include indirect dependencies if you want to be able to build
statically-linked binaries for whatever reason.

why?... who knows, it just seems to work this way, unless there is a bug
somewhere?... (but, all of the other programs work, so this is likely
the intended design). for all I know it could also have something to do
with "-Wl,-rpath,..." as well (needed as I tend not to install my apps
into "/usr/bin" and "/usr/lib64" or similar, which is generally assumed
by the OS, rather than, say, Windows, which will check the current
directory followed by everything in the PATH variable).

The loader needs to be able to find the libraries. Apart from any
directories embedded in the binary via -rpath, it uses /etc/ld.so.cache
(generated by ldconfig based upon the contents of /etc/ld.so.conf) and the
LD_LIBRARY_PATH environment variable.

If you use non-standard locations, you may need to add those to
/etc/ld.so.conf. You also need to run ldconfig when adding new libraries.

But this is unrelated to whether you add -l switches for indirect
dependencies.

granted, I guess most people seem to use autoconf on Linux, which seems
to do this by itself (but I never really liked autoconf personally, and
it tends not really to work well on Windows anyways...).

autoconf is okay at what it does, but I draw the line there. I won't use
automake or libtool, as those tend to render the entire build process
unintelligible to anyone who isn't an automake/libtool guru. I expect
Makefiles to be human-readable.
 
Miles Bader

Nobody said:
autoconf is okay at what it does, but I draw the line there. I won't use
automake or libtool, as those tend to render the entire build process
unintelligible to anyone who isn't an automake/libtool guru. I expect
Makefiles to be human-readable.

I beg to differ. The human-readable file when using automake is
"Makefile.am", which is far, far, nicer (readable, maintainable, etc)
than a hand-written Makefile for anything but the most utterly trivial
cases. The language used by automake in Makefile.am is very nice.

As the process for generating Makefiles from Makefile.am is completely
automated and hands-free, there's no reason for users to ever look at
it.

_Very simple_ makefiles are generally pretty readable, but they very
quickly become a big mess of boilerplate. Automake solves this
problem by generating the boilerplate automatically, so that humans
never have to look at it.

[my general habit is to write straight Makefiles at the beginning, and
eventually when the complexity exceeds some threshold, change to
automake -- which is accompanied by a huge reduction in complexity and
size, but a big increase in functionality .... win win all 'round basically...]

-Miles
 
BGB

What? That sounds completely wrong (so if you think it isn't, please
give more detail).

Shared libraries in linux have dependencies, just like programs, so
the end program only needs to link with the libraries it uses
directly; any "transitive" dependencies are pulled in automatically.

well, that is what one would think, and what would seem sane (for a
sanely written loader), but for whatever reason (funky settings? bug?
...) it wasn't really working last I tried dealing with the issue.

this was on Fedora 13 x86-64, FWIW, and I doubt there was anything
seriously wrong with the install/..., or else, the thing probably
wouldn't boot up correctly.


it may well have to do with the fact that I use RPATH so that the PWD
is checked for dependencies, although I have little idea why that might
affect much, except that this is the only thing really "unusual" I think
is being done here.


FWIW, below is the Linux-specific portion of one of the Makefiles (for
one of the core libraries):
<--
BUILD=.
MKSUF=.lnx
LIBSUF=.a
DLLSUF=.so
EXESUF=.elf
OBJSUF=.o
DLLPF=lib

CC_OUTEXE=gcc -o
CC_OUTDLL=gcc -shared -o
CC_OUTOBJ=gcc -c -o
CC_OUTTOOL=gcc -o

EXE = .elf
BIN = .bin
O = .o
A = .a

CFLAGS=-Iinclude -I../include -g -pg
DLL_CFLAGS=$(CFLAGS) -DBGBDY_DLL
LDFLAGS=-g -pg -L. -lbgbdy -L.. -lbgbgc -lbgbdy -lbgbasm -lzpack -lm \
-Wl,-rpath,"$$ORIGIN"
DLL_LDFLAGS=-L.. -lbgbgc -lbgbasm -lzpack -lm -fPIC \
-Wl,-rpath,"$$ORIGIN"

include Makefile.inc
-->

and, for reference, the shared portion (the 'inc' file):
<--
# base/memory2_2$(OBJSUF) \
# base/int128$(OBJSUF) \

# base/dyll_frame$(OBJSUF) \
# base/dyll_frame_x86$(OBJSUF) \
# base/dyll_frame_x64$(OBJSUF) \
# base/dyc_rcit$(OBJSUF) \

OBJS=\
base/prng$(OBJSUF) \
base/int128$(OBJSUF) \
base/float128$(OBJSUF) \
\
base/vfile2$(OBJSUF) \
base/vfile2_dir$(OBJSUF) \
base/vfile2_zip$(OBJSUF) \
base/vfile2_zpak$(OBJSUF) \
base/inflate$(OBJSUF) \
base/deflate$(OBJSUF) \
\
base/netval$(OBJSUF) \
\
base/dyll_func$(OBJSUF) \
base/dyll_func2$(OBJSUF) \
base/dyll_func2_auto$(OBJSUF) \
base/dyll_sig$(OBJSUF) \
base/dyll_sig_arg$(OBJSUF) \
base/dyll_sig_flags$(OBJSUF) \
base/dyll_addr$(OBJSUF) \
base/dyll_sigcache$(OBJSUF) \
base/dyll_metapath$(OBJSUF) \
base/dyll_catch$(OBJSUF) \
base/dyll_thunk$(OBJSUF) \
base/dyll_typebox$(OBJSUF) \
base/dyll_mrbc2$(OBJSUF) \
base/dyll_iface$(OBJSUF) \
base/dyll_struct$(OBJSUF) \
\
base/sqlite3.c \
base/dyll_sql$(OBJSUF) \
\
base/ty_complex$(OBJSUF) \
base/ty_quat$(OBJSUF) \
base/ty_matrix$(OBJSUF) \
\
base/dy_wref$(OBJSUF) \
base/dy_strarith$(OBJSUF) \
base/dy_chan$(OBJSUF) \
base/dy_dytf$(OBJSUF) \
\
base/dy_method$(OBJSUF) \
base/dy_oo$(OBJSUF) \
base/dy_dyo$(OBJSUF) \
base/dy_access$(OBJSUF) \
\
base/dyc_class$(OBJSUF) \
base/dyc_lookup$(OBJSUF) \
base/dyc_hash$(OBJSUF) \
base/dyc_proto$(OBJSUF) \
base/dyc_struct$(OBJSUF) \
base/dyc_ns$(OBJSUF) \
base/dyc_array$(OBJSUF) \
base/dyc_api$(OBJSUF) \
base/dyc_dyii$(OBJSUF) \
\
base/dyc_jni_fcn$(OBJSUF) \
base/dyc_jni_iface$(OBJSUF) \
base/dyc_jvmti_fcn$(OBJSUF) \
base/dyc_jvmti_iface$(OBJSUF) \
\
base/dy_opr$(OBJSUF) \
base/dy_cons$(OBJSUF) \
base/dy_sparse$(OBJSUF) \
base/dy_smxl$(OBJSUF) \
base/dy_array$(OBJSUF) \
base/dy_string$(OBJSUF) \
\
base/dy_xml$(OBJSUF) \
base/dy_print$(OBJSUF) \
base/dy_vli$(OBJSUF) \
base/dys_binenc$(OBJSUF) \
base/dys_bindec$(OBJSUF) \
base/dys_parse$(OBJSUF) \
base/dys_print$(OBJSUF) \
base/dyx_parse$(OBJSUF) \
base/dyx_print$(OBJSUF) \
base/dyx_sbxe$(OBJSUF) \
\
base/dysh_console$(OBJSUF) \
base/dysh_shell$(OBJSUF) \
\
base/bgbdy_api$(OBJSUF) \
\
bgal/bgal_main$(OBJSUF) \
bgal/bgal_interp$(OBJSUF) \
bgal/bgal_neuron$(OBJSUF)

SRCS=$(OBJS:$(OBJSUF)=.c)

DEP_HDRS = \
include/bgbdy_auto.h \
include/bgbdy_autoi.h \
include/dyc_auto.h

all: libs apps

FORCE:

libs: $(DLLPF)bgbdy$(DLLSUF)
apps: dytst0$(EXESUF) tst_class0$(EXESUF) gatst0$(EXESUF)

includes: $(DEP_HDRS)

include/bgbdy_auto.h: autohead$(EXESUF) $(SRCS)
	./autohead$(EXESUF) -apionly BGBDY_API $(SRCS) > include/bgbdy_auto.h
include/bgbdy_autoi.h: autohead$(EXESUF) $(SRCS)
	./autohead$(EXESUF) -noapi BGBDY_API $(SRCS) > include/bgbdy_autoi.h

include/dyc_auto.h: autohead$(EXESUF) base/dyc_api.c base/dyc_array.c
	./autohead$(EXESUF) -api BGBDY_API base/dyc_api.c \
		base/dyc_array.c > include/dyc_auto.h

%$(OBJSUF): %.c
	$(CC_OUTOBJ)$@ $< $(CFLAGS)

# bgbdy.lib: $(DEP_HDRS) $(OBJS)
# lib /out:bgbdy.lib $(OBJS)

$(DLLPF)bgbdy$(DLLSUF): $(DEP_HDRS) $(SRCS)
	$(CC_OUTDLL)$(DLLPF)bgbdy$(DLLSUF) $(SRCS) $(DLL_CFLAGS) $(DLL_LDFLAGS)

autohead$(EXESUF): autohead.c
	$(CC_OUTTOOL)autohead$(EXESUF) autohead.c

dytst0$(EXESUF): dytst0.c $(DEP_HDRS) libs
	$(CC_OUTEXE)dytst0$(EXESUF) dytst0.c $(CFLAGS) $(LDFLAGS)

tst_class0$(EXESUF): tst_class0.c $(DEP_HDRS) libs
	$(CC_OUTEXE)tst_class0$(EXESUF) tst_class0.c $(CFLAGS) $(LDFLAGS)

gatst0$(EXESUF): gatst0.c $(DEP_HDRS) libs
	$(CC_OUTEXE)gatst0$(EXESUF) gatst0.c $(CFLAGS) $(LDFLAGS)

clean:
-->
 
Joshua Maurice

What?  That sounds completely wrong (so if you think it isn't, please
give more detail).

Shared libraries in linux have dependencies, just like programs, so
the end program only needs to link with the libraries it uses
directly; any "transitive" dependencies are pulled in automatically.

This isn't true for my experience of Linux with gcc. However, I have
seen this on one other platform where the linker would complain if you
didn't give it all transitive shared-obj (aka shared-lib aka dlls) on
the command line for some cases. HP-UX maybe?
 
Miles Bader

Joshua Maurice said:
This isn't true for my experience of Linux with gcc. However, I have
seen this on one other platform where the linker would complain if you
didn't give it all transitive shared-obj (aka shared-lib aka dlls) on
the command line for some cases. HP-UX maybe?

Here's an example:

# A very silly program (on Debian):

$ cat > lt.cc <<EOF
#include <ImfRgbaFile.h>
int main () { Imf::RgbaInputFile inf ("oink", Imf::WRITE_RGBA); }
EOF

# We only specify a single library at link-time, "libIlmImf":

$ g++ -o lt -I/usr/include/OpenEXR lt.cc -lIlmImf

# Here's the dependencies in the resulting ELF file:

$ readelf --dynamic lt | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libIlmImf.so.6]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]

# But if we look at the libraries actually _used_ at runtime:

$ ldd lt
linux-vdso.so.1 => (0x00007fff66a8d000)
libIlmImf.so.6 => /usr/lib/libIlmImf.so.6 (0x00007f063b738000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f063b42e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f063b1ab000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f063af95000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f063ac11000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f063a9f8000)
libImath.so.6 => /usr/lib/libImath.so.6 (0x00007f063a7f3000)
libHalf.so.6 => /usr/lib/libHalf.so.6 (0x00007f063a5b1000)
libIex.so.6 => /usr/lib/libIex.so.6 (0x00007f063a392000)
libIlmThread.so.6 => /usr/lib/libIlmThread.so.6 (0x00007f063a18b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0639f6f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f063ba20000)

# ... note there are a bunch of additional libraries!
#
# They are transitive dependencies, which came from the dependencies
# of "libIlmImf", the single library we linked with:

$ readelf --dynamic /usr/lib/libIlmImf.so | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libImath.so.6]
0x0000000000000001 (NEEDED) Shared library: [libHalf.so.6]
0x0000000000000001 (NEEDED) Shared library: [libIex.so.6]
0x0000000000000001 (NEEDED) Shared library: [libIlmThread.so.6]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]

-Miles
 
Joshua Maurice

Here's an example:

# A very silly program (on Debian):

[snip]

Indeed. As I said, that's how things work on Linux. However, I'm
pretty sure the situation is different for at least 1 other unix-like
variant. It's annoying when I add a new dependency and forget to
update the makefiles to add the transitive dependency to all places
where it's needed. I haven't really looked at why it is, just
begrudgingly accepted it and moved on. (I have made somewhat of a fuss
over broken build practices, but they tell me to ignore it and move
on, so meh.)
 
