Dennis Ritchie -- An Appreciation

James Kuyper

In C, typically functions that operate on a list will take a pointer
to an array of structures, and a count. There's then a loop from 0 to
count stepping through the array.

In the C++ standard template library, the standard way is to pass in two
iterators, one to the start and one to the end of the sequence. The
iterator is then incremented until it equals the end point.

You can write C that way, too; and the C++ standard library also
contains counted versions of most of the functions that take a range
delimited by two iterators. Example:

std::copy_n(in_array, size, out_array)
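A minimal sketch of the two calling styles, assuming C++11 (which is where std::copy_n was added). The helper names copy_by_range and copy_by_count are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Range form: the sequence is delimited by two iterators
// (plain pointers work as iterators).
std::vector<int> copy_by_range(const int* in_array, std::size_t size)
{
    std::vector<int> out(size);
    std::copy(in_array, in_array + size, out.begin());
    return out;
}

// Counted form: a start iterator plus an element count.
std::vector<int> copy_by_count(const int* in_array, std::size_t size)
{
    std::vector<int> out(size);
    std::copy_n(in_array, size, out.begin());
    return out;
}
```

Both helpers produce the same result; the only difference is how the end of the input is expressed.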
 
Nick Keighley

Let's say you examine your program, and find that there are no logical
subroutines you can break out of the main loop. Are you still doing
structured programming?



I don't find it does.

I find that typically my programs are structured as arrays of arrays
of arrays, mainly of structures at the upper levels.

what? How can your *program* be structured as "arrays of arrays...".
My programs are hierarchies of function calls. Nearly a tree but most
likely a DAG. Did you mean your data structures are structs containing
arrays? Seems very limited.

Where I need more
complex structures, quite often special links are needed for
performance, or for the algorithm. So in both cases the use of fancy
containers doesn't make much sense.

the STL containers have pretty good performance.

I'm not saying it wouldn't help if
there was a way of avoiding the little realloc dance when resizing a C
array, but this small benefit doesn't justify all the complexity in
interfacing that use of containers introduces.

*what* complication?

int a [10];
int b [10];
a[0] = 27;
a[1] = a[2];
memcpy (b, a, 10 * sizeof a[0]);

std::vector<int> v1(10);
std::vector<int> v2(10);
v1[0] = 27;
v1[1] = v1[2];
memcpy (&v2[0], &v1[0], 10 * sizeof v1[0]);

I don't see the problem.

I know that in the C++
stl you can do everything with iterators, the problem is that most
programmers aren't disciplined enough to use the system.

but you don't have to. Why does it matter if they don't?
 
Malcolm McLean

but you don't have to. Why does it matter if they don't?

Think of the data elements as plugs and the functions as sockets.

Not every plug will fit every socket. As you multiply plugs and
sockets, the system gradually becomes harder and harder to understand,
use, and change.

That's why the standard template library tries to be a sort of adapter
that will hold any plug. But the system only works if everyone writes
all their functions to take iterators. In practice, people don't, for
a variety of reasons. So you end up with an even more complicated
welter of plugs and sockets than you had before.
 
Malcolm McLean

On Oct 30, 8:07 am, Malcolm McLean wrote:

what? How can your *program* be structured as "arrays of arrays...".
My programs are hierarchies of function calls. Nearly a tree but most
likely a DAG. Did you mean your data structures are structs containing
arrays? Seems very limited.

The data in the program, not the code.

Most things naturally fall into arrays of arrays. For instance a
protein consists of an array of atoms, each of which has an element
type and an x y z position. The atoms are grouped into amino acid
residues. The residues are grouped into chains. Then you might be
working on more than one protein.

That's what most data is like. Arrays are by far the most commonly
used data structure in C. They're the only one which has explicit
syntactical support.

That's not to say you never use other structures. There's a case for
representing the bonds between atoms as a graph, for instance; you
might want to do that for some applications.
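A rough sketch of that nesting, written in C++ to match the other snippets in this thread. The struct names and the count_atoms helper are invented for illustration, and std::vector stands in for a C array plus a count.

```cpp
#include <cstddef>
#include <vector>

// Each level is simply an array of the level below it.
struct Atom    { int element; double x, y, z; };
struct Residue { std::vector<Atom> atoms; };
struct Chain   { std::vector<Residue> residues; };
struct Protein { std::vector<Chain> chains; };

// Walks the arrays of arrays and totals the atoms at the bottom.
std::size_t count_atoms(const Protein& protein)
{
    std::size_t total = 0;
    for (std::size_t c = 0; c < protein.chains.size(); ++c) {
        const Chain& chain = protein.chains[c];
        for (std::size_t r = 0; r < chain.residues.size(); ++r)
            total += chain.residues[r].atoms.size();
    }
    return total;
}
```

Working on more than one protein would add one more level: an array of Protein.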
 
nroberts

Not having scope tied destructors hurts. There are a whole lot of
methods for simplifying code through tying to scope that can't be done
without them. The gcc compiler has an extension though that allows
this in C. Of course it is quite obviously possible to go without
these things in C, especially since you're probably not using
exceptions (setjmp/longjmp is available but pretty rarely used), it
just makes things easier and more straightforward in some people's
opinions.

Some people like being able to create a variable that is guaranteed to
have a function called on it so you can fill that function with
important things like closing file handles, releasing resources,
etc... That way you don't have to remember to do so for every error
condition and in the right order, and only those parts you've
initiated, etc... Other people find the hiding of these details in an
interface like that makes code harder to understand and prefer the
verbosity of having to clean up properly for different conditions
within the block that deals with those conditions. Neither
perspective is wrong.
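A minimal sketch of the scope-tied cleanup described above. Resource and do_work are hypothetical names; the "resource" only appends to a log so that the order of events is visible, where real code would close a file handle or release a lock.

```cpp
#include <string>
#include <vector>

// Hypothetical resource: logs when it is acquired and released.
class Resource
{
public:
    Resource(const std::string& name, std::vector<std::string>& log)
        : name_(name), log_(log) { log_.push_back("acquire " + name_); }
    ~Resource() { log_.push_back("release " + name_); }
private:
    Resource(const Resource&);             // non-copyable (C++03 style)
    Resource& operator=(const Resource&);
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns false on the simulated error. On every path only what was
// acquired is released, in reverse order, with no cleanup code written here.
bool do_work(bool fail_early, std::vector<std::string>& log)
{
    Resource file("file", log);
    if (fail_early)
        return false;            // "file" is released on the way out
    Resource lock("lock", log);
    return true;                 // "lock", then "file", are released
}
```

The alternative style in the same paragraph would spell out each release call on each return path instead.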
Oh, please. At least two of the above tend to make code unreadable. I
mean, if you see the following line, for instance:

x = y++;

and the compiler doesn't warn on this line on highest level, then in C
we have a pretty limited set of possibilities, namely:

- x and y can both be arithmetic types
- x and y can both be pointers to the same type
- x can be a void* and y can be a typed pointer
- x or y can be preprocessor macros, though in this case that would
really be bad... oh wait, there's at least one instance of it in my
system library:

#define errno (*__errno_location())

C++ adds:
- x or y can be class instances, in which case:
- there might be a typeof y::operator++() which is then called
- there might be an operator++(typeof y) defined somewhere else
- there might be typeof x::operator=(typeof y&)
- there might be operator=(typeof x&, typeof y&)
- typeof x might have an implicit constructor taking a typeof y
- typeof x might have an implicit constructor taking whatever
  typeof y::operator++() or operator++(typeof y) return
- I don't know exactly about this one, but operator++(typeof y) might
  even return a class instance that has an operator(typeof x)(), IIRC
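To make one of the C++ cases in the list concrete, here is a small sketch in which the same line calls a user-defined postfix increment and then an implicit converting constructor. Counter, Reading and demo_assignment are hypothetical names, not taken from any real code.

```cpp
// Hypothetical type with a user-defined postfix ++.
struct Counter
{
    int value;
    explicit Counter(int v) : value(v) {}
    // Postfix increment: returns the old value, like the built-in form.
    Counter operator++(int) { Counter old(*this); ++value; return old; }
};

// Hypothetical type that is implicitly constructible from a Counter.
struct Reading
{
    int value;
    Reading() : value(0) {}
    Reading(const Counter& c) : value(c.value) {}   // implicit conversion
};

int demo_assignment()
{
    Counter y(41);
    Reading x;
    x = y++;   // Counter::operator++(int), then Reading(const Counter&),
               // then Reading's implicit assignment operator
    return x.value * 100 + y.value;   // packs both results into one int
}
```

Here the overloads keep the conventional meaning, so x still receives the old value and y is still incremented; nothing in the language forces that.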

All of these features are important and have their uses. They can all
be used badly. Poor design is poor design in C, C++, or any other
language. Consider this function:

void function_that_outputs_blob(blob* b);

If that function does something other than outputting a blob, whatever
that means, then people reading code that use it are not going to know
what that code does. This is exactly the same for using standard
function names like operator++ to mean something other than the
standard use or having conversion operators for types to which
conversion makes no sense.
- even if they aren't class instances, we have:
- x might be a reference to typeof y
- y might be a reference to something
- x might even reference y, which means that this line invokes
  undefined behaviour
- or y might reference x, which yields the same

These latter cases can all occur in C if the line is changed to:

*x += (*y)++;
As you can see, C++ blows this line all the way from "something will be
incremented and something else will store the old value" to "no friggin
idea what this line does! Might be anything. And I wouldn't even know
where to start looking"

Only if you're an idiot. I'm sorry, but it's simply true. If you
can't look at declarations of your x and y variables to see what type
they are then you are quite fucked in whatever language you choose to
be programming in.

You might of course be tempted to say something like, "But I shouldn't
have to go looking at the type to know what that line is doing." This
is of course a good argument but it is a good argument for variable
names that are more informative than 'x' and 'y'. Variable names need
to have some semantic meaning that explains what the variable is used
for. When that is done well you generally know what can be done with
the variable, which is more important than what type it is (which is
something I can let the compiler worry about).
As of yet I only saw one useful application of operator overloading and
references, and that was a typesafe printf() implementation (which
basically has the compiler choose the correct functions according to the
argument type and those functions then check whether the conversion
specification on the format string was correct).

Which is a pretty excellent example of great use of templates.

If that's the only example that you've seen though then you're
obviously not looking at a lot of C++.
Templates are yet another pothole to the learning programmer: Every
instantiation creates an entirely new and unrelated set of routines and
class variables.

As well it should. A list of widgets is not the same as a list of
blobbets. If they were then you'd instantiate the same template to
hold them both and your argument is moot.

C programmers tend to bring up this bloat "issue" a lot. It's not an
issue. In C++ you write a template to create new types. Templates
are like the preprocessor on steroids and fulfill many, but not all,
of the tasks originally reserved for macros. While I can write a
template for a container that works with any type in C++, in C I'm
stuck having to use void* or macros. The former might be a good
solution that doesn't introduce bloat, but it does require a lot of
care and may also introduce an unnecessary level of obscurity; it's
also no different from instantiating std::list<void*>. The other
option, macros, can get you a lot of the same thing, but they're MUCH
harder to use and are inherently unsafe.

If you want to write generic code that operates at the static type
level then you really need templates. Macros can almost get you there
but at a very high cost.
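A short sketch of the contrast between a template and the void* approach, assuming a simple "find the largest element" task. All three function names are invented for this example.

```cpp
#include <cstddef>

// Template version: the element type is checked at compile time.
template <typename T>
const T* find_largest(const T* items, std::size_t count)
{
    if (count == 0) return 0;
    const T* best = items;
    for (std::size_t i = 1; i < count; ++i)
        if (*best < items[i]) best = items + i;
    return best;
}

// void* version in the C style: the caller supplies the element size and
// a comparison callback, and nothing checks that they match the data.
const void* find_largest_untyped(const void* items, std::size_t count,
                                 std::size_t size,
                                 int (*less)(const void*, const void*))
{
    if (count == 0) return 0;
    const char* base = static_cast<const char*>(items);
    const char* best = base;
    for (std::size_t i = 1; i < count; ++i)
        if (less(best, base + i * size)) best = base + i * size;
    return best;
}

// Comparison callback for doubles, needed only by the void* version.
int less_double(const void* a, const void* b)
{
    return *static_cast<const double*>(a) < *static_cast<const double*>(b);
}
```

The void* version compiles to a single function for every element type, which is the "no bloat" side of the trade; the template version gets a separate instantiation per type but cannot be called with a mismatched size or callback.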

The one thing that can be said against templates is that they can be
hard to learn. The syntax isn't exactly optimal and the requirements
of when, where, and how to use the typename and template keywords can
be confusing. For example:

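template < typename T >
struct example
{
    typedef typename T::some_typedef local_typedef;

    template < typename U > static void fun();
};

// 'template' is needed when the class is a dependent type, as here:
template < typename T >
void call_fun()
{
    example<T>::template fun<double>();
}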

It's kind of confusing and ugly. Bloated? Depends on how much
depends on the templates' parameters. If none of it does, then yeah
it's going to be unnecessarily bloated. If most of it does then no,
it's going to be about as "bloated" as it has to be.

Of course, the generic programming paradigm, which is what templates
are pretty much about, is often foreign to someone stuck in the C
world and unwilling to expand their knowledge. This is no different
from any other tool though. If you don't want to learn these things,
that's fine, but your willing ignorance isn't a good argument.

I mean, if I have:
class A { static int count; };

then there's only one A::count in the whole program. If I have

template <class T> class A {static int count; };

suddenly there's no bound to the number of A::counts in the program,
because there is _no_ A::count, but instead A<int>::count,
A<long>::count, etc.

As well it should. If you mean to count the number of times a
particular class is instantiated then you SHOULD have a different
count for each class. Templates are not classes. They are
*templates* for creating classes...and other things. If you want to
count the number of times any class built with that template is
instantiated then you need to account for that with a different
structure. Perhaps something like so:

struct count_these { static int count; };
template < typename T > struct A { static count_these our_count; };

This also more accurately reflects the fact that you are counting many
different things in the same bucket.
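A runnable sketch of both behaviours, using hypothetical names (Tracked, SharedCount, TrackedTogether) rather than the A and count_these of the snippets above. The shared counter here lives in a non-template base class, which is one possible arrangement among several.

```cpp
// One counter per instantiation: Tracked<int>::count and
// Tracked<long>::count are unrelated variables.
template <typename T>
struct Tracked
{
    static int count;
    Tracked() { ++count; }
};
template <typename T> int Tracked<T>::count = 0;

// One counter shared by every instantiation, held in a non-template base.
struct SharedCount
{
    static int count;
};
int SharedCount::count = 0;

template <typename T>
struct TrackedTogether : SharedCount
{
    TrackedTogether() { ++count; }   // increments SharedCount::count
};
```

Creating two Tracked<int> objects and one Tracked<long> leaves the two per-type counts at 2 and 1, while objects of any TrackedTogether<T> all bump the same SharedCount::count.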
 
nroberts

Couple things on this...
In C, typically functions that operate on a list will take a pointer
to an array of structures, and a count. There's then a loop from 0 to
count stepping through the array.

Why would a C developer write a function that operates on a list but
takes an array?

In the C++ standard template library, the standard way is to pass in two
iterators, one to the start and one to the end of the sequence. The
iterator is then incremented until it equals the end point.

This is if you're writing function templates that need to operate on
generic containers. If you are writing a function that operates on a
particular instance you generally simply pass the container by
reference and use its members to get the information you need to
perform your algorithm.
The C++ way is considerably more complicated, both in terms of what
the compiler is actually doing, and to use. The proof is that it's
very common to see functions written "incorrectly".

Please explain the former claim. What is the compiler doing that is
more complicated here? I would tend to intuit that it would be
exactly the opposite, as the compiler would generally translate the
index version into the iterator version.

As to more complicated to use, I just don't see it. Can you give an
example of an incorrectly written function?
 
nroberts

Let's say you examine your program, and find that there are no logical
subroutines you can break out of the main loop. Are you still doing
structured programming?
I don't find it does.
I find that typically my programs are structured as arrays of arrays
of arrays, mainly of structures at the upper levels.

what? How can your *program* be structured as "arrays of arrays...".
My programs are hierarchies of function calls. Nearly a tree but most
likely a DAG. Did you mean your data structures are structs containing
arrays? Seems very limited.
Where I need more
complex structures, quite often special links are needed for
performance, or for the algorithm. So in both cases the use of fancy
containers doesn't make much sense.

the STL containers have pretty good performance.
I'm not saying it wouldn't help if
there was a way of avoiding the little realloc dance when resizing a C
array, but this small benefit doesn't justify all the complexity in
interfacing that use of containers introduces.

*what* complication?

   int a [10];
   int b [10];
   a[0] = 27;
   a[1] = a[2];
   memcpy (b, a, 10 * sizeof a[0]);

   std::vector<int> v1(10);
   std::vector<int> v2(10);
   v1[0] = 27;
   v1[1] = v1[2];
   memcpy (&v2[0], &v1[0], 10 * sizeof v1[0]);

Why would you do that?? You're much better off using assignment and
letting the implementation decide if memcpy is even the best thing to
do! You're getting no benefit by using it directly and only introduce
extra complexity and confusion by using the C API here.
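A minimal sketch of the alternative: plain assignment for a whole-vector copy, and std::copy when only part of a vector is wanted. The helper names copy_all and copy_front are made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Whole-container copy: assignment sizes the destination and copies
// the elements; no byte counts are involved.
std::vector<int> copy_all(const std::vector<int>& source)
{
    std::vector<int> destination;
    destination = source;
    return destination;
}

// Copies the first n elements of source over the start of destination.
// Assumes n <= source.size() and n <= destination.size().
void copy_front(const std::vector<int>& source,
                std::vector<int>& destination,
                std::vector<int>::size_type n)
{
    std::copy(source.begin(), source.begin() + n, destination.begin());
}
```

Both forms work element by element, so they stay correct if the element type later stops being trivially copyable.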
 
Ben Bacarisse

That's what most data is like. Arrays are by far the most commonly
used data structure in C. They're the only one which has explicit
syntactical support.

What makes this true of arrays and not of structs? I.e. I don't see how
you are defining "a data structure with explicit syntactical support".

<snip>
 
Ben Bacarisse

Malcolm McLean said:
Think of the data elements as plugs and the functions as sockets.

Not every plug will fit every socket. As you multiply plugs and
sockets, the system gradually becomes harder and harder to understand,
use, and change.

That's why the standard template library tries to be a sort of adapter
that will hold any plug. But the system only works if everyone writes
all their functions to take iterators. In practice, people don't, for
a variety of reasons. So you end up with an even more complicated
welter of plugs and sockets than you had before.

How does C solve this problem of having a great many data types and badly
written (i.e. overly specific) function interfaces?

<snip>
 
Malcolm McLean

How does C solve this problem of having a great many data types and badly
written (i.e. overly specific) function interfaces?

It doesn't really solve the problem. One of the most dangerous
features of C is the ability to typedef a basic type to something
like, say DWORD. Then you find yourself rewriting perfectly good code,
just because someone decided to put DWORDs where they really meant
"int", and the code no longer runs under the particular operating
system where DWORDs are used.

But it alleviates it, because it's easy to write a function that
operates on lists (a list is an ordered collection, usually of like
items) as taking an array and a count. It's hard to do anything
fancier, like wrapping the list into a structure with a "length"
member, creating a linked list, or semi-hardcoding the length of the
array with a preprocessor define. So the plugs might not fit the
sockets, but at least all the sockets are set up in a similar way.

Once you start allowing containers, that simplicity goes.
 
Malcolm McLean

   std::vector<int> v1(10);
   std::vector<int> v2(10);
   v1[0] = 27;
   v1[1] = v1[2];
   memcpy (&v2[0], &v1[0], 10 * sizeof v1[0]);


I don't see the problem.

This is classic badly-written C++. As always, the issue is hard to
illustrate in a snippet or toy example, but bites you in real code.

std::vector<int> grocery_ids(10);

/* deep down in the gubbins */
memcpy(&grocery_ids[0], toptensellers, 10 * sizeof(int));

Ten years later, the business expands. We have been giving each line
of groceries an id, but now we've reached the 2 billion mark.

the first line becomes

std::vector<BIGNUM> grocery_ids(10);

Now most of the code has been written using iterators, and is robust
to this. But your little routine, hidden away deep in the gubbins, is
now a bug.
 
Ian Collins

It doesn't really solve the problem. One of the most dangerous
features of C is the ability to typedef a basic type to something
like, say DWORD. Then you find yourself rewriting perfectly good code,
just because someone decided to put DWORDs where they really meant
"int", and the code no longer runs under the particular operating
system where DWORDs are used.

Conditionally typedef depending on the platform?

But it alleviates it, because it's easy to write a function that
operates on lists (a list is an ordered collection, usually of like
items) as taking an array and a count. It's hard to do anything
fancier, like wrapping the list into a structure with a "length"
member, creating a linked list, or semi-hardcoding the length of the
array with a preprocessor define. So the plugs might not fit the
sockets, but at least all the sockets are set up in a similar way.

Once you start allowing containers, that simplicity goes.

Once again, how? Is

void f( std::vector<int>& );

more complex than

void f( int*, size_t );

?

With a container, the size information is embedded, no need to pass a
count (which you would have to manually track). From where I sit, the
container removes at least two bits of complexity: you don't have to
track a size and you don't have to worry about a null pointer.
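A small sketch of the two interface styles side by side, assuming a simple "count the negative values" task. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// C-style interface: the caller tracks the count separately and the
// callee has to decide what a null pointer means (here: no elements).
std::size_t count_negative_c(const int* values, std::size_t count)
{
    if (values == 0)
        return 0;
    std::size_t negatives = 0;
    for (std::size_t i = 0; i < count; ++i)
        if (values[i] < 0) ++negatives;
    return negatives;
}

// Container interface: the size travels with the data, and a reference
// cannot be null.
std::size_t count_negative_vec(const std::vector<int>& values)
{
    std::size_t negatives = 0;
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i)
        if (values[i] < 0) ++negatives;
    return negatives;
}
```

The loop bodies are identical; the difference is only in what the caller has to keep track of and what the callee has to check.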
 
Ben Bacarisse

Malcolm McLean said:
It doesn't really solve the problem. One of the most dangerous
features of C is the ability to typedef a basic type to something
like, say DWORD. Then you find yourself rewriting perfectly good code,
just because someone decided to put DWORDs where they really meant
"int", and the code no longer runs under the particular operating
system where DWORDs are used.

But it alleviates it, because it's easy to write a function that
operates on lists (a list is an ordered collection, usually of like
items) as taking an array and a count. It's hard to do anything
fancier, like wrapping the list into a structure with a "length"
member, creating a linked list, or semi-hardcoding the length of the
array with a preprocessor define. So the plugs might not fit the
sockets, but at least all the sockets are set up in a similar way.

Once you start allowing containers, that simplicity goes.

Best just to say I disagree, in that this does not match my experience
with C++. In my limited experience, as the types proliferate, C++
starts to win out big time.

Some code might clarify things. You might want to sketch a situation
with lots of "plugs and sockets" which gets more complex when written in
C++.
 
Ben Bacarisse

Malcolm McLean said:
   std::vector<int> v1(10);
   std::vector<int> v2(10);
   v1[0] = 27;
   v1[1] = v1[2];
   memcpy (&v2[0], &v1[0], 10 * sizeof v1[0]);


I don't see the problem.

This is classic badly-written C++. As always, the issue is hard to
illustrate in a snippet or toy example, but bites you in real code.

std::vector<int> grocery_ids(10);

/* deep down in the gubbins */
memcpy(&grocery_ids[0], toptensellers, 10 * sizeof(int));

I agree it's bad C++ but it's bad C as well. You need

memcpy(&grocery_ids[0], toptensellers, 10 * sizeof grocery_ids[0]);

to avoid the most obvious problems down the line. If the two arrays
end up having different element types, the bug is much more serious than
just this memcpy. Here again, C++ offers a simple solution (use the
"counted" form of std::copy) where C does not.
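A sketch of that counted copy, assuming C++11 for std::copy_n. BigId is a hypothetical stand-in for the BIGNUM type in the example (here just long long), and load_top_sellers is an invented helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BIGNUM: any type an int converts to.
typedef long long BigId;

// Element-wise counted copy. It stays correct if the element type of the
// destination changes, as long as an int converts to the new type.
template <typename T>
void load_top_sellers(std::vector<T>& grocery_ids,
                      const int* top_sellers, std::size_t count)
{
    grocery_ids.resize(count);
    std::copy_n(top_sellers, count, grocery_ids.begin());
}
```

The same call compiles and behaves correctly for std::vector<int> and std::vector<BigId>, whereas a memcpy sized with sizeof(int) would silently copy the wrong number of bytes after the change.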
Ten years later, the business expands. We have been giving each line
of groceries an id, but now we've reached the 2 billion mark.

the first line becomes

std::vector<BIGNUM> grocery_ids(10);

Now most of the code has been written using iterators, and is robust
to this. But your little routine, hidden away deep in the gubbins, is
now a bug.

I'm still struggling to see where C++ adds problems rather than offering
solutions.
 
Malcolm McLean

Once again, how?  Is

void f( std::vector<int>& );

more complex than

void f( int*, size_t );

?

With a container, the size information is embedded, no need to pass a
count (which you would have to manually track).  From where I sit, the
container removes at least two bits of complexity: you don't have to
track a size and you don't have to worry about a null pointer.

You have to use std::vectors everywhere with the C++ version, if you
want to call f. You have to use arrays of ints everywhere with the C
version if you want to do the same.

The C situation will happen, the C++ situation won't.

Snippets don't really illustrate the problem very well. Computer
programs begin to break down when complexity gets beyond a certain
level, and a human can no longer keep track of all the variables and
types within the program.

Tagging an array with a size member is a genuine advantage of a
vector. The cost is that it becomes harder to see what the compiler is
doing. Another problem is that, often, code ends up being written with
vect.size() rather than N, which makes the expressions unreadable.
Loss of the null pointer isn't such an advantage. I've seen C++ code
with "null objects" to get round the problem that sometimes things are
missing and we need to express "not there".
 
Ian Collins

You have to use std::vectors everywhere with the C++ version, if you
want to call f. You have to use arrays of ints everywhere with the C
version if you want to do the same.

The C situation will happen, the C++ situation won't.

Eh? If you want to call a function that requires a specific type, you
have to use that type irrespective of the language.

Snippets don't really illustrate the problem very well. Computer
programs begin to break down when complexity gets beyond a certain
level, and a human can no longer keep track of all the variables and
types within the program.

Which is where encapsulation is the programmer's friend, it helps to
reduce the complexity exposed to the programmer.

Tagging an array with a size member is a genuine advantage of a
vector. The cost is that it becomes harder to see what the compiler is
doing.

In all but the most basic situations, the programmer lost track of what
the compiler is doing when compilers grew optimisers.

Another problem is that, often, code ends up being written with
vect.size() rather than N, which makes the expressions unreadable.

You've lost me there, what is N? If it is the size of an array,
doesn't all the additional code used to track it do far more to make the
code hard to follow? container.size() is very idiomatic in C++, so its
use doesn't cause any readability issues. If you do want to simplify an
expression, just add a "const size_t n = vect.size();" before the
expression. This is no worse than keeping tabs on an array's length -
you just keep tabs at the last moment!
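A minimal sketch of that idiom, using a population variance as the expression to be kept readable. The function name variance is invented here, and the code assumes a non-empty vector.

```cpp
#include <cstddef>
#include <vector>

// Population variance; assumes vect is not empty.
double variance(const std::vector<double>& vect)
{
    const std::size_t n = vect.size();   // name the size once, then use n

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += vect[i];
    const double mean = sum / n;

    double squares = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        squares += (vect[i] - mean) * (vect[i] - mean);
    return squares / n;
}
```

The formulas read the same as they would with a separately tracked N, but n cannot drift out of step with the data.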
Loss of the null pointer isn't such an advantage. I've seen C++ code
with "null objects" to get round the problem that sometimes things are
missing and we need to express "not there".

You'll see kludges in any language! Null objects are an abomination and
anyone using them should be tarred and feathered by their peers.
 
Malcolm McLean

Some code might clarify things.  You might want to sketch a situation
with lots of "plugs and sockets" which gets more complex when written in
C++.

I'm not sure I've even got the syntax right, it's so long since I did
this, but the root of the problem goes something like this

#include <vector>

/* the tried and tested function; its body was left empty in the sketch */
template <class numeric>
numeric foo(std::vector<numeric> &list)
{
    return numeric();
}

template <class numeric, class Iterator>
numeric standard_deviation(Iterator begin, Iterator end)
{
    Iterator it = begin;
    while (it != end)
    {
        /* code here */
        ++it;
    }
    /* There's something wrong with our standard deviation, so I want
       to call a tried and tested function to get the mean, which is foo() */
    /* The problem is that, even though under the bonnet I've just been
       passed a double *, the only way of calling foo() is to construct a
       temporary vector */
    std::vector<numeric> temp(begin, end);
    numeric mean = foo(temp);
    return mean; /* placeholder: the rest of the calculation is elided */
}

In C, you can have the problem that we might have floats or doubles or
long doubles. The whole point of the template idea is that you only
have to write the routines once. But you need iron tight discipline to
achieve this.
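For comparison, a sketch of what the "disciplined" version might look like: if the mean is also written against iterators, the standard deviation can call it with whatever range it was handed and no temporary vector is needed. The names range_mean and range_stddev are hypothetical, and the code assumes a non-empty range that can be traversed more than once.

```cpp
#include <cmath>

// Hypothetical iterator-based mean (a stand-in for foo()).
// Assumes a non-empty, multi-pass range.
template <class Iterator>
double range_mean(Iterator begin, Iterator end)
{
    double sum = 0.0;
    long count = 0;
    for (; begin != end; ++begin) {
        sum += *begin;
        ++count;
    }
    return sum / count;
}

// Population standard deviation over the same kind of range.
template <class Iterator>
double range_stddev(Iterator begin, Iterator end)
{
    const double mean = range_mean(begin, end);   // no temporary vector
    double squares = 0.0;
    long count = 0;
    for (Iterator it = begin; it != end; ++it) {
        squares += (*it - mean) * (*it - mean);
        ++count;
    }
    return std::sqrt(squares / count);
}
```

Both functions accept a plain double* pair as well as container iterators. This only helps, of course, when the tried and tested function can be rewritten or already takes iterators, which is the discipline being questioned.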
 
Nick Keighley

Not having scope tied destructors hurts.  There are a whole lot of
methods for simplifying code through tying to scope that can't be done
without them.  The gcc compiler has an extension though that allows
this in C.  Of course it is quite obviously possible to go without
these things in C, especially since you're probably not using
exceptions (setjmp/longjmp is available but pretty rarely used), it
just makes things easier and more straightforward in some people's
opinions.

Some people like being able to create a variable that is guaranteed to
have a function called on it so you can fill that function with
important things like closing file handles, releasing resources,
etc...

in C++ RAII it's considered better style not to do too many things in
the destructor. So you'd have a class to hold the file handle and a
class for each resource that needed special handling. Then your super
class doesn't need any explicit code in the destructor.
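A sketch of that style, with one small wrapper per resource and an owning class that has no destructor of its own. Handle and Session are hypothetical names, and "closing" is simulated by appending to a log so the order can be observed.

```cpp
#include <string>
#include <vector>

// Hypothetical single-purpose wrapper: owns exactly one resource and
// releases it in its destructor.
class Handle
{
public:
    Handle(const std::string& name, std::vector<std::string>& log)
        : name_(name), log_(log) {}
    ~Handle() { log_.push_back("close " + name_); }
private:
    Handle(const Handle&);               // non-copyable (C++03 style)
    Handle& operator=(const Handle&);
    std::string name_;
    std::vector<std::string>& log_;
};

// The owning class needs no destructor: its members are destroyed
// automatically, in reverse order of declaration.
class Session
{
public:
    explicit Session(std::vector<std::string>& log)
        : file_("file", log), socket_("socket", log) {}
private:
    Handle file_;
    Handle socket_;
};
```

When a Session goes out of scope, the socket is closed first and the file second, without any code in Session saying so.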
 That way you don't have to remember to do so for every error
condition and in the right order, and only those parts you've
initiated, etc...  Other people find the hiding of these details in an
interface like that makes code harder to understand and prefer the
verbosity of having to clean up properly for different conditions
within the block that deals with those conditions.  Neither
perspective is wrong.








All of these features are important and have their uses.  They can all
be used badly.  Poor design is poor design in C, C++, or any other
language.  Consider this function:

void function_that_outputs_blob(blob* b);

If that function does something other than outputting a blob, whatever
that means, then people reading code that use it are not going to know
what that code does.  This is exactly the same for using standard
function names like operator++ to mean something other than the
standard use or having conversion operators for types to which
conversion makes no sense.

this is why I rarely overload operators (except for << >> for i/o
streams). it's pretty much best to confine operator overloading to
arithmetic types.

These latter cases can all occur in C if the line is changed to:

*x += (*y)++;


Only if you're an idiot.  I'm sorry, but it's simply true.  If you
can't look at declarations of your x and y variables to see what type
they are then you are quite <expletive> in whatever language you choose to
be programming in.

I'm with you up to here.
You might of course be tempted to say something like, "But I shouldn't
have to go looking at the type to know what that line is doing."

sounds pretty crazy to me
 This is of course a good argument

no it isn't

but it is a good argument for variable


noooooo! This way lies the Hungarian Lunacy. If your functions are
small and localised then the variable will be defined close by and we
can see what type it is.

void refrog (MyType x, MyType y)
{
x = y++;
}

Variable names need
to have some semantic meaning that explains what the variable is used
for.

no. really really no.

 When that is done well you generally know what can be done with
the variable, which is more important than what type it is (which is
something I can let the compiler worry about).

a well chosen type tells you the semantics of the type

operator<< does something like this with io streams.

operator()() allows you define classes that behave like functions
operator=() sometimes has to be defined
operator==() is often nice
conversion operators can be useful if used carefully
Which is a pretty excellent example of great use of templates.

If that's the only example that you've seen though then you're
obviously not looking at a lot of C++.

give some examples then. Arithmetic types need them but what else?
string concatenation. And?


so does copy-pasting, which is the poor man's template

As well it should.  A list of widgets is not the same as a list of
blobbets.  If they were then you'd instantiate the same template to
hold them both and your argument is moot.

C programmers tend to bring up this bloat "issue" a lot.  It's not an
issue.  In C++ you write a template to create new types.  Templates
are like the preprocessor on steroids

they make your balls shrink?

and fulfill many, but not all,
of the tasks originally reserved for macros.

gah. For the /misuses/ of macros

<snip good stuff>
 
Nick Keighley

In C, typically functions that operate on a list will take a pointer
to an array of structures, and a count. There's then a loop from 0 to
count stepping through the array.

you can do the same with a C++ vector except you don't have to
explicitly pass the size.
 
Nick Keighley

On Oct 30, 4:41 am, Malcolm McLean wrote:

Why would a C developer write a function that operates on a list but
takes an array?

people use the term "list" in "shopping list" mode. They mean a
collection of things (possibly ordered) not necessarily the CS term
"linked list". In a sense they are being /more/ abstract!
 
