Comparison of C Sharp and C performance


Moi

The unanswerable question:

Did he do that because he's fundamentally dishonest, and wanted to rig
the test to look bad for C, or did he do that because he doesn't
understand the algorithm well enough to do better?


I don't know, I don't say, I don't care.
Any possible answer would be off-topic.
Let everybody decide for themselves how to deal with pedant trolls.

BTW: Spinoza's handling of zeros used as NULL-values inside the hash table
is wrong, too. Could have been handled by some clever bit-cramming.
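
One way such bit-cramming might look (a sketch, not the posted code;
the names are invented): keep a separate occupancy bitmap beside the
table, one bit per slot, so that 0 stays a perfectly legal key.

#include <stddef.h>

/* Sketch: occupancy bits packed 8 slots per byte, so no key value
   has to be sacrificed as an "empty" marker. */
struct table {
    size_t size;
    int *slots;              /* stored keys; 0 is allowed              */
    unsigned char *used;     /* (size + 7) / 8 bytes of occupancy bits */
};

static int slot_used(const struct table *t, size_t i)
{
    return (t->used[i >> 3] >> (i & 7)) & 1;
}

static void mark_used(struct table *t, size_t i)
{
    t->used[i >> 3] |= (unsigned char)(1u << (i & 7));
}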

AvK
 

spinoza1111

I think it is much worse than that.  The C# code can choose the table
size for good performance (and I can't imagine it would not).  In the
C version, the table was chosen to be as bad as it could be without
failing (i.e. exactly the same size as the number of numbers being put
into the set).

That is not the case, and part of the problem is that you think so
normatively. In a toxic and pseudo-scientific corporate environment,
where (in Adorno's words) "all scientists are candidates for posts",
there's no real problem solving, just a continual (and futile) effort
to establish and destroy credibility.

You are correct in saying that the C version becomes very slow as the
number of table entries approaches the size of the table. I've already
noted that this is a problem, and how to solve it by using a linked
list originating at each entry.
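
A minimal sketch of that chaining idea in C (the names are invented;
error handling is kept to the bare minimum):

#include <stdlib.h>

/* Separate chaining: each bucket heads a linked list, so the table can
   hold more entries than it has buckets, and a lookup costs on average
   the length of one chain. */
struct node {
    int key;
    struct node *next;
};

struct chained_table {
    size_t nbuckets;
    struct node **buckets;     /* list heads, all NULL at start */
};

/* Returns 1 if inserted, 0 if already present, -1 on allocation failure. */
int chained_insert(struct chained_table *t, int key)
{
    size_t b = (size_t)key % t->nbuckets;
    struct node *n;

    for (n = t->buckets[b]; n != NULL; n = n->next)
        if (n->key == key)
            return 0;

    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = t->buckets[b];   /* push onto this bucket's chain */
    t->buckets[b] = n;
    return 1;
}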

However, the two programs demonstrate my point. C more or less forces
the C programmer to decide on the maximum table size, and therefore
the test, which filled up the table, was realistic, since tables often
fill up in production.

Whereas in a completely encapsulated way the C Sharp program either
preallocated more than enough storage or automatically expanded the
table. We don't know what it did because we do not need to know. In
production, the only responsibility of the user of HashSet is to put
additions in try..catch error handling.

Giving C this type of flexibility would involve a lot of extra coding,
or visiting some creepy site.

If it becomes necessary to delete from the C hash table, and the basic
"search forward to available entry" method actually implemented is in
use, a special value is needed to mark deleted entries, since entries
that match the hash code need to be retrieved past the deleted entry.
If linked lists are used, they need to be searched to find the entry
to be deleted. Whereas in the C Sharp code, deletion is one line of
additional code!
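
Roughly, the deletion case under linear probing has to look like this
(a sketch with invented names; EMPTY and DELETED are sentinel values
assumed never to occur as real keys):

#include <stddef.h>

#define EMPTY   (-1)    /* slot never used                      */
#define DELETED (-2)    /* tombstone: slot was used, then freed */

/* Mark key as deleted in an open-addressed table of 'size' ints.
   The slot becomes DELETED rather than EMPTY so that probes for keys
   that originally collided with it keep searching past this slot. */
int probe_delete(int *table, size_t size, int key)
{
    size_t i = (size_t)key % size;
    size_t probes;

    for (probes = 0; probes < size; probes++) {
        if (table[i] == EMPTY)
            return 0;              /* not present */
        if (table[i] == key) {
            table[i] = DELETED;    /* leave a tombstone */
            return 1;
        }
        i = (i + 1) % size;        /* linear probing */
    }
    return 0;
}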

Therefore, the benchmark numbers are realistic. A simple C hash table,
implemented in the simplest fashion without a linked list, will
probably run much slower than a C Sharp hash table application even
though it takes much more coding effort. It will be much harder to
maintain.

C sucks, and it's not "efficient" in practice.
 

spinoza1111

...



Hot, sexy, broken.  Full employment.  C is good.

Seriously, does anyone not see just how bad the economy could get if
software actually worked?

I do see that it is getting really bad, precisely because more and
more shops are using much better software written for free by people
who don't even know they're virtual slaves. Unemployment in the USA is
at 10.2% and much higher for minorities.

Add to this the fact that businesses don't care about software safety,
and the employment picture is grim, indeed.
 

spinoza1111

...



Hot, sexy, broken.  Full employment.  C is good.

Seriously, does anyone not see just how bad the economy could get if
software actually worked?

You (or you and I) should write a book about bad software and
programmer stupidity called "hot, sexy, and broken" with a bimbo in a
bikini on the cover. It'd sell like hotcakes, like "Hot Naked Chicks
and World News Report" in Idiocracy.
 

spinoza1111

I think it is much worse than that.  The C# code can choose the table
size for good performance (and I can't imagine it would not).  In the
C version, the table was chosen to be as bad as it could be without
failing (i.e. exactly the same size as the number of numbers being put
into the set).

There is no evidence that any reasonable comparison is being
attempted here.

To perform the same task, the obvious C solution would use a bit
array, since the number range in the C# code was quite small, but I
suspect that was an arbitrary choice.
Might be. But the problem is that the optimal solution is conceivable
only by a person who thinks in C all the time. I submit that this
stunts the mind even as a programmer, not to mention humanity. The
sort of crap that's posted here indicates this.

I can concede that a Troglodyte who overspecializes in C could write
code that operates in a fraction of the C Sharp time, for the same
reason that assembler can, in the last analysis, run in a fraction of
the time of even C; the distributed software that calculates large
primes on thousands of computers all over the world, using spare
cycles on volunteer computers, seems to be written in assembler.

This shows the ultimate vanity of C; it pretends to be as fast as
assembler but is not.

But in either assembler or C, a level of commitment and intensity
consistently destroys the programmer's humanity save in rare cases of
true genius, of a sort which isn't found here, and which itself (like
Grigory Perelman, the Russian mathematician who left the field after
refusing the Fields Medal) is smart enough to recognize an expense of
spirit in a waste of shame when it sees it.
 

spinoza1111

In your C example, you are using linear probing combined
with a fill factor of 100%. That is a bad choice, IMO.

I explained that already as a problem. I said that the average search
time slows down as the table fills up. I've known this for nearly
forty years, since I read it in Knuth in 1971. I said (did you miss
it) that you need to use linked lists based at each entry to get K*C
time where C is the average number of collisions.

The problem is, as I've said, that the C program is more
psychologically complex and harder to maintain than C Sharp code which
does the same thing better.
Also, beware of rand(). It is guaranteed to deliver only 15 bits of
randomness. Implementations _may_ produce a larger range of values.

I am also aware of the limitations of rand(). Indeed, many defenses of
C snigger at its obvious facilities and recommend arcana, some of
which is available from creepy sites. But this is completely absurd,
because one of the major reasons for using a programming language is
clarity.

Of course, the hatred of clarity, combined with a basic
misunderstanding of the very word, is on display in the two documents
that started the Schildt canard: they call him "clear" (which means
he's correct by implication) and then tear into him...for being
insufficiently unclear, and for failing to tell C programmers what Thou
Shalt Not like a proper *ayatollah* or *imam* of Holy C.
 

spinoza1111

I think it is much worse than that.  The C# code can choose the table
size for good performance (and I can't imagine it would not).  In the
C version, the table was chosen to be as bad as it could be without
failing (i.e. exactly the same size as the number of numbers being put
into the set).

There is no evidence that any reasonable comparison is being
attempted here.

To perform the same task, the obvious C solution would use a bit array

The problem is that, like most "obvious C solutions", a bit array
doesn't scale. Sure, it's cheesily "elegant", in the small and to
small minds, to allocate a huge bit array and index into it directly,
since the space taken would be 1/32 of an array of longs.
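
For the record, the bit-array version under discussion is only a few
lines (a sketch; keys assumed to lie in 0..RANGE-1):

#include <limits.h>

#define RANGE 1000                        /* keys assumed in 0..RANGE-1 */
#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

static unsigned long bits[(RANGE + WORD_BITS - 1) / WORD_BITS];

/* One bit per possible key: this is where the "1/32 of an array of
   longs" figure comes from on a machine with 32-bit longs. */
static void bit_insert(unsigned key)
{
    bits[key / WORD_BITS] |= 1ul << (key % WORD_BITS);
}

static int bit_member(unsigned key)
{
    return (bits[key / WORD_BITS] >> (key % WORD_BITS)) & 1;
}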

But if the key is changed to a more realistic range of values (such as
an alphanumeric symbol), the bit vector gets too large. You're back
where you started, and this often happens when the user initially says
"the key is a number" and later says "it's a symbol".

Whereas the hash table scales nicely. It would even be better to roll
your own hash class, one that allows the user to give it an idea of
the key range on instance creation. Then, that class could use a bit
map as appropriate.
 

spinoza1111

The unanswerable question:

Did he do that because he's fundamentally dishonest, and wanted to rig the
test to look bad for C, or did he do that because he doesn't understand
the algorithm well enough to do better?

Peter, when you make the issue into personalities, you enable the
destruction of this newsgroup. That's because people have the right to
defend themselves.

Like most code posted here, the code was written quickly for
discussion by educated professionals, who've studied computer science
in a non-corporate environment or somehow acquired collegial habits
and basic decency.

A more tuned C version (such as Ben's bit map) doesn't scale. It is of
course good to know that inserting hash table entries after the
collision creates slow performance, and it was I who said this first.
But we need to compare programs of roughly equal psychological
complexity.

Where's your code, Scripto Boy?

Shove it up your ass, fella.
 

spinoza1111

In any event, it's a red herring; Mono exists to make people think C# is
an open language, when it is in fact nothing of the sort.  It's a pure
deception, so far as I can tell.

C# is an open language. It had to be, in order to become an ECMA
standard. It is being changed and improved rapidly but in a controlled
fashion, so we won't have to worry about "standardizing" crap.
 

spinoza1111

spinoza1111 wrote:

Broad-based statement backed by a very narrow example.

Intelligent students can see that the square of the hypotenuse is
equal to the sum of the squares of the opposing sides based on one
diagram. Likewise, I show that for approximately equal levels of
effort, you get a much more efficient, powerful and easier to maintain
piece of code.
 

Tom St Denis

A more tuned C version (such as Ben's bit map) doesn't scale. It is of
course good to know that inserting hash table entries after the
collision creates slow performance, and it was I who said this first.
But we need to compare programs of roughly equal psychological
complexity.

One thing you repeatedly fail to grasp is the concept of APIs. You're
not comparing C# to C, you're comparing C# with a tuned hash library
to C with a poorly implemented hash algorithm. You also fail at
computer science in general.

For one thing, your C# hash array is NOT bounded at 1000 entries.
You're trying to add 1000 entries to a hash array with 1000 spots
which means you're getting a lot of collisions and doing many
sequential searches [for empty spots]. That could easily explain the
time difference. Generally if you have an unbounded input size you'd
use a tree not a hash. Hashes are useful when you know you have at
most X symbols, then you can easily allocate some Y >> X space and
hash collisions should be low.

But don't let this stop you.

Tom
 

Tom St Denis

You are correct in saying that the C version becomes very slow as the
number of table entries approaches the size of the table. I've already
noted that this is a problem, and how to solve it by using a linked
list originating at each entry.

The solution is not to use linked lists [at least not that way].
Either use a larger table or a tree. Generally for unbounded inputs a
tree is a better idea as it has better properties on the whole (as the
symbol set size grows...).
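
For unbounded input, a tree version also needs very little code if the
POSIX tsearch() family (mentioned further down) is acceptable; a sketch
along those lines:

#include <search.h>    /* tsearch/tfind: POSIX, in GNU libc, not ISO C */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    static int keys[1000];
    void *root = NULL;                       /* empty tree */
    int i;

    for (i = 0; i < 1000; i++) {
        keys[i] = rand();
        tsearch(&keys[i], &root, cmp_int);   /* insert or find */
    }

    /* membership test: tfind returns NULL when the key is absent */
    printf("%s\n", tfind(&keys[0], &root, cmp_int) ? "found" : "missing");
    return 0;
}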
However, the two programs demonstrate my point. C more or less forces
the C programmer to decide on the maximum table size, and therefore
the test, which filled up the table, was realistic, since tables often
fill up in production.

No, you could grow the table at some threshold, reimport existing
symbols into the new hash. A naive approach would just add another X
symbol spots every time you get more than [say] 80% full.
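
A rough sketch of that threshold-and-rehash idea (names invented; keys
assumed nonnegative so that -1 can mark an empty slot):

#include <stdlib.h>

#define EMPTY (-1)

struct htab {
    size_t size, count;    /* count is bumped by the caller on insert */
    int *slots;
};

static void probe_insert(int *slots, size_t size, int key)
{
    size_t i = (size_t)key % size;
    while (slots[i] != EMPTY && slots[i] != key)
        i = (i + 1) % size;     /* linear probing */
    slots[i] = key;
}

/* Grow when more than 80% full: allocate a larger array and rehash
   every existing entry into it. Returns 0 on success, -1 on failure. */
static int maybe_grow(struct htab *t)
{
    size_t i, newsize;
    int *newslots;

    if (t->count * 5 <= t->size * 4)   /* at or below 80% full */
        return 0;

    newsize = t->size * 2;
    newslots = malloc(newsize * sizeof *newslots);
    if (newslots == NULL)
        return -1;
    for (i = 0; i < newsize; i++)
        newslots[i] = EMPTY;
    for (i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY)
            probe_insert(newslots, newsize, t->slots[i]);
    free(t->slots);
    t->slots = newslots;
    t->size = newsize;
    return 0;
}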

Most of the time when a hash is preferable over a tree [or other
structure] is when the symbol set is of known dimensions. E.g. you're
writing a lexer for a parser for a language.
Whereas in a completely encapsulated way the C Sharp program either
preallocated more than enough storage or automatically expanded the
table. We don't know what it did because we do not need to know. In
production, the only responsibility of the user of HashSet is to put
additions in try..catch error handling.

And you couldn't have a hash_add(), hash_find() function in C that
does all of this hash table manipulations [like done in C#] in C
because....?

The same algorithm(s) that C# uses could be implemented in a C library
[I'll bet they exist all over the net].
Giving C this type of flexibility would involve a lot of extra coding,
or visiting some creepy site.

Or doing a 2 second google search

http://www.gnu.org/s/libc/manual/html_node/Hash-Search-Function.html

Wow. That was hard.
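
The interface behind that link boils down to three calls. A minimal
usage sketch (hcreate/hsearch/hdestroy are POSIX, shipped with GNU
libc; they are not part of ISO C):

#include <search.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    ENTRY item, *found;
    static long value = 42;

    if (hcreate(2000) == 0) {             /* room for ~1000 keys at 50% load */
        perror("hcreate");
        return EXIT_FAILURE;
    }

    item.key = "example";                 /* hsearch keys are strings */
    item.data = &value;
    if (hsearch(item, ENTER) == NULL) {   /* insert */
        perror("hsearch");
        return EXIT_FAILURE;
    }

    found = hsearch(item, FIND);          /* lookup */
    printf("%s -> %ld\n", found->key, *(long *)found->data);

    hdestroy();
    return 0;
}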
If it becomes necessary to delete from the C hash table, and the basic
"search forward to available entry" method actually implemented is in
use, a special value is needed to mark deleted entries, since entries
that match the hash code need to be retrieved past the deleted entry.
If linked lists are used, they need to be searched to find the entry
to be deleted. Whereas in the C Sharp code, deletion is one line of
additional code!

hash_delete()

You really need to learn what functions are.
Therefore, the benchmark numbers are realistic. A simple C hash table,
implemented in the simplest fashion without a linked list, will
probably run much slower than a C Sharp hash table application even
though it takes much more coding effort. It will be much harder to
maintain.

C sucks, and it's not "efficient" in practice.

So because you suck at software development and computer science in
general, C sucks.

When you're comparing equivalent algorithms maybe you might have a
point. Until then this is just further demonstration that there are
people in the world stupider than I.

Tom
 

Tom St Denis

Intelligent students can see that the square of the hypotenuse is
equal to the sum of the squares of the opposing sides based on one
diagram. Likewise, I show that for approximately equal levels of
effort, you get a much more efficient, powerful and easier to maintain
piece of code.

Which is of course dishonest since a quick google search turns up all
sorts of hashing code, some of which is PART OF GNU LIBC.

Tom
 

Tom St Denis

I explained that already as a problem. I said that the average search
time slows down as the table fills up. I've known this for nearly
forty years, since I read it in Knuth in 1971. I said (did you miss
it) that you need to use linked lists based at each entry to get K*C
time where C is the average number of collisions.

You've been in software for 40 years (1971 was 38 years ago btw...)
and it didn't occur to you to use a binary tree?
The problem is, as I've said, that the C program is more
psychologically complex and harder to maintain than C Sharp code which
does the same thing better.

man hsearch
man bsearch
man tsearch

Sure those are not part of C99, but they're part of GNU libc, which a
lot of people have access to. There are standalone data management
libraries out there.

You're being obtuse to think that people haven't worked on these
problems and that ALL developers must code their own solutions from
scratch.
I am also aware of the limitations of rand(). Indeed, many defenses of
C snigger at its obvious facilities and recommend arcana, some of
which is available from creepy sites. But this is completely absurd,
because one of the major reasons for using a programming language is
clarity.

rand() is useful for simple non-sequential numbers. If you need a
statistically meaningful PRNG, use one. I'd hazard a guess C# is no
different in its use of an LCG anyway, so I wouldn't be so apt to
praise it.
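
If 15 bits is not enough, the portable workaround is simply to combine
calls; a sketch, not a statistically serious generator:

#include <stdlib.h>

/* rand() is only guaranteed to produce values up to 32767 (15 bits).
   Two calls glued together give roughly 30 bits. */
static unsigned long rand30(void)
{
    return ((unsigned long)(rand() & 0x7FFF) << 15) |
            (unsigned long)(rand() & 0x7FFF);
}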

You could easily write an API that had functions like

hash_add(table, key, value);
value = hash_search(table, key);
hash_remove(table, key);

Why would that be so much harder than your class based methods from
C#?
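
To make that concrete, such an interface could be declared along these
lines (the names are hypothetical, not an existing library):

/* hash.h -- sketch of the kind of interface being described.  The
   struct is opaque, so the caller neither knows nor cares whether the
   table chains, probes, or resizes itself, exactly as with HashSet. */
#ifndef HASH_H
#define HASH_H

#include <stddef.h>

typedef struct hash_table hash_table;      /* opaque to callers */

hash_table *hash_create(size_t expected_entries);
int         hash_add(hash_table *t, int key, void *value);    /* 0 or -1        */
void       *hash_search(hash_table *t, int key);              /* NULL if absent */
int         hash_remove(hash_table *t, int key);              /* 0 or -1        */
void        hash_destroy(hash_table *t);

#endif /* HASH_H */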

Tom
 

Tom St Denis

Which is of course dishonest

I think Hanlon's Razor applies -- never ascribe to malice that which can
be adequately explained by stupidity.

In this case I don't think it's that he can't perform a google search
[i.e., that he's stupid]; rather, I think he's trying to troll to a
point and is shying away from doing any actual research in his quest
to prove his point, because the truth of the matter is he's full of
sh!#. He tried to compare two completely different algorithms in an
attempt to prove that C# is better than C. Which, even if it were
true, wouldn't matter in the slightest.

This is like comparing a 1L diesel Mini to a 5L hemi and saying the
Mini gets better mileage therefore diesel is better than gasoline.

If he actually researched the hash implementation in Mono, then coded
up a reasonable implementation in C, compared the speed [and/or the
complexity of the C code] he might actually have a point.

But as it stands all this thread has proved is he doesn't understand
Computer Science and that he's anti-social.

Tom
 

Kenny McCormack

(seebs)
(spinny)
C# is an open language. It had to be, in order to become an ECMA standard. It is being

On this one, I think seebs is right. Something about a stopped clock.

As a practical matter, .NET is an MS technology. The slice of the
computing world that is going to go for the ideas and concepts that are
.NET and yet would use an off-brand implementation is vanishingly small.

Give them credit. MS does know how to play the standards game for show,
while at the same time keeping the ball in their home territory.
 

Keith Thompson

Tom St Denis said:
man hsearch
man bsearch
man tsearch

Sure those are not part of C99, but they're part of GNU libc, which a
lot of people have access to. There are standalone data management
libraries out there.
[...]

bsearch is standard in both C90 and C99.
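
For reference, a minimal standard-C use of bsearch (the array must
already be sorted):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int sorted[] = { 2, 3, 5, 7, 11, 13 };
    int key = 7;
    int *p = bsearch(&key, sorted, sizeof sorted / sizeof sorted[0],
                     sizeof sorted[0], cmp_int);

    printf("%d %s\n", key, p ? "found" : "not found");
    return 0;
}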
 

jamm

C Sharp "sucks". The proof is that MS keeps replacing it with new versions.
How long did 1.0 last? Less than 2 years? MS has billions to spend on their
strategies of market control and dominance without even worrying about
profit. OS and app sales fund their developer technologies. Do you like
being on the never-ending MS learning cycle? I use C# professionally and I
don't like it.

Ohh, now we have covariance and contravariance!! Let's all get sexually
aroused and blog about it! C# 4!!!

Remember the Windows 2000 era? Win32 programs just POPPED onto the screen,
lightning fast redraws.

But to this day I can spot a .Net application by its sluggish drawing. Aero
glass now hides this somewhat. Sneaky.

I'm sure you can come up with some C# algo that performs super fast. We
should not be surprised by this, should we? C# doesn't magically run on
a different processor or something. Like C, it's all machine language
in the end.
 

spinoza1111

One thing you repeatedly fail to grasp is the concept of APIs.  You're
not comparing C# to C, you're comparing C# with a tuned hash library
to C with a poorly implemented hash algorithm.  You also fail at
computer science in general.

C# comes WITH a tuned hash library whereas the libraries that "come
with" C in the sense of being visible are a joke: snprintf being an
example.
For one thing, your C# hash array is NOT bounded at 1000 entries.
You're trying to add 1000 entries to a hash array with 1000 spots

Many arrays in fact fill up in production. And don't waste my time: I
raised this issue right after I posted the original code, pointing out
that the average probe time would go to shit and describing how to fix
it...the point being that none of this was necessary with C Sharp.
which means you're getting a lot of collisions and doing many
sequential searches [for empty spots].  That could easily explain the
time difference.  Generally if you have an unbounded input size you'd
use a tree not a hash.  Hashes are useful when you know you have at
most X symbols, then you can easily allocate some Y >> X space and
hash collisions should be low.

Talk about bloatware: on the one hand, one guy complains that C Sharp
executables are too large; here, a C fan suggests wasting memory to
make a crude algorithm perform. It might be a good idea in certain
circumstances, but it fails to demonstrate that C doesn't suck. The
linked list that I suggested (but haven't implemented in this thread)
is better.
 
