Do you use a garbage collector?

M

Mark Space

asterisc said:
Why doesn't Java optimize this and allocate nothing, since they are
not used?

An interesting idea. Responding to your earlier request for an output
line for each deletion, I added a finalize method. Then I cut down the
number of loops to 1000, and ran it. I got this:

init:
deps-jar:
Compiling 1 source file to
C:\Users\Brenden\Dev\misc\FinalizeTest\build\classes
compile:
run:
finalizetest.Main@1eed786
Time: 2 ms
BUILD SUCCESSFUL (total time: 0 seconds)


Hmmm..... although increasing the loops does produce more output,
increasing the loops to one million only produces ~7k lines of total output.


Code below:


package finalizetest;

public class Main
{

private static final int LOOPS = 1000;
private static final int MODULUS = 5000000;

Main( int c )
{
count = c;
}

int count;

public static void main( String[] arg )
{

long start = System.currentTimeMillis();

for ( int i = 0; i < LOOPS; i++ )
{
Main test = new Main( i );
if ( i % MODULUS == 0 )
{
System.out.println( test );
}
}
long end = System.currentTimeMillis();
System.out.println( "Time: " + ( end - start ) + " ms" );

}

@Override
protected void finalize() throws Throwable
{
System.out.println( "Finalized: " + this );
super.finalize();
}
}
 
J

Jerry Coffin

[ ... ]
How does this efficently handle dynamic numbers of threads that can, and
will be created and joined with, or detached, during the lifetime of a given
process?

That depends -- if the number of threads is _extremely_ dynamic (i.e.
you have no idea of the maximum number of threads) it would probably
become a bit of a problem, though that's purely a guess, not the result
from testing.

Keep in mind that this design isn't (and never was) intended as a
polished product, but strictly an example of one idea about how a
reasonably scalable allocator _could_ work. Just as no battle plan
survives contact with the enemy, such an untested design almost
certainly wouldn't survive implementation unscathed. OTOH, the idea has
been voiced that cross-thread freeing is necessarily serialized, and I
think this is enough to show that's not the case.
It seems like you are describing a fully connected network. One
freelist per-thread, that holds N buckets, where N == current number of
threads at any one time. This implies that N is an unknown variable,
therefore, if user creates a number of threads, so on a high-end system with
many cores, they must all be interconnected. Each threads heap needs an
entry for a thread in the system, if a new thread gets added, it must auto
register, or register on "first" usage; think 'pthread_once()'.

Right. Mildly non-trivial, but definitely not rocket science either. The
design is almost certainly wasteful of memory, but the amount wasted
wouldn't become very large until/unless your system included thousands
of threads (or something on that general order).
This contention can normally be reduced to a single atomic exchange to
free memory and another to grab memory for coalescing. Specifically,
consider a structure something like:

struct free_block {
void *next;
size_t size;
size_t origin_threadID;
char payload[0];
};

Actually, the free-block only needs to be == to sizeof(void*):

Oh, I realize it doesn't need to be larger -- this wasn't an attempt at
an optimized implementation at all, but purely an attempt at
demonstrating the general idea in a minimum of code, so it includes no
"tricks" of any kind at all.

[ rest snipped -- I haven't had a chance to read through the cited
references yet... ]
 
M

Mirek Fidler

How about mobile and embedded devices that don't have sophisticated
memory management? If a C++ application is leaking memory, the memory
might never be returned even after the application is terminated.
This is more dangerous than memory leak in Java application, where,
after the application is terminated, all memory is returned by VM.

If VM is able to return memory to OS, so it should be C++ runtime.

Mirek
 
M

Mirek Fidler

I followed a link to James Kanze's web site in another thread and was
surprised to read this comment by a link to a GC:

"I can't imagine writing C++ without it"

How many of you c.l.c++'ers use one, and in what percentage of your
projects is one used? I have never used one in personal or professional
C++ programming. Am I a holdover to days gone by?



I guess you can stay calm. I was always surprised by James Kanze's
relation to GC too. Especially if he is writting mission critical apps
and the only GC he can AFAIK use is convervative GC (what will happen
if somebody attacks his app with white noise data ?:)

Mirek
 
M

Mirek Fidler

Well, my friend, I have proven you wrong. Razi has been victorious
once again :)

Time: 2125 ms (C++)
Time: 328 ms (java)

What do you think you have proved? I see that MSC allocator is slower
than Java's, all right. But you have not proved that it is calling
system directly.

Besides, the code is not equivalent.
--- c++--

#include <ctime>
#include <cstdlib>
#include <iostream>

using namespace std;

class Test {
public:
Test (int c) {count = c;}
virtual ~Test() { }
int count;

};

int main(int argc, char *argv[]) {

clock_t start=clock();
for (int i=0; i<=10000000; i++) {
Test *test = new Test(i);
if (i % 5000000 == 0)
cout << test;

// you need to add here
delete test;
// because Java does not hold reference to created pointer, so it gets
GCed soon
}
clock_t endt=clock();
std::cout <<"Time: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

}

-- java ---

import java.util.*;

class Test {
Test (int c) {count = c;}
int count;

public static void main(String[] arg) {

long start = System.currentTimeMillis();

for (int i=0; i<=10000000; i++) {
Test test = new Test(i);
if (i % 5000000 == 0)
System.out.println (test);
}
long end = System.currentTimeMillis();
System.out.println("Time: " + (end - start) + " ms");

}

}

If you have time, fix and try with U++.... (it has overloaded new/
delete).

(Note: I do not know what the result will be, I guess it will still be
slower than Java, I am just interested :)

Mirek
 
M

Mirek Fidler

int main(int argc, char *argv[]) {
clock_t start=clock();
for (int i=0; i<=10000000; i++) {
Test *test = new Test(i);
if (i % 5000000 == 0)
cout << test;
}

If I add delete test; to this loop it gets faster. huh? what the
exaplanation for this?

2156 ms

and after I add delete test; to the loop

1781 ms

why is that?

Ah, so you found it ;)

Well, is not it obvious?

Without delete, you are allocating more and more memory. Means more
cache issues, more memory to manage etc. In Java, the total amount of
memory in loop is kept low, because a lot of it is cheaply recollected
by GC.

Mirek
 
M

Mirek Fidler

If you have time, fix and try with U++.... (it has overloaded new/
delete).

Well, to save you a bit of time:

#include <Core/Core.h>

using namespace Upp;

class Test {
public:
Test (int c) {count = c;}
// virtual ~Test() { }
int count;
};

CONSOLE_APP_MAIN
{
RTIMING("NewDelete");
for (int i=0; i<=10000000; i++) {
Test *test = new Test(i);
delete test;
}
}

Note that commenting / uncommenting virtual destructor has huge
impact... (means allocation/deallocation itself is about as fast as
virtual table assignment...)

Mirek
 
J

Jerry Coffin

Juha Nieminen said:
I don't see how this is so much different from what Java does.

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et. al.; see Resources).

If somebody wanted to make an equally meaningless claim in the opposite
direction, they could just as accurately claim that "freeing a block
memory with free() typically consumes no more than 4 machine
instructions, while a single execution of a garbage collector typically
consumes at least 10,000 clock cycles."
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources).«

If you're using the exact versions of Ghostscript and Perl they tested,
compiled with the exact C++ compiler they used, running the exact
scripts they used for testing, this comparison probably means a lot.
Changing any of these will reduce the meaning of the tests -- and with
any more than minimal changes, there's likely to be no meaning left at
all.

Again, I could equally easily exchange "Java" and "C++", by merely
replacing "Perl and Ghostscript" with a couple of carefully chosen Java
programs.

To put things in perspective, consider the I recently profiled an
application that I wrote and maintain for my real work. According to the
profiler, the combined total of time spent in operator new and operator
delete (including everything else they called) was 0.115%. The very best
Java (or anything else) could hope to do is beat that by 0.115%, which
would hardly be enough to measure, not to mention caring about.

Of course, you don't know the exact nature of that program or even
generally what it does. You don't have the source code, so you have no
idea how it works, or whether it normally uses dynamic allocation at all
-- IOW, you know exactly as much about it as you do about the tests
cited by IBM.

Unlike them, I'll tell you at least a bit about the code. Like most of
my code, it uses standard containers where they seem useful. Unlike
some, it and makes no real attempt at optimizing their usage either
(e.g. by reserving space). Essentially all the data it loads (typically
at least a few megabytes, sometimes as much as a few gigabytes) goes
into dynamically allocated memory (mostly vectors). OTOH, after loading
that data, it does multidimensional scaling and then displays the result
in 3D (using OpenGL). It allocates a lot of memory dynamically, but most
of its time is spent on computation.
 
J

Jerry Coffin

[email protected] says... said:
If you use a relatively low-level language such as C++, this certainly
_does_ effect your productivity. I can't understand where you get the
idea that it doesn't. A more expressive language, preferrably one
designed for or adaptable to the problem domain, will in many cases
produce a dramatic improvement in productivity over the kind of manual
stone-breaking one is doing in C++.

It sounds a great deal as if you don't know C++ at all. Offhand, I can't
think of any other language I've used that has nearly as good of support
for domain-specific languages as C++.

Just for one well-known example, the Boost.Spirit library is a parser
generator library built entirely out of C++ templates. Though it's not
at all domain-specific itself, Boost.MPL (MetaProgramming Library)
supports metaprogramming that can be used to generate all sorts of other
domain specific languages. In fact, one of the primary uses for
metaprogramming is to embed domain specific languages into C++.

If you honestly care about support for domain specific languages, read
_C++ Template Metaprogramming_ by David Abrahams and Aleksey Gurtovoy.
 
M

Mirek Fidler

I added #include <ctime>

RTIMING("NewDelete");
clock_t start=clock();

or should it be...

clock_t start=clock();
RTIMING("NewDelete");

FYI, "RTIMING" does the "clock" job. It measures a time spend in code
from "RTIMING" to the end of block. Press Alt+L in theide and you will
get the ouput log file with number(s).

Anyway, with numbers like this, I would say we can put "superior GC
memory management performance over manual new/delete" to the rest, can
we?

The main reason why GC seems to perform better is that much more time
was spent optimizing GC than manual alloc/free in last years.

And also note that in this example, GC's job is fairly simple - there
are no living blocks left in the code. Makes GC zero time operation.

Mirek
 
M

Mirek Fidler

BTW, if you want a more realistic benchmark, try this and its Java
equivalent:

#include <Core/Core.h>

using namespace Upp;

struct Tree {
Tree *left;
Tree *right;
};

Tree *CreateTree(int n)
{
if(n <= 0)
return NULL;
Tree *t = new Tree;
t->left = CreateTree(n - 1);
t->right = CreateTree(n - 1);
return t;
}

void DeleteTree(Tree *t)
{
if(t) {
DeleteTree(t->left);
DeleteTree(t->right);
delete t;
}
}

CONSOLE_APP_MAIN
{
RTIMING("Tree new/delete");
for(int i = 0; i < 100; i++)
DeleteTree(CreateTree(20));
}
 
M

Mike Schilling

Mirek Fidler said:
If VM is able to return memory to OS, so it should be C++ runtime.

The JVM can (in principle, at least) compact its heap and return the
now-free space to the OS. An environment that doesn't allow memory
compaction (which includes most C++ implementations) would find this
impossible.
 
M

Mirek Fidler

The JVM can (in principle, at least) compact its heap and return the
now-free space to the OS. An environment that doesn't allow memory
compaction (which includes most C++ implementations) would find this
impossible.

Yes, but we are speaking about "application is terminated" situation
here...

Mirek
 
M

Mirek Fidler

The common claim is that GC is much slower than manual new/delete.


How could that be? C is 30 years old and all this time has been spent
in optimizing C and C++ too.

Nope. Especially, new/delete in MSC is absolutely terrible.
Unfortunately, very little effort went into C++ standard library.
Shame, but true.
GC and especially JIT compilers are
newer, and this is not enough time to improve them.

Well, everybody was throwing a lot of money on optimizing VMs and GCs.

IMO, for each developer in major SW company, 10 times more people you
will find optimizing GC than malloc/free.

Otherwise, explain me how we can ouperform MSC new/delete by 10 times?

Mirek
 
M

Mirek Fidler

I was using g++ (MinGW) MSC9 did better a little better.

13859 ms (Vc++) cl /O2 /GL test.cpp /link /ltcg
17765 ms (g++) g++ -O2 -fomit-frame-pointer "test.cpp" -o "test.exe"

but, for some reason if I use U++ IDE to compile, these time improve..

5142 ms (g++)
3906 ms (MSC9)

still much slower than java-server ~1400 ms..

At this point, hard to say what have you really tested/measured ;)

This looks like you have adapted my code for normal C++ (e.g. removing
#include <Core/Core.h>). Then tested in theide original version.
Correct?

Where is your Java version?

Mirek
 
M

Mirek Fidler

4 times slower than the last U++ version that was doing 100,000,000

the last U++ version was around 1700ms this one 7671 ms

Ah, you have missed important fact that we are now doing completely
different benchmark :) One that will keep some live data in the
memory, this maybe exhibiting the real costs of GC...

Mirek
 
M

Mirek Fidler

Java -server -Xms64m Test

Ah, so any application in Java will now consume at least 64MB?
Time: 17219 ms

And even then (this is twice as much as C++) GC is 2 times slower...

Be fair and post the number without tweaking initial memory pool...
I am two times faster than U++ :)

That was cheap. Anyway, I guess, we can put this to rest. Java memory
management is much slower than optimal C++ allocator, end of story.

Mirek
 
M

Mirek Fidler

GC ran only 17 times. Witout any flags it runs 779 times. with output
that looks like

Obviously, less times GC runs, faster the code is.

But we can now stop pretending that GC is faster than manual
management, can we? :)

Of course, if there is no live memory involved, it can be as fast as
manual. But that proves nothing.

(And, BTW, we are still actually using the heap. Of course, in C++,
you allocated much less items there).

Mirek
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,175
Messages
2,570,946
Members
47,498
Latest member
yelene6679

Latest Threads

Top