romixnews
Hi,
I'm facing the problem of analyzing the memory allocation and object
creation dynamics of a very large C++ application, with the goal of
optimizing its performance and eventually also identifying memory
leaks. The application in question is the Mozilla web browser. I have
had similar tasks before in the area of compiler construction, and it
is easy to come up with many more examples where this kind of
statistics would be useful. Clearly, the availability of execution
statistics for C++ applications can be very helpful for optimizing
such applications and for analyzing their behavior.
While trying to solve the problem on a per-project basis with custom
solutions (for example, Mozilla provides some custom means to collect
at least some of these statistics), I realized that in many cases the
kind of statistics that is nice to have does not really depend on the
application. Therefore I'll try to write down what I consider useful
execution statistics for a C++ application, and check what tools are
available for collecting some of them.
Some of the most interesting statistics relate to the following
characteristics of an application (the list is incomplete and very
memory-management centric):
Code execution statistics:
1) Execution time statistics for the whole application
2) Fine-grained execution time statistics for each (used) function
3) Code coverage information
4) Where was a certain function/method called, and how often?
5) When was a certain function/method called?
Memory allocation and memory access related statistics:
6) Cumulative memory usage
7) Dynamic memory allocation information:
which memory blocks were allocated/freed?
when was a certain memory block allocated/freed?
which allocated blocks were never freed, i.e. leaked?
7.1) All statistics from (7), but extended with type/class information
8) Memory access dynamics:
which memory regions were accessed by the application?
which parts of the code accessed these memory regions?
when were certain/all memory regions accessed?
8.1) All statistics from (8), but extended with type/class
information, where appropriate
C++ specific statistics:
9) Object creation statistics:
How many objects of a given type/class were created overall?
How many objects of a given type/class were created as global
variables?
How many objects of a given type/class were created on the stack?
How many objects of a given type/class were created on the heap?
10) Object creation dynamics:
Where were objects of a given class created, and how many?
When were objects of a given class created, and how many?
11) Object access dynamics:
How many read/write accesses were made to a given (or every) object of
a given (or every) class?
Where did these accesses happen?
When did these accesses happen?
12) (Non-)static method invocation dynamics:
How many invocations of a given member method happened for a given
object/class?
Where did these invocations happen?
When did these invocations happen?
For some of these bullets, tools are already available:
(1) and (2) can be addressed by prof/gprof-like tools
(3) can be addressed by gcov-like tools
(4) and (5) can probably also be addressed by prof/gprof and/or static
code analyzers that can build a call tree
(6) and (7) can be addressed by special debug malloc libraries, e.g.
dmalloc or mpatrol, or by tools like Purify or Valgrind.
But I'm not aware of tools or libraries that can collect the
statistics described in the other bullets:
(7.1) - this is particularly interesting if you want "per-class"
allocation statistics. In C, memory allocation primitives like
malloc/free are untyped. In C++, operator new and operator delete do
actually know the type of their arguments, or at least the compiler
knows it. Unfortunately, this type information is generally not
available at the program level (only for classes that define their own
operator new, and even then it is only implied: there is no way to
distinguish between a call for the class itself and one for a derived
class). It is not possible with current tools, since all tools
external to the compiler "do not know enough" about the types and
classes used by the program. As a result, all type-related information
is essentially lost. And we do not have any useful
reflection/introspection facilities in C++ yet.
(8), (8.1) - these statistics are important for understanding the
dynamic behavior of the application with regard to memory usage. They
could be used for analyzing and identifying memory access patterns,
and they can provide useful input for designing or selecting more
efficient memory allocators and garbage collectors for a given
application. They could also provide insight into the paging/swapping
behavior of the application and eventually provide hints for the VM
manager.
(9) is also related to memory management, but has a somewhat broader
scope. It gives you an idea of object creation on a per-class basis.
(10) extends (9) with the dynamics of object creation; it is rather
similar to (7), but concentrates on objects.
(11) is interesting for better understanding object usage patterns. It
provides information at object or member-variable granularity and can
be collected on a per-object or per-class basis.
(12) is similar to (11), but collects statistics about member method
invocations.
Of course, I realize that many of the bullets I have marked as not
solved by currently available tools can be addressed with some
project-specific solution. But virtually all such solutions require
instrumenting the original application, most likely at the source code
level, by inserting statements for statistics collection (e.g. inside
constructors and destructors and/or other member methods, inside
operators new and delete, etc.). Even worse, this is likely to be done
by hand, since there are not that many C++ analysis tools that could
do it automatically. To recap, we have two options:
a) Instrumentation by hand
This is very inconvenient and ultimately very time-consuming,
especially for big code bases like Mozilla's. And it introduces code
at the source level that is not directly related to the application's
semantics. Without automated tools, it is probably only feasible when
done right from the beginning of the project, adding the
instrumentation to each class as it is designed. Applying such changes
to an already existing big project could be rather painful: just
imagine modifying several hundred classes by hand to insert such code.
b) Doing the instrumentation automatically
Automating the instrumentation of C++ code makes the task much easier.
It can be done either at the source code level or at the machine code
level.
For automated source-code instrumentation, tools like AspectC++ or
OpenC++, as well as other source-to-source transformation tools, come
to mind. As can easily be seen, these come from areas like
aspect-oriented programming, meta-object protocols, etc. This is not
surprising, since statistics collection can be considered just an
"optional aspect" of the application. I guess it is possible to
instrument any C++ application for statistics collection using these
tools, but again, this would introduce changes at the source level,
which can be a drawback in some situations.
When it comes to machine-code level instrumentation, I'm not aware of
any tools that can cope with C++. Valgrind with its plugins and Daikon
are probably the closest candidates, but they do much more than
required and slow down execution greatly. It is also not obvious
whether all of the described kinds of statistics can be gathered with
these tools. At the same time, for some other languages, especially
those with a virtual machine, e.g. Java and the .NET languages, this
can be done rather easily using aspect-oriented programming and
instrumentation at run time! Of course, these languages have much
simpler, higher-level semantics, and the availability of a virtual
machine makes it easy to intercept certain instructions and actions.
Another big advantage of these languages is their rather powerful
reflection mechanisms, which can be used at run time.
And I'd like to stress it again: a common, application-independent way
of gathering such statistics is needed. It would make the whole
process more straightforward, less error-prone, more portable and
faster. The current situation, where everybody introduces their own
solution to the same problem, is not really how it should be, is it?
Having said all that, I'd like to ask others for their opinion on the
state of execution statistics collection for C++ programs.
What are your experiences?
What tools do you use?
Where do you see shortcomings?
What statistics would you like to be able to collect?
Do you see any solutions other than instrumentation for collecting the
described kinds of statistics in C++? If instrumentation is required,
what would be better to have: source-level instrumentation or
machine-code level instrumentation?
How can this be achieved?
Should we have special tools for that?
Should we extend compilers to support this kind of instrumentation
(e.g. telling the compiler to insert a certain call at the
beginning/end of each function/method, or to call a certain function
before/after each 'operator new' call)?
I'm very interested in your opinions and hope that we can have an
interesting discussion about these topics.
Best Regards,
Roman
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]