callgrind - Developer IT

How do I get callgrind to dump source line information?

- by Jeremybub

I'm trying to profile a shared library on GNU/Linux which does real-time audio processing, so performance is important. I run another program which hooks it up to the audio input and output of my system, and profile that with callgrind. Looking at the results in KCacheGrind, I get great information about what functions are taking up most of my time. However, it won't let me look at the line by line information, and instead says I need to compile it with debugging symbols and run the profiling again. The program which I am profiling is not compiled with debug symbols, but the library is. And I know this, because interestingly, source code annotations for cachegrind work fine. When I run callgrind, it says the default is to dump source line information, but it just isn't doing that. Is there some way I could force it to, or figure out what's stopping it?

Read the article

Need help in reading callgrind output

- by n179911

Hi, I have run callgrind with my application like this "valgrind --tool=callgrind MyApplication" and then call 'callgrind_annotate --auto=yes ./callgrind.out.2489' I see output like 768,097,560 PROGRAM TOTALS -------------------------------------------------------------------------------- Ir file:function -------------------------------------------------------------------------------- 18,624,794 /build/buildd/eglibc-2.11.1/elf/dl-lookup.c:do_lookup_x [/lib/ld-2.11.1.so] 18,149,492 /src/js/src/jsgc.cpp:JS_CallTracer'2 [/src/firefox-debug-objdir/js/src/libmozjs.so] 16,328,897 /src/layout/style/nsCSSDataBlock.cpp:nsCSSExpandedDataBlock::DoAssertInitialState() [/src/firefox-debug-objdir/toolkit/library/libxul.so] 13,376,634 /build/buildd/eglibc-2.11.1/nptl/pthread_getspecific.c:pthread_getspecific [/lib/libpthread-2.11.1.so] 13,005,623 /build/buildd/eglibc-2.11.1/malloc/malloc.c:_int_malloc [/lib/libc-2.11.1.so] 10,404,453 ???:0x0000000000009190 [/usr/lib/libpangocairo-1.0.so.0.2800.0] 10,358,646 /src/xpcom/io/nsFastLoadFile.cpp:NS_AccumulateFastLoadChecksum(unsigned int*, unsigned char const*, unsigned int, int) [/src/firefox-debug-objdir/toolkit/library/libxul.so] 8,543,634 /src/js/src/jsscan.cpp:js_GetToken [/src/firefox-debug-objdir/js/src/libmozjs.so] 7,451,273 /src/xpcom/typelib/xpt/src/xpt_arena.c:XPT_ArenaMalloc [/src/firefox-debug-objdir/toolkit/library/libxul.so] 7,335,131 ???:g_type_check_instance_is_a [/usr/lib/libgobject-2.0.so.0.2400.0] I have a few questions: What does the number on the right mean? Does it mean it spend accumulative that long in calling the function on the right? How can I tell how many times that function has been called and Does that include the time spend in calling the functions called by that function? What does line with ??? mean? e.g. ???:0x0000000000009190 [/usr/lib/libpangocairo-1.0.so.0.2800.0] Thank you.

Read the article

Generating a Call Hierarchy for dynamicly invoked method

- by Maxim Veksler

Hello, Today's world of dynamic invoke, reflection and runtime injection just doesn't play well with traditional tools such as ctags, doxygen and CDOC. I am searching for a method call hierarchy visualization tool that can display both static and dynamic method invocations. It should be easy to use, light during execution and provide helpful detailed information about the recorded runtime session. Now I guess Callgrind could be considered a valid solution for the family C. What tool / technique could you suggest to create a call graph for both static and dynamic method invocation for JVM based bytecode? The intended end result is a graphical display (preferably interactive) which can show path from main() to each method that was invoked. During research for this post I stumbled upon javashot, it seems that this is the kind of approach I'm aiming at, I would prefer that this would be integrated into a kind of profiler or alike which than can be used from within my IDE (Eclipse, IntelliJ, Netbeans and alike). Thank you, Maxim.

Read the article

Need help in understanding kcachedgrind output

- by hap497

Hi, I am using valgrind callgrind to profile a program on gtk. And then I use kcachedgrind to read the result. I have captured an update a screenshot of kcachedgrind here: http://i41.tinypic.com/168spk0.jpg. It said the function gtk_moz_embed_new() costed '15.61%'. But I dont understand how is that possible. the function gtk_moz_embed_new() literally has 1 line: and it is just calling a g_object_new(). GtkWidget * gtk_moz_embed_new(void) { return GTK_WIDGET(g_object_new(GTK_TYPE_MOZ_EMBED, NULL)); } Can you please help understanding the result or how to use kcachedgrind. Thank you.

Read the article

What is the most efficient method to find x contiguous values of y in an array?

- by Alec

Running my app through callgrind revealed that this line dwarfed everything else by a factor of about 10,000. I'm probably going to redesign around it, but it got me wondering; Is there a better way to do it? Here's what I'm doing at the moment: int i = 1; while ( ( (*(buffer++) == 0xffffffff && ++i) || (i = 1) ) && i < desiredLength + 1 && buffer < bufferEnd ); It's looking for the offset of the first chunk of desiredLength 0xffffffff values in a 32 bit unsigned int array. It's significantly faster than any implementations I could come up with involving an inner loop. But it's still too damn slow.

Read the article

_dl_runtime_resolve -- When do the shared objects get loaded in to memory?

- by windfinder

We have a message processing system with high performance demands. Recently we have noticed that the first message takes many times longer then subsequent messages. A bunch of transformation and message augmentation happens as this goes through our system, much of it done by way of external lib. I just profiled this issue (using callgrind), comparing a "run" of just one message with a "run" of many messages (providing a baseline of comparison). The main difference I see is the function "do_lookup_x" taking up a huge amount of time. Looking at the various calls to this function, they all seem to be called by the common function: _dl_runtime_resolve. Not sure what this function does, but to me this looks like the first time the various shared libraries are being used, and are then being loaded in to memory by the ld. Is this a correct assumption? That the binary will not load the shared libraries in to memory until they are being prepped for use, therefore we will see a massive slowdown on the first message, but on none of the subsequent? How do we go about avoiding this? Note: We operate on the microsecond scale.

Developer IT