Need help with bug (glibc, linux, malloc-related)


Louis B. (ldb)

I have a long-running program that eventually crashes when valloc()
returns 0. The program is non-trivial: it's written in Ada, is
multithreaded, and has a lot of SSE routines. A memory leak would be
the most obvious cause, but this appears to be more sinister than a
simple memory leak.

After a lot of running around and searching through the code, I found
an anomaly that I'd like to understand, in case it's the cause of
valloc() returning 0. It may be unrelated to my problem above, but I
can't be sure. I've reproduced the anomaly in a very simple program.

Basically, mallinfo() seems to produce garbage results in
multi-threaded code. In a very simple program, I fire up 2 pthreads
and have them malloc() and free() a bunch of stuff. Once all the
threads are finished, I print out malloc_stats() and mallinfo(), and I
seem to get garbage for the mmap()-related fields.

Most of the time I run the code, the hblks and hblkhd fields of
mallinfo() come back 0 and 0, but a fair percentage of the time I get
a strange answer where hblks is 2, 5, -3, -1 or something like that.
It's almost as if there's a race condition inside the malloc()/free()
code that updates these fields.

This is out-of-the-box Ubuntu with gcc 4.1.2

I've included the code at the bottom, but here is an example output:
Arena 0:
system bytes = 135168
in use bytes = 288
Arena 1:
system bytes = 135168
in use bytes = 1128
Total (incl. mmap):
system bytes = 4045234176
in use bytes = 4044965256
max mmap regions = 1
max mmap bytes = 250003456
hblks : -1 hblkshd : -250003456


The mmap() and hblk (from mallinfo()) data seems to be totally
corrupted, to me. (In this particular case, they've gone negative). In
this code, the "answer" should be 0 since everything has been freed,
should it not? Are these numbers supposed to be meaningful?


(This code uses a pretty large malloc, but I get similar results with
more reasonably sized mallocs, like 10 MB.)
---------------------------------
Built with:
gcc main.c -lpthread

Here is the code I'm running:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <malloc.h>

void *detection_thread(void *arg);

int main(void)
{
    pthread_t thread1, thread2;
    struct mallinfo mi;

    // spawn threads
    pthread_create(&thread1, NULL, detection_thread, NULL);
    pthread_create(&thread2, NULL, detection_thread, NULL);

    // wait for threads to return
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    printf("********************************\n");
    malloc_stats();
    mi = mallinfo();
    printf("hblks : %d hblkshd : %d\n", mi.hblks, mi.hblkhd);
    return 0;
}

// allocate and immediately free one large block, 5000 times
void *detection_thread(void *arg)
{
    int *slappy;
    int i;

    (void)arg; // unused

    for (i = 0; i < 5000; i++)
    {
        slappy = malloc(1000*1000*250);

        if (slappy == NULL)
        {
            printf("CRASH\n");
            exit(1);
        }
        free(slappy);
    }

    printf("Done!\n");
    return NULL;
}
 

Ben Pfaff

Louis B. (ldb) said:
I have a long running program that eventually crashes when valloc()
returns a 0. This program is relatively non-trivial as it's written in
Ada, is multithreaded, has alot of SSE routines. A memory leak would
be the most obvious cause but this appears to be more sinister then a
simple memory leak.

comp.lang.c is not the right place to submit a glibc bug report.
I'd suggest a GNU newsgroup or mailing list instead.
 

Dave Vandervies

I have a long running program that eventually crashes when valloc()
returns a 0. This program is relatively non-trivial as it's written in
Ada, is multithreaded, has alot of SSE routines. A memory leak would
be the most obvious cause but this appears to be more sinister then a
simple memory leak.

None of valloc(), Ada, multithreading, or SSE are defined by the C
language, so your problem is well beyond the scope of comp.lang.c.

A more general Linux programming newsgroup might be better able to answer
your question.


dave
 

CBFalconer

Louis B. (ldb) said:
.... snip ...

Basically, mallinfo() seems to produce garbage results in multi-
threaded code. In a very single program where I fire up 2 pthreads
have them malloc() and free a bunch of stuff, once all the threads
are finished, I print out malloc_stats() and mallinfo() and I seem
to get garbage for the mmap() related fields.

This is all OT for c.l.c and you should try a newsgroup dedicated
to your system and/or threads. However ...

Basically, malloc (and mallinfo) are running in user space. If
those are true threads, as opposed to full processes with separate
data areas, of course the system will get confused. You need to
protect all access to the malloc and mallinfo packages with
suitable constructs, such as semaphores, monitors, etc. You can
see one implementation of both packages (for DJGPP - these things
are system specific) as nmalloc at:

<http://cbfalconer.home.att.net/download/>

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 

matevzb

CBFalconer wrote:

... snip ...


This is all OT for c.l.c and you should try a newsgroup dedicated
to your system and/or threads. However ...

Basically, malloc (and mallinfo) are running in user space. If
those are true threads, as opposed to full processes with separate
data areas, of course the system will get confused.

As you said, it's <OT>
In this case the system shouldn't get confused if it conforms to
POSIX, which requires that "Each function defined in the System
Interfaces volume of IEEE Std 1003.1-2001 is thread-safe unless
explicitly stated otherwise". malloc() is a system interface function
and nothing is explicitly stated about its thread safety. So IMO it
*should* be safe to use, unless of course it isn't (and that is
documented).

To the OP: check the malloc() man page first. If it says it conforms
to POSIX/SUSv3, file a bug report, but not against gcc; this is a
(g)libc problem.</OT>
<snip>
 

William Ahern

On Mon, 05 Feb 2007 07:18:45 -0800, Louis B. (ldb) wrote:
Most of the time I run the code, the hblks and hblkshd fields of
mallinfo() come back 0 and 0, but a fair percentage of the time I get
a strange answer where hblks is either 2, 5, -3 or -1 or something
like that. It's almost like there's a race condition inside the
malloc()/free() code that updates these fields.

This is out-of-the-box Ubuntu with gcc 4.1.2

I can say with an extremely high degree of confidence that there isn't a
race condition in malloc()/free(), unless some other code is interposing
these functions. (I was once convinced for three days I had found a bug in
GCC, until I spotted that superfluous semicolon ;)

Try Valgrind. Valgrind has plug-ins to analyze threaded code and detect,
as best it can, unprotected shared resources. (Valgrind will also catch
memory errors with better diagnostics than other software.)

If that fails, try another newsgroup. This one is definitely not the group
you want.

- Bill
 

Eric Sosman

Louis B. (ldb) wrote On 02/05/07 10:18,:
I have a long running program that eventually crashes when valloc()
returns a 0. [...]

Others have pointed out that this isn't a C question.
However, one possible source of confusion may be a C
mistake:
struct mallinfo mi;
[...]
mi = mallinfo();
printf("hblks : %d hblkshd : %d\n", mi.hblks, mi.hblkhd);

There's no `struct mallinfo' in Standard C, but on the
box I'm using at the moment all the members of that struct
are of type `unsigned long'. If that's true of your machine,
too, then you're printing them with the wrong format specifier:
"%d" requires a corresponding `(signed) int' argument, not an
`unsigned long'. Turn up your warning levels, and fix what
the compiler complains about.

That might not cure what ails you -- but when you're faced
with a mystery, it's always a good policy to get your code into
squeaky-clean condition before concluding that you've found a
bug.
 

CBFalconer

Eric said:
Louis B. (ldb) wrote On 02/05/07 10:18,:
I have a long running program that eventually crashes when valloc()
returns a 0. [...]

Others have pointed out that this isn't a C question.
However, one possible source of confusion may be a C mistake:
struct mallinfo mi;
[...]
mi = mallinfo();
printf("hblks : %d hblkshd : %d\n", mi.hblks, mi.hblkhd);

There's no `struct mallinfo' in Standard C, but on the
box I'm using at the moment all the members of that struct
are of type `unsigned long'. If that's true of your machine,
too, then you're printing them with the wrong format specifier:
"%d" requires a corresponding `(signed) int' argument, not an
`unsigned long'. Turn up your warning levels, and fix what
the compiler complains about.

Here is the header for my malldbg module, which was deliberately
designed to be compatible with the POSIX mallinfo module, except
for DJGPP. It has some added features. It is specific to the
DJGPP system, where an int and a long are identical. The
mallsethook and malldbgdumpfile functions are not present in
POSIX. Note that the malldbg module is written in standard C
(apart from the int size mentioned above), i.e. the system
dependent stuff is isolated in nmalloc.c. The connection is
established via sysquery.h. You can see the whole thing at:

<http://cbfalconer.home.att.net/download/nmalloc.zip>

/* -------- malldbg.h ----------- */

/* Copyright (c) 2003 by Charles B. Falconer
Licensed under the terms of the GNU LIBRARY GENERAL PUBLIC
LICENSE and/or the terms of COPYING.DJ, all available at
<http://www.delorie.com>.

Bug reports to <mailto:[email protected]>
*/

#ifndef malldbg_h
#define malldbg_h

/* This is to be used in conjunction with a version of
nmalloc.c compiled with:

gcc -DNDEBUG -o malloc.o -c nmalloc.c

after which linking malldbg.o and malloc.o will
provide the usual malloc, free, realloc calls.
Both malloc.o and malldbg.o can be components
of the normal run time library.
*/

#include <stddef.h>
#include "sysquery.h"

struct mallinfo {
int arena; /* Total space being managed */
int ordblks; /* Count of allocated & free blocks */
int smblks;
int hblks; /* Count of free blocks */
int hblkhd; /* Size of the 'lastsbrk' block */
int usmblks;
int fsmblks;
int uordblks; /* Heap space in use w/o overhead */
int fordblks; /* Total space in free lists */
int keepcost; /* Overhead in tracking storage */
};

struct mallinfo mallinfo(void);
int malloc_verify(void);
int malloc_debug(int level);
void mallocmap(void);
FILE *malldbgdumpfile(FILE *fp);
M_HOOKFN mallsethook(enum m_hook_kind which,
M_HOOKFN newhook);

#endif
/* -------- malldbg.h ----------- */


 
