Memory contents mysteriously changing

Mark

Hello, I've run into a strange bug and I'm not sure how to proceed
with fixing it. Any suggestions would be most appreciated.

Here is the relevant code:

template <class T>
class Mesh3d {

public:
// constructors and data access methods (not shown)

write_to_file(string filename);

private:
T * data;
}

Inside my write_to_file() method, I am invoking a library called SILO
to write the data to a .silo file, like this:

template <class T>
void Mesh3d<T>::write_to_file(string filename)
{
DBfile * file = NULL;

file = DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL, NULL,
DB_PDB); //!!!!

DBPutQuadmesh(file, "SPH_data", NULL, coordinates, dims, ndims,
DB_FLOAT, DB_COLLINEAR, NULL);

DBPutQuadvar1(file, "density", "SPH_data", data, dims, ndims, NULL, 0,
DB_FLOAT, DB_NODECENT, NULL);

DBClose(file);
}

Note that every function starting with "DB" is a call to the SILO
library. I've double-checked the SILO documentation and as far as I
can tell I am invoking these methods properly (I've also double-checked
that against some sample SILO code, which on its own compiles and
runs fine).

I invoke the above code from a main() like this:

Mesh3d<float> * mesh = new Mesh3d<float>(xmin, xmax, ymin, ymax, zmin,
zmax, xdim, ydim, zdim);
string fn("mesh.silo");
mesh->write_to_file(fn);


There are no compile or runtime errors. But when I watch the contents
of the private variable "data", they mysteriously change. I've
pinpointed the line at which they change, and it is the line I have
commented above with the //!!!! (ie: it's the DBCreate line).
Strangely, the DBCreate function is passed NO information pertaining
to "data", and as far as I can tell it has no pointers to "data" or
any other way in which "data" could be in scope inside the function
DBCreate().

However, when I run this in a debugger, just before calling DBCreate I
have:

(gdb) p data[614]
$1 = 0.904355466

And immediately after the DBCreate line I have:

(gdb) p data[614]
$2 = 4.17010799e-34

I just used the index 614 as an example... some entries in the "data"
array change, and some don't. 614 happens to be one of those that
changes.

How could the data at this memory location be changing? I get no
errors, no seg faults, or anything like that. And the DBCreate()
routine should have nothing to do with the "data" array, as far as I
can tell, and yet according to the debugger it seems to be the line
that is causing the memory contents to change.

Hopefully I am just overlooking something simple, but right now I am
quite baffled by this. Any insights would be very helpful!

Thanks,

Mark
 
Victor Bazarov

Mark said:
Hello, I've run into a strange bug and I'm not sure how to proceed
with fixing it. Any suggestions would be most appreciated.

Here is the relevant code:

template <class T>
class Mesh3d {

public:
// constructors and data access methods (not shown)

Too bad.
write_to_file(string filename);

Probably

void write_to_file(string const& filename) const;
private:
T * data;

}
;
Inside my write_to_file() method, I am invoking a library called SILO

to write the data to a .silo file, like this:

template <class T>
void Mesh3d<T>::write_to_file(string filename)
{
DBfile * file = NULL;

file = DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL, NULL,
DB_PDB); //!!!!

DBPutQuadmesh(file, "SPH_data", NULL, coordinates, dims, ndims,
DB_FLOAT, DB_COLLINEAR, NULL);

DBPutQuadvar1(file, "density", "SPH_data", data, dims, ndims, NULL, 0,

So, here you pass 'data'. Has it been initialized, assigned to
anything? You don't show that part of your code. Memory management can
become tricky. Are you sure you're doing it right?
DB_FLOAT, DB_NODECENT, NULL);

DBClose(file);
}

Note that every function starting with "DB" is a call to the SILO
library. I've double-checked the SILO documentation and as far as I
can tell I am invoking these methods properly (I've also double-
checked that against some sample SILO code, which when on its own
compiles and runs fine).

I invoke the above code from a main() like this:

Mesh3d<float> * mesh = new Mesh3d<float>(xmin, xmax, ymin, ymax, zmin,
zmax, xdim, ydim, zdim);

string fn("mesh.silo");
mesh->write_to_file(fn);


There are no compile or runtime errors. But when I watch the contents
of the private variable "data", they mysteriously change. I've
pinpointed the line at which they change, and it is the line I have
commented above with the //!!!! (ie: it's the DBCreate line).
Strangely, the DBCreate function is passed NO information pertaining
to "data", and as far as I can tell it has no pointers to "data" or
any other way in which "data" could be in scope inside the function
DBCreate().

However, when I run this in a debugger, just before calling DBCreate I
have:

(gdb) p data[614]

Why 614?
$1 = 0.904355466

And immediately after the DBCreate line I have:

(gdb) p data[614]
$2 = 4.17010799e-34

What's the significance of '614'?
I just used the index 614 as an example... some entries in the "data"
array change, and some don't. 614 happens to be one of those that
changes.

Example? What's the *real* size of the data, how do you allocate memory
for it, how do you fill the memory up?

> How could the data at this memory location be changing?

If you don't designate the memory as belonging to your program, why
can't it change?

> I get no
> errors, no seg faults, or anything like that.

It doesn't mean your program is OK. Most likely you have undefined
behaviour due to accessing memory you didn't allocate.

> And the DBCreate()
> routine should have nothing to do with the "data" array,

"Should"? And why are you calling it "an array"? It's a pointer. What
it points to is unknown, at least to us (you didn't show us how the
memory is managed).

> as far as I
> can tell, and yet according to the debugger it seems to be the line
> that is causing the memory contents to change.

Hopefully I am just overlooking something simple, but right now I am
quite baffled by this. Any insights would be very helpful!

Something simple? Well, yes, I suppose. You use a pointer that points
to nothing (or rather who knows where it points to), as if it were an
array of values you're allowed to change. Most likely. If you intended
to dynamically manage your memory, it's fine. Just do it right. Since
my crystal ball is in the shop right now, I can't see inside your
computer's RAM/drive or inside your brain to know how 'data' member in
your class is manipulated. Post more code.

V
 
Mark

Hi Victor,

Thanks for your response and for your questions. Just having someone
ask me questions helps me make progress in debugging.
> A pointer? A *naked* pointer? Why? Couldn't you use 'vector<T>' or
> some other container?

Good point. I started out as a C programmer once upon a time and old
habits die hard. ;-)
I went ahead and changed it to a vector.

> <shrug> You shouldn't assume that we know anything about it.

I'm not assuming you do. Nor do I know much about it, really. It's
just a library that for me is essentially a black box.

> So, here you pass 'data'. Has it been initialized, assigned to
> anything? You don't show that part of your code. Memory management can
> become tricky. Are you sure you're doing it right?

I agree, memory management is tricky and it is likely the source of my
problems. I'll post the rest of the code below. I apologize for not
posting more code in the first place... I have hundreds of lines of
code and I didn't want to overwhelm this forum with code. I tried to
pick "relevant" lines but clearly I should've included more. So since
you apparently don't mind reading my code, here is a larger sample
(but still a sample nonetheless)...

template <class T>
class Mesh3d {

public:

// constructors and initialization:

~Mesh3d() { }

Mesh3d(float Xmin, float Xmax, float Ymin, float Ymax, float
Zmin, float Zmax, int Xres, int Yres, int Zres);
// creates a 3d mesh on a rectangular volume with the appropriate
// maxes and mins as the boundaries of the volume; the resolution
// in each X, Y, and Z direction is given by Xres, Yres, and Zres

void zero(); // sets the value of every grid point to 0
// only call this if T is of type float, int, etc

// mesh properties:

float xmin, xmax, ymin, ymax, zmin, zmax, xinc, yinc, zinc;
int gridpoints;
int xgridmax, ygridmax, zgridmax;

private:

void construct_mesh(float Xmin, float Xmax, float Ymin, float
Ymax, float Zmin, float Zmax, int Xres, int Yres, int Zres);
// helper function for the constructors


// mesh data:

vector<T> data;
};

template <class T>
Mesh3d<T>::Mesh3d(float Xmin, float Xmax, float Ymin, float Ymax,
float Zmin, float Zmax, int Xres, int Yres, int Zres)
{
construct_mesh(Xmin, Xmax, Ymin, Ymax, Zmin, Zmax, Xres, Yres,
Zres);
}


template <class T>
void Mesh3d<T>::construct_mesh(float Xmin, float Xmax, float Ymin,
float Ymax, float Zmin, float Zmax, int Xres, int Yres, int Zres)
{
xmin = Xmin;
xmax = Xmax;
ymin = Ymin;
ymax = Ymax;
zmin = Zmin;
zmax = Zmax;

xgridmax = Xres;
ygridmax = Yres;
zgridmax = Zres;

gridpoints = xgridmax * ygridmax * zgridmax;

xinc = (xmax - xmin) / (xgridmax-1);
yinc = (ymax - ymin) / (ygridmax-1);
zinc = (zmax - zmin) / (zgridmax-1);

xmax = xmin + (xgridmax-1)*xinc;
ymax = ymin + (ygridmax-1)*yinc;
zmax = zmin + (zgridmax-1)*zinc;

data.resize(gridpoints);

// initialize data to all zeroes; for now I'm assuming that T
// will be of type float
zero();
}

template <class T>
void Mesh3d<T>::zero()
{
for(int i=0; i<gridpoints; i++)
data[i] = 0;
}

template <class T>
void Mesh3d<T>::writesilo(string filename)
{
// create the coordinate grid
float * xcoords = new float[xgridmax];
float * ycoords = new float[ygridmax];
float * zcoords = new float[zgridmax];

for(int i=0; i<xgridmax; i++) {
xcoords[i] = itox(i);
}
for(int i=0; i<ygridmax; i++) {
ycoords[i] = jtoy(i);
}
for(int i=0; i<zgridmax; i++) {
zcoords[i] = ktoz(i);
}

float * coordinates[3] = {xcoords, ycoords, zcoords};

// open the output file
DBfile * file = NULL;
file = DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL, NULL,
DB_PDB);

// construct the quad mesh
int ndims = 3;
int dims[3] = {xgridmax, ygridmax, zgridmax};

DBPutQuadmesh(file, "SPH_data", NULL, coordinates, dims, ndims,
DB_FLOAT, DB_COLLINEAR, NULL);

// collect the density data (later this will be more complicated
// than a simple 1 to 1 assign)
float density[gridpoints];
for(int i=0; i<gridpoints; i++) density[i] = data[i];

// write density data to the quad mesh
DBPutQuadvar1(file, "density", "SPH_data", density, dims, ndims,
NULL, 0, DB_FLOAT, DB_NODECENT, NULL);

DBClose(file);
}


And the code that utilizes this Mesh3d class:


int main(int argc, char** argv) {

if (argc < 5) {
printf("Usage: %s <gadget_snapshot> xdim ydim zdim\n",argv
[0]);
return 0;
}

snapshot *snap = new snapshot();

snap->read(argv[1]);

vector<float> bb = snap->getBB();

printf("Bounding box:\nMinima: x = %f, y = %f, z = %f\n"
"Maxima: x = %f, y = %f, z = %f\n",bb[0],bb[2],bb[4],bb[1],bb[3],bb[5]);

int xdim = atoi(argv[2]);
int ydim = atoi(argv[3]);
int zdim = atoi(argv[4]);
Mesh3d<float> * mesh = new Mesh3d<float>(bb[0], bb[1], bb[2], bb[3],
bb[4], bb[5], xdim, ydim, zdim);

string fn("mesh.silo");
if (snap->convert_to_mesh(mesh)) {
mesh->writesilo(fn);
}

}

The mesh object above is populated with data during the
convert_to_mesh method. By using a debugger I see the data contained
in the mesh is what I want. Now I want to output that data to a file
using the SILO library, and that is done inside the writesilo()
method. Unfortunately, within the writesilo() method, the data in the
mesh object is being changed during the line that includes a call to
DBCreate(). I don't know why. Thanks for any insight.

Mark
 
LR

Mark wrote:

I agree, memory management is tricky and it is likely the source of my
problems. I'll post the rest of the code below. I apologize for not
posting more code in the first place... I have hundreds of lines of
code and I didn't want to overwhelm this forum with code. I tried to
pick "relevant" lines but clearly I should've included more. So since
you apparently don't mind reading my code, here is a larger sample
(but still a sample nonetheless)...

Have you tried compiling a much smaller example, perhaps just calling
DBCreate and DBClose?
template <class T>
class Mesh3d {

public:

// constructors and initialization:

~Mesh3d() { }

Mesh3d(float Xmin, float Xmax, float Ymin, float Ymax, float
Zmin, float Zmax, int Xres, int Yres, int Zres);
// creates a 3d mesh on a rectangular volume with the appropriate
// maxes and mins as the boundaries of the volume; the resolution
// in each X, Y, and Z direction is given by Xres, Yres, and Zres

void zero(); // sets the value of every grid point to 0
// only call this if T is of type float, int, etc

// mesh properties:

float xmin, xmax, ymin, ymax, zmin, zmax, xinc, yinc, zinc;
int gridpoints;
int xgridmax, ygridmax, zgridmax;

private:

void construct_mesh(float Xmin, float Xmax, float Ymin, float
Ymax, float Zmin, float Zmax, int Xres, int Yres, int Zres);
// helper function for the constructors


// mesh data:

vector<T> data;
};

template <class T>
Mesh3d<T>::Mesh3d(float Xmin, float Xmax, float Ymin, float Ymax,
float Zmin, float Zmax, int Xres, int Yres, int Zres)
{
construct_mesh(Xmin, Xmax, Ymin, Ymax, Zmin, Zmax, Xres, Yres,
Zres);
}


template <class T>
void Mesh3d<T>::construct_mesh(float Xmin, float Xmax, float Ymin,
float Ymax, float Zmin, float Zmax, int Xres, int Yres, int Zres)
{
xmin = Xmin;
xmax = Xmax;
ymin = Ymin;
ymax = Ymax;
zmin = Zmin;
zmax = Zmax;

xgridmax = Xres;
ygridmax = Yres;
zgridmax = Zres;

gridpoints = xgridmax * ygridmax * zgridmax;

xinc = (xmax - xmin) / (xgridmax-1);
yinc = (ymax - ymin) / (ygridmax-1);
zinc = (zmax - zmin) / (zgridmax-1);

xmax = xmin + (xgridmax-1)*xinc;
ymax = ymin + (ygridmax-1)*yinc;
zmax = zmin + (zgridmax-1)*zinc;

data.resize(gridpoints);

// initialize data to all zeroes; for now I'm assuming that T
// will be of type float
zero();

I know this isn't what you asked about, but
Why not
data = std::vector<T>(gridpoints,0.0);

or else write your ctor like
template <class T>
Mesh3d<T>::Mesh3d(
T Xmin, T Xmax, T Ymin,
T Ymax, T Zmin, T Zmax,
int Xres, int Yres, int Zres
) :
...
gridpoints(Xres*Yres*Zres),
....
data(gridpoints,0.0)
{}
}
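
For anyone who wants to compile the initializer-list form suggested above, here is a minimal sketch. MeshSketch is a hypothetical cut-down stand-in for Mesh3d (it keeps only the grid counts and the data vector), so read it as an illustration, not as the poster's class. The detail that matters is that members are initialized in declaration order, not in the order they are written in the initializer list, so gridpoints has to be declared before data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the thread's Mesh3d.
template <class T>
class MeshSketch {
public:
    MeshSketch(int Xres, int Yres, int Zres)
        : xgridmax(Xres), ygridmax(Yres), zgridmax(Zres),
          gridpoints(Xres * Yres * Zres),
          // gridpoints already holds its value here because it is
          // declared before data in the class body.
          data(static_cast<std::size_t>(gridpoints), T())
    {
        // Sanity check: the vector really has one slot per grid point.
        assert(data.size() == static_cast<std::size_t>(gridpoints));
    }

    std::size_t size() const { return data.size(); }
    T value_at(std::size_t i) const { return data.at(i); }  // bounds-checked

    // Declaration order below is the initialization order.
    int xgridmax, ygridmax, zgridmax;
    int gridpoints;

private:
    std::vector<T> data;  // every element starts as T(), i.e. 0 for float
};
```

With this form a separate zero() call in the constructor is no longer needed, since the vector constructor fills every element with T().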

template <class T>
void Mesh3d<T>::zero()
{
for(int i=0; i<gridpoints; i++)
data[i] = 0;
}

template <class T>
void Mesh3d<T>::writesilo(string filename)
{
// create the coordinate grid
float * xcoords = new float[xgridmax];
float * ycoords = new float[ygridmax];
float * zcoords = new float[zgridmax];

for(int i=0; i<xgridmax; i++) {
xcoords[i] = itox(i);
}
for(int i=0; i<ygridmax; i++) {
ycoords[i] = jtoy(i);
}
for(int i=0; i<zgridmax; i++) {
zcoords[i] = ktoz(i);
}

float * coordinates[3] = {xcoords, ycoords, zcoords};

// open the output file
DBfile * file = NULL;
file = DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL, NULL,
DB_PDB);


What is the value of file after this?

Is a file actually created here?

If not, then can you write a smaller test case and see if you can just
create the file and whatever else DBCreate is supposed to do, and close it?
// construct the quad mesh
int ndims = 3;
int dims[3] = {xgridmax, ygridmax, zgridmax};

DBPutQuadmesh(file, "SPH_data", NULL, coordinates, dims, ndims,
DB_FLOAT, DB_COLLINEAR, NULL);

// collect the density data (later this will be more complicated
// than a simple 1 to 1 assign)
float density[gridpoints];
for(int i=0; i<gridpoints; i++) density[i] = data[i];

// write density data to the quad mesh
DBPutQuadvar1(file, "density", "SPH_data", density, dims, ndims,
NULL, 0, DB_FLOAT, DB_NODECENT, NULL);

DBClose(file);


Did I miss where xcoords etc gets deleted? Why are those raw pointers?
}


And the code that utilizes this Mesh3d class:


int main(int argc, char** argv) {

if (argc < 5) {
printf("Usage: %s <gadget_snapshot> xdim ydim zdim\n",argv
[0]);
return 0;
}

snapshot *snap = new snapshot();

snap->read(argv[1]);

vector<float> bb = snap->getBB();

printf("Bounding box:\nMinima: x = %f, y = %f, z = %f\n"
"Maxima: x = %f, y = %f, z = %f\n",bb[0],bb[2],bb[4],bb[1],bb[3],bb[5]);

int xdim = atoi(argv[2]);
int ydim = atoi(argv[3]);
int zdim = atoi(argv[4]);

What values do these numbers have?
Mesh3d<float> * mesh = new Mesh3d<float>(bb[0], bb[1], bb[2], bb[3],
bb[4], bb[5], xdim, ydim, zdim);

string fn("mesh.silo");
if (snap->convert_to_mesh(mesh)) {
mesh->writesilo(fn);
}

}

The mesh object above is populated with data during the
convert_to_mesh method. By using a debugger I see the data contained
in the mesh is what I want. Now I want to output that data to a file
using the SILO library, and that is done inside the writesilo()
method. Unfortunately, within the writesilo() method, the data in the
mesh object is being changed during the line that includes a call to
DBCreate(). I don't know why. Thanks for any insight.

What changes specifically? Can you output all the data in your instance
of Mesh3d before and after calling DBCreate and see what happened?
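
A rough sketch of that before/after check, assuming the mesh data is a std::vector<float> as in the posted code. The helper name changed_indices is made up for illustration. The idea is to copy the vector just before the DBCreate call and compare it against the live data just after.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the indices at which two snapshots differ bit-for-bit.
// Comparing raw bytes instead of using != also flags an element that
// has turned into NaN, which never compares equal to anything.
std::vector<std::size_t> changed_indices(const std::vector<float>& before,
                                         const std::vector<float>& after)
{
    assert(before.size() == after.size());  // snapshots must match in length
    std::vector<std::size_t> changed;
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (std::memcmp(&before[i], &after[i], sizeof(float)) != 0)
            changed.push_back(i);
    }
    return changed;
}
```

If the changed indices form one contiguous run, that would suggest a single block of memory being overwritten, while scattered indices would point somewhere else. Either way it says more than a single data[614].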

LR
 
Krice

// create the coordinate grid
float * xcoords = new float[xgridmax];
float * ycoords = new float[ygridmax];
float * zcoords = new float[zgridmax];

I didn't see you use delete[] on these ones.
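
One way to make the missing delete[] a non-issue is to let std::vector own the coordinate buffers and hand only the raw pointers to the C call. This is a sketch under assumptions: make_axis guesses that itox() and friends produce evenly spaced values, and last_corner_sum is a hypothetical stand-in for a C function that reads the arrays (it is not part of SILO). It also uses vector::data(), which needs C++11; with an older compiler &x[0] does the same for a non-empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds n evenly spaced coordinates: min, min+inc, min+2*inc, ...
// (assumed behaviour; the thread does not show itox/jtoy/ktoz).
std::vector<float> make_axis(float min, float inc, int n)
{
    assert(n > 0);
    std::vector<float> axis(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        axis[static_cast<std::size_t>(i)] = min + inc * static_cast<float>(i);
    return axis;
}

// Hypothetical stand-in for a C API that reads three coordinate arrays
// through raw pointers. NOT a SILO function.
float last_corner_sum(float* coords[3], const int dims[3])
{
    return coords[0][dims[0] - 1] + coords[1][dims[1] - 1]
         + coords[2][dims[2] - 1];
}

float demo_coordinates()
{
    std::vector<float> x = make_axis(0.0f, 0.5f, 3);  // 0, 0.5, 1
    std::vector<float> y = make_axis(1.0f, 1.0f, 2);  // 1, 2
    std::vector<float> z = make_axis(2.0f, 2.0f, 4);  // 2, 4, 6, 8

    // Same shape as the coordinates[] array in writesilo().
    float* coordinates[3] = { x.data(), y.data(), z.data() };
    const int dims[3] = { 3, 2, 4 };

    // The vectors release their memory automatically when this returns.
    return last_corner_sum(coordinates, dims);
}
```

The pointers stay valid only while the vectors are alive and not resized, which is fine here because they are used within the same function.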
 
Victor Bazarov

Mark said:
[..] Unfortunately, within the writesilo() method, the data in the
mesh object is being changed during the line that includes a call to
DBCreate(). I don't know why. Thanks for any insight.

Unfortunately I likely won't be able to provide much insight, but
have you tried using a different library or simulating your SILO thing?
If you comment out all calls to the library (so that you don't even have
to link it in), does the corruption happen? If not, it's the library
and you need to look into getting a different one. If yes, then the
library has nothing to do with it and you need to start looking at other
places in your code.

Memory corruption is notoriously difficult to debug. It is quite
possible that when it happens, the execution is already past the point
where the actual bug is. It can be a dangling pointer (you know what it
means, don't you?) or it can be that you're overshooting some dynamic
(or even automatic) arrays by just a bit (often writing into a single
element past the last one in the array is enough to corrupt the stack in
a way that won't show until too late).
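
A cheap way to look for the "one element past the end" kind of overshoot, at least for std::vector, is to temporarily replace v[i] with v.at(i), which throws std::out_of_range instead of silently writing into someone else's memory. The helper below is only an illustration (checked_store is a made-up name), and it does nothing for raw arrays or for writes made inside a library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stores value at index i with bounds checking. Returns false, leaving
// the vector untouched, when i is outside [0, v.size()).
bool checked_store(std::vector<float>& v, std::size_t i, float value)
{
    try {
        v.at(i) = value;  // throws std::out_of_range if i >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Usage: the last valid index succeeds, one past the end is rejected.
void demo_checked_store()
{
    std::vector<float> v(4, 0.0f);
    assert(checked_store(v, 3, 1.0f));
    assert(!checked_store(v, 4, 1.0f));
}
```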

I still would start by chucking the SILO library.

V
 
Mark

> If you comment out all calls to the library (so that you don't even have
> to link it in), does the corruption happen? If not, it's the library
> and you need to look into getting a different one. If yes, then the
> library has nothing to do with it and you need to start looking at other
> places in your code.

The answer to your question is: yes, when I comment out all calls to
the SILO library, the problem goes away. I guess that's pretty strong
evidence in favor of the library having problems, huh?

BTW, for that reason I tried upgrading to the most recent release of
SILO, but I still have the problem.
 
Mark

> Sorry, I forgot to ask about Silo.
>
> Is this a C or a C++ library?
>
> LR


Good question. As best I can tell by looking at the header file it's
a C library that is written to be compatible with C++.

Will I run into memory allocation problems if I mix C and C++? And is
there a way to check whether this is happening?
 
Kaz Kylheku

> Hello, I've run into a strange bug and I'm not sure how to proceed
> with fixing it. Any suggestions would be most appreciated.

If you're on Linux, run your program using

valgrind --tool=memcheck <yourprog> <args>

Debugging memory corruption problems the hard way is a valuable exercise,
but after you've done it N times, for a sufficiently educational value of N,
you want to UTTL.

Use The Tools, Luke.
 
Victor Bazarov

Mark said:
Good question. As best I can tell by looking at the header file it's
a C library that is written to be compatible with C++.

I personally don't think it matters.

> Will I run into memory allocation problems if I mix C and C++?

Not usually. Unless you try to 'delete' the pointer you got from
'malloc' (even indirectly), and it's a pointer to a class with a d-tor
or any members that might have a d-tor... Anyway, there is always more
than one way to get your program to have undefined behaviour, but it
does not stem from mixing languages. You can do it in a pure C++
program, it's simple, really.

> And is
> there a way to check whether this is happening?

Whether memory problems are due to mixing languages? I don't think so.
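
To illustrate the malloc/delete mismatch mentioned above: memory obtained with malloc (directly, or handed back by a C library) should be released with free, never with delete. A minimal sketch, assuming a C++11 compiler, is to wrap such a pointer in a std::unique_ptr whose deleter is std::free. The names CBuffer and make_c_buffer are made up for this example and have nothing to do with SILO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Owns a malloc'ed float buffer and releases it with free().
typedef std::unique_ptr<float, void (*)(void*)> CBuffer;

// Allocates n zeroed floats with malloc, the way a C library might.
CBuffer make_c_buffer(std::size_t n)
{
    float* raw = static_cast<float*>(std::malloc(n * sizeof(float)));
    if (raw) {
        for (std::size_t i = 0; i < n; ++i)
            raw[i] = 0.0f;
    }
    return CBuffer(raw, std::free);  // free() runs when the CBuffer dies
}

void demo_c_buffer()
{
    CBuffer buf = make_c_buffer(8);
    assert(buf);              // allocation succeeded
    buf.get()[7] = 2.5f;      // last valid element
    assert(buf.get()[7] == 2.5f);
}   // buf goes out of scope here and std::free is called once
```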

Now, considering Kaz's and Pete's replies, try to find a good tool that
would help you identify the cause of the problem. The variations on the
cause are few. An array overrun, a dangling pointer, an uninitialised
pointer, an invalid (usually C-style) cast. That's probably not the
complete list, but close. Along with Kaz's recommendation of
'valgrind', I'd try turning up the warning level on your compiler,
running PC-lint on your source, and simplifying the code to try to
identify the place where it might happen: if you suspect SILO, start by
using their code samples first (supposing they work), then gradually add
to them what you've tested and verified as working (your code)...

V
 
Mark

> Memory management problems show up at places that have nothing to do
> with the spot where the error actually occurred. And since a memory
> management problem typically means that code ends up stomping on memory
> that it shouldn't be touching, the effects can seem random. Swapping two
> lines of code can make the symptoms disappear; commenting out large
> chunks of code can do the same. But often that's just symptoms. The
> underlying problem is still there.

I must admit that everything you said rings of the painful truth.
I've had all of these experiences in the past and the underlying cause
was always memory (mis-)management.

Thanks to Pete, Victor, Kaz, and company for your help. I'll try
using tools like valgrind, which will be a new but worthwhile (and
overdue) learning experience for me.

You all have been generous with your assistance. Thank you!
 
