gdb vs. C++ template instances


Jim King

We have a template class named Temp<...> that is used everywhere. We are
interested in Temp<DWORD>, and we use the gdb command "info types
Temp<DWORD>" to detect them. The printed output looks like the following:

xxxx.c
....Temp<DWORD>...

yyyyy.c
....Temp<DWORD>....

zzzzz.c
....Temp<DWORD>....

And I'm sure that the 3 Temp<DWORD>s are not the same for some reason.
How can I print the message of each Temp<DWORD>?

Thanks,
Jim King
 

Victor Bazarov

We have a template class named Temp<...> that is used everywhere. We are
interested in Temp<DWORD>, and we use the gdb command "info types
Temp<DWORD>" to detect them.

I understand that some users might know what it means to use some
command to "detect them", but I don't. Could you put it in C++ terms?
> The printed output looks like the following:

xxxx.c
...Temp<DWORD>...

yyyyy.c
...Temp<DWORD>....

zzzzz.c
...Temp<DWORD>....

And I'm sure that the 3 Temp<DWORD>s are not the same for some reason.

What makes you so sure? I am not doubting your statement, I am just
asking to learn about your methods/approaches.
How can I print the message of each Temp<DWORD>?

What is "the message of each Temp<DWORD>"? Perhaps it would be clearer
if you published the definition of your 'Temp' class...

V
 

Jim King

I understand that some users might know what it means to use some
command to "detect them", but I don't.  Could you put it in C++ terms?

 > The printed output looks like the following:




What makes you so sure?  I am not doubting your statement, I am just
asking to learn about your methods/approaches.


What is "the message of each Temp<DWORD>"?  Perhaps it would be clearer
if you published the definition of your 'Temp' class...

V

Sorry for my ambiguous wording.

In xxxx.cpp there's this code snippet:

template<class T>
class Temp
{
public:
    ...
    Temp<T> *clone() const
    {
        return new Temp<T>(a, b);
    }

private:
    ... a;
    ... b;
};

In gdb, I type "ptype Temp<StdMap>" and the output is:
class Temp<StdMap>
{
    ... a;
    ... b;
...

Then I break in Temp<StdMap>::clone() and type "ptype *this":
class Temp<StdMap>
{
    ... b;
    ... a;

...

See, the order of the member variables is different. Something must be
wrong. Since they are all Temp<StdMap>, how can I tell which is which?

Thanks,
Jim King
 

Jim King

Sorry for my ambiguous wording.

In xxxx.cpp there's this code snippet:

template<class T>
class Temp
{
public:
...
    Temp<T> *clone() const
    {
        return new Temp<T>(a, b);
    }

private:
    ... a;
    ... b;

};

In gdb, I type "ptype Temp<StdMap>" and the output is:
class Temp<StdMap>
{
    ... a;
    ... b;
...

Then I break in Temp<StdMap>::clone() and type "ptype *this":
class Temp<StdMap>
{
    ... b;
    ... a;

...

See, the order of the member variables is different. Something must be
wrong. Since they are all Temp<StdMap>, how can I tell which is which?

Thanks,
Jim King

My intent is to find where the wrong definition, the one in which "b"
comes before "a", comes from.

Thanks,
Jim King
 

Victor Bazarov

That's not real code, is it? Please refer to FAQ 5.8.
My intent is to find where the wrong definition, the one in which "b"
comes before "a", comes from.

I'd certainly start by posting in a gnu newsgroup where 'gdb' is on
topic. Here it really isn't.

V
 

Paul Bibbings

Jim King said:
Sorry for my ambiguous wording.

In xxxx.cpp there's this code snippet:

template<class T>
class Temp
{
public:
    ...
    Temp<T> *clone() const
    {
        return new Temp<T>(a, b);
    }

private:
    ... a;
    ... b;
};

In gdb, I type "ptype Temp<StdMap>" and the output is:
class Temp<StdMap>
{
    ... a;
    ... b;
...

Then I break in Temp<StdMap>::clone() and type "ptype *this":
class Temp<StdMap>
{
    ... b;
    ... a;

...

See, the order of the member variables is different. Something must be
wrong. Since they are all Temp<StdMap>, how can I tell which is which?

As far as I understand it, a conforming implementation gives a guarantee as
to the ordering of non-static data members that are contained within the
same access-specifier (though not necessarily that they be contiguous).
The wording is:

[class.mem]/12
"Nonstatic data members of a (non-union) class declared without an
intervening access-specifier are allocated so that later members have
higher addresses within a class object. The order of allocation of
nonstatic data members separated by an access-specifier is
unspecified (11.1). Implementation alignment requirements might cause
two adjacent members not to be allocated immediately after each
other; so might requirements for space for managing virtual functions
(10.3) and virtual base classes (10.1)."

Having said that, the issue that you are facing seems, more likely, to
have something to do with how gdb is /reporting/ the class structure in
the two instances rather than with what is actually going on behind the
scenes. Perhaps you could post a complete, minimal code example that
can be compiled to illustrate what you are seeing.
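The guarantee quoted above can be observed with offsetof. This is a minimal sketch, not Jim's class: it assumes a hypothetical Temp<T> holding two members of a simple type T (offsetof is only guaranteed for standard-layout types), both declared under the same access-specifier.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical stand-in for Temp<T>: a_ and b_ are declared without an
// intervening access-specifier, so [class.mem]/12 requires b_ to sit at a
// higher address than a_ in every Temp<T> object.
template <typename T>
class Temp {
public:
    Temp(T a, T b) : a_(a), b_(b) {}

    // offsetof is evaluated inside the class because a_ and b_ are private.
    static std::size_t offset_of_a() { return offsetof(Temp<T>, a_); }
    static std::size_t offset_of_b() { return offsetof(Temp<T>, b_); }

private:
    T a_;
    T b_;
};

// Fails an assertion if the declared order is not reflected in the layout.
template <typename T>
void check_declared_order() {
    assert(Temp<T>::offset_of_a() < Temp<T>::offset_of_b());
}
```

Since every translation unit compiling this same definition has to arrive at the same ordering, two different orders for one Temp<T> would suggest two different definitions (or a stale object file) rather than a free choice by the compiler.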

Regards

Paul Bibbings
 

Paul Bibbings

Jim King said:
Sorry for my ambiguous wording.

In xxxx.cpp there's this code snippet:

template<class T>
class Temp
{
public:
    ...
    Temp<T> *clone() const
    {
        return new Temp<T>(a, b);
    }

private:
    ... a;
    ... b;
};

In gdb, I type "ptype Temp<StdMap>" and the output is:
class Temp<StdMap>
{
    ... a;
    ... b;
...

Then I break in Temp<StdMap>::clone() and type "ptype *this":
class Temp<StdMap>
{
    ... b;
    ... a;

...

See, the order of the member variables is different. Something must be
wrong. Since they are all Temp<StdMap>, how can I tell which is which?

To be useful, code examples should be short, compilable and illustrate
the problem. Very likely the problem that you are encountering is not
`visible' in the parts of your pseudo-code that you are showing us
here. For comparison, the following naive reconstruction does not appear
to show this reordering (but then, it surely doesn't contain the essential
problematic element either):

16:17:19 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $cat gdb_reordering.cpp
// file: gdb_reordering.cpp

template<typename T>
class Temp
{
public:
    Temp(T a, T b)
        : a_(a), b_(b)
    { }
    Temp<T> *clone() const
    {
        return new Temp<T>(a_, b_);
    }
private:
    T a_;
    T b_;
};

int main()
{
    Temp<int> tmp(1, 2);
    Temp<int> *tmp_ptr = tmp.clone();
    delete tmp_ptr;
}


16:17:25 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $i686-pc-cygwin-gcc-4.4.3 -O0 -gdwarf-2 -g3 -c gdb_reordering.cpp

16:18:02 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $i686-pc-cygwin-g++-4.4.3 -Wl,--enable-auto-import -static -o gdb_reordering gdb_reordering.o

16:18:33 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $gdb ./gdb_reordering
GNU gdb (GDB) 7.0.50.20100128-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
// ...
Reading symbols from /cygdrive/d/CPPProjects/CLCPP/gdb_reordering...done.
(gdb) start
Temporary breakpoint 1 at 0x4010ee: file gdb_reordering.cpp, line 21.
Starting program: /cygdrive/d/CPPProjects/CLCPP/gdb_reordering
[New Thread 380.0x1538]
[New Thread 380.0x106c]
[New Thread 380.0x868]
[New Thread 380.0x11c8]
[New Thread 380.0x1528]

Temporary breakpoint 1, main () at gdb_reordering.cpp:21
21          Temp<int> tmp(1, 2);
(gdb) ptype Temp<int>
type = class Temp<int> {
  private:
    int a_;
    int b_;

  public:
    void Temp(int, int);
    Temp<int> * clone() const;
}
(gdb) next
22          Temp<int> *tmp_ptr = tmp.clone();
(gdb) step
Temp<int>::clone (this=0x22cce4) at gdb_reordering.cpp:12
12              return new Temp<T>(a_, b_);
(gdb) ptype *this
type = const class Temp<int> {
  private:
    int a_;
    int b_;

  public:
    void Temp(int, int);
    Temp<int> * clone() const;
}
(gdb)


You described your problem as having something to do with definitions of
Temp<DWORD> across three separate .c files (and shouldn't that be .C for
C++ code?). Isn't this just a simple case of a violation of the One
Definition Rule, i.e. that you are actually using definitions of Temp that
differ across these three files? You haven't said anything about how you
have made the definition(s) available to these files. Is it via the
same included header in all instances?
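To illustrate what differing definitions would do, here is a sketch that simulates two translation units disagreeing about the layout. The struct and function names are hypothetical; a real ODR violation needs two .cpp files and is undefined behaviour, whereas this single-file version copies the bytes with std::memcpy so that it stays well-defined.

```cpp
#include <cassert>
#include <cstring>  // std::memcpy

// Layout that one translation unit believes in: a first, then b.
struct LayoutAB { int a; int b; };

// Layout that a stale object file or a second header might use: b first.
struct LayoutBA { int b; int a; };

// An accessor "compiled against" LayoutBA: it reads the second int.
int getA_compiled_against_BA(const LayoutBA& obj) { return obj.a; }

// Build the object with one layout and read it back through the other.
int read_a_through_wrong_layout(int a, int b) {
    static_assert(sizeof(LayoutAB) == sizeof(LayoutBA), "same size");
    LayoutAB built{a, b};
    LayoutBA seen;
    std::memcpy(&seen, &built, sizeof seen);  // same bytes, other meaning
    return getA_compiled_against_BA(seen);
}

// The symptom: asking for a yields the value that was stored as b.
void check_symptom() {
    assert(read_a_through_wrong_layout(1, 2) == 2);
}
```

Nothing is corrupted here; the accessor simply uses the offset its own definition dictates, which is why such a mismatch can go unnoticed until the wrong value is used.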

Regards

Paul Bibbings
 

Jim King

Jim King said:
Sorry for my ambiguous wording.
In xxxx.cpp there's this code snippet:
template<class T>
class Temp
{
public:
...
    Temp<T> *clone() const
    {
        return new Temp<T>(a, b);
    }
private:
    ... a;
    ... b;
};
In gdb, I type "ptype Temp<StdMap>" and the output is:
class Temp<StdMap>
{
    ... a;
    ... b;
...
Then I break in Temp<StdMap>::clone() and type "ptype *this":
class Temp<StdMap>
{
    ... b;
    ... a;

See, the order of the member variables is different. Something must be
wrong. Since they are all Temp<StdMap>, how can I tell which is which?

As far as I understand it, a conforming implementation gives a guarantee as
to the ordering of non-static data members that are contained within the
same access-specifier (though not necessarily that they be contiguous).
The wording is:

   [class.mem]/12
   "Nonstatic data members of a (non-union) class declared without an
   intervening access-specifier are allocated so that later members have
   higher addresses within a class object. The order of allocation of
   nonstatic data members separated by an access-specifier is
   unspecified (11.1). Implementation alignment requirements might cause
   two adjacent members not to be allocated immediately after each
   other; so might requirements for space for managing virtual functions
   (10.3) and virtual base classes (10.1)."

Having said that, the issue that you are facing seems, more likely, to
have something to do with how gdb is /reporting/ the class structure in
the two instances rather than with what is actually going on behind the
scenes.  Perhaps you could post a complete, minimal code example that
can be compiled to illustrate what you are seeing.

Regards

Paul Bibbings

I'm sorry, but it can't be extracted. Actually, the code is definitely
correct. In one branch it works well; in another branch it fails. I
don't know the difference between the two branches (they are huge
modules with more than 10 million lines); they are almost the same. In
short, if I extract the code, it will certainly work.

I don't expect member "a" to always come before "b". I just expect the
order to be fixed at runtime. It's odd that two instances of the template
with the same parameter are not the same. Here are some unconfirmed
(probably wrong) explanations:
1. A .cpp file that used an old version of the template class, in which
b comes first, has been deleted while its .o file remains.
2. Some unexpected compiler options are passed to g++ when compiling
that .cpp file.
3. Two header files contain different definitions of the template
class.

Any ideas?

Thanks,
Jim King
 

Jim King

As far as I understand it, a conforming implementation gives a guarantee as
to the ordering of non-static data members that are contained within the
same access-specifier (though not necessarily that they be contiguous).
The wording is:
   [class.mem]/12
   "Nonstatic data members of a (non-union) class declared without an
   intervening access-specifier are allocated so that later members have
   higher addresses within a class object. The order of allocation of
   nonstatic data members separated by an access-specifier is
   unspecified (11.1). Implementation alignment requirements might cause
   two adjacent members not to be allocated immediately after each
   other; so might requirements for space for managing virtual functions
   (10.3) and virtual base classes (10.1)."
Having said that, the issue that you are facing seems, more likely, to
have something to do with how gdb is /reporting/ the class structure in
the two instances rather than with what is actually going on behind the
scenes.  Perhaps you could post a complete, minimal code example that
can be compiled to illustrate what you are seeing.

Paul Bibbings

I'm sorry, but it can't be extracted. Actually, the code is definitely
correct. In one branch it works well; in another branch it fails. I
don't know the difference between the two branches (they are huge
modules with more than 10 million lines); they are almost the same. In
short, if I extract the code, it will certainly work.

I don't expect member "a" to always come before "b". I just expect the
order to be fixed at runtime. It's odd that two instances of the template
with the same parameter are not the same. Here are some unconfirmed
(probably wrong) explanations:
1. A .cpp file that used an old version of the template class, in which
b comes first, has been deleted while its .o file remains.
2. Some unexpected compiler options are passed to g++ when compiling
that .cpp file.
3. Two header files contain different definitions of the template
class.

Any ideas?

Thanks,
Jim King

Thank you, Paul. The real code is a little more complex: the template
class inherits from a non-template interface class.

class interface
{
public:
    virtual ~interface(void) {}
    virtual interface *clone(void) const = 0;
};

template<class T>
class Temp : public interface
{
....



And yes, the template class is provided by a single header file.
I searched the whole hard disk and found no other definition.

Again, the code is definitely correct. It works in the main branch.

I really wish "ptype" could print the name of the source file in which
the symbol is defined, just like "info types" does. If it could, a
single "ptype *this" would find the culprit.


Thanks,
Jim King
 

Paul Bibbings

Actually, the code is definitely correct. In one branch it works
well; in another branch it fails.

I can't quite understand the confidence you express in your first
sentence here, given what you acknowledge in the second. And what do
you mean by `branch'? Is this a branch as in a C++ code structure
element or a CVS branch?
I don't know the difference between the two branches (they are huge
modules with more than 10 million lines); they are almost the same. In
short, if I extract the code, it will certainly work.

I don't expect member "a" to always come before "b". I just expect the
order to be fixed at runtime. It's odd that two instances of the template
with the same parameter are not the same. Here are some unconfirmed
(probably wrong) explanations:
1. A .cpp file that used an old version of the template class, in which
b comes first, has been deleted while its .o file remains.

In this case you would need a rebuild. Such an anomaly cannot be
allowed to persist if any expectation of operational correctness is
important to you.
2. Some unexpected compiler options are passed to g++ when compiling
that .cpp file.

I cannot immediately imagine an example of this which might lead to the
kind of problem you have discovered. More likely, given your first
idea, you could have a badly constructed makefile that is not handling
dependencies correctly.
3. Two header files contain different definitions of the template
class.

If there is any possibility of this being the case then your program
violates the One Definition Rule and thereby invokes undefined
behaviour. As such the language specification puts no requirements
whatsoever upon the behaviour of the program *as a whole*.
Any ideas?

If the first of your ideas is possible then you certainly need a rebuild
and you will have to address how the build process is being
(incorrectly) managed to permit such an anomaly. If the third is a
possibility then you have to simply correct it. It sounds almost like
the project has been developed without proper consideration as to how files
are integrated into and taken out of it. The One Definition Rule
exists, so to speak, to facilitate the easy management of multiple
instances of a definition, across different translation units, through
the inclusion of a common header. If there's a chance that your project
is not following this practice then you might want to refactor
accordingly.
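A minimal sketch of that practice, assuming a hypothetical header named temp.h: the header, guarded against double inclusion, holds the only definition of Temp, and each .cpp file includes it instead of carrying its own copy. The constructor and the getA()/getB() accessors are illustrative (the real members are elided in the thread), and both "files" are shown in one block here.

```cpp
// ---- temp.h (hypothetical name): the only definition of Temp ----
#ifndef TEMP_H_INCLUDED
#define TEMP_H_INCLUDED

// Non-template base class, as described earlier in the thread.
class interface {
public:
    virtual ~interface() {}
    virtual interface* clone() const = 0;
};

template <class T>
class Temp : public interface {
public:
    Temp(T a_init, T b_init) : a(a_init), b(b_init) {}
    // Covariant return type: Temp<T>* overrides interface*.
    Temp<T>* clone() const override { return new Temp<T>(a, b); }
    T getA() const { return a; }  // hypothetical accessors
    T getB() const { return b; }

private:
    T a;  // declared first, so laid out and initialized first
    T b;
};

#endif  // TEMP_H_INCLUDED

// ---- user.cpp (hypothetical): would begin with #include "temp.h" ----
#include <cassert>

void check_clone_keeps_values() {
    Temp<int> original(1, 2);
    interface* copy = original.clone();
    assert(static_cast<Temp<int>*>(copy)->getA() == 1);
    assert(static_cast<Temp<int>*>(copy)->getB() == 2);
    delete copy;  // virtual destructor in interface makes this safe
}
```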

In case you don't know already, undefined behaviour is Bad(TM). If your
program violates the ODR it has undefined behaviour and is `incorrect',
even if it appears, at certain times and under certain circumstances, to
be operating according to your expectations.

Regards

Paul Bibbings
 

Paul Bibbings

Jim King said:
I really wish "ptype" could print the name of the source file in which
the symbol is defined, just like "info types" does. If it could, a
single "ptype *this" would find the culprit.

Can't you egrep your codebase to find out whether or not there are, in
fact, multiple definitions?

If I understand you correctly, you appear to be saying that the code is
correct in one CVS branch but not in another. Is this, then, perhaps
just a question of versioning errors?

Regards

Paul Bibbings
 

Jim King

I can't quite understand the confidence you express in your first
sentence here, given what you acknowledge in the second.  And what do
you mean by `branch'?  Is this a branch as in a C++ code structure
element or a CVS branch?



In this case you would need a rebuild.  Such an anomaly cannot be
allowed to persist if any expectation of operational correctness is
important to you.


I cannot immediately imagine an example of this which might lead to the
kind of problem you have discovered.  More likely, given your first
idea, you could have a badly constructed makefile that is not handling
dependencies correctly.


If there is any possibility of this being the case then your program
violates the One Definition Rule and thereby invokes undefined
behaviour.  As such the language specification puts no requirements
whatsoever upon the behaviour of the program *as a whole*.


If the first of your ideas is possible then you certainly need a rebuild
and you will have to address how the build process is being
(incorrectly) managed to permit such an anomaly.  If the third is a
possibility then you have to simply correct it.  It sounds almost like
the project has been developed without proper consideration as to how files
are integrated into and taken out of it.  The One Definition Rule
exists, so to speak, to facilitate the easy management of multiple
instances of a definition, across different translation units, through
the inclusion of a common header.  If there's a chance that your project
is not following this practice then you might want to refactor
accordingly.

In case you don't know already, undefined behaviour is Bad(TM).  If your
program violates the ODR it has undefined behaviour and is `incorrect',
even if it appears, at certain times and under certain circumstances, to
be operating according to your expectations.

Regards

Paul Bibbings

Hi Paul,

By "branch" I mean a Perforce branch, the same thing as a CVS branch.

I have rebuilt everything several times, but the abnormal behaviour
still remains.

I searched the whole hard disk, and I can't find another definition.

I have no idea at all what is going on. Is there still a way out?

Thanks,
Jim King
 

Jim King

You have not said what exactly is abnormal. If it is just gdb showing b
before a, then this might well be an abnormality of gdb itself.


Actually, gdb is telling the truth. The template class object has been
constructed by Temp<...>(a, b). When I retrieve member a through
getA(), it returns the value of b instead, which eventually leads to a
crash. That's exactly the abnormality. I didn't see any memory
corruption, though.

I am just curious: if you ran into something this weird, what would you
do?

Thanks,
Jim King
 

Victor Bazarov

Actually, gdb is telling the truth. The template class object has been
constructed by Temp<...>(a, b). When I retrieve member a through
getA(), it returns the value of b instead, which eventually leads to a
crash. That's exactly the abnormality. I didn't see any memory
corruption, though.

I am just curious: if you ran into something this weird, what would you
do?

I would take the source code (only the source code) to a clean machine
and rebuild everything from scratch.

Do you have any place in the code where 'getA()' behaves as expected? I
mean, in the same application. If you do, you need to put a breakpoint
on that call and on the call that doesn't behave correctly.

If you can, try stepping into 'getA()' call and (a) see the assembly and
understand what offset the code uses to return the value, (b) verify
that you're in the same module - the EIP register value has to be the
same. If EIP has different values, you're in two different 'getA'
functions. Find out where the bad one comes from (by examining the map
file the linker produces, for instance), and remove it.

V
 

Paul Bibbings

Jim King said:
Actually, gdb is telling the truth. The template class object has been
constructed by Temp<...>(a, b). When I retrieve member a through
getA(), it returns the value of b instead, which eventually leads to a
crash. That's exactly the abnormality. I didn't see any memory
corruption, though.

I am just curious: if you ran into something this weird, what would you
do?

You seem to have all the tools that you need but, it being a large
codebase (as you have said), you appear to be having problems utilizing
them to good effect.

You are using gdb and you are able to locate points in the execution
where the problem, as you have identified it, occurs. I don't know how
you are invoking gdb or what debugging options you are passing to your
compiler but, from my own use of gdb, I am able to know at all times
both the source file I am `in' and the line number of the current
executing line. Consequently, I would expect that you would be able to
do the same.

Having located the source of the error, what source file contains the
code you are at that point executing? Once you have identified this,
then you have localized the problem within your huge codebase
dramatically. Then, in that file, what #include is provided to bring in
the definition of your Temp class? Have a look at that file. Is there
something wrong with the definition of Temp there? Perhaps the
definition of that type from that header is not too large to post here.
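Since ptype will not name the file, one workaround is to have each suspect file describe the layout it was compiled against. offsetof and sizeof are evaluated with whatever definition of Temp the including translation unit saw, so a small macro that also captures __FILE__ can reveal which file disagrees. This is a sketch under assumptions: the macro name, the report struct, and the public members a and b are all hypothetical, chosen so that offsetof can name the members from outside the class.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstdio>   // std::printf

// Hypothetical stand-in for whatever definition of Temp this file sees.
// Members are public here only so offsetof can name them from outside.
template <class T>
struct Temp {
    T a;
    T b;
};

// What one translation unit believes about Temp<T>.
struct LayoutReport {
    const char* file;
    std::size_t size;
    std::size_t offset_a;
    std::size_t offset_b;
};

// Expands in the .cpp file that uses it, so __FILE__, sizeof and offsetof
// all reflect that file's view of Temp<T>. (T must not contain a comma.)
#define REPORT_TEMP_LAYOUT(T) \
    LayoutReport{__FILE__, sizeof(Temp<T>), offsetof(Temp<T>, a), offsetof(Temp<T>, b)}

inline void print_report(const LayoutReport& r) {
    std::printf("%s: sizeof=%zu a@%zu b@%zu\n", r.file, r.size, r.offset_a, r.offset_b);
}

// Reports gathered in different files must agree; a failed assertion
// points at a file compiled against a different definition.
inline void expect_same_layout(const LayoutReport& x, const LayoutReport& y) {
    assert(x.size == y.size);
    assert(x.offset_a == y.offset_a);
    assert(x.offset_b == y.offset_b);
}
```

In practice each suspect .cpp file would print its report once (for example at startup), and any file whose size or offsets differ from the rest is the one whose #include chain deserves a closer look.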

Maybe you notice that, at other points in the execution, you do *not*
get the same problem. At /these/ points, are you in a different source
file? Is the header file #included in this instance /different from/
the one that you identified when you encountered the problem (above)?

Possibilities are:

1. your code does not comply with the One Definition Rule (with
different headers used and containing, perhaps subtly, but
significantly different definitions for Temp). In this case, you
have bought for yourself undefined behaviour, and you may fairly
expect *anything* to happen.

2. you have a problem with your initialization order in the definition
of your Temp class. According to [class.base.init]/5, after
initialization of virtual base classes and direct base classes, then:

"nonstatic data members shall be initialized in the order in which
they were declared in the class definition (again regardless of
the order of the /mem-initializers/)."

Are you perhaps relying on the order of the /mem-initializers/ in the
belief that everything is being initialized in the correct order
where, in fact, the order as determined by the declaration order in
the class definition results in initialization using indeterminate
values? (There is a thread about this on comp.lang.c++.moderated
currently).

3. You have not yet performed a full clean and build to remove (at your
own suggestion) the possibility of having object files still hanging
around that were built using an outdated (and incompatible)
definition of your class, previously in a header that may not itself
be present in the codebase.
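The rule quoted in point 2 can be seen in a small sketch. The class and helper names are hypothetical; instead of reading an uninitialized member (which would be undefined behaviour), each member just logs its construction, so the order is observable and the program stays well-defined. GCC's -Wreorder (enabled by -Wall) is expected to warn about the mismatched mem-initializer order, which is the point of the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which members are constructed.
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    explicit Tracer(const char* name) { init_log().push_back(name); }
};

// The mem-initializers list b_ first, but a_ is declared first, so a_ is
// still initialized first ([class.base.init]/5).
class OrderDemo {
public:
    OrderDemo() : b_("b_"), a_("a_") {}

private:
    Tracer a_;
    Tracer b_;
};

// Constructs one object and checks that declaration order won.
void check_init_order() {
    init_log().clear();
    OrderDemo demo;
    (void)demo;
    assert(init_log().size() == 2);
    assert(init_log()[0] == "a_");
    assert(init_log()[1] == "b_");
}
```

Had b_ been declared first and its initializer read a_, it would have used an indeterminate value even though the mem-initializer list looks correct.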

It really shouldn't be too hard, and the size of the codebase shouldn't
be causing too much of a problem now that you have, as you say, already
been able to track down where the error is occurring and have been able
to say something about what is happening at that point. There, the use
of your Temp instance will require a definition in some header. Find
it. Look at it. Something is wrong, /there/, most probably.

These are some ideas but first of all, please... *ensure* that you have
entirely ruled out point 3. above. If you have not yet done this, then
everything else could be wasted time. As explained, it could indeed be
(in this case) that the problem code is not actually present in the
codebase for you to find. If you do not have faith in your build script
(makefile) to do the clean properly, run it and then check afterwards to
see that all object files and such like are actually removed.

Failing any such strategy, perhaps you should respectfully accept that
the problem is too big for you (if, that is, you are working on this
huge project in a professional context) and pass it on.

Good luck!

Regards

Paul Bibbings
 

Paul Bibbings

Jim King said:
Actually, gdb is telling the truth. The template class object has been
constructed by Temp<...>(a, b). When I retrieve member a through
getA(), it returns the value of b instead, which eventually leads to a
crash. That's exactly the abnormality. I didn't see any memory
corruption, though.

I am just curious: if you ran into something this weird, what would you
do?

Just an idea. You mentioned earlier your disappointment in ptype
not being able to give you the file in which the type is defined. Have
you tried a backtrace? If you do this at appropriate points in the
execution, you should be able to get closer to knowing in which file
the problem is occurring. If the crash mentioned above is predictable,
put a breakpoint just before it occurs, run to it, and get a backtrace.

For example, for a snippet of code that I am working with, I can get:

(gdb) backtrace
#0 ColorArray<(bits)6>::operator[] (this=0x22cc50, index=0)
at color_array.cpp:136
#1 0x00401605 in main () at color_array.cpp:183
(gdb)

which shows the nesting of function calls and gives the files in which
the definitions are to be found.

One more thing. If you do get such a backtrace, and then look at the
identified code and find that there doesn't appear to be anything wrong
with it, take special care to check the line numbers given in the
backtrace against the file itself. If these are different, it may
signal that the object file containing the executing code is out-of-date
with the code file itself, indicating that the problem might be what we
have considered a possibility for some while now, requiring nothing more
than a full rebuild (make clean && make all).

Regards

Paul Bibbings
 

Jim King

Paul said:
Jim King said:
Actually, gdb is telling the truth. The template class object has been
constructed by Temp<...>(a, b). When I retrieve member a through
getA(), it returns the value of b instead, which eventually leads to a
crash. That's exactly the abnormality. I didn't see any memory
corruption, though.

I am just curious: if you ran into something this weird, what would you
do?

Just an idea. You mentioned earlier your disappointment in ptype
not being able to give you the file in which the type is defined. Have
you tried a backtrace? If you do this at appropriate points in the
execution, you should be able to get closer to knowing in which file
the problem is occurring. If the crash mentioned above is predictable,
put a breakpoint just before it occurs, run to it, and get a backtrace.

For example, for a snippet of code that I am working with, I can get:

(gdb) backtrace
#0 ColorArray<(bits)6>::operator[] (this=0x22cc50, index=0)
at color_array.cpp:136
#1 0x00401605 in main () at color_array.cpp:183
(gdb)

which shows the nesting of function calls and gives the files in which
the definitions are to be found.

One more thing. If you do get such a backtrace, and then look at the
identified code and find that there doesn't appear to be anything wrong
with it, take special care to check the line numbers given in the
backtrace against the file itself. If these are different, it may
signal that the object file containing the executing code is out-of-date
with the code file itself, indicating that the problem might be what we
have considered a possibility for some while now, requiring nothing more
than a full rebuild (make clean && make all).

Regards

Paul Bibbings

Hi Paul,

Thank you very much; you are very experienced and a good tutor.

I'll look into it based on your suggestions. However, code access is
restricted and I cannot rebuild everything: for a few modules I have
only binaries and symbol files.

In the end, I'll resort to the disassembler.

By the way, our environment is Mac OS X 10.6 + Xcode 3.1.1.

Thank you again, and have a good day.

Regards,
Jim King
 
