Regression testing for pointers

Ike Naar

I don't think so, assuming the behaviour of malloc is well defined. All
strdup is doing is something along the lines of:

char* pNew = malloc( strlen(in) );

s/strlen(in)/strlen(in)+1/ (one extra byte for the null terminator)
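
In full, a corrected sketch would look something like this (my_strdup is
an illustrative name, not a drop-in replacement for the library function):

#include <stdlib.h>
#include <string.h>

/* strlen() does not count the terminating '\0', hence the +1. */
char* my_strdup( const char* in )
{
    char* pNew = malloc( strlen(in) + 1 );

    if( pNew )
        strcpy( pNew, in );

    return pNew;
}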
 
Ben Bacarisse

Ian Collins said:
Well I guess you could rule out both by asserting strdup calls malloc.

That would be to miss the point! Assume that strdup does not use
malloc -- this is an environment where dynamic allocation is not
permitted but we want to be able to copy some limited number of
strings.
I don't think so, assuming the behaviour of malloc is well defined.
All strdup is doing is something along the lines of:

char* pNew = malloc( strlen(in) );

if( pNew )
    strcpy( pNew, in );

return pNew;

See above.
 
Ian Collins

That would be to miss the point! Assume that strdup does not use
malloc -- this is an environment where dynamic allocation is not
permitted but we want to be able to copy some limited number of
strings.

Um, OK. I was using the description in the Solaris man page:

The strdup() function returns a pointer to a new string that
is a duplicate of the string pointed to by s. The returned
pointer can be passed to free(). The space for the new
string is obtained using malloc(3C).

Given your caveat, I would first test the function that returns the
memory and then carry on as before by asserting strdup calls that
function. I would then add tests to ensure the string was correctly
copied. This separates testing the memory management functionality from
the string copying.

See my earlier reply to "Don Y" for an example of how those tests would
be built.
 
Malcolm McLean

I.e., I can't come up with "constant" results against
which to compare test-time results.  Sticking with the
traditional malloc/free for example (my routines are
more heavily parameterized), I can create a malloc.test
that pushes size_t's to malloc.  But, aside from verifying
the result is not NULL (or, *is* NULL, in some cases!),
there's nothing I can do to check that the result is
actually correct!
malloc() should be considered a special case. Really it's an IO
function, but the IO is to the heap. You verify that an IO function
works correctly by seeing that the attached hardware operates as
expected (or a dummy device gets the bits if you're testing nukes or
something). You verify that malloc() works by seeing that the heap
changes as you want it to. But you can't actually see the heap,
because it's a static array in the mymalloc() function.
For instance if malloc() returns two pointers to the same block of
memory, that would be very hard to catch in a black box test.
 
Ben Bacarisse

Ian Collins said:
Um, OK. I was using the description in the Solaris man page:

The strdup() function returns a pointer to a new string that
is a duplicate of the string pointed to by s. The returned
pointer can be passed to free(). The space for the new
string is obtained using malloc(3C).

Given your caveat, I would first test the function that returns the
memory and then carry on as before by asserting strdup calls that
function. I would then add tests to ensure the string was correctly
copied. This separates testing the memory management functionality
from the string copying.

See my earlier reply to "Don Y" for an example of how those tests
would be built.

Hmm... I'm not seeing it. Can we take a specific case? Your previous
post happened to have a strdup with a bug (storage returned was one byte
too short). What sort of unit test would pick that bug up? I may be
misreading the allocation tests you are referring to but they don't seem
to cover that.
 
Ian Collins

Hmm... I'm not seeing it. Can we take a specific case? Your previous
post happened to have a strdup with a bug (storage returned was one byte
too short). What sort of unit test would pick that bug up? I may be
misreading the allocation tests you are referring to but they don't seem
to cover that.

I would probably be testing this as two separate libraries. The memory
allocation functions would have their own set of tests and I would then
mock those functions (let's call them staticMalloc and staticFree) in
the string library tests.

So in my string library tests I would have a test like:

TEST_F( MyStringTest,
        requestedBufferSizeOneMoreThanStringLength )
{
    const char* string = "hello";

    test::staticMalloc::expect( sizeof(string)+1 );

    strdup( string );

    ASSERT_TRUE( test::staticMalloc::called );
}
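
For reference, a plain-C analogue of such a mock might look like this
(the names and the single fixed arena are illustrative assumptions, not
the actual harness API):

#include <assert.h>
#include <stddef.h>

static size_t expectedSize;
static int    called;
static unsigned char arena[64];

/* Test half of the mock: record what the next call must ask for. */
void staticMalloc_expect( size_t size )
{
    expectedSize = size;
    called = 0;
}

/* Linked in place of the real staticMalloc() in the string library
   tests; fails the test on a size mismatch. */
void* staticMalloc( size_t size )
{
    called = 1;
    assert( size == expectedSize );
    return arena;
}

int main( void )
{
    staticMalloc_expect( 6 );   /* strlen("hello") + 1 */
    staticMalloc( 6 );          /* stands in for strdup's internal call */
    assert( called );
    return 0;
}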
 
Ben Bacarisse

Ian Collins said:
I would probably be testing this as two separate libraries. The memory
allocation functions would have their own set of tests and I would
then mock those functions (let's call them staticMalloc and
staticFree) in the string library tests.

So in my string library tests I would have a test like:

TEST_F( MyStringTest,
        requestedBufferSizeOneMoreThanStringLength )
{
    const char* string = "hello";

    test::staticMalloc::expect( sizeof(string)+1 );

    strdup( string );

    ASSERT_TRUE( test::staticMalloc::called );
}

I see, thank you.

You have a bug in the test that is interesting -- the size is also wrong
here. That was a similar bug to the one in strdup. It seems a shame
that one of the things that needs to be tested (the size expression)
must be written out here as well -- I suppose that's why the code's
author should not write the tests. BTW, I might change the expect
call to expect_at_least(...) so as to uncouple the test from
the implementation a little more.

If I can prevail on you some more... What about the requirement that
the malloced store be unique and not overlapping any others? Is that
done statistically by setting the memory to some known data pattern?
 
Ian Collins

I see, thank you.

You have a bug in the test that is interesting -- the size is also wrong
here. That was a similar bug to the one in strdup.

I would test with more than one string, so the bug would soon come to
light. I also try to use a different expression in the tests (such as
sizeof rather than strlen) to minimise common errors.
It seems a shame
that one of the things that needs to be tested (the size expression)
must be written out here as well -- I suppose that's why the code's
author should not write the tests.

The odds of the same bug appearing in two different expressions are lower
than those of a single occurrence. If I am working with a pair, we often swap
roles between tests. If I am working on my own, I have to take more care!
BTW, I might change the expect
call to expect_at_least(...) so as to uncouple the test from
the implementation a little more.

That would be fine for numeric values, but not much use for other types.
My harness' expect function accepts function objects as well as values
(one handy feature of C++), so I could use an at_least functor here.
If I can prevail on you some more... What about the requirement that
the malloced store be unique and not overlapping any others? Is that
done statistically by setting the memory to some known data pattern?

That would be a memory allocator test. I'll ruminate some more on how
to write it!
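
One plausible shape for it (a sketch only, using the standard malloc and
the known-data-pattern idea suggested above): fill each allocation with
a distinct byte pattern and re-verify every earlier fill after each new
allocation, so any overlap between live blocks shows up as a corrupted
pattern.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { BLOCKS = 64, SIZE = 33 };

int main( void )
{
    unsigned char* p[BLOCKS];
    int i, j, k;

    for( i = 0; i < BLOCKS; i++ )
    {
        p[i] = malloc( SIZE );
        assert( p[i] );
        memset( p[i], i + 1, SIZE );    /* distinct pattern per block */

        /* If any two live blocks overlap, an earlier pattern will
           have been damaged by a later memset. */
        for( j = 0; j <= i; j++ )
            for( k = 0; k < SIZE; k++ )
                assert( p[j][k] == (unsigned char)(j + 1) );
    }

    for( i = 0; i < BLOCKS; i++ )
        free( p[i] );

    return 0;
}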
 
Guest

On 3/9/2012 9:53 PM, Ian Collins wrote:


For an embedded device, you *know* the environment you
are operating in. You *know* where the heap resides
(actual addresses). You know what the alignment
restrictions of the processor are. You know that a
request for 0x61 bytes with the heap starting at 0x40000000
having a length of 0x100 in a byte alignable environment
allocating from the *end* of the free list (as opposed to
the start) trimming the allocation unit to the requested
size *should* give you a result of 0x61 bytes located
at 0x4000009F. You *then* know that an attempt to
allocate 0x9E bytes will FAIL under the same calling
conditions (because a 1 byte crumb will be left in the
free list -- which "/* CAN'T HAPPEN */").

I write MyMalloc() and hand it to you WITH A TEST SUITE.
You run it on <whatever> processor and rely on the test
suite to prove to you that the code works as advertised.

Repeat the above example with a heap at 0x2000 (!!) of size
0x1000 and you get different results.


What "test harness malloc"? There *is* no "test harness
malloc"! Recall, I said "malloc and free *types* of functions"
not "malloc and free" -- in particular, note the semantics of
myfree() differ from free(3).

All I can compare against is myself -- and if I have a bug
then the other instance will faithfully repeat the same
bug.

If you need to test something as complex as malloc you're going to have to open the box. I'd write validation code for the internal data structures, and code that tested for overlaps in allocated data (probably not possible if the data structures are OK), etc. Then run various tests, running the validation after each alloc or dealloc.
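
Something along these lines, perhaps (the free-list layout here is
invented for the sketch; a real allocator's headers will differ):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed free-list node: size of the free block, link to the next
   free block in address order. */
typedef struct node
{
    size_t       size;
    struct node* next;
} node;

/* Run after every alloc or dealloc: every free block must lie inside
   the heap, the list must be sorted by address, and no block may
   overlap (or even touch -- they should have been coalesced) its
   successor. */
void validateHeap( const node* freeList,
                   const uint8_t* heap, size_t heapSize )
{
    const node* n;

    for( n = freeList; n != NULL; n = n->next )
    {
        const uint8_t* start = (const uint8_t*)n;

        assert( start >= heap );
        assert( start + n->size <= heap + heapSize );

        if( n->next )
            assert( (const uint8_t*)n->next > start + n->size );
    }
}

int main( void )
{
    static node arena[8];                      /* stands in for the heap */
    node* a = &arena[0];
    node* b = &arena[4];

    a->size = 2 * sizeof(node);  a->next = b;  /* two well-formed blocks */
    b->size = 2 * sizeof(node);  b->next = NULL;

    validateHeap( a, (const uint8_t*)arena, sizeof arena );
    return 0;
}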

I'm going to give you some code to implement a MyMalloc
(-like) function. I'm not going to tell you where
the heap resides in memory. I'm not going to tell you
if memory accesses must be byte/word/cache-line aligned.
Heck, I'm not even going to tell you how *big* the heap
is!

What are your test cases? How do you know if they have
succeeded or not? I.e., just because MyMalloc *appears*
to return a value, are you sure that it is the *correct*
value?

Are you sure that the allocation strategy actually enacted
was "smallest fit" and not "last fit" or "largest fit"?
(my memory management system is highly parameterized)
How do you know that the allocated block was *intentionally*
aligned with whatever the processor wants -- instead of
*coincidentally* happening to be aligned? Are you sure
that any "crumbs" left behind in the free list as a
result of that alignment are recoverable? (i.e., if
you left a single byte behind, then you probably can never
recover it since it's not big enough to contain the list
overhead necessary to tie it into the free list)

white box rather than black box testing.
 
Guest

That is moot anyway because the point of a /regression/ test suite isn't to
find bugs or validate functionality. It's simply to validate that the
behavior hasn't changed between revisions (including buggy behavior!)

Ah, OK. But the first time you run the test it /is/ supposed to be part of your validation.
Thus an imperfect regression test suite can still find a regression.


This is just "top down design" under new verbiage.

Ah, I thought it was nonsense the first time round! I remember reading a book where a guy constructed an entire program top down. Jolly clever, I thought, but how does he know how to write exactly the right procedure at each step? To make no mistakes, no false steps? I suspected then -- and more than suspect now! -- that he didn't actually write the program the way he claimed he'd written it.
Some programmers like to
write the low-level functions and then express the higher level ones in terms
of the language of these lower level functions. Other programmers are able to
write the top that they want first and fill in the bottom.

I do a bit of both. I like to think of the "outside" of the program, then go into a tool building phase: these are the sort of things I'll need to get the outside behaviour. Then bolt things together, testing as I go. If it's very complex then there may be many layers. Oh, and these days OO seems to be the natural way to do things even if C is the implementation language. Functional programs I'm still struggling with.
There are obvious limitations to "test first" or "top down".

My favourite example was a stream of bytes to hold messages. This could be read from, usually forwards but occasionally backwards. Internally the stream was a singly linked list of fixed sized blocks, 32 bytes in size if I remember correctly. This was hurried into "integration test" (put everything in a big bag and shake it) without any module testing. In the field it failed. That's right: when it tried to reverse over a block boundary.
For instance, imagine it is 1969. Let's write a test case for the
awk language!
Okay, let's see now: invent C, Unix, Yacc, the Bourne shell, etc.
Finally, awk arrives, and we can execute our test case!

Bottom up is the way things really progress; top down is a small-scale, local
aberration.

I used to design top down but implement bottom up. Writing mock functions so I can implement and test top down just seems tedious.

Top down used to lead to some odd programs. One program I saw seemed to follow a rule of three: the top called three functions, each of which in turn called three more. Eventually the author got bored and a big lump of code down in the bowels suddenly did all the work.
 
Guest

First of all, I consider it sinful to release code that hasn't
been tested. "Well, I ran it a couple of times and it *looked*
like it was working. Play with it and tell me what you think..."
is not a valid attitude, IMO.

that's not quite the same as "no bugs"

"Luck" has no place in an engineering/science discipline:
"Here, Isaac, drop the apple... maybe it will FALL"

You engineer tests to *verify* expected results. If the test
doesn't turn out the way you *expect*, it should truly be a
surprise to you!


That, in itself (in this particular case) is actually one of the
things you want to test for! *If* the code is functioning
properly, then I can expect performance to be X in this situation
(based on the deliberate actions I have taken to fragment the
free list).

I'll be impressed if you succeed
Part of the rationale for my abandonment of the traditional
malloc/free approach is because it doesn't give the programmer
any control over runtime *performance* -- it's an opaque box!


How could a "trial" possibly give me more information than I
have available when examining the source?

because the real world is a constant surprise
If I know my art,
then I should be able to anticipate damn near every possible
case that the code will encounter -- moreso than an application
is likely to stumble upon in the course of pursuing *its* goal.

"should" being the operative word!

"
GLENDOWER
I can call spirits from the vasty deep.

HOTSPUR
Why, so can I, or so can any man,
But will they come when you do call for them?
"

This is one of the reasons the formal methods approach has been sluggish.
In theory you prove a program is correct without ever executing it. But
people rarely do this on any large scale.

The writers of the Space Shuttle software might have achieved what
you're trying to do, at enormous cost.
 
Don Y

Hi Nick,

white box rather than black box testing.

That still doesn't address the problem. You don't know,
a priori, what the test environment will be! So, now you
have replaced "writing a function that can perform correctly
in that unknown environment" with "writing a function that
can TEST ANOTHER FUNCTION in that unknown environment".
You've just moved the issue to another piece of code.
 
Don Y

Hi Nick,

I used to design top down but implement bottom up. Writing mock functions so
I can implement and test top down just seems tedious.

I've a friend who codes entirely top-down. It is very (psychologically)
straining to watch over his shoulder:

"What's foo() do?"
<shrug> "I dunno. I'll figure it out when I write it..."

I design hardware so tend to think of a project in terms of what
it "needs" to perform its function. Then, build higher levels of
abstraction up from the bottom.

But, you have to know where you are headed, ultimately, so that
you propagate the necessary abstractions upward. "Do I need to
convey communication errors up to the consumers of that service?"
etc.

I find looking at a problem from the top is the only way to understand
the nature of that goal. OTOH, I find it easier to determine when
and where to shed detail as building up -- rather than *adding*
detail when drilling down.
 
Don Y

Hi Malcolm,

malloc() should be considered a special case. Really it's an IO
function, but the IO is to the heap.

Why would you consider this different than a function that
manipulates a doubly linked list? (a pure function in your
lexicon)
You verify that an IO function
works correctly by seeing that the attached hardware operates as
expected (or a dummy device gets the bits if you're testing nukes or
something). You verify that malloc() works by seeing that the heap
changes as you want it to. But you can't actually see the heap,
because it's a static array in the mymalloc() function.

But you really only care that the results malloc() returns to
you are consistent with its design specifications.

E.g., how would this testing differ from enqueue()/dequeue()
functions? You don't go poking around ALL of memory verifying
that enqueue and dequeue haven't, accidentally, twiddled
something that they shouldn't have...?
For instance if malloc() returns two pointers to the same block of
memory, that would be very hard to catch in a black box test.

Actually, that sort of thing is easy to check for.
In theory, you not only want to test that the pointers
never coincide but that a pointer doesn't reference a
location *inside* another (previously allocated) block
*or* outside the configured range of the heap. E.g.,
my "releaseMemory()" verifies that the "chunk" that
is passed to it resides *in* the heap that is selected,
is not present in the free list *and* "makes sense"
in the context of those constraints.
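
In outline, those checks look something like this (the free list
representation here is purely illustrative -- the real structures are
more elaborate):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct fnode { struct fnode* next; } fnode;

/* The checks described above: the chunk must lie inside the selected
   heap, and must not already be on the free list (a double release). */
void checkRelease( const void* chunk,
                   const uint8_t* heap, size_t heapSize,
                   const fnode* freeList )
{
    const uint8_t* p = (const uint8_t*)chunk;
    const fnode* n;

    assert( p >= heap && p < heap + heapSize );

    for( n = freeList; n != NULL; n = n->next )
        assert( (const void*)n != chunk );
}

int main( void )
{
    static fnode pool[8];              /* stands in for the heap        */

    pool[0].next = NULL;               /* free list holds just pool[0]  */

    /* pool[4] is inside the heap and not on the free list: passes.    */
    checkRelease( &pool[4], (const uint8_t*)pool, sizeof pool, &pool[0] );
    return 0;
}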

*If* you know the details of the execution environment,
then you can easily create a test scenario where every
option of the allocator is observable -- much the
same way that you *know* 2+3==5.

E.g., if the heap resides at 0x80000000 and has a size of
0x1000, is initially "full" (i.e., I just created it!)
then a request for 0x20 bytes chosen from the *tail*
of the free list using a "best fit" strategy, trimming
the allocation to the specified size with an alignment
granularity of 0x80 will yield the value 0x80000F80.
A subsequent allocation request without the alignment
constraint will yield 0x800000E0, while one additional
request for 0x100 bytes with no alignment constraints
would yield 0x80000E80. These can be determined from the
specification of the function's operation.
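
The first of those constants can even be checked with a few lines of
arithmetic (the round-down-to-alignment rule is an assumption implied
by the description above):

#include <stdint.h>
#include <stdio.h>

int main( void )
{
    uintptr_t base  = 0x80000000u;  /* heap base              */
    uintptr_t size  = 0x1000u;      /* heap size              */
    uintptr_t req   = 0x20u;        /* requested bytes        */
    uintptr_t align = 0x80u;        /* alignment granularity  */

    /* Tail allocation: carve req bytes off the end of the free
       region, then round the start down to the alignment boundary. */
    uintptr_t addr = (base + size - req) & ~(align - 1u);

    printf( "%#lx\n", (unsigned long)addr );   /* 0x80000f80 */
    return 0;
}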

These are predictable and invariant -- in *this* test
environment. The problem is the "constants" will
vary when the platform, heap location, etc. vary.
 
Don Y

Hi Nick,

On 3/11/2012 6:09 AM, (e-mail address removed) wrote:

that's not quite the same as "no bugs"

Exactly! Yet that seems to be the most common approach to testing
nowadays (how many folks have formal specifications, test suites,
validation procedures, etc.?).

You have to consciously *attack*/challenge a piece of code to
identify deficiencies. Where are the boundary conditions?
What did the developer *not* expect? etc.

I tend to be pretty good at coming up with "unexpected"
operating conditions. When the developer invariably
*whines* saying "but you're not supposed to *do* that!"
I grin and reply, "Then don't LET me!"
I'll be impressed if you succeed

This is really not hard to do. Since most of my work is for
real-time applications, *how* the algorithm performs (timeliness)
is as important as what it *does*.

This was the motivation for developing this version of the
memory management library -- since an application must be
able to accurately predict how a particular request will
be satisfied INCLUDING HOW LONG IT WILL TAKE.
because the real world is a constant surprise

Engineering is the art of anticipating those surprises!
In the "ethereal" world of software, this is considerably easier
than in the physical world ("What happens if a guy whacks it
with a 20lb sledge hammer?")
"should" being the operative word!

Of course! That's what makes me good at what I do! :>
"
GLENDOWER
I can call spirits from the vasty deep.

HOTSPUR
Why, so can I, or so can any man,
But will they come when you do call for them?
"

This is one of the reasons the formal methods approach has been sluggish.
In theory you prove a program is correct without ever executing it. But
people rarely do this on any large scale.

Actually, this is common on very large scale projects (which,
otherwise, would never be able to run a single invocation to
completion).

Where it *doesn't* happen is smaller projects, "lone wolf"
developers, "immature" organizations, etc. "Pay me now or
pay me later" is a great mantra when it comes to software
quality and testing, in general.
The writers of the Space Shuttle software might have achieved what
you're trying to do, at enormous cost.

There are many industries where formal testing and validation
are the norm and "required" practices. Surprisingly, the
cost for this isn't as great as one would think. Once a
module has been validated, that cost is behind you -- any
future use of the module comes with an *assurance* of proper
operation (within the constraints of the contract/spec).

In these environments, you really benefit from code AND
design reuse.

OTOH, if you are developing "ad hoc" and testing is left
to the whims of the developer, chances are you can't
afford a proper validation environment and you'll just keep
paying for it in OTHER ways (bugs, feature creep, training,
etc.) without ever being able to wrap your hands around the
entire problem.

[Do you get the impression that I am a big advocate of
formalized methods? :> ]
 
Ian Collins

Hi Nick,



That still doesn't address the problem. You don't know,
a priori, what the test environment will be! So, now you
have replaced "writing a function that can perform correctly
in that unknown environment" with "writing a function that
can TEST ANOTHER FUNCTION in that unknown environment".
You've just moved the issue to another piece of code.

Didn't my suggestion address those issues? If not, what did I miss?
 
Jorgen Grahn

Hi Jorgen,



Note the use of "TYPES of routines" (i.e., NOT "malloc")


See above. And, the earlier comment regarding "functions
(in general) that manipulate pointers".

I just went with your assumption that a malloc()/free() example would
be useful. (I have to admit I didn't understand the part about
manipulating pointers.)
No, you can't. You don't know what valid results *are*!

I just gave some rules for valid and invalid results above. Not *all*
rules, of course! You surely realize that it's not possible to prove
the correctness of a malloc() implementation this way.

Note that I'm not saying I would be satisfied with a test which only
uses the three rules above. It just seemed to me you believed there
was nothing you could do to test such a function, without a sort of
debugging interface into its internals.

[...]
What if ptr references a location that was NEVER in the heap?
What if it references a location that could never have been
*returned* by malloc()? E.g., not properly aligned.

I don't see the relation between what you write and what I wrote ...
but in the case of free() the specification allows anything to happen,
so in some sense you don't need to test this.

/Jorgen
 
Ian Collins

Since you can't nail down actual addresses a priori (i.e., at
test creation time), the only way to design aggressive tests
(for memory allocators) is to do so algorithmically. But,
then your test suite becomes nontrivial and subject to
bugs (1.732 is ALWAYS 1.732)

This appears to be the flaw in your understanding of unit tests. You
can nail down actual addresses. Memory for malloc has to come from
somewhere and in the tests, that somewhere is the test harness.
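
To make that concrete: a toy allocator whose arena is injected by the
harness (the init hook and the bump-allocation policy here are
illustrative assumptions, not the code under discussion):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t* heapBase;
static size_t   heapSize;
static size_t   heapUsed;

/* Assumed initialisation hook: the allocator takes its arena from
   whoever calls this -- the real heap on target, the harness in
   tests. */
void myMallocInit( uint8_t* heap, size_t size )
{
    heapBase = heap;
    heapSize = size;
    heapUsed = 0;
}

void* myMalloc( size_t size )       /* trivial bump allocator */
{
    void* p;

    if( size == 0 || heapUsed + size > heapSize )
        return NULL;

    p = heapBase + heapUsed;
    heapUsed += size;
    return p;
}

int main( void )
{
    static uint8_t testHeap[0x1000];   /* harness-owned: its address
                                          is known when the test runs */
    myMallocInit( testHeap, sizeof testHeap );

    /* The "constants" to compare against are now expressible in
       terms of the harness's own buffer. */
    assert( myMalloc(1) == (void*)&testHeap[0] );
    assert( myMalloc(1) == (void*)&testHeap[1] );
    assert( myMalloc(0) == NULL );

    return 0;
}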
 
Guest

On 3/11/2012 6:09 AM, (e-mail address removed) wrote:

[...] Yet that seems to be the most common approach to testing
nowadays (how many folks have formal specifications, test suites,
validation procedures, etc.?).

there's a system test spec and we have to comply with an international standard. This is verified by an external third party.
You have to consciously *attack*/challenge a piece of code to
identify deficiencies. Where are the boundary conditions?
What did the developer *not* expect? etc.

I tend to be pretty good at coming up with "unexpected"
operating conditions. When the developer invariably
*whines* saying "but you're not supposed to *do* that!"
I grin and reply, "Then don't LET me!"

I worked in a system test department for a while. I collected a list of best excuses: "that hardly ever happens", "the customer would never do that", and my all time favourite, "but the code says that's what happens!"

Engineering is the art of anticipating those surprises!
In the "ethereal" world of software, this is considerably easier
than in the physical world ("What happens if a guy whacks it
with a 20lb sledge hammer?")

yes but most engineers don't have to cope with millions of potentially interacting parts.
Of course! That's what makes me good at what I do! :>


Actually, this is common on very large scale projects (which,
otherwise, would never be able to run a single invocation to
completion).

Why not? Because they don't terminate?

I was talking about actual proofs of correctness. VDM, Z, SPARK. Are these widely used?
Where it *doesn't* happen is smaller projects, "lone wolf"
developers, "immature" organizations, etc. "Pay me now or
pay me later" is a great mantra when it comes to software
quality and testing, in general.
The writers of the Space Shuttle software might have achieved what
you're trying to do, at enormous cost.

There are many industries where formal testing and validation
are the norm and "required" practices.

yes. I know. The Space Shuttle software was a step above this. CMM level 5 and all that.

Formal testing is not the same as formal methods.
Surprisingly, the
cost for this isn't as great as one would think. Once a
module has been validated, that cost is behind you -- any
future use of the module comes with an *assurance* of proper
operation (within the constraints of the contract/spec).

In these environments, you really benefit from code AND
design reuse.

OTOH, if you are developing "ad hoc" and testing is left
to the whims of the developer, chances are you can't
afford a proper validation environment and you'll just keep
paying for it in OTHER ways (bugs, feature creep, training,
etc.) without ever being able to wrap your hands around the
entire problem.

[Do you get the impression that I am a big advocate of
formalized methods? :> ]
 
Don Y

Hi Ian,

OK, it's a dull Sunday morning, so I cooked up a quick working example
using Google Test.

The simple function myMalloc() will:

Return NULL for a zero size request.
Return a correctly aligned pointer within the heap for a non-zero request.
Use a fixed heap base of 0xA000 and size of 0x1000 on TARGET_X.

But those criteria might not test the full capabilities of
myMalloc. E.g., does it behave when the heap size is increased
to 0x6000? Does it behave if I move the heap to 0x0000? Does
it work for "big" requests -- e.g., a megabyte? (it *should*
on TARGET_Y but that wouldn't make sense on TARGET_X... two
different test suites?)
#if defined TARGET_X
const uint8_t* heapBase = (const uint8_t*)0xA000;
const size_t heapSize = 0x1000;
const size_t minAlign = 8;
#else
uint8_t* heapBase = NULL;
size_t heapSize = 0x1000;
size_t minAlign = 8;
#endif

You are relying on being able to enumerate all of the test
conditions for each potential platform. But you can't write
that test suite "today" without knowledge of what they will be
(e.g., 1MB request).

The problem is inherent in the fact that this is an area
(pointers, size_t) where the language gives the implementation
lots of flexibility. OTOH, you *know* that a double will
be good for ~10+ digits...
TEST_F( MyMallocTest,
        returnIsNullForZeroByteRequest )
{
    ASSERT_FALSE( myMalloc(0) );
}

Should be null for "too big" request.
TEST_F( MyMallocTest,
        returnIsInHeapForOneByteRequest )
{
    ASSERT_TRUE( myMalloc(1) >= heapBase );
    ASSERT_TRUE( myMalloc(1) < heapBase+heapSize );
}

Should be "in heap" for *all* requests.
TEST_F( MyMallocTest,
        returnForOneByteRequestCorrectlyAligned )
{
    ASSERT_EQ( 0, (size_t)myMalloc(1)%minAlign );
}

Should be correctly aligned for *all* requests.
TEST_F( MyMallocTest,
        returnForOneByteRequestCorrectlyAlignedFor128Bytes )
{
    minAlign = 128;
    ASSERT_EQ( 0, (size_t)myMalloc(1)%minAlign );
}

The Google test code is at the end to avoid clutter.

The point of my "should be's" is that you can pin the
function's return values down much tighter than that. And,
you *need* to be able to do so in order to assure yourself
that it truly is operating properly.

If I run the returnIsInHeapForOneByteRequest 0x1000 times,
will it always pass (without an intervening free())?
Shouldn't it fail at the 0x1000/minAlign point, instead?
Am I sure that no two of the returned pointers EVER coincide?

E.g., you wouldn't write:

ASSERT(sqrt(3) > 1)
ASSERT(sqrt(3) < 2)
ASSERT(sqrt(3) > sqrt(2))
ASSERT(sqrt(3) < sqrt(4))

*All* of these are appropriate. But, what you really want
to know is

ASSERT(sqrt(3) =~ 1.732...)

This embodies *all* of the above tests and gives the
function far less "wiggle room".

(malloc -- and your myMalloc -- are too loosely defined for
robust formal testing. That's one of the reasons why I
write my own memory management tools. I need to be able
to *predict* the return values for each invocation -- since
I know how I've called it in each *prior* invocation, this
should be straightforward for a deterministic function)
 
