Typecasting Pointers on a 64 bit System

Ben Bacarisse

James Kuyper said:
On 11/16/2011 06:53 AM, James Kuyper wrote:
...

Yes, I can count. However, when I decided to add realloc() to the list,
I forgot to update that to "four".

For even greater generality you could replace "variables" with "objects"
and you'd be including compound literals that appear inside a function.

Oh, reading on I see "defined" which does fit for CLs. "Created" maybe?
 
Seebs

Astounding. And I say that as someone who's spent a long time in maintenance
fixing major ****-ups by people who just didn't get it. But not one of them
(even the "we've read /Design Patterns/, we don't have to care about O(n^2)
getting huge" crowd, and even the "shift by N bits by shifting by 1 bit N-1
times" dude) would have done *that*.

You would be amazed.

I hope you're exaggerating. I guess I should get a nice warm feeling of
comfort that ${DAYJOB} really is about as great a bunch of coders as
you could ever hope to work with.

Nope, not exaggerating. Couple jobs back, there was this guy who had
on paper a decade of C experience, who absolutely *insisted* that this
had to work, and was the only reasonable way to avoid memory leaks.

And he got promoted to being in charge of stuff.

But yeah. It is sort of astounding, but then... I read a lot of code,
and it is full of nonsense. Today's:

*p++ = toupper((unsigned char) *p);

This is in code that's widely used, and gcc even warns about it... If
you compile at -O0. Which most people don't.
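
The problem with that line is that it both increments p and reads *p with nothing sequencing the two. A minimal sketch of a well-defined equivalent follows; the function name upcase_in_place is hypothetical and is not taken from the code being discussed.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case a NUL-terminated string in place.
   The read of *p and the increment of p live in separate expressions,
   so p is never read and modified without sequencing. */
void upcase_in_place(char *p)
{
    for (; *p != '\0'; p++) {
        *p = (char) toupper((unsigned char) *p);
    }
}
```

The cast to unsigned char is kept from the original, since toupper() expects a value representable as unsigned char (or EOF).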

In another fairly famous piece of software, we came across:

unsigned char x;

x = NULL;

which, again, "had worked most of the time".

I guess that's the thing. Most code is not subject to serious technical
review, and most programmers are sorta shoddy. Heck, I do horrible stuff,
it's just that I usually get it caught in review, or I catch it myself
if I'm lucky.

-s
 
Fritz Wuehler

Let's try something else. I have a lot of experience in programming but I
have written about 10 lines of C (and no Java, C++ or anything related) and
not had to work with it or close to it ever.

Does this code fragment mean

1. allocate some storage
2. free it
3. assign a value to the area pointed to by the pointer to the storage you just
freed

If so that won't work forever, especially not in a multiuser or
multiprocessing environment. It might work for a long time on a single user
DOS though. Where is this code supposed to run?

If I got it right you don't need 3-5 years of programming in C to realize
the problem you just need to have a clue. If I am wrong then I will probably
need 3-5 years of programming in C to understand the issue!

Btw I don't understand the nuances of the first line, does it mean define a
structure called foo that contains a pointer to storage enough for one
pointer to a structure named foo that contains only a pointer?
 
Seebs

Let's try something else. I have a lot of experience in programming but I
have written about 10 lines of C (and no Java, C++ or anything related) and
not had to work with it or close to it ever.
Does this code fragment mean
1. allocate some storage
2. free it
3. assign a value to the area pointed to by the pointer to the storage you just
freed

Yup.

If so that won't work forever, especially not in a multiuser or
multiprocessing environment. It might work for a long time on a single user
DOS though. Where is this code supposed to run?

It was running originally on SunOS, which is Unixy, and in which the
default behavior (everything involved was single-threaded) was basically
that the pointer would probably stay valid for a while until something else
got allocated using the same space.

If I got it right you don't need 3-5 years of programming in C to realize
the problem you just need to have a clue. If I am wrong then I will probably
need 3-5 years of programming in C to understand the issue!

Well, yes. And that's my point; it is not reasonable to infer that someone
who has been professionally programming for several years in a language has
any particular clue.

Which is why I think pointing out that there are risks involved in, say,
handing a pointer to allocated memory to another thread in a multithreaded
program is probably reasonable. Yes, all the experienced-and-competent
programmers know that, but lots of people are only one or the other of
those, and lots are neither.

Btw I don't understand the nuances of the first line, does it mean define a
structure called foo that contains a pointer to storage enough for one
pointer to a structure named foo that contains only a pointer?

No, it means declare a pointer-to-struct-foo named 'f', and set the pointer
to the return of malloc(N), where N is "the number of bytes you need to
hold a thing of the same type as *f".

-s
 
Kaz Kylheku

Let's try something else. I have a lot of experience in programming but I
have written about 10 lines of C (and no Java, C++ or anything related) and
not had to work with it or close to it ever.

Does this code fragment mean

1. allocate some storage
2. free it
3. assign a value to the area pointed to by the pointer to the storage you just
freed

If so that won't work forever, especially not in a multiuser or
multiprocessing environment.

In practical terms (how this typically works): the freed object's memory area
is no longer owned by the program, but by the memory allocator. The memory
allocator can put bits there, such as link pointers that put the object on a
free list. If that is how the allocator works, it will typically put those
bits there right away, before the free function returns.

In many memory allocators, in the interests of saving space, allocated objects
do not have any headers. This allows objects to be adjacently allocated with
no wasted space. Only free objects are kept on free lists, and the bookkeeping
pieces such as pointer and flag fields for keeping free lists go inside
the objects, in the same memory where the application's data used to be
when those objects were allocated.

After the program did the above, it might continue executing fine until the
next time it makes any kind of call to the memory allocator. This is true even
if there is no concurrency. The memory allocator might walk its free lists
and hit a bad pointer due to the clobbered memory location.
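
To make the free-list point concrete, here is a toy fixed-size pool in which the link pointer of a free block occupies the same bytes the application's data used while the block was allocated. It is only an illustration under simplifying assumptions (one block size, no threads); the names pool_init, pool_alloc and pool_free are hypothetical, and real malloc implementations differ.

```c
#include <stddef.h>

#define POOL_BLOCKS 4

/* A block is either on the free list or holds application data,
   never both, so the two uses can share storage. */
union block {
    union block *next_free;   /* meaningful only while the block is free */
    long payload;             /* meaningful only while it is allocated */
};

static union block pool[POOL_BLOCKS];
static union block *free_list;

/* Put every block of the pool on the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Pop a block off the free list; returns NULL when the pool is exhausted. */
long *pool_alloc(void)
{
    union block *b = free_list;
    if (b == NULL)
        return NULL;
    free_list = b->next_free;
    return &b->payload;
}

/* Push a block back; the link pointer overwrites the old payload
   before this function returns. */
void pool_free(long *p)
{
    union block *b = (union block *) p;
    b->next_free = free_list;
    free_list = b;
}
```

In this sketch, storing through a pointer after passing it to pool_free() would overwrite next_free, and the damage would only show up on a later pool_alloc() call, which matches the delayed failure described above.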

In ``ISO C language lawyer'' terms, as soon as free(f) is called, the pointer f
(and all copies of f that the program may have elsewhere) become
``indeterminate'' values. The use of an indeterminately-valued object results
in undefined behavior.

I.e., undefined behavior occurs in the expression f->x itself, regardless of the
assignment to that location f->x = 1. The expression f->x uses the pointer f
(to access the place x, or to designate it as an lvalue). This use is undefined
since f is indeterminate.
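
For contrast, a minimal sketch of the same three operations in an order that is well defined. The definition of struct foo is an assumption (the snippet under discussion does not show it), and use_then_free is a hypothetical name.

```c
#include <stdlib.h>

struct foo { int x; };   /* assumed definition, not shown in the thread */

/* Returns the value that was stored, or -1 if allocation failed. */
int use_then_free(void)
{
    struct foo *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;

    f->x = 1;            /* every use happens while the object is alive */
    int seen = f->x;

    free(f);
    f = NULL;            /* replace the now-indeterminate value with a valid one */
    return seen;
}
```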
It might work for a long time on a single user
DOS though.

Perhaps, if the program does not ever make another call to functions like
malloc, free, realloc, calloc again.

If the DOS memory allocator is being directly used by malloc, this might screw
up the operating system.

If I got it right you don't need 3-5 years of programming in C to realize
the problem you just need to have a clue.
If I am wrong then I will probably
need 3-5 years of programming in C to understand the issue!

Btw I don't understand the nuances of the first line, does it mean define a

The snippet is incomplete. The first line assumes that there is a 'struct foo'
type that has been previously defined, which has a member x of integer type (or
some arithmetic type to which 1 can be assigned). The first line could define
the type by including the body of the structure:

struct foo { int x; } *f = malloc(sizeof *f);

This is a declaration with an initializer. The 'struct foo .. { ... }'
is the list of declaration specifiers (containing one specifier: that
for a structure type). The *f part is a declarator, declaring a name f.
The * type construction operator means that f is declared as a pointer
type (a pointer to what? The type produced by the declaration specifier list).

C declarations are split into specifiers and declarators, which allows
multiple declarators to hang off a shared "stem" of specifiers, and
yet declare different kinds of things:

/* x is of type int; y is a pointer to int;
z is a function of no arguments returning int. */

int x, *y, (*z)(void);

Our pointer f is initialized from the return value of a call to the principal C
memory allocation function, malloc. sizeof is an operator for computing the
sizes of types based on object-designating expressions or type expressions, and
*f is an object-designating expression which means "the object obtained by
dereferencing the pointer f". f is not yet initialized, so the object does not
exist, but that doesn't matter because sizeof doesn't use an expression's
value, only its type. The type of *f is "struct foo". So we are asking malloc
for enough bytes to cover the structure.

If you write some_type *p = malloc(sizeof *p), then the size stays correct
if you later change some_type to other_type, because you have not
mentioned the type name in two places, as you do in:
some_type *p = malloc(sizeof (some_type)); /* more error-prone */
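
A small sketch of both points: sizeof uses only the type of its operand, so *f is never evaluated even though f is uninitialized, and the sizeof *p form keeps the allocation size tied to the pointer's declared type. The struct layout and the names size_via_pointer and foo_new are hypothetical.

```c
#include <stdlib.h>

struct foo { int x; double y; };   /* hypothetical layout for illustration */

/* sizeof does not evaluate *f here, so the uninitialized f is never read. */
size_t size_via_pointer(void)
{
    struct foo *f;
    return sizeof *f;
}

/* The type name appears once; changing the type of p changes the
   allocation size along with it. Returns NULL if malloc fails. */
struct foo *foo_new(int x)
{
    struct foo *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = 0.0;
    }
    return p;
}
```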
 
Joe Pfeiffer

Let's try something else. I have a lot of experience in programming but I
have written about 10 lines of C (and no Java, C++ or anything related) and
not had to work with it or close to it ever.

Does this code fragment mean

1. allocate some storage
2. free it
3. assign a value to the area pointed to by the pointer to the storage you just
freed

If so that won't work forever, especially not in a multiuser or
multiprocessing environment. It might work for a long time on a single user
DOS though. Where is this code supposed to run?

You're right that it won't work forever -- it's going to fail as soon as
that space is malloc'ed again, and written over. Whenever that is.

This has absolutely nothing to do with multiuser or multiprocessing
environments, as processes all have their own address space (subject to
some very specific sharing, which won't have anything to do with the
dynamic memory heap).

If I got it right you don't need 3-5 years of programming in C to realize
the problem you just need to have a clue. If I am wrong then I will probably
need 3-5 years of programming in C to understand the issue!

Btw I don't understand the nuances of the first line, does it mean define a
structure called foo that contains a pointer to storage enough for one
pointer to a structure named foo that contains only a pointer?

It means declare a pointer to an object of type 'struct foo' (that's
what the 'struct foo *f' part means), allocate a block of storage
large enough to hold an object of that type, and assign the pointer to
point at the space (that's what the '= malloc(sizeof(*f));' part
means).

'struct foo' must have been defined elsewhere for this to work.
 
blmblm

I tend towards this view, myself.

If nothing else, I do not consider it safe to assume that generic programmers
not previously verified to understand the issues have any awareness of the
issues of passing the address of "some object somewhere" into a thread
mechanism or the like.

Heck.

Consider:

struct foo *f = malloc(sizeof(*f));
free(f);
f->x = 1;

I do not consider it a reasonable assumption that an arbitrary person
with 3-5 years or more of experience programming C will see anything
wrong here. I've seen someone who had >10 years claimed experience and
got promoted to lead engineer for a project who insisted that this was
not only permissible but the only way to avoid memory leaks in C.

Remarkable. Just .... remarkable.

For some reason lately the following quotation, attributed to Charles
Babbage, has been coming to mind a lot:

"On two occasions, I have been asked [by members of Parliament],
'Pray, Mr. Babbage, if you put into the machine wrong figures,
will the right answers come out?' I am not able to rightly
apprehend the kind of confusion of ideas that could provoke such
a question." -- Charles Babbage (1791-1871)

Something about that "I am not able to rightly apprehend the kind
of confusion of ideas ...." :)

... And yes, the project failed utterly and everyone involved lost their
jobs. Funny only because it was over a decade ago. :)

It is very easy for people who have finally been taught that it's not okay
to return the address of a local variable to conclude that it's fine to
pass the address of a local variable to a function you call. And you're
*calling* pthread_create(), see. So that's fine.

Well, now *I*'m the one who's confused .... Are you saying
it's not fine? I suppose if the function you call creates an
independent thread, and that thread persists longer than the one
that created it, yes, there could be a problem .... Was that
your point?
 
blmblm

[ snip ]
In this case, the intent of the "arg" argument to pthread_create()
is that it should be a pointer to data needed by the start routine.

Which in my thinking is a strong argument for *not* using creative
casting to pass some other kind of data.

Granted that it may take a bit of thought to be sure that both
the pointer and the pointed-to data are stable enough to be
passed to another thread without creating the potential for race
conditions, and maybe that argues against using the parameter as
it was clearly meant to be used. But still, if anyone else is
ever going to read or use this code, well ....
 
Seebs

Well, now *I*'m the one who's confused .... Are you saying
it's not fine?

Yes.

I suppose if the function you call creates an
independent thread, and that thread persists longer than the one
that created it, yes, there could be a problem .... Was that
your point?

Yes.

But it's "fine" according to the simple rule people learn if they don't
really *get* threaded programming. In the absence of threads, passing the
address of a local variable to a function you call is generally safe.
(Not, of course, if that function stashes the pointer for later use...)
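
One common way around the problem, sketched here under assumptions: give the new thread a heap-allocated copy of its argument and let the thread free it, so nothing the thread touches lives in the creator's stack frame. The names worker and start_worker are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical start routine: takes ownership of a heap-allocated int,
   frees it, and returns the doubled value in a fresh allocation that
   whoever joins the thread must free. */
static void *worker(void *arg)
{
    int *value = arg;
    int doubled = *value * 2;
    free(value);

    int *result = malloc(sizeof *result);
    if (result != NULL)
        *result = doubled;
    return result;
}

/* Starts a worker with its own copy of n. Returns 0 on success. */
int start_worker(pthread_t *tid, int n)
{
    int *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return -1;
    *arg = n;

    if (pthread_create(tid, NULL, worker, arg) != 0) {
        free(arg);      /* the thread never started, so we still own arg */
        return -1;
    }
    /* arg now belongs to the worker and must not be touched here.
       This function can return at once: the worker holds no pointer
       into its stack frame. */
    return 0;
}
```

Passing &n instead of a heap copy would compile just as cleanly, but n stops existing when start_worker() returns, possibly before the worker has read it.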

-s
 
Phil Carmody

James Kuyper said:
Off-hand I can think of three ways by which previously valid addresses
can become invalid, and three of them are quite explicit: free(),
realloc(), and fclose().
However, the addresses of variables with automatic storage duration
become invalid as soon as execution of the block in which they are
defined ends; that seems pretty implicit to me. YMMV

I was considering the ending of a scope to be quite explicit.

I hope you're not suggesting that the fact that free(), realloc() and
fclose() can only occur explicitly somehow makes it trivial to ensure
that they have not yet been called for a given pointer? If that were the
case, garbage collection wouldn't be as popular an extension as it is.
Even in single-threaded code it takes a certain amount of discipline to
ensure that a pointer is not used after the memory it points to has been
free()d.

That's the fallacy of the hasty generalisation. Whilst /in general/ it's
hard to ensure all pointers are treated safely, that doesn't mean for every
single pointer it's hard.

When possible, I try to make sure that pointers into dynamically
allocated memory are stored only in automatic objects whose lifetime
ends immediately after the call to free(), rendering it impossible for
such problems to occur. However, it's not always possible to impose such
restrictions. When I can't do that, I insert code setting the pointer to
NULL immediately after the free(). However, to be helpful, that requires
that I protect all problematic uses of such a pointer with tests to
determine whether it's null.

That started off sounding like good defensive programming, but as the
That started off sounding like good defensive programming, but as the
issue in question was having multiple pointers to the same block of
memory, resetting and checking the pointer is not a solution.

When the expressions p=malloc(), *p=3, x=*p, and free(p) occur in two or
more different threads executing asynchronously, it seems to me that it
would be all that much harder to ensure that they occur only in the
proper order. I'm not saying it's impossible - just that it's harder.
Though Ian has assured me that this is not the case, I don't quite see
how that could be.

If the creator is tasked with freeing the memory, then clearly some
synchronisation is required. However, if the thread is tasked with the
free, then none is required (and the creator may defensively NULL the
pointer it no longer may assume is valid).

Phil
 
Phil Carmody

James Kuyper said:
On 11/16/2011 06:53 AM, James Kuyper wrote:
...

Yes, I can count. However, when I decided to add realloc() to the list,
I forgot to update that to "four".

It's OK, when in a discussion for mutual edification, I read for meaning,
not typos.

Phil
 
blmblm

Yes.

But it's "fine" according to the simple rule people learn if they don't
really *get* threaded programming. In the absence of threads, passing the
address of a local variable to a function you call is generally safe.
(Not, of course, if that function stashes the pointer for later use...)

Good heavens. That way lies madness .... though maybe not, since
is it paranoia if they really *are* out to get you?

I'd have put myself in the group that claims to "get" threaded
programming, but I'll admit I never thought about this particular
potential pitfall. Then again, I tend to write the kind of
threaded programs in which there's a "master" thread that creates
the other threads and doesn't terminate until all the others do,
and in that case surely passing addresses of local variables in the
master thread to the other threads is okay, no? (At least in the
sense that the memory will still be there for the other threads,
though if multiple threads have access to the same location, well,
there are the usual potential problems with that.)
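
A sketch of that master-thread pattern, assuming each worker gets its own slot so that no location is shared between workers. The names slot, square and sum_of_squares are hypothetical.

```c
#include <pthread.h>

#define NWORKERS 4

struct slot { int input; int output; };   /* hypothetical per-thread data */

static void *square(void *arg)
{
    struct slot *s = arg;       /* points into the master's stack frame */
    s->output = s->input * s->input;
    return NULL;
}

/* Returns 1*1 + 2*2 + ... + NWORKERS*NWORKERS, or -1 if a thread
   could not be created or joined. */
int sum_of_squares(void)
{
    struct slot slots[NWORKERS];    /* automatic storage, one per worker */
    pthread_t tids[NWORKERS];
    int started = 0, failed = 0, sum = 0;

    for (int i = 0; i < NWORKERS; i++) {
        slots[i].input = i + 1;
        if (pthread_create(&tids[i], NULL, square, &slots[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Join every started worker before slots[] goes out of scope;
       this is what keeps the addresses valid for the workers. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) != 0)
            failed = 1;
    }
    if (failed)
        return -1;

    for (int i = 0; i < NWORKERS; i++)
        sum += slots[i].output;
    return sum;
}
```

The safety here rests entirely on the joins happening before the function returns, including on the error path.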
 
James Kuyper

That's the fallacy of the hasty generalisation. Whilst /in general/ it's
hard to ensure all pointers are treated safely, that doesn't mean for every
single pointer it's hard.

I never meant to suggest that it's always hard. The origin of this
discussion was a general comment that "Pass-by-reference and threads
aren't known to play well" - I don't think it would be appropriate to
interpret a comment so casually worded, as having been meant in an
absolute sense. That comment, in turn, was in response to proposals to
solve Quentin's problem by dynamically allocating memory to store the
file descriptor in. I don't think Quentin's description of the problem
was ever sufficiently specific to allow us to conclude that this would
have been an easy solution.

That started off sounding like good defensive programming, but as the
issue in question was having multiple pointers to the same block of
memory, resetting and checking the pointer is not a solution.

I don't think multiple pointers to the same block of memory has ever
been brought up in the messages leading to this one. You are, of
course, perfectly correct that the solution is not sufficient for that
case. Options include nulling all of the saved pointers, or designating
one particular pointer (most reasonably, the one that free() will be
applied to) which must be checked to make sure it's not null before
using any of the others. There are other, more complicated solutions
possible, depending upon the nature of the problem. The key point is
that, one way or another, if a stored pointer has a possibility of being
de-referenced when invalid, something must be done to remove that
possibility. That constrains designs in ways that make them harder to
write, harder to read, possibly less efficient, and more prone to error.
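
A single-threaded sketch of the "one designated pointer" option: every user reaches the data through one owner object, so nulling that one pointer after free() is visible to all of them. The names owner, owner_init, owner_release and owner_get are hypothetical, and this does nothing by itself about concurrent access.

```c
#include <stdlib.h>

/* The single designated pointer; users hold a pointer to the owner,
   never a private copy of data. */
struct owner { int *data; };

/* Allocates the data and stores v in it. Returns 0 on success. */
int owner_init(struct owner *o, int v)
{
    o->data = malloc(sizeof *o->data);
    if (o->data == NULL)
        return -1;
    *o->data = v;
    return 0;
}

/* Frees the data and nulls the designated pointer. Safe to call twice,
   because free(NULL) does nothing. */
void owner_release(struct owner *o)
{
    free(o->data);
    o->data = NULL;
}

/* Checks the designated pointer before every use; returns fallback
   once the data has been released. */
int owner_get(const struct owner *o, int fallback)
{
    return o->data != NULL ? *o->data : fallback;
}
```

The extra check on every access is an instance of the design cost mentioned above.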

If the creator is tasked with freeing the memory, then clearly some
synchronisation is required. However, if the thread is tasked with the
free, then none is required (and the creator may defensively NULL the
pointer it no longer may assume is valid).

I was talking about pointers to memory that are shared between threads;
you're talking about the simpler case where a pointer to dynamically
allocated memory is passed between threads, but is never in use by two
of them at the same time. That's certainly a safer approach, but I doubt
that it's always acceptable to impose such constraints on how the
pointer is used.
 
Phil Carmody

James Kuyper said:
I was talking about pointers to memory that are shared between threads;
you're talking about the simpler case where a pointer to dynamically
allocated memory is passed between threads, but is never in use by two
of them at the same time. That's certainly a safer approach, but I doubt
that it's always acceptable to impose such constraints on how the
pointer is used.

There's certainly been some drift in the thread, and I'm quite possibly
guilty of blindly steering that.

The particular (pthread) scenario is impossible in the C codebase I
maintain, but I know that were something equivalent to crop up in a
pull request without adequate synchronisation primitives, I'd reject
it immediately. I.e. I don't just think it's acceptable, I think it's
obligatory in the context I'm familiar with. However, my perspective
is admittedly narrow.

Phil
 
