no-op function pointer

S

slebetman

Walter said:
I think you missed bluejack's idea.

The usual implementation semantics for a function pointer is:

- set up any register preservation needed
- set up the parameters
- unconditionally perform a subroutine call to the address indicated
by the function pointer -- historically if not to the actual instruction
location whose value is stored in the pointer, then to an instruction
location determined by taking the pointer value to be an offset from
some register or some segment pointer
- save any expected return parameters
- undo any necessary register preservation

with this all handled inline.

A potential replacement semantics for a function pointer would be something
like this:

- construct a pointer to a block of code that would set up the
parameters, and another pointer to a block of code that would
retreive return parameters to the proper locatons, and put the address
of the structure containing both into a register reserved for this
purpose... or push the structure contents onto the stack
- unconditionally perform a lightweight call to the address indicated
by the function pointer, either as a direct value or as an offset
as per the above. A lightweight call would just record the return
address and then do the branch
- deallocate the above structure / pop the values from the stack

The process of building a function pointer would then involve writing
into memory one of two things:

- if the pointer is NULL, just write a lightweight jump back to the
return location, without having called the setup or teardown routine
- otherwise, write a sequence to retrieve the address of the
setup routine and to perform a lightweight call to it, followed by
a sequence to perform the subroutine call the the actual code
location, followed by a sequence to retrieve the address of the
teardown routine and to perform a lightweight call to it.

I can see how register preservation might be able to be handled later
after the call. But setting up of parameters must still be done before
the call. This is because the compiler cannot know in advance where the
parameters will be coming from so it cannot handle it later, it must be
handled by the caller. So the parameter passing overhead is still
there.

There is one possibility where this optimisation may be possible. That
is if the function is only called at only one place in the code. In
this case, the compiler definitely know in advance where all the
function parameters will be coming from and can put all parameter
passing code in the subroutine itself rather than the caller.
Note: I cannot at the moment think of any implementation of the
idea that does not involve at least -one- call/return: to go
below that, one would need self-modifying code... or code that
just tests for NULL and branches around the call, which is of course
what the CALL_MAYBE_NOOP macro does...

(Yes, we've looped back to where we started, but now we know that
the idea wasn't impossible, but that the CALL_MAYBE_NOOP implementation
is probably much more efficient anyhow...)

Ahh.. in this case we are indeed back where we started and in fact
doing a NULL test. In which case why not simply write if(fp!=NULL)
which makes it clearer what you are actually doing?
 
W

Walter Roberson

Walter Roberson wrote:
I can see how register preservation might be able to be handled later
after the call. But setting up of parameters must still be done before
the call. This is because the compiler cannot know in advance where the
parameters will be coming from so it cannot handle it later, it must be
handled by the caller. So the parameter passing overhead is still
there.

At each call made to a function or function pointer, the compiler
knows where the parameters are coming from for that call. It
writes a section of code inline to wrangle the parameters into their
proper place (calling routines along the way as necessary), and then
does a full-call to the address obtained either from the pointer
or from the compile (link) time constant routine address. Like this:

Bn; /* all the code before the call */
Sn; /* the code to set up for this particular call
Cn; /* the actual call to the address from pointer or static value */
Tn; /* the code to teardown after the return, restoring registers etc. */
Mn; /* more code */

This can be replaced by,

Bn; /* all the code before the call */
push @Sn;
push @Tn;
LCn; /* lightweight call to address from pointer or static value */
Mn; /* more code */
....
S1: /* the setup code that would have been written for the first call */
return; /* lightweight return after S1 */
T1; /* the teardown code that would have been written for the first call */
return; /* lightweight return after T1 */
S2: /* setup code for second call */
return; /* lightweight return after S2 */
T2: /* teardown code after second call */
return; /* lightweight return after T2 */

You can see that as long as you can distinguish a boundary between
the previous code and the setup for the next call, then you could
take the block of setup code, move it elsewhere, and in its place insert
an absolute branch to that location and an asbolute return back after that
code; the only effect of such a transformation would be to slow the
execution by two branches. Clearly then you can take the moved
section of code, change the terminal absolute branch back after it to
be a return, and change the absolute branch into that section to be
a call. From there it follows that instead of putting the call inline,
that one could push the address that would be called, and defer the
calling into the setup code for the function itself. This would be
only a minor change to the call/return API. It shouldn't even change
optimization very much, I would think.


The success of this possibly depends upon the calling convention
remaining consistant. If, for example, calling a routine might
suffle the registers being used to track variables, then you could
run into restrictions on optimization. Suppose, for example, that
the optimizer uses Intra Procedural Analysis and determines that
the called routine is going to compute something (oh, say, take
the address of an array element), and that "soon" after the return
that the calling routine would compute the same thing -- and that
there is no specific return of the value of this computation from
the called routine to the calling routine. The traditional subroutine
ABI would be that anything computed inside a subroutine is opaque,
and that only the explicit result may be returned, and that those
are to come back through well-defined return registers. A human
optimizer looking over the same code might say, "Ah, but register
R6 of the calling code can be destroyed at this point, so if I
have the subroutine compute that item into R6 and I disable the
call linkage from preserving R6 and restoring the original value, then
I can skip computing that value in the calling routine, knowing that
it is already available. And if a human can make that kind of
optimization, then there are machine optimizers which could make
that kind of information lifetime analysis and number of times
called, number of times the internal results would be useful outside
and so on, and make the same kind of decision. If you have API-breaking
magic like that happening via the optimizer, then in theory
the code-rearrangement and semantics I set out code weaken the
opportunities for optimization. [In practice... probably not.]
 
E

Emmanuel Delahaye

Richard Crowley a écrit :
void function_that_does_nothing( void )
{
asm( " nop" );
return;
}

Huh ? What about

void function_that_does_nothing (void)
{
}

At least, it is portable !
 
S

slebetman

Walter said:
At each call made to a function or function pointer, the compiler
knows where the parameters are coming from for that call.
<snip>
Bn; /* all the code before the call */
Sn; /* the code to set up for this particular call
Cn; /* the actual call to the address from pointer or static value */
Tn; /* the code to teardown after the return, restoring registers etc. */
Mn; /* more code */
This can be replaced by,
Bn; /* all the code before the call */
push @Sn;
push @Tn;
LCn; /* lightweight call to address from pointer or static value */
Mn; /* more code */
....
S1: /* the setup code that would have been written for the first call */
return; /* lightweight return after S1 */
T1; /* the teardown code that would have been written for the first call */
return; /* lightweight return after T1 */
S2: /* setup code for second call */
return; /* lightweight return after S2 */
T2: /* teardown code after second call */
return; /* lightweight return after T2 */

If I understand it correctly, that would be result in worse performance
since it doesn't gain much over a NULL check of the function pointer
but requires two branches when an object actually needs to be handled
(fp is not NULL). That means we've speeded up "doing nothing" by
slowing down "doing something".

Furthermore, since the function pointer is not known at compile time
(the OP was emulating polymorphism) each function requires separate
set-up and tear-down codes for each location they are called from. So,
10 functions called in 3 places would result in 30 set-up and tear-down
codes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,172
Messages
2,570,934
Members
47,473
Latest member
ChristelPe

Latest Threads

Top