Function pointer vs conditional branch

  • Thread starter Edward Rutherford

BGB

Kaz Kylheku wrote:
>> an indirect function call involves call/return magic
>> (creating/destroying stack frames, passing any arguments, ...), and also
>> can't generally be branch-predicted
>
> You can safely predict that an indirect function call is always taken,
> since it is unconditional.

But you can't predict *where* it will jump, which is the point.

really?

load r1,<whatever>
add r2, r3
jmp [r1]

We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
prefetching instructions as soon as the value of r1 is stable.

in a use, such as, say:
....
mov eax, [ebp-...]
mov ecx, [eax]
call dword [ecx+0x44]

the CPU may not know the address until the memory loads get done (in the
call instruction), so the pipeline may not get filled.


a lot though depends on how likely it is that one will endlessly jump to
the same address as before, which the CPU can optimize for somewhat.

That is called pipeline prefetch. Branch prediction is an educated guess
as to whether a conditional jump will be taken or fall through.

never mind the exactness of terminology...

I meant it in a more general sense, where the branch target is
predicted and pipelined (generally), rather than the specific case of
predicting branch-taken vs not-taken.
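
just to make that concrete: below is a rough C-level sketch of the kind of
source that tends to compile down to an indirect call like the
"call dword [ecx+0x44]" above (the struct and function names are invented
purely for illustration; a C++ virtual call lowers to much the same shape):

struct renderer;  /* forward declaration so the ops table can refer to it */

/* hypothetical "ops" table: function pointers reached through a pointer
   stored in the object, much like a C++ vtable */
struct renderer_ops {
    /* ... earlier entries at smaller offsets ... */
    void (*draw)(struct renderer *self, int x, int y);
};

struct renderer {
    const struct renderer_ops *ops;   /* the "mov ecx, [eax]" load */
    /* ... object state ... */
};

void draw_one(struct renderer *r, int x, int y)
{
    /* the call target is only known once the dependent loads of r->ops
       and its 'draw' entry complete, which is the stall described above */
    r->ops->draw(r, x, y);
}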
 

Stephen Sprunk

Kaz Kylheku wrote:
>> an indirect function call involves call/return magic
>> (creating/destroying stack frames, passing any arguments, ...), and also
>> can't generally be branch-predicted
>
> You can safely predict that an indirect function call is always taken,
> since it is unconditional.

But you can't predict *where* it will jump, which is the point.

really?

load r1, <whatever>
add r2, r3
jmp [r1]

We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
prefetching instructions as soon as the value of r1 is stable.

True, but we can't know to start fetching instructions at [r1] until the
jump has at least been decoded. And are decoders smart enough to try to
resolve the jump immediately and send it to the prefetcher rather than
send it to an execution unit? And does that matter, given it may be
tens to hundreds of cycles until the load completes and the target is
known--and then there may be tens to hundreds more cycles until the
first instruction at the target is fetched and decoded?

S
 

BGB

The OP hasn't clarified exactly what he means. Perhaps selecting the
render mode simply means it is known at the start of the process.

this is why most of the rest of the post tried to address general tradeoffs.

That mode might still need checking on a per-line, per-pixel, or
per-byte basis, or maybe it's per primitive.

or per-scene...

In fact I think it is likely the OP knows perfectly well that a single
if-statement or indirect call per render (even if it's per frame, and
there are many frames per second) has no significance, but it is not
practical to have several independent lots of render subsystems, which
only differ in one small rendering detail. But I might be wrong...

dunno...

I think for whatever reason many people think "the video card is very
fast, yet the CPU is very slow", and could very well actually think that an
if statement here or there can adversely affect performance.

from personal experience though, the GPU tends to get bogged down by
things like fill-rate and similar well before one has to start worrying
about the CPU getting overloaded by the occasional if-statement.

much complexity in a 3D engine still ends up being needed mostly to
figure out how to (reasonably efficiently) cull the scene prior to passing
the geometry to the video card, as fill-rate can still be a bit of a killer.

(yes, yes, WRT numerical processing, the video card is much faster than
the CPU, but it spends a good deal more time drawing individual pixels
and running shader programs for each pixel and stuff, so it mostly
averages out, and games/... still tend to be much more video-card bound
than CPU bound).


so, maybe the OP was thinking that the time to execute an if statement
was larger than the time for the video card to draw a bunch of geometry
or similar?...


or such...
 

Kaz Kylheku

Kaz Kylheku wrote:
>> an indirect function call involves call/return magic
>> (creating/destroying stack frames, passing any arguments, ...), and also
>> can't generally be branch-predicted
>
> You can safely predict that an indirect function call is always taken,
> since it is unconditional.

But you can't predict *where* it will jump, which is the point.

really?

load r1,<whatever>
add r2, r3
jmp [r1]

We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
prefetching instructions as soon as the value of r1 is stable.

in a use, such as, say:
...
mov eax, [ebp-...]
mov ecx, [eax]
call dword [ecx+0x44]

the CPU may not know the address until the memory loads get done (in the
call instruction), so the pipeline may not get filled.

You mean that if there is inadequate prefetch in one instruction,
it may interfere with prefetch in a later instruction?

Earth-shattering!
 

Kaz Kylheku

Kaz Kylheku wrote:
>> an indirect function call involves call/return magic
>> (creating/destroying stack frames, passing any arguments, ...), and also
>> can't generally be branch-predicted
>
> You can safely predict that an indirect function call is always taken,
> since it is unconditional.

But you can't predict *where* it will jump, which is the point.

really?

load r1, <whatever>
add r2, r3
jmp [r1]

We already know where the jmp [r1] will go as soon as r1 is loaded, and can start
prefetching instructions as soon as the value of r1 is stable.

True, but we can't know to start fetching instructions at [r1] until the
jump has at least been decoded.

It is statically fixed in the code that this instruction fetches through r1.
The identity of r1 won't change.

All branch-prediction/prefetch related optimizations in the CPU have to know
/something/ about an instruction ahead of time, before that instruction is
fetched and decoded. Special state info is maintained for that purpose,
right?
 

jacob navia

On 07/01/12 00:49, Kaz Kylheku wrote:
It is statically fixed in the code that this instruction fetches through r1.
The identity of r1 won't change.

Only in statically fixed calls, i.e. when you write
call someFunction
or, in C
someFunction();

But we are speaking about a dynamically loaded target address that is
*NOT* statically fixed, as when you write in C

FnPtr fn;

fn = GetFnAddress();
fn(34);

or

FnPtr fn[10];

(*fn[idx])(34);
 

Kaz Kylheku

On 07/01/12 00:49, Kaz Kylheku wrote:

Only in statically fixed calls, i.e. when you write
call someFunction
or, in C
someFunction();

But we are speaking about a dynamically loaded target address that is
*NOT* statically fixed, as when you write in C

No, I mean that the identity of the register r1 will not change;
of course r1 has different contents in different invocations
of the code. But we can know that "this instruction branches
to the address of r1" without having to decode the
instruction every time. That decoded information can be cached in some
prefetch-related data structure in the processor.
 

Michael Angelo Ravera

Hello

As a general rule-of-thumb: Are function pointers more efficient than if
statements, if the same path will be taken throughout the course of the
program?

E.g. If I was going to repeatedly render an image in one of three ways,
and the method of rendering it gets selected at the beginning of
execution: Would it be more efficient to use a function pointer instead
of an if statement?

The main improvement may be to readability. It may be cleaner to work out
which function to call once and just call it by pointer. The cost of an
instruction to make an indirect call to a function and the cost of a
conditional branch will be approximately equal. Either could be more
efficient.
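
To make the readability point concrete, here is a rough sketch of the two
styles in question (the render function names are invented for
illustration); either way, essentially all of the time is spent inside the
render functions themselves:

/* hypothetical render functions, for illustration only */
void render_fast(void);
void render_quality(void);
void render_debug(void);

enum mode { FAST, QUALITY, DEBUG };

/* style 1: test the mode on every frame */
void run_with_if(enum mode m, int frames)
{
    for (int i = 0; i < frames; i++) {
        if (m == FAST)
            render_fast();
        else if (m == QUALITY)
            render_quality();
        else
            render_debug();
    }
}

/* style 2: pick the function once, then call it through a pointer */
void run_with_ptr(enum mode m, int frames)
{
    void (*render)(void) =
        (m == FAST)    ? render_fast :
        (m == QUALITY) ? render_quality : render_debug;

    for (int i = 0; i < frames; i++)
        render();   /* same target every iteration, so easily predicted */
}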
 

BartC

christian.bau said:
> Of course you can. The simplest prediction is "the indirect jump
> instruction will jump to the same place as it did the last time". And
> just as with conditional branch prediction,

A conditional branch has a choice of two destinations.

The indirect call (it's not even a jump) has a choice of millions.

Even if it is actually the same as last time, with the instructions after the
call possibly depending on new values for stack and frame pointers, what can
be done might be limited compared with a branch prediction.

> the processor can start
> executing instructions, and if the prediction turns out to be wrong,
> the processor throws the results away and starts from the correct
> place.

Hopefully it won't go too far in executing those instructions (such as
overwriting some essential global..).
 

Kaz Kylheku

the ordinary OoO mechanisms. Calls to (C++) virtual functions, which
are quite similar to the case being discussed, are a major reason for
branch target prediction.

Also, shared library calls, which are another example of late binding.
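
At the C level, one way to see that kind of late binding spelled out is
loading a symbol at run time and calling it through a pointer. A minimal
sketch, assuming a POSIX system (the library name "libplugin.so" and the
symbol "plugin_render" are made up):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libplugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* the target address is only known at run time, so the call below
       is necessarily an indirect call through a pointer */
    void (*render)(int) = (void (*)(int))dlsym(handle, "plugin_render");
    if (render)
        render(34);

    dlclose(handle);
    return 0;
}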
 

Joe keane

E.g. If I was going to repeatedly render an image in one of three ways,
and the method of rendering it gets selected at the beginning of
execution: Would it be more efficient to use a function pointer instead
of an if statement?

Take a look at optimizing switches; it's rather similar.

e.g.

if (ind == 0)
    do_ind_0();
else if (ind < 3)
{
    if (ind != 1)
        do_ind_2();
    else
        do_ind_1();
}
else
{
    ...
}

versus:

if (ind < TOOBIG)
    _asm('jmp _hdntbl[$ind]');
else
    goto does_not_compute;
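
The portable C counterpart of that hand-coded jump table is simply an
array of function pointers. A rough sketch reusing the names above (the
handlers themselves are assumed to exist elsewhere):

/* hypothetical handlers, mirroring do_ind_0/1/2 above */
void do_ind_0(void);
void do_ind_1(void);
void do_ind_2(void);
void does_not_compute(void);

#define TOOBIG 3

static void (*const hdntbl[TOOBIG])(void) = {
    do_ind_0, do_ind_1, do_ind_2
};

void dispatch(unsigned ind)
{
    if (ind < TOOBIG)
        hdntbl[ind]();        /* one indirect call, like the jump table */
    else
        does_not_compute();
}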
 

Stephen Sprunk

Also, shared library calls, which are another example of late binding.

I thought that case was handled with (load-time or lazy) relocation
and/or position-independent code.

S
 

Stephen Sprunk

> A conditional branch has a choice of two destinations.
>
> The indirect call (it's not even a jump) has a choice of millions.

... and that's why modern processors have a Branch Target Buffer, which
tracks the last few targets of each indirect branch. When the decoder
finds an indirect branch, it looks up that branch in the BTB, predicts
the target and continues on from there.

Note that failing to predict the target would stall the CPU anyway until
the branch was resolved; an incorrect prediction resulting in a pipeline
flush is no worse, but there's always the possibility the prediction
will be right and that stall/flush can be averted.

> Even if it is actually the same as last time, with the instructions after
> the call possibly depending on new values for stack and frame pointers,
> what can be done might be limited compared with a branch prediction.

... just as if the branch didn't exist. That's standard OoO stuff,
nothing special for indirect branches.

> Hopefully it won't go too far in executing those instructions (such as
> overwriting some essential global..).

Of course not; nothing is committed until an instruction reaches the end
of the pipeline. If a branch misprediction is detected (or various
other OoO hazards), the pipeline is flushed--and the processor stalls
while the pipeline is refilled.
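
To put the "same place as last time" heuristic in source terms, here is a
small sketch (invented function names, and it assumes the compiler keeps
the calls indirect rather than specializing them): a call site whose
target never changes is trivially predicted by the BTB, while one whose
target varies unpredictably is likely to mispredict and pay the flush far
more often.

void op_a(void);   /* hypothetical targets */
void op_b(void);

/* same indirect call each iteration, target never changes: the BTB's
   "last target" prediction is right every time */
void stable_target(long n)
{
    void (*f)(void) = op_a;
    for (long i = 0; i < n; i++)
        f();
}

/* same call site, but the target depends on unpredictable data: every
   wrong guess costs a pipeline flush */
void varying_target(long n, const unsigned char *coin)
{
    for (long i = 0; i < n; i++) {
        void (*f)(void) = coin[i] ? op_a : op_b;
        f();
    }
}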

S
 

Kaz Kylheku

It would be possible, at least on some architectures, to patch the
original call directly, but for whatever reason, that's not
particularly popular.

That reason would be that if you treat the code as immutable, you can
map the same pages into multiple address spaces (a big part of the "shared" in
"shared object").
 
