On Dec 31 2011, 2:24 am, BGB <[email protected]> wrote:
...
Yes. The task switcher would include code along the lines of
for the outgoing task and, for the incoming task,
Since the word would be checked many times between task switches, this
is much faster than following a structure off FS or similar.
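The elided task-switch code presumably amounted to saving and restoring a word at a fixed location. A hypothetical sketch (all names invented for illustration, not anyone's actual scheduler):

```c
/* Hypothetical sketch of the scheme above: a single status word at a
 * fixed, agreed-upon location, saved/restored across task switches.
 * All names are invented for illustration. */
struct task {
    unsigned saved_status;          /* per-task copy of the word */
};

unsigned task_status;               /* the fixed location user code polls */

void switch_task(struct task *out, struct task *in)
{
    out->saved_status = task_status;   /* for the outgoing task */
    task_status = in->saved_status;    /* for the incoming task */
}
```

Between switches, user code only ever tests `task_status` directly, which is the whole point: the common path is a single memory read.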
possibly, however, the issue then partly becomes:
how does the task switcher know where the variable is?...
Isn't this easy if you have a flat address space: just put the
variable at a fixed location?
if the task switcher is part of the OS kernel (it generally is), then it
does not have direct visibility of variables declared in a user-space
binary.
Does this mean that the global, G, in the following would not be
accessed directly?
int G;

void sub() {
    while (G > 0) {
        G -= 1;
        sub();
    }
}

int main(int argc, char **argv) {
    G = argc;
    sub();
    return 0;
}
it depends on the target (CPU mode, compilation mode, and object format).
if compiling for PE/COFF (EXE or DLL), it will be accessed directly.
for ELF, in this case it will generally be accessed indirectly via the
GOT (the compiler can't know at this point where the variable will end
up: a "common" variable could land either in the local ".bss" segment
or be imported from another SO, and only the linker knows which).
if it were initialized or static:
static int G;
or:
int G=0;
then the compiler would know that it was defined in the same compilation
unit, in which case:
on 32-bit x86, it will depend on whether or not this is PIC code (which
will generally depend on whether or not this is a Shared-Object).
on x86-64, the CPU has RIP-relative addressing, and so will presumably
use this to access the variable (for both SO's and non-SO binaries).
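The three declaration forms under discussion can be put side by side; the comments describe typical, not guaranteed, x86-64 ELF codegen, and behavior is identical in all three cases:

```c
/* Sketch of the cases above; comments describe typical (not guaranteed)
 * codegen for x86-64 ELF.  Behavior is the same; only the access
 * pattern the compiler may emit differs. */
int G_common;           /* tentative definition: might live in local .bss
                           or come from another SO, so PIC goes via the GOT */
static int G_static;    /* internal linkage: known local, RIP-relative */
int G_init = 0;         /* defined in this TU: also typically RIP-relative */

int bump_all(void)
{
    G_common += 1;
    G_static += 1;
    G_init   += 1;
    return G_common + G_static + G_init;
}
```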
also, segment overrides are fairly cheap.
True. It obviously depends on the CPU but in my tests, while segment
loading cost a bit, segment overrides were completely free.
yep.
the cheapest option then would probably be to include whatever magic
state directly into the TEB/TIB/whatever, then be like:
mov eax, [fs:address]
in some cases, accessing a variable this way could in fact be
cheaper than accessing a true global.
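This is essentially what compilers do for thread-local storage: on x86 Linux, gcc typically implements a `__thread` variable as an FS-relative access into the thread control block, roughly the "mov eax, [fs:address]" form above. A small sketch (assuming pthreads):

```c
#include <pthread.h>

/* Sketch: on x86 Linux, gcc typically compiles accesses to __thread
 * variables into FS-segment-relative loads/stores into the TCB,
 * i.e. roughly "mov eax, [fs:offset]". */
__thread int tls_counter;

static void *clobber(void *arg)
{
    (void)arg;
    tls_counter = 100;      /* writes the new thread's copy only */
    return 0;
}

int tls_demo(void)
{
    pthread_t t;
    tls_counter = 1;
    pthread_create(&t, 0, clobber, 0);
    pthread_join(t, 0);
    return tls_counter;     /* still 1: each thread has its own copy */
}
```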
As mentioned, I would expect a file-scope global to be accessed
directly - as a machine word at a location that gets filled-in by the
linker. Am I wrong? Are you thinking of library code?
...
I am thinking for the case where an OS scheduler needs access to a
userspace global variable.
the kernel can't easily know about this case, unless something "special"
is done.
Why is this no good? There can be any number of user exception types
so it's not possible to have one bit for each. The best I could think
of was to have one bit to indicate a user exception and functions to
examine as much other detail as needed.
exceptions don't generally work this way.
yes, "one bit per type for every type" is absurd, but using one bit
per-type for certain exceptions but not others, isn't much better.
If only one exception at a time is needed that makes things
significantly easier. But is only one needed? I was thinking of
exceptions caused in exception handlers. For example, the exception
handler divides by the number it first thought of and triggers an
overflow or it checks an array index and finds it out of bounds. What
does the system do?
It could leave the first exception in place - after all that was the
root cause. Or it could replace the exception information with its own
- after all it has an internal fault. Neither seems obviously best. It
might be best to keep both indications until both have been either
resolved or reported ... but it would be easier to handle if only one
was kept.
most exception handler systems will simply ignore the first and rethrow
the next.
some systems (those with Resumable Exceptions, such as Win32 SEH),
generally nest them sort of like a stack, so (AFAIK) exceptions are
handled in LIFO order provided all are resumable exceptions. if a
non-resumable exception occurs, then it will unwind and presumably there
is no way to handle prior exceptions.
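The LIFO behavior can be sketched in plain C with setjmp/longjmp standing in for handler frames (a sketch only; unlike real SEH, nothing here is resumable):

```c
#include <setjmp.h>

/* Sketch with setjmp/longjmp standing in for handler frames.  A fault
 * inside the inner handler abandons the first exception and transfers
 * to the next handler out -- "ignore the first, rethrow the next". */
int nested_demo(void)
{
    jmp_buf outer, inner;
    int code;

    if ((code = setjmp(outer)) == 0) {   /* register outer handler */
        if (setjmp(inner) == 0) {        /* register inner handler */
            longjmp(inner, 1);           /* first exception: inner catches */
        } else {
            longjmp(outer, 2);           /* fault in the handler: outward */
        }
    }
    return code;                         /* only the second code survives */
}
```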
If an exception "can't be thrown"? What would prevent it being thrown?
stack overflow;
crash within dispatcher;
(Win64) if function prologues/epilogues are missing or illegible;
....
Not very good, though, is it! In reality I think that Windows or any
OS will handle almost all exceptions. A BSOD should only occur on a
hardware failure such as a machine-check exception. (It looks like
Windows does run third-party drivers as privileged, thus increasing its
vulnerability many-fold.)
nope.
apparently the Windows kernel treats pretty much any CPU exception
within the kernel as a BSOD-causing offense.
this leads to more BSODs, potentially, but the argument is that if
something has gone wrong in the kernel, it is better to try to kill the
system "gracefully" than to continue on in a potentially inconsistent state.
AFAIK, Linux tries a little harder, displaying a register dump and
similar and tries to recover, or failing this does a "kernel panic".
user-space exceptions can be handled more gracefully (raising signals or
invoking exception handlers, ...).
interestingly, the "This program has performed an illegal operation and
must be shut down." message is also an exception handler. if one crashes
the application hard enough, one will not see this, and instead the
process simply/silently dies/disappears.
in the past, this could leave dead/phantom windows (don't redraw, can't
be moved or closed, ...) but I suspect newer versions have addressed
this (any windows owned by the process are automatically destroyed even
if the process dies rather non-gracefully...).
I could be wrong on the details here, mostly speaking from old memories.
Following your analogy of a CPU you could perhaps think of each
exception condition as an interrupt line that gets asserted until the
condition causing the interrupt has been resolved. I know, as an
analogy it's not perfect. :-(
flags and similar are used here.
so, normal interrupt occurs:
transfers to handler (and temporarily disables other interrupts);
handler does its thing;
iret (return from interrupt, restores prior CPU state, although GPRs/...
generally need to be manually saved/restored).
if things go bad in the first handler, then a double-fault occurs (a
dedicated special interrupt).
if the DF handler itself fails, the CPU reboots.
AFAIK, older CPUs would essentially just ignore whatever (maskable)
interrupts occurred during the handler, but newer ones will essentially
queue them and handle them in a FIFO order.
my memory gets a bit foggy here, as it has been a long time since I have
messed with this stuff...
I'd like to understand why they would be. I tried compiling the code
above with gcc and it seems to refer to G directly; for example, in
sub() the compiler generates
mov DWORD PTR G, eax
but with which OS, and what compiler settings?
if this was Linux-i386 and "-shared" or "-fpic" was given, one would not
likely see a global being loaded this way.
if it was on Win32, then the above is what one would expect to see.
I know. I found that in some systems there is a call which returns the
address of errno. In that case the compiler should (hopefully)
remember that address for each use in a function.
this depends...
in most cases, it probably won't, since the compiler can't be sure that
the function won't return a different address each time (though I won't
rule out the possibility of some sort of a "this function will always
return the same value" modifier/attribute).
Don't forget the original intention was speed. The point of a status
word, as you call it, is to allow exception handling (or, non-
handling) quickly. It is not an end in itself but a means of adding
exceptions to C in a way that recognises exceptions in the right
places while having the lowest possible impact on overall performance.
it is debatable that this is actually faster...
the problem is that if the status is checked in each function (possibly
multiple times), then this may in fact cost more overall than even
registering/unregistering exception handlers.
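The per-call overhead being argued about can be made concrete; this is an invented sketch of the status-word scheme, not anyone's actual implementation:

```c
/* Invented sketch of the status-word scheme: every call site is
 * followed by a check-and-propagate branch, which is exactly the
 * per-call cost that table-driven handler schemes avoid. */
static int exc_status = 0;   /* the status word */

int div_checked(int a, int b)
{
    if (b == 0) { exc_status = 1; return 0; }   /* "throw" */
    return a / b;
}

int caller(int a, int b)
{
    int r = div_checked(a, b);
    if (exc_status) return -1;   /* propagate: runs on every single call */
    return r;
}
```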
handlers capable of reflective unwinding (the GCC/DWARF strategy, Win64
SEH) only have a cost if an exception occurs, and are free otherwise
(apart from some debatable contrivance in the prologue/epilogue case
WRT Win64 SEH).
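glibc's backtrace() is one concrete example of this kind of reflective unwinding: nothing is registered at call time, and the unwind information is only consulted when a trace is actually requested (glibc-specific; not portable C):

```c
#include <execinfo.h>

/* glibc-specific: backtrace() walks the current call stack using
 * unwind information (or frame pointers) with no per-call setup. */
int trace_depth(void)
{
    void *frames[32];
    return backtrace(frames, 32);   /* number of frames captured */
}
```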
in my own (mostly unused) ABI, I had done something like:
push ebp
mov  ebp, esp
push ...             ;push-register sequence, like in Win64
sub  esp, ...        ;space for locals/...
...
lea  esp, [ebp-offs]
pop  ...
pop  ebp
ret
nop  [magic_info]    ;function metadata (optional)
this preserved a property that is typical (and desirable) with cdecl, but
is lost with both Win64 and SysV/AMD64: the ability to backtrace simply
using an EBP/RBP chain (with Win64, one has to unwind using awkward
logic, and on SysV/AMD64 one may be SOL apart from using DWARF, as the
ABI does not mandate any sort of standardized frame pointer).
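The EBP/RBP-chain walk described above can be sketched in C with GCC's __builtin_frame_address; this assumes frame pointers are actually kept (e.g. -O0 or -fno-omit-frame-pointer) and is otherwise unreliable:

```c
#include <stdint.h>

/* Sketch only: walks saved frame pointers (slot [fp] holds the
 * caller's EBP/RBP).  Assumes -fno-omit-frame-pointer; the sanity
 * checks stop the walk once the chain stops looking like a
 * downward-growing stack. */
int chain_depth(void)
{
    uintptr_t fp = (uintptr_t)__builtin_frame_address(0);
    int depth = 0;
    for (;;) {
        uintptr_t next = *(uintptr_t *)fp;   /* saved caller frame pointer */
        if (next <= fp || next - fp > 65536) /* caller must sit higher, nearby */
            break;
        fp = next;
        depth++;
    }
    return depth;
}
```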
in the 32-bit case, my ABI would have been otherwise nearly identical to
cdecl, only differing in the use of special prologues and epilogues.
the 64-bit case was mostly similar to the 32-bit case and to Win64,
differing from Win64 mostly in that the first 4 arguments are always
passed on the stack (and RBP being reserved as the frame pointer).
(gluing between them would generally be little more than a few 'mov'
instructions or similar).
in practice, I had mostly just used the Win64 and SysV/AMD64 ABIs (and a
custom name-mangling scheme). my implementation of the SysV/AMD64 ABI
was technically kind of crappy though, as I didn't bother much with
complex edge cases (passing/returning structs in registers, ...), and
due to technical reasons, arguments were stored to the stack and loaded
into registers just before the call operation.
otherwise, cross-ABI calls would require the use of costly
transfer-thunks (which basically save/restore registers and re-pack
argument lists, so a poorly-implemented SysV/AMD64 was at least a little
faster).
or such...