Ok. Here we go. My test code is at the end. All results are from runs
executed today.
Thank you for the test code and your work. That's quite fair and a good
basis for discussion.
In return, it's only fair that I invest some time of my own.
[...]
Also, if you see any problems with my test, please say so, so I can
fix it and rerun it.
No problem, but some remarks:
- Windows timers based on GetTickCount() and clock() are quite inaccurate,
since they are only updated on each timer interrupt and therefore have a
resolution of about 15 ms (on most systems). For long-running tests (such
as this one) and to get a first impression they are OK.
I prefer the performance counter (QueryPerformanceCounter) for
high-precision timings (but for this test I don't think it makes a difference).
- From the command line parameters I conclude that you are using VS 2008.
I use VS 2010. I don't think they changed the exception model
completely in VS 2010, but who knows; I'm going to check this at work
next week. There is a free version of VS 2010 available, but I don't
know whether its code optimization is restricted (AFAIK it isn't, at least since 2010).
I ran the tests on my laptop (Intel dual core, 2.2 GHz).
I used the same parameters, but with full optimization (/Ox); that shouldn't have
a significant effect.
C++ command line:
/Zi /nologo /W3 /WX- /Ox /Oi /Ot /GL /D "WIN32" /D "NDEBUG" /D
"_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /GS /Gy /fp:precise
/Zc:wchar_t /Zc:forScope /Yu"StdAfx.h" /Fp"x64\Release\ForumCpp.pch"
/Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gd
/errorReport:queue
Linker:
/OUT:"Cpp.exe" /INCREMENTAL:NO /NOLOGO "kernel32.lib" "user32.lib"
"gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib"
"ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib"
/MANIFEST /ManifestFile:"x64\Release\Cpp.exe.intermediate.manifest"
/ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG
/PDB:"trash\Cpp.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /PGD:"Cpp.pgd"
/LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:QUEUE
I got the following results (output rearranged):
Command line: 60000 -10 1:
Inlineable Member Return Code : 3.182
Inlineable Global Return Code : 3.151
Manually Inlined Return Code : 3.198
Virtual Return Code : 14.368
Virtual Return Code Fake Try : 17.394
Manually Inlined Optimized Return Code : 3.182
Virtual Exception : 14.243
Command line: 60000 -10:
Inlineable Member Return Code : 3.26
Inlineable Global Return Code : 3.198
Manually Inlined Return Code : 3.198
Virtual Return Code : 14.259
Virtual Return Code Fake Try : 17.459
Manually Inlined Optimized Return Code : 3.183
Virtual Exception : 17.487
Nearly the same results, apart from some negligible differences.
Only one result shows a notable difference: Virtual Exception (14.243 vs. 17.487).
After I changed the code
class TestImpl2 : public TestInterface
{
public:
    virtual void virtualCanThrow(int a, int b, int targetSum)
    {
        // cout << "XXX" << endl;
    }
    virtual ReturnCodeT virtualReturnCode(int a, int b, ...
    {   // cout << "XXX" << endl;
        return Success;
    }
};
into
class TestImpl2 : public TestInterface
{
public:
    virtual void virtualCanThrow(int a, int b, int targetSum)
    {
        if (a + b == targetSum)
            cout << "XXX" << endl;
    }
    virtual ReturnCodeT virtualReturnCode(int a, int b, int ....
    {   // cout << "XXX" << endl;
        if (a + b == targetSum)
            cout << "XXX" << endl;
        return Success;
    }
};
I got the same results.
So far I don't experience speed differences.
> However, sadly too many compiler writers and ABI writers missed this
> very important memo, which I think is core to the entire C++
For x86 Windows I agree.
But let's have a look at the assembly code generated for your other test program:
int main(int argc, char* argv[])
{
try
{
if (argc == 3) throw 1;
}
catch(int)
{
return -1;
}
return 0;
}
Windows x86 code VC2010:
00BA1002 in al,dx
00BA1003 push 0FFFFFFFFh
00BA1005 push offset __ehhandler$_wmain (0BA1980h)
00BA100A mov eax,dword ptr fs:[00000000h]
00BA1010 push eax
00BA1011 mov dword ptr fs:[0],esp
00BA1018 sub esp,8
try
{
if (argc == 3) throw 1;
00BA101B cmp dword ptr [ebp+8],3
00BA101F push ebx
00BA1020 push esi
00BA1021 push edi
00BA1022 mov dword ptr [ebp-10h],esp
00BA1025 mov dword ptr [ebp-4],0
00BA102C jne $LN8+14h (0BA105Dh)
00BA102E push offset __TI1H (0BA22C8h)
00BA1033 lea eax,[ebp-14h]
00BA1036 push eax
00BA1037 mov dword ptr [ebp-14h],1
00BA103E call _CxxThrowException (0BA1970h)
}
catch(int)
{
return -1;
00BA1043 mov eax,offset $LN8 (0BA1049h)
00BA1048 ret
$LN8:
00BA1049 or eax,0FFFFFFFFh
}
return 0;
}
00BA104C mov ecx,dword ptr [ebp-0Ch]
00BA104F mov dword ptr fs:[0],ecx
00BA1056 pop edi
00BA1057 pop esi
00BA1058 pop ebx
00BA1059 mov esp,ebp
00BA105B pop ebp
00BA105C ret
00BA105D mov ecx,dword ptr [ebp-0Ch]
00BA1060 pop edi
00BA1061 pop esi
00BA1062 xor eax,eax
00BA1064 mov dword ptr fs:[0],ecx
00BA106B pop ebx
00BA106C mov esp,ebp
00BA106E pop ebp
00BA106F ret
Windows x64 code VC2010:
{
if (argc == 3) throw 1;
000000013F13100D cmp ecx,3
000000013F131010 jne wmain+2Ch (13F13102Ch)
000000013F131012 mov dword ptr [rsp+20h],1
000000013F13101A lea rdx,[_TI1H (13F1324A0h)]
000000013F131021 lea rcx,[rsp+20h]
000000013F131026 call _CxxThrowException (13F13190Eh)
000000013F13102B nop
}
return 0;
000000013F13102C xor eax,eax
000000013F13102E jmp $LN8+3 (13F131033h)
{
return -1;
000000013F131030 or eax,0FFFFFFFFh
}
000000013F131033 add rsp,38h
000000013F131037 ret
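There is a huge difference. The x86 code uses the old implementation: a
per-thread chain of exception registration records whose head is stored
at fs:[0]. A function containing a try block pushes a record on entry and
restores fs:[0] on exit (see 00BA100A-00BA1011 and 00BA104F/00BA1064), so
it pays a cost even if nothing is thrown.
In x64 code there is no such chain anymore, since the compiler uses
static tables for stack unwinding. Therefore there is no overhead if no
exception is thrown.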
So the x64 implementation should be comparable to the one used by
compilers on Linux/Unix systems.
As I already mentioned, SEH itself doesn't add any overhead (provided the
compiler ignores SEH exceptions and doesn't need to track them when they are
thrown), so if there is a speed difference, it's the compiler's fault.
And I don't think GCC under Windows, for example, uses a different
exception model than under Linux. Or does it?