D
David Turrell
A few months ago I attempted to compile an old version (3.0j) of
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.
1. Where the C code calls for an operation that algebraically is
(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]
gcc generates code which algebraically is
((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128
and I'm wondering at such a roundabout method.
See the attached code, especially the disassembler listing at offsets
19 to 30 (hex).
2. I thought the use of NOPs was to facilitate pre-fetching of
branch or jump-to code; but the compiler doesn't align all jump-to
code on word boundaries, so I'm still at a loss to explain it.
See the pseudo NOPs at offsets 12 and 32 (hex) and compare those
to the lack of a NOP to word-align the instruction at offset 42.
Here are the source and object listings.
----------------------------------------
C SOURCE
Given a Nethack experience-level number
{1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30}
newuexp(lev) returns the minimum number of experience points for
the *next higher* Nethack experience-level, except that there isn't
a level higher than 30
{20, 40, 80, 160, 320, 640, 1280, 2560, 5120,
10K, 20K, 40K, 80K, 160K, 320K, 640K, 1280K, 2560K, 5120K,
10M, 20M, 30M, 40M, 50M, 60M, 70M, 80M, 90M, 100M,
110M}
(DIS)ASSEMBLER LISTING
Compiled: 'gcc -ansi -O -I../include' [5/2/04]
Preprocessing: LINT undefined
Version: gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release) [6/18/04]
OS: NetBSD 1.5.4
CPU: cpu0: AMD Athlon Model 1 (686-class), 604.31 MHz [panix3, 6/26/04]
Disassembled with 'objdump -d'.
Offset Object code Opcode Src, dest operands
---------------------------------------------------------
Stardard C linkage:
old base pointer pushed on stack
base pointer = top-of-stack pointer
Reg C = lev [arg list begins at offset 8 from stack top].
If lev > 9 go to offset 14 hex.
Reg A = 10 dec. Set up for lev 1-9.
Jump to offset 40 hex, where 10 * 2^lev will be done.
To put following jump-to code on word boundary? Instead of 2 NOPs?
If lev <= 19 dec go to offset 38 hex, for lev 10-19 setup.
This multiplies (lev - 19) by 10 million in a puzzlingly roundabout way,
for lev 20-30.
Putting the above 10 instructions in algebraic notation:
Reg C = lev - 19 [2's complement]
Reg D = lev - 19
Reg D = (lev - 19) * 32
Reg D = (lev - 19) * 31
Reg A = (lev - 19) * 31
Reg A = (lev - 19) * 31 * 64
Reg A = (lev - 19) * 31 * 63 = (lev - 19) * 1953
Reg A = (lev - 19) + (lev - 19) * 1953 * 8 = (lev - 19) * 15625
Reg A = (lev - 19) * 15625 * 5 = (lev - 19) * 78125
Reg A = (lev - 19) * 78125 * 128 = (lev - 19) * 10000000
Or, to preserve the intermediate steps, the last line of the above
could read:
Reg A = ((lev - 19) + (lev - 19) * 31 * 63 * 8) * 5 * 128
Jump to offset 42 hex, to end, lev 20-30 done.
Another jump-to word-boundary alignment for following instruction?
Facilitates pre-fetching for superscaling? Use of NOPs so much
slower?
Reg C = lev - 10 [2's complement]. Setup for lev 10-19.
Reg A = 10,000 dec. Continued lev 10-19 setup.
Confluence of lev 1-9 and 10-19 processing.
Reg A = reg A [ = 10 or 10,000 dec] * 2 ^ reg C [ = lev or lev - 10].
Lev 20-30 processing rejoins here.
Offset 42 is jump-to code, but it's not preceded by a word boundary
aligning NOP. Why?
Result in reg A, per C linkage.
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.
1. Where the C code calls for an operation that algebraically is
(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]
gcc generates code which algebraically is
((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128
and I'm wondering at such a roundabout method.
See the attached code, especially the disassembler listing at offsets
19 to 30 (hex).
2. I thought the use of NOPs was to facilitate pre-fetching of
branch or jump-to code; but the compiler doesn't align all jump-to
code on word boundaries, so I'm still at a loss to explain it.
See the pseudo NOPs at offsets 12 and 32 (hex) and compare those
to the lack of a NOP to word-align the instruction at offset 42.
Here are the source and object listings.
----------------------------------------
C SOURCE
(see http://www.juiblex.co.uk/nethack/source.html).From exper.c, from the source for the game Nethack, version 3.0.10,
#include "hack.h"
#ifdef LINT
#define NEW_SCORING
#endif
long
newuexp(lev)
register unsigned lev;
{
#ifdef LINT /* long conversion */
return(0L * lev);
#else
if(lev < 10) return (10L*(1L << lev));
if(lev < 20) return (10000L*(1L << (lev-10)));
return (10000000L*(lev-19));
#endif
}
Given a Nethack experience-level number
{1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30}
newuexp(lev) returns the minimum number of experience points for
the *next higher* Nethack experience-level, except that there isn't
a level higher than 30
{20, 40, 80, 160, 320, 640, 1280, 2560, 5120,
10K, 20K, 40K, 80K, 160K, 320K, 640K, 1280K, 2560K, 5120K,
10M, 20M, 30M, 40M, 50M, 60M, 70M, 80M, 90M, 100M,
110M}
(DIS)ASSEMBLER LISTING
Compiled: 'gcc -ansi -O -I../include' [5/2/04]
Preprocessing: LINT undefined
Version: gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release) [6/18/04]
OS: NetBSD 1.5.4
CPU: cpu0: AMD Athlon Model 1 (686-class), 604.31 MHz [panix3, 6/26/04]
Disassembled with 'objdump -d'.
exper.o: file format elf32-i386
Disassembly of section .text:
Offset Object code Opcode Src, dest operands
---------------------------------------------------------
00000000 <newuexp>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 4d 08 mov 0x8(%ebp),%ecx
Stardard C linkage:
old base pointer pushed on stack
base pointer = top-of-stack pointer
Reg C = lev [arg list begins at offset 8 from stack top].
6: 83 f9 09 cmp $0x9,%ecx
9: 77 09 ja 14 <newuexp+0x14>
If lev > 9 go to offset 14 hex.
b: b8 0a 00 00 00 mov $0xa,%eax
Reg A = 10 dec. Set up for lev 1-9.
10: eb 2e jmp 40 <newuexp+0x40>
Jump to offset 40 hex, where 10 * 2^lev will be done.
12: 89 f6 mov %esi,%esi
To put following jump-to code on word boundary? Instead of 2 NOPs?
14: 83 f9 13 cmp $0x13,%ecx
17: 76 1f jbe 38 <newuexp+0x38>
If lev <= 19 dec go to offset 38 hex, for lev 10-19 setup.
19: 83 c1 ed add $0xffffffed,%ecx
1c: 89 ca mov %ecx,%edx
1e: c1 e2 05 shl $0x5,%edx
21: 29 ca sub %ecx,%edx
23: 89 d0 mov %edx,%eax
25: c1 e0 06 shl $0x6,%eax
28: 29 d0 sub %edx,%eax
2a: 8d 04 c1 lea (%ecx,%eax,8),%eax
2d: 8d 04 80 lea (%eax,%eax,4),%eax
30: c1 e0 07 shl $0x7,%eax
This multiplies (lev - 19) by 10 million in a puzzlingly roundabout way,
for lev 20-30.
Putting the above 10 instructions in algebraic notation:
Reg C = lev - 19 [2's complement]
Reg D = lev - 19
Reg D = (lev - 19) * 32
Reg D = (lev - 19) * 31
Reg A = (lev - 19) * 31
Reg A = (lev - 19) * 31 * 64
Reg A = (lev - 19) * 31 * 63 = (lev - 19) * 1953
Reg A = (lev - 19) + (lev - 19) * 1953 * 8 = (lev - 19) * 15625
Reg A = (lev - 19) * 15625 * 5 = (lev - 19) * 78125
Reg A = (lev - 19) * 78125 * 128 = (lev - 19) * 10000000
Or, to preserve the intermediate steps, the last line of the above
could read:
Reg A = ((lev - 19) + (lev - 19) * 31 * 63 * 8) * 5 * 128
33: eb 0d jmp 42 <newuexp+0x42>
Jump to offset 42 hex, to end, lev 20-30 done.
35: 8d 76 00 lea 0x0(%esi),%esi
Another jump-to word-boundary alignment for following instruction?
Facilitates pre-fetching for superscaling? Use of NOPs so much
slower?
38: 83 c1 f6 add $0xfffffff6,%ecx
Reg C = lev - 10 [2's complement]. Setup for lev 10-19.
3b: b8 10 27 00 00 mov $0x2710,%eax
Reg A = 10,000 dec. Continued lev 10-19 setup.
40: d3 e0 shl %cl,%eax
Confluence of lev 1-9 and 10-19 processing.
Reg A = reg A [ = 10 or 10,000 dec] * 2 ^ reg C [ = lev or lev - 10].
42: c9 leave
Lev 20-30 processing rejoins here.
Offset 42 is jump-to code, but it's not preceded by a word boundary
aligning NOP. Why?
43: c3 ret
Result in reg A, per C linkage.