FindFirstIn

B

Ben Bacarisse

jacob navia said:
Here is a gcc compatible version.
Tested under Mac OS X

As you can see, you just replace rcx, rdx by rdi, rsi

OK, but I've lost sight of why you are posting it. The original seemed
to be about alternative algorithms (or many about c coding) but you
can't be saying that assembler is the way to write it, can you?

<snip>
 
J

jacob navia

Le 24/05/2014 00:35, Ben Bacarisse a écrit :
OK, but I've lost sight of why you are posting it. The original seemed
to be about alternative algorithms (or many about c coding) but you
can't be saying that assembler is the way to write it, can you?

<snip>

I am writing a small essay in compilers and CPUs. The thesis of that
paper is that we are abstracting too much from the actual machines that
we are using.

Take calling conventions, for instance. They are part of a bigger
software context, about not only "calling" but conventions in general.
Software is precisely conventions.

Let's assume that 65 means 'A', 66 'B', etc. Or that the precise syntax
of C is required, or, for that matter of any computer "language".

All those conventions are part of a thing called "software". It runs in
printed circuits called "computers" ever more ubiquous in our daily life.

It is easy to mistify computers for the uninitiated. And this lack of
insight into the real machine is praised by many as abstracting "details".

How the machine works is not just a detail. We all should know what are
we doing.

Do I really know how the motor of my scooter works?

Not really, I am just a user: If I turn the direction wheel left, it
goes left. Of course I know "in principle" how a combustion engine
works, but I couldn't repair my scooter sorry.

For a mechanic at the garage however, how the motor works is not a
detail. It is the essence of his/her work!

It is really ridiculous this emphasis in abstract machines, hiding away
how the motor works.

Writing in assembler you can see how it works.
 
J

jacob navia

Le 24/05/2014 00:13, Ike Naar a écrit :
This is not correct.
Which proves that I'm lacking the x86-64 expertise.

Look, I thought abut that too. Why do I test for that special case?

Because I would have to do it anyway later, when I start the loop, so I
moved it to the top to improve the case of an empty string. This doesn't
alter the code size but improves the case for a special situation.

I learned that later, I did not publish the intermediate steps but just
the final result.

Assembler is fun, I do it occasionally just for fun. Generating is much
more sensible, I agree with that.

But I could never generate some programs I write... in this case there
are SO many optimizations that are specific to this case, that I doubt
that any compiler would do this.
 
J

jacob navia

Le 23/05/2014 23:47, Ben Bacarisse a écrit :
jacob navia said:
Le 23/05/2014 13:47, Ben Bacarisse a écrit :
bool table[256] = {0};
bool *tab = table - CHAR_MIN;
do tab[*set] = true; while (*set++);

while (!tab[*str]) ++str;
return str;

gcc compiles this to 20/21 instructions (depending on optimising for
speed or size). That's fewer than your hand-coded assembler version,
though it may, of course, be either more bytes, or slower, or both.

No, at least not in my Mac.
/tmp $ gcc -v
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix

You'll have to help me out here.

Did you compile in 64 or 32 bits?

That does not look like a gcc version
at all, but I am not a Mac person.

Well I am, and that is the Apple's version of gcc. At least I think so.

That output references LLVM and
clang which I find odd.

LLVM odd? Why?
clang is the compiler for LLVM as far as I know.

Is there any correspondence with gcc version
numbers on other systems?

No. Apple has extensively modified gcc for its language requirements
since objective C and Steve Jobs in the "Next" computer. They always did
their own version of gcc.
Oddly, I get the same code when I use clang rather than gcc on my Linux
box. Is that just a coincidence?

Well, clang is clang then! What is "odd" about it?
 
J

jacob navia

Le 23/05/2014 23:33, Kaz Kylheku a écrit :
What? That may be implicit, but it requires a diagnostic.

bad idea.

It is a consequence of mixing two different types in the language:

1) Characters, i.e. textual data
2) Small integers that fit in a byte but are numeric data.

Both are different kinds of data, hence different types. C mixes both
with bad consequences. I think this is a bug.

All those contortions shouldn't be necessary.
 
K

Keith Thompson

Ben Bacarisse said:
You'll have to help me out here. That does not look like a gcc version
at all, but I am not a Mac person. That output references LLVM and
clang which I find odd. Is there any correspondence with gcc version
numbers on other systems?

Oddly, I get the same code when I use clang rather than gcc on my Linux
box. Is that just a coincidence?

My understanding is that recent versions of MacOS provide clang as the
default C compiler, but name the executable "gcc" for the sake of
compatibility. (I don't know how much, if any, gcc code is in the
compiler Apple distributes.)

clang is, as I recall, designed to be reasonably compatible with gcc,
particularly in the syntax of its command line arguments.
 
M

Melzzzzz

/tmp $ gcc -v
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.2.0 Thread model: posix

Analysis:

Instead of using the stosq instruction, gcc repeats 32 times the
store of 16 bytes using xmm0. In my opinion this is a misguided
optimisation.

I think that code generator is LLVM in this case...
gcc 4.8.2 does not produce such code but clang 3.5 does...
 
S

Stefan Ram

jacob navia said:
Can you give a better solution?

When the character set is large, such as the character set
of Unicode, one might not want to fill a lookup table. When
the set of characters to be found is small, the following
approach might help to eliminate some comparisons:

The program below stores the bits that are set in /every/
character of the set into »t« and then skips all characters
of the string where those bits are not all set.

#include <stdio.h>
#include <string.h>

/*prints all positions where a character from the array c appears
in the array s, where z is the size of s, and n is the size of c. */
void f
( size_t const z, char const * const s, size_t const n, char const * const c )
{ unsigned char t = -1; for( size_t i = 0; i < n; ++i )t &=c[ i ];
unsigned char b = 0; for( size_t i = 0; i < n; ++i )b |=c[ i ];
for( size_t i = 0; i < z; ++i )
{ char const r = i[ s ];
if((( r & t )== t )&&(( r & !b )== 0 )) /* the bits-check optimization */
{ for( size_t j = 0; j < n; ++j )if( r == j[ c ] )
{ printf( "found %c at %d.\n", r, i ); goto over; }
printf( "Did an exhaustive search loop, but no match was found.\n" );
over: ; }
else printf( "Skipped an exhaustive search loop.\n" ); }}

int main( void )
{ char const * const s = "This sentence"; char const * const c = "aeiou";
f( strlen( s ), s, strlen( c ), c ); }

Skipped an exhaustive search loop.
Skipped an exhaustive search loop.
found i at 2.
Did an exhaustive search loop, but no match was found.
Skipped an exhaustive search loop.
Did an exhaustive search loop, but no match was found.
found e at 6.
Skipped an exhaustive search loop.
Skipped an exhaustive search loop.
found e at 9.
Skipped an exhaustive search loop.
Did an exhaustive search loop, but no match was found.
found e at 12.
 
B

Ben Bacarisse

jacob navia said:
Le 23/05/2014 23:47, Ben Bacarisse a écrit :
jacob navia said:
Le 23/05/2014 13:47, Ben Bacarisse a écrit :

bool table[256] = {0};
bool *tab = table - CHAR_MIN;
do tab[*set] = true; while (*set++);

while (!tab[*str]) ++str;
return str;

gcc compiles this to 20/21 instructions (depending on optimising for
speed or size). That's fewer than your hand-coded assembler version,
though it may, of course, be either more bytes, or slower, or both.

No, at least not in my Mac.
/tmp $ gcc -v
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix

You'll have to help me out here.

Did you compile in 64 or 32 bits?
64.
That does not look like a gcc version
at all, but I am not a Mac person.

Well I am, and that is the Apple's version of gcc. At least I think
so.

I am beginning to doubt it. I thought you might be using clang, and
Keith's post suggests this is what might be happening.
LLVM odd? Why?
clang is the compiler for LLVM as far as I know.

Yes, but it's not the same compiler as gcc, though there is no reason
gcc could not target LLVM (and I think there is such a thing). But the
output you show suggests that this is clang, pure and simple.
No. Apple has extensively modified gcc for its language requirements
since objective C and Steve Jobs in the "Next" computer. They always
did their own version of gcc.


Well, clang is clang then! What is "odd" about it?

I said gcc does X and you said no. It therefore matters if you are
using gcc or not. I am still not certain, but I don't think you are.

It's interesting that clang is not quite so good at making tiny code,
but that misses the point. A little re-writing of the C permits some
compilers (at least one version of gcc on Linux) to generate code
smaller than the code whose size you were happy about. That has an
impact on the value of writing assembler. If you can't get clearer
code, or more portable code, or smaller code, you have to ask why you
are bothering. You *may* get faster code, but I would not be too
confident about that either. Certainly, with nicely written C, you can
get either just by using different options.
 
M

Melzzzzz

Le 23/05/2014 07:45, Ike Naar a écrit :
Another interesting metric would be the number of bugs.

Go ahead. If your remark is not just bad faith tell me why this
program would be wrong
1 .text 64
2 ; char *strFindFirst(char *str,char *set)
3 strFindFirst:
4 ; Make space for a table of 256 positions
5 subq $256,%rsp
6 ; if (*str==0) return p;
7 cmpb $0,(%rcx)
8 je SetResultAndExit
9 ; memset(tab,0,sizeof(tab));
10 ; Save the registers rdi and rcx into r10 and r11
11 movq %rdi,%r10
12 movq %rcx,%r11
13 ; Prepare for the stosq loop: Count is 32 (256/8),rax is zero
14 ; and destination is rsp, stored in rdi.
15 movl $32,%ecx
16 xorl %eax,%eax
17 movq %rsp,%rdi
18 rep
19 stosq
20 movq %r10,%rdi
21 movq %r11,%rcx
22 jmp TestFillTableLoopCondition
23 FillTable:
24 ; tab[*(unsigned char *)set++]=1;
25 incb (%rsp,%rax)
26 incq %rdx
27 TestFillTableLoopCondition:
28 movb (%rdx),%al
29 testb %al,%al
30 jne FillTable
31 SearchCharLoop:
32 ; if (tab[*p]) return p;
33 movb (%rcx),%al ; move char to al
34 cmpb $0,(%rsp,%rax) ; index table with it
35 jne SetResultAndExit ; test if nonzero
36 incq %rcx ; next char
37 cmpb $0,(%rcx) ; test if zero
38 jne SearchCharLoop
39 SetResultAndExit:
40 movq %rcx,%rax ; set result
41 addq $256,%rsp ; restore stack
42 ret

What about sse 4.2 ;) ?

; rdi str , rsi chars
strFindFirstInSSE42:
xor r8d,r8d
xor r9d,r9d
pxor xmm2,xmm2
..L0:
movdqu xmm1,[r9+rsi]
..L1:
movdqu xmm0,[r8+rdi]
pcmpistri xmm1,xmm0,0 ; search for any of chars in xmm1
jc .L2 ; found char bail out
jz .L3 ; we are at end of str, next chars
add r8,16
jmp .L1
..L2:
add r8,rcx
lea rax,[r8+rdi]
ret
..L3:
pcmpistri xmm2,xmm1,1000b ; does it contains zero?
jz .L2 ; not found, but last chars, bail out
add r9,16
xor r8d,r8d
jmp .L0
 
M

Melzzzzz

Le 23/05/2014 07:45, Ike Naar a écrit :
Le 22/05/2014 18:42, jacob navia a ?crit :
I rewrote the second function in assembler. Only 77 bytes!
gcc (with -Os) is 350.

Another interesting metric would be the number of bugs.

Go ahead. If your remark is not just bad faith tell me why this
program would be wrong
1 .text 64
2 ; char *strFindFirst(char *str,char *set)
3 strFindFirst:
4 ; Make space for a table of 256 positions
5 subq $256,%rsp
6 ; if (*str==0) return p;
7 cmpb $0,(%rcx)
8 je SetResultAndExit
9 ; memset(tab,0,sizeof(tab));
10 ; Save the registers rdi and rcx into r10 and r11
11 movq %rdi,%r10
12 movq %rcx,%r11
13 ; Prepare for the stosq loop: Count is 32 (256/8),rax is zero
14 ; and destination is rsp, stored in rdi.
15 movl $32,%ecx
16 xorl %eax,%eax
17 movq %rsp,%rdi
18 rep
19 stosq
20 movq %r10,%rdi
21 movq %r11,%rcx
22 jmp TestFillTableLoopCondition
23 FillTable:
24 ; tab[*(unsigned char *)set++]=1;
25 incb (%rsp,%rax)
26 incq %rdx
27 TestFillTableLoopCondition:
28 movb (%rdx),%al
29 testb %al,%al
30 jne FillTable
31 SearchCharLoop:
32 ; if (tab[*p]) return p;
33 movb (%rcx),%al ; move char to al
34 cmpb $0,(%rsp,%rax) ; index table with it
35 jne SetResultAndExit ; test if nonzero
36 incq %rcx ; next char
37 cmpb $0,(%rcx) ; test if zero
38 jne SearchCharLoop
39 SetResultAndExit:
40 movq %rcx,%rax ; set result
41 addq $256,%rsp ; restore stack
42 ret

What about sse 4.2 ;) ?
To tired, this is correct version ;)

; rdi str , rsi chars
strFindFirstInSSE42:
xor r8d,r8d
xor r9d,r9d
pxor xmm2,xmm2
..L0:
movdqu xmm1,[r9+rsi]
..L1:
movdqu xmm0,[r8+rdi]
pcmpistri xmm1,xmm0,0 ; search for any of chars in xmm1
jc .L2 ; found char bail out
jz .L3 ; we are at end of str, next chars
add r8,16
jmp .L1
..L2:
add r8,rcx
lea rax,[r8+rdi]
ret
..L22:
pcmpistri xmm2,xmm0,1000b ; we need to find out zero position
add r8,rcx ; add zero position
lea rax,[r8+rdi]
ret
..L3:
pcmpistri xmm2,xmm1,1000b ; does it contains zero?
jz .L22 ; not found, but last chars, bail out
add r9,16
xor r8d,r8d
jmp .L0
 
M

Malcolm McLean

I am writing a small essay in compilers and CPUs. The thesis of that
paper is that we are abstracting too much from the actual machines that
we are using.

Take calling conventions, for instance. They are part of a bigger
software context, about not only "calling" but conventions in general.
Software is precisely conventions.

Let's assume that 65 means 'A', 66 'B', etc. Or that the precise syntax
of C is required, or, for that matter of any computer "language".

All those conventions are part of a thing called "software". It runs in
printed circuits called "computers" ever more ubiquous in our daily life.
Most software is heavily dependent on conventions, I'd agree. A lot
of the numbers in the machine don't represent inherent values, but
conventions, like 65 = 'A', 0 = false, -1 = error.
We've also got mathematical conventions like axes being labelled x, y
and z, angles being called theta, numbers being represented in base 10.
A lot of these have been thrown into chaos by computers, for example
y axes can go either upwards or downwards, angles can be in degrees,
radians, or internal 0-1 representation, subscripts can go 0 - N-1 or
1 - N, numbers might in in base 10 or base 16, equal signs may be
used for assignment, the distinction between rationals and reals has
become fuzzy, even operator precedence rules aren't entirely stable.

Children tend not to be very aware what is a conventional and what
is a fundamental, because they're taught the same way at school.
Spelling tests and times table tests are marked with the same symbols,
often the results are even aggregated and treated as an observation.
It is however important to teach the difference, especially important
with computers and maths where the conventions are now fluid.
 
K

Kaz Kylheku

Le 24/05/2014 00:35, Ben Bacarisse a écrit :

I am writing a small essay in compilers and CPUs. The thesis of that
paper is that we are abstracting too much from the actual machines that
we are using.

Take calling conventions, for instance. They are part of a bigger
software context, about not only "calling" but conventions in general.
Software is precisely conventions.

Let's assume that 65 means 'A', 66 'B', etc. Or that the precise syntax
of C is required, or, for that matter of any computer "language".

All those conventions are part of a thing called "software". It runs in
printed circuits called "computers" ever more ubiquous in our daily life.

It is easy to mistify computers for the uninitiated. And this lack of
insight into the real machine is praised by many as abstracting "details".

How the machine works is not just a detail. We all should know what are
we doing.

Do I really know how the motor of my scooter works?

Not really, I am just a user: If I turn the direction wheel left, it
goes left. Of course I know "in principle" how a combustion engine
works, but I couldn't repair my scooter sorry.

For a mechanic at the garage however, how the motor works is not a
detail. It is the essence of his/her work!

It is really ridiculous this emphasis in abstract machines, hiding away
how the motor works.

Writing in assembler you can see how it works.

I don't think so, because assembler is very abstracted also.
The x86 instruction set has nothing to do with how the processor
actually works.

Also, the mundane aspects of assembly like doing arithmetic or string
manipulation really don't teach you anything.

BUT: here is the thing. Assembly language instruction sets have a much more
detailed model of the machine state. They define things like interrupts
and exceptions, which have precise behaviors. You can catch any condition,
pinpoint its origin and possibly fixup the machine state to recover or
whatever.

C gives us pitiful things like setjmp.

In assembly language, we can, for instance, write a pre-emptively
multi-threaded kernel, *with* precisely tracing garbage collection.

That's the sort of stuff that is educational to do with assembly language.

Also, if you want englightenment, learn a beautiful assembly language like
MC68000. Or MIPS.

Assembly language should also teach the esthetics of how to nicely design an
instruction set.

If all you have is a PC, get an emulator for a nice instruction set and program
that.
 
B

BartC

Kaz Kylheku said:
I don't think so, because assembler is very abstracted also.
The x86 instruction set has nothing to do with how the processor
actually works.

x86 code is the closest you're going to get, without the ability to write
microcode or somehow get further inside the workings of the processor.
Also, the mundane aspects of assembly like doing arithmetic or string
manipulation really don't teach you anything.

I disagree. Doing basic arithmetic in assembler (especially floating point)
can be educational. (But you don't want to do it more than once.)
In assembly language, we can, for instance, write a pre-emptively
multi-threaded kernel, *with* precisely tracing garbage collection.

That's the sort of stuff that is educational to do with assembly language.

There're a few other things, but many can also be made available with a more
carefully designed (than C) higher level language.
Also, if you want englightenment, learn a beautiful assembly language like
MC68000.

That's what I thought too when it first appeared. Until I examined it more
closely and it was even less orthogonal than x86! (For example, having two
kinds of registers, the Address and Data registers, immediately an awkward
decision needs to be made, if you're generating code, of which to use. Some
things work on D regs and not on A regs, and vice versa. Etc.)
Assembly language should also teach the esthetics of how to nicely design
an
instruction set.

If all you have is a PC, get an emulator for a nice instruction set and
program
that.

Z8000 and NatSemi 32032/16 were more beautiful, iirc. Sadly the better
products never seem to be the successful ones.
 
M

Melzzzzz

Le 23/05/2014 07:45, Ike Naar a écrit :
Le 22/05/2014 18:42, jacob navia a ?crit :
I rewrote the second function in assembler. Only 77 bytes!
gcc (with -Os) is 350.

Another interesting metric would be the number of bugs.


Go ahead. If your remark is not just bad faith tell me why this
program would be wrong
1 .text 64
2 ; char *strFindFirst(char *str,char *set)
3 strFindFirst:
4 ; Make space for a table of 256 positions
5 subq $256,%rsp
6 ; if (*str==0) return p;
7 cmpb $0,(%rcx)
8 je SetResultAndExit
9 ; memset(tab,0,sizeof(tab));
10 ; Save the registers rdi and rcx into r10 and r11
11 movq %rdi,%r10
12 movq %rcx,%r11
13 ; Prepare for the stosq loop: Count is 32 (256/8),rax is
zero 14 ; and destination is rsp, stored in rdi.
15 movl $32,%ecx
16 xorl %eax,%eax
17 movq %rsp,%rdi
18 rep
19 stosq
20 movq %r10,%rdi
21 movq %r11,%rcx
22 jmp TestFillTableLoopCondition
23 FillTable:
24 ; tab[*(unsigned char *)set++]=1;
25 incb (%rsp,%rax)
26 incq %rdx
27 TestFillTableLoopCondition:
28 movb (%rdx),%al
29 testb %al,%al
30 jne FillTable
31 SearchCharLoop:
32 ; if (tab[*p]) return p;
33 movb (%rcx),%al ; move char to al
34 cmpb $0,(%rsp,%rax) ; index table with it
35 jne SetResultAndExit ; test if nonzero
36 incq %rcx ; next char
37 cmpb $0,(%rcx) ; test if zero
38 jne SearchCharLoop
39 SetResultAndExit:
40 movq %rcx,%rax ; set result
41 addq $256,%rsp ; restore stack
42 ret

What about sse 4.2 ;) ?
To tired, this is correct version ;)
Nope that one switches which is outer/inner loop.
Loops for every 16 characters whole string it should be opposite:

Final version :
; rdi str , rsi chars
strFindFirstInSSE42:
xor r8d,r8d
xor r9d,r9d
pxor xmm2,xmm2
..L0:
movdqu xmm1,[r9+rsi]
movdqu xmm0,[r8+rdi]
pcmpistri xmm1,xmm0,0 ; search for any of chars in xmm1
jc .L2 ; found char bail out
pcmpistri xmm2,xmm1,1000b ; does it contains zero?
jz .L1 ; not found, but last chars, next
add r9,16
jmp .L0
..L1:
pcmpistri xmm2,xmm0,1000b ; is it last?
jz .L2 ; last, bail out
add r8,16
xor r9d,r9d
jmp .L0
..L2:
add r8,rcx
lea rax,[r8+rdi]
ret
 
K

Keith Thompson

BartC said:
That's what I thought too when it first appeared. Until I examined it more
closely and it was even less orthogonal than x86! (For example, having two
kinds of registers, the Address and Data registers, immediately an awkward
decision needs to be made, if you're generating code, of which to use. Some
things work on D regs and not on A regs, and vice versa. Etc.)

Use the A regs for addresses, and the D regs for non-addresses. What's
the problem?
 
B

BartC

Keith Thompson said:
[MC68000]
That's what I thought too when it first appeared. Until I examined it
more
closely and it was even less orthogonal than x86! (For example, having
two
kinds of registers, the Address and Data registers, immediately an
awkward
decision needs to be made, if you're generating code, of which to use.
Some
things work on D regs and not on A regs, and vice versa. Etc.)

Use the A regs for addresses, and the D regs for non-addresses. What's
the problem?

Because an address which isn't going to be used for an address is just data;
does it go into An or Dn?

If a function has a return value which normally is put into a register,
should it be D0 or A0? Should it depend on the type the value has in a
high-level language? Because if programming in assembler, you don't want to
have to worry about exactly which register any particular function puts its
return value.

If you wanted to use a calling convention where some parameters are passed
in registers, it's a similar problem.

Some instructions only operate on one or the other set of registers, so LEA
loads an effective address to An, but then you might want to use that as
data. Or you might want to mess with the lower bits of a pointer, and maybe
the AND and OR instructions don't work on A registers (it's not clear
whether they do or not). If not, it means moving an address to a data
register to set the lower bits, then moving it back to an address register
to use it as a pointer!

It's an extra complication that you can do without. (It's bad enough having
floating point values that need loading into the floating point unit if
doing actual floating ops on them, but need loading into normal registers
for anything else. But sometimes it's all mixed up.

At least you can sort of see the need for floating point to do that; there's
no reason to treat address values specially, and in fact few other
processors seem to split their registers like this.
 
K

Keith Thompson

BartC said:
Keith Thompson said:
[MC68000]
That's what I thought too when it first appeared. Until I examined
it more closely and it was even less orthogonal than x86! (For
example, having two kinds of registers, the Address and Data
registers, immediately an awkward decision needs to be made, if
you're generating code, of which to use. Some things work on D regs
and not on A regs, and vice versa. Etc.)

Use the A regs for addresses, and the D regs for non-addresses. What's
the problem?

Because an address which isn't going to be used for an address is just data;
does it go into An or Dn?

Why would you have "an address which isn't going to be used for an
address"?
If a function has a return value which normally is put into a register,
should it be D0 or A0? Should it depend on the type the value has in a
high-level language? Because if programming in assembler, you don't want to
have to worry about exactly which register any particular function puts its
return value.

So your concern is that you might have a function that returns a 32-bit
value, but you don't know whether that value is an address or not.

There are such cases, but in my experience they're rare -- and I'd
expect them to be rare even in assembly language. If a function returns
a value that can be interpreted either as an integer or as an address,
you can always return it in a D register and let the caller copy it into
an A register if necessary. It's like a C function that returns an
integer that, in some cases, might be treated as an address; it's a rare
requirement, but you can still deal with it by letting the caller do the
conversion.

Yes, this is a small cost, but there's also a substantial benefit: the
68K has 16 registers (D0..D7 and A0..A7), but instructions only need 3
bits to select one of them; the choice between D and A is implied by the
instruction.

But for most functions, the meaning of the returned value should be
reasonably clear and unambiguous. A memory allocation function like
malloc() returns an address; an arithmetic function returns an integer.

[...]
At least you can sort of see the need for floating point to do that;
there's no reason to treat address values specially, and in fact few
other processors seem to split their registers like this.

So perhaps the tradeoff turned out not to be worthwhile. Still,
there were definite advantages in the way the 68K organized its
general-purpose registers. And I imagine (with no evidence) that
bugs involving accidentally treating integers as addresses or
vice versa might be rarer in hand-written 68K code than in, say,
hand-written x86 code.
 
J

James Kuyper

Because an address which isn't going to be used for an address is just data;
does it go into An or Dn?

I would normally agree with your "just data" assessment, but only
because I can't figure out why you consider it to be an address. If it
isn't going to be used for an address, it isn't an address.

The people who designed this platform considered addresses to be a
clearly distinct category from non-addresses; for people like that,
deciding whether to use An or Dn is trivial. You, on the other hand,
have a concept of what an address is that is too fuzzy to make it easy
for you to use such a platform.
If a function has a return value which normally is put into a register,
should it be D0 or A0? Should it depend on the type the value has in a
high-level language?

It depends upon the meaning of the returned value; a high-level language
isn't needed to determine that meaning. However, a certain amount of
design discipline is needed. I you routinely return addresses and
non-addresses by the same mechanism, a platform like this is not going
to be convenient for you.
 
K

Kaz Kylheku

Because an address which isn't going to be used for an address is just data;
does it go into An or Dn?

If a function has a return value which normally is put into a register,
should it be D0 or A0? Should it depend on the type the value has in a
high-level language?

Yes; I believe this is a common calling convention on m68K. A pointer
return goes into A0, and an integer into D0.

Code which calls malloc without declaring it first really bites the dust.
Because if programming in assembler, you don't want to
have to worry about exactly which register any particular function puts its
return value.

What? That's exactly the sort of thing you exactly know for every darn
function.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,816
Latest member
nipsseyhussle

Latest Threads

Top