inca: a hacked-up apl in c based on the J incunabulum

J

James Kuyper

(snip, someone wrote)


There are still a few cases left where compact is fast, and fast
is important. For real-time systems, it is either fast enough or
doesn't work at all.

Compact source code? The speed with which source code is processed
seldom is an important factor. In any event, this style of code should,
if anything, slow down the compiler, because of the additional work that
needs to be done during translation phase 4 (preprocessing). The only
thing that this speeds up is the typing of the code - and even that only
applies if the unreadability of the code doesn't slow down the code
development process enough to make up for the smaller source code files.
 
G

glen herrmannsfeldt

(snip, I wrote)
Compact source code? The speed with which source code is processed
seldom is an important factor.

Sometimes more compact code is faster, but not always.

Personally, I like table-driven code, which tends to be compact,
some may find it more readable, some less.

I prefer a loop with a single IF statement in it to tens or
hundreds of consecutive IF statements testing different values
for a single, or small number, of variables. (Yes, I have seen
that latter in actual production code.)

Other than intentional obfuscation, exactly what someone
finds more readable can be different for different people.

-- glen
 
G

glen herrmannsfeldt

(snip)
My comment was that its obfuscation would not be suitable for the IOCCC
(http://www.ioccc.org/).
That kind of compactness is not a virtue. Abbreviating "return" and
"printf" (which all C programmers understand) as "R" and "P" serves no
useful purpose that I can think of. I for one will not waste my time
trying to read code written in that horrid style.

There was once a question asked to one of the original creators of
unix, if they could go back and change one thing, what would they
have done differently.

As I remember it, it was spelling creat() with an extra e.

Many unix commands are shorter than similar commands on other
systems, and if not short enough, shell aliases will make them shorter.

It is very common in PostScript to define shorter versions of
common commands (I forget what PS calls them). (Though often that
is for program generated, not meant to be read by humans, code.)
(Apparently there are other problems as well, such as depending on
implicit int and assuming that int and pointers are the same size.)
If you find it fun to write such code for yourself, you are of course
free to do so. If you want anyone else to read it, as implied by
posting it here, going out of your way to make it more difficult
to read is counterproductive.

-- glen
 
B

BartC

glen herrmannsfeldt said:
(snip, someone wrote)


There are still a few cases left where compact is fast, and fast
is important. For real-time systems, it is either fast enough or
doesn't work at all.

That doesn't follow, unless you're dealing with actual source code at run
time.

And even if significant, we don't know in this example how compact or
otherwise the expanded code might be.
The inner loop of an interpreter should also be fast, sometimes
at the expense of readability. (Though there is no excuse
for not having enough comments to explain the unreadable part.)

This is one dispatch loop for a bytecode interpreter:

typedef void (*(*fnptr))(void);

do {
(**(fnptr)(pcptr))();
} while (!stopped);

It's reasonably fast (over 100M bytecodes per second), and is still pretty
clear, which is not unexpected for three lines of code! (BTW removing the
stop condition, just while(1), makes it slower for some reason.)

Faster (approaching 200M) is this, showing just one of nearly 300 similar
cases, each of which is handled in the same way:

while (1) {
switch (*pcptr) {
case knop: do_nop(); break;
....
}
}

That's about as far as I could go while using standard C, and still having
an actual inner loop. I've gone a couple of further steps, and managed
200-400M bytecodes per second, but the basic C bytecode handler functions
/are the same/ in all cases.

That is, for every bytecode instruction, there is a discrete, dedicated
function to handle it. You can't get more straightforward than that. And
compiler inlining will also collapse a lot of these, there is no need for
the code to be physically compact and unreadable.
 
G

glen herrmannsfeldt

(big snip)
Faster (approaching 200M) is this, showing just one of nearly 300 similar
cases, each of which is handled in the same way:
while (1) {
switch (*pcptr) {
case knop: do_nop(); break;
....
}
}
That's about as far as I could go while using standard C, and still having
an actual inner loop. I've gone a couple of further steps, and managed
200-400M bytecodes per second, but the basic C bytecode handler functions
/are the same/ in all cases.
That is, for every bytecode instruction, there is a discrete, dedicated
function to handle it. You can't get more straightforward than that. And
compiler inlining will also collapse a lot of these, there is no need for
the code to be physically compact and unreadable.

I was recently looking at the flow charts of the microcode for the
IBM 360/40 CPU. (The smallest of the 360s that could run OS/360.)
Seems that they do a 16 way branch on the high half of the opcode,
followed by 16 way branches on the low half. I presume that it was
too much to supply a 256 way branch in microcode.

I wonder how the speed of nested 16 way switch/case would compare
to a 256 way switch/case?

-- glen
 
B

BartC

I was recently looking at the flow charts of the microcode for the
IBM 360/40 CPU. (The smallest of the 360s that could run OS/360.)
Seems that they do a 16 way branch on the high half of the opcode,
followed by 16 way branches on the low half. I presume that it was
too much to supply a 256 way branch in microcode.

I wonder how the speed of nested 16 way switch/case would compare
to a 256 way switch/case?

There must have had their reasons for using a two-level switch. It's also
possible the flowchart didn't fully represent what was going on in the
microcode.

Although that would be a little different from a C switch which has to deal
with a default case, which is extra overhead for each level.

(Maybe there is a way of telling C not to use a default case, or it might be
able to infer it if, say, the switch expression was 8-bits, and 256
different case labels were provided.)
 
K

Kaz Kylheku

(snip, I wrote)


Sometimes more compact code is faster, but not always.

Compactness obtained from transformations like:

#define l long_variable_name

has no bearing on executable speed.
 
J

James Kuyper

(snip, I wrote)


Sometimes more compact code is faster, but not always.

Are you talking about the code compiling faster, or executing faster?
Use of these macros might have a small amount of influence on the
compilation speed, but they shouldn't have any effect on the execution
speed. Are you really suggesting that use of these particular macros
might speed up (rather than slow down, as I expect) compilation of the
code sufficiently to justify using them?

....
Other than intentional obfuscation, exactly what someone
finds more readable can be different for different people.

In some cases, yes, readability can be a highly subjective issue, but
I'm not willing to concede that these particular macros could ever be
anything but an obfuscation, and a fairly substantial one at that. That
might not have been the author's intent, but it is in fact what he has
achieved.
 
J

Javier Lopez

El 27/03/2014 21:38, glen herrmannsfeldt escribió:
(snip, I wrote)


Sometimes more compact code is faster, but not always.
I guess you are referring to generated machine code and then, yes, in
modern architectures more compact code == faster because of cache
locality. Likewise, larger code == slower because cache trashing.

But compact object code doesn't have /anything/ to do with compact
sources anyway.
 
J

Javier Lopez

El 27/03/2014 22:06, BartC escribió:
This is one dispatch loop for a bytecode interpreter:

typedef void (*(*fnptr))(void);

do {
(**(fnptr)(pcptr))();
} while (!stopped);

It's reasonably fast (over 100M bytecodes per second), and is still pretty
clear, which is not unexpected for three lines of code! (BTW removing
the stop condition, just while(1), makes it slower for some reason.)
This depends on how smart the compiler is and the target architecture.
Because of indirect SIB (Scale-Index-Base) call on x86, I guess it
should be faster to use a jump table than to use switch.

If I'm not wrong, it should be faster to use switch if there's little
cases because of iCache.
 
J

James Kuyper

El 27/03/2014 19:04, Keith Thompson escribió:
If this wasn't intented for a obfuscation contest, then it is
deliberately obfuscated and I don't understand the point.

Code using such macros has frequently been posted to this newsgroup by
people using a variety of different names, such as io_x <[email protected]>,
Rosario1903 <[email protected]>, "Citizen C" <[email protected]>,
"¬a\\/b" <[email protected]>, av <[email protected]>, RoSsIaCrIiLoIA <[email protected]>, Giuseppe
<[email protected]>. I found several references to an older
obfuscator named "Jens", though I didn't find any of his actual
messages. The reason why they chose to do so has been frequently asked,
but never (to my knowledge) satisfactorily answered. Hanlon's Razor
suggests that it might have been intended merely to reduce typing, but
I'm not willing to rule out a deliberate attempt to obfuscate.

It's hard to believe that so many people could separately invent a
concept so stupid; possibly they borrowed from each other. Since very
few, if any, of them seem to have valid e-mail addresses, they could
easily be multiple aliases for a much smaller cadre of idiots.

One message I found traced the usage back to "Arthur Whitney's J
Interpreter", which seems pretty suspicious in the context of this thread.
 
G

glen herrmannsfeldt

(snip, I wrote)
Are you talking about the code compiling faster, or executing faster?

Sorry, I meant more generally on writing compact code, and not
specifically on using macros to make it look smaller.
Use of these macros might have a small amount of influence on the
compilation speed, but they shouldn't have any effect on the execution
speed. Are you really suggesting that use of these particular macros
might speed up (rather than slow down, as I expect) compilation of the
code sufficiently to justify using them?

For non-english speakers, I have no idea what macros might make
it easier to read C code. I might believe that for some people
single letters would be better.

Reminds me of a comment from IBM about the Fortran H compiler.
Seems that they use six (names are one to six characters long)
balanced trees, one for each length. They suggest that for maximum
compilation speed you should distribute your names evenly between
one and six characters long. No suggestion of readability.

In some cases, yes, readability can be a highly subjective issue, but
I'm not willing to concede that these particular macros could ever be
anything but an obfuscation, and a fairly substantial one at that. That
might not have been the author's intent, but it is in fact what he has
achieved.

I might believe that on a 32 character wide screen that shorter names
make for more readable C. I did at one time use a computer running C
with a 32 character screen. Though I could print them out on wider
paper. (I had a 132 character wide printer at the time.)

-- glen
 
S

Stefan Ram

glen herrmannsfeldt said:
Reminds me of a comment from IBM about the Fortran H compiler.
Seems that they use six (names are one to six characters long)

Reminds me of ANSI C 1989 where for names with external linkage
the first six characters were portably significant IIRC.
 
L

luser- -droog

Code using such macros has frequently been posted to this newsgroup by
people using a variety of different names, such as io_x <[email protected]>,
Rosario1903 <[email protected]>, "Citizen C" <[email protected]>,
"¬a\\/b" <[email protected]>, av <[email protected]>, RoSsIaCrIiLoIA <[email protected]>, Giuseppe
<[email protected]>. I found several references to an older
obfuscator named "Jens", though I didn't find any of his actual
messages. The reason why they chose to do so has been frequently asked,
but never (to my knowledge) satisfactorily answered. Hanlon's Razor
suggests that it might have been intended merely to reduce typing, but
I'm not willing to rule out a deliberate attempt to obfuscate.

It's hard to believe that so many people could separately invent a
concept so stupid; possibly they borrowed from each other. Since very
few, if any, of them seem to have valid e-mail addresses, they could
easily be multiple aliases for a much smaller cadre of idiots.

One message I found traced the usage back to "Arthur Whitney's J
Interpreter", which seems pretty suspicious in the context of this thread..

What, I wonder, is "suspicious" about this?

My source says the same thing.
http://www.jsoftware.com/jwiki/Essays/Incunabulum

FWIW, I have removed the P and R macros and added
many comments to inca.c and examples to the README,
so you don't actually need to try to compile it to
get a feel for what the heck I'm talking about.
 
L

luser- -droog

Perhaps a select few others across this wide globe.

It's definitely a hornet's nest of a pile of code droppings.
But it kinda all works, if you type the syntax right.

In your attempt, the spacing was off.

1+1
should work. A space following a number must be
followed by another number, and it concatenates
the two into a list terminated by the first
operator/function.

1 1 1+3 => 4 4 4

It took me weeks poring over that original J
incunabulum. Then learning more apl for context,
then going back to the incunabulum.

Oops. No. Rosy memory. I've been scrutinizing the
original code for over a year, as the dates on my
comments to the SO question linked in the README show.
 
L

luser- -droog

(snip)


There was once a question asked to one of the original creators of
unix, if they could go back and change one thing, what would they
have done differently.

As I remember it, it was spelling creat() with an extra e.

Many unix commands are shorter than similar commands on other
systems, and if not short enough, shell aliases will make them shorter.

It is very common in PostScript to define shorter versions of
common commands (I forget what PS calls them). (Though often that
is for program generated, not meant to be read by humans, code.)

I'll look again. But I think I've added "I"s for all the
returns actually used. Then remaining implicit ints
should all be void-functions.

I do understand that it appears that way. But that's
not how it happened. Take a peek at the the J incunabulum
that I started with. I merely tried to maintain a consistent
style while adding extensions.
 
B

BartC

Javier Lopez said:
El 27/03/2014 22:06, BartC escribió:
This depends on how smart the compiler is and the target architecture.
Because of indirect SIB (Scale-Index-Base) call on x86, I guess it should
be faster to use a jump table than to use switch.

If I'm not wrong, it should be faster to use switch if there's little
cases because of iCache.

Yes, that switch-based code I showed was faster than the simple
function-call loop. Almost certainly because the former can make excellent
use of inlining.

What is faster still than the switch, is to use a jump-table containing
label pointers. But this is not satisfactory because it uses both a
non-standard gcc extension, and lots of gotos:

loop:
goto **pcptr;

jnop:
do_nop();
goto loop;

..... // 280 more similar groups

(The table of pointers exists, but the contents are used to fix-up the
byte-code so it contains the label pointers directly.)

Strangely, eliminating the second goto to achieve 'threaded' code (and have
no 'inner loop') was slower:

loop:
goto **pcptr; // only executed once

jnop:
do_nop();
goto **pcptr; // After each bytecode, jump direct to the next
.....

(Fastest is probably a threaded code 'inner loop' written in ASM and using
dedicated registers. Initially however it is slower than even the
function-call loop, because there is no inlining, and there are extra
overheads in saving registers to call the C bytecode handlers. The speed
comes from dealing with selected bytecodes in ASM and avoiding calling C as
much as possible.

This is all for a pure bytecode interpreter dealing with dynamic, tagged
data. There are more sophisticated techniques but then it will no longer be
a simple interpreter.)
 
B

BartC

BartC said:
Why is it necessary to keep it tight?

(I had a look earlier with a view to expanding it to 'normal' C. But after
about thirty seconds I gave up! It would be easier to rewrite from
scratch.)

I've done some work with the source. The results are here:

http://pastebin.com/7J0tfbJU

Mainly, replacing I by INT, A by ABC (to make things simpler for my editor),
and giving more vertical space to the functions. Ideally everything there
would be one statement per line, and the code would be consistently
indented, but I guess tools exist to do all that.

The overall structure is now a bit clearer (especially after I figured out
that V1 and V2 define functions), but it still looks quite complex to fully
understand. (Not knowing APL doesn't help.)

The code is also still very fragile: most things I type in will crash it.
 
L

luser- -droog

(snip)


There was once a question asked to one of the original creators of
unix, if they could go back and change one thing, what would they
have done differently.

As I remember it, it was spelling creat() with an extra e.

Many unix commands are shorter than similar commands on other
systems, and if not short enough, shell aliases will make them shorter.

It is very common in PostScript to define shorter versions of
common commands (I forget what PS calls them). (Though often that
is for program generated, not meant to be read by humans, code.)

They're usually called "mnemonics", assuming you do it right.
In postscript this can have benefits in storage space,
transmission time, and execution speed. Handwritten code will
often use loops and procedures, thus paying for decoding the
names just once, but generated code is usually unrolled,
so reducing a name by 1 char reduces the file by 1 char
for each usage, which can be significant.
 
K

Keith Thompson

luser- -droog said:
I'll look again. But I think I've added "I"s for all the
returns actually used. Then remaining implicit ints
should all be void-functions.

If you're trying to write conforming modern C, you need to remove *all*
occurrences of implicit int. A void function needs to be declared with
an explicit return type of "void".

The "implicit int" feature was removed from the language by the C99
standard.

[...]
I do understand that it appears that way. But that's
not how it happened. Take a peek at the the J incunabulum
that I started with. I merely tried to maintain a consistent
style while adding extensions.

I found this:

https://github.com/tangentstorm/j-incunabulum

and in particular this:

https://github.com/tangentstorm/j-incunabulum/blob/master/ji.c

which I see is where the horrid macro definitions:

#define P printf
#define R return

came from. It's based on some code written in 1989, which probably
explains the use of implicit int. Quoting the README.md:

Although the code is terse and may seem unreadable at first,
it can be understood with a little effort. It may help to
understand that the code was written for and by people who
were very familiar with array-centric languages (APL) and who
came from a strong mathematical background. Indeed, one of the
main reasons for adopting a style like this is that it meakes
it easier to reason about and manipulate the code algebraically.

I'm not convinced about the claimed advantages of the style,
though I suppose it's possible I'm missing something.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,073
Messages
2,570,539
Members
47,197
Latest member
NDTShavonn

Latest Threads

Top