Interesting coding idea

Jack Klein

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it would be a very fun and very
interesting thing to hack on.

And what exactly does this have to do with the C language? I suspect
it is equally off-topic in comp.lang.python.
 
Nils O. Selåsdal

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.
http://www.aracnet.com/~healyzh/decemu.html
(There are emulators/virtual machines for most ancient computers out there
as well ;)
And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?
Well, gcc supports MMIX
(http://www-cs-faculty.stanford.edu/~knuth/mmix.html), and there is
something in the same area: http://tph.tuwien.ac.at/~oemer/qcl.html
 
Bruno R. Dias

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it would be a very fun and very
interesting thing to hack on.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/E/L d-- s+:+ a--- C++ UL+ P--- L++>+++ E W++ N+ o+ K++ w---
!O M-- V--PS++ PE++ Y>+ PGP>+ t++(+++) 5? X R+ tv@ b+++@ DI++++ D--- G+
e- h! r-- y
------END GEEK CODE BLOCK------
 
Paul Foley

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

http://simh.trailing-edge.com
 
Bruno R. Dias

Jack said:
And what exactly does this have to do with the C language? I suspect
it is equally off-topic in comp.lang.python.
It would be *programmed* in a language, obviously. It's just that C is
rather appropriate for that kind of stuff, it's one of my favorite
languages, and it's a subject that should interest C programmers. The
same goes for Python.

 
Bruno R. Dias

Nils said:
http://www.aracnet.com/~healyzh/decemu.html
(There are emulators/virtual machines for most ancient computers out there
as well ;)


Well, gcc supports mmix
http://www-cs-faculty.stanford.edu/~knuth/mmix.html , and something in the
same area; http://tph.tuwien.ac.at/~oemer/qcl.html
Thanks a lot, but the last two links don't work. :)

 
CBFalconer

Jack said:
And what exactly does this have to do with the C language? I
suspect it is equally off-topic in comp.lang.python.

If he moves to alt.folklore.computers, he will find plenty of
people who have programmed such beasts, and even be on-topic.
Follow-ups set.
 
Malcolm

Bruno R. Dias said:
It would be *programmed* in a language, obviously. It's just that C is
rather appropriate for that kind of stuff, it's one of my favorite
languages, and it's a subject that should interest C programmers. The
same goes for Python.
Just because a program could be implemented in C doesn't make it on-topic
for comp.lang.c. However, "is C the most appropriate language for this
program?" is probably topical.

There are plenty of emulators out there. An emulator is not an especially
difficult program to write, and it is often useful. For instance, if you want
to play 1980s-vintage Spectrum games from the comfort of your PC, it is
possible using emulation software and program dumps (it is illegal to sell
such dumps unless you own the copyright, it is OK to make a copy of a game
you own for personal use, and taking a copy from a friend without payment is
a grey area).

An interesting project would be a Fibonacci computer. Instead of using an
exponent-based system (binary, decimal, hex etc) you represent numbers as
sums of Fibonacci numbers. This has some interesting properties, for instance
there are never two consecutive 1s in a valid number.
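
A minimal sketch of that number system in C, assuming the usual place values 1, 2, 3, 5, 8, ... and a greedy encoding (the Zeckendorf representation). The names fib_encode, fib_decode and fib_is_canonical are made up for illustration and are not from any existing library.

```c
#include <assert.h>
#include <stdint.h>

/* Place values are 1, 2, 3, 5, 8, ...; 46 digits are enough for any
   32-bit unsigned value. */
enum { FIB_DIGITS = 46 };

/* Weight of digit k, where k = 0 is the 1s place. */
static uint64_t fib_weight(int k)
{
    uint64_t a = 1, b = 2;
    while (k-- > 0) {
        uint64_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Greedy encoding: bit k of the result is set when the k-th weight
   is part of the sum. Taking the largest weight first is what keeps
   two adjacent digits from both being 1. */
uint64_t fib_encode(uint32_t n)
{
    uint64_t bits = 0;
    for (int k = FIB_DIGITS - 1; k >= 0; k--) {
        uint64_t w = fib_weight(k);
        if (w <= n) {
            bits |= (uint64_t)1 << k;
            n -= (uint32_t)w;
        }
    }
    assert(n == 0);
    return bits;
}

/* Sum the weights of the set digits. */
uint32_t fib_decode(uint64_t bits)
{
    uint64_t n = 0;
    for (int k = 0; k < FIB_DIGITS; k++)
        if ((bits >> k) & 1)
            n += fib_weight(k);
    return (uint32_t)n;
}

/* A valid number never has two consecutive 1 digits. */
int fib_is_canonical(uint64_t bits)
{
    return (bits & (bits >> 1)) == 0;
}
```

For example, 10 = 8 + 2 sets the digits for weights 8 and 2, while the digit pattern "11" (1 + 2) is an invalid spelling of 3, whose valid form is the single digit for weight 3.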
 
Bruno R. Dias

Malcolm wrote:
An interesting project would be a Fibonacci computer. Instead of using an
exponent-based system (binary, decimal, hex etc) you represent numbers as
sums of Fibonacci numbers. This has some interesting properties, for instance
there are never two consecutive 1s in a valid number.

It would be a bitch to code for such a machine, but it sure would be
interesting.

 
Dave Vandervies

[comp.lang.python trimmed from crosspost list]

Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it would be a very fun and very
interesting thing to hack on.

I'm reading this in comp.programming, but for the comp.lang.c readers,
here's a not entirely off-topic idea:

Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.

Obviously you'd need pointers to be more than just a memory address
(segment/offset/size would work, with pointer arithmetic results
checked to make sure the offset is inside the segment; this would add
checkability for no-longer-valid (free()d or old automatic) segments).
This would also let us trap on invalid int-to-pointer conversions (and
possibly on invalid pointer-to-pointer conversions if it's done right).
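
A rough sketch of what such a pointer could look like, written as plain C rather than as VM internals. The segment and checked_ptr types and the cp_ function names are hypothetical, and the "live" flag is only one possible way to mark free()d or out-of-scope storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment descriptor: one per object. */
typedef struct {
    unsigned char *base;  /* start of the object's storage */
    size_t size;          /* object size in bytes */
    int live;             /* cleared when the storage goes away */
} segment;

/* A pointer is a segment plus an offset into it. */
typedef struct {
    segment *seg;
    size_t offset;        /* may equal seg->size (one past the end) */
} checked_ptr;

typedef enum { CP_OK, CP_OUT_OF_RANGE, CP_DEAD_SEGMENT } cp_status;

/* Pointer arithmetic: the result must stay within [0, size]. */
cp_status cp_add(checked_ptr *p, ptrdiff_t delta)
{
    assert(p->offset <= p->seg->size);  /* invariant kept by this function */
    if (!p->seg->live)
        return CP_DEAD_SEGMENT;
    if (delta < 0) {
        size_t back = (size_t)0 - (size_t)delta;  /* magnitude of delta */
        if (back > p->offset)
            return CP_OUT_OF_RANGE;
        p->offset -= back;
    } else {
        if ((size_t)delta > p->seg->size - p->offset)
            return CP_OUT_OF_RANGE;
        p->offset += (size_t)delta;
    }
    return CP_OK;
}

/* Dereference: one-past-the-end may be formed but not read. */
cp_status cp_load(const checked_ptr *p, unsigned char *out)
{
    if (!p->seg->live)
        return CP_DEAD_SEGMENT;
    if (p->offset >= p->seg->size)
        return CP_OUT_OF_RANGE;
    *out = p->seg->base[p->offset];
    return CP_OK;
}
```

A real VM would trap instead of returning a status code; the return values just make the checks easy to see.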

If we've got heavyweight segments anyways, we can add an "initialized"
flag and trap on access-to-uninitialized-memory. Possibly even
arbitrarily set uninitialized bytes read as unsigned char to random values
(or values that are invalid for whatever other data is there - can we get
away with having non-mallocd memory typed? unions might be a problem).
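
One way the "initialized" flag could work, sketched with a per-byte shadow array. The shadow_seg type, the fixed SEG_MAX capacity and the 0xA5 junk fill are all assumptions made for the example; typed memory and unions are not attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SEG_MAX = 64 };            /* illustrative fixed capacity */

typedef struct {
    unsigned char data[SEG_MAX];
    unsigned char init[SEG_MAX];  /* 1 once the matching data byte is written */
    size_t size;
} shadow_seg;

typedef enum { SH_OK, SH_OUT_OF_RANGE, SH_UNINITIALIZED } sh_status;

/* Start a fresh segment: nothing is initialized yet. */
void shadow_reset(shadow_seg *s, size_t size)
{
    assert(size <= SEG_MAX);
    s->size = size;
    /* Junk fill, so unwritten bytes do not happen to look like zeros. */
    memset(s->data, 0xA5, sizeof s->data);
    memset(s->init, 0, sizeof s->init);
}

/* Writing a byte marks it as initialized. */
sh_status shadow_store(shadow_seg *s, size_t off, unsigned char v)
{
    if (off >= s->size)
        return SH_OUT_OF_RANGE;
    s->data[off] = v;
    s->init[off] = 1;
    return SH_OK;
}

/* Reading a byte that was never written is reported, not performed. */
sh_status shadow_load(const shadow_seg *s, size_t off, unsigned char *out)
{
    if (off >= s->size)
        return SH_OUT_OF_RANGE;
    if (!s->init[off])
        return SH_UNINITIALIZED;
    *out = s->data[off];
    return SH_OK;
}
```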

Having the VM recognize sequence points would also let it trap assorted
types of undefined behavior that typically go unrecognized until they
cause bugs.

Standard library calls could check (with implementation magic) their
arguments and warn at runtime if they're given bad arguments (for example,
if they're given a buffer that's smaller than the buffer size argument,
so this:
char buf[10];
fgets(buf,20,stdin);
would produce a warning when it runs, in addition to trapping if the
input overflows the buffer).
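
A sketch of how a checking fgets might look. The real_size parameter stands in for the "implementation magic": in the proposed VM it would come from the pointer's segment descriptor, not from the caller. checked_fgets and the warning counter are hypothetical names, and this version refuses the call outright where the proposal above would warn and carry on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static unsigned bad_size_warnings;  /* hypothetical diagnostic counter */

/* real_size: the true size of the object buf points into, which the
   VM would look up itself. n: the size the caller claims. */
char *checked_fgets(char *buf, size_t real_size, int n, FILE *fp)
{
    assert(buf != NULL && fp != NULL);
    if (n > 0 && (size_t)n > real_size) {
        bad_size_warnings++;
        fprintf(stderr,
                "fgets: size argument %d exceeds %zu-byte buffer\n",
                n, real_size);
        return NULL;  /* this sketch stops here instead of reading */
    }
    return fgets(buf, n, fp);
}

/* Number of mismatched-size calls seen so far. */
unsigned checked_fgets_warnings(void)
{
    return bad_size_warnings;
}
```

With this, the char buf[10] / size 20 call is flagged every time it runs, whether or not the input would actually have overflowed.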


Other thoughts?
Anybody with enough compiler/VM experience to comment intelligently on
just how much work this would be?


dave
 
boa

Dave said:
[comp.lang.python trimmed from crosspost list]

Bruno R. Dias said:
Perhaps it would be interesting to program a virtual machine simulating
an ancient computer (such as the pdp-7). Then, it would be rather
interesting to code for it (porting gcc to it maybe?). I think it would
be fun to play with the long-forgotten art of coding in machine language.

And what about a fictional computer, such as one that works in an
entirely different way (such as a non-binary computer)?

It wouldn't be very useful, but it would be a very fun and very
interesting thing to hack on.


I'm reading this in comp.programming, but for the comp.lang.c readers,
here's a not entirely off-topic idea:

Why not build a DeathStation simulator?

Create a VM that allows aggressive testing for bad (especially not-
well-defined) code, and a compiler targeting it that optimizes for
checkability rather than performance or size.

Obviously you'd need pointers to be more than just a memory address
(segment/offset/size would work, with pointer arithmetic results
checked to make sure the offset is inside the segment; this would add
checkability for no-longer-valid (free()d or old automatic) segments).
This would also let us trap on invalid int-to-pointer conversions (and
possibly on invalid pointer-to-pointer conversions if it's done right).

If we've got heavyweight segments anyways, we can add an "initialized"
flag and trap on access-to-uninitialized-memory. Possibly even
arbitrarily set uninitialized bytes read as unsigned char to random values
(or values that are invalid for whatever other data is there - can we get
away with having non-mallocd memory typed? unions might be a problem).

Having the VM recognize sequence points would also let it trap assorted
types of undefined behavior that typically go unrecognized until they
cause bugs.

Standard library calls could check (with implementation magic) their
arguments and warn at runtime if they're given bad arguments (for example,
if they're given a buffer that's smaller than the buffer size argument,
so this:
char buf[10];
fgets(buf,20,stdin);
would produce a warning when it runs, in addition to trapping if the
input overflows the buffer).


Other thoughts?
Anybody with enough compiler/VM experience to comment intelligently on
just how much work this would be?


dave

It exists already and is called valgrind ;-)

boa
 
Dave Vandervies

Dave Vandervies wrote:

[Snip a few ideas]
It exists already and is called valgrind ;-)

That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
int *foo(int *dummy)
{
    int i;
    return &i;
}
int bar(int *p)
{
    int dummy=42;
    return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.


dave
 
William Ahern

Dave Vandervies said:
That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"
How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;

Yes, if `i' has not been set yet. But, not quite what you were looking for.
--------
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------

I believe this will print an error because bar takes the value of an
uninitialized variable. Valgrind keeps track of which regions in memory have
been touched, and reading from an untouched memory region (whether from heap
or stack or wherever) is caught.

Valgrind's real weakness is with automatic variables. If you had initialized
i in foo(), none of this might have been caught.
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Again, Valgrind would probably not catch this since buf is automatic.
However, OpenBSD has patches to GCC and their library definitions which
might catch this.
Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.

Yep. Valgrind is a great tool but definitely has its limitations.
 
boa

Dave said:
boa said:
Dave Vandervies wrote:


[Snip a few ideas]

It exists already and is called valgrind ;-)


That would be this valgrind (first hit on Google)?
"Valgrind, an open-source memory debugger for x86-linux"

How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
int *foo(int *dummy)
{
int i;
return &i;
}
int bar(int *p)
{
int dummy=42;
return *p;
}

/*In a function somewhere*/
/*Ideally, we want to warn here, when a no-longer-valid pointer is stored
in a variable (or, better, immediately on return from foo when the
storage the pointer points at goes away)
*/
p=foo(&i);
/*p is an invalid pointer; typical stack-using implementations will
have it pointing at the dummy int in bar()
*/
i=bar(p);
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

Without knowing anything other than that it's a memory debugger, I'd give
part marks for the third one (no warning if the buffer doesn't overflow)
and a small chance at catching the second one.

Of course, it looks like it won't even get there if I try to run it on
a Mac.

So, quite obviously not what I was thinking of.

You're right. I was too quick recommending valgrind. It is a good tool,
though.

boa
 
Dave Vandervies

I believe this will print an error because bar takes the value of an
uninitialized variable.

But it doesn't! It takes a pointer (that has been initialized) that
points to a region of automatic storage that was never initialized and
no longer exists.
Valgrind keeps track of which regions in memory have
been touched, and reading from an untouched memory region (whether from heap
or stack or wherever) is caught.

The pointer that bar() gets is (if we assume a few reasonable things
about the implementation) pointing at wherever i in foo() was; this is
likely to be the same place as dummy in bar() - which is initialized
before the pointer is dereferenced.

The problem is that that's no longer the i in foo() that we returned a
pointer to. Assigning a new (not recycled) segment descriptor for every
automatic variable (thus invalidating the aforementioned assumptions
about the implementation) would let this be caught as soon as we tried to
load the pointer after foo() returned (even before we try to follow it
in bar()). (Note that this would also be Bloody Slow if it was checked
every time a pointer value was handled.)
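
A small sketch of the never-recycled descriptor idea, assuming a simple stack of automatic variables. The auto_stack and auto_ref types are invented for the example: the stack slot is reused the way real stack storage is, but the descriptor id is not.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_AUTOS = 16 };           /* illustrative stack depth */

typedef struct {
    unsigned long ids[MAX_AUTOS];  /* descriptor id held by each slot */
    size_t depth;                  /* number of live automatic variables */
    unsigned long next_id;         /* only ever incremented, never reused */
} auto_stack;

typedef struct {
    size_t slot;       /* where the storage lives (recycled) */
    unsigned long id;  /* which variable it was (never recycled) */
} auto_ref;

void auto_init(auto_stack *s)
{
    s->depth = 0;
    s->next_id = 1;
}

/* Entering a scope: the variable gets the next slot and a fresh id. */
auto_ref auto_push(auto_stack *s)
{
    assert(s->depth < MAX_AUTOS);
    auto_ref r = { s->depth, s->next_id++ };
    s->ids[s->depth++] = r.id;
    return r;
}

/* Leaving a scope: the most recent variable goes away. */
void auto_pop(auto_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
}

/* Valid only if the slot is live AND still holds the same variable. */
int auto_ref_valid(const auto_stack *s, auto_ref r)
{
    return r.slot < s->depth && s->ids[r.slot] == r.id;
}
```

Pushing foo()'s i, popping it, then pushing bar()'s dummy puts dummy in the same slot, yet the old reference to i fails the check because the ids differ.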


dave
 
William Ahern

But it doesn't! It takes a pointer (that has been initialized) that
points to a region of automatic storage that was never initialized and
no longer exists.
The pointer that bar() gets is (if we assume a few reasonable things
about the implementation) pointing at wherever i in foo() was; this is
likely to be the same place as dummy in bar() - which is initialized
before the pointer is dereferenced.

Ah. Damn, you were way ahead of me already. Valgrind is fooled by this. In
fact, Valgrind didn't even catch `i=i++' inside of main. Oh well.
 
Thad Smith

Dave said:
Dave Vandervies wrote:

[Snip a few ideas]
It exists already and is called valgrind ;-)
How many of these will valgrind catch?
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------

C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
ld i
inc i
st i
or
ld i
st i
inc i

the results are well defined for the virtual machine.
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

There are different levels of warnings. In this example, if
strlen(str) is short enough, the behavior is well-defined, of course,
even though the construct isn't safe for arbitrarily long str
arguments. We can use static analysis in this case to determine that
sizeof(buf) < 20, indicating a questionable construct.

To protect against overflow, we really want
strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
len <= sizeof(buf)-strlen(buf)-1,
where len is the count passed to strncat, assuming that debug_strncat()
has access to sizeof(buf).
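
A possible shape for that debug_strncat(), assuming the destination size is handed in as an extra parameter (dst_size is hypothetical; an instrumented runtime would have to supply it). It applies the inequality above before doing any writing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 and appends when n <= dst_size - strlen(dst) - 1.
   Returns 0 and leaves dst untouched otherwise. */
int debug_strncat(char *dst, size_t dst_size, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    /* Look for the terminator without reading past the buffer. */
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL)
        return 0;  /* dst is not a string within its own buffer */

    size_t used = (size_t)(end - dst);
    if (n > dst_size - used - 1)
        return 0;  /* count could overflow for a long enough src */

    strncat(dst, src, n);
    return 1;
}
```

Note that the check rejects strncat(buf, str, 20) on a 10-byte buffer even when str is short, since it looks at the count and never at strlen(src).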

Thad
 
Dave Vandervies

Dave Vandervies wrote: [snippage]
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------

C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
ld i
inc i
st i
or
ld i
st i
inc i

the results are well defined for the virtual machine.

Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
    *a=(*b)++;
}

/*Somewhere else*/
void bar(int x)
{
    /*Do some stuff, including:*/
    foo(&x,&x);
}
--------
If your static analyzer is smart enough to recognize that you're calling
foo() with equal pointers, then wrap a few more levels of indirection
around it until you've got enough to confuse it. Being able to
(especially unintentionally) construct arbitrarily complex code that can
still lead to this case makes static checking Highly Impractical at best.

On the other hand, handling this dynamically with a sequence-point-aware
VM would trap when foo() gets two pointers to the same int, and at that
point the debugger can be invoked to work out what led to that:
--------
seq_pt ;beginning of foo()
ld a0,arg2
ld a1,arg1
ld i0,(a0)
st i0,(a1) ;VM notes that *a has been modified since last sequence point
inc i0
st i0,(a0) ;traps if a==b: object modified twice between sequence points
seq_pt ;end of *a=(*b)++. Clear modified-object list.
--------

Keep in mind that the VM's purpose is to check for poorly-defined (or
otherwise bad) C code, even if that C code can be compiled to a set of
instructions that are well-defined in the VM.

Since it's constructed as a dynamic code checker for a language that
prohibits multiple updates (and some cases of both access and update)
between sequence points, the VM knows that even though the sequence of
instructions it sees is well-defined, it could only have been generated
from C code that isn't well-defined, so it can trap on that.
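
The bookkeeping behind seq_pt can be sketched in C as a list of objects modified since the last sequence point. The seq_tracker type and function names are hypothetical, and only the modified-twice case is covered, not the access-and-update cases.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_MODS = 32 };  /* illustrative limit on tracked objects */

typedef struct {
    const void *modified[MAX_MODS];  /* objects stored to since seq_pt */
    size_t count;
} seq_tracker;

/* Sequence point: clear the modified-object list. */
void seq_point(seq_tracker *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Store through the tracker. Returns -1 (the "trap") if the object
   was already modified since the last sequence point, else 0. */
int seq_store(seq_tracker *t, int *obj, int value)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->modified[i] == obj)
            return -1;
    if (t->count < MAX_MODS)  /* beyond the limit, tracking is lost */
        t->modified[t->count++] = obj;
    *obj = value;
    return 0;
}

/* *a = (*b)++ lowered the same way as the listing above. */
int checked_assign_postinc(seq_tracker *t, int *a, int *b)
{
    seq_point(t);
    int old = *b;
    if (seq_store(t, a, old) != 0)
        return -1;
    if (seq_store(t, b, old + 1) != 0)
        return -1;  /* a == b: object modified twice */
    seq_point(t);
    return 0;
}
```

Calling checked_assign_postinc with two distinct ints succeeds, while passing the same address twice, as foo(&x,&x) does, returns -1 at the second store.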


--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------

There are different levels of warnings.

Keep in mind that I introduced this with:
}Create a VM that allows aggressive testing for bad (especially not-
}well-defined) code,

I'm assuming that you wouldn't be using such a thing if you didn't want
something approaching the "pathologically paranoid" level of warnings.

In this example, if
strlen(str) is short enough, the behavior is well-defined, of course,
even though the construct isn't safe for arbitrarily long str
arguments. We can use static analysis in this case to determine that
sizeof(buf) < 20, indicating a questionable construct.

But, once again, static analysis is only enough for the trivial examples
that illustrate the point without confusing the reader, and is unlikely
to be enough to catch the cases where a similar problem shows up in
real code.

If a function gets a buffer size argument larger than the real buffer
size, that's a bug, even if what ends up being written into that buffer
does fit; we want to catch that bug as soon as possible even if the
behavior is actually well-defined until a user's cat starts sleeping
on the keyboard. (If the programmer knows that what's getting written
into the buffer won't overflow it, that's what the non-counted variants
of the functions (strcpy in this case) are for.)

To protect against overflow, we really want
strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
len <= sizeof(buf)-strlen(buf)-1,
assuming that debug_strncat() has access to sizeof(buf).

If we're storing pointers as segment-offset-size, then a little bit of
implementation magic will give it the appropriate size.
Note that this isn't directly available to the code the programmer sees if
(as is likely) the buffer isn't a local or global array; buffers passed in
(as a pointer) from elsewhere or obtained from malloc are the ones most
likely to have mismatched sizes, and sizeof won't give the size of the
buffer in those cases.
Once you're doing aggressive dynamic checking in the implementation's
runtime environment anyways, it's much simpler for all concerned to let
the library function check the sizes; it knows how buffer size and size
arguments are related, and has access to implementation magic to get at
the information it needs to check them.


dave
 
