hexdump.c

Morris Keesan

18 char tab[16+1]; ....
24 memset(tab,'.',16); ....
34 if (column == 16) { ....
40 if (line == 16) { ....
52 while (column < 16) { ....
57 tab[16]=0;

I usually deduct points from students who use uncommented bare magic
numbers like this. Especially when the program is related to
base-16 (hexadecimal) output, and where the 16 in line 40 seems to be
unrelated to any of the other 16s. This might be a good time to
introduce the sizeof operator, and/or #defines. If they've already
been introduced, consider adding, as an exercise, "What would you need
to change to print a different number of characters per line?"
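
A minimal sketch of that suggestion, not the posted program itself. Only `tab` comes from the quoted lines; `BYTES_PER_LINE`, `init_tab` and `demo_tab_width` are names made up here for illustration.

```c
#include <string.h>

/* Hypothetical name: how many input bytes each output line shows. */
#define BYTES_PER_LINE 16

/* Fill the text column with '.' and terminate it. len is the full
   size of the buffer, including room for the terminating zero. */
void init_tab(char *tab, size_t len)
{
    memset(tab, '.', len - 1);
    tab[len - 1] = '\0';
}

/* Returns the number of text positions in the buffer. */
size_t demo_tab_width(void)
{
    char tab[BYTES_PER_LINE + 1];

    init_tab(tab, sizeof tab);   /* sizeof replaces the bare 16 */
    return strlen(tab);
}
```

With this shape, printing a different number of characters per line should only need a change to the one #define.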



6: What would happen if you are working in a machine where the
characters are 16 bits wide? What needs to be changed in the
above program?

What about if you are working with a machine where the character width
is not a multiple of 4 bits?
(For many years, I worked with a Unix platform where the chars were
ten bits wide. Someone doing hardware design thought this was a good
idea.)
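
One possible way to stop assuming two hex digits per character is to derive the digit count from CHAR_BIT. This is only a sketch: `HEX_DIGITS_PER_CHAR`, `char_to_hex` and the lowercase digit table are assumptions made here, not names from the posted program.

```c
#include <limits.h>

/* Hex digits needed to print one unsigned char: 2 when CHAR_BIT is 8,
   3 when it is 9 or 10, 4 when it is 16. */
#define HEX_DIGITS_PER_CHAR ((CHAR_BIT + 3) / 4)

/* Write the hex digits of c, most significant first, into out, which
   must have room for HEX_DIGITS_PER_CHAR + 1 chars. */
void char_to_hex(unsigned char c, char *out)
{
    static const char hex[] = "0123456789abcdef";
    int i;

    for (i = HEX_DIGITS_PER_CHAR - 1; i >= 0; i--) {
        out[i] = hex[c & 0xfu];   /* lowest 4 bits first, stored last */
        c >>= 4;
    }
    out[HEX_DIGITS_PER_CHAR] = '\0';
}
```

When the character width is not a multiple of 4 bits, the leading digit simply covers fewer than 4 bits.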
 
jacob navia

On 10/09/11 23:37, Morris Keesan wrote:
18 char tab[16+1]; ...
24 memset(tab,'.',16); ...
34 if (column == 16) { ...
40 if (line == 16) { ...
52 while (column < 16) { ...
57 tab[16]=0;

I usually deduct points from students who use uncommented bare magic
numbers like this. Especially when the program is related to
base-16 (hexadecimal) output, and where the 16 in line 40 seems to be
unrelated to any of the other 16s. This might be a good time to
introduce the sizeof operator, and/or #defines.

That's what exercise 4 is meant to teach:

4: Add another option (call it -column:XXX) to display more or less
text positions in a line. For instance -column:80 would fix the
display to 80 columns. Adjust the number of characters displayed
accordingly. Note that you should not make the number of characters
less than 4 or greater than 512.
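
A rough sketch of how the option parsing for this exercise might look. The function name and the two macros are hypothetical, and applying the 4..512 limits directly to the option value is just one reading of the exercise.

```c
#include <stdlib.h>
#include <string.h>

#define MIN_COLUMNS 4
#define MAX_COLUMNS 512

/* Returns the value from "-column:XXX", or -1 if arg is not that
   option, has no digits, has trailing junk, or is out of range. */
int parse_column_option(const char *arg)
{
    static const char prefix[] = "-column:";
    size_t plen = sizeof prefix - 1;
    char *end;
    long value;

    if (strncmp(arg, prefix, plen) != 0)
        return -1;
    value = strtol(arg + plen, &end, 10);
    if (end == arg + plen || *end != '\0')
        return -1;
    if (value < MIN_COLUMNS || value > MAX_COLUMNS)
        return -1;
    return (int)value;
}
```

Once the limits are named, the bare 16s in the program have an obvious replacement: whatever variable holds the parsed value.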

The goal here is to show exactly why those magic numbers aren't OK. I
said that in my original post when I started this thread:

<quote>
Note that the manifest
constants will be replaced by #defines in the exercises, when they
are asked to increase the number of columns, etc.
</quote>

What about if you are working with a machine where the character width
is not a multiple of 4 bits?

Ahh the famous Deathstation 9000 you mean?

Let's be realistic, maybe those systems existed in the past but now...
(For many years, I worked with a Unix platform where the chars were
ten bits wide. Someone doing hardware design thought this was a good
idea.)

Well, I think you would need another exercise for that :)
 
Patrick Scheible

Ahh the famous Deathstation 9000 you mean?

Let's be realistic, maybe those systems existed in the past but now...

There are still systems with 36-bit words. In order to make a C
compiler for those systems, 9-bit chars are used. While such a compiler
can meet standard C, way too much C code can't be ported because of the
widespread assumption that all the world has 8-bit chars.

-- Patrick
 
Barry Schwarz

Got the idea of adding one more exercise to the tutorial.

The goal is to show a small hexdump utility without any bells and
whistles, and add a bunch of exercises to add those. Here it is.

It uses standard C. Please tell me if there could be any
portability problems.

I do not use putchar but fputc to make it easier to add an output
file later as one more argument.

Please tell me if you see any errors in it. Note that the manifest
constants will be replaced by #defines in the exercises, when they
are asked to increase the number of columns, etc.

------------------------------------------------------cut here
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
    if (argc < 2) {
       fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
       return EXIT_FAILURE;
    }
    FILE *file = fopen(argv[1],"rb");
    if (file == NULL) {
       fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
       return EXIT_FAILURE;
    }

<snip>

Unless the student is required to have a C99 compiler, all your
declarations should precede your statements.
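
For comparison, a small sketch of the C90 layout, where every declaration sits at the top of its block. In the quoted main(), `FILE *file` is declared after the argc test, which needs C99. The helper `count_bytes` is made up for this illustration and is not part of the posted program.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in the named file,
   or returns -1 if it cannot be opened. */
long count_bytes(const char *name)
{
    FILE *file;          /* C90: all declarations come first ... */
    long count = 0;

    if (name == NULL)    /* ... and the statements follow. */
        return -1;
    file = fopen(name, "rb");
    if (file == NULL)
        return -1;
    while (fgetc(file) != EOF)
        count++;
    fclose(file);
    return count;
}
```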
 
jacob navia

On 11/09/11 09:37, Barry Schwarz wrote:
Unless the student is required to have a C99 compiler, all your
declarations should precede your statements.

Well, C99 is already 12 years old, so let's require standard C.

A C99 compiler exists for most of the workstations today.
PC: lcc-win, gcc, intel
Linux: gcc, intel
PowerPC: IBM
Mac OSX: Apple's gcc (compiles C99 by default)

This is a tutorial that starts with standard C, so better get a
conforming compiler.
 
NicStevens

jacob said:
       fprintf(stderr,"Usage: %s <file name>\n",argv[0]); return

fails if argv[0] is null

    unsigned char tab[16+1];

you should use char to store printable characters

       if (oneChar >= ' ' && oneChar <= 127) {
          tab[column] = oneChar;
       }

could lead to unprintable characters appearing on non-ascii systems

       fputc(hex[(oneChar >> 4)&0xf],stdout);

the mask is unnecessary unless you are targeting systems with CHAR_BIT>8

          fprintf(stdout,"   ");

you know there is a function called printf?

looks like your C code for trivial exercises is just as sloppy as the
code in your buggy and overpriced compiler jacob.

printf is a much more expensive operation to output one char versus
fputc or putchar
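
On the printable-character test criticized above: one portable alternative to comparing against ' ' and 127 is isprint() from <ctype.h>, which asks the current locale instead of assuming ASCII. On ASCII systems the quoted range test also lets 127 (DEL) through, which is not printable. A sketch, with `display_char` as a made-up helper name:

```c
#include <ctype.h>

/* Character to show in the text column for byte value c
   (0..UCHAR_MAX): the character itself if the current locale
   calls it printable, otherwise '.'. */
int display_char(int c)
{
    return isprint(c) ? c : '.';
}
```

The argument has to be a value representable as unsigned char (or EOF), so the byte should be kept in an unsigned char or an int as read by fgetc().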
 
Jorgen Grahn

Undefined behaviour in the (unlikely) case where argc = 0 and argv[0] = NULL.

Yes. But do you know a system where that happens?

Never found one.

It's easy for a malicious invoker to do that with the UNIX execv()
or execl() call. I suppose it's possible to do it accidentally but
that's unlikely.

IIRC, the Commodore Amiga did that, if you started a program by
clicking it in the GUI (the Workbench). But few people wrote
standard-conformant C on that platform.

/Jorgen
 
Seebs

I agree that argc = 0 is unlikely. But the standard allows it.

I think there was some system with the convention that argc = 0 meant
that argv[1] was a pointer to a system-specific magical structure that
had information about a GUI invocation or something to that effect.

-s
 
jacob navia

On 13/09/11 08:44, Seebs wrote:
I agree that argc = 0 is unlikely. But the standard allows it.

I think there was some system with the convention that argc = 0 meant
that argv[1] was a pointer to a system-specific magical structure that
had information about a GUI invocation or something to that effect.

-s

WHO CARES?

Not me. I think you can't possibly try to satisfy all the systems that
existed some time in the past to make your application portable to
PDP11 or whatever. Especially not for new students of the C language
that will surely work on a PC (Windows or Linux) or a Macintosh when
doing their homework.
 
Fritz Wuehler

Not me. I think you can't possibly try to satisfy all the systems that
existed some time in the past to make your application portable to
PDP11 or whatever.

Why not, isn't that what the preprocessor is for?

Oh the PDP 11 doesn't have the preprocessor? ;-)
 
jacob navia

On 13/09/11 22:36, Kenneth Brody wrote:
I think all classes on C programming should be taught on the DS-9000,
unless you are specifically teaching "Brand X's C compiler". It's a lot
easier to learn "the right way" first and then learn system-specific
extensions, than it is to learn system-specific extensions and then
"unlearn" them when working on other platforms.

1) argv[0] is a system specific extension ???? That would be news to me.

2) Can you tell me of an EXISTING system that sets argv[0] to anything
else than the name of the program?
 
Keith Thompson

jacob navia said:
On 13/09/11 08:44, Seebs wrote:
I agree that argc = 0 is unlikely. But the standard allows it.

I think there was some system with the convention that argc = 0 meant
that argv[1] was a pointer to a system-specific magical structure that
had information about a GUI invocation or something to that effect.

-s

WHO CARES?

Not me. I think you can't possibly try to satisfy all the systems that
existed some time in the past to make your application portable to
PDP11 or whatever. Especially not for new students of the C language
that will surely work on a PC (Windows or Linux) or a Macintosh when
doing their homework.

Ok, you don't care. But the standard is quite explicit about *not*
requiring argv[0] to be non-null. See C99 5.1.2.2.1p2:

-- The value of argc shall be nonnegative.
-- argv[argc] shall be a null pointer.
-- If the value of argc is greater than zero, [...]

But consider this: students who learned to program on a PDP-11 or a VAX
are probably no longer using those systems. Students learning today on
a Windows, Linux, or Macintosh system may be using very different
systems in the future. It's never too early to learn good habits.

I'm not necessarily saying that a tutorial intended for beginners needs
to check for every possibility; assuming that argv[0] != NULL might be
reasonable in that context. But in code that's intended to be robust, a
test is worth the effort, perhaps something like:

const char *const program_name = argv[0] ? argv[0] : "<program>";
fprintf(stderr, "Usage: %s <filename>\n", program_name);
 
Malcolm McLean

But in code that's intended to be robust, a
test is worth the effort, perhaps something like:

    const char *const program_name = argv[0] ? argv[0] : "<program>";
    fprintf(stderr, "Usage: %s <filename>\n", program_name);

The problem with that is that you might get a full path in argv[0].

You can use strrchr() to chop, using either / or \ as a path
separator. That will work in most real situations, but not if a unixy
person includes a backslash in a file name.
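
The strrchr() idea might look something like this sketch. `base_name` is a made-up helper, and it shows the weakness mentioned above: a backslash inside a Unix file name is treated as a separator.

```c
#include <string.h>

/* Return the part of path after the last '/' or '\\', or path
   itself if it contains neither. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *backslash = strrchr(path, '\\');
    const char *last = slash;

    /* Keep whichever separator occurs later in the string. */
    if (backslash != NULL && (last == NULL || backslash > last))
        last = backslash;
    return last != NULL ? last + 1 : path;
}
```

For "/usr/bin/hexdump" this gives "hexdump", but for a Unix file really named "odd\name" it gives just "name".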
 
BartC

But consider this: students who learned to program on a PDP-11 or a VAX
are probably no longer using those systems.

I bet they were taught a bunch of other stuff that was also specific to
those systems and which is now irrelevant, but still managed to gradually
progress to new systems without too much trouble.

And the chances are that a couple of decades later, they will be using a
different language anyway, if they are still even programming.
 
jacob navia

On 14/09/11 12:40, BartC wrote:
I bet they were taught a bunch of other stuff that was also specific to
those systems and which is now irrelevant, but still managed to
gradually progress to new systems without too much trouble.

And the chances are that a couple of decades later, they will be using a
different language anyway, if they are still even programming.

This obsession with portability reaches new heights here...

Programs must be written (by beginners) such that they still run
unmodified in 20 years, so the children of the students can peek into
their father's homework and copy it unmodified.


Well, this Thompson guy has never presented any program here, not
even a small utility like the one I presented. The only thing he knows
is to pick a small detail from a post and add his nonsense.
 
Nobody

2) Can you tell me of an EXISTING system that sets argv[0] to anything
else than the name of the program?

Unless running a script (a file beginning with "#!"), Unix sets argv to
exactly what is passed to execve(). If you execute a program with:

char *arg0 = NULL;
execve("foo", &arg0, environ);

the resulting program will have "argc == 0 && argv[0] == NULL".

Setting argv[0] to the name of the program is only a convention, not
something which is enforced by the kernel.
 
Nobody

This obsession with portability reaches new heights here...

Programs must be written (by beginners) such that they still run
unmodified in 20 years, so the children of the students can peek into
their father's homework and copy it unmodified.

It isn't always necessary to write portable code. However, whether
your code is portable or not should be a deliberate choice, not an
accidental consequence of not knowing what is or isn't portable.

Anyone with a brain can discover typical behaviour through trial and
error, but that won't tell you whether such behaviour is actually
guaranteed. People will only learn that information if they specifically
go looking for it (which won't happen if they /think/ they already know),
or if someone actually tells them.

Why does it matter if code "only" works 99.999% of the time? Because if
security is an issue, it's often quite straightforward for someone to
force the "0.001%" case to happen.

If you ever need to write a program which has security implications, it's
not enough to know what should happen or what will probably happen. You
have to know what /could/ happen; all of it: every, single possibility.

Admittedly, argv[0]==NULL doesn't have much scope for exploitation,
although it has some; e.g. a potential DoS if you can force a program to
segfault while reporting an error which should otherwise be recoverable
(or if the program should at least be able to "clean up" before
terminating).
 
jacob navia

On 14/09/11 14:31, Nobody wrote:
2) Can you tell me of an EXISTING system that sets argv[0] to anything
else than the name of the program?

Unless running a script (a file beginning with "#!"), Unix sets argv to
exactly what is passed to execve(). If you execute a program with:

char *arg0 = NULL;
execve("foo", &arg0, environ);

the resulting program will have "argc == 0 && argv[0] == NULL".

Setting argv[0] to the name of the program is only a convention, not
something which is enforced by the kernel.

Yes. So, you can crash probably most Unix programs that way.
So what?

Most zip/unzip programs are only one program. They look at argv[0]
to see if they should be unzipping or zipping.
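
A sketch of that kind of dispatch on argv[0], with a null check so the empty-argv case does not crash. The names `pick_mode`, `MODE_ZIP` and `MODE_UNZIP`, the literal name "unzip", and the choice of zip as the fallback are all assumptions for illustration, not taken from any real zip program.

```c
#include <string.h>

enum mode { MODE_ZIP, MODE_UNZIP };

/* Pick the mode from the name the program was invoked under.
   A null argv[0] falls back to MODE_ZIP instead of crashing. */
enum mode pick_mode(const char *argv0)
{
    const char *name;

    if (argv0 == NULL)
        return MODE_ZIP;
    name = strrchr(argv0, '/');               /* skip any directory part */
    name = (name != NULL) ? name + 1 : argv0;
    return strcmp(name, "unzip") == 0 ? MODE_UNZIP : MODE_ZIP;
}
```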

There is no system that doesn't set argv[0] to the name of the program.
I added a guard (if (argv[0])) in the program as presented in the
exercise.

One thing you should consider is that the beginner must begin with
simple programs, not highly secured programs designed to run a simple
hex dump utility in a 100% secure environment. There is NO point in
making hexdump.c more complex than it is already.
 
Ben Bacarisse

BartC said:
I bet they were taught a bunch of other stuff that was also specific
to those systems and which is now irrelevant, but still managed to
gradually progress to new systems without too much trouble.

Yes, but the new knowledge of diversity was added to the old -- it did
not simply replace it. The PDP-11 was mixed endian; the VAX had a
single endianness. Moving on to, say, 64-bit Intel CPUs you'd see yet
another endianness. All had C compilers that made different choices for
the sizes of C basic types. After that, it becomes very difficult to,
say, use a packed struct to access a network protocol header -- you are
all too aware of the many ways in which it can go wrong.
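
As an illustration of the alternative to overlaying a packed struct, a header field can be assembled one octet at a time, which gives the same result whatever the host's endianness, padding or type sizes. A minimal sketch; `read_be32` is a hypothetical helper, and the field is assumed to be a 32-bit value in network (big-endian) order.

```c
/* Read a 32-bit big-endian value from four consecutive octets.
   The & 0xffu keeps only 8 bits of each element, in case
   CHAR_BIT is greater than 8. */
unsigned long read_be32(const unsigned char *buf)
{
    return ((unsigned long)(buf[0] & 0xffu) << 24)
         | ((unsigned long)(buf[1] & 0xffu) << 16)
         | ((unsigned long)(buf[2] & 0xffu) << 8)
         |  (unsigned long)(buf[3] & 0xffu);
}
```

unsigned long is used because the standard guarantees it at least 32 bits, which is not true of unsigned int.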

I don't see anything wrong in trying to short circuit this learning.
Giving people the benefit of experience is one thing that Usenet is good
at.

And the chances are that a couple of decades later, they will be using
a different language anyway, if they are still even programming.

I am not so sure of that. C is still with us. I'd like to think you
are right but I'd bet the other way even so.

If C is still around, the future of CPU architecture is unpredictable.
Designers of ultra-high performance graphics chips or of fast low-power
chips for embedding in your brain stem (or whatever really happens in
2044[1]) may have to make radically different choices.

[1] I first wrote a program in 1978, so someone starting now may well be
programming in 2044. Consider what's happened in those 33 years and try
to hazard a guess as to what will be happening in 2044.
 
Malcolm McLean

I first wrote a program in 1978, so someone starting now may well be
programming in 2044.  Consider what's happened in those 33 years and try
to hazard a guess as to what will be happening in 2044.

I used to "poke" the screen to get character mapped interactive
displays.

I still write a lot of programs like that. The difference is that I
now have to do a little bit of messing about with windowing systems to
get a character raster. Then it's back to old habits.

See Amino Invaders for an example

http://www.malcommclean.site11.com/www.

That program's written in Java. It's just a quick throwaway game to
teach the amino acid codes.
 
