Character arrays


Michael Mair

I completely agree. As there are no other courses, I included the tools
at the end.
That's assuming that the C newbie is also a newbie to programming and
to *nix as well. Which might be a good assumption in general, I suppose.
But I basically learned CVS this semester, in the same class for which
I learned OCaml... so I have no doubt that a student in a C-newbies course
who already knew the basics of programming in a language like Pascal or
Java could deal with learning both C and the basic Unix tools at the same
time.

Well, in my last course less than ten percent knew anything about
programming or *nix and another ten percent told me that they had
never worked with computers before...


Cheers
Michael
 

Michael Mair

J.L.Cooper said:
To me it seems more logical to introduce the students to them at the start
of their course. After all, RCS/CVS can be used for source control of any
files being worked on, and Make can be used to do a lot more than compile C
programs.

Of course. But IMO it is more motivating to see the necessity for using
it. As I have only finite time to get them started on the environment,
C, tools, and algorithmic optimisation, I will do so in this order as it
enables me to treat all topics.
In fact I use RCS and Make a lot more than I use C; even my thesis is stored
in RCS and has a makefile. After all, C should not be the first module that
a student undertakes; there are some which are more important (like Logic,
Computer Architecture and Assembly Language; of course this is only my opinion
and I am sure other people's will differ).

As there are no other courses, I fill the gap as well as I can.
C is IMO an excellent starting language as it is at once high level
and relatively close to the machines -- and is not as "big" as other
languages with comparable scope.
From C to Assembly is one step and using C from, say, Matlab, another.

Apart from that: If you have once _experienced_ how much relief comes
along with using make or version control, then you are much more likely
to start new projects using them. Thus, my students have to "suffer"
through the time without them, so as to get the revelation when introduced to make.
This probably is still not steep enough (Honestly, how bad can it get
in a sensible course?) to drive the lesson home but hopefully makes
them remember soon enough to (re)organise projects accordingly.
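Since the thread keeps circling back to make, here is a minimal sketch of the kind of Makefile such a course might build up to. The file and target names (`prog`, `main.c`, `util.c`) are illustrative, not from the thread:

```make
# Hypothetical two-file project; names are illustrative.
# Note: recipe lines must be indented with a tab, not spaces.
CC     = gcc
CFLAGS = -W -Wall -ansi -pedantic

prog: main.o util.o
	$(CC) $(CFLAGS) -o prog main.o util.o

main.o: main.c util.h
	$(CC) $(CFLAGS) -c main.c

util.o: util.c util.h
	$(CC) $(CFLAGS) -c util.c

clean:
	rm -f prog main.o util.o
```

The point students eventually appreciate: after editing only util.c, `make` recompiles just util.o and relinks, instead of rebuilding everything.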


Cheers
Michael
 

ranjeet

Hi

I observed something while coding the other day:

if I declare a character array as char s[0], and try to use it as any
other character array... it works perfectly fine most of the time. It
holds strings of any length. I guess what is happening here is that
this array initially holds only '\0' and hence is of length 1.

AFAIK it will not get that far: your compiler must (will) give
an error as soon as it starts to compile the above statement.
You are declaring char s[0] with zero length, and that is not possible.
If you really want to get the concept clear, then go through
K&R and solve problem 2.5; that will make all the
functionality clear.
Or write a program which reads only 20 characters,
without using any library function except printf; that will
clear up all the doubts.
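For the record: `char s[0]` is a constraint violation in C89/C99 (an array size must be greater than zero), so a conforming compiler must diagnose it; any apparent "working" comes from writing past a zero-sized object, which is undefined behaviour. A sketch of the correctly sized version of the suggested exercise, with the reading loop factored into a function (`read_limited` is my name, not from the thread):

```c
#include <stdio.h>

/* Read at most `max` characters from `fp` into `s`, stopping at
 * end-of-file or end-of-line.  `s` must have room for max + 1 bytes.
 * Returns the number of characters stored. */
size_t read_limited(FILE *fp, char *s, size_t max)
{
    int ch;
    size_t i = 0;

    while (i < max && (ch = getc(fp)) != EOF && ch != '\n')
        s[i++] = (char)ch;
    s[i] = '\0';           /* always null-terminate */
    return i;
}
```

Usage would be `char s[21]; read_limited(stdin, s, 20); printf("%s\n", s);`, i.e. the buffer has one byte more than the character limit, for the terminating '\0'.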
 

Chris Croughton

For -W, you need to know too much. This is an option I tell
them about but usually the introduction to splint helps them
more.

True, -W is a bit over the top (but lint is usually far more so, I know
a lot of people who when introduced to lint "shoot the messenger" and
either don't use it or turn off everything they can). The one I do turn
off is the signed/unsigned comparisons, that gets really annoying
(especially with some versions of gcc which complain about constants, I
really do not want to have to write x > 0U).
Definitely! But I prefer teaching them the language to a point where
they can appreciate make before introducing them to it.

I use make almost more for other transformations than for compiling.
Already happens. ar, make, gprof, cvs and others come in towards the end
of my course. When they have learned to write complex code which
justifies it. In the long run, the tools are more important than the
language used, but most of them do not even know "that computers can
have command lines" when they enter the course.

Ah, to me that's a given. Computers had nothing but command lines when
I started -- well, OK, unless you count the front panel switches and the
button which loaded the bootstrap from the card reader. I suppose the
lights on the panel could be called a GUI, they were sort of graphical
and it was a user interface of sorts. And then there was the oscilloscope...
Make is definitely too much for absolute beginners. They first have
to really understand (by "menial labour" with an actual language at
"actual" problems) why they need it.

Hmm. I tend to write the makefile before typing in the start of the
program...
Aliases (as Dan Pop suggested) or this script is quite enough at the
start.

Pity gcc doesn't take an environment variable for the default options,
so it could be set up in the login scripts. There was something very
elegant about just typing "cc x.c". But I suppose an alias is about as
close...

Chris C
 

Michael Mair

Chris said:
Michael Mair wrote


True, -W is a bit over the top (but lint is usually far more so, I know
a lot of people who when introduced to lint "shoot the messenger" and
either don't use it or turn off everything they can).

*g* I could not get some of my colleagues to use splint because
even the -weak option gave them too much to digest when tried on an
average file.
However, splint -weak gives you only the "reasonable" warnings and
really helps finding the most obvious mistakes.
If you then go to the default, and then -checks, you usually learn
a good bit about your code and bad coding habits.

As soon as my students feel too confident, I give them the task
to correct some program which will give no warnings up to splint
-checks but crash nonetheless and have a hidden bug at about
every fourth line... :)

> The one I do turn
> off is the signed/unsigned comparisons, that gets really annoying
> (especially with some versions of gcc which complain about constants,
> I really do not want to have to write x > 0U).

Ah yes, I am always "happy" about that, too ;-)
Nonetheless, I switch this off only after having a look at the
respective lines...


Cheers,
Michael
 

Chris Croughton

*g* I could not get some of my colleagues to use splint because
even the -weak option gave them too much to digest when tried on an
average file.

I can believe that, I've had to tell it (OK, lclint but they are from
the same base) a number of things to ignore. Like its insistence that
%X wants an unsigned int (almost all of the time I'm using %X is in
%debug code, where I want the hex value, I don't give a monkey's
%whether it's signed or not).
However, splint -weak gives you only the "reasonable" warnings and
really helps finding the most obvious mistakes.
If you then go to the default, and then -checks, you usually learn
a good bit about your code and bad coding habits.

Things like comparing an int with a character constant? When the int
has to be that to contain EOF? That's just silly (in fact I can't
recall any case of comparing an int with a character constant which was
an error).
As soon as my students feel too confident, I give them the task
to correct some program which will give no warnings up to splint
-checks but crash nonetheless and have a hidden bug at about
every fourth line... :)

Oh yes, those are the interesting ones. I have long believed that any
program which compiles first time with no errors or warnings is
extremely suspect. This is not just superstition ("there must be
/something/ wrong with it!"), it has basis in the fact that if a program
has had errors and has had to be corrected then the author will at least
have looked at (at least part of) the code critically and will likely
have noticed other bugs that the compiler didn't see.

One I was given at a job interview was of the form:

#include <stdio.h>

int main(void)
{
int i = 0;
++i; /* comment *
++i; * comment *
++i; * comment *
++i; * comment *
++i; * comment */
printf("%d\n", i);
return 0;
}

What value does it print out? No, not 5, although a lot of even
experienced programmers will give that answer. The "block comment" on
the right is visually confusing, the brain tends to ignore it. Of
course, any modern editor (including vim) with highlighting will show
the error immediately, so give it to them on paper. It's perfectly
valid ANSI C, even lclint on its most picky setting won't object, it
just happens to not give the answer the programmer probably expects...

(Actually, the above is a simplified version. The one I was given
included a loop, which did indeed have a bug in it (off-by-one error)
quite apart from the comment block, which confuses things more...)
Ah yes, I am always "happy" about that, too ;-)
Nonetheless, I switch this off only after having a look at the
respective lines...

I've found some compilers where it can't be turned off (one of the ARM
ones in particular) and it's really annoying...

Chris C
 

Michael Mair

Chris said:
I can believe that, I've had to tell it (OK, lclint but they are from
the same base) a number of things to ignore. Like its insistence that
%X wants an unsigned int (almost all of the time I'm using %X is in
%debug code, where I want the hex value, I don't give a monkey's
%whether it's signed or not).

Yep, this can be rather annoying. And eventually, changing one's own
programming style just to please *lint is not really what one wants
to do ;-)

Things like comparing an int with a character constant? When the int
has to be that to contain EOF? That's just silly (in fact I can't
recall any case of comparing an int with a character constant which was
an error).

Umh, character constants (you are talking of things like 'a', are
you?) are of type int, aren't they?

Oh yes, those are the interesting ones. I have long believed that any
program which compiles first time with no errors or warnings is
extremely suspect. This is not just superstition ("there must be
/something/ wrong with it!"), it has basis in the fact that if a program
has had errors and has had to be corrected then the author will at least
have looked at (at least part of) the code critically and will likely
have noticed other bugs that the compiler didn't see.

Well, I only once in my life wrote a longish module and had no
compiler errors or warnings. And the most evil test data worked.
Then, I asked a colleague to have a look at it. Gave me really
the creeps when he told me that everything looked fine... ;-)

One I was given at a job interview was of the form:

#include <stdio.h>

int main(void)
{
int i = 0;
++i; /* comment *
++i; * comment *
++i; * comment *
++i; * comment *
++i; * comment */
printf("%d\n", i);
return 0;
}

What value does it print out? No, not 5, although a lot of even
experienced programmers will give that answer. The "block comment" on
the right is visually confusing, the brain tends to ignore it. Of
course, any modern editor (including vim) with highlighting will show
the error immediately, so give it to them on paper. It's perfectly
valid ANSI C, even lclint on its most picky setting won't object, it
just happens to not give the answer the programmer probably expects...

(Actually, the above is a simplified version. The one I was given
included a loop, which did indeed have a bug in it (off-by-one error)
quite apart from the comment block, which confuses things more...)

Nice example; with a little bit more around the crucial part, it can be
really misleading. However, if someone is really experienced (s)he has
his/her own policy about comments which will lead to quick discovery of
what is happening; given the pressure at an interview it can fail to
work but no one with some sanity left will accept block comments or
"line" comments in many consecutive lines behind the code. Nobody will
read the important comments. IMO, thoroughly miscommented code with
useless comments at every line is worse than one comment line per
function.

I've found some compilers where it can't be turned off (one of the ARM
ones in particular) and it's really annoying...

Oh yes, I fondly remember having to move to another compiler
which warned me that the return statements at the end of some
two hundred functions could not be reached when they were only
in there to make sure that we catch the error if someone does
something stupid when changing code. That could not be switched off
either... We ended up filtering the warnings to find the "real"
issues as we still used the other compiler on other machines.


Cheers
Michael
 

Chris Croughton

Heh, your indenter decided that % was a comment indent function. I
haven't seen that one before...
Yep, this can be rather annoying. And eventually, changing one's own
programming style just to please *lint is not really what one wants
to do ;-)

Indeed. The cry "I write C, dammit, not lint!" has been heard...
Umh, character constants (you are talking of things like 'a', are
you?) are of type int, aren't they?

Yup, and yup. In fact the output even tells you that using the lclint
flag to suppress it is likely to be safe because character constants
have type int!

/tmp/ttt.c:12:9: Operands of > have incompatible types (int, char): i > 'a'
A character constant is used as an int. Use +charintliteral to allow
character constants to be used as ints. (This is safe since the actual
type of a char constant is int.)

If there is really any "house style" violated by saying

int ch;
ch = getchar();
if (ch == EOF)
...
if (ch > 'a")
...

then I'd like to know, so I can avoid the place like the plague...

(Whether one should use getchar(), of course, is a different container
of piscine entities...)
Well, I only once in my life wrote a longish module and had no
compiler errors or warnings. And the most evil test data worked.
Then, I asked a colleague to have a look at it. Gave me really
the creeps when he told me that everything looked fine... ;-)

Definitely spooky! I did have one the other day, where I started
'writing' it in my head overnight and typed it in the next day, and it
not only compiled but also worked first time. However, once I started
adding the rest of the code it failed to compile spectacularly, so that
made up for it. Michael Mair said:
Nice example; with a little bit more around the crucial part, it can be
really misleading. However, if someone is really experienced (s)he has
his/her own policy about comments which will lead to quick discovery of
what is happening; given the pressure at an interview it can fail to
work but no one with some sanity left will accept block comments or
"line" comments in many consecutive lines behind the code. Nobody will
read the important comments.

Yes, true. The problem comes when that person tries to strip them out
mentally, human parsers make mistakes because they look for patterns
("Ignore anything to the right of the semicolon") which aren't always
correct. Even more when switching languages (I'm used to the C++ style
// line comments, which C99 allows as do almost all modern C compilers,
so I read it as "oh, a block of line comments").
IMO, thoroughly miscommented code with
useless comments at every line is worse than one comment line per
function.

Oh definitely, I've lost count of the times I've used (and sometimes
written) a 'noft' program for various languages to remove all of the
comments, because I couldn't trust them. Or occasionally because they
were in some language which I almost understood but not enough to be
certain that I was reading them correctly (I read a certain amount of
French, German and Dutch, enough to get confused with words which look
like English ones but aren't). Badly commented code is worse than no
comments at all because it misleads, whereas lack of comments means that
the maintainer has to actually read what the code does instead of what
the creator thought it was doing.
Oh yes, I fondly remember having to move to another compiler
which warned me that the return statements at the end of some
two hundred functions could not be reached when they were only
in there to make sure that we catch the error if someone does
something stupid when changing code. That could not be switched off
either... We ended up filtering the warnings to find the "real"
issues as we still used the other compiler on other machines.

BTDT as well. I got a bonus once for writing (in my own time) a
compiler warning filter, it improved productivity immensely. In those
days we had the option of either all warnings or none...

Chris C
 

Old Wolf

Chris Croughton said:
True, -W is a bit over the top (but lint is usually far more so, I know
a lot of people who when introduced to lint "shoot the messenger" and
either don't use it or turn off everything they can). The one I do turn
off is the signed/unsigned comparisons, that gets really annoying
(especially with some versions of gcc which complain about constants, I
really do not want to have to write x > 0U).

I turn this on whenever I can find it. Signed-unsigned
comparisons are the cause of some of the most hard-to-find
bugs. I wish there was a switch to make the C rules more
sensible :) (I would prefer that (-1 < 0x1) were TRUE).
 

Chris Croughton

(-1 < 0x1) is true.

(-1 < 0x1u) is false.

I think that what "Old Wolf" meant is what I would prefer, that any
signed type with a negative value would compare less than any unsigned
value, which would make sense.

Chris C
 

pete

Chris said:
I think that what "Old Wolf" meant is what I would prefer, that any
signed type with a negative value would compare less than any unsigned
value, which would make sense.

As Old Wolf does, I like the signed/unsigned mismatch warning also.
 

Charlie Gordon

/tmp/ttt.c:12:9: Operands of > have incompatible types (int, char): i > 'a'
A character constant is used as an int. Use +charintliteral to allow
character constants to be used as ints. (This is safe since the actual
type of a char constant is int.)

If there is really any "house style" violated by saying

int ch;
ch = getchar();
if (ch == EOF)
...
if (ch > 'a")
...

then I'd like to know, so I can avoid the place like the plague...

no house style violation, merely a syntax error on 'a" ;-)

Since you read some amount of French, I will give you some examples of
character constant oddities:

if (ch > 'a') ... is quite meaningless if you are trying to produce portable
code (EBCDIC issues)

if (ch == 'é') ... stands a good chance of never matching anything read with
getchar()

if (ch == 'ÿ') ... will erroneously match EOF if chars are signed by default.

sizeof('a') == sizeof(int) comes as a surprise to some!

sizeof(L'a') != sizeof(int) is even more surprising (on systems where
wchar_t is a short and short != int, eg: windows)
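Two of these oddities are easy to check directly. A sketch (the function name is mine; the EOF case is deliberately platform-dependent, as the thread goes on to discuss):

```c
#include <stdio.h>

void show_char_constant_traps(void)
{
    char c = '\377';    /* 'ÿ' in Latin-1 */

    /* A character constant has type int in C (unlike in C++),
     * so sizeof 'a' is sizeof(int), not 1. */
    printf("sizeof 'a' = %u, sizeof(char) = %u\n",
           (unsigned)sizeof 'a', (unsigned)sizeof(char));

    /* Where plain char is signed and EOF is -1, '\377' stored in a
     * char compares equal to EOF -- this is NOT guaranteed by the
     * standard, merely common. */
    printf("c == EOF is %s here\n", (c == EOF) ? "true" : "false");
}
```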

Chqrlie.
 

Charlie Gordon

Chris Croughton said:
I think that what "Old Wolf" meant is what I would prefer, that any
signed type with a negative value would compare less than any unsigned
value, which would make sense.

Even more surprising:

usually caught with a warning:
(-1U < 1) is false

not even a signed/unsigned comparison:
(sizeof(char) - sizeof(int) > 0) is true
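Both surprises fall out of the usual arithmetic conversions, which can be checked directly (the function name is mine):

```c
#include <assert.h>

void unsigned_surprises(void)
{
    assert((-1 < 0x1)  == 1);   /* 0x1 has type int: a signed compare */
    assert((-1 < 0x1u) == 0);   /* -1 is converted to UINT_MAX        */
    assert((-1U < 1)   == 0);   /* -1U is already UINT_MAX            */

    /* sizeof yields size_t, an unsigned type, so on any host where
     * int is wider than char the subtraction wraps to a huge positive
     * value instead of going negative: */
    if (sizeof(int) > sizeof(char))
        assert((sizeof(char) - sizeof(int) > 0) == 1);
}
```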

Chqrlie.
 

Richard Bos

Charlie Gordon said:
if (ch == 'ÿ') ... will erroneously match EOF if chars are signed by default.

Not necessarily. It's true on most systems, since the majority uses
extended versions of ASCII in which 'ÿ' is 255, and usually EOF is -1,
CHAR_BIT is 8, and integer overflow simply wraps around. None of this is
guaranteed by the Standard, however; it's just rare to find a desktop
machine on which it isn't true.

Richard
 

Chris Croughton

no house style violation, merely a syntax error on 'a" ;-)

Fair cop <g>. But it still applies to

if (ch == 'a')
Since you read some amount of French, I will give you some examples of character
constant oddities :

if (ch > 'a') ... is quite meaningless if you are trying to produce portable
code (EBCDIC issues)

But whether ch is int or char (or long, short, etc.) is irrelevant to
that. You might as well give a warning if any character constant is
used anywhere.

And note that lclint gives the same warning if the code is

if (ch >= '0' && ch <= '9')

which is valid whatever the character set used. Section 5.1(3) says:

In both the source and execution basic character sets, the value of
each character after 0 in the above list shall be one greater than the
value of the previous.

(where "the above list" was the 10 decimal digits 0 1 2 3 4 5 6 7 8 9).
That even holds in EBCDIC with 8-bit signed char.
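The digit guarantee quoted above is the one character-set assumption that is always safe, which is why the classic idioms below are portable even to EBCDIC (function names are mine):

```c
/* '0'..'9' are guaranteed contiguous and ascending in every C
 * character set (the clause quoted above), so a range test is
 * portable... */
int is_digit_char(int ch)
{
    return ch >= '0' && ch <= '9';
}

/* ...and so is digit-to-value conversion by subtraction. */
int digit_value(int ch)
{
    return ch - '0';
}
```

No such guarantee exists for the letters, which is exactly the EBCDIC problem with `ch > 'a'`.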
if (ch == 'é') ... stands a good chance of never matching anything read with
getchar()

True, getchar (getc, fgetc) returns characters as unsigned char values (except for EOF).
But since the value for 'é' needs to be determined with regard to the
current locale, which is only determinable at runtime, I would regard
any use of a character constant which is not 7-bit clean as an error.
if (ch == 'ÿ') ... will erroneously match EOF if chars are signed by default.

True, but again irrelevant. If you're dealing with non-ASCII characters
you need to do special processing anyway (in fact all of your examples
are assuming the LATIN-1 or equivalent character sets, none of it will
make any sense at all in a Cyrillic character set for instance). In any
real application which has to take account of such characters the
program will have to look at the locale in use at runtime to determine
the correct values anyway.

In none of those cases does having a warning about comparing an int with
a character constant do any more than having a general warning about
comparing anything with a character constant, or indeed a warning if you
use a character constant at all (note that the character set used for
preprocessing might also not be the one used at runtime -- indeed, the
one used at runtime may not be fixed, especially with non-ASCII
characters).
sizeof('a') == sizeof(int) comes as a surprise to some!

But not to me, since I pointed out that the warning is meaningless
because a character constant is an int!
sizeof(L'a') != sizeof(int) is even more surprising (on systems where
wchar_t is a short and short != int, eg: windows)

wchar_t is an oddity anyway, I never use it. If I am doing operations
with multi-byte character sets I use UCS-4 internally (in an int32_t,
since UCS-4 is a 31-bit type) and convert to and from UTF-8 (or a
specific 8-bit character set) externally. But if you have to do that,
there shouldn't be any character or string literals used at all.

In which character set is that? <g>

Chris C
 

Charlie Gordon

Chris Croughton said:
Fair cop <g>. But it still applies to

if (ch == 'a')

I agree with you about the excessive warning in the example mentioned; I was
merely pointing out a few problems with character constants that you are aware
of, but which will surprise the majority of C programmers.
But whether ch is int or char (or long, short, etc.) is irrelevant to
that. You might as well give a warning if any character constant is
used anywhere.

No: the use of the > operator implies assumptions about the character set that
may well be false; other uses of 'a' are not affected.
And note that lclint gives the same warning if the code is

if (ch >= '0' && ch <= '9')
which is valid whatever the character set used. Section 5.1(3) says:
...

lclint is not smart enough ;-)
True, getchar (getc, fgetc) returns characters as unsigned char values (except for EOF).
But since the value for 'é' needs to be determined with regard to the
current locale, which is only determinable at runtime, I would regard
any use of a character constant which is not 7-bit clean as an error.

getc() and fgetc() return an int, and treat the data stream as a sequence of
unsigned chars, which is inconsistent with the char type being signed by
default, and its consequences in terms of the value of character constants with
the high bit on.
I agree with you about the extra issues related to using 8-bit characters in
strings and character constants.

I could have written:

if (ch == '\351')... // may never match anything
and
if (ch == '\377') // may erroneously match EOF

Chqrlie.

PS: the 'q' in there is a French joke.
 

Dan Pop

In said:
Because we cannot afford the Comeau compiler plus Dinkumware libraries.

You don't need them, either.
Because I hope that in the long run this is the better course than
teaching only C89 and doing C99 as add-on at the end, if at all,

For the time being, there is no point in teaching C99 at all: it is far
from clear that it will ever become an industry standard; for all we know
now, 5 years after its adoption, it may remain a committee pipe dream
forever.
and many useful constructs can be used already.

Not portably, which makes them far less useful.
If every C course did this, then there would be enough weight to get
full conformance not only in gcc but in most and eventually all
major compilers.

Wishful thinking. No one really needs the _Bool nonsense and most of
the big time number crunching is not done in C.
However, I fear that in the long run we will get C89 plus C99
standard library and nothing more.

More likely, C89 plus some (small) parts of the C99 language and some
(small) parts of C99 standard library.

I can see long long and inline becoming mainstream extensions to C89
and snprintf and the revised freopen as mainstream extensions to the C89
library. I also hope for VLAs, but I'm not holding my breath.

Note that gcc's support for both inline and VLAs is not conforming to the
C99 specification. That's why I wouldn't recommend gcc -std=c99 as the
proper teaching aid for a C99 course.
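The C99 bits Dan sees going mainstream are easy to illustrate. A sketch, with illustrative names (`format_big`, `average`) that are not from the thread:

```c
#include <stdio.h>

/* long long plus the finally standardized, bounded snprintf: */
int format_big(char *buf, size_t size)
{
    long long big = 9000000000LL;   /* does not fit in a 32-bit long */
    return snprintf(buf, size, "big = %lld", big);
}

/* A variably modified parameter, one facet of the VLA features,
 * plus C99's declaration inside the for statement: */
double average(int n, const double a[n])
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return n ? sum / n : 0.0;
}
```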

Dan
 

Michael Mair

Dan said:
You don't need them, either.

Assuming we want to use C99:
Which compiler/library combination conforming to the C99
standard would you suggest, then?

For the time being, there is no point in teaching C99 at all: it is far
from clear that it will ever become an industry standard; for all we know
now, 5 years after its adoption, it may remain a committee pipe dream
forever.

I fear so, too. I still hope that it's only a chicken-and-egg problem: if one
of the widely used compilers gave us C99, it would start being used
and demanded in other compilers too.
As the gcc people still claim to aim for C99, I keep my hopes up.

Not portably, which makes them far less useful.

Yep. So I teach both, hoping for the best... :-/

Wishful thinking.

Granted :)
No one really needs the _Bool nonsense and most of
the big time number crunching is not done in C.

Granted. But _Bool is really not my reason for using C99; I am
not sure what the latter part of your sentence refers to.

More likely, C89 plus some (small) parts of the C99 language and some
(small) parts of C99 standard library.

I can see long long and inline becoming mainstream extensions to C89
and snprintf and the revised freopen as mainstream extensions to the C89
library. I also hope for VLAs, but I'm not holding my breath.

Yes. I also would like to have the types from <stdint.h> and designated
initializers.

Note that gcc's support for both inline and VLAs is not conforming to the
C99 specification. That's why I wouldn't recommend gcc -std=c99 as the
proper teaching aid for a C99 course.

I am aware of both; apart from flexible array members, most people
will not notice any difference with respect to the VLAs, but the
"extern inline" issue is rather nasty. I am not sure which is
actually the "better" solution.
The complex type support would have been a nice toy but is not
really necessary.

I find it more worrying that things like typeof have been kept
from us and that certain things are (still) so weak that they are
next to useless (e.g. volatile or bit fields).


Apart from that, and even though I am aware that this is a
_very_ controversial issue, I would have liked to see, in addition
to the standard library, a sort of extended library covering
stuff which should not be demanded for portable applications but
is more or less the object of reinventing the wheel for most
C programmers. These should really be kept apart from C as
such but could ease the way for "compatibility" between compilers
on mainstream systems. It could also be an additional standard.
*sigh* Now bring the flames...


Cheers
Michael
 

Old Wolf

pete said:
(-1 < 0x1) is true.

(-1 < 0x1u) is false.

Right. I thought that 0x.. constants were unsigned, I must
have been confusing that with some other situation
(although I can't think what).
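The half-remembered "other situation" may be this: an unsuffixed hexadecimal constant takes the first of int, unsigned int, long, unsigned long, ... that can represent its value. So small ones like 0x1 are plain (signed) int, but a large one really can be unsigned without any suffix. A quick check using C11's _Generic (the function name is mine):

```c
#include <assert.h>
#include <limits.h>

void hex_constant_types(void)
{
    /* 0x1 fits in int, so it is a signed int constant. */
    assert(_Generic(0x1, int: 1, default: 0) == 1);
#if INT_MAX == 0x7FFFFFFF
    /* With 32-bit int, 0xFFFFFFFF does not fit in int but does fit
     * in unsigned int, so it silently becomes unsigned... */
    assert(_Generic(0xFFFFFFFF, unsigned int: 1, default: 0) == 1);
    /* ...making this an unsigned comparison, hence false: */
    assert((-1 < 0xFFFFFFFF) == 0);
#endif
}
```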
 
