What is wrong with this c-program?

  • Thread starter Albert van der Horst
  • Start date

Keith Thompson

David Brown said:
I have used processors with 16-bit "char", and no way to make an 8-bit
type (except as a bitfield). Nowhere, in any of the documentation,
manuals, datasheets, or anywhere else was there any reference to a
"byte" that is not 8 bits. It made clear that a /char/ was 16 bits
rather than the usual 8 bits, but they were never called "bytes".

I haven't used such devices much - but the vast majority of people who
use the term "byte" have never even heard of such devices, never mind
used them.

There are only two situations when "byte" does not automatically and
unequivocally mean "8 bits" - one is in reference to ancient computer
history (and documents from that era, such as network RFC's), and the
other is extreme pedantry. There is a time and place for both of these
- but you won't convince many people that you would ever /really/ think
a reference to a "byte" meant anything other than 8 bits.

(If you can give modern, or at least still-current, references to usage
of "byte" for something other than 8 bits, then I will recant and blame
the egg nog!)

A "char", as you say, has a well defined meaning - but not a well
defined size.

As I'm sure you know, the ISO C standard uses the term "byte" to refer
to the size of type char, which is CHAR_BIT bits. CHAR_BIT is required
to be *at least* 8, but may be larger. I can't think of anything in the
standard that even implies that 8 is the preferred value.

And yes, I understand that there are real-world systems with CHAR_BIT >
8 (DSPs, mostly), though I haven't used a C compiler on any of them.

Even if CHAR_BIT were required to be exactly 8, I'd still prefer to
refer to CHAR_BIT rather than using the constant 8, since the macro name
makes it clearer just what I mean.

But if I have a need to write code that won't work unless CHAR_BIT==8,
I'll probably take a moment to ensure that it won't *compile* unless
CHAR_BIT==8. (Unless I'm working on existing code that has such
assumptions scattered through it; in that case, I probably won't bother.)
 

David Brown

On 12/29/2013 03:32 PM, David Brown wrote:
...

I'm curious about your labeling of this definition as "old-fashioned". A
very large fraction of newly written C code is targeted to embedded
systems. I'm not sure how large that fraction is, but it's large enough
that, on one memorable occasion, I had considerable trouble convincing
one guy that it was less than 100%. Many of those embedded systems are
DSPs with 16 bit bytes. So implementations of C with bytes that are not
equivalent to octets are a very current issue.

I am an embedded programmer myself - with 8-bit, 16-bit and 32-bit
microcontrollers. I have used DSP's a little, but DSP programming is a
niche area. I don't have any hard numbers, but I think the percentage
of C code written for DSP's is very low, and getting lower as ordinary
microcontrollers replace them. There are lots of DSP devices sold - but
fewer people programming them.

Anyway, these devices do not have 16-bit "bytes" as such. Many types
have 16-bit or 32-bit "chars" - but they are not (normally) referred to
as "bytes". The exception, of course, is the C standards which define
"byte" to be the smallest addressable unit. (Some microcontrollers and
DSP's allow direct addressing of bits, but that uses compiler-specific
extensions to C.) I don't have any DSP datasheets or toolchain manuals
handy, so I am relying on memory here, but with the devices I used,
groups of 16 bits were never referred to as "bytes".

The word "byte" has several definitions, such as the one in the C (and
C++) standards, the one in the IEEE (I've forgotten the number)
standard, and the ones used by various computer manufacturers over the
decades. But the de facto definition used in almost all current
contexts is 8 bits. That is why I say other uses are old-fashioned. The
C standards are written in a rather specific style, with their own
specific definitions of terms that are often consistent with historical
usage rather than current usage. (Compare that to the Java standard,
which I believe defines a "byte" as 8 bits.)
So, you consider the definition of "byte" that is provided by the C
standard to be so thoroughly esoteric (is that the right word to cover
your objection?) that it would never occur to you that I might consider
that definition to be authoritative in the context of C code? Unless I
emphatically told you otherwise (as I am now doing), you:


That seems like a rather extreme position to take. It's as if you were
actively resisting learning the fact (and it IS a fact) that there are
contexts where some people use the term with a different definition from
the one you're used to.

Fair enough - the context of C standards is a clear exception where
"byte" can mean more than 8 bits, and obviously that is a common case
here (although it is "esoteric" outside the world of c.l.c. and similar
forums).

But even here, how often does the issue of "char" possibly having more
than 8 bits come up when it makes a real-world, practical difference?
It is almost invariably when someone writes code that assumes chars are
8 bits, or assumes the existence of uint8_t, and then one of the
standards experts points out that this might not always be true. (I
don't mean this comment in a bad way - it is a useful thing to be
informed of these details.) And almost invariably, the code the OP is
writing will never be used on a system without 8-bit chars, and the OP
knows this.

And even in the context of machines with more than 8-bit chars, how
often are these referred to as "bytes" without a great deal of
qualification or context to avoid confusion?


Maybe I expressed myself a bit too strongly, but I would certainly be
surprised to read any reference to "byte" here that did not refer to
8-bit bytes, unless there was context to qualify it.
 

David Brown

As I'm sure you know, the ISO C standard uses the term "byte" to refer
to the size of type char, which is CHAR_BIT bits. CHAR_BIT is required
to be *at least* 8, but may be larger. I can't think of anything in the
standard that even implies that 8 is the preferred value.

From the N1570 draft of C11 standard (since I have it to hand):

byte
addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment

I don't know /exactly/ what that means in a non-hosted environment
without any character set (as would be the case for most DSP's), but I
take it to mean the smallest directly addressable unit of storage of
size at least 8-bit (by the later definition of CHAR_BIT).

And I'll agree that these standards are current documents, though their
definition of "byte" is for consistency with historical versions of the
standards rather than for consistency with modern usage.
And yes, I understand that there are real-world systems with CHAR_BIT >
8 (DSPs, mostly), though I haven't used a C compiler on any of them.

Yes, that's true - and I have used a C compiler for a couple of them.
But they don't refer to a 16-bit "char" as a "byte" (except as implied
by the C standards), precisely to avoid confusion. There are more than
enough sources of confusion when you have to work with such systems...
Even if CHAR_BIT were required to be exactly 8, I'd still prefer to
refer to CHAR_BIT rather than using the constant 8, since the macro name
makes it clearer just what I mean.

Fair enough - clarity is important.
But if I have a need to write code that won't work unless CHAR_BIT==8,
I'll probably take a moment to ensure that it won't *compile* unless
CHAR_BIT==8. (Unless I'm working on existing code that has such
assumptions scattered through it; in that case, I probably won't bother.)

Absolutely. Usually I do that by using "uint8_t" (and occasionally
"int8_t") types - code that is dependent on CHAR_BIT == 8 will typically
have use of such types. And on the few occasions when I have been
unable to avoid working on DSP's with 16-bit (or even 32-bit) chars, I
have gone through all the code carefully to make sure it will work.
 

James Kuyper

....
The word "byte" has several definitions, such as the one in the C (and
C++) standards, the one in the IEEE (I've forgotten the number)
standard, and the ones used by various computer manufacturers over the
decades. But the de facto definition used in almost all current
contexts is 8 bits. That is why I say other uses are old-fashioned.

I don't follow that - it would only make sense if the de facto usage had
already long since replaced the standard-defined meaning. The fact is
that both meanings have been in existence and in use at the same time
for a very long time. The de facto usage is unambiguously the more
common one, but that's not because it has replaced the standard one. The
standard definition has never been widely used, but the same type of
people who used to use it in the appropriate contexts have continued to
use it in those contexts.
 

James Kuyper

On 12/30/2013 06:35 AM, David Brown wrote:
....
From the N1570 draft of C11 standard (since I have it to hand):

byte
addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment

I don't know /exactly/ what that means in a non-hosted environment
without any character set

The environment might not have any character set, but the C standard
requires that a conforming implementation for that environment support
the basic character set as defined in section 5.2.1. The first use of
"basic character set" in that section is in italics, an ISO convention
indicating that it constitutes the definition of that term:

A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

3 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing horizontal tab, vertical tab, and
form feed.

A freestanding implementation is not required to support <stdio.h>,
which makes that character set relatively unimportant, but it still must
be defined - each of those characters must be assigned a unique
encoding, character literals must have those values, and string
literals require the existence of arrays of char whose elements have
those values. There are, if I counted correctly, 92 members of the basic
character set, so it requires more than 7 bits to give each one a unique
encoding, so the implication is that the addressable storage unit must
be at least 8 bits. This is also implied by the requirements that SCHAR_MIN
<= -127, SCHAR_MAX >= 127, and UCHAR_MAX >= 255 (5.2.4.2.1p1).
 

Keith Thompson

James Kuyper said:
A freestanding implementation is not required to support <stdio.h>,
which makes that character set relatively unimportant, but it still must
be defined - each of those characters must be assigned a unique
encoding, character literals must have those values, and string
literals require the existence of arrays of char whose elements have
those values. There are, if I counted correctly, 92 members of the basic
character set, so it requires more than 7 bits to give each one a unique
encoding, so the implication is that the addressable storage unit must
be at least 8 bits. This is also implied by the requirements that SCHAR_MIN
<= -127, SCHAR_MAX >= 127, and UCHAR_MAX >= 255 (5.2.4.2.1p1).

92 distinct values can be represented in just 7 bits, so the definition
of the basic character set only implies CHAR_BIT >= 7. The actual
requirement that CHAR_BIT >= 8 is stated explicitly elsewhere in the
standard.
 

Keith Thompson

gyl said:
when/why a printf will cause a segment fault?

If you call printf, you need to call it correctly. Some errors, such as
passing something of the wrong type as the first argument, are
compile-time errors; others, such as passing a later argument with the
wrong format, needn't be caught by the compiler. The latter class of
errors results in *undefined behavior*. A segmentation fault is one
possible result.

If you want to print a long int value, you need to use the "%ld" format
(or something like it); "%d" requires an int argument.

Without looking at the code, I don't know whether that's what's causing
the symptom you're seeing, but you should certainly fix that problem.
You should also invoke gcc with options to enable more warnings, such as
"gcc -std=c99 -pedantic -Wall -Wextra -O3". (You might vary the
"-std=c99 -pedantic" options depending on what dialect of C you're
trying to use.)
 

James Kuyper

92 distinct values can be represented in just 7 bits, so the definition
of the basic character set only implies CHAR_BIT >= 7. ...

You're right, of course - I wasn't thinking hard enough about what I was
writing.
... The actual
requirement that CHAR_BIT >= 8 is stated explicitly elsewhere in the
standard.

Specifically, 5.2.4.2.1p1, where CHAR_BIT is described as "number of
bits for smallest object that is not a bit-field (byte)".

It seems a little odd that "byte" is defined as being "large enough to
hold any member of the basic character set", when that requirement is
NOT the one that determines its minimum size. I had incorrectly
remembered the size as being determined by that requirement, which would
have rendered the requirement that CHAR_BIT >= 8 redundant.
 

James Kuyper

On 12/30/2013 01:27 PM, Martin Ambuhl wrote:
....
In the case of his code, others have pointed out logical errors more
likely to cause a segfault. But, yes, printf can cause a segfault when
you misspecify the lengths of arguments.

Yes, but it's substantially less likely to malfunction in that
particular fashion when only one format specifier is present, and it
specifies a type that is probably no larger than the type of the only
argument. Malfunctions of other types are still quite likely, of course.
 

James Kuyper

On 12/30/2013 06:35 AM, David Brown wrote:
...
From the N1570 draft of C11 standard (since I have it to hand):

byte
addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment

I don't know /exactly/ what that means in a non-hosted environment
without any character set

The environment might not have any character set, but the C standard
requires that a conforming implementation for that environment support
the basic character set as defined in section 5.2.1. The first use of
"basic character set" in that section is in italics, an ISO convention
indicating that it constitutes the definition of that term:

A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

3 Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing horizontal tab, vertical tab, and
form feed.

A freestanding implementation is not required to support <stdio.h>,
which makes that character set relatively unimportant, but it still must
be defined - each of those characters must be assigned a unique
encoding, character literals must have those values, and string
literals require the existence of arrays of char whose elements have
those values. There are, if I counted correctly, 92 members of the basic
character set, so it requires more than 7 bits to give each one a unique
encoding, so the implication is that the addressable storage unit must
be at least 8 bits. ...

That should have said 6 and 7 bits respectively. I should also have
mentioned that CHAR_BIT is explicitly required to be at least 8, which
is therefore the tighter constraint.
The basic character set for the execution environment includes several
additional control characters ("In the basic execution character set,
there shall be control characters representing alert, backspace,
carriage return, and new line"). I think that pushes the total to 99
(52+10+29+space+7ctrl).

That's why I said "if I counted correctly". I knew that 92 didn't sound
right. The basic character set includes

26 upper case letters
26 lower case letters
10 decimal digits
29 graphic characters
1 space character
3 control characters
==
95 characters

My count included the null character, which is only in the basic
execution character set (5.2.1p2), and didn't include the last two
categories.

The basic execution character set, the one referenced by the definition
of "byte", includes the null character and the four additional control
characters mentioned in the sentence you cited, bringing the total to an
even 100 characters.
 

Kenny McCormack

I once had to write

int fifteen = 15;

and replace a literal 15 in the code with fifteen to make the code work
(after a compiler update).

Kiki will be by any minute now to point out that that compiler was not
standards-compliant.

And is thus OT in this newsgroup. Shame on you!
 

glen herrmannsfeldt

I once had to write
int fifteen = 15;
and replace a literal 15 in the code with fifteen to make the
code work (after a compiler update).

There are places in Fortran IV and Fortran 66 where variables are
allowed, but not constants. (The I/O list of WRITE statements,
for one.) That sometimes results in such variables.

One I remember from a C compiler was failing to compile ++
applied to a double variable. Presumably rare enough that it
wasn't caught in testing, but I had to change it so my program
would compile.

-- glen
 

Jorgen Grahn

.
Anyway, these devices do not have 16-bit "bytes" as such. Many types
have 16-bit or 32-bit "chars" - but they are not (normally) referred to
as "bytes". The exception, of course, is the C standards which define
"byte" to be the smallest addressable unit. (Some microcontrollers and
DSP's allow direct addressing of bits, but that uses compiler-specific
extensions to C.) I don't have any DSP datasheets or toolchain manuals
handy, so I am relying on memory here, but with the devices I used,
groups of 16 bits were never referred to as "bytes".

That matches my recollection (I've been programming Texas DSPs at two
quite different workplaces, around 1997 and in 2003 or so). IIRC you
tended to talk about "words" instead -- another rather fuzzy term
which in my mind translates to "the width which is a rough best match
for registers and memory accesses".

/Jorgen
 

guinness.tony

I have the following program that gives consistently a segfault

for ( i=2; i<=n; i++)

Your problem lies here. The limit of the loop should be the truncated
(integer) square root of n, not n itself.

See https://en.wikipedia.org/wiki/Sieve_of_eratosthenes#Implementation
int main( int argv, char** argc)

Although by no means wrong, this is an unusual choice of parameter
names for main(). By convention, the argument-count is usually
named 'argc' and the argument-values pointer 'argv'.

HTH,
Tony.
 

Jorgen Grahn

You're too kind. It's totally "wrong" by any measure other than allowing
people to write crap unmaintainable code. That line would screw up loads
of people and cost countless man hours down the road. So it IS wrong imo.

It's not wrong, but it's so close to wrong that it might as well have been.
(And I think I commented on it last year, elsewhere in the thread.
I expect it to be fixed already.)

/Jorgen
 

Ken Brody

i.e. it's wrong. Let's not be silly here. Only an idiot would purposely swap
around the names of two pretty much industry standard main parameters.

It's a little wrong to say a tomato is a vegetable. It's very wrong to say
it's a suspension bridge.
 

Kaz Kylheku

The program itself doesn't give any complaints with gcc 4.4
on a stable Debian.
^^^^^^

apt-get install valgrind
It is a classical prime sieve.
I can't find any fault with it.

You're joking, right?

How about making effective use of readily available tools?

Compiler's opinion: plenty wrong here!

$ gcc -Wall -ansi -pedantic -W -g sieve.c -o sieve
sieve.c: In function ‘main’:
sieve.c:41:5: warning: implicit declaration of function ‘atoi’ [-Wimplicit-function-declaration]
sieve.c:42:5: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long int’ [-Wformat]
sieve.c:40:9: warning: unused variable ‘i’ [-Wunused-variable]
sieve.c:38:15: warning: unused parameter ‘argv’ [-Wunused-parameter]
sieve.c:45:1: warning: control reaches end of non-void function [-Wreturn-type]

Validation with valgrind:

$ valgrind ./sieve 1231234
[.. snip ...]
==31965== Invalid write of size 1
==31965== at 0x8048519: fill_primes (sieve.c:33)
==31965== by 0x8048579: main (sieve.c:43)
==31965== Address 0x881002e9 is not stack'd, malloc'd or (recently) free'd

The offending line 33 is

for (j=i*i; j<=n; j+=i) composite[j] = 1;

suggesting you're blowing past the end of the static array.
 

Skybuck Flying

When I saw this subject line the first thing I thought was:

"Probably a newb question/program".

The second thing was more funny:

"Everything because it was written in C :)"

Bye,
Skybuck.
 
