Difference between binary file and ASCII file

Chris Hills

Alan said:
I think you mean character not byte

"Byte: a group of eight binary digits, often used to represent one
character". The Concise Oxford Dictionary.

"Byte: a group of eight binary digits processed as a unit by a computer and
used to represent an alphanumeric character". Merriam-Webster Dictionary.

We have to use some standard definition of words otherwise we will fall into
a morass of misunderstanding.

Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.
 
Chris Hills

Alan said:
This newsgroup is for C language related questions. The OP is asking a more
generalized question.

We keep having this argument. Some users want to limit the scope of this
NG and others want to widen it. As there is no charter the majority will
eventually prevail.

A byte is always 8 bits by definition!!!

I don't think so. My father, who has been in computing since 1952, would
not agree with you. As you say...
On older CDC computers, for
example,

Actually many other "older" computers, that is, those from before the
mid-to-late 1980s when the PC and other 8-bit "home" computers came to
prominence. All sorts of byte widths were used previously.

Bytes were *usually*, but not always, the same size as a single character
of the character set being used.

there was a "character" of 6 bits but it was never referred to as
a "byte".

Characters were all sorts of sizes (I have a comms program that would
handle 4 to 9 bits).

A character is not some arbitrary size. A character is either one Byte (ie
8 bits) or in the case of Unicode it is two Bytes (ie 16 bits). BTW - the
word is "guaranteed".

I thought a char could hold a character. Some systems use fewer than 8
bits and some more, but in the past these were not always multiples of 8
bits.

Because the size of byte is variable we used OCTET for 8 bits in the
comms industry where PC's are not common.
 
Flash Gordon

Richard said:
Flash Gordon said:


No, greater than 7 bits. It is legal for CHAR_BIT to be 8, as you really
ought to know. Please make sure your "corrections" are correct before
posting.

Obviously too early on a Sunday morning. I'm so used to seeing >=8 I
misread it.
 
pete

Chris Hills wrote:
Because the size of byte is variable we used OCTET for 8 bits in the
comms industry where PC's are not common.

In C, which is the topic of this newsgroup,
the "size" of a type is measured in bytes.

The "width" of a byte varies according to the implementation.

CHAR_BIT from limits.h, is the width of a byte.
 
Tomás

Chris Hills posted:

Unfortunately both are wrong... A byte AFAIK is the smallest amount you
can address on a MCU/CPU. Now 95% of the time it is 8 bits but in the
past it has been many other things.

Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).

To facilitate this on this particular system, a char* holds more info than
an int*, because it also specifies whether it wants "the first 8 bits or
the last 8 bits".

-Tomás
 
osmium

:

"byte: addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment.
Note 1: It is possible to express the address of each individual byte of
an object uniquely.
Note 2: A byte is composed of a contiguous sequence of bits, the number of
which is implementation-defined." - ISO/IEC 9899:1999

Authoritative technical definitions trump dictionary definitions.


That's why we have an International C Standard, which defines "byte" very
precisely.

I'll be damned! In Note 2, they defined byte very precisely as a word that
simply means a collection of contiguous bits. They took a widely used
word, that meant something to hundreds of thousands of people and redefined
it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and the
*vast* majority say a byte is eight bits.

http://tinyurl.com/j79j4

That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits. How about naming it in honor of the clown who
got that inserted into the standard?

Historically, the smallest addressable unit of storage was a character.
They seem to have gotten tangled up and ignored the distinction between a
character and a character code, and ignored the fact that they were different
things. I think this made-up example from history is right: The IBM 7094
has a six-bit character. The character code is BCD.

Note that addressable does not imply that a single character can be read
from memory, it only means there are hardware instructions to do something
useful at this level.
 
Richard Heathfield

Tomás said:
Chris Hills posted:



Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).

Well, the underlying system can be like that, but if so, a conforming
implementation must either set CHAR_BIT as 16 or do some magic to make it
appear as if octets are addressable individually.

Chris is perfectly correct, from a C perspective.
 
P.J. Plauger

I'll be damned! In Note 2, they defined byte very precisely as a word
that simply means a collection of contiguous bits. They took a widely
used word, that meant something to hundreds of thousands of people and
redefined it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and
the *vast* majority say a byte is eight bits.

We forgot to do a web search before we chose that terminology in 1983.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
osmium

P.J. Plauger said:
We forgot to do a web search before we chose that terminology in 1983.

I appreciate your sarcasm and have no desire to argue with anyone - and most
certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?
 
Richard Heathfield

osmium said:
I appreciate your sarcasm and have no desire to argue with anyone - and
most certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?

Knuth says that the 8-bit "standardisation" happened in around 1975 or so.
By then, C was already well under way, and dmr was almost certainly
accustomed to using the word in its non-"standard" sense.
 
Martin Ambuhl

osmium said:
But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?

No. The word "byte" was common long before 1964, and with many machines
when no one had ever hallucinated a 360. In fact, it is from 1956,
coined by Werner Buchholz.
At least you got the company right. Otherwise, you have wrong
the date: 1956, not 1964
the computer: IBM Stretch, not IBM 360
You probably got the size wrong, too. Its original incarnation was 1 to
6 bits, not 8. The DEC family, especially the PDP-6 and -10, nicely
extended this to 1 to 36 bits.
 
Keith Thompson

Tomás said:
Chris Hills posted:

Incorrect.

On a particular system, it's okay if:

A) The smallest amount of memory addressable is 16 bits.
B) A char is 8 bits. (And thus a byte is 8 bits).

To facilitate this on this particular system, a char* holds more info than
an int*, because it also specifies whether it wants "the first 8 bits or
the last 8 bits".

Then an 8-bit byte is addressable (with a little extra effort by the
compiler).
 
Keith Thompson

Chris Hills said:
We keep having this argument. Some users want to limit the scope of this
NG and others want to widen it. As there is no charter the majority will
eventually prevail.

There is currently exactly one newsgroup where we can discuss standard
C without getting bogged down in system-specific or otherwise
irrelevant details. (comp.lang.c.moderated is too slow, and
comp.std.c has a different purpose.) Some users want us to have *no*
such newsgroup at all. It's possible that they'll eventually prevail,
but I sincerely hope they don't.

Chris, I've asked you a couple of questions on this topic, and you've
never answered them.

You mentioned other (non-Usenet) forums where people discuss the C
programming language. I'd like to take a look. Where are they?

If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?
 
Keith Thompson

osmium said:
I'll be damned! In Note 2, they defined byte very precisely as a word that
simply means a collection of contiguous bits. They took a widely used
word, that meant something to hundreds of thousands of people and redefined
it to mean something entirely different.

There are about 30 definitions of byte that make the cut on google, and the
*vast* majority say a byte is eight bits.

http://tinyurl.com/j79j4

That's simply appalling! Now the world needs a committee to define a word
for eight contiguous bits. How about naming it in honor of the clown who
got that inserted into the standard?

Historically, the smallest addressable unit of storage was a character.
They seem to have gotten tangled up and ignored the distinction between a
character and a character code, and ignored the fact that they were different
things. I think this made-up example from history is right: The IBM 7094
has a six-bit character. The character code is BCD.

No, the term "byte" did not originally mean exactly 8 bits.

<http://www.catb.org/~esr/jargon/html/B/byte.html> says:

byte: /bi:t/, n.

[techspeak] A unit of memory or data equal to the amount used to
represent one character; on modern architectures this is
invariably 8 bits. Some older architectures used byte for
quantities of 6, 7, or (especially) 9 bits, and the PDP-10
supported bytes that were actually bitfields of 1 to 36 bits!
These usages are now obsolete, killed off by universal adoption of
power-of-2 word sizes.

Historical note: The term was coined by Werner Buchholz in 1956
during the early design phase for the IBM Stretch computer;
originally it was described as 1 to 6 bits (typical I/O equipment
of the period used 6-bit chunks of information). The move to an
8-bit byte happened in late 1956, and this size was later adopted
and promulgated as a standard by the System/360. The word was
coined by mutating the word "bite" so it would not be accidentally
misspelled as bit. See also nybble.

(I would dispute the use of the word "invariably".)
 
Chris Hills

Keith Thompson said:
There is currently exactly one newsgroup where we can discuss standard
C without getting bogged down in system-specific or otherwise
irrelevant details. (comp.lang.c.moderated is too slow, and
comp.std.c has a different purpose.) Some users want us to have *no*
such newsgroup at all. It's possible that they'll eventually prevail,
but I sincerely hope they don't.

Chris, I've asked you a couple of questions on this topic, and you've
never answered them.

I have them marked for reply and am feeling guilty that I have not done so.
It needs thinking about for specific answers, and I have been busy this week
and did not want to just dash off a quick reply.

You mentioned other (non-Usenet) forums where people discuss the C
programming language. I'd like to take a look. Where are they?

I found some on both Yahoo and Google. The problem is it is very easy on
Yahoo to start a group compared to Usenet. So some people never get past
the web interface of Yahoo and to the Usenet groups.

It came as quite a surprise to me that there were so many C, C++, 8051
and 8-bit MCU Yahoo groups. People who have no idea that there is a
comp.arch.embedded. Eventually the people using some of the Usenet
technical groups are going to fade away and not be replaced.

If you think the current topicality guidelines should be changed,
*how* do you think they should be changed? Can you give specific
examples of things you think should be considered topical that
currently aren't? If there were a charter for this newsgroup, what
should it say?

That is a good idea.... can we draw up a charter? It would mean people
have to think about it. I did not reply to your "specific" question as
it needed some thought and I did not want to do an off-the-cuff reply.

The problem is it is difficult to be specific on vaguely loosening a
tight spec. :)
 
osmium

Chris Hills said:
Keith Thompson wrote:

That is a good idea.... can we draw up a charter? It would mean people
have to think about it. I did not reply to your "specific" question as
it needed some thought and I did not want to do an off-the-cuff reply.

<I *think* that pruning is right>

My guess is that the question is more along the lines of "If I gave you a
million dollars, what would you do with it?".
 
CBFalconer

Chris said:
.... snip ...


That is a good idea.... can we draw up a charter? It would mean
people have to think about it. I did not reply to your "specific"
question as it needed some thought and I did not want to do an
off-the-cuff reply.

The problem is it is difficult to be specific on vaguely loosening
a tight spec. :)

If you are going to make a proposal then I suggest you start from
"the tight spec". Remember that even now things involving POSIX
are adequately housed on comp.unix.programming (or such) and
windows (cursed be its name) on microsoft.*.

One loosening that might be worthwhile is advice on how to make
various almost-C compilers hew to the various standards, and their
failings.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
Chris Hills

osmium said:
<I *think* that pruning is right>

My guess is that the question is more along the lines of "If I gave you a
million dollars, what would you do with it?".

500K on wine, women and parties. Then just fritter the rest away :)
 
P.J. Plauger

I appreciate your sarcasm and have no desire to argue with anyone - and
most certainly not with you.

But wasn't the word byte pretty much introduced into the world by the IBM
360 in 1964 or thereabouts?

As others have pointed out in detail, byte is an old word. IBM's System/360,
introduced in the early 1960s, began the modern trend to eight-bit bytes
and byte-resolution arithmetic. Nevertheless, in the 1980s it was still not
uncommon to refer to "an x-bit byte machine", where x assumed quite a few
different values. Thus, the C Standard hardly plowed any new ground in
this area, and certainly didn't defy common parlance of the time.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Mike S

Richard said:
No, that almost certainly means simply that you've got a Windows text file
on a Linux system. Linux doesn't know, bless it, that you've been sullying
its filesystem with foreign muck. :)

Sorry, I should have explained myself more clearly. At the moment I am
running on Windows with a Windows port of gcc. But, before I get
off-topic with environment specs, my real question is simply: does the
Standard require that stdin, stdout, and stderr be opened in a known
mode or is this detail left to the compiler? I ask because if I compile
the following (say as test.exe):

#include <stdio.h>

/* count occurrences of '\r' in the input stream */
int main(void)
{
    int c, count;

    count = 0;
    while ((c = getchar()) != EOF)
        if (c == '\r')
            ++count;
    printf("counted %d \\r's in input.\n", count);
    return 0;
}

and I run the program with itself as input ("test < test.c"), the
result (on my machine) is

counted 12 \r's in input.

However, as I understand it, any '\r\n' sequences in the input stream
should have been mapped to '\n'. The only literature I can find
relating to this is what Microsoft has to say about how their C
compilers open stdin (they say it is opened in text mode in their
compilers). But what does the ANSI/ISO Standard say about what mode
stdin, stdout, and stderr are opened in? Maybe my compiler is just
misbehaving...

Mike S
 
