C Text/Binary Files

Bartc

The stdin/stdout files of C seem to be always in Text mode.

Is there any way of running a C program so that these (but especially
stdout) are in Binary mode instead?

(I'm in the process of wrapping a different language around C, one that
doesn't want the concept of text and binary files. But if I output a string
such as "ONE\nTWO\n", it behaves differently between stdout and a regular
(binary) file. Examples on my OS:

"\n" Output 13,10 in text mode; 10 in binary mode
"\w" Output 13,13,10 in text mode; 13,10 in binary mode

(\w is a new escape code equivalent to \r\n.) Workarounds would be awkward
(and I could never stop \n expanding to 13,10 for stdout), so it would be
nice to avoid them.)
 
santosh

Bartc said:
The stdin/stdout files of C seem to be always in Text mode.

Is there any way of running a C program so that these (but especially
stdout) are in Binary mode instead?

Yes, use freopen like this:

FILE *fin, *fout, *ferr;

fin = freopen(NULL, "rb", stdin);
fout = freopen(NULL, "ab", stdout);
ferr = freopen(NULL, "ab", stderr);

You could assign the return value to stdin, stdout and stderr themselves,
but the standard says that they are not necessarily modifiable
lvalues. However, it will probably work on most systems you would care
about.

See section 7.19.5.4 of the standard for details.
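
As a minimal sketch of the suggestion above, the freopen call can be wrapped so that its result is always checked. The helper name reopen_binary is hypothetical (it is not from the thread or the standard), and whether the mode change succeeds at all is up to the implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ask the library to switch `stream` to the given
 * binary mode ("rb", "ab", ...) without naming a new file.
 * Returns 1 on success and 0 on failure. After a failure the stream
 * should be treated as unusable. */
int reopen_binary(FILE *stream, const char *mode)
{
    assert(stream != NULL && mode != NULL);
    /* A null filename (C99) asks freopen to change only the mode.
     * Which mode changes are permitted is implementation-defined. */
    return freopen(NULL, mode, stream) != NULL;
}
```

A caller would typically run `reopen_binary(stdout, "ab")` before producing any output and report a failure on stderr instead of writing to stdout afterwards.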

<snip>
 
Ben Bacarisse

santosh said:
Yes, use freopen like this:

FILE *fin, *fout, *ferr;

fin = freopen(NULL, "rb", stdin);
fout = freopen(NULL, "ab", stdout);
ferr = freopen(NULL, "ab", stderr);

You could assign the return value to stdin, stdout and stderr themselves,
but the standard says that they are not necessarily modifiable
lvalues. However, it will probably work on most systems you would care
about.

More importantly, freopen is not guaranteed to do what Bartc wants.
Thus the key information is not what the standard says but what
typical implementations do on systems where there is a difference
between text and binary mode. I can give only one data point:
lcc-win32 returns NULL from the freopen call (for stdout).
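
Since the standard route can fail, one implementation-specific alternative on Windows is the C runtime's _setmode function, which changes the translation mode of an already open file descriptor. The sketch below is hedged accordingly: _setmode, _fileno and _O_BINARY are Microsoft CRT extensions rather than standard C, the wrapper name set_binary_mode is made up for illustration, and whether lcc-win32 provides the same calls is an assumption that would need checking.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Hypothetical wrapper: switch an open stream to binary mode.
 * Call it before any I/O has been done on the stream.
 * Returns 1 on success, 0 on failure. */
int set_binary_mode(FILE *stream)
{
    assert(stream != NULL);
#ifdef _WIN32
    /* Microsoft CRT extension: _setmode returns -1 on failure. */
    return _setmode(_fileno(stream), _O_BINARY) != -1;
#else
    /* Systems without a text/binary distinction need no change. */
    (void)stream;
    return 1;
#endif
}
```

With this in place, `set_binary_mode(stdout)` at the start of main would leave "\n" as a single byte 10 even when stdout is redirected to a file.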
 
santosh

Ben said:
More importantly, freopen is not guaranteed to do what Bartc wants.
Thus the key information is not what the standard says but what
typical implementations do on systems where there is a difference
between text and binary mode. I can give only one data point:
lcc-win32 returns NULL from the freopen call (for stdout).

And it fails for stdin too. It's perhaps surprising that it
should fail. What difficulty would an implementation like lcc-win32 have
with this?
 
Bartc

Bartc said:
The stdin/stdout files of C seem to be always in Text mode.

Thanks for the replies.

I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings and
internal functions that generate newlines, then this will work for binary
files.

For stdout, this will generate (on my OS) 13,13,10, but for console output
that is not critical. The only problem will be when stdout is piped or
redirected to a file at the OS command line, then I will need to process the
output to take out the extra 13.

I can live with that.

I have tried freopen() as suggested, and that sort of works, but output is
then sent to a file. So this is an alternative perhaps to redirection by the
OS and the mode /will/ be binary.
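
The post-processing step described above (taking out the extra 13 when redirected output contains 13,13,10) could be sketched as a small in-place filter. This is only an illustration: the name strip_extra_cr is made up here, and it assumes the captured output is available in a memory buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: collapses every CR CR LF
 * (13,13,10) in buf[0..len) to CR LF (13,10), in place.
 * Returns the new length. */
size_t strip_extra_cr(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        /* Drop a CR only when it is immediately followed by CR LF. */
        if (buf[in] == 13 && len - in > 2 &&
            buf[in + 1] == 13 && buf[in + 2] == 10)
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

A lone CR or an ordinary CR LF pair passes through unchanged, so running the filter over output that was already correct does no harm.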
 
rahul

And it fails for stdin too. It's perhaps surprising that it
should fail. What difficulty would an implementation like lcc-win32 have
with this?


The following works for me:
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
    /* Not portable: stdout need not be a modifiable lvalue. */
    stdout = freopen(NULL, "ab", stdout);
    return 0;
}

I compiled that with gcc on Linux. It probably works because Linux/Unix
does not distinguish between text and binary mode.
 
Richard Bos

santosh said:
Yes, use freopen like this:

FILE *fin, *fout, *ferr;

fin = freopen(NULL, "rb", stdin);
fout = freopen(NULL, "ab", stdout);
ferr = freopen(NULL, "ab", stderr);

Note that freopen() with a null first argument is new in C99. In C89,
you had to give a new file name.

Richard
 
Keith Thompson

Bartc said:
Thanks for the replies.

I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings and
internal functions that generate newlines, then this will work for binary
files.
[...]

What is "\w"? It's not a standard escape sequence; its value is
implementation-defined.
 
Harald van Dijk

What is "\w"? It's not a standard escape sequence; its value is
implementation-defined.

"\w" does not match the syntax of a string literal, so by the rule of the
longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
undefined if a double quote character occurs as a single token. There need
not be any value given to "\w", and if there is, it need not be documented.
 
Bartc

Keith Thompson said:
Bartc said:
Thanks for the replies.

I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings
and
internal functions that generate newlines, then this will work for binary
files.
[...]

What is "\w"? It's not a standard escape sequence; its value is
implementation-defined.

Sorry. In my original post I'd indicated (not very clearly) that \w was a
new escape in a language I was creating to wrap around C.

So it's not a C escape but is translated to "\r\n". It represents 'windows
newline'; (or more generally, the full newline sequence used in the target
OS).
 
Keith Thompson

Harald van Dijk said:
"\w" does not match the syntax of a string literal, so by the rule
of the longest match this is tokenised as {"}{\}{w}{"}. The
behaviour is undefined if a double quote character occurs as a
single token. There need not be any value given to "\w", and if
there is, it need not be documented.

I believe you're mostly or entirely right, and I was wrong.

I misinterpreted the second clause of C99 6.4.4.4p10:

The value of an integer character constant containing more than
one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character,
is implementation-defined.

as applying to things like '\w'; instead, it applies to things like
'\xffffffff'.

"\w" is split into 4 preprocessor tokens:
" \ w "
The " is not a punctuator; it's in the category "each non-white-space
character that cannot be one of the above" (C99 6.4), which means the
behavior is undefined.

In addition, though, this preprocessor token cannot be converted to a
token. The constraint in 6.4p2 is:

Each preprocessing token that is converted to a token shall have
the lexical form of a keyword, an identifier, a constant, a string
literal, or a punctuator.

So, assuming that "\w" isn't surrounded by something like "#if 0"
... "#endif", it would seem to be a constraint violation. By C99
5.1.1.3, this requires a diagnostic even if the behavior is also
undefined.

Note that, by the same reasoning, "abcd\w" should be split into 5
preprocessing tokens:

" abcd \ w "

which just seems confusing. But since such cases require a diagnostic
anyway, a compiler doesn't actually have to pp-tokenize it that way;
as long as it prints a warning or error message, its job is done.

Still, I think the description would have been simpler if a \ followed
by any character in a character or string literal were allowed
syntactically, with a constraint limiting the following character to
the ones that are specified. Then "\w" would be a single pp-token and
a single token (a string literal), with a diagnostic required because
of the constraint violation.
 
Antoninus Twink

[quoted attribution lines, garbled by the encoding mismatch]

You may want to check whether you really mean to include this header:
Content-Type: text/plain; charset=utf-16be
 
Harald van Dijk

Harald van Dijk said:
"\w" does not match the syntax of a string literal, so by the rule of
the longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
undefined if a double quote character occurs as a single token. There
need not be any value given to "\w", and if there is, it need not be
documented.
[...]
"\w" is split into 4 preprocessor tokens:
" \ w "
The " is not a punctuator; it's in the category "each non-white-space
character that cannot be one of the above" (C99 6.4), which means the
behavior is undefined.

Yes. This would normally cause nothing more than a constraint violation
(as you pointed out below) or syntax error, but in the special case of '
or ", the behaviour is explicitly undefined.

In addition, though, this preprocessor token cannot be converted to a
token. The constraint in 6.4p2 is:

Each preprocessing token that is converted to a token shall have the
lexical form of a keyword, an identifier, a constant, a string
literal, or a punctuator.

So, assuming that "\w" isn't surrounded by something like "#if 0" ...
"#endif", it would seem to be a constraint violation. By C99 5.1.1.3,
this requires a diagnostic even if the behavior is also undefined.

That's a fair point, though I'm not sure this is intended. As I understand
it, the point of making a stray " undefined was (in part) to allow for
implementations to support multi-line string literals as an extension. An
example similar to what I've posted on c.l.c before:

#define IGNORE(arg) /* nothing */
int main(void) {
    IGNORE(")
    void *p = 1;
    IGNORE(")
}

Strictly by the standard, the two identical lines are tokenised as
{IGNORE}{(}{"}{)}, which expands to nothing. So after preprocessing, a
non-zero integer constant is used to initialise a pointer, which violates
a constraint. Some implementations, however, are unable to diagnose this,
because they take the undefined behaviour of a stray " as permission to
tokenise the body of main as

{IGNORE}
{(}
{")\n void *p = 1;\n IGNORE("}
{)}

I believe that since the behaviour is undefined in translation phase 3,
any constraint violations in later phases should not require a diagnostic.
I cannot back this up with wording from the standard, only explain with
examples.

Note that, by the same reasoning, "abcd\w" should be split into 5
preprocessing tokens:

" abcd \ w "

Yes, and then by my interpretation, the behaviour is undefined, so an
implementation may choose to make this a single string literal, with or
without a diagnostic, without any requirement on generated code (if any).

which just seems confusing. But since such cases require a diagnostic
anyway, a compiler doesn't actually have to pp-tokenize it that way; as
long as it prints a warning or error message, its job is done.

Still, I think the description would have been simpler if a \ followed
by any character in a character or string literal were allowed
syntactically, with a constraint limiting the following character to the
ones that are specified. Then "\w" would be a single pp-token and a
single token (a string literal), with a diagnostic required because of
the constraint violation.

Agreed.
 
Keith Thompson

[Apologies for the binary garbage I posted earlier. I'm having
multiple system problems, and the system I'm now using apparently
didn't like the non-ASCII character in Harald's last name. My "From:"
address has also been incorrect in most of today's postings; the
"(e-mail address removed)" address hasn't existed for several years.]

Harald van Dijk said:
"\w" does not match the syntax of a string literal, so by the rule
of the longest match this is tokenised as {"}{\}{w}{"}. The
behaviour is undefined if a double quote character occurs as a
single token. There need not be any value given to "\w", and if
there is, it need not be documented.

I believe you're mostly or entirely right, and I was wrong.

I misinterpreted the second clause of C99 6.4.4.4p10:

The value of an integer character constant containing more than
one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character,
is implementation-defined.

as applying to things like '\w'; instead, it applies to things like
'\xffffffff'.

"\w" is split into 4 preprocessor tokens:
" \ w "
The " is not a punctuator; it's in the category "each non-white-space
character that cannot be one of the above" (C99 6.4), which means the
behavior is undefined.

In addition, though, this preprocessor token cannot be converted to a
token. The constraint in 6.4p2 is:

Each preprocessing token that is converted to a token shall have
the lexical form of a keyword, an identifier, a constant, a string
literal, or a punctuator.

So, assuming that "\w" isn't surrounded by something like "#if 0"
... "#endif", it would seem to be a constraint violation. By C99
5.1.1.3, this requires a diagnostic even if the behavior is also
undefined.

Note that, by the same reasoning, "abcd\w" should be split into 5
preprocessing tokens:

" abcd \ w "

which just seems confusing. But since such cases require a diagnostic
anyway, a compiler doesn't actually have to pp-tokenize it that way;
as long as it prints a warning or error message, its job is done.

Still, I think the description would have been simpler if a \ followed
by any character in a character or string literal were allowed
syntactically, with a constraint limiting the following character to
the ones that are specified. Then "\w" would be a single pp-token and
a single token (a string literal), with a diagnostic required because
of the constraint violation.
 
Joachim Schmitz

Bartc said:
Keith Thompson said:
Bartc said:
The stdin/stdout files of C seem to be always in Text mode.

Thanks for the replies.

I think if I use exclusively "\w" for newlines (ie. "\r\n") in
strings and
internal functions that generate newlines, then this will work for
binary files.
[...]

What is "\w"? It's not a standard escape sequence; its value is
implementation-defined.

Sorry. In my original post I'd indicated (not very clearly) that \w
was a new escape in a language I was creating to wrap around C.

So it's not a C escape but is translated to "\r\n". It represents
'windows newline'; (or more generally, the full newline sequence used
in the target OS).

So where then does your '\w' differ from C's '\n'? In Windows '\n' results
in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
platforms in whatever that platform uses to separate lines.

Bye, Jojo
 
Bartc

So where then does your '\w' differ from C's '\n'? In Windows '\n' results
in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on
other platforms in whatever that platform uses to separate lines.

\w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
\n stays as \n (typically LF) at compile-time.

\n only expands to all those other combinations at runtime, and only for
text modes.
At runtime, \w would result in \r followed by the expansion of \n, for text
modes.

Actual code:
printf("Hello World\w")

After translating to C:
printf("Hello World\r\n");

At runtime (using printf, stdout directed to a file):
150C:0100 48 65 6C 6C 6F 20 57 6F-72 6C 64 0D 0D 0A 30 3A Hello World...0:
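
The compile-time step described above, where \w in the other language's source becomes \r\n in the generated C, might look roughly like the following. It is a sketch under stated assumptions: the function name expand_w_escape is invented here, the input is assumed to be the body of a string literal as source text, and the real translator may well work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical translator step: copies the source text of a string
 * literal body from src to dst, rewriting the two characters \w as the
 * four characters \r\n. Every other backslash pair (including \\) is
 * copied unchanged. Returns 1 on success, 0 if dst is too small. */
int expand_w_escape(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *piece = src;
        size_t n = 1;

        if (src[0] == '\\' && src[1] != '\0') {
            n = 2;                    /* copy the escape as a unit */
            if (src[1] == 'w') {
                piece = "\\r\\n";     /* the source text \r\n */
                n = 4;
            }
            src += 2;
        } else {
            src += 1;
        }
        if (out + n >= cap)           /* keep room for the terminator */
            return 0;
        memcpy(dst + out, piece, n);
        out += n;
    }
    if (out >= cap)
        return 0;
    dst[out] = '\0';
    return 1;
}
```

Treating each backslash pair as a unit matters: in the source text \\w the backslash is itself escaped, so the w that follows is an ordinary character and must not be expanded.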
 
Richard Tobin

Bartc said:
\w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
\n stays as \n (typically LF) at compile-time.

I can see this might be useful for writing to binary files in the
system's native text format.

It's limited to systems where the line break is represented by a
sequence of characters: it doesn't make sense on systems with lines
implemented in some other way (e.g. with a count). Of course, you may
not consider that important nowadays.

For a purely C solution you could just define a macro; e.g. for
Windows

#define LINEEND "\015\012"

and you can use it easily in constant strings

"hello" LINEEND "world" LINEEND

-- Richard
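
A slightly fuller sketch of the macro approach, writing the text through a stream opened in binary mode so that the library performs no newline translation. The names demo_text and write_demo are made up for illustration, and the caller supplies the output path.

```c
#include <assert.h>
#include <stdio.h>

#define LINEEND "\015\012"   /* CR LF, as in the macro above */

/* Adjacent string literals are concatenated by the compiler. */
static const char demo_text[] = "hello" LINEEND "world" LINEEND;

/* Hypothetical helper: writes demo_text to `path` in binary mode, so
 * the bytes reach the file exactly as they appear in the array.
 * Returns 1 on success, 0 on failure. */
int write_demo(const char *path)
{
    size_t len = sizeof demo_text - 1;   /* exclude the terminating NUL */
    size_t written;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return 0;
    written = fwrite(demo_text, 1, len, fp);
    if (fclose(fp) != 0)
        return 0;
    return written == len;
}
```

The file ends up with 14 bytes: five letters, 13, 10, five letters, 13, 10. Opening the same path with "w" instead of "wb" on a system with text-mode translation would turn each 10 into 13,10 and reproduce the doubled 13.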
 
santosh

BigRelax said:
Hello ``
I am a student from china.
I like c.

If you make a friend with me, I am very happy.
My MSN ID is (e-mail address removed)

This is not a group for "making friends" or idle chit-chat. If you have
questions or problem on standard C post them here.

Complain to the maintainer of the above forum that the signature
separator that they add is broken.
 
