Reading Text Files

Ben Bacarisse

Bartc said:
So octal numbers are up to 3 digits, while hexadecimal numbers are any
length? That seems odd. The hex escape makes more sense as:

\x hexadecimal-digit
\x hexadecimal-digit hexadecimal-digit

And matches the syntax style for octal.

Character and string literals can contain wide characters.
 
Bartc

Ben Bacarisse said:
Character and string literals can contain wide characters.

OK. (I did try longer hex sequences in an L"..." string, but the first
compiler I tried didn't like more than two hex digits.)

Still, it would have been neater for both octal/hex to be any length
sequence of digits, and to leave it to the compiler to check for overflow
values.
 
Keith Thompson

Bartc said:
So octal numbers are up to 3 digits, while hexadecimal numbers are any
length?
Yes.

That seems odd. The hex escape makes more sense as:

\x hexadecimal-digit
\x hexadecimal-digit hexadecimal-digit

And matches the syntax style for octal.

Octal escapes are older. When they were invented, it was reasonable
to assume that a character was no more than 8 bits. Expanding octal
escapes to allow more than 3 digits would have broken existing code.
I might write "\1234" with the intent of specifying { '\123', '4', '\0' }.
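A quick check of Keith's example, as a sketch: because an octal escape takes at most three digits, "\1234" compiles to three chars: '\123' (83 in decimal), '4', and the terminating null. The names octal_demo and octal_demo_size are hypothetical.

```c
/* An octal escape takes at most three digits, so the fourth digit
   is an ordinary character: { '\123', '4', '\0' }. */
const char octal_demo[] = "\1234";

/* Returns the number of elements, including the terminating null. */
unsigned octal_demo_size(void)
{
    return (unsigned)sizeof octal_demo;
}
```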

When hexadecimal escapes were added later (were they an ANSI
invention?), there was an opportunity to allow for larger values. If
it's necessary to terminate a hex escape, you can use string literal
concatenation:

"\x1234" is { '\x1234', '\0' }
"\x12" "34" is { '\x12', '3', '4', '\0' }
 
Ben Bacarisse

Bartc said:
OK. (I did try longer hex sequences in an L"..." string, but the first
compiler I tried didn't like more than two hex digits.)

Still, it would have been neater for both octal/hex to be any length
sequence of digits, and to leave it to the compiler to check for
overflow values.

If starting from scratch. C is old and revisions must try not to
break too much old code. In 1989 that was a lot of code where "\2612"
meant ±2. Is the symmetry of an old format (octal) with a newer one
worth breaking code for?
 
Bartc

Ben Bacarisse said:
If starting from scratch. C is old and revisions must try not to
break too much old code. In 1989 that was a lot of code where "\2612"
meant ±2. Is the symmetry of an old format (octal) with a newer one
worth breaking code for?

No. I didn't even think about how these sequences are terminated. And in
fact I'm having some trouble putting the escape \x9 (as allowed by the
syntax) in the middle of the string "ABCDEF".
 
Ben Bacarisse

... I didn't even think about how these sequences are terminated. And
in fact I'm having some trouble putting the escape \x9 (as allowed by
the syntax) in the middle of the string "ABCDEF".

You can't do it directly but you can write "ABC\x9" "DEF".

(I think Keith Thompson's reply deals with this as well.)
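A small sketch of Ben's suggestion, checked element by element. It relies on the fact that escape sequences are converted before adjacent string literals are concatenated, so closing the literal ends the hex escape. For comparison it also shows that an octal escape written with all three digits ("\011") cannot swallow the following letters. The variable and function names are hypothetical, and the checks compare against the number 9 rather than '\t', since 9 is the tab character only on ASCII-based systems.

```c
#include <string.h>

/* "ABC\x9DEF" would not work: D, E and F are hex digits, so the escape
   would become \x9DEF. Ending the literal stops the escape. */
const char with_hex[] = "ABC\x9" "DEF";

/* Octal escapes take at most three digits, so "\011" is always complete. */
const char with_octal[] = "ABC\011DEF";

/* Returns nonzero if both spellings produce the same seven characters. */
int spellings_agree(void)
{
    return strlen(with_hex) == 7 && strcmp(with_hex, with_octal) == 0;
}
```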
 
Mark L Pappin

[octal vs hex escapes in string literals]
in fact I'm having some trouble putting the escape \x9 (as allowed by
the syntax) in the middle of the string "ABCDEF".

Follow the single-digit hex value with a source character which is not
a hex digit but which does not [necessarily] change the content of the
literal.

To put it another way,
strcmp("ABC" "DEF", "ABCDEF") == 0

HTH. HAD.

mlp
 
Ben Bacarisse

CBFalconer said:
You are trying to create characters to install in a string. I am
not up on use of the L modifier, but in general string chars have
to fit into a byte. If that byte is 8 bits long, it is fairly hard
to fit a 3 digit hex value into it.

It seems odd that you comment on something that you are "not up on the
use of" when that thing is at the heart of the post you are commenting
on. Yes, characters in strings usually fit in a byte except when the
string is a wide string, e.g. L"\x1D0B". This is a string with one
character.
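To make the point concrete: in a wide string literal each element is a wchar_t, so a four-digit hex escape fits in a single element. This sketch assumes wchar_t is at least 16 bits wide, which holds on common platforms (16 bits on Windows, 32 on most Unix-like systems); the names one_wide and wide_length are hypothetical.

```c
#include <wchar.h>   /* wchar_t, wcslen, size_t */

/* One wide character with value 0x1D0B, then the terminating null. */
const wchar_t one_wide[] = L"\x1D0B";

/* Length in wide characters, not counting the terminator. */
size_t wide_length(void)
{
    return wcslen(one_wide);
}
```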
 
Barry Schwarz

What mandatory diagnostic? In C90 those are all just simple escape
sequences invoking undefined behaviour. If gcc is right then \U has
special meaning in C99, but there is every chance the OP is not using C99.

I was thinking of the one required by footnote 65 in n1256 but I see
that it is not required in C89.
 
Harald van Dijk

Barry said:
fp = fopen( "C:\Users\<Username>\Documents\R.txt", "r" );

Did not your compiler generate some mandatory diagnostics on this
line?

What mandatory diagnostic? [...]

I was thinking of the one required by footnote 65 in n1256 but I see
that it is not required in C89.

Footnote 65 contradicts the normative text, and should be ignored except
possibly as an indication of the intent of the authors. Because "\i" is
not a single token and also not a single preprocessing token, 6.4p4
requires this to be tokenised as {"}{\}{i}{"}, for which the behaviour is
undefined per 6.4p3. No diagnostic is required.
 
lawrence.jones

Keith Thompson said:
Octal escapes are older. When they were invented, it was reasonable
to assume that a character was no more than 8 bits.

I believe 9-bit bytes were known to exist at that time, but three octal
digits are sufficient for them, too.
 
Rafael

Barry Schwarz wrote:
I was thinking of the one required by footnote 65 in n1256 but I see
that it is not required in C89.
Copy the document to the root directory (c:\) and give it a try.

By the way, what compiler are you using? (I don't have your original
post, sorry.)
 
JosephKK

You need to escape backslashes in string literals: try
fp = fopen( "C:\\Users\\<Username>\\Documents\\R.txt", "r" );

I believe Windows will also accept forward slashes in pathnames, which
would mean you don't need to do any escaping at all.

In some cases, certainly not in all cases.
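A minimal sketch of the corrected call, assuming the goal is simply to echo a text file line by line. The path keeps the <Username> placeholder from the original post, so it will not open as written; the helper name print_text_file and the 256-byte line buffer are illustrative choices, not anything from the thread.

```c
#include <stdio.h>

/* In C source "\\" stands for ONE backslash, so this string contains
   C:\Users\<Username>\Documents\R.txt (placeholder kept as-is). */
const char win_path[] = "C:\\Users\\<Username>\\Documents\\R.txt";

/* Hypothetical helper: copies a text file to stdout line by line.
   Returns 0 on success, -1 if the file cannot be opened. */
int print_text_file(const char *path)
{
    char line[256];
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        perror(path);           /* report why fopen failed */
        return -1;
    }
    /* fgets splits lines longer than the buffer; that is harmless here
       because every chunk is written straight back out. */
    while (fgets(line, sizeof line, fp) != NULL)
        fputs(line, stdout);

    fclose(fp);
    return 0;
}
```

Calling print_text_file(win_path) with a real user name substituted for the placeholder would print the file, or report the fopen failure via perror.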
 
JosephKK

What mandatory diagnostic? In C90 those are all just simple escape
sequences invoking undefined behaviour. If gcc is right then \U has
special meaning in C99, but there is every chance the OP is not using C99.


Upping the warning level is often the way, and simply ignoring
diagnostics is never good.


Definitely a good idea.

Each to their own. I am wondering just what is being expressed by
"<Username>" inside the string.
 
JosephKK

You are trying to create characters to install in a string. I am
not up on use of the L modifier, but in general string chars have
to fit into a byte. If that byte is 8 bits long, it is fairly hard
to fit a 3 digit hex value into it.

The world seems to be moving in the Unicode direction (specifically
including UTF-8), which gives us some marvelous oddities, including
variable-width characters.
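As a small illustration of the variable-width point: in UTF-8 every byte of the form 10xxxxxx is a continuation byte, so counting the remaining bytes counts code points. This sketch assumes the input is valid UTF-8 and does no validation; the function name utf8_count is hypothetical.

```c
#include <stddef.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes well-formed UTF-8; invalid sequences are not detected. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, "±2" in UTF-8 is the bytes C2 B1 32, written as "\xC2\xB1" "2" (split because 2 is also a hex digit): three bytes, but utf8_count reports two characters.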
 
Keith Thompson

Joe Wright said:
Please elucidate. In what cases will the use of '/' fail?

I just tried this program on a Windows system after creating a small
text file named C:\tmp.txt containing a single line "hello":

#include <stdlib.h>
int main(void)
{
    system("type C:\\tmp.txt");
    system("type C:/tmp.txt");
    return 0;
}

The output was:

hello
The syntax of the command is incorrect.
 
Keith Thompson

Joe Wright said:
Boo. You're using command.com or cmd.exe, not Windows. Try this.

I'm using the standard system() function *on* Windows. It happens to
invoke cmd.exe, which is part of Windows.
#include <stdio.h>
int main(void) {
    int c;
    FILE *in;
    if ((in = fopen("c:/tmp.txt", "r")))
        while ((c = fgetc(in)) != EOF)
            putchar(c);
    return 0;
}

You asked for cases where the use of '/' will fail. I presented such
a case. Presenting another case where it doesn't fail doesn't refute
that.

"C:/tmp.txt" is a valid name for that file, but a name that can't be
used in all contexts.
I still don't know where the OS will barf on '/' as a separator. Do you?

Yes, I just showed it to you.
 
Kenny McCormack

Keith Thompson said:
"C:/tmp.txt" is a valid name for that file, but a name that can't be
used in all contexts.

Keith is, of course, correct here.
Of course, whether or not the command line is the OS is open to debate.

But in general, relying on this hack (using the "wrong" - in the sense
of non-native - separator) can lead to grief.
Yes, I just showed it to you.

Yes, and in fact, I believe it is actually true in some cases that don't
involve the command line.

A bit of background: (modern versions of) Windows is a kernel, and
actually a pretty well built and designed kernel, with, like 12,000
layers of emulation wrapped around it. Based on something I read
recently, apparently the kernel itself only works with backslashes. The
conversion from forward slashes to backslashes occurs in one of the
emulation layers.

The effect of this is that, although rarely necessary, there exist some
situations where you, as an application programmer, can call very
low-level kernel functions (not really APIs, at this level) directly
(rather than through the emulation layers). And if you do so, you can't
use forward slashes; you must use backslashes.
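If code has to hand a path to something that insists on the native separator (cmd.exe in Keith's test, or the lower-level interfaces described above), one option is to normalise the separators first. A minimal sketch; the helper name to_backslashes is hypothetical, and it only swaps characters without handling quoting or any other path syntax.

```c
#include <stddef.h>

/* Hypothetical helper: rewrites every '/' in path as '\\' in place.
   Returns the number of separators changed. */
size_t to_backslashes(char *path)
{
    size_t changed = 0;

    for (; *path != '\0'; path++) {
        if (*path == '/') {
            *path = '\\';
            changed++;
        }
    }
    return changed;
}
```

Convert only the path, not a whole command line, since cmd.exe commands use '/' to introduce switches.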
 
CBFalconer

Keith said:
.... snip ...


I just tried this program on a Windows system after creating a
small text file named C:\tmp.txt containing a single line "hello":

#include <stdlib.h>
int main(void) {
    system("type C:\\tmp.txt");
    system("type C:/tmp.txt");
    return 0;
}

The output was:

hello
The syntax of the command is incorrect.

I believe the problem there is passing the command through the
COMMAND (or CMD) processor. type is a built-in command of that
shell.
 
Phil Carmody

Keith Thompson said:
I'm using the standard system() function *on* Windows. It happens to
invoke cmd.exe, which is part of Windows.


You asked for cases where the use of '/' will fail. I presented such
a case. Presenting another case where it doesn't fail doesn't refute
that.

"C:/tmp.txt" is a valid name for that file, but a name that can't be
used in all contexts.


Yes, I just showed it to you.

You are apparently treating a shell as part of the OS. You're not
alone; it's very common amongst DOS-heads, for example, but it's
a view that no-one I know who actually works on operating systems
shares.

MS Paint came with MS Windows. Does that make MS Paint part of the
OS too, in your eyes?

Phil
 
