Getting rid of the EOL character?

stef

hello,

In the previous language I used,
when reading a line by readline, the EOL character was removed.

Now I'm reading a text-file with CR+LF at the end of each line,
Datafile = open(filename,'r')
line = Datafile.readline()

Now this gives an extra empty line:
print line

and what I expected to be correct, removing CR+LF,
removes one character too many:
print line[,-2]

while this, to my surprise, gives what I need:
print line[,-1]

Is it correct that the 2 characters CR+LF are converted to 1 character?
Is there a more automatic way to remove the EOL from the string?

thanks,
Stef Mientki
 
Michael Hoffman

line = line.rstrip("\r\n") should take care of it. If you leave out the
parameter, it will strip out all whitespace at the end of the line,
which is what I do in most cases.
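A quick way to see the difference between the two forms Michael describes. This sketch assumes Python 3 (so print is a function) and uses a made-up line that still carries its raw CR+LF:

```python
# Made-up line with trailing spaces followed by a raw CR+LF ending.
raw = "foo bar  \r\n"

# With an argument, rstrip() removes only trailing characters found in it.
only_eol = raw.rstrip("\r\n")

# With no argument, rstrip() removes all trailing whitespace, spaces included.
all_ws = raw.rstrip()

print(repr(only_eol))  # 'foo bar  '
print(repr(all_ws))    # 'foo bar'
```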
 
stef

thanks for the solution, Michael,

cheers,
Stef
 
John Machin

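> In the previous language I used,
> when reading a line by readline, the EOL character was removed.

Very interesting; how did you distinguish between EOF and an empty line?
Did you need to call an isEOF() method before each read?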

> Now I'm reading a text-file with CR+LF at the end of each line,
> Datafile = open(filename,'r')
> line = Datafile.readline()
>
> now this gives an extra empty line
> print line
>
> and what I expect that should be correct, remove CR+LF,
> gives me one character too much removed
> print line[,-2]

Stef, that would give you a syntax error. I presume that you meant to
type line[:-2]

> while this gives what I need ???
> print line[,-1]
>
> Is it correct that the 2 characters CR+LF are converted to 1 character ?

In text mode (the default), whatever is the line ending on your platform
is converted to a single "newline" '\n' which is the same as LF.
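A minimal sketch of that conversion, assuming Python 3. There, text mode with the default newline=None applies "universal newlines", so '\r\n' becomes '\n' on every platform, not only on Windows; binary mode hands back the raw bytes. The helper name and file contents are made up for illustration:

```python
import os
import tempfile

def read_first_line(path, mode):
    """Return the first line of the file at path, opened in the given mode."""
    with open(path, mode) as f:
        return f.readline()

# Write Windows-style CR+LF endings in binary mode so nothing is translated.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo bar\r\nbaz\r\n")

text_line = read_first_line(path, "r")     # text mode: CR+LF arrives as '\n'
binary_line = read_first_line(path, "rb")  # binary mode: bytes are untouched
os.remove(path)

print(repr(text_line))    # 'foo bar\n'
print(repr(binary_line))  # b'foo bar\r\n'
```

So in text mode the line holds a single '\n' at the end, which is why slicing off one character appeared to work.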

Using line[:-1] is NOT recommended, as the last line in your file may
not be terminated, and in that case you would lose the last data character.
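To illustrate the risk, assuming Python 3, with a made-up list standing in for what readline() would return from a file whose last line has no terminator:

```python
# Hypothetical readline() results: the last line has no trailing newline.
lines = ["first\n", "last"]

# Slicing blindly drops a real data character from the unterminated line.
sliced = [line[:-1] for line in lines]

# rstrip("\n") only removes a newline that is actually there.
stripped = [line.rstrip("\n") for line in lines]

print(sliced)    # ['first', 'las']
print(stripped)  # ['first', 'last']
```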

> line = line.rstrip("\r\n") should take care of it. If you leave out the
> parameter, it will strip out all whitespace at the end of the line,
> which is what I do in most cases.

If you want *exactly* what is in the line, use line.rstrip('\n') -- this
will remove only the trailing newline (if it exists).
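A short sketch of that behaviour, assuming Python 3; chomp is a made-up helper named after Perl's chomp. Strictly, rstrip('\n') strips every trailing '\n', which is harmless on a readline() result because that never ends in more than one; on Python 3.9 and later, str.removesuffix('\n') removes exactly one.

```python
def chomp(line):
    """Remove exactly one trailing newline, if present (hypothetical helper)."""
    return line[:-1] if line.endswith("\n") else line

line = "data  \n"  # made-up readline() result; the trailing spaces are data

print(repr(line.rstrip("\n")))        # 'data  '  (spaces kept, newline gone)
print(repr(chomp(line)))              # 'data  '
print(repr("data\n\n".rstrip("\n")))  # 'data'    (every trailing '\n' removed)
print(repr(chomp("data\n\n")))        # 'data\n'  (only one removed)
print(repr(chomp("last")))            # 'last'    (unterminated line untouched)
```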

If you want to strip all trailing whitespace, use line.rstrip() as
Michael suggested.

Michael, note carefully that line.rstrip('\r\n') removes instances of
'\r' OR '\n' -- the arg is a set of characters to be removed, not a
suffix to be removed. In Stef's situation, it "works" only by accident.
Using that would not always give you the correct answer -- e.g. if your
(Windows) file had a line ending in CR CR LF [I've seen stranger].
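The set-versus-suffix distinction can be checked directly. A sketch assuming Python 3, with a made-up line ending in CR CR LF:

```python
# Made-up line ending in CR CR LF, the case described above.
line = "foo\r\r\n"

# The argument is a set of characters, so every trailing '\r' or '\n' goes,
# and the order of characters in the argument makes no difference.
as_set = line.rstrip("\r\n")
same_set = line.rstrip("\n\r")

# Removing exactly the two-character suffix CR+LF keeps the first CR.
as_suffix = line[:-2] if line.endswith("\r\n") else line

print(repr(as_set))     # 'foo'
print(repr(same_set))   # 'foo'
print(repr(as_suffix))  # 'foo\r'
```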

HTH,
John
 
Stef Mientki

hi John,
> Very interesting; how did you distinguish between EOF and an empty line?
> Did you need to call an isEOF() method before each read?
Yes indeed, and I admit it needs some more coding ;-)
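For comparison, Python needs no isEOF() call: readline() returns an empty string only at end of file, while an empty line comes back as '\n'. A minimal sketch, assuming Python 3 and using io.StringIO in place of a real file; read_lines is a made-up name:

```python
import io

def read_lines(f):
    """Collect lines until EOF, showing how readline() signals the end."""
    result = []
    while True:
        line = f.readline()
        if line == "":  # '' means end of file; no isEOF() check needed
            break
        result.append(line.rstrip("\n"))  # an empty line arrives as '\n' -> ''
    return result

sample = io.StringIO("first\n\nthird\n")
print(read_lines(sample))  # ['first', '', 'third']
```

In everyday code, iterating with `for line in f:` runs the same loop and stops at end of file on its own.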

>> Now I'm reading a text-file with CR+LF at the end of each line,
>> Datafile = open(filename,'r')
>> line = Datafile.readline()
>>
>> now this gives an extra empty line
>> print line
>>
>> and what I expect that should be correct, remove CR+LF,
>> gives me one character too much removed
>> print line[,-2]
>
> Stef, that would give you a syntax error. I presume that you meant to
> type line[:-2]

Yes, sorry.

>> while this gives what I need ???
>> print line[,-1]
>>
>> Is it correct that the 2 characters CR+LF are converted to 1 character ?
>
> In text mode (the default), whatever is the line ending on your platform
> is converted to a single "newline" '\n' which is the same as LF.
Aha, that was the answer I was looking for.

<snip>

thanks for the splendid explanation, John,

cheers,
Stef Mientki
 
Michael Hoffman

John said:
> Michael, note carefully that line.rstrip('\r\n') removes instances of
> '\r' OR '\n' -- the arg is a set of characters to be removed, not a
> suffix to be removed. In Stef's situation, it "works" only by accident.
> Using that would not always give you the correct answer -- e.g. if your
> (Windows) file had a line ending in CR CR LF [I've seen stranger].

I knew that about line.rstrip, but didn't consider the possibility of
\r\r\n, while still wanting the first \r. Yuck.

Honestly, I almost always use line.rstrip()--it is seldom that I care
about closing whitespace.
 
John Machin

> I knew that about line.rstrip, but didn't consider the possibility of
> \r\r\n, while still wanting the first \r. Yuck.

It would be unusual to want that first \r -- a possibly more likely
scenario might be where your text file contains an extract from a
database, and you need to check that there are no unwanted (e.g.
unprintable) characters in the data (whether at the end of the line,
the middle, or the start).
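One possible way to run such a check, sketched under the assumption of Python 3 and treating anything str.isprintable() rejects as "unwanted" (a policy choice for illustration, not something from the thread; the function name is made up):

```python
def unwanted_chars(line):
    """Return (index, char) pairs for non-printable characters in a line."""
    body = line.rstrip("\n")  # the line terminator itself is expected
    return [(i, ch) for i, ch in enumerate(body) if not ch.isprintable()]

print(unwanted_chars("foo\tbar\r\n"))  # [(3, '\t'), (7, '\r')]
print(unwanted_chars("clean line\n"))  # []
```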

In any case I think that you are missing the point that when reading a
normal text file on Windows with readline, while the line in the file
may be 'foo bar\r\n', what you get from readline is 'foo bar\n' -- so
in normal usage, the \r in your line.rstrip('\r\n') is pointless.
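Where the '\r' does reach your code is when newline translation is switched off. A sketch assuming Python 3, where open(..., newline='') returns line endings untranslated; the file contents are made up:

```python
import os
import tempfile

# Write CR+LF endings in binary mode so the file holds exactly these bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo bar\r\n")

# newline="" switches off newline translation: the raw CR+LF is returned.
with open(path, "r", newline="") as f:
    raw_line = f.readline()
os.remove(path)

print(repr(raw_line))                 # 'foo bar\r\n'
print(repr(raw_line.rstrip("\n")))    # 'foo bar\r'  (the CR survives)
print(repr(raw_line.rstrip("\r\n")))  # 'foo bar'
```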

> Honestly, I almost always use line.rstrip()--it is seldom that I care
> about closing whitespace.

Honestly, I almost always split a line into fields and then for each
field, strip leading and trailing whitespace, and change runs of 1 or
more whitespace characters to a single space -- where "whitespace"
includes the pesky U+00A0 aka &nbsp; which doesn't qualify as
whitespace in a str instance.
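A sketch of that field clean-up, assuming Python 3 and a tab-separated line (the separator and the function name are assumptions, not from the thread). In Python 3, str.split() with no argument treats U+00A0 as whitespace, unlike the Python 2 byte strings described above, whose behaviour bytes.split() still mirrors:

```python
def normalize_fields(line, sep="\t"):
    """Split line on sep, trim each field, and collapse inner whitespace runs."""
    # str.split() with no argument splits on runs of Unicode whitespace,
    # which in Python 3 includes U+00A0 (no-break space); the trailing
    # newline is swallowed the same way, so no separate rstrip() is needed.
    return [" ".join(field.split()) for field in line.split(sep)]

print(normalize_fields("  foo\u00a0\u00a0bar \t baz   qux\n"))  # ['foo bar', 'baz qux']

# Byte strings only treat ASCII whitespace as whitespace.
print(b"foo\xa0bar".split())  # [b'foo\xa0bar']
```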

Cheers,
John
 
