Python MSI not installing, log file showing name of a Vietnamese communist revolutionary

Mark Lawrence

I perceive that this is your singular pet peeve, or were you elected by
the Python community some time ago to police the line-end problem?

It's a pet peeve as:-

a) trying to read something that's the fourth level of reply or higher
to the original gets to be almost impossible as it's all white space and
no substance.

b) better tools exist

c) the workaround is shown on the Python wiki, not on the crappy,
bug-ridden gg site itself.

Wow, it's like a sauna in here :)

I doubt that the Python community would elect me to do anything. Anyhow
I start my new job in the diplomatic corps next week.
 
Mark Lawrence

On Saturday, 22 March 2014 05:59:34 UTC+1, Mark H. Harris wrote:

No offense. A good start would be to understand "unicode"
instead of bashing MS.

jmf

How apt given how this thread has moved :)
 
Steven D'Aprano

I notice (since moving my stuff to Thunderbird two weeks back) the
double spacing you keep squawking about, but I don't find it the big
nuisance you're talking about; ok, so we have to scroll a bit further.

It's not the scrolling that interferes

with readability, it's the interruption

to the flow of text by having excess

blank lines within a paragraph of text.


I am honestly convinced that this might even be a python problem. More
likely than not, gg is written in python, and this is the goofy line-end
character problem we have to deal with when we read lines in python.

Well, that's certainly a novel idea. Why do you think that Google Groups is
written in Python? Does every post from GG end with "Powered By Python"?

Why do we suck in the new-line character as though it were part of the
line?

Because it is the only sensible way to handle it. If you don't, then
there is no way to distinguish between a file that ends with a newline
and one which doesn't.
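
A minimal sketch of that point, using io.StringIO as a stand-in for a real file (the helper name read_lines is made up for illustration). Because readline() keeps the terminator, the last line tells you whether the file ended with a newline:

```python
import io

def read_lines(text):
    """Collect lines with readline() until it returns the empty string."""
    stream = io.StringIO(text)
    lines = []
    while True:
        line = stream.readline()
        if line == "":          # the empty string signals end of file
            break
        lines.append(line)
    return lines

with_newline = read_lines("spam\neggs\n")
without_newline = read_lines("spam\neggs")

print(with_newline)     # ['spam\n', 'eggs\n']
print(without_newline)  # ['spam\n', 'eggs']
```

The only difference between the two results is the missing '\n' on the final line, which is exactly the information that would be lost if the terminator were dropped.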

This is asinine behavior. The new-line is a "file" delimiter
character and NOT intended to be part of the line.

Line endings are terminators: they end the line. Whether you consider the
terminator part of the line or not is a matter of opinion (is the cover
of a book part of the book?) but consider this:

If you say that the end of lines are *not* part of the line, then
that implies that some parts of the file are not inside any line
at all. And that would be just weird.

Thinking this through a bit

Yes, that helps :)

I've noticed that a blank line comes back
with a '\n' which differentiates it from file end which comes back
"without" the new-line. So, it appears that python is using the
new-line character (or lack thereof) to have meaning which the new-line
in a unix file was never supposed to carry.

I don't understand what meaning you are referring to here. Blank lines
come back as a \n because a blank line *is* a \n with nothing before it.
Python isn't doing anything funny here, at least not on Linux. If you
open a text editor, and type:

spam ENTER ENTER ENTER ENTER eggs ENTER

(where ENTER means to hit the Enter key, not the letters E N T E R) and
then save, your editor will show the words "spam" and "eggs" separated by
three blank lines. If you open that file in a hex editor, you will see
something like:

73 70 61 6d 0a 0a 0a 0a 65 67 67 73 0a

Notice the four 0a bytes in a row? That gives you three blank lines.
Python is not adding any extra newlines or putting them where they aren't,
so I don't really understand what point you're trying to make here.
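
The same bytes can be checked from Python. This is only a sketch: it builds the 13 bytes from the hex dump above in memory (io.BytesIO stands in for the saved file) and iterates over them line by line:

```python
import io

# The exact bytes from the hex dump: "spam", four newlines, "eggs", one newline.
data = bytes.fromhex("73 70 61 6d 0a 0a 0a 0a 65 67 67 73 0a")

lines = list(io.BytesIO(data))
blank_lines = [line for line in lines if line == b"\n"]

print(lines)             # [b'spam\n', b'\n', b'\n', b'\n', b'eggs\n']
print(len(blank_lines))  # 3
```

Four 0a bytes give one terminator for "spam" plus three lines that contain nothing but a newline.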


If python would just return EOF like every other language at file end,
and a text line without '\n' tacked to the end of it, this little snag
with gg would probably go away. What say you?

There is no evidence that Google Groups' difficulty is because of Python.
More likely it is related to the translation between "rich text"
formatted using HTML, and plain text.

By the way, Python *does* return EOF at the end of the file. It is just
that EOF in Python is spelled "the empty string" instead of some other
special value. Both Ruby and Lua behave like Python, returning the empty
string on end of file:

steve@orac:~$ irb
irb(main):001:0> f = File.open('/tmp/io.txt')
=> #<File:/tmp/io.txt>
irb(main):002:0> f.read()
=> "hello"
irb(main):003:0> f.read()
=> ""


[steve@ando ~]$ lua
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
> fp = io.open('/tmp/io.txt', 'r')
> print(fp:read("*all"))
hello


Similarly, Rust returns None:
http://static.rust-lang.org/doc/0.9/std/io/trait.Reader.html#tymethod.read

And Java's java.io.BufferedReader is also similar, returning null.
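
For comparison, a rough Python equivalent of the Ruby and Lua sessions above. It uses io.StringIO holding "hello" as a stand-in for /tmp/io.txt, so the file contents here are an assumption:

```python
import io

f = io.StringIO("hello")   # stand-in for open('/tmp/io.txt')

first = f.read()           # everything up to end of file
second = f.read()          # already at end of file
at_eof = f.readline()      # readline() signals end of file the same way

print(repr(first))    # 'hello'
print(repr(second))   # ''
print(repr(at_eof))   # ''
```

A blank line is never confused with end of file, because a blank line comes back as '\n' rather than ''.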
 
Chris Angelico

Line endings are terminators: they end the line. Whether you consider the
terminator part of the line or not is a matter of opinion (is the cover
of a book part of the book?) but consider this:

If you say that the end of lines are *not* part of the line, then
that implies that some parts of the file are not inside any line
at all. And that would be just weird.

Not so weird IMO. A file is not a concatenation of lines; it is a
stream of bytes. Now, if you ask Python to read you 512 bytes from a
binary file, and then ask for another 512 bytes, and so on until you
reach the end, then it would indeed be VERY weird if there were parts
of the file that weren't in the returned (byte) strings. But if you
ask for a line, and then another line, and another line, then it's
quite reasonable to interpret U+000A as "line separation" rather than
"line termination", and not return it. (Both interpretations make
sense. I just wish the most obvious form of iteration gave the
cleaner/tidier version, or at very least that there be some really
obvious way to ask for lines-without-endings.) Imagine the output of
GNU find as a series of records. You can ask for those to be separated
by newlines (the default, or -print), or by NULs (with the -print0
command). In either case, the records do not *contain* that value,
they're separated by it; the records consist of file names.
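
A small sketch of that record view in Python. The byte string below is made-up sample data shaped like the output of find -print0, where each file name is followed by a NUL byte; no real find command is run:

```python
# Hypothetical -print0 style output: three file names, each followed by NUL.
raw = b"./a.txt\0./dir/b c.txt\0./odd\nname\0"

# Split on the NUL byte and drop the empty piece after the final NUL.
names = [record.decode() for record in raw.split(b"\0") if record]

print(names)  # ['./a.txt', './dir/b c.txt', './odd\nname']
```

None of the records contains the NUL byte, and a file name with an embedded newline survives intact, which is the usual reason for choosing -print0 over -print.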

ChrisA
 
Dennis Lee Bieber

Well, and now that I'm thinking about this again, since we have
unicode, maybe we should have an entire set of standard "file"
delimiters for flat-files.

But the bottom line (pun intended) is that I just want to suck the
lines in, and I only want the system to have to handle the delimiters;

What "system"... as I understand the UNIX/C stream concept, there are
no "system" delimiters -- it is up to the application program to determine
how to respond to special characters.

Maybe you'd prefer VMS FORTRAN segmented records -- which encoded
start-of-record and end-of-record bits at the OS level, while writing
blocks of (as I recall -- been 15 years since I had to parse a FORTRAN text
file in a non-compatible language) no more than 256-bytes. A long text
record would then look something like:
01lots of text
00carrying over to more blocks
00until finally getting to
10the end block.

A short line would have both start and end markers
11a short line

http://h71000.www7.hp.com/doc/82final/6443/6443pro_021.html#subhead_segrec_type

implies unformatted (binary) files, but I'm fairly certain I've seen text
contents too (maybe written as unformatted).

To get a "C-style" file one had to go out of their way to specify a
stream mode, what the line end character would be, and for formatted
output, often a "carriagecontrol" specification when opening the file (as I
recall that activated newline, linefeed, tab, and formfeed behaviors on
output files).
 
Steven D'Aprano

Not so weird IMO. A file is not a concatenation of lines; it is a stream
of bytes.

But a *text file* is a concatenation of lines. The "text file" model is
important enough that nearly all programming languages offer a line-based
interface to files, and some (Python at least, possibly others) make it
the default interface so that iterating over the file gives you lines
rather than bytes -- even in "binary" mode.

Now, if you ask Python to read you 512 bytes from a binary
file, and then ask for another 512 bytes, and so on until you reach the
end, then it would indeed be VERY weird if there were parts of the file
that weren't in the returned (byte) strings. But if you ask for a line,
and then another line, and another line, then it's quite reasonable to
interpret U+000A as "line separation" rather than "line termination",
and not return it. (Both interpretations make sense. I just wish the
most obvious form of iteration gave the cleaner/tidier version, or at
very least that there be some really obvious way to ask for
lines-without-endings.)

There is: call strip('\n') on the line after reading it. Perl and Ruby
spell it chomp(). Other languages may spell it differently. I don't know
of any language that automatically strips newlines, probably because you
can easily strip the newline from the line, but if the language did it
for you, you cannot reliably reverse it.
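
To make the "cannot reliably reverse it" part concrete, here is a sketch with a hypothetical helper, stripped_lines. It uses rstrip('\n') so that only the trailing newline is removed and other whitespace is left alone:

```python
import io

def stripped_lines(text):
    """Return the lines of text with the trailing newline removed."""
    return [line.rstrip("\n") for line in io.StringIO(text)]

ends_with_newline = stripped_lines("spam\neggs\n")
no_final_newline = stripped_lines("spam\neggs")

print(ends_with_newline)                      # ['spam', 'eggs']
print(ends_with_newline == no_final_newline)  # True

# Only the newline goes; trailing spaces are kept.
print(repr("  value \n".rstrip("\n")))        # '  value '
```

Two different inputs produce the same stripped result, so once the newline is gone there is no way to tell which file you started with.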

Imagine the output of GNU find as a series of
records. You can ask for those to be separated by newlines (the default,
or -print), or by NULs (with the -print0 command). In either case, the
records do not *contain* that value, they're separated by it; the
records consist of file names.

I have no problem with that: when interpreting text as a record with
delimiters, e.g. from a CSV file, you normally exclude the delimiter.
Sometimes the line terminator does double-duty as a record delimiter as
well.

Reading from a file is considered a low-level operation. Reading
individual bytes in binary mode is the lowest level; reading lines in
text mode is the next level, built on top of the lower binary mode. You
build higher protocols on top of one or the other of that mode, e.g.
"read a zip file" would be built on top of binary mode, "read a csv file"
would be built on top of text mode.

As a low-level protocol, you ought to be able to copy a file without
changing it by reading it in then writing it out:

    for blob in infile:
        outfile.write(blob)


ought to work whether you are in text mode or binary mode, so long as the
infile and outfile are opened in the same mode. If Python were to strip
newlines, that would no longer be the case.
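
A runnable sketch of that copy loop, with in-memory streams standing in for real files (copy_stream and the sample data are invented for the example):

```python
import io

def copy_stream(infile, outfile):
    """Copy infile to outfile one line-sized blob at a time."""
    for blob in infile:
        outfile.write(blob)

# Text mode: note the blank line and the missing final newline.
text_src = "spam\n\neggs"
text_out = io.StringIO()
copy_stream(io.StringIO(text_src), text_out)

# Binary mode: arbitrary bytes, including \r\n and a NUL.
bin_src = b"spam\r\n\x00eggs\n"
bin_out = io.BytesIO()
copy_stream(io.BytesIO(bin_src), bin_out)

print(text_out.getvalue() == text_src)  # True
print(bin_out.getvalue() == bin_src)    # True
```

The copy is exact in both modes only because each blob still carries its own line ending.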

(Even high-level protocols should avoid unnecessary modifications to
files. One of the more annoying, if not crippling, limitations to the
configparser module is that reading an INI file in, then writing it out
again destroys the high-level structure of the file: comments and blank
lines are stripped, and records may be re-ordered.)
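
The comment loss is easy to reproduce. This sketch feeds a made-up INI snippet (the section and option names are invented) through configparser and writes it back out to a string:

```python
import configparser
import io

original = """\
# Settings for a hypothetical demo application

[server]
host = example.org
; port chosen arbitrarily
port = 8080
"""

parser = configparser.ConfigParser()
parser.read_string(original)

out = io.StringIO()
parser.write(out)
round_tripped = out.getvalue()

print(round_tripped)
print("#" in round_tripped, ";" in round_tripped)  # False False
```

The option values survive the round trip, but both comment lines are gone from the written output.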
 
