Newbie question about file input

Aaron Deskins

Hello everyone,
I'm trying to make a simple python script that will read a text file
with a bunch of chess games and tell me how many games there are. The
common format for such chess games is the .pgn format (which is just a
text file). The following is a typical example, a file containing 2
games:

[Event "Quizzes"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "6k1/5p2/8/4p3/pp1qPn2/5P2/PP2B3/2Q2K2 b - - 0 1"]
[PlyCount "5"]

1... Qg1+ 2. Kxg1 Nxe2+ 3. Kf1 Nxc1 *

[Event "Quizzes"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "8/r4pbk/4p2p/8/p5R1/Pq3N1P/1P1Q1PP1/6K1 w - - 0 1"]
[PlyCount "5"]

1. Rxg7+ Kxg7 2. Qd4+ Kf8 3. Qxa7 *


Basically every game starts with the [Event "..."] header and then the
information about the game is given.

My first attempt at the python script is:

#! /usr/bin/env python
import string
import sys
zf=open('test.pgn','r')
# games is number of games
games = 0
while 1:
    line = zf.readline()
    if line == '':
        break
    ls = line.split()
    print ls[0]
    if ls[0] == '[Event':
        games+=1
zf.close()
print games


I'm having problems when the script reads a blank line from the pgn
file. I get the following error message:
IndexError: list index out of range
The problem is that ls[0] does not exist when a blank line is read. What
would be the best way of fixing this?
 
Peter Hansen

Aaron said:
My first attempt at the python script is:

Small note: it would make your code more readable and thus
easier to comment on if you followed convention and used four
spaces for indentation. Your volunteer tutors thank you! :)

Aaron said:
I'm having problems when the script reads a blank line from the pgn
file. I get the following error message:
IndexError: list index out of range
The problem is that ls[0] does not exist when a blank line is read. What
would be the best way of fixing this?

The "best" way might be just to check for and ignore blank lines prior
to the rest of the code that might fail when you find one:

while 1:
    line = zf.readline()
    if not line:
        break # more idiomatic, perhaps, than if line == ''

    # remove leading and trailing whitespace, including newline
    line = line.strip()
    if not line:
        continue # don't break, go back for another

    # from here on line is guaranteed not to be blank


Another approach might be to use exceptions, though I wouldn't
really recommend it here since the above is fairly idiomatic,
I believe:

while 1:
    line = zf.readline()
    if not line:
        break
    ls = line.split()
    try:
        if ls[0] == '[Event':
            games += 1
    except IndexError:
        # blank line, so no first field, so continue
        continue


You might also consider using "line.startswith('[Event')"
instead, as that avoids the need to split the line at
all, but you're doing fine on your own so far. (Not that
this will stop someone from posting a one-liner using
the "re" module and len(findall()), but you can safely
ignore them. ;-)

-Peter
 
Christopher T King

Basically every game starts with the [Event "..."] header and then the
information about the game is given.

My first attempt at the python script is:

I'm having problems when the script reads a blank line from the pgn
file. I get the following error message:
IndexError: list index out of range
The problem is that ls[0] does not exist when a blank line is read. What
would be the best way of fixing this?

The immediate fix is to check for newlines, in addition to blank strings:

if line == '' or line == '\n':
    break

The not-so-immediate fix would be to skip the blank line check, and
instead check the length of ls:

if len(ls) and ls[0] == '[Event':

But perhaps a better fix would be to skip the split and use the
str.startswith() method:

if line.startswith('[Event'):

This won't fail in the case of a blank line, and will be somewhat faster
than str.split().

There are, of course, even better fixes (such as using regexes or
PyParsing), but they would likely be overkill unless you plan on
extracting more data from the file (such as the name of the event).
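
If you did want the event names as well, a regex is one way to get them. The following is only a minimal sketch, not a full PGN parser. The helper name event_names is made up for illustration, the code is written for current Python 3 rather than the Python 2 used elsewhere in this thread, and it assumes each tag sits at the start of its own line with no escaped quotes inside the value.

```python
import re

# Matches a tag line such as: [Event "Quizzes"]
# Assumption: the quoted value contains no escaped double quotes.
EVENT_RE = re.compile(r'^\[Event\s+"([^"]*)"\]', re.MULTILINE)

def event_names(pgn_text):
    """Return the Event tag values found in pgn_text, in file order."""
    return EVENT_RE.findall(pgn_text)

sample = '''[Event "Quizzes"]
[Site "?"]

1... Qg1+ 2. Kxg1 Nxe2+ 3. Kf1 Nxc1 *

[Event "Quizzes"]
[Site "?"]

1. Rxg7+ Kxg7 2. Qd4+ Kf8 3. Qxa7 *
'''

names = event_names(sample)
print(len(names), names)
```

The length of the returned list is the number of games, so the count comes for free.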

Hope this helps.
 
Christopher T King

The not-so-immediate fix would be to skip the blank line check

Oops, I didn't see you were using the blank line check to check for EOF
(as Ben pointed out); forget I said that. Do what he said instead.

(Not that this will stop someone from posting a one-liner using the "re"
module and len(findall()), but you can safely ignore them. ;-)

It was hard, but I was able to restrain myself ;)
 
Aaron Deskins

Thanks to all for the replies. I changed it and it works fine. Now my
question: what is this convention on indentation? Should every
indentation level be 4 spaces?

Thanks again.

 
Grant Edwards

I'm trying to make a simple python script that will read a
text file with a bunch of chess games and tell me how many
games there are.

$ grep '^\[Event' test.pgn | wc -l

;)

I'm having problems when the script reads a blank line from the pgn
file. I get the following error message:
IndexError: list index out of range
The problem is that ls[0] does not exist when a blank line is read. What
would be the best way of fixing this?

Ignore the blank lines by doing something like this before you
split them:

line = line.strip()
if not line:
    continue

Or by checking how many words were found after you split the
line:

ls = line.split()
if len(ls) == 0:
    continue

Perhaps something like this (just to be a smartass, I'm going
to condense your file open/readline()/if-break construct into
the nice new file-as-iterator usage):

numgames = 0
for line in file('test.pgn','r'):
    ls = line.split()
    if len(ls) == 0:
        continue
    if ls[0] == '[Event':
        numgames += 1
print numgames

Or better yet, forget split() and use the startswith() string
method:

games = 0
for line in file('test.pgn','r'):
    if line.startswith('[Event'):
        games += 1
print games

If whitespace is allowed at the beginning of the line, then we
should also strip() the line:

numgames = 0
for line in file('test.pgn','r'):
    if line.strip().startswith('[Event'):
        numgames += 1
print numgames

An argument can be made that you're better off explicitly
opening/closing files, but that would add more lines that don't
really have anything to do with the algorithm we're playing with.
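
For anyone reading this with a current interpreter, explicit closing need not cost many lines. Here is a minimal sketch using a with block, which closes the file even if the loop raises. It is written for Python 3 (where file() no longer exists, so it uses open()), and count_games is just an illustrative name, not something from the scripts above.

```python
import os
import tempfile

def count_games(path):
    """Count lines that start with '[Event', ignoring leading whitespace."""
    games = 0
    with open(path, 'r') as pgn:   # closed automatically when the block ends
        for line in pgn:
            if line.strip().startswith('[Event'):
                games += 1
    return games

# Demo on a throwaway file holding two game headers.
fd, demo_path = tempfile.mkstemp(suffix='.pgn')
with os.fdopen(fd, 'w') as f:
    f.write('[Event "Quizzes"]\n\n1. e4 *\n\n  [Event "Quizzes"]\n\n1. d4 *\n')
print(count_games(demo_path))
os.remove(demo_path)
```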

If you want to be particularly obtuse we can rely on the fact
that True evaluates to 1 and False evaluates to 0, and just
sum up the boolean values returned by .startswith(). That only
takes one line (not counting the "import operator"):

print reduce(operator.add,[l.startswith('[Event') for l in file('test.pgn','r')])

The 5-line version is probably slightly easier to understand at
a glance.
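
The same trick can be spelled with the built-in sum(), which avoids both reduce and the operator import. This is a sketch for current Python 3, using an in-memory list of lines in place of the file; count_event_lines is an illustrative name only.

```python
def count_event_lines(lines):
    """Add up the True/False results of startswith(); True counts as 1."""
    return sum(line.startswith('[Event') for line in lines)

lines = [
    '[Event "Quizzes"]\n',
    '\n',
    '1. Rxg7+ Kxg7 2. Qd4+ Kf8 3. Qxa7 *\n',
    '[Event "Quizzes"]\n',
]
print(count_event_lines(lines))
```

Any iterable of lines works here, including an open file object.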
 
Dan Schmidt

| If you want to be particularly obtuse we can rely on the fact
| that True evaluates to 1 and False evaluates to 0, and just
| sum up the boolean values returned by .startswith(). That only
| takes one line (not counting the "import operator"):
|
| print reduce(operator.add,[l.startswith('[Event') for l in file('test.pgn','r')])

I'd just write something like

print len( [ l for l in file( 'test.pgn') if l.startswith( '[Event' ) ] )

which actually looks clear enough to me that I might write it that way
if I were writing this program.

If anonymous functions weren't so ugly I might use

print len( filter( lambda l: l.startswith( '[Event' ), file( 'test.pgn' ) ) )

instead, since I find the [ l for l ] idiom for filter kind of unappealing.

Dan
 
Dennis Lee Bieber

Thanks to all for the replies. I changed it and it works fine. Now my
question- what is this convention on indentation? Should all
indentations be 4 spaces in?

There are two concerns, actually...

One is esthetics and user-friendliness... An indent level which
is quickly visible to the reader of the code, yet doesn't result in
typical nesting levels running off the edge of the editor window
(typically 80 characters wide).

The other is internal... If all you ever use are spaces,
consistency is key. If you tend to use tab characters (rather than an
editor that does space substitution on entry of a tab), Python treats
every tab character as aligning on 8-character spacing, regardless of
what the editor displays for them. Again, if you do use tabs, and ONLY
tabs, it doesn't matter what the display looks like.

The problem comes when your tabbed file is edited by someone
using spaces. Unless their editor is displaying tabs @8, their
space-aligned edits will NOT be properly indented -- and vice versa,
when displayed in your editor, if you use non-8 tabs, their 8-space
lines will not display in alignment in your editor.
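
One note for readers on a current interpreter: Python 3 still counts a tab as advancing to the next multiple of eight columns, but it refuses indentation whose meaning would depend on the tab width and raises TabError instead. A small sketch that compiles two snippets to show the difference (the indent_check helper is made up for this example):

```python
def indent_check(source):
    """Return 'ok' if source compiles, else the name of the error raised."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as exc:      # TabError is a subclass of SyntaxError
        return type(exc).__name__
    return 'ok'

# A block indented with spaces only.
spaces_only = "if True:\n    x = 1\n    y = 2\n"
# The same block, but one line uses a tab and the next uses 8 spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

print(indent_check(spaces_only))
print(indent_check(mixed))
```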

--
 
Aaron Deskins

Ok everyone,
Thanks again for all the input. I rewrote the code and have included
some more routines. Basically the program takes a pgn file with all the
chess games and randomizes them and spits out a new pgn file. It is
below. Hopefully I was able to get the spacing right. :)

Also, what exactly is stored when a blank line is read by the
readline command? A zero, blank, or what??? What is the difference
between trying to read a nonexistent line (i.e. a line at the end of
the file) and a blank line (line with nothing on it)?

Thanks again.



pgn-rand.py:

#! /usr/bin/env python
import sys
import random

random.seed()  # must be called; the bare name "random.seed" does nothing

# Find number of games, it
zf=open(sys.argv[1],'r')
it = 0
while 1:
    line = zf.readline()
    if line.startswith("[Event"):
        it+=1
    if not line:
        break
zf.close()

# Initialize order which has list of games (numbered 1..it)
a = 0
order = [1]
while a < it-1:
    order.append(a+2)
    a+=1

# Randomize game list 4x
a=0
while a < 4*it:
    one = random.randint(0,it-1)
    two = random.randint(0,it-1)
    temp = order[one]
    order[one]=order[two]
    order[two]=temp
    a+= 1

# Output to new pgn file in new order
out=open(sys.argv[2],'w')
a=0
while a < it:
    curr=0
    zf=open(sys.argv[1],'r')
    while 1:
        line = zf.readline()
        if line.startswith("[Event"):
            curr+= 1
        if order[a] == curr:
            out.write(line)
        if not line:
            break
    zf.close()
    a+=1
out.close()
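
For comparison, here is a shorter sketch of the same idea: read the file once, split it into games at each [Event line, and let random.shuffle do the reordering. It targets current Python 3 and assumes the whole file fits in memory. The function names split_games and shuffle_pgn are illustrative, and, like the script above, it drops any text that appears before the first [Event line.

```python
import random

def split_games(lines):
    """Group lines into games; a new game starts at each '[Event' line."""
    games = []
    for line in lines:
        if line.startswith('[Event'):
            games.append([])
        if games:                 # skip anything before the first header
            games[-1].append(line)
    return games

def shuffle_pgn(in_path, out_path):
    """Write the games from in_path to out_path in random order."""
    with open(in_path, 'r') as src:
        games = split_games(src)
    random.shuffle(games)         # shuffles the list in place
    with open(out_path, 'w') as dst:
        for game in games:
            dst.writelines(game)
    return len(games)
```

It would be called the same way as the original, for example shuffle_pgn(sys.argv[1], sys.argv[2]) after importing sys. The input file is read only once instead of once per game.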
 
Christopher T King

Also, what exactly is stored when a blank line is read by the
readline command? A zero, blank, or what???

readline() always includes the '\n' that terminates the line, so any
'blank' line that is read will be returned as simply '\n'. The only
time readline() won't return a line terminated with '\n' is if the last
line of the file isn't terminated with '\n' (common on Windows), in which
context a blank line is meaningless.
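
A quick way to see these cases side by side is sketched below. It uses io.StringIO as a stand-in for a real file (an assumption made so the example needs nothing on disk; a file opened in text mode behaves the same way for this purpose) and is written for current Python 3.

```python
import io

# Three lines: a header, a blank line, and a last line with no newline.
fake_file = io.StringIO('[Event "Quizzes"]\n\nlast line')

results = []
while True:
    line = fake_file.readline()
    results.append(line)
    if not line:        # the empty string comes back only at end of file
        break

print(results)
```

The blank line comes back as '\n', which is a non-empty string and therefore true in an "if line" test, while end of file comes back as '', which is false.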
 
Aaron Deskins

Christopher said:
readline() always includes the '\n' that terminates the line, so any
'blank' line that is read will be returned as simply '\n'. The only
time readline() won't return a line terminated with '\n' is if the last
line of the file isn't terminated with '\n' (common on Windows), in which
context a blank line is meaningless.

So why does a linesplit fail after the program reads a blank line (or a
'\n')? Is this simply because there's no way to assign the variable '\n' ?
 
Grant Edwards

So why does a linesplit fail after the program reads a blank
line (or a '\n')?

It doesn't. It just returns an empty list since there are not
any non-whitespace characters to return.

Aaron said:
Is this simply because there's no way to assign the variable '\n' ?

I don't understand the question.
 
Aaron Deskins

Grant said:
I don't understand the question.

Perhaps I should have been more specific. What does python store when
you read a blank line? Nothing? A null variable? A '\n'?

How about this:

import string
import sys
zf=open(sys.argv[1],'r')
it = 0
while 1:
    line = zf.readline()
    print line
    rs = line.split()
    print rs
    if rs[0]== '[Event':
        it+=1
        print it
    if not line:
        break
zf.close()

This fails when it tries the "if rs[0]== '[Event':" statement. rs[0]
doesn't exist (or is blank?) for a blank line in my input file.

Another code:

import string
import sys
zf=open(sys.argv[1],'r')
it = 0
while 1:
    line = zf.readline()
    it+=1
    if not line:
        break
zf.close()
print it

This only ends when the end of file is reached. Why not when a blank
line is read? How does python treat the variable line (after a readline)
differently after a blank line or the last line of the file?
 
Marc 'BlackJack' Rintsch

Perhaps I should have been more specific. What does python store when
you read a blank line? Nothing? A null variable? A '\n'?

It reads '\n' as that's, more or less, the content in the text file for
that line. In the file there may be some other platform specific byte
(combination) to denote a line ending.

Aaron said:
This fails when it tries the "if rs[0]== '[Event':" statement. rs[0]
doesn't exist (or is blank?) for a blank line in my input file.

Let's fire up the interpreter and see why:

>>> line = "\n"
>>> line
'\n'
>>> rs = line.split()
>>> rs
[]
>>> rs[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range

A line without anything but whitespace characters "splits" to an empty
list.

Aaron said:
This only ends when the end of file is reached. Why not when a blank
line is read? How does python treat the variable line (after a readline)
differently after a blank line or the last line of the file?

As said before: a blank line is not really empty but contains a newline
character. When the file ends, readline() returns an empty string.

Ciao,
Marc 'BlackJack' Rintsch
 
Aaron Deskins

Marc said:
A line without anything but whitespace characters "splits" to an empty
list.

Thanks for the info. I wasn't aware that "\n" is whitespace. I'm still a
programming beginner and learning every day! Any other whitespace
characters I should know about?

Thanks again
 
Grant Edwards

Thanks for the info. I wasn't aware that "\n" is whitespace. I'm still a
programming beginner and learning everyday! Any other whitespace
characters I should know about?

Usually space, carriage-return, horizontal-tab, vertical-tab,
and form-feed. You may want to consult the documentation on
the split() string method to find out what it considers white
space.
 
Peter Hansen

Aaron said:
Thanks for the info. I wasn't aware that "\n" is whitespace. I'm still a
programming beginner and learning everyday! Any other whitespace
characters I should know about?
c:\>python
6

That's ASCII TAB, LF, VT, FF, CR, and SPACE.
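
To check this on your own interpreter, the string module lists them. A small sketch for current Python 3 follows; the order of the characters inside string.whitespace is treated as an implementation detail here, so the example sorts them before printing.

```python
import string

# The six ASCII whitespace characters, sorted by code point.
chars = sorted(string.whitespace)
print(len(chars), [hex(ord(c)) for c in chars])

# split() with no argument treats any run of whitespace as one separator,
# so a line made only of whitespace splits to an empty list.
blank_fields = " \t\r\n\x0b\x0c".split()
print(blank_fields)

# A normal tag line splits into its non-whitespace fields.
tag_fields = '[Event \t"Quizzes"]\n'.split()
print(tag_fields)
```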

-Peter
 
Dennis Lee Bieber

Thanks for the info. I wasn't aware that "\n" is whitespace. I'm still a
programming beginner and learning everyday! Any other whitespace
characters I should know about?

Tab, newline, <ctrl-k> (vertical tab), <ctrl-l> (form feed),
carriage return, and what looks like a space at the end.

--
 
