Problem with reading CSV file from URL, last record truncated.

KB · Aug 3, 2009

Hi,

I am trying to download a CSV file from a URL using the following code:

import re
import urllib, urllib2, cookielib
import mechanize
import csv
import numpy
import os

def return_ranking():

    cj = mechanize.MSIECookieJar(delayload=True)
    cj.load_from_registry()  # finds cookie index file from registry

    # set things up for cookies

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    urllib2.install_opener(opener)

    reply = opener.open('http://ichart.finance.yahoo.com/table.csv?s=CSCO&a=00&b=01&c=2009&d=01&e=2&f=2010&g=d&ignore=.csv').read()

    fout=open('csco.csv','wb')
    fout.write(reply)
    fout.close

    fin=open('csco.csv','rb')
    table = csv.reader(fin)
    fin.close

    for row in table:
        print row

return_ranking()

I need to use cookies etc. (mechanize/urllib2) for a different, more
complex URL, but since it wasn't working, I went back to a simple Yahoo
example (above), which I have working with urllib (not urllib2).

The behaviour I am seeing is that the last record is being truncated:

(sample output)
['2009-04-08', '17.29', '17.33', '16.94', '17.13', '45389100', '17.13']
['2009-04-07', '17.20', '17.25', '16.58', '16.85', '59902600', '16.85']
['200']

A friend said I should write the data out to a file and have csv.reader
read the file back in, as above, but as you can see, to no avail!
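As an aside, the round trip through csco.csv isn't strictly needed: the downloaded bytes can be handed to the csv module directly. A minimal sketch, assuming Python 3 (where urllib.request takes over urllib2's role) and an ASCII/UTF-8 response body. `parse_csv_bytes` is a hypothetical helper name, and the sample bytes simply stand in for what `opener.open(...).read()` would return, using the two rows from the output above.

```python
import csv
import io


def parse_csv_bytes(data, encoding="utf-8"):
    """Hypothetical helper: parse a CSV payload held in memory.

    The bytes are decoded and wrapped in an in-memory text buffer,
    so csv.reader never touches a temporary file on disk.
    """
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


# Stand-in for the bytes returned by opener.open(url).read()
sample = (
    b"2009-04-08,17.29,17.33,16.94,17.13,45389100,17.13\n"
    b"2009-04-07,17.20,17.25,16.58,16.85,59902600,16.85\n"
)

for row in parse_csv_bytes(sample):
    print(row)
```

Because nothing is written to disk, there is no file to forget to close.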

Note that urllib.urlretrieve works perfectly, but then I give up the
ability to import cookies from my registry, which is all-important
(AFAIK mechanize requires urllib2 anyway).

Any help greatly appreciated.

KB · Aug 4, 2009

By moving:

fin=open('csco.csv','rb')
table = csv.reader(fin)
fin.close

for row in table:
    print row

outside of the routine and into the mainline, it works like a charm.

Would like to know why though, so would love to hear any clues!

MRAB · Aug 4, 2009

In return_ranking(), this line:

fout.close

should be:

fout.close()

And in the code you moved:

fin=open('csco.csv','rb')
table = csv.reader(fin)
fin.close

the last line should be:

fin.close()

The parentheses aren't optional; without them you're just referring to
the method, not calling it.
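To make the difference concrete, here is a small sketch (Python 3, using a throwaway file in a temporary directory rather than the real csco.csv). It writes one line, evaluates `fout.close` without calling it, and checks what another reader sees on disk before and after the real `fout.close()` call. The empty read before closing reflects CPython's buffered file objects; treat it as typical behaviour rather than a guarantee.

```python
import os
import tempfile

payload = b"2009-04-07,17.20,17.25,16.58,16.85,59902600,16.85\n"
path = os.path.join(tempfile.mkdtemp(), "csco.csv")

fout = open(path, "wb")
fout.write(payload)

fout.close  # no parentheses: evaluates the bound method and discards it
print(fout.closed)  # False: the file is still open

with open(path, "rb") as check:
    before = check.read()
print(before)  # in CPython, usually b'': the bytes are still in fout's write buffer

fout.close()  # the actual call flushes the buffer and closes the file
print(fout.closed)  # True

with open(path, "rb") as check:
    after = check.read()
print(after == payload)  # True: everything has reached the disk
```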

Because you weren't closing the file, not all of the text had been
written to disk by the time you read it back. When return_ranking()
returns, there's no longer any reference to 'fout', so the file object
becomes available for garbage collection. When the file object is
collected, it writes the remaining text to disk. In CPython the file
object is collected as soon as there are no references to it, but in
other implementations that might not be the case.
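Putting both corrections together, a minimal Python 3 sketch that relies on `with` blocks instead of remembering to call close(), so it doesn't depend on when the garbage collector runs. One caveat on the fin.close() correction: csv.reader pulls lines from the file lazily, so the file has to stay open until the loop has finished; calling fin.close() before the for loop makes the iteration fail with ValueError. `save_and_read` and `closed_too_early` are hypothetical helper names, and `reply` stands in for the downloaded data.

```python
import csv
import os
import tempfile


def save_and_read(reply, path):
    """Hypothetical helper: write the downloaded bytes, then parse them back.

    Each ``with`` block closes (and therefore flushes) its file on exit,
    even if an exception is raised inside it.
    """
    with open(path, "wb") as fout:
        fout.write(reply)
    # fout is closed here, so every byte of reply is on disk.

    rows = []
    # newline="" is what the csv module documentation recommends for reading.
    with open(path, newline="") as fin:
        for row in csv.reader(fin):  # iterate while fin is still open
            rows.append(row)
    return rows


def closed_too_early(path):
    """Show what happens if the reader's file is closed before iterating."""
    fin = open(path, newline="")
    table = csv.reader(fin)
    fin.close()
    try:
        return list(table)
    except ValueError as exc:  # I/O operation on closed file
        return exc


reply = (
    b"2009-04-08,17.29,17.33,16.94,17.13,45389100,17.13\n"
    b"2009-04-07,17.20,17.25,16.58,16.85,59902600,16.85\n"
)
path = os.path.join(tempfile.mkdtemp(), "csco.csv")

for row in save_and_read(reply, path):
    print(row)
print(repr(closed_too_early(path)))
```

In the original Python 2 code, the same idea means writing, closing, and only then reading, with the close of fin placed after the for loop.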
