Download excel file from web?

patf

Jul 28, 2008

#1

Hi - experienced programmer but this is my first Python program.

This URL will retrieve an Excel spreadsheet containing (that day's)
MSCI stock index returns.

http://www.mscibarra.com/webapp/ind...sOf=Jul+25,+2008&export=Excel_IEIPerfRegional

I want to write Python to download and save the file.

So far I've arrived at this:

patf

Jul 28, 2008

#2

# import pdb
import urllib2
from win32com.client import Dispatch

xlApp = Dispatch("Excel.Application")

# test 1
# xlApp.Workbooks.Add()
# xlApp.ActiveSheet.Cells(1,1).Value = 'A'
# xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
# xlBook = xlApp.ActiveWorkbook
# xlBook.SaveAs(Filename='C:\\test.xls')

# pdb.set_trace()
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
# test 2 - returns check = False
check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()

xlApp = response.fp
print(response.fp.name)
print(xlApp.name)
xlApp.write
xlApp.Close

Whoops, hit Send when I wanted Preview. Looks like the HTML tag
doesn't work from groups.google.com (nice).

Anyway, in test 1 above, I determined how to instantiate an Excel
object, put some stuff in it, then save it to disk.

So, in theory, I'm retrieving my excel spreadsheet with

response = urllib2.urlopen()

Except what then do I do with this?

Well, for one, I read some of the urllib2 documentation and found the
Request class with the method has_data() on it. It returns False.
Hmm, that's not encouraging.
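
A note on has_data(): in urllib2 it reports whether the Request object itself carries a body to send (POST data), not whether the server returned anything, so False is what a plain GET request gives. Here is a minimal sketch of the same distinction, assuming Python 3, where urllib2 became urllib.request and has_data() was replaced by the `data` attribute. The example.com URL is only a placeholder.

```python
from urllib.request import Request

# Placeholder URL; constructing a Request does not contact the server.
url = "http://example.com/report?export=Excel"

get_request = Request(url)
post_request = Request(url, data=b"priceLevel=0")

# No body attached, so this is a GET (urllib2's has_data() would be False).
print(get_request.data is None, get_request.get_method())
# A body is attached, so the request would be sent as a POST.
print(post_request.data is not None, post_request.get_method())
```

Either way, the downloaded content comes from the object returned by urlopen(), not from the Request.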

I suppose the trick is to understand what urllib2.urlopen is returning
to me, rummage around in there, and hopefully find my Excel file.

I use pdb to debug. This is interesting:

(Pdb) dir(response)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
'readline', 'readlines', 'url']
(Pdb)

I suppose the members with __*__ are methods; and the names without the
underbars are attributes (variables) (?).

Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).

pat

Diez B. Roggisch

Jul 28, 2008

#3

patf wrote:

I suppose the members with __*__ are methods; and the names without the
underbars are attributes (variables) (?).

No, these are the names of all attributes and methods. read is a method,
for example.
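
One way to see which names from dir() are methods and which are plain attributes is to test each member with callable(). This is a rough sketch, assuming Python 3 and using io.BytesIO as a stand-in for the urlopen() response; split_members is a made-up helper name.

```python
import io

# Stand-in for the urlopen() response: any file-like object will do.
response = io.BytesIO(b"spreadsheet bytes")

def split_members(obj):
    """Return (methods, attributes) among the public names listed by dir(obj)."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and double-underscore names
        member = getattr(obj, name)
        (methods if callable(member) else attributes).append(name)
    return methods, attributes

methods, attributes = split_members(response)
print("methods:", methods)
print("attributes:", attributes)
```

The leading underscores say nothing about whether a name is a method; they only mark names that are special or private by convention.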

patf wrote:

Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).

The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
clear on this:

"""
This function returns a file-like object with two additional methods:
"""

And then for file-like objects:

http://docs.python.org/lib/bltin-file-objects.html

"""
read( [size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as a
string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire as
close to size bytes as possible. Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given.
"""

Diez
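
The read([size]) behaviour quoted above can be sketched as a loop that keeps asking for chunks until an empty result signals EOF. This assumes Python 3, where read() returns bytes rather than a str. io.BytesIO stands in for the urlopen() response, and read_all is a hypothetical helper, not part of any library.

```python
import io

def read_all(fileobj, chunk_size=8192):
    """Collect everything from a file-like object by calling read(size) until EOF."""
    chunks = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty result means EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)

# BytesIO stands in for the object returned by urlopen().
fake_response = io.BytesIO(b"x" * 26000)
data = read_all(fake_response, chunk_size=4096)
print(len(data))
```

Calling read() with no argument does the same job in one step when the whole body fits comfortably in memory.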

patf

Jul 28, 2008

#4

Diez B. Roggisch wrote:

No, these are the names of all attributes and methods. read is a method,
for example.

right - I got it backwards.

Just stumbled upon .read:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read

Now the question is: what to do with this? I'll look at the
documentation that you point to.

thanx - pat

patf

Jul 28, 2008

#5

Or rather (next iteration):

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)

The file is generally something like 26 KB so specifying 1,000,000
seems like a good idea (first approximation).

And then when I do:

print(response)

I get a whole lot of garbage (and some non-garbage), so I know I'm
onto something.

When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?

pat

Guilherme Polo

Jul 28, 2008

#6

patf wrote:

When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?

You don't need to convince Python, just write it to a file.
More reading for you: http://docs.python.org/tut/node9.html
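
"Just write it to a file" could look roughly like this. It is a sketch assuming Python 3, where read() hands back a bytes object. The payload and the file name are made up, and save_bytes is a hypothetical helper. No conversion step is involved: the bytes on disk are the spreadsheet.

```python
import os
import tempfile

def save_bytes(data, path):
    """Write data to path byte for byte (binary mode, no translation)."""
    with open(path, "wb") as f:
        f.write(data)

# Made-up payload standing in for whatever read() returned.
payload = b"\xd0\xcf\x11\xe0 not a real workbook\n"

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "msci_example.xls")  # hypothetical file name
    save_bytes(payload, target)
    with open(target, "rb") as f:
        round_trip = f.read()

print(round_trip == payload)
```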

patf

Jul 29, 2008

#7

OK:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'w')
f.write(response)

OK, this makes the file: there's a c:\msci.xls in place and it's
about the right size. But whether I make the second parameter to open 'w'
or 'wb', when I try to open msci.xls from Windows Explorer,
Excel tells me that the file is corrupted.

pat
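
A likely cause of this kind of corruption: on Windows, mode 'w' opens the file in text mode, and every newline byte (0x0A) written is expanded to a carriage return plus newline, which shifts the contents of a binary file. Here is a sketch of the effect, assuming Python 3. It forces the Windows-style translation with newline="\r\n" so it shows up on any platform, and the sample bytes are made up.

```python
import os
import tempfile

# Made-up bytes containing a newline byte (0x0A), as binary files often do.
original = b"\xd0\xcf\x11\xe0\n\x00\x01"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.bin")
    binary_path = os.path.join(tmp, "binary_mode.bin")

    # Text mode with Windows-style newline translation. latin-1 maps each
    # byte to one character, so only the newline handling alters the data.
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(original.decode("latin-1"))

    # Binary mode writes the bytes untouched.
    with open(binary_path, "wb") as f:
        f.write(original)

    with open(text_path, "rb") as f:
        via_text = f.read()
    with open(binary_path, "rb") as f:
        via_binary = f.read()

print("text mode preserved the data:", via_text == original)
print("binary mode preserved the data:", via_binary == original)
```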

patf

Jul 29, 2008

#8

Nope - must have been stumbling over my own feet.

'wb' _is_ necessary (as I would expect).

So it works:

# pdb.set_trace()
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'wb')
f.write(response)
f.flush()  # the parentheses are needed to actually call the method
f.close()

I know the f.flush and f.close are redundant - in the sense that both
flush the contents to disk. So I can probably just take out the
f.flush.

Thanx for the help.

pat
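
For anyone finding this thread later, the same idea might be sketched as below. It assumes Python 3, where urllib2's urlopen lives in urllib.request. The download function and its opener parameter are made-up names for illustration; the opener hook exists only so the example can run without a network connection. A with block closes the file on exit, which also flushes it, so no explicit flush or close call is needed.

```python
import io
import os
import shutil
import tempfile
from urllib.request import urlopen

def download(url, dest_path, opener=urlopen):
    """Copy the response body for url into dest_path; return bytes written."""
    with opener(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # streams in chunks, no size guess
        return out.tell()

# Offline demonstration: a fake opener returns canned bytes instead of
# contacting a server. For a real download, call download(url, path).
canned = b"pretend spreadsheet bytes" * 1000

def fake_opener(url):
    return io.BytesIO(canned)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "msci.xls")
    written = download("http://example.invalid/report", path, opener=fake_opener)
    saved_size = os.path.getsize(path)

print(written, saved_size)
```

Streaming with shutil.copyfileobj avoids having to pick an upper bound such as 1000000 for read().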

Guilherme Polo

Jul 29, 2008

#9

On Jul 28, 3:29 pm, "Diez B. Roggisch" wrote:

(e-mail address removed) wrote:

Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).

Click to expand...

The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
clear on this:

Click to expand...

"""
This function returns a file-like object with two additional methods:
"""

Click to expand...

And then for file-like objects:

http://docs.python.org/lib/bltin-file-objects.html

Click to expand...

"""
read( [size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as a
string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire as
close to size bytes as possible. Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given.
"""

Diez

Click to expand...

Just stumbled upon .read:

Click to expand...

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
excel?
priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
+25%2C+2008&export=Excel_IEIPerfRegional').read

Click to expand...

Now the question is: what to do with this? I'll look at the
documentation that you point to.

Click to expand...

thanx - pat

Click to expand...

Or rather (next iteration):

Click to expand...

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
excel?
priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)

Click to expand...

The file is generally something like 26 KB so specifying 1,000,000
seems like a good idea (first approximation).

Click to expand...

And then when I do:

I get a whole lot of garbage (and some non-garbage), so I know I'm
onto something.

Click to expand...

When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?

Click to expand...

You don't need to convince Python, just write it to a file.
More reading for you:http://docs.python.org/tut/node9.html

pat

Click to expand...

Click to expand...

OK:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'w')
f.write(response)


I would initially change that to:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/ind...sOf=Jul+25,+2008&export=Excel_IEIPerfRegional')

f = open("c:\\msci.xls", "wb")
for line in response:
    f.write(line)
f.close()

and then..

OK this makes the file, and there's a c:\msci.xls in place and it's
about the right size. But whether I make the second param to open 'w'
or 'wb', when I try to open msci.xls from the Windows file explorer,
excel tells me that the file is corrupted.


try it.

P

patf

Jul 29, 2008

#10

A simple f.write(response) does work (click on a single row in Excel
and you get a single row).

But I can see that what you recommend, Guilherme, is probably safer -
thanx.

pat

M

MRAB

Jul 29, 2008

#11

If response contains a string then:

for line in response:
    f.write(line)

will actually be writing the string one character at a time!
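
To make that concrete, here is a small sketch that counts loop iterations over the data itself versus over a file-like object wrapping the same data. It is an illustration in Python 3, not the thread's Python 2: io.BytesIO stands in for the HTTP response and the payload is made up. One difference to be aware of: in Python 3, iterating a bytes object yields integers rather than one-character strings, so f.write(line) inside such a loop would raise a TypeError instead of merely being slow.

```python
import io

# Made-up payload containing two newline bytes.
payload = b"row1\r\nrow2\r\nrow3"

# Iterating the data itself takes one step per byte.
steps_over_data = sum(1 for _ in payload)

# Iterating a file-like object takes one step per line.
steps_over_file = sum(1 for _ in io.BytesIO(payload))

print(steps_over_data, steps_over_file)
```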

P

patf

Jul 29, 2008

#12

Hmm. In this case, response was a string object (that's what
urllib2.urlopen().read() returns).

My concern was with line-ending characters (delimiters). I was
thinking that if the string object doesn't contain line-ending
delimiters, then maybe the for loop was better. Although that raises the
question of how

for line in response:

recognizes lines (as defined by line-ending delimiters) in the first
place.
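
On how the loop finds lines: a file-like object splits at each newline byte and keeps the terminator attached to the piece it ends, so writing the pieces back in order reproduces the original bytes exactly, whether or not the data is really "text". The sketch below illustrates this in Python 3 with io.BytesIO standing in for the response object (an assumption: the real response follows the same file-like iteration behaviour). The sample bytes are made up.

```python
import io

# Made-up binary data with a bare "\n", a "\r\n", and a tail with no newline.
data = b"\x00\x01first\nsecond\r\n\xff\xfe tail without newline"

# Each iteration returns everything up to and including the next b"\n".
lines = list(io.BytesIO(data))

# Joining the pieces gives back the original bytes unchanged.
rebuilt = b"".join(lines)

for piece in lines:
    print(repr(piece))
print(rebuilt == data)
```

If the data contained no newline bytes at all, the loop would simply run once and hand back everything in a single piece.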

pat

G

Guilherme Polo

Jul 29, 2008

#13

Did you notice I removed the read(...) part?
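
With the read(...) call removed, response is the file-like object itself, so it can be copied to disk in chunks without guessing a maximum size. A minimal sketch of that idea follows, in Python 3 and runnable offline: io.BytesIO stands in for what urlopen would return, and save_stream, the chunk size, the sample bytes and the temporary path are all hypothetical names and values chosen for illustration.

```python
import io
import os
import shutil
import tempfile

def save_stream(response, path, chunk_size=64 * 1024):
    """Copy a file-like object to path in binary mode, chunk by chunk."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)

# Stand-in for the HTTP response; the bytes are made up.
expected = b"\xd0\xcf\x11\xe0" + b"\x00\n" * 1000
fake_response = io.BytesIO(expected)

target = os.path.join(tempfile.mkdtemp(), "msci.xls")
save_stream(fake_response, target)

with open(target, "rb") as fh:
    saved = fh.read()

print(saved == expected)
```

Copying in fixed-size chunks avoids depending on where newline bytes happen to fall in binary data, which is the main difference from looping line by line.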

P

patf

Jul 29, 2008

#14

(e-mail address removed) schrieb:
Hi - experienced programmer but this is my first Python program.
This URL will retrieve an excel spreadsheet containing (that day's)
msci stock index returns.
http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
Want to write python to download and save the file.
So far I've arrived at this:

# import pdb
import urllib2
from win32com.client import Dispatch
xlApp = Dispatch("Excel.Application")
# test 1
# xlApp.Workbooks.Add()
# xlApp.ActiveSheet.Cells(1,1).Value = 'A'
# xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
# xlBook = xlApp.ActiveWorkbook
# xlBook.SaveAs(Filename='C:\\test.xls')
# pdb.set_trace()
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/
excel?
priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
+25%2C+2008&export=Excel_IEIPerfRegional')
# test 2 - returns check = False
check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/
indexperf/excel?
priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul
+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
xlApp = response.fp
print(response.fp.name)
print(xlApp.name)
xlApp.write
xlApp.Close

Click to expand...

Woops hit Send when I wanted Preview. Looks like the html

tag
doesn't work from groups.google.com (nice).
Anway, in test 1 above, I determined how to instantiate an excel
object; put some stuff in it; then save to disk.
So, in theory, I'm retrieving my excel spreadsheet with
response = urllib2.urlopen()
Except what then do I do with this?
Well for one read some of the urllib2 documentation and found the
Request class with the method has_data() on it. It returns False.
Hmm that's not encouraging.
I supposed the trick to understand what urllib2.urlopen is returning
to me; rummage around in there; and hopefully find my excel file.
I use pdb to debug. This is interesting:
(Pdb) dir(response)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
'code', '
fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
'readline', '
readlines', 'url']
(Pdb)
I suppose the members with __*_ are methods; and the names without the
underbars are attributes (variables) (?).
No, these are the names of all attributes and methods. read is a method,
for example.
right - I got it backwards.
Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).
The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
clear on this:
"""
This function returns a file-like object with two additional methods:
"""
And then for file-like objects:
http://docs.python.org/lib/bltin-file-objects.html
"""
read( [size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as a
string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire as
close to size bytes as possible. Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given.
"""
Diez
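To make the quoted read([size]) description concrete, here is a minimal sketch. It assumes Python 3 and uses an in-memory io.BytesIO object as a stand-in for the response (both are my assumptions, not something from the thread, which uses Python 2's urllib2 where the result is a str rather than bytes). It shows that read(size) returns at most size bytes, so an oversized guess such as 1000000 simply returns whatever is left.

```python
import io

# Stand-in for the file-like object returned by urlopen (assumption).
stream = io.BytesIO(b"0123456789")

first = stream.read(4)        # at most 4 bytes
rest = stream.read(1000000)   # size larger than what is left: returns the remainder
at_eof = stream.read()        # already at EOF: returns an empty result

# With no size argument, read() returns everything up to EOF.
everything = io.BytesIO(b"0123456789").read()

print(first, rest, at_eof, everything)
```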
Just stumbled upon .read:
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
Now the question is: what to do with this? I'll look at the
documentation that you point to.
thanx - pat
Or rather (next iteration):
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
The file is generally something like 26 KB so specifying 1,000,000
seems like a good idea (first approximation).
And then when I do:
print(response)
I get a whole lot of garbage (and some non-garbage), so I know I'm
onto something.
When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?
You don't need to convince Python, just write it to a file.
More reading for you: http://docs.python.org/tut/node9.html
pat
--
http://mail.python.org/mailman/listinfo/python-list
--
-- Guilherme H. Polo Goncalves
OK:
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'w')
f.write(response)
I would initially change that to:
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...)
f = open("c:\\msci.xls", "wb")
for line in response:
    f.write(line)
f.close()
and then..
OK this makes the file, and there's a c:\msci.xls in place and it's
about the right size. But whether I make the second param to open 'w'
or 'wb', when I try to open msci.xls from the Windows file explorer,
excel tells me that the file is corrupted.
try it.
pat
--
-- Guilherme H. Polo Goncalves
A simple f.write(response) does work (click on a single row in Excel
and you get a single row).
But I can see that what you recommend Guilherme is probably safer -
thanx.
pat

If response contains a string then:

Did you notice I removed the read(...) part ?

for line in response:
    f.write(line)

will actually be writing the string one character at a time!

Actually no I didn't Guilherme (although I'll take it out now).

Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB would
seem to indicate, that the following for loop would act byte-by-byte?
And if so, how?

Even with the .read() in, it was very fast. But it looks like it
won't hurt (and very possibly helps) to take it out.

pat
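
A small sketch of the difference being asked about, assuming Python 3 and using an in-memory io.StringIO as a stand-in for the response object (the sample data and variable names are made up for illustration). Iterating over a string yields one character at a time, while iterating over a file-like object yields one line at a time.

```python
import io

data = "row1\nrow2\nrow3\n"  # made-up sample content

# Iterating over a string (what .read() returns) yields single characters.
chars = [piece for piece in data]

# Iterating over a file-like object yields whole lines.
lines = [piece for piece in io.StringIO(data)]

print(len(chars))  # number of write calls the loop would make on the string
print(lines)
```

Either way the pieces join back into the same data, which would explain why the saved file came out fine with .read() left in: the loop just makes many more, much smaller writes.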

G

Guilherme Polo

Jul 29, 2008

#16

Actually no I didn't Guilherme (although I'll take it out now).

Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB would
seem to indicate, that the following for loop would act byte-by-byte?
And if so, how?

.read() returns a string, so yes.
The point of removing the .read(xxxxx) is that you no longer need to
guess how long the file is in order to read it entirely.
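
For anyone reading this later, a minimal sketch of the same idea, assuming Python 3, where urllib2 became urllib.request. The helper names save_stream and download are hypothetical, and the payload is arbitrary bytes rather than a real spreadsheet. The copy runs until EOF, so no length has to be guessed, and the file is opened in "wb" so the bytes are written unchanged.

```python
import io
import os
import shutil
import tempfile
import urllib.request


def save_stream(source, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to path, chunk by chunk, until EOF."""
    with open(path, "wb") as out:  # binary mode: no newline translation
        shutil.copyfileobj(source, out, chunk_size)


def download(url, path):
    """Fetch url and save the response body to path (needs network access)."""
    with urllib.request.urlopen(url) as response:
        save_stream(response, path)


# Offline check: an in-memory stream stands in for the HTTP response.
payload = b"arbitrary binary bytes\r\n\x00\x1a\xff"
target = os.path.join(tempfile.mkdtemp(), "msci.xls")
save_stream(io.BytesIO(payload), target)
with open(target, "rb") as saved:
    round_trip = saved.read()
```

In real use it would be called as download(url, "c:\\msci.xls") with the full MSCI URL from the first post.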
