why not in python 2.4.3

Rocco

hi
I made the upgrade to python 2.4.3 from 2.4.2.
I want to take some Atom feeds from Google News with a function like
this:

import urllib2

def takefeed(url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    return data

url = 'http://news.google.it/?output=rss'
d = takefeed(url)

This woks well with python 2.3.5 but does not work with 2.4.3.
Why?
Thanks
 
Carl Banks

Rocco said:
hi
I made the upgrade to python 2.4.3 from 2.4.2.
I want to take some Atom feeds from Google News with a function like
this:

import urllib2

def takefeed(url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    return data

url = 'http://news.google.it/?output=rss'
d = takefeed(url)

This woks well with python 2.3.5 but does not work with 2.4.3.
Why?

Define "woks [sic] well". It works fine for me on 2.4.3 (and by "works
fine" I mean it ran without an exception and it returned what appeared
to be RSS data). If you would give us an exception trace it would help
a lot.

Maybe Google's server (or your ISP's) was down. That happens
sometimes.

Carl
 
Rocco

This is the problem when I run the function.
This is the result from 2.3.5:
<?xml version="1.0" encoding="UTF-8"?><feed version="0.3" xml:lang="it"
xmlns="http://purl.org/atom/ns#"><generator>NFE/1.0</generator><title>Google
News Italia</title><link rel="alternate" type="text/html"
href="http://news.google.it/"/><tagline>Google News
Italia</tagline><author><name>Google
Inc.</name><email>[email protected]</email></author><copyright>&amp;copy;2006
Google</copyright><modified>2006-05-28T19:09:13+00:00</modified>
<!-- A couple notes:
* add an "output=atom" param to get Atom
* section pages have a "topic=?" param;
use "topic=h" for a Top Stories section.
--><entry><title>Benedetto XVI: Wojtyla santo subito - Libertà
</title><link rel="alternate" type="text/html"
href="http://www.liberta.it/default.asp?IDG=605282024"/><id>tag:news.google.com,2005:cluster=41b535fb</id><summary>Prima
pagina</summary><issued>2006-05-28T11:05:00+00:00</issued><modified>2006-05-28T11:05:00+00:00</modified><content
type="text/html" mode="escaped">&lt;br&gt;&lt;table border=0 align=
cellpadding=5 cellspacing=0&gt;&lt;tr&gt;&lt;td width=80 align=center
valign=top&gt;&lt;a .....
This is the result from 2.4.3:
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\xe5}Ks\xe3F\xb6\xe6\xfeF\xdc\xff\x90\xd77\xba\xc3\x9e\x10D\xbc\x01\xcaU\xee\xa1\x9eM[\xa2\xd4$\xabl\xf7\x86\x93\x04\x93Tv\x81H\x1a\x0fV\xa9V\xfe\x0f3\x9b\x8e\x98\x89\xb8\xcb\x1b\xd1\xb3\x9a\xddD\xef\xec\x7f\xe2_2\xe7$\x00\x8a/\x11|\x93\xd6\xb4\xa3U"\x04\x02\x99\xe7d\x9e<\xdfy\xbe\xf9\xd3\xa7\xbeO\x86,\x8c\xb8\x08\xde~\xa1\x9d\xaa_\x10\x16x\xa2\xc3\x83\xde\xdb/\xde5\xaf\x15\xf7\x8b?}\xf3&\x8c\xa2\xe7\x9bt\xb8\xe9\x9b7\xde#\r\x02\xe6\x7f\xf3\xa6\xc7\x02\x16\xd2X\x84\xdf\xd4\xae\xafJ\xf0\x847\xa5\xe7Kob\x1e\xfb\xec\x9b\x1b!z>#5\xf61"\xd5\x98\xfa\x9c\xbe)\xa5\x7fy\xe3\xf3\xe0\xc37\x8fq<8+\x95\x02\xf8\xfbiO\xde{\xca\xe3\xd2\x9b\x92\xfc\xe3\x9b\x0e\x8b\xbc\x90\x0fbx\xfb\xdc\'\x8d\xff\xfd\x8dO\x83^B{\xec\x1b\x1e\xc3\xf7\xf3\x0fo>\xb2\xf6\x1d\x8db\x16~\x83/Q\xba\x8cu\xda\xd4\xfb\xf0_\xb3\xb7y\xa2\xff\xa6\xf4|\xcf\x1bO\x0c\x9eB\xde{\x8c\xbf\xf9#\xed\x0f\xbe\xc6\x8f_\xeb\xaaj\x93\xf4\xfdoJ\xcf7\xbc\x19$\xedK\x1a\xb3o\x1aIpBt\x97\xdc\xd1\'"\xef\xd5\xb53\xcd<3\x1crs\xd7|S\xcao\x83\x11F\xf1y\xc2\xfd\xce2\xdf\x9a\xbc\xf9_\xff\xe5\xcd\xbf)\n\xa9\x10O$\x03
C b\x16\x9d\xfd\xeb\xbf\x10\xfc\xdf\x7f!\xb4\xd3!4
_\x88$\x1e$\xf1[\xe0\xda\x17d@C\xda\'\xb1
=\x16\x93z\xa31\xba9b\x1eR\x0cn\xe8\xb1\x88<\xd2!#\x94|\x11\x8b\x01\xf7\xde\xfe)\xfb\xde\xd7\xd9\xdd\x84$\x11\xcb\xff\xf8\xf8\x05\xe9\x8a\x10nn\x8a\x01i\x00\x979|?{\xda\xe9\xbf\xfe\x8b\xa2|\xf3\x86\xf7%\xd1\x0b\x99\x9f\x84\xfe<\xde\x037J<\x88\xfd\x12\x8f[\xb0\x0e\xe4\xd3"yG+dp\x17\xef\xbe)\xe1W\x97Y<\xa5l,<f\xfd|D\x15\xf2=\xed\x88\x8f\xdcc\'\xc4\xe3q\xfc\xeb\x7f\x90\x80\xc2\xc8\x18\xe9p\xf2\x1d\r\x85\x7fB\xb8O\x1eD\x10\xb3.\xdc\x05\xd4!\xb4\xdbea\x1f\x1659==%\n\xa9\xfa\xa4\xc9\xfa\x03\xb1\xccB\xc6\x8f8\xe0?E\xf4m3]P\xf1[\xb8\xae*\xf0\x9f\xfc\xdc\xed\xbc\xad\xcb_\xe0\xae\xb7\xd9C>~\xfcx\xca\xfd\x18_\x82\x0f\xa1\x83A(\xba"\xe8\xf0>\x0bb\x0e\x04\xea\xb0O\xa74\x
No exception trace
Thanks again
 
Dennis Lee Bieber

No exception trace
Thanks again

PythonWin 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)]
on win32.
Portions Copyright 1994-2004 Mark Hammond ([email protected]) -
see 'Help/About PythonWin' for further copyright information.
Off-hand -- I'd say it is a problem with your installation... I
don't know -- some site default package changing encoding, perhaps?
 
Serge Orlov

Rocco said:
Also with ascii the function does not work.

Well, at least you fixed the misconfiguration ;)

Googling for 1F8B (the first two bytes of your strange Python 2.4
result) gives a hint: it's the beginning of a gzip stream. Maybe urllib2 in
Python 2.4 tells the server that it supports compressed data but
doesn't decompress it when it receives the reply?
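
To make the 1F8B hint concrete, here is a minimal sketch that reads the fixed 10-byte gzip header defined in RFC 1952 from the start of the dump posted above. It is written for a current Python 3 rather than the Python 2.4 used in this thread, and the function name parse_gzip_header is hypothetical, chosen only for this illustration.

```python
import struct

# First 10 bytes of the Python 2.4 result posted earlier in the thread.
header = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff"

def parse_gzip_header(data):
    """Return the fixed gzip header fields (RFC 1952), or None if not gzip."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        return None
    # Layout: 2-byte magic, compression method, flags,
    # 4-byte little-endian mtime, extra flags, OS identifier.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_id}

# Method 8 is deflate, the only compression method gzip defines.
print(parse_gzip_header(header))
# -> {'method': 8, 'flags': 0, 'mtime': 0, 'xfl': 2, 'os': 255}
```

Anything that does not start with those two magic bytes, such as the XML text returned under 2.3.5, yields None.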
 
Rocco

Thanks Serge.
It's a gzip string.
So the code is:

import gzip
import urllib2
from StringIO import StringIO

def takefeed(url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5;Windows NT')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    return data

url = 'http://news.google.it/?output=rss'
d = takefeed(url)
# The response body is gzip-compressed, so decompress it in memory
zipdata = StringIO(d)
gz = gzip.GzipFile(fileobj=zipdata)
rss = gz.read()
print len(rss)      # 102529
print rss[0:100]
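
Since others in this thread get uncompressed data from the same URL, unconditional decompression would fail for them. One possible way around that, sketched here for a current Python 3 rather than 2.4, is to decompress only when the body starts with the gzip magic bytes. The helper name maybe_gunzip is hypothetical.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"

def maybe_gunzip(data):
    """Decompress data if it starts with the gzip magic bytes, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        # Same approach as the GzipFile/StringIO code above, with BytesIO in Python 3.
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            return gz.read()
    return data

plain = b'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"></rss>'
compressed = gzip.compress(plain)

# Both inputs end up as the same plain XML bytes.
print(maybe_gunzip(compressed) == plain)
print(maybe_gunzip(plain) == plain)
```

Sniffing the first two bytes is a heuristic; when the response headers are available, the Content-Encoding header is the more reliable signal.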
 
John Machin

Well, at least you fixed the misconfiguration ;)

Googling for 1F8B (the first two bytes of your strange Python 2.4
result) gives a hint: it's the beginning of a gzip stream.

Well done!
Maybe urllib2 in
Python 2.4 tells the server that it supports compressed data but
doesn't decompress it when it receives the reply?

Something funny is happening here. Others reported it working with 2.4.3
and Rocco's original code as posted in this thread -- which works for me
on 2.4.2, Windows XP.

There was one suss thing about Rocco's problem description:
First message ended with d=takefeed(url)
But next message said print rss
Is rss == d?

Cheers,
John
 
John Machin

Thanks Serge.
It's a gzip string.

Look, Ma, no gzip!!!

C:\junk>rocco_rss.py
'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><generator>NFE/1.0</generator><tit'

C:\junk>type rocco_rss.py
import urllib2
def takefeed(url):
    request=urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data=opener.open(request).read()
    return data
url='http://news.google.it/?output=rss'
d=takefeed(url)
print repr(d[:100])
 
Serge Orlov

John said:
Something funny is happening here. Others reported it working with 2.4.3
and Rocco's original code as posted in this thread -- which works for me
on 2.4.2, Windows XP.

It "works" for me too, returning raw uncompressed data.
There was one suss thing about Rocco's problem description:
First message ended with d=takefeed(url)
But next message said print rss
Is rss == d?

Nope. If you look at the XML tags, the 2.3 code returns <feed> <generator> ...
whereas the 2.4 code returns <rss> <channel> <generator> ... That may
explain why the 2.3 result is not compressed and the 2.4 result is compressed,
but that doesn't explain why the 2.4 result *is* compressed. I looked at Python
2.4 httplib, and I'm sure it's not the problem. A quote from httplib:

# we only want a Content-Encoding of "identity" since we don't
# support encodings such as x-gzip or x-deflate.

I think there is a web accelerator sitting somewhere between Rocco and
the Google server that is confused because Rocco is "misinforming" the web
server by saying he's using Firefox while at the same time claiming that he
cannot handle compressed data. That's why they teach little kids: don't lie :)
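
If an intermediary can ignore the request headers like this, a client can defend itself by stating the encoding it wants and then checking what actually came back. The following is a rough sketch for a current Python 3 (urllib.request rather than urllib2). The names build_request and decode_body and the User-Agent value are hypothetical, and only gzip and identity are handled.

```python
import gzip
import urllib.request

def build_request(url):
    """Build a request that explicitly asks for an uncompressed body."""
    return urllib.request.Request(url, headers={
        "User-Agent": "feed-fetcher-example/0.1",  # hypothetical value
        "Accept-Encoding": "identity",
    })

def decode_body(body, content_encoding):
    """Decode a response body according to its Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: %r" % encoding)

# Usage against a live server (not executed here):
# with urllib.request.urlopen(build_request(url)) as resp:
#     xml = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

This only helps when the intermediary labels the compressed body correctly. If it compresses without setting Content-Encoding, checking the first two bytes for 1F8B remains the fallback.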
 
