REXML/RSS parse error

  • Thread starter Patrick Plattes

Patrick Plattes

Hello,

I have a problem parsing an RSS file. I open a URL via
open-uri and it usually works fine, but with the RSS URLs from ccMixter
I get a parse error. It's a bit strange, because if I download the file
and open it locally, it parses fine.

I tried:
rss = RSS::Parser.parse("http://ccmixter.org/media/api/query...5&tags=remix+editorial_pick&rand=1&format=rss", false)

And got:
RSS::NotWellFormedError: This is not well formed XML
Missing end tag for 'html' (got "head")
Line:
Position:
Last 80 unconsumed characters:
from /usr/lib/ruby/1.8/rss/rexmlparser.rb:24:in `_parse'
from /usr/lib/ruby/1.8/rss/parser.rb:163:in `parse'
from /usr/lib/ruby/1.8/rss/parser.rb:78:in `parse'
from (irb):43


If I save the file and open it locally, it works fine:
rss = RSS::Parser.parse("query", false)


IMHO there should be no difference between opening a local file and opening a URL.


Thanks for all the help I have received from this list over the last few days,
Patrick
 

Kouhei Sutou

Hi,

In <[email protected]>
"REXML/RSS parse error" on Thu, 7 Dec 2006 19:45:53 +0900,
Patrick Plattes said:
I have a problem parsing an RSS file. I open a URL via
open-uri and it usually works fine, but with the RSS URLs from ccMixter
I get a parse error. It's a bit strange, because if I download the file
and open it locally, it parses fine.

I tried:
rss = RSS::Parser.parse("http://ccmixter.org/media/api/query...5&tags=remix+editorial_pick&rand=1&format=rss", false)

And got:
RSS::NotWellFormedError: This is not well formed XML
Missing end tag for 'html' (got "head")

I got some garbage after the end of the RSS 2.0 document:

% ruby -r open-uri -e 'puts open("http://ccmixter.org/media/api/query...5&tags=remix+editorial_pick&rand=1&format=rss").read' | tail -n 25
</item>
</channel>
</rss>
"/web/ccmixter/www/cclib/cc-util.php"(205): Cannot modify header information - headers already sent by (output started at /web/ccmixter/www/cclib/cc-feed.php:432) [2006-12-07 07:10 am][138.243.129.4][/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss]
<html>
<head>
<style>
body {
font-size: 11px;
font-family: Verdana, sans-serif;
background-color: #F99;
margin: 4%;
text-align: center;
}
</style>
</head>
<body>
<p> <img src="/mixter-files/skull.gif" /></p>
<h3>wups, ccMixter is experiencing technical difficulties...</h3>
<p>If you were in the middle of an upload or posting a message it probably worked OK
but you should click <a href="/">here</a> to get back to the site's home page or
use your browser's BACK button to return to the site and make sure.</p>
<p>The admins have been notified of the problem and will look into it very shortly.</p>
</body>
</head>


Thanks,
 

Patrick Plattes

Kouhei said:
I got some garbage after the end of the RSS 2.0 document:

Thank you, I hadn't seen it. I've written an e-mail to them, but
most RSS readers are able to parse this malformed file. Do you know of any
way to force the parser to read this file? For RSS it would be OK to
stop parsing after the closing RSS tag.

Thanks,
Patrick
 

Kouhei Sutou

Hi,

In <[email protected]>
"Re: REXML/RSS parse error" on Thu, 7 Dec 2006 23:56:38 +0900,
Patrick Plattes said:
most RSS readers are able to parse this malformed file. Do you know of any
way to force the parser to read this file? For RSS it would be OK to
stop parsing after the closing RSS tag.

What about gsub(/<\/rss>.*\z/m, '</rss>')?
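A minimal sketch of that suggestion, assuming the feed body has already been fetched into a string. The helper name `trim_after_rss` and the sample feed text are made up for illustration; only the regular expression comes from the line above. It cuts everything that follows the first closing `</rss>` tag, so the trailing error page never reaches the parser.

```ruby
# Hypothetical helper (not part of the rss library): replace the first
# closing </rss> tag and everything after it with just the closing tag.
# The /m flag lets "." match newlines, so the match runs to the end of
# the string (\z).
def trim_after_rss(xml)
  xml.gsub(/<\/rss>.*\z/m, '</rss>')
end

# Made-up sample input: a tiny feed followed by the kind of trailing
# error output shown earlier in the thread.
raw = <<~'XML'
  <?xml version="1.0"?>
  <rss version="2.0">
  <channel>
  <title>example</title>
  <link>http://example.com/</link>
  <description>sample feed</description>
  </channel>
  </rss>
  "/some/script.php"(205): Cannot modify header information
  <html>
  <head>
  </head>
  <body><p>error page</p></body>
  </head>
XML

clean = trim_after_rss(raw)
puts clean
```

With the fetched text in a variable such as `body`, the trimmed string could then be passed on as `RSS::Parser.parse(trim_after_rss(body), false)`. This works around the broken server output rather than fixing it, so it is probably best treated as a stopgap until the feed itself is repaired.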

Thanks,
 
