Charset detection


Miquel Oliete

Hi

I'm writing planet software in Ruby, like Planet Planet, but inserting
all RSS data into a database (MySQL for now) and showing the entries from
the database.

I don't know how to detect the RSS charset. Can you help me?

Thanks in advance

Kind regards

--

Miquel (a.k.a. Ton)
Linux User #286784
GPG Key : 4D91EF7F
Debian GNU/Linux (Linux Wolverine 2.6.14)

Welcome to the jungle, we got fun and games
Guns n' Roses



David Vallner

Miquel said:
Hi

I'm coding a planet software in Ruby, like planet planet, but inserting
all rss data into a database (mysql now) and showing the entries from
database.

I don't know how can I detect the rss charset. Can you help me?

Look for the XML header? That one should list encoding.

If it doesn't, bitch, whine, and moan at the feed author to do so.
Charset detection is unavoidably a hack and shouldn't have to be done by
now if interoperating apps are coded sanely. (And if someone doesn't code
an RSS feed provider with interop in mind, there is no God anymore.)

David Vallner
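
As a rough illustration of "look for the XML header", here is a minimal sketch that reads the encoding name out of the XML declaration at the top of a feed. The helper name declared_xml_encoding is made up for this example, and the regular expression is a simplification, not a full XML parser.

```ruby
# Hypothetical helper (not from any library): returns the encoding named
# in the XML declaration, or nil if the feed does not declare one.
def declared_xml_encoding(xml)
  # Work on a binary copy of the first bytes only; the declaration,
  # if present, must come at the very start of the document.
  head = xml.byteslice(0, 200).b

  # Skip a UTF-8 byte order mark if the feed starts with one.
  bom = "\xEF\xBB\xBF".b
  head = head.byteslice(3..-1) if head.start_with?(bom)

  match = head.match(/\A<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/)
  match && match[1]
end
```

For a feed that begins with `<?xml version="1.0" encoding="ISO-8859-1"?>` this returns the string ISO-8859-1; for a feed with no encoding attribute it returns nil, and the caller has to decide what to do next.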


 

James Edward Gray II

Look for the XML header? That one should list encoding.

If it doesn't, bitch, whine, and moan at the feed author to do so,
charset detection is unavoidably a hack and shouldn't have to be done by
now if interoperating apps are coded sanely.

Right, 'cause XML encoding headers *never* lie. ;)

If you have the header it's probably best to trust it. If not,
libcharguess is quite accurate, even if David labels it a "hack."

James Edward Gray II
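
The "trust the header, otherwise guess" idea can be sketched without libcharguess, using only Ruby's built-in String encoding methods. This is a much cruder stand-in than a statistical guesser: it tries the declared encoding first, then UTF-8, and finally assumes ISO-8859-1, which accepts any byte sequence. The helper name feed_to_utf8 and the choice of ISO-8859-1 as the last resort are assumptions made for this example.

```ruby
# Hypothetical helper: convert raw feed bytes to a UTF-8 string.
# `declared` is the encoding name taken from a header, or nil if none.
def feed_to_utf8(raw, declared = nil)
  # Candidates in order of trust. ISO-8859-1 is an assumed last resort.
  candidates = [declared, "UTF-8", "ISO-8859-1"].compact

  candidates.each do |name|
    begin
      text = raw.dup.force_encoding(name)
      # A header that lies usually shows up as an invalid byte sequence.
      next unless text.valid_encoding?
      return text.encode(Encoding::UTF_8)
    rescue ArgumentError, EncodingError
      # Unknown encoding name or failed conversion: try the next candidate.
      next
    end
  end

  nil # not expected to be reached, since ISO-8859-1 accepts every byte
end
```

The UTF-8 check works as a cheap detector because text in a legacy 8-bit charset that contains accented characters rarely forms valid UTF-8. It cannot tell ISO-8859-1 from other single-byte charsets, which is where a real guesser earns its keep.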
 

Josef 'Jupp' Schugt

* Miquel Oliete, 11/08/2006 10:50 PM:
I don't know how can I detect the rss charset. Can you help me?

The only way is to write an artificial intelligence that a) can
understand any language present in the feeds and b) tries out all
possible encodings until it understands the text.

Given the state and evolution speed of current implementations of
artificial intelligence it can be expected that such software will be
available soon - as early as next millennium or so.

Jupp
 

Richard Conroy

Look for the XML header? That one should list encoding.

Shouldn't you be looking at the HTTP header instead/also?

Or just default to UTF-8, which *should* cover you anyway for 8859-1-loving
Anglophiles and most of the rest of the world. Japan, though, can be a
bit native-charset-centric, especially the further you get from well-resourced
web sites (hobby sites, etc.). There was a larger UTF-8 burden of effort there,
whereas in the West being non-UTF-8 is just pure laziness.
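
For the HTTP side, a minimal sketch of pulling the charset parameter out of a Content-Type header value might look like this. The helper name charset_from_content_type is invented for the example; with Net::HTTP the header value itself would come from something like response['Content-Type'].

```ruby
# Hypothetical helper: extract the charset parameter from a Content-Type
# value such as 'application/rss+xml; charset=ISO-8859-1'.
# Returns nil when the header is missing or names no charset.
def charset_from_content_type(content_type)
  return nil if content_type.nil?

  # The parameter name is case-insensitive and the value may be quoted.
  match = content_type.match(/;\s*charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1]
end
```

One possible policy, stated here as an assumption rather than a rule, is to use the HTTP charset when the server sends one, fall back to the XML declaration, and only then default to UTF-8.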
 
