Questions on XML

joy99

Dear Group,

I would like to convert some simple strings of natural language to XML. May
I use Python to do this? Could anyone help me with this?

I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
use Python to help me in this regard?

How can I learn the XML-related parts of Python well? Could anyone kindly
name a book or URL?

I am using Python 2.6 on Windows XP with IDLE as GUI.

Best Regards,
Subhabrata.
 
David Smith

joy99 said:
Dear Group,

I like to convert some simple strings of natural language to XML. May
I use Python to do this? If any one can help me, on this.

I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
use Python to help me in this regard?

How can I learn good XML aspects of Python. If any one can kindly name
me a book or URL.

I am using Python2.6 on Windows XP with IDLE as GUI.

Best Regards,
Subhabrata.

Take a look at the xml.etree.ElementTree package and its contents. It's
included in the binary distributions of Python 2.6. There are lots of
books out there covering XML, and UTF-8 is exactly where you want to be
with XML.
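
For what it's worth, here is a minimal sketch of what that could look like. The element names ("text", "sentence") and the helper sentences_to_xml are made up for illustration, not anything from the original question, and I have only reasoned about it against a current Python 3, so details may differ on 2.6.

```python
# -*- coding: utf-8 -*-
from xml.etree import ElementTree as ET


def sentences_to_xml(sentences):
    """Wrap each sentence in a <sentence> element under a <text> root.

    Hypothetical helper: the tag names are illustrative assumptions.
    """
    root = ET.Element("text")
    for number, sentence in enumerate(sentences, 1):
        child = ET.SubElement(root, "sentence", id=str(number))
        child.text = sentence  # special characters are escaped on output
    # Asking for an explicit encoding makes tostring() return encoded bytes
    return ET.tostring(root, encoding="utf-8")


# \u escapes keep this file pure ASCII; the first string is "Bangla"
# written in Bengali script.
xml_bytes = sentences_to_xml([u"\u09ac\u09be\u0982\u09b2\u09be", u"a < b"])
```

The result is a UTF-8 byte string that can be written to a file opened in binary mode; the "<" in the second sentence comes out as an entity, so there is no need to escape anything by hand.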

--David
 
Rami Chowdhury

I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
use Python to help me in this regard?

I can say from experience that Python on Windows (at least, Python
2.5 on 32-bit Vista) works perfectly well with UTF-8 files containing
Bangla. I have had trouble working with the data in IDLE, however,
which seems to prefer ASCII by default.

-------------
Rami Chowdhury
"Never assume malice when stupidity will suffice." -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
 
Stefan Behnel

Rami said:
I can say from experience that Python on Windows (at least, Python 2.5
on 32-bit Vista) works perfectly well with UTF-8 files containing
Bangla. I have had trouble with working with the data in IDLE, however,
which seems to prefer ASCII by default.

Defaults almost never work for encodings. You have to be explicit: add an
encoding declaration to the top of your source file if you use encoded
literal strings in your code; use the codecs module with a suitable
encoding to read encoded text files, and use an XML parser when reading XML.
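
A rough sketch of those three points, under some assumptions: the file names and the scratch directory are hypothetical, the sample word is written with \u escapes so the source stays ASCII, and I have only reasoned about it against a current Python 3.

```python
# -*- coding: utf-8 -*-
# The line above is the source encoding declaration (PEP 263).
import codecs
import os
import shutil
import tempfile
from xml.etree import ElementTree as ET

text = u"\u09ac\u09be\u0982\u09b2\u09be"  # "Bangla" in Bengali script

# Hypothetical scratch directory so the sketch is self-contained
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "sample.txt")
xml_path = os.path.join(workdir, "sample.xml")

# Plain text: name the encoding explicitly when writing and reading
with codecs.open(txt_path, "w", encoding="utf-8") as handle:
    handle.write(text)
with codecs.open(txt_path, "r", encoding="utf-8") as handle:
    text_read = handle.read()

# XML: write it with an XML library, read it back with an XML parser,
# and let the parser deal with the encoding
root = ET.Element("doc")
root.text = text
ET.ElementTree(root).write(xml_path, encoding="utf-8")
xml_read = ET.parse(xml_path).getroot().text

shutil.rmtree(workdir)  # remove the scratch files again
```

Both text_read and xml_read should come back as the same Unicode string that went in.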

Stefan
 
Kee Nethery

Dear Group,

I like to convert some simple strings of natural language to XML. May
I use Python to do this? If any one can help me, on this.

I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
use Python to help me in this regard?

As a newbie, the thing that caused me trouble was importing a string
into the XML parser. The parser seemed to want to open a file and I
had a string. The solution was one of these:

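import StringIO
from xml.etree import ElementTree as et
# parse() expects a file or file-like object, so wrap the string first
theXmlDataTree = et.parse(StringIO.StringIO(theXmlString))


from xml.etree import ElementTree as et
# XML() parses the string itself and returns the root element
theXmlDataTree = et.ElementTree(et.XML(theXmlString))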

I'm not sure which you would use or what the differences are. I have the
first set commented out in my code, so for some reason I switched to
the second set to take a string and pull it into the XML parser.

Once the string is in the parser, all the examples worked. It was
getting it into the parser that had me stumped, because none of the
examples showed this situation; it appears to be obvious to someone
who has used Python for a while.

Kee
 
joy99

Defaults almost never work for encodings. You have to be explicit: add an
encoding declaration to the top of your source file if you use encoded
literal strings in your code; use the codecs module with a suitable
encoding to read encoded text files, and use an XML parser when reading XML.

Stefan

Dear Group,
Thanks for your reply. Python works perfectly for Hindi and Bangla on
Win XP. I have never had any trouble.
Best Regards,
Subhabrata.
 
Emmanuel Surleau

Defaults almost never work for encodings. You have to be explicit: add an
encoding declaration to the top of your source file if you use encoded
literal strings in your code; use the codecs module with a suitable
encoding to read encoded text files, and use an XML parser when reading
XML.

Actually, default *should* work for XML. The default encoding for an XML file
is UTF-8 (AFAIK).
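
A small sketch to illustrate, assuming the standard library parser: a document with no encoding declaration is read as UTF-8, while anything else (Latin-1 here, picked only as a contrast) has to be declared in the document itself. The tag name "w" is made up.

```python
from xml.etree import ElementTree as ET

word = u"\u09ac\u09be\u0982\u09b2\u09be"  # "Bangla" in Bengali script

# No <?xml ... encoding="..."?> declaration: the parser assumes UTF-8
utf8_doc = b"<w>" + word.encode("utf-8") + b"</w>"
from_default = ET.fromstring(utf8_doc).text

# A non-UTF-8 document must say so in its XML declaration
latin1_doc = (
    u'<?xml version="1.0" encoding="iso-8859-1"?><w>caf\u00e9</w>'
).encode("iso-8859-1")
from_declared = ET.fromstring(latin1_doc).text
```

Note this is about the bytes of the XML document; the encoding of your Python source file is a separate matter.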

Cheers,

Emm
 
Emmanuel Surleau

Dear Group,
Thanx for your reply. Python works perfectly for Hindi and Bangla with
Win XP. I never had a trouble.
Best Regards,
Subhabrata.

You might also want to have a look at lxml. It can do much more than the XML
module in the default distribution, uses the ElementTree API as well, and is
backed by the kickass, fast libxml2 library (http://codespeak.net/lxml/). It
will allow you to use XSLT, for instance. Regardless of whether you use lxml
or not, have a look at etree.iterparse; it is invaluable when processing huge
XML documents.
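
A minimal sketch of the iterparse idea, using the standard library version rather than lxml. The tag names ("corpus", "s") are hypothetical, and the small in-memory document only stands in for a large file on disk.

```python
import io
from xml.etree import ElementTree as ET

# Two sample sentences: "Bangla" and "Hindi" in their own scripts
sentences = [
    u"\u09ac\u09be\u0982\u09b2\u09be",
    u"\u0939\u093f\u0928\u094d\u0926\u0940",
]
body = u"".join(u"<s>%s</s>" % s for s in sentences)
source = io.BytesIO((u"<corpus>" + body + u"</corpus>").encode("utf-8"))

collected = []
# "end" events fire once an element has been read completely
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "s":
        collected.append(elem.text)
        # Drop the element's text and children once we are done with it.
        # The emptied element itself still hangs off its parent.
        elem.clear()
```

The point is that each element is handled as soon as it has been parsed, instead of building the full tree first and walking it afterwards.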

Cheers,

Emm
 
Rami Chowdhury

encoding declaration to the top of your source file if you use encoded
literal strings in your code

Any tips for how to set the encoding in IDLE? printing the Unicode
strings works -- trying to repr() the variable chokes with a
UnicodeDecodeError, and trying to enter the literals inside IDLE just
gets me '?' characters instead.

(this is Python 2.5 + IDLE installed from the Python-2.5.msi on
python.org)

-------------
Rami Chowdhury
"Never assume malice when stupidity will suffice." -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
 
Stefan Behnel

Kee said:
As a newbie, the thing that caused me trouble was importing a string
into the XML parser.

That would be

root_element = et.fromstring(some_string)

The parse() function is meant to parse from a file.

from xml.etree import ElementTree as et
theXmlDataTree = et.parse(StringIO.StringIO(theXmlString))

from xml.etree import ElementTree as et
theXmlDataTree = et.ElementTree(et.XML(theXmlString))

You can use both, but I suspect parsing from StringIO to be slower than
parsing from the string directly. That's the case for lxml, at least.

Note that fromstring() behaves the same as XML(), but it reads better when
parsing from a string variable. XML() reads better when parsing from a
literal string.
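
Put side by side, as a sketch: io.BytesIO stands in for the StringIO module here so that it also runs on newer Pythons, which is my substitution and not what the code quoted above used.

```python
import io
from xml.etree import ElementTree as ET

some_string = u"<stuff>hello world</stuff>"

# From a string variable: fromstring() returns the root Element
root_a = ET.fromstring(some_string)

# From a literal: XML() does the same job
root_b = ET.XML(u"<stuff>hello world</stuff>")

# parse() wants a file name or file-like object and returns an
# ElementTree, so ask it for the root element afterwards
tree = ET.parse(io.BytesIO(some_string.encode("utf-8")))
root_c = tree.getroot()
```

All three roots hold the same tag and text; only parse() gives back a tree object rather than an element.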

Stefan
 
Kee Nethery

You can use both, but I suspect parsing from StringIO to be slower than
parsing from the string directly. That's the case for lxml, at least.

Note that fromstring() behaves the same as XML(), but it reads better when
parsing from a string variable. XML() reads better when parsing from a
literal string.

I'm not sure I know the difference between a string variable and a
literal string. Is the difference as simple as:

somestring = u'<stuff>hello world</stuff>'
fromstring(somestring)  # <-- string variable
vs
XML(u'<stuff>hello world</stuff>')  # <-- literal string

Kee
 
Piet van Oostrum

Kee Nethery said:
KN> On Aug 22, 2009, at 3:32 AM, Stefan Behnel wrote:
KN> I'm not sure I know the difference between a string variable and a literal
KN> string. Is the difference as simple as:
KN> somestring = u'<stuff>hello world</stuff>'
KN> fromstring(somestring) <-- string variable
KN> vs
KN> XML(u'<stuff>hello world</stuff>') <-- literal string

Yes.

Stefan probably means "looks better for the human reader" when he says
"reads better". XML and fromstring are just different names for the same
function.
 
joy99

Any tips for how to set the encoding in IDLE? printing the Unicode  
strings works -- trying to repr() the variable chokes with a  
UnicodeDecodeError, and trying to enter the literals inside IDLE just  
gets me '?' characters instead.

(this is Python 2.5 + IDLE installed from the Python-2.5.msi on  
python.org)

Dear Sir,

There is no big issue with this. I simply downloaded Python 2.5 and
used IDLE on Windows XP with Service Pack 2. I used Python 2.5
earlier and am presently using Python 2.6, and it is also perfectly fine.
I am not using it for small programs but for machine learning programs,
where the code itself runs a few million lines over some terabytes of Hindi
and Bengali data, and I have never found any problem. What is the exact
nature of the problem you are getting? Could you kindly specify it, if
possible with some sample code? Printing has also never been a
problem; all good printers like HP, Epson, Xerox, and Canon print
Hindi or Bengali data fine.

Best Regards,
Subhabrata.
 
Jan Kaliszewski

22-08-2009 o 19:46:51 Kee Nethery said:
I'm not sure I know the difference between a string variable and a
literal string. Is the difference as simple as:

somestring = u'<stuff>hello world</stuff>'
fromstring(somestring) <-- string variable
vs
XML(u'<stuff>hello world</stuff>') <-- literal string

Yes, simply:

s = 'hello world'
#   ^^^^^^^^^^^^^
#   it is a *string literal*

s   # <- it is a *string object*
    # (or rather a name referring to it :))

(In Python we have 'names' rather than 'variables', though -- as
a mental shortcut -- they are commonly referred to as 'variables',
following other languages' terminology.)

Cheers,
*j
 
Rami Chowdhury

My problem is with IDLE on Windows. When I try to type Bangla directly into
the IDLE window I only get '?' characters, and repr() fails with a
UnicodeDecodeError. I expect, though, that that's because of my specific
installation / Windows issues, as it works fine on Fedora 10...

joy99 said:
I do not get any problem in processing Hindi or Bangla or any Indian
language in Python it is perfectly fine.

I have no problems either -- my issues are with IDLE, and only on Windows.

----
Rami Chowdhury
"Strangers are just friends who haven't had enough gin." -- Howdle's Saying
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Should I help you? If you answered my questions I am differing from your
view I do not get any problem in processing Hindi or Bangla or any Indian
language in Python it is perfectly fine.
Best Regards,
Subhabrata.

 
