How to convert element names to lowercase?

Son KwonNam

I have a very large (4~600MB) XML file stored in a native XML database,
eXcelon.

The problem is that I need to convert all the XML element names to
lowercase.

I think I could do this with XSLT.
But the problem is that the XML is too big.

Speed doesn't matter.

Any idea how to convert the big XML with a small amount of memory?

The database supports XSLT, DOM, and SAX.

Thanks,
KwonNam.
 
Keith Davies

Son KwonNam said:
I have a very large (4~600MB) XML file stored in a native XML database,
eXcelon.

The problem is that I need to convert all the XML element names to
lowercase.

I think I could do this with XSLT.
But the problem is that the XML is too big.

Speed doesn't matter.

Any idea how to convert the big XML with a small amount of memory?

The database supports XSLT, DOM, and SAX.

Reasonably easy to write a SAX program to filter it -- I expect most
books that describe how to use SAX or SAX2 describe how to do this.
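As a rough illustration of such a filter, here is a minimal sketch using Python's standard xml.sax module. The class and function names (LowercaseElementNames, lowercase_elements) are made up for this example, and it assumes namespace processing is left off (the default). It streams the input, so memory use should stay small, but comments and the DOCTYPE are not copied to the output, and CDATA sections come out as ordinary escaped text.

```python
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class LowercaseElementNames(XMLFilterBase):
    """SAX filter: lowercase element names, pass all other events through."""

    def startElement(self, name, attrs):
        # Attribute names and values are forwarded unchanged.
        super().startElement(name.lower(), attrs)

    def endElement(self, name):
        super().endElement(name.lower())


def lowercase_elements(source, out):
    """Stream XML from source (file name or file object) to the text stream out."""
    reader = LowercaseElementNames(make_parser())
    # XMLGenerator re-serializes the filtered SAX events as XML text.
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(source)
```

It could be called, for instance, as lowercase_elements("myfile.xml", open("out.xml", "w", encoding="utf-8")), where both file names are placeholders.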

You might consider Perl or the like, too. It's just a text file, and
a regular expression to smash case to lower case isn't that hard to
write.


Keith
 
Peter Flynn

Son said:
I have a very large (4~600MB) XML file stored in a native XML database,
eXcelon.

The problem is that I need to convert all the XML element names to
lowercase.

I think I could do this with XSLT.
But the problem is that the XML is too big.

Speed doesn't matter.

Any idea how to convert the big XML with a small amount of memory?

The database supports XSLT, DOM, and SAX.

On any Linux/Unix system, type

grep -v '^<?xml' myfile.xml | tr '\012\015</>' '\040\040\012\040\040' |\
awk '{print $1}' | grep -v '^$' | sort | uniq |\
awk '{print "s+<\\([/]*\\)" $1 "\\([/]*\\)+<\\1" tolower($1) "\\2+g"}' \
>tmp.sed; sed -f tmp.sed myfile.xml >out.xml

It's not robust (if you have CDATA marked sections containing what looks
like markup, they will get converted too) but I just ran it over 30MB of
XML (without CDATA sections) and it worked fine. Crude, but it may help.

///Peter
 
Dimitre Novatchev

Do bear in mind that any "solution" will generally not be lossless.

If there are names that differ only in capitalization, the
conversion to lowercase will make them identical.
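
One way to check for this before converting is to scan the file once and list the element names that would merge. The following is only a sketch using Python's standard xml.sax module; NameCollector and find_case_collisions are hypothetical names chosen for the example. It keeps only the set of distinct names in memory, not the document.

```python
from collections import defaultdict
from xml.sax import parse
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Group the distinct element names by their lowercase form."""

    def __init__(self):
        super().__init__()
        self.variants = defaultdict(set)

    def startElement(self, name, attrs):
        self.variants[name.lower()].add(name)


def find_case_collisions(source):
    """Return {lowercase name: sorted original names} for names that would merge."""
    collector = NameCollector()
    parse(source, collector)
    return {
        low: sorted(names)
        for low, names in collector.variants.items()
        if len(names) > 1
    }
```

An empty result means no two element names differ only in capitalization, so lowercasing would not merge any of them.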

Cheers,
Dimitre Novatchev.
 
nicolas //

You might consider Perl or the like, too. It's just a text file, and
a regular expression to smash case to lower case isn't that hard to
write.

Use the Perl module XML::Twig, by M. Rodriguez (http://www.xmltwig.org/, there is a tutorial on the website) and process your huge file chunk by chunk, so that the whole document never has to be held in memory.
 
grouch

The problem is that I need to convert all the XML element names to
lowercase.

Using xmlstarlet 1.0.1 (freeware) from http://xmlstar.sourceforge.net/
you could do (single line)

xml pyx SampleReport.xml | awk '{if (/^\(/) print tolower($0); else if (/^\)/) print tolower($0); else print $0; }' | xml p2x

XML file will be processed using SAX, so it should be fast.

--MG
 
