A
Alex Mizrahi
Hello, All!
i have 3mb long XML document with about 150000 lines (i think it has about
200000 elements there) which i want to parse to DOM to work with.
first i thought there will be no problems, but there were..
first i tried Python.. there's special interest group that wants to "make
Python become the premier language for XML processing" so i thought there
will be no problems with this document..
i used xml.dom.minidom to parse it.. after it ate 400 meg of RAM i killed
it - i don't want such processing.. i think this is because of that fat
class impementation - possibly Python had some significant overhead for each
object instance, or something like this..
then i asdf-installed s-xml package and tried it with it. it ate only 25
megs for lxml representation. i think interning element names helped a lot..
it was CLISP that has unicode inside, so i think it could be even less
without unicode..
then i tried C++ - TinyXML. it was fast, but ate 65 megs.. ye, looks like
interning helps a lot 8-]
then i tried Perl XML:OM.. it was better than python - about 180megs, but
it was slowest.. at least it consumed mem slower than python 8-]
and java.. with default parser it took 45mbs.. maybe it interned strings,
but there was overhead from classes - storing trees is definitely what's
lisp optimized for 8-]
so lisp is winner.. but it has not standard way (even no non-standard but
simple) way to write binary IEEE floating point representation, so common
lisp suck and i will use c++ for my task.. 8-]]]
With best regards, Alex 'killer_storm' Mizrahi.
i have 3mb long XML document with about 150000 lines (i think it has about
200000 elements there) which i want to parse to DOM to work with.
first i thought there will be no problems, but there were..
first i tried Python.. there's special interest group that wants to "make
Python become the premier language for XML processing" so i thought there
will be no problems with this document..
i used xml.dom.minidom to parse it.. after it ate 400 meg of RAM i killed
it - i don't want such processing.. i think this is because of that fat
class impementation - possibly Python had some significant overhead for each
object instance, or something like this..
then i asdf-installed s-xml package and tried it with it. it ate only 25
megs for lxml representation. i think interning element names helped a lot..
it was CLISP that has unicode inside, so i think it could be even less
without unicode..
then i tried C++ - TinyXML. it was fast, but ate 65 megs.. ye, looks like
interning helps a lot 8-]
then i tried Perl XML:OM.. it was better than python - about 180megs, but
it was slowest.. at least it consumed mem slower than python 8-]
and java.. with default parser it took 45mbs.. maybe it interned strings,
but there was overhead from classes - storing trees is definitely what's
lisp optimized for 8-]
so lisp is winner.. but it has not standard way (even no non-standard but
simple) way to write binary IEEE floating point representation, so common
lisp suck and i will use c++ for my task.. 8-]]]
With best regards, Alex 'killer_storm' Mizrahi.