How to read big XML files using Perl - Memory Problem


yann.pambou

Hi,

I'm converting an XML file into a .csv file. It works pretty well for
small XML files, but it doesn't work for big files (~150 MB). My script
stops and gives me a memory problem message.

Does anyone have any idea?

Here is the script that I'm using. Thanks,

#!/usr/bin/perl -w

## Remove commas

use warnings;
use strict;
use XML::XPath;

open (BLANK, "<", "XML/Verizon.xml") or die "Cannot open XML/Verizon.xml: $!";
open (BLANK2, ">", "XML/Verizon2.xml") or die "Cannot open XML/Verizon2.xml: $!";

## Align tags and values

my $blank = "";
while ($blank = <BLANK>) {
if ($blank =~ /^\s*$/) {
chomp($blank);
}
else {
$blank =~ s/,//g;
print BLANK2 $blank;
}
}

close (BLANK);
close (BLANK2);

open (DATA, "XML/Verizon2.xml");
open (FILE, ">import.csv");
my ($xp) = XML::XPath->new( join('', <DATA>) );
my(@records) = $xp->findnodes( '/TroubleTickets/TroubleTicket' );
my($firstTime) = 0;


foreach my $record ( @records ) {
my(@fields) = $xp->find( './child::*', $record )->get_nodelist();
unless ( $firstTime++ ) {
# print( join( ',', map { $_->getName() } @fields ), "\n");
print FILE ( join( ',', map { $_->getName() } @fields ), "\n");
}


# print( join( ',', map { $_->string_value() } @fields ), "\n");
my $test = join( ',', map { $_->string_value() } @fields );
$test =~ s/\n|\t//g;
# print FILE ( join( ',', map { $_->string_value() } @fields ), "\n");
print FILE "$test \n";
}


close (DATA);
close (FILE);
 

usenet

I'm converting an XML file into a .csv file. It works pretty well for
small XML files, but it doesn't work for big files (~150 MB). My script
stops and gives me a memory problem message.
<snip>
open (DATA, "XML/Verizon2.xml");
<snip>
my ($xp) = XML::XPath->new( join('', <DATA>) );

There's your problem; you're reading that whole file into memory at
once.
 

usenet

Would you know how I can do it better? Thanks for helping me,

FWIW, you should quote some context on a follow-up, because not all
newsreaders (easily) provide article history/context. Quoting context
makes it easier for other lurkers in the group to also provide
follow-up help, so you aren't relying on just me to reply (you never
know - the aliens might abduct me again before I can respond).

I'm not really familiar with XPath, but I do see that you are doing:
my ($xp) = XML::XPath->new( join('', <DATA>) );

which reads the whole file into memory at once, whereas the docs show
it like this:

my $xp = XML::XPath->new(filename => 'test.xhtml');

which I'm pretty sure would not read the whole thing at once. Have you
tried it that way?

If you reply, you should quote my message from the point that I say
"I'm not really familiar...". If you're using GoogleGroups, you should
reply by clicking "Show Options" and then "Reply", which will
automatically quote context for you (but you should snip for brevity -
only quote back the minimum needed to establish context, not the whole
message).
 

Michel Rodriguez

I'm converting an XML file into a .csv file. It works pretty well for
small XML files, but it doesn't work for big files (~150 MB). My script
stops and gives me a memory problem message.

Does anyone have any idea?

I see 2 solutions:

- buy more memory for your machine; 1 GB should be plenty to hold
a 150 MB file. Dump XML::XPath, install libxml2 and XML::LibXML, and
fix your code to work with XML::LibXML. XML::LibXML is faster and
uses less memory than XML::XPath, and the code is very similar (they
both implement the DOM and XPath). This would be the cheapest and
easiest solution.

- use XML::Twig, with a handler on /TroubleTickets/TroubleTicket,
and purge the data at the end of the handler, to only have 1
TroubleTicket in memory at any time. The code should not be too
difficult to write (from a _very_ cursory look at what you wrote),
it would be slower than when using XML::LibXML, but not dramatically
so (wild guess: maybe 5 minutes for 150 MB?)
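
For what it's worth, the XML::Twig approach could look roughly like the sketch below. It is untested against the real data and makes a few assumptions: the sub name tickets_to_csv is made up for illustration, every TroubleTicket is assumed to have the same child elements in the same order (as the original script also assumes), and commas, newlines and tabs are stripped from the values rather than quoted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream TroubleTicket elements from $in_file
# and write one CSV row per ticket to $out_file.
sub tickets_to_csv {
    my ($in_file, $out_file) = @_;

    open my $csv, '>', $out_file or die "Cannot write $out_file: $!";
    my $header_done = 0;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a complete TroubleTicket has been parsed.
            '/TroubleTickets/TroubleTicket' => sub {
                my ($t, $ticket) = @_;
                my @fields = $ticket->children('#ELT');   # element children only

                # Header row, taken from the first ticket's element names.
                print {$csv} join(',', map { $_->tag } @fields), "\n"
                    unless $header_done++;

                # Strip commas, newlines and tabs so each ticket stays on one row.
                my @values = map {
                    my $v = $_->text;
                    $v =~ s/[,\n\t]//g;
                    $v;
                } @fields;
                print {$csv} join(',', @values), "\n";

                # Free everything parsed so far; only one ticket is held at a time.
                $t->purge;
            },
        },
    );

    $twig->parsefile($in_file);
    close $csv or die "Cannot close $out_file: $!";
    return;
}

# Usage: perl script.pl XML/Verizon.xml import.csv
tickets_to_csv(@ARGV) if @ARGV == 2;
```

Because the commas are removed inside the handler, the separate pass that writes Verizon2.xml should no longer be needed.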
 
