add document tags to xml doc

DJ Stunks

hey all,

I'm using XML::Parser to parse an XML file which is not well-formed.
The source I receive it from formats it as:

$ cat data.xml
<row><data foo="a"/></row>
<row><data baz="b"/></row>

Obviously the parser chokes on this, since an XML document must have
exactly one root element. If I manually add document tags as follows,
my script is fine:

$ cat fixed-data.xml
<d>
<row><data foo="a"/></row>
<row><data baz="b"/></row>
</d>

Question: the script is below. What is the easiest way to add the
document tags so that the parser doesn't choke, without the extra
step of adding them by hand? I could pass a filehandle to the parser
if there were a way to add document tags to the filehandle, but the
file is very large and I couldn't think of an easy way to add the
data to the filehandle without slurping the whole file into a scalar.

TIA,
-jp

$ cat tmp.pl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Parser;

my $parser = XML::Parser->new(Handlers => { Start => \&handle_start });
$parser->parsefile('fixed-data.xml');

# Start handlers receive the parser, the element name and the
# attribute name/value pairs.
sub handle_start {
    my ($p, $el, %atts) = @_;

    if ($el eq 'data') {
        for my $k (keys %atts) {
            print "$k: $atts{$k}\n";
        }
    }
}

__END__
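A minimal sketch (not from the thread) that reproduces the failure in memory, assuming XML::Parser is installed; the variable names are illustrative. Expat rejects the two top-level <row> elements, but accepts the same text once it is wrapped in a single <d> element:

```perl
use strict;
use warnings;
use XML::Parser;

# The two rows from data.xml, as one string.
my $body = qq{<row><data foo="a"/></row>\n<row><data baz="b"/></row>\n};

# As received: two top-level elements, so Expat dies.
my $raw_ok = eval { XML::Parser->new->parse($body); 1 };
print 'raw:     ', ($raw_ok ? "parsed\n" : "failed: $@");

# Wrapped in one root element, the same text parses.
my $wrapped_ok = eval { XML::Parser->new->parse("<d>$body</d>"); 1 };
print 'wrapped: ', ($wrapped_ok ? "parsed\n" : "failed: $@");
```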

DJ Stunks

Quoth DJ Stunks:

See XML::Parser->parse_start. You will obviously have to handle reading
chunks from the file manually.

Thanks very much, Ben. With the modified code below I'm good to go.
It took a bit of hunting to find that method; thanks for pointing it out.

-jp

#!/usr/bin/perl

use strict;
use warnings;

use XML::Parser;

my $file = 'data.xml';

my $parser = XML::Parser->new(Handlers => { Start => \&handle_start });

# parse_start() returns a non-blocking XML::Parser::ExpatNB object
# that accepts the document piece by piece via parse_more().
my $p = $parser->parse_start();
$p->parse_more('<d>');

open (my $fh, '<', $file) or die "Could not open '$file': $!";

LINE:
while (my $line = <$fh>) {
    $p->parse_more($line);
}

sub handle_start {
    my ($p, $el, %atts) = @_;

    if ($el eq 'data') {
        for my $k (keys %atts) {
            print "$k: $atts{$k}\n";
        }
    }
}

__END__
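Ben's suggestion to read chunks manually can also be done with fixed-size read() calls instead of lines, which keeps memory bounded even if the file has very long lines or no newlines at all. A minimal sketch, assuming XML::Parser is installed; `parse_wrapped` and the 64 KiB chunk size are hypothetical choices, not part of the module:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: parse $file as though it were wrapped in
# <$root>...</$root>, reading it in fixed-size chunks (no slurping).
sub parse_wrapped {
    my ($file, $root, %handlers) = @_;

    # parse_start() returns a non-blocking XML::Parser::ExpatNB object.
    my $nb = XML::Parser->new(Handlers => \%handlers)->parse_start;

    open my $fh, '<', $file or die "Could not open '$file': $!";
    $nb->parse_more("<$root>");

    # Expat buffers incomplete input, so a chunk may end mid-tag.
    while (1) {
        my $n = read($fh, my $buf, 64 * 1024);
        die "Could not read '$file': $!" unless defined $n;
        last if $n == 0;
        $nb->parse_more($buf);
    }
    close $fh;

    $nb->parse_more("</$root>");
    $nb->parse_done;    # dies if anything is left unclosed
    return;
}
```

With the handler from the script above, the call would be `parse_wrapped('data.xml', 'd', Start => \&handle_start);`.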

Xho Jingleheimerschmidt

Ben's answer is better, of course, but here is a dirty way to get
such a file handle if you really need one:

open my $fh, qq{echo "<d>"; cat $file; echo "</d>"|} or die $!;

It's not portable, not tolerant of special characters in $file, and may
not fail as expected if $file is not readable.

Xho
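The quoting and error-checking caveats can be avoided by skipping the shell and letting Perl produce the wrapped stream itself with a forking open. A minimal sketch; `wrapped_handle` is a hypothetical helper, and since it relies on `open($fh, '-|')` it is still Unix-only:

```perl
use strict;
use warnings;

# Hypothetical helper (Unix-only, since it relies on a forking open):
# returns a read handle that yields "<$root>", the contents of $file,
# then "</$root>", without running a shell.
sub wrapped_handle {
    my ($file, $root) = @_;

    # Open in the parent so an unreadable file is reported right away.
    open my $in, '<', $file or die "Could not open '$file': $!";

    # Forking open: the child's STDOUT becomes the parent's $out.
    my $pid = open my $out, '-|';
    die "Could not fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: write the wrapped stream, then exit.
        print "<$root>";
        print while <$in>;
        print "</$root>\n";
        exit 0;
    }

    close $in;    # the child has its own copy of this handle
    return $out;
}
```

The returned handle can be passed straight to `$parser->parse($fh)`; closing it afterwards waits for the child and sets `$?`.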

sln

It's likely that you will want to finish the ExpatNB parse
with a call to XML::Parser::ExpatNB::parse_done, because it
releases any circular data structure references.

But parse_done() also checks that every element you opened has
been closed, so the sequence then requires a final call to
parse_more('</d>') before it. Something like below.

-sln
----------
use strict;
use warnings;

use XML::Parser;

# Build the non-blocking parser directly; XML::Parser->parse_start
# returns the same kind of object.
my $parser = XML::Parser::ExpatNB->new();
$parser->setHandlers(Start => \&handle_start);

$parser->parse_more('<d>');          # open the wrapper element
while (my $line = <DATA>) {          # feed the body a line at a time
    $parser->parse_more($line);
}
$parser->parse_more('</d>');         # close it, or parse_done() dies
$parser->parse_done;                 # finish and free the parser

sub handle_start {
    my ($p, $el, %atts) = @_;
    if ($el eq 'data') {
        for my $k (keys %atts) {
            print "$k: $atts{$k}\n";
        }
    }
}

__DATA__

<row><data foo="a"/></row>
<row><data baz="b"/></row>
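To see the parse_done() point concretely, this small sketch feeds the same rows to a fresh non-blocking parser with and without the closing tag and reports whether parse_done() succeeds. It assumes XML::Parser is installed; `finishes_cleanly` is a hypothetical helper name:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: feed @pieces to a fresh non-blocking parser and
# return 1 if parse_done() succeeds, 0 if it dies.
sub finishes_cleanly {
    my @pieces = @_;
    my $nb = XML::Parser->new->parse_start;
    $nb->parse_more($_) for @pieces;
    return eval { $nb->parse_done; 1 } ? 1 : 0;
}

my $body = qq{<row><data foo="a"/></row>\n<row><data baz="b"/></row>\n};

print 'closed: ', finishes_cleanly('<d>', $body, '</d>'), "\n";  # prints "closed: 1"
print 'open:   ', finishes_cleanly('<d>', $body), "\n";          # prints "open:   0"
```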