XML Replace

Trev

I'm trying to use Perl to replace a line in a few XML files I have.

Example XML below. I want to change the Id attribute from
Id="/Local/App/App1" to Id="/App1". I know there's an easy way to do
this with Perl alone; however, I'm trying to use XML::Simple or any
XML module for Perl.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<Profile xmlns="xxxxxxxxx" name="" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">


<Application Name="App1" Id="/Local/App/App1" Services="1" policy=""
StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>


<AppProfileGuid>586e3456dt</AppProfileGuid>

</Profile>
 
Klaus

Trev said:
I'm trying to use Perl to replace a line in a few XML files I have.

Example XML below, I'm wanting to change the Id= part from
Id="/Local/App/App1" to Id="/App1". I know there's an easy way to do
this with perl alone however

I don't think that processing XML with Perl alone (i.e. without any
module) is easy.

Trev said:
I'm trying to use XML::Simple
or any XML plugin for perl.

Have a look first at the excellent web site
Ways to Rome: Processing XML with Perl
http://xmltwig.com/article/ways_to_rome/ways_to_rome.html
(original version by Ingo Macherius, maintained by Michel Rodriguez)

If you don't find a solution there,
then you can always employ a combination of the CPAN modules
XML::Reader and XML::Writer
http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm
http://search.cpan.org/~josephw/XML-Writer-0.611/Writer.pm

A sample program would look as follows:

use strict;
use warnings;

use XML::Reader;
use XML::Writer;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3});
my $wrt = XML::Writer->new(OUTPUT => \*STDOUT,
    NEWLINES => 0, DATA_MODE => 1, DATA_INDENT => 2);

# If, with XML::Writer, you write mixed content XML (that
# is, tags and characters on the same level, for example
# <data>abc<sub>def</sub>ghi</data>), then XML::Writer will
# abort with the message "Mixed content not allowed".
# To allow mixed content in this case, you will have to
# alter the parameters to
# XML::Writer->new(NEWLINES=>0, DATA_MODE=>0, DATA_INDENT=>0);
# or even to
# XML::Writer->new(NEWLINES=>1, DATA_MODE=>0, DATA_INDENT=>0);

$wrt->xmlDecl('UTF-8', 'no');

while ($rdr->iterate) {
    my $tag = $rdr->tag;
    my $val = $rdr->value;
    my %att = %{$rdr->att_hash};

    if ($rdr->path eq '/Profile/Application'
        and defined $att{Id}) {
        # change '/../../zzz' into 'zzz'
        $att{Id} =~ s{\A .* /}''xms;
    }

    if ($rdr->is_start) { $wrt->startTag($tag, %att); }
    if ($val ne '')     { $wrt->characters($val); }
    if ($rdr->is_end)   { $wrt->endTag($rdr->tag); }
}

$wrt->end();

__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Profile
xmlns="xxxxxxxxx"
name=""
version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<Application
Name="App1" Id="/Local/App/App1" Services="1"
policy="" StartApp="" Bal="5" sessInt="500"
WaterMark="1.0"/>

<Application
Name="App99" Id="/Dummy/Test/iii" Services="3"
policy="99" StartApp="2" Bal="7" sessInt="27"
WaterMark="4.3"/>

<Application
Name="Yyee" Id="/Dat/Inp/Out" Services="5"
policy="88" StartApp="" Bal="1" sessInt="8"
WaterMark="2.1"/>

<AppProfileGuid>586e3456dt</AppProfileGuid>
<AppProfileGuid>a46y2hktt7</AppProfileGuid>
<AppProfileGuid>mi6j77mae6</AppProfileGuid>
</Profile>
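
The substitution used in the loop above can be tried on its own, without XML::Reader or XML::Writer. The following is only a minimal sketch in plain Perl; the helper name strip_leading_path and the sample values are made up for illustration. It uses the same pattern: the greedy .* runs to the last slash, so everything up to and including that slash is removed, and a value without any slash is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader or XML::Writer):
# remove everything up to and including the last '/' in a value.
sub strip_leading_path {
    my ($id) = @_;
    # \A anchors at the start of the string; the greedy .* backtracks
    # to the last '/'. /x permits the spaces in the pattern and /s
    # lets . match newlines as well.
    (my $short = $id) =~ s{\A .* /}{}xms;
    return $short;
}

for my $id ('/Local/App/App1', '/Dummy/Test/iii', 'NoSlash') {
    print "$id => ", strip_leading_path($id), "\n";
}
```

If the leading slash should be kept, as in the original question (Id="/App1"), the replacement side could be '/' instead of the empty string, i.e. s{\A .* /}{/}xms.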
 
Jürgen Exner

Klaus said:
I don't think that processing XML with Perl alone (i.e. without any
module) is easy.

Well, XML is a rather straightforward, well structured language. If you
are familiar with compiler construction then it should be no big deal. At
least much easier to parse than let's say C or Perl itself or even HTML
(there are too many special cases in HTML).

jue
 
Klaus

Jürgen Exner said:
Well, XML is a rather straightforward, well structured language. If you
are familiar with compiler construction then it should be no big deal. At
least much easier to parse than let's say C or Perl itself or even HTML
(there are too many special cases in HTML).

I agree, XML is straightforward and well structured; that's why I
like to use it wherever I can.

...and if I were a compiler writer, I would say that processing XML was
easy :)

By the way, I have now released a new version of XML::Reader (ver
0.35) with some bug fixes, warts removed, relicensing, etc...
http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

The line I wrote in my previous post (which was for XML::Reader ver
0.34) was:

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3});

With the new version 0.35 of XML::Reader, the same line would be
spelled:

my $rdr = XML::Reader->new(\*DATA, {mode => 'attr-in-hash'});
 
sln

If what you need is all you state,
this code should fix up your XML.
It's restricted to just a single tag-attribute pair.
It works by skipping markup that can hide tags (comments, CDATA)
and matching only the specific tag and attribute it is looking for.

The advantage here is that nothing else changes in the
original markup; only the string content of Id is changed
via the replacement side of the regex.
This avoids formatting headaches with some writers.

The regex may look simple for a parser; that's because it
is custom to the specific task.
The markup interaction is correct.

-sln

# -------------------------------------------
# rx_xml_fixval.pl
# -sln, 5/2/2010
#
# Util to extract some attribute/val's from
# xml/xhtml
# -------------------------------------------

use strict;
use warnings;

##
my $rxopen = "(?: Application )"; # Open tag , cannot be empty alternation
my $rxattr = "(?: Id )"; # Attribute we seek, cannot have an empty alternation

my $Rxmarkup = qr/
[^<]*
(?:
# Things that hide markup
(?: <! (?: \[CDATA\[.*?\]\] | --.*?-- | \[[A-Z][A-Z\ ]*\[.*?\]\] ) > ) \K
|
# Specific markup
(?: < (?<OPEN> $rxopen ) \s+[^>]*? (?<=\s) (?<ATTR> $rxattr) \s*=\s* \K(?<VAL> ".+?"|'.+?')
(?= [^>]*? \s* \/? > )
)
)
|
< \K
/xs;

##
my $html = join '', <DATA>;
$html =~ s/$Rxmarkup/ fixval( $+{VAL} ) /eg;
print "\n",$html;

exit (0);

##
sub fixval {
    return '' unless defined $_[0];
    if ($_[0] =~ / \/ \s* (?<val>[^\/]+?) \s* (?<delim>["']) $/x) {
        return "$+{delim}$+{val}$+{delim}";
    }
    return $_[0];
}


__DATA__

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://
www.w3.org/2001/XMLSchema-instance">


<Application Name="App1" Id="/Local/App/App1" Services="1" policy=""
StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>


<AppProfileGuid>586e3456dt</AppProfileGuid>

</Profile>
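
The role of \K in the pattern above can be seen in a much smaller case. The sketch below is a simplified illustration, not a replacement for the script: it assumes a single tag held in a string and a double-quoted value, it does not skip comments or CDATA, and the helper name last_segment is made up. Everything matched before \K has to be present but is left untouched, so only the quoted value reaches the replacement side. \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the part after the last '/'.
sub last_segment {
    my ($v) = @_;              # copy first, the inner match resets $1
    $v =~ s{\A .* /}{}xs;
    return $v;
}

my $tag = '<Application Name="App1" Id="/Local/App/App1" Services="1"/>';

# Text matched before \K is kept as it is; only the quoted value
# after \K is replaced by the result of the /e expression.
(my $fixed = $tag) =~ s{
    <Application \s [^>]*?     # the open tag, up to the attribute
    (?<=\s) Id \s*=\s*         # attribute name and equals sign
    \K " ([^"]*) "             # the double-quoted value
}{ '"' . last_segment($1) . '"' }xe;

print "$fixed\n";
# prints: <Application Name="App1" Id="App1" Services="1"/>
```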
 
sln

With a slight modification to my earlier script, multiple attr-vals
can be done within a single tag. This uses some fringe regex features,
an embedded code block (?{ }) (hence the use re 'eval') and a
conditional (?(cond) yes | no), but it does the same search and
replace on multiple attributes.

Cheers!
-sln

Some output:
--------------------------
Id = "/Local/App/App1", (valnew = "App1")
Id2 = "/Local/App/App2", (valnew = "App2")
Id = '/Dummy/Test/iii', (valnew = 'iii')
Id = "/testing", (valnew = "testing")
Id = "/Dum
my/Test/iii
", (valnew = "iii")
Id = "/Dat/Inp/Out", (valnew = "Out")
Id = "/Local/App/App1", (valnew = "App1")
Id = "/Dummy/Test/iii", (valnew = "iii")
Id = "/Dat/Inp/Out", (valnew = "Out")
Tt = "TT/tt hello", (valnew = "tt hello")
Id = "/he llo", (valnew = "he llo")
-----------------------------

# -------------------------------------------
# rx_html_fixval2.pl
# -sln, 5/5/2010
#
# Util to search/replace attribute/val's from
# xml/html
# -------------------------------------------

use strict;
use warnings;

## Initialization
##

my $rxopen = "(?: Application )"; # Open tags , cannot be empty alternation
my $rxattr = "(?: Id.?|Tt )"; # Attributes we seek, cannot have an empty alternation
# "(?: \\w+ )";

use re 'eval';
my $topen = 0;

my $Rxmarkup = qr
{
(?(?{$topen}) # Begin Conditional

# Have <OPEN> ?
(?:
# Try to match next attr-val pair
\s+[^>]*? (?<=\s) (?<ATTR> $rxattr) \s*=\s* \K(?<VAL> ".+?"|'.+?')
(?= [^>]*? \s* /? > )
|
# No more attr-value pairs
(?{$topen = 0})
)
|
# Look for new <OPEN>
(?:
[^<]*
(?:
# Things that hide markup:
# - Comments/CDATA
(?: <! (?: \[CDATA\[.*?\]\] | --.*?-- | \[[A-Z][A-Z\ ]*\[.*?\]\] ) > ) \K
|
# Specific markup we seek:
# - OPEN tag
(?: < (?<OPEN> $rxopen \K) )
(?{$topen = 1})
)
|
< \K
)
) # End Conditional
}xs;

## Code
##

my $html = join '', <DATA>;
$html =~ s/$Rxmarkup/ fixval( $+{ATTR}, $+{VAL} ) /eg;
print "\n",$html;

exit (0);


## Subs
##

sub fixval {
    return '' unless defined $_[1];
    print "$_[0] = $_[1], ";
    if ($_[1] =~ / \/ \s* (?<val>[^\/]+?) \s* (?<delim>["']) $/x) {
        my $valnew = $+{delim}.$+{val}.$+{delim};
        print "(valnew = $valnew)\n";
        return $valnew;
    }
    print "(val unchanged)\n";
    return $_[1];
}


__DATA__

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://
www.w3.org/2001/XMLSchema-instance">

<Application Name="App1" Id="/Local/App/App1"
Id2="/Local/App/App2" Services="1" policy=""
StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>

<AppProfileGuid>586e3456dt</AppProfileGuid>

</Profile>

<Application
Name="App99" Id='/Dummy/Test/iii' Services="3"
policy="99" StartApp="2" Bal="7" sessInt="27"
WaterMark="4.3" />

<Application Id="/testing"
Name="App100" Id="/Dum
my/Test/iii
" Services="4"
policy="99" StartApp="2" Bal="7" sessInt="27"
WaterMark="4.3"/>

<Application
Name="Yyee" Id="/Dat/Inp/Out" Services="5"
policy="88" StartApp="" Bal="1" sessInt="8"
WaterMark="2.1"/>

<![INCLUDE CDATA [ <Application Name="App99" Id="//Test/can't see me"/> ]]>

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Profile
xmlns="xxxxxxxxx"
name=""
version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<Application
Name="App1" Id="/Local/App/App1" Services="1"
policy="" StartApp="" Bal="5" sessInt="500"
WaterMark="1.0"/>

<Application
Name="App99" Id="/Dummy/Test/iii" Services="3"
policy="99" StartApp="2" Bal="7" sessInt="27"
WaterMark="4.3"/>

<Application
Name="Yyee" Id="/Dat/Inp/Out" Services="5"
policy="88" StartApp="" Bal="1" sessInt="8"
WaterMark="2.1" Tt = "TT/tt hello"/>

<Application
Name="Yyee" Id="/he llo" Services="5"
policy="88" StartApp="" Bal="1" sessInt="8"
WaterMark="2.1"/>

<AppProfileGuid>586e3456dt</AppProfileGuid>
<AppProfileGuid>a46y2hktt7</AppProfileGuid>
<AppProfileGuid>mi6j77mae6</AppProfileGuid>
</Profile>
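
The same multi-attribute replacement can also be written without (?{ }) code blocks. What follows is an alternative sketch, not the method used in the script above: an outer substitution finds each Application open tag, and an inner one rewrites every matching attribute inside it. The sub names and the shortened sample data are made up for illustration. Unlike the script above, it makes no attempt to skip comments or CDATA sections, and it assumes no '>' appears inside an attribute value.

```perl
use strict;
use warnings;

# Keep only the part after the last '/', trimmed of whitespace.
sub last_segment {
    my ($v) = @_;
    $v =~ s{\A .* /}{}xs;
    $v =~ s{\A \s+ | \s+ \z}{}gx;
    return $v;
}

# Re-wrap the shortened value in its original quote character.
sub requote {
    my ($quote, $value) = @_;
    return $quote . last_segment($value) . $quote;
}

# Rewrite every Id, Id2, ... or Tt attribute inside one open tag.
sub fix_tag {
    my ($tag) = @_;
    $tag =~ s{
        (?<=\s) (Id\w? | Tt) \s*=\s*   # attribute name and equals sign
        \K (["']) (.*?) \2             # quoted value, either quote style
    }{ requote($2, $3) }gxse;
    return $tag;
}

my $xml = <<'END_XML';
<Profile>
<Application Name="App1" Id="/Local/App/App1"
Id2="/Local/App/App2" Services="1"/>
<Application Name="App99" Id='/Dummy/Test/iii' Tt = "TT/tt hello"/>
<AppProfileGuid>586e3456dt</AppProfileGuid>
</Profile>
END_XML

# Outer pass: hand each Application open tag to fix_tag.
(my $fixed = $xml) =~ s{ ( <Application \s [^>]* > ) }{ fix_tag($1) }gxe;
print $fixed;
```

Splitting the work into two passes keeps each pattern short, at the cost of the comment and CDATA handling that the single-regex version provides.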
 
