XML::Parser and namespaces

placebo.domingo

I'm pulling my hair out trying to do something that seems like it ought
to be really simple to do. In a nutshell, all I want to do is process
an XML document using only the elements that are in a certain
namespace. Elements that are in any other namespace should be ignored
by my processor. It seems like a simple objective but, despite reading
the documentation on XML::Parser and XML::Parser::Expat, I can't figure
out how to selectively ignore elements based on namespace.

Let me give an example of what I want to do. Consider the following
XML document:

<?xml version="1.0" encoding="UTF-8"?>
<nsa:note
    xmlns:nsa="http://www.mydomain.org/NameSpaceA"
    xmlns:nsb="http://www.mydomain.org/NameSpaceB">
  <nsa:data>
    <nsb:mystuff/>
    <mystuff xmlns="http://www.mydomain.org/NameSpaceB">
      <details/>
    </mystuff>
  </nsa:data>
</nsa:note>

That document uses two namespaces: http://www.mydomain.org/NameSpaceA
and http://www.mydomain.org/NameSpaceB. Suppose that in my parser I
only care about the elements associated with
http://www.mydomain.org/NameSpaceA (which we'll call "nsa"). Elements
associated with http://www.mydomain.org/NameSpaceB or any other
namespace should be ignored.

I'm open to any suggestions on how to do this. Let me show how I've
been *trying* to do it and maybe you can tell me what to fill in.

Consider this very simple Perl code:

package XML::SchemaInfo;
use strict;
use XML::Parser;

my ($p);
$p = XML::Parser->new(Namespaces=>1);
$p->setHandlers(Start => \&StartTag);
$p->parsefile('my.xml');

sub StartTag {
    my ($expat, $tagname) = @_;
    my $namespace = "HOW DO I DO THIS???";
    print "$tagname: this tag is in the namespace $namespace\n";
}

In the StartTag subroutine, $tagname is input as the name of the tag
without any namespace prefix. Well and good, but I want to know the
namespace.

See that line that says "HOW DO I DO THIS???" That's where I would
want some kind of call that returns the namespace for the tag that
triggered the call to the handler. I frankly admit I thought there
would be some kind of method call called "tag_namespace" or something
like that.

I've read and reread the perldocs for XML::Parser and
XML::Parser::Expat and I just don't see anything that tells me how to
tell the namespace of an element.

Any help would be appreciated.
 
Matt Garrish

sub StartTag {
    my ($expat, $tagname) = @_;
    my $namespace = $expat->namespace($tagname);
    print "$tagname: this tag is in the namespace $namespace\n";
}

The above will give you the URI, not the prefix, but that should be enough
to do what you want.
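
For what it's worth, here is a minimal sketch of how the filtering could look once the handler has the URI. It assumes the sample document from the question (with the root start tag properly closed) held in a string; the names $WANTED_NS, @kept and start_tag are made up for illustration and are not part of XML::Parser.

```perl
use strict;
use warnings;
use XML::Parser;

# Namespace URI we care about (taken from the sample document).
my $WANTED_NS = 'http://www.mydomain.org/NameSpaceA';

my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<nsa:note
    xmlns:nsa="http://www.mydomain.org/NameSpaceA"
    xmlns:nsb="http://www.mydomain.org/NameSpaceB">
  <nsa:data>
    <nsb:mystuff/>
    <mystuff xmlns="http://www.mydomain.org/NameSpaceB">
      <details/>
    </mystuff>
  </nsa:data>
</nsa:note>
XML

my @kept;    # local names of the elements that passed the filter

sub start_tag {
    my ($expat, $tagname) = @_;

    # namespace() returns the URI, or undef if the name is in no namespace.
    my $ns = $expat->namespace($tagname);
    return unless defined $ns && $ns eq $WANTED_NS;

    push @kept, "$tagname";    # stringify to keep just the local name
    print "$tagname: this tag is in the namespace $ns\n";
}

my $p = XML::Parser->new(Namespaces => 1);
$p->setHandlers(Start => \&start_tag);
$p->parse($xml);
```

With the sample document, only note and data should get through; both mystuff elements and details belong to NameSpaceB and are skipped. The defined check matters because namespace() returns undef for names that are in no namespace at all.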

Matt
 
robic0

Well, here's just some help. You're getting paid for it, not me.
I don't know about XML::Parser; I know a little more about XML::Parser::Expat.
Create a new parser with the Namespaces option set.
The start handler will be passed the Expat object. For the start/end tags, it seems you have
to call the namespace method to determine which namespace each one is from. Makes sense, since
why should the damn parser do the filtering for you? The processor's job is to provide
you every relevant detail through its methods. It's really up to *you* what you do
with it. I respect that philosophy.

I don't really know why the processor must socket out to get info from a site??
Guess I don't know the web that well. Actually, I've written an XML processor to the 1.1 standard.
However, it's incomplete. It's fairly complicated. So the namespace prefix is stripped from the
tag and the bare name is passed. Within the handler, just call the method to find out which
namespace it comes from. Don't make a big deal out of it. Let the truth be known, you didn't read shit!!

Expat METHODS
==================
new(options)
-------
This is a class method, the constructor for XML::Parser::Expat.
Options are passed as keyword value pairs. The recognized options are:
[snip]
Namespaces (An option to new)
When this option is given with a true value, then the parser does namespace processing.
By default, namespace processing is turned off. When it is turned on, the parser consumes
xmlns attributes and strips off prefixes from element and attributes names where those
prefixes have a defined namespace.
A name's namespace can be found using the namespace method and two names can be checked
for absolute equality with the eq_name method.


namespace(name)
-----------------
Return the URI of the namespace that the name belongs to.
If the name doesn't belong to any namespace, an undef is returned.
This is only valid on names received through the Start or End handlers from a single document,
or through a call to the generate_ns_name method. In other words, don't use names generated
from one instance of XML::Parser::Expat with other instances.

other *_ns_* options ..........
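
As a rough illustration of the eq_name and generate_ns_name methods mentioned in the excerpt above, here is a small sketch. The document, the counters and the handler name are assumptions invented for the example; only namespace-aware parsing, generate_ns_name and eq_name come from XML::Parser::Expat.

```perl
use strict;
use warnings;
use XML::Parser;

my $NS_B = 'http://www.mydomain.org/NameSpaceB';

# Hypothetical document: two elements share the local name "mystuff"
# but live in different namespaces.
my $xml = <<'XML';
<a:root xmlns:a="http://www.mydomain.org/NameSpaceA"
        xmlns:b="http://www.mydomain.org/NameSpaceB">
  <a:mystuff/>
  <b:mystuff/>
</a:root>
XML

my $same_local_name = 0;    # elements whose local name is "mystuff"
my $exact_matches   = 0;    # elements that are "mystuff" in NameSpaceB

sub start_tag {
    my ($expat, $tagname) = @_;

    # A plain string comparison looks at the local name only.
    $same_local_name++ if $tagname eq 'mystuff';

    # Build a namespace-qualified name inside this parser instance and
    # compare name and namespace together.
    my $wanted = $expat->generate_ns_name('mystuff', $NS_B);
    $exact_matches++ if $expat->eq_name($tagname, $wanted);
}

my $p = XML::Parser->new(Namespaces => 1);
$p->setHandlers(Start => \&start_tag);
$p->parse($xml);

print "local-name matches: $same_local_name, exact matches: $exact_matches\n";
```

The point of the sketch is that comparing with eq alone cannot tell a:mystuff from b:mystuff, while eq_name can. The generated name is created inside the handler so that it belongs to the same Expat instance as the names it is compared against, as the documentation asks.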
 
robic0

my $namespace = $expat->namespace($tagname);


The above will give you the URI, not the prefix, but that should be enough
to do what you want.

Matt

Hey, you're beating me to the posting punch, boy ..............
 
robic0

If you want to learn why these processors offer these methods, read the standard from start to
finish. It will give you invaluable insight into what these parser methods provide.

http://www.w3.org/TR/xml11/

It's not that tough. A couple of 8-hour days reading this is all you need. You will never have to
post questions like this again....

good luck rider
 
