DBIx::XML::DataLoader

Ted · Apr 15, 2008

I am trying to figure out how to use this package. It looks like it
may do what I need, and help me write the code more quickly than would
be the case if I started de novo.

First, although I have been programming in a variety of languages for
quite a while, I have managed to avoid having to parse XML until now.
I HATE parsing. I'd rather be implementing a new numeric integration
algorithm or method for some obscure but interesting statistical
analysis. But here I am and have to get this done.

The data feed I get appears to be well formed XML, but it is open
ended in that there is no defined schema. The only information I have
about what to expect in the XML is provided in the data feed
provider's documentation. The data structure appears to be very
simple, but working with it is tedious at best.

Is there a package or utility that can read an XML file of the sort I
get and create a schema based on what it sees in the data feed file?

In the page for DBIx::XML::DataLoader::MapIt, I see the following:

<XMLtoDB>
  <RootElement name="/Users"/>
  <dbinfo dbuser="user" dbpass="pass"
          dbsource="dbi:mysql:userdata" name="userdata"/>
  <Table name="userinfo" dbname="userdata" xpath="./user">
    <KeyColumn name="USER_ID" order="1"/>
    <KeyColumn name="USER_LAST_NAME" order="2"/>
    <KeyColumn name="USER_FIRST_NAME" order="3"/>
    <Element xpath="./id" toColumn="USER_ID"/>
    <Element xpath="./last_name" toColumn="USER_LAST_NAME"/>
    <Element xpath="./first_name" toColumn="USER_FIRST_NAME"/>
    <Element xpath="./phone_number" toColumn="PHONE_NUMBER"/>
  </Table>
</XMLtoDB>

Please bear with me for a moment. Am I to understand this is a
typical example of what a map file looks like? If I understand this
example correctly, the root element is just the outer most element of
the XML file to be expected. The dbinfo specifies the login
credentials to the database, and the database. It isn't clear to me
though what the name tag is for. Isn't the database specified as the
last item in the dbsource element? If so, of what value is the name
element? I would assume, based on what I see, that this package can
readily connect to a database implemented within MySQL.

The table element is of most interest. Obviously, I can create a
suite of tables that correspond to the structure of the XML file.
Also obviously, not all columns in the database are keys or indices.
And I have no idea what XPath is, let alone what to do with it. I
would hazard a guess that the KeyColumn and Element items map elements
in the XML file to columns in the table in the database. But it
isn't clear what I should do with columns that are NOT keys, or what
to do when the XML file is hierarchical, with nested elements that
logically ought to be placed in a different table, and keys created to
link the tables together (for example, imagine a trivial address book
that supports people having multiple addresses, multiple email
addresses, and multiple phone numbers - or they may well NOT have a
phone or email). It is obvious that in a database, there would be a
person table, and an address table (or email address table or phone
number table), and both the person and the address would have indices
serving as primary keys, and there'd be a relation table that has a
pair of columns mapping person IDs to address IDs. And it is equally
obvious that there'd be no need for such indices and keys in the XML
file, since the relation would be implicit in the address (phone,
email) elements being child elements of the person element. Is
DBIx::XML::DataLoader::MapIt able to facilitate moving such data from
the XML file into the suite of normalized tables required in the
database?

A little guidance, or the URL for a tutorial showing how best to use
this interesting package, would be greatly appreciated.

Thanks

Ted

Peter J. Holzer · Apr 23, 2008

> Please bear with me for a moment. Am I to understand this is a
> typical example of what a map file looks like? If I understand this
> example correctly, the root element is just the outer most element of
> the XML file to be expected. The dbinfo specifies the login
> credentials to the database, and the database. It isn't clear to me
> though what the name tag is for. Isn't the database specified as the
> last item in the dbsource element? If so, of what value is the name
> element?

I think (just from superficially reading
http://search.cpan.org/~cberning/DBIx-XML-DataLoader-1.1b/DataLoader.pm,
I've never used this module) that you can use it to specify multiple
databases in the same map file. Then you can say "this data goes into
table X on database A, and this goes into table Y on database B". If you
don't need that, just choose a descriptive name.

> I would assume, based on what I see, that this package can
> readily connect to a database implemented within MySQL.
>
> The table element is of most interest. Obviously, I can create a
> suite of tables that correspond to the structure of the XML file.
> Also obviously not all columns in the database are keys or indices.
> And, I have no idea what XPATH is, let alone what to do with it.

XPath is like Perl. It gets cranky when it's spelt in all upper case
;-).

Seriously: XPath is a language for selecting stuff from an XML file.
Sort of what regexps are for plain text or SQL for relational databases.
You can find the specification at http://www.w3.org/TR/xpath20/ and
google will help you find tutorials (Sorry, I don't have a good one at
hand - you'll probably have to read several of them and the specs, too
to get the hang of it).
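
To make that a bit more concrete, here is a minimal sketch in Python (not Perl, and not using DBIx::XML::DataLoader at all) that evaluates the same relative paths the map file above uses. The sample document is invented for illustration; only the element names come from the map file. Python's standard library supports just a small subset of XPath, which happens to be enough for paths like these.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element names are taken from the map file.
SAMPLE = """
<Users>
  <user><id>1</id><last_name>Smith</last_name><first_name>Ann</first_name></user>
  <user><id>2</id><last_name>Jones</last_name><first_name>Bob</first_name>
        <phone_number>555-0100</phone_number></user>
</Users>
"""

def select_users(xml_text):
    """Return (id, last_name, phone_number) for every ./user under the root."""
    root = ET.fromstring(xml_text.strip())   # root is the <Users> element
    rows = []
    for user in root.findall("./user"):      # same path as the Table's xpath
        rows.append((
            user.findtext("./id"),
            user.findtext("./last_name"),
            user.findtext("./phone_number"),  # None when the element is absent
        ))
    return rows

print(select_users(SAMPLE))
```

Each path is evaluated relative to the current element, which is presumably why the map file writes them with a leading "./".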

> I would hazard a guess that the KeyColumn and Element items map elements
> in the XML file to elements in the table in the database.

Seems plausible.

> But it isn't clear what I should do with columns that are NOT keys,

Just use an Element without a KeyColumn. Note that the Element element
has two attributes: xpath (which specifies where to find the data in
the XML file) and toColumn (which specifies where to put the data in the
table).
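
As a rough illustration of that xpath-to-column idea, the following Python sketch copies each matched element into the column named next to it. It says nothing about how DBIx::XML::DataLoader works internally; the dictionary merely mirrors the Element entries of the map file, SQLite stands in for MySQL so the example runs without a server, and the sample row is made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Mirrors the Element entries of the map file: xpath -> toColumn.
ELEMENT_MAP = {
    "./id": "USER_ID",
    "./last_name": "USER_LAST_NAME",
    "./first_name": "USER_FIRST_NAME",
    "./phone_number": "PHONE_NUMBER",   # a plain column, not part of any key
}

def load_users(xml_text, conn):
    """Insert one row per ./user element, filling columns from ELEMENT_MAP."""
    columns = list(ELEMENT_MAP.values())
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO userinfo ({', '.join(columns)}) VALUES ({placeholders})"
    root = ET.fromstring(xml_text.strip())
    for user in root.findall("./user"):
        # Missing elements become None, which SQLite stores as NULL.
        values = [user.findtext(xpath) for xpath in ELEMENT_MAP]
        conn.execute(sql, values)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE userinfo "
    "(USER_ID, USER_LAST_NAME, USER_FIRST_NAME, PHONE_NUMBER)"
)
load_users(
    "<Users><user><id>1</id><last_name>Smith</last_name>"
    "<first_name>Ann</first_name></user></Users>",
    conn,
)
print(conn.execute("SELECT * FROM userinfo").fetchall())
```

The phone number column is filled the same way as the key columns; it just is not listed anywhere as a key.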

> or what to do when the XML file is hierarchical, with nested elements
> that logically ought to be placed in a different table, and keys
> created to link the tables together (for example, imagine a trivial
> address book that supports people having multiple addresses, multiple
> email addresses, and multiple phone numbers - or they may well NOT
> have a phone or email).

If the XML file already contains the keys, you can specify them with
xpath. If it doesn't, you are probably supposed to create them in a
handler.
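
What creating the keys yourself could look like, independent of this module, is sketched below in Python. Everything in it is an assumption made for illustration: the feed layout, the table names, and the use of SQLite. It also simplifies the address book to a one-to-many link (person to email) instead of a separate relation table. The point is only that the parent/child nesting in the XML is turned into a generated key that the child rows refer to.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical address-book feed: emails are nested and carry no ids.
FEED = """
<people>
  <person><name>Ann</name>
    <email>ann@example.org</email><email>ann@example.net</email></person>
  <person><name>Bob</name></person>
</people>
"""

def load_people(xml_text, conn):
    """Insert persons and their emails, generating the linking key ourselves."""
    root = ET.fromstring(xml_text.strip())
    for person in root.findall("./person"):
        cur = conn.execute(
            "INSERT INTO person (name) VALUES (?)",
            (person.findtext("./name"),),
        )
        person_id = cur.lastrowid             # key generated by the database
        for email in person.findall("./email"):
            conn.execute(
                "INSERT INTO email (person_id, address) VALUES (?, ?)",
                (person_id, email.text),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE email "
    "(id INTEGER PRIMARY KEY, person_id INTEGER, address TEXT)"
)
load_people(FEED, conn)
print(conn.execute(
    "SELECT p.name, e.address FROM person p "
    "LEFT JOIN email e ON e.person_id = p.id ORDER BY p.id, e.id"
).fetchall())
```

A person without any email simply gets no rows in the email table, which covers the "may well NOT have a phone or email" case.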

hp

Martijn Lievaart · Apr 24, 2008

> XPath is like Perl. It gets cranky when it's spelt in all upper case
> ;-).
>
> Seriously: XPath is a language for selecting stuff from an XML file.
> Sort of what regexps are for plain text or SQL for relational databases.
> You can find the specification at http://www.w3.org/TR/xpath20/ and
> google will help you find tutorials (Sorry, I don't have a good one at
> hand - you'll probably have to read several of them and the specs, too
> to get the hang of it).

May I advise the OP to get the O'Reilly XML book? Even if it's not the
solution to his problem, it's money well spent. It explains XPath, not
down to all details, but well enough to get you going. Once you get it,
it's not so hard, but that book is probably the easiest way to get started.

HTH,
M4

Lord of Hyphens · Apr 25, 2008

> May I advise the OP to get the O'Reilly XML book? Even if it's not the
> solution to his problem, it's money well spent. It explains XPath, not
> down to all details, but well enough to get you going. Once you get it,
> it's not so hard, but that book is probably the easiest way to get started.
>
> HTH,
> M4

The book you're talking about is actually the XSLT book; all of the
xpath query language is in that (I have both).

--LoH

Lord of Hyphens · Apr 25, 2008

> The book you're talking about is actually the XSLT book; all of the
> xpath query language is in that (I have both).
>
> --LoH

Then again, the parent post is probably talking about the handbook; I
was talking about the Pocket Guide (a $10 book).
