XML Parser

taz

Hello everyone, I was wondering if someone might be able to point me in
the right direction. I have been working on a project that involves
reading XML files, taking some information from each file, and
inserting it into a MySQL db. I am fairly new to processing XML. I am
using XML::Simple to retrieve the needed information. I am getting all
the information I need, but when I pull it out of the XML, the
entities are converted to their single characters (&amp; becomes &, etc.).
Is there a way to keep the entities, or do I have to use something else
to convert the string and put the characters back into their proper format
before I insert it into the db?
 
sln

Simple isn't a parser. It uses a parser, though. If Simple supports
it, you have to tell it to pass on to the parser that you want raw
content (original_content()) instead of translated content.

Usually, though, you use a SAX (Simple API for XML) parser with your own
handlers to capture the raw XML (original_content()), then send the XML,
tags, attributes, whatever, to Simple to have it convert them into a structure.

Hopefully, you aren't using Simple to process the entire XML document.
That's not such a good way to do it.

Of course, you could use RxParse (my module) version 2b, which isn't
released yet, to do all of what Simple does and a hell of a lot more.
I'm just finishing up the non-blocking support and I will post it soon.

-sln
 
Peter J. Holzer

I am not sure why you would want to store strings with embedded entities
in a database, but you can simply re-encode them before inserting them,
for example with escapeHTML from the CGI module or with something like

$s =~ s/([&<>'"])/sprintf("&#%d;", ord($1))/eg;
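
If named entities such as &amp; are preferred over numeric ones, the same idea can be sketched with a lookup table. This is only a minimal illustration using core Perl; the sub name encode_entities_basic and the sample string are made up for the example, and the table covers only the five characters matched by the substitution above.

```perl
use strict;
use warnings;

# Hypothetical lookup table: the five characters handled by the
# substitution above, mapped to entity references.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: re-encode the special characters in one pass,
# so the '&' of an inserted entity is never encoded a second time.
sub encode_entities_basic {
    my ($s) = @_;
    $s =~ s/([&<>"'])/$entity{$1}/g;
    return $s;
}

# Sample value as it might come back from the XML parser, already decoded.
my $decoded = 'Tom & Jerry <b>"hi"</b>';
print encode_entities_basic($decoded), "\n";
# prints: Tom &amp; Jerry &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
```

Because the substitution runs in a single pass over the decoded string, it should only be applied to text that the parser has already decoded; running it on text that still contains entities would encode their ampersands again.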

hp
 
