XML and PDF...

Peter Flynn

Verner said:
% Hi,

% Is it possible to store a PDF doc, as part of an XML?

No, not directly.
% Should the PDF-part
% be encoded/wrapped or something,

Yes, that's possible. You just have to ensure that the encoding never
outputs characters that aren't allowed in XML, nor "<" or "&" unless you
put it in a CDATA section.
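
To make that requirement concrete, here is a small sketch in Python (chosen only for illustration; nothing in the thread depends on it). Standard base64 output is drawn only from A-Z, a-z, 0-9, "+", "/" and "=", so it can never contain "<", "&" or control characters and needs no CDATA section. The check encodes every possible byte value as a worst case.

```python
import base64
import string

# Every possible byte value, as a worst-case stand-in for binary PDF content.
all_bytes = bytes(range(256))

encoded = base64.b64encode(all_bytes).decode("ascii")

# The standard base64 alphabet plus the "=" padding character.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")

# Nothing outside that alphabet appears, so no "<", "&" or control characters.
assert set(encoded) <= BASE64_CHARS
assert "<" not in encoded and "&" not in encoded
print(len(all_bytes), "bytes ->", len(encoded), "characters")
```

The second number also shows the cost: base64 turns every 3 input bytes into 4 characters.
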
% cause I can't figure out how the XML text
% format is able to hold binary data?

It can't. XML is a text file format.
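
A minimal Python sketch of why raw binary fails (the sample bytes are hypothetical, and ElementTree's parser stands in for any conforming XML parser): control characters of the kind that occur in PDF binary data are not legal in an XML 1.0 document, while the same bytes after base64 encoding parse without complaint.

```python
import base64
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if ElementTree's parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Hypothetical sample: a few raw bytes like those in a PDF's binary streams.
raw = b"\x01\x02\xff"

# Pasting the bytes in directly (latin-1 maps each byte to one character)
# puts U+0001 and U+0002 in the text, which XML 1.0 does not allow.
naive = "<PDFSegment>" + raw.decode("latin-1") + "</PDFSegment>"

# Encoding first leaves only plain ASCII letters, digits and symbols.
encoded_doc = "<PDFSegment>" + base64.b64encode(raw).decode("ascii") + "</PDFSegment>"

print(is_well_formed(naive))        # False
print(is_well_formed(encoded_doc))  # True
```
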
% The assignment is to extract the PDF from the XML - put it in an Oracle
% BLOB - and store it in an Ora-DB.

% The part which extracts the PDF from XML - should this contain some kind of
% conversion (text => binary) ?

The code which extracts the encoded data would trigger a decoder which
would recreate the PDF document.
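
On the extraction side, the "text => binary" conversion is simply base64 decoding of the element's text. The Python sketch below assumes the element names used elsewhere in this thread (<MyXMLDoc>, <PDFSegment>) and uses an in-memory SQLite table purely as a stand-in for the Oracle BLOB column; it does not show any Oracle driver API.

```python
import base64
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document; "JVBERi0xLjQ=" is the base64 form of b"%PDF-1.4".
incoming = "<MyXMLDoc><PDFSegment>JVBERi0xLjQ=</PDFSegment></MyXMLDoc>"

# Extraction plus the text => binary conversion: base64-decode the element text.
pdf_bytes = base64.b64decode(ET.fromstring(incoming).findtext("PDFSegment"))

# SQLite's BLOB column is only a stand-in for the Oracle BLOB in the assignment;
# with Oracle the same bytes would typically be bound as a parameter via its driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, pdf BLOB)")
conn.execute("INSERT INTO documents (pdf) VALUES (?)", (pdf_bytes,))
conn.commit()

stored = conn.execute("SELECT pdf FROM documents WHERE id = 1").fetchone()[0]
print(stored)  # b'%PDF-1.4'
```
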

I realise it's a college assignment, but I have difficulty imagining any
circumstances in which I would want to do this. I'd be interested to know
what the person who set the assignment envisages.

///Peter, java groups removed from posting
 
Patrick TJ McPhee

% Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
% encoded/wrapped or something, cause I can't figure out how the XML text
% format is able to hold binary data?

It's typical to use MIME base-64 encoding to encode binary data in XML
files.
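
"MIME base-64" usually also means wrapping the output into lines of at most 76 characters. A short Python sketch with a hypothetical 1024-byte payload shows that the wrapped (and even indented) text still decodes to the original bytes, so line breaks or indentation inside the element, for example from pretty-printing the XML, do no harm.

```python
import base64

# Hypothetical payload; a real PDF would be read from disk as bytes.
payload = bytes(range(256)) * 4  # 1024 bytes

# encodebytes() gives MIME-style output: 57 input bytes per line, so every
# line is at most 76 characters and ends with a newline.
mime_text = base64.encodebytes(payload).decode("ascii")
lines = mime_text.splitlines()
print(len(lines), "lines, longest", max(len(line) for line in lines))

# If the lines end up indented inside an XML element, b64decode() still works:
# with the default validate=False it discards characters outside the base64
# alphabet (newlines and spaces included) before decoding.
indented = "\n    " + mime_text.replace("\n", "\n    ")
assert base64.b64decode(indented) == payload
```
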
 
Verner Jensen, Ålborg

Hi,

Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
encoded/wrapped or something, cause I can't figure out how the XML text
format is able to hold binary data?

The assignment is to extract the PDF from the XML - put it in an Oracle
BLOB - and store it in an Ora-DB.

The part which extracts the PDF from XML - should this contain some kind of
conversion (text => binary) ?

Any help, samples, etc. would be appreciated...
Rgds, Henrik
 
Romin Irani

% Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
% encoded/wrapped or something, cause I can't figure out how the XML text
% format is able to hold binary data?

It's typical to use MIME base-64 encoding to encode binary data in XML
files.

Since the PDF file is a binary format, you have to encode it in a form
that is compatible with text before inserting it into the XML instance.
As correctly mentioned here, base64 encoding is the way to do that.

The process would roughly be the following:
a) To encode the PDF:
   1) Take the PDF content as bytes.
   2) Run it through a program/method that does something like:
      PDFInBase64Bytes = convertToBase64(PDFBytes)
   3) Convert the result to a string and insert it into an XML instance:
      <MyXMLDoc>
        <!-- other elements -->
        <PDFSegment>Base64 representation of PDF</PDFSegment>
      </MyXMLDoc>
b) To decode the PDF:
   1) Extract the value of the XML element <PDFSegment>.
   2) Do the reverse, i.e.
      PDFBytes = decodeFromBase64(<PDFSegment> value...)
   3) Provide the PDFBytes to a PDF-aware application, e.g. Adobe PDF
      Reader.
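
The steps above can be sketched in Python with the standard library's base64 module and ElementTree. This is an illustration under assumptions, not the poster's actual code: build_document and read_pdf are hypothetical names, the PDF bytes are a fake stand-in for a real file, and "recovered.pdf" is an arbitrary output name.

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def build_document(pdf_bytes: bytes) -> str:
    """Steps a1-a3: encode the PDF bytes and place them in <PDFSegment>."""
    root = ET.Element("MyXMLDoc")
    # Other elements of the document would be added to root here.
    segment = ET.SubElement(root, "PDFSegment")
    segment.text = base64.b64encode(pdf_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def read_pdf(xml_text: str) -> bytes:
    """Steps b1-b2: pull out the <PDFSegment> text and decode it."""
    segment = ET.fromstring(xml_text).find("PDFSegment")
    return base64.b64decode(segment.text)

# Hypothetical stand-in for a real file's contents (e.g. Path("in.pdf").read_bytes()).
original = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n%%EOF\n"

xml_text = build_document(original)
recovered = read_pdf(xml_text)
assert recovered == original  # the round trip is lossless

# Step b3: write the bytes out so a PDF-aware application can open them.
out = Path(tempfile.gettempdir()) / "recovered.pdf"
out.write_bytes(recovered)
print(xml_text[:40])
```

In Java, the standard java.util.Base64 class (Java 8 and later) covers the same two conversions through Base64.getEncoder() and Base64.getDecoder().
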

There are several free base64 encoding/decoding libraries available on
the net in a variety of languages. Pick one and try it out.

We have used the above process as mentioned and it works fine.
 
Verner Jensen, Ålborg

Thanks a lot - fine description ;-)

Rgds, Henrik

