jimmyfishbean
Hi,
I am using VB6, SAX (implementing IVBSAXContentHandler).
I need to extract binary encoded data (images) from large XML files and
decode this data and generate the appropriate images onto disk. My XML
files have the following structure:
<?xml version="1.0" encoding="utf-8" ?>
<imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
<attachment>
<primary_id>28899</primary_id>
<filename>userguide3.pdf</filename>
<file
dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NCANL04gMTUzIA0vVCA0OTMyMTc4
.........
...................
</file>
</attachment>
<attachment>
......
......
</attachment>
</imagepla>
The encoded data (in the <file> element) needs to be extracted and then
decoded. I am trying to use SAX, but I cannot read the whole of the
<file> element data at once (with DOM I would use
DOMDoc.nodeTypedValue). I understand that the DOM loads the whole
document into memory, which is why nodeTypedValue can be used there.
I am using the following extract of code:
Dim strTmp As String
Dim byArr() As Byte
Private Sub IVBSAXContentHandler_characters(text As String)
...
' append the chunk SAX has just delivered
strTmp = strTmp & text
...
byArr = strTmp
Open strAttFile For Binary As #1
Put #1, 1, byArr
Close #1
...
End Sub
The problem is that only 1 line at a time of the <file> node data is
passed to this sub. Therefore I need to reconstruct the whole of the
binary data for the image in a temp variable (strTmp), before I
determine the end of the file and then write it to disk.
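To make that accumulate-then-write pattern concrete, here is a rough, untested sketch. The names m_blnInFile, m_strBuf and DecodeBase64 are illustrative only, and it assumes a reference to MSXML. The remaining IVBSAXContentHandler members still have to be implemented (empty) in the class and are omitted here. The decode step assumes a temporary DOM element with dataType "bin.base64" can stand in for nodeTypedValue on the real document. It still uses plain string concatenation, so it is not expected to be any faster.

```vb
' Sketch only: the relevant IVBSAXContentHandler members.
Implements IVBSAXContentHandler

Private m_blnInFile As Boolean   ' True while inside a <file> element
Private m_strBuf As String       ' base64 text collected so far
Private strAttFile As String     ' output path, assumed to be set elsewhere

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, _
        strLocalName As String, strQName As String, _
        ByVal oAttributes As MSXML2.IVBSAXAttributes)
    If strLocalName = "file" Then
        m_blnInFile = True
        m_strBuf = ""
    End If
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    ' Called once per chunk, so only append here; no file I/O
    If m_blnInFile Then m_strBuf = m_strBuf & strChars
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, _
        strLocalName As String, strQName As String)
    Dim byArr() As Byte
    Dim intFile As Integer
    If strLocalName = "file" Then
        ' The whole element has been seen: decode and write once
        m_blnInFile = False
        byArr = DecodeBase64(m_strBuf)
        intFile = FreeFile
        Open strAttFile For Binary As #intFile
        Put #intFile, 1, byArr
        Close #intFile
    End If
End Sub

' Assumption: decode through a throwaway DOM element typed as bin.base64
Private Function DecodeBase64(ByVal strData As String) As Byte()
    Dim objDoc As MSXML2.DOMDocument
    Dim objElem As MSXML2.IXMLDOMElement
    Set objDoc = New MSXML2.DOMDocument
    Set objElem = objDoc.createElement("b64")
    objElem.dataType = "bin.base64"
    objElem.Text = strData
    DecodeBase64 = objElem.nodeTypedValue
End Function
```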
This takes a vast amount of time (e.g. 20 minutes to try to decode a
4MB image). The XML file will contain hundreds of images, so the
current way of processing is not workable.
Is there a way to read the whole of the data from the <file> node in
one go?
Also, after extracting the binary data I will use DOM to rewrite
the XML file without the binary data (so the user has a copy of the
original XML file, but a much smaller one since it has no binary in it).
Should I use DOM or SAXReader/SAXWriter for this?
Greatly appreciated. Thanks.
Jimmy