XML Oddity

Mark Johnson

>>DELURK<<

Over the last few weeks, we've been working on building an online
portfolio using XML to pass content to an HTML page via PHP. In the
process, we've run across a rather inexplicable error which we've been
unable to find any reference to elsewhere. Hopefully, someone who
reads this will know what's going on and be able to provide some
assistance.

Here is our XML:
http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio_xml.txt

Here is our HTML and PHP:
http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio_php.txt

And here is the page in action:
http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio.php

The problem is this: When a user clicks the third link under the
"Digital" heading, as you can see from the XML, the following text
ought to be displayed:

==begin==
Such has been the patient sufferance of these Colonies; and such is now
the necessity which constrains them to alter their former Systems of
Government. The history of the present King of Great Britain [George
III] is a history of repeated injuries and usurpations, all having in
direct object the establishment of an absolute Tyranny over these
States. To prove this, let Facts be submitted to a candid world. He
has refused his Assent to Laws, the most wholesome and necessary for
the public good. He has forbidden his Governors to pass Laws of
immediate and pressing importance, unless suspended in their operation
till his Assent should be obtained; and when so suspended, he has
utterly neglected to attend to them.
==end==

However, rather than that text being displayed in its entirety, the
following is all that displays:
==begin==
sing importance, unless suspended in their operation till his Assent
should be obtained; and when so suspended, he has utterly neglected to
attend to them.
==end==

Somehow, everything prior to that point has been eaten.

This is what we know: this error occurs in Windows XP, Mac OS X, and
Red Hat Linux. It occurs regardless of whether IE or a Gecko-based
browser is used. It occurs regardless of what type of server the files
are uploaded to. If all elements are edited to contain the exact same
number of characters, the error seems to disappear, but doing so
renders the code useless for our purposes. No other errors have been
noted. Changing the code so that no elements are undisplayed has no
effect. The question is this: what is causing this error, and how can
it be avoided? Any assistance would be greatly appreciated.

Mark Johnson
 
Richard Light

In message <[email protected]>, Mark

Caveat: I know nothing about the PHP XML parser. However, I suspect
that the problem is a failure to separate the physical reading of input
blocks from the logical parsing of the data they contain. My reason for
saying this is that the truncated phrase you quote "sing importance,
unless suspended ..." is at the start of the second 4096-byte block in
the file.

I would guess that the parser handed you the first part of this data
content, you placed it in your array variable, and then it handed you the
second part ... Little suspecting this, you promptly overwrote the
variable with this second chunk. You can easily test this hypothesis by
changing the block size and seeing if the position of the error changes.

If this is the case, you'll have to be a bit smarter about processing
character data. Or get a better parser ...
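
A minimal sketch of the overwrite-versus-accumulate behaviour guessed at above. It is written in Python with xml.parsers.expat rather than PHP, purely so that it can be run as-is; the <item>/<desc> markup, the function name read_desc, and the 16-byte block size are assumptions for illustration and are not taken from the poster's files.

```python
import xml.parsers.expat

# Sample text taken from the post; the <item>/<desc> markup is hypothetical.
TEXT = ("He has forbidden his Governors to pass Laws of immediate and "
        "pressing importance, unless suspended in their operation till "
        "his Assent should be obtained.")
DOC = ("<item><desc>" + TEXT + "</desc></item>").encode("utf-8")


def read_desc(xml_bytes, block_size, accumulate):
    """Feed xml_bytes to the parser block_size bytes at a time and
    return whatever the handlers stored for the <desc> element."""
    state = {"inside": False, "text": ""}

    def on_start(name, attrs):
        if name == "desc":
            state["inside"] = True
            state["text"] = ""

    def on_end(name):
        if name == "desc":
            state["inside"] = False

    def on_chars(data):
        if not state["inside"]:
            return
        if accumulate:
            state["text"] += data   # append each piece as it arrives
        else:
            state["text"] = data    # the suspected bug: keep only the latest piece

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    for i in range(0, len(xml_bytes), block_size):
        parser.Parse(xml_bytes[i:i + block_size], False)
    parser.Parse(b"", True)  # signal end of input
    return state["text"]


broken = read_desc(DOC, 16, accumulate=False)
fixed = read_desc(DOC, 16, accumulate=True)
```

With a small block size, broken should end up holding only the tail of the sentence, while fixed holds all of it. Changing block_size should move the point at which the text is cut, which is the test suggested above.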

Richard Light

Malcolm Dew-Jones

Richard Light ([email protected]) wrote:
: In message <[email protected]>, Mark

: Caveat: I know nothing about the PHP XML parser. However, I suspect
: that the problem is a failure to separate the physical reading of input
: blocks from the logical parsing of the data they contain. My reason for
: saying this is that the truncated phrase you quote "sing importance,
: unless suspended ..." is at the start of the second 4096-byte block in
: the file.

: I would guess that the parser handed you the first part of this data
: content, you placed in your array variable, and then it handed you the
: second part ... Little suspecting this, you promptly overwrote the
: variable with this second chunk. You can easily test this hypothesis by
: changing the block size and seeing if the position of the error changes.

: If this is the case, you'll have to be a bit smarter about processing
: character data. Or get a better parser ...
^^^^^^^^^^^^^^^^^^^^^

Sounds like a likely scenario.

However, that doesn't mean there's anything wrong with the parser. A SAX
parser has no requirement to feed all of some contiguous character data in
a single call, and in fact a parser that did so could be considered a
problem.

Imagine if I had an XML document that had a gigabyte of contiguous
character data. One of the points of the SAX parser is that it can feed
that data to the handler in smaller, more memory-efficient chunks, and not
have to load the entire string into memory.
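
As a rough illustration of that point, here is a sketch in Python (xml.parsers.expat, not the PHP parser from the original post) in which the handler only counts the character data it is given and never keeps it. The function names, the generated <doc> element, and the sizes are all made up for the example.

```python
import xml.parsers.expat


def count_text_streaming(blocks):
    """Parse an iterable of byte blocks, tallying character data
    without ever storing it."""
    stats = {"chars": 0, "calls": 0, "largest": 0}

    def on_chars(data):
        stats["chars"] += len(data)
        stats["calls"] += 1
        stats["largest"] = max(stats["largest"], len(data))

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_chars
    for block in blocks:
        parser.Parse(block, False)
    parser.Parse(b"", True)  # signal end of input
    return stats


def big_document(total_chars, block_size):
    """Yield a hypothetical <doc> element holding total_chars of text,
    at most block_size bytes of text per block."""
    yield b"<doc>"
    remaining = total_chars
    while remaining > 0:
        n = min(block_size, remaining)
        yield b"x" * n
        remaining -= n
    yield b"</doc>"


stats = count_text_streaming(big_document(1_000_000, 4096))
```

The handler is called many times for the one run of text, and no single call carries the whole string, so memory use stays tied to the block size rather than to the document size.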
 
Richard Light

Malcolm Dew-Jones said:
: however that doesn't mean there's anything wrong with the parser. a SAX
: parser has no requirement to feed all of some contiguous character data in
: a single call, and in fact a parser that did so could be considered a
: problem.

: Imagine if I had an xml document that had a giga byte of contiguous
: character data. One of the points of the SAX parser is that it can feed
: that data to the handler in smaller, more memory efficient chunks, and not
: have to load the entire string in to memory.

I would agree with that principle entirely. However, from a software
engineering point of view, I would expect as the user of such a parser
to be able to control the "text chunk" size, and not have character data
cut into arbitrary chunks based on where the block boundaries in the
input stream happen to fall.
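
Where the parser offers no such control, one possible workaround is a thin layer in the application that re-cuts the character data to a size of its own choosing. The sketch below does this in Python with xml.parsers.expat; the Rechunker class, its method names, and the sample document are hypothetical and not part of any parser's API.

```python
import xml.parsers.expat


class Rechunker:
    """Collect character data and pass it on in pieces of a
    caller-chosen size, independent of the input block size."""

    def __init__(self, chunk_size, sink):
        self.chunk_size = chunk_size
        self.sink = sink      # called with each re-cut piece of text
        self.pending = ""

    def feed(self, data):
        self.pending += data
        while len(self.pending) >= self.chunk_size:
            self.sink(self.pending[:self.chunk_size])
            self.pending = self.pending[self.chunk_size:]

    def flush(self, *_):
        # Run at element boundaries so a piece never spans a tag.
        if self.pending:
            self.sink(self.pending)
            self.pending = ""


def parse_rechunked(xml_bytes, block_size, chunk_size):
    """Feed xml_bytes in block_size pieces; return the text re-cut
    into chunk_size pieces."""
    pieces = []
    rechunker = Rechunker(chunk_size, pieces.append)
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = rechunker.feed
    parser.StartElementHandler = rechunker.flush
    parser.EndElementHandler = rechunker.flush
    for i in range(0, len(xml_bytes), block_size):
        parser.Parse(xml_bytes[i:i + block_size], False)
    parser.Parse(b"", True)  # signal end of input
    return pieces


SAMPLE_TEXT = "abcdefghij" * 10
SAMPLE_DOC = ("<doc>" + SAMPLE_TEXT + "</doc>").encode("utf-8")
pieces = parse_rechunked(SAMPLE_DOC, block_size=7, chunk_size=32)
```

The sizes of the pieces then depend only on chunk_size, whatever block size the input happens to be read in.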

Richard
 
