NITF: can't load objDOM because of HTML entities

Ragnar Heil

Hi,

I am receiving news from a press-agency in NITF-XML.
Then I want to import them into my CMS using XML&SOAP.
The import-tool runs fine if I have got an xml-document with real
German special characters, not HTML entities.

Unfortunately I receive the news with entities and get this error
(translated from German):
Parse Error in input XML file: Reference to an undefined entity
'auml'.

My code:
Set objDom = CreateObject("MSXML2.DOMDocument.3.0")
objDom.setProperty "SelectionLanguage", "XPath"
objDom.async = False
objDom.setProperty "SelectionNamespaces", _
    "xmlns:tcmapi='http://www.tridion.com/ContentManager/5.0/TCMAPI'"
objDom.Load (strFilePath & strXmlFileName)
' An empty reason means the document parsed without errors
If Not objDom.parseError.reason = "" Then
    WriteToLog "Parse Error in input XML file: " & _
        objDom.parseError.reason
End If

thanks for your help!
Ragnar
 
Martin Honnen

Well, if an XML document uses entity references, those entities need to be
defined. So if &auml; is used, there needs to be an entity declaration
in the document type definition that declares the entity; otherwise the
XML is not well-formed.
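
To illustrate the point about declarations, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (not the VBScript/MSXML setup from the question). The sample markup and the helper name `is_well_formed` are made up for illustration. The same text fails to parse without a declaration for `auml` and parses once an internal DTD subset declares it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an entity reference with no declaration anywhere.
WITHOUT_DECL = "<nitf><p>M&auml;rz</p></nitf>"

# Same content, preceded by an internal DTD subset that declares 'auml'.
WITH_DECL = '<!DOCTYPE nitf [<!ENTITY auml "&#228;">]>' + WITHOUT_DECL


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WITHOUT_DECL))  # False: 'auml' is undefined
print(is_well_formed(WITH_DECL))     # True: the entity is declared
```

With the declaration in place, the parser expands the reference, so the text of the `p` element contains the real character.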
 
Ragnar Heil

Hi Martin,

Now I have seen that another thread discusses a similar issue:
Subject: XML: "undefined entity"

Yes, you are right: entity references have to be declared in the DTD, for example
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;

I really wonder why the NITF files have no reference to a DTD.
I could modify the NITF.dtd on our server, but not the incoming files.
Would you do that: take the incoming files and add a DTD reference to them?
Alternatively, I could hack around it another way and replace all entities with the
real special characters (umlauts).
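
The replacement idea could be sketched roughly as follows, in Python rather than VBScript. This is only an assumption about how such a pre-processing step might look: the function name `replace_named_entities` is hypothetical, and the entity table comes from Python's `html.entities.name2codepoint` (the HTML 4 names), which may not cover everything a given feed uses. The five entities predefined by XML are left alone so the markup stays intact.

```python
import re
from html.entities import name2codepoint

# Entities that every XML parser already knows; these must not be expanded,
# otherwise escaped markup such as &lt; would turn into real markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &auml; but not numeric ones such as &#228;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text):
    """Replace known HTML named entities with the characters they stand for."""
    def _sub(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave untouched
        return chr(name2codepoint[name])
    return _ENTITY_RE.sub(_sub, text)
```

For example, `replace_named_entities("<p>M&auml;rz &amp; Stra&szlig;e</p>")` yields the same string with the two German characters written out and `&amp;` unchanged. Note that this is plain text processing: it would also touch references inside CDATA sections, and the result has to be saved in an encoding that matches the file's XML declaration.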
 
Johannes Koch

Ragnar said:
I am receiving news from a press-agency in NITF-XML.
Then I want to import them into my CMS using XML&SOAP.
The import-tool runs fine if I have got an xml-document with real
German special characters, not HTML entities.

Unfortunately I receive the news with entities

Tell the press agency to send XML:
a) use characters directly with the appropriate encoding, or
b) use numeric character references (e.g. &#252; for German u umlaut),
and to add a document type declaration.

If you have a contract with them to get NITF-XML, they have to fulfill
their part (send NITF-XML and not some code that looks like XML).
 
Martin Honnen

Ragnar Heil wrote:

I really wonder why the NITF files have no reference to a DTD.
I could modify the NITF.dtd on our server, but not the incoming files.
Would you do that: take the incoming files and add a DTD reference to them?

If someone tells you that he is going to provide XML and it is not XML,
then you should probably insist that XML is sent and not something
that fulfills some rules of XML but not others. Otherwise you are
forced to fix their markup, which is not well-formed, and as you can't use existing
XML parsers for that you are left with plain text processing.
 
Ragnar Heil

If you have a contract with them to get NITF-XML, they have to fulfill
their part (send NITF-XML and not some code that looks like XML).

Hi Johannes and Martin,

I have now talked to a technical person from the press agency.
They are aware that their NITF XML documents are neither valid nor well-formed
:-(

Now I am looking for a way to load the news file into my objDOM without
the parser raising an error.


Ragnar
 
Johannes Koch

Ragnar said:
I have now talked to a technical person from the press agency.
They are aware that their NITF XML documents are neither valid nor well-formed
:-(

And they don't want to change it?
 
Ragnar Heil

And they don't want to change it?

Well, I am going to mention this to DPA ;-)

Are you aware of any tools that convert files with entities to files with
umlauts?


Ragnar
 
Andy Dingley

I am receiving news from a press-agency in NITF-XML.

Most (some? / many? / nearly all?) NITF / NewsML / RSS feeds become
invalid whenever they encounter an accented character. You have no
practical hope of fixing this, because the organisations are beyond
your control and you really just have to deal with the garbage they're
sending you. Raise the issue with them, complain as loudly as you
can, but don't expect them to fix it.

I use some very ugly pre-processor code before the parser. If the
first parse attempt fails for this reason, I re-try with a version
that has had a reference to an appropriate local DTD added to it.
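
A rough sketch of this retry idea in Python, under a few assumptions: the post describes adding a reference to a local DTD, whereas this sketch swaps in a generated internal DTD subset, because Python's ElementTree does not read external DTD files. The names `build_doctype` and `parse_with_fallback`, the default root name `nitf`, and the use of `html.entities.name2codepoint` as the entity table are all illustrative choices. It also assumes the incoming text has no DOCTYPE of its own, as described earlier in the thread.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Predefined XML entities must not be redeclared in the generated subset.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def build_doctype(root_name):
    """Build a DOCTYPE whose internal subset declares the HTML 4 entities."""
    decls = "".join(
        '<!ENTITY %s "&#%d;">' % (name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return "<!DOCTYPE %s [%s]>" % (root_name, decls)


def parse_with_fallback(xml_text, root_name="nitf"):
    """Parse normally; if that fails, retry with entity declarations added."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        pass  # first attempt failed, fall through to the patched version
    # The DOCTYPE has to come after the XML declaration, if there is one.
    decl = re.match(r"<\?xml[^>]*\?>", xml_text)
    pos = decl.end() if decl else 0
    patched = xml_text[:pos] + build_doctype(root_name) + xml_text[pos:]
    return ET.fromstring(patched)  # still raises if the text is broken otherwise
```

Well-formed input goes through the first attempt unchanged, so the pre-processing cost is only paid for the feeds that need it. If the second attempt also fails, the error is raised to the caller rather than hidden.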
 
