Literal &#10; (not newline)


will

I have an XML input that includes things like:

<foo>line of text&#10;another line of text&#10;yet another</foo>

And I want the &#10; entities PRESERVED (not translated) in the result,
so:

<bar>line of text&#10;another line of text&#10;yet another</bar>

I've tried <xsl:copy-of select="foo/text()" />, I've tried
<xsl:value-of select="foo" disable-output-escaping="yes" />, I've tried
<xsl:text disable-output-escaping="yes"><xsl:copy-of
select="foo/text()" /></xsl:text>, and it seems nothing works.

Strangely, will (with certain incantations of the above) be
preserved properly, but it seems that perhaps the PARSER is translating
the entities, not copying them. I.e., no matter what I try, the &#10;
references from the input become newlines in the output.

I'm using Xerces J (a couple of different versions, with the same result).

Thanks, smart people.
 

Joseph Kesselman

Per the XML spec, newlines are normalized as they are read in, and you
can't distinguish one representation from another. You may be able to
tell your serializer that you want all newlines output as &#10; ... but
it won't be able to tell those from other line breaks in your source file.
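A minimal sketch of that normalization, using Python's standard-library xml.etree.ElementTree as a stand-in for Xerces (an assumption; any conforming parser should behave the same way): a literal newline and the &#10; reference produce identical text once parsed, so nothing downstream, XSLT included, can tell them apart.

```python
import xml.etree.ElementTree as ET

# Same content, two spellings of the line break in the source document.
with_reference = "<foo>line of text&#10;another line of text</foo>"
with_literal = "<foo>line of text\nanother line of text</foo>"

ref_text = ET.fromstring(with_reference).text
lit_text = ET.fromstring(with_literal).text

# After parsing, both hold the single character U+000A; the reference is gone.
print(repr(ref_text))
print(ref_text == lit_text)
```

This is why no combination of xsl:copy-of, xsl:value-of or disable-output-escaping can bring the reference back: by the time the stylesheet runs, only the newline character is left.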

I'd recommend using semantic markup, such as an <lf/> element, to
represent this case, and postprocessing it to yield the desired
character. Or fixing whatever downstream tool is forcing you to worry
about the exact representation of line-break.
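A sketch of the postprocessing half of that suggestion, assuming the stylesheet copies the text and writes an empty <lf/> element wherever a break belongs. It works on the serialized output as plain text on purpose: handing the string "&#10;" to an XML serializer as element content would only get it escaped to "&amp;#10;". The expand_lf_markers name is hypothetical.

```python
import re

# Matches <lf/> or <lf /> markers left in the serialized output by the
# stylesheet. The <lf/> element is a hypothetical convention, not an XSLT feature.
LF_MARKER = re.compile(r"<lf\s*/>")


def expand_lf_markers(serialized_xml: str) -> str:
    """Replace each <lf/> marker in already-serialized XML with the text &#10;."""
    # Plain string substitution on the output text: no XML serializer is
    # involved at this point, so the '&' is not escaped to '&amp;'.
    return LF_MARKER.sub("&#10;", serialized_xml)


print(expand_lf_markers("<bar>line of text<lf/>another line of text<lf />yet another</bar>"))
```

Run on the sample above, this prints <bar>line of text&#10;another line of text&#10;yet another</bar>, which is the output the original question asked for.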
 

chiaman

Joseph said:
> Per the XML spec, newlines are normalized as they are read in, and you
> can't distinguish one representation from another.

I was kinda afraid of that.

> I'd recommend using semantic markup, such as an <lf/> element, to
> represent this case, and postprocessing it to yield the desired
> character. Or fixing whatever downstream tool is forcing you to worry
> about the exact representation of line-break.

Another option I'd love to be able to deal with; however, I'm in
control of neither the input format (a vendor tool that saves reports
in XML format) nor the desired output format (M$ Excel).

Thanks for the help.
 

Richard Tobin

> Another option I'd love to be able to deal with; however, I'm in
> control of neither the input format (a vendor tool that saves reports
> in XML format) nor the desired output format (M$ Excel).

You haven't told us *why* you need the newlines as character
references (incidentally, they're not entities). When you say the
output format is Excel, what do you mean? An XML document that Excel
can process? If so, it shouldn't care about whether you use literal
newlines or a reference.

-- Richard
 

chiaman

Because the data that includes the embedded references is formatted,
I want the references included in the Excel output so that the newlines
appear in the cell data when displayed in Excel. (I know, pick a
better tool than Excel.)

For example, given the following:

<poem>
<lines>A unix salesperson, Lenore&#10;Loved her job, but loved the
beach more.&#10;She devised such a way&#10;to combine work and
play:&#10;She sells C-shells by the seashore</lines>
<author>Unknown</author>
</poem>

Translated into Excel:
<Cell><Data ss:Type="String">A unix salesperson, Lenore
Loved her job, but loved the beach more.
She devised such a way
to combine work and play:
She sells C-shells by the seashore</Data><Cell>

when actually opened in Excel renders as

A unix salesperson, Lenore Loved her job, but loved the beach more. She
devised such a way to combine work and play: She sells C-shells by the
seashore

but if the newlines in the Excel XML include actual references:

<Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;
Loved her job, but loved the beach more.&#10;
She devised such a way&#10;
to combine work and play:&#10;
She sells C-shells by the seashore</Data><Cell>

it will render properly in Excel as

A unix salesperson, Lenore
Loved her job, but loved the beach more.
She devised such a way
to combine work and play:
She sells C-shells by the seashore

So the references are in the source because they're actually important.
I want them retained when I translate it to excel because they remain
important.
 

Johannes Koch

chiaman said:
> but if the newlines in the Excel XML include actual references:
>
> <Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;
> Loved her job, but loved the beach more.&#10;
> She devised such a way&#10;
> to combine work and play:&#10;
> She sells C-shells by the seashore</Data><Cell>
>
> Will render properly in the Excel as
>
> A unix salesperson, Lenore
> Loved her job, but loved the beach more.
> She devised such a way
> to combine work and play:
> She sells C-shells by the seashore

What does Excel render if it is

<Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;Loved her job, but loved the beach more.&#10;She devised such a way&#10;to combine work and play:&#10;She sells C-shells by the seashore</Data><Cell>

instead?
 

chiaman

As I said earlier - if the actual references are included, when viewing
the file in Excel, the line breaks show in the correct places - this
is, of course, assuming the last <Cell> is actually </Cell> ;) When
you view this in Excel, you would see:

A unix salesperson, Lenore
Loved her job, but loved the beach more.
She devised such a way
to combine work and play:
She sells C-shells by the seashore

For actual line breaks to appear in Excel, they have to be included in
the XML as references, otherwise, they're just parsed as whitespace and
render as a single space within Excel.
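If the cells can be written by a small program instead of (or after) the XSLT step, the references can be produced at write time. A minimal sketch, assuming the Excel XML Cell/Data markup shown in this thread; the enclosing Workbook/Worksheet/Row elements and the ss namespace declaration are left out, and the excel_string_cell name is hypothetical. xml.sax.saxutils.escape escapes &, < and > first and only then applies the extra mapping, so the & in &#10; is not double-escaped.

```python
from xml.sax.saxutils import escape

# Extra replacement applied by escape() after &, < and > are escaped.
NEWLINE_AS_REFERENCE = {"\n": "&#10;"}


def excel_string_cell(text: str) -> str:
    """Build an Excel XML string cell whose line breaks are written as &#10;."""
    # Fold CR LF and lone CR into LF first so every break becomes one reference.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    body = escape(text, NEWLINE_AS_REFERENCE)
    return '<Cell><Data ss:Type="String">' + body + "</Data></Cell>"


poem = (
    "A unix salesperson, Lenore\n"
    "Loved her job, but loved the beach more.\n"
    "She devised such a way\n"
    "to combine work and play:\n"
    "She sells C-shells by the seashore"
)
print(excel_string_cell(poem))
```

The printed cell contains no literal newlines at all, only &#10; references, and it closes with </Cell>.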
 

Johannes Koch

chiaman said:
> As I said earlier

No. You provided two examples:

1. Newline characters, no character references
2. Newline characters followed by character references

for which you added the renderings in Excel.

I asked for a third:
No newline characters, but character references

Maybe in the end it's an issue of the different line-break character(s)
on different systems (U+000D U+000A vs. U+000A vs. U+000D).
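On the line-break conventions: XML 1.0's end-of-line handling means the parser reports a literal CR LF, a lone CR and a lone LF in the source all as a single LF, while character references survive as the characters they name. A quick check with Python's expat-based xml.etree.ElementTree (standing in for Xerces here, which is an assumption):

```python
import xml.etree.ElementTree as ET

# Raw document text using each line-break convention, plus explicit references.
samples = {
    "CR LF": "<d>a\r\nb</d>",
    "CR": "<d>a\rb</d>",
    "LF": "<d>a\nb</d>",
    "&#13;&#10;": "<d>a&#13;&#10;b</d>",
}

for label, doc in samples.items():
    # repr() makes the parsed line-break characters visible.
    print(f"{label:>10}: {ET.fromstring(doc).text!r}")
```

The three literal forms all come back as 'a\nb'; only the references yield 'a\r\nb'. So whichever convention the vendor file uses, Excel never sees it unless the output itself contains references.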
 

Richard Tobin

chiaman said:
> For actual line breaks to appear in Excel, they have to be included in
> the XML as references, otherwise, they're just parsed as whitespace and
> render as a single space within Excel.

I'm afraid that all I can suggest is that you complain to Microsoft,
because XML applications should not treat &#10; in text any
differently from a newline character (a conforming XML parser will
return the same character in both cases).

-- Richard
 

Peter Flynn

chiaman wrote:
> [...]
> So the references are in the source because they're actually important.
> I want them retained when I translate it to excel because they remain
> important.

OK. Yes, picking a better system than Excel would be nice, but...

If you're not in control of the input format, then just run the
file through a filter and turn the numeric references into some
dummy empty element which you can transform back to &#10;
after. An <lb/>, as Joseph suggested, would be conventional, e.g.

$ sed -e "s+&#10;+<lb/>+g" original.file >new.file

///Peter
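For anyone without sed, here is the same pre-filter as a small Python sketch. It runs on the raw file text before any parser sees it, which is the only stage at which the references still exist. Two assumptions go beyond Peter's one-liner: it also catches the hexadecimal and zero-padded spellings of the same reference, and the file paths passed to the hypothetical filter_file helper are placeholders.

```python
import re

# &#10;, &#010;, &#xA;, &#xa;, &#x0A; ... all denote the newline character U+000A.
NEWLINE_REFERENCE = re.compile(r"&#(?:0*10|x0*[aA]);")


def refs_to_lb(raw_xml: str) -> str:
    """Replace newline character references with <lb/> markers in raw XML text.

    Like the sed one-liner, this is purely textual: it does not skip
    attribute values, comments or CDATA sections, where an <lb/> would be wrong.
    """
    return NEWLINE_REFERENCE.sub("<lb/>", raw_xml)


def filter_file(src_path: str, dst_path: str) -> None:
    # newline="" leaves the file's own line endings untouched on read and write;
    # UTF-8 is an assumption about the vendor file's encoding.
    with open(src_path, encoding="utf-8", newline="") as src:
        data = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(refs_to_lb(data))


print(refs_to_lb("<foo>line of text&#10;another line of text&#xA;yet another</foo>"))
```

The stylesheet then sees ordinary <lb/> elements, and the &#10; can be put back after serialization.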
 
