Viewing XML

MartinC

I am no expert on the subject, but some of my translation work is being
saved as XML files by a .NET app. Now I would like to just "read" the
text. Is there a way to turn XML automagically into something readable?
Like:
===============================
<?xml version="1.0" encoding="utf-8"?>
<!--export-->
<editionyear="2007">
<film id="11">
<!--Bled Number One-->
<originalarticle />
<originaltitle>Bled Number One</originaltitle>
<englisharticle />
<englishtitle />
<useenglishtitle>False</useenglishtitle>
<premiere>Geen</premiere>
<productioncountry>Algeria, France</productioncountry>
<productionyear>2006</productionyear>
<color>Kleur</color>
<format>35mm</format>
<screenratio>-</screenratio>
<lengthinminutes>100</lengthinminutes>
<credits>
<director>Ameur-Zaïmeche, Rabah</director>
<producers />
<sales>Les Films du Losange</sales>
<distributors>A-Film Distribution</distributors>
<scenario>Rabah Ameur-Zaïmeche, Louise Thermes</scenario>
<photography>Lionel Sautier, Olivier Smittarello, Hakim Si
Ahmed</photography>
<editor>Nicolas Bancilhon</editor>
<artdesign />
<sound>Timothee Alazraki, Bruno Auzet, Mohamed Naman</sound>
<music>Rodolphe Burger</music>
<other />
<cast>Rabah Ameur-Zaïmeche, Meriem Serbah, Abel Jafri, Farida
Ouchani, Ramzy Bedia</cast>
</credits>
<synopsis>
<short language="dutch" length="252">
<p>Fictie als een documentaire jamsessie. De filmmaker speelt
zelf de hoofdrol van een crimineel die Frankrijk is uitgezet en terug
moet naar zijn dorp in Algerije dat hij niet voor niets ooit heeft
verlaten. Improviserend en met muzikaal gevoel gefilmd. </p>
</short>
===============================
into:
===============================
Bled Number One
Algeria, France
2006
Color
35mm
100
Ameur-Zaïmeche, Rabah
Les Films du Losange
A-Film Distribution
Rabah Ameur-Zaïmeche, Louise Thermes
Lionel Sautier, Olivier Smittarello, Hakim Si Ahmed
Nicolas Bancilhon
Timothee Alazraki, Bruno Auzet, Mohamed Naman
Rodolphe Burger
Rabah Ameur-Zaïmeche, Meriem Serbah, Abel Jafri, Farida Ouchani,
Ramzy Bedia
Fictie als een documentaire jamsessie. De filmmaker speelt zelf de
hoofdrol van een crimineel die Frankrijk is uitgezet en terug moet naar
zijn dorp in Algerije dat hij niet voor niets ooit heeft verlaten.
Improviserend en met muzikaal gevoel gefilmd.
===============================

Rgds

Martin
 
Joe Kesselman

MartinC said:
I am no expert on the subject, but some of my translation work is being
saved as XML files by a .NET app. Now I would like to just "read" the
text. Is there a way to turn XML automagically into something readable?

Yes, but.

The problem is that XML is not a single language. It's a basic syntax,
with many different languages layered on top of it. An automatic
conversion needs to understand which language you're working in, and how
it expresses what kinds of information.

There are certainly tools such as XSLT which can be used to render XML
into forms which are easier for humans to read, for example by producing
HTML from it. But to use those tools, you need to be able to describe to
them what the current structure of the XML is and how you want to map
that to the displayable version. Essentially, you're writing a program
to automate this conversion, though you're doing so in a language
designed for the purpose.

So you can get automatic rendering -- but only after someone automates
it for you. Contact whoever wrote the application you're working with,
and ask them how to obtain a rendered version of the information. Or
learn to write your own stylesheets. Or settle for reading the XML,
which *is* human-readable even when it isn't particularly human-friendly.
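
To make the "describe the structure and how to map it" step concrete, here is a minimal sketch in Python rather than XSLT. The `LABELS` table and the `render` function are my own illustration, not anything the exporting application provides, and the sample is a cut-down, well-formed version of the posted file with the root assumed to be `<edition year="2007">`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping for this one export format: element name -> label to show.
# Elements that are not listed (useenglishtitle, producers, ...) are skipped.
LABELS = {
    "originaltitle": "Title",
    "productioncountry": "Country",
    "productionyear": "Year",
    "director": "Director",
}

def render(xml_text: str) -> str:
    """Return one 'Label: value' line per mapped, non-empty element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():  # document order
        label = LABELS.get(elem.tag)
        if label and elem.text and elem.text.strip():
            # Collapse line breaks that come from wrapping inside the file.
            lines.append(f"{label}: {' '.join(elem.text.split())}")
    return "\n".join(lines)

SAMPLE = """<edition year="2007">
  <film id="11">
    <originaltitle>Bled Number One</originaltitle>
    <useenglishtitle>False</useenglishtitle>
    <productioncountry>Algeria, France</productioncountry>
    <productionyear>2006</productionyear>
    <credits>
      <director>Ameur-Zaïmeche, Rabah</director>
      <producers />
    </credits>
  </film>
</edition>"""

if __name__ == "__main__":
    print(render(SAMPLE))
```

The point is only that the mapping has to be written down by someone who knows what each element means; the same table could just as well be expressed as templates in a stylesheet.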
 
p.lepin

MartinC said:
I am no expert on the subject, but some of my translation
work is being saved as XML files by a .NET app.
[byte order mark removed]
Duh.

<?xml version="1.0" encoding="utf-8"?>
<!--export-->
<editionyear="2007">

That's not well-formed XML: there is no space between the element name and the attribute.

MartinC said:
<film id="11">
<!--Bled Number One-->
<originalarticle />
<originaltitle>Bled Number One</originaltitle>
<englisharticle />
<englishtitle />
<useenglishtitle>False</useenglishtitle>
<premiere>Geen</premiere>
<productioncountry>Algeria,
France</productioncountry>
<productionyear>2006</productionyear>
<color>Kleur</color>
<format>35mm</format>
<screenratio>-</screenratio>
<lengthinminutes>100</lengthinminutes>
<credits>
<director>Ameur-Zaïmeche, Rabah</director>
<producers />
<sales>Les Films du Losange</sales>
<distributors>A-Film Distribution</distributors>
<scenario>Rabah Ameur-Zaïmeche, Louise
Thermes</scenario>
<photography>Lionel Sautier, Olivier Smittarello,
Hakim Si
Ahmed</photography>
<editor>Nicolas Bancilhon</editor>
<artdesign />
<sound>Timothee Alazraki, Bruno Auzet, Mohamed
Naman</sound>
<music>Rodolphe Burger</music>
<other />
<cast>Rabah Ameur-Zaïmeche, Meriem Serbah, Abel
Jafri, Farida
Ouchani, Ramzy Bedia</cast>
</credits>
<synopsis>
<short language="dutch" length="252">
<p>Fictie als een documentaire jamsessie. De
filmmaker speelt
zelf de hoofdrol van een crimineel die Frankrijk is
uitgezet en terug
moet naar zijn dorp in Algerije dat hij niet voor niets
ooit heeft
verlaten. Improviserend en met muzikaal gevoel gefilmd.
</p>
</short>

You just saved a few bytes of bandwidth and a few seconds
of your time by omitting the closing tags. By doing that,
you've also wasted several seconds of time for every person
who decided to try and help you. That's not very kind of
you. Not very smart, either.

MartinC said:
Now I would like to just "read" the text. Is there a way
to turn XML automagically into something readable?

Not reliably. A stylesheet with no templates of its own falls
back on XSLT's built-in rules, which copy out all the text and
drop the markup. That is close to what you seem to want:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
</xsl:stylesheet>

Stuff into default.xsl or somesuch, then add the following
processing instruction:

<?xml-stylesheet type="text/xsl" href="default.xsl"?>

after the XML declaration in your XML file.

If that isn't what you really need, ask your resident XML
expert to create a transformation for you.
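
If no XSLT processor is at hand, roughly the same effect can be had with a few lines of Python. This is a sketch of the idea, not of what the stylesheet does internally: `all_text` and `readable_lines` are names I made up, and the sample assumes the root element has been repaired to `<edition year="2007">` and the closing tags restored.

```python
import xml.etree.ElementTree as ET

def all_text(xml_text: str) -> str:
    """Concatenate the text of every element, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() walks the tree and yields text only; attributes are not
    # included, and the default parser has already discarded comments.
    return "".join(root.itertext())

def readable_lines(xml_text: str) -> list[str]:
    """Same text, with whitespace-only lines dropped for easier reading."""
    return [line.strip() for line in all_text(xml_text).splitlines() if line.strip()]

SAMPLE = """<edition year="2007">
<film id="11">
<!--Bled Number One-->
<originaltitle>Bled Number One</originaltitle>
<englishtitle />
<useenglishtitle>False</useenglishtitle>
<productionyear>2006</productionyear>
</film>
</edition>"""

if __name__ == "__main__":
    for line in readable_lines(SAMPLE):
        print(line)
```

Note that values such as "False" come out along with everything else, which is why this is only close to the output that was asked for.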
 
Juergen Kahrs

MartinC said:
I am no expert on the subject, but some of my translation work is being
saved as XML files by a .NET app. Now I would like to just "read" the
text. Is there a way to turn XML automagically into something readable?
Like:
===============================
<?xml version="1.0" encoding="utf-8"?>
<!--export-->
<editionyear="2007">

First you have to correct this line to

<edition year="2007">

Then you can use a parser to verify well-formedness

xmllint --noout file.xml
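
If xmllint is not installed, a similar check can be sketched with Python's standard library. The function name `check_well_formed` and the three sample strings are mine; they are shortened stand-ins for the posted file, not the file itself.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text: str):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# The opening tag as posted: element name and attribute run together.
BROKEN = '<editionyear="2007"><film id="11" /></editionyear>'
# The corrected opening tag, with the document properly closed.
FIXED = '<edition year="2007"><film id="11" /></edition>'
# Corrected opening tag, but cut off before the closing tags.
TRUNCATED = '<edition year="2007"><film id="11">'

if __name__ == "__main__":
    for name, text in [("broken", BROKEN), ("fixed", FIXED), ("truncated", TRUNCATED)]:
        problem = check_well_formed(text)
        print(name, "->", "well-formed" if problem is None else problem)
```

Like `xmllint --noout`, this only tells you whether the file is well-formed; it says nothing about whether the content is what the application expects.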

MartinC said:
Bled Number One
Algeria, France
2006
Color
35mm

It looks like you want some parts of the XML file to be
displayed and others to be ignored. I usually do this
kind of processing with xgawk. For example, this script

http://home.vrweb.de/~juergen.kahrs...Character-data-and-encoding-of-character-sets

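@load xml
# Print each run of character data; "%s" keeps a literal % in the text from being read as a format
XMLCHARDATA { printf "%s", $0 }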

will print all character data (text) and ignore comments
and attributes. Maybe this helps you.
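
For readers without xgawk, the same event-driven idea can be sketched with the expat bindings in Python's standard library: register a handler for character data only, so comments and attributes are never seen. The function name `character_data` and the sample are my own, and the sample assumes the corrected `<edition year="2007">` root.

```python
import xml.parsers.expat

def character_data(xml_text: str) -> str:
    """Collect only the character data reported by the parser."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Only character data has a handler; start tags (with their attributes)
    # and comments trigger no callback and are therefore ignored.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)  # True: this is the final piece of input
    return "".join(chunks)

SAMPLE = """<edition year="2007">
<!--export-->
<film id="11">
<originaltitle>Bled Number One</originaltitle>
<format>35mm</format>
</film>
</edition>"""

if __name__ == "__main__":
    print(character_data(SAMPLE))
```

As with the xgawk script, line breaks in the output are simply the ones that were in the file between the tags.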
 
Peter Flynn

MartinC said:
I am no expert on the subject, but some of my translation work is being
saved as XML files by a .NET app. Now I would like to just "read" the
text. Is there a way to turn XML automagically into something readable?

No, not automatically, unless someone has done it before for this type
of file and is prepared to let you have their code. XML documents
typically contain no information about how the text is to be used
(displayed, typeset, spoken, etc.). You have to specify (in some
transformation language, e.g. XSLT) how each element is to be represented.
This isn't difficult, but it does mean learning how to do it, or
employing someone who does.

///Peter
 
