Scripting XML?

Vijai Kalyan

Hello All,

I have a few questions which might seem irrelevant and/or foolish to you.
I am asking anyway so I can find out.

1. Is XSL as powerful as a programming language such as Java in its
abilities to transform XML? The W3C site has the following definition
on XSLT for example:

"XSLT is designed for use as part of XSL, which is a stylesheet
language for XML. In addition to XSLT, XSL includes an XML vocabulary
for specifying formatting. XSL specifies the styling of an XML
document by using XSLT to describe how the document is transformed
into another XML document that uses the formatting vocabulary.

XSLT is also designed to be used independently of XSL. However, XSLT
is not intended as a completely general-purpose XML transformation
language. Rather it is designed primarily for the kinds of
transformations that are needed when XSLT is used as part of XSL."

2. Does the above mean that when an XML document is transported over a
network, its content is totally static?

This last question is relevant because we faced this problem when
using XML to describe a sequence of actions. Most of the data for the
actions was available in the document, but some of it was dynamic,
that is, provided by the environment.

To solve this we looked at various scripting languages (some of them
geared towards XML but the majority not) including Groovy, Jython,
Simkin, Xscript, XML Script etc.

If you do find these questions relevant, can I impose upon the group
to read this writeup and give your comments (including comments such
as "this whole shebang is not correct but it is not even wrong")?

My news reader for some reason or other wouldn't allow me to post the
document so I will have to ask you to read it from here:

http://www.sourceforge.net/projects/xmlms/

thank you,

-vijai.
 
Martin Honnen

Vijai said:
> 1. Is XSL as powerful as a programming language such as Java in its
> abilities to transform XML? The W3C site has the following definition
> on XSLT for example:
>
> "XSLT is designed for use as part of XSL, which is a stylesheet
> language for XML. In addition to XSLT, XSL includes an XML vocabulary
> for specifying formatting. XSL specifies the styling of an XML
> document by using XSLT to describe how the document is transformed
> into another XML document that uses the formatting vocabulary.
>
> XSLT is also designed to be used independently of XSL. However, XSLT
> is not intended as a completely general-purpose XML transformation
> language. Rather it is designed primarily for the kinds of
> transformations that are needed when XSLT is used as part of XSL."

XSLT 1.0 (http://www.w3.org/TR/xslt) is a declarative/functional
programming language that I think has been proven to be Turing complete
so in theory you can solve any task with it that you can solve with
other programming languages. So yes, for transforming XML the language
XSLT 1.0 is certainly theoretically as powerful as Java and practically
even more suited to do such transformations as it is specifically
constructed to transform XML to XML or to HTML or to text.
 
Andy Dingley

> 1. Is XSL as powerful as a programming language such as Java in its
> abilities to transform XML?

Without an agreement as to what "powerful" means, then there's no way
to answer this question.

XSLT is extremely specialised. It takes XML documents as input and
generates an XML document as output. If you ask it nicely, it can
serialise this output document as HTML or even text, by breaking a few
of the XML well-formedness rules, but it's still broadly a "flattened"
serialisation of an XML structure.

XSL is by and large equivalent to XSLT. The difference is that XSL
also includes XSL-FO (formatting objects), which is useful for
generating non-XML outputs - frequently PDF, Quark output, or similar
integration to DTP systems. Strictly, XSLT is a subset of XSL, but
"XSL" is a common loose term for either.

XSLT is based on XPath (learning XSL is easy, it's learning XPath
that takes the effort). With these tools, you can very easily perform
tasks that would be difficult in Java. However, XSLT is also a
functional language, not a procedural one, and so most programmers
have a great deal of trouble in writing it well. It's more
"write-only" than badly structured Perl.

XSLT 2 and EXSLT are also worth looking at. XSLT is based on XML,
where the contents of nodes (text nodes or attributes) are largely
opaque. This is a major limitation in practice, and so these are
efforts to improve matters.

On the whole, I'd assert that XSL was "more useful" than Java "in its
abilities to transform XML", but I wouldn't claim this was "more
powerful".

> 2. Does the above mean that when an XML document is transported over a
> network, its content is totally static?

I have no idea how you draw that conclusion from the statement listed,
so I don't really know what you're getting at.

XML has a number of aspects (such as encoding and whitespace) in which
variations are seen as freely interchangeable. As such, a transport
protocol can change these without consequence. Whether this is regarded
as "static" depends on the context.

There are also many cases (e.g. serving XML content to an HTML-only
browser) where XML may be transformed on being served. This isn't
really a mere "transport" though.

So XML documents transported over networks are static. But I'm sure
this wasn't what you meant, because your whole project seems to be
based on breaking this.

> This last question is relevant because we faced this problem when
> using XML to describe a sequence of actions.

I had a brief read of your document. To be frank, I found it a very
hard read - it needs an introduction to it, which it painfully doesn't
have at present. There's no distinction drawn between your monitoring
project, and your XML-MS concept.

If your project boils down to "Add in-transit processing of scriptlets
to XML", then that's a worthy idea (although it's already out there).
What I want to know about it is how the processing model works (who
processes it, and how does it decide what to process), what's
available as a coding platform to use, and what's the presented
interface for the document that's being processed. I don't care about
in-line code fragments, because quite honestly if I have to learn
anything new to fulfill that particular role, this isn't a good
solution for me.

There's also a strong sense that you're ignorant of Schematron, JSP,
Cocoon, even XSLT, and many of the other "pre-invented wheels" that
are already out there.
 
Shmuel (Seymour J.) Metz

on 11/07/2004 said:
> 1. Is XSL as powerful as a programming language such as Java in its
> abilities to transform XML?

It is for the sorts of things that it is designed to do. If you need
to do something beyond that, use a programming language and an XML
parser.

> 2. Does the above mean that when an XML document is transported over
> a network, its content is totally static?

No. It has nothing to do with transporting XML over a network. The T
stands for transform, not transport.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to (e-mail address removed)
 
Vijayaraghavan Kalyanapasupathy

Subject: Re: Scripting XML?
From: Andy Dingley <(e-mail address removed)>
Newsgroups: comp.text.xml

> Without an agreement as to what "powerful" means, then there's no way
> to answer this question.
>
> XSLT 2 and EXSLT are also worth looking at. XSLT is based on XML,
> where the contents of nodes (text nodes or attributes) are largely
> opaque. This is a major limitation in practice, and so these are
> efforts to improve matters.
>
> On the whole, I'd assert that XSL was "more useful" than Java "in its
> abilities to transform XML", but I wouldn't claim this was "more
> powerful".

Yes, you are right. My mistake in not clarifying that.

Well, simply put,

Can I take an XML parser, an XML document, a programming language such as
Java and combine them in the following manner for example:

XML Document => parsed by parser --> in-memory representation of
document => manipulated by application --> output in different form

So, if there were commands in some "embedded language" (or variable
value substitutions, variable declarations, function definitions,
function declarations etc.) in the document (in attribute values, element
data etc), the application can walk through the in-memory representation
of the document and act as a recognizer for the embedded language. Upon
completion the final document is free of all the "embedded language"
constructs.

Note that in both cases the document is, per se, valid as opposed to
merely well-formed, because the embedded language constructs have no
meaning to a pure XML parser.

I can do this with the above combination certainly!
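
The pipeline described above (parse, walk the in-memory tree, rewrite, serialise) might look roughly like the following in Python. This is only a minimal sketch: the `${name}` placeholder syntax, the `expand` and `process` helpers, and the sample document are all hypothetical and are not taken from the XML-MS writeup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker syntax: ${name} stands for a value supplied by the environment.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand(text, env):
    """Replace each ${name} with env[name]; unknown names are left untouched."""
    if text is None:
        return None
    return PLACEHOLDER.sub(lambda m: str(env.get(m.group(1), m.group(0))), text)

def process(xml_source, env):
    """Parse the document, walk the tree, expand placeholders, serialise again."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        elem.text = expand(elem.text, env)
        elem.tail = expand(elem.tail, env)
        # Copy the items first so the attribute dict is not mutated while iterating.
        for name, value in list(elem.attrib.items()):
            elem.set(name, expand(value, env))
    return ET.tostring(root, encoding="unicode")

doc = '<action host="${host}"><retry>${retries}</retry></action>'
result = process(doc, {"host": "example.org", "retries": 3})
print(result)  # <action host="example.org"><retry>3</retry></action>
```

The output document no longer contains any placeholder, which corresponds to the "free of all embedded language constructs" condition.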

Question is, can I do the same thing with only XSLT which boils down to
asking, "can I represent the same actions in XSLT" or can I contrive an
equivalence between say Java and XSLT in their ability to do things with
an XML document?

> I have no idea how you draw that conclusion from the statement listed,
> so I don't really know what you're getting at.
>
> XML has a number of issues (re encoding and whitespace) which are seen
> as freely interchangeable. As such, a transport protocol can change
> this without consequence. Whether this is regarded as "static" depends
> on the context.
>
> There are also many cases (e.g. serving XML content to HTML-only
> browser) where XML may be transformed on being served. This isn't
> really a mere "transport" though.
>
> So XML documents transported over networks are static. But I'm sure
> this wasn't what you meant, because your whole project seems to be
> based on breaking this.

Well, what I meant is, if we consider the network as a black box and the
server and client on each side, then what is input into the box at the
server side is the same when it comes out at the client side of the box.
In between yes, the transport protocol may change things.

Specifically, an example of the client is "the parser on the browser"
not the rendering engine or the end-user.

Transformations such as applying a stylesheet are applied after the
client has received the document, so the point I am really getting at
is "after processing at the server side is complete" and "before
transformations on the client side are applied".

Naturally, when you consider stand alone documents used by an
application, the server really does nothing but the client (the parser)
does a lot of stuff before the document is passed on to the final
application (the end-user?).

> This last question is relevant because we faced this problem when
> using XML to describe a sequence of actions.
>
> I had a brief read of your document. To be frank, I found it a very
> hard read - it needs an introduction to it, which it painfully doesn't
> have at present. There's no distinction drawn between your monitoring
> project, and your XML-MS concept.

I see. It could be that I was explaining it from our perspective and
problems in the project so things got intermixed. I will try to separate
things out and clarify them.

> If your project boils down to "Add in-transit processing of scriptlets
> to XML", then that's a worthy idea (although it's already out there).

Well, a general-purpose scripting language can be used for anything,
including in-transit processing (I presume you mean things like
in-network processing?) primarily because the language itself doesn't
impose any restrictions on what can be done with the document and its
content.

Yes, the primary point here is as you said above, the "non-opaqueness"
of the content (such as attribute values or element data). Simply
put, it is the same as the C preprocessor walking through C modules and
substituting macros, except that the "pre-processor" in our problem is
more powerful than mere macro expansion.

> What I want to know about it is how the processing model works (who
> processes it, and how does it decide what to process), what's
> available as a coding platform to use, and what's the presented
> interface for the document that's being processed. I don't care about
> in-line code fragments, because quite honestly if I have to learn
> anything new to fulfill that particular role, this isn't a good
> solution for me.

Well here is how it could work:

- XML-Document A (in Schema S) served by server (which may be a
  webserver, or just a file reader library)

- XML-Document A is parsed by XML Parser P into an in-memory
  representation M

- Depending on the type of application (stand alone, web browser etc)

  + P invokes I (an interpreter for the embedded scripting language L)
    on the value of each attribute and the data of each element.

    or

  + P invokes I on only certain elements/attributes on-demand by the
    stand alone application

    # A stand alone application would be something that I would code for
      some specific project.

    # A general application would be something like a web-browser which
      does not expose the presence of L constructs in a particular
      document to an end-user, but does allow the end-user to tell it at
      a very general level what to do when such constructs are found.
      This is similar to you or me having control over whether IE or
      Firefox executes Java applets or Javascript functions.

  + I recognizes certain constructs in the data that is passed to it.
    So, in effect, I plays two roles:

    * A pattern recognizer
    * A language interpreter

      # Note that the language is as powerful as any conventional
        programming language.
      # The patterns are relatively simple and serve only to mark out
        regions of the data that should be considered as constructs in L

    * On recognition of the pattern, I performs the tasks denoted by
      the pattern:

      @ This task may be as simple as requesting macro substitution
      @ or it may be as complex as a nested function call

  + When I finishes, the result is an XML document that is free of L in
    all respects and conforms to a schema S'.
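
The two roles of I listed above (pattern recognizer and language interpreter) could be sketched as follows. Everything here is an assumption made for illustration: the `{{name}}` and `{{name:arg}}` marker syntax, the `Interpreter` class, and the choice to bind constructs to ordinary Python functions rather than to a new language L.

```python
import re
import xml.etree.ElementTree as ET

# Recognizer role: a hypothetical pattern that marks a region as a construct.
# {{name}} requests a macro; {{name:arg}} calls a function with one argument.
CONSTRUCT = re.compile(r"\{\{(\w+)(?::([^{}]*))?\}\}")

class Interpreter:
    """Interpreter role: evaluates recognised constructs against host bindings."""

    def __init__(self, macros, functions):
        self.macros = macros
        self.functions = functions

    def evaluate(self, match):
        name, arg = match.group(1), match.group(2)
        if arg is None:
            return str(self.macros[name])
        return str(self.functions[name](arg))

    def run(self, data):
        # The pattern only matches innermost constructs, so repeat until the
        # text stops changing; this is how nested calls get resolved.
        previous = None
        while data != previous:
            previous = data
            data = CONSTRUCT.sub(self.evaluate, data)
        return data

def process(xml_source, interpreter):
    """P invokes I on every attribute value and on the text of every element."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.text:
            elem.text = interpreter.run(elem.text)
        for key, value in list(elem.attrib.items()):
            elem.set(key, interpreter.run(value))
    return ET.tostring(root, encoding="unicode")

interp = Interpreter(macros={"user": "vijai"}, functions={"upper": str.upper})
source = '<task owner="{{user}}"><label>{{upper:{{user}}}}</label></task>'
output = process(source, interp)
print(output)  # <task owner="vijai"><label>VIJAI</label></task>
```

Note that the loop in `run` would not terminate if a macro expanded to an ever-growing construct; a real design would need a depth limit or a rule against self-referencing macros.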

This model does actually allow one to do in-transit processing
because all the necessary information to do such processing is available
in the document itself and the "patterns" can be extended to signal
which patterns are to be invoked and where. (That is a very good point,
thanks for pointing it out; I will look this up).

> There's also a strong sense that you're ignorant of Schematron, JSP,
> Cocoon, even XSLT, and many of the other "pre-invented wheels" that
> are already out there.

Definitely true. I had no idea about Schematron and Cocoon (and no
great experience with XSLT) before you pointed them out. JSP I have
heard of. So, yes, I will go and investigate them as well!

Thanks again for your comments, but I hope I can impose upon you to
comment some more?

regards,

-vijai.
 
Patrick TJ McPhee

% 1. Is XSL as powerful as a programming language such as Java in its
% abilities to transform XML?

It can be easier to write transformations involving document structure
using XSLT than to do it in some general-purpose language using
a tree representation of the same document. XSLT is not as strong
at text manipulation as some languages. I think Java is a poor language
for both problems.

One thing you can do with many XSLT processors is to define extension
functions written in other languages. This can allow you to take advantage
of its strengths and ignore its weaknesses.

[...]

% 2. Does the above mean that when an XML document is transported over a
% network, it's content is totally static?

The above, which I've omitted, didn't seem to have anything to do with
network transport. You can pass environmental data to an XSLT processor
using parameters.

Your problem description is too vague for me to give specific advice,
but if you're not prepared to blow some time delving into it a bit and
seeing what it's all about, you're not going to get XSLT working
effectively in your project.
 
Vijayaraghavan Kalyanapasupathy

> If your project boils down to "Add in-transit processing of scriptlets
> to XML", then that's a worthy idea (although it's already out there).

You have given me food for thought. Can you give me examples of these?

regards,

-vijai.
 
Andy Dingley

> Can I take a XML parser, a XML document, a programming language such as
> Java and combine them in the following manner for example:
>
> XML Document => parsed by parser --> in-memory representation of
> document => manipulated by application --> output in different form

Yes, of course you can. But that's not a particularly helpful
statement - it's still far too vague.

I think that at the most generic you're talking about "Applying
scriptlets to XML documents as they pass through a transform process"?
Even at this level, the problem splits in two.

One is like Cocoon. You have a fairly rigid "processor" engine, and
you tag your documents with a very lightweight reference to a styling
or scripting task. This will typically use an XML PI (processing
instruction) because it's emphatically a _link_ to a shared process /
styling / script. Many documents will pass through this same engine,
and have similar sets of transforms applied to them. It's not even
significant who transforms them - that task could be widely
distributed, so long as there's some consistency about the
interpretation of the stylesheet.
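
As a rough illustration of that first style, the sketch below reads a processing instruction from a document and uses it purely as a link into a shared registry of transforms. It is not how Cocoon itself is implemented; the `apply-transform` PI target, the `TRANSFORMS` registry, and the two toy transforms are hypothetical names invented for this example.

```python
from xml.dom import minidom, Node

# Hypothetical shared transforms, maintained by the engine, not by the documents.
def to_summary(doc):
    titles = [t.firstChild.data for t in doc.getElementsByTagName("title")]
    return "; ".join(titles)

def to_count(doc):
    return str(len(doc.getElementsByTagName("item")))

TRANSFORMS = {"summary": to_summary, "count": to_count}

def run_linked_transform(xml_source, pi_target="apply-transform"):
    """Find the document-level PI and run the transform it names, if any."""
    doc = minidom.parseString(xml_source)
    for node in doc.childNodes:
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE and node.target == pi_target:
            return TRANSFORMS[node.data.strip()](doc)
    return None  # the document carries no link, so nothing is applied

source = """<?xml version="1.0"?>
<?apply-transform summary?>
<items><item><title>First</title></item><item><title>Second</title></item></items>"""
result = run_linked_transform(source)
print(result)  # First; Second
```

The document only names the transform; the code lives in one place, so many documents can share it and any node holding the same registry could do the work.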

The other (which I think is rather more like your project) uses
embedded script fragments. At some point, not entirely unlike the XML
DOM model, an "event" is triggered which causes appropriate script
elements to be invoked.
http://www.w3.org/TR/2003/REC-xml-events-20031014/

This TR already defines much of what you're looking at. It (obviously
enough) sees a separation between "observer" and "target", whilst your
more narrow context assumes they're bound together. This would require
your model to state this binding for each document, but it also means
that you don't need to repeat the script inside each document.



I don't see any concepts like "pattern recognizer" or "language
interpreter" as being at all helpful to this project. My life already
has enough language interpreters in it - don't give me another one,
give me a simple binding to the one(s) I already know. Look at DOM
and event models, not parsing embedded scriptlets from scratch.
 
John Fereira

> Yes, of course you can. But that's not a particularly helpful
> statement - it's still far too vague.
>
> I think that at the most generic you're talking about "Applying
> scriptlets to XML documents as they pass through a transform process"?
> Even at this level, the problem splits in two.
>
> One is like Cocoon.

The Jakarta Velocity project may also be a good example. I just recently
converted all of the documentation for an open source project to Velocity
Anakia. All of the static HTML pages were converted to XHTML, some
Anakia-specific tags added, and a transformation performed such that a
format for viewing on a web page and a format for printing are available.
I could have just as easily (well, almost as easily) added an XSL-FO
transformation so that the documentation pages are available as PDF files
as well.
 
Vijayaraghavan Kalyanapasupathy

> I think that at the most generic you're talking about "Applying
> scriptlets to XML documents as they pass through a transform process"?

Well, yes. I guess another way to think about this is to consider an
XML-document as a template for something by itself (self-contained?).
It's processed repeatedly (repetition of the transform process) till
every scriptlet has been processed. The result is an XML-document.

> The other (which I think is rather more like your project) uses
> embedded script fragments. At some point, not entirely unlike the XML

Yes that is correct.

> DOM model, an "event" is triggered which causes appropriate script
> elements to be invoked.
> http://www.w3.org/TR/2003/REC-xml-events-20031014/

I will look into this. Thanx for the info!

> enough) sees a separation between "observer" and "target", whilst your
> more narrow context assumes they're bound together. This would require
> your model to state this binding for each document, but it also means
> that you don't need to repeat the script inside each document.

Before I comment on this, I probably need to read up on XML Events.

> I don't see any concepts like "pattern recognizer" or "language
> interpreter" as being at all helpful to this project. My life already
> has enough language interpreters in it - don't give me another one,
> give me a simple binding to the one(s) I already know. Look at DOM
> and event models, not parsing embedded scriptlets from scratch.

Yep, precisely why I am asking these questions. I don't want to
implement a parser and interpreter for a full-fledged language from
scratch either (painful :)!

thanx and regards,

-vijai.
 
