XML-schema 'best practice' question

F

Frank Millman

Hi all

This is not strictly a Python question, but as I am writing in Python,
and as I know there are some XML gurus on this list, I hope it is
appropriate here.

XML-schemas are used to define the structure of an xml document, and
to validate that a particular document conforms to the schema. They
can also be used to transform the document, by filling in missing
attributes with default values.

In my situation, both the creation and the processing of the xml
document are under my control. I know that this begs the question 'why
use xml in the first place', but let's not go there for the moment.

Using minixsv, validating a document with a schema works, but is quite
slow. I appreciate that lxml may be quicker, but I think that my
question is still applicable.

I am thinking of adding a check to see if a document has changed since
it was last validated, and if not, skip the validation step. However,
I then do not get the default values filled in.

I can think of two possible solutions. I just wondered if this is a
common design issue when it comes to xml and schemas, and if there is
a 'best practice' to handle it.

1. Don't use default values - create the document with all values
filled in.

2. Use python to check for missing values and fill in the defaults
when processing the document.

Or maybe the best practice is to *always* validate a document before
processing it.

How do experienced practitioners handle this situation?

Thanks for any hints.

Frank Millman
 
L

Lorenzo Gatti

I am thinking of adding a check to see if a document has changed since
it was last validated, and if not, skip the validation step. However,
I then do not get the default values filled in.

I can think of two possible solutions. I just wondered if this is a
common design issue when it comes to xml and schemas, and if there is
a 'best practice' to handle it.

1. Don't use default values - create the document with all values
filled in.

2. Use python to check for missing values and fill in the defaults
when processing the document.

Or maybe the best practice is to *always* validate a document before
processing it.

The stated problem rings a lot of premature optimization bells;
performing the validation and default-filling step every time,
unconditionally, is certainly the least crooked approach.

In case you really want to avoid unnecessary schema processing, if you
are willing to use persistent data to check for changes (for example,
by comparing a hash or the full text of the current document with the
one from the last time you performed validation) you can also store
the filled-in document that you computed, either as XML or as
serialized Python data structures.

Regards,
Lorenzo Gatti
 
S

skip

Frank> 1. Don't use default values - create the document with all values
Frank> filled in.

Frank> 2. Use python to check for missing values and fill in the defaults
Frank> when processing the document.

Frank> Or maybe the best practice is to *always* validate a document
Frank> before processing it.

Frank> How do experienced practitioners handle this situation?

3. Don't use XML.

(sorry, couldn't resist)

Skip
 
F

Frank Millman

Hi all

This is not strictly a Python question, but as I am writing in Python,
and as I know there are some XML gurus on this list, I hope it is
appropriate here.

XML-schemas are used to define the structure of an xml document, and
to validate that a particular document conforms to the schema. They
can also be used to transform the document, by filling in missing
attributes with default values.
[..]

Or maybe the best practice is to *always* validate a document before
processing it.

I have realised that my question was irrelevant.

xml's raison d'etre is to facilitate the exchange of information
between separate entities. If I want to use xml as a method of
serialisation within my own system, I can do what I like, but there
can be no question of 'best practice' in this situation.

When xml is used as intended, and you want to process a document
received from a third party, there is no doubt that you should always
validate it first before processing it. Thank you, Lorenzo, for
pointing out the obvious. It may take me a while to catch up, but at
least I can see things a little more clearly now.

As to why I am using xml at all, I know that there is a serious side
to Skip's light-hearted comment, so I will try to explain.

I want to introduce an element of workflow management (aka Business
Process Management) into the business/accounting system I am
developing. I used google to try to find out what the current state of
the art is. After several months of very confusing research, this is
the present situation, as best as I can figure it out.

There is an OMG spec called BPMN, for Business Process Modeling
Notation. It provides a graphical notation, intended to be readily
understandable by all business users, from business analysts, to
technical developers, to those responsible for actually managing and
monitoring the processes. Powerful though it is, it does not provide a
standard method of serialsing the diagram, so there is no standard way
of exchanging a diagram between different vendors, or of using it as
input to a workflow engine.

There is an OASIS spec called WS-BPEL, for Web Services Business
Process Execution Language. It defines a language for specifying
business process behavior based on Web Services. This does have a
formal xml-based specification. However, it only covers processes
invoked via web services - it does not cover workflow-type processes
within an organisation. To try to fill this gap, a few vendors got
together and submitted a draft specification called BPEL4People. This
proposes a series of extensions to the WS-BPEL spec. It is still at
the evaluation stage.

The BPMN spec includes a section which attempts to provide a mapping
between BPMN and BPEL, but the authors state that there are areas of
incompatibility, so it is not a perfect mapping.

Eventually I would like to make sense of all this, but for now I want
to focus on BPMN, and ignore BPEL. I can use wxPython to design a BPMN
diagram, but I have to invent my own method of serialising it so that
I can use it to drive the business process. For good or ill, I decided
to use xml, as it seems to offer the best chance of keeping up with
the various specifications as they evolve.

I don't know if this is of any interest to anyone, but it was
therapeutic for me to try to organise my thoughts and get them down on
paper. I am not expecting any comments, but if anyone has any thoughts
to toss in, I will read them with interest.

Thanks

Frank
 
L

Lorenzo Gatti

I want to introduce an element of workflow management (aka Business
Process Management) into the business/accounting system I am
developing. I used google to try to find out what the current state of
the art is. After several months of very confusing research, this is
the present situation, as best as I can figure it out.

What is the state of the art of existing, working software? Can you
leverage it instead of starting from scratch? For example, the
existing functionality of your accounting software can be reorganized
as a suite of components, web services etc. that can be embedded in
workflow definitions, and/or executing a workflow engine can become a
command in your application.
There is an OMG spec called BPMN, for Business Process Modeling
Notation. It provides a graphical notation [snip]
there is no standard way
of exchanging a diagram between different vendors, or of using it as
input to a workflow engine.

So BPMN is mere theory. This "spec" might be a reference for
evaluating actual systems, but not a standard itself.
There is an OASIS spec called WS-BPEL, for Web Services Business
Process Execution Language. It defines a language for specifying
business process behavior based on Web Services. This does have a
formal xml-based specification. However, it only covers processes
invoked via web services - it does not cover workflow-type processes
within an organisation. To try to fill this gap, a few vendors got
together and submitted a draft specification called BPEL4People. This
proposes a series of extensions to the WS-BPEL spec. It is still at
the evaluation stage.

Some customers pay good money for buzzword compliance, but are you
sure you want to be so bleeding edge that you care not only for WS-
something specifications, but for "evaluation stage" ones?

There is no need to wait for BPEL4People before designing workflow
systems with human editing, approval, etc.
Try looking into case studies of how BPEL is actually used in
practice.
The BPMN spec includes a section which attempts to provide a mapping
between BPMN and BPEL, but the authors state that there are areas of
incompatibility, so it is not a perfect mapping.

Don't worry, BPMN does not exist: there is no incompatibility.
On the other hand, comparing and understanding BPMN and BPEL might
reveal different purposes and weaknesses between the two systems and
help you distinguish what you need, what would be cool and what is
only a bad idea or a speculation.
Eventually I would like to make sense of all this, but for now I want
to focus on BPMN, and ignore BPEL. I can use wxPython to design a BPMN
diagram, but I have to invent my own method of serialising it so that
I can use it to drive the business process. For good or ill, I decided
to use xml, as it seems to offer the best chance of keeping up with
the various specifications as they evolve.

If you mean to use workflow architectures to add value to your
business and accounting software, your priority should be executing
workflows, not editing workflow diagrams (which are a useful but
unnecessary user interface layer over the actual workflow engine);
making your diagrams and definitions compliant with volatile and
unproven specifications should come a distant last.
I don't know if this is of any interest to anyone, but it was
therapeutic for me to try to organise my thoughts and get them down on
paper. I am not expecting any comments, but if anyone has any thoughts
to toss in, I will read them with interest.


1) There are a number of open-source or affordable workflow engines,
mostly BPEL-compliant and written in Java; they should be more useful
than reinventing the wheel.

2) With a good XML editor you can produce the workflow definitions,
BPEL or otherwise, that your workflow engine needs, and leave the
interactive diagram editor for a phase 2 that might not necessarily
come; text editing might be convenient enough for your users, and for
graphical output something simpler than an editor (e.g a Graphviz
exporter) might be enough.

3) Maybe workflow processing can grow inside your existing accounting
application without the sort of "big bang" redesign you seem to be
planning; chances are that the needed objects are already in place and
you only need to make workflow more explicit and add appropriate new
features.

Regards,
Lorenzo Gatti
 
L

Lorenzo Gatti

Sorry for pressing the send button too fast.

I want to introduce an element of workflow management (aka Business
Process Management) into the business/accounting system I am
developing. I used google to try to find out what the current state of
the art is. After several months of very confusing research, this is
the present situation, as best as I can figure it out.

What is the state of the art of existing, working software? Can you
leverage it instead of starting from scratch? For example, the
existing functionality of your accounting software can be reorganized
as a suite of components, web services etc. that can be embedded in
workflow definitions, and/or executing a workflow engine can become a
command in your application.
There is an OMG spec called BPMN, for Business Process Modeling
Notation. It provides a graphical notation [snip]
there is no standard way
of exchanging a diagram between different vendors, or of using it as
input to a workflow engine.

So BPMN is mere theory. This "spec" might be a reference for
evaluating actual systems, but not a standard itself.
There is an OASIS spec called WS-BPEL, for Web Services Business
Process Execution Language. It defines a language for specifying
business process behavior based on Web Services. This does have a
formal xml-based specification. However, it only covers processes
invoked via web services - it does not cover workflow-type processes
within an organisation. To try to fill this gap, a few vendors got
together and submitted a draft specification called BPEL4People. This
proposes a series of extensions to the WS-BPEL spec. It is still at
the evaluation stage.

Some customers pay good money for buzzword compliance, but are you
sure you want to be so bleeding edge that you care not only for WS-
something specifications, but for "evaluation stage" ones?

There is no need to wait for BPEL4People before designing workflow
systems with human editing, approval, etc.
Try looking into case studies of how BPEL is actually used in
practice.
The BPMN spec includes a section which attempts to provide a mapping
between BPMN and BPEL, but the authors state that there are areas of
incompatibility, so it is not a perfect mapping.

Don't worry, BPMN does not exist: there is no incompatibility.
On the other hand, comparing and understanding BPMN and BPEL might
reveal different purposes and weaknesses between the two systems and
help you distinguish what you need, what would be cool and what is
only a bad idea or a speculation.
Eventually I would like to make sense of all this, but for now I want
to focus on BPMN, and ignore BPEL. I can use wxPython to design a BPMN
diagram, but I have to invent my own method of serialising it so that
I can use it to drive the business process. For good or ill, I decided
to use xml, as it seems to offer the best chance of keeping up with
the various specifications as they evolve.

If you mean to use workflow architectures to add value to your
business and accounting software, your priority should be executing
workflows, not editing workflow diagrams (which are a useful but
unnecessary user interface layer over the actual workflow engine);
making your diagrams and definitions compliant with volatile and
unproven specifications should come a distant last.
I don't know if this is of any interest to anyone, but it was
therapeutic for me to try to organise my thoughts and get them down on
paper. I am not expecting any comments, but if anyone has any thoughts
to toss in, I will read them with interest.


1) There are a number of open-source or affordable workflow engines,
mostly BPEL-compliant and written in Java; they should be more useful
than reinventing the wheel.

2) With a good XML editor you can produce the workflow definitions,
BPEL or otherwise, that your workflow engine needs, and leave the
interactive diagram editor for a phase 2 that might not necessarily
come; text editing might be convenient enough for your users, and for
graphical output something simpler than an editor (e.g a Graphviz
exporter) might be enough.

3) Maybe workflow processing can grow inside your existing accounting
application without the sort of "big bang" redesign you seem to be
planning; chances are that the needed objects are already in place and
you only need to make workflow more explicit and add appropriate new
features.

Regards,
Lorenzo Gatti
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top