Excluding values in the xsd

dick.deneer

I have an XML file that describes a Cobol copybook member. The XML is
validated against an XSD.
One of the XML attributes is the Cobol field name. The XSD constrains
the length of this attribute's value to be greater than zero and less than 31.
Now I want to add another check: the value must not be one of the
Cobol reserved words. I have a list of reserved words (such as SUM,
ACCEPT, COMPUTE, etc.).
How can I specify these excluded values in the XSD so that XML
validation returns an error if one of the reserved words is used as
the attribute value?
DickD
 
Martin Honnen

You can enumerate those reserved words, e.g.:

  <xs:simpleType name="reserved-word">
    <xs:restriction base="xs:string">
      <xs:enumeration value="ACCEPT"/>
      <xs:enumeration value="COMPUTE"/>
      <xs:enumeration value="SUM"/>
      <!-- add further values here -->
    </xs:restriction>
  </xs:simpleType>
 
Martin Honnen

Martin said:
You can enumerate those reserved words e.g.

Sorry, I misread your request. Enumeration helps if you want to allow
only the reserved words, but not if you want to disallow them.
 
dick.deneer

Use a regular expression to describe the attribute's acceptable values.

http://www.w3.org/TR/xmlschema-2/#rf-pattern
http://www.w3.org/TR/xmlschema-2/#regexs

The problem is that any value is acceptable except those in the list of
reserved words.
In regex it is not easy to negate an expression. There is nothing
like ^(SUM,COMPUTE,DATA).
After a long internet search I found an expression that met my
needs.
Here is the Java code:
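public class ReservedWordCheck {
    public static void main(String[] args) {
        String s2 = "perfOrm";
        // (?!...) is a negative lookahead; (?im:...) matches the word list case-insensitively.
        String regex = "^(?:(?!^(?im:accept|accept-encoding|from|to|perform|sub)$)[\\w-])*$";
        System.out.println(s2 + " matches " + regex + " = " + s2.matches(regex));
    }
}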

The excluded values in this example are arbitrary.
But this kind of expression (it relies on a negative lookahead) is not
supported by Xerces or any other parser.
I found that the XML Schema specification talks about Level 1 regex
support.

So does anyone have an idea how to solve this?
Regards
Dick Deneer
 
Joseph Kesselman

So does anyone have an idea how to solve this?

I think Schema's supported regular expressions can be persuaded to do
it, though the expression may be painfully ugly.

If you aren't happy with that, implement the check in the application
rather than in schema.

Remember, the schema is only an initial sanity check on syntax and
overall structure of the document. It is NOT intended to capture and
check all possible semantic constraints. Some checking will still have
to be implemented in the application.
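
As a rough illustration of the application-side check suggested above, here is a minimal Java sketch. The class name, method name and the three-word list are hypothetical, and the case-insensitive comparison is an assumption, not something stated in the thread.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: names and word list are illustrative only.
final class CobolFieldNameCheck {
    // A small sample; a real list would hold every Cobol reserved word.
    private static final Set<String> RESERVED = Set.of("ACCEPT", "COMPUTE", "SUM");

    // Returns true if the name is 1 to 30 characters long and is not reserved.
    // Assumption: reserved words are compared case-insensitively.
    static boolean isValidFieldName(String name) {
        if (name == null || name.isEmpty() || name.length() > 30) {
            return false;
        }
        return !RESERVED.contains(name.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isValidFieldName("CUSTOMER-NAME"));
        System.out.println(isValidFieldName("compute"));
    }
}
```

The schema would keep the length constraint, and this check would run after schema validation has succeeded.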
 
dick.deneer

I think Schema's supported regular expressions can be persuaded to do
it, though the expression may be painfully ugly.

I think it is not possible. Please convince me :)

Regards
Dick Deneer
 
Alain Ketterlin

I think it is not possible. Please convince me :)

Regular languages are closed under complementation. So, you can be
sure it is possible: there _is_ a regular expression that matches
everything except a finite set of words. If you want to exclude, e.g.,
"if" and "else", you can go:

([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*

(I'm not sure about the regexp syntax for schemas). It may be a real
pain. I don't know if there's an easier way to get the same result.

-- Alain.
 
dick.deneer

Alain,

I tested your expression and it always returns true, whatever I type
(including "if" and "else").
Am I missing something?
 
Alain Ketterlin

([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*
I tested your expression and it always returns true, whatever I type
(including "if" and "else").
Am I missing something?

I did :) I went too fast. You have to 1) include trailing chars in each
alternative, 2) group prefixes to exclude, 3) take care of strict
prefixes. Something like:

([^ie].*|i|i[^f].*|if.+|e|e[^l].*|el|el[^s].*|els|els[^e].*|else.+)

This may get really hairy with lots of keywords. Be careful with common
prefixes, like "if" and "int" (the strict prefix "in" needs its own
alternative):

([^i].*|i|i[^nf].*|if.+|in|in[^t].*|int.+)

I stop here, in fear of writing nonsense. The basic idea is simple:

1) draw a trie (lexicographic tree) containing all the words
2) add one alternative for each path to a non-leaf node (i, e, el, els)
3) add one alternative for each path out of a node (either
leaf or non-leaf), i.e., a path that starts "in" the tree and "exits"
the tree at some point (i[^f].*, if.+ etc.)

(It basically amounts to inverting the accepting states of a
deterministic finite automaton.)
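
The three steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class `ExcludeWordsRegex` and its methods are hypothetical names, the words are assumed to contain only letters, digits and hyphens, the match is case-sensitive, and the empty string is deliberately not matched (the schema in question already requires a non-empty value).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds a lookahead-free pattern that matches every
// non-empty string except the given words.
final class ExcludeWordsRegex {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    static String build(List<String> words) {
        // Step 1: build the trie.
        Node root = new Node();
        for (String w : words) {
            // Assumption: only letters, digits and hyphens, so no escaping
            // is needed outside character classes.
            if (!w.matches("[A-Za-z0-9-]+")) {
                throw new IllegalArgumentException("unsupported word: " + w);
            }
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.isWord = true;
        }
        List<String> alts = new ArrayList<>();
        collect(root, "", alts);
        return "(" + String.join("|", alts) + ")";
    }

    private static void collect(Node node, String path, List<String> alts) {
        // Step 2: a strict prefix that is not itself an excluded word is allowed.
        if (!path.isEmpty() && !node.isWord) {
            alts.add(path);
        }
        // Step 3: any string that leaves the trie at this node is allowed.
        if (node.children.isEmpty()) {
            alts.add(path + ".+");
        } else {
            StringBuilder cls = new StringBuilder("[^");
            for (char c : node.children.keySet()) {
                cls.append(c == '-' ? "\\-" : String.valueOf(c));
            }
            alts.add(path + cls + "].*");
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            collect(e.getValue(), path + e.getKey(), alts);
        }
    }

    public static void main(String[] args) {
        String regex = build(List.of("if", "int"));
        System.out.println(regex);
        for (String s : List.of("if", "int", "i", "in", "ifs", "integer", "x")) {
            System.out.println(s + " -> " + s.matches(regex));
        }
    }
}
```

The generated expression uses only alternation, negated character classes, `.`, `*` and `+`, so it should be usable as an xs:pattern value, although that has not been checked against a schema validator here.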

-- Alain.

P/S: BTW, I just discovered grep --colour... Useful in such cases.
 
Bjoern Hoehrmann

* Alain Ketterlin wrote in comp.text.xml:
Regular languages are closed under complementation. So, you can be
sure it is possible: there _is_ a regular expression that matches
everything except a finite set of words. If you want to exclude, e.g.,
"if" and "else", you can go:

([^i]|i[^f]|if.|[^e]|e[^l]|el[^s]|els[^e]|else.).*

(I'm not sure about the regexp syntax for schemas). It may be a real
pain. I don't know if there's an easier way to get the same result.

You created not(if) or not(else), which matches both "if" and "else"
("if" is not "else", and vice versa). You need to create not(if) and
not(else), i.e. the intersection of two regular expressions. I suppose
there is a painful way in XML Schema to specify multiple regular
expressions a string must match, and inverting a group is simple
(abc -> not(a) .* or a not(b) .* or ab not(c) .* or abc .+, plus the
strict prefixes a and ab). It would be better to compute the
intersection of the regular expressions. There may be finite state
automata tools that support that. I am about to release a tool that can
do it as well.
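
A minimal Java sketch of the "not(if) and not(else)" idea: build one not(word) expression per reserved word and accept a value only if it matches all of them. The class and method names are hypothetical, words are assumed to contain only letters, digits and hyphens, and the empty string is not matched. On the schema side, XML Schema Part 2 notes that pattern facets from different derivation steps are ANDed together, so one restriction step per word would be one (verbose) way to express the same intersection; that variant is not shown here.

```java
import java.util.List;

// Hypothetical helper illustrating the intersection of per-word complements.
final class NotWordIntersection {
    // Builds not(word): every non-empty string other than the word itself.
    // Assumption: the word contains only letters, digits and hyphens.
    static String notWord(String word) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < word.length(); i++) {
            String prefix = word.substring(0, i);
            char c = word.charAt(i);
            if (i > 0) {
                sb.append(prefix).append('|');        // strict prefix on its own
            }
            sb.append(prefix)                         // prefix, then a different char
              .append("[^").append(c == '-' ? "\\-" : String.valueOf(c)).append("].*|");
        }
        return sb.append(word).append(".+)").toString(); // the word plus something more
    }

    // A value is allowed only if it matches not(w) for every reserved word w.
    static boolean isAllowed(String value, List<String> reserved) {
        return reserved.stream().allMatch(w -> value.matches(notWord(w)));
    }

    public static void main(String[] args) {
        List<String> reserved = List.of("if", "else");
        System.out.println(notWord("if"));
        for (String s : List.of("if", "else", "i", "elsewhere")) {
            System.out.println(s + " -> " + isAllowed(s, reserved));
        }
    }
}
```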
 
dick.deneer

Alain (and Bjoern)

I am convinced.
It is possible, but indeed very painful if your list of reserved words
is long, which is the case for me.
Thanks a lot,
Dick Deneer
 
