regexp(ing) Backus-Naurish expressions ...

qwertmonkey · Mar 13, 2013

Arne said:
I would do it as:
- switch from properties to XML
- define a schema for the XML with strict restrictions on data
- let the application parse that with a validating parser and
read it into some config object, this will ensure that required
information is there and that the data types are correct
- let the application apply business validation rules in Java code
on the config objects - this will ensure that the various
information is consistent

~
Arne, what do you specifically mean when you say "read it into some
config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
Android:
~
http://code.google.com/p/android/issues/detail?id=314
~
Also I am trying to deal with it in a general "named-value" pair way, so that
different schema files should be parsed and the result (as I see it) should
be some String[*][2] with the names and values of parameters/properties
~

When working with regular expressions you should always remember that
you don't need to do everything in a single expression. There's no law
against splitting things up into sub-expressions or using "boring old
code" for parts of the match.

You should also bear in mind that some parsing tasks are just not
suited to regular expressions and if the regular expression starts
getting complicated you should consider if the task might be solved
more easily with another approach.

Here, assuming I've understood the problem right, I might do something
as below (I'm not on my development computer, so note that this has
not been checked for errors):

~
Yeah, I would agree with you, but the switch-case block is really awful
and totally useless to me. In NLP work you would go mad with code full
of switch-case sections for each of a virtually endless number of
cases.
~

Not sure if this is what you are after as I've never used it myself but

http://commons.apache.org/proper/commons-cli/

~
well, no. It wasn't helpful because I need to do my work at the parsing stage

http://commons.apache.org/proper/commons-cli/usage.html
~

Regexes are quite limited. When you bang into their limits you can write a finite state machine or use a parser.

~
and I have been constantly banging against their limits ;-) in fact I find regexes quite limited for what I do
~

Based on your syntax example and your title, why bother with
"Backus-Naurish?" Java has full parser generators.

http://www.antlr.org/

For my needs ANTLR is overkill.

~

Martin said:
This is implemented as the ArgParser class in my environ.jar library and
can be found at:

http://sourceforge.net/projects/cdocumenter/files/cdocumenter/environment/

your ArgParser:

Constructor Detail
public ArgParser(java.lang.String progName,
java.lang.String[] args,
java.lang.String optlist)

must be passed an optlist and, similar to commons-cli, must be navigated/parsed.

All I make my users do is:

1) set up everything in <program_name>.properties files for default settings, and

2) let users set specific (protocolled) parameters as command line if they so decide

my ArgParser-like constructor looks like this:

SysEnvCtxt(){ ... }

public void setCtxt(String aKNm, String[] aKLnArgs, File ODir, String aPropsMetaMD5Sign, long lTm00Start) throws IOException

where:
aKNm: class name (passed from calling env)
aKLnArgs: command line args (automatically passed from calling env)
ODir: output dir (set and passed from calling env)
aPropsMetaMD5Sign: MD5 Signature of properties definitions and names (passed from calling env and set for some type of running context/properties)
lTm00Start: start time (automatically passed from calling env)

and then the user sets up properties (or XML) files for the system and logical running-context environment, which look like this:

# fully explicit and declaratively defined running properties written in a Backus-Naur(ish) form

# all system property names must start with (*nix standard) double hyphen
# metadata names are prefixed and suffixed as system_<property name>_values_def
# options are explicitly piped (with "|") "true[|false]" means it must [|not] be defined
# the last of the existing options after closing square bracket is the default
# if default option is not listed, it must be retrievable via java.lang.System.getProperty(<option>)

# ~ ~ ~ ~ ~ ~ ~ ~ java system level settings

# y: prints to standard error all java system and current process properties, as well as OS-level env variables the JVM has access to
--print-env-context: n

# y: redirects standard error file to <output directory>/yyyyMMddHHmmss.SSSS"_err.log
--redirect-err: n

# y: redirects standard output file ...
--redirect-out: q

# file encoding used for file (it must be UTF-8 like)
--char-encoding: UTF-8

# version: <release>.<update[even:finished|prime:editing]>_<date +%Y-%m-%d>_<girl name>_phase
--version: 0.3_2013-03-08_kerala_pre-alpha

# code points are read off files line by line
--end-of-line:

# ~ ~ ~ ~ ~ ~ METADATA ~ DO NOT EDIT! ~ ~ ~ ~ ~ ~ ~ ~

system_print-env-context_values_def: true[y|n]n

system_redirect-err_values_def: true[y|n]n

system_redirect-out_values_def: true[y|n]n

system_char-encoding_values_def: true[UTF-8|UTF8|UTF-7|US-ASCII|ISO-8859-1|ISO-LATIN-1|ISO646-US|ANSI X3.4-1968]UTF-8

system_version_values_def: true[0.3_2013-03-08_kerala_pre-alpha]

system_end-of-line_values_def: false[nix|windows|mac]line.separator

# ~ ~ ~ ~ ~ logical context for java running instance ~ ~ ~ ~ ~

--input-files-list:

# ~ ~ ~ ~ ~ ~ METADATA ~ DO NOT EDIT! ~ ~ ~ ~ ~ ~ ~ ~

# file containing one liner of input files must be defined
input-files-list_values_def: true
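
For readers following along, here is a minimal sketch of how one of the `_values_def` entries above might be taken apart with a few small sub-expressions plus ordinary code, rather than one large regex. It assumes the format is exactly "true or false, then an optional bracketed pipe-separated list, then an optional default"; the class name `ValuesDef` and its methods are hypothetical and not part of the poster's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one "<name>_values_def" entry, e.g. "true[y|n]n".
final class ValuesDef {
    // Sub-expressions kept separate and then joined, instead of one opaque pattern.
    private static final String REQUIRED = "(true|false)";         // must the property be defined?
    private static final String OPTIONS  = "(?:\\[([^\\]]*)\\])?"; // optional "[a|b|c]" list
    private static final String DEFAULT  = "(.*)";                 // whatever follows the bracket
    private static final Pattern DEF =
            Pattern.compile("^" + REQUIRED + OPTIONS + DEFAULT + "$");

    final boolean required;
    final List<String> options;  // empty when no bracketed list was given
    final String defaultValue;   // empty when nothing follows the bracket

    private ValuesDef(boolean required, List<String> options, String defaultValue) {
        this.required = required;
        this.options = options;
        this.defaultValue = defaultValue;
    }

    static ValuesDef parse(String raw) {
        Matcher m = DEF.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed values_def: " + raw);
        }
        List<String> options = new ArrayList<>();
        String list = m.group(2);
        if (list != null && !list.isEmpty()) {
            // Plain code for the pipe-separated part; no need to force it into the regex.
            options.addAll(Arrays.asList(list.split("\\|")));
        }
        return new ValuesDef("true".equals(m.group(1)), options, m.group(3).trim());
    }

    // A value is accepted when no option list was declared or when it is one of the options.
    boolean accepts(String value) {
        return options.isEmpty() || options.contains(value);
    }
}
```

With this sketch, `ValuesDef.parse("true[y|n]n")` yields a required entry with options `y` and `n` and default `n`, so a value such as `q` would not be accepted for it.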

thank you guys and I think I will go ahead and do the parsing myself
lbrtchx

markspace · Mar 13, 2013

# all system property names must start with (*nix standard) double hyphen

This to me says Apache CLI could be a big help.

thank you guys and I think I will go ahead and do the parsing myself

Agreed, I think your problem is specific enough that you are going to
have to custom code it.

Arne Vajhøj · Mar 13, 2013

~
Arne, what do you specifically mean when you say "read it into some
config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
Android:

JAXB is one way to get from XML to Java objects.

But there are plenty of others: W3C DOM, SAX, StAX, JDOM etc. I would
expect some of them to be available on Android.

Also I am trying to deal with it in a general "named-value" pair way, so that
different schema files should be parsed and the result (as I see it) should
be some String[*][2] with the names and values of parameters/properties

Anything that can be represented in a properties file should be
possible to represent in an XML file. And most likely in a more
structured way.
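
As one small illustration of name-value pairs carried in XML, `java.util.Properties` has its own fixed XML form that can be read with `loadFromXML`. The sketch below flattens the result into a `String[n][2]` array of names and values. It is not the schema-validated approach described earlier in the thread, the class name `XmlPropsDemo` and the sample entries are made up for the example, and I have not checked how this behaves on Android.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: XML properties document -> sorted array of {name, value} pairs.
final class XmlPropsDemo {
    // Sample document in the format expected by Properties.loadFromXML.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <entry key=\"--char-encoding\">UTF-8</entry>\n"
        + "  <entry key=\"--redirect-err\">n</entry>\n"
        + "</properties>\n";

    static String[][] loadPairs(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            // Also covers InvalidPropertiesFormatException, which extends IOException.
            throw new UncheckedIOException(e);
        }
        // Sort the names so the output order is predictable.
        TreeSet<String> names = new TreeSet<>(props.stringPropertyNames());
        String[][] pairs = new String[names.size()][2];
        int i = 0;
        for (String name : names) {
            pairs[i][0] = name;
            pairs[i][1] = props.getProperty(name);
            i++;
        }
        return pairs;
    }
}
```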

Arne

Arved Sandstrom · Mar 13, 2013

Anything that can be represented in a properties file should be
possible to represent in an XML file. And most likely in a more
structured way.

Arne

However, many people - myself included - may find a properties file
easier to read than XML.

Also, XML no more gives you a _good_ hierarchy - which requires thought
- than a properties file with well-designed keys. Keys for properties
files for several Java loggers are examples of how they can be used to
easily define a hierarchy.
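
A rough sketch of what a key-based hierarchy can look like in code: dotted keys are treated as paths, and everything under one prefix is pulled out as a sub-map. The key names in the usage note and the class name `PrefixedProps` are invented for illustration and are not taken from any particular logger.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper that reads dotted property keys as a hierarchy.
final class PrefixedProps {
    // Returns all entries below the given prefix, with the prefix stripped from the keys.
    static Map<String, String> subtree(Properties props, String prefix) {
        String dotted = prefix.endsWith(".") ? prefix : prefix + ".";
        Map<String, String> out = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(dotted)) {
                out.put(name.substring(dotted.length()), props.getProperty(name));
            }
        }
        return out;
    }
}
```

For example, with keys `logger.com.example.level` and `logger.com.example.handler`, calling `subtree(props, "logger.com.example")` gives a map with the keys `level` and `handler`.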

It's easier to read in a properties file.

Back in the day, not in the Java environment admittedly, I used to
prefer YAML to XML for properties files.

AHS

Arne Vajhøj · Mar 15, 2013

However, many people - myself included - may find a properties file
easier to read than XML.

I don't see XML as difficult to read.

Also, XML no more gives you a _good_ hierarchy - which requires thought
- than a properties file with well-designed keys. Keys for properties
files for several Java loggers are examples of how they can be used to
easily define a hierarchy.

With property files it becomes a convention instead of structure.

And regarding the loggers, then note that some of the advanced
features are only available via XML config not via properties
config, so I am not sure that loggers is an argument against XML.

It's easier to read in a properties file.

If you don't need to check values - yes.

But if you need to check values, then XML with a schema
and a validating parser saves a ton of Java code.

Which was my original point.
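
To make the schema point concrete, here is a minimal sketch using the standard `javax.xml.validation` API. The tiny schema, the element names `config`, `redirect-err` and `char-encoding`, and the class name `ConfigSchemaCheck` are all assumptions made up for the example; a real configuration would need its own schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical example: the schema restricts <redirect-err> to "y" or "n".
final class ConfigSchemaCheck {
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='redirect-err'>"
        + "<xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='y'/><xs:enumeration value='n'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "<xs:element name='char-encoding' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            // The schema itself is broken; that is a programming error, not bad input.
            throw new IllegalStateException("Schema could not be compiled", e);
        }
        try {
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // missing element, value outside the enumeration, malformed XML, ...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The required-element check and the y/n restriction live entirely in the schema here; the Java side only asks whether the document passed.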

Arne

Arved Sandstrom · Mar 15, 2013

However, many people - myself included - may find a properties file
easier to read than XML.


I don't see XML as difficult to read.

Point being, other people may. XML is only readable - obviously - when
it is properly formatted (whitespaced), and it is *considerably* more
readable when (1) it is colour-coded and (2) element tags (open and
close) and CDATA content all are on individual lines. But if you don't
have colour-coding and the formatting is fairly condensed (but still
allowing for decent indentation) then I consider XML to often be less
efficient at conveying information to a human than an equivalent
properties file.

With property files it becomes a convention instead of structure.

No more so than deciding *how* to interpret an XML file. DTDs or schemas
only do first-line validation - you need an accompanying specification
(out-of-band docs) that explains to humans what all that XML means, just
as for properties files. There are no magic bullets.

And regarding the loggers, then note that some of the advanced
features are only available via XML config not via properties
config, so I am not sure that loggers is an argument against XML.

Yeah, I know. I think that was an implementation choice, is all. It's
not like the properties format couldn't support the extra options.

If you don't need to check values - yes.

That's only first-level checking (is this element content a number,
say). However, it's pretty straightforward to accomplish the same thing
with properties files, assuming that the goal is a Java "properties"
bean of some sort. If you have a properties file entry mapped to a bean
field of type X, and the conversion succeeds or fails, it's the same
thing as doing your XML schema checking.

Since the properties bean *is* a bean, it has getters and setters. You
can easily put any validation into your setters, if you're handrolling,
and you may well have to anyway, since some validation can be complex.

Some environments make this particularly easy: if I'm using Spring (and
I often must, and many people do) then autowiring, normal Spring
property file use, and the @Value annotation make it extremely simple to
load up a properties POJO from a properties file.

But even handrolling is easy. Often you may as well, since your
second-level semantic checking is not something that anything but code
will do for you anyway.
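
A hand-rolled version of that idea might look roughly like the sketch below: type conversion fails loudly while the bean is loaded, and further checks sit in the setters. The property names `max-threads` and `char-encoding` and the class `AppConfig` are hypothetical, chosen only for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Properties;

// Hypothetical "properties" bean with validation in its setters.
final class AppConfig {
    private int maxThreads;
    private String charEncoding;

    int getMaxThreads() { return maxThreads; }
    String getCharEncoding() { return charEncoding; }

    void setMaxThreads(int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("max-threads must be at least 1");
        }
        this.maxThreads = maxThreads;
    }

    void setCharEncoding(String name) {
        // Charset.isSupported throws IllegalCharsetNameException for illegal names.
        if (!Charset.isSupported(name)) {
            throw new IllegalArgumentException("Unsupported charset: " + name);
        }
        this.charEncoding = name;
    }

    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    static AppConfig fromProperties(Properties p) {
        AppConfig config = new AppConfig();
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad input.
        config.setMaxThreads(Integer.parseInt(required(p, "max-threads")));
        config.setCharEncoding(required(p, "char-encoding"));
        return config;
    }

    // Convenience for the example: parse properties-file text held in a String.
    static AppConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(p);
    }
}
```

Loading `max-threads=abc` fails at the conversion step and `max-threads=0` fails in the setter, so both kinds of check surface as an `IllegalArgumentException` while the bean is being populated.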

But if you need to check values, then XML with a schema
and a validating parser saves a ton of Java code.

Which was my original point.

Arne

How much does that schema and XML parsing save you? Presumably you want
those properties to end up in one or more strongly typed "properties"
objects. Whether the source of properties is a properties file or XML,
you have to code up those Java "properties" POJOs with full knowledge of
expected structure. *That* is your schema right there, regardless:
running an XML validator on an XML file against an XML schema is
duplication of effort. You'd learn the same things by failing to load
the properties into your beans, which you have to do anyway.

AHS

Lew · Mar 15, 2013

I was happy to merely observe this variation on Editor Wars, but there are a couple of points
I'd like to offer.

Arved said:
How much does that schema and XML parsing save you? Presumably you want

A lot, in my experience. I've done a fair amount of heterogeneous-system communications
via various protocols including fixed-format ("columns 1-6 mean identifier, 7-8 are a control code,
9-15 are the section name, ..."), CSV, XML, Google Buffers and JSON.

Using XML for, say, web services or to communicate an object model is quite powerful, made
more so by the use of schemas and schema validation.

It doesn't handle deep validation, nor should it. It's the XML equivalent of surface-edit validation
in a GUI. You don't expect the back end to validate everything, typically, but to count on certain
sanity checks from the front end. Thus it is common for the front end (GUI widget or XML doc)
to validate things like "is this a number?". And useful.

those properties to end up in one or more strongly typed "properties"
objects. Whether the source of properties is a properties file or XML,
you have to code up those Java "properties" POJOs with full knowledge of
expected structure. *That* is your schema right there, regardless:

That is not your schema. That is one layer's implementation of your schema.

running an XML validator on an XML file against an XML schema is
duplication of effort. You'd learn the same things by failing to load

Not if you do it right, it isn't.

the properties into your beans, which you have to do anyway.

Not if you do it right, you don't.

Systems have different pipelines at different layers. In a system where XML is
advantageous, you have validation at the gateway, before it gets into your queues
and components and heavy logic. This increases throughput and scalability in addition
to correctness and reliability.

Also, redundant checks are not always a bad thing. Back in the 1970s a nuclear
missile siloed in West Virginia lost three of its four failsafes. Had there not been redundancy,
there would have been catastrophe. Back in the 1980s there was a nuclear-medicine
radiation-doser manufacturer in Canada who removed "redundant" hardware failsafes in the
dosage, and the software bugs promptly started killing people.

A properly designed system will put surface edits in the front end, whether it's XML or
source code compilation or JSON parsing or what-have-you, and different checks in
different layers. Useful redundancy is achieved by dependent layers asserting the validity
promised by antecedent layers rather than duplication of all the effort.

Arved Sandstrom · Mar 15, 2013

I was happy to merely observe this variation on Editor Wars, but there are a couple of points
I'd like to offer.

A lot, in my experience. I've done a fair amount of heterogeneous-system communications
via various protocols including fixed-format ("columns 1-6 mean identifier, 7-8 are a control code,
9-15 are the section name, ..."), CSV, XML, Google Buffers and JSON.

I agree, a lot *overall*. But I am thinking in this thread of
configuration files, strictly configuration files, and *only*
configuration files. That I use XML a great deal is totally irrelevant.

Using XML for, say, web services or to communicate an object model is quite powerful, made
more so by the use of schemas and schema validation.

It is that, although I wouldn't consider XML to be superior to other
possibilities for communication of object models. It works; so do other
methods.

It doesn't handle deep validation, nor should it. It's the XML equivalent of surface-edit validation
in a GUI. You don't expect the back end to validate everything, typically, but to count on certain
sanity checks from the front end. Thus it is common for the front end (GUI widget or XML doc)
to validate things like "is this a number?". And useful.

Well, yes. What I said.

That is not your schema. That is one layer's implementation of your schema.

Oh no, I disagree. That is my schema, if my source of truth is a Java
POJO. I didn't say "XML schema", I said "schema". The only thing that
concerns me in this argument is

POJO <-> configuration file

I know what I need that "properties" or "configuration" POJO to be; I
can write it first, it's authoritative. It *is* my schema. Not an XML
schema, but my expressed configuration data structure.

Not if you do it right, it isn't.

Not if you do it right, you don't.

Systems have different pipelines at different layers. In a system where XML is
advantageous, you have validation at the gateway, before it gets into your queues
and components and heavy logic. This increases throughput and scalability in addition
to correctness and reliability.

Also, redundant checks are not always a bad thing. Back in the 1970s a nuclear
missile siloed in West Virginia lost three of its four failsafes. Had there not been redundancy,
there would have been catastrophe. Back in the 1980s there was a nuclear-medicine
radiation-doser manufacturer in Canada who removed "redundant" hardware failsafes in the
dosage, and the software bugs promptly started killing people.

A properly designed system will put surface edits in the front end, whether it's XML or
source code compilation or JSON parsing or what-have-you, and different checks in
different layers. Useful redundancy is achieved by dependent layers asserting the validity
promised by antecedent layers rather than duplication of all the effort.

This is a good argument - validation redundancy - in the bigger picture.
I don't dispute that. But you don't need XML or XML DTDs/schemas to
achieve that.

AHS
