Velocity Reviews


qwertmonkey@syberianoutpost.ru 03-13-2013 09:54 PM

regexp(ing) Backus-Naurish expressions ...
 
Arne Vajhøj wrote:

> I would do it as:
> - switch from properties to XML
> - define a schema for the XML with strict restrictions on data
> - let the application parse that with a validating parser and
> read it into some config object, this will ensure that required
> information is there and that the data types are correct
> - let the application apply business validation rules in Java code
> on the config objects - this will ensure that the various
> information is consistent
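A minimal sketch of the validating step Arne describes, assuming desktop Java's `javax.xml.validation` API. The schema (a `<config>` root holding one `redirectErr` element limited to `y` or `n`) and the class name are hypothetical examples, not part of the original post. Whether Android's bundled parser supports W3C XML Schema validation should be checked before relying on this there.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XmlConfigValidation {
    // Hypothetical schema: <config> must hold exactly one <redirectErr> whose text is "y" or "n".
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='redirectErr'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:enumeration value='y'/><xs:enumeration value='n'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns true if xml is well-formed and valid against XSD. */
    static boolean isValid(String xml) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("the embedded schema itself is invalid", e);
        }
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // missing element, value outside the enumeration, malformed XML, ...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<config><redirectErr>n</redirectErr></config>"));
        System.out.println(isValid("<config><redirectErr>q</redirectErr></config>"));
    }
}
```

The type and range checks live in the schema; the consistency rules in Arne's last step would still be Java code run on the parsed values afterwards.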

~
Arne, what do you specifically mean when you say "read it into some
config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
Android:
~
http://code.google.com/p/android/issues/detail?id=314
~
Also I am trying to deal with it in a general "named-value" pair way, so that
different schema files should be parsed and the result (as I see it) should
be some String[*][2] with the names and values of parameters/properties
~
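For the name/value approach, `java.util.Properties` already accepts `:` as a key/value separator and `#` comment lines, so files in the format shown later in this post can most likely be loaded as they are. A minimal sketch that turns such text into the `String[n][2]` shape described above; the class name and sorting rows by name are choices made here for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class NameValuePairs {
    /** Loads properties text and returns one {name, value} row per property, sorted by name. */
    static String[][] toPairs(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load treats '=', ':' or whitespace as the key/value separator
            // and skips lines starting with '#' or '!'.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> names = new TreeSet<>(props.stringPropertyNames());
        String[][] pairs = new String[names.size()][2];
        int row = 0;
        for (String name : names) {
            pairs[row][0] = name;
            pairs[row][1] = props.getProperty(name);
            row++;
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "# comment\n--redirect-err: n\n--char-encoding: UTF-8\n";
        for (String[] pair : toPairs(text)) {
            System.out.println(pair[0] + " = " + pair[1]);
        }
    }
}
```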

Leif Roar Moldskred wrote:

> When working with regular expressions you should always remember that
> you don't need to do everything in a single expression. There's no law
> against splitting things up into sub-expressions or using "boring old
> code" for parts of the match.


> You should also bear in mind that some parsing tasks are just not
> suited to regular expressions and if the regular expression starts
> getting complicated you should consider if the task might be solved
> more easily with another approach.


> Here, assuming I've understood the problem right, I might do something
> as below (I'm not on my development computer, so note that this has
> not been checked for errors):

~
Yeah, I would agree with you, but the switch-case block is really awful
and of no use to me. When doing NLP work you would go mad with
code full of switch-case sections for every single one of a virtually
endless number of cases
~
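Leif's suggestion (a small regex for the line shape, plain code for everything else) might look like this for the `--name: value` lines that appear later in this post. This is a sketch, not the code from his post, which is not reproduced here; the pattern and the `OptionLines` name are assumptions. No switch statement is involved: the regex recognizes one shape, and ordinary `if` checks handle comments and blank lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionLines {
    // The regex only recognizes one line shape: "--name: value" (value may be empty).
    private static final Pattern OPTION = Pattern.compile("--([A-Za-z0-9-]+):\\s*(.*)");

    /** Returns option name -> raw value, in file order. */
    static Map<String, String> parse(String text) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String raw : text.split("\\R")) {          // \R matches any line break (Java 8+)
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                                // plain code: skip blanks and comments
            }
            Matcher m = OPTION.matcher(line);
            if (m.matches()) {
                options.put(m.group(1), m.group(2));
            }
            // Other lines (e.g. *_values_def metadata) are left for a separate pass.
        }
        return options;
    }

    public static void main(String[] args) {
        String text = "# y: prints ...\n--print-env-context: n\n\n--end-of-line:\n"
                    + "system_print-env-context_values_def: true[y|n]n\n";
        System.out.println(parse(text)); // {print-env-context=n, end-of-line=}
    }
}
```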

lipska the kat wrote:

> Not sure if this is what you are after as I've never used it myself but


> http://commons.apache.org/proper/commons-cli/

~
well, no. It wasn't helpful because I need to do my work at the parsing stage

http://commons.apache.org/proper/commons-cli/usage.html
~

Roedy Green wrote:

> > Any ideas you would share?


> Regexes are quite limited. When you bang into their limits you can write a finite state machine or use a parser.

~
and I have been constantly banging against their limits ;-) in fact I find regexes quite limited for what I do
~
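To illustrate the finite-state-machine option Roedy mentions: a hand-written FSM that reads the `true[y|n]n` metadata definitions used later in this post into a flag, an option list, and a default. The three states and the class name are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ValuesDefFsm {
    private enum State { FLAG, OPTIONS, DEFAULT }

    final String flag;           // text before '[' ("true" or "false")
    final List<String> options;  // '|'-separated entries between '[' and ']'
    final String defaultValue;   // text after ']' (empty if none)

    private ValuesDefFsm(String flag, List<String> options, String defaultValue) {
        this.flag = flag;
        this.options = options;
        this.defaultValue = defaultValue;
    }

    static ValuesDefFsm parse(String def) {
        State state = State.FLAG;
        StringBuilder flag = new StringBuilder();
        StringBuilder option = new StringBuilder();
        StringBuilder dflt = new StringBuilder();
        List<String> options = new ArrayList<>();
        for (char c : def.trim().toCharArray()) {
            switch (state) {
                case FLAG:
                    if (c == '[') state = State.OPTIONS; else flag.append(c);
                    break;
                case OPTIONS:
                    if (c == '|' || c == ']') {          // end of one option
                        options.add(option.toString());
                        option.setLength(0);
                        if (c == ']') state = State.DEFAULT;
                    } else {
                        option.append(c);
                    }
                    break;
                case DEFAULT:
                    dflt.append(c);
                    break;
            }
        }
        // A bare flag such as "true" (no brackets) is accepted; an unclosed '[' is not.
        if (state == State.OPTIONS) {
            throw new IllegalArgumentException("missing ']' in: " + def);
        }
        return new ValuesDefFsm(flag.toString(), options, dflt.toString());
    }

    public static void main(String[] args) {
        ValuesDefFsm d = parse("false[nix|windows|mac]line.separator");
        System.out.println(d.flag + " " + d.options + " " + d.defaultValue);
    }
}
```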

markspace wrote:

> Based on your syntax example and you title, why bother with
> "Backus-Naurish?" Java has full parser generators.


> http://www.antlr.org/



for my needs ANTLR is overkill

~


Martin Gregorie wrote:

> This is implemented as the ArgParser class in my environ.jar library and
> can be found at:


> http://sourceforge.net/projects/cdoc...r/environment/


your ArgParser:

Constructor Detail
public ArgParser(java.lang.String progName,
java.lang.String[] args,
java.lang.String optlist)


must be passed an optlist and, similar to commons-cli, must be navigated/parsed.

All I make my users do is:

1) set up everything in <program_name>.properties files for default settings, and

2) let users set specific (protocolled) parameters on the command line if they so decide


my ArgParser-like constructor looks like this:

SysEnvCtxt(){ ... }

public void setCtxt(String aKNm, String[] aKLnArgs, File ODir, String aPropsMetaMD5Sign, long lTm00Start) throws IOException

where:
aKNm: class name (passed from calling env)
aKLnArgs: command line args (automatically passed from calling env)
ODir: output dir (set and passed from calling env)
aPropsMetaMD5Sign: MD5 Signature of properties definitions and names (passed from calling env and set for some type of running context/properties)
lTm00Start: start time (automatically passed from calling env)
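The post does not say exactly which text `aPropsMetaMD5Sign` is computed over. Assuming it is some canonical string built from the metadata definitions, the MD5 hex itself can be computed with the standard `java.security.MessageDigest` API; the `PropsSignature` class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PropsSignature {
    /** Lower-case hex MD5 of the UTF-8 bytes of text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with the standard JDK providers; fail loudly if it is missing.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

One possible use, again an assumption rather than something the post specifies: hash the metadata lines (sorted, joined with `\n`) and compare the result with the signature passed in, to detect edits to the "DO NOT EDIT!" section.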

and then the user sets up system and logical-context properties (or XML) files for the running environment, which look like this:

# fully explicit and declaratively defined running properties written in a Backus-Naur(ish) form

# all system property names must start with (*nix standard) double hyphen
# metadata names are prefixed and suffixed as system_<property name>_values_def
# options are explicitly piped (with "|") "true[|false]" means it must [|not] be defined
# the last of the existing options after closing square bracket is the default
# if default option is not listed, it must be retrievable via java.lang.System.getProperty(<option>)


# ~ ~ ~ ~ ~ ~ ~ ~ java system level settings

# y: prints to standard error all java system and current process properties, as well as OS-level env variables the JVM has access to
--print-env-context: n

# y: redirects standard error to <output directory>/yyyyMMddHHmmss.SSSS_err.log
--redirect-err: n

# y: redirects standard output file ...
--redirect-out: q

# file encoding used for file (it must be UTF-8 like)
--char-encoding: UTF-8

# version: <release>.<update[even:finished|prime:editing]>_<date +%Y-%m-%d>_<girl name>_phase
--version: 0.3_2013-03-08_kerala_pre-alpha

# code points are read off files line by line
--end-of-line:

# ~ ~ ~ ~ ~ ~ METADATA ~ DO NOT EDIT! ~ ~ ~ ~ ~ ~ ~ ~

system_print-env-context_values_def: true[y|n]n

system_redirect-err_values_def: true[y|n]n

system_redirect-out_values_def: true[y|n]n

system_char-encoding_values_def: true[UTF-8|UTF8|UTF-7|US-ASCII|ISO-8859-1|ISO-LATIN-1|ISO646-US|ANSI X3.4-1968]UTF-8

system_version_values_def: true[0.3_2013-03-08_kerala_pre-alpha]

system_end-of-line_values_def: false[nix|windows|mac]line.separator





# ~ ~ ~ ~ ~ logical context for java running instance ~ ~ ~ ~ ~

--input-files-list:


# ~ ~ ~ ~ ~ ~ METADATA ~ DO NOT EDIT! ~ ~ ~ ~ ~ ~ ~ ~

# file containing one liner of input files must be defined
input-files-list_values_def: true
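One reading of the rules in the header comments above, written as code. Assumptions: `true` means the key must be present in the file; an empty value falls back to the default written after `]`; a default that is not among the listed options is looked up with `System.getProperty`. Under that reading, the `--redirect-out: q` line above would be rejected. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValuesDefRules {
    // "true" or "false", optionally followed by "[opt|opt|...]" and a default.
    private static final Pattern DEF = Pattern.compile("(true|false)(?:\\[([^\\]]*)\\](.*))?");

    /**
     * Effective value of one property under an assumed reading of the rules.
     * value is the text after "--name:" or null if the key is absent from the file.
     */
    static String resolve(String name, String value, String def) {
        Matcher m = DEF.matcher(def.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException(name + ": malformed definition " + def);
        }
        boolean mustBeDefined = Boolean.parseBoolean(m.group(1));
        List<String> options = m.group(2) == null
                ? Collections.<String>emptyList()
                : Arrays.asList(m.group(2).split("\\|"));
        String dflt = m.group(3) == null ? "" : m.group(3);

        if (value == null && mustBeDefined) {
            throw new IllegalArgumentException(name + " must be defined");
        }
        if (value != null && !value.isEmpty()) {
            if (!options.isEmpty() && !options.contains(value)) {
                throw new IllegalArgumentException(name + ": '" + value + "' not in " + options);
            }
            return value;
        }
        if (dflt.isEmpty()) {
            throw new IllegalArgumentException(name + ": no value and no default");
        }
        if (options.contains(dflt)) {
            return dflt;
        }
        // Default not among the options: treat it as a system property name.
        String fromSystem = System.getProperty(dflt);
        if (fromSystem == null) {
            throw new IllegalArgumentException(name + ": system property " + dflt + " is not set");
        }
        return fromSystem;
    }

    public static void main(String[] args) {
        System.out.println(resolve("print-env-context", "n", "true[y|n]n"));
        try {
            resolve("redirect-out", "q", "true[y|n]n");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // redirect-out: 'q' not in [y, n]
        }
    }
}
```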


thank you guys and I think I will go ahead and do the parsing myself
lbrtchx
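The `--redirect-err: y` option in the listing above can be implemented with `System.setErr`. A sketch, assuming the timestamped `_err.log` name in the comment is meant as a `SimpleDateFormat` pattern (there `SSSS` is the millisecond field zero-padded to four digits); `ErrRedirect` and its methods are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.text.SimpleDateFormat;
import java.util.Date;

class ErrRedirect {
    /** <outputDir>/yyyyMMddHHmmss.SSSS_err.log for the given instant (default time zone). */
    static File errLogFile(File outputDir, Date when) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        String stamp = new SimpleDateFormat("yyyyMMddHHmmss.SSSS").format(when);
        return new File(outputDir, stamp + "_err.log");
    }

    /** Sends System.err to a new log file; returns the old stream so callers can restore it. */
    static PrintStream redirectErr(File outputDir) throws FileNotFoundException {
        PrintStream previous = System.err;
        File log = errLogFile(outputDir, new Date());
        System.setErr(new PrintStream(new FileOutputStream(log), true)); // autoflush on println
        return previous;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File outputDir = new File(System.getProperty("java.io.tmpdir"));
        PrintStream original = redirectErr(outputDir);
        System.err.println("this line goes to the _err.log file");
        System.err.close();
        System.setErr(original);
    }
}
```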

markspace 03-13-2013 10:12 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/13/2013 2:54 PM, qwertmonkey@syberianoutpost.ru wrote:
> # all system property names must start with (*nix standard) double hyphen


This to me says Apache CLI could be a big help.

> thank you guys and I think I will go ahead and do the parsing myself


Agreed, I think your problem is specific enough that you are going to
have to custom code it.


Arne Vajhøj 03-13-2013 10:20 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/13/2013 5:54 PM, qwertmonkey@syberianoutpost.ru wrote:
> Arne Vajhøj wrote:
>> I would do it as:
>> - switch from properties to XML
>> - define a schema for the XML with strict restrictions on data
>> - let the application parse that with a validating parser and
>> read it into some config object, this will ensure that required
>> information is there and that the data types are correct
>> - let the application apply business validation rules in Java code
>> on the config objects - this will ensure that the various
>> information is consistent

> ~
> Arne, what do you specifically mean when you say "read it into some
> config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
> Android:


JAXB is one way to get from XML to Java objects.

But there are plenty of others: W3C DOM, SAX, StAX, JDOM, etc. I would
expect some of them to be available on Android.
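As one example of the non-JAXB routes Arne lists, the W3C DOM API in `javax.xml.parsers` can read a flat `<config>` document into name/value pairs without any generated classes. The element layout and the `DomConfigReader` name are assumptions made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DomConfigReader {
    /** Reads <config><name>value</name>...</config> into name -> trimmed value, in document order. */
    static Map<String, String> read(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse config XML", e);
        }
        Map<String, String> values = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {      // skip whitespace text nodes
                values.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<config>\n  <redirect-err>n</redirect-err>\n"
                   + "  <char-encoding>UTF-8</char-encoding>\n</config>";
        System.out.println(read(xml)); // {redirect-err=n, char-encoding=UTF-8}
    }
}
```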

> Also I am trying to deal with it in a general "named-value" pair way, so that
> different schema files should be parsed and the result (as I see it) should
> be some String[*][2] with the names and values of parameters/properties


Anything that can be represented in a properties file should be
possible to represent in an XML file. And most likely in a more
structured way.
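For the plain name/value case, `java.util.Properties` has built-in XML support (`storeToXML` and `loadFromXML`), so no custom schema or JAXB binding is needed. Note that this format is a flat list of `<entry key="...">` elements, so by itself it adds no extra structure. The wrapper class below is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesXmlRoundTrip {
    /** Serializes props with Properties.storeToXML (UTF-8) and returns the XML text. */
    static String toXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, "converted from .properties");
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses XML in the java.util.Properties XML format. */
    static Properties fromXml(String xml) {
        try {
            Properties props = new Properties();
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("--redirect-err", "n");
        System.out.println(toXml(props));
    }
}
```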

Arne



Arved Sandstrom 03-13-2013 11:00 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 03/13/2013 07:20 PM, Arne Vajhøj wrote:
> On 3/13/2013 5:54 PM, qwertmonkey@syberianoutpost.ru wrote:
>> Arne Vajhøj wrote:
>>> I would do it as:
>>> - switch from properties to XML
>>> - define a schema for the XML with strict restrictions on data
>>> - let the application parse that with a validating parser and
>>> read it into some config object, this will ensure that required
>>> information is there and that the data types are correct
>>> - let the application apply business validation rules in Java code
>>> on the config objects - this will ensure that the various
>>> information is consistent

>> ~
>> Arne, what do you specifically mean when you say "read it into some
>> config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
>> Android:

>
> JAXB is one way to get from XML to Java objects.
>
> But there are plenty of others: W3C DOM, SAX, StAX, JDOM, etc. I would
> expect some of them to be available on Android.
>
>> Also I am trying to deal with it in a general "named-value" pair
>> way, so that
>> different schema files should be parsed and the result (as I see it)
>> should
>> be some String[*][2] with the names and values of parameters/properties

>
> Anything that can be represented in a properties file should be
> possible to represent in an XML file. And most likely in a more
> structured way.
>
> Arne
>
>

However, many people - myself included - may find a properties file
easier to read than XML.

Also, XML no more gives you a _good_ hierarchy - which requires thought
- than a properties file with well-designed keys. Keys for properties
files for several Java loggers are examples of how they can be used to
easily define a hierarchy.
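The point about hierarchical keys can be sketched as a prefix lookup over dotted keys. The key names below are illustrative, in the dotted style logger configurations use, and `subtree` is a hypothetical helper rather than any logger's API.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class KeyHierarchy {
    /** All entries under "prefix.", with that prefix stripped; sorted by remaining key. */
    static Map<String, String> subtree(Properties props, String prefix) {
        String withDot = prefix + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(withDot)) {
                result.put(key.substring(withDot.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("app.output.dir", "/tmp/out");
        props.setProperty("app.output.redirect-err", "n");
        props.setProperty("app.input.files-list", "files.txt");
        System.out.println(subtree(props, "app.output")); // {dir=/tmp/out, redirect-err=n}
    }
}
```

The hierarchy exists only by naming convention here, which is exactly the trade-off debated in the replies that follow.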

It's easier to read in a properties file.

Back in the day, not in the Java environment admittedly, I used to
prefer YAML to XML for properties files.

AHS

Arne Vajhøj 03-15-2013 01:16 AM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/13/2013 7:00 PM, Arved Sandstrom wrote:
> On 03/13/2013 07:20 PM, Arne Vajhøj wrote:
>> On 3/13/2013 5:54 PM, qwertmonkey@syberianoutpost.ru wrote:
>>> Arne Vajhøj wrote:
>>>> I would do it as:
>>>> - switch from properties to XML
>>>> - define a schema for the XML with strict restrictions on data
>>>> - let the application parse that with a validating parser and
>>>> read it into some config object, this will ensure that required
>>>> information is there and that the data types are correct
>>>> - let the application apply business validation rules in Java code
>>>> on the config objects - this will ensure that the various
>>>> information is consistent
>>> ~
>>> Arne, what do you specifically mean when you say "read it into some
>>> config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
>>> Android:

>>
>> JAXB is one way to get from XML to Java objects.
>>
>> But there are plenty of others: W3C DOM, SAX, StAX, JDOM, etc. I would
>> expect some of them to be available on Android.
>>
>>> Also I am trying to deal with it in a general "named-value" pair
>>> way, so that
>>> different schema files should be parsed and the result (as I see it)
>>> should
>>> be some String[*][2] with the names and values of parameters/properties

>>
>> Anything that can be represented in a properties file should be
>> possible to represent in an XML file. And most likely in a more
>> structured way.
>>
>>

> However, many people - myself included - may find a properties file
> easier to read than XML.


I don't see XML as difficult to read.

> Also, XML no more gives you a _good_ hierarchy - which requires thought
> - than a properties file with well-designed keys. Keys for properties
> files for several Java loggers are examples of how they can be used to
> easily define a hierarchy.


With property files it becomes a convention instead of structure.

As for the loggers, note that some of the advanced
features are only available via XML config, not via properties
config, so I am not sure that loggers are an argument against XML.

> It's easier to read in a properties file.


If you don't need to check values - yes.

But if you need to check values, then XML with a schema
and a validating parser saves a ton of Java code.

Which was my original point.

Arne



Arved Sandstrom 03-15-2013 09:31 AM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 03/14/2013 10:16 PM, Arne Vajhøj wrote:
> On 3/13/2013 7:00 PM, Arved Sandstrom wrote:
>> On 03/13/2013 07:20 PM, Arne Vajh°j wrote:
>>> On 3/13/2013 5:54 PM, qwertmonkey@syberianoutpost.ru wrote:
>>>> Arne Vajhøj wrote:
>>>>> I would do it as:
>>>>> - switch from properties to XML
>>>>> - define a schema for the XML with strict restrictions on data
>>>>> - let the application parse that with a validating parser and
>>>>> read it into some config object, this will ensure that required
>>>>> information is there and that the data types are correct
>>>>> - let the application apply business validation rules in Java code
>>>>> on the config objects - this will ensure that the various
>>>>> information is consistent
>>>> ~
>>>> Arne, what do you specifically mean when you say "read it into some
>>>> config object"? Using JAXB? AFAIK JAXB needs source (re)compilation in
>>>> Android:
>>>
>>> JAXB is one way to get from XML to Java objects.
>>>
>>> But there are plenty of others: W3C DOM, SAX, StAX, JDOM, etc. I would
>>> expect some of them to be available on Android.
>>>
>>>> Also I am trying to deal with it in a general "named-value" pair
>>>> way, so that
>>>> different schema files should be parsed and the result (as I see it)
>>>> should
>>>> be some String[*][2] with the names and values of parameters/properties
>>>
>>> Anything that can be represented in a properties file should be
>>> possible to represent in an XML file. And most likely in a more
>>> structured way.
>>>
>>>

>> However, many people - myself included - may find a properties file
>> easier to read than XML.

>
> I don't see XML as difficult to read.


Point being, other people may. XML is only readable - obviously - when
it is properly formatted (whitespaced), and it is *considerably* more
readable when (1) it is colour-coded and (2) element tags (open and
close) and CDATA content all are on individual lines. But if you don't
have colour-coding and the formatting is fairly condensed (but still
allowing for decent indentation) then I consider XML to often be less
efficient at conveying information to a human than an equivalent
properties file.

>> Also, XML no more gives you a _good_ hierarchy - which requires thought
>> - than a properties file with well-designed keys. Keys for properties
>> files for several Java loggers are examples of how they can be used to
>> easily define a hierarchy.

>
> With property files it becomes a convention instead of structure.


No more so than deciding *how* to interpret an XML file. DTDs or schemas
only do first-line validation - you need an accompanying specification
(out-of-band docs) that explains to humans what all that XML means, just
as for properties files. There are no magic bullets.

> As for the loggers, note that some of the advanced
> features are only available via XML config, not via properties
> config, so I am not sure that loggers are an argument against XML.


Yeah, I know. I think that was an implementation choice, is all. It's
not like the properties format couldn't support the extra options.

>> It's easier to read in a properties file.

>
> If you don't need to check values - yes.


That's only first-level checking (is this element content a number,
say). However, it's pretty straightforward to accomplish the same thing
with properties files, assuming that the goal is a Java "properties"
bean of some sort. If you have a properties file entry mapped to a bean
field of type X, and the conversion succeeds or fails, it's the same
thing as doing your XML schema checking.

Since the properties bean *is* a bean, it has getters and setters. You
can easily put any validation into your setters, if you're hand-rolling,
and you may well have to anyway, since some validation can be complex.

Some environments make this particularly easy: if I'm using Spring (and
I often must, and many people do) then autowiring, normal Spring
property file use, and the @Value annotation make it extremely simple to
load up a properties POJO from a properties file.

But even hand-rolling is easy. Often you may as well, since your
second-level semantic checking is not something that anything but code
will do for you anyway.
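A minimal hand-rolled version of the "properties bean" described above: each setter converts and checks its raw string, so a bad value fails at load time much as a schema type check would. The class name, fields, and defaults are hypothetical; the keys match the original poster's file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class RunConfig {
    private boolean redirectErr = false;
    private Charset charEncoding = StandardCharsets.UTF_8;

    /** Accepts only "y" or "n"; anything else fails at load time. */
    void setRedirectErr(String raw) {
        if ("y".equals(raw)) {
            redirectErr = true;
        } else if ("n".equals(raw)) {
            redirectErr = false;
        } else {
            throw new IllegalArgumentException("--redirect-err must be y or n, got: " + raw);
        }
    }

    /** Charset.forName throws IllegalArgumentException subclasses for unknown or illegal names. */
    void setCharEncoding(String raw) {
        charEncoding = Charset.forName(raw);
    }

    boolean isRedirectErr() { return redirectErr; }
    Charset getCharEncoding() { return charEncoding; }

    /** Hand-rolled loader: every value passes through its converting, validating setter. */
    static RunConfig from(Properties props) {
        RunConfig config = new RunConfig();
        config.setRedirectErr(props.getProperty("--redirect-err", "n"));
        config.setCharEncoding(props.getProperty("--char-encoding", "UTF-8"));
        return config;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("--redirect-err", "y");
        RunConfig config = from(props);
        System.out.println(config.isRedirectErr() + " " + config.getCharEncoding());
    }
}
```

Cross-field (semantic) checks would go in `from` after all setters have run, which is the second-level checking that no schema does for you.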

> But if you need to check values, then XML with a schema
> and a validating parser saves a ton of Java code.
>
> Which was my original point.
>
> Arne
>


How much does that schema and XML parsing save you? Presumably you want
those properties to end up in one or more strongly typed "properties"
objects. Whether the source of properties is a properties file or XML,
you have to code up those Java "properties" POJOs with full knowledge of
expected structure. *That* is your schema right there, regardless:
running an XML validator on an XML file against an XML schema is
duplication of effort. You'd learn the same things by failing to load
the properties into your beans, which you have to do anyway.

AHS


Lew 03-15-2013 06:34 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
I was happy to merely observe this variation on Editor Wars, but there are a couple of points
I'd like to offer.

Arved Sandstrom wrote:
> How much does that schema and XML parsing save you? Presumably you want


A lot, in my experience. I've done a fair amount of heterogeneous-system communications
via various protocols including fixed-format ("columns 1-6 mean identifier, 7-8 are a control code,
9-15 are the section name, ..."), CSV, XML, Google Protocol Buffers and JSON.

Using XML for, say, web services or to communicate an object model is quite powerful, made
more so by the use of schemas and schema validation.

It doesn't handle deep validation, nor should it. It's the XML equivalent of surface-edit validation
in a GUI. You don't expect the back end to validate everything, typically, but to count on certain
sanity checks from the front end. Thus it is common for the front end (GUI widget or XML doc)
to validate things like "is this a number?". And useful.

> those properties to end up in one or more strongly typed "properties"
> objects. Whether the source of properties is a properties file or XML,
> you have to code up those Java "properties" POJOs with full knowledge of
> expected structure. *That* is your schema right there, regardless:


That is not your schema. That is one layer's implementation of your schema.

> running an XML validator on an XML file against an XML schema is
> duplication of effort. You'd learn the same things by failing to load


Not if you do it right, it isn't.

> the properties into your beans, which you have to do anyway.


Not if you do it right, you don't.

Systems have different pipelines at different layers. In a system where XML is
advantageous, you have validation at the gateway, before it gets into your queues
and components and heavy logic. This increases throughput and scalability in addition
to correctness and reliability.

Also, redundant checks are not always a bad thing. Back in the 1970s a nuclear
missile siloed in West Virginia lost three of its four failsafes. Had there not been redundancy,
there would have been catastrophe. Back in the 1980s there was a nuclear-medicine
radiation-doser manufacturer in Canada who removed "redundant" hardware failsafes in the
dosage, and the software bugs promptly started killing people.

A properly designed system will put surface edits in the front end, whether it's XML or
source code compilation or JSON parsing or what-have-you, and different checks in
different layers. Useful redundancy is achieved by dependent layers asserting the validity
promised by antecedent layers rather than duplication of all the effort.
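A small sketch of the layering described above: the gateway does the full surface check once, and the dependent layer only asserts what the gateway promised instead of re-parsing. The port setting and both method names are hypothetical.

```java
import java.util.Objects;

class LayeredChecks {
    /** Gateway layer: full surface validation of raw input; returns a parsed, trusted value. */
    static int parsePort(String raw) {
        if (raw == null || !raw.matches("\\d{1,5}")) {
            throw new IllegalArgumentException("port must be 1-5 digits: " + raw);
        }
        int port = Integer.parseInt(raw);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    /** Downstream layer: no re-parsing, only a cheap assertion of the gateway's promise. */
    static String endpoint(String host, int port) {
        Objects.requireNonNull(host, "host");
        if (port < 1 || port > 65535) {
            // Unreachable if the gateway did its job; fail loudly if the contract was broken.
            throw new IllegalStateException("gateway contract violated: port " + port);
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(endpoint("localhost", parsePort("8080")));
    }
}
```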

--
Lew


Arved Sandstrom 03-15-2013 11:17 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 03/15/2013 03:34 PM, Lew wrote:
> I was happy to merely observe this variation on Editor Wars, but there are a couple of points
> I'd like to offer.
>
> Arved Sandstrom wrote:
>> How much does that schema and XML parsing save you? Presumably you want

>
> A lot, in my experience. I've done a fair amount of heterogeneous-system communications
> via various protocols including fixed-format ("columns 1-6 mean identifier, 7-8 are a control code,
> 9-15 are the section name, ..."), CSV, XML, Google Protocol Buffers and JSON.


I agree, a lot *overall*. But I am thinking in this thread of
configuration files, strictly configuration files, and *only*
configuration files. That I use XML a great deal is totally irrelevant.

> Using XML for, say, web services or to communicate an object model is quite powerful, made
> more so by the use of schemas and schema validation.


It is that, although I wouldn't consider XML to be superior to other
possibilities for communication of object models. It works; so do other
methods.

> It doesn't handle deep validation, nor should it. It's the XML equivalent of surface-edit validation
> in a GUI. You don't expect the back end to validate everything, typically, but to count on certain
> sanity checks from the front end. Thus it is common for the front end (GUI widget or XML doc)
> to validate things like "is this a number?". And useful.


Well, yes. What I said.

>> those properties to end up in one or more strongly typed "properties"
>> objects. Whether the source of properties is a properties file or XML,
>> you have to code up those Java "properties" POJOs with full knowledge of
>> expected structure. *That* is your schema right there, regardless:

>
> That is not your schema. That is one layer's implementation of your schema.


Oh no, I disagree. That is my schema, if my source of truth is a Java
POJO. I didn't say "XML schema", I said "schema". The only thing that
concerns me in this argument is

POJO <-> configuration file

I know what I need that "properties" or "configuration" POJO to be; I
can write it first, it's authoritative. It *is* my schema. Not an XML
schema, but my expressed configuration data structure.

>> running an XML validator on an XML file against an XML schema is
>> duplication of effort. You'd learn the same things by failing to load

>
> Not if you do it right, it isn't.
>
>> the properties into your beans, which you have to do anyway.

>
> Not if you do it right, you don't.
>
> Systems have different pipelines at different layers. In a system where XML is
> advantageous, you have validation at the gateway, before it gets into your queues
> and components and heavy logic. This increases throughput and scalability in addition
> to correctness and reliability.
>
> Also, redundant checks are not always a bad thing. Back in the 1970s a nuclear
> missile siloed in West Virginia lost three of its four failsafes. Had there not been redundancy,
> there would have been catastrophe. Back in the 1980s there was a nuclear-medicine
> radiation-doser manufacturer in Canada who removed "redundant" hardware failsafes in the
> dosage, and the software bugs promptly started killing people.
>
> A properly designed system will put surface edits in the front end, whether it's XML or
> source code compilation or JSON parsing or what-have-you, and different checks in
> different layers. Useful redundancy is achieved by dependent layers asserting the validity
> promised by antecedent layers rather than duplication of all the effort.
>

This is a good argument - validation redundancy - in the bigger picture.
I don't dispute that. But you don't need XML or XML DTDs/schemas to
achieve that.

AHS


All times are GMT.