Velocity Reviews


qwertmonkey@syberianoutpost.ru 03-10-2013 02:27 AM

regexp(ing) Backus-Naurish expressions ...
 
I need to set up some code's running context via properties files, and I want
to make sure that users don't get too playful messing with them, because that
could alter results greatly and in unexpected ways (which they most probably won't
be able to make sense of, and then they would bother the hell out of you)
~
So I must do some sanity checks on the running parameters, whether they are entered
via the command prompt or the defaults from the properties files are used
~
I am telling you all of that because you may know of libraries that do such
a thing
~
I think one possible way to do that is via a regexp, which should match all
the options included in the test array aISAr
~
One of the problems I am having is that if you enter as options, say, [true|t],
the matcher matches just the "t" of "true", and I want "true" to be
actually matched. Another is that, say, " true " should be matched, as well
as "false [ nix |mac| windows ] line.separator" ...
~
Any ideas you would share?
~
thanks,
lbrtchx
~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ TEST CODE ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// __
public class RegexMatches02Test{
    // __
    public static void main( String args[] ){
        String aRegEx;
        String aIS;
        Pattern Ptrn;
        Matcher Mtchr;
        int iCnt, iMtxStart, iMtxEnd;
        // __
        aRegEx = "^\\s*[true|false|t|f]{1}\\s*\\[";
        aRegEx = "^\\s*[true|false|t|f]{1}";
        aRegEx = "^\\s*[true|false|t|f]{1}\\s*";
        aRegEx = "^\\s*[true|false t|f]{1}\\s*";

        // __
        String[] aISAr = new String[]{
            " true[a|b |c ] q"
            , " true [a|b |c ] q"
            , "true [a|b |c ] q"
            , "true[a|b|c] b"
            , "true[a|b|c]q"
            , "False[ y | n | q ] q"
            , "false[nix|windows|mac]line.separator"
            , "false [ nix |mac| windows ] line.separator"
            , "T[y|n]q"
            , "T[y]"
            , "false"
            , "faLse"
            , "true"
            , "TrUe"
            , "F"
            , "T"
        };
        int iISArL = aISAr.length, i = 0;
        // __
        boolean IsLoop;
        Ptrn = Pattern.compile(aRegEx, Pattern.CASE_INSENSITIVE);

        System.err.println("// __ matching pattern: |" + aRegEx + "|");

        Mtchr = Ptrn.matcher(aISAr[i]); // get a matcher object
        IsLoop = (i < iISArL);
        while(IsLoop){
            System.err.println("// __ |" + i + "|" + aISAr[i] + "|");
            iCnt = 0;
            // __
            while(Mtchr.find()){
                iMtxStart = Mtchr.start();
                iMtxEnd = Mtchr.end();
                System.err.println("|" + iCnt + "|" + iMtxStart + "|" + iMtxEnd + "|" +
                    aISAr[i].substring(iMtxStart, iMtxEnd) + "|");
                ++iCnt;
            }// (Mtchr.find())
            System.err.println("~");
            // __
            ++i;
            IsLoop = (i < iISArL);
            if(IsLoop){ Mtchr.reset(aISAr[i]); }
        }// while(IsLoop)
    }
}

Arne Vajhøj 03-10-2013 02:33 AM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/9/2013 9:27 PM, qwertmonkey@syberianoutpost.ru wrote:
> I need to set up some code's running context via properties files, and I want
> to make sure that users don't get too playful messing with them, because that
> could alter results greatly and in unexpected ways (which they most probably won't
> be able to make sense of, and then they would bother the hell out of you)
> ~
> So I must do some sanity checks on the running parameters, whether they are entered
> via the command prompt or the defaults from the properties files are used
> ~
> I am telling you all of that because you may know of libraries that do such
> a thing
> ~
> I think one possible way to do that is via a regexp, which should match all
> the options included in the test array aISAr
> ~
> One of the problems I am having is that if you enter as options, say, [true|t],
> the matcher matches just the "t" of "true", and I want "true" to be
> actually matched. Another is that, say, " true " should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...
> ~
> Any ideas you would share?


I would do it as:
- switch from properties to XML
- define a schema for the XML with strict restrictions on data
- let the application parse that with a validating parser and
read it into some config object, this will ensure that required
information is there and that the data types are correct
- let the application apply business validation rules in Java code
on the config objects - this will ensure that the various
information is consistent
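
A minimal sketch of the schema-validation step above, using the JDK's `javax.xml.validation` API. The element names (`config`, `verbose`, `platform`) and the class name `XmlConfigSketch` are made up for illustration and do not come from the original question; reading the values into a config object and the business-rule checks are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XmlConfigSketch {
    // Hypothetical schema: <config> holds a required boolean <verbose>
    // and a <platform> restricted to nix, mac or windows.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='verbose' type='xs:boolean'/>"
        + "<xs:element name='platform'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:enumeration value='nix'/>"
        + "<xs:enumeration value='mac'/>"
        + "<xs:enumeration value='windows'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text is well-formed and conforms to XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors surface as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<config><verbose>true</verbose><platform>mac</platform></config>"));
        System.out.println(isValid(
            "<config><verbose>maybe</verbose><platform>mac</platform></config>"));
    }
}
```

In a real application the schema would more likely live in its own `.xsd` file next to the configuration; it is inlined here only to keep the sketch self-contained.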

Arne



Joshua Cranmer 🐧 03-10-2013 03:00 AM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/9/2013 8:27 PM, qwertmonkey@syberianoutpost.ru wrote:
> One of the problems I am having is that if you enter as options, say, [true|t],
> the matcher matches just the "t" of "true", and I want "true" to be
> actually matched. Another is that, say, " true " should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...


Do you know the syntax of Java's regular expressions? See
<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>.

In short, anything contained within square brackets is considered to be
a set of characters to match on, so [true|t] succeeds if the character
it's matching against is a t, r, u, e, or |. The syntax you probably
wanted was (true|t), which would either match the string "true" or the
string "t".
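
The difference can be seen in a short sketch. The class and helper names (`ClassVersusGroupSketch`, `firstMatch`) are made up for illustration. Note also that Java tries alternatives left to right, so with `find()` the longer alternative has to come first: `(t|true)` would again stop at "t".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassVersusGroupSketch {
    // Returns the text of the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Character class: matches exactly one of t, r, u, e or |
        System.out.println(firstMatch("^\\s*[true|t]", "true"));
        // Group with alternation: matches the string "true" or the string "t"
        System.out.println(firstMatch("^\\s*(true|t)", "true"));
        // Alternatives are tried in order, so "t" wins here
        System.out.println(firstMatch("^\\s*(t|true)", "true"));
    }
}
```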

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Stefan Ram 03-10-2013 03:04 AM

Re: regexp(ing) Backus-Naurish expressions ...
 
qwertmonkey@syberianoutpost.ru writes:
> I am telling you all of that because you may know of libraries that do such
>a thing


The config class can be seen as a bean, and then bean
validation can be applied, possibly (I never used that).

http://docs.oracle.com/javaee/6/tutorial/doc/gircz.html

> One of the problems I am having is that if you enter as options, say, [true|t],
>the matcher matches just the "t" of "true", and I want "true" to be


(?:true|t(?=[^r][^u][^e]))

(sketch, untested)
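
An alternative to the lookahead, offered only as a sketch: the lookahead as written needs three characters after the "t", so a bare "T" would not match. Matching the whole line with `matches()` lets backtracking choose between "true" and "t". The grammar assumed here (a boolean flag, then an optional bracketed option list, then an optional default token) is inferred from the test strings in the original post and is not confirmed by the poster; `OptionLineSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionLineSketch {
    // group 1: flag, group 2: raw option list, group 3: default token
    private static final Pattern LINE = Pattern.compile(
        "\\s*(true|false|t|f)\\s*(?:\\[([^\\]]*)\\]\\s*(\\S+)?)?\\s*",
        Pattern.CASE_INSENSITIVE);

    // True if the whole line fits the assumed grammar.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    // Returns the trimmed options between the brackets; empty if there are none.
    static List<String> options(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches() && m.group(2) != null) {
            for (String option : m.group(2).split("\\|")) {
                result.add(option.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "false [ nix |mac| windows ] line.separator";
        System.out.println(isValid(line) + " " + options(line));
    }
}
```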


Roedy Green 03-10-2013 02:57 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On Sun, 10 Mar 2013 02:27:32 +0000 (UTC),
qwertmonkey@syberianoutpost.ru wrote, quoted or indirectly quoted
someone who said :

> Any ideas you would share?


Regexes are quite limited. When you bang into their limits you can
write a finite state machine or use a parser.

see http://mindprod.com/jgloss/parser.html
http://mindprod.com/jgloss/finitestate.html
--
Roedy Green Canadian Mind Products http://mindprod.com
Software gets slower faster than hardware gets faster.
~ Niklaus Wirth (born: 1934-02-15 age: 79) Wirth's Law

markspace 03-10-2013 06:16 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 3/9/2013 6:27 PM, qwertmonkey@syberianoutpost.ru wrote:

> One of the problems I am having is that if you enter as options, say, [true|t],
> the matcher matches just the "t" of "true", and I want "true" to be
> actually matched. Another is that, say, " true " should be matched, as well
> as "false [ nix |mac| windows ] line.separator" ...
> ~
> Any ideas you would share?
> ~



Based on your syntax example and your title, why bother with
"Backus-Naurish?" Java has full parser generators.

http://www.antlr.org/



Robert Klemme 03-10-2013 09:39 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On 10.03.2013 15:57, Roedy Green wrote:
> On Sun, 10 Mar 2013 02:27:32 +0000 (UTC),
> qwertmonkey@syberianoutpost.ru wrote, quoted or indirectly quoted
> someone who said :
>
>> Any ideas you would share?

>
> Regexes are quite limited.


I beg to differ: it's amazing what you can do with them. Modern regex
engines in particular are usually much more powerful than what is needed
for the class of regular languages.

> When you bang into their limits you can
> write a finite state machine or use a parser.


What limitations would make me want to write a FSM instead by hand?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Stefan Ram 03-10-2013 10:21 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
Robert Klemme <shortcutter@googlemail.com> writes:
>What limitations would make me want to write a FSM instead by hand?


It is a natural idea that the user may input simple
arithmetic expressions with numeric literals, basic
arithmetic operators, parentheses and algebraic signs when the
program asks for a numeric value.
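
Arbitrarily nested parentheses are the part a classical regular expression cannot follow. A small hand-written recursive-descent parser handles them; the following is only a sketch, with a grammar and class name (`ArithmeticSketch`) chosen for illustration rather than taken from the thread.

```java
class ArithmeticSketch {
    private final String src;
    private int pos = 0;

    private ArithmeticSketch(String src) { this.src = src; }

    // Evaluates the whole text or throws IllegalArgumentException.
    static double evaluate(String text) {
        ArithmeticSketch parser = new ArithmeticSketch(text);
        double value = parser.expression();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + parser.pos);
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := ('+' | '-') factor | '(' expression ')' | number
    private double factor() {
        if (accept('+')) return factor();
        if (accept('-')) return -factor();
        if (accept('(')) {
            double value = expression();
            if (!accept(')')) throw new IllegalArgumentException("missing ')' at " + pos);
            return value;
        }
        return number();
    }

    // number := digits with an optional decimal point
    private double number() {
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    // Consumes c (after optional spaces) if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4) - -1"));
    }
}
```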


Roedy Green 03-10-2013 10:54 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
Examples where regexes run out of steam:
- parsing Java, HTML, BAT language ... to do syntax colouring.
- screen scraping, where what you want can appear in arbitrary order, be
missing, or be enclosed in a variety of delimiters.
- creating code to simulate the output of forms. You have to do it in
stages. You pick out a string, then you pick out strings from that.



Roedy Green 03-10-2013 11:24 PM

Re: regexp(ing) Backus-Naurish expressions ...
 
On Sun, 10 Mar 2013 22:39:22 +0100, Robert Klemme
<shortcutter@googlemail.com> wrote, quoted or indirectly quoted
someone who said :

>What limitations would make me want to write a FSM instead by hand?


Compacting out nugatory space in HTML would be another example.

Though they are quite complicated, I find FSMs very easy to write, and
they almost always work first time. You can narrow your thinking to a
tiny case and ignore the big picture quite safely.
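
A rough sketch of what such a machine might look like for the space-compacting case. It is not Roedy's code: the three states and the class name `SpaceCompactorSketch` are assumptions, and the sketch ignores `<pre>`, comments, scripts and any ">" inside attribute values.

```java
class SpaceCompactorSketch {
    private enum State { TEXT, IN_SPACE, IN_TAG }

    // Collapses each run of whitespace outside tags to a single space.
    static String compact(String html) {
        StringBuilder out = new StringBuilder(html.length());
        State state = State.TEXT;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            switch (state) {
                case IN_TAG:
                    // Copy everything inside a tag untouched.
                    out.append(c);
                    if (c == '>') state = State.TEXT;
                    break;
                case IN_SPACE:
                    // One space was already emitted; swallow the rest of the run.
                    if (Character.isWhitespace(c)) break;
                    out.append(c);
                    state = (c == '<') ? State.IN_TAG : State.TEXT;
                    break;
                case TEXT:
                    if (Character.isWhitespace(c)) {
                        out.append(' ');
                        state = State.IN_SPACE;
                    } else {
                        out.append(c);
                        if (c == '<') state = State.IN_TAG;
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compact("<p>a   b</p>\n\n  <p class=\"x  y\">c</p>"));
    }
}
```

Each `case` only has to answer one question, namely what to do with the next character in this state, which is the "tiny case" style of thinking described above.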

In contrast, I find my regexes (of any complexity) nearly always have
some unexpected behaviour, often one that does not show up immediately.

The other complicating factor is I use three different regex schemes
in a day: Java, Funduc and SlickEdit. I keep borrowing syntax from
a scheme other than the one I am using. Some day I will
have to write replacements that use Java syntax.

