Velocity Reviews - Computer Hardware Reviews

Parsing a Boolean expression easy?

 
 
cbongior@stny.rr.com
Guest
Posts: n/a
 
      08-15-2005
String testText = "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"";

I would like to parse the above text into its 'components' in an easy
and preferably native Java library fashion. I mean, I can implement a
custom parser, but it would be a little ugly.

Ultimately, I would like the following tokens:

1) christian bongiorno
2) AND
3) Joe
4) OR
5) Electrical, plumbing

With StringTokenizer I can correctly get the quoted words, but it
doesn't distinguish the non-quoted ones. So,

StringTokenizer tokens = new StringTokenizer("\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"","\"");

produces

1) christian bongiorno
2) AND Joe OR
3) Electrical, plumbing
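
For reference, StringTokenizer can be pushed a little further with its three-argument constructor, which returns the delimiters as tokens too. The sketch below is only one possible approach; the class and method names (QuoteAwareTokenizer, tokenize) are made up for illustration. It toggles an "inside quotes" flag on every quote it sees and splits the unquoted stretches on whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original post.
class QuoteAwareTokenizer {
    static List<String> tokenize(String expr) {
        List<String> out = new ArrayList<>();
        // returnDelims = true: each quote character comes back as its own token.
        StringTokenizer st = new StringTokenizer(expr, "\"", true);
        boolean inQuotes = false;
        while (st.hasMoreTokens()) {
            String piece = st.nextToken();
            if (piece.equals("\"")) {
                inQuotes = !inQuotes;          // entering or leaving a quoted phrase
            } else if (inQuotes) {
                out.add(piece);                // keep the phrase whole, spaces and all
            } else {
                // Outside quotes: split on the default whitespace delimiters.
                StringTokenizer words = new StringTokenizer(piece);
                while (words.hasMoreTokens()) {
                    out.add(words.nextToken());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String testText = "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"";
        for (String token : tokenize(testText)) {
            System.out.println(token);
        }
    }
}
```

Note that an empty phrase ("") produces no token here, and an unbalanced quote is silently treated as running to the end of the input.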

As you guessed, this is for text searching. Also, no 3rd-party
libraries; it must be all core Java.

Ideas?

 
 
 
 
 
Oliver Wong
Guest
Posts: n/a
 
      08-15-2005



There's a difference between parsing and tokenizing. A lot of the time
when people say parsing, they mean tokenizing (which is why the string
tokenizer solves their problem). The problem you're describing is actual,
real parsing.

If you don't want to use 3rd party tools, then you'll just have to write
a parser by hand. Look up "recursive descent parsing". You may also want to
try posting future questions on this project to comp.compilers to learn more
about parsing theory.
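
To make "recursive descent" concrete, here is a minimal sketch in core Java. Several things in it are assumptions rather than anything stated in this thread: the class name QueryParser, the grammar (AND binds tighter than OR, no parentheses), and the meaning of a term (a case-insensitive substring test against the text being searched). Each grammar rule becomes one method that calls the rule below it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch. Assumed grammar:
//   expr := term ( "OR" term )*
//   term := operand ( "AND" operand )*
//   operand := quoted phrase | bare word
class QueryParser {
    private final List<String> tokens = new ArrayList<>();     // token text, quotes stripped
    private final List<Boolean> isPhrase = new ArrayList<>();  // true if the token was quoted
    private final String haystack;
    private int pos = 0;

    private QueryParser(String query, String haystack) {
        this.haystack = haystack.toLowerCase();
        // Group 1: inside of a quoted phrase. Group 2: a bare word.
        Matcher m = Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)").matcher(query);
        while (m.find()) {
            boolean phrase = m.group(1) != null;
            tokens.add(phrase ? m.group(1) : m.group(2));
            isPhrase.add(phrase);
        }
    }

    public static boolean matches(String query, String haystack) {
        QueryParser p = new QueryParser(query, haystack);
        boolean result = p.parseOr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (isOperator("OR")) {
            pos++;
            boolean rhs = parseAnd();   // always parse the right side so its tokens are consumed
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseOperand();
        while (isOperator("AND")) {
            pos++;
            boolean rhs = parseOperand();
            value = value && rhs;
        }
        return value;
    }

    private boolean parseOperand() {
        if (pos >= tokens.size() || isOperator("AND") || isOperator("OR")) {
            throw new IllegalArgumentException("Expected a search term at token " + pos);
        }
        return haystack.contains(tokens.get(pos++).toLowerCase());
    }

    // A quoted "AND" or "OR" is treated as a phrase, not as an operator.
    private boolean isOperator(String op) {
        return pos < tokens.size() && !isPhrase.get(pos) && tokens.get(pos).equals(op);
    }

    public static void main(String[] args) {
        String query = "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"";
        System.out.println(matches(query, "Electrical, plumbing and heating")); // true
        System.out.println(matches(query, "christian bongiorno"));              // false
    }
}
```

Adding parentheses or NOT would mean one more rule (and one more method) at the operand level; the overall shape stays the same.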

- Oliver


 
 
 
 
 
shakah
Guest
Posts: n/a
 
      08-15-2005


How about something with regular expressions, e.g.:

jc@soyuz:~/tmp$ cat bparse.java
public class bparse {
    // Usage: java bparse <regex> <input> [<input> ...]
    public static void main(String [] asArgs) {
        // The first argument is the regex; every later argument is an input string.
        java.util.regex.Pattern p
            = java.util.regex.Pattern.compile(asArgs[0]) ;
        System.out.println(" regex: '" + asArgs[0] + "'") ;
        for(int i=1; i<asArgs.length; ++i) {
            String sExpr = asArgs[i] ;
            System.out.println("input str: '" + sExpr + "'") ;
            java.util.regex.Matcher m = p.matcher(sExpr) ;
            // Each successful find() advances to the next token in the input.
            while(m.find()) {
                System.out.println(
                    " match: '"
                    + sExpr.substring(m.start(), m.end()) + "'") ;
            }
        }
    }
}

jc@soyuz:~/tmp$ java bparse '("[^"]*"|AND|OR|[A-Za-z0-9]+)' "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\""
regex: '("[^"]*"|AND|OR|[A-Za-z0-9]+)'
input str: '"christian bongiorno" AND Joe OR "Electrical, plumbing"'
match: '"christian bongiorno"'
match: 'AND'
match: 'Joe'
match: 'OR'
match: '"Electrical, plumbing"'

?

 
 
cbongior@stny.rr.com
Guest
Posts: n/a
 
      08-15-2005
I was SURE regular expressions could do it, but my regexp skills SUCK!
As an aside, Linux interprets the command line differently than Windows.
Windows turned those command-line args into, like, 8 separate arguments.
So I adapted, but it works!

Thanks

Christian

http://christian.bongiorno.org/resume.pdf

 
 
cbongior@stny.rr.com
Guest
Posts: n/a
 
      08-15-2005
One question though: in the results, is it possible to easily throw out
the " " around a quoted part?

so...
instead of
match: '"christian bongiorno"'

I get
match: 'christian bongiorno'

 
 
shakah
Guest
Posts: n/a
 
      08-15-2005


How about:
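while(m.find()) {
    String sMatch = sExpr.substring(m.start(), m.end()) ;
    // Drop the surrounding quotes from a quoted phrase; the length check
    // guards against a match that is just a single quote character.
    if(sMatch.length() >= 2 && sMatch.startsWith("\"") && sMatch.endsWith("\"")) {
        sMatch = sMatch.substring(1, sMatch.length()-1) ;
    }
    System.out.println(" match: '" + sMatch + "'") ;
}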
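
Another option, sketched here as an assumption and not as anything posted in this thread, is to let the regex do the stripping with capturing groups: group 1 captures only the inside of a quoted phrase, group 2 captures a bare word. The class and method names (GroupTokenizer, tokenize) are hypothetical. This version also leaves out the explicit AND|OR alternatives, since [A-Za-z0-9]+ already matches them; with the ordered alternation in the earlier pattern, a word such as ANDROID would be split into AND and ROID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating capturing groups.
class GroupTokenizer {
    // Group 1: text between the quotes. Group 2: a bare word (AND and OR included).
    private static final Pattern TOKEN =
        Pattern.compile("\"([^\"]*)\"|([A-Za-z0-9]+)");

    static List<String> tokenize(String expr) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            // Exactly one of the two groups takes part in any given match.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String expr = "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"";
        for (String token : tokenize(expr)) {
            System.out.println(" match: '" + token + "'");
        }
    }
}
```

The trade-off is that once the quotes are gone, a quoted "AND" can no longer be told apart from the operator unless you also record which group matched.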

 
 
Roedy Green
Guest
Posts: n/a
 
      08-15-2005
On 15 Aug 2005 12:47:59 -0700, (E-Mail Removed) wrote or quoted :

>As you guessed, this is for text searching. Also, no 3rd-party
>libraries; it must be all core Java.


Here is how you could implement an el cheapo tokenizer.

Use a regex to split on spaces, teaching it to ignore spaces inside
quotes.

Use a HashMap of defined words and keywords mapping to an enum that
classifies them.

Look up each word to see if it is magic, e.g. AND or OR.

You now have an array of tokens that identify their general class.
That is a lot easier to parse, especially if you use a postfix
notation.
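
The steps above might look roughly like the following. Treat it as a sketch under assumptions: the names (CheapTokenizer, Kind, Token) are invented for illustration, keywords are matched in upper case only, and the split regex relies on the common trick of splitting on whitespace that is followed by an even number of quote characters, which only works when the quotes are balanced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the split-then-classify approach.
class CheapTokenizer {
    enum Kind { AND, OR, WORD, PHRASE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // The "magic" words and the class each one maps to.
    private static final Map<String, Kind> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("AND", Kind.AND);
        KEYWORDS.put("OR", Kind.OR);
    }

    // Whitespace followed by an even number of quotes lies outside any quoted phrase.
    private static final String SPLIT = "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<Token> tokenize(String expr) {
        List<Token> out = new ArrayList<>();
        for (String raw : expr.trim().split(SPLIT)) {
            if (raw.isEmpty()) {
                continue;                       // empty input yields one empty piece
            }
            if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
                out.add(new Token(Kind.PHRASE, raw.substring(1, raw.length() - 1)));
            } else {
                Kind kind = KEYWORDS.get(raw);  // null means an ordinary search word
                out.add(new Token(kind != null ? kind : Kind.WORD, raw));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String expr = "\"christian bongiorno\" AND Joe OR \"Electrical, plumbing\"";
        for (Token token : tokenize(expr)) {
            System.out.println(token);
        }
    }
}
```

With the tokens classified like this, a later pass only has to switch on Kind and never needs to look at the raw text to tell operators from operands.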

The other approach is to use a parser generator, which will be much
easier than you imagine. See http://mindprod.com/jgloss/parser.html


 
 
cbongior@stny.rr.com
Guest
Posts: n/a
 
      08-16-2005
Thanks, I was thinking that something in the REGEX could do it. I
logicked around it already.

 