Regular Expression finder

 
 
Joe Smith
Guest
Posts: n/a
 
      09-14-2004
Hi,

does anyone know of a tool that would be able to extract the regular
expression that corresponds to a set of Strings?

For instance:

This tool, given
"abc", "aec", "akkc"
would return a regular expression like "a.+c"

Is this possible? Is it done?

Thanks!


 
David Hilsee
Guest
Posts: n/a
 
      09-14-2004
"Joe Smith" <> wrote in message
news:ci6nvt$9la$...
> Hi,
>
> does anyone know of a tool that would be able to extract the regular
> expression that corresponds to a set of Strings?
>
> For instance:
>
> This tool, given
> "abc", "aec", "akkc"
> would return a regular expression like "a.+c"
>
> Is this possible? Is it done?


_The_ regular expression? There are an infinite number of regular
expressions that match those strings. Even if there were a tool that could
guess at a regex using heuristics, you'd still need to examine its output to
ensure that its result meets your needs.

Personally, I'd prefer using something that can quickly test the regexes
that your brain comes up with. The Komodo IDE had such a feature that I
found quite helpful. I haven't seen anything like it in other IDEs, though.
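
A quick tester of that kind is easy to sketch in plain Java. The class and method names below (RegexTester, matchesAll) are made up for illustration and do not come from Komodo or any other IDE; the candidate patterns are just examples to try against the original poster's three strings.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks candidate regexes against a set of sample strings.
final class RegexTester {

    // Returns true only if every sample matches the regex in its entirety.
    static boolean matchesAll(String regex, String[] samples) {
        Pattern pattern = Pattern.compile(regex);
        for (String sample : samples) {
            if (!pattern.matcher(sample).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "aec", "akkc"};
        String[] candidates = {"a.+c", "a[bek].*", "a.*", "a.c"};
        for (String regex : candidates) {
            System.out.println(regex + " matches all samples: "
                    + matchesAll(regex, samples));
        }
    }
}
```

Several quite different patterns accept all three samples, which is why the output of any guessing tool would still have to be checked by hand.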

--
David Hilsee


 
Michael Borgwardt
Guest
Posts: n/a
 
      09-14-2004
Joe Smith wrote:
> does anyone know of a tool that would be able to extract the regular
> expression that corresponds to a set of Strings?


There is no "the" there.

> For instance:
>
> This tool, given
> "abc", "aec", "akkc"
> would return a regular expression like "a.+c"


Why not "a[bek].*" or "a.*"?

> Is this possible? Is it done?


It's certainly possible (and very easy) to write a method to
return a regular expression that matches any of a given set of
Strings:

public String getRegexp(String[] strings){
    // ".*" matches every string, so it trivially matches the given ones
    return ".*";
}

Or did you mean a regexp that matches all of the given Strings
and *only* those? The example you give fails in that regard, but
it's also quite easy to do:

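public String getRegexp(String[] strings){
    // Join the strings into one alternation group: (s1|s2|...)
    StringBuffer result = new StringBuffer("(");
    for(int i=0; i<strings.length; i++){
        result.append(strings[i]+"|");
    }
    // Replace the trailing '|' with the closing parenthesis
    // (an empty array would overwrite the '(' instead)
    result.setCharAt(result.length()-1, ')');
    return result.toString();
}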

(you'd have to add escape sequences for characters that have
meaning in regexps)
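
The escaping can be sketched with Pattern.quote, which wraps a string so that the regex engine treats it literally. This is a variant under assumptions, not the code from the post: the class name ExactAlternation is invented, a non-capturing group is used, and an empty input is rejected because an empty alternation has no sensible meaning.

```java
import java.util.regex.Pattern;

// Hypothetical variant of getRegexp that escapes regex metacharacters.
final class ExactAlternation {

    // Builds a regex that matches exactly the given strings and nothing else.
    static String getRegexp(String[] strings) {
        if (strings.length == 0) {
            throw new IllegalArgumentException("need at least one string");
        }
        StringBuilder result = new StringBuilder("(?:");
        for (int i = 0; i < strings.length; i++) {
            if (i > 0) {
                result.append('|');
            }
            // Pattern.quote makes characters such as '.' and '|' literal.
            result.append(Pattern.quote(strings[i]));
        }
        return result.append(')').toString();
    }

    public static void main(String[] args) {
        String regex = getRegexp(new String[] {"abc", "a.c", "x|y"});
        System.out.println(Pattern.matches(regex, "a.c"));
        System.out.println(Pattern.matches(regex, "axc"));
    }
}
```

Without the quoting, "a.c" would also accept "axc" and "x|y" would be read as two separate alternatives.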

The real question is: which of the *infinite* number of regular expressions
that match a given set of Strings do you want to find?
 
Joe Smith
Guest
Posts: n/a
 
      09-14-2004
> > does anyone know of a tool that would be able to extract the regular
> > expression that corresponds to a set of Strings?
>
> There is no "the" there.
>
> > For instance:
> >
> > This tool, given
> > "abc", "aec", "akkc"
> > would return a regular expression like "a.+c"
>
> Why not "a[bek].*" or "a.*"?
>
>
> The real question is: which of the *infinite* number of regular expressions
> that match a given set of Strings do you want to find?


Ok, ok... it's clear that my idea needs more explanation:

It's true that there's an infinite number of regexps that may match a set of
Strings... So perhaps, what I really want is to extract the common sections
of these strings... And replace the other parts with the "minimum" regexp...
And yes, there will be countless of them!!...
Idea:

"header body1 body2 footer epilogue"

"Prolog header body1 footer"

I would have something like: "(Prolog)? header body1 (body2)? footer
(epilogue)?"

For instance, "diff" is able to find the differences between two files...
The tool I'm thinking of would perform diffs on several inputs, to be able
to extract these common parts...

But well, I guess it's too "abstract" for a program.

Thanks anyway!!
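
For two inputs, the diff-like idea described above can be sketched with a longest-common-subsequence pass over whitespace-separated tokens: tokens present in both strings stay mandatory, the rest become optional groups. This is only one possible reading of the "minimum" regexp. The names CommonSections and generalize are invented, tokens are assumed to be separated by single spaces, and the result also accepts some strings that were never given (for example, with both optional parts present or both absent).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: merge two token sequences into one regex with optional parts.
final class CommonSections {

    static String generalize(String first, String second) {
        String[] a = first.trim().split("\\s+");
        String[] b = second.trim().split("\\s+");

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both sequences; shared tokens are required, all others optional.
        List<String> tokens = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (i < a.length && j < b.length && a[i].equals(b[j])) {
                tokens.add(a[i]); required.add(true); i++; j++;
            } else if (j == b.length
                    || (i < a.length && lcs[i + 1][j] >= lcs[i][j + 1])) {
                tokens.add(a[i]); required.add(false); i++;
            } else {
                tokens.add(b[j]); required.add(false); j++;
            }
        }

        // Optional tokens carry their separating space inside the group:
        // a trailing space before the first required token, a leading one after it.
        StringBuilder regex = new StringBuilder();
        boolean seenRequired = false;
        for (int k = 0; k < tokens.size(); k++) {
            String quoted = Pattern.quote(tokens.get(k));
            if (required.get(k)) {
                regex.append(seenRequired ? " " : "").append(quoted);
                seenRequired = true;
            } else if (seenRequired) {
                regex.append("(?: ").append(quoted).append(")?");
            } else {
                regex.append("(?:").append(quoted).append(" )?");
            }
        }
        if (!seenRequired) {
            throw new IllegalArgumentException("inputs share no tokens");
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = generalize("header body1 body2 footer epilogue",
                                  "Prolog header body1 footer");
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "Prolog header body1 footer"));
    }
}
```

Handling more than two inputs would mean folding each further string into the merged sequence, which this sketch does not attempt.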


 
Matt Humphrey
Guest
Posts: n/a
 
      09-14-2004

"Joe Smith" <> wrote in message
news:ci6sji$kbk$...
> > > does anyone know of a tool that would be able to extract the regular
> > > expression that corresponds to a set of Strings?
> >
> > There is no "the" there.
> >
> > > For instance:
> > >
> > > This tool, given
> > > "abc", "aec", "akkc"
> > > would return a regular expression like "a.+c"
> >
> > Why not "a[bek].*" or "a.*"?
> >
> >
> > The real question is: which of the *infinite* number of regular expressions
> > that match a given set of Strings do you want to find?
>
> Ok, ok... it's clear that my idea needs more explanation:
>
> It's true that there's an infinite number of regexps that may match a set of
> Strings... So perhaps, what I really want is to extract the common sections
> of these strings... And replace the other parts with the "minimum" regexp...
> And yes, there will be countless of them!!...
> Idea:
>
> "header body1 body2 footer epilogue"
>
> "Prolog header body1 footer"
>
> I would have something like: "(Prolog)? header body1 (body2)? footer
> (epilogue)?"
>
> For instance, "diff" is able to find the differences between two files...
> The tool I'm thinking of would perform diffs on several inputs, to be able
> to extract these common parts...
>
> But well, I guess it's too "abstract" for a program.


This is a research area, particularly in user interfaces. You may find
something useful here:
http://www.ics.uci.edu/~dhilbert/pap...-ICS-98-13.pdf in section
4.4

Cheers,
Matt Humphrey http://www.iviz.com/


 
sks
Guest
Posts: n/a
 
      09-14-2004

"David Hilsee" <> wrote in message
news:h72dncA43Y9ucdvcRVn-...
> "Joe Smith" <> wrote in message
> news:ci6nvt$9la$...
> > Hi,
> >
> > does anyone know of a tool that would be able to extract the regular
> > expression that corresponds to a set of Strings?
> >
> > For instance:
> >
> > This tool, given
> > "abc", "aec", "akkc"
> > would return a regular expression like "a.+c"
> >
> > Is this possible? Is it done?
>
> _The_ regular expression? There are an infinite number of regular
> expressions that match those strings. Even if there were a tool that could
> guess at a regex using heuristics, you'd still need to examine its output to
> ensure that its result meets your needs.
>
> Personally, I'd prefer using something that can quickly test the regexes
> that your brain comes up with. The Komodo IDE had such a feature that I
> found quite helpful. I haven't seen anything like it in other IDEs, though.

There's a plug-in for Eclipse; you'd have to search for it on Google, though.


 
Carl Howells
Guest
Posts: n/a
 
      09-14-2004
Michael Borgwardt wrote:

> public String getRegexp(String[] strings){
> StringBuffer result = new StringBuffer("(");
> for(int i=0; i<strings.length; i++){
> result.append(strings[i]+"|");
> }
> result.setCharAt(result.length()-1, ')');
> return result.toString();
> }
>
> (you'd have to add escape sequences for characters that have
> meaning in regexps)


Last I checked, the Java regex engine is pretty bad for that... It uses
recursion to build the automaton used for matching, which recurses too
deeply on an alternation with a few thousand options, throwing an exception.
 
Michael Borgwardt
Guest
Posts: n/a
 
      09-14-2004
Carl Howells wrote:
> Last I checked, the Java regex engine is pretty bad for that... It uses
> recursion to build the automaton used for matching, which recurses too
> deeply on an alternation with a few thousand options, throwing an
> exception.


It wasn't really meant as a serious suggestion. Using *any* regexp engine
to process that kind of pattern would be a waste of resources.
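
If the only goal is to test whether a string is one of a fixed set of literals, a hash-set lookup sidesteps the regex engine entirely. A minimal sketch, with the class name ExactSetMatcher and the generated sample data invented for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical alternative to a huge alternation: exact membership in a set.
final class ExactSetMatcher {
    private final Set<String> allowed;

    ExactSetMatcher(String[] strings) {
        this.allowed = new HashSet<>(Arrays.asList(strings));
    }

    // True if the candidate is exactly one of the strings given at construction.
    boolean matches(String candidate) {
        return allowed.contains(candidate);
    }

    public static void main(String[] args) {
        // A few thousand options, the size at which the alternation was reported to fail.
        String[] options = new String[5000];
        for (int i = 0; i < options.length; i++) {
            options[i] = "option" + i;
        }
        ExactSetMatcher matcher = new ExactSetMatcher(options);
        System.out.println(matcher.matches("option4999"));
        System.out.println(matcher.matches("option5000"));
    }
}
```

The lookup needs no escaping and its cost does not grow with the number of options in the way a long alternation might.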
 