Velocity Reviews

MRAB 01-06-2009 12:48 AM

Re: Regex Generator From Multiple Files
 
James Pruitt wrote:
> I am looking for a way given a number of files, say 3, that represent
> technical support tickets in the same format to generate regular
> expressions for the different fields automatically.
>
> An example of one line from each file:
> Date: 12/30/2008 Room: 457 Building: Main
> Date: 12/31/2008 Room: A21 Building: Annex
> Date: 1/4/2009 Room: L69 Building: Library
>
> The program would then, possibly using the python diff library, generate
> the regular expression needed to parse out different fields. In this
> case it might return a tuple like
> ("^Date:\s+(.*)\s+Room","Room:\s+(.*)\s+Building","Building:\s+(.*)\s*$")
> that would match each of the fields based on the common data and sort of
> assume that what doesn't change between them is data we are looking for.
>

Why not just assume that each field consists of a word terminated by a
colon, then some text, then the next field or the end of the line?
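
A minimal sketch of that idea, assuming every field name is a single word followed by a colon and that no value itself contains a word-plus-colon sequence (a time such as 10:30 would confuse it). The names FIELD_RE and parse_fields are made up for this illustration:

```python
import re

# A field is: a word, a colon, then text running lazily up to the
# next "word:" or the end of the line (assumed format, see above).
FIELD_RE = re.compile(r"(\w+):\s*(.*?)\s*(?=\w+:|$)")

def parse_fields(line):
    """Return a dict mapping each field name to its value."""
    return dict(FIELD_RE.findall(line))

lines = [
    "Date: 12/30/2008 Room: 457 Building: Main",
    "Date: 12/31/2008 Room: A21 Building: Annex",
    "Date: 1/4/2009 Room: L69 Building: Library",
]

for line in lines:
    print(parse_fields(line))
```

With this approach no per-file regex has to be generated; the field names come out of the data itself.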

Jeremy.Chen 01-06-2009 01:17 PM

Re: Regex Generator From Multiple Files
 
On Jan 6, 8:48 am, MRAB <goo...@mrabarnett.plus.com> wrote:
> Why not just assume that each field consists of a word terminated by a
> colon, then some text, then the next field or the end of the line?


Do you mean the sub method?
-------------
re.sub(r'(?i)(example)', self.captureRegxp, content)
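
For reference, re.sub accepts a function as its second argument and calls it once per match with the match object. A small self-contained sketch of that pattern follows; the capture function (standing in for self.captureRegxp above), the captured list, and the sample text are all made up for illustration:

```python
import re

captured = []

def capture(match):
    """Record the matched text and return the replacement string."""
    captured.append(match.group(1))
    return match.group(1).upper()

content = "An example line, and another Example."

# (?i) makes the pattern case-insensitive, so both spellings match.
result = re.sub(r"(?i)(example)", capture, content)

print(result)
print(captured)
```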

