Go Back   Velocity Reviews > Newsgroups > Python
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

Python - New implementation of re module

 
Thread Tools Search this Thread
Old 07-30-2009, 02:39 PM   #11
Default Re: New implementation of re module


>>>>> MRAB <> (M) wrote:

>M> Hi all,
>M> I've been working on a new implementation of the re module. The details
>M> are at http://bugs.python.org/issue2636, specifically from
>M> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
>M> Python 2.6 on Windows if you want to try it out.


>M> I'm interested in how fast it is generally, compared with the current re
>M> module, but especially when faced with those 'pathological' regular
>M> expressions which seem to take a long time to finish, for example:


>M> re.search(r"^(.+|D)*A$", "x" * 25 + "B")


>M> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs
>M> with this new implementation.


Is this version also going to use the Thompson approach?
--
Piet van Oostrum <>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email:


Piet van Oostrum
  Reply With Quote
Old 07-30-2009, 11:06 PM   #12
MRAB
 
Posts: n/a
Default Re: New implementation of re module
Piet van Oostrum wrote:
>>>>>> MRAB <> (M) wrote:

>
>> M> Hi all,
>> M> I've been working on a new implementation of the re module. The details
>> M> are at http://bugs.python.org/issue2636, specifically from
>> M> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
>> M> Python 2.6 on Windows if you want to try it out.

>
>> M> I'm interested in how fast it is generally, compared with the current re
>> M> module, but especially when faced with those 'pathological' regular
>> M> expressions which seem to take a long time to finish, for example:

>
>> M> re.search(r"^(.+|D)*A$", "x" * 25 + "B")

>
>> M> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs
>> M> with this new implementation.

>
> Is this version also going to use the Thompson approach?


That approach is what inspired the method in the new implementation, but
its speed comes from asking _whether_ there's a match, not _where_
there's a match, so it can ignore the route taken to get the match.

"grep", for example, selects those lines which match a pattern, whereas
the typical Python (and Perl) usage is to extract part of a string or
which part of the string matches.

Trying to support lazy and possessive quantifiers in Thompson's approach
is tricky (at least, I haven't been able to figure it out), and
back-references are right out!

There's always the chance that I could come up with some sort of hybrid
scheme, applying different approaches depending on certain conditions.
It already switches from depth-first to breadth-first when it's taken
too many iterations without advancing and merges similar states, which
is why it's so much better with 'pathological' patterns; I just have to
get the depth-first phase to be as fast as the current 're' module.


MRAB
  Reply With Quote
Old 08-03-2009, 02:00 PM   #13
John Machin
 
Posts: n/a
Default Re: New implementation of re module
On Jul 28, 2:34*am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Hi all,
>
> I've been working on a new implementation of the re module. The details
> are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
> Python 2.6 on Windows if you want to try it out.
>


Where/how should we report perceived bugs: On that bugs.python.org
issue? Here? Private e-mail?




John Machin
  Reply With Quote
Old 08-03-2009, 05:00 PM   #14
MRAB
 
Posts: n/a
Default Re: New implementation of re module
John Machin wrote:
> On Jul 28, 2:34 am, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> Hi all,
>>
>> I've been working on a new implementation of the re module. The details
>> are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
>> Python 2.6 on Windows if you want to try it out.
>>

>
> Where/how should we report perceived bugs: On that bugs.python.org
> issue? Here? Private e-mail?
>

Probably on http://bugs.python.org/issue2636.


MRAB
  Reply With Quote
Old 08-05-2009, 01:30 AM   #15
Alex Willmer
 
Posts: n/a
Default Re: New implementation of re module
On Jul 27, 5:34*pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Hi all,
>
> I've been working on a new implementation of the re module. The details
> are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
> Python 2.6 on Windows if you want to try it out.


Firstly Matthew, thank you for all your work on this. It brings some
very nice regex features to Python.

I've used Christopher Arndt's post as a basis and created a package
from you latest upload (issue2636-20090804.zip), which builds for
Python 2.5 and 2.6. I'd like to see this on PyPI, so it's easier to
install the module and your work gets wider exposure. Would this be
alright and would you prefer to have control of the upload, as this is
your work?

Below is the setup.py, the unicodedata_db.h files are taken from the
appropriate branches on svn.python.org

#!/usr/bin/env python

import shutil
import sys
from distutils.core import setup, Extension

MAJOR, MINOR = sys.version_info[:2]

# Copy appropriate unicodedata_db.h, not found in published includes
if (MAJOR, MINOR) == (2, 6):
shutil.copy('Python26/unicodedata_db.h', './')
elif (MAJOR, MINOR) == (2, 5):
shutil.copy('Python25/unicodedata_db.h', './')
else:
print "WARNING: No unicodedata_db.h prepared."

setup(
name='regex',
version='20080804',
description='Alternate regular expression module, to replace re.',
author='Matthew Barnett',
author_email='', # Obsfucated
url='http://bugs.python.org/issue2636',
py_modules = ['regex'],
ext_modules=[Extension('_regex', ['_regex.c'])],
)


Sincerely, Alex Willmer


Alex Willmer
  Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
implementation of collapsible panel control in asp.net bhawneshkumar Software 0 07-17-2009 06:38 AM
ImportError: No module named server manasonnet Software 0 11-20-2008 04:53 AM
WFS:GBIC module Helenzhou Hardware 0 06-19-2008 08:04 AM
I have an Invalid Page Fault in module XVID.AX Wassup Mang! DVD Video 1 10-19-2003 02:40 PM
Re: SmartCertify A+ Core Module Tony Sivori A+ Certification 2 07-17-2003 05:56 AM




SEO by vBSEO 3.3.2 ©2009, Crawlability, Inc.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46