![]() |
|
|
|||||||
![]() |
Python - New implementation of re module |
|
|
Thread Tools | Search this Thread |
|
|
#11 |
|
>>>>> MRAB <> (M) wrote:
>M> Hi all, >M> I've been working on a new implementation of the re module. The details >M> are at http://bugs.python.org/issue2636, specifically from >M> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for >M> Python 2.6 on Windows if you want to try it out. >M> I'm interested in how fast it is generally, compared with the current re >M> module, but especially when faced with those 'pathological' regular >M> expressions which seem to take a long time to finish, for example: >M> re.search(r"^(.+|D)*A$", "x" * 25 + "B") >M> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs >M> with this new implementation. Is this version also going to use the Thompson approach? -- Piet van Oostrum <> URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: Piet van Oostrum |
|
|
|
|
#12 |
|
Posts: n/a
|
Piet van Oostrum wrote:
>>>>>> MRAB <> (M) wrote: > >> M> Hi all, >> M> I've been working on a new implementation of the re module. The details >> M> are at http://bugs.python.org/issue2636, specifically from >> M> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for >> M> Python 2.6 on Windows if you want to try it out. > >> M> I'm interested in how fast it is generally, compared with the current re >> M> module, but especially when faced with those 'pathological' regular >> M> expressions which seem to take a long time to finish, for example: > >> M> re.search(r"^(.+|D)*A$", "x" * 25 + "B") > >> M> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs >> M> with this new implementation. > > Is this version also going to use the Thompson approach? That approach is what inspired the method in the new implementation, but its speed comes from asking _whether_ there's a match, not _where_ there's a match, so it can ignore the route taken to get the match. "grep", for example, selects those lines which match a pattern, whereas the typical Python (and Perl) usage is to extract part of a string or which part of the string matches. Trying to support lazy and possessive quantifiers in Thompson's approach is tricky (at least, I haven't been able to figure it out), and back-references are right out! There's always the chance that I could come up with some sort of hybrid scheme, applying different approaches depending on certain conditions. It already switches from depth-first to breadth-first when it's taken too many iterations without advancing and merges similar states, which is why it's so much better with 'pathological' patterns; I just have to get the depth-first phase to be as fast as the current 're' module. MRAB |
|
|
|
#13 |
|
Posts: n/a
|
On Jul 28, 2:34*am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Hi all, > > I've been working on a new implementation of the re module. The details > are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for > Python 2.6 on Windows if you want to try it out. > Where/how should we report perceived bugs: On that bugs.python.org issue? Here? Private e-mail? John Machin |
|
|
|
#14 |
|
Posts: n/a
|
John Machin wrote:
> On Jul 28, 2:34 am, MRAB <pyt...@mrabarnett.plus.com> wrote: >> Hi all, >> >> I've been working on a new implementation of the re module. The details >> are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for >> Python 2.6 on Windows if you want to try it out. >> > > Where/how should we report perceived bugs: On that bugs.python.org > issue? Here? Private e-mail? > Probably on http://bugs.python.org/issue2636. MRAB |
|
|
|
#15 |
|
Posts: n/a
|
On Jul 27, 5:34*pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Hi all, > > I've been working on a new implementation of the re module. The details > are athttp://bugs.python.org/issue2636, specifically fromhttp://bugs.python.org/issue2636#msg90954. I've included a .pyd file for > Python 2.6 on Windows if you want to try it out. Firstly Matthew, thank you for all your work on this. It brings some very nice regex features to Python. I've used Christopher Arndt's post as a basis and created a package from you latest upload (issue2636-20090804.zip), which builds for Python 2.5 and 2.6. I'd like to see this on PyPI, so it's easier to install the module and your work gets wider exposure. Would this be alright and would you prefer to have control of the upload, as this is your work? Below is the setup.py, the unicodedata_db.h files are taken from the appropriate branches on svn.python.org #!/usr/bin/env python import shutil import sys from distutils.core import setup, Extension MAJOR, MINOR = sys.version_info[:2] # Copy appropriate unicodedata_db.h, not found in published includes if (MAJOR, MINOR) == (2, 6): shutil.copy('Python26/unicodedata_db.h', './') elif (MAJOR, MINOR) == (2, 5): shutil.copy('Python25/unicodedata_db.h', './') else: print "WARNING: No unicodedata_db.h prepared." setup( name='regex', version='20080804', description='Alternate regular expression module, to replace re.', author='Matthew Barnett', author_email='', # Obsfucated url='http://bugs.python.org/issue2636', py_modules = ['regex'], ext_modules=[Extension('_regex', ['_regex.c'])], ) Sincerely, Alex Willmer Alex Willmer |
|
![]() |
| Thread Tools | Search this Thread |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| implementation of collapsible panel control in asp.net | bhawneshkumar | Software | 0 | 07-17-2009 06:38 AM |
| ImportError: No module named server | manasonnet | Software | 0 | 11-20-2008 04:53 AM |
| WFS:GBIC module | Helenzhou | Hardware | 0 | 06-19-2008 08:04 AM |
| I have an Invalid Page Fault in module XVID.AX | Wassup Mang! | DVD Video | 1 | 10-19-2003 02:40 PM |
| Re: SmartCertify A+ Core Module | Tony Sivori | A+ Certification | 2 | 07-17-2003 05:56 AM |