Velocity Reviews - Computer Hardware Reviews

Regexp Guru Needed

James Edward Gray II
10-30-2005
We're having a discussion on Ruby Core about how to speed up CSV.
I'm trying to tune a Regexp that matches CSV fields. However, I'm
seeing something I don't expect. Can someone explain this to me,
please?

>> ",".scan(/(?:^|,)(?:"()"|([^",]*))/)

=> [[nil, ""]]

That's a simplified version of what I'm messing with. My question
is: why does it match only once, when I expect two matches?

The first match should be right at the beginning, and is basically
(?:^ ... )(?: ... ([^",]*)). The second match should begin at the
comma, being (?: ... ,)(?: ... ([^",]*)). What am I missing?
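A minimal Ruby sketch that reproduces the result above and also records where each match starts and ends. It relies on String#scan setting $~ inside its block; the constant FIELD and the helper name scan_offsets are made up for illustration.

```ruby
# The simplified field pattern from the question.
FIELD = /(?:^|,)(?:"()"|([^",]*))/

# Hypothetical helper: collect the [start, end] offsets of every match
# String#scan finds. Inside the block, $~ holds the current MatchData.
def scan_offsets(str, re)
  offsets = []
  str.scan(re) { offsets << $~.offset(0) }
  offsets
end

p ",".scan(FIELD)          # => [[nil, ""]]
p scan_offsets(",", FIELD) # => [[0, 0]]
```

The single match is zero-length and sits at offset 0: the ^ alternative is tried first and succeeds, so the comma itself is never consumed by that match.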

James Edward Gray II


Peter Vanbroekhoven
10-30-2005

On 10/30/05, James Edward Gray II <(E-Mail Removed)> wrote:
>
> We're having a discussion on Ruby Core about how to speed up CSV.
> I'm trying to tune a Regexp that matches CSV fields. However, I'm
> seeing something I don't expect. Can someone explain this to me,
> please?
>
> >> ",".scan(/(?:^|,)(?:"()"|([^",]*))/)

> => [[nil, ""]]
>
> That's a simplified version of what I'm messing with. My question
> is, why does it only match once, when I expect two matches?
>
> The first match should be right at the beginning, and is basically
> (?:^ ... )(?: ... ([^",]*)). The second match should begin at the
> comma, being (?: ... ,)(?: ... ([^",]*)). What am I missing?
>


I'm not pretending to be a regexp guru, but nonetheless:

scan moves forward one character even if the portion of the string that it
matched has length 0. This is to prevent it from going into an infinite
loop. Consider your example: the regexp matches at the start of the string
and matches 0 characters. If Ruby did not move forward one character before
the next match, the regexp would match at the start of the string again in
exactly the same way, and still would not have consumed any of the string.
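To make the bump-along rule concrete, here is a simplified Ruby model of how scan advances between matches. It is a sketch of the behaviour described above, not MRI's actual implementation; it indexes by characters, so it assumes plain ASCII input, and the names FIELD and scan_positions are hypothetical.

```ruby
# The simplified field pattern from the question.
FIELD = /(?:^|,)(?:"()"|([^",]*))/

# Hypothetical model of String#scan's advance rule (a sketch, not MRI code):
# after a zero-length match the next search starts one character later,
# otherwise it resumes where the previous match ended.
def scan_positions(str, re)
  pos = 0
  found = []
  while pos <= str.length && (m = re.match(str, pos))
    found << m.offset(0)
    # Empty match: bump along one character; non-empty: continue at its end.
    pos = m.begin(0) == m.end(0) ? m.end(0) + 1 : m.end(0)
  end
  found
end

p scan_positions(",", FIELD)    # => [[0, 0]]
p scan_positions("a,,b", FIELD) # => [[0, 1], [1, 2], [2, 4]]
```

For ",", the empty match at offset 0 moves the next search to offset 1, which is already the end of the one-character string, so the comma is skipped and no second match is found.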

My suggestion would be to have two regexps, one to strip off the beginning
of the CSV line, and one to split the remainder into parts.
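One possible reading of the two-regexp suggestion, as a Ruby sketch. It handles only unquoted fields to stay short, and the names FIRST_FIELD, NEXT_FIELD, and split_line are hypothetical. The second pattern always consumes a comma, so it never produces a zero-length match and the bump-along never skips anything.

```ruby
# Hypothetical patterns for unquoted fields only (quotes are not handled).
FIRST_FIELD = /\A[^",]*/   # the leading field; may be the empty string
NEXT_FIELD  = /,([^",]*)/  # each ",field" after it; always consumes a comma

def split_line(line)
  first = line[FIRST_FIELD]                               # always matches, never nil
  rest  = line[first.length..-1].scan(NEXT_FIELD).flatten # one entry per comma
  [first] + rest
end

p split_line(",")    # => ["", ""]
p split_line("a,,b") # => ["a", "", "b"]
```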

Peter

James Edward Gray II
10-30-2005
On Oct 29, 2005, at 8:04 PM, Peter Vanbroekhoven wrote:

> I'm not pretending to be a regexp guru, but nonetheless:
>
> scan moves forward one character even if the portion of the string
> that it matched has length 0.


I am aware of the infamous "bump-along", but doesn't 0 + 1 == 1? I
expected that to put it on the comma, which would work just fine.

James Edward Gray II



James Edward Gray II
10-30-2005
On Oct 29, 2005, at 10:18 PM, James Edward Gray II wrote:

> I am aware of the infamous "bump-along", but doesn't 0 + 1 == 1? I
> expected that to put it on the comma, which would work just fine.


Never mind. I get how dumb I'm being now. There's only one
character, at index 0. Duh. Thanks for the lesson.

James Edward Gray II



Warren Seltzer
10-30-2005
If you google for "CSV regexp" you get a lot of hits. This one looks promising:

http://www.codeguru.com/columns/DotN...cle.php/c8153/

Warren Seltzer


-----Original Message-----
From: James Edward Gray II [(E-Mail Removed)]
Sent: Sunday, October 30, 2005 2:27 AM
To: ruby-talk ML
Subject: Regexp Guru Needed
...




James Edward Gray II
10-30-2005
On Oct 30, 2005, at 3:43 AM, Warren Seltzer wrote:

> If you google for "CSV regexp" you get a lot of hits. This one
> looks promising:
>
> http://www.codeguru.com/columns/DotN...cle.php/c8153/


Thanks.

Just FYI, the main expression we are working with is:

/\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/

From Mastering Regular Expressions (2nd Edition).
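A Ruby sketch that runs the expression above through String#scan. Group 1 holds the body of a quoted field and group 2 an unquoted field; the helper name parse_fields is hypothetical, and turning a doubled "" back into a single " is an assumption about what a caller would want.

```ruby
# The expression quoted above, from Mastering Regular Expressions.
# Group 1: body of a quoted field; group 2: an unquoted field.
CSV_FIELD = /\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/

# Hypothetical helper: map each [quoted, unquoted] pair from scan to a
# field string, turning the doubled "" inside quoted fields back into ".
def parse_fields(line)
  line.scan(CSV_FIELD).map do |quoted, plain|
    quoted ? quoted.gsub('""', '"') : plain
  end
end

p parse_fields('x,"say ""hi""",') # => ["x", "say \"hi\"", ""]
p parse_fields(',a')              # => [""]
```

The second call shows the limitation from the start of the thread: because the line begins with an empty field, the first match is zero-length, scan's bump-along moves past the comma, and the \G anchor then prevents any further match, so "a" is lost.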

James Edward Gray II


Stephen Waits
11-02-2005
James Edward Gray II wrote:
>
> From Mastering Regular Expressions (2nd Edition).


Check out RegexBuddy. Worth getting access to Win32 just for this if
you're a Mac guy needing to debug some REs.

--Steve



Martin DeMello
11-02-2005
Stephen Waits <(E-Mail Removed)> wrote:
> James Edward Gray II wrote:
> >
> > From Mastering Regular Expressions (2nd Edition).

>
> Check out RegexBuddy. Worth getting access to Win32 just for this if
> you're a Mac guy needing to debug some REs.


Or http://www.weitz.de/regex-coach/ - it's the best one I've seen, and
has Linux and Windows ports (sadly no Mac version).

martin
 
Stephen Waits
11-02-2005

On Nov 2, 2005, at 4:42 AM, Martin DeMello wrote:

> Stephen Waits <(E-Mail Removed)> wrote:
>> Check out RegexBuddy. Worth getting access to Win32 just for this if
>> you're a Mac guy needing to debug some REs.

>
> Or http://www.weitz.de/regex-coach/ - it's the best one I've seen, and
> has Linux and Windows ports (sadly no Mac version).


Thanks for the link, Martin. I hadn't found it before. I tried it
out, and it's a nice "free" alternative to RegexBuddy; however, it
pales in comparison to what RB can do. I do wish RB were a little
cheaper - I've bought much richer software for less money.

--Steve


