Short question on regex in Ruby

 
 
Chris Ro
 
      09-26-2008
Hi,

I have a little problem with a regex in Ruby:

I have two strings:

string1 = "He is the 20th."
string2 = "25th"

I wrote this to "extract" the place (20 or 25 respectively):

place1 = string1.gsub(/.*(\d+)th.*/,'\1')
place2 = string2.gsub(/.*(\d+)th.*/,'\1')
pp place1
pp place2

=> "0"
=> "5"

Of course, I would like to get all the digits before "th". Why is only
the last one captured?

If anyone could please explain this, and help me come up with a regex
that captures 20 and 25, respectively, this would be greatly
appreciated.

Cheers, Chris
--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
Mark Thomas
 
      09-26-2008
On Sep 26, 10:02 am, Chris Ro wrote:
> Of course, I would like to get all the digits before "th". Why is only
> the last one captured?


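Because the leading .* is greedy: it takes as much as it can while still
letting the rest of the pattern match. Since \d+ is satisfied by a single
digit, .* swallows everything up to the last digit.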
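
To see where the digits go, here is a small sketch that wraps the leading .* in its own group. The extra group and the variable names are only for illustration and are not part of the original pattern:

```ruby
string1 = "He is the 20th."

# Same pattern as in the question, but with the leading .* captured too.
m = string1.match(/(.*)(\d+)th.*/)

p m[1]  # what the greedy .* consumed: "He is the 2"
p m[2]  # what was left for \d+:       "0"
```

The greedy .* first runs to the end of the string and then gives characters back one at a time, stopping as soon as \d+ can match a single digit followed by "th".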

> If anyone could please explain this, and help me come up with a regex
> that captures 20 and 25, respectively, this would be greatly


place = string[/\d+(?=th)/]
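
A short sketch of that lookahead applied to the two strings from the question (the variable names are just for illustration):

```ruby
string1 = "He is the 20th."
string2 = "25th"

# (?=th) asserts that "th" follows, without making it part of the match.
place1 = string1[/\d+(?=th)/]
place2 = string2[/\d+(?=th)/]

p place1  # "20"
p place2  # "25"

# The lookahead consumes nothing, so "th." is still after the match.
m = string1.match(/\d+(?=th)/)
p m.post_match  # "th."
```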

-- Mark.
 
 
 
 
 
Thomas B.
 
      09-26-2008
Chris Ro wrote:
> place1 = string1.gsub(/.*(\d+)th.*/,'\1')


Hello. I think gsub is not the best approach here. It's simpler to find
the matching part with match and take just the captured group, instead of
rewriting the whole string:
place1 = string1.match(/(\d+)th\b/)[1]
The \b ensures that the character after 'th' is not a word character
(\b is a word boundary), and the [1] at the end extracts the first
capture group. This also lets you drop the .* at both ends, which is a
bit ugly.
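
One thing to watch with this form: match returns nil when nothing matches, so calling [1] on the result raises NoMethodError. A minimal sketch of a guarded version follows; the helper name place_from is made up for this example:

```ruby
# Hypothetical helper: returns the digits before "th", or nil if there are none.
def place_from(text)
  m = text.match(/(\d+)th\b/)
  m && m[1]   # only index into the MatchData when there was a match
end

p place_from("He is the 20th.")  # "20"
p place_from("25th")             # "25"
p place_from("20thing")          # nil, because \b rejects "th" followed by a word character
p place_from("no ordinal here")  # nil
```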

Apart from that, a useful piece of knowledge about regexps:
/.*?(\d+)th.*/ will match what you want, because the leading .*? is
reluctant: it consumes as few characters as possible, so \d+ gets as
many digits as it can.
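
For comparison, a quick sketch running the original gsub with the greedy and the reluctant prefix side by side (the variable names are only for illustration):

```ruby
string1 = "He is the 20th."
string2 = "25th"

# Greedy prefix: .* leaves only one digit for \d+.
greedy1 = string1.gsub(/.*(\d+)th.*/, '\1')

# Reluctant prefix: .*? stops as early as it can, so \d+ gets both digits.
lazy1 = string1.gsub(/.*?(\d+)th.*/, '\1')
lazy2 = string2.gsub(/.*?(\d+)th.*/, '\1')

p greedy1  # "0"
p lazy1    # "20"
p lazy2    # "25"
```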

TPR.

 
 
Robert Klemme
 
      09-26-2008
2008/9/26 Thomas B.:
> Chris Ro wrote:
>> place1 = string1.gsub(/.*(\d+)th.*/,'\1')

>
> Hello. I think your approach with using gsub is not the best possible
> here.


Agree.

> It's better to simply find the matching part using match and
> substitute it for the whole string, like this:
> place1 = string1.match(/(\d+)th\b/)[1]


For extraction there is a simpler solution:

irb(main):002:0> "He is the 20th."[/(\d+)th\b/, 1]
=> "20"
irb(main):003:0> "25th"[/(\d+)th\b/, 1]
=> "25"

> The \b ensures that the next character after 'th' is not a word
> character (\b is word boundary), and [1] at the end is extracting the
> first bracketed group. It also makes it possible to skip the .* at both
> ends, which is a bit ugly.


Right.

> Apart from that, a useful piece of knowledge about regexps:
> /.*?(\d+)th.*/ will match what you want, because the first .*? will be
> reluctant to eat up more characters, so it will pass to \d+ as many
> digits as it can.


But reluctant is slow (see my benchmark from a few days ago).
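
This is not the benchmark mentioned above, only a rough sketch of how one could measure the difference with the standard Benchmark module. The input string and the iteration count are made-up assumptions, and the timings will vary by machine and Ruby version:

```ruby
require 'benchmark'

# Hypothetical input: a long run of non-digits before the ordinal.
text = ("x" * 10_000) + " 20th."
n = 200

reluctant = /.*?(\d+)th.*/
direct    = /(\d+)th\b/

Benchmark.bm(10) do |bm|
  bm.report("reluctant") { n.times { text[reluctant, 1] } }
  bm.report("direct")    { n.times { text[direct, 1] } }
end
```

Both patterns return the same capture on this input; only the time spent getting there differs.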

Cheers

robert

--
use.inject do |as, often| as.you_can - without end

 
 
Thomas B.
 
      09-26-2008
Robert Klemme wrote:
>> It's better to simply find the matching part using match and
>> substitute it for the whole string, like this:
>> place1 = string1.match(/(\d+)th\b/)[1]

>
> For extraction there is a simpler solution
>
> irb(main):002:0> "He is the 20th."[/(\d+)th\b/, 1]
> => "20"
> irb(main):003:0> "25th"[/(\d+)th\b/, 1]
> => "25"


Yes, I forgot about this one. +1

>> Apart from that, a useful piece of knowledge about regexps:
>> /.*?(\d+)th.*/ will match what you want, because the first .*? will be
>> reluctant to eat up more characters, so it will pass to \d+ as many
>> digits as it can.

>
> But reluctant is slow (see my benchmark from a few days ago).


OK. I guess reluctant matching is slow especially when the string it has
to cover is long. I agree that it's not a good idea to use reluctant
regexps in time-critical applications, and the first solution is much
better here. I mentioned them just so the original poster would learn
about them. I use reluctant patterns when not in a hurry, because they
sometimes make things much easier.

TPR.


 
 
Patrick He
 
      09-26-2008

IMO, lookahead is the best solution for the problem.

Mark Thomas wrote:
> place = string[/\d+(?=th)/]

 
 
Nit Khair
 
      09-27-2008
If you need to get multiple numbers out you could try scan().

> d = "9,45, 567"
=> "9,45, 567"
> d.scan(/\d+/)
=> ["9", "45", "567"]
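
Combined with the patterns from earlier in the thread, scan can also pull out only the numbers that are followed by "th". A small sketch, with a sentence made up for this example:

```ruby
text = "She came 20th, he came 25th, out of 300 runners."

# With a lookahead, scan returns the matched digits directly.
with_lookahead = text.scan(/\d+(?=th)/)

# With a capture group, scan returns one array per match, so flatten it.
with_group = text.scan(/(\d+)th\b/).flatten

p with_lookahead  # ["20", "25"]
p with_group      # ["20", "25"]
```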


 