Short question on regex in Ruby
Hi,
I have a little problem with a regex in Ruby. I have two strings:

string1 = "He is the 20th."
string2 = "25th"

I wrote this to "extract" the place (20 or 25 respectively):

place1 = string1.gsub(/.*(\d+)th.*/,'\1')
place2 = string2.gsub(/.*(\d+)th.*/,'\1')
pp place1
pp place2

=> "0"
=> "5"

Of course, I would like to get all the digits before "th". Why is only the last one captured? If anyone could please explain this, and help me come up with a regex that captures 20 and 25, respectively, this would be greatly appreciated.

Cheers, Chris
--
Posted via http://www.ruby-forum.com/.
Re: Short question on regex in Ruby
On Sep 26, 10:02 am, Chris Ro <kyl...@gmx.net> wrote:
> Hi,
>
> I have a little problem with a regex in Ruby:
>
> I have two strings:
>
> string1 = "He is the 20th."
> string2 = "25th"
>
> I wrote this to "extract" the place (20 or 25 respectively):
>
> place1 = string1.gsub(/.*(\d+)th.*/,'\1')
> place2 = string2.gsub(/.*(\d+)th.*/,'\1')
> pp place1
> pp place2
>
> => "0"
> => "5"
>
> Of course, I would like to get all the digits before "th". Why is only
> the last one captured?

Because the .* is greedy: it takes as much as it can while still letting the rest of the pattern match, which leaves only the last digit for \d+.

> If anyone could please explain this, and help me come up with a regex
> that captures 20 and 25, respectively, this would be greatly

place = string[/\d+(?=th)/]

-- Mark.
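To make the greedy behaviour visible, here is a small sketch that wraps the leading .* in a capture group of its own. The extra group and the variable names are my additions for illustration, not part of the original pattern.

```ruby
# Sketch: capture what the greedy .* consumed to see why (\d+) gets one digit.
string1 = "He is the 20th."
string2 = "25th"

# Same pattern as in the question, with an extra group around the leading .*
probe = /(.*)(\d+)th.*/

m1 = probe.match(string1)
m2 = probe.match(string2)

# .* first takes the whole string, then gives characters back one at a time
# until \d+ can match at least one digit followed by "th".
greedy_part1 = m1[1]   # everything up to and including the "2"
digits1      = m1[2]   # only the last digit
greedy_part2 = m2[1]
digits2      = m2[2]

# The lookahead version has no leading .* competing with \d+ for the digits.
place1 = string1[/\d+(?=th)/]
place2 = string2[/\d+(?=th)/]

p [greedy_part1, digits1]  # ["He is the 2", "0"]
p [greedy_part2, digits2]  # ["2", "5"]
p [place1, place2]         # ["20", "25"]
```

The first group shows that .* stopped giving characters back as soon as a single digit was enough for \d+.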
Re: Short question on regex in Ruby
Chris Ro wrote:
> place1 = string1.gsub(/.*(\d+)th.*/,'\1')

Hello. I don't think gsub is the best approach here. It's better to simply find the matching part with match and use it in place of the whole string, like this:

place1 = string1.match(/(\d+)th\b/)[1]

The \b ensures that the next character after 'th' is not a word character (\b is a word boundary), and the [1] at the end extracts the first bracketed group. It also makes it possible to skip the .* at both ends, which is a bit ugly.

Apart from that, a useful piece of knowledge about regexps: /.*?(\d+)th.*/ will match what you want, because the first .*? is reluctant to eat up more characters than it has to, so it leaves as many digits as possible for \d+.

TPR.
--
Posted via http://www.ruby-forum.com/.
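A short sketch of both suggestions side by side. The non-matching sample strings and the use of safe navigation (&., Ruby 2.3 and later) are my own assumptions, added because match returns nil when nothing matches and nil[1] would raise NoMethodError.

```ruby
string1 = "He is the 20th."
pattern = /(\d+)th\b/

# Direct extraction with match and the first capture group.
place1 = string1.match(pattern)[1]

# No ordinal at all: match returns nil, so use &. to get nil instead of
# a NoMethodError from nil[1].
no_match = "He is first.".match(pattern)&.[](1)

# \b rejects a "th" that is followed by another word character.
not_ordinal = "20thing".match(pattern)&.[](1)

# The reluctant variant keeps the original gsub approach working.
lazy1 = string1.gsub(/.*?(\d+)th.*/, '\1')
lazy2 = "25th".gsub(/.*?(\d+)th.*/, '\1')

p place1       # "20"
p no_match     # nil
p not_ordinal  # nil
p [lazy1, lazy2]  # ["20", "25"]
```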
Re: Short question on regex in Ruby
2008/9/26 Thomas B. <tpreal@gmail.com>:
> Chris Ro wrote:
>> place1 = string1.gsub(/.*(\d+)th.*/,'\1')
>
> Hello. I think your approach with using gsub is not the best possible
> here.

Agreed.

> It's better to simply find the matching part using match and
> substitute it for the whole string, like this:
> place1 = string1.match(/(\d+)th\b/)[1]

For extraction there is a simpler solution:

irb(main):002:0> "He is the 20th."[/(\d+)th\b/, 1]
=> "20"
irb(main):003:0> "25th"[/(\d+)th\b/, 1]
=> "25"

> The \b ensures that the next character after 'th' is not a word
> character (\b is word boundary), and [1] at the end is extracting the
> first bracketed group. It also makes it possible to skip the .* at both
> ends, which is a bit ugly.

Right.

> Apart from that, a useful piece of knowledge about regexps:
> /.*?(\d+)th.*/ will match what you want, because the first .*? will be
> reluctant to eat up more characters, so it will pass to \d+ as many
> digits as it can.

But reluctant is slow (see my benchmark from a few days ago).

Cheers

robert

--
use.inject do |as, often| as.you_can - without end
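One practical point about the String#[] form is that it returns nil when nothing matches, so it is easy to wrap. A minimal sketch follows; the helper name place_of and the Integer conversion are hypothetical additions of mine.

```ruby
# Hypothetical helper: returns the place as an Integer, or nil when the
# text contains no "<digits>th" ordinal.
def place_of(text)
  digits = text[/(\d+)th\b/, 1]   # nil when the pattern does not match
  digits && digits.to_i
end

p place_of("He is the 20th.")   # 20
p place_of("25th")              # 25
p place_of("no ordinal here")   # nil
```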
Re: Short question on regex in Ruby
Robert Klemme wrote:
>> It's better to simply find the matching part using match and
>> substitute it for the whole string, like this:
>> place1 = string1.match(/(\d+)th\b/)[1]
>
> For extraction there is a simpler solution
>
> irb(main):002:0> "He is the 20th."[/(\d+)th\b/, 1]
> => "20"
> irb(main):003:0> "25th"[/(\d+)th\b/, 1]
> => "25"

Yes, I forgot about this one. +1

>> Apart from that, a useful piece of knowledge about regexps:
>> /.*?(\d+)th.*/ will match what you want, because the first .*? will be
>> reluctant to eat up more characters, so it will pass to \d+ as many
>> digits as it can.
>
> But reluctant is slow (see my benchmark from a few days ago).

OK. I guess reluctant is slow especially when the string that it has to cover is long. And I agree that it's not a very good idea to use reluctant regexps in time-critical applications, and the first solution is much better here. I mentioned them just to let the original poster gain some knowledge. I use reluctant patterns when I'm not in a hurry, because they sometimes make things much easier.

TPR.
--
Posted via http://www.ruby-forum.com/.
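For anyone who wants to measure this on their own machine, here is a rough timing sketch. It is not Robert's benchmark; the test string, the run count, and the time_it helper are all assumptions of mine, and the numbers will vary by Ruby version and input.

```ruby
# Hypothetical helper: wall-clock seconds taken to run the block `runs` times.
def time_it(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Assumed test data: a long run of non-digit text before the ordinal.
long_string = ("x" * 10_000) + " 20th."
runs = 200

direct = /(\d+)th\b/       # no leading wildcard
lazy   = /.*?(\d+)th.*/    # reluctant wildcard in front

# Both patterns should extract the same digits.
direct_result = long_string[direct, 1]
lazy_result   = long_string[lazy, 1]

direct_time = time_it(runs) { long_string[direct, 1] }
lazy_time   = time_it(runs) { long_string[lazy, 1] }

puts format("direct: %.4fs  lazy: %.4fs", direct_time, lazy_time)
```

Lengthening the run of "x" characters is one way to check the guess that the reluctant pattern suffers more as the covered text grows.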
Re: Short question on regex in Ruby
[Note: parts of this message were removed to make it a legal post.]
IMO, lookahead is the best solution for the problem.

Mark Thomas wrote:
> On Sep 26, 10:02 am, Chris Ro <kyl...@gmx.net> wrote:
>
>> Hi,
>>
>> I have a little problem with a regex in Ruby:
>>
>> I have two strings:
>>
>> string1 = "He is the 20th."
>> string2 = "25th"
>>
>> I wrote this to "extract" the place (20 or 25 respectively):
>>
>> place1 = string1.gsub(/.*(\d+)th.*/,'\1')
>> place2 = string2.gsub(/.*(\d+)th.*/,'\1')
>> pp place1
>> pp place2
>>
>> => "0"
>> => "5"
>>
>> Of course, I would like to get all the digits before "th". Why is only
>> the last one captured?
>>
>
> Because the .* is greedy and will get all it can, which is all but the
> last digit.
>
>
>> If anyone could please explain this, and help me come up with a regex
>> that captures 20 and 25, respectively, this would be greatly
>>
>
> place = string[/\d+(?=th)/]
>
> -- Mark.
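One thing to keep in mind with the plain lookahead: it also accepts digits followed by any word that merely starts with "th". A hedged sketch of combining it with the \b suggested elsewhere in this thread; the sample string "20thing" and the names loose and strict are my own.

```ruby
loose  = /\d+(?=th)/     # the lookahead as posted
strict = /\d+(?=th\b)/   # same idea, but "th" must end at a word boundary

loose_hit   = "20thing"[loose]           # matches, although "20thing" is no ordinal
strict_miss = "20thing"[strict]          # nil, because "th" is followed by "i"
strict_hit  = "He is the 20th."[strict]  # still finds the real ordinal

p loose_hit    # "20"
p strict_miss  # nil
p strict_hit   # "20"
```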
Re: Short question on regex in Ruby
If you need to get multiple numbers out you could try scan().
> d="9,45, 567"
=> "9,45, 567"
> d.scan(/\d+/)
=> ["9", "45", "567"]

--
Posted via http://www.ruby-forum.com/.
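scan can also be combined with the patterns from earlier in the thread to pull out only the numbers that are followed by "th". A small sketch, with a sample sentence that I made up:

```ruby
text = "1st, 2nd, 20th and 25th"   # assumed sample input

# With a lookahead there are no groups, so scan returns the matches directly.
with_lookahead = text.scan(/\d+(?=th\b)/)

# With a capture group, scan returns one array per match; flatten tidies that up.
with_group = text.scan(/(\d+)th\b/).flatten

# Plain \d+ picks up every number, ordinal suffix or not.
all_numbers = text.scan(/\d+/)

p with_lookahead  # ["20", "25"]
p with_group      # ["20", "25"]
p all_numbers     # ["1", "2", "20", "25"]
```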