Velocity Reviews - Computer Hardware Reviews


regexp -how to match this?

 
 
Nanyang Zhan
 
      04-09-2007
What kind of pattern will match the part of a sentence before a <span>
tag?

For instance, for this sentence:

This forum is connected to a mailing list that is read by <span
class="wow">thousands</span> of people.

it should match:
This forum is connected to a mailing list that is read by

--
Posted via http://www.ruby-forum.com/.

 
Rick DeNatale
 
      04-09-2007
On 4/9/07, Nanyang Zhan <(E-Mail Removed)> wrote:
> what kind of pattern will match the part of sentence before a <span>
> tag?
>
> for instance:
> for this sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> it'll match:
> This forum is connected to a mailing list that is read by


/^.*?(?=<span)/

This is a little loose since it treats anything starting with "<span"
as a span tag.
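To illustrate the looseness: a tag such as <spanner> also satisfies the lookahead. A minimal sketch, using a made-up string, of one way to tighten it with a word boundary (\b), assuming the tag name is always followed by whitespace, ">" or a newline:

```ruby
text = 'use a <spanner> then a <span class="x">y</span>'

# Original pattern: anything starting with "<span" satisfies the lookahead.
loose = text[/^.*?(?=<span)/]

# \b demands a word boundary right after "span", so "<spanner" no longer qualifies.
tight = text[/^.*?(?=<span\b)/]

p loose # => "use a "
p tight # => "use a <spanner> then a "
```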

Breaking it down:

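^ - start of a line (in Ruby, ^ also matches right after every newline; use \A to anchor at the very start of the string)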

.*? - 0 or more characters (any character except a newline), non-greedy;
otherwise this would match everything up to the LAST "<span" in the
string instead of the first, which is what I suspect you really want.

(?=<span) - This is a zero-length lookahead: "<span" must occur just
after what has been matched, but it will not be part of the match
itself.
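A minimal sketch of the non-greedy and anchoring points above, using made-up strings (\A is shown alongside ^ to make the line/string difference visible):

```ruby
text = 'a <span>1</span> b <span>2</span> c'

# Non-greedy: stops at the FIRST "<span".
lazy = text[/\A.*?(?=<span)/]   # => "a "
# Greedy: runs to the end, then backtracks to the LAST "<span".
greedy = text[/\A.*(?=<span)/]  # => "a <span>1</span> b "

multi = "first line\nsecond <span>x</span>"
# ^ matches at the start of ANY line, so the match is found on line 2.
line_start = multi[/^.*?(?=<span)/]    # => "second "
# \A matches only at the very start, and "." cannot cross the "\n".
string_start = multi[/\A.*?(?=<span)/] # => nil
```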

HTH

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

 
Robert Klemme
 
      04-09-2007
On 09.04.2007 15:28, Nanyang Zhan wrote:
> what kind of pattern will match the part of sentence before a <span>
> tag?
>
> for instance:
> for this sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> it'll match:
> This forum is connected to a mailing list that is read by


One way to do it:

irb(main):022:0* s='This forum is connected to a mailing list that is read by <span
irb(main):023:0' class="wow">thousands</span> of people.'
=> "This forum is connected to a mailing list that is read by <span\nclass=\"wow\">thousands</span> of people."
irb(main):024:0> s[/\A(.*?)<span/, 1]
=> "This forum is connected to a mailing list that is read by "
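One caveat with this approach: in Ruby, "." does not match a newline unless the /m flag is given, so the \A form returns nil when the text before "<span" runs over more than one line. A small sketch with a made-up string:

```ruby
s = "read by\nmany <span>x</span>"

# Without /m, ".*?" cannot get past the "\n", and \A allows no later start.
without_m = s[/\A(.*?)<span/, 1]   # => nil
# With /m, "." also matches "\n".
with_m = s[/\A(.*?)<span/m, 1]     # => "read by\nmany "
```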

robert
 
Nanyang Zhan
 
      04-09-2007
Rick Denatale wrote:

> /^.*?(?=<span)/


Thanks.

BTW, what is “Duck Typing”?

--
Posted via http://www.ruby-forum.com/.

 
Rick DeNatale
 
      04-09-2007
On 4/9/07, Nanyang Zhan <(E-Mail Removed)> wrote:
> Rick Denatale wrote:
>
> > /^.*?(?=<span)/

>
> thanks.
>
> BTW, what is "Duck Typing"?


Well, here's some of what *I've* written on the subject:
http://talklikeaduck.denhaven2.com/articles/tag/ducks

I'd suggest looking at them starting with the oldest one (they are in
reverse chronological order).

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

 
John Joyce
 
      04-09-2007
It comes from the old saying,
"If it walks like a duck and talks like a duck, then it is a duck."
It means treating something as a duck if it behaves like a duck.
It's part of the principle of least surprise [to Matz].
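A minimal Ruby sketch of the idea; the method name append_greeting is hypothetical. The method never checks the class of its argument; anything that responds to << is accepted:

```ruby
require 'stringio'

# No class check: any object that responds to << will do.
def append_greeting(target)
  target << "hello"
  target
end

str = append_greeting(String.new)    # a String
arr = append_greeting([])            # an Array
io  = append_greeting(StringIO.new)  # a StringIO

p str       # => "hello"
p arr       # => ["hello"]
p io.string # => "hello"
```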
On Apr 9, 2007, at 11:37 PM, Nanyang Zhan wrote:

> Rick Denatale wrote:
>
>> /^.*?(?=<span)/

>
> thanks.
>
> BTW, what is “Duck Typing”?
>
> --
> Posted via http://www.ruby-forum.com/.
>



 
Phillip Gawlowski
 
      04-09-2007
Nanyang Zhan wrote:

> BTW, what is “Duck Typing”?


PickAxe 2nd Edition (and probably the freely available 1st Edition) has
a nice, interesting and very readable chapter covering that.

In a nutshell: what the others have already said.

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #6:

The user is always right unless proven otherwise by the developer.

 
Nanyang Zhan
 
      04-10-2007
Rick Denatale wrote:

> (?=<span) - This is a zero-length lookahead, this means that "<span"
> must occur just after what has been matched, but it will not be part
> of the match itself.


So (?=...) makes the pattern look AHEAD. How do I make it look BEHIND?

For instance:

Example sentence:
This forum is connected to a mailing list that is read by <span
class="wow">thousands</span> of people.

Question:
How do I make a Regexp that matches the words following the </span> tag?

/<\/span>.*/ includes the tag itself, which isn't what I want.

--
Posted via http://www.ruby-forum.com/.

 
Phrogz
 
      04-10-2007
On Apr 10, 7:00 am, Nanyang Zhan <(E-Mail Removed)> wrote:
> so ?= makes pattern lookAHEAD. How to make pattern lookBEHIND?


http://phrogz.net/ProgrammingRuby/la...tml#extensions

Zero-width positive and negative lookaheads are supported in Ruby's
regexp engine in 1.8. Zero-width lookbehind assertions are not
supported by the current regexp engine. (However, they are supported
by Oniguruma, the regexp engine used in 1.9 and future builds of
Ruby.)
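For Ruby 1.9 and later, a minimal sketch of the lookbehind syntax, (?<=...), applied to the example sentence; it will not run on the 1.8 engine described above:

```ruby
str = 'is read by <span class="wow">thousands</span> of people.'

# (?<=...) is a zero-width lookbehind: "</span>" must come right before
# the match but is not part of it.
after_span = str[/(?<=<\/span>).*/]        # => " of people."

# Lookbehind plus lookahead pulls out just the text inside the span.
inside = str[/(?<=>)[^<]+(?=<\/span>)/]    # => "thousands"
```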

> example sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> question:
> how to make a Regexp to match the words followed by the </span> tag?


Just because you consume them doesn't mean you have to use them. Use
parentheses to save the parts of the text matched by your regular
expression.

irb(main):001:0> str = 'is read by <span class="wow">thousands</span> of people.'
=> "is read by <span class=\"wow\">thousands</span> of people."

irb(main):002:0> str[ /<\/span>(.+)/, 1 ]
=> " of people."

irb(main):003:0> %r{</span>(.+)}.match( str ).to_a
=> ["</span> of people.", " of people."]
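On Ruby 1.9 and later, the same capture can also be given a name instead of being referred to by its position; a small sketch (the group name "rest" is arbitrary):

```ruby
str = 'is read by <span class="wow">thousands</span> of people.'

# Named group (Ruby 1.9+): refer to the capture by name, not by number.
m = str.match(/<\/span>(?<rest>.+)/)
whole = m[0]        # => "</span> of people."
rest  = m[:rest]    # => " of people."

# String#[] also accepts the group name as its second argument.
rest2 = str[/<\/span>(?<rest>.+)/, "rest"]  # => " of people."
```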



 
Nanyang Zhan
 
      04-10-2007
Gavin Kistner wrote:
> Just because you consume them doesn't mean you have to use them. Use
> parentheses to saved parts of text extracted by your regular
> expression.


I'm trying to write one method (with a single regexp as input) that
extracts any part of a given string.

But now it seems very hard for one fixed method to do that job.
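One possible convention, sketched under assumptions: a hypothetical extract helper that returns the first capture group when the pattern has one, and the whole match otherwise. Capture groups cover the 1.8 engine; lookarounds let the whole match be the wanted part on 1.9 and later.

```ruby
# Hypothetical helper: first capture group if the pattern has one,
# otherwise the whole match; nil when nothing matches.
def extract(str, re)
  m = re.match(str)
  return nil unless m
  m.captures.empty? ? m[0] : m[1]
end

str = 'is read by <span class="wow">thousands</span> of people.'

before = extract(str, /\A.*?(?=<span)/)            # => "is read by "
after  = extract(str, /<\/span>(.+)/)              # => " of people."
inner  = extract(str, /(?<=>)[^<]+(?=<\/span>)/)   # => "thousands" (Ruby 1.9+)
```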

--
Posted via http://www.ruby-forum.com/.

 


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
[regexp] How to convert string "/regexp/i" to /regexp/i - ? | Joao Silva | Ruby | 16 | 08-21-2009 05:52 PM
String#match vs. Regexp#match - confused | Old Echo | Ruby | 1 | 09-04-2008 06:11 PM
Ruby 1.9 - ArgumentError: incompatible encoding regexp match(US-ASCII regexp with ISO-2022-JP string) | Mikel Lindsaar | Ruby | 0 | 03-31-2008 10:27 AM
RegExp.exec() returns null when there is a match - a JavaScript RegExp bug? | Uldis Bojars | Javascript | 2 | 12-17-2006 09:59 PM
Java regex can't match lengthy match? | hiwa | Java | 0 | 01-29-2004 10:09 AM


