Velocity Reviews - Computer Hardware Reviews

regular expression pb. with tags

 
 
steeve_dun@SoftHome.net
      09-26-2006
Hi,
I want to do a pattern replacement, i.e. delete everything
that's between two tags.
For example for

1<tag> 2</tag>3
x<tag> a<tag> b </tag> c</tag>z

I want to get

1 3
x z

But I have a problem with embedded (nested) tags.
I've tried:
$text =~ s/\<tag\>(.*?)\<\/tag\>//sg;
but it doesn't work for nested tags. It gives:
13
x c</tag>z

Is there a way to deal with this?

Thank you

-steeve

 
David Squire
      09-26-2006

Yep. Don't try to use regular expressions to parse XML. Use a module
that understands XML. Go to CPAN and you will find many.


DS

 
anno4000@radom.zrz.tu-berlin.de
      09-26-2006

Not using regular expressions directly. Use one of the HTML-parsing
modules from CPAN.
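
To illustrate what "not using regular expressions directly" can mean, here is a minimal sketch that uses a regex only to split the text into tags and non-tags, and a counter to track nesting depth. It is not one of the CPAN modules mentioned above; strip_nested is a hypothetical helper name, and the sketch assumes the tags are written exactly as <tag> and </tag> with no attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything inside <tag>...</tag>, nested or not.
sub strip_nested {
    my ($text) = @_;
    my ($out, $depth) = ('', 0);

    # The capture group makes split return the tags as well as the text
    # between them.
    for my $piece (split /(<\/?tag>)/, $text) {
        if    ($piece eq '<tag>')  { $depth++ }
        elsif ($piece eq '</tag>') { $depth-- if $depth > 0 }  # stray closers are dropped
        elsif ($depth == 0)        { $out .= $piece }          # keep text outside all tags
    }
    return $out;
}

print strip_nested("1<tag> 2</tag>3"), "\n";                  # prints 13
print strip_nested("x<tag> a<tag> b </tag> c</tag>z"), "\n";  # prints xz
```

Note that an unclosed <tag> makes this sketch discard everything after it. A real parser would report that as an error instead.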

Anno
 
Xicheng Jia
      09-26-2006

Since you are using Perl, and XML is quite well formatted, you may try
something like:

my $ptn;
$ptn = qr(<tag>(?:(??{$ptn})|.)*?</tag>)s;
$line =~ s/$ptn//g;

I am not encouraging you to use regexes for this at work. But for some
small programs, using regexes might be much faster/easier if you know
what you are doing.
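
As a sketch of how a self-referencing pattern of this kind behaves on the inputs from the original question: the (??{ ... }) construct inserts $ptn into itself at match time, so a nested <tag>...</tag> is consumed as one unit before the closing tag is looked for. The wrapper name strip_recursive is only for illustration.

```perl
use strict;
use warnings;

# $ptn is declared before it is assigned because the pattern refers to itself.
my $ptn;
$ptn = qr{<tag>(?:(??{$ptn})|.)*?</tag>}s;

# Hypothetical wrapper: remove every balanced <tag>...</tag> group.
sub strip_recursive {
    my ($line) = @_;
    $line =~ s/$ptn//g;
    return $line;
}

print strip_recursive("1<tag> 2</tag>3"), "\n";                  # prints 13
print strip_recursive("x<tag> a<tag> b </tag> c</tag>z"), "\n";  # prints xz
```

On input whose tags do not balance, the regex engine backtracks and may remove a shorter span than intended, which is one reason to prefer a parser for anything beyond small scripts.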

Regards,
Xicheng

 
Ted Zlatanov
      09-26-2006

For the first example, you're getting exactly what you asked for ("13").
Look at your input data: the only space is inside the tags, so it is
deleted too.

For the second example, your requirements are unclear. You don't say
whether you want to replace the outermost tags (in which case a regex
would work) or you want to balance tags. For outermost tag
replacement, use

$text =~ s/\<tag\>(.*)\<\/tag\>//sg;

but note that this will also replace "<tag>a</tag> extra <tag>b</tag>"
with "" and not " extra " as you may expect.
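
The difference between the two quantifiers can be checked with a short sketch. The helper names strip_lazy and strip_greedy are made up for this comparison, and the unnecessary backslashes before < and > are omitted.

```perl
use strict;
use warnings;

# Non-greedy: stops at the first </tag> after each <tag>.
sub strip_lazy   { my ($t) = @_; $t =~ s/<tag>.*?<\/tag>//sg; return $t }

# Greedy: runs from the first <tag> to the last </tag> in the string.
sub strip_greedy { my ($t) = @_; $t =~ s/<tag>.*<\/tag>//sg;  return $t }

my $nested   = "x<tag> a<tag> b </tag> c</tag>z";
my $siblings = "<tag>a</tag> extra <tag>b</tag>";

print "[", strip_lazy($nested),     "]\n";  # prints [x c</tag>z]
print "[", strip_greedy($nested),   "]\n";  # prints [xz]
print "[", strip_lazy($siblings),   "]\n";  # prints [ extra ]
print "[", strip_greedy($siblings), "]\n";  # prints []
```

Neither form handles both inputs correctly, which is why balancing the tags needs something more than a plain quantifier.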

My guess is that you do want to balance tags, and you can use
Text::Balanced for that (especially if your text is not valid XML or
even SGML). If you are doing SGML/HTML/XML/etc. tagged formats then
you should search CPAN for the appropriate parser, as others have
suggested. Look at "perldoc -q html" as well.
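
A rough sketch of the Text::Balanced approach, using its extract_tagged function in list context, where it returns the extracted text, the remainder and the skipped prefix. The loop, the helper name strip_balanced and the prefix pattern '[^<]*' are assumptions made for this example. The prefix pattern skips plain text up to the next "<", so it only suits input where every "<" starts a <tag>.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical helper: remove balanced <tag>...</tag> groups one at a time.
sub strip_balanced {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # List context: (extracted, remainder, skipped prefix, ...).
        my ($extracted, $remainder, $prefix) =
            extract_tagged($text, '<tag>', '</tag>', '[^<]*');
        last unless defined $extracted && length $extracted;
        $out .= $prefix;      # keep the text that came before the tag group
        $text = $remainder;   # continue after the closing tag
    }
    return $out . $text;      # whatever is left has no balanced group
}

print strip_balanced("1<tag> 2</tag>3"), "\n";
print strip_balanced("x<tag> a<tag> b </tag> c</tag>z"), "\n";
print strip_balanced("<tag>a</tag> extra <tag>b</tag>"), "\n";
```

If extraction fails, for example because a tag is never closed, the loop stops and the remaining text is returned unchanged.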

Ted
 
steeve_dun@SoftHome.net
      09-27-2006
Thank you all
-steve

 