regular expression could do this?(newbie)

 
 
Alont
      09-21-2004
I want to match a block of text, but the block is very large (and
multi-line). Its first line is:
<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

and the end of the text block:
rel="external" target="new">Forum</a></li>
</ul>
</div>
</div>
</div>

So how can I match this block in an HTML file? (I have many HTML files
in which the block needs to be matched and then replaced with "<!-- #include
virtual="/Head.inc" -->".)

I have seen many examples, but I can't find one that does this.
--
Your fault as a Government is My failure as a citizen
 
 
wfsp
      09-21-2004

"Alont" <(E-Mail Removed)> wrote in message
news:41519803.43923109@130.133.1.4...
>I want to pattern a text block, but the text block very large(and
> multi-line), the first line should be:
> <html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>
> and the end of the text block:
> rel="external" target="new">Forum</a></li>
> </ul>
> </div>
> </div>
> </div>
>
> so, how I can pattern the text block in a html file(many html files
> waiting for pattern and then replace to"<!-- #include
> virtual="/Head.inc" -->")
>
> I have seen much examples, but can't find a example could do this
> --
> Your fault as a Government is My failure as a citizen


Using regexes on HTML is _very_ difficult, especially across "many" "very large"
files. My advice would be not to even consider it. There are many good
modules for parsing HTML (I use HTML::TokeParser), and I would urge you to have
a look at them. If you hit any snags, come back with what you have tried and
we'll see how we go from there.
Best of luck.
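
One way to see the difficulty described above is the following minimal sketch. It takes a pattern written for one exact spelling of a tag and tries it on other spellings that a browser would treat the same way. The pattern, the sample tags and the helper name count_matches are made up for illustration and are not taken from the original poster's files.

```perl
use strict;
use warnings;

# Hypothetical pattern, written for one exact spelling of the tag.
my $link_re = qr{<a\s+rel="external"\s+target="new">};

# Four spellings that a browser treats as the same link.
my @variants = (
    '<a rel="external" target="new">',    # the spelling the pattern expects
    '<a target="new" rel="external">',    # attributes in the other order
    q{<a rel='external' target='new'>},   # single-quoted attribute values
    '<A REL="external" TARGET="new">',    # upper-case tag and attribute names
);

# Hypothetical helper: how many candidate strings does the pattern match?
sub count_matches {
    my ($re, @candidates) = @_;
    return scalar grep { $_ =~ $re } @candidates;
}

printf "%d of %d variants matched\n",
    count_matches($link_re, @variants), scalar @variants;
```

Only the first spelling matches. A parser module reports all four as the same start tag with the same attributes, which is the main argument for using one when the files were not all generated by the same tool.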


 
 
Alont
      09-21-2004
"wfsp" <(E-Mail Removed)>Wrote at Tue, 21 Sep 2004 07:35:32 +0000
(UTC):
>Using regexs on HTML is _very_ difficult; especially "many" "very large"
>files. My advice would be to not even consider it. There are many good
>modules to parse HTML (I use HTML::Tokeparser) and I would urge you to have
>a look at them. If you hit any snags come back with what you have tried and
>we'll see how we go from there.
>Best of luck.
>


I'll try what you suggest, thank you.
--
Your fault as a Government is My failure as a citizen
 
 
Jim Keenan
      09-21-2004
Alont <(E-Mail Removed)> wrote in message news:<41519803.43923109@130.133.1.4>...
> I want to pattern a text block, but the text block very large(and
> multi-line), the first line should be:
> <html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>
> and the end of the text block:
> rel="external" target="new">Forum</a></li>
> </ul>
> </div>
> </div>
> </div>
>
> so, how I can pattern the text block in a html file(many html files
> waiting for pattern and then replace to"<!-- #include
> virtual="/Head.inc" -->")
>


The keys to building a regex like this are: (1) use the 's' modifier
so that '.' also matches '\n'; (2) use the 'x' modifier so that you can
include comments and whitespace within the pattern; (3)
build up the successful match incrementally. I built up the
successful match using the commented-out lines below beginning with
'if'.
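
As a small illustration of points (1) and (2), here is a sketch on a made-up three-line string (the variable names are only for this example). It shows that '.' does not cross a newline without 's', and that under 'x' the whitespace in the data has to be written explicitly as \s because literal whitespace in the pattern is ignored.

```perl
use strict;
use warnings;

my $text = "start\nmiddle\nend";

# Without 's', '.' stops at a newline, so the match cannot span lines.
my $without_s = $text =~ /start.*end/  ? 1 : 0;

# With 's', '.' also matches "\n".
my $with_s    = $text =~ /start.*end/s ? 1 : 0;

# With 'x', literal whitespace and '#' comments in the pattern are ignored,
# so whitespace in the data has to be written as \s (or similar).
my $with_sx = $text =~ m{
    start \s     # first line and the newline after it
    .*           # anything in between, newlines included
    end          # last line
}sx ? 1 : 0;

print "without s: $without_s, with s: $with_s, with sx: $with_sx\n";
```

This prints "without s: 0, with s: 1, with sx: 1".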

use strict;
use warnings;

my $str = '<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

text in the middle:
rel="external" target="new">Forum</a></li>
</ul>
</div>
</div>
</div>
';

print $str, "\n";

# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\s}
# if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\sTransitional\/\/EN"\n}
# failure
if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-//W3C//DTD\sXHTML\s1\.0\sTransitional//EN"\s
              .*\s
              rel="external"\starget="new">Forum</a></li>\s*
              </ul>\s*
              </div>\s*
              </div>\s*
              </div>\s
             }   # end of pattern to be matched
             {<!-- #include virtual="/Head.inc" -->}sx   # text to be substituted
             # modifiers: 's' lets '.' match "\n"; 'x' ignores whitespace and
             # comments inside the pattern
   )   # end of 'if' condition
{
    print "Success! String is now:\n";
    print "$str\n";
} else {
    print "Failure\n";
}
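
The original question mentions many HTML files. The following is a minimal sketch of how the same substitution might be applied to each file named on the command line. The helper names replace_head_block and process_file are hypothetical, the pattern uses a non-greedy '.*?' and '\s+' between words as a variation on the one above, and the files are overwritten in place without a backup, so it is best tried on copies first.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the header block in one HTML string.
# Returns the new text and the number of substitutions made (0 or 1).
sub replace_head_block {
    my ($html) = @_;
    my $count = $html =~ s{
        <html><!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD\s+XHTML\s+1\.0\s+Transitional//EN"
        .*?                                   # shortest run up to the end marker
        rel="external"\s+target="new">Forum</a></li>\s*
        </ul>\s*
        </div>\s*
        </div>\s*
        </div>\s*
    }{<!-- #include virtual="/Head.inc" -->\n}sx;
    return ($html, $count || 0);
}

# Hypothetical driver: rewrite one file in place if the block is found.
sub process_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $html = do { local $/; <$in> };        # slurp the whole file
    close $in;

    my ($new, $count) = replace_head_block($html);
    return 0 unless $count;                   # leave unmatched files untouched

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $new;
    close $out or die "Cannot close $path: $!";
    return 1;
}

for my $path (@ARGV) {
    my $status = process_file($path) ? 'replaced' : 'no match';
    print "$status: $path\n";
}
```

Saved under a name of your choosing, it could be run as "perl yourscript.pl *.html". Files that do not contain the block are reported as "no match" and left unchanged.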
 