pattern replacement in xml

 
 
tom
 
      06-21-2005
Just picked up perl to do some emergency task. Hope some expert can
help here.

I'm using perl to cleanse an xml file so it can be parsed. One problem
is to replace strings like this
<font color=669966>:
with:
&lt;font color=669966&rt;

The code is:
$templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;

The problem is that any time the color value changes, I need another
replacement. Is there a pattern that finds this kind of string, e.g.
<font ....>, and replaces it with &lt;font ....&gt;?

Thanks for the help.

 
 
 
 
 
A. Sinan Unur
 
      06-21-2005
"tom" <(E-Mail Removed)> wrote in news:1119392886.167881.82760
@g47g2000cwa.googlegroups.com:

> Just picked up perl to do some emergency task. Hope some expert can
> help here.
>
> I'm using perl to cleanse an xml file so it can be parsed. One problem
> is to replace strings like this
> <font color=669966>:
> with:
> &lt;font color=669966&rt;
>
> The code is:
> $templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;
>
> The problem is anytime the color value changes, I need to do another
> replacement. Can there be a pattern to find this kind of strings. eg
> <font ....> and replace them with &lt;font ....&gt;


You probably should be using

<URL:http://search.cpan.org/~gaas/HTML-Parser-3.45/lib/HTML/Entities.pm>

along with an appropriate XML parser from CPAN.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Entities;

print encode_entities(q{<font color=669966>})."\n";

__END__



--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
 
 
 
Bob Walton
 
      06-21-2005
tom wrote:

> Just picked up perl to do some emergency task. Hope some expert can
> help here.
>
> I'm using perl to cleanse an xml file so it can be parsed. One problem
> is to replace strings like this
> <font color=669966>:
> with:
> &lt;font color=669966&rt;
>
> The code is:
> $templine =~ s/<font color=669966>/&lt;font color=669966&gt;/g;
>
> The problem is anytime the color value changes, I need to do another
> replacement. Can there be a pattern to find this kind of strings. eg
> <font ....> and replace them with &lt;font ....&gt;


Sure. Try:

$templine=~s/<(font.*?)>/&lt;$1&gt;/gi;
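
As a rough illustration of how a substitution like this behaves, the sketch below runs a variant of it over a few made-up lines. The helper name escape_font_tags and the sample strings are assumptions for demonstration only. It uses [^>]* as a slightly stricter stand-in for .*? and also escapes closing </font> tags, which the one-liner above leaves alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape opening and closing <font> tags so that an
# XML parser sees them as plain text. All other tags are left untouched.
sub escape_font_tags {
    my ($line) = @_;
    # \b keeps the pattern from matching longer names such as <fontsize>;
    # [^>]* stops at the first '>' after the tag name.
    $line =~ s{<(/?font\b[^>]*)>}{&lt;$1&gt;}gi;
    return $line;
}

# Made-up sample input, not taken from the original file.
my @samples = (
    '<font color=669966>green</font>',
    '<FONT color=ff0000 size=2>red</FONT>',
    '<item>untouched</item>',
);

print escape_font_tags($_), "\n" for @samples;
```

Whether the closing tags also need escaping depends on what the downstream parser expects, so treat that part as optional.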

....
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
 
 
tom
 
      06-21-2005
Thanks a lot. This works exactly as I wanted.

 
 
John Bokma
 
      06-21-2005
"tom" <(E-Mail Removed)> wrote:

> Just picked up perl to do some emergency task. Hope some expert can
> help here.
>
> I'm using perl to cleanse an xml file so it can be parsed. One problem
> is to replace strings like this
> <font color=669966>:
> with:
> &lt;font color=669966&rt;

                        ^^^
That should be &gt;. Also, as far as I know, > does not have to be escaped in XML.

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
 
A. Sinan Unur
 
      06-21-2005
John Bokma <(E-Mail Removed)> wrote in
news:Xns967CB7085D416castleamber@130.133.1.4:

> "tom" <(E-Mail Removed)> wrote:
>
>> Just picked up perl to do some emergency task. Hope some expert can
>> help here.
>>
>> I'm using perl to cleanse an xml file so it can be parsed. One
>> problem is to replace strings like this
>> <font color=669966>:
>> with:
>> &lt;font color=669966&rt;

> ^^^
> should be &gt; Also, the > doesn't have to be escaped in XML afaik.


This is somewhat off-topic but I think what the OP had in mind was
something like:

<custom-tag>
<font color="white">Bad HTML</font>
</custom-tag>

where he does not want the text between <custom-tag>...</custom-tag> to
be interpreted as XML.

AFAIK, and that's not saying much, in that case, one needs to use:

<custom-tag>
<![CDATA[<font color="white">Bad HTML</font>]]>
</custom-tag>

rather than encoding the < and > inside <custom-tag>...</custom-tag>.
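
A minimal sketch of how that wrapping might be done in Perl, assuming the text to protect is already in a string. The helper name cdata_wrap is hypothetical. The one subtlety is that a literal ]]> inside the text would end the section early, so the sketch splits it across two adjacent CDATA sections.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap arbitrary text in a CDATA section.
# Any "]]>" inside the text is split so that it cannot close the
# section prematurely.
sub cdata_wrap {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print "<custom-tag>\n",
      cdata_wrap('<font color="white">Bad HTML</font>'), "\n",
      "</custom-tag>\n";
```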

I am drifting off-topic, so I will shut up now.

Sinan
--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
John Bokma
 
      06-22-2005
"A. Sinan Unur" <(E-Mail Removed)> wrote:

> John Bokma <(E-Mail Removed)> wrote in
> news:Xns967CB7085D416castleamber@130.133.1.4:


[...]

>>> &lt;font color=669966&rt;

>> ^^^
>> should be &gt; Also, the > doesn't have to be escaped in XML afaik.

>
> This is somewhat off-topic but I think what the OP had in mind was
> something like:
>
> <custom-tag>
> <font color="white">Bad HTML</font>
> </custom-tag>
>
> where he does not want the text between <custom-tag>...</custom-tag>
> to be interpreted as XML.
>
> AFAIK, and that's not saying much, in that case, one needs to use:
>
> <custom-tag>
> <![CDATA[<font color="white">Bad HTML</font>]]>
> </custom-tag>
>
> rather than encoding the < and > inside <custom-tag>...</custom-tag>.


Both work. Yours is probably neater, but it also adds a lot of overhead.
I personally would drop the font element entirely or, if I had to keep it,
make it valid XML (color="#669966" would be sufficient, plus a DTD update)
and "ignore" it in the processing stage.

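Both options could be sketched roughly as follows. The helper names strip_font_tags and quote_font_color are hypothetical, and the second one assumes the color is always an unquoted six-digit hex value, as in the OP's example. Quoting the attribute only makes the tag well-formed; the DTD would still need updating before the document validates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 1 (hypothetical helper): remove <font ...> and </font> tags,
# keeping the text they enclose.
sub strip_font_tags {
    my ($line) = @_;
    $line =~ s{</?font\b[^>]*>}{}gi;
    return $line;
}

# Option 2 (hypothetical helper): quote an unquoted six-digit hex color
# and add the leading '#', so the attribute is well-formed XML.
sub quote_font_color {
    my ($line) = @_;
    $line =~ s{(<font\b[^>]*?\bcolor=)#?([0-9a-f]{6})\b}{$1"#$2"}gi;
    return $line;
}

# Made-up sample input.
my $in = '<font color=669966>Bad HTML</font>';
print strip_font_tags($in), "\n";
print quote_font_color($in), "\n";
```
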
--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
 
 
 