Velocity Reviews - Computer Hardware Reviews

regex to extract color guide from html

 
 
cp
      10-26-2004
I copied a webpage that had a color guide I liked. I wanted to extract
the color names and codes into a list with names alternating with codes,
which could then be made into a hash, saved to a file, or whatever.
Below are some random clippings from the HTML so you can see what I am
working with. Below that is the foreach loop that goes through and looks
for the color name and color code. The HTML file is already loaded into
@data. I thought it worked fine until I realized that some colors were
missing. I then noticed that only the first color on each line was picked
up; the remaining colors on the same line were skipped. I thought that
adding the g modifier to the end of the regex would fix it, but it
produced exactly the same output. Any suggestions would be greatly
appreciated.


class=s><br>&nbsp;<td>mediumseagreen (<a href="colorsvg.html">SVG</a>)
#3CB371<td bgcolor="#3CB371" class=s><td>gray24 #3D3D3D<td
bgcolor="#3D3D3D" class=s>^M
<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB"
class=s><br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140"
class=s><td>gray25 #404040<td bgcolor="#404040" class=s>^M

<tr align=right><td>dodgerblue4 #104E8B<td bgcolor="#104E8B"
class=s><br>&nbsp;<td>ultramarine #120A8F<td bgcolor="#120A8F"
class=s><td>gray7 #121212<td bgcolor="#121212" class=s>^M


foreach(@data)
{
next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
my $s1 = "$1\n";
my $s2 = "$2\n";
push @output,($s1,$s2);
}



--
www.cherryplankton.com
 
 
 
 
 
Gunnar Hjalmarsson
      10-26-2004
cp wrote:
> I then observed that the first color to be picked up on a line was
> picked up but the remaining colors on the same line were skipped. I
> thought that adding the g modifier at the end of the regex would fix
> it but it produced the same exact output.


<snip>

> foreach(@data)
> {
> next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
> my $s1 = "$1\n";
> my $s2 = "$2\n";
> push @output,($s1,$s2);
> }


You run the match only once per line, so only the first pair on each
line is added to @output. In scalar context, /g finds one match per
evaluation and has to be evaluated again to find the next one.

One possible solution is to process each line in a while loop:

foreach (@data) {
    while (/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
        my $s1 = "$1\n";
        my $s2 = "$2\n";
        push @output, ($s1, $s2);
    }
}
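
To see the difference on a concrete line, here is a small self-contained sketch. The sample line is adapted from the clippings in the original post, the character class is written as [[:xdigit:]] on the assumption that the codes are hexadecimal, and the names $line, $re, @once and @all are made up for illustration.

```perl
use strict;
use warnings;

# One sample line carrying two colors (adapted from the posted clippings).
my $line = '<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB" class=s>'
         . '<br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140" class=s>';

# Assumption: color codes are hexadecimal, hence [[:xdigit:]].
my $re = qr/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/;

# A single evaluation in scalar context stops at the first match,
# even with the /g modifier.
my @once;
if ($line =~ /$re/g) {
    push @once, $1, $2;
}

# Reset the search position that the successful /g match above left behind.
pos($line) = undef;

# Repeated evaluation: each pass resumes where the previous match ended.
my @all;
while ($line =~ /$re/g) {
    push @all, $1, $2;
}

printf "single evaluation: %d items\n", scalar @once;
printf "while loop:        %d items\n", scalar @all;
```

Note that the captured name can carry a trailing space, because the optional \s sits inside the first capture group.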

But what happens if the color name and color code are on different
lines? A better solution is to slurp the whole file as one string into a
scalar variable, and drop the foreach loop:

my $data = do { local $/; <FILE> };
while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
...
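
A fuller sketch of the slurp approach, under stated assumptions: the sample HTML is hypothetical, an in-memory filehandle stands in for the real file, and the trimming step is an addition that is not part of the suggestion above. The last color has its name and code on different lines to show why reading the file as one string helps.

```perl
use strict;
use warnings;

# Hypothetical sample input; gray25 has its name and code on separate lines.
my $html = <<'HTML';
<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB"
class=s><br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140"
class=s><td>gray25
#404040<td bgcolor="#404040" class=s>
HTML

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
my $data = do { local $/; <$fh> };   # with $/ undefined, <$fh> reads everything at once
close $fh;

my @output;
while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
    my ($name, $code) = ($1, $2);    # copy the captures before running another regex
    $name =~ s/\s+\z//;              # the \s? inside the capture can leave trailing whitespace
    push @output, "$name\n", "$code\n";
}

print @output;
```

Copying $1 and $2 into lexicals first matters, since a later successful match or substitution would overwrite them.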

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 
 
cp
      10-27-2004
Thanks to all for the helpful advice. I followed it and now have 570 named
colors instead of the 243 I had before! In the end I went with the
all-in-one-string solution. I am going to upload them to my website now.
Thanks again!
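
For the hash mentioned in the first post, a possible sketch: in list context, a /g match returns the captures of every match as one flat list, which already has the name, code, name, code shape that a hash assignment expects. The sample string and the names @pairs and %color are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped page.
my $data = '<td>cobalt #3D59AB<td bgcolor="#3D59AB" class=s>'
         . '<td>gray25 #404040<td bgcolor="#404040" class=s>';

# List context: all captures from all matches come back as one flat list.
my @pairs = $data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g;

# Trim trailing whitespace that the optional \s inside the first capture may leave.
s/\s+\z// for @pairs;

# An even-length list of name, code, name, code becomes a name => code hash.
my %color = @pairs;

printf "%-12s %s\n", $_, $color{$_} for sort keys %color;
```

A hash keeps only one code per name, so this assumes each color name appears once in the page.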

--
www.cherryplankton.com
 