Velocity Reviews - Computer Hardware Reviews

Find/Replace In Files Using Lookup Table

Andrew Porter
05-28-2008
I have a directory full of HTML files. Some have anchor tags (<a
href="directory/filename.html">); some do not. I also have a tab-delimited
text file with, among other things, an ID, title, and filename.

What I need to do is create a script that will:

1. Search all of the HTML files in a directory for anchor tags
2. Strip out the file name from the href attribute
3. Use the file name to look up the corresponding ID in the lookup file
4. Replace the contents of the href attribute with the ID

Being new to Ruby and command-line scripting, I'm not sure where to
begin looking for examples of how to do this. Any help is appreciated.


 
Eric I.
05-28-2008

Obviously your goal is to get this processing done. But are you also
hoping to use this to learn Ruby? If so, this is a nice-sized project
that will help you learn the language. Here are some pointers on where
to look or start for each part of the project (the numbers match your
numbers above):

1. To get a list of all of the HTML files in a given directory, you
can use Dir.glob.

2. To parse an HTML file you can use the hpricot gem. Alternatively,
you could open the file and use regular expressions.

3. To read your tab-delimited file at the start of the program, you
can use the CSV class in the standard library or the fastercsv gem.
Put the data into a hash where the file name is the key and the ID is
the value; lookup then becomes trivial.

4. How you do this depends on whether you're using hpricot or regular
expressions. If you're using regular expressions, you might call gsub!
with a block, which lets you do the lookup and replacement for each
match.
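Putting those four pointers together, a minimal sketch of the regular-expression route might look like the following. It assumes Ruby 1.9 or later (where FasterCSV became the standard CSV class, so col_sep: "\t" handles the tab-delimited file), that the ID sits in the first column and the file name in the third (hypothetical positions; adjust them to the real file), and that every href value is double-quoted. The method names load_lookup, replace_hrefs, and process_directory and the script name relink.rb are made up for illustration.

```ruby
require 'csv'

# Hypothetical column positions in the tab-delimited lookup file:
# ID in the first column, file name in the third. Adjust to the real layout.
ID_COL = 0
FILENAME_COL = 2

# Matches an opening <a ...> tag up to a double-quoted href value, capturing
# the text before the value, the value itself, and the closing quote.
# Single-quoted or unquoted href values are not handled by this sketch.
HREF_RE = /(<a\s[^>]*?href=")([^"]*)(")/i

# Step 3: read the tab-delimited file once into a Hash of file name => ID.
def load_lookup(path)
  lookup = {}
  CSV.foreach(path, col_sep: "\t") do |row|
    next unless row[FILENAME_COL] # skip blank or short lines
    lookup[row[FILENAME_COL]] = row[ID_COL]
  end
  lookup
end

# Steps 2 and 4: pull the file name out of each href and swap in its ID.
# Links whose file name is not in the table are left untouched.
def replace_hrefs(html, lookup)
  html.gsub(HREF_RE) do
    m = Regexp.last_match
    id = lookup[File.basename(m[2])]
    id ? "#{m[1]}#{id}#{m[3]}" : m[0]
  end
end

# Step 1: visit every .html file in the directory and rewrite it in place,
# but only when at least one href actually changed.
def process_directory(dir, lookup)
  Dir.glob(File.join(dir, '*.html')).each do |path|
    html = File.read(path)
    updated = replace_hrefs(html, lookup)
    File.write(path, updated) unless updated == html
  end
end

# Hypothetical command line: ruby relink.rb lookup.txt html_dir
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  process_directory(ARGV[1], load_lookup(ARGV[0]))
end
```

Because it rewrites the files in place, try it on a copy of the directory first. It uses gsub rather than gsub! so the result can be compared with the original; gsub! would also work, since it returns nil when nothing was replaced. Note that File.basename keeps any ?query or #fragment on the href, so such links would not match the table as written.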

Some relevant information sources:

You should have one of the Ruby books to help you with basic syntax
and all that. They will also help you with regular expressions,
hashes, and file I/O.

For documentation on File (and IO), Dir, CSV, Regexp, and Hash, see:

http://ruby-doc.org/core/

For hpricot:

http://code.whytheluckystiff.net/hpricot/

For fastercsv:

http://fastercsv.rubyforge.org/

I hope that's helpful,

Eric

David Masover
05-29-2008
On Wednesday 28 May 2008 18:05:15 Eric I. wrote:
> On May 28, 6:18 pm, Andrew Porter <apor...@eyequeue.net> wrote:


> 2. To parse an HTML file you can use the hpricot gem. Alternatively,
> you could open the file and use regular expressions.


I'd suggest hpricot or REXML if the files are reasonably well-formed
and/or XML-ish, and regex if they're not.
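If the files are well-formed, the REXML route might look roughly like this hypothetical replace_hrefs_rexml method, which takes a Hash of file name => ID. It is only a sketch, assuming the markup parses as XML and has no default xmlns declaration (namespaced XHTML may need a namespace-aware XPath expression, which is not covered here).

```ruby
require 'rexml/document'

# Hypothetical helper: rewrites <a href="..."> values in a well-formed,
# XML-ish document using a Hash of file name => ID.
# REXML raises REXML::ParseException on markup it cannot parse (for example
# an unclosed <br>), which is a sign to fall back to regular expressions.
def replace_hrefs_rexml(xml, lookup)
  doc = REXML::Document.new(xml)
  # Visit only <a> elements that actually carry an href attribute.
  REXML::XPath.each(doc, '//a[@href]') do |a|
    id = lookup[File.basename(a.attributes['href'])]
    a.attributes['href'] = id if id
  end
  out = ''.dup
  doc.write(out) # serialize the modified tree back into a String
  out
end
```

REXML may re-serialize the document with different attribute quoting or entity handling than the original, so expect the rewritten files to differ in more places than just the href values.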

 
Andrew Porter
05-29-2008
Thanks, Eric. These are excellent tips.





 