Velocity Reviews - Computer Hardware Reviews

Matching filenames with typos

Peter v.d. Berger
12-04-2006
Hello,

I'm working on a script that lists the results of a soccer game from different
seasons in sequence, to show the history of that fixture.
I've gathered a lot of scores from different websites on a FreeBSD
web server. The scores for each season are stored in a directory named after
the season, and each file is named after the two teams.
For example, the results of the game 'AC Milan - Ajax' are in different files
for different seasons:

../0405/AC Milan - Ajax.txt
../0304/AC Milan - Ajax.txt
../0203/AC Milan - Ajax.txt
(team names separated by '-')
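The naming scheme above can be parsed mechanically. A minimal Perl sketch, assuming every file has a '.txt' suffix and exactly one ' - ' (a hyphen surrounded by spaces) between the two team names; the helper name parse_result_path is hypothetical, and fileparse comes from the core File::Basename module:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Split a path such as '../0405/AC Milan - Ajax.txt' into the season
# directory and the two team names. Assumes a '.txt' suffix and exactly
# one ' - ' separator between the teams.
sub parse_result_path {
    my ($path) = @_;
    my ($file, $dir) = fileparse($path, qr/\.txt/);
    # $dir keeps its trailing slash ('../0405/'), so take its last component.
    my ($season) = $dir =~ m{([^/]+)/\z};
    # Split on a hyphen surrounded by spaces, so 'Saint-Etienne' stays whole.
    my ($home, $away) = split /\s+-\s+/, $file, 2;
    return unless defined $season && defined $away;
    return ($season, $home, $away);
}

for my $path ('../0405/AC Milan - Ajax.txt', '../0304/Milan - Ajax FC.txt') {
    my ($season, $home, $away) = parse_result_path($path) or next;
    print "$season: '$home' vs '$away'\n";
}
```

Files that do not follow the pattern return an empty list and are skipped.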

My script creates an HTML page with an overview of the results from all
seasons.
The problem is that I took the team names from different websites, and they
do not agree: some use 'AC Milan', others just 'Milan'; some use 'Ajax',
others 'Ajax FC' or 'Ajax Amsterdam'.
Since I have results for hundreds of teams, tens of thousands of results in
total, renaming all the files is not an option.
Is there a way to improve the matching of these files, given that:

- two- or three-character words (like FC, Utd.) can be ignored
- a match should be made when, for example, two out of three names in the
filename match (e.g. the game 'name1 name2 - name3' should match both
'name1 - name3' and 'name2 - name3')
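The two rules above could be expressed as a small matching function. This is a minimal sketch, not a tested solution. It assumes every site lists the home team first. It slightly generalizes the first rule by dropping words of three characters or fewer (ignoring a trailing dot, so 'Utd.' counts). It treats two spellings as the same team when they share at least one remaining word. The sub names significant_words, same_team and same_game are hypothetical:

```perl
use strict;
use warnings;

# Lowercased words of a team name that are worth comparing. Words of three
# characters or fewer (ignoring a trailing dot, so 'Utd.' counts as three)
# are dropped; if nothing would remain (e.g. 'PSV'), all words are kept.
sub significant_words {
    my ($team) = @_;
    my @words = map { lc($_) } split ' ', $team;
    my @long = grep { (my $w = $_) =~ s/\.\z//; length($w) > 3 } @words;
    return @long ? @long : @words;
}

# Two spellings refer to the same team if they share a significant word,
# e.g. 'AC Milan' and 'Milan', or 'Ajax FC' and 'Ajax Amsterdam'.
sub same_team {
    my ($x, $y) = @_;
    my %seen = map { $_ => 1 } significant_words($x);
    return scalar grep { $seen{$_} } significant_words($y);
}

# Two fixtures match if the home sides match and the away sides match.
# Assumes every site lists the home team first.
sub same_game {
    my ($game1, $game2) = @_;
    my ($home1, $away1) = split /\s+-\s+/, $game1, 2;
    my ($home2, $away2) = split /\s+-\s+/, $game2, 2;
    return 0 unless defined $away1 && defined $away2;
    return same_team($home1, $home2) && same_team($away1, $away2);
}

for my $other ('Milan - Ajax Amsterdam', 'AC Milan - Ajax FC', 'Inter - Ajax') {
    my $verdict = same_game('AC Milan - Ajax', $other) ? 'match' : 'no match';
    print "'AC Milan - Ajax' vs '$other': $verdict\n";
}
```

Sharing a single word is a loose test: 'Manchester United' and 'Manchester City' would be treated as the same team, so a short list of exceptions may still be needed.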

I hope I have made my question clear and that someone can help me.

Thanks!


 
Jim Gibson
12-05-2006

Create an array of unique team names and use a regular expression to
test if each name occurs in the file name. Generate a new name that
contains the two team names and either use that name as a key or rename
the old file to the new name. Example (untested):

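use strict;
use warnings;

my $name  = 'AC Milan - Ajax FC';
my @teams = qw( Ajax Milan );    # canonical short names

# Append every canonical name found in the file name. \Q...\E makes any
# regex metacharacters in a team name match literally. The result follows
# the order of @teams, not the home/away order of the file name.
my $newname = '';
for my $team ( @teams ) {
    if ( $name =~ /\Q$team\E/i ) {
        $newname .= $team;
    }
}
print "New name is '$newname'\n";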

should produce

New name is 'AjaxMilan'
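The same idea can be extended to build a season-independent key for every file, so that results can be grouped without renaming anything. This is a minimal sketch under stated assumptions. The @teams list, the sub names canonical_team and fixture_key, and the sample paths are all hypothetical. Unlike the example above, the key keeps the home/away order, and \b requires whole-word matches:

```perl
use strict;
use warnings;

# Hypothetical hand-maintained list of canonical team names. The first
# match wins, so 'Inter' comes before 'Milan' to map 'Inter Milan' to Inter.
my @teams = ('Ajax', 'Inter', 'Milan', 'PSV');

# Map one side of a fixture to the canonical name it contains, or undef.
# \Q...\E makes metacharacters such as '.' match literally, and \b
# requires whole words so a short name cannot match inside a longer word.
sub canonical_team {
    my ($side) = @_;
    for my $team (@teams) {
        return $team if $side =~ /\b\Q$team\E\b/i;
    }
    return undef;
}

# Turn a file name such as 'AC Milan - Ajax FC.txt' into a key such as
# 'Milan - Ajax', keeping the home/away order; undef if a team is unknown.
sub fixture_key {
    my ($file) = @_;
    (my $game = $file) =~ s/\.txt\z//;
    my ($home, $away) = split /\s+-\s+/, $game, 2;
    return undef unless defined $away;
    my $home_team = canonical_team($home);
    my $away_team = canonical_team($away);
    return undef unless defined $home_team && defined $away_team;
    return "$home_team - $away_team";
}

# Group the seasons of each fixture under its key (sample paths only).
my %history;
for my $path ('../0405/AC Milan - Ajax.txt', '../0304/Milan - Ajax FC.txt',
              '../0203/AC Milan - Ajax Amsterdam.txt', '../0405/Ajax - PSV.txt') {
    my ($season, $file) = $path =~ m{([^/]+)/([^/]+)\z} or next;
    my $key = fixture_key($file);
    next unless defined $key;
    push @{ $history{$key} }, $season;
}
print "$_: @{ $history{$_} }\n" for sort keys %history;
```

A real script would collect the paths with glob or File::Find instead of the sample list, and would log any file whose key comes back undef so the list of canonical names can be extended.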

FYI: this newsgroup is defunct. Try comp.lang.perl.misc in the future.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Parsing text acounting for typos? dagoodyear Java 1 06-12-2005 09:19 PM
problem with filenames, Filenames and FILENAMES B.J. HTML 4 04-23-2005 08:13 PM
typos in set functions Siemel Naran C++ 5 12-02-2004 06:56 AM
Typos os Bugs(70-315 self paced)? john hansen MCSD 4 10-30-2003 06:43 PM
Typos in the Exam Davin Mickelson MCSE 3 07-21-2003 11:31 PM


