Re: Arrange files according to a text file

 
 
Emile van Sebille
 
      08-27-2011
On 8/27/2011 10:03 AM said...
> Hello,
>
> What would be the best way to accomplish this task?


I'd do something like:


usernames = """Adler, Jack
Smith, John
Smith, Sally
Stone, Mark""".split('\n')

filenames = """Smith, John - 02-15-75 - business files.doc
Random Data - Adler Jack - expenses.xls
More Data Mark Stone files list.doc""".split('\n')

from difflib import SequenceMatcher as SM


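def ignore(x):
    # Treat spaces, commas, and periods as junk when matching.
    return x in ' ,.'


for filename in filenames:
    # Score every listed name against this file name and keep the best.
    ratios = [SM(ignore, filename, username).ratio() for username in usernames]
    best = max(ratios)
    owner = usernames[ratios.index(best)]
    print("%s : %s" % (filename, owner))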


Emile



> I have many files in separate directories. Each file name
> contains a person's name, but never in the same spot.
> I need to find that name, which is listed in a large
> text file in the following format: last name, comma,
> first name. The last name could be duplicated.
>
> Adler, Jack
> Smith, John
> Smith, Sally
> Stone, Mark
> etc.
>
>
> The file names don't necessarily follow any standard
> format.
>
> Smith, John - 02-15-75 - business files.doc
> Random Data - Adler Jack - expenses.xls
> More Data Mark Stone files list.doc
> etc
>
> I need some way to pull the name from the file name, find it in the
> text list, create a directory based on the name on the list (for
> example "Smith, John"), and move all files named with that client's
> name into that directory.
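
The snippet above only prints the matched name. The last step of the question (one directory per listed name, with the files moved into it) could be sketched roughly as follows. The function name move_to_owner_dir and its parameters are made up for illustration, and it assumes the owner string is usable as a folder name on your file system.

```python
import os
import shutil


def move_to_owner_dir(src_dir, filename, owner, dest_root):
    """Move src_dir/filename into dest_root/owner/, creating the folder if needed.

    Hypothetical helper: 'owner' is the matched list entry, e.g. "Smith, John".
    """
    target_dir = os.path.join(dest_root, owner)
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    destination = os.path.join(target_dir, filename)
    shutil.move(os.path.join(src_dir, filename), destination)
    return destination


# Possible use inside the matching loop (paths are placeholders):
#     move_to_owner_dir(path, filename, owner, path)
```

It is probably worth printing the proposed moves first and checking them by eye before letting anything actually move.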



 
 
 
 
 
Emile van Sebille
 
      08-27-2011
On 8/27/2011 1:15 PM said...
>
> Hello Emile ,
>
> Thank you for the code below. I have not encountered SequenceMatcher
> before and will have to take a closer look at it.
>
> My question: would it work for a text file listing about 25k names
> and a directory with, say, 100 files inside?


Sure.

Emile


>
> Thank you once again.



 
 
 
 
 
MRAB
 
      08-27-2011
On 28/08/2011 00:18, wrote:
> Thank you so much. The code worked perfectly.
>
> This is what I tried using Emile's code. The only time it picked the
> wrong name from the list was when the file was named like this.
>
> Data Mark Stone.doc
>
> How can I fix this? Hope I am not asking too much?
>

Have you tried the alternative word orders, "Mark Stone" as well as
"Stone, Mark", picking whichever name has the best ratio for either?
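
A rough sketch of that idea, in case it helps: score each listed name in both word orders and keep whichever name scores best. The helpers name_variants and best_owner are names invented for this example, and it assumes every list entry is in "Last, First" form.

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk rule as the code in this thread: spaces, commas, periods.
    return ch in ' ,.'


def name_variants(listed_name):
    """Return the listed 'Last, First' form plus a 'First Last' form."""
    last, sep, first = listed_name.partition(',')
    if not sep:
        # No comma in the entry, so there is nothing to reorder.
        return [listed_name]
    return [listed_name, first.strip() + ' ' + last.strip()]


def best_owner(filename, names):
    """Pick the listed name whose best-scoring variant is closest to filename."""
    def score(listed_name):
        return max(SequenceMatcher(ignore, filename, variant).ratio()
                   for variant in name_variants(listed_name))
    return max(names, key=score)


usernames = ['Adler, Jack', 'Smith, John', 'Smith, Sally', 'Stone, Mark']
owner = best_owner('Data Mark Stone.doc', usernames)
print('%s : %s' % ('Data Mark Stone.doc', owner))
```

This doubles the number of comparisons per file, which should still be manageable for about 100 files against 25k names.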
>
> import os
> from difflib import SequenceMatcher as SM
>
> path = r'D:\Files '
> txt_names = []
>
>
> with open(r'D:/python/log1.txt') as f:
>     for txt_name in f.readlines():
>         txt_names.append(txt_name.strip())
>
> def ignore(x):
>     return x in ' ,.'
>
> for filename in os.listdir(path):
>     ratios = [SM(ignore, filename, txt_name).ratio() for txt_name in txt_names]
>     best = max(ratios)
>     owner = txt_names[ratios.index(best)]
>     print("%s : %s" % (filename, owner))


 
 
Emile van Sebille
 
      08-28-2011
On 8/27/2011 4:18 PM said...
> Thank you so much. The code worked perfectly.
>
> This is what I tried using Emile's code. The only time it picked the
> wrong name from the list was when the file was named like this.
>
> Data Mark Stone.doc
>
> How can I fix this? Hope I am not asking too much?


What name did it pick? I imagine that if you're picking a name from a list
of 25000 names, some subset of combinations may yield similar ratios.

But, if you double up on the file name side you may get closer:

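for filename in filenames:
    # Doubling the file name lets a "First Last" file name line up with
    # a "Last, First" list entry across the seam.
    ratios = [SM(ignore, filename + filename, username).ratio()
              for username in usernames]
    best = max(ratios)
    owner = usernames[ratios.index(best)]
    print("%s : %s" % (filename, owner))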

... On the other hand, if you've only got 100 files to sort out, you
should already be done.
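
To see whether several names really do score about the same for a given file, one option is to print the top few candidates with their ratios rather than only the winner. This is just a sketch: top_candidates is a made-up helper name and the cutoff of three is arbitrary.

```python
from difflib import SequenceMatcher
import heapq


def ignore(ch):
    # Same junk rule as above: spaces, commas, periods.
    return ch in ' ,.'


def top_candidates(filename, names, n=3):
    """Return the n highest-scoring (ratio, name) pairs, best first."""
    scored = ((SequenceMatcher(ignore, filename, name).ratio(), name)
              for name in names)
    return heapq.nlargest(n, scored)


usernames = ['Adler, Jack', 'Smith, John', 'Smith, Sally', 'Stone, Mark']
sample = 'Smith, John - 02-15-75 - business files.doc'
for ratio, name in top_candidates(sample, usernames):
    print('%.3f  %s' % (ratio, name))
```

If the first two ratios are close together, that file is a good candidate for sorting by hand.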



Emile

 