Velocity Reviews


Emile van Sebille 08-27-2011 06:06 PM

Re: Arrange files according to a text file
 
On 8/27/2011 10:03 AM Ric@rdo.python.org said...
> Hello,
>
> What would be the best way to accomplish this task?


I'd do something like:


from difflib import SequenceMatcher as SM

usernames = """Adler, Jack
Smith, John
Smith, Sally
Stone, Mark""".split('\n')

filenames = """Smith, John - 02-15-75 - business files.doc
Random Data - Adler Jack - expenses.xls
More Data Mark Stone files list.doc""".split('\n')


def ignore(x):
    # Treat spaces, commas and periods as junk when matching.
    return x in ' ,.'


for filename in filenames:
    # Score every name against this file name and keep the best one.
    ratios = [SM(ignore, filename, username).ratio() for username in usernames]
    best = max(ratios)
    owner = usernames[ratios.index(best)]
    print("%s : %s" % (filename, owner))


Emile



> I have many files in separate directories. Each file name
> contains a person's name, but never in the same spot.
> I need to find that name in a large text file that lists
> names in the following format: last name, comma, first
> name. Last names can be duplicated.
>
> Adler, Jack
> Smith, John
> Smith, Sally
> Stone, Mark
> etc.
>
>
> The file names don't necessarily follow any standard
> format.
>
> Smith, John - 02-15-75 - business files.doc
> Random Data - Adler Jack - expenses.xls
> More Data Mark Stone files list.doc
> etc
>
> I need some way to pull the name from the file name, find it in the
> text list, create a directory based on the name in the list
> ("Smith, John"), and move all files named with the client's name into
> that directory.
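
The last step of the request (creating a folder per matched name and moving the files into it) is not covered by the snippet above. A minimal sketch of how it might look, assuming Python 3 and that every file in one source directory should go to its best-scoring name. The names `best_owner`, `sort_into_folders`, `src_dir` and `dest_root` are hypothetical and do not come from the thread:

```python
import os
import shutil
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk rule as in the post: spaces, commas and periods.
    return ch in ' ,.'


def best_owner(filename, names):
    """Return the entry of `names` with the highest ratio against `filename`."""
    return max(names, key=lambda name: SequenceMatcher(ignore, filename, name).ratio())


def sort_into_folders(src_dir, dest_root, names):
    """Move each file in src_dir into dest_root/<best matching name>/.

    Hypothetical helper; returns a {filename: owner} dict of what was moved.
    """
    moved = {}
    for filename in os.listdir(src_dir):
        src = os.path.join(src_dir, filename)
        if not os.path.isfile(src):
            continue  # skip sub-directories
        owner = best_owner(filename, names)
        target_dir = os.path.join(dest_root, owner)
        os.makedirs(target_dir, exist_ok=True)  # create "Smith, John" etc. on demand
        shutil.move(src, os.path.join(target_dir, filename))
        moved[filename] = owner
    return moved
```

Every file is moved to some name, however weak the match, so it is probably worth printing the proposed moves and checking them by eye before moving anything for real.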




Emile van Sebille 08-27-2011 09:08 PM

Re: Arrange files according to a text file
 
On 8/27/2011 1:15 PM Ric@rdo.python.org said...
>
> Hello Emile ,
>
> Thank you for the code below. I have not encountered SequenceMatcher
> before and will have to take a closer look at it.
>
> My question: would it work for a text file list of about 25k names
> and a directory with, say, 100 files inside?


Sure.

Emile
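
For a rough sense of scale, 25k names against 100 files is about 2.5 million ratio() calls. If that turns out to be slow, the sketch below leans on two documented difflib features: SequenceMatcher caches its analysis of the second sequence (set_seq2), and real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio(). It assumes Python 3, `match_all` is a hypothetical name, and it is meant to pick the same owners as the simple loop rather than to be a tuned implementation:

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk rule as in the post: spaces, commas and periods.
    return ch in ' ,.'


def match_all(filenames, names):
    """Return {filename: best matching name} (hypothetical helper)."""
    best = {fn: (-1.0, None) for fn in filenames}
    sm = SequenceMatcher(ignore)
    for name in names:
        sm.set_seq2(name)  # the per-name analysis is done once and reused
        for fn in filenames:
            sm.set_seq1(fn)
            top = best[fn][0]
            # Both quick ratios are upper bounds on ratio(), so a name
            # that cannot beat the current best is skipped cheaply.
            if sm.real_quick_ratio() > top and sm.quick_ratio() > top:
                r = sm.ratio()
                if r > top:
                    best[fn] = (r, name)
    return {fn: owner for fn, (score, owner) in best.items()}
```

The outer loop runs over names rather than files so that the file name stays the first sequence and the list name the second, exactly as in the original call `SM(ignore, filename, username)`.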


>
> Thank you once again.




MRAB 08-27-2011 11:48 PM

Re: Arrange files according to a text file
 
On 28/08/2011 00:18, Ric@rdo.python.org wrote:
> Thank you so much. The code worked perfectly.
>
> This is what I tried using Emile's code. The only time it picked
> the wrong name from the list was when the file was named like this:
>
> Data Mark Stone.doc
>
> How can I fix this? I hope I am not asking too much.
>

Have you tried the alternative word orders, "Mark Stone" as well as
"Stone, Mark", picking whichever name has the best ratio for either?
>
> import os
> from difflib import SequenceMatcher as SM
>
> path = r'D:\Files '
> txt_names = []
>
>
> with open(r'D:/python/log1.txt') as f:
>     for txt_name in f.readlines():
>         txt_names.append(txt_name.strip())
>
> def ignore(x):
>     return x in ' ,.'
>
> for filename in os.listdir(path):
>     ratios = [SM(ignore, filename, txt_name).ratio() for txt_name in txt_names]
>     best = max(ratios)
>     owner = txt_names[ratios.index(best)]
>     print("%s : %s" % (filename, owner))
>
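
MRAB's suggestion of scoring both word orders could look roughly like the sketch below. It assumes Python 3 and that every list entry has the form "Last, First". The helper names `variants`, `score` and `best_owner` are made up for illustration:

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk rule as in the quoted code: spaces, commas and periods.
    return ch in ' ,.'


def variants(name):
    """Return the name as listed plus its 'First Last' form, if it has a comma."""
    last, sep, first = name.partition(',')
    if not sep:
        return [name]
    return [name, first.strip() + ' ' + last.strip()]


def score(filename, name):
    """Best ratio over both word orders of `name`."""
    return max(SequenceMatcher(ignore, filename, v).ratio() for v in variants(name))


def best_owner(filename, names):
    """Return the list entry whose better word order scores highest."""
    return max(names, key=lambda name: score(filename, name))
```

The function still returns the entry as it appears in the list ("Stone, Mark"), so it can be used directly as a directory name. Only the scoring looks at the alternative order.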



Emile van Sebille 08-28-2011 01:10 AM

Re: Arrange files according to a text file
 
On 8/27/2011 4:18 PM Ric@rdo.python.org said...
> Thank you so much. The code worked perfectly.
>
> This is what I tried using Emile's code. The only time it picked
> the wrong name from the list was when the file was named like this:
>
> Data Mark Stone.doc
>
> How can I fix this? I hope I am not asking too much.


What name did it pick? I imagine that when you're picking a name from a
list of 25000 names, some subset of combinations may yield similar ratios.

But if you double up on the file name side, you may get closer:

for filename in filenames:
    # Compare each name against the file name repeated twice.
    ratios = [SM(ignore, filename + filename, username).ratio()
              for username in usernames]
    best = max(ratios)
    owner = usernames[ratios.index(best)]
    print("%s : %s" % (filename, owner))

... On the other hand, if you've only got 100 files to sort out, you
should already be done.

:)

Emile
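
One way to see why doubling the file name can help: SequenceMatcher pairs blocks in order, so in "Data Mark Stone.doc" it can match "Stone" from "Stone, Mark" but then finds no "Mark" to the right of it. In the doubled string, a second "Mark" follows the first "Stone". A small sketch for comparing the two scores on a single pair, assuming Python 3, with `plain_and_doubled` as a hypothetical helper name:

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk rule as in the post: spaces, commas and periods.
    return ch in ' ,.'


def plain_and_doubled(filename, name):
    """Return (ratio against filename, ratio against filename + filename)."""
    plain = SequenceMatcher(ignore, filename, name).ratio()
    doubled = SequenceMatcher(ignore, filename + filename, name).ratio()
    return plain, doubled


plain, doubled = plain_and_doubled("Data Mark Stone.doc", "Stone, Mark")
print("plain: %.3f  doubled: %.3f" % (plain, doubled))
```

Doubling changes the score of every other candidate as well, so it is best treated as a heuristic and checked against the real name list.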


