Velocity Reviews - Computer Hardware Reviews

parsing directory for certain filetypes

 
 
royG
 
      03-10-2008
Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist


folder='F:/mysys/code/tstfolder'
parsefolder(folder)


thanks,
RG
 
 
 
 
 
sam
 
      03-10-2008
royG wrote:

> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?
>


Probably this should be rewritten to be very compact. Maybe you should
capture the output of:

find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

and split it on "\n"?
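For comparison, here is a minimal pure-Python sketch of roughly what that find command does, using os.walk instead of launching an external process. Note that, unlike the original function, both find and this sketch descend into subdirectories. The name find_files is hypothetical, and unlike -type f it does not filter out symlinks or other non-directory entries.

```python
import os


def find_files(dirname, extensions=(".txt", ".doc")):
    """Recursively collect files under dirname whose names end with one
    of the given extensions, roughly what the find command above does."""
    matches = []
    # os.walk yields (directory, subdirectory names, non-directory names).
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            # str.endswith accepts a tuple and tests each suffix in turn.
            if name.endswith(extensions):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Because it stays in-process, it also avoids quoting problems when the directory name contains spaces.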


--
UFO Occupation
www.totalizm.org
 
 
 
 
 
jay graves
 
      03-10-2008
On Mar 10, 8:57 am, royG <(E-Mail Removed)> wrote:
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?


Try the 'glob' module.

....
Jay
 
 
Robert Bossy
 
      03-10-2008
royG wrote:
> Hi,
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?
>
> Here it is:
>
> from string import split
> from os.path import isdir,join,normpath
> from os import listdir
>
> def parsefolder(dirname):
>     filenms=[]
>     folder=dirname
>     isadr=isdir(folder)
>     if (isadr):
>         dirlist=listdir(folder)
>         filenm=""
>

This last line is unnecessary: variable scope rules in Python are a bit
different from what we're used to. You're not required to
declare/initialize a variable; you're only required to assign a value
before it is referenced.
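A small sketch of that point (the function names are made up for illustration): the for statement binds its loop variable itself, while a list you intend to append to does need to exist first. Reading a local name before any assignment raises UnboundLocalError, which is the failure that assigning before use guards against.

```python
def collect_upper(names):
    # The list must exist before .append() is called, so this line is needed.
    result = []
    # No 'filenm = ""' beforehand: the for statement binds filenm itself.
    for filenm in names:
        result.append(filenm.upper())
    return result


def read_before_assign():
    # 'value' is local because it is assigned below; reading it first
    # raises UnboundLocalError when the function is called.
    print(value)
    value = 1
```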


>         for x in dirlist:
>             filenm=x
>             if(filenm.endswith(("txt","doc"))):
>                 nmparts=[]
>                 nmparts=split(filenm,'.' )
>                 if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
>

I don't get it. You've already checked that filenm ends with "txt" or
"doc"... What is the purpose of these three lines?
Btw, again, nmparts=[] is unnecessary.
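Those lines are not only redundant; they can also misbehave. split(filenm, '.')[1] is the text after the first dot, so a name like report.v2.txt fails the second test, and a dotless name ending in "txt", such as mytxt, passes the endswith test and then makes nmparts[1] raise IndexError. A sketch using os.path.splitext, which splits off only the last extension (has_wanted_ext is a hypothetical helper name):

```python
import os.path


def has_wanted_ext(filename, wanted=(".txt", ".doc")):
    # splitext splits at the LAST dot: "report.v2.txt" -> ("report.v2", ".txt")
    ext = os.path.splitext(filename)[1]
    # Lower-case first so "NOTES.TXT" matches too.
    return ext.lower() in wanted


print("report.v2.txt".split(".")[1])    # v2  (what the nmparts[1] test sees)
print(has_wanted_ext("report.v2.txt"))  # True
print(has_wanted_ext("mytxt"))          # False, where nmparts[1] would raise IndexError
```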

>                     filenms.append(filenm)
>         filenms.sort()
>         filenameslist=[]
>

Unnecessary initialization.

>         filenameslist=[normpath(join(folder,y)) for y in filenms]
>         numifiles=len(filenameslist)
>

numifiles is not used so I guess this line is too much.

>         print filenameslist
>         return filenameslist
>


Personally, I'd use glob.glob:


import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst


I leave you the exercise of adding .doc files. But I must say (to whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.
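Since glob does not expand braces, one workaround is to expand a {a,b,...} group yourself and glob each resulting pattern. A minimal sketch, assuming a single, non-nested brace group (only the first group is expanded); expand_braces is a hypothetical helper, not part of the standard library:

```python
import re


def expand_braces(pattern):
    """Expand the first non-nested {a,b,...} group in pattern.

    "*.{txt,doc}" -> ["*.txt", "*.doc"]; a pattern without braces is
    returned unchanged as a one-element list.
    """
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    # Substitute each comma-separated alternative back into the pattern.
    return [head + alt + tail for alt in m.group(1).split(",")]
```

With it, glob('*.{txt,doc}') could be emulated as sorted(f for p in expand_braces('*.{txt,doc}') for f in glob.glob(os.path.join(folder, p))), where folder is the directory being scanned.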

Cheers,
RB
 
 
sam
 
      03-10-2008
Robert Bossy wrote:

> I leave you the exercise of adding .doc files. But I must say (to whoever's
> listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
> work.


Brace expansion with "{" and "}" is a bash invention, not POSIX standard, unfortunately.

 
 
jay graves
 
      03-10-2008
On Mar 10, 9:28 am, Robert Bossy <(E-Mail Removed)> wrote:
> Personally, I'd use glob.glob:
>
> import os.path
> import glob
>
> def parsefolder(folder):
>     path = os.path.normpath(os.path.join(folder, '*.py'))
>     lst = [ fn for fn in glob.glob(path) ]
>     lst.sort()
>     return lst
>


Why the 'no-op' list comprehension? Typo?

....
Jay
 
 
Tim Chase
 
      03-10-2008
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?
>
> Here it is:
>
> from string import split
> from os.path import isdir,join,normpath
> from os import listdir
>
> def parsefolder(dirname):
>     filenms=[]
>     folder=dirname
>     isadr=isdir(folder)
>     if (isadr):
>         dirlist=listdir(folder)
>         filenm=""
>         for x in dirlist:
>             filenm=x
>             if(filenm.endswith(("txt","doc"))):
>                 nmparts=[]
>                 nmparts=split(filenm,'.' )
>                 if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
>                     filenms.append(filenm)
>         filenms.sort()
>         filenameslist=[]
>         filenameslist=[normpath(join(folder,y)) for y in filenms]
>         numifiles=len(filenameslist)
>         print filenameslist
>         return filenameslist
>
>
> folder='F:/mysys/code/tstfolder'
> parsefolder(folder)


It seems to me that this is awfully baroque, with many superfluous
variables. Is this not the same functionality (minus
prints, unused result-counting, NOPs, and belt-and-suspenders
extension-checking) as

def parsefolder(dirname):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
            or fname.lower().endswith('.doc')
        ])

In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible, something like this (untested):

def parsefolder(dirname, types=['.doc', '.txt']):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(
            fname.lower().endswith(s)
            for s in types)
        ])

which would allow you to do both

parsefolder('/path/to/wherever/')

and

parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

In both cases, the function simply returns None when isdir(dirname)
fails; what should happen then is up to you. Caveat implementor.
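As a side note, str.endswith has accepted a tuple of suffixes since Python 2.5 (the original post already relies on this), so the any() call can be dropped. A Python 3 variation on the version above, not Tim's code; it keeps the same behaviour of returning None for a non-directory and uses a tuple default instead of a list:

```python
from os import listdir
from os.path import isdir, join, normpath


def parsefolder(dirname, types=(".doc", ".txt")):
    # Same contract as above: None when dirname is not a directory.
    if not isdir(dirname):
        return None
    # Lower-case the suffixes once so the comparison is case-insensitive.
    suffixes = tuple(t.lower() for t in types)
    return sorted(
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith(suffixes)
    )
```

Like the original, this does not distinguish files from subdirectories whose names happen to end in .txt or .doc.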

-tkc


[1] http://docs.python.org/lib/built-in-funcs.html




 
 
Robert Bossy
 
      03-10-2008
jay graves wrote:
> On Mar 10, 9:28 am, Robert Bossy <(E-Mail Removed)> wrote:
>
>> Personally, I'd use glob.glob:
>>
>> import os.path
>> import glob
>>
>> def parsefolder(folder):
>>     path = os.path.normpath(os.path.join(folder, '*.py'))
>>     lst = [ fn for fn in glob.glob(path) ]
>>     lst.sort()
>>     return lst
>>
>>

>
> Why the 'no-op' list comprehension? Typo?
>

My mistake, it is:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = glob.glob(path)
    lst.sort()
    return lst


 
 
royG
 
      03-11-2008
On Mar 10, 8:03 pm, Tim Chase wrote:

> In Python 2.5 (or 2.4 if you implement the any() function, ripped
> from the docs[1]), this could be rewritten to be a little more
> flexible, something like this (untested):
>


That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():
>path = os.path.normpath(os.path.join(folder, '*.txt'))
>lst = glob.glob(path)


Is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob on each separately,
or is there another way?
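For reference, running one glob per pattern, as described here, works fine and is easy to fold into a loop. A minimal Python 3 sketch; glob_many and its patterns parameter are hypothetical names:

```python
import glob
import os.path


def glob_many(folder, patterns=("*.txt", "*.doc")):
    # Run one glob per pattern and collect every hit.
    found = []
    for pattern in patterns:
        path = os.path.normpath(os.path.join(folder, pattern))
        found.extend(glob.glob(path))
    # set() drops duplicates when patterns overlap (e.g. "*.txt" and "a*").
    return sorted(set(found))
```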

RG
 
 
Gerard Flanagan
 
      03-11-2008
On Mar 11, 6:21 am, royG <(E-Mail Removed)> wrote:
> On Mar 10, 8:03 pm, Tim Chase wrote:
>
> > In Python 2.5 (or 2.4 if you implement the any() function, ripped
> > from the docs[1]), this could be rewritten to be a little more
> > flexible, something like this (untested):

>
> That was quite a good lesson for a beginner like me.
> Thanks, guys.
>
> In the version using glob():
>
> >path = os.path.normpath(os.path.join(folder, '*.txt'))
> >lst = glob.glob(path)

>
> Is it possible to check for more than one file extension? Here I would
> have to create two path variables like
> path1 = os.path.normpath(os.path.join(folder, '*.txt'))
> path2 = os.path.normpath(os.path.join(folder, '*.doc'))
>
> and then call glob on each separately,
> or is there another way?
>


I don't think you can match multiple patterns directly with glob, but
`fnmatch` (the module glob uses to check for matches) has a
`translate` function which will convert a glob pattern to a regular
expression (string). So you can do something along the lines of the
following:

---------------------------------------------

import os
from fnmatch import translate
import re

d = '/tmp'
patt1 = '*.log'
patt2 = '*.ini'
patterns = [patt1, patt2]

rx = '|'.join(translate(p) for p in patterns)
patt = re.compile(rx)

for f in os.listdir(d):
    if patt.match(f):
        print f

---------------------------------------------
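If building a combined regex feels heavy, fnmatch.fnmatch can be applied per pattern inside any(). One difference worth knowing: fnmatch.fnmatch normalizes case with os.path.normcase (so it is case-insensitive on Windows and case-sensitive on POSIX), whereas a regex compiled from translate() is case-sensitive unless you pass re.IGNORECASE. A Python 3 sketch; list_matching is a hypothetical name:

```python
import os
from fnmatch import fnmatch


def list_matching(d, patterns=("*.log", "*.ini")):
    # Keep the entries of d that match at least one glob-style pattern.
    return sorted(f for f in os.listdir(d)
                  if any(fnmatch(f, p) for p in patterns))
```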

hth

Gerard
 
 
 
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
argparse and filetypes Bradley Hintze Python 2 03-22-2011 03:52 PM
askopenfilename filetypes problem embirath Python 0 07-02-2010 04:50 PM
tkFileDialog.askopenfilename filetypes problem Justin Straube Python 2 09-27-2006 08:48 PM
Accessing/Dowloading certain filetypes... Brandon ASP General 2 01-18-2006 04:57 PM
Filetypes in email attachments. justin.vanwinkle@gmail.com Python 1 08-25-2005 09:02 PM


