Function for examine content of directory

 
 
Tigerstyle
 
      09-06-2012
Hi guys,

I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)

This is the code so far:
--
import os

path = "v:\\workspace\\Python2_Homework03\\src\\"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf"," first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name , ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

for ext in extensions:
    print("Count for %s: " %ext, extensions.count(ext))

--

When I try to get the module to print how many files have each extension, it prints the count for each extension multiple times, once per file with that extension. Like this:

this.pdf
the_other.txt
this.doc
that.txt
this.txt
that.pdf
first.txt
that.doc
Count for .pdf: 2
Count for .txt: 4
Count for .doc: 2
Count for .txt: 4
Count for .txt: 4
Count for .pdf: 2
Count for .txt: 4
Count for .doc: 2

Any help is appreciated.

T
 
 
 
 
 
Ian Foote
 
      09-06-2012
On 06/09/12 15:56, Tigerstyle wrote:
> Hi guys,
>
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
>
> This is the code so far:
> --
> import os
>
> path = "v:\\workspace\\Python2_Homework03\\src\\"
> dirs = os.listdir( path )
> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf"," first.txt","that.pdf"}
> extensions = []

Try using a set here instead of a list:
extensions = set()
> for filename in filenames:
>     f = open(filename, "w")
>     f.write("Some text\n")
>     f.close()
>     name , ext = os.path.splitext(f.name)
>     extensions.append(ext)

and use:
extensions.add(ext)

This should take care of duplicates for you.
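
One caveat, sketched here on the assumption that the per-extension counts are still wanted: a set keeps a single copy of each extension and has no count() method, so the original list has to stay around for the counting. The helper name count_unique is hypothetical, and the sample list simply mirrors the file names in the question.

```python
def count_unique(extensions):
    """Return {extension: count}, visiting each distinct extension once."""
    # set(extensions) removes duplicates, so each extension is visited once;
    # the counting itself is still done against the original list.
    return {ext: extensions.count(ext) for ext in set(extensions)}


# Extensions as collected from the eight test file names in the question.
extensions = [".txt", ".txt", ".txt", ".doc", ".doc", ".pdf", ".txt", ".pdf"]

for ext, count in sorted(count_unique(extensions).items()):
    print("Count for %s: " % ext, count)
```

This prints one line per extension instead of one line per file.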

Regards,
Ian
 
 
 
 
 
MRAB
 
      09-06-2012
On 06/09/2012 15:56, Tigerstyle wrote:
> for ext in extensions:
>     print("Count for %s: " %ext, extensions.count(ext))

That's because each extension can occur multiple times in the list.

Try the Counter class:

from collections import Counter

for ext, count in Counter(extensions).items():
    print("Count for %s: " % ext, count)
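
For reference, a self-contained sketch of the Counter approach. The extension list is hard-coded here (an assumption, so the snippet runs without touching the file system), and most_common() is used only to get a largest-first order.

```python
from collections import Counter

# Hard-coded sample data (an assumption), matching the output in the question.
extensions = [".pdf", ".txt", ".doc", ".txt", ".txt", ".pdf", ".txt", ".doc"]

# Counter tallies each distinct value once, however often it appears.
counts = Counter(extensions)

# most_common() yields (value, count) pairs, highest count first.
for ext, count in counts.most_common():
    print("Count for %s: " % ext, count)
```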

 
 
Tigerstyle
 
      09-06-2012
Thanks, just what I was looking for

T

On Thursday, 6 September 2012 at 17:20:27 UTC+2, MRAB wrote:
> That's because each extension can occur multiple times in the list.
>
> Try the Counter class:
>
> from collections import Counter
>
> for ext, count in Counter(extensions).items():
>     print("Count for %s: " % ext, count)

 
 
Dennis Lee Bieber
 
      09-06-2012
On Thu, 6 Sep 2012 07:56:29 -0700 (PDT), Tigerstyle declaimed the
following in gmane.comp.python.general:


> extensions.append(ext)
>

Don't append an ext if it is already in the list...

if ext not in extensions: extensions.append(ext)
--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/

 
 
Chris Angelico
 
      09-06-2012
On Fri, Sep 7, 2012 at 12:56 AM, Tigerstyle wrote:
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)


If you haven't already, look into the Python 'dict' type; you may find
it easier to work with for this sort of job. You can map an extension
("txt") to its count (4) directly.
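
A minimal sketch of the dict idea, assuming the input is a plain list of file names. The function name tally_extensions is made up for illustration; os.path.splitext keeps the leading dot, so the keys come out as ".txt" rather than "txt".

```python
import os


def tally_extensions(filenames):
    """Map each extension (with its leading dot) to how often it occurs."""
    counts = {}
    for filename in filenames:
        ext = os.path.splitext(filename)[1]
        # get() returns 0 the first time an extension is seen.
        counts[ext] = counts.get(ext, 0) + 1
    return counts


names = ["this.txt", "that.txt", "the_other.txt", "this.doc",
         "that.doc", "this.pdf", "first.txt", "that.pdf"]
for ext, count in sorted(tally_extensions(names).items()):
    print("Count for %s: " % ext, count)
```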

ChrisA
 
 
 
Tigerstyle
 
      09-07-2012
Ok I'm now totally stuck.

This is the code:

---
import os
from collections import Counter

path = ":c\\mypath\dir"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf"," first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name , ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

for ext, count in Counter(extensions).items():
    print("Count for %s: " % ext, count)

---

I need to make this module into a function and write a separate module to verify by testing that the function gives correct results.

Help and pointers are much appreciated.

T


 
 
Dennis Lee Bieber
 
      09-07-2012
On Fri, 7 Sep 2012 07:28:03 -0700 (PDT), Tigerstyle declaimed the
following in gmane.comp.python.general:

> Ok I'm now totally stuck.
>
> This is the code:
>

This code is full of errors...

> ---
> import os
> from collections import Counter
>
> path = ":c\\mypath\dir"


Not a valid Windows path. The format should be "c:\mypath\dir"
(actually, to use \ you should probably declare it a raw string -- much
simpler, since all the python/OS functions don't care, is to use / -- as
in "c:/mypath/dir")
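
A short sketch of the three spellings mentioned above. It uses ntpath, the Windows flavour of os.path, so the comparison behaves the same on any platform. The variable names are illustrative only.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

raw_path = r"c:\mypath\dir"        # raw string: backslashes are literal
escaped_path = "c:\\mypath\\dir"   # ordinary string: each backslash doubled
slash_path = "c:/mypath/dir"       # forward slashes, also accepted by Windows

# The raw and the escaped spelling are the same string.
print(raw_path == escaped_path)

# normpath converts forward slashes to backslashes under Windows rules.
print(ntpath.normpath(slash_path) == raw_path)

# Why it matters: "\t" in an ordinary string is a tab, not backslash + t.
tab_path = "c:\temp"
safe_path = r"c:\temp"
print(len(tab_path), len(safe_path))
```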

> dirs = os.listdir( path )


Warning, this will also list items that are not files (like
subdirectories). (hence "dirs" is a misleading name)
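
To illustrate, a sketch that keeps only regular files by checking each name with os.path.isfile. The helper name list_files and the temporary directory contents are assumptions made up for the demonstration.

```python
import os
import tempfile


def list_files(path):
    """Return the names in path that are regular files, sorted."""
    return sorted(
        name for name in os.listdir(path)
        # listdir() returns bare names, so join them with the directory.
        if os.path.isfile(os.path.join(path, name))
    )


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "this.txt"), "w").close()  # a regular file
    os.mkdir(os.path.join(tmp, "src"))                # a subdirectory
    entries = sorted(os.listdir(tmp))   # files and directories alike
    files = list_files(tmp)             # files only

print(entries)
print(files)
```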


> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf"," first.txt","that.pdf"}
> extensions = []
> for filename in filenames:
>     f = open(filename, "w")
>     f.write("Some text\n")
>     f.close()
>     name , ext = os.path.splitext(f.name)
>     extensions.append(ext)
>
> # This would print all the files and directories
> for file in dirs:
>     print(file)


This prints the file/directory /name/

NOTE: you grabbed the list of names BEFORE you created your test
data files, so...

>
>
>
> for ext, count in Counter(extensions).items():
>     print("Count for %s: " % ext, count)
>

.... this is not really a count of files grouped by extension IN the
directory -- this is only the count based on the file names you defined
to be created.

I'm not going to create test files, nor a test suite, and what I
have done is still too much... but...

-=-=-=-=-
import os
import collections

PATH = "e:/userdata/wulfraed/my documents/python progs"

fids = os.listdir(PATH)
fids.sort()

# width of the longest name, used to line up the listing
nmlen = max([len(f) for f in fids])
format = "%%%ss %%10s" % nmlen

cntr = collections.Counter()

for fid in fids:
    prefix, ext = os.path.splitext(fid)
    # print(...) with a single argument runs on Python 2 and 3 alike
    print(format % (prefix, ext))
    cntr.update([ext])

print("\n\n")

for ext, cnt in cntr.items():
    print("%10s %10s" % (ext, cnt))
-=-=-=-=-

.project
.pydevproject
.settings
ABA .py
ADC .py
BookList .zip
CGIServer
DGen .py
DiskCatalog .py
DiskCatalog .pyc
Dload .py
Firearms .csv
GWhist .py
HTML .py
Hanoi .py
Hanoi .pyc
HierHead .py
Intervals .py
MBX_Split .py
MySQLTest .py
MySQLTest .pyc
MySQLdb .html
MySQLdb_files
NIM1 .py
NumberPrinter .py
PhotoFrame .py
Probability .py
ProgressBar .py
ProgressBar2 .py
RandomScores .py
SQL .py
SQLiteTest .py
SampleData .txt
SampleFormat .tsv
Script1 .py
Script2 .py
Script3 .py
Script3 .pyc
Sociable_Chain .py
Sociable_Chain .pyc
Stereo .py
TAGS .py
azel_interp .py
binadd .py
binadd2 .py
bsddb-test .py
cgiform .py
chessclock .py
counter .py
counterthread .py
cp .py
data .txt
databasetest .py
databasetest2 .py
dbfail .py
dbg .py
dbg .pyc
dbtst .py
dirwalk .py
execsub .py
extractor .py
filecnt .py
filter .py
fulldicttest .py
h2b .py
h2b .pyc
headers .py
highScore .py
htmlparse .py
i2b .py
i2b .pyc
infile1 .tsv
infile2 .tsv
infile3 .tsv
int2wrd .py
int2wrd .pyc
int2wrd2 .py
int2wrd2 .pyc
intervalfile .txt
invoice .csv
junk .py
justify .py
linkedlist .py
llist .py
main .py
make_ou_class .py
make_ou_class .pyc
mileage .py
minmax .py
mofn .py
mofn.py .zip
movefiles .py
moving .py
mptest1 .py
myhtmlparser .py
myhtmlparser .pyc
mytest .py
mytest .pyc
node .py
node .pyc
pcdtojpeg .py
pst .py
queens1 .py
queens2 .py
queens2.py .zip
query .py
railroad .py
rpg .py
run .py
s .txt
sample .tsv
scramble .py
scratch .db
script1 .html
script1 .sql
script2 .html
setuptools-0.6c6-py2.4 .egg
sgml .py
spam .py
sqltest .py
sqrot .py
src
sub .py
sub_p1 .py
sub_p3 .py
sudoku .py
sudoku.py .bak
sudoku .pyc
summup_dict1
summup_dict2
summup_dict2b
summup_dict3
summup_list
t .dat
t .py
tabspace .py
tabspace .pyc
tdriver .py
test .csd
test .db
test .sql
test .txt
testABA .py
testABA .pyc
tgsetup .py
thread .py
threadsample .py
threadswap .py
timetest .py
timing .py
trips .dat
update_log
ut_00 .py
wordprob .py



12
.pyc 17
.bak 1
.sql 2
.tsv 5
.csv 2
.db 2
.dat 2
.py 98
.txt 5
.html 3
.csd 1
.egg 1
.zip 3
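
As for the original request (a function plus a separate module that tests it), one possible shape is sketched below. Everything here is an assumption rather than the required homework solution: the names count_extensions and CountExtensionsTest are invented, both parts are shown in one block for brevity, and the test builds its own temporary directory so it does not depend on any real path.

```python
import os
import tempfile
import unittest
from collections import Counter


def count_extensions(path="."):
    """Count the regular files in path, grouped by extension."""
    counts = Counter()
    # The directory is listed when the function is called, so files
    # created earlier in the same program are included.
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            counts[os.path.splitext(name)[1]] += 1
    return counts


class CountExtensionsTest(unittest.TestCase):
    def test_counts_files_by_extension(self):
        names = ["this.txt", "that.txt", "the_other.txt", "first.txt",
                 "this.doc", "that.doc", "this.pdf", "that.pdf"]
        with tempfile.TemporaryDirectory() as tmp:
            for name in names:
                with open(os.path.join(tmp, name), "w") as f:
                    f.write("Some text\n")
            # A directory whose name looks like a file must not be counted.
            os.mkdir(os.path.join(tmp, "subdir.txt"))
            self.assertEqual(dict(count_extensions(tmp)),
                             {".txt": 4, ".doc": 2, ".pdf": 2})
```

In practice the test class would live in its own file that imports count_extensions, and could be run with python -m unittest.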
--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/

 