Function to examine the contents of a directory


Tigerstyle

Hi guys,

I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.).

This is the code so far:
--
import os

path = "v:\\workspace\\Python2_Homework03\\src\\"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name, ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

for ext in extensions:
    print("Count for %s: " % ext, extensions.count(ext))

--

When I try to get the module to print how many files have each extension, it prints the count for each extension multiple times, like this:

this.pdf
the_other.txt
this.doc
that.txt
this.txt
that.pdf
first.txt
that.doc
Count for .pdf: 2
Count for .txt: 4
Count for .doc: 2
Count for .txt: 4
Count for .txt: 4
Count for .pdf: 2
Count for .txt: 4
Count for .doc: 2

Any help is appreciated.

T
 

Ian Foote

extensions = []
Try using a set here instead of a list:
extensions = set()
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name, ext = os.path.splitext(f.name)
    extensions.append(ext)
and use:
    extensions.add(ext)

This should take care of duplicates for you.
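One caveat with the set version: a set has no count() method, so the final extensions.count(ext) call would fail once extensions is a set. A minimal sketch, assuming you keep the list for counting and use a set only to visit each distinct extension once (distinct_counts is a hypothetical helper name):

```python
def distinct_counts(extensions):
    """Return (extension, count) pairs, one per distinct extension."""
    # set() drops the repeats; sorted() gives a stable, readable order.
    # list.count() still needs the original list, duplicates and all.
    return [(ext, extensions.count(ext)) for ext in sorted(set(extensions))]


exts = [".txt", ".doc", ".txt", ".pdf", ".txt"]
for ext, count in distinct_counts(exts):
    print("Count for %s:" % ext, count)
```

Each list.count() call scans the whole list, which is fine for a directory's worth of names.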

Regards,
Ian
 

MRAB

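Your output repeats because each extension can occur multiple times in the list, and the loop prints one line for every occurrence rather than one per distinct extension.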

Try the Counter class:

from collections import Counter

for ext, count in Counter(extensions).items():
    print("Count for %s: " % ext, count)
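A slightly fuller sketch of the Counter approach, assuming the extensions come straight from a list of file names (count_by_extension is a hypothetical helper). Counter.most_common() returns the pairs ordered from most to least frequent, which often makes a nicer report than items():

```python
import os
from collections import Counter


def count_by_extension(names):
    """Count names by extension; a name without one counts under ''."""
    # os.path.splitext("this.txt") returns ("this", ".txt")
    return Counter(os.path.splitext(name)[1] for name in names)


names = ["this.txt", "that.txt", "the_other.txt", "this.doc",
         "that.doc", "this.pdf", "first.txt", "that.pdf"]
counts = count_by_extension(names)

# most_common() yields (extension, count) pairs, most frequent first
for ext, count in counts.most_common():
    print("Count for %s:" % ext, count)
```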
 

Tigerstyle

Thanks, just what I was looking for :)

T

(Replying to MRAB's message of Thursday, 6 September 2012, 17:20:27 UTC+2.)
 

Chris Angelico

I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)

If you haven't already, look into the Python 'dict' type; you may find
it easier to work with for this sort of job. You can map an extension
("txt") to its count (4) directly.
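A minimal sketch of the dict approach, assuming a plain list of names as input (count_with_dict is a hypothetical name). The keys keep the leading dot (".txt") because that is what os.path.splitext returns:

```python
import os


def count_with_dict(names):
    """Map each extension to the number of names that carry it."""
    counts = {}
    for name in names:
        ext = os.path.splitext(name)[1]
        # dict.get returns 0 the first time an extension is seen
        counts[ext] = counts.get(ext, 0) + 1
    return counts


counts = count_with_dict(["this.txt", "that.txt", "this.doc"])
for ext in sorted(counts):
    print("Count for %s:" % ext, counts[ext])
```

This is essentially what collections.Counter does for you; writing it out once shows where the counts come from.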

ChrisA
 

Tigerstyle

Ok I'm now totally stuck.

This is the code:

---
import os
from collections import Counter

path = ":c\\mypath\dir"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name, ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

for ext, count in Counter(extensions).items():
    print("Count for %s: " % ext, count)

---

I need to turn this module into a function and write a separate module that tests the function to verify that it gives correct results.

Help and pointers are much appreciated.

T
 

Dennis Lee Bieber

Ok I'm now totally stuck.

This is the code:
This code is full of errors...
---
import os
from collections import Counter

path = ":c\\mypath\dir"

Not a valid Windows path. The format should be "c:\mypath\dir"
(actually, to use \ you should probably declare it a raw string, r"c:\mypath\dir";
much simpler, since Python's file functions on Windows accept / as well, is to
use /, as in "c:/mypath/dir")
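To illustrate the point about separators, a small sketch: the escaped and raw spellings produce the same string, and ntpath (the Windows flavour of os.path, importable on any platform) normalizes the forward-slash spelling to that same path:

```python
import ntpath

escaped = "c:\\mypath\\dir"  # each backslash doubled in an ordinary string
raw = r"c:\mypath\dir"       # raw string: backslashes are taken literally
forward = "c:/mypath/dir"    # forward slashes, which Windows also accepts

print(escaped == raw)                   # True: the same characters
print(ntpath.normpath(forward) == raw)  # True: same path once normalized
```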
dirs = os.listdir( path )

Warning: this will also list items that are not files (like
subdirectories), hence "dirs" is a misleading name.
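A minimal sketch of keeping only regular files, assuming you want to skip subdirectories entirely (list_files is a hypothetical helper). os.listdir returns bare names, so each one has to be joined to the directory before os.path.isfile can test it:

```python
import os
import tempfile


def list_files(path):
    """Return the sorted names in path that refer to regular files."""
    # os.listdir gives bare names, so join each one with path before testing
    return sorted(name for name in os.listdir(path)
                  if os.path.isfile(os.path.join(path, name)))


# Demo in a throwaway directory holding one file and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("Some text\n")
    os.mkdir(os.path.join(tmp, "subdir"))
    print(sorted(os.listdir(tmp)))  # both entries are listed
    demo = list_files(tmp)
    print(demo)                     # only the regular file remains
```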

filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name, ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

This prints the file/directory /name/

NOTE: you grabbed the list of names BEFORE you created your test
data files, so...
for ext, count in Counter(extensions).items():
    print("Count for %s: " % ext, count)
.... this is not really a count of files grouped by extension IN the
directory -- this is only the count based on the file names you defined
to be created.
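A minimal sketch of doing it in the right order, assuming a throwaway directory from tempfile rather than your real working directory: create the test files first, then list the directory and count what is actually there. TEST_NAMES is a hypothetical name for the test data taken from the thread:

```python
import os
import tempfile
from collections import Counter

# Hypothetical test data, mirroring the names used in the thread
TEST_NAMES = ["this.txt", "that.txt", "the_other.txt", "this.doc",
              "that.doc", "this.pdf", "first.txt", "that.pdf"]

with tempfile.TemporaryDirectory() as tmp:
    # 1. Create the files first...
    for name in TEST_NAMES:
        with open(os.path.join(tmp, name), "w") as f:
            f.write("Some text\n")
    # 2. ...then read the directory, so the listing sees them
    listing = os.listdir(tmp)
    counts = Counter(os.path.splitext(name)[1] for name in listing)

for ext, count in sorted(counts.items()):
    print("Count for %s:" % ext, count)
```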

I'm not going to create test files, nor a test suite, and what I
have done is still too much... but...

-=-=-=-=-
import os
import collections

PATH = "e:/userdata/wulfraed/my documents/python progs"

fids = os.listdir(PATH)

fids.sort()

# width of the longest name, used to right-align the first column
nmlen = max([len(f) for f in fids])

# builds a format string such as "%23s %10s"
fmt = "%%%ss %%10s" % nmlen

cntr = collections.Counter()

for fid in fids:
    prefix, ext = os.path.splitext(fid)
    print(fmt % (prefix, ext))
    cntr.update([ext])

print("\n\n")

for ext, cnt in cntr.items():
    print("%10s %10s" % (ext, cnt))
-=-=-=-=-

.project
.pydevproject
.settings
ABA .py
ADC .py
BookList .zip
CGIServer
DGen .py
DiskCatalog .py
DiskCatalog .pyc
Dload .py
Firearms .csv
GWhist .py
HTML .py
Hanoi .py
Hanoi .pyc
HierHead .py
Intervals .py
MBX_Split .py
MySQLTest .py
MySQLTest .pyc
MySQLdb .html
MySQLdb_files
NIM1 .py
NumberPrinter .py
PhotoFrame .py
Probability .py
ProgressBar .py
ProgressBar2 .py
RandomScores .py
SQL .py
SQLiteTest .py
SampleData .txt
SampleFormat .tsv
Script1 .py
Script2 .py
Script3 .py
Script3 .pyc
Sociable_Chain .py
Sociable_Chain .pyc
Stereo .py
TAGS .py
azel_interp .py
binadd .py
binadd2 .py
bsddb-test .py
cgiform .py
chessclock .py
counter .py
counterthread .py
cp .py
data .txt
databasetest .py
databasetest2 .py
dbfail .py
dbg .py
dbg .pyc
dbtst .py
dirwalk .py
execsub .py
extractor .py
filecnt .py
filter .py
fulldicttest .py
h2b .py
h2b .pyc
headers .py
highScore .py
htmlparse .py
i2b .py
i2b .pyc
infile1 .tsv
infile2 .tsv
infile3 .tsv
int2wrd .py
int2wrd .pyc
int2wrd2 .py
int2wrd2 .pyc
intervalfile .txt
invoice .csv
junk .py
justify .py
linkedlist .py
llist .py
main .py
make_ou_class .py
make_ou_class .pyc
mileage .py
minmax .py
mofn .py
mofn.py .zip
movefiles .py
moving .py
mptest1 .py
myhtmlparser .py
myhtmlparser .pyc
mytest .py
mytest .pyc
node .py
node .pyc
pcdtojpeg .py
pst .py
queens1 .py
queens2 .py
queens2.py .zip
query .py
railroad .py
rpg .py
run .py
s .txt
sample .tsv
scramble .py
scratch .db
script1 .html
script1 .sql
script2 .html
setuptools-0.6c6-py2.4 .egg
sgml .py
spam .py
sqltest .py
sqrot .py
src
sub .py
sub_p1 .py
sub_p3 .py
sudoku .py
sudoku.py .bak
sudoku .pyc
summup_dict1
summup_dict2
summup_dict2b
summup_dict3
summup_list
t .dat
t .py
tabspace .py
tabspace .pyc
tdriver .py
test .csd
test .db
test .sql
test .txt
testABA .py
testABA .pyc
tgsetup .py
thread .py
threadsample .py
threadswap .py
timetest .py
timing .py
trips .dat
update_log
ut_00 .py
wordprob .py



12
.pyc 17
.bak 1
.sql 2
.tsv 5
.csv 2
.db 2
.dat 2
.py 98
.txt 5
.html 3
.csd 1
.egg 1
.zip 3
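For the part left out above, a minimal sketch of the function plus a test suite, assuming the function should skip subdirectories and report names without an extension under "". In practice count_extensions (a hypothetical name) would live in its own module and the test class in a separate module that imports it; they are shown together here so the block runs on its own. Each test builds its own temporary directory, so the results do not depend on what happens to be in the current one:

```python
import os
import tempfile
import unittest
from collections import Counter


def count_extensions(path):
    """Return a Counter of extensions for the regular files in path."""
    counts = Counter()
    for name in os.listdir(path):
        # Skip subdirectories and anything else that is not a regular file
        if os.path.isfile(os.path.join(path, name)):
            counts[os.path.splitext(name)[1]] += 1
    return counts


class CountExtensionsTest(unittest.TestCase):
    def setUp(self):
        # A fresh, empty directory for every test method
        self._tmp = tempfile.TemporaryDirectory()
        self.path = self._tmp.name

    def tearDown(self):
        self._tmp.cleanup()

    def make(self, *names):
        """Create small text files with the given names."""
        for name in names:
            with open(os.path.join(self.path, name), "w") as f:
                f.write("Some text\n")

    def test_empty_directory(self):
        self.assertEqual(count_extensions(self.path), Counter())

    def test_counts_each_extension(self):
        self.make("this.txt", "that.txt", "first.txt", "this.doc", "that.pdf")
        self.assertEqual(count_extensions(self.path),
                         Counter({".txt": 3, ".doc": 1, ".pdf": 1}))

    def test_ignores_subdirectories(self):
        self.make("a.txt")
        # A directory whose name looks like it has an extension
        os.mkdir(os.path.join(self.path, "sub.dir"))
        self.assertEqual(count_extensions(self.path), Counter({".txt": 1}))

    def test_name_without_extension(self):
        self.make("README")
        self.assertEqual(count_extensions(self.path), Counter({"": 1}))
```

Saved as a module, the tests can be run with python -m unittest followed by the module name.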
 
