Bruno Desthuilliers a écrit :
Kent Johnson a écrit :
It should be
str_counts = dict((s, str_list.count(s)) for s in set(str_list))
Of course, my bad
from sets import Set as set
str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])
Note that this solution iterates str_list once for each element of
str_list
once for each *distinct* element or str_list (it iterates over the set
created from str_list).
- the call to count traverses the entire list to create the count. I
expect Paul Rubin's solution will be dramatically faster for large
lists as it only iterates str_list once.
Yeps. But I would benchmark it before choosing one or the other solution.
And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.
nb: A is mine, B is Paul's and C is bearophile's, and the number after
is the size of the list...
A100 (10000 times): 1.5801050663
B100 (10000 times): 1.87287902832
C100 (10000 times): 0.991976976395
A10000 (100 times): 1.083589077
B10000 (100 times): 1.30713891983
C10000 (100 times): 0.988032817841
A1000000 (10 times): 10.5345788002
B1000000 (10 times): 13.094493866
C1000000 (10 times): 9.67438292503
source:
def countA(lst):
return dict((s, lst.count(s)) for s in set(lst))
def countB(lst):
counts = {}
for s in lst:
counts
= counts.get(s, 0)+1
return counts
def countC(lst):
counts = {}
for s in lst:
if s in counts:
counts += 1
else:
counts = 1
return counts
def mklst(ln):
from random import choice
items = ['abc', 'def', 'ghi', 'jkl', 'mno',
'pqr', 'stu', 'vwx', 'yz']
return [choice(items) for i in range(ln)]
lst100 = mklst(100)
lst10000 = mklst(10000)
lst1000000 = mklst(1000000)
def run():
from timeit import Timer
timers = [
('A100',
Timer('countA(lst100)',
'from __main__ import countA, lst100'),
10000),
('B100',
Timer('countB(lst100)',
'from __main__ import countB, lst100'),
10000),
('C100',
Timer('countC(lst100)',
'from __main__ import countC, lst100'),
10000),
('A10000',
Timer('countA(lst10000)',
'from __main__ import countA, lst10000'),
100),
('B10000',
Timer('countB(lst10000)',
'from __main__ import countB, lst10000'),
100),
('C10000',
Timer('countC(lst10000)',
'from __main__ import countC, lst10000'),
100),
('A1000000',
Timer('countA(lst1000000)',
'from __main__ import countA, lst1000000'),
10),
('B1000000',
Timer('countB(lst1000000)',
'from __main__ import countB, lst1000000'),
10),
('C1000000',
Timer('countC(lst1000000)',
'from __main__ import countC, lst1000000'),
10),
]
for name, timer, repeat in timers:
print "%s (%s times): %s" % (name,
repeat,
timer.timeit(repeat))