Mathias Mamsch
Hi,
I have a text of about 1 million words, and I want to count the words and put
them into a sorted list
like " list = [(most-common-word,1001),(2nd-word,986), ...] "
I think about 10% of the words (about 100,000) are distinct.
I am wondering if you can suggest something faster than my approach.
My first straightforward approach was:
----
s = "Hello this is my 1 million word text"
s2 = s.split()
dict = {}
for i in s2: # the loop needs 10s
    if dict.has_key(i):
        dict[i] += 1
    else:
        dict[i] = 1
list = dict.items()
# this is slow (the comparison function puts the highest counts first):
list.sort(lambda x,y: 2*(x[1] < y[1])-1)
----
That works, but I wonder if there is a faster, more elegant way to do this
....
Thanks for your interest,
Mathias Mamsch
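
For comparison, here is a minimal sketch of one possible alternative to the loop and comparison-function sort above. It assumes a Python that ships collections.Counter (the snippet is written for Python 3); the function name word_frequencies and the sample sentence are made up for illustration and are not from the original post. Whether it is actually faster on the 1 million word text would have to be measured.

```python
from collections import Counter


def word_frequencies(text):
    """Return a list of (word, count) pairs, most frequent word first."""
    # Counter tallies each whitespace-separated word in a single pass.
    counts = Counter(text.split())
    # most_common() with no argument returns every pair, ordered by
    # descending count, so no hand-written comparison function is needed.
    return counts.most_common()


# Hypothetical sample input, standing in for the real text.
sample = "the cat and the dog and the bird"
freq = word_frequencies(sample)
print(freq[:2])
```

The result has the same shape as the list described in the question, so freq[0] is the most common word together with its count.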