Thorsten Kampe said:
* Steven D'Aprano (06 Aug 2009 19:17:30 GMT)
What if you're writing a loop which takes one million different lines of
text and decodes them once each?
>>> import timeit
>>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
>>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
>>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
>>> t1.timeit(number=1)
5.6751680374145508
>>> t2.timeit(number=1)
2.6822888851165771
Seems like a pretty meaningful difference to me.
Bollocks. No one will even notice whether a code sequence runs 2.7 or
5.7 seconds. That's completely artificial benchmarking.
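The quoted benchmark is Python 2 code (`unicode`, `xrange`). On Python 3, where `unicode()` no longer exists, a roughly analogous comparison is `bytes.decode()` versus the `str(b, encoding)` constructor. The sketch below is an assumed port, not Steven's original benchmark: it builds the same shape of data as bytes and times both spellings with `timeit`. The helper name `compare_decoders` is hypothetical, the default of one million lines needs a couple of hundred MB of RAM, and no particular numbers are claimed since they depend on machine and interpreter version.

```python
import timeit

# Same shape of data as the Python 2 setup above, but as bytes objects,
# because in Python 3 text (str) and bytes are separate types.
SETUP = 'L = [b"abc" * (n % 100) for n in range(N)]'


def compare_decoders(n=1000000, number=1):
    """Time bytes.decode() against str(bytes, encoding) over n lines.

    Returns a dict mapping a label to the elapsed seconds.
    """
    env = {"N": n}  # supplies N to the setup statement via timeit's globals
    stmts = {
        "line.decode('utf-8')": 'for line in L: line.decode("utf-8")',
        "str(line, 'utf-8')": 'for line in L: str(line, "utf-8")',
    }
    return {
        label: timeit.timeit(stmt, SETUP, number=number, globals=env)
        for label, stmt in stmts.items()
    }


if __name__ == "__main__":
    for label, seconds in compare_decoders().items():
        print(f"{label:24} {seconds:.3f} s")
```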
For a real-life example: I often have a file with one word per line, and
I run Python scripts to apply some (sometimes fairly trivial)
transformation over it. Here is a REAL example: reading lines of word, lemma
and tag separated by tabs from stdin, and writing the word to stdout unless it
starts with '<' (~6e5 lines, python2.5, user times, warm cache; I hope
the comments are self-explanatory):
variant                                          user time
no unicode                                       0m2.380s
decode('utf-8'), encode('utf-8')                 0m3.560s
codecs wrappers on stdin/stdout (*)              0m6.180s
unicode(line, 'utf8'), encode('utf-8')           0m3.820s
unicode(line, 'utf-8'), encode('utf-8')          0m2.880s
python3.1                                        0m1.560s

(*) sys.stdout = codecs.getwriter('utf-8')(sys.stdout); sys.stdin = codecs.getreader('utf-8')(sys.stdin)
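The script being timed is described only in prose, so what follows is a hypothetical Python 3 reconstruction, not the code that produced the numbers above. It reads tab-separated word, lemma and tag lines and emits the first field, skipping lines that start with '<' (reading "it" in the description as the whole line is an assumption). The function name `extract_words` and the split between pure logic and stdin/stdout handling, which lets the filter be tested without pipes, are also assumptions.

```python
import sys


def extract_words(lines):
    """Yield the word column from 'word<TAB>lemma<TAB>tag' lines.

    Lines starting with '<' are skipped. Hypothetical reconstruction
    of the filter described in the post.
    """
    for line in lines:
        if line.startswith("<"):
            continue
        # The word is everything before the first tab; rstrip handles
        # lines that carry only a word plus the trailing newline.
        yield line.split("\t", 1)[0].rstrip("\n")


def main():
    # In Python 3, sys.stdin and sys.stdout are already text streams
    # decoded with the locale encoding (a UTF-8 locale in the post).
    for word in extract_words(sys.stdin):
        sys.stdout.write(word + "\n")


if __name__ == "__main__":
    main()
```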
Since I have something like 18 million words in my current project (and
more than 600 million overall), and I often tweak some parameters and re-run
the transformations, the differences are pretty significant.
Personally, I have been surprised by:
1) the bad performance of the codecs wrapper (I expected it to be on par with
unicode(x, 'utf-8'), maybe slightly better due to fewer function calls)
2) the good performance of python3.1 (utf-8 locale)
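Surprise 1 can be probed in isolation. The sketch below is a Python 3 sketch, not the Python 2.5 setup measured above: in-memory `io.BytesIO` streams stand in for the real stdin (an assumption so it runs without a file), and the helper names and sample data are hypothetical. It decodes the same UTF-8 bytes once through a `codecs.getreader('utf-8')` stream wrapper and once by iterating over raw byte lines and calling `.decode('utf-8')` on each, checks that both give identical text, and then times them.

```python
import codecs
import io
import timeit


def via_codecs_reader(data):
    """Decode line by line through a codecs StreamReader wrapper."""
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    return [line for line in reader]


def via_explicit_decode(data):
    """Iterate over raw byte lines and decode each one explicitly."""
    return [line.decode("utf-8") for line in io.BytesIO(data)]


def compare(n_lines=100000, number=3):
    """Return elapsed seconds for each approach on hypothetical sample data."""
    # Short tab-separated lines, including some non-ASCII text.
    sample = "slovo\tslovo\tNN\ncafé\tcafé\tNN\n".encode("utf-8")
    data = sample * (n_lines // 2)
    # Both approaches must produce the same decoded lines.
    assert via_codecs_reader(data) == via_explicit_decode(data)
    return {
        "codecs.getreader": timeit.timeit(lambda: via_codecs_reader(data), number=number),
        "bytes.decode": timeit.timeit(lambda: via_explicit_decode(data), number=number),
    }
```

Calling `compare()` and printing the dict gives the two timings; whether the wrapper is still slower on a current interpreter is left to the reader to measure.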
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------