Serhiy Storchaka
Indexing is O(0) for any string.
Typo. O(1)
- Unfortunately, I got the opposite, and even much worse, results on my Windows
box, considering
- libfrancais is one of my modules, and it does a little bit more than the
std sorting tools.
My rationale: very simple.
1) I have never heard of anything better than sticking with one of the
Unicode coding schemes. (general theory)
the "new" Py 3.3 algorithm. I'm not the only one who noticed
problems.
Arguing "it is fast enough" is not a correct answer.
I see that this misconception is widespread.
In fact Python 3.3 uses four kinds of ready strings.
* ASCII. All codes <= U+007F.
* UCS1. All codes <= U+00FF, at least one code > U+007F.
* UCS2. All codes <= U+FFFF, at least one code > U+00FF.
* UCS4. All codes <= U+0010FFFF, at least one code > U+FFFF.
Indexing is O(1) for any string.
Also the string can optionally cache UTF-8 and wchar_t* representations.
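A rough way to see the four widths from Python itself, assuming CPython 3.3 or later. The sizes are an implementation detail, and `bytes_per_char` is just a name picked for this sketch:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per code point by growing a string of `ch` by n characters."""
    # The fixed object header cancels out in the difference.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = {
    "ASCII": "a",           # U+0061
    "UCS1": "\u00e9",       # U+00E9, Latin-1 range
    "UCS2": "\u20ac",       # U+20AC, inside the BMP
    "UCS4": "\U0001f600",   # U+1F600, outside the BMP
}
widths = {kind: bytes_per_char(ch) for kind, ch in samples.items()}
print(widths)
```

On CPython 3.3+ I would expect 1, 1, 2 and 4 bytes per character respectively. ASCII and UCS1 differ only in the header, not in the width.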
I am not familiar enough with the C implementation to tell what Python
3.3 actually does, and the PEP assumes a fair amount of familiarity with
the CPython source. So I welcome corrections.
Where UCS1 is equivalent to Latin-1, correct?
UCS2 is what Python 3.2 narrow builds use for all strings, including
codes > U+FFFF using surrogate pairs.
UCS4 is what Python 3.2 wide builds use for all strings.
This means that Python 3.3 will no longer have surrogate pairs.
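This is easy to check from the interpreter. A small sketch, assuming Python 3.3 or later (on a 3.2 narrow build the length would come out as 2). Lone surrogates can still be created explicitly, so this only shows that a character above U+FFFF is no longer stored as a pair:

```python
import sys

astral = "\U0001f600"   # one code point above U+FFFF

# On 3.3+ this is a single element of the string, not a surrogate pair.
length = len(astral)
code_point = ord(astral[0])

# Surrogates still appear when encoding to UTF-16: 2 code units = 4 bytes.
utf16_bytes = len(astral.encode("utf-16-le"))

print(sys.version_info[:2], length, hex(code_point), utf16_bytes)
```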
Hmm, and with locale.strxfrm Python 3.3 is 20% slower than 3.2.
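For anyone who wants to reproduce that kind of figure, here is a minimal timing sketch to run under both interpreters. The word list and the helper name `time_strxfrm_sort` are made up for illustration and are not the data behind the 20% number:

```python
import locale
import timeit

def time_strxfrm_sort(words, repeat=5, number=200):
    """Best-of-`repeat` time in seconds for sorting `words` with locale.strxfrm as key."""
    timer = timeit.Timer(lambda: sorted(words, key=locale.strxfrm))
    return min(timer.repeat(repeat=repeat, number=number))

try:
    # Use the user's default collation rules; keep the current locale if that fails.
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass

# Hypothetical sample: a few French words, repeated to get a measurable run time.
sample = ["zèbre", "abricot", "été", "cote", "côte", "coté", "étage"] * 50
elapsed = time_strxfrm_sort(sample)
print("%.4f s" % elapsed)
```

Taking the minimum of several repeats reduces noise from other processes. The absolute numbers depend heavily on the platform's C library and the active locale.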
On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
With a memory gain = 0 since my text contains non-Latin-1 characters!
jmf
If you are *seriously* interested in debugging why string code is slower
for you, you can start by running the full suite of Python string
benchmarks: see the stringbench benchmark in the Tools directory of
source installations, or see here:
http://hg.python.org/cpython/file/8ff2f4634ed8/Tools/stringbench
This means that Python 3.3 will no longer have surrogate pairs.
Am I right?
I can't confirm this. At least users of wide builds will see a decrease in
memory use:
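A small sketch of both sides of this point, assuming CPython 3.3 or later. The 4-bytes-per-character figure for a 3.2 wide build is computed here, not measured, and the sample text is made up:

```python
import sys

ascii_text = "abc" * 1000            # pure ASCII: 1 byte per character
mixed_text = ascii_text + "\u20ac"   # one euro sign forces 2 bytes per character

ascii_size = sys.getsizeof(ascii_text)
mixed_size = sys.getsizeof(mixed_text)

# Payload a fixed 4-byte-per-character layout would need (header not counted).
ucs4_payload = 4 * len(mixed_text)

print(ascii_size, mixed_size, ucs4_payload)
```

A single non-Latin-1 character roughly doubles the size of otherwise ASCII text, which is about what a 2-byte narrow build used anyway. It is still about half of what a fixed 4-byte layout needs.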
http://hg.python.org/cpython/file/default/Tools/stringbench
However, stringbench is not a good tool to measure the effectiveness of
the new string representation, because it focuses mainly on ASCII strings
and on comparing strings with bytes.
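To cover the non-ASCII cases as well, one can time the same operation on a string of each kind. This is only a sketch and not part of stringbench. `time_by_kind` and the filler characters are names and values chosen for illustration:

```python
import timeit

def time_by_kind(operation, n=10000, number=200):
    """Time `operation(text)` on a text of each internal kind; returns seconds per kind."""
    fillers = {"ASCII": "a", "UCS1": "\u00e9", "UCS2": "\u20ac", "UCS4": "\U0001f600"}
    results = {}
    for kind, ch in fillers.items():
        text = ch * n + "x"   # the needle sits at the very end
        results[kind] = min(timeit.repeat(lambda: operation(text), repeat=3, number=number))
    return results

timings = time_by_kind(lambda s: s.find("x"))
for kind, seconds in timings.items():
    print("%-5s %.6f s" % (kind, seconds))
```

Swapping the lambda for another operation (`s.upper()`, `s.encode("utf-8")`, slicing) gives a quick per-kind comparison to run under both 3.2 and 3.3.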