sorted() erratically fails to sort string numbers

uuid

I would be very interested in a logical explanation of why this happens
on Python 2.5.1:

In order to sort an etree by the .text value of one child, I adapted
this snippet from effbot.org:
import xml.etree.ElementTree as ET

tree = ET.parse("data.xml")

def getkey(elem):
    return elem.findtext("number")

container = tree.find("entries")

container[:] = sorted(container, key=getkey)

tree.write("new-data.xml")

While working with a moderately sized XML file (2500 entries to sort
by), I found that a few elements were not in order. It seems that
numbers with seven digits were sorted correctly, while those with six
digits were just added at the end.

I fixed the problem by converting the numbers to int in the callback:
def getkey(elem):
    return int(elem.findtext("number"))

So to my naive mind, it seems as if there were some error in the
sorted() function. Would anyone be so kind as to explain why this could
be happening? Thanks in advance!
 
Andre Engels

When sorting strings, including strings that represent numbers,
sorting is done alphabetically, character by character. Two numbers
with the same number of digits therefore sort in their normal numeric
order, but any number starting with "1" comes before any number
starting with "2", whether the leading digit denotes units, tens,
hundreds or millions. Thus:

"1" < "15999" < "16" < "2"
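
A minimal sketch of the effect, using made-up values rather than the data from the original file: string keys compare character by character, so a six-digit string that begins with a larger digit lands after every seven-digit string that begins with "1". Converting the key to int restores numeric order.

```python
# Made-up sample values; the original data.xml is not shown in the thread.
numbers = ["1500000", "200000", "1000000", "999999", "16", "15999"]

# String comparison is lexicographic: "1..." sorts before "2..." and "9...",
# regardless of how many digits follow.
as_strings = sorted(numbers)

# Using int as the key compares by numeric value instead.
as_ints = sorted(numbers, key=int)

print(as_strings)
print(as_ints)
```

With these values the six-digit strings "200000" and "999999" end up last in the string sort, which matches the symptom described in the question.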
 
uuid

I am at the same time impressed with the concise answer and
disheartened by my inability to see this myself.
My heartfelt thanks!
 
John Posner

uuid said:
I am at the same time impressed with the concise answer and
disheartened by my inability to see this myself.
My heartfelt thanks!

Don't be disheartened! Many people -- myself included, absolutely! --
occasionally let a blind spot show in their messages to this list. BTW:

container[:] = sorted(container, key=getkey)

... is equivalent to:

container.sort(key=getkey)

(unless I'm showing *my* blind spot here)
 
uuid

John Posner said:
Don't be disheartened! Many people -- myself included, absolutely! --
occasionally let a blind spot show in their messages to this list.

Thanks for the encouragement :)

John Posner said:
BTW:

container[:] = sorted(container, key=getkey)

... is equivalent to:

container.sort(key=getkey)

(unless I'm showing *my* blind spot here)

I don't think etree element objects support the .sort method.
At least in lxml they don't
(http://codespeak.net/lxml/api/elementtree.ElementTree.Element-class.html)
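
This can be checked directly. The sketch below assumes the standard-library xml.etree.ElementTree (not lxml) and uses an invented in-memory document in place of data.xml. It tests whether the element offers a sort attribute, then reorders the children with slice assignment, which works whether or not it does.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for data.xml; element names follow the snippet in the thread.
root = ET.fromstring(
    "<root><entries>"
    "<entry><number>200000</number></entry>"
    "<entry><number>1000000</number></entry>"
    "<entry><number>15999</number></entry>"
    "</entries></root>"
)
container = root.find("entries")

def getkey(elem):
    # Compare numerically rather than as text.
    return int(elem.findtext("number"))

# True only if this Element implementation offers list-style sorting.
has_sort = hasattr(container, "sort")

# Slice assignment replaces the children with the sorted sequence.
container[:] = sorted(container, key=getkey)
ordered = [e.findtext("number") for e in container]
print(has_sort, ordered)
```

If has_sort prints False, container.sort(key=getkey) would raise AttributeError and the slice-assignment form is the one to keep.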
 
Lie Ryan

John said:
BTW:

container[:] = sorted(container, key=getkey)

... is equivalent to:

container.sort(key=getkey)

Equivalent, and in fact better since the sorting is done in-place
instead of creating a new list, then overwriting the old one.
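
For plain Python lists the difference looks like the sketch below (values invented for illustration). Whether it carries over to etree elements depends on the element class actually offering sort(), which uuid's post above questions.

```python
values = ["16", "2", "15999", "1"]

# sorted() builds and returns a new list; the original is left untouched.
new_list = sorted(values, key=int)
same_object_after_sorted = new_list is values

# list.sort() reorders the existing list in place and returns None.
result = values.sort(key=int)

print(same_object_after_sorted, result, values)
```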
 
