Roy Smith
I stumbled upon an interesting bit of trivia concerning lists and list
comprehensions today.
We use mongoengine as a database model layer. A mongoengine query
returns an iterable object called a QuerySet. The "obvious" way to
create a list of the query results would be:
    my_objects = list(my_query_set)
and, indeed, that works. But, then I found this code:
    my_objects = [obj for obj in my_query_set]
which seemed a bit silly. I called over the guy who wrote it and asked
him why he didn't just write it using list(). I was astounded when it
turned out there's a good reason!
Apparently, list() has an "optimization" where it calls len() on its
argument to try to discover the number of items it's going to put into
the list. Presumably, list() uses this information to pre-allocate the
right amount of memory up front, avoiding any resizing. If len()
fails, it falls back to just iterating and resizing as needed.
Normally, this would be a win.
The problem is, QuerySets have a __len__() method. Calling it is a lot
faster than iterating over the whole query set and counting the items,
but it does result in an additional database query, which is a lot
slower than the list resizing! Writing the code as a list comprehension
bypasses list() entirely, so nothing tries to optimize when it shouldn't!
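
To see the difference without a database, here is a minimal sketch using a
hypothetical stand-in class. CountingIterable is not part of mongoengine; it
only counts how often __len__() is called, which is where a QuerySet would
issue its extra query. The sketch assumes CPython, where list() asks its
argument for a length before iterating; other implementations may differ.

```python
class CountingIterable:
    """Hypothetical stand-in for a QuerySet (not mongoengine code)."""

    def __init__(self, items):
        self._items = list(items)
        self.len_calls = 0

    def __len__(self):
        # With a real QuerySet, this is where the additional
        # database query described above would be issued.
        self.len_calls += 1
        return len(self._items)

    def __iter__(self):
        # Hand back a plain iterator over the stored items.
        return iter(self._items)


# Build the list with list(): CPython asks for a length first.
via_list = CountingIterable(range(5))
result_list = list(via_list)

# Build the list with a comprehension: only iteration happens.
via_comp = CountingIterable(range(5))
result_comp = [obj for obj in via_comp]

print("list() called __len__:", via_list.len_calls, "time(s)")
print("comprehension called __len__:", via_comp.len_calls, "time(s)")
```

On CPython the first count should come out nonzero and the second should be
zero, while both approaches produce the same list.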