A
Aaron Watters
Poking around I discovered somewhere someone saying that
Python gc adds a 4-7% speed penalty.
So since I was pretty sure I was not creating
reference cycles in nucular I tried running the tests with garbage
collection disabled.
To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.
I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.
Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no down side if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.
Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
Just my 2c.
-- Aaron Watters
nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=dingus fish
Python gc adds a 4-7% speed penalty.
So since I was pretty sure I was not creating
reference cycles in nucular I tried running the tests with garbage
collection disabled.
To my delight I found that index builds run 30-40% faster without
gc. This is really nice because testing gc.collect() afterward
shows that gc was not actually doing anything.
I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.
Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no down side if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.
Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
Just my 2c.
-- Aaron Watters
nucular full text fielded indexing: http://nucular.sourceforge.net
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=dingus fish