Kretch said:
I was wondering whether find_by_name has linear complexity O(n) with n
being the number of records? Or is it log(n) or constant? I have some
code which uses find_by_name heavily, and I was wondering if the long
runtimes I'm seeing are a result of find_by_name being linear.
Assuming you're talking about Rails and ActiveRecord (this is a *Ruby*
group here; there are better places to ask Rails questions)....
find_by_name turns into an SQL "select * from <table> where name = 'value'".
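For example (a rough sketch - the exact SQL varies by adapter and Rails
version, and "User"/"users" here are just hypothetical names):

    # find_by_name is a dynamic finder that ActiveRecord generates for you.
    user = User.find_by_name('alice')
    # Roughly equivalent to the database running:
    #   SELECT * FROM users WHERE name = 'alice' LIMIT 1
    # with the first returned row wrapped in a User object.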
If you have query caching enabled (the default within a request), this
call might be avoided entirely and served from an in-memory hash lookup...
but assuming that isn't the case, the costs come down to several factors:
Any call to SQL has a large almost-constant overhead, simply due to the
context-switch involved. A remote procedure call (whether transported
using shared memory, pipes or sockets) involves upward of 100,000
instructions, and from Ruby, possibly much more - especially if several
layers of database adapters are involved. This will almost completely
flush the processor's L1 caches, and possibly much of the L2 cache as
well. All the code executed, and all the data touched, will have to be
loaded from RAM, and that's a significant time cost. A rule of thumb
would be 2-15 milliseconds per database call... an age in the CPU's
time-scale. The limit here is the speed of main memory and the FSB
interface of your CPU chip.
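If you want to see that fixed overhead for yourself, a rough benchmark
along these lines makes the point (a sketch only - User and the name
being looked up are hypothetical, and the numbers will vary wildly
between setups):

    require 'benchmark'

    n = 1_000
    Benchmark.bm(22) do |bm|
      # Repeated round-trips to the database: dominated by per-call overhead.
      bm.report('find_by_name x1000') do
        n.times { User.find_by_name('alice') }
      end
      # The same lookups against an in-memory hash, for comparison.
      by_name = User.all.inject({}) { |h, u| h[u.name] = u; h }
      bm.report('hash lookup x1000') do
        n.times { by_name['alice'] }
      end
    end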
Next, the DBMS must parse and optimize the query, since Rails doesn't
make any use of the stored procedures that most DBMSs provide to bypass
this step. The optimization process will *hopefully* find that you've
placed an index over the "name" field. If not, the entire table must
be scanned until the matching entry is found. In either case, quite a
lot of code is required to parse and optimize your query, and that will
normally complete the cache flushing mentioned above.
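If you haven't added one, an index is a one-line migration (a sketch -
adjust the table name and migration style to your app and Rails version):

    class AddNameIndexToUsers < ActiveRecord::Migration
      def self.up
        # Lets the optimizer use an index lookup instead of a full table scan.
        add_index :users, :name
      end

      def self.down
        remove_index :users, :name
      end
    end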
Then, executing the search requires a log(N) descent through disk pages,
where N is the number of entries in the index. Divide the size of an
index page (say 8K) by the average size of a key (plus say 16 bytes of
overhead) to get the fan-out per page, and from that you know how many
levels of pages must be visited. Each physical disk access will take
perhaps 5 milliseconds, but even a large index will only need to touch
2-4 pages. However, the disk pages will normally already be in the
DBMS's buffer cache - at least all the internal nodes of the indices
should be - so you may only need one real disk access. In these days of
web apps having databases of 20MB running on servers having 1GB of RAM,
the need to hit the disk is minimized. DBMS technology is optimized
around the DB being more than 10x the available RAM, but typical Rails
apps never come close to that.
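To put rough numbers on that log(N) descent, a back-of-the-envelope
calculation (assuming an 8K page, a 24-byte key and 16 bytes of
per-entry overhead - all just illustrative figures):

    page_size = 8 * 1024                  # bytes per index page
    entry     = 24 + 16                   # key plus per-entry overhead
    fan_out   = page_size / entry         # ~200 entries per page
    records   = 1_000_000
    # Depth of the B-tree search: log base fan_out of the entry count.
    depth = (Math.log(records) / Math.log(fan_out)).ceil
    puts depth                            # => 3 pages touched, at most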
Having found the correct index entry, the data page will normally need
to be loaded as well - again Rails doesn't know anything about clustering
the data with the most-used index (and even if it did, that'd be the
surrogate key "id" field).
Finally the results will be passed back to your program and turned from
a row into an ActiveRecord object. That requires your CPU to reload
most of the instructions (and some data) that make up the Ruby interpreter
and program, hitting main memory again.
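You can see the size of that last step by comparing the full finder with
a raw row fetch (a sketch - select_one returns a plain column => value
hash, skipping ActiveRecord object construction; "users" is hypothetical):

    # Raw row: just a Hash, no ActiveRecord instantiation.
    row = ActiveRecord::Base.connection.select_one(
      "SELECT * FROM users WHERE name = 'alice'")
    # Full ActiveRecord object built from the same row.
    user = User.find_by_name('alice')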
Bottom line: though there is a log(N) component - and perhaps even an
O(N) component if you forgot to add indexes - in most Rails apps you
are hit hardest by the fixed per-call costs.
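A quick way to check whether you're in the O(N) case is to ask the
database for its plan (a sketch - EXPLAIN output differs between DBMSs,
and "users" is again hypothetical). An index lookup ("ref" in MySQL,
"Index Scan" in PostgreSQL) means log(N); a full scan ("ALL" / "Seq
Scan") means O(N) per call:

    plan = ActiveRecord::Base.connection.select_all(
      "EXPLAIN SELECT * FROM users WHERE name = 'alice'")
    plan.each { |row| p row }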
I hope this message gives you some insight into why Rails database
performance is often poor, especially compared to more "enterprisey"
toolkits that streamline some of the above operations by making proper
use of the DBMS' features.
Clifford Heath, Data Constellation.