L
Laszlo Nagy
psycopg2 is said to be db api 2.0 compilant, but apparent it is buggy.
By default, when I create a cursor with
cur = conn.cursor()
then it creates a cursor that will fetch all rows into memory, even if
you call cur.fetchone() on it. (I tested it, see below.)
I was looking for psycopg2 documentation, but I found nothing. However,
I found some posts telling that named cursors do support fetching a
single row at a time. Here is how to create a named cursor:
cur = conn.cursor('mycursor')
This is very strange, because DB API 2.0 does not have this feature. Why
this feature was created, and how to use it? Not documented.
Anyway, I wrote a program that uses named/nameless cursors, sets
cur.arraysize to different values before callig cur.execute, then
fetches all rows with fetchone(). I used the following query on a bigger
table:
select name from product limit 1000000
Here are the test results for fetchone():
Named cursor, arraysize=1 2613 rec/sec, query opens in 0.005 sec,
uses only a few memory
Named cursor, arraysize=1000 2831 rec/sec, query opens in 0.005 sec,
uses only a few memory
Nameless cursor, arraysize=1 41335 rec/sec, query opens in 28 sec, uses
100MB+ memory
Nameless cursor, arraysize=1 39602 rec/sec, query opens in 25.8 sec,
uses 100MB+ memory
It is clear that named cursors have very bad performance, I cannot use
them. Nameless cursors cannot be used either, because they are stressing
the system, put 100% disk I/O and big memory usage, without any good reason.
The only one solution I found is to use named cursors, and use
fetchmany(100) instead of fetchone(). This results in fast opening
(0.005 sec) of the cursor, and good speed (30 000 rec/sec, about 75% of
the nameless version). (Do I really need to implement buffering in
Python to be efficient???)
However, this is also not usable, because named cursors do not have a
".description" property! You can try this:
cur = conn.cursor('mycursor')
cur.execute('select name from product limit 1000000')
print repr(cur.description) # -> None
This is unacceptable! This is not DB API 2.0 compilant. I have to know
the names of the columns, how can I do that?
What am I doing wrong? Please help me!
Thanks,
Laszlo
p.s.: I tried to subscribe to the psycopg mailing list, but the
confirmation email did not arrive in 6 hours...
By default, when I create a cursor with
cur = conn.cursor()
then it creates a cursor that will fetch all rows into memory, even if
you call cur.fetchone() on it. (I tested it, see below.)
I was looking for psycopg2 documentation, but I found nothing. However,
I found some posts telling that named cursors do support fetching a
single row at a time. Here is how to create a named cursor:
cur = conn.cursor('mycursor')
This is very strange, because DB API 2.0 does not have this feature. Why
this feature was created, and how to use it? Not documented.
Anyway, I wrote a program that uses named/nameless cursors, sets
cur.arraysize to different values before callig cur.execute, then
fetches all rows with fetchone(). I used the following query on a bigger
table:
select name from product limit 1000000
Here are the test results for fetchone():
Named cursor, arraysize=1 2613 rec/sec, query opens in 0.005 sec,
uses only a few memory
Named cursor, arraysize=1000 2831 rec/sec, query opens in 0.005 sec,
uses only a few memory
Nameless cursor, arraysize=1 41335 rec/sec, query opens in 28 sec, uses
100MB+ memory
Nameless cursor, arraysize=1 39602 rec/sec, query opens in 25.8 sec,
uses 100MB+ memory
It is clear that named cursors have very bad performance, I cannot use
them. Nameless cursors cannot be used either, because they are stressing
the system, put 100% disk I/O and big memory usage, without any good reason.
The only one solution I found is to use named cursors, and use
fetchmany(100) instead of fetchone(). This results in fast opening
(0.005 sec) of the cursor, and good speed (30 000 rec/sec, about 75% of
the nameless version). (Do I really need to implement buffering in
Python to be efficient???)
However, this is also not usable, because named cursors do not have a
".description" property! You can try this:
cur = conn.cursor('mycursor')
cur.execute('select name from product limit 1000000')
print repr(cur.description) # -> None
This is unacceptable! This is not DB API 2.0 compilant. I have to know
the names of the columns, how can I do that?
What am I doing wrong? Please help me!
Thanks,
Laszlo
p.s.: I tried to subscribe to the psycopg mailing list, but the
confirmation email did not arrive in 6 hours...