Is it just me, or is Sqlite3 goofy?

M

mensanator

Magnus said:
While I can understand your frustration, I think it is
important to think about the tone in our postings here.
Hydrocephalus is one of the most common birth defects,
and it's not terribly unlikely that someone who reads
this has a family member or someone else in his proximity
who suffers from this condition.

Ok, I appologize. No more Larry the Cable Guy humor.
I'll stick with Monty Python and Douglas Adams.
Still, many people with lots of experience in databases
use it, and prefer it for certain kinds of applications.
All systems have limitations and deviations, and those
limitations and deviations are stated more clearly for
SQLite than for most commercial products at least.

But they're not stated up front. The deviations from
standard SQL are extreme enough to warrant mentioning
in the Python docs.
The
market leader Oracle still can't store empty strings in
VARCHAR fields for instance. They are silently converted
to NULL. I'm pretty sure that has been in clear violation
to the official spec since 1986 at least.

But does Oracle claim the specification is a bug?
As far as I understand, noone here is forcing you to use
SQLite,

As long as it's included in the standard library, I'm going
to use it. There is nothing wrong with the idea of a "lite"
database. It is very misleading, though, to claim it's SQL.
and with your long experience of MS Access I'd
expect you to be fairly used to "almost SQL"... It's
some time since I used Jet/Access now, but I had much
more problems with that than I've had with SQLite.

SQLite is built in Tcl, by someone who appreciates the
way Tcl works, with its weak typing. I don't think Tcl's
type handling is nearly as clever as Python's, but I
think it's a good thing that Python's standard lib finally
has a DB-API compliant module, and while I would have
preferred something that was closer to standard SQL, I
don't know of a better candidate than SQLite.

It's good that it's usable without a server setup, and
that it's very light weight. A Jet engine is obviously
not an option, and I would have preferred SQLite even
if Jet was open source and worked on all platforms.
(Well, if JET *was* open source, I suspect it would
have been fixed by now.) It's possible that one could
have used the embedded version of Firebird instead, but
in my experience that's not nearly as lean or easy to
deploy.

With your long experience of Access and SQL Server I'm
sure you know well that any attempt to build a working
database application requires extensive knowledge of
the backend to understand its peculiarities and
limitations.

The list of software projects where not quite competent
developers built Access applications that worked ok in
small scale tests and failed catastrophically in real
life is looong...

Of course, if you've stayed with one vendor for 15 years,
I can imagine that you've forgotten how long it took you
Having worked with half a dozen backends or so, I'm no
longer surprised that SQL can be interpreted in so many
ways... I agree that SQLite is unique in it's approach
to typing, but if you are aware of this, it's really not
a big problem.

Ok, I'm now aware of it.

I'm aware that all my WHERE clauses will have to be
modified to allow for text that may show up in my numeric
fields.

I'm aware that all my ORDER BY clauses will have to
be modified for the same reason.

And I'm aware that INNER JOIN doesn't even work in
theory with dynamic typed fields and can't even be
worked around like the WHERE and ORDER BY clauses
can.

And given a typical query I use:

SELECT tblLocations.SiteID,
tblSites.SiteName,
tblLocations.IEPALocationNumber,
tblZones.Zone,
tblSampleEvent.SampleEventCode,
tblSampleAnalyses.SampleDate,
tblSamples.SampleMatrixID,
tblSamples.SampleNumber,
tblRefQAQCTypes.QAQCType,
tblResults.LabSampleNumber,
tblRefParameterGroups.ParameterGroupCode,
tblSampleAnalyses.AnalysisID,
tblRefAnalyses.AnalysisTypeCode,
tblRefParameters.ParameterReportLabel,
tblRefParameters.CASNumber,
tblResults.Result,
tblResults.Qualifier,
tblRefUnitOfMeasure.Unit,
Val(Format$(IIf(([tblResults].[unitid]=5) Or
([tblResults].[unitid]=7),
[result]/1000,[result]),
"0.0000"))
AS the_ppm_result,
IIf([tblResults].[unitid]=7,"mg/kg",
IIf([tblResults].[unitid]=5,"mg/L",
[unit]))
AS the_ppm_unit,
Val(Format$(IIf(([tblResults].[unitid]=5) Or
([tblResults].[unitid]=7),
[quantitationlimit]/1000,
[quantitationlimit]),"0.0000"))
AS the_ppm_dl,
IIf(IsNull([qualifier]) Or
([qualifier] Not Like "*U*"),1,0)
AS detect,
IIf([detect] And
[the_ppm_result]>[tbl_GW_classi],"1","0")
AS x,
IIf([detect] And
[the_ppm_result]>[tbl_GW_classi],"!","_")
AS xflag,
Master_Lookup.Tbl_pH_range,
Master_Lookup.Tbl_GW_units,
Master_Lookup.Tbl_GW_ClassI,
tblResults.Comment,
IIf([detect],[the_ppm_result],[the_ppm_result]/2)
AS detected_result,
IIf([detect],[the_ppm_result],Null)
AS detected_max_result
FROM tblSites
INNER JOIN ((tblSampleEvent
INNER JOIN (tblRefQAQCTypes
INNER JOIN ((tblSamples
INNER JOIN tblLocations
ON tblSamples.TrueLocation = tblLocations.LocationID)
INNER JOIN tblZones
ON tblLocations.LocationID = tblZones.LocationID)
ON tblRefQAQCTypes.QAQCID = tblSamples.QAQCID)
ON tblSampleEvent.EventID = tblSamples.EventID)
INNER JOIN ((tblRefAnalyses
INNER JOIN tblSampleAnalyses
ON tblRefAnalyses.AnalysisID = tblSampleAnalyses.AnalysisID)
INNER JOIN (tblRefUnitOfMeasure
INNER JOIN ((tblRefParameterGroups
INNER JOIN (tblRefParameters
LEFT JOIN Master_Lookup
ON tblRefParameters.CASNumber = Master_Lookup.Tbl__B_cas)
ON tblRefParameterGroups.ParameterGroupID =
tblRefParameters.ParameterGroupID)
INNER JOIN tblResults
ON tblRefParameters.ParameterID = tblResults.ParameterID)
ON tblRefUnitOfMeasure.UnitID = tblResults.UnitID)
ON tblSampleAnalyses.SampleAnalysisID = tblResults.SampleAnalysisID)
ON tblSamples.SampleID = tblSampleAnalyses.SampleID)
ON tblSites.SiteID = tblLocations.SiteID
WHERE (((tblLocations.SiteID)<51)
AND (Not (tblLocations.IEPALocationNumber)="G116")
AND ((tblZones.Zone)="UMAquifer")
AND ((tblSampleEvent.SampleEventCode) Like "200[2-6]Q*")
AND ((tblSamples.SampleMatrixID)=1)
AND ((tblSampleAnalyses.AnalysisID)>1)
AND ((tblResults.Qualifier) Is Null)
AND ((Master_Lookup.Tbl_pH_range)="pH 6.9 to 7.24" Or
(Master_Lookup.Tbl_pH_range) Is Null)
AND ((tblSamples.QAQCID)=1 Or (tblSamples.QAQCID)=4))
OR (((tblLocations.SiteID)<51)
AND (Not (tblLocations.IEPALocationNumber)="G116")
AND ((tblZones.Zone)="UMAquifer")
AND ((tblSampleEvent.SampleEventCode) Like "200[2-6]Q*")
AND ((tblSamples.SampleMatrixID)=1)
AND ((tblSampleAnalyses.AnalysisID)>1)
AND ((tblResults.Qualifier) Not Like "*Z*" And
(tblResults.Qualifier) Not Like "*R*")
AND ((Master_Lookup.Tbl_pH_range)="pH 6.9 to 7.24" Or
(Master_Lookup.Tbl_pH_range) Is Null)
AND ((tblSamples.QAQCID)=1 Or
(tblSamples.QAQCID)=4))
ORDER BY tblLocations.SiteID,
tblLocations.IEPALocationNumber,
tblSampleEvent.SampleEventCode,
tblRefParameterGroups.ParameterGroupCode,
tblRefParameters.ParameterReportLabel,
tblSampleAnalyses.SampleDate;

you're saying I don't have a big problem?
 
P

Paul Rubin

As long as it's included in the standard library, I'm going
to use it. There is nothing wrong with the idea of a "lite"
database. It is very misleading, though, to claim it's SQL.

Maybe it could be renamed by changing the "t" in "lite" to "k".
 
M

Marc 'BlackJack' Rintsch

In <[email protected]>,
So I can't just say

WHERE qty=12

I have to say

WHERE (qty=12) OR (qty="12")

No you just write the first and don't stuff strings into that column.
It's your choice after all.
Do you know what INNER JOIN means?

Do you know how important it is to a relational database?

Can you explain how an INNER JOIN can even work, in theory,
with dynamic data types?

The same as with static types!? Where's the problem?

table_a:

id name
----- ----------
42 bob
'foo' methusalem
'42' gaylord

table_b:

id age
----- ------
42 23
'foo' 1000
0.5 None

Join on `id`:

id name age
----- ---------- ----
42 bob 23
'foo' methusalem 1000
The implications of violating the SQL Language Specification
are much more serious than you think.

Or not so serious than you think.

Ciao,
Marc 'BlackJack' Rintsch
 
P

Paul McNett

Do you know what INNER JOIN means?

Do you know how important it is to a relational database?

Can you explain how an INNER JOIN can even work, in theory,
with dynamic data types?

Let's stop the pissing contest and just see how it works. After all,
this is Python and we can easily just try it out. Here's my example.
Please tell me how this causes unexpected results, and why it isn't SQL.
Please modify my example to get it to cause a catastrophe, and post it
here so we can see the errors of our ways and be properly humbled.

#-- Preliminaries:
#-- Create 3 tables for a M:M relationship between customers
#-- and categories:autoincrement, cust_id integer, cat_id integer)")

#-- Insert some test data into customer and categories:
#-- Take a look at the data (and retrieve the pk's):
>>> cur.fetchall() [(1, u'Ziggy Marley'), (2, u'David Bowie')]
>>> cur.execute("select * from categories")
>>> cur.fetchall()
[(1, u'Glam Rock'), (2, u'Nuevo Reggae'), (3, u'Male Singers')]

#-- Relate some customers to some categories. Note how I send strings
#-- in some places and ints in others:
#-- Run some queries:cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id order by 2,4")[(2, u'David Bowie', 1, u'Glam Rock'), (2, u'David Bowie', 3, u'Male
Singers'), (1, u'Ziggy Marley', 3, u'Male Singers'), (1, u'Ziggy
Marley', 2, u'Nuevo Reggae')]
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
1 order by 2,4")[(2, u'David Bowie', 1, u'Glam Rock')]
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'1' order by 2,4")[(2, u'David Bowie', 1, u'Glam Rock')]
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'2' order by 2,4")[(1, u'Ziggy Marley', 2, u'Nuevo Reggae')]
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'3' order by 2,4")[(2, u'David Bowie', 3, u'Male Singers'), (1, u'Ziggy Marley', 3, u'Male
Singers')]
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
3 order by 2,4")[(2, u'David Bowie', 3, u'Male Singers'), (1, u'Ziggy Marley', 3, u'Male
Singers')]

If I have skipped the test case that will fail, please enlighten me.
 
M

mensanator

Marc said:
In <[email protected]>,


No you just write the first and don't stuff strings into that column.
It's your choice after all.

Not when I don't control the source of the data.
Despite the laboratory having been given a static
type definition for data deliverables, I sporadically
find text in my numeric fields.

And surely you don't think I write INSERT statements
for each of the over 500,000 records I have? The records
are inserted programatically.

Static types mean the exceptions...raise exceptions.
How am I supposed to identify the exceptions if SQLite
simply quietly converts the non-numeric data to text?
The same as with static types!? Where's the problem?

table_a:

id name
----- ----------
42 bob
'foo' methusalem
'42' gaylord

table_b:

id age
----- ------
42 23
'foo' 1000
0.5 None

Join on `id`:

id name age
----- ---------- ----
42 bob 23
'foo' methusalem 1000

Numbers JOINing numbers and text JOINing text doesn't illustrate
the problem. The problem is when SQLite allows bad data to be
inserted. The problem isn't that fields JOIN when they are not
supposed to, it's when they fail to JOIN when they are supposed
to.
Or not so serious than you think.

Well, I can make up examples also.

Bill for customer: Tom Smith
------------------------------------------------------------------
0003 Olsen's Standard Book of British Birds (Expurgated) $ 99.95


Bill for customer: Dick Smith
------------------------------------------------------------------
0002 A Sale of Two Titties $ 20.00
0005 David Coperfield $ 3.95


Bill for customer: Harry Smith
------------------------------------------------------------------
0001 101 Ways to Start A Fight $ 19.95


Hmm...looks like item 4 isn't being billed to anyone. That's no
way to run a business. If I do a LEFT JOIN instead of an INNER JOIN:


Bill for customer: None None
------------------------------------------------------------------
0004 Ethel the Aardvark Goes Quantity Surveying $ 9.99


Bill for customer: Tom Smith
------------------------------------------------------------------
0003 Olsen's Standard Book of British Birds (Expurgated) $ 99.95


Bill for customer: Dick Smith
------------------------------------------------------------------
0002 A Sale of Two Titties $ 20.00
0005 David Coperfield $ 3.95


Bill for customer: Harry Smith
------------------------------------------------------------------
0001 101 Ways to Start A Fight $ 19.95


I see the missing item, but still don't know who to bill it to.
What if I dumped the invoice table?

4
54
Ethel the Aardvark Goes Quantity Surveying
None
1975
9.99

3
001
Olsen's Standard Book of British Birds (Expurgated)
Olsen
None
99.95

2
032
A Sale of Two Titties
Charles Dikkens
1855
20.0

5
032
David Coperfield
Edmund Wells
1955
3.95

1
066
101 Ways to Start A Fight
some Irish gentleman
1919
19.95

Aha, it was sold to customer 54, so now I just need to dump
the customer table:

001
Tom
Smith
42

032
Dick
Smith
28

066
Harry
Smith
66

only to discover there is no customer 54! How can that happen?

invoices = [(1,'066','101 Ways to Start A Fight','some Irish
gentleman',1919,19.95), \
(2,'032','A Sale of Two Titties','Charles Dikkens',1855,20.00), \
(3,'001',"Olsen's Standard Book of British Birds
(Expurgated)","Olsen",None,99.95), \
(4,066,'Ethel the Aardvark Goes Quantity Surveying',None,1975,9.99), \
(5,'032','David Coperfield','Edmund Wells',1955,3.95)]

Oops! Forgot the quotes around the customer id for item 4.
But why didn't it become 66? Because the leading 0 made
it octal! A little static typing would have helped here.

Now, IF the same error is repeated in EVERY table that's JOINed,
THEN the dynamic typing won't be a problem. But that never happens
in practice. I populate one table and ensure it's correct. Someone
else populates the other table and screws it up so the JOIN never
works.

Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murpy's Law, I will have a much bigger problem with data
integrity.
 
B

Bryan Olson

Ben said:
It's not a crackpot theory. It's a completely reasonable theory. SQL is
based on relational algebra, which provides a mathematical set of
operators for grouping data that is stored in separate sets. That data
is selected and projected according to its value, and nothing else. The
concept of it having a 'type' has been overlaid on top of this,
presumably to facilitate efficient implementation, which tends to
require fixed-width rows (and hence columns). It's not necessary in any
sense, and it's reasonable to argue that if it was trivial to implement
variable width columns as efficiently as fixed width columns, that
explicit data types might never have needed to exist.

The mathematical definition of the relational model includes
that data values are drawn from specific sets. Implementing
variable width columns has nothing to do with it. Here's the
reference:

1.3. A Relational View of Data

The term relation is used here in its accepted mathematical
sense. Given sets S1, S2, ···, Sn, (not necessarily
distinct), R is a relation on these n sets if it is a set
of n-tuples each of which has its first element from S1,
its second element from S2, and so on [1]. We shall refer to
Sj as the jth domain of R.

[Footnote 1:] More concisely, R is a subset of the Cartesian
product S1* S2 * ... * Sn.

-- E. F. Codd. "A Relational Model of Data for Large Shared
Data Banks." /Communications of the ACM/, Vol. 13, No. 6,
June 1970, pp. 377-387.
 
G

George Sakkis

Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murpy's Law, I will have a much bigger problem with data
integrity.

If this was a java or c++ list, all this rant would be more
understandable, but bashing dynamic typing in a dynamic language list
seems pointless at best (as this has been beaten to death over and over
again), flamebait at worst. It should be clear by now that there are
two (at least) alternatives:
1. Validate the data in python before (or at the same time when)
feeding the DB.
2. Forget sqlite and use a statically typed DBMS; it's not like there
is a shortage of them.

Massaging your SQL statements to make up for the lack of type checking
(even if this is always possible) would be a bad idea for more than
one reasons (complexity,portability,performance), so you'd better not
go down this road.

George
 
K

Kay Schluehr

It's not a bug, it's a feature. And answered as third point in the
FAQ:
http://www.sqlite.org/faq.html#q3

I think your whole experience is based on it. Live with it or use a
real RDBMS.

If you are so fond of static typing, why are you using Python in the first
place? Just see it as consistency -- dynamically typed language →
dynamically typed DB columns. ;-)

Ciao,
Marc 'BlackJack' Rintsch

I have to admit I find this bogus too. It has by no means anything to
do with static typing but letting "errors pass silently" i.e.
deactivating runtime type checks as well. The problem here is that
fields are not dynamically type checked but completely untyped and only
coercion hints are present. Using a clever coercion / "type-affinity"
does not justify that there is no error case handling when the coercion
fails. This might be handled by user code ( or better by the pysqlite
wrapper ) but it appears to be redundant.
 
D

Dennis Lee Bieber

On 8 Sep 2006 16:46:03 -0700, "(e-mail address removed)" <[email protected]>
declaimed the following in comp.lang.python:

<snip>

After a sequence of hypothetical results of occult SQL you show
this...
invoices = [(1,'066','101 Ways to Start A Fight','some Irish
gentleman',1919,19.95), \
(2,'032','A Sale of Two Titties','Charles Dikkens',1855,20.00), \
(3,'001',"Olsen's Standard Book of British Birds
(Expurgated)","Olsen",None,99.95), \
(4,066,'Ethel the Aardvark Goes Quantity Surveying',None,1975,9.99), \
(5,'032','David Coperfield','Edmund Wells',1955,3.95)]

... A Python list of tuples!
Oops! Forgot the quotes around the customer id for item 4.
But why didn't it become 66? Because the leading 0 made
it octal! A little static typing would have helped here.
Do you routinely populate your databases by editing python lists of
tuples? And if you do, why not show us the table definition and insert
statements that go with the above data?

Or do you get the data from some file... What format is that file?

Would you agree that the following is a clean representation of your
example data, when considered as a CSV data source? (I left off the
primary key -- we'll just let SQLite add that value). I DID, however,
normalize the quoting by changing ' to ". (please ignore the line wrap
on the (Expurgated) -- it does not exist in the data file)

invoice.csv
-=-=-=-=-=-
"066","101 Ways to Start A Fight","some Irish gentleman",1919,19.95
"032","A Sale of Two Titties","Charles Dikkens",1855,20.00
"001","Olsen"s Standard Book of British Birds
(Expurgated)","Olsen",None,99.95
066,"Ethel the Aardvark Goes Quantity Surveying",None,1975,9.99
"032","David Coperfield","Edmund Wells",1955,3.95
-=-=-=-=-=-

Now, since you seem to believe that a "customer number" is a string
data type (implied by the presence of quotes around all but the "bad
data" record), I've so defined it in the table definition... Ah, but
what the heck, let's create a table with it defined as an integer too...

SQLiteTest.py
-=-=-=-=-=-=-=-
from pysqlite2 import dbapi2 as sql
import csv

TABLE_DEF_1 = """
create table invoice_1
(
ID integer primary key,
CustNo char,
Title char,
Author char,
Year integer,
Price float
) """

TABLE_DEF_2 = """
create table invoice_2
(
ID integer primary key,
CustNo integer,
Title char,
Author char,
Year integer,
Price float
) """

db = sql.connect("test.db")
cr = db.cursor()
try:
rs = cr.execute(TABLE_DEF_1)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists

try:
rs = cr.execute(TABLE_DEF_2)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists


fin = open("invoice.csv", "rb")
indata = csv.reader(fin)

print "\nInserting:"
for r in indata:
print r
if len(r) != 5:
print "^^^Bad Record"
else:
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
db.commit()

fin.close()

print "\nSelect all from Invoice_1 (CustNo is CHARACTER)"
rs = cr.execute("select * from invoice_1")
for r in cr:
print r

print "\nSelect all from Invoice_2 (CustNo is INTEGER)"
rs = cr.execute("select * from invoice_2")
for r in cr:
print r

db.close()
-=-=-=-=-=-=-=-

Now, let us run the above program, using the above data file! Again,
watch out for line wrapping (my comments will be blocked off with """ )

E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>python
SQLiteTest.py

Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919',
'19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975',
'9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']

"""
Well, look at that... ALL the data from the file is coming in as
character strings... customer number, year, price, title, author... It's
ALL character! The difference between quoted and unquoted numbers has
been lost.
"""

Select all from Invoice_1 (CustNo is CHARACTER)
(1, u'066', u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, u'032', u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, u'001', u'Olsens Standard Book of British Birds (Expurgated)"',
u'Olsen', u'None', 99.950000000000003)
(4, u'066', u'Ethel the Aardvark Goes Quantity Surveying', u'None',
1975, 9.9900000000000002)
(5, u'032', u'David Coperfield', u'Edmund Wells', 1955,
3.9500000000000002)

"""
No strange results there -- the year and price aren't stored as
strings, even though they were string data when inserted.
"""
Select all from Invoice_2 (CustNo is INTEGER)
(1, 66, u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, 32, u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, 1, u'Olsens Standard Book of British Birds (Expurgated)"', u'Olsen',
u'None', 99.950000000000003)
(4, 66, u'Ethel the Aardvark Goes Quantity Surveying', u'None', 1975,
9.9900000000000002)
(5, 32, u'David Coperfield', u'Edmund Wells', 1955, 3.9500000000000002)

"""
And look here... Again no strange results -- SQLite didn't even
treat the leading 0 as a signal that the customer number is octal. They
come out as integers without leading 0s though -- but then again, I'm
not using a formatting statement on the output...
E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>
"""
Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murpy's Law, I will have a much bigger problem with data
integrity.
SQLite's "dynamic typing" does not mean "random" or "unpredictable"
typing.

The basic rules are fairly simple.

IF the data field is declared as a numeric type, AND the input data
can be coerced to numeric without error, it is stored and returned as a
numeric value -- one would have to pass in a data value that contained
non-numeric characters for it to become a character string.

IF the data field is declared as a character type, AND the input
data is a numeric, it is converted to a character representation and
stored/returned as character.

(less basic involves the application of data type converters which are
probably user supplied)


As for your example of invoices and customers, surely the
application isn't trusting the user to type in a raw "customer number"
for the invoice without first validating it by attempting to retrieve
that customer from a customer table. Or, more likely, using the customer
name to look up the number in the customer table, meaning the customer
number -- whatever it is -- /will/ match the invoice data as it was
taken directly from the database.

Any data supplied to you in file format, if humanly readable, is
going to be character strings when fed to SQLite UNLESS your code first
performs some sort of conversion on it -- and I suspect you'll detect
the bad data when doing that conversion. And, by the way, the octal
detection only happens for numeric literals IN a Python statement, not
in conversion of an external string data item to numeric.

Of course, if you're being supplied binary data files, you are
probably using the struct module to extract the numeric data fields...
But how, I wonder, would one get a non-numeric value using a numeric
specification on a string of raw bytes?
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
D

Dennis Lee Bieber

I'm aware that all my WHERE clauses will have to be
modified to allow for text that may show up in my numeric
fields.
Why?

If the field is declared numeric, and the data supplied LOOKS
numeric (IE, no alphabetic characters, no odd punctuation...) SQLite
stores it AS numeric.

"12"
and
12

are the SAME value to SQLite. In a numeric field both are stored as

12 (0000000c in hex)

and in a character field both are stored as

u"12" (don't know the unicode, so will list as ASCII 00023132
assuming a two byte VARCHAR size marker)

(remember, the " are only used to make a string data LITERAL in code,
they are not part of the data itself)

"'12'", OTOH is not numeric -- it has ' inside the ", so the ' is
part of the non-numeric data.

Now, if someone supplies

"1Z"

for that numeric field, SQLite finds that "1Z" can not be turned into a
numeric without error, and only THEN stores it as a character type.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
M

mensanator

George said:
If this was a java or c++ list, all this rant would be more
understandable, but bashing dynamic typing in a dynamic language list
seems pointless at best (as this has been beaten to death over and over
again), flamebait at worst.

But I'm not bashing Python's use of dynamic typing. But if the
SQL Language Specification says static typing, then static typing
it is. Period.
It should be clear by now that there are
two (at least) alternatives:
1. Validate the data in python before (or at the same time when)
feeding the DB.

Data integrity is an issue even with static typing. It's a bigger
issue with dynamic typing.
2. Forget sqlite and use a statically typed DBMS; it's not like there
is a shortage of them.

I have no intention of forgetting sqlite simply because it's
now part of the standard library. I have now qualms about
using it *now* because I understand it better. But reaching
that level of understanding was like pulling teeth.

Documentation shouldn't talk down to the reader. It's always
bad when you confuse the smart people. The ignorant are
supposed to be confused. It's job of the documentation to
educate the ignorant. Hiding the idiosynchrocies of Sqlite3
from the user who's already familiar with SQL is simply
unacceptable.
 
M

mensanator

Dennis said:
On 8 Sep 2006 16:46:03 -0700, "(e-mail address removed)" <[email protected]>
declaimed the following in comp.lang.python:

<snip>

After a sequence of hypothetical results of occult SQL you show
this...
invoices = [(1,'066','101 Ways to Start A Fight','some Irish
gentleman',1919,19.95), \
(2,'032','A Sale of Two Titties','Charles Dikkens',1855,20.00), \
(3,'001',"Olsen's Standard Book of British Birds
(Expurgated)","Olsen",None,99.95), \
(4,066,'Ethel the Aardvark Goes Quantity Surveying',None,1975,9.99), \
(5,'032','David Coperfield','Edmund Wells',1955,3.95)]

... A Python list of tuples!
Oops! Forgot the quotes around the customer id for item 4.
But why didn't it become 66? Because the leading 0 made
it octal! A little static typing would have helped here.
Do you routinely populate your databases by editing python lists of
tuples?

I don't routinely do anything, as I've only been playing with it for
4 days.
And if you do, why not show us the table definition and insert
statements that go with the above data?

I was simply following the examples in the Python manuals.
If the examples are poor usage, maybe they shouldn't be in
the manuals.
Or do you get the data from some file... What format is that file?

No file. Point is moot.
Would you agree that the following is a clean representation of your
example data, when considered as a CSV data source? (I left off the
primary key -- we'll just let SQLite add that value). I DID, however,
normalize the quoting by changing ' to ". (please ignore the line wrap
on the (Expurgated) -- it does not exist in the data file)

invoice.csv
-=-=-=-=-=-
"066","101 Ways to Start A Fight","some Irish gentleman",1919,19.95
"032","A Sale of Two Titties","Charles Dikkens",1855,20.00
"001","Olsen"s Standard Book of British Birds
(Expurgated)","Olsen",None,99.95
066,"Ethel the Aardvark Goes Quantity Surveying",None,1975,9.99
"032","David Coperfield","Edmund Wells",1955,3.95
-=-=-=-=-=-

Now, since you seem to believe that a "customer number" is a string
data type (implied by the presence of quotes around all but the "bad
data" record), I've so defined it in the table definition... Ah, but
what the heck, let's create a table with it defined as an integer too...

SQLiteTest.py
-=-=-=-=-=-=-=-
from pysqlite2 import dbapi2 as sql
import csv

TABLE_DEF_1 = """
create table invoice_1
(
ID integer primary key,
CustNo char,
Title char,
Author char,
Year integer,
Price float
) """

TABLE_DEF_2 = """
create table invoice_2
(
ID integer primary key,
CustNo integer,
Title char,
Author char,
Year integer,
Price float
) """

db = sql.connect("test.db")
cr = db.cursor()
try:
rs = cr.execute(TABLE_DEF_1)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists

try:
rs = cr.execute(TABLE_DEF_2)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists


fin = open("invoice.csv", "rb")
indata = csv.reader(fin)

print "\nInserting:"
for r in indata:
print r
if len(r) != 5:
print "^^^Bad Record"
else:
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
db.commit()

fin.close()

print "\nSelect all from Invoice_1 (CustNo is CHARACTER)"
rs = cr.execute("select * from invoice_1")
for r in cr:
print r

print "\nSelect all from Invoice_2 (CustNo is INTEGER)"
rs = cr.execute("select * from invoice_2")
for r in cr:
print r

db.close()
-=-=-=-=-=-=-=-

Now, let us run the above program, using the above data file! Again,
watch out for line wrapping (my comments will be blocked off with """ )

E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>python
SQLiteTest.py

Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919',
'19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975',
'9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']

"""
Well, look at that... ALL the data from the file is coming in as
character strings... customer number, year, price, title, author... It's
ALL character! The difference between quoted and unquoted numbers has
been lost.
"""

How 'bout that? Maybe I should try harder to make a
better example.
Select all from Invoice_1 (CustNo is CHARACTER)
(1, u'066', u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, u'032', u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, u'001', u'Olsens Standard Book of British Birds (Expurgated)"',
u'Olsen', u'None', 99.950000000000003)
(4, u'066', u'Ethel the Aardvark Goes Quantity Surveying', u'None',
1975, 9.9900000000000002)
(5, u'032', u'David Coperfield', u'Edmund Wells', 1955,
3.9500000000000002)

"""
No strange results there -- the year and price aren't stored as
strings, even though they were string data when inserted.
"""
Select all from Invoice_2 (CustNo is INTEGER)
(1, 66, u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, 32, u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, 1, u'Olsens Standard Book of British Birds (Expurgated)"', u'Olsen',
u'None', 99.950000000000003)
(4, 66, u'Ethel the Aardvark Goes Quantity Surveying', u'None', 1975,
9.9900000000000002)
(5, 32, u'David Coperfield', u'Edmund Wells', 1955, 3.9500000000000002)

"""
And look here... Again no strange results -- SQLite didn't even
treat the leading 0 as a signal that the customer number is octal. They
come out as integers without leading 0s though -- but then again, I'm
not using a formatting statement on the output...
E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>
"""

SQLite's "dynamic typing" does not mean "random" or "unpredictable"
typing.

The basic rules are fairly simple.

IF the data field is declared as a numeric type, AND the input data
can be coerced to numeric without error, it is stored and returned as a
numeric value -- one would have to pass in a data value that contained
non-numeric characters for it to become a character string.

As if that never happens.
IF the data field is declared as a character type, AND the input
data is a numeric, it is converted to a character representation and
stored/returned as character.

(less basic involves the application of data type converters which are
probably user supplied)


As for your example of invoices and customers, surely the
application isn't trusting the user to type in a raw "customer number"
for the invoice without first validating it by attempting to retrieve
that customer from a customer table. Or, more likely, using the customer
name to look up the number in the customer table, meaning the customer
number -- whatever it is -- /will/ match the invoice data as it was
taken directly from the database.

Any data supplied to you in file format, if humanly readable, is
going to be character strings when fed to SQLite UNLESS your code first
performs some sort of conversion on it -- and I suspect you'll detect
the bad data when doing that conversion. And, by the way, the octal
detection only happens for numeric literals IN a Python statement, not
in conversion of an external string data item to numeric.

Ok, it was a bad example.
 
M

mensanator

Paul said:
Let's stop the pissing contest and just see how it works. After all,
this is Python and we can easily just try it out. Here's my example.
Please tell me how this causes unexpected results,

When we use a best case scenario, we get what we expect.
and why it isn't SQL.

It isn't SQL simply because SQL won't let you insert text
into a numeric field.
Please modify my example to get it to cause a catastrophe,

Make it worse case? Sure, I can do that.
and post it
here so we can see the errors of our ways and be properly humbled.

#-- Preliminaries:
## from pysqlite2 import dbapi2 as sqlite
import sqlite3 as sqlite
## con = sqlite.connect("test.db")
con = sqlite.connect(":memory:")
cur = con.cursor()

#-- Create 3 tables for a M:M relationship between customers
#-- and categories:
cur.execute("create table customers (id integer primary key
autoincrement, name char)")
cur.execute("create table categories (id integer primary key
autoincrement, name char)")
cur.execute("create table cust_cat (id integer primary key
autoincrement, cust_id integer, cat_id integer)")

## cur.execute("create table cust_cat (id integer, cust_id,
cat_id)")
## cur.execute("create table customers (id, name char)")
## cur.execute("create table categories (id, name char)")
##
## # Ok, THIS fails. Because the JOINs were originally made against
fields
## # that were cast as integers so the 'hinting' of sqlite must
operate in a JOIN
## # allowing ints to JOIN against strings. Take away the casts and
the JOIN
## # fails. Unfortunately, not every situation will involve JOINing
primary keys.
##
## [(1, u'Ziggy Marley'), (2, u'David Bowie')]
## [(1, u'Glam Rock'), (2, u'Nuevo Reggae'), (3, u'Male Singers')]
## [(None, 1, 3), (None, 1, u'2'), (None, u'2', u'1'), (None, u'2',
u'3')]
##
## [(1, u'Ziggy Marley', 3, u'Male Singers')]
## []
## []
## []
## []
## [(1, u'Ziggy Marley', 3, u'Male Singers')]


#-- Insert some test data into customer and categories:
cur.execute("insert into customers (name) values ('Ziggy Marley')")
cur.execute("insert into customers (name) values ('David Bowie')")
cur.execute("insert into categories (name) values ('Glam Rock')")
cur.execute("insert into categories (name) values ('Nuevo Reggae')")
cur.execute("insert into categories (name) values ('Male Singers')")

## # if int cast removed, manually insert ids
##cur.execute("insert into customers (id, name) values (1,'Ziggy
Marley')")
##cur.execute("insert into customers (id, name) values (2,'David
Bowie')")
##cur.execute("insert into categories (id, name) values (1,'Glam
Rock')")
##cur.execute("insert into categories (id, name) values (2,'Nuevo
Reggae')")
##cur.execute("insert into categories (id, name) values (3,'Male
Singers')")

cur.execute("select * from customers")


#-- Take a look at the data (and retrieve the pk's):
print cur.fetchall()
#[(1, u'Ziggy Marley'), (2, u'David Bowie')]
cur.execute("select * from categories")
print cur.fetchall()
#[(1, u'Glam Rock'), (2, u'Nuevo Reggae'), (3, u'Male Singers')]


#-- Relate some customers to some categories. Note how I send strings
#-- in some places and ints in others:
##cur.execute("insert into cust_cat (cust_id, cat_id) values (1, 3)")
##cur.execute("insert into cust_cat (cust_id, cat_id) values (1, '2')")

##cur.execute("insert into cust_cat (cust_id, cat_id) values ('2',
'1')")
##cur.execute("insert into cust_cat (cust_id, cat_id) values ('2', 3)")


##cc = [(1,3),(1,'2'),('2','1'),('2','3')]
cc = [(1,3),(1,'>2'),('>2','>1'),('>2','>3')]

## # And this also fails (differently). The 'hinting' of sqlite that
operates
## # during a JOIN only works if the string looks like an integer.
And, no,
## # the use of the '>' is not a contrived example like 'fred'. I
often get a
## # result like '>200' in what is supposed to be a numeric field.
##
## [(1, u'Ziggy Marley'), (2, u'David Bowie')]
## [(1, u'Glam Rock'), (2, u'Nuevo Reggae'), (3, u'Male Singers')]
## [(1, 1, 3), (2, 1, u'>2'), (3, u'>2', u'>1'), (4, u'>2', u'>3')]
##
## [(1, u'Ziggy Marley', 3, u'Male Singers')]
## []
## []
## []
## [(1, u'Ziggy Marley', 3, u'Male Singers')]
## [(1, u'Ziggy Marley', 3, u'Male Singers')]

cur.executemany("insert into cust_cat (cust_id, cat_id) values
(?,?)",cc)
cur.execute("select * from cust_cat")
print cur.fetchall()

print

#-- Run some queries:
cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
order by 2,4""")

print cur.fetchall()
#[(2, u'David Bowie', 1, u'Glam Rock'),
# (2, u'David Bowie', 3, u'Male Singers'),
# (1, u'Ziggy Marley', 3, u'Male Singers'),
# (1, u'Ziggy Marley', 2, u'Nuevo Reggae')]

cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
where categories.id = 1
order by 2,4""")

print cur.fetchall()
#[(2, u'David Bowie', 1, u'Glam Rock')]

cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
where categories.id = '1'
order by 2,4""")

print cur.fetchall()
#[(2, u'David Bowie', 1, u'Glam Rock')]

cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
where categories.id = '2'
order by 2,4""")

print cur.fetchall()
#[(1, u'Ziggy Marley', 2, u'Nuevo Reggae')]

cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
where categories.id = '3'
order by 2,4""")

print cur.fetchall()
#[(2, u'David Bowie', 3, u'Male Singers'),
# (1, u'Ziggy Marley', 3, u'Male Singers')]

cur.execute("""
select customers.id as cust_id,
customers.name as cust_name,
categories.id as cat_id,
categories.name as cat_name
from customers
inner join cust_cat
on cust_cat.cust_id = customers.id
inner join categories
on categories.id = cust_cat.cat_id
where categories.id = 3
order by 2,4""")

print cur.fetchall()
#[(2, u'David Bowie', 3, u'Male Singers'),
# (1, u'Ziggy Marley', 3, u'Male Singers')]
 
K

Kay Schluehr

I wouldn't be at all surprised if the pysqlite author operated under that
assumption. That the Python developers didn't pick up on the issue is not
surprising. I'm not sure how many of them are (py)sqlite users, probably
relatively few.

Skip

Who has reviewed sqlite/pysqlite after all? Reading the passage in the
sqlite FAQ I can hardly believe that passing errors silently and
coercing everything to string when it fails to be coerced to INTEGER
although INTEGER was an invariant declared in the create command is on
par with Pythons design philosophy. In other cases doctoral
dissertations are written about whether a keyword or some punctuation
shall be used for decorator syntax and in this case everything must be
rushed into the code base of the standard library?
 
P

Paul Boddie

[Quoting Marc 'BlackJack' Rintsch...]
I have to admit I find this bogus too. It has by no means anything to
do with static typing but letting "errors pass silently" i.e.
deactivating runtime type checks as well.

If the questioner had been talking about Perl, most respondents would
rightly have said that Perl's silent coercion or conversion of values
is an irritating source of potential errors. Here, the behaviour of
SQLite, whilst not as bad as Perl - values are not converted into
something different which is then erroneous - does contradict the naive
expectations of users who expect the data type declarations to mean
something and for such constraints to be enforced.

Of course, the criticism of the questioner should be more forgiving in
this debate, since I imagine that most people with experience of SQLite
know of its "loose" data typing model, and that many assume that
everyone else is aware of this feature or limitation, even though that
may not be the case. Thus, the priority should be on emphasizing this
in the module documentation (in order to avoid unnecessary confusion),
along with mentioning other possible strategies for emulating other
database system behaviours: "Other Affinity Modes" in the SQLite
documentation [1] may be helpful here, if the future tense can be
replaced with the present tense whilst preserving the factual content
of that section.

Paul

[1] http://www.sqlite.org/datatype3.html
 
S

Steve Holden

When we use a best case scenario, we get what we expect.




It isn't SQL simply because SQL won't let you insert text
into a numeric field.
Yup, I have to agree that's pretty crappy. (Makes mental note to limit
use of SQLite).

regards
Steve
 
D

Dennis Lee Bieber

Talking to myself again, I see...

At present, I intend this to be my last post on the subject.

The basis of this thread seems to be the blind usage of
untrustworthy data source during INSERT or UPDATE operations, with an
expectation that the database engine will trap malformed numeric data,
as applied to SQLite which instead of trapping on malformed numeric data
silently stores it in the string format in which it was provided.

Would this be a reasonable summary?

All the arguments regarding WHERE and JOIN clause conditionals are
misdirection then. These clauses work properly for the data that has
been stored in the database; furthermore, whether a numeric value is
supplied to the adapter as a Python numeric data type, or as a Python
string representation, is irrelevant -- SQLite will coerce the supplied
data into the specified format. If SQLite can not coerce the data, it is
unlikely that Python could coerce it also -- SQLite will store the
"uncoerced" representation of the data.

So, what I see as the entire crux of the problem, is that SQLite
"recommends" that one should validate the input data BEFORE performing
the INSERT or UPDATE (or supplying user criteria on a WHERE clause).
Conditionals which work across tables (as in a JOIN) are not affected.

I've not done enough with SQLite (and the O'Reilly book /is/ a tad
dated) to know if INSERT/UPDATE triggers could be used to hide the
validation (giving behavior closer to the apparent expectation --
wherein the DBMS rejects the statement). However, it seems to me that
validation before the INSERT/UPDATE is not that onerous... (I've also
not figured out how SQLite handles Null or None)

invoice.csv (beware the Jabberwock... uh... line wrapping)
-=-=-=-=-=-=-
"066","101 Ways to Start A Fight","some Irish gentleman",1919,19.95
"032","A Sale of Two Titties","Charles Dikkens",1855,20.00
"001","Olsen"s Standard Book of British Birds
(Expurgated)","Olsen",None,99.95
066,"Ethel the Aardvark Goes Quantity Surveying",None,1975,9.99
"032","David Coperfield","Edmund Wells",1955,3.95
68,"Strawmen and Dorothy",,2006,49.89
033,"The Emperor's Old Clothes","Grimm Hound",1887,Priceless
-=-=-=-=-=-=-
from pysqlite2 import dbapi2 as sql
import csv
import exceptions
TABLE_DEF_1 = """
create table invoice_1
(
ID integer primary key,
CustNo char,
Title char,
Author char,
Year integer,
Price float
) """

TABLE_DEF_2 = """
create table invoice_2
(
ID integer primary key,
CustNo integer,
Title char,
Author char,
Year integer,
Price float
) """


# Validation functions
class ValidationError(exceptions.ValueError):
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)

def validateInteger(data):
if ((str(data).strip().upper() == "NONE")
or (str(data).strip().upper() == "NULL")):
return None
else:
try:
return int(data)
except: #should test explicitly
raise ValidationError("Invalid Integer data: %s" % data)

def validateFloat(data):
if ((str(data).strip().upper() == "NONE")
or (str(data).strip().upper() == "NULL")):
return None
else:
try:
return float(data)
except:
raise ValidationError("Invalid Float data: %s" % data)

def validateString(data):
if ((str(data).strip().upper() == "NONE")
or (str(data).strip().upper() == "NULL")):
return None
else:
return data


db = sql.connect("test.db")
cr = db.cursor()
try:
rs = cr.execute(TABLE_DEF_1)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists

try:
rs = cr.execute(TABLE_DEF_2)
db.commit()
except: #I know, I should be explicit
pass #assume table already exists


fin = open("invoice.csv", "rb")
indata = csv.reader(fin)

print "\nInserting:"
for r in indata:
print r
if len(r) != 5:
print "^^^Bad Record Length"
else:
try:
rv = []
rv.append(validateString(r[0])) #custno
rv.append(validateString(r[1])) #title
rv.append(validateString(r[2])) #author
rv.append(validateInteger(r[3])) #year
rv.append(validateFloat(r[4])) #price
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
except ValidationError, e:
print "Validation Error; Invoice_1: %s" % e.value

try:
rv = []
rv.append(validateInteger(r[0])) #custno
rv.append(validateString(r[1])) #title
rv.append(validateString(r[2])) #author
rv.append(validateInteger(r[3])) #year
rv.append(validateFloat(r[4])) #price
rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)
except ValidationError, e:
print "Validation Error; Invoice_2: %s" % e.value

db.commit()

fin.close()

print "\nSelect all from Invoice_1 (CustNo is CHARACTER)"
rs = cr.execute("select * from invoice_1")
for r in cr:
print r

print "\nSelect all from Invoice_2 (CustNo is INTEGER)"
rs = cr.execute("select * from invoice_2")
for r in cr:
print r

db.close()
-=-=-=-=-=-=-=-
E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>python
SQLiteTest.py

Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919',
'19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975',
'9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']
['>68', 'Strawmen and Dorothy', '', '2006', '49.89']
Validation Error; Invoice_2: Invalid Integer data: >68
['033', "The Emperor's Old Clothes", 'Grimm Hound', '1887', 'Priceless']
Validation Error; Invoice_1: Invalid Float data: Priceless
Validation Error; Invoice_2: Invalid Float data: Priceless

Select all from Invoice_1 (CustNo is CHARACTER)
(1, u'066', u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, u'032', u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, u'001', u'Olsens Standard Book of British Birds (Expurgated)"',
u'Olsen', u'None', 99.950000000000003)
(4, u'066', u'Ethel the Aardvark Goes Quantity Surveying', u'None',
1975, 9.9900000000000002)
(5, u'032', u'David Coperfield', u'Edmund Wells', 1955,
3.9500000000000002)
(6, u'>68', u'Strawmen and Dorothy', u'', 2006, 49.890000000000001)

Select all from Invoice_2 (CustNo is INTEGER)
(1, 66, u'101 Ways to Start A Fight', u'some Irish gentleman', 1919,
19.949999999999999)
(2, 32, u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, 1, u'Olsens Standard Book of British Birds (Expurgated)"', u'Olsen',
u'None', 99.950000000000003)
(4, 66, u'Ethel the Aardvark Goes Quantity Surveying', u'None', 1975,
9.9900000000000002)
(5, 32, u'David Coperfield', u'Edmund Wells', 1955, 3.9500000000000002)

E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
D

Dennis Lee Bieber

Guess I lied...

Talking to myself again, I see...

<snip>
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)

Whoops, r => rv, though the exceptions raised made it moot

rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)

Same comment

Out of curiousity, I converted to using MySQL(db) as a test. As
expected, the pre-insert validation code worked with same results (well,
the price was declared decimal, and Python 2.4 appears to handle that as
a Decimal("value") on return <G>)

Now, taking out the pre-validation and type conversion, supplying
all data as it came from the CSV file:

-=-=-=-=-=-=-
Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919',
'19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975',
'9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']
['>68', 'Strawmen and Dorothy', '', '2006', '49.89']
['033', "The Emperor's Old Clothes", 'Grimm Hound', '1887', 'Priceless']

Select all from Invoice_1 (CustNo is CHARACTER)
(1L, '066', '101 Ways to Start A Fight', 'some Irish gentleman', 1919L,
Decimal("19.95"))
(2L, '032', 'A Sale of Two Titties', 'Charles Dikkens', 1855L,
Decimal("20.00"))
(3L, '001', 'Olsens Standard Book of British Birds (Expurgated)"',
'Olsen', 0L, Decimal("99.95"))
(4L, '066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', 1975L,
Decimal("9.99"))
(5L, '032', 'David Coperfield', 'Edmund Wells', 1955L, Decimal("3.95"))
(6L, '>68', 'Strawmen and Dorothy', '', 2006L, Decimal("49.89"))
(7L, '033', "The Emperor's Old Clothes", 'Grimm Hound', 1887L,
Decimal("0.00"))

Select all from Invoice_2 (CustNo is INTEGER)
(1L, 66L, '101 Ways to Start A Fight', 'some Irish gentleman', 1919L,
Decimal("19.95"))
(2L, 32L, 'A Sale of Two Titties', 'Charles Dikkens', 1855L,
Decimal("20.00"))
(3L, 1L, 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
0L, Decimal("99.95"))
(4L, 66L, 'Ethel the Aardvark Goes Quantity Surveying', 'None', 1975L,
Decimal("9.99"))
(5L, 32L, 'David Coperfield', 'Edmund Wells', 1955L, Decimal("3.95"))
(6L, 0L, 'Strawmen and Dorothy', '', 2006L, Decimal("49.89"))
(7L, 33L, "The Emperor's Old Clothes", 'Grimm Hound', 1887L,
Decimal("0.00"))
-=-=-=-=-=-=-

How interesting... With MySQL/MySQLdb I did NOT get exceptions or
error results on inserting bad numeric data supplied as character string
format (ie, as read from the CSV). Instead, MySQL SILENTLY converted
them to ZEROS

A price of "Priceless" becomes Decimal("0.00").

The Customer number of ">68" became 0L


Which would one rather have to work with -- a database that copied
invalid numerics as string literals (which, in my mind, makes it much
easier to correct the data later, using "update .... set field = correct
where field = invalid") or a database that silently converts them all to
0 values. (Of course, I now expect to have a rejoinder about "Using a
REAL database instead of MySQL" -- but unless said person wishes to
start making the same comments about SQLite on at least as regular a
basis, I believe the objection itself is invalid for this example).

(Apparently we have fallen afoul of this clause from the old
O'Reilly/MySQL black/brown book: "When asked to store a value in a
numeric column that is outside the column type's allowable range, MySQL
clips the value to the appropriate endpoint of the range and stores the
resulting value instead." -- seems character data "clips" to zero.

-=-=-=-=-=-=-
import MySQLdb
import csv
##import exceptions

TABLE_DEF_1 = """
create table invoice_1
(
ID integer auto_increment primary key,
CustNo varchar(4),
Title varchar(100),
Author varchar(50),
Year integer,
Price decimal(10, 2)
) """

TABLE_DEF_2 = """
create table invoice_2
(
ID integer auto_increment primary key,
CustNo integer,
Title varchar(100),
Author varchar(50),
Year integer,
Price decimal(10, 2)
) """


### Validation functions
##class ValidationError(exceptions.ValueError):
## def __init__(self, value):
## self.value = value
## def __str__(self):
## return repr(self.value)
##
##def validateInteger(data):
## if ((str(data).strip().upper() == "NONE")
## or (str(data).strip().upper() == "NULL")):
## return None
## else:
## try:
## return int(data)
## except: #should test explicitly
## raise ValidationError("Invalid Integer data: %s" % data)
##
##def validateFloat(data):
## if ((str(data).strip().upper() == "NONE")
## or (str(data).strip().upper() == "NULL")):
## return None
## else:
## try:
## return float(data)
## except:
## raise ValidationError("Invalid Float data: %s" % data)
##
##def validateString(data):
## if ((str(data).strip().upper() == "NONE")
## or (str(data).strip().upper() == "NULL")):
## return None
## else:
## return data


db = MySQLdb.connect(host="localhost", user="test", db="test")
cr = db.cursor()

rs = cr.execute("drop table if exists invoice_1, invoice_2")
db.commit()

rs = cr.execute(TABLE_DEF_1)
db.commit()

rs = cr.execute(TABLE_DEF_2)
db.commit()

fin = open("invoice.csv", "rb")
indata = csv.reader(fin)

print "\nInserting:"
for r in indata:
print r
if len(r) != 5:
print "^^^Bad Record Length"
else:
## try:
## rv = []
## rv.append(validateString(r[0])) #custno
## rv.append(validateString(r[1])) #title
## rv.append(validateString(r[2])) #author
## rv.append(validateInteger(r[3])) #year
## rv.append(validateFloat(r[4])) #price
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (%s, %s, %s, %s, %s)""",
r)
## except ValidationError, e:
## print "Validation Error; Invoice_1: %s" % e.value
##
## try:
## rv = []
## rv.append(validateInteger(r[0])) #custno
## rv.append(validateString(r[1])) #title
## rv.append(validateString(r[2])) #author
## rv.append(validateInteger(r[3])) #year
## rv.append(validateFloat(r[4])) #price
rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (%s, %s, %s, %s, %s)""",
r)
## except ValidationError, e:
## print "Validation Error; Invoice_2: %s" % e.value

db.commit()

fin.close()

print "\nSelect all from Invoice_1 (CustNo is CHARACTER)"
rs = cr.execute("select * from invoice_1")
for r in cr:
print r

print "\nSelect all from Invoice_2 (CustNo is INTEGER)"
rs = cr.execute("select * from invoice_2")
for r in cr:
print r

db.close()

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
P

Paul Boddie

Dennis said:
Talking to myself again, I see...

Not quite. ;-)

[...]
How interesting... With MySQL/MySQLdb I did NOT get exceptions or
error results on inserting bad numeric data supplied as character string
format (ie, as read from the CSV). Instead, MySQL SILENTLY converted
them to ZEROS

A price of "Priceless" becomes Decimal("0.00").

The Customer number of ">68" became 0L

This kind of thing is "classic" MySQL behaviour.
Which would one rather have to work with -- a database that copied
invalid numerics as string literals (which, in my mind, makes it much
easier to correct the data later, using "update .... set field = correct
where field = invalid") or a database that silently converts them all to
0 values. (Of course, I now expect to have a rejoinder about "Using a
REAL database instead of MySQL" -- but unless said person wishes to
start making the same comments about SQLite on at least as regular a
basis, I believe the objection itself is invalid for this example).

Given subsequent research into SQLite's affinity modes and their
presumed status as future features, the solution in that database
system's case is to apply validation in the driver/module or through
driver extensions, and there is apparently some flexibility in the
pysqlite2 modules for changing the way data types are handled, although
a cursory inspection of the documentation doesn't appear to suggest a
convenient, ready-made solution.

As for MySQL, the situation is possibly more awkward: one expects the
database system to complain about certain things, which it probably
does from time to time, but it would seem wasteful to duplicate
whatever validation the database system does do just to cover those
situations where the system misbehaves.

Paul
 
M

mensanator

Dennis said:
Guess I lied...

Talking to myself again, I see...

<snip>
rs = cr.execute("""insert into invoice_1
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)

Whoops, r => rv, though the exceptions raised made it moot

rs = cr.execute("""insert into invoice_2
(CustNo, Title, Author, Year, Price)
values (?,?,?,?,?)""",
r)

Same comment

Out of curiousity, I converted to using MySQL(db) as a test. As
expected, the pre-insert validation code worked with same results (well,
the price was declared decimal, and Python 2.4 appears to handle that as
a Decimal("value") on return <G>)

Now, taking out the pre-validation and type conversion, supplying
all data as it came from the CSV file:

-=-=-=-=-=-=-
Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919',
'19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975',
'9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']
['>68', 'Strawmen and Dorothy', '', '2006', '49.89']
['033', "The Emperor's Old Clothes", 'Grimm Hound', '1887', 'Priceless']

Select all from Invoice_1 (CustNo is CHARACTER)
(1L, '066', '101 Ways to Start A Fight', 'some Irish gentleman', 1919L,
Decimal("19.95"))
(2L, '032', 'A Sale of Two Titties', 'Charles Dikkens', 1855L,
Decimal("20.00"))
(3L, '001', 'Olsens Standard Book of British Birds (Expurgated)"',
'Olsen', 0L, Decimal("99.95"))
(4L, '066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', 1975L,
Decimal("9.99"))
(5L, '032', 'David Coperfield', 'Edmund Wells', 1955L, Decimal("3.95"))
(6L, '>68', 'Strawmen and Dorothy', '', 2006L, Decimal("49.89"))
(7L, '033', "The Emperor's Old Clothes", 'Grimm Hound', 1887L,
Decimal("0.00"))

Select all from Invoice_2 (CustNo is INTEGER)
(1L, 66L, '101 Ways to Start A Fight', 'some Irish gentleman', 1919L,
Decimal("19.95"))
(2L, 32L, 'A Sale of Two Titties', 'Charles Dikkens', 1855L,
Decimal("20.00"))
(3L, 1L, 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen',
0L, Decimal("99.95"))
(4L, 66L, 'Ethel the Aardvark Goes Quantity Surveying', 'None', 1975L,
Decimal("9.99"))
(5L, 32L, 'David Coperfield', 'Edmund Wells', 1955L, Decimal("3.95"))
(6L, 0L, 'Strawmen and Dorothy', '', 2006L, Decimal("49.89"))
(7L, 33L, "The Emperor's Old Clothes", 'Grimm Hound', 1887L,
Decimal("0.00"))
-=-=-=-=-=-=-

How interesting... With MySQL/MySQLdb I did NOT get exceptions or
error results on inserting bad numeric data supplied as character string
format (ie, as read from the CSV). Instead, MySQL SILENTLY converted
them to ZEROS

A price of "Priceless" becomes Decimal("0.00").

The Customer number of ">68" became 0L


Which would one rather have to work with -- a database that copied
invalid numerics as string literals (which, in my mind, makes it much
easier to correct the data later, using "update .... set field = correct
where field = invalid") or a database that silently converts them all to
0 values. (Of course, I now expect to have a rejoinder about "Using a
REAL database instead of MySQL" -- but unless said person wishes to
start making the same comments about SQLite on at least as regular a
basis, I believe the objection itself is invalid for this example).

(Apparently we have fallen afoul of this clause from the old
O'Reilly/MySQL black/brown book: "When asked to store a value in a
numeric column that is outside the column type's allowable range, MySQL
clips the value to the appropriate endpoint of the range and stores the
resulting value instead." -- seems character data "clips" to zero.

Are you saying that MySQL is goofy? ;-)

Based on these replies, I'm pulling back and retrenching.

As I said before, I'm not entering 500,000 records by writing
INSERT statements for each record, so reading csv files is
a more realistic test. Nevertheless, I am still convinced that
the documentation (or lack thereof) is mainly responsible for
my confusion. I was, after all, mimicing the examples given
(which still have errors).

I think an explanation of how Sqlite3 differs from SQL and
a better set of examples is still warranted.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,157
Messages
2,570,879
Members
47,414
Latest member
djangoframe

Latest Threads

Top