Python sqlite and regex.

J

Julien ARNOUX

Hi,
I'd like to use regular expressions in sqlite query, I using apsw module
but it doesn't work...Can you help me ?
My script:

import apsw
import re

path = 'db/db.db3'

#regexp function (extract from python-list discusion)
def regexp(expr, item):
reg = re.compile(expr)
return reg.match(item) is not None

con = apsw.Connection(path)
#create function
con.createscalarfunction("REGEXP", regexp)
cur = con.cursor()
#exampl
cur.execute("select foo from test where foo regex 'aa.[0-9])")

and the error is:

cur.execute('select foo from test where foo regex tata')
apsw.SQLError: SQLError: near "regex": syntax error

Thanks
 
D

Dan Sommers

cur.execute("select foo from test where foo regex 'aa.[0-9])")
and the error is:
cur.execute('select foo from test where foo regex tata')
apsw.SQLError: SQLError: near "regex": syntax error

I think you're missing a closing quote on that regex; or perhaps that's
an extra closing parenthesis at the end.

Also, it's probably best to let the database module do any escaping you
may need. For example:

fooregex = r'aa.[0-9]'
sql = 'select foo from test where foo regex %s'
cur.execute( sql, tuple( fooregex ) )

See the DP API spec for more information.

Regards,
Dan
 
M

Matt Good

SQLite3 already has a REGEXP function, so you don't need to create your
own.

As Dan mentioned you also have a problem in your expression: 'aa.[0-9])
You need a closing quote on the expression, and you need to match the
close paren with an open paren, or remove it.

Also, in case you weren't aware, there will be a "sqlite3" module in
Python 2.5 based on pysqlite 2.2: http://initd.org/tracker/pysqlite

Using pysqlite will make it easier to move to the Python 2.5 sqlite3
module if that's important to you.

-- Matt Good
 
?

=?ISO-8859-1?Q?Gerhard_H=E4ring?=

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matt said:
SQLite3 already has a REGEXP function, so you don't need to create your
own. [...]

Yes, but SQLite does not include a regular expression engine, and thus
according to the SQLite docs you need to register a REGEXP function in
order to make the REGEXP operator work:

"""
The REGEXP operator is a special syntax for the regexp() user function. No
regexp() user function is defined by default and so use of the REGEXP
operator will normally result in an error message. If a user-defined
function named "regexp" is defined at run-time, that function will be
called in order to implement the REGEXP operator.
"""

- -- Gerhard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEbfbQdIO4ozGCH14RAqqqAJ49/9Kpi8xA6AyGB0tVJ/JcU4MczgCgoIsW
gdYgUl9ge63CiHqj4Mzgpns=
=1ZDV
-----END PGP SIGNATURE-----
 
P

Paul McGuire

Gerhard Häring said:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matt said:
SQLite3 already has a REGEXP function, so you don't need to create your
own. [...]

Yes, but SQLite does not include a regular expression engine, and thus
according to the SQLite docs you need to register a REGEXP function in
order to make the REGEXP operator work:

"""
The REGEXP operator is a special syntax for the regexp() user function. No
regexp() user function is defined by default and so use of the REGEXP
operator will normally result in an error message. If a user-defined
function named "regexp" is defined at run-time, that function will be
called in order to implement the REGEXP operator.
"""

This is very interesting. So I *could* define my own regexp function that
processes not regular expressions, but say, glob-like strings, which are
usually much easier for end users to work with (very basic wild-carding
where '*' matches one or more characters, and '?' matches any single
character - maybe add '#' to match any single digit and '@' to match any
single alpha character).

-- Paul
 
P

Paul McGuire

where '*' matches one or more characters, and '?' matches any single

oops, I meant '*' matches zero or more characters.

In many applications, these tests are sufficient for most user queries. And
this eliminates the problem of teaching application users how to create
queries using the full regular expression syntax.

-- Paul
 
P

Paul McGuire

John Salerno said:
'?' also matches 0 characters

Maybe it does, and maybe it doesn't... :)

Here is my test with the glob module (in my directory, there is a file named
PUTTY.RND):
glob.glob('PUTTY.RND') ['PUTTY.RND']
glob.glob('PUTTY.RN?') ['PUTTY.RND']
glob.glob('PUTTY.RN?D') []
glob.glob('PUTTY.RN?') ['PUTTY.RND']
glob.glob('PUTTY.RND?') []
glob.glob('PUTT?.RND') ['PUTTY.RND']
glob.glob('PUTTY?.RND') []
glob.glob('PUTT?Y.RND') []

Looks like '?' does *not* match zero characters in glob. On the other hand,
in the Windows console window, it appears that '?' *does* match zero
characters.

Of course, you could write your regexp() routine to interpret '?' any way
you wanted.

-- Paul
 
D

Dan Sommers

This is very interesting. So I *could* define my own regexp function
that processes not regular expressions, but say, glob-like strings,
which are usually much easier for end users to work with (very basic
wild-carding where '*' matches one or more characters, and '?' matches
any single character - maybe add '#' to match any single digit and '@'
to match any single alpha character).

Doesn't SQL already have lightweight wildcards?

SELECT somefield FROM sometable WHERE someotherfield LIKE '%foo%'

Regards,
Dan
 
P

Paul McGuire

Dan Sommers said:
Doesn't SQL already have lightweight wildcards?

SELECT somefield FROM sometable WHERE someotherfield LIKE '%foo%'

Yes it does - '%' is like '*', and '_' is like '?'. But it is rare for
application users to be familiar with these. In my experience, users are
more likely to have seen '*' and '?'.

The reason I keep citing end/application users, as opposed to developers, is
for the case where the user has filled in some kind of wildcard search
field, to be passed to SQL in a query. Woe to the app designer who figures
that users want to enter a regular expression in such a field. '%' and '_'
wildcards a little better, but frankly, I think '*' and '?' looks more like
a wildcards than '%' and '_'.

-- Paul
 
M

Matt Good

Oops, sorry about the confusion regarding the built-in REGEXP. That's
kind of disappointing. It would appear that the user-defined regexp
function in the original post should work assuming the SQL and regex
syntax errors are corrected.

However, there *is* a GLOB built-in to SQLite 3 that has a default
registered implementation (though it can be overriden by a user-defined
"glob" method if necessary).
 
D

Dan Sommers

Yes it does - '%' is like '*', and '_' is like '?'. But it is rare
for application users to be familiar with these. In my experience,
users are more likely to have seen '*' and '?'.
The reason I keep citing end/application users, as opposed to
developers ...

Users, application users, end users, etc., vary over time. Since this
is c.l.p., though, we'd better just agree to disagree over which user(s)
may or may not be familiar with which (or any) wildcard conventions.
... the case where the user has filled in some kind of wildcard search
field, to be passed to SQL in a query. Woe to the app designer who
figures that users want to enter a regular expression in such a field
...

Woe to the app designer who fails to target the app to the users, make
it clear to the users what is expected in any given field, and accept
"constructive criticism" from unhappy users. ;-)

Knowing your users is one of the most important factors in software
design.
... '%' and '_' wildcards a little better, but frankly, I think '*'
and '?' looks more like a wildcards than '%' and '_'.

Ah, but then you're not an end user, are you? ;-)

Regards,
Dan
 
B

Ben Finney

John Salerno said:
'?' also matches 0 characters

Not in globs. In a glob, '?' matches any one character, '*' matches
any zero or more characters.

In a regex, '.' matches any one character, '?' matches the preceding
atom zero or one times, '*' matches the preceding atom zero or more
times, and '+' matches the preceding atom one or more times.

They're quite different syntaxes, but confusingly similar in
appearance.

On most GNU+Linux systems, these two commands get the relevant manual
pages:

$ man 7 glob
$ man 7 regex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,298
Messages
2,571,540
Members
48,274
Latest member
HowardKipp

Latest Threads

Top