SQLite3/Sinatra not returning results

A

Alex Gutteridge

Hi,

This has been doing my head in all morning - I wonder if anyone can help
me work out where my problem is:

I have the following code within a Sinatra app:

$probe_db_file = "data/lumiHumanIDMapping.sqlite"
$probe_table = "HumanHT12_V4_0_R2_15002873_B"
$data_db_file = "data/rpediff.db"

$probe_db = SQLite3::Database.new($probe_db_file)
$data_db = SQLite3::Database.new($data_db_file)

get '/:gene' do |gene|

STDOUT.puts gene.inspect
STDOUT.puts gene.class
STDOUT.puts gene == "PAX6"

probes = $probe_db.execute("SELECT Probe_ID,Symbol from #{$probe_table}
WHERE Symbol = ?",gene).map{|row| row[0]}

STDOUT.puts "#{Time.now} - Gene: #{gene}"
STDOUT.puts "#{Time.now} - Probes: #{probes.inspect}"

gene = "PAX6"

probes = $probe_db.execute("SELECT Probe_ID,Symbol from #{$probe_table}
WHERE Symbol = ?",gene).map{|row| row[0]}

STDOUT.puts "#{Time.now} - Gene: #{gene}"
STDOUT.puts "#{Time.now} - Probes: #{probes.inspect}"

end

Running on my development machine (OSX, Ruby 1.8.7, Sinatra 1.1,
sqlite3-ruby 1.3.2) this logs (as expected):

"PAX6"
String
true
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: ["ILMN_1789905", "ILMN_2314140"]
Thu Nov 18 11:12:09 +0000 2010 - Gene: PAX6
Thu Nov 18 11:12:09 +0000 2010 - Probes: ["ILMN_1789905", "ILMN_2314140"]

Running on my production machine (Linux, Ruby 1.9.2, Sinatra 1.1,
sqlite3-ruby 1.3.2) the same query logs:

"PAX6"
String
true
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: []
2010-11-18 11:08:48 +0000 - Gene: PAX6
2010-11-18 11:08:48 +0000 - Probes: ["ILMN_1789905", "ILMN_2314140"]

So on this machine the first query returns an empty result for some
reason, even though the rows it should match *are* in $probe_table (as you
can see I can find them if I set the 'gene' parameter 'manually'). All I
can think of is that somewhere in Sinatra/sqlite3-ruby the 'gene' variable
is not quite the simple String it claims to be, but the fact that this only
seems to happen on Ruby 1.9.2 has got me baffled! Both Probe_ID and Symbol
are TEXT columns in Sqlite3 BTW.
 
Z

zimbatm ...

Hi Alex,

the problem is probably related to the different ruby versions since
many things have changed between 1.8.7 and 1.9.2. Try installing ruby
1.9.2 with RVM ( http://rvm.beginrescueend.com ) on your development
machine and see if you have the same problem.

$ gem install rvm
$ rvm-install
$ rvm install ruby-1.9.2
$ rvm use --default ruby-1.9.2

Fix your .profile, make sure you don't export GEM_HOME in the
environment and that ~/.gemrc
doesn't list "gem: --user-install". Othewise, it works pretty well.

RVM is also able to switch ruby versions depending on the project you're
working on, which is handy if you have different projects which require
different ruby versions.

Cheers,
zimbatm
 
A

Alex Gutteridge

Hi Alex,

the problem is probably related to the different ruby versions since
many things have changed between 1.8.7 and 1.9.2. Try installing ruby
1.9.2 with RVM ( http://rvm.beginrescueend.com ) on your development
machine and see if you have the same problem.

$ gem install rvm
$ rvm-install
$ rvm install ruby-1.9.2
$ rvm use --default ruby-1.9.2

Fix your .profile, make sure you don't export GEM_HOME in the
environment and that ~/.gemrc
doesn't list "gem: --user-install". Othewise, it works pretty well.

RVM is also able to switch ruby versions depending on the project you're
working on, which is handy if you have different projects which require
different ruby versions.

Cheers,
zimbatm

Thanks. It turned out to be an String encoding issue. Sinatra was
returning an "ASCII-8BIT" encoded string, while the SQLite3 db is
"US-ASCII" encoded. Manually setting the Sinatra string encoding to
"US-ASCII" got the DB query to match.
 
B

Brian Candler

Alex Gutteridge wrote in post #963848:
Thanks. It turned out to be an String encoding issue. Sinatra was
returning an "ASCII-8BIT" encoded string, while the SQLite3 db is
"US-ASCII" encoded. Manually setting the Sinatra string encoding to
"US-ASCII" got the DB query to match.

Ergh. I can replicate this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2
(gem) under Ubuntu Lucid x86_64.

ruby-1.9.2-p0 > require 'sqlite3'
=> true
ruby-1.9.2-p0 > db = SQLite3::Database.new("foo.db")
=> #<SQLite3::Database:0x00000002631bc0>
ruby-1.9.2-p0 > db.execute("create table foo (id integer auto_increment,
bar varchar(255))")
=> []
ruby-1.9.2-p0 > db.execute("insert into foo (bar) values ('hello')")
=> []
ruby-1.9.2-p0 > s1, s2, s3 = "hello",
"hello".force_encoding("US-ASCII"), "hello".force_encoding("ASCII-8BIT")
=> ["hello", "hello", "hello"]
ruby-1.9.2-p0 > s1 == s2
=> true
ruby-1.9.2-p0 > s1 == s3
=> true
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s1)
=> [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s2)
=> [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s3)
=> []

So s1 == s3, but the two queries give different results?? This is
barking mad. Having said that, I can't find anything in the sqlite3-ruby
documentation which says how queries may be affected by encodings, so
the behaviour is undefined.

Sinatra's documentation also doesn't seem to mention the encoding that
your |gene| parameter will have, so you have to find out by
trial-and-error that it's ASCII-8BIT. You might have guessed this
because the data comes from a socket and sockets get ASCII-8BIT by
default - but Sinatra might have decided to parse the
Content-Type:...encoding parameter and apply that.

In ruby 1.8 the value "hello" is just 5 bytes, with no hidden dimension
to worry about.
 
M

Markus Fischer

Ergh. I can replicate this, with ruby 1.9.2p0 (rvm) + sqlite3-ruby-1.3.2
(gem) under Ubuntu Lucid x86_64.

ruby-1.9.2-p0 > require 'sqlite3'
=> true
ruby-1.9.2-p0 > db = SQLite3::Database.new("foo.db")
=> #<SQLite3::Database:0x00000002631bc0>
ruby-1.9.2-p0 > db.execute("create table foo (id integer auto_increment,
bar varchar(255))")
=> []
ruby-1.9.2-p0 > db.execute("insert into foo (bar) values ('hello')")
=> []
ruby-1.9.2-p0 > s1, s2, s3 = "hello",
"hello".force_encoding("US-ASCII"), "hello".force_encoding("ASCII-8BIT")
=> ["hello", "hello", "hello"]
ruby-1.9.2-p0 > s1 == s2
=> true
ruby-1.9.2-p0 > s1 == s3
=> true
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s1)
=> [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s2)
=> [[nil, "hello"]]
ruby-1.9.2-p0 > db.execute("select * from foo where bar=?", s3)
=> []

So s1 == s3, but the two queries give different results?? This is
barking mad. Having said that, I can't find anything in the sqlite3-ruby
documentation which says how queries may be affected by encodings, so
the behaviour is undefined.

I was just affected by that, Brian, rhanks for analyzing this problem in
the past! I'm running 1.9.2-p180 in my case. Now when I re-read the
other thread "A question about Ruby 1.9's "external encoding" when
Robert concludes "The rule as such is pretty clear IMHO" and "Now,
everything is clear" I feel like:

nothing.is.clear.to.me

What is this whole character encoding madness about? I never encountered
a language I stumbled into so many subtle problems with encodings left
and right (mainly working with Java in my case). The more I work with
ruby 1.9, the more confusing it gets. I'm getting worried. What am I
missing what others do know that I run into such troubles?

Getting back to this specific case, where's the problem exactly here? Is
it Sinatra? My source file? The sqlite database?

I wrote a simple testing app for the sqlite db and got this:

$ sqlite3 database.sqlite3 .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE user ( username varchar(255), password varchar(255), email
varchar(255));
INSERT INTO "user" VALUES('a','b','c');
COMMIT;


$ cat dbitest.rb
require 'dbi'
require 'pp'

dbi = DBI.connect('dbi:SQLite3:database.sqlite3')

sql = "SELECT * FROM user WHERE username = ? AND password = ?"

username = 'a'
password = 'b'

pp username.encoding, password.encoding, dbi.select_one(sql, username,
password)[0].encoding


$ ruby dbitest.rb
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
#<Encoding:UTF-8>


Within Sinatra it is as you said:

Relevant excerpt of the code:

if @auth.provided? && @auth.basic? && @auth.credentials
sql = "SELECT * FROM user WHERE username = ? AND password = ?"
username = @auth.credentials[0]
password = @auth.credentials[1]
pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)
username.force_encoding('US-ASCII')
password.force_encoding('US-ASCII')
pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)[0].encoding
end


This is the output

#<Encoding:ASCII-8BIT>
#<Encoding:ASCII-8BIT>
nil
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
#<Encoding:UTF-8>

Stunning.

- Markus
 
B

brabuhr

Within Sinatra it is as you said:

=A0if @auth.provided? && @auth.basic? && @auth.credentials
=A0 =A0 sql =3D "SELECT * FROM user WHERE username =3D ? AND password =3D= ?"
=A0 =A0 username =3D @auth.credentials[0]
=A0 =A0 password =3D @auth.credentials[1]
=A0 =A0 pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)
=A0 =A0 username.force_encoding('US-ASCII')
=A0 =A0 password.force_encoding('US-ASCII')
=A0 =A0 pp username.encoding, password.encoding, $dbi.select_one(sql,
username, password)[0].encoding
=A0end

This is the output

#<Encoding:ASCII-8BIT>
#<Encoding:ASCII-8BIT>
nil
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
#<Encoding:UTF-8>
Getting back to this specific case, where's the
problem exactly here? Is it Sinatra? My source
file? The sqlite database?

The credentials in Sinatra are unpacked in Rack:

ruby 1.9.2p0:
Encoding.default_external
=3D> # said:
Encoding.default_internal =3D> nil
s =3D 'hello' =3D> "hello"
s.encoding
=3D> # said:
.pack('m*') =3D> "aGVsbG8=3D\n"
.pack('m*').unpack('m*') =3D> ["hello"]
.pack('m*').unpack('m*').first.encoding

=3D> #<Encoding:ASCII-8BIT>

rack/blob/master/lib/rack/auth/basic.rb:
def credentials
@credentials ||=3D params.unpack("m*").first.split(/:/, 2)
end

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack('M') must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

I don't know if there is any reasonable way for Rack to reliably
return a more correct encoding for the credentials, but that seems to
be where your incompatible encoding is coming from.
 
M

Markus Fischer

The credentials in Sinatra are unpacked in Rack:

ruby 1.9.2p0:
Encoding.default_external
=> # said:
Encoding.default_internal => nil
s = 'hello' => "hello"
s.encoding
=> # said:
.pack('m*') => "aGVsbG8=\n"
.pack('m*').unpack('m*') => ["hello"]
.pack('m*').unpack('m*').first.encoding

=> #<Encoding:ASCII-8BIT>

rack/blob/master/lib/rack/auth/basic.rb:
def credentials
@credentials ||= params.unpack("m*").first.split(/:/, 2)
end

http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack('M') must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

I don't know if there is any reasonable way for Rack to reliably
return a more correct encoding for the credentials, but that seems to
be where your incompatible encoding is coming from.
http://redmine.ruby-lang.org/issues/show/4279
pack.c (pack_unpack): the resulted string of unpack('M') must have
ASCII-8BIT encoding (and ENC_CODERANGE_VALID). [ruby-core:34482]

Oops, that's unpack('M'), Rack is using unpack('m').


Thanks, that very informative! Guess proper encoding handling just comes
with cost and confusion. Although it's very confusing, it actually looks
right what ruby is doing. When it decodes base64, how should it know
it's encoding? My example, involving PHP:

$ php -r 'echo base64_encode("this is a töst"), "\n";'
dGhpcyBpcyBhIHTDtnN0

$ irb
ruby-1.9.2-p180 :001 > "dGhpcyBpcyBhIHTDtnN0".unpack('m')[0].encoding
=> #<Encoding:ASCII-8BIT>
ruby-1.9.2-p180 :002 > "dGhpcyBpcyBhIHTDtnN0".unpack('m')[0]
=> "this is a t\xC3\xB6st"
ruby-1.9.2-p180 :003 >
"dGhpcyBpcyBhIHTDtnN0".unpack('m')[0].force_encoding('UTF-8')
=> "this is a töst"
ruby-1.9.2-p180 :004 >

So it simply can't know it, slaps ASCII-8BIT at it and the application
needs to deal with the proper encoding. Me not knowingly that what I get
is a direct result from unpack() [and if I would have knew it, I
wouldn't have knew'd that it's ASCII-8BIT either ;)] treated it like a
source string when it's more a network string.

Good to know, case solved, still long road ahead :)

thank you,
- Markus
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,150
Members
46,697
Latest member
AugustNabo

Latest Threads

Top