A algorithm for finding the number

Z

zuerrong

Hi members,

Maybe this is not a special question to ruby.
But since my ruby app has meet this algorithm problem, so please give
the suggestion if you'd like.

I have these datas in a mysql table:

mysql> select startNum,endNum from ip_data limit 10;
+----------+----------+
| startNum | endNum |
+----------+----------+
| 16777216 | 16777471 |
| 16843008 | 16843263 |
| 16909056 | 16909311 |
| 17367040 | 17498111 |
| 17498112 | 17563647 |
| 17563648 | 17825791 |
| 17825792 | 18153471 |
| 18153472 | 18219007 |
| 18219008 | 18350079 |
| 18350080 | 18874367 |
+----------+----------+

(Those are actually the bigint for ip addresses.)

Given a numer, say 17498200, I want to find this number is in which
row of the table.

I could run the SQL with ruby's mysql API (dbi and mysql::dbd):

select * from ip_data where startNum <= #{number} and endNum >= #{number}

But I found that is much slow.

The talbe is not so large, has only 339542 rows totally.

So I was thinking loading the whole data into memory, discarding
mysql, and find a algorithm for doing it quickly.

Any suggestion please? Thanks.

Regards,
zuerrong
 
M

Mike Stephens

The common sense thing to do is exactly what you have done. Make sure
you have indexes on the fields. If it's still slow, try a different
database. Five minutes and you should be able to move the data into
Access (Jet), like this:

require 'win32ole'
connection = WIN32OLE.new('ADODB.Connection')
connection.Open('Provider=Microsoft.Jet.OLEDB.4.0;Data
Source=c:\Ruby\RubyAccess.mdb')
recordset = WIN32OLE.new('ADODB.Recordset')
sql = 'SELECT * FROM T1'
recordset.Open(sql, connection)
data = recordset.GetRows.transpose
puts data
recordset.close
end
 
Z

zuerrong

2010/12/1 Mike Stephens said:
The common sense thing to do is exactly what you have done. Make sure
you have indexes on the fields. If it's still slow, try a different
database.

Yes I have added keys to the columns in mysql.
While it's still slooow.
So I was thinking moving the whole data into memory, and find a good
algorithm for it.

Thanks.
 
M

Mike Stephens

I don't know how SQLite works but Access Jet would only fetch across the
network the index pages and the target data page(s). Perhaps SQLite is
bringing the whole database across. A normal server-based database would
obviously solve your problem.
 
B

botp

So I was thinking loading the whole data into memory, discarding
mysql, and find a algorithm for doing it quickly.
Any suggestion

try doing a soft binary search on the first column

best regards -botp
 
B

Barry Smith

[Note: parts of this message were removed to make it a legal post.]

Anyone know how i Unsubscribe from this ?
 
G

Giampiero Zanchi

zuerrong wrote in post #965316:
+----------+----------+
| startNum | endNum |
+----------+----------+
| 16777216 | 16777471 |
| 16843008 | 16843263 |
| 16909056 | 16909311 |
| 17367040 | 17498111 |
| 17498112 | 17563647 |
| 17563648 | 17825791 |
| 17825792 | 18153471 |
| 18153472 | 18219007 |
| 18219008 | 18350079 |
| 18350080 | 18874367 |
+----------+----------+
select * from ip_data where startNum <= #{number} and endNum >=
#{number}

try with

select * from ip_data where startNum = (select max(startnum) from
ip_data where startnum <= #{number})
 
B

Ben Bleything

I don't know how SQLite works but Access Jet would only fetch across the
network the index pages and the target data page(s). Perhaps SQLite is
bringing the whole database across. A normal server-based database would
obviously solve your problem.

They've said twice they're using mysql, not sqlite. And remember that
most folks aren't on Windows and can't use Access.

Quite aside from that, sqlite doesn't do any network access at all, it
is always filesystem access only.

Ben
 
M

Mike Stephens

Ben Bleything wrote in post #965416:
They've said twice they're using mysql, not sqlite. And remember that
most folks aren't on Windows and can't use Access.

Well spotted. However I don't know why you think Windows is not widely
used. It is in the corporate world.
Quite aside from that, sqlite doesn't do any network access at all, it
is always filesystem access only.

In the corporate world you rarely - if ever - store data except across a
network
 
B

Ben Bleything

Well spotted. However I don't know why you think Windows is not widely
used. It is in the corporate world.

Call it long experience in the Ruby community. At any rate, giving a
platform-specific solution, whether it's windows or not, is a big
assumption.
In the corporate world you rarely - if ever - store data except across a
network

Yes, of course. But since sqlite has no concept of the network, it all
appears to be local disk access as far as the database is concerned.
If you happen to be accessing it over a network filesystem, sqlite has
no idea. My point was that sqlite will never "bring the whole database
across" since it has no idea what that even means.

Ben
 
M

Mike Stephens

Ben Bleything wrote in post #965426:
My point was that sqlite will never "bring the whole database
across" since it has no idea what that even means.

Whilst this is not a Ruby issue, it is worth pointing out to those that
don't already know, that client databases like Jet (and I would wager,
SQLite) do not read all the database file if there are indexes they can
use.

Your program will read the first bit of the file and detect if there are
applicable indexes. It will then fetch the index pages and look up the
addresses (typically blocks that hold data pages) and fetch those from
the file system.

If you are looking at mickey mouse databases or stuff off a local drive,
this might not be of any consequence, but if you start to get hundreds
of
thousands of records, or millions, it will dramatically cut down your
fetches across the network and the work your program has to do to set up
the data.

If you have joins, the program might be able to do them purely off the
indexes, so that next to no real data has to be fetched.

The original poster was talking about MySQL as you rightly said. The
first issue is therefore to check why the Select was running slowly -
outside of Ruby. There are many reasons why that might be.

Databases are sophisticated components that have millions of man hours
lovingly invested in them. It would be entirely illogical to attempt to
reinvent the wheel in your Ruby program.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,141
Messages
2,570,817
Members
47,367
Latest member
mahdiharooniir

Latest Threads

Top