[QUIZ] IP to Country (#139)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

This week's Ruby Quiz is to write a simple utility. Your program should accept
an IP address as a command-line argument and print out the two letter code for
the country that IP is assigned in. You can find a database for the matching
at:

http://software77.net/cgi-bin/ip-country/geo-ip.pl

To keep the problem interesting though, let's write our programs with a focus on
speed and memory efficiency.

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s
 
S

Simon Kröger

Ruby said:
[...]

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

Is it motivating or a spoiler to post timings?

cheers

Simon
 
J

James Edward Gray II

Ruby said:
[...]

$ time ruby ip_to_country.rb 68.97.89.187
US
=09
real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

My script does need and initialization run, yes. I don't see any =20
harm in paying a one time penalty to set things up right.
Is it motivating or a spoiler to post timings?

Motivating, definitely. :)

James Edward Gray II=
 
S

Simon Kröger

James said:
Ruby said:
[...]

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

My script does need and initialization run, yes. I don't see any harm
in paying a one time penalty to set things up right.
Is it motivating or a spoiler to post timings?

Motivating, definitely. :)

James Edward Gray II

Ok, my script does not need any initialization, it uses the file
IpToCountry.csv exactly as downloaded.

----------------------------------------------------------------
$ ruby -v
ruby 1.8.4 (2005-12-24) [i386-cygwin]

$ time ruby quiz139.rb 68.97.89.187
US

real 0m0.047s
user 0m0.030s
sys 0m0.030s

$ time ruby quiz139.rb 84.191.4.10
DE

real 0m0.046s
user 0m0.046s
sys 0m0.015s
----------------------------------------------------------------

This is on a Pentium M 2.13GHz Laptop with 2GB RAM and rather slow HD.

cheers

Simon
 
J

James Edward Gray II

James said:
Ruby Quiz wrote:
[...]

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

My script does need and initialization run, yes. I don't see any =20
harm
in paying a one time penalty to set things up right.
Is it motivating or a spoiler to post timings?

Motivating, definitely. :)

James Edward Gray II

Ok, my script does not need any initialization, it uses the file
IpToCountry.csv exactly as downloaded.

----------------------------------------------------------------
$ ruby -v
ruby 1.8.4 (2005-12-24) [i386-cygwin]

$ time ruby quiz139.rb 68.97.89.187
US

real 0m0.047s
user 0m0.030s
sys 0m0.030s

$ time ruby quiz139.rb 84.191.4.10
DE

real 0m0.046s
user 0m0.046s
sys 0m0.015s
----------------------------------------------------------------

Wow, I'm impressed. Can't wait to see that code!

James Edward Gray II=
 
S

Simon Kröger

Wow, I'm impressed. Can't wait to see that code!

Thanks. :)

Because startup time startet to take over the benchmark:
----------------------------------------------------------------
$ time ruby quiz139.rb 68.97.89.187 84.191.4.10 80.79.64.128 210.185.128.123
202.10.4.222 192.189.119.1
US
DE
RU
JP
AU
EU

real 0m0.078s
user 0m0.046s
sys 0m0.031s
----------------------------------------------------------------

and by the way: thanks for telling me such a database exists!

cheers

Simon
 
D

diego scataglini

I think that the timing of the scripts are not a good index. It all =20
depends on what hardware/os you are running it on.
If we want to use speed as an index we should probably have J.E. =20
compare them all on the same machine.

Maybe we could also write a ruby script that runs all the entry =20
scripts and time them, and that could be another ruby quiz which will =20=

also be voted on speed and then we could write a ruby script to time =20
those entries an then we could .... Just ignore this paragraph

Diego Scataglini

James said:
Ruby Quiz wrote:
[...]

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

My script does need and initialization run, yes. I don't see any =20
harm
in paying a one time penalty to set things up right.
Is it motivating or a spoiler to post timings?

Motivating, definitely. :)

James Edward Gray II

Ok, my script does not need any initialization, it uses the file
IpToCountry.csv exactly as downloaded.

----------------------------------------------------------------
$ ruby -v
ruby 1.8.4 (2005-12-24) [i386-cygwin]

$ time ruby quiz139.rb 68.97.89.187
US

real 0m0.047s
user 0m0.030s
sys 0m0.030s

$ time ruby quiz139.rb 84.191.4.10
DE

real 0m0.046s
user 0m0.046s
sys 0m0.015s
----------------------------------------------------------------

This is on a Pentium M 2.13GHz Laptop with 2GB RAM and rather slow HD.

cheers

Simon
 
J

James Edward Gray II

I think that the timing of the scripts are not a good index. It all
depends on what hardware/os you are running it on.
If we want to use speed as an index we should probably have J.E.
compare them all on the same machine.

I view it that we are getting a rough idea of speeds, not exact
counts. Both scripts timed so far seem to be able to answer the
question in under a second on semi-current hardware. Good enough for
me.
Maybe we could also write a ruby script that runs all the entry
scripts and time them, and that could be another ruby quiz which
will also be voted on speed and then we could write a ruby script
to time those entries an then we could .... Just ignore this paragraph

Thank you for volunteering... ;)

James Edward Gray II
 
B

Bill Kelly

From: "Simon Kröger said:
Ok, my script does not need any initialization, it uses the file
IpToCountry.csv exactly as downloaded.

We probably did something similar. :) Mine also works on the
unmodified IpToCountry.csv file.

$ time ruby 139_ip_to_country.rb 67.19.248.74 70.87.101.66 205.234.109.18 217.146.186.221 62.75.166.87
US
US
US
GB
DE

real 0m0.122s
user 0m0.015s
sys 0m0.000s


(ruby 1.8.4 (2005-12-24) [i386-mswin32], timed from cygwin bash shell,
2GHz athlon64, winxp.)

I don't think the timings are very accurate on this system. It
didn't change much whether I looked up one IP or five.

. . Looking up 80 IPs on one command line resulted in:

real 0m0.242s
user 0m0.015s
sys 0m0.016s


Regards,

Bill
 
C

Clifford Heath

Ruby said:
This week's Ruby Quiz is to write a simple utility. Your program should accept
an IP address as a command-line argument and print out the two letter code for
the country that IP is assigned in.

I assume that it's not ok for folk to use my Ruby GeoIP gem?
It has a reasonable compromise between speed and memory usage.
:)

Clifford Heath.
 
J

James Edward Gray II

I assume that it's not ok for folk to use my Ruby GeoIP gem?
It has a reasonable compromise between speed and memory usage.
:)

Please do. I would love to see how it stacks up against the custom
solutions we will no doubt see.

James Edward Gray II
 
J

Jesús Gabriel y Galán

This week's Ruby Quiz is to write a simple utility. Your program should accept
an IP address as a command-line argument and print out the two letter code for
the country that IP is assigned in. You can find a database for the matching
at:

http://software77.net/cgi-bin/ip-country/geo-ip.pl

To keep the problem interesting though, let's write our programs with a focus on
speed and memory efficiency.

Hi,

This is my second submission to Ruby Quiz, any tip will be greatly
appreciated. First I downloaded the file and removed all comments,
then tried a straightforward approach. Here it is:

require 'faster_csv'
ip = ARGV[0].split(/\./)
ip = ip[0].to_i * 16777216 + ip[1].to_i * 65536 + ip[2].to_i * 256 + ip[3].to_i
file = ARGV[1] || 'ipdb.csv'

FasterCSV.foreach(file) do |line|
if (line[0].to_i <= ip && line[1].to_i >= ip)
puts line[4]
exit
end
end

It allows to pass the IP as the first parameter, then transforms it to
the format in the file (the "base" 256). The file can be passed as the
second argument, by default I used my modified file without the
comments.

The method uses FasterCSV to parse the file line by line, checking ip
against the range. The result for the IP in the example was:

time ruby quiz139a.rb 68.97.89.187
US

real 0m0.355s
user 0m0.328s
sys 0m0.024s

So, it was a little bit more than the one presented with the problem.
Unfortunately, things don't go so good when you search for an IP that
matches entries at the end of the file:

time ruby quiz139a.rb 255.0.0.1
ZZ

real 0m5.337s
user 0m5.036s
sys 0m0.292s

More than 5 seconds :-(

So, I went on to build a second version. This time the approach I
took, as the file is ordered is a binary search directly on the file.
Here it is:

require 'ftools'

ip = ARGV[0].split(/\./)
ip = ip[0].to_i * 16777216 + ip[1].to_i * 65536 + ip[2].to_i * 256 + ip[3].to_i
file = ARGV[1] || 'ipdb.csv'

File.open(file) do |f|
low = 0
high = f.stat.size
f.seek(high / 2)
while true
while ((a = f.getc) != 10)
f.seek(-2, IO::SEEK_CUR)
end
pos = f.pos
line = f.readline.split(",")
low_range = line[0][1..-2].to_i
high_range = line[1][1..-2].to_i
if (low_range > ip)
high = pos
offset = (f.pos - pos) + ((high - low) / 2)
f.seek(-offset, IO::SEEK_CUR)
elsif (high_range < ip)
low = f.pos
f.seek((high-low) / 2, IO::SEEK_CUR)
else
puts line[4][1..-2]
exit
end
end
end

What the method does is a binary search in the file. It starts a low
and high variables to manage the partition, then positions the cursor
in the file to the middle. Every time, the method reads back until it
finds a 10 (\n), then reads that line to check the IP range. If the
lower range is higher than the IP, the method moves the cursor in the
file, down to half of the current low-high range. If the higher range
is lower than the IP, then it moves up half of the range. The timings
here are much better:

time ruby quiz139b.rb 68.97.89.187
US

real 0m0.077s
user 0m0.048s
sys 0m0.004s

time ruby quiz139b.rb 255.0.0.1
ZZ

real 0m0.069s
user 0m0.060s
sys 0m0.008s

Apart from the general strategy of the binary search, I haven't had
the time to give too much thought to the details, like how to extract
the values after the readline. Any tips regarding that would be
greatly appreciated. Also I don't know if there's a better way of
moving back in the file to the previous line break (the while (a =
f.getc) != 10 part). Tips?

Thanks for this quiz. I hope I had time to work on every quiz.

Regards,

Jesus.
 
S

Simon Kröger

Hi,

one of the (without a doubt) many binary search solutions. This one reads a 1K
chunk of the file around the current position and uses a regexp to extract
valid lines (and values) from this chunk.

The reasoning behind that is that data from disk is read in blocks anyway (in
most cases even larger than 1K) and the last 2 or 3 iterations of the algorithm
can be avoided. (max depth encountered so far was 14)

Using the regexp frees us from 'manually' looking for newlines (it will find
about 10 valid sets of data in each iteration).

I tried a lot of stuff like biasing the median towards the side where the
solution will be most likely found, using 'static' buffers for the File#read
calls or unrolling the recursion but nothing proofed realy usefull so i decided
to go with the short and fast enough(tm) solution.

------------------------------------------------------------------------------
require 'ipaddr'

class IPAddr
def country_code()
open('IpToCountry.csv'){|file| code(file, to_i, 0, file.stat.size)}
end

private
def code(file, ip, min, max)
median = (min + max) / 2
file.seek(median - 512)
ary = file.read(1024).scan(/^"(\d*)","(\d*)","(?:\w*)","(?:\d*)","(\w*)"/)
return code(file, ip, median, max) if ary.empty? || ary.last[1].to_i < ip
return code(file, ip, min, median) if ary.first[0].to_i > ip
ary.find{|l| l[1].to_i >= ip}[2]
end
end

ARGV.each{|arg| puts IPAddr.new(arg).country_code} if $0 == __FILE__
------------------------------------------------------------------------------

timings:

$ time ruby quiz139.rb 0.0.0.0 68.97.89.187 84.191.4.10 80.79.64.128
210.185.128.123 202.10.4.222 192.189.119.1 255.255.255.255
ZZ
US
DE
RU
JP
AU
EU
ZZ

real 0m0.062s
user 0m0.046s
sys 0m0.030s

cheers

Simon
 
M

Matthias Wächter

--------------000706010906020103020108
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit

Ruby said:
This week's Ruby Quiz is to write a simple utility. Your program should accept
an IP address as a command-line argument and print out the two letter code for
the country that IP is assigned in. You can find a database for the matching
at:

http://software77.net/cgi-bin/ip-country/geo-ip.pl

To keep the problem interesting though, let's write our programs with a focus on
speed and memory efficiency.

$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

This is my first submission to Ruby Quiz, so don't expect ground-breaking techniques from this implementation.

Let me characterize my solution: Instead of looking up the csv file directly, I precompile it. The search can then be performed quickly. I think my solution will beat some others when it comes to massive lookups in the range of 100th of thousand lookups.

First, I wrote a random IP address generator. It can alternatively output the IP addresses as a space delimited string or each IP address on a separate line. On my bash submitting all IP address worked up to about 8 k entries until the argument list was too long, so I implemented that the addresses can be given on $stdin also. For a proper argument list on stdin, the entries should be separated by \n, so just add a second argument to rand_ip.rb to generate the list of ips -- the script doesn't care what the second parameter is, it just has to be there (i use "byline").

I use the script like this:

$ ./rand_ip.rb 100000 byline > ip-test-list


Second, I use a precompile run. I don't know how folks here can stay under 100 milliseconds without parsing the file beforehand. My precompilation generates a binary table that stores 10 bytes for each entry: 4 bytes each for start/end of address space and 2 bytes for the country code. Additionally, for saving memory, I squeeze memory areas with adjacent addresses in the same country. This saves more than 50% of the records while it doesn't significantly speed up the search (just about 1 step more in my binary search algorithm, see below). the packer (packr.rb) uses pack("NNa2") for building the binary table, and look, the addresses are stored in network byte order (i.e. big endian). The output table holds about 32 k Entries.

The packer doesn't accept any parameters, just call

$ ./packer.rb


Third comes the actual address search. I implemented two versions for comparison: One with a small RAM footprint that looks up everything in the packed table file. The other one reads the whole table into memory and performs all lookups in memory.

The algorithm is the same in both implementations: I use binary search over all entries until the ip address either matches one range or there is no match at all.

Comparison is sped up by making 4-letter string comparisons of the respective binary addresses. That way "search_ip >= addr_start" works even when the addresses are strings. This saves a lot of time because at search time there is no .to_i calculation avoided.

I run the searchers (search_file.rb, search_str.rb) for small numbers of random IP address using the command line:

$ time ./search_file.rb `./rand_ip.rb 3000` > /dev/null

For more addresses I use one of the following -- the last usage is to make successive runs with equal input:

$ time ./rand_ip.rb 100000 byline | ./search_file.rb > /dev/null
$ ./rand_ip.rb 100000 byline > ip-test-list; time ./search_file.rb < ip-test-list > /dev/null



My results for 100000 lookups are:

$ time ./packer.rb

real 0m1.634s
user 0m1.576s
sys 0m0.056s
$ time ./rand_ip.rb 100000 byline > ip-test-list

real 0m0.797s
user 0m0.740s
sys 0m0.056s
$ time ./search_file.rb < ip-test-list > /dev/null

real 0m11.091s
user 0m9.673s
sys 0m1.420s
$ time ./search_str.rb < ip-test-list > /dev/null

real 0m7.131s
user 0m6.960s
sys 0m0.168s

btw: I don't understand the following result -- can you help me improving this? 129 ms just for firing ruby up! Can't be!
$ time ruby /dev/null

real 0m0.129s
user 0m0.120s
sys 0m0.012s
$ uname -a
Linux server 2.6.21-modgentoo #2 SMP Wed May 2 19:07:13 CEST 2007 x86_64 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux
$ ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux]

Cheers,
- Matthias

--------------000706010906020103020108
Content-Type: text/plain;
name="rand_ip.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="rand_ip.rb"

#!/usr/bin/ruby

r=ARGV[0].to_i

EOL= ARGV[1] ? "\n" : " "

r.times {
print "#{rand(256)}.#{rand(256)}.#{rand(256)}.#{rand(256)}#{EOL}"
}

--------------000706010906020103020108
Content-Type: text/plain;
name="packer.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="packer.rb"

#!/usr/bin/ruby
# comment

last_start=nil
last_end=nil
last_country=nil
File.open("packed-ip.dat","wb") do |wfile|
IO.foreach("geo-ip.csv") do |line|
next if !(line =~ /^"/ )
s,e,d1,d2,co=line.delete!("\"").split(",")
s,e = s.to_i,e.to_i
if !last_start
# initialize with first entry
last_start,last_end,last_country = s,e,co
else
if s==last_end+1 and co==last_country
# squeeze if successive ranges have zero gap
last_end=e
else
# append last entry, remember new one
wfile << [last_start,last_end,last_country].pack("NNa2")
last_start,last_end,last_country = s,e,co
end
end
end
# print last entry
if last_start
wfile << [last_start,last_end,last_country].pack("NNa2")
end
end

--------------000706010906020103020108
Content-Type: text/plain;
name="search_file.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="search_file.rb"

#!/usr/bin/ruby

# take command line or stdin -- the latter has performance advantage for long lists
if ARGV[0]
arr=ARGV
else
arr=$stdin
end

# the binary table file is looked up with each request
File.open("packed-ip.dat","rb") do |rfile|
rfile.seek(0,IO::SEEK_END)
record_max=rfile.pos/10-1

arr.each { |argv|
# build a 4-char string representation of IP address
# in network byte order so it can be a string compare below
ipstr= argv.split(".").map {|x| x.to_i.chr}.join

# low/high water marks initialized
low,high=0,record_max
while true
mid=(low+high)/2 # binary search median
rfile.seek(10*mid) # one record is 10 byte, seek to position
str=rfile.read(8) # for range matching, we need only 8 bytes
# at comparison, values are big endian, i.e. packed("N")
if ipstr>=str[0,4] # is this IP not below the current range?
if ipstr<=str[4,4] # is this IP not above the current range?
puts rfile.read(2) # a perfect match, voila!
break
else
low=mid+1 # binary search: raise lower limit
end
else
high=mid-1 # binary search: reduce upper limit
end
if low>high # no entries left? nothing found
puts "no country"
break
end
end
}
end

--------------000706010906020103020108
Content-Type: text/plain;
name="search_str.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="search_str.rb"

#!/usr/bin/ruby

# read the binary table upfront
f=File.open("packed-ip.dat","rb")
table=f.read
record_max=table.length/10-1
f.close

# take command line or stdin -- the latter has performance advantage for long lists
if ARGV[0]
arr=ARGV
else
arr=$stdin
end

arr.each { |argv|
# build a 4-char string representation of IP address
# in network byte order so it can be a string compare below
ipstr= argv.split(".").map {|x| x.to_i.chr}.join

# low/high water marks initialized
low,high=0,record_max
while true
mid=(low+high)/2 # binary search median
# at comparison, values are big endian, i.e. packed("N")
if ipstr>=table[10*mid,4] # is this IP not below the current range?
if ipstr<=table[10*mid+4,4] # is this IP not above the current range?
puts table[10*mid+8,2] # a perfecct match, voila!
break
else
low=mid+1 # binary search: raise lower limit
end
else
high=mid-1 # binary search: reduce upper limit
end
if low>high # no entries left? nothing found
puts "no country"
break
end
end
}


--------------000706010906020103020108--
 
J

James Edward Gray II

This is the code I used to generate the quiz example. To give credit
where credit is due though, it was heavily inspired from some code
Allan Odgaard shared with me:

#!/usr/bin/env ruby -wKU

require "open-uri"
require "zlib"

begin
require "rubygems"
rescue LoadError
# load without gems
end

begin
require "faster_csv"
FCSV.build_csv_interface
rescue LoadError
require "csv"
end


class IP
def initialize(address)
@address_chunks = address.split(".").map { |n| Integer(n) }
raise AgumentError, "Malformed IP" unless @address_chunks.size == 4
end

def to_i
@address_chunks.inject { |result, chunk| result * 256 + chunk }
end

STRING_SIZE = new("255.255.255.255").to_i.to_s.size

def to_s
"%#{STRING_SIZE}s" % to_i
end
end

class IPToCountryDB
REMOTE = "http://software77.net/cgi-bin/ip-country/geo-ip.pl?
action=download"
LOCAL = "ip_to_country.txt"

RECORD_SIZE = IP::STRING_SIZE * 2 + 5

def self.build(path = LOCAL)
open(path, "w") do |db|
open(REMOTE) do |url|
csv = Zlib::GzipReader.new(url)

last_range = Array.new
csv.each do |line|
next if line =~ /\A\s*(?:#|\z)/
from, to, country = CSV.parse_line(line).values_at(0..1, 4).
map { |f| Integer(f) rescue f }
if last_range[2] == country and last_range[1] + 1 == from
last_range[1] = to
else
build_line(db, last_range)
last_range = [from, to, country]
end
end
build_line(db, last_range)
end
end
end

def self.build_line(db, fields)
return if fields.empty?
db.printf("%#{IP::STRING_SIZE}s\t%#{IP::STRING_SIZE}s\t%s\n",
*fields)
end
private_class_method :build_line

def initialize(path = LOCAL)
begin
@db = open(path)
rescue Errno::ENOENT
self.class.build(path)
retry
end
end

def search(address)
binary_search(IP.new(address).to_i)
end

private

def binary_search(ip, min = 0, max = @db.stat.size / RECORD_SIZE)
return "Unknown" if min == max

middle = (min + max) / 2
@db.seek(RECORD_SIZE * middle, IO::SEEK_SET)

if @db.read(RECORD_SIZE) =~ /\A *(\d+)\t *(\d+)\t([A-Z]{2})\n\z/
if ip < $1.to_i then binary_search(ip, min, middle)
elsif ip > $2.to_i then binary_search(ip, middle + 1, max)
else $3
end
else
raise "Malformed database at offset #{RECORD_SIZE * middle}"
end
end
end

if __FILE__ == $PROGRAM_NAME
require "optparse"

options = {:db => IPToCountryDB::LOCAL, :rebuild => false}

ARGV.options do |opts|
opts.banner = "Usage:\n" +
" #{File.basename($PROGRAM_NAME)} [-d PATH] IP\n" +
" #{File.basename($PROGRAM_NAME)} [-d PATH] -r"

opts.separator ""
opts.separator "Specific Options:"

opts.on("-d", "--db PATH", String, "The path to database file")
do |path|
options[:db] = path
end
opts.on("-r", "--rebuild", "Rebuild the database") do
options[:rebuild] = true
end

opts.separator "Common Options:"

opts.on("-h", "--help", "Show this message.") do
puts opts
exit
end

begin
opts.parse!
raise "No IP address given" unless options[:rebuild] or
ARGV.size == 1
rescue
puts opts
exit
end
end

if options[:rebuild]
IPToCountryDB.build(options[:db])
else
puts IPToCountryDB.new(options[:db]).search(ARGV.shift)
end
end

__END__

James Edward Gray II
 
E

Eugene Kalenkovich

My solution suffers from over-generality (is supposed to work for any sorted
string file by any sorting criteria), and from over-limiting memory usage -
I like Simon's approach, reading in chunks makes everything easier and
faster.

BTW, all solutions already submitted will lie for subnets 1,2 and 5 :)
Most (but not all) will break on out of bounds submissions (256.256.256.256
or 0.0.0.-1, latter if comments are stripped out)

(yes, I know, just trying to find an excuse for not very elegant solution -
edge cases kill elegancy :)
-----------------------------------------------------------------------------

def bin_find(file,search)
if block_given?
compare = lambda { |a, b| yield(a, b) }
else
compare = lambda { |a, b| a <=> b }
end
open(file) do |f|
return bin_search(f,search,f.lstat.size,&compare)
end
end

def bin_search(f,search,size)
start,finish=0,size
while start<finish
dist=(finish-start)/2
f.seek(start+dist)
f.readline unless start+dist==0
case (l1=f.readline.chomp rescue false) ? yield(search,l1) : -1
when -1
next if (finish=start+dist)>0
break
when 0
return l1
else
case (l2=f.readline.chomp rescue false) ? yield(search,l2) : -1
when -1
return l1
when 0
return l2
else
start+=dist; next
end
end
end
nil
end

nums=[]
out=true
if ARGV[0]=='test'
n=ARGV[1].to_i
n.times{nums << rand(4294967296)}
out=false
else
ARGV.each do |argv|
nums << ((($1.to_i*256)+$2.to_i)*256+$3.to_i)*256+$4.to_i if
argv=~/(\d{1,3}).(\d{1,3}).(\d{1,3}).(\d{1,3})/
end
end
if nums.empty?
puts "Please enter valid ip(s) (or use '#{$0} test NNN' for testing)"
exit
end

nums.each do |num|
ctry='Unknown'
res=bin_find('IpToCountry.csv',num) { |search, str|
str.empty? || str[0,1]!='"' ? 1 : search <=>
str.gsub('"','').split(',')[0].to_i
}.gsub('"','').split(',')
ctry=res[4] if (res[0].to_i..res[1].to_i)===num
puts ctry if out
end
 
B

Bill Kelly

Greetings!

Thanks for the fun quiz.

Looks like my solution is a little more verbose than many
submitted so far.

Like many others, I'm performing a binary search on the
original data file. For better or for worse, I imposed the
following restrictions on my design, just out of curiosity
for how it would turn out:

- non-recursive binary search

- only using seek() and gets() to access the database
(using gets means no backward scanning for the beginning
of a record)

Regards,

Bill


====== Begin file: 139_ip_to_country.rb =====

# IP FROM IP TO REGISTRY ASSIGNED CTRY CNTRY COUNTRY
# "1346797568","1346801663","ripencc","20010601","IL","ISR","ISRAEL"

IpRec = Struct.new:)from, :to, :registry, :assigned, :ctry, :cntry, :country) do
def contains?(ip_value)
ip_value >= self.from && ip_value <= self.to
end
end


class IpToCountry
DATABASE_FNAME = "IptoCountry.csv"

def initialize(db_fname=DATABASE_FNAME)
@db_size = File.stat(db_fname).size
@db = File.open(db_fname, "r")
@db_pos = 0
end

def close
@db.close
@db = nil
end

# Lookup IpRec containing ip. Exception is raised if not found.
def search(ip)
ip_value = self.class.ip_to_int(ip)
find_rec(ip_value)
end

# Binary search through sorted IP database.
# The search uses non-record-aligned byte offsets into the
# file, but always scans forward for the next parsable
# record from a given offset. It keeps track of the
# byte offset past the end of the parsed record in @db_pos.
def find_rec(ip_value)
lower, upper = 0, @db_size - 1
prev_rec = nil
loop do
ofst = lower + ((upper - lower) / 2)
rec = next_rec_from_ofst(ofst)
if rec == prev_rec
# We have narrowed the search to where we're hitting
# the same record each time. Can't get any narrower.
# But these are variable-length records, so there may
# be one or more at our lower bound we haven't seen.
# Do a linear scan from our lower bound to examine the
# few records remaining.
ofst = lower
while (rec = next_rec_from_ofst(ofst)) != prev_rec
return rec if rec.contains? ip_value
ofst = @db_pos
end
raise("no record found for ip_value #{ip_value}")
end
return rec if rec.contains? ip_value
if ip_value < rec.from
upper = ofst
else
lower = @db_pos
end
prev_rec = rec
end
end

def next_rec_from_ofst(ofst)
@db.seek(ofst)
@db_pos = ofst
while line = @db.gets
@db_pos += line.length
break if rec = self.class.parse_rec(line)
end
rec || raise("no record found after ofst #{ofst}")
end

def self.ip_to_int(ip_str)
ip_str.split(".").map{|s|s.to_i}.pack("c4").unpack("N").first
end

# NOTE: Using a strict regexp instead of a simpler split operation,
# because it's important we find a valid record, not one embedded
# in a comment or such.
def self.parse_rec(line)
if line =~ %r{\A \s*"(\d+)"\s*,
\s*"(\d+)"\s*,
\s*"(\w+)"\s*,
\s*"(\d+)"\s*,
\s*"(\w+)"\s*,
\s*"(\w+)"\s*,
\s*"([^"]+)"\s* \z
}x
rec = IpRec.new($1.to_i, $2.to_i, $3, $4.to_i, $5, $6, $7)
end
end
end


if $0 == __FILE__

# Accepts zero-or-more IP addresses on the command line.

ip2c = IpToCountry.new

ARGV.each do |ip|
rec = ip2c.search(ip) rescue nil
puts( rec ? rec.ctry : "(#{ip} not found)" )
end

end

====== End file: 139_ip_to_country.rb =====



====== Begin file: test_139_ip_to_country.rb ======

require '139_ip_to_country'

require 'test/unit'

class TestIpToCountry < Test::Unit::TestCase

# NOTE: the following are the first two and last two records in
# my database file:
REC_0 = IpToCountry.parse_rec(%{"0","16777215","IANA","410227200","ZZ","ZZZ","RESERVED"})
REC_1 = IpToCountry.parse_rec(%{"50331648","67108863","ARIN","572572800","US","USA","UNITED STATES"})
REC_NEG_1 = IpToCountry.parse_rec(%{"4261412864","4278190079","IANA","410227200","ZZ","ZZZ","RESERVED"})
REC_LAST = IpToCountry.parse_rec(%{"4278190080","4294967295","IANA","410227200","ZZ","ZZZ","RESERVED"})

def test_find_rec
ip2c = IpToCountry.new
assert_equal( REC_0, ip2c.find_rec(REC_0.from) )
assert_equal( REC_0, ip2c.find_rec(REC_0.to) )
assert_equal( REC_1, ip2c.find_rec(REC_1.from) )
assert_equal( REC_1, ip2c.find_rec(REC_1.to) )
assert_equal( REC_NEG_1, ip2c.find_rec(REC_NEG_1.from) )
assert_equal( REC_NEG_1, ip2c.find_rec(REC_NEG_1.to) )
assert_equal( REC_LAST, ip2c.find_rec(REC_LAST.from) )
assert_equal( REC_LAST, ip2c.find_rec(REC_LAST.to) )
ip2c.close
end

def test_search
ip2c = IpToCountry.new
rec = ip2c.search("67.19.248.74")
assert_not_nil( rec )
assert_equal( "ARIN", rec.registry )
assert_equal( "US", rec.ctry )
end

end

====== End file: test_139_ip_to_country.rb ======
 
B

Bill Kelly

From: "Eugene Kalenkovich said:
BTW, all solutions already submitted will lie for subnets 1,2 and 5 :)
Most (but not all) will break on out of bounds submissions (256.256.256.256
or 0.0.0.-1, latter if comments are stripped out)

Hi, could you clarify what is meant by lying about subnets
1, 2, and 5?


Thanks,

Bill
 
E

Eugene Kalenkovich

Bill Kelly said:
Hi, could you clarify what is meant by lying about subnets
1, 2, and 5?
Check what ccountry is 5.1.1.1. If you get any valid answer - this answer
is a lie :)
 
B

Bill Kelly

From: "Eugene Kalenkovich said:
Check what ccountry is 5.1.1.1. If you get any valid answer - this answer
is a lie :)

Ah, OK. I get:

ruby 139_ip_to_country.rb 0.1.1.1 1.1.1.1 2.1.1.1 3.1.1.1 4.1.1.1 5.1.1.1
ZZ
(1.1.1.1 not found)
(2.1.1.1 not found)
US
US
(5.1.1.1 not found)


Regards,

Bill
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,979
Messages
2,570,185
Members
46,722
Latest member
DebShillit

Latest Threads

Top