--HAv5+T9jbwMPl6Kw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Excellent! The only thing remaining is an efficient algorithm for a sear= ch
for all zipcodes within a given radius.
Using Ruby and SQLite3:
pabs@halcyon:~/proj/zip> ./import.rb zipcode.{csv,db}
pabs@halcyon:~/proj/zip> ./find.rb zipcode.db 22003 3
"city","state","zip","distance (mi)"
"Annandale","VA","22003","0.0"
"Springfield","VA","22161","1.62363604423677"
"Springfield","VA","22151","1.87190097838136"
"Falls Church","VA","22042","2.97362028549975"
Here's the code for each piece (also available at the URL=20
http://pablotron.org/files/zipfind.tar.gz):
---- import.rb ----
#!/usr/bin/env ruby
=20
# load libraries
require 'rubygems' rescue nil
require 'sqlite3'
=20
# constants
SCAN_RE =3D /"(\d{5})","([^"]+)","(..)","([\d.-]+)","([\d.-]+)","([\d-]+)=
","(\d)"/
SQL =3D "INSERT INTO zips(zip, city, state, lat, long, timezone, dst)=20
VALUES (?, ?, ?, ?, ?, ?, ?)"
TABLE_SCHEMA =3D "CREATE TABLE zips (
id INTEGER NOT NULL PRIMARY KEY,
=20
zip VARCHAR(5) NOT NULL,
city TEXT NOT NULL,
state VARCHAR(2) NOT NULL,
lat FLOAT NOT NULL,
long FLOAT NOT NULL,
timezone INTEGER NOT NULL,
dst BOOLEAN NOT NULL
);"
=20
=20
# handle command-line arguments
unless ARGV.size =3D=3D 2
$stderr.puts "Usage: #$0 <csv> <db>"
exit -1
end
csv_path, db_path =3D ARGV
=20
# load database, create zip table and prepared statement
db =3D SQLite3:
atabase.new(db_path)
db.query(TABLE_SCHEMA)
st =3D db.prepare(SQL)
=20
# parse CSV and add each line to the database
db.transaction {
File.read(csv_path).scan(SCAN_RE).each { |row| st.execute(*row) }
}
----------
---- find.rb ----
#!/usr/bin/env ruby
=20
require 'rubygems'
require 'sqlite3'
=20
MI_R =3D 1.15
=20
# grab base zip code
unless ARGV.size > 1
$stderr.puts "Usage: #$0 <db> <zipcode> [radius]"
exit -1
end
db_path, src_zip, radius =3D ARGV
radius =3D (radius || 50).to_i
=20
# open database
db =3D SQLite3:
atabase.new(db_path)
=20
# get lat/long for specified zip code
sql =3D "SELECT lat, long FROM zips WHERE zip =3D ?"
src_lat, src_long =3D db.get_first_row(sql, src_zip).map { |v| v.to_f }
=20
unless src_lat && src_long
$stderr.puts "Unknown zip code '#{src_zip}'"
exit -1
end
=20
# calculate min/max lat/long
ret, range =3D [], radius / 69.0
=20
# get all codes within given rectangle
sql =3D "SELECT lat, long, city, state, zip
FROM zips=20
WHERE lat > ? AND lat < ?
AND long > ? AND long < ?"
args =3D [src_lat - range, src_lat + range,=20
src_long - range, src_long + range]
=20
db.prepare(sql).execute(*args).each do |row|
# get row values, convert lat/long to floats
dst_lat, dst_long, dst_zip, dst_city, dist_st =3D row
dst_lat, dst_long =3D dst_lat.to_f, dst_long.to_f
=20
# calculate distance between zip codes. if dst_zip is within the
# specified radius, then add it to the list of results
d =3D Math.sqrt((dst_lat - src_lat) ** 2 + (dst_long - src_long) ** 2)
ret << [dst_zip, dst_city, dist_st, d * 69.0] if d <=3D range
end
=20
# sort results by distance
ret =3D ret.sort { |a, b| a[-1] <=3D> b[-1] }
=20
# print out results as a CSV
puts '"city","state","zip","distance (mi)"',=20
ret.map { |row| '"' << row.join('","') << '"' }
----
I suppose one technique might be to first narrow the databse search withi= n a
given a given square latitude/longitude range and then filter those resul= ts
by testing that they are within the given circle radius
That's all the code above does. There's some room for optimization
there; for example, you could create a region field, then calculate list
of regions that intersect with the search radius. If you index on the
region field, then the query becomes essentially an index lookup instead
of a lat/long comparison (you still have to do the second distance
calculation, of course).
Anyway, I didn't do that because the code above runs pretty quickly on
my machine. =20
--=20
Paul Duncan <
[email protected]> pabs in #ruby-lang (OPN IRC)
http://www.pablotron.org/ OpenPGP Key ID: 0x82C29562
--HAv5+T9jbwMPl6Kw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
iD8DBQFDoOcjzdlT34LClWIRAh74AJ0Y3trUYm6otAH8s5uAoP6W37Uq7ACbB2cJ
YiyjVF67Q3A5eo/bPnwL5cI=
=q2s1
-----END PGP SIGNATURE-----
--HAv5+T9jbwMPl6Kw--