James said:
Great. Run it for us and let us know how we do.
Here are the results for the solutions supplied so far. It looks like
my solution takes the 100k-performance victory.
First Table: Compilation (Table Packing)
              real    user     sys
Adam[*]      0.005   0.002   0.003
Luis         0.655   0.648   0.007
James[**]   21.089  18.142   0.051
Jesse        1.314   1.295   0.020
Matthias     0.718   0.711   0.008
[*]: Adam does not perform a real compression; instead he builds two
boundaries to search within the original .csv, which he subsequently uses.
[**]: Upon rebuild, James fetches the .csv sources from the web, which
makes his solution look slow. This figure depends heavily on your
(actually my) ISP speed.
Second Table: Run (100_000 Addresses)
              real    user     sys
Adam        24.943  22.993   1.951
Bill        35.080  33.029   2.051
Luis        16.149  13.706   2.444
Eugene[*]   52.307  48.689   3.620
Eugene      65.790  61.984   3.805
James       14.803  12.449   2.356
Jesse       14.016  12.343   1.673
Matt_file    6.192   5.332   0.859
Matt_str     3.704   3.699   0.005
Simon       69.417  64.679   4.706
Justin      56.639  53.292   3.345
steve       63.659  54.355   9.294
[*]: Eugene already implements a random generator. To make things
fair, I changed his implementation to read the same values from $stdin
as all the other implementations. The "Star" version uses his own
random generator and runs outside the competition; the starless version
is my modified one.
[**]: O Jesus, I can't make your FasterCSV version (a) run, and in
the later version you sent your direct parsing breaks when it comes to
detecting the commented lines in the first part of the file. I couldn't
manage to make it run, sorry.
[***]: Although I managed to write the missing SQL insertion script and
even to add separate indexes for the address limits, Kevin's SQLite3
version simply took too long. I estimated a run time of over an hour. I
am willing to rerun the test if someone tells me how to speed things up
with SQLite3 to make it competitive.
Note that I slightly changed all implementations to contain a loop that
iterates on $stdin.each instead of ARGV or using just ARGV[0]. For the
test the script was run only once and was supplied with all addresses in
one run. The test set consisted of 100_000 freshly generated random IP
addresses written to a file and supplied using the following syntax:
$ (time ruby IpToCountry.rb <IP100k > /dev/null) 2>100k.time
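A minimal sketch of the setup described above, not the code actually used for these runs. The names random_ips and each_address are hypothetical, and how each solution's lookup was wired into the loop is an assumption.

```ruby
# Hypothetical helper: n random dotted-quad IPv4 addresses as strings.
def random_ips(n)
  Array.new(n) { Array.new(4) { rand(256) }.join(".") }
end

# Hypothetical wrapper: yields each non-empty, stripped line of `source`.
# `source` defaults to $stdin but may be anything that responds to #each.
def each_address(source = $stdin)
  source.each do |line|
    ip = line.strip
    yield ip unless ip.empty?
  end
end

# Writing the test set (file name taken from the command line above):
#   File.open("IP100k", "w") { |f| f.puts random_ips(100_000) }
#
# Inside a solution, replacing the ARGV handling
# (lookup stands for the solution's own method):
#   each_address { |ip| puts lookup(ip) }
```

Reading from $stdin this way means the interpreter starts and the table loads only once per run, so the timings reflect lookup speed rather than 100_000 process launches.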
I didn't check the output of the scripts, although I checked one address
upfront. This was mainly because all scripts have a different output
format. My tests were just for measuring the performance.
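For anyone who would rather collect the real/user/sys columns from Ruby than from the shell's time, something like the following might work. It is only a sketch: time_command is a hypothetical helper, it was not used for the numbers above, and the redirection options passed to system need Ruby 1.9.3 or later rather than the 1.8.6 used for these runs.

```ruby
require "benchmark"

# Hypothetical helper: runs the command in argv as a child process,
# feeds it input_path on stdin (if given), discards its stdout, and
# returns the elapsed real time plus the child's user and sys CPU time.
def time_command(argv, input_path = nil)
  opts = { :out => File::NULL }
  opts[:in] = input_path if input_path
  tms = Benchmark.measure { system(*argv, opts) }
  # cutime/cstime are the CPU times of waited-for child processes.
  { :real => tms.real, :user => tms.cutime, :sys => tms.cstime }
end

# Example (script and file names taken from the command line above):
#   t = time_command(["ruby", "IpToCountry.rb"], "IP100k")
#   printf("%.3f %.3f %.3f\n", t[:real], t[:user], t[:sys])
```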
Just for Info:
$ uname -a
Linux sabayon2me 2.6.22-sabayon #1 SMP Mon Sep 3 00:33:06 UTC 2007
x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux
$ ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux]
$ cat /etc/sabayon-release
Sabayon Linux x86-64 3.4
- Matthias