[SUMMARY] Vehicle Counters (#146)

R

Ruby Quiz

This is a cool little data mining problem. Unfortunately, it is a fair bit of
work to get a good solution. Justin Either sent in the only full solution and
even he chose to cut a few corners. His code does produce almost 200 lines of
interesting feedback when run on the sample data set though, so it's fun to play
around with. The downside is that it's close to 300 lines of code. Because of
that, I'm going to try giving a pretty high-level overview of how it works.

Let's work backwards this time and just follow the code:

# ...

if ARGV.size < 1
puts "Usage: vehicle_counter.rb datafile [-avg]"
else
vc = VehicleCounter.new
vc.process(ARGV[0], ARGV[1] != nil)
end

This is the code that actually gets executed when you run the program. I always
call that the application code.

Here we see a simple branch that either presents a usage message or builds a
VehicleCounter and calls process() on it. This could actually be just a module,
since it doesn't maintain any state. It may help you to think of the following
code that way:

# ...

class VehicleCounter
# ...

def process(file, report_averages)
raw_data = parse(file)
for sensor in 0..1
sensor_report = DirectionDataReport.new(raw_data[sensor])
sensor_report.report(sensor, report_averages)
end
end

# ...
end

# ...

This defines the overall process of this program which is just two steps: read
in the data and build a report. The parse() method manages the reading step
while reporting is encapsulated in DirectionDataReport objects. We will begin
with the reading:

# ...

Dir_A = 0
Dir_B = 1

def parse(file)
times = []
dirs = [[], []]

f = File.open(file)
f.each_line do |line|
sensor, time = parse_record(line)
times << time

if (times.size % 2) == 0
if (times.size == 2 and sensor == Dir_A) or
(times.size == 4 and sensor == Dir_B)

# Remove "B" records from second direction
times = [times[0], times[2]] if sensor == Dir_B

dirs[sensor] << times
times = []
elsif (times.size == 4 and sensor != Dir_B)
puts "Parse error"
times = []
end
elsif (times.size % 2) == 1 and sensor == Dir_B
puts "Parse error - Unexpected B record"
end
end
f.close

dirs
end

def parse_record(data)
unpacked = data.unpack("a1a*")
return unpacked[0] == "A" ? Dir_A : Dir_B, unpacked[1].to_i
end

# ...

This parser is what I meant by cutting corners earlier. The data provided is
quite simple and you don't have to worry about cars going both directions at the
same time when working with it. Given that, a pair of A times is a northbound
car and a set of A, B, A, B times is a southbound car. This parse just hunts
for those sets.

Both sets of times are maintained in the Array of Arrays called dirs. The
constants are used to select the right group based on the indicator found in
unpack()ing the last record of the group. These two Arrays are the ultimate
results of this method.

Note that the file reading loop could be reduced to a File.foreach() call.

The rest of the code is reporting. We saw a DirectionDataReport initialized in
the process() method and then witnessed a call to report() on that object.
These two steps are the primary interface:

MSecPerMin = 1000 * 60
MSecPerHour = MSecPerMin * 60
InchesPerMile = 63360

class DirectionDataReport
Verbose = false
Sensors = %w(Northbound Southbound)
Days = %w(Mon Tue Wed Thu Fri Sat Sun)

def initialize(raw_data)
@raw_data = raw_data
end

def report(sensor, report_averages)
puts "Direction: #{Sensors[sensor]}"
puts "Total Cars: #{self.total_count}"

report_time_periods(sensor, report_averages, MSecPerHour * 12)
report_time_periods(sensor, report_averages, MSecPerHour)
report_time_periods(sensor, report_averages, MSecPerHour / 2)
report_time_periods(sensor, report_averages, MSecPerHour / 4)
puts ""
end

def total_count()
@raw_data.size
end

# ...

You can see here that the constructor just squirrels away the data from the
parse. The report() method, on the other, hand drives the output process.
After reporting the direction and a total car count, it displays time oriented
breakdowns for various scales: 12 hours, one hour, 30 minutes, and 15 minutes.

Here is the report_time_periods() method and three of the methods it relies on:

# ...

def report_time_periods(sensor, report_averages, time_period_length)
days = create_time_periods(time_period_length)
num_time_periods = (MSecPerHour * 24) / time_period_length

counts = count_per_time_period(days)
avg_speeds = speed_per_time_period(days)
avg_dists = dist_per_time_period(days)

puts("\nTime Interval: #{time_period_length/MSecPerMin} Minutes")
if (num_time_periods > 2)
peaks = find_peak_times(days)
puts("Peak Times")
for i in 0...peaks.size
printf("#{Days}:")
peaks.size.times {|p|
printf(format_time_interval_index(peaks[p][1],
time_period_length))
}
puts ""
end
end

puts("Statistics")
printf("Data ")
printf("\tDay") if not report_averages

num_time_periods.times{|i|
printf(format_time_interval_index(i, time_period_length))
}
report_column_data(days, num_time_periods, report_averages, counts,
report_averages ? "Avg Count" : "Count ", "% 5d")
report_column_data(days, num_time_periods, report_averages, avg_speeds,
"Avg Speed", "%02.02f")
report_column_data(days, num_time_periods, report_averages, avg_dists,
"Avg Dist ", "%02.02f")
puts ""
end

def create_time_periods(time_period_length = MSecPerHour)
days = []
time_periods = nil
prev_start = @raw_data[0][0] + 1

for data in @raw_data
if prev_start > data[0]
days << time_periods if time_periods != nil

cur_time_period = 0
time_periods = [[]]

puts "New day: data=#{data[0]}" if Verbose
elsif data[0] > ((cur_time_period + 1) * time_period_length)
while data[0] > ((cur_time_period + 1) * time_period_length)
cur_time_period += 1
time_periods[cur_time_period] = []
end

puts "New time period: data=#{data[0]}" if Verbose
end

time_periods[cur_time_period] << data
prev_start = data[0]
end

days << time_periods
days
end

def format_time_interval_index(index, time_period_length)
sprintf("\t%02d:%02d",
index * time_period_length / MSecPerHour,
(index * time_period_length / MSecPerMin) % 60)
end

def report_column_data(days, num_time_periods, report_averages, data,
data_label, format_string)
if report_averages
printf("\n#{data_label}")
for time in 0...num_time_periods
avg = 0
days.size.times {|day| avg += data[day][time] }
printf("\t#{format_string}", avg / days.size)
end
else
for day in 0...days.size
printf("\n#{data_label}\t#{Days[day]}")
for time in 0...num_time_periods
printf("\t#{format_string}", data[day][time])
end
end
end
end

# ...

There is a lot of code here obviously, but none of it is very complex. First,
report_time_periods() breaks the data down into time intervals using
create_time_periods(). That method is just a partition tool for the data. Once
it's divided out, a few utility method we will examine next are used to separate
the interesting information out.

After that, the entire rest of the of the method is printing code for what it
found. (The peak data section actually calculates, again using another helper,
and prints for appropriate data ranges.) Most of the print work is delegated to
the bottom two methods and much or their magic is for printing columnar data.

Here are the helpers I glossed over and the helpers they rely on:

# ...

def find_peak_times(days, num_peaks=4)
days.map do |day|
find_daily_peak_times(day, num_peaks)
end
end

def find_daily_peak_times(daily_time_periods, num_peaks)
peaks = []
daily_time_periods.size.times {|i|
peaks << [daily_time_periods.size, i]
}
peaks.sort.reverse.slice(0, num_peaks).sort {|a,b| a[1]<=>b[1]}
end

def count_per_time_period(days)
days.map do |day|
day.map {|time_period| time_period.size}
end
end

def speed_per_time_period(days)
days.map do |day|
day.map {|time_period| calc_average_speed(time_period) }
end
end

def dist_per_time_period(days)
days.map do |day|
day.map {|time_period| calc_average_distance(time_period) }
end
end

def calc_average_speed(time_period)
return 0 if time_period.size == 0

sum = 0
for time in time_period
sum += calc_speed(time[0], time[1])
end
sum / (time_period.size)
end

def calc_speed(start_time, end_time)
return (100.0 / (end_time - start_time)) *
MSecPerHour / (InchesPerMile * 1.0)
end

def calc_average_distance(time_period)
return 0 if time_period.size <= 1 # Need at least 2 cars
sum = 0
for i in 0...(time_period.size - 1)
sum += calc_distance(time_period[0], time_period[1])
end
return sum / (time_period.size - 1)
end

def calc_distance(leader_time, follower_time)
follower_speed = calc_speed(follower_time[0], follower_time[1])

dist = follower_speed * ((follower_time[0] - leader_time[0]) /
(MSecPerHour * 1.0))

return dist
end
end

# ...

There are three kinds of helpers here. The peak methods just sort the data and
take so many entries off the top. The *_per_time_period() method are wrappers
that apply the calc_*() methods over periods of data. Which means the calc_*()
methods are where the action is. They figure out car speeds, distances between
cars, and the averages for both using simple math. This is the primary data
that we see in the report output.

Again it's a lot of code to cover, but Justin has produced a great end result.
I'm not going to include the output here because it too is large, but go run the
program and take a look at what it builds.

My thanks to Justin for the heroic effort and Gavin for the interesting problem.

The Ruby Quiz began with an encoding problem that came out of a book and
tomorrow we have another one for you...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,979
Messages
2,570,185
Members
46,723
Latest member
TwilaTarde

Latest Threads

Top