[SUMMARY] hexdump (#171)

Matthew Moss

When learning a new programming language, the first thing many coders
do is write the traditional "Hello, world!" program. This generally
provides the bare minimum needed for coding: base program structure,
compilation if needed... In Ruby, this is very bare, as `puts "Hello,
world!"` is sufficient. (See quiz #158 for some non-traditional
versions.)

What also seems a tradition is the question, "What should I program
now?" after "Hello, world!" is output to the console. New coders are
looking for something to try, to expand their skills, without becoming
overwhelmed. Often, I find, the easiest way to do this is to reproduce
an existing program. You can focus on learning the new language and
implementing an existing design, rather than coming up with something
novel.

This week's quiz was chosen with this in mind; it is a good project
for new Rubyists, to dive into the language a bit without drowning.
Hex dump utilities have been around for ages, and there are plenty of
them, so we don't have to think about implementing anything new;
rather, we can focus on learning Ruby. And writing a hex dump
program lets you deal with files, strings, arrays and output: some of
the basics of any code.

I'm going to look at parts from each of the few solutions, to
highlight some of the things you should know as a Rubyist. If you're
new to Ruby, you might consider trying the quiz first before reading
this summary and the submissions. Then, after reading this summary,
revise and refactor your solution to be leaner and cleaner.

First, let's look at the non-golfed (and slightly modified)
submission from _Mikael Hoilund_. It's short, but dense with good
Ruby-isms.

    i = 0
    ARGF.read.scan(/.{0,16}/m) { |match|
      puts(("%08x " % i) + match.unpack('H4'*8).join(' '))
      i += 16
    }

`ARGF` is a special constant. It isn't a file, but can be treated as
such (as seen above, via the call to the `IO#read` method). It will
sequentially read through all files provided on the command-line or,
if none are provided, will read from standard input. It works together
with `ARGV`, the array of arguments provided to your program,
expecting that all values in `ARGV` are filenames. If you happen to
have a script that also expects command-line options (such as
`--help`), just make sure to process and/or remove them from `ARGV`
before using `ARGF`.
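
As a minimal sketch of that last point, the helper below strips `--`-style flags out of an argument list so that only filenames remain. The name `extract_flags!` is hypothetical and not taken from any submission; it is written against a plain array so it can be tried without touching the real `ARGV`.

```ruby
# Hypothetical helper: removes "--" flags from argv in place and returns them,
# leaving only the remaining entries (assumed to be filenames) behind.
def extract_flags!(argv)
  flags = argv.select { |arg| arg.start_with?("--") }
  argv.reject! { |arg| arg.start_with?("--") }
  flags
end

# Typical use at the top of a script (shown as comments, not run here):
#   flags = extract_flags!(ARGV)
#   abort "usage: hexdump.rb [file ...]" if flags.include?("--help")
#   data = ARGF.read   # ARGF now sees only filenames, or stdin if none remain
```

Because `ARGF` consults `ARGV` when it is first read, the cleanup has to happen before that first read.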

Next is `String#scan`, which finds instances of the pattern provided
in the source string. In this case, Mikael is using a regular
expression that grabs up to 16 characters (i.e. bytes) at a time,
including newlines. (The `m` in the regular expression indicates a
multi-line match, in which newline characters are treated like any
other character, rather than terminators.)

`String#scan` can return an array of matches, but it can also be used
in block form, as shown above, in which case the block is called once
per match with the matched text passed in the argument `match`.
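
A small sketch of both forms of `String#scan`, using a chunk size of 4 instead of 16 so the results are easy to read. The sample string and variable names are only for illustration.

```ruby
data = "ab\ncd\nef"

# Array form. Without /m, "." does not match "\n", so chunks stop at newlines.
by_line = data.scan(/.{1,4}/)     # ["ab", "cd", "ef"]

# With /m, "." also matches "\n", so each chunk is a plain run of 4 characters.
chunks = data.scan(/.{1,4}/m)     # ["ab\nc", "d\nef"]

# Block form: the block runs once per match, here tracking the running offset.
offsets = []
offset = 0
data.scan(/.{1,4}/m) do |match|
  offsets << offset
  offset += match.size
end

# A {0,n} quantifier can also match the empty string at the end of the input.
trailing = "abc".scan(/.{0,16}/m) # ["abc", ""]
```

The last line is worth knowing when reading the code above: with `{0,16}`, the block is called one extra time with an empty match after the data runs out.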

Another trick here is replication, via `String#*` and `Array#*`. These
aren't really "tricks", as they are standard functions defined on the
class, but they can certainly save typing and keep the code clearer.
Try these in `irb`:

    'H4' * 8
    => "H4H4H4H4H4H4H4H4"
    [1, 2, 3] * 2
    => [1, 2, 3, 1, 2, 3]


`String#unpack` is a powerful function for handling raw data. It uses
a format string (e.g. "H4H4H4H4H4H4H4H4") to decode the raw data. In
this case, `H4` indicates that four nybbles (i.e. two bytes) should be
decoded from the string. Doing that eight times decodes 16 bytes,
which is how much we are reading at a time in Mikael's code above.
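
To make the format string concrete, here is a short sketch that unpacks one full 16-byte row and then reverses a single group with `Array#pack`. The sample data is arbitrary.

```ruby
row = "0123456789abcdef"            # exactly 16 bytes

# Each H4 consumes two bytes and yields them as four hex digits.
groups = row.unpack('H4' * 8)
# groups is ["3031", "3233", "3435", "3637", "3839", "6162", "6364", "6566"]

line = groups.join(' ')

# Array#pack with the same directive goes the other way.
back = ["4142"].pack('H4')          # "AB"
```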

`String#unpack` (and the reverse `Array#pack`) can do a lot of work in
short order. It just takes a bit of practice to understand, and easy
access to the formats table. (On the command-line, type: `ri
String#unpack`.)

Finally, take a quick look at Mikael's golfed solution. Aside from
squeezing everything together, it makes use of some special globals:
`$<` (equivalent to `ARGF`) and `$&` (evaluates to the current match
from `scan`, eliminating the need for the `match` parameter to the
block). Globals like this can certainly make it more fun to "golf"
(i.e. the deliberate shrinking and obfuscation of a program), but
aren't recommended for clarity.

_Robert Dober_ provides a clean, straightforward solution that needs
little explanation. Make sure to look at the whole of it, while I
briefly examine his `output` method.

    require 'enumerator'

    BYTES_PER_LINE = 0x10

    def output address, line
      e = line.enum_for :each_byte
      puts "%04x %-#{BYTES_PER_LINE*3+1}s %s" % [ address,
        e.map{ |b| "%02x" % b }.join(" "),
        e.map{ |b|
          0x20 > b || 0x7f < b ? "." : b.chr
        }.join ]
    end

The most useful bit here is the `enumerator` library, and the
`enum_for` method that returns an `Enumerable::Enumerator` object.
This object provides a number of ways to access the data. Here, Robert
accesses it one byte at a time, having passed the argument
`:each_byte`. Enumerators, of course, are not required to process each
byte of the source string: a couple of calls to `each_byte` could have
done that as well. But the enumerator is a convenient package, which
can be used multiple times, can be used as an `Enumerable`, and removes
redundancy, all shown above.
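
The same idea in isolation, as a sketch that assumes Ruby 1.9 or later (where `enum_for` is built in and no `require` is needed). The printable-character test is copied from Robert's method; the sample string is arbitrary.

```ruby
line = "Hi\n"

# One enumerator over the bytes, reused for both columns of the dump.
e = line.enum_for(:each_byte)

hex   = e.map { |b| "%02x" % b }.join(" ")
ascii = e.map { |b| 0x20 > b || 0x7f < b ? "." : b.chr }.join

# hex is "48 69 0a" and ascii is "Hi." (the newline is shown as a dot)
```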

Enumerators also have access to other ways to enumerate... What if you
want to get three objects at a time from a collection? Overlapping or
disjoint? You can use `:each_cons` or `:each_slice`, respectively, to
that effect.

    x = [1, 2, 3, 4, 5]
    => [1, 2, 3, 4, 5]
    x.enum_for(:each_cons, 3).to_a
    => [ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
    x.enum_for(:each_slice, 3).to_a
    => [ [1, 2, 3], [4, 5] ]

(Note that there are some changes going on with enumerators between
Ruby 1.8.6 and 1.9; here is some good information on the [changes in
Ruby 1.9][1]).
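
One of those changes, sketched here under the assumption of Ruby 1.9 or later: iterator methods called without a block return an enumerator directly, so `enum_for` is often unnecessary. The last example applies this to hexdump-style rows, using a row width of 3 purely for readability.

```ruby
x = [1, 2, 3, 4, 5]

# Blockless calls return enumerators, which can be chained or converted.
windows = x.each_cons(2).to_a     # [[1, 2], [2, 3], [3, 4], [4, 5]]
slices  = x.each_slice(2).to_a    # [[1, 2], [3, 4], [5]]

# Chaining with_index gives each row of bytes its starting offset.
rows = "abcdefgh".bytes.each_slice(3).with_index.map do |slice, i|
  [i * 3, slice.size]
end
# rows is [[0, 3], [3, 3], [6, 2]]
```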

Now we look briefly at _Adam Shelly_'s solution, in particular his
command-line option handling.

    width = 16
    group = 2
    skip = 0
    length = Float::MAX
    do_ascii = true
    file = $stdin

    while (opt = ARGV.shift)
      if opt[0] == ?-
        case opt[1]
        when ?n
          length = ARGV.shift.to_i
        when ?s
          skip = ARGV.shift.to_i
        when ?g
          group = ARGV.shift.to_i
        when ?w
          width = ARGV.shift.to_i
        when ?a
          do_ascii = false
        else
          raise ArgumentError, "invalid Option #{opt}"
        end
      else
        file = File.new(opt)
      end
    end

`ARGV.shift` is a common pattern. It removes the first item from
`ARGV` and returns it. Doing the assignment and while-loop test in one
motion with `ARGV.shift` is a simple way to look at all the
command-line arguments.

Adam's arguments to his hexdump program are expected to be a single
character preceded by a single dash. The question-mark notation (e.g.
`?n`) returns the integer ASCII value of the character immediately
following. Likewise, single-character array access (e.g. `opt[1]`)
_also_ returns an integer ASCII value. (Note: This also differs in
1.9, where both return a one-character string instead.) So by checking
the first two characters of an argument pulled from `ARGV` against the
dash character and the various options implemented, Adam can replace
the default values provided at the top.
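
A quick sketch of how those comparisons behave, assuming Ruby 1.9 or later. Both sides of each comparison are now one-character strings rather than integers, so the checks in Adam's loop still hold.

```ruby
opt = "-n"

# On 1.9+, opt[0] is "-" and ?- is "-"; on 1.8 both were the integer 45.
dash_matches = (opt[0] == ?-)
flag_matches = (opt[1] == ?n)

# The integer value is still available through String#ord.
dash_code = ?-.ord    # 45
flag_code = opt[1].ord # 110
```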

For a quick-and-dirty script, handling options in such a way is simple
and convenient. For more complex option-handling, you would do well to
make use of the [standard `optparse`][2] module, or [third-party
`main`][3].
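
For comparison, here is a rough sketch of a few of the same options handled with the standard `optparse` library. The method name `parse_options`, the long option names, the usage banner and the settings hash are all assumptions made for this example; none of them come from Adam's solution.

```ruby
require 'optparse'

# Hypothetical wrapper: returns the parsed settings and the leftover arguments.
def parse_options(argv)
  options = { width: 16, skip: 0, ascii: true }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: hexdump.rb [options] [file ...]"
    # Passing Integer makes optparse convert (and validate) the argument.
    opts.on("-w", "--width N", Integer, "Bytes per line") { |n| options[:width] = n }
    opts.on("-s", "--skip N", Integer, "Bytes to skip") { |n| options[:skip] = n }
    opts.on("-a", "Omit the ASCII column") { options[:ascii] = false }
  end

  # parse (without the bang) leaves argv untouched and returns the non-options.
  rest = parser.parse(argv)
  [options, rest]
end

options, files = parse_options(%w[-w 8 -a data.bin])
```

Here `options` ends up with a width of 8 and the ASCII column disabled, while `files` holds the one remaining filename. An unknown switch raises `OptionParser::InvalidOption`, much like the `ArgumentError` in the hand-rolled loop.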

That's it for this week! Thanks for the submissions; I certainly
learned a few things myself. (I can't believe I didn't know about
`ARGF`...)



[1]: http://eigenclass.org/hiki.rb?Changes+in+Ruby+1.9
[2]: http://www.ruby-doc.org/stdlib/libdoc/optparse/rdoc/classes/OptionParser.html
[3]: http://groups.google.com/group/ruby-talk-google/browse_thread/thread/88bf54ad98a769ca
 
