[QUIZ] Quoted Printable (#23)

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The quoted printable encoding is used primarily in email, though it has
recently seen some use in XML areas as well. The encoding is simple to
translate to and from.

This week's quiz is to build a filter that handles quoted printable translation.

Your script should be a standard Unix filter, reading from files listed on the
command-line or STDIN and writing to STDOUT. In normal operation, the script
should encode all text read in the quoted printable format. However, your
script should also support a -d command-line option and when present, text
should be decoded from quoted printable instead. Finally, your script should
understand a -x command-line option and when given, it should encode <, > and &
for use with XML.
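
The command-line handling described above could be sketched roughly as follows. This is only an illustration using the standard optparse library; the helper name parse_filter_args and the option hash keys are made up for the example and are not part of the quiz.

```ruby
require "optparse"

# Hypothetical helper: pull the quiz's -d and -x switches out of an
# argument list and return the flags plus the remaining file names.
def parse_filter_args(argv)
  options = { decode: false, xml: false }
  files = OptionParser.new do |o|
    o.on("-d", "decode quoted printable input") { options[:decode] = true }
    o.on("-x", "also escape <, > and & for XML") { options[:xml] = true }
  end.parse(argv) # non-destructive: argv itself is left alone
  [options, files]
end

# In a real filter script one might then write something like:
#   options, files = parse_filter_args(ARGV)
#   ARGV.replace(files)            # ARGF reads these files, or STDIN if none
#   ARGF.each_line { |line| ... }  # encode or decode each line to STDOUT
```

Keeping the parsing in a function that takes the argument list makes it easy to exercise without touching the real ARGV.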

Here are the rules we will use, from the quoted printable format:

1. Bytes with ASCII values from 33 (exclamation point) through 60 (less
than) and values from 62 (greater than) through 126 (tilde) should be
passed through the encoding process unchanged. Note that the -x switch
modifies this rule slightly, as stated above.

2. Other bytes are to be encoded as an equals sign (=) followed by two
hexadecimal digits. For example, when -x is active less than (<) will
become =3C. Use only capital letters for hex digits.

3. The exceptions are spaces and tabs. They should remain unencoded as
long as any non-whitespace character follows them on the line. Spaces
and tabs at the end of a line must be encoded per rule 2 above.

4. Native line endings should be translated to carriage return-line feed
pairs.

5. Quoted printable lines are limited to 76 characters of length (not
counting the line ending pair). Longer lines must be divided up. Any
line endings added by the encoding process should be preceded by an
equals sign, so the decoder will know to remove them. The equals sign
must be the last character on the line, followed immediately by the line
end pair. Such an equals sign does count as a non-whitespace character
for rule 3, allowing preceding spaces and tabs to remain unencoded.
The equals sign must fit inside the 76 character limit.
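
One way to read the five rules above is as two passes: escape each byte, then pack the resulting tokens into lines. The sketch below follows that reading. The names qp_escape, qp_wrap, qp_encode_line and MAX_QP_LINE are invented for this example, and it assumes the native line ending has already been removed from the input line.

```ruby
MAX_QP_LINE = 76 # rule 5 limit, not counting the CRLF pair

# Rules 1-3: turn one line into tokens, each either a literal character
# or an =XX escape.
def qp_escape(line, xml: false)
  bytes = line.bytes
  # Position of the last byte that is neither space (32) nor tab (9).
  last_solid = bytes.rindex { |b| b != 32 && b != 9 }
  bytes.each_with_index.map do |b, i|
    blank = (b == 32 || b == 9)
    trailing_blank = blank && (last_solid.nil? || i > last_solid)
    literal = (blank && !trailing_blank) || ((33..126).cover?(b) && b != 61)
    literal = false if xml && [38, 60, 62].include?(b) # & < >
    literal ? b.chr : format("=%02X", b)
  end
end

# Rule 5: pack tokens into lines of at most 76 characters without
# splitting an escape, leaving a column for the soft-break "=".
def qp_wrap(tokens)
  lines = [+""]
  tokens.each_with_index do |tok, i|
    last = (i == tokens.size - 1)
    room = last ? MAX_QP_LINE : MAX_QP_LINE - 1
    lines << +"" if lines.last.length + tok.length > room
    lines.last << tok
  end
  lines
end

# Rules 4 and 5: join with soft breaks and finish with CRLF.
def qp_encode_line(line, xml: false)
  qp_wrap(qp_escape(line, xml: xml)).join("=\r\n") + "\r\n"
end
```

Working on tokens rather than on the already-escaped string means a break can never land in the middle of an =XX sequence.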

To decode, just reverse the process.
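
A rough sketch of that reverse direction, under the assumption that input arrives one physical line at a time. The name qp_decode_line is made up for the example. Note that the soft-break check happens before the escapes are expanded, so an encoded equals sign (=3D) at the end of a line is not mistaken for a soft break.

```ruby
# Decode one physical line of quoted printable text. The line may end in
# CRLF or a bare LF; the result uses the native separator ($/).
def qp_decode_line(line)
  body = line.b.sub(/\r?\n\z/, "") # work on raw bytes
  # A trailing "=" marks a line ending added by the encoder.
  soft_break = body.end_with?("=")
  body = body.chomp("=") if soft_break
  decoded = body.gsub(/=([0-9A-F]{2})/) { $1.hex.chr }
  soft_break ? decoded : decoded + $/
end
```

Decoding a whole message is then a matter of mapping this over each_line and joining the pieces.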
 
Glenn Parker

Note: I assumed it would be cheating to use the builtin quoted printable
facilities.

I found it somewhat frustrating that String#each_byte does not return
any useful value (see encode_str).
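
For what it's worth, with a block each_byte returns the string it was called on, which is why encode_str needs an accumulator. On Ruby versions where each_byte without a block returns an Enumerator, the same helper could be written as a single expression. This is a sketch of that alternative, not a change to the solution below:

```ruby
# Escape every byte of str as =XX, using the Enumerator returned by
# each_byte when no block is given.
def encode_str(str)
  str.each_byte.map { |byte| format("=%02X", byte) }.join
end
```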

I found it a bit more frustrating that String#chomp! is greedier than
you might expect, discarding all sorts of potential line endings,
instead of limiting itself to $/.
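
To illustrate the chomp behavior: with the default separator it strips "\n", "\r\n" and a bare "\r" alike, which matters here because a stray carriage return is data that should be encoded as =0D. The helper strip_separator below is hypothetical and relies on String#delete_suffix, which only exists in newer Rubies (2.5 and later).

```ruby
# With the default separator, chomp removes "\n", "\r\n" or a bare "\r".
greedy = ["text\n", "text\r\n", "text\r"].map(&:chomp)
# greedy == ["text", "text", "text"]

# Hypothetical helper: remove exactly one trailing separator, nothing else.
def strip_separator(line, sep = $/)
  line.delete_suffix(sep)
end
```

With a separator of "\n", strip_separator("text\r\n") keeps the carriage return so it can be escaped later.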

I would also suggest adding support for GetoptLong#[] to query
options directly, instead of requiring a full iteration.



#!/usr/bin/env ruby -w

require 'getoptlong'

MaxLength = 76

def main
  opts = GetoptLong.new(
    [ "-d", GetoptLong::NO_ARGUMENT ],
    [ "-x", GetoptLong::NO_ARGUMENT ]
  )
  $opt_decode = false
  $opt_xml = false
  opts.each do |opt, arg|
    # "when ... then" works on old and new Rubies; "when ...:" is 1.8 only.
    case opt
    when "-d" then $opt_decode = true
    when "-x" then $opt_xml = true
    end
  end

  if $opt_decode
    decode_input
  else
    encode_input
  end
end

def encode_input
  STDOUT.binmode # We need to control the line-endings.
  while (line = gets) do
    # Note: String#chomp! swallows more than just $/.
    line.sub!(/#{$/}$/o, "")
    # Encode the entire line.
    line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
    line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
    line.sub!(/\s*$/) { |str| encode_str(str) }
    # Split the line up as needed.
    while line.length > MaxLength
      # Avoid cutting an =XX escape in half. String#index returns nil when
      # no "=" is found, so check before subtracting.
      split = line.index("=", MaxLength - 4)
      split = split ? split - 1 : MaxLength - 2
      split = (MaxLength - 2) if split > MaxLength - 2
      print line[0..split], "=\r\n"
      line = line[(split + 1)..-1]
    end
    print line, "\r\n"
  end
end

def encode_str(str)
  encoded = ""
  str.each_byte { |c| encoded << "=%02X" % c }
  encoded
end

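def decode_input
  while (line = gets) do
    line.chomp!
    # Check for a soft line break before decoding, so that a decoded =3D
    # at the end of a line is not mistaken for one.
    soft_break = (line[-1] == ?=)
    line = line[0..-2] if soft_break
    line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
    if soft_break
      print line
    else
      print line, $/
    end
  end
end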

main
 
James Edward Gray II

Note: I assumed it would be cheating to use the builtin quoted
printable facilities.

I must sheepishly admit that I was unaware of Ruby's converter when
I made the quiz. It was pointed out to me in a private email after I
posted it. The converter isn't a complete solution to the quiz, but it
gets you very close.

Is it cheating to use Ruby features? Never. Feel free, then poke a
little fun at the quiz editor because you're smarter than he is. All
part of the fun.

Sorry for the oversight.

James Edward Gray II
 
Dave Burt

Hi,

Testing. I found building a test suite before doing the code really helpful on
this one, to get my head around the intricacies of the encoding. Actually
thinking through the edge cases and working out expected results was necessary
for me to develop this solution.

Now, of course, this would have been a lot easier if I'd just been able to find
the "builtin quoted printable facilities." What builtin quoted printable
facilities?

Anyway, here is my result:
http://www.dave.burt.id.au/ruby/quoted-printable.rb

And the tester:
http://www.dave.burt.id.au/ruby/test-quoted-printable.rb

The testing program generates test methods and test data dynamically.

The public interface to my solution looks like this:

module QuotedPrintable

WHITESPACE = [?\t, ?\ ]
WHITESPACE_REGEXP = /[\t ]/
WHITESPACE_ESCAPED_REGEXP = /=09|=20/

# bytes that do not need to be escaped
PRINTABLES = ((?!..?~).to_a + WHITESPACE) - [?=]

MAX_LINE_WIDTH = 76

NEWLINE = "\r\n"

# additional bytes to escape for safety in an EBCDIC document
EBCDIC_EXCEPTIONS = %w' ! " # $ @ [ \ ] ^ ` { | } ~ '
EBCDIC_PRINTABLES = PRINTABLES - EBCDIC_EXCEPTIONS
# additional bytes to escape for safety in an XML document
XML_EXCEPTIONS = %w' < > & '
XML_PRINTABLES = PRINTABLES - XML_EXCEPTIONS

# Encode self to the quoted-printable transfer encoding
def to_quoted_printable(printables = QuotedPrintable::PRINTABLES)

# Decode self from the quoted-printable transfer encoding
def from_quoted_printable


# Functions that do quoted-printable encoding and decoding
class << self

# Return the quoted-printable escaped representation of the given byte
# (byte must be a Fixnum between 0 and 255)
def encode_byte(byte)

# Return the byte corresponding to the given quoted-printable escape
# sequence as a String. If it's not valid, return nil.
def decode_sequence(escape_sequence)

# Return the given string encoded as quoted-printable, including the
# canonical \r\n line terminators.
def encode_string(string, printables = PRINTABLES)

# Consider the given string quoted-printable encoded, and decode it,
# including translating line terminators to the native default.
def decode_string(string)

# Add quoted-printable conversions to String
class String
include QuotedPrintable # to_quoted_printable, from_quoted_printable
end

Cheers,
Dave
 
James Edward Gray II

Now, of course, this would have been a lot easier if I'd just been
able to find the "builtin quoted printable facilities." What builtin
quoted printable facilities?

Look up the "M" format for Array.pack.

James Edward Gray II
 
Dave Burt

What builtin quoted printable facilities?
Look up the "M" format for Array.pack.

So here's the cheat solution:

class String
def to_quoted_printable(*args)
[self].pack("M").gsub(/\n/, "\r\n")
end
def from_quoted_printable
self.gsub(/\r\n/, "\n").unpack("M").first
end
end

(Just add my original if __FILE__ block to make it almost quiz-compatible)

And here's how it fares against my test suite:

Loaded suite TC_QuotedPrintable
Started
.............FF.FFFFFFF..
Finished in 0.39 seconds.

So it's 10 times the speed of my original one (against random binary data), but
chops lines too early, ends up with 73- instead of 76-character lines. Of
course, this one won't do XML.
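
To see the line-length behavior for yourself, a small check along these lines might help. The helper name pack_m_report is invented for this example. It only measures what pack("M") produces and confirms that unpack("M") restores the input, and it makes no claim about the exact width on any particular Ruby version.

```ruby
# Hypothetical helper: encode text with Array#pack("M") and report the
# longest output line plus whether String#unpack("M") restores the input.
def pack_m_report(text)
  encoded = [text].pack("M")
  {
    longest_line: encoded.each_line.map { |l| l.chomp.length }.max,
    # unpack returns a binary string, so compare against the raw bytes.
    round_trips: encoded.unpack("M").first == text.b
  }
end

report = pack_m_report("word " * 40 + "\n")
puts "longest line: #{report[:longest_line]}, round trip ok: #{report[:round_trips]}"
```

Comparing the reported width with the quiz's 76 character limit shows how much room the builtin leaves unused.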

Interestingly, if I use a gsub! instead of a loop with sub!s in my soft_break!
method, I get a 5x speedup... and fail the same tests.

Cheers,
Dave
 
James Edward Gray II

(from Dave's solution)

if __FILE__ == $0
require 'optparse'

# Look, James, I'm opt-parsing! :)
...

I'm so proud! :D

James Edward Gray II
 
