[QUIZ] TumbleDRYer (#53)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Hugh Sasse

The Pragmatic Programmers (and others) recommend that you remove repetition from
code. They call this principle DRY: Don't Repeat Yourself. Benefits include
not having to track down multiple places to make a change affecting the whole
system, prevention of inconsistency, as well as reduced debugging time.
Sometimes code ends up being repetitive. Take this MySQL example from:

http://manuals.rubyonrails.com/read/chapter/48

CREATE TABLE `authors` (
`id` int(11) NOT NULL auto_increment,
`firstname` varchar(50) NOT NULL default '',
`name` varchar(50) NOT NULL default '',
`nickname` varchar(50) NOT NULL default '',
`contact` varchar(50) NOT NULL default '',
`password` varchar(50) NOT NULL default '',
`description` text NOT NULL,
PRIMARY KEY (`id`)
) TYPE=MyISAM AUTO_INCREMENT=3 ;

CREATE TABLE `categories` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(20) NOT NULL default '',
`description` varchar(70) NOT NULL default '',
PRIMARY KEY (`id`)
) TYPE=MyISAM AUTO_INCREMENT=3 ;

CREATE TABLE `categories_documents` (
`category_id` int(11) NOT NULL default '0',
`document_id` int(11) NOT NULL default '0',
) TYPE=MyISAM ;

CREATE TABLE `documents` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(50) NOT NULL default '',
`description` text NOT NULL,
`author_id` int(11) NOT NULL default '0',
`date` date NOT NULL default '0000-00-00',
`filename` varchar(50) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `document` (`title`)
) TYPE=MyISAM AUTO_INCREMENT=14 ;

clearly there's a lot of repetition in there. It would be nice if it were
possible to eliminate it. This quiz is about writing a program which we'll call
TumbleDRYer, which, given repetitive input, will create ruby code to generate
that input, such that the generating program has much less repetition in it.
One way might be to generate a program like this:


# Undoes the work of TumbleDRYer :)
class WashingMachine
def initialize
@create = 'CREATE TABLE'
@type = 'TYPE=MyISAM'
@varchar_50 = "varchar(50) NOT NULL default '',"
@varchar_20 = "varchar(20) NOT NULL default '',"
@int_11 = "int(11) NOT NULL auto_increment,"
@int_11_0 = "int(11) NOT NULL default '0',"
@pk = "PRIMARY KEY (`id`)"
@auto = "AUTO_INCREMENT="
end

def output
print <<EOF
#{@create} `authors` (
`id` #{@int_11}
`firstname` #{@varchar_50}
`name` #{@varchar_50}
`nickname` #{@varchar_50}
`contact` #{@varchar_50}
`password` #{@varchar_50}
`description` text NOT NULL,
#{@pk}
) #{@type} #{@auto}3 ;

#{@create} `categories` (
`id` #{@int_11}
`name` #{@varchar_20}
`description` varchar(70) NOT NULL default '',
#{@pk}
) #{@type} #{@auto}3 ;

#{@create} `categories_documents` (
`category_id` #{@int_11_0}
`document_id` #{@int_11_0}
) #{@type} ;

#{@create} `documents` (
`id` #{@int_11}
`title` #{@varchar_50}
`description` text NOT NULL,
`author_id` #{@int_11_0}
`date` date NOT NULL default '0000-00-00',
`filename` #{@varchar_50}
#{@pk},
KEY `document` (`title`)
) #{@type} #{@auto}14 ;
EOF
end
end

WashingMachine.new.output

or something of the kind. Or it might be worth using ERB.

Obviously the more human readable to the resulting program is the better: the
point is to produce something that simplifies maintenance, rather than data
compression. Techniques from data compression would seem to be useful, however.
Building up a dictionary of character groups and their frequencies might be
useful, and various strategies are possible for how to fragment the string into
repeated chunks, such as: the longest repeated substring; splitting on some kind
of boundary; or techniques from Ziv-Lempel coding to create the groups.
 
B

Bill Guindon

[snip]
-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-= =3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D=
-=3D-=3D-=3D

by Hugh Sasse
[snip]

CREATE TABLE `documents` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(50) NOT NULL default '',
`description` text NOT NULL,
`author_id` int(11) NOT NULL default '0',
`date` date NOT NULL default '0000-00-00',
`filename` varchar(50) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `document` (`title`)
) TYPE=3DMyISAM AUTO_INCREMENT=3D14 ;

This looks like a fun one to me, as it's one of the main things I use
Ruby for. Looking forward to seeing what others do with it -- and
hoping to clean up an example of my own.

Since I had to rtfm [1] for a couple simple SQL questions (yeah, my
SQL is probably rustier than most), thought I'd save others the time.

q) Why doesn't the 'description' field have a 'default'?
a) "BLOB and TEXT columns cannot be assigned a default value."

q) How should you treat columns that allow NULLs (none of the examples do)?
a) "If neither NULL nor NOT NULL is specified, the column is treated
as though NULL had been specified."

[1] http://dev.mysql.com/doc/refman/5.0/en/create-table.html
 
B

Bill Guindon

[snip]
-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=
=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D=
-=3D-=3D-=3D-=3D
by Hugh Sasse
[snip]

CREATE TABLE `documents` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(50) NOT NULL default '',
`description` text NOT NULL,
`author_id` int(11) NOT NULL default '0',
`date` date NOT NULL default '0000-00-00',
`filename` varchar(50) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `document` (`title`)
) TYPE=3DMyISAM AUTO_INCREMENT=3D14 ;

This looks like a fun one to me, as it's one of the main things I use
Ruby for. Looking forward to seeing what others do with it -- and
hoping to clean up an example of my own.

Since I had to rtfm [1] for a couple simple SQL questions (yeah, my
SQL is probably rustier than most), thought I'd save others the time.

q) Why doesn't the 'description' field have a 'default'?
a) "BLOB and TEXT columns cannot be assigned a default value."

q) How should you treat columns that allow NULLs (none of the examples do= )?
a) "If neither NULL nor NOT NULL is specified, the column is treated
as though NULL had been specified."

[1] http://dev.mysql.com/doc/refman/5.0/en/create-table.html

nm, I seriously misread the challenge. either way, looking forward to
it, and might take a stab at it.
 
D

Dave Burt

Bill said:
Since I had to rtfm [1] for a couple simple SQL questions (yeah, my
SQL is probably rustier than most), thought I'd save others the time.

q) Why doesn't the 'description' field have a 'default'?
a) "BLOB and TEXT columns cannot be assigned a default value."

I'm pretty sure that's DBMS-specific (MySQL), and I'm pretty sure SQLite
(for example) allows this. I'm actually not sure about other major DBMSes.
q) How should you treat columns that allow NULLs (none of the examples
do)?
a) "If neither NULL nor NOT NULL is specified, the column is treated
as though NULL had been specified."

That's standard SQL. Most schema-DDLs include optional columns; I like to
omit both the NULL and the NOT.

While we're looking at the MySQL DDL:

q) What's with the backticks (`)?
a) That's MySQL's way of quoting column names. They're unnecessary as long
as the name is alphabetic and not a keyword. I don't know of any other DBMS
that uses backticks for this.

q) What's with the TYPE=?
a) Again, a MySQL feature - MySQL supports different storage engines (and
here was I thinking a DBMS was supposed to manage storage...) and this
specifies which to use. MyISAM is the crippled default one which doesn't
support referential integrity.

q) Why doesn't auto_increment work with other databases?
a) Because defining an auto-increment field is not specified in SQL-92. They
can all do it, though.

q) Why are data types, "default" and "auto_increment" in lowercase when
other keywords are uppercase?
a) I don't know.

For what it's worth, MySQL 5 is a lot better than MySQL 4 (now with in-line
views, I believe), but I'd still recommend Postgresql or really anything
else over it.

Cheers,
Dave
 
D

Donald Huebschman

I have a beige g3 that apple will not support past 10.2. What version
of ruby is the latest version that will run on a 10.2 system and
where can I get it. 1.8.3 does not compile. Is there anything else I
would need to run ruby?

Thanks,
Donald
 
M

Michal Suchanek

T24gMTAvMzEvMDUsIERvbmFsZCBIdWVic2NobWFuIDxkaHVlYnNjaEBtYWMuY29tPiB3cm90ZToK
PiBJIGhhdmUgYSBiZWlnZSBnMyB0aGF0IGFwcGxlIHdpbGwgbm90IHN1cHBvcnQgcGFzdCAxMC4y
LiBXaGF0IHZlcnNpb24KPiBvZiBydWJ5IGlzIHRoZSBsYXRlc3QgdmVyc2lvbiB0aGF0IHdpbGwg
cnVuIG9uIGEgMTAuMiBzeXN0ZW0gYW5kCj4gd2hlcmUgY2FuIEkgZ2V0IGl0LiAxLjguMyBkb2Vz
IG5vdCBjb21waWxlLiBJcyB0aGVyZSBhbnl0aGluZyBlbHNlIEkKPiB3b3VsZCBuZWVkIHRvIHJ1
biBydWJ5PwoKT2YgY291cnNlIGl0IGRvZXMgbm90LiBMb29rIGF0IHRoZSBwYXRjaGVzIGluY2x1
ZGVkIGluIHRoZSBmaW5rCnBhY2thZ2UgZm9yIDEwLjMuCkkgZ3Vlc3MgdGhlIHBhY2thZ2Ugd291
bGQgYnVpbGQgb24gMTAuMiBhcyB3ZWxsLiBCdXQgSSBjYW5ub3QgdHJ5IG15c2VsZi4KU3BlY2lm
aWNhbGx5IEkgYW0gbm90IHN1cmUgdGhlIGVudmlyb24gcGF0Y2ggd291bGQgd29yayBvbiAxMC4y
IGFzIHdlbGwuCgpodGgKCk1pY2hhbCBTdWNoYW5lawo=
 
B

Bob Showalter

My approach was to look for repeated "phrases" made up of one or more
"words" within a line. Running my solution against the sample input data
produces the following output:

class WashingMachine
def initialize
@id = '`id` int(11) NOT NULL auto_increment,'
@va = 'varchar(50) NOT NULL default \'\','
@in = 'int(11) NOT NULL default \'0\','
@no = 'NOT NULL default'
@ty = ') TYPE=MyISAM'
@de = '`description`'
@cr = 'CREATE TABLE'
@pr = 'PRIMARY KEY'
end
def output
print <<EOF
#{@cr} `authors` (
#{@id}
`firstname` #{@va}
`name` #{@va}
`nickname` #{@va}
`contact` #{@va}
`password` #{@va}
#{@de} text NOT NULL,
#{@pr} (`id`)
#{@ty} AUTO_INCREMENT=3 ;

#{@cr} `categories` (
#{@id}
`name` varchar(20) #{@no} '',
#{@de} varchar(70) #{@no} '',
#{@pr} (`id`)

#{@ty} AUTO_INCREMENT=3 ;
#{@cr} `categories_documents` (
`category_id` #{@in}
`document_id` #{@in}
#{@ty} ;

#{@cr} `documents` (
#{@id}
`title` #{@va}
#{@de} text NOT NULL,
`author_id` #{@in}
`date` date #{@no} '0000-00-00',
`filename` #{@va}
#{@pr} (`id`),
KEY `document` (`title`)
#{@ty} AUTO_INCREMENT=14 ;
EOF
end
end

Here's my solution:

require 'strscan'
require 'abbrev'

class TumbleDRYer

# minimum length of a phrase to consider
MIN_PHRASE = 10

# minimum times a phrase must occur to consider
MIN_OCCUR = 3

# minimum length for abbreviation
MIN_ABBR = 2

def initialize(string)
@input = string
end

def dry

# this will accumulate a list of repeated phrases to condense
phrases = Array.new

# this will receive the abbreviation for each phrase
abbr = Hash.new

lines = @input.to_a
loop do

# process the input data by lines. we find "phrases" by
# first finding the start and end of each "word" in the line,
# and then combining those words into longer phrases. for
# each phrase, we count the number of times it occurs in the
# total input.
phr = Hash.new
lines.each do |line|
s = StringScanner.new(line)
words = Array.new
loop do
s.scan_until(/(?=\S)/) or break
beg = s.pos
s.scan(/\S+/)
words << [ beg, s.pos ]
end
require 'pp'

# combine words to make 'phrases'
combos(words)

# accumulate phrases, counting their occurences.
# skip phrases that are too short.
words.each do |from, to|
p = line[from, to - from]
next unless p.length >= MIN_PHRASE
phr[p] ||= 0
phr[p] += 1
end
end

# get the longest phrase that occurs the most times
longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
}.find { |k,v| v >= MIN_OCCUR } or break
phrase, occurs = longest

# save the phrase, and then blank it out of the input data
# so we can search for more phrases
phrases << phrase
lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

end

# now we have all the phrases we want to replace.
# find unique abbreviations for each phrase.
temp = Hash.new
phrases.each do |phrase|
key = phrase.scan(/\w+/).flatten.to_s.downcase
key = '_' + key unless key =~ /^[_a-zA-Z]/
key += '_' while temp.has_key? key
temp[key] = phrase
end
temp.keys.abbrev.sort.each do |s, key|
phrase = temp[key]
abbr[phrase] = s if abbr[phrase].nil? ||
abbr[phrase].length < MIN_ABBR
end

# generate the output class
puts "class WashingMachine"
puts " def initialize"
phrases.each do |phrase|
puts ' @' + abbr[phrase] + " = '" +
phrase.gsub("'", "\\\\'") + "'"
@input.gsub!(phrase, '#{@' + abbr[phrase] + '}')
end
puts " end\n"
puts " def output\nprint <<EOF"
puts @input
puts "EOF\n end\n"
puts "end"

end

private

def combos(arr, max = arr.size - 1, i = 0)
(i+1..max).each do |j|
arr << [ arr[0], arr[j][1] ]
end
combos(arr, max, i + 1) if i < max - 1
end

end

TumbleDRYer.new(ARGF.read).dry
 
H

Hugh Sasse

Interesting solution. I've overlooked a lot of the functionality of
strscan, so that is worth knowing about. The approach for
generating mnemonics was nice as well, as well as the reverse sort
by negation.
Thank you,
Hugh
 
B

Brian Schröder

SSBkZWNpZGVkIHRvIGNoZWF0LCBoZXJlIGlzIG15IHNvbHV0aW9uOgoKYnNjaHJvZWRAb2VyZmk6
fi9zdm4vcHJvamVrdGUvdHVtYmxlemlwJCAuL3R1bWJsZXppcC5yYiB0dW1ibGV6aXAucmIgPgp0
dW1ibGV6aXAudHVtYmxlZC5yYgpic2Nocm9lZEBvZXJmaTp+L3N2bi9wcm9qZWt0ZS90dW1ibGV6
aXAkIGNhdCB0dW1ibGV6aXAudHVtYmxlZC5yYgpyZXF1aXJlICJ6bGliIgpwdXRzIFpsaWI6Oklu
ZmxhdGUuaW5mbGF0ZShEQVRBLnJlYWQpCl9fRU5EX18KeFNWw5QvLS7Dkk/DisOMw5MvKk3CqsOk
w6IqSi0sw40sSlVQwq/DisOJTFLDp8OiKigtKVZQw5UgICAgICAKKzR1wrhAQFUswrPCssOyw4xL
w4tJLEnDlcOLw5DilpIuIXpFKGMpKTDDpUrDscOxKHIpfi7DscOxSjoKRlTCrcOh4paSw6TDrsOR
w4gxfS8KYnNjaHJvZWRAb2VyZmk6fi9zdm4vcHJvamVrdGUvdHVtYmxlemlwJCBydWJ5IHR1bWJs
ZXppcC50dW1ibGVkLnJiCiMhL3Vzci9iaW4vcnVieQoKcmVxdWlyZSAnemxpYicKCnB1dHMgJShy
ZXF1aXJlICJ6bGliIiksCiAgICAgJShwdXRzIFpsaWI6OkluZmxhdGUuaW5mbGF0ZShEQVRBLnJl
YWQpKSwKICAgICAiX19FTkRfXyIsCiAgICAgWmxpYjo6RGVmbGF0ZS5kZWZsYXRlKEFSR0YucmVh
ZCkKCkl0IGRlZmluaXRlbHkgZmFpbHMgb24gaHVtYW4gcmVhZGFiaWxpdHksIHRob3VnaCBjb21w
cmVzc2lvbiBhbmQgZHJ5CmlzIHF1aXRlIGdvb2QgOy0pCgpjaGVlcnMsCgpCcmlhbgoKCgotLQpo
dHRwOi8vcnVieS5icmlhbi1zY2hyb2VkZXIuZGUvCgpTdHJpbmdlZCBpbnN0cnVtZW50IGNob3Jk
czogaHR0cDovL2Nob3JkbGlzdC5icmlhbi1zY2hyb2VkZXIuZGUvCg==
 
H

Hugh Sasse

---559023410-758783491-1130926764=:12187
Content-Type: MULTIPART/MIXED; BOUNDARY="-559023410-758783491-1130926764=:12187"

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

---559023410-758783491-1130926764=:12187
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN
Content-Transfer-Encoding: QUOTED-PRINTABLE

I decided to cheat, here is my solution:
[...]
xSV=C3=94/-.=C3=92O=C3=8A=C3=8C=C3=93/*M=C2=AA=C3=A4=C3=A2*J-,=C3=8D,JUP=
=C2=AF=C3=8A=C3=89LR=C3=A7=C3=A2*(-)VP=C3=95 =20
+4u=C2=B8@@U,=C2=B3=C2=B2=C3=B2=C3=8CK=C3=8BI,I=C3=95=C3=8B=C3=90=E2=96= =92.!zE(c))0=C3=A5J=C3=B1=C3=B1(r)~.=C3=B1=C3=B1J:
FT=C2=AD=C3=A1=E2=96=92=C3=A4=C3=AE=C3=91=C3=881}/
bschroed@oerfi:~/svn/projekte/tumblezip$ ruby tumblezip.tumbled.rb
#!/usr/bin/ruby
=20
require 'zlib'
=20
puts %(require "zlib"),
%(puts Zlib::Inflate.inflate(DATA.read)),
"__END__",
Zlib::Deflate.deflate(ARGF.read)
=20
It definitely fails on human readability, though compression and dry

:) Well, as the reason for this idea was clearly an aid to code
maintenance, I think I'd fail this, but you'd have to get marks for
subversion, in both the subversIVE and svn senses of the word :)
is quite good ;-)
=20
cheers,
=20
Brian
=20
Thank you,
Hugh
---559023410-758783491-1130926764=:12187--
---559023410-758783491-1130926764=:12187--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,152
Members
46,698
Latest member
LydiaHalle

Latest Threads

Top