Seeking the Ruby way

Todd Breiholz · Feb 2, 2006

I'm just getting my feet wet with Ruby and would like some advice on how you
"old-timers" would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

Here's what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\\tmp\\20060201\\bsa.csv', 'r')) do |row|
  if row[1] == last_account_id
    parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
  else
    if !first
      parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
      if parent_co_ids.size > 1
        puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
        cntr = cntr + 1
      end
      parent_co_ids.clear
    else
      first = false
    end
  end
  last_account_id = row[1]
  last_adv_id = row[0]
end
puts "Found #{cntr} accounts with multiple parent companies"

Thanks in advance!

Todd Breiholz


ara.t.howard · Feb 2, 2006

harp:~ > cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

harp:~ > cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

harp:~ > ruby a.rb in.csv
---
aaaaaa: 2
aaabbb: 3

hth. regards.

-a

Jacob Fugal · Feb 2, 2006

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I'm curious why you decided to make `count` its own lambda when:

1) It's only ever used once
2) The block that uses it has only one statement, namely the call to `count`
3) count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I'd just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn't used anywhere else, I'd join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

galadriel:~ lukfugl$ cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
y sum.delete_if{|k,v| v == 1}

galadriel:~ lukfugl$ cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

galadriel:~ lukfugl$ ruby a.rb in.csv
---
aaaaaa: 2
aaabbb: 3

Just seems a little clearer to me over having an extra one-time use lambda.
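
The three variants discussed above can be compared side by side. This is only a sketch: `each_row` is a hypothetical stand-in for CSV::open that yields rows from an in-memory array, so the comparison does not depend on any particular version of the csv library.

```ruby
# Hypothetical stand-in for CSV::open: yields each row to the given block.
def each_row(rows)
  rows.each { |row| yield row }
end

# Variant 1: a named lambda, called from a block that only forwards to it.
def prefix_counts_via_wrapper(rows)
  sum = Hash.new { |h, k| h[k] = 0 }
  count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }
  each_row(rows) { |row| count[row] }
  sum
end

# Variant 2: the lambda itself is passed as the block with &.
def prefix_counts_via_ampersand(rows)
  sum = Hash.new { |h, k| h[k] = 0 }
  count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }
  each_row(rows, &count)
  sum
end

# Variant 3: the lambda body is inlined into the block.
def prefix_counts_inline(rows)
  sum = Hash.new { |h, k| h[k] = 0 }
  each_row(rows) { |row| sum[row.last.to_s[0, 6]] += 1 }
  sum
end
```

All three build the same hash; the only difference is whether the counting step has a name.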

Jacob Fugal

Joel VanderWerf · Feb 2, 2006

Jacob said:
sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it's a better practice
to use Ara's form, so that if you ever replace 0 with, say, a matrix,
you don't reuse the same object for each key in the hash.)
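
A small sketch of the difference Joel describes. The method names are made up for illustration. With an immediate value such as 0 both forms count the same way, but with a mutable default such as an array, Hash.new(obj) hands out one shared object and never stores it.

```ruby
# Counting with an immediate default value.
def count_with_value(words)
  counts = Hash.new(0)
  words.each { |w| counts[w] += 1 }
  counts
end

# Counting with a block that stores the default on first access.
def count_with_block(words)
  counts = Hash.new { |h, k| h[k] = 0 }
  words.each { |w| counts[w] += 1 }
  counts
end

# The pitfall: every missing key returns the SAME array object,
# and << mutates it without ever assigning it to a key.
def group_with_shared_default(pairs)
  groups = Hash.new([])
  pairs.each { |k, v| groups[k] << v }
  groups
end

# The block form creates and stores a fresh array for each key.
def group_with_block(pairs)
  groups = Hash.new { |h, k| h[k] = [] }
  pairs.each { |k, v| groups[k] << v }
  groups
end
```

With the shared default, the resulting hash has no keys at all, and looking up any key returns the one array that collected every value.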

ara.t.howard · Feb 2, 2006

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I'm curious why you decided to make `count` its own lambda when:

1) It's only ever used once
2) The block that uses it has only one statement, namely the call to `count`
3) count and the block to CSV::open have the same signature

it's for abstraction only. i wrote how to count before writing the csv open
line. when i wrote that line it ended up as something like

CSV::open(path,"r"){|row| p row; count[row]}

during editing - as i always seem to for debugging ;-)

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it's rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

I think at a minimum, given 2) and 3), I'd just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn't used anywhere else, I'd join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp nubies will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

... count[row] ...

is pretty clear. i often use variables as comments to others and myself. eg.
what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i'm trying to do if not how!

anyhow - same goes with 'count': it's all good until you start cutting and
pasting - then you want vars not wicked expressions to move around.

Just seems a little clearer to me over having an extra one-time use lambda.

__iff__ you are good at reading ruby ;-)

cheers.

-a

Jacob Fugal · Feb 3, 2006

it's for abstraction only.

basically i find

{{{{}}}}

tough to read sometimes and factor out things using lambda. it's rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp nubies will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

... count[row] ...

is pretty clear. i often use variables as comments to others and myself.

Again, agreed. In this case though I don't think the abstraction of
naming sum[...] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
... sum[identifier_prefix[row]] += 1 ...

what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

Ick, yes, I'd definitely split that into chunks.

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i'm trying to do if not how!

If you say so...

Jacob Fugal

William James · Feb 3, 2006

Todd said:
I'm just getting my feet wet with Ruby and would like some advice on how you
"old-timers" would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

--- Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>{"012345"=>8, "123456"=>8}}
--- end of output -----

Robert Klemme · Feb 3, 2006

William said:
Todd said:

I'm just getting my feet wet with Ruby and would like some advice on
how you "old-timers" would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values
of field2 that has more than 1 value for the 1st 6 characters of
field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

CSV::Reader.parse(File.open( ARGV.first )) { |row|
h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

Is this really the output of the script above?

robert
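
A sketch of why the hash-of-arrays version can work even though the default block never stores anything: `h[k] |= x` is shorthand for `h[k] = h[k] | x`, so the assignment does the storing. The method names below are made up for illustration, and the rows are an in-memory copy of the input data shown above rather than a parsed file.

```ruby
# Collects the distinct 6-character prefixes of field 1 per field-2 value.
def distinct_prefixes(rows)
  h = Hash.new { [] }            # returns a new array on a miss, stores nothing
  rows.each do |first, last|
    # Expands to: h[last] = h[last] | [first[0, 6]]
    # The union is a new array, and the assignment puts it into the hash.
    h[last] |= [first[0, 6]]
  end
  h
end

# By contrast, mutating the default in place is lost,
# because no assignment ever happens.
def lookup_without_assignment
  h = Hash.new { [] }
  h["missing"] << "lost"
  h
end
```

Run against the four input rows, `distinct_prefixes` followed by `delete_if{|k,v| v.size == 1 }` leaves only the "909" entry with both prefixes.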

Robert Klemme · Feb 3, 2006

Todd said:
I'm just getting my feet wet with Ruby and would like some advice on
how you "old-timers" would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values of
field2 that has more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

1. You want all values of field 2 that occur more than once.

2. You want all values of field 2 that have more than one distinct field 1
value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}

ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.
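
For readers on a current Ruby: CSV::Reader went away when the standard csv library was replaced in Ruby 1.9. Below is a sketch of interpretation 2 using CSV.parse, with one assumption added from the original question: only the first 6 characters of field 1 are compared. The method names are made up for illustration.

```ruby
require "csv"
require "set"

# Maps each field-2 value to the set of distinct 6-character
# prefixes of field 1 seen with it.
def prefixes_by_account(csv_text)
  groups = Hash.new { |h, k| h[k] = Set.new }
  CSV.parse(csv_text) { |row| groups[row[1]] << row[0][0, 6] }
  groups
end

# Field-2 values that occur with more than one distinct prefix.
def accounts_with_multiple_prefixes(csv_text)
  prefixes_by_account(csv_text).select { |_, set| set.size > 1 }.keys
end
```

To read from a file instead of a string, CSV.foreach(path) yields rows the same way.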

Kind regards

robert

Robert Klemme · Feb 3, 2006

Robert said:
There are two possible interpretations of what you state here:

1. You want all values for row2 that occur more than once.

Just remembered that the file is sorted. Then this implementation of case
1 is even more efficient as it does not store values in mem and works on
arbitrary large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
last, k = row[1], last
puts k if last == k
end
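
The same sorted-input idea can be sketched with Enumerable#chunk, which groups consecutive rows that share a field-2 value, so only one group needs to be examined at a time. Unlike the loop above, this reports each repeated value once rather than once per repeat. The method names are made up, and `rows` is assumed to be any Enumerable of [field1, field2] pairs already sorted on field 2 (for example an array, or the enumerator returned by CSV.foreach without a block).

```ruby
# Field-2 values that appear on more than one consecutive row.
# Note: chunk drops elements whose key is nil, so rows with an empty
# field 2 would be skipped.
def repeated_accounts(rows)
  result = []
  rows.chunk { |row| row[1] }.each do |account, group|
    result << account if group.size > 1
  end
  result
end

# Field-2 values whose rows carry more than one distinct
# 6-character prefix of field 1, paired with those prefixes.
def sorted_accounts_with_multiple_prefixes(rows)
  result = []
  rows.chunk { |row| row[1] }.each do |account, group|
    prefixes = group.map { |row| row[0][0, 6] }.uniq
    result << [account, prefixes] if prefixes.size > 1
  end
  result
end
```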

Kind regards

robert
