Seeking the Ruby way


Todd Breiholz


I'm just getting my feet wet with Ruby and would like some advice on how you
"old-timers" would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

Here's what I did:

require 'csv'

last_account_id = ''
last_adv_id = ''
parent_co_ids = []
cntr = 0
first = true
CSV::Reader.parse(File.open('e:\\tmp\\20060201\\bsa.csv', 'r')) do |row|
  if row[1] == last_account_id
    parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
  else
    if !first
      parent_co_ids << last_adv_id[0, 6] unless parent_co_ids.include?(last_adv_id[0, 6])
      if parent_co_ids.size > 1
        puts "#{last_account_id} - (#{parent_co_ids.join(',')})"
        cntr = cntr + 1
      end
      parent_co_ids.clear
    else
      first = false
    end
  end
  last_account_id = row[1]
  last_adv_id = row[0]
end
puts "Found #{cntr} accounts with multiple parent companies"

Thanks in advance!

Todd Breiholz


ara.t.howard

harp:~ > cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}



harp:~ > cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___




harp:~ > ruby a.rb in.csv
---
aaaaaa: 2
aaabbb: 3


hth. regards.

-a
 

Jacob Fugal

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I'm curious why you decided to make `count` its own lambda when:

1) It's only ever used once
2) The block that uses it has only one statement, namely the call to `count`
3) count and the block to CSV::open have the same signature

I think at a minimum, given 2) and 3), I'd just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn't used anywhere else, I'd join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

After those transformations:

galadriel:~ lukfugl$ cat a.rb
require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}
y sum.delete_if{|k,v| v == 1}

galadriel:~ lukfugl$ cat in.csv
0,aaaaaa___
1,aaaaaa___
2,aaabbb___
3,aaabbb___
4,aaabbb___
5,aaaccc___

galadriel:~ lukfugl$ ruby a.rb in.csv
---
aaaaaa: 2
aaabbb: 3

Just seems a little clearer to me over having an extra one-time use lambda.
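
For anyone comparing the three shapes discussed above, here is a minimal sketch that runs all of them against the same data. It assumes a current Ruby, where the csv library offers CSV.parse rather than the per-row CSV::open block used in this thread. The method names and sample_csv are made up for illustration.

```ruby
require "csv"

# Hypothetical sample data, same layout as in.csv above.
def sample_csv
  "0,aaaaaa___\n1,aaaaaa___\n2,aaabbb___\n3,aaabbb___\n4,aaabbb___\n5,aaaccc___\n"
end

# Shape 1: a named lambda, called from inside the block.
def tally_calling_lambda(text)
  sum = Hash.new(0)
  count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }
  CSV.parse(text).each { |row| count[row] }
  sum.delete_if { |_, v| v == 1 }
end

# Shape 2: the same lambda handed over as the block with &.
def tally_passing_lambda(text)
  sum = Hash.new(0)
  count = lambda { |row| sum[row.last.to_s[0, 6]] += 1 }
  CSV.parse(text).each(&count)
  sum.delete_if { |_, v| v == 1 }
end

# Shape 3: the lambda body inlined into the block.
def tally_inline(text)
  sum = Hash.new(0)
  CSV.parse(text).each { |row| sum[row.last.to_s[0, 6]] += 1 }
  sum.delete_if { |_, v| v == 1 }
end

p tally_inline(sample_csv)
```

All three methods return the same hash, so the choice between them is about readability only.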

Jacob Fugal
 

Joel VanderWerf

Jacob said:
sum = Hash::new{|h,k| h[k] = 0}

And for some reason, I tend to write

sum = Hash.new(0)

when dealing with an immediate value. (But maybe it's a better practice
to use Ara's form, so that if you ever replace 0 with, say, a matrix,
you don't reuse the same object for each key in the hash.)
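
The difference between the two forms shows up as soon as the default is a mutable object. A small sketch, with hypothetical method names, assuming nothing beyond core Hash behaviour:

```ruby
# Hash.new(obj) hands out the SAME default object for every missing key
# and never stores it in the hash.
def shared_default_demo
  h = Hash.new([])
  h[:a] << 1        # mutates the one shared default array
  h[:b] << 2        # same array again
  [h.keys, h[:c]]   # no keys were stored; the default now holds both values
end

# The block form builds and stores a fresh object for each missing key.
def per_key_default_demo
  h = Hash.new { |hash, key| hash[key] = [] }
  h[:a] << 1
  h[:b] << 2
  h
end

# With an immediate value such as 0, += reassigns the slot,
# so Hash.new(0) is safe for counters.
def counter_demo(words)
  counts = Hash.new(0)
  words.each { |w| counts[w] += 1 }
  counts
end
```

So Hash.new(0) is fine for tallies, while anything that gets mutated in place wants the block form.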
 

ara.t.howard

require "csv"
require "yaml"

path = ARGV.shift
sum = Hash::new{|h,k| h[k] = 0}
count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r"){|row| count[row]}
y sum.delete_if{|k,v| v == 1}

I'm curious why you decided to make `count` its own lambda when:

1) It's only ever used once
2) The block that uses it has only one statement, namely the call to `count`
3) count and the block to CSV::open have the same signature

it's for abstraction only. i wrote how to count before writing the csv open
line. when i wrote that line it ended up as something like

CSV::open(path,"r"){|row| p row; count[row]}

during editing - as i always seem to do for debugging ;-)

basically i find

{{{{}}}}

tough to read sometimes and factor things out using lambda. it's rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

I think at a minimum, given 2) and 3), I'd just replace the block to
CSV::open with count itself:

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}
CSV::open(path,"r", &count)

Then, since count isn't used anywhere else, I'd join those together:

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp. newbies, will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

... count[row] ...

is pretty clear. i often use variables as comments to others and myself. eg.
what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i'm trying to do if not how!
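
Taking the naming idea one step further, the same password could be built from small named methods. This is only a sketch under two assumptions: "printable" is narrowed to ASCII 32..126, and Array#sample stands in for the sort_by{rand} shuffle. The method names are hypothetical, and the prefix argument plays the role of sifname.

```ruby
# Assumption: printable means ASCII 32..126 (space through tilde).
def printable_ascii_chars
  (32..126).map(&:chr)
end

# Picks n distinct printable characters at random and joins them.
def random_printable_chars(n = 4)
  printable_ascii_chars.sample(n).join
end

# prefix stands in for the sifname variable from the snippet above.
def make_password(prefix)
  "#{prefix}_#{random_printable_chars}"
end
```

Each name says what the step is for, so the final line reads the same as the two-line version above without the eval.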

anyhow - same goes with 'count': it's all good until you start cutting and
pasting - then you want vars not wicked expressions to move around.

Just seems a little clearer to me over having an extra one-time use lambda.

__iff__ you are good at reading ruby ;-)

cheers.

-a
 

Jacob Fugal

it's for abstraction only.

basically i find

{{{{}}}}

tough to read sometimes and factor things out using lambda. it's rare that it
actually ends up being the only thing left as in this case - but here you
are quite right that it can be compacted.

Yeah, I agree. I often use similar abstraction techniques for
readability. My brain just has the tendency to refactor code inwards
as well as outwards when an abstraction seems extraneous.

CSV::open(path,"r"){|row| sum[row.last.to_s[0,6]] += 1}

but i disagree here. people, esp. newbies, will look at that and say - what?
whereas reading

count = lambda{|row| sum[row.last.to_s[0,6]] += 1}

... count[row] ...

is pretty clear. i often use variables as comments to others and myself.

Again, agreed. In this case though I don't think the abstraction of
naming sum[...] += 1 as count is a necessary one. If I were to
refactor part of the complex expression

sum[row.last.to_s[0,6]] += 1

to improve readability, it would be the index:

identifier_prefix = lambda{ |row| row.last.to_s[0,6] }
... sum[identifier_prefix[row]] += 1 ...

what does this do:

password = "#{ sifname }_#{ eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect ) }"

hard to say huh?

Ick, yes, I'd definitely split that into chunks. :)

how about this?

four_random_printable_chars = eval( ((0...256).to_a.map{|c| c.chr}.sort_by{rand}.select{|c| c =~ %r/[[:print:]]/})[0,4].join.inspect )
password = "#{ sifname }_#{ four_random_printable_chars }"

ugly (yes i'm hacking like crazy today) but at least anyone reading it (most
importantly me) knows what i'm trying to do if not how!

If you say so... ;)

Jacob Fugal
 

William James

Todd said:
I'm just getting my feet wet with Ruby and would like some advice on how you
"old-timers" would write the following script using Ruby idioms.

The intent of the script is to parse a CSV file that contains 2 fields per
row, sorted on the second field. There may be multiple rows for field 2. I
want to get a list of all of the unique values of field2 that has more than
1 value for the 1st 6 characters of field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----


--- Using a hash of hashes:

require 'csv'

h = Hash.new{|h,k| h[k] = {} }
CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last][ row.first[0,6] ] = 8 }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>{"012345"=>8, "123456"=>8}}
--- end of output -----
 

Robert Klemme

William said:
Todd said:
I'm just getting my feet wet with Ruby and would like some advice on
how you "old-timers" would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values
of field2 that has more than 1 value for the 1st 6 characters of
field 1.

--- input data -----
123456ab,900
123456cd,900
123456ef,909
012345gh,909
--- end of input -----

--- Using a hash of arrays:

require 'csv'

h = Hash.new{ [] }

I wonder how this works since the Hash never stores these arrays.

CSV::Reader.parse(File.open( ARGV.first )) { |row|
  h[row.last] |= [ row.first[0,6] ] }
p h.delete_if{|k,v| v.size == 1 }

--- output -----
{"909"=>["123456", "012345"]}
--- end of output -----

Is this really the output of the script above?

robert
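
On the question above: the default block does indeed never store anything, but h[k] |= x is shorthand for h[k] = h[k] | x, and that assignment is what puts the array into the hash. A minimal sketch with hypothetical method names, using plain arrays in place of parsed CSV rows:

```ruby
# A bare lookup gets a fresh array from the default block,
# and nothing is ever stored in the hash.
def lookup_only
  h = Hash.new { [] }
  h["909"] << "123456"   # appended to a throwaway array
  h                      # still empty
end

# h[k] |= x expands to h[k] = h[k] | x, so the union is assigned
# back under the key and the hash fills up as expected.
def union_assign(rows)
  h = Hash.new { [] }
  rows.each { |first, last| h[last] |= [first[0, 6]] }
  h.delete_if { |_, v| v.size == 1 }
end
```

With the four input rows from William's post, union_assign leaves only the "909" entry, holding the prefixes "123456" and "012345".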
 

Robert Klemme

Todd said:
I'm just getting my feet wet with Ruby and would like some advice on
how you "old-timers" would write the following script using Ruby
idioms.

The intent of the script is to parse a CSV file that contains 2
fields per row, sorted on the second field. There may be multiple
rows for field 2. I want to get a list of all of the unique values of
field2 that has more than 1 value for the 1st 6 characters of field 1.

There are two possible interpretations of what you state here:

1. You want all values of field 2 that occur more than once.

2. You want all values of field 2 that have more than one distinct field 1
value.

Implementations:

ad 1.

require 'csv'

h = Hash.new(0)
CSV::Reader.parse(ARGF) {|row| h[row[1]] += 1}
h.each {|k,v| puts k if v > 1}


ad 2.

require 'csv'
require 'set'

h = Hash.new {|h,k| h[k] = Set.new}
CSV::Reader.parse(ARGF) {|row| h[row[1]] << row[0]}
h.each {|k,v| puts k if v.size > 1}

Note: CSV::Reader can use ARGF which makes it easy to read from stdin as
well as multiple files.
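
CSV::Reader belongs to the csv library of that era and is not available in current Ruby. A rough equivalent of interpretation 2 with today's API might look like the sketch below. It compares only the first 6 characters of field 1, as in Todd's description, and the method name is made up for illustration.

```ruby
require "csv"
require "set"

# Returns { field 2 => distinct 6-character prefixes of field 1 } for every
# field-2 value that has more than one such prefix.
def accounts_with_multiple_prefixes(text)
  prefixes = Hash.new { |h, k| h[k] = Set.new }
  CSV.parse(text) { |row| prefixes[row[1]] << row[0].to_s[0, 6] }
  prefixes.select { |_, set| set.size > 1 }.transform_values(&:to_a)
end
```

For a file on disk, CSV.foreach(path) yields rows one at a time in the same way, so the block body would not need to change.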

Kind regards

robert
 

Robert Klemme

Robert said:
There are two possible interpretations of what you state here:

1. You want all values of field 2 that occur more than once.

Just remembered that the file is sorted. In that case this implementation of
case 1 is even more efficient, as it does not keep values in memory and works
on arbitrarily large files:

require 'csv'

last = nil
CSV::Reader.parse(ARGF) do |row|
  last, k = row[1], last
  puts k if last == k
end
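
One thing to watch with the loop above: a value that appears on three consecutive rows is printed twice, once for each repeat. A possible variant that reports each value once per run is sketched below. It assumes the modern CSV.parse API and input sorted on field 2, and the method name is hypothetical.

```ruby
require "csv"

# Returns each field-2 value that appears on more than one row,
# once per run of equal values. Assumes rows are sorted on field 2.
def repeated_keys(text)
  result = []
  last = nil
  reported = false
  CSV.parse(text) do |row|
    key = row[1]
    if key == last
      result << key unless reported   # report only the first repeat
      reported = true
    else
      last = key                      # a new run starts here
      reported = false
    end
  end
  result
end
```

Replacing the result array with a puts keeps the memory use constant, as in the original loop.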

Kind regards

robert
 
