How to do this complicated logic in ruby

Valentino Lun

Dear all

I have an array of around 1000 records, and I want to check and
correct some of the data in it.

For instance, the first record of this array is a hash, as follows:
my_array[0] = {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1",
               "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"}

server  hosp  loc  pspec  pcat
AHN     AHN   PC1  ANA    1
PWH     AHN   PC1  ANA    1
NDH     AHN   PC1  ANA    2     <= this pcat value needs to be updated in array1
TMH     AHN   PC1  ANA    2     <= this pcat value needs to be updated in array1
...
(around 1000 records)

When records have the same values for the keys hosp, loc and pspec,
their pcat must be identical. So there is a problem in the last two
records: their pcat should be 1, because the pcat is correct in the
record where record["server"] equals record["hosp"].

I cannot figure out the logic for doing this in Ruby (or even in
another language). Can someone give me some hints? Thanks.

Many thanks
Valentino
 
Martin DeMello

Simple way:

1. Have a 'signature' for each row, composed of the hosp, loc and
pspec. Could be as simple as

def signature(row)
  %w(hosp loc pspec).map {|k| row[k]}.join(",")
end

2. Collect all the rows with the same signature

verify = Hash.new {|h,k| h[k] = []}
ary.each_with_index {|row, i|
  verify[signature(row)] << [i, row['pcat']]
}

3. See if there are any problems

verify.each_pair {|k, v|
  if v.length > 1
    fix_array_for(v)
  end
}

4. Write fix_array_for(v)

Note that v is an array of [index, pcat] pairs. So for your
example, it would be
[[0,"1"], [1,"1"], [2,"2"], [3,"2"]]

You basically need to iterate over that array, see which pcat is
right, then iterate over it once more and set all the pcats to the
right value.

There are probably more efficient ways to do all this, but this has
the advantage of being straightforward.
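
The four steps above can be put together roughly as follows. This is only a sketch under a few assumptions: the sample rows are the four from the question, fix_array_for takes the array as an extra argument (a hypothetical signature, so it can look at server and hosp), and the "right" pcat is taken from the row whose server equals its hosp, as the question describes.

```ruby
# Sample rows taken from the question; the real array has around 1000.
ary = [
  {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"PWH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"1"},
  {"server"=>"NDH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
  {"server"=>"TMH", "hosp"=>"AHN", "loc"=>"PC1", "pspec"=>"ANA", "pcat"=>"2"},
]

# Step 1: a string signature built from the identifying columns.
def signature(row)
  %w(hosp loc pspec).map { |k| row[k] }.join(",")
end

# Step 4 (assumed rule): the row whose server equals its hosp carries the
# correct pcat. Groups that have no such row are left untouched.
def fix_array_for(ary, pairs)
  good = pairs.find { |i, _| ary[i]["server"] == ary[i]["hosp"] }
  return unless good
  pairs.each { |i, _| ary[i]["pcat"] = good[1] }
end

# Step 2: collect [index, pcat] pairs per signature.
verify = Hash.new { |h, k| h[k] = [] }
ary.each_with_index { |row, i| verify[signature(row)] << [i, row["pcat"]] }

# Step 3: only groups with more than one row can disagree.
verify.each_value { |pairs| fix_array_for(ary, pairs) if pairs.length > 1 }

p ary.map { |row| row["pcat"] }
```

With the sample rows, all four records end up with the pcat of the first one, since that is the only row where server and hosp match.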

martin
 
Luiz Vitor Martinez Cardoso

Using Symbols here makes a lot of sense. Try to structure your array like:

my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
               :pspec => "ANA", :number => "1", :pcat => "1"}

And for all the values that are frequently repeated, use Symbols. Basically,
a Symbol is created once, and every later use of the same name is a reference
to that one object and NOT another object. Doing that will save memory.
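
A small sketch of the difference, in case it helps. The variable names are made up for the illustration, and the strings are created with dup so that they are guaranteed to be separate objects regardless of Ruby version or frozen-string settings.

```ruby
# Two equal strings built at run time are separate objects...
a = "hosp".dup
b = "hosp".dup
puts a == b        # true: same content
puts a.equal?(b)   # false: two different objects

# ...whereas a given symbol name always refers to one and the same object.
s1 = :hosp
s2 = "hosp".to_sym
puts s1.equal?(s2) # true: the same object

# A record keyed by symbols, as suggested above.
record = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
          :pspec => "ANA", :number => "1", :pcat => "1"}
puts record[:hosp]
```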

Regards,
Luiz Vitor.

 
Martin DeMello

Even better: http://www.codeforpeople.com/lib/ruby/arrayfields/

martin
 
Robert Klemme

2009/2/16 Valentino Lun said:
I cannot figure out the logic to doing this in ruby (even in other
language). Can someone give me some hints on this? Thanks

IMHO this is plainly the wrong data structure for the task. Since you
identify entries by their hosp, loc, pspec you should *index* the
whole thing by these columns. Also, since your Hashes seem to be
uniform I would rather define a particular type for this, e.g.

Entry = Struct.new :server, :hosp, :loc, :pspec, :pcat

# key on the columns that identify an entry
EntryKey = Struct.new :hosp, :loc, :pspec do
  def self.create(entry)
    new(*members.map {|m| entry[m]})
  end
end

index = Hash.new {|h,k| h[k] = []}
# loop reading input
entry = ...
index[EntryKey.create(entry)] << entry

# now you can process them or do it while reading
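
To make the indexing idea concrete, here is a minimal runnable sketch. It assumes the key consists of hosp, loc and pspec (the columns named in the text above), and the rows array is hypothetical sample input standing in for the real reading loop.

```ruby
Entry = Struct.new(:server, :hosp, :loc, :pspec, :pcat)

# Key type holding only the identifying columns (assumed: hosp, loc, pspec).
EntryKey = Struct.new(:hosp, :loc, :pspec) do
  def self.create(entry)
    new(*members.map { |m| entry[m] })
  end
end

# Hypothetical input: server, hosp, loc, pspec, pcat.
rows = [
  %w(AHN AHN PC1 ANA 1),
  %w(PWH AHN PC1 ANA 1),
  %w(NDH AHN PC1 ANA 2),
  %w(AHN AHN PC2 ANA 3),
]

# Entries with equal key fields land in the same bucket.
index = Hash.new { |h, k| h[k] = [] }
rows.each do |fields|
  entry = Entry.new(*fields)
  index[EntryKey.create(entry)] << entry
end

index.each do |key, entries|
  puts "#{key.to_a.join('/')}: #{entries.map(&:pcat).inspect}"
end
```

Each bucket then holds every entry that must share one pcat, so checking or correcting a group is a matter of looking at a single hash value.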

See also Martin's reply which goes into the same direction just with a
different approach.

Cheers

robert
 
Martin DeMello

Robert Klemme said:
See also Martin's reply which goes into the same direction just with a
different approach.

The different approach is mostly due to the fact that I'm
uncomfortable using objects with mutable fields as hash keys. I prefer
to explicitly map them to a string, and then use that string as a hash
key.

martin
 
Robert Klemme

2009/2/16 Martin DeMello said:
The different approach is mostly due to the fact that I'm
uncomfortable using objects with mutable fields as hash keys. I prefer
to explicitly map them to a string, and then use that string as a hash
key.

Hehe, that would be something *I* would be uncomfortable with. :) It
is interesting that you advertise this approach as a more robust one.
Because IMHO this is more on the hackish side of things because
instead of using a structured type you lump everything into a single
unstructured object. This can break awfully (i.e. in your example, if
fields contain "," in different places).

The nice thing about Struct is that it defines #==, #eql? and #hash
properly making generated classes suitable as Hash keys. If you are
afraid of mutations you can always freeze keys.
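
A short sketch of both points, with made-up field values: a comma-joined signature can collide when the fields themselves contain commas, while Struct keys compare field by field and can be frozen. The Key type here is hypothetical and only serves the illustration.

```ruby
Key = Struct.new(:hosp, :loc, :pspec)

# Two different field combinations that collapse to the same joined string.
a = ["A,B", "C", "D"]
b = ["A", "B,C", "D"]
puts a.join(",") == b.join(",")   # true: the string signatures collide

k1 = Key.new(*a)
k2 = Key.new(*b)
puts k1 == k2                     # false: the Struct keys stay distinct

# Structs with equal fields are interchangeable as hash keys.
h = { Key.new("AHN", "PC1", "ANA").freeze => "1" }
puts h[Key.new("AHN", "PC1", "ANA")]

# A frozen key rejects later mutation with an exception.
begin
  h.keys.first.hosp = "XXX"
rescue => e
  puts "mutation rejected (#{e.class})"
end
```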

Kind regards

robert
 
Pit Capitain

2009/2/16 Valentino Lun said:
I cannot figure out the logic to doing this in ruby (even in other
language). Can someone give me some hints on this? Thanks

While I agree on what the others have said, that you should create a
better data structure, here's a way to do what you wanted with your
array of hashes. But look at the other posts. It's easy to build good
data structures in Ruby.

# create a key for the given record to be used in the pcat hash
def pcat_key(record)
  [record["hosp"], record["loc"], record["pspec"]]
end

# build hash with valid pcat values
pcat = {}
my_array.each do |record|
  next unless record["server"] == record["hosp"]
  pcat[pcat_key(record)] = record["pcat"]
end

# look for invalid records
my_array.each do |record|
  next if record["pcat"] == pcat[pcat_key(record)]
  # do something with the invalid record
  p record
end

Regards,
Pit
 
Valentino Lun

Dear all

Thank you for your help. It took me about 5 hours (>_<) to figure
out my solution, and it works, but it takes a long time to execute.

Below is my code to share with you all. I would appreciate your expert
advice on whether it can be optimized. Thank you.


# data collection about 5000 records for each variable (lis, gcrs)
lis = ActiveRecord::Base.connection.execute("select * from lis_requests
order by hosp, spec, loc, pspec")
gcrs = ActiveRecord::Base.connection.execute("select * from
gcrs_requests order by hosp, spec, loc, pspec")

def find_correct_pcat(arr)
  server_ref = {"AHN" => "AHN", "TPH" => "AHN",
                "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                "PWH" => "PWH", "SH" => "PWH"}

  arr.each do |x|
    return x["pcat"] if x["server"] == server_ref[x["hosp"]]
  end

  # if not, then find the pcat with the largest "number"
  arr.sort_by {|y| y["number"].to_i}.last["pcat"]
end

# The result will be put in this hash
result = {}

# loop over every index key and get the result
lis.collect {|x| [x["hosp"], x["spec"], x["loc"], x["pspec"]]}.uniq.each do |index_key|

  lis_record = lis.select {|x| x["hosp"] == index_key[0] and x["spec"] == index_key[1] and
                               x["loc"] == index_key[2] and x["pspec"] == index_key[3]}
  gcrs_record = gcrs.select {|x| x["hosp"] == index_key[0] and x["spec"] == index_key[1] and
                                 x["loc"] == index_key[2] and x["pspec"] == index_key[3]}
  lis_req_count = lis_record.inject(0) {|sum, n| sum + n["number"].to_i}
  gcrs_req_count = gcrs_record.inject(0) {|sum, n| sum + n["number"].to_i}

  if lis_record.collect {|x| x["pcat"]}.uniq.size == 1
    pcat = lis_record.first["pcat"]
  else
    pcat = find_correct_pcat(lis_record)
  end

  result[index_key] = [pcat, gcrs_req_count, lis_req_count]

end

Thanks again
Valentino
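
One possible direction for speeding this up, offered only as a sketch: the loop above rescans both result sets with select for every distinct key, whereas grouping each set once with Enumerable#group_by turns that into a hash lookup. The sample rows, total_number and key_of are hypothetical names for the illustration, and the sketch assumes the query results behave like collections of string-keyed hashes. The fallback uses max_by instead of sort_by ... last, which may pick a different row when two rows tie on "number".

```ruby
# Hypothetical sample rows standing in for the two query results.
lis = [
  {"hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "server"=>"AHN", "number"=>"3", "pcat"=>"1"},
  {"hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "server"=>"NDH", "number"=>"5", "pcat"=>"2"},
  {"hosp"=>"NDH", "spec"=>"S1", "loc"=>"PC9", "pspec"=>"ANA", "server"=>"PWH", "number"=>"2", "pcat"=>"4"},
]
gcrs = [
  {"hosp"=>"AHN", "spec"=>"S1", "loc"=>"PC1", "pspec"=>"ANA", "server"=>"AHN", "number"=>"4", "pcat"=>"1"},
]

def find_correct_pcat(records)
  server_ref = {"AHN" => "AHN", "TPH" => "AHN",
                "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                "PWH" => "PWH", "SH" => "PWH"}
  # Prefer the record served by the hospital's own reference server.
  match = records.find { |r| r["server"] == server_ref[r["hosp"]] }
  return match["pcat"] if match
  # Otherwise fall back to the record with the largest "number".
  records.max_by { |r| r["number"].to_i }["pcat"]
end

def total_number(records)
  records.inject(0) { |sum, r| sum + r["number"].to_i }
end

# Group each data set once; every key lookup afterwards is a hash access.
key_of = lambda { |r| r.values_at("hosp", "spec", "loc", "pspec") }
lis_groups  = lis.group_by(&key_of)
gcrs_groups = gcrs.group_by(&key_of)

result = {}
lis_groups.each do |key, lis_records|
  gcrs_records = gcrs_groups.fetch(key, [])
  result[key] = [find_correct_pcat(lis_records),
                 total_number(gcrs_records),
                 total_number(lis_records)]
end

p result
```

The separate check for "all pcats identical" is dropped here because find_correct_pcat returns that same value in that case anyway.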
 
