Rainer Weikusat
ela said: I've been working on this problem for 4 days and still cannot
come up with a good solution, and would appreciate it if you could
comment on the problem.
Given a table containing tab-delimited cells like this
[ please see original for the indeed gory details ]
Provided I understood the problem correctly, a possible solution could
look like this (this code has had very little testing). First, you
define your groups with a hash that maps each 'group ID' to a reference
to an array containing the IDs belonging to the group:
$grp{1} = [1, 2];
Then, for each ID, you create a hash mapping the column names to the
column values and store a reference to it in an id hash keyed by the
ID:
$id{1} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC3' };
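In practice, %grp and %id would presumably be filled while reading the
table. A minimal sketch of that step follows. Since the original table
layout is not reproduced here, the layout used below (one row per ID
with the tab-separated columns ID, group ID, F1, F2, F3) is purely an
assumption for illustration, as are the names @lines, $row_id and
$group:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input rows: ID, group ID, F1, F2, F3, separated by tabs.
my @lines = (
    "1\t1\tSuperC1\tC1\tsubC4",
    "2\t1\tSuperC1\tC1\tsubC3",
);

my (%grp, %id);
for my $line (@lines) {
    my ($row_id, $group, @vals) = split(/\t/, $line);

    # append the ID to its group, creating the array on first use
    push(@{$grp{$group}}, $row_id);

    # map the column names to the values of this row
    $id{$row_id} = { F1 => $vals[0], F2 => $vals[1], F3 => $vals[2] };
}

print(join(', ', @{$grp{1}}), "\n");    # prints "1, 2"
print("$id{2}{F3}\n");                  # prints "subC3"
```

The resulting %grp and %id have the same shape as the hand-written
assignments above.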
Provided this has been done, a 'consistency' routine can be defined as
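sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    # record each distinct value of column $col among the group's IDs
    $seen{$_} = 1
        for (map { $id{$_}{$col} } @{$grp{$grp}});

    # 1 / number of distinct values: 1.0 means all IDs agree
    return 1.0 / keys(%seen);
}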
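This takes a group ID and a column name as arguments and returns the
'consistency' of this column for this group, that is, 1 divided by the
number of distinct values the column has among the IDs of the group
(1.0 if they all agree). Then, an array needs to be created which
names the columns in the order they are supposed to be checked in: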
@order = qw(F3 F2 F1);
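To make the returned values concrete, here is a small self-contained
check using the two example IDs from above. The consistency routine is
the one already shown; only the print loop at the end is new:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (%grp, %id);
$grp{1} = [1, 2];
$id{1} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC3' };

sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    $seen{$_} = 1
        for (map { $id{$_}{$col} } @{$grp{$grp}});
    return 1.0 / keys(%seen);
}

# F3 has two distinct values (subC4, subC3), F2 and F1 have one each
print("$_: ", consistency(1, $_), "\n")
    for (qw(F3 F2 F1));
```

This prints 0.5 for F3 and 1 for both F2 and F1.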
Now, a 'decide' routine can be defined like this:
sub decide($)
{
    my $grp = $_[0];

    consistency($grp, $_) >= THRESHOLD and return $_
        for (@order);
    return undef;
}
This takes a group ID as its argument and returns either the name of
the first column (checked in the order given by @order) whose
consistency is >= THRESHOLD, or undef (meaning inconsistent). As a
complete script:
------------------
#!/usr/bin/perl
use strict;
use warnings;
use constant THRESHOLD => 0.7;

my (%grp, %id, @order, $res);

# columns in the order they are to be checked
@order = qw(F3 F2 F1);

# group ID => IDs in the group
$grp{1} = [1, 2];

# ID => { column name => column value }
$id{1} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC3' };

sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    $seen{$_} = 1
        for (map { $id{$_}{$col} } @{$grp{$grp}});
    return 1.0 / keys(%seen);
}

sub decide($)
{
    my $grp = $_[0];

    consistency($grp, $_) >= THRESHOLD and return $_
        for (@order);
    return undef;
}

$res = decide(1);
$res //= 'inconsistent';    # the // operator needs perl 5.10 or later
print("$res\n");
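As a further illustration, the same two routines can be run over more
than one group. Groups 2 and 3 and IDs 3 to 6 below, including their
column values, are made up for this example and do not come from the
original data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use constant THRESHOLD => 0.7;

my (%grp, %id, @order);
@order = qw(F3 F2 F1);

$grp{1} = [1, 2];
$grp{2} = [3, 4];    # made-up group agreeing on every column
$grp{3} = [5, 6];    # made-up group disagreeing on every column

$id{1} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC3' };
$id{3} = { F1 => 'SuperC1', F2 => 'C2', F3 => 'subC7' };
$id{4} = { F1 => 'SuperC1', F2 => 'C2', F3 => 'subC7' };
$id{5} = { F1 => 'SuperC1', F2 => 'C1', F3 => 'subC1' };
$id{6} = { F1 => 'SuperC2', F2 => 'C5', F3 => 'subC9' };

sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    $seen{$_} = 1
        for (map { $id{$_}{$col} } @{$grp{$grp}});
    return 1.0 / keys(%seen);
}

sub decide($)
{
    my $grp = $_[0];

    consistency($grp, $_) >= THRESHOLD and return $_
        for (@order);
    return undef;
}

# report the deciding column for each group, in numeric group order
for my $g (sort { $a <=> $b } keys(%grp)) {
    my $res = decide($g) // 'inconsistent';
    print("group $g: $res\n");
}
```

With these values, group 1 is decided by F2, group 2 by F3, and group 3
is reported as inconsistent because every column has two distinct
values, giving a consistency of 0.5, which is below the THRESHOLD.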