list-parsing problem

Marcus Claesson

Hi People,

I have a silly little list-parsing problem that I can't get my head
around, and I'm sure some of you have come across it before.

I have a list like this:

1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

and I want to make the first column non-redundant and collect the
second column values on the same line, like this:

1 a
2 b,c
3 a
4 d,e,f
5 g

Please note that line 4 only has one 'd'.

I've tried with both hashes and arrays (don't want to confuse you so I
won't display them here), but nothing really works...

I would really appreciate any help!

Marcus
 
Peter Hickman

Marcus said:
Hi People,

I have a silly little list-parsing problem that I can't get my head
around, and I'm sure some of you have come across it before.

I have a list like this:

1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

and I want to make the first column non-redundant and collect the
second column values on the same line, like this:

1 a
2 b,c
3 a
4 d,e,f
5 g

Please note that line 4 only has one 'd'.

I've tried with both hashes and arrays (don't want to confuse you so I
won't display them here), but nothing really works...

I would really appreciate any help!

Marcus

Build up a hash of hashes

For each data pair...

$data{$left}->{$right}=1;

Then report by...

foreach my $key (keys %data) {
    print "$key\t";
    print join(',', keys %{$data{$key}});
    print "\n";
}

Ok could be better but I am only on my first cup of tea.
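
For completeness, here is a minimal runnable sketch of the hash-of-hashes idea. The sub name collect_pairs and the inline sample data are only for illustration, and the two sorts are an addition that assumes a numeric first column, since keys returns its list in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines of "left right" pairs and returns
# one "left<TAB>right,right,..." string per distinct left value.
sub collect_pairs {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        my ($left, $right) = split ' ', $line;
        next unless defined $right;     # skip blank or short lines
        $data{$left}{$right} = 1;       # duplicates collapse into one key
    }

    my @report;
    # keys comes back unordered, so sort both levels explicitly
    for my $left (sort { $a <=> $b } keys %data) {
        push @report, "$left\t" . join(',', sort keys %{ $data{$left} });
    }
    return @report;
}

print "$_\n" for collect_pairs(
    '1 a', '2 b', '2 c', '3 a', '4 d', '4 d', '4 e', '4 f', '5 g',
);
```

If the original input order matters more than sorted output, the sorts would have to be replaced by a separate record of the order in which keys were first seen.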
 
Anno Siegel

Marcus Claesson said:
Hi People,

I have a silly little list-parsing problem that I can't get my head
around, and I'm sure some of you have come across it before.

I have a list like this:

1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

and I want to make the first column non-redundant and collect the
second column values on the same line, like this:

1 a
2 b,c
3 a
4 d,e,f
5 g

Please note that line 4 only has one 'd'.

I've tried with both hashes and arrays (don't want to confuse you so I
won't display them here), but nothing really works...

As usual, when uniqueness and duplicates come into play, the Perl
solution involves a hash. In your problem, you want to eliminate
two kinds of duplicates, so we shall need to use the hash trick twice.

First, make the numbers in the first column hash keys that will
eventually point to the list of letters associated with them. That
will bring repeated numbers together.

Second, we create hash(ref)s for each of the numbers, whose keys
will be the letters associated with each number. That will store
each letter only once for a number.

When all the letters have been collected, we transform the hashes to
lists of their keys. Finally, print the result. In code:

my %coll;
while ( <DATA> ) {
    my ( $first, $second) = split;
    undef $coll{ $first}->{ $second}; # undef brings the key to existence
}

# replace hashes with lists of keys
$_ = [ sort keys %$_ ] for values %coll;

# print result
for ( sort keys %coll ) {
    print "$_ -> @{ $coll{ $_}}\n"
}

__DATA__
1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

If the sequence of the numbers or letters doesn't matter, take out
the relevant "sort".
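
One caveat if the sort stays in: a plain "sort keys" compares the keys as strings, so once the first column reaches two digits, 10 comes before 2. A small sketch of the difference, using made-up keys rather than the original data:

```perl
use strict;
use warnings;

# Made-up sample with a two-digit key to show the effect
my %coll = ( 1 => [ 'a' ], 2 => [ 'b', 'c' ], 10 => [ 'h' ] );

my @as_strings = sort keys %coll;                 # string comparison
my @as_numbers = sort { $a <=> $b } keys %coll;   # numeric comparison

print "string order:  @as_strings\n";   # 1 10 2
print "numeric order: @as_numbers\n";   # 1 2 10
```

With single-digit keys, as in the posted data, both orders are the same.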

Anno
 
Andreas Kahari

Hi People,

I have a silly little list-parsing problem that I can't get my head
around, and I'm sure some of you have come across it before.

I have a list like this: [cut]

and I want to make the first column non-redundant and collect the
second column values on the same line, like this: [cut]

Please note that line 4 only has one 'd'.

I've tried with both hashes and arrays (don't want to confuse you so I
won't display them here), but nothing really works...

I would really appreciate any help!

One, possibly suboptimal, way of doing it:


my %hash;
while (defined(my $line = <>)) {
    chomp $line;
    my @fields = split /\s+/, $line;
    $hash{$fields[0]}{$fields[1]} = 1; # Note its presence
}

while (my ($key, $val) = each %hash) {
    printf "%s\t%s\n", $key, join(',', keys %{ $val });
}

No error handling...
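
A minimal sketch of what basic error handling could look like, assuming that "malformed" simply means a line without exactly two whitespace-separated fields. The helper name parse_pair, the inline @input array (standing in for the lines read from <>), and the skip counter are all illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (key, value) for a line with exactly
# two whitespace-separated fields, and an empty list otherwise.
sub parse_pair {
    my ($line) = @_;
    my @fields = split ' ', $line;   # ' ' also ignores leading whitespace
    return @fields == 2 ? @fields : ();
}

# Stand-in for the lines read from <>, including two bad ones
my @input = ("1 a\n", "\n", "2 b\n", "oops\n", "2 c\n", "2 b\n");

my %hash;
my $skipped = 0;
for my $line (@input) {
    my ($key, $val) = parse_pair($line);
    unless (defined $val) {
        $skipped++;                  # count the line instead of storing it
        next;
    }
    $hash{$key}{$val} = 1;           # note its presence
}

for my $key (sort keys %hash) {
    printf "%s\t%s\n", $key, join(',', sort keys %{ $hash{$key} });
}
print "skipped $skipped malformed line(s)\n";
```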
 
Tom

Abigail said:
Marcus Claesson ([email protected]) wrote on MMMDCLXX September
MCMXCIII in <URL:
^^ Hi People,
^^
^^ I have a silly little list-parsing problem that I can't get my head
^^ around, and I'm sure some of you have come across it before.
^^
^^ I have a list like this:
^^
^^ 1 a
^^ 2 b
^^ 2 c
^^ 3 a
^^ 4 d
^^ 4 d
^^ 4 e
^^ 4 f
^^ 5 g
^^
^^ and I want to make the first column non-redundant and collect the
^^ second column values on the same line, like this:
^^
^^ 1 a
^^ 2 b,c
^^ 3 a
^^ 4 d,e,f
^^ 5 g
^^
^^ Please note that line 4 only has one 'd'.
^^
^^ I've tried with both hashes and arrays (don't want to confuse you so I
^^ won't display them here), but nothing really works...

I saw a few solutions, but none of them tried to keep the order
of the input as much as possible. The following solution does:

#!/usr/bin/perl

use strict;
use warnings;

my (@keys, %data);

while (<DATA>) {
    chomp;
    my ($key, $value) = split /\s+/ => $_, 2;
    push @{$data {$key}} => $value;
    push @keys => $key;
}

my %h;
foreach my $key (grep {!$h {$_} ++} @keys) {
    my %h;
    print "$key ", join (", " => grep {!$h {$_} ++} @{$data {$key}}), "\n";
}


__DATA__
1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g



Abigail

I think the solutions that used only hashes are easier to visualize than
the hash/array combination shown above. Perhaps, with the changes below,
those solutions can also keep the order of the input as much as possible:

#!/usr/bin/perl
use strict;
use warnings;

my (%data,%list);
while (<DATA>)
{
    next unless /(\w+)\s+(\w+)/; # skip if format is not as specified
    my ($key, $value) = ($1, $2);

    # add to list only if not already saved; nested keys avoid
    # collisions such as "1"+"1a" versus "11"+"a"
    next if $data{$key}{$value}++;

    $list{$key} .= "," if defined $list{$key}; # add comma separator
    $list{$key} .= $value;
}
foreach my $k (sort keys %list)
{
    print "$k\t$list{$k}\n";
}

__DATA__
1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

Tom
ztml.com
 
Uri Guttman

AS> while ( <DATA> ) {
AS> my ( $first, $second) = split;
AS> undef $coll{ $first}->{ $second}; # undef brings the key to existence

that is the single ugliest use of undef i have ever seen!!

i will have to copy that for some future talk at yapc: "where to not use
undef"

:)

uri
 
Tore Aursand

undef $coll{ $first}->{ $second}; # undef brings the key to existence

Uhm. I don't know what you've been drinking, but I sure want some of it! :)


--
Tore Aursand <[email protected]>

"You know the world is going crazy when the best rapper is white, the best
golfer is black, France is accusing US of arrogance and Germany doesn't
want to go to war."
 
Anno Siegel

Uri Guttman said:
AS> while ( <DATA> ) {
AS> my ( $first, $second) = split;
AS> undef $coll{ $first}->{ $second}; # undef brings the key to existence

that is the single ugliest use of undef i have ever seen!!

i will have to copy that for some future talk at yapc: "where to not use
undef"

I hope you'll acknowledge my contribution. Oh, wait... should I patent it?

I may be lacking in esthetic sensibility, but how is it uglier than
"$coll{ $first}->{ $second} = undef"? What do you use when you want
a key but no value? I don't know a satisfactory idiom.

Anno
 
Tassilo v. Parseval

Also sprach Anno Siegel:
I hope you'll acknowledge my contribution. Oh, wait... should I patent it?

I may be lacking in esthetic sensibility, but how is it uglier than
"$coll{ $first}->{ $second} = undef"? What do you use when you want
a key but no value? I don't know a satisfactory idiom.

Efficiency-wise, using undef is the best solution. It's static and
global and will have the smallest memory footprint. For the sake of
readability however, I'd probably write:

$coll{$first}->{$second}++;

or maybe even

$coll{$first}->{$second} = (); # now we have our undef again

Uri's right. undef always tends to look fishy although it is sometimes
perfectly reasonable to use it.
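
To make the comparison concrete, here is a small sketch of the variants discussed so far, applied to a throwaway hash (the keys a to d are arbitrary). It only shows that the first three all create the key with an undefined value, while "++" stores a 1.

```perl
use strict;
use warnings;

my %coll;
undef $coll{a};       # undef used as a function on the hash element
$coll{b} = undef;     # explicit assignment of undef
$coll{c} = ();        # empty list assigned to a scalar leaves it undef
$coll{d}++;           # the key gets the value 1

for my $key (sort keys %coll) {
    printf "%s: exists=%d defined=%d\n",
        $key,
        (exists  $coll{$key}) ? 1 : 0,
        (defined $coll{$key}) ? 1 : 0;
}
```

All four keys exist afterwards; only d has a defined value.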

Tassilo
 
Anno Siegel

Tassilo v. Parseval said:
Also sprach Anno Siegel:


Efficiency-wise, using undef is the best solution. It's static and
global and will have the smallest memory footprint. For the sake of
readability however, I'd probably write:

$coll{$first}->{$second}++;

Yes, very common, as is "$coll{$first}->{$second} = 1". However, the
reader will have to find out that the values are never used. Especially
the "++" thing looks like you're collecting a count for a purpose.

or maybe even

$coll{$first}->{$second} = (); # now we have our undef again

Translation: Assign nothing to the value associated with field "$second".
Thanks, I like it. It expresses the intention very well.

Uri's right. undef always tends to look fishy although it is sometimes
perfectly reasonable to use it.

...especially when the purpose of "undef" is to define something (a key).
I think it was back in Perl 4 days when I decided to use "undef" in that
way, and never looked back. I am reformed, it's "= ()" from now on.

Anno
 
Uri Guttman

AS> I hope you'll acknowledge my contribution. Oh, wait... should I
AS> patent it?

will the euro laws allow it?

AS> I may be lacking in esthetic sensibility, but how is it uglier
AS> than "$coll{ $first}->{ $second} = undef"? What do you use when
AS> you want a key but no value? I don't know a satisfactory idiom.

why do you need no value here? i would assign 1 so you don't need to use
exists. only when i can assign () and create a bunch of keys (with a
slice) do i generate keys with no values.

as an aside, i dislike undef as a function in general. it leads to undef
@array which leads to defined @array which is dead wrong. i only use
undef for a value generator and never as a function.
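
A short sketch of the two styles mentioned here, with a made-up list of letters: assigning 1 so that a plain truth test works, and assigning () to a hash slice to create several value-less keys in one statement.

```perl
use strict;
use warnings;

my @letters = qw(d d e f);

# Assigning 1: membership can be tested by plain truth, no exists needed
my %seen;
$seen{$_} = 1 for @letters;
print "d was seen\n" if $seen{d};

# Hash slice assigned (): creates all the keys at once, every value undef,
# so membership has to be tested with exists
my %set;
@set{@letters} = ();
print "e is in the set\n" if exists $set{e};
print join(',', sort keys %set), "\n";   # d,e,f
```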

uri
 
Tad McClellan

Anno Siegel said:
I hope you'll acknowledge my contribution. Oh, wait... should I patent it?


That's no way to get ahead.

You should release it under the viral GPL as well as sell it yourself
for a period long enough for its use to become widespread.

Then you should leave the software business and convert to
the lawsuit business.

Then you should sue not only the companies that distributed it, but
also every user that has installed it.


Then everybody will love you and your stock price will go up.
 
Tore Aursand

I have a list like this:

1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

and I want to make the first column non-redundant and collect the
second column values on the same line, like this:

1 a
2 b,c
3 a
4 d,e,f
5 g

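#!/usr/bin/perl
#
use strict;
use warnings;

my %hash = ();
while ( <DATA> ) {
    chomp;
    my @columns = split( /\s+/ );
    next unless @columns == 2; # skip blank or malformed lines
    $hash{$columns[0]}->{$columns[1]}++;
}

# Sort the first-column keys numerically (comparing $hash{$a} with
# $hash{$b} would compare hash references, not the keys)
my @sorted = sort {
    $a <=> $b
} keys %hash;

foreach ( @sorted ) {
    print $_ . "\t" . join(', ', sort keys %{$hash{$_}}) . "\n";
}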

__DATA__
1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g


 
Bart Lateur

Marcus said:
I have a silly little list-parsing problem that I can't get my head
around, and I'm sure some of you have come across it before.

I have a list like this:

1 a
2 b
2 c
3 a
4 d
4 d
4 e
4 f
5 g

and I want to make the first column non-redundant and collect the
second column values on the same line, like this:

1 a
2 b,c
3 a
4 d,e,f
5 g

Please note that line 4 only has one 'd'.

IMO your problem isn't in the parsing, but in the data storage. For the
second column, I'd use a hash of which only the keys are relevant, with
the insertion order as the value if you want to keep the original order.
For the first column, I'm not sure. Can you use an array? A hash that is
sorted on the numeric value of the string, or simply the original order?

Assuming an array will work, let's try this:

# Fill the data structure
my @data;
while(<DATA>) {
    chomp;
    my($i, $val) = split " ", $_;
    for($data[$i]) {
        $_->{$val} ||= keys %$_;
    }
}


# Let's check what we've got
use Data::Dumper;
print Dumper \@data;


# Generate the report
local ($\, $,) = ("\n", "\t");
my $i = 0;
foreach (@data) {
    defined or next;
    print $i, join ",", sort { $_->{$a} <=> $_->{$b} } keys %$_;
} continue {
    $i++;
}

__DATA__
(your data follows)
 
