Text File Processing

Greg Carlill

Hi All,

I have a text file that contains column data like:

730 B13
730 B33
730 B53
730 B73
800 B10-1
800 B30-1
800 B50-1
800 B70-1

and want to get an output text file like this:

730 B13, B33, B53, B73
800 B10-1, B30-1, B50-1, B70-1

Perl 5.005_02 on NT is all I have to do this. What is the best way to
attack this? I have almost no knowledge of Perl but am willing to
learn what I need.

Thanks in advance
Greg
 
Eric J. Roode

First of all, a couple of questions: Are the values in the second column
going to be unique for each value in the first column? Do you need to
output the results in the same order as the values arrived in the first
column?

Assuming the answers are both "yes", I'd attack the problem as follows:

1. Loop over each line in the file:
   a. split into two parts, $col1 and $col2.
   b. push $col1 onto @col1values if $col2values{$col1} doesn't
      exist.
   c. push $col2 onto @{ $col2values{$col1} }.

2. Loop over each value in @col1values:
   a. set $aref = $col2values{$_}
   b. print $_, @$aref (suitably formatted).

In other words, keep the column 1 values in an array (to preserve order),
and keep the column 2 values in a hash-of-arrays, keyed on the column 1
value.
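
A minimal sketch of the steps above, assuming whitespace-separated columns. The helper name group_lines and the sample lines are only for illustration; in a real script you would pass it the lines read from your file.

```perl
#!/usr/bin/perl -w
use strict;

# group_lines() is a hypothetical helper name, not part of any module.
# It takes a list of "key value" lines and returns one output string
# per key, in the order the keys first appeared.
sub group_lines {
    my @lines = @_;
    my ( @col1values, %col2values );
    foreach my $line (@lines) {
        my ( $col1, $col2 ) = split ' ', $line;
        next unless defined $col2;    # skip blank or one-column lines
        # Step 1b: remember each key the first time it is seen.
        push @col1values, $col1 unless exists $col2values{$col1};
        # Step 1c: collect the second-column values per key.
        push @{ $col2values{$col1} }, $col2;
    }
    # Step 2: format each key with its comma-separated values.
    return map { "$_ " . join( ', ', @{ $col2values{$_} } ) } @col1values;
}

# Sample data for illustration only.
my @sample = ( "730 B13\n", "730 B33\n", "800 B10-1\n", "800 B30-1\n" );
print "$_\n" for group_lines(@sample);
# prints:
# 730 B13, B33
# 800 B10-1, B30-1
```

To process a real file, replace @sample with <> and name the file on the command line.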

If the output order isn't significant, you can dispose of the col1values
array -- just use the keys() of %col2values.

If the second column values are not necessarily unique, but you want to
remove dups for the output, you'll have to modify %col2values to be a
hash-of-hashes, and output keys(%{$col2values{$_}}) instead of @$aref in
step 2b. If you additionally need to preserve input order, you'll have
to keep a parallel array of col2 values, like you did in @col1values.
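
One way the duplicate-removing variant might look, as a sketch only. Here a hash-of-hashes (%seen, a name chosen for this example) records which pairs have already appeared, while the arrays keep the input order. The helper name group_unique and the sample data are likewise assumptions.

```perl
#!/usr/bin/perl -w
use strict;

# group_unique() is a hypothetical helper name. It works like a plain
# grouping loop, but drops repeated second-column values within a key.
sub group_unique {
    my @lines = @_;
    my ( @col1values, %col2values, %seen );
    foreach my $line (@lines) {
        my ( $col1, $col2 ) = split ' ', $line;
        next unless defined $col2;    # skip blank or one-column lines
        # %seen is the hash-of-hashes: $seen{key}{value} counts sightings.
        push @col1values, $col1 unless exists $seen{$col1};
        # The parallel array keeps values in input order, first sighting only.
        push @{ $col2values{$col1} }, $col2 unless $seen{$col1}{$col2}++;
    }
    return map { "$_ " . join( ', ', @{ $col2values{$_} } ) } @col1values;
}

# Sample data for illustration only; note the repeated "730 B13".
my @sample = ( "730 B13\n", "730 B13\n", "730 B33\n", "800 B10-1\n" );
print "$_\n" for group_unique(@sample);
# prints:
# 730 B13, B33
# 800 B10-1
```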

Make sense? If not, feel free to ask me to elaborate :)

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

 
John W. Krahn

Here is one way to do it:

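#!/usr/bin/perl -w
use strict;

# Assumes lines with the same key are adjacent in the input.
my %data;
while ( <> ) {
    my ( $key, $val ) = split or next;    # skip blank lines
    if ( exists $data{ $key } ) {
        $data{ $key } .= ", $val";
    }
    else {
        # New key: print the finished group, then start the next one.
        print join( "\t", %data ), "\n" if %data;
        %data = ( $key, $val );
    }
}
print join( "\t", %data ), "\n" if %data;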

__END__



John
 
Anno Siegel

Assuming DATA is a read filehandle to your data:

my %table;
push( @{ $table{ $_->[ 0]}}, $_->[ 1]) for map [ split], <DATA>;

To see the result:

print "$_ @{ $table{ $_}}\n" for keys %table;
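
Note that keys() returns the keys in no particular order, and interpolating the array joins the values with spaces rather than commas. A small sketch of one way to get sorted, comma-separated output, assuming the keys are numeric as in the sample data (format_table is a hypothetical helper name, and %table is filled by hand here for illustration):

```perl
#!/usr/bin/perl -w
use strict;

# format_table() is a hypothetical helper name. It takes a reference to
# a hash of arrays and returns one line per key, keys sorted numerically.
sub format_table {
    my ($table) = @_;
    return map { "$_ " . join( ', ', @{ $table->{$_} } ) }
           sort { $a <=> $b } keys %$table;
}

# Hand-built sample table for illustration only.
my %table = ( 800 => [ 'B10-1', 'B30-1' ], 730 => [ 'B13', 'B33' ] );
print "$_\n" for format_table( \%table );
# prints:
# 730 B13, B33
# 800 B10-1, B30-1
```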

Anno
 
Barry Kimelman

[This followup was posted to comp.lang.perl.misc]

Perl has a very nice feature called "hashes". Hashes are essentially
arrays that are indexed by arbitrary strings instead of consecutive
integers (a number used as a key is simply converted to a string).


#!/usr/bin/perl -w

$filename = "mydatafile.txt";
open(INPUT,"<$filename") or die("open failed : $!\n");

%mydata = ();
while ( $buffer = <INPUT> ) {
    chomp $buffer;
    $buffer =~ s/^\s+//; # delete leading whitespace
    @fields = split(/\s+/,$buffer);
    if ( exists $mydata{$fields[0]} ) {
        $mydata{$fields[0]} .= ", " . $fields[1];
    }
    else {
        $mydata{$fields[0]} = $fields[1];
    }
}
close INPUT;

foreach $keyval ( keys %mydata ) {
    print "$keyval $mydata{$keyval}\n";
}

--
 
John Bokma

Barry said:

#!/usr/bin/perl -w

$filename = "mydatafile.txt";
open(INPUT,"<$filename") or die("open failed : $!\n");

%mydata = ();
while ( $buffer = <INPUT> ) {

IIRC defined($buffer = <INPUT>)
chomp $buffer;
$buffer =~ s/^\s+//; # delete leading whitespace
@fields = split(/\s+/,$buffer);


How about push(@{$mydata{$fields[0]}}, $fields[1]);

I would also recommend using sensible names for the fields:

my($number, $bcode);

push(@{$mydata{$number}}, $bcode)
    if ($number, $bcode) = $buffer =~ /^\s*(\d+)\s+(\S+)/;

The \S+ could probably be made more specific.
if ( exists $mydata{$fields[0]} ) {
$mydata{$fields[0]} .= ", " . $fields[1];
}
else {
$mydata{$fields[0]} = $fields[1];
}
}
close INPUT;

or die....
foreach $keyval ( keys %mydata ) {
print "$keyval $mydata{$keyval}\n";

print "$keyval ", join(", ", @{$mydata{$keyval}}), "\n";

This separates the data from the presentation layer.
 
Tad McClellan

John Bokma said:
What happens if $buffer reads 0 ?


The same thing in either case.

perl will add the defined() test for you if you leave it out:


perl -MO=Deparse -e 'while ( $buffer = <INPUT> ) {1}'
while (defined($buffer = <INPUT>)) {
'???';
}

vs:

perl -MO=Deparse -e 'while ( defined($buffer = <INPUT>) ) {1}'
while (defined($buffer = <INPUT>)) {
'???';
}

Has this been changed?


Yes. It was changed when the warning for it went away.

I don't remember what version that was at though.
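
A small sketch that shows the effect, under some assumptions: it reads from an in-memory filehandle, which needs Perl 5.8 or later (newer than the 5.005 mentioned in the question), and count_lines is a hypothetical helper name. The last line of the sample text is a bare "0" with no newline, which is false as a string but is still read because of the implicit defined() test.

```perl
#!/usr/bin/perl -w
use strict;

# count_lines() is a hypothetical helper name. It counts the lines in a
# string by reading it through an in-memory filehandle (Perl 5.8+).
sub count_lines {
    my ($text) = @_;
    open( my $fh, '<', \$text ) or die "open failed: $!";
    my $count = 0;
    # No explicit defined() here; perl supplies it for this loop form,
    # so a final line of "0" does not end the loop early.
    while ( my $line = <$fh> ) {
        $count++;
    }
    close $fh;
    return $count;
}

print count_lines("730 B13\n0"), "\n";    # prints 2
```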
 
John Bokma

Sam said:
DWIM kicks in and it works (perl adds the defined test).
Thanks.


Yes, A long time ago.

Ah, I remember those days: some of my scripts broke because the
defined test was missing ('98 or '97).
 
John Bokma

Tad said:
Yes. It was changed when the warning for it went away.

Thanks. I remember a server writing an error log that was huge because
of the warning that suddenly popped up in a perl 5.x build. Since then I
always add the defined test.
 
Greg Carlill

Hi All,

Just a quick note of thanks. I've used one of the methods suggested
here and it's working fine.

Thanks again

Greg
 
