Text File Processing

Greg Carlill · Sep 3, 2003

Hi All,

I have a text file that contains column data like:

730 B13
730 B33
730 B53
730 B73
800 B10-1
800 B30-1
800 B50-1
800 B70-1

and want to get an output text file like this:

730 B13, B33, B53, B73
800 B10-1, B30-1, B50-1, B70-1

Perl 5.005_02 on NT is all I have to do this. What is the best way to
attack this? I have almost no knowledge of Perl but am willing to
learn what I need.

Thanks in advance
Greg

Eric J. Roode · Sep 3, 2003

First of all, a couple of questions: Are the values in the second column
going to be unique for each value in the first column? Do you need to
output the results in the same order in which the values first appeared
in the first column?

Assuming the answers are both "yes", I'd attack the problem as follows:

1. Loop over each line in the file:
   a. Split the line into two parts, $col1 and $col2.
   b. Push $col1 onto @col1values if $col2values{$col1} doesn't exist.
   c. Push $col2 onto @{ $col2values{$col1} }.

2. Loop over each value in @col1values:
   a. Set $aref = $col2values{$_}.
   b. Print $_ and @$aref (suitably formatted).

In other words, keep the column 1 values in an array (to preserve order),
and keep the column 2 values in a hash-of-arrays, keyed on the column 1
value.
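
The steps above could be sketched roughly as follows. This is only an illustration, not code posted in the thread: the sub name group_lines and the @sample data are made up, and it assumes each line holds two whitespace-separated columns.

```perl
#!/usr/bin/perl -w
use strict;

# Group second-column values under their first-column value, keeping the
# order in which each first-column value was first seen.
# (group_lines is a hypothetical name chosen for this sketch.)
sub group_lines {
    my @lines = @_;
    my @col1values;    # first-column values, in arrival order
    my %col2values;    # first-column value => array ref of second-column values

    foreach my $line (@lines) {
        my ($col1, $col2) = split ' ', $line;
        next unless defined $col2;    # skip blank or one-column lines
        push @col1values, $col1 unless exists $col2values{$col1};
        push @{ $col2values{$col1} }, $col2;
    }

    my @out;
    foreach my $col1 (@col1values) {
        push @out, "$col1 " . join(', ', @{ $col2values{$col1} });
    }
    return @out;
}

# Made-up sample input in the shape shown in the question.
my @sample = ("730 B13\n", "730 B33\n", "800 B10-1\n", "800 B30-1\n");
print "$_\n" for group_lines(@sample);
```

To run it against a real file, replacing @sample with <> in the last line and giving the file name on the command line should be enough.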

If the output order isn't significant, you can dispense with the
@col1values array and just use the keys() of %col2values.

If the second column values are not necessarily unique, but you want to
remove dups for the output, you'll have to modify %col2values to be a
hash-of-hashes, and output keys(%{ $col2values{$_} }) instead of @$aref in
step 2b. If you additionally need to preserve input order, you'll have
to keep a parallel array of col2 values, like you did in @col1values.
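
A minimal sketch of that duplicate-removing variant, under the same two-column assumption. The names group_unique, %seen and %col2order are invented for illustration: the hash-of-hashes records which pairs have been seen, and the parallel arrays keep the input order.

```perl
#!/usr/bin/perl -w
use strict;

# Like the plain grouping, but each second-column value is kept only once
# per first-column value, in first-seen order.
# (group_unique is a hypothetical name chosen for this sketch.)
sub group_unique {
    my @lines = @_;
    my @col1order;    # first-column values, in arrival order
    my %col2order;    # first-column value => array ref of unique col2 values
    my %seen;         # hash-of-hashes: $seen{$col1}{$col2} counts occurrences

    foreach my $line (@lines) {
        my ($col1, $col2) = split ' ', $line;
        next unless defined $col2;             # skip blank or short lines
        push @col1order, $col1 unless exists $seen{$col1};
        next if $seen{$col1}{$col2}++;         # pair already recorded
        push @{ $col2order{$col1} }, $col2;
    }

    my @out;
    foreach my $col1 (@col1order) {
        push @out, "$col1 " . join(', ', @{ $col2order{$col1} });
    }
    return @out;
}

# Made-up sample with one repeated pair.
my @sample = ("730 B13\n", "730 B33\n", "730 B13\n", "800 B10-1\n");
print "$_\n" for group_unique(@sample);
```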

Make sense? If not, feel free to ask me to elaborate.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print


John W. Krahn · Sep 4, 2003

Here is one way to do it:

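#!/usr/bin/perl -w
use strict;

# Assumes lines that share a first column are adjacent in the input.
my %data;    # holds only the group currently being built
while ( <> ) {
    my ( $key, $val ) = split or next;    # skip blank lines
    if ( exists $data{ $key } ) {
        $data{ $key } .= ", $val";
    }
    else {
        # new key: print the finished group, then start the next one
        print join( "\t", %data ), "\n" if %data;
        %data = ( $key, $val );
    }
}
print join( "\t", %data ), "\n" if %data;    # print the last group

__END__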

John

Anno Siegel · Sep 4, 2003

Assuming DATA is a read filehandle to your data:

my %table;
push( @{ $table{ $_->[ 0]}}, $_->[ 1]) for map [ split], <DATA>;

To see the result:

print "$_ @{ $table{ $_}}\n" for keys %table;
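
Note that keys() returns the hash keys in no particular order, and the print above separates the values with spaces rather than commas. One possible way to get the layout asked for in the question is sketched below. The sub name format_table is invented here, and the numeric sort assumes the first column is always a number.

```perl
#!/usr/bin/perl -w
use strict;

# Turn a hash of arrays into "key val1, val2, ..." lines, sorted by key.
# (format_table is a hypothetical name; numeric keys are an assumption.)
sub format_table {
    my ($table) = @_;    # hash ref: first column => array ref of values
    my @out;
    foreach my $key ( sort { $a <=> $b } keys %$table ) {
        push @out, "$key " . join(', ', @{ $table->{$key} });
    }
    return @out;
}

# Build a small table from made-up lines, deliberately out of order.
my %table;
foreach my $line ("800 B10-1\n", "730 B13\n", "730 B33\n") {
    my ($key, $val) = split ' ', $line;
    push @{ $table{$key} }, $val;
}
print "$_\n" for format_table(\%table);
```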

Anno

Barry Kimelman · Sep 4, 2003

Perl has a very nice feature called "hashes". Hashes are essentially
arrays that are indexed by a string key instead of by position; a
numeric key such as 730 is simply stored as the string "730".

#!/usr/bin/perl -w

$filename = "mydatafile.txt";
open(INPUT,"<$filename") or die("open failed : $!\n");

%mydata = ();
while ( $buffer = <INPUT> ) {
    chomp $buffer;
    $buffer =~ s/^\s+//; # delete leading whitespace
    @fields = split(/\s+/,$buffer);
    if ( exists $mydata{$fields[0]} ) {
        # key seen before: append to its comma-separated list
        $mydata{$fields[0]} .= ", " . $fields[1];
    }
    else {
        $mydata{$fields[0]} = $fields[1];
    }
}
close INPUT;

# keys() returns the keys in no particular order
foreach $keyval ( keys %mydata ) {
    print "$keyval $mydata{$keyval}\n";
}

John Bokma · Sep 4, 2003

Barry said:
while ( $buffer = <INPUT> ) {

IIRC defined($buffer = <INPUT>) ..

Barry said:
chomp $buffer;
$buffer =~ s/^\s+//; # delete leading whitespace
@fields = split(/\s+/,$buffer);

How about push(@{$mydata{$fields[0]}}, $fields[1]);

I would also recommend using sensible names for the fields:

my($number, $bcode);

push(@{$mydata{$number}}, $bcode)
    if ($number, $bcode) = $buffer =~ /^\s*(\d+)\s+(\S+)/;

The \S+ could probably be made more specific.

Barry said:
if ( exists $mydata{$fields[0]} ) {
    $mydata{$fields[0]} .= ", " . $fields[1];
}
else {
    $mydata{$fields[0]} = $fields[1];
}
}
close INPUT;

or die....

Barry said:
foreach $keyval ( keys %mydata ) {
    print "$keyval $mydata{$keyval}\n";
}

print "$keyval ", join(", ", @{$mydata{$keyval}}), "\n";

Which separates the data and the presentation layer.

Tad McClellan · Sep 4, 2003

John Bokma said:
IIRC defined($buffer = <INPUT>) ..

You do not recall correctly.

John Bokma · Sep 4, 2003

Tad said:
You do not recall correctly.

What happens if $buffer reads 0 ?

Has this been changed?

Tad McClellan · Sep 4, 2003

John Bokma said:
What happens if $buffer reads 0 ?

The same thing in either case.

perl will add the defined() test for you if you leave it out:

perl -MO=Deparse -e 'while ( $buffer = <INPUT> ) {1}'
while (defined($buffer = <INPUT>)) {
    '???';
}

vs:

perl -MO=Deparse -e 'while ( defined($buffer = <INPUT>) ) {1}'
while (defined($buffer = <INPUT>)) {
    '???';
}

John Bokma said:
Has this been changed?

Yes. It was changed when the warning for it went away.

I don't remember which version that was, though.
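
The effect can also be checked directly. The sketch below is not from the thread: it writes a scratch file (the file name is made up) whose last line is a bare "0" with no trailing newline, then counts how many lines a plain while loop reads. On a perl that adds the defined() test, the loop should see both lines; an older perl without the implicit test would presumably stop before the "0".

```perl
#!/usr/bin/perl -w
use strict;

# Count the lines seen by a "while ( $line = <IN> )" loop that has no
# explicit defined() test. (count_lines_read is a hypothetical name.)
sub count_lines_read {
    my ($path) = @_;
    local *IN;
    open(IN, "<$path") or die "open failed: $!\n";
    my $line;
    my $count = 0;
    while ( $line = <IN> ) {    # no explicit defined() here
        $count++;
    }
    close(IN) or die "close failed: $!\n";
    return $count;
}

# Scratch file for the demo; the name is an arbitrary choice.
my $scratch = "defined_demo_$$.tmp";
open(OUT, ">$scratch") or die "open failed: $!\n";
print OUT "730\n0";    # the last line is "0" with no newline
close(OUT) or die "close failed: $!\n";

print count_lines_read($scratch), "\n";
unlink $scratch;
```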

Sam Holden · Sep 4, 2003

John Bokma said:
What happens if $buffer reads 0 ?

DWIM kicks in and it works (perl adds the defined test).

John Bokma said:
Has this been changed?

Yes, a long time ago.

John Bokma · Sep 4, 2003

Sam said:
DWIM kicks in and it works (perl adds the defined test).

Thanks.

Sam said:
Yes, a long time ago.

Ah, I remember those days: some of my scripts broke because the
defined test was missing ('98 or '97).

John Bokma · Sep 4, 2003

Tad said:
Yes. It was changed when the warning for it went away.

Thanks. I remember a server writing an error log that was huge because
of the warning that suddenly popped up in a perl 5.x build. Since then I
always add the defined test.

Greg Carlill · Sep 9, 2003

Hi All,

Just a quick note of thanks. I've used one of the methods suggested
here and it's working fine.

Thanks again

Greg
