delimited data into nested array

Yup

Hello,
I'm just starting to learn Perl. Up until now I'd been struggling to
learn how to import a tab-delimited data table (from a text file) into
Perl as a two dimensional nested array. I wanted to be able to
manipulate that data by accessing it using x-y coordinates, and then
output it again.

I may have been looking in the wrong place, but I had a hard time
finding help on Google groups and other places online. However, I've
managed to figure it out, and thought I'd post my code.

For two reasons, I suppose - to help other novices, and perhaps get
some comments from more advanced users. On the latter part, I'm
interested in knowing what I could do to make my code run faster and
make it more Perl-ish.

Thanks for your help ahead of time.

#!/usr/local/bin/perl -w

#define filename and open it
$file = 'a1.txt';
open(INFO, $file) || die "Can't open file $file\n" ;

# read file into temporary array called "lines"
while(<INFO>)
{
#chop off the carriage return
chop $_;
push @lines, $_;
}

# Close the file
close(INFO);

#reset index for generating arrays named "line_{$i}"
$i = 0;

#read each array entry into new array, split with tab
foreach (@lines)
{
push @{'line_'.${i}}, split("\t", $lines[$i]);
$i++
}

#generate an array to hold the other arrays
for ($i=0; $i<scalar(@lines); $i++)
{
push @A, *{'line_'.${i}};
}

#rename the top corner to be START
$A[0][0] = "START";

#open file to send data to
open(OUTFILE, ">a1_edited.txt");

for ($i=0; $i<scalar(@lines);$i++)
{

for ($j=0;$j<scalar(@line_0);$j++)
{
print OUTFILE "$A[$i][$j]";
if ($j<(scalar(@line_0)-1)) { print OUTFILE "\t";}
}

print OUTFILE "\n";
}
close(OUTFILE);
 
Joe Smith

Yup said:
I'm just starting to learn Perl. Up until now I'd been struggling to
learn how to import a tab-delimited data table (from a text file) into
Perl as a two dimensional nested array.
# read file into temporary array called "lines"
while(<INFO>)
{
#chop off the carriage return
chop $_;
push @lines, $_;
}

That can be simplified to just two statements:

@lines = <INFO>;
chomp @lines; # (Don't use chop)

#read each array entry into new array, split with tab
foreach (@lines)
{
push @{'line_'.${i}}, split("\t", $lines[$i]);
$i++
}

Ugh! You're indirectly using symbolic references. Don't do that.
Use [] to create an anonymous array and push that instead.
-Joe

my @lines = <DATA>;
chomp @lines;
my @A; # Two-dimensional array
push @A,[split "\t",$_] foreach @lines;
$A[0][0] = "START";
$A[2][3] = "Fourth item in third row";
for my $row (0 .. $#A) {
print "Row $row: ";
for my $col (0 .. $#{$A[$row]}) {
print "\tC$col=$A[$row][$col]";
}
print "\n";
}
__DATA__
zero	one	two	three	four	five	six
ten	eleven	twelve	thirteen
twenty	twenty-one	twenty-two
thirty	thirty-one
 
Gunnar Hjalmarsson

Yup said:
I'm just starting to learn Perl. Up until now I'd been struggling
to learn how to import a tab-delimited data table (from a text
file) into Perl as a two dimensional nested array. I wanted to be
able to manipulate that data by accessing it using x-y coordinates,
and then output it again.

I may have been looking in the wrong place, but I had a hard time
finding help on Google groups and other places online. However,
I've managed to figure it out, and thought I'd post my code.

For two reasons, I suppose - to help other novices,

Hmm.. To be honest, there is not much for novices to learn from that
code (even if it gets the job done). Actually, it does not even work
the way you say it does: There is no two dimensional array, but rather
a hierarchy of named arrays. There is the array @A that contains
references to a bunch of other named arrays. On top of that, you have
not declared your variables and enabled strictures.
and perhaps get some comments from more advanced users. On the
latter part, I'm interested in knowing what I could do to make my
code run faster and make it more Perl-ish.

Even if I don't consider myself an "advanced" Perl programmer, I made
a suggestion below.

But first a couple of detailed comments:
#!/usr/local/bin/perl -w

No strictures!
#define filename and open it

Pointless comment. (It's apparent from the code.)
$file = 'a1.txt';
open(INFO, $file) || die "Can't open file $file\n" ;

# read file into temporary array called "lines"
while(<INFO>)
{
#chop off the carriage return
chop $_;

In Perl 5 you use chomp() for that.
push @lines, $_;
}

# Close the file

Yes, I can see that. ;-)
close(INFO);

#reset index for generating arrays named "line_{$i}"
$i = 0;

#read each array entry into new array, split with tab
foreach (@lines)
{
push @{'line_'.${i}}, split("\t", $lines[$i]);

Here you are using symbolic references, which is normally not
advisable and not allowed under strictures. In this case, there is no
reason to do so.
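To see what "not allowed under strictures" means in practice, here is a small sketch. It tries a symbolic array reference under `use strict` and traps the run-time error with `eval`. The name `line_0` just mirrors the OP's naming scheme.

```perl
use strict;
use warnings;

my $name = 'line_0';
my $ok = eval {
    # Under "strict refs" a string cannot be dereferenced as an array,
    # so this push dies at run time instead of creating @line_0.
    push @{$name}, 'zero', 'one';
    1;
};
print "symbolic ref rejected: $@" unless $ok;
```

Without `use strict` the same `push` would silently create a package array named `@line_0`.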
$i++
}

#generate an array to hold the other arrays
for ($i=0; $i<scalar(@lines); $i++)
{
push @A, *{'line_'.${i}};

Symbolic references again.
(Is it possibly Perl 4 style to use globrefs like that?)
}

#rename the top corner to be START
$A[0][0] = "START";

#open file to send data to
open(OUTFILE, ">a1_edited.txt");

for ($i=0; $i<scalar(@lines);$i++)
{

for ($j=0;$j<scalar(@line_0);$j++)
{
print OUTFILE "$A[$i][$j]";
if ($j<(scalar(@line_0)-1)) { print OUTFILE "\t";}
}

print OUTFILE "\n";
}
close(OUTFILE);


How about this instead:

#!/usr/local/bin/perl
use strict;
use warnings;

my @A;
my $file = 'a1.txt';
my $outfile = 'a1_edited.txt';

open IN, $file or die "Can't open $file: $!";
while (<IN>) {
chomp;
push @A, [ split /\t/ ];
}
close IN;

$A[0][0] = 'START';

open OUT, "> $outfile" or die "Can't open $outfile: $!";
for (@A) {
print OUT join("\t", @$_), "\n";
}
close OUT;

__END__
 
Gunnar Hjalmarsson

Joe said:
That can be simplified to just two statements:
@lines = <INFO>;
chomp @lines; # (Don't use chop)

Or to one:

chomp( my @lines = <INFO> );

But there is no need for any temporary array at all, is there?

while (<INFO>) {
chomp;
push @A, [ split /\t/ ];
}
 
Ben Morrow

Quoth (e-mail address removed) (Yup):
Hello,
I'm just starting to learn Perl. Up until now I'd been struggling to
learn how to import a tab-delimited data table (from a text file) into
Perl as a two dimensional nested array. I wanted to be able to
manipulate that data by accessing it using x-y coordinates, and then
output it again.

I may have been looking in the wrong place, but I had a hard time
finding help on Google groups and other places online. However, I've
managed to figure it out, and thought I'd post my code.

For two reasons, I suppose - to help other novices, and perhaps get
some comments from more advanced users. On the latter part, I'm
interested in knowing what I could do to make my code run faster and
make it more Perl-ish.

Thanks for your help ahead of time.

#!/usr/local/bin/perl -w

use strict;
use warnings;

warnings is a more modern replacement for -w: see perldoc warnings.
strict helps catch common coding errors.
#define filename and open it
$file = 'a1.txt';
open(INFO, $file) || die "Can't open file $file\n" ;

open my $INFO, '<', $file or die "Can't open $file: $!";

The lexical FH ('my $INFO' vs simply 'INFO') means it will close when it
goes out of scope. The '<' protects against nasty filenames. Using 'or'
instead of '||' removes the need for brackets.
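Here is a minimal sketch that puts those three points together. It assumes a hypothetical helper called `read_table` that returns the rows of a tab-delimited file as an array of arrays. The handle `$in` is lexical, so it is closed automatically when the sub returns.

```perl
use strict;
use warnings;

# Hypothetical helper: read a tab-delimited file into an array of arrays.
sub read_table {
    my ($file) = @_;
    # Three-argument open keeps the '<' mode separate from the filename,
    # so a filename such as '>oops' is not mistaken for a mode.
    # 'or' binds more loosely than '||', so no parentheses are needed.
    open my $in, '<', $file or die "Can't open $file: $!";
    my @rows;
    while (my $line = <$in>) {
        chomp $line;
        push @rows, [ split /\t/, $line ];
    }
    return @rows;    # $in goes out of scope here and the file is closed
}
```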
# read file into temporary array called "lines"

my @lines;

as you are now using strictures.
while(<INFO>)
{
#chop off the carriage return
chop $_;

Use chomp instead of chop to remove newlines: you might be using "\r\n"
instead of just "\n".
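A small illustration of the difference, using made-up lines where the last one has no trailing newline (which often happens at the end of a file). `chop` removes whatever the final character is. `chomp` removes only a trailing `$/`, which is `"\n"` by default.

```perl
use strict;
use warnings;

my @lines = ("ten\n", "twenty");   # the last line lacks "\n"

my @chopped = @lines;
chop @chopped;     # removes the last character of every element, newline or not

my @chomped = @lines;
chomp @chomped;    # removes a trailing "\n" (the value of $/) only if present

print "chop:  $chopped[1]\n";   # prints "chop:  twent"
print "chomp: $chomped[1]\n";   # prints "chomp: twenty"
```

With the default `$/`, `chomp` strips only the `"\n"`. A stray `"\r"` from a file with DOS line endings, read without a `:crlf` layer, would still be left at the end of the line.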
push @lines, $_;
}

# Close the file
close(INFO);

This whole section can be compressed into

my @lines = do {
open my $INFO, '<', $file or die "...";
chomp <$INFO>;
};

#reset index for generating arrays named "line_{$i}"
$i = 0;

my $i;

There is no need to initialize to 0.
#read each array entry into new array, split with tab
foreach (@lines)
{

More usual style would be

for (@lines) {
push @{'line_'.${i}}, split("\t", $lines[$i]);

Ohmygoodnessme. These are 'symrefs', and are a *very* bad idea. They are
such a bad idea that 'use strict' will prevent you from using them. What
you want here is an array; and there's no need to keep count of your
array indices as $lines[$i] is already put in $_ by the 'for':

my @A;

# A better name than @A is probably appropriate...

for (@lines) {
push @A, [ split "\t" ];
}

or, more Perlishly, you could use map:

my @A = map { [ split /\t/ ] } @lines;

The [...] construct is an array ref constructor: see perldoc perlreftut.
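Here is a self-contained sketch of the `map` approach, followed by lookups by row and column. The sample lines are made up.

```perl
use strict;
use warnings;

my @lines = ("zero\tone\ttwo", "ten\televen\ttwelve");

# Each [ ... ] builds a new anonymous array; map returns one reference per line.
my @A = map { [ split /\t/ ] } @lines;

print $A[1][2], "\n";            # prints "twelve" (row 1, column 2)
print scalar @{ $A[0] }, "\n";   # prints "3" (number of columns in row 0)
```

Note that `map BLOCK LIST` takes no comma between the block and the list.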
$i++
}

#generate an array to hold the other arrays
for ($i=0; $i<scalar(@lines); $i++)

Don't use C-style loops. It's not good Perl style.

There's no need for that explicit 'scalar': $i < @lines will work
perfectly well.
{
push @A, *{'line_'.${i}};

I'm slightly amazed this even works... I would have expected you to
need to say *{...}{ARRAY}... anyway, it's YUCK. You shouldn't be
messing with globs (the '*' things) unless you *really* know what you're
doing.

You've already done this, now, anyway (one advantage of doing things
right in the first place :)...
}

#rename the top corner to be START
$A[0][0] = "START";

#open file to send data to
open(OUTFILE, ">a1_edited.txt");

Output can fail too:

open my $OUTFILE, '>', 'a1_edited.txt'
or die "can't create a1_edited.txt: $!";
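Here is a sketch of what that check buys you. Writing into a directory that does not exist makes `open` fail, and `$!` says why. The path below is deliberately bogus and purely hypothetical.

```perl
use strict;
use warnings;

my $out = '/no/such/dir/a1_edited.txt';   # hypothetical path that should not exist
my $ok = eval {
    open my $OUTFILE, '>', $out
        or die "can't create $out: $!\n";
    1;
};
print "open failed as expected: $@" unless $ok;
```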
for ($i=0; $i<scalar(@lines);$i++)
{

for ($j=0;$j<scalar(@line_0);$j++)
{
print OUTFILE "$A[$i][$j]";

Don't quote things when you don't need to.
if ($j<(scalar(@line_0)-1)) { print OUTFILE "\t";}
}

print OUTFILE "\n";
}

for (@A) {
print $OUTFILE join( "\t" => @$_ ), "\n";
}

or even

print $OUTFILE join( "\n", map { join "\t" => @$_ } @A ), "\n";

or, using Perl's special output variables (this is how I'd do it):

{
open my $OUTFILE, '>', '...' or die "...";
local ($,, $\) = ("\t", "\n");
print $OUTFILE @$_ for @A;
}

The 'local's keep the changes to $, and $\ to within the braces.
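Here is a sketch of that last approach. It prints into an in-memory string rather than a file so the result is easy to inspect, and the data in `@A` is made up. `$,` is printed between the arguments of each `print`, and `$\` after each one.

```perl
use strict;
use warnings;

my @A = ([qw(START one two)], [qw(ten eleven twelve)]);

my $buffer = '';
{
    open my $OUTFILE, '>', \$buffer or die "Can't open in-memory file: $!";
    local ($,, $\) = ("\t", "\n");   # field separator, record terminator
    print $OUTFILE @$_ for @A;
}   # $OUTFILE is closed and $, and $\ are restored here

print $buffer;
```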
close(OUTFILE);

Again, I would use a scope (set of braces) to close the file unless I
wanted to check for an error on close, which might be a good idea...
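If you do want to check `close`, one option is to close explicitly and test the return value. Here is a minimal sketch using a hypothetical helper named `write_table`. Output is buffered, so some write errors (a full disk, for example) may only be reported when the buffer is flushed at close time.

```perl
use strict;
use warnings;

# Hypothetical helper: write an array of arrays as tab-delimited lines.
sub write_table {
    my ($file, @rows) = @_;
    open my $out, '>', $file or die "Can't create $file: $!";
    print $out join("\t", @$_), "\n" for @rows;
    # Buffered output is flushed on close, so a failed write may only show up here.
    close $out or die "Can't close $file: $!";
    return scalar @rows;
}
```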

Ben
 
Gunnar Hjalmarsson

krakle said:
How about this instead:
#!/usr/local/bin/perl -Tw

Why would you use the -w switch when warnings are enabled through the
warnings pragma? As regards -T, it's not a CGI script, and the OP did
not indicate anything else that would make -T warranted. For what
reason did you suggest it?
 
Yup

while (<INFO>) {
chomp;
push @A, [ split /\t/ ];

Many thanks everyone for your help. The above code was actually what I
had been searching for, but I kept being told that Perl didn't
directly support multi-dimensional arrays. If this is the case, what
is this called, then?

But now that I can access @A in the same way as a C++ vector, it suits
me well.
 
LaDainian Tomlinson

Hello,
I'm just starting to learn Perl. Up until now I'd been struggling to
learn how to import a tab-delimited data table (from a text file) into
Perl as a two dimensional nested array. I wanted to be able to
manipulate that data by accessing it using x-y coordinates, and then
output it again.

I may have been looking in the wrong place, but I had a hard time
finding help on Google groups and other places online. However, I've
managed to figure it out, and thought I'd post my code.

For two reasons, I suppose - to help other novices, and perhaps get
some comments from more advanced users. On the latter part, I'm
interested in knowing what I could do to make my code run faster and
make it more Perl-ish.

There is lots to correct, I'm afraid. No offense, but I hope that the
other novices will continue reading the thread rather than stopping at
this solution (or mine, for that matter).
#!/usr/local/bin/perl -w

Good, but not good enough. Get rid of the -w and add:

use warnings;
use strict;

Then you'll have to put a 'my' in front of all the variables you
declare. It'll break some other things too, but I promise it's for the
best.
#define filename and open it
$file = 'a1.txt';
open(INFO, $file) || die "Can't open file $file\n" ;

Let perl tell you why it died by including $! in the output:

open my $INFO, '<', $file or die "Couldn't open $file: $!";
# read file into temporary array called "lines"
while(<INFO>)
{
#chop off the carriage return
chop $_;
push @lines, $_;
}

Bleh. Do you have a good reason for reading the file into an array
rather than performing the data manipulation and immediately writing to
the file? If you're reading really huge files into arrays, you're going
to run into memory problems. Use the following construct if you can:

my $outfile = 'some_other.txt';
open my $OUTFILE, '>', $outfile or die "Couldn't open $outfile: $!";

# output line terminator
$\ = "\n";

while ( <$INFO> ){
chomp; # chomp is safer than chop

my @fields = split /\t/;
my @new_fields = do_stuff_with( @fields );

print $OUTFILE join "\t" => @new_fields;
}
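Here is a self-contained version of that streaming loop. It assumes a made-up `do_stuff_with` that upper-cases every field, and in-memory filehandles stand in for the real input and output files. One small change from the snippet above: `$\` is set with `local` inside a block, so it does not leak into later `print`s.

```perl
use strict;
use warnings;

# Hypothetical per-row transformation; replace with your own logic.
sub do_stuff_with { return map { uc $_ } @_ }

my $input  = "a\tb\nc\td\n";
my $output = '';
open my $INFO,    '<', \$input  or die "Can't read input: $!";
open my $OUTFILE, '>', \$output or die "Can't write output: $!";

{
    local $\ = "\n";             # output line terminator, restored after the block
    while (<$INFO>) {
        chomp;
        my @fields     = split /\t/;
        my @new_fields = do_stuff_with(@fields);
        print $OUTFILE join "\t" => @new_fields;
    }
}
close $OUTFILE;
print $output;
```

Only one line is held in memory at a time, which is the point of this pattern for large files.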

If you'd still like to use the arrays, read up on perllol and use this:

my @AoA; # an array of arrays
while ( <$INFO> ){
chomp;
my @fields = split /\t/;

# push references to arrays, not arrays themselves
push @AoA => \@fields;
}

my @new_AoA = do_stuff_with( @AoA );

# open, etc.

print $OUTFILE join "\t" => @$_ for @new_AoA;
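Pushing `\@fields` works only because `my @fields` creates a new array on every pass through the loop. Here is a small sketch with made-up data that contrasts this with an array declared once outside the loop, where every pushed reference points at the same array.

```perl
use strict;
use warnings;

my @input = ("a\tb", "c\td");

my @good;
for (@input) {
    my @fields = split /\t/;   # a fresh array each iteration
    push @good, \@fields;
}

my @bad;
my @shared;
for (@input) {
    @shared = split /\t/;      # the same array is overwritten each time
    push @bad, \@shared;
}

print "good: $good[0][0] $good[1][0]\n";   # prints "good: a c"
print "bad:  $bad[0][0] $bad[1][0]\n";     # prints "bad:  c c"
```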

That will basically do everything you wanted, but I'll put some comments
below as well.
# Close the file
close(INFO);

If you use lexical filehandles like above, you don't need to close them
explicitly. They are closed when they go out of scope (in this case,
when you exit the program; in others, when you exit a subroutine or
other block).
#reset index for generating arrays named "line_{$i}"
$i = 0;

Please do not use variables in this way. You're already familiar with
arrays, and you've probably seen hashes (associative arrays) as well.
Use them whenever you're tempted to try this. The 'use strict' above
will prevent this mistake. See:

perldoc -q "How can I use a variable as a variable name?"
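If you really want rows looked up by a name such as `line_0`, the FAQ's advice is to use a hash as the namespace. Here is a minimal sketch, assuming the OP's `line_$i` naming scheme and made-up data.

```perl
use strict;
use warnings;

my @lines = ("zero\tone", "ten\televen");

# A hash of arrays: the keys play the role of the would-be variable names.
my %row;
for my $i (0 .. $#lines) {
    $row{"line_$i"} = [ split /\t/, $lines[$i] ];
}

print $row{line_1}[0], "\n";   # prints "ten"
```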
#read each array entry into new array, split with tab
foreach (@lines)
{
push @{'line_'.${i}}, split("\t", $lines[$i]);
$i++
}

#generate an array to hold the other arrays
for ($i=0; $i<scalar(@lines); $i++)
{
push @A, *{'line_'.${i}};
}

You won't very often need for loops like this in Perl. In this case,
you could use the range operator if you wanted. It's usually easier
just to loop over the list itself:

# range operator
for ( 0..$#lines ){ ... }

# aliasing
for my $line ( @lines ){ ... }
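Here is a small sketch of the two loop forms, using made-up data. In the second form `$line` is an alias for each element, so modifying it changes `@lines` itself.

```perl
use strict;
use warnings;

my @lines = ("alpha\n", "beta\n");

# Range operator: loop over the indices 0 .. $#lines.
for my $i (0 .. $#lines) {
    print "line $i has ", length $lines[$i], " characters\n";
}

# Aliasing: $line *is* each element, so chomp modifies @lines in place.
for my $line (@lines) {
    chomp $line;
}
print join('|', @lines), "\n";   # prints "alpha|beta"
```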
#rename the top corner to be START
$A[0][0] = "START";

#open file to send data to
open(OUTFILE, ">a1_edited.txt");

Always check the return value of open. Always. Yes, always.

open( ... ) or die "Oh well: $!";
for ($i=0; $i<scalar(@lines);$i++)
{

for ($j=0;$j<scalar(@line_0);$j++)
{
print OUTFILE "$A[$i][$j]";
if ($j<(scalar(@line_0)-1)) { print OUTFILE "\t";}
}

print OUTFILE "\n";
}
close(OUTFILE);

This whole loop was condensed above by using join(), $\, and that handy
array of arrays (array references, really).

Hopefully this will be useful to you. Perl is pretty complicated and
features a lot of little tricks to make your code cleaner. It takes a
long time to learn them all (I'm approximately 3% of the way there).

Good luck,

Brandan L.
 
Paul Lalli

while (<INFO>) {
chomp;
push @A, [ split /\t/ ];

Many thanks everyone for your help. The above code was actually what I
had been searching for, but I kept being told that Perl didn't
directly support multi-dimensional arrays. If this is the case, what
is this called, then?

It's an array of array references. Arrays in Perl can only contain
scalars. A reference is a scalar. To mimic the functionality of
multi-dimensional arrays (or hashes for that matter), you populate an
array with references to other arrays. In this case, the [ ] creates an
anonymous array reference (i.e., a reference to an array that doesn't have a
'name', like @words). That anonymous array reference is then pushed into
the array @A.
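Here is a short sketch of what that means in code, with made-up sample data. Each element of `@A` is a scalar holding an array reference, and `$A[1][1]` is shorthand for `$A[1]->[1]`.

```perl
use strict;
use warnings;

my @A;
push @A, [ split /\t/, "zero\tone\ttwo" ];
push @A, [ split /\t/, "ten\televen" ];

print ref($A[0]), "\n";         # prints "ARRAY": the element is a reference
print $A[1]->[1], "\n";         # prints "eleven"
print $A[1][1], "\n";           # same element; the arrow is optional between subscripts
print scalar @{ $A[0] }, "\n";  # prints "3": rows can have different lengths
```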
But now that I can access @A in the same way as a C++ vector, it suits
me well.

Be careful. While you can pretend that a Perl array only has the
functionality of a vector, they're not the same data structures. A vector
only supports efficient insertions and removals at the 'back'. A Perl array can be
modified at the back (using push and pop), the front (using shift and
unshift), or arbitrarily in the middle (using splice and array index []
notation).
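Here is a quick sketch of those operations on a made-up row. Each comment shows the array's contents after that line runs.

```perl
use strict;
use warnings;

my @row = qw(b c d);
push    @row, 'e';            # back:   b c d e
pop     @row;                 #         b c d
unshift @row, 'a';            # front:  a b c d
shift   @row;                 #         b c d
splice  @row, 1, 1, 'X', 'Y'; # middle: replace 'c' with X, Y -> b X Y d
$row[0] = 'B';                # index:  B X Y d
print "@row\n";               # prints "B X Y d"
```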

Paul Lalli
 
Ben Morrow

Quoth (e-mail address removed) (Yup):
while (<INFO>) {
chomp;
push @A, [ split /\t/ ];

Many thanks everyone for your help. The above code was actually what I
had been searching for, but I kept being told that Perl didn't
directly support multi-dimensional arrays.

The key word there is 'directly'...
If this is the case, what
is this called, then?

An array of references to arrays; almost always simply referred to as an
array of arrays.
But now that I can access @A in the same way as a C++ vector, it suits
me well.

Ben
 
Tassilo v. Parseval

Also sprach Ben Morrow:
Quoth (e-mail address removed) (Yup):

Use chomp instead of chop to remove newlines: you might be using "\r\n"
instead of just "\n".


This whole section can be compressed into

my @lines = do {
open my $INFO, '<', $file or die "...";
chomp <$INFO>;
};

This won't work. chomp() cannot work on the list returned by <$INFO>; it
needs a proper variable (an lvalue) as its argument.
<$INFO> in list context will return all the lines of the file.

And even if chomp() could be used like that, it'd still do the wrong
thing. The 'do' block would return the return value of chomp() and not
the chomped lines.

Tassilo
 
Anno Siegel

Ben Morrow said:
my @lines = do {
open my $INFO, '<', $file or die "...";
chomp <$INFO>;
};

Chomp() will tell you it can't "modify <HANDLE>...", and if it could,
it would return the number of characters chomped, not the chomped list.
One could wrap chomp() around the assignment, but I'd probably use an
extra statement.
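Here is a version of the `do` block that does work. It uses the extra statement: read into a real array, chomp that array, then leave it as the block's last value. The wrapper sub `read_lines` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected do block.
sub read_lines {
    my ($file) = @_;
    my @lines = do {
        open my $INFO, '<', $file or die "Can't open $file: $!";
        my @l = <$INFO>;
        chomp @l;      # chomp needs a modifiable variable, not <$INFO>
        @l;            # the last value evaluated becomes the value of the do block
    };
    return @lines;
}
```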

Anno
 
