split problem

gabkin

I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...


<PERL SUB>
sub parseLine()
{
  #this parses a line which will be in a similar format to this
  #"0010230" "Book of the Dead" "Yendor books"
  #(tab delimited, escaped by quotes)
  #it will take as an argument the column headers and the string to parse
  #it will return a hash,using the columnheader as the key
  #and the column data as the element
  my $ParseMe = $_[0];
  my @ColumnHeaders = $_[1..@_];
  my %returnData;
  chop($ParseMe);
  my @Columns = split(/\t/,$ParseMe);
  #my $size=@Columns;print("Size = ",$size,"\n");
  for(my $i=0;$i<@Columns;$i++) {
    $Columns[$i] =~ s/\"//g; # remove extraneous quotes
    #print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
    $returnData{$ColumnHeaders[$i]} = $Columns[$i];
  }
  return %returnData;
}
</PERL SUB>
(Sorry about the awful two-space indentation, but Google seems to strip
out tabs.)

A problem has arisen: in one example, the last four columns are
blank (i.e. empty strings). They're there; there's just nothing in them.
The split function seems to discard these last four. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

Or would I have to 're-invent' the split algorithm so as to keep them?
Any help would be greatly appreciated...
 
Paul Lalli

gabkin said:
I am having a problem with the split function.

Did you consider reading the documentation for the function you're
having problems with?

A problem has arisen: in one example, the last four columns are
blank (i.e. empty strings). They're there; there's just nothing in them.
The split function seems to discard these last four. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

perldoc -f split
4th paragraph.

Paul Lalli
 
thundergnat

gabkin said:
I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...
[snip]
A problem has arisen: in one example, the last four columns are
blank (i.e. empty strings). They're there; there's just nothing in them.
The split function seems to discard these last four. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

Or would I have to 're-invent' the split algorithm so as to keep them?
Any help would be greatly appreciated...

Did you read the docs for split? (Really. Not being sarcastic.)

It seems you are looking for the LIMIT argument to split.

Since you know how many columns you are looking for, specify that.
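
For anyone following along, here is a minimal sketch of how the third argument changes what split returns. The sample line is made up (three filled columns followed by four empty ones), and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample line: three filled columns followed by four empty ones.
my $line = join "\t", '"0010230"', '"Book of the Dead"', '"Yendor books"', ('') x 4;

# No limit: trailing empty fields are dropped.
my @default = split /\t/, $line;

# Positive limit: split into at most that many fields; empty ones are kept.
my @limited = split /\t/, $line, 7;

# Negative limit: no upper bound on the field count; empty ones are kept.
my @all = split /\t/, $line, -1;

printf "default: %d, limit 7: %d, limit -1: %d\n",
    scalar @default, scalar @limited, scalar @all;
# prints "default: 3, limit 7: 7, limit -1: 7"
```

A positive limit also caps the number of fields, so any surplus columns end up glued onto the last field. A negative limit avoids that if the column count is not known in advance.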
 
John W. Krahn

gabkin said:
I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...
^^^^^^^^^^^^^^^^^^^^^
Ok, you asked for it. :)

<PERL SUB>
sub parseLine()
{
#this parses a line which will be in a similar format to this
#"0010230" "Book of the Dead" "Yendor books"
#(tab delimited, escaped by quotes)
#it will take as an argument the column headers and the string to parse
#it will return a hash,using the columnheader as the key
#and the column data as the element
my $ParseMe = $_[0];
my @ColumnHeaders = $_[1..@_];
^^^^^^^^^
That is wrong. The '$' at the beginning denotes a scalar value so you are
assigning a single value from the @_ array to the @ColumnHeaders array. And
even if you had used a proper array slice, you are accessing an extra element
at the end of the array that does not exist.

$ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
6 a b c d e f
6 b c d e f

The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;

my %returnData;
chop($ParseMe);

chop() isn't really used very much anymore. You should use chomp() unless you
have a valid reason not to.

my @Columns = split(/\t/,$ParseMe);

As others have pointed out, use the third argument to split().

my @Columns = split /\t/,$ParseMe, -1;

#my $size=@Columns;print("Size = ",$size,"\n");
for(my $i=0;$i<@Columns;$i++) {

That is usually written as:

for my $i ( 0 .. $#Columns ) {

$Columns[$i] =~ s/\"//g; # remove extraneous quotes

Double quote characters don't have to be escaped in regular expressions.

#print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
$returnData{$ColumnHeaders[$i]} = $Columns[$i];
}
return %returnData;
}
</PERL SUB>



John
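
Putting the points above together, the sub might end up looking roughly like the sketch below. This is not code posted in the thread: the name parse_line, the header names, and the choice to skip columns that have no header are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Parse one tab-delimited, quote-wrapped line into a hash keyed by column header.
# Arguments: the line to parse, then the list of column headers.
sub parse_line {
    my ( $line, @headers ) = @_;
    chomp $line;                          # remove a trailing newline, if any

    my @columns = split /\t/, $line, -1;  # -1 keeps trailing empty fields
    s/"//g for @columns;                  # strip the quotes from each field

    my %row;
    for my $i ( 0 .. $#columns ) {
        # Assumption: columns without a matching header are ignored.
        next unless defined $headers[$i];
        $row{ $headers[$i] } = $columns[$i];
    }
    return %row;
}

# Hypothetical input: three filled columns and two empty ones.
my %row = parse_line(
    qq{"0010230"\t"Book of the Dead"\t"Yendor books"\t\t\n},
    qw(id title publisher notes price),
);
print "$_ => '$row{$_}'\n" for sort keys %row;
```

With the -1 limit, the two empty trailing columns still get their keys in the hash, each with an empty string as its value.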
 
Gabkin

John said:
^^^^^^^^^^^^^^^^^^^^^
Ok, you asked for it. :)

I welcome criticism because I know I am new to Perl and am probably
carrying over mistakes from other languages (Java, VB, COBOL) into my Perl
writing.
my @ColumnHeaders = $_[1..@_];

^^^^^^^^^
That is wrong. The '$' at the beginning denotes a scalar value so you
are assigning a single value from the @_ array to the @ColumnHeaders
array. And even if you had used a proper array slice, you are accessing
an extra element at the end of the array that does not exist.

$ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
6 a b c d e f
6 b c d e f

The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;

Uh, Thanks. I'm still trying to understand all of this but I have
implemented the much cleaner single-line assignment. I have actually
seen this before and thus should know about it. Thanks!

I still have a lot of trouble with all of the 'magic' variables (like
$#) and 'shift', it may be because I have never used C...
chop() isn't really used very much anymore. You should use chomp()
unless you have a valid reason not to.

It's more a case of having used chop from the start; it works, so
I haven't changed it. I will try to use 'chomp' over 'chop', though.
As others have pointed out, use the third argument to split().

Yes, I have found that out.
I now use this instead:
my @Columns = split(/\t/,$ParseMe,@ColumnHeaders);
That is usually written as:

for my $i ( 0 .. $#Columns ) {

I have never seen that before, it looks quite handy.
I am not too familiar with the $# usage yet, so I will go look it up now.
$Columns[$i] =~ s/\"//g; # remove extraneous quotes

Double quote characters don't have to be escaped in regular expressions.

I tend to err on the side of caution with regexes, due to their
inconsistent handling between Perl, sed, grep and vi (and probably others
too).
Thanks though, duly noted.

Thanks for these little tips!

I would love to get my entire program inspected and criticized like
this, but I feel I might be amiss to post the entire thing (1252 lines
in the main program, and 113 lines in the 'data verification' program),
because I know of at least one major algorithm that I did wrong, I used
a hash where I should have used a string.
 
Gabkin

thundergnat said:
Did you read the docs for split? (Really. Not being sarcastic.)

Seems like you are looking for the Limit option on split.

Since you know how many columns you are looking for, specify that.

You are quite right, I did not read the help for split before posting this!

I apologize, since it has answered my question perfectly...
 
John W. Krahn

Gabkin said:
John said:
The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;

Uh, Thanks. I'm still trying to understand all of this but I have
implemented the much cleaner single-line assignment. I have actually
seen this before and thus should know about it. Thanks!

I still have a lot of trouble with all of the 'magic' variables (like
$#) and 'shift', it may be because I have never used C...

Don't confuse the magic variable $# (which is deprecated)

perldoc perlvar

with $#array, the index of the last element in an array

perldoc perldata
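
A small sketch of the difference between an array's element count and its last index may help here. The array @x mirrors the one-liner earlier in the thread; the other variable names are just for illustration.

```perl
use strict;
use warnings;

my @x = ( 'a' .. 'f' );

my $count = scalar @x;   # number of elements in @x
my $last  = $#x;         # index of the last element, i.e. $count - 1

# Slicing from index 1 to $#x takes everything except the first element.
my @rest = @x[ 1 .. $#x ];

print "count=$count last=$last rest=@rest\n";
# prints "count=6 last=5 rest=b c d e f"
```

The same applies to the argument array: $#_ is the last valid index of @_, which is why @_[ 1 .. $#_ ] does not run past the end.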

It's more a case of having used chop from the start; it works, so
I haven't changed it. I will try to use 'chomp' over 'chop', though.

chop() will always remove and return the last character in a string while
chomp() will remove the value of $/ if it is at the end of the string.

perldoc -f chop
perldoc -f chomp
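
To make the difference concrete, here is a short sketch that assumes $/ is left at its default of a newline. The strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $with_newline    = "Yendor books\n";
my $without_newline = "Yendor books";

# chomp removes the trailing $/ (a newline by default) only if it is present.
chomp( my $chomped_nl    = $with_newline );
chomp( my $chomped_plain = $without_newline );

# chop removes the last character unconditionally.
chop( my $chopped_nl    = $with_newline );
chop( my $chopped_plain = $without_newline );

print "chomp: '$chomped_nl' '$chomped_plain'\n";
print "chop:  '$chopped_nl' '$chopped_plain'\n";
# prints:
# chomp: 'Yendor books' 'Yendor books'
# chop:  'Yendor books' 'Yendor book'
```

So on a line that happens to arrive without its newline (the last line of a file, for instance), chop eats a character of real data while chomp leaves the string alone.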


John
 
