split problem

gabkin · Sep 20, 2004

I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...

<PERL SUB>
sub parseLine()
{
  #this parses a line which will be in a similar format to this
  #"0010230" "Book of the Dead" "Yendor books"
  #(tab delimited, escaped by quotes)
  #it will take as an argument the column headers and the string to parse
  #it will return a hash,using the columnheader as the key
  #and the column data as the element
  my $ParseMe = $_[0];
  my @ColumnHeaders = $_[1..@_];
  my %returnData;
  chop($ParseMe);
  my @Columns = split(/\t/,$ParseMe);
  #my $size=@Columns;print("Size = ",$size,"\n");
  for(my $i=0;$i<@Columns;$i++) {
    $Columns[$i] =~ s/\"//g; # remove extraneous quotes
    #print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
    $returnData{$ColumnHeaders[$i]} = $Columns[$i];
  }
  return %returnData;
}
</PERL SUB>
(Sorry about the awful two-space indentation, but google seems to strip
out tabs)

A problem has arisen in that in one example, the last four columns are
blank (i.e. null): they're there, there's just nothing in them. For these
last four, the split function seems to discard them. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

Or would I have to 're-invent' the split algorithm so as to keep them?
Any help would be greatly appreciated...

Paul Lalli · Sep 20, 2004

gabkin said:
I am having a problem with the split function.

Did you consider reading the documentation for the function you're
having problems with?

A problem has arisen in that in one example, the last four columns are
blank (i.e. null): they're there, there's just nothing in them. For these
last four, the split function seems to discard them. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

perldoc -f split
4th paragraph.

Paul Lalli

thundergnat · Sep 20, 2004

gabkin said:
I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...
[snip]
A problem has arisen in that in one example, the last four columns are
blank (i.e. null): they're there, there's just nothing in them. For these
last four, the split function seems to discard them. I checked this
with the aid of the commented-out lines.

Is there a way to force split to not lose the blank columns?

Or would I have to 're-invent' the split algorithm so as to keep them?
Any help would be greatly appreciated...

Did you read the docs for split? (Really. Not being sarcastic.)

It seems you are looking for the LIMIT argument to split.

Since you know how many columns you are expecting, specify that.

John W. Krahn · Sep 21, 2004

gabkin said:
I am having a problem with the split function.
Here is the sub that it is used in, it should illustrate what I'm
doing, criticism is welcomed...

^^^^^^^^^^^^^^^^^^^^^
Ok, you asked for it.

<PERL SUB>
sub parseLine()
{
#this parses a line which will be in a similar format to this
#"0010230" "Book of the Dead" "Yendor books"
#(tab delimited, escaped by quotes)
#it will take as an argument the column headers and the string to parse
#it will return a hash,using the columnheader as the key
#and the column data as the element
my $ParseMe = $_[0];
my @ColumnHeaders = $_[1..@_];

^^^^^^^^^
That is wrong. The '$' at the beginning denotes a scalar value so you are
assigning a single value from the @_ array to the @ColumnHeaders array. And
even if you had used a proper array slice, you are accessing an extra element
at the end of the array that does not exist.

$ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
6 a b c d e f
6 b c d e f

The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;
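
To see that the array slice and the list assignment above pick up the same header list, here is a minimal sketch. The helper names and the sample arguments are made up for illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub headers_by_slice {
    # Array slice: elements 1 through the last index ($#_) of @_.
    my @headers = @_[ 1 .. $#_ ];
    return @headers;
}

sub headers_by_list_assignment {
    # The first argument lands in the scalar, the rest in the array.
    my ( $line, @headers ) = @_;
    return @headers;
}

# Made-up arguments: a line to parse followed by three column headers.
my @args     = ( "some line", "id", "title", "publisher" );
my @by_slice = headers_by_slice(@args);
my @by_list  = headers_by_list_assignment(@args);

print scalar(@by_slice), ": @by_slice\n";    # 3: id title publisher
print scalar(@by_list),  ": @by_list\n";     # 3: id title publisher
```

Both forms leave @_ untouched, whereas the shift version removes the first argument from @_ before copying the rest.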

my %returnData;
chop($ParseMe);

chop() isn't really used very much anymore. You should use chomp() unless you
have a valid reason not to.

my @Columns = split(/\t/,$ParseMe);

As others have pointed out, use the third argument to split().

my @Columns = split /\t/,$ParseMe, -1;
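
A small sketch of the difference the third argument makes, assuming a row like the one described in the question (three filled columns followed by four empty ones). The sample data is invented for the example.

```perl
use strict;
use warnings;

# Made-up row: three quoted values followed by four empty columns.
my $line = join "\t",
    '"0010230"', '"Book of the Dead"', '"Yendor books"', '', '', '', '';

my @default = split /\t/, $line;        # no LIMIT: trailing empty fields are dropped
my @keep    = split /\t/, $line, -1;    # negative LIMIT: trailing empty fields are kept

print scalar(@default), "\n";    # 3
print scalar(@keep),    "\n";    # 7
```

Only trailing empty fields are affected; empty fields in the middle of the row are returned in both cases.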

#my $size=@Columns;print("Size = ",$size,"\n");
for(my $i=0;$i<@Columns;$i++) {

That is usually written as:

for my $i ( 0 .. $#Columns ) {

$Columns[$i] =~ s/\"//g; # remove extraneous quotes

Double quote characters don't have to be escaped in regular expressions.

#print($ColumnHeaders[$i],"\t",$Columns[$i],"\n");
$returnData{$ColumnHeaders[$i]} = $Columns[$i];
}
return %returnData;
}
</PERL SUB>
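
Putting the suggestions above together, one possible shape for the sub is sketched below. It is not the original poster's final code: the name parse_line, the header names and the sample row are assumptions made for the example. The empty prototype "()" from the original declaration is left out, since a sub declared that way is meant to take no arguments.

```perl
use strict;
use warnings;

# Sketch only: parse one tab-delimited, quote-wrapped line into a hash
# keyed by the supplied column headers.
sub parse_line {
    my ( $line, @headers ) = @_;
    chomp $line;                            # remove a trailing newline, if any
    my @columns = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    my %row;
    for my $i ( 0 .. $#columns ) {
        $columns[$i] =~ s/"//g;             # strip the enclosing quotes
        $row{ $headers[$i] } = $columns[$i];
    }
    return %row;
}

# Made-up row with two filled columns and two empty ones.
my %row = parse_line(
    qq{"0010230"\t"Book of the Dead"\t\t\n},
    qw(id title publisher notes),
);
print "$_=[$row{$_}]\n" for sort keys %row;
```

If a row has more columns than there are headers, $headers[$i] is undefined for the surplus and warnings will say so; how to handle that case is left open here.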

John

Gabkin · Sep 21, 2004

John said:
^^^^^^^^^^^^^^^^^^^^^
Ok, you asked for it.

I welcome criticism because I know I am new to perl and am probably
carrying over mistakes from other languages (Java,VB,COBOL) into my perl
writing.

my @ColumnHeaders = $_[1..@_];

^^^^^^^^^
That is wrong. The '$' at the beginning denotes a scalar value so you
are assigning a single value from the @_ array to the @ColumnHeaders
array. And even if you had used a proper array slice, you are accessing
an extra element at the end of the array that does not exist.

$ perl -le'@x="a".."f"; print @x . " @x"; @y = @x[1..@x]; print @y . " @y"'
6 a b c d e f
6 b c d e f

The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;

Uh, Thanks. I'm still trying to understand all of this but I have
implemented the much cleaner single-line assignment. I have actually
seen this before and thus should know about it. Thanks!

I still have a lot of trouble with all of the 'magic' variables (like
$#) and 'shift', it may be because I have never used C...

chop() isn't really used very much anymore. You should use chomp()
unless you have a valid reason not to.

It's more a case of I started using chop from the start and it works, so
I haven't changed it, I will try to use 'chomp' over 'chop' though.

As others have pointed out, use the third argument to split().

Yes, I have found that out.
I now use this instead..
my @Columns = split(/\t/,$ParseMe,@ColumnHeaders);
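
For reference, a minimal sketch of how a positive LIMIT behaves, assuming the limit is the number of headers (written explicitly as scalar(@headers) here). The header names and rows are made up.

```perl
use strict;
use warnings;

my @headers = qw(id title publisher);    # hypothetical column headers

# Trailing empty columns survive a positive LIMIT.
my @short = split /\t/, "0010230\t\t", scalar(@headers);

# A row with more columns than headers is still cut into at most
# scalar(@headers) fields; the surplus stays inside the last field.
my @long = split /\t/, "0010230\tBook\tYendor\textra", scalar(@headers);

print scalar(@short), "\n";    # 3
print scalar(@long),  "\n";    # 3
```

With -1 instead, the second row would come back as four fields, so the two approaches differ only when a row has more columns than expected.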

That is usually written as:

for my $i ( 0 .. $#Columns ) {

I have never seen that before, it looks quite handy.
I am not too familiar with the $# usage yet, so I will go look it up now.

$Columns[$i] =~ s/\"//g; # remove extraneous quotes

Double quote characters don't have to be escaped in regular expressions.

I tend to err on the side of caution with regexes, due to their
inconsistent handling between Perl, sed, grep and vi (and probably others
too).
Thanks though, duly noted.

John

Thanks for these little tips!

I would love to get my entire program inspected and criticized like
this, but I feel it might be amiss to post the entire thing (1252 lines
in the main program, and 113 lines in the 'data verification' program),
because I know of at least one major algorithm that I did wrong: I used
a hash where I should have used a string.

Gabkin · Sep 21, 2004

thundergnat said:
Did you read the docs for split? (Really. Not being sarcastic.)

It seems you are looking for the LIMIT argument to split.

Since you know how many columns you are expecting, specify that.

You are quite right, I did not read the help for split before posting this!

I apologize, since it has answered my question perfectly...

John W. Krahn · Sep 21, 2004

Gabkin said:
John said:

The correct syntax is:

my @ColumnHeaders = @_[ 1 .. $#_ ];

However the usual way to do that is:

my ( $ParseMe, @ColumnHeaders ) = @_;

Or if you really want to do it on two lines:

my $ParseMe = shift;
my @ColumnHeaders = @_;

Uh, Thanks. I'm still trying to understand all of this but I have
implemented the much cleaner single-line assignment. I have actually
seen this before and thus should know about it. Thanks!

I still have a lot of trouble with all of the 'magic' variables (like
$#) and 'shift', it may be because I have never used C...

Don't confuse the magic variable $# (which is deprecated)

perldoc perlvar

with $#array, the index of the last element in an array

perldoc perldata
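
A short sketch of the last-index form, using a made-up array:

```perl
use strict;
use warnings;

my @columns = ( "a", "b", "c" );    # sample data for illustration

print scalar(@columns), "\n";       # 3, the number of elements
print $#columns, "\n";              # 2, the index of the last element
print $columns[$#columns], "\n";    # c, the last element
print $columns[-1], "\n";           # c, the same element via a negative index
```

This is why 0 .. $#columns visits every valid index exactly once.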

It's more a case of I started using chop from the start and it works, so
I haven't changed it, I will try to use 'chomp' over 'chop' though.

chop() will always remove and return the last character in a string while
chomp() will remove the value of $/ if it is at the end of the string.

perldoc -f chop
perldoc -f chomp
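
A minimal sketch of that difference, assuming $/ still has its default value of "\n". The strings are made up.

```perl
use strict;
use warnings;

my $chop_nl    = "Yendor books\n";
my $chop_bare  = "Yendor books";
my $chomp_nl   = "Yendor books\n";
my $chomp_bare = "Yendor books";

chop $chop_nl;       # removes the newline
chop $chop_bare;     # removes the final "s", since there is no newline
chomp $chomp_nl;     # removes the newline
chomp $chomp_bare;   # removes nothing

print "[$chop_nl] [$chop_bare]\n";      # [Yendor books] [Yendor book]
print "[$chomp_nl] [$chomp_bare]\n";    # [Yendor books] [Yendor books]
```

So chop() quietly eats real data when the last line of a file has no trailing newline, which chomp() avoids.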

John
