join on space instead of comma

LHradowy · Aug 4, 2004

Right now I have a Perl script that takes a comma-separated file, adds a
couple of things to it, and strips off the data at the end.
I have done this the hard way: I save the file in Excel as a comma-separated
file, then FTP it over and run dos2ux file >file1.
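If the only job of the dos2ux step is converting DOS line endings, the Perl script can do that itself. This is a minimal sketch, assuming the raw file arrives with CRLF line endings. The helper name `strip_eol` and the sample line are hypothetical:

```perl
use strict;
use warnings;

# Remove one trailing "\r\n" or "\n" from a line, the way dos2ux followed
# by chomp would. strip_eol is a hypothetical helper name.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# In the real script this would run on every input line, e.g.
#   while (my $line = <>) { $line = strip_eol($line); ... }
my $sample = "3xxxx18 00 0 02 00 TELN NOT\r\n";    # made-up DOS-style line
print strip_eol($sample), "|\n";                   # 3xxxx18 00 0 02 00 TELN NOT|
```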

And this is the outcome BEFORE I run my perl script.
3xxxx18,00 0 02 00,TELN NOT
3xxxx22,00 0 03 11,CUST HAS >

Then, after all that, I run my Perl script against it. It prompts the user for
input, adds some data, then greps the file for certain things and creates 3 files.

What I want to do is eliminate the first step of saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot split on spaces, since some
columns contain spaces that I need to keep. So (how to explain) instead of
splitting on the comma as above, I need to split this file based on certain
criteria and also add a comma between the columns, so it looks like the above...

This is the file I get before I save it as a comma-separated file.
3xxxx33 00 0 00 21 CUSTOMER HAS > 1
3xxxx63 00 0 01 07 CUSTOMER HAS > 1
3xxxx75 00 0 02 09 CUSTOMER HAS > 1
3xxxx85 00 0 12 09 TELN NOT BILL
3xxxx28 00 0 02 00 TELN NOT BILL
yada...

I want to avoid this step. How do I change my Perl script to handle this
format instead of splitting on commas?
Remember that the 2nd and 3rd fields contain spaces that I need to keep.
OUTCOME
3xxxx33,BUILDING1,ROOM2,00 0 00 21,CUSTOMER HAS > 1
3xxxx66,BUILDING1,ROOM2,00 0 01 07,CUSTOMER HAS > 1
3xxxx75,BUILDING1,ROOM2,00 0 02 09,CUSTOMER HAS > 1
3xxxx85,BUILDING1,ROOM2,00 0 12 09,TELN NOT BILL

SCRIPT
*****************************

#!/opt/perl/bin/perl

use strict;
use warnings;

system("clear");    # Clear the screen
my $acode = "204";

print "Enter BLD: ";
chomp(my $bld = <STDIN>);
my $CAPbld = uc($bld);
my $bld4 = substr $CAPbld, 0, 4;    # First 4 chars of BLD, used to name the output files

print "Enter Room: ";
chomp(my $room = <STDIN>);
my $CAProom = uc($room);

open my $fc, '>', "$bld4.cust_has"    or die "$bld4.cust_has: $!";
open my $ft, '>', "$bld4.teln_not"    or die "$bld4.teln_not: $!";
open my $fo, '>', "$bld4.PRTDIST.err" or die "$bld4.PRTDIST.err: $!";

while (<>) {
    chomp;    # Remove the trailing newline
    my @a = split /,/, $_, -1;
    my $f = /TELN/ ? $ft : /CUST/ ? $fc : $fo;
    # Join only the fields; printing "\n" separately avoids a trailing comma
    print $f join(",", $acode . $a[0], $CAPbld, $CAProom, $a[1], $a[2]), "\n";
}
close $fc;
close $ft;
close $fo;

## Modify the cust_has file and pull only the first column.
my $fc_name = "$bld4.cust_has";
open my $fcIn, '<', $fc_name or die "$fc_name: $!";
open my $fcC, '>', "$bld4.cust_has.tn" or die "$bld4.cust_has.tn: $!";
while (<$fcIn>) {
    chomp;
    my ($FirstField) = split /,/;
    print $fcC "'$FirstField',\n";    # e.g. '2043xxxx33',
}
close $fcIn;
close $fcC;

## Modify the teln_not file to take off the last column.
## File is now ready for report making.
my $fc_name2 = "$bld4.teln_not";
open my $ftIn, '<', $fc_name2 or die "$fc_name2: $!";
open my $fcT, '>', "$bld4.teln_not-1" or die "$bld4.teln_not-1: $!";
while (<$ftIn>) {
    chomp;
    my @fields = (split /,/)[0 .. 3];    # keep the first four columns
    print $fcT join(",", @fields), "\n";
}
close $ftIn;
close $fcT;

rename "$bld4.teln_not-1", "$bld4.teln_not"
    or die "rename $bld4.teln_not-1: $!";

Gunnar Hjalmarsson · Aug 4, 2004

LHradowy said:
And this is the outcome BEFORE I run my perl script.
3xxxx18,00 0 02 00,TELN NOT
3xxxx22,00 0 03 11,CUST HAS >

What I want to do is eliminate the first step of saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot split on spaces, since some
columns contain spaces that I need to keep.

Can't you split on instances of multiple spaces?

So (how to explain) instead of splitting on the comma as above, I need to
split this file based on certain criteria and also add a comma between the
columns, so it looks like the above...

This is the file I get before I save it as a comma separated file.
3xxxx33 00 0 00 21 CUSTOMER HAS > 1
3xxxx63 00 0 01 07 CUSTOMER HAS > 1
3xxxx75 00 0 02 09 CUSTOMER HAS > 1
3xxxx85 00 0 12 09 TELN NOT BILL
3xxxx28 00 0 02 00 TELN NOT BILL

my @a = split /,/, $_, -1;

s/^\s+//;                 # strip leading whitespace
my @a = split /\s{3,}/;   # split on runs of 3 or more whitespace characters
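A self-contained sketch of Gunnar's suggestion, assuming that columns in the raw file are separated by three or more spaces while the spaces inside a column are single. The sample line and the BUILDING1/ROOM2 values are made up to match the OUTCOME shown earlier; the OP's script would also prefix the first field with $acode.

```perl
use strict;
use warnings;

# Stand-ins for the values the OP's script reads from STDIN.
my ($bld, $room) = ('BUILDING1', 'ROOM2');

# Turn one raw line into the comma-separated layout shown under OUTCOME.
sub to_csv {
    my ($line) = @_;
    $line =~ s/^\s+//;                # strip leading indentation
    $line =~ s/\s+\z//;               # strip trailing whitespace, including the newline
    my @a = split /\s{3,}/, $line;    # split only on runs of 3+ whitespace characters
    return join ',', $a[0], $bld, $room, @a[1, 2];
}

my $raw = "       3xxxx33      00 0 00 21      CUSTOMER HAS > 1\n";
print to_csv($raw), "\n";    # 3xxxx33,BUILDING1,ROOM2,00 0 00 21,CUSTOMER HAS > 1
```

If some columns turn out to be separated by only two spaces, the threshold in the regex has to come down to \s{2,}; the right value depends on the actual spacing in the file.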

Brian McCauley · Aug 4, 2004

^^^^^^^^^^^^^^^^^
127.0.0.127.... cute!

my (@lines, @fields) = (<>);

I find the technique of tagging extra variables onto the LHS of a list
assignment, just to declare them, somewhat ugly.

Is there really any need to slurp here anyhow? Wouldn't it be
simpler to read the input line-wise?

Isn't @fields being declared at the wrong scope anyhow? It should be
inside the loop.
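A minimal sketch of the line-wise alternative, with @fields declared inside the loop. It reads from an in-memory string rather than <> so that it runs on its own; the sample lines are made up, and splitting on runs of three or more spaces is only an assumption about the column layout.

```perl
use strict;
use warnings;

# Made-up input; in the real script this would be <> or a file handle.
my $text = "3xxxx33   00 0 00 21   CUSTOMER HAS > 1\n"
         . "3xxxx85   00 0 12 09   TELN NOT BILL\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";

my @out;
while (my $line = <$in>) {    # one line at a time, nothing slurped
    chomp $line;
    my @fields = split /\s{3,}/, $line;    # a fresh @fields on every pass
    push @out, join ',', @fields;
}
close $in;

print "$_\n" for @out;
```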

chomp @lines;

for (@lines) {
$fields[0] = substr $_,7,7;
$fields[1] = substr $_,39,10;
$fields[2] = substr $_,63;

For unpacking fixed position records you may want to consider unpack()
as an alternative to several substr().
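To illustrate the point, this sketch pulls the same three fields with several substr() calls and with a single unpack(). The offsets (7, 39, 63) come from the quoted code; the sample record is built with sprintf() and is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 7 spaces of indentation, then fields
# starting at offsets 7, 39 and 63.
my $rec = sprintf '%7s%-32s%-24s%s', '', '3xxxx33', '00 0 00 21', 'CUSTOMER HAS > 1';

# Several substr() calls...
my @by_substr = (substr($rec, 7, 7), substr($rec, 39, 10), substr($rec, 63));

# ...versus one unpack(): "@N" moves to absolute offset N, "aN" takes N bytes,
# and "a*" takes the rest of the string.
my @by_unpack = unpack '@7a7 @39a10 @63a*', $rec;

print join(',', @by_unpack), "\n";    # 3xxxx33,00 0 00 21,CUSTOMER HAS > 1
```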

Anno Siegel · Aug 4, 2004

bowsayge said:
LHradowy said to us:

[...]

What I want to do is eliminate the first step of saving it as a comma-separated
file. I believe I can do this in Perl, but I cannot split on spaces, since some
columns contain spaces that I need to keep.

[...]

You can extract substrings from your input lines like so:

Ah, you're learning fast. This begins to look like Perl code.

Your solution is correct. I'll add a few comments about style and
point out alternatives.

I am aware, if I read your postings right, that you are rather new to
Perl, if not to programming in general. My (and others') comments are
brief and often have the form of directions. They're still in the spirit
of "you can also do it this way", not of "you should have done it like this".
So...

my (@lines, @fields) = (<>);

You don't need to declare @fields here. Instead, declare it in the
smallest possible scope, which would be the loop body.

But even if you had to declare it here, it isn't the done thing to
combine a mere declaration with a massive operation like slurping the
file. Use an extra line.

The parens around <> are unnecessary.
chomp @lines;

"chomp" can be applied to an assignment, even a list assignment. This
*is* idiomatic:

chomp( my @lines = <>);

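A quick way to confirm that chomp() applied to a list assignment strips every element: this sketch reads from an in-memory string (a stand-in for <>) and checks the result.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";

# chomp() works on the array produced by the assignment, so every line
# loses its trailing newline in one statement.
chomp(my @lines = <$in>);
close $in;

print scalar(@lines), " lines: @lines\n";    # 3 lines: first second third
```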
for (@lines) {

This would be the place to declare @fields. The array is cleared each
time my() happens at run-time, usually what you want.

$fields[0] = substr $_,7,7;
$fields[1] = substr $_,39,10;
$fields[2] = substr $_,63;

It is rare in Perl that you need to index into an array. (Hashes are
different.) The more you think of an array as a whole, the better.
This is certainly not a place for indexing.

my @fields = (
substr( $_,7,7),
substr( $_,39,10),
substr( $_,63),
);

But there is a better way. See below...

local $" = ',';

Nothing wrong with that, especially since it's properly localized. Still,
there's a tendency to avoid the "punctuation variables", with a few
exceptions.

print "@fields\n";

Without the assignment to $", you would write:

print join( ',', @fields), "\n";

}
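A small sketch with made-up field values showing that both ways of printing the fields give the same string: interpolating the array with a localized $", and calling join().

```perl
use strict;
use warnings;

my @fields = ('3xxxx33', '00 0 00 21', 'CUSTOMER HAS > 1');

my $via_separator;
{
    local $" = ',';    # list separator used when an array is interpolated
    $via_separator = "@fields";
}                      # $" reverts to a single space here

my $via_join = join ',', @fields;

print "$via_join\n";    # 3xxxx33,00 0 00 21,CUSTOMER HAS > 1
```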

If you have to extract fields of fixed length at fixed positions,
the unpack() function is the right tool. It can extract multiple
substrings in one step.

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.
One is this: to extract a substring of length $length at position $pos,
use the unpack template "@${pos}a$length". Putting it all together,
your solution becomes

chomp( my @lines = <DATA>);
for ( @lines ) {
my @fields = unpack( '@7a7 @39a10 @63a*', $_);
print join( ', ', @fields), "\n";
}
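The idiom also lends itself to generating the template from a list of column positions. A sketch, assuming the columns are described as [offset, length] pairs with '*' meaning "to the end of the line"; the helper name `fixed_template` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Build an unpack template such as '@7a7 @39a10 @63a*' from
# [offset, length] pairs, using the "@${pos}a$length" idiom.
sub fixed_template {
    my @cols = @_;
    return join ' ', map { my ($pos, $len) = @$_; '@' . $pos . 'a' . $len } @cols;
}

my $tpl = fixed_template([7, 7], [39, 10], [63, '*']);
print "$tpl\n";    # @7a7 @39a10 @63a*

my $line = sprintf '%7s%-32s%-24s%s', '', '3xxxx85', '00 0 12 09', 'TELN NOT BILL';
print join(', ', unpack($tpl, $line)), "\n";    # 3xxxx85, 00 0 12 09, TELN NOT BILL
```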

Anno

Andrew Palmer · Aug 5, 2004

Anno Siegel said:
If you have to extract fields of fixed length at fixed positions,
the unpack() function is the right tool. It can extract multiple
substrings in one step.

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.
One is this: to extract a substring of length $length at position $pos,
use the unpack template "@${pos}a$length". Putting it all together,
your solution becomes

You don't need both a starting position and a string length for each field
(unpack() picks up the next field where it left off with the last one).
If you need to strip trailing spaces, use capital "A" (which is meant for
extracting space-padded fields) rather than lowercase "a" (which is meant for
nul-padded fields and returns the bytes unchanged).

chomp( my @lines = <DATA>);
for ( @lines ) {
my @fields = unpack( '@7a7 @39a10 @63a*', $_);

For the data posted, the above happens to work the same, although this is my
preferred way:
my @fields = unpack( '@7 A32 A24 A*', $_);

print join( ', ', @fields), "\n";
}

(The "@7" is for the 7 spaces at the beginning of each line. Are they there
in the actual data, or was the example just indented?)
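The difference between the two letters shows up as soon as a field is padded. In this sketch (a made-up record with a 10-byte first field), "a" returns the padding unchanged while "A" strips trailing spaces and nulls; for data that really is nul-terminated, "Z" is the letter that stops at the first nul.

```perl
use strict;
use warnings;

# Made-up record whose first field is 10 bytes wide and space-padded.
my $record = 'TELN' . (' ' x 6) . 'NOT BILL';

my ($raw)      = unpack 'a10', $record;    # "a": the 10 bytes exactly as they are
my ($stripped) = unpack 'A10', $record;    # "A": trailing spaces and nulls removed
printf "[%s] [%s]\n", $raw, $stripped;     # [TELN      ] [TELN]

my ($z) = unpack 'Z*', "abc\0def";         # "Z": stops at the first nul
print "$z\n";                              # abc
```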

David Combs · Aug 7, 2004

SNIP

"pack" and "unpack" and their formats are a sub-language of its own.
No-one memorizes all of it, but a few idioms are worth memorizing.

Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

I bet lots of people here would like to see what you've got!

Thanks,

David

Tassilo v. Parseval · Aug 7, 2004

Also sprach David Combs:

Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

Not that I'm Anno, but here's one that I find useful, namely the '/'
construct. The template preceding the slash is used as a count argument
for the template following the slash:

# look at the first byte and extract that many
# bytes after that (3 in this case)
# as unsigned characters

my @x = unpack "c/C", "\x03\x00\x01\xff\x03";
print "@x\n";

__END__
0 1 255

Note how this can be combined with @:

my @x = unpack '@2c/C', "\x03\x00\x01\xff\x03";
print "@x\n";
__END__
255
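The same "/" construct also works in pack(), which makes length-prefixed strings easy to write and read back. A sketch with made-up strings, assuming one unsigned byte ("C") is enough to hold each length:

```perl
use strict;
use warnings;

# pack() writes each string preceded by its length as one unsigned byte ("C").
my $packed = pack 'C/a* C/a*', 'foo', 'hello';
print length($packed), "\n";    # 10  (1 + 3 + 1 + 5 bytes)

# unpack() reads each length byte and then takes that many bytes.
my @strings = unpack 'C/a* C/a*', $packed;
print "@strings\n";             # foo hello
```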

Tassilo

Anno Siegel · Aug 7, 2004

David Combs said:
SNIP

Anno -- what are the *other* pack-unpack idioms you think worth
memorizing?

I bet lots of people here would like to see what you've got!

Not all that much, come to think of it. There's the bit-counting "%32b*",
but that is advertised right in the unpack doc and needs no promotion.
I use that one even more frequently than the substr() replacement,
but I may be inordinately fond of bit tables.
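For reference, the bit-counting idiom: the "%32" prefix makes unpack() return a 32-bit checksum (here simply the sum) of the values, and since "b*" yields one value per bit, the result is the number of set bits. The input bytes below are arbitrary.

```perl
use strict;
use warnings;

my $mask = "\x0f\x01\x80";    # 4 + 1 + 1 set bits

# "%32b*": sum of all bits, computed in 32 bits, i.e. the population count.
my $setbits = unpack '%32b*', $mask;
print "$setbits\n";           # 6

# The same count the long way round: turn the bits into a "0"/"1" string
# and count the 1s.
my $check = () = unpack('b*', $mask) =~ /1/g;
print "$check\n";             # 6
```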

One other thing to keep in mind about pack/unpack (though not an idiom)
is the possibility of reading the length of a field from the data itself
(the "/" construct). Tassilo has also pointed this out.

Then there's the use of grouping parentheses in a template, which applies
a repeat count to a group of sub-templates at once. In the form
"(<composite template>)*" this is slightly more than syntactic sugar.
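A small illustration of a repeated group, with made-up data: "(a1 C)*" applies the two-item template "a1 C" over and over until the string is used up, which turns a run of (letter, count) pairs into a flat list that can go straight into a hash.

```perl
use strict;
use warnings;

my $data = "a\x01b\x02c\x03";    # letter, count byte, letter, count byte, ...

# The group "(a1 C)" is repeated by "*" as long as data remains.
my %count = unpack '(a1 C)*', $data;

print join(', ', map { "$_=$count{$_}" } sort keys %count), "\n";    # a=1, b=2, c=3
```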

Together with knowing what pack/unpack are generally about, this
pretty much outlines the range of their applicability. The details
can be looked up when you decide one or the other is a likely candidate.
Very few template characters deserve to be known by heart, maybe

b - a single bit
a - a binary byte
i - a native integer (native to your C compiler)
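A quick sketch of those three letters, with arbitrary input bytes. The comparison with intsize from the core Config module only illustrates that "i" follows whatever the local C compiler uses.

```perl
use strict;
use warnings;
use Config;

# "b": bits in ascending order within each byte; 0x05 is 00000101.
my $bits = unpack 'b8', "\x05";
print "$bits\n";    # 10100000

# "a": one byte per count, returned as-is.
my $byte = unpack 'a', "xyz";
print "$byte\n";    # x

# "i": a native C int, whose size depends on the local compiler.
my $int_bytes = length(pack('i', 0));
print "$int_bytes $Config{intsize}\n";
```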

Anno

David Combs · Aug 11, 2004

THANK YOU!

Now, finally, I have some *real* motivation to (finally) go
learn unpack, so I can *understand* all those tricks.

Any way you two can convince someone (O'Reilly?) to come
up with a "wild hacks with perl" book, and put out a
call for donated hacks to include in it?

Thanks again;

David

Tassilo v. Parseval · Aug 11, 2004

Also sprach David Combs:

Now, finally, I have some *real* motivation to (finally) go
learn unpack, so I can *understand* all those tricks.

Any way you two can convince someone (O'Reilly?) to come
up with a "wild hacks with perl" book, and put out a
call for donated hacks to include in it?

I am not sure that a book with such a title would do Perl's already
quite infamous reputation much good.

Tassilo

Similar Threads

nice parallel file reading	14	Apr 26, 2013
Brocade Switch Perl Script	1	Aug 19, 2016
print join to a file	2	Jul 27, 2004
Problem Splitting Text String	2	Dec 29, 2022
space deliminated to comma delinated with varried and need spaces between some columns	14	Sep 20, 2004
Getting position from unpack (was: "join on space instead of comma")	2	Aug 14, 2004
?Merging files based on first two comma delimited fields?	8	Sep 16, 2005
add a "suffix" to a variable in a array	0	Jul 27, 2004
