Trouble with Regexps

evlika

Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP                       ASCP [PACK 1,(2,3) FLO        21-50-04-00
                           DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL            AUTO DISABLE RL               21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00


The columns are separated by spaces, not tabs. The second example I have
no problem with. The first one has data on a second line that should be on
the first line/appended to the second column. Any thoughts?

Thanks!
 
Gunnar Hjalmarsson

evlika said:
Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00

The columns are separated by spaces not tabs. The second example I
have no problem with. The first one has data on a second line that
should be on the first line/appended to the second column. Any
thoughts?

Maybe something along these lines:

my @rec;
my $i = -1;
while (<>) {
    if ( substr($_, 0, 1) eq '@' ) {
        map { s/\s*$// } @{ $rec[$i] } if $rec[$i];
        $i++;
        next;
    }
    no warnings qw(substr uninitialized);
    $rec[$i][0] .= substr($_, 0, 27);
    $rec[$i][1] .= substr($_, 27, 22);
    $rec[$i][2] .= substr($_, 57, 11);
}
print join("\n", @$_), "\n\n" for @rec;
 
A. Sinan Unur

Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00

This is one of those cases where the lowly substr comes in handy. I have
a feeling someone will post something infinitely neater, but if the
column widths are always the same, then you can do something along the
lines of:

#! /usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @parsed;

{
    local $/ = "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n";
    while (my $line = <DATA>) {
        chomp $line;
        my @segments = split "\n", $line;
        next unless @segments;

        my $segment = shift @segments;

        my $left  = substr $segment, 0, 26;
        my $mid   = substr $segment, 27, 30;
        my $right = substr $segment, 58;

        $left  =~ s/\s+$//g;
        $right =~ s/^\s+//g;
        $mid   =~ s/^\s+//g;
        $mid   =~ s/\s+$//g;

        for my $s (@segments) {
            $s =~ s/^\s+//g;
            $s =~ s/\s+$//g;
            $mid .= $s;
        }
        push @parsed, {
            field1 => $left,
            field2 => $mid,
            field3 => $right,
        };
    }
}

print Dumper \@parsed;

__DATA__

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP                       ASCP [PACK 1,(2,3) FLO        21-50-04-00
                           DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL            AUTO DISABLE RL               21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

D:\Home\asu1\UseNet\clpmisc> b
$VAR1 = [
{
'field1' => 'ASCP',
'field2' => 'ASCP [PACK 1,(2,3) FLODISAG]',
'field3' => '1-50-04-00'
},
{
'field1' => 'AUTO DISABLE RL',
'field2' => 'AUTO DISABLE RL',
'field3' => '1-31-04-00'
}
];
 
A. Sinan Unur

my $right = substr $segment, 58;

Despite my assertions, it seems like I really don't know how to count.
That should be:

my $right = substr $segment, 57;

Sinan
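
Since both substr answers rely on fixed column positions, the same idea can also be sketched with unpack. This is only an illustration, not code posted in the thread: the template 'A27 A30 A*' assumes the fields start at columns 0, 27 and 57 (inferred from the offsets used above, not confirmed by the original poster), and parse_record is a hypothetical helper name. A space is inserted when a continuation is appended, which matches the output asked for in the first post.

```perl
use strict;
use warnings;

# Assumed layout (inferred from the substr offsets in this thread):
# field 1 in columns 0-26, field 2 in columns 27-56, field 3 from column 57.
my $TEMPLATE = 'A27 A30 A*';

# Takes the lines of one record (first line plus any continuation lines)
# and returns its three fields. The 'A' format strips trailing spaces.
sub parse_record {
    my @lines  = @_;
    my @fields = ('', '', '');
    for my $line (@lines) {
        my @part = unpack $TEMPLATE, $line;
        for my $i (0 .. 2) {
            next unless defined $part[$i] && length $part[$i];
            $fields[$i] .= ' ' if length $fields[$i];   # join pieces with one space
            $fields[$i] .= $part[$i];
        }
    }
    return @fields;
}

my @rec = parse_record(
    'ASCP' . ' ' x 23 . 'ASCP [PACK 1,(2,3) FLO' . ' ' x 8 . '21-50-04-00',
    ' ' x 27 . 'DISAG]',
);
print join('|', @rec), "\n";
# prints: ASCP|ASCP [PACK 1,(2,3) FLO DISAG]|21-50-04-00
```

If the real column widths differ, only the template needs to change.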
 
Jeffrey Ross

evlika said:
Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00


The columns are separated by spaces, not tabs. The second example I have
no problem with. The first one has data on a second line that should be on
the first line/appended to the second column. Any thoughts?

Thanks!
Assuming that your data is in a file called infile, that the 'columns'
are always separated by at least 2 consecutive spaces, that 2 consecutive
spaces are not present within a 'column', that the optional continuation line
starts with a space, and that "@" in column 1 indicates a separation line:

awk -F"[ ][ ]+" '
    $0 ~ /^@/ {print f1, f2, f3; f1=f2=f3=""; next}
    f1 == ""  {f1=$1; f2=$2; f3=$3; next}
    {sub(/^ +/, ""); f2=f2 " " $0; next}   # strip leading spaces, then append
    END {print f1, f2, f3}
' <infile

In English this may be interpreted as:
Use awk with two or more spaces as field separators.
If a line starts with "@", print f1, f2, and f3. Clear f1, f2, and f3. Skip to next line.
If f1 is empty, store field1 in f1, field2 in f2, and field3 in f3. Skip to next line.
Append this line (which better be the continuation line!) to f2. Skip to next line.
Once the last line has been processed, print f1, f2, and f3 (in case there's no final separator line).
The data is read from infile.

Note that I have not tested this, so it may not be quite right but should
give you a start. It will print a blank line at the beginning and maybe
another at the end. You can avoid that by using "if (f1 != "") print ...".
If the assumptions above do not match your data this solution probably won't
work.
It's probably cleaner in Perl, but I'm more of an awk expert. It would be
much better if you could generate your data with clearer column divisions.
Regards,
Jeffrey.
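
The awk steps above translate fairly directly to Perl. The following is a rough sketch under the same assumptions (fields separated by at least two spaces, continuation lines starting with a space, separator lines starting with "@"). The name fold_records is hypothetical, and the sample lines are built with the column positions used elsewhere in this thread, which are themselves an assumption about the real data.

```perl
use strict;
use warnings;

# Folds continuation lines into the second field and returns one
# space-joined string per record.
sub fold_records {
    my @lines = @_;
    my (@out, @f);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;               # ignore blank lines
        if ($line =~ /^\@/) {                    # separator: flush current record
            push @out, join(' ', @f) if @f;
            @f = ();
        }
        elsif (!@f) {                            # first line of a record
            @f = split / {2,}/, $line;
        }
        else {                                   # continuation line
            (my $cont = $line) =~ s/^\s+|\s+$//g;
            $f[1] .= " $cont";
        }
    }
    push @out, join(' ', @f) if @f;              # in case there is no final separator
    return @out;
}

print "$_\n" for fold_records(
    '@' x 68,
    'ASCP' . ' ' x 23 . 'ASCP [PACK 1,(2,3) FLO' . ' ' x 8 . '21-50-04-00',
    ' ' x 27 . 'DISAG]',
    '@' x 68,
    'AUTO DISABLE RL' . ' ' x 12 . 'AUTO DISABLE RL' . ' ' x 15 . '21-31-04-00',
    '@' x 68,
);
# prints:
# ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
# AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
```

Because empty records are never pushed, this version does not emit the blank lines mentioned above.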
 
Bob Walton

evlika said:
Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00


The columns are separated by spaces, not tabs. The second example I have
no problem with. The first one has data on a second line that should be on
the first line/appended to the second column. Any thoughts?

Well, the actual format of your incoming data is not apparent, so
any responses will have to be based upon assumptions. For example,
with my assumptions indicated in []:

Is the "second field" the only one that can be continued onto the
next line, or can the "first field" and "third field" also be
continued sometimes? [the first and second fields may be
continued; the third may not, since a third-field continuation
cannot be specified without a non-empty second-field
continuation, unless the fields are column-based]

Are "records" always separated with a line containing nothing but
a bunch of @'s? [yes]

Can there be two, three, or more continuation lines, or is it
limited to just one? [indefinite number]

Are the input "fields" delimited by two or more space characters,
or do they occur within specific "columns"? [two or more space
characters, implies a field cannot contain two or more
consecutive space characters]

Is there always a @-line at the start of the data? At the end of
the data? [yes, always @-line at beginning and end]

When continuations are appended, is a space character inserted? [yes]

Given those assumptions:

use strict;
use warnings;
my @fields;
while (<DATA>) {
    chomp;
    if (/^\@+$/) {
        # remove unwanted extra spaces
        for my $f (@fields) {
            $f =~ s/^ +//;
            $f =~ s/ +$//;
            $f =~ s/ +/ /g;
        }
        print "$fields[0] $fields[1] $fields[2]\n"
            if @fields;
        @fields = ();
        next;
    }
    else {
        my @pf = $_ =~ /(.*?)(?: ?$| {2,}(.*?)(?: ?$| {2,}(.*)))/;
        die "Input error" unless @pf;
        no warnings 'uninitialized';
        for my $i (0..2) {
            $fields[$i] .= ' ' . $pf[$i];
        }
    }
}
__END__
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


generates the output you say you want (with some liberty taken to
prevent wrapping of your data lines -- I shortened them a bit).
....
 
ioneabu

evlika said:
Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00


The columns are separated by spaces, not tabs. The second example I have
no problem with. The first one has data on a second line that should be on
the first line/appended to the second column. Any thoughts?

Thanks!


#!/usr/bin/perl

use strict;
use warnings;

my $arrayref = [ ];
my @record;
while (<>)
{
    if ($_ !~ /^@+$/)
    {
        @record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
    }
    push @$arrayref, [ @record ];
}

for (@$arrayref)
{
    print "$_->[0]\t$_->[1]\t$_->[2]\n" if scalar @$_ == 3;
}
 
Gunnar Hjalmarsson

if ($_ !~ /^@+$/)
--------------------^^^^^^
What do you think that does? You probably mean:

if ( $_ !~ /^\@+$/ )

or (cleaner)

unless ( /^\@+$/ )
{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
}
push @$arrayref, [ @record ];

The push() statement should be in the inner block, shouldn't it?

{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
push @$arrayref, [ @record ];
}
print "$_->[0]\t$_->[1]\t$_->[2]\n" if scalar @$_ == 3;
---------------------------------------------^^^^^^^^^^^^^^^^^^^
With the above changes, that condition is redundant.

Nevertheless, I think you missed the point. What's now $arrayref->[0]
and $arrayref->[1] should be merged to one record. See above quote from
the OP.
 
ioneabu

Gunnar said:
if ($_ !~ /^@+$/)
--------------------^^^^^^
What do you think that does? You probably mean:

if ( $_ !~ /^\@+$/ )
or (cleaner)
unless ( /^\@+$/ )
{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
}
push @$arrayref, [ @record ];
The push() statement should be in the inner block, shouldn't it?
{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
push @$arrayref, [ @record ];
}
print "$_->[0]\t$_->[1]\t$_->[2]\n" if scalar @$_ == 3;
---------------------------------------------^^^^^^^^^^^^^^^^^^^
With the above changes, that condition is redundant.
Nevertheless, I think you missed the point. What's now $arrayref->[0]
and $arrayref->[1] should be merged to one record. See above quote from
the OP.

Thanks for the tips. I was just looking at the visual format of the
input and desired output so I didn't totally get what he wanted.

wana
 
