Substitution for a fixed number of characters

Al Roy

Hi,

I have this record format to read:

c001c005ccccc002cc010cccccccccc (record 1)
c001c003ccc020cccccccccccccccccccc (record 2)

where the 3 digit numbers represent the number of characters the
following field has.

So I used the following:

if (/^(.)(\d{3})(.{\2})/)
...

but the \2 does not seem to be replaced by its value (001 in these
cases, which should read in only 1 character).

I've also tried with $2 instead. Not better.

Any rule that says we can't use the substitution in the {} parameter?

Thanks.
 
Tad McClellan

Al Roy said:
So I used the following:

if (/^(.)(\d{3})(.{\2})/)
Any rule that says we can't use the substitution in the {} parameter?


This is the 3rd time in the last few days that this question
has been asked. I don't remember which threads they were in though.

I'll let you find those threads to answer your question.

You didn't really ask how to get what you want to get, but I assume
that you do want to get it, so here is how you can get what (I think)
you want to get:

while ( /(\d{3})/g ) {
    if ( /\G(.{$1})/ ) {
        print "$1\n";
    }
}
 
Greg Bacon

: I have this record format to read:
:
: c001c005ccccc002cc010cccccccccc (record 1)
: c001c003ccc020cccccccccccccccccccc (record 2)
:
: where the 3 digit numbers represent the number of characters the
: following field has.
:
: So I used the following:
:
: if (/^(.)(\d{3})(.{\2})/)
: ...
:
: but the \2 does not seem to be replaced by its value (001 in these
: cases, which should read in only 1 character).
:
: I've also tried with $2 instead. Not better.
:
: Any rule that says we can't use the substitution in the {} parameter?

Delayed evaluation comes with the (??{$code}) construct. The feature
is experimental, and I'm either abusing it or hitting a dark corner
with the following code:

#! /usr/bin/perl

use warnings;
use strict;

use re 'eval';
use Data::Dumper;

while (<DATA>) {
    my @pairs;

    print;
    if (/^.((\d\d\d)((??{".{$2}"}))(?{push @pairs, [$2,$3]}))+$/) {
        print "match:\n", Dumper \@pairs;
    }
    else {
        print "no match:\n", Dumper \@pairs;
    }
}

__END__
c001c003ccc020cccccccccccccccccccc
c001c005ccccc002cc010cccccccccc
c001c005ccccc002cc010ccccccccccX

My suspicion is based on the program's output (empty @pairs):

c001c003ccc020cccccccccccccccccccc
match:
$VAR1 = [
[
'001',
'c'
],
[
'003',
'ccc'
],
[
'020',
'cccccccccccccccccccc'
]
];
c001c005ccccc002cc010cccccccccc
match:
$VAR1 = [];
c001c005ccccc002cc010ccccccccccX
no match:
$VAR1 = [];
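
For comparison, here is a minimal sketch of the same delayed-evaluation idea applied to the first field only, with no repeated group involved. The helper name first_field is made up for illustration; the pattern reuses the (??{".{$2}"}) piece from the program above. The count inside {} has to be literal digits when the pattern is compiled, which is why the length is supplied by code that runs during the match instead.

```perl
use strict;
use warnings;

# first_field is a hypothetical helper, not part of the original program.
# It returns (leader, length digits, field) for the first field, or an
# empty list if the record does not start with a well-formed field.
sub first_field {
    my ($record) = @_;

    # (??{ ... }) runs while the match is in progress and its result is
    # used as a sub-pattern, so $2 already holds the three digits
    # captured earlier in this same match.
    return $record =~ /^(.)(\d{3})((??{ ".{$2}" }))/
        ? ( $1, $2, $3 )
        : ();
}

my @parts = first_field('c001c005ccccc002cc010cccccccccc');
print "leader=$parts[0] length=$parts[1] field=$parts[2]\n" if @parts;
```

This only shows the single-field case; it says nothing about why the repeated version above leaves @pairs empty for some records.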

You could go conventional:

#! /usr/bin/perl

use warnings;
use strict;

use Data::Dumper;

LINE:
while (<DATA>) {
    print;

    my $copy = $_;
    chomp $copy;
    if ($copy =~ s/^.//) {
        my @pairs;

        while ($copy =~ s/^(\d\d\d)//) {
            my $length = $1;

            if ($copy =~ s/^(.{$length})//) {
                push @pairs => [$length, $1];
            }
        }

        # test length, not truth, so that a leftover "0" counts as trailing data
        if (length $copy) {
            print "no match (trailing [$copy])\n";
            next LINE;
        }
        else {
            print "match\n", Dumper \@pairs;
        }
    }
    else {
        print "no match (no leader)\n";
    }
}

__END__
c001c003ccc020cccccccccccccccccccc
c001c005ccccc002cc010cccccccccc
c001c005ccccc002cc010ccccccccccX

That gives the following output:

c001c003ccc020cccccccccccccccccccc
match
$VAR1 = [
[
'001',
'c'
],
[
'003',
'ccc'
],
[
'020',
'cccccccccccccccccccc'
]
];
c001c005ccccc002cc010cccccccccc
match
$VAR1 = [
[
'001',
'c'
],
[
'005',
'ccccc'
],
[
'002',
'cc'
],
[
'010',
'cccccccccc'
]
];
c001c005ccccc002cc010ccccccccccX
no match (trailing [X])

Hope this helps,
Greg
 
Anno Siegel

Tad McClellan said:
This is the 3rd time in the last few days that this question
has been asked. I don't remember which threads they were in though.

I noticed, too. Plus, it's the first time(s) I've ever seen someone attempt
to put backreferences in replicator braces. I wasn't aware that it doesn't
work.

I'll let you find those threads to answer your question.

You didn't really ask how to get what you want to get, but I assume
that you do want to get it, so here is how you can get what (I think)
you want to get:

while ( /(\d{3})/g ) {
    if ( /\G(.{$1})/ ) {
        print "$1\n";
    }
}

Ah, but the old dogs Pack and Unpack have learned a few new tricks recently:

my @fields = unpack 'A(A3/A*)*', $_;

does exactly what the OP was trying to do with a regex: read off a three
digit integer (and skip it), then deliver that many characters in a string.
It even works when the 'ccc...' parts contain more digits.

The template "A(A3/A*)*" uses parentheses to replicate a sub-template
("A3/A*"). This grouping is a new feature that adds considerable
expressiveness to pack templates.

The use of "/" in "A3/A*" is also new, and it is designed for just this
situation where the data contains the length of the next item. Details
are in the pack doc near 'length-item'.
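
As a rough, self-contained sketch of that template in use (split_record is a made-up wrapper name, and the record is assumed to be chomped already, since a trailing newline would otherwise be fed to the repeated group as well):

```perl
use strict;
use warnings;

# split_record is a hypothetical wrapper around the unpack call above.
sub split_record {
    my ($record) = @_;

    # 'A'        : the single leading character
    # '(A3/A*)*' : repeated until the data runs out: a 3-character
    #              length, then a string of that many characters
    return unpack 'A(A3/A*)*', $record;
}

my ( $leader, @fields ) = split_record('c001c005ccccc002cc010cccccccccc');

print "leader: $leader\n";
print "field:  $_\n" for @fields;
```

Two caveats with this sketch: the 'A' format strips trailing whitespace from each item, so 'a' may be the better choice if fields can end in spaces, and the sketch makes no attempt to detect malformed records the way the regex versions do.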

Anno
 
Mike Flannigan

Tad said:
You didn't really ask how to get what you want to get, but I assume
that you do want to get it, so here is how you can get what (I think)
you want to get:

while ( /(\d{3})/g ) {
    if ( /\G(.{$1})/ ) {
        print "$1\n";
    }
}

I think I see that the \G anchors the 2nd regex at the position where
the first regex's match ended - pretty cool.
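
To make the \G behaviour concrete, here is a hedged sketch that walks a whole record using \G with the /gc flags, so that every match has to start exactly where the previous one stopped. parse_record is a hypothetical name, and this is a variation on Tad's loop rather than his exact code: it also puts /gc on the field match, so pos() moves past the field itself and digits inside a field are not mistaken for a length.

```perl
use strict;
use warnings;

# parse_record is a hypothetical helper, not from the thread.
# Returns an array ref (leader first, then each field), or nothing
# if the record is malformed.
sub parse_record {
    my ($record) = @_;
    my @fields;

    # Consume the single leading character; /g in scalar context
    # records where the match ended in pos($record).
    $record =~ /\G(.)/gc or return;
    push @fields, $1;

    # Each pass reads a 3-digit length at pos(), then exactly that many
    # characters. /c keeps pos() in place when a match fails.
    while ( $record =~ /\G(\d{3})/gc ) {
        my $len = $1 + 0;
        $record =~ /\G(.{$len})/gc or return;    # field shorter than declared
        push @fields, $1;
    }

    # Accept the record only if everything was consumed.
    return unless pos($record) == length $record;
    return \@fields;
}

for my $record ( 'c001c003ccc020cccccccccccccccccccc',
                 'c001c005ccccc002cc010ccccccccccX' ) {
    my $fields = parse_record($record);
    print $fields ? "match: @$fields\n" : "no match: $record\n";
}
```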


Mike
 
