Regex repeating capture

Jay

Howdy,

I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. Following the
asterisk is a multiple-character identifier immediately followed by a
data string of variable length. The input string may contain more than
one identifier anywhere in the string. In all, there are 50+
identifiers to search for, and the asterisk is allowed to be part of the
data string as long as it isn't defined as an identifier (it would be
treated as another identifier at that point).

Here is a simple example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72

I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

I have tried the pattern (?:\*(CZ|fuuu)(.*)), which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72

How can I force it to repeat the capturing?

Thanks,
Jay
 
Paul Lalli

I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. Following the
asterisk is a multiple-character identifier immediately followed by a
data string of variable length. The input string may contain more than
one identifier anywhere in the string. In all, there are 50+
identifiers to search for, and the asterisk is allowed to be part of the
data string as long as it isn't defined as an identifier (it would be
treated as another identifier at that point).

Here is a simple example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72

I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

So CZ and fuuu are your delimiters, but only if preceded by an
asterisk, and you want those delimiters to also be in your results?
I have tried the pattern (?:\*(CZ|fuuu)(.*)),

What does that mean? How did you try it? In a list-context pattern
match? In a split? In a scalar-context pattern match with the /g
option? Please show your actual code, not a tiny piece of it.
which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72

How can I force it to repeat the capturing?

Without knowing what you actually did, there's no way to tell you how
to modify it. I will say that the following seems to produce the
results you were looking for, for the data you gave:

perl -le'
my @fields = split /(\*(?:CZ|fuuu))/, q{*CZ1 2.3 4-56 *fuuuS24364 08 23 72};
s/^\*// for @fields;
print for grep { length } @fields;
'
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

perldoc -f split
perldoc -f grep
perldoc perlretut
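
With 50+ identifiers, the alternation could be built from a list rather than typed by hand. The following is only a sketch under stated assumptions: the @ids list is a made-up subset, and split_fields is a hypothetical helper name, not something from the original post. It relies on the same split behaviour as above (capturing parentheses return the delimiters), so an asterisk that is not followed by a known identifier stays in the data.

```perl
use strict;
use warnings;

# Hypothetical identifier list; the real one would hold all 50+ names.
my @ids = qw(CZ fuuu AAA);

# Longest names first, so a short identifier cannot shadow a longer one
# that shares its prefix; quotemeta guards against regex metacharacters.
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } @ids;

sub split_fields {
    my ($line) = @_;
    # Capturing parentheses make split return the identifiers as well.
    my @fields = split /\*($alt)/, $line;
    shift @fields;             # drop whatever precedes the first identifier
    s/\s+\z// for @fields;     # trim trailing whitespace from each field
    return @fields;
}

print "$_\n" for split_fields('*CZ1 2.3 4-56 *fuuuS24364 08 23 72');
```

For the sample line this returns the identifiers and their data alternately, in input order.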

Paul Lalli
 
Todd

Jay said:
Howdy,

I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. Following the
asterisk is a multiple-character identifier immediately followed by a
data string of variable length. The input string may contain more than
one identifier anywhere in the string. In all, there are 50+
identifiers to search for, and the asterisk is allowed to be part of the
data string as long as it isn't defined as an identifier (it would be
treated as another identifier at that point).

Here is a simple example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72

I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

I have tried the pattern (?:\*(CZ|fuuu)(.*)), which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72

How can I force it to repeat the capturing?

Thanks,
Jay
my $line = '*CZ1 2.3 4-56 *fuuuS24364 08 23 72';
$line =~ /\*(CZ)(.+)\s+\*(fuuu)(.+)\s*$/;

# $1 = CZ
# $2 = 1 2.3 4-56
# $3 = fuuu
# $4 = S24364 08 23 72

Todd
 
Mirco Wahab

Jay said:
I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. Following the
asterisk is a multiple-character identifier immediately followed by a
data string of variable length.
Here is a simple example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72
I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

I have tried the pattern (?:\*(CZ|fuuu)(.*)), which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72
How can I force it to repeat the capturing?

You can force repeated capturing with the /g flag
on the regex.

If I guessed correctly from your riddle, your complete
solution should look something like:

...
my $simple = q{*CZ1 2.3 4-56 *fuuuS24364 08 23 72 *AAA3 44 5-66};
my %hits;

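# [^*]+ takes the data up to the next asterisk; \z is not an anchor
# inside a character class, so it does not belong there.
$hits{$1} = $2 while $simple =~ /\*([a-z]+|[A-Z]+)([^*]+)/g;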

print "$_ ==> $hits{$_}\n" for keys %hits;
...

This would print (on the above data; hash key order may vary):
CZ ==> 1 2.3 4-56
AAA ==> 3 44 5-66
fuuu ==> S24364 08 23 72
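
Note that /g alone does not help the original pattern, because the greedy (.*) consumes the rest of the line on the first match. One possible variant, sketched here with a made-up @ids subset and a hypothetical parse_pairs helper, keeps the explicit identifier list, uses a lazy quantifier, and stops at a lookahead for the next known identifier. That way input order is preserved and a stray asterisk in the data is left alone.

```perl
use strict;
use warnings;

# Hypothetical identifier list; the real one would hold all 50+ names.
my @ids = qw(CZ fuuu AAA);
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } @ids;

sub parse_pairs {
    my ($line) = @_;
    my @pairs;
    # Lazy .*? ends where the lookahead sees "*identifier" or end of string;
    # the \s* in between strips trailing whitespace from the data.
    while ($line =~ /\*($alt)(.*?)\s*(?=\*(?:$alt)|\z)/gs) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

printf "%s ==> %s\n", @$_
    for parse_pairs('*CZ1 2.3 4-56 *fuuuS24364 08 23 72 *AAA3 44 5-66');
```

Returning a list of pairs instead of a hash also keeps repeated identifiers, which a hash keyed on the identifier would overwrite.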


But your problem is not really completely specified ...

Regards

M.
 
