Hello,
Assume I have the following string:
my $cmds = <<DOC
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC
;
I want to split it into an array whose first item is "__begin {
abc;
def;
{foo;bar}
} __end", the second item is "__begin {
cde;
} __end", and the third is "abc" and the fourth is "bad".
split obviously cannot be used here, so I use the following regex:
my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);
^^
my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);
You were on the right track. However, [^;]+ also matches whitespace and
newlines, so after each ';' it starts at the newline and grabs everything up
to the next ';', for example "\n__begin {\ncde". A '__begin.*?__end' that is
preceded by a newline is therefore never matched. By also excluding
whitespace from the character class, [^\s;], begin and end have a chance.
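A minimal sketch to see the difference between the two patterns, assuming the sample string from the original post. The array names @greedy and @fixed are only for illustration and do not come from the thread:

```perl
use strict;
use warnings;

# Sample data from the original post (single-quoted heredoc, no interpolation).
my $cmds = <<'DOC';
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC

# Original attempt: [^;]+ accepts newlines, so after the first ';' it starts
# at "\n" and consumes "\n__begin {\ncde" before __begin can be tried.
my @greedy = ($cmds =~ /__begin.*?__end|[^;]+/sg);

# With whitespace excluded, nothing matches at "\n"; the engine moves on one
# character and the __begin alternative gets its turn.
my @fixed = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);

print "[^;]+    second item: [$greedy[1]]\n";
print "[^\\s;]+ second item: [$fixed[1]]\n";
```

Only the first block survives intact with the original pattern, because the string happens to start with __begin.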
You are right. Thanks for your explanation. My sample was
oversimplified. A standalone sentence may contain other words and spaces,
as in the following test data:
my $cmds = <<DOC
__begin {
abc sss;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc kkk;
bad fde;
DOC
;
Your solution gives the following Dumper result:
$VAR1 = '__begin {
abc sss;
def;
{foo;bar}
} __end';
$VAR2 = '__begin {
cde;
} __end';
$VAR3 = 'abc';
$VAR4 = 'kkk';
$VAR5 = 'bad';
$VAR6 = 'fde';
That's too bad. You made a good attempt, and I gave
you credit by saying you almost had it right the first time.
And the regex was altered only slightly from what you yourself tried.
I didn't write a regex for you, because if I did, you could
always come back and say, for example:
#>You are right. Thanks for your explanation. My sample was
#>oversimplified. A standalone sentence may contain other words and spaces,
#>as in the following test data ...
But you didn't say that in the first place.
that's not what I want. Apart from John's solution, I have no other
solution.
^^^^^^^^^^^^^
Think again ... you just invalidated his regex.
my @lines = $str =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;
$lines[0] =
" __begin {
abc sss;
def;
{foo;bar}
} __end;"
$lines[1] =
" __begin {
cde;
} __end;"
What are you going to do now?
We're still in the extremely simple stage.
In fact, the more you add, the simpler it gets.
sln
-------------------------------
Version 2
#################
# Misc Parse 2
#################
use strict;
use warnings;
# the old sample (not used below; assign it to $str to test it)
my $cmds1 = <<DOC1
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC1
;
# the new
my $cmds2 = <<DOC2
__begin {
abc sss;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc kkk;
bad fde;
DOC2
;
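# Pick the sample to parse.
my $str = $cmds2;
# Skip leading whitespace, then capture either a whole __begin ... __end block
# or the shortest run of text up to the next ';'. /s lets '.' match newlines.
my @lines = ($str =~ /\s*(__begin.*?__end|.*?);/sg);
for my $i (0 .. $#lines) {
    print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
}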
__END__
output:
$lines[0] =
"__begin {
abc sss;
def;
{foo;bar}
} __end"
$lines[1] =
"__begin {
cde;
} __end"
$lines[2] =
"abc kkk"
$lines[3] =
"bad fde"
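
The Version 2 script only runs the new sample. As a rough check that the same pattern still handles the old one, here is a small sketch that wraps it in a sub and feeds it both strings. The name split_cmds and the variables $old and $new are hypothetical, chosen for this example only:

```perl
use strict;
use warnings;

# split_cmds is a hypothetical helper, not part of the posted script.
# In list context a /g match with one capture group returns all captures.
sub split_cmds {
    my ($text) = @_;
    return $text =~ /\s*(__begin.*?__end|.*?);/sg;
}

# The old sample from the first post.
my $old = <<'DOC1';
__begin {
abc;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc;
bad;
DOC1

# The new sample, where standalone statements contain spaces.
my $new = <<'DOC2';
__begin {
abc sss;
def;
{foo;bar}
} __end;
__begin {
cde;
} __end;
abc kkk;
bad fde;
DOC2

for my $sample ($old, $new) {
    my @items = split_cmds($sample);
    print scalar(@items), " items\n";
    print "  [$_]\n" for @items;
}
```

Note that the pattern relies on every item, including each __end, being followed by a ';'. Input that breaks this assumption would need a different approach.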