Brian McCauley said:
Scott said:
"delete text delimited by * and ; respectively, including the delimiters,
unless such delimiters are contained within a quoted string"
... negative lookbehind requires fixed-length patterns (bummer).
A common trick is to re-express the problem as "delete text delimited by *
and ; respectively, including the delimiters, only if such delimiters are
preceded by an even number of quotes".
use strict;
use warnings;
my @strings =
( 'titles "statement";',
'titles "statement"; * comment ;',
"titles '* ; statement';",
"titles '* ; statement'; * comment ;",
);
for ( @strings ) {
s/^((?:(?:[^']*'){2})*[^']*)\*[^;]*;/$1/;
print "$_\n";
}
__END__
The above only removes one comment per string. To remove them all, you can
put the s/// in a loop until it returns false, or use \G and /g.
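
A minimal sketch of the loop variant, assuming the same single-quote-only rule as the substitution above. The name strip_star_comments is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat the substitution until it stops matching.
sub strip_star_comments {
    my ($line) = @_;
    # $1 is a prefix holding an even number of single quotes, so the '*'
    # that follows it lies outside any single-quoted string.
    1 while $line =~ s/^((?:(?:[^']*'){2})*[^']*)\*[^;]*;/$1/;
    return $line;
}

print strip_star_comments("titles '* ; statement'; * one ; * two ;"), "\n";
```

Because [^']* is greedy, each pass removes the last remaining comment first. The whitespace that surrounded the comments is left in place.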
But before you can do either you must first consider the implications of
quote characters appearing between the * and ;. In the above there can be
unbalanced quotes between * and ; and the ; is still seen as the end of
the comment. Is this right?
No. The titles statements consist of:
titles ['"] text ['"] ;
Comments can be delimited by either /* */ or * ; and can either precede or
follow the titles statements.
Within the titles text, any comment delimiter (/* */ * ;) can appear as
text, as well as unbalanced quotes.
Quotes can appear as either:
titles 'text "text" text';
titles "text 'text' text";
titles 'text ''text'' text';
titles "text ""text"" text";
In all these scenarios, I want to remove any comments from the text of the
titles block, but leave the code itself intact.
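
For the four quoting forms listed above, here is a rough sketch of pulling the text out of a single titles statement and collapsing doubled quotes. It assumes the statement has the shape titles ['"] text ['"] ; and nothing else on the line before it. The name titles_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of one titles statement,
# or nothing if the line does not look like a titles statement.
sub titles_text {
    my ($stmt) = @_;
    return unless $stmt =~
        /^\s*titles\s+(?:"((?:[^"]|"")*)"|'((?:[^']|'')*)')\s*;/;
    # $1 is set for a double-quoted title, $2 for a single-quoted one.
    my ($q, $text) = defined $1 ? ('"', $1) : ("'", $2);
    $text =~ s/\Q$q$q\E/$q/g;    # a doubled delimiter stands for one quote
    return $text;
}

print titles_text(q{titles 'text ''text'' text';}), "\n";
print titles_text(q{titles "text 'text' text";}), "\n";
```

The other kind of quote inside the string needs no special handling, since only the delimiter that opened the string can close it.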
Actually, I can live with not covering all scenarios. The most common
"tricky" scenario would be an asterisk in titles text, with either comment
style following, e.g.:
titles "PROC FREQ output of var1*var2"; * var2 may have missing values ;
titles "PROC FREQ output of var1*var2"; /* var2 may have missing values */
BTW, what I am working on is something like POD, but for non-Perl files. My
script extracts structured text and builds a documentation file. Within the
program source code, this structured text can appear within the titles
statements, so the programmer only has to specify the text once: the same
text serves both the titles statement (executable code) and the "POD" output.
A whole extra level of complexity is introduced if you want to consider
both single and double quote characters as marking strings. And yet
another if there is some way to quote quote characters within quoted
strings (other than doubling).
You need to be clear in your mind what you want to do in all possible
cases before you can implement it.
Brian, thank you *so* much for the code you posted. Much appreciated.
Here is a more realistic test case:
use strict;
use warnings;
my @strings1 = (
'1titles "statement";',
'2titles "statement"; * comment ;',
"3titles '* ; statement';",
"4titles '* ; statement'; * comment ;",
'5titles "* ; statement"; * comment ;',
"* comment ; 6titles '* ; statement'; * comment ;",
'7titles "statement"; * Don\'t comment? ;',
"8titles '* ; statement'; * comment ; * another comment? ;",
q{9titles " * This isn't a comment is it?"; * comment ;},
);
my @strings2 = (
'Atitles "statement"; /* comment */',
"Btitles '/* */ statement';",
"Ctitles '/* */ statement'; /* comment */",
"/* comment */ Dtitles '/* */ statement'; /* comment */",
'Etitles "statement"; /* Don\'t comment? */',
'Ftitles "/* */ is this a comment";',
"Gtitles '/* */ statement'; /* comment */ /* another comment? */",
q{Htitles "/* This isn't a comment is it?" */},
);
for ( @strings1 ) {
1 while s/^((?:(?:[^']*'){2})*[^']*)\*[^;]*;/$1/;
print "$_\n";
}
for ( @strings2 ) {
1 while s/^((?:(?:[^']*'){2})*[^']*)\*[^;]*;/$1/;
print "$_\n";
}
__END__
The code you posted (with the addition of the loop) works for all scenarios
except #5 and #9 (see numbers added to titles statements above). As stated
above, double quotes are also valid delimiters, so I will need to figure out
how to add them to the RE. Scenarios #4 & #5 are the most common; I will
need to code for #5, but can live with some of the other scenarios failing.
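
One possible way to bring double quotes in is a different technique from the even-number-of-quotes trick: scan left to right with a single alternation that matches either a whole quoted string (kept) or a comment (dropped). The sketch below is untested against real program files and rests on assumptions: doubled quotes are the only way to embed a quote, and any * outside a quoted string starts a comment. The name strip_comments is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper. A quoted string is consumed as one unit, so comment
# delimiters inside it are never considered as comment starts.
sub strip_comments {
    my ($text) = @_;
    $text =~ s{
        ( "(?:[^"]|"")*"      # double-quoted string, "" is an embedded quote
        | '(?:[^']|'')*'      # single-quoted string, '' is an embedded quote
        )
      | /\*.*?\*/             # block comment
      | \*[^;]*;              # statement comment
    }{ defined $1 ? $1 : '' }gsex;
    return $text;
}

print strip_comments($_), "\n" for (
    '5titles "* ; statement"; * comment ;',
    q{9titles " * This isn't a comment is it?"; * comment ;},
    "Gtitles '/* */ statement'; /* comment */ /* another comment? */",
);
```

With /g, the loop around the substitution is no longer needed. An apostrophe inside a comment is harmless here because the comment is matched before the scan ever reaches the apostrophe.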
I didn't add the scenarios in @strings2 in my previous posts because I was
hoping that, with a solution to @strings1, I could work out how to code
@strings2. My mistake, mea culpa.
The sad fact is, I've come to realize I'm in over my head here regarding these
"fancy" regular expressions (as described here
http://www.unix.org.ua/orelly/perl/prog3/ch05_10.htm). I've got the various
O'Reilly Perl books, including Mastering Regular Expressions, but I'm going
to have to read, re-read, re-read, and hack around until I "master regular
expressions" (as the title suggests :-/)
I don't expect you to do my work, so I'll just have to study and hack around
with this until I get it to work.
I really, *really* appreciate your most helpful replies to my recent posts.
Kind Regards,
Scott