PerlFAQ Server
This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
6.2: I'm having trouble matching over more than one line. What's wrong?
Either you don't have more than one line in the string you're looking at
(probably), or else you aren't using the correct modifier(s) on your
pattern (possibly).
There are many ways to get multiline data into a string. If you want it
to happen automatically while reading input, you'll want to set $/
(probably to '' for paragraphs or "undef" for the whole file) to allow
you to read more than one line at a time.
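As a minimal sketch of what $/ changes, the following reads the same
text from an in-memory filehandle in line, paragraph, and whole-file
mode and counts the records each mode produces. The sample text and the
helper name count_records are illustrative assumptions, not part of the
FAQ itself.

```perl
use strict;
use warnings;

# Illustrative input: two paragraphs separated by one blank line.
my $text = "one\ntwo\n\nthree\n";

# count_records is a hypothetical helper: it reads $text using the given
# record separator and returns how many records the read produced.
sub count_records {
    my ($separator) = @_;
    local $/ = $separator;    # previous value is restored on return
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

print "line mode:      ", count_records("\n"),  " records\n";
print "paragraph mode: ", count_records(''),    " records\n";
print "whole file:     ", count_records(undef), " records\n";
```

Using "local" confines the change to the enclosing block, so code
elsewhere in the program that expects line-at-a-time input is not
affected.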
Read perlre to help you decide which of "/s" and "/m" (or both) you
might want to use: "/s" allows dot to include newline, and "/m" allows
caret and dollar to match next to a newline, not just at the end of the
string. You do need to make sure that you've actually got a multiline
string in there.
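The difference between the two modifiers can be sketched on a small
two-line string. The string and variable names here are illustrative
assumptions only.

```perl
use strict;
use warnings;

my $record = "alpha\nbeta\n";    # illustrative two-line string

# Without /s, dot refuses to match the newline; with /s it matches it.
my $dot_plain = $record =~ /alpha.beta/  ? 1 : 0;
my $dot_s     = $record =~ /alpha.beta/s ? 1 : 0;

# Without /m, ^ matches only at the start of the string; with /m it
# also matches just after an embedded newline.
my $caret_plain = $record =~ /^beta/  ? 1 : 0;
my $caret_m     = $record =~ /^beta/m ? 1 : 0;

print "dot matches newline without /s: $dot_plain, with /s: $dot_s\n";
print "caret matches mid-string without /m: $caret_plain, with /m: $caret_m\n";
```

The two modifiers are independent, so a pattern that needs both
behaviors can simply use both.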
For example, this program detects duplicate words, even when they span
line breaks (but not paragraph ones). For this example, we don't need
"/s" because we aren't using dot in a regular expression that we want to
cross line boundaries. Neither do we need "/m" because we aren't wanting
caret or dollar to match at any point inside the record next to
newlines. But it's imperative that $/ be set to something other than the
default, or else we won't actually ever have a multiline record read in.
    $/ = '';    # read in whole paragraph, not just one line
    while ( <> ) {
        # a word (letters, digits, underscore, apostrophe, hyphen)
        # followed by one or more repeats of itself
        while ( /\b([\w'-]+)(\s+\g1)+\b/gi ) {
            print "Duplicate $1 at paragraph $.\n";
        }
    }
Here's code that finds lines that begin with "From " (which would be
mangled by many mailers):
    $/ = '';    # read in whole paragraph, not just one line
    while ( <> ) {
        while ( /^From /gm ) {    # /m makes ^ match next to \n
            print "leading from in paragraph $.\n";
        }
    }
Here's code that finds everything between START and END in a file:
    undef $/;    # read in whole file, not just one line or paragraph
    while ( <> ) {
        while ( /START(.*?)END/sgm ) {    # /s makes . cross line boundaries
            print "$1\n";
        }
    }
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.