There are three major conventions for the end-of-line marker:
"\n", "\r\n", and "\r".
In a variety of situations, Perl must split strings into "lines",
and must therefore follow a particular convention to identify line
boundaries. Three of these situations interest me in particular:
(1) the splitting into lines that happens when one iterates over a
file using the <> operator; (2) the operation performed by chomp;
and (3) the meaning of the $ anchor in regular expressions.
These three issues are tested by the following simple script:
my $lines = my $matches = 0;
while (<>) {
    $lines++;
    if (/z$/) {
        $matches++;
        chomp;
        print ">$_<";
    }
}
print "$/$matches matches out of $lines lines$/";
__END__
I have three files, unix.txt, dos.txt, and mac.txt, each containing
four lines. Disregarding the end-of-line character(s) these lines
are "foo", "bar", "baz", "frobozz".
The file unix.txt uses "\n" to separate the lines. The output that
I get when I pass it as the argument to the script is this:
% demo.pl unix.txt
>baz<>frobozz<
2 matches out of 4 lines
The file dos.txt uses "\r\n" to separate lines, and the file mac.txt
uses "\r". Here's the output I get when I pass these files to the
script:
% demo.pl dos.txt
0 matches out of 4 lines
% demo.pl mac.txt
0 matches out of 1 lines
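
The zero match counts can be reproduced without any files. The
following is only a minimal sketch, assuming the default $/ of "\n";
the %sample hash and its strings are made up for illustration and
stand in for a single line of each file.

```perl
use strict;
use warnings;

# Hypothetical sample data: the same word under each line-ending convention.
my %sample = (
    unix => "baz\n",
    dos  => "baz\r\n",
    mac  => "baz\r",
);

for my $name (qw(unix dos mac)) {
    my $line = $sample{$name};

    # $ matches at the very end of the string or just before a final "\n";
    # it does not treat "\r" specially.
    my $matched = ($line =~ /z$/) ? "match" : "no match";

    # chomp removes one trailing copy of $/ (here "\n") and returns
    # the number of characters it removed.
    my $removed = chomp(my $copy = $line);

    printf "%-4s /z\$/: %-8s  chomp removed %d, %d chars left\n",
        $name, $matched, $removed, length $copy;
}
```

Only the "\n" case matches. In the "\r\n" case chomp leaves the "\r"
behind, and in the "\r" case it removes nothing at all.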
How can I change the script so that the output for unix.txt, dos.txt,
and mac.txt will be the same as the one shown above for unix.txt?
(By mucking with the value of $/, I was able to get <> to split the
input stream at the right places, but that had no impact on the
result of the regular expression match.)
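
A sketch of the kind of $/ experiment meant here, assuming $/ is set
to "\r" for Mac-style input. The in-memory string $mac is a
hypothetical stand-in for mac.txt, not the actual file, and the
setting shown is an assumption about what was tried.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mac.txt, kept in memory so the sketch is
# self-contained.
my $mac = "foo\rbar\rbaz\rfrobozz\r";
open my $fh, '<', \$mac or die "cannot open in-memory file: $!";

my ($lines, $matches) = (0, 0);
{
    # Assumed setting: changes how <$fh> splits records (and what chomp
    # removes), but has no effect on the regex anchor $.
    local $/ = "\r";
    while (my $line = <$fh>) {
        $lines++;
        # Each record still ends in "\r", so z is not at the end of the
        # string and the match fails.
        $matches++ if $line =~ /z$/;
    }
}
close $fh;

print "$matches matches out of $lines lines\n";
```

With this setting the line count comes out right, but the match
count stays at zero, which is the behavior described above.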
TIA!
kynn