inputting the ephemerides

sln

The dollar-digit variables are only set when the pattern match *succeeds*.

Therefore, you should never use the dollar-digit variables unless
you have first ensured that the match in question succeeded:

if ( /(\w+)\W*(\d2).*(\d2).*(\d2)\W*([-|+]\d+).*(\d+\.\d+).*(\d+\.\d+)
.*(-*\d+\.\d+).*(-*\d+\.\d+)\W*(\w+)\W*/x ) {
print "$1\n";
}
else {
print "match failed!\n";
}


I doubt that \d2 does what you think it does.

It matches 2-digit strings where the 2nd digit is a "2".

You probably want \d{2} instead?

I doubt that [-|+] does what you think it does. It matches any of 3
characters: vertical bar, plus sign, minus sign.

You probably want [+-] instead.

You probably want \W+ rather than \W*

You probably want .*? rather than .*
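
The differences between these near-identical patterns are easy to check in isolation. What follows is only a minimal sketch: the helper name matches_fully and the test strings are made up for illustration and do not come from the ephemeris file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text matches $pattern from start to end.
sub matches_fully {
    my ($pattern, $text) = @_;
    return $text =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

my @cases = (
    [ '\d2',   '19' ],  # a digit followed by a literal "2": no match
    [ '\d2',   '42' ],  # match
    [ '\d{2}', '19' ],  # exactly two digits: match
    [ '[-|+]', '|'  ],  # the bar is just another member of the class: match
    [ '[+-]',  '|'  ],  # no match
    [ '[+-]',  '-'  ],  # match
);

for my $case (@cases) {
    my ($pattern, $text) = @$case;
    printf "%-6s vs %-4s : %s\n", $pattern, "'$text'",
        matches_fully($pattern, $text) ? 'match' : 'no match';
}
```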

but I don't have output yet.:-(


Don't try to do it all at once. Get it working a little at a time:

/(\w+)\W+/

/(\w+)\W+(\d{2})/

/(\w+)\W+(\d{2}).*?(\d{2})/

etc...


Note that all of this is moot, because pattern matching is
not the Right Tool for what you are trying to accomplish...

I'm inclined to think that this is the way. With your changes, I'm getting
real clean input until I get halfway through:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {


/(\w+)\W+/;
/(\w+)\W+(\d{2})/;

/(\w+)\W+(\d{2}).*?(\d{2})/;

/(\w+)\W+(\d{2}).*?(\d{2}).*?(\d{2})/;
/(\w+)\W+(\d{2}).*?(\d{2}).*?(\d{2}).*?([-+]\d{2})/;


[snip]

I'm falling down laughing. This is better than the Comedy channel.

sln
 
Larry Gates

I'm falling down laughing. This is better than the Comedy channel.

Apparently this regex has a side effect that involves a catastrophic loss
of seating stability.

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {


/(\w+)\W+(\d{2}).*?(\d{2}).*?(\d{2}).*?([-+]\d{2}).*?(\d+\.\d+)/;

print "string one is $1\n";
print "string two is $2\n";
print "string three is $3\n";
print "string four is $4\n";
print "string five is $5\n";
print "string six is $6\n";
print $_;
}
close($fh)

# perl faulk14.pl


C:\MinGW\source>perl faulk14.pl
string one is Sun
string two is 19
string three is 43
string four is 51
string five is -21
string six is 17.8
Sun 19h 43m 51s -21 17.8' 0.984 -35.020 87.148 Set
....
string one is Pluto
string two is 18
string three is 40
string four is 17
string five is -52
string six is 108.052
Pluto 18h 6m 40s -17 44.9' 32.485 -52.833 108.052 Set
C:\MinGW\source>

Still having trouble picking up -17 with ([-+]\d{2}).
--
larry gates

Besides, it's good to force C programmers to use the toolbox occasionally.
:)
-- Larry Wall in
<[email protected]>
 
Jim Gibson

Larry Gates said:
I'm falling down laughing. This is better than the Comedy channel.

Apparently this regex has a side effect that involves a catastrophic loss
of seating stability.

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {


/(\w+)\W+(\d{2}).*?(\d{2}).*?(\d{2}).*?([-+]\d{2}).*?(\d+\.\d+)/;

print "string one is $1\n";
print "string two is $2\n";
print "string three is $3\n";
print "string four is $4\n";
print "string five is $5\n";
print "string six is $6\n";
print $_;
}
close($fh)

# perl faulk14.pl


C:\MinGW\source>perl faulk14.pl
string one is Sun
string two is 19
string three is 43
string four is 51
string five is -21
string six is 17.8
Sun 19h 43m 51s -21 17.8' 0.984 -35.020 87.148 Set
...
string one is Pluto
string two is 18
string three is 40
string four is 17
string five is -52
string six is 108.052
Pluto 18h 6m 40s -17 44.9' 32.485 -52.833 108.052 Set
C:\MinGW\source>

Still having trouble picking up -17 with ([-+]\d{2}).

You are actually having trouble extracting '6' from '6m' with (\d{2}),
which causes all of your other matches to shift up one. You are better
off matching a string of digits with (\d+), rather than insisting upon
a specific number. If not, you should use (\d{1,2}).
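
To make the shift visible, here is a small sketch that runs both versions of the pattern against the Pluto line from the output above. The two sub names are made up; the patterns are the posted one and the same pattern with \d{2} changed to \d{1,2}.

```perl
use strict;
use warnings;

# The posted pattern: every numeric field must have exactly two digits.
sub fields_two_digits {
    my ($text) = @_;
    return $text =~ /(\w+)\W+(\d{2}).*?(\d{2}).*?(\d{2}).*?([-+]\d{2}).*?(\d+\.\d+)/;
}

# Same pattern, but one or two digits are accepted.
sub fields_one_or_two_digits {
    my ($text) = @_;
    return $text =~ /(\w+)\W+(\d{1,2}).*?(\d{1,2}).*?(\d{1,2}).*?([-+]\d{1,2}).*?(\d+\.\d+)/;
}

my $line = "Pluto 18h 6m 40s -17 44.9' 32.485 -52.833 108.052 Set";

# "6m" cannot satisfy \d{2}, so the lazy .*? skips ahead to "40" and
# every later capture lands one field too far to the right.
print join('|', fields_two_digits($line)), "\n";        # Pluto|18|40|17|-52|108.052
print join('|', fields_one_or_two_digits($line)), "\n"; # Pluto|18|6|40|-17|44.9
```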
 
Larry Gates

You are actually having trouble extracting '6' from '6m' with (\d{2}),
which causes all of your other matches to shift up one. You are better
off matching a string of digits with (\d+), rather than insisting upon
a specific number. If not, you should use (\d{1,2}).

Thanks, Jim, that puts me back on track.

My next hard match is ° 59.3', as in:
Mercury 20h 36m 41s -16° 59.3'

What I have is *(\d{1,2}\.\d{1}) No amount of mixing symbols worked on
this one.

There's also trouble with the moon, as it has an extra ER that the others
don't have. This is the current script:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {



/(\w+)\W+(\d{1,2}).*?(\d{1,2}).*?(\d{1,2}).*?([-+]\d{1,2}).*(\d{1,2}\.\d{1})/;

print "string one is $1\n";
print "string two is $2\n";
print "string three is $3\n";
print "string four is $4\n";
print "string five is $5\n";
print "string six is $6\n";
print "string seven is $7\n";
print $_;
}
close($fh)

# perl faulk16.pl


C:\MinGW\source>perl faulk16.pl
string one is Sun
string two is 19
string three is 43
string four is 51
string five is -21
string six is 7.1
string seven is
Sun 19h 43m 51s -21 17.8' 0.984 -35.020 87.148 Set
....
Moon 10h 24m 21s +7 29.5' 58.6 ER -4.992 -102.785 Set
....
string one is Pluto
string two is 18
string three is 6
string four is 40
string five is -17
string six is 8.0
string seven is
Pluto 18h 6m 40s -17 44.9' 32.485 -52.833 108.052 Set
C:\MinGW\source>
^^^^^^^
I wonder why ° doesn't display.
--
larry gates

I suppose you could switch grammars once you've seen "use strict subs".
:)
-- Larry Wall in <[email protected]>
 
Larry Gates

perldoc -f split

I couldn't get any input with split. I'm working up a data set that is
properly spaced and columnated.
Assuming you don't mean
perl -p -e 's/[^\s\d]//g' ephemerides.txt

What does this mean? I've been looking at regexes all day and am reminded
of days when I had to consume Cyrillic data.

How would you get rid of the ° and the ' and leave the -44.6 in the
following:
° -44.6'
?

--
larry gates

*** The previous line contains the naughty word "$&".\n
if /(ibm|apple|awk)/; # :)
-- Larry Wall in the perl man page
 
Tad J McClellan

Larry Gates said:
Assuming you don't mean
perl -p -e 's/[^\s\d]//g' ephemerides.txt

What does this mean?


What did you observe when you tried it?

It deletes all characters except for whitespace and digit characters.

perl -p -e 's/[^\s\d]+//g' ephemerides.txt

does the same thing, only faster.

I haven't benchmarked it, but I'd expect this to be faster still:

perl -p -e 'tr/ \n\r\t\f0123456789//dc' ephemerides.txt
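
Note that all three commands also delete minus signs and decimal points, since those are neither whitespace nor digits. A small sketch of the difference, using one line of the sample data; the two helper names are made up, and the second variant is an addition for comparison rather than something proposed above.

```perl
use strict;
use warnings;

# Delete everything except whitespace and digits (same effect as the one-liners).
sub keep_digits {
    my ($line) = @_;
    (my $copy = $line) =~ tr/ \n\r\t\f0-9//dc;
    return $copy;
}

# Variant that also keeps ".", "+" and "-". The hyphen is escaped so that
# tr does not read it as a range operator.
sub keep_numbers {
    my ($line) = @_;
    (my $copy = $line) =~ tr/ \n\r\t\f0-9.+\-//dc;
    return $copy;
}

my $line = "Sun 19h 43m 51s -21 17.8' 0.984 -35.020 87.148 Set\n";
print keep_digits($line);   # prints " 19 43 51 21 178 0984 35020 87148 "
print keep_numbers($line);  # prints " 19 43 51 -21 17.8 0.984 -35.020 87.148 "
```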

How would you get rid of the ° and the ' and leave the -44.6 in the
following:
° -44.6'


s/[°']//g;

or do it faster without using any regex at all:

tr/°'//d; # tr/// does not use regular expressions

Perhaps you messed up the specification and are really looking for:

tr/°' //d;
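
A runnable sketch of the tr/// approach. It writes the degree sign as \x{B0} so that it does not depend on how the script file is saved, and it assumes the input has already been decoded to characters (in raw UTF-8 bytes the degree sign would be the two bytes 0xC2 0xB0). The helper name and its second argument are made up for illustration.

```perl
use strict;
use warnings;

# Remove the degree sign (U+00B0) and the apostrophe; optionally spaces too.
sub strip_marks {
    my ($text, $strip_spaces) = @_;
    (my $copy = $text) =~ tr/\x{B0}'//d;
    $copy =~ tr/ //d if $strip_spaces;
    return $copy;
}

my $text = "\x{B0} -44.6'";
print "[", strip_marks($text),    "]\n";  # prints "[ -44.6]"
print "[", strip_marks($text, 1), "]\n";  # prints "[-44.6]"
```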



When _you_ look at the string, how do _you_ identify the part
that is "interesting"?

Once you can describe that well enough, then writing a pattern
that does it becomes easy or at least possible.


I'll assume that string is in $_.

Keep whatever looks like a number?

($num) = /(-?\d+\.?\d*)/;

(see: perldoc -q "scalar is a number")

Keep a number that comes after whitespace?

($num) = /\s(-?\d+\.?\d*)/;

Keep whatever non-whitespace comes before a "'"?

($num) = /(\S+)'/;

Keep whatever is between whitespace and "'"?

($num) = /\s(\S.*)'/;
or
($num) = /\s+(.+)'/;

Keep whatever comes after whitespace except the last character?

($num) = /\s+(.+)./;

And on and on and on...


You must first identify how the "interesting" part is distinguished
before you can devise a pattern that will match it.
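
Run against the sample string, all of the candidate patterns above happen to return the same thing, which is exactly why the choice has to come from knowing the data rather than from one example. A sketch that tries each of them (the helper name is made up, and the degree sign is written as \x{B0} to keep the source ASCII-only):

```perl
use strict;
use warnings;

my $text = "\x{B0} -44.6'";   # degree sign, space, number, apostrophe

# The candidate patterns from above, compiled with qr//.
my @patterns = (
    qr/(-?\d+\.?\d*)/,
    qr/\s(-?\d+\.?\d*)/,
    qr/(\S+)'/,
    qr/\s(\S.*)'/,
    qr/\s+(.+)'/,
    qr/\s+(.+)./,
);

# Return the first capture, or undef if the pattern does not match.
sub first_capture {
    my ($pattern, $string) = @_;
    my ($capture) = $string =~ $pattern;
    return $capture;
}

for my $pattern (@patterns) {
    my $num = first_capture($pattern, $text);
    print defined $num ? "'$num'" : 'no match', "\n";   # each line prints '-44.6'
}
```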
 
Jürgen Exner

Larry Gates said:
How would you get rid of the ° and the ' and leave the -44.6 in the
following:
° -44.6'

One way:
my $t = '° -44.6\'';
my $s = substr($t, 1, length($t)-2);

Another way:
my @s = split(//,'° -44.6\'');
my $s = join('', @s[1..$#s-1]);


Or you could play the old
chop
reverse
chop
reverse
trick.

I'm sure there are many more ways to remove the first and last character
of a string.

jue
 
Larry Gates

Larry Gates said:
Assuming you don't mean
perl -p -e 's/[^\s\d]//g' ephemerides.txt

What does this mean?


What did you observe when you tried it?

It deletes all characters except for whitespace and digit characters.

perl -p -e 's/[^\s\d]+//g' ephemerides.txt

does the same thing, only faster.

I haven't benchmarked it, but I'd expect this to be faster still:

perl -p -e 'tr/ \n\r\t\f0123456789//dc' ephemerides.txt

My shell might need some tweaking:

C:\MinGW\source>perl -p -e 's/[^\s\d]+//g' eph6.txt
Sun 19h 43m 51s -21 17.8' 0.984 -35.020 87.148 Set
Mercury 20h 36m 41s -16 59.3' 0.747 -22.075 84.236 Set
Venus 22h 51m 18s -7 46.9' 0.691 10.142 72.919 Up
Moon 10h 24m 21s +7 29.5' 58.6 ER -4.992 -102.785 Set
Mars 18h 58m 51s -23 33.8' 2.398 -45.280 90.860 Set
Jupiter 20h 17m 22s -20 8.1' 6.082 -27.618 83.843 Set
Saturn 11h 32m 29s +5 16.0' 8.806 -19.672 -111.729 Set
Uranus 23h 23m 12s -4 46.5' 20.638 18.211 70.235 Up
Neptune 21h 41m 17s -14 13.9' 30.892 -7.527 77.864 Set
Pluto 18h 6m 40s -17 44.9' 32.485 -52.833 108.052 Set
C:\MinGW\source> perl -p -e 'tr/ \n\r\t\f0123456789//dc' eph6.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.

C:\MinGW\source>

How would you get rid of the ° and the ' and leave the -44.6 in the
following:
° -44.6'


s/[°']//g;

or do it faster without using any regex at all:

tr/°'//d; # tr/// does not use regular expressions

Perhaps you messed up the specification and are really looking for:

tr/°' //d;



When _you_ look at the string, how do _you_ identify the part
that is "interesting"?

Once you can describe that well enough, then writing a pattern
that does it becomes easy or at least possible.


I'll assume that string is in $_.

Keep whatever looks like a number?

($num) = /(-?\d+\.?\d*)/;

(see: perldoc -q "scalar is a number")

Keep a number that comes after whitespace?

($num) = /\s(-?\d+\.?\d*)/;

Keep whatever non-whitespace comes before a "'"?

($num) = /(\S+)'/;

Keep whatever is between whitespace and "'"?

($num) = /\s(\S.*)'/;
or
($num) = /\s+(.+)'/;

Keep whatever comes after whitespace except the last character?

($num) = /\s+(.+)./;

And on and on and on...


You must first identify how the "interesting" part is distinguished
before you can devise a pattern that will match it.

Thanks, Tad. I'll stew on these tonight and see if I can get something
figured out.

Right now, I seem to have a choice

while (<$fh>) {

/ big honking regex/
}

or I've got a $line, and I don't know how to get it into 8 different
strings. I tried to do your thing where you said just to do one step, and
the lippy Hollander said it made him laugh his ass off. On the other side
of the learning curve, I find it less than something worth a guffaw.

If I were going to start over, I would replace tabs with spaces, ° with a
space, ER with a space, and then I'd go over it again. Once I had
columns of spaces between all the players, then I would
my( $name, $hour, $min, $sec, ... ) =
unpack('A8 A4 A4 A4 ... ',$line);

, because then there *does* exist an integer sequence that works.
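
A sketch of that fixed-width idea, limited to the four columns spelled out in the fragment above. The sample line is built with sprintf so that it really is padded to those widths; this says nothing about the actual layout of eph6.txt, and the sub name is made up.

```perl
use strict;
use warnings;

# Cut a space-padded line into name, hours, minutes and seconds columns.
# The "A" format strips trailing spaces from each field.
sub cut_columns {
    my ($line) = @_;
    return unpack('A8 A4 A4 A4', $line);
}

my $line = sprintf('%-8s%-4s%-4s%-4s', 'Pluto', '18h', '6m', '40s');
my ($name, $hour, $min, $sec) = cut_columns($line);
print "$name|$hour|$min|$sec\n";   # prints "Pluto|18h|6m|40s"
```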

The fortran folks think perl is great stuff.
--
larry gates

Life gets boring, someone invents another necessity, and once again we
turn the crank on the screwjack of progress hoping that nobody gets
screwed.
-- Larry Wall in <[email protected]>
 
RedGrittyBrick

Larry said:
perldoc -f split

I couldn't get any input with split. I'm working up a data set that is
properly spaced and columnated.
Assuming you don't mean
perl -p -e 's/[^\s\d]//g' ephemerides.txt

What does this mean? I've been looking at regexes all day and am reminded
of days when I had to consume Cyrillic data.

How would you get rid of the ° and the ' and leave the -44.6 in the
following:
° -44.6'
?

I wouldn't use your approach at all

I'd start with something like this ...



#!perl
use strict;
use warnings;
use Encode qw(encode decode);

while (<>) {
my $line = decode('UTF-8', $_);

chomp $line;
(my $body = substr $line, 0, 8) =~ s/\s+$//;
(my $time = substr $line, 8, 12) =~ s/[hms]\s+/:/g;
(my $angle = substr $line, 24, 12) =~ s/[^\s\d]+//g;
(my $hat = substr $line, 40, 8) =~ s/\s+$//;
(my $debt = substr $line, 48, 8) =~ s/\s+$//;
(my $shoe = substr $line, 56, 8) =~ s/\s+$//;
(my $mood = substr $line, 64) =~ s/\s+//g;

my ($degrees, $minutes) = split(/\s+/, $angle, 2);
$angle = $degrees + $minutes/60;

print
"Body = '", $body, "'\n",
"Time = '", $time, "'\n",
"Angle = '", $angle, "'\n",
"Hat size = '", $hat, "'\n",
"National debt = '", $debt, "'\n",
"Shoe size = '", $shoe, "'\n",
"Mood = '", $mood, "'\n",
"\n";

}


C:\temp>perl eph.pl eph6.txt
Body = 'Sun'
Time = '19:43:51:'
Angle = '23.9666666666667'
Hat size = '0.984'
National debt = '-35.020'
Shoe size = '87.148'
Mood = 'Set'

Body = 'Mercury'
Time = '20:36:41:'
Angle = '25.8833333333333'
Hat size = '0.747'
National debt = '-22.075'
Shoe size = '84.236'
Mood = 'Set'

My script has numerous problems, for example, if your file isn't UTF-8
or has a Byte-Order Mark at the start. But it's where I'd start.
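
One of those problems is visible in the output: s/[^\s\d]+//g removes the sign and the decimal point along with the degree and minute marks, so the Sun's angle comes out as 21 + 178/60 = 23.9667 instead of being computed from -21° 17.8'. A sketch of a conversion that keeps both, assuming the declination text looks like the samples in this thread (the function name is hypothetical):

```perl
use strict;
use warnings;

# Convert a declination such as "-21 17.8'" to signed decimal degrees.
# Returns an empty list (undef in scalar context) if the text does not match.
sub declination_to_degrees {
    my ($text) = @_;
    my ($sign, $deg, $min) = $text =~ /([-+]?)(\d+)\D+?(\d+(?:\.\d+)?)/
        or return;
    my $value = $deg + $min / 60;
    return $sign eq '-' ? -$value : $value;
}

printf "%.4f\n", declination_to_degrees("-21 17.8'");   # prints -21.2967
printf "%.4f\n", declination_to_degrees("+7 29.5'");    # prints 7.4917
```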
 
Tad J McClellan

Larry Gates said:
On Wed, 21 Jan 2009 21:27:34 -0600, Tad J McClellan wrote:


My shell might need some tweaking:

Can't find string terminator "'" anywhere before EOF at -e line 1.


I think Windoze needs double quotes instead:

perl -p -e "tr/ \n\r\t\f0123456789//dc" ephemerides.txt
 
Peter J. Holzer

Thanks, Jim, that puts me back on track.

My next hard match is ° 59.3' ,as in:
Mercury 20h 36m 41s -16° 59.3'

What I have is *(\d{1,2}\.\d{1})

"*" applies to the expression to the left of it. Starting a pattern with
it makes no sense. If you think that the pattern to the left of your
number is relevant, quote it completely, otherwise omit it.
No amount of mixing symbols worked on
this one.

There's also trouble with the moon, as it has an extra ER that the others
don't have. This is the current script:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {



/(\w+)\W+(\d{1,2}).*?(\d{1,2}).*?(\d{1,2}).*?([-+]\d{1,2}).*(\d{1,2}\.\d{1})/;

Use the /x modifier and comments to make your regex more readable:

/(\w+)\W+ # name
(\d{1,2}) .*? (\d{1,2}) .*? (\d{1,2}) .*? # RA
([-+]\d{1,2}) .* (\d{1,2}\.\d{1}) # declination
/x;

This makes it much easier to see that you match the longest possible
sequence of arbitrary characters (/.*/) in the middle of the
declination. So in
Sun 19h 43m 51s -21░ 17.8' 0.984 -35.020 87.148 Set

/.*/ will match "░ 17.8' 0.984 -35.020 8" and /(\d{1,2}\.\d{1})/
will match "7.1". You probably wanted /.*?/ instead like in RA.

But using /.*?/ when you want to match a fixed string isn't ideal,
either. You *know* that between the hours and minutes of the RA there is
always the string "h ", so you should match it that way:

/(\w+)\W+ # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s .*? # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' # declination
/x;

You also know that your fields are separated by a single tab, so match
that tab:

/(\w+) \t # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s \t # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' \t # declination
/x;
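
If the fields really are separated by single tabs, split can do the first cut, and the patterns then only have to deal with one field at a time. A sketch under that assumption; the sample line is typed in by hand with \t escapes, and the sub and variable names are illustrative.

```perl
use strict;
use warnings;

# Split one tab-separated ephemeris line into its fields.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line;
}

my $line = "Pluto\t18h 6m 40s\t-17\x{B0} 44.9'\t32.485\t-52.833\t108.052\tSet\n";
my ($name, $ra, $dec, $dist, $alt, $az, $state) = split_fields($line);

# Each field can then be taken apart with a much smaller pattern.
my ($h, $m, $s) = $ra =~ /^(\d{1,2})h (\d{1,2})m (\d{1,2})s$/
    or die "unexpected RA field: $ra";

print "$name: RA $h:$m:$s, altitude $alt, $state\n";
# prints "Pluto: RA 18:6:40, altitude -52.833, Set"
```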
Pluto 18h 6m 40s -17░ 44.9' 32.485 -52.833 108.052 Set
C:\MinGW\source>
^^^^^^^
I wonder why ° doesn't display.

° isn't an ASCII character. To read and print the file correctly, you
need to know which character set is used in the file and in your
terminal and convert accordingly.
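
As an illustration of such a conversion: if the file stores the degree sign as the single byte 0xB0 (as in Latin-1) and the console uses code page 437, that byte is shown as a shaded block, because 0xB0 is "░" in CP437 and the degree sign sits at 0xF8 there. Both encodings are guesses about the setup in question, not something stated in this thread. A sketch using the core Encode module (the sub name is made up):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert bytes from the file's encoding to the terminal's encoding.
sub recode {
    my ($bytes, $from, $to) = @_;
    return encode($to, decode($from, $bytes));
}

my $file_bytes  = "-16\xB0 59.3'";   # as stored, assuming Latin-1
my $for_console = recode($file_bytes, 'iso-8859-1', 'cp437');

printf "degree sign byte in file:   0x%02X\n", ord substr($file_bytes, 3, 1);   # 0xB0
printf "degree sign byte for cp437: 0x%02X\n", ord substr($for_console, 3, 1);  # 0xF8
```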

hp
 
Eric Pozharski

"*" applies to the expression to the left of it. Starting a pattern with
it makes no sense. If you think that the pattern to the left of your
number is relevant, quote it completely, otherwise omit it.

Who cares if no-one can?

{8772:15} [1:0]$ perl -wle 'qr{*}'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE
/ at -e line 1.

I assume that LG didn't copy-paste. What a surprise...

*CUT*
 
Larry Gates

On 2009-01-22 00:40, Larry Gates <[email protected]> wrote:
"*" applies to the expression to the left of it. Starting a pattern with
it makes no sense. If you think that the pattern to the left of your
number is relevant, quote it completely, otherwise omit it.
ok
Use the /x modifier and comments to make your regex more readable:

/(\w+)\W+ # name
(\d{1,2}) .*? (\d{1,2}) .*? (\d{1,2}) .*? # RA
([-+]\d{1,2}) .* (\d{1,2}\.\d{1}) # declination
/x;

This makes it much easier to see that you match the longest possible
sequence of arbitrary characters (/.*/) in the middle of the
declination. So in
Sun 19h 43m 51s -21░ 17.8' 0.984 -35.020 87.148 Set

/.*/ will match "░ 17.8' 0.984 -35.020 8" and /(\d{1,2}\.\d{1})/
will match "7.1". You probably wanted /.*?/ instead like in RA.

But using /.*?/ when you want to match a fixed string isn't ideal,
either. You *know* that between the hours and minutes of the RA there is
always the string "h ", so you should match it that way:

/(\w+)\W+ # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s .*? # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' # declination
/x;

You also know that your fields are separated by a single tab, so match
that tab:

/(\w+) \t # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s \t # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' \t # declination
/x;

Thanks, peter, I've gotten a lot farther now using the above:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";

while (<$fh>) {

/(\w+) \t # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s \t # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' \t # declination
(\d{1,2}\.\d+) \t # no tab before ER distance
([-]\d+\.\d+) \t #altitude
# ([-]\d+\.\d+) #azimuth
# (\w+) #up?
/x;

print "string one is $1\n";
print "string two is $2\n";
print "string three is $3\n";
print "string four is $4\n";
print "string five is $5\n";
print "string six is $6\n";
print "string seven is $7\n";
print "string eight is $8\n";
print "string nine is $9\n";
print "string ten is $10\n";
print $_;
}
close($fh)

# perl faulk19.pl

The output looks much the same, except I've gotten far enough that the ER
in the moon is torpedoing that data.

I was thinking for altitude and azimuth, we could use something like:
+? Match 1 or more times

A couple questions:

q1) Why do you have a backslash after °, h, and m, but not s and ' ?

q2) In your writing above, what would you call it when you mean the
characters between the slashes, like /.*/ ?
° isn't an ASCII character. To read and print the file correctly, you
need to know which character set is used in the file and in your
terminal and convert accordingly.

hp

C:\MinGW\source>perl rgb2.pl
Wide character in print at rgb2.pl line 18, <$fh> line 1.
Body = Tab 1234 $$ ∩┐╜ quick
^^^^^^
Apparently, this is what it looks like as perl output on windows.
 
Larry Gates

I think Windoze needs double quotes instead:

perl -p -e "tr/ \n\r\t\f0123456789//dc" ephemerides.txt

Believe it or not, it represents a day's worth of studying to get the above
ideas hooked up with a binding operator:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";
while (my $line = <$fh>) {
$line =~ s/[^\s\d]+//g;
print $line;
}
close($fh);

print "\n";

open($gh, '<', $filename) or die "cannot open $filename: $!";
while ($line = <$gh>) {
$line =~ tr/ \n\r\t\f0-9.+\-//dc; # hyphen escaped: "+-5" would be read as a range
print $line;
}
close($gh)

19 43 51 21 178 0984 35020 87148
20 36 41 16 593 0747 22075 84236
22 51 18 7 469 0691 10142 72919
10 24 21 7 295 586 4992 102785
18 58 51 23 338 2398 45280 90860
20 17 22 20 81 6082 27618 83843
11 32 29 5 160 8806 19672 111729
23 23 12 4 465 20638 18211 70235
21 41 17 14 139 30892 7527 77864
18 6 40 17 449 32485 52833 108052
19 43 51 -21 17.8 0.984 -35.020 87.148
20 36 41 -16 59.3 0.747 -22.075 84.236
22 51 18 -7 46.9 0.691 10.142 72.919
10 24 21 +7 29.5 58.6 -4.992 -102.785
18 58 51 -23 33.8 2.398 -45.280 90.860
20 17 22 -20 8.1 6.082 -27.618 83.843
11 32 29 +5 16.0 8.806 -19.672 -111.729
23 23 12 -4 46.5 20.638 18.211 70.235
21 41 17 -14 13.9 30.892 -7.527 77.864
18 6 40 -17 44.9 32.485 -52.833 108.052
C:\MinGW\source>

Does perl have a function like rewind in fortran that obviates reopening
the file to get back to the beginning?
--
larry gates

That being said, I think we should immediately deprecate any string
concatenation that combines "19" with "99". :)
-- Larry Wall in <[email protected]>
 
Mart van de Wege

Larry Gates said:
Does perl have a function like rewind in fortran that obviates reopening
the file to get back to the beginning?

perldoc -f seek

Mart
 
RedGrittyBrick

Larry said:
A couple questions:

q1) Why do you have a backslash after °, h, and m, but not s and ' ?

Usually a backslash is /before/ something rather than after it.
q2) In your writing above, what would you call it when you mean the
characters between the slashes, like /.*/ ?

perldoc perlop
C:\MinGW\source>perl rgb2.pl
Wide character in print at rgb2.pl line 18, <$fh> line 1.
Body = Tab 1234 $$ ∩┐╜ quick
^^^^^^
Apparently, this is what it looks like as perl output on windows.

perldoc perluniintro
 
Tad J McClellan

Larry Gates said:
Use the /x modifier and comments to make your regex more readable:
[snip]
You *know* that between the hours and minutes of the RA there is
always the string "h ", so you should match it that way:
^^
^^ match an "h" followed by a space
^^^
^^^ yuck
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' # declination
/x;

/(\w+) \t # name
(\d{1,2}) h\ (\d{1,2}) m\ (\d{1,2}) s \t # RA
([-+]\d{1,2}) °\ (\d{1,2}\.\d{1})' \t # declination
(\d{1,2}\.\d+) \t # no tab before ER distance
([-]\d+\.\d+) \t #altitude
# ([-]\d+\.\d+) #azimuth
# (\w+) #up?
/x;

print "string one is $1\n";
print "string two is $2\n";
print "string three is $3\n";
print "string four is $4\n";
print "string five is $5\n";
print "string six is $6\n";
print "string seven is $7\n";
print "string eight is $8\n";
print "string nine is $9\n";
print "string ten is $10\n";


You can use m// in a list context to get all the captures:

my @matches = / big ol' honkin' regex /;
print "$_ ==> '$matches[$_]'\n" for 0 .. $#matches;
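
Combining the list-context match with a success check might look like the sketch below. The pattern is an adaptation of the one quoted above, not the poster's: treating " ER" after the Moon's distance as optional, allowing any whitespace between fields, and allowing an optional sign on altitude and azimuth are all assumptions made here so that the sample lines match.

```perl
use strict;
use warnings;

# Returns the ten captured fields, or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /
        ^ (\w+) \s+                                      # name
        (\d{1,2}) h [ ] (\d{1,2}) m [ ] (\d{1,2}) s \s+  # RA
        ([-+]\d{1,2}) \D*? [ ] (\d{1,2}\.\d) ' \s+       # declination
        (\d+\.\d+) (?: [ ] ER )? \s+                     # distance
        (-?\d+\.\d+) \s+                                 # altitude
        (-?\d+\.\d+) \s+                                 # azimuth
        (\w+)                                            # Up or Set
    /x;
}

for my $line (
    "Moon 10h 24m 21s +7 29.5' 58.6 ER -4.992 -102.785 Set",
    "not an ephemeris line",
) {
    my @matches = parse_line($line);
    if (@matches) {
        print "$_ ==> '$matches[$_]'\n" for 0 .. $#matches;
    }
    else {
        print "match failed!\n";
    }
}
```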

q1) Why do you have a backslash after °, h, and m, but not s and ' ?


It is not a backslash after "h". It is a backslash before a space character.

Since you're using m//x, space characters are normally ignored. If you
do not want them to be treated normally (ignored) then you must
escape them.

I personally never use /h\ /.

I would instead use either /h[ ]/ or maybe /h\s/.

(Not really. I would really use either /h [ ]/x or /h \s/x :)
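
The effect is easy to see on a fragment of the RA field. A short sketch; the sample string, the labels and the helper name are illustrative.

```perl
use strict;
use warnings;

# Return the two captured numbers, or an empty list on failure.
sub hours_minutes {
    my ($pattern, $text) = @_;
    return $text =~ $pattern;
}

my @candidates = (
    [ 'bare space',   qr/(\d+) h (\d+)/x     ],  # blank ignored: looks for "19h43"
    [ 'escaped',      qr/(\d+) h\  (\d+)/x   ],  # backslash-space is a literal space
    [ 'bracketed',    qr/(\d+) h [ ] (\d+)/x ],  # a space inside a class is literal
    [ 'whitespace',   qr/(\d+) h \s (\d+)/x  ],  # any single whitespace character
);

for my $candidate (@candidates) {
    my ($label, $pattern) = @$candidate;
    my @got = hours_minutes($pattern, "19h 43m");
    printf "%-11s %s\n", $label, @got ? "matched: @got" : 'no match';
}
```

Only the first candidate fails; the other three each capture 19 and 43.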

q2) In your writing above, what would you call it when you mean the
characters between the slashes, like /.*/ ?


A "pattern" (my favorite) or a "regular expression" or a "regex".
 
Peter J. Holzer

It is not a backslash after "h". It is a backslash before a space character.

Since you're using m//x, space characters are normally ignored. If you
do not want them to be treated normally (ignored) then you must
escape them.

I personally never use /h\ /.

I would instead use either /h[ ]/ or maybe /h\s/.

Usually I use \s because most of the time I don't care about the exact
whitespace character, but in this case I didn't want to preach matching
"h " exactly and then matching any whitespace. I considered
using \x{20}, but didn't think of [ ]. You are right, that's the most
readable way.

hp
 
Peter J. Holzer

"*" applies to the expression to the left of it. Starting a pattern with
it makes no sense. If you think that the pattern to the left of your
number is relevant, quote it completely, otherwise omit it.

Who cares if no-one can?

{8772:15} [1:0]$ perl -wle 'qr{*}'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE
/ at -e line 1.

I care. When I answer a posting I want to understand what the problem
is.

I assume that LG didn't copy-paste. What a surprise...

He did copy-paste (as you would have seen yourself if you had read the rest
of my (or his) posting). He just copy-pasted one character too
little (or too much), splitting a sub-pattern in half. Which could
either be just a copy-paste error (indicating sloppy editing) or a
failure to understand how regular expressions are constructed.

hp
 
Larry Gates

perldoc -f seek

Works like a charm:

my $filename = 'eph6.txt';
open(my $fh, '<', $filename) or die "cannot open $filename: $!";
while (my $line = <$fh>) {
$line =~ s/[^\s\d]+//g;
print $line;
}
print "\n";
seek($fh,0,0);
while (my $line = <$fh>) {
$line =~ tr/ \n\r\t\f0-9.+\-//dc; # hyphen escaped: "+-5" would be read as a range
print $line;
}
close($fh);

How does a person know what the appropriate whence value is? Zero seems to
work for me. I really like it when something works right off the bat, and
you don't even know why.
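
For what it's worth, the whence argument says what the offset is measured from: 0 is the start of the file, 1 the current position, 2 the end of the file. The Fcntl module exports these as SEEK_SET, SEEK_CUR and SEEK_END, which reads better than bare numbers. A sketch using an in-memory filehandle so that it runs without eph6.txt (the helper name is made up):

```perl
use strict;
use warnings;
use Fcntl qw(:seek);   # SEEK_SET = 0, SEEK_CUR = 1, SEEK_END = 2

# Read every line, rewind to the start, and read every line again.
sub read_twice {
    my ($fh) = @_;
    my @first = <$fh>;
    seek($fh, 0, SEEK_SET) or die "cannot seek: $!";
    my @second = <$fh>;
    return (\@first, \@second);
}

my $data = "Sun\nMoon\nPluto\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my ($first, $second) = read_twice($fh);
close($fh);

print scalar(@$first), " lines on the first pass, ",
      scalar(@$second), " on the second\n";
# prints "3 lines on the first pass, 3 on the second"
```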
--
larry gates

str->str_pok |= SP_FBM; /* deep magic */
s = (unsigned char*)(str->str_ptr); /* deeper magic */
-- Larry Wall in util.c from the perl source code
 
