Need help with Perl regex

surfking

I found this line of code
which was parsing the /etc/termcap file located on a UNIX system.

if (/(^|\|)${term}[:\|]/) {

I used the code from which this line was extracted and it successfully
parsed/extracted the termcap entry for my particular type of terminal.

I realize that the "^" character is used to anchor the pattern match to
the start of a buffer and that enclosing part of a pattern match within
a set of parentheses enables you to retrieve the value of the matched
segment and that "|" is used as a "logical or" operator, but given the
format of entries in the /etc/termcap file, I don't see how this pattern
is successful. Can anyone out there give me some ideas on this?
 
Jim Keenan

surfking said:
I found this line of code
which was parsing the /etc/termcap file located on a UNIX system.

I'm not on such a system, so I can't answer your question. Could you
post a few sample entries from this file? Thanks.

jimk
 
Eric Bohlman

surfking said:
I found this line of code
which was parsing the /etc/termcap file located on a UNIX system.

if (/(^|\|)${term}[:\|]/) {

I used the code from which this line was extracted and it successfully
parsed/extracted the termcap entry for my particular type of terminal.

I realize that the "^" character is used to anchor the pattern match
to the start of a buffer and that enclosing part of a pattern match
within a set of parentheses enables you to retrieve the value of the
matched segment and that "|" is used as a "logical or" operator, but
given the format of entries in the /etc/termcap file, I don't see how
this pattern is successful. Can anyone out there give me some ideas
on this?

Actually, in this case the parentheses are almost certainly being used
simply to set precedence.

Let's spread that regex out a bit, which we can actually do in Perl code
thanks to the "x" modifier:

/           #start regex
(           #begin group that's treated as a unit
  ^         #start of the string
  |         #logical or
  \|        #a literal pipe character
)           #end group
            #so in order to match, it has to either be at the beginning of
            #the line or preceded by a pipe symbol
${term}     #treat whatever is in the variable $term as part of the regex
[:\|]       #a character class: a literal colon or a literal pipe; it stays
            #on one line because the x modifier does not ignore whitespace
            #or comments inside square brackets
/x          #end regex; the "x" lets us put in spaces and comments

So we know that whatever matches has to come either at the beginning of
the line or after a pipe symbol, and it has to end with a colon or a
pipe. The question is, what's in between? We can't know the answer
until we know what's in $term. I can guess (only guess) that it's
simply the name of your terminal and doesn't contain any regex special
characters. If that's the case, then the expression will match any line
in which the name of your terminal appears either at the beginning or
after a pipe, and is immediately followed by either a colon or a pipe.
But again, that's just a guess; if $term contains any regex special
characters, they'll be treated the same as if they had been written out
in the regex.
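
If $term might contain regex special characters, one way to make sure it is matched literally is to wrap it in \Q...\E, which applies quotemeta to the interpolated value. The following is only a minimal sketch: the sub name names_term and the sample lines are made up for illustration, and (?:...) is used instead of plain parentheses because nothing needs to be captured.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line lists $term as its first name
# or as an alias, 0 otherwise.
# \Q...\E quotes regex metacharacters in $term so they match literally.
sub names_term {
    my ($line, $term) = @_;
    return $line =~ /(?:^|\|)\Q$term\E[:|]/ ? 1 : 0;
}

# Made-up sample lines; "a+b" is not a real terminal name.
print names_term('a+b|plus terminal:',  'a+b'), "\n";  # prints 1
print names_term('aab|other terminal:', 'a+b'), "\n";  # prints 0
```

Without \Q...\E the second call would also succeed, because "a+" would be read as "one or more a".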

The perlretut, perlrequick, perlre, and perlreref documents that come
with every Perl distribution are the definitive reference for Perl
regexes. Start with:

perldoc perlretut

and work your way through them.
 
Joe Smith

surfking said:
I found this line of code
which was parsing the /etc/termcap file located on a UNIX system.

if (/(^|\|)${term}[:\|]/) {

[...] format of entries in the /etc/termcap file, I don't see how this pattern
is successful. Can anyone out there give me some ideas on this?

It's designed to match entries like this:

ibmpcx|xenix|ibmx|IBM PC xenix console display:

For $term = 'ibmpcx', /^$term[|]/ matches.
For $term = 'xenix', /\|$term[|]/ matches. Same for 'ibmx'.
For $term = 'IBM PC xenix console display', /\|$term[:]/ matches.
For entries that have no aliases, /^$term[:]/ matches.
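
The four cases above can be checked with a short script. This is a sketch only; the helper name entry_names is made up, and the extra name 'ibm' is added to show that a mere prefix of a name does not match.

```perl
use strict;
use warnings;

# The sample name line from above.
my $entry = 'ibmpcx|xenix|ibmx|IBM PC xenix console display:';

# Hypothetical helper: applies the pattern from the original question.
sub entry_names {
    my ($line, $term) = @_;
    return $line =~ /(^|\|)${term}[:\|]/ ? 1 : 0;
}

for my $term ('ibmpcx', 'xenix', 'ibmx', 'IBM PC xenix console display', 'ibm') {
    printf "%-30s %s\n", $term,
        entry_names($entry, $term) ? 'matches' : 'no match';
}
```

Only 'ibm' reports "no match": it is followed by "p" or "x" in the line, never by a colon or a pipe.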

-Joe
 
Tom Regner

surfking said:
I found this line of code
which was parsing the /etc/termcap file located on a UNIX system. [...]
I realize that the "^" character is used to anchor the pattern match to
the start of a buffer and that enclosing part of a pattern match within
a set of parentheses enables you to retrieve the value of the matched
segment and that "|" is used as a "logical or" operator, but given the
format of entries in the /etc/termcap file, I don't see how this pattern
is successful. Can anyone out there give me some ideas on this?

let's see what matches what:
if (/(^|\|)${term}[:\|]/) {
     ^     ^      ^
     |     |      |
     |     |      |___ 3) colon OR pipe (character class; the
     |     |              backslash only escapes the pipe)
     |     |____ 2) contents of variable $term
     |
     1) beginning of line OR pipe, capturing
        (the pipe is escaped here!)

and one entry to see the format:

v1|xterm-24|xterms|vs100|24x80 xterm:\
:li#24:\
:tc=xterm:


that is a pipe-separated list of terminal names, followed by a colon,
followed by colon-delimited definitions; the backslash at the end of each
line suggests that these three lines are to be read as one long line (the
newlines are escaped).

so let's see:
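--------code
#!/usr/bin/perl
use strict;
use warnings;

my $term = 'vs100';

# Single-quoted heredoc: the backslashes stay literal, as in the file.
my $tc = <<'EOF';
v1|xterm-24|xterms|vs100|24x80 xterm:\
:li#24:\
:tc=xterm:
EOF

# $1 = what precedes the name, $2 = the name, $3 = what follows it
print "matched\n1) $1\n2) $2\n3) $3\n" if $tc =~ /(^|\|)(${term})([:\|])/;
--------/code
(I'm capturing the three numbered parts for clarification only!)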

produces:

[1520]tom@margo perl $ perl test.pl
matched
1) |
2) vs100
3) |
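
Going one step further, here is a rough sketch of looking up a whole entry in multi-line termcap-style text. It assumes entries are continued with a trailing backslash as described above; the sub name find_entry and the first sample entry are made up for illustration, and a real termcap parser would need to handle more than this.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the joined entry that names $term,
# or undef if there is none.
sub find_entry {
    my ($text, $term) = @_;
    $text =~ s/\\\n\s*//g;    # join lines continued with a backslash
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments, blanks
        return $line if $line =~ /(?:^|\|)\Q$term\E[:|]/;
    }
    return;
}

my $termcap = <<'EOF';
# made-up first entry, followed by the entry quoted above
fake1|made-up terminal:\
    :co#80:
v1|xterm-24|xterms|vs100|24x80 xterm:\
    :li#24:\
    :tc=xterm:
EOF

print find_entry($termcap, 'vs100'), "\n";
# prints: v1|xterm-24|xterms|vs100|24x80 xterm::li#24::tc=xterm:
```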


The possibly unfamiliar "${term}" syntax is explained best (at least it's
the best explanation I found :) in perldoc perldata:

--------quote
As in some shells, you can enclose the variable name in braces to
disambiguate it from following alphanumerics (and underscores). You must
also do this when interpolating a variable into a string to separate the
variable name from a following double-colon or an apostrophe, since these
would be otherwise treated as a package separator.
--------/quote
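
A small sketch of what the braces do. perldata also points out that in a pattern, /$foo[bar]/ is ambiguous between a character class and an array subscript, and that braces force the character-class reading, which is probably why the original line writes ${term}[:\|]. The variable values below are made up for illustration.

```perl
use strict;
use warnings;

my $term = 'xterm';

# Without the braces, "$terms" would name a different variable, $terms.
my $plural = "${term}s";
print "$plural\n";    # prints xterms

# In a pattern, ${term}[:|] is the scalar $term followed by a character
# class; it cannot be taken for an element of an array @term.
my $line = 'xterm|vs100:';
print "matched\n" if $line =~ /^${term}[:|]/;    # prints matched
```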

hth,
Tom
 
