String search/match question

Kevin Potter

I have a string in which I need to find the first occurrence of a
particular known character ("index" comes to mind), but with a
twist.

The problem is that I need to find the first occurrence of this character
that is not enclosed in brackets "[]".

Example string:
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz".

How can I find the position of the first occurrence of the
letter "x" that is not enclosed in bracketed text? (The one I want
is the one directly before the capital YZ.)

I've tried numerous versions, indexing this and that and comparing the
result to the positions of the first and successive brackets, but cannot
come up with a reliable method.

Any thoughts or pointers would be really appreciated. Maybe a fancy
regex would be able to do this?
 
Matija Papec

It's not fancy but it should be reliable. :)

my $s =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";

s/(\[.+?\])/' ' x length $1/ge, $_ = index $_,'x' for my $pos = $s;
print "$pos\n";
 
Bigus

Matija Papec said:
It's not fancy but it should be reliable. :)

my $s =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";

s/(\[.+?\])/' ' x length $1/ge, $_ = index $_,'x' for my $pos = $s;
print "$pos\n";

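I think you might need to add 1 to that if you want a 1-based position
(index counts from 0). It's cleverer than my solution, though:

my $match = "x";
my $str =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
# Non-greedy [^\[]*? so the first unbracketed $match wins, not the last;
# /s lets the trailing .* cross newlines.
my ($ms) = $str =~ /(?:^|\])[^\[]*?\Q$match\E(.*)$/s;
# Prints the 1-based position (38 here); assumes a match was found.
print length($str) - length($ms);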

Bigus
 
Matt Garrish

Matija Papec said:
It's not fancy but it should be reliable. :)

my $s =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";

s/(\[.+?\])/' ' x length $1/ge, $_ = index $_,'x' for my $pos = $s;
print "$pos\n";

It won't work, though, if there are nested brackets. I would prefer
something like the following just to be safe:

my $s =
"abcdefg[extra]hijklm[n[texas]opqrstuvwxYZ]abc[tex]defghijklmnopqrstuvwxyz";

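my $pos = 0;
my $bracket = 0;    # current nesting depth

# /s so that "." also matches newlines and $pos stays in step
foreach my $char ( $s =~ /(.)/gs ) {
    if ( $char eq 'x' && !$bracket ) {
        print "$pos\n";
        last;       # only the first unbracketed match is wanted
    }
    elsif ( $char eq '[' ) {
        $bracket += 1;
    }
    elsif ( $char eq ']' ) {
        $bracket -= 1;
    }
    $pos += 1;
}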


Matt
 
Tore Aursand

The problem is that I need to find the first occurrence of this character
that is not enclosed in brackets "[]".

1. Remove all the characters within brackets.
2. Use 'index'.
 
Anno Siegel

Tore Aursand said:
The problem is that I need to find the first occurrence of this character
that is not enclosed in brackets "[]".

1. Remove all the characters within brackets.
2. Use 'index'.

Removing won't do; you must replace them so that the positions stay the
same. Matija has proposed a solution based on that. One small but relevant
problem is that the replacement character must differ from the one we want
to find. chr( ( 1 + ord $char) % 256) should do, but it ain't pretty.
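
A minimal sketch of the replace-then-index idea, assuming the brackets are not nested. The helper name first_outside and its filler rule are illustrative and not taken from the posts above: each bracketed block is overwritten with a same-length run of a filler character that differs from the target, so every other character keeps its position and index can be applied directly.

```perl
use strict;
use warnings;

# Hypothetical helper: position of the first $char outside [...],
# or -1 if there is none. Assumes brackets are not nested.
sub first_outside {
    my ( $str, $char ) = @_;

    # Any filler works as long as it differs from the target character.
    my $filler = $char eq ' ' ? '_' : ' ';

    # Overwrite each bracketed block (brackets included) with filler of
    # the same length, so every remaining character keeps its position.
    # [^\]]* also matches newlines and empty brackets.
    ( my $masked = $str ) =~ s/(\[[^\]]*\])/$filler x length $1/ge;

    return index $masked, $char;
}

my $s =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
print first_outside( $s, 'x' ), "\n";    # prints 37
```

Working on a copy keeps the original string intact, which matters if the bracketed data is needed afterwards.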

The standard solution is in the pertinent FAQ

How can I split a [character] delimited string except when inside
[character]?

found through "perldoc -q 'except'". It points to a number of modules,
most of them standard, that can be used.

For an ad-hoc solution, this may be one of the rare cases where
character-wise processing is indicated in Perl. It leads to simple
code that covers nested "[]":

$_ =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
my $char = 'x';

my ( $pos, $nest ) = ( 0, 0 );
for ( split // ) {
    $nest++ if $_ eq '[';
    $nest-- if $_ eq ']';
    last if $_ eq $char and not $nest;
    $pos++;
}
print "$pos (", substr( $_, $pos ), ")\n" if $pos < length;

I'd like to note that the seemingly random string "abcdefg[extra]hi..."
is a quite well thought-out bit of test data. It contains a few cases
of "[]"-enclosed "x", which helps explain the intent, as well as two
cases of un-enclosed "x". The one that matters is marked with the only
two upper-case letters, so it's easy to find. Otherwise, it is
uncluttered, the only non-alphabetics being the relevant "[]".

We get to see such a lot of ill-prepared test data, this seems worth
mentioning.

Anno
 
Kevin Potter

All EXCELLENT suggestions! Thanks for the help! This will get me going!
-Kevin-
 
John W. Krahn

perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37



John
 
Anno Siegel

John W. Krahn said:
perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37

That fails when no "[" follows the valid "x".
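
To illustrate, here is a small sketch with a made-up string in which nothing bracketed follows the unenclosed "x". The lookahead then has no "[" to find, so the pattern does not match at all. The variable names are illustrative.

```perl
use strict;
use warnings;

# Made-up test string: the only unenclosed "x" (offset 7) is not
# followed by any "[".
my $str = "abc[x]dxe";

# $-[0] holds the start offset of the last successful match.
my $found = $str =~ /x(?=[^]]*\[)/ ? $-[0] : undef;

print defined $found ? "matched at $found\n" : "no match\n";
```

With this input the script prints "no match".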

Anno
 
John W. Krahn

Anno said:
John W. Krahn said:
perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37

That fails when no "[" follows the valid "x".

perl -le'
$_ =
"abcdefg[extra]hijklmn[texas]opqrstuvwYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*(?:\[|$))/ && print $-[0]
'
67


John
 
Anno Siegel

John W. Krahn said:
Anno said:
John W. Krahn said:
perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37

That fails when no "[" follows the valid "x".

perl -le'
$_ =
"abcdefg[extra]hijklmn[texas]opqrstuvwYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*(?:\[|$))/ && print $-[0]
'
67

Yup.

Anno
 
Kevin Potter

John W. Krahn said:
Anno said:
perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37

That fails when no "[" follows the valid "x".

perl -le'
$_ =
"abcdefg[extra]hijklmn[texas]opqrstuvwYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*(?:\[|$))/ && print $-[0]
'
67

Yup.

Anno

Wow... for myself and other less regular-expression-savvy folks, could
I get a step-by-step walk-through of exactly what this expression is
saying:

/x(?=[^]]*(?:\[|$))/ && print $-[0]

Thank you....!
 
Sam Holden

John W. Krahn said:
Anno Siegel wrote:


perl -le'
$_ = "abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*\[)/ && print $-[0]
'
37

That fails when no "[" follows the valid "x".

perl -le'
$_ =
"abcdefg[extra]hijklmn[texas]opqrstuvwYZabc[tex]defghijklmnopqrstuvwxyz";
/x(?=[^]]*(?:\[|$))/ && print $-[0]
'
67

Yup.

Anno

Wow... for myself and other less regular-expression-savvy folks, could
I get a step-by-step walk-through of exactly what this expression is
saying:

/x(?=[^]]*(?:\[|$))/ && print $-[0]

x -> match character 'x'
(?= ... ) -> zero-width positive lookahead assertion
[^]]* -> match zero or more characters which are anything except ']'
(?: ... ) -> grouping without capturing
\[|$ -> match either the character '[' or the end of the string.


So it matches an x for which there is no following ] unless a
[ is found first.

So if [] come in non-nested pairs it matches an x which isn't inside
such a pair of [].
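
The same pattern can also be written with the /x modifier, so that each piece carries its own comment. This is only a sketch: the helper pos_of and the nested test string are made up for illustration, the latter to show why the non-nested assumption matters.

```perl
use strict;
use warnings;

my $re = qr{
    x               # the character being searched for
    (?=             # lookahead: inspect what follows without consuming it
        [^\]]*      #   any run of characters other than "]"
        (?:\[|$)    #   then an opening bracket or the end of the string
    )
}x;

# Hypothetical helper: offset of the first match, or -1 if none.
sub pos_of {
    my ($str) = @_;
    return $str =~ $re ? $-[0] : -1;
}

my $flat =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";
print pos_of($flat), "\n";      # prints 37

# Made-up nested example: the "x" at offset 3 is inside the outer
# brackets, but a "[" follows it before any "]", so it is accepted.
my $nested = "a[bx[c]d]ex";
print pos_of($nested), "\n";    # prints 3, not the unenclosed 10
```

For nested input, a character-by-character scan with a depth counter, as in the earlier posts, is the safer choice.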
 
Kevin Potter

Matija Papec said:
It's not fancy but it should be reliable. :)

my $s =
"abcdefg[extra]hijklmn[texas]opqrstuvwxYZabc[tex]defghijklmnopqrstuvwxyz";

s/(\[.+?\])/' ' x length $1/ge, $_ = index $_,'x' for my $pos = $s;
print "$pos\n";

Matija's regex works *almost* perfectly for my needs, but I've just
discovered that the data enclosed in the [...] can also contain
arbitrary byte values (00-FF), including unprintable ones. In testing
the regex above, it doesn't seem to work if there is a linefeed (\x0a)
inside the brackets: the regex doesn't seem to see the [...] block
correctly, so the space substitution for that block is not performed.

Given example string:

my $s = "abc[extra\xa ]x";

The code returns 5. That is the first occurrence of "x", but I need it
to ignore everything in the [...], no matter what characters are
contained in there. Is there a way to have it ignore the \x0a?

Thanks again.
 
Eric Bohlman

Kevin Potter said:
s/(\[.+?\])/' ' x length $1/ge, $_ = index $_,'x' for my $pos = $s;
print "$pos\n";

Matija's regex works *almost* perfectly for my needs, but I've just
discovered that the data enclosed in the [...] can also contain
arbitrary byte values (00-FF), including unprintable ones. In testing
the regex above, it doesn't seem to work if there is a linefeed (\x0a)
inside the brackets: the regex doesn't seem to see the [...] block
correctly, so the space substitution for that block is not performed.

By default, '.' matches any character *except* a newline (represented on
most systems as \x0a). perldoc perlre will show you how to override that
behavior.
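
A sketch of that override, applied to the test string from the previous post: the /s modifier makes "." match a newline as well, so the whole bracketed block gets masked. The variable names $plain and $masked are illustrative.

```perl
use strict;
use warnings;

my $s = "abc[extra\x0a ]x";

# Without /s the block is not masked, because "." stops at the newline.
( my $plain = $s ) =~ s/(\[.+?\])/' ' x length $1/ge;
print index( $plain, 'x' ), "\n";     # prints 5

# With /s, "." matches any character, newline included.
( my $masked = $s ) =~ s/(\[.+?\])/' ' x length $1/gse;
print index( $masked, 'x' ), "\n";    # prints 12
```

An alternative that needs no modifier is a negated character class such as [^\]] in place of ".", since a negated class matches newlines too.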
 
