Regexp greediness.


Tad McClellan

robic0 said:
On the greedy issue, given

"this string \"some other\"" =~ m/"(.+)"/;

would match and $1 would equal <this string "some other">


No it wouldn't.

perl -le 'print $1 if "this string \"some other\"" =~ m/"(.+)"/'

(output: some other)

As a general rule, you should wait for the effect of the drugs to wear off before you post.
 

it_says_BALLS_on_your forehead

Tad said:
No it wouldn't.

perl -le 'print $1 if "this string \"some other\"" =~ m/"(.+)"/'

(output: some other)

perl -le 'print $1 if q{"this string "some other""} =~ m/"(.+)"/'

....would have the desired output.
 

Paul Lalli

it_says_BALLS_on_your forehead said:
perl -le 'print $1 if q{"this string "some other""} =~ m/"(.+)"/'

...would have the desired output.

.... yes it would, but it's not what the OP wrote, so what's your point?

Paul Lalli
 

it_says_BALLS_on_your forehead

Paul said:
... yes it would, but it's not what the OP wrote, so what's your point?

i guess my point is that i think it's helpful to show the string that
would match the pattern and populate $1 with the expected output in
addition to pointing out that robic0's assertion was false.

maybe it's not too helpful in this particular instance, but i think an
explanation following an example of something not working is helpful
for readers. if the explanation can be summarized/encapsulated by a
simple example of something working, then that's great. i think (hope)
that readers will benefit from my working example by seeing that the
*reason* for robic0's error was that the initial and final double
quotes in his string were not contents of the string, which is a
mistake easily made.
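
To see the two strings side by side, here is a minimal sketch. The variable names ($without, $with and so on) are made up for illustration; the strings and the pattern are the ones from the one-liners above.

```perl
use strict;
use warnings;

# Double-quoted literal: the outer quotes only delimit the literal and are
# not part of the string; the escaped inner quotes are.
my $without = "this string \"some other\"";

# q{} literal: every double quote between the braces is part of the string.
my $with = q{"this string "some other""};

my ($from_without) = $without =~ m/"(.+)"/;
my ($from_with)    = $with    =~ m/"(.+)"/;

print "without outer quotes: $from_without\n";   # some other
print "with outer quotes:    $from_with\n";      # this string "some other"
```

In both cases (.+) is greedy and runs to the last double quote it can find; the only difference is which quotes are actually in the string.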

so i posted because if *i* saw something that failed, but didn't know
why, i would like to see someone who *did* know why post an
explanation, or (if an explanation might necessarily be too exhaustive)
a simple example illustrating the reason why it failed. that's my
interpretation of the spirit of Usenet. Golden Rule. i've been called
naive before though...
 

robic0

robic0 schrob:

This regex doesn't make sense. It's parsed as:

( ^-> ) | ( <-\s+(.+)@@\s+"(.+?)" ) Not correct

because | has very low precedence. (?:->) by itself is always the same
as -> alone. This also means $1 is undef if the first part succeeds.

Please read perldoc perlretut and perldoc perlre.

HTH, Lukas
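
A small sketch of the precedence point, assuming the regex under discussion had the ungrouped shape shown in the breakdown above. The input line and the variable names are hypothetical, and the @@ is escaped only to keep the example unambiguous.

```perl
use strict;
use warnings;

# Hypothetical one-line input in the format discussed in this thread.
my $line = '-> dir/file.PRM@@ "a comment"';

# Ungrouped: the two alternatives are '^->' and everything after the '|'.
my $ungrouped = qr/^->|<-\s+(.+)\@\@\s+"(.+?)"/;

# Grouped: only the arrows alternate; the rest applies to both.
my $grouped = qr/^(?:->|<-)\s+(.+)\@\@\s+"(.+?)"/;

# In list context a successful match returns the captures,
# with undef for groups that did not take part in the match.
my @ungrouped = $line =~ $ungrouped;
my @grouped   = $line =~ $grouped;

printf "ungrouped: matched=%d, \$1 is %s\n",
    scalar(@ungrouped) ? 1 : 0,
    defined $ungrouped[0] ? "'$ungrouped[0]'" : 'undef';
printf "grouped:   \$1='%s', \$2='%s'\n", @grouped;
```

The ungrouped pattern does match, but through its first alternative, so neither capture group is set.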

'|' works on adjacent groups or character
if it makes you feel better use this
$hl =~ s/^(?(?:->)|(?:<-))\s+(.+?)@@\s+"(.+?)"/$1/sg;
 

DJ Stunks

robic0 said:
< yet another mess of complete and total gibberish>

hey, on a related topic, does anyone know how to rate a post as 0 *'s
on Google?

0: poster clearly eating steady diet of retard sandwiches

hahahaha

-jp
 

Lukas Mai

robic0 schrob:
'|' works on adjacent groups or character
if it makes you feel better use this
$hl =~ s/^(?(?:->)|(?:<-))\s+(.+?)@@\s+"(.+?)"/$1/sg;

There are no "adjacent groups or character". Your so-called regex
doesn't even compile. Please learn about regular expressions *now*.

If you contradict me one more time when I'm obviously right and you
haven't even tried an example or read the docs, I will claim your soul
as punishment for your foolishness.

HTH, Lukas
PS: F'up-to poster set.
 

robic0

robic0 schrob:
Here's just something to bust Gunnar's balls, its the ^ it's
anti-greedy formula, if you can understand it...
$_ =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ELEMENT(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

OK, let's see:
The last (.+?) doesn't make sense because it's not followed by any

This regular expression is pre-compiled for use in another expression
(I put in the $_ but it's assigned a permanent name in use).
This
$RxParse = qr/(?:<(?:(..)|(..)|(..))>)|(.+?)/s;
              (   (  1  1|2  2|3  3) )|4   4
broken down is --

two outer possibilities separated by '|', one is $1, 2 or 3, the other is $4.
pattern, which means +? will never backtrack to consume more. It should
be equivalent to (.).
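
The claim about a trailing (.+?) is easy to check in isolation. A minimal sketch with a made-up test string:

```perl
use strict;
use warnings;

my $text = 'abcdef';

# Nothing follows the lazy group, so one character is enough for success.
my ($lazy)   = $text =~ /(.+?)/s;
my ($single) = $text =~ /(.)/s;

# Greedy version, for contrast.
my ($greedy) = $text =~ /(.+)/s;

print "lazy:   $lazy\n";     # a
print "single: $single\n";   # a
print "greedy: $greedy\n";   # abcdef
```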

The whole thing looks like a horribly broken regex for HTML parsing.

The whole thing is a high performance main parsing regexp used in a finished
XML 1.1 compliant parser. I say main because there are several subsequent regexps.

It kinda looks like this in a program line --
while ($$ref_parse_ln =~ /$RxParse/g) {}
It
produces weird results for input like '<META content=">foo">' or '<img
alt="foo"> this is not part of "foo">'. The last one is due to
inappropriate greediness.

I won't recommend any perldoc reading or any of that shit but
'<META content=">foo">'
poses a paradox that results in a non-parsing dilemma that logic can't cure.
Primarily, the default general entities
&amp;&gt;&lt;&apos;&quot;
&><'"
apply even in html META char data.

I have written in perl, a high performance XML 1.1 compliant parser. META is not part of
XML. I have generalized my main regexp (with much penalty) to include META. The position
of META in the regexp overlaps HTML and XHTML because of closure.

This is because I am going to integrate HTML, XHTML and XML into a parser. I'm sitting on
a pure perl 1.1 compliant parser on my hard drive right now. A highly tuned, high-performance,
1.1 compliant parser. There's many tools involved. I also want to do full Schema validation.
I also want to jam as many tools as possible into it. I also don't want to give it away, I'm
not into this for the glory!

The $Name above works out to this (I don't feel I'm giving anything away here, this is trivial) --

@UC_Nstart = (
"\\x{C0}-\\x{D6}",
"\\x{D8}-\\x{F6}",
"\\x{F8}-\\x{2FF}",
"\\x{370}-\\x{37D}",
"\\x{37F}-\\x{1FFF}",
"\\x{200C}-\\x{200D}",
"\\x{2070}-\\x{218F}",
"\\x{2C00}-\\x{2FEF}",
"\\x{3001}-\\x{D7FF}",
"\\x{F900}-\\x{FDCF}",
"\\x{FDF0}-\\x{FFFD}",
"\\x{10000}-\\x{EFFFF}",
);
@UC_Nchar = (
"\\x{B7}",
"\\x{0300}-\\x{036F}",
"\\x{203F}-\\x{2040}",
);
$Nstrt = "[A-Za-z_:".join ('',@UC_Nstart)."]";
$Nchar = "[-\\w:\\.".join ('',@UC_Nchar).join ('',@UC_Nstart)."]";
$Name = "(?:$Nstrt$Nchar*?)";
I don't understand that but it's "icebergs".
I hope you can forgive a dislexic, bad speller
HTH, Lukas

Hey thanks Lukas!
Any more questions, just let me know
 

robic0

hey, on a related topic, does anyone know how to rate a post as 0 *'s
on Google?

0: poster clearly eating steady diet of retard sandwiches

hahahaha

-jp

Back to class you fuckin dipshit
 

robic0

No it wouldn't.

perl -le 'print $1 if "this string \"some other\"" =~ m/"(.+)"/'

(output: some other)




you should wait for the effect of the drugs to wear off before you post.

Yup, some good wine there. To tell you the truth I've been doing
so much damn regex I'm quite sick of it. The truth is even you can't
know from source so you have to run program check on even a simple
thing as this.

The only problem with Perl is you can't make money from it...
ohhh, cgi for the porno industry......
 

robic0

That's a break of the netiquette. OTOH, in your case it probably
wouldn't have made a difference.


Wrong, but what else could we expect from that robic0 character?


There is more in the regex but those arrows, so your discussion is out
of context and thus irrelevant.


More BS statements. Fact is that greediness _never_ affects whether a
regex matches or not.
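
A quick way to convince yourself, at least for the ordinary greedy and lazy quantifiers (possessive quantifiers and atomic groups are a different matter): run both forms over the same inputs. The input strings below are made up for illustration.

```perl
use strict;
use warnings;

my @inputs = ('say "one" and "two"', 'no quotes here');

my @results;
for my $s (@inputs) {
    # Same pattern twice, differing only in greediness.
    my ($greedy) = $s =~ /"(.+)"/;
    my ($lazy)   = $s =~ /"(.+?)"/;
    push @results, [ $s, $greedy, $lazy ];
    printf "%-22s greedy=%s lazy=%s\n", $s,
        map { defined $_ ? "[$_]" : 'no match' } $greedy, $lazy;
}
```

Both forms succeed on the first string and fail on the second. What changes is the text captured: the greedy form takes everything up to the last quote, the lazy form stops at the first.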


LOL, robic0 commenting on the /s modifier again. Maybe you could explain
how it would make a difference in this case? (Second thought: Please
don't!!)

I'll try to sort this out. I was a little pasted when I posted (am a little now too).

First and foremost, alternation (this regex '->|<-' as an example) is character
or group orientated, specifically '>|<'. Anything to the left or right is NOT,
I repeat NOT considered in the alternation character '|' syntax.

Is that not correct, or what ???
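
The two readings can be tested against each other. In this sketch the first pattern is '->|<-' as written (anchored for the test), and the second is what the pattern would have to look like if '|' only applied to the adjacent '>' and '<'. The test strings and variable names are made up.

```perl
use strict;
use warnings;

my $whole_branches = qr/^(?:->|<-)\z/;    # '->' or '<-'
my $char_only      = qr/^-(?:>|<)-\z/;    # '-', then '>' or '<', then '-'

my %seen;
for my $s ('->', '<-', '->-', '-<-') {
    $seen{$s} = [ ($s =~ $whole_branches ? 1 : 0), ($s =~ $char_only ? 1 : 0) ];
    printf "%-4s whole-branch=%d char-only=%d\n", $s, @{ $seen{$s} };
}
```

The strings '->' and '<-' match only the first pattern, and '->-' and '-<-' match only the second, so the two readings are not the same pattern.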
 

robic0

No, but wasn't intended to - I should have specified that the input I
posted was all one line, but posting it on groups.google munged it a
bit.


No, quite. I understand that now.


I have done, but in this case it was a near-terminal case of stupidity
brought on, I think, by tiredness.

Thanks Gunnar.

Adam...

You cower like a dog, and make excuses.
Stand the **** up and walk and talk like a man!
 

robic0

Hi there Perl gurus,

I'm using (trying to use) a regexp to extract a path and a comment from
the output of a 'describe' command in clearcase. I suspect I'm being
daft, so please go easy on me... I have read what I think is the
appropriate perldoc (perldoc -q greedy - "What does it mean that
regexes are greedy? How can I get around it? greedy greediness"), but
I'm already doing what it suggests - that is, reducing the greediness
of the '.+' expression with a '?'. I guess I must have missed
something..

The input ($hl) should look something like this:
->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"

I'm trying to get at the path and the comment:

if ($hl =~ m%^[->|<-]%)
{
$hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;
$comment = $2;
}
else
{
$hl = 0;
}
print "Target is $hl\n";
print "Comment is \"$comment\"\n";

Produces the following output:
Target is ->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"
Comment is ""

I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
difference.
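
For what it's worth, the square brackets look like the culprit: [->|<-] is a character class that matches a single character, not an alternation between the two arrows. A sketch, assuming the input is all on one line with single spaces; the path is shortened to a made-up one and the @@ is escaped only to keep the example unambiguous.

```perl
use strict;
use warnings;

# Hypothetical one-line input in the same format as above.
my $hl = '-> dir/file.PRM@@ "This is the to-text"';

# [->|<-] is a character class: one character out of '-', '>', '|', '<'.
# The bare test succeeds because the line starts with '-'.
my $class_test = $hl =~ m%^[->|<-]% ? 1 : 0;

# The full pattern fails: after that one character it wants a space,
# but the next character is '>'.
my $class_full = $hl =~ m%^[->|<-] (.+?)\@\@ "(.+?)"% ? 1 : 0;

# With a non-capturing alternation the whole arrow is consumed.
my ($path, $comment) = $hl =~ m%^(?:->|<-) (.+?)\@\@ "(.+?)"%;

print "class test: $class_test, class full: $class_full\n";
print "Target is $path\n";
print "Comment is \"$comment\"\n";
```

That would explain the output shown: the if succeeds, the substitution silently fails, $hl is left untouched and $2 is empty.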

My suggestion is to use something more specific, such as ([^@]*) and ([^"]*),
instead of swaying between non-greedy (.*?) and greedy (.*). In your case,
if you can be sure that there is no '@' in your pathname, you can do it
this way:
=============================
$hl =q(->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text");

if ($hl =~ m%^(?:->|<-)%)
{
$hl =~ s%^(?:->|<-)\s*([^@]*)@@\s*"([^"]*)"%$1%x;

$comment = $2;
}

else
{
$hl = 0;
}

print "Target is $hl\n";
print "Comment is \"$comment\"\n";
=============================
Best,
Xicheng
Of course I could probably use split to do this and then substitute out
the -> or <-, but I'm quite keen to understand what I'm doing wrong.

Cheers - Adam...
 

robic0

only if you had retard sandwiches for lunch.

Ok, apparently you're fixated on the word 'retard' as if either you are
handicapped (or young), and without thought of the millions of such
on this planet, or you are incapable of expressing why that is such a
fixture in your writing. Please don't go postal in your home town on
the retards! Give them a second chance before you take out your M-16
(modified to full auto) and cleave limbs, head, leg at the knee, arms
at the elbow, from their bodies (in wheelchairs).

Please visit the Mental Health Society counselors before making such a
decision!
 

robic0

i guess my point is that i think it's helpful to show the string that
would match the pattern and populate $1 with the expected output in
addition to pointing out that robic0's assertion was false.
"in addition".... Hey, robic0's assertion was false? What was that?
My assertion was loaded with code. Where's yours?
maybe it's not too helpful in this particular instance, but i think an
explanation following an example of something not working is helpful
for readers. if the explanation can be summarized/encapsulated by a
simple example of something working, then that's great. i think (hope)
that readers will benefit from my working example by seeing that the
*reason* for robic0's error was that the initial and final double
quotes in his string were not contents of the string, which is a
mistake easily made.

so i posted because if *i* saw something that failed, but didn't know
what's '*i*'? The asterisks. What failed? What, did I miss something here?
why, i would like to see someone who *did* know why post an
explanation, or (if an explanation might necessarily be too exhaustive)
a simple example illustrating the reason why it failed. that's my
interpretation of the spirit of Usenet. Golden Rule. i've been called
naive before though...
Usenet has nothing to do with anything here. What's the 'Golden Rule'?
Did I miss something here? Why did you bring up (e-mail address removed) 's name
anyway. Something wrong I said? Shouldn't you ask me about it?
Wuss up dude?
 

robic0

robic0 schrob:
'|' works on adjacent groups or character
if it makes you feel better use this
$hl =~ s/^(?(?:->)|(?:<-))\s+(.+?)@@\s+"(.+?)"/$1/sg;

There are no "adjacent groups or character". Your so-called regex
doesn't even compile. Please learn about regular expressions *now*.

If you contradict me one more time when I'm obviously right and you
haven't even tried an example or read the docs, I will claim your soul
as punishment for your foolishness.

HTH, Lukas
PS: F'up-to poster set.


Lukas, as a record, regarding this matter, it is addressed in a different
sub post.
I would like to email you in regards but something not working in agent.
my email is robic0@multipleISP's.com, one is yahoo. I don't know why you
need to communicate to me but I will listen to your proposition.
I'm out of work right now and would like a project. I'm open to large scale
work propositions. Now is a good time to catch me, once grabbed by the sharks,
even if it's shit pay, I will have to take it... and it will consume my time.
 
