Regular expression for this?

S

stevewy

I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

I thought

onclick*\>

would work, but it doesn't.

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol selects "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?

Steve
 
J

Joe Nine

I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

I thought

onclick*\>

would work, but it doesn't.

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol select "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?

Steve

I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
 
G

Gabriel Gilini

Joe said:
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

What I have failed to understand is how the OP issue correlates with
Javascript.
 
T

Thomas 'PointedEars' Lahn

Stefan said:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

The .*? in the first variant matches anything, in a non-greedy way (as
little as possible).

The [^>]* in the second variant matches any number of characters other
than ">".
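
As a rough illustration only: the sketch below runs the two variants through JavaScript's own RegExp, not Notepad++'s engine, whose flavor may differ. The sample markup and the variable names are made up for the example.

```javascript
// Hypothetical sample markup, invented for illustration.
const oneLine = '<input type="button" onclick="check(this)" value="OK"> more > text';
const twoLines = '<input onclick="a();\nb();">';

const lazy = /onclick.*?>/;       // variant 1: as little as possible, then ">"
const negated = /onclick[^>]*>/;  // variant 2: any run of non-">" characters, then ">"

// On a single line both select the same text, ending at the first ">".
console.log(oneLine.match(lazy)[0]);
console.log(oneLine.match(negated)[0]);

// Across a line break they differ: "." does not match a newline in
// JavaScript (without the s flag), while [^>] does.
console.log(lazy.test(twoLines));
console.log(negated.test(twoLines));
```

So the two are interchangeable only as long as the text between onclick and ">" stays on one line.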

One thing that every programmer should know is that SGML-based markup
languages like HTML, and programming languages, are usually not regular
languages (they are of the Correct Bracket Language type: context-free but
not regular), so they cannot be parsed with one regular expression alone (a
false positive for your suggestion has already been mentioned). With
further constraints such as the one described here, it is sometimes possible
to parse them with one application of a regular expression if the regular
expression grammar supports alternation and other special features.

The other thing is: What does this have to do with ECMAScript-based
scripting languages, other than that the text being replaced constitutes
such code? IMNSHO, this question is quite off-topic here, and not likely to
be answered in a way that is helpful to OP anyway, since the flavor of
regular expressions that their text editor supports is unknown.


PointedEars
 
T

Thomas 'PointedEars' Lahn

Gabriel said:
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

What I have failed to understand is how the OP issue correlates with
Javascript.

Add me.


PointedEars
 
J

Joe Nine

Gabriel said:
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

Yes technically that's what he said. I was reading between the lines and
deducing that it's probably not what he wants. I suspect he wants the
contents of the onclick string. Here's an example where he gets more
than that.

< ...onclick="something()" onmouseover="somethingelse()">
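
A quick way to see the difference, sketched in JavaScript (an assumption; the editor's regex flavor may behave differently). The elided start of the tag is replaced here by a hypothetical `<input`, purely so the string is complete.

```javascript
// The example tag above, with a hypothetical element name filled in.
const tag = '<input onclick="something()" onmouseover="somethingelse()">';

// "onclick, then anything up to the first >" also swallows onmouseover.
const upToBracket = tag.match(/onclick[^>]*>/)[0];
console.log(upToBracket); // onclick="something()" onmouseover="somethingelse()">

// Matching only the quoted attribute value stops at the closing quote.
const attributeOnly = tag.match(/onclick="[^"]*"/)[0];
console.log(attributeOnly); // onclick="something()"
```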
 
G

Gabriel Gilini

Thomas said:
<… onclick="if (2 > 1) window.alert(42);">…</…>
Yes, I know that, given the context, that is a faulty request. I was
just stating OP's request.
The short answer would be: Don't.
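
For what it's worth, the quoted example can be tried out in JavaScript. This is only a sketch: the elided parts of the tag are replaced by a hypothetical `<button>` element, and the second pattern assumes the handler is wrapped in double quotes and contains none itself.

```javascript
// Quoted example, with hypothetical element name and content filled in.
const html = '<button onclick="if (2 > 1) window.alert(42);">Go</button>';

// Stopping at the first ">" cuts the handler off in the middle of "2 > 1".
const cutShort = html.match(/onclick[^>]*>/)[0];
console.log(cutShort); // onclick="if (2 >

// Stopping at the closing double quote keeps the whole attribute.
const wholeAttribute = html.match(/onclick="[^"]*"/)[0];
console.log(wholeAttribute); // onclick="if (2 > 1) window.alert(42);"
```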
 
T

Thomas 'PointedEars' Lahn

Gabriel said:
Thomas said:
Gabriel said:
Joe Nine wrote:
(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".
[...]

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol select "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

<… onclick="if (2 > 1) window.alert(42);">…</…>
Yes, I know that, given the context, that is a faulty request.

There is nothing faulty about this.
I was just stating OP's request.

You have (mis)interpreted it in your favor.
The short answer would be: Don't.

Nonsense.


PointedEars
 
G

Gabriel Gilini

Thomas said:
Gabriel said:
Thomas said:
Gabriel Gilini wrote:
Joe Nine wrote:
(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".
[...]

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol select "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".
<… onclick="if (2 > 1) window.alert(42);">…</…>
Yes, I know that, given the context, that is a faulty request.

There is nothing faulty about this.

I think you misunderstood me. What I tried to say is that trying to
match everything after an onclick attribute up to the end of the opening
tag with Regular Expressions in HTML isn't something that could be
relied upon, as you so technically put in your reply to OP.
You have (mis)interpreted it in your favor.

I don't think so.

| (e-mail address removed) wrote:
| > I'm just trying to work out (if what I want is at all possible), a
| > regular expression that will search for and select (in a text editor
| > that supports regexps, like Notepad++) the word "onclick", then any
| > text at all, up to and including ">".

This is exactly what I said.
Nonsense.

Now you're confusing me. Do you think that what OP is trying to
accomplish with Regular Expressions should be done or not?
 
G

Gabriel Gilini

Joe said:
Yes technically that's what he said. I was reading between the lines and
deducing that it's probably not what he wants. I suspect he wants the
contents of the onclick string. Here's an example where he gets more
than that.

< ...onclick="something()" onmouseover="somethingelse()">
That's one way of deducing what he wants, but that's nothing but an
exercise in futility. OP didn't give enough information for us to know
exactly what he wants.

Either way, this probably doesn't belong in c.l.js
 
T

Thomas 'PointedEars' Lahn

Gabriel said:
I think you misunderstood me. What I tried to say is that trying to
match everything after an onclick attribute up to the end of the opening
tag with Regular Expressions in HTML isn't something that could be
relied upon, as you so technically put in your reply to OP.

You need to read my explanation more carefully. It is quite possible to do
what was intended with regular expressions reliably, just not with every
flavor of regular expressions.
Now you're confusing me. Do you think that what OP is trying to
accomplish with Regular Expressions should be done or not?

I do not see why it should not be done if it is done properly. For example,
I have frequently used Java regular expressions in Eclipse, and sometimes
GNU-BREs, EREs and PCREs in shell scripts, for efficient search-and-replace,
including in HTML documents. With regard to JS/ES and the DOM, using
regular expressions is also the first step in writing an efficient
`innerHTML' replacement.

So there certainly is value in knowing how to use flavors of regular
expressions to solve the parsing problem of context-free languages.


HTH

PointedEars
 
T

Thomas 'PointedEars' Lahn

Stefan said:
Thomas said:
Stefan said:
(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

The .*? in the first variant matches anything, in a non-greedy way (as
little as possible).

The [^>]* in the second variant matches any number of characters other
than ">".

One thing that every programmer should know is that SGML-based markup
languages like HTML, and programming languages, are usually not regular
languages (they are of the Correct Bracket Language type: context-free
but not regular), so they cannot be parsed with one regular expression
alone

And I never said they could.

You are misunderstanding my followup as an attempt at complete rebuttal of
your arguments.
Besides, it would depend on the type of regular expression used. For
example, take Perl's (?{...}) and (??{...}) constructs, which can be used
to embed Perl code in regexes. Same thing goes for the /e modifier in Perl
substitutions. Voila, Turing complete regular expressions. (yeah, I know
that's cheating ;-)

(?R…) suffices with PCRE.
Stefan said:
The first (highest rated) comment on this page is a good indication of
what happens when you think too hard about parsing HTML with regular
expressions:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Yes, cluelessness is a widespread disease, and especially common at
stackoverflow. You can parse HTML with regular expressions, just not
with a (non-PCRE) regular expression alone.
(a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. [...]

No, think again.


PointedEars
 
T

Thomas 'PointedEars' Lahn

Stefan said:
Thomas said:
Stefan said:
Thomas 'PointedEars' Lahn wrote:
(a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. [...]

No, think again.

I'm curious. Do you mean that "any text at all" should exclude the empty
string as an edge case? [...]

I mean that it should include "any text" to begin with. Granted, the OP's
request is ambiguous to a large degree, but I would not assume "any text" to
exclude `>' characters. So if there is a correct answer to this "question"
it should, IMO, be more like

onclick.*>

(Not that this would likely be overly useful, of course.)
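
To make the difference from the non-greedy variant concrete, here is a small JavaScript sketch. The sample line is invented, and other regex flavors may treat "." differently.

```javascript
// Hypothetical single line of markup, invented for illustration.
const line = '<a onclick="go()">link</a> <b>bold</b>';

// Greedy: runs to the LAST ">" on the line.
const greedy = line.match(/onclick.*>/)[0];
console.log(greedy); // onclick="go()">link</a> <b>bold</b>

// Non-greedy: stops at the FIRST ">" after onclick.
const nonGreedy = line.match(/onclick.*?>/)[0];
console.log(nonGreedy); // onclick="go()">
```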


PointedEars
 
S

stevewy

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

  onclick.*?>    or    onclick[^>]*>
Well, it seems onclick="[^"]*" matches what I want (which is up to the
">" symbol, but not including it as I erroneously stated in my original
post), but unfortunately does not include newlines. To be
comprehensive, it would need to include any newline characters between
onclick and the " symbol.

So, it needs to select from onclick, all the way through the onclick
statement till it finds the closing " mark. Across several lines if
necessary.

The reason I am needing this rather odd thing done, is that at work I
deal with putting client-side validation into questionnaires, that are
churned out by a survey application. The client-side validation
relies, of course, on Javascript and has a lot of onClick statements.
Later on in the life-cycle of the questionnaire, the validation needs
to be stripped out. I am using Notepad++ that accommodates regexps in
its find & replace feature.

Being that onClick statements are of the form onClick=" [JS
statements] " and each onClick is placed inside a form element tag
of the HTML (like <INPUT>), I thought it would save time to use the
find & replace feature of Notepad++ to select onClick statements and
replace them with nothing, thus removing them.
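
Outside the editor, the same removal could be sketched in JavaScript roughly as follows. This is a sketch under assumptions, not a general HTML parser: it presumes every handler is wrapped in double quotes and contains no double quote itself, and the sample fragment is invented.

```javascript
// Hypothetical questionnaire fragment; the handler spans two lines.
const page = '<INPUT type="radio" name="q1" onClick="validate(this);\n  next();" value="1">';

// \s+      also removes the whitespace in front of the attribute
// [^"]*    matches line breaks too, since it only excludes the double quote
// flags: g = every occurrence, i = onClick, onclick, ONCLICK, ...
const stripped = page.replace(/\s+onclick\s*=\s*"[^"]*"/gi, '');

console.log(stripped); // <INPUT type="radio" name="q1" value="1">
```

In JavaScript the negated class crosses line breaks on its own, so no extra handling for newlines is needed there; whether a given editor behaves the same way depends on its regex engine.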

I realise, Thomas, that this is more a regexp query and not
exclusively Javascript, although I am using it in a JS task.

Does this help in figuring out the regexp I would need to accomplish
this? At the moment, I am "so near and yet so far"....

Steve
 
S

stevewy

Given the extra information supplied above, I did a more targeted
Google search about my problem, and found this article:
http://blog.microugly.com/2009/10/notepad-linebreaks-in-regular.html,
which would indicate Notepad++ does not have very good support of
regular expressions anyway. Other text editors have an option to
specify whether "." includes newlines or not, but Notepad++ does not.

And so, other than following the tip supplied in the article, it does
not seem very likely that a regexp could be found to accomplish
exactly what I need, not in Notepad++ at any rate.

Anyway, thank you for the responses supplied to my initial query.

Steve
 
S

SAM

On 6/11/10 11:41 AM, (e-mail address removed) wrote:
If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>
Well, it seems onclick="[^"]*" matches what I want (which is up to the
">" symbol, but not including it as I erroneously stated in my original
post), but unfortunately does not include newlines.

maybe :

onclick="([^"]|\s)*"

I am using Notepad++ that accommodates regexps in
its find & replace feature.

Don't know NotePad, sorry.
Being that onClick statements are of the form onClick=" [JS
statements] " and each onClick is placed inside a form element tag
of the HTML (like <INPUT>), I thought it would save time to use the
find & replace feature of Notepad++ to select onClick statements and
replace them with nothing, thus removing them.

search :
(onclick=")([^"]|\s)*
replace :
\1"
or ?
$1"
 
S

SAM

On 6/11/10 1:00 PM, (e-mail address removed) wrote:
On 11 June, 11:55, SAM <[email protected]>
wrote:
maybe :

onclick="([^"]|\s)*"

No, it doesn't do anything with that string.

Sorry,
that works fine in my text editor.

That leaves using a JS tool?

<form onsubmit="return doIt(this)">
  <div>
    Enter your code here :<br>
    <textarea name="txt" cols=80 rows=16></textarea><br>
    Search: <input name="fSearch"><br>
    Replace: <input name="fReplace"><br>
    <input type="submit" value="replace all">
    <input type="reset" onclick="restitue(this)">
  </div>
</form>
<script type="text/javascript">
  // Original textarea content, saved before the first replacement.
  var memoriz = '';

  // Replace every match of the search pattern in the textarea.
  function doIt(where) {
    var f = where.fSearch.value,
        r = where.fReplace.value,
        t = where.txt;
    if (memoriz == '') memoriz = t.value;
    // 'i' = ignore case, 'g' = replace all matches
    var rg = new RegExp(f, 'ig');
    t.value = t.value.replace(rg, r);
    return false; // cancel the actual form submission
  }

  // Once the reset has cleared the form, put the saved text back.
  function restitue(what) {
    setTimeout(function () {
      if (memoriz != '')
        what.form.txt.value = memoriz;
      memoriz = '';
    }, 10);
  }
</script>
 
D

Dr J R Stockton

In comp.lang.javascript message
<b60bba22-da8f-4437-baca-ffbe23c8b5e9@z10g2000yqb.googlegroups.com>,
Thu, 10 Jun 2010 09:14:52, (e-mail address removed) posted:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

How "like" must it be?

MiniTrue will do it, at least at the XP 32-bit command line (CMD.EXE) :

PROMPT>mtr $1.htm "onclick[^^]*>"

The character ^ is a command-line escape, so only the second one counts;
[^] thus means to search for not nothing, which, at least in MiniTrue,
is a more potent "anything" than a mere dot is.

You could see if that works in Notepad++. Or in JavaScript.

MiniTrue is not a full interactive editor, but it can do substitutions.
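
On the JavaScript side, a short sketch suggests it does work there: in a JavaScript regular expression the empty negated class [^] matches any character, line breaks included. The sample text is invented; no claim is made here about Notepad++.

```javascript
// Hypothetical text with a line break between onclick and ">".
const text = 'onclick="a();\nb();" name="x">';

// "." stops at the line break, so this finds nothing.
const withDot = /onclick.*>/.test(text);
console.log(withDot); // false

// [^] means "not nothing", i.e. any character at all, newline included.
const withAnything = text.match(/onclick[^]*>/)[0];
console.log(withAnything === text); // true
```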
 
