complex regex

carlbernardi

Hi,

I am new to the java.util.regex package, which I am using to detect
each occurrence of the JavaScript tag in an HTML file and delete it. I
tried the following code to find tags such as the ones below, but
instead it matches from the first occurrence of "<" to the last
occurrence of ">", which is not what I am looking for.

<script>
<script src="script.js">
</script>

String mat = "<html><script><p><font></script>";
String pat = "<*[\\x00-\\x7f]*jscript*[a-z0-9]*>";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(mat);
while (matcher.find()) {
    System.out.println("Match: " + matcher.group()
            + " Start:" + matcher.start() + " End:" + matcher.end());
}

output:
Match: <html><javascript><p><font><javascript> Start:0 End:39

I would be looking for an output of:
Match: <javascript> Start:6 End:18
Match: <javascript> Start:27 End:39

Appreciate any input,

Carl
 
Gordon Beaton

> I am new to the java.util.regex package, which I am using to detect
> each occurrence of the JavaScript tag in an HTML file and delete it. I
> tried the following code to find tags such as the ones below, but
> instead it matches from the first occurrence of "<" to the last
> occurrence of ">", which is not what I am looking for.

Of course, because you are using greedy quantifiers, which will match
as much as possible. Use reluctant quantifiers instead, or at least a
more restrictive set of characters before and after "jscript".
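
A minimal sketch of that advice, not code from the thread: it assumes the tags of interest are spelled "script" as in the sample input string, and the class name ScriptTagFinder and its methods are hypothetical. It compares a pattern that only swaps in reluctant quantifiers with one that also restricts the character sets around the tag name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical, not from the thread.
final class ScriptTagFinder {

    // Reluctant quantifiers only: still over-matches, because each match
    // starts at the first available "<" and grows until "script" is reached.
    static final Pattern RELUCTANT = Pattern.compile("<.*?script.*?>");

    // Restricted character sets: only an optional "/" may sit between "<" and
    // "script", and [^>]* cannot run past the closing ">".
    static final Pattern RESTRICTED =
            Pattern.compile("</?script\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns each match formatted as "text@start-end".
    static List<String> findTags(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    // Deletes the matched tags themselves; any text between them is left in place.
    static String stripTags(String html) {
        return RESTRICTED.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><script><p><font></script>";
        System.out.println(findTags(RELUCTANT, html));
        System.out.println(findTags(RESTRICTED, html));
        System.out.println(stripTags(html));
    }
}
```

On the sample input, the reluctant-only pattern still returns "<html><script>" as its first match, while the restricted pattern returns "<script>" at 6-14 and "</script>" at 23-32. Note that stripTags removes only the tags, not any script text between them.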

/gordon
