Search/replace patterns in web pages?

Jane Doe

Hi,

I need to search and replace patterns in web pages, but I
can't find a way even after reading the relevant chapter in New Riders'
"Inside JavaScript".

Here's what I want to do:

function filter() {
  var items = new Array("John", "Jane");

  for (x = 0; x < items.length; x++) {
    //Doesn't work
    pattern = '/' + items[x] + '/';
    //Doesn't work either
    document.body = document.body.replace(pattern,"IGNORED");
  }
}

i.e., create an array of items to look for in the BODY section of the
page and, if any item occurs there, replace it with IGNORED.

Does anyone know how to do this?

Thank you
JD.

Lasse Reichstein Nielsen

Jane Doe said:
> Hi,
>
> I need to search and replace patterns in web pages, but I
> can't find a way even after reading the relevant chapter in New Riders'
> "Inside JavaScript".
>
> Here's what I want to do:
>
> function filter() {
>   var items = new Array("John", "Jane");
>
>   for (x = 0; x < items.length; x++) {
>     //Doesn't work
>     pattern = '/' + items[x] + '/';

This builds a string, not a regular expression, so it would be matched
as the literal text "/John/". (Make pattern a local variable with the
"var" keyword; there is no need to have it global.)
>     //Doesn't work either
>     document.body = document.body.replace(pattern,"IGNORED");

The object document.body is a DOM node, not a text string, so it has
no replace method. What you can do, in some browsers, is work on
document.body.innerHTML, which is a string.

Also, change "pattern" to "new RegExp(items[x],'')" in this line. Then
you have created a regular expression with the name as content.
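
To see why the string version does nothing, here is a small sketch that runs outside a browser, with plain strings standing in for the page text. String.prototype.replace treats a string first argument as literal text, so '/John/' only matches the six characters "/John/", while new RegExp builds a real pattern. The two function names are hypothetical, chosen for illustration.

```javascript
// A string first argument is matched literally, slashes included.
function replaceWithString(text, name) {
  var pattern = '/' + name + '/';
  return text.replace(pattern, "IGNORED");
}

// new RegExp(name, "") turns the name into an actual regular expression.
function replaceWithRegExp(text, name) {
  var pattern = new RegExp(name, "");
  return text.replace(pattern, "IGNORED");
}

console.log(replaceWithString("Hello John", "John")); // Hello John
console.log(replaceWithRegExp("Hello John", "John")); // Hello IGNORED
```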

There is no need to run through all the items one at a time.
You can replace the entire for loop with

document.body.innerHTML =
document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");

(This way, the regular expression becomes "John|Jane". Since you replace
both names with the same string, you can just match them at the same time.)
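
One caveat: items.join("|") is pasted into the pattern as-is, so a name containing characters such as ".", "(" or "+" would be read as regex syntax. A minimal sketch, assuming you want the names matched literally, escapes them first. escapeRegExp and replaceItems are hypothetical helper names, not built-ins. The flags are still "", so, as in the one-liner above, only the first match is replaced.

```javascript
// Hypothetical helper: backslash-escape every character that has a special
// meaning in a regular expression, so the text is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one alternation such as "John|Jane" and replace with it.
// The flags are "", so only the first match in the text is replaced.
function replaceItems(text, items) {
  var escaped = [];
  for (var i = 0; i < items.length; i++) {
    escaped.push(escapeRegExp(items[i]));
  }
  return text.replace(new RegExp(escaped.join("|"), ""), "IGNORED");
}

console.log(replaceItems("Jane met John", ["John", "Jane"])); // IGNORED met John
console.log(replaceItems("Mr. J.R. Smith", ["J.R."]));        // Mr. IGNORED Smith
```

Without the escaping, "J.R." would also match text like "JoRe", because "." matches any character.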

/L

Jane Doe

> document.body.innerHTML =
> document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");

Thx a bunch Lasse for the prompt answer :) It looks like a much
better solution, although I'll still have to find out the following:

1. innerHTML only seems to work in IE. It doesn't work with Opera 5 and
might not work with Netscape.

2. Only the first occurrence of the pattern is replaced, i.e. if I have
(John|Jane) and both items appear in the page, only the first
one is replaced (the second is ignored). I assume I need to add
/g somewhere to tell JS to search & replace _all_ occurrences.

3. I'm actually parsing rows in a table, so I need to construct a more
complicated search pattern than the one I gave to get started. The
goal is to replace any row that contains any of the items with an
empty row (i.e.
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>).

FWIW, here's what I'd like to do:

---------
function clean() {
var items = new Array("John", "Jane");
document.body.innerHTML = document.body.innerHTML.replace(new
RegExp(items.join("|"),""),"IGNORED");
}

[...]

<body onload="clean()">

<table>
<tr>
<td bgcolor="#FFFFFF" ><a
href="forum.php?forum=myforum&m=123">Title</a></td>
<td bgcolor="#FFFFFF">John</td>
<td bgcolor="#FFFFFF">10</td>
<td bgcolor="#FFFFFF">Posted 13 sept</td>
</tr>
<tr>
<td bgcolor="#FFFFFF" ><a
href="forum.php?forum=myforum&m=124">Title</a></td>
<td bgcolor="#FFFFFF">Jane</td>
<td bgcolor="#FFFFFF">2</td>
<td bgcolor="#FFFFFF">Posted 12 sept</td>
</tr>
</table>

---------

If you have any ideas, or know of sample code somewhere on the Net,
I'm interested :)

Thx again for your help
JD.

Lasse Reichstein Nielsen

Jane Doe said:
> Thx a bunch Lasse for the prompt answer :) It looks like a much
> better solution, although I'll still have to find out the following:
>
> 1. innerHTML only seems to work in IE. It doesn't work with Opera 5 and
> might not work with Netscape.

It works in IE 4+, Opera 7 and Mozilla, and perhaps a few other recent
browsers. Older browsers are out.

On the other hand, Netscape 4 and Opera 6 will not let you change
the contents of the page at all once it has loaded, so there is no
method that works there.

If you can ignore IE 4, I would prefer to use DOM methods, traversing
the DOM tree and changing the text in the text nodes.
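
A rough sketch of that DOM approach, assuming a browser with W3C DOM support (so not IE 4): walk the tree recursively and call replace only on text nodes (nodeType 3), which leaves tags and attributes alone. replaceInTextNodes is a hypothetical name. Since it only reads nodeType, childNodes and nodeValue, it can also be tried on small hand-built objects outside a browser.

```javascript
// Hypothetical helper: apply re to the text of every text node under node.
// Element tags and attributes are never touched, only text nodes' nodeValue.
function replaceInTextNodes(node, re, replacement) {
  if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
    node.nodeValue = node.nodeValue.replace(re, replacement);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceInTextNodes(children[i], re, replacement);
  }
}

// In a page, once it has loaded:
//   replaceInTextNodes(document.body, /\b(John|Jane)\b/gi, "IGNORED");
```

Note that the text inside <script> elements is also made of text nodes, so a real version might want to skip those elements.
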

> 2. Only the first occurrence of the pattern is replaced, i.e. if I have
> (John|Jane) and both items appear in the page, only the first
> one is replaced (the second is ignored). I assume I need to add
> /g somewhere to tell JS to search & replace _all_ occurrences.

Doh. Yes, the place to add the "g" is in the second argument to RegExp
(currently an empty string; make it "g", or "gi" to also ignore case).
Also notice that you match even inside words, so "Johnson" becomes
"IGNOREDson". You can fix that by making the regular expression

new RegExp("\\b("+items.join("|")+")\\b","gi");

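The "\b" matches the boundary between a word character and a non-word
character. There is no such boundary between the "n" and the "s" in
"Johnson", so "John" is not matched there. (The backslash is doubled
because the pattern is written as a string literal, where "\b" alone
would be a backspace character.)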
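
A quick way to check the effect of the flags and the word boundaries, sketched with a hypothetical makeItemRE helper that just wraps the expression above:

```javascript
// Hypothetical helper wrapping the expression above:
// "\\b" in the string becomes \b in the pattern (whole words only),
// "g" replaces every match, "i" ignores case.
function makeItemRE(items) {
  return new RegExp("\\b(" + items.join("|") + ")\\b", "gi");
}

var re = makeItemRE(["John", "Jane"]);
console.log("John, Johnson and JANE".replace(re, "IGNORED"));
// IGNORED, Johnson and IGNORED
```

Without the "g", the same call would stop after the first "IGNORED".
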

> 3. I'm actually parsing rows in a table, so I need to construct a more
> complicated search pattern than the one I gave to get started.

It is sometimes easier to split the problem into more than one regular
expression, e.g. one to find a table row and another to test whether
it contains the forbidden words. You can always combine them; the result
might just be horribly much bigger.

> function clean() {

Ok. If we only aim at newer browsers, try this:

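function clean() {
  // The words to look for; any table row containing one of them is dropped.
  var items = new Array("John", "Jane");
  var body = document.body.innerHTML;
  // Whole words only ("\\b"), every match ("g"), any case ("i").
  var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "gi");
  // [\s\S]*? lazily matches anything, newlines included, up to the first
  // "</tr>"; the "i" flag also catches upper-case <TR> tags.
  body = body.replace(/<tr\b[\s\S]*?<\/tr>/gi, function (row) {
    if (row.match(itemRE)) {
      return "";   // the row contains a forbidden word: remove it
    } else {
      return row;  // keep the row unchanged
    }
  });
  document.body.innerHTML = body;
}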

It replaces each table row (from "<tr" to "</tr>") with either
itself or the empty string, depending on whether the row
contains any of the words in the "items" array.
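
Since the original goal was an empty row rather than no row, here is a variant sketch of the same two-step idea. It works on an HTML string so it can be tried outside a browser; in a page you would pass document.body.innerHTML and assign the result back. blankMatchingRows is a hypothetical name, and writing one &nbsp; cell per original cell is an assumption based on the example empty row given earlier in the thread.

```javascript
// Hypothetical variant of clean(): works on an HTML string and replaces a
// matching row with a blank row (one &nbsp; cell per original cell)
// instead of deleting it.
function blankMatchingRows(html, items) {
  // No "g" flag here: a global regex used with test() remembers lastIndex
  // between calls, which could make it skip matches in later rows.
  var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "i");
  return html.replace(/<tr\b[\s\S]*?<\/tr>/gi, function (row) {
    if (!itemRE.test(row)) {
      return row; // no forbidden word: keep the row as it is
    }
    var cells = row.match(/<td\b/gi) || [];
    var blank = "";
    for (var i = 0; i < cells.length; i++) {
      blank += "<td>&nbsp;</td>";
    }
    return "<tr>" + blank + "</tr>";
  });
}

var html = "<table><tr><td>Title</td><td>John</td></tr>" +
           "<tr><td>Title</td><td>Bob</td></tr></table>";
console.log(blankMatchingRows(html, ["John", "Jane"]));
// <table><tr><td>&nbsp;</td><td>&nbsp;</td></tr><tr><td>Title</td><td>Bob</td></tr></table>
```

In a page this would be used as document.body.innerHTML = blankMatchingRows(document.body.innerHTML, items). The test runs on the row's raw HTML, so a name inside an attribute (an href, say) also counts as a match, and <th> cells are not counted.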

/L