Regular Expression Help

pbreah

I need to figure out a pattern that can match each letter of the
message, but leaves all the html entities alone.

For example, I have an input like this:

<div>
This is the content &nbsp; &lt; Hello &gt;
</div>

Just as an example to make it clearer: if I wanted to replace all the
letters of the message with the number "1", I would get this
result:

<div>
1111 11 111 1111111 &nbsp; &lt; 11111 &gt;
</div>

Can anyone help?

Thanks in advance...
 
pbd22

Hi.
Check out the "replace" method for JavaScript strings.
If that doesn't do what you are looking for, try checking
out the various ways of manipulating strings here:

http://javascriptkit.com/javatutors/string4.shtml

Hope that helps.
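
For what it's worth, here is a rough sketch of how "replace" could do this in one pass. The function name maskLetters and the pattern are my own assumptions, not something from the tutorial linked above. The idea is that the pattern matches either a whole tag or entity (captured and copied through unchanged) or a single ASCII letter (replaced with "1"). It assumes tags contain no ">" inside attribute values and that every "&" in the text starts a well-formed entity.

```javascript
// Hypothetical helper (not from the thread): mask ASCII letters in the text,
// copying tags and entities through unchanged.
function maskLetters(html) {
  // Group 1 captures a whole tag or a whole entity; the second
  // alternative matches a single ASCII letter outside of those.
  var pattern = /(<[^>]*>|&#?\w+;)|[A-Za-z]/g;
  return html.replace(pattern, function (match, keep) {
    // If group 1 took part in the match, keep it; otherwise it was a letter.
    return keep !== undefined ? keep : "1";
  });
}

var input = "<div>\nThis is the content &nbsp; &lt; Hello &gt;\n</div>";
console.log(maskLetters(input));
// <div>
// 1111 11 111 1111111 &nbsp; &lt; 11111 &gt;
// </div>
```

Because alternatives are tried left to right at each position, a tag or entity is always consumed whole before the letter alternative gets a chance to match inside it.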
 
Geoffrey Summerhayes

Not sure if it can be done in one line,
but then I don't have too much experience
using regexps.

Best I could come up with off the cuff (partially tested):

// x is the input string
var y=/(<[^<>&;]*>)|(&[a-z]*;)|([^<>&]*)/g;
var array=x.match(y);
var output="";
for(var i=0;i<array.length;i++)
{
  var str=array[i]; // current chunk: a tag, an entity, or plain text
  if(/[<&]/.test(str))
    output+=str; // tags and entities pass through unchanged
  else
  {
    output+=str.replace(/\S/g,"1"); // mask every non-whitespace character
  }
}
 
Dr J R Stockton

In comp.lang.javascript, in reply to pbreah's question above:

The following code, probably slowly, hides every named entity by adding
999 to the character code of each alphanumeric character in it. Slightly
tested.

It is then trivial to replace all remaining letters by "1" and to see
how to reverse the shift by subtracting 999.

If your text may contain Russian, use another number or be more careful
about reversing, since the shifted characters land in the Cyrillic range.

var St = "<div>\nThis is the content &nbsp; &lt; Hello &gt;\n</div>";

// Encode all \w chars of P by adding x to their character codes
function WOK(P, x) {
  return P.replace(/(\w)/g, function (z, p1) {
    return String.fromCharCode(p1.charCodeAt(0) + x);
  });
}

St = St.replace(/(&\w+;)/g, function (z, p1) { return WOK(p1, +999); });
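
To spell out the remaining two steps, here is a rough sketch of the full round trip: shift the entity names out of the way, mask the letters, then shift back. The names shiftWordChars, maskViaShift and OFFSET are made up for illustration. It assumes the text contains only named entities, no markup, and no characters already in the range U+0417 to U+0461, which is where the shifted digits, letters and underscore end up (hence the caveat about Russian).

```javascript
var OFFSET = 999; // same offset as in the post; any collision-free value would do

// Shift every \w character of s by x code units.
function shiftWordChars(s, x) {
  return s.replace(/\w/g, function (c) {
    return String.fromCharCode(c.charCodeAt(0) + x);
  });
}

// Hypothetical wrapper: hide named entities, mask letters, restore entities.
function maskViaShift(text) {
  // Step 1: move the name of each named entity out of the ASCII range.
  var hidden = text.replace(/&\w+;/g, function (entity) {
    return shiftWordChars(entity, OFFSET);
  });
  // Step 2: every ASCII letter still present is ordinary text.
  var masked = hidden.replace(/[A-Za-z]/g, "1");
  // Step 3: shifted characters are no longer \w, so match their range
  // explicitly (48+999 .. 122+999, i.e. U+0417 .. U+0461) and shift back.
  return masked.replace(/&[\u0417-\u0461]+;/g, function (entity) {
    return entity.replace(/[\u0417-\u0461]/g, function (c) {
      return String.fromCharCode(c.charCodeAt(0) - OFFSET);
    });
  });
}

console.log(maskViaShift("This is the content &nbsp; &lt; Hello &gt;"));
// 1111 11 111 1111111 &nbsp; &lt; 11111 &gt;
```

Numeric entities such as &#160; are not matched by /&\w+;/ and pass through step 2 only because they contain no letters; hexadecimal ones like &#xA0; would need extra handling.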




More is needed if the message can contain markup such as <b>no</b>, since
the markup would also need to be protected.

The complete tool should be able to irreversibly obfuscate the content
of a Web page, so that it could be submitted for criticism without
revealing its textual content.


Afterthought: put a semicolon at the beginning and an ampersand at the
end, and replace every letter between a semicolon and the next ampersand
with a "1".
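
A minimal sketch of that afterthought, with the caveat that the function name maskBetweenEntities is invented here and that it only works under the same assumptions: no markup in the text, and every "&" starts an entity that is closed by a ";".

```javascript
// Hypothetical helper: mask letters that lie between the ";" closing one
// entity and the "&" opening the next.
function maskBetweenEntities(text) {
  // Sentinels make the leading and trailing text look like
  // "between entities" as well.
  var wrapped = ";" + text + "&";
  var masked = wrapped.replace(/;[^&]*&/g, function (run) {
    // run starts at a ";" and ends at the next "&"; only the text
    // in between can contain letters.
    return run.replace(/[A-Za-z]/g, "1");
  });
  // Drop the two sentinel characters again.
  return masked.slice(1, -1);
}

console.log(maskBetweenEntities("This is the content &nbsp; &lt; Hello &gt;"));
// 1111 11 111 1111111 &nbsp; &lt; 11111 &gt;
```

An unescaped "&" in the text would leave the letters after it unmasked, so the input has to be properly escaped first.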

It's a good idea to read the newsgroup and its FAQ. See below.
 
