Regular expression help

RobG

I'm working with XML files that sometimes use a default namespace;
unfortunately, there doesn't seem to be an elegant way of dealing with
them. In some cases I need to modify part of the XPath expression to
include an arbitrary namespace prefix, e.g. change:

/LandXML/Parcels/Parcel

into something like:

/xx:LandXML/xx:Parcels/xx:Parcel

Sometimes the expression starts with // so I've been using the
following regular expression:

expr = expr.replace(/(\/+)/g,'$1xx:');


which works fine in most cases. However, sometimes the expression
includes an attribute value that has slashes. In that case, I don't
want to modify the attribute value's slash. e.g. at the moment,

/LandXML/Parcels/Parcel[@name="79a/SP199095"]

is converted to:

/xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]


Modifying the attribute value means that the result will be wrong. Is
there a regular expression that will only modify slashes outside
square brackets?

An alternative is to fix the namespace when building the expression,
which is less elegant than conditionally modifying the expression in
the evaluator function.
 
Ken Snyder

 > [...] Is there a regular expression that will only modify slashes
 > outside square brackets?

I'm pretty sure a single regular expression can't do that. You could
try splitting the expression at each double-quoted string and replacing
slashes only in the parts outside the quotes (every other element of the
resulting array). Capturing the split text is not supported consistently
across browsers, so you would need to use a function from a library like
Prototype that normalizes it.

The splitting expression might look something like this: /("[^"]+")/
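
As a rough sketch of that idea, assuming an engine whose split() keeps captured groups in the result array (as the ECMAScript specification describes), the quoted literals land at the odd indexes and can simply be skipped. The name addPrefixOutsideQuotes and the "xx" prefix are only illustrative, the pattern uses [^"]* so that empty literals are also matched, and single-quoted literals are not handled here.

```javascript
// Hypothetical helper: add "<prefix>:" after each run of slashes,
// but leave double-quoted string literals untouched.
function addPrefixOutsideQuotes(expr, prefix) {
  // The capturing group keeps each quoted literal in the array, so
  // literals sit at odd indexes and everything else at even indexes.
  var parts = expr.split(/("[^"]*")/);
  for (var i = 0; i < parts.length; i += 2) {
    parts[i] = parts[i].replace(/\/+/g, '$&' + prefix + ':');
  }
  return parts.join('');
}

var result = addPrefixOutsideQuotes(
  '/LandXML/Parcels/Parcel[@name="79a/SP199095"]', 'xx');
// result: '/xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]'
```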

- Ken
 
Dr J R Stockton

In comp.lang.javascript message <66446dfb-e9cb-4d82-8146-377423e48aa8@k4g2000prh.googlegroups.com>,
Tue, 23 Mar 2010 16:32:24, RobG posted:

 > Modifying the attribute value means that the result will be wrong. Is
 > there a regular expression that will only modify slashes outside
 > square brackets?
 >
 > An alternative is to fix the namespace when building the expression,
 > which is less elegant than conditionally modifying the expression in
 > the evaluator function.

If you in fact only need to modify / where not somewhere preceded by [,
you could probably split on [, RegExp-alter [0], and rejoin.

expr = '/LandXML/Parcels/Parcel[@name="79a/SP199095"]'
S = expr.split("[")
S[0] = S[0].replace(/(\/+)/g,'$1xx:');
expr = S.join("[")

gave '/xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]'.

Or, which might be more flexible, you could use

var expr = '/LandXML/Parcels/Parcel[@name="79a/SP199095"]'
var S = "", X = 0, J, C   // S: result so far; X: bracket nesting depth
for (J = 0; J < expr.length; J++) {
  C = expr.charAt(J); S += C
  if (C == "[") X++
  else if (C == "]") X--
  else if (C == "/" && X <= 0) S += "xx:"   // prefix only outside [...]
}
expr = S
 
Antony Scriven

 > [...] I need to modify part of the expression to include
 > a random namespace, e.g. change:
 >
 > /LandXML/Parcels/Parcel
 >
 > into something like:
 >
 > /xx:LandXML/xx:Parcels/xx:Parcel
 >
 > Sometimes the expression starts with // so I've been using the
 > following regular expression:
 >
 > expr = expr.replace(/(\/+)/g,'$1xx:');
 >
 > which works fine in most cases. However, sometimes the expression
 > includes an attribute value that has slashes. In that case, I don't
 > want to modify the attribute value's slash. e.g. at the moment,
 >
 > /LandXML/Parcels/Parcel[@name="79a/SP199095"]
 >
 > is converted to:
 >
 > /xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]
 >
 > Modifying the attribute value means that the result will be wrong. Is
 > there a regular expression that will only modify slashes outside
 > square brackets?

expr = expr.replace(/(?![^[]*\])\//g, '$&xx:');

Can your attribute value contain square brackets or escaped
quotation marks? --Antony
 
Antony Scriven

 > [...]
 >
 > little typo ('+' missing):
 > expr = expr.replace(/(?![^[]*\])\/+/g, '$&xx:');
 > ..................................^

Ah yes, I missed that part of Rob's specification,
thanks. --Antony
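
To see what the corrected pattern does, and where it stops working, here is a small sketch; prefixSteps is a hypothetical wrapper name. The negative lookahead (?![^[]*\]) rejects a slash when a ] appears ahead of it before any [, which is taken to mean that the slash sits inside a predicate.

```javascript
// Hypothetical wrapper around the corrected pattern from this thread.
function prefixSteps(expr) {
  return expr.replace(/(?![^[]*\])\/+/g, '$&xx:');
}

// A leading "//" is kept together and the attribute value is left alone.
var ok = prefixSteps('//LandXML/Parcels/Parcel[@name="79a/SP199095"]');
// ok: '//xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]'

// A "[" inside the attribute value defeats the lookahead: the slash in
// the value is followed by "[" before "]", so it is treated as outside.
var broken = prefixSteps('/a[@n="x/y[z"]/b');
// broken: '/xx:a[@n="x/xx:y[z"]/xx:b'
```

The second call is why the question about square brackets inside attribute values matters for this approach.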
 
RobG

On Mar 23, 11:32pm, RobG wrote:

 > [... ] I need to modify part of the expression to include
 > a random namespace, e.g. change:
 >
 > /LandXML/Parcels/Parcel
 >
 > into something like:
 >
 > /xx:LandXML/xx:Parcels/xx:Parcel
 >
 > Sometimes the expression starts with // so I've been using the
 > following regular expression:
 >
 >  expr = expr.replace(/(\/+)/g,'$1xx:');
 >
 > which works fine in most cases. However, sometimes the expression
 > includes an attribute value that has slashes. In that case, I don't
 > want to modify the attribute value's slash. e.g. at the moment,
 >
 > /LandXML/Parcels/Parcel[@name="79a/SP199095"]
 >
 > is converted to:
 >
 > /xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]
 >
 > Modifying the attribute value means that the result will be wrong. Is
 > there a regular expression that will only modify slashes outside
 > square brackets?

 > expr = expr.replace(/(?![^[]*\])\//g, '$&xx:');
 >
 > Can your attribute value contain square brackets or escaped
 > quotation marks? --Antony

It is for a general XPath expression evaluator, so I don't want to
have any restrictions on attribute values other than those specified
by XML.
 
RobG

 > In comp.lang.javascript message <66446dfb-e9cb-4d82-8146-377423e48aa8@k4g2000prh.googlegroups.com>,
 > Tue, 23 Mar 2010 16:32:24, RobG posted:
 >
 > > Modifying the attribute value means that the result will be wrong. Is
 > > there a regular expression that will only modify slashes outside
 > > square brackets?
 > >
 > > An alternative is to fix the namespace when building the expression,
 > > which is less elegant than conditionally modifying the expression in
 > > the evaluator function.
 >
 > If you in fact only need to modify / where not somewhere preceded by [,
 > you could probably split on [, RegExp-alter [0], and rejoin.
 >
 >         expr = '/LandXML/Parcels/Parcel[@name="79a/SP199095"]'
 >         S = expr.split("[")
 >         S[0] = S[0].replace(/(\/+)/g,'$1xx:');
 >         expr = S.join("[")
 >
 > gave '/xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]'.
 >
 > Or you could use, which might be more flexible,
 >
 >         expr = '/LandXML/Parcels/Parcel[@name="79a/SP199095"]'
 >         S = "" ; X = 0
 >         for (J=0 ; J<expr.length ; J++) { C = expr.charAt(J) ; S += C
 >           if        (C == "[") X++ ;
 >             else if (C == "]") X-- ;
 >             else if (C == "/" && X<=0) S += "xx:" }
 >         expr = S

I'd thought of something along those lines; however, I think I'll deal
with it at the expression builder stage, then pass a variable to
indicate whether a default namespace is being used or not.

It doesn't matter for IE (as it doesn't know what namespaces are
anyway), but Firefox is another matter.
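
A minimal sketch of what handling this at the builder stage might look like, assuming the expression is assembled from a list of element names. buildPath, its parameters and the "xx" prefix are hypothetical and not taken from the actual evaluator.

```javascript
// Hypothetical builder: the prefix is added while the path is assembled,
// so the predicate text never has to be parsed or protected afterwards.
function buildPath(steps, useDefaultNamespace, predicate) {
  var prefix = useDefaultNamespace ? 'xx:' : '';
  var path = '';
  for (var i = 0; i < steps.length; i++) {
    path += '/' + prefix + steps[i];
  }
  return path + (predicate || '');
}

var steps = ['LandXML', 'Parcels', 'Parcel'];
var plain = buildPath(steps, false, '[@name="79a/SP199095"]');
// plain: '/LandXML/Parcels/Parcel[@name="79a/SP199095"]'
var prefixed = buildPath(steps, true, '[@name="79a/SP199095"]');
// prefixed: '/xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]'
```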
 
Antony Scriven

On Mar 23, 11:32pm, RobG wrote:
 > > > [...]
 > > >
 > > > expr = expr.replace(/(\/+)/g,'$1xx:');
 > > >
 > > > [...]
 > > >
 > > > /LandXML/Parcels/Parcel[@name="79a/SP199095"]
 > > >
 > > > is converted to:
 > > >
 > > > /xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]
 > > >
 > > > Modifying the attribute value means that the result
 > > > will be wrong. Is there a regular expression that
 > > > will only modify slashes outside square brackets?
 > >
 > > expr = expr.replace(/(?![^[]*\])\//g, '$&xx:');
 > >
 > > Can your attribute value contain square brackets or
 > > escaped quotation marks? --Antony
 >
 > It is for a general XPath expression evaluator, so
 > I don't want to have any restrictions on attribute values
 > other than those specified by XML.

Now you're moving the goalposts. I've no idea what you're
really trying to accomplish, but it now sounds as though you
might be better off writing a proper parser rather than
messing about with regexps. --Antony
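
Short of a full parser, a hand-written scanner along the lines of the bracket-counting loop posted earlier can also keep track of string literals. The following is only a sketch, assuming XPath 1.0 literals (delimited by ' or ", with no escape sequences); prefixOutsidePredicates is a hypothetical name.

```javascript
// Hypothetical scanner: add "<prefix>:" after slashes that are outside
// both [...] predicates and quoted string literals.
function prefixOutsidePredicates(expr, prefix) {
  var out = '';
  var depth = 0;     // nesting depth of [...] predicates
  var quote = null;  // quote character of the literal we are inside, if any
  for (var i = 0; i < expr.length; i++) {
    var c = expr.charAt(i);
    out += c;
    if (quote) {
      if (c === quote) quote = null;  // literal ends at the matching quote
    } else if (c === '"' || c === "'") {
      quote = c;
    } else if (c === '[') {
      depth++;
    } else if (c === ']') {
      depth--;
    } else if (c === '/' && depth === 0 && expr.charAt(i + 1) !== '/') {
      // Only the last slash of a run such as "//" gets the prefix.
      out += prefix + ':';
    }
  }
  return out;
}

var scanned = prefixOutsidePredicates('/a[@n="x/y[z"]/b', 'xx');
// scanned: '/xx:a[@n="x/y[z"]/xx:b'
```

This still prefixes blindly: steps such as @id, text(), * or .. after a slash, a bare "/" and element paths inside predicates are not treated specially, so it is a starting point rather than a general solution.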
 
Elegie

RobG wrote:

Hello Rob,

 > However, sometimes the expression
 > includes an attribute value that has slashes. In that case, I don't
 > want to modify the attribute value's slash. e.g. at the moment,
 >
 > /LandXML/Parcels/Parcel[@name="79a/SP199095"]
 >
 > is converted to:
 >
 > /xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]
 >
 > Modifying the attribute value means that the result will be wrong. Is
 > there a regular expression that will only modify slashes outside
 > square brackets?

The following should give you some path worth walking :)
 
Antony Scriven

 > > [...] In that case, I don't want to modify the
 > > attribute value's slash. e.g. at the moment,
 > > /LandXML/Parcels/Parcel[@name="79a/SP199095"]
 > >
 > > is converted to:
 > > /xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/xx:SP199095"]
 > >
 > > Modifying the attribute value means that the result
 > > will be wrong. Is there a regular expression that will
 > > only modify slashes outside square brackets?
 >
 > The following should give you some path worth walking :)
 >
 > ---
 >    (s).replace(
 >      /\/([^[/]*(\[(\\.|[^\\\]]*)*\])?)/g,
 >      "/xx:$1"
 >    )

Well, if you like brittle one-liners, try this.

var s = '//LandXML/Parcels/Parcel[@name="]7]' +
'9a/S\\"P19[9\\"]0/9[/5"]/Blah';

// Add xx: after slashes, only if they are not within "...".
// Handles \" within "...".
s = s.replace(
/(\/+)(("([^"]*\\.)*[^"]*"|[^\/])+)/g,
'$1xx:$2'
);

There might be a neater regexp but I'm beginning to lose
the will! And yes, this is now beginning to resemble line
noise. --Antony
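
Since XPath 1.0 string literals have no escape mechanism (a literal is delimited by either ' or " and cannot contain its own delimiter), a variation of the same idea can drop the \" handling and accept both quote styles instead. This is only a sketch; prefixOutsideLiterals is a hypothetical name and the pattern is a variation on the one above, not something posted in the thread.

```javascript
// Hypothetical variation: skip over '...' and "..." literals as units,
// and add "xx:" after each run of slashes found outside them.
function prefixOutsideLiterals(expr) {
  return expr.replace(/(\/+)((?:"[^"]*"|'[^']*'|[^\/])+)/g, '$1xx:$2');
}

var d = prefixOutsideLiterals(
  '//LandXML/Parcels/Parcel[@name="79a/SP199095"]/Blah');
// d: '//xx:LandXML/xx:Parcels/xx:Parcel[@name="79a/SP199095"]/xx:Blah'

var e = prefixOutsideLiterals("/a[@n='x/y']/b");
// e: "/xx:a[@n='x/y']/xx:b"
```

Unlike the bracket-based versions, a slash inside a predicate but outside a literal is also prefixed here, which may or may not be what is wanted.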
 
