Help me figure out this regular expression

asaf.peled

I have an XML editor (oXygen) that allows regular expressions in its
Find/Replace area.

It accepts Perl 5 regular expressions, which I only started teaching
myself the very basics of a few days ago.

Anyway, I don't know if this is possible or not, but it would make my
life 1000 times easier if it were.

I have XML that looks like this:

<book>
<title>some title</title>
<chapter/>
</book>

Basically, I need to erase entire book chunks whose chapter nodes are
empty. I'm not a programmer; otherwise I'm sure I could write some quick
Java to do this in two seconds.
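Since the underlying task is structural, an XML parser can do it without regular expressions at all. A minimal Python sketch using the standard library's xml.etree.ElementTree, assuming the <book> elements sit directly under a single root element (the <opus> root here is a placeholder) and that a chapter counts as empty when it has no child elements and no non-whitespace text. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def is_empty(element: ET.Element) -> bool:
    # "Empty" here means: no child elements and no non-whitespace text.
    return len(element) == 0 and not (element.text or "").strip()


def drop_empty_chapter_books(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # findall() returns a new list, so removing from root inside the loop is safe.
    for book in root.findall("book"):
        if any(is_empty(chapter) for chapter in book.findall("chapter")):
            root.remove(book)
    return ET.tostring(root, encoding="unicode")


sample = """<opus>
<book><title>title no chap</title><chapter/></book>
<book><title>title with chap</title><chapter>chapter title</chapter></book>
</opus>"""

print(drop_empty_chapter_books(sample))
```

Unlike a regex, this never confuses one book's tags with another's. Note that by default ElementTree drops comments and does not write the XML declaration back out.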

I was playing around with the regexp Find/Replace in oXygen, and the
best I can do is match the entire <title>some title</title> line with
the expression <title>.+</title>. What I need is to match that entire
chunk above. I tried things like
<book>\S+<title>.+</title>\S+<chapter/>\S+</book> or
<book>[\t\n\r\f\v]*<title>, etc.

These don't work.

Does anyone know how to accomplish this?

Thanks for your help!
 

roger_owen333

You wrote:

<book>
<title>some title</title>
<chapter/>
</book>

You tried:

<book>\S+<title>.+</title>\S+<chapter/>\S+</book>

\s Match whitespace character
\S Match non-whitespace character
. Match any character
* Match 0 or more times
+ Match 1 or more times

Perhaps:
<book>\s*<title>.+</title>\s*<chapter/>\s*</book>
may work. Roger ;-)
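A quick way to see why the \S+ version fails while the \s* version matches is to run both against the sample. This sketch uses Python's re module as a stand-in for Perl 5 regular expressions (\s, \S, . and the quantifiers behave the same way here), and it searches the whole text at once rather than line by line:

```python
import re

sample = """<book>
<title>some title</title>
<chapter/>
</book>"""

# \S+ demands at least one NON-whitespace character between the tags,
# but only a newline sits there, so this can never match.
attempt = r"<book>\S+<title>.+</title>\S+<chapter/>\S+</book>"

# \s* allows zero or more whitespace characters, and newlines count as whitespace.
suggestion = r"<book>\s*<title>.+</title>\s*<chapter/>\s*</book>"

print(re.search(attempt, sample))           # None
print(bool(re.search(suggestion, sample)))  # True
```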
 

Robert Watkins

I don't know if this is exactly what you're after, because it will match
any book chunk that contains an empty <chapter/> (and ignores the <title>
element entirely):

<book>(.*?)<chapter/>(.*?)</book>\s*

Note that the full substitution in Perl would be:

s#<book>(.*?)<chapter/>(.*?)</book>\s*##gs

where the g flag makes the replacement global (every match, not just the
first), and the s flag allows . to match newlines. I don't know whether
these flags are available in oXygen.
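If the editor does not expose those flags, the same substitution can be run outside it. A rough Python translation, assuming Python's re module as a stand-in for Perl: re.DOTALL plays the role of the s flag, and re.sub already replaces every match, which is what g does. The function name is hypothetical, and the input is the sample document assumed below:

```python
import re


def strip_empty_chapter_books(xml_text: str) -> str:
    # Equivalent of Perl's s#<book>(.*?)<chapter/>(.*?)</book>\s*##gs
    # re.DOTALL: let "." match newlines (Perl's /s).
    # re.sub replaces all matches by default (Perl's /g).
    pattern = re.compile(r"<book>(.*?)<chapter/>(.*?)</book>\s*", re.DOTALL)
    return pattern.sub("", xml_text)


doc = """<opus>
<book>
<title>title no chap</title>
<chapter/>
</book>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>"""

print(strip_empty_chapter_books(doc))
```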

Also, I am assuming that your XML is something like:

<opus>
<book>
<title>title no chap</title>
<chapter/>
</book>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>

and that the result you want would be:

<opus>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>
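One caveat with this pattern: .*? is lazy, but the match still starts at the leftmost <book> the engine can find. So if a book with a real chapter comes before a book with an empty one, a single match can run from the first book through the next <chapter/> and delete both. The sketch below reverses the two books to show this, and compares the result with a "tempered" variant, (?:(?!</book>).)*?, that refuses to step past a </book>. The tempered pattern is a swapped-in technique, not something suggested in this thread, and Python's re is assumed as a stand-in for Perl:

```python
import re

doc = """<opus>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
<book>
<title>title no chap</title>
<chapter/>
</book>
</opus>"""

naive = re.compile(r"<book>(.*?)<chapter/>(.*?)</book>\s*", re.DOTALL)

# Each "." is only consumed if a </book> does not start at that position,
# so a match can never run from one book into the next.
tempered = re.compile(
    r"<book>(?:(?!</book>).)*?<chapter/>(?:(?!</book>).)*?</book>\s*",
    re.DOTALL,
)

naive_result = naive.sub("", doc)        # both books are deleted
tempered_result = tempered.sub("", doc)  # only the empty-chapter book is deleted

print(naive_result)
print(tempered_result)
```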

-- RW
 

Matt Garrish

Roger suggested:

<book>\s*<title>.+</title>\s*<chapter/>\s*</book>

I would strongly suggest you read up on greedy vs. non-greedy pattern
matching. .+ is greedy: it matches everything up to the *last* closing
title tag that is still followed by the rest of the pattern. More often
than not, that is not what you want. <title>.*?</title> is more likely
what you were looking for.
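A small Python illustration of the difference, assuming two title elements on the same line (Python's re treats .+ and .*? the same way Perl does):

```python
import re

line = "<title>first</title><title>second</title>"

# Greedy: .+ runs to the end of the line, then backs off to the LAST </title>.
greedy = re.search(r"<title>.+</title>", line).group()

# Non-greedy: .*? stops at the FIRST </title> that lets the pattern succeed.
lazy = re.search(r"<title>.*?</title>", line).group()

print(greedy)  # <title>first</title><title>second</title>
print(lazy)    # <title>first</title>

# The non-greedy form also lets findall pick out each title separately.
print(re.findall(r"<title>(.*?)</title>", line))  # ['first', 'second']
```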

Matt
 

Robert Watkins

I have since played with oXygen, and I can't get any regular expression
in the "find" field to match across more than a single line. Even their
{$NEWLINE} variable gives an error if put into the "find" field. My guess
is that their claim to support "any Perl regular expression" is a bit
overblown.
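The effect of matching one line at a time can be simulated in a few lines of Python. This is only a model of the behavior described in this thread, not oXygen's actual implementation: a pattern that has to cross a newline can never match any single line, even though it matches the whole text.

```python
import re

text = """<book>
<title>some title</title>
<chapter/>
</book>"""

pattern = re.compile(r"<book>\s*<title>.+</title>\s*<chapter/>\s*</book>")

# Model of a tool that feeds the regex one line at a time.
line_hits = [line for line in text.splitlines() if pattern.search(line)]

# Searching the whole buffer at once instead.
whole_hit = pattern.search(text) is not None

print(line_hits)  # []
print(whole_hit)  # True
```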

-- RW
 

asaf.peled

Yeah, I got a response from their tech support saying that regular
expressions only apply to one line at a time, and that they plan to fix
this in their next version.
 
