Help me figure out this regular expression

asaf.peled · Jan 25, 2006

I have an XML Editor (Oxygen) that allows regular expressions in its
Find / Replace area.

It accepts Perl 5 regexp which I taught myself the very basics of only
a few days ago.

Anyway, I don't know if this is possible or not, but it would make my life
1000 times easier if it was.

I have XML that looks like this:

<book>
<title>some title</title>
<chapter/>
</book>

Basically I need to erase entire book chunks whose chapter nodes are
empty. I'm not a programmer; otherwise I'm sure I could write some quick
Java to do this in two seconds.

I was playing around with the regexp find/replace in Oxygen, and the
best I can do is match the entire <title>some title</title> line with
the following expression: <title>.+</title>. But what I need is to match
that entire chunk above. I tried things like
<book>\S+<title>.+</title>\S+<chapter/>\S+</book> or
<book>[\t\n\r\f\v]*<title> etc.

These don't work.

Does anyone know how to accomplish this?

Thanks for your help!

roger_owen333 · Jan 25, 2006

<book>
<title>some title</title>
<chapter/>
</book>

\s  Match whitespace character
\S  Match non-whitespace character
.   Match any character (except a newline, by default)
*   Match 0 or more times
+   Match 1 or more times

Your attempt uses \S+ (non-whitespace) between the tags:

<book>\S+<title>.+</title>\S+<chapter/>\S+</book>

Perhaps:

<book>\s*<title>.+</title>\s*<chapter/>\s*</book>

may work.

Roger ;-)
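The difference between \S+ and \s* can be checked outside the editor. The following is a minimal sketch in Python, whose re module uses largely Perl-compatible syntax for these constructs; the variable names are made up for illustration, and it says nothing about how oXygen itself applies the pattern.

```python
import re

# The sample chunk from the question, with real newlines between the tags.
sample = """<book>
<title>some title</title>
<chapter/>
</book>"""

# \S+ demands at least one NON-whitespace character between the tags,
# but only a newline sits there, so this cannot match.
original_attempt = re.compile(r"<book>\S+<title>.+</title>\S+<chapter/>\S+</book>")

# \s* accepts any run of whitespace, including newlines.
suggested = re.compile(r"<book>\s*<title>.+</title>\s*<chapter/>\s*</book>")

attempt_matches = original_attempt.search(sample) is not None
suggested_matches = suggested.search(sample) is not None
print(attempt_matches, suggested_matches)  # False True
```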

Robert Watkins · Jan 25, 2006

Don't know if this is exactly what you're after, because it will match
any book chunk that's got a single empty chapter (and doesn't care about
the <title/> element):

<book>(.*?)<chapter/>(.*?)</book>\s*

note that the full expression in Perl would be:

s#<book>(.*?)<chapter/>(.*?)</book>\s*##gs

where the g flag says this is a global replacement, and the s flag says
to allow . to match newlines. These flags may be available in oXygen, I
don't know.
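For anyone who wants to try the effect of the flags outside the editor, here is a rough Python equivalent, offered as a sketch rather than as what oXygen does. In Python, re.sub replaces every match by default (the role of Perl's g flag) and re.DOTALL plays the role of the s flag. The names chunk, without_s and with_s are only illustrative.

```python
import re

chunk = "<book>\n<title>some title</title>\n<chapter/>\n</book>\n"
regex = r"<book>(.*?)<chapter/>(.*?)</book>\s*"

# Without DOTALL, "." stops at every newline, so the pattern never
# reaches <chapter/> and nothing is replaced.
without_s = re.sub(regex, "", chunk)

# With DOTALL (Perl's s flag), "." also matches newlines and the whole
# chunk, plus trailing whitespace, is removed.
with_s = re.sub(regex, "", chunk, flags=re.DOTALL)

print(without_s == chunk)  # True
print(repr(with_s))        # ''
```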

Also, I am assuming that your xml is something like:

<opus>
<book>
<title>title no chap</title>
<chapter/>
</book>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>

and that the result you want would be:

<opus>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>
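One caveat, shown here as a Python sketch under the assumption that the s flag is in effect: with the sample above the expression gives exactly this result, but if a book with a real chapter comes *before* a book with an empty one, the non-greedy .*? is still free to run across </book><book> to reach the later <chapter/>, and both books are deleted. A possible workaround is to forbid the dot from stepping over </book>, using a negative lookahead. The tempered pattern below is an illustration, not something tested in oXygen.

```python
import re

FLAGS = re.DOTALL  # Python's counterpart of the Perl s flag

simple = re.compile(r"<book>(.*?)<chapter/>(.*?)</book>\s*", FLAGS)
# Same idea, but each "." is only allowed where </book> does not start.
tempered = re.compile(
    r"<book>(?:(?!</book>).)*?<chapter/>(?:(?!</book>).)*?</book>\s*", FLAGS
)

empty = "<book>\n<title>title no chap</title>\n<chapter/>\n</book>\n"
full = "<book>\n<title>title with chap</title>\n<chapter>chapter title</chapter>\n</book>\n"

empty_first = "<opus>\n" + empty + full + "</opus>\n"
full_first = "<opus>\n" + full + empty + "</opus>\n"
wanted = "<opus>\n" + full + "</opus>\n"

# Order used in the post: both patterns give the wanted result.
print(simple.sub("", empty_first) == wanted)    # True
print(tempered.sub("", empty_first) == wanted)  # True

# Reversed order: the simple pattern swallows both books.
print(simple.sub("", full_first) == wanted)     # False
print(tempered.sub("", full_first) == wanted)   # True
```

Like the original expression, the tempered version still removes any book that contains at least one empty chapter, whatever else the book holds.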

-- RW

Matt Garrish · Jan 25, 2006

roger_owen333 wrote:

Perhaps:
<book>\s*<title>.+</title>\s*<chapter/>\s*</book>

I would strongly suggest you read up on greedy vs. non-greedy pattern
matching. .+ will match everything up to the *last* closing title tag
that is followed by the rest of the pattern. More often than not, this
is not what you want. <title>.*?</title> is more likely what you were
looking for.
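A small illustration of the difference, written in Python as a sketch; the input line with two titles is invented for the example.

```python
import re

# Hypothetical input with two title elements on one line.
line = "<title>first</title><title>second</title>"

# Greedy: .+ grabs as much as it can, so the match runs to the LAST </title>.
greedy = re.search(r"<title>.+</title>", line).group(0)

# Non-greedy: .*? grabs as little as it can, so the match stops at the
# FIRST </title>.
lazy = re.search(r"<title>.*?</title>", line).group(0)

print(greedy)  # <title>first</title><title>second</title>
print(lazy)    # <title>first</title>
```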

Matt

Robert Watkins · Jan 27, 2006

I have since played with oXygen, and can't get any regular expression to
work, in the "find" field, over more than a single line -- even their
{$NEWLINE} variable gives an error if put into the "find" field. My guess
is that their claim to support "any Perl regular expression" is a bit
overblown.
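The symptom is what one would expect if the pattern is tested against each line separately, which is only my assumption about what the editor does. A Python sketch of that assumption, with illustrative names:

```python
import re

chunk = "<book>\n<title>some title</title>\n<chapter/>\n</book>\n"
pattern = re.compile(r"<book>\s*<title>.*?</title>\s*<chapter/>\s*</book>")

# Line-at-a-time search: each line is tested on its own, so no single
# line ever contains both <book> and </book>.
per_line_hits = [line for line in chunk.splitlines() if pattern.search(line)]

# Whole-buffer search: the newlines are visible to \s*, so the chunk matches.
whole_hit = pattern.search(chunk) is not None

print(per_line_hits)  # []
print(whole_hit)      # True
```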

-- RW

asaf.peled · Jan 30, 2006

Yeah, I got a response from their tech support saying that regular
expressions only apply to one line at a time, and that they plan to fix
it in their next version.
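Until the editor can match across lines, one way around it is a short script that parses the XML instead of using a regular expression. The following is only a sketch in Python using the standard xml.etree.ElementTree module; the function names are made up, and it assumes the file looks like the <opus>/<book>/<chapter> sample earlier in the thread and that "empty" means a chapter with no child elements and no text other than whitespace.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """True if the element has no children and no non-whitespace text."""
    return len(element) == 0 and not (element.text or "").strip()


def drop_books_with_empty_chapter(xml_text):
    """Return the XML with every <book> that has an empty <chapter> removed."""
    root = ET.fromstring(xml_text)
    # Copy the lists before removing so the tree is not changed mid-iteration.
    for parent in list(root.iter()):
        for book in list(parent.findall("book")):
            if any(is_empty(ch) for ch in book.findall("chapter")):
                parent.remove(book)
    return ET.tostring(root, encoding="unicode")


sample = """<opus>
<book>
<title>title no chap</title>
<chapter/>
</book>
<book>
<title>title with chap</title>
<chapter>chapter title</chapter>
</book>
</opus>"""

cleaned = drop_books_with_empty_chapter(sample)
remaining_titles = [t.text for t in ET.fromstring(cleaned).iter("title")]
print(remaining_titles)  # ['title with chap']
```

Whitespace between elements may come out slightly different from the input, so compare the structure rather than the exact text.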
