Regular expression to match a nested quoted string

a

I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Any idea how to write this in boost::xpressive or boost::regex?

Thanks,

A
 
Jim Langston

a said:
I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Any idea how to write this in boost::xpressive or boost::regex?

Thanks,

A

I'm not familiar with boost, but in C++ you need to escape the " like \"
Try
"beginning \"nested quoted string\" end"
 
Pete Becker

a said:
I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Quotes aren't special characters in any of the regular expression
grammars supported by TR1, so the regular expression is

""(.*)""

To write that as a C++ text string, add escapes to the quotes:

const char *rx = "\"\"(.*)\"\"";

That is, unless you're using the basic or grep grammars. In those cases
you need \( instead of (, so the regular expression is

""\(.*\)""

and the corresponding literal constant has more escapes:

const char *rx = "\"\"\\(.*\\)\"\"";

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
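
For readers with a C++11 or later compiler (newer than this thread), a raw string literal removes the extra layer of escapes described above. This is only a small sketch: the variable names and the `rx` delimiter are arbitrary choices for illustration, not something from the post.

```cpp
#include <cassert>
#include <string>

// Escaped form, as in the post: each quote and backslash needs its own backslash.
const std::string escaped_ecma  = "\"\"(.*)\"\"";
const std::string escaped_basic = "\"\"\\(.*\\)\"\"";

// Raw string literals: everything between R"rx( and )rx" is taken verbatim,
// so the pattern reads exactly as it would outside of C++.
const std::string raw_ecma  = R"rx(""(.*)"")rx";
const std::string raw_basic = R"rx(""\(.*\)"")rx";
```

Both spellings produce the same characters, so either can be handed to a regex constructor.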
 
a

Thanks for your answers.

I'll clarify my question.

The issue for me is not how to escape quote characters in a C quoted string,
but rather what regular expression to use to match quoted strings that
contain other nested quoted strings.

The strings are coming from an external text file in Microsoft rc file
format used to describe resources used by an application. Here is a sample:

IDD_ADD_DUPLICATE
......
CAPTION "Duplicate Entry Found"
......
LTEXT "You are trying to add the entry""%s <%s>"" to your
list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
entries instead of creating a new one?",
.......
END

My expression must match the first string ("Duplicate Entry Found" ) as well
as the _entire_ string after LTEXT:

"You are trying to add the entry""%s <%s>"" to your list, but ""%s <%s>""
already exists.\n\n\nWould you like to merge these two entries instead of
creating a new one?"

If I use a simple regex for a quoted string, it will stop at the first " and
it will match only: "You are trying to add the entry".

So the question is how to make the expression recognize that one '"' is the
end of the string, but '""' is part of the string.

I tried using static regexes in boost::xpressive, but they generate runtime
stack overflows when I add patterns for the inner "" characters, so I'm
assuming the expressions are not correct.

Thanks,

A
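
The rule stated above (a single '"' ends the string, a doubled '""' stays inside it) can also be checked without any regex, which sidesteps the stack-overflow concern entirely. The following is a minimal sketch of that swapped-in technique, a hand-written scan; `end_of_quoted` is a hypothetical helper name, and the caller is assumed to pass the index of an opening quote.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: `open` must be the index of an opening quote in `text`.
// Returns the index one past the closing quote, or std::string::npos if the
// string never closes. A doubled quote ("") counts as part of the string.
std::size_t end_of_quoted(const std::string& text, std::size_t open)
{
    std::size_t i = open + 1;
    while (i < text.size()) {
        if (text[i] != '"') {
            ++i;                     // ordinary character
        } else if (i + 1 < text.size() && text[i + 1] == '"') {
            i += 2;                  // doubled quote: skip both, stay inside
        } else {
            return i + 1;            // lone quote: end of the string
        }
    }
    return std::string::npos;        // ran off the end without a closing quote
}
```

The quoted text, quotes included, is then `text.substr(open, end - open)` whenever the returned `end` is not `npos`.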
 
David Harmon

I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Quotes aren't special characters in any of the regular expression
grammars supported by TR1, so the regular expression is

""(.*)""

Close, but no cigar. That will match a string that has quotes at
the beginning and end, but the quotes inside (if any) might not be
doubled.

If a simple quoted string "A" is matched by /"[^"]*"/
or, as a C++ literal, "\"[^\"]*\""

then a string with doubled quotes in it, "A""A", is just a series
of the above. So it is /("[^"]*")*/, or in C++, "(\"[^\"]*\")*".



#include <string>
#include <iostream>

#define BOOST_REGEX_NO_FILEITER
#include <boost/regex.hpp>

// Report whether the whole of `what` is a series of simple quoted strings.
// Note that the trailing * also lets an empty input match.
bool check(const std::string& what)
{
    static const boost::regex expr("(\"[^\"]*\")*");
    boost::smatch tokens;
    bool result = boost::regex_match(what, tokens, expr);
    std::cout << result << "\t(" << what << ")\n";
    return result;
}

int main()
{
    // Well-formed: any inner quotes are doubled.
    check("\"I am a duck\"");
    check("\"I am a \"\" duck\"");
    check("\"I \"\"am\"\" a \"\" duck\"");

    // Malformed: unquoted text, or a lone quote inside the string.
    check("I am not a duck");
    check("\"I am \"not\" a duck\"");
    check("I am \"not\" a duck");
    check("\"I am not \" a duck\"");
}
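
The program above validates one token at a time. To pull every quoted string out of a larger text such as the rc sample, one possible variation is sketched below. It assumes `std::regex` (the standardized form of the TR1 library mentioned earlier) instead of Boost so that it compiles on its own, and it uses a slightly different pattern that cannot match the empty string; `find_quoted` is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every quoted string in `text`, quotes included.
std::vector<std::string> find_quoted(const std::string& text)
{
    // An opening quote, then runs of non-quote characters separated by
    // doubled quotes, then the closing quote.
    static const std::regex quoted(R"rx("[^"]*(?:""[^"]*)*")rx");

    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), quoted), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

Because `[^"]` also matches a newline, a string that wraps across lines in the input is returned as a single match. The same pattern should carry over to boost::regex, though that has not been verified here.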
 
Pete Becker

David said:
On Sat, 09 Sep 2006 06:58:37 -0400 in comp.lang.c++, Pete Becker



Close, but no cigar. That will match a string that has quotes at
the beginning and end, but the quotes inside (if any) might not be
doubled.

You're reading far too much into a vague specification. The regular
expression I gave matches the nested quoted string in the example.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
 
a

Actually David's regex does the job. It's my fault that my initial specs
weren't too clear - see my previous reply to your post that better describes
the problem.

Thanks,

A
 
Pete Becker

a said:
IDD_ADD_DUPLICATE
.....
CAPTION "Duplicate Entry Found"
.....
LTEXT "You are trying to add the entry""%s <%s>"" to your
list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
entries instead of creating a new one?",
......
END

My expression must match the first string ("Duplicate Entry Found" ) as well
as the _entire_ string after LTEXT:

Okay, with more context, it looks like David Harmon's guess was right.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
 
a

Thanks! This works.

Now I only have to be able to mark the content of the outermost string
(without the enclosing quotes) and I'm done.

Thanks,

A
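
One way to read "the content of the outermost string" is: everything between the first and last quote, with each doubled quote collapsed back to a single one. Under that assumption, a capture group around the inner part is enough. The sketch below uses `std::regex` so that it is self-contained; `unquote` is a hypothetical helper name, and the pattern differs from the one quoted in this thread because a capture group inside a repeated group would keep only its last repetition.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: if `token` is exactly one quoted string, store its
// content (outer quotes removed, "" collapsed to ") in `out` and return true.
// On failure `out` is left unchanged.
bool unquote(const std::string& token, std::string& out)
{
    // Group 1 captures everything between the enclosing quotes.
    static const std::regex quoted(R"rx("([^"]*(?:""[^"]*)*)")rx");
    static const std::regex doubled(R"rx("")rx");

    std::smatch m;
    if (!std::regex_match(token, m, quoted)) {
        return false;
    }
    out = std::regex_replace(m[1].str(), doubled, "\"");
    return true;
}
```

If the raw text with the doubled quotes still in place is what is needed, `m[1].str()` can be used directly and the replace step dropped.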


 
David Harmon

On Sat, 09 Sep 2006 16:58:37 GMT in comp.lang.c++, "a"
Now I only have to be able to mark the content of the outermost string
(without the enclosing quotes) and I'm done.

This time I have to complain about ambiguity. What is the
"outermost" string? The even numbered captured pieces?
 
