A big problem with a regular expression

Guest

I need to match a string in a text file (actually a stored procedure
file). The code is below:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void test1()
{
    String regex =
        "(-{128})(\\s*\\r\\n\\s*\\r\\n)(-{2})(\\s*)ADD(\\s*)YOUR(\\s*)CODE(\\s*)HERE\\r\\n((.|\\r|\\n)*)(-{2})(\\s*)END(\\s*)OF(\\s*)YOUR(\\s*)CODE\\r\\n\\r\\n(-{128})";
    // readTextFromFile is my own helper: it reads the text file into a string
    String text = readTextFromFile("C:\\test.txt");

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(text);

    if (matcher.find())
    {
        System.out.println("matches");
    }
    else
    {
        System.out.println("Not match");
    }

    // Wait for Enter so the console window stays open
    BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
    try
    {
        input.readLine();
    }
    catch (IOException ex)
    {
        // ignored: we are only pausing before exit
    }
}
Unfortunately, it fails with a stack overflow. I guess the JDK matches
regular expressions recursively, so when the regex is complicated the
stack overflows. Is that right? Can someone explain this and help me
solve the problem? Thanks.
 
hiwa

Trying it with a simple text string like "asdfghjjkl" doesn't cause a
stack overflow.
However, I have some questions about your current regex string:
(1) Why so many capturing groups?
(2) \s already includes \r and \n, so what should happen to the explicit \r\n parts?
(3) Why not use DOTALL mode?
(4) A plain dot with * is greedy and matches as many characters as it can,
so what is supposed to match after this part: ((.|\\r|\\n)*) ?
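
Points (3) and (4) can be sketched roughly as follows. This is only a minimal illustration, not the original poster's full pattern: the marker lines are shortened, and the class name DotallSketch and the method extractBody are made up for the example. It assumes Pattern.DOTALL so that a plain dot also matches line terminators, and a reluctant quantifier (.*?) so that the match stops at the first END marker instead of the last one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallSketch {
    // Shortened, hypothetical version of the markers from the first post.
    // DOTALL lets '.' match \r and \n; '.*?' is reluctant (non-greedy).
    static final Pattern BLOCK = Pattern.compile(
            "--\\s*ADD\\s*YOUR\\s*CODE\\s*HERE\\r?\\n(.*?)--\\s*END\\s*OF\\s*YOUR\\s*CODE",
            Pattern.DOTALL);

    // Returns the text between the two markers, or null if they are not found.
    static String extractBody(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "-- ADD YOUR CODE HERE\r\nSELECT 1\r\nSELECT 2\r\n-- END OF YOUR CODE\r\n";
        System.out.println(extractBody(text));
    }
}
```

With only one capturing group, group(1) is the body between the markers, which also addresses question (1).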
 
Guest

Actually, it's a JDK design problem, and somebody has filed a bug for it:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4675952
When a regex includes a pattern like "(a|b)*", a StackOverflowError can
occur on long input.
However, since Sun does not seem to be planning a fix, I have changed the
pattern to:
(DECLARE\s*@object\s*int\s*--declare\s*the\s*object\s*variable)([\w\W]*)(CAST\(@error\s*as\s*nvarchar\(200\)\))[\r\n\s]*(END)
and it works.

(1) I want to match the text
"DECLARE @object int
--declare the object variable.
..................................
..................................
CAST(@error as nvarchar(200))
END
"
(2) That was my mistake; I had thought that \s could not match \r and \n.
(3) Maybe it's OK to use the dot to match all characters.
(4) I just want to match any run of characters, and I have now changed
that part to [\w\W]*
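
The difference between the two forms can be tried out with a small sketch like the one below. The BEGIN/END markers, the class name ClassVsAlternation and the helper methods are invented for the illustration. As far as I understand the java.util.regex implementation, a repeated group containing an alternation is matched recursively, one stack frame per repetition, while a repeated single character class is matched in a loop; whether the alternation form actually overflows depends on the JVM version, the stack size and the input length, so the sketch catches the error instead of assuming it.

```java
import java.util.regex.Pattern;

public class ClassVsAlternation {
    // Repeated group with alternation, as in the first post (may overflow on long input).
    static final Pattern ALTERNATION = Pattern.compile("BEGIN((.|\\r|\\n)*)END");
    // Repeated character class, as in the workaround: [\w\W] matches any character.
    static final Pattern CHAR_CLASS = Pattern.compile("BEGIN([\\w\\W]*)END");

    // Builds a long multi-line input between hypothetical BEGIN/END markers.
    static String bigInput(int lines) {
        StringBuilder sb = new StringBuilder("BEGIN\r\n");
        for (int i = 0; i < lines; i++) {
            sb.append("SELECT 1\r\n");
        }
        return sb.append("END").toString();
    }

    static boolean findsWithCharClass(String text) {
        return CHAR_CLASS.matcher(text).find();
    }

    // Reports what happened rather than letting the error escape.
    static String tryAlternation(String text) {
        try {
            return ALTERNATION.matcher(text).find() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        String text = bigInput(20000);
        System.out.println("character class: " + findsWithCharClass(text));
        System.out.println("alternation:     " + tryAlternation(text));
    }
}
```

Using Pattern.DOTALL with a plain dot should behave like the character class here, since both repeat a single-character matcher rather than a group.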
 
