Regex help

SpideyPKT

I need some help with a regex I'm trying to get working. I'm trying to
write a regex which will handle a variable number of args in a string.
It will look something like this:

args=['token 1']
or
args=['token 1', 'token 2']
or
args=['token 1', 'token 2', 'token 3']
and so on...

I'm trying to write it so that I can get the tokens with something like
this:
while (true) {
    try {
        tokens.add(m.group(i));
        i++;
    } catch (IndexOutOfBoundsException e) {
        break;
    }
}

The closest I got to this correctly working was:
"args=\\[(?:\\s*\'(.*?)\',)*\\s*(?:\'(.+?)\')\\]\\s*". The problem I'm
having is that with the single argument case, I get 2 tokens, the first
one is null, the second is 'token 1'.

The 2 argument case works fine, but the case of 3+ arguments puts
'token 1' in the first token and everything else into the second token,
so I know I've got a problem with a greedy quantifier somewhere, but I
can't figure out where it is.

Any ideas?
 
Chris Smith

SpideyPKT said:
args=['token 1']
or
args=['token 1', 'token 2']
or
args=['token 1', 'token 2', 'token 3']
and so on...

I'm trying to write it so that I can get the tokens with something like
this:
while (true) {
    try {
        tokens.add(m.group(i));
        i++;
    } catch (IndexOutOfBoundsException e) {
        break;
    }
}

Nope, not possible. Sorry. Although you can recognize this string with
regular expressions, you can't get an arbitrary number of groups from
Matcher. Instead, you ought to first parse out the list, then use
String.split() to get the pieces.
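
A minimal sketch of that two-step approach, for illustration only. The class name ArgsSplitter and the method parse are made up for this example, and it assumes that no token contains a comma (a comma inside a token would be split in the wrong place):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not from the thread.
final class ArgsSplitter {
    // Step 1: one fixed group that captures everything between the brackets.
    private static final Pattern LIST = Pattern.compile("args=\\[(.*)\\]\\s*");

    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = LIST.matcher(input);
        if (!m.matches()) {
            return tokens; // not an args=[...] string
        }
        String body = m.group(1).trim();
        if (body.isEmpty()) {
            return tokens; // args=[] has no tokens
        }
        // Step 2: split on commas (assumes tokens never contain commas).
        for (String piece : body.split("\\s*,\\s*")) {
            // Strip the surrounding single quotes when both are present.
            if (piece.length() >= 2 && piece.startsWith("'") && piece.endsWith("'")) {
                piece = piece.substring(1, piece.length() - 1);
            }
            tokens.add(piece);
        }
        return tokens;
    }
}
```

Calling ArgsSplitter.parse("args=['token 1', 'token 2', 'token 3']") should give a list of the three tokens without their quotes, however many there are, since the regex itself only ever needs one group.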

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
shakah

I'm not sure this is what you're trying to do, but what about the
following?

jc@sarah:~/tmp$ cat regextest3.java
public class regextest3 {
    public static void main(String [] asArgs) {
        java.util.regex.Pattern p =
            java.util.regex.Pattern.compile(asArgs[0]) ;
        for(int nArg=1; nArg<asArgs.length; ++nArg) {
            System.out.println("looking in '" + asArgs[nArg] + "'") ;
            java.util.regex.Matcher m = p.matcher(asArgs[nArg]) ;
            while(m.find()) {
                for(int i=0; i<m.groupCount(); ++i) {
                    String sTemp = m.group(i+1) ;
                    if(null!=sTemp) {
                        System.out.println(" found: '" + sTemp + "'") ;
                    }
                }
            }
        }
    }
}

jc@sarah:~/tmp$ /usr/java/jdk1.5.0_01/bin/java regextest3 "'([^']*)'" "args=['token 1', 'token 2', 'token 3']"
looking in 'args=['token 1', 'token 2', 'token 3']'
found: 'token 1'
found: 'token 2'
found: 'token 3'
 
Oliver Wong

I think this problem can be solved with a small hand-written DFA instead
of a regular expression. Skip everything until you see the opening ', then
record everything until you see the closing '. If you allow quotes inside
your strings to be escaped, e.g. 'Hello, I\'m doing fine', then you'll need
to add a few extra states to handle the escaping, but it shouldn't be too
difficult to do.
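
A rough sketch of such a state machine, under some assumptions: the class name QuoteScanner, the method scan and the three state names are invented for this example, a backslash is taken to escape whatever single character follows it, and a token that is never closed is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: names and escaping rules are assumptions.
final class QuoteScanner {
    private enum State { OUTSIDE, INSIDE, ESCAPED }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        State state = State.OUTSIDE;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case OUTSIDE:
                    // Skip everything until the opening quote.
                    if (c == '\'') {
                        state = State.INSIDE;
                    }
                    break;
                case INSIDE:
                    if (c == '\\') {
                        state = State.ESCAPED;   // next char is taken literally
                    } else if (c == '\'') {
                        tokens.add(current.toString()); // closing quote ends the token
                        current.setLength(0);
                        state = State.OUTSIDE;
                    } else {
                        current.append(c);
                    }
                    break;
                case ESCAPED:
                    // Record the escaped character and go back to the token.
                    current.append(c);
                    state = State.INSIDE;
                    break;
            }
        }
        return tokens;
    }
}
```

With this, QuoteScanner.scan("args=['token 1', 'token 2']") should return the two tokens, and an input containing 'Hello, I\'m doing fine' should come back as one token with the apostrophe kept and the backslash removed.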

- Oliver
 
