Pattern and Regex

newsnet customer

Hi,

I would like to use a regular expression to find patterns in a string.
Consider the string seq = "ABCDEFHIJKLMNOP".
I would like to find whichever of three patterns (ABC, NOP, HIJ) comes first.
I expect the output "0", the index of ABC.
Long story short, I couldn't get it to work; my two attempts are below.
Any help with this regular expression, which is doing my head in, would be much appreciated.

Cheers
ST

Attempt One:
String seq = "ABCDEFHIJKLMNOP";

Pattern p1 = Pattern.compile("ABC" || "NOP" || "HIJ");
Matcher m = p1.matcher(seq);

while ( m.find())
{
System.out.println(m.start());
//nothing happens
}


Attempt Two:
String seq = "ABCDEFHIJKLMNOP";

Pattern p1 = Pattern.compile("ABC");
Pattern p2 = Pattern.compile("NOP");
Pattern p3 = Pattern.compile("HIJ");

Matcher m1 = p1.matcher(seq);
Matcher m2 = p2.matcher(seq);
Matcher m3 = p3.matcher(seq);

while ( m1.find() || m2.find() || m3.find() )
{
//won't work cos I don't know which matcher gave me the true
}
 
Bart Cremers

You got pretty close with your first attempt. The problem is that you're
mixing the regex with a Java OR (||) instead of staying within the regex.

Pattern p1 = Pattern.compile("ABC|NOP|HIJ");

If you run this you'll see 3 matches. If you only want the first,
replace the "while" with a simple "if".
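
For reference, here is a minimal, self-contained sketch of that suggestion. The class name FirstAlternative and the helper firstIndex are made up for illustration and are not part of the original post; the pattern and the "if instead of while" idea are the ones described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstAlternative {
    // Hypothetical helper: index of the first occurrence of ABC, NOP or HIJ,
    // or -1 if none of them occurs in seq.
    static int firstIndex(String seq) {
        // The alternation lives inside the regex string, not in Java's || operator.
        Pattern p = Pattern.compile("ABC|NOP|HIJ");
        Matcher m = p.matcher(seq);
        // A single find() (the "if" variant) stops at the leftmost match.
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIndex("ABCDEFHIJKLMNOP")); // prints 0
    }
}
```

Returning -1 for "no match" is just one possible convention, chosen here to mirror String.indexOf.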

Regards,

Bart
 
Jussi Piitulainen

newsnet said:
String seq = "ABCDEFHIJKLMNOP";

Pattern p1 = Pattern.compile("ABC" || "NOP" || "HIJ");
Matcher m = p1.matcher(seq);

while ( m.find())
{
System.out.println(m.start());
//nothing happens
}

Did you just ignore the error message from the Java compiler?
Don't do that.

As to the pattern, you want "ABC|NOP|HIJ".
 
newsnet customer

Bart Cremers said:
You got pretty close with your first attempt. The problem is that you're
mixing the regex with a Java OR (||) instead of staying within the regex.

Pattern p1 = Pattern.compile("ABC|NOP|HIJ");

If you run this you'll see 3 matches. If you only want the first,
replace the "while" with a simple "if".

Regards
Bart

Thanks.
Consider this harder problem, which I'm not sure a regular expression can
solve.
Take the string below. Ignore the white space: it shouldn't be there, but I
deliberately put it in so you and others can see what I'm talking about.
I want to find the first instance of any of the three patterns (FHI,
HIJ, NOP).
But that first instance must fall on a block of three. That rules out FHI,
leaving HIJ and NOP. Since HIJ comes before NOP, HIJ is the output.

String seq = "ABC DEF HIJ KLM NOP";

Pattern p1 = Pattern.compile("FHI|"HIJ"|"NOP");
Matcher m = p1.matcher(seq);

if ( m.find())
{
//do something
}


EXPECT:
HIJ

(1) I have tried putting * in front, but that doesn't work
Pattern p1 = Pattern.compile("***FHI|"***HIJ"|"***NOP");
//compile error
(2) I have tried putting * in front, but that doesn't work
Pattern p1 = Pattern.compile("*FHI|"*HIJ"|"*NOP");
//doesn't give me the right output

I know I'm close, like before.
Help appreciated.

Cheers
ST
 
Bart Cremers

You could simply combine the regex operation with a simple modulo
operation on the start of the match. It works in your simple example
case, but might not work for more complex cases:

String seq = "ABCDEFHIJKLMNOP";

Pattern p1 = Pattern.compile("FHI|HIJ|NOP");

Matcher m = p1.matcher(seq);

int start = 0;
while (m.find(start)) {
    System.out.printf("%3d - %s", m.start(),
            seq.substring(m.start(), m.end()));
    if (m.start() % 3 == 0) {
        System.out.println(" -> OK");
        // maybe break out here
    } else {
        System.out.println(" -> ignore");
    }
    start = m.start() + 1;
}
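
The same idea can be wrapped in a helper that stops at the first aligned match. This is only a sketch: the class name AlignedMatch and the method firstAlignedIndex are made up for illustration, and it assumes, as in the example above, that blocks are three characters wide and start at index 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlignedMatch {
    // Hypothetical helper: index of the first match of FHI, HIJ or NOP that
    // starts on a multiple of 3, or -1 if there is no such match.
    static int firstAlignedIndex(String seq) {
        Matcher m = Pattern.compile("FHI|HIJ|NOP").matcher(seq);
        int start = 0;
        // find(int) resets the matcher and searches from the given index,
        // so restarting one past the previous match also sees overlapping hits.
        while (m.find(start)) {
            if (m.start() % 3 == 0) {
                return m.start();
            }
            start = m.start() + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // FHI is found first at index 5 and skipped; HIJ at index 6 is aligned.
        System.out.println(firstAlignedIndex("ABCDEFHIJKLMNOP")); // prints 6
    }
}
```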


Bart
 
Gordon Beaton

newsnet said:
(1) I have tried putting * in front, but that doesn't work
Pattern p1 = Pattern.compile("***FHI|"***HIJ"|"***NOP");
//compile error
(2) I have tried putting * in front, but that doesn't work
Pattern p1 = Pattern.compile("*FHI|"*HIJ"|"*NOP");
//doesn't give me the right output

I know I'm close, like before.

First, the whole regex must be a single string, enclosed between one
pair of quotation marks. Try to remember this. Neither of your
examples is even compilable.

Second, quantifiers (such as * and ?) can't be used on their own, they
must be preceded by a pattern to modify. So .* will match 0 or more
characters, while \s* will match 0 or more whitespace characters, etc.
Instead of guessing, read the regex documentation and think about what
you're trying to do.
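
To illustrate the point about quantifiers, here is a small sketch that checks whether a regex string is accepted by java.util.regex at all. The class name QuantifierCheck and the helper compiles are made up for illustration. A leading * has nothing to repeat, so Pattern.compile rejects it with a PatternSyntaxException, while .* and \s* are fine.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuantifierCheck {
    // Hypothetical helper: true if the regex is syntactically valid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown for a quantifier with nothing in front of it, for example.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("*FHI"));     // false: dangling *
        System.out.println(compiles(".*FHI"));    // true: * applies to .
        System.out.println(compiles("\\s*FHI"));  // true: * applies to \s
    }
}
```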

If you have optional whitespace among the stuff you really want to
match, try something like this (untested):

"\\s*((F\\s*H\\s*I)|(H\\s*I\\s*J)|(N\\s*O\\s*P))\\s*"
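
Since the pattern above is marked untested, here is a quick sketch that runs it against the spaced-out string from the question. The class name WhitespaceTolerant and the helper firstGroup are made up for illustration. Note what it shows: the pattern tolerates whitespace, but it does not respect block boundaries, so the first hit is "F HI", which straddles DEF and HIJ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceTolerant {
    static final Pattern P = Pattern.compile(
            "\\s*((F\\s*H\\s*I)|(H\\s*I\\s*J)|(N\\s*O\\s*P))\\s*");

    // Hypothetical helper: text of the first match without the surrounding
    // whitespace (group 1), or null if nothing matches.
    static String firstGroup(String seq) {
        Matcher m = P.matcher(seq);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Whitespace inside the match is allowed, so F + space + HI matches first.
        System.out.println(firstGroup("ABC DEF HIJ KLM NOP")); // prints F HI
    }
}
```

If the whitespace really is only there for readability, removing it first and then applying an alignment check may be the simpler route.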

/gordon
 
newsnet customer

Bart Cremers said:
You could simply combine the regex operation with a simple modulo
operation on the start of the match. It works in your simple example
case, but might not work for more complex cases:

String seq = "ABCDEFHIJKLMNOP";

Pattern p1 = Pattern.compile("FHI|HIJ|NOP");

Matcher m = p1.matcher(seq);

int start = 0;
while (m.find(start)) {
System.out.printf("%3d - %s", m.start(),
seq.substring(m.start(), m.end()));
if (m.start() % 3 == 0) {
System.out.println(" -> OK");
// maybe break out here
} else {
System.out.println(" -> ignore");
}
start = m.start() + 1;
}


Bart

Cheers, Bart.
You have been really helpful.
If I can't get it to work without the modulus, that is, using just
the regular expression, then I will use your code.

ST
 
Jussi Piitulainen

newsnet said:
If i can't get it to work without using the modulus.
that is, just using the regular expression [...]

The following prints the shortest prefix of triples in args[0] that
ends in one of FHI, HIJ and NOP.

Pattern p = Pattern.compile("(...)*?(FHI|HIJ|NOP)");
Matcher m = p.matcher(args[0]);
if (m.find()) {
    System.out.println(m.group(0));
}
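
A variant of the same regex-only idea that returns the index of the matched triple rather than printing the prefix might look like the sketch below. The class name TripleAligned and the method firstAlignedIndex are made up for illustration. It swaps find() for lookingAt(), which is my assumption about the intent: lookingAt() pins the match to the start of the input, so the lazy run of triples can never begin at an unaligned offset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripleAligned {
    // (?:...)*? lazily skips whole triples; group 1 is the wanted triple.
    static final Pattern P = Pattern.compile("(?:...)*?(FHI|HIJ|NOP)");

    // Hypothetical helper: index of the first block-aligned FHI, HIJ or NOP,
    // or -1 if no aligned block matches.
    static int firstAlignedIndex(String seq) {
        Matcher m = P.matcher(seq);
        // lookingAt() matches only from index 0 but need not consume all input.
        return m.lookingAt() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstAlignedIndex("ABCDEFHIJKLMNOP")); // prints 6
    }
}
```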
 
