Regex: Capturing and replacing question

Hal Vaughan

I can't find an actual reference to this in the API (I'm using 1.4.2), so I
want to be sure what I'm doing is "legal."

I'm using regexes to find any place in a string where there is a small
letter followed by a capital one and put a space between them. I'm
using "([a-z])([A-Z])" as the pattern to search for and using "$1 $2" as
the replacement string. This is working well in all my tests, but since I
didn't find it documented where I'd feel safe, I thought I should check
(I've also learned, in Perl, just how tricky regexes can be).

Is it correct that $1 in a replacement string references the first captured
text sequence in the regex? And so on with $2, $3....?

I've included my test case below (and I've tested more strings than what I
have in the code for now). I just want to be sure there aren't side
effects or other issues I'm not aware of!

Thanks!

Hal
---------------------------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tester {

    public static void main(String[] args) {
        Tester tTest = new Tester();
        tTest.test("1AlphaBeta");
        tTest.test("AlphaBetaGammaDelta");
        // Also test any strings passed on the command line.
        for (int x = 0; x < args.length; x++) {
            tTest.test(args[x]);
        }
    }

    public Tester() {}

    public void test(String sInput) {
        // Group 1 is the small letter, group 2 the capital that follows it.
        String sPattern = "([a-z])([A-Z])", sOutput = "", sReplace = "$1 $2";
        Pattern pRegex;
        Matcher mRegex;
        pRegex = Pattern.compile(sPattern);
        mRegex = pRegex.matcher(sInput);
        // $1 and $2 in the replacement refer to groups 1 and 2 of each match.
        sOutput = mRegex.replaceAll(sReplace);
        System.out.println("Input: " + sInput + ", Output: " + sOutput);
    }

}
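
For a one-off replacement, the same result can probably be had without separate Pattern and Matcher variables, since String.replaceAll accepts the same pattern and replacement syntax. A minimal sketch follows; the class name CamelSpacer and the method addSpaces are made up for illustration and are not part of the test case above.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class CamelSpacer {
    // Compiled once and reused; Pattern instances are immutable.
    private static final Pattern LOWER_THEN_UPPER = Pattern.compile("([a-z])([A-Z])");

    static String addSpaces(String input) {
        // $1 and $2 refer to the first and second capturing groups.
        return LOWER_THEN_UPPER.matcher(input).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("1AlphaBeta"));
        System.out.println(addSpaces("AlphaBetaGammaDelta"));
        // String.replaceAll takes the same pattern and replacement strings.
        System.out.println("AlphaBeta".replaceAll("([a-z])([A-Z])", "$1 $2"));
    }
}
```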
 
Hal Vaughan

Roedy said:
"([a-z])([A-Z])"

You might want to experiment without the (). I use 3 different regex
schemes in a day. I forget which ones need the ().

My understanding is I need them to create capture groups. Without them, I
get an error at the line with the Matcher.replaceAll() command.

I just want to be sure the $1 and $2 specifically refer to captured
sequences.
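
To illustrate the error mentioned above: as far as I can tell, a replacement string that refers to a group the pattern never defines makes replaceAll throw an IndexOutOfBoundsException as soon as a match is found. A small sketch under that assumption, with a made-up class name NoGroupDemo:

```java
// Hypothetical demo class; the names are illustrative only.
class NoGroupDemo {
    static boolean throwsWithoutGroups(String input) {
        try {
            // No parentheses, so the pattern defines no capturing groups,
            // yet the replacement still asks for groups 1 and 2.
            input.replaceAll("[a-z][A-Z]", "$1 $2");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Threw: " + throwsWithoutGroups("AlphaBeta"));
    }
}
```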

Hal
 
Joshua Cranmer

Hal said:
I'm using regexes to find any place in a string where there is a small
letter followed by a capital one and put a space between them. I'm
using "([a-z])([A-Z])" as the pattern to search for and using "$1 $2" as
the replacement string. This is working well in all my tests, but since I
didn't find it documented where I'd feel safe, I thought I should check
(I've also learned, in Perl, just how tricky regexes can be).

Alternatively, this should work:

"(?<=[a-z])(?=[A-Z])" replaced with " ".

Hal said:
Is it correct that $1 in a replacement string references the first captured
text sequence in the regex? And so on with $2, $3....?

The documentation for Matcher.appendReplacement says that $1 is replaced by
the result of group(1), so that is correct.
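
For what it's worth, both points in this post can be checked with a short sketch. It compares replaceAll with "$1 $2", a hand-written loop that calls group(1) and group(2) directly, and the lookaround version, which matches the empty position between the two letters and so needs no groups at all. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupReferenceDemo {
    private static final Pattern LOWER_UPPER = Pattern.compile("([a-z])([A-Z])");

    // Replacement string with $1 and $2 group references.
    static String withDollarReferences(String input) {
        return LOWER_UPPER.matcher(input).replaceAll("$1 $2");
    }

    // The same replacement built by hand from group(1) and group(2).
    static String withExplicitGroups(String input) {
        Matcher m = LOWER_UPPER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the text between the previous match and this one.
            m.appendReplacement(out, "");
            out.append(m.group(1)).append(' ').append(m.group(2));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Zero-width lookbehind and lookahead: nothing is consumed,
    // so the replacement is just the space itself.
    static String withLookaround(String input) {
        return input.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    public static void main(String[] args) {
        String input = "AlphaBetaGammaDelta";
        System.out.println(withDollarReferences(input));
        System.out.println(withExplicitGroups(input));
        System.out.println(withLookaround(input));
    }
}
```

All three methods should print the same line for the sample input.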
 
Lew

Hal said:
I'm using regexes to find any place in a string where there is a small
letter followed by a capital one and put a space between them. I'm
using "([a-z])([A-Z])" as the pattern to search for and using "$1 $2" as
the replacement string. This is working well in all my tests, but since I
didn't find it documented where I'd feel safe,

From the Javadoc for java.util.regex.Pattern:
Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right.
In the expression ((A)(B(C))), for example, there are four such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Group zero always stands for the entire expression.
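
The numbering in that excerpt can be checked with a short sketch that matches ((A)(B(C))) against "ABC" and collects every group. The class name GroupNumberingDemo and the helper groups are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupNumberingDemo {
    // Returns group 0 (the whole match) followed by groups 1..groupCount().
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("input does not match");
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [ABC, ABC, A, BC, C]
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
    }
}
```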

Joshua said:
The Matcher.appendReplacement says that $1 should be the output of
group(1), so that is correct.

<http://java.sun.com/javase/6/docs/a...ent(java.lang.StringBuffer, java.lang.String)>

The documentation is in the Javadocs for the relevant classes. Javadocs are
often a great first place to look. The Javadocs are a place where you will
"find it documented" and should be "where [you]'d feel safe" to trust it.
 
Hal Vaughan

Lew said:
Hal said:
I'm using regexes to find any place in a string where there is a small
letter followed by a capital one and put a space between them. I'm
using "([a-z])([A-Z])" as the pattern to search for and using "$1 $2" as
the replacement string. This is working well in all my tests, but since I
didn't find it documented where I'd feel safe,

From the Javadoc for java.util.regex.Pattern:
Groups and capturing

Capturing groups are numbered by counting their opening parentheses from
left to right. In the expression ((A)(B(C))), for example, there are four
such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Group zero always stands for the entire expression.

Joshua said:
The Matcher.appendReplacement says that $1 should be the output of
group(1), so that is correct.

<http://java.sun.com/javase/6/docs/a...html#appendReplacement(java.lang.StringBuffer%20java.lang.String)>

The documentation is in the Javadocs for the relevant classes. Javadocs are
often a great first place to look. The Javadocs are a place where you will
"find it documented" and should be "where [you]'d feel safe" to trust it.

I goofed on that. I read all the info for Pattern, since that had all the
regex expressions and patterns, then went over the documentation at the
start of Matcher, but I didn't expect to find something like this explained
in the description of a particular method. I also had Googled quite a bit,
but had a high noise-to-signal ratio and kept getting sections that gave
the same general regex explanations. I would think that something like
this should have been in the main part of Pattern or Matcher and not in a
method description.

Hal
 
Lew

Lew said:

Hal said:
I read all the info for Pattern, since that had all the
regex expressions and patterns, then went over the documentation at the
start of Matcher, but I didn't expect to find something like this explained
in the description of a particular method. ... I would think that something like
this should have been in the main part of Pattern or Matcher and not in a
method description.

I couldn't agree more. In fact, I was feeling somewhat incensed at Sun for
that when I finally turned up the reference. I had remembered that it was in
the 'docs for Pattern or Matcher, but not that it was obscurely buried in a
method description. That'll teach us to read every dripping word in every
corner of the Javadocs, for sure!

This is definitely a frak-up by Sun.

C'mon, Sun, polish up that Javadoc page!
 
Hal Vaughan

Lew said:
I couldn't agree more. In fact, I was feeling somewhat incensed at Sun for
that when I finally turned up the reference. I had remembered that it was in
the 'docs for Pattern or Matcher, but not that it was obscurely buried in a
method description. That'll teach us to read every dripping word in every
corner of the Javadocs, for sure!

I find, though, that no matter what I read when I get a question like this,
it doesn't matter. I'll check the API docs, I'll check my books, then I'll
Google under the terms that I think would work (and often
include "tutorial" since that gives me good example pages), and it's always
on the one doc page I didn't read or I need to use one term in Google that
I didn't think of. There are times I've had to post just to ask what the
proper term is for something so I know what to look up.

Lew said:
This is definitely a frak-up by Sun.

Ah, a fellow Galactica fan! ;)

Lew said:
C'mon, Sun, polish up that Javadoc page!

It definitely should be included in the main part, and in the Pattern page
as well. Even though it's not used as part of a pattern (maybe it could be,
it works that way in Perl), it's related closely enough it should be
included there.
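
On the "maybe it could be" point: as far as I know, java.util.regex does not treat $1 as a group reference inside a pattern, but it does support backreferences written as \1 (so \\1 in a Java string literal). A hedged sketch, with a made-up class name BackreferenceDemo, that collapses a doubled word:

```java
// Hypothetical demo class; the names are illustrative only.
class BackreferenceDemo {
    static String collapseDoubledWords(String input) {
        // \\1 inside the pattern must match the same text group 1 captured;
        // $1 in the replacement then puts that word back once.
        return input.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubledWords("the the cat"));
    }
}
```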

Hal
 
Lew

Lew said:
Farscape.

Let me explain. "Frak" is the Galactica term, "Frell" the Farscape term. I
got the word from Galactica, but I was much more a Farscape fan.

Even on SCIFI.com they cop to the cognate nature:
 
