Java regexp help

stevengarcia

I'm having trouble writing a regular expression (using Java) to extract
the numeric portion of a String. Some examples are

"$.50." --> .50
"$0.50." --> 0.50
"$6.25." --> 6.25
"$12.30." --> 12.30
"$6.25" --> 6.25
"$0.50" --> 0.50
"2350." --> 2350
"10." --> 10
"$23a.4." no match

Note that the first character can optionally be a $, and the last
character can optionally be a .

Thus far I have a regular expression that is
^\\$?(\\d*\\.?\\d*)\\.?
which reads "0 or 1 dollar sign, 0 or more digits, 0 or 1 period, 0 or
more digits, 0 or 1 period."

As far as I can tell this is the correct expression. But I want to
extract what is between the parentheses, as shown in the
expression. It's not working; my simple program is below (with
results):

I would appreciate any help!


public static void main(String[] args) {
    String[] lines = new String[] { "$.50.",
            "$0.50.",
            "$6.25.",
            "$12.30.",
            "$6.25",
            "$0.50",
            "2350.",
            "10.",
            "$23a.4."};

    String regexp = "^\\$?(\\d*\\.?\\d*)\\.?";
    Pattern pattern = Pattern.compile(regexp);

    for (int i = 0; i < lines.length; i++) {
        System.out.println("Does text " + lines[i] + " match regexp? "
                + lines[i].matches(regexp));
        if (lines[i].matches(regexp)) {
            Matcher matcher = pattern.matcher(lines[i]);
            while (matcher.find()) {
                System.out.println(" " + lines[i] + " matches " +
                        regexp + " with " + matcher.group() +
                        " Start <" + matcher.start() +
                        ">, End <" + matcher.end() + ">.");
            }
        }
    }
}

and the output I get is

Does text $.50. match regexp? true
$.50. matches ^\$?(\d*\.?\d*)\.? with $.50. Start <0>, End <5>.
Does text $0.50. match regexp? true
$0.50. matches ^\$?(\d*\.?\d*)\.? with $0.50. Start <0>, End <6>.
Does text $6.25. match regexp? true
$6.25. matches ^\$?(\d*\.?\d*)\.? with $6.25. Start <0>, End <6>.
Does text $12.30. match regexp? true
$12.30. matches ^\$?(\d*\.?\d*)\.? with $12.30. Start <0>, End <7>.
Does text $6.25 match regexp? true
$6.25 matches ^\$?(\d*\.?\d*)\.? with $6.25 Start <0>, End <5>.
Does text $0.50 match regexp? true
$0.50 matches ^\$?(\d*\.?\d*)\.? with $0.50 Start <0>, End <5>.
Does text 2350. match regexp? true
2350. matches ^\$?(\d*\.?\d*)\.? with 2350. Start <0>, End <5>.
Does text 10. match regexp? true
10. matches ^\$?(\d*\.?\d*)\.? with 10. Start <0>, End <3>.
Does text $23a.4. match regexp? false
 
pet0etie

I don't know if I understood the question correctly, but this might be a solution:

import java.util.regex.*;

public class Value {
    public static void main(String[] args) {
        String[] lines = new String[] {
            "$.50.", "$0.50.", "$6.25.", "$12.30.", "$6.25", "$0.50", "2350.", "10.", "$23a.4."
        };
        String regexp = "^\\$?(\\d*\\.?\\d*)\\.?";
        Pattern pattern = Pattern.compile(regexp);

        for (int i = 0; i < lines.length; i++) {
            System.out.println("Does text " + lines[i] + " match regexp? " +
                    lines[i].matches(regexp));
            if (lines[i].matches(regexp)) {
                Matcher matcher = pattern.matcher(lines[i]);
                while (matcher.find()) {
                    System.out.println(" " + lines[i] + " matches " + regexp + " with " +
                            matcher.group() + " Start <" + matcher.start() + ">, End <" +
                            matcher.end() + ">.");
                    // start modification //
                    // skip a leading '$' and a trailing '.' of the matched text
                    int start = matcher.start() +
                            ((lines[i].charAt(matcher.start()) == '$') ? 1 : 0);
                    int end = matcher.end() -
                            ((lines[i].charAt(matcher.end() - 1) == '.') ? 1 : 0);
                    System.out.println(" The extracted value = " +
                            lines[i].substring(start, end));
                    // end modification //
                }
            }
        }
    }
}

greetz,
pet0etie
 
Alan Moore

Your biggest problem is that you're using matcher.group() to extract
the number, when you should be using matcher.group(1). But there's a
problem with your regex, too. Here's the result I get:

$.50. --> .50
$0.50. --> 0.50
$6.25. --> 6.25
$12.30. --> 12.30
$6.25 --> 6.25
$0.50 --> 0.50
2350. --> 2350.
10. --> 10.
$23a.4. no match

As you can see, it fails to filter out the trailing dot. In your
regex, everything is optional--each atom controlled by a '?' or a
'*'--and that gives it too much leeway in how it matches. But you
know there has to be at least one group of digits and, if there is
only one, it should be after any dot, not before. To express that
restriction in the regex, just change the second "\\d*" to "\\d+".

import java.util.regex.*;

class Test
{
    public static void main(String[] args)
    {
        String[] lines = new String[] {
            "$.50.",
            "$0.50.",
            "$6.25.",
            "$12.30.",
            "$6.25",
            "$0.50",
            "2350.",
            "10.",
            "$23a.4."
        };

        // second \\d* changed to \\d+ so the digits after the dot are required
        String regexp = "^\\$?(\\d*\\.?\\d+)\\.?";
        Pattern pattern = Pattern.compile(regexp);

        for (int i = 0; i < lines.length; i++)
        {
            Matcher matcher = pattern.matcher(lines[i]);
            if (matcher.matches())
            {
                // group(1) is the text captured by the parentheses
                System.out.println(lines[i] + "\t--> " + matcher.group(1));
            }
            else
            {
                System.out.println(lines[i] + "\tno match");
            }
        }
    }
}


result:

$.50. --> .50
$0.50. --> 0.50
$6.25. --> 6.25
$12.30. --> 12.30
$6.25 --> 6.25
$0.50 --> 0.50
2350. --> 2350
10. --> 10
$23a.4. no match
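
For anyone comparing the two points above side by side, here is a minimal sketch that puts them in one place: the difference between group() and group(1), and the effect of changing the second \d* to \d+. The class name GroupDemo and the helper extract are made up for this illustration and are not part of the code posted earlier in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The pattern from the question: every piece is optional.
    static final Pattern ORIGINAL = Pattern.compile("^\\$?(\\d*\\.?\\d*)\\.?");
    // The same pattern with the second \d* changed to \d+.
    static final Pattern FIXED = Pattern.compile("^\\$?(\\d*\\.?\\d+)\\.?");

    // Hypothetical helper: returns what capture group 1 holds,
    // or null when the whole input does not match the pattern.
    static String extract(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // group() is the entire match; group(1) is only the parenthesized part.
        Matcher m = ORIGINAL.matcher("$6.25.");
        if (m.matches()) {
            System.out.println("group()  = " + m.group());   // $6.25.
            System.out.println("group(1) = " + m.group(1));  // 6.25
        }

        // With the original pattern the optional dot inside the group
        // takes the trailing dot; with \d+ it has to be left outside.
        System.out.println("original: " + extract(ORIGINAL, "2350."));  // 2350.
        System.out.println("fixed:    " + extract(FIXED, "2350."));     // 2350
    }
}
```

The helper uses matches() rather than find(), so the whole string has to fit the pattern, which is why "$23a.4." comes back as no match.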
 
