newbie Java regexp question

mitchmcc

Below is a small test program I wrote to try and
do a simple parse of an XML expression, where I
can extract the tag(s) and the data on a single
line. Yes, I know about the other ways to parse
real XML, but I am trying to learn Java only. My
test case is very simple (see below). The problem
seems to be something tricky about the fact that
I am reading the input from the console.

I have tried the regexp in all of the following forms:

Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

In Windows cmd.exe, none of these match when I enter

<t1>foo</t1>

as standard input.

Any advice would be greatly appreciated.

Mitch

-----------------------------------------------------------------------------------------------

import java.io.*;
import java.net.*;
import java.util.regex.*;

public class test {
    public static void main(String[] args) throws IOException {

        PrintWriter out = null;
        BufferedReader stdIn = null;
        String server = "";
        String userInput;

        stdIn = new BufferedReader(new InputStreamReader(System.in));

        // read arguments
        if (args.length == 1) {
            server = args[0];
        } else {
            System.out.println("no args");
        }

        // this one works, but is not really what I want
        // Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)<(\\S+)>");

        // this one is the correct one that won't match unless the closing tag
        // matches the opening tag, but I cannot get it to work with input from
        // the console...
        Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

        Matcher m1 = p1.matcher("<t1>foo</t1>\r\n");
        System.out.println("matched test string = " + m1.matches());

        while ((userInput = stdIn.readLine()) != null) {

            System.out.println("got user input: " + userInput + " length " +
                    userInput.length());

            // Now see if the pattern matches

            Matcher m = p1.matcher(userInput);

            System.out.println("matched = " + m.matches());

            System.out.println("numGroups found: " + m.groupCount() + "\n");

            // If there were matches, print out the groups found

            if (m.matches()) {

                for (int j = 1; j <= m.groupCount(); j++) {
                    System.out.println("group " + m.group(j) + " found\n");
                } // end for
            } // end if

        } // end while

        stdIn.close();

    } // end main

} // end class test
 
david.karr

Below is a small test program I wrote to try and
do a simple parse of an XML expression, where I
can extract the tag(s) and the data on a single
line. Yes, I know about the other ways to parse
real XML, but I am trying to learn Java only.

You're going to be following all sorts of gnarly, twisty passages if
you try to avoid learning the XML APIs. The functionality for parsing
XML is readily available in the standard Java libraries.

Feel free to explore regular expressions as an intellectual exercise,
but it's a waste of time if you're actually trying to produce real
code to parse XML.
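
For comparison, here is a rough sketch of what the standard-library route might look like for the one-line test case. The class name DomLineParse and the helper parse are made up for illustration; the JAXP and DOM calls are the stock ones shipped with the JDK. Treat it as one possible approach, not the only way to do it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomLineParse {

    // Hypothetical helper: returns {tagName, textContent} for a
    // one-element XML fragment such as "<t1>foo</t1>".
    static String[] parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new String[] { root.getTagName(), root.getTextContent() };
        } catch (Exception e) {
            // The parser rejects input that is not well-formed,
            // for example a closing tag that does not match the opening tag.
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] result = parse("<t1>foo</t1>");
        System.out.println("tag " + result[0] + ", data " + result[1]);
    }
}
```

Unlike the regex, this also copes with whitespace in the element text, since the parser does the tag matching itself.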
 
timjowers

It works with the first form:

Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");

You may be putting whitespace in the text of the element. Try
revising the regexp to look for anything that is not the terminator.
E.g. this works as is:
<i>test</i>

Yet this does not:
<i>test two</i>
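
A minimal sketch of the "anything not the terminator" idea, assuming the terminator is the '<' that starts the closing tag. The class name ElementTextPattern and the helper text are invented for this example; only the middle group of the original pattern is changed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementTextPattern {

    // [^<]+ accepts any run of characters up to the next '<', spaces included.
    static final Pattern P = Pattern.compile("<(\\S+)>([^<]+)</\\1>");

    // Hypothetical helper: returns the element text, or null when the
    // whole line does not match the pattern.
    static String text(String line) {
        Matcher m = P.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(text("<i>test</i>"));      // one word
        System.out.println(text("<i>test two</i>"));  // text containing a space
        System.out.println(text("<i>test two</b>"));  // mismatched closing tag
    }
}
```

The back-reference \1 still does the tag check, so the third line is rejected while the first two are accepted.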


TimJOwers
 
kaldrenon

E.g. this
works as is:
<i>test</i>

Yet this does not.
<i>test two</i>

TimJOwers

Which could easily be fixed by replacing the (\\S+) in the middle with
(.*?) or (.+), I believe.
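
Here is a small sketch of how a reluctant and a greedy middle group might differ, assuming the two forms meant are (.*?) and (.+). With matches() on a single element both give the same text; the difference only shows up with find() on a line that holds more than one element. The class name GreedyVsReluctant and the helper firstText are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    // Hypothetical helper: text captured by group 2 of the first match
    // found anywhere in the line, or null when nothing matches.
    static String firstText(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String reluctant = "<(\\S+)>(.*?)</\\1>";
        String greedy = "<(\\S+)>(.+)</\\1>";

        // A single element: both forms capture the same text.
        System.out.println(firstText(reluctant, "<i>test two</i>"));
        System.out.println(firstText(greedy, "<i>test two</i>"));

        // Two elements on one line: the greedy form runs on to the
        // last closing tag, the reluctant form stops at the first one.
        String two = "<i>a</i> <i>b</i>";
        System.out.println(firstText(reluctant, two));
        System.out.println(firstText(greedy, two));
    }
}
```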
 
Roedy Green

Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

You have 4 things that have to work for your regex as a whole to work.
Chop your pattern down so it just matches <t1>; then, when you get that
working, add the next bit.
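
One way to sketch that piece-by-piece approach is to try each growing prefix of the pattern with lookingAt(), which only requires a match at the start of the input. The class name IncrementalPattern, the STEPS array, and the helper matchesPrefix are all invented for illustration.

```java
import java.util.regex.Pattern;

public class IncrementalPattern {

    // The pattern from the question, built up one piece at a time.
    static final String[] STEPS = {
        "<(\\S+)>",             // opening tag
        "<(\\S+)>(\\S+)",       // plus the data
        "<(\\S+)>(\\S+)</",     // plus the start of the closing tag
        "<(\\S+)>(\\S+)</\\1>"  // plus the back-reference to the tag name
    };

    // Hypothetical helper: true when the regex matches a prefix of the input.
    static boolean matchesPrefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String input = "<t1>foo</t1>";
        for (String step : STEPS) {
            System.out.println(step + " -> " + matchesPrefix(step, input));
        }
    }
}
```

The first step that reports false is the piece to look at. Printing the captured groups at each step is also worthwhile, because \S matches '>' as well, so a group can swallow more than expected.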

Instead of trying all the possible \n variations, have a look at your
string and see what is actually on the end. Use charAt to examine it.
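
A minimal sketch of that check follows. BufferedReader.readLine() is documented to return the line without its line-termination characters, so the string handed to the matcher should never end in \n or \r\n, and matches() against a pattern that demands them would fail. The class name LineEndingCheck and the helper describe are made up, and a StringReader stands in for the console so the run is repeatable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LineEndingCheck {

    // Hypothetical helper: lists the numeric code of every char in s,
    // so invisible characters such as \r (13) and \n (10) show up.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this simulates typing <t1>foo</t1> plus Enter on Windows.
        BufferedReader in =
                new BufferedReader(new StringReader("<t1>foo</t1>\r\n"));
        String line = in.readLine();

        System.out.println("length " + line.length());
        System.out.println("last char code "
                + (int) line.charAt(line.length() - 1));
        System.out.println("all codes: " + describe(line));
    }
}
```

If the last code printed is 62 ('>') rather than 10 or 13, the terminator was stripped before the regex ever saw the line.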

see http://mindprod.com/jgloss/regex.html
