Scanner class and regex problem

Lee Weiner

I teach Java, and we're switching to 1.5.0 next semester. I was thinking
about using the Scanner class to read data from text files, but I'm having
a problem specifying a delimiter string.

The file I'm using for the following example contains two records:

Weiner@572-6544@57
Kirby@572-6544@36

Using the following:

import java.util.Scanner;
import java.io.FileNotFoundException;
import java.io.File;

public class ScannerFile
{
    public static void main ( String[] args )
    {
        try
        {
            Scanner scan = new Scanner( new File( "lee.txt" ) );
            scan.useDelimiter( "\\s+" ); //1 or more white space chars
            while( scan.hasNext() )
            {
                System.out.println( "*" + scan.next() + "*" );
            }
            scan.close();
        }
        catch(FileNotFoundException exc)
        {
            System.out.println( "Error - Input file not found. Terminating." );
            System.exit( 1 );
        }
        System.exit(0);
    }
}

I get:

*Weiner@572-6544@57*
*Kirby@572-6544@36*

Exactly what I expect, but if I also want to delimit on the "@" signs with

scan.useDelimiter( "[@\\s+]" );

I get:

*Weiner*
*572-6544*
*57*
**
*Kirby*
*572-6544*
*36*
**

Can anyone tell me what I'm doing to cause that extra empty token at the
end of each record? I'm running under Windows XP, if that's important.

Lee Weiner
lee AT leeweiner DOT org
 
Chris Smith

Lee Weiner said:
> Exactly what I expect, but if I also want to delimit on the "@" signs with
>
> scan.useDelimiter( "[@\\s+]" );
>
> I get:
>
> *Weiner*
> *572-6544*
> *57*
> **
> *Kirby*
> *572-6544*
> *36*
> **
>
> Can anyone tell me what I'm doing to cause that extra empty token at the
> end of each record?

Yes. Your regular expression is faulty. [@\\s+] means "either an @
symbol, or a single whitespace character, or a + symbol". None of your
input contained a + character, but if it had, each + would also have
been treated as a delimiter. Perhaps you meant [@\\s]+ or @|(\\s+) (the
difference being that the first would not produce an empty token between
two consecutive @ characters, while the second one would).

Incidentally, when you're reading multiple records, it's far safer to
separate records first, then parse the record content. A simple error
in the input file here, rather than being detected, could cause you to
mistake names for phone numbers for the entire rest of the file and
store corrupt data that has to be hunted and purged after the error has
been discovered. That's not good.
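Here is a minimal sketch of the records-first approach, assuming every record is one line holding exactly three non-empty @-separated fields. Each line is read with nextLine() (which strips either \n or \r\n) and split on its own, so a malformed line is reported with its line number and skipped instead of shifting the fields of every later record. RecordReader, parseRecord() and the three-field rule are hypothetical; String.split() does take a regex, but "@" has no special meaning in one.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class RecordReader
{
    // Assumed record layout: exactly three non-empty fields separated by '@'.
    static final int FIELDS_PER_RECORD = 3;

    // Splits one line into fields; returns null if the line is not a well-formed record.
    static String[] parseRecord( String line )
    {
        // A limit of -1 keeps trailing empty fields, so a stray trailing '@' is counted, not silently dropped.
        String[] fields = line.split( "@", -1 );
        if( fields.length != FIELDS_PER_RECORD )
        {
            return null;
        }
        for( String field : fields )
        {
            if( field.length() == 0 )
            {
                return null;
            }
        }
        return fields;
    }

    public static void main( String[] args )
    {
        try
        {
            Scanner scan = new Scanner( new File( "lee.txt" ) );
            int lineNumber = 0;
            // Separate records first: one line per record.
            while( scan.hasNextLine() )
            {
                String line = scan.nextLine();
                lineNumber++;
                String[] fields = parseRecord( line );
                if( fields == null )
                {
                    // A bad line is reported and skipped; it cannot shift the fields of later records.
                    System.out.println( "Line " + lineNumber + ": malformed record skipped: " + line );
                    continue;
                }
                System.out.println( "*" + fields[0] + "* *" + fields[1] + "* *" + fields[2] + "*" );
            }
            scan.close();
        }
        catch( FileNotFoundException exc )
        {
            System.out.println( "Error - Input file not found. Terminating." );
            System.exit( 1 );
        }
    }
}
```

With the two-record lee.txt above, both records should print; a line such as Kirby@572-6544 would be reported and skipped rather than pulling the next record's name into its third field.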

Also incidentally, this question and many others like it should be
carefully read by the fanatics at Sun who seem to think, and write in
documentation, that just because a problem can be solved by regular
expressions, it has to be.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
