Is Scanner's hasNext() Supposed to Return False with Unread Empty Lines?

KevinSimonson

Most programs I write are in Java, because I understand Java pretty
well, and enjoy it. When I write programs that do a lot of text I/O,
I've got a pretty standard approach that has worked fine in the past
for me. If my program needs to do text output, I usually declare a
variable of class <PrintWriter> and initialize it with <prntWrtr = new
PrintWriter( new BufferedWriter( new FileWriter( outFileName)))>, where
<outFileName> is a <String> that contains the name of the file. If my
program needs to do text input, I usually declare a variable of class
<Scanner> and initialize it with <scnnr = new Scanner( new
File( inFileName))>, where <inFileName> is a <String> that contains
the name of the input file.
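
For readers who want to see that setup spelled out, here is a minimal sketch of the two initializations described above. The class name <TextIoSetup>, the helper methods, and the scratch file are assumptions made up for illustration; only the <PrintWriter> and <Scanner> constructor chains come from the post.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

class TextIoSetup
{
  // Writes each given line to outFileName through a buffered PrintWriter.
  static void writeLines ( String outFileName, String... lines) throws IOException
  {
    try (PrintWriter prntWrtr
           = new PrintWriter( new BufferedWriter( new FileWriter( outFileName))))
    {
      for (String line : lines)
      {
        prntWrtr.println( line);
      }
    }
  }

  // Opens a Scanner on inFileName and returns the first line of the file.
  static String readFirstLine ( String inFileName) throws IOException
  {
    try (Scanner scnnr = new Scanner( new File( inFileName)))
    {
      return scnnr.nextLine();
    }
  }

  public static void main ( String[] arguments) throws IOException
  {
    // Hypothetical scratch file, used only for this illustration.
    File scratch = File.createTempFile( "TextIoSetup", ".Txt");
    scratch.deleteOnExit();
    writeLines( scratch.getPath(), "first line", "second line");
    System.out.println( '[' + readFirstLine( scratch.getPath()) + ']');
  }
}
```

The try-with-resources blocks close the writer and the scanner even if an exception is thrown, which also flushes the <BufferedWriter>.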

The <PrintWriter> class gives me <println()> and <print()>, and that's
usually all the methods I need. The <Scanner> class gives me
<hasNext()> and <nextLine()>, and I had _thought_ that that was as
adequate as <PrintWriter>'s methods were, but I've run into a
situation that may have changed my mind.

At work we've been using Microsoft Excel to load a file whose contents
are divided into rows delimited by newlines, and whose rows are
divided into fields delimited by vertical pipes. Excel, with a little
bit of prompting, shows the data in a spreadsheet form. But we were
getting a discrepancy between what data Excel _told_ us we had and
what data we _thought_ we should be getting.

So the guy I work with told me to look at the actual contents of the
pipe-delimited file, and see if Excel was giving us the right
information. I started doing it by hand, bringing up the file in an
editor, and moving the pipes on one row so that they lined up with the
pipes on the next row, to make it readable. But this file has over
7500 rows, and the job was kind of tedious, and whenever I'm in a
situation like that I usually decide to write a Java program to do the
job for me.

So I wrote the program, and it works pretty well. The discrepancy
seems to be that in the column we're interested in, a lot of the rows
at the end of the spreadsheet have absolutely nothing at all in that
field. Twenty-nine rows have that field completely blank. So when my
program isolates that column, printing out that one particular field
only, for the 7500 plus rows, I get twenty-nine new-lines at the end
of the file, which actually is exactly what I want for that kind of a
spreadsheet. But Excel apparently concludes that those rows don't
exist, and ends up not listing them in the spreadsheet at all.

The data file is used by a program that displays the complete set of
unique values displayed in that field, and along with each one the
number of rows that have that particular value in that field. So I
wrote another Java program that does essentially the same thing, to
check the program I'm testing. But _it too_ was missing the last
twenty-nine (zero-length) lines.
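
A rough sketch of what such a counting program might look like follows. It is not the poster's actual program: the class name <FieldCounts>, the method <countField>, and the sample data are all hypothetical. It assumes rows are pipe-delimited and that a missing or empty field should be counted under the empty <String>.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class FieldCounts
{
  // Counts how many rows carry each distinct value in the given field.
  static Map< String, Integer> countField ( Scanner scnnr, int fieldIndex)
  {
    Map< String, Integer> counts = new TreeMap< String, Integer>();
    // hasNextLine() is used so that zero-length rows are still visited.
    while (scnnr.hasNextLine())
    {
      // A limit of -1 keeps trailing empty fields such as the one in "b|".
      String[] fields = scnnr.nextLine().split( "\\|", -1);
      String value = fieldIndex < fields.length ? fields[ fieldIndex] : "";
      counts.merge( value, 1, Integer::sum);
    }
    return counts;
  }

  public static void main ( String[] arguments)
  {
    // Made-up sample data: three rows, two of them blank in field 1.
    Scanner scnnr = new Scanner( "a|x\nb|\nc|\n");
    for (Map.Entry< String, Integer> entry : countField( scnnr, 1).entrySet())
    {
      System.out.println( '[' + entry.getKey() + "] " + entry.getValue());
    }
  }
}
```

With the sample data this prints "[] 2" and then "[x] 1". The two details that matter for blank fields are the -1 limit passed to <split()> and the choice of loop condition.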

I debugged the problem for a bit, and think I've got a simple Java
program that isolates the problem. After a <Scanner> object reads the
last text line in the file with nonzero length, <hasNext()> returns
<true> even if there are several lines after it of _zero_ length.
(Interestingly enough, if there is a line of zero length, or a group
of lines of zero length, before another line of _nonzero_ length,
<hasNext()> _returns <false>_, and <nextLine()> returns each of those
zero-length lines quite correctly as the empty <String>.)

That's not the behavior I would have expected, or that I need for this
particular project. After I've read the last line of nonzero length
in a file, and there are, say, seventeen newlines after that line, and
no other characters, wouldn't it make sense for the next seventeen
<nextLine()>s to each return the empty <String>, and _then_ have
<hasNext()> return <true>, rather than _before_ those seventeen lines
are read in? Is the <Scanner> class _supposed_ to do this, or is this
just evidence that my Java compiler is buggy? And if it's supposed to
do it this way, is there _any other_ class besides <Scanner> that I
can use to read input from a file, that recognizes empty strings when
there are a bunch of them at the end of the file? If someone can
point me to such a class I'd really appreciate it.

Anyhow, below I'm inserting the script file that I used to capture the
behavior I'm talking about. Notice that "wc -l" says each of the
".Txt" files has three lines, but the Java program seems to think
"ScBug2.Txt" only has _one_ line. Let me know what you think.

Kevin Simonson


##############################################################################

Script started on Sat Mar 12 16:07:38 2011

sh-4.1$ pwd
/cygdrive/c/Users/kvnsmnsn/Java/Sb
sh-4.1$ ls -F
ScBug.java ScBug1.Txt ScBug2.Txt ScannerBug
sh-4.1$ cat ScBug.java
import java.io.FileNotFoundException;
import java.io.File;
import java.util.Scanner;

public class ScBug
{
  public static void main ( String[] arguments)
  {
    if (arguments.length == 1)
    {
      try
      {
        int count = 0;
        System.out.println
          ( "Immediately before call to open a <Scanner> object on file \""
          + arguments[ 0] + "\".");
        Scanner scnnr = new Scanner( new File( arguments[ 0]));
        System.out.println
          ( "And immediately after call to open a <Scanner> object on file \""
          + arguments[ 0] + "\".");
        while (scnnr.hasNext())
        {
          System.out.println( '[' + scnnr.nextLine() + ']');
          count++;
        }
        System.out.println
          ( "<scnnr.hasNext()> just returned <false> after reading " + count
          + " lines.");
        scnnr.close();
        System.out.println( "Just closed the <scnnr> object.");
      }
      catch (FileNotFoundException excptn)
      {
        System.err.println
          ( "Couldn't find input file \"" + arguments[ 0] + "\"!");
      }
    }
    else
    {
      System.out.println( "Usage is\n java ScBug <input-file>");
    }
  }
}

sh-4.1$ cat ScBug1.Txt




abc

sh-4.1$ cat ScBug2.Txt
abc





sh-4.1$ wc -l ScBug*.Txt
3 ScBug1.Txt
3 ScBug2.Txt
6 total
sh-4.1$ javac ScBug.java
sh-4.1$ java ScBug ScBug1.Txt
Immediately before call to open a <Scanner> object on file "ScBug1.Txt".
And immediately after call to open a <Scanner> object on file "ScBug1.Txt".
[]
[]
[abc]
<scnnr.hasNext()> just returned <false> after reading 3 lines.
Just closed the <scnnr> object.
sh-4.1$ java ScBug ScBug2.Txt
Immediately before call to open a <Scanner> object on file "ScBug2.Txt".
And immediately after call to open a <Scanner> object on file "ScBug2.Txt".
[abc]
<scnnr.hasNext()> just returned <false> after reading 1 lines.
Just closed the <scnnr> object.
sh-4.1$ exit
exit
Script done on Sat Mar 12 16:09:04 2011
 
Daniele Futtorovic

On 13/03/2011 01:25, KevinSimonson allegedly wrote:
That's not the behavior I would have expected, or that I need for
this particular project. After I've read the last line of nonzero
length in a file, and there are, say, seventeen newlines after that
line, and no other characters, wouldn't it make sense for the next
seventeen <nextLine()>s to each return the empty <String>, and _then_
have <hasNext()> return <true>, rather than _before_ those seventeen
lines are read in? Is the <Scanner> class _supposed_ to do this, or
is this just evidence that my Java compiler is buggy? And if it's
supposed to do it this way, is there _any other_ class
besides <Scanner> that I can use to read input from a file, that
recognizes empty strings when there are a bunch of them at the end of
the file? If someone can point me to such a class I'd really
appreciate it.
<snip />

A lengthy post that was. I'm not sure I understand all of it, and it
seems to me in the paragraph above and the one before you inverted
'true' and 'false'.

Anyway, in a nutshell I do not think this is a bug with
java.util.Scanner, but rather that the problem is with your
understanding of it.

As per Javadocs (read them!), hasNext() and next() deal with *tokens*.
By default tokens are separated by whitespace, and newlines count as
whitespace, so empty lines are *not* among them. The program is
*correct* in telling you false on #hasNext() if there are only
newlines left in the input.
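
To make the token-versus-line distinction concrete, here is a small sketch that asks both questions of the same <Scanner> once the only remaining input is newlines. The class name <TokensVersusLines> and the helper <probeAfterFirstLine> are made up for illustration, and the input string is an assumption that mimics ScBug2.Txt ("abc" followed by two empty lines).

```java
import java.util.Scanner;

class TokensVersusLines
{
  // Reads one line from text, then reports { hasNext(), hasNextLine() }.
  static boolean[] probeAfterFirstLine ( String text)
  {
    Scanner scnnr = new Scanner( text);
    scnnr.nextLine();
    boolean[] answers = { scnnr.hasNext(), scnnr.hasNextLine() };
    scnnr.close();
    return answers;
  }

  public static void main ( String[] arguments)
  {
    // Same shape as ScBug2.Txt: "abc" followed by two empty lines.
    boolean[] answers = probeAfterFirstLine( "abc\n\n\n");
    System.out.println( "hasNext():     " + answers[ 0]);   // false
    System.out.println( "hasNextLine(): " + answers[ 1]);   // true
  }
}
```

Neither hasNext() nor hasNextLine() consumes input, so both can be asked at the same position; they simply answer different questions.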

Just as hasNext() and next() work with tokens, hasNextLine() and
nextLine() deal with lines. If lines are what you're interested in, use
those methods.
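
As a sketch of that advice, the read loop from ScBug.java could test hasNextLine() instead of hasNext(). The class name <LineCount> and the in-memory input are assumptions for illustration; the input has the same shape as ScBug2.Txt.

```java
import java.util.Scanner;

class LineCount
{
  // Echoes every line in brackets, empty ones included, and returns the count.
  static int countLines ( Scanner scnnr)
  {
    int count = 0;
    while (scnnr.hasNextLine())
    {
      System.out.println( '[' + scnnr.nextLine() + ']');
      count++;
    }
    return count;
  }

  public static void main ( String[] arguments)
  {
    // "abc" followed by two empty lines, like ScBug2.Txt.
    int count = countLines( new Scanner( "abc\n\n\n"));
    System.out.println( "Read " + count + " lines.");
  }
}
```

For this input the loop prints [abc], [] and [] and reports 3 lines, which agrees with what "wc -l" says about ScBug2.Txt.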
 
