KevinSimonson
Most programs I write are in Java, because I understand Java pretty
well, and enjoy it. When I write programs that do a lot of text I/O,
I've got a pretty standard approach that has worked fine in the past
for me. If my program needs to do text output, I usually declare a
variable of class <PrintWriter> and initialize it with <prntWrtr = new
PrintWriter( new BufferedWriter( new FileWriter( outFileName)))>, where
<outFileName> is a <String> that contains the name of the file. If my
program needs to do text input, I usually declare a variable of class
<Scanner> and initialize it with <scnnr = new Scanner( new
File( inFileName))>, where <inFileName> is a <String> that contains
the name of the input file.
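For reference, here is a minimal sketch of that setup. The class name <TextIoSketch> and the helper <roundTrip()>, which writes to a temporary file and reads it back, are made up for illustration; only the <PrintWriter> and <Scanner> chains come from the description above.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

class TextIoSketch
{
  // Writes <text> as one line through the PrintWriter -> BufferedWriter
  // -> FileWriter chain described above.
  static void writeLine ( String outFileName, String text) throws IOException
  {
    PrintWriter prntWrtr
      = new PrintWriter( new BufferedWriter( new FileWriter( outFileName)));
    prntWrtr.println( text);
    prntWrtr.close();   // flushes the buffer and closes the file
  }

  // Reads the first line back with a Scanner; returns null if hasNext()
  // finds no token in the file.
  static String readFirstLine ( String inFileName) throws IOException
  {
    Scanner scnnr = new Scanner( new File( inFileName));
    String line = scnnr.hasNext() ? scnnr.nextLine() : null;
    scnnr.close();
    return line;
  }

  // Hypothetical helper: writes <text> to a temporary file, reads it back,
  // and deletes the file again.
  static String roundTrip ( String text)
  {
    try
    { File tmp = File.createTempFile( "TextIoSketch", ".txt");
      try
      { writeLine( tmp.getPath(), text);
        return readFirstLine( tmp.getPath());
      }
      finally
      { tmp.delete();
      }
    }
    catch (IOException excptn)
    { throw new RuntimeException( excptn);
    }
  }

  public static void main ( String[] arguments)
  {
    System.out.println( '[' + roundTrip( "abc|def") + ']');
  }
}
```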
The <PrintWriter> class gives me <println()> and <print()>, and that's
usually all the methods I need. The <Scanner> class gives me
<hasNext()> and <nextLine()>, and I had _thought_ that that was as
adequate as <PrintWriter>'s methods were, but I've run into a
situation that may have changed my mind.
At work we've been using Microsoft Excel to load a file whose contents
are divided into rows delimited by newlines, and whose rows are
divided into fields delimited by vertical pipes. Excel, with a little
bit of prompting, shows the data in a spreadsheet form. But we were
getting a discrepancy between the data Excel _told_ us we had and the
data we _thought_ we should be getting.
So the guy I work with told me to look at the actual contents of the
pipe-delimited file, and see if Excel was giving us the right
information. I started doing it by hand, bringing up the file in an
editor, and moving the pipes on one row so that they lined up with the
pipes on the next row, to make it readable. But this file has over
7500 rows, and the job was kind of tedious, and whenever I'm in a
situation like that I usually decide to write a Java program to do the
job for me.
So I wrote the program, and it works pretty well. The discrepancy
seems to be that in the column we're interested in, a lot of the rows
at the end of the spreadsheet have absolutely nothing at all in that
field. Twenty-nine rows have that field completely blank. So when my
program isolates that column, printing out that one particular field
only, for the 7500-plus rows, I get twenty-nine newlines at the end
of the file, which actually is exactly what I want for that kind of a
spreadsheet. But Excel apparently concludes that those rows don't
exist, and ends up not listing them in the spreadsheet at all.
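The column-isolating step can be sketched roughly as follows. This is not the actual program, just an illustration under assumptions: the class name <FieldSketch>, the method <field()>, and the sample rows are invented. The one detail worth noting is that <String.split()> drops trailing empty strings unless it is given a negative limit, which matters when the blank field is the last one on the row.

```java
class FieldSketch
{
  // Returns field <index> (counting from 0) of a pipe-delimited <row>,
  // or the empty String if the row has too few fields.
  static String field ( String row, int index)
  {
    // "\\|" escapes the pipe, which is a regex metacharacter; the limit
    // of -1 keeps trailing empty fields instead of discarding them.
    String[] fields = row.split( "\\|", -1);
    return index < fields.length ? fields[ index] : "";
  }

  public static void main ( String[] arguments)
  {
    // Made-up sample rows; the last two have a blank field.
    String[] rows = { "a|b|c", "d||f", "g|h|" };
    for (String row : rows)
    { System.out.println( '[' + field( row, 2) + ']');
    }
  }
}
```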
The data file is used by a program that displays the complete set of
unique values that appear in that field, and along with each one the
number of rows that have that particular value in that field. So I
wrote another Java program that does essentially the same thing, to
check the program I'm testing. But _it too_ was missing the last
twenty-nine (zero-length) lines.
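A rough sketch of that kind of tally, assuming the rows are already in memory as an array of <String>s. The names <CountSketch> and <countValues()> and the sample data are hypothetical, and a <TreeMap> is used only so the values print in sorted order.

```java
import java.util.Map;
import java.util.TreeMap;

class CountSketch
{
  // Maps each distinct value found in field <index> (counting from 0) to
  // the number of rows that carry it; a blank field counts as "".
  static Map<String, Integer> countValues ( String[] rows, int index)
  {
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    for (String row : rows)
    { String[] fields = row.split( "\\|", -1);   // -1 keeps blank trailing fields
      String value = index < fields.length ? fields[ index] : "";
      Integer soFar = counts.get( value);
      counts.put( value, soFar == null ? 1 : soFar + 1);
    }
    return counts;
  }

  public static void main ( String[] arguments)
  {
    // Made-up sample rows; the last one has a blank second field.
    String[] rows = { "x|red", "y|blue", "z|red", "w|" };
    for (Map.Entry<String, Integer> entry : countValues( rows, 1).entrySet())
    { System.out.println( '[' + entry.getKey() + "] " + entry.getValue());
    }
  }
}
```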
I debugged the problem for a bit, and think I've got a simple Java
program that isolates the problem. After a <Scanner> object reads the
last text line in the file with nonzero length, <hasNext()> returns
<false> even if there are several lines after it of _zero_ length.
(Interestingly enough, if there is a line of zero length, or a group
of lines of zero length, before another line of _nonzero_ length,
<hasNext()> _returns <true>_, and <nextLine()> returns each of those
zero-length lines quite correctly as the empty <String>.)
That's not the behavior I would have expected, or that I need for this
particular project. After I've read the last line of nonzero length
in a file, and there are, say, seventeen newlines after that line, and
no other characters, wouldn't it make sense for the next seventeen
<nextLine()>s to each return the empty <String>, and _then_ have
<hasNext()> return <false>, rather than _before_ those seventeen lines
are read in? Is the <Scanner> class _supposed_ to do this, or is this
just evidence that my Java compiler is buggy? And if it's supposed to
do it this way, is there _any other_ class besides <Scanner> that I
can use to read input from a file, that recognizes empty strings when
there are a bunch of them at the end of the file? If someone can
point me to such a class I'd really appreciate it.
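One comparison that may be worth making: <Scanner> also has a <hasNextLine()> method, which asks whether another line remains rather than whether another token remains. The sketch below runs the same counting loop both ways against in-memory strings instead of files. The class and method names are invented for illustration, and whether <hasNextLine()> is the right fit for the real data file is an assumption.

```java
import java.util.Scanner;

class LineCountSketch
{
  // Counts lines by looping on hasNext(), which skips whitespace
  // (newlines included) looking for the next token.
  static int countWithHasNext ( String text)
  {
    Scanner scnnr = new Scanner( text);
    int count = 0;
    while (scnnr.hasNext())
    { scnnr.nextLine();
      count++;
    }
    scnnr.close();
    return count;
  }

  // Same loop, but guarded by hasNextLine(), which is true as long as
  // any line, even an empty one, is left in the input.
  static int countWithHasNextLine ( String text)
  {
    Scanner scnnr = new Scanner( text);
    int count = 0;
    while (scnnr.hasNextLine())
    { scnnr.nextLine();
      count++;
    }
    scnnr.close();
    return count;
  }

  public static void main ( String[] arguments)
  {
    String blanksFirst = "\n\nabc\n";   // two empty lines, then "abc"
    String blanksLast  = "abc\n\n\n";   // "abc", then two empty lines
    System.out.println( "hasNext():     " + countWithHasNext( blanksFirst)
                        + " and " + countWithHasNext( blanksLast));
    System.out.println( "hasNextLine(): " + countWithHasNextLine( blanksFirst)
                        + " and " + countWithHasNextLine( blanksLast));
  }
}
```

With these two strings the <hasNext()> loop counts 3 and 1 lines, while the <hasNextLine()> loop counts 3 and 3. <BufferedReader.readLine()> is another real option: it returns the empty <String> for an empty line and <null> only at end of file.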
Anyhow, below I'm inserting the script file that I used to capture the
behavior I'm talking about. Notice that "wc -l" says each of the
".Txt" files has three lines, but the Java program seems to think
"ScBug2.Txt" only has _one_ line. Let me know what you think.
Kevin Simonson
##############################################################################
Script started on Sat Mar 12 16:07:38 2011
sh-4.1$ pwd
/cygdrive/c/Users/kvnsmnsn/Java/Sb
sh-4.1$ ls -F
ScBug.java ScBug1.Txt ScBug2.Txt ScannerBug
sh-4.1$ cat ScBug.java
import java.io.FileNotFoundException;
import java.io.File;
import java.util.Scanner;
public class ScBug
{
  public static void main ( String[] arguments)
  {
    if (arguments.length == 1)
    { try
      { int count = 0;
        System.out.println
          ( "Immediately before call to open a <Scanner> object on file \""
            + arguments[ 0] + "\".");
        Scanner scnnr = new Scanner( new File( arguments[ 0]));
        System.out.println
          ( "And immediately after call to open a <Scanner> object on file \""
            + arguments[ 0] + "\".");
        while (scnnr.hasNext())
        { System.out.println( '[' + scnnr.nextLine() + ']');
          count++;
        }
        System.out.println
          ( "<scnnr.hasNext()> just returned <false> after reading " + count
            + " lines.");
        scnnr.close();
        System.out.println( "Just closed the <scnnr> object.");
      }
      catch (FileNotFoundException excptn)
      { System.err.println
          ( "Couldn't find input file \"" + arguments[ 0] + "\"!");
      }
    }
    else
    { System.out.println( "Usage is\n java ScBug <input-file>");
    }
  }
}
sh-4.1$ cat ScBug1.Txt


abc
sh-4.1$ cat ScBug2.Txt
abc


sh-4.1$ wc -l ScBug*.Txt
3 ScBug1.Txt
3 ScBug2.Txt
6 total
sh-4.1$ javac ScBug.java
sh-4.1$ java ScBug ScBug1.Txt
Immediately before call to open a <Scanner> object on file "ScBug1.Txt".
And immediately after call to open a <Scanner> object on file "ScBug1.Txt".
[]
[]
[abc]
<scnnr.hasNext()> just returned <false> after reading 3 lines.
Just closed the <scnnr> object.
sh-4.1$ java ScBug ScBug2.Txt
Immediately before call to open a <Scanner> object on file "ScBug2.Txt".
And immediately after call to open a <Scanner> object on file "ScBug2.Txt".
[abc]
<scnnr.hasNext()> just returned <false> after reading 1 lines.
Just closed the <scnnr> object.
sh-4.1$ exit
exit
Script done on Sat Mar 12 16:09:04 2011