Dave Stallard
I ran into a problem last week with some Java code I have that reads in a
text file, parses it, and builds some data structures. It had
worked fine for 2 years, running on Win2K and earlier WinXP machines,
but on a new WinXP laptop it failed (in front of customers, naturally),
claiming that a well-formed input text file was ill-formed. The text
file had been created with Notepad.
I looked at the file with Emacs and found three initial characters that
were outside the normal ASCII set. When I put some logging into the
file parsing code (which reads the file as UTF-8), I found a single initial
character, Unicode code point 65,279 (U+FEFF). (Evidently, the UTF-8 reader
decoded the 3 bytes as this one Unicode character.) I repeated this
experiment many times with the same result. It seems clear that Windows was
inserting this character, whatever it is, as a kind of header into every
text file it makes.
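
For anyone hitting the same thing, here is a minimal sketch of one way the parsing code could tolerate that leading character. It assumes the file is read as UTF-8 through a BufferedReader; the class name BomSkippingReader and the method open are hypothetical names made up for this illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original code): wraps a stream in a
// UTF-8 reader and discards one leading U+FEFF (decimal 65,279) if present.
final class BomSkippingReader {

    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);               // remember the start of the stream
        int first = reader.read();    // decode the first character, or -1 if empty
        if (first != 0xFEFF) {
            reader.reset();           // not the marker character: rewind so nothing is lost
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // The three bytes EF BB BF decode to U+FEFF in UTF-8, followed by "hi".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i', '\n'};
        BufferedReader reader = open(new ByteArrayInputStream(data));
        System.out.println(reader.readLine());
    }
}
```

The parser would then read from the returned reader exactly as before. Files without the leading character pass through unchanged, so the same code should work on both the old and the new machines.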
I had previously run into this problem with Notepad when writing files out in
UTF-8, but never before when writing simple ASCII text files. Has
anybody else seen this?
Dave