doctest.testfile fails on text files with Windows line endings

Steven D'Aprano · Apr 11, 2010

After converting a text file containing doctests to use Windows line
endings, I'm getting spurious errors:

ValueError: line 19 of the docstring for examples.txt has inconsistent
leading whitespace: '\r'

I don't believe that doctest.testfile is documented as requiring Unix
line endings, and the line endings in the file are okay. I've checked in
a hex editor, and they are valid \r\n line endings.

In doctest._load_testfile, I find this comment and code:

    # get_data() opens files as 'rb', so one must do the equivalent
    # conversion as universal newlines would do.
    return file_contents.replace(os.linesep, '\n'), filename

which I read as an attempt to normalise line endings in the file to \n.

(But surely this will fail? If you're running, say, Linux or MacOS,
linesep will already be '\n' not '\r\n', and consequently the replace
does nothing, any Windows line endings aren't normalised, and doctest
will choke on the \r characters. It's only useful if running on Windows.)
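
The platform dependence can be checked without switching machines. The sketch below is an illustration only, not doctest's code: normalise_with_linesep is a hypothetical helper that takes the separator as an argument, so both the Windows and the Linux value of os.linesep can be simulated.

```python
def normalise_with_linesep(text, linesep):
    # Same replace() call as the doctest snippet quoted above, but with
    # the separator passed in explicitly instead of read from os.linesep.
    return text.replace(linesep, '\n')

windows_text = '>>> 1 + 1\r\n2\r\n'

# Simulating Windows, where os.linesep is '\r\n': the text is normalised.
on_windows = normalise_with_linesep(windows_text, '\r\n')

# Simulating Linux, where os.linesep is '\n': the replace is a no-op,
# so every '\r' survives and reaches the doctest parser.
on_linux = normalise_with_linesep(windows_text, '\n')

print(repr(on_windows))
print(repr(on_linux))
```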

But the above only occurs when using a package loader. Otherwise,
_load_testfile executes:

    return open(filename).read(), filename

which doesn't do any line ending normalisation at all.

To my mind, this is a bug in doctest. Does anyone disagree? I think the
simplest fix is to change it to:

    return open(filename, 'rU').read(), filename
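
As a rough check of the idea, here is a minimal sketch that reads a file written with \r\n endings using universal newlines. It assumes io.open with newline=None (available from Python 2.6 and in Python 3) in place of the 'rU' mode, which later Python versions no longer accept. load_testfile_text is a hypothetical name, not a function in doctest.

```python
import io
import os
import tempfile

def load_testfile_text(filename):
    # Hypothetical helper, not doctest's actual implementation.
    # newline=None turns on universal newlines: '\r\n' and '\r' are
    # translated to '\n' on read, whatever the host platform is.
    with io.open(filename, 'r', newline=None) as f:
        return f.read()

# Write the sample in binary mode so the bytes on disk are exactly
# the Windows line endings we want to test with.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'>>> 1 + 1\r\n2\r\n')

text = load_testfile_text(path)
os.remove(path)

print(repr(text))
```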

Comments?

Patrick Maupin · Apr 11, 2010

Seems like a bug to me. I often assume that I don't know where a
string is coming from, so one of the first steps I usually take when
parsing a string is:

    s = s.replace('\r\n', '\n').replace('\r', '\n')

And, out of long-standing pre-Python habit, I always open files in
binary mode and then have my way with them. I know universal mode is
available, but honestly, I don't care for all the bookkeeping on what
kinds of line endings have been seen -- I just want to normalize the
data.
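
For illustration, the same normalisation wrapped in a function and run on a string with mixed endings. normalise_newlines is just a name picked for this sketch. The order of the two replace() calls matters, and the second half shows what happens when they are swapped.

```python
def normalise_newlines(s):
    # Collapse '\r\n' first; otherwise the '\r' in each pair would be
    # turned into its own '\n' and every Windows line would double up.
    return s.replace('\r\n', '\n').replace('\r', '\n')

mixed = 'unix\nwindows\r\nold mac\rend'
result = normalise_newlines(mixed)

# The reversed order, shown only for contrast: '\r\n' becomes '\n\n'.
wrong_order = mixed.replace('\r', '\n').replace('\r\n', '\n')

print(repr(result))
print(repr(wrong_order))
```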

Regards,
Pat
