doctest.testfile fails on text files with Windows line endings

Steven D'Aprano

After converting a text file containing doctests to use Windows line
endings, I'm getting spurious errors:

ValueError: line 19 of the docstring for examples.txt has inconsistent
leading whitespace: '\r'


I don't believe that doctest.testfile is documented as requiring Unix
line endings, and the line endings in the file are okay. I've checked in
a hex editor, and they are valid \r\n line endings.
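
A minimal sketch of how this error can be reproduced without a file, by handing CRLF text straight to doctest.DocTestParser. The sample text and the helper try_parse are made up for illustration, and the assumption is that an indented example surrounded by unindented prose is enough to trigger the same check:

```python
import doctest

# Hypothetical sample: an indented doctest between unindented prose lines,
# with every line ending in \r\n.
text = "Intro\r\n\r\n    >>> 1 + 1\r\n    2\r\n\r\nDone\r\n"

parser = doctest.DocTestParser()

def try_parse(s):
    """Return the ValueError message raised while parsing, or None."""
    try:
        parser.get_examples(s, name="examples.txt")
    except ValueError as exc:
        return str(exc)
    return None

crlf_error = try_parse(text)                      # CRLF endings left in place
lf_error = try_parse(text.replace("\r\n", "\n"))  # same text with LF endings

print(crlf_error)
print(lf_error)
```

The "blank" line after the expected output is really a lone '\r', which the parser treats as part of the expected output and then rejects because it lacks the example's indentation.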

In doctest._load_testfile, I find this comment and code:

    # get_data() opens files as 'rb', so one must do the equivalent
    # conversion as universal newlines would do.
    return file_contents.replace(os.linesep, '\n'), filename

which I read as an attempt to normalise line endings in the file to \n.

(But surely this will fail? If you're running, say, Linux or MacOS,
linesep will already be '\n' not '\r\n', and consequently the replace
does nothing, any Windows line endings aren't normalised, and doctest
will choke on the \r characters. It's only useful if running on Windows.)
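
To make the point concrete, here is a small sketch that mimics the quoted replace call with the separator passed in explicitly, so both platforms can be simulated on one machine. The function name normalise_with_linesep is hypothetical and is not part of doctest:

```python
def normalise_with_linesep(contents, linesep):
    # Same operation as the quoted line, with os.linesep made a parameter.
    return contents.replace(linesep, "\n")

crlf_text = "line one\r\nline two\r\n"

# On Windows, os.linesep is '\r\n', so the replace normalises the text.
on_windows = normalise_with_linesep(crlf_text, "\r\n")

# On Linux or Mac OS X, os.linesep is '\n', so the replace is a no-op
# and the '\r' characters survive.
on_posix = normalise_with_linesep(crlf_text, "\n")

print(repr(on_windows))
print(repr(on_posix))
```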

But the above only occurs when using a package loader. Otherwise,
_load_testfile executes:

    return open(filename).read(), filename

which doesn't do any line ending normalisation at all.

To my mind, this is a bug in doctest. Does anyone disagree? I think the
simplest fix is to change it to:

    return open(filename, 'rU').read(), filename
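
For anyone trying this on a newer interpreter: the 'U' mode flag belongs to Python 2, and Python 3.11 removed it. A rough equivalent, assuming io.open with newline=None is acceptable, is sketched below. The helper name read_text_universal and the temporary file are illustrative only and are not how doctest itself is written:

```python
import io
import os
import tempfile

def read_text_universal(filename):
    # newline=None turns on universal newlines: '\r\n' and '\r' are
    # translated to '\n' as the file is read, on every platform.
    with io.open(filename, "r", newline=None) as f:
        return f.read()

# Write a small file with Windows line endings in binary mode, so no
# translation happens on the way out.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b">>> 1 + 1\r\n2\r\n")
    contents = read_text_universal(path)
finally:
    os.remove(path)

print(repr(contents))
```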


Comments?
 
Patrick Maupin

Seems like a bug to me. I often assume that I don't know where a
string is coming from, so one of the first steps I usually take when
parsing a string is:

s = s.replace('\r\n', '\n').replace('\r', '\n')

And, out of long-standing pre-Python habit, I always open files in
binary mode and then have my way with them. I know universal mode is
available, but honestly, I don't care for all the bookkeeping on what
kinds of line endings have been seen -- I just want to normalize the
data.
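
The habit described above can be wrapped up roughly as follows. The names normalize_newlines and read_normalized are made up for this sketch, and the utf-8 default is an assumption:

```python
def normalize_newlines(s):
    # Order matters: collapse '\r\n' first, otherwise each Windows line
    # ending would become two newlines once the lone '\r' is replaced.
    return s.replace("\r\n", "\n").replace("\r", "\n")

def read_normalized(path, encoding="utf-8"):
    # Open in binary mode so the platform does no translation of its own,
    # then decode and normalize explicitly.
    with open(path, "rb") as f:
        return normalize_newlines(f.read().decode(encoding))

mixed = "windows\r\nold mac\runix\n"
cleaned = normalize_newlines(mixed)
print(repr(cleaned))
```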

Regards,
Pat
 
