os.path.normcase rationale?

Chris Withers

Hi All,

I'm curious as to why, with a file called "Foo.txt"
os.path.normcase('FoO.txt') will return "foo.txt" rather than "Foo.txt"?

Yes, I know the behaviour is documented, but I'm wondering if anyone can
remember the rationale for that behaviour?

cheers,

Chris
 
Gregory Ewing

Ben said:
it doesn't matter what the case is, so there's no need for
anything more complex than all lowercase.

Also doing what was suggested would require looking at
what's in the file system, which would be a lot of bother
to go to for no good reason, and would fail for paths
that don't correspond to an existing file.
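
To illustrate the point above, here is a minimal sketch of normcase used purely as a comparison key. It calls ntpath directly so the Windows rules apply on any platform; the helper name same_windows_path is made up for this example, and Python 3 is assumed.

```python
import ntpath


def same_windows_path(a, b):
    """Compare two path strings under Windows case rules (hypothetical helper)."""
    # normcase lowercases and converts "/" to "\"; it never touches the disk.
    return ntpath.normcase(a) == ntpath.normcase(b)


print(same_windows_path("C:/Temp/Foo.txt", "c:\\temp\\FOO.TXT"))  # True
print(same_windows_path("C:/Temp/Foo.txt", "C:/Temp/Bar.txt"))    # False
```

For a comparison like this the chosen spelling is irrelevant, as long as both sides are folded the same way.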
 
Nobody

I'm curious as to why, with a file called "Foo.txt"
os.path.normcase('FoO.txt') will return "foo.txt" rather than "Foo.txt"?

normcase() doesn't look at the filesystem; it's just string manipulation.
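
A quick way to see this, assuming Python 3: os.path is an alias for either ntpath or posixpath depending on the platform, and both modules can be imported anywhere. Neither call below needs the file to exist.

```python
import ntpath
import posixpath

# Windows flavour: lowercase everything and turn "/" into "\".
windows_result = ntpath.normcase("Some/Dir/FoO.txt")

# POSIX flavour: the string comes back unchanged.
posix_result = posixpath.normcase("Some/Dir/FoO.txt")

print(windows_result)  # some\dir\foo.txt
print(posix_result)    # Some/Dir/FoO.txt
```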
 
Nobody

The docstring is fairly poor, IMO. You might want to submit a bug report
to improve it.

The description in the library documentation is misleading:

os.path.normcase(path)
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.

It implies that the behaviour depends upon the actual filesystem, which
isn't the case. It only depends upon the platform, i.e. it assumes that
all filenames are case-sensitive on Unix systems and case-insensitive on
Windows. But Unix systems can access FAT/SMBFS/etc filesystems which are
case-insensitive.
 
Chris Withers

What kind of answer are you looking for?

A direct answer would be: it does that because on case-insensitive
filesystems, it doesn't matter what the case is, so there's no need for
anything more complex than all lowercase.

Well, no, that doesn't feel right. Normalisation of case, for me, means
"give me the case as the filesystem thinks it should be", not "just
lowercase it all". This makes a difference; I hit it by way of version
pinning in buildout, the culprit being setuptools calling normcase on
distributions found on the filesystem on Windows. Buildout's version
pinning is case sensitive (arguably a bug) and so doesn't work when
setuptools's use of normcase ends up with the distribution name being
lowercased, even though the case is correct on the file system...

The docstring is fairly poor, IMO. You might want to submit a bug report
to improve it.

I think it would have been a lot more confusing had it not been
mentioned that it was just lowercasing everything...

Chris
 
Chris Withers

Also doing what was suggested would require looking at
what's in the file system, which would be a lot of bother
to go to for no good reason, and would fail for paths
that don't correspond to an existing file.

.lower() is shorter to type, if that's what you want...

Chris
 
Chris Withers

os.path.normcase(path)
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.

It implies that the behaviour depends upon the actual filesystem, which
isn't the case. It only depends upon the platform, i.e. it assumes that
all filenames are case-sensitive on Unix systems and case-insensitive on
Windows. But Unix systems can access FAT/SMBFS/etc filesystems which are
case-insensitive.

Right, so in its current form it seems pretty useless ;-)

What I expected it to mean was "give me what the filesystem thinks this
file path is", which doesn't seem unreasonable and would be a lot more
useful, no matter the platform...

Chris
 
Steven D'Aprano

Well, no, that doesn't feel right. Normalisation of case, for me, means
"give me the case as the filesystem thinks it should be",

What do you mean "the filesystem"?

If I look at the available devices on my system now, I see:

2 x FAT-32 filesystems
1 x ext2 filesystem
3 x ext3 filesystems
1 x NTFS filesystem
1 x UDF filesystem

and if I ever get my act together to install Basilisk II, as I've been
threatening to do for the last five years, there will also be at least
one 1 x HFS filesystem. Which one is "the" filesystem?

If you are suggesting that os.path.normcase(filename) should determine
which filesystem actually applies to filename at runtime, and hence work
out what rules apply, what do you suggest should happen if the given path
doesn't actually exist? What if it's a filesystem that the normcase
developers haven't seen or considered before?
 
Ethan Furman

Steven said:
What do you mean "the filesystem"?

Well, if it were me, it would be either the filesystem in the path
that's being norm'ed, or if no path is explicitly stated, the filesystem
that is hosting the current directory.

If I look at the available devices on my system now, I see:

2 x FAT-32 filesystems
1 x ext2 filesystem
3 x ext3 filesystems
1 x NTFS filesystem
1 x UDF filesystem

and if I ever get my act together to install Basilisk II, as I've been
threatening to do for the last five years, there will also be at least
one 1 x HFS filesystem. Which one is "the" filesystem?

If you are suggesting that os.path.normcase(filename) should determine
which filesystem actually applies to filename at runtime, and hence work
out what rules apply, what do you suggest should happen if the given path
doesn't actually exist? What if it's a filesystem that the normcase
developers haven't seen or considered before?

Something along those lines seems to be happening now. Observe what
happens on my XP machine with an NTFS drive.

--> import foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named foo
--> import FOO
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named FOO
--> import fOo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named fOo
--> import Foo
Foo has been imported!

~Ethan~
 
Nobody

Another is that filesystems don't have a standard way of determining
whether they are case-sensitive. The operating system's driver for that
particular filesystem knows,

I'm not even sure that's true; with a networked filesystem, some parts of
it may be case-sensitive and others case-insensitive (e.g. if you export a
Linux filesystem which includes Windows filesystems mounted beneath the
root of the export).
 
Chris Withers

What do you mean "the filesystem"?

If I look at the available devices on my system now, I see:

2 x FAT-32 filesystems
1 x ext2 filesystem
3 x ext3 filesystems
1 x NTFS filesystem
1 x UDF filesystem

Right, and each of these will know what it thinks a file's "real" name
is, along with potentially accepting a set of synonyms for them...

and if I ever get my act together to install Basilisk II, as I've been
threatening to do for the last five years, there will also be at least
one 1 x HFS filesystem. Which one is "the" filesystem?

Whichever one you're getting the file from...

If you are suggesting that os.path.normcase(filename) should determine
which filesystem actually applies to filename at runtime, and hence work
out what rules apply, what do you suggest should happen if the given path
doesn't actually exist?

I'd suggest an exception be raised.
Really, what's the point of normcase if it's basically just
"if os=='win': return path.lower()"

What if it's a filesystem that the normcase
developers haven't seen or considered before?

I didn't say it was an easy problem, but the current normcase is a waste
of space...

Chris
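
For reference, the behaviour asked for here ("give me the name as the filesystem spells it, and raise if it isn't there") can be roughly approximated by consulting the directory listing. This is only a sketch, assuming Python 3: on_disk_name is a hypothetical helper rather than anything in the standard library, it handles a single directory level, and where several entries differ only by case it simply returns the first one found.

```python
import os
import tempfile


def on_disk_name(directory, name):
    """Return the entry in `directory` matching `name`, spelled as stored on disk.

    Hypothetical helper: prefers an exact match, then falls back to a
    case-insensitive one. Raises FileNotFoundError if nothing matches.
    """
    entries = os.listdir(directory)
    if name in entries:
        return name
    matches = [entry for entry in entries if entry.lower() == name.lower()]
    if not matches:
        raise FileNotFoundError(os.path.join(directory, name))
    return matches[0]


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Foo.txt"), "w").close()
    result = on_disk_name(tmp, "FoO.txt")

print(result)  # Foo.txt
```

Unlike normcase, this has to touch the disk, fails for paths that do not exist, and gives an arbitrary answer when a case-sensitive directory holds both Foo.txt and foo.txt, which are the objections raised earlier in the thread.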
 
