PyWart: os.path needs immediate attention!

rantingrick

--------------------------------------------------
Overview of Problems:
--------------------------------------------------

* Too many methods exported.
* Poor choice of method names.
* Non public classes/methods exported!
* Duplicated functionality.

--------------------------------------------------
Proposed new functionality:
--------------------------------------------------

* New path module will ONLY support one path sep! There is NO reason
to support more than one. When we support more than one path sep we
help to propagate multiplicity. We should only support the slash and
NOT the backslash across ALL OS's since the slash is more widely
accepted. If an OS does not have the capability to support only the
slash then that OS is not worthy of a Python builtin module. The users
of such OS will be responsible for managing their OWN os_legacy.path
module. We are moving forward. Those who wish to wallow in the past
will be left behind.

* Introduce a new method named "partition" which (along with string
splitting methods) will replace the six methods "basename", "dirname",
"split", "splitdrive", "splitunc", "splitext". The method will return
a tuple of the path split into four parts: (drive, path, filename,
extension). This is the ONLY splitting method this module needs. All
other splits can use string methods.

--------------------------------------------------
Expose of the Warts of current module:
--------------------------------------------------


~~~~~~~~~~~~~~~~~~~~~~~~~
1. Too many methods
~~~~~~~~~~~~~~~~~~~~~~~~~

Following is a list of what to keep and what to cull:

+ abspath
+ altsep
- basename --> path.partition[-2]
+ commonprefix
+ curdir
+ defpath
+ devnull
- dirname --> os.path.join(drive,path)
+ exists
+ expanduser
+ expandvars
+ extsep
- genericpath --> should be private!
+ getatime
+ getctime
+ getmtime
+ getsize
+ isabs
+ isdir
+ isfile
+ islink
+ ismount
+ join
- lexists --> duplicate!
- normcase --> path = path.lower()
- normpath --> should not need this!
- os --> should be private!
+ pardir
+ pathsep
+ realpath
+ relpath
+ sep
- split --> path.rsplit('/', 1)
- splitdrive --> path.split(':', 1)
- splitext --> path.rsplit('.')
- splitunc --> Unix specific!
- stat --> should be private!
+ supports_unicode_filenames --> windows specific!
- sys --> should be private!
+ walk
- warnings --> should be private!


~~~~~~~~~~~~~~~~~~~~~~~~~
2. Poor Name Choices:
~~~~~~~~~~~~~~~~~~~~~~~~~

* basename --> should be: filename
* split --> split what?
* splitext --> Wow, informative!

~~~~~~~~~~~~~~~~~~~~~~~~~
3. Non Public Names Exposed!
~~~~~~~~~~~~~~~~~~~~~~~~~

* genericpath
* os
* stat
* sys
* warnings


Note: I did not check the Unix version of os.path for this.

~~~~~~~~~~~~~~~~~~~~~~~~~
4. Duplicated functionality.
~~~~~~~~~~~~~~~~~~~~~~~~~
'Test whether a path exists. Returns False for broken symbolic links'

Should have been one method:
 
Andrew Berg

* New path module will ONLY support one path sep! There is NO reason
to support more than one. When we support more than one path sep we
help to propagate multiplicity. We should only support the slash and
NOT the backslash across ALL OS's since the slash is more widely
accepted. If an OS does not have the capability to support only the
slash then that OS is not worthy of a Python builtin module. The users
of such OS will be responsible for managing their OWN os_legacy.path
module. We are moving forward. Those who wish to wallow in the past
will be left behind.

So now you propose that not only does Python need drastic changes, but a
major operating system family as well (I know Windows will accept a
forward slash in some contexts, but I'd be pretty surprised if one could
replace the backslash with it completely)? Interesting.

* Introduce a new method named "partition" which (along with string
splitting methods) will replace the six methods "basename", "dirname",
"split", "splitdrive", "splitunc", "splitext". The method will return
a tuple of the path split into four parts: (drive, path, filename,
extension). This is the ONLY splitting method this module needs. All
other splits can use string methods.

So these pretty specifically named functions (except perhaps split)
should be replaced with one ambiguously named one? Interesting.

- dirname --> os.path.join(drive,path)

Now you've stopped making sense completely. Also interesting.

* split --> split what?
...

I actually agree with you on these, which I suppose is interesting.
 
harrismh777

Andrew said:
* New path module will ONLY support one path sep! There is NO reason

I actually agree with this. Like Timon once told Pumbaa, "Ya gotta put
your behind in the past. . . "

The backslash sep is an asinine CP/M-80 | DOS disk based carry-over which
does not fit well with the modern forward direction. The disk based file
system carry-over is bad enough; but, propagating multiple ways of doing
simple things like specifying file-system paths is not helpful in any
context.

The modern direction today (almost universally on the server-side) is to
specify the path from the root "/" regardless of physical disk array
geometries (or physical drives "C:\"). The forward slash actually makes
some philosophical sense, and of course is more aesthetically pleasing.

So, let's put our behinds in the past and slash forward !
 
Waldek M.

On Fri, 29 Jul 2011 14:41:22 -0500, harrismh777 wrote:
The backslash sep is an asinine CP/M-80 | DOS disk based carry-over which
does not fit well with the modern forward direction. The disk based file
system carry-over is bad enough; but, propagating multiple ways of doing
simple things like specifying file-system paths is not helpful in any
context.

Please, do tell it to Microsoft. And once you've convinced them,
and they've implemented it, do report :)

Waldek
 
Teemu Likonen

* 2011-07-29T10:22:04-07:00 * said:
* New path module will ONLY support one path sep! There is NO reason
to support more than one.

Pathnames and the separator for pathname components should be abstracted
away, to a pathname object. This pathname object could have a "path" or
"directory" slot which is a list of directory components (strings). Then
there would be a method like "to_namestring" which converts a pathname
object to a native pathname string. It takes care of any platform-specific
stuff like pathname component separators. Of course a "to_pathname" method
is needed too. It converts the system's native pathname string to a pathname
object.
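
A minimal sketch of the pathname-object idea described above, not a worked-out design. The class name `Pathname`, its `drive` and `absolute` attributes, and the `flavour` parameter are hypothetical names invented for illustration; only "to_namestring", "to_pathname" and the "directory" slot come from the post. The stdlib modules `ntpath` and `posixpath` stand in for "platform-specific stuff".

```python
import ntpath
import posixpath


class Pathname:
    """Hypothetical pathname object: a drive plus a list of components."""

    def __init__(self, directory, drive="", absolute=False):
        self.directory = list(directory)  # the "directory" slot from the post
        self.drive = drive
        self.absolute = absolute

    def to_namestring(self, flavour=posixpath):
        """Render as a string using the separator of the given flavour."""
        root = flavour.sep if self.absolute else ""
        return self.drive + root + flavour.sep.join(self.directory)


def to_pathname(namestring, flavour=posixpath):
    """Parse a native pathname string into a Pathname object."""
    drive, rest = flavour.splitdrive(namestring)
    seps = flavour.sep + (flavour.altsep or "")
    absolute = rest.startswith(tuple(seps))
    # Treat every separator the flavour accepts as equivalent.
    for alt in seps[1:]:
        rest = rest.replace(alt, flavour.sep)
    parts = [part for part in rest.split(flavour.sep) if part]
    return Pathname(parts, drive=drive, absolute=absolute)


p = to_pathname(r"C:\foo\bar\quux.txt", ntpath)
print(p.directory)
print(p.to_namestring(ntpath))
print(p.to_namestring(posixpath))
```

The separator only appears when converting to or from a string, which is the point of the abstraction. The later stdlib module `pathlib` takes a broadly similar object-based approach.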
 
Chris Angelico

~~~~~~~~~~~~~~~~~~~~~~~~~
 3. Non Public Names Exposed!
~~~~~~~~~~~~~~~~~~~~~~~~~

 * genericpath
 * os
 * stat
 * sys
 * warnings

And you intend to do what, exactly, with these?

- splitunc --> Unix specific!

1) So?
2) http://docs.python.org/library/os.path.html#os.path.splitunc says
"Availability: Windows."

If you actually meant "Windows specific" then the first one still holds.

Under Unix, the only place UNC names are supported is CIFS mounting
(and equivalents, smbmount etc).

ChrisA
 
Corey Richardson

Excerpts from rantingrick's message of Fri Jul 29 13:22:04 -0400 2011:
--------------------------------------------------
Proposed new functionality:
--------------------------------------------------

* New path module will ONLY support one path sep! There is NO
reason to support more than one. When we support more than one path
sep we help to propagate multiplicity.We should only support the
slash and NOT the backslash across ALL OS's since the slash is more
widely accepted. If an OS does not have the capability to support
only the slash then that OS is not worthy of a Python builtin
module. The users of such OS will be responsible for managing their
OWN os_legacy.path module. We are moving forward. Those who wish to
wallow in the past will be left behind.

People who use Windows are used to \ being their pathsep. If you show
them a path that looks like C:/whatever/the/path in an app, they are
going to think you are nuts. It isn't up to us to decide what anyone
uses as a path separator. They use \ natively, so should we. If at
any point Windows as an OS starts using /'s (not merely supporting
them, but actually using them by default in Windows Explorer, or
whatever the file browser's name is), it would be great to switch over.

* Introduce a new method named "partition" which (along with string
splitting methods) will replace the six methods "basename",
"dirname", "split", "splitdrive", "splitunc", "splittext". The
method will return a tuple of the path split into four parts:
(drive, path, filename, extension). This is the ONLY splitting
method this module needs. All other splits can use string methods.

I agree, although what if one wants to further split the returned
path, in an OS-independent way? Just because we want all pathseps
to be /, doesn't mean they are (and I personally don't care either
way).

~~~~~~~~~~~~~~~~~~~~~~~~~
1. Too many methods
~~~~~~~~~~~~~~~~~~~~~~~~~

Follows is a list of what to keep and what to cull:

+ abspath
+ altsep
- basename --> path.partition[-2]
+ commonprefix
+ curdir
+ defpath
+ devnull
- dirname --> os.path.join(drive,path)
+ exists
+ expanduser
+ expandvars
+ extsep
- genericpath --> should be private!
+ getatime
+ getctime
+ getmtime
+ getsize
+ isabs
+ isdir
+ isfile
+ islink
+ ismount
+ join
- lexists --> duplicate!
- normcase --> path = path.lower()

Not quite, here are a few implementations of normcase (pulled from 2.7)

# NT
def normcase(s):
    """Normalize case of pathname.

    Makes all characters lowercase and all slashes into backslashes."""
    return s.replace("/", "\\").lower()

# Mac (Correct in this case)

def normcase(path):
    return path.lower()

# POSIX

def normcase(s):
    """Normalize case of pathname. Has no effect under Posix"""
    return s


But I can't think of where I would ever use that. Isn't case important on
Windows?
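
To see the difference without switching machines, the two flavours can be called directly. This is only an illustrative sketch: the sample path is made up, and it assumes a current Python 3 where `ntpath` and `posixpath` are importable on any platform.

```python
import ntpath
import posixpath

path = "C:/Users/Corey/Notes.TXT"  # made-up sample path

# Windows flavour: lowercases and turns forward slashes into backslashes.
nt_result = ntpath.normcase(path)

# POSIX flavour: returns the string unchanged, since case is significant.
posix_result = posixpath.normcase(path)

print(nt_result)
print(posix_result)
```

So a plain `path.lower()` matches neither flavour exactly: it skips the slash conversion on Windows and wrongly folds case on POSIX.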
- normpath --> should not need this!

Why not? It provides an OS-independent way to make the pathname look pretty,
maybe for output? I don't really see a use for it, to be honest. But I'd
rather there be a working solution in the stdlib than have everyone need to
throw in their own that might not handle some things properly. Talk about
multiplicity!
- os --> should be private!
+ pardir
+ pathsep
+ realpath
+ relpath
+ sep
- split --> path.rsplit('/', 1)

And on those operating systems where "\\" is the pathsep?
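
A quick sketch of where the proposed `rsplit('/', 1)` and the existing `split` part ways. The sample paths are made up; `ntpath` and `posixpath` are used directly so the comparison runs on any platform.

```python
import ntpath
import posixpath

win_path = r"C:\whatever\the\path"  # made-up sample path

# rsplit('/', 1) finds no forward slash, so nothing is split off.
naive_win = win_path.rsplit("/", 1)
# ntpath.split knows about both separators.
proper_win = ntpath.split(win_path)

# Even on POSIX the two differ for a file directly under the root:
# rsplit loses the root directory, split keeps it as the head.
naive_root = "/vmlinuz".rsplit("/", 1)
proper_root = posixpath.split("/vmlinuz")

print(naive_win, proper_win)
print(naive_root, proper_root)
```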
- splitdrive --> path.split(':', 1)
- splitunc --> Unix specific!

Err...no. It's for UNC paths, as in \\server\mount\foo\bar. It's not even
in posixpath.py, so in no way could it ever be Unix specific.
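
As a small illustration of the UNC case, assuming a Python 3 version whose `ntpath.splitdrive` understands UNC prefixes (rather than the 2.7 `splitunc` discussed here), the server and share come back as the "drive". The proposed `path.split(':', 1)` has nothing to split on.

```python
import ntpath

unc = r"\\server\mount\foo\bar"

# ntpath.splitdrive treats the \\server\share prefix as the drive part.
drive, rest = ntpath.splitdrive(unc)

# The naive split on ':' from the proposal leaves the path in one piece.
naive = unc.split(":", 1)

print(drive, rest)
print(naive)
```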
- stat --> should be private!
+ supports_unicode_filenames --> windows specific!
- sys --> should be private!
+ walk
- warnings --> should be private!


~~~~~~~~~~~~~~~~~~~~~~~~~
2. Poor Name Choices:
~~~~~~~~~~~~~~~~~~~~~~~~~

* basename --> should be: filename

I agree. The name is a carryover from basename(1) and I guess it's good for
those people, but it certainly isn't the least surprising name. If anything,
I would think the base is everything before!
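
For reference, a short sketch of what `basename` returns next to what the word "base" might suggest. The sample path is made up.

```python
import posixpath

path = "/home/corey/notes/todo.txt"  # made-up sample path

name = posixpath.basename(path)       # final component, extension included
before = posixpath.dirname(path)      # "everything before" the final component
stem, ext = posixpath.splitext(name)  # final component without its extension

print(name, before, stem, ext)
```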
* split --> split what?

The path, of course. On its own, it's uninformative, but considering
the whole name is "os.path.split", it's fairly intuitive.
* splitext --> Wow, informative!

split extension...seems straightforward to me.
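
A sketch comparing `splitext` with the `path.rsplit('.')` replacement suggested in the list above. The file names are made up; note that `rsplit` without a maxsplit argument splits on every dot.

```python
import posixpath

names = ["report.txt", "archive.tar.gz", ".bashrc", "README"]

# Map each name to (result of rsplit('.'), result of splitext).
results = {
    name: (name.rsplit("."), posixpath.splitext(name))
    for name in names
}

for name, (naive, proper) in results.items():
    print(f"{name!r}: rsplit={naive} splitext={proper}")
```

`splitext` always returns a two-item `(root, ext)` tuple and treats a leading dot as part of the name, which the string method does not.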
~~~~~~~~~~~~~~~~~~~~~~~~~
4. Duplicated functionality.
~~~~~~~~~~~~~~~~~~~~~~~~~

'Test whether a path exists. Returns False for broken symbolic links'

Should have been one method:

It is.

/usr/lib64/python2.7/ntpath.py says..
# alias exists to lexists
lexists = exists

But over here in Not-NT where we actually *have* symlinks to be broken, it's
'Test whether a path exists. Returns False for broken symbolic links'

I agree that it should be an argument to os.path.exists, though.
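
A small sketch of the difference on a platform with symlinks. It creates a dangling link in a temporary directory; the file names are made up, and the `try` guards against systems where creating symlinks is unavailable or needs extra privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "missing-target")
    link = os.path.join(tmp, "dangling-link")
    try:
        os.symlink(target, link)  # target is never created, so the link is broken
    except (OSError, NotImplementedError):
        results = None  # symlinks not available here
    else:
        results = (
            os.path.exists(link),   # follows the link: target is missing
            os.path.lexists(link),  # looks at the link itself
            os.path.islink(link),
        )

print(results)
```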
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln
 
Alister Ware

So far all I have is posts stating that everything is wrong.

Instead of all this negativity, why don't you try being productive for a
change: either make a suggestion for an addition (i.e. something that does
not yet exist) or, better yet, give us all the benefit of your supreme
coding talent & provide some code?
 
Andrew Berg

Instead of all this negativity, why don't you try being productive for a
change: either make a suggestion for an addition (i.e. something that does
not yet exist) or, better yet, give us all the benefit of your supreme
coding talent & provide some code?

Because trolling the group is apparently more fun, even though most of
the regulars here know he's trolling.
 
Chris Angelico

Excerpts from rantingrick's message of Fri Jul 29 13:22:04 -0400 2011:

People who use windows are used to \ being their pathsep. If you show
them a path that looks like C:/whatever/the/path in an app, they are
going to think you are nuts. It isn't up to us to decide what anyone
uses as a path separator. They use \ natively, so should we. If at
any point Windows as an OS starts using /'s, and not support, actually
uses (in Windows Explorer as default (or whatever the filebrowser's
name is)), it would be great to switch over.

Just tested this: You can type c:/foldername into the box at the top
of a folder in Explorer, and it works. It will translate it to
c:\foldername though.

We are not here to talk solely to OSes. We are here primarily to talk
to users. If you want to display a path to the user, it's best to do
so in the way they're accustomed to - so on Windows, display
backslashes.
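
One way to do that with what the stdlib already offers is `normpath` from the Windows flavour, sketched here with a made-up input. `ntpath` is called directly so the example behaves the same on any platform.

```python
import ntpath

typed = "c:/foldername/sub/../file.txt"  # what a user or config file might supply

# normpath collapses ".." and, in the Windows flavour, converts
# forward slashes to backslashes.
shown = ntpath.normpath(typed)

print(shown)
```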
Not quite, here are a few implementations of normcase (pulled from 2.7)

# NT
   return s.replace("/", "\\").lower()

# Mac (Correct in this case)
   return path.lower()

# POSIX
   return s

But I can't think of where I would ever use that. Isn't case important on
Windows?

No, as is demonstrated by the three above; case isn't important on
Windows, but it is on Unix.

Why not? It provides an OS-independent way to make the pathname look pretty,
maybe for output? I don't really see a use for it, to be honest.

See above, we talk to users.

But I'd
rather there be a working solution in the stdlib than have everyone need to
throw in their own that might not handle some things properly.

Absolutely!

I agree. The name is a carryover from basename(1) and I guess it's good for
those people, but it certainly isn't the least surprising name. If anything,
I would think the base is everything before!

Agreed that it's an odd name; to me, "basename" means no path and no
extension - the base name from r"c:\foo\bar\quux.txt" is "quux".
Unfortunately that's not what this function returns, which would be
"quux.txt".

ChrisA
 
Terry Reedy

* Introduce a new method named "partition" which (along with string
splitting methods) will replace the six methods "basename", "dirname",
"split", "splitdrive", "splitunc", "splittext". The method will return
a tuple of the path split into four parts: (drive, path, filename,
extension).

A named tuple would be an even better return, so one could refer to the
parts as t.drive, etc.
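
A rough sketch of how such a named-tuple return could look, built on the existing functions. `PathParts`, `partition` and the `flavour` parameter are hypothetical names; nothing like this exists in os.path.

```python
import ntpath
from collections import namedtuple

# Hypothetical names: PathParts and partition are not part of os.path.
PathParts = namedtuple("PathParts", "drive path filename extension")


def partition(pathname, flavour=ntpath):
    """Split pathname into (drive, path, filename, extension)."""
    drive, rest = flavour.splitdrive(pathname)
    directory, name = flavour.split(rest)
    filename, extension = flavour.splitext(name)
    return PathParts(drive, directory, filename, extension)


t = partition(r"C:\foo\bar\quux.txt")
print(t.drive, t.path, t.filename, t.extension)
```

Because a named tuple is still a tuple, positional access such as `t[-2]` keeps working alongside `t.filename`.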
 
Michael Poeltl

Join the 'Python-Dev' mailing list and tell them!
From now on I will just ignore threads you initiated.

Is trolling really that much fun?
 
Steven D'Aprano

Andrew said:
I actually agree with you on these, which I suppose is interesting.


Guido has a rule of thumb: "No constant arguments". Or another way to put
it: if a function takes an argument which is nearly always a constant
(usually, but not always, a flag) then it is usually better off as two
functions.

Especially if the implementation looks like this:

def get_thing(argument, flag):
    if flag:
        return one_thing(argument)
    else:
        return another_thing(argument)


Argument flags which do nothing but change the behaviour of the function
from Mode 1 to Mode 2 are an attractive nuisance: they seem like a good
idea, but aren't. Consider it a strong guideline rather than a law, but
it's one I would think very long and hard about before violating.

But having said that, I'm currently writing a library where nearly all the
functions violate the No Constant Argument rule. (The API isn't yet stable,
so I may still change my mind.) Make of that what you will.
 
Andrew Berg

Especially if the implementation looks like this:

def get_thing(argument, flag):
    if flag:
        return one_thing(argument)
    else:
        return another_thing(argument)

Well, that would be annoying, but wouldn't it be even more annoying to
do this:

def get_one_thing(arg):
    return one_thing(arg)

def get_another_thing(arg):
    return another_thing(arg)

Argument flags which do nothing but change the behaviour of the function
from Mode 1 to Mode 2 are an attractive nuisance: they seem like a good
idea, but aren't. Consider it a strong guideline rather than a law, but
it's one I would think very long and hard about before violating.

Creating separate functions for two things that do almost the same thing
seems more of a nuisance to me, especially if they share a lot of code
that isn't easily separated into other functions.
 
Andrew Berg

If they share a lot of code, either it *is* separable to common
functions (in which case, implement it that way), or the “same thing”
code is sufficiently complex that it's better to show it explicitly.

But this is all getting rather generic and abstract. What specific
real-world examples do you have in mind?

I can't come up with any off the top of my head. I was thinking of a
general rule anyway; ultimately one decides (or at least should) how to
write code on a case-by-case basis.
 
Dennis Lee Bieber

Just tested this: You can type c:/foldername into the box at the top
of a folder in Explorer, and it works. It will translate it to
c:\foldername though.

So far as I know, the only place that /requires/ the \ is the
command shell (since / is the "option" introducer). If one uses run-time
libraries only, and never invokes os.system() or similar, one is free to
use / all they want.
 
Terry Reedy

Guido has a rule of thumb: "No constant arguments". Or another way to put
it: if a function takes an argument which is nearly always a constant
(usually, but not always, a flag) then it is usually better off as two
functions.

I do not really understand his 'rule'*. The stdlib has lots of functions
with boolean flags and params which default to None and are seldom
over-ridden.

* Which is to say, it feels more like his gut feeling applied on a
case-by-case basis than an actual rule that anyone could apply in any
objective manner.

Especially if the implementation looks like this:

def get_thing(argument, flag):
    if flag:
        return one_thing(argument)
    else:
        return another_thing(argument)

If the rule is limited to this situation, where no code is shared, it
seems pretty sensible.

Argument flags which do nothing but change the behaviour of the function
from Mode 1 to Mode 2 are an attractive nuisance: they seem like a good
idea, but aren't. Consider it a strong guideline rather than a law, but
it's one I would think very long and hard about before violating.

But having said that, I'm currently writing a library where nearly all the
functions violate the No Constant Argument rule. (The API isn't yet stable,
so I may still change my mind.) Make of that what you will.

See * above ;-).

Terry Jan Reedy
 
Grant Edwards

Join the 'Python-Dev' mailing list and tell them!
From now on I will just ignore threads you initiated.

Is trolling really that much fun?

RR must think so, considering how much effort he seems to put into it.
It is rather amusing to see how many people take him seriously. I
plonked him ages ago, but thanks to all the people who reply with one
or two lines and then quote his entire flippin' rant, I still get to
see most of his posts.
* rantingrick [2011-07-29 19:25]:

[100+] lines of unnecessarily quoted bait elided.
 
Andrew Berg

RR must think so, considering how much effort he seems to put into it.

He hasn't replied to his last two troll threads, though. It does seem
odd to write a wall of text and then not respond to replies. To be fair,
though, most replies either mock him or point out that he's a troll. :D
 
