compacting '..' path segments using File::Spec

ofer

Here's the scenario:

I am given a path to a file, which is actually a relative symlink.
Example:

/foo/bar/somelink -> ../somefile

I tried the following code to follow the symlink and then elegantly
combine the two into a final absolute path to the real file:

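use strict;
use warnings;
use File::Spec;

my $symlink  = '/foo/bar/somelink';
my $realfile = readlink($symlink);
defined $realfile or die "readlink($symlink) failed: $!";
unless ( File::Spec->file_name_is_absolute($realfile) ) {
    # Resolve the relative target against the directory holding the link.
    my ( $volume, $directories, $file ) = File::Spec->splitpath($symlink);
    $realfile = File::Spec->rel2abs( $realfile, $directories );
}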

It works... but instead of producing '/foo/somefile', which is what I
want, it produces '/foo/bar/../somefile'. It is technically correct,
but not as elegant as I would like, and makes the final result
unnecessarily depend on the continued existence of the 'bar'
subdirectory in order for the path to remain valid (this data is going
into a long-term database).

Any ideas?

-ofer
 
Josef Moellers

Well, since you haven't shown us what you have tried in order to
canonicalize the path name, we have few ideas.

1. You could try a regex which replaces each "/./" with "/" and each
"/<anyname>/../" with "/".
2. You could split the pathname and work on the components.

There's even a CPAN module iirc.
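
For illustration, option 2 might look something like the sketch below. It is purely lexical: it never touches the filesystem, so it gives a wrong answer whenever a component in front of a '..' is a symlink (a point raised later in this thread). The helper name collapse_dotdot is made up for this example, and it assumes an absolute Unix-style path.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical helper: lexically collapse '.' and '..' components
# of an absolute Unix-style path. Does not consult the filesystem.
sub collapse_dotdot {
    my ($path) = @_;
    my @out;
    for my $part ( File::Spec::Unix->splitdir($path) ) {
        next if $part eq '' || $part eq '.';    # skip empty and '.' parts
        if ( $part eq '..' ) {
            pop @out;                           # '..' at the root stays at the root
            next;
        }
        push @out, $part;
    }
    return '/' . join '/', @out;
}

print collapse_dotdot('/foo/bar/../somefile'), "\n";
```

For the path in the question this yields /foo/somefile, but only because nothing on disk was checked.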
 
Brian McCauley

ofer said:
It works... but instead of producing '/foo/somefile', which is what I
want, it produces '/foo/bar/../somefile'. It is technically correct,
but not as elegant

So elegant is more important than correct?

IIRC File::Spec works only with the file spec in the abstract. It does not
assume the file spec refers to a file system to which it currently has
access. As such it cannot tell whether bar is a directory or a symlink.

Cwd::abs_path, on the other hand, will give you the canonical absolute
path with no symbolic references.

ofer said:
as I would like, and makes the final result
unnecessarily depend on the continued existence of the 'bar'
subdirectory in order for the path to remain valid (this data is going
into a long-term database).

Any ideas?

Yes, don't do it.

Seriously, if the user chooses to specify the path to the file using one
or more symlinks it is quite possibly because that is what they consider
to be the (logically) canonical location of the data and the absolute
(physically) canonical path is subject to change.
 
A. Sinan Unur

(e-mail address removed) wrote:
....

I think you need canonpath:

D:\Home\asu1\UseNet\clpmisc> cat fs.pl
#! perl

use strict;
use warnings;

use File::Spec;

my $p = '../../../asu1/../';
$p = File::Spec->rel2abs($p);
$p = File::Spec->canonpath($p);

print "$p\n";
__END__

D:\Home\asu1\UseNet\clpmisc> fs
D:\Home

Sinan.
 
ofer

I'll ignore the two morons and reply to the one who seems to have
understood what I'm trying to accomplish.

By the description of canonpath ('a logical cleanup of a path'), and
your example, it would seem to be what I'm looking for. I threw it in
to my test script... and it didn't change the path at all. It still
returns /foo/bar/../somefile instead of /foo/somefile.

Of course, I'm running on Linux, and I see you're running on DOS or
Windows. So I copied my test script over to my windows desktop,
tweaked it a bit, and tried it. It works!

So it appears canonpath does what I want on DOS/Windows, but not on
Linux.

How bizarre.

Anyways, thanks for the tip. It was valid, even if it doesn't work for
me.
 
A. Sinan Unur

[ Please provide some context when you are posting. ]

ofer said:
I'll ignore the two morons

OK, you got me curious, let me look up who those are ... Hmmmm ... It's
Brian McCauley and Josef Moellers who have provided insight and help to
countless people. In fact, I just learned something from reading Brian's
post. Thank you Brian.

Oh, getting back to the topic at hand ...

* PLONK *

I hope you'll enjoy Xahzilla's company.

Sinan.
 
Darren Dunham

ofer said:
By the description of canonpath ('a logical cleanup of a path'), and
your example, it would seem to be what I'm looking for. I threw it in
to my test script... and it didn't change the path at all. It still
returns /foo/bar/../somefile instead of /foo/somefile.

That's because on Unix filesystems, /foo/bar/../somefile cannot
usually be determined to be the same as /foo/somefile.

ofer said:
Of course, I'm running on Linux, and I see you're running on DOS or
Windows. So I copied my test script over to my windows desktop,
tweaked it a bit, and tried it. It works!

So it appears canonpath does what I want on DOS/Windows, but not on
Linux.

How bizarre.

On Windows/DOS (no symlinks), the two paths refer to the same file.

Even on unix, you can call the windows canonpath directly if you don't
care about it breaking in some cases.
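
A minimal sketch of that difference, assuming a Perl whose core File::Spec distribution ships both the Unix and Win32 flavours (they can be loaded on any platform, since they only manipulate strings):

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

my $path = '/foo/bar/../somefile';

# Unix rules: '..' is left alone, because 'bar' might be a symlink.
my $unix = File::Spec::Unix->canonpath($path);

# Win32 rules: '..' is collapsed lexically, and the result uses
# backslashes as the directory separator.
my $win32 = File::Spec::Win32->canonpath($path);

print "Unix:  $unix\n";
print "Win32: $win32\n";
```

Note that the Win32 result comes back with backslashes, so it would need converting before being used as a Unix path, and it is just as blind to symlinks as any other lexical cleanup.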
 
John W. Krahn

Darren said:
That's because on Unix filesystems, /foo/bar/../somefile cannot
usually be determined to be the same as /foo/somefile.

Sure it can. lstat() both files and if the device numbers and inode numbers
are the same then they are the same file.
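
A rough sketch of that check, run against a scratch directory so it is self-contained. The helper name same_file is invented for this example, and it assumes a filesystem that reports meaningful device and inode numbers (as typical Unix filesystems do).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if both paths currently name the same
# filesystem object, judged by device and inode number.
sub same_file {
    my ( $path_a, $path_b ) = @_;
    my @a = lstat($path_a) or return 0;    # empty list means lstat failed
    my @b = lstat($path_b) or return 0;
    return $a[0] == $b[0] && $a[1] == $b[1];    # dev and ino
}

# Build foo/bar and foo/somefile under a temporary directory.
my $tmp = tempdir( CLEANUP => 1 );
mkdir "$tmp/foo"     or die "mkdir: $!";
mkdir "$tmp/foo/bar" or die "mkdir: $!";
open my $fh, '>', "$tmp/foo/somefile" or die "open: $!";
close $fh;

print same_file( "$tmp/foo/bar/../somefile", "$tmp/foo/somefile" )
    ? "same file\n"
    : "different files\n";
```

This answers "are these two paths the same file right now?"; it does not by itself produce a cleaned-up path. Using stat instead of lstat would follow a symlink in the final component.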


John
 
ofer

Since I asked the question, it's only fair that I share the answer when
I find it. Ken Williams, author of the wonderful Path::Class, had this
to say on the topic:

---
The reason File::Spec and Path::Class don't handle this case is that,
for instance, /data/current could be a symlink to somewhere else. If
/data/current points to /foo/bar, then
/data/current/../backup/thefile-20050211.tgz really points to
/foo/backup/thefile-20050211.tgz , not
/data/backup/thefile-20050211.tgz .

The "correct" way to do this is to use the Cwd.pm module, which has a
realpath() function that will resolve any '.' and '..' components in
the path.
---
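
To make Ken's example concrete, here is a small sketch that rebuilds his layout under a scratch directory (standing in for /) and asks Cwd::realpath to resolve the path. The shortened file name thefile.tgz and the temp-directory setup are just for the demonstration.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Scratch tree standing in for /. It is resolved first, since the
# temp dir itself may sit behind a symlink on some systems.
my $root = realpath( tempdir( CLEANUP => 1 ) );

make_path( "$root/foo/bar", "$root/foo/backup", "$root/data/backup" );
symlink "$root/foo/bar", "$root/data/current"
    or die "symlink failed: $!";

# Put a file in both candidate locations so either answer would exist.
for my $dir (qw(foo data)) {
    open my $fh, '>', "$root/$dir/backup/thefile.tgz" or die "open: $!";
    close $fh;
}

my $resolved = realpath("$root/data/current/../backup/thefile.tgz");

print "resolved:      $resolved\n";
print "lexical guess: $root/data/backup/thefile.tgz\n";
```

The resolved path lands under foo/backup, while simply deleting "current/.." from the string would point at data/backup, a different file.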

Excellent point. In that case, splitting directories or fancy regexes
would have yielded a blatantly incorrect answer. After trying the
Cwd::realpath() function on some hairy tests, it seems it perfectly
follows the twisty maze of symlinks and returns the most beautiful,
canonical path you could ever ask for.

-ofer
 
Brian McCauley

ofer said:
Since I asked the question, it's only fair that I share the answer when
I find it.

The "correct" way to do this is to use the Cwd.pm module, which has a
realpath() function that will resolve any '.' and '..' components in
the path.

You'll note that I mentioned this in my response dated 2005-02-01.

ofer said:
Excellent point. In that case, splitting directories or fancy regexes
would have yielded a blatantly incorrect answer. After trying the
Cwd::realpath() function on some hairy tests, it seems it perfectly
follows the twisty maze of symlinks and returns the most beautiful,
canonical path you could ever ask for.

But as I explained before the correct thing to do is often to do nothing.

On Unix the _physically_ canonical path to a file is typically subject
to change as filesystems are expediently reorganised. If the user
chooses to specify their preferred _logically_ canonical path that
resolves via a twisty maze of symlinks then it is counterproductive for
a program to second-guess the user.

Indeed I have oft put forward this very assertion as a test to
distinguish between someone who speaks Unix as a foreign language and
one who speaks Unix as a native. You can tell when you are really at
home in Unix by when the idea of canonicalizing a file path as you
describe ceases to seem intuitively attractive and starts to appear
intuitively unattractive.

Note: I'm not saying that there are never any situations where physical
canonicalization is appropriate. I'm just saying that it is
inappropriate in the vast majority of cases.
 
ofer

Brian said:
You'll note that I mentioned this in my response dated 2005-02-01.

Yes, you did. I didn't respond to that post because it didn't contain
anything resembling a solution, but you deserve credit for informing us
about that trap (and it is a nasty one).

Brian said:
On Unix the _physically_ canonical path to a file is typically subject
to change as filesystems are expediently reorganised. If the user
chooses to specify their preferred _logically_ canonical path that
resolves via a twisty maze of symlinks then it is counterproductive for
a program to second-guess the user.

(sigh) Why do you keep assuming that I'm writing some app that will be
released to users? Why do you assume the situation is such that I
probably shouldn't be canonicalizing paths? I know you're trying to
be helpful, but I know my application, and I know what I'm trying to
accomplish and why. The only thing I asked for help with was how.

I'm writing back-end code that works off other back-end systems in my
company. I need to canonicalize paths in order to log which exact
file was processed in a run, which would otherwise be hidden by the
symlink, which is not version specific. This is done on purpose to
make it easier to figure out what file to process. Something like
this:

/somedir/thefile.txt -> thefile.2005021802.txt
/somedir/thefile.2005021801.txt (old)
/somedir/thefile.2005021802.txt (current)
/somedir/thefile.2005021803.txt (in the middle of being created)

So the script processes /somedir/thefile.txt, which is wonderful. But
then I log the fact that it was pointing to
/somedir/thefile.2005021802.txt at the time, so we can investigate
problems later.
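
In case it helps anyone with a similar setup, here is a stripped-down sketch of that logging step. It recreates the layout above in a scratch directory (standing in for /somedir) and uses Cwd::realpath to find out what the symlink points to at that moment; the directory, variable names, and log format are only for illustration.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Temp qw(tempdir);

# Scratch directory standing in for /somedir.
my $dir = realpath( tempdir( CLEANUP => 1 ) );

for my $stamp (qw(2005021801 2005021802)) {
    open my $fh, '>', "$dir/thefile.$stamp.txt" or die "open: $!";
    close $fh;
}

# Relative link, as in the listing above.
symlink 'thefile.2005021802.txt', "$dir/thefile.txt"
    or die "symlink failed: $!";

my $input  = "$dir/thefile.txt";
my $actual = realpath($input);
defined $actual or die "cannot resolve $input: $!";

print "processing $input (currently $actual)\n";
```

If the link can be repointed while the job runs, it is probably safer to resolve it once and then open the resolved path, so that the logged name and the processed file cannot drift apart.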

The moral of the story is... if you feel someone might be approaching a
problem from the wrong angle, it's extremely thoughtful of you to
mention this, but phrase it like "if the situation is thus, then know
that this is probably not the way to go...", and then preferably also
answer the question. Don't simply assume the situation IS thus, and
say "you're wrong".

-ofer
 
Martien Verbruggen

On 20 Feb 2005 13:57:05 -0800, ofer said:
Yes, you did. I didn't respond to that post because it didn't contain
anything resembling a solution, but you deserve credit for informing us
about that trap (and it is a nasty one).

It did have the same solution that you posted in your own followup. To
quote from Brian's post:

Cwd::abs_path, on the other hand, will give you the canonical
absolute path with no symbolic references.

Martien
 
David Combs

ofer quoted Ken Williams:
The "correct" way to do this is to use the Cwd.pm module, which has a
realpath() function that will resolve any '.' and '..' components in
the path.

I'm surely doing something wrong, or maybe you have a newer emacs
than mine, but I can't find a 'sub.*realpath' anywhere:

egrep -in 'sub.*realpath' `gauf Cwd.pm`

(gauf is an alias that greps for its argument in a huge, batch-created
(via find) list of all the files on my (standalone) computer.)

David
 
Brian McCauley

David said:
I'm surely doing something wrong, or maybe you have a newer emacs
than mine,

What does emacs have to do with it?

David said:
I can't find a 'sub.*realpath' anywhere

So what?

It just so happens that realpath() is implemented as an alias to another
subroutine.
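
A quick way to see this for yourself, without grepping the source. My understanding is that the alias targets abs_path and is set up with a glob assignment rather than a "sub realpath" declaration, but treat that as an assumption about Cwd's internals; the sketch only checks what can be observed from outside.

```perl
use strict;
use warnings;
use Cwd ();

# Both names are callable even if no "sub realpath" appears in Cwd.pm.
print "realpath defined: ", ( defined &Cwd::realpath ? "yes" : "no" ), "\n";
print "abs_path defined: ", ( defined &Cwd::abs_path ? "yes" : "no" ), "\n";

# If realpath is a plain alias, the two code references compare equal.
my $same_code = \&Cwd::realpath == \&Cwd::abs_path;
print "same code ref:    ", ( $same_code ? "yes" : "no" ), "\n";

# Either way, the two should agree on the result.
my $agree = Cwd::realpath('.') eq Cwd::abs_path('.');
print "results agree:    ", ( $agree ? "yes" : "no" ), "\n";
```

An alias created this way never matches a pattern like 'sub.*realpath', which would explain the empty grep.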
 
