> RW> MD5 (or any other hashing algorithm) is a lot more expensive
> RW> than a comparison and especially so if MD5 needs to process 2G
> RW> of data while the comparison would only need 8K.
> You make several unfounded assumptions here. [...]
> Two, that the number of comparisons is small. The more comparisons you
> have, the more the advantage goes to the hashing algorithm. If you have
> 2 files, it is best to read the first 8K of each and compare them,
> since, as you note, odds are that any differences will appear early on.
> If you have 1000 files, reading the first 8K of each file for
> comparison purposes means a great deal of seeking and reading;
It's about the same amount of seeking and a lot less reading than
computing a hash of each of the 1000 files, at least as long as the
files are much larger than 8k: the comparison reads at most 8k per
file (about 8 MB for all 1000 files), while hashing has to read every
byte of every file.
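
Something like this (Python, untested, just to sketch the idea; it
assumes you already have the list of paths with the same size, and the
names are mine) reads at most 8k per file:

from collections import defaultdict

PREFIX = 8 * 1024   # 8k, as above

def group_by_prefix(same_size_paths):
    # Read only the first 8k of each file and bucket the paths by
    # that prefix. Total I/O is bounded by 8k per file, no matter
    # how big the files are.
    groups = defaultdict(list)
    for path in same_size_paths:
        with open(path, "rb") as f:
            groups[f.read(PREFIX)].append(path)
    # Only buckets with more than one member can still hold duplicates.
    return [g for g in groups.values() if len(g) > 1]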
> and then you either store the first 8K, leading to a large working set
> (and the first time you swap, you've lost anything you won by avoiding
> calculating hashes),
8k * 1000 is 8 MB. That's negligible. And you only have to store this if
there are actually 1000 files of the same size.
There is also a hybrid approach:
For each group of files of the same size, you could initially read only
the first 8k (or some other size large enough to find the first
difference with high probability, but small enough to be dwarfed by the
overhead of open(2)), and if those prefixes are identical, switch to
computing a hash (and as Ben said, you can use something like SHA512,
where a collision is IMHO less likely than a false positive due to a
hardware or software error).
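
A rough sketch of that hybrid in Python (untested, names are mine,
and again assuming the paths have already been grouped by size, e.g.
via stat):

import hashlib
from collections import defaultdict

PREFIX = 8 * 1024   # large enough to catch most differences early

def find_duplicates(same_size_paths):
    # Stage 1: group by the first 8k (cheap, bounded read per file).
    by_prefix = defaultdict(list)
    for path in same_size_paths:
        with open(path, "rb") as f:
            by_prefix[f.read(PREFIX)].append(path)

    # Stage 2: only for groups whose prefixes collide, hash the full
    # contents with SHA512 and group by digest.
    duplicates = []
    for group in by_prefix.values():
        if len(group) < 2:
            continue            # unique prefix, can't be a duplicate
        by_digest = defaultdict(list)
        for path in group:
            h = hashlib.sha512()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_digest[h.digest()].append(path)
        duplicates.extend(g for g in by_digest.values() if len(g) > 1)
    return duplicates

The full read (and the hash) only happens for files that already agree
in size and in their first 8k, i.e. for the likely duplicates.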
hp