Ben Morrow said:
Quoth Rainer Weikusat said:
Ben Morrow said:
$filename !~ /[^[:ascii:]]/
is clearer, and works properly against Unicode strings.
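A minimal sketch of that test, for reference. The helper name `is_ascii_only` and the sample names are made up for illustration and are not from the thread. Note that control characters such as tab are ASCII too, so they pass this check.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains no character
# outside the ASCII range (code points 0..127).
sub is_ascii_only {
    my ($s) = @_;
    return $s !~ /[^[:ascii:]]/;
}

# Illustrative inputs: plain ASCII, an embedded tab, a Latin-1
# character, and a character above 0xFF.
my %samples = (
    'plain'  => "report.txt",
    'tab'    => "tab\there",
    'latin1' => "caf\x{e9}.txt",
    'wide'   => "smile\x{263A}",
);

for my $label (sort keys %samples) {
    printf "%-7s %s\n", $label,
        is_ascii_only($samples{$label}) ? "ASCII only" : "has non-ASCII";
}
```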
Additionally, it doesn't work (in the sense that it would solve the
problem).
A simpler way to test whether a string contains 'non-printable octets'
would be
$filename =~ /[^[:print:]]/
You're right.
except -- unfortunately space and htab (0x20 and 9) are printable (I
don't quite understand why space is considered to be a 'safe'
character while \t is not, hence I assumed that ' ' was also supposed
to be excluded).
Space is an ordinary single-width character like any other, it just
happens not to have any ink in its glyph. Tab is a control character
that (typically) produces a context-dependent amount of whitespace.
For example, an app that wanted to know whether it was safe to assume 1
column per byte would treat space like 'A', but not tab.
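The distinction can be sketched in Perl roughly as follows. The helper name `one_column_per_byte` is hypothetical, and the `/a` modifier (Perl 5.14 and later) is an assumption added here so that `[:print:]` is restricted to ASCII regardless of how the string is stored internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if the name consists
# only of printable ASCII characters. Under /a, [:print:] includes
# space but not tab or any other control character.
sub one_column_per_byte {
    my ($name) = @_;
    return $name !~ /[^[:print:]]/a;
}

for my $name ("plain name.txt", "tab\tname.txt", "caf\x{e9}.txt") {
    # Escape anything outside printable ASCII so the report stays readable.
    (my $shown = $name) =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    printf "%-20s %s\n", $shown,
        one_column_per_byte($name) ? "one column per byte" : "not safe";
}
```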
Both space and \t (and \v, \r and \n, here supposed to be C escape
sequence mapped to ASCII) are whitespace characters and an application
which wanted to know whether it was safe to assume that a filename can
be fed to something which breaks its input into words separated by
whitespace characters would treat them all differently from any
non-whitespace character (eg, encoding them in some form, such as URL
encoding, so that 'splitting on whitespace' produces the correct
results).
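As an illustration of that approach, here is a rough sketch that percent-encodes ASCII whitespace (and '%' itself, so decoding is unambiguous) before joining names into a whitespace-separated line. The helper names `encode_ws` and `decode_ws` and the sample names are invented for this example; a real application would more likely use an existing URL-encoding module.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode '%' and the ASCII whitespace
# characters (\t \n \v \f \r and space).
sub encode_ws {
    my ($name) = @_;
    (my $out = $name) =~ s/([%\x09-\x0D\x20])/sprintf '%%%02X', ord $1/ge;
    return $out;
}

# Hypothetical inverse: turn every %XX sequence back into its character.
sub decode_ws {
    my ($enc) = @_;
    (my $out = $enc) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $out;
}

my @names = ("my file.txt", "tab\tname", "100%.log");

# Encoded names contain no whitespace, so splitting the joined line
# on whitespace recovers exactly one field per name.
my $line = join ' ', map { encode_ws($_) } @names;
my @back = map { decode_ws($_) } split ' ', $line;
```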
Depending on the unknown context of the original question, both
interpretations could make sense (arguably, yours makes more sense
because it is not based on the assumption that space was erroneously
included).
This will probably also need a 'use bytes'.
'use bytes' is always wrong.
A statement of the form 'xxx is always wrong' is always wrong when
referring to some kind of existing feature. The 'use bytes'
documentation states
When "use bytes" is in effect [...] each string is treated as
a series of bytes
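For readers following along, a small sketch of what that sentence means in practice. The sample string is made up; the point is only that under 'use bytes' length() reports the size of perl's internal representation, which for this string happens to equal the length of its explicit UTF-8 encoding. Encode is a core module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative string: "caf", e-acute (U+00E9), and U+263A.
# The character above 0xFF forces perl to store it as UTF-8 internally.
my $str = "caf\x{e9}\x{263A}";

# Character semantics: five characters.
my $chars = length $str;

# Explicit encoding: 3 + 2 + 3 octets of UTF-8.
my $octets = length encode('UTF-8', $str);

# Byte semantics: length of the internal representation.
my $internal;
{
    use bytes;
    $internal = length $str;
}

print "characters: $chars, UTF-8 octets: $octets, internal bytes: $internal\n";
```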
Yes, I know that. The general opinion among those who actually know how
these things work (which doesn't include me) is that both the design and
the implementation are buggy, and the pragma needs to be deprecated and
then removed. I'm not making these things up, I'm simply relaying the
opinion of those perl developers who are actively working on perl's
Unicode implementation.
If these people are not aware that Perl scalars don't necessarily
store 'character strings' but also arbitrary binary data, and if they
actually want to remove the ability to use them in this way from the
language based on their ignorance of the existence of a world beyond
text processing, they're crackpots and their opinions as irrelevant as
"laymen's babbling" about any topic usually is.
Sorry guys, computer networks do exist and XML is not the universal
messaging data format. You may be convinced that this is terribly
wrong and really shouldn't be in this way, but then - please - go find
yourself some soapbox and preach the true gospel to the nonbelievers
elsewhere, leaving people who have to interoperate with the real world
alone ...
[...]
Go find the relevant p5p threads if you want examples. There are quite a
few of them, as I recall...
I don't even know what you consider to be relevant and I'm certainly
not in the mood for trying to guess what the unknown source you
claimed to be referring to could possibly be. That's a 08/15
propaganda trick: Stay vague enough that people have to supply
sensible interpretations of your statement using their own knowledge/
experience and thus mistakenly believe they agree with you while they're
actually agreeing with themselves.
He who refers to authorities should name them.
I was inclined to think the same thing, until I learned that it's not
that simple and, while 'use bytes' seems like an attractive idea, it
doesn't appear to be possible to make it work properly.
Perl has supported using scalars for binary data since ever and if the
people who 'work on the Perl unicode implementation' cannot make that
work correctly without breaking this feature, this would hint at the
fact that either 'unicode support' cannot be implemented correctly or
(more likely) the people who happen to dabble in this area are not
competent enough to produce useful results.