"is-numeric" check?


Ivan Shmakov

Is there a simple way to check if a value is numeric in Perl?

FWIW, (($x + 0) eq $x) doesn't fit, as it returns false should
$x contain any leading zeros or whitespace.

Apparently, there /is/ such a check in Perl, but is there a way
to call it explicitly?

TIA.

$ perl -we 'print ("non-numeric" + 0, "\n");'
Argument "non-numeric" isn't numeric in addition (+) at -e line 1.
0
$
 

Wolf Behrenhoff

On 29.07.2012 10:16, Ivan Shmakov wrote:
Is there a simple way to check if a value is numeric in Perl?

Maybe you want:

use Scalar::Util qw(looks_like_number);

- Wolf
 

Ivan Shmakov

Maybe you want:
use Scalar::Util qw(looks_like_number);

Indeed, and it even seems to differentiate between integers,
floats and "specials" (which is what I need), like:

$ perl -we 'use Scalar::Util qw (looks_like_number);
    foreach my $v (" 12", "abc", " 03", "4.5", ".67", "NaN") {
        printf ("%3d %s\n", looks_like_number ($v), $v);
    };'
  1  12
  0 abc
  1  03
  5 4.5
  5 .67
 36 NaN
$

Unfortunately, perlapi(3perl) [1] doesn't seem to describe the
exact meaning of these distinct values, so I guess I shouldn't
rely on that.

[1] http://perldoc.perl.org/perlapi.html#looks_like_number
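
Since the distinct return values are not documented as part of the interface, one way to avoid relying on them is to collapse the result to a plain boolean. A minimal sketch, assuming a hypothetical wrapper named is_number (not part of Scalar::Util):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# is_number is a hypothetical helper name, not part of Scalar::Util.
# It collapses whatever true value looks_like_number returns to 1.
sub is_number {
    my ($value) = @_;
    return looks_like_number($value) ? 1 : 0;
}

for my $v (" 12", "abc", " 03", "4.5", ".67") {
    printf "%d %s\n", is_number($v), $v;
}
```

With this wrapper, callers only ever see 0 or 1, whatever the underlying implementation returns for integers, floats or "specials".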
 

Jürgen Exner

Ivan Shmakov said:
Is there a simple way to check if a value is numeric in Perl?

At some time this Question was Asked Frequently. Please see
"perldoc -q number":
"How do I determine whether a scalar is a
number/whole/integer/float?"

jue
 

Ivan Shmakov

Ben Morrow said:
Indeed, and it even seems to differentiate between integers, floats
and "specials" (which is what I need), like:
[...]
Unfortunately, perlapi(3perl) [1] doesn't seem to describe the exact
meaning of these distinct values, so I guess I shouldn't rely on
that.
They are documented under grok_number, which is called by lln for
scalars which are currently strings. This isn't reliable, though,
because scalars which are currently numbers will return something
entirely different:
[...]

(Scalar::Util really ought to smash its return value to boolean.)

ACK, thanks. I've decided that I don't actually need to
distinguish integers from non-integers, and for now ended up
with the following bit of code:

use Scalar::Util qw(looks_like_number);

## Return the first argument that looks like a number, or undef.
sub number_or {
    foreach my $v (@_) {
        ## .
        return $v
            if (looks_like_number ($v));
    }

    ## .
    undef;
}

## FIXME: should signal an error if FOO exists, non-empty and non-number
my $foo
    = number_or ($ENV{"FOO"},
                 $ENV{"FOO_COMPAT"},
                 $foo_default);
 

Ivan Shmakov

BTW, the From: of the article I'm replying to contains unencoded
(as per RFC 2047) non-ASCII data, which is explicitly prohibited
by the recent revision of the Netnews article format (RFC 5536,
section 2.2.)

Interoperability is thus not guaranteed.

(In particular, I'm planning to work on a "NNTP server"
implementation next year, and it's likely that it will reject
messages with non-ASCII headers outright.)
At some time this Question was Asked Frequently. Please see "perldoc
-q number":
"How do I determine whether a scalar is a
number/whole/integer/float?"

ACK, thanks. It mentions looks_like_number, too, but also
POSIX::strtod, POSIX::strtol (which are somewhat non-portable,
AIUI); and pure-Perl String::Scanf and a solution using "given",
both based on Perl regular expressions, which I'd like to avoid.

(JFTR: it's also at [1].)

[1] http://perldoc.perl.org/perlfaq4.html
 

Dr.Ruud

Is there a simple way to check if a value is numeric in Perl?

No. Why do you think that you need it?

Perl is a strongly typed language. The type is not in the values, as it
is in many other languages, but in the operators.
 

Ivan Shmakov

No. Why do you think that you need it?

My program receives a crucial piece of information via its
command line, and I'd like it to fail with a clear error message
should a non-number be passed, instead of silently (or with a
warning) interpreting it as zero.
Perl is a strongly typed language.

I guess that my $x = "x" + 1; should then die at once.
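
The requirement described above (fail with a clear error message instead of silently treating a non-number as zero) could be sketched as follows. The helper name require_number and the option name "count" are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# require_number is a hypothetical helper; the option name is only
# used to build the error message.
sub require_number {
    my ($name, $value) = @_;
    die "$name: a value is required\n"
        unless defined $value && length $value;
    die "$name: '$value' is not a number\n"
        unless looks_like_number($value);
    return $value + 0;    # numeric copy; the string was checked above
}

# Try a few sample arguments; a real program would pass $ARGV[0].
for my $arg ("1e6", " 03", "abc") {
    my $n = eval { require_number("count", $arg) };
    print defined $n ? "ok: $n\n" : "error: $@";
}
```

Dying with a message that ends in a newline keeps Perl from appending the "at ... line N" suffix, which is usually what one wants for user-facing errors.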
 

Dr.Ruud

My program receives a crucial piece of information via its
command line, and I'd like it to fail with a clear error message
should a non-number be passed, instead of silently (or with a
warning) interpreting it as zero.

So you need to validate user input. For that you need to use a parser.
How do you define 'numeric'? Does 1_000_000 == 1000000?

I guess that my $x = "x" + 1; should then die at once.

perl -Mstrict -wle'
#local $SIG{"__WARN__"}= sub { die @_ };
my $x= "x" + 1;
print $x;
'

If you need it to die on warnings, uncomment that SIG line.
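
As an alternative to a __WARN__ handler, the warnings pragma can make just the "numeric" category fatal within a lexical scope. A minimal sketch, where add_one is a hypothetical example function:

```perl
use strict;
use warnings;

# add_one is a hypothetical example function.
sub add_one {
    my ($value) = @_;
    # Within this block only, "isn't numeric" warnings become exceptions.
    use warnings FATAL => 'numeric';
    return $value + 1;
}

my $ok = eval { add_one("x"); 1 };
print $ok ? "no error\n" : "died: $@";
print add_one("41"), "\n";
```

Unlike the signal handler, this leaves every other warning category, and all code outside the block, untouched.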
 

Rainer Weikusat

Dr.Ruud said:
On 2012-07-30 12:06, Ivan Shmakov wrote:
[...]
I guess that my $x = "x" + 1; should then die at once.

perl -Mstrict -wle'
#local $SIG{"__WARN__"}= sub { die @_ };
my $x= "x" + 1;
print $x;
'

If you need it to die on warnings, uncomment that SIG line.

A strongly-typed language would be one where no automatic conversions
are performed, especially not very likely wrong ones like converting
"x" to 0. But by default, Perl doesn't even warn about this
conversion; the warning has to be enabled explicitly. That explicit
code needs to be written in order to turn this optional warning into
a fatal runtime error is also the opposite of 'strongly typed'.
 

Ivan Shmakov

So you need to validate user input.
Yes.

For that you need to use a parser.

Indeed, and the one that Perl provides is fine, as long as it
allows for explicit validation. And thanks to
looks_like_number, it does.
How do you define 'numeric'?

I see no point in extending Perl's own numeric syntax for my
application. Neither do I see any reason in using a "stricter"
syntax than the one supported by Perl at this moment.
Does 1_000_000 == 1000000?

As per the above, no. OTOH, 1e6 == 1000000.
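
The difference can be checked directly. This small sketch only uses looks_like_number on three sample strings:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Underscores are accepted in numeric literals in Perl source code,
# but not when a string is converted to a number at run time.
for my $s ("1000000", "1e6", "1_000_000") {
    printf "%-10s %s\n", $s,
        looks_like_number($s) ? "number" : "not a number";
}
```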

[...]
 

Ivan Shmakov

[Cross-posting and setting Followup-To: for
obvious reasons.]
That's appropriate for now, but keep your eye on the
internationalization efforts at IETF. There is an RFC set for
internationalized e-mail, and news is the obvious next step;

As of RFC 5536 (published November 2009), it's /allowed/ to use
national characters in Netnews article headers, /provided/ that
RFC 2047 is used to encode them in 7-bit ASCII. For instance:

Subject: al =?utf-8?Q?=C4=89iu?= abonantoj

is perfectly valid, while:

Subject: al \xE6iu abonantoj

(where \xE6 is the octet to be interpreted using a particular,
unspecified encoding; the form I was complaining about) is not.
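
A minimal sketch of the distinction, assuming a hypothetical check named header_is_ascii that a server might apply to each header line; the decoding step uses the MIME-Header encoding shipped with the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Returns 1 if the header line contains only 7-bit ASCII octets.
# header_is_ascii is a hypothetical name.
sub header_is_ascii {
    my ($octets) = @_;
    return $octets !~ /[^\x00-\x7F]/ ? 1 : 0;
}

my $encoded = "Subject: al =?utf-8?Q?=C4=89iu?= abonantoj";
my $raw     = "Subject: al \xE6iu abonantoj";

print header_is_ascii($encoded) ? "accept\n" : "reject\n";
print header_is_ascii($raw)     ? "accept\n" : "reject\n";

# Encode's MIME-Header encoding decodes RFC 2047 encoded-words
# into a Perl character string.
my $text = decode("MIME-Header", $encoded);
```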
 

Ivan Shmakov

[Cross-posting to just in case; dropping
news:comp.lang.perl.misc from Followup-To:.]
Yes, but compare that to what is going on with e-mail; the
experimental RFC 5335, allowing raw UTF-8 in headers, has been
replaced by a standard track RFC, 6532.

ACK, thanks for the pointer!
While RFC 3977 provides an equivalent to 8BITMIME, there is no
netnews equivalent to SMTPUTF8, and I'm predicting that there will
eventually be one.

So, I should be prepared to allow both pure-ASCII /and/ UTF-8.
It's still better than allowing an arbitrary octet sequence in
an unspecified encoding in the header.
BTW, I interpreted your "non-ASCII" as referring to octets beyond
127, rather than to ASCII encoding of non-ASCII data.

Indeed, it's exactly what I've meant.

PS. Still, I believe that while the systems of days past
warranted a separation between Netnews /Transfer/ Agents (BKA
NNTP "servers", though it's a bit of a misnomer) and Netnews /User/
Agents (NNTP "clients"; and, similarly, between Mail Transfer
Agents and Mail User Agents), the performance of the modern
computers, along with the reasonable success of the contemporary
P2P systems, makes it possible to get rid of such a distinction,
and allow for a direct "user-to-user" communication, in both a
Netnews- and Mail-like fashion. Perhaps, such an approach would
be more in line with the "Gen X" habits?
 

Ivan Shmakov

[Still wondering why news:comp.lang.perl.misc wasn't dropped.]
Prepared in the sense of having a flag with initial value false that,
if set, will permit non-ASCII.

... to permit UTF-8 (or anything that could be interpreted as
that) in the header, not just arbitrary binary data. (Binary
data will be allowed for the body, subject to the relevant
restrictions, and provided that Content-Transfer-Encoding: is
present and has "8bit" as its value.)
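
The header policy described here might look roughly like the following. The function name header_acceptable and its $allow_utf8 flag are hypothetical; the sketch assumes headers arrive as raw octet strings:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# header_acceptable and its $allow_utf8 flag are hypothetical names.
sub header_acceptable {
    my ($octets, $allow_utf8) = @_;
    return 1 if $octets !~ /[^\x00-\x7F]/;    # pure ASCII is always fine
    return 0 unless $allow_utf8;              # the flag defaults to false
    my $copy = $octets;                       # decode may modify its argument
    # Strict UTF-8 decoding croaks on malformed input.
    return eval { decode("UTF-8", $copy, FB_CROAK); 1 } ? 1 : 0;
}

print header_acceptable("al \xC4\x89iu abonantoj", 1) ? "accept\n" : "reject\n";
print header_acceptable("al \xE6iu abonantoj", 1)     ? "accept\n" : "reject\n";
```

The second sample is rejected even with the flag set, because \xE6 followed by an ASCII letter is not a well-formed UTF-8 sequence.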
I don't agree; I like being able to work offline

After the data is downloaded from a P2P network (it may be
BitTorrent, GNUnet, Freenet, or whatever else), it could be used
off-line just perfectly.
and I like not having to use bloated web interfaces.

There're P2P agents with almost whatever interface: CLI,
full-screen text, graphical, Web, XML-RPC, etc.

(Not to mention that the contemporary Web is anything but a P2P
network.)
Further, there are security issues.

Namely? Freenet, GNUnet and Tor networks seem to have an
explicit focus on security, while BitTorrent's metadata (both
.torrent and Metalink) may be protected by digital signatures
(OpenPGP will work for either; Metalink should support XMLDSig,
too.)
 

Ivan Shmakov

[Dropping from Followup-To: as a matter
of habit.]
Do those have the routing capabilities that mail and news require?

The current Netnews architecture relies, in essence, on a
simplistic flood-fill routing. That is, for each group, we can
imagine a network of nodes, each of which, upon receipt of a new
message, relays it to all of its peers. The only two things
that complicate this scheme are Path: and IHAVE, which both
reduce the load by not sending the message the peer already has.
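
The flood-fill scheme can be illustrated with a toy simulation. The four-node topology, the node names and the message identifier below are made up, and real transfers are of course asynchronous rather than recursive:

```perl
use strict;
use warnings;

# A toy four-node network; node names and topology are made up.
my %peers = (
    a => [qw(b c)],
    b => [qw(a c d)],
    c => [qw(a b d)],
    d => [qw(b c)],
);
my %seen;             # node => { message-id => count }
my $transfers = 0;

sub relay {
    my ($node, $id, @path) = @_;
    return if $seen{$node}{$id}++;        # already have this article
    push @path, $node;                    # append ourselves to Path:
    my %in_path = map { $_ => 1 } @path;
    for my $peer (@{ $peers{$node} }) {
        next if $in_path{$peer};          # peer is named in Path:, skip it
        next if $seen{$peer}{$id};        # IHAVE-style check: peer has it
        $transfers++;
        relay($peer, $id, @path);
    }
}

relay("a", "<msg1\@example.invalid>");
printf "%d nodes have the article after %d transfers\n",
    scalar(keys %seen), $transfers;
```

Without the two "next" checks every node would offer the article back along every link, which is the load that Path: and IHAVE are there to avoid.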

In this scheme, and taking INN as an example, newsfeeds(5), and
its reciprocal incoming.conf(5), serve two purposes: they remedy
the fact that Netnews currently lacks autodiscovery (contrary to
the P2P networks mentioned above), and they code (in a crude,
but working, fashion) the "trust" relationship between the
peers.

However, dealing with "trust" at the link level (and not at the
user level) can by itself lead to certain security implications.

With a sound use of digital signatures (and implementing the
relevant WoT, or re-using the OpenPGP one), we can lay the
control over what's trustworthy and what's not straight to the
hands of the user.

The mail routing is a trickier one. However, considering that
virtually the only reason behind such a routing nowadays is the
belief in the security of firewalled intranets, we may simplify
the whole task of routing to a three-hop (user, relay, user)
scheme, detailed below.

Let's first assume that "autodiscovery" is in place. Now, Alice
chooses a "relay" (there may be both free of charge and paid
ones), and her agent puts (into the distributed hash table, or
DHT) a (digitally-signed) "pointer" record that all her mail
should be delivered to that relay. When Bob wants to send mail
to Alice, his agent searches the DHT, finds the "pointer"
record, and sends a copy of the letter to the relay thus found.
(Naturally, Alice's agent checks the relay for any new messages
periodically when online.)
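
The three-hop flow above can be sketched with plain hashes standing in for the DHT and the relay's storage. All names here (publish_pointer, send_mail, fetch_mail, "relay-1") are hypothetical, and signatures and encryption are left out entirely:

```perl
use strict;
use warnings;

# Plain hashes stand in for the DHT and for the relays' mailboxes;
# signatures and encryption are left out of this sketch entirely.
my %dht;      # user => relay name (the "pointer" record)
my %relay;    # relay name => { user => [ messages ] }

sub publish_pointer {
    my ($user, $relay_name) = @_;
    $dht{$user} = $relay_name;
}

sub send_mail {
    my ($from, $to, $body) = @_;
    my $relay_name = $dht{$to}
        or die "no pointer record for $to\n";
    push @{ $relay{$relay_name}{$to} }, { from => $from, body => $body };
}

sub fetch_mail {
    my ($user) = @_;
    my $relay_name = $dht{$user} or return;
    my $box = delete $relay{$relay_name}{$user};
    return $box ? @$box : ();
}

publish_pointer("alice", "relay-1");
send_mail("bob", "alice", "Hello, Alice");
print "$_->{from}: $_->{body}\n" for fetch_mail("alice");
```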

It makes sense for Bob's letter to mention a few of his
previous messages (sent to Alice) in the header. This way, it
would be much harder for a (malicious) relay to hide the
information of any such message. Also, it makes sense to
encrypt all the communication, so such a relay won't be able to
intercept or tamper with the messages, either.

This scheme could be modified slightly by requesting Bob to copy
the message he wishes to send to Alice to his own "relay" B,
while sending only a pointer to Alice's A. This way, Bob (and
not Alice) "pays" for the handling of his outgoing letters,
which may thwart certain abuse scenarios.

Note that should Mallory try to send numerous pointers to a
single message, it'd be possible to "mark" such a message as
"spam" just once per group of users of the "Netnews-like part"
of the P2P communication system being discussed.
Also, wouldn't you still have a separation between a user agent and a
transfer agent?

It /may/ be done, for various reasons, including the
compatibility with the MUA's currently in use.

Both GNUnet and Freenet (IIRC) implement an HTTP interface, so
that an ordinary Web browser can be used to connect to the
network. OTOH, BitTorrent agents (commonly called /clients/,
though it's a misnomer) are mostly self-contained.
E. g., audit trail of the route for mail.

And what exactly is it for? I don't see why one may need to care
/how/ the letter has reached its destination if it's a valid and
wanted one. And if it's not, imposing a policy on the relays is
only a partial solution.
 

Ivan Shmakov

There's more than trust involved in current news routing; there's
also the ability, e. g., for specific servers to carry only specific
groups,

First of all, newsgroups don't matter that much. They're just
tags, which are immutable once the message is posted, and, more
often than not, are incomplete or misleading. (And this thread
is a good example of that.) Were I to run my own NNTP "server",
I'd make it "track" a number of selected individuals, and all
the threads they've been participating in, whatever the
newsgroups.

Unfortunately, NNTP doesn't make this task all that easy.
and to announce what they carry.

AIUI, NNTP only provides for a way to know if a newsgroup is
created on the "server", not whether it's actually carried
(i. e., subscribed to on one or more of the peers) or not.
I don't see how.

Isn't it trivial to make a whitelist of public keys of all the
people one wants to receive mail from? It's much less an issue
than providing a way for a stranger (or one who has lost his or
her private key) to communicate, while still stopping abuse.
How do you authenticate a message from a stranger?

If the stranger's public key isn't reachable via my own WoT, and
if it wasn't used to sign any "public" messages I may find, then
I have no way to authenticate it.
With SMTP there's an unforgeable source IP address that I can filter
on.

Not quite, at least since the time various Webmails became
widespread. For privacy concerns, they're quite likely to
"hide" the IP of the HTTP client used to submit the message.

Also, the IP in question may be autoconfigured, or "leased" via
DHCPv6 or DHCP, or it may be an IPv4 address of the NAT'ting
router, just as well. Thus, only the network prefix could be
obtained (more or less) reliably from the message's headers.

Tor could be used to anonymize IP, too, but its default is to
block traffic to TCP port 25. (Which doesn't quite help if
there was a "transit Webmail" at TCP port 80 or 443, though.)
Not if there are no previous messages.
Obviously.
Exactly what I want to avoid.

HTTP is quite a decent file transfer protocol, and "speaking" it
isn't a problem. (No more, and arguably much less, than
speaking FTP, for instance.) My guess is that one can access
these networks with, say, GNU Wget or aria2, too.
The audit trail in e-mail is intended for diagnostic purposes,

Yes, but as I've said before, any complex routing doesn't make
much sense for e-mail anymore, and there isn't much "diagnostic"
that could be obtained in the "two MTA's" case other than that
already recorded in the logs of those MTA's.

Don't we already have some complex routing running on the
network level? Why repeat it a few levels higher?
but in practice it is used for spam filtering as well.

Essentially, that means that instead of relying on almost
unforgeable TCP/IP "peer" (address, port) pair, one decides to
rely on the Received: headers, as added by the "third party"
(= transit MTA's.)
There's no automated way to tell that a message is a validated and
wanted one, so you have to rely on heuristics. Deep filtering is one
useful heuristic.

We can design a system for /reliable/ (something that the
present e-mail doesn't quite offer) delivery between
pairwise-trusted peers. It's up to the user then to decide
whose messages he wants to be delivered to him or her.
Partial solutions are all we have.

Ultimately, yes.
In addition to the difficulty of developing a new infrastructure that
retains the functionality of the existing infrastructure, there's
also the problem of transition.

Yes, at least to some extent.

Though it was my understanding that opting for social networking
sites, GenX has pretty much forsaken e-mail. (My guess is that
they didn't notice that part of the functionality has vanished
meanwhile.)
 

Ivan Shmakov

7.6.6. LIST NEWSGROUPS
This keyword MUST be supported by servers advertising the READER
capability.
The newsgroups list is maintained by NNTP servers to contain the
name of each newsgroup that is available on the server and a short
description about the purpose of the group.

... And the purpose of this command is to deliver this
description to the newsreader software, and therefore to the
user.

There's no magic way that the Netnews "server" responding to
this command may know it's no longer fed with the relevant
articles by its peers.

[...]
The webmail server has to inject the messages to SMTP servers, and
they have access to its IP address. I can blocklist that IP address
if appropriate.

Say, the IP address of one (or more) of Google Mail MX'es?
The solution is to reject all traffic from that IP block unless they
are blocking outbound port 25.

The "privacy-enabled" IPv6 autoconfiguration makes the host
choose a new IPv6 address once in, like, 15 minutes. Likewise,
"dynamic" IP's (still widely used, as it seems) are likely to
change every few hours to few days.

And did I mention botnets, BTW?
No, that means that you don't accept any relayed traffic from an MTA
that you haven't already determined logs the IP addresses correctly.

And how do you determine that without accepting some traffic
first?
That's not good enough.

Perhaps.

It's worth trying, anyway.
 
