Any way to tell if a scalar is a string?

jl_post

Hi,

I'm wondering if there's a way in Perl to tell if a scalar was set
as a string. For example, I'd like to write code like this:

my $var1 = "7777";
my $var2 = 7777;

foreach ($var1, $var2)
{
if ( SOME_TEST )
{
# Value is a string, so print it with quotes:
print "Value = \"$_\"\n";
}
else
{
# Print it without quotes:
print "Value = $_\n";
}
}

The reason I want to do this is because I write a lot of Perl code
that uses the unpack() function, and then reports the data it
extracts. For example, I might write:

$string = 'A 65';
@data = unpack('c x a2', $string);

in which case @data should be filled with two elements: the first
being the integer 65, and the second the string "65". But since Perl
seamlessly converts strings to the numeric values they contain, it can
be hard for me to know whether a value that looks numeric was
originally a string or a number.

I noticed that if I use the perl debugger (with "perl -wde 1"),
typing "x @data" does not show which element is a string. On the
other hand, "use Data::Dumper; print Dumper @data;" DOES show the
difference -- and it shows it by surrounding the second element with
single quotes. (Evidently Perl does keep track of whether a numeric
scalar was set as a string or not.)

So my question is: How can I know if a scalar (that looks like a
number) was originally a string? (That is, assigned/encoded to a
string?)

I've read "perldoc Scalar::Util", but the only functions that look
related are looks_like_number() (which is not what I want because
regardless of whether it looks like a number I want to know if it was
encoded as a string) and isvstring() (which returns true if the value
was coded as a "vstring" (but not as a string, which is what I want)).

Thanks in advance for any help.

-- Jean-Luc
 
Dr.Ruud

I'm wondering if there's a way in Perl to tell if a scalar was set
as a string.

Not dependably, unless nothing happened to it after.

$v = "123";

Here, $v only has a string face.

1 if $v > 0;

Now, $v also has numeric faces.

$v = 123;

Now, $v only has an integer face.

$v eq "123";

And now, it also has a string face.


So my question is: How can I know if a scalar (that looks like a
number) was originally a string?

Not possible. Unless it was never touched. Or was initialized with some
whitespace before and/or after. :)

perl -MDevel::Peek -wle'

my $v = " 123 ";
Dump($v);

1 if $v > 0;
Dump($v);

1 if $v/1;
Dump($v)
'

SV = PV(0x803048) at 0x800228
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x201390 " 123 "\0
CUR = 5
LEN = 8

SV = PVIV(0x802060) at 0x800228
REFCNT = 1
FLAGS = (PADBUSY,PADMY,IOK,POK,pIOK,pPOK)
IV = 123
PV = 0x201390 " 123 "\0
CUR = 5
LEN = 8

SV = PVNV(0x804480) at 0x800228
REFCNT = 1
FLAGS = (PADBUSY,PADMY,IOK,NOK,POK,pIOK,pNOK,pPOK)
IV = 123
NV = 123
PV = 0x201390 " 123 "\0
CUR = 5
LEN = 8
 
C.DeRykus


Maybe just use Data::Dumper...


use Data::Dumper;

my @var = ("7777", 7777, 'foobar', 123_456);

for (@var) {
    # Dumper puts single quotes around values it sees as strings
    my $d = Data::Dumper->new([$_]);
    print "var is a ", ( $d->Dump =~ /'/ ? "string" : "number" ), "\n";
}

-> var is a string
var is a number
var is a string
var is a number
 
sln

But since Perl seamlessly converts strings to the numeric values they
contain, it can be hard for me to know whether a value that looks
numeric was originally a string or a number.

Seems the truth table of Perl is:

65 == '65' true
65 eq '65' true
65 eq 'A' false
65 == 'A' error, 'A' is not a digit

The bias is in the usage, in which conditionals are
hard to get around.

I guess if you could find out what something is without a
usage context, but how's that possible?

This :
@data = unpack('c x a2', $string);
is a known data format, isn't it?
When do you not know what something is?

-sln
 
jl_post

Maybe just use Data::Dumper...

my @var = ("7777", 7777,  'foobar', 123_456) ;

for (@var)  {
   my $d = Data::Dumper->new([$_]);
   print "var is a ", ( $d->Dump =~ /'/ ? "string" : "number");
}


Yeah, I was thinking of checking Data::Dumper's output for "'" as
well. However, to do that we really should be checking to see if our
scalar is not a reference. So we could write a function like this:

#!/usr/bin/perl

# Returns a true value if set as a string;
# returns false if not:
sub isString
{
return 0 if ref($_[0]);
use Data::Dumper;
return 1 if Dumper($_[0]) =~ m/'/;
return 0;
}

foreach ("7777", 7777, 'foobar', 123_456)
{
print((isString($_) ? "Is " : "Is not "),
"a string.\n");
}

__END__

outputs:

Is a string.
Is not a string.
Is a string.
Is not a string.

So using Data::Dumper works, as you say. But I think this might be
overkill for what I'm trying to do. I just want to know if there's a
simpler way that I'm overlooking.

Thanks for your reply, though.

-- Jean-Luc
 
jl_post

This :
@data = unpack('c x a2', $string);
is a known data format, isn't it?

When do you not know what something is?


That's a good question. The example I gave is overly simplistic;
in reality the binary strings/files I'm parsing through can have
literally hundreds (if not thousands) of entries, and keeping track of
whether they are derived from strings or numeric values can get a bit
hairy.

Not only that, but if I use a format string like:

'I/i I/(a10) I/i'

the number of elements I extract is not fixed. Therefore, it's
difficult to know when the first set of numerical values ends and the
set of not-necessarily-numerical values begins (as well as ends).

I could work around this by parsing the string without using the
unpack() function, but unpack() is so useful I'd rather not work
around it.

The Devel::Peek module looks promising, but its Dump() function seems
to output by printing to STDOUT. So if I wanted to use that approach,
I'd have to capture the text sent to STDOUT and parse through that in
order to examine what I have. Unless, of course, there's a
Devel::Peek function I don't know about yet that returns me the
information I need in a nice structure.

-- Jean-Luc
 
sreservoir

The Devel::Peek module looks promising, but its Dump() function seems
to output by printing to STDOUT. So if I wanted to use that approach,
I'd have to capture the text sent to STDOUT and parse through that in
order to examine what I have. Unless, of course, there's a
Devel::Peek function I don't know about yet that returns me the
information I need in a nice structure.

Devel::Peek prints to STDERR, not STDOUT. You can work around that by
open()ing the filehandle to a variable: perhaps local() on *STDERR,
then call Dump() and process the output.
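
A rough sketch of capturing Dump() output follows. It deviates from the suggestion above in one respect: it points STDERR at a temporary file rather than an in-memory variable, on the assumption that Dump() writes at the file-descriptor level, where an in-memory handle may not see it. The helper name dump_to_string is made up for illustration; File::Temp and Devel::Peek are the only modules used.

```perl
use strict;
use warnings;
use Devel::Peek;
use File::Temp ();

# Hypothetical helper: run Dump() with STDERR pointed at a temp file,
# then return whatever was written there as one string.
sub dump_to_string {
    my ($ref) = @_;                  # reference to the scalar to inspect

    my $tmp = File::Temp->new;       # removed automatically when $tmp goes away
    open my $saved, '>&', \*STDERR          or die "Cannot dup STDERR: $!";
    open STDERR, '>', $tmp->filename        or die "Cannot redirect STDERR: $!";

    Dump($$ref);

    # Put the original STDERR back before reading the file.
    open STDERR, '>&', $saved               or die "Cannot restore STDERR: $!";

    open my $in, '<', $tmp->filename        or die "Cannot read dump: $!";
    local $/;                        # slurp mode
    return scalar <$in>;
}

my $v    = "123";
my $text = dump_to_string(\$v);
print "string flag (POK) is set\n" if $text =~ /\bPOK\b/;
```

Parsing the FLAGS line with a regex is fragile; treat this as a debugging aid only.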
 
sreservoir

Tad said:
None of the operations performed below is performed as written below.



A numeric operator, so the string is converted to a number:

65 == 65 true



A string operator, so the number is converted to a string:

'65' eq '65' true



'65' eq 'A' false



65 == 0 false

It is not an error. 'A' is converted to zero (and a *warning* is generated).


use warnings qw(FATAL all);

makes it an error. it's generally good practice to treat warnings as
errors anyway.
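
To make the warning-versus-error distinction concrete, here is a small sketch. It uses variables instead of literals so the comparison happens at run time, collects the warning through a $SIG{__WARN__} handler, and then repeats the comparison with only the 'numeric' category made fatal (a narrower choice than FATAL all, picked here just for the demonstration).

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

my ($num, $str) = (65, 'A');

# 'A' is converted to 0 for the numeric comparison, and a warning is raised.
my $equal = ($num == $str) ? 'true' : 'false';
print "65 == 'A' is $equal\n";
print "warning: $_" for @warnings;

# With the warning made fatal, the same comparison dies instead.
my $died = !eval {
    use warnings FATAL => 'numeric';
    my $ignored = ($num == $str);
    1;
};
print "died under FATAL warnings\n" if $died;
```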
 
jl_post

Ben Morrow said:
B::svref_2object, though you will need a
good understanding of perl guts
to make use of the returned objects.


Excellent!

Thanks to you, I wrote this code which (I think) does what I want:

#!/usr/bin/perl

use strict;
use warnings;

foreach ("7777", 7777, 'foobar', 123_456)
{
use B;
if (ref(B::svref_2object(\$_)) eq 'B::PV') {
print "Is a string.\n";
} else {
print "Is not a string.\n";
}
}

__END__

which outputs:

Is a string.
Is not a string.
Is a string.
Is not a string.

This does exactly what I want, provided that I don't use
mathematical operators on a variable before I check it. (That is, if
I do "if ($var > 0) { ... }" then the object returned is not
necessarily a B::PV anymore.)

Thanks again, Ben!

-- Jean-Luc
 
sln

B::svref_2object, though you will need a
good understanding of perl guts
to make use of the returned objects.
[snip]
if (ref(B::svref_2object(\$_)) eq 'B::PV') {
print "Is a string.\n";
} else {
print "Is not a string.\n";
}

Problem solved!

-sln
 
Uri Guttman

jpc> Yeah, I was thinking of checking Data::Dumper's output for "'" as
jpc> well. However, to do that we really should be checking to see if our
jpc> scalar is not a reference. So we could write a function like this:

jpc> sub isString
jpc> {
jpc> return 0 if ref($_[0]);
jpc> use Data::Dumper;
jpc> return 1 if Dumper($_[0]) =~ m/'/;
jpc> return 0;
jpc> }

see the other post as to why that will not work. a scalar value can be
both an integer and string at the same time. so dumper could report one
or the other based on how it looks at things.

jpc> overkill for what I'm trying to do. I just want to know if there's a
jpc> simpler way that I'm overlooking.

i smell an xy problem. if you are breaking up strings with unpack but
don't know which fields are number (in text) or strings, just use a
regex on them or to extract them. you need to show some examples of data
and code and what you expect to see. unpack is not how you check for an
integer or string.

uri
 
jl_post

a scalar value can be both an integer and string
at the same time. so dumper could report one
or the other based on how it looks at things.


That's very true. It's just that sometimes I extract text with
unpack() that is all numbers, but represents a name or identifier,
rather than a quantity (examples are "286", "386", and "486" Intel
architectures).

In such a case, if I extract it as a string, I want to report it as
text (even though I could technically report it as a numerical
value). So if Data::Dumper reports it as a string, chances are I also
want to report it as a string, as well.

True, Perl seamlessly converts strings that look like numbers to
numbers and vice-versa (which is very convenient in most cases), but
in this case I'd like to know that such a value was extracted from text.

-- Jean-Luc
 
Uri Guttman

jpc> That's very true. It's just that sometimes I extract text with
jpc> unpack() that is all numbers, but represents a name or identifier,
jpc> rather than a quantity (examples are "286", "386", and "486" Intel
jpc> architectures).

jpc> In such a case, if I extract it as a string, I want to report it as
jpc> text (even though I could technically report it as a numerical
jpc> value). So if Data::Dumper reports it as a string, chances are I also
jpc> want to report it as a string, as well.

jpc> True, Perl seamlessly converts strings that look like numbers to
jpc> numbers and vice-versa (which is very convenient in most cases), but
jpc> in this case I'd like to know that such a value was extracted from text.

but all you have is text to start with. all of your values would be text
in that context. you need to apply some parsing or heuristics to know
what should stay as text or be converted to a number. but as i keep
saying, i think you are barking up the wrong tree. a proper parser would
do better for you than unpack and then you would always know the
context and what a value should be. unpack is used when you have a known
fixed format but you don't have that.

uri
 
jl_post

Uri Guttman said:
but all you have is text to start with.


Actually, no. I have a mix of text and numerical values to deal with,
as I'm dealing with packstrings similar to 'c x a2' and
'I/i I/(a10) I/i'. (If that weren't true, I wouldn't have my issue.)

With either packstring I'd get back a mix of numbers and strings. And
I'd like to know which elements were returned as strings, even if they
happen to look_like_numbers().

(But you're right in that if all I had to deal with is text, then
there'd be no need to keep track of what's text and what's not.)

-- Jean-Luc
 
Uri Guttman

jpc> Actually, no. I have a mix of text and numerical values to deal with,
jpc> as I'm dealing with packstrings similar to 'c x a2' and
jpc> 'I/i I/(a10) I/i'. (If that weren't true, I wouldn't have my issue.)

so your pack string knows the format, then you can use it to tell you
what is a number or a string. in either case you have the info and don't
need to determine the data type after the unpacking.

jpc> With either packstring I'd get back a mix of numbers and strings.
jpc> And I'd like to know which elements were returned as strings,
jpc> even if they happen to look_like_numbers().

but you know that. who is creating these pack format strings? if you
are, just break them on spaces (ignored in pack as you seem to know) and
you can easily parse those for number/text.

uri
 
jl_post

if (B::svref_2object(\$_)->POK) {

See, I told you you needed to know about perl's guts :).


Point taken! :)

Okay, so to make sure you and I are on the same page, I'd write my
sample script like this:

#!/usr/bin/perl

use strict;
use warnings;

foreach ("7777", 7777, 'foobar', 123_456)
{
use B;
if (B::svref_2object(\$_)->POK) {
print "String\n";
} else {
print "Not a string\n";
}
}

__END__

which outputs:

String
Not a string
String
Not a string

You should be aware that trying to determine if a
scalar is a string or a number (other than by using
looks_like_number) is a highly dubious proposition.
Perl just isn't set up to consistently record that
information.


Yes, I've already noticed. I discovered that if I insert the line:

$_ . '';

as the first line of the foreach loop in the above sample script, then
all the elements are reported as strings. On the other hand, if I
make the first line of the loop to be:

$_ + 0;

then the elements still retain their string/non-string property (even
the string "7777", which does not generate a warning when added to 0).

So what I'm taking from this is that Perl internally may (or may
not) change how a scalar is viewed simply by applying it in a
particular context. (In other words, modification is not required for
Perl to see a scalar "differently.")
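
One way to watch this happen without parsing Devel::Peek output is to read the flag bits directly through the B module. The sketch below is illustrative only: ok_flags is a made-up helper name, and which flags appear after a given operation can differ between Perl versions, so no particular output is promised here.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list which public "OK" flags are set on a scalar.
sub ok_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();   # has a valid integer value
    push @set, 'NOK' if $flags & B::SVf_NOK();   # has a valid floating-point value
    push @set, 'POK' if $flags & B::SVf_POK();   # has a valid string value
    return join ',', @set;
}

my $str = "7777";
my $num = 7777;
print "fresh string:     ", ok_flags(\$str), "\n";
print "fresh number:     ", ok_flags(\$num), "\n";

my $sum = $str + 0;      # numeric use of the string; $str itself is not assigned to
print "string after + 0: ", ok_flags(\$str), "\n";
```

The string is never modified, yet its flag list can grow after the addition, which is the "seen differently without modification" effect described above.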

If this is just for display, and if you treat the PV form
as canonical where it exists, you should be OK; but don't
get the idea this is something you should be doing in general.


This need of mine has only ever come up when I'm writing a utility
script to peek inside binary files (created by programs written in a
language other than Perl). Perl's unpack() function provides a
(relatively) simple way for me to see all the binary file's data at a
glance without having to stare at (and navigate my way through) pages
and pages of hex dump output.

In fact, I often use unpack() when I'm trying to track down a bug
concerning corrupted data. The data could be stored to file wrong, or
it could be being retrieved incorrectly. In either case, being able
to look at the file quickly and easily (to locate possible corruption)
is something that's invaluable for debugging purposes, and is exactly
what Perl and its unpack() function afford me.

I've had people tell me that you can't use Perl to look through
binary files created in C/C++ (as if a hex editor or another C/C++
program is required to read/edit the binary file). Those same people
usually react incredulously when I tell them I've already written a
Perl script to do so.

Anyway, thanks a million times over for all your help, Ben.

-- Jean-Luc
 
jl_post

so your pack string knows the format, then you can
use it to tell you what is a number or a string.
in either case you have the info and don't need
to determine the data type after the unpacking.


I have to disagree. I've worked with packstrings similar to:

'I/i I/(a10) I/i'

and, as I've mentioned in an earlier post, there's no way for me to
necessarily know where the first set (of integers) ends and the second
set (of text) begins and ends, especially if all the text strings in
the second set all happen to look like integers.

who is creating these pack format strings? if you are,
just break them on spaces (ignored in pack as you seem to
know) and you can easily parse those for number/text.


I create the packstrings myself (usually from reading the logic inside
of another language's raw code), but oftentimes I end up using dozens of
terms in my packstrings, many of which resemble something like this:
"(i dd a10)5". In such cases, parsing the packstring to figure out if
a specific element was extracted as a string can get very messy very
quickly.

So if Perl knows right off the bat if a scalar was returned as a
string, I'd like to take advantage of that. And it looks like Ben
Morrow's solution to use:

use B;
if (B::svref_2object(\$_)->POK)

accomplishes just that.
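
Put together with the original unpack() example, the check might be used like this. This is only a sketch: is_string is a made-up wrapper name, and it is only meaningful when called on values straight out of unpack(), before anything else has used them in string or numeric context.

```perl
use strict;
use warnings;
use B ();

# Hypothetical wrapper around the POK check shown above.
sub is_string {
    my ($ref) = @_;
    return B::svref_2object($ref)->POK ? 1 : 0;
}

my @data = unpack('c x a2', 'A 65');

# Classify each element first; only then interpolate it for display.
my @report = map { is_string(\$_) ? qq{"$_"} : "$_" } @data;
print "Value = $_\n" for @report;
```

The first element comes back from the 'c' field and prints bare; the second comes from 'a2' and prints in quotes.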

-- Jean-Luc
 
C.DeRykus

Quoth Sir Robert Burbridge said:
On 12/09/2009 07:52 PM, Ben Morrow wrote:
Out of curiosity, why "Don't ever check for class membership with
'ref'"?  Do you mean, "... unless you want to exclude subclassed
objects", or is there some more dire problem with it that I don't see....

Consider

    ref bless [], "HASH";
    ref bless [], "0";

for why one of reftype or blessed from Scalar::Util is better, depending
on what you meant.
...

Aren't 'ref' and 'blessed' equally confusing here...am I
missing something?

perl -wle "use Scalar::Util 'blessed';$o=bless [],'HASH';
print blessed $o;print ref $o"

HASH
HASH
 
Uri Guttman

Quoth Sir Robert Burbridge said:
On 12/09/2009 07:52 PM, Ben Morrow wrote:
Don't ever check for class membership with 'ref'. In fact, most uses of
'ref' except as a boolean are bugs. The correct way to check for class
membership is to call the ->isa method; if you aren't sure something is
an object, you can use Scalar::Util::blessed or wrap it in an eval {}.
Out of curiosity, why "Don't ever check for class membership with
'ref'"?  Do you mean, "... unless you want to exclude subclassed
objects", or is there some more dire problem with it that I don't see...

Consider

    ref bless [], "HASH";
    ref bless [], "0";

for why one of reftype or blessed from Scalar::Util is better, depending
on what you meant.
...

CD> Aren't 'ref' and 'blessed' equally confusing here...am I
CD> missing something?

CD> perl -wle "use Scalar::Util 'blessed';$o=bless [],'HASH';
CD> print blessed $o;print ref $o"

CD> HASH
CD> HASH

but you know what function you called and can interpret the results
accordingly. $o is blessed into 'HASH' but is really an ARRAY ref
underneath. nothing confusing there to me. actually reftype would be a
better call there as ref is returning the class, not the type.
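
For anyone following along, a short sketch comparing the three calls on the two examples quoted above. Only ref, and blessed and reftype from Scalar::Util, are used; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $obj = bless [], 'HASH';    # an array reference blessed into a class named HASH

print "ref:     ", ref($obj),     "\n";   # the class name
print "blessed: ", blessed($obj), "\n";   # the class name
print "reftype: ", reftype($obj), "\n";   # the underlying data type

my $zero = bless [], '0';      # a class name that is false in boolean context

print "ref() is false for class '0'\n"       unless ref($zero);
print "blessed() is defined for class '0'\n" if defined blessed($zero);
```

So ref used as a boolean misfires on the '0' class, and ref used as a type check misfires on the 'HASH' class; a defined test on blessed and a string comparison on reftype avoid both.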

uri
 
