Question on the length of a Scalar


sln

I just have a simple question.

When I call the length function on a scalar, is its length read directly
(i.e., perl already knows its length), or does it traverse the string
counting its characters until it hits a nul terminator?

As an example, which one of these would be a more efficient test?
I'm not saying these constructs hold any practicality; it's just to
test the nature of length.

my $str = 'Start';
my $cnt = 1;

# method 1: loop while the string is non-empty
while (length ($str) )
{
    $str .= (sprintf "more %d", $cnt);
    $str = '' if ( $cnt % 10000000 == 0);
    $cnt++;
}

# method 2: loop while the string is defined
while (defined $str )
{
    $str .= (sprintf "more %d", $cnt);
    $str = undef if ( $cnt % 10000000 == 0);
    $cnt++;
}

Thanks!
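Rather than timing ten million appends, the two tests can be compared directly. Below is a minimal sketch using the core Benchmark module. The 4,000,000-character test string and the one-CPU-second run time (`-1`) are arbitrary choices made for illustration. The relative rates it prints will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A long string, so a length() that had to scan the buffer would stand out.
my $str = "more" x 1_000_000;    # 4,000,000 characters

# Run each test for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    length  => sub { my $ok = length $str },
    defined => sub { my $ok = defined $str },
});
```

If `length` scanned the string, its rate would fall far behind `defined` at this size.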
 

xhoster

> I just have a simple question.
>
> When I call the length function on a scalar, is it read directly
> (ie: already know its length), or does it traverse the string
> counting its characters until it hits a nul terminator?

The first one. It can't just traverse for a nul, because in Perl a nul is
a legal character to be in the middle of a string.

Xho
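The embedded-nul point can be checked with a short sketch. The string `"abc\0def"` is a made-up example.

```perl
use strict;
use warnings;

# "\0" is an ordinary character inside a Perl string, so length() cannot be
# stopping at the first NUL the way C's strlen() does.
my $str = "abc\0def";
print length($str), "\n";    # prints 7, not 3
```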


Tim Greer

> I just have a simple question.
>
> When I call the length function on a scalar, is it read directly
> (ie: already know its length), or does it traverse the string
> counting its characters until it hits a nul terminator?

In Perl,

$string = "this and that and junk to ending here";

and

$string = "this\nthat\r\nother\lfand whatever else
here
and here
and
here";

will both have every character counted: length measures from the leading
"this" to the trailing "here", embedded newlines and carriage returns included.
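The same point can be shown with a shorter, hypothetical string that uses both `\n` and `\r\n` line endings:

```perl
use strict;
use warnings;

# Newlines and carriage returns count like any other character.
my $str = "this\nthat\r\nother";
print length($str), "\n";    # 4 + 1 + 4 + 2 + 5 = 16
```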
 

Jürgen Exner

Neither, nor.
First the string representation of that scalar is computed, then the
length of that string representation returned.

jue
 

Dr.Ruud

Jürgen Exner wrote:
> Neither, nor.
> First the string representation of that scalar is computed, then the
> length of that string representation returned.

s/(?<=computed)/ or selected/

(because the string representation may already be available)
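The "computed or selected" distinction can be illustrated with a sketch. The values below are chosen only for illustration. A scalar holding a number has its string form produced on demand, and perl may keep that string form afterwards. A scalar that already holds a string simply uses the string it has.

```perl
use strict;
use warnings;

my $n = 1e3;                 # holds a number; the string "1000" is produced on demand
print length($n), "\n";      # 4

my $x = 0.5 + 0.25;          # computed numeric value
print length($x), "\n";      # stringified as "0.75", so 4

my $s = "1e3";               # already a string: its existing text is measured
print length($s), "\n";      # 3
```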
 

Peter J. Holzer

> The first one.

That depends on the string. If it is a byte string, you are right. If it
is a scalar string, the string must be traversed to count the
characters. However, the result seems to be cached - it makes almost no
difference whether I call length on a (long) string once or 1000 times
(tested with perl 5.8.8 and 5.10.0).

hp
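A timing test along these lines can be sketched as follows. The string size is arbitrary. `Time::HiRes` is a core module. On newer perls the gap between the first and later calls may be much smaller than on 5.8.8/5.10.0, so treat any numbers it prints as indicative only.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Ten million ASCII characters plus one euro sign; the non-ASCII character
# makes perl store the whole string internally as UTF-8.
my $str = ("a" x 10_000_000) . "\x{20ac}";

for my $call (1 .. 3) {
    my $t0      = time();
    my $len     = length $str;
    my $elapsed = time() - $t0;
    printf "call %d: length = %d, %.6f s\n", $call, $len, $elapsed;
}
```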
 

Peter J. Holzer

> No. A scalar string in Perl has its length stored in the typeglob
> as part of the variable's metadata.

That's the length in bytes, not the length in characters. In the case of
a byte string that's identical, but for a character string it is not (a
character may be more than one byte).
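The byte/character distinction can be seen in a short sketch. The test string is made up for illustration. `utf8::encode` is a built-in function that converts a copy of the string to UTF-8 octets, so `length` on that copy gives the byte count.

```perl
use strict;
use warnings;

# Ten ASCII characters plus U+20AC EURO SIGN, which takes 3 bytes in UTF-8.
my $str = ("a" x 10) . "\x{20ac}";

my $chars = length $str;          # 11 characters

my $octets = $str;                # copy, then encode the copy to UTF-8 bytes
utf8::encode($octets);
my $bytes = length $octets;       # 13 bytes

print "characters: $chars, bytes: $bytes\n";
```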

> perl
> $string = "=" x 80 . "\000" . "_" x 80 . "\000";
> die if length $string != 80+1+80+1;
> print "The string, with embedded nulls, is ", length $string, " bytes.\n";
> $string .= "\x{1234}";
> print "Adding one UTF-8 character, the length is now ", length $string, ".\n";
> ^D
> The string, with embedded nulls, is 162 bytes.
> Adding one UTF-8 character, the length is now 163.

What is this example meant to demonstrate?


Sorry for the typo. That should have read "character string", not
"scalar string".

> Nope. Perl knows exactly how many characters are in a string at all
> times.


> There is no "seems to be" about it. The size of a scalar is stored as
> part of the typeglob where it can be accessed _without_ traversing the
> string.

Definitely not. I tested it, and the *first* time length is called on
any string, it takes linear time - which is imho a clear indication that
it does not know how many characters there are and needs to count them.
However, on subsequent calls, the result is returned immediately, so the
result of the previous call must be cached somewhere. I don't see it in
the output of Devel::Peek::Dump, so I guess it isn't stored in the SV.

hp
 

Peter J. Holzer

> Definitely not. I tested it, and the *first* time length is called on
> any string, it takes linear time - which is imho a clear indication that
> it does not know how many characters there are and needs to count them.
> However, on subsequent calls, the result is returned immediately, so the
> result of the previous call must be cached somewhere. I don't see it in
> the output of Devel::Peek::Dump, so I guess it isn't stored in the SV.

Correction: It is visible:

"a" x 10 . "€":

SV = PV(0x9c3a730) at 0x9c56290
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x9cb79e8 "aaaaaaaaaa\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}"]
  CUR = 13
  LEN = 16

Length in bytes (13) is correct. No length in characters.

After calling length, we have a "magic" field that contains the length in
characters:

SV = PVMG(0x9c857c8) at 0x9c56290
  REFCNT = 1
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x9cb79e8 "aaaaaaaaaa\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}"]
  CUR = 13
  LEN = 16
  MAGIC = 0x9c5b278
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 11

After adding another character, the magic field is still there but invalid:

SV = PVMG(0x88e57c8) at 0x88b6290
  REFCNT = 1
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x89179e8 "aaaaaaaaaa\342\202\254\342\202\254"\0 [UTF8 "aaaaaaaaaa\x{20ac}\x{20ac}"]
  CUR = 16
  LEN = 20
  MAGIC = 0x88bb278
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1

hp
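The three dumps above can be reproduced with a sketch like the one below. It uses the core Devel::Peek module, whose `Dump` writes to STDERR. Addresses, flags, and whether the utf8 magic is attached at all depend on the perl version, so the comments describe what the dumps above showed rather than guaranteed output.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);    # core module; Dump() writes to STDERR

my $str = ("a" x 10) . "\x{20ac}";
Dump($str);                  # CUR holds the byte count (13); no character count yet

my $len = length $str;       # 11 characters
Dump($str);                  # may now show PERL_MAGIC_utf8 with MG_LEN = 11

$str .= "\x{20ac}";
Dump($str);                  # the cached character count may now be marked stale

print length($str), "\n";    # 12
```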
 
