Why references??

  • Thread starter Thomas Deschepper

thumb_42

Thomas Deschepper said:
I've been reading Beginning Perl & Programming Perl from O'Reilly for some time
now and I'm getting used to references and how to grow them..

But why would someone use a reference if they can use a normal variable? Yeah, I
know, with references you can grow complex data structures, but in simple
programs, why would you use them (=references)?

Thanks for your (smart) answers :)

Aside from making it more efficient, references are handy ways of tracking
variables when processing stream data. Parsing an XML document and storing
the character data is an example:

Say you want something like:

$NAMES{E2620} = "John";

And an XML document with ah, <Employee id="E2620">John</Employee>

$garbage = "";
$char_ptr = \$garbage;

Using a reference you could do this:

# Start an element.
$start = sub {
    my($expat,$tag) = (shift,shift);
    my(%ats) = @_;
    # For a complex structure, the tagname should be table driven,
    # another use of references, since the table of expected tags
    # will change with each parent tag.
    if($tag eq 'Employee'){
        my($id) = $ats{id};          # Get employee ID from id="E2620".
        $NAMES{$id} = "";            # Clear, and make it a string.
        $char_ptr = \$NAMES{$id};    # We want our CDATA here now.
    }
};

# Deal with CDATA.
$char = sub {
    shift;                  # Get rid of expat.
    $$char_ptr .= shift;    # Store character data to whatever area.
};

$end = sub {
    # ...
    # Clear $garbage,
    # put $char_ptr back to $garbage, or parent,
    # or wherever you need to store the CDATA from here.
    # ...
};

Using a reference, you don't need much logic in your character
handler. In the above case, it would be simple enough to just write:

if($id) {
    $NAMES{$id} .= shift;
}

But try doing that when you need chardata from several XML elements stored
in several hash elements. (Probably most of us have tried the 'non-reference'
approach when learning XML::Parser, increasing the sale of aspirin. :) ) If
you don't use references, you end up tracking which ID you are on, calling
current_element(), and working out which hash element (of which hash
variable) should receive your chardata, and it gets really messy.

That same bit of 'state logic' helps when parsing other kinds of data, too.
(any time you need to store stuff some place dependent on some prior bit
of info you got from the stream)

So references take what could be a complicated script and turn it into a
simple one. I've simplified a LOT of stuff this way.
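
To make the "reference as current destination" idea concrete, here is a minimal runnable sketch. It does not use XML::Parser at all: the @events list is a hypothetical stand-in for the parser's start/char/end callbacks, and the %handler table is an assumption made for illustration. Only %NAMES, $garbage and $char_ptr come from the post above.

```perl
use strict;
use warnings;

# Hypothetical event stream standing in for parser callbacks.
my @events = (
    [ start => 'Employee', id => 'E2620' ],
    [ char  => 'Jo' ],
    [ char  => 'hn' ],
    [ end   => 'Employee' ],
    [ char  => "\n" ],                       # whitespace between elements
    [ start => 'Employee', id => 'E2621' ],
    [ char  => 'Mary' ],
    [ end   => 'Employee' ],
);

my %NAMES;
my $garbage  = '';
my $char_ptr = \$garbage;    # where character data currently goes

my %handler = (
    start => sub {
        my ($tag, %ats) = @_;
        if ($tag eq 'Employee') {
            $NAMES{ $ats{id} } = '';
            $char_ptr = \$NAMES{ $ats{id} };    # redirect character data
        }
    },
    char => sub { $$char_ptr .= shift },        # no state logic needed here
    end  => sub {
        $garbage  = '';
        $char_ptr = \$garbage;                  # back to the scratch area
    },
);

for my $event (@events) {
    my ($type, @args) = @$event;
    $handler{$type}->(@args);
}

print "$_ => $NAMES{$_}\n" for sort keys %NAMES;
```

The character handler never inspects which element it is in; the start and end handlers decide where text lands by re-pointing $char_ptr.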

There are times when I don't like references; it's a style issue.
An example is a cursor type of implementation:

package MyCursor;
# init.. or whatever you need to do to create $self with a filehandle, etc..
sub fetch {
    my($self) = shift;
    # Do what is needed for the next "row","entry","thingy".
    return(%row);
}
1;

You could return \%row, and it'd probably be faster but then you'd have a
variable that was lexically scoped to MyCursor::fetch() and that just "feels
wrong" to me. (If %row were a large hash, it would be different.) It feels
better to have my(%row) = $cursor->fetch(); than my($row) = $cursor->fetch(); I
regard that as a personal choice. (You can always use wantarray() and
do either if you're a people pleaser. <g> )
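
The wantarray() idea could be sketched roughly as follows. The constructor and the in-memory row list are assumptions added so the example runs on its own; the post only describes fetch().

```perl
use strict;
use warnings;

package MyCursor;

# Hypothetical constructor: rows are held in memory for this sketch.
sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows] }, $class;
}

sub fetch {
    my ($self) = @_;
    my $row = shift @{ $self->{rows} } or return;
    # List context gets a flattened copy, scalar context gets the reference.
    return wantarray ? %$row : $row;
}

package main;

my $cursor = MyCursor->new(
    { id => 1, name => 'John' },
    { id => 2, name => 'Mary' },
);

my %row = $cursor->fetch;    # list context: a hash copy
my $row = $cursor->fetch;    # scalar context: a hash reference

print "$row{name}, $row->{name}\n";
```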

Jamie
 

Ben Morrow

There are times when I don't like references; it's a style issue.
An example is a cursor type of implementation:

package MyCursor;
# init.. or whatever you need to do to create $self with a filehandle, etc..
sub fetch {
    my($self) = shift;
    # Do what is needed for the next "row","entry","thingy".
    return(%row);
}
1;

You could return \%row, and it'd probably be faster but then you'd have a
variable that was lexically scoped to MyCursor::fetch() and that just "feels
wrong" to me.

You haven't made the distinction in your mind between 'variables' and
'values'. If we have

sub hash {
    my %h = (a => 'b');
    return \%h;
}

my $h = hash;

then the *variable* %h is lexically scoped to the sub. It never gets
out, as in there is nowhere else in the program where %h refers to the
same data. The value, i.e. the hash (the HV), is refcounted like all
Perl values. It starts with a refcount of 1, held by the %h name, then
taking a reference increases its refcount to 2, then when the name %h
goes out of scope it goes back to 1 again, this time held by the
reference in $h.
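
A small sketch of that behaviour, using the same hash() sub; the names $first and $second are made up for the illustration. Each call creates a fresh hash, and each hash stays alive for as long as something holds a reference to it:

```perl
use strict;
use warnings;

sub hash {
    my %h = (a => 'b');
    return \%h;
}

my $first  = hash();
my $second = hash();

# The name %h is gone, but both hashes live on through their references.
$first->{a} = 'changed';

print "first:  $first->{a}\n";     # changed
print "second: $second->{a}\n";    # b
print "two distinct hashes\n" if $first != $second;
```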

The distinction is an important dichotomy in Perl (indeed, in all
compiled languages): variables are lexical, values dynamic; variables
are created at compile-time, values at run-time. my acts on variables,
local on values; this is the source of the differences between them.
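
The my/local difference can be seen in a short sketch like this one. The names $level, show, with_local and with_my are hypothetical:

```perl
use strict;
use warnings;

our $level = 'global';

sub show { return $level }    # always reads the package variable

sub with_local {
    local $level = 'localized';    # temporarily replaces the value of $main::level
    return show();
}

sub with_my {
    my $level = 'lexical';         # a new variable that show() cannot see
    return show();
}

my @seen = (with_local(), with_my(), show());
print "@seen\n";    # localized global global
```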

Ben
 

Brian McCauley

Ala Qumsieh said:
\%{$kingdom{animalia}{chordata}{mammalia}{artiodactylae}{camelidae}{camelus}{bactrianus}};

Why are you de-referencing your reference, and then referencing it again?

In order to autovivify and/or as an assertion that I'm expecting a hash.

You don't need to do that.

Except when you do. Doing so when you don't need to is rarely a
problem. Not doing so when you do need to can produce many an obscure
bug.

$bactrianus_ref =
$kingdom{animalia}{chordata}{mammalia}{artiodactylae}{camelidae}{camelus}{bactrianus};

That will set $bactrianus_ref to undef if there's no such bactrianus
in the camelus hash. Sometimes that _is_ what you want. Often it is not.

Then you can quickly [...] set attributes:
$bactrianus_ref->{hump_count} = 2;

If $bactrianus_ref was undef, this will create a new anonymous hash and
not attach it to the camelus hash.
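
The difference between the two forms can be checked with a sketch like the one below. The key path is shortened and the dromedarius entry is invented purely for the comparison:

```perl
use strict;
use warnings;

my %kingdom;

# Plain copy: there is no {bactrianus} entry yet, so $plain is undef.
my $plain = $kingdom{camelidae}{camelus}{bactrianus};
$plain->{hump_count} = 2;    # autovivifies a new hash held by $plain only

my $detached = !exists $kingdom{camelidae}{camelus}{bactrianus};

# Dereference, then take a reference: the entry is created in place.
my $attached = \%{ $kingdom{camelidae}{camelus}{dromedarius} };
$attached->{hump_count} = 1;

print "bactrianus missing from the tree\n" if $detached;
print "dromedarius humps: $kingdom{camelidae}{camelus}{dromedarius}{hump_count}\n";
```

Note that even the plain copy autovivifies the intermediate camelidae and camelus levels; only the last element is left alone.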

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 

Brian McCauley

Ben Morrow said:
You haven't made the distinction in your mind between 'variables' and
'values'.

And the distinction you have made, while perfectly valid in isolation,
is not the same as the conventional Perl nomenclature used in the
standard docs and by most people here.

sub hash {
    my %h = (a => 'b');
    return \%h;
}

my $h = hash;

then the *variable* %h is lexically scoped to the sub. It never gets
out, as in there is nowhere else in the program where %h refers to the
same data. The value, i.e. the hash (the HV), is refcounted like all
Perl values.

The thing you call "value" is more conventionally called "variable" in Perl.

The thing you call "variable" is more conventionally called "variable
name" or "lexical binding" or "pad entry" (or in the case of package
variables "symbol table (stash) entry").

The term "lexical variable" or "package variable" can be used to refer
to the composite concept of the variable and its pad/stash entry.

Many people use "global variable" as a synonym for "package variable"
but I disapprove of this because it leads people to draw false inferences.

The distinction is an important dichotomy in Perl (indeed, in all
compiled languages): variables are lexical, values dynamic; variables
are created at compile-time, values at run-time. my acts on variables,
local on values; this is the source of the differences between them.

It is important, as you say, to understand the distinction between
these entities. It is also a good idea to use (in public) the same
nomenclature as other people (even if your nomenclature seems more
logical).

Oh, and by the way, pad entries are lexical, stash entries are not
lexical.


Ben Morrow

Brian McCauley said:
The thing you call "value" is more conventionally called "variable"
in Perl.

OK, if you say so... my reason for thinking otherwise is that I'm sure
I've read either Tad or MJD, I forget which, writing that my scopes a
variable whereas local scopes a variable's value.

It is important, as you say, to understand the distinction between
these entities. It is also a good idea to use (in public) the same
nomenclature as other people (even if your nomenclature seems more
logical).

Of course. I was not intending to rock any boats.

Oh, and by the way, pad entries are lexical, stash entries are not
lexical.

The FQ name (whatever you choose to call it) is lexically global:
visible from everywhere. The short name is, of course, package scoped;
package statements are lexically scoped, so the visibility of such
names is also lexically scoped. The point is that it is possible to
work out which variable a given name refers to solely from the lexical
structure of the program: there are no run-time action-at-a-distance
effects.
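
A rough sketch of the pad/stash distinction being discussed. The package Camel and its variables are hypothetical names chosen for the example:

```perl
use strict;
use warnings;

package Camel;
our $humps  = 2;             # package variable: stash entry, $Camel::humps
my  $secret = 'pad only';    # lexical variable: pad entry, no stash entry

package main;

my $via_fq_name  = $Camel::humps;    # fully qualified name works from any package
my $still_seen   = $secret;          # lexical scope is the file, not the package
my $in_stash     = exists $Camel::{humps};
my $secret_stash = exists $Camel::{secret};

printf "humps=%d secret=%s\n", $via_fq_name, $still_seen;
printf "stash has humps: %s, stash has secret: %s\n",
    ($in_stash ? 'yes' : 'no'), ($secret_stash ? 'yes' : 'no');
```

The lexical $secret stays visible after the package statement because its scope is the enclosing file, while only $humps can be reached through the Camel symbol table.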

Ben
 

thumb_42

Ben Morrow said:
OK, if you say so... my reason for thinking otherwise is that I'm sure
I've read either Tad or MJD, I forget which, writing that my scopes a
variable whereas local scopes a variable's value.

This kind of points out a case where references might be a bad idea:
the "whose variable/data/thing is it, and *what* is it for that matter?"
question can get confusing, while:

sub fetch {
    return(%hash);
}

is pretty clear and straightforward since everything is passed "by-value".

sub fetch {
    return(\%hash);
}

is faster, and a lot of folks prefer it, but it forces you to think about how
$hash_ref can get modified (more specifically, who/what/where can modify
the values referenced by $hash_ref). The effect is the same: you
have to be careful about modifying that data after you've returned the
reference, unless that is what you want to do on purpose for some reason
(e.g. the returned reference somehow tracks the state of the package/scope it
was returned from, or the caller is expected to append text to it or fill
it with values).

I suppose when you consider anonymous references, the idea of what a
variable is and how it's scoped gets interesting. :)

I don't think it's a particularly big deal in either case, but the original
question was "Why use a reference when you can just use a value".

I'd just use the value if the hash were something small, a record from a
tabfile or something. It eliminates the possibility of shooting myself in
the foot, there is no way the contents can unexpectedly change later on.

IMO, there is really nothing wrong with returning a reference either if that's
what you want to do. Heck, you could use wantarray() and return either
depending on the caller's 'mood'. :)

Jamie
 

Ben Morrow

This kind of points out a case where references might be a bad idea:
the "whose variable/data/thing is it, and *what* is it for that matter?"
question can get confusing, while:

sub fetch {
    return(%hash);
}

is pretty clear and straightforward since everything is passed "by-value".

sub fetch {
    return(\%hash);
}

is faster, and a lot of folks prefer it, but it forces you to think about how
$hash_ref can get modified (more specifically, who/what/where can modify
the values referenced by $hash_ref). The effect is the same: you
have to be careful about modifying that data after you've returned the
reference, unless that is what you want to do on purpose for some reason
(e.g. the returned reference somehow tracks the state of the package/scope it
was returned from, or the caller is expected to append text to it or fill
it with values).

You are conflating two questions here: 'do I want to return the
original data or a copy' and 'do I want to return the data as a list
or as a ref'. It is hard to return the original data as a list; but if
you want to return a ref you have a choice between

sub fetch { return \%hash }

and

sub fetch { return {%hash} }

Obviously there is no speed benefit to the second, but that *does*
*not* *concern* me: premature optimization is the root of all evil. If
the function is supposed to return a copy of the data that the caller
can do with what he will, then it should return a copy. If it is
supposed to return the actual data, such that modifications to the
return value modify the original data, then that is what it should
do. In either case, the decision *must not* be made on the basis of
unjustified assumptions about efficiency: there's time enough to deal
with that later, once you've got the logic right.
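
The two choices behave differently as soon as the caller modifies what it got back. A minimal sketch, with fetch_original and fetch_copy as hypothetical names for the two variants of fetch:

```perl
use strict;
use warnings;

my %hash = (name => 'John');

sub fetch_original { return \%hash }      # caller shares the original data
sub fetch_copy     { return { %hash } }   # caller gets a shallow copy

my $copy = fetch_copy();
$copy->{name} = 'Mary';
my $after_copy_edit = $hash{name};        # still 'John'

my $orig = fetch_original();
$orig->{name} = 'Mary';
my $after_orig_edit = $hash{name};        # now 'Mary'

print "after editing the copy:     $after_copy_edit\n";
print "after editing the original: $after_orig_edit\n";
```

The copy is shallow: nested references inside %hash would still be shared between the original and the copy.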

Ben
 
