C linked lists in Perl

S

szr

Ted said:
s> [...]
I really don't know how much more "built-in" linked lists can be
than what you find in Lisp, considering that code and data both are
lists and treated as such by the language, and you can construct
either dynamically. At this point you're arguing that with Lisp
you have an assembled band saw (vs. Perl's Swiss Army knife or
Java's diesel-powered hammer-screwdriver) but you can't use it
until you plug it in. Well, yeah, but that is not an argument
worth having.

s> You forgot PHP's nuclear-powered 10,000 piece rachet set. :)

I tried to resist but you made me do it...

Emacs Lisp = the band saw everyone has lost a fingertip to
C = a club dotted with razor blades dipped in iodine "for better
lubrication" C++ = machine gun that shoots dandellions
Python = a Swedish Army knife
COBOL = enough rubber bands WILL shoot you to the moon, apparently
Fortran = a calculator embedded in a rock
HTML/CSS/Javascript = a flail that keeps flinging back at you
APL = a unicorn horn
Ruby = Lisp's band saw and Perl's army knife used to decorate a
princess birthday cake ("so cute!")

This is just priceless :)
 
S

szr

Joost said:
szr said:
Why would the GC get in the way? In a nutshell, it doesn't free
something until no one is referring to it anymore (the ref-count is
zero), so I'd imagine there would be no problem, or have you run into
issue in certain circumstances?

The problem is that perl's CG is buggy and relatively slow for large
linked lists (or any kind of deeply nested structure):

#!/usr/local/bin/perl -w
use strict;
$|=1;

sub test {
print "Building list\n";
my $head = [];
my $tail = $head;
for (0 .. 1000000) {
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?
 
P

Peter J. Holzer

Joost said:
#!/usr/local/bin/perl -w
use strict;
$|=1;

sub test {
print "Building list\n";
my $head = [];
my $tail = $head;
for (0 .. 1000000) {
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

It also crashes when the construct is replaced with one with defined
behaviour.

The crash occurs when $head goes out of scope, because the garbage
collector recursively frees the deeply nested data structure.

hp
 
J

Joost Diepenmaat

szr said:
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

how is that behaviour undefined?

Also note that it's not this construct that causes the crash. It's the
refcounting GC that causes a stack overflow when collecting the
structure (as someone else already pointed out).
 
X

xhoster

szr said:
Joost Diepenmaat wrote:
sub test {
print "Building list\n";
my $head = [];
my $tail = $head;
for (0 .. 1000000) {
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

Is that behavior really undefined?

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
J

Joost Diepenmaat

Peter J. Holzer said:
Define "buggy".

See below.
That program doesn't demonstrate that it is "relatively slow". You would
have to compare it to something else relative to which it is slow.

True. I don't have a good comparison ready at the moment, and I think
I confused the refcounting GC with the "general GC" at END time (which
*is* a lot slower than the refcounting GC). Please ignore it.
#!/usr/local/bin/perl -w
use strict;
$|=1;

sub test {
print "Building list\n";
my $head = [];
my $tail = $head;
for (0 .. 1000000) {
$tail = ($tail->[1] = [$_]);
}
print "Exiting sub\n";
}

test();
print "Exiting program\n";

ouput:

Building list
Exiting sub
Segmentation fault

That seems to be related to the stack size (if I reduce the stack size
the crash occurs with smaller lists). So without looking at the code I'd
guess that the GC calls itself recursively on all referenced objects
after freeing an object. Since on most OSs the stack is relatively small
(e.g., 8 MB on Linux/x86), you run out of stack space after a few 10000
nesting levels. Should be relatively simple to fix, but who in his right
mind builds datastructures with 100000 nesting levels?

I think you're correct that the problem is probably related to the
stack use during GC, but IMHO it's hardly good behaviour that GC'ing a
large linked structure causes a segfault. Of course it's possible to
break up the list yourself to prevent this, but IMHO the programmer
shouldn't have to.
 
X

xhoster

Peter J. Holzer said:
That seems to be related to the stack size (if I reduce the stack size
the crash occurs with smaller lists). So without looking at the code I'd
guess that the GC calls itself recursively on all referenced objects
after freeing an object. Since on most OSs the stack is relatively small
(e.g., 8 MB on Linux/x86), you run out of stack space after a few 10000
nesting levels. Should be relatively simple to fix, but who in his right
mind builds datastructures with 100000 nesting levels?

Someone trying to implement a large linked list, for one, or course.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
P

Peter J. Holzer

Peter J. Holzer said:
Define "buggy".

See below. [...]
I think you're correct that the problem is probably related to the
stack use during GC, but IMHO it's hardly good behaviour that GC'ing a
large linked structure causes a segfault.

True. I'm hesitating to call it a bug in perl, though: The GC algorithm
works, it just needs quite a bit of memory in an area where the OS
chooses to enforce a rather strict limit (for no good reason AFAICS,
except "it has always been that way, and deeply recursive programs are
probably buggy anyway"). When GC exceeds that limit, the process is
killed by the OS, and there is little perl can do about it. The OS may also
kill the process if it exceeds some other memory limit, although it
shouldn't (however perl doesn't have a way to recover from malloc
returning NULL, either, which one might call a bug).

There is an easy workaround for the limited stack space, and one could
call it a bug that perl isn't using it. But I think it would be quite a
bit slower (obviously I would have to implement and benchmark it to be
sure), and I prefer a faster GC to one which slows down all my scripts
but handles a very unusual case.

So I'd file the GC crash on deeply nested structures under "perl likes
to have lots of memory and it is very unhappy if it can't get it".

If someone wants to improve the GC, I think there are a few much more
important problems, like the lack of cycle detection.

hp
 
P

Peter J. Holzer

True. I don't have a good comparison ready at the moment, and I think
I confused the refcounting GC with the "general GC" at END time (which
*is* a lot slower than the refcounting GC). Please ignore it.

Do you know why the END time GC is so slow? I only notice it when my
script needs more VM than my machine has RAM, so I assumed that it was
just the threshing which occurs when VM is accessed in mostly random
order. I didn't think that the GC at that time is somehow different from
GC which occurs when a variable goes out of scope.

hp
 
P

Peter J. Holzer

Someone trying to implement a large linked list, for one, or course.

However, why would someone do this in Perl? Which problem would be
better solved by a singly linked list than an array in Perl? It just
seems to be unidiomatic.

hp
 
B

Ben Morrow

Quoth "Peter J. Holzer said:
shouldn't (however perl doesn't have a way to recover from malloc
returning NULL, either, which one might call a bug).

See $^M in perlvar. Of course, this is hardly easy to use; but then,
recovering from lack-of-memory is hardly easy under any circumstances.

Ben
 
B

Ben Morrow

Quoth "Peter J. Holzer said:
Do you know why the END time GC is so slow?

Basically because perl has to make sure it collects *everything*,
including things with circular references. Once-upon-a-time, perl used
to just _exit and let the OS clean up; since perl has been ported to
OSen which don't (Win32, I'm looking at you here) and since Perl now
supports DESTROY methods, it has to clean everything up itself. However,
although perl guarentees to call DESTROY methods when the process exits,
it *doesn't* guarentee to do so in a sane order, which is often a source
of bugs.

Ben
 
P

Peter J. Holzer

szr said:
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

how is that behaviour undefined?

perldoc perlop:

Note that just as in C, Perl doesn’t define when the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying a
variable twice in the same statement will lead to undefined behaviour.

AFAIK this is the only place in the perl doc which refers to "undefined
behaviour".

However, this is very unclear. Unlike C, which has a general rule which
defines when modifying an lvalue yields defined behaviour and when it
yields undefined behaviour, Perl doesn't. We don't even know if that
paragraph refers only to the ++ and -- operators or to all assignment
operators. Also in C, "undefined behaviour" really means "anything can
happen, including a crash, the compiler refusing to compile the program,
or a sudden affluence of nasal demons. The above paragraph (as well as
the next sentence "Perl will not guarantee what the result of the above
statements is.") suggest that only the order of evaluation is undefined.

By C rules,

$tail = ($tail->[1] = [$_]);

is undefined, because an lvalue which is modified between to sequence
points may be read only to determine its new value. In the above
statement, $tail is modified ($tail = ...) and also read ($tail->[1]),
but not to determine its new value, but to determine the location to
which [$_] will be assigned. (Although that's a bit of an edge case - I
think one could argue that $tail is needed to compute $tail->[1] in turn
is needed to determine the value of ($tail->[1] = [$_]) which is then
assigned to $tail. But then, because of autovivification, assigning to
$tail->[1] may modify $tail, so $tail would be modified twice between
two sequence points, which is also forbidden, so this would again be
undefined).

But I don't think C rules apply to Perl. The languages are too
different, and the Perl docs should be fixed to spell out the rules for
Perl.

Basically I see three possibilities:

1) Eliminate "undefined behaviour" completely: Work out how expressions
are evaluated by current perl, document it and specify it as the way
it should be.

2) Specify some rules for expression evaluation, but leave some details
explicitely unspecified. For example, one could specify that any
subexpression of an operator is completely evaluated before the
operator, not specify the order in which the subexpressions are
evaluated.

3) Stick with C's idea of sequence points and document where they occur
and what counts as "modifying an lvalue".

Personally, I'm in favour of 2, but right now I couldn't come up with a
consistent set of rules which explains why

perl -e '$i = 5; $x = ++ $i + $i ++; print "$x\n" '

prints 13.

hp
 
J

Joost Diepenmaat

Peter J. Holzer said:
However, why would someone do this in Perl? Which problem would be
better solved by a singly linked list than an array in Perl? It just
seems to be unidiomatic.

It's definitely unidiomatic, but linked lists (especially large linked
lists) are a conceptually simple way of dealing with certain problems
that large arrays have (like splicing). Having the GC take so much
memory more or less forces the programmer to either use an XS-based
linked list, or a mixed linked-list/array solution. And neither are as
easy to do correctly as a simple linked list.

IMHO having built-in linked lists would be a good thing. But I'm sure
there's a CPAN solution somewhere.
 
J

Joost Diepenmaat

Peter J. Holzer said:
szr said:
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

how is that behaviour undefined?

perldoc perlop:

Note that just as in C, Perl doesn’t define when the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying a
variable twice in the same statement will lead to undefined behaviour.

That specifically applies to constructs such as

$b = ++$a + ++$a;

*not* to the kind of construct I used above. Also note that I'm *not*
modifying any variable twice.

Also this may or may not be relevant (from the perlop chapter on assignment):

Unlike in C, the scalar assignment operator produces a valid
lvalue. Modifying an assignment is equivalent to doing the
assignment and then modifying the variable that was assigned
to.
 
S

sln

I guess like almost everybody, I like to discuss (argue) the merits of
different technologies. In my world, the big two are Java and
ColdFusion. Recently, we had someone with a background in embedded
systems who has been advocating C. The conversation goes something
like this:

him - Does Perl have linked lists?
me - No.
him - Therefore, C is better than Perl because it has linked lists.
me - But Perl has other data structures that are easier to use than
linked lists.
him - So what? Perl still doesn't have linked lists.

I've never studied linked lists and certainly never coded one (in C or
anything else) but it aroused my curiosity. So after searching on
c.l.p.m. and online, I decided to see if I couldn't do a linked list
in Perl. My first thought is to do something like this:

%linked_list{$key} = { prev => $linked_list{$prev_key}{prev},
value => $value,
next => $linked_list{$next_key}
{next} };

I know that most people will think I'm an absolute moron for thinking
about this (and they likely would be right), but CAN you do a linked
list in Perl (never mind that you would never want to), and if so, how
would you go about it?

My motive is to say to my C friend, "Nya, nya, nya."

Thanks, CC.

I'm kinda of against Linked lists in Perl, I don't see the need for it.
Haven't read the voluminous responces here yet but, if you really see a
need for it, I would suggest that you do it right.

I've dealt with some low level list code in my life. It boils down to
a few (about 7 or 8) tedious low level function's you will need to set up,
if you are serious.

I won't write them for you (although I could) but will steer you in the
right direction.

In addittion to what is below, which you will have to make into functions,
you will have to make functions for adding tail/head, removing, insertion,
and deletion (this is important).

I will leave these as an excersice. You should try to keep it "C" style
as much as possible, as below.

Note also that linked lists are extremely primitive. Navigation, knowing
where you are at all times is critical. By and large, doing this in Perl
incurs an insane overhead!

Good luck,
sln

## node.pl

use strict;
use warnings;

sub new_node
{
my $node = {
# Node Header
prev => undef,
next => undef,
# Node Data
data => {val => undef}
};
return $node;
}

# Create the list with a single node
my $ListHead = new_node();
my $ListTail = $ListHead;
my $curnode = $ListHead;
$curnode->{data}->{val} = 0;

# Add some nodes to the list
for (my $i=1; $i<20; $i++)
{
$curnode->{next} = new_node();
$curnode->{next}->{prev} = $curnode;
$curnode->{next}->{data}->{val} = $i;
$curnode = $curnode->{next};
$ListTail = $curnode;
}

# Traverse the list from the Head
print "Traverse list from Head -\n";
$curnode = $ListHead;
while (defined $curnode)
{
if (defined $curnode->{data}->{val})
{
print "data is: val = $curnode->{data}->{val}\n";
}
$curnode = $curnode->{next};
}

# Traverse the list from the Tail
print "Traverse list from Tail -\n";
$curnode = $ListTail;
while (defined $curnode)
{
if (defined $curnode->{data}->{val})
{
print "data is: val = $curnode->{data}->{val}\n";
}
$curnode = $curnode->{prev};
}

__END__

output:

Traverse list from Head -
data is: val = 0
data is: val = 1
data is: val = 2
data is: val = 3
data is: val = 4
data is: val = 5
data is: val = 6
data is: val = 7
data is: val = 8
data is: val = 9
data is: val = 10
data is: val = 11
data is: val = 12
data is: val = 13
data is: val = 14
data is: val = 15
data is: val = 16
data is: val = 17
data is: val = 18
data is: val = 19
Traverse list from Tail -
data is: val = 19
data is: val = 18
data is: val = 17
data is: val = 16
data is: val = 15
data is: val = 14
data is: val = 13
data is: val = 12
data is: val = 11
data is: val = 10
data is: val = 9
data is: val = 8
data is: val = 7
data is: val = 6
data is: val = 5
data is: val = 4
data is: val = 3
data is: val = 2
data is: val = 1
data is: val = 0
 
P

Peter J. Holzer

It's definitely unidiomatic, but linked lists (especially large linked
lists) are a conceptually simple way of dealing with certain problems
that large arrays have (like splicing).

Singly linked lists (and we are only talking about singly linked lists
here, other types of linked lists or mixed list/hash or list/tree
structures don't have the aforementioned GC problem (they have other
problems instead ;-)) allow fast insertion and deletion (on my machine
adding or removing a single element is faster for lists of more than
10000 elements than using splice), but at the cost of very slow or
limited search. There may be a few problems where this doesn't matter,
but in general linked lists are often used for stacks and queues, where
you don't have to search because you only ever access the ends. Both can
be efficiently implemented with Perl arrays (about 5 times faster than
an equivalent linked list for stacks/queues with 100_000 to 1_000_000
elements).

IMHO having built-in linked lists would be a good thing.

Can't say I ever missed them.

hp
 
P

Peter J. Holzer

Peter J. Holzer said:
$tail = ($tail->[1] = [$_]);
^^^^^^^^^^^^^^^^^^^^^

Is it any surprise that it segfaults when you're using a construct with
undefined behavior?

how is that behaviour undefined?

perldoc perlop:

Note that just as in C, Perl doesn’t define when the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying a
variable twice in the same statement will lead to undefined behaviour.

That specifically applies to constructs such as

$b = ++$a + ++$a;

As I wrote (as you would have noticed if you had bothered to read a few
lines more) it is *not* clear what this applies to.

*not* to the kind of construct I used above. Also note that I'm *not*
modifying any variable twice.

As I wrote (as you would have noticed if you had bothered to read a few
lines more) this is immaterial, since in C there are also restriction on
accessing a variable which is modified between the same sequence points,
so *if* Perl behaves "just as in C" (which as I wrote, is not clear).

Also this may or may not be relevant (from the perlop chapter on assignment):

Unlike in C, the scalar assignment operator produces a valid
lvalue. Modifying an assignment is equivalent to doing the
assignment and then modifying the variable that was assigned
to.


as you would have noticed if you had bothered to read a few
lines more.

hp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,825
Latest member
VernonQuy6

Latest Threads

Top