Immutable Datastructures with good Sharing

J

Jan Burse

Dear All,

Value objects are some well known immutable
datastructures, when an operation is applied,
typically simply a new object is created. So
for example doing:

3 = 1 + 2

With java.lang.Integer objects would involve
creating a new Integer with value left.intValue()
+ right.intValue().

But instead of call the constructor, one can also call
Integer.valueOf(), and there you have some sharing.
Namely the class Integer caches the value objects for
small values (at least the last time I looked into
the Oracle source, it was there...).

More elaborate sharing is seen in the String class
by the substring() operation. This operation does
reuse the original character array and only encapsulates
the offset and the length in a new object. I wouldnt
recomende that to get one character from a 1 MB
long string, since it will prevent the original
character array from GC. But in some situation this
is quite handy.

I am now looking for immutable datastructures with
a similar sharing. In the String class example, the
operation was substring() that produced a new immutable
object. I am looking for:

Some stack class, where: (Easy)
pop() creates a new immutable stack
push() creates a new immutable stack
With good sharing.

Some queue class, where: (Hard?)
enqueue() creates a new immutable queue
dequeue() creates a new immutable queue
With good sharing.

What is your favorite datastructure?

Bye
 
S

Stefan Ram

Jan Burse said:
3 = 1 + 2
With java.lang.Integer objects would involve
creating a new Integer with value left.intValue()
+ right.intValue().

It just needs to behave as-if. The implementation can decide
to not really perform all those operations literally as long
as the program behaves as specified.
Some stack class, where: (Easy)
pop() creates a new immutable stack

The sharing can be implement similarly to the implementation
of the substring operation, because »pop« essentially
is a »substack« operation: .substring( 0, .length() - 1 ).

However, in computer science, a stack is well known as a
specific mutable entity with specific performance
properties. So, an »immutable stack« should not be called
»stack«.
 
J

Jan Burse

Stefan said:
However, in computer science, a stack is well known as a
specific mutable entity with specific performance
properties. So, an »immutable stack« should not be called
»stack«.

If you take the subdomain of functional programming
in computer science, the people there wouldn't
probably agree. A stack can be viewed as an abstract
datatype:

push : Stack x Element -> Stack
pop : Stack -> Stack x Element | Fault

By the above one would loose the possibility to directly
reference a stack by one process. And then see what
an another process is doing on it.

I am interested in functional programming like
implementations of a stack in Java. That show good
sharing. I know for example of the functional
Java project:

http://functionaljava.org/

I could eventually use something like HList from
there. But these classes are highly parameterized
and I don't know whether good Sharing was on their
primary focus.

The interesting case is the Queue.

Best Regards
 
L

Lew

Jan said:
If you take the subdomain of functional programming
in computer science, the people there wouldn't
probably agree. A stack can be viewed as an abstract
datatype:

push : Stack x Element -> Stack
pop : Stack -> Stack x Element | Fault

By the above one would loose the possibility to directly
reference a stack by one process. And then see what
an another process is doing on it.

Your answer does not speak to the inherent contradiction in the term "immutable stack".
 
S

Stefan Ram

Peter Duniho said:
What inherent contradiction? Please cite an authoritative reference
that requires that the data structure known simply as a "stack" must be
implemented using mutable instances.

The most authoritative reference whatsoever, world-wide surely is
the renowned internet encyclopedia »Wikipedia«. And this says:

»The push operation adds a new item to the top of the stack«

http://en.wikipedia.org/wiki/Stack_(abstract_data_type)

. »Add« is a /mutation/, it means to /change/ something by
/adding/ something to it. In this context, the verb »add« has
the same meaning as in everyday language. The push operation
has no result specified, but it has an effect specified.

In the functional lanugage it is exactly the other way
round: It has a result specified, but no effect specified.
Requiring mutable instances would mean one could not, for example,
implement a stack at all in languages like Lisp or XSLT.

In Common Lisp, one /can/ implement mutable data structures.
 
J

Jan Burse

Stefan said:
In the functional lanugage it is exactly the other way
round: It has a result specified, but no effect specified.

I admit that the question might need some out of the box
thinking, especially when somebody makes a strict divide
between Java (destructive, etc..) and functional programming
(non-destructive, etc..).

But I am looking for:

Some stack class, where: (Easy)
pop() creates a new immutable stack
push() creates a new immutable stack
With good sharing.

Some queue class, where: (Hard?)
enqueue() creates a new immutable queue
dequeue() creates a new immutable queue
With good sharing.

All of that in !Java!.

The stack is quite easy, a good indicative that
something is immutable are the keywords final
before field variables:

public class Stack {
final Object element;
final Stack next;

Stack(e, n) {
element = e;
next = n;
}

public Stack push(Object e) {
return new Stack(e,this);
}

public Object top() {
return element;
}

public Stack pop() {
return next;
}
}

But how about the Queue?

Best Regards

P.S.: In LISP one just uses the CONS for
push, the CAR for top, and the CDR for pop.
 
E

Eric Sosman

Who knows, maybe this could give me a queue with
good sharing. Until now, no bell is ringing...

Out of curiosity, why do you want an immutable queue? What
problem are you trying to solve?
 
J

Jan Burse

Eric said:
Out of curiosity, why do you want an immutable queue? What
problem are you trying to solve?

Well it has to be seen in the context of garbage collection.
For example the immutable stack can have a good performance.
For push you just grab a new stack cell from the heap.

When you pop and you have no other pointers to the stack itself
anymore in the application, then cells become eligible to
garbage collection and will quickly reenter the heap and
thus again be available for the application.

In the current situation at hand, you can imagine that
some pointers to the datastructure will nevertheless
still remain, so that I can kind of have a historical
view of what happened with the stack.

This historical view is easiest implemented by an immutable
data structure, but its performance hinges on the fact how
good the sharing works. Same holds for the queue. If a push()
were to copy the full stack I would get a lame application.

Bye
 
E

Eric Sosman

Well it has to be seen in the context of garbage collection.
For example the immutable stack can have a good performance.
For push you just grab a new stack cell from the heap.

First, "stack" and "queue" are not usually thought of as
synonyms. Second, I don't see what immutability has to do with
the queue's or stack's internal memory management -- especially
if the claimed performance benefits are only seen when mutations
occur! Third, have you looked at LinkedList?
In the current situation at hand, you can imagine that
some pointers to the datastructure will nevertheless
still remain, so that I can kind of have a historical
view of what happened with the stack.

The obvious way to represent the history of a stack is to
build a tree. For queues (which is what you asked about), the
situation is less obvious -- One can, of course, implement a
queue as a pair of stacks and keep histories of the stacks, but
it's not clear that such a history would be useful.
This historical view is easiest implemented by an immutable
data structure,[...]

Why? How is this superior to a tree? What advantage does
immutability grant? And what happened to queues?
 
A

Andreas Leitgeb

Jan Burse said:
In the current situation at hand, you can imagine that
some pointers to the datastructure will nevertheless
still remain, so that I can kind of have a historical
view of what happened with the stack.

An immutable stack is quite simple to roll one's own: just a
single-linked list.
An immutable queue is not all that trivial, wrt being able to
add further elements.

Most likely the actually precious and memory-consuming parts of
any stack or queue is the actual payload: the elements, and Java
helps you very well, anyway, to keep those shared.
 
J

Jan Burse

Eric said:
Why? How is this superior to a tree?

I don't know what you mean by tree. Via the immutable stack
also a tree arises. Known as spaghetti stack:

http://en.wikipedia.org/wiki/Spaghetti_stack

Lets write [a,b,c] for new Stack("a",new Stack("b",new Stack("c",null))).

x = [a,b,c];
y = x.pop();
z = y.push(d);

Gives you the following tree, I have indicated the variable
pointers by a colon :)):

x : a z : d
\ /
y : b
|
c
What advantage does immutability grant?

Immutability grants that the tree does not change, but only
grow. And only the variable pointers might change. And via
GC the tree might also shrink in memory.
And what happened to queues?

Just the other operations. Instead of the operations push/pop,
we have the operations enqueue/dequeue. I don't know whether
historical data picture has a special name. My picture so far
is the same as in a supermarket:

people queing, imagine a supermarket filmed and then
run 8x times faster or some such.

But no graph theoretic name for it... Sniff.

Bye
 
A

Arne Vajhøj

Your answer does not speak to the inherent contradiction in the term "immutable stack".

Given the example with Integer in his first post, then it should be
rather obvious what he means with immutable stack.

Whether that thingy should be called stack or not can be relevant
but maybe a little focus on the question asked first would be
good.

Arne
 
A

Arne Vajhøj

Well it has to be seen in the context of garbage collection.
For example the immutable stack can have a good performance.
For push you just grab a new stack cell from the heap.

When you pop and you have no other pointers to the stack itself
anymore in the application, then cells become eligible to
garbage collection and will quickly reenter the heap and
thus again be available for the application.

I would expect worse performance with typical stack operations.
In the current situation at hand, you can imagine that
some pointers to the datastructure will nevertheless
still remain, so that I can kind of have a historical
view of what happened with the stack.

This historical view is easiest implemented by an immutable
data structure, but its performance hinges on the fact how
good the sharing works. Same holds for the queue. If a push()
were to copy the full stack I would get a lame application.

Yes.

But will the time you spend investigating this really cost less
than what it cost to buy 4 GB more RAM and just clone the objects
you want to store as history.

Arne
 
E

Eric Sosman

I don't know what you mean by tree.

Wikipedia defines it as "a widely-used data structure that
emulates a hierarchical tree structure with a set of linked nodes."
That's not a very good definition, since trees need not have links,
but it's close enough for jazz.
Via the immutable stack
also a tree arises.

If you don't know what I mean by tree, I don't know what you
mean by tree, either.

Never heard the term before. It looks like a tree to me, of
the kind that arises when processing equivalence relations, for example.
Lets write [a,b,c] for new Stack("a",new Stack("b",new Stack("c",null))).

Sorry; you've lost me again. You're clearly not talking about
java.util.Stack (because it has no such constructor), so you must
mean some other kind of Stack -- but you haven't explained what this
Stack is like, so the meaning of your example remains private to you.
Immutability grants that the tree does not change, but only
grow. And only the variable pointers might change. And via
GC the tree might also shrink in memory.

I seem to be having trouble with reading comprehension today.
First, in what sense can the tree grow without changing? Second,
what can be garbage-collected from a tree that only grows and
never shrinks?
 
J

Jan Burse

Arne said:
But will the time you spend investigating this really cost less
than what it cost to buy 4 GB more RAM and just clone the objects
you want to store as history.

I do this already. But I have an application case where the
cloning not only incures more memory use but also more runtime.
When you clone you need to go through the whole array and copy
each element, this takes time. When you share, you don't need
to spend this time.

Currently the runtime is too high. So if you show me a shop
where I can buy CPU time as I can buy RAM, then this could be a
solution. But les assume that the CPU time is bounded, so that
we have a real problem case here.

Bye
 
A

Arne Vajhøj

I do this already. But I have an application case where the
cloning not only incures more memory use but also more runtime.

How do you know that the clone solution use more CPU than
the solution you are looking for now??
When you clone you need to go through the whole array and copy
each element, this takes time. When you share, you don't need
to spend this time.

But you will need to do other things instead.
Currently the runtime is too high. So if you show me a shop
where I can buy CPU time as I can buy RAM, then this could be a
solution.

I believe there are many shops that will sell you
faster CPU's for the existing computers or
more CPU's for other computers.

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,153
Members
46,699
Latest member
AnneRosen

Latest Threads

Top