Best way to loop through ArrayList and remove elements on the way?

K

Kevin

Hi guys,

Just want to confirm:

for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String).
2) do some check on each item (for example, check if it equals to
"abc").
3) remove this item from the arraylist if the above check is true.

Is using iterator() and then use iterator.remove() the best way?
Like:

for (Iterator it = myarraylist.iterator(); it.hasNext(); )
{
String s = (String) it.next();
if (s.equals("abc"))
{
it.remove();
}
};


Thanks a log.
 
D

Daniele Futtorovic

Hi guys,

Just want to confirm:

for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String). 2) do some check on each item (for example,
check if it equals to "abc"). 3) remove this item from the arraylist
if the above check is true.

Is using iterator() and then use iterator.remove() the best way?
Like:

for (Iterator it = myarraylist.iterator(); it.hasNext(); ) { String s
= (String) it.next(); if (s.equals("abc")) { it.remove(); } };


Thanks a log.

"Best" with respects to what?

The code above will be more efficient on a LinkedList than on an
ArrayList. As a general rule, when doing filtering operations
(conditional removing/keeping) on array-backed structures (ArrayList,
StringBuffer, etc.), it is more efficient, especially if the structure
is big, to copy those elements which are to be kept into a new
structure, and then to discard the old one, instead of perform multiple
deletions. Only if there would be a significant number (more than a
couple) of deletions, of course. Your mileage may vary, but it takes a
lot off the CPU and adds only little memory overhead.

DF.
 
L

Lasse Reichstein Nielsen

Kevin said:
for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String).
2) do some check on each item (for example, check if it equals to
"abc").
3) remove this item from the arraylist if the above check is true.

First of all, an ArrayList takes linear time to remove a single
element, so repeated removal is a bad idea.

Either use a LinkedList, which can remove in constant time during
iteration, or first collect the elements in a HashSet and then remove
them at the end using Collection.removeAll(Collection). I suggest the
latter.

For ArrayList, the removeAll method takes time proportional to the
size of the list times the lookup time for the argument collection.
Using a HashSet as argument should minimize the time it takes.

If you only remove a few values, any collection will probably suffice.

I.e., either:

LinkedList tmpLinkedList = new LinkedList(myarraylist);
for(Iterator iter = tmpLinkedList.iterator(); iter.hasNext();) {
if (test(iter.next)) { iter.remove(); }
}
myarraylist.clear();
myarraylist.addAll(tmpLinkedList);

or

HashSet toRemove = new HashSet();
for(Iterator iter = myarraylist.iterator(); iter.hasNext();) {
Object elem = iter.next();
if (test(elem)) { toRemove.add(elem); }
}
myarraylist.removeAll(toRemove);

This is assuming the test is complicated. If it's just equality
check, then you might be able to use removeAll directly.
Is using iterator() and then use iterator.remove() the best way?

For the above reasons, I'd say no.
Like:

for (Iterator it = myarraylist.iterator(); it.hasNext(); )
{
String s = (String) it.next();
if (s.equals("abc"))
{
it.remove();
}
};

In this simple case, where you only compare for equality (and even to
only a single value), it would suffice to do:
myarraylist.removeAll(Collections.singletonSet("abc"));
If you have more values, but still only do equality checks, just create
a collection and remove them all.
/L
 
K

Kevin

Thanks guys!

So remove(object) is linear time with respect to the ArrayList's
lenght, right?

Is using iterator.remove() still O(n) time? I think the iterator
already keep the "reference" to the current item (for example, in my
previous example code, it "points" to the current item already), and
if we want to remove it, there is no need to look for it in the
arraylist, so it just removes it directly, which should be constant
time, right?

Thanks.
 
D

Daniele Futtorovic

First of all, an ArrayList takes linear time to remove a single
element, so repeated removal is a bad idea.

Either use a LinkedList, which can remove in constant time during
iteration, or first collect the elements in a HashSet and then remove
them at the end using Collection.removeAll(Collection). I suggest
the latter.

For ArrayList, the removeAll method takes time proportional to the
size of the list times the lookup time for the argument collection.
Using a HashSet as argument should minimize the time it takes.

I don't quite understand that advice. The docs for removeAll(Collection)
in AbstractCollection (neither AbstractList nor ArrayList override that
method) state:
"This implementation iterates over this collection, checking each
element returned by the iterator in turn to see if it's contained in the
specified collection. If it's so contained, it's removed from this
collection with the iterator's remove method".

Which is what the OP was doing -- except for the part of the lookup.
Sure, using removeAll is cleaner, and, as opposed to removing one "type"
(as defined by equality relations in this context) at once, it is
superior. But only in that it avoids multiple iterations. The most
itensive operation here, however, isn't the iteration, but the removal.
Which your suggestion does not affect.

As far as I can surmise, there are only two ways of substantially
improving the algorithm: either switch to a sequential list (and then
using removeAll would be a good idea too), or give up on having the
filtering modify the list at all, but rather yield a new one.

DF.
 
P

Patricia Shanahan

Kevin said:
Hi guys,

Just want to confirm:

for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String).
2) do some check on each item (for example, check if it equals to
"abc").
3) remove this item from the arraylist if the above check is true.

Is using iterator() and then use iterator.remove() the best way?
Like:

for (Iterator it = myarraylist.iterator(); it.hasNext(); )
{
String s = (String) it.next();
if (s.equals("abc"))
{
it.remove();
}
};


Thanks a log.

From the point of view of coding simplicity, I think it is the best way.

If it becomes a performance bottleneck, investigate whether you should
switch to LinkedList or take other steps to reduce the cost.

Patricia
 
M

Mike Schilling

Kevin said:
Thanks guys!

So remove(object) is linear time with respect to the ArrayList's
lenght, right?

Yes. So is remove(int), but remove(object) has a larger constant.
remove(m) for an ArrayList of size n needs to:

1. copy the references numbered from m+1 to n-1 to the positions m
to n-2 respectively
2. Set the reference at m-1 to null.

remove(o) has to

a. Find the index m of the first o in the ArrayList
b. perform steps 1 and 2 above.
Is using iterator.remove() still O(n) time?

Yes, because it amounts to remove(int).
 
K

Kevin McMurtrie

Kevin said:
Hi guys,

Just want to confirm:

for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String).
2) do some check on each item (for example, check if it equals to
"abc").
3) remove this item from the arraylist if the above check is true.

Is using iterator() and then use iterator.remove() the best way?
Like:

for (Iterator it = myarraylist.iterator(); it.hasNext(); )
{
String s = (String) it.next();
if (s.equals("abc"))
{
it.remove();
}
};


Thanks a log.

Removing from ArrayList requires shifting elements so it can be slow.
If an ArrayList is still the right tool, it may be faster in some cases
to build a new list based on what wouldn't be removed. If you can't
rebuild, you can still get an improvement by going backwards through the
list.
 
L

Lasse Reichstein Nielsen

Daniele Futtorovic said:
I don't quite understand that advice. The docs for removeAll(Collection)
in AbstractCollection (neither AbstractList nor ArrayList override that
method) state:
"This implementation iterates over this collection, checking each
element returned by the iterator in turn to see if it's contained in the
specified collection. If it's so contained, it's removed from this
collection with the iterator's remove method".

True, my mistake.

I was looking at the GNU Classpath implementation of ArrayList,
which does implement an optimized removeAll, without noticing
that it wasn't the Sun version.
<URL:http://www.docjar.com/html/api/java/util/ArrayList.java.html>

/L
 
D

Daniele Futtorovic

True, my mistake.

I was looking at the GNU Classpath implementation of ArrayList, which
does implement an optimized removeAll, without noticing that it
wasn't the Sun version.
<URL:http://www.docjar.com/html/api/java/util/ArrayList.java.html>

/L

I see. They've forgotten to modify the Javadoc accordingly, though. In
their AbstractCollection class they still write it's done using an
Iterator. Which, at that level, is correct, of course. They would have
to override it ArrayList solely for the doc, I suppose.


I started wondering why this hadn't been added to the core classes,
fired up the bugparade, and here we are:
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6529800>
.... is flagged as "Closed, fixed" (took them til 1.6!!).

The ArrayList class in my SDK doesn't contain any such implementation,
but it's 1.6.0_02 (yeah, I should update). However, I can't seem to find
the corresponding bugID in the release notes either:
<http://java.sun.com/javase/6/webnotes/ReleaseNotes.html>

:scratches head:

DF.
 
B

Boris Stumm

Kevin said:
for a ArrayList, in single thread mode (only one thread will access
this ArrayList), what is the best way to:

1) loop through all the element of this arraylist (for example, each
item of it is a String).
2) do some check on each item (for example, check if it equals to
"abc").
3) remove this item from the arraylist if the above check is true.

Is using iterator() and then use iterator.remove() the best way?

I do not think so, as stated in the other replies. An Iterator moves
from the beginning of the ArrayList to the end, so that the complete
operation will terminate in O(n^2). However, as long as speed is not
an issue (or the lists are small), this approach is the cleanest one.

The fastest way to check all elements in an ArrayList, and remove
some of them is probably the following (assuming that the list is NOT
SORTED!)

ArrayList list = ...;
for (int i = list.size() - 1; i >= 0; i--) {
if (elementNeedsToBeRemoved(list.get(i))) {
list.set(i, list.get(list.size() -1 ));
list.remove(list.size() - 1);
}
}

This will work in O(n).
 
P

Patricia Shanahan

Boris said:
I do not think so, as stated in the other replies. An Iterator moves
from the beginning of the ArrayList to the end, so that the complete
operation will terminate in O(n^2). However, as long as speed is not
an issue (or the lists are small), this approach is the cleanest one.

Isn't the iterator-based version O(n*(m+1)) where n is the length of the
list and m is the number of items being removed?

Whether this is equivalent to O(n), O(n^2), or something in between
depends on the relationship between n and m.

Patricia
 
L

Lasse Reichstein Nielsen

Patricia Shanahan said:
Isn't the iterator-based version O(n*(m+1)) where n is the length of the
list and m is the number of items being removed?

It's O(n*m*k) where n is the size of the list, m is the number of
elements removed (it's an average, removing the first element is more
expensive than the last element), and k is the time it takes to check
whether an element is in the argument collection.
Whether this is equivalent to O(n), O(n^2), or something in between
depends on the relationship between n and m.

Worst case is O(n^2), so it's fair to use that as an upper bound on
the complexity of the algorithm.

/L
 
P

Patricia Shanahan

Lasse said:
It's O(n*m*k) where n is the size of the list, m is the number of
elements removed (it's an average, removing the first element is more
expensive than the last element), and k is the time it takes to check
whether an element is in the argument collection.

I don't understand why you multiply together m and k.

The cost of examining each element is O(n*k), and the cost of removing
the elements that need removing is O(n*m*j), where j is the cost of
shifting one element down the array. We can treat at least one of k and
j as being one, because constant factors don't affect complexity, so
that can reduce to O(n*k+n*m)=O(n*(k+m)). I was treating them both as
being one unit, so I reduced it to O(n*(m+1)).
Worst case is O(n^2), so it's fair to use that as an upper bound on
the complexity of the algorithm.

/L

However, in many real applications of this sort of operation the number
of removed elements is very small, so I think it is important to
remember that the complexity depends on the removal rate.

Patricia
 
L

Lasse Reichstein Nielsen

Patricia Shanahan said:
Lasse Reichstein Nielsen wrote: ....
I don't understand why you multiply together m and k.

Neither do I, now that you mention it. I should have said O(n*m+n*k),
but I was too busy multiplying :)
The cost of examining each element is O(n*k), and the cost of removing
the elements that need removing is O(n*m*j), where j is the cost of
shifting one element down the array. We can treat at least one of k and
j as being one, because constant factors don't affect complexity, so
that can reduce to O(n*k+n*m)=O(n*(k+m)). I was treating them both as
being one unit, so I reduced it to O(n*(m+1)).
Indeed.
However, in many real applications of this sort of operation the number
of removed elements is very small, so I think it is important to
remember that the complexity depends on the removal rate.

Nah, being practical shouldn't get in the way of a good theory :)
But ofcourse, you are right. Knowing the actual problem being solved
is more important than general theory. Like Bubble sort being the best
sorting algorithm ... for almost sorted lists.

/L
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top