Size of an ArrayList in bytes


sara

Hi All,

I create an ArrayList<Integer> tmp and add some integers to it.
Afterward, I measure the size of tmp in bytes (by converting tmp to a
byte array). Assume the result is byte[] C. However, when I update an
element of tmp and measure the size of tmp in bytes again, the result is
different from C!
Why is this the case?

Best
Sara
 

markspace

We'd have to see some code to give you a good answer, but basically you
can't measure the memory size of Java objects. They change over time,
in ways that C or C++ objects don't, and there's not much you can do
to rectify that.
 

sara

Here is the code:

    ArrayList<Integer> tmp = new ArrayList<Integer>();
    tmp.add(-1);
    tmp.add(-1);
    System.out.println(DiGraph.GetBytes(tmp).length);
    tmp.set(0, 10);
    System.out.println(DiGraph.GetBytes(tmp).length);


    public static byte[] GetBytes(Object v) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos;
        try {
            oos = new ObjectOutputStream(bos);
            oos.writeObject(v);
            oos.flush();
            oos.close();
            bos.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        byte[] data = bos.toByteArray();
        return data;
    }
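For anyone who wants to reproduce the effect, here is a self-contained sketch of the same experiment. The class SerializedSizeDemo and its serializedSize method are hypothetical stand-ins for DiGraph.GetBytes; unlike the original, the sketch uses try-with-resources and rethrows I/O errors instead of printing them and carrying on.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SerializedSizeDemo {
    // Serializes any object with Java serialization and returns the byte count.
    static int serializedSize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        List<Integer> tmp = new ArrayList<>();
        tmp.add(-1);
        tmp.add(-1);
        int before = serializedSize(tmp);
        tmp.set(0, 10);
        int after = serializedSize(tmp);
        // The second number comes out larger than the first.
        System.out.println(before + " -> " + after);
    }
}
```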

The problem is that I need to write multiple ArrayLists to disk and later
update their elements. I store the starting location and size of each
ArrayList so that I can refer to them later. If the size of the objects
changes, that messes everything up! Could you please help?

Eric Sosman

See markspace's response. Another possible point of confusion:
The ArrayList does not actually contain objects, but references to
those objects -- that's why the same object instance can be in three
ArrayLists, two Sets, and a Map simultaneously. In fact, the same
Integer object could appear forty-two times in a single ArrayList:

    List<Integer> list = new ArrayList<Integer>();
    Integer number = Integer.valueOf(42);
    for (int i = 0; i < 42; ++i)
        list.add(number);

If you're coming from a C background, a rough analogy is that
the ArrayList holds "pointers" to the objects it holds, not copies
of those objects.
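To make the "pointers, not copies" point concrete, here is a minimal sketch (the class SharedReferenceDemo and its fill helper are hypothetical names) that adds one Integer 42 times and then uses == to confirm that every slot refers to that single instance.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedReferenceDemo {
    // Hypothetical helper: adds the same reference 'times' times.
    static List<Integer> fill(Integer number, int times) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < times; ++i) {
            list.add(number);   // stores a reference to 'number', not a copy
        }
        return list;
    }

    public static void main(String[] args) {
        Integer number = Integer.valueOf(42);
        List<Integer> list = fill(number, 42);
        boolean allSame = true;
        for (Integer element : list) {
            allSame &= (element == number);   // == on two Integer references compares identity
        }
        System.out.println(list.size() + " slots, all the same object: " + allSame);
    }
}
```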
 

sara

But do you have any answer to my second question?
 

Andreas Leitgeb

The serialization output size of an ArrayList<Integer> depends on
more than just the number of Integer elements in the list. There
is the capacity, which may be larger than the size, but what really
spoils it for you is the Integer objects, which get serialized along
with the array. If both elements are the same Integer object (as the
two autoboxed -1 values are), only one Integer object gets saved,
but if you change the value of one, then two different Integer
objects are serialized along with the actual array, and thus
you get more bytes.
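A small sketch to check this explanation. It relies on the JLS rule that boxing an int in the range -128 to 127 always yields the same object, so each of the first two lists holds one Integer referenced twice, while the third holds two distinct Integers. The class and method names are hypothetical. If the explanation is right, the first two sizes should match regardless of the value, and the third should be larger.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DistinctIntegerDemo {
    // Serializes any object with Java serialization and returns the byte count.
    static int serializedSize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        // Boxing -1 or 10 yields a cached Integer, so each of these lists
        // holds ONE Integer object referenced twice.
        List<Integer> twoMinusOnes = new ArrayList<>(Arrays.asList(-1, -1));
        List<Integer> twoTens = new ArrayList<>(Arrays.asList(10, 10));
        // This list holds TWO distinct Integer objects.
        List<Integer> mixed = new ArrayList<>(Arrays.asList(10, -1));

        System.out.println("[-1, -1]: " + serializedSize(twoMinusOnes));
        System.out.println("[10, 10]: " + serializedSize(twoTens));
        System.out.println("[10, -1]: " + serializedSize(mixed));
    }
}
```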

If you need fixed-size records for your arrays (assuming a fixed
size()), you might have better luck with arrays of primitives.

If you had:
    int[] tmp = new int[2]; tmp[0] = -1; tmp[1] = -1;
and dumped that array onto oos, then changed tmp[0] = 0,
it's very likely you'd see the same number of bytes
dumped afterwards.
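The int[] suggestion can be checked the same way. The sketch below (class and method names are hypothetical) serializes a two-element int[] before and after changing an element. Since an int[] is serialized as its length followed by one raw 4-byte value per slot, with no per-element objects, the two sizes should be equal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class PrimitiveArrayDemo {
    // Serializes any object with Java serialization and returns the byte count.
    static int serializedSize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        int[] tmp = new int[2];
        tmp[0] = -1;
        tmp[1] = -1;
        int before = serializedSize(tmp);
        tmp[0] = 0;                        // change a value, as in the post
        int after = serializedSize(tmp);
        // Each int is written as exactly 4 bytes, so the count does not change.
        System.out.println(before + " " + after);
    }
}
```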

Eric Sosman

An ArrayList /does/ hold pointers (in the sense of Java),
this is not just »a rough analogy«:

»(...) reference values (...) are pointers«

They're "pointers" in Java's terms, but Java is considerably
more restrictive about what you can do with a "pointer" than C is.
You cannot, for example, print the value of a Java reference; you
can do so in C. You cannot convert a Java reference to or from an
integer; C allows it (with traps for the unwary). Java references
obey a type hierarchy; C's types (and hence the pointers to them)
are unrelated. And so on, and so on: Little niggly differences.
Since Java's references support (and prohibit) a different set of
operations than C's pointers do, I maintain they're as similar as
dogs and wolves, and as different.

Put it this way: If I had told sara "An ArrayList contains
C-style pointers to the objects it holds," would I have been
telling the truth?
 

markspace

The problem is I need to write multiple arraylists on disk and later
on I update the elements of them. I store the starting location of
arraylists and their size such that later I can refer to them. If the
size of objects change then it messes up! Could you please help?


Yes, this is the problem. You have to use something different from an
ArrayList, because the serialized size of an ArrayList does change.

Look into plain arrays, IntBuffer, DataInputStream and DataOutputStream.
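One way to follow this advice, sketched under the assumption that each list's length stays fixed once it is written: store each list as raw 4-byte ints using java.io.RandomAccessFile, which implements both DataInput and DataOutput (so it offers the same writeInt/readInt calls as DataOutputStream/DataInputStream) and adds seek for in-place updates. The class FixedWidthIntStore and its methods are hypothetical, and there is no bounds checking. The caller keeps the returned start offset, much as sara already does.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FixedWidthIntStore {
    // RandomAccessFile.writeInt always writes exactly 4 bytes (big-endian).
    static final int BYTES_PER_INT = 4;

    // Appends the values at the end of the file; returns the list's start offset.
    static long append(RandomAccessFile file, int[] values) throws IOException {
        long start = file.length();
        file.seek(start);
        for (int value : values) {
            file.writeInt(value);
        }
        return start;
    }

    // Overwrites element 'index' of the list stored at 'start', in place.
    static void set(RandomAccessFile file, long start, int index, int value) throws IOException {
        file.seek(start + (long) index * BYTES_PER_INT);
        file.writeInt(value);
    }

    // Reads element 'index' of the list stored at 'start'.
    static int get(RandomAccessFile file, long start, int index) throws IOException {
        file.seek(start + (long) index * BYTES_PER_INT);
        return file.readInt();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lists", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            long first = append(file, new int[] {-1, -1});
            long second = append(file, new int[] {7, 8, 9});
            long lengthBefore = file.length();

            set(file, first, 0, 10);   // update in place; nothing else moves

            System.out.println(get(file, first, 0) + " " + get(file, second, 0)
                    + " " + (file.length() == lengthBefore));   // prints "10 7 true"
        }
    }
}
```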

It would also help now if we knew why you want to store multiple
ArrayLists on disk. What is it you are trying to do?
 

Arne Vajhøj

On 11/20/2011 1:58 PM, Eric Sosman wrote:
...

No, but if you had said "An ArrayList contains pointers to the objects
it holds." you would have been telling the exact truth.
Yes.

The baggage that C added to pointers was an unfortunate aberration, not
something that should ever be considered to be the default definition of
"pointer".

C/C++ pointers have certainly caused a lot of problems over the
years.

But the languages would not have been the same without them. And
I even doubt that they would have been as popular.

C and C++ were not chosen because alternatives without
"do anything you want pointers" did not exist.

Arne
 

Eric Sosman

[...]
But do you have any answer to my second question?

Only that you're going about it wrong. As Andreas Leitgeb points
out, serializing an object is a different proposition than serializing
a bunch of "raw" values: It saves enough information to reconstruct an
"image" of the original object, with the same structure.

What do I mean by "structure?" Something like this:

Integer x = new Integer(42);
Integer y = new Integer(42);

Here we have two distinct Integer instances, each with the value 42.

ArrayList<Integer> one = new ArrayList<Integer>();
one.add(x);
one.add(x);

The first ArrayList holds one of the Integer instances, twice, and
has nothing to do with the other.

ArrayList<Integer> two = new ArrayList<Integer>();
two.add(x);
two.add(y);

The second ArrayList holds both Integer instances, once each.

If you serialize `one' and read it back again, you'll get an
ArrayList with two references to the same Integer. Reading it back
will produce one Integer, not two. There will be two objects in
the serialized stream: One ArrayList and one Integer, plus enough
additional information to reassemble them. (Actually, there will
probably be additional objects: The ArrayList owns an array, which
is an object in its own right, and perhaps there might be others.
But there'll be two "visible" objects in the stream.)

If you serialize `two' and read it back, you'll get an ArrayList
with two references to two distinct Integers: Three "visible" objects
in all.
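This structure-preserving behaviour can be observed directly. The sketch below (StructureRoundTrip, roundTrip and demo are hypothetical names) builds one and two as described, serializes each to memory and reads it back, then compares the elements of each copy with ==. It uses the deprecated Integer(int) constructor only to guarantee two distinct instances, as the post does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

public class StructureRoundTrip {
    // Serializes 'value' into memory and deserializes a fresh copy.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) ois.readObject();
        }
    }

    // Returns {do the copy of 'one's elements share an object?, same for 'two'}.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean[] demo() throws IOException, ClassNotFoundException {
        Integer x = new Integer(42);   // deprecated constructor, used only to
        Integer y = new Integer(42);   // guarantee two distinct instances

        ArrayList<Integer> one = new ArrayList<>();
        one.add(x);
        one.add(x);
        ArrayList<Integer> two = new ArrayList<>();
        two.add(x);
        two.add(y);

        ArrayList<Integer> oneCopy = roundTrip(one);
        ArrayList<Integer> twoCopy = roundTrip(two);
        return new boolean[] {
            oneCopy.get(0) == oneCopy.get(1),   // identity comparison
            twoCopy.get(0) == twoCopy.get(1)
        };
    }

    public static void main(String[] args) throws Exception {
        boolean[] shared = demo();
        System.out.println("one: " + shared[0] + ", two: " + shared[1]);   // one: true, two: false
    }
}
```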

It's all right to serialize an object graph and store it on disk.
It is *not* all right to try to update the serialization in place,
nor to modify the object and expect a re-serialization to have the
same size. If you need in-place operations or same-size guarantees,
you'll need to invent a different external representation for your data.
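As one illustration of such a different external representation (a hypothetical format, not something proposed in the thread): write the element count and then each value with DataOutputStream.writeInt, which always emits exactly 4 bytes. The encoded size then depends only on the number of elements, never on their values or on object identity. The class FixedLengthEncoding and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedLengthEncoding {
    // Hypothetical format: 4-byte element count, then one 4-byte int per element.
    static byte[] encode(List<Integer> list) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(list.size());
            for (int value : list) {
                out.writeInt(value);   // always 4 bytes, whatever the value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads back a list written by encode().
    static List<Integer> decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int size = in.readInt();
            List<Integer> list = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                list.add(in.readInt());
            }
            return list;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> tmp = new ArrayList<>(Arrays.asList(-1, -1));
        int before = encode(tmp).length;   // 4 + 2 * 4 = 12
        tmp.set(0, 10);
        int after = encode(tmp).length;    // still 12
        System.out.println(before + " " + after + " " + decode(encode(tmp)));   // 12 12 [10, -1]
    }
}
```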
 

Stefan Ram

Arne Vajhøj said:
C/C++ pointers has certainly caused a lot of problems over the
years.

C serves as a »portable, abstract machine language«, so C
pointers are machine addresses inherited from machine
languages, where one can freely add addresses and
numbers. But, after all, C already adds some type safety and
abstraction. So C still makes sense as the first layer on
top of the bare metal. And C cannot be blamed for someone
choosing C where it is not appropriate.

»=head2 What language is Parrot written in?

C.

=head2 For the love of God, man, why?!?!?!?

Because it's the best we've got.«

http://www.davidcole.net/msie/notes/ipl/perl/jul13/parrot/parrot-0.0.4/docs/faq.pod

»Here's the thing: C is everywhere. Recently Tim Bray
made basically the same point; all the major operating
systems, all the high-level language runtimes, all the
databases, and all major productivity applications are
written in C.«

http://girtby.net/archives/2008/08/23/in-defence-of-c/
 

Arne Vajhøj

That code measures the size of the serialized ArrayList.

It does not reflect how much memory the list takes up.

And you should not use serialization for persistent
storage.

Arne
 

Stefan Ram

Patricia Shanahan said:
My main concern with C's pointers is that they were called "pointers",
not "addresses". They behave far more like assembly language addresses
than like something more abstract, whose only job is to point.

Actually, they /are/ called »addresses«:

»An object exists, has a constant address, retains its
last-stored value throughout its lifetime.«

ISO/IEC 9899:1999 (E), 6.2.4p2

»The unary & operator returns the address of its operand.«

ISO/IEC 9899:1999 (E), 6.5.3.2p3

»it is permitted to take the address of a library function«

ISO/IEC 9899:1999 (E), 7.1.4p1

The language cannot be blamed for persons calling addresses
»pointers«.

However, ISO/IEC 9899:1999 (E) also /does/ contain the word
»pointer«, but a »pointer« is /an object/ that contains an
address value.

At least some programmers derive this from:

»A pointer type describes an object whose value provides
a reference to an entity of the referenced type.«

ISO/IEC 9899:1999 (E), 6.2.5, #20

However, nowhere does ISO/IEC 9899:1999 (E) give an explicit
definition of »pointer«, and the usage of this document with
regard to the word »pointer« is not always consistent.

But the first three quotations should entitle you to
speak of »addresses« of objects and functions in a C context.
 

Lew

*DO NOT USE TAB CHARACTERS TO INDENT USENET CODE LISTINGS!*

The problem is that the code you posted won't compile.

Java changes the sizes of things in surprising ways, and makes no promises about the size of an 'ArrayList' in the way you're asking.

What do you really want to do?
On Nov 20, 1:05 pm, markspace <-@.> wrote:

*DO NOT TOP-POST!*
 

Lew

Eric said:
They're "pointers" in Java's terms, but Java is considerably
more restrictive about what you can do with a "pointer" than C is.

They're "pointers" in programming terms, not just Java's.

So?

You cannot, for example, print the value of a Java reference; you
can do so in C. You cannot convert a Java reference to or from an
integer; C allows it (with traps for the unwary). Java references
obey a type hierarchy; C's types (and hence the pointers to them)
are unrelated. And so on, and so on: Little niggly differences.
Since Java's references support (and prohibit) a different set of
operations than C's pointers do, I maintain they're as similar as
dogs and wolves, and as different.

Dogs and wolves are the same species. They can interbreed.

Java pointers *are* pointers, and that's all they are. They don't pretend to do arithmetic on themselves. That does not make them any less pointers.

The essence of pointers is that they point. The implicit 'const' on them (in C terms) doesn't change that a jot.
Put it this way: If I had told sara "An ArrayList contains
C-style pointers to the objects it holds," would I have been
telling the truth?

Why would you say such a bone-headed thing, and what difference does it make? A pointer is a pointer still, if it but points, though you cannot increment it.

No one is claiming that they're "C-style" pointers, so we'll throw that red herring back in the water.
 

Roedy Green

What code did you use to convert to byte[]?

An ArrayList consists of a base ArrayList object, an array-of-pointers
object, and one object for each Integer. If the integers are small,
equal values can share one object: e.g. two 1s in the list will point
to the same canonical Integer object.

Each object (including all the Integers) has perhaps 8 to 16 bytes of
overhead. So it is fairly complicated to figure out how much RAM this
thing uses. It is not like a C array, where you just multiply 4 bytes
by the number of slots.

An int[] is much simpler.
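A minimal sketch of the canonical-Integer point (CanonicalIntegerDemo and sharesCanonicalInstance are hypothetical names): boxing the same small value twice yields one shared object, which the JLS requires for int values from -128 to 127. Outside that range sharing is not guaranteed, so the sketch does not rely on it. The int[] line only counts element data; the array's own header is extra and JVM-dependent.

```java
import java.util.ArrayList;
import java.util.List;

public class CanonicalIntegerDemo {
    // Boxes 'value' twice into a list and reports whether both slots share one object.
    static boolean sharesCanonicalInstance(int value) {
        List<Integer> list = new ArrayList<>();
        list.add(value);   // autoboxing
        list.add(value);
        return list.get(0) == list.get(1);   // reference comparison
    }

    public static void main(String[] args) {
        // Guaranteed true for -128..127.
        System.out.println(sharesCanonicalInstance(1));

        // An int[] stores the values themselves: 4 bytes per slot, no per-element objects.
        int[] plain = {1, 1};
        System.out.println(plain.length * Integer.BYTES + " bytes of element data");
    }
}
```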
 

Stefan Ram

Newsgroups: comp.lang.java.programmer,comp.lang.c
Followup-To: comp.lang.c

Patricia Shanahan said:
My main concern with C's pointers is that they were called "pointers",
not "addresses". They behave far more like assembly language addresses
than like something more abstract, whose only job is to point.

An important difference between address arithmetic
and pointer arithmetic can be seen here:

#include <stdio.h>

int addressdiff( void const * const b, void const * const a )
{ return( char const * const )b -( char const * const )a; }

int main( void )
{ char address[ 2 ];
int pointer[ 2 ];
printf( "%d\n", addressdiff( address + 1, address ));
printf( "%d\n", addressdiff( pointer + 1, pointer )); }

1
4


Arne Vajhøj

Roedy Green said:
What code did you use to convert to byte[]?

The code was posted in a followup.

Arne
 

Arne Vajhøj

My main concern with C's pointers is that they were called "pointers",
not "addresses". They behave far more like assembly language addresses
than like something more abstract, whose only job is to point.

Since C does not have both constructs, it is purely a matter of terminology.

Arne
 
