Is a byte data type really a 32-bit int in the JVM?

Digital Puer

Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.

I came across this tidbit saying within the JVM, a 'byte'
data type is actually a 32-bit integer:

http://www.jguru.com/faq/view.jsp?EID=13647

Checking the JVM specification, I think I can confirm
that assertion. In sections 3.11.1 and 3.11.4, it is
mentioned that:

"As noted in §3.11.1, values of type byte, char, and short
are internally widened to type int, making these conversions
implicit."

Can someone please clear this up for me?
 
Lew

Digital said:
Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.

I came across this tidbit saying within the JVM, a 'byte'
data type is actually a 32-bit integer:

http://www.jguru.com/faq/view.jsp?EID=13647

It would help if you would quote the passage of interest:
"In the Java Virtual Machine, bytes, shorts and ints are all four bytes long."

This statement is irrelevant. The author doesn't know what he's talking
about. What is relevant is the Java Language Specification:
"The integral types are byte, short, int, and long,
whose values are 8-bit, 16-bit, 32-bit and 64-bit
signed two's-complement integers, respectively,
and char, whose values are 16-bit unsigned integers
representing UTF-16 code units (§3.1)."
<http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2>

Whatever a byte is in the JVM, it's an eight-bit quantity in Java.

This is just the same as in C or C++, or really any language. Just because a
byte is eight bits long in some implementation of C doesn't mean that its host
platform stores it that way; it very well could be stored and manipulated as a
32-bit value with 24 bits ignored.

Do you let that bother you when you're programming in C?

Digital said:
Checking the JVM specification, I think I can confirm
that assertion. In sections 3.11.1 and 3.11.4, it is
mentioned that:

"As noted in §3.11.1, values of type byte, char, and short
are internally widened to type int, making these conversions
implicit."

Can someone please clear this up for me?

Yes. Like any machine, virtual or actual, the internal opcodes may deal with
wider types than expressed in the language. Floating-point, for example, may
be defined in terms of 64-bit values but calculated by the machine at the low
level with 80-bit values.

From our point of view as Java programmers, we don't care. We care about the
language's semantics, not the machine's. In the Java language, a byte is
eight bits long precisely.

That said, let's look at the JVM spec again.
"byte, whose values are 8-bit signed two's-complement integers"

That sure doesn't say that bytes are 32-bit quantities in the JVM.

Why would 3.11.1 and 3.11.4 contradict that?
Ss. 3.11.1:
For the majority of typed instructions, the instruction type is represented explicitly
in the opcode mnemonic by a letter: i for an int operation, l for long, s for short,
b for byte, c for char, f for float, d for double, and a for reference.
Some instructions for which the type is unambiguous do not have a type letter in their mnemonic.

So the JVM has opcodes that recognize bytes, defined in ss. 3.3 as 8-bit
quantities. Still more evidence that bytes are not 32-bit quantities in the JVM.

But wait! There is this:
Compilers encode loads of literal values of types byte and short using Java
virtual machine instructions that sign-extend those values to values of
type int at compile time or run time.

But that is specific to loads, not all instructions, and only because there
aren't specific opcodes to load bytes. The reason for that is:
Given the Java virtual machine's one-byte opcode size, encoding types into
opcodes places pressure on the design of its instruction set.

So the conversion into int is a hack to account for the limited opcode set.
It doesn't make bytes generally into 32-bit quantities.

What about the quote you took out of context? That was the clue that the JVM
supports narrow integral types by ignoring high-order bits. In other words,
to the JVM a byte is exactly eight bits long, but it's stored in a container
that is 32 bits wide. It's like packing a steamer trunk with a single change
of clothes. Just because the trunk is large doesn't mean you won't need to
buy another shirt if you spill soy sauce on the one you packed.

Summary: bytes in the JVM are actually eight bits long, they're simply stored
in containers that are four times larger than needed. None of this matters
unless you're programming in JVM assembly; Java programmers know that bytes
are eight bits long in the language regardless. Fretting over the JVM's
internal representation thereof is silly.
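
The language-level behaviour described above can be seen in a few lines. This is only a minimal sketch; the class and method names are made up for illustration. Arithmetic on two bytes is carried out as int, and casting the result back to byte keeps just the low eight bits.

```java
public class ByteSemantics {
    // Adds two bytes. Both operands are promoted to int before the addition,
    // so the sum has to be cast back to byte, which keeps the low eight bits.
    public static byte addWrapping(byte x, byte y) {
        int wide = x + y;     // binary numeric promotion: the result type is int
        return (byte) wide;   // narrowing discards the upper 24 bits
    }

    public static void main(String[] args) {
        System.out.println(addWrapping((byte) 127, (byte) 1));        // prints -128
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE); // prints -128 .. 127
    }
}
```

Nothing here reveals how wide the storage behind a byte is; it only shows that the observable value range is eight bits.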
 
Mike Schilling

Digital said:
Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.

I came across this tidbit saying within the JVM, a 'byte'
data type is actually a 32-bit integer:

http://www.jguru.com/faq/view.jsp?EID=13647

That's a terrible article; its author knows little or nothing about
Java. For instance, while it's true that

byte b = 0xAA;

causes an error, all that's needed to remove it is

byte b = (byte)0xAA;

And

This is obviously of great consternation and gnashing of teeth to C/C++
programmers who are used to reading in files as streams of unsigned bytes,
or generating unsigned bytes, or saving space by using unsigned bytes

is silly, since Java does the same things with its signed bytes.

In the Java Virtual Machine, bytes, shorts and ints are all four bytes
long.

is simply false.

etc.
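
A short sketch of the cast mentioned above, together with the usual way to read a signed byte as an unsigned value. The class name and the toUnsigned helper are illustrative only; Byte.toUnsignedInt is the library method, available from Java 8 on.

```java
public class UnsignedBytes {
    // Interprets a signed Java byte as an unsigned value in the range 0..255.
    public static int toUnsigned(byte b) {
        return b & 0xFF; // b is sign-extended to int, then masked to the low 8 bits
    }

    public static void main(String[] args) {
        byte b = (byte) 0xAA;                      // the cast is required: 0xAA is the int 170
        System.out.println(b);                     // prints -86
        System.out.println(toUnsigned(b));         // prints 170
        System.out.println(Byte.toUnsignedInt(b)); // prints 170
    }
}
```

The stored bit pattern is the same either way; only the interpretation at the point of use differs.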
Checking the JVM specification, I think I can confirm
that assertion. In sections 3.11.1 and 3.11.4, it is
mentioned that:

"As noted in §3.11.1, values of type byte, char, and short
are internally widened to type int, making these conversions
implicit."

That means something else entirely, e.g. the fact that no cast is
required in

void a(int i)
{
}

void caller()
{
    byte b = 0;
    a(b); // "b" is implicitly converted to int for the call
}
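
A runnable variant of the same idea, as a sketch with made-up names: the byte argument is widened at the call site without a cast, while going the other way needs one.

```java
public class WideningCall {
    // Accepts an int; a byte argument is widened automatically at the call site.
    public static int twice(int i) {
        return 2 * i;
    }

    public static void main(String[] args) {
        byte b = 100;
        System.out.println(twice(b)); // prints 200, which would not fit in a byte
        // byte narrowed = twice(b);  // would not compile: int to byte needs an explicit cast
    }
}
```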
 
Robert Dodier

From our point of view as Java programmers, we don't care.

Speak for yourself. Maybe you don't care, but the OP does care,
with good reason. Your sneering tone notwithstanding, you've
completely missed the point.

Robert Dodier
 
Digital Puer

Lew said:
From our point of view as Java programmers, we don't care. We care about the
language's semantics, not the machine's. In the Java language, a byte is
eight bits long precisely.

Summary: bytes in the JVM are actually eight bits long, they're simply stored
in containers that are four times larger than needed. None of this matters
unless you're programming in JVM assembly; Java programmers know that bytes
are eight bits long in the language regardless. Fretting over the JVM's
internal representation thereof is silly.


Thanks for the reply. However, I want to separate the language
semantics from the underlying implementation.

Yes, I am aware of the semantics of the signed byte, its value range,
and operations on it.

But what I want to know (and am still vague about, despite your
helpful reply) is how much RAM is taken up with the following
statement:

byte array[] = new byte[N];

Are N * 1 bytes allocated in memory, or N * 4 bytes allocated?

If N*1 bytes are allocated, I am still puzzled by your statement:
" they're simply stored in containers that are four times larger
than needed." Does this mean that when operations are
performed on the bytes, the JVM loads the byte into a 4-byte
container for the purpose of matching the opcode requirements?
But when the byte is "just sitting around" it's still actually just
1 byte long?
 
Mike Schilling

Robert said:
Speak for yourself. Maybe you don't care, but the OP does care,
with good reason. Your sneering tone notwithstanding, you've
completely missed the point.

The real answer is "That's up to the JVM implementer; within the
language, there's no way to tell." Note that is a very different
answer than you'd get for C or C++, where there are lots of ways to
tell, sizeof(char) being the simplest one.
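
For comparison, the nearest thing Java offers is the set of constants on the wrapper classes. The sketch below (class and method names are illustrative) prints them; they report the width of a value as fixed by the language, not the amount of memory a particular JVM uses to hold one.

```java
import java.util.Arrays;

public class ValueWidths {
    // Returns the language-defined widths, in bits, of byte, short, int and long.
    public static int[] widthsInBits() {
        return new int[] { Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(widthsInBits())); // prints [8, 16, 32, 64]
    }
}
```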

JVM implementers not being idiots, and arrays of bytes being used all
over the place in the system classes, I very much doubt that any JVM
implementation makes them four times as big as they have to be. It is
(IIRC) a common implementation that bytes used as local variables take
up a full 32 bits, but in normal cases that's a small enough fraction
of the total size of a new stack frame to be down in the noise.

A more interesting question is whether each entry in an array of
boolean takes up a full byte, when in principle only a single bit is
needed. An implementer needs to weigh the added cost of accessing a
single bit against the savings in space. I don't know what the usual
result is.
 
Arne Vajhøj

Mike said:
The real answer is "That's up to the JVM implementer; within the
language, there's no way to tell." Note that is a very different
answer than you'd get for C or C++, where there are lots of ways to
tell, sizeof(char) being the simplest one.

JVM implementers not being idiots, and arrays of bytes being used all
over the place in the system classes, I very much doubt that any JVM
implementation makes them four times as big as they have to be. It is
(IIRC) a common implementation that bytes used as local variables take
up a full 32 bits, but in normal cases that's a small enough fraction
of the total size of a new stack frame to be down in the noise.

A more interesting question is whether each entry in an array of
boolean takes up a full byte, when in principle only a single bit is
needed. An implementer needs to weigh the added cost of accessing a
single bit against the savings in space. I don't know what the usual
result is.

public class Sizeof {
    private final static int N = 10000000;

    // Approximate heap in use, after asking the JVM to collect garbage.
    public static long mem() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long m1 = mem();
        byte[] ba = new byte[N];
        long m2 = mem();
        // Growth in used heap divided by the number of elements.
        System.out.println("sizeof byte = " + (m2 - m1)*1.0/N);
    }
}

will give a hint!

(it indicates 1 for the Java version I am using)
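
The same measurement can be extended to compare a byte[] with an int[] of equal length. This is a rough sketch with illustrative names: System.gc() is only a request, so the figures are estimates and will vary between JVMs and runs.

```java
public class ArrayFootprint {
    private static final int N = 10_000_000;

    // Rough estimate of the heap currently in use.
    public static long usedMemory() {
        System.gc(); // a request only; the JVM is free to ignore it
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        byte[] bytes = new byte[N];
        long afterBytes = usedMemory();
        int[] ints = new int[N];
        long afterInts = usedMemory();

        System.out.println("approx. bytes per byte[] element: "
                + (afterBytes - before) / (double) N);
        System.out.println("approx. bytes per int[] element:  "
                + (afterInts - afterBytes) / (double) N);

        // Use both arrays after measuring so they are still reachable at that point.
        System.out.println(bytes.length + ints.length);
    }
}
```

If the two figures come out near 1 and 4, the JVM in question is storing byte[] elements one byte apiece.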

Arne
 
Patricia Shanahan

Digital Puer wrote:
....
But what I want to know (and am still vague about, despite your
helpful reply) is how much RAM is taken up with the following
statement:

byte array[] = new byte[N];

Are N * 1 bytes allocated in memory, or N * 4 bytes allocated?

Approximately N*1 bytes, at least on systems I've tested. Try this
program on yours:

public class ByteArraySizeTest {

    public static void main(String[] args) {
        long mem;
        byte[] big;
        int size = (int)4e7;
        mem = memoryInUse();
        System.out.println("Initial memory: "+mem);
        big = new byte[size];
        mem = memoryInUse();
        System.out.println("With big array: "+mem);
    }

    // Heap currently in use; System.gc() is only a hint to the JVM.
    private static long memoryInUse() {
        System.gc();
        return Runtime.getRuntime().totalMemory()
            - Runtime.getRuntime().freeMemory();
    }

}

Note that there may be different answers for two cases:

1. The per-element space in a byte[].

2. The space allocated for a byte variable.

Patricia
 
Tim Smith

Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.

If you are concerned about memory usage, here's something you might not
have known, and that many people find surprising when they first
encounter it. Suppose you've got large arrays of constants:

public static final byte[] table = {
1, 2, 4, 7, 84, 24, 19, ...
...
};

If you are used to other languages, you might expect that each byte in
this table takes one byte of storage in your .class file. In fact, it
takes around 6 or 7 bytes in the .class file per byte in the array!

I asked about this (on this group, in fact, I think) when I ran into it,
and was told that the reason is that the JVM doesn't have any special
instructions for setting up arrays. The code above ends up being a
series of JVM instructions to initialize the array, one byte at a time.
E.g., in pseudocode, something like this:

allocate the array
store a 1 in position 0
store a 2 in position 1
store a 4 in position 2
...

So, initializing an array of N bytes takes a lot more than N bytes, and
will also be much slower than it would be in, say, C, where the array
would be in an initialized data section in the object file, and would
require no further processing after being loaded into memory.
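
One workaround that is sometimes used, shown here only as a sketch, is to carry the table in a string literal and decode it once when the class is initialized. The class name is made up, and the seven values are just the first entries of the table above. In the class file, string constants are stored in modified UTF-8, so values 1 to 127 cost one byte each while 0 and 128 to 255 cost two, and a single constant is limited to 65535 encoded bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PackedTable {
    // Each char of the literal carries one table entry (octal escapes).
    private static final String ENCODED = "\001\002\004\007\124\030\023";

    // Decoded once, when the class is initialized. ISO-8859-1 maps
    // chars 0..255 to the byte with the same numeric value.
    private static final byte[] TABLE = ENCODED.getBytes(StandardCharsets.ISO_8859_1);

    // Returns a copy so callers cannot modify the shared table.
    public static byte[] table() {
        return TABLE.clone();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(table())); // prints [1, 2, 4, 7, 84, 24, 19]
    }
}
```

Whether the saving is worth the loss in readability depends on the table size; for very large tables, loading the data from a resource file at startup is another option.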
 
Lew

Robert said:
Speak for yourself. Maybe you don't care, but the OP does care,
with good reason. Your sneering tone notwithstanding, you've
completely missed the point.

Honestly, I don't know how you read "sneering" into my post. I was speaking
of the separation of concerns between Java programmers and the low-level JVM
details. In that regard, we, all of us, need not worry about the JVM
implementation as long as we are consistent with Java semantics.

I assure you my post was intended strictly as a technical discussion on the
matters the OP introduced, and that there was not any sneering involved.
 
Arne Vajhøj

Lew said:
Honestly, I don't know how you read "sneering" into my post. I was
speaking of the separation of concerns between Java programmers and the
low-level JVM details. In that regard, we, all of us, need not worry
about the JVM implementation as long as we are consistent with Java
semantics.

I assure you my post was intended strictly as a technical discussion on
the matters the OP introduced, and that there was not any sneering
involved.

I think the keyword in the OP was "memory-sensitive". The implementation
(using 1 or 4 bytes) should not have functional consequences, but it
can have an impact on memory usage.

It is preferable if you don't need to know about the implementation to
be sure the app will run in the available memory. But not everyone has
that luxury.

Arne
 
Daniele Futtorovic

Mike said:
The real answer is "That's up to the JVM implementer; within the
language, there's no way to tell." Note that is a very different
answer than you'd get for C or C++, where there are lots of ways to
tell, sizeof(char) being the simplest one.

JVM implementers not being idiots, and arrays of bytes being used all
over the place in the system classes, I very much doubt that any JVM
implementation makes them four times as big as they have to be.

Yes. It seems there's been some confusion in this thread between the
byte type and the byte array (byte[]) type. They have little to do with
each other. The JVM doesn't create arrays of byte types, but
array-of-byte types, using the newarray op. How newarray is implemented
seems to be left up to the implementer, as far as I can surmise, so the
reasoning above would seem to be as much as one can tell.

Mike said:
It is (IIRC) a common implementation that bytes used as local
variables take up a full 32 bits, but in normal cases that's a small
enough fraction of the total size of a new stack frame to be down in
the noise.

The computational type for boolean, byte, short, char and int is
specified to be int, so this is more than "common implementation". See:
<http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#37906>

Mike said:
A more interesting question is whether each entry in an array of
boolean takes up a full byte, when in principle only a single bit is
needed. An implementer needs to weigh the added cost of accessing a
single bit against the savings in space. I don't know what the
usual result is.

boolean arrays and byte arrays share the same access ops: baload and
bastore. So, yes, an array of /n/ boolean will take up the same size as
an array of /n/ bytes. See:
<http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#22909>

DF.
 
Mike Schilling

Daniele said:
The real answer is "That's up to the JVM implementer; within the
language, there's no way to tell." Note that is a very different
answer than you'd get for C or C++, where there are lots of ways to
tell, sizeof(char) being the simplest one.

JVM implementers not being idiots, and arrays of bytes being used all
over the place in the system classes, I very much doubt that any JVM
implementation makes them four times as big as they have to be.

Yes. It seems there's been some confusion in this thread between the
byte type and the byte array (byte[]) type. They have little to do
with each other. The JVM doesn't create arrays of byte types, but
array-of-byte types, using the newarray op. How newarray is
implemented seems to be left up to the implementer, as far as I can
surmise, so the reasoning above would seem to be as much as one can
tell.
It is (IIRC) a common implementation that bytes used as local
variables take up a full 32 bits, but in normal cases that's a small
enough fraction of the total size of a new stack frame to be down in
the noise.

The computational type for boolean, byte, short, char and int is
specified to be int, so this is more than "common implementation".
See:
<http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#37906>

That says how computation is done on them; it doesn't prove that
they're not stored in their "natural" size and sign-extended to
perform arithmetic on them.

Daniele said:
boolean arrays and byte arrays share the same access ops: baload and
bastore. So, yes, an array of /n/ boolean will take up the same size
as an array of /n/ bytes. See:
<http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#22909>

That's not proof either. If each array has a descriptor giving its
base type, the implementation of baload and bastore can access the two
types differently. See

http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc1.html:

"In Sun's implementation of the Java virtual machine, boolean arrays
(arrays of type T_BOOLEAN; see §3.2 and the description of the
newarray instruction in this chapter) are implemented as arrays of
8-bit values. Other implementations may implement packed boolean
arrays; in such implementations the bastore instruction must be able
to store boolean values into packed boolean arrays as well as byte
values into byte arrays."
 
Mike Schilling

Arne said:
I think the keyword in the OP was "memory-sensitive". The
implementation (using 1 or 4 bytes) should not have functional
consequences, but it can have an impact on memory usage.

It is preferable if you don't need to know about the implementation to
be sure the app will run in the available memory. But not everyone
has that luxury.

What Arne said. There's a difference between a program being correct
and running acceptably.
 
Daniele Futtorovic

On 2008-02-04 07:58 +0100, Mike Schilling allegedly wrote:

Haven't got anything to answer to that, so I'll just repost your
information :)


<http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#37906>
says that computation on boolean, byte, short, char and int is done
as on ints, it doesn't prove that they're not stored in their
"natural" size and sign-extended to perform arithmetic on them.


About the sizes of byte arrays vs. boolean arrays:
"In Sun's implementation of the Java virtual machine, boolean arrays
(arrays of type T_BOOLEAN; see §3.2 and the description of the
newarray instruction in this chapter) are implemented as arrays of
8-bit values. Other implementations may implement packed boolean
arrays; in such implementations the bastore instruction must be able
to store boolean values into packed boolean arrays as well as byte
values into byte arrays."
From:
<http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc1.html>
 
Lew

Mike said:
What Arne said. There's a difference between a program being correct
and running acceptably.

Well, the difference in byte storage size wouldn't make a large difference in
most cases.

The byte [] difference could be significant, but I only addressed the question
as asked by the OP:
Is a byte data type really a 32-bit int in the JVM?

To which the answer is, "No, not really, but it gets stored in a 32-bit
location."

Others addressed the other question regarding byte []. That's the power of
Usenet - we each who answer can address the part of the original post that we
feel best able to handle.
 
Roedy Green

Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.

Logically they are 8 bits, but the way the JVM actually does it is
its business, so long as it LOOKS like 8-bits to the programmer.
Similarly it can use big or little endian values internally so long as
they look big endian to DataOutputStream.
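
The DataOutputStream point can be checked directly. A small sketch with illustrative names: writeInt is documented to emit four bytes, high byte first, whatever the host's native byte order is.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BigEndianOutput {
    // Serializes one int with DataOutputStream and returns the bytes written.
    public static byte[] encode(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // four bytes, most significant first
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(0x01020304))); // prints [1, 2, 3, 4]
    }
}
```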

In practice, an array of bytes would be 8 bits per element.
A primitive local variable would be 32 bits.
A primitive inside an object would be either 8 or 32 bits, maybe even
64 bits.
 
Roedy Green

A more interesting question is whether each entry in an array of
boolean takes up a full byte, when in principle only a single bit is
needed. An implementer needs to weigh the added cost of accessing a
single bit against the savings in space. I don't know what the usual
result is.

I would think the decision would be based on whether the machine has
bit addressability. If you had to shift to construct the address and
index to create a mask, storing packed as 1-bit would not pay.

The sort of hardware you would like is one that can get/set a bit,
clearing high bits, given two registers, one pointing to the base and
one giving the bit offset. Anything less and you would likely go for
8-bits per byte for access speed.
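
To make the shift-and-mask cost concrete, here is a hand-rolled sketch of a packed flag array. PackedBits is a hypothetical class written for illustration, not anything a JVM does internally: every access needs a shift to find the word and a mask to pick out the bit.

```java
public class PackedBits {
    private final long[] words;

    public PackedBits(int size) {
        words = new long[(size + 63) >>> 6]; // 64 flags per long, rounded up
    }

    // Sets or clears one flag; returns this so calls can be chained.
    public PackedBits set(int index, boolean value) {
        long mask = 1L << (index & 63);  // mask selecting the bit within its word
        if (value) {
            words[index >>> 6] |= mask;  // shift finds the word, OR sets the bit
        } else {
            words[index >>> 6] &= ~mask; // AND with the inverted mask clears it
        }
        return this;
    }

    public boolean get(int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    public static void main(String[] args) {
        PackedBits flags = new PackedBits(100).set(70, true);
        System.out.println(flags.get(70)); // prints true
        System.out.println(flags.get(69)); // prints false
    }
}
```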

Java has java.util.BitSet when you definitely want bits packed.
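
A brief usage sketch of java.util.BitSet; the class name FlagsDemo and the everyThird helper are made up for illustration.

```java
import java.util.BitSet;

public class FlagsDemo {
    // Builds a BitSet with every third index below n set.
    public static BitSet everyThird(int n) {
        BitSet flags = new BitSet(n);
        for (int i = 0; i < n; i += 3) {
            flags.set(i);
        }
        return flags;
    }

    public static void main(String[] args) {
        BitSet flags = everyThird(10);
        System.out.println(flags);               // prints {0, 3, 6, 9}
        System.out.println(flags.get(3));        // prints true
        System.out.println(flags.cardinality()); // prints 4
    }
}
```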
 
Roedy Green

Honestly, I don't know how you read "sneering" into my post. I was speaking
of the separation of concerns between Java programmers and the low-level JVM
details. In that regard, we, all of us, need not worry about the JVM
implementation as long as we are consistent with Java semantics.

Java makes a big distinction about how the JVM logically behaves and
how it is implemented on a given platform. The implementor has extreme
freedom to do things the way he wants. It makes no sense then to ask
how Java works under the hood. It only makes sense to ask how some
particular implementation does.

In C++ there is nowhere near the same sharp distinction. Programs
fend for themselves discovering implementation differences. In Java,
apps behave the same no matter how they are implemented. You
have to make inferences from memory utilisation and speed
measurements, not from results of computation.
 
