Digital said:
Is a byte data type really a 32-bit int in the JVM? More
specifically, if I have an array of N byte types, are N
32-bit ints actually allocated underneath? I am writing
a memory-sensitive application and would appreciate
some insight.
I came across this tidbit saying within the JVM, a 'byte'
data type is actually a 32-bit integer:
http://www.jguru.com/faq/view.jsp?EID=13647
It would help if you would quote the passage of interest:
In the Java Virtual Machine, bytes, shorts and ints are all four bytes long.
This statement is irrelevant. The author doesn't know what he's talking
about. What is relevant is the Java Language Specification:
The integral types are byte, short, int, and long,
whose values are 8-bit, 16-bit, 32-bit and 64-bit
signed two's-complement integers, respectively,
and char, whose values are 16-bit unsigned integers representing UTF-16 code units (§3.1).
<http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2>
Whatever a byte is in the JVM, it's an eight-bit quantity in Java.
This is just the same as in C or C++, or really any language. Just because a
byte is eight bits long in some implementation of C doesn't mean that its host
platform stores it that way; it very well could be stored and manipulated as a
32-bit value with 24 bits ignored.
Do you let that bother you when you're programming in C?
Checking the JVM specification, I think I can confirm
that assertion. In sections 3.11.1 and 3.11.4, it is
mentioned that:
"As noted in §3.11.1, values of type byte, char, and short
are internally widened to type int, making these conversions
implicit."
Can someone please clear this up for me?
Yes. Like any machine, virtual or actual, the internal opcodes may deal with
wider types than expressed in the language. Floating-point, for example, may
be defined in terms of 64-bit values but calculated by the machine at the low
level with 80-bit values.
From our point of view as Java programmers, we don't care. We care about the
language's semantics, not the machine's. In the Java language, a byte is
eight bits long precisely.
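A minimal sketch of what "eight bits long" means at the language level, assuming nothing beyond java.lang. The class name ByteSemantics and the helper increment are made up for illustration:

```java
// Sketch: the language-level width of a byte, whatever the JVM does inside.
class ByteSemantics {
    // Adds one, then narrows back to byte; the cast discards everything
    // above the low eight bits, so the value wraps around.
    static byte increment(byte b) {
        return (byte) (b + 1);
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);                              // 8
        System.out.println(Byte.MIN_VALUE + ".." + Byte.MAX_VALUE); // -128..127
        System.out.println(increment(Byte.MAX_VALUE));              // -128
    }
}
```

The wraparound from 127 to -128 is the behavior the JLS guarantees, no matter how wide the slot holding the value happens to be.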
That said, let's look at the JVM spec again.
byte, whose values are 8-bit signed two's-complement integers
That sure doesn't say that bytes are 32-bit quantities in the JVM.
Why would 3.11.1 and 3.11.4 contradict that?
§3.11.1:
For the majority of typed instructions, the instruction type is represented explicitly
in the opcode mnemonic by a letter: i for an int operation, l for long, s for short,
b for byte, c for char, f for float, d for double, and a for reference.
Some instructions for which the type is unambiguous do not have a type letter in their mnemonic.
So the JVM has opcodes that recognize bytes, defined in §3.3 as 8-bit
quantities. Still more evidence that bytes are not 32-bit quantities in the JVM.
But wait! There is this:
Compilers encode loads of literal values of types byte and short using Java
virtual machine instructions that sign-extend those values to values of
type int at compile time or run time.
But that is specific to loads of literal values, not all instructions, and
only because there aren't specific opcodes to load bytes. The reason for that is:
Given the Java virtual machine's one-byte opcode size, encoding types into
opcodes places pressure on the design of its instruction set.
So the conversion into int is a hack to account for the limited opcode set.
It doesn't make bytes generally into 32-bit quantities.
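You can see the same widening from the Java side, where the language promotes byte operands to int for arithmetic. Here is a rough sketch that uses overload resolution to report the static type of an expression. The names PromotionDemo, kind, kindOfValue and kindOfSum are hypothetical:

```java
// Sketch: a byte variable is a byte, but arithmetic on bytes yields an int.
class PromotionDemo {
    static String kind(byte x) { return "byte"; }
    static String kind(int x)  { return "int"; }

    // A plain byte variable selects the byte overload.
    static String kindOfValue() {
        byte b = 10;
        return kind(b);
    }

    // The operands of + are promoted to int, so the int overload is selected.
    // For the same reason, "byte c = a + b;" would not compile without a cast.
    static String kindOfSum() {
        byte a = 10, b = 20;
        return kind(a + b);
    }

    public static void main(String[] args) {
        System.out.println(kindOfValue()); // byte
        System.out.println(kindOfSum());   // int
    }
}
```

The value is still a byte whenever you store it back; only the intermediate computation is done at int width.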
What about the quote you took out of context? That was the clue that the JVM
supports narrow integral types by ignoring high-order bits. In other words,
to the JVM a byte is exactly eight bits long, but it's stored in a container
that is 32 bits wide. It's like packing a steamer trunk with a single change
of clothes. Just because the trunk is large doesn't mean you won't need to
buy another shirt if you spill soy sauce on the one you packed.
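To make the trunk picture concrete, here is a small sketch of what happens at the boundary between the 8-bit value and its 32-bit container: narrowing keeps only the low eight bits, and widening sign-extends them. TrunkDemo and its helper methods are illustrative names, not anything from the spec:

```java
// Sketch: moving a value between an int-sized container and a byte.
class TrunkDemo {
    // Narrowing: only the low eight bits survive the cast.
    static byte narrow(int wide) {
        return (byte) wide;
    }

    // Widening: the sign bit of the byte is copied into the upper 24 bits.
    static int widen(byte b) {
        return b;
    }

    // Masking off the upper 24 bits gives the byte's unsigned reading.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(narrow(300));                               // 44
        System.out.println(narrow(0x1FF));                             // -1
        System.out.println(Integer.toHexString(widen((byte) 0x80)));   // ffffff80
        System.out.println(unsigned((byte) 0x80));                     // 128
    }
}
```

Whatever sits in the upper 24 bits of the container never leaks into the byte value itself.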
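Summary: bytes in the JVM are actually eight bits long; on the operand stack
and in local variables they're simply stored in containers that are four times
larger than needed. As for your array of N bytes, the JVM spec doesn't dictate
how arrays are laid out in memory, and mainstream JVMs such as HotSpot pack a
byte[] at one byte per element plus a fixed header, so N bytes do not cost you
N ints. The rest of this doesn't matter unless you're programming in JVM
assembly; Java programmers know that bytes are eight bits long in the language
regardless. Fretting over the JVM's internal representation thereof is silly.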