Recognize & understand JVM bytecode
Gain familiarity with bytecode tools
Use bytecode to gain deeper insight into Java language features
Instruction set executed by the Java Virtual Machine
Conceptually similar to VB's P-code (or earlier precedents)
Provides portable representation of executable code
Can be interpreted or JIT compiled to native code
Most JVMs JIT-compile frequently-executed methods: "Hotspots"
Some experimental JVMs will re-JIT multiple times
"The assembly language of the Java generation"
Class file format could contain CPU instructions
In fact, could even contain instructions for multiple CPUs
This could provide WORA in a different fashion
Avoid hit of JIT compilation at runtime, or interpretation
Take advantage of CPU optimizations
So why bother?
Managed environments provide safety, robustness, security
Harder to do with raw CPU instructions
JVMIS (the JVM instruction set) can be optimized for a particular CPU at runtime if desired
But optimizations don't have to be decided at compile-time
WORA
javap: Java developer's best friend
disassemble any .class file, including Sun's
disassembled code == JVMIS code
Other languages starting to encroach on the JVM
Groovy, JRuby, others starting to challenge Java's supremacy
Bytecode-manipulation toolkits becoming more popular
Helps delineate where Java language leaves off
Question: How are inner classes implemented?
Question: How are generics implemented (in 1.5)?
Question: What's the cost of J2SE 1.4's assert keyword?
Better understanding of what Java compiler generates
Better understanding of the limitations of obfuscation
Useful for 3rd-party debugging/spelunking
Example: Hello.java (source listing not available)
Example: Hello.jbc (disassembled bytecode listing not available)
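A minimal sketch of a hello-world class and the kind of listing `javap -c` typically prints for it. The class name `HelloBytecode` and its message are hypothetical stand-ins, not the original sample; constant-pool indices are omitted because they vary by compiler version.

```java
// Hypothetical stand-in class (an assumption, not the original sample file).
final class HelloBytecode {
    static String message() {
        return "Hello, bytecode";
    }

    public static void main(String[] args) {
        System.out.println(message());
    }
}
// Compile, then disassemble with: javap -c HelloBytecode
// Typical shape of main (constant-pool indices omitted; they vary):
//   getstatic     java/lang/System.out : Ljava/io/PrintStream;
//   invokestatic  HelloBytecode.message : ()Ljava/lang/String;
//   invokevirtual java/io/PrintStream.println : (Ljava/lang/String;)V
//   return
```

Running `javap -c` on your own build is the way to confirm the exact listing for a given compiler.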
Dynamically loaded, linked at runtime
Loaded via cooperating collection of ClassLoaders
Verified at load-time
All operations produce/consume stack elements
Locals and incoming parameters live in the frame's local variable slots (part of the stack frame)
Stack slots are 32 bits wide (longs/doubles == 2 slots)
May or may not use real stack if JITted
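The stack model can be sketched with a toy interpreter. This is an illustration only: the opcode names mirror real ones, but the `TinyStackMachine` class, its `Insn` encoding, and the `demo` method are all hypothetical and far simpler than a real JVM frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a JVM frame: an operand stack plus numbered local slots.
// Illustrative only; real bytecode is encoded and verified very differently.
final class TinyStackMachine {
    enum Op { ICONST, ILOAD, ISTORE, IADD, IMUL, IRETURN }

    static final class Insn {
        final Op op;
        final int arg;
        Insn(Op op, int arg) { this.op = op; this.arg = arg; }
    }

    static Insn insn(Op op, int arg) { return new Insn(op, arg); }
    static Insn insn(Op op) { return new Insn(op, 0); }

    // Every instruction either pushes onto or pops from the operand stack.
    static int run(Insn[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn i : code) {
            switch (i.op) {
                case ICONST: stack.push(i.arg); break;
                case ILOAD: stack.push(locals[i.arg]); break;
                case ISTORE: locals[i.arg] = stack.pop(); break;
                case IADD: stack.push(stack.pop() + stack.pop()); break;
                case IMUL: stack.push(stack.pop() * stack.pop()); break;
                case IRETURN: return stack.pop();
            }
        }
        throw new IllegalStateException("fell off end of code without IRETURN");
    }

    // Models: static int f(int a, int b) { int s = a + b; return s * 10; }
    // Parameters occupy local slots 0 and 1; s occupies slot 2.
    static int demo(int a, int b) {
        Insn[] code = {
            insn(Op.ILOAD, 0), insn(Op.ILOAD, 1), insn(Op.IADD), insn(Op.ISTORE, 2),
            insn(Op.ILOAD, 2), insn(Op.ICONST, 10), insn(Op.IMUL), insn(Op.IRETURN)
        };
        return run(code, new int[] { a, b, 0 });
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 4)); // prints 70
    }
}
```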
"native" name format looks a bit different
primitives have single-letter codes (I, J, V, ...)
classes are "Ljava/lang/String;" format
array typenames are "[" followed by the element type, with one "[" per dimension (e.g. "[I", "[[Ljava/lang/String;")
"$" often used for synthesized class/field/method names
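One way to see these names without a disassembler is to ask the JDK for a method descriptor. The sketch below uses `java.lang.invoke.MethodType`; the `DescriptorDemo` class and its `descriptor` helper are hypothetical names for illustration.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    // Builds the JVM-internal descriptor for a method with the given signature.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[]): one "[" for the array, then the class name in L...; form
        System.out.println(descriptor(void.class, String[].class));               // ([Ljava/lang/String;)V
        // long f(int[][], boolean): two "[" for two dimensions, Z is boolean, J is long
        System.out.println(descriptor(long.class, int[][].class, boolean.class)); // ([[IZ)J
        // int g()
        System.out.println(descriptor(int.class));                                // ()I
    }
}
```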
Stack manipulation
dup, dup2: Duplicate top element of stack (pop, push, push); dup2 duplicates the top two slots
pop, pop2: Remove top element of stack; pop2 removes the top two slots
Push constant value onto stack
aconst_null
bipush (-128 to 127), sipush (-32768 to 32767)
dconst_N, fconst_N, iconst_N, lconst_N, ldc X, ldc2_w X
Local load: Push content of local var onto stack
aload, iload, ...: one for each data type (a, d, f, i, l)
Local store: Pop top element of stack into local var
astore, istore, ...: one for each data type (a, d, f, i, l)
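A small sketch of how constants, loads, and stores show up for ordinary local variables. The comments describe what javac typically emits for this code; `LoadStoreDemo` is a hypothetical class, and the listing should be checked with `javap -c` on your own compiler.

```java
final class LoadStoreDemo {
    // Static method with no parameters, so local slots start at 0.
    static int demo() {
        int x = 5;          // iconst_5, istore_0
        int y = 100;        // bipush 100, istore_1
        int z = 1000;       // sipush 1000, istore_2
        int w = 100000;     // ldc (int constant from the constant pool), istore_3
        long big = 1L;      // lconst_1, lstore 4 (a long occupies slots 4 and 5)
        Object ref = null;  // aconst_null, astore 6
        return x;           // iload_0, ireturn
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The choice among iconst, bipush, sipush, and ldc depends only on the size of the constant.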
Arithmetic operations
Data conversion: convert TOS to different type
X2Y opcode naming convention
d2f, d2i, d2l, f2d, f2i, f2l, ...
tadd, trem, tsub, tmul, tdiv: + % - * / for all types t (d, f, i, l)
iand, ior, ishl, ishr, ixor, ...: bitwise operations (int)
dcmpg, dcmpl, fcmpg, fcmpl, lcmp: Comparison ops
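The arithmetic, conversion, and comparison opcodes map closely onto Java expressions. The sketch below annotates a few one-line methods with the instructions javac typically emits; `ArithmeticDemo` and its method names are hypothetical.

```java
final class ArithmeticDemo {
    static int addAndScale(int a, int b) {
        int sum = a + b;   // iload_0, iload_1, iadd, istore_2
        return sum * 10;   // iload_2, bipush 10, imul, ireturn
    }

    static long widen(int i) {
        return i;          // iload_0, i2l, lreturn (implicit widening still needs an opcode)
    }

    static int truncate(double d) {
        return (int) d;    // dload_0, d2i, ireturn
    }

    static boolean less(long a, long b) {
        // lload_0, lload_2, lcmp, then a conditional branch on the int result
        return a < b;
    }

    public static void main(String[] args) {
        System.out.println(addAndScale(3, 4));
        System.out.println(truncate(3.9));
    }
}
```

Note that `less` loads its second argument from slot 2, since the first long takes slots 0 and 1.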
Branching, control flow
nop: Do nothing
goto: Branch always
ifeq, ifge, ifgt, ifle, iflt, ifne, ifnonnull, ifnull: Branch if the condition holds for TOS
jsr: Jump to location, push return location
areturn, dreturn, freturn, ireturn, lreturn, return: Return from method (typed value, or void)
lookupswitch: switch/case implementation
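A sketch of how Java control flow maps onto these branch opcodes, with comments describing typical javac output. `BranchDemo` is a hypothetical class. The `dense` method is included as a contrast: for closely packed case values javac typically picks `tableswitch`, a sibling opcode not listed above.

```java
final class BranchDemo {
    static String sign(int n) {
        if (n == 0) {                            // iload_0, ifne <skip the block>
            return "zero";                       // ldc "zero", areturn
        }
        return n > 0 ? "positive" : "negative";  // iload_0, ifle <negative branch>, ..., areturn
    }

    // Widely spaced case values: compiled to lookupswitch (sorted key/offset pairs).
    static String sparse(int code) {
        switch (code) {
            case 1: return "one";
            case 100: return "hundred";
            case 10000: return "ten thousand";
            default: return "other";
        }
    }

    // Closely packed case values: typically compiled to tableswitch (a jump table).
    static String dense(int code) {
        switch (code) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(sign(-5) + " " + sparse(100) + " " + dense(2));
    }
}
```

Note that the compiler branches on the inverse of the source condition: `if (n == 0)` becomes "jump past the block if n is not zero".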
Object model instructions
new, newarray, anewarray: Create object or array, push ref
getfield, putfield: Get/Put top of stack (TOS) from/to field
getstatic, putstatic: Get/Put from/to static field
checkcast: Throw exception if top-of-stack is not of type
instanceof: Push 1 if TOS is of type, else push 0
invokevirtual: Invoke method on TOS using dynamic binding
invokestatic: Invoke static method
invokespecial: Invoke method on TOS w/o dynamic binding
invokeinterface: Invoke method on TOS through interface
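The object model instructions can be seen in a few lines of everyday Java. The comments below describe typical javac output; `ObjectModelDemo` and its `count` field are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ObjectModelDemo {
    private int count;

    static int demo(Object input) {
        ObjectModelDemo d = new ObjectModelDemo();  // new, dup, invokespecial <init>
        d.count = 1;                                // putfield count
        List<String> names = new ArrayList<>();     // new, dup, invokespecial <init>
        names.add("a");                             // invokeinterface List.add (static type is an interface)
        StringBuilder sb = new StringBuilder();
        sb.append("b");                             // invokevirtual StringBuilder.append
        if (input instanceof String) {              // instanceof java/lang/String
            String s = (String) input;              // checkcast java/lang/String
            // invokevirtual String.length, getfield count, invokestatic Math.max
            return Math.max(s.length(), d.count);
        }
        return names.size() + sb.length() + d.count;
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
        System.out.println(demo(42));
    }
}
```

The `new, dup, invokespecial` triple appears because the constructor call consumes one copy of the reference and the assignment needs the other.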
Exception Handling
takes the form of an EH "table" alongside the method
each line is a "from, to, target, type" tuple
from: opcode offset where the protected range starts
to: opcode offset where the protected range ends
target: opcode offset to go to when an exception of type "type" is thrown
type can be "any", usually indicating finally block
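A sketch of a method whose exception table has both a typed entry and "any" entries. The table shape in the trailing comment is what javac typically produces for try/catch/finally; offsets are left out because they depend on the compiler. `ExceptionTableDemo` and its parameters are hypothetical.

```java
final class ExceptionTableDemo {
    // finallyCount[0] is incremented on every exit path, which makes the
    // duplicated finally code observable.
    static int parseOrDefault(String text, int fallback, int[] finallyCount) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        } finally {
            finallyCount[0]++;
        }
    }

    public static void main(String[] args) {
        int[] count = { 0 };
        System.out.println(parseOrDefault("12", -1, count));
        System.out.println(parseOrDefault("oops", -1, count));
    }
}
// javap -c prints an "Exception table:" after the method's code. Typical shape
// (offsets omitted on purpose):
//   from  to  target  type
//    ..   ..    ..    Class java/lang/NumberFormatException   (the catch block)
//    ..   ..    ..    any                                     (finally, covering the try body)
//    ..   ..    ..    any                                     (finally, covering the catch body)
```

Modern javac inlines a copy of the finally body on each exit path rather than using jsr.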
"How are inner classes implemented?"
Think about it: JVM states that fundamental atom is a class
Class private boundaries are enforced: no class gets access to another class's private parts
So, without changing the JVM, how are inner classes handled?
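One way to probe this is with reflection. The sketch below shows that an inner class is a separate class with a `$` in its binary name, and that the compiler passes the enclosing instance as a hidden constructor argument. `InnerClassDemo` and its members are hypothetical. As for private access: older compilers generated synthetic accessor methods in the outer class, while Java 11 and later rely on nest-based access control.

```java
import java.lang.reflect.Constructor;

final class InnerClassDemo {
    private int secret = 42;

    class Inner {
        // Reads the enclosing instance's private field.
        int peek() { return secret; }
    }

    // Binary name of the inner class; it is compiled to its own .class file.
    static String innerBinaryName() {
        return Inner.class.getName();
    }

    // Inner declares no constructor, yet its compiled constructor takes one
    // argument: the enclosing InnerClassDemo instance.
    static Class<?> hiddenConstructorParam() {
        Constructor<?> ctor = Inner.class.getDeclaredConstructors()[0];
        return ctor.getParameterTypes()[0];
    }

    static int peekViaInner() {
        InnerClassDemo outer = new InnerClassDemo();
        return outer.new Inner().peek();
    }

    public static void main(String[] args) {
        System.out.println(innerBinaryName());
        System.out.println(hiddenConstructorParam());
        System.out.println(peekViaInner());
    }
}
```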
"What's the cost of using assert?"
Part of the goodness of C/C++ assert() was zero overhead in non-debug, production builds
J2SE 1.4 introduced assert language keyword
What's the cost of using it, even if turned off?
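A sketch that makes the cost visible: javac adds a synthetic static boolean field named `$assertionsDisabled` to any class containing an assert, and each assert first tests that field. `AssertCostDemo` and its methods are hypothetical; the bytecode outline in the comment is approximate.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class AssertCostDemo {
    static int checkedHalf(int n) {
        // Roughly: getstatic $assertionsDisabled; ifne <skip>;
        //          <evaluate condition>; new AssertionError; athrow
        assert n % 2 == 0 : "expected an even number";
        return n / 2;
    }

    // Looks for the compiler-generated flag via reflection.
    static boolean hasAssertFlag() {
        for (Field f : AssertCostDemo.class.getDeclaredFields()) {
            if (f.getName().equals("$assertionsDisabled")
                    && Modifier.isStatic(f.getModifiers())
                    && f.getType() == boolean.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAssertFlag());
        System.out.println(checkedHalf(8));
    }
}
```

So with assertions off, the residual cost is one static field read and one branch per assert, plus the extra bytecode carried in the class file; a JIT compiler can often optimize the check away.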
"How are generics implemented (in 1.5)?"
We're being sold on the goodness of typesafe containers
But is there a cost?
How, without changing the JVM, are generics handled?
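A sketch of type erasure in action: both parameterizations share one runtime class, and the compiler inserts a `checkcast` wherever a generic result is used as a specific type. `ErasureDemo` is a hypothetical class; the raw-type assignment is deliberate, to show where the cast fails.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // List<String> and List<Integer> are the same class at runtime.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsAtUseSite() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(Integer.valueOf(1));    // allowed: the bytecode only sees Object
        try {
            String s = strings.get(0);  // invokeinterface List.get, then checkcast String
            return false;
        } catch (ClassCastException expected) {
            return true;                // the compiler-inserted checkcast threw
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(castFailsAtUseSite());
    }
}
```

The runtime cost is therefore the inserted casts at use sites, the same ones a pre-generics programmer would have written by hand.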
"Lambdas: How are they implemented (in 1.8)?"
We're told they're not a breaking change to the JVM
But in Java, everything needs to be in a class!
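A sketch that looks for the compiler's footprint: javac typically moves a lambda body into a private synthetic method on the enclosing class (named with a `lambda$` prefix) and emits an `invokedynamic` call site that creates the functional-interface instance at runtime. `LambdaShapeDemo` and `DOUBLER` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.function.IntUnaryOperator;

final class LambdaShapeDemo {
    // The lambda body is compiled into a synthetic method of this class;
    // the field initializer uses invokedynamic to obtain the IntUnaryOperator.
    static final IntUnaryOperator DOUBLER = x -> x * 2;

    // Searches this class for the compiler-generated lambda body.
    static boolean hasSyntheticLambdaMethod() {
        for (Method m : LambdaShapeDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSyntheticLambdaMethod());
        System.out.println(DOUBLER.applyAsInt(21));
    }
}
```

The class that implements the interface is generated at runtime, not by javac, which is why no extra .class file appears on disk.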
"Lambdas: How do they capture references (in 1.8)?"
If a lambda captures a reference from enclosing scope...
... is that reference mutable?
... is the object on the other side of that reference mutable?
... does the lambda capture a copy, or the actual reference?
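A sketch that answers these questions experimentally: the lambda captures a copy of the local variable's value (here, a reference), the variable itself must be effectively final, and the object it points to stays mutable. `LambdaCaptureDemo` is a hypothetical class.

```java
import java.util.function.Supplier;

final class LambdaCaptureDemo {
    static String demo() {
        StringBuilder sb = new StringBuilder("a");      // effectively final local
        Supplier<String> reader = () -> sb.toString();  // captures the value of sb (a reference)
        sb.append("b");  // mutating the object is allowed and visible through the lambda
        // sb = new StringBuilder();  // would not compile: sb must stay effectively final
        return reader.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints ab
    }
}
```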
JVMIS is good to know
offers insights into underlying platform
offers access to power Java doesn't provide
aids debugging and spelunking
crucial to understanding compiler optimizations and/or costs
way to justify all those college courses on assembly language
just plain fun!
Books
Programming for the Java Virtual Machine, by Engel
Inside the Java 2 Virtual Machine, by Venners
The Java Virtual Machine Specification, 2nd Ed, by Lindholm and Yellin
Who is this guy?
Architect, Engineering Manager/Leader, "force multiplier"
Principal -- Neward & Associates
http://www.newardassociates.com
Educative (http://educative.io) Author
Performance Management for Engineering Managers
Author
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; O'Reilly, 2003)
Server-Based Java Programming (Manning, 2000)