ted.neward@newardassociates.com | Blog: http://blogs.newardassociates.com | Github: tedneward | LinkedIn: tedneward
Learning how the JVM manages memory
What it means when we say "managing memory"
Getting an overview of memory management
I will NOT be telling you the super-secret "make it go fast" switch
I will NOT be able to debug your OutOfMemoryError
I will NOT be doing Really Cool Demos (TM)
no computer has an infinite amount of memory
programs must thus obtain, use, and release memory as they execute
thus, "memory management" is the manner in which we obtain and release memory
manual memory management
"memory management is too important to be left up to the system"
automatic memory management
"memory management is too important to be left up to the programmer"
manual: human error
space reclaimed twice ("double deletion")
reclaimed space accessed ("dangling pointer")
space never reclaimed ("memory leak")
space overwritten ("memory corruption")
automatic: machine ignorance
space managed ineffectively ("fragmentation")
space not reclaimed immediately ("pause time" and "deferral")
overhead (time, space) incurred along the way
and both can run into similar kinds of problems
management overhead (time, space)
fragmentation
allocation of free space
reclamation of unused space
If we're lucky, it'll have a way to determine which space is which
safety
prime directive: no object still "in use" should go away!
throughput
number of objects that can be passed through the system
speed/pause time
how long does it take to do a reclamation?
space overhead
above and beyond the actual allocated space
completeness
do we get all the reclamation-eligible space?
scalability
how well can it take advantage of multi-core hardware?
portability
how well can it be adapted to different hardware configuration/platforms?
pause time
how long it takes a GC to conduct a collection
concurrency (stop-the-world)
pausing all activity to conduct a collection
allocator
the entity responsible for heap memory management
usually consists of at least two operations: allocate and release
responsible for tracking allocated and free space
collector
the entity responsible for automatic memory reclamation
will usually work closely with the allocator
mutator
the entity executing application code and mutating the object graph
references
handles to blocks of allocated memory (often "objects" or "variables")
activation record / stack frame
the block of memory allocated for use by a block of code
function / procedure / method
block of code for execution
object
space allocated for language usage purposes, typically for an atomic entity as defined by the language
"live" objects are those still in use
"dead" objects are those no longer used (but not reclaimed)
heap
complete amount of space in the process not dedicated to other use (stack, OS, etc)
page
an OS-determined "chunk" of the heap
granule
smallest amount of memory allocable, usually a machine-sized word
remembered set
keeping a separate list of interesting references between two sets of objects; eliminates need for scanning
a giant array of words/bytes, addressable by indices (pointers)
operating systems put some structure on that space
they may reserve some of it for their own use/management
... but the rest of it is entirely up to the program
including how to partition that space for use
some of which will be for code, but more of it for data
... and it might seem really really big
32-bit pointers means 4GB of available addresses
64-bit pointers means more available addresses than stars
... but it's still finite
"640k should be enough for anyone...." -B Gates
physical limitations (even with swap space)
static allocation
stack allocation
heap allocation
"How do I know what to allocate/deallocate?"
"When do I allocate/deallocate?"
"How do I manage allocated storage?"
all names are bound to memory at compile-time
limitations:
the size of each data structure must be known at compile-time
no recursion permitted since all procedure activations will share the same locations for local names
data structures cannot be created dynamically
"How do I know what to allocate":
known at compile time/first analysis or startup
"When do I allocate/deallocate":
allocate: at program start
deallocate: at program termination
"How do I manage":
typically simple linear allocation
decided at compile-time
... but of course this can vary
storage is allocated for use by procedure call records (the "stack")
"activation records"/"frames" created per procedure
local names/values bound to space in these areas
implicitly destroyed when the activation frame "pops"
recursion is possible, since each call gets a new frame
size of local allocations can vary
local names/values cannot persist from one activation to the next
called activation record cannot outlive its caller
only an object whose size is known at compile-time can be returned as the result of a procedure
fast cleanup; everything is cleaned up with the "pop" of one stack frame
fixed-order cleanup; reversed order from allocation
"How do I know what to allocate":
declared/known at compile-time
"When do I alloc/dealloc":
allocate: at procedure entry
deallocate: at procedure exit
"How do I manage":
simple linear allocation (on the thread or activation frame)
based on sizes known at compilation time
the "rest" of the available memory, with no structure imposed
storage allocated out of the heap can be allocated/deallocated in any order
implications:
size is no longer fixed and can vary dynamically
heap-allocated storage can outlive the activation record
slow cleanup; each allocation must be individually released
no fixed order for cleanup
"How do I know what": deferred until actual request
"When do I alloc/dealloc":
allocate: on demand (when requested)
deallocate: how do we identify live data?
"How do I manage":
how do you want to manage it?
algorithms for heap-managed space abound
static
code
"globals" or "object static"
stack
procedure/method activation records
typically per-thread
heap
all other space
usually either managed manually or automatically (GC)
allocate each object sequentially out of the total space
maintain just a "next" pointer
on allocation, just bump the pointer by size of allocation
when the end of the space is reached, space is exhausted
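The sequential ("bump the pointer") scheme above fits in a few lines of Java. This is a purely illustrative sketch (the class name, offsets, and -1 sentinel are my own, not from the talk):

```java
// Hypothetical sketch of a sequential ("bump the pointer") allocator.
public class BumpAllocator {
    private final int capacity; // total size of the managed space
    private int next = 0;       // the single "next free" pointer

    public BumpAllocator(int capacity) { this.capacity = capacity; }

    // Returns the offset of the new allocation, or -1 when space is exhausted.
    public int allocate(int size) {
        if (next + size > capacity) return -1; // end of the space reached
        int result = next;
        next += size; // just bump the pointer by the allocation size
        return result;
    }
}
```

Note there is no free operation at all: this scheme only ever moves forward, which is why it pairs naturally with collectors that reclaim a whole region at once.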
maintain a list of allocated/free sub-chunks
allocate out of the chunks (often sequentially)
when the chunk is no longer big enough, move to the next chunk
maintain a list of open spaces (regions)
crawl through the available list
first one that's large enough to contain the request, allocate it
update:
either that allocation is the entire region
create a new free region and update the list
maintain a list of open spaces (regions)
sort or group the regions by size
crawl through the available-region list
find the best fit, allocate it
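The first-fit strategy above can be sketched over a list of (offset, size) regions. Everything here (class names, the -1 sentinel) is illustrative, not a real allocator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical first-fit free-list allocator; regions are (offset, size) pairs.
public class FreeListAllocator {
    static class Region {
        int offset, size;
        Region(int offset, int size) { this.offset = offset; this.size = size; }
    }

    private final List<Region> free = new ArrayList<>();

    public FreeListAllocator(int capacity) {
        free.add(new Region(0, capacity)); // start with one big open region
    }

    // Crawl the list; take the first region large enough, or return -1.
    public int allocateFirstFit(int size) {
        for (int i = 0; i < free.size(); i++) {
            Region r = free.get(i);
            if (r.size >= size) {
                int result = r.offset;
                if (r.size == size) {
                    free.remove(i);        // allocation consumed the entire region
                } else {
                    r.offset += size;      // shrink the region: create the new
                    r.size -= size;        // free region and update the list
                }
                return result;
            }
        }
        return -1; // no region large enough
    }
}
```

Best-fit would differ only in the scan: instead of returning at the first match, it would examine every region and pick the smallest one that still fits.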
garbage determination
deallocation/release
cleanup
tactical (individual allocation) cleanup
strategic cleanup
Reference counting
Mark-sweep
Mark-compact
Copying
deferred reference counting
coalescing reference counting
buffered reference counting
multi-region copying
Mark-sweep-compact
Generational collectors
some will combine them together
Python: reference counting + mark/sweep
JVM: copying + generational
others will allow for "tuning" in varying ways
JVM offers a gazillion tuning parameters
CLR puts "large objects" into their own heap
each time an object is referenced, count goes up
each time an object is "released", count goes down
at count zero, object is eligible for reclamation
typically the object frees itself
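The count-up/count-down mechanics sketch out simply; this is a toy model of the retain/release style (the method names follow Objective-C convention, not any Java API):

```java
// Hypothetical sketch of manual reference counting (the Objective-C/COM style).
public class RefCounted {
    private int count = 1;        // starts "owned" by its creator
    private boolean freed = false;

    public void retain() { count++; }   // a new reference was taken: count goes up

    public void release() {             // a reference was dropped: count goes down
        if (--count == 0) {
            freed = true;               // at count zero, the object frees itself
            onFree();
        }
    }

    protected void onFree() { /* reclaim memory/resources here */ }

    public boolean isFreed() { return freed; }
}
```

Note that nothing here detects cycles: two RefCounted objects retaining each other would both sit at count 1 forever, which is exactly the "unclaimable garbage" problem listed below.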
Objective-C (NeXT, Mac, iOS)
Swift (iOS)
Microsoft COM
early (pre-1.0) Java releases
who is responsible for inc/dec operations?
what is the maximum number of references?
memory mgmnt costs distributed throughout the application
no runtime requirements/costs
can factor in as part of libraries (C++/Boost)
easy native language interoperability
pointers to objects are simply pointers
time overhead on the mutator
every operation must manipulate reference counts
ref count manipulations and pointer load/store must be atomic
potential concurrency contention
read-only operations can turn into read-write operations
meaning, system must inc ref count, which is a write op
cyclical references can create unclaimable garbage
pause times still possible
destroying one object triggers "destruction storm"
heap can get fragmented (no rearrangement)
"mark" phase: identify (mark) objects still in use
"sweep" phase: remove/reclaim any objects not so marked
each object in use must be "marked" to remain alive
algorithms to detect live objects vary by environment
object is simply reclaimed in place
objects are NOT moved or adjusted in the heap
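The two phases can be sketched over a simulated object graph. This is a toy model, not how any real JVM represents objects (the Obj class and its explicit mark bit are illustrative):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical object graph for sketching the mark and sweep phases.
public class MarkSweep {
    static class Obj {
        boolean marked;                          // the per-object mark bit (header)
        List<Obj> fields = new ArrayList<>();    // references to other objects
    }

    // Mark phase: recursively mark everything reachable from the roots.
    static void mark(Collection<Obj> roots) {
        for (Obj o : roots) markFrom(o);
    }

    static void markFrom(Obj o) {
        if (o == null || o.marked) return; // already marked: return
        o.marked = true;                   // not yet: mark, then traverse further
        for (Obj field : o.fields) markFrom(field);
    }

    // Sweep phase: anything not marked is reclaimed in place; survivors remain.
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> live = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false; // clear the mark for the next collection cycle
                live.add(o);
            }
            // unmarked objects are simply dropped -- reclaimed where they sit
        }
        return live;
    }
}
```

Because survivors never move, the holes left by swept objects are exactly where the fragmentation mentioned below comes from.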
concurrency: how many threads are in play?
borrow caller thread on allocation requests
one for mark and sweep operations
one for mark, one for sweep
multiple threads for each?
size of the heap in question
fragmentation is almost guaranteed
objects must have a header (for the mark bit/flag)
the entire heap must be referenceable
for large heaps, mark/sweep phases can be excessively long
marker/tracer must be able to identify references (as opposed to integers)
"mark" phase: identify (mark) objects still in use
"compact" phase: rearrange the heap
go through the entire heap, object by object
if it is in use, mark it; if not, move on
how we determine use is a variance point
thus, only objects still in use will get marked
move objects to "snuggle up" next to one another
usually this means moving objects up to/into the first open space
this can be either first-fit or best-fit or some other algorithm
is compaction necessary?
concurrency: how many threads are in play?
borrow caller thread when allocation request comes in
one for mark and compact operations
one for mark, one for compact
multiple threads for each
size of the heap in question
moving objects means pointers to those objects require update
this can be expensive!
sometimes object references are handles (into a lookup table) rather than pointers
objects must have a header (for the mark bit/flag)
tracer must be able to identify references (as opposed to integers)
the entire heap must be referenceable
fast ("bump a pointer") allocation
for large heaps, mark/compact phases can be excessively long
throughput costs of compaction
requires multiple passes over live objects
long-lived objects will get copied/moved repeatedly
also sometimes called "arena" collectors
fast allocation
fast reclamation
but huge overhead (50%)
divide the heap in half (!)
allocate new objects out of only the first half
the "fromspace"
typically a simple linear allocator
zero concerns for fragmentation
leave the other half untouched/unused
the "tospace"
stays entirely empty and never allocated
when the "fromspace" runs out:
identify objects in the "fromspace" still in use
copy only those objects over to the "tospace"
empty the "fromspace" entirely (easiest to dealloc/realloc the entire space)
flip the spaces!
"fromspace" is now "tospace"
"tospace" is now "fromspace"
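The copy-and-flip cycle can be modeled with two lists standing in for the semispaces. Purely a sketch (real collectors copy raw bytes and fix up pointers; the live flag here stands in for the reachability test):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical semispace ("copying") collector over a simulated heap.
public class Semispace {
    static class Obj {
        boolean live; // stand-in for "still reachable from the roots"
        Obj(boolean live) { this.live = live; }
    }

    private List<Obj> fromspace = new ArrayList<>(); // all allocation happens here
    private List<Obj> tospace = new ArrayList<>();   // stays empty until collection

    public void allocate(Obj o) { fromspace.add(o); }

    // When fromspace runs out: copy live objects over, empty fromspace, flip.
    public void collect() {
        for (Obj o : fromspace) {
            if (o.live) tospace.add(o); // copy only objects still in use
        }
        fromspace.clear();              // reclaim the entire fromspace at once
        List<Obj> tmp = fromspace;      // flip the spaces!
        fromspace = tospace;            // tospace is now fromspace...
        tospace = tmp;                  // ...and vice versa
    }

    public int liveCount() { return fromspace.size(); }
}
```

Because survivors land contiguously in the new fromspace, compaction falls out for free, which is the property the generational collectors below exploit for their young generation.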
can we copy/move objects? (handles-vs-pointers)
do we have to limit to 2 spaces?
concurrency? (can we run this concurrently?)
allocation is fast
complete elimination of fragmentation
compaction happens automatically during copy
loss of half of the available heap
means a lot more GC cycles
breaks down (significantly) for larger heaps
reclamation of the "fromspace" may not be as fast as we'd like
destructors/deinitializers still need to be run
long-lived objects require copying each pass
large heaps with lots of live objects take much longer to examine
most objects are extremely short-lived
objects allocated together tend to live/die together
called "generations"
first generation is the "young" generation
next generations are successively "older"
objects are allocated out of the first/youngest generation
objects are steadily promoted from one to the next
typically at collection boundaries
objects are collected generationally
allocation will be fast
only scouring the young generation takes much less time
more overhead
each generation requires administrative bookkeeping
moving costs when promoting objects from one generation to the next
older objects will not be collected aggressively
programs that don't follow the assumptions will perform badly
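The promote-at-collection-boundary idea reduces to a few lines; this two-generation toy (class names and the live flag are illustrative) shows why a minor collection is cheap when most objects die young:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-generation sketch: young-collection survivors are promoted.
public class Generational {
    static class Obj {
        boolean live = true; // stand-in for "still reachable"
    }

    final List<Obj> young = new ArrayList<>(); // all new allocations land here
    final List<Obj> old = new ArrayList<>();   // long-lived survivors end up here

    void allocate(Obj o) { young.add(o); }

    // Minor collection: scour only the young generation; survivors are
    // promoted to the old generation at the collection boundary.
    void minorCollect() {
        for (Obj o : young) {
            if (o.live) old.add(o);
        }
        young.clear(); // most objects die young, so this reclaims almost everything
    }
}
```

A real generational collector also needs the remembered set mentioned in the glossary, to find old-to-young references without scanning the old generation.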
in manually-managed systems, objects must clean themselves up
any other (memory) resources allocated as part of their lifecycle
any other (external) resources allocated
in automatically-managed systems, memory no longer needs explicit cleanup
however, external resources still require release
when object "goes away", we invoke its cleanup method
this allows the object to release non-memory resources
file handles, socket handles, database connections, etc
in some languages, this is known ahead-of-time
these are often called a destructor
if the timing isn't known, we call this a finalizer
after the object is determined to be "dead"...
... it is enqueued in a list of other "dead" objects
... and the declared cleanup method is executed
less optimal than a destructor
nondeterministic execution
we have no idea when they will fire
nondeterministic concurrent execution
we often have no idea what order in which they will fire
nonguaranteed execution
may be no guarantee of finalizers' execution during system shutdown
potential for abuse
finalizer code could (potentially) make the "dead" space reachable again
when do finalizers run?
which thread runs a finalizer?
can finalizers run concurrently with each other?
can finalizers access the finalizing object?
when are finalized objects reclaimed?
what happens if there is an error in a finalizer?
is there any guaranteed order to finalization?
JVM is an automatic memory-managed platform
objects are always accessed through references
objects are always allocated out of the heap
non-objects are always embedded in declaration scope
JVM Specification quite silent about GC implementation
"The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous. ... A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the heap, as well as, if the heap can be dynamically expanded or contracted, control over the maximum and minimum heap size." --Section 2.5.3, JVM Spec
start from "root sets"
static fields
stack-declared reference variables (locals and parameters)
the finalizable queue ("f-reachable")
follow each reference to the object, mark it as "in use"
recursively follow those objects' references (fields) to other objects
if already marked, return
if not, mark then traverse further
any object not marked is not reachable from executable code
and therefore eligible for reclamation
We have a standard flag we can throw
-verbose:gc: provides detail about GC operations
We have some "extra" JVM-standard flags we can throw
-Xms size: set initial Java heap size
-Xmx size: set maximum Java heap size
-Xss size: set Java thread stack size
-Xlog:gc:file: log GC status to the given file with timestamps
-Xnoclassgc: disable class garbage collection
--finalization=enabled|disabled: perform finalization of objects? (default: enabled)
Any JVM-specific flags are prefixed with -XX
OpenJDK has a ton, but a few other JVM providers have their own set
a reference-counted reclamation system
huge "pauses" at certain points
really bad reputation
given that Java was expected to be just a spec, not surprising
moved to a Mark-Sweep GC
better performance, still not great
Java started to move to the server
Introduction of "Hotspot" JVM engine
a "pluggable" system
"client" VM optimized for short-lived, high-performance applications
"server" VM optimized for long-running, better-throughput apps
Hotspot also introduced us to the "-XX" flags
this afforded opportunities to "tune" the garbage collector
class metadata allocated out of "permanent generation" (and GC'ed!)
hybrid garbage collector implementations
command-line flags to choose (and tune)
infrastructure and tools to monitor the JVM
JVMPI, JVMDI -> JDI, JVMTI
PermGen removed; class metadata now allocated in native memory
still configurable if desired (-XX:MaxMetaspaceSize)
Objects sometimes represent something more than "just memory"
Files (java.io)
Network connections (java.net)
Graphics resources (java.awt)
We need to be able to execute code when they are deallocated
implement a protected void finalize() method
this is invoked when the GC cleans up the object
when that happens, though, is not known ahead of time
you'll know only when the method gets called
order of execution is nondeterministic
time of execution is nondeterministic
possible zombie/resurrection scenarios
zero error-recovery facilities
security risks
performance hits
deprecated as of Java 9
slated for outright removal in a future JDK
more and more tooling and runtime support points out their use
https://openjdk.org/jeps/421
finalizers are bad for a variety of reasons
... but we still need something to run when we deallocate sometimes
set up a block of code to run for each deallocating object
essentially this is where the body of the finalize() method goes
register these blocks of code to run when an object is phantom reachable
set up a thread to run these blocks of code when the object deallocates
obtain a Cleaner instance (share it across multiple uses) via create()
set up the cleanup block via register():
the object to watch
the Runnable to run to clean up
NOTE: still no guarantees of execution on VM exit
NOTE: you could build one of these yourself
uses PhantomReferences and ReferenceQueues to implement
nothing VM-specific or implementation-dependent
Simplest Cleaner usage
    import java.lang.ref.Cleaner;

    public class SimpleCleaner {
        private static final Cleaner CLEANER = Cleaner.create();

        public SimpleCleaner() {
            CLEANER.register(this, () -> System.out.println("I'm doing some clean up"));
        }

        public static void main(String... args) {
            SimpleCleaner app = new SimpleCleaner();
            app = null;
            System.gc(); // On most systems, this should trigger cleanup.
        }
    }
using a lambda here is bad
accidentally holds references to enclosing scope
enclosing references will keep the object alive
still relies on GC to reclaim the external resource
non-deterministic
try-with-resources (AutoCloseable) is the preferred Java idiom post-Java 8
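For contrast with the Cleaner example, here is the deterministic idiom in full. TrackedResource is hypothetical; the static flag exists only to make the close visible:

```java
// Minimal sketch of try-with-resources; TrackedResource is a made-up example.
public class TrackedResource implements AutoCloseable {
    static boolean closed = false; // just to observe the close in this sketch

    public void use() { /* work with the external resource here */ }

    @Override
    public void close() {
        closed = true; // runs deterministically at block exit -- no GC involved
    }

    public static void main(String... args) {
        try (TrackedResource r = new TrackedResource()) {
            r.use();
        } // close() has already run here, even if use() had thrown
    }
}
```

Unlike a finalizer or Cleaner, the release point is visible in the source and does not depend on when (or whether) a collection happens.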
don't!
don't; let the JVM choose
benchmark early, benchmark often
don't benchmark only "when it's slow"--too many red herrings
avoid micro-benchmarks unless you are a JDK committer
don't mistake activity for bottlenecks
objects come, objects go; that's the circle of life
adjust the heap size first
you can avoid some time cost by maxing out the heap size up front
but this adds overhead and costs flexibility
avoid finalizers like the plague
cleanup is good, but only clean up non-memory resources
if you must clean up, use the Cleaner API
consider reference objects
phantom, soft, weak references help manage memory pressure
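A quick sketch of how the non-strong reference types behave (RefDemo is illustrative; note the comments hedge where the JVM gives no guarantees):

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Sketch of reference objects; none of these keep the referent strongly alive.
public class RefDemo {
    public static void main(String... args) {
        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        SoftReference<Object> soft = new SoftReference<>(target);

        // While a strong reference exists, get() still returns the object:
        if (weak.get() != target || soft.get() != target)
            throw new AssertionError("referent should still be reachable");

        target = null;   // drop the only strong reference
        System.gc();     // only a hint; most JVMs will clear the weak ref here
        // weak.get() is now likely (but not guaranteed) to return null;
        // the soft reference typically survives until memory pressure forces
        // the GC to clear it -- which is what makes soft refs useful for caches
    }
}
```

Phantom references go one step further: get() always returns null, and they exist only to be enqueued on a ReferenceQueue after collection, which is the mechanism the Cleaner API builds on.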
know your JVM GC's algorithms
each will require some (slightly) different tuning advice
generational garbage collection
heap divided into "young" and "old" generations
young generation: "eden" and two "survivor" spaces (copying collector)
old generation: mark-sweep-compact collector
will start at specified size, then grow the heap as needed until max is reached
will not shrink the heap during execution
actual implementation details differ
Epsilon
Serial
Parallel
G1
ZGC
"barrier set"
-XX:+DisableExplicitGC
Garbage-First (G1) Collector (server) or Serial Collector (client)
"server-class machines... more than two processors and a heap size >= 1792 MB"
GC thread max count limited by heap size and available CPU resources
Initial heap size of 1/64 of physical memory
Maximum heap size of 1/4 of physical memory
configure to prefer one of two goals:
maximum pause-time
application throughput
if the preferred goal is met, the JVM will then attempt to meet the non-priority goal
pause time: duration during which the GC stops the app and recovers unused objects
max pause time: limit the longest of these pauses
specify the goal with -XX:MaxGCPauseMillis=nnn
average is taken from the start of execution
more recent pauses count more heavily
throughput: time spent collecting garbage
application time: time spent outside of garbage collection
specify the goal with -XX:GCTimeRatio=nnn
this means target a ratio of GC:App time of 1/(1+nnn)
-XX:GCTimeRatio=19 sets a goal of 1/20th (5%) of total time spent in GC
if both goals are met, the GC reduces the size of the heap until one can't be met
set min and max heap size with -Xmsnnn and -Xmxnnn respectively (note: no equals sign)
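Putting the flags above together, a launch line might look like the following; every value here is a placeholder for illustration, not a tuning recommendation, and MyApp is a hypothetical main class:

```shell
# Hypothetical tuning example -- all values are placeholders, not recommendations
java -Xms512m -Xmx2g \
     -XX:MaxGCPauseMillis=200 \
     -XX:GCTimeRatio=19 \
     -Xlog:gc:gc.log \
     MyApp
```

Here GCTimeRatio=19 targets 1/(1+19) = 5% of total time in GC, and the pause-time goal takes priority over it per the ordering described above.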
https://openjdk.org/jeps/318
"Develop a GC that handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down."
clearly labeled 'experimental'
do NOT blame anybody if your JVM terminates while using this
linear allocation in a single contiguous chunk of allocated memory
No code changes required
when launching JVM, pass these additional flags:
-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
prepare for the OutOfMemoryError
but look at how fast we run!
Performance testing. Get the GC out of the picture.
Memory pressure testing. Simplifies finding outer boundaries for memory configurations.
VM interface testing. Useful to the JVM engineers.
Extremely short-lived jobs.
Last-drop latency improvements. For ultra-latency-sensitive operations.
Last-drop throughput improvements. No coordination barriers are needed (no GC operations).
Fail fast and recover scenarios. Sometimes it's better to just die and be replaced.
uses a single thread to perform all GC work
hence all the work is done in a "serial" fashion
might be the default on certain hardware/OS configurations
use via -XX:+UseSerialGC
best-suited to single-processor machines
it doesn't multithread, so it can't take advantage of more than one CPU anyway
it can be useful with apps on multiprocessors w/small data sets (less than 100MB)
see the parallel collector page
Chapter 6, "Hotspot Virtual Machine Garbage Collection Tuning Guide"
https://docs.oracle.com/en/java/javase/24/gctuning/parallel-collector1.html
-XX:MaxGCPauseMillis=N: provides a hint to the GC that pause times should be less than N ms
no default
-XX:GCTimeRatio=N: sets the ratio of time spent between performing GC work and application work
ratio becomes 1 / (1 + N), so N=19 means 5% (1 / 20 = 1 / (1 + 19))
-XX:ParallelGCThreads=N: sets the number of threads that can be used for performing GC work
defaults to either all the hardware threads (if fewer than 8) or around 5/8 of the total number (8+)
-XX:+UseStringDeduplication: JDK 18 ported String deduplication from G1 to Parallel
can reduce memory footprint by about 10% but could impact pause times.
max pause times under a millisecond (ms)
intended for apps w/low-latency requirements
pause times are independent of heap size (100sMB up through 16TB)
-XX:+UseZGC
see Chapter 9, HotSpot Virtual Machine Garbage Collection Tuning Guide
https://docs.oracle.com/en/java/javase/24/gctuning/other-considerations.html
"... a generational, incremental, parallel, mostly concurrent, stop-the-world, and evacuating garbage collector which monitors pause-time goals in each of the stop-the-world pauses."
performs some expensive work concurrently to the app
designed to scale from small machines to large multi-processor machines w/large heaps
meet a pause-time goal with high probability while still achieving high throughput
-XX:+UseG1GC
a conceptually fascinating topic
a really deep rathole
hard to observe in any meaningful way
not nearly as necessary to "tune" as some posit
Garbage Collection, by Jones/Lins (Wiley, 1996/1999)
The Garbage Collection Handbook, by Jones/Hosking/Moss (Wiley, 2011)
The Garbage Collection Handbook 2nd Ed, by Jones/Hosking/Moss (Wiley, 2023)
Memory Management Reference
https://www.memorymanagement.org/index.html
A method for overlapping and erasure of lists. George E. Collins.
Communications of the ACM, 3(12):655-657, December 1960
Recursive functions of symbolic expressions and their computation by machine. John McCarthy.
Communications of the ACM, 3:184-195, 1960
A Lisp garbage collector for virtual memory computer systems. Robert R. Fenichel and Jerome C. Yochelson.
Communications of the ACM, 12(11):611-612, November 1969
heapothesys: a heap allocation JVM benchmark developed by the Amazon Corretto team
https://github.com/corretto/heapothesys
HeapFragger: Gil Tene's heap fragmentation inducer
https://github.com/giltene/HeapFragger
jHiccup: measure the pauses and stalls ("hiccups") associated with JVM
https://www.azul.com/products/components/jhiccup/
Architect, Engineering Manager/Leader, "force multiplier"
http://www.newardassociates.com
http://blogs.newardassociates.com
Sr Distinguished Engineer, Capital One
Educative (http://educative.io) Author
Performance Management for Engineering Managers
Books
Developer Relations Activity Patterns (w/Woodruff, et al; APress, forthcoming)
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; OReilly, 2003)
Server-Based Java Programming (Manning, 2000)