When the Java Virtual Machine (JVM) hosts a high-performance application, such as a Diffusion-based system, it is important to understand how memory use and garbage collection affect performance.
A full-scale Diffusion deployment can handle millions of topics and thousands of subscribers. A deployment of this size inevitably places high memory requirements on the JVM (in the range of gigabytes). In a rapidly changing, scalable environment, memory needs to be reclaimed efficiently, and the modern garbage collectors in Java are optimized for this. However, garbage collection events still affect performance, and in systems that require high availability and fast responses to data changes, this impact should be minimized. Understanding the behaviour of the JVM is one of the first steps in finding problems such as memory leaks or performance and scalability issues, and the memory profile provides invaluable data for this. The final part of this document introduces tools that can be used to monitor and analyze the memory profile of a JVM.
Java Virtual Machine heap area
The memory structure in the Java Virtual Machine consists of several data areas which reside in native memory and which have different roles.
The focus of this article is the heap area, where memory for all class instances and arrays is allocated. The heap is created at JVM start-up and shared by all JVM threads. While a Java program runs, its threads allocate memory on the heap for newly created objects, but do not free that memory when the objects are no longer needed. Over time the heap, which has a limited amount of space, fills with unreachable objects: objects that can no longer be reached through any reference and are therefore eligible for reclamation (1). This memory must be reclaimed; otherwise the heap will be exhausted, filled with unreachable objects and with no space left for newly created ones. In languages like C or C++, developers are responsible for memory management, but in Java this process is automated and is called garbage collection (GC). The JVM garbage collector divides the heap into smaller parts called generations: the Young Generation and the Old (or Tenured) Generation. Different garbage collection algorithms are used to manage the different generations.
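To see these generations on a running JVM, the standard java.lang.management API exposes each heap area as a memory pool. The sketch below, a hypothetical helper class named HeapPools, lists the heap pools and their current usage. Pool names depend on the collector in use (with G1, names such as "G1 Eden Space" and "G1 Old Gen" are typical), so treat the output as indicative rather than a fixed format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the heap memory pools (the generations) of the running JVM. */
final class HeapPools {

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Skip non-heap pools such as Metaspace or the code cache.
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            long max = usage.getMax(); // -1 means "undefined"
            lines.add(String.format("%s: used=%d KB, committed=%d KB, max=%s",
                    pool.getName(),
                    usage.getUsed() / 1024,
                    usage.getCommitted() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + " KB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different collectors selected shows how each one names and sizes its generations.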
The Young Generation is the area where all new objects are allocated and aged (2). When the Young Generation becomes full, a minor garbage collection is performed, which moves the still-referenced objects (survivors) out of it so that threads can create more new objects. At the end of the process, only unreachable objects are left in the collected area, which means it can be reclaimed and reused for new allocations. Minor garbage collections are stop-the-world events: program execution stops in all threads, except those needed by the GC, until the collection completes. However, a minor collection is usually very fast, so for most applications this pause has a negligible impact on latency. Fig. 2 represents the heap structure of the Java HotSpot Virtual Machine, where the Young Generation is further divided into Eden and two Survivor spaces, which are used to improve the minor garbage collection process. In this architecture, new objects are created in Eden. During the first minor GC, surviving objects are moved to one of the Survivor spaces instead of directly to the Old Generation. The next minor GC collects garbage from both Eden and that first Survivor space, and moves the objects that are still reachable, with their age incremented by 1, to the other Survivor space. This empties the first Survivor space, which can then be reused. The process repeats each time Eden fills up, and only objects whose age exceeds a certain threshold are promoted to the Old Generation.
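The Eden/Survivor copying and ageing described above can be illustrated with a toy model. The YoungGenModel class below is hypothetical and greatly simplified: it is not how HotSpot is implemented (real collectors copy memory, size the spaces, and adapt the tenuring threshold, capped by -XX:MaxTenuringThreshold). It only tracks which space each object ends up in after successive minor GCs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical toy model of a generational young collection (not HotSpot code):
 * Eden, two Survivor spaces that swap roles, and an Old Generation.
 */
final class YoungGenModel {

    /** A simulated object: an id and the number of minor GCs it has survived. */
    static final class Obj {
        final int id;
        int age;

        Obj(int id) {
            this.id = id;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // Survivor space holding objects from the last GC
    List<Obj> toSurvivor = new ArrayList<>();   // empty Survivor space that receives survivors
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    YoungGenModel(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects are always allocated in Eden. */
    void allocate(int id) {
        eden.add(new Obj(id));
    }

    /**
     * Minor GC: reachable objects in Eden and the "from" Survivor space get their age
     * incremented and are copied to the "to" Survivor space, or promoted to Old once
     * their age exceeds the threshold. Unreachable objects are not copied, so both
     * source spaces can be emptied in one step.
     */
    void minorGc(Predicate<Obj> isReachable) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!isReachable.test(o)) {
                continue; // garbage: its space is reclaimed when the source spaces are cleared
            }
            o.age++;
            if (o.age > tenuringThreshold) {
                old.add(o);
            } else {
                toSurvivor.add(o);
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // The Survivor spaces swap roles: the emptied "from" space is the next "to" space.
        List<Obj> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    public static void main(String[] args) {
        YoungGenModel heap = new YoungGenModel(2);
        for (int i = 0; i < 4; i++) {
            heap.allocate(i);
        }
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc(o -> o.id % 2 == 0); // assume only even ids are still referenced
            System.out.printf("after minor GC %d: eden=%d survivor=%d old=%d%n",
                    gc, heap.eden.size(), heap.fromSurvivor.size(), heap.old.size());
        }
    }
}
```

With a threshold of 2 and only even ids still referenced, the two surviving objects stay in the Survivor spaces for two minor GCs and are promoted to the Old Generation on the third.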
Objects that survive long enough are moved to the Old Generation. As objects accumulate there, the Old Generation eventually has to be cleaned up by a major garbage collection. The JVM offers several types of GC with different characteristics, suited to different platforms and architectures:
- Serial GC – performs all collection work in a single thread; the default on client-class machines.
- Parallel GC – the default on server-class machines (at least two processors and 2 GB of physical memory) up to JDK 8. It is similar to Serial GC, but uses multiple threads to speed up collection and shorten pauses, for both minor and major collections.
- Concurrent Mark and Sweep (CMS) GC – this attempts to minimize the stop-the-world pauses by doing most of the work concurrently with the application threads.
- Garbage-First (G1) GC – a newer collector, and the default used when starting Diffusion 6.0. G1 drastically changes how the heap is managed: it divides the heap into small, equal-sized regions, any of which can serve as Eden, Survivor or Old space (3). This provides greater flexibility, because the amount of space assigned to each of the three types can vary with the demands of the application.
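To check which of these collectors a JVM is actually running, each collector registers a GarbageCollectorMXBean. The sketch below, a hypothetical ActiveCollectors class, maps the bean names commonly reported by HotSpot to the collector types listed above; the mapping table is an assumption based on typical HotSpot naming and may differ between JDK versions. The collector itself is selected with flags such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC (available up to JDK 13) or -XX:+UseG1GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports which collector family the running JVM uses. */
final class ActiveCollectors {

    /** Maps a collector bean name, as commonly reported by HotSpot, to a family from the list above. */
    static String family(String beanName) {
        if (beanName.startsWith("G1 ")) {
            return "G1"; // e.g. "G1 Young Generation", "G1 Old Generation"
        }
        switch (beanName) {
            case "Copy":
            case "MarkSweepCompact":
                return "Serial";
            case "PS Scavenge":
            case "PS MarkSweep":
                return "Parallel";
            case "ParNew":
            case "ConcurrentMarkSweep":
                return "CMS";
            default:
                return "Other";
        }
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time (ms) since JVM start; -1 if unavailable.
            System.out.printf("%-22s family=%-8s collections=%d time=%d ms%n",
                    gc.getName(), family(gc.getName()),
                    gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

A generational collector typically registers two beans, one for minor and one for major collections, which is why both names of a pair map to the same family.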
Memory utilization in the JVM
Measuring memory utilization with Java Mission Control
There are many tools available to monitor and record memory use in the JVM, and the Oracle JDK ships with several of them, including Java Mission Control (JMC) and Java Flight Recorder (JFR). JMC and JFR monitor a wide range of JVM parameters. They are free to use for development (an Oracle license is currently required to use them in a production environment) and have a minimal impact on system performance. Java Mission Control provides a management console, while Java Flight Recorder records a JVM profile over a specified time period into a file that can be analyzed later. The Java Flight Recording Knowledge Base article explains how to enable these tools for the JVM and how to record a JVM profile. Long-running Java processes such as Diffusion may need to be observed over a long period, during which many GC events may take place. With a large heap, major GC events may occur only rarely, so triggering a GC manually can help to reveal the trend in memory use.
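One way to check the trend after triggering GC manually is to sample heap usage right after each requested collection and fit a straight line through the samples: a slope that stays positive over a long period suggests that live data is growing. The HeapTrend class below is a hypothetical sketch of this idea using the standard MemoryMXBean; the sampling interval and sample count are assumptions, and a request for GC is only a hint that the JVM may ignore (for example, when started with -XX:+DisableExplicitGC). From outside the process, jcmd <pid> GC.run requests a collection in a similar way.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper: estimates the trend of heap usage measured after requested GCs. */
final class HeapTrend {

    /** Least-squares slope of the samples against their index, in bytes per sample. */
    static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) {
            meanY += s;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (samples[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    /** Requests a GC (equivalent to System.gc(), so only a hint) and returns the used heap in bytes. */
    static long usedHeapAfterGc(MemoryMXBean memory) {
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long[] samples = new long[5]; // assumption: a real check would span hours, not seconds
        for (int i = 0; i < samples.length; i++) {
            samples[i] = usedHeapAfterGc(memory);
            Thread.sleep(500); // assumed sampling interval
        }
        System.out.printf("post-GC heap trend: %.0f bytes per sample%n", slope(samples));
    }
}
```

Measuring only right after a collection removes most of the short-lived garbage from the samples, so the slope reflects retained objects rather than allocation noise.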
When there is an indication of a memory leak in an application, the next step is to find where it occurs. JFR provides heap object statistics that can help, but a heap dump gives more detailed information about the objects allocated on the heap. The Heap Dumps article discusses in detail how to obtain one.
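A heap dump can also be taken programmatically from inside the JVM. The sketch below, a hypothetical HeapDumper class, uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it assumes a HotSpot-based JDK; the output file name is illustrative. The same kind of dump can be taken externally with jcmd <pid> GC.heap_dump <file>.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes an HPROF heap dump of the running JVM. */
final class HeapDumper {

    /**
     * Writes a heap dump to the given path. The file should not already exist, and recent
     * JDKs require a ".hprof" suffix. With live=true only reachable objects are dumped.
     */
    static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diagnostic =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostic.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("example.hprof"); // hypothetical file name
        dumpHeap(file, true);
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Dumping only live objects keeps the file smaller and focuses the analysis on what is actually being retained, which is usually what matters when chasing a leak.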
References
(1) Oracle’s specification for The Structure of the Java Virtual Machine
(2) Oracle tutorial on Java Garbage Collection Basics
(3) Oracle guide to Getting Started with the G1 Garbage Collector
(4) The Parallel Collector, in the HotSpot Virtual Machine Garbage Collection Tuning Guide
(5) The Concurrent Mark Sweep Collector, in the HotSpot Virtual Machine Garbage Collection Tuning Guide
(6) The G1 Collector, in the HotSpot Virtual Machine Garbage Collection Tuning Guide
(7) Garbage collection discussed in the context of tuning JVMs, including a comparison of different GC types
(8) Java Mission Control, in the Java Platform Troubleshooting Guide