Understanding Memory Tuning in the JVM: A Case Study and Analysis

JVM Heap Model

The JVM heap is divided into the Young generation and the Old generation. Newly created objects are allocated in the Young generation; objects that grow older and survive enough garbage collection cycles are moved to the Old generation, also called the tenured generation.
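The generations described above can be observed from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API; the class name HeapPools is only illustrative, and the pool names printed (for example an Eden, a Survivor and an Old/Tenured pool) depend on the collector and JVM version in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not part of the case study): lists the heap memory pools.
class HeapPools {

    /** Returns one description line per heap memory pool of the running JVM. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace or the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum
            lines.add(String.format("%-25s used=%d KB, committed=%d KB, max=%d bytes",
                    pool.getName(), usage.getUsed() / 1024,
                    usage.getCommitted() / 1024, usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```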


Fig 1. View of the Java Heap


GC roots

GC (Garbage Collector) roots are the starting points from which the collector traces the live objects at any point during the uptime of an application. An object is live if it is a GC root or is referenced, directly or indirectly, from a GC root. Live objects are not collected during the garbage collection phase; the Garbage Collector reclaims only those objects that are not GC roots and are not reachable from any GC root.

There are different kinds of GC roots. Each object can belong to more than one kind of root. Some of the kinds of GC roots are given below:

Class – Classes loaded by the system class loader. Such classes can never be unloaded, and they can hold objects via static fields.

Thread – Any live thread

Stack Local – Local variables and parameters of a Java method

JNI Local – Local variables and parameters of a JNI method

JNI Global – Any global JNI reference

Held by JVM – Objects that are excluded from garbage collection and held by the JVM for its own purposes are part of the GC roots. There is no clear guideline for determining which objects belong to this category. The known cases are: the system class loader, some exception classes which the JVM knows about, a few pre-allocated objects used by exception handling, and custom class loaders while they are in the process of loading classes.
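To make the idea of reachability from a GC root concrete, here is a small sketch. It assumes a class loaded by the system class loader (the "Class" kind above) that holds an object in a static field; the class and field names are hypothetical. Note that System.gc() is only a request, so whether the unreferenced object is actually reclaimed at that moment is not guaranteed.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class: an object held by a static field stays reachable.
class GcRootDemo {

    // Static field of a loaded class: keeps its referent reachable from a GC root.
    static Object heldByStaticField = new Object();

    /** True if the object held by the static field is still there after a GC request. */
    static boolean rootedSurvivesGc() {
        WeakReference<Object> rooted = new WeakReference<>(heldByStaticField);
        System.gc(); // a hint only; the JVM may ignore it
        return rooted.get() != null;
    }

    public static void main(String[] args) {
        // Nothing else references this object, so it is eligible for collection.
        WeakReference<Object> unrooted = new WeakReference<>(new Object());

        System.out.println("rooted object alive:   " + rootedSurvivesGc());
        // Often false after a collection, but this is not guaranteed.
        System.out.println("unrooted object alive: " + (unrooted.get() != null));
    }
}
```

A weak reference does not keep its referent alive, which is why it is a convenient way to observe whether something else (here, the static field) still does.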

GC Markup

Fig 2.  An overview of Garbage Collection Markup Phase for Old Generation

Types of Garbage Collectors

Various types of GC algorithms are supported by the JVM; if you don’t specify the algorithm explicitly, a platform-specific default is used. There are separate algorithms for reclaiming memory from the Young generation and from the Old generation, but in fact they are selected as a particular combination that covers both generations. These combinations can be classified into the following categories.

  • Serial GC
  • Parallel GC
  • Parallel New for Young + Concurrent Mark and Sweep (CMS) for the Old Generation
  • G1 or the Garbage First.

Whenever a Java application is running, garbage collection may happen in the Young generation as well as in the Old generation. During the former (called a minor GC), the objects in the Young generation that are no longer reachable from the GC roots are cleaned up, and some of the surviving objects are moved to the Old generation. During a Full GC, both the Young generation and the Old generation are cleaned up. The term major GC is used to signify that only the tenured (Old) generation is collected.
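One way to see which collectors are active for the Young and the Old generation, and how often each has run, is the standard GarbageCollectorMXBean interface. The sketch below is only an illustration (the class name GcCounters is made up); the collector names it prints vary with the GC algorithm selected and with the JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper: prints each registered collector with its counters.
class GcCounters {

    /** Builds one line per collector: name, collection count, time, managed pools. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            sb.append(String.format("%s: count=%d, time=%d ms, pools=[%s]%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime(),
                    String.join(", ", gc.getMemoryPoolNames())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```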


Serial GC

This combination of garbage collection algorithms uses mark-copy for the Young Generation and mark-sweep-compact for the Old Generation. Both collectors run on a single thread, so they cannot make use of multiple CPUs where available, and both stop the application while the GC is running. These stop-the-world (STW) pauses affect the response times of an application: the longer the application threads stay paused, the more the time consumed by the garbage collector hurts the throughput and the response time of each operation requested of the application. This collector should be used only where the underlying hardware supports only a single CPU with a couple of hundred MB of RAM. We can enable it using the -XX:+UseSerialGC JVM flag.


Parallel GC

This combination of garbage collector algorithms uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. Both involve marking and copying or compacting phases, and both pause all application threads while the garbage collection is happening; this is the classic stop-the-world (STW) pause. Unlike Serial GC, the work is done by multiple threads executing in parallel, so collection times can be considerably reduced on machines with multiple CPUs. If response times are critical, this type of GC might still not be suitable because it also causes STW pauses, though it is much better than Serial GC.

Parallel GC is selected by specifying the following JVM parameters:

java -XX:+UseParallelGC -XX:+UseParallelOldGC

Concurrent Mark and Sweep

It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.

The algorithm for collecting the Young Generation is the same as in the parallel collector, but the algorithm for collecting the Old Generation is “mostly concurrent” and does not induce long pauses. It has multiple phases, most of which are concurrent (they run alongside the application); two of them are not concurrent and pause the application, hence the term “mostly” concurrent. Because the Old Generation collection happens while the application threads are running, there is minimal impact on response times and throughput.

However, the CMS algorithm for the Old Generation still runs alongside the application threads and thus competes with them for CPU time. By default, this GC algorithm uses a number of threads equal to ¼ of the number of physical CPU cores of your machine.

As with the other collectors, CMS handles both minor and major collections. It tries to reduce the impact on application response time by reducing the pause times during major collections. It does this by tracing the live objects on separate threads, concurrently with the execution of the application threads. There are two pauses during a CMS collection of the Old Generation, the second being the longer of the two.

This algorithm is selected by specifying the following JVM parameter:

java -XX:+UseConcMarkSweepGC

Concurrent Mode failure

The inability to complete a collection concurrently is referred to as concurrent mode failure. It occurs in the following cases:

  • The CMS collector is not able to reclaim the objects marked as garbage before the tenured generation fills up to capacity.
  • An allocation of a set of objects is attempted and there is no free space available in the tenured generation.
  • The Garbage Collector is run explicitly using System.gc() or through a diagnostic tool.


Excessive GC Time and OutOfMemoryError

A Java application throws an OutOfMemoryError if the time being spent in garbage collection far outweighs its benefit. If more than 98% of the total time (application execution + GC time) is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError is thrown. This feature is designed to indicate to the administrator that the applications running on the JVM are making no progress because the heap is too small: they need more memory allocated to them, and the GC needs further tuning.
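The 98% / 2% rule above is simple arithmetic, sketched below. This is not the JVM's own implementation, only an illustration of the two thresholds stated in the text; the class, method and parameter names are hypothetical.

```java
// Illustration of the rule described in the text, not the JVM's internal check.
class GcOverheadRule {

    /**
     * @param gcTimeMs       time spent in garbage collection
     * @param totalTimeMs    application execution time + GC time
     * @param heapFreedBytes heap recovered by the collections
     * @param heapSizeBytes  total heap size
     * @return true when more than 98% of the time went to GC
     *         and less than 2% of the heap was recovered
     */
    static boolean exceedsLimit(long gcTimeMs, long totalTimeMs,
                                long heapFreedBytes, long heapSizeBytes) {
        double gcTimeShare = (double) gcTimeMs / totalTimeMs;
        double heapRecovered = (double) heapFreedBytes / heapSizeBytes;
        return gcTimeShare > 0.98 && heapRecovered < 0.02;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 99% of the time in GC, about 1% of a 4096 MB heap recovered: limit exceeded
        System.out.println(exceedsLimit(9_900, 10_000, 40 * mb, 4096 * mb));
        // 50% of the time in GC: the limit is not exceeded
        System.out.println(exceedsLimit(5_000, 10_000, 40 * mb, 4096 * mb));
    }
}
```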


G1 (Garbage First)

G1 is a soft real-time garbage collector that can be tuned by setting performance goals such as the duration of the stop-the-world pauses; a goal can take the form of STW pauses no longer than 10 milliseconds in any given second. This is a soft goal: the G1 collector will try to meet it, but it is not guaranteed. This approach makes the behavior of the collector more predictable, which is a very important goal for the GC.

To achieve this, G1 takes a different view of the heap: the heap does not have to be split into contiguous Young and Old generations. Instead, it is split into a number of smaller heap regions, each of which can be marked as an Eden, Survivor or Old region. All the Eden and Survivor regions together form the Young generation, and all the Old regions together form the Old generation.

G1 Heap view

Fig 3. Garbage First Collector (G1) View of the Heap

This allows the GC to avoid collecting the entire heap at once and instead approach the problem incrementally: only a subset of the regions, called the collection set, is considered at a time. All the Young regions are collected during each pause, but some Old regions may be included as well.

This algorithm is selected by specifying the following JVM parameter:

java -XX:+UseG1GC

A Case Study

We took a sample program that creates some objects and then does some processing. It requires more than 4 GB of memory and generates enough garbage to force a full GC with both the CMS and the G1 collector. We looked into the operational aspects of the GC using the jstat command-line utility, which ships with the JDK and is located in the $JAVA_HOME/bin directory. jstat needs the lvmid, which we captured using the jps command. The lvmid is the identifier of the JVM process that is executing a Java program, which could be anything from an app server to a standalone Java program. We used the following command for sampling:

jstat -gc <lvmid> 300ms 15

Here we take the garbage collection statistics for the specified lvmid at an interval of 300ms, for a count of 15 samples. Both values can be changed to suit our needs.
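The same interval-and-count style of sampling can be approximated from inside a JVM. The sketch below is not how jstat works internally and was not part of the case study; it simply reads the cumulative GC counters of the current process through the standard GarbageCollectorMXBean API, with the interval and count passed as parameters. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative in-process sampler, loosely analogous to "jstat -gc <lvmid> 300ms 15".
class GcSampler {

    /** Takes up to `count` samples, `intervalMs` apart; each sample is {collections, gcTimeMs}. */
    static List<long[]> sample(long intervalMs, int count) {
        List<long[]> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long collections = 0;
            long timeMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // Both getters return -1 when the value is undefined for a collector.
                collections += Math.max(0, gc.getCollectionCount());
                timeMs += Math.max(0, gc.getCollectionTime());
            }
            samples.add(new long[] {collections, timeMs});
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        for (long[] s : sample(300, 15)) {
            System.out.printf("collections=%d, gcTime=%d ms%n", s[0], s[1]);
        }
    }
}
```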

We also used the JConsole utility to look into the JVM heap, and we collated the two sets of findings to come to a conclusion.

G1 facts

It is widely acknowledged that G1 is better than CMS only for large heap sizes, and mostly on server-class machines. The results in the screenshots below therefore cannot prove the superiority of G1 over CMS, because we are working with a mini-model simulated at a max heap size of 4 GB.


We varied the complexity of the Java program we were monitoring until we arrived at a final version that gave conclusive results, and we then tuned the JVM arguments for the G1 collector to improve throughput and latency.

We evolved our study from an extremely simplistic case in which we used a max heap size of 32MB. In that case it was quite evident that CMS is much faster than G1, because G1 primarily aims at 90% application time and 10% GC time. When the required max heap size, the churn rate, the amount of garbage generated and the processing time were all on the lower side, CMS was the clear choice.

Given below are some insights into the operational aspects of the GC for our simulation exercise.

CMS (Concurrent Mark and Sweep)

The snapshot below shows the Eden space usage touching a high of around 830 MB, as seen in the EC column of the jstat output. As soon as there is an allocation failure, a YG collection happens, and the objects are either removed or moved to a Survivor space. Up to the time of sampling there had been 35 YG collections, and the total time spent in them was 26.935 seconds.


The jstat output below, taken at an arbitrary point in time, shows that up to that point there had been 8 YG collections taking 11.2s in total (the YGC and YGCT columns) and 15 full GCs taking 13.78s in total (the FGC and FGCT columns). In the transition from the 8th to the 9th YG collection, Eden utilization peaks and almost touches the Eden capacity, and the Survivor 0 utilization also touches its capacity, so the Young generation is almost full; hence the Young GC. The application pauses for 1695ms during this phase.
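The cumulative counts and times reported by jstat can be turned into an average time per collection. The sketch below uses the figures quoted in the paragraph above (8 young collections in 11.2s, 15 full collections in 13.78s); the class and method names are hypothetical, and the result is only an average, so individual pauses such as the 1695ms one can be longer.

```java
// Illustrative arithmetic on the jstat figures quoted in the text.
class PauseAverages {

    /** Average time per collection in milliseconds, from a count and a cumulative time. */
    static double averagePauseMs(long collections, double totalSeconds) {
        if (collections <= 0) {
            throw new IllegalArgumentException("collections must be positive");
        }
        return totalSeconds * 1000.0 / collections;
    }

    public static void main(String[] args) {
        // Young generation: 8 collections, 11.2 s in total -> about 1400 ms each
        System.out.printf("avg young GC: %.0f ms%n", averagePauseMs(8, 11.2));
        // Full GC: 15 collections, 13.78 s in total -> about 919 ms each
        System.out.printf("avg full GC:  %.0f ms%n", averagePauseMs(15, 13.78));
    }
}
```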


The Old Gen memory graph shows that the Old Generation capacity is around 3.1 GB and that there have been around 45 major collections. These collections ran concurrently with the application using the ConcurrentMarkSweep algorithm, which, as explained earlier in the article, keeps the pause times of these collections on the lower side.


Execution time of the program with the JVM args -Xmx4096m -Xms2048m -Xmn1024m -XX:+UseConcMarkSweepGC was 395108ms.

Garbage First Collector (or G1)

The snapshot below shows that the heap is divided into dynamic Eden regions, as is evident from the EC column of the jstat output: the Eden capacity is neither fixed nor contiguous. As soon as there is enough garbage in the YG, a YG collection happens, and it collects the collection set where the concentration of garbage is highest. Up to the time of sampling there had been 154 YG collections, and the total time spent in them was 60 seconds.


As the jstat output below shows, the first full GC occurs when the Old generation capacity touches approximately 4 GB, and the time spent in that GC cycle is 2 seconds 484ms. Thereafter there is a big drop in both the size and the utilization of the tenured generation. The survivor spaces are collected as well, so this is a collection of both the YG and the OG.


Execution time of the program using G1 with the JVM args -Xmx4096m -Xms2048m -XX:+UseG1GC was 479798ms.

Tuning of G1 collector

We tuned the application with a max heap size of 4 GB by adjusting -XX:MaxGCPauseMillis and -XX:G1HeapRegionSize and achieved a decent improvement in the execution time. The defaults for these parameters are given below.

-XX:MaxGCPauseMillis=200 (milliseconds): This is a soft goal. It says that the maximum pause time should not be more than 200ms; the G1 algorithm tries to achieve this as far as possible, but it does not guarantee it.

-XX:G1HeapRegionSize=1m to 32m: The region size ranges from 1 MB to 32 MB. It should ideally target around 2048 regions within the minimum heap size specified.
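The relationship between heap size, region size and region count is plain arithmetic, sketched below. This is only an illustration of the "about 2048 regions, between 1 MB and 32 MB" guideline stated above, not the JVM's actual ergonomics; rounding down to a power of two is an assumption of this sketch, and the class and method names are hypothetical.

```java
// Illustration of the region-size guideline; real JVM ergonomics may differ.
class G1RegionSizing {

    static final long MB = 1024L * 1024L;

    /** Region size that yields roughly 2048 regions, clamped to 1 MB..32 MB. */
    static long suggestedRegionSizeBytes(long minHeapBytes) {
        long raw = minHeapBytes / 2048;
        long clamped = Math.max(1 * MB, Math.min(32 * MB, raw));
        return Long.highestOneBit(clamped); // assumption: round down to a power of two
    }

    /** Number of regions a heap of the given size is split into. */
    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    public static void main(String[] args) {
        // -Xms2048m: 2048 MB / 2048 regions gives 1 MB regions
        System.out.println(suggestedRegionSizeBytes(2048 * MB) / MB + " MB");
        // -Xmx4096m with 4 MB regions gives 1024 regions
        System.out.println(regionCount(4096 * MB, 4 * MB) + " regions");
    }
}
```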


Here we have tuned our application with the following JVM parameters: -Xmx4096m -Xms2048m -XX:+UseG1GC -XX:MaxGCPauseMillis=600 -XX:G1HeapWastePercent=20 -XX:G1HeapRegionSize=4m

We found that the response time was poor both when we increased the pause target further and when we used a smaller pause target. The purpose of a higher target is better throughput, whereas a lower target aims for lower latency.

We also experimented with the heap region size and found that decreasing it below the above value increased the execution time of the program.

As a result of the above tuning, the execution time was 381074ms, which was lower than the time taken by CMS. As already stated, though, the benefit of G1 is more visible on server-class machines with a much larger memory model and a heavier application load.
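To put the three measured execution times side by side, the relative improvement can be computed as below. The figures are the ones reported in this article (CMS 395108ms, default G1 479798ms, tuned G1 381074ms); the class and method names are hypothetical.

```java
// Illustrative arithmetic on the execution times reported in the article.
class RunTimeComparison {

    /** Percentage by which the tuned run is faster than the baseline run. */
    static double percentFaster(long baselineMs, long tunedMs) {
        return 100.0 * (baselineMs - tunedMs) / baselineMs;
    }

    public static void main(String[] args) {
        long cmsMs = 395_108;
        long g1DefaultMs = 479_798;
        long g1TunedMs = 381_074;

        // Tuned G1 against default G1: roughly 20.6% faster
        System.out.printf("vs default G1: %.1f%%%n", percentFaster(g1DefaultMs, g1TunedMs));
        // Tuned G1 against CMS: roughly 3.6% faster
        System.out.printf("vs CMS:        %.1f%%%n", percentFaster(cmsMs, g1TunedMs));
    }
}
```

The gain over CMS is small compared with the gain over untuned G1, which is consistent with the caveat above about heap size and load.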





