Recently, a project I was working on ran into a memory leak. I used the JVM's verbose GC log, GCeasy.io, Eclipse Memory Analyzer (MAT), and IntelliJ to find the root cause and fix it.
While monitoring our distributed Akka application, I noticed that one of the Akka micro-services was spitting out “slow heartbeat” warnings, which suggested taking a look at slow GC in the JVM.
First, I turned on verbose GC on the JVM running the problematic micro-service. With the help of GCeasy.io, I could clearly tell there was a memory leak that was making full GC slower and slower.
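Verbose GC is normally switched on with JVM flags such as -verbose:gc (or -Xlog:gc* on JDK 9 and later), and the resulting log file is what a tool like GCeasy.io analyzes. If restarting the service with new flags is inconvenient, a rough view of the same trend can be sampled from inside the JVM. The snippet below is only a sketch of that idea using the standard java.lang.management API; the class name GcSnapshot and the one-second sampling loop are my own assumptions, not something taken from the project.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of the original service): samples cumulative GC
// counters so that growing GC time and heap usage can be spotted over time.
final class GcSnapshot {
    final long collections;   // total GC runs across all collectors
    final long gcMillis;      // total accumulated GC time in milliseconds
    final long heapUsedBytes; // heap currently in use

    GcSnapshot(long collections, long gcMillis, long heapUsedBytes) {
        this.collections = collections;
        this.gcMillis = gcMillis;
        this.heapUsedBytes = heapUsedBytes;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        long used = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        return new GcSnapshot(count, millis, used);
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart and print them.
        for (int i = 0; i < 3; i++) {
            GcSnapshot s = take();
            System.out.printf("collections=%d gcTimeMs=%d heapUsedMB=%d%n",
                    s.collections, s.gcMillis, s.heapUsedBytes / (1024 * 1024));
            Thread.sleep(1000);
        }
    }
}
```

With a leak like the one described here, I would expect heap usage after each collection to keep creeping up and the accumulated GC time to grow faster and faster, but the GC log remains the more detailed source.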
Confirming that the slow GC was caused by a memory leak was a nice first step; however, the next questions were what caused the leak and how I could get more insight into the JVM heap when the slow GC happened.
Second, I took a heap dump and used Eclipse Memory Analyzer (MAT) to see which class held most of the memory.
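A heap dump is usually taken from outside the process with a tool such as jmap or jcmd. For completeness, here is a minimal sketch of triggering one from inside a HotSpot JVM through the HotSpotDiagnosticMXBean. The class name HeapDumper and the default file name service-heap.hprof are assumptions for illustration only, not how the dump was taken in this project.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes a heap dump of the current JVM to a file
// that can then be opened in Eclipse MAT. Works on HotSpot-based JVMs only.
final class HeapDumper {

    // liveOnly = true dumps only reachable objects, which is what matters for a leak.
    static Path dump(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file, so remove it first.
            Files.deleteIfExists(target);
            // Recent JDKs require the file name to end in ".hprof".
            bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path out = Paths.get(args.length > 0 ? args[0] : "service-heap.hprof");
        System.out.println("Heap dump written to " + dump(out, true).toAbsolutePath());
    }
}
```

Dumping only live objects keeps the file smaller and makes the dominant classes easier to spot in MAT's histogram and dominator tree views.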
With the heap dump and MAT, I could confirm that the memory leak was caused by too many instances of the class akka.actor.LightArrayRevolverScheduler. I then googled it and read the Akka documentation, which made me think the leak was related to scheduled jobs.
Third, I pulled up the Akka source code in IntelliJ, found the code where the jobs were scheduled, and set a breakpoint on that line. After starting the micro-service in debug mode and sending it a few messages, voilà! The breakpoint was hit for every message sent to the micro-service. In the application, the micro-service was supposed to receive millions of messages, which meant that it started a new scheduled job for every message it received. There it was: the memory leak I was hunting for. From the call stack I figured out that for every message the micro-service received, we were calling SerializationExtension.createExtension(system) rather than SerializationExtension.get(system).
The solution was simple: after replacing every SerializationExtension.createExtension(system) call with SerializationExtension.get(system), the memory leak was fixed.
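To show why that one-line change matters, here is a small plain-Java simulation of the two call styles. It does not use Akka at all: FakeSystem, FakeSerialization, and FakeSerializationExtension are hypothetical stand-ins I made up, and the instance counter merely stands in for whatever per-instance work the real extension does (in our case, a scheduled job). The assumption being illustrated is that createExtension builds a fresh instance on every call, while get returns the one instance already registered for the system.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simulation only: none of these classes are Akka classes.
final class ExtensionLeakDemo {

    // Simulates handling `messages` messages and returns how many extension
    // instances were created while doing so.
    static int instancesCreatedBy(boolean useGet, int messages) {
        FakeSystem system = new FakeSystem("demo");
        int before = FakeSerialization.INSTANCES.get();
        for (int i = 0; i < messages; i++) {
            if (useGet) {
                FakeSerializationExtension.get(system);
            } else {
                FakeSerializationExtension.createExtension(system);
            }
        }
        return FakeSerialization.INSTANCES.get() - before;
    }

    public static void main(String[] args) {
        // prints "instances created via createExtension: 1000"
        System.out.println("instances created via createExtension: " + instancesCreatedBy(false, 1000));
        // prints "instances created via get: 1"
        System.out.println("instances created via get: " + instancesCreatedBy(true, 1000));
    }
}

// Hypothetical stand-in for an actor system.
final class FakeSystem {
    final String name;
    FakeSystem(String name) { this.name = name; }
}

// Hypothetical stand-in for the extension instance.
final class FakeSerialization {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    FakeSerialization(FakeSystem system) {
        // Stands in for per-instance setup work, such as scheduling a job.
        INSTANCES.incrementAndGet();
    }
}

// Hypothetical stand-in for the extension id with its two access paths.
final class FakeSerializationExtension {
    private static final Map<FakeSystem, FakeSerialization> REGISTRY = new ConcurrentHashMap<>();

    // Factory path: always builds a new instance.
    static FakeSerialization createExtension(FakeSystem system) {
        return new FakeSerialization(system);
    }

    // Lookup path: builds once per system, then returns the registered instance.
    static FakeSerialization get(FakeSystem system) {
        return REGISTRY.computeIfAbsent(system, FakeSerialization::new);
    }
}
```

With millions of messages instead of 1000, the factory path creates millions of instances where the lookup path creates one, which matches the pile-up I saw in the heap dump.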
It was a good journey digging into this interesting problem. What I learned from the process is to always ask yourself what additional information would help answer your question and which tools can give it to you:
Slow GC? - Verbose GC
Memory leak? - Heap dump + Eclipse MAT
Why does the code behave this way? - IntelliJ + source code