This post is part of our series about transcoding. Read them all via the “Transcoding” link to get an understanding of how we proceed to convert an application 100% automatically from Cobol to Java.
Cobol to Java for large systems
We always migrate (very) large applications with advanced Cobol programs: we have worked on assets ranging from 4 to 12 million lines of Cobol source code. These big applications are in use in large corporations: many thousands of users process data at very high transactional rates. Some programs used in those transactions may have thousands of variables, and a single transaction can chain calls to tens of those programs. Total activity may reach millions of transactions per day.
This article explains how we handle the usual trade-off between memory consumption and CPU cycles in our runtime framework. It is all about reducing the impact of “blocking” garbage collection in Java, which means thousands of users stopped in their business activity.
Java garbage collection: principles
Java liberates the programmer from managing object memory deallocation: the JVM does it by itself, recognizing objects that are fully dereferenced (i.e. not pointed to by any other live object) and freeing the corresponding memory for further reallocation. This is a big relief for programmers and a nice boost to their productivity.
But somebody has to take care of freeing the memory of those objects that are no longer referenced by any other object. That is the mission of the garbage collector in the Java Virtual Machine. Some of this housekeeping is done in the JVM by system threads running in parallel with application code. This is usually called “non-blocking” garbage collection. [For more details about garbage collection, the corresponding Wikipedia article is a great starting point.]
But at some point, when memory gets really messy and overcrowded, the JVM has to stop the application code in order to compact the active objects in memory as much as possible, making new allocations simpler and more efficient. This means moving the objects that are still active and updating the pointers to them within running code; hence the full stop required for the application, in order to avoid transient dangling pointers. This kind of garbage collection is usually called full GC.
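As a quick way to see this activity from inside a JVM, here is a minimal sketch (our own illustration, using only the standard `java.lang.management` API) that reports how many times each collector has run and how much time it consumed. The collector bean names vary by JVM version and chosen collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // One bean per collector: typically a young-generation collector
        // and an old-generation collector (the latter covers full GC).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
    }
}
```

Watching the old-generation counter climb during office hours is a simple first signal that users are being hit by blocking pauses.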
Impact of garbage collection on user productivity
In our first project, we did not initially take care of garbage collection, especially full GC, considering it a natural and acceptable tribute to pay for the relief of memory management by the system. But when load increased to a few hundred users, we had to rethink it thoroughly, as we would experience 20 to 30 full GC pauses of 20 to 25 seconds each during office hours. Do the full maths: 25 pauses of about 22 seconds freeze each user for roughly 9 minutes per day, so across thousands of users that means hundreds of hours of work lost collectively per day, just waiting for the traditional hourglass icon to disappear…
We were on Java 5 at the time; the standard is now Java 7, where a lot of progress has been made to reduce full GC: this article by Oracle explains how.
Even with this evolution across Java versions to reduce GC to a minimum, it still happens on live systems.
Garbage collection under control
So, over time, we developed our own strategy to reduce full GC to a very strict minimum: just a couple per day, and they happen mostly at periods of very low activity. In fact, most happen very early in the morning when no user is yet active: the cleanup by the garbage collector is done right after system start.
Why is it so? Because we rely heavily on our own very controlled object allocation:
- the transcoded application code in Java does not allocate any new objects: it relies on the Vars objects allocated by our runtime in the transcoded working storage section (see our other post on Cobol variables for more details).
- those Vars objects belong to the category of “managed objects” in our framework. Most objects needed to represent Cobol programs and their processing activity also belong to this category: WorkingStorageSection, CobolProgram, LinkageSection, SQL connections, user terminals, etc.
- all managed objects are handled by a class acting as a very usual object factory (more details on the Factory pattern in this Wikipedia article). A cache of already allocated but currently free instances of a given managed class is part of this factory.
When a request for a new instance of a given managed class arrives at the factory, it first checks whether any free instance is already available. If so, that instance is handed back right away (after re-initialization of its data values) to the requester; if not, a new instance is created.
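This acquire-or-create logic can be sketched as a simple pooled factory. The class and method names below (ManagedFactory, acquire, release, reset, DemoVar) are our own illustration, not the actual API of our runtime framework:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative pooled factory: reuses "managed" instances instead of
// allocating a fresh object (and future garbage) on every request.
public class ManagedFactory<T extends ManagedFactory.Managed> {

    // Managed objects must know how to re-initialize their data values before reuse.
    public interface Managed { void reset(); }

    private final Supplier<T> constructor;
    private final Deque<T> free = new ArrayDeque<>();

    public ManagedFactory(Supplier<T> constructor) { this.constructor = constructor; }

    // Hand back a cached instance if one is free; otherwise create a new one.
    public T acquire() {
        T instance = free.poll();
        if (instance == null) {
            instance = constructor.get();
        }
        instance.reset();          // re-initialization of data values for the requester
        return instance;
    }

    // Return an instance to the cache once the caller is done with it.
    public void release(T instance) {
        free.push(instance);
    }

    // Tiny demo: a fake Vars-like managed object.
    static class DemoVar implements Managed {
        int value;
        public void reset() { value = 0; }
    }

    public static void main(String[] args) {
        ManagedFactory<DemoVar> factory = new ManagedFactory<>(DemoVar::new);
        DemoVar v = factory.acquire();
        v.value = 42;
        factory.release(v);
        DemoVar reused = factory.acquire();   // same instance, data re-initialized
        System.out.println(reused == v);      // prints "true"
    }
}
```

The key property for GC pressure is that, once the pool reaches its working size, steady-state traffic allocates nothing at all.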
The cache is initialized at system start with a number of instances given as a parameter. On the other side, we collect, via MBeans conforming to Java Management Extensions, also known as JMX (more details in this Wikipedia article), lots of numbers and statistics about the activity of the factory and its associated cache. After just a few days of observation, it is very easy to determine the right value corresponding to the “average maximum” number of simultaneous instances of a given managed class.
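Exposing such statistics over JMX is straightforward with a standard MBean. The sketch below is our own illustration (the interface, attribute names, and ObjectName are made up, not our framework’s real ones); it follows the standard JMX convention that the management interface must be named &lt;Class&gt;MBean:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Standard JMX convention: the management interface is named <Class>MBean.
interface FactoryStatsMBean {
    long getRequestCount();
    int getPeakInUse();
}

public class FactoryStats implements FactoryStatsMBean {
    private long requestCount;
    private int peakInUse;

    // The factory would call this each time it serves a request (illustrative only).
    public void recordRequest(int currentlyInUse) {
        requestCount++;
        if (currentlyInUse > peakInUse) peakInUse = currentlyInUse;
    }

    @Override public long getRequestCount() { return requestCount; }
    @Override public int getPeakInUse() { return peakInUse; }

    public static void main(String[] args) throws Exception {
        FactoryStats stats = new FactoryStats();
        stats.recordRequest(7);

        // Publish the statistics so a JMX console (jconsole, VisualVM) can read them.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("runtime:type=ManagedFactory,name=Vars");
        server.registerMBean(stats, name);
        System.out.println("peak in use = " + server.getAttribute(name, "PeakInUse"));
    }
}
```

The “PeakInUse”-style attribute is exactly the number you watch for a few days to pick the pool-size parameter.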
By using this value in the property file of our runtime framework, we obtain a very simple way to minimize full GC on the transcoded application during office hours:
- the initialization of our runtime framework happens at JVM start (let’s say at 5:00 am)
- all the instances of objects of managed classes defined by our initialization parameters are created by the factory and stored in its list of available instances
- this initialization sequence is fairly intense in memory allocation and fragmentation, as (tens of) millions of new objects, some of them transient, are allocated
- while it happens, normal GC activity takes place (even full GC), but that’s fine because nobody is there yet
- when it’s over, the application is ready to run with caches full of the needed instances of variables, working storage sections, program descriptions, etc., and the JVM memory space is rather clean and compacted, because the GC could do the needed memory sweeps and object compactions with no harm while nobody was there.
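The warm-up step of this startup sequence can be sketched as follows. Everything here is illustrative: the property key "vars.pool.size", the method names, and the use of StringBuilder as a stand-in managed object are our own assumptions, not the real framework:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Properties;
import java.util.function.Supplier;

// Illustrative warm-up: pre-create the configured number of instances at JVM start,
// so the allocation churn (and any resulting full GC) happens before users arrive.
public class PoolWarmUp {

    public static <T> Deque<T> preallocate(int size, Supplier<T> constructor) {
        Deque<T> free = new ArrayDeque<>(size);
        for (int i = 0; i < size; i++) {
            free.push(constructor.get());
        }
        return free;
    }

    public static void main(String[] args) {
        // In the real framework this value comes from the runtime's property file,
        // tuned from the JMX "average maximum" observed over a few days.
        Properties config = new Properties();
        config.setProperty("vars.pool.size", "10000");
        int size = Integer.parseInt(config.getProperty("vars.pool.size"));

        Deque<StringBuilder> pool = preallocate(size, StringBuilder::new);
        System.out.println("preallocated " + pool.size() + " instances");
    }
}
```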
So, when people arrive, even all at once, GC activity remains very low because all needed objects are already allocated: the factory just hands out the objects that are ready in its list.
Of course, our strategy is a clear application of the usual time vs memory computing trade-off: in order to save time, you have to spend more on memory.
This extra memory may have been an issue in the past on proprietary systems where memory was extremely expensive. But since our favorite and recommended targets for migration are x86 servers with very cheap components, adding 1 gigabyte of memory to maintain user productivity is no longer an issue!