March | 2012 | Eranea

Monthly Archives: March 2012

Large Cobol transactional systems migrated to Java: impact of garbage collection

This post is part of our series about transcoding. Read them all via the “Transcoding” link to understand how we convert an application 100% automatically from Cobol to Java.

Cobol to Java for large systems

We always migrate (very) large applications built from advanced Cobol programs: we have worked on assets ranging from 4 to 12 million lines of Cobol source code. These big applications are in use in large corporations, where many thousands of users process data at very high transactional rates. Some programs used in those transactions have thousands of variables, and a single transaction can chain calls to tens of those programs. Total activity can reach millions of transactions per day.

This article explains how we handle the usual trade-off between memory consumption and CPU cycles in our runtime framework. It is all about reducing the impact of “blocking garbage collection” in Java, which means thousands of users stopped in their business activity.

Java garbage collection: principles

Java frees the programmer from managing object memory deallocation: the JVM does it by itself, by recognizing the objects that are fully dereferenced (i.e. not pointed to by any other live object) and by freeing the corresponding memory for further reallocation. This is a big relief for programmers and a nice boost to their productivity.

But somebody has to take care of freeing the memory corresponding to those objects that are no longer referenced by any other object. That is the mission of the garbage collector in the Java Virtual Machine. Some of this housekeeping is done in the JVM by system threads running in parallel with application code. This is usually called “non-blocking” garbage collection. [For more details about garbage collection, the corresponding Wikipedia article is a great starting point]

But at some point, when the heap gets really messy and overcrowded, the JVM has to stop the application code in order to compact the active objects in memory as much as possible, which makes new allocations simpler and more efficient. It means moving the objects that are still active and updating the pointers to them within running code. Hence the full stop required for the application, in order to avoid transient dangling pointers. This kind of garbage collection is usually called full GC.

Impact of garbage collection on user productivity

In our first project, we did not initially take care of garbage collection, especially full GC, considering it a natural and acceptable price to pay for the relief of memory management by the system. But when the load increased to a few hundred users, we had to rethink it thoroughly, as we would experience 20 to 30 periods of full GC of 20s to 25s each during office hours: when you do the full maths, it means hundreds of hours of work lost collectively per day by thousands of users just waiting for the traditional hourglass icon to disappear…
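As a rough check of that estimate, the arithmetic can be sketched as follows. The pause count and duration are midpoints of the ranges quoted above; the figure of 1,000 simultaneously blocked users is an assumption chosen only for illustration, not a measurement.

```java
// Rough estimate of working time lost to blocking garbage collection.
// The user count passed in main() is a hypothetical value.
final class GcPauseCost {
    /** Total hours lost per day when every user waits through every pause. */
    static double lostHoursPerDay(int pausesPerDay, double secondsPerPause, int users) {
        double secondsPerUser = pausesPerDay * secondsPerPause;
        return secondsPerUser * users / 3600.0;
    }

    public static void main(String[] args) {
        // 25 pauses of 22.5 s each, 1,000 users (assumed)
        System.out.println(lostHoursPerDay(25, 22.5, 1000) + " hours lost per day");
    }
}
```

With these inputs, each user waits a little over nine minutes per day, which adds up to well over a hundred hours across the whole user population.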

That was with Java 5. The standard is now Java 7, where a lot of progress has been made to reduce full GC: this article by Oracle explains how.

Even with this evolution across Java versions to reduce full GC to a minimum, it still happens on live systems.

Garbage collection under control

So, over time, we developed our strategy to reduce full GC to a very strict minimum: just a couple per day, and they happen mostly at periods of very low activity. In fact, most happen very early in the morning when no user is active yet: the cleanup by the garbage collector is done right after system start.

Why is it so? Because we rely heavily on our own very controlled object allocation:

  • The transcoded application code in Java does not allocate any new object: it relies on the Var objects allocated by our runtime in the transcoded working storage section (see our other post on Cobol variables for more details).
  • Those Var objects belong to the category of “managed objects” in our framework. Most objects needed to represent Cobol programs and their processing activity also belong to this category: WorkingStorageSection, CobolProgram, LinkageSection, SQL connections, user terminals, etc.
  • All managed objects are handled by a class acting as a very usual object factory (more details on the Factory pattern in this Wikipedia article). A cache of already allocated but currently free instances of a given managed class is part of this factory.

When a request for a new instance of a given managed class arrives at the factory, it first checks whether a free instance is already available. If so, that instance is handed to the requester right away (after re-initialization of its data values); if not, a new instance is created.
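The behavior of such a factory can be sketched as a small object pool. This is a minimal illustration, not the code of our runtime: the class name ManagedPool, its methods and the counters are hypothetical, and the initial size parameter stands for the number of instances pre-allocated at system start.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pool for one managed class; all names are illustrative.
final class ManagedPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> creator;   // builds a brand-new instance
    private final Consumer<T> resetter;  // re-initializes data values before reuse
    private int created;                 // instances allocated so far
    private int inUse;                   // instances currently handed out
    private int peakInUse;               // highest simultaneous usage observed

    ManagedPool(Supplier<T> creator, Consumer<T> resetter, int initialSize) {
        this.creator = creator;
        this.resetter = resetter;
        for (int i = 0; i < initialSize; i++) {   // pre-fill the cache
            free.push(creator.get());
            created++;
        }
    }

    /** Returns a free instance if one exists, otherwise allocates a new one. */
    synchronized T acquire() {
        T instance = free.poll();
        if (instance == null) {
            instance = creator.get();
            created++;
        } else {
            resetter.accept(instance);
        }
        inUse++;
        peakInUse = Math.max(peakInUse, inUse);
        return instance;
    }

    /** Puts an instance back in the cache instead of leaving it to the GC. */
    synchronized void release(T instance) {
        inUse--;
        free.push(instance);
    }

    synchronized int created()   { return created; }
    synchronized int peakInUse() { return peakInUse; }
}
```

The peakInUse counter is the kind of figure one would watch to size the cache: if it stays below the initial size during office hours, no managed object is allocated while users are working.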

The cache is initialized at system start with a number of instances given as a parameter. In addition, we collect lots of numbers and statistics about the activity of the factory and its associated cache via MBeans conforming to Java Management Extensions, also known as JMX (more details in this Wikipedia article). After just a few days of observation, it is very easy to determine the right value, corresponding to the “average maximum” of simultaneous instances of a given managed class.

By using this value in the property file of our runtime framework, we obtain a very simple way to minimize the full GC on the transcoded application during office hours:

  • the initialization of our runtime framework happens at JVM start (let’s say at 5:00 am)
  • all the instances of objects of managed classes defined by our initialization parameters are created by the factory and stored in its list of available instances
  • this initialization sequence fragments memory fairly heavily, as (tens of) millions of new objects, some of them transient, are allocated
  • while it happens, the normal GC activity takes place (even full GC), but that is fine because nobody is there
  • when it’s over, the application is ready to run with caches full of needed instances of variables, working storage sections, program descriptions, etc., and the JVM memory space is rather clean and compacted because the GC could do the needed memory sweeps and object compactions with no harm, as nobody was there.
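The start-up sequence above can be sketched as follows. This is only an illustration under stated assumptions: the property key format "pool.<ManagedClass>.initialSize" and the class CacheWarmUp are hypothetical and are not the actual configuration format of our runtime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical start-up warm-up: property keys and class names are illustrative.
final class CacheWarmUp {
    static final String PREFIX = "pool.";
    static final String SUFFIX = ".initialSize";

    /** Parses property-file text (kept in memory here to stay self-contained). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Extracts the "pool.<ManagedClass>.initialSize = <count>" entries. */
    static Map<String, Integer> initialSizes(Properties props) {
        Map<String, Integer> sizes = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            boolean longEnough = key.length() > PREFIX.length() + SUFFIX.length();
            if (longEnough && key.startsWith(PREFIX) && key.endsWith(SUFFIX)) {
                String name = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                sizes.put(name, Integer.parseInt(props.getProperty(key).trim()));
            }
        }
        return sizes;
    }

    /** Allocates every configured instance before the first user connects. */
    static Map<String, Deque<Object>> warmUp(Map<String, Integer> sizes,
                                             Map<String, Supplier<Object>> creators) {
        Map<String, Deque<Object>> caches = new HashMap<>();
        sizes.forEach((name, count) -> {
            Supplier<Object> creator = creators.get(name);
            if (creator == null) {
                throw new IllegalArgumentException("No creator registered for " + name);
            }
            Deque<Object> free = new ArrayDeque<>();
            for (int i = 0; i < count; i++) {
                free.push(creator.get());
            }
            caches.put(name, free);
        });
        return caches;
    }
}
```

All the allocation cost, and the garbage collection it triggers, is concentrated in warmUp(), which runs once while no user is connected.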

So, when people arrive, even all at once, the GC activity remains very low because all needed objects are already allocated: the factory just uses the objects that are ready in its list.

Of course, our strategy is a clear application of the usual time vs memory computing trade-off: in order to save time, you have to spend more on memory.

This memory may have been an issue in the past with proprietary systems where memory was extremely expensive. But, as our favorite and recommended targets for migration are x86 servers with very cheap components, adding 1 gigabyte of memory to maintain user productivity is no longer an issue!


Cobol working storage section and variable declarations transcoded to Java

This post is part of our series about transcoding. Read them all via the “Transcoding” link to understand how we convert an application 100% automatically from Cobol to Java.

In this post, we describe the challenges created by the implementation of data representation and variable declaration (PICTURE, LEVEL, etc.) in the working storage section and linkage section of a Cobol program. We also explain how we implemented their transcoding to Java in order to fully respect the original semantics of Cobol and consequently achieve iso-functionality, the core of our technology.

Cobol working storage section and variable declarations

Cobol has a very “physical” description of data and variable declarations in its working storage section: a programmer defines a data structure and corresponding variable with a given mask (materialized by PIC, PICTURE directives – see below) and then can redefine it with another mask using other basic data types with different PICTURES. See AG01MA1O redefining AG01MA1I here below.

The initial definition and the REDEFINES apply to the same physical memory block. The levels (02, 03, 05, 10, 15, etc.) are used to express subsequent parts of the definition mask. Those parts can:

  • be contiguous and describe the memory block further when at the same level. In the code example below, A1CLIL and the preceding FILLER are both at level 03: the FILLER takes 12 bytes, and A1CLIL then takes 2 bytes right after this FILLER (a COMP PIC S9(4) is stored as a 2-byte binary number, which matches the FILLER PIC X(2) at the same position in AG01MA1O).
  • describe the current level in more detail: A1CLIL, the preceding FILLER and the subsequent declarations at level 03 progressively extend the size of AG01MA1I.
  • redefine an already defined level via the use of REDEFINES and a mask at the same level as the one that it redefines. In the example below, AG01MA1O at level 02 redefines the view of memory already defined by AG01MA1I, using the same memory addresses but for other purposes.

02 AG01MA1I.
    03 FILLER PIC X(12).
    03 A1CLIL COMP PIC S9(4).
    03 A1CLIF PIC X.
    03 FILLER PIC X(2).
    03 A1CLII PIC X(49).
    03 S8I OCCURS 17 TIMES.
        04 A1LGL COMP PIC S9(4).
        04 A1LGF PIC X.
        04 FILLER PIC X(2).
        04 A1LGI PIC X(2).
        04 A1TXTL COMP PIC S9(4).
        04 A1TXTF PIC X.
        04 FILLER PIC X(2).
        04 A1TXTI PIC X(71).
02 AG01MA1O REDEFINES AG01MA1I.
    03 FILLER PIC X(12).
    03 FILLER PIC X(2).
    03 A1CLIA PIC X.
    03 A1CLIC PIC X.
    03 A1CLIH PIC X.
    03 A1CLIO PIC X(49).
    03 S8O OCCURS 17 TIMES.
        04 FILLER PIC X(2).
        04 A1LGA PIC X.
        04 A1LGC PIC X.
        04 A1LGH PIC X.
        04 A1LGO PIC X(2).
        04 FILLER PIC X(2).
        04 A1TXTA PIC X.
        04 A1TXTC PIC X.
        04 A1TXTH PIC X.
        04 A1TXTO PIC X(71).

When going back to the roots of Cobol, it means that these definitions correspond to the same chunk of memory seen as a single and continuous data buffer. The compiler then uses different assembler instructions at runtime to process this same memory with the basic mask or a redefinition mask depending on the variable names used by the Cobol programmer.

Emulation of Cobol semantics in Java

When you want to emulate this behavior of Cobol via Java data structures, there are a couple of additional things to preserve:

  • the PICTURE of a variable defines how it has to be manipulated: an INITIALIZE on a PIC X(n) sets it to space characters, whereas the same INITIALIZE on a PIC 9(n) sets it to zeroes
  • each variable at each level has to be defined and accessible individually, because it can be used as such in the program, but it must also be part of a more global memory structure that can be impacted by program statements of higher order: a MOVE to a variable at level 05 will impact all its subcomponents at levels 10, 15, etc.
  • variables that are defined contiguously in Cobol have to be represented the same way in Java, because use of a variable at a given level (let’s say 05) can impact all its subparts in their specific order. They cannot be fully distinct objects per se: they need to share an underlying data memory object where the data is stored, so that the MOVE semantics just described is preserved for data described in a contiguous manner in the working storage section.

The transcoder translates the working storage example above into the following Java declarations, with the original Cobol kept as comments on the right:

public Var ag01ma1i = declare.level(2).var() ;                              // (20) 02  AG01MA1I.
public Var filler$1 = declare.level(3).picX(12).filler() ;              // (21)     03 FILLER PIC X(12).
public Var a1clil = declare.level(3).picS9(4).comp().var() ;            // (22)     03 A1CLIL COMP PIC S9(4).
public Var a1clif = declare.level(3).picX(1).var() ;                    // (23)     03 A1CLIF PIC X.
public Var filler$2 = declare.level(3).picX(2).filler() ;               // (24)     03 FILLER PIC X(2).
public Var a1clii = declare.level(3).picX(49).var() ;                   // (27)     03 A1CLII PIC X(49).
public Var s8i = declare.level(3).occurs(17).var() ;                    // (28)     03 S8I OCCURS  17 TIMES.
public Var a1lgl = declare.level(4).picS9(4).comp().var() ;         // (29)       04 A1LGL COMP PIC S9(4).
public Var a1lgf = declare.level(4).picX(1).var() ;                 // (30)       04 A1LGF PIC X.
public Var filler$3 = declare.level(4).picX(2).filler() ;           // (31)       04 FILLER PIC X(2).
public Var a1lgi = declare.level(4).picX(2).var() ;                 // (32)       04 A1LGI PIC X(2).
public Var a1txtl = declare.level(4).picS9(4).comp().var() ;        // (33)       04 A1TXTL COMP PIC S9(4).
public Var a1txtf = declare.level(4).picX(1).var() ;                // (34)       04 A1TXTF PIC X.
public Var filler$4 = declare.level(4).picX(2).filler() ;           // (35)       04 FILLER PIC X(2).
public Var a1txti = declare.level(4).picX(71).var() ;               // (36)       04 A1TXTI PIC X(71).
public Var ag01ma1o = declare.level(2).redefines(ag01ma1i).var() ;          // (37) 02  AG01MA1O REDEFINES AG01MA1I.
public Var filler$5 = declare.level(3).picX(12).filler() ;              // (38)     03 FILLER PIC X(12).
public Var filler$6 = declare.level(3).picX(2).filler() ;               // (39)     03 FILLER PIC X(2).
public Var a1clia = declare.level(3).picX(1).var() ;                    // (40)     03 A1CLIA PIC X.
public Var a1clic = declare.level(3).picX(1).var() ;                    // (41)     03 A1CLIC PIC X.
public Var a1clih = declare.level(3).picX(1).var() ;                    // (42)     03 A1CLIH PIC X.
public Var a1clio = declare.level(3).picX(49).var() ;                   // (43)     03 A1CLIO PIC X(49).
public Var s8o = declare.level(3).occurs(17).var() ;                    // (44)     03 S8O OCCURS  17 TIMES.
public Var filler$7 = declare.level(4).picX(2).filler() ;           // (45)       04 FILLER PIC X(2).
public Var a1lga = declare.level(4).picX(1).var() ;                 // (46)       04 A1LGA PIC X.
public Var a1lgc = declare.level(4).picX(1).var() ;                 // (47)       04 A1LGC PIC X.
public Var a1lgh = declare.level(4).picX(1).var() ;                 // (48)       04 A1LGH PIC X.
public Var a1lgo = declare.level(4).picX(2).var() ;                 // (49)       04 A1LGO PIC X(2).
public Var filler$8 = declare.level(4).picX(2).filler() ;           // (50)       04 FILLER PIC X(2).
public Var a1txta = declare.level(4).picX(1).var() ;                // (51)       04 A1TXTA PIC X.
public Var a1txtc = declare.level(4).picX(1).var() ;                // (52)       04 A1TXTC PIC X.
public Var a1txth = declare.level(4).picX(1).var() ;                // (53)       04 A1TXTH PIC X.
public Var a1txto = declare.level(4).picX(71).var() ;               // (54)       04 A1TXTO PIC X(71).

Additional issues between Cobol and Java are:

  • a character is 8 bits in Cobol, whereas Java uses UTF-16, where a char is sixteen bits
  • number precision can be very high in Cobol: numbers with 31 digits can be defined. Numbers with such a precision do not exist among the native numeric types of Java: a long covers at most 18 full decimal digits. BigDecimal objects can hold that many digits, but they do not apply the Cobol rules for truncation and rounding by themselves. Our runtime framework implements an optimized representation of those big numbers and makes sure that operations (ADD, SUBTRACT, MULTIPLY, DIVIDE, etc.) produce the exact same results (incl. rounding) as if the operations were executed natively in Cobol.
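To give an idea of the precision involved, here is a minimal sketch of storing a 31-digit result into a field with two decimals. The standard java.math.BigDecimal is used purely as a stand-in (it is not the optimized representation of our runtime), the class CobolDecimal is hypothetical, and the rounding rules shown are the usual Cobol defaults: truncation without ROUNDED, rounding half away from zero with ROUNDED.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustration only: BigDecimal stands in for the framework's own number type.
final class CobolDecimal {
    /**
     * Stores a computed result into a field with the given number of decimals.
     * rounded == false mimics a plain ADD/COMPUTE (extra decimals are truncated);
     * rounded == true mimics the ROUNDED phrase (half away from zero).
     * Overflow of the integer part (ON SIZE ERROR) is not handled in this sketch.
     */
    static BigDecimal store(BigDecimal result, int decimals, boolean rounded) {
        RoundingMode mode = rounded ? RoundingMode.HALF_UP : RoundingMode.DOWN;
        return result.setScale(decimals, mode);
    }
}
```

For example, adding 0.005 to 12345678901234567890123456789.12 (29 integer digits and 2 decimals, far beyond the range of a long) and storing the sum with two decimals gives a value ending in .12 without ROUNDED and in .13 with ROUNDED.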

In contrast, the walls in Java are very well defined: each object is independent of all the others and they don’t share memory.

Experienced Cobol programmers can code very tricky statements in order to make the most out of the REDEFINES mechanisms described above, in terms of maximum execution speed and minimum memory consumption. Cobol was invented when both CPU and memory were very expensive.

Cobol semantics have to be ported unchanged to Java to respect the full iso-functionality of processing that is a unique selling point (USP) of Eranea’s solution. Any change to these semantics due to an approximate implementation of Cobol data structures would mean lots of very tricky issues, hard and costly to debug, especially as they would appear in the smartest Cobol assets.

So, there is no other solution than a full emulation of the Cobol behavior in Java: for each data structure (usually starting at level 01), we allocate a permanent and shared memory buffer where the various levels of definition of the original variable get “physically” concatenated as the successive definitions in the working storage section happen.
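The principle of several variables sharing one buffer can be sketched in a few lines. This is a deliberately tiny illustration, not our Var implementation: SharedBuffer and FieldView are hypothetical names, and the single-byte ISO-8859-1 encoding is an assumption made to keep one character per byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical miniature: one byte buffer shared by several named views.
final class SharedBuffer {
    final byte[] bytes;

    SharedBuffer(int size) {
        bytes = new byte[size];
        Arrays.fill(bytes, (byte) ' ');   // alphanumeric storage starts as spaces
    }
}

final class FieldView {
    private final SharedBuffer buffer;
    private final int offset;
    private final int length;

    FieldView(SharedBuffer buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    /** Alphanumeric MOVE: left-justified, padded with spaces or truncated. */
    void move(String value) {
        byte[] src = value.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i < length; i++) {
            buffer.bytes[offset + i] = i < src.length ? src[i] : (byte) ' ';
        }
    }

    /** Reads the current content of this view from the shared buffer. */
    String get() {
        return new String(buffer.bytes, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

With a 6-byte group view and two sub-views of 2 and 4 bytes on the same buffer, a move("ABCDEF") on the group is immediately visible as "AB" and "CDEF" through the sub-views, and a third view created at offset 1 with length 3 reads "BCD": the same bytes, seen through another mask.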

The core type of our framework to achieve that goal is Var. A Var is defined for each separate Cobol variable. The “factory” for the generation of those various Vars is the level(n) function of the variable named “declare” of type VarSectionDeclaration part of our root class CobolProgram.

The memory organization of the buffer is defined by the value of n in level(n), as it is in Cobol:

  • the same level as the previous one means an extension of the buffer to cope with additional variables. For example, in the code above, Var filler$1 and Var a1clil get access, at instantiation time, to the buffer created by the instantiation of ag01ma1i
  • a higher level number means a further sub-definition of the previous level
  • a lower level number means a return to an enclosing level, for example to redefine a previous variable declared at that same level
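How level numbers can drive the layout of the buffer is sketched below. It is a simplified illustration under stated assumptions: the class Layout and its declare() signature are hypothetical, byte lengths are given by the caller, OCCURS is ignored, and a redefinition is assumed to have the same size as the item it redefines.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical layout calculator: assigns buffer offsets from level numbers.
final class Layout {
    static final class Item {
        final String name;
        final int level;
        final int offset;
        int length;   // grows as sub-items are declared

        Item(String name, int level, int offset) {
            this.name = name;
            this.level = level;
            this.offset = offset;
        }
    }

    private final Map<String, Item> items = new HashMap<>();
    private final Deque<Item> open = new ArrayDeque<>();  // innermost item first
    private int cursor;                                   // next free byte

    /** Declares a group (length 0) or an elementary item (length in bytes). */
    Item declare(String name, int level, int length, String redefines) {
        // An equal or lower level number closes the items opened before it.
        while (!open.isEmpty() && open.peek().level >= level) {
            open.pop();
        }
        if (redefines != null) {
            Item target = items.get(redefines);
            if (target == null) {
                throw new IllegalArgumentException("Unknown item: " + redefines);
            }
            cursor = target.offset;   // same addresses, other mask
        }
        Item item = new Item(name, level, cursor);
        items.put(name, item);
        open.push(item);
        if (length > 0) {
            cursor += length;
            for (Item enclosing : open) {   // the item itself and its groups
                enclosing.length = Math.max(enclosing.length, cursor - enclosing.offset);
            }
        }
        return item;
    }

    Item get(String name) {
        return items.get(name);
    }
}
```

Declaring the first items of the example (a group at level 2, then elementary items of 12, 2 and 1 bytes at level 3) yields offsets 0, 12 and 14; declaring a second level-2 group that redefines the first one restarts at offset 0, so its own level-3 items overlay the same bytes.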

The picX(n) or pic9(n) methods of the object returned by level() define the length and type of the Var being defined.

The valueSpace() or valueZero() methods are used to initialize the object according to its type.

It is clear that type Var makes internal use of Java native data types like String, int, long, etc. to execute the requested Cobol operations (MOVE, ADD, MULTIPLY, etc.) by leveraging Java runtime functions. There is a permanent conversion between Cobol Vars and Java native types happening under the hood: Var stores the Cobol representation, and transient native types allow the execution of an operation on this representation.
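This round trip between the stored representation and a transient native type can be sketched for the simplest case. The helper below is hypothetical and heavily simplified: it handles only an unsigned PIC 9(n) field of at most 18 digits, stored as one ASCII digit per byte (a mainframe would use EBCDIC), and it simply drops excess high-order digits on overflow.

```java
// Hypothetical helper for an unsigned PIC 9(n) field stored as ASCII digits.
final class DisplayNumber {
    /** Decodes the digits at [offset, offset + length) into a transient long. */
    static long read(byte[] buffer, int offset, int length) {
        long value = 0;
        for (int i = 0; i < length; i++) {
            value = value * 10 + (buffer[offset + i] - '0');
        }
        return value;
    }

    /** Encodes value as exactly 'length' digits, zero-padded on the left. */
    static void write(byte[] buffer, int offset, int length, long value) {
        for (int i = length - 1; i >= 0; i--) {
            buffer[offset + i] = (byte) ('0' + value % 10);
            value /= 10;   // digits beyond 'length' are dropped in this sketch
        }
    }

    /** ADD amount TO field: decode, compute natively, store back in the buffer. */
    static void add(byte[] buffer, int offset, int length, long amount) {
        write(buffer, offset, length, read(buffer, offset, length) + amount);
    }
}
```

The long only lives for the duration of the operation; the state that survives is the sequence of digits in the buffer, which is what any other view of the same bytes will see.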

The current state of the Cobol Vars is always stored in the internal shared buffer and reaccessed through the Var methods in order to preserve the Cobol original semantics described in the introduction of this article.

This post is introductory, so we will not go into all the details of what happens to properly handle arrays (declared by OCCURS in Cobol), complex REDEFINES, use of the POINTER type (via the ADDRESS OF statement) in Cobol, sophisticated data access (memory address values accessed as binary numbers, etc.) and so forth. Its aim is to give good hints of what has to be done to respect the original Cobol semantics.

We spend lots of energy on respecting these semantics because they are the basis of our iso-functional transcoding, which brings so many advantages for the success of our projects:

  • Very simple and exhaustive testability of the new system: it has to do the exact same thing at the bit level because it is designed to. It means that very “mechanical” tools can be built to check that a Java program delivers results identical to those of its original Cobol counterpart: our non-regression testing tool can rely on scenarios captured on the mainframe via 3270, replay them on our web interface and check whether the results are exactly the same. No interpretation of values has to be done: just compare and, if identical, everything is fine.
  • Live sharing of the production database: as both versions (Cobol & Java) handle data identically, they can share the production database in real time as a way to communicate transparently between users on the new system and the old one. A user cannot detect whether the data currently being accessed was previously accessed on the new or the old system.
  • Parallel construction: from the previous point, it is obvious that our methodology is based on the new Java system running in parallel with the old Cobol system, and on both systems sharing one single database in real time. Consequently, users can be very smoothly transferred from the old to the new system at the most suitable pace for the customer. The shared database functions as a collaboration vector between the Cobol application and its Java derived version. A lethal big-bang can be avoided, and there is no need for a very complex inter-system communication!

This article clearly demonstrates what has to be done to preserve semantics, but also the gains derived from this iso-functionality.


Transcoding Cobol to Java: the ‘Hello, World!’ case

This post is the first of our series about transcoding. Read them all via the “Transcoding” link to understand how we convert an application 100% automatically from Cobol to Java.

The right first step for such a tutorial series is to go through the traditional “Hello World” in Cobol and see how we transcode it to Java.


IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO_WORLD.
AUTHOR. JOHN SMITH.
DATE-WRITTEN. JAN 2012.
PROCEDURE DIVISION.
DISPLAY "Hello World!".
STOP RUN.

The Java version of the original Cobol demonstrates many points.


package com.nea_samples.batch.basic.programme_batch ;                   // (1) IDENTIFICATION DIVISION.

import com.eranea.neaRuntime.*;
import com.eranea.neaRuntime.core.basePrgEnv.*;
import com.eranea.neaRuntime.core.lineMode.*;
import com.eranea.neaRuntime.core.misc.KeyPressed;
import com.eranea.neaRuntime.core.program.*;
import com.eranea.neaRuntime.core.sqlSupport.*;
import com.eranea.neaRuntime.core.varEx.*;
import com.eranea.neaRuntime.pluginsIntf.cics.CESMReturnCode;
import java.math.BigDecimal;
import com.eranea.neaRuntime.core.neaVersionning.*;

public class HELLO_WORLD extends CobolProgram
{
                                                                        // (2) PROGRAM-ID. HELLO_WORLD.
                                                                        // (3) AUTHOR. DIDIER DURAND.
                                                                        // (4) DATE-WRITTEN. JAN 2012.
    public void procedureDivision() {                                   // (5) PROCEDURE DIVISION.
        display("Hello World!") ;                                       // (6) DISPLAY "Hello World!".
        stopRun();                                                      // (7) STOP RUN.
    }
}

Java source code not byte code

We generate fully compliant Java source code and not JVM byte code. This approach has major advantages:

  • the generated application is no longer dependent on our transcoding tools when the migration project is finished.
  • the new source code is regular and plain Java source code that can be handled, processed and analyzed by all Java tools available on the market: javac, Eclipse, Netbeans, CheckStyle, Findbugs, Cobertura, etc.
  • the customer can fully get rid of the obsolete Cobol technology and move its development environment to Java (Eclipse, etc.), which clearly represents current state-of-the-art technology for large corporate applications
  • we made quite a lot of effort to make the Java source code readable, understandable and maintainable, as it becomes the new reference. We preserve the original code organization, variable and paragraph names, etc.
  • for readability purposes and for a simpler transition of Cobol programmers to the new Java world, we generate the original Cobol as Java comments on the right side. Each line of Cobol is on the same line as its Java replacement: Cobol developers can better understand how we replace Cobol verbs with their Java equivalents. The learning curve is then much shorter, as people see the correspondence between Java and Cobol on a permanent basis. It also makes use of the Java interactive debugger much easier for beginners.

Java package organization

At the beginning of a translated Cobol program, as in any Java class definition, a Java package is defined [here it is com.nea_samples.batch.basic.programme_batch - see line 1] to “position” the class in the hierarchical grouping of source code in the application.

It allows smart grouping of the original Cobol files. When we enter a new project, we usually get thousands of files (Cobol programs, Copy books, SQL statements, BMS / 3270 map definitions) in a single big and flat bucket.

This lack of organization is not safe, and it can be fixed with an important feature of Java: packages. Packages have several capabilities, package access protection being among the most salient ones. See this Wikipedia page for more details.

So, we have a tool (based on regular expressions applied to the original file names of Cobol application objects) that allows all the files to be dispatched into various sub-buckets: the names of those sub-buckets, combined with the domain name of our customer, allow us to generate a hierarchical structure of Java packages that makes it simple for developers to work on only a part of the application.
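Such a dispatch can be sketched with the standard java.util.regex package. Everything specific here is hypothetical: the class PackageDispatcher, the naming rules, the sub-package names and the fallback package "misc" are invented for illustration, since real rules depend on the naming conventions of each customer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical dispatcher from original file names to Java packages.
final class PackageDispatcher {
    private final String basePackage;   // derived from the customer's domain name
    private final Map<Pattern, String> rules = new LinkedHashMap<>();  // first match wins

    PackageDispatcher(String basePackage) {
        this.basePackage = basePackage;
    }

    /** Registers a rule: file names matching the regex go to the sub-package. */
    PackageDispatcher rule(String regex, String subPackage) {
        rules.put(Pattern.compile(regex), subPackage);
        return this;
    }

    /** Returns the Java package assigned to an original file name. */
    String packageFor(String fileName) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(fileName).matches()) {
                return basePackage + "." + rule.getValue();
            }
        }
        return basePackage + ".misc";   // assumed fallback for unmatched files
    }
}
```

Rules are tried in registration order, so the more specific patterns should be registered first.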

Rather than checking out, updating and committing the full Subversion source tree under Eclipse, each developer can restrict their working set to a limited number of packages and consequently be more efficient and generate fewer issues.

Additionally, these packages, reflected as a hierarchical path in Subversion, make it possible to erect barriers between different parts of the source code for better security in corporations where it is required (financial institutions, etc.).

Java code structure

We preserve file organization and names: a Cobol program PGM.CBL becomes the equivalent Java class file named pgm.java. The class stored in this file inherits from the Java class CobolProgram.

This class CobolProgram is very important in our runtime framework (named NeaRuntime): it encompasses all the methods that emulate the Cobol verbs. In this first example, we see that DISPLAY in Cobol is changed to a call to the method display() of CobolProgram. In the same manner, STOP RUN in Cobol is replaced by a call to stopRun() of CobolProgram.

We also preserve the structures of Cobol: the Procedure Division is here as the procedureDivision() method. It is in fact an abstract method required by CobolProgram in the object-oriented hierarchy of our runtime framework. This way, we make sure that the method is always present (it is generated anyway by the transcoder) and we can safely call it to start program execution.

procedureDivision() then implements the various verbs in the reading order of the original Cobol. More details about paragraphs, PERFORM, GOTO, etc. in upcoming articles.
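The role of the abstract method can be illustrated with a miniature class hierarchy. This is a sketch under stated assumptions, not the real CobolProgram: the class MiniCobolProgram, its run() entry point and the use of an internal exception to implement stopRun() are all hypothetical choices made for this example.

```java
// Hypothetical miniature of the hierarchy; the real CobolProgram is far richer.
abstract class MiniCobolProgram {
    /** Internal signal used by stopRun() to unwind out of procedureDivision(). */
    private static final class StopRun extends RuntimeException {
        StopRun() {
            super(null, null, false, false);   // no message, no stack trace needed
        }
    }

    private final StringBuilder output = new StringBuilder();

    /** Every transcoded program must provide its PROCEDURE DIVISION. */
    protected abstract void procedureDivision();

    /** Emulates DISPLAY by collecting the text (one line per call). */
    protected void display(String text) {
        output.append(text).append('\n');
    }

    /** Emulates STOP RUN by ending the program immediately. */
    protected void stopRun() {
        throw new StopRun();
    }

    /** Entry point for the runtime: the abstract method is guaranteed to exist. */
    final String run() {
        try {
            procedureDivision();
        } catch (StopRun normalEnd) {
            // STOP RUN is a normal way to end a program, nothing to report
        }
        return output.toString();
    }
}

final class HelloWorldSketch extends MiniCobolProgram {
    @Override
    protected void procedureDivision() {
        display("Hello World!");
        stopRun();
    }

    public static void main(String[] args) {
        System.out.print(new HelloWorldSketch().run());
    }
}
```

Because procedureDivision() is abstract, a subclass that forgets it does not compile, which is what lets the runtime call it without any check.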

At the top of the file, we also import all of the packages of our framework that are needed to emulate Cobol verbs, and also CICS and SQL verbs when used.
