This post is part of our series about transcoding. Read them all via the “Transcoding” link to get an understanding of how we proceed to convert an application 100% automatically from Cobol to Java.
In this post, we describe the challenges created by the way Cobol represents data and declares variables (PICTURE, LEVEL, etc.) in the working storage and linkage sections of a program. We also explain how we implemented their transcoding to Java in order to fully respect the original Cobol semantics and consequently achieve the iso-functionality at the core of our technology.
Cobol working storage section and variable declarations
Cobol has a very “physical” description of data and variable declarations in its working storage section: a programmer defines a data structure and the corresponding variable with a given mask (materialized by the PIC / PICTURE directives – see below) and can then redefine it with another mask that uses other basic data types with different PICTUREs. See AG01MA1O redefining AG01MA1I below.
The initial definition and the REDEFINES apply to the same physical memory block. The levels (02, 03, 05, 10, 15, etc.) express successive parts of the definition mask. Those parts can:
- be contiguous and describe the memory block further when at the same level. In the code example below, A1CLIL and the preceding FILLER are both at level 03: the filler takes 12 bytes and A1CLIL (a 2-byte binary halfword) comes right after it.
- describe the enclosing level in more detail: A1CLIL, the preceding FILLER and the subsequent declarations at level 03 progressively extend the size of AG01MA1I at level 02
- redefine an already defined level via REDEFINES, using a mask at the same level as the one it redefines. In the example below, AG01MA1O at level 02 redefines the view of the memory already defined by AG01MA1I, using the same memory addresses for other purposes.
02 AG01MA1I.
03 FILLER PIC X(12).
03 A1CLIL COMP PIC S9(4).
03 A1CLIF PIC X.
03 FILLER PIC X(2).
03 A1CLII PIC X(49).
03 S8I OCCURS 17 TIMES.
04 A1LGL COMP PIC S9(4).
04 A1LGF PIC X.
04 FILLER PIC X(2).
04 A1LGI PIC X(2).
04 A1TXTL COMP PIC S9(4).
04 A1TXTF PIC X.
04 FILLER PIC X(2).
04 A1TXTI PIC X(71).
02 AG01MA1O REDEFINES AG01MA1I.
03 FILLER PIC X(12).
03 FILLER PIC X(2).
03 A1CLIA PIC X.
03 A1CLIC PIC X.
03 A1CLIH PIC X.
03 A1CLIO PIC X(49).
03 S8O OCCURS 17 TIMES.
04 FILLER PIC X(2).
04 A1LGA PIC X.
04 A1LGC PIC X.
04 A1LGH PIC X.
04 A1LGO PIC X(2).
04 FILLER PIC X(2).
04 A1TXTA PIC X.
04 A1TXTC PIC X.
04 A1TXTH PIC X.
04 A1TXTO PIC X(71).
Going back to the roots of Cobol, these definitions correspond to the same chunk of memory, seen as a single, continuous data buffer. At runtime, the compiler uses different assembler instructions to process this same memory with the basic mask or with a redefinition mask, depending on the variable names used by the Cobol programmer.
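This “one buffer, several masks” behavior can be sketched with plain JDK classes (this is an illustration only, not our runtime): the same bytes are written through a binary mask and read back through a character mask, just as a REDEFINES allows in Cobol.

```java
import java.nio.ByteBuffer;

// Illustration only: the same 4 bytes interpreted through two different
// "masks", the way REDEFINES reuses one memory block in Cobol.
public class RedefinesDemo {
    // one shared chunk of memory, like a level-01 data item
    static final ByteBuffer memory = ByteBuffer.allocate(4);

    // "basic mask": a 16-bit binary halfword (like COMP PIC S9(4))
    static void storeAsBinary(short value) {
        memory.putShort(0, value);
    }

    // "redefinition mask": the same bytes seen as 4 characters (like PIC X(4))
    static String viewAsText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            sb.append((char) (memory.get(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        storeAsBinary((short) 0x4142);    // bytes 0x41 0x42 = 'A' 'B' in ASCII
        memory.put(2, (byte) 'C');
        memory.put(3, (byte) 'D');
        System.out.println(viewAsText()); // the binary write is visible as text
    }
}
```

The binary store is immediately visible through the character view because both views address the same buffer, which is exactly the property a faithful transcoding must preserve.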
Emulation of Cobol semantics in Java
When you want to emulate this Cobol behavior with Java data structures, there are a couple of additional things to preserve:
- the PICTURE of a variable defines how it has to be manipulated: an INITIALIZE on a PIC X(n) sets it to space characters, whereas the same INITIALIZE on a PIC 9(n) sets it to zeroes
- each variable at each level has to be defined and accessible individually, because it can be used as such in the program, but it must also be part of a more global memory structure that can be impacted by statements at a higher level: a MOVE to a variable at level 05 impacts all its subcomponents at levels 10, 15, etc.
- variables that are defined contiguously in Cobol have to be represented the same way in Java, because the use of a variable at a low level (say 05) can impact all its subparts in their specific order. They cannot be fully distinct objects: they need to share an underlying memory object where the data is stored, so that the MOVE semantics just described is preserved for data declared contiguously in the working storage section.
public Var ag01ma1i = declare.level(2).var() ; // (20) 02 AG01MA1I.
public Var filler$1 = declare.level(3).picX(12).filler() ; // (21) 03 FILLER PIC X(12).
public Var a1clil = declare.level(3).picS9(4).comp().var() ; // (22) 03 A1CLIL COMP PIC S9(4).
public Var a1clif = declare.level(3).picX(1).var() ; // (23) 03 A1CLIF PIC X.
public Var filler$2 = declare.level(3).picX(2).filler() ; // (24) 03 FILLER PIC X(2).
public Var a1clii = declare.level(3).picX(49).var() ; // (27) 03 A1CLII PIC X(49).
public Var s8i = declare.level(3).occurs(17).var() ; // (28) 03 S8I OCCURS 17 TIMES.
public Var a1lgl = declare.level(4).picS9(4).comp().var() ; // (29) 04 A1LGL COMP PIC S9(4).
public Var a1lgf = declare.level(4).picX(1).var() ; // (30) 04 A1LGF PIC X.
public Var filler$3 = declare.level(4).picX(2).filler() ; // (31) 04 FILLER PIC X(2).
public Var a1lgi = declare.level(4).picX(2).var() ; // (32) 04 A1LGI PIC X(2).
public Var a1txtl = declare.level(4).picS9(4).comp().var() ; // (33) 04 A1TXTL COMP PIC S9(4).
public Var a1txtf = declare.level(4).picX(1).var() ; // (34) 04 A1TXTF PIC X.
public Var filler$4 = declare.level(4).picX(2).filler() ; // (35) 04 FILLER PIC X(2).
public Var a1txti = declare.level(4).picX(71).var() ; // (36) 04 A1TXTI PIC X(71).
public Var ag01ma1o = declare.level(2).redefines(ag01ma1i).var() ; // (37) 02 AG01MA1O REDEFINES AG01MA1I.
public Var filler$5 = declare.level(3).picX(12).filler() ; // (38) 03 FILLER PIC X(12).
public Var filler$6 = declare.level(3).picX(2).filler() ; // (39) 03 FILLER PIC X(2).
public Var a1clia = declare.level(3).picX(1).var() ; // (40) 03 A1CLIA PIC X.
public Var a1clic = declare.level(3).picX(1).var() ; // (41) 03 A1CLIC PIC X.
public Var a1clih = declare.level(3).picX(1).var() ; // (42) 03 A1CLIH PIC X.
public Var a1clio = declare.level(3).picX(49).var() ; // (43) 03 A1CLIO PIC X(49).
public Var s8o = declare.level(3).occurs(17).var() ; // (44) 03 S8O OCCURS 17 TIMES.
public Var filler$7 = declare.level(4).picX(2).filler() ; // (45) 04 FILLER PIC X(2).
public Var a1lga = declare.level(4).picX(1).var() ; // (46) 04 A1LGA PIC X.
public Var a1lgc = declare.level(4).picX(1).var() ; // (47) 04 A1LGC PIC X.
public Var a1lgh = declare.level(4).picX(1).var() ; // (48) 04 A1LGH PIC X.
public Var a1lgo = declare.level(4).picX(2).var() ; // (49) 04 A1LGO PIC X(2).
public Var filler$8 = declare.level(4).picX(2).filler() ; // (50) 04 FILLER PIC X(2).
public Var a1txta = declare.level(4).picX(1).var() ; // (51) 04 A1TXTA PIC X.
public Var a1txtc = declare.level(4).picX(1).var() ; // (52) 04 A1TXTC PIC X.
public Var a1txth = declare.level(4).picX(1).var() ; // (53) 04 A1TXTH PIC X.
public Var a1txto = declare.level(4).picX(71).var() ; // (54) 04 A1TXTO PIC X(71).
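The group-MOVE semantics described above can be sketched in a few lines of plain Java (all names here are invented for the illustration, this is not our actual framework): elementary fields are just offset/length windows into one shared byte array, so writing the group rewrites every subfield and vice versa.

```java
// Minimal sketch (hypothetical names, not the real framework): fields share
// one byte array, so a MOVE to the group overwrites all subfields, as in Cobol.
public class GroupMove {
    static final byte[] buffer = new byte[6];  // the group, like a level-05 item

    // a field is an offset/length view into the shared buffer
    record Field(int offset, int length) {
        String get() { return new String(buffer, offset, length); }
        void set(String s) {
            // shorter values are space-padded, mimicking alphanumeric MOVE
            for (int i = 0; i < length; i++)
                buffer[offset + i] = (byte) (i < s.length() ? s.charAt(i) : ' ');
        }
    }

    static final Field group = new Field(0, 6);  // whole group
    static final Field left  = new Field(0, 3);  // first subfield (level 10)
    static final Field right = new Field(3, 3);  // second subfield (level 10)
}
```

For example, `group.set("ABCDEF")` makes `left.get()` return `"ABC"`, and a later `left.set("XY")` changes what `group.get()` returns, because there is only one underlying buffer.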
Additional issues between Cobol and Java are:
- a character is 8 bits in Cobol, whereas Java uses UTF-16, where a char is 16 bits
- numeric precision can be very high in Cobol: numbers with up to 31 digits can be defined. Numbers with such precision do not exist natively in Java, and even java.math.BigDecimal does not reproduce Cobol's fixed-point arithmetic out of the box. Our runtime framework implements an optimized representation of those big numbers and makes sure that operations (ADD, SUBTRACT, MULTIPLY, DIVIDE, etc.) produce the exact same results (incl. rounding) as if they were executed natively in Cobol.
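To give a feel for the second point, here is a sketch with the JDK's own BigDecimal (our runtime uses its own optimized representation; the scale and rounding choices below are the kind of Cobol behavior that must be imposed explicitly): a Cobol target like PIC S9(4)V99 has a fixed scale of 2, DIVIDE without ROUNDED truncates toward zero at that scale, and DIVIDE ... ROUNDED rounds half away from zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch only: mimicking Cobol fixed-point DIVIDE semantics with BigDecimal.
public class CobolDecimal {
    static final int SCALE = 2; // digits after the implied decimal point (V99)

    // Cobol DIVIDE without ROUNDED truncates toward zero at the target scale
    static BigDecimal divideTruncated(BigDecimal a, BigDecimal b) {
        return a.divide(b, SCALE, RoundingMode.DOWN);
    }

    // Cobol DIVIDE ... ROUNDED rounds half away from zero
    static BigDecimal divideRounded(BigDecimal a, BigDecimal b) {
        return a.divide(b, SCALE, RoundingMode.HALF_UP);
    }
}
```

With 2 / 3, the truncated form yields 0.66 while the ROUNDED form yields 0.67: the same operands, two different Cobol-prescribed results that the Java side must reproduce exactly.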
By contrast, the walls in Java are very well defined: each object defined is independent of all the others, and objects don't share memory.
Experienced Cobol programmers can write very tricky statements to make the most of the REDEFINES mechanism described above, in terms of maximum execution speed and minimum memory consumption. Cobol was invented when both CPU and memory were very expensive.
Cobol semantics have to be ported unchanged to Java to respect the full iso-functionality of processing that is a USP of Eranea's solution. Any change to these semantics due to an approximate implementation of Cobol data structures would mean lots of very tricky issues, hard and costly to debug, especially as they would appear in the smartest Cobol assets.
So, there is no other solution than a full emulation of the Cobol behavior in Java: for each data structure (usually starting at level 01), we allocate a permanent and shared memory buffer where the various levels of definition of the original variable get “physically” concatenated when the successive definitions in the working storage section happen.
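The “physical concatenation” of successive definitions can be pictured as a cursor handing out offsets inside the shared buffer as each declaration is processed (invented names, plain Java, not our actual runtime):

```java
// Sketch (hypothetical names): successive working-storage declarations laid
// out sequentially in one shared buffer, each field remembering its offset.
public class Layout {
    static int cursor = 0;           // next free byte in the level-01 buffer

    // a new field starts exactly where the previous one ended
    static int declareField(int lengthInBytes) {
        int offset = cursor;
        cursor += lengthInBytes;
        return offset;
    }
}
```

Replaying the start of the example above: a FILLER PIC X(12) gets offset 0, the 2-byte A1CLIL gets offset 12, the next 1-byte field gets offset 14, and so on, which is exactly the layout the Cobol compiler produces.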
The core type of our framework to achieve that goal is Var. A Var is defined for each separate Cobol variable. The “factory” for those Vars is the level(n) function of the variable named “declare”, of type VarSectionDeclaration, part of our root class CobolProgram.
The memory organization of the buffer is defined by the value of n in level(n), as it is in Cobol:
- the same level as the previous one means an extension of the buffer to cope with additional variables. For example, in the code above, Var filler$1 and Var a1clil get – at instantiation time – access to the buffer created by the instantiation of ag01ma1i
- a higher level means a further sub-definition of the previous level
- a lower level means a redefinition of a previous level with the same value
The picX(n) or pic9(n) methods of the object returned by level() define the length and type of the Var currently being defined.
The valueSpace() or valueZero() methods are used to initialize the object according to its type.
It is clear that the Var type makes internal use of Java native data types like String, int, long, etc. to execute the requested Cobol operations (MOVE, ADD, MULTIPLY, etc.) by leveraging Java runtime functions. A permanent conversion between Cobol Vars and Java native types happens under the hood: the Var stores the Cobol representation, and transient native types allow the execution of an operation on this representation.
The current state of the Cobol Vars is always stored in the internal shared buffer and reaccessed through the Var methods in order to preserve the Cobol original semantics described in the introduction of this article.
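The idea behind Var can be sketched in miniature (all names here are invented for the illustration; the real Var is far richer): the Cobol display representation lives permanently in a buffer, INITIALIZE fills it with spaces or zeroes depending on the PICTURE, and an ADD converts to a native long only transiently before writing the Cobol form back.

```java
import java.util.Arrays;

// Hypothetical sketch of the principle behind Var, not the real API: the Cobol
// representation is the stored state; native types are transient work values.
public class MiniVar {
    enum Pic { X, NINE }

    final char[] store;   // Cobol representation, e.g. "00042" for PIC 9(5)
    final Pic pic;

    MiniVar(Pic pic, int length) {
        this.pic = pic;
        this.store = new char[length];
        initialize();
    }

    // INITIALIZE: spaces for PIC X, zeroes for PIC 9
    void initialize() {
        Arrays.fill(store, pic == Pic.X ? ' ' : '0');
    }

    // ADD: transient conversion to a native long, then back to the Cobol form
    void add(long n) {
        long value = Long.parseLong(new String(store)) + n;
        String s = String.valueOf(value);
        initialize();                              // reset to "000...0"
        for (int i = 0; i < s.length(); i++)       // right-align the digits
            store[store.length - s.length() + i] = s.charAt(i);
    }

    public String toString() { return new String(store); }
}
```

A freshly initialized PIC 9(5)-like MiniVar reads "00000"; after add(42) it reads "00042", i.e. the stored state is always the Cobol representation, never the native long.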
This post is introductory, so we will not cover here all the details of what happens to properly handle arrays (declared by OCCURS in Cobol), complex REDEFINES, the POINTER type (via the ADDRESS OF statement), sophisticated data access (memory address values accessed as binary numbers, etc.) and so forth. Its aim is to give good hints of what has to be done to respect the original Cobol semantics.
We spend lots of energy respecting these semantics because they are the basis of our iso-functional transcoding, which brings so many advantages to the success of our projects:
- very simple and exhaustive testability of the new system: it has to do the exact same thing at the bit level because it is conceived that way. It means that very “mechanical” tools can be built to check that a Java program delivers results identical to those of its original Cobol counterpart: our non-regression testing tool can rely on scenarios captured on the mainframe via 3270, replay them on our web interface and check that the results are exactly the same. No interpretation of values has to be done: just compare, and if identical, everything is fine.
- live sharing of the production database: as both versions (Cobol & Java) handle data identically, they can share the production database in real time as a way to communicate transparently between users of the new system and of the old one. A user cannot detect whether the data currently being accessed was previously accessed on the new or the old system.
- parallel construction: from the previous point, it follows that our methodology is based on the new Java system running in parallel with the old Cobol system, both sharing one single database in real time. Consequently, users can be transferred very smoothly from the old system to the new one, at the pace most suitable for the customer. The shared database functions as the collaboration vector between the Cobol application and its derived Java version. A lethal big bang can be avoided, and there is no need for very complex inter-system communication!
This article clearly demonstrated what has to be done to preserve semantics, but also the gains derived from this iso-functionality.