Tales From Mainframe Modernization
At my last workplace, I wrote transpilers (or just compilers if you prefer) from mainframe languages (COBOL, JCL, BASIC, etc.) to Java (in Rust!).
Legacy code is full of surprises. In the roughly 200k lines of COBOL that I had the (dis)pleasure of working with, I saw some wonderful hacks to get around the limitations of the system. Mainframes are also chock full of history.
Base-10 numerics
This is the first thing that stood out to me when I looked at COBOL code. A data-definition (COBOL's term for a “variable”) is declared like so:
     ,-- name
     |          ,- type
   __|___     __|_
01 HEIGHT PIC 9(3).
--        ---
|          |
|          `- picture clause (keyword)
`- level number
That statement declares a variable called HEIGHT with type 9(3), which is shorthand for 999. Each 9 stands for one decimal digit, so 999 indicates a “3-digit number”, and the possible values for this variable are 0 to 999!
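To make the base-10 point concrete, here is a small Python sketch; `expand_picture` and `DisplayNumeric` are hypothetical names, not part of any COBOL toolchain. It expands the `9(3)` shorthand and models an unsigned three-digit field as a string of decimal digits, assuming the default DISPLAY usage and COBOL's rule that a MOVE drops high-order digits that don't fit.

```python
import re


def expand_picture(pic: str) -> str:
    """Expand repetition shorthand in a PICTURE string, e.g. "9(3)" -> "999"."""
    # A symbol followed by "(n)" is repeated n times; other symbols are kept as-is.
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)


class DisplayNumeric:
    """Hypothetical model of an unsigned PIC 9(n) field: the value is kept
    as n decimal digit characters rather than as a binary integer.
    Assumes the picture contains only 9s."""

    def __init__(self, pic: str):
        self.digits = len(expand_picture(pic))
        self.storage = "0" * self.digits

    def move(self, value: int) -> None:
        # An unsigned field has no room for a sign, so only the magnitude is kept.
        # Like a COBOL MOVE, high-order digits that do not fit are dropped silently.
        text = str(abs(value)).zfill(self.digits)
        self.storage = text[-self.digits:]

    @property
    def value(self) -> int:
        return int(self.storage)


height = DisplayNumeric("9(3)")
for n in (175, 7, 1234):
    height.move(n)
    print(n, "->", height.storage)
```

Running it prints `175 -> 175`, `7 -> 007` and `1234 -> 234`. The silently lost leading digit is one reason a transpiler can't simply map `PIC 9(3)` onto a Java `int`.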
Internationalisation
Below is another data-definition in COBOL, declaring 3 variables:
01 FOO-PERSON.
    05 FOO-NAME PIC X(5).
    05 FOO-HEIGHT PIC 9(3).
What that means is:
FOO-PERSON: a “group” variable consisting of two other variables
FOO-NAME: an alphanumeric type with 5 characters
FOO-HEIGHT: a numeric type with 3 digits (remember, base 10 and not base 2)
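A group has no storage of its own: it is just its subordinate items laid end to end. The sketch below (in Python; `field_size` and `layout` are hypothetical helpers, and only simple `X`, `A` and `9` pictures are handled) computes where each item of FOO-PERSON lives.

```python
import re


def field_size(pic: str) -> int:
    """Character positions occupied by a simple PICTURE such as "X(5)" or "999".
    Assumes only X, A and 9 symbols; S and V, for example, take no position
    in a plain DISPLAY field and would be miscounted here."""
    size = 0
    for _symbol, count in re.findall(r"([^()])(?:\((\d+)\))?", pic):
        size += int(count) if count else 1
    return size


def layout(children):
    """Place a group's elementary items back to back.
    Returns {name: (offset, length)} and the total group length."""
    fields, offset = {}, 0
    for name, pic in children:
        length = field_size(pic)
        fields[name] = (offset, length)
        offset += length
    return fields, offset


fields, group_length = layout([("FOO-NAME", "X(5)"), ("FOO-HEIGHT", "9(3)")])
print(fields)        # {'FOO-NAME': (0, 5), 'FOO-HEIGHT': (5, 3)}
print(group_length)  # 8
```

FOO-NAME occupies bytes 0 to 4 and FOO-HEIGHT bytes 5 to 7, so the whole group is 8 bytes long.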
COBOL has an interesting construct called “REDEFINES”:
01 FOO-PERSON.
    05 FOO-NAME PIC X(5).
    05 FOO-HEIGHT PIC 9(3).
01 FOO-PERSONNE REDEFINES FOO-PERSON.
    05 FOO-NOM PIC X(5).
    05 FOO-TAILLE PIC 9(3).
FOO-PERSON and FOO-PERSONNE refer to the same region of memory.
I helped modernise a codebase that had clearly been worked on by a Spanish consultancy at some point, and they had decided to redefine all data definitions in Spanish.
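Under the hood, REDEFINES just gives a second set of names to the same bytes. A minimal Python sketch, assuming a hypothetical `Field` helper that treats every field as text and using ASCII for readability (a real mainframe would store EBCDIC):

```python
class Field:
    """Hypothetical helper: a named window onto part of a shared buffer."""

    def __init__(self, buffer: bytearray, offset: int, length: int):
        self.buffer, self.offset, self.length = buffer, offset, length

    def set(self, text: str) -> None:
        # Treat every field as text: pad with spaces on the right, cut off the excess.
        data = text.encode("ascii").ljust(self.length, b" ")[: self.length]
        self.buffer[self.offset : self.offset + self.length] = data

    def get(self) -> str:
        return self.buffer[self.offset : self.offset + self.length].decode("ascii")


storage = bytearray(b" " * 8)  # the single 8-byte region behind both records

# 01 FOO-PERSON.
foo_name, foo_height = Field(storage, 0, 5), Field(storage, 5, 3)
# 01 FOO-PERSONNE REDEFINES FOO-PERSON.
foo_nom, foo_taille = Field(storage, 0, 5), Field(storage, 5, 3)

foo_nom.set("PETER")
foo_taille.set("175")
print(foo_name.get(), foo_height.get())  # PETER 175
print(bytes(storage))                    # b'PETER175'
```

Writing through the French names is immediately visible through the English ones, because there is only one buffer.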
String parsing
Here’s another fun one:
01 FOO-PERSON.
    05 FOO-NAME PIC X(5).
    05 FOO-HEIGHT PIC 9(3).
.
.
.
MOVE "PETER" TO FOO-NAME.
MOVE 175 TO FOO-HEIGHT.
*> display the entire memory region
DISPLAY FOO-PERSON.
*> PETER175
*> reference modification: the first 7 bytes...
DISPLAY FOO-PERSON (1:7).
*> PETER17
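The `(1:7)` syntax is COBOL's reference modification: a 1-based start position followed by an optional length. Here is a hedged Python sketch of those semantics, with `ref_mod` as a hypothetical helper name; it rejects out-of-range positions outright, which a real runtime may or may not do.

```python
from typing import Optional


def ref_mod(item: str, start: int, length: Optional[int] = None) -> str:
    """Sketch of COBOL reference modification ITEM(start:length).
    start is 1-based; leaving out length means "through the end of the item"."""
    if not 1 <= start <= len(item):
        raise IndexError("start position outside the item")
    begin = start - 1
    end = len(item) if length is None else begin + length
    if length is not None and (length < 1 or end > len(item)):
        raise IndexError("length runs outside the item")
    return item[begin:end]


# A group is just its children's bytes laid end to end.
foo_person = "PETER" + "175"
print(foo_person)                 # PETER175
print(ref_mod(foo_person, 1, 7))  # PETER17
print(ref_mod(foo_person, 6))     # 175
```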
So data-definitions simply give names to regions of memory, which enables a clever way to parse strings:
01 DATE.
    05 DD PIC 9(2).
    05 FILLER PIC X.
    05 MMM PIC A(3).
    05 FILLER PIC X.
    05 YYYY PIC 9(4).
.
.
.
MOVE "03 MAR 2025" TO DATE.
DISPLAY "DAY: " DD. *> DAY: 03
DISPLAY "MONTH: " MMM. *> MONTH: MAR
DISPLAY "YEAR: " YYYY. *> YEAR: 2025
*> also works:
MOVE "03-MAR-2025" TO DATE.
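The parse works because MOVE just copies characters into the record, and each name reads a fixed slice of it. A Python sketch under that assumption (the `DATE_FIELDS` table and `move_to_date` function are hypothetical):

```python
# Hypothetical encoding of the DATE record: (name, offset, length) for each
# named field. The two FILLER bytes at offsets 2 and 6 are never read.
DATE_FIELDS = [("DD", 0, 2), ("MMM", 3, 3), ("YYYY", 7, 4)]
DATE_LENGTH = 11  # 2 + 1 + 3 + 1 + 4


def move_to_date(text: str) -> dict:
    """Mimic MOVE text TO DATE, then read the named fields back.
    The record is padded with spaces or truncated to 11 characters and each
    field is a fixed slice; nothing checks that DD or YYYY are digits."""
    record = text.ljust(DATE_LENGTH)[:DATE_LENGTH]
    return {name: record[offset : offset + length] for name, offset, length in DATE_FIELDS}


print(move_to_date("03 MAR 2025"))  # {'DD': '03', 'MMM': 'MAR', 'YYYY': '2025'}
print(move_to_date("03-MAR-2025"))  # {'DD': '03', 'MMM': 'MAR', 'YYYY': '2025'}
```

The separators land in FILLER positions that nothing reads, which is why `03-MAR-2025` parses just as well; by the same token, nothing stops a malformed string from producing a nonsense DD.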
Early exit
I’d see this peppered around in a few places, and I later realized it was a way to trigger an abnormal end (an “abend”) of a batch job, possibly kicking off an error-handling routine in the outer job control system:
01 CONSTANT-ZERO PIC S9(9)V9 VALUE 0.
01 ABEND PIC S9(9)V9.
.
.
.
COMPUTE ABEND = CONSTANT-ZERO / CONSTANT-ZERO.
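Zero divided by zero has no defined result, and since the COMPUTE has no ON SIZE ERROR clause, the division is left to fail at runtime. A hedged Python sketch of the same idea using the standard `decimal` module, where `Decimal(0) / Decimal(0)` raises `decimal.InvalidOperation`; the `run_step` wrapper and its `"OK"`/`"ABEND"` results are hypothetical stand-ins for the job control layer:

```python
from decimal import Decimal

CONSTANT_ZERO = Decimal(0)


def compute_abend() -> Decimal:
    # COMPUTE ABEND = CONSTANT-ZERO / CONSTANT-ZERO.
    # 0/0 is undefined, so the default decimal context raises InvalidOperation.
    return CONSTANT_ZERO / CONSTANT_ZERO


def run_step(step) -> str:
    """Hypothetical job-step wrapper: report "ABEND" if the step dies with an
    arithmetic error, "OK" otherwise, so an outer scheduler could branch on it."""
    try:
        step()
    except ArithmeticError:  # decimal.InvalidOperation is an ArithmeticError
        return "ABEND"
    return "OK"


print(run_step(compute_abend))                       # ABEND
print(run_step(lambda: CONSTANT_ZERO + Decimal(1)))  # OK
```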
All the numbers
I have yet to find an explanation for this one, but I once found a file with just the first 800 natural numbers defined as string constants:
01 TC0001 PIC X(5) VALUE "00001".
01 TC0002 PIC X(5) VALUE "00002".
01 TC0003 PIC X(5) VALUE "00003".
.
.
*> .... 800 lines later ....
.
.
01 TC0800 PIC X(5) VALUE "00800".
The file was definitely not generated, and I can’t imagine text editors on the mainframe were all that advanced either.
dd - disk destroyer
The DD statement in JCL stands for “data definition” and is largely used to describe the files and I/O streams used by a batch job. The UNIX dd command is named after this statement!
I'm Akshay, programmer, pixel-artist & programming-language enthusiast.
I am currently building tangled.sh — a decentralized code-collaboration platform.
Reach out at oppili@libera.chat.