Many years ago, I left college armed with a Computer Science degree, a marked-up copy of a first-edition K&R, a job offer, and the full expectation that my freshly minted C skills would soon be honed in the real world (yes, plain old C, which was considered new and up-and-coming at the time). Like many things in life, however, reality worked out somewhat differently from my plans, and I quickly found myself on a project steeped in all things mainframe, including COBOL, CICS and JCL. While I eventually made my way to newer technologies, I spent a fair bit of my early formative professional years writing COBOL code. And while it’s true that at the time I was eager to move on to newer, “cooler” things (and eventually did), looking back I can’t help but feel thankful for the then-unwanted opportunity, and I now regard those years with more than a bit of nostalgia.
A few weeks ago (i.e., many, many years later) I was asked by a prospect about COBOL copybooks and how data structured around them might be ingested into MarkLogic. Since we hadn’t up to that point even discussed anything regarding mainframe data conversion, the question seemed to me an abstract thought exercise, with applicability at some hypothetical later date. Since I actually understood the question (thank you, first job out of college), we discussed a few possibilities, looked at a number of open source libraries that might be of use and hashed out a verbal back-of-the-napkin approach (yes, the napkin was imaginary). I also agreed to socialize our ideas with other colleagues within my company, which I did. The approach we came up with was vetted internally as generally sound (and as a bonus I learned that one of my colleagues is related to the inventor of Realia COBOL). I circled back to the prospect with a few updates, and he seemed satisfied with the information.
At that point the thought exercise seemed to be done — until the prospect and I reconnected a few weeks later and I found out that he had been very busy looking into real implementation possibilities and, as a result, trying some things out. In fact, during his research he found another open source library, Legstar, that I had missed during my initial searches, and it turned out to be much more feature-complete than the ones I had found.
So we got to work again, but now the task was to take the abstract and make it concrete. He sent me a small set of obfuscated sample data along with a nice juicy copybook, and off to the lab I went. In a relatively short period of time the data was loaded into MarkLogic, ready to be searched and queried. Along the way I got to stroll down memory lane, and in one small proof point was able to string together technologies spanning many decades of innovation: from COBOL to XSD to Java to MarkLogic, all using a browser and Eclipse, while turning EBCDIC-encoded files into ASCII/Unicode XML. Pretty cool stuff if you’re a geek.
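The EBCDIC-to-Unicode leg of that chain is, by itself, something the JDK handles out of the box through its charset support. As a minimal sketch (assuming code page Cp1047, a common mainframe default — real data may well use CP037 or another variant):

```java
import java.nio.charset.Charset;

public class EbcdicDemo {
    public static void main(String[] args) {
        // "Hello" encoded in EBCDIC code page 1047 (an assumption for
        // illustration; the actual code page depends on the source system)
        byte[] ebcdic = {(byte) 0xC8, (byte) 0x85, (byte) 0x93, (byte) 0x93, (byte) 0x96};

        // The JDK ships a decoder for IBM EBCDIC code pages
        String text = new String(ebcdic, Charset.forName("Cp1047"));
        System.out.println(text); // prints "Hello"
    }
}
```

Of course, plain text fields are the easy part — the copybook is what tells you how to carve up each record, which is where Legstar earns its keep.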
Since this prospect’s data is sensitive (even the obfuscated kind), I can’t share the gory details of that working example, but in a future blog post I will share one built on mocked-up data against a mocked-up copybook. I’ll of course need to work in an editor capable of producing EBCDIC-encoded data, but the fun will be worth the effort.
In the meantime, I will leave you with a high-level set of bullet points of what was involved:
- I downloaded the Legstar package into an Eclipse installation from http://www.legsem.com/legstar/eclipse/update.
- Then, using that package, I generated an XSD from the COBOL copybook.
- After creating the XSD, I generated Java transformers (also using the Legstar package).
- Lastly, I wrote a very short Java program that used the generated transformers and the MarkLogic Java API to transform the EBCDIC records into nicely formatted XML and load them as documents into MarkLogic.
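To give a flavor of what a copybook-driven transform does, here is a small self-contained sketch — emphatically not the Legstar-generated code — that hand-decodes one fixed-length record against a hypothetical two-field copybook (CUST-NAME PIC X(10), CUST-BALANCE PIC S9(5)V99 COMP-3) and emits it as XML. Every name and field layout here is made up for illustration:

```java
import java.math.BigDecimal;
import java.nio.charset.Charset;
import java.util.Arrays;

public class CopybookSketch {
    // Decode an EBCDIC text field (PIC X(n)) into a trimmed Java String.
    // Assumes code page Cp1047; real systems may use CP037 or others.
    static String decodeText(byte[] rec, int off, int len) {
        return new String(rec, off, len, Charset.forName("Cp1047")).trim();
    }

    // Decode a COMP-3 (packed decimal) field: two digits per byte,
    // with the low nibble of the last byte holding the sign (0xD = negative).
    static BigDecimal decodeComp3(byte[] field, int scale) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            value = value * 10 + ((field[i] >> 4) & 0x0F);  // high nibble digit
            if (i < field.length - 1) {
                value = value * 10 + (field[i] & 0x0F);     // low nibble digit
            }
        }
        int sign = field[field.length - 1] & 0x0F;
        if (sign == 0x0D) value = -value;
        return BigDecimal.valueOf(value).movePointLeft(scale);
    }

    public static void main(String[] args) {
        // One 14-byte record: CUST-NAME "JOHN SMITH" in EBCDIC,
        // then CUST-BALANCE 12345.67 packed as +1234567 with 2 decimals.
        byte[] record = {
            (byte) 0xD1, (byte) 0xD6, (byte) 0xC8, (byte) 0xD5, (byte) 0x40,
            (byte) 0xE2, (byte) 0xD4, (byte) 0xC9, (byte) 0xE3, (byte) 0xC8,
            0x12, 0x34, 0x56, 0x7C
        };
        String xml = "<customer><name>" + decodeText(record, 0, 10)
                   + "</name><balance>" + decodeComp3(Arrays.copyOfRange(record, 10, 14), 2)
                   + "</balance></customer>";
        System.out.println(xml);
    }
}
```

In the real pipeline, the Legstar-generated transformers do this carving and conversion for you, driven by the XSD derived from the copybook; the driver program then just feeds the resulting XML to MarkLogic through the Java API.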
After that, the data was fully indexed in MarkLogic and ready to be queried at will.
If you’re adventurous and don’t want to wait for the next blog post, feel free to download MarkLogic and Legstar (and Eclipse if you don’t have it yet) and give it a try yourself. If you get stuck and need some guidance, send a quick tweet with contact info to @kenkrupa.
Otherwise check back here for part 2.