Introducing the Gobol Language



I was quite busy during Y2K fixing software systems (yes, they needed fixing; the only reason everything didn’t collapse is that a lot of people worked very hard to make sure it didn’t), and something that was always a big deal was EDI (Electronic Data Interchange). Not necessarily ANSI EDI, but the exchange of data all the same. These days this often gets referred to as ETL (Extract, Transform, Load), but the net result is the same: the exchange of data between partners. When you have lots of partners that each want data their way, you end up with lots of programs formatting that data. This led me to a long thought process and sporadic starts at designing a high-level language oriented towards the movement and manipulation of data, which is what Gobol aims to do.

Fundamentally, I have borrowed a lot of verbs from COBOL because it is easy to read, learn and program in, and is very data-movement oriented. It is, however, an old general-purpose language, so we can certainly clean up a lot of the old ideas left over from the punched card days. I also borrowed heavily from Warehouse, a now-deprecated and platform-specific (HP 3000) language from Taurus Software, in addition to using ideas from Python and Go.

Originally I wanted to translate the code into Python, because it is robust and multi-platform. With the advent of the Go language, however, I decided that a transpiler would make much more sense and provide nice tight executables that can be scheduled and run on multiple platforms in a hands-free fashion. This allows the code to be run and scheduled without the bloat associated with many products in this area, which require you to use their infrastructure to schedule and run jobs and are typically written in Java. As should now be obvious, the name is a play on COBOL and Go.

A major goal with the language is clarity, not brevity. Each command should be easy to understand and very powerful. The examples that follow assume this script:

DEFINE R : RECORD 
    F : CHAR(4) 
    G : CHAR(6)
    H : CHAR(2) 
END 

MOVE "ABCD"   TO R.F
MOVE "efghij" TO R.G
MOVE "13"     TO R.H

Expression                                   Result
MOVE EXTRACT(R, 1, 3)   TO var               var = "ABC" 
MOVE EXTRACT(R, 3, 3)   TO var               var = "CDe" 
MOVE EXTRACT(R.F, 1, 3) TO var               var = "ABC" 
MOVE EXTRACT(R.G, 3, 3) TO var               var = "ghi" 
MOVE STR2NUM(EXTRACT(R, 11, 2)) TO numvar    numvar = 13

We are defining a record structure named R with elements F, G, and H. The MOVE command puts a value in the declared variables. The EXTRACT function used within the MOVE command takes 3 parameters: a source, a start position, and a length. When the source is the whole record R, its fields are treated as one contiguous 12-character string, which is why extracting 3 characters from position 3 gives "CDe", spanning the end of F and the start of G. You’ll notice we are counting from 1 instead of 0 because that is how non-programmers count. Finally, you’ll see the STR2NUM function also embedded in the command, so we first extract 2 characters from R starting at the 11th byte (we could have also just referenced R.H instead) and convert the result to a number before we send it to numvar.
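To make the 1-based counting concrete, here is a rough sketch of the kind of Go a transpiler might emit for EXTRACT and STR2NUM. The names `record`, `flat`, `pad`, `extract`, and `str2num` are hypothetical, and the padding and clipping behavior are assumptions made for illustration; none of this comes from the Gobol specification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// record mirrors the R record from the script (hypothetical Go mapping).
type record struct {
	F string // CHAR(4)
	G string // CHAR(6)
	H string // CHAR(2)
}

// pad right-pads or truncates s to exactly width bytes, like a CHAR(n) field.
// Space padding is an assumption.
func pad(s string, width int) string {
	if len(s) >= width {
		return s[:width]
	}
	return s + strings.Repeat(" ", width-len(s))
}

// flat returns the record as one contiguous fixed-width string.
func (r record) flat() string {
	return pad(r.F, 4) + pad(r.G, 6) + pad(r.H, 2)
}

// extract returns length bytes of s starting at the 1-based position start.
// Out-of-range requests are clipped instead of failing (an assumption).
func extract(s string, start, length int) string {
	if start < 1 {
		start = 1
	}
	begin := start - 1 // convert to Go's 0-based indexing
	if begin >= len(s) || length <= 0 {
		return ""
	}
	end := begin + length
	if end > len(s) {
		end = len(s)
	}
	return s[begin:end]
}

// str2num converts a digit string to an int, ignoring surrounding spaces.
func str2num(s string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(s))
}

func main() {
	r := record{F: "ABCD", G: "efghij", H: "13"}
	fmt.Println(extract(r.flat(), 1, 3)) // ABC
	fmt.Println(extract(r.flat(), 3, 3)) // CDe
	fmt.Println(extract(r.G, 3, 3))      // ghi
	n, err := str2num(extract(r.flat(), 11, 2))
	fmt.Println(n, err) // 13 <nil>
}
```

The only real work is the `start - 1` conversion; everything else is bounds handling that the Gobol script never has to spell out.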

Now let’s assume that we have a variable that contains a name. It comes from a partner file that was sent to us, so we don’t know how clean the data is, but we do know that we want every name to have an upper-case first character followed by lower case. We could use a function like this:
UPSHIFT(DOWNSHIFT(fullname),EACH)

In this case we have a variable named fullname. We first downshift the entire string and then upshift the first letter of each word in the string using the EACH option. We could combine this with the MOVE command to perform the conversion and put the result in a new variable at the same time, like:
MOVE UPSHIFT(DOWNSHIFT(fullname), EACH) TO Myfullname
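For comparison, this is roughly what that one line could expand to in Go. It is only a sketch: `downshift` and `upshiftEach` are hypothetical helper names, and treating whitespace as the only word boundary is an assumption (so a hyphenated name such as "mary-anne" would become "Mary-anne").

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// downshift lowercases the whole string.
func downshift(s string) string {
	return strings.ToLower(s)
}

// upshiftEach uppercases the first character of each whitespace-separated
// word and leaves every other character unchanged.
func upshiftEach(s string) string {
	runes := []rune(s)
	startOfWord := true
	for i, r := range runes {
		if unicode.IsSpace(r) {
			startOfWord = true
			continue
		}
		if startOfWord {
			runes[i] = unicode.ToUpper(r)
			startOfWord = false
		}
	}
	return string(runes)
}

func main() {
	fullname := "jOHN  q. PUBLIC"
	myFullname := upshiftEach(downshift(fullname))
	fmt.Println(myFullname) // John  Q. Public
}
```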

Now let’s assume we are reading a file that contains a formatted birthday in MM/DD/YYYY order, but we need to put it into a MySQL date field. That would look something like this:
MOVE STR2DATE(VR.dtBirthDate, "MM/DD/YYYY") TO table.BirthDate
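As an illustration of what STR2DATE might do under the hood, the sketch below translates the Gobol-style pattern into a Go reference layout and re-formats the value as YYYY-MM-DD, the textual form MySQL uses for DATE values. The `str2date` function and its small pattern vocabulary (YYYY, MM, DD only) are assumptions for this example, not the actual design.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// layoutReplacer maps the pattern tokens used in the article to the
// components of Go's reference time (Jan 2, 2006).
var layoutReplacer = strings.NewReplacer("YYYY", "2006", "MM", "01", "DD", "02")

// str2date parses s according to a pattern such as "MM/DD/YYYY" and
// returns the date as a YYYY-MM-DD string.
func str2date(s, pattern string) (string, error) {
	layout := layoutReplacer.Replace(pattern)
	t, err := time.Parse(layout, s)
	if err != nil {
		return "", fmt.Errorf("str2date: %w", err)
	}
	return t.Format("2006-01-02"), nil
}

func main() {
	d, err := str2date("07/04/1976", "MM/DD/YYYY")
	fmt.Println(d, err) // 1976-07-04 <nil>

	// Dirty partner data is rejected rather than silently converted.
	_, err = str2date("13/45/1976", "MM/DD/YYYY")
	fmt.Println(err != nil) // true
}
```

Returning an error for invalid dates is one design choice; a real implementation would also have to decide what happens to the record when a partner sends a bad value.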

How about if you want to dump an array as a delimited file? I am proposing something like this:
MOVE EXPORT(ARRAY, <DELIMITER>, [NEW, APPEND, OVERWRITE]) TO OUTPUT-FILE

This will take the ARRAY and write it out one member at a time to OUTPUT-FILE. The optional flag controls whether the file is created, appended to, or overwritten; the default behavior is NEW. The delimiter character is inserted between each field, which is useful for exporting to CSV. The example shows how a single command eliminates a number of lines of code, which makes the verbosity acceptable, as it is pretty obvious what the command will do. My goal with the MOVE command is to make it obvious what is happening, but one alternative to the above command would be something like:
WRITE OUTPUT-FILE(NEW, APPEND, OVERWRITE) FROM ARRAY DELIMITED BY ","
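Either spelling could plausibly compile down to something like the following Go, which shows the "number of lines of code" the single command stands in for. This is a sketch under assumptions: the `export` function, the `mode` constants, and the `demo` helper are hypothetical names, and having NEW fail when the file already exists is a guess at the intended semantics.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"path/filepath"
)

type mode int

const (
	modeNew       mode = iota // create; assumed to fail if the file exists
	modeAppend                // add rows to the end of the file
	modeOverwrite             // replace any existing contents
)

// export writes rows to path, one record per line, with fields separated
// by delimiter.
func export(rows [][]string, delimiter rune, m mode, path string) error {
	flags := os.O_WRONLY | os.O_CREATE
	switch m {
	case modeNew:
		flags |= os.O_EXCL
	case modeAppend:
		flags |= os.O_APPEND
	case modeOverwrite:
		flags |= os.O_TRUNC
	default:
		return fmt.Errorf("export: unknown mode %d", m)
	}
	f, err := os.OpenFile(path, flags, 0o644)
	if err != nil {
		return err
	}
	w := csv.NewWriter(f)
	w.Comma = delimiter
	writeErr := w.WriteAll(rows) // WriteAll also flushes the writer
	closeErr := f.Close()
	if writeErr != nil {
		return writeErr
	}
	return closeErr
}

// demo exports the same row twice (NEW, then APPEND) into a temporary
// directory and returns the resulting file contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "gobol-export")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "out.csv")
	rows := [][]string{{"ABCD", "efghij", "13"}}
	if err := export(rows, ',', modeNew, path); err != nil {
		return "", err
	}
	if err := export(rows, ',', modeAppend, path); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	contents, err := demo()
	if err != nil {
		fmt.Println("export failed:", err)
		return
	}
	fmt.Print(contents)
}
```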

I have no financial interest in this project, although I’d like to find some financial sponsors to help speed the development and help gain adoption once it is viable. I am looking for people who have written transpilers to Go before and would like to get involved in the project. The current language specification isn’t written in stone; I have it as a guideline to get started and anticipate that it will continue to grow and be refined as elements are implemented. The GitHub project is hosted at https://github.com/the-kompany/gobol and I hope you can join us. It’s going to be fun to create a language that isn’t a C derivative.