One mistake that I’ve made over the last year as a programmer is writing custom parsers from scratch in each of the programming languages I work with. I now realize I can save a lot of time and effort by factoring common parsing tasks out into a parsing library for each programming language I work in.
One of my current programming projects is to implement Well-Known Text (WKT) import in CAD using AutoLISP. Instead of taking my traditional approach of writing a custom WKT parser from scratch, I’m going to create a parser library in AutoLISP and then build the WKT parser on top of that library. I can then implement other parsers (like a LandXML parser) in AutoLISP more efficiently by reusing the same library.
What does my parser library need to do?
It is going to handle three (3) basic tasks:
- Separate target strings into “chunks”. Four (4) basic kinds of chunks will need to be recognized: (1) whitespace, (2) groups of letters [or words], (3) groups of digits, and (4) symbols [or punctuation]. (Groups of letters may also include embedded digits.)
- Process “chunks” into token chains.
- Interpret token chains into expressions.
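To make the first task concrete, here is a minimal sketch of a chunker, written in Python since a Python port is planned. It is not the AutoLISP library itself; the names (`chunk`, `char_kind`, the kind strings) are hypothetical. It classifies characters by ASCII code (in AutoLISP, the built-in `ascii` function would play the role of Python's `ord`), merges runs of the same kind, lets a word absorb embedded digits, and assumes each symbol is its own one-character chunk.

```python
from typing import List, Tuple

# The four chunk kinds described above (names are illustrative assumptions).
WHITESPACE, LETTERS, DIGITS, SYMBOL = "whitespace", "letters", "digits", "symbol"


def is_letter(ch: str) -> bool:
    code = ord(ch)
    return 65 <= code <= 90 or 97 <= code <= 122  # ASCII A-Z or a-z


def is_digit(ch: str) -> bool:
    return 48 <= ord(ch) <= 57  # ASCII 0-9


def is_whitespace(ch: str) -> bool:
    return ord(ch) in (9, 10, 13, 32)  # tab, line feed, carriage return, space


def char_kind(ch: str) -> str:
    if is_whitespace(ch):
        return WHITESPACE
    if is_letter(ch):
        return LETTERS
    if is_digit(ch):
        return DIGITS
    return SYMBOL  # anything else is treated as a symbol / punctuation


def chunk(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, text) chunks."""
    chunks: List[Tuple[str, str]] = []
    for ch in text:
        kind = char_kind(ch)
        if chunks:
            last_kind, last_text = chunks[-1]
            # A word may carry embedded digits, e.g. "Layer2".
            continues_word = last_kind == LETTERS and kind == DIGITS
            # Runs of the same kind merge, except symbols, which stay single.
            same_run = last_kind == kind and kind != SYMBOL
            if continues_word or same_run:
                chunks[-1] = (last_kind, last_text + ch)
                continue
        chunks.append((kind, ch))
    return chunks
```

For example, `chunk("POINT (10.5 20)")` yields a letters chunk `"POINT"`, a whitespace chunk, a `"("` symbol, digits `"10"`, a `"."` symbol, digits `"5"`, whitespace, digits `"20"`, and a `")"` symbol. Note that the decimal point is still a separate chunk at this stage; gluing it into a number is left to token production.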
This differs from the traditional parsing method, in which tokens are produced in a single step called scanning or lexing. I separate this into two (2) tasks: chunk production and THEN token production. I’ll talk about this more in a later blog post.
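A rough Python sketch of that second step, assuming chunks arrive as `(kind, text)` pairs like the ones a chunker would emit. The token names (`WORD`, `NUMBER`, `LPAREN`, and so on) and the WKT-oriented rules are assumptions for illustration: whitespace is dropped, words are upper-cased, and an optional `-`, a digit run, and an optional `.` plus digit run are combined into one number token. Exponent notation such as `1e5` is not handled.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]     # (kind, text) from the chunking step
Token = Tuple[str, object]  # (token type, value); names are hypothetical

PUNCTUATION = {"(": "LPAREN", ")": "RPAREN", ",": "COMMA"}


def tokens_from_chunks(chunks: List[Chunk]) -> List[Token]:
    tokens: List[Token] = []
    i, n = 0, len(chunks)
    while i < n:
        kind, text = chunks[i]
        if kind == "whitespace":
            i += 1  # whitespace only separates tokens, so drop it
        elif kind == "letters":
            tokens.append(("WORD", text.upper()))
            i += 1
        elif kind == "digits" or (text == "-" and i + 1 < n and chunks[i + 1][0] == "digits"):
            # Combine an optional sign, a digit run, and an optional ".digits" part.
            literal = ""
            if text == "-":
                literal = "-"
                i += 1
            literal += chunks[i][1]
            i += 1
            if i + 1 < n and chunks[i] == ("symbol", ".") and chunks[i + 1][0] == "digits":
                literal += "." + chunks[i + 1][1]
                i += 2
            tokens.append(("NUMBER", float(literal)))
        elif text in PUNCTUATION:
            tokens.append((PUNCTUATION[text], text))
            i += 1
        else:
            tokens.append(("SYMBOL", text))  # unrecognized symbols pass through
            i += 1
    return tokens
```

Fed the chunks for `POINT (-10.5 20)`, this returns `WORD POINT`, `LPAREN`, `NUMBER -10.5`, `NUMBER 20.0`, `RPAREN`. Keeping the character-level chunking generic means only this step has to know WKT's rules for numbers and punctuation.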
I started implementing the parser library for AutoLISP today. I’ve written functions that identify letters, digits, symbols, and whitespace using ASCII codes. I’ve started writing a function that will separate input strings into chunks.
When I’ve completed this parsing library in AutoLISP, I’ll implement it in Java and Python.
I’ll keep my readers posted on my progress.
The Sunburned Surveyor
Filed under: AutoLISP, Computer Aided Drafting, SurveyLISP, Parsing, Scanning, Tokenizing