Pidgins Aren't DSLs

Piers Cawley just posted about Martin Fowler’s attempt to write a book about DSLs, or more precisely, “internal DSLs”. Piers calls these “Pidgins”, and I think it’s a pretty good term for them.

These are the sorts of languages where you don’t write a lexer or parser but instead build a family of objects, methods, functions or whatever other bits and pieces your host language provides in order to create a part of your program that, while it is directly interpreted by the host language, feels like it’s written in some new dialect. - Piers Cawley
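To make that concrete, here is a minimal sketch of a Pidgin in Python. Everything in it (`Query`, `select_from`, `where`, `limit`, `run`, and the sample data) is a made-up name for illustration, not taken from Piers' article or from any real library. The point is only that there is no lexer or parser anywhere: it's ordinary objects and method calls that happen to read like a little query dialect.

```python
class Query:
    """Hypothetical fluent builder: an internal DSL ("Pidgin") in plain Python."""

    def __init__(self, table):
        self.table = table
        self.conditions = []
        self.row_limit = None

    def where(self, **criteria):
        # Each keyword argument becomes an equality condition.
        self.conditions.extend(criteria.items())
        return self  # returning self is what makes the chained "dialect" possible

    def limit(self, n):
        self.row_limit = n
        return self

    def run(self, database):
        # The host language does all the work; nothing is lexed or parsed.
        rows = database[self.table]
        matches = [
            row for row in rows
            if all(row.get(key) == value for key, value in self.conditions)
        ]
        return matches if self.row_limit is None else matches[: self.row_limit]


def select_from(table):
    return Query(table)


database = {
    "people": [
        {"name": "Ann", "city": "Leeds"},
        {"name": "Bob", "city": "York"},
        {"name": "Cat", "city": "Leeds"},
    ]
}

result = select_from("people").where(city="Leeds").limit(1).run(database)
print(result)
```

The catch is that this "language" only exists inside Python. Nothing outside a Python program can read or write it.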

Unfortunately, ever since Ruby on Rails came out, everybody (except Piers) seems to have forgotten what a real DSL is, and it’s been driving me batty. Rails is not a DSL; it’s a Pidgin. And it seems as if every post about writing DSLs since RoR has actually been a post about writing Pidgins. Which sucked, because I was actually interested in writing DSLs, but the only references I could find essentially said “you need to write a lexer and a parser, maybe generate an AST. Go use YACC or Bison”. That didn’t help me one bit: while I knew what a lexer and a parser were, I certainly didn’t want to reinvent the wheel about how to generate them, and even now I’m still a little unclear on what an AST is under the covers.

I know what ASTs let you do, but not how they’re stored or accessed. Also, using YACC always seemed like it would involve journeys into Unix obscurity and uber-geekdom. It may actually be trivial to use; I have no idea. And then, thanks to a friend, I was introduced to ANTLR.

OMFGWTFBBQ. ANTLR is the bee’s knees. It’s got a very nice IDE (ANTLRWorks) in which you write out your grammar using an EBNF-like notation. It will show you the syntax diagram of the rule you’re currently working on, and it has an interpreter and a debugger. Then, for most simple DSLs, you just hook in a little code to be executed (or a value to be returned) at various points in your grammar, click “Generate Code”, and you’re done. I’m not kidding. It’s that freaking simple.
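To give a feel for what "hooking a little code into your grammar" means, here is a hand-written Python sketch of a lexer and parser for a made-up two-command language. This is not ANTLR output and uses none of ANTLR's API. The mini-language, the `Turtle` class, and the function names are all assumptions for illustration. It just shows the shape of the thing a generator saves you from writing by hand: characters go in, tokens come out, and the parser fires your code when a rule matches.

```python
import re

# Hypothetical mini-language, written EBNF-style:
#   program : command* ;
#   command : 'move' NUMBER | 'turn' NUMBER ;
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<WORD>[a-z]+))")


def lex(text):
    """Lexer: turn raw characters into (kind, value) tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse(tokens, actions):
    """Parser: check tokens against the grammar and fire the hooked-in action."""
    i = 0
    while i < len(tokens):
        kind, word = tokens[i]
        if kind != "WORD" or word not in actions:
            raise SyntaxError(f"expected a command, got {word!r}")
        if i + 1 >= len(tokens) or tokens[i + 1][0] != "NUMBER":
            raise SyntaxError(f"{word!r} must be followed by a number")
        # This call is the "little code" attached to the command rule.
        actions[word](int(tokens[i + 1][1]))
        i += 2


class Turtle:
    """Host-side code that the grammar actions call into."""

    def __init__(self):
        self.log = []

    def move(self, steps):
        self.log.append(("move", steps))

    def turn(self, degrees):
        self.log.append(("turn", degrees))


turtle = Turtle()
source = "move 10\nturn 90\nmove 5"
parse(lex(source), {"move": turtle.move, "turn": turtle.turn})
print(turtle.log)
```

Note that `source` is just text. It could come from a file, a database column, or a program written in a completely different language, which is exactly what a Pidgin can't offer.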

ANTLR builds you a lexer and a parser from your grammar, inserts the native method calls you specified, and outputs classes for you to stick in the app that’s going to be reading in the DSL. Now, by default ANTLR’s focused on Java, but according to the docs you can also generate Python, C#, C++, and more.

AND, if you’re writing a language that’s going to do some pretty heavy-duty work (like Python, for example), then you can also have ANTLR generate an AST for its parser to use, which is supposed to speed up processing dramatically. I haven’t explored this part of ANTLR yet, so I can’t comment on it.

If you’re interested in writing your own DSL (a real DSL, not a fucking Pidgin), I highly recommend that you grab the ANTLR book (it’s very well done) and start playing with ANTLR. If you’re not going to be leveraging your DSL from Java, you may need to grab the latest version of ANTLR from SVN, and you’ll definitely have to read some more of the docs on the site about making it output code in your language of choice.
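On the "what is an AST under the covers" question, here is a generic Python sketch of how such a tree can be stored and accessed. It says nothing about how ANTLR itself represents trees. The `Num` and `BinOp` node classes and the `evaluate` walker are hypothetical names, and the tree is built by hand, standing in for what a parser might produce for the text `1 + 2 * 3`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: a literal number."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator with two child subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def evaluate(node):
    """Walk the tree recursively, computing a value from the bottom up."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        left = evaluate(node.left)
        right = evaluate(node.right)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"not an AST node: {node!r}")


# Built by hand here; a parser would normally construct this from "1 + 2 * 3".
# Operator precedence is captured by the nesting, not by the text.
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(evaluate(tree))  # prints 7
```

So "stored" just means nested objects, and "accessed" means walking them. Once the text is a tree, you can evaluate it, check it, or translate it as many times as you like without parsing again.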

I was able to get my head around ANTLR, write a grammar for my small language, and then hook in the code to generate a parser that actually did something, all in roughly one work day. Next week we’re having a brainstorming session at work to see just how far we can take this language and what systems we’d like to retool to take advantage of it.

So please, people, stop confusing DSLs with Pidgins, and stop writing about Pidgins as if they were DSLs. There are so many good reasons to write real DSLs to help in whatever problem domain your applications are addressing, and no good reason to limit yourself to a Pidgin. Real DSLs, like SQL for example, can be leveraged from code in any language and be incredibly useful in their domain.

P.S. You should really go read Piers’ article. It makes some really good points and covers a different, although related, topic.