Or you could use replace() with a fat regex, and then eval(). But this is awesom...

_f1dq · on Nov 4, 2014

But then you'd have three problems...

vidarh · on Nov 4, 2014

That's one of the most terrifying sentences I've read.

klibertp · on Nov 4, 2014

What? That's actually the way all the compilers/transpilers work. You just take some string and transform it into some other string that happens to be runnable.

vidarh · on Nov 4, 2014

No, that is actually not the way most compilers/transpilers work.

Using a regex for parsing a whole language is something I've never seen done, in 25+ years of writing and reading compilers as a hobby.

Using a regex for tokenization or parsing simple sub-expressions is sometimes done, but that too is fairly rare outside of toy parsers.
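For reference, the tokenization case looks roughly like this. It is a minimal Python sketch, not the linked implementation: the token set and the names `TOKEN_RE` and `tokenize` are made up for illustration.

```python
import re

# Hypothetical token set for a tiny arithmetic language (assumed for illustration).
# Each alternative is a named group, so match.lastgroup tells us the token kind.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<OP>[-+*/()=])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (kind, text) pairs; raise on an unexpected character."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this toy language
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens
```

Note that the regex only splits the input into tokens. Nesting, precedence, and everything else structural is still left to a real parser.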

Doing the translation step through replace on the basis of a regex is also not going to work for anything but the simplest translations, not least because the resulting monstrous regex would be nearly impossible for a human to reason about.

But the thing that terrified me about it was the thought of someone lifting something like that out of the browser sandbox and running it server side: the combination of a nearly undecipherable regex for parsing/translation coupled with eval() to run the result would be a near-certain security nightmare.
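One common way to avoid handing an arbitrary string to eval() is to parse the input into an AST and only evaluate node types you have explicitly allowed. The following is a hedged Python sketch of that idea using the standard `ast` module. The name `safe_eval` and the choice of allowed operators are assumptions for illustration.

```python
import ast
import operator

# Whitelist of binary operators this sketch is willing to evaluate (assumed set).
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr):
    """Evaluate a plain arithmetic expression; reject any other syntax."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Calls, attribute access, names, etc. all end up here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))
```

With this, `safe_eval("2 + 3 * 4")` evaluates the arithmetic, while something like `safe_eval("__import__('os').system('ls')")` is rejected with a `ValueError` instead of being executed.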

klibertp · on Nov 4, 2014

I think you misunderstood me completely. I was referring to a much more fundamental nature of compilation, not to any particular implementation technique. I agree that "a couple (because I assumed the OP meant this and not a single, giant regexp) of regexes with replace" is not a particularly common implementation technique for compilers, but it is a workable one. For example, compiling some simple custom markup language to HTML is a good use case for this. Take a look at "Text processing in Python" (http://gnosis.cx/TPiP/) for a longer discussion.
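A minimal sketch of what "a couple of regexes with replace" could look like for the markup case, in Python. The mini-markup here (`*bold*`, `_italic_`, `[text](url)`) and the names `RULES` and `to_html` are invented for illustration and are not taken from the book.

```python
import html
import re

# Hypothetical mini-markup (assumed): *bold*, _italic_, and [text](url) links.
# Each rule is a (pattern, replacement) pair applied in order.
RULES = [
    (re.compile(r"\*(.+?)\*"), r"<b>\1</b>"),
    (re.compile(r"_(.+?)_"), r"<i>\1</i>"),
    (re.compile(r"\[(.+?)\]\((.+?)\)"), r'<a href="\2">\1</a>'),
]


def to_html(text):
    """Translate the mini-markup to HTML with a sequence of regex replaces."""
    # Escape first so that raw < > & in the input cannot inject tags.
    out = html.escape(text)
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out
```

This works for flat, non-nested input. It already misbehaves on things like a URL containing underscores, and it does not validate URL schemes, which is about where the approach starts to run out of steam.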

The `eval()` part is also nothing new or strange. There are many systems which let you input expressions, compile it and run. Scala does this, as does Nim, as did Forth for the longest time.

In general compilation is not magic, on the contrary, it's conceptually simple and it's a good thing to know the basics. This is the view I wanted to express.

vidarh · on Nov 4, 2014

> I think you misunderstood me completely. I was referring to a much more fundamental nature of compilation, not to any particular implementation technique.

Then you seem to have misunderstood the point of my initial comment entirely, which is down to the specific case of suggesting regexes + eval() as a reasonable way to implement a compiler.

> because I assumed the OP meant this and not a single, giant regexp

The linked implementation already uses regexps for tokenization. OP's comment explicitly states "you could use replace() with a fat regex, and then eval()". I find it hard to interpret that as anything other than a suggestion to use a single, giant regexp.

> The `eval()` part is also nothing new or strange. There are many systems which let you input expressions, compile it and run.

Having an eval() is nothing "new or strange" but that was not what I was reacting to.

> In general compilation is not magic, on the contrary, it's conceptually simple and it's a good thing to know the basics. This is the view I wanted to express.

I've implemented several compilers and interpreters, and am writing a multiple-year-long article series on writing a Ruby compiler, so I agree (though Ruby is testing my patience...).

But personally dragging out regexps is the last thing I'd do for illustrating the conceptual simplicity of compilation... Personally, when I see people dragging out regexps, and they're longer than about 5 characters, I assume that's where I'll be most likely to find bugs.

klibertp · on Nov 4, 2014

> Then you seem to have misunderstood the point of my initial comment entirely

Yeah, I agree, sorry about that.

> I find it hard to interpret that as anything other than a suggestion to use a single, giant regexp.

Now that I look at this, you're probably right. I assumed any meaningful compilation is actually impossible with a single regexp, and so tried to interpret the original comment in a way which made at least some sense to me.

> But personally dragging out regexps is the last thing I'd do for illustrating the conceptual simplicity of compilation...

Well, there are some advantages to using regexps as an example: they are widely known and they also are able to describe regular languages. But I have no experience at all talking to people about compilation, so I'll assume that you're right and that using regexps makes it actually harder to explain things :)

sklogic · on Nov 4, 2014

Yes, compilation can be represented as a sequence of term-rewriting rule applications. But it does not make much sense to think of the ASTs (i.e., terms) as flat strings; hence, regular expressions (or Markov algorithms in general) are not a very suitable idiom here.
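To illustrate the difference, here is a small Python sketch of rewrite rules applied to a tree rather than to a string. The tuple-based AST encoding and the particular rules (constant folding, `x + 0`, `x * 1`, `x * 0`) are assumptions chosen for the example.

```python
# AST nodes as nested tuples (assumed encoding):
#   ("num", 3), ("var", "x"), ("add", left, right), ("mul", left, right)

def rewrite(node):
    """Simplify an expression tree in a single bottom-up pass."""
    kind = node[0]
    if kind in ("num", "var"):
        return node  # leaves are already in normal form
    if kind not in ("add", "mul"):
        raise ValueError(f"unknown node kind: {kind!r}")

    # Rewrite the children first, then try the rules at this node.
    left, right = rewrite(node[1]), rewrite(node[2])

    # Rule: constant folding, e.g. ("add", ("num", 1), ("num", 2)) -> ("num", 3)
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if kind == "add" else left[1] * right[1]
        return ("num", value)

    if kind == "add":
        # Rules: 0 + x -> x and x + 0 -> x
        if left == ("num", 0):
            return right
        if right == ("num", 0):
            return left
    else:
        # Rules: 1 * x -> x, x * 1 -> x, and anything * 0 -> 0
        if left == ("num", 1):
            return right
        if right == ("num", 1):
            return left
        if ("num", 0) in (left, right):
            return ("num", 0)

    return (kind, left, right)
```

Each rule matches on the shape of a subtree, so nesting comes for free. Expressing the same rules as regexes over the source text would have to re-discover that structure on every match.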

kyllo · on Nov 4, 2014

No. https://github.com/jashkenas/coffeescript/tree/master/lib/co...

kyllo · on Nov 4, 2014

So CoffeeScript, for example, is just a fat regex plus a call to eval?