A programming language in 450 lines of JavaScript (jsfiddle.net)
316 points by breuleux on Nov 3, 2014 | 70 comments


Would love to see a little behind the scenes for this, maybe a write up / blog post. Pretty neat little experiment.


I agree I should post one, but it usually takes me a lot of time, and I wasn't sure how much interest there would be. I'll start working on one.


I started writing my own language in JavaScript last summer but gave up after only writing the lexer because I didn't know how to continue. I would LOVE to read this blog post!


Kaleidoscope - LLVM Tutorial [0] is an excellent resource with code samples in C++ and OCaml.

[0] http://llvm.org/docs/tutorial/index.html


Agreed, it's an excellent resource. One of the best PL tutorials I've ever read.

But if you use the OCaml version with a recent version of OCaml, you'll run into a few issues. I sent a patch with a few simple updates to the patch mailing list earlier this year, but last I checked it wasn't merged.

Patch is here (it's a text file, despite the "bin" extension):

http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/183...

I also pasted the patch here:

http://pastebin.com/YvzB71nJ


Not all parsers need to be complex. If you're up for the challenge, read a bit about top-down parsers and try to write one for your language!
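For the curious, a top-down (recursive descent) parser really can be small. Here's a hypothetical sketch for a toy grammar, where each grammar rule becomes one function (not code from the fiddle):

```javascript
// Grammar:  expr := term (('+'|'-') term)*
//           term := NUMBER | '(' expr ')'
// Takes pre-split tokens and builds an AST of { op, left, right } nodes.
function parseExpr(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];

  function expr() {
    let node = term();
    while (peek() === "+" || peek() === "-") {
      const op = tokens[pos++];                 // consume the operator
      node = { op, left: node, right: term() }; // left-associative
    }
    return node;
  }

  function term() {
    if (peek() === "(") {
      pos++;               // consume "("
      const node = expr();
      pos++;               // consume ")"
      return node;
    }
    return Number(tokens[pos++]);
  }

  return expr();
}
```

The nesting of function calls mirrors the nesting of the grammar, which is why this style is so approachable after a lexer is done.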


There may be better books but this was fairly pragmatic and I learned a lot: http://www.amazon.com/Programming-Language-Processors-Java-I...


Niklaus Wirth's book about compiler construction is quite good.

It is available at his ETHZ site nowadays,

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf


I wrote a tutorial some time back — http://lisperator.net/pltut/ — also targeted at JS. It walks you through parsing, evaluating, continuations and CPS compiling.


Parsing, evaluating, continuations and CPS... it seems similar to this language: https://curiosity-driven.org/continuations#interpreter but the syntax is clearly different. Also, that interpreter is implemented in ~320 lines of JS.



That's pretty cool :)

To be fair, though, the 450 lines include about 120 lines of comments, a configurable full operator precedence parser, syntax for array creation and indexing, range and looping operators, short-circuit logical operators...

Most minimal language implementations skimp on syntax because it is irrelevant to a language's expressive power. The point of this exercise, though, was to make one that didn't.


Yours is a Lisp-like language, so much easier to parse; and it doesn't have the nice in-DOM AST viewer, nor nearly the same number of comments (I compared with http://eloquentjavascript.net/code/chapter/11_language.js).


Still very interesting


It's even more interesting because it's part of a book. Just setting expectations, because it's not an apples-to-apples comparison.


The AST viewer is not included in the 450 lines, although the comments are. Most of the difference is due to the more sophisticated parser and some extra features like arrays.


How about 49 lines?

https://github.com/samphilipd/the-little-schemer/blob/master...

I wrote an implementation of Scheme in Javascript to use as I work through the excellent book 'The Little Schemer'. Turns out you can implement a basic version of the language in just 7 functions.

It's Turing complete and you can use the base methods to build any function you need.


Damn, beat me by 6 lines, though I also cheated and translated to JS to interpret the code: https://github.com/capnmidnight/betty-boop/blob/master/loosp...

EDIT: though I did end up writing a Pong game in it: https://github.com/capnmidnight/betty-boop/blob/master/pong....


Peter Norvig's lis.py[1] is an oldie but a goodie. It really shows you the beauty and simplicity of Scheme/Lisp interpreters.

[1]: http://norvig.com/lispy.html
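In the same spirit as lis.py, here's a micro s-expression evaluator sketched in JavaScript (illustrative only, far less complete than Norvig's): pad the parens, split on whitespace, read nested arrays, evaluate recursively.

```javascript
// Parse "(+ 1 (* 2 3))" into nested arrays of numbers and symbol strings.
function parse(src) {
  const tokens = src.replace(/\(/g, " ( ").replace(/\)/g, " ) ")
                    .trim().split(/\s+/);
  function read() {
    const t = tokens.shift();
    if (t === "(") {
      const list = [];
      while (tokens[0] !== ")") list.push(read());
      tokens.shift();                    // drop the closing ")"
      return list;
    }
    return isNaN(t) ? t : Number(t);     // symbol or number
  }
  return read();
}

// A tiny global environment; real Lisps need much more, of course.
const env = { "+": (a, b) => a + b, "*": (a, b) => a * b };

function evaluate(x) {
  if (typeof x === "number") return x;        // literal
  if (typeof x === "string") return env[x];   // symbol lookup
  const [fn, ...args] = x.map(evaluate);      // evaluate list, then apply
  return fn(...args);
}
```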


The best thing about this is being able to play with the entire mechanism right on jsfiddle. Thanks for this work and want to read the writeup!


This is awesome! I wish somebody had shown me this when I learned about compilers in CS. The best way I've found in learning about anything theoretical is to write the code for it.


Yup! I found this course pretty useful when I was studying it. Fun to jump right in alongside the theory and all, I find:

http://createyourproglang.com/


That's a nice example of some elegant and readable code.


Just a semi-unrelated comment:

The Fibonacci code example classically has a horrid runtime. It will likely crash your browser when supplied a number higher than the low thirties. IIRC it has an O(2^n) runtime and can be improved either by a bottom-up approach (non-recursive) or a "dynamic-programming" (OOohhhh, buzzwords) method where you use memoization (a memo table) to store (or memoize) results.
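As an illustration (a sketch in plain JavaScript, not the fiddle's code), here are the naive version and the two fixes mentioned:

```javascript
// Naive recursion: exponential time, recomputes the same subproblems.
function fibNaive(n) {
  return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// Memoized ("dynamic programming"): each fib(k) is computed once, O(n).
function fibMemo(n, memo = new Map()) {
  if (n < 2) return n;
  if (memo.has(n)) return memo.get(n);
  const result = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
  memo.set(n, result);
  return result;
}

// Bottom-up iteration: O(n) time, O(1) space, no recursion at all.
function fibIter(n) {
  let a = 0, b = 1;
  for (let i = 0; i < n; i++) [a, b] = [b, a + b];
  return a;
}
```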


I always thought the fastest was the closed form formula: http://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_ex...

Unless looping and counting is that much faster than doing some floating point math?


Only "faster" because you precompute digits of the golden ratio, and it will quickly become inaccurate unless you compute more of them.


The closed form will become inaccurate at some point. However, there is a way to calculate Fibonacci accurately with O(log(n)) time (ignoring the time to multiply - otherwise O(M(n) log(n)) where M(n) is the time to multiply two numbers of n digits).
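One standard way to get that O(log n) bound exactly is the fast-doubling identities: fib(2k) = fib(k)·(2·fib(k+1) − fib(k)) and fib(2k+1) = fib(k)² + fib(k+1)². A sketch using BigInt so the results stay exact:

```javascript
// Fast doubling: O(log n) multiplications (ignoring multiplication cost).
function fibFast(n) {
  function fd(k) {                       // returns [fib(k), fib(k+1)]
    if (k === 0n) return [0n, 1n];
    const [a, b] = fd(k >> 1n);          // recurse on floor(k/2)
    const c = a * (2n * b - a);          // fib(2k)
    const d = a * a + b * b;             // fib(2k+1)
    return (k & 1n) ? [d, c + d] : [c, d];
  }
  return fd(BigInt(n))[0];
}
```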


I thought recursive fibonacci solutions like this had O(fib n) run time.



> EDIT: Tied with JadeNB (https://news.ycombinator.com/item?id=8555051).

So close, and only ngorenflo (https://news.ycombinator.com/item?id=8555050) can come between us. :-)


but isn't O(phi^n) < O(2^n)? Exponential regardless.


You're right, although the "<" symbol isn't technically correct notation. I think O(phi^n) ⊂ O(2^n) or O(phi^n) ⊊ O(2^n) would be more correct to indicate that the left is a proper subset of the right, given that big O notation refers to sets of functions.


That's certainly true, although notation like `fib_n = O(phi^n)` (rather than `fib_n \in O(phi^n)`) is so ingrained that it's probably too late to fight it. (I seem to remember that Knuth says something to this effect.)

In that spirit, one can adopt a sort of compromise notation: `O(phi^n) = o(2^n)` (where `=` should really be `\subseteq`).


Sure, O(2^n) is an upper bound here. O(phi^n) is a tighter bound.


`O(fib_n)` is `O(phi^n)`, where phi = (1 + sqrt(5))/2 < 2.

EDIT: Tied with Mithrandir (https://news.ycombinator.com/item?id=8555049).


This seems to be a language with several kinds of blocks: if-elif-else, let-in-end, begin-end, and curly braces, square brackets, and round parentheses.

There are a few operators with several levels of precedence, both prefix and infix notation, but no sign of postfix notation like (!) for 3! == 6 and 5! == 120.

There also seems to be a list data type, as well as strings and numbers.


Postfix operators do work. Try this:

    let x! = if x == 0 then 1 else x * (x - 1)! end in 5! end
Any operator can be prefix, infix or postfix depending on the relative priority with the operators next to them. For instance, in "1 ^ * 2" the engine interprets ^ as postfix because ^ has higher priority (it just gets null as its right hand side). In "1 * ^ 2", ^ is prefix. Unless specified otherwise operators are given maximal priority, which is why the factorial example works.

The engine just has specific provisions to allow operators in "prefix position" to have different priority than in infix position, otherwise prefix minus would fail to work as expected.


I'm trying to figure out how much work it would take to add on OCaml style pattern matching to this language, particularly list matching.

Would you mind pointing me in the right direction? Like a very brief(ie., a sentence or two), high-level overview of how you might approach the task?


Not a lot of work. I have implemented pattern matching before, so I just went ahead and did it:

http://jsfiddle.net/f1ycatas/2/

See line 266 for the syntax definition (the precedence entry for "->" on line 278 is also relevant) and line 409 for the implementation of pattern matching. I use a few auxiliary functions here: extractArgs just checks that an AST node has a certain form and extracts parts of it, and listify is a kind of normalization function that will give you a list of expressions or statements.

I think the code is relatively straightforward: we have a list of "pattern -> body" statements. For each entry we create a fresh environment for the variables declared by the pattern and we try to match the pattern with the value. For matching, if a pattern is a variable name we set that variable in the environment, if it's a number or string we check for equality, and if it's a list of subpatterns we check that we have a list of the correct length and we check the subpatterns recursively. If there's a match, we execute the corresponding body.
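That matching step can be sketched like this in plain JavaScript (a simplified, hypothetical version, not the fiddle's actual code): variables bind, literals compare for equality, lists recurse.

```javascript
// Try to match `pattern` against `value`, binding variables into `env`.
// Pattern nodes are assumed to look like { type, name?, value?, items? }.
function match(pattern, value, env) {
  if (pattern.type === "variable") {       // variable: always matches, binds
    env[pattern.name] = value;
    return true;
  }
  if (pattern.type === "literal")          // number/string: check equality
    return pattern.value === value;
  if (pattern.type === "list") {           // list: check length, then recurse
    if (!Array.isArray(value) || value.length !== pattern.items.length)
      return false;
    return pattern.items.every((p, i) => match(p, value[i], env));
  }
  return false;
}
```

The caller then walks the "pattern -> body" clauses in order and evaluates the body of the first clause whose pattern matches, in the fresh environment.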

If that interests you, I wrote a compile-to-JS language called Earl Grey (http://breuleux.github.io/earl-grey/repl/) using a similar parser and it has rather sophisticated pattern matching. The implementation is quite a lot more than 450 lines, though :)


Thanks! Nice explanation and very clear code.


Let's relive it once again https://news.ycombinator.com/item?id=7745561

;)

"Have you heard about our Lord and Saviour JavaScript?"

"How to touch yourself at night without JavaScript knowing it."

"Breaking news! POP3 and IMAP written in Javascript!"

"How I ported the control software of a nuclear reactor to reactive Javascript"

"Linux kernel ported to JavaScript running in the browser. See how we did it."

"How I made a filesystem in javascript."

"How I got my girlfriend pregnant using JavaScript."

"How to avoid getting HIV using JavaScript."

"The Linux kernel doesn't have enough javascript."

"I recommended my boss to rewrite the local powergrid infrastructure to javascript and how I lost my job."

"I decided to re-implement Javascript in Javascript. It failed. Here is my story."

"ABC in # lines of JavaScript (400 comments, 1000 points)"


Wow, nice project! I would love to see both the lexical and syntactic analyzers broken down a little more.


Loving it! It's so nice to see this kind of thing around :)


You can also check out http://javascript.crockford.com/tdop/tdop.html

The article is also in the book Beautiful Code (O'Reilly)


Indeed, that is a very good article. I highly recommend it. TDOP is the combination of recursive descent and Floyd's operator precedence grammars, the latter of which is the technique I used (I find it sufficient for my needs).
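As a sketch of the binding-power idea at the core of TDOP and operator precedence parsing (hypothetical code, not from Crockford's article): each operator gets a number, and a recursive `expr(minBp)` only consumes operators that bind tighter than its caller's.

```javascript
// Minimal precedence-climbing evaluator over pre-split tokens.
function parse(tokens) {
  const bp = { "+": 10, "-": 10, "*": 20, "/": 20 };  // binding powers
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function expr(minBp) {
    let left = Number(next());                 // "nud": only numbers here
    while (peek() !== undefined && bp[peek()] > minBp) {
      const op = next();                       // "led": binary operator
      const right = expr(bp[op]);              // tighter ops bind first
      left = op === "+" ? left + right
           : op === "-" ? left - right
           : op === "*" ? left * right
           : left / right;
    }
    return left;
  }
  return expr(0);
}
```

The `bp[peek()] > minBp` comparison is what makes `*` grab its operands before `+`, and the strict `>` is what makes same-precedence operators left-associative.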


Vaughan Pratt's TDOP is also used in Maxima (that is how they use an infix language similar to Python as a frontend to Lisp). In Maxima you can easily define infix, prefix, and postfix operators with a left and right binding power for your grammar. I couldn't find with grep the file I was looking for in the Maxima source, but for example Fateman's paper "Syntax Extension as a Tool for Application Programming" is a good reference.


Since some 150 lines are comments or blank, it's more like a 300-line programming language. Even more impressive!


Thanks so much for this - really looking forward to going through it in the next couple days


Yes, me too! I am curious to know the motivation behind it as well. I have been putting going through the "Dragon Book" on the back burner, and I figured a project like this would be motivation to finally get around to it.


Ah, this is pretty badass, kudos!


This is great! Especially the AST viewer.

Any inspiration from the Tiger book for the language?


Can you create your own programming language with Python?


Any Turing-complete language can be used for writing compilers/interpreters; it is just harder in some than in others.

Python is actually pretty neat for doing it and you even have tools like PyPy (with RPython) to write a JIT for the language if you wish.


You don't even need a Turing-complete language for writing compilers, see CompCert for example.


I need to dive into it.

Do you mean it because of the formal logic being used?


This is because compilation can be represented as a finite sequence of applying simple rewrite rules. You don't need a true Turing-complete language to do so, because you don't have any indeterminate loops in such process. Any simple total language (like Coq) is more than enough.
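For instance, a constant-folding pass is a terminating rewrite over the AST: each rule only shrinks the tree, so no unbounded loop is needed. A tiny JavaScript sketch (illustrative only):

```javascript
// Fold constant subexpressions in a { op, left, right } tree.
// Leaves are numbers or variable-name strings.
function fold(node) {
  if (typeof node !== "object") return node;        // leaf: nothing to do
  const left = fold(node.left), right = fold(node.right);
  if (typeof left === "number" && typeof right === "number") {
    switch (node.op) {                              // rewrite rule fires
      case "+": return left + right;
      case "*": return left * right;
    }
  }
  return { op: node.op, left, right };              // rebuild, partially folded
}
```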


Sure! You could use a parser generator like Yapps [0] to write a parser for your own programming language.

[0]: http://mbreen.com/yapps2.html


Dat AST rendering.


Or you could use replace() with a fat regex, and then eval(). But this is awesome.


But then you'd have three problems...


That's one of the most terrifying sentences I've read.


What? That's actually the way all compilers/transpilers work. You just take some string and transform it into some other string that happens to be runnable.


No, that is actually not the way most compilers/transpilers work.

Using a regex for parsing a whole language is something I've never seen done, in 25+ years of writing and reading compilers as a hobby.

Using a regex for tokenization or parsing simple sub-expressions is sometimes done, but that too is fairly rare outside of toy parsers.
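For what it's worth, regex-based tokenization (the part regexes are reasonable at) looks roughly like this sketch, using a sticky regex with one alternative per token kind:

```javascript
// Split source into { type, value } tokens; unrecognized input ends the loop.
function tokenize(src) {
  const re = /\s*(?:(\d+)|([A-Za-z_]\w*)|([+\-*\/()=]))/y;  // sticky flag
  const tokens = [];
  let m;
  while (re.lastIndex < src.length && (m = re.exec(src))) {
    if (m[1])      tokens.push({ type: "number", value: Number(m[1]) });
    else if (m[2]) tokens.push({ type: "name",   value: m[2] });
    else           tokens.push({ type: "punc",   value: m[3] });
  }
  return tokens;
}
```

The structure beyond this point (nesting, precedence) is exactly what a single regex cannot express, which is why the parsing step needs something else.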

Doing the translation step through replace on the basis of a regex is also not going to work for anything but the simplest translations, not least because the resulting monstrous regex would be nearly impossible for a human to reason about.

But the thing that terrified me about it was the thought of someone lifting something like that out of the browser sandbox and running it server-side: the combination of a nearly undecipherable regex for parsing/translation coupled with eval() to run the result would be a near-certain security nightmare.


I think you misunderstood me completely. I was referring to a much more fundamental nature of compilation, not to any particular implementation technique. I agree that "a couple (because I assumed the OP meant this and not a single, giant regexp) of regexes with replace" is not a particularly often used implementation technique for compilers, but it is a workable one. For example, compiling some simple custom markup language to html is a good use case for this. Take a look at "Text processing in Python" (http://gnosis.cx/TPiP/) for a longer discussion.
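For that markup-to-HTML use case, a toy example with nothing but replace() calls (hypothetical markup syntax, loosely Markdown-flavored):

```javascript
// Each rule is one regex replace; fine for simple, non-nesting markup.
function toHtml(src) {
  return src
    .replace(/\*\*(.+?)\*\*/g, "<b>$1</b>")  // **bold**
    .replace(/_(.+?)_/g, "<i>$1</i>")        // _italic_
    .replace(/^# (.+)$/gm, "<h1>$1</h1>");   // # heading (per line)
}
```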

The `eval()` part is also nothing new or strange. There are many systems which let you input expressions, compile them, and run them. Scala does this, as does Nim, as did Forth for the longest time.

In general compilation is not magic, on the contrary, it's conceptually simple and it's a good thing to know the basics. This is the view I wanted to express.


> I think you misunderstood me completely. I was referring to a much more fundamental nature of compilation, not to any particular implementation technique.

Then you seem to have misunderstood the point of my initial comment entirely, which is down to the specific case of suggesting regexes + eval() as a reasonable way to implement a compiler.

> because I assumed the OP meant this and not a single, giant regexp

The linked implementation already uses regexps for tokenization. OPs comment explicitly states "you could use replace() with a fat regex, and then eval()". I find it hard to interpret that as anything other than a suggestion to use a single, giant regexp.

> The `eval()` part is also nothing new or strange. There are many systems which let you input expressions, compile it and run.

Having an eval() is nothing "new or strange" but that was not what I was reacting to.

> In general compilation is not magic, on the contrary, it's conceptually simple and it's a good thing to know the basics. This is the view I wanted to express.

I've implemented several compilers and interpreters, and am writing a multi-year article series on writing a Ruby compiler, so I agree (though Ruby is testing my patience...).

But personally dragging out regexps is the last thing I'd do for illustrating the conceptual simplicity of compilation... Personally, when I see people dragging out regexps, and they're longer than about 5 characters, I assume that's where I'll be most likely to find bugs.


> Then you seem to have misunderstood the point of my initial comment entirely

Yeah, I agree, sorry about that.

> I find it hard to interpret that as anything other than a suggestion to use a single, giant regexp.

Now that I look at this you're probably right. I assumed any meaningful compilation is actually impossible with a single regexp and so tried to interpret original comment in a way which made at least some sense for me.

> But personally dragging out regexps is the last thing I'd do for illustrating the conceptual simplicity of compilation...

Well, there are some advantages to using regexps as an example: they are widely known and they also are able to describe regular languages. But I have no experience at all talking to people about compilation, so I'll assume that you're right and that using regexps makes it actually harder to explain things :)


Yes, compilation can be represented as a sequence of term rewriting rules applications. But it does not make much sense to think of the ASTs (i.e., terms) as flat strings, hence, regular expressions (or Markov algorithms in general) is not a very suitable idiom here.



So Coffeescript for example is just a fat regex plus a call to eval?


Actually both are scripting languages. Still impressive though.



