How to implement a spreadsheet

david-given · on July 24, 2015

If you're interested in spreadsheets, don't miss sc, the Standard Unix Spreadsheet™ which nobody has ever heard of (the un-Googleable name doesn't help). It was originally written by James Gosling in the 1980s and is in Debian. It runs in a terminal and uses vi keybindings:

http://www.linuxjournal.com/article/10699

franek · on July 24, 2015

There is a continuation to sc, called scim:

https://github.com/andmarti1424/scim

oever · on July 24, 2015

Is there a more recent version than 7.16 from 2002? http://ibiblio.org/pub/Linux/apps/financial/spreadsheet/

beefsack · on July 24, 2015

The package version in Arch is 7.16, so I assume it's the latest.

howeyc · on July 24, 2015

Indeed. In fact I liked it so much that I wrote a program very similar that can read/write such files on Windows[1]. I couldn't find how to compile the original on Windows.

The file format is pretty simple. At least for the spreadsheets I create.

As a side note, I love text-based (or extremely simple) file formats. So easy to write tools for them.

[1] http://github.com/howeyc/sc

ape4 · on July 24, 2015

Yes, what a very nice format:

    # This data file was generated by the Spreadsheet Calculator.
    # You almost certainly shouldn't edit it.
    let A0 = 45
    let A1 = 56
    let A2 = @sum(A0:A1)

hayd · on July 24, 2015

pffft, xml is obviously a better choice for a serious spreadsheet engine... \s

aidos · on July 24, 2015

That's the one! I've used it before and was struggling to remember the name the other day.

pbhjpbhj · on July 24, 2015

FWIW on Debian-like systems there is "apropos" that searches for binaries based on keywords in their descriptions. My problem is when I need it I can never remember the name, just spent 10 minutes search on variations of "advogato".

https://en.wikipedia.org/wiki/Apropos_%28Unix%29 : "Often a wrapper for the "man -k" command, the apropos command is used to search all manual pages for the string specified. This is often useful if one knows the action that is desired, but does not remember the exact command or page name."

TBH learning now that "man -k" exists (I don't think I can have "man man"-ed in the last decade) is probably a lot more useful for me.

DougMerritt · on July 25, 2015

> FWIW on Debian-like systems there is "apropos"

Ouch, too much Debian credit, too much Linux credit, even. That's by Bill Joy circa 1977, so it's a BSD Unix thing, widely adopted everywhere else.

Man -k was AT&T's reaction to apropos, IIRC. There is a certain logic to using "man" to search man pages.

justincormack · on July 24, 2015

Except at least in Ubuntu they basically dont work eg man -k MANOPT should find the man(1) man page and doesnt on my system.

McElroy · on July 24, 2015

There's nothing broken about that. If you want to do full text search, do `man -K` instead of `man -k`. E.g.

    man -K MANOPT

Works for me on Ubuntu 14.04, though slowly so and I still have to ask the pager to search for it within the manpage which is opened. Because of this, I read the manpages when I know where to find what I'm looking for but otherwise just Google.

A far more severe problem IMO is the fact that many Linux distros ship without manpages in their default installs.

zarq · on July 24, 2015

Back in 2000, a winner of the International Obfuscated C Code Contest (IOCCC), had an implementation of an X-based graphical spreadsheet in 2kB of very unreadable C: http://www.ioccc.org/2000/jarijyrki.c . Details here: http://www.ioccc.org/2000/jarijyrki.hint

leni536 · on July 24, 2015

This is amazing. It compiles and runs under cygwin fine too.

fit2rule · on July 24, 2015

How'd you get it to compile?

    gcc -I/usr/include/X11 jarijyrki.c  -o jarijyrki
    jarijyrki.c:15:33: error: ‘U’ undeclared here (not in a function)
    int   q,P,W,Z,X,Y,r,u; char   E[U][U][T+1] ,D[T];   Window J; GC k;  XEvent  w;
                                     ^

leni536 · on July 24, 2015

I downloaded the Makefile from the same site [1]. I changed CC to gcc (not sure if necessary) then just ran "make jarijyrki". The relevant build command makes it clear why your compile command fails.

   	${CC} ${X11CCFLAGS} ${CFLAGS} -DNeedFunctionPrototypes \
   	    -DU=40 -DT=98 '-Dz=(T+1)*U*U' -DQ=80 -DS=20 -DN=10 -DB=5 -DG=23 \
   	    -Dp=7 '-DM=((p+1)*Q)+S' '-DH=(G*S)+S+S' -DC=XK_Up -DL=XK_Down \
   	    -DO=XK_Left -DV=XK_Right -DR=XK_Escape -D_=XK_BackSpace \
   	    $? -o $@ ${X11LDFLAGS} -lX11

I guess there was a limitation on the length on the compilation command too and the winner made use of it fully. You have to run it as "./jarijyrki < sheet1.info > myedits.info", sheet1.info is in the same site[2].

[1] http://www.ioccc.org/2000/Makefile [2] http://www.ioccc.org/2000/sheet1.info

fit2rule · on July 24, 2015

Oh geeze, I didn't even see there was a Makefile there .. sorry about that.

_ij0r · on July 24, 2015

Tangentially, the Löb function over (Haskell) functors implements spreadsheet-like computations in an unexpected way: https://github.com/quchen/articles/blob/master/loeb-moeb.md

ex.

    loeb [const 1,
          const 2,
          \xs -> x !! 0 + x !! 1]

gives you [1, 2, 3].

Naturally, it doesn't allow for mutations or anything fancy, but it is an interesting curiosity.

bodhi · on July 24, 2015

Here's another reference:

http://blog.sigfpe.com/2006/11/from-l-theorem-to-spreadsheet...

(I completely missed your GitHub link the first time I read this comment)

cousin_it · on July 24, 2015

Isn't that the same as using vanilla laziness:

    > let xs = [1, 2, (xs !! 0) + (xs !! 1)] in xs
    [1, 2, 3]

It's order independent as well:

    > let xs = [(xs !! 1) - 1, (xs !! 2) - 1, 3] in xs
    [1, 2, 3]

Why do you need loeb?

_ij0r · on July 24, 2015

I guess it finds fixed points over a functor without requiring you to have a reference to the functor (having about the same usefulness as `fix`).

masklinn · on July 24, 2015

It basically works the same way the "self-referential" fib does, doesn't it?

    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

art-w · on July 24, 2015

Not exactly: you don't get the O(1) access to the two previous computations with loeb. It abstracts the collection used for memoization, but it doesn't provides "locality".

Here's a presentation that dives into loeb and expands it to a comonadic fixpoint that lets you do the fibs example correctly: https://www.youtube.com/watch?v=F7F-BzOB670

todd8 · on July 24, 2015

Emacs comes with a couple of built in spreadsheets: ses-mode supports emacs lisp formula in cells, org-mode tables allow simple formulas in table cells. Furthermore, there are half a dozen or so other packages that users have written that are older or are more specialized in some way. See http://www.emacswiki.org/emacs/SpreadSheet.

All of these spreadsheets are, naturally, written in emacs lisp.

pjmlp · on July 24, 2015

Actually, the Borland's Turbo C 2.0 for MS-DOS had an example how to implement one.

Nice to see it in OCaml though.

Narishma · on July 24, 2015

I believe all of Borland's compilers all the way back to the original Turbo Pascal came with it.

pjmlp · on July 24, 2015

I don't remember seeing it in the Turbo Pascal manuals. Used 3, 5.5, 6 and TPW 1.5.

But it might have been there as well.

Narishma · on July 24, 2015

It's definitely in TP3 for PC. I still have the original floppy disk. I don't remember if it was in the earlier CP/M versions though.

zura · on July 24, 2015

I remember it in Borland C++ 3.1. If I remember correctly, it was one of the samples using BGI.

kawsper · on July 24, 2015

One of my good friends worked with some researcher to offload the work to a GPU ten years ago.

I have stumbled on massive spreadsheets in the wild where most of the cells needed to be recalculated when you changed something, and that could take a long time to recalculate.

It seems like most software out there today does this, which is pretty cool.

zitterbewegung · on July 24, 2015

I wish there was a interface for data analysis that people could cut and paste any data into it and it outputs a magical spreadsheet that business people can use.

huskyr · on July 24, 2015

Maybe not quite what you're looking for, but i was surprised at how well Microsoft Excels handles pasting HTML tables.

hassiktir · on Aug 1, 2015

python pandas in an ipython notebook?

zimpenfish · on July 24, 2015

I once had to implement a tiny spreadsheet with cell references and simple ranged formula in Perl given 2 hours. Possibly the most interesting programming test I've had for a job yet.

moron4hire · on July 24, 2015

Grok this and you will know everything you need to about spreadsheets: http://jsfiddle.net/ondras/hYfN3/

You'll probably also end up learning a thing or two about JavaScript. Even now, 2 years after I first saw this, I'm seeing new things in it, and I've been doing JS for 20 years.

art-w · on July 24, 2015

I hope you are joking. This isn't a spreadsheet, it's a stupid bruteforce algorithm that misses the point of the dataflow paradigm:

- Everything gets recomputed on any minor change.

- If a cell is referenced by N other cells, it will be recomputed N+1 times on every update. Worse, this can be exponential for A2=A1+A1, A3=A2+A2, which leads to A1 being recomputed 5 times. There's no sharing or memoization.

- Circular references are detected because they trigger a "too much recursion" exception.

- An impure cell "=Math.random()" will be observed with different values by its dependencies.

Trufa · on July 24, 2015

This thing is getting me confused, there seems to be some black magic going on, I can't understand how it can possibly work.

moron4hire · on July 24, 2015

Look at the calls to defineProperty, then focus on figuring out why the with() block has been used in the `getter` function.

thro1237 · on July 24, 2015

Is there a python/clojure/java version of this?

moron4hire · on July 24, 2015

Not that I know of. It exploits some of JavaScript's weird scoping rules to keep everything very short. You're not going to get that in most other languages.

klibertp · on July 24, 2015

You mean this bit:

    with (DATA) return eval(value.substring(1));

? It can be implemented in Python without any problems. Of the dynamic languages I know, at least PERL, Python, Ruby, Smalltalk, Lua, Io, Racket have support for something similar. And even if not directly supported, any other language with `eval` would probably be hackable enough to implement this. In TCL that's actually a common idiom.

Anyway, I'm not aware of any JS feature which wouldn't be easily replicated in most other dynamic languages. JS was rather unlucky when it comes to language features - its development stalled for years, while in the same time Python, Perl, Ruby, PHP evolved quickly. JS only now gets features implemented in other langs many years ago.

anton-107 · on July 24, 2015

Does anybody know any good open source spreadsheet implementation with JS+DOM?

ohlain · on July 24, 2015

We´re using http://handsontable.com/ in our project and are very happy with it so far.

mati · on July 24, 2015

There is for example http://handsontable.com/

Incidentally, I have been playing around with the concept myself (client work) and it turns out you can hack a semi-working solution in just a handful lines of js:

http://soquel.github.io/spreadsheet-in-js/

kephra · on July 24, 2015

I know a dirty Spreedsheet for browser. I first seen it in IRC, iirc. I was a greedy copycat to paste it into my site, for the case paste server goes down.

http://kephra.de/copycat/zarasheet/

I kept the references. Both the original fiddle and the authors website are linked from my page.

bluedino · on July 24, 2015

I remember good old CALC.PAS that Borland included as a demo program with early versions of Turbo Pascal

http://z80cpu.eu/files/archive/rlee/B/BORLAND/TURBO%20PASCAL...

arethuza · on July 24, 2015

I've been thinking recently it would be nice to have an easy to work with open standard for spreadsheets - maybe based on JSON.

NB I know you can get Google spreadsheet data as JSON, but I was more thinking of something where you could have "headless" operation that could be embedded in other applications.

oever · on July 24, 2015

You can use OpenDocument Format files. These are supported by most spreadheet implementations. It comes in two variants: 1) a zip file with XML inside 2) a single XML file

http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-...

asselinpaul · on July 24, 2015

This is a good JS component for embedding spreadsheets into apps: http://handsontable.com/

qznc · on July 24, 2015

Why the "reads" field? It is just filled and cleared, but never used. This also solves GC issues.

reycharles · on July 24, 2015

It seems "reads" is used in "invalidate" to remove the cell from "observers" in the cells in "reads".

qznc · on July 24, 2015

Yes. I missed that.

Nevertheless, removing the "reads" seems more worthwhile than removing "observers". You want to model the data flow efficiently, not the data dependencies. The dependencies are provided indirectly in the code anyways.

art-w · on July 24, 2015

Even if you manage to remove the explicit "reads" field, the "code" thunk is still going to point to the parent cells, so that wouldn't help the GC much.

wooby · on July 24, 2015

Spreadsheets rule. For ClojureScript there is the Javelin library: https://github.com/tailrecursion/javelin

It has a few novel features like transactional input.

We use Javelin for state management in the hoplon.io web framework.

kwhitefoot · on July 24, 2015

Why do all the simple spreadsheet implementations forbid circular references? A colleague of mine made a very nice solver for temperature and pressure calculations in a transformer which would have been much more complicated without circular references.

alexchamberlain · on July 24, 2015

Presumably a naive implementation would run the risk of having an infinite loop? Have you got any references of the theory proving them safe and/or removing the references?

kwhitefoot · on July 28, 2015

I don't know if an arbitrary set of cell formulas could be proven to converge but I do know from experience that it is a useful technique. The trick is in setting the initial conditions; usually there will have to be one or more cells that contain a check for a wildly out of range value that will force the cell value into a reasonable range. Excel, and other spreadsheet programs, have counters that are used to halt circular calculations after an arbitrary limit. This is really no different than using a while loop to compute a value iteratively.

lindig · on July 24, 2015

I feel there is an interesting bit (or example) missing: how does the (OCaml) code look that implements the expressions that are evaluated? The code returns a value and the list of cells it read - how is obtaining this list best organised?

_pmf_ · on July 24, 2015

Every time I read about reactive patterns, I hope that some data dependency or cells like behavior is included, and every time it's just some thin wrapper around basic pub/sub.

tonyg · on July 24, 2015

Can you expand on what you mean by "data dependency" and "cells-like behaviour" in this context? I'm afraid I'm drawing a blank on the terminology.

_pmf_ · on July 24, 2015

By cells, I mean something like this: https://common-lisp.net/project/cells/ (there's even a HN discussion about it here: https://news.ycombinator.com/item?id=1885974)

which basically allows to express interdependencies between data (a.k.a. formulas) in a way that updates sink variables when source variables are changed.

At the moment, I think not even Rx extensions for .NET (which I consider to be the most advanced reactive implementation in a mainstream language) do support this style of computation.

nickstefan12 · on July 24, 2015

I implemented a spreadsheet in react, flux, immutable, and used a command pattern here: https://github.com/NickStefan/rixif

I did not take the approach of cells actually observing each other. Instead I had a recursive function that worked from the entered cell to:

-- parsed a string DSL of "=sum(B4:B8)" into a javascript function, that pulls some premade functions like sum, vlookup etc, and makes a string list representing what an arguments getter will need to fetch ("a_r3c1_r7c1" ie array, row 3, col 1 to row 7 col 1). The user still just sees "=sum(B4:B8)"

-- note who "I depend on",

-- note who "depends on me"

-- recurse 'outwards', skipping cells in "depends on me" that are still waiting on 'needs recalc' of their own "I depend on".

The update algorithm is not the hard part. Microsoft also has a very detailed documentation of their own update algorithm online (cant find it at the moment though).

The hardest part was updating the string representations of the formulas when you insert a new column or row, and then re-updating each cell's dependencies arrays.

Definitely not a performance problem, but more of a "how the fuck do I wrap my head around all the different ways someone could want to insert, cut, copy and paster here".

One mistake I made was trying to impliment the undo/redo to be totally reversable at every step. So every command stores the way to go both back and forward. In hindsight, I should have just stored forward commands and rebaked from the beginning when someone wanted to go back in time.

I wish I had more time to work on the project, but I've been busy with other work.

notes: - an excel to JSON parsing library in node :) https://github.com/NickStefan/parsexcel.js

- A lot of people mentioning Handsontable.js. That library has some major design flaws. We used it in our git style version controlled spreadsheet app (http://www.gridhub.xyx). Handsontable only takes simple 'number' or 'string' value for each cell. It should have been an object that could store CSS for that cell, formula for that cell, and the value for that cell. I made a few pull requests, but they largely ignored me. The code is odd (they use labels, continue, and other C style code).Handsontable is great only for presenting tables. Its terrible for implementing a full excel clone.

artenix · on July 25, 2015

Twitter detects the URL as dangerous. O_o

pantulis · on July 24, 2015

Well I think the title should be "How to implement a spreadsheet in OCaml" It's impossible for me to follow the article :(

dspillett · on July 24, 2015

Any programmer with much experience should be able to follow the structure of that article and translate the concepts to the programming language of their preference.

There is nothing language specific here, just think of the OCaml as pseudo-code.

reitanqild · on July 24, 2015

Totally of topic to spreadsheets but why oh why does blogspot still change the domain based on ip? (Yes, I know that it is related to selectively blocking certain blogs in certain areas but there has to be a better way to do this, seriously.)

frik · on July 24, 2015

BlogSpot is owned by Google, they should know better.

reycharles · on July 29, 2015

google.com does the same.