Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How to implement a spreadsheet (semantic-domain.blogspot.com)
173 points by kristianp on July 24, 2015 | hide | past | favorite | 68 comments


If you're interested in spreadsheets, don't miss sc, the Standard Unix Spreadsheet™ which nobody has ever heard of (the un-Googleable name doesn't help). It was originally written by James Gosling in the 1980s and is in Debian. It runs in a terminal and uses vi keybindings:

http://www.linuxjournal.com/article/10699


There is a continuation to sc, called scim:

https://github.com/andmarti1424/scim


Is there a more recent version than 7.16 from 2002? http://ibiblio.org/pub/Linux/apps/financial/spreadsheet/


The package version in Arch is 7.16, so I assume it's the latest.


Indeed. In fact I liked it so much that I wrote a program very similar that can read/write such files on Windows[1]. I couldn't find how to compile the original on Windows.

The file format is pretty simple. At least for the spreadsheets I create.

As a side note, I love text-based (or extremely simple) file formats. So easy to write tools for them.

[1] http://github.com/howeyc/sc


Yes, what a very nice format:

    # This data file was generated by the Spreadsheet Calculator.
    # You almost certainly shouldn't edit it.
    let A0 = 45
    let A1 = 56
    let A2 = @sum(A0:A1)


pffft, xml is obviously a better choice for a serious spreadsheet engine... \s


That's the one! I've used it before and was struggling to remember the name the other day.


FWIW on Debian-like systems there is "apropos" that searches for binaries based on keywords in their descriptions. My problem is when I need it I can never remember the name, just spent 10 minutes search on variations of "advogato".

https://en.wikipedia.org/wiki/Apropos_%28Unix%29 : "Often a wrapper for the "man -k" command, the apropos command is used to search all manual pages for the string specified. This is often useful if one knows the action that is desired, but does not remember the exact command or page name."

TBH learning now that "man -k" exists (I don't think I can have "man man"-ed in the last decade) is probably a lot more useful for me.


> FWIW on Debian-like systems there is "apropos"

Ouch, too much Debian credit, too much Linux credit, even. That's by Bill Joy circa 1977, so it's a BSD Unix thing, widely adopted everywhere else.

Man -k was AT&T's reaction to apropos, IIRC. There is a certain logic to using "man" to search man pages.


Except at least in Ubuntu they basically dont work eg man -k MANOPT should find the man(1) man page and doesnt on my system.


There's nothing broken about that. If you want to do full text search, do `man -K` instead of `man -k`. E.g.

    man -K MANOPT
Works for me on Ubuntu 14.04, though slowly so and I still have to ask the pager to search for it within the manpage which is opened. Because of this, I read the manpages when I know where to find what I'm looking for but otherwise just Google.

A far more severe problem IMO is the fact that many Linux distros ship without manpages in their default installs.


Back in 2000, a winner of the International Obfuscated C Code Contest (IOCCC), had an implementation of an X-based graphical spreadsheet in 2kB of very unreadable C: http://www.ioccc.org/2000/jarijyrki.c . Details here: http://www.ioccc.org/2000/jarijyrki.hint


This is amazing. It compiles and runs under cygwin fine too.


How'd you get it to compile?

    gcc -I/usr/include/X11 jarijyrki.c  -o jarijyrki
    jarijyrki.c:15:33: error: ‘U’ undeclared here (not in a function)
    int   q,P,W,Z,X,Y,r,u; char   E[U][U][T+1] ,D[T];   Window J; GC k;  XEvent  w;
                                     ^


I downloaded the Makefile from the same site [1]. I changed CC to gcc (not sure if necessary) then just ran "make jarijyrki". The relevant build command makes it clear why your compile command fails.

   	${CC} ${X11CCFLAGS} ${CFLAGS} -DNeedFunctionPrototypes \
   	    -DU=40 -DT=98 '-Dz=(T+1)*U*U' -DQ=80 -DS=20 -DN=10 -DB=5 -DG=23 \
   	    -Dp=7 '-DM=((p+1)*Q)+S' '-DH=(G*S)+S+S' -DC=XK_Up -DL=XK_Down \
   	    -DO=XK_Left -DV=XK_Right -DR=XK_Escape -D_=XK_BackSpace \
   	    $? -o $@ ${X11LDFLAGS} -lX11
I guess there was a limitation on the length on the compilation command too and the winner made use of it fully. You have to run it as "./jarijyrki < sheet1.info > myedits.info", sheet1.info is in the same site[2].

[1] http://www.ioccc.org/2000/Makefile [2] http://www.ioccc.org/2000/sheet1.info


Oh geeze, I didn't even see there was a Makefile there .. sorry about that.


Tangentially, the Löb function over (Haskell) functors implements spreadsheet-like computations in an unexpected way: https://github.com/quchen/articles/blob/master/loeb-moeb.md

ex.

    loeb [const 1,
          const 2,
          \xs -> x !! 0 + x !! 1]
gives you [1, 2, 3].

Naturally, it doesn't allow for mutations or anything fancy, but it is an interesting curiosity.


Here's another reference:

http://blog.sigfpe.com/2006/11/from-l-theorem-to-spreadsheet...

(I completely missed your GitHub link the first time I read this comment)


Isn't that the same as using vanilla laziness:

    > let xs = [1, 2, (xs !! 0) + (xs !! 1)] in xs
    [1, 2, 3]
It's order independent as well:

    > let xs = [(xs !! 1) - 1, (xs !! 2) - 1, 3] in xs
    [1, 2, 3]
Why do you need loeb?


I guess it finds fixed points over a functor without requiring you to have a reference to the functor (having about the same usefulness as `fix`).


It basically works the same way the "self-referential" fib does, doesn't it?

    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)


Not exactly: you don't get the O(1) access to the two previous computations with loeb. It abstracts the collection used for memoization, but it doesn't provides "locality".

Here's a presentation that dives into loeb and expands it to a comonadic fixpoint that lets you do the fibs example correctly: https://www.youtube.com/watch?v=F7F-BzOB670


Emacs comes with a couple of built in spreadsheets: ses-mode supports emacs lisp formula in cells, org-mode tables allow simple formulas in table cells. Furthermore, there are half a dozen or so other packages that users have written that are older or are more specialized in some way. See http://www.emacswiki.org/emacs/SpreadSheet.

All of these spreadsheets are, naturally, written in emacs lisp.


Actually, the Borland's Turbo C 2.0 for MS-DOS had an example how to implement one.

Nice to see it in OCaml though.


I believe all of Borland's compilers all the way back to the original Turbo Pascal came with it.


I don't remember seeing it in the Turbo Pascal manuals. Used 3, 5.5, 6 and TPW 1.5.

But it might have been there as well.


It's definitely in TP3 for PC. I still have the original floppy disk. I don't remember if it was in the earlier CP/M versions though.


I remember it in Borland C++ 3.1. If I remember correctly, it was one of the samples using BGI.


One of my good friends worked with some researcher to offload the work to a GPU ten years ago.

I have stumbled on massive spreadsheets in the wild where most of the cells needed to be recalculated when you changed something, and that could take a long time to recalculate.

It seems like most software out there today does this, which is pretty cool.


I wish there was a interface for data analysis that people could cut and paste any data into it and it outputs a magical spreadsheet that business people can use.


Maybe not quite what you're looking for, but i was surprised at how well Microsoft Excels handles pasting HTML tables.


python pandas in an ipython notebook?


I once had to implement a tiny spreadsheet with cell references and simple ranged formula in Perl given 2 hours. Possibly the most interesting programming test I've had for a job yet.


Grok this and you will know everything you need to about spreadsheets: http://jsfiddle.net/ondras/hYfN3/

You'll probably also end up learning a thing or two about JavaScript. Even now, 2 years after I first saw this, I'm seeing new things in it, and I've been doing JS for 20 years.


I hope you are joking. This isn't a spreadsheet, it's a stupid bruteforce algorithm that misses the point of the dataflow paradigm:

- Everything gets recomputed on any minor change.

- If a cell is referenced by N other cells, it will be recomputed N+1 times on every update. Worse, this can be exponential for A2=A1+A1, A3=A2+A2, which leads to A1 being recomputed 5 times. There's no sharing or memoization.

- Circular references are detected because they trigger a "too much recursion" exception.

- An impure cell "=Math.random()" will be observed with different values by its dependencies.


This thing is getting me confused, there seems to be some black magic going on, I can't understand how it can possibly work.


Look at the calls to defineProperty, then focus on figuring out why the with() block has been used in the `getter` function.


Is there a python/clojure/java version of this?


Not that I know of. It exploits some of JavaScript's weird scoping rules to keep everything very short. You're not going to get that in most other languages.


You mean this bit:

    with (DATA) return eval(value.substring(1));
? It can be implemented in Python without any problems. Of the dynamic languages I know, at least PERL, Python, Ruby, Smalltalk, Lua, Io, Racket have support for something similar. And even if not directly supported, any other language with `eval` would probably be hackable enough to implement this. In TCL that's actually a common idiom.

Anyway, I'm not aware of any JS feature which wouldn't be easily replicated in most other dynamic languages. JS was rather unlucky when it comes to language features - its development stalled for years, while in the same time Python, Perl, Ruby, PHP evolved quickly. JS only now gets features implemented in other langs many years ago.


Does anybody know any good open source spreadsheet implementation with JS+DOM?


We´re using http://handsontable.com/ in our project and are very happy with it so far.


There is for example http://handsontable.com/

Incidentally, I have been playing around with the concept myself (client work) and it turns out you can hack a semi-working solution in just a handful lines of js:

http://soquel.github.io/spreadsheet-in-js/


I know a dirty Spreedsheet for browser. I first seen it in IRC, iirc. I was a greedy copycat to paste it into my site, for the case paste server goes down.

http://kephra.de/copycat/zarasheet/

I kept the references. Both the original fiddle and the authors website are linked from my page.


I remember good old CALC.PAS that Borland included as a demo program with early versions of Turbo Pascal

http://z80cpu.eu/files/archive/rlee/B/BORLAND/TURBO%20PASCAL...


I've been thinking recently it would be nice to have an easy to work with open standard for spreadsheets - maybe based on JSON.

NB I know you can get Google spreadsheet data as JSON, but I was more thinking of something where you could have "headless" operation that could be embedded in other applications.


You can use OpenDocument Format files. These are supported by most spreadheet implementations. It comes in two variants: 1) a zip file with XML inside 2) a single XML file

http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-...


This is a good JS component for embedding spreadsheets into apps: http://handsontable.com/


Why the "reads" field? It is just filled and cleared, but never used. This also solves GC issues.


It seems "reads" is used in "invalidate" to remove the cell from "observers" in the cells in "reads".


Yes. I missed that.

Nevertheless, removing the "reads" seems more worthwhile than removing "observers". You want to model the data flow efficiently, not the data dependencies. The dependencies are provided indirectly in the code anyways.


Even if you manage to remove the explicit "reads" field, the "code" thunk is still going to point to the parent cells, so that wouldn't help the GC much.


Spreadsheets rule. For ClojureScript there is the Javelin library: https://github.com/tailrecursion/javelin

It has a few novel features like transactional input.

We use Javelin for state management in the hoplon.io web framework.


Why do all the simple spreadsheet implementations forbid circular references? A colleague of mine made a very nice solver for temperature and pressure calculations in a transformer which would have been much more complicated without circular references.


Presumably a naive implementation would run the risk of having an infinite loop? Have you got any references of the theory proving them safe and/or removing the references?


I don't know if an arbitrary set of cell formulas could be proven to converge but I do know from experience that it is a useful technique. The trick is in setting the initial conditions; usually there will have to be one or more cells that contain a check for a wildly out of range value that will force the cell value into a reasonable range. Excel, and other spreadsheet programs, have counters that are used to halt circular calculations after an arbitrary limit. This is really no different than using a while loop to compute a value iteratively.


I feel there is an interesting bit (or example) missing: how does the (OCaml) code look that implements the expressions that are evaluated? The code returns a value and the list of cells it read - how is obtaining this list best organised?


Every time I read about reactive patterns, I hope that some data dependency or cells like behavior is included, and every time it's just some thin wrapper around basic pub/sub.


Can you expand on what you mean by "data dependency" and "cells-like behaviour" in this context? I'm afraid I'm drawing a blank on the terminology.


By cells, I mean something like this: https://common-lisp.net/project/cells/ (there's even a HN discussion about it here: https://news.ycombinator.com/item?id=1885974)

which basically allows to express interdependencies between data (a.k.a. formulas) in a way that updates sink variables when source variables are changed.

At the moment, I think not even Rx extensions for .NET (which I consider to be the most advanced reactive implementation in a mainstream language) do support this style of computation.


I implemented a spreadsheet in react, flux, immutable, and used a command pattern here: https://github.com/NickStefan/rixif

I did not take the approach of cells actually observing each other. Instead I had a recursive function that worked from the entered cell to:

-- parsed a string DSL of "=sum(B4:B8)" into a javascript function, that pulls some premade functions like sum, vlookup etc, and makes a string list representing what an arguments getter will need to fetch ("a_r3c1_r7c1" ie array, row 3, col 1 to row 7 col 1). The user still just sees "=sum(B4:B8)"

-- note who "I depend on",

-- note who "depends on me"

-- recurse 'outwards', skipping cells in "depends on me" that are still waiting on 'needs recalc' of their own "I depend on".

The update algorithm is not the hard part. Microsoft also has a very detailed documentation of their own update algorithm online (cant find it at the moment though).

The hardest part was updating the string representations of the formulas when you insert a new column or row, and then re-updating each cell's dependencies arrays.

Definitely not a performance problem, but more of a "how the fuck do I wrap my head around all the different ways someone could want to insert, cut, copy and paster here".

One mistake I made was trying to impliment the undo/redo to be totally reversable at every step. So every command stores the way to go both back and forward. In hindsight, I should have just stored forward commands and rebaked from the beginning when someone wanted to go back in time.

I wish I had more time to work on the project, but I've been busy with other work.

notes: - an excel to JSON parsing library in node :) https://github.com/NickStefan/parsexcel.js

- A lot of people mentioning Handsontable.js. That library has some major design flaws. We used it in our git style version controlled spreadsheet app (http://www.gridhub.xyx). Handsontable only takes simple 'number' or 'string' value for each cell. It should have been an object that could store CSS for that cell, formula for that cell, and the value for that cell. I made a few pull requests, but they largely ignored me. The code is odd (they use labels, continue, and other C style code).Handsontable is great only for presenting tables. Its terrible for implementing a full excel clone.


Twitter detects the URL as dangerous. O_o


Well I think the title should be "How to implement a spreadsheet in OCaml" It's impossible for me to follow the article :(


Any programmer with much experience should be able to follow the structure of that article and translate the concepts to the programming language of their preference.

There is nothing language specific here, just think of the OCaml as pseudo-code.


Totally of topic to spreadsheets but why oh why does blogspot still change the domain based on ip? (Yes, I know that it is related to selectively blocking certain blogs in certain areas but there has to be a better way to do this, seriously.)


BlogSpot is owned by Google, they should know better.


google.com does the same.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: