I've just been exploring serving large SQLite databases in chunks and querying them with HTTP range requests so you don't have to download the entire database. It's pretty awesome!
I found a really interesting library called sql.js-httpvfs[0] that does pretty much all the work. I chunked up my 350 MB SQLite DB into 43 x 8 MB pieces with the included script and uploaded them with my static files to GitHub, which gets deployed via GitHub Pages.[1]
It's in the very rough early stages but you can check it out here.
I recommend going into the console and network tab to see it in action. It's impressively quick and I haven't even fine-tuned it at all yet. SQLite rules.
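Roughly, the wiring looks like this (going from memory of the sql.js-httpvfs README, so the exact option names may be slightly off; the config path and table name are placeholders):

import { createDbWorker } from "sql.js-httpvfs";

// Worker and wasm files shipped with the library.
const workerUrl = new URL("sql.js-httpvfs/dist/sqlite.worker.js", import.meta.url);
const wasmUrl = new URL("sql.js-httpvfs/dist/sql-wasm.wasm", import.meta.url);

// The splitting script emits a JSON config describing the chunks; point the
// worker at it and it lazily fetches only the pieces a query actually touches.
const worker = await createDbWorker(
  [{ from: "jsonconfig", configUrl: "/data/config.json" }],
  workerUrl.toString(),
  wasmUrl.toString()
);

// Only the pages needed to answer the query get downloaded.
const rows = await worker.db.query(`SELECT * FROM my_table LIMIT 10`);
console.log(rows);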
> Traditionally, coding involves three distinct “time buckets”:
> Why am I doing this? Understanding the business problem and value
> What do I need to do? Designing the solution conceptually
> How am I going to do it? Actually writing the code
> For decades, that last bucket consumed enormous amounts of our time. We’d spend hours, days or weeks writing, debugging, and refining. With Claude, that time cost has plummeted to nearly zero.
That last part is actually the easiest, and if you're spending an inordinate amount of time there, that usually means the first two were not done well or you're not familiar with the tooling (language, library, IDE, test runner, ...).
There's some drudgery involved in manual code editing (renaming variables, extracting functions, ...), but those tasks are already solved in many languages by IDEs and indexers that automate them. And many editors have programmable snippet support. I can genuinely say that in all of my programming projects, I spent more time understanding the problem than writing code. I even spent more time reading library code than writing my own.
The few roadblocks I've hit when writing code were solved by configuring my editor.
I have tried a lot of local models. I have 656GB of them on my computer so I have experience with a diverse array of LLMs. Gemma has been nothing to write home about and has been disappointing every single time I have used it.
Models that are worth writing home about are:
EXAONE-3.5-7.8B-Instruct - It was excellent at taking podcast transcriptions and generating show notes and summaries.
Rocinante-12B-v2i - Fun for stories and D&D
Qwen2.5-Coder-14B-Instruct - Good for simple coding tasks
OpenThinker-7B - Good and fast reasoning
The DeepSeek distills - Able to handle more complex tasks while still being fast
DeepHermes-3-Llama-3-8B - A really good vLLM
Medical-Llama3-v2 - Very interesting but be careful
I found the combination of real-world problems, general SQL advice, and the broad range of topics to be a really good book. It took my SQL from “the database is not much more than a place to persist application data” to “the application is not much more than a way to match commands to the database”. It’s amazing how much bespoke code is doing a job the database can do for you in a couple of lines.
I built a pipeline to automatically cluster and visualize large amounts of text documents in a completely unsupervised manner:
- Embed all the text documents.
- Project to 2D using UMAP which also creates its own emergent "clusters".
- Use k-means clustering with a high cluster count depending on dataset size.
- Feed the ChatGPT API ~10 examples from each cluster and ask it to provide a concise label for the cluster.
- Bonus: Use DBSCAN to identify arbitrary subclusters within each cluster.
It is extremely effective, and I have a theoretical implementation of a more practical use case that uses the same UMAP dimensionality reduction for better inference. There is evidence that current popular text embedding models (e.g. OpenAI ada, which outputs 1536D embeddings) are way too big for most use cases and could be giving poorly specified results for embedding similarity as a result, in addition to higher costs for the entire pipeline.
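To give a flavor of the labeling step (the ChatGPT bullet above), the call is roughly this kind of thing; the model name, prompt wording, and the samples you pass in are all placeholders:

import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Given ~10 representative documents from one cluster, ask for a short label.
async function labelCluster(samples: string[]): Promise<string> {
  const prompt =
    "Here are example documents from one cluster:\n\n" +
    samples.map((s, i) => `${i + 1}. ${s}`).join("\n") +
    "\n\nReply with a concise (2-5 word) label for what these documents have in common.";

  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo", // placeholder; any chat model works
    messages: [{ role: "user", content: prompt }],
  });

  return completion.choices[0].message.content?.trim() ?? "unlabeled";
}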
I've started to bring up Admiral Rickover's speech "Doing a Job" in all of the Boeing threads because he is just so relevant. Admiral Rickover was the man responsible for America having nuclear submarines. https://govleaders.org/rickover.htm
The speech is well worth a read in its entirety and it feels prescient in regards to Boeing. I think this paragraph more than any other hits at the core problem at Boeing:
> Unless the individual truly responsible can be identified when something goes wrong, no one has really been responsible. With the advent of modern management theories it is becoming common for organizations to deal with problems in a collective manner, by dividing programs into subprograms, with no one left responsible for the entire effort. There is also the tendency to establish more and more levels of management, on the theory that this gives better control. These are but different forms of shared responsibility, which easily lead to no one being responsible—a problem that often inheres in large corporations as well as in the Defense Department.
To contrast, here is a statement from Calhoun: “We caused the problem. And we understand that. Over these last few weeks, I've had tough conversations with our customers, with our regulators, congressional leaders, and more. We understand why they are angry, and we will work to earn their confidence,” Calhoun said.
That "we" is him failing to take personal responsibility and choosing instead to spread responsibility to all employees, making no one responsible for the state of Boeing.
Boeing needs a leader who will take personal responsibility.
I believe the prevailing wisdom is that tech hiring slowed because interest rates rose.
Because software scales so well, it benefits from speculative effort more than other business types. We see this in venture capital, where they only need 1 out of 100 bets to hit in order to make their money. Large tech companies do something similar internally. They may fund the development of 100 products or features, knowing they only need one of them to hit big in order to fund the company going forward.
When money was essentially free to borrow, it made all the sense in the world to make a large number of bets because the odds were on your side that at least one of them would pay off. Now, however, each bet comes with a real opportunity cost, so companies are making fewer speculative bets and thus need fewer people.
---
The other thing he doesn't talk about is the rise of remote work and the downward pressure that it puts on wages. I know that many companies are forcing employees to return to the office, but I'd speculate that the number of remote workers has still risen significantly. And that opens up the labor market considerably.
I'll tell you that I'm getting overseas talent for roles where 10 years ago I would have hired entry level talent in the US. But since my company is fully remote and distributed, the downside to hiring in LatAm and Eastern Europe has been significantly reduced.
>The Valid method takes a context (which is optional but has been useful for me in the past) and returns a map. If there is a problem with a field, its name is used as the key, and a human-readable explanation of the issue is set as the value.
I used to do this, but ever since reading Lexi Lambda's "Parse, Don't Validate," [0] I've found validators to be much more error-prone than leveraging Go's built-in type checker.
For example, imagine you wanted to defend against the user picking an illegal username. Like you want to make sure the user can't ever specify a username with angle brackets in it.
With the Validator approach, you have to remember to call the validator on 100% of code paths where the username value comes from an untrusted source.
Instead of using a validator, you can do this:
import (
    "errors"
    "strings"
)

type Username struct {
    value string
}

func NewUsername(username string) (Username, error) {
    // Validate the username adheres to our schema, e.g. reject angle brackets.
    if strings.ContainsAny(username, "<>") {
        return Username{}, errors.New("username contains illegal characters")
    }
    return Username{username}, nil
}
That guarantees that you can never forget to validate the username through any codepath. If you have a Username object, you know that it was validated because there was no other way to create the object.
People will never be motivated to go the extra mile by a standardized, bureaucratized process. It's not a problem specifically with OKRs, it's a problem with the whole concept that if HR can just put in this one simple system then doing so will be magically motivational and the whole company will go to ludicrous speed.
There is no replacement for good people. Not in leadership positions, and not in IC positions. Recruit for strengths, hire for culture, train for gaps. No process, least of all OKRs, can make up for recruiting weak people, people who don't fit your culture, or people not interested in personal growth (i.e. filling gaps).
There's some fun music theory lurking inside this project. It turns out that every transition from one chord to another via the algorithm described is either from one chord to itself (e.g., C major to an inversion of itself or adding/removing a 7th), or one of the three basic Neo-Riemannian transformations: P, L, or R.
Go check out the Wikipedia page on Neo-Riemannian theory for more details, but here are a few key facts about P, L, and R: For any chord x, if x is major, then P(x), L(x), and R(x) are all minor; if x is minor, then P(x), L(x), and R(x) are all major. P, L, and R are all inverses of themselves, so that P(P(x)) = x, and so on for L and R. It's possible to reach any major or minor chord from any other major or minor chord by some sequence of P, L, and R transformations. For example, from C major, applying L then R gets you to G major; applying R then P gets you to A major; and applying L then P gets you to E major.
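If you want to play with this, the three transforms are easy to sketch on (root pitch class, quality) pairs. This is just an illustration, not the project's actual code:

type Quality = "major" | "minor";
type Triad = { root: number; quality: Quality }; // root = pitch class 0-11, 0 = C

const mod12 = (n: number) => ((n % 12) + 12) % 12;

// P (parallel): C major <-> C minor
const P = ({ root, quality }: Triad): Triad =>
  ({ root, quality: quality === "major" ? "minor" : "major" });

// R (relative): C major <-> A minor
const R = ({ root, quality }: Triad): Triad =>
  quality === "major"
    ? { root: mod12(root + 9), quality: "minor" }
    : { root: mod12(root + 3), quality: "major" };

// L (leading-tone exchange): C major <-> E minor
const L = ({ root, quality }: Triad): Triad =>
  quality === "major"
    ? { root: mod12(root + 4), quality: "minor" }
    : { root: mod12(root + 8), quality: "major" };

// L then R from C major lands on G major, matching the example above.
console.log(R(L({ root: 0, quality: "major" }))); // { root: 7, quality: "major" }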
Hopping around via Neo-Riemannian transformations is a quick way to use smooth voice leading to get to a "remote" key center (i.e., one that doesn't have many scale tones in common with the key you started in), but I was surprised when listening to the piece how (relatively) stable the harmony seemed. What's interesting here is that because of the way the algorithm is constructed, P transforms are much less common than L or R transforms (or just staying with the same chord) -- and crucially, P transforms are a vital ingredient in quickly moving to remote keys. By my rough calculations (which assume the Markov process has reached steady state and ignore the limits on min/max pitch), only 1/27th of all chord changes are P transforms. It also turns out that in steady state, 7th chords are more common than simple triads by a ratio of 16:11.
First step, I needed to build a MIDI player in JavaScript. At first, I was determined to write one from scratch in JavaScript and use the Web Audio API to synthesize all the instruments in code. I thought this would yield the smallest possible JavaScript file size.
However, I didn’t really have the audio engineering skills to pull this off. So I ended up settling for an approach that uses SoundFonts, which are basically files of instrument voices containing all the possible notes an instrument can play.
BitMidi uses the instrument voices from the General MIDI sound set released by FreePats.
Then I compiled the best MIDI player written in C (libtimidity) to WebAssembly using Emscripten. I put in lots of effort to optimize the build size and include the minimal amount of code. The result of my efforts is available in the npm package timidity (https://github.com/feross/timidity). It’s quite lightweight – just 34 KB of JavaScript and 23 KB of lazy-loaded WebAssembly, smaller than anything I’ve seen on any other site.
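Basic usage looks roughly like this (from memory of the README, so double-check the event names against the docs; the MIDI path is just a placeholder):

import Timidity from 'timidity'

const player = new Timidity()

// Load a MIDI file by URL and start playback once it's ready.
player.load('/example.mid')
player.play()

player.on('playing', () => {
  console.log('duration in seconds:', player.duration)
})

player.on('ended', () => {
  console.log('done')
})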
Then I put a frontend on it, so it’s easy to browse all the files. BitMidi uses all the best techniques that I know about to make it super fast and snappy. The site gets perfect 100s on all categories on Chrome’s Lighthouse Performance benchmark, which is extremely non-trivial in my experience.
I plan to ingest a lot more MIDI files in the future, from sources like the Geocities MIDI archive on the Internet Archive and elsewhere.
- What does our app do?
- Microservices usually don't work well for startups
- Move fast and outsource things
- Consider building reusable things
- Be pragmatic
- Boundaries along sync/async communication patterns
- How we did it and how we would do it next time
- About flexibility
- Predictability
- Rule #1: Every endpoint should tell a story
- Rule #2: Keep business logic in services
- Rule #3: Make services the locus of reusability
- Rule #4: Always sanitize user input, sometimes save raw input, always escape output
- Rule #5: Don't split files by default & never split your URLs file
- Readability
- Rule #6: Each variable's type or kind should be obvious from its name
- Rule #7: Assign unique names to files, classes, and functions
- Rule #8: Avoid *args and **kwargs in user code
- Rule #9: Use functions, not classes
- Rule #10: There are exactly 4 types of errors
- Simplicity
- Rule #11: URL parameters are a scam
- Rule #12: Write tests. Not too many. Mostly integration.
- Rule #13: Treat unit tests as a specialist tool
- Rule #14: Use serializers responsibly, or not at all
- Rule #15: Write admin functionality as API endpoints
- Upgradability
- Rule #16: Your app lives until your dependencies die
- Rule #17: Keep logic out of the front end
- Rule #18: Don't break core dependencies
- Why make coding easier?
- Velocity
- Optionality
- Security
- Diversity
An early explainer of transformers that I found very useful when they were still new to me, and which is a quicker read, is The Illustrated Transformer[1], by Jay Alammar.
A more recent academic but high-level explanation of transformers, very good for detail on the different architectural flavors (e.g. encoder-decoder vs decoder-only), is Formal Algorithms for Transformers[2], from DeepMind.
1. Web3 hired a lot of these people and so they had less time to work on this stuff. Shame to spend that much on a dead end but eh
2. Scala died with Big Data. It is still around and all, but no one cares anymore, which emptied the room. It also happened that the whole implicits experiment for polymorphism, which Scala was really supposed to explore, did not pan out that well.
3. Effects progressed but... mostly out of view. OCaml shipped them with its multicore release, we are seeing good work on the academic side, you see Verse wanting them, etc. Same thing with linear types.
4. Dependent types... never really crossed over into production. And Idris and co are mostly "complete", so work there slowed down.
5. Oh and monad interest, mostly fueled by Scala, died slowly. Effect handlers seem to be a nicer solution in practice to most of this stuff.
6. TypeScript killed a lot of the need for advanced stuff, same with Python and Ruby shipping their own typing stories too. Meanwhile Rust and Elixir showed you did not need the really out-there stuff to get results in prod.
In the end, what happened is that a lot of the highly abstract stuff was driven by "hype domains" that died, while more pragmatic but limited implementations burgeoned and absorbed some of the ideas. The rubber met the road, and that dampened a lot of people's enthusiasm.
There is still work being done, but right now it is more at the "experimental language" stage. Think Rust in the mid-00s.
Oh and Rust mindshare is still growing. A lot. A looooot.
There’s an incredibly straightforward and readable paper by Simon Peyton Jones (one of the creators of Haskell and GHC), "Tackling the Awkward Squad", which explains how Haskell deals with IO, exceptions, and concurrency. It also explains why they settled on this design rather than some other one. In my opinion, it is the best explanation of the IO monad (specifically IO) out there. Even just reading the first 10 or so pages is completely worthwhile.
I love this thread, it has two of my favorite HN topics:
1) People shitting on JavaScript, not realizing that their "obviously better" solution was considered and rejected for good reasons.
2) People shitting on TypeScript not realizing that conditional types and template literal types are awesome. I really like those type-safe routers (https://tanstack.com/router/v1/docs/guide/type-safety) and fully-typed database clients (https://www.edgedb.com/docs/clients/js/index#the-query-build...).
I've used Tailwind extensively at previous companies and inevitably each one creates an abstraction that's akin to:
const headerClasses = [/* list of Tailwind classes here */].join(' ');
<header className={headerClasses}>...</header>
because the complexity of reading and writing all of the classes is just too much. At that point, you've just reinvented CSS classes. Tailwind fans will tell you to not do this but if multiple companies are independently having the same problem and coming up with the same solution, the onus is not on the user anymore, it's on the creator to fix it. @apply can work but again it's really not recommended by Tailwind itself, for whatever reason.
These days I recommend learning CSS really well and then using Vanilla Extract (https://vanilla-extract.style), a CSS in TypeScript library that compiles down to raw CSS, basically using TS as your preprocessor instead of SCSS. For dynamic styles, they have an optional small runtime.
They have a Stitches-like API called Recipes that's phenomenal as well, especially for design systems: you can define your variants and the CSS that needs to be applied for each one, so you can map your design components 1-to-1 with your code:
import { recipe } from '@vanilla-extract/recipes';
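// (Rough sketch from memory of the @vanilla-extract/recipes docs, in a
// .css.ts file; the exact option names may differ slightly.)
export const button = recipe({
  base: { borderRadius: '6px' },
  variants: {
    color: {
      neutral: { background: 'whitesmoke' },
      brand: { background: 'blueviolet' },
    },
    size: {
      small: { padding: '12px' },
      large: { padding: '24px' },
    },
  },
  defaultVariants: { color: 'neutral', size: 'small' },
});

// Usage in a component: button({ color: 'brand', size: 'large' })
// returns the generated class names for that variant combination.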
There is a Japanese carpenter with 50 years of experience on YouTube who documents his construction of entire homes, which I would recommend to anyone who is interested in modern Japanese wood construction:
The article is grouping things together that don't belong in the same categories.
OO, functional, imperative, declarative: these are ways of controlling dispatch.
Monoliths and microservices are both ways to organize codebases and teams of programmers and control whether dispatch is intermediated by the network or not. Either way, both of these options are implemented by some kind of language in the previous category (OO, functional, imperative, or declarative).
Service-oriented architecture applies to both monoliths and microservices, and very few programmers still working in the industry have really seen what an alternative to service-oriented architecture actually looks like.