Hacker News

A Mercurial-compatible SCM (not sure if it is a fork) built for their workflow (monorepo) and scale (enormous; git is not usable at their scale, at least for a monorepo). Uses Python and Rust. Designed for efficient centralization rather than decentralization.


So if you are at Facebook's or Google's scale, and also run a monorepo, this will be great for you.

Which is to say, this is a product for one company - Facebook.


To their credit, I don't think its use case is limited to monorepos. I've personally had a multi-GB `.git` directory due to storing many data files in a repo. In retrospect, data __shouldn't__ be version-controlled, but it's sometimes the simplest solution, e.g. having a unit-test suite ingest a bunch of CSV data. Eden's "a file is checked out only if opened" and "scan only modified directories" would've let me avoid decoupling data from code.


Data should be versioned if it’s one of your inputs, even if you can’t merge it. It’s just that the git tooling for it (including git-lfs) is horrible.

Perforce (and apparently Eden) make it usable.


data shouldn't be version controlled? Are you saying all the AAA game studios that version control their assets in p4 are doing it wrong?


Data can and should be versioned, but not by just `git add BLOAT`. Take a look at https://dvc.org/: blobs are uploaded to an S3-compatible blob storage, the metadata goes into a small config file, and that file is what gets versioned in git.
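For concreteness, this is roughly what the split looks like (a sketch; the hash and filename are made up): after `dvc add data.csv`, the blob goes to remote storage and git only tracks a small pointer file like `data.csv.dvc`:

```yaml
# data.csv.dvc -- committed to git; the actual blob lives in the
# configured remote (e.g. an S3 bucket). Format approximated from DVC docs.
outs:
- md5: 22a1a2931c8370d3aeedd7183606fd7f
  path: data.csv
```

`dvc push`/`dvc pull` then move the blobs between the local cache and the remote, while git history stays tiny.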


This. I've got a 35 GB repo for the game I'm working on mostly solo.


Or Google, as per your first sentence :)


Google already has its own (Piper).


Would kill to have piper open source though.


It would be interesting but basically useless.

- Requires a custom kernel to run. Although most of these patches are probably floating around the kernel mailing list in some form or another.

- Requires Google's RPC and authentication system. (I guess gRPC is open source now, IDK if Piper has switched, but you still need auth.)

- Requires Google's group membership service.

- Requires Google's storage engine. I don't remember if they have migrated to Spanner yet, but even then it would be using internal Spanner APIs, not the cloud ones.

There are probably more, less obvious dependencies, but the point is that when Google writes software to run in Google production, it is built on top of a mountain of infrastructure. I'm not sure the design of Piper is in any way novel enough to be worth it. I mean, Piper works and scales well, but I don't think it is a fantastic VCS.


Why wouldn't it be great for other companies that run a monorepo? Would it be nuts to go from some other monorepo (like Subversion) to this?


I wouldn’t think there would be many companies running repos as large as Facebook.

If you use a tool like this you would basically be on your own. If you “stick to git” (or mercurial or whatever) at least you have all that momentum behind you, and you almost definitely won’t be the first people to encounter a problem.


You don’t need to have millions of files to have hg performance that leaves you wanting. A few GB of repo should do the trick.


Can't wait for the "We switched to Eden from Git" posts in 6 months.


Which will result in lots of people thinking "FB does it! It must be the future! We have to do this as well, or we will be left behind!" sigh.


It’s interesting that they prefer to develop such a tool rather than giving up on the monorepo concept.


One very counterintuitive truth of large scale software development is that as you scale to multiple services, you are gravitationally pulled into a monorepo. There are different forces at work but the strongest at this scale are your data models and API definitions.


The problems with data models can happen at small scale. I remember the first large-ish project that I built: it had a few different components, but the trouble started when I tried to introduce data models to the Java and Python parts (I had it in my head, after reading a book, that I needed domain objects or some other nonsense in each language). Mistake: the data was still changing, and it took forever to make changes. It wasn't critical, but I learned my lesson very quickly.

One perspective on this is that many ORM libraries don't take the DB as the source of truth (one very good ORM library that does, and which saved me in this case, was JOOQ). I think a lot of small-scale problems could be solved this way, monorepo is just another variation of this solution: moving the source of truth into the repo.

It is surprising to me how often variations of this problem come up. Obviously, there are solutions from multiple directions: having cross-language definitions (ProtoBuf), Arrow (zero-copy abstractions suitable for high performance), maybe even Swagger which comes at the problem from documentation...but I think this problem still comes up anywhere (and the DB approach is, imo, a very strong approach with a decent ORM at smaller scale).
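As a concrete (hypothetical) sketch of the cross-language approach: a single ProtoBuf message checked into the repo can be compiled into both the Java and Python components, so the data model has exactly one source of truth.

```proto
// user.proto -- hypothetical shared data model; generate bindings with
//   protoc --java_out=. --python_out=. user.proto
syntax = "proto3";

message User {
  int64 id = 1;
  string email = 2;
}
```

When the definition changes, every consumer picks up the new shape on the next build, instead of each language drifting on its own copy.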


Does your schema for data models (inception!) have a revision associated with it? If not, deployments are going to be spicy. If so, you end up having to deal with version-rot. Part of why putting this in the repo with your source code is a winning solution is that when you're working off head, you will naturally pick up and test the latest thing, and in most cases your next deploy will also just naturally roll forward as well.


Is it? I'm working for a small company and even at our scale, when the workflow is centralized, I find git a bit painful at times. I mean it's still an amazing tool, don't get me wrong, but when you have to deal with several sub-projects that you have to keep in sync and need to evolve together, I find that it gets messy real fast.

I think the core issue with git is that the submodule thing is obviously an afterthought that's cobbled together on top of the preexisting SCM instead of something that was taken into account from the start. It's better than nothing, but it's probably the one aspect of git that sometimes makes me long for SVN.

At the scale of something like Facebook you'd either have to pay a team to implement and support your split repo framework, or you'd have to pay a team to implement and support your monorepo framework. I don't have enough experience with such large codebases to claim expertise, but I would probably go for a monorepo as well based on my personal experience. Seems like the most straightforward, flexible and easier to scale approach.


If your company is small, I don't think you should be using git submodules at all.

My last place was about 10 years young, 150 engineers, and was still working within a single git repo without submodules.

There is a non-zero amount of discoverable config that goes into managing a repo like that, but it's trivial compared to the ongoing headaches of managing submodules, like you suggest.


We need to track large external projects (buildroot, the Linux kernel for instance) so the ability to include them as submodules and update them fairly easily is worth it IMO. If you're at the scale of Google it probably makes vastly more sense just including the code in your monorepo and pay a bunch of engineers to merge back and forth with upstream and have the rest of your team not worry about it, but for us it would take a lot of time and effort to maintain a clone of these projects in a bespoke repository.
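A minimal sketch of that workflow (all paths and names hypothetical; a local directory stands in for the real upstream URL, and `git init -b` assumes git 2.28+):

```shell
set -e

# Stand-in "upstream" repo; in practice this would be e.g. the buildroot URL.
git init -q -b main upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "upstream initial commit"

# The product repo that vendors the external project as a submodule.
git init -q -b main product
cd product
# Newer git requires explicitly allowing file-path submodule URLs;
# -b main records which upstream branch to track in .gitmodules.
git -c protocol.file.allow=always \
    submodule add -b main "$PWD/../upstream" vendor/upstream

# Later, to fast-forward the submodule to the upstream branch tip:
git -c protocol.file.allow=always submodule update --remote vendor/upstream
```

The update step is the "fairly easy" part: one command moves the pinned commit forward, and the superproject commits the new pointer.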


We have customer IP that not everyone is allowed to access and has to be deleted after the project is done. We use submodules and IMO it sucks but I don't see a way around it considering the restrictions.


For extremely small companies (N == 1) git submodules can be neat though. It’s a great way to create small libs without having to bother distribution through LuaRocks, npm, RubyGems and the like.


Submodules are a great way to break out libraries in a language-agnostic way without having them really be broken out. This is independent of team size.


Dan Luu wrote about monorepos; it's worth a read: https://danluu.com/monorepo/


You need good tooling to work with large monorepos, you need good tooling to work with large multirepos. Neither option is easy at that scale.


Do Facebook and Google literally have repos with everything they write in there available to everyone that works there (modulo privileged stuff)?


For a little more color on your modulo, the major omission in google3 I can recall from ~9 years ago was Android. For Reasons, I think legal.

The others weren’t “oh huh” enough to be easily recalled writing this comment, which probably speaks to their interestingness. But yes, you can chdir from search to calendar to borg and their dependencies, internal and vendored. It’s pretty much all there. It was pretty splendid, actually, and influences my thoughts on monos to this day.


Not quite, but almost.


the monorepo is handy: simplifies dependency management


But it adds complexity and creates its own issues.


Monorepo (on git) has been awesome for us the last 5 years or so.


The docs still refer to the tool as Mercurial/hg:

https://github.com/facebookexperimental/eden/tree/main/eden/...


This doesn't surprise me. It's a fork, nobody bothered to update the readme, and as much as folks wanted to update things like documentation, improving the software was a higher priority.


It says it was originally based on Mercurial, but is no longer a distributed source code control system. Are you sure it's still compatible?


I thought Microsoft was working on a "mod" to git that made it work on huge repos, e.g., the Windows source. Did that ever come to fruition?



You mean git-lfs? That is alive.



