More

Buetol · on April 14, 2021

I like https://listenparadise.org/ in the same vein of music only radio.

Buetol · on Feb 15, 2021

+1 for Work Adventure !

Buetol · on Dec 11, 2020

Top 10:

- Python: salt, core (https://github.com/home-assistant/core), pandas, scikit-learn, numpy, airflow, erpnext, matplotlib, pytest & pip

- Rust: servo, cargo, rust-clippy, tokio, rust-analyzer, tock, tikv, alacritty, libc & substrate

- JS: node, react-native, react, gatsby, three.js, bootstrap, material-ui, odoo, next.js & Rocket.Chat

- Java: elasticsearch, flink, spring-boot, hadoop, netty, jenkins, beam, bazel, alluxio & pmd

- C++: tensorflow, ceph, pytorch, bitcoin, electron, Marlin, Cataclysm-DDA, llvm-project, rocksdb & QGIS

- C: git, linux, linux, php-src, openssl, systemd, curl, u-boot, qemu & mbed-os

megous · on Dec 11, 2020

Which of these match the image in the article, where some random Nebraskan is maintaining some obscure dependency?

yorwba · on Dec 11, 2020

The criteria https://github.com/ossf/criticality_score/blob/main/README.m... are designed to give higher scores to projects that are updated frequently by many contributors from different organizations, so of course the random Nebraskan's critical project is going to get a low "criticality score".

eeZah7Ux · on Dec 11, 2020

> of course the random Nebraskan's critical project is going to get a low "criticality score"

...which is the exact opposite of what it should be.

abharya · on Dec 11, 2020

That is not true, check out this as an example - https://groups.google.com/g/wg-securing-critical-projects/c/.... We are just trying to help, so please provide constructive feedback and any ideas on metrics we can use.

sansnom · on Dec 11, 2020

Nice tool.

I think we could improve it a bit. For example, Spring boot should have a very low score for me. It's backed by a large company Pivotal. They don't need any support I think. Same thing for elasticsearch.

For me:

    - backed by a large company ? 
 
    - number of contributor doing 80% of the work ? or active in the last 12 months ? commits breakdown (99% is done by one guy) ?  

    - issues created/closed ratio  

    - PR created/merged ratio  

    - use critical projects ?  

    - other from your original score

A nice bonus: if we could use the tool to assess critical score for our project (not globally). For local dependency, we could increase the critical value if dependents count is low. Very few person is using it: that's a bad sign. With this, we could find those dependencies.

We could also create a global score (like you did) by using the previous score and scaling it using the dependency usage (dependents_count like you did).

With this calculation, I think it's more likely to find relevant projects.

How to find it's backed by a large company ? Not sure about this, we can check if the project is part of an organization, if contributor have a company or if they have a pro account. For example, if the top 5 contributors are from Google, it's likely it's sponsored by it(could be done during their free time but less likely).

Note: check what happens with a stable project (no new issue and PR).

carapace · on Dec 11, 2020

I'm not sure if this is relevant, but bash and the readline lib are both maintained by a single unpaid volunteer. (I don't know if he's from Nebraska though.)

yorwba · on Dec 11, 2020

Gensim is #119 in the list according to your link, far behind projects with many more active contributors, so hardly a resounding success of your scoring method.

In terms of metrics, you could start by weighing projects with few contributors as more critical, not less. Specifically, gensim does appear to have had quite a few contributors, but the bulk of the code was written by the single maintainer https://github.com/RaRe-Technologies/gensim/graphs/contribut... So maybe you should add a metric "percentage of code in the past year authored by the top contributor".

If you want to go about it in a more data-driven fashion, you could go through the top projects for each language, check whether they actually need your support (e.g. find out what the development goals are, ask whether the current resources are sufficient and what they'd do with the additional resources you can provide) to get a ground-truth labeling of critical projects, then readjust your weights to match the ground truth.

commandlinefan · on Dec 11, 2020

That would be "left-pad": https://qz.com/646467/how-one-programmer-broke-the-internet-...

gfxgirl · on Dec 12, 2020

Yes, because the maintenance of such a library is astronomical! </sarcasm>

wyldfire · on Dec 11, 2020

I don't think it's explicit in the article, but that's a comic (XKCD). Relevant discussion [1] suggests that there's not a specific project referenced by the comic.

When I first read this comic, ntpd [2] [3] came to mind.

[1] https://www.explainxkcd.com/wiki/index.php/2347:_Dependency#...

[2] https://lwn.net/Articles/701222/

[3] https://lwn.net/Articles/713901/

imglorp · on Dec 11, 2020

At least NTP is a standard protocol and there are a bunch of cromulent alternatives to the original Mills ntpd. Chrony and ntpsec are both reasonable. If any one of the implementations went away tomorrow, distros would scramble but they'd have a place to go.

stonogo · on Dec 11, 2020

I'm interested in the gulf between low-level systems code and effectively end-user code here. Things like GLib, Cairo, Harbuzz, etc -- none of the end-user tools work without them. It really demonstrates just how difficult this evaluation can be.

abharya · on Dec 12, 2020

It is definitely difficult especially with critical dependencies. We are looking for any criteria to identify these in automated fashion. Parallely, for ones we know about those are we are trying to run our automated tools on. E.g. glib, cairo, harfbuzz are all continuously fuzzed as part of OSS-Fuzz - https://github.com/google/oss-fuzz/tree/master/projects

justinclift · on Dec 11, 2020

Definitely needs more work.

The main Go SQLite3 for accessing SQLite databases is in the top "C" list.

But SQLite itself doesn't seem to be included in any of them. o_O

arp242 · on Dec 11, 2020

That's probably because SQLite doesn't use git, and this tool seems to require git. Actually, in its current state it seems to require GitHub: https://github.com/ossf/criticality_score/blob/main/critical...

This probably omits some other projects as well which don't use git or GitHub.

abharya · on Dec 11, 2020

Yes correct. Right now, we are query-ing projects hosted on Github, but will be expanding to our source control system in the near future.

rhencke · on Dec 11, 2020

Please consider allowing scanning tarball/zip distributions of source directly as well. It is a SCM-agnostic method that is also well-supported by GitHub, Gitiles, hgweb, and many old but still-in-use projects that pre-date Git.

arp242 · on Dec 12, 2020

It would be nice if this could be mentioned a bit clearer in the blog post and/or README; it's not really that obvious at all and I had to go to the source to check, and loads of people here seem confused about it since it more or less implies "we looked at all open source projects".

justinclift · on Dec 11, 2020

No worries. Just to point out though, SQLite does have an official mirror repo on GitHub:

https://github.com/sqlite/sqlite/

jchw · on Dec 11, 2020

That might be because sqlite isn’t on Github.

justinclift · on Dec 11, 2020

Yeah, that could be the case. That being said, lots of projects aren't on GitHub, and SQLite does have a mirror there which is kept up to date:

https://github.com/sqlite/sqlite/

abharya · on Dec 12, 2020

The issue with the mirror is we don't get the important stats to make decisions. E.g. number of contributors, issue changes due to custom issue tracker. We are still thinking on how to add information from such cases in automated fashion, ideas welcome!

justinclift · on Dec 12, 2020

Would it be feasible to add support for Fossil, so it's not just a git-only tool?

EvilEy3 · on Dec 11, 2020

In what universe Bazel is a critical project for Java ecosystem?

leegraham · on Dec 11, 2020

It looks like it's in the Java section because it's written _in_ Java, not because it's critical _to_ Java. Similarly, php-src isn't critical to C.

lokedhs · on Dec 11, 2020

The Google universe. Don't they use it for all their builds?

abharya · on Dec 11, 2020

The list captures projects written in the Java language, not for Java ecosystem. See why it is critical here - https://github.com/bazelbuild/bazel/wiki/Bazel-Users

tziki · on Dec 11, 2020

In the Android universe?

nine_k · on Dec 11, 2020

Interestingly missing from the list: LLVM, SQLite, Postgres.

abharya · on Dec 11, 2020

They are there:

cplusplus_top_200.csv:llvm-project,https://github.com/llvm/llvm-project,C++,48,0,2573,5,652.3,2...

c_top_200.csv:postgres,https://github.com/postgres/postgres,C,124,0,50,5,41.1,52,1,...

sqlite sorry since it is not hosted on github, and we do plan to add non-github repos in future.

eeZah7Ux · on Dec 11, 2020

The results are really flawed.

TAForObvReasons · on Dec 11, 2020

Some de-duplication is needed. 12, 13, and 14 for JS are amphtml

abharya · on Dec 12, 2020

That is a bug, will be fixed soon.

bayindirh · on Dec 11, 2020

C++: Eigen (the linear algebra and matrix library powering tensorflow and many more projects).

Buetol · on Dec 11, 2020

If you use your own domain for email, you can change host anytime you want, pretty nice.

kennywinker · on Dec 11, 2020

I wouldn't dream of using the provider's domain - but it's still a hassle to switch email providers (especially if you want your old mail ported over)

Buetol · on Oct 6, 2020

Took 2sec to setup and works very well !

EDIT: It's really Custom Search Engine rebranded to me, it doesn't index all you website, I still prefer Algolia if I had to have a external search provider.

EDIT 2: Seems very customizable, maybe by adding all your urls it will index all you site, making it a viable alternative to Algolia

Buetol · on June 5, 2020

Looks like the link to the "I want HUE" tool is broken, here's the proper link: http://medialab.github.io/iwanthue/

azepoi · on June 5, 2020

I'm impressed by the other mentioned picker, for HCL this time http://tristen.ca/hcl-picker/

This tool is also useful https://vis4.net/palettes/#/9|s|00429d,96ffea,ffffe0|ffffe0,...

I have to mention this fantastic presentation on colormaps https://www.youtube.com/watch?v=xAoljeRJ3lU (A Better Default Colormap for Matplotlib, SciPy 2015, Nathaniel Smith and Stéfan van der Walt). It is about perceptually uniform colormaps. https://bids.github.io/colormap/

Buetol · on June 2, 2020

My main takeaway is that Tor is introducing a new domain names suffix: .tor.onion

For information, there was a similar initiative by Namecoin with .bit.onion: https://www.namecoin.org/docs/tor-resolution/ncprop279/stemn...

Buetol · on June 2, 2020

This is buried in the comments but `pudb` is great debugger: https://pypi.org/project/pudb/

- visual interface to set breakpoint

- inline interpreter

- inspect on exception

It's like using a modern debugger but inside your console. Can't recommend enough.

ur-whale · on June 2, 2020

Does it support multi-threaded python code?

Buetol · on June 2, 2020

Looks like it does: https://medium.com/@auro_227/use-pudb-to-debug-python-multip...

Buetol · on May 4, 2020

I've done a similar project, it takes a website and output broken urls using Scrapy: https://gist.github.com/mdamien/7b71ef06f49de1189fb75f8fed91...

Buetol · on March 27, 2020

I also researched the best diff algorithm. Google's diff-match-patch [1] library produce very good diffs for example. But I found that the best diffs are produced by wikidiff2 [2], the MediaWiki diff engine. Both engines produce word-by-word diff.

[1]: https://github.com/google/diff-match-patch

[2]: https://www.mediawiki.org/wiki/Wikidiff2

steerablesafe · on March 27, 2020

I wonder if these could be used as custom git diff drivers.

Buetol · on March 27, 2020

Maybe, I've already used another diff engine, WikiWho [1], to do my own implementation of git blame [2] (shameless self-promotion)

[1]: https://github.com/wikiwho/WikiWho

[2]: https://pypi.org/project/git-word-blame/