I thought I must be going crazy until I saw your comment. This sounds like a bad research practice that probably shouldn't be reproduced to begin with.
Research into systemically important infrastructure can't be dismissed just because that infrastructure isn't public. It's a cheap moralizing argument to say "pfff, this was predictable". Maybe so, but there isn't an alternative. Much like research on Twitter: once these companies drift into providing what become broad-scale social utilities and public services, it doesn't matter that they're private. There are (or should be) obligations that come with that.
You can't handwave and say go do your research on some micro-niche open source project that's way behind the SOTA and has nowhere near the same reach. That's not what "best practice" means here.
Replying to both responses because they both make good points. My argument boils down to the fact that some private companies end up becoming social utilities, and once that happens, the rules (should) change as part of the social contract. Which means, yeah, they can't simply "pull the rug". The research is important precisely because it's into systemically significant systems.
I get that it's difficult to define where that line gets crossed. But the idea of a publicly funded trust that maintains legacy versions of systems like this is not a bad one.
No matter how you define it, or whether people even agree companies should be obligated to provide certain public services, we are just nowhere near that line yet in this case, not even remotely close. It’s hand-wavy to say it’s important: this is all brand new, there are only a handful of researchers involved, the critical mass to justify what you’re suggesting does not yet exist, it won’t for some time, and there’s no guarantee it ever will. I’m not sure what you mean by publicly funded trust, but that’s typically quite different from privately funded public services. Assuming that cost is even the reason here, then if someone wants to establish a trust and engage OpenAI, they can.
That said, what if OpenAI shut down codex because it has dangerous possibilities and amoral “researchers” started figuring out how to exploit them? What if it was fundamentally buggy or encouraging misleading research? What if codex was accidentally leaking or distributing export-controlled or other illegal (copyright, etc.) information? I’m explicitly speculating on possibilities, while you’re making unstated assumptions, so entertain the question of whether OpenAI is already doing a public service by shutting it down.
Feel free to elaborate, if you can. I gave you some added reasoning, so it doesn’t help anyone to flatly state disagreement without offering any justification. Why even bother to say you disagree?
What evidence is there that OpenAI’s codex has become a social utility? How many people used it to publish? Do you think the US government agrees? How likely is this case to go to court, and result in OpenAI being ordered to provide ongoing access to codex? That seems pretty far fetched to me, but I’m willing to entertain the possibility that I’m wrong.
Are you certain there aren’t problems with codex, that OpenAI isn’t working on something better, and/or shutting it down because it’s causing harm? If so, why are you certain?
Sure, but OpenAI isn’t preventing research. It’s not their responsibility to provide reproducibility, at their expense, for any researchers studying GPT; that job belongs to the researchers, and the researchers can still work. It might be unfortunate from their perspective that there used to be a nice tool that made their job easier, but the flip side here is that OpenAI didn’t say why they’re removing access to codex, and they probably have good reasons, not least of which is that it costs them money the researchers aren’t subsidizing.
I'm going to be frank here, because I know my argument isn't "cheap". When one utilizes OSINT techniques (which using an ML service hosted by a third-party certainly qualifies as), there are baked-in assumptions that
1) this source could go away at any time, and
2) the source is only a reflection of the interests of the third-party, not something to be taken at face value.
No 2 can certainly be the subject of research, but to do so without accounting for No 1 would indicate bad research practices from the jump. For example, they could have (and should have) been snapshotting the outputs, tagged with model versions and retrieval dates. By the sound of it, the outputs weren't even the subject of the research, but were instead propping up the research. That flies in the face of No 2 as well. Let them start over, with better methodology this time.
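To make the snapshotting point concrete, here's a minimal sketch of what that archival step could look like. All names here (`snapshot_output`, the record fields, the example model string) are hypothetical illustrations, not any actual researcher's pipeline; the point is just that each output gets stored with the version and timestamp needed to audit it after the service disappears.

```python
import hashlib
import json
from datetime import datetime, timezone

def snapshot_output(prompt, output, model_version, archive):
    """Archive one model output with the metadata needed to audit
    or cite it later, even if the hosted service goes away."""
    record = {
        "prompt": prompt,
        "output": output,
        "model_version": model_version,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    # Content hash over prompt+output so later readers can verify
    # the snapshot wasn't altered after the fact.
    canonical = json.dumps(
        {"prompt": prompt, "output": output}, sort_keys=True
    ).encode("utf-8")
    record["sha256"] = hashlib.sha256(canonical).hexdigest()
    archive.append(record)
    return record

# Example usage with a made-up model identifier:
archive = []
rec = snapshot_output("2+2=", "4", "example-model-v1", archive)
```

Writing `archive` out as dated JSON lines per experiment run would let the study be checked against frozen outputs rather than a live endpoint.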