Cody is being open sourced under Apache 2. The source code is here: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-.... The analog would be if GitHub open-sourced Copilot but didn't open source GitHub (Sourcegraph is open core, similar to GitLab, with all the code publicly available and the enterprise-licensed code under "enterprise" directories).
The network dependencies are Cody --> Sourcegraph --> Anthropic. Cody does need to talk to a chat-based LLM to generate responses. (It also hits other Sourcegraph-specific APIs, which are optional.)
We are working on making the chat-based LLM swappable. Anthropic has been a great partner so far and is stellar to work with, but our customers have asked for the ability to use GPT-4, as well as the ability to self-host, which means we are exploring open source models. We're actively working on that at the moment.
Sorry for any lack of clarity here. We would like to have Cody (the 100% open source editor plugin) talk to a whole bunch of dev tools (OSS and proprietary). We think it's totally fine to have proprietary tools in your stack, but we would prefer to live in a world where the thing that integrates all that info in your editor, using the magic of AI and LLMs, is open source. This fits into our broader principle of selling to companies/teams while making tools free and open for individual devs.
Thank you for your response. I think I get it now. When I first read the announcement I jumped to the thought that the entirety was being open sourced; perhaps my reading got a bit clouded by that. I understand your needs as a company with a business plan, and I expect nothing to be free. When the option to ask here in this discussion arose, I took it, since I couldn't figure this out elsewhere.
The ability to use other LLMs, especially open ones, is promising. I guess it's mostly a matter of how APIs are standardised across these products. I mostly use Copilot and truly hope things can get better than that. The lack of control is especially infuriating, as is its tendency to go off on repeats for no discernible reason. On paper, Cody looks to do better here.
Hopefully it's not just on paper :) There are a lot of rough edges still, but we hope to iron them out as quickly as we can.
One of our core design principles for Cody is to make it "unmagic". Like, the AI is magic enough, but the rest of what we're doing in terms of orchestrating the LLMs in combination with various other data sources and backends should be clear and transparent to the user. This allows for greater understandability and steerability (e.g., if Cody infers the wrong context, maybe you can just tell it the file it should be reading and then regenerate the answer).
Copilot is a great tool, and Oege de Moor, Alex Graveley, and the whole GitHub Next team deserve huge credit for shipping it. That being said, I really want the standard AI coding assistant to be open, and there's been a ton of innovation in LLMs since Copilot's initial launch that doesn't seem to have been rolled in yet. I think this is a case where being open means we can accelerate the pace of innovation.
I'll add that if folks want to submit a PR to turn on other LLMs (or to have Cody talk to the base LLM provider directly, sans Sourcegraph), we're happy to accept those. Literally the only thing preventing us from doing that right now is prioritization (our team is 4 people and we're scrambling to improve context fetching and implement autocomplete rn :sweat-laugh-emoji:)
Local context is definitely the key factor in small models achieving better quality than Copilot (related: [1], [2]).
One thing I'd really want to have in Sourcegraph: a Search API that supports custom retrieval/ranking. Research (e.g., [2]) shows that context fetched with simple bag-of-words (BoW) retrieval is effective for code completion tasks.
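To illustrate the idea (this is a minimal sketch of generic BoW retrieval, not Sourcegraph's actual API; all function names here are hypothetical): tokenize code into identifier-like terms, then rank candidate repo snippets by cosine similarity to the code near the cursor.

```python
import re
from collections import Counter
from math import sqrt

def bow(text):
    # Crude bag-of-words: split code into identifier-like tokens.
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))

def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_context(query_code, candidate_snippets, k=2):
    # Rank repository snippets by BoW similarity to the code being edited,
    # returning the top-k to stuff into the LLM prompt as context.
    q = bow(query_code)
    return sorted(candidate_snippets,
                  key=lambda s: cosine(q, bow(s)),
                  reverse=True)[:k]

snippets = [
    "def parse_config(path): return json.load(open(path))",
    "def render_template(name, ctx): ...",
    "class ConfigLoader:\n    def load(self, path): return parse_config(path)",
]
print(rank_context("cfg = parse_config(config_path)", snippets, k=1))
```

A real implementation would use something closer to BM25 over the whole repo, but even this toy version surfaces the definition of `parse_config` for a completion that calls it, which is the point the papers make.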