All Your (Code)Bases Belong to Us

If you’ve built something clever on top of an AI coding platform and pushed it to a public repo, I have bad news: it’s not yours anymore. Not really.

It’s training data. It’s competitive research. It’s a feature request with working code attached. And the companies whose platforms you’re extending have every incentive to scrape it, learn from it, and ship their own version.

This isn’t hypothetical. It’s already happening.

The Pattern

Here’s how it works. A developer or small team builds something useful on top of an agentic coding platform. A set of skills, a workflow automation, a framework for managing agent behavior. They’re proud of it. They share it publicly because that’s what developers do. Open source is in our DNA.

Within weeks or months, the platform provider ships something suspiciously similar. New name. Polished UI. Official documentation. No attribution.

The community project that proved the concept? It’s now competing against the platform’s native implementation, built by a team with full access to the internals, the distribution channel, and the brand trust.

We’ve already seen this play out in the Claude ecosystem. Community-built agent skills and workflows get shared publicly. Then official versions appear with new names and better integration. Projects like ClawBot get studied, replicated, and absorbed. The original builders get nothing except the satisfaction of knowing they validated the idea for free.

This isn’t unique to Claude. It’s the economics of every platform that has ever existed. But AI makes it faster, more systematic, and harder to compete against.

This Isn’t New, But AI Makes It Worse

Platform companies have always watched what their ecosystem builds. Microsoft famously absorbed features from third-party utilities into Windows. Apple has a long history of “Sherlocking,” building native features that replicate popular App Store apps. Google regularly integrates functionality that started as Chrome extensions.

The developer community has a dark joke about it: the most dangerous thing you can do is build a successful product that fills a gap in a platform’s roadmap.

And this isn’t limited to big companies taking from small developers. It’s happening between the titans themselves. Watch Microsoft’s Copilot iterate. It’s mirroring Claude Code’s functionality at a remarkable pace. It even reads from the .claude directory. Anthropic built a developer experience, published enough of it for the ecosystem to engage, and Microsoft is studying every publicly visible piece of it to accelerate their competing product.

This is the dynamic at every level now. Billion-dollar companies do it to each other. Platform providers do it to their community builders. And AI makes the cycle faster every quarter.

What’s different now is the mechanism. AI companies don’t just watch your product. They can ingest your code. When your repo is public, every line of it is available as training data, as competitive intelligence, and as a blueprint. The AI doesn’t need to reverse-engineer your approach. It can read your implementation directly and generate something equivalent, or better, because it has access to the platform’s internals that you don’t.

The feedback loop is brutal: you build on their platform, share your work publicly, they learn from it, and they ship a version you can’t compete with. Your public repo isn’t just open source. It’s an unpaid R&D department.

Your License Won’t Save You

Here’s where it gets really uncomfortable: even if you license your code restrictively, it doesn’t matter.

When Claude Code’s source was leaked, a developer analyzed the code for its functionality (how it worked, what it did, how the pieces fit together) and built Claurst, a clone written in Rust. The developer described it explicitly as a “clean-room implementation.” No code was copied. No license was violated. None of Anthropic’s actual IP was legally compromised. But the innovation, the architecture, the answer to “how did they solve this?”: all of it walked out the door.

That’s the distinction most developers miss, and it comes down to a difference few of us have ever thought about: copyright vs. patent.

Copyright protects expression: the specific code you wrote, the particular way you arranged the words and logic. Your license (MIT, GPL, proprietary, whatever) is a copyright instrument. It governs what people can do with your actual code. Copy it, modify it, distribute it. Those are copyright questions.

Patents protect invention: the method, the approach, the novel way you solved a problem. If you patented your architecture, your workflow, your novel agent orchestration pattern, then someone who reimplements it from scratch in a different language would be infringing, even if they never saw your code.

Here’s the problem: almost nobody patents their software innovations. Patents are expensive. They require legal expertise most indie developers and small teams don’t have. And critically, they take years. The average software patent takes 2 to 3 years to grant. In an industry where AI platforms ship new features every quarter, your patent will be approved long after your innovation has been reimplemented, absorbed, and forgotten. The entire open-source ecosystem runs on copyright and licensing, not patents. So the one form of IP protection that would actually stop a clean-room reimplementation is the one that’s too slow, too expensive, and too impractical for almost anyone to use.

That leaves copyright, which is useless against the actual threat. If someone reads your public repo, understands what you built and why, and then reimplements it from scratch in a different language, they haven’t violated your copyright. They haven’t broken your license. They’ve just used your published work as a blueprint, which copyright was never designed to prevent.

And AI makes that process trivially easy. A model can analyze a public codebase, extract the architectural patterns, understand the novel approaches, and generate a clean-room reimplementation in a fraction of the time it took to build the original. No human even needs to read your code. The AI does the analysis and the reimplementation in one pass.

Think that’s an exaggeration? A developer recently pointed Claude Code at artifacts from a multiplayer game he built in 1992, a game that won Computer Gaming World’s 1993 award for artistic excellence. No source code survived. All he had were Game Master scripts written in a custom scripting language he’d designed, a gameplay capture from 1996, and a GM manual from 1998. No formal spec. No documentation of the language grammar. Just examples and manuals.

Claude Code reverse-engineered the entire proprietary scripting language from those artifacts alone, reconstructed the full grammar, and rebuilt the game engine from scratch in Go and React, including 2,273 rooms, nearly 2,000 items, 297 monster types, a full crafting pipeline, and a d100 combat system. A weekend project. The original took months of solo C programming.

That was from artifacts. Incomplete ones. No source code at all.

Now imagine what it does with your fully documented, well-commented, publicly accessible source code. Your .NET library with XML docs and a README. Your Python package with type hints and a test suite. You’re not even making the AI work for the answer. You’re handing it the complete blueprint with instructions.

So when someone says “just use an open-source license that prevents commercial use,” that’s not the threat model. The threat isn’t someone forking your repo. The threat is someone understanding what you built and rebuilding it without ever touching your code. Your license, any license, is irrelevant to that.

The only protection that actually works is not publishing the code in the first place.

The Open Source Tension

I want to be careful here because I’m not arguing against open source. Open source has been one of the most productive forces in the history of software. The collaborative ethos of sharing code, learning from each other, and building on each other’s work has produced extraordinary things.

But the social contract of open source was built on an assumption that no longer holds: that the entities consuming your code are roughly similar in scale and capability to you. A developer shares a library. Other developers use it, improve it, contribute back. Everyone benefits. That’s the model.

When a company with a billion-dollar platform, thousands of engineers, and an AI that can process every public repo on GitHub is the one consuming your code, the power dynamics are fundamentally different. You’re not sharing with peers. You’re providing free labor to an entity that will use your innovation to strengthen a platform you don’t own and can’t control.

That doesn’t mean you should never open-source anything. It means you should be strategic about what you open-source, especially when the thing you’ve built directly enhances the capabilities of an AI platform.

What’s Actually at Risk

Not all public code carries the same risk. There’s a meaningful distinction between:

Low risk: Libraries, utilities, and tools that solve general problems. A date formatting library. A testing utility. An API wrapper. These are commodities. Nobody’s going to “steal” your date parser.

Medium risk: Integrations and workflows that connect existing tools in useful ways. A CI/CD pipeline template. A deployment automation. These have value, but the value is mostly in the configuration, not the concept.

High risk: Novel approaches to extending AI platform capabilities. Skills, agent behaviors, prompt architectures, workflow orchestrations that make an AI coding tool meaningfully more powerful. This is where the platform providers are watching most closely, because this is their product roadmap being written for them in public.

If your work falls in that third category (if you’ve figured out how to make an agentic platform do something it doesn’t do natively, and you’ve proven it works), you need to think very carefully about whether a public repo serves your interests.

The Practical Advice

If it’s a business: Keep your repos private. Full stop. If your competitive advantage is built on novel extensions to an AI platform, publishing your source code is publishing your business plan. License it, sell it, or keep it internal. Don’t give it away and hope for attribution.

If it’s a side project you might commercialize: Start private. You can always open-source later. You can never un-open-source something that’s already been scraped and trained on.

If it’s genuinely a community contribution: Go public, but go in with open eyes. Know that what you’re sharing will be used by platform providers. Decide that the community benefit is worth more to you than the commercial potential. That’s a valid choice. Just make it consciously.

If you’re building skills or extensions for a specific AI platform: This is the highest-risk category. Your work is literally extending the platform’s capabilities. The platform provider has the strongest possible incentive to absorb it. Keep it private unless you’ve made a deliberate decision to donate it.

The Uncomfortable Economics

Here’s the part that makes developers uncomfortable: the incentive structures have shifted, and most of us haven’t updated our instincts.

For twenty years, the default advice was “put it on GitHub.” Build your portfolio. Show your work. Contribute to the community. That advice was sound when the consumers of your public code were other developers and small companies.

Today, the consumers include AI models that are being trained on your code and platform companies that are looking for their next feature. The audience changed. The advice should change too.

And here’s the part that stings the most: you won’t even get credit. In the old open-source model, attribution was part of the deal. You contributed, your name was in the commit history, the community knew who built it. Recognition and reputation were the currency. For many developers, that was enough.

That’s gone now. Your work goes into a black hole. It gets analyzed, absorbed, and reimplemented. It comes out the other side as a feature with a company’s name on it. No attribution. No acknowledgment. No “based on the work of.” Just a changelog entry and a press release. The one reward that open source reliably provided was recognition. That doesn’t even exist in this model.

This doesn’t mean hide everything. It means be intentional. Ask yourself: who benefits most from this being public? If the answer is “a platform company that will ship a competing version within six months,” maybe keep it in a private repo until you’ve extracted the value yourself.

The Bigger Picture

We’re in a weird moment in software development. The platforms that developers build on are also the entities most capable of absorbing what developers build. The AI tools that help us write code are trained on code we’ve already written. The line between “ecosystem participant” and “unpaid contributor to someone else’s product” is blurrier than it’s ever been.

I don’t think the answer is to stop building. I don’t think the answer is to stop sharing. But I do think the answer is to stop being naive about where the value flows.

If your code makes an AI platform more powerful, that’s valuable. Treat it like it’s valuable. Protect it like it’s valuable. And make a conscious decision about when and whether to share it, instead of defaulting to public because that’s what we’ve always done.

All your codebases might belong to them. But only if you let them.
