langchain-ai/open-swe
open-swe: Overview
An Open-Source Asynchronous Coding Agent
Podcast
Part 1 of 2
Transcript
Okay, so picture this. You're a developer, and you've got this massive GitHub issue sitting in your backlog. It's been there for three weeks. You know what needs to be done, you just... haven't had the time. What if you could just hand that off to an agent, go grab lunch, come back, and find a pull request waiting for you? That's the promise of Open SWE, and honestly, after spending time digging through this codebase, I think they might actually be delivering on it. Welcome back, everyone. Today we're doing a deep dive into langchain-ai/open-swe — an open-source asynchronous coding agent that has pulled in over seven thousand stars on GitHub. Seven thousand. For a repository that's doing something genuinely hard. So let's unpack what's actually happening under the hood, because the architecture here is... well, it's interesting in ways I didn't expect. Let me start with the big picture. Open SWE is described as an asynchronous coding agent, and I want to dwell on that word "asynchronous" for a second, because it's doing a lot of work in that description. Most coding agents you've probably encountered — and there are a lot of them these days — are synchronous. You give them a task, you sit there watching a spinner, and eventually you get an answer. Open SWE is designed differently. The idea is that you fire off a task, you go do something else, and the agent works in the background. That's a fundamentally different user experience, and it requires a fundamentally different architecture to pull off. So the repository is organized into a handful of key directories. You've got the agent directory, which is really the heart of everything. You've got static, which holds the branding assets — those nice SVG logos you see in the README. And you've got tests, which, spoiler alert, is where things get philosophically interesting. The whole thing is primarily Python, with a Dockerfile and a Makefile to handle the infrastructure side. It's a clean, focused codebase. 
Sixty-five files, ten directories. Not sprawling. Intentional. Let's start where it all begins — the agent directory. This is where the actual intelligence lives, and when I started poking around in here, I immediately noticed something. The project is built on top of LangGraph. Now, if you're not familiar with LangGraph, it's LangChain's framework for building stateful, multi-step agent workflows as graphs. Think of it like... you know how you'd draw a flowchart of "agent does this, then decides between these two paths, then does that"? LangGraph lets you actually implement that flowchart as code, with proper state management, branching, and the ability to pause and resume. And that pause-and-resume capability is crucial for the async story. Here's what I find genuinely clever about this. Because LangGraph handles the state persistence, the agent can be interrupted, serialized to some storage backend, and then picked back up later. That's how you get the asynchronous behavior. It's not magic — it's a very deliberate architectural choice to use a framework that makes that possible. And honestly, when I first saw it, I thought "wait, that's actually a really elegant solution to a hard problem." Most people trying to build async agents are rolling their own state management, and it's a mess. These folks just... used the right tool. Now, let me talk about what the agent actually does, because "coding agent" is vague. Open SWE is specifically designed to work on software engineering tasks. Real ones. The kind that involve reading code, understanding context, writing new code, running tests, debugging failures, and iterating. It's modeled on the SWE-bench benchmark, which if you haven't heard of it, is basically the gold standard for evaluating whether AI systems can actually solve real GitHub issues from real open-source repositories. The name "SWE" literally stands for Software Engineering. So the agent has to be able to do things like... 
read a repository, understand its structure, identify what files are relevant to a given issue, make changes across potentially multiple files, run the test suite, see if things pass, and if they don't, figure out why and try again. That's a lot. And doing it asynchronously, where the whole thing might be interrupted and resumed? That's genuinely hard. Let me dig into the agent's graph structure a bit, because this is where it gets technically meaty. In LangGraph, you define your agent as a set of nodes — these are basically functions that do some work — and edges between them, which define how control flows. You can have conditional edges where the agent makes a decision about what to do next. And you can have special "human in the loop" nodes where the agent pauses and waits for external input before continuing. Open SWE uses this to implement a planning and execution loop. The agent starts by analyzing the task — reading the issue, understanding what's being asked. Then it does a planning phase, where it figures out what approach to take. Then it executes — making file changes, running commands. Then it evaluates the results. And based on that evaluation, it either declares success, or it loops back and tries something different. That loop is the core of how it handles the messiness of real software engineering tasks, because real tasks rarely work perfectly on the first attempt. And here's something I want to highlight, because it's a design decision that I think is underappreciated. The agent is built to work with tools. File reading tools, file writing tools, shell execution tools. These aren't just nice-to-haves — they're the mechanism by which the agent actually interacts with the codebase. The LLM at the center of the agent is choosing which tools to use and how, and the graph structure is what orchestrates the overall flow. 
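The planning, execution, and evaluation loop described above can be sketched in plain Python. To be clear about what's hedged here: this is an illustrative stand-in for what LangGraph provides (state, nodes, conditional edges, checkpointing), not Open SWE's actual code or the real LangGraph API, and the node names and checkpoint file are made up for the example.

```python
import json
from pathlib import Path

# Illustrative stand-in for a LangGraph-style agent graph: nodes are
# functions from state to (state, next_node), and the whole state is
# checkpointed to disk after every step so the run can pause and resume.
# Node names and the checkpoint path are hypothetical, not Open SWE's own.

CHECKPOINT = Path("agent_state.json")

def plan(state):
    # Planning phase: decide on an approach for the task.
    state["plan"] = f"fix: {state['issue']}"
    return state, "execute"

def execute(state):
    # Execution phase: make changes, run commands. Here we pretend the
    # change passes the test suite on the second attempt.
    state["attempts"] += 1
    state["tests_pass"] = state["attempts"] >= 2
    return state, "evaluate"

def evaluate(state):
    # Conditional edge: loop back on failure, declare success otherwise.
    return state, ("done" if state["tests_pass"] else "execute")

NODES = {"plan": plan, "execute": execute, "evaluate": evaluate}

def run(state=None):
    # Resume from the checkpoint if a previous run was interrupted.
    if state is None and CHECKPOINT.exists():
        state = json.loads(CHECKPOINT.read_text())
    state = state or {"issue": "flaky test", "attempts": 0, "node": "plan"}
    while state["node"] != "done":
        state, state["node"] = NODES[state["node"]](state)
        CHECKPOINT.write_text(json.dumps(state))  # persist after each step
    return state

final = run()
```

Because the state is serialized after every node, killing this process mid-run and calling `run()` again picks up exactly where it left off; that is the essence of the asynchronous, interruptible design.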
So you've got this nice separation of concerns: the LLM handles the intelligence, the tools handle the side effects, and the graph handles the coordination. Now, let me talk about the infrastructure side, because this is where the "open source" part of Open SWE really shines. The Dockerfile and the Makefile are doing some interesting things. The Docker setup is designed to create a reproducible environment where the agent can safely execute code. And I mean safely in a real sense — you don't want an agent that's running arbitrary code to have access to your production systems or your home directory or anything sensitive. The containerization is doing important security work here. The Makefile is the developer experience layer. It's got targets for things like running the agent locally, running tests, building the Docker image. It's the kind of thing where you clone the repo, look at the Makefile, and immediately understand how to get started. I appreciate that. Too many open-source projects make you read three different README files and a wiki page just to figure out how to run the thing. Speaking of the README, let me sidebar on the branding for a second, because I think it tells you something about the project's ambitions. The static directory has these nicely designed SVG logos with light and dark mode variants. There's a level of polish there that says "we want this to be a real project that people actually use, not just a research demo." Seven thousand stars suggests they're succeeding at that. Okay, let me get back to the technical meat. I want to talk about the tests directory, because this is where I had an interesting philosophical moment. Testing an AI agent is... genuinely hard. How do you write a unit test for something that's supposed to understand code and make intelligent decisions? 
The answer is, you test the parts you can test deterministically — the tool implementations, the state management logic, the graph structure — and you rely on integration testing and evaluation benchmarks for the higher-level behavior. What I find interesting is that the test suite reflects a mature understanding of this challenge. There are tests for the infrastructure — making sure the tools work correctly, making sure state is persisted and restored properly, making sure the graph structure is valid. These are the kinds of tests that give you confidence that the scaffolding is solid, even if you can't fully test the intelligence sitting on top of it. That's the right approach. I've seen too many agent projects with no tests at all, which is a disaster, and too many with tests that are basically just "did the LLM say something?" which is meaningless. This is somewhere in the middle, in a good way. Now, I want to zoom out and talk about the broader context, because I think it's important for understanding why this project exists and why people care about it. The software engineering agent space has exploded in the last year or two. You've got commercial products like Devin, GitHub Copilot Workspace, various others. They're impressive, but they're black boxes. You can't see how they work, you can't customize them, you can't run them on your own infrastructure with your own models. Open SWE is the answer to that. It's saying: here's a capable coding agent, and it's open source, so you can inspect it, modify it, run it wherever you want, and use whatever LLM you prefer. That's a significant value proposition. Especially for organizations that have data privacy requirements, or that want to use fine-tuned models, or that just want to understand what's happening inside the system. And the choice to build on LangChain and LangGraph is interesting from this angle too. Those are also open source, well-documented, and have a large community. 
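To make that "test the deterministic parts" idea concrete, here's a minimal sketch of what such a test might look like: a file-writing tool and a state-persistence round trip, checked without any LLM in the loop. The tool, the state shape, and the assertions are hypothetical examples, not taken from Open SWE's actual suite.

```python
import json
import tempfile
from pathlib import Path

# Sketch of testing an agent's deterministic scaffolding rather than its
# intelligence: the tool implementation and the state round-trip. Every
# name here is invented for illustration.

def write_file_tool(root: Path, rel_path: str, content: str) -> str:
    """A minimal file-writing tool an agent might call."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {rel_path}"

def save_state(state: dict) -> str:
    return json.dumps(state, sort_keys=True)

def load_state(blob: str) -> dict:
    return json.loads(blob)

# Deterministic unit tests: no model involved, so results are repeatable.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    msg = write_file_tool(root, "src/app.py", "print('hi')\n")
    assert msg == "wrote src/app.py"
    assert (root / "src/app.py").read_text() == "print('hi')\n"

state = {"node": "execute", "attempts": 1}
assert load_state(save_state(state)) == state  # persistence round-trips
```

Tests like these say nothing about whether the agent is smart, but they guarantee the scaffolding underneath it behaves the same way every run.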
So you're not just getting an open-source agent — you're getting one built on open-source foundations that you can actually understand and extend. It's open source all the way down, which matters. Let me talk about something that I think is underappreciated in discussions of coding agents, which is the context window problem. When you're working on a real codebase, there's a lot of code. Way more than fits in any LLM's context window. So the agent has to be smart about what it reads and when. It can't just dump the entire repository into the prompt and ask the LLM what to do. It has to selectively retrieve relevant files, understand which parts of the codebase are relevant to the current task, and build up context incrementally. Open SWE handles this through its tool use pattern. The agent reads files on demand, using its understanding of the task to decide what's relevant. It might start by reading the issue description, then look at the file structure, then dive into specific files that seem relevant, then look at related tests. It's building up a mental model of the codebase incrementally, just like a human engineer would. That's a subtle but important design point. And here's where I'll share a little opinion, clearly marked as such. I think the incremental context building approach is actually more robust than approaches that try to pre-index the entire codebase and do semantic search. Semantic search is great, but it can miss things that are relevant in non-obvious ways. The agent's ability to follow chains of reasoning — "this function calls that function, so I should look at that file too" — is something that's hard to replicate with pure retrieval. Alright, let me bring this home. What does Open SWE represent in the bigger picture? I think it represents a bet that the right architecture for AI coding assistance is not a chat interface where you go back and forth with an LLM, but an autonomous agent that you give a task to and then get out of the way. 
The asynchronous design is central to this — it's designed for tasks that take minutes or hours, not seconds. And the open-source nature means that as the underlying models improve, as the tooling improves, as the community discovers better approaches, all of that can be incorporated. It's not locked into any particular vendor's model or any particular set of capabilities. That's a durable foundation. Seven thousand stars in a space that's moving incredibly fast tells you that people are hungry for this. They want a capable, transparent, customizable coding agent that they can actually understand and control. And looking at the codebase — the clean architecture, the thoughtful use of LangGraph, the proper containerization, the honest approach to testing — I think Open SWE is making a serious attempt to deliver that. So if you're building software and you've got a backlog full of issues that you know how to fix but just haven't had time to fix... this is worth a look. Clone it, read the Makefile, spin up the Docker container, and point it at something. See what happens. The worst case is you learn something about how coding agents work. The best case is you come back from lunch to find your PR waiting for you. That's Open SWE. Thanks for hanging out with me on this one — it was a genuinely fun codebase to dig through. Until next time.