Docker Sandbox: Running LLM coding agents in a clean, secure local environment
Using Docker Sandbox has allowed me to place more trust in my LLM coding agents. It has also fixed a few issues I’d run into where Claude Code struggled with the customizations in my own shell.
Docker Sandbox is a relatively recent addition to Docker Desktop, released in November 2025.
It is still considered an “experimental” feature, so the API may change quickly. As of this writing, it only supports Claude Code and Google’s Gemini CLI, but I imagine OpenAI’s Codex will be coming very soon.
The Benefits of Running Agents in a Sandbox
Since only the local project directory is mounted inside the container where the agent is running, the agent doesn’t have access to your full filesystem. This makes it much “safer” to run an AI coding agent “dangerously”, with all permissions fully enabled. There are no permission prompts to review and approve, so the agent can work more autonomously.
This might not seem like a big deal, but it qualitatively changes the way I work with AI agents like Claude Code. I can be fully “hands off” and let the agent do its thing: following the instructions in my prompt and in AGENT.md, and running tests as it goes until it decides it has completed the assignment. At that point, I can jump back in as the “human in the loop” to review and potentially modify what it came up with, or send it off to try again with a better prompt.
Regarding Simon Willison’s “lethal trifecta” (access to private data, exposure to untrusted content, and the ability to communicate externally), running a coding agent locally inside a Docker container that only has access to your project’s code helps eliminate its access to private data you might have stored elsewhere on your computer.
Because the whole project directory is mounted, configuration files like .env are also available to the agent (although you can configure it not to read them), but those should only contain secrets for running the application locally. One could argue that your project’s source code itself is private data, but if you’re not comfortable sharing it with a coding agent, you won’t get much value from one anyway… Nothing is ever totally safe, but this feels categorically safer than running the agent with complete access to your computer’s filesystem.
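For Claude Code, one way to do that configuration is a project-level deny rule. The bash sketch below uses a hypothetical helper name, deny_env_reads, that writes a .claude/settings.json containing permissions.deny rules for .env files, assuming Claude Code’s documented settings format. It refuses to overwrite an existing settings file. Whether deny rules are still enforced when the agent runs with --dangerously-skip-permissions is worth checking in the Claude Code docs for your version before relying on this.

```shell
# Hypothetical helper (name made up for this sketch): write a project-level
# Claude Code settings file that denies reads of .env files.
# Assumes Claude Code's permissions.deny rule syntax; verify that deny rules
# are still enforced under --dangerously-skip-permissions for your version.
deny_env_reads() {
  local dir="${1:-.}"
  local settings="$dir/.claude/settings.json"

  # Never clobber an existing settings file; merge by hand instead.
  if [ -e "$settings" ]; then
    echo "$settings already exists; add the deny rules to it by hand" >&2
    return 1
  fi

  mkdir -p "$dir/.claude"
  # Quoted 'EOF' so the shell writes the JSON exactly as-is.
  cat > "$settings" <<'EOF'
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)"
    ]
  }
}
EOF
}
```

Run deny_env_reads from your project root (or pass the project path) before starting the sandbox. If you already have a .claude/settings.json, add the two deny entries to it by hand.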
As a side bonus, Docker Sandbox also gives your coding agent a neat, tidy shell to use when executing commands. This eliminates issues I’ve run into with customizations I’ve made to my own shell (shortcuts, prompts, custom dotfiles) that make it more useful for me as a (typing!) human, but which are unnecessary for an agent and can even cause problems when inputs and outputs don’t behave the way the LLM expects.
Getting Started Locally
Docs for the experimental sandbox feature are here.
Make sure you have Docker Desktop version 4.50 (released Nov 2025) or later for the docker sandbox commands to work.
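To check the version from a script rather than the About screen, here is a small bash sketch. desktop_at_least_4_50 is a hypothetical helper name: it pulls the first dotted version number out of a string like “Docker Desktop 4.50.0 (build)” and succeeds if that version is 4.50 or later. Feeding it the output of docker version --format '{{.Server.Platform.Name}}' assumes your Docker Desktop reports its name and version in that field, which is worth double-checking on your install.

```shell
# Hypothetical helper: succeed if a string such as
# "Docker Desktop 4.50.0 (build)" names version 4.50 or later.
desktop_at_least_4_50() {
  local ver major minor
  # Take the first dotted version number in the string, e.g. "4.50.0".
  ver="$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)"
  [ -n "$ver" ] || return 1

  major="${ver%%.*}"      # "4.50.0" -> "4"
  minor="${ver#*.}"       # "4.50.0" -> "50.0"
  minor="${minor%%.*}"    # "50.0"   -> "50"

  # Compare numerically (10# keeps leading zeros from being read as octal).
  (( 10#$major > 4 || (10#$major == 4 && 10#$minor >= 50) ))
}

# Example, assuming Docker Desktop reports itself in this template field:
#   if desktop_at_least_4_50 "$(docker version --format '{{.Server.Platform.Name}}')"; then
#     echo "docker sandbox should be available"
#   else
#     echo "update Docker Desktop first"
#   fi
```

Comparing major and minor as integers matters here: a plain string comparison would wrongly rank 4.9 above 4.50.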
If the Docker Desktop app gives you issues with its “update and restart” functionality (mine never seemed to work, and would restart at the same version), you can simply download the latest Docker Desktop release here.
You’ll also obviously need a subscription to your coding agent of choice; mine is Claude Code.
Get a simple overview of the command by running:
docker sandbox
To get started with Claude Code inside a containerized sandbox, all you need to run is:
docker sandbox run claude
You’ll see Docker download the layers for an image called docker/sandbox-templates:claude-code from Docker Hub.
Then it will prompt you to configure Claude Code by choosing a color palette and logging into your account.
How Docker Sandbox Works
When you run docker sandbox run claude, it mounts the current directory into the container. Because the agent works directly on those mounted files, its edits show up in your project on your machine as it makes them, even though the agent itself is running inside the container.
Make sure you run that command from your project’s directory, so that the agent has access to your project’s source code.
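If you tend to launch things from a subdirectory, a small wrapper can make sure the whole project is what gets mounted. This is a bash sketch with hypothetical helper names (project_root, sandbox_claude): it finds the top of the current git repository with git rev-parse --show-toplevel and runs docker sandbox run claude from there, inside a subshell so your own shell’s working directory doesn’t change. It assumes your project is a git repository.

```shell
# Hypothetical helpers for launching Docker Sandbox from the project root.
# Assumes the project is a git repository.

# Print the top-level directory of the current git repo, or fail.
project_root() {
  git rev-parse --show-toplevel 2>/dev/null || {
    echo "not inside a git repository" >&2
    return 1
  }
}

# Start Claude Code in Docker Sandbox from the repo root, so the whole
# project is mounted. The subshell keeps your shell's cwd unchanged.
sandbox_claude() {
  local root
  root="$(project_root)" || return 1
  ( cd "$root" && docker sandbox run claude )
}
```

With these sourced from your shell config, running sandbox_claude from any subdirectory of the project starts the sandbox at the repository root.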
Then the container starts Claude Code with the --dangerously-skip-permissions flag, so that Claude never pauses its work to ask whether it can run a command.
The sandbox also maintains a small Docker volume to persist your Claude Code configuration, including your API key and theme color preference.
That’s about all it does. Inside, you get all of the same Claude Code features you know, such as:
- shift+tab to toggle between auto-accepting edits and “plan” mode
- ctrl+enter to queue up follow-up tasks for the agent to perform once it’s done with the current task
I had been planning to set something like this up manually when I came across this experimental feature. To be honest, I was a bit surprised to see Docker treating LLM coding agents as a first-class citizen. It’s not what I’d imagine as a core part of the Docker Desktop product, but it is certainly a useful feature, and it saved me from setting up all of the plumbing myself.
It also got me to update my version of Docker Desktop for the first time in four years, which was probably a good idea. 😅
How I Have Used It So Far
I’ve been using it for a few weeks now, and here is what I’ve found works well.
To make good use of fully autonomous coding agents, make sure you have a good AGENT.md file for your project. Any time you notice you need to nudge the agent toward or away from some behavior, add a rule there so it doesn’t do it again.
This lets you be much more nit-picky than would be appropriate if you were giving the feedback to a junior developer, since there’s no need to help Claude save face or worry about making it feel discouraged. 🤖
Also make good use of the /clear command in Claude Code each time you start a new task. I’ve found that code quality tends to degrade in threads longer than about five messages. If the agent seems to be going down a rabbit hole, reset the work and start a new conversation with a better initial prompt.
Docker Sandbox also passes along your git name and email, in case you want to let Claude make commits on your behalf. I don’t use Claude Code this way, and I don’t recommend it. I’d rather let it make modifications, then review the changes individually and stage them for commit myself. Letting a coding agent commit “as me” seems disingenuous: if my name is going to be on the commit, I want it to be something I have reviewed and approve of.
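After a hands-off session, plain git is enough to see everything the agent touched before anything gets staged. review_agent_changes is a hypothetical helper name for this bash sketch; it simply combines git status --short (which also lists new, untracked files) with git diff --stat (per-file line counts for tracked changes).

```shell
# Hypothetical helper: summarize what an agent changed in the working tree,
# so each change can be reviewed before anything is staged or committed.
review_agent_changes() {
  # One line per changed path: " M" modified, "??" untracked, " D" deleted, etc.
  git status --short
  # Per-file counts of added/removed lines for tracked files.
  git diff --stat
}
```

From there, git diff shows the full changes, and git add -p walks through them hunk by hunk, so you only stage what you have actually reviewed before committing under your own name.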