Slopbox: A sandbox for our AI slop :)


I don’t normally do this … but … occasionally I want to let an AI agent run unsupervised on a task: to generate a throwaway POC, explore an implementation idea, or hack on something while I’m doing other work. Even with each Agent’s permission system, giving it free rein over my repo feels uncomfortable. One bad rm -rf or overzealous refactor and I’m picking through git reflog trying to recover my afternoon. Not to mention the damage it could do to my environment, or the things it could access that it shouldn’t.

So I built a small tool called slopbox for running AI agents in isolated Docker containers, each working on a copy of my repo. This way I’m free to merge changes in (or throw them away) without affecting my main worktree. It was initially inspired by Dagger’s container-use, but slopbox is much more in tune with my workflow and the tools I use. Also, I don’t trust an Agent to do something (consistently) just because a markdown file tells it to. I need stronger guarantees.

How it works

Slopbox flow diagram

Slopbox creates disposable sandboxes using Docker. Each sandbox gets:

  • A separate git clone of my repo (not the actual repo)
  • A shared Nix store mounted read-only from a daemon container (yes, I don’t want it touching my host’s Nix store either)
  • My devenv environment, so tools “just work”

When I run slop my-feature, it creates a branch called agent/my-feature, clones my repo to ~/.cache/slopbox/worktrees/, and drops me (or the agent) into a container. The agent can commit, install packages, break things…whatever. My real repo and environment stay untouched.
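To make that flow concrete, here’s a minimal bash sketch of what slop my-feature might do. It is not slopbox’s actual code. The image name slopbox:latest, the daemon container name slopbox-nix-daemon and the /workspace mount point are all assumptions, and docker’s --volumes-from NAME:ro is just one plausible way to share the daemon container’s Nix store read-only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of `slop <name>`; not slopbox's actual code.
set -euo pipefail

# Cache location from the post; overridable (e.g. for testing).
SLOP_WORKTREES="${SLOP_WORKTREES:-$HOME/.cache/slopbox/worktrees}"
# Hypothetical names for the sandbox image and the Nix daemon container.
SLOP_IMAGE="${SLOP_IMAGE:-slopbox:latest}"
SLOP_NIX_DAEMON="${SLOP_NIX_DAEMON:-slopbox-nix-daemon}"

# Branch naming convention from the post: agent/<name>.
slop_branch() {
  printf 'agent/%s\n' "$1"
}

# Directory holding the sandbox's private clone.
slop_worktree_dir() {
  printf '%s/%s\n' "$SLOP_WORKTREES" "$1"
}

# Clone the repo into the cache and create the agent branch in the clone.
# The original repo is only read from, never modified.
slop_prepare_clone() {
  local repo="$1" name="$2" dest
  dest="$(slop_worktree_dir "$name")"
  mkdir -p "$SLOP_WORKTREES"
  git clone --quiet "$repo" "$dest"
  git -C "$dest" checkout --quiet -b "$(slop_branch "$name")"
  printf '%s\n' "$dest"
}

# Start a disposable container: the clone is the only writable mount, and the
# daemon container's volumes (the shared Nix store) are mounted read-only.
# Activating the devenv environment inside the container is not shown.
slop_run() {
  local name="$1"
  docker run --rm -it \
    --name "slop-$name" \
    --volumes-from "$SLOP_NIX_DAEMON:ro" \
    -v "$(slop_worktree_dir "$name"):/workspace" \
    -w /workspace \
    "$SLOP_IMAGE" bash
}

# Entry point: clone the current repo (from its top level), then open the sandbox.
slop() {
  local name="$1"
  slop_prepare_clone "$(git rev-parse --show-toplevel)" "$name" >/dev/null
  slop_run "$name"
}
```

Because the clone is a plain git clone, only work that’s already committed in the main repo shows up inside the sandbox.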

When I’m done, slop diff my-feature shows what’s changed. If I like it, slop apply my-feature merges it back. If not, slop gc deletes everything.
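The review step can be sketched the same way. This is again hypothetical bash rather than slopbox’s real implementation. It assumes each sandbox clone lives at ~/.cache/slopbox/worktrees/<name> on an agent/<name> branch, as described above, that slop_diff and slop_apply run from inside the main repo, and that containers were started with --rm so only the clones need cleaning up.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of `slop diff`, `slop apply` and `slop gc`;
# not slopbox's actual implementation.
set -euo pipefail

SLOP_WORKTREES="${SLOP_WORKTREES:-$HOME/.cache/slopbox/worktrees}"

slop_worktree_dir() {
  printf '%s/%s\n' "$SLOP_WORKTREES" "$1"
}

# Run from the main repo: fetch agent/<name> out of the sandbox clone and show
# committed changes since the branch forked (three-dot diff against the merge base).
slop_diff() {
  local name="$1"
  git fetch --quiet "$(slop_worktree_dir "$name")" "agent/$name"
  git diff HEAD...FETCH_HEAD
}

# Run from the main repo: copy the agent branch in as a local branch
# and merge it into whatever is currently checked out.
slop_apply() {
  local name="$1"
  git fetch --quiet "$(slop_worktree_dir "$name")" "agent/$name:agent/$name"
  git merge --no-ff --no-edit "agent/$name"
}

# Delete every sandbox clone. Local agent/* branches created by slop_apply
# are left alone.
slop_gc() {
  rm -rf -- "${SLOP_WORKTREES:?}"/*
}
```

One caveat with this approach: slop_diff only sees what the agent committed, so uncommitted edits left in the clone won’t show up until they’re committed.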

This is an experiment

This is super experimental and exploratory! It may change or break in unexpected ways - or I may abandon it completely. I rarely run unsupervised agent loops right now. Most of my Agent usage is interactive - I’m watching what it does, reviewing diffs, steering it when it goes off course. But I want the option to spin up a sandbox, point an agent at a problem, and check back later.

The use case I keep coming back to is throwaway POCs. “Hey Claude, prototype a CLI that does X” or “sketch out how we’d add Y to this service.” Things where I don’t care about code quality, I just want to see if an idea is feasible. Having the agent work in an isolated clone means I can let it run wild without worrying about the mess.

Future directions

Docker containers provide decent isolation for my use case. I don’t think I need much stronger isolation than that, nor am I planning to restrict its network access or anything like that. That said, I did think about other approaches:

Bubblewrap - The sandboxing tool Flatpak uses. It’s lighter weight than Docker and gives me fine-grained control over namespaces and bind mounts. The only reason I’m considering it is that it could be faster to spin up and tear down.

Full NixOS VMs - Since I’m already running NixOS on my laptop, I could theoretically replicate my entire workstation inside a VM (VirtualBox, QEMU, whatever). The Nix configuration is already there; my system is declarative. This would give stronger isolation than containers (separate kernel, no shared daemon) at the cost of more overhead. Might be worth it for truly untrusted workloads. Realistically I don’t see this happening, since it would be even slower than the current approach and that irks me. But we’ll see. Maybe if it were a single long-running VM for all agent workflows rather than one per agent.

Neither of these exist yet. Just ideas I’m mulling over.

I may also consider writing a proper CLI in a proper programming language. Right now it’s just a bunch of bash scripts distributed via devenv. Honestly, if devenv had this built in, that would be kinda cool I guess. But I don’t think the idea has fully crystallized yet, and bash is fine for experimentation. Ironically, I’m totally fine with bash now that I’m not the one writing it - the Agent is.

Drawbacks

This tool is tightly coupled to my setup and a very specific way I work. I use devenv and Nix for most of my projects. If you’re not already in that ecosystem, the onboarding cost is high. There are probably simpler solutions for your use case.

Other issues:

  • Slow - Even with caching, sandbox startup is slower than running locally. First run for a project is especially painful as it populates the Nix store. The good news is that subsequent runs (even for other projects) are much faster.
  • Disk hungry - Each worktree is a full clone. The shared Nix store grows over time. Run slop gc periodically.
  • Complexity - Docker daemon, Nix daemon, volumes, git mirrors. More moving parts means more things that can break.
  • Manual agent setup - There’s an instructions file at /etc/slop/instructions.txt that explains the environment to agents running inside it, but I have to manually tell them to read it. Haven’t thought about how to automate that yet.
  • Multi-agent support - Right now I mostly run claude, but I’d like to have custom support built in for each major CLI agent. Not having to remember --dangerously-skip-permissions would also be nice.
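The last two drawbacks could plausibly be handled by a small launcher inside the sandbox. This is a hypothetical bash sketch, not part of slopbox. It assumes Claude Code accepts an initial prompt as a positional argument, and the function names slop_agent_cmd and slop_agent are made up for illustration. Only claude is handled; every other agent would need its own case.

```bash
#!/usr/bin/env bash
# Hypothetical launcher, not part of slopbox: start an agent inside the
# sandbox with the environment notes already pointed out to it.
set -euo pipefail

SLOP_INSTRUCTIONS=/etc/slop/instructions.txt
SLOP_CMD=()

# Build the command for the given agent into the SLOP_CMD array
# (an array keeps a multi-word prompt as a single argument).
slop_agent_cmd() {
  local agent="$1" task="$2"
  local prompt="Read $SLOP_INSTRUCTIONS before doing anything else. Then: $task"
  case "$agent" in
    claude)
      SLOP_CMD=(claude --dangerously-skip-permissions "$prompt")
      ;;
    *)
      echo "slop: no launcher for agent '$agent'" >&2
      return 1
      ;;
  esac
}

# Replace the current shell with the agent process.
slop_agent() {
  slop_agent_cmd "$@"
  exec "${SLOP_CMD[@]}"
}
```

Usage would look like slop_agent claude "prototype a CLI that does X", run from inside the container so the permission bypass only ever applies to the sandbox.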

I’m sharing this mostly for my own documentation, because I promised myself I’d write more this year and because I might gain more insight by writing things down. Maybe it’s useful as a reference for others thinking about similar problems.

Check it out in all its bash glory at: github.com/denibertovic/slopbox.
