My AI teams: at home & at work

Or: what I learned watching Samwise sort my Gmail.

A bit of a brain dump rather than a polished argument — I've had a lot of thoughts floating around about how quickly all this is moving, especially from inside a big bank, and I wanted to get them down somewhere.

Discord message from the Samwise app: Inbox Once-Over summary in character voice

This is a message from my agent, Samwise. He runs my Gmail now.

Samwise is one of four agents on a Lord of the Rings-themed Discord server I built last weekend. Each one has the personality of its character. Sam writes like a Hobbit, complete with "Mr Kip" and "proper-like". Gwaihir writes like the Lord of the Eagles, which is to say slightly imperious and prone to making pronouncements.

Gandalf and Faramir round out the crew. All four run on a Raspberry Pi 5 in my office: a fresh Linux install, OpenClaw doing the heavy lifting, and OAuth onto my OpenAI subscription so I'm not paying twice for tokens I'm already paying for.

The catalyst was pretty straightforward. A friend at work mentioned they'd started using OpenClaw a couple of weeks ago, and I'd been quietly embarrassed for a while about a specific personal failure: my work email is meticulously organised — every label, every folder, every flag in its place — and my home Gmail had five hundred and something unread emails, no labels, no folders, and no energy to sit down and sort it out by hand. I had a spare afternoon. I gave it a crack.

Gwaihir was first. In Tolkien's world he's a messenger of the Vala Manwë, which felt fitting for an agent whose job is to bring me word. His brief: build out my Discord channels, assign the agents their roles, manage their access — plus read the small dashboard I built that pulls my banking transactions to a local database every night, and at the end of each day, send me a Discord message with a verdict. Some days it's "chuck in an extra couple hundred this pay to savings, things look healthy". Some days it's "easy on it tomorrow, lord". Faramir handles a couple of other things; Gandalf shows up for the bigger questions. (The casting still needs work.)
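For anyone curious what the data side of a nightly check like Gwaihir's could look like, here's a minimal Python sketch. It assumes a hypothetical SQLite table called `transactions` with a signed `amount_cents` column; the real dashboard's schema isn't described here. The helpers `daily_spend_cents`, `verdict`, and `discord_payload` are illustrative names, and the thresholds in `verdict` are stand-ins for what is, in practice, the agent's own judgement call. The `{"content": ...}` dictionary is the shape of a basic Discord webhook message body; actually sending it is left out.

```python
import sqlite3
from datetime import date

# Hypothetical schema for illustration; the real dashboard's tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    posted_on    TEXT    NOT NULL,  -- ISO date, e.g. '2024-01-31'
    description  TEXT    NOT NULL,
    amount_cents INTEGER NOT NULL   -- negative = money out, positive = money in
)
"""


def daily_spend_cents(conn: sqlite3.Connection, day: date) -> int:
    """Total outgoing spend for one day, returned as a positive number of cents."""
    row = conn.execute(
        "SELECT COALESCE(SUM(-amount_cents), 0) FROM transactions "
        "WHERE posted_on = ? AND amount_cents < 0",
        (day.isoformat(),),
    ).fetchone()
    return int(row[0])


def verdict(spent_cents: int, daily_budget_cents: int) -> str:
    """Illustrative thresholds only; an agent would weigh far more context."""
    if spent_cents <= daily_budget_cents // 2:
        return "Things look healthy. Consider an extra transfer to savings."
    if spent_cents <= daily_budget_cents:
        return "On budget today."
    return "Easy on it tomorrow."


def discord_payload(day: date, spent_cents: int, daily_budget_cents: int) -> dict:
    """Build a Discord webhook message body; the HTTP POST itself is not shown."""
    dollars = spent_cents / 100
    return {
        "content": f"{day.isoformat()}: spent ${dollars:,.2f}. "
        + verdict(spent_cents, daily_budget_cents)
    }
```

Keeping the query separate from the judgement means the deterministic part (what was actually spent) stays easy to check, while the `verdict` stand-in can be swapped for the agent's own call.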

By Sunday evening I had four agents running, my home inbox was sorted for the first time in about three years, and Gwaihir had made his first call on my coffee budget. And then I went to work on Monday and realised I'd already been doing this for months.

* * *

I've used Cursor for a few years. It started as a research aid during my master's: citations, context synthesis, pulling threads together across a stack of papers. Solid. Useful. Nothing world-ending. It has come along in leaps and bounds since.

Cursor mid-orchestration: I dispatch subagents at Tasks, Features, and Epics (here, two Code Reviewers working a spec file each), then review their work and guide it where it goes off-track.

What today actually looks like is this. First thing in the morning I run a slash command, /whats-next, which checks my backlog, looks at the most recent context I've curated, and tells me what to pick up. I audit my workspaces. I dispatch groups of subagents at Tasks, Features, and Epics. I review their work, I guide it where it goes off-track, I organise comms. The architecture underneath is real and visible: a centralised content-and-intention library, a fortnightly leadership-update pipeline that scans commit dumps from sibling repos and synthesises them against strategic pillars, a verifier subagent that gates finalised entries, and a locally run dashboard that tails the orchestration log so I can see what's happening across all of it at once.

The shape of my role has changed. I keep thinking of friends I know who studied information systems, who work as librarians — the profession has always been about architecting how knowledge gets organised and retrieved. That is what my role is now. I curate the context. I write the rules that bound how my agents ask for it. I name the skills they use, the slash commands they trigger, the audiences they're writing for. They do the work.

A small internal webapp I've built at work (with a fair chunk of the implementation done by AI agents) that tails my orchestration log: slash commands, how I interact with Cursor, the activity I want to review. Longer term, I'm aiming for a single hub for all my automations.
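The log format isn't described here, so this is a minimal sketch assuming a hypothetical JSON Lines file where each line carries `ts`, `command`, and `status` fields. `LogEntry` and `read_new_entries` are illustrative names, not the webapp's actual code. The idea is the one `tail -f` relies on: remember a byte offset, read only what's been appended since, and leave any half-written last line for the next poll.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LogEntry:
    ts: str
    command: str
    status: str


def read_new_entries(path: Path, offset: int) -> tuple[list[LogEntry], int]:
    """Return entries appended since byte `offset`, plus the new offset.

    Only complete lines are consumed; a partially written final line is
    left for the next poll.
    """
    # If the file shrank below our offset (e.g. it was truncated), start over.
    if path.stat().st_size < offset:
        offset = 0
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Stop at the last newline; rfind returns -1 when there is none, so end = 0.
    end = chunk.rfind(b"\n") + 1
    entries: list[LogEntry] = []
    for raw in chunk[:end].splitlines():
        if not raw.strip():
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than crash the dashboard
        if not isinstance(record, dict):
            continue
        entries.append(
            LogEntry(
                ts=str(record.get("ts", "")),
                command=str(record.get("command", "")),
                status=str(record.get("status", "")),
            )
        )
    return entries, offset + end
```

A web backend could call this on a timer, keep `offset` between polls, and push any new entries to the page; if the file shrinks below the saved offset, the function starts again from the top.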

I'm doing months' worth of work in a week, and it's high quality. I review every output before it ships. I keep finding things to fix — a wrong tone in a stakeholder email, a missing column in a data table, an Epic broken into the wrong shape — so the human-in-the-loop framing still describes what I actually do. But it's getting harder, week on week, to argue that the work needs me to be the one in that loop.

* * *

The trigger moment was a couple of weeks ago. I was setting up a new slash command and reading the documentation for the Cursor SDK — the version that lets you orchestrate agents in the cloud, hands-off, without even opening Cursor itself. And I caught myself drawing the diagram of where my work was heading. The slash commands could migrate to the cloud. They could run on a schedule. The orchestration log already exists. The dashboard already exists. The verifier subagent already exists. The whole flow could run autonomously, and at that point my role — architectural, librarian-y, curatorial — would mostly be... reviewing things.

Which is when the question landed. Am I a human in the loop, or am I a middleman between my executive's intention and a team of agents that does the actual work? Honestly, the answer isn't comforting. At NAB I have a team of five to ten agents underneath me at any given time. They don't have names (I haven't gone in for the LOTR thing at work), but they're set up the same way: subagents with their own skills, their own access to a shared context library, their own slice of a task. They burn out the moment their context window fills, and a fresh one spins up to replace them. They are nameless and transient and they do the work.

Currently I see myself as the human in the loop. More and more I wonder whether the loop needs me. The architecture isn't waiting for me to decide. The Cursor SDK is already shipping. Half the slash commands I'd want to migrate are migration-ready. My executive's intention already has more direct routes to my agents than the one that runs through me.

It's a hard look at yourself, that question. What's left for me?

* * *

It turns out I'm not the only one being asked that question. And the people asking it aren't waiting for me to answer.

When I was in my mid-twenties, I was on the phones at NAB — Direct Servicing, fielding inbound calls from a few thousand colleagues. I'd just come from doing payroll at Country Road and I'd used spreadsheets a decent amount, so when someone needed fifteen minutes of daily reporting done on the side, I put my hand up. That fifteen-minute gig followed me. Off the phones into international value servicing. Then resource planning. Then workforce planning. Then analytics. And now data science. I'm grateful for the on-ramp. It gave me fundamentals, and more importantly it gave me a generous amount of room to make mistakes that I learned from. That on-ramp doesn't exist any more. The fifteen-minute reporting role is an agent now. It's not waiting on someone in their mid-twenties who's nifty in Excel.

A senior exec at NAB and I were having a coffee a few weeks ago, trying to do a prescient thing — what comes next? Their point: ninety-nine per cent of our security risk comes from third-party SaaS vendors. But we're already paying for squads of software engineers, internal tooling, the lot. If the cost of building software has collapsed, which it has, the question they're starting to ask is: why aren't we just building our own HRIS? It's not a hypothetical. The pricing assumptions of every enterprise SaaS company are quietly being recalculated, alongside the risk they bring to a customer's security posture.

The home stack: a Raspberry Pi 5 in the official case, where the LOTR crew and OpenClaw actually run.

On 5 May, Anthropic shipped ten ready-to-run agent templates for financial services — pitch builder, KYC screener, month-end closer, statement auditor, the lot. Plus add-ins for Excel, PowerPoint, Word, and Outlook. The Outlook add-in is described, in their own words, as "a chief of staff that triages your inbox, arranges meetings, and drafts responses in your voice." Citadel is using it. BNY is using it. Carlyle, Mizuho, Walleye, Hg. That description is functionally identical to Samwise. I built mine on a Raspberry Pi 5 in my office with a Discord webhook and an OpenAI subscription. Anthropic's version is being sold by the seat to hedge funds. They are the same product.

This week I replaced a PowerPoint deck with an HTML site generated in Cursor. The time saving was so immense that I sat down afterwards and thought about David Graeber's Bullshit Jobs. The pptx slingers don't grasp it yet. Presentation development is going to be auxiliary, automated work in months, not years. The models don't need to get any better. The applications do.

* * *

Monday morning

Home office desk with curved monitor, keyboard, mouse, and mug in warm lighting

So this is where it leaves me on any given Monday morning. Last week's email was "the Cursor SDK is here." This week's was "agents for financial services are here." Next week's will be something else. I open my laptop, I take a step back, and I learn the new thing. Sometimes I set an agent on it to teach me about it, which is its own kind of joke.

My personal team has grown beyond the LOTR crew. There's another agent that goes off to bill-comparison sites once a month and tells me whether my internet, electricity, and insurance are still good deals. And here I am — I don't even have any kids, I don't have to make dinner for anyone other than myself. I've outsourced the admin of a cosy life, and the hours that came back are real and noticeable.

I feel lucky. My work, the people I read, the platforms I'm on — they've kept me close enough to all of this to keep up. A lot of people aren't going to be that lucky. The tools are arriving faster than the people they're going to replace can be retrained, and the people who are pretending to have an answer aren't paying close enough attention.

What are the new pathways and opportunities? How can they be designed so they aren't outstripped by AI development? What 'new jobs' will emerge?

But here's the part I don't want to lose in all of this: I find it genuinely exciting. The pace is dizzying, the tools are extraordinary, and I get to spend my days experimenting at the edge of something almost no one was doing a few years ago — months ago — weeks ago. Working at a big bank, where tech change has historically moved slowly, makes the contrast even sharper — and a lot more interesting.

I'm experimenting with it. I'm trying to learn from it.

It's going to be a tricky time. It's also going to be a fascinating one.