Illustration: a small, determined knight in weathered medieval armor and a bucket helmet, tattered red cape trailing, strides across cracked earth with sword drawn through a swirling cloud of scattered wooden alphabet letters.

If you've worked with AI coding agents recently (Claude Code, Cursor, Windsurf, Copilot), you've probably noticed something: the quality of their output depends heavily on the context you feed them. Good context in, good code out.

The team at Cuttlesoft has been integrating agents into our development workflow across multiple projects and team members, and that surfaced a practical question: how do we share and maintain the project-specific context that makes these agents more useful?

The answer, for us, was to check that context into our repositories. Rules files, system prompts, project documentation, architectural decision records: all version-controlled alongside the code they describe. This way, when one developer dials in a set of agent instructions that produce reliable results, the rest of the team benefits immediately. It's the same principle behind committing your linter config. If it shapes how the project gets built, it belongs in the repo.

Context Windows Are Large, Not Infinite

Context windows have indeed grown dramatically. Google's Gemini 3 Pro already has a 1 million token window, and Anthropic's Claude Opus 4.6 is testing 1 million tokens in beta. These are big numbers, and it's tempting to treat them as effectively unlimited.

But every token you spend on context is a token you're not spending on output. If you're loading 80K tokens of documentation into a session before the agent writes a single line of code, you've consumed a meaningful chunk of that window. The agent has less room to reason, less room to generate, and less room to iterate. And you're paying that overhead on every single interaction, across every agent session, for every developer on the team.

Developers sometimes refer to this as "lobotomizing" the agent. You've technically given it more information, but in practice, you've given it less room to think. The context was supposed to help, but it ate the space the agent needed to actually do the work.

We wanted a way to keep an eye on this. Not to optimize prematurely, but to make informed decisions about what context we're carrying and what it costs us.

Hello, Token Guard

Token Guard is a GitHub Action we recently published to the GitHub Marketplace. It counts the tokens in your tracked context files on every push or pull request and reports the results directly in your CI pipeline.

The idea is straightforward: you tell it which files or directories contain your context, it counts the tokens, and then writes a summary to your GitHub job output. If you set a threshold and the total exceeds it, the check fails. It uses tiktoken, OpenAI's tokenizer library, and supports glob patterns so you can target exactly the files you care about.

The best part? It works out of the box with zero configuration. The defaults target well-known LLM instruction files (CLAUDE.md, .cursor/rules/, AGENTS.md, .windsurfrules, .clinerules, .github/copilot-instructions.md, common prompts/ directories) with a default limit of 2,500 tokens:

name: Token Guard

on: [push, pull_request]

jobs:
  check-tokens:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cuttlesoft/token-guard@v1

That's it. On every push, Token Guard scans the default patterns, counts the tokens, and writes a markdown table to the job summary showing the breakdown per file and pass/fail status.
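The summary might look something like this (the numbers and layout here are our own illustration, not the action's exact output):

File                      Tokens
CLAUDE.md                  1,240
.cursor/rules/style.md       610
Total                      1,850 / 2,500 (pass)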

Custom Patterns and Limits

Override the defaults to target your specific context directories and set your own budget:

- uses: cuttlesoft/token-guard@v1
  with:
    patterns: |
      docs/diagrams/**/*.md
      prompts/**/*.txt
      !prompts/archive/**
    max_tokens: "4000"

Per-File Mode

If you'd rather enforce limits on each file individually instead of the total sum, flip the mode:

- uses: cuttlesoft/token-guard@v1
  with:
    patterns: |
      prompts/**/*.md
    max_tokens: "1000"
    token_limit_mode: per_file

Token Guard also exposes outputs — total_tokens, file_count, and files_over_limit — so you can wire the results into downstream steps in your workflow.
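For example, you can give the Token Guard step an id and read its outputs in a later step. A minimal sketch (the id and the echo step are our own illustration; the output names come from the action):

- uses: cuttlesoft/token-guard@v1
  id: guard  # arbitrary id so later steps can reference the outputs
- name: Report context size
  if: always()  # run even if the token check failed
  run: echo "${{ steps.guard.outputs.total_tokens }} tokens across ${{ steps.guard.outputs.file_count }} files"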

A Scale, Not a Calculator

One caveat worth calling out: token counting is proprietary and varies across models and providers. OpenAI's cl100k_base tokenizer won't produce the same count as Anthropic's tokenizer or Google's. Claude, for instance, uses its own tokenizer that isn't available in tiktoken at all. Token Guard defaults to cl100k_base as a reasonable general-purpose baseline and supports other tiktoken encodings (o200k_base for GPT-4o, p50k_base, etc.), but whatever it reports is an approximation.
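If your team mostly targets GPT-4o-class models, you can switch the baseline encoding. A sketch, assuming the encoding is selected through an input named encoding (we're guessing at that input name; check the action's README for the real one):

- uses: cuttlesoft/token-guard@v1
  with:
    # "encoding" is an assumed input name here; verify it against the action's docs
    encoding: o200k_base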

Think of it as a bathroom scale, not a lab instrument. It tells you whether your context is 10K tokens or 100K tokens, whether that last PR added 15K tokens of new rules, and whether you're trending in a direction you're comfortable with. In practice, the variance between encodings for small config files is roughly 10-20%. That's close enough for budgeting purposes; just don't use these numbers to calculate your API costs.

The value is in the relative measurement. When a teammate opens a PR that bumps your context from 30K to 55K tokens, that's a conversation worth having, and Token Guard surfaces it automatically.

Why This Matters

The real motivation behind Token Guard isn't about saving a few cents on API calls. It's about collaboration between humans and agents.

When you're a team of developers all working with AI agents on the same codebase, the context those agents consume becomes a shared resource. One person adds a detailed architectural overview to the rules directory. Another adds per-feature specifications. Someone else drops in a style guide. Each addition is individually reasonable, but collectively they can balloon past the point of usefulness.

Without visibility into the cumulative overhead, there's no forcing function to curate. Token Guard gives your team that visibility. It turns "how much context are we carrying?" from a gut feeling into a number that shows up in every PR review.

This is especially relevant as agent-assisted development matures. We're not just using one agent in one IDE anymore. A single project might involve Claude Code for backend work, Cursor for frontend components, Copilot for quick completions, and a custom agent pipeline for code review. Each of these agents ingests your context files. Each one carries its own token overhead. Keeping that intentional and well-understood is just good hygiene, in the same way you'd monitor your bundle size or keep an eye on your dependency count.

Get Started

Token Guard is open source and available on the GitHub Marketplace. Drop it into a workflow, point it at your context directories, and start getting visibility into what your agents are actually consuming.

And if you're figuring out how AI agents fit into your products or your team's workflow more broadly, whether that's LLM integration, RAG pipelines, or custom agent tooling... that's what we do!
