IBM Bob Needs a Context Budget Before It Needs More Tools

How GitHub MCP scope and diff size change cost, focus, and reliability in day-to-day engineering work.

Jun 06, 2026

Open IBM Bob or your coding assistant of choice, look at the token counter in the top-right corner, then connect a large MCP server. You can spend a meaningful part of the context window before you ask Bob to do real work.

I like MCP. I do not like acting as if it has no cost.

The current Bob context window docs are unusually direct about this. Bob gets a 200,000-token context window, starts condensing the conversation at 140,000, and includes MCP tool definitions in that same budget. The current Bob MCP docs are just as blunt: disable unused tools because their definitions consume context.

For me, that changes the useful question. Whether Bob can reach GitHub is settled. The more useful question is “what did I put into the model before it even looked at my code?”

I wanted a number I could check, so I measured the current GitHub MCP Server surface and compared it with a different kind of context waste: raw git output from a real dirty repository on my machine. It is my development repository for The Main Thread, and it has a ton of local changes that only get pushed to GitHub once in a while. That makes it a good test bed for the question “is MCP bad?” A broad MCP server spends budget up front. A git or gh workflow spends budget later, when you push large outputs into the same chat. This article is about where that context budget goes.

What you need

You do not need a benchmark setup for this. You need one real repository, one Bob window, and a willingness to look at the token counter before trusting your first impression.

IBM Bob (sign up for a free trial if you like), with access to the current docs and the token counter in the chat panel
A repository with real local changes, not a staged toy example
Plain git
Optional: GitHub CLI if you want to apply the same narrowing pattern to pull request work
About 20 minutes

As of the date of writing, the current GitHub MCP Server configuration guide says the default toolsets are context, issues, pull_requests, repos, and users. The current feature flag docs also show how that surface can expand when you opt into more granular issue and pull request tools.

My token estimates here use o200k_base tokenization against the official GitHub MCP tool snapshots and the current local git payloads in that repository. o200k_base is the BPE (byte pair encoding) vocabulary that OpenAI introduced with GPT-4o and uses in several later models.

Bob’s exact live tokenizer almost certainly differs a little. The underlying problem does not.
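
If you want to reproduce estimates like these, here is a minimal sketch. It assumes the third-party tiktoken package, which ships the o200k_base encoding; that is an assumption about tooling, not a description of how the numbers in this article were produced. The count_tokens name and the four-characters-per-token fallback are illustrative only.

```python
def count_tokens(text: str) -> int:
    """Estimate how many tokens a piece of text would occupy.

    Uses tiktoken's o200k_base encoding when the package is installed and its
    vocabulary can be loaded. Otherwise falls back to a rough heuristic of
    about four characters per token, which is only an approximation.
    """
    try:
        import tiktoken  # third-party: pip install tiktoken

        encoding = tiktoken.get_encoding("o200k_base")
        # disallowed_special=() treats strings such as "<|endoftext|>" as plain text.
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # Assumed fallback for environments without tiktoken or network access.
        return (len(text) + 3) // 4
```

Treat whatever this returns as an estimate. It tells you the order of magnitude, which is what matters for budgeting.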

The quiet cost shows up before the task starts

The first cost is the catalog tax. I mean the tokens you spend on tool definitions before the task starts.

Bob’s docs say MCP tool definitions live inside the context window. The GitHub MCP docs say the default server surface includes five toolsets. So I pulled the official GitHub MCP tool snapshots and counted them.

The default GitHub MCP toolsets came out to about 17,201 tokens.

That is already a non-trivial slice of Bob’s budget:

About 8.6 percent of the full 200,000-token window
About 12.3 percent of the 140,000-token condensation threshold

The broader official snapshot surface I measured landed at about 37,787 tokens. That is roughly 18.9 percent of the full window and 27.0 percent of the condensation threshold. A few always-on servers are enough to reduce the free space quite a lot.
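
Those percentages are plain division. A minimal sketch, using only the figures quoted in this article, makes it easy to plug in your own measurements. The constant and function names are illustrative, not anything Bob exposes.

```python
WINDOW_TOKENS = 200_000        # Bob's documented context window
CONDENSE_AT_TOKENS = 140_000   # documented condensation threshold


def budget_share(tokens: int) -> tuple[float, float]:
    """Return (percent of the full window, percent of the condensation threshold)."""
    return (
        round(100 * tokens / WINDOW_TOKENS, 1),
        round(100 * tokens / CONDENSE_AT_TOKENS, 1),
    )


if __name__ == "__main__":
    for label, tokens in [("default toolsets", 17_201), ("broader snapshot", 37_787)]:
        of_window, of_threshold = budget_share(tokens)
        print(f"{label}: {of_window}% of window, {of_threshold}% of threshold")
```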

The default GitHub buckets were not evenly sized, either:

repos cost about 7,841 tokens
pull_requests cost about 5,093 tokens
issues cost about 3,541 tokens
context and users were cheap by comparison at about 403 and 323

That breakdown matters because it gives you a practical way to be more careful. If the task is code review, I would rather load a narrow GitHub review surface. I do not want repository, issue, discussion, action, and write-heavy tools in the same task just because “GitHub” sounds like one thing.

The current GitHub MCP Server configuration guide supports exactly that kind of narrowing with X-MCP-Toolsets, X-MCP-Tools, X-MCP-Exclude-Tools, and read-only mode. The GitHub docs are making the same point as the Bob docs: shape the surface on purpose.

If I were setting up a review-oriented GitHub MCP connection in Bob, I would start closer to this than to “just enable everything”:

{
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": {
        "X-MCP-Toolsets": "issues,pull_requests",
        "X-MCP-Readonly": "true",
        "X-MCP-Exclude-Tools": "create_pull_request,merge_pull_request"
      }
    }
  }
}

I am only showing the headers that shape the surface. Add authentication the same way you already do for your GitHub MCP setup. My point is the surface shape, not your secret-management style.
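
To see what a given X-MCP-Toolsets value costs before you commit to it, you can sum the per-toolset estimates from earlier in this article. This is a sketch under that assumption: the table holds my measured numbers, not values the server reports, and surface_cost is a hypothetical helper name.

```python
# Per-toolset estimates quoted earlier in this article (o200k_base tokens).
TOOLSET_TOKENS = {
    "repos": 7_841,
    "pull_requests": 5_093,
    "issues": 3_541,
    "context": 403,
    "users": 323,
}


def surface_cost(toolsets_header: str) -> int:
    """Sum the estimated catalog cost for a comma-separated toolsets value."""
    names = [name.strip() for name in toolsets_header.split(",") if name.strip()]
    unknown = [name for name in names if name not in TOOLSET_TOKENS]
    if unknown:
        raise KeyError(f"no measurement for toolsets: {unknown}")
    # set() avoids double counting if a toolset is listed twice.
    return sum(TOOLSET_TOKENS[name] for name in set(names))
```

With these numbers, "issues,pull_requests" comes to 8,634 tokens, about half of the 17,201-token default. Read that as an upper bound for the configuration above: read-only mode and excluded tools should shrink it further, but I did not measure by how much.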

This is also where Bob’s project-level .bob/mcp.json support matters more than teams admit. A global MCP setup gets crowded very quickly. A project-level setup is more likely to reflect what that repository is actually doing.

Git and gh also cost tokens

Skipping MCP and using git sounds simpler. The real picture is more mixed than that.

The second cost is the payload tax. I mean the tokens you spend when commands return large outputs.

git and gh feel lighter because Bob does not need to carry a large tool catalog before the task starts. That is true. The problem comes later, when people spend that saving immediately by asking the agent to read a very large patch all at once.

I measured my current dirty working tree in my test repository because it has the kind of mess that shows bad habits clearly: edits, deletions, untracked publishing files, and generated assets.

I did not force a separate gh benchmark for the local-change case because that is not where gh is most useful. For “what changed in my working tree?” the useful comparison is broad MCP surface versus ordinary git. For pull request work, gh behaves much more like git than like MCP. It has almost no upfront catalog cost, and then a payload cost only when you ask for output.

The small commands were genuinely small:

git status --short came out to about 817 tokens
git diff --name-only came out to about 619 tokens
git diff --stat came out to about 655 tokens
git status --porcelain=v2 was fatter, but still small at about 3,261 tokens

The full patch was much larger:

git diff came out to about 82,747 tokens
git diff --unified=0 was no better here, and in fact slightly larger, at about 83,206 tokens
Reading the current untracked text files as raw content came out to about 83,568 tokens
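
A small harness can repeat this comparison in your own repository. This is a sketch, not the script behind the numbers above: rough_tokens is a crude stand-in for a real tokenizer (about four characters per token), and measure, run_git, and COMMANDS are illustrative names.

```python
import subprocess
from typing import Callable


def run_git(args: list[str], cwd: str = ".") -> str:
    """Run a git subcommand in cwd and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, check=True,
        encoding="utf-8", errors="replace",
    )
    return result.stdout


def rough_tokens(text: str) -> int:
    """Crude estimate: about four characters per token."""
    return (len(text) + 3) // 4


def measure(
    commands: list[list[str]],
    run: Callable[[list[str]], str] = run_git,
    count: Callable[[str], int] = rough_tokens,
) -> list[tuple[str, int]]:
    """Return (command, estimated tokens) pairs, cheapest first."""
    sizes = [("git " + " ".join(args), count(run(args))) for args in commands]
    return sorted(sizes, key=lambda pair: pair[1])


COMMANDS = [
    ["status", "--short"],
    ["diff", "--name-only"],
    ["diff", "--stat"],
    ["status", "--porcelain=v2"],
    ["diff"],
    ["diff", "--unified=0"],
]

# Inside a repository:
#   for command, tokens in measure(COMMANDS):
#       print(f"{tokens:>8,}  {command}")
```

Pass a real tokenizer as count if you want numbers comparable to the ones in this article.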

That means a local-change prompt can use about 41 percent of the window, and about 59 percent of the condensation threshold, even without MCP if you choose the broadest possible output.

The combined numbers are more serious:

Default GitHub MCP surface plus the current tracked diff came out to about 99,948 tokens
Broader GitHub MCP surface plus the current tracked diff came out to about 120,534 tokens
Default GitHub MCP surface plus tracked diff plus current untracked text content came out to about 183,516 tokens
Broader GitHub MCP surface plus tracked diff plus current untracked text content came out to about 204,102 tokens

That last number is already beyond Bob’s documented 200,000-token window.
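
The combined figures are simple sums of the catalog and payload numbers above. A minimal sketch of that arithmetic, with illustrative names, so you can swap in your own measurements:

```python
WINDOW_TOKENS = 200_000        # Bob's documented context window
CONDENSE_AT_TOKENS = 140_000   # documented condensation threshold

# Estimates quoted in this article (o200k_base tokens).
CATALOGS = {"default GitHub MCP": 17_201, "broader GitHub MCP": 37_787}
PAYLOADS = {"tracked diff": 82_747, "untracked text": 83_568}


def classify(total: int) -> str:
    """Say where a token total lands relative to Bob's documented limits."""
    if total > WINDOW_TOKENS:
        return "over window"
    if total > CONDENSE_AT_TOKENS:
        return "past condensation threshold"
    return "fits"


def combined(catalog: str, payloads: list[str]) -> tuple[int, str]:
    """Add one tool catalog to one or more payloads and classify the total."""
    total = CATALOGS[catalog] + sum(PAYLOADS[name] for name in payloads)
    return total, classify(total)
```

These totals count only tool definitions and command output. The conversation itself and Bob's replies also need room, so the real headroom is smaller than "fits" suggests.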

One result is useful and a little absurd. One deleted SVG in my repository accounted for about 37,787 tokens by itself. That single asset was roughly the same size as the broader GitHub MCP snapshot surface I measured.

This is why I do not like the simple “use the CLI instead” advice. The CLI does not solve the problem by itself. It is only a different way to waste the budget if you choose broad outputs too early.

Start local-change analysis with the cheap questions

If the prompt is “analyze my local changes,” I would not start with a full patch unless the repository is tiny or I already know the diff is clean text.

I would start like this:

git status --short
git diff --stat
git diff --name-only

That gives Bob three useful things at almost no cost:

Which files changed
Rough size by file
Whether the mess is concentrated or spread out

After that, narrow on purpose:

git diff -- path/to/file
git diff -- path/to/second-file

That pattern is boring, and that is good here. You are asking the model to tell you where deeper attention belongs before you give it the expensive context.
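
If you want the ranking step to be mechanical, git diff --numstat gives tab-separated added and deleted line counts per file, which is easy to sort. The sketch below is one way to do that. rank_changed_files is a hypothetical helper, and changed lines are only a proxy for token cost: a file with very long lines, such as an SVG, can cost far more than its line count suggests.

```python
from typing import Optional


def rank_changed_files(numstat_output: str) -> list[tuple[str, Optional[int]]]:
    """Parse `git diff --numstat` output into (path, changed_lines) pairs.

    Text files come first, largest change first. Binary files, which git
    reports with "-" in both count columns, get None and go last.
    """
    rows: list[tuple[str, Optional[int]]] = []
    for line in numstat_output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            rows.append((path, None))  # binary file: no line counts
        else:
            rows.append((path, int(added) + int(deleted)))
    return sorted(rows, key=lambda row: (row[1] is None, -(row[1] or 0)))
```

Anything near the top of that list deserves a second look before you run git diff on it with Bob watching.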

The same principle applies to GitHub CLI.

If the work is really about a pull request, I would rather start with small remote views, such as the metadata from gh pr view or the file list from gh pr diff --name-only, than dump a full PR diff into the chat on turn one. The exact command matters less than the sequence:

Ask what changed
Ask where the risky areas are
Read only the files that matter
Escalate to the full diff only if the first three steps justify it
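
The same sequence can be sketched for pull requests. This example assumes the JSON shape I expect from gh pr view --json files, a files array whose entries carry path, additions, and deletions; check it against your gh version before relying on it. plan_pr_read and the 300-line limit are hypothetical choices, not recommendations from the gh or Bob docs.

```python
import json


def plan_pr_read(pr_json: str, full_diff_limit: int = 300) -> dict:
    """Decide from PR file metadata whether a full diff is justified.

    pr_json is assumed to look like:
    {"files": [{"path": "...", "additions": 1, "deletions": 2}, ...]}
    """
    files = json.loads(pr_json).get("files", [])
    sized = sorted(
        ((f["path"], f.get("additions", 0) + f.get("deletions", 0)) for f in files),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(size for _, size in sized)
    return {
        "total_changed_lines": total,
        "read_full_diff": total <= full_diff_limit,  # escalate only for small PRs
        "largest_first": [path for path, _ in sized],
    }
```

The point is the order of operations: metadata first, a decision second, and the expensive diff only when the decision says so.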

Here is the main operating rule in one sentence: narrow the catalog first, then narrow the payload.

This is not an argument against MCP

I would still use GitHub MCP for workflows where it fits well.

It fits well when the work is mainly about GitHub:

Reading issue state and comments
Reviewing a pull request with structured operations
Writing a review comment or updating GitHub state directly
Repeating the same repository workflow often enough that the upfront catalog cost is worth it

I would lean on git or gh first when the work is structurally local:

“Analyze my working tree”
“Tell me what changed before I commit”
“Which files deserve review first?”
“Did I accidentally mix three tasks into one diff?”

This is the part I think teams mix together. They install a broad GitHub server globally, then ask a local-change question, then feel surprised that the agent is carrying a lot of GitHub machinery into a task that mostly needed git status, git diff --stat, and some restraint.

The experiment I would actually run

If you want to make this concrete in your own setup, run the same three tasks through three different Bob surfaces.

Use these setups:

No GitHub MCP server at all. Let Bob use normal file and command tools with git.
A lean GitHub MCP setup with only the toolsets needed for review work.
A broad GitHub MCP setup that looks convenient but loads a lot of extra surface.

Use tasks shaped like these:

Analyze my current local changes and tell me which three files deserve deeper review first.
Summarize the risk in the deleted or moved areas.
Tell me what I need to read before I touch the GitHub workflow in this repository.

Watch three things:

The token counter before the first real action
The first tool or command Bob reaches for
Whether Bob narrows the problem or tries to swallow the whole repository at once

I do not expect one setup to win every task. The right setup changes with the job. Broad always-on configurations fail more often than teams want to admit.

The default I would keep

If I were writing one rule for a team, it would be simple.

Keep GitHub MCP project-specific whenever you can. Keep the toolsets narrow. Prefer read-only for review-style work. Do not feed full diffs into the first turn. Treat generated assets as suspiciously expensive until proven otherwise. Ask the agent to rank files before it reads them in full.

That is also why I like Bob’s current docs on this point. They do not pretend the model can solve bad context hygiene by itself. The docs say the window is finite, tool definitions consume tokens, and disabling tools helps. Good. That is the more useful way to talk about this.

MCP is not the enemy. git is not the savior. Unbudgeted context is the problem. Sometimes that waste arrives as a large set of tool definitions. Sometimes it arrives as an 80k-token diff blob. Usually it arrives because nobody decided what the agent actually needed for this task.

This is also why I do not find “we just need a bigger context window” very convincing. A larger window can help in some cases, but it also lets bad context stay alive for longer. That includes stale instructions, irrelevant tool definitions, oversized diffs, and weak intermediate summaries that keep pushing the model in the wrong direction. I think of that as context window poisoning: the window is full, but too much of it is low-value, misleading, or simply old. A bigger window does not fix that. It often hides the problem for a while, increases cost, and delays the moment when the team learns to narrow the tool surface and the payload. Most of the time, better selection beats more storage.

Conclusion

The useful mental model here is “catalog tax versus payload tax.” A broad GitHub MCP server can use tens of thousands of tokens before Bob touches your code, and a broad local git diff workflow can use just as much a minute later. The fix is simpler than the demos: narrower tool surfaces, narrower outputs, and a little more honesty about what should compete for attention in the same 200,000-token window.
