Tari Ibaba

Tari Ibaba is a software developer with years of experience building websites and apps. He has written extensively on a wide range of programming topics and has created dozens of apps and open-source libraries.

This incredible IDE upgrade lets you always know the best coding model to use

Wow this is huge.

Windsurf just completely revolutionized AI coding with this incredible new IDE feature.

With the new Arena Mode in Windsurf, you can finally know exactly how strong your coding models are…

By pitting them against each other inside your actual project, on the same prompt, at the same time—and then you just pick the winner with absolute clarity.

No more guessing or going off vibes like many developers are still doing.

No “someone on YouTube said this model is better”…

Just:

Which one actually helped me more right now, for this particular use case?

What Arena Mode actually does

Normally when you use a particular coding model you’re making the assumption that it’s one of the best for the job.

Arena Mode challenges that head on.

With Arena Mode, Windsurf spins up multiple parallel Cascade sessions, each powered by a different model, and runs them side-by-side on the same task. You see the outputs next to each other, compare the approaches, and decide which one wins.

Once you choose, Windsurf keeps going with the winner and drops the rest. Simple.

It sounds small, but it completely changes how you think about model choice.

The underrated magic: isolated git worktrees

This is what makes Arena Mode feel legit instead of gimmicky and unstable.

Each model runs in its own git worktree. That means:

  • No stepping on each other’s changes
  • No weird merge situations
  • No “wait, which model edited this file?”

You can actually try two different solutions—accept changes, inspect diffs, and judge them like real code—because they are real code.
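
If you haven’t used git worktrees before, here’s a rough sketch of the plain-git idea Windsurf builds on (the paths and branch names are made up for illustration; Windsurf manages all of this for you):

Shell
# One repo, two independent working directories that share the same history.
git worktree add ../myapp-model-a -b arena/model-a   # hypothetical path and branch
git worktree add ../myapp-model-b -b arena/model-b

# Each model edits its own directory, so the diffs never collide:
git -C ../myapp-model-a diff
git -C ../myapp-model-b diff

# When you pick a winner, the losing worktree can simply be removed:
git worktree remove ../myapp-model-b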

This alone makes Arena Mode way more useful than copying prompts between tools or tabs.

Battle Groups: removing brand bias

We need this because many of us are biased towards certain models by default…

Instead of picking specific models, you choose a group and Windsurf randomly selects contenders for you.

You don’t see which model is which until after you vote.

So you’re forced to judge based on:

  • clarity
  • correctness
  • style
  • how well it fits your codebase

Not reputation.

Once you pick a winner, Windsurf reveals the models and reshuffles things for the next round. It’s part productivity tool, part science experiment.

Your choices actually matter (beyond your editor)

Arena Mode isn’t just for your local workflow.

Every time you pick a winner:

  • Windsurf builds a personal leaderboard that reflects what works best for you
  • Your votes also feed into a global leaderboard based on real coding tasks

That’s the big idea behind this new wave of Windsurf updates: model evaluation shouldn’t live in abstract benchmarks. It should come from actual developers, working in real repos, solving real problems.

Arena Mode turns everyday coding into feedback.

When Arena Mode shines the most

Arena Mode is especially valuable when:

  • You’re doing a non-trivial refactor and want different approaches
  • You’re debugging something gnarly and want multiple hypotheses
  • You’re testing a new or unfamiliar model without committing to it
  • You want to compare “safe and boring” vs “bold and clever” solutions

It’s less useful for tiny edits, but for anything that requires judgment, tradeoffs, or taste, it’s great.

A couple things to know before jumping in

It’s not totally plug-and-play:

  • Your project needs to be a git repo
  • Only git-tracked files are copied into Arena worktrees by default (extra setup needed for untracked files)

Obviously not deal-breakers, but worth knowing so you don’t get confused the first time you try it.
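
A quick way to check both of those before you launch an Arena session (standard git commands, nothing Windsurf-specific):

Shell
git rev-parse --is-inside-work-tree   # prints "true" inside a git repo
git ls-files                          # the tracked files an Arena worktree will copy
git status --short                    # untracked files show up as "??"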

Even more upgrades

Windsurf has also added a new Plan Mode feature for step-by-step planning before code generation.

It pairs nicely with Arena:

  1. Plan the solution
  2. Let two models implement it
  3. Keep the better one

Simple and short.

Instead of telling you which model is best, with Arena Mode Windsurf is saying:

“You decide. In your code. On your problems.”

Did OpenAI just kill Cline?!

Is this really how OpenAI wants to compete now?

OpenAI didn’t buy Cline.
They didn’t shut down Cline.
They didn’t even ship a press release announcing anything at all.

Instead, they did something more chilling — and arguably more effective.

They hired away a critical chunk of the people who made Cline matter.

What does OpenAI really think it’s doing to the open-source developer ecosystem?

Because this looks like a familiar pattern:

observe an open project validate a product category → hire the people who built it → internalize the insight → leave the project hollowed out.

No acquisition. No announcement. No responsibility.

Just LinkedIn updates and a lot of unanswered questions.

If this isn’t killing Cline, what is it?

Technically, Cline still exists. The repository is live. The license hasn’t changed. No one flipped a switch and turned it off.

But open-source projects don’t survive on legal existence alone.

They survive on momentum, credibility, and the belief that the people who actually understand the system are still around to guide it.

When a dominant AI lab hires a meaningful chunk of that group, the question isn’t “is the code still there?”
It’s who’s left that users trust to make decisions—and why should anyone bet on them now?

OpenAI understands this dynamic. They are not naive about software culture. Which makes it reasonable to ask: was this impact considered, or was it simply convenient?

Open source as a proving ground

Cline didn’t succeed because it had a marketing budget or enterprise sales. It succeeded because it did what open tools often do best: iterate quickly, listen to users, and explore ideas that large organizations tend to move slowly on.

That process creates value—real value. Design intuition. Workflow insights. A sense of what developers actually want from agentic coding tools.

When the people most closely associated with that value migrate en masse to a dominant lab, the value doesn’t disappear. It just changes ownership. The lessons learned in public move behind a product boundary, governed by incentives that are, by definition, proprietary.

Is that illegal? No.
Is it healthy for an ecosystem that depends on credible open alternatives? That is the question.

At some point, it’s reasonable to ask whether open source in AI is being treated less as shared infrastructure and more as unpaid R&D—a place where ideas are de-risked before being absorbed by whoever can afford to hire fastest.

The damage isn’t binary

Cline’s repository still exists. The license still allows forks. No one flipped a switch and shut it down.

But projects don’t die only when code disappears. They die when confidence erodes.

Users hesitate to depend on them. Contributors hesitate to invest deeply. Maintainers hesitate to promise continuity they may not be able to deliver. The chilling effect isn’t dramatic—it’s slow, quiet, and corrosive.

And here’s the uncomfortable part: OpenAI doesn’t have to intend this outcome for it to be predictable. When you’re this large, downstream effects aren’t hypothetical. They’re part of the cost of doing business.

“We didn’t acquire them” isn’t enough

The implicit defense—there was no acquisition; the project can continue—sets an extremely low bar. Large infrastructure players aren’t judged only by what they refrain from doing, but by how their actions shape the environment everyone else operates in.

So it’s fair to ask:

  • Was there any effort to support the project’s transition or governance?
  • Any acknowledgment of the community left behind?
  • Or is the assumption that open source will simply self-heal, every time?

Because if that assumption keeps proving false, then the message is hard to ignore: the commons is optional.

The bigger question

This isn’t really about Cline alone. It’s about whether the future of developer tooling will be shaped by options and credible alternatives—or by a handful of labs that can absorb emerging threats simply by hiring the people who understand them best.

Cline may survive. Or it may slowly fade, not because the idea failed, but because its center of gravity was quietly relocated.

If this pattern continues, we should stop treating it as incidental and start asking the question companies like OpenAI never seem eager to answer:

What does “open” actually mean in an ecosystem where the biggest player can hire the future out from under it?

I can’t believe what Anthropic just did to Clawdbot

“Clawdbot” is gone forever — and all thanks to Anthropic.

Go to the original GitHub Clawdbot repo and what do you see? There is nothing there.

One day it was Clawdbot. The next day it wasn’t.

Over the last week, a fast-rising open-source AI project quietly changed its name to Moltbot after trademark pressure from Anthropic.

What looked like a simple rebrand turned into a drama about AI agents, naming rights, and what happens when an indie project goes viral a little too fast.

What Clawdbot was — and why people cared

It was everywhere. It could actually do things — and all under your control, all on your machine.

Clawdbot caught attention because it wasn’t just another chatbot. It was an AI agent designed to run locally and actually do things on your behalf — all on your machine.

Instead of chatting in a browser, users could talk to it through familiar apps like iMessage, WhatsApp, Telegram, Signal, or Discord.

From there, the bot could handle tasks like sending messages, managing reminders, interacting with calendars, and automating workflows across apps.

That local-first setup was a big part of the appeal.

People weren’t just testing it — they were setting it up as a persistent assistant, sometimes even dedicating always-on machines like Mac minis to keep it running.

It felt useful, powerful, and a little bit dangerous — which is exactly why it spread so quickly.

Anthropic wasn’t having it

The problem wasn’t the software. It was the name.

“Clawdbot” — and its lobster-style mascot “Clawd” — sounded and looked uncomfortably close to Claude, the flagship AI model from Anthropic. According to the project’s creator, Anthropic raised trademark concerns and asked that the name be changed.

This wasn’t framed as a suggestion. The creator publicly said the rename wasn’t his choice.

From Anthropic’s point of view, this was standard trademark behavior. Companies are often required to enforce their marks once there’s a real chance of confusion — especially when a project using a similar name starts getting widespread attention.

Why the name couldn’t just be tweaked

A lot of people asked the obvious question:

Why not just rename it something like “Clawbot” and move on?

Apparently, that wasn’t allowed either.

So the project went with Moltbot, keeping the crustacean theme but dropping anything that could plausibly be linked back to Claude. The mascot followed suit: Clawd became Molty.

It was fast, clean, and legally safer — but the timing couldn’t have been worse.

The rebrand fallout: confusion, scams, and noise

The rename landed right in the middle of peak hype, which created a perfect storm.

Suddenly there were new repo names, new social handles, people unsure which accounts were official, and opportunists rushing in. Fake tokens appeared. Crypto scammers tried to capitalize on the confusion.

The creator even said his personal GitHub account was briefly compromised, though the project itself wasn’t affected.

At the same time, he was asking people to stop pinging and harassing him — not because of criticism, but because the sheer volume of attention was becoming disruptive.

This is the ugly side of open-source virality: once a project breaks containment, it attracts everyone — including people who have nothing to do with the software.

The bigger conversation Moltbot kicked off

Even without the naming drama, Moltbot raises real questions.

Unlike a chat-only AI, an agent that can read messages, interact with apps, and take actions on your behalf comes with serious security implications.

If something goes wrong — or if an attacker figures out how to manipulate it — the blast radius is much larger.

That doesn’t make Moltbot reckless or irresponsible. It just means it sits right at the edge of where “cool demo” turns into “this needs careful thought.”

And that’s part of why the project blew up so fast. It’s why MCP blew up so fast:

People are hungry for AI that acts, not just talks.

It’s a telling moment we’re in

Open-source AI projects can now gain massive attention almost overnight. When they do, they immediately run into the same forces big companies deal with: trademarks, security concerns, bad actors, and public scrutiny.

Moltbot now clearly states that it’s independent and not affiliated with Anthropic. The software didn’t change — just the name. But the episode is a reminder that in today’s AI ecosystem, even naming something can have real consequences.

The agent era is arriving fast.

And sometimes, it starts with a lobster losing its claws.

Clawdbot is by far the most powerful AI assistant ever made

This is going to completely revolutionize the entire computing industry.

You will totally abandon all those useless chatbots once you understand just how groundbreaking this is.

Like just imagine this:

An AI assistant that lives on YOUR machine, works inside ALL the chat apps you already use, and can take ANY real-world action for you — including proactively sending you intelligent reminders and messages

❌❌ Another frustrating web app you have to open

✅✅ Clawdbot shows up in every single place you already are: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams…

You message it like a person. It replies like a person. And it isn’t stuck on some faraway server where you have absolutely no idea what’s going on with your data.

And it can actually message you first.

All this and we haven’t even started talking about all the amazing real-world actions that it can take for you.

Just built different

In simple terms, Clawdbot is an open-source personal AI assistant you run on your own computer or server.

It has two main parts:

The agent is the “brain.” It’s powered by an LLM and can use tools like the filesystem, shell commands, web browsing, and any integrations you enable. So it’s not just generating text — it can operate.

The gateway connects that agent to your chat apps and skills. It routes messages from WhatsApp or Slack to your local assistant, lets it run tools, and sends the reply back to you.

Local is the massive, massive keyword here — it’s why you’re seeing so many people desperately rushing for Mac Minis.

Put together you get something that feels less like a chatbot and more like a genius assistant living on your machine.

Everyone is going crazy about it right now

Okay I don’t know about this 😂

But Clawdbot has been shipping fast — and the recent releases are what pushed it into the spotlight.

A very recent release this month added things like richer replies for chat platforms, better text-to-speech behavior, and in-chat approvals — so the assistant can ask “Can I do this?” and you can approve it with a simple command.

It’s also starting to plug into bigger AI infrastructure.

Vercel recently published guidance on using Clawdbot with their AI Gateway, which lets you route requests across different models and providers.

That combination — fast iteration plus real ecosystem support — is what’s giving Clawdbot momentum.

Just not like the others

It lives in your chat apps.
You don’t open Clawdbot. You message it. That sounds small, but it removes the friction that kills most “assistant” workflows. If you can delegate a task the same way you text a friend, you’re far more likely to actually use it.

It’s local-first and transparent.
Clawdbot stores its memory, preferences, and configuration as real files on your machine — folders and Markdown you can open and edit. You can inspect what it “knows,” version-control it, and change its behavior without dealing with a black-box product.

It can use real tools.
Clawdbot can browse the web, run shell commands, read and write files, and call APIs. At that point, it stops being “advice” and starts being “execution.”

You can say things like:

  • “Clean up this folder and rename the files logically.”
  • “Pull my last 50 emails and summarize what needs action.”
  • “Draft a reply and send it.”

And it can actually do those things.

It’s built to grow new abilities.
Clawdbot is designed around skills and plugins. Once the core is running, you can keep adding new capabilities instead of switching tools every time you want something new. If you’re already into agent workflows or automation setups, Clawdbot fits naturally into that world.

What can it actually do right now?

Out of the box, Clawdbot is aimed at practical work, not novelty demos.

People are using it for things like:

  • managing email
  • scheduling and updating calendars
  • sending messages
  • summarizing inboxes and task lists
  • stitching together personal automations

And because it runs locally and can call tools, many people are using it as a personal automation layer instead of juggling multiple SaaS apps.

Clawdbot is a strong signal of where personal AI is heading.

Not one assistant inside one app.
Not one company owning your memory and workflows.

But agents you own.
Running locally.
Living across your communication channels.
With real tools and real memory.

It feels less like “ChatGPT with plugins” and more like an early version of a true personal assistant.

Not perfect, not finished…

But clearly pointing at what comes next.

Claude Code’s boring new “Tasks” update just changed AI coding forever

You hear “Tasks” and you roll your eyes thinking this is just another boring task list feature.

Not realizing that this unlocks the full power of one of the most revolutionary Claude Code upgrades so far:

Sub-agents.

Claude Code can now spawn new agents running in parallel — to break down a long complex task into simpler sub-tasks.

This is way better than the typical agent reasoning flow — because each sub-agent gets a completely fresh session of its own.

No more context bloat from previous tasks — which means more space to think and more accurate results.

Each sub-agent only has to focus on its own task — and report back when it’s done.

Claude Tasks is here to keep all the sub-agents coordinated and in line with the overall goal of what you’re trying to achieve.

Now you have a real task list that shows up in your terminal as progress is made.

All the sub-tasks are coordinated intelligently — so, for example, you can see that Task B will be delayed if it depends on output from Task A.

And something really important about this is that the task list isn’t fragile anymore.

It survives:

  • long sessions
  • context compaction
  • jumping into side quests and back

So Claude doesn’t forget the overall plan just because the conversation moved around.

Effortlessly reuse tasks across sessions

This is the part most people are going to miss at first.

You can run multiple Claude Code sessions and point them at the same task list:

Shell
CLAUDE_CODE_TASK_LIST_ID=my-project claude

Now all those terminals share one checklist.

That means you can:

  • have one session refactoring files
  • another running tests or builds
  • another hunting edge cases

…and they all coordinate against the same “what’s left to do” list.
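
In practice that can be as simple as opening a few terminals and pointing them at the same ID (reusing the environment variable shown above; the ID value is just an example):

Shell
# terminal 1: refactoring files
CLAUDE_CODE_TASK_LIST_ID=my-project claude

# terminal 2: running tests and builds
CLAUDE_CODE_TASK_LIST_ID=my-project claude

# terminal 3: hunting edge cases
CLAUDE_CODE_TASK_LIST_ID=my-project claude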

It basically turns Claude Code into a lightweight multi-agent setup without any extra tools.

1. Long workflows are finally reliable

Before this, Claude’s planning lived inside the chat context.
That’s a terrible place for anything that needs to last more than a few turns.

Tasks gives Claude a stable memory for:

  • what it already did
  • what it still needs to do
  • what order things should happen in

So you stop getting weird regressions like:
“Why are you re-doing step 3 again?”

2. Forget history, just share the list

Once you use the shared task list once, you feel the difference.

You stop treating Claude like:

“one fragile conversation that must never break”

and start treating it like:

“an expert dev that can pick tasks off a board.”

A powerful mental shift that makes a huge difference.

3. Lives where you already are

No dashboard.
No browser UI.
No extra tool to install.

It’s just… there in the terminal.
Hit Ctrl+T. See progress. Keep moving.

Frictionless.

Tasks vs /tasks — don’t confuse them

Claude Code already has a /tasks command.

That’s not this feature.

  • Tasks (new feature): the planning/progress checklist
  • /tasks command: background jobs (long-running commands)

They’re totally different things with the same name.

How to actually use it well

If you want Tasks to shine, do two small things:

1) Tell Claude what “done” looks like
For example:

“Create tasks for: reproduce bug, write failing test, fix bug, add regression test, run full suite.”

You’ll get much cleaner, more useful task breakdowns.

2) Use it for anything with branches
Tasks are perfect for work like:

  • refactor + tests + docs
  • migration + backfill + validation
  • bug repro + fix + regression suite

Basically: anything where multiple threads exist at once.

5 tricks to make Claude Code go 10x crazy (amateur vs pro devs)

Garbage in garbage out.

Many developers treat Claude Code like it’s supposed to magically read their minds — and then they get furious when it gives weak results.

Oh add this feature, oh fix that bug for me… just do it Claudy, I don’t care if I give you a miserably vague, low-quality prompt — I still expect the best Claudy.

And if you can’t give me what I asked for — then of course you’re worthless and vibe coding is total hype BS.

They just have no clue how to drive this tool.

Amateurs ask for code.

Pros ask for outcomes, constraints, trade-offs, and a plan.

They give Claude Code enough context to behave like a senior engineer who can reason, sequence work, and protect the codebase from subtle failure.

1) Use “think mode” for complex problems

If your prompt sounds like a simple low-effort task, Claude Code will give you… a simple low-effort solution.

If your prompt signals “this is a thinking problem”, you’ll get a completely different quality of output: constraints, risks, alternatives, and a step-by-step implementation plan.

Amateur prompt

Add authentication to my app.

Pro prompt (the “think” unlock)

I need you to think through a secure, maintainable authentication design for a React frontend with a Node/Express API. Compare cookie sessions vs JWT, include password hashing strategy, rate limiting, CSRF considerations, refresh-token handling (if relevant), and how this should fit our existing user model. Then propose an implementation plan with milestones and tests.

Why this works: you’re explicitly asking for architecture + trade-offs + sequencing, not “spit out code.”

Extra pro tip: add “assumptions” + “unknowns” to force clarity:

List assumptions you’re making, and ask me the minimum questions needed if something is missing.

2) Connect Claude Code to the world

Stop wasting Claude Code’s potential — use MCP to connect it to external tools, including databases and developer APIs.

Pros don’t keep Claude Code hopelessly relegated to just writing code.

They extend it with tools so it can inspect your environment and act with real context.

Project-scoped MCP config means everyone on the team shares the same toolbelt, checked into the repo. New dev joins? They pull the project and Claude Code instantly knows how to access the same tools.
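
As a rough sketch, a project-scoped config is typically just a file like .mcp.json checked into the repo root (the Postgres server and connection string below are placeholders, not a recommendation):

JSON
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost:5432/mydb"
      ]
    }
  }
}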

What this unlocks

  • “Look at our database schema and generate endpoints”
  • “Scan the repo and find all usages of X”
  • “Check deployment status and suggest a fix”
  • “Run tests, interpret failures, patch code”

Amateur approach

Here’s my schema (pastes partial schema). Make me an API.

Pro approach

Use our project MCP tools to inspect the actual schema, identify relationships, then generate a CRUD module with validation, error handling, and tests. After that, propose performance improvements based on indexes and query patterns you observe.

What changes: Claude Code stops guessing and starts integrating with your environment.

3) Stop using Git like that

Amateurs are still treating git like a sequence of memorized commands — they think it’s still 2022.

Pros treat git + PRs like an orchestrated workflow: branching, implementation, commit hygiene, PR description quality, reviewer routing, and cleanup—all expressed as intent.

Amateur behavior

  • One giant commit: “stuff”
  • Still using manual git commands: git commit, git branch, etc.
  • Vague PR description
  • No reviewer guidance

Pro command (workflow orchestration)

Create a new feature branch for adding “Sign in with Google” using OAuth2. Implement the full flow end-to-end (redirect handling, token exchange, session persistence, logout). Commit in logical chunks using our conventions (small, descriptive messages). Open a PR with a clear summary, testing notes, and security considerations, and request review from the security-focused reviewers.

Why this works: Claude Code shines when it can plan a multi-step process and keep the repo readable for humans.

4) Don’t be so naive

Amateurs build software like naive optimists — and it shows up in both their hand-written code and their prompts to Claude Code.

Pros build systems that keep working when reality shows up: timeouts, duplicate requests, partial failures, bad inputs, rate limits, retries, and logging that makes incidents survivable.

Claude Code is unusually strong at “paranoid engineering”—you just have to ask for it.

Amateur prompt

Make a payment function.

Pro prompt (tests first + failure modes)

Design this payment flow defensively. Start by writing tests first (including failures): network timeouts, declined cards, malformed input, duplicate submission, idempotency, provider rate limiting, and partial capture scenarios. Then implement the code to satisfy the tests. Add structured logs, clear error taxonomy, and safe fallbacks where appropriate.

If you want to push it even further:

Include a retry policy with jitter, a circuit-breaker-like safeguard, and metrics hooks so we can observe success/failure rates.

Outcome: instead of “works on my machine,” you get code that holds up under pressure.

5) Stop refactoring like a noob

Amateurs refactor locally: rename a variable, extract a function, call it done.

Pros refactor system-wide: centralize logic, enforce boundaries, update imports everywhere, adjust tests, and keep behavior consistent across the codebase.

Claude becomes terrifyingly effective when you give it a refactor target + constraints + a migration plan.

Amateur prompt

Move this function into another file.

Pro prompt (multi-file, consistent patterns)

Refactor authentication so UI components no longer contain auth logic. Create a dedicated auth module/service, route all auth-related API calls through it, standardize error handling, and update all imports across the app. Add TypeScript types/interfaces where needed. Update tests to mock the new service cleanly. Then search the repo for any leftover auth logic in utilities and migrate it too.

Why this works: you’re not asking for “a refactor.” You’re asking for a controlled architectural change with guardrails.

The real secret: pros don’t write prompts, they write specs

If you want Claude to “go 10×,” stop giving it chores and start giving it:

  • intent (“what success looks like”)
  • constraints (security, performance, conventions, compatibility)
  • context (stack, repo patterns, architecture)
  • sequencing (“plan first, then implement, then test, then cleanup”)

Vercel’s new tool just made web dev way easier for coding agents

AI coding agents are about to get a lot more reliable for web automation & development — thanks to this new tool from Vercel.

These agents excel at code generation — but what happens when it’s time to actually test the code in a real browser, like a human or like Puppeteer would?

They’ve always struggled to autonomously navigate the browser — and to identify and manipulate elements quickly and reliably.

Flaky selectors. Bloated DOM code. Screenshots that can’t really be understood in the context of your prompts.

And this is exactly what the agent-browser tool from Vercel is here to fix.

It’s a tiny CLI on top of Playwright, but with one genuinely clever idea that makes browser control way more reliable for AI.

The killer feature: “snapshot + refs”

Instead of asking an agent to guess CSS selectors or XPath, agent-browser does this:

  1. It takes a snapshot of the page’s accessibility tree
  2. It assigns stable references like @e1, @e2, @e3 to elements
  3. Your agent clicks and types using those refs

So instead of the agent having to guess on its own which element you mean from a simple prompt like:

“Find the blue submit button and click it”

you get:

Shell
agent-browser snapshot -i
# - button "Sign up" [ref=e7]
agent-browser click @e7

No selector guessing or brittle DOM queries.

This one design choice makes browser automation way more deterministic for agents.

Why this is actually a big deal for AI agents

1. Way less flakiness

Traditional automation breaks all the time because selectors depend on DOM structure or class names.

Refs don’t care about layout shifts or renamed CSS classes.
They point to the exact element from the snapshot the agent just saw.

That alone eliminates a huge amount of “it worked yesterday” failures.

2. Much cleaner “page understanding” for the model

Instead of dumping a massive DOM or a raw screenshot into the model context, you give it a compact, structured snapshot:

  • headings
  • inputs
  • buttons
  • links
  • roles
  • labels
  • refs

That’s a way more usable mental model for an LLM.

The agent just picks refs and issues actions.
No token explosion or weird parsing hacks.

3. It’s built for fast agent loops

agent-browser runs as a CLI + background daemon.

The first command starts a browser.
Every command after that reuses it.

So your agent can do:

act → observe → act → observe → act → observe

…without paying a cold-start tax every time.

That matters a lot once you’re running 20–100 small browser steps per task.
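
Using only the subcommands shown in this post, one iteration of that loop might look something like this (the URL and ref are illustrative):

Shell
agent-browser open https://example.com   # first command starts the browser daemon
agent-browser snapshot -i                # observe: the agent reads the refs (@e1, @e2, ...)
agent-browser click @e7                  # act: click the ref the agent chose
agent-browser snapshot -i                # observe the new state, then repeat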

Great power features

These are the things that make it feel agent-native — not just another wrapper around Playwright.

Skip login flows with origin-scoped headers

You can attach headers to a specific domain:

Shell
agent-browser open api.example.com \
  --headers '{"Authorization":"Bearer TOKEN"}'

So your agent is already authenticated when the page loads.

Even better: those headers don’t leak to other sites.
So you can safely jump between domains in one session.

This is perfect for:

  • dashboards
  • admin panels
  • internal tools
  • staging environments

Live “watch the agent browse” mode

You can stream what the browser is doing over WebSocket.

So you can literally watch your agent click around a real website in real time.

It’s incredibly useful for:

  • debugging weird agent behavior
  • demos
  • sanity-checking what the model thinks it’s doing

Where it shines the most

agent-browser is especially good for:

  • self-testing agents
    (“build the app → open it → click around → see if it broke → fix → repeat”)
  • onboarding and signup flows
  • dashboard sanity checks
  • form automation
  • E2E smoke tests driven by LLMs

It feels like it was designed for the exact “agentic dev loop” everyone’s building right now.

Claude Code finally fixed its biggest flaw — this is huge

Every single developer using Claude Code is about to get way more powerful & productive than they already are.

This new Claude Code update finally fixes a major issue that’s been negatively impacting its accuracy for several months now — and many of us were never even aware.

All this time, Claude Code has been bloating up your context in the background with unnecessary data from every single one of your MCP servers.

It didn’t matter whether you actually used them or not in any given prompt — if you have 100 MCP servers, it would dump all the complex tool definitions and metadata for all of them into your context, with no exceptions.

Drowning out context that actually matters and lowering the accuracy.

But now, with the new Tool Search feature in Claude Code, this problem is finally gone forever.

They’ve fixed everything — and they did it with a trick web developers will instantly recognize.

The old MCP experience was quietly broken

Here’s what was happening before:

  • You connect a few MCP servers
  • Each server exposes a bunch of tools
  • Claude loads all of them at startup
  • Your context window gets eaten alive
  • Tool selection gets worse as your tool list grows

So even before Claude starts thinking about your actual code, it’s already wasting tokens on tool schemas you may never use in that session.

The more “power user” you became, the worse things got.

That’s backwards.

Tool Search changes everything — with a neat trick from web dev

With Tool Search enabled, Claude Code stops doing dumb work up front.

Instead of loading everything, it does this:

  • Nothing is loaded at startup
  • Claude keeps MCP tools out of context by default
  • When a task comes up, Claude searches for relevant tools
  • Only the tools it actually needs get pulled in
  • Everything else stays out of the way

Same MCP. Same tools.
But with lazy loading: Massively better behavior.

This is exactly how modern AI tooling should work.

Why this is so huge

1. You instantly get more usable context

This is the obvious win — and it matters a lot.

Tool schemas can be massive. When you’re running multiple MCP servers, you’re talking thousands (sometimes tens of thousands) of tokens wasted on definitions alone.

Lazy loading gives that space back to:

  • real code
  • repo context
  • actual reasoning

That alone makes Claude Code feel noticeably smarter.

2. Tool selection gets better, not worse

Too many tools hurt accuracy in another crucial way:

When a model sees a huge wall of tools, it’s harder for it to consistently pick the right one. Lazy loading narrows the decision space.

Claude now:

  • searches for tools relevant to this task
  • loads a small, focused set
  • chooses more reliably

That’s not theoretical — it’s how Anthropic designed Tool Search to scale.

3. MCP finally scales the way you always wanted

Before this update, connecting more MCP servers felt risky:

“Am I about to blow up my context just by having this enabled?”

But now you can keep everything connected.

With lazy loading, unused MCP servers are basically free. They don’t cost context until Claude actually needs them.

That changes how you think about building and composing MCP ecosystems.

It turns on automatically (which is perfect)

Claude Code enables Tool Search automatically once your MCP tool definitions would take more than 10% of the context window.

That’s smart:

  • small setups stay simple
  • big setups get optimized
  • no babysitting required

Very important: This changes how MCP servers should be written

Because Claude now searches for tools instead of seeing them all at once, your MCP server descriptions actually matter.

Good servers:

  • clearly state what problems they solve
  • make it obvious when Claude should use them
  • have clean, intentional tool naming

Bad descriptions = your tools don’t get discovered.

Lazy loading turns MCP servers into discoverable “capabilities” instead of background noise.

Google just made AI coding agents more powerful than ever

This is going to have such a massive positive impact on the accuracy and reliability of AI agents in software development.

The new Skills feature in the Google Antigravity IDE finally solves the problem of AI agents giving us wildly unpredictable/inaccurate results for the same prompt.

Too little context is terrible for agent accuracy — but things can get even worse when your agent has access to TOO MUCH context for a particular task.

The truth is, your coding agent has access to a boatload of input and context that isn’t necessary for any given task — but it still takes part in the agent’s thinking process.

Every single file and folder from every segment of your codebase… all the frontend, all the backend, all the tests, scripts, utilities, style guides…

Even the MCP servers you have connected are part of the context…

So what do you think is gonna happen when you give instructions like, “Fix the password reset bug in the API”?

Your agent is going to take every single piece of context it has into consideration when deciding how best to respond to you.

You were only expecting it to change 2 files in the backend, but it went ahead and changed 27 files all over the place (“Oh, this vibe coding thing is such a scam, I knew it”).

Because you gave it the full responsibility of figuring out exactly what you were thinking. Figuring out the precise locations where you wanted changes made. Essentially, reading your mind — when all you gave it was a painfully vague instruction.

And while it can do that a decent amount of the time, other times it fails miserably. “Miserably” as far as what you were expecting is concerned.

And this is exactly what this new Skills feature from Google is trying to solve.

Skills let you finally give structure to the agent — you can now specify a high-level series of tasks the agent should perform in response to certain kinds of prompts.

Instead of using all the context and input all the time, the agent processes only the context relevant to the task at hand.

It can still intelligently decide how to make changes to your codebase — but only within the framework and constraints you’ve provided with Skills.

And this is the major breakthrough.

What a Skill actually is

A Skill is just a small folder that defines how a certain kind of task should be done.

At the center of that folder is a file called SKILL.md. Around it, you can optionally include:

  • scripts the agent can run,
  • templates it should follow,
  • reference docs it can consult,
  • static assets it might need.

You can scope Skills:

  • per project (rules for this repo only),
  • or globally (rules that follow you everywhere).

That means you can encode “how we do things here” once, instead of re-explaining it every time.

The key idea: Skills load only when needed

This is the part that actually makes things more reliable.

Antigravity doesn’t shove every Skill into the model’s context up front. Instead, it keeps a lightweight index of what Skills exist, and only loads the full instructions when your request matches.

So if you ask to:

  • commit code → commit rules load
  • fix a bug → bug-fix workflow loads
  • change a schema → safety rules load

Everything else stays out of the way.

Less noise. Less confusion. Fewer “creative interpretations” where you didn’t want any.

What goes inside SKILL.md

A Skill has two layers:

1) The trigger

At the top is a short description that says when this Skill should be used.
This is what Antigravity matches against your request.

2) The playbook

The rest is pure instruction:

  • step-by-step workflows
  • constraints (“don’t touch unrelated files”)
  • formats (“output a PR summary like this”)
  • safety rules

When the Skill activates, this playbook is injected into context and followed explicitly.
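
As a sketch (the exact frontmatter fields your setup expects may differ), a minimal SKILL.md could look something like this:

Markdown
---
name: bug-fix-workflow
description: Use this skill whenever the user asks to fix a bug in this repo.
---

# Bug-fix workflow

1. Reproduce the bug and write a failing test first.
2. Make the smallest change that fixes the bug. Don’t touch unrelated files.
3. Run the full test suite and report the results.
4. Summarize the change as: root cause, fix, and how it was verified.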

Another powerful example: commit messages that stop being garbage

Imagine a Skill whose entire job is to handle commits.

Instead of:

“Commit these changes (and please follow our style)”

You encode:

  • allowed commit types
  • subject length limits
  • required “why” explanations
  • forbidden vague messages

Now whenever you say:

“Commit this”

The agent doesn’t improvise.
It follows the rules.

Same input.
Same standards.
Every time.

That’s reliability.

3 important ways

Skills improve reliability in three important ways.

1. They turn tribal knowledge into enforcement

Instead of hoping the agent remembers how your team works, you encode it.

2. They can delegate to real scripts

For things that shouldn’t rely on judgment — tests, validation, formatting — a Skill can call actual scripts and report results. That’s deterministic behavior, not vibes.

3. They narrow the decision space

A tightly scoped Skill reduces guesswork. The agent is less likely to invent a workflow when you’ve already defined one.

This new MCP server from Google just changed everything for app developers

Wow this new MCP server from Google is going to change a whole lot for app developers.

Your apps are about to become so much more of something your users actually care to use.

You’ll finally be able to effortlessly understand your users without having to waste time hopelessly going through mountains of Analytics data.

Once you set up the new official Google Analytics MCP server, you’ll be able to ask the AI intuitive, human-friendly questions:

  • “Which acquisition channel brings users who actually retain?”
  • “Did onboarding improve after the last release? Show me conversion by platform”

And it’ll answer using the massive amount of data sitting inside your analytics.

No more surfing through event tables and wasting time trying to interpret what numbers mean for your product. You just ask the AI exactly what you want to know.

Analytics becomes a seamless part of your workflow.

Don’t ignore this.

This is the first-class, Google-supported MCP (Model Context Protocol) server for Google Analytics.

MCP is now the standard way for an AI tool (like Gemini) to connect to external systems through a set of structured “tools.”

Instead of the model guessing from vibes, the AI can call real functions like “list my GA properties” or “run a report for the last 28 days,” get actual results back, and then reason on top of those results.

So think of the Google Analytics MCP server as a bridge:

  • Your AI agent on one side
  • Your GA4 data on the other side
  • A clean tool interface in the middle

What can it do?

Under the hood, it uses the Google Analytics APIs (Admin for account/property info, Data API for reporting). In practical terms, it gives your AI the ability to:

  • list the accounts and GA4 properties you have access to
  • fetch details about a specific property
  • check things like Google Ads links (where relevant)
  • run normal GA4 reports (dimensions, metrics, date ranges, filters)
  • run realtime reports
  • read your custom dimensions and custom metrics, so it understands your schema

Also important: it’s read-only. It’s built for pulling data and analyzing it, not for changing your Analytics configuration.

A game changer

A big reason many people don’t use analytics deeply isn’t because they don’t care.

It’s because it’s slow, complex and annoying.

You open GA → you click around → you find a chart → it doesn’t answer the real question → you add a dimension → now it’s messy → you export → you still need to interpret it in the context of your app.

With MCP, you can move closer to the way you actually think:

  • “Did onboarding improve after the last release? Show me conversion by platform.”
  • “What events tend to happen right before users churn?”
  • “Which acquisition channel brings users who actually retain?”
  • “What changed this week, and what’s the most likely cause?”

That’s what makes this feel different. It’s not “analytics in chat” as a gimmick — it’s analytics as a fast feedback loop.

High-level setup

The official path is basically:

  1. enable the relevant Analytics APIs in a Google Cloud project
  2. authenticate using Google’s recommended credentials flow with read-only access
  3. add the server to your Gemini MCP config so your agent can discover and call the tools

After that, your agent can list properties, run reports, and answer questions grounded in your real GA4 data.
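
As a sketch of step 3 (the server name, launch command, and paths below are placeholders; check the official README for the exact command your install uses), the MCP entry in your Gemini settings looks roughly like this:

JSON
{
  "mcpServers": {
    "analytics-mcp": {
      "command": "pipx",
      "args": ["run", "analytics-mcp"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/your-credentials.json"
      }
    }
  }
}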

This isn’t just a nicer interface for analytics—it’s a fundamental shift in how you build products people actually want to use. When your data becomes something you can ask instead of hunt, you make better decisions faster, and your app becomes something users genuinely love spending time in.

A real difference maker.