Tari Ibaba

Tari Ibaba is a software developer with years of experience building websites and apps. He has written extensively on a wide range of programming topics and has created dozens of apps and open-source libraries.

Claude Code’s new voice mode just changed AI coding forever

Wow, Anthropic is on fire. They just gave Claude Code a brilliant new voice mode feature, and it's going to transform the way developers interact with AI coding tools moving forward.

Instead of carefully typing every instruction — you can now speak your intent directly to an AI agent that understands and works inside your codebase.

It moves us from prompt-writing to near-real-time collaboration: communicating closer than ever to the speed of thought, explaining problems, delegating tasks, and refining instructions naturally and intuitively.

And it’s not just generic speech recognition: this was built specifically for coding.

With one-command activation, real-time streaming transcription, and seamless voice-plus-keyboard input, Claude Code starts feeling less like a chatbot and more like a true coding partner.

Enable it with 1 command

The setup is intentionally lightweight.

  • You just type /voice to enable voice mode.
  • No external dictation tool or additional setup is required.
  • Voice becomes simply another input layer inside the existing workflow.

Fine-tuned for coding, not generic conversation

Claude Code voice mode isn’t just speech added to a chatbot.

The key point is that the transcription itself is optimized for coding workflows, not everyday conversation. That means it’s tuned to handle the kinds of things developers actually say when working:

  • syntax-heavy phrases
  • function and class names
  • file paths and CLI commands
  • library names and technical terminology

So that means we can say things like:

  • “Open auth-middleware.ts and trace where the token validation fails.”
  • “Refactor the UserService class to use dependency injection.”
  • “Run the test suite and show me the failing cases.”

And Claude Code can reliably capture and act on those instructions.

Voice becomes a way to direct a coding agent, not just chat with one.

Zero-cost transcription lowers the barrier

This is one of the biggest selling points:

Voice transcription tokens are free.

This removes a major adoption barrier.

Benefits include:

  • No need to worry about usage costs while speaking
  • Easier to use voice for rough or exploratory prompts
  • Encourages natural thinking out loud during development

If transcription were metered, people would hesitate to use it casually. Removing that friction makes voice a default option when it’s faster.

Real-time streaming is what makes it usable

The feature supports real-time streaming transcription.

This means:

  • Your speech appears in the prompt as you talk
  • Voice and keyboard input work together
  • You can seamlessly switch between speaking and typing

Example hybrid flow:

  • Speak the high-level task
  • Type a specific filename or function
  • Continue speaking to explain constraints or context

This hybrid interaction is what makes voice mode genuinely useful instead of gimmicky.

How to use it

The workflow follows a simple three-step loop.

1. Activate

  • Type /voice in Claude Code.
  • If your account has access, voice mode will enable immediately.

2. Speak

  • Hold Space to talk.
  • Release the key when finished.
  • Your speech is transcribed directly into the prompt.

You can mix:

  • spoken instructions
  • typed edits
  • additional clarifications

in the same prompt.

3. Execute

Once your request is ready:

  • Claude Code processes it like any normal instruction.
  • It can explain code, modify files, or execute tasks depending on permissions.

Why this matters

1. Communicate intent much faster

Speech is often faster than typing, especially for complex requests.

Voice works best for:

  • multi-step instructions
  • exploratory prompts
  • long explanations of a problem

Example:

Instead of typing:

Trace the error path for this authentication bug and suggest the minimal safe fix

You can simply say it.

Speaking removes the friction of composing a perfectly structured prompt.

2. High-level intent is easier to express out loud

Voice naturally encourages higher-level thinking.

When speaking, people tend to include:

  • goals
  • tradeoffs
  • uncertainties
  • constraints

Like:

  • “I think this bug is somewhere in the auth middleware…”
  • “We probably shouldn’t change the public API…”
  • “Try the smallest fix first.”

That additional context helps the AI understand what you actually want, not just what you typed.

3. The hybrid workflow is the real power move

The biggest advantage isn’t voice alone.

It’s the voice + keyboard workflow.

Benefits include:

  • Keep your eyes on the code while speaking instructions
  • Avoid stopping to craft perfectly typed prompts
  • Maintain flow while navigating files and debugging

This reduces micro-context switching, which is one of the biggest productivity drains in development workflows.

4. Stay productive for much longer

Long coding sessions can be physically demanding.

Voice mode helps by reducing:

  • repetitive typing
  • hand strain
  • keyboard fatigue

Possible ergonomic benefits:

  • alternate between typing and speaking
  • maintain better posture during long sessions
  • sustain focus for longer periods

Voice won’t replace keyboards—but it can balance the workload on your hands.

5. Spoken language often gives Claude better context

People naturally provide more context when speaking.

Compared to typing, spoken instructions often include:

  • more explanation
  • clearer reasoning
  • additional situational details

For an AI coding assistant, this extra context improves:

  • understanding of the problem
  • reasoning about potential fixes
  • the quality of generated solutions

In other words, speaking can actually improve the clarity of your request.

Claude Code’s new Voice Mode is here to reduce the distance between thinking and delegating work.

This isn’t just a new input method.

It’s a more natural way to direct AI-powered development workflows—one that keeps you focused on the code while communicating intent at the speed of thought.

Claude just made leaving ChatGPT easier than ever

The Claude memory import feature is going to make a world of difference for how we use and think about AI assistants moving forward.

Until now, moving from one tool to another meant starting over: re-explaining your preferences, projects, tone, and workflow from scratch.

But now, the memory import feature removes so much of that friction by letting you bring over the context another AI has already built about you.

No more wasting time re-explaining yourself and recreating context that already exists in another chatbot.

What it is

Claude’s memory import lets you transfer personalization data from another AI into Claude.

That can include:

  • Writing preferences
  • Tone and formatting style
  • Recurring projects
  • Professional goals
  • Tools and workflows you use
  • Corrections you’ve made to previous AI behavior

Instead of rebuilding this manually, you can import it and give Claude a strong starting point.

This is huge because modern AI value isn’t just about intelligence — it’s about accumulated context.

How to use it

The process is simple:

  • Ask your current AI assistant to export everything it remembers about you
  • Copy the exported memory
  • Paste it into Claude’s memory import flow
  • Claude extracts and converts that information into structured memory entries

Important distinction:

  • Claude does not import your full chat history
  • It imports a synthesized personalization layer
  • It converts that synthesis into editable memory items

This makes it about portability of context — not portability of conversations.

Why it matters

1. Zero-day personalization

Normally, switching AI tools means:

  • Repeating your writing preferences
  • Re-explaining your job or industry
  • Re-teaching tone and formatting
  • Re-stating tools and workflows
  • Re-correcting predictable mistakes

That can take days or weeks.

Memory import changes that.

  • Claude starts with a richer understanding on day one
  • No need to manually recreate long preference lists
  • Faster path to useful outputs

It compresses the personalization timeline.

2. No more context lock-in

AI lock-in today isn’t just about files. It’s about learned context.

Until now, the more an assistant knew about you, the harder it felt to leave.

Claude’s import feature weakens that dynamic:

  • Makes personalization more portable
  • Reduces switching costs
  • Gives you more control over your AI context

The bigger idea:

  • You should own the data AI has on you
  • That includes the memory layer
  • Personalization shouldn’t trap you on a platform

That’s a meaningful shift in power toward users.

3. Switch whenever

It lowers the barrier to walking away from ChatGPT.

Reasons someone might want to leave:

  • Product direction
  • Trust concerns
  • Pricing
  • Ecosystem preference
  • Competitive experimentation

The hardest part of leaving isn’t model access — it’s losing personalization.

Claude reduces that cost.

That makes it easier to:

  • Switch tools
  • Diversify AI usage
  • Or fully boycott ChatGPT if desired

Even if people don’t leave, the leverage dynamic changes.

How it differs from ChatGPT memory

Two key differences stand out.

Memory synthesis

Claude’s system is built around:

  • Ingesting exported context
  • Extracting key information
  • Converting it into structured memory entries

That creates:

  • Faster onboarding
  • Migration-friendly personalization
  • A deliberate “context transfer” workflow

ChatGPT memory, by contrast, primarily improves through ongoing usage and gradual accumulation.

Claude accelerates that process.

Work-centric prioritization

Claude appears to prioritize professional context.

Its memory focuses on:

  • Work-related information
  • Projects
  • Tools
  • Goals
  • Collaboration preferences

It may not retain unrelated trivial personal details.

That suggests:

  • Less life-log
  • More professional collaborator

For developers, that focus makes the feature more valuable.

The bigger takeaway

This isn’t just a convenience feature.

It signals a shift toward:

  • Portable AI memory
  • User-controlled personalization
  • Lower switching friction
  • Reduced platform lock-in

The next phase of AI competition won’t just be about smarter models.

It will be about:

  • Who personalizes fastest
  • Who gives users control
  • Who makes context movable

Claude’s memory import feature pushes in that direction.

Cursor agents are now writing themselves

30% of internal merged PRs at Cursor are now created by cloud agents.

We’ve had autocomplete. We’ve had chat-based coding assistants. We’ve had agents that can open a repo and make a pull request.

This is something different entirely — this is the next generation of AI-assisted coding.

The agents are writing themselves

These agents don’t just suggest code; they take the wheel, build features, open PRs, and ship to production on their own. It’s full virtual computer control.

We are no longer talking about AI agents merely writing code faster or with greater accuracy.

We are now firmly in the era of the self-driving codebase.

The Cursor team asked an agent to add GitHub source links to each component on their Marketplace plugin pages.
The agent implemented the feature end-to-end — then it recorded itself clicking each component to verify the links worked correctly.

We’ve already seen major strides toward lifting AI agents to a higher level of autonomy, beyond just modifying the codebase according to prompts.

We saw this with Previews in Claude Code, which now comprehensively tests your app and fixes any detected runtime bugs in real time.

Now we are seeing it with Cursor agents, which can control their own computers, not just their codebases.

We are in the age of handing AI full computer control, letting it run in parallel, validate its own work, and hand you something that’s ready to merge — complete with demos and high-level descriptions of everything it did.

This isn’t just a genius senior developer anymore.

This is an entire freaking development team. And you just became the executive.

Full computer control

Most AI coding tools live inside text. They edit files, maybe run a command, maybe see the output. But they don’t really use the software they’re building.

Cursor’s newer cloud agents change that. They run inside isolated virtual machines. They can open the browser. Click through flows. Start servers. Inspect logs. Take screenshots. Record videos. In other words, they don’t just write the feature — they experience it.

That’s a big deal.

Because once an agent can use the product, it stops being just an intelligent assistant stuck inside the codebase — and starts behaving more like an engineer. It can try something, see what breaks, fix it, and repeat. The ceiling gets much higher when the AI isn’t blind to the environment.

Parallelization as a first principle

Instead of one agent slowly working through a task, Cursor experiments with fleets of them. Hundreds, in some cases. But throwing more agents at a problem doesn’t magically make things better. Without structure, they step on each other, block on shared resources, or get stuck playing it safe.

So the system borrows from organizational design. A top-level planner owns the big goal. Sub-planners break that goal into chunks. Workers execute in isolation. Planning and execution both happen in parallel.

Software development stops looking like a solo craft and starts looking like systems management.

Self-validation and merge-ready output

Here’s the part that really changes the workflow: the output isn’t just code.

The agent runs the tests. If there aren’t tests, it can add them. It clicks through the UI to verify behavior. It resolves merge conflicts. It rebases. It checks logs.

And then it attaches artifacts.

Videos of the feature working. Screenshots of edge cases. Structured summaries explaining what changed and why. Logs showing that the server booted cleanly.

This matters because trust is the real bottleneck in AI-assisted development. A diff alone doesn’t tell you whether something works. Proof does.

When an agent hands you a pull request with evidence attached, your role shifts from “figure out what happened” to “decide whether this meets the bar.”

That’s a different posture.

Artifacts as proof of work

The artifacts aren’t fluff. They’re the connective tissue between autonomous execution and human judgment.

Think of them as receipts.

They reduce ambiguity. They shorten review cycles. They make it easier to delegate bigger chunks of work without losing visibility.

Instead of asking, “Did it actually work?” you can just watch it work.

Over time, that changes how much responsibility you’re willing to hand off.

The developer’s new job

All of this leads to the biggest shift: your role moves up a level.

If agents can execute, validate, and document, your leverage isn’t in typing. It’s in direction.

You define the goal. Clarify constraints. Shape the plan. Review outcomes. Decide what ships.

You spend less time authoring every line and more time navigating complexity. You become the orchestrator rather than the instrument.

This doesn’t make developers obsolete. It makes judgment more valuable. Taste. Prioritization. Architecture. Product sense.

The work doesn’t disappear. It changes altitude.

So is it really “self-driving”?

Not fully (yet).

Humans are still in the loop. They set intent and make the final call.

But the trajectory is clear. When software can control its environment, split work across many workers, validate its own results, and return merge-ready output with proof attached, it starts to resemble autonomy.

The self-driving codebase isn’t about replacing developers. It’s about amplifying them — and shifting the craft from line-by-line construction to high-level steering.

And once you’ve experienced that shift, it’s hard to go back.

Google’s new AI image generator just changed everything

Wow this is huge.

Google just released a massive upgrade to their image generation model — and this thing is on a whole different level.

Nano Banana 2 pushes AI image generation way beyond novelty and closer to something we can actually use in production, a daily driver for everyday work.

Created with Nano Banana 2 — Infographic comparing cloud types

It’s not just about spitting out unbelievable or ultra-realistic images this time.

It’s about cost-effective speed, consistency, accuracy, and flexibility: the traits that make an image generation model usable in real-world work, the traits creative teams actually need.

1. Pro-level quality at Flash speed

Nano Banana 2 gives you high-fidelity images in seconds (typically 10–15s) while improving overall visual quality.

Created with Nano Banana 2 — a misty panoramic aerial shot of a verdant valley

What’s improved:

  • More vibrant, dynamic lighting
  • Richer textures and sharper detail
  • Cleaner handling of complex scenes
  • Faster iteration without major quality loss

Why it matters:
You no longer have to choose between speed and polish. The model is built for rapid concepting, quick revisions, and high-quality drafts that are often close to final output.

2. 🌐 Google Search grounding

Localizing an image in Nano Banana 2

One of the biggest upgrades is Google Search grounding.

Nano Banana 2 can:

  • Pull real-time visual references from Google Search
  • Verify landmarks, people, and products
  • Use up-to-date visual information before generating

Why this is significant:

  • Reduces guesswork in recognizable subjects
  • Improves factual accuracy
  • Makes the model more viable for commercial and educational use

Instead of approximating a famous building or product from memory, the model can check current references — a major step toward reliable AI visuals.

3. 🎭 Subject consistency

Created with Nano Banana 2 — an image with several characters

Consistency has long been a weak point in image generation. Nano Banana 2 addresses that directly.

It can maintain:

  • Up to 5 characters
  • Up to 14 objects
  • Across multiple images in a sequence

What this enables:

  • Storyboarding
  • Comic strip creation
  • Branded character campaigns
  • Multi-frame marketing concepts

Characters keep their appearance. Objects stay recognizable. Visual identity becomes more stable across iterations.

4. 📝 Precision text rendering

Created with Nano Banana 2 — an infographic depicting the water cycle

Text inside AI images was notoriously unreliable just a few years ago.

The first Nano Banana made serious improvements here, and v2 takes it even further.

It can handle:

  • Complex labels and signage
  • Clean typographic layouts
  • Infographics and diagrams
  • Structured text blocks

It also supports:

  • In-image translation
  • Instant localization of text within graphics

Practical benefit:
You can generate posters, packaging mockups, charts, menus, and educational graphics without rebuilding all text manually in a separate design tool.

5. 📐 Flexible specs

Nano Banana 2 supports a wide range of resolutions and aspect ratios.

Resolution range:

  • 512px
  • 1K
  • 2K
  • 4K

Native aspect ratios:

  • 16:9 (widescreen)
  • 9:16 (vertical/social)
  • 21:9 (cinematic)
  • 8:1 (panoramic)

Why this matters:
Modern content lives everywhere — social feeds, websites, presentations, digital signage. This flexibility means assets can be generated in the correct format from the start.

Bottom line

Nano Banana 2 isn’t just about stunning or realistic images. It combines:

  • ⚡ Fast generation
  • 🎨 Higher visual fidelity
  • 🌍 Real-time search grounding
  • 🔁 Stronger multi-image consistency
  • 📝 Accurate in-image text
  • 📏 Flexible output specs

The result is a model designed not just to wow and amaze — but to integrate into real creative workflows.

If these capabilities hold up at scale, Nano Banana 2 could become one of Google’s most practically useful AI image tools to date.

5 genius tricks to make Claude go 10x crazy (amateur vs pro devs)

Claude Code gets unbelievably powerful when you stop treating it like just a “coding assistant”.

And start treating it like a full-fledged operating system for your engineering workflow:

Standards, reusable playbooks, parallel execution, deep codebase interrogation, and tool chains that run end-to-end.

1) Implement team-wide coding standards (and make them stick)

Most teams have standards, but they’re scattered across docs, half-remembered conventions, and PR comments.

Claude Code gives you a single place to encode “how we build software here”: a root CLAUDE.md file Claude reads at the start of every session.

What belongs in it:

  • Non-negotiables (error handling, logging, security rules)
  • Architecture map (module boundaries, “this package owns X”)
  • Golden paths (preferred patterns for DB work, retries, input validation)
  • PR checklist (tests required, docs updates, performance/security checks)
  • Commands (how to run lint/typecheck/tests/migrations so Claude can verify its own work)

Pro move: keep it short and strict. If CLAUDE.md turns into a wiki, it becomes background noise. Treat it like a contract.
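To make that concrete, here is a minimal sketch of what a root CLAUDE.md could contain. The module names, rules, and commands below are hypothetical placeholders, not a prescribed layout; adapt them to your own stack:

```markdown
# CLAUDE.md

## Non-negotiables
- Wrap and log all errors; never swallow exceptions silently.
- No secrets committed to the repo, ever.

## Architecture map
- `api/` owns request handling; `core/` owns business logic; `db/` owns persistence.

## Golden paths
- All DB access goes through the repository layer; no raw queries in handlers.
- Validate inputs at the API boundary, not deep inside core logic.

## PR checklist
- Tests added or updated; docs touched if behavior changed.

## Commands
- Lint: `npm run lint`
- Typecheck: `npm run typecheck`
- Tests: `npm run test`
```

The Commands section matters most in practice: it tells Claude how to verify its own work before declaring a task done.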

2) Extend capabilities with Skills

A Skill is a reusable playbook that turns “how we do X” into something you can invoke consistently. Not more prompting — repeatable procedures.

The point is to make Claude behave like your team’s best engineer on their best day, every day.

How to build one (fast, practical):

  • Define when to use it (and when not to)
  • Specify required inputs (paths, module names, constraints)
  • Write the method as steps (search → analyze → implement → verify)
  • Define the output contract (diff + tests + summary, or checklist + findings)
  • Add quality gates (lint/typecheck/tests must pass before “done”)

Skills worth building first:

  • /review-pr: runs your checklist the same way every time
  • /add-tests: generates tests in your preferred style with coverage expectations
  • /refactor-module: your “safe refactor” procedure, including guardrails

If you do nothing else, build a review Skill. Consistency is as important as raw model intelligence.
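For illustration, a Skill in Claude Code is defined by a SKILL.md file with YAML frontmatter. Here is a rough sketch of what an /add-tests Skill might look like; the directory layout, paths, and commands are assumptions to adapt, not an exact template:

```markdown
---
name: add-tests
description: Generate unit tests for a given module in our house style, with coverage expectations.
---

## When to use
When a module under `src/` lacks tests, or coverage has dropped below target.
Do not use for end-to-end or load tests.

## Inputs
- Path to the module to cover
- Constraints (e.g., "no network calls in tests")

## Method
1. Read the module and any existing tests for its style.
2. Draft table-driven tests covering the happy path and edge cases.
3. Run `npm run test` and iterate until green.

## Output contract
- New or updated test files plus a short summary of what is now covered.
```

The value is in the "when", "method", and "output contract" sections: they are what make the procedure repeatable instead of a one-off prompt.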

3) Get things done 10× faster with Claude Code Agent Teams

Most people run one Claude session and ask it to do everything sequentially.

Pros run Agent Teams: multiple Claude sessions in parallel, each working in its own context, with a lead session coordinating tasks and synthesizing results.

Where it shines:

  • Refactors across many packages (split by directory ownership)
  • Cross-cutting changes (API + UI + tests + docs)
  • Big bug hunts (repro agent, tracing agent, fix+tests agent)

The prompt pattern:

  1. define the outcome
  2. define the split strategy
  3. define a no-collisions rule

Example:
“Create an agent team for this web application. Split work by packages (api/, web/, shared/). Each teammate proposes a minimal diff plus tests. Lead delivers a single integrated patch and summary.”

You’re basically turning Claude into a mini org chart: parallel workers + one integrator.

4) Interrogate your codebase like an archaeologist

Most developers search codebases manually: grep for names, chase string literals, click through files until they “feel close.” That’s slow, and it misses the subtle stuff: duplicated checks, hidden bypasses, and patterns that drifted over time.

Pros use Claude Code like a superintelligent code archaeologist: not “find the file,” but “reconstruct the system.”

What amateurs do:
“Find where we handle user authentication.”

What pros command:
“Analyze our entire codebase and identify all authentication-related logic: direct implementations, helper functions, middleware, hooks, and hardcoded auth checks scattered throughout components. Map relationships between these implementations, identify inconsistencies, and flag potential security vulnerabilities or duplication.”

Why this works:

  • It finds semantic equivalents, not just keywords
  • It builds a map (entry points → flows → dependencies)
  • It surfaces drift (multiple token parsers, mismatched role logic)
  • It finds risk (client-only enforcement, missing server checks)

Ask for a structured output:

  • Auth Map (flows + entry points)
  • Inconsistencies (what differs and why it’s risky)
  • Smells/Vulns (missing checks, unsafe fallbacks, duplication)
  • Unification plan (what to centralize, what to delete, how to migrate)

That’s the difference: amateurs “search.” Pros run investigations.

5) Build custom MCP server chains (autonomous pipelines, not “one tool”)

Most people set up one MCP server and call it a day. Pros chain multiple MCP servers into an orchestration network that can run multi-step operations: analysis → changes → tests → deploy → verification → promotion.

Amateurs add just one single server, like “database.”

Pros orchestrate a set like:

  • codeAnalysis (find issues, map affected surfaces)
  • testRunner (targeted tests + suite gating)
  • securityScanner (dependency + pattern scanning)
  • deploymentPipeline (staging deploy, promotion, rollback)

The real unlock is one-shot execution with pre-approved permissions — not reckless “no prompts,” but deliberate guardrails:

  • least-privilege scopes
  • explicit allowlists
  • hard stop-conditions
  • mandatory gates (tests/scans must pass)
  • audit trail (commits, summaries, artifacts)

What amateurs ask:
“Scan for vulnerabilities.”

What pros command (single cascade prompt):
“Analyze our codebase for security vulnerabilities, apply safe fixes, run automated tests, update vulnerable dependencies, commit changes with documentation, deploy to staging, scan the deployed version, and if everything passes, deploy to production with rollback strategies ready.”

Wrap that into a Skill and you stop “asking Claude to help” — you start running pipelines.
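As a sketch, a project-level `.mcp.json` wiring up a chain like this could look as follows. The server names mirror the list above, but the commands and file paths are hypothetical placeholders for your own MCP servers:

```json
{
  "mcpServers": {
    "codeAnalysis": {
      "command": "node",
      "args": ["./mcp/code-analysis-server.js"]
    },
    "testRunner": {
      "command": "node",
      "args": ["./mcp/test-runner-server.js"]
    },
    "securityScanner": {
      "command": "npx",
      "args": ["-y", "security-scanner-mcp"]
    },
    "deploymentPipeline": {
      "command": "node",
      "args": ["./mcp/deploy-server.js"],
      "env": { "DEPLOY_TARGET": "staging" }
    }
  }
}
```

With all four registered, a single cascade prompt can move from analysis through tests and scans to a gated deploy, with each server handling its own step.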

This new Claude Code upgrade just changed everything

Wow I’ve never seen Windsurf or Copilot do something this incredible.

But Claude Code is going way, way beyond just code generation now. This is on a whole different level. This is total and complete software engineering. It’s all coming together.

Not just writing code based on your desires — but doing everything to intelligently make sure every single line of code ever written by you or itself or anyone actually matches those desires.

Just look at what it did here with Claude Code Desktop: we told it to launch the app and make sure everything is right, the checkout flow, the mobile responsiveness, the dark mode…

Not only did Claude Code autonomously run all the flows — it caught critical runtime errors along the way and fixed them all.

The best most other coding tools can do is to fix the syntax errors they make while generating code — but what Claude Code is doing here is light years more sophisticated and advanced.

And you know, these types of runtime errors can be so tricky, because a lot of them only occur in very specific flows and usage patterns. The app runs successfully and you think everything is fine, not realizing serious flaws are on their way to production.

And this is just one of the upgrades Claude Code has received within the past few days.

We just got Opus and Sonnet 4.6 for higher quality code and superior intelligence — now we are getting even more amazing new features to level up the entire software development process with that intelligence.

1. Built-in local code review

You can now run a “Review code” action on your local changes before pushing anything.

Claude analyzes your diff and leaves comments directly in the desktop diff view. It flags risky changes, missing edge cases, inconsistent patterns, or potential regressions.

Think of it as a pre-PR quality pass.

It’s not replacing human review, but it’s extremely useful for catching the “obvious in hindsight” mistakes before they ever reach your team.

2. Visual debugging — with autonomous self-correction

Claude can now spin up your local development server and see your running app directly in the desktop interface.

It doesn’t just read logs — it uses its vision capabilities to look at what’s actually rendered.

That means it can:

  • Identify layout issues
  • Notice broken spacing or alignment
  • Catch visual regressions
  • Flag components that don’t behave correctly in dark mode

You can literally say something like, “Make sure the dark mode works well,” and Claude can visually inspect the UI, identify contrast issues, spacing inconsistencies, or styling mistakes — and then fix them.

That’s a big step up from traditional AI coding workflows, where you had to describe what the UI looked like and what was wrong with it. Now Claude can see the output itself and self-correct.

It feels much closer to working with a human who can glance at your screen and say, “Yeah, that modal padding is off.”

3. Catching runtime errors — not just syntax mistakes

Syntax errors are the easy part.

What about:

  • Runtime errors that only appear after a button click?
  • State bugs that show up after a specific user flow?
  • Crashes triggered by edge-case inputs?
  • Logic errors that technically run but produce wrong results?

This is where Claude Code Desktop’s preview loop becomes powerful.

Because it can run your app, monitor logs, and interact with it, Claude can catch runtime errors — not just compilation issues. Even more importantly, it can test usage flows that surface bugs you wouldn’t catch from static analysis alone.

Instead of just fixing what won’t compile, Claude can:

  • Trigger flows
  • Observe failures
  • Trace stack errors
  • Patch logic
  • Re-run and verify the fix

That’s a much more comprehensive testing-and-repair loop than simply cleaning up red squiggly lines in an editor.

4. PR monitoring and optional auto-merge

Once your changes are pushed to GitHub, Claude can monitor the PR lifecycle inside the desktop app.

You can:

  • Track CI status
  • Let Claude attempt fixes if CI fails
  • Enable optional auto-merge once checks pass

This is where Claude starts handling workflow glue. Instead of babysitting a PR and refreshing checks, you can move on to something else while Claude watches it.

If CI breaks, it can try to fix the issue. If everything passes and you’ve enabled it, it can merge automatically.

That’s not just coding assistance — that’s delivery assistance.

5. Sessions that move with you

Claude Code sessions can now flow between CLI, desktop, and web. Start in one environment, continue in another, without losing context.

It sounds small, but not having to re-explain your project every time you switch surfaces removes friction fast.

We’re moving beyond “AI that helps you type code” toward “AI that helps you validate and ship working software.”

The real question isn’t whether Claude can generate a component anymore.

It’s whether you’re ready to let it run your app, test your flows, fix your runtime bugs, and quietly merge your PR while you work on the next thing.

Gemini 3.1 Pro is an absolute game changer

I guess it was too soon to call this 4.0 — but don’t let the 3.1 fool you.

This was way more than just a minor upgrade.

This was one of the biggest capability jumps we’ve seen in a while — especially if you care about reasoning, research, and actually shipping well-built, high-quality work.

Everyone has been talking about one particular unbelievable improvement in this new update.

Imagine going from scoring 31.1% on a reasoning test to 77.1%, the absolute best score on the same test, just a few months later. That’s what Gemini 3.1 just shocked the world with.

More than a 100% upgrade in capabilities.

And this is abstract reasoning we’re talking about — not memorization or “glorified autocomplete”. It had to solve problems with completely new logic patterns, problems it had never seen before — or something like before.

This is huge.

And this makes the 1 million context window it has even more lethal for coding and every other use case we can think of.

It’s vastly superior to its predecessor in every way. The graphics and SVG generation are so good — which is also a huge win for web developers.

1. Web browsing got dramatically better: 59.2% →…

This one is just as important.

On BrowseComp — a benchmark that measures how well a model can use web tools and navigate information — Gemini 3.1 Pro jumped from 59.2% to 85.9% — overtaking all Claude models, including the recently released Sonnet 4.6.

That’s huge.

The difference between those two numbers isn’t cosmetic. It’s the difference between:

  • Surface-level summaries vs. actual synthesis
  • Grabbing the first answer vs. cross-checking sources
  • Losing context across tabs vs. maintaining a clear research thread

If you use AI for research, competitive analysis, trend tracking, sourcing stats, or building content from multiple references, this upgrade matters a lot.

Better browsing doesn’t just mean “it can search.” It means it’s better at deciding what to search for, what to ignore, and how to combine findings into something coherent.

That’s a big shift.

2. This reasoning upgrade is not a joke

And neither was the test that measured it.

On ARC-AGI-2 — a standard benchmark designed to test abstract reasoning (not pattern regurgitation, but actual problem-solving) — Gemini jumped from 31.1% to 77.1%.

That’s not incremental improvement. That’s a different class of performance.

What does that mean in real life?

It means:

  • Fewer moments where the model “almost” understands your problem but misses a key constraint.
  • Better step-by-step thinking when tasks require multiple logical hops.
  • Stronger performance on planning, debugging, and structured workflows.
  • More reliable outputs when you’re building agents or automation.

If you’ve ever felt like an AI model lost the thread halfway through a complex task — this is the kind of upgrade that directly addresses that frustration.

3. Expanded output limits (aka: it can finally finish the job)

One of the most powerful upgrades — this model can now generate more output tokens than ever in a single go.

Gemini 3.1 Pro supports:

  • Up to ~1 million tokens of input context
  • Up to 65,536 tokens of output

In practical terms?

You can feed it massive documents, long threads, multi-file codebases, research dumps — and it doesn’t immediately choke.

And when it generates output, it doesn’t stop halfway through a spec or give you a half-written guide that needs three “continue” prompts.

For developers, creators, educators, founders, and product teams, this means you can:

  • Generate full-length documentation
  • Draft detailed product requirement docs
  • Create structured courses or long-form content
  • Produce complex code scaffolds in one go

The difference between “smart” and “usable” is often just output capacity. This pushes it firmly into usable territory.
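To put the 65,536-token output ceiling in concrete terms, here's a tiny planning helper (a back-of-the-envelope sketch, not part of any API) that splits an oversized generation into passes that each fit the limit:

```python
def plan_generation(requested_tokens, output_limit=65_536):
    """Split a long generation into sequential passes that each fit the
    per-response output limit. Ceiling division via -(-a // b)."""
    passes = -(-requested_tokens // output_limit)
    return [min(output_limit, requested_tokens - i * output_limit)
            for i in range(passes)]
```

A 100K-token spec now needs just two passes instead of a pile of "continue" prompts, and most docs need only one.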

4. Native SVG and creative coding

This part is honestly fun — and useful.

Gemini 3.1 Pro can generate native SVG animations directly from text prompts.

Not screenshots. Not image files. Actual, editable, website-ready SVG code.

Why does that matter?

Because SVG is:

  • Scalable (perfect at any resolution)
  • Lightweight
  • Editable
  • Animatable
  • Easy to embed into websites and apps

That means you can prompt:

“Create an animated SVG of a pulsing network graph with gradient nodes.”

And get code you can drop straight into a project.
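For a feel of what that output looks like, here's a minimal hand-rolled sketch of the kind of editable, animatable markup the model returns; the builder function and its defaults are invented for illustration:

```python
def pulsing_node_svg(cx=50, cy=50, r=10, color="#4f8ef7"):
    """Build a minimal animated SVG string: one node whose radius pulses.
    Uses the standard SMIL <animate> element; all defaults are invented."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{color}">'
        f'<animate attributeName="r" values="{r};{r * 1.5};{r}" '
        f'dur="1.5s" repeatCount="indefinite"/>'
        f'</circle></svg>'
    )
```

Because it's plain markup, you can tweak the colors, timing, or geometry by hand before shipping it.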

For designers, indie hackers, frontend devs, educators, or anyone building interactive content, this opens up a new workflow:

Prompt → tweak → ship.

It’s creative coding without the blank-page paralysis.

And it hints at something bigger: AI models that don’t just generate text or images — they generate real artifacts you can deploy.

Gemini 3.1 Pro is not just “a bit smarter”.

It’s:

  • Dramatically better at abstract reasoning
  • Dramatically better at tool-based research
  • Capable of handling much larger context and outputs
  • More useful for real creative and technical production

If you build things, research things, or create things, this version is meaningfully different from what came before.

And if this trajectory continues, we’re moving from “AI that assists” toward “AI that actually executes complex workflows with you.”

Claude Sonnet 4.6 is absolutely insane

Wow I’ve never seen Sonnet do something like this before. This is huge.

You absolutely cannot ignore this.

I don’t even need to compare it to GPT or Gemini or whatever.

Claude Sonnet is actually no longer trying to be a nice little tradeoff between intelligence and cost.

This new Claude Sonnet is here to be a MASSIVE CHALLENGER to its big brother Claude Opus.

And from the numbers I’m seeing, it has made dangerous progress toward achieving that with this new 4.6 update.

It decimated the previous version of Claude Opus (4.5) in basically every metric — and was incredibly close to the current Opus version — and even beat this latest Opus in notable areas.

Literally 2nd position in the biggest AI benchmarks out there — and guess the one model that stopped it from gaining top spot?

It’s gotten so much better at automating actions on your computer now (Computer Use):

1 MILLION token context — trust me this is not a model you want to mess around with.

With Sonnet 4.6, Claude will handle all your real-world, production AI workloads — especially coding and tool use — without the higher cost of Opus.

1. An essential coding upgrade that we will all feel

Sonnet 4.6 scored 79.6% on SWE-bench Verified, extremely close to Opus 4.6’s ~80.8%, showing near-flagship coding performance at lower cost.

And not just benchmarks. Sonnet 4.6 is here to work with us in real workflows:

  • Understanding large repos
  • Editing across multiple files
  • Avoiding unnecessary rewrites
  • Following existing structure instead of “overengineering”

In Anthropic’s own testing, developers preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time in Claude Code, citing better context reading and less duplication/overengineering.

2. Unbelievable Computer Use gains

Anthropic has been massively pushing Computer Use lately: AI models controlling our software the way we would, carrying out complex actions for us by clicking, typing, and navigating interfaces along the way.

With 4.6, that capability improved significantly.

Sonnet 4.6 achieved 72.5% on OSWorld-Verified, dramatically up from Sonnet 4.5’s ~61.4% and nearly matching Opus 4.6’s ~72.7%, which demonstrates near-parity in practical interface interaction tasks.

Sonnet 4.6 now performs nearly on par with Opus in Computer Use.

That’s a big deal because computer-use tasks are messy. They require:

  • Reading dynamic UI elements
  • Recovering from small mistakes
  • Planning multi-step actions

It’s not perfect, but it’s much closer to “practical assistant” than previous versions.

3. 1M is serious business

The new 1 million token context window means you can easily:

  • Load an entire workspace spanning multiple codebases
  • Drop in several long contracts
  • Analyze huge research dumps
  • Work across extended conversation history

More importantly, Anthropic emphasizes that 4.6 isn’t just ingesting that volume — it’s designed to reason across it.

For anyone doing knowledge-heavy work, that’s where things get interesting.
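Before dumping a whole workspace into that window, a rough size check helps; this sketch uses the common ~4-characters-per-token rule of thumb, not a real tokenizer:

```python
def fits_in_context(texts, limit_tokens=1_000_000, chars_per_token=4):
    """Estimate the token count for a batch of files/documents and check it
    against the context window. chars_per_token=4 is a rule of thumb for
    English text, not an exact tokenizer count."""
    estimated = sum(len(t) for t in texts) // chars_per_token
    return estimated, estimated <= limit_tokens
```

If the estimate comes back over the limit, you know up front to split the workload rather than discovering it mid-conversation.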

4. Notable upgrades for agentic and terminal workflows

Sonnet 4.6 posted 59.1% on Terminal-Bench 2.0, a notable improvement over Sonnet 4.5’s ~51.0% and closer to Opus 4.6’s ~62.7%, underscoring progress in complex, multi-step coding tasks.

Sonnet 4.6 feels very optimized for agents — the kind that:

  1. Plan
  2. Call tools
  3. Execute steps
  4. Reflect
  5. Iterate
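That plan → call tools → execute → reflect → iterate cycle can be sketched as a minimal loop (purely illustrative, not Anthropic's agent runtime):

```python
def run_agent(plan, tools, reflect, max_iters=5):
    """Minimal agent skeleton: plan a step, call the chosen tool, record the
    result, reflect on whether the goal is met, and iterate.

    plan(history) -> (tool_name, arg), or None when there is nothing left to do
    reflect(history) -> True when the goal is met
    """
    history = []
    for _ in range(max_iters):
        step = plan(history)
        if step is None:
            break
        name, arg = step
        result = tools[name](arg)          # execute the tool call
        history.append((name, arg, result))
        if reflect(history):               # goal met: stop iterating
            break
    return history
```

A model "optimized for agents" is one that keeps making sensible `plan` and `reflect` decisions many iterations deep into a loop like this.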

Sonnet 4.6 scored 91.7% (retail) and 97.9% (telecom) on t²-bench agentic tool use, a clear improvement over Sonnet 4.5’s 86.2% retail, and essentially on par with Opus 4.6’s 91.9% retail and 99.3% telecom results.

Benchmarks around tool use (like t²-bench) suggest strong reliability when interacting with structured tools and APIs.

If you’re building workflows that involve repeated tool calls and feedback loops, cost-to-performance matters. And this is where Sonnet 4.6 seems carefully positioned.

5. Safety and prompt injection resistance

When models start browsing or using tools, prompt injection becomes a serious concern.

Sonnet 4.6 significantly improves resistance to malicious or hidden instructions compared to 4.5, performing similarly to Opus 4.6 in safety evaluations.

In other words: it’s better at ignoring sketchy instructions embedded in web pages or documents.

That matters a lot for autonomous or semi-autonomous systems.
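Model-side resistance is the real defense here, but a naive filter shows the kind of embedded instruction we're talking about; the pattern list is a toy example:

```python
import re

# Toy pattern list: real defenses live in the model, not in a regex.
SUSPICIOUS = re.compile(
    r"ignore (all |your )?(previous|prior) instructions"
    r"|disregard the system prompt",
    re.IGNORECASE,
)

def flag_injection(fetched_text):
    """Return True if fetched web/document content contains instruction-like
    text aimed at the model rather than at the human reader."""
    return bool(SUSPICIOUS.search(fetched_text))
```

Attackers rephrase endlessly, which is exactly why a model that ignores such text natively beats any blocklist.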

6. Pricing stays the same

This is one of the biggest deals in this release.

Sonnet 4.6 is far, far better than Sonnet 4.5, yet the pricing remains the same:

  • $3 per million input tokens
  • $15 per million output tokens

Sending the clear message:

Opus-level reliability in many workflows — without Opus-level cost.
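Those published rates make cost estimates trivial; a minimal helper using the $3 / $15 per-million-token pricing above:

```python
def sonnet_cost(input_tokens, output_tokens, in_rate=3.00, out_rate=15.00):
    """USD cost of one call at Sonnet 4.6's per-million-token list prices."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
```

A typical call with 200K tokens in and 50K out comes to $1.35, which is the kind of number that makes Sonnet viable as a daily driver.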

When should you use it?

Choose Sonnet 4.6 if you want:

  • A daily-driver model for coding
  • A strong agent backbone
  • Large context handling
  • Reliable tool usage
  • Production deployment without premium-tier costs

Choose Opus 4.6 if:

  • The reasoning task is extremely complex
  • Precision is mission-critical
  • You’re doing heavy multi-agent orchestration

For most teams, Sonnet 4.6 is likely to become the default.

Anthropic seems to be collapsing the gap between “mid-tier” and “frontier.”

Instead of forcing users to upgrade to Opus for serious work, they’re making Sonnet strong enough to handle most of it.

If 4.5 felt like a capable assistant, then 4.6 feels more like a dependable coworker — especially for developers.

And that might be the real story here.

This new open-source model just became a major challenger to Claude Opus

Yet another unbelievable open-source coding model just got unleashed into the world.

Imagine a model that’s just as intelligent as Claude Opus — but 13 times cheaper!

But no need for you to imagine anymore — because this is exactly what the new MiniMax M2.5 is:

  • Unbelievably cheap — yet still incredibly smart
  • Blazing fast
  • Open-source with open weights

No wonder I’ve been seeing so many developers going crazy about it.

I’m seeing experiments showing that you can run complex agentic tasks continuously for as little as $1 per hour of output.

This is going to be massive for all those long, multi-step workflows where the model needs to plan, browse, write code, revise outputs, and loop until something works.

Aggressive pricing

This is easily the biggest reason for all the buzz right now:

  • Performance in real coding and agent workflows comparable to Claude Opus–class models
  • Roughly 13× cheaper in practical usage scenarios
  • Cost low enough that you stop optimizing prompts purely to save money

This is going to make a real difference.

Most agent systems fail economically before they fail technically:

  • In development: repeated tool calls and retries multiply costs quickly.
  • In production: costs add up fast as a growing user base repeatedly exercises multiple AI-powered features

M2.5’s pricing is designed to remove that constraint.

Standard pricing:

  • $0.30 per 1M input tokens
  • $1.20 per 1M output tokens

At rates like these, running long reasoning loops or persistent background agents becomes financially realistic for much larger workloads.
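You can sanity-check the "roughly 13× cheaper" claim against the Sonnet 4.6 rates quoted earlier in this piece ($3 in / $15 out); the exact factor depends on your input/output mix, so treat this as illustrative:

```python
def run_cost(input_tokens, output_tokens, in_rate, out_rate):
    """USD cost of one run at per-million-token rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

def cheaper_factor(input_tokens, output_tokens):
    """How much cheaper M2.5 ($0.30 / $1.20) is than a Sonnet-class model at
    $3 / $15 for a given token mix. Illustrative only: Opus list prices
    differ, so the real Opus comparison will vary."""
    m25 = run_cost(input_tokens, output_tokens, 0.30, 1.20)
    sonnet_class = run_cost(input_tokens, output_tokens, 3.00, 15.00)
    return sonnet_class / m25
```

An input-heavy agent run (say 500K in, 100K out) comes out around 11× cheaper; output-heavy mixes push the factor higher.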

The benchmark that made engineers pay attention

  • 80.2% on SWE-Bench Verified

This is it.

SWE-Bench Verified is widely considered one of the more meaningful coding benchmarks because it measures whether a model can actually resolve real GitHub issues under constrained evaluation conditions.

A high score here signals something specific:

  • Strong code understanding
  • Ability to follow multi-step debugging processes
  • Reliability in structured environments

In other words, it suggests the model can do more than generate code — it can fix existing systems.

Ridiculously fast

Raw intelligence is only part of agent performance. Speed determines whether iteration is usable.

M2.5’s high-speed variant runs at approximately:

  • 100 tokens per second

That level of throughput changes how agents behave in practice:

  • Faster plan → execute → verify cycles
  • Less waiting between iterations
  • Higher tolerance for multi-pass refinement
  • Better human experience when supervising agents

Many agent workflows involve dozens of internal steps. When each step is fast, experimentation becomes normal instead of frustrating.
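At ~100 tokens per second, wall-clock time for a multi-step loop is simple arithmetic (assuming generation speed is the bottleneck):

```python
def loop_time_seconds(steps, tokens_per_step, tokens_per_second=100):
    """Rough wall-clock estimate for an agent loop, assuming generation
    speed dominates over tool-call latency."""
    return steps * tokens_per_step / tokens_per_second
```

Thirty steps of ~500 tokens each is about two and a half minutes of generation, which is the difference between watching an agent work and walking away from it.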

Open weights: control instead of dependency

Another major part of the story is that M2.5 is not just an API product.

MiniMax released:

  • Open weights
  • A permissive modified-MIT style license
  • Support for local deployment stacks

This matters for companies building internal tooling because it allows:

  • Local inference for sensitive data
  • Predictable costs at scale
  • Custom infrastructure integration
  • Reduced vendor lock-in

The combination of strong performance and deployability makes M2.5 particularly attractive for engineering teams building long-lived internal agents.

GLM-5 is absolutely incredible for coding (7 new features)

Woah this is huge.

China’s Z.ai just released their brand new GLM-5 model and it’s absolutely incredible. I hope Windsurf adds support for this ASAP…

This is not just a “coding” model. This is full-blown software engineering.

They designed it from the ground up to build highly complex systems and intricate dev workflows.

Record-low hallucinations: from 90% in the previous version down to just 34% in GLM-5, thanks to a groundbreaking approach to training the model.

Like for example if I ask the model a question it doesn’t know — it’s more likely to just tell me it doesn’t know — instead of inventing garbage on the fly — like I see at times from GPT and the rest.

And it’s open-source with open weights (!)

Let’s check out all the amazing features in this release.

1. Agent-first behavior (designed to stay on task)

GLM-5 is positioned around what we developers call agent workflows — situations where the model has to plan, execute, check results, and continue working toward a goal instead of responding once and stopping.

The main improvement here isn’t personality or creativity. It’s consistency. The model is tuned to maintain context and direction over longer sequences of actions, which is essential if you want AI to handle real workflows instead of isolated prompts.

2. A true coding-focused model

Software engineering is one of the first and foremost priorities of this new model.

GLM-5 is optimized for working across larger codebases and longer development tasks rather than generating small snippets.

In practice this means keeping track of project structure, following constraints across files, and iterating toward working solutions. Improvements in coding usually signal broader gains in reasoning and planning — since programming requires precision and structured thinking.

3. Very large context window (so it can hold more of the problem at once)

GLM-5 supports an extremely long context length of 200,000 tokens, allowing large amounts of text, documentation, or code to stay visible to the model at once.

This matters more than it sounds. Instead of feeding information piece by piece, developers can provide entire specifications or large repositories in one session. That reduces fragmentation and makes long-running tasks far more stable.

4. Production-ready tool use

Another major focus is making the model usable inside real applications. GLM-5 includes features aimed at integration rather than conversation alone, such as:

  • function calling for external tools or APIs
  • structured outputs for predictable formatting
  • streaming responses
  • context caching for efficiency
  • different reasoning modes for complex tasks

These features make it easier to embed the model into systems where it needs to coordinate with software rather than simply generate text.
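Many providers expose these integration features through an OpenAI-style request shape; the payload below is a sketch of that pattern, and the field names plus the `glm-5` model id are assumptions, not confirmed from Z.ai's docs:

```python
def weather_tool_request(city):
    """Build an OpenAI-style chat request declaring one callable tool.
    Field names and the "glm-5" model id are illustrative assumptions."""
    return {
        "model": "glm-5",
        "messages": [
            {"role": "user", "content": f"What's the weather in {city}?"}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "stream": True,  # streaming responses, one of the features listed above
    }
```

The structured `parameters` schema is what lets the model return a predictable, machine-parseable tool call instead of free-form text.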

5. The “slime” framework (the training story behind the behavior)

One of the more interesting additions sits behind the scenes. The slime framework is an open reinforcement-learning post-training system designed to make large-scale training more efficient.

Its purpose is to improve how models learn from feedback during long or complex interactions.

Instead of only learning from static examples, the model can be refined through iterative training setups that resemble real workflows. That kind of training infrastructure is closely tied to improvements in stability and long-task performance.

In simple terms, slime helps train models to behave better over time, not just answer individual questions well.

6. Efficient long-context architecture

GLM-5 also uses newer attention techniques designed to keep long-context performance manageable in terms of compute cost. Long context is useful only if it remains practical to run, so part of the engineering effort goes into maintaining efficiency while scaling capability.

This reflects a broader trend in AI development: smarter architecture choices instead of only increasing size.

7. Hardware and ecosystem implications

Another reason GLM-5 has drawn attention is that it was developed with deployment in mind on domestically produced AI chips. That makes it notable beyond technical capability, since it signals growing independence from the traditional hardware stack that has dominated AI training and inference.

GLM-5 isn’t mainly about sounding smarter in conversation.

Its significance comes from where it points the industry next: models designed to manage complexity over time. Long context, structured tool use, reinforcement learning infrastructure like slime, and strong coding ability all serve the same goal — making AI systems that can carry work from start to finish rather than stopping at the first response.