Posts - Coding Beauty

New Claude Opus 5 just changed everything

By Tari Ibaba / Last updated on July 26, 2026

Anthropic just shocked the world with this absolutely incredible model.

It’s more powerful than Fable 5 in several critical areas — yet a whopping 50% cheaper.

You’re literally slashing your token costs in half just by swapping models.

Unbelievable — Opus 5 completely generated this AAA first-person shooter game from scratch — it didn’t use a single external asset — everything is generated by code:

Opus 5 shows superior physics and design ability in this 3D generation test, “A wrecking ball demolishing an apartment block”:

On CursorBench 3.2, at max effort, the model performs within 0.5% of Fable 5’s peak score, but at half the cost per task; it also achieves greater performance at a given cost than all other models on high, xhigh, and max effort
— Anthropic

One week ago Opus 4.8 was losing to Kimi K3. Now Anthropic holds the top two spots on the leaderboard:

Just imagine how scary the upcoming Fable 5.1 is going to be.

They’ve made so many massive improvements in coding.

Opus 5 dominated in several tough coding benchmarks, like:

ARC-AGI 3: Around 3× higher performance than previous leading models on novel abstract reasoning tasks.
Frontier-Bench v0.1: 43.3% pass rate, more than double Opus 4.8’s score on complex terminal coding.
CursorBench 3.2: Within 0.5% of Claude Fable 5 despite costing half as much.

“Generate a tornado that sucks in a whole field” — Opus 5 displayed superior physics and design ability compared to other models:

Opus 5 plans before coding, understands large repositories, maintains consistency across files, and produces cleaner implementations that require fewer revisions.

It’s also now much more powerful in visual reasoning and UI generation.

It can generate rich, interactive artifacts directly in chat, including:

Interactive SVG and Canvas applications
Scientific simulations
Educational visualizations
Data exploration tools
More polished UI prototypes

Iterative design is noticeably better, too.

The model accepts feedback, updates layouts, and preserves styling with far fewer broken elements or visual artifacts than previous generations.

It also has incredible long-horizon ability now — and it’s so good at persisting at a task until it finally gets it right.

It doesn’t just give up when it faces the most difficult problems — it deploys several techniques to work through them.

Key capabilities that let it do this:

Root-cause debugging: Finds and fixes underlying issues instead of applying temporary patches.
Self-written tools: During testing, it built its own computer vision pipeline to analyze raw pixel geometry when standard tools weren’t available.
Automatic self-verification: Checks branch states, reruns tests, and validates its own work before finalizing changes.

Prompts like “double-check your work” or “verify the output” become unnecessary because the model already does it automatically.

And now its tool usage has just become so much more sophisticated.

We now have dynamic mid-conversation tool management when using the Claude API with Opus 5.

You’re no longer locked into a fixed set of tools for an entire session. We can now:

Add new tools between conversation turns
Update existing tool definitions
Remove tools when they’re no longer needed
Preserve conversation context and prompt caching throughout the session
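
One way to picture this is a small registry that rebuilds the tool list before each turn. This is only a sketch in plain JavaScript: ToolRegistry and its methods are hypothetical names, not part of any SDK, and the tool objects simply mimic the usual name / description / input_schema shape of a tool definition.

```javascript
// Hypothetical helper: tracks which tool definitions are active for the next turn.
class ToolRegistry {
  constructor() {
    this.tools = new Map(); // tool name -> definition
  }

  // Add a new tool, or update the definition of an existing one with the same name.
  upsert(definition) {
    this.tools.set(definition.name, definition);
    return this;
  }

  // Remove a tool that is no longer needed; returns true if it existed.
  remove(name) {
    return this.tools.delete(name);
  }

  // Snapshot of the tool list to send along with the next request.
  forNextTurn() {
    return [...this.tools.values()];
  }
}

const registry = new ToolRegistry();
registry.upsert({ name: 'read_file', description: 'Read a file', input_schema: { type: 'object' } });
const turn1 = registry.forNextTurn();

// Between turns: add one tool and update another.
registry.upsert({ name: 'run_tests', description: 'Run the test suite', input_schema: { type: 'object' } });
registry.upsert({ name: 'read_file', description: 'Read a UTF-8 text file', input_schema: { type: 'object' } });
const turn2 = registry.forNextTurn();

// Later: drop a tool that is no longer needed.
registry.remove('read_file');
const turn3 = registry.forNextTurn();
```

Each snapshot is what you would pass as the tool list for that turn. How the API keeps caching intact across these changes is not something this sketch tries to model.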

It’s a small API change with a big impact for anyone building AI agents or complex developer workflows.

This is hands-down one of the best coding models we’ve seen in a long time.

New Claude Code Agent Teams feature is absolutely insane

By Tari Ibaba / Last updated on July 24, 2026

Wow this is incredible.

Claude Code’s Agent Teams feature is absolutely revolutionary.

This is going to improve my productivity massively.

A complete departure from the standard way we think about coding with AI.

Many AI coding tools still work like solo developers.

You give them a task, they analyze the codebase, write code, debug issues, and return the result — all within a single conversation.

That’s nice but incredibly limited — which is why Claude Code takes a totally different approach with Agent Teams.

Instead of relying on one AI to do everything, it lets you orchestrate multiple Claude instances that work together like a real engineering team.

Let me show you five features that make it such a huge deal — no one should ignore this.

1. Independent context windows

This is hands-down one of the biggest selling points for me.

Every teammate runs in its own completely separate Claude session, each with an independent context window.

Why does this matter?

Because when you expect one AI to understand hundreds of files while simultaneously:

Writing backend logic
Designing frontend components
Creating unit tests
Updating documentation

…its context quickly becomes cluttered. The result is often forgotten requirements, inconsistent reasoning, or hallucinations.

Agent Teams avoids this by dividing responsibilities. For example:

Lead Agent → planning and architecture
Backend Agent → APIs and business logic
Frontend Agent → UI development
QA Agent → testing
Documentation Agent → docs and guides

Each agent only needs to understand its own domain, allowing it to stay focused and produce more reliable results.

2. True peer-to-peer communication

I’ve seen some other tools that try to implement this multi-agent stuff too — but they simply can’t compare because of this.

Most of them use some sort of hub-and-spoke model — where every subagent reports back to a single lead.

Agent Teams completely breaks that pattern.

Teammates can communicate directly with one another.

For example, a Frontend Agent can ask the Backend Agent for an API response format without routing the request through the Team Lead.

That reduces coordination overhead and makes collaboration feel much closer to a real software engineering team.
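
To make the overhead difference concrete, here is a rough sketch that counts message hops for the same question under both patterns. TeamMailbox is a hypothetical in-memory stand-in, not how Claude Code actually passes messages, and the endpoint and response shape are invented for the example.

```javascript
// Hypothetical in-memory mailbox. Names are illustrative, not Claude Code internals.
class TeamMailbox {
  constructor(members) {
    this.inboxes = new Map(members.map((name) => [name, []]));
    this.deliveries = 0; // counts every message hop
  }

  send(from, to, text) {
    const inbox = this.inboxes.get(to);
    if (!inbox) throw new Error(`Unknown teammate: ${to}`);
    inbox.push({ from, text });
    this.deliveries += 1;
  }
}

const members = ['lead', 'frontend', 'backend'];

// Hub-and-spoke: the question and the answer both pass through the lead.
const hub = new TeamMailbox(members);
hub.send('frontend', 'lead', 'What does GET /orders return?');
hub.send('lead', 'backend', 'What does GET /orders return?');
hub.send('backend', 'lead', '{ id, total, items[] }');
hub.send('lead', 'frontend', '{ id, total, items[] }');

// Peer-to-peer: the two teammates talk directly.
const p2p = new TeamMailbox(members);
p2p.send('frontend', 'backend', 'What does GET /orders return?');
p2p.send('backend', 'frontend', '{ id, total, items[] }');

console.log(`hub: ${hub.deliveries} hops, peer-to-peer: ${p2p.deliveries} hops`);
```

Half the hops, and the lead's inbox (and context) never sees the exchange at all.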

As the human in the loop, you can also:

Jump directly into any agent teammate’s session
Inspect what it’s doing
Interrupt or redirect it mid-task

Meanwhile, the rest of the team continues working uninterrupted.

3. Interactive split-pane views

I hate when I see some coding assistants hide what they’re doing until they’re finished.

Agent Teams makes the entire workflow visible.

We can choose between:

In-Process Mode: Cycle through agents in a single terminal.
Split-Pane Mode: Run every teammate in its own terminal pane using tools like tmux.

The split-pane view is especially compelling.

You can literally watch:

One agent writing backend endpoints
Another generating unit tests
Another updating documentation

All at the same time.

And because every pane is interactive, you can jump into any agent, provide new instructions, then continue monitoring the rest of the team.

4. Mix-and-match AI models

Since every teammate is an independent session, each can run a different Claude model.

A practical setup might look like this:

Claude Opus/Fable → Team Lead for planning, architecture, and complex reasoning
Claude Sonnet → Backend and frontend implementation
Claude Haiku → Documentation, testing, and boilerplate tasks

So we can reserve the most capable model for strategic decisions while using faster, more affordable models for routine work.

The result: better cost-performance without sacrificing productivity.

5. Parallel “competing hypotheses” debugging

This is another crucial area where Agent Teams gets especially interesting.

Instead of investigating one possible cause at a time, the Team Lead can launch multiple investigators simultaneously.

For example:

Agent A investigates a race condition.
Agent B examines third-party API timeouts.
Agent C checks for database locks.

Each agent tests its own hypothesis independently before presenting its findings.

Instead of exhausting a single context window with sequential troubleshooting — the team explores multiple explanations in parallel, compares evidence, challenges conclusions, and converges on the root cause much faster.
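
The fan-out-then-compare pattern can be sketched with plain promises. Everything here is a stand-in: the investigators are stubs rather than real agent sessions, and the confidence numbers and evidence strings are made up purely to show the ranking step.

```javascript
// Stub investigators: in a real team each would be a separate agent session.
// The confidence values and evidence strings are made up for illustration.
const investigators = [
  {
    hypothesis: 'race condition',
    investigate: async () => ({ confidence: 0.2, evidence: 'no overlapping writes in logs' }),
  },
  {
    hypothesis: 'third-party API timeout',
    investigate: async () => ({ confidence: 0.7, evidence: 'timeouts line up with failures' }),
  },
  {
    hypothesis: 'database lock',
    investigate: async () => {
      throw new Error('could not reach database');
    },
  },
];

// Run every investigation concurrently and rank the ones that completed.
async function investigateInParallel(list) {
  const settled = await Promise.allSettled(
    list.map(async ({ hypothesis, investigate }) => ({ hypothesis, ...(await investigate()) }))
  );
  return settled
    .filter((result) => result.status === 'fulfilled')
    .map((result) => result.value)
    .sort((a, b) => b.confidence - a.confidence);
}

investigateInParallel(investigators).then((ranked) => {
  console.log('Most likely cause:', ranked[0].hypothesis);
});
```

Promise.allSettled is used instead of Promise.all so that one investigator crashing does not throw away what the others found.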

Claude Code Agent Teams isn’t simply about running multiple AI agents.

It’s about enabling them to collaborate effectively: an unprecedented shift from individual coding assistant to AI engineering team.

OpenAI just made Claude Code 10 times more incredible

By Tari Ibaba / Last updated on July 23, 2026

This is unbelievable.

They’ve literally brought their most insane GPT models to elevate Claude Code with their incredible new Codex plugin…

Not even to replace it — but to work side-by-side and fix every possible weakness Claude could possibly have when working on your codebase.

You’re literally getting the best of both worlds now — combining the best of the best in AI coding capability into one single workflow, incredible.

And we’ve been seeing similar things in other tools like Cursor and Google’s Antigravity — it’s very clear where AI-powered software development is heading right now.

1. Proactive delegation

Definitely one of the most interesting features of this plugin.

With the plugin installed, Claude Code doesn’t have to do everything itself. It can hand off work to Codex using /codex:rescue, which acts like a built-in escalation path.

Just imagine:

You’re working in Claude Code
Something gets tricky—maybe a bug, maybe a messy refactor
Codex instantly steps in and takes over that part

You don’t need to decide when to switch tools anymore. The system itself is structured so that one agent can call another when needed.

AI transforms from a single assistant into a full-blown coordinated team.

2. Cross-provider review

Review from a different model entirely.

Two main modes:

/codex:review → a standard second-pass code review
/codex:adversarial-review → a more critical, challenge-focused review

This is where things get powerful.

Instead of relying on one model’s perspective, you can:

Write code with Claude
Then have Codex review it independently
Or even challenge the approach itself

That matters because different models:

Learn from different data
Have different blind spots
Catch different kinds of mistakes

Now you end up getting:

Fewer missed bugs
More robust edge-case handling
Better overall code quality

It’s not magic—but two perspectives are almost always stronger than one — especially in debugging and design review.

3. Hybrid runtime: local + cloud working together

Another hidden benefit.

Claude Code is very much a local, terminal-first tool. It lives in your environment, works directly with your files, and runs commands on your machine.

Codex, on the other hand, can operate in sandboxed environments, including cloud execution.

Put them together and you get a hybrid setup:

Claude handles local context, editing, and orchestration
Codex can step in with isolated execution or deeper analysis

This combination gives you the best of both worlds:

Speed and control locally
Safety and scalability when offloading tasks

4. MCP shines yet again

None of this works smoothly without a shared way for tools to communicate.

That’s where Model Context Protocol (MCP) comes in.

Both Claude Code and Codex are built to use MCP — our now-very-standard universal interface for:

Tools
Data access
Workflows

Because they speak the same “language,” they can:

Share context
Access the same tools
Plug into the same workflows

This is what makes the integration feel natural instead of bolted on.

5. Competitive pricing: follow the strategy

There’s also a business angle here that’s hard to ignore.

OpenAI recently introduced a $100/month Pro tier, landing right in the same range as Anthropic’s Claude Max plan.

Now add the plugin into the picture:

Developers can keep using Claude Code (Anthropic’s tool)
But still route meaningful work through Codex (OpenAI’s system)

In other words, OpenAI doesn’t even need to win the interface.

If Codex is:

Handling reviews
Fixing bugs
Running delegated tasks

…then OpenAI still captures usage, even inside a competitor’s environment.

It’s a brilliant move.

What this really means

We’re very clearly moving away from:

“Which AI is best?”

And toward:

“How do different AIs work together?”

With this setup:

Claude acts as the orchestrator
Codex acts as the reviewer, challenger, or specialist

And you, the developer, are managing a multi-agent workflow instead of a single assistant.

That’s the real story here.

Not just a plugin—but the early shape of AI systems that collaborate, not compete.

Gemini 3.6 Flash doesn’t deserve all this hate

By Tari Ibaba / Last updated on July 22, 2026

I don’t get why people keep acting like Gemini is so bad?

Now Gemini 3.6 Flash just came out and the whole of Twitter is just bashing it for no reason, acting like Google is totally finished in the AI race.

Like how good were you really expecting it to be?

And why would you even compare this to a model like Fable or GPT-6.5 Sol?

Anti-Gemini narrative (just look at that massively inflated y-axis):

Reality — look at where 3.6 Flash is and look at Kimi K3, look at Sonnet & Fable 5…:

This is a mid-tier model for goodness’ sake. It’s not supposed to dominate every leaderboard and crush every benchmark.

This was a real upgrade with real improvements. It’s better than other mid-tier models like Sonnet 5 in certain areas.

And it’s even better than some frontier models like Grok-4.5 at multiple things.

People are just jumping to trash a good and promising model because that’s what’s trendy right now.

I bet many of them have never actually used Gemini since like 1.5 Pro came out 2 years ago.

Gemini right now has been amazing for me in coding — yes coding.

Especially Gemini in Devin/Windsurf (maybe some harnesses are much better than others).

Multiple times it has fixed bugs in my codebase that other models simply weren’t able to.

Gemini 3.6 Flash is comparable to Claude Opus 4.8 in intelligence, in the AI Arena leaderboard:

We’ve gotten so used to every new model topping the benchmarks that we’ve forgotten that there are other ways a model can improve other than raw intelligence.

Improvements like token efficiency, intelligence per cost, and intelligence-speed ratio.

Gemini 3.6 Flash’s blazing speed will make it ideal for many real-world applications where latency needs to be as low as possible.

And some devs act like coding is the only thing people use AI models for.

Okay maybe Gemini is really so so terrible at coding compared to Claude for you — but that’s actually only one use case, you know — out of several dozens and hundreds.

It doesn’t have to be the absolute best at every single thing.

Google is not only thinking about developers when they make their models — in case you didn’t know, they have a user base of multiple billions with all sorts of goals and interests and careers.

And unlike OpenAI or Anthropic, they have a whole lot more going on for them other than AI.

They don’t have to constantly release model after model to keep investors pleased and not go bankrupt.

They even have positive cashflow — with or without any of the AI stuff.

They have the means to be a lot more patient and play the long game.

And once you zoom out beyond the developer community, you’ll see that they’re doing very well even in the short game.

900 million monthly active users in the Gemini app.

Millions of positive reviews and 5-star reviews on both app stores.

Even the Gemini API itself is in massive demand with over 13 million users, so despite the narrative there are still a lot of developers and businesses that consider Gemini models to be the best choice possible for their apps and systems (I’m one of them btw).

Something tells me they’re not half as worried as people think they are.

Prompting is not enough for Claude Code. You need Hooks.

By Tari Ibaba / Last updated on July 21, 2026

Prompting is not enough.

At times there are much better ways to control an AI agent — like when you need the agent to perform a precise action or be aware of something only under a specific scenario.

That’s why top-tier tools like Claude Code come packed with features like Hooks.

Claude Code Hooks are event-driven scripts that run automatically at key points in Claude’s execution lifecycle — like:

before a tool is called
after a file is modified
when a session starts,
or when a task finishes.

Instead of relying on prompt engineering, hooks let you enforce rules and automate workflows at the system level, making Claude Code more predictable, secure, and customizable.

You can inject dynamic instructions that the agent gets only when certain conditions are met.

Let’s look at some of the major benefits and features of integrating Hooks into your workflow.

1. Hard security guardrails

❌ “don’t run dangerous commands”

❌ “don’t expose secrets”

These are not things you should be putting in prompts. Use hooks.

Instead of telling Claude what not to do in extremely sensitive scenarios and hoping it remembers — a PreToolUse hook can intercept tool calls long before they execute:

// sample hook file

import { readFileSync } from 'node:fs';

// 1. Read the event payload from stdin (file descriptor 0)
const input = JSON.parse(readFileSync(0, 'utf-8'));
const command = input.tool_input?.command || '';

// 2. Define blocked patterns (e.g., recursive deletes or reading secrets)
if (/rm -rf|\bcat .*\.env/i.test(command)) {
  console.error('SECURITY REJECTION: Dangerous command blocked.');
  process.exit(2); // Exit code 2 blocks the tool call & feeds stderr back to Claude
}

process.exit(0);

This lets you automatically:

Block destructive shell commands such as rm -rf, chmod 777, or force-pushing directly to main.
Prevent access to sensitive files like .env, SSH keys, AWS credentials, or API tokens.
Return a clear error explaining why an action was rejected, allowing Claude to choose a safer alternative instead.
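
The sample hook above only covers two patterns. A slightly fuller rule list, with a reason attached to each rule, might look like the sketch below. The patterns are my own illustrative guesses and are nowhere near exhaustive; BLOCK_RULES and checkCommand are hypothetical names, so treat this as a starting point rather than a vetted security policy.

```javascript
// Illustrative rule list. The patterns are assumptions and not exhaustive.
const BLOCK_RULES = [
  { pattern: /\brm\s+-(rf|fr)\b/i, reason: 'recursive force delete' },
  { pattern: /\bchmod\s+(-R\s+)?777\b/, reason: 'world-writable permissions' },
  // A push that mentions both a force flag and the main branch, in any order.
  { pattern: /\bgit\s+push\b(?=.*(--force|\s-f\b))(?=.*\bmain\b)/, reason: 'force-push to main' },
  { pattern: /\.env\b|id_rsa|\.aws\/credentials/, reason: 'access to secrets' },
];

// Returns whether a shell command is allowed, plus the reason when it is not.
function checkCommand(command) {
  const hit = BLOCK_RULES.find((rule) => rule.pattern.test(command));
  return hit ? { allowed: false, reason: hit.reason } : { allowed: true, reason: null };
}

for (const cmd of ['npm test', 'rm -rf ./build', 'git push origin main --force']) {
  const verdict = checkCommand(cmd);
  console.log(cmd, '->', verdict.allowed ? 'allowed' : `blocked (${verdict.reason})`);
}
```

Inside a real hook you would call checkCommand on the incoming command, write the reason to stderr, and exit with the blocking code when allowed is false.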

2. Enforcing engineering standards

Hooks can ensure Claude follows your team’s engineering methodology instead of taking shortcuts.

For example, a PostToolUse hook can automatically run your test suite or linter whenever Claude creates or edits source code.

If tests fail, the resulting stdout and stderr are fed directly back to Claude within the same interaction. Claude can then identify the failing assertions and attempt to fix them before you ever review the code.

This makes it easy to enforce practices such as:

Test-Driven Development (TDD)
Mandatory linting
Automated quality gates
Continuous validation after every code change
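
Here is a rough sketch of the feedback step: turning a test run into what the hook reports back. The input object mimics the { status, stdout, stderr } shape that Node's child_process.spawnSync returns. The toHookOutcome helper is hypothetical, and I'm assuming exit code 2 is the signal that surfaces stderr to Claude, as in the PreToolUse sample earlier.

```javascript
// `result` mimics the object returned by child_process.spawnSync:
// { status: number | null, stdout: string, stderr: string }.
function toHookOutcome(result, maxChars = 2000) {
  if (result.status === 0) {
    return { exitCode: 0, message: '' }; // tests passed, nothing to report
  }
  const combined = `${result.stdout}\n${result.stderr}`.trim();
  // Keep only the tail: test runners usually print the failure summary last.
  const tail = combined.slice(-maxChars);
  return { exitCode: 2, message: `Tests failed:\n${tail}` };
}

const passing = toHookOutcome({ status: 0, stdout: '12 passing', stderr: '' });
const failing = toHookOutcome({ status: 1, stdout: '11 passing, 1 failing', stderr: 'AssertionError: expected 200' });

console.log(passing.exitCode, failing.exitCode);
console.log(failing.message);
```

In an actual hook script you would run the test command, pass its result through something like this, print the message to stderr, and exit with the returned code. Trimming to the tail keeps a noisy test log from flooding the context.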

3. Dynamic context injection

Large system prompts often become bloated with documentation, coding standards, and compliance rules.

Hooks provide a more efficient alternative by injecting context only when it’s relevant.

For example:

If Claude modifies a file in /src/payments, a PreToolUse hook can inject payment-specific compliance or security guidelines.
A SessionStart hook can restore project context by loading GitHub or Jira issues, documentation, or data from a local vector database before you even type your first command.

This keeps the working context focused while ensuring Claude always has the right information at the right time.
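
The path-based case can be sketched as a simple lookup. The rule table, the guideline text, and the contextFor helper below are all hypothetical. It also assumes project-relative paths, so an absolute path would need to be made relative to the project root first.

```javascript
// Hypothetical mapping from path prefixes to extra guidance.
const CONTEXT_RULES = [
  { prefix: 'src/payments/', note: 'Payments code: never log card numbers; follow the compliance checklist.' },
  { prefix: 'src/auth/', note: 'Auth code: do not change token lifetimes without a security review.' },
];

// Returns the notes that apply to a project-relative file path.
function contextFor(filePath) {
  const normalized = filePath
    .replace(/\\/g, '/') // tolerate Windows-style separators
    .replace(/^\.?\//, ''); // drop a leading "./" or "/"
  return CONTEXT_RULES
    .filter((rule) => normalized.startsWith(rule.prefix))
    .map((rule) => rule.note);
}

console.log(contextFor('/src/payments/charge.ts'));
console.log(contextFor('src/ui/button.tsx'));
```

A file outside the listed folders gets an empty array, which is the whole point: no extra context unless it is relevant.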

4. Automatic code formatting

Formatting is another task that’s better handled by hooks than by prompts.

A PostToolUse hook can automatically run formatters such as Prettier, Black, or gofmt immediately after Claude modifies a file. This ensures:

Consistently formatted code
Cleaner Git diffs
Fewer prompt iterations spent fixing indentation or imports
More model time focused on solving real engineering problems
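
Picking the right formatter usually comes down to the file extension. Here is a minimal sketch of that decision. The FORMATTERS table and formatterCommand are hypothetical names, and the sketch assumes prettier, black, and gofmt are installed and on your PATH.

```javascript
// Extension -> formatter command (without the file argument).
// Assumes these CLIs are installed; extend the table for your own stack.
const FORMATTERS = {
  '.js': ['prettier', '--write'],
  '.ts': ['prettier', '--write'],
  '.py': ['black'],
  '.go': ['gofmt', '-w'],
};

// Returns the full command as an argv array, or null if no formatter applies.
function formatterCommand(filePath) {
  const dot = filePath.lastIndexOf('.');
  if (dot === -1) return null;
  const ext = filePath.slice(dot).toLowerCase();
  const base = FORMATTERS[ext];
  return base ? [...base, filePath] : null;
}

console.log(formatterCommand('src/app.ts'));
console.log(formatterCommand('README'));
```

Returning an argv array rather than a single string means the file path never has to be shell-quoted when you hand it to a process spawner.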

5. External tool orchestration

Because hooks execute native system commands, they can also connect Claude with the rest of your development stack.

Common use cases include:

Voice updates: Send completed tasks to a text-to-speech engine for spoken progress updates.
Notifications: Trigger Slack or Discord webhooks when Claude finishes a long-running task or requires human approval.
Cost monitoring: Track token usage throughout a session and send alerts—or terminate execution—when predefined budget limits are exceeded.
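
For the cost-monitoring case, the core logic is just a running total with two thresholds. This is a sketch under assumptions: TokenBudget is a hypothetical class, and how per-step token counts actually reach your hook depends on your setup, so that part is left out.

```javascript
// Hypothetical budget tracker. How token counts reach the hook is an assumption.
class TokenBudget {
  constructor(limit, warnRatio = 0.8) {
    this.limit = limit;
    this.warnRatio = warnRatio;
    this.used = 0;
  }

  // Record usage and report what the hook should do next.
  record(tokens) {
    this.used += tokens;
    if (this.used >= this.limit) return 'stop'; // budget exhausted
    if (this.used >= this.limit * this.warnRatio) return 'warn'; // send an alert
    return 'ok';
  }
}

const budget = new TokenBudget(1000);
const steps = [budget.record(500), budget.record(350), budget.record(250)];
console.log(steps.join(' -> '));
```

A 'warn' result is where you would fire the Slack or Discord webhook, and 'stop' is where the hook would end the session.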

Claude Code hooks transform Claude from a general-purpose coding assistant into a programmable development platform.

By enforcing security policies, automating engineering workflows, injecting context only when needed, handling formatting automatically, and orchestrating external tools — hooks enable teams to build AI-assisted workflows that are safer, more reliable, and better aligned with modern software development practices.

How the new “Karpathy Skills” drastically improve Claude Code’s accuracy

By Tari Ibaba / Last updated on July 20, 2026

High-quality AI coding in 2026 has gone way past just picking a good model and calling it a day.

There’s now much more that needs to be properly calibrated and fine-tuned to get the very best results from your agent.

We now have Skills which let you precisely shape AI behavior by packaging instructions, scripts, and context into reusable units.

And that’s what the new, wildly popular “Karpathy Skills” have been able to take advantage of to the fullest extent.

Karpathy Skills is a set of strict rules and guidelines that drastically improve the accuracy and reliability of your agent, once you add them to your CLAUDE.md (or CURSOR.md) file.

Let’s take a look at some of these key rules, so you can better understand why it makes such a massive difference.

1. The surgical strike

Most LLMs try to be helpful. Too helpful.

You’ve probably experienced this:

You ask for a fix or new feature.
They make the changes… but also:

clean up unrelated code
reformat files
rename variables
refactor “while they’re there”

It looks productive. But it leads to low model trust and messes up your mental model of the codebase.

The rule:

Only change the exact lines required
No drive-by edits
No unrelated improvements

Why it matters:

Prevents diff bloat
Makes PRs readable
Reduces hidden risk

Think about review time.

500-line diff → slow, error-prone
5-line diff → fast, obvious

This isn’t about style.
It’s about trust.

A good AI agent doesn’t try to improve everything.
It solves exactly the problem.

2. Extreme disambiguation

Most agents are optimized to continue.

If they’re 80% sure, they guess the missing 20% and move forward.

That’s dangerous.

The rule:

Don’t assume
Don’t hide confusion
Surface tradeoffs

In practice:

Ask clarifying questions
Present multiple interpretations
Push back on unclear requests

Why it matters:

Prevents hallucinated requirements
Exposes ambiguity early
Creates tighter feedback loops

Bad agent:

“Sure, I implemented it.”

Good agent:

“Do you want A or B? They have different tradeoffs.”

3. Goal-first thinking (declarative over imperative)

Most developers naturally give instructions.
AI works better with outcomes.

The Karpathy-style rule is simple:

Define success criteria
Loop until verified
Transform imperative tasks into verifiable goals

This pushes the agent into goal-first thinking.

What this changes

Instead of telling the AI what to do step-by-step, you tell it:

what should fail
what should pass
how to know it’s done

That’s declarative thinking.

Imperative vs declarative

Imperative (weak):

“Add validation to this endpoint.”

Declarative (strong):

“Write a test that fails when invalid input is accepted. Update the code until the test passes.”

Notice the difference:

Imperative → action
Declarative → outcome
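
Here is what the declarative version might look like once the agent is done: the success criteria are written down as checks, and the code exists to make them pass. The validateSignup function and its rules are invented for this example.

```javascript
// Hypothetical validator for a signup payload; the rules are illustrative.
function validateSignup(input) {
  const errors = [];
  if (typeof input.email !== 'string' || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input.email)) {
    errors.push('email');
  }
  if (!Number.isInteger(input.age) || input.age < 0) {
    errors.push('age');
  }
  return errors; // an empty array means the input is valid
}

// Success criteria, written as checks that fail if invalid input is accepted.
function check(condition, label) {
  if (!condition) throw new Error(`FAILED: ${label}`);
}

check(validateSignup({ email: 'a@b.co', age: 30 }).length === 0, 'valid input is accepted');
check(validateSignup({ email: 'not-an-email', age: 30 }).includes('email'), 'bad email is rejected');
check(validateSignup({ email: 'a@b.co', age: -1 }).includes('age'), 'negative age is rejected');
console.log('all checks passed');
```

The checks are the contract. "Done" stops being a judgment call and becomes "the script runs without throwing".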

4. Anti–future-proofing (simplicity first)

AI loves to over-engineer.

You ask for something simple.
It builds something “flexible.”

Suddenly you have:

abstractions
configuration layers
unused hooks
“just in case” logic

The rule:

If 200 lines could be 50, rewrite it
No abstractions for single use

Why it matters:

Prevents AI slop
Keeps code readable
Reduces long-term maintenance

Over-engineering compounds.

One abstraction → pattern
Pattern → everywhere
Everywhere → hard to change

Simplicity doesn’t mean naive.
It means appropriate.

If the problem is small, the solution should be small.

The takeaway

No new architecture.
No breakthrough model.

Just constraints:

Keep diffs small
Surface ambiguity
Define success with tests
Prefer simple code

Claude Skills give structure to these ideas.
Karpathy-style rules give them teeth.

The result:

Not an AI that writes everything — a reliable AI that writes just enough, and just right.

One you can trust in a real codebase.

5 Claude Code features for high-quality context — most developers ignore these

By Tari Ibaba / Last updated on July 18, 2026

The biggest limitation of every AI coding assistant isn’t intelligence.

It’s not how much code it can write or how fast it responds.

It’s context.

What happens as software projects grow?

Conversations become longer, repositories become larger, and the amount of information the model needs to juggle explodes.

Architecture decisions, coding conventions, debugging discoveries, terminal output, documentation, and previous discussions all compete for the same finite context window.

That’s why Claude Code attacks this problem from multiple directions to generate the highest quality code possible.

It doesn’t just give you a larger context window: it provides a wide range of tools that intelligently manage what the model knows, remembers, and carries forward between sessions.

Let’s look at five of the most powerful context management features, from quick context branching to automatic features that most developers take for granted.

1. Your context shouldn’t increase with every new message

I made this mistake a lot in the past.

I would ask Claude Code a lot of small, unrelated questions in the middle of a long coding session.

Questions that would become a permanent part of the context — despite having no long-term value beyond the moment I asked them.

I didn’t realize how much context I was wasting.

But thankfully we now have the /btw command to fix this exact problem in Claude Code.

It lets you ask a quick side question that has full visibility into your current conversation without adding the question or answer to your chat history. Instead, Claude shows the response in a temporary overlay that disappears when you’re done.

Asking questions on our changes with /btw:

That means you can ask things like:

“What was that config file called?”
“Why did we choose this approach?”
“Which function handles authentication?”

…without cluttering your main conversation.

When we press Enter, the btw message disappears and we’re back to our normal conversation:

For long-running coding sessions, /btw helps keep your context focused on the implementation while still giving you instant access to everything Claude already knows about your project.

2. Your Claude Code sessions don’t have to start from zero

You don’t have to begin every new conversation with Claude Code from scratch.

Its Auto Memory system allows Claude to accumulate useful project knowledge over time.

As you correct mistakes, establish workflows, or repeatedly teach it project-specific conventions, Claude can save those learnings automatically and reload them in future sessions.

Instead of repeatedly explaining things like:

build commands
debugging workflows
environment quirks
preferred implementation patterns

Claude gradually learns them itself.

Over time, the assistant becomes increasingly tailored to your project with almost no manual effort.

3. Claude can’t figure everything out by itself

You can teach Claude Code how it’s done.

If Auto Memory learns automatically, CLAUDE.md is where you teach Claude deliberately.

It’s like a permanent briefing document for your project.

Rather than re-explaining your coding standards, testing strategy, deployment workflow, or naming conventions every session, you simply write them once inside a CLAUDE.md file.

A sample CLAUDE.md file for Claude Code:

Every new conversation begins with that context already loaded.

A good CLAUDE.md might include:

coding conventions
important design decisions
common team-specific workflows
“always” and “never” rules

The more mature a project becomes, the more valuable this file gets.

4. What does Claude Code do when context gets out of hand?

Most AI tools simply start forgetting earlier parts of the discussion as context approaches the limit.

Claude Code does something much smarter — automatic context compaction.

As the context window fills up, it automatically compresses older portions of the conversation into a concise summary while preserving the important decisions, discoveries, and reasoning that led there.

Instead of carrying hundreds of thousands of tokens forever, Claude keeps the essential information while freeing space for new work.

This allows coding sessions to continue far longer without suffering the dramatic quality drop that often occurs when context windows become overloaded.
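
To get a feel for the idea, here is a toy version of compaction. To be clear, this is not Claude Code's actual algorithm, which isn't described here. The 4-characters-per-token estimate is a crude heuristic, and the summarize callback is a placeholder for what would really be a model call.

```javascript
// Rough token estimate: about 4 characters per token. A heuristic assumption.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Replace older messages with one summary once the conversation exceeds maxTokens.
function compact(messages, { maxTokens, keepRecent, summarize }) {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  if (total <= maxTokens || messages.length <= keepRecent) {
    return messages; // still within budget, nothing to do
  }
  const cut = messages.length - keepRecent;
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary = { role: 'system', content: `Summary of earlier conversation: ${summarize(older)}` };
  return [summary, ...recent];
}

// Six messages of about 100 estimated tokens each.
const messages = Array.from({ length: 6 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: 'x'.repeat(400),
}));

const compacted = compact(messages, {
  maxTokens: 300,
  keepRecent: 2,
  summarize: (older) => `${older.length} messages`, // placeholder for a real summary
});
console.log(compacted.length, compacted[0].content);
```

The most recent turns stay verbatim because they are the ones the next step most likely depends on. Everything older is traded for a short summary.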

5. What can you do when context gets out of hand?

Claude doesn’t have sole control. You can also take manual control of your context with multiple commands.

The /compact command tells Claude to summarize the current conversation immediately.

It’s perfect after finishing a large feature or debugging session. You preserve everything important while dramatically reducing context usage before moving on to the next task.

But sometimes you don’t even want any previous conversation at all.

That’s where /clear comes in.

It starts a fresh conversation with an empty context window while still keeping your persistent project knowledge—such as CLAUDE.md instructions and Auto Memory—intact.

Think of it as wiping the whiteboard clean without forgetting everything you’ve learned about the project.

Claude Code’s approach isn’t just about having more context — it’s about using context intelligently.

Together all these features let Claude spend its attention on what matters most, making long-running development sessions feel remarkably consistent even as your projects grow in size and complexity.

Kimi K3 just did something nobody ever expected from open-source AI

By Tari Ibaba / Last updated on July 17, 2026

This is unbelievable.

The new Kimi K3 is sending major shockwaves across the entire tech world.

For years people have been writing off open-source AI models, dismissing them as dumbed-down versions of “the real deal”.

But now, this Chinese company just released an open-source model that matches up to Claude Fable 5 in every single way, and even dominates it in multiple critical areas.

Kimi K3’s output was preferred 76% of the time when matched up against other models for the exact same task:

Kimi K3 utterly dominates the recently released GPT-5.6 in generating a mini shooting game:

And if you think this is just another tiny “12B” model, you’re so dead wrong.

You won’t believe how massive this model is, and every other feature it has to offer…

1. 2.8 trillion parameters

Kimi K3 blows Claude Opus 4.8 out of the water in a 3D scene generation of a military armory:

Until now, many people in the AI space thought open-source only made sense for scrappy 8B or 70B models.

That trillion-parameter frontier giants could only be available in the strictly private domain of mega-corporations with infinite budgets.

Kimi K3 totally demolishes this.

At 2.8 trillion parameters, it’s hands-down the largest open-weight model ever developed.

It matches the rumored scale of closed-source giants like Claude Fable 5, proving developers don’t have to compromise on raw power to keep things open.

2.8T Mixture-of-Experts: Massive scale, but hyper-efficient.
16 Active Experts: Only activates what it needs per token, keeping compute costs sane.
No More Compromises: Puts proprietary-grade cognitive muscle into the public’s hands.
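
If "only activates what it needs per token" sounds abstract, here is the general top-k routing idea behind Mixture-of-Experts models in miniature. This is a generic sketch, not Kimi K3's actual router, whose details aren't covered here. It uses 4 experts and k = 2 instead of 16 active experts so the numbers stay readable, and the router scores are made up.

```javascript
// Numerically stable softmax over a list of scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the k highest-scoring experts for one token and normalize their weights.
function routeTopK(routerScores, k) {
  const top = routerScores
    .map((score, expert) => ({ expert, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
  const weights = softmax(top.map((t) => t.score));
  return top.map((t, i) => ({ expert: t.expert, weight: weights[i] }));
}

// Made-up router scores for a single token across 4 experts.
const chosen = routeTopK([0.1, 2.0, -1.0, 1.5], 2);
console.log(chosen);
```

Only the chosen experts run for that token, so compute per token scales with k rather than with the total number of experts. That is how a model can be enormous on disk and still affordable to run.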

2. Coding with open-source models is no longer a joke

Kimi K3 demonstrates superior game physics and logic compared to GPT-5.6 and Opus:

Before now, a lot of software developers never really saw an open-source model as an option for coding.

Most tasks, from complex software engineering to sleek frontend design, were always handed off to Claude and other closed-source models.

With the new Kimi K3, many are starting to realize just how wrong they were. K3 immediately claimed the #1 spot on LMArena’s Frontend Code Arena, dethroning Claude Fable 5.

It doesn’t just write code; it thrives on complexity, creating fully playable 3D games and gorgeous interfaces entirely from scratch.

LMArena Champion: Beat the best proprietary models in frontend design.
Coded with Taste: Understands aesthetics, layout, and user experience natively.
Ambitious Generation: Spits out functional, interactive applications in one go.

3. “Long-horizon” autonomy isn’t restricted to closed APIs

We never actually needed closed, guarded infrastructure for this.

For all those long-running autonomous agents that run for hours to solve complex, multi-step tasks, Kimi K3 is here and it’s built from the ground up for the marathon.

In testing, it ran autonomously in sandboxes for up to 48 hours. It didn’t break; instead, it built its own GPU programming compiler (MiniTriton) from scratch and solved graduate-level astrophysics problems.

48-Hour Autonomy: Stays on track for days without human hand-holding.
Hardware-Level Mastery: Optimizes kernels and compiles code natively.
True Agency: Moves open-source from simple chatbots to actual digital workers.

4. Defeating the memory bottleneck of 1M+ context windows

Running a 1-million-token context window on a giant model doesn’t really require hyper-optimized, proprietary hardware.

And Kimi K3 demonstrated this perfectly.

Instead of throwing more hardware at the problem, Moonshot AI used clever architecture.

By introducing Kimi Delta Attention and Attention Residuals, they slashed the memory needed for the model’s short-term cache by up to 75%, making massive inputs actually run on standard hardware.

75% Memory Savings: Drastically lowers the hardware bar for giant datasets.
Kimi Delta Attention: Smart compression that keeps long chats incredibly fast.
Open-Source Innovations: Anyone can now study and build on these memory-saving breakthroughs.
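
To get a feel for why the cache is the bottleneck, here is a rough back-of-the-envelope sketch. It uses the standard key/value cache size formula for ordinary attention, not the internals of Kimi Delta Attention, and every model dimension below is a made-up placeholder since the article doesn't give K3's real ones. Only the 75% figure comes from the claim above.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    """Size of a standard attention KV cache: keys plus values for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical dimensions, chosen only to make the arithmetic concrete.
baseline = kv_cache_bytes(
    layers=64,
    kv_heads=8,
    head_dim=128,
    seq_len=1_000_000,   # the 1M-token context from the article
    bytes_per_value=2,   # 16-bit values
)

# Apply the "up to 75%" saving quoted in the article.
reduced = int(baseline * (1 - 0.75))

print(f"baseline cache: {baseline / 1e9:.1f} GB")
print(f"after 75% cut:  {reduced / 1e9:.1f} GB")
```

With these placeholder numbers the cache drops from roughly 262 GB to roughly 66 GB for a single 1M-token sequence, which is the difference between needing a rack and needing a few cards.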

5. Demolishing the “delayed release” pattern

Until now, “open weights” meant getting yesterday’s technology.

We assumed labs would keep their shiny, cutting-edge flagship models closed, only releasing weaker, older versions to the public.

Moonshot AI completely breaks out of that mindset.

They’ve bypassed the “lite” versions and released the weights of their absolute crown jewel under a Modified MIT license.

They are putting the absolute state-of-the-art directly in our hands.

No “Lite” Gatekeeping: The actual, uncompromised flagship is being released.
Modified MIT License: Built for developer freedom and commercial innovation.
Running Neck-and-Neck: Proves open-source isn’t a step behind anymore—it’s setting the pace.

How Google AI Studio makes developers so much more powerful

By Tari Ibaba / Last updated on July 16, 2026

Google AI Studio just keeps ascending toward unbelievable heights…

From a simple prompt playground for basic AI testing…

To a practical development environment for building apps, interfaces, and AI-powered products faster.

With incredible features like prompt autocomplete, visual editing, and sophisticated integrated image generation, we developers can now move from idea to prototype faster than ever.

1. Tab Tab Tab: Prompt autocomplete

The “tab tab tab” prompt autocomplete feature helps developers expand rough ideas into stronger prompts instantly. Instead of writing a detailed prompt from scratch, a developer can start with something like:

“Create a clean SaaS dashboard with analytics cards…”

AI Studio can then suggest layout details, styling direction, responsiveness, components, and user flows.

This turns prompting into something closer to code autocomplete. It speeds up brainstorming, UI generation, front-end scaffolding, and MVP creation. Developers can quickly generate a React-style structure, landing page, dashboard, or app layout, then export and customize the code further.

For solo founders and indie hackers, this is especially useful because it reduces the time spent on boilerplate HTML, CSS, and basic UI structure.

2. Design previews

Design previews let developers choose the visual direction of an app before it’s finished.

In Google AI Studio’s vibe coding experience, Gemini can now generate custom themes while your app is being created.

Within seconds, developers can compare different looks and pick the one that best fits the product: minimal, playful, premium, futuristic, enterprise-ready, or creator-focused.

For SaaS builders this means landing pages, dashboards, and MVPs no longer have to start with a generic “AI-generated” look. You can establish a stronger visual identity from the beginning, then refine the code later.

3. Edit mode and annotation

Edit mode transforms AI Studio from a chatbot into a full-blown visual development tool.

With annotation, developers can draw directly on the app interface. They can circle a section, mark an area, or point to a component and write notes such as:

“Make this bigger,” “move this to the top,” or “reduce the spacing here.”

The AI interprets the visual instruction and updates the app accordingly.

This is a major improvement because many UI changes are easier to show than explain. Instead of writing long prompts to describe a design problem, developers can communicate visually.

This brings AI Studio closer to tools like Figma, but with code generation and AI assistance built in.

4. Integrated image generation with Nano Banana

Nano Banana integration solves one of the most common developer problems: creating visual assets.

AI Studio can now generate custom images, logos, icons, illustrations, and UI graphics while the app is being built. This removes the need to search for placeholder images, icon packs, or temporary “programmer art.”

Even better, the generated assets can maintain a consistent aesthetic across the project. Colors, style, tone, and visual language can remain aligned from the landing page to icons and illustrations.

For developers building SaaS products this means they can create beautiful marketing pages and more polished MVPs without needing a designer at the earliest stage.

These features compress the product-building workflow. Developers can prompt an idea, preview the design, annotate changes, directly edit components, generate matching assets, and export code.

That makes Google AI Studio increasingly useful for rapid prototyping, MVP development, SaaS landing pages, and front-end experimentation. It helps developers spend less time fighting boilerplate and more time turning ideas into working products.

GPT-5.6 is an absolute game changer

By Tari Ibaba / Last updated on July 14, 2026

This is HUGE.

OpenAI just shocked the entire coding world with the new GPT-5.6.

It actually dominated Claude Fable 5 in some really key areas, wow.

GPT-5.6 outshines both Claude Fable and the legendary Claude Mythos in multiple high-profile benchmarks.

Unbelievable scores in the most challenging benchmarks out there.

GPT-5.6 convincingly beats almost every model on the AI Code Arena Frontend leaderboard, scoring near joint first with Claude Fable 5:

And it’s not just way more intelligent. It also comes with an entirely new model family, and multiple new first-party tools to build incredible things with the power of GPT-5.6.

Including a brilliant new competitor to OpenClaw.

This model is so powerful that OpenAI actually had to spend weeks convincing the US government that it was safe enough to release into the wild.

1. A brand new model family

GPT-5.6 outclasses its predecessor in frontend web design:

They’ve completely abandoned their traditional one-model strategy.

GPT-5.6 now comes in three permanent capability tiers that can evolve independently over time.

Sol is the flagship model for advanced coding, research, cybersecurity, and complex agent workflows.

Terra is the balanced everyday workhorse, combining strong reasoning with lower latency.

Luna is highly optimized for speed and cost — making it ideal for high-volume tasks like customer support, translation, and summarization.

Instead of paying flagship prices for every request, we can now choose the right model for each workload.
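
In practice that could look something like the sketch below. The workload categories come straight from the tier descriptions above, but the `pick_tier` function and its routing rule are a hypothetical illustration, not anything OpenAI ships.

```python
# Workload categories taken from the tier descriptions above.
SOL_TASKS = {"advanced coding", "research", "cybersecurity", "agent workflow"}
LUNA_TASKS = {"customer support", "translation", "summarization"}

def pick_tier(task_type: str) -> str:
    """Map a workload label to a GPT-5.6 tier name (hypothetical routing rule)."""
    task = task_type.strip().lower()
    if task in SOL_TASKS:
        return "Sol"    # flagship, for the hardest work
    if task in LUNA_TASKS:
        return "Luna"   # optimized for speed and cost
    return "Terra"      # balanced default for everything else

for job in ["Translation", "cybersecurity", "email drafting"]:
    print(job, "->", pick_tier(job))
```

The point of defaulting to Terra is that you only pay flagship prices when a task is explicitly flagged as needing Sol.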

2. Record-breaking leaps in intelligence and coding ability

For the first time in several months, an OpenAI model tops the highly granular DesignArena frontend benchmarks, with the new GPT-5.6:

GPT-5.6 Sol is now OpenAI’s most capable model.

It sets new state-of-the-art results across several major benchmarks, including TerminalBench, BrowseComp, and Humanity’s Last Exam.

On TerminalBench 2.1, which evaluates real-world software engineering tasks, Sol scored 88.8%, rising to 91.9% in its new Ultra reasoning mode.

On BrowseComp, it achieved 92.2%, while Humanity’s Last Exam climbed to 52.7%, outperforming GPT-5.5 as well as Claude Fable 5 and Claude Opus 4.8. OpenAI also says Sol reaches these scores while using significantly fewer output tokens than previous models.

Perhaps the most impressive breakthrough comes from ARC-AGI-3, a benchmark designed to measure how well AI adapts to completely unfamiliar problems.

Sol became the first frontier model to solve a public ARC-AGI-3 task.

The interesting part isn’t just that it performs better. It’s how it reasons.

When one approach fails, Sol dynamically forms a new hypothesis instead of repeatedly executing the same flawed plan — a major weakness of previous-generation AI agents.

3. So good at hacking it delayed the launch

GPT-5.6 outclasses top Claude models in the extremely long-horizon Agent’s Last Exam benchmark, achieving the best results for a fraction of the cost:

GPT-5.6’s cybersecurity capabilities became advanced enough to trigger an additional U.S. government national security review before public deployment.

The model shows major improvements in command-line operations, vulnerability discovery, threat modeling, exploit analysis, code review, and automated patch generation.

Those gains required an equally large investment in safety. According to OpenAI, Sol now blocks roughly 10× more potentially malicious cyber activity than previous generations while remaining below the company’s “Critical” capability threshold.

4. Massive efficiency gains = massive cost savings

GPT-5.6 demonstrates an incredible ability to generate complex, highly sophisticated interactive diagrams and visualizations from scratch:

GPT-5.6 isn’t just smarter — it’s substantially more efficient.

Sol is 54% more token-efficient on AI coding tasks than previous reasoning models while still achieving better results. It delivers top-tier performance using less computation, reducing both latency and API costs.

And the pricing reflects that efficiency too.

Sol costs $5 per million input tokens and $30 per million output tokens, while Terra and Luna give you progressively cheaper alternatives for less demanding workloads.
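
Here is a quick sketch of what those Sol rates mean per request. The two prices are the ones quoted above; the `sol_cost_usd` helper and the token counts in the example are made up for illustration. Terra and Luna are left out because the article doesn't give their prices.

```python
SOL_INPUT_USD_PER_MILLION = 5    # from the pricing quoted above
SOL_OUTPUT_USD_PER_MILLION = 30  # from the pricing quoted above

def sol_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one Sol request from its token counts."""
    input_cost = input_tokens * SOL_INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * SOL_OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost

# Hypothetical request: a 200k-token codebase in, a 50k-token patch out.
print(f"${sol_cost_usd(200_000, 50_000):.2f}")  # prints "$2.50"
```

Notice that output tokens are six times pricier than input tokens, so a model that gets the job done with fewer output tokens saves you the most.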

5. ChatGPT Work and multi-agent orchestration

ChatGPT Work using multiple tools and features to achieve tasks blazingly fast:

GPT-5.6 also powers ChatGPT Work, OpenAI’s new platform built around delegation and getting things done.

Its headline feature is Ultra mode, which can coordinate up to four AI agents simultaneously across parallel workstreams.

Instead of solving a project sequentially, GPT-5.6 splits it into multiple tasks, lets separate agents tackle them in parallel, and combines the results into a finished deliverable. That pushes TerminalBench performance from 88.8% to 91.9%.
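
The split, run in parallel, then combine pattern is easy to sketch with Python's standard library. This is only an illustration of the general idea, not how ChatGPT Work is implemented: `run_agent` is a stand-in for a real agent call, and the only number taken from the article is the limit of four agents.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_AGENTS = 4  # the article's stated limit for Ultra mode

def run_agent(subtask: str) -> str:
    """Stand-in for a real agent call (hypothetical); here it just echoes."""
    return f"done: {subtask}"

def orchestrate(subtasks):
    """Run subtasks on up to MAX_AGENTS workers, then merge the results."""
    with ThreadPoolExecutor(max_workers=MAX_AGENTS) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(run_agent, subtasks))
    return "\n".join(results)

print(orchestrate(["build API", "write tests", "draft docs"]))
```

Because `pool.map` preserves input order, the combined deliverable comes out in a predictable sequence even though the agents finish at different times.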

ChatGPT Work also integrates directly with tools like Slack, Google Drive, Microsoft Teams, CRM platforms, and local desktop files.

This lets GPT-5.6 spend hours building web apps, updating spreadsheets, generating reports, or completing other multi-step workflows with minimal supervision.

GPT-5.6 is going to seriously revolutionize the AI coding landscape.

With stronger reasoning, industry-leading coding performance, permanent capability tiers, major efficiency gains, and true multi-agent orchestration, it’s one of OpenAI’s biggest releases in years.
