
OpenAI o3 clearly proves that “AGI” doesn’t really mean anything

o3! Wow! AGI has finally been achieved now?! No way!!

Lol.

So when is someone finally going to tell us what “AGI” means?

Or are we going to keep moving the goalposts to keep our heads in the sand about the inevitable?

Okay so sure, it wasn’t AGI when ChatGPT first shocked everyone including OpenAI in 2022 with dynamic responses on virtually any topic. I remember many were playing dumb back then and calling it glorified autocomplete.

It wasn’t AGI when GPT passed the bar exam and the SAT.

It wasn’t AGI when GPT smashed the Turing test. Never mind that what counts as passing the Turing test has kept changing for years as AI gets more and more advanced — another way of moving the goalposts.


Now we have o3 scoring 87.5% on the ARC-AGI benchmark… are we there yet?

Damn, look at how much that stuff costs tho. $2,500 per task on the high end? But of course it will eventually come down — right? Right?

No way in hell they’re going to give this away for free. Maybe we can expect a new $2000 plan soon.

But is it “AGI”?

How good does AI have to get before you say it’s “general”?

o3 destroys PhDs in standardized tests, gets 96.7% on one of the toughest math exams in the world (AIME), beats almost every single competitive coder on Codeforces…

ChatGPT could already do therapy, write poems and song lyrics you could never dream of, generate personalized workout plans, explain weird French translations…

But oh no, it couldn’t possibly be “general” intelligence, it’s just glorified autocomplete.

Or no, it needs to be an omnipotent god before we can call it AGI.

Why do we even treat AGI in such a binary way? Either it is or it isn’t? A one-time, ultimate final destination after which all our jobs get wiped out instantly and we’re doomed.

The job loss is already here and it’s happening slowly but surely, as AGI-lity advances.

You say for AGI it needs to be able to learn and reason, but can’t GPT already do that? And what do you really mean by “learn” and “reason”?

When you upload a PDF to ChatGPT that it’s never seen before and it answers every single question you ask far faster than if you slogged away reading it yourself, it didn’t “learn”? But if you’d done the same, you would have “learned”, right? Or was it still glorified autocomplete? But not you, right? You have “real” intelligence.

There’s a reason why tools like AutoGPT and BabyAGI were such big deals. They were the first AI agents.

They could create a step-by-step plan to achieve any goal using the tools at their disposal — while checking that their actions were in line with the plan.
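Here’s a toy sketch of that plan-act-check loop in Python. To be clear, this is not AutoGPT’s actual code; the `llm` and `tools` arguments are hypothetical stand-ins for a language model call and a set of tool functions.

```python
# A toy sketch of the plan-act-check loop AutoGPT-style agents run.
# NOT AutoGPT's real code: `llm` (a callable that returns text) and
# `tools` (name -> function) are hypothetical stand-ins.
def run_agent(goal: str, llm, tools: dict) -> list[str]:
    results = []
    # 1. Create a step-by-step plan for the goal
    plan = llm(f"Break this goal into short steps: {goal}").splitlines()
    for step in plan:
        # 2. Pick a tool at the agent's disposal and execute the step
        tool_name = llm(f"Pick one tool from {list(tools)} for: {step}").strip()
        result = tools[tool_name](step)
        results.append(result)
        # 3. Check the action against the plan, and re-plan if it drifted
        verdict = llm(f"Did {result!r} move us toward {goal!r}? yes/no")
        if not verdict.strip().lower().startswith("yes"):
            plan = llm(f"Revise the remaining steps given: {result!r}").splitlines()
    return results
```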

And when you think of it, this is basically what we humans do almost every single moment of our lives, even if we don’t realize it.

Life is all goals, conscious or sub-conscious, short-term or long-term — eat, tell a joke, get rich, write an article, run for your life, cast your vote, kiss…

We break down the goals into smaller sub-goals and use the tools we have to achieve them — our legs, our speech, our devices, our money, and so much more.

I said “I’m hungry” and it knew I wanted food, is that not “reasoning”?

These tools weren’t perfect but they still showed promising glimpses of success in many demos that went viral across the internet.

They showed us what’s possible with AI agents, and now everyone is rushing to build the most advanced agent.

These agents will only continue to get better and better at complex problem solving and reasoning. After a certain point the only thing limiting them will be the tools you connect them to.

We’re already seeing the progress — just look at Google’s new Project Mariner agent in action.

Coding tools like Copilot and the new agentic Windsurf IDE are already making software dev easier than ever before.

o3-powered agents will be even more powerful than those built on previous models.

Fact is, whether you call these tools AGI or not, they’re already doing things we never dreamed they’d be able to do.

“Creative” jobs like writing, visual art, and UI design are already falling to AI. Now we have video generators upon us that will only rapidly improve as we enter 2025.

The AI takeover is happening and it’s not stopping, whether they meet your definition of AGI or not.

OpenAI’s new AI agent will change everything

The new OpenAI operator agent will change the world forever.

This is going to be a real AI agent that actually works — unlike gimmicks like AutoGPT.

Soon AI will be able to solve complex goals with lots of interconnected steps.

Completely autonomous — no continuous prompts — zero human guidance apart from dynamic input for each step.

Imagine you could just tell ChatGPT “teach me French” and that’ll be all it needs…

  • Analyzing your French level with a quick quiz
  • Crafting a comprehensive learning plan
  • Setting phone and email reminders to help you stick to your plan…
Not quite there yet 😉

This is basically the beginning of AGI — if it isn’t already.

And when you think about it, this is already what apps like Duolingo try to do — solve a complex problem.

But an AI agent will do this in a far more comprehensive and personalized way — intelligently adjusting to the user’s needs and changing desires.

You can say something super vague like “plan my next holiday” and instantly your agent gets to work:

  • Analyzes your calendar to find the best time for your next holiday
  • Uses your previous conversations to figure out a destination you’ll love that stays within your budget
  • Books flights and makes reservations according to your schedule

This will change everything.

Which is why they’re not the only ones working on agents — the AI race continues…

We have Google apparently working on “Project Jarvis” — an AI agent to automate web-based tasks in Chrome.

Automatically jumping from page to page and filling out forms and clicking buttons.

Maybe something like Puppeteer — a dev tool programmers use to make the browser do stuff automatically — but one that isn’t hard-coded and is far more flexible.
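For contrast, here’s roughly what that hard-coded automation looks like. I’m sketching it with Playwright’s Python API (a close cousin of Puppeteer, which is a Node.js tool); the URL and selectors are made-up placeholders.

```python
# Hard-coded browser automation, sketched with Playwright (a
# Puppeteer-like tool). The URL and selectors are hypothetical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/signup")   # hypothetical page
    page.fill("#email", "me@example.com")     # fill out a form field
    page.click("button[type=submit]")         # click a button
    browser.close()
```

An agent like Jarvis would decide each of these steps at runtime instead of following a fixed script.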

Anthropic has already released their own AI agent in Claude 3.5 Sonnet — a groundbreaking “computer use” feature.

Google and Apple will probably have a major advantage over OpenAI and Anthropic though — because of Android and iOS.

Gemini on Android and Apple Intelligence could seamlessly switch between all your mobile apps for a complex chain of actions.

Since they have deep access to the OS they could even use the apps without having to open them visually.

They’ll control system settings.

You tell the Apple Intelligence agent, “Send a photo of a duck to my Mac”, and it’ll generate an image of a duck, turn on AirDrop on your iPhone, send the photo and turn AirDrop back off.

But the greatest power all these agents will have comes from the API interface — letting you build tools to plug into the agent.

Like you can create a “Spotify” tool that’ll let you play music from the agent. Or a “Google” tool to check your email and plan events with your calendar.
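As a rough sketch, here’s how such a tool could be declared through OpenAI’s function-calling interface today. The `play_song` tool and its parameters are hypothetical things we’d implement ourselves; only the `tools=` schema is OpenAI’s format.

```python
# Declaring a hypothetical "Spotify" tool via OpenAI function calling.
# The play_song tool is made up; the tools= schema is OpenAI's format.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "play_song",  # hypothetical tool we'd implement ourselves
        "description": "Play a song on Spotify",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "artist": {"type": "string"},
            },
            "required": ["title"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Play some jazz for me"}],
    tools=tools,  # the model can now request a play_song(...) call
)
print(response.choices[0].message.tool_calls)
```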

So it all really looks promising — and as usual folks like Sam Altman are already promising the world with it.

AI agents may well be the future — personalized, autonomous, and powerful. They’ll revolutionize how we learn, plan, and interact. The race is on.

We may see devastating job-loss impacts in several industries — including software development…

Let’s see how it goes.

New Gemini 1.5 FLASH model: An absolute Google game changer

So Google has finally decided to show OpenAI who the real king of AI is.

Their new Gemini 1.5 Flash model blows GPT-4o out of the water and the capabilities are hard to believe.

Lightning fast.

33 times cheaper than GPT-4o but has a 700% greater context — 1 million tokens.

What does 1 million tokens mean in the real world? Approximately:

  • Over 1 hour of video
  • Over 30,000 lines of code
  • Over 700,000 words

❌ GPT-4o cost:

  • Input: $2.50 per million tokens
  • Output: $10 per million tokens
  • Cached input: $1.25 per million tokens

✅ Gemini 1.5 Flash cost:

  • Input: $0.075 per million tokens
  • Output: $0.30 per million tokens
  • Cached input: $0.01875 per million tokens

And then there’s the mini Flash-8B version for cost-efficient tasks — 66 times cheaper than GPT-4o.

And the best part is the multi-modality — it can reason with text, files, images and audio in complex integrated ways.

And 1.5 Flash has almost all the capabilities of Pro but is much faster. And as a dev you can start using them now.
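Here’s a minimal sketch of calling 1.5 Flash from Python with the google-generativeai SDK (assuming you’ve grabbed a free API key from AI Studio; the key below is a placeholder).

```python
# Minimal Gemini 1.5 Flash call with the google-generativeai SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; get one in AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain context windows in one paragraph.")
print(response.text)
```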

Gemini 1.5 Pro was tested with a 44-minute silent movie and, astonishingly, it easily broke the movie down into its various plot points and events, even pointing out tiny details that most of us would miss on a first watch.

Meanwhile the GPT-4o API only lets you work with text and images.

You can easily create, test and refine prompts in Google’s AI Studio — completely free.

Usage there doesn’t count toward your billing, unlike the OpenAI Playground.

Just look at the power of Google AI Studio — creating a food recipe based on an image:

I uploaded this delicious bread from Getty Images:

What if I want the response in a specialized format for my API or something?

Then you can just turn on JSON mode and specify the response schema:
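In the API that looks something like this. The sketch follows the google-generativeai SDK’s documented JSON mode; the `Recipe` fields are just an example I made up.

```python
# JSON mode with a response schema (per the google-generativeai docs).
# The Recipe shape is a made-up example.
import typing_extensions as typing
import google.generativeai as genai

class Recipe(typing.TypedDict):
    recipe_name: str
    ingredients: list[str]

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Suggest a couple of bread recipes.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=list[Recipe],  # constrain output to this shape
    ),
)
print(response.text)  # valid JSON matching the schema
```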

OpenAI playground has this too, but it’s not as intuitive to work with.

Another upgrade Gemini has over OpenAI is how creative it can be.

In Gemini you can increase the temperature from 0 to 200% to control how random and creative the responses are:
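Via the API that’s just a config value. A sketch (the scale there runs 0 to 2, which AI Studio displays as 0 to 200%):

```python
# Cranking up the temperature for more random, creative output.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Write a two-line poem about bread.",
    generation_config=genai.GenerationConfig(temperature=1.8),  # near max
)
print(response.text)
```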

Meanwhile in OpenAI, if you try going far beyond 100%, you’ll most likely get a whole load of literal nonsense.

And here’s the best part — when you’re done creating your prompt you can just use Get code — easily copy and paste the boilerplate API code and move lightning-fast in your development.

It works in several languages including Kotlin, Swift and Dart — making for an efficient AI workflow in mobile dev.

In the OpenAI Playground you only get code for Python and JavaScript.

Final thoughts

Gemini 1.5 Flash is a game changer, offering unparalleled capabilities at a fraction of the cost.

With its advanced multi-modality, ease of use, generous free pricing, and creative potential, it sets a new standard for AI, leaving GPT-4o in the dust.

Fine-tuning for OpenAI’s GPT-3.5 Turbo model is finally here

Some great news lately for AI developers from OpenAI.

Finally, you can now fine-tune the GPT-3.5 Turbo model using your own data. This gives you the ability to create customized versions of the OpenAI model that perform incredibly well at specific tasks and give responses in a customized format and tone, perfect for your use case.

For example, we can use fine-tuning to ensure that our model always responds in JSON format, in Spanish, with a friendly, informal tone. Or we could make a model that only gives one out of a finite set of responses, e.g., rating customer reviews as critical, positive, or neutral, according to how *we* define these terms.
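Here’s a sketch of what that flow looks like with the openai Python SDK. Each training example is one JSON line of chat messages; the filename and sample content below are placeholders I made up.

```python
# Sketch of a GPT-3.5 Turbo fine-tuning job. The training example and
# filename are hypothetical; the API calls are the openai SDK's.
import json
from openai import OpenAI

client = OpenAI()

example = {
    "messages": [
        {"role": "system", "content": "Respond in JSON, in Spanish, informally."},
        {"role": "user", "content": "What time do you open?"},
        {"role": "assistant", "content": '{"respuesta": "¡Abrimos a las 9!"}'},
    ]
}
with open("training.jsonl", "w") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")  # plus many more lines

uploaded = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print(job.id)  # poll the job; when it finishes you get a custom model ID
```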

As stated by OpenAI, early testers have successfully used fine-tuning in various areas, such as being able to:

  • Make the model output results in a more consistent and reliable format.
  • Match a specific brand’s style and messaging.
  • Improve how well the model follows instructions.

The company also claims that fine-tuned GPT-3.5 Turbo models can match and even exceed the capabilities of base GPT-4 for certain tasks.

Before now, fine-tuning was only possible with weaker, costlier GPT-3 models, like davinci-002 and babbage-002. Providing custom data for a GPT-3.5 Turbo model was only possible with techniques like few-shot prompting and vector embeddings.

OpenAI also assures that any data used for fine-tuning any of their models belongs to the customer, and they don’t use it to train their models.

What is GPT-3.5 Turbo, anyway?

Launched earlier this year, GPT-3.5 Turbo is a model range that OpenAI introduced, stating that it is perfect for applications that do not solely focus on chat. It boasts the capability to manage 4,000 tokens at once, a figure that is twice the capacity of the preceding model. The company highlighted that preliminary users successfully shortened their prompts by 90% after applying fine-tuning on the GPT-3.5 Turbo model.

What can I use GPT-3.5 Turbo fine-tuning for?

  • Customer service automation: We can use a fine-tuned GPT model to make virtual customer service agents or chatbots that deliver responses in line with the brand’s tone and messaging.
  • Content generation: The model can be used for generating marketing content, blog posts, or social media posts. The fine-tuning would allow the model to generate content in a brand-specific style according to prompts given.
  • Code generation & auto-completion: In software development, such a model can provide developers with code suggestions and autocompletion to boost their productivity and get coding done faster.
  • Translation: We can use a fine-tuned GPT model for translation tasks, converting text from one language to another with greater precision. For example, the model can be tuned to follow specific grammatical and syntactical rules of different languages, which can lead to higher accuracy translations.
  • Text summarization: We can apply the model in summarizing lengthy texts such as articles, reports, or books. After fine-tuning, it can consistently output summaries that capture the key points and ideas without distorting the original meaning. This could be particularly useful for educational platforms, news services, or any scenario where digesting large amounts of information quickly is crucial.
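For any of these, once a fine-tuning job finishes you call the resulting model like any other, just with your custom model ID. A sketch (the ID below is a placeholder):

```python
# Using a fine-tuned model: same chat API, custom model ID.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder ID from your job
    messages=[{"role": "user", "content": "Rate this review: 'Great product!'"}],
)
print(response.choices[0].message.content)  # e.g. "positive"
```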

How much will GPT-3.5 Turbo fine-tuning cost?

There’s the cost of fine-tuning and then the actual usage cost.

  • Training: $0.008 / 1K tokens
  • Usage input: $0.012 / 1K tokens
  • Usage output: $0.016 / 1K tokens

For example, a gpt-3.5-turbo fine-tuning job with a training file of 100,000 tokens that is trained for 3 epochs would have an expected cost of $2.40.
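That checks out: 3 epochs over 100,000 tokens means 300,000 billed training tokens.

```python
# OpenAI's example, worked out.
tokens = 100_000          # training file size
epochs = 3                # passes over the data
price_per_1k = 0.008      # training cost per 1K tokens
print(f"${tokens / 1000 * epochs * price_per_1k:.2f}")  # -> $2.40
```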

Source: OpenAI, “GPT-3.5 Turbo fine-tuning and API updates”

When will fine-tuning for GPT-4 be available?

This fall.

OpenAI has announced that support for fine-tuning GPT-4, the most recent version of its large language model, is expected to be available later this year, probably during the fall. This upgraded model performs on par with humans across diverse professional and academic benchmarks. It surpasses GPT-3.5 in terms of reliability, creativity, and its capacity to handle more nuanced instructions.