
OpenAI o3 clearly proves that “AGI” doesn’t really mean anything

o3! Wow! AGI has finally been achieved now?! No way!!

Lol.

So when is someone finally going to tell us what “AGI” means?

Or are we going to keep moving the goalposts to keep our heads in the sand about the inevitable?

Okay so sure, it wasn’t AGI when ChatGPT first shocked everyone including OpenAI in 2022 with dynamic responses on virtually any topic. I remember many were playing dumb back then and calling it glorified autocomplete.

It wasn’t AGI when GPT passed the bar exam and the SAT.

It wasn’t AGI when GPT smashed the Turing test. Never mind that what counts as passing the Turing test has kept changing for years as AI gets more and more advanced — another way of moving the goalposts.


Now we have o3 scoring 87.5% on the ARC-AGI benchmark… are we there yet?

Damn, look at how much that stuff costs tho. $2,500 per task on the high end? But of course it will eventually come down — right? Right?

No way in hell they’re going to give this away for free. Maybe we can expect a new $2000 plan soon.

But is it “AGI”?

How good does AI have to get before you say it’s “general”?

o3 destroys PhDs in standardized tests, gets 96.7% on one of the toughest math exams in the world (AIME), beats almost every single competitive coder on Codeforces…

ChatGPT could already do therapy, write poems and song lyrics you could never dream of, generate personalized workout plans, explain weird French translations…

But oh no, it couldn’t possibly be “general” intelligence, it’s just glorified autocomplete.

Or no, it needs to be an omnipotent god before we can call it AGI.

Why do we even treat AGI in such a binary way? Either it is or it isn’t? A one-time, ultimate final destination after which all our jobs get wiped out instantly and we’re doomed.

The job loss is already here and it’s happening slowly but surely, as AGI-lity advances.

You say for AGI it needs to be able to learn and reason, but can’t GPT already do that? And what do you really mean by “learn” and “reason”?

When you upload a PDF to ChatGPT that it’s never seen before and it answers every single question you ask far faster than if you slogged away reading it yourself, it didn’t “learn”? But if you’d done the same, you would have “learned”, right? Or was it still glorified autocomplete? But not you, right? You have “real” intelligence.

There’s a reason why tools like AutoGPT and BabyAGI were such big deals. They were the first AI agents.

They could create a step-by-step plan to achieve any goal using the tools at their disposal — while checking that their actions were in line with the plan.
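Here’s a toy sketch of that plan-act-check loop in Python. To be clear, this is not AutoGPT’s actual code; the `llm` and `tools` arguments are hypothetical stand-ins for a language model call and a set of tool functions.

```python
# A toy sketch of the plan-act-check loop AutoGPT-style agents run.
# NOT AutoGPT's real code: `llm` (a callable that returns text) and
# `tools` (name -> function) are hypothetical stand-ins.
def run_agent(goal: str, llm, tools: dict) -> list[str]:
    results = []
    # 1. Create a step-by-step plan for the goal
    plan = llm(f"Break this goal into short steps: {goal}").splitlines()
    for step in plan:
        # 2. Pick a tool at the agent's disposal and execute the step
        tool_name = llm(f"Pick one tool from {list(tools)} for: {step}").strip()
        result = tools[tool_name](step)
        results.append(result)
        # 3. Check the action against the plan, and re-plan if it drifted
        verdict = llm(f"Did {result!r} move us toward {goal!r}? yes/no")
        if not verdict.strip().lower().startswith("yes"):
            plan = llm(f"Revise the remaining steps given: {result!r}").splitlines()
    return results
```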

And when you think of it, this is basically what we humans do almost every single moment of our lives, even if we don’t realize it.

Life is all goals, conscious or sub-conscious, short-term or long-term — eat, tell a joke, get rich, write an article, run for your life, cast your vote, kiss…

We break down the goals into smaller sub-goals and use the tools we have to achieve them — our legs, our speech, our devices, our money, and so much more.

I said “I’m hungry” and it knew I wanted food, is that not “reasoning”?

These tools weren’t perfect but they still showed promising glimpses of success in many demos that went viral across the internet.

They showed us what’s possible with AI agents, and now everyone is rushing to build the most advanced agent.

These agents will only continue to get better and better at complex problem solving and reasoning. After a certain point the only thing limiting them will be the tools you connect them to.

We’re already seeing the progress — just look at Google’s new Project Mariner agent in action.

Coding tools like Copilot and the new agentic Windsurf IDE are already making software dev easier than ever before.

o3-powered agents will be even more powerful than those built on previous models.

Fact is, whether you call these tools AGI or not, they’re already doing things we never dreamed they’d be able to do.

“Creative” jobs like writing, visual art, and UI design are already falling to AI. Now we have video generators upon us that will only rapidly improve as we enter 2025.

The AI takeover is happening and it’s not stopping, whether they meet your definition of AGI or not.

OpenAI’s new AI agent will change everything

The new OpenAI operator agent will change the world forever.

This is going to be a real AI agent that actually works — unlike gimmicks like AutoGPT.

Soon AI will be able to solve complex goals with lots of interconnected steps.

Completely autonomous — no continuous prompts — zero human guidance apart from dynamic input for each step.

Imagine you could just tell ChatGPT “teach me French” and that’ll be all it needs…

  • Analyzing your French level with a quick quiz
  • Crafting a comprehensive learning plan
  • Setting phone and email reminders to help you stick to your plan…
Not quite there yet 😉

This is basically the beginning of AGI — if it isn’t already.

And when you think about it, this is already what apps like Duolingo try to do — solve a complex problem.

But an AI agent will do this in a far more comprehensive and personalized way — intelligently adjusting to the user’s needs and changing desires.

You can say something super vague like “plan my next holiday” and instantly your agent gets to work:

  • Analyzes your calendar to find the best time for your next holiday
  • Uses your previous conversations to figure out a destination you’ll love that stays within your budget
  • Books flights and makes reservations according to your schedule

This will change everything.

Which is why they’re not the only ones working on agents — the AI race continues…

We have Google apparently working on “Project Jarvis” — an AI agent to automate web-based tasks in Chrome.

Automatically jumping from page to page and filling out forms and clicking buttons.

Maybe something like Puppeteer — a dev tool programmers use to make the browser do stuff automatically — but one that isn’t hard-coded and is far more flexible.
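For contrast, here’s roughly what that hard-coded automation looks like. I’m sketching it with Playwright’s Python API (a close cousin of Puppeteer, which is a Node.js tool); the URL and selectors are made-up placeholders.

```python
# Hard-coded browser automation, sketched with Playwright (a
# Puppeteer-like tool). The URL and selectors are hypothetical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/signup")   # hypothetical page
    page.fill("#email", "me@example.com")     # fill out a form field
    page.click("button[type=submit]")         # click a button
    browser.close()
```

An agent like Jarvis would decide each of these steps at runtime instead of following a fixed script.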

Anthropic has already released their own AI agent in Claude 3.5 Sonnet — a groundbreaking “computer use” feature.

Google and Apple will probably have a major advantage over OpenAI and Anthropic though — because of Android and iOS.

Gemini on Android and Apple Intelligence could seamlessly switch between all your mobile apps for a complex chain of actions.

Since they have deep access to the OS they could even use the apps without having to open them visually.

They’ll control system settings.

You tell the Apple Intelligence agent, “Send a photo of a duck to my Mac”, and it’ll generate an image of a duck, turn on AirDrop on your iPhone, send the photo and turn AirDrop back off.

But the greatest power all these agents will have comes from the API interface — letting you build tools to plug into the agent.

Like you can create a “Spotify” tool that’ll let you play music from the agent. Or a “Google” tool to check your email and plan events with your calendar.
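As a rough sketch, here’s how such a tool could be declared through OpenAI’s function-calling interface today. The `play_song` tool and its parameters are hypothetical things we’d implement ourselves; only the `tools=` schema is OpenAI’s format.

```python
# Declaring a hypothetical "Spotify" tool via OpenAI function calling.
# The play_song tool is made up; the tools= schema is OpenAI's format.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "play_song",  # hypothetical tool we'd implement ourselves
        "description": "Play a song on Spotify",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "artist": {"type": "string"},
            },
            "required": ["title"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Play some jazz for me"}],
    tools=tools,  # the model can now request a play_song(...) call
)
print(response.choices[0].message.tool_calls)
```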

So it all really looks promising — and as usual folks like Sam Altman are already promising the world with it.

AI agents may well be the future — personalized, autonomous, and powerful. They’ll revolutionize how we learn, plan, and interact. The race is on.

We may see devastating job-loss impacts in several industries — including software development…

Let’s see how it goes.

New Gemini 1.5 FLASH model: An absolute Google game changer

So Google has finally decided to show OpenAI who the real king of AI is.

Their new Gemini 1.5 Flash model blows GPT-4o out of the water and the capabilities are hard to believe.

Lightning fast.

33 times cheaper than GPT-4o but has a 700% greater context — 1 million tokens.

What does 1 million tokens mean in the real world? Approximately:

  • Over 1 hour of video
  • Over 30,000 lines of code
  • Over 700,000 words

❌ GPT-4o cost:

  • Input: $2.50 per million tokens
  • Output: $10 per million tokens
  • Cached input: $1.25 per million tokens

✅ Gemini 1.5 Flash cost:

  • Input: $0.075 per million tokens
  • Output: $0.30 per million tokens
  • Cached input: $0.01875 per million tokens

And then there’s the mini Flash-8B version for cost-efficient tasks — 66 times cheaper than GPT-4o.

And the best part is the multi-modality — it can reason with text, files, images and audio in complex integrated ways.

And 1.5 Flash has almost all the capabilities of Pro but is much faster. And as a dev you can start using them now.
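Here’s a minimal sketch of calling 1.5 Flash from Python with the google-generativeai SDK (assuming you’ve grabbed a free API key from AI Studio; the key below is a placeholder).

```python
# Minimal Gemini 1.5 Flash call with the google-generativeai SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; get one in AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain context windows in one paragraph.")
print(response.text)
```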

Gemini 1.5 Pro was tested with a 44-minute silent movie and, astonishingly, it easily broke the movie down into its various plot points and events, even pointing out tiny details that most of us would miss on a first watch.

Meanwhile the GPT-4o API only lets you work with text and images.

You can easily create, test and refine prompts in Google’s AI Studio — completely free.

Usage there doesn’t count toward your billing, unlike the OpenAI Playground.

Just look at the power of Google AI Studio — creating a food recipe based on an image:

I uploaded this delicious bread from Getty Images:

What if I want the response in a specialized format for my API or something?

Then you can just turn on JSON mode and specify the response schema:
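In the API that looks something like this. The sketch follows the google-generativeai SDK’s documented JSON mode; the `Recipe` fields are just an example I made up.

```python
# JSON mode with a response schema (per the google-generativeai docs).
# The Recipe shape is a made-up example.
import typing_extensions as typing
import google.generativeai as genai

class Recipe(typing.TypedDict):
    recipe_name: str
    ingredients: list[str]

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Suggest a couple of bread recipes.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=list[Recipe],  # constrain output to this shape
    ),
)
print(response.text)  # valid JSON matching the schema
```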

OpenAI playground has this too, but it’s not as intuitive to work with.

Another upgrade Gemini has over OpenAI is how creative it can be.

In Gemini you can increase the temperature from 0 to 200% to control how random and creative the responses are:
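Via the API that’s just a config value. A sketch (the scale there runs 0 to 2, which AI Studio displays as 0 to 200%):

```python
# Cranking up the temperature for more random, creative output.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Write a two-line poem about bread.",
    generation_config=genai.GenerationConfig(temperature=1.8),  # near max
)
print(response.text)
```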

Meanwhile in OpenAI, if you try going far beyond 100%, you’ll most likely get a whole load of literal nonsense.

And here’s the best part — when you’re done creating your prompt you can just use Get code — easily copy and paste the boilerplate API code and move lightning-fast in your development.

It works in several languages including Kotlin, Swift and Dart — making for an efficient AI workflow in mobile dev.

In the OpenAI Playground you only get code for Python and JavaScript.

Final thoughts

Gemini 1.5 Flash is a game changer, offering unparalleled capabilities at a fraction of the cost.

With its advanced multi-modality, ease of use, generous free pricing, and creative potential, it sets a new standard for AI, leaving GPT-4o in the dust.

Fine-tuning for OpenAI’s GPT-3.5 Turbo model is finally here

Some great news lately for AI developers from OpenAI.

Finally, you can now fine-tune the GPT-3.5 Turbo model using your own data. This gives you the ability to create customized versions of the OpenAI model that perform incredibly well at specific tasks and give responses in a customized format and tone, perfect for your use case.

For example, we can use fine-tuning to ensure that our model always responds in JSON format, in Spanish, with a friendly, informal tone. Or we could make a model that only gives one out of a finite set of responses, e.g., rating customer reviews as critical, positive, or neutral, according to how *we* define these terms.
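Here’s a sketch of what that flow looks like with the openai Python SDK. Each training example is one JSON line of chat messages; the filename and sample content below are placeholders I made up.

```python
# Sketch of a GPT-3.5 Turbo fine-tuning job. The training example and
# filename are hypothetical; the API calls are the openai SDK's.
import json
from openai import OpenAI

client = OpenAI()

example = {
    "messages": [
        {"role": "system", "content": "Respond in JSON, in Spanish, informally."},
        {"role": "user", "content": "What time do you open?"},
        {"role": "assistant", "content": '{"respuesta": "¡Abrimos a las 9!"}'},
    ]
}
with open("training.jsonl", "w") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")  # plus many more lines

uploaded = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print(job.id)  # poll the job; when it finishes you get a custom model ID
```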

As stated by OpenAI, early testers have successfully used fine-tuning in various areas, such as being able to:

  • Make the model output results in a more consistent and reliable format.
  • Match a specific brand’s style and messaging.
  • Improve how well the model follows instructions.

The company also claims that fine-tuned GPT-3.5 Turbo models can match and even exceed the capabilities of base GPT-4 for certain tasks.

Before now, fine-tuning was only possible with weaker, costlier GPT-3 models, like davinci-002 and babbage-002. Providing custom data for a GPT-3.5 Turbo model was only possible with techniques like few-shot prompting and vector embeddings.

OpenAI also assures that any data used for fine-tuning any of their models belongs to the customer, and they don’t use it to train their models.

What is GPT-3.5 Turbo, anyway?

Launched earlier this year, GPT-3.5 Turbo is a model range that OpenAI introduced, stating that it is perfect for applications that do not solely focus on chat. It boasts the capability to manage 4,000 tokens at once, a figure that is twice the capacity of the preceding model. The company highlighted that preliminary users successfully shortened their prompts by 90% after applying fine-tuning on the GPT-3.5 Turbo model.

What can I use GPT-3.5 Turbo fine-tuning for?

  • Customer service automation: We can use a fine-tuned GPT model to make virtual customer service agents or chatbots that deliver responses in line with the brand’s tone and messaging.
  • Content generation: The model can be used for generating marketing content, blog posts, or social media posts. The fine-tuning would allow the model to generate content in a brand-specific style according to prompts given.
  • Code generation & auto-completion: In software development, such a model can provide developers with code suggestions and autocompletion to boost their productivity and get coding done faster.
  • Translation: We can use a fine-tuned GPT model for translation tasks, converting text from one language to another with greater precision. For example, the model can be tuned to follow specific grammatical and syntactical rules of different languages, which can lead to higher accuracy translations.
  • Text summarization: We can apply the model in summarizing lengthy texts such as articles, reports, or books. After fine-tuning, it can consistently output summaries that capture the key points and ideas without distorting the original meaning. This could be particularly useful for educational platforms, news services, or any scenario where digesting large amounts of information quickly is crucial.
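For any of these, once a fine-tuning job finishes you call the resulting model like any other, just with your custom model ID. A sketch (the ID below is a placeholder):

```python
# Using a fine-tuned model: same chat API, custom model ID.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder ID from your job
    messages=[{"role": "user", "content": "Rate this review: 'Great product!'"}],
)
print(response.choices[0].message.content)  # e.g. "positive"
```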

How much will GPT-3.5 Turbo fine-tuning cost?

There’s the cost of fine-tuning and then the actual usage cost.

  • Training: $0.008 / 1K tokens
  • Usage input: $0.012 / 1K tokens
  • Usage output: $0.016 / 1K tokens

For example, a gpt-3.5-turbo fine-tuning job with a training file of 100,000 tokens that is trained for 3 epochs would have an expected cost of $2.40.
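That checks out: 3 epochs over 100,000 tokens means 300,000 billed training tokens.

```python
# OpenAI's example, worked out.
tokens = 100_000          # training file size
epochs = 3                # passes over the data
price_per_1k = 0.008      # training cost per 1K tokens
print(f"${tokens / 1000 * epochs * price_per_1k:.2f}")  # -> $2.40
```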

Source: OpenAI, “GPT-3.5 Turbo fine-tuning and API updates”

When will fine-tuning for GPT-4 be available?

This fall.

OpenAI has announced that support for fine-tuning GPT-4, the most recent version of its large language model, is expected to be available later this year, probably during the fall. This upgraded model performs on par with humans across diverse professional and academic benchmarks. It surpasses GPT-3.5 in terms of reliability, creativity, and its capacity to handle more nuanced instructions.