OpenAI’s new AI agent will change everything

The new OpenAI operator agent will change the world forever.

This is going to be a real AI agent that actually works — unlike gimmicks like AutoGPT.

Soon AI will be able to solve complex goals with lots of interconnected steps.

Completely autonomous: no continuous prompting, and zero human guidance beyond supplying input when a step actually needs it.

Imagine you could just tell ChatGPT “teach me French” and that’ll be all it needs…

  • Analyzing your French level with a quick quiz
  • Crafting a comprehensive learning plan
  • Setting phone and email reminders to help you stick to your plan…
Not quite there yet 😉

This is basically the beginning of AGI — if it isn’t already.

And when you think about it, this is already what apps like Duolingo try to do: solve complex problems.

But an AI agent will do this in a far more comprehensive and personalized way, intelligently adjusting to the user’s needs and changing desires.

You can say something super vague like “plan my next holiday” and instantly your agent gets to work:

  • Analyzes your calendar to know the best next holiday time
  • Figures out somewhere you’ll love from previous conversations that fits within your budget
  • Books flights and sets reservations according to your schedule

This will change everything.

Which is why they’re not the only ones working on agents — the AI race continues…

We have Google apparently working on “Project Jarvis” — an AI agent to automate web-based tasks in Chrome.

Automatically jumping from page to page and filling out forms and clicking buttons.

Maybe something like Puppeteer — a dev tool programmers use to make the browser do stuff automatically — but it isn’t hard-coded and it’s far more flexible.

Anthropic has already released their own AI agent in Claude 3.5 Sonnet — a groundbreaking “computer use” feature.

Google and Apple will probably have a major advantage over OpenAI and Anthropic though, because of Android and iOS.

Gemini Android and Apple Intelligence could seamlessly switch between all your mobile apps for a complex chain of actions.

Since they have deep access to the OS they could even use the apps without having to open them visually.

They’ll control system settings.

You tell the Apple Intelligence agent, “Send a photo of a duck to my Mac”, and it’ll generate an image of a duck, turn on AirDrop on your iPhone, send the photo, and turn AirDrop back off.

But the greatest power all these agents will have comes from the API interface, letting you build tools that plug into the agent.

For example, you can create a “Spotify” tool that lets the agent play music. Or a “Google” tool to check your email and plan events with your calendar.
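As a sketch, a tool interface could be as simple as a registry of named functions the agent dispatches to. The tool names and behaviors below are hypothetical stand-ins for illustration, not any vendor’s real API:

```python
# A minimal sketch of a tool registry an agent could dispatch to.
# Tool names and return values here are hypothetical; real tools
# would call the Spotify or Gmail APIs instead of returning strings.

def spotify_play(track: str) -> str:
    # Stand-in for a real Spotify Web API call.
    return f"Playing '{track}' on Spotify"

def gmail_check(query: str) -> str:
    # Stand-in for a real inbox search.
    return f"Searching inbox for '{query}'"

TOOLS = {
    "spotify_play": spotify_play,
    "gmail_check": gmail_check,
}

def run_tool(name: str, argument: str) -> str:
    """Dispatch one agent-chosen tool call to its implementation."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](argument)

print(run_tool("spotify_play", "Clair de Lune"))
```

A real agent would pick the tool name and argument itself from the user’s request; the registry pattern stays the same.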

So it all really looks promising — and as usual folks like Sam Altman are already promising the world with it.

AI agents may well be the future—personalized, autonomous, and powerful. They’ll revolutionize how we learn, plan, and interact. The race is on.

We may see devastating job-loss impacts in several industries — including software development…

Let’s see how it goes.

New Gemini 1.5 FLASH model: An absolute Google game changer

So Google has finally decided to show OpenAI who the real king of AI is.

Their new Gemini 1.5 Flash model blows GPT-4o out of the water and the capabilities are hard to believe.

Lightning fast.

33 times cheaper than GPT-4o, with a context window nearly 8 times larger: 1 million tokens versus GPT-4o’s 128K.

What is 1 million tokens in the real-world? Approximately:

  • Over 1 hour of video
  • Over 30,000 lines of code
  • Over 700,000 words
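Those figures line up with a common rule of thumb (an assumption, since exact ratios depend on the tokenizer and the text): one token is roughly 0.75 English words.

```python
# Rule-of-thumb conversion; the 0.75 words-per-token ratio is an
# approximation that varies by tokenizer and language.
TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75

words = TOKENS * WORDS_PER_TOKEN
print(f"{TOKENS:,} tokens ≈ {words:,.0f} words")  # 1,000,000 tokens ≈ 750,000 words
```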

❌ GPT-4o cost:

  • Input: $2.50 per million tokens
  • Output: $10 per million tokens
  • Cached input: $1.25 per million tokens

✅ Gemini 1.5 Flash cost:

  • Input: $0.075 per million tokens
  • Output: $0.30 per million tokens
  • Cached input: $0.01875 per million tokens
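Plugging the listed input prices into a quick check confirms the roughly 33x gap:

```python
# Per-million-token input prices taken from the tables above.
GPT4O_INPUT = 2.50
FLASH_INPUT = 0.075

ratio = GPT4O_INPUT / FLASH_INPUT
print(f"Gemini 1.5 Flash input is ~{ratio:.0f}x cheaper than GPT-4o")  # ~33x
```

The output prices give the same ratio: $10 / $0.30 is also about 33x.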

And then there’s the smaller Flash-8B version for cost-efficient tasks, roughly 66 times cheaper than GPT-4o.

And the best part is the multi-modality — it can reason with text, files, images and audio in complex integrated ways.

And 1.5 Flash has almost all the capabilities of Pro but much faster. And as a dev you can start using them now.

Gemini 1.5 Pro was tested with a 44-minute silent movie and astonishingly, it easily analyzed the movie into various plot points and events. Even pointing out tiny details that most of us would miss on first watch.

Meanwhile the GPT-4o API only lets you work with text and images.

You can easily create, test and refine prompts in Google’s AI Studio — completely free.

It doesn’t count toward your billing like the OpenAI Playground does.

Just look at the power of Google AI Studio — creating a food recipe based on an image:

I uploaded this photo of delicious bread from Getty Images:

Now, what if I want the response in a specialized format for my API?

Then you can just turn on JSON mode and specify the response schema:
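As a sketch, such a schema might look like the following, with field names in the style of the google-generativeai Python SDK (exact field support can vary by SDK version, so treat this as illustrative):

```python
import json

# Sketch of a JSON-mode generation config for a recipe-from-image
# prompt; the schema contents are an illustrative example.
generation_config = {
    "response_mime_type": "application/json",
    "response_schema": {
        "type": "object",
        "properties": {
            "recipe_name": {"type": "string"},
            "ingredients": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["recipe_name", "ingredients"],
    },
}

print(json.dumps(generation_config, indent=2))
```

With a schema like this, the model is constrained to return a JSON object your API can parse directly.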

The OpenAI Playground has this too, but it’s not as intuitive to work with.

Another upgrade Gemini has over OpenAI is how creative it can be.

In Gemini you can increase the temperature from 0 to 200% to control how random and creative the responses are:

Meanwhile in OpenAI, if you try going far beyond 100%, you’ll most likely get pure nonsense.

And here’s the best part — when you’re done creating your prompt you can just use Get code — easily copy and paste the boilerplate API code and move lightning-fast in your development.

Works in several languages including Kotlin, Swift and Dart — efficient AI workflow in mobile dev.

In the OpenAI Playground, you can only get the code for Python and JavaScript.

Final thoughts

Gemini 1.5 Flash is a game-changer offering unparalleled capabilities at a fraction of the cost.

With its advanced multi-modality, ease of use, generous free tier, and creative potential, it sets a new standard for AI, leaving GPT-4o in the dust.

Fine-tuning for OpenAI’s GPT-3.5 Turbo model is finally here

Some great news lately for AI developers from OpenAI.

Finally, you can now fine-tune the GPT-3.5 Turbo model using your own data. This gives you the ability to create customized versions of the OpenAI model that perform incredibly well at specific tasks and give responses in a customized format and tone, perfect for your use case.

For example, we can use fine-tuning to ensure that our model always responds in a JSON format, containing Spanish, with a friendly, informal tone. Or we could make a model that only gives one out of a finite set of responses, e.g., rating customer reviews as critical, positive, or neutral, according to how *we* define these terms.
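Training data for this kind of fine-tune is supplied as chat-format JSONL, one example per line. A single example for the hypothetical Spanish JSON assistant might look like this (the content itself is purely illustrative):

```python
import json

# One training example in OpenAI's chat-format JSONL for fine-tuning.
# The system prompt, question, and answer are illustrative only.
example = {
    "messages": [
        {"role": "system",
         "content": "Responde siempre en JSON, en español, con tono amistoso."},
        {"role": "user", "content": "¿Qué tiempo hace hoy?"},
        {"role": "assistant",
         "content": '{"respuesta": "¡Hoy hace sol, amigo!"}'},
    ]
}

# A training file is one JSON object like this per line (JSONL).
line = json.dumps(example, ensure_ascii=False)
print(line)
```

You would collect dozens or hundreds of such lines into one file and upload it to start the fine-tuning job.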

As stated by OpenAI, early testers have successfully used fine-tuning in various areas, such as being able to:

  • Make the model output results in a more consistent and reliable format.
  • Match a specific brand’s style and messaging.
  • Improve how well the model follows instructions.

The company also claims that fine-tuned GPT-3.5 Turbo models can match and even exceed the capabilities of base GPT-4 for certain tasks.

Before now, fine-tuning was only possible with weaker, costlier GPT-3 models, like davinci-002 and babbage-002. Providing custom data for a GPT-3.5 Turbo model was only possible with techniques like few-shot prompting and vector embedding.

OpenAI also assures that any data used for fine-tuning any of their models belongs to the customer, and they don’t use it to train their models.

What is GPT-3.5 Turbo, anyway?

Launched earlier this year, GPT-3.5 Turbo is a model range that OpenAI introduced as a fit for applications beyond chat alone. It can handle 4,000 tokens at once, twice the capacity of the preceding model. The company also highlighted that early testers shortened their prompts by up to 90% after fine-tuning GPT-3.5 Turbo.

What can I use GPT-3.5 Turbo fine-tuning for?

  • Customer service automation: We can use a fine-tuned GPT model to make virtual customer service agents or chatbots that deliver responses in line with the brand’s tone and messaging.
  • Content generation: The model can be used for generating marketing content, blog posts, or social media posts. The fine-tuning would allow the model to generate content in a brand-specific style according to prompts given.
  • Code generation & auto-completion: In software development, such a model can provide developers with code suggestions and autocompletion to boost their productivity and get coding done faster.
  • Translation: We can use a fine-tuned GPT model for translation tasks, converting text from one language to another with greater precision. For example, the model can be tuned to follow specific grammatical and syntactical rules of different languages, which can lead to higher accuracy translations.
  • Text summarization: We can apply the model in summarizing lengthy texts such as articles, reports, or books. After fine-tuning, it can consistently output summaries that capture the key points and ideas without distorting the original meaning. This could be particularly useful for educational platforms, news services, or any scenario where digesting large amounts of information quickly is crucial.

How much will GPT-3.5 Turbo fine-tuning cost?

There’s the cost of fine-tuning and then the actual usage cost.

  • Training: $0.008 / 1K tokens
  • Usage input: $0.012 / 1K tokens
  • Usage output: $0.016 / 1K tokens

For example, a gpt-3.5-turbo fine-tuning job with a training file of 100,000 tokens that is trained for 3 epochs would have an expected cost of $2.40.
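That estimate follows directly from the training price: tokens divided by 1,000, times the number of epochs, times the per-1K rate.

```python
# Reproducing the article's fine-tuning cost estimate from the
# listed training price.
TRAINING_PRICE = 0.008   # $ per 1K tokens
tokens = 100_000
epochs = 3

cost = tokens / 1000 * epochs * TRAINING_PRICE
print(f"Expected fine-tuning cost: ${cost:.2f}")  # $2.40
```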

OpenAI, GPT 3.5 Turbo fine-tuning and API updates

When will fine-tuning for GPT-4 be available?

This fall.

OpenAI has announced that support for fine-tuning GPT-4, its most recent version of the large language model, is expected later this year, probably during the fall. The upgraded model performs on par with humans across diverse professional and academic benchmarks, and it surpasses GPT-3.5 in reliability, creativity, and its capacity to handle more nuanced instructions.