
OpenAI’s new AI agent will change everything

The new OpenAI Operator agent will change the world forever.

This is going to be a real AI agent that actually works — unlike gimmicks like AutoGPT.

Soon AI will be able to solve complex goals with lots of interconnected steps.

Completely autonomous — no continuous prompting — zero human guidance beyond the initial goal and whatever dynamic input each step needs.

Imagine you could just tell ChatGPT “teach me French” and that’ll be all it needs…

  • Analyzing your French level with a quick quiz
  • Crafting a comprehensive learning plan
  • Setting phone and email reminders to help you stick to your plan…
Not quite there yet 😉

This is basically the beginning of AGI — if it isn’t already.

And when you think about it, this is already what apps like Duolingo try to do — solve complex problems for you.

But an AI agent will do this in a far more comprehensive and personalized way — intelligently adjusting to the user’s needs and changing desires.

You can say something super vague like “plan my next holiday” and instantly your agent gets to work:

  • Analyzes your calendar to know the best next holiday time
  • Figures out somewhere you’ll love from previous conversations that stays within your budget
  • Books flights and makes reservations according to your schedule

This will change everything.

Which is why they’re not the only ones working on agents — the AI race continues…

We have Google apparently working on “Project Jarvis” — an AI agent to automate web-based tasks in Chrome.

Automatically jumping from page to page, filling out forms, and clicking buttons.

Maybe something like Puppeteer — a dev tool programmers use to make the browser do stuff automatically — but it isn’t hard-coded and it’s far more flexible.
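
To picture it: this is the kind of hard-coded Puppeteer script a programmer writes today (a minimal sketch with a placeholder URL and selectors), while an agent like Jarvis would figure out these steps on its own:

TypeScript
import puppeteer from "puppeteer";

async function main() {
  // Hard-coded automation: every step is spelled out by the programmer.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com/signup"); // placeholder URL
  await page.type("#email", "jarvis@example.com"); // fill out a form field
  await page.click("button[type=submit]"); // click a button
  await browser.close();
}

main();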

Anthropic has already released their own AI agent in Claude 3.5 Sonnet — a groundbreaking “computer use” feature.

Google and Apple will probably have a major advantage over OpenAI and Anthropic though — because of Android and iOS.

Gemini Android and Apple Intelligence could seamlessly switch between all your mobile apps for a complex chain of actions.

Since they have deep access to the OS they could even use the apps without having to open them visually.

They’ll control system settings.

You tell the Apple Intelligence agent, “Send a photo of a duck to my Mac”, and it’ll generate an image of a duck, turn on AirDrop on the iPhone, send the photo, and turn AirDrop back off.

But the greatest power all these agents will have comes from the API interface — letting you build tools to plug into the agent.

Like you could create a “Spotify” tool that lets you play music from the agent. Or a “Google” tool to check your email and plan events with your calendar.
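
Here’s a rough sketch of what such a tool could look like with OpenAI-style function calling. The play_song tool and its parameters are made up for illustration:

TypeScript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// A hypothetical "Spotify" tool the agent can decide to call.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "play_song",
      description: "Play a song on the user's Spotify account",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Song and/or artist to play" },
        },
        required: ["query"],
      },
    },
  },
];

async function main() {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Play some smooth jazz" }],
    tools,
  });
  // The model replies with a tool call we'd route to the real Spotify API.
  console.log(res.choices[0].message.tool_calls);
}

main();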

So it all really looks promising — and as usual folks like Sam Altman are already promising the world with it.

AI agents may well be the future—personalized, autonomous, and powerful. They’ll revolutionize how we learn, plan, and interact. The race is on.

We may see devastating job-loss impacts in several industries — including software development…

Let’s see how it goes.

Bye bye Apple Intelligence — Gemini for iPhone is amazing 😲

Apple Intelligence isn’t coming to like 90% of iPhones and everyone is pissed…

So no better time for Google to jump on this and finally push out their chatbot for iPhone.

And they didn’t disappoint on features — live conversation, deep app integration, stunning image generation…

I’ve been using the web app ever since it was called Bard, and it’s been great, so I was pretty keen on this.

Gemini isn’t aMaZiNg, but it works well for getting up-to-date info, unlike ChatGPT, which is stuck in April 2023 and doesn’t know when to search the web.

Impressively Gemini has already skyrocketed to the top of charts in Productivity.

Or maybe not so impressive since it’s Google and it’s only #2 — but then #1 is ChatGPT so…

The first thing I noticed is the clean minimalist interface, quite similar to the one on desktop.

You have more than enough space for core things like input.

It searches the entire web but responds incredibly quickly to give detailed answers to complex questions.

Spark your creativity and generate ideas for stories, poems, or scripts.

The image generation has drastically improved with their new Imagen 3 engine:

Refine images easily — though it also changes other parts of the image:

Gemini Live is a powerful standout feature, enabling real-time interactive conversations.

It provides context-based responses and adapts to the flow of the conversation.

Looks like they did their best to simulate a live conversation with a human — no chat or text feedback confirming what you say, although they save the chat history.

The voices aren’t robotic and have decent intonation.

One cool thing is it intelligently figures out when you’re not done speaking — like in a real convo.

Me: What is the umm…

Gemini: What is the what?

Me: I don’t know

Gemini: That’s totally fine…

Me: Shut up

Gemini: I’m sorry I’m not comfortable engaging in conversation that’s disrespectful

Me: I love you

Gemini: Thank you for your kind words…

You can control it from the Notification Centre.

It integrates seamlessly with all your other Google services — I asked it to check my latest email and it was on point.

Final thoughts

It’ll be exciting to see what Gemini can do in the future.

Of course it stands no chance of deep iOS integration like Apple Intelligence has.

But it’s a versatile and intelligent AI Assistant worth checking out.

Microsoft is getting even more desperate with AI 🤦‍♂️

Microsoft going all in on the AI bandwagon…

Bing, Edge, Windows… now it’s Notepad’s turn.

Image source: bleepingcomputer.com

“Custom rewrite” — tweak tone, format and length:

Image source: theverge.com

Not bad but I doubt most people will use it.

Most people just use Notepad as a simple text editor to hold temp info and other super short-term stuff… not for this.

And wouldn’t it have been much better if it were just a free-form text input that rewrites the text however we want?

Even good old Paint will be getting AI soon — “generative erase” (lol)

So you can remove any object from the photo and it’ll automagically create a seamless background.

❌ Before erase:

Image source: bleepingcomputer.com

✅ After erase:

Image source: bleepingcomputer.com

Two more for the growing list of MS products possessed by the AI spirit.

Even their Surface devices are all about AI now:

Even their Android keyboard app 😂

Remember this?

When Bing Chat first came out, it was an interesting chatbot getting a lot of attention — one that could have finally made a dent in Google’s numbers.

Only for them to brutally degrade it into your everyday chatbot.

Then they brought their annoying Copilot button to Edge — one more setting to change whenever I newly install it.

Then they brazenly replaced the NEW TAB button with this garbage in their mobile apps. That was the last straw for me — no more Edge on Android/iOS.

Imagine depriving users of easy access to such a fundamental action in a browser because of AI.

Imagine the horror of a Camera app where you see a Copilot button where the Snap button should be.

Luckily I don’t use Windows anymore so I won’t have to deal with their Copilot in Windows garbage:

And their aggressive marketing has certainly rubbed a lot of people the wrong way — like it did when Edge went Chromium.

Lol… someone was mad.

It’s just insane how many companies jumped on the AI bandwagon ever since ChatGPT.

Notion, Spotify, Zapier, Canva… even Apple finally caved.

Everything is AI now. Even the most mundane procedural algorithm to automate something is AI lol.

No doubt some AI upgrades have been significant — like the recent Google Search generative AI results, which probably decimated traffic for millions of sites out there.

But a great deal of them add very little value — clearly just there to prey on the emotions of users and investors.

But one thing is certain: this AI hype isn’t stopping anytime soon.

Let’s see how long until the so-called AGI comes around.

New Gemini 1.5 FLASH model: An absolute Google game changer

So Google has finally decided to show OpenAI who the real king of AI is.

Their new Gemini 1.5 Flash model blows GPT-4o out of the water and the capabilities are hard to believe.

Lightning fast.

33 times cheaper than GPT-4o but has a 700% greater context — 1 million tokens.

What is 1 million tokens in the real-world? Approximately:

  • Over 1 hour of video
  • Over 30,000 lines of code
  • Over 700,000 words

❌ GPT-4o cost:

  • Input: $2.50 per million tokens
  • Output: $10 per million tokens
  • Cached input: $1.25 per million tokens

✅ Gemini 1.5 Flash cost:

  • Input: $0.075 per million tokens
  • Output: $0.30 per million tokens
  • Cached input: $0.01875 per million tokens

And then there’s the mini Flash-8B version for cost-efficient tasks — 66 times cheaper:

And the best part is the multi-modality — it can reason with text, files, images and audio in complex integrated ways.

And 1.5 Flash has almost all the capabilities of Pro but is much faster. And as a dev you can start using both right now.

Gemini 1.5 Pro was tested with a 44-minute silent movie and, astonishingly, it easily broke the movie down into various plot points and events, even pointing out tiny details most of us would miss on a first watch.

Meanwhile the GPT-4o API only lets you work with text and images.

You can easily create, test and refine prompts in Google’s AI Studio — completely free.

Unlike the OpenAI Playground, it doesn’t count toward your billing.

Just look at the power of Google AI Studio — creating a food recipe based on an image:

I uploaded this delicious bread from gettyimages:

Now:

What if I want the response to be a specialized format for my API or something?

Then you can just turn on JSON mode and specify the response schema:
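
In code, that could look something like this with the @google/generative-ai Node SDK. The recipe schema fields here are invented for illustration:

TypeScript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// JSON mode: force the model's output to match a response schema.
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        title: { type: SchemaType.STRING },
        ingredients: {
          type: SchemaType.ARRAY,
          items: { type: SchemaType.STRING },
        },
      },
      required: ["title", "ingredients"],
    },
  },
});

async function main() {
  const result = await model.generateContent("Give me a simple bread recipe");
  console.log(JSON.parse(result.response.text())); // parseable JSON, per schema
}

main();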

OpenAI playground has this too, but it’s not as intuitive to work with.

Another upgrade Gemini has over OpenAI is how creative it can be.

In Gemini you can increase the temperature from 0 to 200% to control how random and creative the responses are:

Meanwhile in OpenAI, if you try going far beyond 100%, you’ll most likely get a literal load of nonsense.

And here’s the best part — when you’re done creating your prompt you can just use Get code — easily copy and paste the boilerplate API code and move lightning-fast in your development.

Works in several languages including Kotlin, Swift and Dart — an efficient AI workflow for mobile dev.

In OpenAI playground you can get the code for Python and JavaScript.

Final thoughts

Gemini 1.5 Flash is a game-changer offering unparalleled capabilities at a fraction of the cost.

With its advanced multi-modality, ease of use, generous free tier, and creative potential, it sets a new standard for AI, leaving GPT-4o in the dust.

Why Devin AI can’t take your job.

Devin AI.

They claim it’s the silver bullet for all software creation, a miraculous tech outperforming every other AI model and handling real-world programming with ease.

With Nvidia CEO Jensen Huang recently and confidently predicting the impending death of coding, surely this Devin AI thing must be the first nail in the coffin.

Mhmm.

Sounds suspiciously familiar… AutoGPT, anyone? GPT Engineer? LOL.

Oh no… before jumping on the bandwagon we need to take a closer look at the deception behind this supposed game-changer.

The first glaring issue is the lack of transparency surrounding Devin AI’s performance metrics.

Sure, they claim it’s superior, but how did they arrive at these numbers? And where’s the proof? There’s a conspicuous absence of generated source code to back up their claims.

Without this crucial evidence you can’t take their word at face value.

Can Devin AI really make a meaningful impact in a real-world repository? Doubtful. And what about limitations? Not a word. It’s as if they want us to believe Devin AI is flawless, without a single drawback.

The demos provided by Devin AI are suspect at best; they showcase its abilities but conveniently omit crucial details. Ever notice how they never reveal the prompts inputted by the user?

If you pause the videos and examine the timestamps, you’ll find it takes hours, not the mere five minutes they lead you to believe. It’s a smoke and mirrors act designed to dazzle without substance.

And what about the demos themselves? They’re basic, rudimentary at best. Many of the problems showcased are nothing more than following a tutorial, some of which even included code snippets.

Hype over competence.

Perhaps the most concerning aspect is the lack of public testing. If Devin AI truly lives up to the hype, why not let the public put it through its paces?

The reluctance to release it for testing raises red flags and hints at a possible cash-grab scheme. Businesses may well soon find themselves disillusioned with promises that fail to materialize.

Trusting AI blindly is a path to failure.

Even if Devin AI does possess remarkable capabilities, it’s important to remember that code still requires human understanding and review to be acceptable. Software engineering is a nuanced field with countless variables; how can an AI know it is correct when its idea of correctness is bound by its training data?

If you think AI can replace developers so easily, then you’re probably missing the whole point of why we code. Coding, at its core, is not about typing and compiling. It’s not even about creating apps or websites.

Coding is about specifying the requirements of a system with zero ambiguity. It’s about expressing the solution to a problem with absolute precision.

When you type in a prompt to ChatGPT with all the vivid descriptions and (hopefully) expressive constraints, you are coding.

The difference now is the glaring ambiguity of natural language; the lack of certainty of getting exactly what you want from the AI 100% of the time. That’s why you can refine a prompt dozens of times and have absolutely nothing to show for it.

So AI can only be as good at generating code as the instructions it’s given. And describing the software you want with precision has always been the greatest challenge in software development.

If Devin AI can compel users to provide enough definitions, then perhaps it has potential. But until then, it remains an overhyped tool with limited utility.

AI’s role in programming is similar to the evolution of programming languages. As languages have progressed, programming has become more accessible. But has this led to fewer programmers? No. Instead, it has expanded the reach of programming, leading to more innovation and productivity.

Likewise AI-supported coding will enhance productivity, not replace developers. These AI models are essentially sophisticated search engines trained on vast amounts of data. They excel at common tasks but falter when faced with specific or innovative challenges. They lack the creativity and problem-solving abilities inherent in human developers.

Once again let’s not forget about reliability; AI may churn out code, but isn’t always accurate; deploying AI in critical applications without human oversight is a recipe for disaster. Developers are essential for identifying and correcting errors to ensure the integrity and functionality of the software.

Devin AI may have its uses but it’s far from the panacea it’s been made out to be. As software engineers we should embrace innovation but remain skeptical of overhyped technologies. After all, it’s our expertise and ingenuity that will continue to drive progress in the field, not flashy AI gimmicks.

The genius algorithm behind ChatGPT’s most powerful UI feature

Yes it’s ChatGPT, the underrated + overrated chatbot used by self-proclaimed AI experts to promote “advanced skills” like prompt engineering.

But this isn’t a ChatGPT post about AI. It’s about JavaScript and algorithms…

Message editing, a valuable feature you see in every popular chatbot:

  • Edit our message: No one is perfect and we all make mistakes, or we want to branch off on a different conversation from an earlier point.
  • Edit AI message: Typically by regeneration to get varying responses, especially useful for creative tasks.

But ChatGPT is currently the only chatbot that saves your earlier conversation branches whenever you edit messages.

Other chatbots avoid doing this, probably due to the added complexity involved, as we’ll see.

OpenConvo is my fork of Chatbot UI v1, and this conversation branching feature was one of the key things I added to the fork — in fact, the only reason I made it.

Today, let’s put ourselves in the shoes of the OpenAI developers and see how to bring this feature to life (ChatGPT’s life).

Modify the chatbot to allow storing previous user and AI messages after editing or regeneration. Users can navigate to any message sent at any time in the chat and view the resulting conversation branch and sub-branches that resulted from that message.

Just before building this feature, we’d probably have been storing the convo as a simple list of messages 👇. It’s just an ever-growing list.

The 3 main functional requirements we’re concerned with, and what they currently do in a sample React app:

  • Add new message: add item to list.
  • Edit message: Delete this and all succeeding messages, then Add new message with edited content.
  • Display messages: Transform list to JSX array with your usual data -> UI element mapping.

But now with the conversation branching feature, we’re going to have some key sub-requirements stopping us from using the same implementation:

  • Every message has sibling messages to left and/or right.
  • Every message has a parent and/or child message above and below.

We can’t use simple lists to store the messages anymore; we want something that easily gives us that branching functionality without us trying to be too smart.

If you’ve done a little Algo/DS you’ll instantly see that the messages form a tree-like structure. And one great way to implement trees is with linked lists.

  • Every conversation message is a node. A single “head” node begins the conversation.
  • Every node has 4 pointers: prevSibling, nextSibling, parent, and child (← → ↑ ↓). Siblings are all on the same tree level — see the sketch after this list.
  • Every level has an active node, representing the message the user can see at that branch.
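
Here’s a minimal TypeScript sketch of that node shape — my own naming, not ChatGPT’s internals:

TypeScript
interface MessageNode {
  id: string;
  role: "user" | "assistant";
  content: string;
  // The 4 tree pointers: ← → ↑ ↓
  prevSibling: MessageNode | null;
  nextSibling: MessageNode | null;
  parent: MessageNode | null;
  child: MessageNode | null;
  // True for the one visible message among its siblings.
  active: boolean;
}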

We either branch right by editing/regenerating:

Or we branch down by entering a new message or getting a response:

The most important algorithm for this conversation branching feature is the tree traversal, dearly needed to add and display the messages.

Here’s pseudocode for the full-depth traversal to the active conversation branch’s latest message (a TypeScript rendering follows the steps):

  1. Set current node to conversation head (always the same) (level 1 node)
  2. Look for the active node at current node’s level and re-set current node to it. This changes whenever the user navigates with the left/right arrows.
  3. If current node has a child, re-set current node to it. Else return current node.
  4. Rinse and repeat: Go to step 2.
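
A direct TypeScript rendering of those steps — my own sketch, assuming the MessageNode shape above:

TypeScript
// Step 2: find the active node among a node's siblings.
function activeAtLevel(node: MessageNode): MessageNode {
  let n = node;
  while (n.prevSibling) n = n.prevSibling; // rewind to the leftmost sibling
  let scan: MessageNode | null = n;
  while (scan && !scan.active) scan = scan.nextSibling;
  return scan ?? node; // fall back if no sibling is flagged active
}

// Steps 1-4: walk down the active branch to its latest message.
function latestActiveMessage(head: MessageNode): MessageNode {
  let current = activeAtLevel(head);
  while (current.child) current = activeAtLevel(current.child);
  return current;
}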

Add new message

So when the user adds a new message we travel to the latest message and add a child to it to extend the current branch.

If it’s a new convo, then we just set the head node to this new message instead.
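
Sketched out, again with hypothetical names:

TypeScript
interface Conversation {
  head: MessageNode | null;
}

function addMessage(convo: Conversation, msg: MessageNode): void {
  msg.active = true;
  if (!convo.head) {
    convo.head = msg; // new convo: the message becomes the head
    return;
  }
  const tail = latestActiveMessage(convo.head);
  tail.child = msg; // extend the current branch downward
  msg.parent = tail;
}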

Edit message / regenerate response

There’s no need for traversal here because we get the node from the message ID in a “message edited” event listener.

At the node we find its latest/right-most sibling and add another sibling.
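
Something like this hypothetical “message edited” handler:

TypeScript
function branchRight(node: MessageNode, edited: MessageNode): void {
  activeAtLevel(node).active = false; // hide the currently visible branch
  let rightmost = node;
  while (rightmost.nextSibling) rightmost = rightmost.nextSibling;
  rightmost.nextSibling = edited; // append as the new right-most sibling
  edited.prevSibling = rightmost;
  edited.parent = node.parent;
  edited.active = true; // the new branch becomes visible
}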

Display messages

Pretty straightforward: travel down all the active nodes in the conversation and read their info for display:

In OpenConvo I added each node to a simple list to transform to JSX for display in the web app:
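
Roughly like this — collect the active node at each level into a flat list (my own sketch, not the exact OpenConvo code):

TypeScript
function activeBranch(head: MessageNode): MessageNode[] {
  const branch: MessageNode[] = [];
  let current: MessageNode | null = activeAtLevel(head);
  while (current) {
    branch.push(current); // this level's visible message
    current = current.child ? activeAtLevel(current.child) : null;
  }
  return branch; // map over this to produce the JSX array
}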

View previous messages

No point in this branching feature if users can’t see their previous messages, is there?

To view the previous messages we simply change the active message to the left or right sibling (we’re just attending to another of our children, but we love them all equally).
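
A sketch of that navigation, reusing the same node shape:

TypeScript
function navigate(current: MessageNode, direction: "left" | "right"): void {
  const target =
    direction === "left" ? current.prevSibling : current.nextSibling;
  if (!target) return; // no sibling on that side
  current.active = false;
  target.active = true; // that sibling's sub-branch is now displayed
}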

With this we’ve successfully added the conversation branching feature.

Another well-known way to represent graphs/trees is an array of lists; that may make this easier (or harder) to implement.

Every amazing new feature in GPT-4 Turbo

Great news – OpenAI just released GPT-4 Turbo, an upgraded version of the GPT-4 model with a context window of up to 128K tokens – more than 300 pages of text, and a fourfold increase over regular GPT-4’s most powerful 32K-context model.

The company made this known at its first-ever developer conference, touting a preview version of the model and promising a production-grade GPT-4 Turbo in the next few weeks.

Users will be able to have longer, more complex conversations with GPT-4 Turbo as there’ll be more room to remember more of what was said earlier in the chat.

DALLE-3 prompt: “A beautiful city with buildings made of different, bright, colorful candies and looks like a wondrous candy land”

Also exciting to hear, GPT-4 Turbo is now trained on real-world knowledge and events up to April 2023, allowing us to build greater apps utilizing up-to-date data, without needing to manually keep it in the loop with custom data from embeddings and few-shot prompting.

Even better, the greater speed and efficiency of this new turbocharged model have made input tokens 3 times cheaper and slashed the cost of output tokens in half.

So, upgraded in capability, upgraded in knowledge, upgraded in speed, all at a fraction of the previous cost. That’s GPT-4 Turbo.

An innovative feature currently in preview: you can now pass image inputs to the GPT-4 model for processing, making it possible to perform tasks like generating captions, analyzing and classifying real-world images, and automating image moderation.
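
A rough sketch with OpenAI’s Node SDK — the model name is the vision preview announced at the conference, and the image URL is a placeholder:

TypeScript
import OpenAI from "openai";

const openai = new OpenAI();

async function caption(imageUrl: string) {
  const res = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Write a short caption for this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return res.choices[0].message.content;
}

caption("https://example.com/photo.jpg").then(console.log);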

Then there’s the new DALL-E 3 API for automatically generating high-quality images and designs, and an advanced Text-to-speech (TTS) API capable of generating human-level speech with a variety of voices to choose from.

DALLE-3 outclasses Midjourney! Especially when it comes to creating complex images from highly detailed and creative prompts.

DALLE-3 (top) vs Midjourney (bottom). Prompt: “A vast landscape made entirely of various meats spreads out before the viewer. tender, succulent hills of roast beef, chicken drumstick trees, bacon rivers, and ham boulders create a surreal, yet appetizing scene. the sky is adorned with pepperoni sun and salami clouds”. Source: DALL-E 3 vs. Midjourney: A Side by Side Quality Comparison

And we can’t forget the ambitious new Assistants API, aimed at helping devs build heavily customized AI agents with specific instructions that leverage extra knowledge and call models and tools to perform highly specialized tasks.
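
A minimal sketch of creating one of these assistants with the Node SDK (beta namespace at the time of writing); the name and instructions are invented:

TypeScript
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const assistant = await openai.beta.assistants.create({
    model: "gpt-4-1106-preview",
    name: "Data Helper", // hypothetical assistant
    instructions: "You analyze the CSV files users upload.",
    tools: [{ type: "code_interpreter" }], // built-in tool
  });
  console.log(assistant.id);
}

main();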

It’s always awesome to see these ground-breaking improvements in the world of AI. Surely we can expect developers to take full advantage of them and produce even more intelligent, world-changing apps that improve the quality of life for everyone.

Mojo: 7 brilliant Python upgrades in the new AI language

It is 35,000 times faster than Python. It is quicker than C. It is as easy as Python.

Enter Mojo: a newly released programming language made for AI developers and made by Modular, a company founded by Chris Lattner, the original creator of Swift.

This 35000x claim came from a benchmark comparison between Mojo and other languages, using the Mandelbrot algorithm on a particular AWS instance.

It’s a superset of Python, combining Python’s usability, simplicity, and versatility with C’s incredible performance.

If you’re passionate about AI and already have a grasp on Python, then Mojo is definitely worth a try. So, let’s dive in and explore 7 powerful features of this exciting language together.

Mojo’s features

I signed up for Mojo access shortly after it was announced and got access a few days later.

I got access to the Mojo playground.

I started exploring all the cool new features they had to offer and even had the chance to run some code and see the language in action. Here are 7 interesting Python upgrades I found:

1. let and var declarations

Mojo introduces new let and var statements that let us create variables.

If we like, we can specify a type like Int or String for the variable, as we do in TypeScript. var allows variables to change; let doesn’t. So it’s not like JavaScript’s let and var – there’s no hoisting for var, and let is constant.

Mojo
def your_function(a, b):
    let c = a
    # Uncomment to see an error:
    # c = b  # error: c is immutable
    if c != b:
        let d = b
        print(d)

your_function(2, 3)

2. structs for faster abstraction

We have them in C++, Go, and more.

Structs are a Mojo feature similar to Python classes, but different because Mojo structs are static: you can’t add more methods at runtime. This is a trade-off: less flexible, but faster.

Mojo
struct MyPair:
    var first: Int
    var second: Int

    # We use 'fn' instead of 'def' here - we'll explain that soon
    fn __init__(inout self, first: Int, second: Int):
        self.first = first
        self.second = second

    fn __lt__(self, rhs: MyPair) -> Bool:
        return self.first < rhs.first or (self.first == rhs.first and self.second < rhs.second)

Here’s one way struct is stricter than class: all fields must be explicitly defined:

Fields must be explicitly defined in Mojo structs.

3. Strong type checking

These structs don’t just give us speed; they let Mojo check variable types at compile time, like the TypeScript compiler does.

Mojo
def pairTest() -> Bool:
    let p = MyPair(1, 2)
    # Uncomment to see an error:
    # return p < 4  # gives a compile time error
    return True

The 4 is an Int, the p is a MyPair; Mojo simply can’t allow this comparison.

4. Method overloading

C++, Java, Swift, etc. have these.

Function overloading is when there are multiple functions with the same name that accept parameters with different data types.

Look at this:

Mojo
struct Complex:
    var re: F32
    var im: F32

    fn __init__(inout self, x: F32):
        """Makes a complex number from a real number."""
        self.re = x
        self.im = 0.0

    fn __init__(inout self, r: F32, i: F32):
        """Makes a complex number from its real and imaginary parts."""
        self.re = r
        self.im = i

Dynamically typed languages like JavaScript and Python simply can’t have function overloads, for obvious reasons.

Although overloading is allowed in module/file functions and class methods based on parameter types, it won’t work based on return type alone, and your function arguments need to have types. If you don’t do this, overloading won’t work; all that’ll happen is the most recently defined function will overwrite all the previously defined functions with the same name.

5. Easy integration with Python modules

Having seamless Python support is Mojo’s biggest selling point by far.

And using Python modules in Mojo is straightforward: since it’s a superset, all you need to do is call the Python.import_module() method with the module name.

Here I’m importing numpy, one of the most popular Python libraries in the world.

Mojo
from PythonInterface import Python

# Think of this as `import numpy as np` in Python
let np = Python.import_module("numpy")

# Now it's like you're using numpy in Python
array = np.array([1, 2, 3])
print(array)

You can do the same for any Python module; the one limitation is that you have to import the whole module to access individual members.

Don’t expect the imported Python modules themselves to run 35,000 times faster, though — that figure came from the Mandelbrot benchmark, not from running arbitrary Python code.

6. fn definitions

fn is basically def with stricter rules.

def is flexible, mutable, Python-friendly; fn is constant, stable, and Python-enriching. It’s like JavaScript’s strict mode, but just for def.

Mojo
struct MyPair:
    var first: Int
    var second: Int

    fn __init__(inout self, first: Int, second: Int):
        self.first = first
        self.second = second

fn‘s rules:

  • Immutable arguments: Arguments are immutable by default – including self – so you can’t mistakenly mutate them.
  • Required argument types: You have to specify types for its arguments.
  • Required variable declarations: You must declare local variables in the fn before using them (with let and var of course).
  • Explicit exception declaration: If the fn throws exceptions, you must explicitly indicate so – like we do in Java with the throws keyword.

7. Mutable and immutable function arguments

Pass-by-value vs pass-by-reference.

You may have come across this concept in languages like C++.

Python’s def function uses pass-by-reference, just like in JavaScript; you can mutate objects passed as arguments inside the def. But Mojo’s def uses pass-by-value, so what you get inside a def is a copy of the passed object. So you can mutate that copy all you want; the changes won’t affect the main object.
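
A quick TypeScript illustration of that pass-by-reference mutation:

TypeScript
function mutate(obj: { n: number }) {
  obj.n = 42; // mutates the caller's object through the reference
}

const o = { n: 1 };
mutate(o);
console.log(o.n); // 42 — the original object changed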

Pass-by-reference improves memory efficiency as we don’t have to make a copy of the object for the function.

But what about the new fn function? Like Python’s def, it uses pass-by-reference by default, but a key difference is that those references are immutable. So we can read the original object in the function, but we can’t mutate it.

Immutable arguments

borrowed: a fresh, new, seemingly redundant keyword in Mojo.

Because what borrowed does is make arguments in a Mojo fn function immutable – which they are by default. This is invaluable when dealing with objects that take up a substantial amount of memory, or when we’re not allowed to make a copy of the object we’re passing.

For example:

Mojo
fn use_something_big(borrowed a: SomethingBig, b: SomethingBig):
    """'a' and 'b' are both immutable, because 'borrowed' is the default."""
    a.print_id()  # 10
    b.print_id()  # 20

let a = SomethingBig(10)
let b = SomethingBig(20)
use_something_big(a, b)

Instead of making a copy of the huge SomethingBig object in the fn function, we simply pass a reference as an immutable argument.

Mutable arguments

If we want mutable arguments, we’ll use the new inout keyword instead:

Mojo
struct Car:
    var id_number: Int
    var color: String

    fn __init__(inout self, id: Int):
        self.id_number = id
        self.color = 'none'

    # self is passed by-reference for mutation as described above.
    fn set_color(inout self, color: String):
        self.color = color

    # Arguments like self are passed as borrowed by default.
    fn print_id(self):  # Same as: fn print_id(borrowed self):
        print('Id:', self.id_number, 'color:', self.color)

car = Car(11)
car.set_color('red')  # No error

self is immutable in fn functions, so here we needed inout to modify the color field in set_color.

Key takeaways

  • Mojo: is a new AI programming language that has the speed of C, and the simplicity of Python.
  • let and var declarations: Mojo introduces let and var statements for creating optionally typed variables. var variables are mutable, let variables are not.
  • Structs: Mojo features static structs, similar to Python classes but faster due to their static nature.
  • Strong type checking: Mojo supports compile-time type checking, akin to TypeScript.
  • Method overloading: Mojo allows function overloading, where functions with the same name can accept different data types.
  • Python module integration: Mojo offers seamless Python support, running Python modules significantly faster.
  • fn definitions: The fn keyword in Mojo is a stricter version of Python’s def, requiring immutable arguments and explicit exception declaration.
  • Mutable and immutable arguments: Mojo introduces mutable (inout) and immutable (borrowed) function arguments.

Final thoughts

As we witness the unveiling of Mojo, it’s intriguing to think how this new AI-focused language might revolutionize the programming realm. Bridging the performance gap with the ease-of-use Python offers, and introducing powerful features like strong type checking, might herald a new era in AI development. Let’s embrace this shift with curiosity and eagerness to exploit the full potential of Mojo.

Fine-tuning for OpenAI’s GPT-3.5 Turbo model is finally here

Some great news lately for AI developers from OpenAI.

Finally, you can now fine-tune the GPT-3.5 Turbo model using your own data. This gives you the ability to create customized versions of the OpenAI model that perform incredibly well at specific tasks and give responses in a customized format and tone, perfect for your use case.

For example, we can use fine-tuning to ensure that our model always responds in a JSON format, in Spanish, with a friendly, informal tone. Or we could make a model that only gives one out of a finite set of responses, e.g., rating customer reviews as critical, positive, or neutral, according to how *we* define these terms.
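
For a feel of the workflow, here’s a rough sketch with the Node SDK. The training example (the .jsonl file holds one JSON object like it per line) and the file name are invented:

TypeScript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// One hypothetical training example, written out as a JSONL line.
const example = {
  messages: [
    { role: "system", content: "Respond in JSON, in Spanish, friendly and informal." },
    { role: "user", content: "¿Cuál es la capital de Francia?" },
    { role: "assistant", content: '{"respuesta": "¡París, claro!"}' },
  ],
};
fs.writeFileSync("train.jsonl", JSON.stringify(example) + "\n");

async function main() {
  // Upload the training file, then kick off the fine-tuning job.
  const file = await openai.files.create({
    file: fs.createReadStream("train.jsonl"),
    purpose: "fine-tune",
  });
  const job = await openai.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-3.5-turbo",
  });
  console.log(job.id);
}

main();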

As stated by OpenAI, early testers have successfully used fine-tuning in various areas, such as being able to:

  • Make the model output results in a more consistent and reliable format.
  • Match a specific brand’s style and messaging.
  • Improve how well the model follows instructions.

The company also claims that fine-tuned GPT-3.5 Turbo models can match and even exceed the capabilities of base GPT-4 for certain tasks.

Before now, fine-tuning was only possible with weaker, costlier GPT-3 models, like davinci-002 and babbage-002. Providing custom data for a GPT-3.5 Turbo model was only possible with techniques like few-shot prompting and vector embedding.

OpenAI also assures that any data used for fine-tuning any of their models belongs to the customer, and they don’t use it to train their models.

What is GPT-3.5 Turbo, anyway?

Launched earlier this year, GPT-3.5 Turbo is a model range that OpenAI introduced, stating that it is perfect for applications that do not solely focus on chat. It boasts the capability to manage 4,000 tokens at once, a figure that is twice the capacity of the preceding model. The company highlighted that preliminary users successfully shortened their prompts by 90% after applying fine-tuning on the GPT-3.5 Turbo model.

What can I use GPT-3.5 Turbo fine-tuning for?

  • Customer service automation: We can use a fine-tuned GPT model to make virtual customer service agents or chatbots that deliver responses in line with the brand’s tone and messaging.
  • Content generation: The model can be used for generating marketing content, blog posts, or social media posts. The fine-tuning would allow the model to generate content in a brand-specific style according to prompts given.
  • Code generation & auto-completion: In software development, such a model can provide developers with code suggestions and autocompletion to boost their productivity and get coding done faster.
  • Translation: We can use a fine-tuned GPT model for translation tasks, converting text from one language to another with greater precision. For example, the model can be tuned to follow specific grammatical and syntactical rules of different languages, which can lead to higher accuracy translations.
  • Text summarization: We can apply the model in summarizing lengthy texts such as articles, reports, or books. After fine-tuning, it can consistently output summaries that capture the key points and ideas without distorting the original meaning. This could be particularly useful for educational platforms, news services, or any scenario where digesting large amounts of information quickly is crucial.

How much will GPT-3.5 Turbo fine-tuning cost?

There’s the cost of fine-tuning and then the actual usage cost.

  • Training: $0.008 / 1K tokens
  • Usage input: $0.012 / 1K tokens
  • Usage output: $0.016 / 1K tokens

For example, a gpt-3.5-turbo fine-tuning job with a training file of 100,000 tokens that is trained for 3 epochs would have an expected cost of $2.40.

Source: OpenAI, “GPT-3.5 Turbo fine-tuning and API updates”

When will fine-tuning for GPT-4 be available?

This fall.

OpenAI has announced that support for fine-tuning GPT-4, its most recent version of the large language model, is expected to be available later this year, probably during the fall season. This upgraded model has been shown to perform on par with humans across diverse professional and academic benchmarks. It surpasses GPT-3.5 in terms of reliability, creativity, and its capacity to deal with more nuanced instructions.