OpenAI debuts GPT-4o as a multimodal, more personal AI

OpenAI has unveiled GPT-4o, its latest flagship AI model, presenting this "omni" version of GPT-4-level intelligence as a revolutionary multimodal step in AI interaction. However, a closer look reveals a mix of genuine advances and potential pitfalls. Google will respond with its own improvements to Gemini at its I/O developer conference today.

By Lindsey Schutters

14 May 2024

OpenAI's latest multimodal AI model GPT-4o is built on GPT-4 intelligence.

While GPT-4o boasts improved speed and capabilities across text, vision and audio, it's hard to judge whether these improvements are truly groundbreaking in our AI-saturated world. The inclusion of real-time audio and vision is innovative, but its practical applications and effectiveness remain to be seen.

OpenAI's emphasis on democratising access to AI tools is commendable, but the company's track record on safety raises concerns. The release could also foreshadow improvements to Apple's Siri, following rumours of a deal between the tech giant and OpenAI ahead of Apple's Worldwide Developers Conference (WWDC) next month.

The potential misuse of real-time audio and vision capabilities could lead to unforeseen consequences, especially considering the technology's availability to free users.

A key focus of the release is to democratise access to advanced AI tools, making them more intuitive and accessible to everyone. OpenAI emphasised its commitment to reducing friction in user experience, ensuring that everyone can benefit from the power of AI.

Show and tell

The live demonstrations showcased impressive features like real-time translation and image analysis. Yet, these demonstrations were carefully curated and may not represent the model's performance in real-world scenarios.

While GPT-4o undoubtedly holds promise, it is crucial to approach the hype with a critical eye. The true impact of this technology will depend on how it is deployed, regulated, and ultimately, used by individuals and organisations.

With GPT-4o, free users gain access to a wider range of tools and capabilities that were previously available only to paid users. These include:

GPT Store: Access to custom ChatGPT experiences created by others.
Vision: Upload and analyse images and documents containing text and images.
Memory: Improved continuity and context across conversations.
Browse: Search for real-time information during conversations.
Advanced data analysis: Upload and analyse charts and data.
Improved language support: Enhanced quality and speed in 50 languages.

As the model is rolled out over the next few weeks, it will be important to observe its real-world applications and potential challenges.

Speed to market

OpenAI is also rolling out GPT-4o in its API, giving developers the tools to build applications at scale. Compared with GPT-4 Turbo, the model offers faster processing, a 50% cost reduction and five times higher rate limits.

Addressing the challenges that come with introducing powerful AI models, OpenAI highlighted its commitment to safety and responsible deployment.

The company has been actively working on mitigations against misuse, collaborating with stakeholders from government, media, entertainment, and civil society to ensure the technology is used ethically and responsibly.