Google Gemini is here - so what, and what's next?
Google just launched Gemini, its newest and most capable family of AI models. It's one of the biggest AI stories of the year - in a year of groundbreaking AI news.
What are the big takeaways of Gemini for people in design and innovation? There are three that jump out to me.
1. Multimodality wave continues
With the launch of Gemini, the AI megatrend of multimodality - models that are fluent in audio, video, text, images, and code - is picking up pace.
Google proudly touts that Gemini was "built from the ground up with multimodality". This shows in how its most powerful version (The Ultra) surpasses competing models like GPT-4 in image and video recognition. The Ultra also scores slightly above GPT-4 in text-based tasks, but the difference is debatable.
Gemini Pro (its middle-tier model) is roughly on par with OpenAI's GPT-3.5 in text-related tasks.
One letdown: image generation and video recognition are not included at launch, so for now we'll have to take Google's word on the multimodality.
So what?
This push for multimodality unlocks all kinds of intriguing opportunities.
Use cases range from the simple (providing feedback on visual designs) to the disruptive (new digital services and robots acting on real-time visual input from the world).
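To make the simple end concrete, here's a minimal sketch of asking Gemini for design feedback through Google's google-generativeai Python SDK (pip install google-generativeai). The file name and prompt are my own illustration, not from Google's documentation.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Gemini Pro Vision accepts mixed text + image input in a single prompt.
model = genai.GenerativeModel("gemini-pro-vision")
design = Image.open("landing_page_mockup.png")  # hypothetical mockup file

response = model.generate_content([
    "You are a senior product designer. Critique this landing page "
    "mockup's hierarchy, contrast, and copy, and list three concrete fixes.",
    design,
])
print(response.text)
```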
2. Micro AI models running on devices
The smallest of the Geminis, Nano, is light enough to run on devices locally.
While its diminutive size will come with significant limitations, it’ll enable handy things like smart reply suggestions and text summaries on the device itself.
As Android is a dominant platform across smart devices, Nano will let developers bake AI into all sorts of real-world uses - think of an elevator repair mechanic showing a problem to a phone that knows everything about elevators, running instantly with no internet connection required. Google already ships it on its Android flagship, the Pixel 8 Pro.
So what?
Micro-scale AI models will enable cost-effective and offline use cases on mobile and embedded devices.
Sometimes, it won’t make sense to opt for more powerful models when designing for a relatively simple use case - like getting answers from PDFs.
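Gemini Nano itself isn't something you can download and run, so as a rough stand-in, here's a sketch of the same pattern with a small quantized open-weights model answering questions from a PDF, fully offline, using llama-cpp-python and pypdf. The model file, manual, and question are all hypothetical.

```python
from llama_cpp import Llama
from pypdf import PdfReader

# Extract the text of a (hypothetical) manual bundled with the app.
pages = PdfReader("elevator_manual.pdf").pages
context = "\n".join((page.extract_text() or "") for page in pages)[:4000]

# A small quantized model is enough for simple Q&A and runs on
# laptop- or phone-class hardware with no network access.
llm = Llama(model_path="small-model.Q4_K_M.gguf", n_ctx=4096, verbose=False)

answer = llm(
    f"Context:\n{context}\n\n"
    "Question: How do I reset the door sensor?\nAnswer:",
    max_tokens=128,
)
print(answer["choices"][0]["text"])
```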
3. The enterprise AI game has three major players now
If you are building services with AI or scaling its use in a large organization, the Gemini launch means there are now three major players on the market: OpenAI, Microsoft, and Google.
Large enterprises with existing relationships with Microsoft will likely feel at home in the Microsoft world powered by GPT-4.
Companies and consumers already invested in the Google ecosystem will find its family of AI models and developer tools handy. Companies with less Microsoft legacy will gravitate towards building on the OpenAI API or using ChatGPT Enterprise.
Google will likely match OpenAI and Microsoft as an environment for developers in terms of support, models, and pricing.
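For a sense of how close those developer surfaces already are, here's a minimal side-by-side sketch of calling each platform's entry-level text model from Python. The API keys and prompt are placeholders.

```python
from openai import OpenAI
import google.generativeai as genai

prompt = "Summarize the design tradeoffs of on-device AI in three bullets."

# OpenAI / Microsoft route: the chat-completions style API.
openai_client = OpenAI(api_key="OPENAI_KEY")
gpt = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(gpt.choices[0].message.content)

# Google route: Gemini Pro through the google-generativeai SDK.
genai.configure(api_key="GOOGLE_KEY")
gemini = genai.GenerativeModel("gemini-pro").generate_content(prompt)
print(gemini.text)
```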
So what?
Companies and consumers of all inclinations will have more choice in which AI ecosystem to jump into. This three-legged competition (with OpenAI and Microsoft as partner-competitors) will spur the whole industry forward.
What to expect next year?
This week’s Google Gemini launch felt a bit rushed.
Its most potent model - The Ultra - won’t be released until sometime next year. Also, the middle-tier model - Pro - is unavailable in Europe for now.
Still, expect Google to push its AI innovation next year. It's far from done: it has some of the best AI talent, including many of the inventors of modern deep learning, on its roster.
It also holds one of the largest portfolios of online services in the world - from YouTube and Search to Drive - which it can use to both scale and improve its AI models.
You can expect these broad trends - multimodality, micro AI models, and the three-legged competition - to pick up steam going into 2024.
I'm also curious to see real advances in agentive AI - something Google DeepMind CEO Demis Hassabis openly acknowledged his team is pushing forward in an interview with The New York Times.
Want to dive deeper on Google Gemini?
There have been several high-quality deep dives on Google Gemini. My recommendations are:
- AI Explained on YouTube, with rigorous reporting as always
- The Hard Fork podcast by The New York Times, which provided an enlightening and entertaining overview of Gemini
Want to read more?
This article was first published in my weekly newsletter, Generative AI for Design and Innovation.