Google's Gemini
Google launched Gemini today. Gemini 1.0 is a multimodal model trained on image, audio, video and text data, and it is intended to generalize across data modalities (e.g., you give it an image and it returns a text description of that image). Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is intended for the most complex and difficult tasks. Pro is designed for performance and deployability (sort of akin to the enterprise tier). Nano powers on-device applications. Nano models are smaller models distilled from the bigger Gemini models and quantized for on-device deployment.
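To make "quantized" a bit more concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is a generic illustration only, not Google's method: this post does not describe which scheme or bit width Nano uses, and the function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

The model stores the small integers plus a single scale instead of full-precision floats, which shrinks memory use at the cost of a small rounding error per weight. That trade-off is one reason a smaller on-device model is expected to score lower than its larger siblings.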
The evaluation against other multimodal models shows how fast the entire industry continues to move. It feels like foundational models are yesteryear's news, and multimodal is the present. Who knows what comes next. All of this is happening even though building these large foundational models is extremely challenging, and multimodal foundational models are harder still because they have to handle several kinds of data at once.
Below is the evaluation from the technical paper. Gemini is good. GPT-4 is good. I expect to see further improvement in all of these models as machine learning researchers continue to train them and incorporate new learnings and tweaks.
Further, we can see the power of the large models as well as the performance degradation on the smaller Nano model. This is expected. It doesn't mean that Nano or other smaller models are bad. It means that the market now has more foundational models to cater to different use cases and deployment setups. I like the fact that Google gives this comparison across models and tasks.
Congratulations to the Google DeepMind team and all the friends and students who have contributed to this effort in some way.
Today's announcement also gives startups in the space much to think about. I wrote about AI defensibility before, and it is imperative for startups to dig in and consider where their edge is against large platform players.