It’s time to upgrade our knowledge of the Google basics.
Google’s AI Agent that can See, Hear & Speak
Major AI updates from Google I/O 2024
https://unwindai.substack.com/p/googles-ai-agent-that-can-see-hear
- Google extends Gemini 1.5 Pro’s context window to 2 million tokens (a rough sense of that scale is sketched after this list)
- Google’s next text-to-video model competing with OpenAI’s Sora
- Project Astra offers real-time AI assistance in video calls
- Gemma 2 will be released soon, outperforming models double its size
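To put the 2-million-token figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes the common rough heuristics of about 0.75 English words per token and about 300 words per printed page; the real ratios depend on the tokenizer and the text, so the output is an order-of-magnitude estimate, not an official figure.

```python
# Rough scale of a 2M-token context window, using common rules of thumb
# (~0.75 words per token, ~300 words per printed page). These are heuristics,
# not tokenizer-exact numbers.
CONTEXT_TOKENS = 2_000_000
WORDS_PER_TOKEN = 0.75   # varies by tokenizer and language
WORDS_PER_PAGE = 300     # typical printed page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, or roughly {pages:,.0f} pages, in a single prompt")
# -> ~1,500,000 words, or roughly 5,000 pages, in a single prompt
```

In other words, a single prompt can hold on the order of a few thousand pages of text.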
Via FryAI
“GOOGLE (LITERALLY) SHATTERS THE INTERNET
It’s safe to say Google’s spring update put OpenAI’s update to shame.
What’s up? One day after OpenAI released GPT-4o, Google held a massive event in Mountain View, California, showcasing an entirely new universe of AI projects.
What are the new updates? This high-energy event featured the introduction and demonstration of monumental AI developments. Some of the major updates include the following:
- AI Overviews: A search experience that allows Google to do the Googling for you, sifting the internet to find concise and accurate search results for layered questions, with links for further resources. This new feature will also soon allow users to search with videos! For example, you can take a video of your broken record player and ask, “Why won’t this work?”
- Project Astra: An expert AI life assistant you can speak to in natural language. This model is teachable and recognizes multimodal content, including video and speech.
- Veo: A new video model that creates 1080p videos from text and image inputs. These videos can be further edited using additional prompts.
- Ask Photos: This allows a user to do a detailed search of their photos. For instance, you can ask, “When did my daughter learn how to swim?” or “Show me how my daughter’s swimming has progressed,” and you will get a detailed summary, including the relevant photos. This also includes Circle to Search, which allows users to circle parts of photos to search that portion of the image online.
- Gemini 1.5 Pro’s context window went from a staggering 1 million tokens to an even more staggering 2 million.
- Imagen 3: A new image generation model with richer details and fewer distortions. This new model pays attention to small details in longer, naturally written prompts.
- Updates to Google Workspace Labs: This includes a new AI Teammate feature as well as updates to Gmail.
- Gemini 1.5 Flash: This new model is designed to be fast and cost-efficient, optimized for tasks where low latency and efficiency matter most.
- Gems: Personal AI assistants which specialize in specific types of tasks. For example, you can create a “personal fitness trainer” Gem.
- Music AI Sandbox: Created in collaboration with YouTube, this new feature allows users to create instrumental sections from scratch, transfer styles between tracks, and more.
- SynthID: This open-source digital watermark technology will protect the authenticity of photos, audio, text, and videos.
- Trillium TPUs: Google’s latest generation of TPUs, which delivers a 4.7x improvement in compute performance per chip over the previous generation.
- PaliGemma: Google’s first vision-language open model.
“When we first began this journey to build AI more than 15 years ago, we knew one day, it would change everything. Now, that time is here.”

-Google DeepMind”
Via AI Tool Report
Google’s game-changing AI search:
At the Google I/O developer conference, Google unveiled some revolutionary AI search features.
https://www.aitoolreport.com/articles/googles-game-changing-ai-search
“At Google I/O 2024, our annual developer conference, we shared how we’re building more helpful products and features with AI — including improvements across Search, Workspace, Photos, Android and more. Read on for everything we announced.”
Via The Rundown AI
Image source: Google
“The Rundown: Google just kicked off its I/O Developer’s Conference, announcing a wide array of updates across its AI ecosystem — including enhancements across its flagship Gemini model family and a new video generation model to rival OpenAI’s Sora.
Gemini model updates:
- New updates to 1.5 Pro include a massive 2M context window extension and enhanced performance in code, logic, and image understanding.
- Gemini 1.5 Pro can also utilize the long context to analyze a range of media types, including documents, videos, audio, and codebases (a minimal API sketch follows this list).
- Google announced Gemini 1.5 Flash, a new model optimized for speed and efficiency with a context window of 1M tokens.
- Gemma 2, the next generation of Google’s open-source models, is launching in the coming weeks, along with a new vision-language model called PaliGemma.
- Gemini Advanced subscribers can soon create customized personas called ‘Gems’ from a simple text description, similar to ChatGPT GPTs.
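For a concrete sense of how the Pro/Flash split plays out in code, here is a minimal sketch using the google-generativeai Python SDK. The model names follow the announcement above; the input file, the prompt, and the assumption that the API key already has access to the larger context window (2M-token access was initially gated behind a waitlist) are placeholders for illustration, not a definitive recipe.

```python
# Minimal sketch: feeding a long document to Gemini 1.5 via the
# google-generativeai Python SDK. The file name and prompt are hypothetical.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Use 1.5 Pro for the largest context window; swap in "gemini-1.5-flash"
# when latency and cost matter more than peak capability.
model = genai.GenerativeModel("gemini-1.5-pro")

with open("large_codebase.txt") as f:  # hypothetical long-context input
    document = f.read()

# Optional sanity check: how many tokens does the document consume?
print(model.count_tokens(document))

response = model.generate_content(
    [document, "Summarize the architecture and list the main entry points."]
)
print(response.text)
```

Swapping the model string is essentially the whole Pro-versus-Flash decision at the API level, which fits Google’s framing of Flash as the low-latency, cost-efficient sibling rather than a separate product.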
Video and image model upgrades:
- Google revealed a new video model called Veo, capable of generating 1080p videos that can run beyond 60 seconds from text, image, and video prompts.
- The new Imagen 3 text-to-image model was also unveiled with better detail, text generation, and natural language understanding than its predecessor.
- Veo will also power a new VideoFX text-to-video tool, featuring scene-by-scene storyboard creation and the ability to add music to generations.
- VideoFX is launching in a ‘private preview’ in the U.S. for select creators, while ImageFX (with Imagen 3) is available to try via a waitlist.
Why it matters: Gemini’s already industry-leading context window gets a 2x boost, enabling endless new opportunities to utilize AI with massive amounts of information. Additionally, Sora officially has competition with the impressive Veo demo — but which one will make it to public access first?”
“Google just came out swinging against OpenAI’s GPT-4o with its own Project Astra. This universal AI agent is designed to be your assistant for everyday life tasks and leverages your phone’s camera and voice recognition to give responses. And it even works with smart glasses.”
“Google Search just received a massive generative AI update at today’s I/O 2024 event. This Gemini update is expected to transform the way you use the search engine by answering questions you have by pulling information from the web. Here’s what we learned.”
“Today Google introduced a new generative media model with Veo. Veo is built to generate high-def video and can even understand cinematic terms and make minute-long (or more) videos. Here’s more about Google’s new AI model.”