

Google Bard’s New Visual Feature is a Game Changer. Chatbots can officially see the world


“Bard Can Recognize Images. 4 Business Use Cases 👇

Bard is like a college dropout (and not the startup-founder type): bad at writing, coding, and reasoning in general.
But it does have one redeeming feature: image recognition (upload a photo → Bard describes the image).
This can be especially helpful for sorting or categorizing physical items. Here are a few neat applications:
  • Classifying scanned invoices.
  • Evaluating images of products or real estate.
  • Documenting meeting notes.
  • Transcribing text from actual books.
For now, Bard only supports image uploads from your phone. And, much like a college dropout, Bard overpromises: it describes images incorrectly. A lot.
Check out how badly it handles simple children’s images.
Why it matters: Bard’s image capabilities look impressive, but they appear to be little more than a reverse image search (finding similar visuals on the internet). For now, we only recommend it for non-work tasks.”

“Meta Announced CM3leon, a ‘First-of-Its-Kind’ Image AI 📷

Midjourney generates photorealistic images.
Bard describes what’s in an image.
CM3leon can do both.
This new foundation model from Meta handles both text-to-image and image-to-text generation. It’s like LeBron James: insane at both offense and defense.
This means CM3leon (pronounced “chameleon”) has the capabilities to:
  1. generate images (quality sits between Midjourney and DALL-E).
  2. edit photos with text prompts (“alter the color”).
  3. place objects with precision (“put sink at coordinates 175, 47”).
  4. create captions and answer questions about its images.
Why it matters: Meta keeps dishing out innovative AI tech. But so far, these models haven’t been commercialized (you and I can’t use them).
Still, at the end of the day, Meta is a social media business, and we’re confident that many of its models will find a place in media creation.
CM3leon’s superpower is that it’s multimodal: it handles both text and images. This could be a big unlock for advertisers, marketers, and journalists, letting them generate images and written content under one umbrella.”

Posted on: July 17, 2023, 10:36 am Category: Uncategorized
