The tsunami of new generative AI product news is showing no signs of letting up: Fresh on the heels of OpenAI’s expansion of Code Interpreter to all ChatGPT Plus users and Anthropic’s announcement of Claude 2, Google is taking the spotlight back with two big AI announcements this week. The first is a massive update to its large language model (LLM) product Bard, enabling users to upload images and have Bard analyze them. The second is the unveiling of Google NotebookLM, an AI-powered note-taking service in limited availability.
Bard goes global and visual
First up, the updates to Bard. For a while after OpenAI released ChatGPT in November 2022, it seemed like Google was racing to play catch-up with its AI efforts.
But the annual Google I/O conference in May 2023 changed all that, with CEO Sundar Pichai and other executives and presenters saying the words “generative AI” more than 140 times during the two-hour keynote presentation, as if the phrase were some sort of magical incantation for business success.
Clearly, the search and web giant was wholeheartedly embracing the tech trend that has swept Silicon Valley and the global tech industry. Though Bard has failed to reach the same user numbers as ChatGPT since its wide release at the same I/O event, its usage has been growing more quickly of late, and the new updates announced today may help further that trend.
A Google blog post published today, authored by Jack Krawczyk, Bard’s product lead, and Amarnag Subramanya, VP of engineering for Bard, outlines a flurry of new features for the language model, including:
- Availability in “most of the globe,” and support for user prompts in 40 languages including Arabic, Chinese, German, Hindi and Spanish. Bard is also accessible in many new locations such as Brazil and Europe.
- Bard can speak its responses in 40 languages, which could be particularly beneficial for learning pronunciation.
- There are five new modes users can switch between for the types of responses they want Bard to provide: simple, long, short, professional or casual. What’s the difference? Google offers this example: “You can ask Bard to help you write a marketplace listing for a vintage armchair, and then shorten the response using the drop-down.” The feature is available only in English to start, but Google says other languages will follow.
- Four new features have been launched to enhance productivity: Users can pin and rename conversations with Bard; export Python code to Replit as well as Google Colab; share responses with their network via shareable links; and use images in their prompts with the help of Google Lens integration. The pinning in particular seems generally helpful, as it allows the user to save selected responses from Bard conversations off to the left side of the interface window for easy access later (instead of scrolling all the way up or down to find them).
- Finally, following up on a promise made at I/O, Bard now integrates with Google Lens, the tech giant’s image recognition technology, allowing users to include images in their prompts. Whether you need more information about an image or want help writing a caption, Bard can analyze the uploaded image to assist. As of the time of the blog post, this feature is available in English, with plans to expand it to other languages soon. Meanwhile, on Reddit, one user has already used Bard to solve a Google image CAPTCHA (“select all the squares with traffic lights”), adding an interesting twist to a world where the line between humanity and artificial intelligence is becoming increasingly blurry.
The future of note taking?
Yesterday, Google also revealed that another I/O announcement had graduated from internal development and use to limited public availability.
Introduced as “Project Tailwind” back at I/O, the service has been renamed NotebookLM (the “LM” is short for “language model”). It’s a more fitting name for the goal of this service: reinventing the age-old practice of taking notes.
As Google’s self-described “small” NotebookLM team sees it, note-taking can go beyond scribbling on paper or typing in the Apple Notes app: the service automatically analyzes many disparate notes and documents, finds connections among them and summarizes them in a clear, easy-to-read guide. NotebookLM can also answer user questions about their notes and documents in a conversational style, or even help users create new content.
“As we’ve been talking with students, professors and knowledge workers, one of the biggest challenges is synthesizing facts and ideas from multiple sources,” wrote Raiza Martin, product manager at Google Labs, and Steven Johnson, editorial director of Google Labs, in Google’s blog post explaining the service. “You often have the sources you want, but it’s time consuming to make the connections.”
Google’s solution to the problem is to create a “virtual research assistant” that is “grounded” or personalized to the user based on whatever set of documents they select. NotebookLM looks at these documents, pulls together its own guide, and then presents it to the user. The user can then ask the service in a Bard-like text-to-text prompting field for more information about any particular aspect, or for creative ideas based upon the underlying content.
As the Google blog post explains: “A medical student could upload a scientific article about neuroscience and tell NotebookLM to ‘create a glossary of key terms related to dopamine.’ An author working on a biography could upload research notes and make a request like: ‘Summarize all the times Houdini and Conan Doyle interacted.’”
Furthermore, in what may be a boon to YouTube creators and TikTok influencers, the post adds: “A content creator could upload their ideas for new videos and ask: ‘Generate a script for a short video on this topic.’”