Data platform vendor DataStax is entering the vector database space, announcing the general availability of vector search in its flagship Astra DB cloud database.
DataStax is one of the leading contributors to the open-source Apache Cassandra database, with Astra DB serving as a commercially supported cloud Database-as-a-Service (DBaaS) offering. Cassandra is what is known as a NoSQL database, though in recent years it has grown to support more data types and use cases, notably AI/ML.
In fact, DataStax has been pushing its overall platform toward AI/ML during 2023, acquiring AI feature engineering vendor Kaskada in January. DataStax integrated the Kaskada technology into its DataStax Luna ML service, which was launched in May.
The new Astra DB vector support update further extends DataStax’s AI/ML capabilities, giving organizations a trusted, widely deployed database platform they can use for both traditional workloads and newer AI workloads.
The vector capability was first previewed on Google Cloud Platform in June. With general availability, it is now also natively accessible on Amazon Web Services (AWS) and Microsoft Azure.
“In every meaningful way, Astra DB is now as much a native vector database as anyone else,” Ed Anuff, chief product officer at DataStax, told VentureBeat.
What vector databases are all about
Vector databases are fundamental to AI/ML operations. They enable content to be stored as a vector embedding — a numerical representation of data.
Anuff explained that vectors are an ideal way to represent the semantic meaning of content, and have broad applicability for applications within large language models (LLMs) as well as for improving relevance when trying to retrieve content.
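To make the idea of semantic relevance concrete, here is a minimal sketch in plain Python of ranking content by cosine similarity between embeddings. The three-dimensional vectors and document titles are made up for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions, and a production vector database would use an approximate index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: similar topics point in similar directions.
documents = {
    "cat care guide": [0.9, 0.1, 0.0],
    "dog training tips": [0.8, 0.3, 0.1],
    "quarterly tax filing": [0.0, 0.2, 0.9],
}


def most_similar(query_vector, docs, top_k=2):
    """Brute-force search: score every document, keep the best top_k."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in docs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# A query embedding that leans toward the "pets" direction.
results = most_similar([1.0, 0.0, 0.0], documents)
print(results)
```

The two pet-related documents rank ahead of the tax document because their vectors point in nearly the same direction as the query, even though no keywords are compared.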
There are many different approaches and vendors in the vector database space today. Purpose-built vendors include Pinecone, whose president and COO spoke at the recent VB Transform event about the “explosion” in vector databases for generative AI. The open-source Milvus vector database is another popular option.
An increasingly common approach is to provide vector search as an overlay on, or extension to, an existing database platform. MongoDB announced support for vector search in June. The widely deployed PostgreSQL database supports vectors by way of the pgvector technology.
Anuff explained that DataStax’s vector search uses vector columns as a native data type in Astra DB. With vectors as a data type, Astra DB users can query and search much as they would with any other type of data.
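As a rough illustration of what treating a vector as just another column means, the sketch below models rows in plain Python, with an embedding field sitting next to ordinary fields. It is not the Astra DB or Cassandra API; the `ProductRow` type, the `query` helper and the sample data are all hypothetical. The point is only that one query can filter on a regular column and order by vector similarity at the same time.

```python
import math
from dataclasses import dataclass


@dataclass
class ProductRow:
    product_id: int
    category: str      # an ordinary column
    embedding: list    # the "vector column", stored alongside the others


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def query(rows, category, near, limit):
    """Filter on a regular column, then order by similarity to `near`."""
    matches = [r for r in rows if r.category == category]
    matches.sort(key=lambda r: cosine_similarity(r.embedding, near),
                 reverse=True)
    return [r.product_id for r in matches[:limit]]


rows = [
    ProductRow(1, "shoes", [1.0, 0.0]),
    ProductRow(2, "shoes", [0.0, 1.0]),
    ProductRow(3, "hats", [1.0, 0.1]),
    ProductRow(4, "shoes", [0.7, 0.7]),
]

# Row 3 is close to the query vector but is excluded by the category filter.
nearest = query(rows, "shoes", near=[1.0, 0.2], limit=2)
print(nearest)
```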
How Cassandra and Astra DB extend the concept of vectors
The vector database capabilities are coming to DataStax’s Astra DB a bit ahead of the availability of the feature in the open-source Cassandra project. Anuff explained that the feature has been added to the open-source project, however, and will be available in the upcoming Cassandra 5.0 release later this year. As a commercial vendor, DataStax is able to pull the code into its own platform earlier, which is why Astra DB is getting the feature now.
Anuff explained that core to the architecture of Cassandra is the idea of extensible data types. As such, the database can over time incorporate additional native data types. As a native data type, vectors, or any other data for that matter, are integrated with Cassandra’s distributed index system.
“What that means is that I can just keep adding rows to my database into perpetuity, so I can have 100 million vectors, I can have a trillion vectors,” Anuff said. “So if I want to have a large dataset that has a vector for every entry into it, I’m not going to be concerned by the number of vectorized rows that I put out. That’s just what Cassandra does, it’s not an overlay, it’s a native part of the system.”
Native LangChain integration is a bonus
An increasingly common approach to building AI-powered applications is to chain LLM calls together with other components, such as data sources. This approach is commonly enabled with the open-source LangChain technology, which DataStax’s Astra DB now also supports.
The integration allows Astra DB vector search results to be fed into LangChain models to generate responses. This makes it easier for developers to build real-time agents that can not only make a prediction but also make a recommendation, using vector search results from Astra DB and linked LangChain models.
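The general pattern of feeding search results into a model can be sketched without any particular library. The example below is plain Python, not LangChain or Astra DB code: `build_prompt`, `answer_with_context`, and the stand-in search and model functions are hypothetical names chosen for illustration. In a real application the search function would query the vector store and the model function would call an LLM.

```python
def build_prompt(question, passages):
    """Combine retrieved passages and the user's question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def answer_with_context(question, search_fn, model_fn, top_k=2):
    """Retrieve passages for the question, then pass the prompt to a model."""
    passages = search_fn(question, top_k)
    return model_fn(build_prompt(question, passages))


# Stand-ins so the sketch runs on its own.
def fake_search(question, top_k):
    corpus = ["Waterproof boots are in stock.", "Sandals ship in May."]
    return corpus[:top_k]


def fake_model(prompt):
    # A real model would generate text; this just reports what it received.
    return f"received {prompt.count('- ')} passages"


reply = answer_with_context("Which boots are available?",
                            fake_search, fake_model)
print(reply)
```

Keeping retrieval and generation behind two plain functions makes it easy to swap either side, for example replacing the stand-in search with a real vector query, without touching the prompt logic.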
Anuff emphasized that having vector capabilities generally available on the platform is a big step toward making generative AI a reality for enterprise users.
“Getting into [generative AI] is a big step, because we have a lot of customers that are going in and saying, look, can we do generative AI in production this year?” Anuff said. “The answer is: We’re ready to go if you are, so we’re pretty excited about it.”