Vectara grounds AI accuracy with Boomerang vector embedding


AI hallucinations are a major obstacle to enterprise AI adoption. After all, no organization wants its generative AI efforts to produce inaccurate results.

Among the many organizations looking to solve the problem of AI hallucination is Vectara, which first emerged from stealth in October 2022, led by one of the co-founders of Big Data vendor Cloudera.

In May, the company updated its generative AI platform with a grounded search capability in an attempt to provide retrieval-augmented generation (RAG) results based on content.

Today the company is going a step further in its quest to reduce the risk of AI hallucination with the debut of Boomerang, a new technology that the company describes as a neural information retrieval model. Boomerang takes a new approach to generating the vector embeddings that are at the foundation of large language models (LLMs), with the aim of enabling a higher degree of accuracy and less hallucination.



“It’s a retrieval model, it’s fundamentally there to serve the following purpose: the user sends a query into some kind of knowledge base and relevant information comes back out of the knowledge base,” Amin Ahmad, co-founder and CTO of Vectara, told VentureBeat. “So there’s that kind of boomeranging action.”

Boomerang is the encode block in this picture. It takes the text and converts it to vectors (embeddings) that represent the meanings behind the text. The Generate block below it is the LLM that produces the final output as a function of the user’s prompt and the retrieved facts. (Image credit: Vectara)

Advancing the state of the art for vector embedding

The new Boomerang engine is intended to make Vectara’s GenAI platform more accurate, and it builds on the company’s grounded generation approach.

“The way grounded generation works, is you take your data and you put it in a special vector database, or a meaning space – which is the term we use,” Amr Awadallah, co-founder and CEO of Vectara, told VentureBeat. “And if you can’t map your data properly inside of this meaning space, then when the user question comes in, you are not going to get the proper facts coming back.”
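To make the idea concrete, here is a minimal sketch of the grounded generation flow described above: documents are mapped into a vector space, a query is mapped into the same space, and the closest document is returned as a fact to ground the LLM prompt. This is not Vectara's implementation. The bag-of-words `embed` function is a deliberately crude stand-in for a neural embedding model such as Boomerang, and every function name and sample document here is a hypothetical chosen for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector.
    A real system would call a neural encoder here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, top_k=1):
    """Return the top_k documents closest to the query in the vector space."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(query, facts):
    """Assemble a prompt that asks the LLM to answer only from retrieved facts."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base, using statements from this article.
DOCS = [
    "Vectara emerged from stealth in October 2022.",
    "Boomerang is a neural information retrieval model.",
    "Cloudera is a Big Data vendor.",
]

QUESTION = "When did Vectara emerge from stealth?"
facts = retrieve(QUESTION, DOCS)
prompt = build_grounded_prompt(QUESTION, facts)
print(prompt)
```

The sketch also shows why the quality of the embedding matters. The word-count stand-in only matches exact tokens, so a paraphrased or non-English question would retrieve the wrong fact or none at all, and the generation step would then have nothing reliable to ground its answer on.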

Boomerang is the new Vectara-developed model that generates the vector embeddings that represent the meanings behind the words, regardless of language. Creating vector embeddings is a critical step, and the big LLM vendors all have their own models for it. For example, OpenAI has its own ada embedding models, which have been steadily improved in recent years.

Awadallah explained that Boomerang is an upgraded engine from what his company had before, and that it produces higher-quality, more accurate vector embeddings. The core enterprise benefit of Boomerang, Awadallah said, is that it yields better facts.

“Because now we have way better facts, everything else improves, the hallucination probability goes down and the explainability becomes way better on the output side,” he said.

The path toward zero hallucinations

Precisely how Boomerang creates better vector embeddings involves a great deal of complexity.

“The way that we got to this new model from the previous model we had is through application of a large number of new and additional techniques, as well as a lot more varying and diverse training data,” Ahmad said.

Ahmad noted that Vectara is aiming to publish research papers detailing some of the new methods that enable the Boomerang vector embedding approach. Awadallah echoed his co-founder, noting that the company did in fact come up with new techniques that will be detailed in future academic research.

“There was a lot of research, a lot of trial and error, a lot of things that didn’t work and things that did work, that got us to this point where we now can exceed a couple of the most advanced companies in this space,” Awadallah said.

Vectara claims that Boomerang is able to outperform other, larger models in cross-lingual retrieval and is able to better understand content in hundreds of languages and dialects. While the updated platform does make strides toward reducing the risk of hallucination, there is still more that Vectara needs to do.

“Hallucination is not 0% and we want it to be 0%, so we will be continuing our research in terms of how to get hallucination to be significantly minimized, which is critical for business contexts,” Awadallah said.
