Vector databases arrived on the scene a few years ago to help power a new breed of search engines that are based on neural networks as opposed to keywords. Companies like Home Depot dramatically improved the search experience using this emerging tech. But now vector databases are finding a new role to play in helping organizations deploy chatbots and other applications based on large language models.
The vector database is a new type of database that is becoming popular in the world of machine learning and AI. Vector databases are different from traditional relational databases, like PostgreSQL, which was originally designed to store tabular data in rows and columns. They’re also decidedly different from newer NoSQL databases, such as MongoDB, which store data in JSON documents.
That’s because a vector database is designed for storing and retrieving one specific type of data: vector embeddings.
Vectors, of course, are numerical arrays that represent various characteristics of an object. As the output of the training part of the machine learning process, vector embeddings are distilled representations of the training data. They are essentially the filter through which new data is run during the inference part of the machine learning process.
The first big use case for vector databases was powering next-generation search engines as well as production recommender systems. Home Depot dramatically improved the accuracy and usability of its website search engine by augmenting traditional keyword search with vector search techniques. Instead of requiring a perfect keyword match (or a database filled with common misspellings of Home Depot’s 2 million products), vector search enables Home Depot to use the power of machine learning to infer the intent of a user.
But now vector databases are finding themselves smack dab in the middle of the hottest workload in tech: large language models (LLMs) such as OpenAI’s GPT-4, Meta’s LLaMA, and Google’s LaMDA.
In LLM deployments, a vector database can be used to store the vector embeddings that result from the training of the LLM. By storing potentially billions of vector embeddings representing the extensive training of the LLM, the vector database performs the all-important similarity search that finds the best match between the user’s prompt (the question he or she is asking) and the particular vector embedding.
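The similarity search described above can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure and scans every stored vector by brute force; production vector databases generally rely on approximate nearest-neighbor indexes instead. The three-dimensional embeddings, the product names, and the helper names `cosine_similarity` and `top_k` are all hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k stored keys whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
index = {
    "cordless drill": [0.9, 0.1, 0.0],
    "power screwdriver": [0.8, 0.3, 0.1],
    "garden hose": [0.0, 0.2, 0.9],
}

query = [0.9, 0.05, 0.0]  # assumed embedding of the user's search text
results = top_k(query, index, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

The query vector lands closest to the two power-tool entries even though no keywords are compared, which is the behavior that lets vector search infer intent rather than demand an exact match.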
While relational and NoSQL databases have been modified to store vector embeddings, none of them were originally designed to store and serve that type of data. That gives a certain advantage to native vector databases that were designed from the ground up to manage vector embeddings, such as those from Pinecone and Zilliz, among others.
Zilliz is the primary developer of Milvus, an open source vector database first released in 2019. According to the Milvus website, the database was built with a cloud-native architecture and can deliver “millisecond search on trillion vector datasets.”
Last week at Nvidia’s GPU Technology Conference, Zilliz announced the latest release of the vector database, Milvus 2.3. When paired with an Nvidia GPU, Milvus 2.3 can run 10x faster than Milvus 2.0, the company said. The vector database can also run on a mixture of GPUs and CPUs, which is said to be a first.
Nvidia also announced a new integration between its RAFT (Reusable Accelerated Functions and Tools) graph acceleration library and Milvus. Nvidia CEO Jensen Huang spoke about the importance of vector databases during his GTC keynote.
“Recommender systems use vector databases to store, index, search, and retrieve massive data sets of unstructured data,” Huang said. “A new important use case of vector databases is large language models to retrieve domain specific or proprietary facts that can be queried during text generation…Vector databases will be essential for organizations building proprietary large language models.”
But vector databases can also be used by organizations that are content to leverage pre-trained LLMs via APIs exposed by the tech giants, according to Greg Kogan, Pinecone’s vice president of marketing.
LLMs such as ChatGPT that have been trained on huge corpuses of data from the Internet have shown themselves to be very good (although not perfect) at generating appropriate responses to questions. Because the models have already been trained, many organizations have started investing in prompt engineering tools and techniques as a way to make the LLM work better for their particular use case.
Users of GPT-4 can prompt the model with up to 32,000 “tokens” (words or word fragments), which represents about 50 pages of text. That’s significantly more than GPT-3, which could handle about 4,000 tokens (or about 3,000 words). While the tokens are critical for prompt engineering, the vector database also has an important role to play in providing a form of persistence for LLMs, according to Kogan.
“Now you can fit 50 pages of context, which is pretty useful. But that’s still a small portion of your total context within a company,” Kogan says. “You may not even want to fill the whole context window, because then you pay a latency and cost price.
“So what companies need is a long term memory, something to add on to the model,” he continues. “The model is what knows the language–it can interpret it. But it needs to be coupled with long-term memory that will store your company’s information. That’s the vector database.”
Kogan says about half of Pinecone’s customer engagements today involve LLMs. By stuffing their vector database with embeddings that represent their entire knowledge base–whether it’s retail inventory or corporate data–Pinecone customers gain a long-term storage area for their proprietary information.
With Pinecone serving as the long-term memory, the data flow works a little differently. Instead of submitting a customer’s question directly to ChatGPT (or another LLM), the question is first routed to the vector database, which retrieves the top 10 or 15 most relevant documents for that query, according to Kogan. Those supporting documents are then bundled with the user’s original question, and the full package is submitted as the prompt to the LLM, which returns the answer.
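The retrieve-then-prompt flow can be sketched in a few lines. This is an illustration under stated assumptions, not Pinecone’s implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for the vector database, the sample documents are made up, and the final call to the LLM is left out entirely.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Step 1: find the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Step 2: bundle the retrieved documents with the original question."""
    lines = ["Answer the question using only the context below.", "", "Context:"]
    lines += [f"- {doc}" for doc in retrieve(question, documents, k)]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical knowledge base entries.
documents = [
    "Returns are accepted within 90 days with a receipt.",
    "Store hours are 6 am to 10 pm on weekdays.",
    "Cordless drills carry a three year warranty.",
]

question = "How long is the warranty on cordless drills?"
prompt = build_prompt(question, documents, k=2)
print(prompt)  # Step 3 (not shown): send this prompt to the LLM of your choice.
```

Only the retrieved documents reach the prompt, so the context window holds the slice of company data most relevant to the question rather than the whole knowledge base.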
This approach produces better results than just blindly asking ChatGPT questions, Kogan says, and it also helps with LLMs’ pesky hallucination problem. “We know that this is a kind of a workflow that works really well, and we are trying to educate others about it too,” he says.