Track Hyper | Tencent Cloud's AGI Infrastructure: Boosting LLM with Vector Databases
Some people work diligently and steadfastly.
Since ChatGPT became popular, major platform companies in China have rushed to release general-purpose or specialized LLMs, but few have built the underlying infrastructure for large models, such as vector databases.
On July 4th, Tencent Cloud addressed this gap by launching Tencent Cloud VectorDB, a vector database suited to scenarios such as large-model training, inference, and knowledge-base augmentation. It is the first vector database in China to provide AI capabilities across the full lifecycle, from the access layer through the computing layer to the storage layer.
What is a vector database? What is Tencent's aim? And what does it offer B-side applications?
The First Full Lifecycle VectorDB in China
ChatGPT has made LLMs the focus of the AI field, and vector databases have become a hot topic as a result. A vector database can be seen as an inexpensive yet effective "external brain" for an LLM.
What does this mean?
Firstly, vectors are commonly used to represent the position, features, or attributes of data points in a multi-dimensional space. A vector is a mathematical expression: an ordered set of values (usually floating-point numbers) that represents an object or data point.
For example, in computer vision an image can be represented by its pixel values, which together form a vector. Each value corresponds to the intensity of one pixel; an 8x8 grayscale image, for instance, can be represented by a vector of 64 values.
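To make the image example concrete, here is a minimal sketch (the pixel values are made up for illustration) that flattens a toy 8x8 grayscale image into the 64-value vector described above:

```python
# A toy 8x8 "grayscale image": each entry is a pixel intensity (0-255).
# The values here are synthetic, chosen only for illustration.
image = [[(row * 8 + col) % 256 for col in range(8)] for row in range(8)]

# Flattening the rows end to end yields a 64-dimensional vector.
vector = [pixel for row in image for pixel in row]

print(len(vector))  # 64
```

Each position in the vector corresponds to exactly one pixel, which is why similarity between such vectors can stand in for visual similarity between images.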
Secondly, vectors are not exclusive to AGI; they were first used at scale in recommendation AI. Because they can represent the characteristics of data points in multi-dimensional space, vectors power personalized recommendation: both users and items can be represented as vectors.
For example, a user's preferences for clothing color, style, material, and usage can be summarized as a numerical vector. Recommendations are then produced by computing the similarity between the user vector and each item vector.
In the AI world, vectors are used to represent everything in the physical world. With the rise of LLMs, demand for storing and computing vector data has surged.
By December 2022, vectors had already been applied extensively in AI recommendation systems. But without a dedicated vector database, vector data was scattered across a large number of recommendation-system files.
Although vector data itself has a relatively simple structure, the breadth of application scenarios (machine vision, text and image processing, neural networks, and natural language processing in multi-modal AGI) has produced a large number of algorithms.
Retrieving vector data from multiple systems consumes significant GPU and CPU resources, resulting in high cost and low efficiency. Since 2019, some general-purpose databases, such as Elasticsearch, Redis, and PostgreSQL, have supported vectors through plugins. But as Luo Yun, Deputy General Manager of Tencent Cloud Database, put it, "As the workload on vector databases keeps growing, plugin-based databases will face challenges."
Tencent Cloud's own vector database work also began in 2019, when it introduced "OLAMA," a distributed vector storage and indexing engine that became one of the foundations of Tencent Cloud VectorDB. OLAMA currently supports billion-scale vector indexing, sustains millions of queries per second (QPS), and keeps response latency around 20 milliseconds.
Since 2019, Tencent Cloud has steadily enriched OLAMA's AI capabilities, adding several vector indexing algorithms along with embedding and text-segmentation support, as well as NLP (natural language processing) retrieval capabilities.
What is the role of vector data algorithms?
Simply put, vector data algorithms exist to find vectors faster, more accurately, more cheaply, and at lower latency, and to find the vectors most similar to a query. That is their whole purpose, and it is also where Tencent's vector database has improved its algorithmic capabilities.
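For context on what those algorithms improve upon, here is a sketch of the naive baseline: an exact nearest-neighbour search that scans every stored vector. The tiny database below is invented for illustration; index structures such as HNSW or IVF exist precisely to avoid this full scan at scale:

```python
def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def brute_force_search(query, database, k=2):
    """Exact k-nearest-neighbour search: O(n * d) per query.
    Vector-index algorithms trade a little accuracy for avoiding this scan."""
    scored = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in scored[:k]]

# A toy "vector database" of labeled 2-D embeddings (illustrative only).
database = [
    ("cat", [0.9, 0.1]),
    ("kitten", [0.85, 0.15]),
    ("truck", [0.1, 0.9]),
]

print(brute_force_search([0.88, 0.12], database))  # ['cat', 'kitten']
```

With billions of vectors, this linear scan is what drives the GPU/CPU cost mentioned earlier, which is why specialized index engines like OLAMA matter.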
Zheng Wei, Senior Tech Lead of Search and Recommendation on the Tencent PCG Big Data Platform, said, "Besides being fast, efficient, and low-latency, another major improvement in Tencent Cloud's vector database is that the stability of the OLAMA engine has been greatly improved while keeping costs low."
Thirdly, Tencent Cloud's vector database has made many improvements in applicability. "The question is how to make the service more automated and intelligent," said Zheng Wei, "and, after the emergence of large models, how to better adapt to them and extend further. For example, users can obtain various bills, data, and reports simply by using our vector database and typing a query."
Cost reduction, improved memory capability
TENCENT Cloud's launch of a professional vector database is driven by strong demand from the customer side.
According to Luo Yun, "Practically every day, at least one or two customers ask when they can use the vector database."
Demand for vector databases comes in three stages. The first is simply having a dedicated vector database at all; Luo Yun said Tencent Cloud effectively skipped this step, because it has been iterating on its vector storage and indexing engine since 2019. The second is solving the cost problem: for example, the cost per query, that is, how much the customer pays for each QPS (query per second) of capacity.
The third stage is usability for B-side users, which requires Tencent Cloud's broad experience with AI applications across industries. A vector database stores and retrieves vectors, but the customer must first convert unstructured data (such as a piece of text) into vectors: segment the text, then find a suitable model to produce embeddings. In ML (machine learning) and NLP, an embedding is an N-dimensional real-valued vector that can represent almost any form of data, such as text, sound, or video. Real-valued embeddings can capture the semantic meaning of words, largely because an embedding vector learns from the contexts in which a word appears.
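The segment-then-embed pipeline described above can be sketched as follows. This is a minimal illustration: the `toy_embedding` function is a deterministic stand-in invented here, not a real embedding model, which would be a trained neural network:

```python
def split_into_chunks(text, max_words=20):
    """Segment long text into shorter chunks before embedding."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def toy_embedding(chunk, dim=8):
    """Stand-in for a real embedding model: deterministically maps text to
    a dim-length real-valued vector. Illustrative only; it carries no
    semantic meaning, unlike a trained model's embeddings."""
    vec = [0.0] * dim
    for i, byte in enumerate(chunk.encode("utf-8")):
        vec[i % dim] += byte / 255.0
    return vec

text = ("word " * 50).strip()          # a 50-word document
chunks = split_into_chunks(text)        # -> 3 chunks of <= 20 words
vectors = [toy_embedding(c) for c in chunks]

print(len(chunks), len(vectors[0]))  # 3 8
```

The resulting (chunk, vector) pairs are what actually gets written into the vector database; making this conversion step painless is precisely the "usability" problem of the third stage.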
Tencent Cloud VectorDB has been applied in more than 30 business scenarios, including QQ Browser, Tencent Video, Tencent Games, QQ Music, and Sogou Input Method. The team is still in the third stage of meeting demand, focused on "improving the overall applicability of the product through AI transformation," said Luo Yun.
Why does LLM need to use vector search technology?
Vector search is the process of finding the objects in a collection most similar to a given object. Text or images, for example, can be converted into vector representations, turning a text or image similarity problem into a vector similarity problem.
Here's the problem: an LLM's context window has a length limit. GPT-3.5, for example, is limited to 4k tokens. Beyond that limit, ChatGPT "forgets" earlier input, hurting the accuracy of in-context learning.
Vector search addresses this by splitting text that exceeds the context limit into shorter chunks and converting each chunk into an embedding. This acts as a form of memory: with embeddings, the LLM can find the information most relevant to the prompt.
From this perspective, the vector database takes over the memory side of an LLM's in-context learning, improving the accuracy of interaction with the model.
In addition, because Transformer-based LLMs take a long time and a great deal of money to train, it is hard to fold the latest material (data) into the model; this is known as the timeliness limitation of LLMs. When end users need the model to reflect real-time data updates, the vector database becomes essential.
LLMs also have an evident space limitation. B-side users' private-domain data, for example, cannot conveniently be handed to an LLM training platform for frequent retraining. Instead, those users can store their private-domain data in a vector database and retrieve it on demand when the LLM needs it for inference. The benefits are data security, high efficiency, and low cost.
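That private-data flow is what is commonly called retrieval-augmented generation. The sketch below shows the shape of it; all four callables (`embed`, `search_vector_db`, `llm`, and the stub data) are placeholders invented here, standing in for a real embedding model, a real vector database client, and a real LLM:

```python
def answer_with_private_data(question, embed, search_vector_db, llm):
    """Retrieval-augmented inference: private data stays in the vector
    database; only the retrieved snippets enter the prompt at query time."""
    q_vec = embed(question)
    snippets = search_vector_db(q_vec, top_k=3)
    prompt = "Context:\n" + "\n".join(snippets) + "\nQuestion: " + question
    return llm(prompt)

# Minimal stubs so the flow runs end to end (illustrative only).
docs = {"pricing": "Plan A costs $10/month."}
embed = lambda text: text  # stand-in: the "embedding" is the text itself
def search_vector_db(q_vec, top_k=3):
    return [v for k, v in docs.items() if k in q_vec][:top_k]
llm = lambda prompt: prompt.splitlines()[1]  # echoes the first context line

print(answer_with_private_data("pricing question", embed, search_vector_db, llm))
```

The private document never leaves the database except as a retrieved snippet inside one prompt, which is where the security and cost benefits described above come from.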
Why is this possible?
Because a vector database is purpose-built for storing and querying vector data, the industry calls it the "hippocampus" of large models.
Tencent Cloud VectorDB supports vector retrieval at the billion scale with latency held to milliseconds, a tenfold increase over the retrieval scale of traditional single-machine plugin databases, along with peak throughput of millions of queries per second (QPS). Using it to classify, deduplicate, and clean large-scale pre-training data can be ten times more efficient than traditional methods, and using it as an external knowledge base for model inference can cut costs by two to four orders of magnitude.