Enhancing AI Agent Reliability: NVIDIA Launches Containerized Inference Microservices
NVIDIA stated that the large language models behind AI agents may produce undesirable responses and could even trigger security issues when malicious users attempt to breach their protections. The new NVIDIA Inference Microservices extend NeMo Guardrails, the protective framework NVIDIA provides to developers, and aim to enhance the safety, accuracy, and scalability of generative AI applications.
Author: Zhao Yuhe
Source: Hard AI
NVIDIA on Thursday launched new NVIDIA Inference Microservices (NIM), containerized microservices that accelerate the deployment of generative AI models, aiming to help enterprises enhance the trust, safety, and reliability of AI agents.
NVIDIA stated in a blog post that AI agents are a rapidly evolving technology that is gradually changing the way people interact with computers, but that the technology also comes with critical issues. Agentic AI is expected to revolutionize how knowledge workers perform tasks and how customers "converse" with brands, but the large language models behind it may still produce undesirable responses and even trigger security issues when malicious users attempt to breach their defenses.
NVIDIA indicated that Thursday's release extends NeMo Guardrails, the protective framework it provides to developers, and is aimed at improving the safety, accuracy, and scalability of generative AI applications. NeMo Guardrails is part of NVIDIA's NeMo platform, which is used to manage, customize, and protect AI, and it helps developers integrate and manage AI safeguards in large language model (LLM) applications. Currently, Amdocs, Cerence AI, and Lowe’s are using NeMo Guardrails to safeguard AI applications.
The release consists of three NIM microservices, covering topic control, content safety, and jailbreak protection. The company stated that these microservices are highly optimized, small, lightweight AI models that can enhance application performance by regulating the responses of large models.
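To make the three categories concrete, the following is a minimal Python sketch of how such checks might be chained in front of a large model. It is an illustration only and does not use the NeMo Guardrails API: the function names, the `RailResult` type, and the keyword lists are all hypothetical, and the simple keyword matching stands in for the small models that NVIDIA describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RailResult:
    allowed: bool
    rail: Optional[str] = None  # name of the check that blocked the message


# Hypothetical stand-ins: a real deployment would call a small model here,
# not match keywords.
def jailbreak_rail(message: str) -> bool:
    patterns = ("ignore previous instructions", "ignore all previous instructions")
    return not any(p in message.lower() for p in patterns)


def content_safety_rail(message: str) -> bool:
    blocked = ("build a weapon",)
    return not any(b in message.lower() for b in blocked)


def topic_rail(message: str) -> bool:
    allowed_topics = ("order", "return", "delivery")
    return any(t in message.lower() for t in allowed_topics)


# Checks run in this order; the first failure stops the chain.
RAILS: List[Tuple[str, Callable[[str], bool]]] = [
    ("jailbreak", jailbreak_rail),
    ("content_safety", content_safety_rail),
    ("topic_control", topic_rail),
]


def check_message(message: str) -> RailResult:
    """Return which check, if any, blocks the message before it reaches the LLM."""
    for name, rail in RAILS:
        if not rail(message):
            return RailResult(allowed=False, rail=name)
    return RailResult(allowed=True)
```

In this sketch, a call such as `check_message("Where is my order?")` passes all three checks, while an off-topic or manipulative message is stopped by the first check that rejects it, so the large model never sees it.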
Kari Briski, Vice President of Enterprise AI Models, Software, and Services at NVIDIA, stated:
"One of the new microservices for regulating content safety is trained on the Aegis content safety dataset. This is one of the highest quality datasets in its category, sourced from human annotations."
The Aegis content safety dataset launched by NVIDIA includes over 35,000 human-annotated samples for detecting AI safety issues and attempts to bypass system restrictions. This dataset will be publicly released on Hugging Face later this year.
For example, the NIM for topic control can prevent AI agents from becoming "too talkative" or deviating from their original task objectives, ensuring they stay within the established topic. NVIDIA stated that the longer a conversation with an AI chatbot lasts, the more likely the chatbot is to forget the original intent of the dialogue, leading to topic drift, much as people may ramble in conversation. While this may be acceptable between humans, for chatbots, and especially for brand AI agents, topic drift could lead to discussions about celebrities or competing products, which could be detrimental to the brand.
Briski stated:
"Small language models in the NeMo Guardrails series have lower latency and are designed for efficient operation in resource-constrained or distributed environments, making them very suitable for scaling AI applications in scenarios such as hospitals or warehouses in industries like healthcare, automotive, and manufacturing."
In addition, NIM allows developers to layer multiple protective measures with minimal additional latency. NVIDIA stated that this is crucial for most generative AI applications, as users do not like long waits, such as watching a blinking dot or a spinning loading animation before text or speech appears.
NVIDIA announced that the NIM microservices, along with NeMo Guardrails for rail orchestration and the NVIDIA Garak toolkit, are now available to developers and enterprises. Developers can start integrating AI safeguards into customer service AI agents through the relevant tutorials, using NeMo Guardrails to build secure AI applications.