
Guotai Junan: DeepSeek NSA architecture leads AI efficiency revolution, bringing new development opportunities from infrastructure to application layer

Guotai Junan released a research report stating that DeepSeek's NSA (Native Sparse Attention) technology breaks through the bottleneck of long text processing and pushes AI large models towards competition on algorithmic efficiency. NSA improves long-text processing efficiency through three parallel attention branches, lowers the development threshold for AI applications, facilitates the popularization of AI technology, and fosters new application scenarios and business model innovation. The technology significantly reduces the computational resources required for pre-training, lowering the barrier for small and medium-sized enterprises to participate in AI development and opening up new market opportunities.
According to the Zhitong Finance APP, Guotai Junan released a research report stating that DeepSeek has published a paper related to NSA (Native Sparse Attention), breaking through the bottleneck of long text processing. The feasibility of low-cost model training promotes the shift of AI large models towards algorithmic efficiency competition. By lowering the development threshold for AI applications, it is expected to stimulate a new wave of innovation, ultimately accelerating the penetration of AI across various industries and driving the upgrade of the entire industrial chain, with new development opportunities emerging from infrastructure to application layers.
Guotai Junan's main points are as follows:
NSA achieves breakthroughs in long context processing through a native sparse attention mechanism.
Long context processing is one of the key bottlenecks in the development of large models, with attention computation in the softmax architecture accounting for 70%-80% of the total decoding latency for 64k contexts. NSA employs three parallel attention branches: compressed attention captures global information, selective attention retains important tokens, and sliding window attention processes local contexts. In general benchmark tests, it performs comparably to full attention models while achieving efficiency improvements in long text processing.
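The three-branch design described above can be illustrated with a minimal sketch: a compressed branch mean-pools blocks of keys/values for a coarse global view, a selection branch attends only to tokens from the highest-scoring blocks, and a sliding-window branch covers the most recent tokens, with the outputs combined by gate weights. Note that this is an illustrative single-query toy in NumPy; the block size, top-k, window, and fixed gates below are assumptions for demonstration, not DeepSeek's actual implementation (which uses learned gates and hardware-aligned kernels).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # Standard scaled dot-product attention for a single query vector.
    scores = (K @ q) / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

def nsa_style_attention(q, K, V, block=4, topk=2, window=8,
                        gates=(1/3, 1/3, 1/3)):
    # Toy sketch of NSA's three parallel branches (names from the paper;
    # all sizes and the fixed gates here are illustrative assumptions).
    T, d = K.shape
    nb = T // block
    # 1) Compressed attention: mean-pool keys/values per block (global view).
    Kc = K[:nb * block].reshape(nb, block, d).mean(axis=1)
    Vc = V[:nb * block].reshape(nb, block, d).mean(axis=1)
    out_cmp = attend(q, Kc, Vc)
    # 2) Selected attention: keep only tokens from the top-k scoring blocks.
    block_scores = Kc @ q
    keep = np.argsort(block_scores)[-topk:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
    out_sel = attend(q, K[idx], V[idx])
    # 3) Sliding-window attention: only the most recent `window` tokens.
    out_win = attend(q, K[-window:], V[-window:])
    g1, g2, g3 = gates
    return g1 * out_cmp + g2 * out_sel + g3 * out_win

rng = np.random.default_rng(0)
T, d = 32, 8
q = rng.standard_normal(d)
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
out = nsa_style_attention(q, K, V)
print(out.shape)  # (8,)
```

The efficiency gain comes from each branch attending to far fewer than T tokens (here nb + topk*block + window rather than all 32), which is what keeps cost sublinear in context length at scale.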
The reduction of computational barriers will accelerate the democratization of AI.
The NSA technology significantly reduces the computational resources required for pre-training through end-to-end sparse training, cutting the A100 GPU hours consumed in the pre-training process. This lowers the financial and technical barriers for enterprises to develop large models, allowing more small and medium-sized enterprises to participate in foundational AI development. The sharp reduction in computational barriers will push AI technology from the hands of a few tech giants toward adoption across the broader market.
The enhancement of long text processing capabilities will give rise to new application scenarios and drive business model innovation.
The NSA technology enables models to directly process entire books, code repositories, or thousands of rounds of customer service dialogues. This improvement in long sequence processing capabilities will significantly expand the application boundaries of AI in document analysis, code generation, and other fields. Particularly in low-latency scenarios such as edge computing, the efficient inference characteristics of NSA may give rise to entirely new business models, thus providing new market opportunities for hardware manufacturers, solution providers, and others.
Risk Warning: Risks of intensified technological competition and of commercialization progressing more slowly than expected.
