A killer tool for enterprise users? Microsoft releases SpreadsheetLLM, which could significantly enhance AI capabilities in Excel

Wallstreetcn
2024.07.16 03:09

This model encodes spreadsheet data into a format that large language models (LLMs) can understand, enabling LLMs to reason over spreadsheet data, answer questions about it, and even generate new spreadsheets from natural language prompts. Netizens joked: "Karen's job will soon be replaced by artificial intelligence."

On July 12, Microsoft published a paper introducing SpreadsheetLLM, a new AI language model designed for spreadsheet applications such as Excel and Google Sheets.

In the paper, Microsoft pointed out that SpreadsheetLLM, as a new AI model, will be widely used for understanding and processing complex spreadsheet data.

SpreadsheetLLM has the potential to change the management and analysis of spreadsheet data, paving the way for more intelligent and efficient user interactions.

This may make accountants and data analysts worried about their future job prospects. Netizens joked on the social platform X, suggesting that "Karen's job will soon be replaced by artificial intelligence."

"Karen may soon be unemployed"

Researchers pointed out that current spreadsheet applications provide users with a wide range of choices in layout and formatting, making it difficult for traditional AI language models to work effectively in spreadsheet processing. SpreadsheetLLM is specifically designed for spreadsheet applications.

Microsoft has also developed the SheetCompressor tool to help SpreadsheetLLM better understand and process spreadsheet data.

Researchers stated that SpreadsheetLLM has a wide range of potential applications, from automatically performing daily data analysis tasks to providing intelligent insights and recommendations based on spreadsheet data. For example, SpreadsheetLLM can be used to automatically generate financial reports, identify anomalies or trends in data, and provide personalized product or service recommendations to customers.

Therefore, SpreadsheetLLM has the potential to completely change the way companies handle data.

One user claimed, "As we know, an LLM that can write SQL will kill the entire data engineering industry."

Another wrote, "SaaS is in deep trouble."

"This will have a huge impact on the financial industry." Associate Professor Ethan Mollick of the Wharton School at the University of Pennsylvania wrote on Twitter: "This once again demonstrates that LLMs can quickly handle structured and unstructured spreadsheet data. This will unlock many use cases (forecasting, finance, valuation, etc.), and having real sources of spreadsheet data often reduces hallucinations."

How does SpreadsheetLLM work?

SpreadsheetLLM encodes spreadsheet data into a format that large language models (LLMs) can understand, enabling them to reason about spreadsheet data, answer questions about it, and even generate new spreadsheets from natural language prompts.
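To give a rough sense of what "encoding a spreadsheet for an LLM" can mean, here is a minimal Python sketch that flattens a grid of cells into address/value text. The function names (`col_letter`, `encode_sheet`) and the `address,value|...` layout are assumptions made for illustration; the article does not specify the exact format Microsoft uses.

```python
def col_letter(idx: int) -> str:
    """Convert a 0-based column index to a spreadsheet label (0 -> A, 26 -> AA)."""
    label = ""
    idx += 1
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def encode_sheet(rows):
    """Serialize non-empty cells as 'ADDRESS,VALUE' pairs joined by '|'.

    This is a hypothetical plain-text encoding, not the paper's format.
    """
    parts = []
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            if value is None or value == "":
                continue  # skip empty cells so they cost no tokens
            parts.append(f"{col_letter(c)}{r},{value}")
    return "|".join(parts)


sheet = [
    ["Item", "Price"],
    ["Apple", 1.5],
    ["", None],
]
print(encode_sheet(sheet))  # A1,Item|B1,Price|A2,Apple|B2,1.5
```

A string like this can be placed in a prompt, but it grows with every cell, which is the cost problem that compression is meant to address.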

At the core of SpreadsheetLLM is the "SheetCompressor" framework, which compresses and encodes spreadsheet data so that LLMs can process it more easily. SheetCompressor consists of three modules:

▲ Structure Anchor-based Compression: Placing "structure anchors" throughout the spreadsheet to help LLMs understand the data structure.

▲ Inverse Index Translation: Converting the spreadsheet into a more compact format and eliminating redundant data.

▲ Data Format-aware Aggregation: Grouping adjacent cells based on numeric formats and data types.
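The last two modules can be loosely illustrated with a toy Python sketch: a value-to-addresses inverted index that drops empty cells and stores repeated values once, followed by grouping cells under coarse type tags. All names here (`inverse_index`, `format_tag`, `aggregate_by_format`) and the tag labels are hypothetical. The sketch also ignores cell adjacency and structure anchors, so it is a simplification rather than Microsoft's implementation.

```python
import re
from collections import defaultdict


def inverse_index(cells):
    """Map each distinct value to the addresses holding it; empty cells are dropped."""
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue
        index[value].append(address)
    return dict(index)


def format_tag(value):
    """Replace a concrete value with a coarse type tag (tag names are assumptions)."""
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "IntNum"
    if isinstance(value, float):
        return "FloatNum"
    if isinstance(value, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "Date"
    return "Text"


def aggregate_by_format(cells):
    """Group non-empty cell addresses by type tag instead of by exact value."""
    tagged = {a: format_tag(v) for a, v in cells.items() if v not in (None, "")}
    return inverse_index(tagged)


cells = {
    "A1": "Region", "A2": "East", "A3": "East",
    "B1": "Sales", "B2": 120, "B3": 95,
    "C2": "",
}
print(inverse_index(cells))
print(aggregate_by_format(cells))
```

In this example the repeated value "East" is stored once with two addresses, the empty cell C2 disappears, and the two sales figures collapse into a single numeric group.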

Illustration of the SHEETCOMPRESSOR framework (Image: Microsoft)

Microsoft states that SpreadsheetLLM significantly improves performance on spreadsheet table detection, outperforming the conventional approach by 25.6% in GPT-4's in-context learning setting while cutting token costs by 96%.
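To put the 96% figure in perspective, a short Python calculation shows what such a reduction implies. The 100,000-token spreadsheet is a hypothetical example, not a number from the paper, and the helper names are made up for this sketch.

```python
def compression_ratio(reduction_pct: int) -> float:
    """Ratio of original size to compressed size for a given percentage reduction."""
    return 100 / (100 - reduction_pct)


def tokens_after(original_tokens: int, reduction_pct: int) -> int:
    """Tokens remaining after removing reduction_pct percent of them."""
    return original_tokens * (100 - reduction_pct) // 100


# A 96% reduction leaves 4% of the tokens, i.e. a 25x compression ratio.
print(compression_ratio(96))      # 25.0
# Hypothetical sheet that would otherwise need 100,000 tokens.
print(tokens_after(100_000, 96))  # 4000
```

Under these assumptions, a sheet too large for a model's context window could shrink to a size that fits comfortably in a single prompt.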

Microsoft has not yet announced when it will release SpreadsheetLLM to the public. The paper notes that the model still has some limitations: for example, its understanding of complex or highly structured data is limited, and SheetCompressor currently cannot compress cells containing natural language.