Agentic AI Is Starting to Improve Telecom Network Stability and Efficiency

With the hype around AI continuing, we’re often asked to give examples of where and why AI is being used. In networks, there’s a clear answer. The rising complexity of telecom networks — especially in the core network — necessitates the use of AI tools to avoid greater operational costs and to enable nimble new services that open up revenue opportunities such as 5G network slicing and network APIs.

Improving reliability is also incredibly important. There’s no point exposing sophisticated network functions through Camara-based APIs and the GSMA Open Gateway initiative if the network is not supremely reliable and able to meet the needs of network API customers. Operations and maintenance (O&M) tools and processes need to embrace the AI era to deliver sufficiently reliable networks. In the transition from level 3 to level 4 autonomous networks, two of the areas that benefit most from AI-powered predictive analysis and closed-loop automation are maintenance and troubleshooting, and managing network change.

Yet 5G has increased network complexity, with twice as many network elements as a 4G horizontal architecture. Signalling traffic grows in scale too: typical 5G signalling message volumes are seven times those of 4G and 13 times those of 2G or 3G. Each signalling message is also larger, with 5G messages over four times the size of 4G messages. Combined, this extra traffic can lead to enormous signalling storms that take down the network.
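The two ratios can be combined into a rough estimate of signalling byte volume. This back-of-envelope sketch assumes that the message-count ratio and the message-size ratio are measured against the same 4G baseline and scale independently, so they simply multiply. The function name is ours. Because 5G messages are "over" four times larger, the result is a floor rather than an exact figure.

```python
# Ratios quoted in the text (5G relative to 4G).
MESSAGE_VOLUME_RATIO = 7.0  # 5G signalling messages vs 4G
MESSAGE_SIZE_RATIO = 4.0    # 5G message size vs 4G ("over four times", so a lower bound)


def signalling_bytes_ratio(volume_ratio: float, size_ratio: float) -> float:
    """Relative signalling byte load.

    Assumption: message count and average message size scale independently
    against the same 4G baseline, so the two ratios multiply.
    """
    return volume_ratio * size_ratio


if __name__ == "__main__":
    ratio = signalling_bytes_ratio(MESSAGE_VOLUME_RATIO, MESSAGE_SIZE_RATIO)
    print(f"5G signalling bytes are at least ~{ratio:.0f}x those of 4G")
```

Under these assumptions, the signalling bytes a 5G core must carry are roughly 28 times the 4G load. This helps explain why storms escalate so quickly.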

We’ve previously written about the ubiquity and enormous impact of such network outages on operator customers in all global regions. And we’ve pointed to examples of AI helping operators too. But it’s important to go deeper into precisely how AI adds value in the move to autonomous telecom networks.

Autonomous Networks Help Operators to Innovate

Operators have many skills, and their network teams tend to be highly capable in traditional network operations. They are less often at home with the DevOps practices that accompany the shift to cloudified, software-based networks and the use of machine learning models. The move to software is a significant feature of 5G architecture, and it will become even more important in the years ahead as 5G-Advanced arrives, and eventually 6G too.

Many of the new enterprise segments that operators wish to serve with network slicing and private networks are also making this shift to the cloud. Therefore, operator teams need the tools and skills to engage with customer technical teams as well. Offering short-lived network slices, for example at events, is considerably easier if the processes to provision, manage and switch off each slice are automated.
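To show what "automated" can mean for a short-lived event slice, here is a minimal sketch of a slice lifecycle. A reconcile step moves the slice between states according to its booking window, with no manual action. All names (`EventSlice`, `SliceState`, `reconcile`) are hypothetical and do not come from any vendor, 3GPP or GSMA API. The sketch covers only provisioning and switch-off; the "manage" stage (monitoring the slice while it is live) is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class SliceState(Enum):
    REQUESTED = "requested"
    ACTIVE = "active"
    DECOMMISSIONED = "decommissioned"


@dataclass
class EventSlice:
    """Hypothetical record for a short-lived slice booked for an event."""
    name: str
    start: datetime
    end: datetime
    state: SliceState = SliceState.REQUESTED


def reconcile(slice_: EventSlice, now: datetime) -> SliceState:
    """Move the slice to the state its booking window implies.

    A real system would call the operator's orchestration interfaces here;
    this sketch only updates the state field.
    """
    if slice_.state is SliceState.REQUESTED and slice_.start <= now < slice_.end:
        slice_.state = SliceState.ACTIVE  # provision when the window opens
    elif slice_.state is not SliceState.DECOMMISSIONED and now >= slice_.end:
        slice_.state = SliceState.DECOMMISSIONED  # switch off (or never provision) once it is over
    return slice_.state


if __name__ == "__main__":
    concert = EventSlice("concert-uplink", datetime(2024, 11, 22, 18), datetime(2024, 11, 22, 23))
    for t in (datetime(2024, 11, 22, 17), datetime(2024, 11, 22, 19), datetime(2024, 11, 23, 0)):
        print(t, reconcile(concert, t))
```

Running `reconcile` on a schedule, for example every minute, means no one needs to remember to tear the slice down after the event.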

Shifting to an autonomous network takes load off operator network teams, reducing the number of heads needed. A multimodal generative AI interface to technical documentation also makes network teams more efficient and helps them get up to speed on supporting a software-based network.

Collaboration between Small and Large Telecom AI Models

The AI tools that operators are employing now are a mixture of small and large models. For example, China Mobile’s O&M unit uses a collaborative approach in which small models handle about 80% of tasks, and a large model is invoked only when a task exceeds the capabilities of the small models. For scale, the large model used here is Huawei’s Telecom Foundation Model, which has tens of billions of parameters. The small models are orders of magnitude smaller, in the hundreds-of-millions-of-parameters range, which makes them easy to run but specialized for a particular role.
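The small-first, escalate-when-needed pattern can be sketched as a simple router. The source does not say how China Mobile decides that a task "exceeds the capabilities" of a small model. This sketch assumes two triggers: no small model covers the task type, or the small model's self-reported confidence falls below a threshold. `route`, `Answer` and the two demo models are hypothetical stand-ins, not Huawei or China Mobile interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # assumption: each model reports a confidence in [0, 1]
    source: str


# A "model" is any callable that takes a task description and returns an Answer,
# or None when the task falls outside its specialism.
Model = Callable[[str], Optional[Answer]]


def route(task: str, task_type: str, small_models: Dict[str, Model],
          large_model: Model, threshold: float = 0.8) -> Answer:
    """Try the specialised small model for this task type; escalate to the large model otherwise."""
    small = small_models.get(task_type)
    if small is not None:
        answer = small(task)
        if answer is not None and answer.confidence >= threshold:
            return answer  # the common, cheap path
    answer = large_model(task)  # scope exceeds the small model: escalate
    if answer is None:
        raise RuntimeError(f"no model could handle task: {task!r}")
    return answer


# Hypothetical stand-ins for real models.
def demo_alarm_classifier(task: str) -> Optional[Answer]:
    if "alarm" in task:
        return Answer("link-down", 0.93, "small:alarm")
    return None


def demo_foundation_model(task: str) -> Optional[Answer]:
    return Answer("cross-domain analysis", 0.70, "large")
```

The 0.8 threshold is arbitrary. In practice it would be tuned so that roughly the intended share of work stays on the cheap small models.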

Huawei offers two main agents: CompSpirit for complaint handling and AssurSpirit for fault handling. CompSpirit classifies complaints into categories and runs analysis on them, and AssurSpirit handles faults in a similar way. Both agents are based on the large Huawei Telecom Foundation Model but can be smaller, because they focus on narrower tasks. Interaction between the agents enables more closed-loop processes, helping operators move from autonomous network level 3 to level 4.

Reshaping the Complaint- and Alarm-Handling Processes to Improve Efficiency

The challenge with ticket handling is that the skills needed for each issue are expensive: a typical ticket handler has over five years of experience. Each ticket also takes too long, and much of that time goes on mechanical tasks. CompSpirit classifies complaints, performs basic and signalling analysis, then uses generative AI to fill in forms automatically to speed up dispatch.
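The three CompSpirit stages described above (classify, analyse, prefill the dispatch form) can be pictured as a pipeline. In this sketch, keyword rules stand in for the classification model, and a simple template stands in for the generative step that drafts form fields. The categories, keywords and field names are all hypothetical and are not Huawei's.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keyword rules standing in for a small classification model.
CATEGORY_KEYWORDS = {
    "coverage": ("no signal", "dropped call", "coverage"),
    "data": ("slow data", "cannot browse", "latency"),
    "billing": ("charge", "bill", "refund"),
}


@dataclass
class Complaint:
    ticket_id: str
    text: str
    category: str = "unclassified"
    findings: List[str] = field(default_factory=list)


def classify(c: Complaint) -> None:
    """Stage 1: assign the first category whose keywords appear in the complaint."""
    lowered = c.text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            c.category = category
            return


def analyse(c: Complaint, signalling_errors: List[str]) -> None:
    """Stage 2: basic analysis plus signalling errors already pulled from probes."""
    c.findings.append(f"category={c.category}")
    c.findings.extend(f"signalling:{e}" for e in signalling_errors)


def prefill_dispatch_form(c: Complaint) -> Dict[str, str]:
    """Stage 3: a generative model would draft these fields; here they are templated."""
    return {
        "ticket_id": c.ticket_id,
        "category": c.category,
        "summary": "; ".join(c.findings) or "no findings",
    }
```

A human still reviews and dispatches the prefilled form. The saving comes from removing the mechanical typing and lookup steps.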

Huawei claims that its tool can improve the forward-handling rate from 5% to 75% and reduce the time per ticket from over 14 hours to just five hours. In a network handling 270 tickets each month, this would save the equivalent of 18 highly skilled heads.
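The headcount figure can be sanity-checked with simple arithmetic. The per-head figure below is our back-calculation and is not something Huawei states. Because the "before" time is "over 14 hours", the savings shown are a lower bound.

```python
TICKETS_PER_MONTH = 270
HOURS_BEFORE = 14.0  # "over 14 hours", so savings below are a lower bound
HOURS_AFTER = 5.0
HEADS_SAVED = 18     # Huawei's figure

hours_saved = TICKETS_PER_MONTH * (HOURS_BEFORE - HOURS_AFTER)
# Back-calculated: ticket-handling hours per person per month implied by the 18-head claim.
implied_hours_per_head = hours_saved / HEADS_SAVED

print(f"hours saved per month: {hours_saved:.0f}")
print(f"implied hours per head per month: {implied_hours_per_head:.0f}")
```

This prints 2430 hours saved and about 135 hours per head per month. The claim is therefore consistent if each skilled handler spends roughly 135 hours a month on tickets, which is plausible for a full-time role.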

AssurSpirit takes a similar approach to alarm handling, where a single underlying issue may trigger multiple alarms. It simplifies the process by analysing alarms, comparing them with typical fault scenarios, diagnosing the underlying problem and suggesting a solution. Huawei reports that the tool can handle 90% of alarms with 90% accuracy and can cut the time needed to analyse alarms from 90 minutes to 10 minutes.
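To make "comparing alarms with typical fault scenarios" concrete, here is a minimal sketch. It scores the set of active alarms against hypothetical fault signatures using Jaccard similarity and returns the best match with a suggested action. Jaccard matching is our illustrative substitute; the source does not describe AssurSpirit's actual method. The alarm names, scenarios and actions are also invented.

```python
from typing import Optional, Set, Tuple

# Hypothetical fault signatures: the alarm types each scenario typically raises.
FAULT_SCENARIOS = {
    "fibre cut": {"LOS", "LINK_DOWN", "BGP_PEER_DOWN"},
    "power failure": {"MAINS_FAIL", "BATTERY_LOW", "NE_UNREACHABLE"},
    "board fault": {"BOARD_FAULT", "PORT_DOWN"},
}

# Hypothetical suggested actions per scenario.
SUGGESTED_ACTIONS = {
    "fibre cut": "dispatch field team to trace the fibre route",
    "power failure": "check site power and generator status",
    "board fault": "swap the faulty board",
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the overlap divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diagnose(active_alarms: Set[str], min_score: float = 0.5) -> Optional[Tuple[str, float, str]]:
    """Return (scenario, score, action) for the best match, or None to hand off to an engineer."""
    scenario, signature = max(FAULT_SCENARIOS.items(),
                              key=lambda kv: jaccard(active_alarms, kv[1]))
    score = jaccard(active_alarms, signature)
    if score < min_score:
        return None  # no confident match: escalate to a human
    return scenario, score, SUGGESTED_ACTIONS[scenario]
```

For example, the alarms {"LOS", "LINK_DOWN", "BGP_PEER_DOWN", "HIGH_TEMP"} match "fibre cut" with a score of 0.75. The extra temperature alarm lowers the score without changing the diagnosis. Returning `None` below the threshold mirrors the idea that some alarms still need a person.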

At China Mobile, the solution is integrated with the enterprise version of WeChat, so technicians who are away from their computers can continue to interact with the system and contribute to efficient network management. Similar integrations are possible with collaboration tools such as Microsoft Teams, Slack or WhatsApp, for companies in other markets that have standardized on one of them.
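As an illustration of the Slack case, an agent's result can be pushed to a channel through a Slack incoming webhook. The webhook accepts an HTTP POST with a JSON body containing a `text` field. The webhook URL is a placeholder that an operator would generate in its own Slack workspace, and the message wording and helper names are hypothetical. Enterprise WeChat and Teams use their own payload formats, which are not shown here.

```python
import json
import urllib.request


def build_alarm_message(site: str, diagnosis: str, suggestion: str) -> dict:
    """Format an agent's result as a minimal Slack incoming-webhook payload."""
    return {"text": f"[{site}] Likely cause: {diagnosis}. Suggested action: {suggestion}."}


def post_to_webhook(webhook_url: str, payload: dict) -> int:
    """POST the JSON payload to an incoming-webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Usage (placeholder URL; needs a real webhook to run):
# post_to_webhook("https://hooks.slack.com/services/...",
#                 build_alarm_message("site-0042", "fibre cut", "dispatch field team"))
```

This covers only one-way notification. Letting technicians reply to the agent from their phones, as at China Mobile, would also need an inbound bot or app integration.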

Telecom Large Language Models Democratize Network Operations Knowledge

Just as the most widely known large language models (LLMs), such as ChatGPT, Anthropic’s Claude and Google’s Gemini, provide a fast, easy-to-use interface to the internet’s knowledge, a telecom model can make technical documentation more accessible. Unlike general-purpose LLMs, however, a telecom model is trained on the technical documentation of network equipment alongside operational data from probes, past alarms and complaint tickets. The model can therefore be smaller, although still large, and tailored to the telecom tasks at hand.

A typical network operations centre (NOC) specialist needs five to 10 years’ experience and cross-domain knowledge. Huawei’s NOCMate multimodal tool aims to give NOC employees a natural language interface that reduces the experience needed to one to three years. Such employees are easier to find and cheaper to employ, so an operator can deploy its most highly skilled NOC team members selectively, where they are actually needed. This helps an operator run a more complicated network at a similar or lower cost than before.

Level 4 Automation Will Be Boosted by “Agentic AI” Interactions

As the telecom industry moves ahead with autonomous networks, more closed-loop, fully automated interactions will be deployed. Ten years ago, we would have expected these to be interfaces between traditional software systems, coded by humans. Now, increasingly, we will see task-focused telecom AI models interacting directly with other AI models, and drawing on the greater knowledge of large telecom models for support when needed.
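The closed loop referred to here can be reduced to detect, diagnose, act and verify, with escalation to a person when the loop cannot clear the fault. In the agentic setting, each step could be a separate task-focused model or a call to a larger one. In this sketch they are plain callables, and `closed_loop` and `LoopResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    resolved: bool
    attempts: int  # remediation attempts made
    log: List[str]


def closed_loop(detect: Callable[[], bool],
                diagnose: Callable[[], str],
                remediate: Callable[[str], None],
                max_attempts: int = 3) -> LoopResult:
    """Detect -> diagnose -> act -> verify, escalating to a human after max_attempts."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        if not detect():  # verify: fault cleared (or never present)
            log.append("verified: no fault")
            return LoopResult(True, attempt - 1, log)
        cause = diagnose()
        log.append(f"attempt {attempt}: {cause}")
        remediate(cause)
    resolved = not detect()  # final verification after the last attempt
    log.append("verified: no fault" if resolved else "escalate to NOC engineer")
    return LoopResult(resolved, max_attempts, log)


if __name__ == "__main__":
    state = {"fault": True}
    print(closed_loop(lambda: state["fault"],
                      lambda: "stuck process",
                      lambda cause: state.update(fault=False)))
```

Capping the number of attempts keeps a misbehaving automated loop from thrashing the network. The human escalation path is what lets operators adopt such loops step by step.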

Generative AI will lower the skill level needed for many technical roles by supporting human decision-making through natural language interfaces. It will also help connect automated systems with established human-focused tools, for example by prepopulating forms. This will make it easier to integrate autonomous tools into existing network processes and move step by step towards greater automation. There’s much to do as we move first to level 4, and later to fully autonomous level 5 networks, but there are already many examples of AI being used to automate telecom networks.

Written by: Ian Fogg

Posted on 22 November 2024