Key Takeaways
- AI is accelerating data creation and distributed workflows, making traditional reactive approaches inadequate for enabling resilience.
- Resilience operations (ResOps) help teams shift from reactive troubleshooting to coordinated action, with Commvault applying AI across three areas: protecting AI data, models, and pipelines; leveraging AI to guide and accelerate response; and extending AI across the broader resilience ecosystem.
- Practical AI agents are designed to surface operational issues (Arlie Data Sense), guide protection decisions (Arlie Advisor), and enable conversational workflows (MCP server).
- Data security and governance remain foundational. AI must be designed to respect access controls, maintain auditability, and operate within policy boundaries.
- Organizations can start small with targeted agents and scale toward coordinated, intelligent operations.
AI introduces both new challenges and potential breakthroughs for enterprise resilience. On one hand, traditional siloed tools for protection, recovery, and governance weren’t designed to operate across constantly evolving AI environments that span multiple platforms.
On the other hand, AI-enabled resilience tools can deliver a transformative impact by helping teams maintain visibility, enforce policy, and recover cleanly. For IT and security teams, the question is how best to leverage the benefits of AI while mitigating the operational risks it can pose.
In a recent webinar, I sat down with Teja Medasani, Principal Product Manager, AI, at Commvault, to explore real-world use cases that put AI agents to work in resilience workflows across cloud, SaaS, on-premises, and AI-native platforms.
Why AI Is Redefining Resilience Operations
The rapid growth and dynamic nature of AI-native environments have put operational workflows under pressure. Manual tagging, spreadsheets, and hand-maintained logic quickly drift out of sync with the environments they describe. Sprawling job history tables and audit trails slow manual troubleshooting and make subtle warning signs easy to miss. Recovery processes that assume centralized data and isolated failures are poorly suited to exponential data growth and workloads fragmented across platforms.
When data, workloads, and environments span platforms, resilience can’t remain siloed in separate teams, tools, and policies. A new operating model is needed: resilience operations, or ResOps.
What ResOps Looks Like in Practice
The ResOps framework addresses these challenges across three dimensions:
- Protect AI: Safeguarding AI data, models, and pipelines across environments so they remain recoverable and compliant.
- Leverage AI: Using AI to help reduce manual effort, surface operational insight, and guide response and recovery decisions.
- Extend AI: Connecting ResOps across tools and teams to help protect conversational interactions and integrated workflows.
In the webinar, we focused primarily on leveraging and extending AI, highlighting key roles AI agents can play in day-to-day operations. These examples center on Arlie, Commvault’s AI assistant. Designed to help users interpret data, understand issues, and move toward action more efficiently, Arlie includes a library of agents purpose-built to help address specific resilience workflows.
By reducing repetitive analysis, surfacing meaningful signals, and guiding decisions around security-aware recovery, these agents can help teams act more quickly and confidently. Arlie Data Sense, Arlie Advisor, and Commvault’s MCP server illustrate a few of the possibilities unlocked by AI-enabled ResOps.
Surfacing Operational Issues with Arlie Data Sense
Arlie Data Sense helps teams make sense of dense operational data like job history tables and audit trails. Instead of manually scanning through hundreds of rows to find patterns or diagnose failures, users can trigger Arlie to help analyze the data and generate an executive summary highlighting anomalies and emerging issues.
Teams can ask follow-up questions in natural language and explore data further through interactive summaries or visualizations. When a job fails, Arlie can help analyze logs, summarize the failure, identify the possible cause, and provide next steps for resolution.
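To make the idea of condensing a job history table concrete, here is a minimal Python sketch. It does not represent how Arlie Data Sense works internally; the job records, field names, and the failure-rate threshold are all hypothetical, chosen only to show how many rows might be reduced to a short summary of totals, recurring errors, and clients worth a closer look.

```python
from collections import Counter

# Hypothetical job history rows; real job tables will have different fields.
JOBS = [
    {"client": "db01", "status": "failed", "error": "network timeout"},
    {"client": "db01", "status": "failed", "error": "network timeout"},
    {"client": "db01", "status": "completed", "error": None},
    {"client": "web02", "status": "completed", "error": None},
    {"client": "web02", "status": "completed", "error": None},
    {"client": "fs03", "status": "failed", "error": "snapshot quota exceeded"},
]


def summarize(jobs, max_failure_rate=0.5):
    """Condense job rows into totals, top errors, and clients to review."""
    totals = Counter()    # jobs per client
    failures = Counter()  # failed jobs per client
    errors = Counter()    # occurrences of each error message
    for job in jobs:
        totals[job["client"]] += 1
        if job["status"] == "failed":
            failures[job["client"]] += 1
            errors[job["error"]] += 1
    # Flag clients whose failure rate exceeds the (assumed) threshold.
    flagged = sorted(
        c for c in totals if failures[c] / totals[c] > max_failure_rate
    )
    return {
        "total_jobs": len(jobs),
        "failed_jobs": sum(failures.values()),
        "top_errors": errors.most_common(2),
        "clients_to_review": flagged,
    }


summary = summarize(JOBS)
print(summary)
```

A summary like this is only a starting point; the value described above comes from pairing it with log analysis and natural-language follow-up questions.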
Guiding Response Decisions with Arlie Advisor
As workloads are added, ownership changes, and requirements shift, maintaining protection coverage across environments becomes increasingly difficult. Arlie Advisor is designed to help teams create and validate protection plans at scale by evaluating the characteristics and current protection coverage for each resource, and then highlighting where adjustments may be needed.
Recommendations are presented clearly with reasoning explained, so teams can evaluate them and decide how to apply them within existing governance processes. This helps teams maintain consistency across dynamic environments.
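As a rough illustration of comparing current coverage against expectations and explaining the result, consider the Python sketch below. It is an assumption-laden example, not Arlie Advisor's logic: the classifications, the required backup intervals, the `Resource` fields, and the `recommend` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: maximum hours between backups per classification.
REQUIRED_INTERVAL_HOURS = {"critical": 4, "standard": 24}


@dataclass
class Resource:
    name: str
    classification: str                 # "critical" or "standard" (assumed labels)
    plan_interval_hours: Optional[int]  # None means no protection plan assigned


def recommend(resource: Resource) -> Optional[str]:
    """Return a recommendation with its reasoning, or None if coverage is adequate."""
    required = REQUIRED_INTERVAL_HOURS[resource.classification]
    if resource.plan_interval_hours is None:
        return (f"{resource.name}: assign a plan; no protection is configured "
                f"for a {resource.classification} resource.")
    if resource.plan_interval_hours > required:
        return (f"{resource.name}: tighten the plan; backups run every "
                f"{resource.plan_interval_hours}h but {resource.classification} "
                f"resources require every {required}h.")
    return None


resources = [
    Resource("orders-db", "critical", 24),
    Resource("wiki", "standard", 24),
    Resource("ml-feature-store", "critical", None),
]
recommendations = [r for r in map(recommend, resources) if r is not None]
for line in recommendations:
    print(line)
```

Each recommendation carries its own reasoning, and nothing is applied automatically, which mirrors the point that teams decide how to act within their existing governance processes.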
Extending Resilience Workflows with MCP Server
Resilience workflows often need to connect with ticketing systems, collaboration tools, and security platforms outside the Commvault platform, and they need to be accessible to users who aren’t resilience experts. Commvault’s MCP server makes it possible to extend workflows without custom integrations or significant training by allowing conversational interaction.
Users can ask questions or request actions in natural language, with their prompts translated into governed API calls behind the scenes – for example, to automatically create tickets in ServiceNow for failed jobs.
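What a "governed" call might look like can be sketched in a few lines of Python. This is not Commvault's MCP server or its API: the role names, permissions, tool functions, and ticket fields are hypothetical, the job data is stubbed, and no ticketing system is contacted. The step where a model maps a natural-language prompt to an action name is omitted; the sketch covers only what happens after that, where the action is checked against an allow-list and the caller's role, and every attempt is recorded.

```python
from datetime import datetime, timezone

# Hypothetical role-to-action permissions; a real platform defines its own.
ROLE_PERMISSIONS = {
    "backup_admin": {"list_failed_jobs", "create_ticket"},
    "viewer": {"list_failed_jobs"},
}

AUDIT_LOG = []  # in-memory stand-in for a real audit trail


def list_failed_jobs():
    # Stubbed data; a real tool would query the platform's API.
    return [{"job_id": 101, "client": "db01", "error": "network timeout"}]


def create_ticket(job):
    # Builds a ticket payload only; no ticketing system is contacted here.
    return {
        "summary": f"Backup job {job['job_id']} failed on {job['client']}",
        "details": job["error"],
    }


# Allow-list of actions that a conversational request may resolve to.
TOOLS = {"list_failed_jobs": list_failed_jobs, "create_ticket": create_ticket}


def call_tool(user, role, action, **kwargs):
    """Run an allow-listed action if the role permits it, and audit the attempt."""
    allowed = action in TOOLS and action in ROLE_PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not perform {action}")
    return TOOLS[action](**kwargs)


jobs = call_tool("alice", "backup_admin", "list_failed_jobs")
tickets = [call_tool("alice", "backup_admin", "create_ticket", job=j) for j in jobs]
print(tickets)

try:
    call_tool("bob", "viewer", "create_ticket", job=jobs[0])
except PermissionError as exc:
    print("denied:", exc)
print(len(AUDIT_LOG), "audit entries")
```

The design choice worth noting is that the permission check and the audit entry live in one place, `call_tool`, so a conversational front end cannot reach an action without passing through both.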
Coordination and Clarity Across Teams and Platforms
The examples above share a common theme: coordination. Effective resilience requires visibility, policy enforcement, and clean recovery across environments. AI can strengthen these capabilities by helping teams identify what matters and act more quickly.
While the evolution of resilience from reactive recovery to continuous insight and guided action has become essential, it doesn’t need to happen all at once. Teams can start with targeted agents that address specific operational pain points and then build toward more coordinated operations as capabilities mature and teams gain confidence.
The key is to begin the ResOps journey now – because the challenges posed by evolving resilience requirements will only keep growing.
Watch the full webinar on demand to see detailed demos of Arlie Data Sense, Arlie Advisor, and conversational resilience in action, and explore how AI-enabled ResOps can help support your operational workflows.
FAQs
Q: What is resilience operations?
A: ResOps is an operating model that unifies data security, identity resilience, and cyber recovery into a continuous, automated discipline rather than treating them as separate IT functions. It helps transform resilience from a reactive response to incidents into an active practice in which teams continuously understand data access patterns, detect threats and anomalies, and recover quickly and intelligently at scale.
Q: What is Arlie and how has it evolved?
A: Arlie, short for autonomous resilience, was first introduced in 2023 as an AI assistant to help users navigate the Commvault platform more easily. As AI capabilities have evolved, Arlie has evolved as well.
In addition to answering questions and guiding configuration, Arlie now also includes a library of purpose-built agents to address specific resilience workflows, such as surfacing operational insights, recommending protection strategies, and guiding security-aware recovery decisions. Arlie has become an entry point into operational insight rather than just a how-to assistant.
Q: How does Arlie Data Sense help with operational troubleshooting?
A: Arlie Data Sense helps teams make sense of dense operational data like job history tables and audit trails. Instead of manually scanning through hundreds of rows to find subtle warning signs or diagnose issues, users can trigger Arlie to analyze the full data set and generate an executive summary highlighting patterns, anomalies, and emerging issues.
Teams can ask follow-up questions in natural language and explore data through interactive summaries or visualizations. For failed jobs, Arlie supports root-cause analysis by examining logs, summarizing the failure, identifying possible causes, and providing personalized next steps for resolution.
Q: What does “guided action” mean in the context of AI-enabled resilience?
A: Guided action refers to AI helping teams move from insight to response more efficiently by recommending specific actions based on analysis of operational data and protection coverage. Rather than simply surfacing information, AI agents like Arlie Advisor help evaluate resource characteristics, identify gaps between current protection and policy expectations, and present clear recommendations with reasoning.
Teams retain decision-making authority and can evaluate recommendations within their existing governance processes, but the agent helps reduce the manual effort required to identify what needs attention and what actions may be appropriate.
Q: How does MCP server enable conversational resilience workflows?
A: Commvault’s MCP server uses the Model Context Protocol (MCP) to let users interact with resilience workflows conversationally. Users can ask questions or request actions in everyday language, and those requests are translated into governed API calls behind the scenes.
Identity, role-based access control, and audit logging remain in place, so the conversational interface doesn’t bypass security requirements. This approach helps reduce friction for experienced teams, lower barriers for new users, and enable resilience workflows to integrate more easily with other enterprise systems like ticketing platforms through standardized interfaces.
Q: How does Commvault enable AI to respect security and governance requirements?
A: In Commvault Cloud, AI interactions inherit the same identity and role-based access controls that govern the rest of the platform. When AI surfaces insights or recommends actions, it operates within the governance framework customers already rely on.
This means AI respects existing access controls, maintains auditability through standard logging, and operates within clearly defined policy boundaries. The architecture is designed to prevent natural language interactions or agent recommendations from bypassing the security and governance requirements already in place for the platform.
Q: Can organizations adopt AI-enabled ResOps incrementally?
A: Yes. Organizations can start with targeted AI agents that address specific operational pain points rather than transforming their entire resilience practice at once. For example, teams might begin by using Arlie Data Sense to help surface insights from operational data, then add Arlie Advisor to help maintain protection coverage at scale, and later enable conversational workflows through MCP server for easier integration with other systems.
This incremental approach allows teams to build confidence with AI-enabled capabilities, demonstrate value in specific workflows, and scale toward more coordinated, intelligent operations over time as the organization’s needs and capabilities evolve.
Vir Choksi is Principal Product Marketing Manager at Commvault.