AI and Automation Incident Management with AWS DevOps Agent and New Relic MCP Server

Reimagining Incident Management in Modern IT Environments

Operational incidents in distributed systems can overwhelm even the most experienced Site Reliability Engineers (SREs) and DevOps teams. Traditional methods, such as manually sifting through logs, metrics, and traces, often lead to slow root cause analysis and prolonged downtime. Integrating AWS DevOps Agent with New Relic’s Model Context Protocol (MCP) server brings AI-driven automation that streamlines incident resolution and can even preempt incidents, helping teams work smarter and faster.

Unlocking the Power of Seamless AI Integration

The New Relic MCP server acts as a standardized gateway for AI agents like AWS DevOps Agent, removing the need for custom API development and enabling direct, secure access to observability data. AWS DevOps Agent serves as a “frontier agent,” proactively learning from your resources, code repositories, and monitoring tools to detect, resolve, and prevent incidents across AWS, multi-cloud, and hybrid environments.

  • Automated Investigations: Integration with platforms like ServiceNow and Slack allows the agent to automatically investigate incident tickets, reducing mean time to resolution (MTTR).

  • Incident Coordination: Teams can launch or guide investigations through interactive chat, making collaboration with the agent seamless in familiar environments.

  • Root Cause Analysis: By correlating telemetry, code, and deployment data, the agent systematically narrows down the true cause of issues.

  • Step-by-Step Mitigation: Once the source is found, the agent generates actionable mitigation plans, including validation and rollback options.

  • Proactive Prevention: Leveraging historical data, the agent recommends enhancements to observability, infrastructure, and deployment processes to decrease future incidents.
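To make the root cause analysis idea more concrete, here is a minimal sketch of one small ingredient of it: shortlisting recent change events that landed shortly before an alert. It is an illustration only and does not describe how AWS DevOps Agent works internally. The `ChangeEvent` type, the `candidate_causes` function, and the 30-minute lookback window are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A deployment or config change (hypothetical shape, for illustration)."""
    service: str
    description: str
    timestamp: datetime


def candidate_causes(alert_time, alert_service, changes,
                     lookback=timedelta(minutes=30)):
    """Return changes to the alerting service made within `lookback`
    before the alert, most recent first."""
    window_start = alert_time - lookback
    in_window = [
        c for c in changes
        if c.service == alert_service
        and window_start <= c.timestamp <= alert_time
    ]
    return sorted(in_window, key=lambda c: c.timestamp, reverse=True)


if __name__ == "__main__":
    # Made-up sample data: an alert on a "cart" service at noon.
    alert = datetime(2025, 1, 1, 12, 0)
    history = [
        ChangeEvent("cart", "deploy v2", datetime(2025, 1, 1, 11, 50)),
        ChangeEvent("cart", "old deploy", datetime(2025, 1, 1, 9, 0)),
        ChangeEvent("search", "deploy", datetime(2025, 1, 1, 11, 55)),
    ]
    for change in candidate_causes(alert, "cart", history):
        print(change.timestamp.isoformat(), change.description)
```

Time proximity alone is a weak signal, which is why the article describes the agent as also correlating telemetry and code; this sketch covers only the deployment-data side.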

Simplified Onboarding: Fast-Track to Automation

Getting started is straightforward. Teams establish an Agent Space within AWS DevOps Agent and register their New Relic servers. Key setup steps in the AWS Management Console include:

  • Creating an Agent Space with the correct IAM roles
  • Adding New Relic as a telemetry provider via the Capabilities tab
  • Generating a secure webhook and bearer token for alert notifications
  • Configuring New Relic to send alerts through Amazon EventBridge and AWS Lambda, which route them to the agent webhook and authenticate each request

This enables real-time, authenticated communication between New Relic and AWS DevOps Agent, so incidents trigger automated investigations the moment anomalies arise.
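The last two setup steps can be sketched as a small Lambda function that receives an alert from EventBridge and forwards it to the agent webhook with the bearer token. This is a sketch under stated assumptions, not the configuration from the AWS blog: the environment variable names `AGENT_WEBHOOK_URL` and `AGENT_WEBHOOK_TOKEN` are hypothetical, and because the payload schema the webhook expects is not given here, the example simply forwards the event's `detail` field unchanged.

```python
import json
import os
import urllib.request


def build_request(event, webhook_url, token):
    """Build an authenticated POST carrying the alert payload."""
    # EventBridge delivers the producer's payload under the "detail" key.
    body = json.dumps(event.get("detail", {})).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The bearer token generated alongside the webhook.
            "Authorization": f"Bearer {token}",
        },
    )


def lambda_handler(event, context):
    """Lambda entry point: forward the alert to the agent webhook."""
    request = build_request(
        event,
        os.environ["AGENT_WEBHOOK_URL"],    # hypothetical variable name
        os.environ["AGENT_WEBHOOK_TOKEN"],  # hypothetical variable name
    )
    # urlopen raises on HTTP error statuses, so a failed delivery
    # surfaces as a Lambda error rather than passing silently.
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"statusCode": response.status}
```

Keeping request construction in its own function makes the authentication header easy to check without network access. In a real deployment the token would more likely come from a secrets store than from a plain environment variable.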

Real-World Impact: Rapid Retail Incident Response

Imagine a retail chain facing a sudden latency spike in its online shopping cart service, risking lost transactions and customer trust. Traditionally, operations teams would lose time manually reviewing dashboards and logs to diagnose the issue. With AWS DevOps Agent and New Relic, the response is transformed:

  • New Relic APM agents detect the latency and trigger an alert.

  • The alert is routed via EventBridge and Lambda, authenticated, and delivered to AWS DevOps Agent.

  • The agent initiates an automated investigation, gathers relevant telemetry, identifies correlated entities and change events, and analyzes logs and traces.

  • Root cause findings and mitigation steps are shared with the SRE team in a collaborative chat interface.

  • SREs can review or execute the recommended actions, dramatically reducing MTTR and minimizing business impact.

This approach allows teams to focus on strategic improvements instead of manual data collection and troubleshooting, leading to greater efficiency and resilience.

Takeaway: Advancing Operational Excellence with AI

Integrating AWS DevOps Agent with New Relic MCP server raises the bar for operational efficiency. By automating root cause analysis, remediation, and proactive risk mitigation, organizations can resolve issues faster and even prevent disruptions before they occur. This powerful combination of observability, AI, and automation enables SRE and DevOps teams to deliver reliable digital experiences, supporting continuous innovation and business growth.

Source: AWS DevOps & Developer Productivity Blog

Joshua Berkowitz, December 20, 2025