Why

A cost spike that runs unnoticed for two weeks costs 14× what it would have cost if caught on day one. Without automated detection, anomalies are discovered when the monthly invoice arrives, and by then the money is already spent. Budget alerts provide early visibility; ML-based detection catches patterns that static thresholds miss.

Without Anomaly Detection
═══════════════════════════════════════════════════════════════

  Day 1          Day 7           Day 30           Day 35
  Anomaly        Still           Invoice          Someone
  starts         running         arrives          notices

  $500/hr        $84K            $360K            "Why is
  runaway        wasted          on the bill       this so high?"

  ◄──────── Detection gap: 30+ days ─────────────────────────►
  ◄──────── Financial impact: unrecoverable ──────────────────►
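
The figures in the timeline follow from simple accumulation at a constant burn rate. A minimal sketch of that arithmetic, assuming the $500/hr runaway runs around the clock (the function name is illustrative, not taken from any tool):

```python
HOURS_PER_DAY = 24


def wasted_spend(rate_per_hour: float, days_undetected: int) -> float:
    """Cumulative cost of an anomaly running at a constant hourly rate."""
    return rate_per_hour * HOURS_PER_DAY * days_undetected


if __name__ == "__main__":
    # Reproduce the Day 7 and Day 30 figures from the timeline.
    for days in (1, 7, 14, 30):
        print(f"Day {days:>2}: ${wasted_spend(500, days):,.0f}")
```

Because the cost grows linearly with time, every day shaved off the detection gap saves one full day of runaway spend.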

What

Deploy automated anomaly detection using cloud-native tools at multiple levels of the hierarchy (organisation, account, service), with alert routing to the right person.

How

Choose Detection Method

Use cloud-native anomaly detection as the starting point — it’s free or low-cost and integrates with billing data natively.

Native Detection Pipelines
═══════════════════════════════════════════════════════════════

  AWS:   Cost Anomaly Detection → SNS → Chatbot → Slack/Teams
                                      └→ Webhook → PagerDuty

  GCP:   Anomaly Detection → Pub/Sub → Function → Slack/PagerDuty

  Azure: Anomaly Alert → Action Group → Logic App → Slack/Teams
                                      └→ Webhook → PagerDuty

AWS uses AWS Chatbot (since renamed Amazon Q Developer in chat applications) for zero-code Slack/Teams integration. Azure uses Logic Apps for drag-and-drop workflows. GCP requires a Cloud Function (a small code snippet) to route alerts.
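
To give a sense of how small the GCP routing function can be, here is a hedged sketch of its core logic. Pub/Sub delivers the message body base64-encoded, and a Slack incoming webhook accepts a JSON object with a "text" field. The alert field names (projectId, costImpact) and the helper names are assumptions for illustration; check the actual notification schema before relying on them.

```python
import base64
import json
import urllib.request


def decode_pubsub_data(encoded: str) -> dict:
    """Decode the base64-encoded JSON body of a Pub/Sub message."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def to_slack_payload(alert: dict) -> dict:
    """Build a Slack incoming-webhook payload from an anomaly alert."""
    # Assumed field names; replace with the real notification schema.
    project = alert.get("projectId", "unknown project")
    impact = alert.get("costImpact", "n/a")
    return {"text": f"Cost anomaly in {project} (impact: {impact})"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In a deployed function, the entry point would pass the incoming message data through decode_pubsub_data and to_slack_payload, then call post_to_slack with a webhook URL read from configuration.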

Configure Detection Scope

Set up monitors at different levels for different purposes:

Monitor Scope	What It Catches	Alert Recipient
Organisation / Billing	Catastrophic org-wide spikes	FinOps lead, CTO
Account / Sub / Project	Team or application-level spikes	Service Owner, Eng Manager
Service level	Specific service runaway (e.g., egress)	Engineer, SRE
Individual resource	Single resource gone rogue (optional)	Resource owner (via tag)

Don’t monitor everything at the same granularity. Organisation-level catches catastrophic events (compromised credentials, massive deployment errors). Account-level catches team-specific issues. Service-level catches runaway services like data transfer spikes.
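
As one illustration of scoping, the sketch below builds request arguments for AWS Cost Anomaly Detection as exposed through the boto3 Cost Explorer client ("ce"). It only constructs dictionaries, so it runs without AWS credentials. The monitor names, account IDs, topic ARN, and dollar threshold are placeholders, and the request shapes should be checked against the current boto3 documentation before use.

```python
from typing import Dict, List


def service_monitor(name: str) -> Dict:
    """Arguments for an AWS-managed monitor that evaluates each service."""
    return {
        "AnomalyMonitor": {
            "MonitorName": name,
            "MonitorType": "DIMENSIONAL",
            "MonitorDimension": "SERVICE",
        }
    }


def account_monitor(name: str, account_ids: List[str]) -> Dict:
    """Arguments for a custom monitor scoped to specific linked accounts."""
    return {
        "AnomalyMonitor": {
            "MonitorName": name,
            "MonitorType": "CUSTOM",
            "MonitorSpecification": {
                "Dimensions": {"Key": "LINKED_ACCOUNT", "Values": account_ids}
            },
        }
    }


def subscription(name: str, monitor_arns: List[str],
                 sns_topic_arn: str, min_impact_usd: int) -> Dict:
    """Arguments for an immediate SNS alert above an absolute dollar impact."""
    return {
        "AnomalySubscription": {
            "SubscriptionName": name,
            "MonitorArnList": monitor_arns,
            "Subscribers": [{"Type": "SNS", "Address": sns_topic_arn}],
            "Frequency": "IMMEDIATE",
            "ThresholdExpression": {
                "Dimensions": {
                    "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                    "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                    "Values": [str(min_impact_usd)],
                }
            },
        }
    }
```

With credentials in place, these would be passed as ce.create_anomaly_monitor(**service_monitor(...)) and ce.create_anomaly_subscription(**subscription(...)). Using a higher impact threshold for broader scopes keeps the organisation-level alert reserved for catastrophic events.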

Set Up Notification Routing

Detection is worthless if the alert goes to a generic inbox. Route to the person who can act.

Routing Logic
═══════════════════════════════════════════════════════════════

  ANOMALY FOUND
       │
       ▼
  CHECK FOR "OWNER" TAG
       │
       ├── YES → Route to owner (Slack/Email)
       │
       └── NO  → Escalate to FinOps Practitioner
                  (manual triage for "homeless" spend)

Implement a serverless router: a lightweight function (Lambda, Azure Function, Cloud Function) that enriches each alert by looking up the owner tag on the affected resource, then routes the notification to the correct Slack channel or email address.

The function logic:

1. Parse the alert payload to get Account ID and Resource ID
2. Call Cloud API to read resource tags
3. If owner tag exists, look up their Slack channel in a config file
4. Post a formatted message to the correct channel
5. If no owner tag, post to #finops-central for manual triage
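
The function logic above can be sketched in provider-neutral Python. The tag lookup and the Slack post are passed in as callables, so the routing decision can be tested without cloud credentials. The alert keys (account_id, resource_id), the channel mapping, and the choice to fall back when an owner has no configured channel are all assumptions for illustration.

```python
from typing import Callable, Dict, Optional

FALLBACK_CHANNEL = "#finops-central"

# Hypothetical contents of the config file: owner tag value -> Slack channel.
OWNER_CHANNELS: Dict[str, str] = {
    "team-payments": "#payments-alerts",
    "team-data": "#data-platform",
}


def route_alert(
    alert: dict,
    get_tags: Callable[[str, str], Dict[str, str]],
    post: Callable[[str, str], None],
    owner_channels: Optional[Dict[str, str]] = None,
) -> str:
    """Route one anomaly alert and return the channel it was posted to."""
    channels = OWNER_CHANNELS if owner_channels is None else owner_channels

    # Parse the alert payload (key names are assumed).
    account_id = alert["account_id"]
    resource_id = alert["resource_id"]

    # Read the resource tags via the injected cloud API wrapper.
    tags = get_tags(account_id, resource_id)

    # Look up the owner's channel; untagged or unknown owners fall back.
    owner = tags.get("owner")
    channel = channels.get(owner) if owner else None
    if channel is None:
        channel = FALLBACK_CHANNEL

    text = (
        f"Cost anomaly on {resource_id} (account {account_id}), "
        f"owner: {owner or 'untagged'}"
    )
    post(channel, text)
    return channel
```

In a deployed function, get_tags would wrap the provider's tagging API and post would wrap a Slack client or webhook call. Keeping both behind callables also makes it straightforward to exercise the router with a synthetic anomaly.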

Configure Budget Alerts

In addition to ML-based anomaly detection, set static budget alerts at provisioning time as a safety net.

Provider	Tool	Configuration
AWS	AWS Budgets	Per-account budgets with 50%, 80%, 100% thresholds
Azure	Azure Budgets	Per-subscription budgets with progressive alerts
GCP	GCP Billing Budgets	Per-project budgets with Pub/Sub notification

Budget alerts should be provisioned automatically as part of the workload onboarding pipeline (see S2-02 Automated Provisioning).
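
As an illustration of what the onboarding pipeline might generate for AWS, the sketch below builds the arguments for the AWS Budgets create_budget call with the 50%, 80%, and 100% thresholds from the table. It only constructs a dictionary; the budget name, limit, account ID, and SNS topic ARN are placeholders, and the request shape should be confirmed against the current boto3 documentation.

```python
from typing import Dict, Sequence


def budget_request(
    account_id: str,
    name: str,
    monthly_limit_usd: int,
    sns_topic_arn: str,
    thresholds: Sequence[int] = (50, 80, 100),
) -> Dict:
    """Arguments for a monthly cost budget with one alert per threshold."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    # Alert on actual (not forecasted) spend.
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": float(pct),
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "SNS", "Address": sns_topic_arn}
                ],
            }
            for pct in thresholds
        ],
    }
```

With credentials in place, the result would be passed as boto3.client("budgets").create_budget(**budget_request(...)). Generating the request from a function keeps thresholds consistent across every onboarded account.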

Deliverable Checklist

Cloud-native anomaly detection enabled (per provider)
Organisation-level monitor configured
Account/subscription/project-level monitors configured
Serverless alert router deployed with tag-based routing
Fallback routing to FinOps central channel
Budget alerts configured per account/subscription/project
Alert routing tested with a synthetic anomaly