PHASE 03 // IMPLEMENT

S1-02 · Understand Cloud Usage and Cost · Allocation

Enforce Tagging

Why

A tagging strategy without enforcement degrades silently. Within months, compliance drops as teams deploy via console, forget tags in ad-hoc scripts, or bypass shared IaC modules. Enforcement is the only path to near-100% tag compliance, and it must be layered: prevent at the source, detect what slips through, and remediate automatically where safe.

What

Implement a defence-in-depth enforcement stack across three preventive layers and three detective layers, rolled out progressively over ~12 weeks to avoid blocking legitimate deployments.

Defence in Depth
═══════════════════════════════════════════════════════════════

  PREVENT                          DETECT & FIX
  ────────────────────────         ────────────────────────
  Layer 1: IaC Module Defaults     Layer 4: Compliance Scanning
  Layer 2: CI/CD Pipeline Gates    Layer 5: Auto-Remediation
  Layer 3: Cloud-Native Policies   Layer 6: Alerting & Tickets

  Combined target: > 98% tag coverage (spend-weighted)
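One way to read the spend-weighted target above: coverage is the share of cost, not the share of resources, that sits on fully tagged resources. The sketch below is a minimal illustration under stated assumptions. The `CostLine` shape is hypothetical (it is not a billing-export schema), and the required keys are copied from the Terraform variable later in this runbook.

```python
from dataclasses import dataclass, field

# Assumed tag keys, copied from the Terraform variable in this runbook.
REQUIRED_TAGS = ("cost_center", "business_unit", "application", "environment", "owner")


@dataclass(frozen=True)
class CostLine:
    """One resource's spend for the period (hypothetical shape)."""
    resource_id: str
    cost: float
    tags: dict = field(default_factory=dict)


def is_fully_tagged(line, required=REQUIRED_TAGS):
    """True when every required tag is present with a non-blank value."""
    return all((line.tags.get(key) or "").strip() for key in required)


def spend_weighted_coverage(lines, required=REQUIRED_TAGS):
    """Share of total spend on fully tagged resources, from 0.0 to 1.0."""
    total = sum(line.cost for line in lines)
    if total <= 0:
        return 1.0  # nothing billed, so nothing is unallocated
    tagged = sum(line.cost for line in lines if is_fully_tagged(line, required))
    return tagged / total
```

Weighting by spend means one large untagged database hurts the metric more than many untagged sandbox buckets, which matches how unallocated cost shows up in reports.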

How

Embed Tags in Shared IaC Modules

Mandatory tags become required input variables in all shared Terraform modules, CloudFormation templates, and Bicep modules. Teams cannot deploy without providing them.

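# Terraform example: mandatory tag variable
variable "mandatory_tags" {
  description = "Tags every resource must carry. All five attributes are required."
  type = object({
    cost_center   = string
    business_unit = string
    application   = string
    environment   = string
    owner         = string
  })
  validation {
    condition = contains(
      ["prod", "stg", "dev", "sbx"],
      var.mandatory_tags.environment
    )
    # Older Terraform versions require a full sentence ending in a period.
    error_message = "Environment must be one of: prod, stg, dev, sbx."
  }
}

locals {
  # Attribute names become tag keys as written (cost_center, not cost-center).
  # Keep them aligned with the keys your cloud policies check.
  all_tags = merge(var.mandatory_tags, {
    managed-by = "terraform"
    # timestamp() is re-evaluated on every plan, so this value changes each run.
    # Add the tag to lifecycle ignore_changes on tagged resources to avoid
    # a perpetual diff.
    created-date = formatdate("YYYY-MM-DD", timestamp())
  })
}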

This catches ~80–90% of violations. It misses console deployments, CLI one-offs, and resources auto-created by managed services.

Add CI/CD Pipeline Validation

Add a scanning step to every CI/CD pipeline that deploys infrastructure. Start in warning mode, then promote to blocking mode after 2 weeks.

Tool          What It Does
──────────    ─────────────────────────────────────────────────────
tflint        Terraform linter with custom rules for required tags
checkov       Policy-as-code scanner (Terraform, CF, ARM, Bicep)
OPA / Rego    General-purpose policy engine for plan output
Sentinel      HashiCorp policy-as-code (Terraform Cloud/Enterprise)

Pipeline flow:
  git push
    → pre-commit: tflint (local, fast)
    → CI step 1: checkov / OPA scan
    → CI step 2: terraform plan
    → CI step 3: policy eval on plan output
    → CI step 4: terraform apply (only if all pass)
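As a rough illustration of CI step 3, the sketch below checks the JSON form of a plan (as produced by `terraform show -json` on a saved plan file) for resources that would be created or updated without the mandatory tags. It is a stand-in for an OPA or Sentinel policy, not a replacement for one. Assumptions: the function name is hypothetical, a resource counts as taggable only if its planned values include a `tags` attribute, and values unknown until apply (reported separately under `after_unknown`) are not inspected.

```python
# Assumed tag keys, copied from the Terraform variable in this runbook.
REQUIRED_TAGS = ("cost_center", "business_unit", "application", "environment", "owner")


def find_tag_violations(plan, required=REQUIRED_TAGS):
    """Return {resource address: [missing tag keys]} for created or updated resources."""
    violations = {}
    for change in plan.get("resource_changes", []):
        details = change.get("change", {})
        actions = details.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # skip no-op, read, and delete-only changes
        after = details.get("after") or {}
        if "tags" not in after:
            continue  # assumption: no "tags" attribute means not taggable
        tags = after["tags"] or {}
        missing = [key for key in required if not tags.get(key)]
        if missing:
            violations[change["address"]] = missing
    return violations
```

In a pipeline, the step would load the plan JSON, call `find_tag_violations`, print the result, and exit non-zero only once the gate has been promoted from warning to blocking mode.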

Deploy Cloud-Native Policies

Roll out in three phases (Audit → Enablement → Deny), allowing roughly four weeks per phase in line with the progressive rollout schedule.

AWS — Service Control Policies (SCPs)

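{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCreateWithoutCostCenter", "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance", "s3:CreateBucket"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*", "arn:aws:s3:::*"],
      "Condition": { "Null": { "aws:RequestTag/cost-center": "true" } }
    },
    {
      "Sid": "DenyCreateWithoutEnvironment", "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance", "s3:CreateBucket"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*", "arn:aws:s3:::*"],
      "Condition": { "Null": { "aws:RequestTag/environment": "true" } }
    },
    {
      "Sid": "DenyCreateWithoutOwner", "Effect": "Deny",
      "Action": ["ec2:RunInstances", "rds:CreateDBInstance", "s3:CreateBucket"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:rds:*:*:db:*", "arn:aws:s3:::*"],
      "Condition": { "Null": { "aws:RequestTag/owner": "true" } }
    }
  ]
}

Use one statement per tag. Keys listed together in a single Null block are ANDed, so a combined statement would deny only requests missing all three tags. Resource is scoped to the created resource types because ec2:RunInstances is also authorized against subnets, images, and other resources that carry no request tags. Before switching to Deny, confirm that each listed action accepts tags at creation, since an action that cannot pass aws:RequestTag will always be denied.

Also deploy AWS Tag Policies at OU level for allowed values and case enforcement.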

Azure — Azure Policy

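Use built-in policies: “Require a tag and its value on resources” and “Inherit a tag from the resource group if missing” (or its subscription counterpart, “Inherit a tag from the subscription if missing”). Assign at Management Group scope for org-wide coverage. Start the “Require” policy in audit mode, then promote to Deny. The “Inherit” policies use the Modify effect, so they fill in missing tags rather than block deployments.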

Enable Cost Management Tag Inheritance in settings — this propagates Subscription/RG tags to cost records even if resources themselves are untagged. Azure has the strongest inheritance story of all three providers.

GCP — Compensating controls

GCP has no native “require label” deny constraint. Compensate with IaC discipline (default_labels in provider block), CI/CD gates (OPA/Sentinel), and Cloud Asset Inventory feeds to Cloud Functions for detective labelling.
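The detective labelling step could look roughly like the sketch below: a function that takes one decoded Cloud Asset Inventory feed message and works out which placeholder labels to apply. This is a sketch under assumptions, not a definitive handler. The field names (`asset`, `resource.data.labels`, `deleted`) reflect the feed's JSON layout as assumed here and should be verified against a real message. The function name and the `unassigned` placeholder are hypothetical, and the call that actually applies the labels is left out because it differs per resource type.

```python
# Assumed label keys, mirroring the mandatory tags used elsewhere in this runbook.
REQUIRED_LABELS = ("cost_center", "business_unit", "application", "environment", "owner")
PLACEHOLDER = "unassigned"  # hypothetical default; owners replace it later


def labels_to_add(feed_message, required=REQUIRED_LABELS):
    """Return (asset name, {label: placeholder}) or None when nothing needs labelling."""
    if feed_message.get("deleted"):
        return None  # the asset no longer exists
    asset = feed_message.get("asset") or {}
    data = (asset.get("resource") or {}).get("data") or {}
    existing = data.get("labels") or {}
    missing = {key: PLACEHOLDER for key in required if not existing.get(key)}
    if not missing:
        return None
    return asset.get("name"), missing
```

Returning the plan instead of applying it keeps the decision testable and lets the same function drive alert-only behaviour in environments where automatic changes are not allowed.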

Deploy Detective Controls

Set up compliance scanning, alerting, and auto-remediation for what slips through prevention.

Scanning:

Provider       Tool                           What It Does
───────────    ───────────────────────────    ──────────────────────────────────────────────────
AWS            AWS Config Rules               required-tags managed rule, continuous evaluation
Azure          Azure Policy Compliance        Compliance dashboard, drill down to non-compliant resources
GCP            Cloud Asset Inventory          Export to BigQuery for trending
Cross-cloud    Cloud Custodian / Steampipe    YAML (Custodian) or SQL (Steampipe) policies across all providers

Auto-remediation by environment:

Environment    Action for Untagged Resources
───────────    ───────────────────────────────────────────────────────────────────────
Sandbox        Auto-tag defaults + alert. 7 days untagged → stop. 30 days → terminate
Dev            Auto-tag defaults + alert. 14 days untagged → stop
Staging        Alert owner + ticket. SLA: 5 business days
Production     Alert owner + P3 ticket. NEVER auto-remediate. SLA: 10 business days
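The remediation rules above can be expressed as a small lookup, sketched below. Assumptions: environments use the short codes from the Terraform validation (`sbx`, `dev`, `stg`, `prod`), thresholds are inclusive (day 7 triggers a stop), the action names are hypothetical labels for whatever your automation calls, and an unknown environment raises an error instead of guessing.

```python
# Per environment: actions taken immediately, then (days untagged, escalation)
# pairs ordered from the highest threshold down.
POLICY = {
    "sbx": {"immediate": ["auto_tag_defaults", "alert"],
            "escalations": [(30, "terminate"), (7, "stop")]},
    "dev": {"immediate": ["auto_tag_defaults", "alert"],
            "escalations": [(14, "stop")]},
    "stg": {"immediate": ["alert_owner", "open_ticket"], "escalations": []},
    # Production is never auto-remediated: no escalations at any age.
    "prod": {"immediate": ["alert_owner", "open_p3_ticket"], "escalations": []},
}


def actions_for(environment, days_untagged):
    """Return the ordered list of actions for an untagged resource."""
    if environment not in POLICY:
        raise ValueError(f"unknown environment: {environment!r}")
    policy = POLICY[environment]
    actions = list(policy["immediate"])
    for threshold, escalation in policy["escalations"]:
        if days_untagged >= threshold:
            actions.append(escalation)  # only the most severe escalation applies
            break
    return actions
```

Keeping the rules in data makes the production exception visible in one place and easy to review when SLAs or thresholds change.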

Progressive Rollout Schedule

Phase                    Timeline       Goal
─────────────────────    ───────────    ─────────────────────────────────────────────────────
Phase 1: Visibility      Weeks 1–4      Audit-mode policies. Baseline metrics. Build case.
Phase 2: Enablement      Weeks 5–8      IaC modules updated. CI/CD warns. Tagging sprint.
Phase 3: Enforcement     Weeks 9–12     Audit → Deny. Auto-remediation in non-prod. SLAs.
Phase 4: Optimisation    Ongoing        Tighten targets 90% → 95% → 98%. Quarterly review.

Deliverable Checklist

  • Shared IaC modules updated with mandatory tag variables
  • CI/CD pipeline scanning step deployed (warning → blocking)
  • Cloud-native policies deployed per provider (Audit → Deny)
  • Tag Inheritance enabled (Azure Cost Management)
  • Detective scanning operational (Config Rules / Policy / CAI)
  • Auto-remediation active for sandbox/dev
  • Alert routing configured (Slack/Teams → owner)
  • Ticketing integration for prod violations with SLAs