PHASE 03 // IMPLEMENT

S1-09 · Understand Cloud Usage and Cost · Data Ingestion

Normalise Billing Data with FinOps FOCUS

Why

Without normalisation, each cloud has separate reports with different terminology. AWS calls it line_item_unblended_cost, Azure calls it CostInBillingCurrency, GCP calls it cost. Cross-cloud questions — “what’s our total compute spend?” or “what’s our commitment coverage across all providers?” — require manual reconciliation and inevitably produce conflicting numbers. FOCUS provides one schema with the same column names and semantics across all providers.

The Problem
═══════════════════════════════════════════════════════════════

  AWS CUR 2.0              Azure Exports           GCP BigQuery
  ────────────             ─────────────           ──────────────
  line_item_line_item_type ChargeType              cost_type
  line_item_usage_amount   Quantity                usage.amount
  line_item_unblended_cost CostInBillingCurrency   cost

         │                       │                       │
         ▼                       ▼                       ▼
  ┌──────────────────────────────────────────────────────────┐
  │                       FOCUS 1.0+                          │
  │ ChargeCategory │ BilledCost │ EffectiveCost │ ...        │
  │ ServiceName    │ RegionId   │ SubAccountId  │ ...        │
  └──────────────────────────────────────────────────────────┘
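
As a rough illustration of the renaming in the diagram, the sketch below maps each provider's native cost and quantity columns onto FOCUS names. The one-to-one mapping is a simplifying assumption: native FOCUS exports apply more detailed rules than a plain rename, and the helper name to_focus is hypothetical.

```python
# Illustrative only: simplified native-to-FOCUS renames based on the diagram.
# Real FOCUS exports apply richer rules than a one-to-one column rename.
# ConsumedQuantity is the FOCUS 1.0 name for the usage quantity column.
NATIVE_TO_FOCUS = {
    "AWS": {
        "line_item_unblended_cost": "BilledCost",
        "line_item_usage_amount": "ConsumedQuantity",
    },
    "Azure": {
        "CostInBillingCurrency": "BilledCost",
        "Quantity": "ConsumedQuantity",
    },
    "GCP": {
        "cost": "BilledCost",
        "usage.amount": "ConsumedQuantity",
    },
}


def to_focus(provider: str, row: dict) -> dict:
    """Rename a native billing row's keys to FOCUS names (hypothetical helper)."""
    mapping = NATIVE_TO_FOCUS[provider]
    # Keys without a mapping are passed through unchanged.
    focus_row = {mapping.get(key, key): value for key, value in row.items()}
    focus_row["ProviderName"] = provider
    return focus_row


# Sample rows are made up for the example.
rows = [
    to_focus("AWS", {"line_item_unblended_cost": 42.50, "line_item_usage_amount": 10}),
    to_focus("Azure", {"CostInBillingCurrency": 38.20, "Quantity": 8}),
    to_focus("GCP", {"cost": 29.80, "usage.amount": 6}),
]

# Once every row uses the same column name, a cross-cloud total is one line.
total = sum(r["BilledCost"] for r in rows)
print(f"Total BilledCost across providers: {total:.2f}")
```

After the rename, the cross-cloud total needs no per-provider logic, which is what the shared schema is meant to give you.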

What

Build a unified FOCUS data lake where all cloud billing data sits in a single table with identical column names. This enables cross-cloud dashboards, unified anomaly detection, and multi-cloud unit economics — all from standard SQL queries.

How

Enable FOCUS Exports per Provider

This builds on S1-08 (Billing Ingestion). Ensure FOCUS-format data is available from each provider:

  Provider   FOCUS Availability                                      Action Required
  ────────   ──────────────────                                      ───────────────
  AWS        Native FOCUS 1.0 export via Data Exports                Enable “FOCUS with AWS Columns” export to S3
  Azure      Native FOCUS 1.0r2 export via Cost Management Exports   Enable FOCUS dataset export to Blob Storage
  GCP        Not native; requires a BigQuery view transformation     Create FOCUS view using Google’s guide SQL or FOCUS Converter

Choose an Ingestion Architecture

Multi-Cloud FOCUS Data Lake
═══════════════════════════════════════════════════════════════

  ┌──────────┐   ┌─────────┐   ┌─────────┐
  │  AWS     │   │  Azure  │   │  GCP    │
  │  FOCUS   │   │  FOCUS  │   │  FOCUS  │
  │  Export  │   │  Export │   │  View   │
  └────┬─────┘   └────┬────┘   └────┬────┘
       │              │             │
       ▼              ▼             ▼
  ┌──────────────────────────────────────────────┐
  │            Ingestion Layer                    │
  │                                              │
  │  Option A: Cloud-native                      │
  │    S3 + Athena federated / ADX / BigQuery    │
  │                                              │
  │  Option B: Data warehouse                    │
  │    Snowflake / Databricks / ClickHouse       │
  │                                              │
  │  Option C: Third-party FinOps platform       │
  │    (ingests and normalises for you)          │
  └──────────────────┬───────────────────────────┘

  ┌──────────────────▼───────────────────────────┐
  │           Unified FOCUS Table                 │
  │  ProviderName │ ServiceName │ BilledCost │... │
  │  AWS          │ EC2         │ 42.50      │    │
  │  Azure        │ VMs         │ 38.20      │    │
  │  GCP          │ Compute Eng │ 29.80      │    │
  └──────────────────────────────────────────────┘

Option A (cloud-native) is cheapest for organisations already invested in one cloud’s analytics stack. Use Athena federated queries (AWS-primary), Azure Data Explorer cross-cloud ingestion, or BigQuery with external data sources.

Option B (data warehouse) is best for organisations with an existing Snowflake/Databricks investment.

Option C (FinOps platform) is fastest for organisations that want turnkey multi-cloud normalisation without building pipelines.

Land and Union All Sources

For each cloud provider, land the FOCUS export into the chosen data store:

  Source        Landing Method
  ──────        ──────────────
  AWS FOCUS     S3 → ETL/copy to warehouse, or Athena federated query
  Azure FOCUS   Blob Storage → copy to warehouse, or ADX ingestion
  GCP FOCUS     Scheduled BigQuery query → export to GCS → copy to warehouse

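Union the landed sources into a single table. Because every source already follows the FOCUS schema, the columns align across providers without per-provider renaming. Partition or cluster the table on the FOCUS ProviderName column to improve query performance.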
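
The union step can be sketched as follows. This is a minimal illustration: in-memory SQLite stands in for whichever data store you chose, and the table names (aws_focus, azure_focus, gcp_focus, focus_unified) and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the chosen data store (an assumption for
# this sketch); table names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
sources = {
    "aws_focus": ("AWS", "EC2", 42.50),
    "azure_focus": ("Azure", "VMs", 38.20),
    "gcp_focus": ("GCP", "Compute Eng", 29.80),
}
for table, row in sources.items():
    conn.execute(
        f"CREATE TABLE {table} (ProviderName TEXT, ServiceName TEXT, BilledCost REAL)"
    )
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)

# Every source shares the FOCUS column names, so a UNION ALL with an explicit
# column list is enough to build the unified table.
columns = "ProviderName, ServiceName, BilledCost"
union_sql = " UNION ALL ".join(f"SELECT {columns} FROM {t}" for t in sources)
conn.execute(f"CREATE VIEW focus_unified AS {union_sql}")

unified = conn.execute(
    f"SELECT {columns} FROM focus_unified ORDER BY ProviderName"
).fetchall()
for row in unified:
    print(row)
```

Listing the columns explicitly rather than using SELECT * keeps the union stable if one provider's export adds extra columns.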

Validate with FOCUS Validator

Run the open-source FOCUS Validator on each source before unioning. It checks schema compliance, required columns, data type correctness, and semantic rules.

# FOCUS Validator (Python)
# Flag names can differ between releases; confirm with `focus-validator --help`.
pip install focus-validator
focus-validator --data-file ./aws-focus-export.parquet
focus-validator --data-file ./azure-focus-export.parquet
focus-validator --data-file ./gcp-focus-view-export.parquet

Cross-check: sum BilledCost per provider per month against each provider’s invoice. Discrepancies >1% require investigation.
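
The invoice cross-check could be automated along these lines. This is a sketch under stated assumptions: rows are plain dictionaries, BillingPeriodStart is an ISO date string, invoice totals are keyed by (provider, "YYYY-MM"), and the function name reconcile and all sample figures are made up for illustration.

```python
from collections import defaultdict

TOLERANCE = 0.01  # the 1% threshold from the cross-check above


def reconcile(focus_rows, invoice_totals, tolerance=TOLERANCE):
    """Compare summed BilledCost per (provider, month) against invoice totals.

    Returns (provider, month, focus_total, invoice_total, variance) for every
    pair whose relative variance exceeds the tolerance.
    """
    totals = defaultdict(float)
    for row in focus_rows:
        # Assumes an ISO date string; the first seven characters ("YYYY-MM")
        # identify the invoice month.
        month = row["BillingPeriodStart"][:7]
        totals[(row["ProviderName"], month)] += row["BilledCost"]

    discrepancies = []
    for key, invoice_total in invoice_totals.items():
        focus_total = totals.get(key, 0.0)
        if invoice_total == 0:
            # Avoid dividing by zero: any non-zero FOCUS total is a mismatch.
            variance = 0.0 if focus_total == 0 else float("inf")
        else:
            variance = abs(focus_total - invoice_total) / abs(invoice_total)
        if variance > tolerance:
            discrepancies.append((*key, focus_total, invoice_total, variance))
    return discrepancies


# Hypothetical sample data: AWS matches its invoice, Azure is 5% short.
focus_rows = [
    {"ProviderName": "AWS", "BillingPeriodStart": "2024-05-01", "BilledCost": 600.0},
    {"ProviderName": "AWS", "BillingPeriodStart": "2024-05-01", "BilledCost": 400.0},
    {"ProviderName": "Azure", "BillingPeriodStart": "2024-05-01", "BilledCost": 950.0},
]
invoice_totals = {("AWS", "2024-05"): 1000.0, ("Azure", "2024-05"): 1000.0}

flagged = reconcile(focus_rows, invoice_totals)
for provider, month, focus_total, invoice_total, variance in flagged:
    print(f"{provider} {month}: FOCUS {focus_total:.2f} vs invoice {invoice_total:.2f} ({variance:.1%})")
```

A provider with an invoice but no landed rows shows up with a FOCUS total of zero, so a missing export is flagged rather than silently skipped.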

Build Cross-Cloud Reports

Use the FOCUS Use Case Library (focus.finops.org) for pre-built SQL queries that work on the unified table without modification:

  Use Case                             FOCUS Columns Used
  ────────                             ──────────────────
  Total spend by provider              ProviderName, BilledCost
  Spend by service category            ServiceCategory, BilledCost
  Commitment discount utilisation      CommitmentDiscountId, EffectiveCost
  Cross-cloud unit economics           SubAccountId, BilledCost, ConsumedQuantity
  Anomaly detection across providers   BilledCost, ChargePeriodStart
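
To show what the first two use cases look like against a unified table, here is a small sketch. The queries are written for this example and are not copied from the Use Case Library; in-memory SQLite, the table name focus_unified, and the sample rows are all assumptions.

```python
import sqlite3

# Hypothetical unified table with a handful of made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE focus_unified ("
    "ProviderName TEXT, ServiceCategory TEXT, BilledCost REAL)"
)
conn.executemany(
    "INSERT INTO focus_unified VALUES (?, ?, ?)",
    [
        ("AWS", "Compute", 42.50),
        ("AWS", "Storage", 10.00),
        ("Azure", "Compute", 38.20),
        ("GCP", "Compute", 29.80),
    ],
)

# Total spend by provider: ProviderName, BilledCost.
spend_by_provider = conn.execute(
    "SELECT ProviderName, ROUND(SUM(BilledCost), 2) "
    "FROM focus_unified GROUP BY ProviderName ORDER BY ProviderName"
).fetchall()

# Spend by service category: ServiceCategory, BilledCost.
spend_by_category = conn.execute(
    "SELECT ServiceCategory, ROUND(SUM(BilledCost), 2) "
    "FROM focus_unified GROUP BY ServiceCategory ORDER BY ServiceCategory"
).fetchall()

print(spend_by_provider)
print(spend_by_category)
```

Neither query contains provider-specific logic, which is why the same SQL runs unchanged once all sources share the FOCUS schema.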

Establish Maintenance Process

Schema changes by providers can break the pipeline. Establish:

  • Monthly check of provider release notes for billing schema changes
  • Automated tests that validate FOCUS schema compliance on every pipeline run
  • Owner assigned for pipeline maintenance
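
The automated schema test could start as small as the check below, run against each source's column header on every pipeline run. It is a sketch, not a substitute for the FOCUS Validator: EXPECTED_COLUMNS is only the subset of FOCUS columns named in this runbook, and missing_focus_columns is a hypothetical helper.

```python
# Subset of FOCUS columns named in this runbook; NOT the full specification.
EXPECTED_COLUMNS = frozenset({
    "ProviderName",
    "ServiceName",
    "ServiceCategory",
    "BilledCost",
    "EffectiveCost",
    "RegionId",
    "SubAccountId",
    "CommitmentDiscountId",
})


def missing_focus_columns(header, expected=EXPECTED_COLUMNS):
    """Return the expected columns absent from a source's header, sorted."""
    return sorted(expected - set(header))


# Hypothetical header from one source in which EffectiveCost has disappeared.
header = [
    "ProviderName", "ServiceName", "ServiceCategory", "BilledCost",
    "RegionId", "SubAccountId", "CommitmentDiscountId",
]
missing = missing_focus_columns(header)
if missing:
    print(f"Schema check failed, missing columns: {missing}")
```

Extra columns are deliberately ignored so that provider-specific additions do not fail the run; only a column the reports depend on going missing does.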

Deliverable Checklist

  • FOCUS export enabled on AWS and Azure
  • FOCUS BigQuery view created for GCP
  • Ingestion architecture chosen and implemented
  • All three sources landed in unified data store
  • FOCUS Validator passed on each source
  • Cross-cloud reconciliation against invoices confirmed (<1% variance)
  • At least one cross-cloud dashboard live
  • Pipeline maintenance owner assigned
  • Monthly schema change monitoring process in place