Monitoring
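The platform provides monitoring across three pillars, all backed by Azure-managed services: metrics (Prometheus in an Azure Monitor Workspace, with optional Azure Managed Grafana), logs (Container Insights and Log Analytics), and traces (Application Insights).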
Metrics (Prometheus + Grafana)
Azure Monitor Workspace
AKS Automatic includes built-in Prometheus metrics collection and stores the metrics in an Azure Monitor Workspace. This captures:
- Kubernetes control plane metrics (API server, scheduler, etcd)
- Node-level metrics (CPU, memory, disk, network)
- Pod-level metrics (container resource usage)
- Istio mesh metrics (request volume, latency, error rate)
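The sketch below shows how the Istio metrics above can become a per-service error-rate signal. It builds a PromQL query over Istio's standard `istio_requests_total` counter and parses an instant-vector result in the Prometheus HTTP API JSON format. Several details are assumptions for illustration:

- the `osdu` namespace label value;
- the 5-minute window;
- the 1% threshold;
- the helper names, which are hypothetical.

Sending the query to the Azure Monitor Workspace's Prometheus query endpoint requires a Microsoft Entra token, which is not shown.

```python
import json


def error_rate_query(namespace: str, window: str = "5m") -> str:
    """PromQL for the share of 5xx responses per destination service over `window`."""
    # reporter="destination" counts each request once, at the receiving sidecar.
    selector = f'reporter="destination", destination_workload_namespace="{namespace}"'
    errors = (
        "sum by (destination_service_name) ("
        f'rate(istio_requests_total{{{selector}, response_code=~"5.."}}[{window}]))'
    )
    total = (
        "sum by (destination_service_name) ("
        f"rate(istio_requests_total{{{selector}}}[{window}]))"
    )
    return f"{errors}\n/\n{total}"


def parse_instant_vector(body: str, label: str) -> dict[str, float]:
    """Map the value of `label` to the sample value in a /api/v1/query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "string-encoded float"].
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


def services_over_threshold(rates: dict[str, float], threshold: float = 0.01) -> list[str]:
    """Services whose error rate exceeds `threshold` (hypothetical 1% default).

    NaN (0/0 when a service had no traffic) never compares greater, so it is skipped.
    """
    return sorted(name for name, rate in rates.items() if rate > threshold)
```

Grouping by `destination_service_name` keeps the result to one series per OSDU service, which maps directly onto a Grafana panel or an alert rule.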
Grafana
When enabled (`ENABLE_GRAFANA_WORKSPACE=true`), Azure Managed Grafana provides pre-built dashboards for:
- Cluster health and node utilization
- Pod resource consumption
- Istio service mesh traffic
- Karpenter node provisioning
Logs (Container Insights)
Log Analytics Workspace
Container Insights collects the stdout and stderr logs of all containers and forwards them to Log Analytics. Query them in the Azure portal or with `az monitor log-analytics query`. For example, to read the most recent log lines from one service:
```kusto
// Pod logs for a specific service
ContainerLogV2
| where PodNamespace == "osdu"
| where PodName startswith "partition"
| project TimeGenerated, LogMessage
| order by TimeGenerated desc
| take 100
```

Common Log Queries
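The queries in this section can also be run from a script with `az monitor log-analytics query`. The sketch below renders the pod-log query for a given namespace and service prefix, escaping the values as KQL string literals, and builds the CLI argument list. All helper names are hypothetical.

Two points are worth noting:

- `--workspace` expects the workspace's customer ID (a GUID), not its resource name.
- `run_query` assumes a prior `az login` and is not exercised here.

```python
import json
import subprocess


def kql_string(value: str) -> str:
    """Quote `value` as a double-quoted KQL string literal, escaping \\ and "."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pod_log_query(namespace: str, pod_prefix: str, limit: int = 100) -> str:
    """Render the pod-log query for one service (hypothetical helper)."""
    return "\n".join([
        "ContainerLogV2",
        f"| where PodNamespace == {kql_string(namespace)}",
        f"| where PodName startswith {kql_string(pod_prefix)}",
        "| project TimeGenerated, LogMessage",
        "| order by TimeGenerated desc",
        f"| take {int(limit)}",
    ])


def az_query_argv(workspace_id: str, query: str) -> list[str]:
    """Argument list for the Azure CLI; passing a list avoids shell quoting issues."""
    return [
        "az", "monitor", "log-analytics", "query",
        "--workspace", workspace_id,  # workspace customer ID (GUID)
        "--analytics-query", query,
        "--output", "json",
    ]


def run_query(workspace_id: str, query: str) -> list:
    """Run the query through the Azure CLI and return its parsed JSON output (rows)."""
    result = subprocess.run(
        az_query_argv(workspace_id, query), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `run_query(workspace_guid, pod_log_query("osdu", "storage"))` fetches the latest 100 log lines of the storage service.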
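```kusto
// Pods with a container in CrashLoopBackOff.
// CrashLoopBackOff is a container waiting reason, not a pod phase:
// the pod's PodStatus usually stays "Running" while the container restarts.
KubePodInventory
| where Namespace == "osdu"
| where ContainerStatusReason == "CrashLoopBackOff"
| summarize Restarts = max(ContainerRestartCount) by Name

// Containers whose last termination was an OOM kill.
// OOMKilled is a termination reason recorded by Kubernetes, so it appears in
// ContainerLastStatus rather than in the container's own log output.
KubePodInventory
| where Namespace == "osdu"
| where ContainerLastStatus contains "OOMKilled"
| project TimeGenerated, Name, ContainerName, ContainerLastStatus
```

Traces (Application Insights)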
OSDU services send distributed traces to Application Insights. The destination is the connection string supplied in the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable. This enables:
- End-to-end request tracing across services
- Dependency maps showing service-to-service and service-to-PaaS calls
- Exception tracking and failure analysis
- Performance analysis (latency percentiles, throughput)
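A missing or truncated `APPLICATIONINSIGHTS_CONNECTION_STRING` silently disables trace export, so a startup sanity check can be useful. Application Insights connection strings are semicolon-separated `Key=Value` pairs, for example `InstrumentationKey=...;IngestionEndpoint=https://...`.

The sketch below parses that format and checks for an `InstrumentationKey`. The helper names, and the choice of which keys to check, are assumptions for illustration rather than part of any OSDU service.

```python
import os
from collections.abc import Mapping


def parse_connection_string(value: str) -> dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2' into a dict; blank segments are ignored."""
    pairs = {}
    for segment in value.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, val = segment.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed segment: {segment!r}")
        pairs[key.strip()] = val.strip()
    return pairs


def check_app_insights_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Fail fast if the connection string is absent or lacks an instrumentation key."""
    raw = env.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "")
    if not raw.strip():
        raise RuntimeError("APPLICATIONINSIGHTS_CONNECTION_STRING is not set; no traces will be exported")
    parts = parse_connection_string(raw)
    if not parts.get("InstrumentationKey"):
        raise ValueError("connection string has no InstrumentationKey")
    endpoint = parts.get("IngestionEndpoint")
    if endpoint and not endpoint.startswith("https://"):
        raise ValueError(f"IngestionEndpoint should be an https URL: {endpoint}")
    return parts
```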
Middleware Monitoring
Elasticsearch
Kibana can optionally be exposed through the gateway module. When it is, use the Kibana dashboard to monitor:
- Cluster health (green/yellow/red)
- Index status and document counts
- Search query performance
- Shard allocation across nodes
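The same health signals are available from Elasticsearch's `GET /_cluster/health` API, which suits scripted checks. The sketch below interprets a response body using fields from that API: `status`, `unassigned_shards`, `relocating_shards`, and `initializing_shards`. The function name and message wording are illustrative. Fetching the JSON (URL, credentials) is left out, because it depends on how the cluster is exposed.

```python
import json

# Status semantics as documented for the Elasticsearch cluster health API.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but one or more replicas are not",
    "red": "one or more primary shards are unassigned",
}


def health_findings(body: str) -> list[str]:
    """Return findings from a /_cluster/health response body.

    An empty list means the cluster is green with no shards unassigned or in motion.
    """
    health = json.loads(body)
    findings = []
    status = health.get("status")
    if status != "green":
        findings.append(f"status is {status}: {STATUS_MEANING.get(status, 'unrecognized status')}")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        findings.append(f"{unassigned} unassigned shard(s)")
    moving = health.get("relocating_shards", 0) + health.get("initializing_shards", 0)
    if moving:
        findings.append(f"{moving} shard(s) relocating or initializing")
    return findings
```

On a single-node cluster, any index configured with replicas stays yellow, because Elasticsearch never allocates a replica to the same node as its primary.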
Airflow
The Airflow web UI can optionally be exposed through the gateway module. When it is, use it to monitor:
- DAG execution status
- Task run history and durations
- Worker pod scheduling
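For checks outside the web UI, Airflow 2's stable REST API lists DAG runs at `GET /api/v1/dags/{dag_id}/dagRuns`. Each run carries a `state` such as `queued`, `running`, `success`, or `failed`. The sketch below tallies states, lists failed runs, and computes durations from one response body.

Several things are assumptions not covered by this page:

- that the REST API is reachable through the gateway;
- how requests authenticate;
- the sample run IDs in the test.

```python
import json
from collections import Counter
from datetime import datetime


def summarize_dag_runs(body: str) -> dict:
    """Summarize a GET /api/v1/dags/{dag_id}/dagRuns response body."""
    runs = json.loads(body)["dag_runs"]
    durations = {}
    for run in runs:
        # Runs that are still queued or running have no end_date yet.
        if run.get("start_date") and run.get("end_date"):
            start = datetime.fromisoformat(run["start_date"])
            end = datetime.fromisoformat(run["end_date"])
            durations[run["dag_run_id"]] = (end - start).total_seconds()
    return {
        "states": Counter(run["state"] for run in runs),
        "failed": [run["dag_run_id"] for run in runs if run["state"] == "failed"],
        "durations_s": durations,
    }
```

The endpoint is paginated (`limit`/`offset` query parameters, with `total_entries` in the response), so a full history needs multiple requests.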