Version: 1.0.0

Monitoring & Logging

Guidance for Node.js services on Azure Container Apps (ACA) using Azure Monitor and workspace-based Application Insights.

Goals

  • End-to-end observability (metrics, logs, traces) for every service.
  • Fast detection: alert on errors, latency, resource pressure, and restarts.
  • Actionability: standard queries, dashboards, and runbooks.

Stack

  • Compute: Azure Container Apps.
  • Observability: Azure Monitor + Log Analytics workspace + Application Insights (workspace-based).
  • Visualization: Azure Monitor Workbooks; optional Azure Managed Grafana.
  • Tracing: OpenTelemetry with Azure Monitor exporter.

What to monitor

  • Availability: request rate, 4xx/5xx ratio, p95/p99 latency.
  • Performance: CPU %, memory working set, container restarts, scale events.
  • Reliability: dependency failures (DB, Service Bus, HTTP downstream), retry counts, queue backlog.
  • Security: auth failures, permission denials, unexpected public endpoints.
  • Platform: ACA revision health, ingress errors.
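Application Insights normally computes the availability and latency signals above. To show what they mean, the sketch below derives them from a batch of request records. The record shape (`status`, `durationMs`) and the helper names are hypothetical. It uses a nearest-rank percentile, while KQL's `percentile()` is an approximation, so figures from Log Analytics can differ slightly.

```javascript
// Sketch: compute availability signals from a batch of request records.
// The record shape { status, durationMs } is an assumption for illustration.

// Nearest-rank percentile over an ascending array: the smallest sample with at
// least p% of all samples at or below it.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(rank, 1) - 1];
}

function summarizeRequests(requests, windowSeconds) {
  const total = requests.length;
  const count4xx = requests.filter((r) => r.status >= 400 && r.status < 500).length;
  const count5xx = requests.filter((r) => r.status >= 500).length;
  const durations = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  return {
    requestRatePerSec: total / windowSeconds,
    ratio4xx: total ? count4xx / total : 0,
    ratio5xx: total ? count5xx / total : 0,
    p95Ms: percentile(durations, 95),
    p99Ms: percentile(durations, 99),
  };
}

// Usage: 100 requests in a 60 s window, 3 of them 5xx and 5 of them 4xx.
const sample = Array.from({ length: 100 }, (_, i) => ({
  status: i < 3 ? 503 : i < 8 ? 404 : 200,
  durationMs: i + 1, // 1..100 ms
}));
console.log(summarizeRequests(sample, 60));
// ratio5xx: 0.03, ratio4xx: 0.05, p95Ms: 95, p99Ms: 99
```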

Setup (platform)

  1. Create a Log Analytics workspace (same region as ACA).
  2. Create an Application Insights resource linked to that workspace.
  3. In the ACA environment, set the logging destination to Azure Monitor, then under Diagnostic settings send the ContainerAppConsoleLogs and ContainerAppSystemLogs categories to the workspace.
  4. For each Container App:
    • Set env vars: APPLICATIONINSIGHTS_CONNECTION_STRING, OTEL_SERVICE_NAME, LOG_LEVEL (info/warn/error), NODE_ENV.
    • Health probes: /healthz (liveness), /readyz (readiness) with fast responses.
    • Scale rules: HTTP concurrency, CPU/memory, or queue depth (Service Bus/Event Hubs) as appropriate.
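A service can fail fast when these settings are missing, rather than starting without telemetry. The sketch below uses the variable names from step 4. The helper name, the accepted log levels, and the defaults (`appvity-api`, `development`) are illustrative assumptions. The connection-string check only looks for the `InstrumentationKey=` segment and does not fully validate the string.

```javascript
// Sketch: validate observability settings once at startup.
// Accepted levels and defaults below are illustrative assumptions.
const ALLOWED_LOG_LEVELS = ['debug', 'info', 'warn', 'error'];

function loadObservabilityConfig(env) {
  const errors = [];
  const connectionString = env.APPLICATIONINSIGHTS_CONNECTION_STRING;
  if (!connectionString) {
    errors.push('APPLICATIONINSIGHTS_CONNECTION_STRING is not set');
  } else if (!connectionString.includes('InstrumentationKey=')) {
    // Shallow check only: real connection strings also carry endpoint segments.
    errors.push('APPLICATIONINSIGHTS_CONNECTION_STRING has no InstrumentationKey segment');
  }

  const logLevel = (env.LOG_LEVEL || 'info').toLowerCase();
  if (!ALLOWED_LOG_LEVELS.includes(logLevel)) {
    errors.push(`LOG_LEVEL must be one of: ${ALLOWED_LOG_LEVELS.join(', ')}`);
  }

  if (errors.length > 0) {
    // Fail visibly at startup instead of running a revision without telemetry.
    throw new Error(`Invalid observability config: ${errors.join('; ')}`);
  }

  return {
    connectionString,
    serviceName: env.OTEL_SERVICE_NAME || 'appvity-api', // assumed default
    logLevel,
    nodeEnv: env.NODE_ENV || 'development', // assumed default
  };
}
```

At startup this would be called once, e.g. `const config = loadObservabilityConfig(process.env);`, before Application Insights and the logger are initialized.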

Setup (Node.js app)

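  • Dependencies: applicationinsights, pino, and express; for OpenTelemetry tracing, optionally @opentelemetry/sdk-node, @opentelemetry/auto-instrumentations-node, and @azure/monitor-opentelemetry-exporter.
  • Write logs to stdout as single-line JSON (ACA collects stdout/stderr into ContainerAppConsoleLogs); include the service name and correlation IDs such as the trace ID.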
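You can get a correlation ID without extra dependencies by reading the W3C `traceparent` header. Upstream callers propagate this header, and so does Application Insights in `AI_AND_W3C` mode. The sketch below parses the header and emits one JSON line. The field names `traceId`, `spanId`, and `service` are assumptions, not a schema that Azure Monitor requires. With pino, the same fields could be attached per request through a child logger or the `mixin` option. When OpenTelemetry instrumentation is active, the current span context is likely a more reliable source than re-parsing headers.

```javascript
// Sketch: extract a correlation ID from a W3C traceparent header and emit one
// single-line JSON log record. Field names are illustrative assumptions.
// Format: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT.exec((header || '').trim());
  if (!match) return null;
  const [, version, traceId, spanId] = match;
  // Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId };
}

function formatLogLine(service, level, msg, headers = {}, extra = {}) {
  const ctx = parseTraceparent(headers.traceparent);
  const record = {
    time: new Date().toISOString(),
    level,
    service,
    msg,
    ...(ctx ? { traceId: ctx.traceId, spanId: ctx.spanId } : {}),
    ...extra,
  };
  // JSON.stringify without indentation escapes embedded newlines, so this stays one line.
  return JSON.stringify(record);
}

console.log(formatLogLine('appvity-api', 'info', 'order-created',
  { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' },
  { orderId: 42 }));
```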

Minimal logging + metrics

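// Application Insights should start before the modules it instruments are loaded.
// ESM evaluates static imports first, so full dependency collection may require
// moving this setup into its own module and preloading it (e.g. node --import).
import appInsights from 'applicationinsights';
import pino from 'pino';
import express from 'express';

appInsights
  .setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectPerformance(true)
  .setAutoCollectExceptions(true)
  .setAutoCollectDependencies(true)
  .setSendLiveMetrics(true)
  .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
  .start();

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: { service: process.env.OTEL_SERVICE_NAME || 'appvity-api' },
  // Write level labels ("info", "warn") instead of pino's default numbers (30, 40).
  formatters: {
    level: (label) => ({ level: label }),
  },
});

const app = express();
let ready = false;

// Liveness: the process is up and serving HTTP.
app.get('/healthz', (_req, res) => res.status(200).send('ok'));
// Readiness: return 503 until startup has finished so ACA holds traffic back.
app.get('/readyz', (_req, res) => res.status(ready ? 200 : 503).send(ready ? 'ready' : 'starting'));

const port = Number(process.env.PORT) || 3000;
app.listen(port, () => {
  ready = true;
  logger.info({ port }, 'service-started');
});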

Distributed tracing (OpenTelemetry)

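// Start this before the application code loads (for example, as a separate module
// preloaded with node --import) so auto-instrumentations can patch http and express.
// ESM applications may also need OpenTelemetry's ESM loader hook.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { AzureMonitorTraceExporter } from '@azure/monitor-opentelemetry-exporter';

const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME || 'appvity-api',
  traceExporter: new AzureMonitorTraceExporter({
    connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// ACA, like Kubernetes, sends SIGTERM before stopping a replica; flush buffered spans first.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});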

Standard alerts (examples)

  • 5xx rate > 2% for 5m.
  • p95 latency > 800 ms for 5m.
  • CPU > 80% for 5m or memory > 80% for 5m.
  • Container restarts > 3 in 15m.
  • Queue backlog (Service Bus) > threshold tied to SLA.
  • No logs ingested in 10m (heartbeat).
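Each threshold above translates directly into a boolean check. In Azure, you would configure these as metric or log search alert rules, not as application code. The sketch below only makes the rule logic explicit. It runs against a hypothetical object of pre-aggregated window values; all field names are assumptions, and `queueDepthThreshold` stands in for an SLA-derived value.

```javascript
// Sketch: the example alert rules as named predicates over pre-aggregated
// window values. Field names are hypothetical; thresholds come from the list above.
const RULES = [
  { name: 'high-5xx-rate', breached: (w) => w.requests5m > 0 && w.errors5xx5m / w.requests5m > 0.02 },
  { name: 'high-p95-latency', breached: (w) => w.p95LatencyMs5m > 800 },
  { name: 'cpu-pressure', breached: (w) => w.cpuPercent5m > 80 },
  { name: 'memory-pressure', breached: (w) => w.memoryPercent5m > 80 },
  { name: 'restart-loop', breached: (w) => w.restarts15m > 3 },
  { name: 'queue-backlog', breached: (w) => w.queueDepth > w.queueDepthThreshold },
  { name: 'no-logs-heartbeat', breached: (w) => w.minutesSinceLastLog >= 10 },
];

// Returns the names of all rules whose condition holds for this window.
function evaluateAlerts(window) {
  return RULES.filter((rule) => rule.breached(window)).map((rule) => rule.name);
}

const firing = evaluateAlerts({
  requests5m: 1000, errors5xx5m: 35, // 3.5% > 2%
  p95LatencyMs5m: 420,
  cpuPercent5m: 85, memoryPercent5m: 60,
  restarts15m: 1,
  queueDepth: 120, queueDepthThreshold: 500,
  minutesSinceLastLog: 0,
});
console.log(firing); // [ 'high-5xx-rate', 'cpu-pressure' ]
```

Keeping each rule as a named predicate puts every threshold in one reviewable place. The predicates can also be unit-tested before you mirror them in Azure Monitor alert rules.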

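Useful KQL queries

The requests and exceptions queries use the classic Application Insights table names and work from the Application Insights resource. When you query the Log Analytics workspace directly, use AppRequests (TimeGenerated, Success, DurationMs) and AppExceptions instead.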

Requests (error rate):

requests
| where timestamp > ago(1h)
| summarize total = count(), errors = countif(success == false) by bin(timestamp, 5m)
| extend error_rate = 100.0 * errors / total

Latency (p95/p99):

requests
| where timestamp > ago(1h)
| summarize p95 = percentile(duration, 95), p99 = percentile(duration, 99) by bin(timestamp, 5m)

Container logs (by app, level):

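ContainerAppConsoleLogs
| where TimeGenerated > ago(1h)
| where ContainerAppName == "appvity-api"
// Log holds the raw stdout line; pino writes numeric levels (30 = info, 40 = warn, 50 = error) unless a level formatter maps them to labels
| extend level = tostring(parse_json(Log).level)
| summarize count() by level

If the environment writes directly to Log Analytics rather than through diagnostic settings, query ContainerAppConsoleLogs_CL and use the Log_s and ContainerAppName_s columns.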

Exceptions:

exceptions
| where timestamp > ago(1h)
| summarize count() by type, outerMessage, bin(timestamp, 10m)

Dashboards

  • Azure Monitor Workbook: latency, error rate, restarts, scale events, dependency failures.
  • Optional Azure Managed Grafana: add the Azure Monitor data source, which can query both metrics and Log Analytics, for shared views.

Retention and cost

  • Metrics: keep at least 30 days; logs 30–90 days depending on compliance.
  • Enable sampling in Application Insights for high-traffic services (e.g., retain 20–50% of telemetry).
  • Prefer structured logs; never log secrets, and avoid large payloads.
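Application Insights sampling is designed to keep or drop telemetry per operation, so related requests, dependencies, and logs stay together. The sketch below illustrates that idea with a deterministic hash of the operation ID. The FNV-1a hash and the function names are illustrative choices, not the SDK's algorithm. In the classic 2.x Node.js SDK, you set the actual percentage through `appInsights.defaultClient.config.samplingPercentage`.

```javascript
// Sketch: deterministic per-operation sampling. The hash (32-bit FNV-1a over
// UTF-16 code units) is an illustrative choice, not the SDK's algorithm.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Maps the operation ID to a fixed bucket in [0, 100) with 0.01 resolution and
// keeps the operation when the bucket falls below the sampling percentage.
function shouldSample(operationId, samplingPercentage) {
  if (samplingPercentage >= 100) return true;
  if (samplingPercentage <= 0) return false;
  return (fnv1a32(operationId) % 10000) / 100 < samplingPercentage;
}

const ids = Array.from({ length: 10000 }, (_, i) => `op-${i}`);
const kept = ids.filter((id) => shouldSample(id, 25)).length;
console.log(`kept ${kept} of ${ids.length} operations`);
```

The bucket for an ID never changes. As a result, raising the percentage only adds operations to the kept set and never drops ones that were already kept.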

Runbooks (quick actions)

  • High 5xx/latency: check latest revision, dependency health, recent deploys; roll back if needed.
  • CPU/memory pressure: inspect scale rules; increase min replicas or tune concurrency.
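  • No logs ingested: verify that the ACA environment's logging destination and diagnostic settings still target the workspace, and that the app has a valid APPLICATIONINSIGHTS_CONNECTION_STRING.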
  • Repeated restarts: check probes, startup latency, and configuration/secret changes.

Local development

  • Set APPLICATIONINSIGHTS_CONNECTION_STRING to a non-prod resource.
  • Keep LOG_LEVEL=debug locally; use info or higher in production.
  • Exercise /healthz and a sample request flow to ensure traces and logs appear in the workspace.
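A small script can automate the probe part of that check. The sketch below calls `/healthz` and `/readyz` and throws on any non-200 status. The base URL is an assumption. `fetch` is injectable so the check can run without a live server; the default relies on the global `fetch` in Node.js 18+. The script does not verify ingestion, so you still need a query such as the ones above to confirm that traces and logs reached the workspace.

```javascript
// Sketch: local smoke check for the probe endpoints.
// fetchImpl defaults to the global fetch (Node.js 18+) and can be replaced in tests.
async function smokeCheck(baseUrl, fetchImpl = globalThis.fetch) {
  const statuses = {};
  for (const path of ['/healthz', '/readyz']) {
    const response = await fetchImpl(new URL(path, baseUrl).toString());
    statuses[path] = response.status;
    if (response.status !== 200) {
      throw new Error(`${path} returned HTTP ${response.status}`);
    }
  }
  return statuses;
}
```

To run it against a local instance: `smokeCheck('http://localhost:3000').then(console.log, (err) => { console.error(err.message); process.exitCode = 1; });`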