Version: 1.0.0
Monitoring & Logging
Guidance for Node.js services on Azure Container Apps (ACA) using Azure Monitor and workspace-based Application Insights.
Goals
- End-to-end observability (metrics, logs, traces) for every service.
- Fast detection: alert on errors, latency, resource pressure, and restarts.
- Actionability: standard queries, dashboards, and runbooks.
Stack
- Compute: Azure Container Apps.
- Observability: Azure Monitor + Log Analytics workspace + Application Insights (workspace-based).
- Visualization: Azure Monitor Workbooks; optional Azure Managed Grafana.
- Tracing: OpenTelemetry with Azure Monitor exporter.
What to monitor
- Availability: request rate, 4xx/5xx ratio, p95/p99 latency.
- Performance: CPU %, memory working set, container restarts, scale events.
- Reliability: dependency failures (DB, Service Bus, HTTP downstream), retry counts, queue backlog (see the custom-metric sketch after this list).
- Security: auth failures, permission denials, unexpected public endpoints.
- Platform: ACA revision health, ingress errors.
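Retry counts and queue backlog are not collected automatically; they have to be reported from the app. A minimal sketch using the Application Insights Node SDK's trackMetric and trackDependency, assuming the SDK has already been initialized as in the minimal example below (the metric and dependency names are illustrative):
import appInsights from 'applicationinsights';
// Call these only after appInsights.setup(...).start() has run (see the minimal example below).
export function reportQueueBacklog(depth) {
  // Surface queue depth as a custom metric so it can drive the backlog alert.
  appInsights.defaultClient.trackMetric({ name: 'queue_backlog', value: depth });
}
export function recordDownstreamCall(name, durationMs, success, resultCode) {
  // Record a downstream call so dependency failures show up alongside auto-collected telemetry.
  appInsights.defaultClient.trackDependency({
    target: name,
    name,
    data: name,
    duration: durationMs,
    resultCode,
    success,
    dependencyTypeName: 'HTTP',
  });
}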
Setup (platform)
- Create a Log Analytics workspace (same region as ACA).
- Create an Application Insights resource linked to that workspace.
- In ACA Environment > Diagnostic settings, send ContainerAppConsoleLogs and ContainerAppSystemLogs to the workspace.
- For each Container App:
  - Set env vars: APPLICATIONINSIGHTS_CONNECTION_STRING, OTEL_SERVICE_NAME, LOG_LEVEL (info/warn/error), NODE_ENV.
  - Health probes: /healthz (liveness), /readyz (readiness) with fast responses (see the probe sketch after this list).
  - Scale rules: CPU, RPS, or queue depth (Service Bus/Event Hub) as appropriate.
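Probe endpoints should answer quickly and avoid heavy dependency checks. A minimal Express sketch, assuming the app flips a readiness flag once startup work is done (isReady is illustrative):
import express from 'express';
const app = express();
// Illustrative flag: flip it once startup work (config load, DB connections) has finished.
let isReady = false;
// Liveness: respond immediately so ACA does not restart a healthy but busy container.
app.get('/healthz', (_req, res) => res.status(200).send('ok'));
// Readiness: return 503 until the app can serve traffic, keeping it out of ingress rotation.
app.get('/readyz', (_req, res) =>
  isReady ? res.status(200).send('ready') : res.status(503).send('starting')
);
app.listen(process.env.PORT || 3000, () => {
  isReady = true;
});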
Setup (Node.js app)
- Dependencies: applicationinsights, pino; optionally @opentelemetry/sdk-node, @opentelemetry/auto-instrumentations-node, and @azure/monitor-opentelemetry-exporter for tracing.
- Structure logs as single-line JSON; include the service name and correlation IDs (see the middleware sketch after the minimal example below).
Minimal logging + metrics
import appInsights from 'applicationinsights';
import pino from 'pino';
import express from 'express';
// Initialize Application Insights as early as possible so auto-collection covers all requests.
appInsights
  .setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectPerformance(true)
  .setAutoCollectExceptions(true)
  .setSendLiveMetrics(true)
  .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
  .start();
// Single-line JSON logs; the service field makes cross-app queries in Log Analytics easier.
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: { service: 'appvity-api' },
  messageKey: 'msg',
});
const app = express();
// Liveness probe: fast response, no dependency checks.
app.get('/healthz', (_req, res) => res.status(200).send('ok'));
app.listen(process.env.PORT || 3000, () => {
  logger.info({ msg: 'service-started', port: process.env.PORT || 3000 });
});
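To get correlation IDs into every log line, a small middleware can attach a child logger per request. This builds on the example above; the x-request-id header and /orders route are illustrative:
import crypto from 'node:crypto';
// Attach a correlation ID and a child logger to each request.
app.use((req, res, next) => {
  const requestId = req.headers['x-request-id'] || crypto.randomUUID();
  res.setHeader('x-request-id', requestId);
  req.log = logger.child({ requestId });
  next();
});
// Handlers log through req.log so every line carries the correlation ID.
app.get('/orders', (req, res) => {
  req.log.info({ msg: 'orders-requested' });
  res.json([]);
});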
Distributed tracing (OpenTelemetry)
// Start the SDK before importing instrumented modules (e.g., via a separate entry file or --require/--import)
// so the auto-instrumentations can patch express, http, and other clients.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { AzureMonitorTraceExporter } from '@azure/monitor-opentelemetry-exporter';
const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME || 'appvity-api',
  // Export spans directly to the workspace-based Application Insights resource.
  traceExporter: new AzureMonitorTraceExporter({
    connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
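Auto-instrumentation covers inbound HTTP and common clients; business operations can be wrapped in manual spans with the OpenTelemetry API (the tracer, span name, and attribute below are illustrative):
import { trace } from '@opentelemetry/api';
const tracer = trace.getTracer('appvity-api');
// Open a span around a business operation; record failures and always end the span.
async function processOrder(orderId) {
  return tracer.startActiveSpan('process-order', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      // ...actual work goes here...
      return 'done';
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
}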
Standard alerts (examples)
- 5xx rate > 2% for 5m.
- p95 latency > 800 ms for 5m.
- CPU > 80% for 5m or memory > 80% for 5m.
- Container restarts > 3 in 15m.
- Queue backlog (Service Bus) > threshold tied to SLA.
- No logs ingested in 10m (heartbeat).
Useful KQL queries
Requests (error rate):
requests
| where timestamp > ago(1h)
| summarize total = count(), errors = countif(success == false) by bin(timestamp, 5m)
| extend error_rate = 100.0 * errors / total
Latency (p95):
requests
| where timestamp > ago(1h)
| summarize p95 = percentile(duration, 95) by bin(timestamp, 5m)
Container logs (by app, level):
ContainerAppConsoleLogs
| where TimeGenerated > ago(1h)
| where ContainerAppName == "appvity-api"
| extend level = tostring(parse_json(Log).level)
| summarize count() by level
Exceptions:
exceptions
| where timestamp > ago(1h)
| summarize count() by type, outerMessage, bin(timestamp, 10m)
Dashboards
- Azure Monitor Workbook: latency, error rate, restarts, scale events, dependency failures.
- Optional Grafana (managed): import Azure Monitor and Log Analytics data sources for shared views.
Retention and cost
- Metrics: keep at least 30 days; logs 30–90 days depending on compliance.
- Use sampling in Application Insights when traffic is high (e.g., 20–50%); see the sampling sketch after this list.
- Prefer structured logs; avoid large payloads and secrets.
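With the classic Node SDK from the minimal example above, fixed-rate sampling is a single setting applied after start(); the 30% value is only an example:
import appInsights from 'applicationinsights';
// After appInsights.setup(...).start(), keep roughly 30% of telemetry to control ingestion cost.
appInsights.defaultClient.config.samplingPercentage = 30;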
Runbooks (quick actions)
- High 5xx/latency: check latest revision, dependency health, recent deploys; roll back if needed.
- CPU/memory pressure: inspect scale rules; increase min replicas or tune concurrency.
- No logs ingested: verify diagnostic settings to workspace and ACA permissions.
- Repeated restarts: check probes, startup latency, and configuration/secret changes.
Local development
- Set APPLICATIONINSIGHTS_CONNECTION_STRING to a non-prod resource.
- Keep LOG_LEVEL=debug locally; use info or higher in production.
- Exercise /healthz and a sample request flow to ensure traces and logs appear in the workspace (see the sketch below).
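A quick way to generate that traffic locally is a small script against the running service; BASE_URL and the /orders route are illustrative (requires Node 18+ for global fetch):
// smoke-test.mjs: hit the health probe and one sample route to produce logs and traces.
const base = process.env.BASE_URL || 'http://localhost:3000';
const health = await fetch(`${base}/healthz`);
console.log('healthz:', health.status);
const sample = await fetch(`${base}/orders`);
console.log('sample request:', sample.status);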