Operational Notes
✅ Subscription & Identity
- Correct subscription selected:
az account show --query id -o tsv
- Account has permissions to create:
  - Resource Groups
  - VNet / Subnets
  - Application Gateway
  - Storage Account
  - Key Vault
  - Container Apps
  - Private Endpoints
  - Redis Enterprise
  - Cosmos Mongo vCore
✅ Naming & Configuration Review
Open 00_config.sh and verify:
- Subscription ID
- Resource Group name
- Region (LOC)
- Storage account name (globally unique)
- ACR name (globally unique)
- Key Vault name
- CIDR ranges do not conflict with corp/VPN networks
- Domain names are correct
- Mongo region override = eastasia
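The CIDR conflict check in the list above can be done by hand, but a small helper makes it repeatable. The following is a minimal sketch for IPv4 only; the function names `ip_to_int` and `cidrs_overlap` and the example ranges are illustrative assumptions, not part of 00_config.sh.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 (true) when two IPv4 CIDR blocks share at least one address.
cidrs_overlap() {
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  # Two blocks overlap exactly when they agree under the shorter prefix.
  if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
  if [ "$len" -eq 0 ]; then return 0; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$a_ip"); b=$(ip_to_int "$b_ip")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Usage (hypothetical ranges):
#   cidrs_overlap 10.0.0.0/16 10.0.4.0/24 && echo "conflict"
```

Run it once per pair of VNet range and corp/VPN range before deploying.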
✅ Required Environment Variables
Set before running scripts:
export AZ_SUBSCRIPTION_ID="..."
export MONGO_VCORE_ADMIN_PW="StrongPasswordHere"
export CERT_PFX_PATH="/path/to/prod.pfx"
export CERT_PFX_PASSWORD="..."
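A preflight check can catch a missing variable before any script touches Azure. This is a minimal sketch; `require_env` is a hypothetical helper name and is not part of the deployment scripts.

```shell
# Print the name of every required variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup; the names passed in are fixed literals, not user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env AZ_SUBSCRIPTION_ID MONGO_VCORE_ADMIN_PW \
#               CERT_PFX_PATH CERT_PFX_PASSWORD || exit 1
```

The check only confirms the variables are non-empty; it does not verify that the PFX file exists or that the password is correct.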
✅ Application Gateway Expectations
- App Gateway uses --no-wait during creation.
- Always wait for provisioning to complete before running config scripts.
- Confirm with:
az network application-gateway show -g eq-prod-resgroup -n eq-prod-appgw --query provisioningState -o tsv
Wait until:
Succeeded
✅ Execution Order Reminder
Run scripts in order:
01 → 07 → 08a/08b → 09 → 10 → 11 → 13 → 14 → 15
Stop immediately if any script fails.
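The "stop on first failure" rule can be enforced by a small wrapper instead of relying on memory. A sketch, assuming the scripts are plain bash files in the current directory; `run_in_order` is a hypothetical name.

```shell
# Run each script with bash, in the order given; stop at the first failure.
run_in_order() {
  for script in "$@"; do
    echo "==> $script"
    if ! bash "$script"; then
      echo "FAILED: $script (stopping)" >&2
      return 1
    fi
  done
}

# Usage: pass the real script paths in the documented order, for example
#   run_in_order 13_appgw_create.sh
# then confirm provisioning has finished before running
#   run_in_order 14_appgw_config_backend.sh 15_appgw_url_path_map.sh
```

Splitting the run around script 13 keeps the Application Gateway provisioning check as an explicit manual gate.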
EQ-PROD Deployment Notes & Troubleshooting
This file contains operational notes discovered during deployment and validation.
1. Azure CLI DNS Resolution Error
Symptom
HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /organizations/v2.0/.well-known/openid-configuration NameResolutionError: Failed to resolve 'login.microsoftonline.com'
Meaning
The machine running Azure CLI cannot resolve public DNS names. This is NOT caused by the deployment scripts.
Azure authentication requires outbound DNS + HTTPS connectivity.
2. Quick Validation Commands
Run these on the machine executing the scripts:
nslookup login.microsoftonline.com
nslookup management.azure.com
curl -I https://login.microsoftonline.com -m 10
If these fail → DNS or outbound networking is broken.
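To check several endpoints in one pass and see every failure at once, the lookups can be wrapped in a loop. A minimal sketch; `check_dns` is a hypothetical helper, and it assumes the local `nslookup` exits non-zero when a name cannot be resolved.

```shell
# Check that each host resolves; report every result,
# and return non-zero if any lookup failed.
check_dns() {
  failed=0
  for host in "$@"; do
    if nslookup "$host" >/dev/null 2>&1; then
      echo "OK    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_dns login.microsoftonline.com management.azure.com graph.microsoft.com
```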
3. Fix Scenarios
A. Linux VM
Check resolver:
cat /etc/resolv.conf
Temporary fix:
sudo bash -c 'cat > /etc/resolv.conf <<EOF
nameserver 8.8.8.8
nameserver 1.1.1.1
EOF'
sudo systemctl restart systemd-resolved
Re-test:
nslookup login.microsoftonline.com
az login
B. WSL2 (Most Common)
- Replace DNS:
sudo rm -f /etc/resolv.conf
sudo bash -c 'cat > /etc/resolv.conf <<EOF
nameserver 1.1.1.1
nameserver 8.8.8.8
EOF'
- Prevent auto overwrite:
sudo bash -c 'cat > /etc/wsl.conf <<EOF
[network]
generateResolvConf = false
EOF'
- Restart WSL from Windows PowerShell:
wsl --shutdown
Then retry:
nslookup login.microsoftonline.com
az login
C. Corporate Proxy
Set proxy variables:
export HTTPS_PROXY="http://user:pass@proxy.company.com:8080"
export HTTP_PROXY="http://user:pass@proxy.company.com:8080"
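# NO_PROXY lists hosts that bypass the proxy. Keep the Azure domains here only if they are reachable directly; otherwise remove them so az traffic goes through the proxy.
export NO_PROXY="localhost,127.0.0.1,.azure.com,.microsoftonline.com"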
4. Network Requirements for Azure CLI
Outbound access required:
- DNS: UDP/TCP 53
- HTTPS: TCP 443 to:
  - login.microsoftonline.com
  - management.azure.com
  - graph.microsoft.com
5. App Gateway → Internal ACA Validation
Backend Health Check
az network application-gateway show-backend-health -g eq-prod-resgroup -n eq-prod-appgw -o table
Expected: all backends Healthy.
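With many backends the table output is easy to misread. The sketch below prints only the servers that are not Healthy and fails if there are any. `unhealthy_backends` is a hypothetical helper, and the JMESPath query assumes the usual show-backend-health output shape (backendAddressPools, then backendHttpSettingsCollection, then servers with address and health fields).

```shell
# Print "address health" for every backend server that is not Healthy.
# Exit status is 1 when at least one such server is found.
unhealthy_backends() {
  az network application-gateway show-backend-health \
    -g "$1" -n "$2" \
    --query "backendAddressPools[].backendHttpSettingsCollection[].servers[].[address, health]" \
    -o tsv |
  awk -F '\t' 'BEGIN { bad = 0 }
               $2 != "Healthy" { print $1 " " $2; bad = 1 }
               END { exit bad }'
}

# Usage:
#   unhealthy_backends eq-prod-resgroup eq-prod-appgw || echo "fix backends first"
```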
DNS Validation from Inside VNet
From a VM inside the VNet:
nslookup <containerapp-fqdn>
curl -vk https://<containerapp-fqdn>/healthz
Expected:
- Private IP resolution
- Successful HTTPS response
6. Common Causes of AppGW Unhealthy Backends
- Custom DNS servers not forwarding Azure private zones
- Wrong probe path or protocol
- NSG blocking AppGW subnet → ACA infra subnet
- Incorrect listener or backend settings
7. Ingress Model
- Container Apps: internal ingress only
- Public exposure: Application Gateway only
- No direct public access to ACA endpoints
8. Application Gateway --no-wait Behavior
The script 13_appgw_create.sh uses the flag:
--no-wait
This means the command returns immediately while Application Gateway is still provisioning in the background. If you continue to run:
14_appgw_config_backend.sh
15_appgw_url_path_map.sh
before provisioning finishes, the commands may fail or partially configure resources.
✅ Required Action
Always verify that App Gateway provisioning is complete before continuing:
az network application-gateway show -g eq-prod-resgroup -n eq-prod-appgw --query provisioningState -o tsv
Wait until it returns:
Succeeded
Only then continue with the next scripts.
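Rather than re-running the command by hand, the check can be polled. A minimal sketch built on the same show command; `wait_for_appgw` and its default of 60 tries at 30-second intervals are assumptions to tune for your environment.

```shell
# Poll provisioningState until it is Succeeded.
# Args: resource group, gateway name, max tries (default 60), delay seconds (default 30).
wait_for_appgw() {
  rg=$1; name=$2; tries=${3:-60}; delay=${4:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    state=$(az network application-gateway show -g "$rg" -n "$name" \
      --query provisioningState -o tsv 2>/dev/null)
    case "$state" in
      Succeeded) echo "Succeeded"; return 0 ;;
      Failed)    echo "Provisioning failed" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting; last state: ${state:-unknown}" >&2
  return 1
}

# Usage:
#   wait_for_appgw eq-prod-resgroup eq-prod-appgw && bash 14_appgw_config_backend.sh
```

An empty state (for example, while the resource is not yet visible) is treated as "keep waiting" rather than as a failure.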
⚠️ Common Symptoms if Skipped
- Resource not found errors
- Listener / frontend-port creation failures
- Backend pool conflicts
- Random intermittent CLI failures
9. Most Common Issues (So You’re Not Surprised)
❌ Azure CLI cannot login / random REST failures
Symptoms
- DNS resolution errors
- Timeouts to login.microsoftonline.com
- Intermittent REST failures
Root cause
- Local DNS misconfiguration
- Corporate proxy not configured
- Firewall blocking outbound 443 / 53
Fix
- Follow Sections 1–4 in this document to fix DNS, proxy, and outbound access.
❌ Application Gateway backends show Unhealthy
Symptoms
- Backend health shows Unhealthy, Unknown, or Timeout
- No traffic reaches Container Apps
Root cause
- DNS resolution failure from AppGW subnet
- Probe path mismatch (e.g., /healthz not implemented)
- Wrong protocol (HTTP vs HTTPS)
- NSG blocking traffic between AppGW subnet and ACA infra subnet
Fix
- Validate DNS from a VM inside the VNet.
- Confirm probe path exists on the app.
- Verify NSG rules allow traffic from the AppGW subnet to the ACA infra subnet.
- Check backend protocol and port.
❌ Private Endpoint stuck in Pending Approval
Symptoms
- Redis / Mongo private endpoint never becomes usable
- DNS resolves but connection fails
Root cause
- Private endpoint connection requires manual approval
- RBAC restrictions on resource owner
Fix
az network private-endpoint-connection list --id <resource-id>
az network private-endpoint-connection approve --id <connection-id>
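When several connections are pending, the two commands can be chained. This is a sketch under assumptions: `approve_pending_connections` is a hypothetical helper, the description text is a placeholder, and the query assumes each listed connection exposes properties.privateLinkServiceConnectionState.status. Verify the output shape for your resource type before relying on it.

```shell
# Approve every private endpoint connection on a resource that is still Pending.
# Arg: full resource ID of the target (for example the Redis or Mongo resource).
approve_pending_connections() {
  resource_id=$1
  az network private-endpoint-connection list --id "$resource_id" \
    --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id" \
    -o tsv |
  while IFS= read -r conn_id; do
    [ -n "$conn_id" ] || continue
    echo "Approving $conn_id"
    # </dev/null keeps the inner command from consuming the loop's input.
    az network private-endpoint-connection approve --id "$conn_id" \
      --description "Approved for EQ-PROD" </dev/null >/dev/null
  done
}

# Usage:
#   approve_pending_connections "<resource-id>"
```

Approval still requires sufficient RBAC on the target resource; the helper does not work around that.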
❌ Container App image pull failures
Symptoms
- Revisions fail to start
- Logs show ImagePullBackOff or authentication errors
Root cause
- Managed Identity not granted AcrPull
- Registry identity not configured on the app
Fix
az containerapp identity show -g <rg> -n <app>
az role assignment list --assignee <principalId> --all
az containerapp registry list -g <rg> -n <app>
Re-run:
11_identities_and_acr_pull.sh
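To confirm the role assignment specifically, the list command can be narrowed to AcrPull on the registry scope. A minimal sketch; `has_acrpull` is a hypothetical helper, and both arguments are values you look up yourself (the identity's principalId and the registry's resource ID).

```shell
# Return 0 when the principal holds AcrPull on the registry,
# either directly or inherited from a parent scope.
has_acrpull() {
  principal_id=$1; acr_id=$2
  count=$(az role assignment list --assignee "$principal_id" --scope "$acr_id" \
    --include-inherited \
    --query "length([?roleDefinitionName=='AcrPull'])" -o tsv) || return 1
  [ "${count:-0}" -gt 0 ]
}

# Usage:
#   has_acrpull "<principalId>" "<acr-resource-id>" || bash 11_identities_and_acr_pull.sh
```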
❌ App Gateway configuration scripts fail after creation
Symptoms
- Listener creation fails
- Frontend port already exists
- Random resource not found errors
Root cause
- App Gateway still provisioning (--no-wait was used)
Fix
Wait until provisioning completes:
az network application-gateway show -g eq-prod-resgroup -n eq-prod-appgw --query provisioningState -o tsv
Only proceed when status is Succeeded.
❌ WebSocket disconnects or drops unexpectedly
Symptoms
- WS connections drop after a few minutes
- Random reconnects
Root cause
- AppGW backend timeout too low
- Idle timeout mismatch
Fix
- Ensure WebSocket backend http-settings timeout is high (e.g., 1800s or more).
- Verify application keepalive behavior.
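The timeout can be raised with the http-settings update command. A sketch only: `set_backend_timeout` is a hypothetical wrapper, and the http-settings name is whatever your configuration script created (not stated in these notes).

```shell
# Set the request timeout (seconds) on one backend http-settings object.
# Args: resource group, gateway name, http-settings name, timeout (default 1800).
set_backend_timeout() {
  az network application-gateway http-settings update \
    -g "$1" --gateway-name "$2" -n "$3" --timeout "${4:-1800}"
}

# Usage (the settings name is a placeholder):
#   set_backend_timeout eq-prod-resgroup eq-prod-appgw "<ws-http-settings-name>" 1800
```

Raising the gateway timeout does not replace application-level keepalives; idle connections can still be dropped by other hops.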
10. Storage Share Creation Error: --auth-mode login
Symptom
unrecognized arguments: --auth-mode login
Root Cause
Older Azure CLI versions do not support --auth-mode login for:
az storage share create
Fix Applied
Script 07_storage_share.sh now:
- Retrieves the storage account key automatically:
az storage account keys list
- Uses:
--account-name
--account-key
for share creation and ACA environment attachment.
No CLI upgrade is required.
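The key-based flow described above looks roughly like the sketch below. It is not the content of 07_storage_share.sh; `create_share_with_key` is a hypothetical helper and it covers only share creation, not the ACA environment attachment.

```shell
# Create a file share using the storage account key instead of --auth-mode login.
# Args: resource group, storage account name, share name.
create_share_with_key() {
  rg=$1; account=$2; share=$3
  # Take the first account key; fail early if nothing comes back.
  key=$(az storage account keys list -g "$rg" -n "$account" \
    --query "[0].value" -o tsv) || return 1
  [ -n "$key" ] || { echo "No key returned for $account" >&2; return 1; }
  az storage share create --name "$share" \
    --account-name "$account" --account-key "$key" >/dev/null
}

# Usage:
#   create_share_with_key "<rg>" "<storage-account>" "<share-name>"
```

The key is held in a shell variable only; avoid echoing it or writing it to logs.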
Mongo vCore schema errors: missing highAvailability / storage type / password complexity
If you see errors like:
required property of 'highAvailability'
storage contained undefined properties: 'type'
Password must be between 8 & 256 characters, and contain 3 of the following...
It means:
- Your admin password doesn't meet Azure complexity requirements, and/or
- Your request body doesn't match the schema for the API version you're calling.
Fixes applied in 09_mongo_vcore.sh:
- Validates password complexity before calling Azure.
- Always includes highAvailability.targetMode (default: Disabled).
- Omits storage.type for older API versions (2024-07-01 / 2024-10-01-preview), and includes it only for 2025-* API versions.
You can tune HA with:
MONGO_HA_TARGET_MODE=Disabled|SameZone|ZoneRedundantPreferred
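A local password check avoids a failed API round trip. The sketch below is not the validation in 09_mongo_vcore.sh. The length bounds come from the error text above, but the error message is truncated, so the four character classes used here (lowercase, uppercase, digit, other) are an assumption. `password_ok` is a hypothetical name.

```shell
# Return 0 when the password is 8 to 256 characters long and uses
# at least 3 of 4 character classes (ASSUMED: lower, upper, digit, other).
password_ok() {
  pw=$1
  len=${#pw}
  [ "$len" -ge 8 ] && [ "$len" -le 256 ] || return 1
  classes=0
  case $pw in *[[:lower:]]*)  classes=$((classes + 1)) ;; esac
  case $pw in *[[:upper:]]*)  classes=$((classes + 1)) ;; esac
  case $pw in *[[:digit:]]*)  classes=$((classes + 1)) ;; esac
  case $pw in *[![:alnum:]]*) classes=$((classes + 1)) ;; esac
  [ "$classes" -ge 3 ]
}

# Usage:
#   password_ok "$MONGO_VCORE_ADMIN_PW" || { echo "weak password" >&2; exit 1; }
```

Under these assumed rules, a value made only of upper- and lowercase letters, such as the StrongPasswordHere placeholder, would be rejected.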
Mongo vCore: High Availability not available for 'M30' cluster tier
If you see:
High Availability not available for 'M30' cluster tier.
Fix:
- Do NOT send the highAvailability object at all for M30.
- In 00_config.sh, leave MONGO_HA_TARGET_MODE empty (default), so the script omits HA completely.
Example:
export MONGO_HA_TARGET_MODE=""
bash 09_mongo_vcore.sh
Mongo vCore: highAvailability required by schema (but you can still run "no HA")
Some API versions validate highAvailability as a required property. In those cases:
- Set highAvailability.targetMode to Disabled to indicate no HA.
For older preview schemas, nodeGroupSpecs[*].enableHa is required:
- Set it to false (no HA).
The script 09_mongo_vcore.sh now does this automatically for M30.
Config documentation
00_config.sh has become long. A dedicated reference is provided:
- CONFIG.md: documentation for configuration variables
Keep secrets out of 00_config.sh (use pipeline/env vars).
Mongo vCore: HA fields can be schema-required
Your tenant validated that:
- New-schema API versions (2024-07-01, 2024-10-01-preview, 2025-*) require highAvailability.
- Older preview schemas require nodeGroupSpecs[*].enableHa and diskSizeGB.
09_mongo_vcore.sh now:
- sends highAvailability.targetMode=Disabled (no HA) for new schema
- sends enableHa=false + diskSizeGB=<size> for old schema
This keeps the deployment no-HA, but passes schema validation.
File 12 error: (ContainerAppInvalidEnvVarName) template.containers[0].volumeMounts
If you see:
Env variable name 'template.containers[0].volumeMounts' contains invalid character
Cause:
az containerapp update does not support generic --set for template fields; it interprets the key as an env var name.
Fix (applied in v22):
- Use az resource update to set:
  - properties.template.volumes
  - properties.template.containers[0].volumeMounts
Domain configuration
Domain variables are centralized in 00_config.sh:
- APP_PUBLIC_DOMAIN
- APPGW_PUBLIC_FQDN
These are safe to keep as placeholders during infra bootstrap.