🏠 OMS to D365 Integration Overview
Complete guide to building an event-driven, cloud-native integration platform from Order Management System (OMS) to Dynamics 365 Finance & Operations via Azure Integration Services (AIS). This architecture prioritizes reliability, security, and operational excellence.
🎯 Key Highlights
📊 Big Picture Data Flow
Here's the complete journey of an order from OMS to D365:
📅 Processing Timeline
🏗️ Component Interaction Overview
⏱️ Why Every 4 Hours?
- Cost: Fewer Function 2 executions = lower compute costs and fewer Cosmos DB queries
- Throughput: Batching multiple orders into a single ZIP reduces D365 DIXF import overhead
- Timeliness: Orders reach D365 within ~4.25 hours in the worst case (up to 4h waiting for the next batch + 15min delivery offset), and in roughly half that on average; acceptable for most business processes
- Rate Limiting: Respects D365 API throttling limits; single ZIP upload per 4h window
- Idempotency: Clear processing windows reduce replay/duplicate risk
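To make the ~4.25-hour figure concrete, the C# sketch below computes when an order ingested at a given UTC time would reach D365. It assumes Function 2 fires on UTC 4-hour boundaries (the `0 0 */4 * * *` CRON used later) and that the Logic App runs 15 minutes after each boundary; it ignores Function 2's own run time and the D365 import duration. `DeliveryScheduleSketch` and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper: estimates delivery time under the "4h batch + 15 min offset" schedule.
public static class DeliveryScheduleSketch
{
    public static readonly TimeSpan BatchPeriod = TimeSpan.FromHours(4);
    public static readonly TimeSpan DeliveryOffset = TimeSpan.FromMinutes(15);

    // Next 4-hour boundary strictly after the given UTC time (00:00, 04:00, ... 20:00).
    // DateTime ticks count from midnight, 1 Jan 0001, and 4 h divides 24 h evenly,
    // so integer division on ticks lines up with the UTC clock boundaries.
    public static DateTime NextBatchRunUtc(DateTime ingestedUtc)
    {
        long period = BatchPeriod.Ticks;
        long next = (ingestedUtc.Ticks / period + 1) * period;
        return new DateTime(next, DateTimeKind.Utc);
    }

    // When the Logic App would pick up the ZIP that contains this order.
    public static DateTime DeliveryUtc(DateTime ingestedUtc) =>
        NextBatchRunUtc(ingestedUtc) + DeliveryOffset;

    // Time between ingestion and pickup by the Logic App.
    public static TimeSpan Latency(DateTime ingestedUtc) =>
        DeliveryUtc(ingestedUtc) - ingestedUtc;
}
```

An order ingested at 00:00:01 UTC is picked up at 04:15 (about 4 h 15 min later), while one ingested at 03:59 UTC goes out 16 minutes later, so ~4.25 hours is the upper bound rather than the typical wait.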
🔄 Processing Status States
Each order moves through these states in Cosmos DB:
- Pending: Order ingested by Function 1, awaiting batch processing
- Processing: Function 2 has claimed the order and is transforming it to the D365 schema
- Processed: ZIP created and uploaded to Blob Storage
- Failed: Error during transformation or upload; marked for manual review or retry
📈 Expected Performance
| Metric | Value | Notes |
|---|---|---|
| Ingestion Latency | < 1 second | Service Bus trigger fires instantly; Function 1 validates & writes to Cosmos |
| End-to-End Latency | Up to ~4.25 hours + D365 import time | Worst case: batch wait (up to 4h) + delivery offset (15min) + D365 processing (variable) |
| Throughput | 1,000+ orders/batch | ZIP size typically 2-5 MB for standard order volumes |
| DLQ Rate | < 0.1% | Only malformed events or network failures; good data quality assumed |
| Availability Target | 99.9% (three nines) | SLA: deliveries reach D365 within 30 minutes of the scheduled time |
🎓 What You'll Learn
By exploring this tutorial, you will understand:
- How to design event-driven architectures on Azure
- Service Bus dead-letter queue handling and recovery patterns
- Cosmos DB state machines and idempotent processing
- Logic App design for long-running orchestration with fault handling
- Bicep and Terraform IaC for 3-environment deployments
- Security best practices: Managed Identity, Key Vault, private endpoints
- Observability with Application Insights KQL queries and alerts
🏛️ System Architecture
High-Level Design
Layered architecture showing data flow from OMS source through D365 destination, with observability and security cross-cutting concerns.
Component Overview
Layer 1 — Event Capture: OMS publishes CloudEvents to an Event Grid custom topic every 4 hours. Event Grid subscriptions route them to a Service Bus topic, which provides a durable, reliable message queue.
Layer 2 — Ingestion: Azure Functions v4 (Function 1) binds to Service Bus subscription via ServiceBusTrigger. Validates JSON schema, checks for duplicates, and upserts order documents to Cosmos DB with status "Pending".
Layer 3 — State Store: Cosmos DB NoSQL container stores order state machine. Documents partitioned by orderId; session-level consistency; 30-day TTL for automatic cleanup.
Layer 4 — Transform: Function 2 runs on timer trigger (every 4 hours). Queries all "Pending" orders, claims them (status → "Processing"), transforms to D365 schema, and creates ZIP archive.
Layer 5 — Stage: ZIP files stored in Blob Storage container "oms-d365-payloads". Blob naming convention: oms_yyyyMMdd_HHmmss_guid.zip. Files retained for 90 days.
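Because the timestamp in the naming convention is zero-padded and ordered from year to second, sorting names lexicographically also sorts them chronologically. The Blob service returns listings in lexicographic name order, so if the Logic App connector preserves that order, the first entry of a listing is the oldest ZIP rather than the latest. The sketch below builds names in this format and picks the newest one explicitly; `PayloadBlobNames` is a hypothetical helper, not part of the tutorial's code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for the oms_yyyyMMdd_HHmmss_<id>.zip naming convention.
public static class PayloadBlobNames
{
    private const string Prefix = "oms_";
    private const string TimestampFormat = "yyyyMMdd_HHmmss";
    private const int StampLength = 15; // "yyyyMMdd_HHmmss" renders as 15 characters

    public static string Build(DateTimeOffset created, string batchId) =>
        $"{Prefix}{created.UtcDateTime.ToString(TimestampFormat, CultureInfo.InvariantCulture)}_{batchId}.zip";

    // Returns the UTC timestamp embedded in a blob name, or null if the name does not follow the convention.
    public static DateTime? TryGetTimestamp(string blobName)
    {
        if (!blobName.StartsWith(Prefix, StringComparison.Ordinal) ||
            !blobName.EndsWith(".zip", StringComparison.Ordinal) ||
            blobName.Length < Prefix.Length + StampLength)
        {
            return null;
        }

        string stamp = blobName.Substring(Prefix.Length, StampLength);
        return DateTime.TryParseExact(stamp, TimestampFormat, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out var ts)
            ? ts
            : (DateTime?)null;
    }

    // Picks the newest well-formed name; names that do not match the convention are ignored.
    public static string? SelectLatest(IEnumerable<string> blobNames) =>
        blobNames
            .Where(n => TryGetTimestamp(n).HasValue)
            .OrderByDescending(n => TryGetTimestamp(n).Value)
            .FirstOrDefault();
}
```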
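Layer 6 — Deliver: Logic App Standard runs every 4 hours, 15 minutes after each Function 2 run. It lists blobs, downloads the ZIP, uploads it through the D365 connector, triggers the DIXF import job, polls until the job finishes, and logs the outcome to Application Insights. Orders are already marked "Processed" by Function 2 once their ZIP has been uploaded.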
Destination — D365 F&O: Dynamics 365 receives ZIP package via DIXF (Data Import Export Framework). Executes standard import job for SalesOrderHeadersV2 entity. Orders appear in D365 Sales Order module.
Observability & Security: Application Insights collects structured telemetry from the Functions and the Logic App. Key Vault holds all connection strings, API keys, and D365 service principal credentials.
Low-Level Design
Detailed component architecture showing service SKUs, configuration parameters, and inter-service communication details.
Key Configuration Details
- Service Bus: Standard SKU (not Premium) — cost-effective for < 5M messages/month; partition support for scalability; duplicate detection prevents replay within 10-minute window
- Cosmos DB: Session consistency (not Strong): better performance (< 50ms p99 latency); each client reads its own writes, while other clients may briefly see slightly older data within the single region
- Function App: Consumption plan = pay-per-execution; .NET 8 isolated worker process for better isolation; Managed Identity eliminates connection string storage
- Logic App Standard: Supports VNet integration and private endpoints natively (the Integration Service Environment, ISE, was retired in August 2024); preferred over Consumption for D365 connector stability
- Blob Storage: ZRS replication in UAT/Prod for HA; Cool tier after 30 days reduces storage cost by 50%
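Duplicate detection only remembers MessageIds for the configured window, which is why Function 1 still performs its own Cosmos DB idempotency check. The in-memory model below illustrates that time-bounded behaviour; it is not the Service Bus implementation. `DuplicateDetectionWindowSketch` is hypothetical, and the sketch assumes the publisher sets a stable MessageId (for example, one derived from the orderId) on every replay.

```csharp
using System;
using System.Collections.Generic;

// In-memory model of a time-bounded duplicate-detection window keyed by MessageId.
public sealed class DuplicateDetectionWindowSketch
{
    private readonly TimeSpan _window;
    private readonly Dictionary<string, DateTimeOffset> _firstSeen = new();

    public DuplicateDetectionWindowSketch(TimeSpan window) => _window = window;

    // Returns true if the message is accepted, false if it is discarded as a duplicate.
    public bool TryAccept(string messageId, DateTimeOffset now)
    {
        if (_firstSeen.TryGetValue(messageId, out var seenAt) && now - seenAt < _window)
        {
            return false; // same MessageId inside the window: dropped
        }

        _firstSeen[messageId] = now; // new ID, or the earlier record has aged out of the window
        return true;
    }
}
```

With a 10-minute window, a replay 5 minutes later is dropped, but the same order replayed 11 minutes later is accepted again and has to be caught downstream.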
🔀 Complete Data Flow
Step-by-step sequence of events showing how an order moves through the system.
Sequence Diagram: End-to-End Order Processing
Data Flow Diagram — Functional Processes (DFD Level 0)
DFD Level 1 — Internal Processes
Order State Machine Transitions
Event Schema (CloudEvent → D365)
Input Event (from OMS):
{
"specversion": "1.0",
"type": "OMS.Order.Created",
"source": "https://oms.contoso.com",
"id": "12345-67890",
"time": "2026-03-14T08:00:00Z",
"datacontenttype": "application/json",
"data": {
"orderId": "ORD-20260314-001",
"customerId": "CUST-12345",
"orderDate": "2026-03-14",
"totalAmount": 5000.00,
"currency": "USD",
"lineItems": [
{
"itemNumber": 1,
"productCode": "PROD-ABC",
"quantity": 10,
"unitPrice": 500.00
}
],
"shippingAddress": {
"street": "123 Main St",
"city": "Seattle",
"state": "WA",
"zip": "98101",
"country": "US"
}
}
}
Cosmos DB Document (after ingestion):
{
"id": "ORD-20260314-001",
"orderId": "ORD-20260314-001",
"processingStatus": "Pending",
"omsEvent": { /* full CloudEvent data */ },
"ingestedAt": "2026-03-14T08:00:15Z",
"pickedUpAt": null,
"processedAt": null,
"batchId": null,
"blobReference": null,
"retryCount": 0,
"ttl": 2592000
}
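In the sample event, orderId sits inside the CloudEvent's `data` object rather than on the envelope, and Cosmos DB reads per-item expiry from a property named `ttl` (honoured only when TTL is enabled on the container). The sketch below shows one way to model the envelope and derive the "Pending" document; all record and method names are hypothetical, and only the fields used here are modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical models for the sample CloudEvent; property names mirror the JSON (matched case-insensitively).
public sealed record LineItem(int ItemNumber, string ProductCode, int Quantity, decimal UnitPrice);
public sealed record ShippingAddress(string Street, string City, string State, string Zip, string Country);
public sealed record OrderData(
    string OrderId, string CustomerId, string OrderDate, decimal TotalAmount,
    string Currency, List<LineItem> LineItems, ShippingAddress? ShippingAddress);
public sealed record CloudEvent<T>(
    string SpecVersion, string Type, string Source, string Id, DateTimeOffset Time, T Data);

// Shape of the Cosmos DB item; "ttl" is the per-item time-to-live property Cosmos DB honours.
public sealed record OrderDocument(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("orderId")] string OrderId,
    [property: JsonPropertyName("processingStatus")] string ProcessingStatus,
    [property: JsonPropertyName("omsEvent")] CloudEvent<OrderData> OmsEvent,
    [property: JsonPropertyName("ingestedAt")] DateTimeOffset IngestedAt,
    [property: JsonPropertyName("retryCount")] int RetryCount,
    [property: JsonPropertyName("ttl")] int Ttl);

public static class IngestionMappingSketch
{
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };
    private const int ThirtyDaysInSeconds = 30 * 24 * 60 * 60; // 2,592,000

    public static CloudEvent<OrderData> Parse(string json) =>
        JsonSerializer.Deserialize<CloudEvent<OrderData>>(json, Options)
        ?? throw new JsonException("Event payload was null");

    // id and orderId both come from the nested data object.
    public static OrderDocument ToPendingDocument(CloudEvent<OrderData> evt, DateTimeOffset now) =>
        new(evt.Data.OrderId, evt.Data.OrderId, "Pending", evt, now, 0, ThirtyDaysInSeconds);
}
```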
🧩 Why Azure? Component Decisions
Decision matrix showing component selection rationale and alternatives considered.
Component Selection Matrix
| Component | Why Chosen | Alternatives Considered | Decision Criteria |
|---|---|---|---|
| Event Grid Custom Topic | Push-based, CloudEvents spec, high publish throughput (per-topic limits apply), SAS + Managed Identity auth, built-in subscription routing, no server management | Apache Kafka, Event Hubs, direct HTTP webhooks, SNS (AWS) | Native Azure, serverless, low cost for < 1M events, automatic subscription routing, CloudEvents standard compliance |
| Service Bus Standard | Reliable at-least-once delivery, peek-lock semantics, built-in DLQ, session support, default MaxDeliveryCount of 10, duplicate detection, FIFO ordering (with sessions) | Storage Queue, Event Hubs, RabbitMQ, Amazon SQS | Enterprise-grade messaging, DLQ critical for resilience, automatic retry + dead-lettering, cost-effective for intermittent load |
| Azure Functions v4 .NET 8 | Serverless (Consumption plan), no idle cost, scales to zero and out automatically with load, isolated worker process, built-in Service Bus trigger, native Managed Identity support | App Service, Container Apps, Durable Functions, AWS Lambda | Minimal ops overhead, scales with message volume, excellent .NET integration, automatic trigger binding for Service Bus |
| Cosmos DB NoSQL | Schema-flexible for OMS variant structures, partitioning by orderId spreads writes evenly, idempotent upsert, single-digit-millisecond p99 reads, TTL for auto-cleanup, global distribution option | Azure SQL, Table Storage, MongoDB, PostgreSQL | Order documents vary in structure; SQL would require a schema migration per OMS change. Upsert operation = built-in idempotency. Session consistency is sufficient. |
| Blob Storage | Cheap binary storage (about $0.01/GB/month in the Cool tier), optional immutability (WORM) policies, streaming upload, SAS URLs for D365 access, lifecycle policies for 90-day retention, ZRS replication | Azure Files, Data Lake, database BLOB column, SFTP server | ZIP packages are inherently binary; Blob Storage is designed for this. Cost << SQL storage. SAS URLs are secure, time-limited download links. No server to manage. |
| Logic App Standard | Low-code D365 connector (built-in), visual workflow designer, retry policies per action, scope-based exception handling, Managed Identity support, VNet integration for private networking | API Management + custom code, AWS Step Functions, Zapier, Power Automate Cloud | Out-of-box D365 F&O connector saves 3+ weeks of dev time. Visual design reduces bugs. Exception scopes map cleanly to business logic. |
| Application Insights | Native Azure telemetry SDK, structured logging, KQL query language, auto-correlation across services, alert rules, built-in Functions integration, 90-day default retention, Log Analytics integration | Datadog, Splunk, New Relic, CloudWatch (AWS) | Minimal instrumentation effort in Azure. KQL is powerful. Correlation IDs auto-tracked. Workbooks and dashboards for SLA tracking. No vendor lock-in (data exportable). |
| Key Vault | Centralized secret store, RBAC-based access (no shared keys), Managed Identity auto-auth, audit logging, optional hardware security module (HSM), secret rotation automation, soft-delete recovery | App Configuration, environment variables, AWS Secrets Manager, HashiCorp Vault | Zero secrets in code/configs. Managed Identity = no passwords to rotate manually. RBAC = principle of least privilege. HSM option for compliance (PCI-DSS, HIPAA). |
Design Decision Narratives
📨 Why NOT Premium Service Bus?
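Premium would add:
- 99.95% SLA vs 99.9% for Standard
- Dedicated capacity (messaging units) for predictable throughput
- VNet integration and private endpoints
Standard is sufficient here because:
- 4-hour batch cycle = ~6,250 messages/day max = well below Standard limits
- 99.9% SLA is acceptable for order processing (not real-time payments)
- Private endpoints are Premium-only, so Standard relies on Managed Identity and RBAC for access control; a hard private-networking requirement would be the trigger to upgrade
- An upgrade path exists if load increases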
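The SLA gap is easier to judge in minutes. Over a 30-day month, 99.9% availability allows 43.2 minutes of unavailability and 99.95% allows 21.6 minutes; with a 4-hour batch cadence and Service Bus buffering, an outage of that size delays orders rather than losing them. A worked calculation (the helper name is hypothetical):

```csharp
using System;

// Worked arithmetic: downtime permitted by an availability percentage over a period.
public static class SlaBudgetSketch
{
    public static decimal AllowedDowntimeMinutes(decimal availabilityPercent, int periodDays)
    {
        decimal periodMinutes = periodDays * 24m * 60m;             // 30 days = 43,200 minutes
        return (100m - availabilityPercent) / 100m * periodMinutes; // unavailable fraction x period
    }
}
```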
🌐 Why Session Consistency (not Strong) in Cosmos DB?
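- Strong: higher latency (reads must be confirmed by a quorum of replicas), and reads cost roughly twice the RUs of Session
- Session: p99 latency < 50ms, optimal RU cost
- Single-region deployment = fast replica propagation
- Function 1 writes and reads within its own session, so it always reads its own writes
- Function 2 runs in a different session, so its "Pending" query sees consistent-prefix (eventually consistent) data; acceptable given the 4h batch delay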
- No cross-partition transactions needed
⚡ Why /orderId as Partition Key (not /processingStatus)?
- /processingStatus: Only 4 values → ALL "Pending" orders on same partition → hot partition → throttling (429 errors)
- /orderId: Millions of unique values → even distribution → no hot partition
- Function 1 writes per-order = natural distribution
- Function 2 cross-partition query (status=Pending) only runs 6x/day (acceptable)
- Each order's operations stay on same partition (better cache locality)
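The hot-partition argument can be seen with a toy model: hash each document's partition-key value into a fixed number of buckets and count. Keyed by orderId, 1,000 sample orders spread across every bucket; keyed by processingStatus, every "Pending" order lands in the same one. SHA-256 and the bucket count are stand-ins chosen for illustration; Cosmos DB uses its own hash function and manages physical partitions itself.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Illustration only: buckets partition-key values with SHA-256 (not Cosmos DB's real
// partitioning hash) to show how key cardinality drives the spread of documents.
public static class PartitionSpreadSketch
{
    public static int Bucket(string partitionKeyValue, int physicalPartitions)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(partitionKeyValue));
        uint value = BitConverter.ToUInt32(hash, 0);
        return (int)(value % (uint)physicalPartitions);
    }

    // Counts how many documents land in each bucket for a given partition-key selector.
    public static int[] Distribution<T>(IEnumerable<T> docs, Func<T, string> keySelector, int physicalPartitions)
    {
        var counts = new int[physicalPartitions];
        foreach (var doc in docs)
        {
            counts[Bucket(keySelector(doc), physicalPartitions)]++;
        }
        return counts;
    }
}
```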
🔐 Why Managed Identity over Connection Strings?
- Connection String Model: "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..." stored in Key Vault → retrieved at runtime → risk of exposure in memory/logs
- Managed Identity Model: Function App has an identity → RBAC grants the "Azure Service Bus Data Receiver" role → no secret ever stored
- Microsoft Entra access tokens are short-lived and refreshed automatically by the Azure SDK
- No static credentials = no rotation overhead
- Audit logs show which identity accessed what (fine-grained accountability)
- Complies with zero-trust security model
⏰ Why Logic App Standard over Consumption?
- Consumption (multi-tenant): Cheaper per execution, fully serverless, but subject to cold starts and D365 connector latency spikes
- Standard (single-tenant, Workflow Standard hosting plan): Fixed baseline cost for the plan, but warm instances, more predictable latency, and better suited to regulated organizations
- D365 connector calls are latency-sensitive; cold-start delays add to calls that must complete within D365 API timeouts
- Built-in VNet integration and private endpoints enable private networking (ISE was retired in August 2024)
- Predictable monthly cost (no surprise execution fees)
- Better monitoring and debugging experience
⚡ Event Grid & Secure Event Publishing
Event Grid Custom Topic endpoint security and 4-hour push cycle rationale.
Securing the Event Grid Endpoint
Five controls, combining authentication and network restrictions, protect the Event Grid custom topic from unauthorized publishers:
- Microsoft Entra ID: OMS app registration with the "EventGrid Data Sender" role; token-based (roughly 1-hour token lifetime)
- Access key / SAS: topic access key in the aeg-sas-key header, or a SAS token in the aeg-sas-token header; long-lived (rotate every 90 days)
- Private Endpoint: For on-prem OMS via ExpressRoute; traffic stays on the Azure backbone
- IP Firewall: Restrict inbound traffic to OMS's known IP ranges
- Managed Identity: If OMS runs in Azure, its managed identity obtains Entra tokens automatically (no secret to store)
Authentication Flow: OMS to Event Grid
Event Grid Security Layers (Defence-in-Depth)
CloudEvents Payload Format
OMS publishes CloudEvents (CNCF standard) batched in HTTP POST every 4 hours:
[
{
"specversion": "1.0",
"type": "OMS.Order.Created",
"source": "https://oms.contoso.com",
"id": "order-uuid-1234",
"time": "2026-03-14T12:00:00Z",
"datacontenttype": "application/json",
"data": {
"orderId": "ORD-001",
"customerId": "CUST-001",
"totalAmount": 5000.00,
"lineItems": []
}
}
]
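On the publisher side, a CloudEvents batch is one HTTP POST of a JSON array with the media type `application/cloudevents-batch+json`. The sketch below only builds the request (it does not send it) and uses the `aeg-sas-key` header for key authentication; with Entra ID you would send a bearer token instead. `EventGridPublishSketch` is hypothetical, and the endpoint and key are placeholders for your topic's values.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Builds (but does not send) the HTTP request for a CloudEvents batch using key authentication.
public static class EventGridPublishSketch
{
    public static HttpRequestMessage BuildBatchRequest(Uri topicEndpoint, string sasKey, object[] cloudEvents)
    {
        // Serializes the events as a JSON array, using each element's runtime type.
        string body = JsonSerializer.Serialize(cloudEvents);
        var request = new HttpRequestMessage(HttpMethod.Post, topicEndpoint)
        {
            // StringContent appends "; charset=utf-8" to the media type.
            Content = new StringContent(body, Encoding.UTF8, "application/cloudevents-batch+json")
        };
        request.Headers.Add("aeg-sas-key", sasKey);
        return request;
    }
}
```

Send the result with a shared `HttpClient`; a 200 response means Event Grid accepted the batch, not that the subscription has delivered it to Service Bus.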
⏱️ Why Every 4 Hours?
| Factor | 4-Hour Window |
|---|---|
| Cost | 6 Function 2 runs/day; low compute cost |
| E2E Latency | Up to ~4.25 hours (batch wait + 15min offset); acceptable for order processing |
| Batch Size | 50-500 orders typical; 1-2 MB ZIP |
| D365 Throttling | 1 ZIP upload every 4h; well below 200/min import limit |
| Idempotency | Clear window boundaries make dedup logic simple |
📨 Service Bus & Dead Letter Queue
At-least-once delivery, peek-lock semantics, and automatic dead-lettering for failed messages.
Service Bus Configuration
| Component | Configuration | Rationale |
|---|---|---|
| Namespace SKU | Standard | Cost-effective; the base charge includes a monthly operations allowance well above this workload's volume |
| Topic | Partitioned; TTL 14 days; duplicate detection 10 min | Partitioning scales throughput; duplicate detection prevents replay |
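| Subscription | maxDeliveryCount=10; lockDuration=5 min; dead-lettering on expiration enabled | Up to 10 delivery attempts; an abandoned message is redelivered immediately, while an expired lock (e.g. a crashed instance) delays redelivery by up to 5 min; messages that exhaust their attempts are dead-lettered |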
Service Bus Namespace Topology
Dead Letter Queue (DLQ) Flow
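Messages are dead-lettered when:
- JSON parsing fails or the payload is null → Function 1 dead-letters the message immediately (reason JsonDeserialiseFailure or NullPayload)
- Validation fails → Function 1 dead-letters the message immediately (reason ValidationFailure)
- A Cosmos DB read or write fails → Function 1 abandons the message; after 10 delivery attempts Service Bus dead-letters it (reason MaxDeliveryCountExceeded)
- The message outlives its 14-day TTL → Service Bus dead-letters it, provided dead-lettering on message expiration is enabled on the subscription (otherwise the expired message is discarded)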
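The transient-failure path relies on the broker's delivery counter rather than a retry loop inside Function 1. Because an abandoned message becomes available again immediately, ten attempts can be used up quickly during a short Cosmos DB outage. The simulation below models that broker-side loop; the enum, class, and outcome strings are hypothetical, and only the `MaxDeliveryCountExceeded` reason mirrors what Service Bus records.

```csharp
using System;

public enum Settlement { Complete, Abandon, DeadLetter }

// Simulates the broker: each abandon triggers a redelivery until MaxDeliveryCount is reached.
public static class DeliveryLoopSketch
{
    public const string MaxDeliveryReason = "MaxDeliveryCountExceeded";

    // handler receives the 1-based delivery count and returns how the function settled the message.
    public static (string Outcome, int Attempts) Run(Func<int, Settlement> handler, int maxDeliveryCount = 10)
    {
        for (int deliveryCount = 1; deliveryCount <= maxDeliveryCount; deliveryCount++)
        {
            switch (handler(deliveryCount))
            {
                case Settlement.Complete:
                    return ("Completed", deliveryCount);
                case Settlement.DeadLetter:
                    return ("DeadLettered:HandlerRequested", deliveryCount);
                case Settlement.Abandon:
                    continue; // lock released; the message is delivered again straight away
            }
        }
        // Every attempt was abandoned: the broker moves the message to the DLQ.
        return ($"DeadLettered:{MaxDeliveryReason}", maxDeliveryCount);
    }
}
```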
DLQ Message Processing Flow
DLQ Monitoring Alert
Managed Identity Authentication
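Function App accesses Service Bus via Managed Identity (no connection strings). The trigger's Connection value names an app-setting prefix: defining ServiceBusConnection__fullyQualifiedNamespace (e.g. <namespace>.servicebus.windows.net) instead of a connection string makes the Functions host authenticate with the app's managed identity, which must hold the "Azure Service Bus Data Receiver" role:
[Function("OmsOrderIngestion")]
public Task Run(
[ServiceBusTrigger("oms-orders-topic", "oms-d365-subscription", Connection = "ServiceBusConnection")]
ServiceBusReceivedMessage message,
FunctionContext context)
{
// App setting ServiceBusConnection__fullyQualifiedNamespace => identity-based connection,
// so no connection string is stored in configuration
return Task.CompletedTask;
}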
⚡ Azure Functions: Ingestion and Transformation
Two serverless functions handle real-time validation, state management, and batch transformation with comprehensive exception handling.
Function 1: OmsOrderIngestion
Trigger: Azure Service Bus (ServiceBusTrigger) | Runtime: .NET 8 Isolated | Timeout: 5 minutes
Processing Flow Diagram
Full C# Implementation
using Azure.Messaging.ServiceBus;
using FunctionApp.OmsIntegration.Models;
using FunctionApp.OmsIntegration.Services;
using FunctionApp.OmsIntegration.Validators;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using System.Text.Json;
namespace FunctionApp.OmsIntegration.Functions;
public class OmsOrderIngestionFunction
{
private readonly ICosmosDbService _cosmosDbService;
private readonly ILogger<OmsOrderIngestionFunction> _logger;
private static readonly JsonSerializerOptions _jsonOptions = new()
{
PropertyNameCaseInsensitive = true
};
public OmsOrderIngestionFunction(
ICosmosDbService cosmosDbService,
ILogger<OmsOrderIngestionFunction> _logger)
{
_cosmosDbService = cosmosDbService;
this._logger = _logger;
}
[Function(nameof(OmsOrderIngestionFunction))]
public async Task Run(
[ServiceBusTrigger(
topicName: "%ServiceBusTopicName%",
subscriptionName: "%ServiceBusSubscriptionName%",
Connection = "ServiceBusConnection",
AutoCompleteMessages = false)] // messages are settled explicitly through messageActions below
ServiceBusReceivedMessage message,
ServiceBusMessageActions messageActions)
{
var correlationId = message.CorrelationId ?? Guid.NewGuid().ToString();
using var logScope = _logger.BeginScope(new Dictionary<string, object>
{
["CorrelationId"] = correlationId,
["MessageId"] = message.MessageId,
["DeliveryCount"] = message.DeliveryCount,
["EnqueuedTime"] = message.EnqueuedTime
});
_logger.LogInformation("OmsOrderIngestionFunction started. MessageId={MessageId}", message.MessageId);
// STEP 1: Deserialise JSON
OmsOrderEvent? omsEvent;
try
{
omsEvent = JsonSerializer.Deserialize<OmsOrderEvent>(
message.Body.ToString(), _jsonOptions);
}
catch (JsonException ex)
{
_logger.LogError(ex, "JSON deserialisation failed for MessageId={MessageId}", message.MessageId);
await messageActions.DeadLetterMessageAsync(
message,
deadLetterReason: "JsonDeserialiseFailure",
deadLetterErrorDescription: ex.Message);
return;
}
if (omsEvent is null)
{
await messageActions.DeadLetterMessageAsync(message,
deadLetterReason: "NullPayload",
deadLetterErrorDescription: "Deserialised event was null");
return;
}
// STEP 2: Validate business rules
var validationResult = OmsOrderValidator.Validate(omsEvent);
if (!validationResult.IsValid)
{
_logger.LogWarning("Validation failed for OrderId={OrderId}: {Errors}",
omsEvent.OrderId, string.Join("; ", validationResult.Errors));
await messageActions.DeadLetterMessageAsync(message,
deadLetterReason: "ValidationFailure",
deadLetterErrorDescription: string.Join("; ", validationResult.Errors));
return;
}
// STEP 3: Idempotency check
OmsOrderDocument? existing = null;
try
{
existing = await _cosmosDbService.GetOrderByIdAsync(omsEvent.OrderId);
}
catch (Exception ex)
{
_logger.LogError(ex, "Cosmos DB read failed for OrderId={OrderId}", omsEvent.OrderId);
await messageActions.AbandonMessageAsync(message);
return;
}
if (existing is not null)
{
_logger.LogInformation("Duplicate: OrderId={OrderId} already in Cosmos DB. Completing.", omsEvent.OrderId);
await messageActions.CompleteMessageAsync(message);
return;
}
// STEP 4: Persist to Cosmos DB
var document = new OmsOrderDocument
{
Id = omsEvent.OrderId,
OrderId = omsEvent.OrderId,
ProcessingStatus = "Pending",
OmsEvent = omsEvent,
IngestedAt = DateTimeOffset.UtcNow,
RetryCount = 0
};
try
{
await _cosmosDbService.UpsertOrderAsync(document);
_logger.LogInformation("Ingested OrderId={OrderId} with status=Pending", omsEvent.OrderId);
await messageActions.CompleteMessageAsync(message);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to upsert OrderId={OrderId}. Abandoning.", omsEvent.OrderId);
await messageActions.AbandonMessageAsync(message);
}
}
}
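Function 1 calls `OmsOrderValidator.Validate`, whose rules the tutorial does not list. As an illustration only, here is a sketch of what such a validator might check for the sample event; it collects every failure so the dead-letter description lists them all. The rules (including "totalAmount equals the sum of the lines") and all type names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal input shapes for the sketch (hypothetical; the real OmsOrderEvent model is not shown).
public sealed record OrderLine(int ItemNumber, string ProductCode, int Quantity, decimal UnitPrice);
public sealed record OrderToValidate(
    string OrderId, string CustomerId, string Currency, decimal TotalAmount, IReadOnlyList<OrderLine> LineItems);

public sealed record ValidationResult(IReadOnlyList<string> Errors)
{
    public bool IsValid => Errors.Count == 0;
}

// Hypothetical business rules; replace them with the rules your OMS contract defines.
public static class OrderValidatorSketch
{
    public static ValidationResult Validate(OrderToValidate order)
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(order.OrderId)) errors.Add("orderId is required");
        if (string.IsNullOrWhiteSpace(order.CustomerId)) errors.Add("customerId is required");
        if (order.Currency is null || order.Currency.Length != 3) errors.Add("currency must be a 3-letter code");

        if (order.LineItems is null || order.LineItems.Count == 0)
        {
            errors.Add("at least one line item is required");
        }
        else
        {
            foreach (var line in order.LineItems.Where(l => l.Quantity <= 0))
                errors.Add($"line {line.ItemNumber}: quantity must be positive");

            // Header total must equal the sum of quantity x unit price across lines.
            decimal lineTotal = order.LineItems.Sum(l => l.Quantity * l.UnitPrice);
            if (lineTotal != order.TotalAmount)
                errors.Add($"totalAmount {order.TotalAmount} does not match line total {lineTotal}");
        }

        return new ValidationResult(errors);
    }
}
```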
Function 2: OmsTimerTransform
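Trigger: Timer CRON (0 0 */4 * * *) | Runtime: .NET 8 Isolated | Timeout: 30 minutes (requires a hosting plan that allows it, such as Flex Consumption, Premium, or Dedicated; the classic Consumption plan caps function timeout at 10 minutes)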
Processing Flow Diagram
ZIP Package Contents
- header.json: Package envelope with batch metadata and timestamp
- package.yaml: DIXF manifest listing entities and import sequence
- sales_orders.json: Transformed D365 SalesOrderHeadersV2 records
- sales_order_lines.json: Transformed D365 SalesOrderLinesV2 records
Full C# Implementation
using FunctionApp.OmsIntegration.Mappers;
using FunctionApp.OmsIntegration.Models;
using FunctionApp.OmsIntegration.Services;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using System.IO.Compression;
using System.Text;
using System.Text.Json;
namespace FunctionApp.OmsIntegration.Functions;
public class OmsTimerTransformFunction
{
private readonly ICosmosDbService _cosmosDbService;
private readonly IBlobStorageService _blobStorageService;
private readonly ILogger<OmsTimerTransformFunction> _logger;
private static readonly JsonSerializerOptions _jsonOptions = new()
{
WriteIndented = true,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};
public OmsTimerTransformFunction(
ICosmosDbService cosmosDbService,
IBlobStorageService blobStorageService,
ILogger<OmsTimerTransformFunction> _logger)
{
_cosmosDbService = cosmosDbService;
_blobStorageService = blobStorageService;
this._logger = _logger;
}
// CRON: fires at 00:00, 04:00, 08:00, 12:00, 16:00, 20:00 UTC
[Function(nameof(OmsTimerTransformFunction))]
public async Task Run([TimerTrigger("0 0 */4 * * *")] TimerInfo timerInfo)
{
var batchId = Guid.NewGuid().ToString("N")[..8];
var startTime = DateTimeOffset.UtcNow;
_logger.LogInformation("OmsTimerTransformFunction started. BatchId={BatchId}", batchId);
// STEP 1: Query pending orders
var pendingOrders = await _cosmosDbService.GetPendingOrdersAsync();
if (!pendingOrders.Any())
{
_logger.LogInformation("No pending orders found. Exiting. BatchId={BatchId}", batchId);
return;
}
_logger.LogInformation("Found {Count} pending orders. BatchId={BatchId}", pendingOrders.Count, batchId);
// STEP 2: IDEMPOTENCY — claim records immediately
await _cosmosDbService.ClaimOrdersForProcessingAsync(pendingOrders, batchId);
// STEP 3: Map OMS orders to D365 schema
var salesOrders = pendingOrders.Select(o => OmsToD365Mapper.MapHeader(o.OmsEvent)).ToList();
var salesOrderLines = pendingOrders.SelectMany(o => OmsToD365Mapper.MapLines(o.OmsEvent)).ToList();
// STEP 4: Build ZIP archive
var blobName = $"oms_{startTime:yyyyMMdd_HHmmss}_{batchId}.zip";
using var zipStream = new MemoryStream();
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
{
// header.json
var header = new PackageHeader
{
BatchId = batchId,
BatchSize = pendingOrders.Count,
CreatedAt = startTime,
ExportedBy = "OmsTimerTransformFunction"
};
AddJsonEntry(archive, "header.json", header);
// package.yaml
AddTextEntry(archive, "package.yaml",
"Name: OMS-D365-Integration\n" +
"Entities:\n" +
" - name: SalesOrderHeadersV2\n" +
" file: sales_orders.json\n" +
" - name: SalesOrderLinesV2\n" +
" file: sales_order_lines.json\n");
// sales_orders.json
AddJsonEntry(archive, "sales_orders.json", salesOrders);
// sales_order_lines.json
AddJsonEntry(archive, "sales_order_lines.json", salesOrderLines);
}
// STEP 5: Upload ZIP to Blob Storage
zipStream.Position = 0;
try
{
await _blobStorageService.UploadAsync(blobName, zipStream);
_logger.LogInformation("ZIP uploaded: {BlobName}. BatchId={BatchId}", blobName, batchId);
}
catch (Exception ex)
{
_logger.LogError(ex, "Upload failed for BatchId={BatchId}. Marking orders Failed.", batchId);
await _cosmosDbService.MarkOrdersFailedAsync(pendingOrders.Select(o => o.OrderId), batchId);
throw;
}
// STEP 6: IDEMPOTENCY — mark orders as processed
await _cosmosDbService.MarkOrdersProcessedAsync(
pendingOrders.Select(o => o.OrderId),
batchId,
blobReference: blobName);
var duration = DateTimeOffset.UtcNow - startTime;
_logger.LogInformation(
"TransformSuccess. BatchId={BatchId} BatchSize={BatchSize} Duration={Duration}ms BlobName={BlobName}",
batchId, pendingOrders.Count, (int)duration.TotalMilliseconds, blobName);
}
// UTF-8 without a byte-order mark: Encoding.UTF8 makes StreamWriter emit a BOM, which strict JSON parsers can reject
private static readonly UTF8Encoding Utf8NoBom = new(encoderShouldEmitUTF8Identifier: false);
private static void AddJsonEntry<T>(ZipArchive archive, string fileName, T obj)
{
var entry = archive.CreateEntry(fileName, CompressionLevel.Optimal);
using var writer = new StreamWriter(entry.Open(), Utf8NoBom);
writer.Write(JsonSerializer.Serialize(obj, _jsonOptions));
}
private static void AddTextEntry(ZipArchive archive, string fileName, string content)
{
var entry = archive.CreateEntry(fileName, CompressionLevel.Optimal);
using var writer = new StreamWriter(entry.Open(), Utf8NoBom);
writer.Write(content);
}
}
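Function 2 delegates to `OmsToD365Mapper.MapHeader` and `MapLines`, which are not shown. One possible mapping is sketched below, assuming each row is serialized as a JSON object whose keys are D365 entity field names. The field names, the `dataAreaId` legal-entity value, and the record types are assumptions to verify against the SalesOrderHeadersV2 and SalesOrderLinesV2 metadata in your environment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input shapes mirroring the sample CloudEvent "data" object.
public sealed record OmsLine(int ItemNumber, string ProductCode, int Quantity, decimal UnitPrice);
public sealed record OmsOrder(string OrderId, string CustomerId, string Currency, IReadOnlyList<OmsLine> LineItems);

// Rows keyed by assumed D365 entity field names; verify against your entity metadata.
public static class D365MapperSketch
{
    public static Dictionary<string, object> MapHeader(OmsOrder o, string dataAreaId) => new()
    {
        ["dataAreaId"] = dataAreaId,                  // legal entity the order belongs to
        ["SalesOrderNumber"] = o.OrderId,
        ["OrderingCustomerAccountNumber"] = o.CustomerId,
        ["CurrencyCode"] = o.Currency,
    };

    // One line row per OMS line item, linked to the header by SalesOrderNumber.
    public static IEnumerable<Dictionary<string, object>> MapLines(OmsOrder o, string dataAreaId) =>
        o.LineItems.Select(l => new Dictionary<string, object>
        {
            ["dataAreaId"] = dataAreaId,
            ["SalesOrderNumber"] = o.OrderId,
            ["LineNumber"] = l.ItemNumber,
            ["ItemNumber"] = l.ProductCode,
            ["OrderedSalesQuantity"] = l.Quantity,
            ["SalesPrice"] = l.UnitPrice,
        });
}
```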
🔗 Logic App Workflow: D365 Delivery Orchestration
Low-code workflow that schedules every 4 hours and delivers processed orders to D365.
Workflow Steps (Visual Representation)
✅ TRUE Branch: Blob Found
❌ FALSE Branch: No Blob
Logic App Workflow JSON Definition
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Recurrence": {
"type": "Recurrence",
"recurrence": {
"frequency": "Hour",
"interval": 4,
"startTime": "2026-03-14T00:15:00Z",
"timeZone": "UTC"
}
},
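"Initialize_BlobName": {
"type": "InitializeVariable",
"inputs": {
"variables": [
{
"name": "BlobName",
"type": "string",
"value": ""
}
]
},
"runAfter": { "Recurrence": ["Succeeded"] }
},
"Initialize_D365Status": {
"type": "InitializeVariable",
"inputs": {
"variables": [
{
"name": "D365Status",
"type": "string",
"value": "NotStarted"
}
]
},
"runAfter": { "Initialize_BlobName": ["Succeeded"] }
},
"Initialize_StartTime": {
"type": "InitializeVariable",
"inputs": {
"variables": [
{
"name": "StartTime",
"type": "string",
"value": "@utcNow()"
}
]
},
"runAfter": { "Initialize_D365Status": ["Succeeded"] }
},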
"List_Blobs": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": { "name": "@parameters('$connections')['azureblob']['connectionId']" }
},
"method": "get",
"path": "/datasets/default/foldersV2/@{encodeURIComponent(encodeURIComponent('oms-d365-payloads'))}",
"queries": {
"useFlatListing": false,
"pageSize": 10,
"searchPattern": "oms_*.zip"
}
},
"runAfter": { "Initialize_StartTime": ["Succeeded"] }
},
"Condition_BlobFound": {
"type": "If",
"expression": {
"and": [
{
"greater": [
"@length(outputs('List_Blobs')?['body/value'])",
0
]
}
]
},
"actions": {
"Set_BlobName": {
"type": "SetVariable",
"inputs": {
"name": "BlobName",
"value": "@outputs('List_Blobs')?['body/value'][0]['Name']"
}
},
"Get_Blob_Content": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": { "name": "@parameters('$connections')['azureblob']['connectionId']" }
},
"method": "get",
"path": "/datasets/default/files/@{encodeURIComponent(encodeURIComponent(concat('oms-d365-payloads/', variables('BlobName'))))}",
"queries": { "inferContentType": true }
},
"runAfter": { "Set_BlobName": ["Succeeded"] }
},
"Scope_MainFlow": {
"type": "Scope",
"actions": {
"Upload_to_D365": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": { "name": "@parameters('$connections')['dynamicscrmonline']['connectionId']" }
},
"method": "post",
"path": "/api/data/v9.2/SalesOrderHeadersV2",
"body": "@outputs('Get_Blob_Content')?['body']",
"headers": { "Content-Type": "application/zip" }
}
},
"Trigger_DIXF_Import": {
"type": "ApiConnection",
"inputs": {
"host": {
"connection": { "name": "@parameters('$connections')['dynamicscrmonline']['connectionId']" }
},
"method": "post",
"path": "/api/data/v9.2/dmf_importexecution",
"body": {
"dmf_jobdefinitionid": "@parameters('DixfJobDefinitionId')",
"dmf_sourcename": "@variables('BlobName')"
}
},
"runAfter": { "Upload_to_D365": ["Succeeded"] }
},
"Wait_for_JobCompletion": {
"type": "Until",
"expression": "@or(equals(variables('D365Status'), 'Completed'), equals(variables('D365Status'), 'Failed'))",
"limit": {
"count": 20,
"timeout": "PT10M"
},
"actions": {
"Get_Job_Status": {
"type": "ApiConnection",
"inputs": {
"host": { "connection": { "name": "@parameters('$connections')['dynamicscrmonline']['connectionId']" } },
"method": "get",
"path": "/api/data/v9.2/dmf_importexecution(@{outputs('Trigger_DIXF_Import')?['body/dmf_importexecutionid']})"
}
},
"Delay_30s": {
"type": "Wait",
"inputs": { "interval": { "count": 30, "unit": "Second" } },
"runAfter": { "Get_Job_Status": ["Succeeded"] }
}
}
},
"Log_Success": {
"type": "ApiConnection",
"inputs": {
"host": { "connection": { "name": "@parameters('$connections')['applicationinsights']['connectionId']" } },
"method": "post",
"path": "/api/logEvent",
"body": {
"name": "DeliverySuccess",
"properties": {
"BlobName": "@variables('BlobName')",
"Duration": "@{sub(ticks(utcNow()), ticks(variables('StartTime')))}"
}
}
},
"runAfter": { "Wait_for_JobCompletion": ["Succeeded"] }
}
},
"runAfter": { "Get_Blob_Content": ["Succeeded"] }
},
"Scope_ExceptionHandling": {
"type": "Scope",
"actions": {
"Log_Failure": {
"type": "ApiConnection",
"inputs": {
"host": { "connection": { "name": "@parameters('$connections')['applicationinsights']['connectionId']" } },
"method": "post",
"path": "/api/logEvent",
"body": {
"name": "DeliveryFailure",
"properties": {
"BlobName": "@variables('BlobName')",
"Error": "@{result('Scope_MainFlow')}"
}
}
}
},
"Send_Alert_Email": {
"type": "ApiConnection",
"inputs": {
"host": { "connection": { "name": "@parameters('$connections')['office365']['connectionId']" } },
"method": "post",
"path": "/Mail",
"body": {
"To": "ops@contoso.com",
"Subject": "ALERT: D365 Order Delivery Failed",
"Body": "Blob: @{variables('BlobName')}\nError: @{result('Scope_MainFlow')}"
}
},
"runAfter": { "Log_Failure": ["Succeeded", "Failed"] }
},
"Terminate_Failure": {
"type": "Terminate",
"inputs": {
"runStatus": "Failed",
"runError": { "message": "Delivery failed for blob @{variables('BlobName')}; see Scope_MainFlow results in run history" }
},
"runAfter": { "Send_Alert_Email": ["Succeeded", "Failed"] }
}
},
"runAfter": { "Scope_MainFlow": ["Failed"] }
}
},
"else": {
"actions": {
"Terminate_NoBlobs": {
"type": "Terminate",
"inputs": { "runStatus": "Succeeded" }
}
}
},
"runAfter": { "List_Blobs": ["Succeeded"] }
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {
"$connections": { "defaultValue": {}, "type": "Object" },
"DixfJobDefinitionId": { "defaultValue": "", "type": "String" }
},
"triggers": { "Recurrence": { "type": "Recurrence" } }
}
}
Connection Details
| Connection | Service | Authentication | Actions Used |
|---|---|---|---|
| azureblob | Azure Blob Storage | Managed Identity (preferred) or Storage Account Key | List Blobs, Get Blob Content |
| dynamicscrmonline | Dynamics 365 Finance & Operations | Service Principal (app registration) with DIXF admin role | Post to SalesOrderHeadersV2, Trigger DIXF Job, Get Job Status |
| applicationinsights | Application Insights | Instrumentation Key or API Key | Track Events (custom telemetry) |
| office365 | Office 365 / Outlook | OAuth (user account) | Send Email Alert |
🌐 Cosmos DB: State Management
Order state machine powered by Azure Cosmos DB NoSQL. Stores processing status and original OMS event data.
Document Schema
Each order in Cosmos DB is stored as a JSON document with the following schema:
| Field | Type | Partition Key? | Description |
|---|---|---|---|
| id | string | ❌ No (unique item ID) | Unique document identifier within its logical partition; set to the same value as orderId |
| orderId | string | ✅ Yes (/orderId) | Order identifier from OMS; the container's partition key, so each order is its own logical partition |
| processingStatus | string | ❌ (indexed) | One of "Pending", "Processing", "Processed", "Failed" |
| omsEvent | object (JSON) | ❌ (excluded from indexing) | Full CloudEvent data object (for audit trail) |
| ingestedAt | ISO 8601 string | ❌ (composite index) | Timestamp when Function 1 processed the order |
| pickedUpAt | ISO 8601 string or null | ❌ | When Function 2 claimed the order for batch processing |
| processedAt | ISO 8601 string or null | ❌ | When the ZIP was created and uploaded to Blob Storage |
| batchId | string or null | ❌ | Batch identifier assigned by Function 2 (first 8 hex characters of a GUID); links orders processed together |
| blobReference | URI string or null | ❌ | Blob Storage URI of the ZIP file containing this order |
| retryCount | integer | ❌ | Number of processing attempts; incremented on retry |
| ttl | integer | ❌ | Per-item time-to-live in seconds (2,592,000 = 30 days); the document is deleted automatically once it expires, provided TTL is enabled on the container |
Sample Document (JSON)
{
"id": "ORD-20260314-001",
"orderId": "ORD-20260314-001",
"processingStatus": "Processed",
"omsEvent": {
"specversion": "1.0",
"type": "OMS.Order.Created",
"source": "https://oms.contoso.com/api/orders",
"id": "order-20260314-001-uuid-1234",
"time": "2026-03-14T12:30:45Z",
"datacontenttype": "application/json",
"data": {
"orderId": "ORD-20260314-001",
"customerId": "CUST-98765",
"orderDate": "2026-03-14",
"totalAmount": 12500.00,
"currency": "USD",
"lineItems": [
{
"itemNumber": 1,
"productCode": "PROD-XYZ-001",
"quantity": 5,
"unitPrice": 2500.00
}
],
"shippingAddress": {
"street": "456 Oak Ave",
"city": "Portland",
"state": "OR",
"zip": "97201",
"country": "US"
}
}
},
"ingestedAt": "2026-03-14T12:30:52Z",
"pickedUpAt": "2026-03-14T16:00:15Z",
"processedAt": "2026-03-14T16:02:33Z",
"batchId": "a1b2c3d4",
"blobReference": "https://stomsomsintegrationprod.blob.core.windows.net/oms-d365-payloads/oms_20260314_160230_a1b2c3d4.zip",
"retryCount": 0,
"ttl": 2592000
}
State Machine Diagram
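The transitions behind the state machine can be written as a lookup table, and overlapping Function 2 runs can be kept from claiming the same order by making each claim a conditional update on the document's ETag. The sketch below keeps both in memory. The transition set (including Failed → Pending for retries) and the class names are assumptions; against Cosmos DB, the equivalent is a replace with `ItemRequestOptions.IfMatchEtag` set.

```csharp
using System;
using System.Collections.Generic;

// Allowed transitions for the four processingStatus values. Failed -> Pending models a retry
// (an assumption; the tutorial only says failed orders are "marked for manual review or retry").
public static class OrderStateMachineSketch
{
    private static readonly Dictionary<string, string[]> Allowed = new()
    {
        ["Pending"] = new[] { "Processing" },
        ["Processing"] = new[] { "Processed", "Failed" },
        ["Failed"] = new[] { "Pending" },
        ["Processed"] = Array.Empty<string>(),
    };

    public static bool CanTransition(string from, string to) =>
        Allowed.TryGetValue(from, out var targets) && Array.IndexOf(targets, to) >= 0;
}

// In-memory stand-in for optimistic concurrency: an update succeeds only if the caller's
// ETag still matches, as a conditional (If-Match) replace would in Cosmos DB.
public sealed class InMemoryOrderStoreSketch
{
    private readonly Dictionary<string, (string Status, string? BatchId, string ETag)> _items = new();

    public void Add(string orderId) => _items[orderId] = ("Pending", null, Guid.NewGuid().ToString());

    public (string Status, string? BatchId, string ETag) Read(string orderId) => _items[orderId];

    public bool TryTransition(string orderId, string expectedETag, string newStatus, string? batchId)
    {
        var current = _items[orderId];
        if (current.ETag != expectedETag) return false; // another writer updated the document first
        if (!OrderStateMachineSketch.CanTransition(current.Status, newStatus)) return false;
        _items[orderId] = (newStatus, batchId ?? current.BatchId, Guid.NewGuid().ToString()); // new ETag per write
        return true;
    }
}
```

If two runs read the same "Pending" snapshot, only the first claim succeeds; the second sees a stale ETag and skips the order.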
Why /orderId as Partition Key?
Partition Key Selection Rationale
| Candidate | Cardinality | Data Distribution | Issues | Decision |
|---|---|---|---|---|
| /orderId | Millions of unique values | Each order is its own logical partition; logical partitions spread evenly across physical partitions | None; ideal | ✅ CHOSEN |
| /processingStatus | Only 4 values (Pending, Processing, Processed, Failed) | All "Pending" orders on same partition → hot partition | Throttling (429) when batch size > 10K orders/4h; RU limits | ❌ Rejected |
| /customerId | ~10K unique values (uneven) | Large customers create hot partitions | Skewed distribution; some partitions overloaded | ❌ Rejected |
| /ingestedAt (day) | 365 unique values | Today's date → single partition until midnight | Hot partition during peak hours; cold after | ❌ Rejected |
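The hot-partition argument can be checked with a quick simulation. Cosmos DB hashes each partition-key value with its own internal hash to choose a physical partition. The sketch substitutes MD5 modulo a hypothetical four physical partitions, so the exact percentages are illustrative only. The shape of the result is what matters: high-cardinality keys spread out, and a single status value does not.

```python
import hashlib
from collections import Counter
from typing import Iterable


def bucket(key: str, partitions: int) -> int:
    """Stand-in hash placing a partition-key value on one of N physical partitions."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def hottest_share(keys: Iterable[str], partitions: int = 4) -> float:
    """Fraction of all writes that land on the busiest physical partition."""
    counts = Counter(bucket(k, partitions) for k in keys)
    return max(counts.values()) / sum(counts.values())


# 10,000 orders arriving in one 4-hour window, all still Pending
order_ids = [f"ORD-20260314-{i:05d}" for i in range(10_000)]
statuses = ["Pending"] * len(order_ids)

print(f"/orderId          hottest partition: {hottest_share(order_ids):.1%}")
print(f"/processingStatus hottest partition: {hottest_share(statuses):.1%}")
```

With /orderId, the busiest partition receives close to the ideal 25%. With /processingStatus, it receives every write.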
Impact Analysis
Indexing Policy
{
"indexingPolicy": {
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/omsEvent/*"
}
],
"compositeIndexes": [
[
{ "path": "/processingStatus", "order": "ascending" },
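{ "path": "/ingestedAt", "order": "ascending" }
],
[
{ "path": "/batchId", "order": "ascending" },
{ "path": "/processedAt", "order": "descending" }
]
]
}
}
Index Rationale
- Automatic indexing: every path is indexed by default except /omsEvent/*. That is the large raw event payload, which is never filtered on, and excluding it lowers write RU cost and index size
- Composite index 1: serves Function 2's claim query, which selects Pending orders oldest first (WHERE processingStatus = "Pending" ORDER BY processingStatus ASC, ingestedAt ASC). Cosmos DB applies a composite index only when the ORDER BY lists its paths in the same order with matching directions, or with all directions reversed. For that reason, the equality-filtered property is repeated in ORDER BY
- Composite index 2: serves reporting queries (all orders in a batch, ordered by processing time)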
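The ordering rule for composite indexes is easy to get wrong, so here is a small checker for it, together with the parameterized claim query. Both composite_serves() and pending_claim_query() are hypothetical helpers. The query spec uses the query/parameters shape that Cosmos DB SDKs accept for parameterized queries. The limit of 1,000 is an arbitrary illustration. Whatever directions the container declares, the query's ORDER BY has to match them exactly or with every direction flipped.

```python
from typing import List, Tuple

Path = Tuple[str, str]  # (property path, "ASC" or "DESC")


def composite_serves(index: List[Path], order_by: List[Path]) -> bool:
    """True if ORDER BY matches the composite index exactly, or with every direction flipped."""
    if [p for p, _ in index] != [p for p, _ in order_by]:
        return False  # same paths, same order, or the index cannot be used
    flip = {"ASC": "DESC", "DESC": "ASC"}
    same = all(i == o for (_, i), (_, o) in zip(index, order_by))
    reversed_all = all(flip[i] == o for (_, i), (_, o) in zip(index, order_by))
    return same or reversed_all


def pending_claim_query(limit: int) -> dict:
    """Parameterized query spec for claiming the oldest Pending orders."""
    return {
        "query": (
            "SELECT TOP @limit * FROM c "
            "WHERE c.processingStatus = @status "
            "ORDER BY c.processingStatus ASC, c.ingestedAt ASC"
        ),
        "parameters": [
            {"name": "@limit", "value": limit},
            {"name": "@status", "value": "Pending"},
        ],
    }


fifo_index = [("/processingStatus", "ASC"), ("/ingestedAt", "ASC")]
print(composite_serves(fifo_index, [("/processingStatus", "ASC"), ("/ingestedAt", "DESC")]))  # False
```

With the azure-cosmos Python SDK, the spec would be passed as container.query_items(query=..., parameters=..., enable_cross_partition_query=True). The flag is needed because the filter does not include the /orderId partition key.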
Throughput & RU Configuration
| Environment | RU/s Mode | RU/s Value | Rationale |
|---|---|---|---|
| Dev | Manual | 400 RU/s | Lowest cost (400 RU/s is the minimum for manually provisioned container throughput). A 50-order batch means 50 inserts spread over 4 hours plus ~100 status updates at several RU each, well within 400 RU/s |
| UAT | Autoscale | 100–1,000 RU/s (max 1,000) | Peak load testing; autoscale absorbs spikes (e.g., retry batches) and scales back to 10% of the max when idle |
| Prod | Autoscale | 200–2,000 RU/s (max 2,000) | Business growth; a 2,000 RU/s ceiling supports 4x current load without throttling |
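The throughput figures can be sanity-checked with simple arithmetic. Two points hold in general: autoscale always runs between 10% of the configured maximum and the maximum, and Function 1's inserts are spread across the window. The burst that matters is Function 2's status updates, two per order (Pending to Processing, Processing to Processed). The 10 RU per update used below is an assumption for a ~2 KB indexed document. Measure the real figure from the x-ms-request-charge response header.

```python
def autoscale_floor(max_ru_per_s: int) -> int:
    """Autoscale scales between 10% of the configured maximum and the maximum."""
    return max_ru_per_s // 10


def batch_burst_ru(orders: int, updates_per_order: int, ru_per_update: float) -> float:
    """RU Function 2 spends in one run: status updates only (inserts happen at ingestion)."""
    return orders * updates_per_order * ru_per_update


def seconds_at(total_ru: float, ru_per_s: float) -> float:
    """Shortest time in which total_ru can be spent without exceeding ru_per_s (no 429s)."""
    return total_ru / ru_per_s


# Assumption: ~10 RU per replace of a ~2 KB indexed document
burst = batch_burst_ru(orders=1_000, updates_per_order=2, ru_per_update=10)
for env, max_ru in (("uat", 1_000), ("prod", 2_000)):
    floor = autoscale_floor(max_ru)
    print(f"{env}: {burst:,.0f} RU burst, {seconds_at(burst, max_ru):.0f}s at max, "
          f"autoscale floor {floor} RU/s")
```

If the measured per-update charge turns out to be much higher, the autoscale maximum is the value to revisit. The floor follows from it automatically.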
Time-to-Live (TTL) Configuration
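Cosmos DB computes expiry as the item's last-modified time (_ts) plus the effective TTL. The effective TTL is the item's own ttl property if present, and otherwise the container's defaultTtl, which is 2,592,000 seconds (30 days) here. The rules are:

- If the container has no default TTL at all, TTL is off and item values are ignored.
- A value of -1 means the item never expires.

The sketch below encodes those rules. It assumes the sample document's last write was at processedAt. The expires_at() helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CONTAINER_DEFAULT_TTL = 2_592_000  # defaultTtl on the container: 30 days


def expires_at(last_modified: datetime,
               item_ttl: Optional[int] = None,
               default_ttl: Optional[int] = CONTAINER_DEFAULT_TTL) -> Optional[datetime]:
    """Return when an item expires, or None if it never does.

    default_ttl None -> TTL disabled on the container; item ttl values are ignored.
    A ttl of -1 (item or container default) -> never expires.
    Otherwise the item expires ttl seconds after its last modification (_ts).
    """
    if default_ttl is None:
        return None
    ttl = item_ttl if item_ttl is not None else default_ttl
    if ttl == -1:
        return None
    return last_modified + timedelta(seconds=ttl)


last_write = datetime(2026, 3, 14, 16, 2, 33, tzinfo=timezone.utc)  # processedAt in the sample
print(expires_at(last_write).isoformat())  # 2026-04-13T16:02:33+00:00
```

Any later write, such as a retry that updates the status, resets the 30-day clock, because expiry is measured from the last modification.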
Cosmos DB Account Configuration
| Setting | Value | Notes |
|---|---|---|
| Kind | GlobalDocumentDB | API for NoSQL (formerly SQL API); not MongoDB, Cassandra, etc. |
| Consistency Level | Session | Read-your-own-writes within a client session; lower latency and cheaper reads than Strong, sufficient for our use case |
| Region(s) | Primary: East US; Failover: West US (prod only) | Single region for dev/uat; multi-region for HA in prod |
| Backup Policy | Continuous 30 days | Point-in-time restore available for 30 days (prod) |
| Network | Private Endpoint (prod); Public (dev/uat) | Private endpoint restricts access to VNet-connected services |
🏗️ Infrastructure as Code (Bicep & Terraform)
Infrastructure as Code for the three environments (Dev/UAT/Prod), written in Bicep with a Terraform alternative.
The Bicep implementation is modular: main.bicep orchestrates the modules (serviceBus.bicep, cosmosDb.bicep, etc.), and one parameter file per environment supplies the environment-specific values.
main.bicep — Orchestrator
Entry point. Declares all module deployments and wires outputs of one module into inputs of dependent modules. Also creates RBAC role assignments.
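ARM derives a dependency graph from main.bicep's output references and dependsOn lists, and it deploys independent modules in parallel. The sketch below models that graph to show the resulting deployment waves. The dependencies are transcribed from the file, with each role assignment waiting for the app whose principalId it uses. The wave grouping is an illustration; ARM schedules resources itself.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each entry: what it waits for, transcribed from main.bicep (dependsOn lists and
# the principalId each role assignment reads).
DEPENDS_ON = {
    "appInsights": set(),
    "keyVault": set(),
    "serviceBus": set(),
    "cosmosDb": set(),
    "storage": set(),
    "eventGrid": {"serviceBus"},
    "functionApp": {"storage", "appInsights", "cosmosDb", "serviceBus", "keyVault"},
    "logicApp": {"storage", "appInsights", "keyVault"},
    "sbRbac": {"functionApp"},
    "storageFaRbac": {"functionApp"},
    "storageLaRbac": {"logicApp"},
}


def deployment_waves(graph):
    """Group nodes into waves; every node in a wave only depends on earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(deployment_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The five foundational modules share no dependencies, so they form the first wave. The apps come next, and the role assignments come last because they need the apps' managed-identity principal IDs.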
targetScope = 'resourceGroup'
@description('Deployment environment')
@allowed(['dev', 'uat', 'prod'])
param environment string
@description('Azure region for all resources')
param location string = resourceGroup().location
@description('Tags applied to all resources')
param tags object = {
project: 'oms-d365-integration'
environment: environment
managedBy: 'bicep'
team: 'azure-integration'
}
// ── Module: Application Insights + Log Analytics ───────────────────────────────────
module appInsights './modules/appInsights.bicep' = {
name: 'deploy-appInsights-${environment}'
params: { environment: environment, location: location, tags: tags }
}
// ── Module: Key Vault ──────────────────────────────────────────────────────────────
module keyVault './modules/keyVault.bicep' = {
name: 'deploy-keyVault-${environment}'
params: { environment: environment, location: location, tags: tags, tenantId: tenant().tenantId }
}
// ── Module: Service Bus ────────────────────────────────────────────────────────────
module serviceBus './modules/serviceBus.bicep' = {
name: 'deploy-serviceBus-${environment}'
params: { environment: environment, location: location, tags: tags }
}
// ── Module: Cosmos DB ──────────────────────────────────────────────────────────────
module cosmosDb './modules/cosmosDb.bicep' = {
name: 'deploy-cosmosDb-${environment}'
params: { environment: environment, location: location, tags: tags }
}
// ── Module: Storage Account ────────────────────────────────────────────────────────
module storage './modules/storage.bicep' = {
name: 'deploy-storage-${environment}'
params: { environment: environment, location: location, tags: tags }
}
// ── Module: Event Grid (depends on Service Bus) ────────────────────────────────────
module eventGrid './modules/eventGrid.bicep' = {
name: 'deploy-eventGrid-${environment}'
params: {
environment: environment
location: location
tags: tags
serviceBusTopicId: serviceBus.outputs.topicId
serviceBusNamespaceId: serviceBus.outputs.namespaceId
}
dependsOn: [serviceBus]
}
// ── Module: Function App (depends on most modules) ─────────────────────────────────
module functionApp './modules/functionApp.bicep' = {
name: 'deploy-functionApp-${environment}'
params: {
environment: environment
location: location
tags: tags
storageAccountName: storage.outputs.storageAccountName
appInsightsConnectionString: appInsights.outputs.connectionString
cosmosDbEndpoint: cosmosDb.outputs.endpoint
cosmosDbAccountName: cosmosDb.outputs.accountName
serviceBusNamespaceName: serviceBus.outputs.namespaceName
serviceBusNamespaceId: serviceBus.outputs.namespaceId
keyVaultUri: keyVault.outputs.keyVaultUri
keyVaultName: keyVault.outputs.keyVaultName
}
dependsOn: [storage, appInsights, cosmosDb, serviceBus, keyVault]
}
// ── Module: Logic App ──────────────────────────────────────────────────────────────
module logicApp './modules/logicApp.bicep' = {
name: 'deploy-logicApp-${environment}'
params: {
environment: environment
location: location
tags: tags
storageAccountName: storage.outputs.storageAccountName
storageAccountId: storage.outputs.storageAccountId
appInsightsConnectionString: appInsights.outputs.connectionString
keyVaultUri: keyVault.outputs.keyVaultUri
keyVaultName: keyVault.outputs.keyVaultName
}
dependsOn: [storage, appInsights, keyVault]
}
// ── RBAC: Function App MI → Service Bus Data Receiver ─────────────────────────────
resource sbRbac 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, environment, 'func-sb-receiver') // must be computable at deployment start, so no module outputs (BCP120)
scope: resourceGroup()
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4f6d3b9b-027b-4f4c-9142-0e5a2a2247e0')
principalId: functionApp.outputs.principalId
principalType: 'ServicePrincipal'
}
}
// ── RBAC: Function App MI → Storage Blob Data Contributor ──────────────────────────
resource storageFaRbac 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, environment, 'func-storage-blob') // must be computable at deployment start, so no module outputs (BCP120)
scope: resourceGroup()
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
principalId: functionApp.outputs.principalId
principalType: 'ServicePrincipal'
}
}
// ── RBAC: Logic App MI → Storage Blob Data Contributor ─────────────────────────────
resource storageLaRbac 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, environment, 'la-storage-blob') // must be computable at deployment start, so no module outputs (BCP120)
scope: resourceGroup()
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
principalId: logicApp.outputs.principalId
principalType: 'ServicePrincipal'
}
}
// ── Outputs ────────────────────────────────────────────────────────────────────────
output functionAppName string = functionApp.outputs.functionAppName
output cosmosDbEndpoint string = cosmosDb.outputs.endpoint
output serviceBusNamespaceName string = serviceBus.outputs.namespaceName
output logicAppName string = logicApp.outputs.logicAppName
output keyVaultUri string = keyVault.outputs.keyVaultUri
output appInsightsName string = appInsights.outputs.appInsightsName
output storageAccountName string = storage.outputs.storageAccountName
modules/serviceBus.bicep
Creates: Service Bus Namespace (Standard SKU), Topic (oms-orders-topic) with partitioning and duplicate detection, Subscription with DLQ enabled.
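Two of the settings in this module decide what happens to repeated or failing messages. Duplicate detection drops a message whose MessageId was already accepted within the 10-minute window. maxDeliveryCount moves a message to the dead-letter queue after 10 unsuccessful deliveries. The toy model below, SubscriptionModel (a hypothetical name), shows those two behaviours only. It is not the broker: it ignores locks, ordering, and TTL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class SubscriptionModel:
    """Toy model of two settings in serviceBus.bicep; not the real broker."""
    dup_window: timedelta = timedelta(minutes=10)  # duplicateDetectionHistoryTimeWindow: PT10M
    max_delivery_count: int = 10                   # maxDeliveryCount on the subscription
    seen: Dict[str, datetime] = field(default_factory=dict)
    active: List[str] = field(default_factory=list)
    dead_letter: List[str] = field(default_factory=list)

    def send(self, message_id: str, now: datetime) -> bool:
        """Accept a message unless the same MessageId was accepted inside the window."""
        first_seen = self.seen.get(message_id)
        if first_seen is not None and now - first_seen < self.dup_window:
            return False  # dropped as a duplicate
        self.seen[message_id] = now
        self.active.append(message_id)
        return True

    def deliver(self, handler: Callable[[str], bool]) -> None:
        """Redeliver each message until the handler succeeds or the count is exhausted."""
        for message_id in list(self.active):
            self.active.remove(message_id)
            for _ in range(self.max_delivery_count):
                if handler(message_id):
                    break  # handler completed the message
            else:
                self.dead_letter.append(message_id)  # reason: MaxDeliveryCountExceeded


demo = SubscriptionModel()
start = datetime(2026, 3, 14, 12, 30, 45)
print(demo.send("order-20260314-001-uuid-1234", start),
      demo.send("order-20260314-001-uuid-1234", start + timedelta(minutes=5)))  # True False
```

Because duplicate detection keys on MessageId, the publisher must reuse the same MessageId when it retries a send.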
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
resource sbNamespace 'Microsoft.ServiceBus/namespaces@2022-10-01-preview' = {
name: 'sb-oms-integration-${environment}'
location: location
sku: { name: 'Standard', tier: 'Standard' }
properties: {
minimumTlsVersion: '1.2'
zoneRedundant: environment == 'prod' ? true : false
}
tags: tags
}
// Topic: partitioned for throughput; dup detection prevents replay within 10 min
resource sbTopic 'Microsoft.ServiceBus/namespaces/topics@2022-10-01-preview' = {
parent: sbNamespace
name: 'oms-orders-topic'
properties: {
enablePartitioning: true
requiresDuplicateDetection: true
duplicateDetectionHistoryTimeWindow: 'PT10M'
defaultMessageTimeToLive: 'P14D'
maxSizeInMegabytes: 1024
}
}
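// Subscription: after 10 failed deliveries (maxDeliveryCount) a message moves to the DLQ; lockDuration 5 min is the Service Bus maximum and matches the Consumption plan's default function timeout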
resource sbSubscription 'Microsoft.ServiceBus/namespaces/topics/subscriptions@2022-10-01-preview' = {
parent: sbTopic
name: 'oms-d365-subscription'
properties: {
maxDeliveryCount: 10
lockDuration: 'PT5M'
deadLetteringOnMessageExpiration: true
defaultMessageTimeToLive: 'P14D'
enableBatchedOperations: true
}
}
output namespaceName string = sbNamespace.name
output namespaceId string = sbNamespace.id
output topicId string = sbTopic.id
output subscriptionName string = sbSubscription.name
output namespaceHostname string = '${sbNamespace.name}.servicebus.windows.net'
modules/cosmosDb.bicep
Creates: Cosmos DB Account (NoSQL/GlobalDocumentDB), Database, Container with /orderId partition key, composite index on [processingStatus, ingestedAt], 30-day TTL.
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
// Dev: manual 400 RU/s. UAT/prod: autoscale, scaling between 10% and 100% of maxThroughput
var throughputSettings = environment == 'dev'
? { throughput: 400 }
: { autoscaleSettings: { maxThroughput: environment == 'prod' ? 2000 : 1000 } }
// Session consistency: best trade-off - cheaper than Strong, sufficient for our batch pattern
resource cosmosAccount 'Microsoft.DocumentDB/databaseAccounts@2023-04-15' = {
name: 'cosmos-oms-integration-${environment}'
location: location
kind: 'GlobalDocumentDB'
properties: {
databaseAccountOfferType: 'Standard'
consistencyPolicy: {
defaultConsistencyLevel: 'Session'
maxStalenessPrefix: 100
maxIntervalInSeconds: 5
}
locations: [{
locationName: location
failoverPriority: 0
isZoneRedundant: environment == 'prod' ? true : false
}]
enableAutomaticFailover: false
publicNetworkAccess: environment == 'prod' ? 'Disabled' : 'Enabled'
minimalTlsVersion: 'Tls12'
}
tags: tags
}
resource database 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases@2023-04-15' = {
parent: cosmosAccount
name: 'oms-integration-db'
properties: { resource: { id: 'oms-integration-db' } }
}
// Partition key: /orderId — millions of unique values = perfect cardinality, no hot partition
// TTL: 30 days auto-deletes processed documents to control storage cost
resource container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2023-04-15' = {
parent: database
name: 'oms-orders'
properties: {
resource: {
id: 'oms-orders'
partitionKey: { paths: ['/orderId'], kind: 'Hash', version: 2 }
defaultTtl: 2592000
indexingPolicy: {
indexingMode: 'consistent'
automatic: true
includedPaths: [{ path: '/*' }]
excludedPaths: [{ path: '/omsEvent/*' }, { path: '/"_etag"/?' }]
compositeIndexes: [
[
{ path: '/processingStatus', order: 'ascending' }
{ path: '/ingestedAt', order: 'ascending' }
]
]
}
}
options: throughputSettings
}
}
output endpoint string = cosmosAccount.properties.documentEndpoint
output accountName string = cosmosAccount.name
output accountId string = cosmosAccount.id
output databaseName string = database.name
modules/storage.bicep
Creates: Storage Account (ZRS in prod, LRS in dev/uat), Blob Service, Container oms-d365-payloads, Lifecycle policy (auto-delete ZIPs 7 days after their last modification).
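The lifecycle rule's prefixMatch is compared against the container name plus the blob name. That is why the rule's prefix begins with oms-d365-payloads/. The sketch below, lifecycle_deletes() (a hypothetical helper), mirrors the rule's filter and age test. In practice the platform evaluates policies asynchronously, so actual deletion can lag the 7-day mark by a day or more.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Blob = Tuple[str, str, datetime]  # (container, blob name, last modified)


def lifecycle_deletes(blobs: List[Blob], now: datetime,
                      prefix: str = "oms-d365-payloads/", days: int = 7) -> List[str]:
    """Paths the delete-old-staging-zips rule would target at time `now`.

    prefixMatch is compared with "<container>/<blob name>", which is why the
    rule's prefix starts with the container name.
    """
    cutoff = now - timedelta(days=days)  # daysAfterModificationGreaterThan: 7
    return [
        f"{container}/{name}"
        for container, name, modified in blobs
        if f"{container}/{name}".startswith(prefix) and modified < cutoff
    ]


now = datetime(2026, 3, 22, tzinfo=timezone.utc)
blobs = [
    # the ZIP from the sample document
    ("oms-d365-payloads", "oms_20260314_160230_a1b2c3d4.zip",
     datetime(2026, 3, 14, 16, 2, 30, tzinfo=timezone.utc)),
    # hypothetical newer ZIP, still inside the 7-day window
    ("oms-d365-payloads", "oms_20260321_160230_b2c3d4e5.zip",
     datetime(2026, 3, 21, 16, 2, 30, tzinfo=timezone.utc)),
]
print(lifecycle_deletes(blobs, now))
```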
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
// ZRS in prod = zone-redundant (survives datacenter failure)
// LRS in dev/uat = cheaper, sufficient for non-prod
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
name: 'saomsintegration${environment}' // max 24 chars, no hyphens
location: location
sku: { name: environment == 'prod' ? 'Standard_ZRS' : 'Standard_LRS' }
kind: 'StorageV2'
properties: {
accessTier: 'Hot'
minimumTlsVersion: 'TLS1_2'
allowBlobPublicAccess: false
supportsHttpsTrafficOnly: true
networkAcls: {
defaultAction: environment == 'prod' ? 'Deny' : 'Allow'
bypass: 'AzureServices'
}
}
tags: tags
}
resource blobService 'Microsoft.Storage/storageAccounts/blobServices@2023-01-01' = {
parent: storageAccount
name: 'default'
properties: {
deleteRetentionPolicy: { enabled: true, days: 7 }
isVersioningEnabled: environment == 'prod' ? true : false
}
}
// Staging container: holds ZIP packages for Logic App to read and deliver to D365
resource stagingContainer 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-01-01' = {
parent: blobService
name: 'oms-d365-payloads'
properties: { publicAccess: 'None' }
}
// Safety net: delete staging ZIPs 7 days after their last modification, whether or not they were delivered
resource lifecyclePolicy 'Microsoft.Storage/storageAccounts/managementPolicies@2023-01-01' = {
parent: storageAccount
name: 'default'
properties: {
policy: {
rules: [{
name: 'delete-old-staging-zips'
enabled: true
type: 'Lifecycle'
definition: {
filters: { blobTypes: ['blockBlob'], prefixMatch: ['oms-d365-payloads/'] }
actions: { baseBlob: { delete: { daysAfterModificationGreaterThan: 7 } } }
}
}]
}
}
}
output storageAccountName string = storageAccount.name
output storageAccountId string = storageAccount.id
output stagingContainerName string = stagingContainer.name
output blobEndpoint string = storageAccount.properties.primaryEndpoints.blob
modules/functionApp.bicep
Creates: App Service Plan (Y1 Consumption), Function App (.NET 8 isolated), all app settings with Key Vault references for secrets, System-Assigned Managed Identity.
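The D365 settings below are App Service Key Vault references. The platform resolves each value at runtime from a string of the form @Microsoft.KeyVault(VaultName=...;SecretName=...). When no version is given, the latest secret version is used. Because the vault uses the RBAC model, resolution also requires the app's managed identity to hold a secret-read role such as Key Vault Secrets User. The sketch below builds and validates these strings before they reach a template. kv_reference() and d365_app_settings() are hypothetical helpers.

```python
import re
from typing import Dict

# Key Vault secret names: 1-127 characters, letters, digits and hyphens
SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def kv_reference(vault_name: str, secret_name: str) -> str:
    """Build an App Service Key Vault reference (no version, so the latest is used)."""
    if not SECRET_NAME.fullmatch(secret_name):
        raise ValueError(f"invalid Key Vault secret name: {secret_name!r}")
    return f"@Microsoft.KeyVault(VaultName={vault_name};SecretName={secret_name})"


def d365_app_settings(vault_name: str) -> Dict[str, str]:
    """The three D365 settings from functionApp.bicep, built from one vault name."""
    return {
        "D365BaseUrl": kv_reference(vault_name, "D365-Base-Url"),
        "D365ClientId": kv_reference(vault_name, "D365-Client-Id"),
        "D365ClientSecret": kv_reference(vault_name, "D365-Client-Secret"),
    }


print(d365_app_settings("kv-oms-integration-dev")["D365ClientSecret"])
# @Microsoft.KeyVault(VaultName=kv-oms-integration-dev;SecretName=D365-Client-Secret)
```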
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
@description('Storage account name for AzureWebJobsStorage')
param storageAccountName string
@description('Application Insights connection string')
param appInsightsConnectionString string
@description('Cosmos DB endpoint URL')
param cosmosDbEndpoint string
@description('Cosmos DB account name')
param cosmosDbAccountName string
@description('Service Bus namespace name')
param serviceBusNamespaceName string
@description('Service Bus namespace resource ID')
param serviceBusNamespaceId string
@description('Key Vault URI')
param keyVaultUri string
@description('Key Vault name for KV reference syntax')
param keyVaultName string
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
name: storageAccountName
}
// Consumption plan Y1: pay-per-execution, elastic scale, zero idle cost
resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' = {
name: 'asp-oms-integration-${environment}'
location: location
sku: { name: 'Y1', tier: 'Dynamic' }
kind: 'functionapp'
properties: { reserved: false }
tags: tags
}
resource functionApp 'Microsoft.Web/sites@2023-01-01' = {
name: 'func-oms-integration-${environment}'
location: location
kind: 'functionapp'
identity: { type: 'SystemAssigned' }
properties: {
serverFarmId: appServicePlan.id
httpsOnly: true
siteConfig: {
netFrameworkVersion: 'v8.0'
use32BitWorkerProcess: false
ftpsState: 'Disabled'
minTlsVersion: '1.2'
appSettings: [
{ name: 'FUNCTIONS_EXTENSION_VERSION', value: '~4' }
{ name: 'FUNCTIONS_WORKER_RUNTIME', value: 'dotnet-isolated' }
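{ name: 'AzureWebJobsStorage', value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};AccountKey=${storageAccount.listKeys().keys[0].value}' }
// Windows Consumption plans also require an Azure Files content share
{ name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING', value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};AccountKey=${storageAccount.listKeys().keys[0].value}' }
{ name: 'WEBSITE_CONTENTSHARE', value: 'func-oms-integration-${environment}' }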
{ name: 'APPLICATIONINSIGHTS_CONNECTION_STRING', value: appInsightsConnectionString }
// Service Bus — Managed Identity (no connection string)
{ name: 'ServiceBusConnection__fullyQualifiedNamespace', value: '${serviceBusNamespaceName}.servicebus.windows.net' }
{ name: 'ServiceBusTopicName', value: 'oms-orders-topic' }
{ name: 'ServiceBusSubscriptionName', value: 'oms-d365-subscription' }
// Cosmos DB — Managed Identity
{ name: 'CosmosDbEndpoint', value: cosmosDbEndpoint }
{ name: 'CosmosDbDatabaseName', value: 'oms-integration-db' }
{ name: 'CosmosDbContainerName', value: 'oms-orders' }
// Blob Storage
{ name: 'BlobStorageEndpoint', value: 'https://${storageAccountName}.blob.core.windows.net' }
{ name: 'BlobContainerName', value: 'oms-d365-payloads' }
// Key Vault References — secrets never appear in plain text
{ name: 'D365BaseUrl', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Base-Url)' }
{ name: 'D365ClientId', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Client-Id)' }
{ name: 'D365ClientSecret', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Client-Secret)' }
{ name: 'Environment', value: environment }
{ name: 'WEBSITE_RUN_FROM_PACKAGE', value: '1' }
]
}
}
tags: tags
}
output functionAppName string = functionApp.name
output functionAppId string = functionApp.id
output principalId string = functionApp.identity.principalId
output defaultHostname string = functionApp.properties.defaultHostName
modules/logicApp.bicep
Creates: App Service Plan (WS1 Workflow Standard — required for Logic App Standard), Logic App Standard site, app settings with KV references, System-Assigned MI.
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
@description('Storage account name (required by Logic App Standard)')
param storageAccountName string
@description('Storage account resource ID')
param storageAccountId string
@description('Application Insights connection string')
param appInsightsConnectionString string
@description('Key Vault URI')
param keyVaultUri string
@description('Key Vault name')
param keyVaultName string
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
name: storageAccountName
}
// Logic App Standard REQUIRES a dedicated ASP (not Consumption)
// WS1 = 1 vCore, 3.5 GB RAM — suitable for our delivery workflow
resource logicAppPlan 'Microsoft.Web/serverfarms@2023-01-01' = {
name: 'asp-la-oms-integration-${environment}'
location: location
sku: { name: 'WS1', tier: 'WorkflowStandard' }
kind: 'windows'
properties: {
targetWorkerCount: environment == 'prod' ? 2 : 1
maximumElasticWorkerCount: environment == 'prod' ? 4 : 2
}
tags: tags
}
resource logicApp 'Microsoft.Web/sites@2023-01-01' = {
name: 'la-oms-d365-delivery-${environment}'
location: location
kind: 'functionapp,workflowapp'
identity: { type: 'SystemAssigned' }
properties: {
serverFarmId: logicAppPlan.id
httpsOnly: true
siteConfig: {
use32BitWorkerProcess: false
ftpsState: 'Disabled'
minTlsVersion: '1.2'
appSettings: [
{ name: 'APP_KIND', value: 'workflowApp' }
{ name: 'FUNCTIONS_EXTENSION_VERSION', value: '~4' }
{ name: 'FUNCTIONS_WORKER_RUNTIME', value: 'node' }
{ name: 'WEBSITE_NODE_DEFAULT_VERSION', value: '~18' }
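{ name: 'AzureWebJobsStorage', value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};AccountKey=${storageAccount.listKeys().keys[0].value}' }
// Logic App Standard on a Workflow Standard plan also requires an Azure Files content share
{ name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING', value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};AccountKey=${storageAccount.listKeys().keys[0].value}' }
{ name: 'WEBSITE_CONTENTSHARE', value: 'la-oms-d365-delivery-${environment}' }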
{ name: 'APPLICATIONINSIGHTS_CONNECTION_STRING', value: appInsightsConnectionString }
{ name: 'BlobStorageEndpoint', value: 'https://${storageAccountName}.blob.core.windows.net' }
{ name: 'BlobStagingContainer', value: 'oms-d365-payloads' }
{ name: 'D365BaseUrl', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Base-Url)' }
{ name: 'D365ClientId', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Client-Id)' }
{ name: 'D365ClientSecret', value: '@Microsoft.KeyVault(VaultName=${keyVaultName};SecretName=D365-Client-Secret)' }
{ name: 'Environment', value: environment }
{ name: 'WEBSITE_RUN_FROM_PACKAGE', value: '1' }
]
}
}
tags: tags
}
output logicAppName string = logicApp.name
output logicAppId string = logicApp.id
output principalId string = logicApp.identity.principalId
output defaultHostname string = logicApp.properties.defaultHostName
modules/eventGrid.bicep
Creates: Event Grid Custom Topic (CloudEvents 1.0 schema), Event Subscription routing OMS.Order.Created and OMS.Order.Updated events to the Service Bus Topic, RBAC for the Event Grid MI to publish to Service Bus.
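A CloudEvents 1.0 input schema means every published event must carry the required context attributes: specversion, id, source, and type. The subscription filter then forwards only the listed event types. The sketch below mirrors both checks so a publisher can pre-validate events in tests. The helper names are hypothetical. The type filter uses exact matching here for simplicity, and the contents of data are not validated, just as with the topic itself.

```python
from typing import Dict, List

# CloudEvents 1.0 REQUIRED context attributes
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")
# includedEventTypes from the event subscription in this module
ROUTED_TYPES = {"OMS.Order.Created", "OMS.Order.Updated"}


def envelope_problems(event: Dict[str, object]) -> List[str]:
    """List what is wrong with a CloudEvents 1.0 envelope; empty means it looks valid."""
    problems = [f"missing {name}" for name in REQUIRED_ATTRIBUTES if not event.get(name)]
    version = event.get("specversion")
    if version and version != "1.0":
        problems.append(f"unsupported specversion {version}")
    return problems


def routed_to_service_bus(event: Dict[str, object]) -> bool:
    """Mirror of the subscription filter: only listed event types are forwarded."""
    return event.get("type") in ROUTED_TYPES


sample = {  # envelope of the omsEvent in the sample document
    "specversion": "1.0", "type": "OMS.Order.Created",
    "source": "https://oms.contoso.com/api/orders",
    "id": "order-20260314-001-uuid-1234", "data": {"orderId": "ORD-20260314-001"},
}
print(envelope_problems(sample), routed_to_service_bus(sample))  # [] True
```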
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
@description('Service Bus topic resource ID (event destination)')
param serviceBusTopicId string
@description('Service Bus namespace resource ID (for RBAC)')
param serviceBusNamespaceId string
// CloudEvents 1.0 input schema: publishers must send the CloudEvents envelope (specversion, id, source, type)
// Events that do not follow the schema are rejected; the contents of data are not validated
resource eventGridTopic 'Microsoft.EventGrid/topics@2023-12-15-preview' = {
name: 'egt-oms-events-${environment}'
location: location
identity: { type: 'SystemAssigned' } // Needed for delivery to Service Bus
properties: {
inputSchema: 'CloudEventSchemaV1_0'
publicNetworkAccess: environment == 'prod' ? 'Disabled' : 'Enabled'
disableLocalAuth: false
}
tags: tags
}
// Event Grid MI needs Azure Service Bus Data Sender role to forward events
resource egToSbRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(serviceBusNamespaceId, eventGridTopic.id, 'sb-data-sender')
scope: resourceGroup()
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '69a216fc-b8fb-44d8-bc22-1f3c2cd27a39')
principalId: eventGridTopic.identity.principalId
principalType: 'ServicePrincipal'
}
}
// Filter: only route OMS.Order.Created and OMS.Order.Updated events
resource eventSubscription 'Microsoft.EventGrid/topics/eventSubscriptions@2023-12-15-preview' = {
parent: eventGridTopic
name: 'oms-to-servicebus-subscription'
properties: {
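// Deliver using the topic's system-assigned identity, so the Data Sender role above authorizes delivery
deliveryWithResourceIdentity: {
identity: { type: 'SystemAssigned' }
destination: {
endpointType: 'ServiceBusTopic'
properties: { resourceId: serviceBusTopicId }
}
}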
filter: {
includedEventTypes: ['OMS.Order.Created', 'OMS.Order.Updated']
enableAdvancedFilteringOnArrays: true
}
eventDeliverySchema: 'CloudEventSchemaV1_0'
retryPolicy: { maxDeliveryAttempts: 30, eventTimeToLiveInMinutes: 1440 }
}
dependsOn: [egToSbRole]
}
output topicEndpoint string = eventGridTopic.properties.endpoint
output topicId string = eventGridTopic.id
output topicName string = eventGridTopic.name
output principalId string = eventGridTopic.identity.principalId
modules/keyVault.bicep
Creates: Key Vault with RBAC authorization model (not access policies), soft delete 90 days, purge protection in prod. Secrets are set post-deployment via CI/CD — never hardcoded in Bicep.
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
@description('AAD Tenant ID for Key Vault')
param tenantId string
// RBAC model preferred over access policies:
// - Standard Azure RBAC = consistent with other resources
// - No access-policy cap (a vault allows at most 1,024 access-policy entries)
// - Can scope permissions to individual secrets
// - Works seamlessly with Managed Identity
resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
name: 'kv-oms-integration-${environment}'
location: location
properties: {
tenantId: tenantId
sku: { name: 'standard', family: 'A' }
enableRbacAuthorization: true // RBAC model — not access policies
enableSoftDelete: true
softDeleteRetentionInDays: 90
enablePurgeProtection: environment == 'prod' ? true : null // ARM rejects an explicit false; null omits the property
publicNetworkAccess: environment == 'prod' ? 'Disabled' : 'Enabled'
networkAcls: {
defaultAction: environment == 'prod' ? 'Deny' : 'Allow'
bypass: 'AzureServices'
}
}
tags: tags
}
output keyVaultUri string = keyVault.properties.vaultUri
output keyVaultName string = keyVault.name
output keyVaultId string = keyVault.id
modules/appInsights.bicep
Creates: Log Analytics Workspace (workspace-based mode; classic Application Insights has been retired), Application Insights linked to the workspace. Configures sampling percentage per environment.
@description('Deployment environment')
param environment string
@description('Azure region')
param location string
@description('Resource tags')
param tags object
// Log Analytics Workspace: required for workspace-based Application Insights
// Classic (non-workspace) Application Insights was retired in February 2024
resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
name: 'law-oms-integration-${environment}'
location: location
properties: {
sku: { name: 'PerGB2018' }
retentionInDays: environment == 'prod' ? 90 : 30
features: { enableLogAccessUsingOnlyResourcePermissions: true }
}
tags: tags
}
// Workspace-based Application Insights: logs stored in Log Analytics
// Enables unified KQL querying across all Azure resources
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
name: 'appi-oms-integration-${environment}'
location: location
kind: 'web'
properties: {
Application_Type: 'web'
WorkspaceResourceId: logAnalyticsWorkspace.id
RetentionInDays: environment == 'prod' ? 90 : 30
// Sampling: 50% in prod reduces ingestion cost while maintaining statistical accuracy
// 100% in dev/uat ensures full visibility during testing
SamplingPercentage: environment == 'prod' ? 50 : 100
DisableIpMasking: false // GDPR compliance: mask client IPs
}
tags: tags
}
// Use connectionString (not instrumentationKey) — instrumentationKey is deprecated
output connectionString string = appInsights.properties.ConnectionString
output instrumentationKey string = appInsights.properties.InstrumentationKey
output appInsightsName string = appInsights.name
output appInsightsId string = appInsights.id
output logAnalyticsWorkspaceId string = logAnalyticsWorkspace.id
parameters/dev.bicepparam — Development Environment
using '../main.bicep'
// Development environment — cost-optimised, public access enabled for developer tooling
param environment = 'dev'
param location = 'westeurope'
param tags = {
project: 'oms-d365-integration'
environment: 'dev'
costCenter: 'IT-DEV-001'
managedBy: 'bicep'
team: 'azure-integration'
deployedBy: 'ci-cd-pipeline'
owner: 'integration-team@company.com'
}
parameters/uat.bicepparam — UAT Environment
using '../main.bicep'
// UAT environment — mirrors prod config where possible for test parity
// Public access still enabled (no private endpoints) to reduce test friction
param environment = 'uat'
param location = 'westeurope'
param tags = {
project: 'oms-d365-integration'
environment: 'uat'
costCenter: 'IT-UAT-001'
managedBy: 'bicep'
team: 'azure-integration'
deployedBy: 'ci-cd-pipeline'
owner: 'integration-team@company.com'
testEnvironment: 'true'
dataClassification: 'non-production'
}
parameters/prod.bicepparam — Production Environment
using '../main.bicep'
// PRODUCTION — full security hardening, zone-redundant, private endpoints
// Requires: manual approval gate in CI/CD pipeline before apply
// Requires: deployment during off-peak window (00:00–04:00 UTC)
param environment = 'prod'
param location = 'westeurope'
param tags = {
project: 'oms-d365-integration'
environment: 'prod'
costCenter: 'IT-PROD-001'
managedBy: 'bicep'
team: 'azure-integration'
deployedBy: 'ci-cd-pipeline'
owner: 'integration-team@company.com'
slaTarget: '99.9'
dataClassification: 'confidential'
complianceScope: 'SOC2'
businessUnit: 'supply-chain'
criticalityLevel: 'high'
}
Deployment Command
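# Dev deployment (a resource-group-scoped deployment needs the resource group to exist first)
az group create --name rg-oms-d365-dev --location westeurope
az deployment group create \
--name oms-d365-integration-dev \
--resource-group rg-oms-d365-dev \
--template-file main.bicep \
--parameters parameters/dev.bicepparam
# Prod deployment
az group create --name rg-oms-d365-prod --location westeurope
az deployment group create \
--name oms-d365-integration-prod \
--resource-group rg-oms-d365-prod \
--template-file main.bicep \
--parameters parameters/prod.bicepparam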
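Storage account and Key Vault names are global and length-limited, so an invalid name only surfaces as a failed deployment. A quick pre-flight check in CI can catch format errors earlier. The sketch encodes two documented rules:

- Storage accounts: 3-24 lowercase letters and digits.
- Key Vaults: 3-24 characters, starting with a letter, ending with a letter or digit, with no consecutive hyphens.

Appending a random suffix to a Key Vault name can easily push it past 24 characters. invalid_names() is a hypothetical helper. Global uniqueness still has to be checked against Azure, for example with az storage account check-name.

```python
import re
from typing import Dict, List

# Naming rules for two globally unique resource types used here
RULES = {
    # Storage account: 3-24 characters, lowercase letters and digits only
    "storage": re.compile(r"[a-z0-9]{3,24}"),
    # Key Vault: 3-24 characters, starts with a letter, ends with a letter or digit,
    # letters/digits/hyphens in between, no consecutive hyphens
    "keyVault": re.compile(r"[A-Za-z](?!.*--)[A-Za-z0-9-]{1,22}[A-Za-z0-9]"),
}


def invalid_names(names: Dict[str, str]) -> List[str]:
    """Return 'kind: name' for every name that breaks its resource type's rule."""
    return [f"{kind}: {name}" for kind, name in names.items()
            if not RULES[kind].fullmatch(name)]


for env in ("dev", "uat", "prod"):
    # Name patterns from storage.bicep and keyVault.bicep
    names = {"storage": f"saomsintegration{env}", "keyVault": f"kv-oms-integration-{env}"}
    print(env, invalid_names(names) or "ok")
```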
Terraform Implementation
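Alternative IaC using HashiCorp Terraform, in three files: main.tf (resources), variables.tf (inputs), and outputs.tf (results). It covers the resource group, Service Bus, Cosmos DB, Storage, Application Insights, and Key Vault. The Function App, Logic App, Event Grid, and RBAC resources from the Bicep version are not included.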
main.tf
terraform {
required_version = ">= 1.3"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.0"
}
random = {
source = "hashicorp/random"
version = "~> 3.0"
}
}
backend "azurerm" {
# Configure backend state storage
}
}
provider "azurerm" {
features {}
subscription_id = var.subscription_id
}
# ── Resource Group ────────────────────────────────────────────────
resource "azurerm_resource_group" "oms_integration" {
name = "rg-oms-d365-${var.environment}"
location = var.location
tags = var.tags
}
# ── Service Bus Namespace ─────────────────────────────────────────
resource "azurerm_servicebus_namespace" "main" {
name = "sb-oms-integration-${var.environment}"
location = azurerm_resource_group.oms_integration.location
resource_group_name = azurerm_resource_group.oms_integration.name
sku = "Standard"
tags = var.tags
}
# ── Service Bus Topic ─────────────────────────────────────────────
resource "azurerm_servicebus_topic" "oms_orders" {
name = "oms-orders-topic"
namespace_id = azurerm_servicebus_namespace.main.id
enable_partitioning = true
max_size_in_megabytes = 1024
default_message_ttl = "P14D"
requires_duplicate_detection = true
duplicate_detection_history_time_window = "PT10M"
}
# ── Service Bus Subscription ──────────────────────────────────────
resource "azurerm_servicebus_subscription" "oms_d365" {
name = "oms-d365-subscription"
topic_id = azurerm_servicebus_topic.oms_orders.id
max_delivery_count = 10
lock_duration = "PT5M"
dead_lettering_on_message_expiration = true
default_message_ttl = "P14D"
enable_batched_operations = true
}
# ── Cosmos DB Account ─────────────────────────────────────────────
resource "azurerm_cosmosdb_account" "main" {
name = "cosmos-oms-integration-${var.environment}"
location = azurerm_resource_group.oms_integration.location
resource_group_name = azurerm_resource_group.oms_integration.name
offer_type = "Standard"
kind = "GlobalDocumentDB"
consistency_policy {
consistency_level = "Session"
}
geo_location {
location = var.location
failover_priority = 0
}
tags = var.tags
}
# ── Cosmos DB SQL Database ────────────────────────────────────────
resource "azurerm_cosmosdb_sql_database" "main" {
name = "oms-integration-db"
account_name = azurerm_cosmosdb_account.main.name
resource_group_name = azurerm_resource_group.oms_integration.name
}
# ── Cosmos DB SQL Container ───────────────────────────────────────
resource "azurerm_cosmosdb_sql_container" "oms_orders" {
name = "oms-orders"
account_name = azurerm_cosmosdb_account.main.name
database_name = azurerm_cosmosdb_sql_database.main.name
resource_group_name = azurerm_resource_group.oms_integration.name
partition_key_path = "/orderId"
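# Dev: manual 400 RU/s; UAT/prod: autoscale with max 1,000 / 2,000 RU/s
throughput = var.environment == "dev" ? 400 : null
dynamic "autoscale_settings" {
for_each = var.environment == "dev" ? [] : [1]
content {
max_throughput = var.environment == "prod" ? 2000 : 1000
}
}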
default_ttl = 2592000
indexing_policy {
indexing_mode = "consistent"
included_path {
path = "/*"
}
excluded_path {
path = "/omsEvent/*"
}
composite_index {
index {
path = "/processingStatus"
order = "Ascending"
}
index {
path = "/ingestedAt"
order = "Ascending"
}
}
}
}
# ── Storage Account ───────────────────────────────────────────────
resource "azurerm_storage_account" "main" {
name = "st${replace(var.environment, "-", "")}oms${random_string.storage_suffix.result}"
resource_group_name = azurerm_resource_group.oms_integration.name
location = azurerm_resource_group.oms_integration.location
account_tier = "Standard"
account_replication_type = var.environment == "prod" ? "ZRS" : "LRS"
min_tls_version = "TLS1_2"
tags = var.tags
}
# ── Blob Container ────────────────────────────────────────────────
resource "azurerm_storage_container" "oms_payloads" {
name = "oms-d365-payloads"
storage_account_name = azurerm_storage_account.main.name
container_access_type = "private"
}
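# ── Log Analytics Workspace + Application Insights (workspace-based) ──
resource "azurerm_log_analytics_workspace" "main" {
name = "law-oms-integration-${var.environment}"
location = azurerm_resource_group.oms_integration.location
resource_group_name = azurerm_resource_group.oms_integration.name
sku = "PerGB2018"
retention_in_days = var.environment == "prod" ? 90 : 30
tags = var.tags
}
resource "azurerm_application_insights" "main" {
name = "appinsights-oms-integration-${var.environment}"
location = azurerm_resource_group.oms_integration.location
resource_group_name = azurerm_resource_group.oms_integration.name
workspace_id = azurerm_log_analytics_workspace.main.id
application_type = "web"
retention_in_days = 90
tags = var.tags
}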
# ── Key Vault ─────────────────────────────────────────────────────
resource "azurerm_key_vault" "main" {
# Key Vault names are limited to 24 characters; "kv-oms-prod-xxxx" is 16
name = "kv-oms-${var.environment}-${random_string.keyvault_suffix.result}"
location = azurerm_resource_group.oms_integration.location
resource_group_name = azurerm_resource_group.oms_integration.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
# Use Azure RBAC (not access policies) so the "Key Vault Secrets User" assignment takes effect
enable_rbac_authorization = true
purge_protection_enabled = var.environment == "prod"
tags = var.tags
}
# ── Random Strings for Naming ─────────────────────────────────────
resource "random_string" "storage_suffix" {
length = 4
special = false
upper = false
}
resource "random_string" "keyvault_suffix" {
length = 4
special = false
upper = false
}
# ── Data Source: Current Client (identity running Terraform) ─────
data "azurerm_client_config" "current" {}
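The `oms-orders` container's composite index on `(processingStatus ASC, ingestedAt DESC)` lets Function 2 find orders by status in time order without a full scan. Cosmos DB can serve an ORDER BY from a composite index when the sort matches the index order or its exact inverse, so an oldest-first batch query can sort `processingStatus DESC, ingestedAt ASC`. The sketch below assumes Function 2 selects batches with a query of this kind and builds it in the `{"query", "parameters"}` form the Cosmos DB SDKs accept. The helper names and the item layout are hypothetical, apart from the fields the Terraform references (`orderId`, `processingStatus`, `ingestedAt`, `omsEvent`) and the mandatory `id`.

```python
from datetime import datetime, timezone
from typing import Optional

TTL_SECONDS = 2592000  # default_ttl on the oms-orders container
DAY_SECONDS = 24 * 60 * 60


def build_order_document(order_id: str, oms_event: dict,
                         now: Optional[datetime] = None) -> dict:
    """Hypothetical item shape written by Function 1."""
    now = now or datetime.now(timezone.utc)
    return {
        "id": order_id,                 # every Cosmos DB item needs an "id"
        "orderId": order_id,            # partition key path: /orderId
        "processingStatus": "Pending",  # initial pipeline state
        # Fixed-width UTC timestamps sort chronologically as strings
        "ingestedAt": now.isoformat(timespec="milliseconds"),
        "omsEvent": oms_event,          # raw payload; excluded from the index
    }


def build_pending_batch_query(status: str = "Pending", limit: int = 1000) -> dict:
    """Oldest-first batch query in the {"query", "parameters"} form.

    ORDER BY processingStatus DESC, ingestedAt ASC is the exact inverse of
    the composite index (ASC, DESC), so the index can serve the sort.
    """
    return {
        "query": (
            "SELECT TOP @limit * FROM c "
            "WHERE c.processingStatus = @status "
            "ORDER BY c.processingStatus DESC, c.ingestedAt ASC"
        ),
        "parameters": [
            {"name": "@status", "value": status},
            {"name": "@limit", "value": limit},
        ],
    }


print(TTL_SECONDS // DAY_SECONDS)  # → 30 days before an untouched item expires
```

Sorting on the filtered property as well as `ingestedAt` looks redundant, but including it is what lets the query line up with the two-path composite index.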
variables.tf
variable "subscription_id" {
description = "Azure subscription ID"
type = string
}
variable "environment" {
description = "Deployment environment"
type = string
validation {
condition = contains(["dev", "uat", "prod"], var.environment)
error_message = "Environment must be dev, uat, or prod."
}
}
variable "location" {
description = "Azure region for resources"
type = string
default = "eastus"
}
variable "tags" {
description = "Resource tags"
type = map(string)
default = {
project = "oms-d365-integration"
managedBy = "terraform"
}
}
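Because `environment` is restricted to dev, uat, or prod, generated names have a bounded length, but they still have to satisfy Azure's naming rules. Storage account names must be 3 to 24 lowercase letters and digits. Key Vault names must be 3 to 24 letters, digits, and hyphens, start with a letter, end with a letter or digit, and contain no consecutive hyphens. For example, `kv-oms-integration-prod-a1b2` is 28 characters and would be rejected at apply time. The sketch below is a pre-`plan` check along those lines. The helper names are hypothetical, and `a1b2` stands in for the random suffix.

```python
import re

# Storage account: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Key Vault: 3-24 characters of letters, digits, and hyphens; starts with a
# letter, ends with a letter or digit, and never contains "--".
KEY_VAULT_RE = re.compile(r"[A-Za-z](?!.*--)[A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_storage_account_name(name: str) -> bool:
    return STORAGE_ACCOUNT_RE.fullmatch(name) is not None


def is_valid_key_vault_name(name: str) -> bool:
    return KEY_VAULT_RE.fullmatch(name) is not None


for env in ("dev", "uat", "prod"):
    storage = f"st{env}omsa1b2"            # "a1b2" stands in for random_string
    vault = f"kv-oms-integration-{env}-a1b2"
    print(env, is_valid_storage_account_name(storage), is_valid_key_vault_name(vault))
```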
outputs.tf
output "resource_group_name" {
value = azurerm_resource_group.oms_integration.name
}
output "service_bus_namespace_name" {
value = azurerm_servicebus_namespace.main.name
}
output "service_bus_namespace_id" {
value = azurerm_servicebus_namespace.main.id
}
output "cosmosdb_account_endpoint" {
value = azurerm_cosmosdb_account.main.endpoint
}
output "cosmosdb_account_primary_key" {
value = azurerm_cosmosdb_account.main.primary_key
sensitive = true
}
output "storage_account_name" {
value = azurerm_storage_account.main.name
}
output "blob_container_name" {
value = azurerm_storage_container.oms_payloads.name
}
output "application_insights_connection_string" {
# Connection strings replace instrumentation keys for telemetry ingestion
value = azurerm_application_insights.main.connection_string
sensitive = true
}
output "key_vault_id" {
value = azurerm_key_vault.main.id
}
output "key_vault_uri" {
value = azurerm_key_vault.main.vault_uri
}
Terraform Deployment
# Initialize Terraform
terraform init
# Plan deployment (dev)
terraform plan \
-var-file="environments/dev.tfvars" \
-out=tfplan
# Apply deployment
terraform apply tfplan
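# Export outputs (-json prints sensitive values such as the Cosmos DB key in plain text; keep outputs.json out of source control)
terraform output -json > outputs.json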
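`terraform output -json` prints sensitive values unmasked, so any script that consumes `outputs.json` (for example, to populate app settings) should mask those entries before logging them. The command emits one object per output with `sensitive`, `type`, and `value` fields. The sketch below relies only on that shape, and its helper name is hypothetical.

```python
import json


def redacted_outputs(raw_json: str) -> dict:
    """Map output name -> value, masking anything Terraform marks sensitive."""
    outputs = json.loads(raw_json)
    return {
        name: ("<redacted>" if meta.get("sensitive") else meta.get("value"))
        for name, meta in outputs.items()
    }
```

Usage: `redacted_outputs(open("outputs.json").read())` returns a dict that is safe to print or log.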
🔒 Security Architecture
Defense-in-depth security model: network, identity, data, and secret management layers.
Security Architecture Overview
Zero-Trust Security Principles Applied
Security Layers
1. Network Security
- Event Grid IP Filtering: Restrict inbound CloudEvent publishing to OMS's known egress IP addresses, allow-listing each address as a /32 entry in the topic's inbound IP rules.
- Private Endpoints (Production): Cosmos DB, Service Bus (Premium tier required), and Blob Storage are reached via private IPs, so traffic stays on the Azure backbone instead of the public internet.
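- VNet Integration (Logic App): Logic App Standard uses regional VNet integration for outbound calls, or runs in an App Service Environment v3 for full network isolation. The older Integration Service Environment (ISE) applied only to Consumption Logic Apps and has been retired.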
- Function App with VNet: Functions use regional VNet integration; all outbound traffic through VNet (enabling NSG rules, firewall filtering).
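Event Grid enforces the inbound IP rules itself. The check below is only a pre-deployment guard. It assumes the allow-list is kept in source control as a list of CIDR strings (a hypothetical convention), rejects entries broader than a single host, and confirms that a known OMS egress address is covered. The addresses come from the 203.0.113.0/24 documentation range.

```python
import ipaddress


def validate_allow_list(cidrs):
    """Parse allow-list entries, rejecting anything broader than one host."""
    networks = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=True)  # ValueError if malformed
        if net.prefixlen != net.max_prefixlen:          # /32 for IPv4
            raise ValueError(f"{cidr} covers more than a single host")
        networks.append(net)
    return networks


def is_allowed(source_ip, networks):
    """True if the address falls inside any allow-listed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)
```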
2. Identity & Access Control (IAM)
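Service-to-service calls use Managed Identity + RBAC rather than shared keys or passwords. The remaining credentials (the D365 service principal secret and fallback SAS keys) are kept in Key Vault.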
RBAC Configuration (Bicep)
// Grant Service Bus Data Receiver role to Function App
resource sbRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, functionApp.id, 'Service Bus Data Receiver')
scope: sbSubscription
properties: {
roleDefinitionId: subscriptionResourceId(
'Microsoft.Authorization/roleDefinitions',
'4f6d3b9b-027b-4f4c-9142-0e5a2a2247e0' // Azure Service Bus Data Receiver
)
principalId: functionApp.identity.principalId
principalType: 'ServicePrincipal'
}
}
// Grant Cosmos DB Built-in Data Contributor (data-plane role).
// Cosmos DB for NoSQL data access is granted via sqlRoleAssignments, not
// Microsoft.Authorization/roleAssignments; 230815da-... is "Cosmos DB Operator",
// a control-plane role that cannot read or write items.
resource cosmosRoleAssignment 'Microsoft.DocumentDB/databaseAccounts/sqlRoleAssignments@2023-04-15' = {
parent: cosmosDb
name: guid(cosmosDb.id, functionApp.id, 'Cosmos DB Built-in Data Contributor')
properties: {
roleDefinitionId: '${cosmosDb.id}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002'
principalId: functionApp.identity.principalId
scope: cosmosDb.id
}
}
// Grant Storage Blob Data Contributor
resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, functionApp.id, 'Storage Blob Data Contributor')
scope: storage
properties: {
roleDefinitionId: subscriptionResourceId(
'Microsoft.Authorization/roleDefinitions',
'ba92f5b4-2d11-453d-a403-e96b0029c9fe' // Storage Blob Data Contributor
)
principalId: functionApp.identity.principalId
principalType: 'ServicePrincipal'
}
}
// Grant Key Vault Secrets User
resource kvRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(resourceGroup().id, functionApp.id, 'Key Vault Secrets User')
scope: keyVault
properties: {
roleDefinitionId: subscriptionResourceId(
'Microsoft.Authorization/roleDefinitions',
'4633458b-17de-408a-b874-0445c86b69e6' // Key Vault Secrets User
)
principalId: functionApp.identity.principalId
principalType: 'ServicePrincipal'
}
}
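Each assignment's `name` is derived with `guid(...)` from stable inputs. Redeploying the template therefore produces the same name, and ARM treats the deployment as an update instead of rejecting it as a duplicate of an existing assignment. The Python sketch below illustrates that property with `uuid.uuid5`. It is an analogy only: ARM's `guid()` uses its own derivation, so the values differ, and the resource IDs shown are placeholders.

```python
import uuid

# Arbitrary namespace for this illustration. ARM's guid() uses its own
# derivation, so these values will not match what Bicep produces.
NAMESPACE = uuid.NAMESPACE_URL


def assignment_name(*parts: str) -> str:
    """Deterministic GUID from the inputs, analogous to guid(a, b, c) in Bicep."""
    return str(uuid.uuid5(NAMESPACE, "|".join(parts)))


rg = "/subscriptions/<sub>/resourceGroups/rg-oms-integration"  # placeholder IDs
first = assignment_name(rg, "func-principal", "Service Bus Data Receiver")
redeploy = assignment_name(rg, "func-principal", "Service Bus Data Receiver")
print(first == redeploy)  # same inputs on every deployment -> same name
```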
3. Data Encryption
- At Rest: All data encrypted with AES-256 by default in Azure (automatic; no configuration needed).
- In Transit: TLS 1.2+ enforced for all API calls; CloudEvent payload encrypted by transport layer.
- Key Management: Customer-managed keys (CMK) can optionally be enabled for Cosmos DB and Storage when compliance requirements demand control over the encryption keys.
4. Secret Management (Key Vault)
All sensitive configuration stored in Azure Key Vault (not in app settings or git).
| Secret | Stored In | Accessed By | Rotation |
|---|---|---|---|
| D365 Service Principal Password | Key Vault | Function 1, Logic App | 90 days (Azure Automation scheduled) |
| Cosmos DB Connection String | Not stored | Function 1, Function 2 (authenticate via Managed Identity) | N/A (no secret to rotate) |
| Service Bus SAS Key (fallback) | Key Vault | Function binding reference | 180 days |
| Event Grid SAS Key (fallback) | Key Vault | OMS app config | 90 days |
| Application Insights API Key | Key Vault | Logic App telemetry connector | 1 year |
Key Vault Reference in App Settings
{
"name": "D365ServicePrincipalPassword",
"value": "@Microsoft.KeyVault(VaultName=kv-oms-integration-prod;SecretName=d365-sp-password)"
}
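Key Vault can track expiry natively through secret expiration dates and the `Microsoft.KeyVault.SecretNearExpiry` Event Grid event. The sketch below only illustrates the date arithmetic behind the rotation column, for example inside an Azure Automation runbook. The secret names are hypothetical except `d365-sp-password`, and "1 year" is approximated as 365 days.

```python
from datetime import date, timedelta

# Rotation periods (days) from the secret table; names are hypothetical
# except d365-sp-password, and "1 year" is approximated as 365 days.
ROTATION_DAYS = {
    "d365-sp-password": 90,
    "servicebus-sas-key": 180,
    "eventgrid-sas-key": 90,
    "appinsights-api-key": 365,
}


def next_rotation(secret_name: str, last_rotated: date) -> date:
    return last_rotated + timedelta(days=ROTATION_DAYS[secret_name])


def overdue(last_rotated_by_secret: dict, today: date) -> list:
    """Secrets whose next rotation date is today or earlier, sorted by name."""
    return sorted(
        name for name, last in last_rotated_by_secret.items()
        if next_rotation(name, last) <= today
    )
```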
5. Audit & Compliance Logging
- Azure Activity Log: All resource deployments, role assignments, policy changes tracked in Activity Log (searchable, exportable).
- Key Vault Audit Logging: Every secret access is logged (who, when, success/failure) once a diagnostic setting routes the AuditEvent category to Log Analytics or Storage.
- Application Insights Telemetry: Function execution, errors, dependencies logged with correlation IDs for end-to-end tracing.
- Service Bus Metrics: Messages sent/received, DLQ counts, throttling events in Azure Monitor.
6. Access Control Principle: Least Privilege
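- Function App: "Azure Service Bus Data Receiver" (not Sender; cannot send messages)
- Function App: "Cosmos DB Built-in Data Contributor" (reads and writes items; cannot create or delete databases or containers)
- Logic App: "Storage Blob Data Contributor" (reads, writes, and deletes blobs and containers; "Storage Blob Data Reader" is the tighter fit if the Logic App only downloads ZIPs)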
- OMS App Registration: "EventGrid Data Sender" (can only publish events; can't read/delete topics)
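One way to keep this list honest is to diff actual assignments against a declared baseline. Azure RBAC assignments can be exported with `az role assignment list`. Cosmos DB data-plane assignments come from `az cosmosdb sql role assignment list` instead. The sketch assumes both have already been collected into a `{principal: set_of_role_names}` dict. The principal names are hypothetical.

```python
# Least-privilege baseline (hypothetical principal names).
EXPECTED_ROLES = {
    "func-oms-integration": {
        "Azure Service Bus Data Receiver",
        "Cosmos DB Built-in Data Contributor",
        "Storage Blob Data Contributor",
        "Key Vault Secrets User",
    },
    "oms-to-d365-delivery": {"Storage Blob Data Contributor"},
    "oms-publisher-app": {"EventGrid Data Sender"},
}


def audit(actual: dict) -> dict:
    """Report roles held beyond the baseline ("extra") and roles not yet granted ("missing")."""
    findings = {}
    for principal in sorted(set(actual) | set(EXPECTED_ROLES)):
        have = actual.get(principal, set())
        want = EXPECTED_ROLES.get(principal, set())
        extra, missing = sorted(have - want), sorted(want - have)
        if extra or missing:
            findings[principal] = {"extra": extra, "missing": missing}
    return findings
```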
📊 Monitoring & Observability
Complete telemetry stack: Application Insights for logging, KQL for querying, alerts for proactive incident detection.
Application Insights Integration
- Function App: Automatic SDK injection; ILogger calls automatically flow to App Insights.
- Logic App: Built-in connector for custom events; manual tracking of workflow outcomes.
- Dependencies: Auto-tracked calls to Cosmos DB, Service Bus, Blob Storage (latency, success/failure).
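- Correlation IDs: Each invocation gets an operation ID in Application Insights. To link events across the Functions and the Logic App, attach the OrderId (or a shared CorrelationId) as a custom dimension on every event.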
Key KQL (Kusto Query Language) Queries
Query 1: Hourly Ingestion Successes and Average Duration (last 24h)
customEvents
| where name == "IngestionSuccess"
| where timestamp > ago(24h)
| summarize
success_count = count(),
avg_duration_ms = avg(todouble(customDimensions.DurationMs))
by bin(timestamp, 1h)
| render timechart with (title="Hourly Ingestion Successes")
Query 2: Dead Letter Queue Events
traces
| where message contains "DeadLetter"
| where timestamp > ago(24h)
| project
timestamp,
orderId = tostring(customDimensions.OrderId),
reason = tostring(customDimensions.DeadLetterReason),
error_message = tostring(customDimensions.ErrorMessage)
| order by timestamp desc
Query 3: End-to-End Latency (Ingest → Deliver)
// Earliest event per order, so retried ingestions or deliveries don't multiply join rows
let ingested = customEvents
| where timestamp > ago(7d)
| where name == "IngestionSuccess"
| summarize ingestedAt = min(timestamp) by orderId = tostring(customDimensions.OrderId);
let delivered = customEvents
| where timestamp > ago(7d)
| where name == "DeliverySuccess"
| summarize deliveredAt = min(timestamp) by orderId = tostring(customDimensions.OrderId);
ingested
| join kind=inner delivered on orderId
| extend latency_hours = (deliveredAt - ingestedAt) / 1h
| summarize
avg_latency = avg(latency_hours),
p95_latency = percentile(latency_hours, 95),
p99_latency = percentile(latency_hours, 99)
by bin(ingestedAt, 4h)
| render linechart
Query 4: Function Warnings and Errors (last hour)
traces
| where timestamp > ago(1h)
| where severityLevel >= 2 // 2 = Warning, 3 = Error, 4 = Critical
| summarize error_count = count()
by FunctionName = tostring(customDimensions.FunctionName), severityLevel
| order by error_count desc
Query 5: Cosmos DB RU Consumption
dependencies
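| where target startswith "cosmos-oms-integration-prod" // Cosmos dependency targets start with the account host name
| where timestamp > ago(24h)
| extend ru = todouble(customDimensions.RUConsumed) // custom property: the Functions must log each response's request charge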
| summarize
total_ru = sum(ru),
avg_ru_per_op = avg(ru),
max_ru_per_op = max(ru)
by bin(timestamp, 1h)
| render barchart
Azure Monitor Alerts
| Alert Name | Condition | Severity | Action Group |
|---|---|---|---|
| DLQ Message Alert | Service Bus metric DeadletteredMessages > 0 for the OMS entity | 🔴 Critical | Page ops on-call; create incident ticket |
| Function Failure Rate High | Failed invocations > 5% in last 5 min | 🟠 High | Email ops@contoso.com; log to dashboard |
| Logic App Run Failed | Logic App oms-to-d365-delivery run status = Failed | 🟠 High | Page ops-critical group; send Teams notification |
| Cosmos DB Throttling | 429 error count > 10 in 5 min window | 🟡 Medium | Email ops@contoso.com; evaluate RU scale-up |
| Event Grid Failed Deliveries | Event Grid metric DeadLetteredCount (Dead Lettered Events) > 0 | 🟡 Medium | Email ops@contoso.com; review Event Grid dead letter log |
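Azure Monitor evaluates these rules server-side. The sketch below only spells out the semantics of the "Failed invocations > 5% in last 5 min" condition: a strict greater-than over a sliding window. That can be useful when unit-testing a custom health probe that mirrors the alert. The class name and the `min_samples` guard (which avoids alerting on a single failure in a near-empty window) are assumptions, and events are assumed to be recorded in timestamp order.

```python
from collections import deque


class FailureRateWindow:
    """Failure rate over the last `window_s` seconds, alerting above `threshold`."""

    def __init__(self, window_s: float = 300.0, threshold: float = 0.05,
                 min_samples: int = 1):
        self.window_s = window_s
        self.threshold = threshold
        self.min_samples = min_samples
        self._events = deque()  # (timestamp, succeeded), oldest first

    def record(self, ts: float, succeeded: bool) -> None:
        self._events.append((ts, succeeded))

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def failure_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def should_alert(self, now: float) -> bool:
        self._evict(now)
        return (len(self._events) >= self.min_samples
                and self.failure_rate(now) > self.threshold)
```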
Application Insights Dashboard
Pin the following to your dashboard for real-time visibility:
- Live Metrics Stream: Real-time incoming requests, server response time, failures (streaming only; not historical)
- Failures Blade: Top 10 failed operations; drill down to stack traces
- Custom Events Timeline: IngestionSuccess, TransformSuccess, DeliverySuccess events plotted over time
- End-to-End Transaction Search: Search by OrderId; see all related traces across Functions + Logic App
- Application Map: Visual graph of Function → Cosmos DB, Logic App → Blob Storage calls; latency overlays
SLA & SLO Targets
| Metric | Target (SLO) | Measurement |
|---|---|---|
| Order Delivery Success Rate | 99.9% (3x9) | Orders successfully imported into D365 / Total orders initiated |
| End-to-End Latency (p99) | < 5 hours | Order appears in D365 within 5 hours of OMS submission |
| Ingestion Latency (p95) | < 1 second | Order visible in Cosmos DB within 1 second of Service Bus arrival |
| DLQ Rate | < 0.1% | Messages reaching DLQ / Total messages attempted |
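A 99.9% delivery SLO implies a concrete error budget per period: out of 10,000 orders, at most 10 may fail. The sketch below computes that budget. It takes the SLO as a decimal string to avoid binary floating-point rounding, and the function name and return shape are illustrative. The same arithmetic applies to the < 0.1% DLQ target.

```python
from decimal import Decimal, ROUND_FLOOR


def error_budget(total_orders: int, slo: str, failed_orders: int) -> dict:
    """Allowed failures for the period, what is left, and whether the SLO is breached."""
    allowed = int(
        (Decimal(total_orders) * (1 - Decimal(slo))).to_integral_value(rounding=ROUND_FLOOR)
    )
    remaining = allowed - failed_orders
    return {"allowed_failures": allowed, "remaining": remaining, "breached": remaining < 0}
```

For example, `error_budget(10_000, "0.999", 4)` leaves 6 failures of budget for the rest of the period.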
🚀 Future Roadmap
Planned enhancements across resilience, security, observability, and operational excellence.
Phase 2 — Enhanced Resilience (3–6 months)
Replace simple timer functions with Azure Durable Functions for checkpointing and fault tolerance.
- Activity functions for each step (query Cosmos, transform, zip, upload)
- Automatic retry with exponential backoff without manual logic
- Suspend/resume on transient failures (network timeouts)
- Human approval workflow for DLQ remediation
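Durable Functions configures activity retries declaratively (first retry interval, maximum number of attempts, backoff coefficient, maximum retry interval) rather than through hand-written loops. The sketch below computes the delay schedule those settings are assumed to describe. It is meant for sanity-checking values before committing them, not as the Durable Functions implementation. Here `max_attempts` counts the initial call.

```python
def retry_delays(first_retry_s: float, backoff_coefficient: float,
                 max_retry_s: float, max_attempts: int) -> list:
    """Delay before each retry: first * coefficient**n, capped at max_retry_s.

    max_attempts includes the initial call, so there are max_attempts - 1 retries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    return [
        min(first_retry_s * backoff_coefficient ** n, max_retry_s)
        for n in range(max_attempts - 1)
    ]
```

With a 5-second first retry, a coefficient of 2, a 60-second cap, and 6 attempts, the waits are 5, 10, 20, 40, and 60 seconds.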
Add a secondary Cosmos DB region (West US) with automatic failover.
- Read replicas reduce latency for queries (p99 < 30ms from any region)
- Automatic failover if primary region unreachable (RTO < 5min)
- Multi-region writes for high-frequency updates (available with any consistency level except Strong; conflicting writes resolved last-writer-wins by default)
Add circuit breaker pattern for D365 API calls to prevent cascading failures.
- Detect D365 downtime or throttling (429 > 10/min)
- Open circuit; queue blobs in Blob Storage for retry later
- Automatic recovery when D365 health restored
- Fallback: email notification for manual intervention
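Below is a minimal, single-threaded sketch of the breaker described above. It assumes the D365 upload is wrapped in a callable. The thresholds, the injectable `clock`, and the `CircuitOpenError` name are illustrative choices. When the circuit is open, the caller is expected to leave the ZIP in Blob Storage and retry later. A production version would also need to limit concurrent trial calls in the half-open state.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling D365 while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    open -> half-open after `reset_timeout_s`; one trial call decides what follows."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("D365 circuit open; defer the upload")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Success (including a half-open trial) closes the circuit.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open and restart the timeout
```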
Phase 3 — Security Hardening (1–3 months)
Migrate all PaaS services to private endpoint architecture.
- Cosmos DB private endpoint in VNet
- Service Bus private endpoint
- Blob Storage private endpoint
- Event Grid private endpoint for the custom topic (already supported for custom topics and domains)
- Outcome: traffic to these services no longer traverses the public internet
Deploy the Function App with regional VNet integration, and run the Logic App (Standard) in an App Service Environment v3 (ASEv3); the Integration Service Environment (ISE) has been retired.
- All outbound traffic through VNet (NSG rules, firewall filtering)
- Inbound: Function App via Azure Front Door with WAF
- Result: no direct internet exposure
Enable advanced threat protection and compliance scanning.
- Continuous vulnerability assessment for container images
- CSPM (Cloud Security Posture Management) reports
- Regulatory compliance dashboard (SOC2, PCI-DSS, HIPAA if applicable)
Replace SAS keys with Entra ID app registration and token-based auth.
- OMS gets app registration with "EventGrid Data Sender" role
- No SAS keys to rotate; token-based (1-hour lifetime)
- Audit trail in Microsoft Entra ID sign-in logs
Phase 4 — Observability (2–4 months)
Create executive dashboards for order throughput, latency SLOs, and cost tracking.
- Orders processed per 4-hour batch (trend chart)
- End-to-end latency (p50/p95/p99 percentiles)
- Cost per order (RU cost + Function execution + data transfer)
- DLQ event drill-down (count, root causes)
Implement W3C TraceContext standard for end-to-end tracing.
- OMS generates traceparent header; included in CloudEvent
- Service Bus, Functions, Logic App forward traceparent header
- All related logs linked by trace ID (searchable in App Insights)
- Integration with Jaeger or similar APM tools
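The CloudEvents distributed-tracing extension carries the W3C `traceparent` value as an event attribute. Each hop keeps the 32-hex-digit trace ID and substitutes its own 16-hex-digit span ID. The sketch below handles only version `00` and uses hypothetical helper names. In practice an OpenTelemetry SDK or the Application Insights SDK typically performs this propagation. Starting a new sampled trace when the header is missing is an assumed policy.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return the header's fields, or None if it is missing or invalid."""
    m = TRACEPARENT_RE.fullmatch(header.strip())
    if not m:
        return None
    version, trace_id, parent_id, flags = m.groups()
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # forbidden values per the spec
    return {"version": version, "trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def _nonzero_hex(nbytes: int) -> str:
    value = secrets.token_hex(nbytes)
    while value == "0" * (2 * nbytes):  # all-zero IDs are invalid
        value = secrets.token_hex(nbytes)
    return value


def child_traceparent(header: str) -> str:
    """Keep the incoming trace ID and mint a new span ID for this hop."""
    parent = parse_traceparent(header)
    if parent is None:
        trace_id, flags = _nonzero_hex(16), "01"  # start a new, sampled trace
    else:
        trace_id, flags = parent["trace_id"], parent["flags"]
    return f"00-{trace_id}-{_nonzero_hex(8)}-{flags}"
```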
Use Grafana for multi-cloud visualization and alerting.
- Azure Monitor data source plugin
- Custom panels for order throughput, latency, cost trends
- Alert rules in Grafana (send to PagerDuty, Slack)
Phase 5 — Operational Excellence
Automated testing and deployment for Functions and Logic App.
- Trigger: git push to main branch
- Build: dotnet build + unit tests (xUnit)
- Integration tests: Deploy to dev, run Newman tests against D365 sandbox
- Deploy to UAT on successful tests; manual approval for Prod
Newman (Postman collections) for order pipeline testing.
- Test suite: Create order → ingest → batch → delivery
- Assert: Order appears in D365 F&O within SLO (5 hours)
- Run nightly against dev environment; alert on failure
Continuous compliance checking against IaC source of truth.
- Policy-as-Code: Azure Policy enforces naming conventions, tagging, encryption settings
- Automated remediation: non-compliant resources auto-corrected
- Report: weekly email of drift incidents
Proactive failure testing to improve resilience.
- Fault injection: simulate Cosmos DB latency spikes (100ms → 500ms)
- Service outages: temporarily disable Service Bus to test fallback logic
- Measure: time to automated recovery, and the SLA impact whenever manual intervention is needed
- Frequency: Monthly chaos experiments in UAT
Azure Automation runbooks for common remediation tasks.
- DLQ Remediation: Auto-requeue messages after validation
- Cosmos DB Scale-up: Auto-scale RU/s when throttling detected
- Function App Cold Start: Pre-warm instances during peak hours
- SAS Key Rotation: Auto-rotate Event Grid keys every 90 days and Service Bus keys every 180 days
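The "auto-requeue after validation" runbook comes down to one decision per dead-lettered message. The sketch assumes the runbook has already peeked each message and re-run the ingestion validation. The `requeue_count` property, the requeue cap, and the choice of retryable reasons are all hypothetical. `MaxDeliveryCountExceeded` is a reason the Service Bus broker assigns itself. Other reasons, including application-set rejections, are left in the DLQ for human review.

```python
from dataclasses import dataclass

# Broker-assigned DeadLetterReason values this sketch treats as retryable
# once the payload re-validates; everything else is parked.
RETRYABLE_REASONS = {"MaxDeliveryCountExceeded"}


@dataclass
class DeadLetteredMessage:
    message_id: str
    dead_letter_reason: str
    requeue_count: int    # hypothetical app property counting prior requeues
    payload_valid: bool   # result of re-running the ingestion validation


def remediation_action(msg: DeadLetteredMessage, max_requeues: int = 3) -> str:
    """Return "requeue" or "park" (leave in the DLQ for manual review)."""
    if not msg.payload_valid:
        return "park"      # malformed payloads need a human
    if msg.dead_letter_reason not in RETRYABLE_REASONS:
        return "park"      # e.g. TTL expiry or app-defined rejections
    if msg.requeue_count >= max_requeues:
        return "park"      # break requeue loops
    return "requeue"
```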