The blog of a cloud agnostic professional and craft beer connoisseur

Monitoring Azure Data Factory for the Azure Well-Architected Framework


The Azure Well-Architected Framework (WAF) helps ensure that Azure workloads are reliable, stable, and secure while meeting SLAs for performance and cost. The WAF tenets are:

Cost Optimization – Managing costs to maximize the value delivered.
Reliability – The ability of a system to recover from failures and continue to function.
Operational Excellence – Operational processes that keep a system running in production.
Performance Efficiency – The ability of a system to adapt to changes in load.
Security – Protecting applications and data from threats.

 

Applying the Azure WAF to your Azure Data Factory (ADF) workloads is critical and should be considered during initial architecture design and resource deployment. If you haven’t already, check out the companion blog post, Azure Data Factory Patterns and Features for the Azure Well-Architected Framework. But how do you ensure that your ADF environment continues to meet WAF guidance as workloads grow and evolve?

 

In this blog post, we’ll focus on monitoring Azure Data Factory to help align data workloads with the Azure Well-Architected Framework.

 

 

Alerts and monitoring over Azure Data Factory

All Azure resources let you build dashboards over costs, but these dashboards don’t necessarily provide the detail you need or alert you when an issue arises. You can view pipeline activity within the Data Factory itself, but that view does not let you build aggregated reports over activities and pipelines over time.

 

To fill these gaps, you can create alerts over ADF metrics, use Azure Monitor and Log Analytics for detailed or summarized information about your Data Factory activities, or build your own notification framework within Data Factory. Each option helps keep your Data Factories optimized for cost, performance, and reliability.

 

Using metrics and alerts in Data Factory

Metrics are essentially performance counters, always returning a number, and are leveraged when you configure alerts.

Configure alerts for failures

Configure ADF metrics and alerts to send notifications when triggers, pipelines, activities, or SSIS packages fail. For example, an alert can be issued whenever the activity named “cdCopyTextToSQL” fails:

 

Configure Pipeline Elapsed Time metric

In the ADF pipeline settings, the Elapsed time metric lets you set an expected duration for the pipeline:

Then create an Alert Rule for Elapsed Time Pipeline Run metrics:

If the pipeline runtime exceeds the duration defined in the Elapsed time metric Pipeline Settings, an alert will be issued.

 

Set Alerts on Self-Hosted Integration Runtimes

Self-Hosted Integration Runtimes (SHIRs) are used to move and transform data that resides in an on-premises network or VNet. Set alerts to ensure resources are not overutilized or queuing data movement requests:

The following metrics are available:

Integration runtime available memory (IntegrationRuntimeAvailableMemory)  – be notified when there are any dips in available memory
Integration runtime available node count (IntegrationRuntimeAvailableNodeNumber) – be notified when nodes in a SHIR cluster are not available or not being fully utilized
Integration runtime CPU Utilization (IntegrationRuntimeCpuPercentage) – be notified when there are spikes in CPU or when CPU is being maxed out
Integration runtime queue duration (IntegrationRuntimeAverageTaskPickupDelay) – be notified when the average activity queue duration exceeds a limit
Integration runtime queue length (IntegrationRuntimeQueueLength) – be notified when activities are piling up in the queue, which leads to long waits between activities
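
As a rough illustration of the kind of rule these alerts encode, here is a minimal Python sketch that compares metric samples to thresholds. The metric names come from the list above; the threshold values, their units, and the `breached_metrics` helper are hypothetical placeholders, not ADF defaults.

```python
# Hypothetical thresholds: ("above" | "below", limit). Tune them to your own SHIR sizing
# and to the unit each metric actually reports.
THRESHOLDS = {
    "IntegrationRuntimeAvailableMemory": ("below", 1024),
    "IntegrationRuntimeCpuPercentage": ("above", 90),
    "IntegrationRuntimeQueueLength": ("above", 10),
    "IntegrationRuntimeAverageTaskPickupDelay": ("above", 60),
}


def breached_metrics(samples):
    """Return the sorted names of metrics whose sample crosses its threshold."""
    breached = []
    for name, value in samples.items():
        rule = THRESHOLDS.get(name)
        if rule is None:
            continue  # no rule defined for this metric
        direction, limit = rule
        if direction == "above" and value > limit:
            breached.append(name)
        elif direction == "below" and value < limit:
            breached.append(name)
    return sorted(breached)


# Example: CPU is over its limit, the queue is not.
print(breached_metrics({"IntegrationRuntimeCpuPercentage": 95,
                        "IntegrationRuntimeQueueLength": 3}))
```

In practice the alert rules themselves do this evaluation for you; the sketch is only meant to make the direction of each check explicit (memory alerts fire on low values, the others on high values).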

You can also configure event log capture on the VM(s) that host your SHIR.

 

Set alerts on Azure Subscription Limits

ADF has resource limits per Azure subscription. If you expect a Data Factory to have a large number of pipelines, datasets, triggers, linked services, private endpoints, and other entities, set alerts on the Total entities count to be notified when the Data Factory starts approaching the limit (the default limit is 5000).

You can also set an alert or query on Total factory size (GB unit) to ensure the Data Factory will not exceed the data factory size limit (2 GB default).
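
To make the "approaching the limit" idea concrete, here is a small Python sketch. The two limits are the defaults quoted above; the 80% warning ratio and the `limit_warnings` helper are assumptions chosen for illustration.

```python
ENTITY_LIMIT = 5000    # default total-entities limit quoted above
SIZE_LIMIT_GB = 2.0    # default factory size limit quoted above


def limit_warnings(total_entities, factory_size_gb, warn_ratio=0.8):
    """Return warning messages for values at or above warn_ratio of their limit."""
    warnings = []
    if total_entities >= ENTITY_LIMIT * warn_ratio:
        warnings.append(f"entities at {total_entities / ENTITY_LIMIT:.0%} of limit")
    if factory_size_gb >= SIZE_LIMIT_GB * warn_ratio:
        warnings.append(f"size at {factory_size_gb / SIZE_LIMIT_GB:.0%} of limit")
    return warnings


print(limit_warnings(4200, 0.5))
```

Alerting at a ratio rather than at the hard limit leaves time to split a factory or clean up unused entities before deployments start failing.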

 

Leveraging alerts in ADF allows you to be immediately notified when pipelines are failing or when resources are reaching their limits, supporting the WAF tenets of Cost Optimization, Reliability, Operational Excellence, and Performance Efficiency.

 

Use Azure Monitor with Log Analytics over Data Factory

Azure Monitor provides verbose information about your ADF triggers, pipelines, and activities for further analysis.

 

Add diagnostic settings

Add diagnostic settings to your Data Factory, enabling Azure Monitor to provide detailed information such as activity duration, trends, and failure information.

 

Send this data to Log Analytics to query it with the Kusto Query Language (KQL), build Azure workbooks from KQL queries, or export it to Power BI for further transformation and analysis.

 

(I do not use SSIS in my Data Factories, so I have not configured the SSIS log categories.)

 

Explore logs with KQL

 

In the Azure Portal for the Data Factory where you configured the diagnostic settings, go to Monitoring -> Logs to query the corresponding Log Analytics tables containing the run information about your Data Factory:

 

 

Detailed Failure Information

Run queries to get detailed or aggregated information about failures, as in the example below:

ADFActivityRun
| where Status == "Failed"
| project ActivityName, TimeGenerated, Error, Input, Output

 

 

 

 

Extrapolate costs for orchestration

Costs in Azure Data Factory are usage-based: they depend on the number of activities run or triggered, the type of Integration Runtime (IR) used, the number of cores used in an IR, and the type of activity. See the Azure Data Factory pricing page for the latest pricing details.

 

Calculations for Orchestration activities are simple: sum up the number of failed or successful activities (ADFActivityRun) plus the number of triggers executed (ADFTriggerRun) plus the number of debug runs (ADFSandboxPipelineRun). The table below summarizes the cost per 1000 runs (as of 11/14/2022):

 

| Activity Type | Azure IR | VNet Managed IR | Self-Hosted IR |
| --- | --- | --- | --- |
| Orchestration | $1/1000 Runs | $1/1000 Runs | $1.50/1000 Runs |

 

Here’s a sample query to count the number of activity runs per Integration Runtime, to which you can apply the cost per IR:

ADFActivityRun
| where Status != "Queued" and Status != "InProgress"
| where EffectiveIntegrationRuntime != ""
| summarize count() by EffectiveIntegrationRuntime
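
Once you have run counts, applying the per-IR rate is simple arithmetic. The Python sketch below uses the orchestration rates from the table above (as of 11/14/2022). The `orchestration_cost` helper is hypothetical, and it assumes you have already mapped each EffectiveIntegrationRuntime value returned by the query to one of the three IR types.

```python
# USD per 1,000 orchestration runs, taken from the table above (as of 11/14/2022).
RATE_PER_1000_RUNS = {
    "Azure IR": 1.00,
    "VNet Managed IR": 1.00,
    "Self-Hosted IR": 1.50,
}


def orchestration_cost(runs_by_ir_type):
    """Estimate orchestration cost in USD from a dict of IR type -> run count."""
    total = 0.0
    for ir_type, runs in runs_by_ir_type.items():
        total += runs / 1000 * RATE_PER_1000_RUNS[ir_type]
    return round(total, 2)


# 12,000 runs on an Azure IR plus 4,000 runs on a Self-Hosted IR.
print(orchestration_cost({"Azure IR": 12000, "Self-Hosted IR": 4000}))
```

Remember to add trigger runs and debug runs to the counts, since they are billed as orchestration as well.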

 

 

 

 

Costs are also accrued based upon the type of activity, the activity run duration, and the Integration Runtime used. This data is available in the ADFActivityRun table. Below are the cost details for pipeline activities by IR (for West US 2, as of 11/14/2022): 

 

| Activity Type | Azure IR | VNet Managed IR | Self-Hosted IR |
| --- | --- | --- | --- |
| Data movement activities | $0.25/DIU-hour | $0.25/DIU-hour | $0.10/hour |
| Pipeline activities | $0.005/hour | $1/hour | $0.002/hour |
| External pipeline activities | $0.00025/hour | $1/hour | $0.0001/hour |

 

The example query below derives the elements highlighted above that contribute to the activity cost:

ADFActivityRun
| where Status != "Queued" and Status != "InProgress"
| project ActivityJson = parse_json(Output)
| project billing = parse_json(ActivityJson.billingReference.billableDuration[0]), ActivityType = parse_json(ActivityJson.billingReference.activityType)
| where ActivityType == "PipelineActivity"
| evaluate bag_unpack(billing)
| project duration, meterType, unit
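
With the billed duration in hand, the activity cost is the duration multiplied by the rate for that activity type and IR. The following Python sketch uses the West US 2 rates from the table above (as of 11/14/2022). Only the "PipelineActivity" key appears in the query; the "DataMovement" and "ExternalActivity" keys, the IR labels, and the `activity_cost` helper are assumptions, so check them against the activityType and meterType values in your own logs.

```python
# USD per billed unit, keyed by (activity type, IR type), from the table above.
# Data movement on Azure and VNet Managed IRs is billed per DIU-hour; everything else per hour.
ACTIVITY_RATES = {
    ("DataMovement", "Azure IR"): 0.25,
    ("DataMovement", "VNet Managed IR"): 0.25,
    ("DataMovement", "Self-Hosted IR"): 0.10,
    ("PipelineActivity", "Azure IR"): 0.005,
    ("PipelineActivity", "VNet Managed IR"): 1.00,
    ("PipelineActivity", "Self-Hosted IR"): 0.002,
    ("ExternalActivity", "Azure IR"): 0.00025,
    ("ExternalActivity", "VNet Managed IR"): 1.00,
    ("ExternalActivity", "Self-Hosted IR"): 0.0001,
}


def activity_cost(activity_type, ir_type, billable_duration):
    """Estimate the cost in USD of one activity run from its billed duration."""
    return billable_duration * ACTIVITY_RATES[(activity_type, ir_type)]


# Two billed hours of a pipeline activity on an Azure IR.
print(activity_cost("PipelineActivity", "Azure IR", 2.0))
```

Summing this value over all rows returned by the query gives a rough activity cost for the period covered by the logs.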

 

 

 

 

Dataflow activity costs are based upon whether the cluster is General Purpose or Memory optimized as well as the data flow run duration (Cost as of 11/14/2022 for West US 2): 

 

| General Purpose | Memory Optimized |
| --- | --- |
| $0.274 per vCore-hour | $0.343 per vCore-hour |

 

Here’s an example query to get the elements for Dataflow costs:

ADFActivityRun
| where Status != "Queued" and Status != "InProgress" and ActivityType == "ExecuteDataFlow"
| project ActivityJson = parse_json(Output), InputJSon = parse_json(Input)
| project billing = parse_json(ActivityJson.billingReference.billableDuration[0]), compute = parse_json(InputJSon.compute)
| evaluate bag_unpack(billing)
| evaluate bag_unpack(compute)
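
The Dataflow cost then follows from the billed vCore-hours and the cluster type. This Python sketch uses the West US 2 rates quoted above (as of 11/14/2022). The `data_flow_cost` helper is hypothetical, and it assumes the billed duration is already expressed in vCore-hours; if your logs report plain hours, multiply by the core count first.

```python
# USD per vCore-hour, from the rates quoted above (as of 11/14/2022, West US 2).
VCORE_HOUR_RATES = {
    "General Purpose": 0.274,
    "Memory Optimized": 0.343,
}


def data_flow_cost(compute_type, vcore_hours):
    """Estimate the cost in USD of a data flow run from its billed vCore-hours."""
    return round(vcore_hours * VCORE_HOUR_RATES[compute_type], 4)


# An 8-core General Purpose cluster running for half an hour bills 4 vCore-hours.
print(data_flow_cost("General Purpose", 8 * 0.5))
```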

 

 

 

 

Costs for Data Factory operations are also incurred, but these are generally insignificant (costs as of 11/14/2022, West US 2):

 

| Read/Write | Monitoring |
| --- | --- |
| $0.50 per 50,000 modified/referenced entities | $0.25 per 50,000 run records retrieved |

 

For more examples on Data Factory pricing, see Understanding Azure Data Factory pricing through examples.

 

You can also export all the table data from Log Analytics to Power BI and build your own reports.

Build your own monitoring framework

Some organizations prefer to build their own monitoring platform, extracting pipeline input, output, or error information to SQL or their data platform of choice. You can also send email notifications when an activity fails.
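
As a minimal sketch of that idea, the Python snippet below writes failed activity runs to a SQL audit table. It uses an in-memory SQLite database as a stand-in for your data platform; the `activity_failures` table, the `save_failed_runs` helper, and the shape of the input records (dictionaries keyed by ADFActivityRun column names) are all assumptions made for illustration.

```python
import sqlite3


def save_failed_runs(conn, runs):
    """Insert failed activity runs into an audit table and return how many were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity_failures ("
        "activity_name TEXT, time_generated TEXT, error TEXT)"
    )
    failed = [
        (run["ActivityName"], run["TimeGenerated"], run["Error"])
        for run in runs
        if run["Status"] == "Failed"
    ]
    conn.executemany("INSERT INTO activity_failures VALUES (?, ?, ?)", failed)
    conn.commit()
    return len(failed)


# Stand-in records; in a real framework these would come from your pipelines or logs.
sample_runs = [
    {"ActivityName": "cdCopyTextToSQL", "Status": "Failed",
     "TimeGenerated": "2022-11-14T10:00:00Z", "Error": "sample error text"},
    {"ActivityName": "cdCopyTextToSQL", "Status": "Succeeded",
     "TimeGenerated": "2022-11-14T11:00:00Z", "Error": ""},
]
connection = sqlite3.connect(":memory:")
print(save_failed_runs(connection, sample_runs))
```

The same table can then feed email notifications or reports, which keeps the alerting logic independent of where the run records originate.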

 

Monitoring your data factories, whether it is with the built-in features of Azure Metrics, Azure Monitor and Log Analytics or through your own auditing framework, helps ensure your workloads continue to be optimized for cost, performance and reliability to meet the tenets of the WAF. New features are continually added to Azure Data Factory and new ideas evolve as well. Please post your comments and feedback with other features or patterns that have helped you monitor your data factories!