Data Factory Integration Runtime
  • 31 Jan 2022

Integration Runtime

An Integration Runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities such as Data Flows and Data Movement. It acts as a link between the activity and the linked services.

The following data integration capabilities are provided by Integration Runtime across various network environments:

Data Flow: Allows users to execute a Data Flow in a managed Azure compute environment.

Data movement: Allows users to copy data between public network data stores and private network data stores (on-premises or virtual private network). It supports built-in connectors, format conversion, column mapping, and fast, scalable data transfer.

Activity dispatch: Allows users to dispatch and monitor transformation activities running on a variety of compute services, including Azure Databricks, Azure HDInsight, ML Studio (classic), Azure SQL Database, SQL Server, and others.

SSIS package execution: SQL Server Integration Services (SSIS) packages can be executed natively in a managed Azure compute environment.

Important terms
  • An activity defines the action to be performed in Data Factory and Synapse pipelines.

  • A linked service specifies a destination data store or compute service.

  • An integration runtime acts as a link between the activity and the linked services.
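To make the relationship between these terms concrete, the sketch below builds a linked service definition that points at a specific integration runtime. It assumes the JSON shape used by Azure Data Factory linked services, where a `connectVia` property references the integration runtime by name. The helper `linked_service_with_ir`, the resource names, and the placeholder connection string are hypothetical and not taken from this article.

```python
import json


def linked_service_with_ir(name, store_type, type_properties, ir_name):
    """Return a linked service definition that connects via the named IR.

    Assumption: the layout mirrors the Azure Data Factory JSON format,
    where "connectVia" holds an IntegrationRuntimeReference.
    """
    return {
        "name": name,
        "properties": {
            "type": store_type,
            "typeProperties": type_properties,
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# All names below are hypothetical placeholders.
definition = linked_service_with_ir(
    name="ExampleSqlServerLinkedService",
    store_type="SqlServer",
    type_properties={"connectionString": "<connection string>"},
    ir_name="ExampleSelfHostedIR",
)

print(json.dumps(definition, indent=2))
```

An activity that uses this linked service would then run on, or be dispatched through, the referenced integration runtime.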

Types of Integration Runtime

Data Factory offers three different types of Integration Runtimes from which customers can select the one that best meets their data integration and network environment needs.

The three different types include:

  • Azure
  • Self-hosted
  • Azure-SSIS

The capabilities and network support for each integration runtime type are described in the table below:

IR Type     | Public Network                              | Private Network
------------|---------------------------------------------|---------------------------------------------
Azure       | Data Flow, Data movement, Activity dispatch | Data Flow, Data movement, Activity dispatch
Self-hosted | Data movement, Activity dispatch            | Data movement, Activity dispatch
Azure-SSIS  | SSIS package execution                      | SSIS package execution
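The table can also be read as a lookup from IR type to capabilities. The following minimal sketch encodes it as a Python dictionary and answers the question "which IR types support a given capability?". Because the table lists the same capabilities for public and private networks, a single set per IR type is used. The names `IR_CAPABILITIES` and `runtimes_supporting` are illustrative only.

```python
# Capabilities per IR type, copied from the table above.
IR_CAPABILITIES = {
    "Azure": {"Data Flow", "Data movement", "Activity dispatch"},
    "Self-hosted": {"Data movement", "Activity dispatch"},
    "Azure-SSIS": {"SSIS package execution"},
}


def runtimes_supporting(capability):
    """Return the IR types that list the given capability, sorted by name."""
    return sorted(
        ir_type
        for ir_type, capabilities in IR_CAPABILITIES.items()
        if capability in capabilities
    )


print(runtimes_supporting("Data movement"))
print(runtimes_supporting("SSIS package execution"))
```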

Azure Integration Runtime

Microsoft is responsible for all infrastructure patching, scaling, and maintenance. By default, the IR can access only data stores and services on public networks.

The following activities are possible with the Azure Integration Runtime:

  1. Execute Data Flows in Azure

  2. Execute a copy activity between cloud data stores.

  3. Dispatch the following transform activities in public network:

    • Databricks Notebook/Jar/Python activity
    • HDInsight Hive activity
    • HDInsight Pig activity
    • HDInsight MapReduce activity
    • HDInsight Spark activity
    • HDInsight Streaming activity
    • ML Studio (classic) Batch Execution activity
    • ML Studio (classic) Update Resource activities
    • Stored Procedure activity
    • Data Lake Analytics U-SQL activity
    • .NET custom activity
    • Web activity
    • Lookup activity
    • Get Metadata activity

Network environment

  • Connecting to data stores and compute services with publicly accessible endpoints is supported by Azure Integration Runtime.

  • Azure Integration Runtime also supports connecting to data stores through the Private Link service in a private network environment when the virtual network configuration is enabled.

Self-hosted Integration Runtime

Users must manage their own infrastructure and hardware for Self-hosted Integration Runtimes.

Users are responsible for all patching, scaling, and maintenance issues. The IR has access to resources in both public and private networks.

A self-hosted IR should be installed on-premises or on a virtual machine within a private network. Currently, the self-hosted IR is supported only on Windows.

The following activities can be carried out by a self-hosted Integration Runtime:

  1. Copying data between a cloud data store and a private network data store.

  2. Dispatching the following transform activities against compute resources on-premises or in an Azure Virtual Network:

    • HDInsight Hive activity (BYOC-Bring Your Own Cluster)
    • HDInsight Pig activity (BYOC)
    • HDInsight MapReduce activity (BYOC)
    • HDInsight Spark activity (BYOC)
    • HDInsight Streaming activity (BYOC)
    • ML Studio (classic) Batch Execution activity
    • ML Studio (classic) Update Resource activities
    • Stored Procedure activity
    • Data Lake Analytics U-SQL activity
    • Custom activity (runs on Azure Batch)
    • Lookup activity
    • Get Metadata activity

Key advantage over Azure Integration Runtime

  • Consider copying data from a source to a sink. When the global Azure integration runtime is associated with the source linked service and an Azure integration runtime in an Azure Data Factory managed virtual network is associated with the sink linked service, both the source and sink linked services use the Azure integration runtime in the managed virtual network.

  • When a self-hosted integration runtime is associated with the source linked service, both the source and sink linked services use the self-hosted integration runtime.

  • In other words, the self-hosted integration runtime takes precedence over the Azure integration runtime in a managed virtual network (in Azure Data Factory or Synapse workspaces), which in turn takes precedence over the global Azure integration runtime.
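The precedence described in this section can be sketched as a small ordering: when the source and sink linked services of a copy activity reference different kinds of integration runtime, the higher-ranked kind is used for both. The enum `IRKind` and the function `resolve_ir` are hypothetical names for illustration. This is not how the service is implemented, only a model of the rule stated above.

```python
from enum import IntEnum


class IRKind(IntEnum):
    """IR kinds ranked by precedence (a higher value wins)."""
    GLOBAL_AZURE = 1         # global Azure integration runtime
    MANAGED_VNET_AZURE = 2   # Azure IR in a managed virtual network
    SELF_HOSTED = 3          # self-hosted integration runtime


def resolve_ir(source_ir, sink_ir):
    """Return the IR kind that both source and sink end up using."""
    return max(source_ir, sink_ir)


# Global Azure IR on the source, managed virtual network IR on the sink.
print(resolve_ir(IRKind.GLOBAL_AZURE, IRKind.MANAGED_VNET_AZURE).name)

# Self-hosted IR on the source wins over the managed virtual network IR.
print(resolve_ir(IRKind.SELF_HOSTED, IRKind.MANAGED_VNET_AZURE).name)
```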

Network environment

  • Users can install a self-hosted IR behind their corporate firewall or inside a virtual private network to perform data integration safely in a private network environment.

  • The self-hosted integration runtime makes only outbound HTTP-based connections to the internet.

Azure-SSIS Integration Runtime

  • Azure-SSIS Integration Runtimes are virtual machines that run the SSIS engine and allow users to execute SSIS packages natively.

  • Microsoft is responsible for all infrastructure patching, scaling, and maintenance. The IR has the ability to access resources in both public and private networks.

  • Users can create an Azure-SSIS IR to natively execute SSIS packages in order to lift and shift existing SSIS workloads.

Network environment

  • Azure-SSIS IR can be deployed in either a public or a private network. On-premises data access is enabled by connecting Azure-SSIS IR to a Virtual Network that is connected to your on-premises network.

Linked nodes

The Data Factory nodes associated with an Integration Runtime can be viewed in the Overview section, as well as in the Integration Runtime resource grid.

The following node information will be provided to the user:

  • Name
  • Status
  • Max concurrent jobs
  • Last connect time

DF IR overview.png

Resource Grid.png

The Overview section of the Integration Runtime resource also displays linked services and related resources such as Data Factory pipelines.

Related services.png

Resource Dashboard

Users now have access to a default Integration Runtime Dashboard across all Integration Runtime resource types, allowing for enhanced data visualisation and real-time data tracking.

DF IR Dashboard.png

Users are provided with the following pre-defined Dashboard widgets, which can be customised to meet their specific needs.

1. CPU utilization
2. SSIS Executions - Succeeded vs Failed
3. Available Nodes

Monitoring

  • Users can monitor their Integration Runtime resources by configuring the rules available for monitoring.

  • Navigate to the Monitoring section of the resource to configure the monitoring rules for Data Factory Integration Runtime.

  • Users can specify monitoring threshold values based on their needs.

  • When the monitoring rule type is a metric, users can also choose to compare one metric against another metric instead of a fixed threshold value.
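As a rough illustration of the two kinds of rule mentioned above, the sketch below evaluates a metric against a fixed threshold and a metric against another metric. It is not the product's implementation. The function `rule_violated`, the operator set, and the sample values are assumptions made for this example.

```python
import operator

# Comparison operators a rule might use (an assumed set for illustration).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def rule_violated(observed, op, reference):
    """Return True when `observed <op> reference` holds.

    `reference` is either a fixed threshold or the value of another metric.
    """
    return OPERATORS[op](observed, reference)


# Metric against a threshold: hypothetical CPU utilization of 92% vs. 80%.
cpu_alert = rule_violated(92.0, ">", 80.0)

# Metric against metric: hypothetical failed vs. succeeded SSIS executions.
failed, succeeded = 3, 40
executions_alert = rule_violated(failed, ">", succeeded)

print(cpu_alert, executions_alert)
```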

Data factory integration runtime.png

Properties

  • By selecting the Properties option at the top of the section, users can view the properties of the Integration Runtime resource across all available types.

Self-hosted properties.png

  • There is also an option to view the properties in JSON format.
