
Integration Runtimes in Azure Data Factory

This post is part 15 of 25 in the series Beginner's Guide to Azure Data Factory

So far in this series, we have only worked with cloud data stores. But what if we need to work with on-premises data stores? After all, Azure Data Factory is a hybrid data integration service :) To do that, we need to create and configure a self-hosted integration runtime. But before we do that, let’s look at the different types of integration runtimes!

Integration Runtimes

An integration runtime (IR) specifies the compute infrastructure that an activity runs on or is dispatched from. Depending on its type, it can access resources either in public networks only, or in both public and private networks.

Or, in Cathrine-speak, using less precise words: An integration runtime specifies what kind of hardware is used to execute activities, where this hardware is physically located, who owns and maintains the hardware, and which data stores and services the hardware can connect to.

You specify which integration runtime to use in each linked service:

Screenshot of a linked service, highlighting the integration runtime setting
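In the JSON behind a linked service, this setting is stored in the connectVia property. The sketch below builds a minimal linked service definition as a Python dict to show where the integration runtime reference lives. The names LS_AzureSqlDatabase and the helper function are hypothetical, and a real linked service also needs typeProperties (such as a connection string) that are left out here. As far as I know, if connectVia is omitted, the default Azure integration runtime is used.

```python
import json


def linked_service_with_ir(name: str, ls_type: str, ir_name: str) -> dict:
    """Build a minimal (hypothetical) linked service definition that names its IR.

    Only the fields needed to show the integration runtime reference are
    included; typeProperties such as the connection string are omitted.
    """
    return {
        "name": name,
        "properties": {
            "type": ls_type,
            # connectVia tells Azure Data Factory which integration runtime to use
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


ls = linked_service_with_ir(
    "LS_AzureSqlDatabase", "AzureSqlDatabase", "AutoResolveIntegrationRuntime"
)
print(json.dumps(ls, indent=2))
```

Swapping the referenceName is all it takes to point the same linked service at a different integration runtime.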

Let’s look at the different types of integration runtimes!

Azure Integration Runtimes

Azure integration runtimes use infrastructure and hardware managed by Microsoft. Microsoft takes care of the installation, maintenance, patching, and scaling, while you pay for the time you use them. An Azure integration runtime can only access data stores and services in public networks.

Your Azure Data Factory will always have at least one Azure integration runtime called AutoResolveIntegrationRuntime. This is the default integration runtime, and its region is set to auto-resolve. That means that Azure Data Factory decides where to physically execute activities based on the source, sink, or activity type. You can find all the details in the official documentation.

If you need to ensure that data does not leave a specific region, for example for legal or compliance reasons, you can create new Azure integration runtimes in specific regions.

How do I create an Azure integration runtime?

Open Connections, click Integration Runtimes, then click + New:

Screenshot of the Azure Data Factory interface with the integration runtimes open, highlighting the new integration runtime button

Select “perform data movement and dispatch activities”:

Screenshot of the new integration runtime pane

Then, select the Azure integration runtime:

Screenshot of the new integration runtime pane, with the Azure integration runtime selected

Finally, give the new integration runtime a name, description, and specify the region:

Screenshot of the new integration runtime pane, with the West Europe region selected

You can also configure the data flow settings in the Azure integration runtime if your data flows need more compute power:

Screenshot of the new integration runtime pane, with the data flow settings highlighted
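Behind the UI, the name, description, region, and data flow settings from the steps above end up in a JSON definition. The sketch below builds that definition as a Python dict. It assumes the Azure Data Factory JSON schema as I understand it: Azure integration runtimes have the type "Managed", the region goes in computeProperties.location ("AutoResolve" for the default runtime, or a named region such as "West Europe"), and the data flow settings go in dataFlowProperties with timeToLive in minutes. The runtime name IR-WestEurope and the helper function are hypothetical.

```python
import json


def azure_ir_definition(
    name: str,
    description: str,
    location: str,
    compute_type: str = "General",
    core_count: int = 8,
    time_to_live_minutes: int = 10,
) -> dict:
    """Build a (hypothetical) Azure integration runtime definition."""
    return {
        "name": name,
        "properties": {
            # Azure integration runtimes use the type "Managed" in JSON
            "type": "Managed",
            "description": description,
            "typeProperties": {
                "computeProperties": {
                    # Pinning a region keeps execution in that region
                    "location": location,
                    # Data flow cluster settings, used to scale data flows up
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        "timeToLive": time_to_live_minutes,
                    },
                }
            },
        },
    }


ir = azure_ir_definition(
    "IR-WestEurope", "Keeps data movement in West Europe", "West Europe"
)
print(json.dumps(ir, indent=2))
```

Keeping the region as a parameter makes it easy to create one runtime per region when data must not leave a specific region.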

Self-Hosted Integration Runtimes

Self-hosted integration runtimes use infrastructure and hardware managed by you. You take care of all the installation, maintenance, patching, and scaling, and you still pay Azure Data Factory for the time you use them. A self-hosted integration runtime can access resources in both public and private networks.

A self-hosted integration runtime works like a gateway. You install the integration runtime on a machine inside the private network, and it then acts as the bridge between that network and Azure Data Factory.
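On the Azure Data Factory side, the gateway setup comes down to two definitions: the self-hosted integration runtime itself, and a linked service whose connectVia points at it. The sketch below shows both as Python dicts. It assumes the type names "SelfHosted" (for the runtime) and "SqlServer" (for an on-premises SQL Server linked service) from the Azure Data Factory JSON schema; the names SelfHostedIR-OnPrem and LS_OnPremSqlServer are hypothetical, and the connection details are omitted.

```python
import json

# Hypothetical runtime name, used only for illustration
SHIR_NAME = "SelfHostedIR-OnPrem"

# The runtime definition in Azure Data Factory; the software itself is
# installed separately on a machine inside the private network
self_hosted_ir = {
    "name": SHIR_NAME,
    "properties": {
        "type": "SelfHosted",
        "description": "Runs on a machine inside the private network",
    },
}

# A linked service to an on-premises SQL Server that goes through the gateway
# (typeProperties with the connection string are omitted)
sql_server_ls = {
    "name": "LS_OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "connectVia": {
            "referenceName": SHIR_NAME,
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps([self_hosted_ir, sql_server_ls], indent=2))
```

The linked service is what ties a data store to the gateway; a cloud linked service in the same factory can keep using an Azure integration runtime.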

Azure-SSIS Integration Runtimes

Azure-SSIS integration runtimes are clusters of virtual machines running the SQL Server Integration Services (SSIS) engine, managed by Microsoft. Microsoft takes care of all the installation, maintenance, patching, and scaling, while you pay for the time you use them. An Azure-SSIS integration runtime is used for executing SSIS packages in Azure Data Factory. Those SSIS packages can access resources in both public and private networks.
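Because an Azure-SSIS integration runtime is a cluster, its definition describes the virtual machines as well as the SSIS settings. The sketch below is a trimmed Python dict, assuming the Azure Data Factory JSON schema as I understand it: the type is "Managed" like an Azure integration runtime, with node settings in computeProperties and SSIS settings in ssisProperties. The name, node size, and counts are hypothetical example values, and the SSISDB catalog and network settings a real runtime usually needs are left out.

```python
import json

# Trimmed, hypothetical Azure-SSIS integration runtime definition
azure_ssis_ir = {
    "name": "AzureSsisIR",
    "properties": {
        "type": "Managed",
        "description": "Cluster of VMs running the SSIS engine",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",
                "nodeSize": "Standard_D4_v3",   # VM size of each node
                "numberOfNodes": 2,             # VMs in the cluster
                "maxParallelExecutionsPerNode": 4,
            },
            "ssisProperties": {
                "edition": "Standard",
                "licenseType": "LicenseIncluded",
            },
        },
    },
}

compute = azure_ssis_ir["properties"]["typeProperties"]["computeProperties"]
# Upper bound on SSIS packages running at the same time across the cluster
max_parallel_packages = compute["numberOfNodes"] * compute["maxParallelExecutionsPerNode"]

print(json.dumps(azure_ssis_ir, indent=2))
print("Max parallel package executions:", max_parallel_packages)
```

The multiplication at the end shows why scaling out (more nodes) and scaling up (more executions per node) both raise how many packages can run at once.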

Which integration runtime should I use?

You use an Azure integration runtime when you:

  • Copy data between cloud stores
  • Transform data between cloud stores using data flows
  • Execute activities using cloud stores and services
Illustration of copying data using Azure integration runtimes
Illustration of transforming data using Azure integration runtimes
Illustration of executing activities using Azure integration runtimes

You use a self-hosted integration runtime when you:

  • Copy data between cloud and on-premises stores
  • Copy data between on-premises stores
  • Execute activities using on-premises stores and services
Illustration of copying data between cloud and on-premises stores using self-hosted integration runtimes
Illustration of copying data between on-premises stores using self-hosted integration runtimes
Illustration of executing activities using self-hosted integration runtimes

You use an Azure-SSIS integration runtime when you:

  • Execute SSIS packages through Azure Data Factory
Illustration of executing SSIS packages using Azure-SSIS integration runtimes
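The rules of thumb in the lists above can be written down as a small decision function. This is a simplified sketch that encodes only the cases listed in this post (SSIS packages, on-premises stores, cloud stores); the function and enum names are hypothetical and it is not an official selection algorithm.

```python
from enum import Enum


class StoreLocation(Enum):
    CLOUD = "cloud"
    ON_PREMISES = "on-premises"


def choose_integration_runtime(store_locations, runs_ssis_packages=False):
    """Suggest an integration runtime type from the rules of thumb in this post."""
    # Executing SSIS packages needs the Azure-SSIS integration runtime
    if runs_ssis_packages:
        return "Azure-SSIS"
    # Any on-premises store (cloud to on-premises, or on-premises to
    # on-premises) needs a self-hosted integration runtime
    if StoreLocation.ON_PREMISES in store_locations:
        return "Self-Hosted"
    # Only cloud stores and services: the Azure integration runtime is enough
    return "Azure"


print(choose_integration_runtime({StoreLocation.CLOUD}))
print(choose_integration_runtime({StoreLocation.CLOUD, StoreLocation.ON_PREMISES}))
print(choose_integration_runtime(set(), runs_ssis_packages=True))
```

Checking SSIS first mirrors the lists: executing packages is its own use case, regardless of where the packages read or write data.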

Summary

In this post, we looked at the use cases for and differences between Azure, Self-Hosted, and Azure-SSIS integration runtimes. If you are using cloud stores and services, or transforming data using data flows, use an Azure integration runtime. If you are using on-premises stores and services, use a self-hosted integration runtime. And if you want to execute SSIS packages in Azure Data Factory, use an Azure-SSIS integration runtime.

We’ll get back to executing SSIS packages in Azure Data Factory. But first, let’s look at how we can install and use a self-hosted integration runtime for copying SQL Server data!

🤓

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)