So far in this series, we have only worked with cloud data stores. But what if we need to work with on-premises data stores? After all, Azure Data Factory is a hybrid data integration service :) To connect to on-premises data stores, we need to create and configure a self-hosted integration runtime. But before we do that, let’s look at the different types of integration runtimes!
(Pssst! Integration runtimes have been moved into the management page. I’ll be updating the descriptions and screenshots shortly!)
An integration runtime (IR) specifies the compute infrastructure an activity runs on or gets dispatched from. It has access to resources in either public networks only, or in both public and private networks.
Or, in Cathrine-speak, using less precise words: An integration runtime specifies what kind of hardware is used to execute activities, where this hardware is physically located, who owns and maintains the hardware, and which data stores and services the hardware can connect to.
You specify which integration runtime to use in each linked service.
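Behind the user interface, a linked service is stored as a JSON definition, and the integration runtime it uses is referenced in its `connectVia` property. The Python sketch below builds such a definition as a dictionary. The `build_linked_service` helper, the names `OnPremSqlServer`, `CloudBlobStorage`, and `MySelfHostedIR`, and the placeholder connection strings are all hypothetical. If `connectVia` is left out, the linked service uses the default Azure integration runtime (AutoResolveIntegrationRuntime).

```python
import json
from typing import Optional


def build_linked_service(name: str, store_type: str, type_properties: dict,
                         ir_name: Optional[str] = None) -> dict:
    """Build a linked service definition (hypothetical helper for illustration).

    If ir_name is None, connectVia is omitted and Azure Data Factory falls
    back to the default AutoResolveIntegrationRuntime.
    """
    properties = {"type": store_type, "typeProperties": type_properties}
    if ir_name is not None:
        # Reference the integration runtime this linked service should connect through
        properties["connectVia"] = {
            "referenceName": ir_name,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


# Hypothetical on-premises SQL Server reached through a self-hosted IR
on_prem_ls = build_linked_service(
    "OnPremSqlServer",
    "SqlServer",
    {"connectionString": "Data Source=<server>;Initial Catalog=<database>;Integrated Security=True"},
    ir_name="MySelfHostedIR",
)

# Hypothetical cloud store that relies on the default Azure IR (no connectVia)
cloud_ls = build_linked_service(
    "CloudBlobStorage",
    "AzureBlobStorage",
    {"connectionString": "<storage connection string>"},
)

print(json.dumps(on_prem_ls, indent=2))
```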
Let’s look at the different types of integration runtimes!
Azure Integration Runtimes
Azure integration runtimes use infrastructure and hardware managed by Microsoft. Microsoft takes care of the installation, maintenance, patching, and scaling, while you pay for the time you use the integration runtime. An Azure integration runtime can only access data stores and services in public networks.
Your Azure Data Factory will always have at least one Azure integration runtime called AutoResolveIntegrationRuntime. This is the default integration runtime, and the region is set to auto-resolve. That means that Azure Data Factory decides the physical location of where to execute activities based on the source, sink, or activity type. You can find all the details in the official documentation.
If you need to ensure that data does not leave a specific region, for example for legal reasons, you can create new Azure integration runtimes in specific regions.
How do I create an Azure integration runtime?
Open Connections, click on Integration Runtimes, then click + New:
Select “perform data movement and dispatch activities”:
Then, select the Azure integration runtime:
Finally, give the new integration runtime a name, description, and specify the region:
You can also specify the data flow settings in the Azure integration runtime, if you need to scale up the performance:
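The region and data flow settings chosen in these steps end up in the integration runtime's JSON definition, where an Azure integration runtime has the type `Managed`. The sketch below assumes the `computeProperties` layout with a `location` and optional `dataFlowProperties` (`computeType`, `coreCount`, `timeToLive`). The `build_azure_ir` helper, the runtime name, the region, and the core count are example values chosen for illustration.

```python
import json


def build_azure_ir(name: str, location: str = "AutoResolve", description: str = "",
                   core_count: int = 8, compute_type: str = "General",
                   time_to_live_minutes: int = 0) -> dict:
    """Build an Azure integration runtime definition (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            # Azure integration runtimes are "Managed": Microsoft runs the compute
            "type": "Managed",
            "description": description,
            "typeProperties": {
                "computeProperties": {
                    # A fixed region keeps execution there instead of auto-resolving
                    "location": location,
                    # Data flow cluster size and how long it stays warm (in minutes)
                    "dataFlowProperties": {
                        "computeType": compute_type,
                        "coreCount": core_count,
                        "timeToLive": time_to_live_minutes,
                    },
                }
            },
        },
    }


west_europe_ir = build_azure_ir(
    "WestEuropeIR",
    location="West Europe",
    description="Keeps activity execution in West Europe",
    core_count=16,
)
print(json.dumps(west_europe_ir, indent=2))
```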
Self-Hosted Integration Runtimes
Self-hosted integration runtimes use infrastructure and hardware managed by you. You take care of all the installation, maintenance, patching, and scaling, but you also pay for the time you use it through Azure Data Factory. A self-hosted integration runtime can access resources in both public and private networks.
A self-hosted integration runtime works like a gateway. You install the integration runtime on a machine inside the private network, and it then acts as the bridge between your Azure Data Factory and the data stores in that network.
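In the data factory itself, a self-hosted integration runtime is only a lightweight definition with the type `SelfHosted`. The actual compute is the machine you install it on, which you then register using one of the runtime's authentication keys. A minimal sketch of that definition, where the `build_self_hosted_ir` helper and the runtime name are hypothetical:

```python
import json


def build_self_hosted_ir(name: str, description: str = "") -> dict:
    """Build a self-hosted integration runtime definition (hypothetical helper).

    There are no compute or region settings: the hardware is your own machine
    inside the private network, not something Azure Data Factory provisions.
    """
    return {
        "name": name,
        "properties": {
            "type": "SelfHosted",
            "description": description,
        },
    }


on_prem_ir = build_self_hosted_ir("MySelfHostedIR", "Gateway to the on-premises SQL Server")
print(json.dumps(on_prem_ir, indent=2))
```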
Azure-SSIS Integration Runtimes
Azure-SSIS integration runtimes are clusters of virtual machines running the SQL Server Integration Services (SSIS) engine, managed by Microsoft. Microsoft takes care of all the installation, maintenance, patching, and scaling, while you pay for the time you use the integration runtime. An Azure-SSIS integration runtime is used for executing SSIS packages in Azure Data Factory. Those SSIS packages can access resources in both public and private networks.
Which integration runtime should I use?
You use an Azure integration runtime when you:
- Copy data between cloud stores
- Transform data between cloud stores using data flows
- Execute activities using cloud stores and services
You use a self-hosted integration runtime when you:
- Copy data between cloud and on-premises stores
- Copy data between on-premises stores
- Execute activities using on-premises stores and services
You use an Azure-SSIS integration runtime when you:
- Execute SSIS packages through Azure Data Factory
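The three lists above can be condensed into a small decision helper. This is a simplified sketch that only encodes those lists: the `IRType` enum, the `choose_integration_runtime` function, and the `"cloud"` / `"on-premises"` labels are hypothetical and not part of Azure Data Factory.

```python
from enum import Enum
from typing import Iterable


class IRType(Enum):
    AZURE = "Azure"
    SELF_HOSTED = "Self-Hosted"
    AZURE_SSIS = "Azure-SSIS"


def choose_integration_runtime(store_locations: Iterable[str],
                               executes_ssis_packages: bool = False) -> IRType:
    """Pick an integration runtime type based on where the involved stores live.

    store_locations holds "cloud" or "on-premises" for each store or service
    the activity touches (hypothetical labels for this sketch).
    """
    if executes_ssis_packages:
        # SSIS packages run on the Azure-SSIS integration runtime
        return IRType.AZURE_SSIS
    locations = set(store_locations)
    unknown = locations - {"cloud", "on-premises"}
    if unknown:
        raise ValueError(f"unknown store location(s): {sorted(unknown)}")
    if "on-premises" in locations:
        # Any on-premises store needs a runtime installed inside the private network
        return IRType.SELF_HOSTED
    # Cloud-only copies, data flows, and activities run on an Azure IR
    return IRType.AZURE


print(choose_integration_runtime(["cloud", "cloud"]).value)                # Azure
print(choose_integration_runtime(["on-premises", "cloud"]).value)          # Self-Hosted
print(choose_integration_runtime([], executes_ssis_packages=True).value)   # Azure-SSIS
```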
In this post, we looked at the use cases for and differences between Azure, Self-Hosted, and Azure-SSIS integration runtimes. If you are using cloud stores and services, or transforming data using data flows, use an Azure integration runtime. If you are using on-premises stores and services, use a self-hosted integration runtime. And if you want to execute SSIS packages in Azure Data Factory, use an Azure-SSIS integration runtime.
We’ll get back to executing SSIS packages in Azure Data Factory. But first, let’s look at how we can install and use a self-hosted integration runtime for copying SQL Server data!