Category: Data Platform

I’m a data geek :) In fact, I like data so much that I have made it my career! I work with Azure Data and the Microsoft Data Platform, focusing on Data Integration using Azure Data Factory (ADF) and SQL Server Integration Services (SSIS).

In this category, I write technical posts and guides, and share my experiences with certification exams. You can also find a few interviews with Azure and SQL Server experts!

Azure Data posts cover topics like Azure Data Factory, Azure SQL Databases, Azure Data Lake Storage, and Azure Synapse Analytics. Microsoft Data Platform posts may cover topics like SQL Server, T-SQL, and SQL Server Management Studio (SSMS). You may even find the occasional Power BI post in here!

Executing SSIS Packages in Azure Data Factory

This post is part 18 of 26 in the series Beginner's Guide to Azure Data Factory

Two posts ago, we looked at the three types of integration runtimes and created an Azure integration runtime. In the previous post, we created a self-hosted integration runtime for copying SQL Server data. In this post, we will complete the integration runtime part of the series. We will look at what SSIS Lift and Shift is, how to create an Azure-SSIS integration runtime, and how you can start executing SSIS packages in Azure Data Factory.

(And if you don’t work with SSIS, today is an excellent day to take a break from this series. Go do something fun! Like eat some ice cream. I’m totally going to eat ice cream after publishing this post 🙃)
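
And if you do work with SSIS, here's a taste of where this is going: a minimal sketch of an Execute SSIS Package activity using the azure-mgmt-datafactory Python SDK (recent versions). The subscription, resource group, factory, package path, and IR names are all placeholders, and the sketch assumes an Azure-SSIS integration runtime already exists:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecuteSSISPackageActivity,
    IntegrationRuntimeReference,
    PipelineResource,
    SSISPackageLocation,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Point the activity at a package in SSISDB and run it on the Azure-SSIS IR
run_package = ExecuteSSISPackageActivity(
    name="Execute Package",
    package_location=SSISPackageLocation(package_path="Folder/Project/Package.dtsx"),
    connect_via=IntegrationRuntimeReference(
        type="IntegrationRuntimeReference", reference_name="Azure-SSIS-IR"
    ),
)

adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory>", "RunSSISPackage",
    PipelineResource(activities=[run_package]),
)
```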

Continue reading →

Copy SQL Server Data in Azure Data Factory

This post is part 17 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at the three different types of integration runtimes. In this post, we will first create a self-hosted integration runtime. Then, we will create a new linked service and dataset using the self-hosted integration runtime. Finally, we will look at some common techniques and design patterns for copying data from and into an on-premises SQL Server.

And when I say “on-premises”, I really mean “in a private network”. It can be a SQL Server running on-premises on a physical server, or “on-premises” in a virtual machine.

Or, in my case, “on-premises” means a SQL Server 2019 instance running on Linux in a Docker container on my laptop 🤓
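
If you'd rather script the setup than click through it, here's a minimal sketch of registering a self-hosted integration runtime with the azure-mgmt-datafactory Python SDK. The names are placeholders, and the IR node itself still has to be installed and registered on the on-premises machine using one of the authentication keys:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Register the self-hosted IR in the data factory
adf_client.integration_runtimes.create_or_update(
    "<resource-group>", "<data-factory>", "SelfHostedIR",
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="IR for on-premises SQL Server")
    ),
)

# Get an authentication key for registering the IR node on the on-premises machine
keys = adf_client.integration_runtimes.list_auth_keys(
    "<resource-group>", "<data-factory>", "SelfHostedIR"
)
print(keys.auth_key1)
```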

Continue reading →

Integration Runtimes in Azure Data Factory

This post is part 16 of 26 in the series Beginner's Guide to Azure Data Factory

So far in this series, we have only worked with cloud data stores. But what if we need to work with on-premises data stores? After all, Azure Data Factory is a hybrid data integration service :) To do that, we need to create and configure a self-hosted integration runtime. But before we do that, let’s look at the different types of integration runtimes!

(Pssst! Integration runtimes have been moved into the management page. I’ll be updating the descriptions and screenshots shortly!)

Integration Runtimes

An integration runtime (IR) specifies the compute infrastructure an activity runs on or gets dispatched from. It has access to resources either in public networks only, or in both public and private networks.

Or, in Cathrine-speak, using less precise words: An integration runtime specifies what kind of hardware is used to execute activities, where this hardware is physically located, who owns and maintains the hardware, and which data stores and services the hardware can connect to.
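
To make that a little more concrete, here's a minimal sketch (using the azure-mgmt-datafactory Python SDK, with placeholder names) that lists the integration runtimes in a data factory and their types:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Each factory comes with a default Azure IR (AutoResolveIntegrationRuntime);
# self-hosted and Azure-SSIS IRs show up here once you create them
for ir in adf_client.integration_runtimes.list_by_factory("<resource-group>", "<data-factory>"):
    print(ir.name, ir.properties.type)  # "Managed" or "SelfHosted"
```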

Continue reading →

Annotations and User Properties in Azure Data Factory

This post is part 15 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at how monitoring and alerting works. But what if we want to customize the monitoring views even further? There are a few ways to do that in Azure Data Factory. In this post, we will add both annotations and custom properties.

But before we do that, let’s look at a few more ways to customize the monitoring views.

Customizing Monitoring Views

In the previous post, we mainly looked at how to configure the monitoring and alerting features. We saw that we could change filters and switch between list and Gantt views, but it’s possible to tweak the interface even more to our liking.
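
As a sneak peek at where this post ends up: annotations live on the pipeline (and other resources), while user properties live on individual activities. Here's a minimal sketch with the azure-mgmt-datafactory Python SDK, using made-up pipeline and property names:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity,
    PipelineReference,
    PipelineResource,
    UserProperty,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# User properties are set per activity and show up in the activity runs view
child = ExecutePipelineActivity(
    name="Execute Copy Pipeline",
    pipeline=PipelineReference(type="PipelineReference", reference_name="CopyPipeline"),
    user_properties=[UserProperty(name="Source", value="OnPremSQL")],
)

# Annotations are set per pipeline and can be used as filters when monitoring
adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory>", "OrchestrationPipeline",
    PipelineResource(activities=[child], annotations=["Hourly", "Sales"]),
)
```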

Continue reading →

Monitoring Azure Data Factory

This post is part 14 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at the three different trigger types, as well as how to trigger pipelines on-demand. In this post, we will look at what happens after that. How does monitoring work in Azure Data Factory?

Now, if we want to look at monitoring, we probably need something to monitor first. I mean, I could show you a blank dashboard, but I kind of already did that, and that wasn’t really interesting at all 🤔 So! In the previous post, I created a schedule trigger that runs hourly, added it to my orchestration pipeline, and published it.

Let’s take a look at what has happened since then!
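
The monitoring data is also available programmatically. As a minimal sketch (azure-mgmt-datafactory Python SDK, placeholder names), this queries the pipeline runs from the last 24 hours, which is the same data the monitoring views are built on:

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Ask for every pipeline run updated in the last 24 hours
now = datetime.now(timezone.utc)
runs = adf_client.pipeline_runs.query_by_factory(
    "<resource-group>", "<data-factory>",
    RunFilterParameters(last_updated_after=now - timedelta(hours=24), last_updated_before=now),
)

for run in runs.value:
    print(run.pipeline_name, run.status, run.run_start)
```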

Continue reading →

Triggers in Azure Data Factory

This post is part 13 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at testing and debugging pipelines. But how do you schedule your pipelines to run automatically? In this post, we will look at the different types of triggers in Azure Data Factory.

Let’s start by looking at the user interface, then dig into the details of the different trigger types.

(Pssst! Triggers have been moved into the management page. I’ll be updating the descriptions and screenshots shortly!)

Creating Triggers

First, click Triggers. Then click New:

[Screenshot: Azure Data Factory user interface with Triggers open, highlighting the button for creating a new trigger]

The New Trigger pane will open. The default trigger type is Schedule, but you can also choose Tumbling Window or Event:

[Screenshot: Azure Data Factory user interface with the New Trigger pane open and the different trigger types highlighted]

Let’s look at each of these trigger types and their properties :)
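
If you prefer code to clicking, here's a minimal sketch of an hourly schedule trigger built with the azure-mgmt-datafactory Python SDK (recent versions). The pipeline and trigger names are placeholders:

```python
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A schedule trigger that runs one pipeline every hour
trigger = TriggerResource(
    properties=ScheduleTrigger(
        recurrence=ScheduleTriggerRecurrence(
            frequency="Hour", interval=1,
            start_time=datetime.now(timezone.utc), time_zone="UTC",
        ),
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    type="PipelineReference", reference_name="OrchestrationPipeline"
                )
            )
        ],
    )
)

adf_client.triggers.create_or_update(
    "<resource-group>", "<data-factory>", "HourlyTrigger", trigger
)

# Triggers are created in a stopped state and have to be started explicitly
adf_client.triggers.begin_start(
    "<resource-group>", "<data-factory>", "HourlyTrigger"
).result()
```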

Continue reading →

Debugging Pipelines in Azure Data Factory

This post is part 12 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at orchestrating pipelines using branching, chaining, and the execute pipeline activity. In this post, we will look at debugging pipelines. How do we test our solutions?

You debug a pipeline by clicking the debug button:

[Screenshot: Azure Data Factory interface with a pipeline open and the debug button highlighted]

Tadaaa! Blog post done? :D

I joke, I joke, I joke. Debugging pipelines is a one-click operation, but there are a few more things to be aware of. In the rest of this post, we will look at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints.

(Pssst! The debugging experience has had a huge makeover since I first wrote this post. I’ll be updating everything shortly!)

Debugging Pipelines

Let’s start with the most important thing:

Continue reading →

Orchestrating Pipelines in Azure Data Factory

This post is part 11 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we peeked at the two different data flows in Azure Data Factory, then created a basic mapping data flow. In this post, we will look at orchestrating pipelines using branching, chaining, and the execute pipeline activity.

Let’s continue where we left off in the previous post. How do we wire up our solution and make it look something like this?

[Diagram: data being copied from an on-premises data center to Azure Data Lake Storage, then transformed from Azure Data Lake Storage to Azure Synapse Analytics (previously Azure SQL Data Warehouse)]

We need to make sure that we get the data before we can transform it.

One way to build this solution is to create a single pipeline with a copy data activity followed by a data flow activity. Another way is to chain the two pipelines we already have using execute pipeline activities. But! Since we have already created two separate pipelines, and this post is about orchestrating pipelines, let’s go with the second option :D
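
To make the chaining idea concrete before we dive in, here's a minimal sketch of that orchestration pipeline using the azure-mgmt-datafactory Python SDK. The pipeline names are placeholders standing in for the copy and transform pipelines from the earlier posts:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    ExecutePipelineActivity,
    PipelineReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy_step = ExecutePipelineActivity(
    name="Execute Copy Pipeline",
    pipeline=PipelineReference(type="PipelineReference", reference_name="CopyPipeline"),
    wait_on_completion=True,
)

# The transform step only starts if the copy step succeeds
transform_step = ExecutePipelineActivity(
    name="Execute Transform Pipeline",
    pipeline=PipelineReference(type="PipelineReference", reference_name="TransformPipeline"),
    wait_on_completion=True,
    depends_on=[
        ActivityDependency(
            activity="Execute Copy Pipeline", dependency_conditions=["Succeeded"]
        )
    ],
)

adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory>", "OrchestrationPipeline",
    PipelineResource(activities=[copy_step, transform_step]),
)
```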

Continue reading →

Data Flows in Azure Data Factory

This post is part 10 of 26 in the series Beginner's Guide to Azure Data Factory

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelines, copy data activities, datasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

Continue reading →

Linked Services in Azure Data Factory

This post is part 9 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at datasets and their properties. In this post, we will look at linked services in more detail. How do you configure them? What are the authentication options for Azure services? And how do you securely store your credentials?

Let’s start by creating a linked service to an Azure SQL Database. Yep, that linked service you saw screenshots of in the previous post. Mhm, the one I sneakily created already so I could explain using datasets as a bridge to linked services. That one :D

(Pssst! Linked services have been moved into the management page. I’ll be updating the descriptions and screenshots shortly!)

Creating Linked Services

First, click Connections. Then, on the linked services tab, click New:

[Screenshot: Azure Data Factory user interface showing the connections tab with linked services highlighted]

The New Linked Service pane will open. The Data Store tab shows all the linked services you can get data from or write data to:
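
As a preview of the credentials part, here's a minimal sketch of an Azure SQL Database linked service that keeps the password in Azure Key Vault, using the azure-mgmt-datafactory Python SDK. The server, database, and linked service names are placeholders, and it assumes a Key Vault linked service named LS_KeyVault already exists:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureKeyVaultSecretReference,
    AzureSqlDatabaseLinkedService,
    LinkedServiceReference,
    LinkedServiceResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The connection string leaves out the password; it is fetched from Key Vault at runtime
sql_ls = AzureSqlDatabaseLinkedService(
    connection_string="Server=tcp:<server>.database.windows.net;Database=<db>;User ID=<user>;",
    password=AzureKeyVaultSecretReference(
        store=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="LS_KeyVault"
        ),
        secret_name="sql-password",
    ),
)

adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory>", "LS_AzureSqlDatabase",
    LinkedServiceResource(properties=sql_ls),
)
```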

Continue reading →