
Series: Beginner's Guide to Azure Data Factory

Welcome to this Beginner’s Guide to Azure Data Factory! In this series, I’m going to cover the fundamentals of Azure Data Factory in fun, casual, bite-sized blog posts that you can read through at your own pace and reference later. You may not be new to ETL, data integration, Azure, or SQL, but we’re going to start completely from scratch when it comes to Azure Data Factory.

How do you get started building data pipelines? What if you need to transform or re-shape data? How do you schedule and monitor your data pipelines? Can you make your solution dynamic and reusable? Join me in this Beginner’s Guide to Azure Data Factory to learn all of these things – and maybe more. 🤓 Let’s go!

  1. Introduction to Azure Data Factory
  2. Creating an Azure Data Factory
  3. Overview of Azure Data Factory User Interface
  4. Overview of Azure Data Factory Components
  5. Copy Data Tool
  6. Pipelines
  7. Copy Data Activity
  8. Datasets
  9. Linked Services
  10. Data Flows
  11. Orchestrating Pipelines
  12. Debugging Pipelines
  13. Triggers
  14. Monitoring
  15. Annotations and User Properties
  16. Integration Runtimes
  17. Copy SQL Server Data
  18. Executing SSIS Packages
  19. Source Control
  20. Templates
  21. Parameters
  22. Variables
  23. ForEach Loops
  24. Lookups
  25. Understanding Pricing
  26. Resources

P.S. This series will always be a work-in-progress. Yes, always. Azure changes often, so I keep coming back to tweak, update, and improve content. I just might not be able to do it right away!

Introduction to Azure Data Factory

This post is part 1 of 26 in the series Beginner's Guide to Azure Data Factory

Hi! I’m Cathrine 👋🏻 I really like Azure Data Factory. It’s one of my favorite topics, and I can talk about it for hours. (And I do.) But talking about it can only help so many people – the ones who happen to attend an event where I’m presenting a session. So I’ve decided to try something new… I’m going to write an introduction to Azure Data Factory! And not just one blog post. A whole bunch of them.

I’m going to take all the things I like to talk about and turn them into bite-sized blog posts that you can read through at your own pace and reference later. I’ve named this series Beginner’s Guide to Azure Data Factory. You may not be new to ETL, data integration, Azure, or SQL, but we’re going to start completely from scratch when it comes to Azure Data Factory.

Does that sound good? Are you in? Cool. Let’s go!

Continue reading →

Creating an Azure Data Factory

This post is part 2 of 26 in the series Beginner's Guide to Azure Data Factory

In the introduction to Azure Data Factory, we learned a little bit about the history of Azure Data Factory and what you can use it for. In this post, we will be creating an Azure Data Factory and navigating to it.

Spoiler alert! Creating an Azure Data Factory is a fairly quick click-click-click process, and you’re done. But! Before you can do that, you need an Azure Subscription, and the right permissions on that subscription. Let’s get that sorted out first.

Azure Subscription and Permissions

If you don’t already have an Azure Subscription, you can create a free account on azure.microsoft.com/free. (Woohoo! Free! Yay!) Some of the Azure services will always be free, while some are free for the first 12 months. You get $200 worth of credits that last 30 days so you can test and learn the paid Azure services. One tip: Time your free account wisely ⏳

If you already have an Azure subscription, make sure that you have the permissions you need. To create an Azure Data Factory, you need to either:

Continue reading →

Overview of Azure Data Factory User Interface

This post is part 3 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we started by creating an Azure Data Factory, then we navigated to it. In this post, we will navigate inside the Azure Data Factory. Let’s look at the Azure Data Factory user interface and the four Azure Data Factory pages.

Azure Data Factory Pages

On the left side of the screen, you will see the main navigation menu. Click on the arrows to expand and collapse the menu:

Animation of expanding and collapsing the pages menu in the Azure Data Factory user interface

Once we expand the navigation menu, we see that Azure Data Factory consists of four main pages: Home, Author, Monitor, and Manage:

Screenshot of the Azure Data Factory user interface showing the four main pages: Data Factory, Author, Monitor, and Manage

Continue reading →

Overview of Azure Data Factory Components

This post is part 4 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at the Azure Data Factory user interface and the four main Azure Data Factory pages. In this post, we will go through the Author page in more detail and look at a few things on the Monitor page. Let’s look at the different Azure Data Factory components!

Azure Data Factory Components on the Author Page

On the left side of the Author page, you will see your factory resources. In this example, we have already created one pipeline, two datasets, one data flow, and one power query:

Screenshot of the Author page in Azure Data Factory, with one Pipeline, two Datasets, and one Data Flow already created

Let’s go through each of these Azure Data Factory components and explain what they are and what they do.

Continue reading →

Copy Data Tool in Azure Data Factory

This post is part 5 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at the different Azure Data Factory components. In this post, we’re going to tie everything together and start making things happen. Woohoo! First, we will get familiar with our demo datasets. Then, we will create our Azure Data Lake Storage Account that we will copy data into. Finally, we will start copying data using the Copy Data Tool.

Demo Datasets

First, let’s get familiar with the demo datasets we will be using. I don’t know about you, but I’m a teeny tiny bit tired of the AdventureWorks demos. (I don’t even own a bike…) WideWorldImporters is at least a little more interesting. (Yay, IT joke mugs and chocolate frogs!) But! Let’s use something that might be a little bit more fun to explore.

Let me present… *drumroll* 🥁

Continue reading →

Pipelines in Azure Data Factory

This post is part 6 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we used the Copy Data Tool to copy a file from our demo dataset to our data lake. The Copy Data Tool created all the factory resources for us: pipelines, activities, datasets, and linked services.

In this post, we will go through pipelines in more detail. How do we create and organize them? What are their main properties? Can we edit them without using the graphical user interface?

Continue reading →

Copy Data Activity in Azure Data Factory

This post is part 7 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we went through Azure Data Factory pipelines in more detail. In this post, we will dig into the copy data activity. How does it work? How do you configure the settings? And how can you optimize performance while keeping costs down?

Copy Data Activity

The copy data activity is the core (*) activity in Azure Data Factory.

(* Cathrine’s opinion 🤓)

You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful :) But how does it really work?
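To make that a little more concrete, here is a minimal sketch in Python of what a pipeline with a single copy data activity could look like as JSON, which is how Azure Data Factory stores pipelines behind the scenes. This is an illustration, not the exact output of the Copy Data Tool. The pipeline, activity, dataset, and column names are all hypothetical, and I'm assuming a delimited text (CSV) source and a Parquet sink with explicit column mapping.

```python
import json


def build_copy_pipeline(pipeline_name, source_dataset, sink_dataset, column_map):
    """Return a dict shaped like a pipeline with one copy data activity.

    All names passed in are hypothetical examples. The source and sink
    types assume a CSV-to-Parquet copy.
    """
    # Explicit column mapping: one entry per source/sink column pair.
    mappings = [
        {"source": {"name": source_col}, "sink": {"name": sink_col}}
        for source_col, sink_col in column_map.items()
    ]

    copy_activity = {
        "name": "CopyDemoFile",  # hypothetical activity name
        "type": "Copy",
        # Datasets are referenced by name, not embedded in the activity.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},  # assumed CSV source
            "sink": {"type": "ParquetSink"},  # assumed Parquet sink
            "translator": {"type": "TabularTranslator", "mappings": mappings},
        },
    }

    return {"name": pipeline_name, "properties": {"activities": [copy_activity]}}


pipeline = build_copy_pipeline(
    pipeline_name="CopyDemoPipeline",
    source_dataset="SourceCsvDemo",
    sink_dataset="SinkParquetDemo",
    column_map={"set_num": "SetNumber", "name": "SetName"},
)

print(json.dumps(pipeline, indent=2))
```

The thing to notice is the shape: the activity only points to datasets by name, while the source, sink, and mapping settings live in the activity itself.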

Continue reading →

Datasets in Azure Data Factory

This post is part 8 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at the copy data activity and saw how the source and sink properties changed with the datasets used. In this post, we will take a closer look at some common datasets and their properties.

Let’s start with the source and sink datasets we created in the Copy Data Tool!

Dataset Names

First, a quick note. If you use the Copy Data Tool, you can change the dataset names by clicking the edit button on the summary page…

Continue reading →

Linked Services in Azure Data Factory

This post is part 9 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at datasets and their properties. In this post, we will look at linked services in more detail. How do you configure them? What are the authentication options for Azure services? And how do you securely store your credentials?

Let’s start by creating a linked service to an Azure SQL Database. Yep, that linked service you saw screenshots of in the previous post. Mhm, the one I sneakily created already so I could explain using datasets as a bridge to linked services. That one :D

(Pssst! Linked services have been moved into the management page. I’ll be updating the descriptions and screenshots shortly!)

Creating Linked Services

First, click Connections. Then, on the linked services tab, click New:

Screenshot of the Azure Data Factory user interface showing the connections tab with linked services highlighted

The New Linked Service pane will open. The Data Store tab shows all the linked services you can read data from or write data to:

Continue reading →

Data Flows in Azure Data Factory

This post is part 10 of 26 in the series Beginner's Guide to Azure Data Factory

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelines, copy data activities, datasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

Continue reading →