
Series: Beginner's Guide to Azure Data Factory

Welcome to this Beginner’s Guide to Azure Data Factory! In this series, I’m going to cover the fundamentals of Azure Data Factory in fun, casual, bite-sized blog posts that you can read through at your own pace and reference later. You may not be new to ETL, data integration, Azure, or SQL, but we’re going to start completely from scratch when it comes to Azure Data Factory.

How do you get started building data pipelines? What if you need to transform or re-shape data? How do you schedule and monitor your data pipelines? Can you make your solution dynamic and reusable? Join me in this Beginner’s Guide to Azure Data Factory to learn all of these things – and maybe more. 🤓 Let’s go!

  1. Introduction to Azure Data Factory
  2. Creating an Azure Data Factory
  3. Overview of Azure Data Factory User Interface
  4. Overview of Azure Data Factory Components
  5. Copy Data Wizard
  6. Pipelines
  7. Copy Data Activity
  8. Datasets
  9. Linked Services
  10. Data Flows
  11. Orchestrating Pipelines
  12. Debugging Pipelines
  13. Triggers
  14. Monitoring
  15. Annotations and User Properties
  16. Integration Runtimes
  17. Copy SQL Server Data
  18. Executing SSIS Packages
  19. Source Control
  20. Templates
  21. Parameters
  22. Variables
  23. ForEach Loops
  24. Lookups
  25. Understanding Pricing
  26. Resources

P.S. This series will always be a work-in-progress. Yes, always. Azure changes often, so I keep coming back to tweak, update, and improve content. I just might not be able to do it right away!

Parameters in Azure Data Factory

This post is part 21 of 26 in the series Beginner's Guide to Azure Data Factory

In the last mini-series inside the series (🙃), we will go through how to build dynamic pipelines in Azure Data Factory. In this post, we will look at parameters, expressions, and functions. Later, we will look at variables, loops, and lookups. Fun!

But first, let’s take a step back and discuss why we want to build dynamic pipelines at all.

(Pssst! There are now also global parameters, woohoo! They didn’t exist when I first wrote this blog post. I’ll be adding this shortly!)

Hardcoded Solutions

Back in the post about the copy data activity, we looked at our demo datasets. The LEGO data from Rebrickable consists of nine CSV files. So far, we have hardcoded the values for each of these files in our example datasets and pipelines.

Now imagine that you want to copy all the files from Rebrickable to your Azure Data Lake Storage account. Then copy all the data from your Azure Data Lake Storage into your Azure SQL Database. What will it look like if you have to create all the individual datasets and pipelines for these files?

Like this. It will look like this:

Screenshot of nine different datasets connecting to the Rebrickable website
Screenshot of nine different datasets connecting to Azure Data Lake Storage
Screenshot of nine different datasets connecting to Azure SQL Database
Screenshot of nine different pipelines copying data from the Rebrickable website to Azure Data Lake Storage
Screenshot of nine different pipelines copying data from Azure Data Lake Storage to Azure SQL Database

Hooboy! I don’t know about you, but I do not want to create all of those resources! 🤯

(And I mean, I have created all of those resources, and then some. I currently have 56 hardcoded datasets and 72 hardcoded pipelines in my demo environment, because I have demos of everything. And I don’t know about you, but I never want to create all of those resources again! 😂)

So! What can we do instead?

Dynamic Solutions

We can build dynamic solutions!
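As a rough illustration of the idea (a Python sketch, not Azure Data Factory's own JSON or API): instead of one hardcoded dataset per file, you keep a single template and pass the file name in as a parameter. The file names, the `{FileName}` placeholder, the folder name, and the `resolve_dataset` helper are all made up for this example.

```python
# One dataset template with a placeholder, instead of one hardcoded dataset per file.
# "{FileName}" and the folder name are illustrative assumptions, not ADF syntax.
DATASET_TEMPLATE = {
    "folder": "lego",
    "file": "{FileName}.csv",
}

# Hypothetical subset of file names; the post only says there are nine CSV files.
FILE_NAMES = ["colors", "parts", "sets"]


def resolve_dataset(template: dict, file_name: str) -> dict:
    """Return a copy of the template with the FileName placeholder filled in."""
    return {key: value.replace("{FileName}", file_name) for key, value in template.items()}


# One template plus a list of values replaces a pile of near-identical definitions.
datasets = [resolve_dataset(DATASET_TEMPLATE, name) for name in FILE_NAMES]
for dataset in datasets:
    print(dataset["folder"] + "/" + dataset["file"])
```

Adding a tenth file then means adding one value to the list, not creating another dataset and another pipeline.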


Variables in Azure Data Factory

This post is part 22 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we talked about why you would want to build a dynamic solution, then looked at how to use parameters. In this post, we will look at variables, how they are different from parameters, and how to use the set variable and append variable activities.

Variables

Parameters are external values passed into pipelines. They can’t be changed inside a pipeline. Variables, on the other hand, are internal values that live inside a pipeline. They can be changed inside that pipeline.

Parameters and variables can be completely separate, or they can work together. For example, you can pass a parameter into a pipeline, and then use that parameter value in a set variable or append variable activity.
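To make the difference concrete, here is a small Python sketch. It is an analogy only, not how Azure Data Factory is implemented: the `PipelineRun` class, its method names, and the example parameter and variable names are hypothetical. Parameters are exposed read-only, while variables can be set or appended to during the run.

```python
from types import MappingProxyType


class PipelineRun:
    """Toy model of one pipeline run (hypothetical, for illustration only)."""

    def __init__(self, parameters: dict, variables: dict):
        # Parameters come from outside and are read-only inside the run.
        self.parameters = MappingProxyType(dict(parameters))
        # Variables live inside the run and can change.
        self.variables = dict(variables)

    def set_variable(self, name: str, value):
        """Overwrite an existing variable, like a set variable activity."""
        if name not in self.variables:
            raise KeyError(f"Unknown variable: {name}")
        self.variables[name] = value

    def append_variable(self, name: str, value):
        """Add a value to an existing array variable, like an append variable activity."""
        current = self.variables[name]
        if not isinstance(current, list):
            raise TypeError(f"Variable {name} is not an array")
        self.variables[name] = current + [value]


run = PipelineRun(
    parameters={"FileName": "colors.csv"},
    variables={"CurrentFile": "", "ProcessedFiles": []},
)

# Parameters and variables working together: use a parameter value in both activities.
run.set_variable("CurrentFile", run.parameters["FileName"])
run.append_variable("ProcessedFiles", run.parameters["FileName"])
print(run.variables)
```

Trying `run.parameters["FileName"] = "parts.csv"` raises a `TypeError` in this sketch, which mirrors the rule that parameters can't be changed inside a pipeline.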


ForEach Loops in Azure Data Factory

This post is part 23 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at how to use variables in pipelines. We took a sneak peek at working with an array, but we didn’t actually do anything with it. But now, we will! In this post, we will look at how to use arrays to control foreach loops.

ForEach Loops

You can use foreach loops to execute the same set of activities or pipelines multiple times, with different values each time. A foreach loop iterates over a collection. That collection can be either an array or a more complex object. Inside the loop, you can reference the current value using @item().
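If it helps to see the idea outside of Azure Data Factory, here is a minimal Python sketch of the same pattern. The `for_each` helper, the file names, and the `FileName` / `TableName` properties are assumptions for illustration, and this sketch simply runs the items one at a time.

```python
from typing import Any, Callable, Iterable


def for_each(items: Iterable[Any], activity: Callable[[Any], Any]) -> list:
    """Run the same activity once per item and collect the results."""
    return [activity(item) for item in items]


# Simple array: inside the loop, the current value plays the role of @item().
file_names = ["colors", "parts", "sets"]
copied = for_each(file_names, lambda item: f"copied {item}.csv")

# Array of objects: a property is read from the current item,
# similar to writing @item().FileName in an expression.
file_configs = [
    {"FileName": "colors", "TableName": "Colors"},
    {"FileName": "parts", "TableName": "Parts"},
]
loaded = for_each(
    file_configs,
    lambda item: f"{item['FileName']}.csv -> {item['TableName']}",
)

print(copied)
print(loaded)
```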

Let’s take a look at how this works in Azure Data Factory!


Lookups in Azure Data Factory

This post is part 24 of 26 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at foreach loops and how to control them using arrays. But you can also control them using more complex objects! In this post, we will look at lookups. How do they work? What can you use them for? And how do you use the output in later activities, like controlling foreach loops?

Lookups

Lookups are similar to copy data activities, except that they only read data. They have a source dataset, but no sink dataset. (So, like… half a copy data activity? :D) Instead of copying data into a destination, you use lookups to get configuration values for later activities.

And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.
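Here is a rough Python sketch of that difference. The `lookup` function and the sample rows are hypothetical. The key names `firstRow`, `count`, and `value` follow the lookup output layout as I understand it from the Azure Data Factory documentation, so treat the exact shape as an assumption and check the activity output in your own factory.

```python
def lookup(rows: list, first_row_only: bool = True) -> dict:
    """Toy lookup: return either the first row or all rows of a config source."""
    if first_row_only:
        if not rows:
            # Choice made for this sketch only; it says nothing about ADF's behavior.
            raise ValueError("Lookup returned no rows")
        # A single object: later activities read properties directly from it.
        return {"firstRow": rows[0]}
    # An array plus a row count: later activities can loop over "value".
    return {"count": len(rows), "value": rows}


# Hypothetical configuration rows.
config_rows = [
    {"FileName": "colors"},
    {"FileName": "parts"},
]

first = lookup(config_rows, first_row_only=True)
print(first["firstRow"]["FileName"])

everything = lookup(config_rows, first_row_only=False)
for item in everything["value"]:  # this array is what would feed a foreach loop
    print(item["FileName"])
```

With the first row only, you get one object and read its properties directly. With all rows, you get an array that you can hand to a foreach loop.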

But before we dig into that, let’s create the configuration datasets!


Understanding Pricing in Azure Data Factory

This post is part 25 of 26 in the series Beginner's Guide to Azure Data Factory

Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 23 posts, and now we just have one more thing to talk about: Pricing.

And today, I’m actually going to talk! You see, in November 2019, I presented a 20-minute session at Microsoft Ignite about understanding Azure Data Factory pricing. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 23 posts, I think we could both appreciate a short break from reading and writing 😅

(And as a side note, I’m originally publishing this post on December 24th. Here in Norway, we celebrate Christmas all day today! This is the biggest family day of the year for me, full of food and traditions. So instead of spending a lot of time writing today, I’m going to link to my video and spend the rest of the day with my family. Yay! 🎅🏻🎄🎁)


Azure Data Factory Resources

This post is part 26 of 26 in the series Beginner's Guide to Azure Data Factory

For the past 25 days, I have written one blog post per day about Azure Data Factory. My goal was to start completely from scratch and cover the fundamentals in casual, bite-sized blog posts. This became the Beginner’s Guide to Azure Data Factory. Today, I will share a bunch of resources to help you continue your own learning journey.

I’ve already seen from your questions and comments that you are ready to jump way ahead and dive into way more advanced topics than I ever intended this series to cover 😉 And as much as I love Azure Data Factory, I can’t cover everything. So a little further down, I will share where, how, and from whom you can continue learning about Azure Data Factory.

But first…

That’s a wrap! Woohoo 🥳
