Welcome to this Beginner’s Guide to Azure Data Factory! In this series, I’m going to cover the fundamentals of Azure Data Factory in casual, bite-sized blog posts that you can read through at your own pace and reference later. You may not be new to ETL, data integration, Azure, or SQL, but we’re going to start completely from scratch when it comes to Azure Data Factory.
How do you get started building data pipelines? What if you need to transform or re-shape data? How do you schedule and monitor your data pipelines? Can you make your solution dynamic and reusable? Join me in this Beginner’s Guide to Azure Data Factory to learn all of these things - and maybe more. 🤓 Let’s go!
In the last mini-series inside the series (🙃), we will go through how to build dynamic pipelines in Azure Data Factory. In this post, we will look at parameters, expressions, and functions. Later, we will look at variables, loops, and lookups. Fun!
But first, let’s take a step back and discuss why we want to build dynamic pipelines at all.
Pssst! There are now also Global Parameters, woohoo! They didn't exist when I first wrote this blog post. I'm working on updating the descriptions and screenshots, thank you for your understanding and patience 😊
In the previous post, we talked about why you would want to build a dynamic solution, then looked at how to use parameters. In this post, we will look at variables, how they are different from parameters, and how to use the set variable and append variable activities.
Parameters are external values passed into pipelines. They can’t be changed inside a pipeline. Variables, on the other hand, are internal values that live inside a pipeline. They can be changed inside that pipeline.
Parameters and variables can be completely separate, or they can work together. For example, you can pass a parameter into a pipeline, and then use that parameter value in a set variable or append variable activity.
In the previous post, we looked at how to use variables in pipelines. We took a sneak peek at working with an array, but we didn’t actually do anything with it. But now, we will! In this post, we will look at how to use arrays to control foreach loops.
You can use foreach loops to execute the same set of activities or pipelines multiple times, with different values each time. A foreach loop iterates over a collection. That collection can be either an array or a more complex object. Inside the loop, you can reference the current value using @item().
Let’s take a look at how this works in Azure Data Factory!
In the previous post, we looked at foreach loops and how to control them using arrays. But you can also control them using more complex objects! In this post, we will look at lookups. How do they work? What can you use them for? And how do you use the output in later activities, like controlling foreach loops?
Lookups are similar to copy data activities, except that you only get data from lookups. They have a source dataset, but they do not have a sink dataset. (So, like… half a copy data activity? 😄) Instead of copying data into a destination, you use lookups to get configuration values that you use in later activities.
And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.
But before we dig into that, let’s create the configuration datasets!
Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 24 posts, and now we just have one more thing to talk about: Pricing.
And today, I’m actually going to talk! You see, in January 2022, I presented a 10-minute session at DataMinutes about understanding pipeline pricing in Azure Data Factory and Azure Synapse Analytics. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 24 posts, I think we could both appreciate a short break from reading and writing 😅