Parameters in Azure Data Factory

In the last mini-series inside the series (🙃), we will go through how to build dynamic pipelines in Azure Data Factory. In this post, we will look at parameters, expressions, and functions. Later, we will look at variables, loops, and lookups. Fun!

But first, let’s take a step back and discuss why we want to build dynamic pipelines at all.

Hardcoded Solutions

Back in the post about the copy data activity, we looked at our demo datasets. The LEGO data from Rebrickable consists of nine CSV files. So far, we have hardcoded the values for each of these files in our example datasets and pipelines.

Now imagine that you want to copy all the files from Rebrickable to your Azure Data Lake Storage account. Then copy all the data from your Azure Data Lake Storage into your Azure SQL Database. What will it look like if you have to create all the individual datasets and pipelines for these files?

Like this:

Screenshot of nine different datasets connecting to the Rebrickable website
Screenshot of nine different datasets connecting to Azure Data Lake Storage
Screenshot of nine different datasets connecting to Azure SQL Database
Screenshot of nine different pipelines copying data from the Rebrickable website to Azure Data Lake Storage
Screenshot of nine different pipelines copying data from Azure Data Lake Storage to Azure SQL Database

Hooboy! I don’t know about you, but I do not want to create all of those resources! 🤯

(And I mean, I have created all of those resources, and then some. I currently have 56 hardcoded datasets and 72 hardcoded pipelines in my demo environment, because I have demos of everything. And I don’t know about you, but I never want to create all of those resources again! 😂)

So! What can we do instead?

Dynamic Solutions

We can build dynamic solutions!

Creating hardcoded datasets and pipelines is not a bad thing in itself. It’s only when you start creating many similar hardcoded resources that things get tedious and time-consuming. Not to mention, the risk of manual errors goes drastically up when you feel like you create the same resource over and over and over again.

(Trust me. When I got to demo dataset #23 in the screenshots above 👆🏻, I had pretty much tuned out and made a bunch of silly mistakes. I went through that so you won’t have to! 😅)

And that’s when you want to build dynamic solutions. When you can reuse patterns to reduce development time and lower the risk of errors :)

How dynamic should the solution be?

It can be oh-so-tempting to want to build one solution to rule them all. (Especially if you love tech and problem-solving, like me. It’s fun figuring things out!) But be mindful of how much time you spend on the solution itself. If you start spending more time figuring out how to make your solution work for all sources and all edge cases, or if you start getting lost in your own framework… stop.

Your solution should be dynamic enough that you save time on development and maintenance, but not so dynamic that it becomes difficult to understand.

…don’t try to make a solution that is generic enough to solve everything :)

Your goal is to deliver business value. If you end up looking like this cat, spinning your wheels and working hard (and maybe having lots of fun) but without getting anywhere, you are probably over-engineering your solution.

Alright, now that we’ve got the warnings out of the way… Let’s start by looking at parameters :)

Parameters

You can use parameters to pass external values into pipelines, datasets, linked services, and data flows. Once the parameter has been passed into the resource, it cannot be changed. By parameterizing resources, you can reuse them with different values each time.

For example, instead of hardcoding the file name from Rebrickable in each dataset, we can parameterize the file name value. Then, we can pass the file name in as a parameter each time we use the dataset.

That means that we can go from nine datasets to one dataset:

Illustration of turning nine hardcoded datasets into one parameterized dataset

And now we’re starting to save some development time, huh? :D

Let’s look at how to parameterize our datasets.

Dataset Parameters

I have previously created two datasets, one for themes and one for sets. I’m going to change sets to be a generic dataset instead.

(Oof, that was a lot of “sets”. I should probably have picked a different example 😅 Anyway!)

Open the dataset, go to the parameters properties, and click + new:

Screenshot of a dataset, showing the parameters properties, highlighting the new parameter button

Add a new parameter named FileName, of type String, with the default value of FileName:
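Behind the scenes, the dataset is stored as JSON, and the new parameter shows up in its definition. Roughly like this (a sketch; the surrounding properties depend on your dataset type):

```json
{
    "properties": {
        "parameters": {
            "FileName": {
                "type": "String",
                "defaultValue": "FileName"
            }
        }
    }
}
```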

Screenshot of a dataset, showing the parameters properties, highlighting a new parameter

Go to the connection properties and click inside the relative URL field. The add dynamic content link will appear under the text box:

Screenshot of a dataset, showing the connection properties, highlighting the relative URL field with the add dynamic content link underneath

When you click the link (or use ALT+P), the add dynamic content pane opens. Click the new FileName parameter:

Screenshot of a dataset, showing the add dynamic content pane, highlighting the available parameters

The FileName parameter will be added to the dynamic content. Notice the @dataset().FileName syntax:

Screenshot of a dataset, showing the add dynamic content pane, highlighting that a parameter has been added

When you click finish, the relative URL field will use the new parameter. Notice that the box turns blue, and that a delete icon appears. This shows that the field is using dynamic content. You can click the delete icon to clear the dynamic content:

Screenshot of a dataset, showing the connection properties, highlighting the relative URL field with the new parameter added
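In the dataset’s JSON definition, the relative URL is now stored as an expression instead of a hardcoded value. Something like this (a sketch; the exact property layout varies by dataset type):

```json
"typeProperties": {
    "relativeUrl": {
        "value": "@dataset().FileName",
        "type": "Expression"
    }
}
```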

Finally, go to the general properties and change the dataset name to something more generic:

Screenshot of a dataset, showing the general properties, highlighting the new dataset name

…and double-check that there is no schema defined, since we want to use this dataset for different files and schemas:

Screenshot of a dataset, showing the empty schema

We now have a parameterized dataset, woohoo! Let’s see how we can use this in a pipeline.

Pipeline Parameters

I have previously created a pipeline for themes. I’m going to change this to use the parameterized dataset instead of the themes dataset.

Open the copy data activity, and change the source dataset:

Screenshot of a pipeline, highlighting changing the source dataset

When we choose a parameterized dataset, the dataset properties will appear:

Screenshot of a pipeline with a parameterized dataset used as a source, highlighting the dataset properties

Now, we have two options. The first option is to hardcode the dataset parameter value:

Screenshot of a pipeline with a parameterized dataset used as a source, highlighting the dataset properties using a hardcoded value

If we hardcode the dataset parameter value, we don’t need to change anything else in the pipeline. The pipeline will still be for themes only.

…but where’s the fun in that? :D Let’s change the rest of the pipeline as well!

The second option is to create a pipeline parameter and pass the parameter value from the pipeline into the dataset.

Click to open the add dynamic content pane:

Screenshot of a pipeline with a parameterized dataset used as a source, highlighting the dataset properties using a dynamic value

We can create parameters from the pipeline interface, like we did for the dataset, or directly in the add dynamic content pane. There is a little + button next to the filter field. Click that to create a new parameter. (Totally obvious, right? No, no it’s not. Not at all 😂)

Screenshot of the add dynamic content pane, highlighting the add parameter button

Create the new parameter:

Screenshot of the new parameter pane, opened from the add dynamic content pane

Click to add the new FileName parameter to the dynamic content:

Screenshot of the add dynamic content pane, using a pipeline parameter

Notice the @pipeline().parameters.FileName syntax:

Screenshot of a pipeline with a parameterized dataset used as a source, highlighting the dataset properties using a pipeline parameter
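In the pipeline’s JSON definition, the copy data activity now passes the pipeline parameter down into the dataset parameter. A rough sketch (the dataset name is a placeholder):

```json
"inputs": [
    {
        "referenceName": "dataset_name",
        "type": "DatasetReference",
        "parameters": {
            "FileName": "@pipeline().parameters.FileName"
        }
    }
]
```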

Changing the rest of the pipeline

To change the rest of the pipeline, we need to create a new parameterized dataset for the sink:

Screenshot of the sink dataset with a new parameter

Change the sink dataset in the pipeline:

Screenshot of a pipeline with a parameterized dataset used as a sink

And rename the pipeline and copy data activity to something more generic:

Screenshot of a parameterized pipeline with a generic name

That’s it!

…or is it?

If you are asking “but what about the fault tolerance settings and the user properties that also use the file name?” then I will answer “that’s an excellent question!” :D

There’s one problem, though… The fault tolerance setting doesn’t use “themes.csv”, it uses “lego/errors/themes”:

Screenshot of the pipeline settings, not using parameters yet

And the user properties contain the path information in addition to the file name:

Screenshot of the user properties, not using parameters yet

That means that we need to rethink the parameter value. Instead of passing in “themes.csv”, we need to pass in just “themes”. Then, we can use the value as part of the filename (“themes.csv”) or part of the path (“lego//themes.csv”).

How do we do that?

Combining Strings

A common task in Azure Data Factory is to combine strings, for example multiple parameters, or some text and a parameter. There are two ways you can do that.

String Concatenation

The first way is to use string concatenation. In this case, you create an expression with the concat() function to combine two or more strings:

@concat('lego//', pipeline().parameters.FileName, '.csv')

(An expression starts with the @ symbol. A function can be called within an expression.)

String Interpolation

The other way is to use string interpolation. This is my preferred method, as I think it’s much easier to read. In this case, you create one string that contains expressions wrapped in @{…}:

lego//@{pipeline().parameters.FileName}.csv

No quotes or commas, just a few extra curly braces, yay :)
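Under the hood, both forms are just string values in the JSON definition, and both evaluate to the same result. If FileName is “themes”, both of these produce lego//themes.csv (a side-by-side sketch with hypothetical property names):

```json
{
    "concatenated": "@concat('lego//', pipeline().parameters.FileName, '.csv')",
    "interpolated": "lego//@{pipeline().parameters.FileName}.csv"
}
```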

Using String Interpolation in Azure Data Factory

I think Azure Data Factory agrees with me that string interpolation is the way to go. Why? Well, let’s try to click auto generate in the user properties of a pipeline that uses parameterized datasets:

Screenshot of the pipeline settings, showing the user properties, highlighting the auto generate button

Tadaaa! String interpolation. It’s magic ;)

Screenshot of the pipeline settings, showing the user properties, after clicking the auto generate button

However! Since we now only want to pass in the file name, like “themes”, we need to add the “.csv” part ourselves:

Screenshot of the pipeline settings, showing the user properties, after changing the user properties

We also need to change the fault tolerance settings:

Screenshot of the pipeline settings, showing the user properties, after changing the fault tolerance settings
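With the new parameter value, the fault tolerance path becomes an interpolated string. In the copy data activity’s JSON, it ends up looking something like this (a sketch of the relevant property only; the full redirect settings also reference a linked service):

```json
"redirectIncompatibleRowSettings": {
    "path": "lego/errors/@{pipeline().parameters.FileName}"
}
```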

And then we need to update our datasets. In the HTTP dataset, change the relative URL:

Screenshot of a dataset after adding string interpolation

In the ADLS dataset, change the file path:

Screenshot of a dataset after adding string interpolation
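In the ADLS dataset’s JSON, the interpolated file name ends up in the location properties. Something like this (a sketch; I’m assuming an ADLS Gen2 dataset with a file system named lego and a FileName dataset parameter like the one we created for the source):

```json
"typeProperties": {
    "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "lego",
        "fileName": "@{dataset().FileName}.csv"
    }
}
```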

Now you can use “themes” or “sets” or “colors” or “parts” in the pipeline, and those values will be passed into both the source and sink datasets. Cool!

Passing Parameters

But how do we use the parameter in the pipeline? Parameters can be passed into a pipeline in three ways.

You, the user, can define which parameter value to use, for example when you click debug:

Screenshot of a pipeline, highlighting the debug button

That opens the pipeline run pane where you can set the parameter value:

Screenshot of a pipeline after clicking the debug button, highlighting the pipeline run parameters

You can set the parameter value when you trigger now:

Screenshot of a pipeline, highlighting the trigger now button

That opens the pipeline run pane where you can set the parameter value. Notice that you have to publish the pipeline first; that’s because we’ve enabled source control:

Screenshot of a pipeline after clicking the trigger now button, highlighting the pipeline run parameters

You can also add new / edit a trigger:

Screenshot of a pipeline, highlighting the new/edit trigger button

That opens the edit trigger pane so you can set the parameter value:

Screenshot of a pipeline after clicking the new/edit trigger button, highlighting the trigger run parameters

Finally, you can pass a parameter value when using the execute pipeline activity:

Screenshot of the execute pipeline activity, highlighting the new parameter options
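In JSON, the execute pipeline activity passes parameter values much like the dataset reference did. A rough sketch (the activity and pipeline names are placeholders):

```json
{
    "name": "Execute Copy Pipeline",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "pipeline_name",
            "type": "PipelineReference"
        },
        "parameters": {
            "FileName": "themes"
        }
    }
}
```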

How are parameters passed?

To summarize all of this, parameters are passed in one direction. You can provide the parameter value to use manually, through triggers, or through the execute pipeline activity. Then, that parameter can be passed into the pipeline and used in an activity. Activities can pass parameters into datasets and linked services.

Illustration of how parameters are passed from user/trigger/pipeline, to pipelines, to activities, to datasets, to linked services

Summary

In this post, we looked at parameters, expressions, and functions. In the next post, we will look at variables. Then, we will cover loops and lookups.

🤓

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)