Orchestrating Pipelines in Azure Data Factory

This post is part 10 of 25 in the series Beginner's Guide to Azure Data Factory

In the previous post, we peeked at the two different data flows in Azure Data Factory, then created a basic mapping data flow. In this post, we will look at orchestrating pipelines using branching, chaining, and the execute pipeline activity.

Let’s continue where we left off in the previous post. How do we wire up our solution and make it look something like this?

Diagram showing data being copied from an on-premises data center to Azure Data Lake Storage, and then transformed from Azure Data Lake Storage to Azure Synapse Analytics (previously Azure SQL Data Warehouse)

We need to make sure that we get the data before we can transform that data.

One way to build this solution is to create a single pipeline with a copy data activity followed by a data flow activity. But! Since we have already created two separate pipelines, and this post is about orchestrating pipelines, let’s go with the second option :D

Orchestrating Pipelines using Execute Pipeline Activities

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.

Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Screenshot of the Execute Pipeline activity settings

Then, create a dependency between the two execute pipeline activities by clicking the green handle on the right side of the first activity and dragging it onto the second activity:

Screenshot of creating dependencies between two Execute Pipeline activities

You will now execute the two pipelines sequentially:

Screenshot of orchestrating pipelines using Execute Pipeline activities
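
Behind the authoring UI, every pipeline is stored as JSON. As a rough sketch (not an export from my actual factory), the orchestration pipeline could look something like the structure below. I wrote it as a Python dictionary so it can be printed as JSON. The pipeline and activity names (`Orchestrate`, `CopyPipeline`, `TransformPipeline`, `Execute Copy`, `Execute Transform`) and the helper function are made up for illustration. The `ExecutePipeline` type and the `waitOnCompletion` and `dependsOn` properties follow the documented activity JSON format.

```python
import json


def execute_pipeline_activity(name, pipeline_name, depends_on=None):
    """Build a dictionary shaped like an Execute Pipeline activity (sketch)."""
    return {
        "name": name,
        "type": "ExecutePipeline",
        # An empty list means the activity has no upstream dependencies.
        "dependsOn": depends_on or [],
        "typeProperties": {
            "pipeline": {
                "referenceName": pipeline_name,  # hypothetical pipeline name
                "type": "PipelineReference",
            },
            # Corresponds to the "wait on completion" checkbox.
            "waitOnCompletion": True,
        },
    }


orchestration_pipeline = {
    "name": "Orchestrate",  # hypothetical name
    "properties": {
        "activities": [
            execute_pipeline_activity("Execute Copy", "CopyPipeline"),
            execute_pipeline_activity(
                "Execute Transform",
                "TransformPipeline",
                # The green arrow: only run after "Execute Copy" succeeds.
                depends_on=[
                    {
                        "activity": "Execute Copy",
                        "dependencyConditions": ["Succeeded"],
                    }
                ],
            ),
        ]
    },
}

print(json.dumps(orchestration_pipeline, indent=2))
```

Dragging the green handle from one activity to the next is what adds the `dependsOn` entry on the second activity.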

When Should I Wait on Completion?

The execute pipeline activity can behave in two ways. The default behavior is to not wait on completion. In this case, the activity will start executing the pipeline. As soon as the pipeline has been started, the activity will go “alright, I pressed play, the pipeline has started, my job here is done!” and report success. In this case, success means that the pipeline was successfully started.

If you choose to wait on completion, the activity will start executing the pipeline, and then wait until the pipeline is completed. If the pipeline fails, the activity will go “oh no, the pipeline failed, now I shall fail too!” and report failure. If the pipeline succeeds, the activity will go “cool cool, the pipeline succeeded, my job here is done!” and report success. In this case, success means that the pipeline was successfully completed.
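
The two behaviors can be boiled down to a few lines of Python. This is a simplified model for illustration only, not how the service is implemented. The function name `reported_status` is hypothetical, and it assumes the child pipeline could at least be started.

```python
def reported_status(child_pipeline_status, wait_on_completion):
    """Return the status the execute pipeline activity reports (simplified).

    child_pipeline_status: how the child pipeline eventually ends,
        either "Succeeded" or "Failed".
    wait_on_completion: the value of the wait on completion setting.
    """
    if not wait_on_completion:
        # "I pressed play, my job here is done!"
        # Success only means the child pipeline was started.
        return "Succeeded"
    # The activity waits and mirrors the outcome of the child pipeline.
    return child_pipeline_status


for wait in (False, True):
    for child in ("Succeeded", "Failed"):
        print(f"wait={wait!s:<5} child={child:<9} -> {reported_status(child, wait)}")
```

The interesting row is the one where the child pipeline fails without wait on completion: the activity still reports success, so anything chained after it with a success dependency will run.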

Activity Dependencies

By default, you will only see the success output on activities. You can add more outputs by clicking the add output button:

Screenshot of adding output on an execute pipeline activity

This will open up the output menu showing the four types of outputs you can add to an activity:

Screenshot of adding output on an execute pipeline activity, highlighting the pop-up menu

Success

The green handles and arrows visualize the success output:

Screenshot of pipeline with two execute pipeline tasks connected by a success dependency

When you add a success dependency, the second activity will only be executed if the first activity succeeds.

Failure

The red handles and arrows visualize the failure output:

Screenshot of pipeline with two execute pipeline tasks connected by a failure dependency

When you add a failure dependency, the second activity will only be executed if the first activity fails.

Completion

The blue handles and arrows visualize the completion output:

Screenshot of pipeline with two execute pipeline tasks connected by a completion dependency

When you add a completion dependency, the second activity will be executed when the first activity completes, regardless of whether it succeeded or failed.

Skipped

The gray handles and arrows visualize the skipped output:

Screenshot of pipeline with two execute pipeline tasks connected by a skipped dependency

When you add a skipped dependency, the second activity will only be executed if the first activity is skipped, meaning it never runs at all.
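
If it helps to see the four outputs as rules, here is a tiny Python sketch. The function `dependency_is_met` is a hypothetical helper, not part of any Azure SDK. `Succeeded`, `Failed`, `Completed`, and `Skipped` are the condition names used in the pipeline JSON. The sketch assumes that a completion dependency fires when the first activity either succeeds or fails, but not when it is skipped.

```python
# Hypothetical helper: a simplified model of how each output type
# reacts to the status of the first (upstream) activity.
def dependency_is_met(condition, upstream_status):
    """Return True if the dependency allows the second activity to run.

    condition: "Succeeded", "Failed", "Completed", or "Skipped".
    upstream_status: "Succeeded", "Failed", or "Skipped".
    """
    if condition == "Completed":
        # Completion means the activity actually ran, whatever the result.
        return upstream_status in ("Succeeded", "Failed")
    # The other three conditions each match exactly one status.
    return condition == upstream_status


for condition in ("Succeeded", "Failed", "Completed", "Skipped"):
    fires_on = [
        status
        for status in ("Succeeded", "Failed", "Skipped")
        if dependency_is_met(condition, status)
    ]
    print(f"{condition:<10} fires when the first activity is: {', '.join(fires_on)}")
```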

…are you confused about the skipped output? I was! So I made a test for myself :D

Multiple Dependencies

To test the activity dependencies, I created a slightly more complex pipeline. Oh, and I threw in some additional features that aren’t strictly necessary to test the dependencies, just to make it a little more complex. You know, for fun, er, science :D

In this pipeline, I can make the copy data activity succeed (by copying a file that exists) or fail (by trying to copy a file that doesn’t exist). I then log the output statuses to a table in my database. Can you guess what happens when the copy data activity succeeds, and what happens when it fails?

Screenshot of a complex pipeline with multiple branches and chains

When the copy data activity succeeds, we log both success and completion:

Screenshot of a complex pipeline with multiple branches and chains, where two branches are executed and one is not

When the copy data activity fails, we log failure, completion, and skipped. In this case, the “log skipped” activity runs because the “log success” activity was skipped, which in turn happened because the copy data activity didn’t succeed:

Screenshot of a complex pipeline with multiple branches and chains, where two branches are executed and one is not
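
To double-check the results, here is a small Python simulation of the test. It is a sketch under assumptions: the layout (copy data feeding “log success”, “log failure”, and “log completion”, with “log skipped” hanging off “log success”) is my reading of the description above, the extra features in the real pipeline are left out, and the helper names are hypothetical. It also assumes that an activity only runs when all of its dependencies are met.

```python
def dependency_is_met(condition, upstream_status):
    """Simplified rule for one dependency (see the four output types)."""
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return condition == upstream_status


# (activity name, [(upstream activity, condition), ...]) in execution order.
ACTIVITIES = [
    ("Copy Data", []),
    ("Log Success", [("Copy Data", "Succeeded")]),
    ("Log Failure", [("Copy Data", "Failed")]),
    ("Log Completion", [("Copy Data", "Completed")]),
    ("Log Skipped", [("Log Success", "Skipped")]),
]


def run_pipeline(copy_data_status):
    """Return the status of every activity for a given copy data outcome."""
    statuses = {}
    for name, dependencies in ACTIVITIES:
        # An activity runs only if ALL of its dependencies are met.
        if all(dependency_is_met(cond, statuses[up]) for up, cond in dependencies):
            # Assume the logging activities themselves always succeed.
            statuses[name] = copy_data_status if name == "Copy Data" else "Succeeded"
        else:
            statuses[name] = "Skipped"
    return statuses


def logged(copy_data_status):
    """List the logging activities that actually ran."""
    statuses = run_pipeline(copy_data_status)
    return [
        name
        for name, status in statuses.items()
        if name != "Copy Data" and status != "Skipped"
    ]


print(logged("Succeeded"))  # ['Log Success', 'Log Completion']
print(logged("Failed"))     # ['Log Failure', 'Log Completion', 'Log Skipped']
```

The output matches what the screenshots show: success and completion in the first case, and failure, completion, and skipped in the second.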

Summary

In this post, we looked at orchestrating pipelines using branching, chaining, and the execute pipeline activity. We created a pipeline that executes two other pipelines, and looked at the different activity dependencies.

While my examples work for small projects, you may need a different strategy for larger projects. Paul Andrew (@mrpaulandrew) has a great blog post about his pipeline hierarchy design pattern. He uses the concept of grandparent, parent, child, and infant. Once you start scaling out your project, this is an excellent resource!

Finally, there is a big gotcha when it comes to activity dependencies… Meagan Longoria (@MMarie) has explained this gotcha in her excellent blog post Activity Dependencies are a Logical AND. Once you start building more complex pipelines, make sure you understand how this works, then test your pipelines thoroughly.

And speaking of testing… how do you test your pipelines? We’ll look at debugging pipelines next!

🤓

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)