Debugging Pipelines in Azure Data Factory

This post is part 11 of 25 in the series Beginner's Guide to Azure Data Factory

In the previous post, we looked at orchestrating pipelines using branching, chaining, and the execute pipeline activity. In this post, we will look at debugging pipelines. How do we test our solutions?

You debug a pipeline by clicking the debug button:

Screenshot of the Azure Data Factory interface, with a pipeline open, and the debug button highlighted

Tadaaa! Blog post done? :D

I joke, I joke, I joke. Debugging pipelines is a one-click operation, but there are a few more things to be aware of. In the rest of this post, we will look at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints.

Debugging Pipelines

Let’s start with the most important thing:

When you debug a pipeline, you execute the pipeline. If you have a copy data activity, the data will be copied. If you truncate tables or delete files, you will truncate the tables and delete the files.

The difference between debugging and executing pipelines is that debug runs do not log execution information, so you cannot see the results on the Monitor page. Instead, you can only see the results in the output pane of the pipeline.

This means that you need to make sure that you are either:

  1. Debugging in a separate development or test environment
  2. Using test connections, folders, files, tables, etc.

You may also want to limit your queries and datasets, unless you are testing your pipeline performance.
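One way to limit a query during debugging (assuming a copy data activity with an Azure SQL source) is to override the source with a small sample query. This is just a sketch, and the table name is hypothetical:

```json
"source": {
    "type": "AzureSqlSource",
    "sqlReaderQuery": "SELECT TOP (100) * FROM dbo.SalesOrders"
}
```

Once you are happy with the logic, you swap the sample query back out for the full query or table.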

All clear? You now definitely know not to debug anything in production unless you’re really really really sure it doesn’t break anything? Yes? Excellent! :D

So!

How do I debug a pipeline?

Again, you debug a pipeline by clicking the debug button:

Screenshot of the Azure Data Factory interface, with a pipeline open, and the debug button highlighted

This starts the debug process. First, Azure Data Factory deploys the pipeline to the debug environment:

Screenshot of the Azure Data Factory interface, with a pipeline open and being deployed to the debug environment

Then, it runs the pipeline. This opens the output pane where you will see the pipeline run ID and the current status. The status will be updated every 20 seconds for 5 minutes. After that, you have to manually refresh. The tab border also changes color to yellow, so you can see which pipelines are currently running:
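The refresh behavior described above (poll every 20 seconds, give up after 5 minutes) can be mimicked in your own tooling. This is a minimal sketch, not Azure Data Factory's implementation: `get_status` is a placeholder for however you look up the run status (for example, the Pipeline Runs - Get REST endpoint), and the injectable `sleep` just makes the sketch easy to run:

```python
import time

def wait_for_pipeline_run(get_status, poll_interval=20, timeout=300, sleep=time.sleep):
    """Poll a pipeline run until it reaches a terminal state,
    mirroring the UI's 20-second refresh over a 5-minute window.

    get_status: callable returning one of "Queued", "InProgress",
    "Succeeded", "Failed", or "Cancelled".
    """
    terminal = {"Succeeded", "Failed", "Cancelled"}
    elapsed = 0
    status = get_status()
    while status not in terminal and elapsed < timeout:
        sleep(poll_interval)
        elapsed += poll_interval
        status = get_status()
    return status

# Simulated run: two in-progress polls, then success.
statuses = iter(["InProgress", "InProgress", "Succeeded"])
result = wait_for_pipeline_run(lambda: next(statuses), sleep=lambda s: None)
```

After the timeout, the function returns the last status it saw, just like the output pane stops refreshing and leaves you to refresh manually.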

Screenshot of the Azure Data Factory interface, with a pipeline open and being debugged

You can also open the active debug runs pane:

Screenshot of the Azure Data Factory interface, with a pipeline open and highlighting the active debug runs button

Here you can see all active pipeline runs:

Screenshot of the Azure Data Factory interface, showing the active debug runs

Once the pipeline finishes, you will get a notification, see an icon on the activity, and see the results in the output pane. Hopefully, everything is green and successful!

Screenshot of the Azure Data Factory interface, with a pipeline debugged successfully

However, if everything is red and failed, that’s kind of good too, because you would rather get errors during testing and debugging than in production ;)

Screenshot of the Azure Data Factory interface, with a failed pipeline

How do I view the details of a debug run?

In addition to the pipeline run ID, start time, duration, and status shown in the output pane, you can view the details of the debug run by clicking the action buttons:

Screenshot of the pipeline debug output action buttons

Input

Input will show you details about the activity itself – in JSON format. In this example, we recognize the settings from the copy data activity, including the number of data integration units used:

Screenshot of the input JSON
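The shape of the input JSON depends on your activity, source, and sink, but for a copy data activity it looks roughly like this (an abbreviated, illustrative sketch, not the full document):

```json
{
    "source": { "type": "AzureSqlSource" },
    "sink": { "type": "DelimitedTextSink" },
    "enableStaging": false,
    "dataIntegrationUnits": 4
}
```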

Output

Output will show you details about the execution – in JSON format. In this example, we see information such as how much data and how many rows were copied:

Screenshot of the output JSON
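For a copy data activity, the output JSON includes metrics like these (an abbreviated sketch with illustrative values):

```json
{
    "dataRead": 1048576,
    "dataWritten": 1048576,
    "rowsRead": 10000,
    "rowsCopied": 10000,
    "copyDuration": 5,
    "usedDataIntegrationUnits": 4
}
```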

Details

Details will show you much of the same information as output, but in a visual interface. In this example, we see the source and sink type icons, as well as information about data and rows:

Screenshot of the details

Error

Error will show you the error code and error message – in JSON format. You can also provide feedback on these messages, directly in the interface! Click the emojis:

Screenshot of the error JSON

Then write your message and click submit:

Screenshot of the error JSON, highlighting the feedback form
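The error JSON typically looks something like this (an abbreviated sketch, with the activity name and message contents depending on your pipeline and the actual failure):

```json
{
    "errorCode": "2200",
    "message": "ErrorCode=SqlOperationFailed, ...",
    "failureType": "UserError",
    "target": "Copy Data"
}
```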

How do I debug data flows?

Debugging data flows is quite different from debugging pipelines. For example, it requires you to start a debug session. If we try to debug our orchestration pipeline, it will ask us to start a new session:

Screenshot of interface asking if you want to start a debug session

Now, I’m going to refer to smarter people than me again, just like I did in the data flows post :) You can read all the details about mapping data flows debug mode in the official documentation.

But this leads us to the next part of this post. What if we want to debug the orchestration pipeline without starting a debug session?

How do I debug specific activities?

In Azure Data Factory, you can set breakpoints on activities:

Screenshot of hovering over the debug until feature

When you set a breakpoint, the activities after that breakpoint will be disabled:

Screenshot of a pipeline with a breakpoint set

You can now debug the pipeline, and only the activities up to and including the activity with the breakpoint will be executed:

Screenshot of a successful pipeline run with a breakpoint

As of right now, you can only “debug until” an activity. There is no way to “debug from” or “debug single activity”. That’s why we separated our logic into individual pipelines :)

Oh! One last thing:

How do I debug nested pipelines?

When you debug pipelines with execute pipeline activities, you can click on output, then click on the pipeline run ID:

Screenshot of the output of an execute pipeline activity

This opens the pipeline and shows you that specific pipeline run:

Screenshot of a successful debug run of a nested pipeline

Summary

In this post, we looked at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints. Once your debug runs are successful, you can go ahead and schedule your pipelines to run automatically. In the next post, we will look at triggers!

🤓

About the Author

Cathrine Wilhelmsen is a Microsoft Data Platform MVP, BimlHero Certified Expert, Microsoft Certified Solutions Expert, international speaker, author, blogger, and chronic volunteer who loves teaching and sharing knowledge. She works as a Senior Business Intelligence Consultant at Inmeta, focusing on Azure Data and the Microsoft Data Platform. She loves sci-fi, chocolate, coffee, craft beers, ciders, cat gifs and smilies :)
