Datasets in Azure Data Factory

Woman standing next to a projector showing the Azure Data Factory logo.

In the previous post, we looked at the copy data activity and saw how the source and sink properties changed with the datasets used. In this post, we will take a closer look at some common datasets and their properties.

Let’s start with the source and sink datasets we created in the copy data wizard!

Dataset Names

First, a quick note. If you use the copy data tool, you can change the dataset names by clicking the edit button on the summary page…

Copy Data Activity in Azure Data Factory

Woman standing next to a projector showing the Azure Data Factory logo.

In the previous post, we went through Azure Data Factory pipelines in more detail. In this post, we will dig into the copy data activity. How does it work? How do you configure the settings? And how can you optimize performance while keeping costs down?

Copy Data Activity

Copy Data Activity in Azure Data Factory.

The copy data activity is the core (*) activity in Azure Data Factory.

(* Cathrine’s opinion 🤓)

You can copy data to and from more than 90 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful! But how does it really work?

Pipelines in Azure Data Factory

Woman standing next to a projector showing the Azure Data Factory logo.

In the previous post, we used the Copy Data Tool to copy a file from our demo dataset to our data lake. The Copy Data Tool created all the factory resources for us: pipelines, activities, datasets, and linked services.

In this post, we will go through pipelines in more detail. How do we create and organize them? What are their main properties? Can we edit them without using the graphical user interface?

How do I create pipelines?

So far, we have created a pipeline by using the Copy Data Tool. There are several other ways to create a pipeline.

On the Home page, click on the New → Pipeline dropdown menu, or click on the Orchestrate shortcut tile:

Screenshot of the Azure Data Factory Home page with the New Pipeline and Orchestrate tasks highlighted.

Copy Data Tool in Azure Data Factory

Woman standing next to a projector showing the Azure Data Factory logo.

In the previous post, we looked at the different Azure Data Factory components. In this post, we’re going to tie everything together and start making things happen. Woohoo! First, we will get familiar with our demo datasets. Then, we will create our Azure Data Lake Storage Account that we will copy data into. Finally, we will start copying data using the Copy Data Tool.

Demo Datasets

First, let’s get familiar with the demo datasets we will be using. I don’t know about you, but I’m a teeny tiny bit tired of the AdventureWorks demos. (I don’t even own a bike…) WideWorldImporters is at least a little more interesting. (Yay, IT joke mugs and chocolate frogs!) But! Let’s use something that might be a little bit more fun to explore.

Let me present… *drumroll* 🥁

Overview of Azure Data Factory Components

Woman standing next to a projector showing the Azure Data Factory logo.

In the previous post, we looked at the Azure Data Factory user interface and the four main Azure Data Factory pages. In this post, we will go through the Author page in more detail and look at a few things on the Monitoring page. Let’s look at the different Azure Data Factory components!

Azure Data Factory Components on the Author Page

On the left side of the Author page, you will see your factory resources. In this example, we have already created one pipeline, two datasets, one data flow, and one power query:

Screenshot of the Author page in Azure Data Factory, with one Pipeline, two Datasets, and one Data Flow already created.

Let’s go through each of these Azure Data Factory components and explain what they are and what they do.