In the previous post, we looked at the copy data activity and saw how the source and sink properties changed with the datasets used. In this post, we will take a closer look at some common datasets and their properties.
Let’s start with the source and sink datasets we created in the copy data wizard!
First, a quick note. If you use the copy data wizard, you can change the dataset names by clicking the edit button on the summary page…
In the previous post, we went through Azure Data Factory pipelines in more detail. In this post, we will dig into the copy data activity. How does it work? How do you configure the settings? And how can you optimize performance while keeping costs down?
Copy Data Activity
The copy data activity is the core (*) activity in Azure Data Factory.
(* Cathrine’s opinion 🤓)
You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.
Yeah. It’s powerful :) But how does it really work?
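Behind the user interface, a copy data activity is stored as JSON inside its pipeline. Here is a minimal sketch of roughly what that looks like, built as a Python dictionary so it is easy to play with. The pipeline, activity, and dataset names are made up for illustration, and the source and sink types assume delimited text files on both sides. The layout follows the general shape of what the code view shows, so compare it against your own factory before relying on it.

```python
import json


def build_copy_activity(name, source_dataset, sink_dataset, source_type, sink_type):
    """Return a dictionary shaped like a copy data activity definition."""
    return {
        "name": name,
        "type": "Copy",
        # The activity points at datasets by name; it does not contain them.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        # The source and sink types depend on the datasets being used.
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


# Hypothetical names, assumed only for this example.
activity = build_copy_activity(
    name="CopyDemoFile",
    source_dataset="DemoSourceDataset",
    sink_dataset="DemoSinkDataset",
    source_type="DelimitedTextSource",
    sink_type="DelimitedTextSink",
)

# A pipeline wraps one or more activities.
pipeline = {
    "name": "CopyDemoPipeline",
    "properties": {"activities": [activity]},
}

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The part to notice is that the activity only references datasets by name. Swap in a different kind of dataset and the source and sink types under `typeProperties` have to change to match.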
In the previous post, we used the Copy Data Wizard to copy a file from our demo dataset to our storage account. The Copy Data Wizard created all the factory resources for us: pipelines, activities, datasets, and linked services.
In this post, we will go through pipelines in more detail. How do we create and organize them? What are their main properties? Can we edit them without using the graphical user interface?
Pipelines: The Basics
When I was new to Azure Data Factory, I had many questions, but I didn’t always have someone to ask and learn from. When I did work in a team, I didn’t always dare ask my team members for help, because I felt silly for asking about things that I felt I should probably know.
Yeah, I know… It’s easy to tell others that there are no silly questions, but I don’t always listen to myself :)
I don’t want you to feel the same way! So. Let’s start from the beginning. These are the questions that I had when I was new to Azure Data Factory. Or, these are the questions that I realized I should have asked when I discovered something by accident and went “Oh! So that’s what that is! I wish I knew that last week!”
In the previous post, we looked at the different Azure Data Factory components. In this post, we’re going to tie everything together and start making things happen. Woohoo! First, we will get familiar with our demo datasets. Then, we will create our Azure Storage Accounts that we will copy data into. Finally, we will start copying data using the Copy Data Wizard.
First, let’s get familiar with the demo datasets we will be using. I don’t know about you, but I’m a teeny tiny bit tired of the AdventureWorks demos. (I don’t even own a bike…) WideWorldImporters is at least a little more interesting. (Yay, IT joke mugs and chocolate frogs!) But! Let’s use something that’s not already in relational database format.
In the previous post, we looked at the Azure Data Factory user interface and the four main Azure Data Factory pages. In this post, we will go through the Author page in more detail and look at a few things on the Monitoring page. Let’s look at the different Azure Data Factory components!
Azure Data Factory Components on the Author Page
On the left side of the Author page, you will see your factory resources. In this example, we have already created one pipeline, two datasets, and two data flows:
Let’s go through each of these Azure Data Factory components and explain what they are and what they do.
In the introduction to Azure Data Factory, we learned a little bit about the history of Azure Data Factory and what you can use it for. In this post, we will be creating an Azure Data Factory and navigating to it.
Spoiler alert! Creating an Azure Data Factory is a fairly quick click-click-click process, and you’re done. But! Before you can do that, you need an Azure Subscription, and the right permissions on that subscription. Let’s get that sorted out first.
Azure Subscription and Permissions
If you don’t already have an Azure Subscription, you can create a free account at azure.microsoft.com/free. (Woohoo! Free! Yay!) Some of the Azure services will always be free, while some are free for the first 12 months. You get $200 worth of credits that last 30 days so you can test and learn the paid Azure services. One tip: Time your free account wisely ⏳
If you already have an Azure subscription, make sure that you have the permissions you need. To create an Azure Data Factory, you need to either:
Hi! I’m Cathrine 👋🏻 I really like Azure Data Factory. It’s one of my favorite topics, and I can talk about it for hours. (And I do.) But talking about it can only help so many people – the ones who happen to attend an event where I’m presenting a session. So I’ve decided to try something new… I’m going to write an introduction to Azure Data Factory! And not just one blog post. A whole bunch of them.
I’m going to take all the things I like to talk about and turn them into bite-sized blog posts that you can read through at your own pace and reference later. I’ve named this series Beginner’s Guide to Azure Data Factory. You may not be new to ETL, data integration, Azure, or SQL, but we’re going to start completely from scratch when it comes to Azure Data Factory.
In 2019, the Azure Data Factory team announced two exciting features. The first was Mapping Data Flows (currently in Public Preview), and the second was Wrangling Data Flows (currently in Limited Private Preview). Since then, I have heard many questions. One of the more common questions is “which should I use?” In this blog post, we will be comparing Mapping and Wrangling Data Flows to hopefully make it a little easier for you to answer that question.
Should you use Mapping or Wrangling Data Flows?
Now, we all know that the consultant answer to “which should I use?” is It Depends ™ :) But what does it depend on?
To me, it boils down to a few key questions you need to ask:
What is the task or problem you are trying to solve?
Where and how will you use the output?
Which tool are you most comfortable using?
Before we dig further into these questions, let’s start with comparing Mapping and Wrangling Data Flows.
On April 4th, 2019, I presented my Pipelines and Packages: Introduction to Azure Data Factory session at 24 Hours of PASS. I was excited to show some cool features and use cases, including how to handle schema drift in the new Mapping Data Flows feature.
In January 2019, I was honored to be asked to contribute to the PASS Insights BI Edition Newsletter. I said yes, of course! :) I chose to create an Azure Data Factory Data Flows introduction video. This is a sneak preview of the upcoming Data Flows feature, with a quick walkthrough of how easy it can be to create scalable data transformations in the cloud – without writing any code!
Please note: As of January 2019, when I recorded this video and published this blog post, Azure Data Factory Data Flows is still in preview. Features will be added and things will get changed, just like all the other Azure products. But! Hopefully this shows what you can look forward to.
At the end of this blog post, I have tried to answer some frequently asked questions about Azure Data Factory Data Flows.