Understanding SSIS packages migration to Azure Data Factory
Move your local SSIS packages to Azure Data Factory
Introduction
SQL Server Integration Services (SSIS) emerged as an upgrade to Data Transformation Services (DTS), which was introduced with the version of SQL Server 7.0. With the release of SQL Server 2005, DTS was replaced by SSIS and has been the most popular tool and technology stack for Extract-Transform-Load (ETL) operations and for many other related tasks such as data hygiene, data-dedupe/filtering, data preparation, and so on. SSIS packages has been developed locally on developer workstations and deployed on the on-premise servers using tools like Visual Studio and SQL Server Data Tools for decades. However, in this digital transformation era of cloud computing where every imaginable data service is found on the cloud, SSIS also has its place on the Azure cloud as well.
Azure Data Factory supports SSIS integration runtime which enables executing SSIS packages on Azure cloud. Developers continue to develop new packages or modernize existing packages and deploy them on cloud-based services like SQL Azure or Data Factory, using Visual Studio as the IDE for development and deployment.
This article provides steps for provisioning an Azure-SQL Server Integration Services (SSIS) integration runtime (IR) in Azure Data Factory (ADF) and Azure Synapse Pipelines. An Azure-SSIS IR supports:
· Running packages deployed into SSIS catalog (SSISDB) hosted by Azure SQL Database server/Managed Instance (Project Deployment Model)
· Running packages deployed into file system, Azure Files, or SQL Server database (MSDB) hosted by Azure SQL Managed Instance (Package Deployment Model)
Azure SSIS IR is an Azure Data Factory fully managed cluster of virtual machines that are hosted in Azure and dedicated to run SSIS packages in the Data Factory, with the ability to scale up the SSIS IR nodes by configuring the node size and scale it out by configuring the number of nodes in the VMs cluster.
Note: In this article, we will be using Azure SQL Managed Instance for SSIS Catalog
Integration Runtime Scenarios
Azure Data Factory integration runtime can be setup in the following scenarios.
1. Azure automatically provisions an integration Runtime which can connect to Azure resources (Azure SQL, Azure Synapse Analytics, Storage Accounts) without any issues.
2. You can perform data integration securely in a private network environment, shielded from the public cloud environment. For that you need to install a self-hosted IR inside your virtual private network. The self-hosted integration runtime only makes outbound HTTP-based connections to open internet.
3. You can also perform data integration securely in an on prem environment. For that you need to install a Self-hosted IR behind your corporate firewall in your on prem environment.
4. You can natively execute SSIS Packages by creating an Azure-SSIS Integration Runtime which creates an Integration Services Catalog in Azure SQL Database where the packages are stored. An ADF pipeline run sends commands to the Azure SSIS IR which executes the SSIS Packages.
Pre-requisites
- Azure Data Factory resource in Azure (ADF)
- SQL Managed Instance in Azure (SQL MI)
- Visual Studio 2019 (VS)
- Visual Studio Data Tools (SSDT)
Note: Create ADF and SQL MI is beyond the scope of this document. it is assumed that these resources are already created in Azure. It is recommended to created ADF in the same region where SQL instance is deployed
Setup SSIS Integration Runtime in Azure Data Factory
Launch Azure Data Factory and click Manage →Integration Runtime. Then Click “+ New” to add a new integration runtime.
Select Azure-SSIS for Integration runtime setup as shown in the below figure.
Click “Continue”
Provide the following:
- Name: Name of integration runtime.
- Location: Specify the location where integration runtime is created.
- Node size: The node size is the size and power of the virtual machine(s) that are running your integration runtime. if you are planning to run many packages in parallel, it is recommended to select high performing tier for that VM. Click Configure performance for the Azure-SSIS Integration Runtime — Azure Data Factory | Microsoft Docs to find out more about configuring SSIS-IR for high performance.
- Node Number: adjusts the scalability of the integration runtime. The throughput of the integration runtime is proportional to the AzureSSISNodeNumber. Set the AzureSSISNodeNumber to a small value at first, monitor the throughput of the integration runtime, then adjust the value for your scenario.
- Edition/license: Select appropriate license as per your environment and needs. You can also take advantage of Azure Hybrid Benefits for SQL Server in licensing.
- Click Continue
Now you need to specify whether to create SSISDB and deploy your packages into it, and/or use Azure-SSIS IR package stores to deploy the SSIS packages.
If you select to create an SSISDB, you will be asked to specify the Azure subscription and the region where the Azure SQL Server is located, taking into consideration that it is recommended to have both the Azure SQL Server and the Azure SSIS IR in the same region.
Click Continue.
Note: Make sure you allow the port 29876 and 29877 on the subnet NSG which you are going to mount the IR
Run the vNet Validation.
Click Create on the following screen
Once created successfully, ensure that the Azure SSIS IR status is changed to “Running” in order to use it to run the SSIS packages.
Package Development in Visual Studio
The next step is to install Visual Studio Data Tools with the required packages and nuggets installed that are required to create an SSIS package locally on a developer workstation.
Install 2017 integration services feature pack for azure by downloading it from the following link
After installing the integration services feature pack for Azure, you will see the following in visual studio under SSIS Toolbox
Create a project in Visual Studio
Create a new project in visual studio by clicking File →New →Project. Look for integration services and select Integration Services Project (Azure-Enabled) as shown below.
Click Next. Provide project name
Click “Create”
Following pop-up will show up as soon as project is created.
Click “Yes” if you have ADF already deployed with SSIS integration runtime configured. If is not configured, you can click “No” for now. In this case, we have already setup SSIS-IR in ADF, so we are clicking “Yes” . This will launch SSIS in ADF connection wizard.
Click Next
Select you subscription, data factory that has SSIS-IR configured, and integration runtime as shown below. Make sure you are logged into your Azure environment within Visual Studio.
Click Next
Select Azure Storage account that will be used to store packages in Azure Files.
Click “Connect”
In case, if you have selected “No” for connecting to SSIS-IR, you can perform this operation by right clicking on project → SSIS in Azure Data Factory →Connect to SSIS in Azure Data Factory as shown below
Package Design
Now create a package in VS. Creating a step-by-step package is beyond the scope of this article. For the sake of simplicity, a package was designed as a proof of concept as shown below. This package reads data from SQL table as a source and stores it as a CSV format in Azure Storage as a sink or destination. Once done, package will look like as shown below.
Once the package is ready, ensure package does not have any errors. Right-click on the project and select the Build menu item as shown below.
Output tab will show successful build.
Deploy the package
Once the build is successful, follow the same steps and select the Deploy menu item to start the deployment as shown.
This will launch Integration Services Deployment Wizard. Click Next
Select SSIS in Azure Data Factory.
Now Provide the following
1. Server name: Name of SQL server that contains SSISDB created in the earlier steps.
2. SQL Authentication
3. Path for SSIS package
4. Click Next
Review your entries. Click Deploy
Deployment will begin as shown below.
Click “Close” to close the wizard.
Verify Package
Launch SSMS and connect to SQL instance where package is deployed. Verify in SSMS that the package has been deployed successfully.
Execute the package in SSMS. Right click on the package and click “Execute”
Review the execution as shown below.
Run SSIS Package in ADF
After deploying the package to the SSISDB instance that is connected to the Azure SSIS IR, you need to create a new pipeline to run the SSIS package using an Execute SSIS package activity
You can create a pipeline as shown below
Under General tab, drag and drop Execute SSIS package as shown below.
Fill-up the following form as shown below. Select SSISDB as package location.
Specify Folder, project, and package name
Execute the pipeline.
You can also execute SSIS package by clicking the three dots in SSIS-IR
Conclusion
To lift and shift existing SSIS workload, you can create an Azure-SSIS IR to natively execute SSIS packages.