Execute any process anywhere in Azure Data Factory pipeline
Run your executables or any other process using SSIS integration runtime in Azure Data Factory (ADF)
When migrating SQL Server from on-premises to cloud, there is always an ask how to migrate SSIS packages or other ETL jobs to the cloud? Azure Data Factory SSIS Integration Runtime (IR) offers to run SSIS packages in the cloud. if you would like to know more in depth about it, you can read my article how to migrate your SSIS packages using SSIS Integration Runtime in Azure Data Factory (ADF) at Run your SSIS packages using Azure SQL Server Managed Instance | by Afzal Muhammad | Dev Genius
However, in certain circumstances, those ETL jobs or SSIS packages run executables such as .EXE as a part of workflow or ETL pipelines. It becomes more challenging if you are migrating SQL Server running on-premises to PaaS as there is no access to PaaS compute to run those executables.
In this article, I would describe how to run SQL Server Integration Services (SSIS) packages on an Azure-SSIS Integration Runtime (Azure-SSIS IR) with a self-hosted integration runtime (self-hosted IR) configured as a proxy.
This feature allows us to access data and run tasks such as executables or processes on premises or on a VM anywhere.
This feature can only be enabled on SSIS Data Flow Task and Execute SQL/Process Tasks for now. You can enable execute process task to run in Self Hosted integration runtime.
Abbreviations used in this article.
ADF →Azure Data Factory
SSIS → SQL Server Integration Services
IR → Integration Runtime
SHIR →Self Hosted Integration Runtime
SSIS-IR →SQL Server Integration Services Integration Runtime
MI → Managed Instance
PE →Private Endpoints
VS-> Visual Studio
SSIS — SHIR Architecture
Below is the high-level architecture used in setting up this environment
This architecture provisions Azure Data Factory with private endpoints.
- ADF PEs are created on the subnet shown in red color (e.g. 10.0.2.0/24)
- Storage account should be setup with private endpoint and connected to one of the subnets of the Managed Instance vNet
- SSIS-IR should be configured with vNet injection to the same subnets (e.g. 10.0.2.0/24). if you are using express vNet injection method, then you need a dedicated subnet.
- In this lab environment, VPN gateway is setup to have on-premises connectivity. setting up VPN gateway is beyond the scope of this article.
- Configure inbound firewall rules for the subnets where SSIS-IR is mounted to allow port 29876 and 29877
- Setup Self Hosted Integration Runtime (SHIR) in ADF and install the SHIR in on-premises machine. SHIR is establishing a connectivity with ADF via the VPN connection. In an enterprise environment, this could be an Express Route (ER) connectivity as well.
- SQL Server Managed Instance (MI) is also part of this topology for data read and write use case examples. However, setting up MI is also beyond the scope of this article.
Setup and Configure Self-Hosted Integration Runtime (SHIR)
Use the following steps to create a self-hosted IR using the Azure Data Factory GUI.
Select Integration runtimes on the left pane, and then select +New.
On the Integration runtime setup page, select Azure, Self-Hosted, and then select Continue.
On the following page, select Self-Hosted to create a Self-Hosted IR, and then select Continue.
Provide a unique name for your IR. you can also provide optional description. Then select Create.
This would generate two keys that would be used to configure SHIR. In this article, I’m using Option 2 for installing integration runtime which is “Manual Setup”.
Download the runtime as shown below from Option 2 on the machine where SHIR is going to be configured.
Run the installer to install the SHIR.
Once the installation is complete. On the Register Integration Runtime (Self-hosted) page, provide the key you captured earlier, and select Register.
Note: If you are using private endpoint for your ADF. Make sure the machine you are installing SHIR must resolve the ADF private endpoint.
On the New Integration Runtime (Self-hosted) Node page, select Finish.
Registration process will begin. It may take few minutes. After the self-hosted integration runtime is registered successfully, you see the following window:
Click “Launch Configuration Manager”, it should look like as shown below.
Visit ADF GUI. Click on Manage → Integration runtime.
Verify the status of SHIR. It should be in “Running” state as shown in the below figure.
Setup and Configure SSIS Integration Runtime (SSIS IR) with SHIR as a proxy
In the following steps, we would be creating SSIS IR and will defined SHIR as a proxy created in the earlier steps.
In the ADF GUI, select Integration runtimes on the left pane, and then select +New.
Select Azure-SSIS for Integration runtime setup as shown in the below figure.
Provide the following:
- Name: Name of integration runtime.
- Location: Specify the location where integration runtime is created.
- Node size: The node size is the size and power of the virtual machine(s) that are running your integration runtime. if you are planning to run many packages in parallel, it is recommended to select high performing tier for that VM. Click Configure performance for the Azure-SSIS Integration Runtime — Azure Data Factory | Microsoft Docs to find out more about configuring SSIS-IR for high performance.
- Node Number: adjusts the scalability of the integration runtime. The throughput of the integration runtime is proportional to the AzureSSISNodeNumber. Set the AzureSSISNodeNumber to a small value at first, monitor the throughput of the integration runtime, then adjust the value for your scenario.
- Edition/license: Select appropriate license as per your environment and needs. You can also take advantage of Azure Hybrid Benefits for SQL Server in licensing.
- Click Continue
Note: Make sure you allow the port 29876 and 29877 on the subnet NSG which you are going to mount the IR
Now configure vNet injection for SSIS IR.
Select “Set up Self-Hosted Integration Runtime as a proxy…” check box. Select SHIR from the dropdown created earlier. This also requires storage link service.
Select your existing Azure Blob Storage linked service or create a new one that references the staging data store to be used when moving data between self-hosted and Azure-SSIS integration runtimes. For this purpose, the connect Via property of staging storage linked service must be set to “AutoResolveIntegrationRuntime”.
Also provide Staging path → Specify the path in your staging data store to be used when moving data between Self-Hosted and Azure-SSIS Integration Runtimes, a default container will be used if unspecified.
Perform the vNet Validation and click “Continue”.
Click “Create” on the summary page. This will provision your SSIS-IR with SHIR acting as a proxy.
Once created successfully, ensure that the Azure SSIS IR and SHIR status shows “Running” in order to use it to run the SSIS packages.
Create a sample demo executable .bat file
Create a batch file in the notepad with the following. Running this .bat file will create a demo.txt file with the contents provided as shown below.
echo 'This is a test from SSIS IR' > demo.txt
Develop a sample SSIS package
In the following steps, we will be developing a very simple SSIS package that would call .bat file created earlier using “Execute Process Task”.
Launch the Visual Studio.
Create a new project as show below by selecting either of the integration services project templates.
Now create a package in VS
Select “New SSIS Package” and provide a package name.
Drag “Execute Process Task” from the SSIS Toolbox to the designer pane.
Double click on “Execute Process Task” and click to Process tab as shown below.
Specify “Executables” and “WorkingDirectory” as shown below. In my environment following entries were made. Please make changes according to your setup.
Exeutables → Demo.bat
WorkingDirectory → C:\Development\bat
Right Click on Execute Process Task and click Properties.
Set ExecuteOnProxy = True as shown in the below figure.
Once the package is ready, ensure package does not have any errors. Right-click on the project and select the Build menu item as shown below.
Upload package to Azure Files
Upload your package to the Azure files. Setting up storage account and Azure files is beyond the scope of this article. Make sure, they are available for the packages to be uploaded.
Desing Pipeline in ADF
Lauch ADF GUI.
You can create a pipeline as shown below.
Under General tab, drag and drop Execute SSIS package as shown below.
Click on Settings tab and fill up the following as shown below.
Select the IR created.
Specify Folder, project, and package name
Make sure the IR we created earlier is in “Running” state before we execute the pipeline. Click Manage →Integration runtime in ADF GUI
Run the pipeline. Click Add Trigger →Trigger Now
Now click Manage →Pipeline runs
Click on the pipeline. As you can status shows “Succeeded”
Verify in the on-prem or VM that the “Execute Process Task” ran via the SHIR. As you can see the .txt file is created as a result of pipeline execution.
After setting up self-hosted IR as a proxy for SSIS IR, you can deploy and run your packages to access data and or run any SQL statements. Further you can run any executables/processes whether on premises or on a VM in Azure or anywhere. The pipeline runs as Execute SSIS Package activities in Data Factory.