Storage Event-Driven with ADF or Synapse Pipelines

Implementation Guide — Part 2

Vinny Paluch
Nov 15, 2022

Previous Posts:

Synapse — Same-day analytics Data Exploration — Implementation Guide | by Vinny Paluch | Nov, 2022 | Medium

Parameterised Connections

To facilitate reuse, I parameterise my connections and pipelines as much as possible, including the connections to the linked services. In addition, I use an Azure Key Vault to store the information required to connect to the storage accounts.

How Event Driven Pipelines Work

Everything begins with the Pipeline Trigger definition. When creating a trigger you need to provide the following information:

Type: Storage Events

Storage Account and Container names

Path: the blob path the trigger should match, given as a base string such as “parentfolder/folder”

Event: Create and/or Delete

Storage Event Trigger Type
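Behind the UI, the trigger is stored as a BlobEventsTrigger definition. A minimal sketch of what that JSON can look like, using placeholder names for the trigger, subscription, resource group and storage account:

{
  "name": "TR_OnNewRawFile",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/raw/blobs/dropfolder/",
      "ignoreEmptyBlobs": true,
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
      "events": [ "Microsoft.Storage.BlobCreated" ]
    },
    "pipelines": []
  }
}

Note that the container name and the path are combined into a single blobPathBeginsWith value of the form “/container/blobs/folder/”.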

File Information

When a new file is created or deleted in the Data Lake, an event is fired through Azure Event Grid. The ADF/Synapse trigger will receive the following parameters from Event Grid:

@triggerBody().folderPath and @triggerBody().fileName

We will get back to this later.
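For reference, the mapping between those trigger outputs and the pipeline parameters lives in the trigger’s pipelines section. A sketch, assuming a main pipeline named PL_Main with sourceFolder and sourceFile parameters:

"pipelines": [
  {
    "pipelineReference": { "referenceName": "PL_Main", "type": "PipelineReference" },
    "parameters": {
      "sourceFolder": "@triggerBody().folderPath",
      "sourceFile": "@triggerBody().fileName"
    }
  }
]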

Event Grid Resource Provider not Registered

If you get the error message below, you need to register the resource provider in your subscription before continuing.

Registering the Event Grid Resource in the subscription.

Navigate to your subscription > Resource providers and register the ‘Microsoft.EventGrid’ provider, or use the Azure CLI:

az provider register --namespace 'Microsoft.EventGrid'
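To confirm the registration has finished, you can query the provider state (it should return Registered):

az provider show --namespace 'Microsoft.EventGrid' --query registrationState --output tsv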

Linked Services

We will need two linked services:

Azure KeyVault linked Service

Create the Key Vault linked service. I also use parameters for the Key Vault; this simplifies the CI/CD process when deploying into UAT or PRD, since we won’t have to update the deployment scripts.

Of course, we still have to change the value per workspace, but this could be handled as a global parameter (ADF only).
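As a sketch of that approach, a parameterised Key Vault linked service could look like the following; LS_KeyVault and keyVaultName are placeholder names:

{
  "name": "LS_KeyVault",
  "properties": {
    "type": "AzureKeyVault",
    "parameters": {
      "keyVaultName": { "type": "String" }
    },
    "typeProperties": {
      "baseUrl": "https://@{linkedService().keyVaultName}.vault.azure.net/"
    }
  }
}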

Note: the Synapse/ADF Managed Identity account must be assigned the ‘Key Vault Secrets User’ RBAC role.
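If you still need to grant that role, one option is the Azure CLI; the identity and vault names below are placeholders:

az role assignment create \
  --assignee <workspace-managed-identity-object-id> \
  --role "Key Vault Secrets User" \
  --scope $(az keyvault show --name <key-vault-name> --query id --output tsv)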

Azure Key Vault Linked Service — Two steps to grant service access — Vinny Paluch — Medium

Storage Account Secrets

When I create a new storage account, I automatically save the connection information into the Key Vault, usually through a Terraform script. In this scenario, however, we will create those secrets manually.

Required secrets: Data Lake Endpoint, SAS Key and Primary Key

Those secrets will be used in other stages of this implementation.

a) During the copy phase, when the connection to the Data Lake is not made using the Managed Identity (MSI) credential.

b) During the last stage, Synapse object creation, we will use the SAS Key to register the Data Lake as a SQL Data Source.
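If you are creating the secrets manually, the Azure CLI works as well; the secret names here are only illustrative:

az keyvault secret set --vault-name <key-vault-name> --name datalake-endpoint --value "https://<storage-account>.dfs.core.windows.net"
az keyvault secret set --vault-name <key-vault-name> --name datalake-sas-key --value "<sas-token>"
az keyvault secret set --vault-name <key-vault-name> --name datalake-primary-key --value "<storage-account-access-key>"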

Data Lake storage Linked Service

The connection between ADF/Synapse and the Data Lake will use a Managed Identity. If that’s not possible in your scenario, use the Primary Key from the Key Vault.

Both authentication methods to the Storage Account. Managed Identity is Preferred.
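With Managed Identity, the linked service only needs the Data Lake URL and no credential is stored. When falling back to the Primary Key, it can be pulled straight from the Key Vault linked service. A sketch of the key-based variant, with placeholder names matching the examples above:

{
  "name": "LS_DataLake",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account>.dfs.core.windows.net",
      "accountKey": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "LS_KeyVault",
          "type": "LinkedServiceReference",
          "parameters": { "keyVaultName": "<key-vault-name>" }
        },
        "secretName": "datalake-primary-key"
      }
    }
  }
}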

Pipeline Design Logic

I plan to create four pipelines, which will allow better reuse.

Pipeline Main will be triggered by any file dropped into the /raw/dropfolder in my storage account.

The sub pipelines are independent and can be reused later by another process, e.g., a bulk processing pipeline.
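Inside Pipeline Main, each stage is then invoked through an Execute Pipeline activity that forwards the file information received from the trigger. A sketch, assuming a sub pipeline named PL_CopyToRaw and the sourceFolder/sourceFile parameters from earlier:

{
  "name": "Run copy stage",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "PL_CopyToRaw", "type": "PipelineReference" },
    "waitOnCompletion": true,
    "parameters": {
      "sourceFolder": { "value": "@pipeline().parameters.sourceFolder", "type": "Expression" },
      "sourceFile": { "value": "@pipeline().parameters.sourceFile", "type": "Expression" }
    }
  }
}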

The implementation process is described in this post series.

References:

Create event-based triggers — Azure Data Factory & Azure Synapse | Microsoft Learn

NIFTY 100 Stocks — Kaggle sample dataset | by Vinny Paluch | Nov, 2022 | Medium
