Data Factory Pipeline
- Updated on 08 Nov 2023
- 4 Minutes to read
A pipeline is a logical collection of activities that work together to complete a task. A pipeline, for example, could include a set of activities that ingest and clean log data before launching a mapping data flow to analyze the log data.
The pipeline enables users to manage the activities as a group rather than individually.
The Run pipeline option available at the top of the section allows users to trigger a Data Factory Pipeline by providing values to the parameters configured using the Azure portal.
Users can view all pipeline runs, along with their activity runs, by clicking the pipeline run ID.
Users can also view the start time of each activity run, along with its input and output.
The parameters set when triggering a pipeline will appear in the Parameters column for that pipeline run.
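Under the hood, triggering a Data Factory pipeline with parameters maps to the Azure Data Factory "Create Run" REST call, whose request body is simply the parameter name/value map. The sketch below is illustrative only; the subscription, resource group, factory, pipeline, and parameter names are placeholders.

```python
# Illustrative sketch of the Azure Data Factory "Create Run" REST request that
# triggering a pipeline with parameters translates to. All names below are
# placeholders, not values from any real environment.

API_VERSION = "2018-06-01"

def build_create_run_request(subscription_id, resource_group, factory, pipeline, parameters):
    """Return the URL and JSON body for triggering a pipeline run with parameters."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}/createRun"
        f"?api-version={API_VERSION}"
    )
    # The request body is the parameter name/value map; these values later
    # appear in the Parameters column for that pipeline run.
    return url, dict(parameters)

url, body = build_create_run_request(
    "00000000-0000-0000-0000-000000000000", "demo-rg", "demo-factory",
    "log-cleanup-pipeline", {"inputPath": "raw/logs", "retentionDays": 30},
)
```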
The pipeline runs can be filtered based on any one of the following statuses:
- In Progress
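Conceptually, the status filter keeps only the runs whose status matches the selected value. A minimal sketch, assuming each run is a dict with a "status" field as returned by the Data Factory run APIs:

```python
# Minimal sketch of status-based filtering over pipeline runs. The dict shape
# ({"runId": ..., "status": ...}) is an illustrative assumption.

def filter_runs_by_status(runs, status):
    """Keep only the runs whose status matches the selected filter value."""
    return [run for run in runs if run["status"] == status]

runs = [
    {"runId": "r1", "status": "In Progress"},
    {"runId": "r2", "status": "Failed"},
    {"runId": "r3", "status": "In Progress"},
]
in_progress = filter_runs_by_status(runs, "In Progress")
```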
Rerun failed pipeline runs
Pipelines can be re-run by using the configured parameters for the respective runs.
A Data Factory Pipeline run can fail when any of its activities fails, which happens quite often in business orchestrations. Serverless360 already facilitates viewing the run history and accessing the details of failed activity runs. Since there may be a need to rerun failed activity runs, Serverless360 allows users to rerun single or multiple pipeline runs.
Once the re-run operation is complete, a tag that lets users retrieve the details of the original run is added to the child run. The parent pipeline run carries a tag that lets users view the run history of all its child runs.
Regardless of how many times the child runs are re-run, a single pipeline run acts as the original run for all of them.
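The lineage rule above can be sketched in a few lines: every re-run (child) carries a tag pointing back at the one original run, no matter how many times the chain is re-run. The field names are illustrative assumptions, not Serverless360 identifiers.

```python
# Hedged sketch of rerun lineage: each child run is tagged with the id of the
# single original run. Field names ("run_id", "original_run_id") are illustrative.

def rerun(run):
    """Create a child run tagged with the id of the original run."""
    original_id = run.get("original_run_id", run["run_id"])
    return {"run_id": run["run_id"] + "-rerun", "original_run_id": original_id}

original = {"run_id": "run-1"}
child = rerun(original)
grandchild = rerun(child)
# Both child runs point back to the same original run, "run-1".
```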
Ignore pipeline run
Pipeline runs can also be ignored in Serverless360. Once ignored, an Ignored tag, along with the provided description, is associated with that run.
The Ignore tag is visible only in Serverless360 and has no impact on the actual runs.
Stop pipeline run
Users can stop ongoing pipeline runs by selecting them from the Pipeline Runs tab and clicking the Cancel button.
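Stopping a run corresponds to the Azure Data Factory "Cancel" operation on a pipeline run. The sketch below only builds the REST URL for that call; the subscription, resource group, factory, and run id are placeholders.

```python
# Illustrative sketch of the Azure Data Factory "Cancel" REST call that
# stopping an in-progress pipeline run maps to. All names are placeholders.

API_VERSION = "2018-06-01"

def build_cancel_run_url(subscription_id, resource_group, factory, run_id):
    """Return the URL for cancelling a specific pipeline run."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelineruns/{run_id}/cancel"
        f"?api-version={API_VERSION}"
    )

cancel_url = build_cancel_run_url(
    "00000000-0000-0000-0000-000000000000", "demo-rg", "demo-factory", "run-42"
)
```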
Action Required helps users focus on failed pipeline runs. It lets users rerun or ignore the failed activity runs.
Frequently used pipeline runs can be marked as favorites, with an optional description, from both the Pipeline Runs and Action Required tabs.
Favorite runs can be re-run, ignored, or removed from Favorites.
The list of Integration Runtime resources connected to the Data Factory pipeline resource is displayed in the Overview section.
In the Overview section, users can view the list of triggers associated with a Data Factory pipeline, as well as recurrence data for scheduled triggers.
The Data Factory Pipeline section allows users to quickly re-run pipeline runs that failed within the specified hours for the pipeline currently being explored.
The configuration created to run immediately can also be saved for future use in the Automated Tasks section.
Inline task to rerun failed runs
This feature allows users to quickly rerun pipeline runs that failed within the specified hours (minimum 1 hour).
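"Failed within the specified hours" can be expressed as a Data Factory run query: a lookback window of at least one hour combined with a Status = Failed filter, matching the shape of the queryPipelineRuns request body. The clamping to one hour mirrors the minimum noted above; the function name is an illustrative assumption.

```python
# Sketch of a "failed within the last N hours" run query, shaped like the
# Data Factory queryPipelineRuns request body. The 1-hour clamp mirrors the
# minimum window noted in the text.
from datetime import datetime, timedelta, timezone

def failed_runs_query(lookback_hours):
    """Build a query body selecting Failed runs updated within the window."""
    hours = max(1, lookback_hours)  # minimum window is 1 hour
    now = datetime.now(timezone.utc)
    return {
        "lastUpdatedAfter": (now - timedelta(hours=hours)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
        "filters": [
            {"operand": "Status", "operator": "Equals", "values": ["Failed"]}
        ],
    }

query = failed_runs_query(0)  # a 0-hour request is clamped to 1 hour
```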
Task status can be viewed by navigating to the Automated Tasks section and switching to the Task history tab.
Users can use this feature to quickly create a task that runs immediately.
Navigate to the Automated Tasks section in Serverless360 to create a task with a more detailed configuration, schedule tasks to run at a specific time, or automate the task to run on the specified hours, days, and more.
The failed pipeline runs can be scheduled to rerun at a specific time from the Automated Tasks section, with the options Include previous reruns and Rerun the ignored runs.
The following illustration shows rerunning the failed pipeline runs using an Automated task:
A default resource dashboard is available for Data Factory Pipeline resources in the Overview section, allowing for enhanced data visualization and tracking of real-time data.
Users are provided with the following pre-defined Dashboard widgets, which can be customized to meet their specific needs.
1. Cancelled Activity Runs
2. Failed Activity Runs vs Succeeded Activity Runs
3. Succeeded Activity Runs
- Navigate to Data Factory Pipeline -> Monitoring to configure the monitoring rules for Data Factory Pipelines.
- Select the necessary monitoring metrics and configure the threshold values.
- Click Save.
A threshold value can be provided for any metric; the monitoring rule is considered violated when the metric value meets the configured threshold.
Monitoring rules will be saved for Data Factory Pipeline, and the monitoring state for the metrics will be reflected after every monitoring cycle.
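The evaluation performed each monitoring cycle can be sketched as a simple comparison of observed metric values against their configured thresholds. The metric names, the "meets or exceeds" comparison, and the state labels below are illustrative assumptions, not Serverless360 internals.

```python
# Hedged sketch of per-cycle threshold evaluation: a rule is treated as
# violated when the observed metric value meets (or exceeds) its configured
# threshold. Metric names and state labels are illustrative.

def evaluate_rules(metrics, thresholds):
    """Return a monitoring state per metric after one monitoring cycle."""
    return {
        name: "Violated" if metrics.get(name, 0) >= limit else "Healthy"
        for name, limit in thresholds.items()
    }

state = evaluate_rules(
    {"Failed activity runs": 3, "Cancelled activity runs": 0},
    {"Failed activity runs": 1, "Cancelled activity runs": 5},
)
```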
The Data Factory Pipeline resource properties can be accessed within the resource by clicking Properties.