What Is Apache Airflow?

Written by Caitlin Davidson


Apache Airflow Defined

Apache Airflow is an open-source workflow management platform for programmatically authoring, scheduling and monitoring workflows, which run scripts that perform tasks. Apache Airflow is a project of the Apache Software Foundation.

Airflow allows users to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler then executes the tasks on an array of workers while respecting the specified dependencies. DAGs can run either on a defined schedule (e.g., hourly or daily) or in response to external event triggers.
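To make the DAG idea concrete, here is a minimal sketch (using only the Python standard library, not the Airflow API itself) of what the scheduler does conceptually: given tasks and their dependencies, run each task only after everything it depends on has finished. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# A hypothetical mini-DAG: each task mapped to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}

# Like the scheduler, derive an execution order that respects the dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # "extract" first, then "transform", then "load" and "report"
```

In real Airflow the same shape is expressed with operators and dependency arrows between tasks, and execution is distributed to workers rather than run in one process.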

Several elements make up Airflow.

Pipelines: Pipelines are configured as code written in Python, allowing for dynamic pipeline generation. Writing pipelines as code lets users easily define their own operators and executors and extend the library, so that it fits the level of abstraction that suits a user's environment.
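Because pipelines are plain Python, task definitions can be generated dynamically, for example in a loop over a list of tables. The sketch below uses ordinary functions in place of Airflow operators, and the table names are hypothetical.

```python
# Dynamic pipeline generation sketch: build one task per table in a loop,
# the same pattern used to generate Airflow tasks or whole DAGs from config.
def make_task(table):
    def task():
        return f"processed {table}"
    task.__name__ = f"process_{table}"
    return task

tables = ["users", "orders", "events"]
tasks = {t.__name__: t for t in (make_task(tbl) for tbl in tables)}

# Running every generated task.
results = {name: fn() for name, fn in tasks.items()}
```

In Airflow proper, the loop body would instantiate operators inside a DAG definition instead of returning plain functions.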

Scripts: Parameterizing a user's scripts is built into the core of Airflow, using the Jinja templating engine.
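In practice this means a task's parameters can contain placeholders such as `{{ ds }}` (Airflow's built-in template variable for the execution date), which are rendered at runtime. The sketch below mimics that substitution with a standard-library regex rather than Jinja itself, which is a third-party library; the command string is hypothetical.

```python
import re

# A templated command as it might appear in an Airflow task definition.
command = "echo processing data for {{ ds }} into /tmp/{{ ds }}.csv"

# Simplified stand-in for the Jinja rendering Airflow performs at runtime:
# replace each {{ name }} placeholder with its value from the context.
def render(template, context):
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(context[m.group(1)]), template)

rendered = render(command, {"ds": "2024-01-01"})
print(rendered)  # echo processing data for 2024-01-01 into /tmp/2024-01-01.csv
```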

Architecture: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers.
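As one illustration of that modularity, Airflow's executor can be swapped in configuration: the sketch below shows how a Celery-based setup, which distributes tasks to workers through a message broker, might look in `airflow.cfg`. The broker and backend URLs here are hypothetical examples.

```ini
# airflow.cfg (sketch): switch from the default local executor to Celery,
# which hands tasks to distributed workers via a message queue.
[core]
executor = CeleryExecutor

[celery]
# Hypothetical Redis broker; RabbitMQ is another common choice.
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:airflow@localhost/airflow
```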

Airflow also provides a monitoring and management interface, giving users a quick overview of the status of their tasks and the ability to trigger and clear tasks or DAG runs.

In Data Defined, we help make the complex world of data more accessible by explaining some of the most complex aspects of the field.
