There are two fundamental building blocks of Luigi: the Task class and the Target class.
Both are abstract classes and expect a few methods to be implemented.
In addition to those two concepts,
the Parameter class is an important concept that governs how a Task is run.
The Target class corresponds to a file on a disk,
a file on HDFS, or some kind of checkpoint, such as an entry in a database.
In fact, the only method that a Target has to implement is exists(),
which returns True if and only if the Target exists.
In practice, implementing Target subclasses is rarely needed.
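To make the exists() contract concrete, here is a minimal sketch of a checkpoint-style target backed by a database row. It is a toy stand-in, not Luigi code: the CheckpointTarget class, its touch() helper, and the checkpoints table are hypothetical, and SQLite from Python's standard library is assumed as the database. In Luigi itself you would subclass luigi.Target instead.

```python
import sqlite3


class CheckpointTarget:
    """Hypothetical toy target: it "exists" iff a named row is in a checkpoints table."""

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self.conn = conn
        self.name = name

    def exists(self) -> bool:
        # The only required behaviour: True if and only if the checkpoint was recorded.
        row = self.conn.execute(
            "SELECT 1 FROM checkpoints WHERE name = ?", (self.name,)
        ).fetchone()
        return row is not None

    def touch(self) -> None:
        # Hypothetical helper a task would call when its work is done.
        self.conn.execute(
            "INSERT OR IGNORE INTO checkpoints (name) VALUES (?)", (self.name,)
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkpoints (name TEXT PRIMARY KEY)")

target = CheckpointTarget(conn, "report-2014-11-01")
print(target.exists())  # False
target.touch()
print(target.exists())  # True
```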
Luigi comes with a toolbox of several useful Targets.
Besides files on a local disk or in HDFS, there is also support for other storage systems,
such as luigi.contrib.redshift.RedshiftTarget, and several more.
Most of these targets are file system-like.
For instance, HdfsTarget and its local-disk counterpart map to a file in HDFS and a file on the local drive, respectively.
In addition, these targets wrap the underlying operations to make them atomic.
They both implement the open() method, which returns a stream object that
can be read from (mode='r') or written to (mode='w').
Luigi comes with Gzip support by providing a Gzip format that can be passed to these targets through their format argument.
Adding support for other formats is pretty simple.
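The sketch below illustrates two ideas from this paragraph, atomic writes and a pluggable Gzip format, using only the Python standard library. It assumes one common way of getting atomicity: write to a temporary file in the same directory and rename it into place only when the write finishes cleanly. LocalFileTarget, AtomicWriter, and the use_gzip flag are hypothetical names, not Luigi's API.

```python
import gzip
import os
import tempfile


class AtomicWriter:
    """Writes to a temporary file and renames it into place only on a clean exit."""

    def __init__(self, path: str, use_gzip: bool) -> None:
        self.path = path
        # Create the temp file next to the destination so the rename stays on one file system.
        directory = os.path.dirname(os.path.abspath(path))
        fd, self.tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        os.close(fd)
        opener = gzip.open if use_gzip else open
        self._fh = opener(self.tmp_path, "wt")

    def write(self, text: str) -> int:
        return self._fh.write(text)

    def __enter__(self) -> "AtomicWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._fh.close()
        if exc_type is None:
            os.replace(self.tmp_path, self.path)  # commit: the file appears all at once
        else:
            os.remove(self.tmp_path)  # abort: discard the partial output
        return False  # never swallow the exception


class LocalFileTarget:
    """Hypothetical toy file-system target (not luigi's own local target class)."""

    def __init__(self, path: str, use_gzip: bool = False) -> None:
        self.path = path
        self.use_gzip = use_gzip

    def exists(self) -> bool:
        return os.path.exists(self.path)

    def open(self, mode: str = "r"):
        if mode == "r":
            opener = gzip.open if self.use_gzip else open
            return opener(self.path, "rt")
        if mode == "w":
            return AtomicWriter(self.path, self.use_gzip)
        raise ValueError(f"unsupported mode: {mode!r}")


workdir = tempfile.mkdtemp()
target = LocalFileTarget(os.path.join(workdir, "report.txt.gz"), use_gzip=True)

try:
    with target.open("w") as out:
        out.write("partial\n")
        raise RuntimeError("job crashed mid-write")
except RuntimeError:
    pass
print(target.exists())  # False: the half-written file never appeared

with target.open("w") as out:
    out.write("total,42\n")
with target.open("r") as f:
    print(f.read(), end="")  # total,42
```

Because the temporary file is created in the destination directory, the final os.replace() stays on one file system, which is what lets the rename act as an atomic commit on POSIX systems.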
The Task class is a bit more conceptually interesting because this is
where computation is done.
There are a few methods that can be implemented to alter its behavior.
Tasks consume Targets that were created by some other Task. They usually also output Targets.
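In Luigi, a Task typically declares its dependencies in requires(), its result in output(), and its work in run(). The toy model below mirrors that shape with hypothetical classes (MemoryTarget, Task, FetchLogs, CountUsers) and a hypothetical run_pipeline() driver, so it runs without Luigi installed. It is a sketch of the idea, not Luigi's scheduler.

```python
from typing import Dict, List

STORE: Dict[str, str] = {}  # hypothetical in-memory "storage" shared by all targets


class MemoryTarget:
    """Toy target: exists() is True once a value has been written under its key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def exists(self) -> bool:
        return self.key in STORE

    def read(self) -> str:
        return STORE[self.key]

    def write(self, value: str) -> None:
        STORE[self.key] = value


class Task:
    """Toy base class with the same three hooks a Luigi Task exposes."""

    def requires(self) -> List["Task"]:
        return []

    def output(self) -> MemoryTarget:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def input(self) -> List[MemoryTarget]:
        # The Targets produced by the tasks this one depends on.
        return [dep.output() for dep in self.requires()]


class FetchLogs(Task):
    def output(self) -> MemoryTarget:
        return MemoryTarget("logs")

    def run(self) -> None:
        self.output().write("alice\nbob\nalice\n")


class CountUsers(Task):
    def requires(self) -> List[Task]:
        return [FetchLogs()]

    def output(self) -> MemoryTarget:
        return MemoryTarget("user_count")

    def run(self) -> None:
        # Consume the Target created by FetchLogs, produce a new Target.
        logs = self.input()[0].read()
        self.output().write(str(len(set(logs.split()))))


def run_pipeline(task: Task, log: List[str]) -> None:
    """Depth-first: skip finished tasks, build dependencies, then run the task."""
    if task.output().exists():
        return
    for dep in task.requires():
        run_pipeline(dep, log)
    task.run()
    log.append(type(task).__name__)


log: List[str] = []
run_pipeline(CountUsers(), log)
print(log)                  # ['FetchLogs', 'CountUsers']
print(STORE["user_count"])  # 2
run_pipeline(CountUsers(), log)
print(log)                  # ['FetchLogs', 'CountUsers'] (nothing re-ran)
```

The second call does nothing because the output of CountUsers already exists: a task whose output exists is treated as done, which is the role the exists() method plays.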
The Task class corresponds to some type of job that is run, but in general you want to allow some form of parametrization of it. For instance, if your Task class runs a Hadoop job to create a report every night, you probably want to make the date a parameter of the class. See Parameters for more info.
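As a small illustration of the nightly-report example, the sketch below treats the date as a parameter of the task, so each date yields a distinct task with its own output. A frozen dataclass stands in for Luigi's parameter machinery; NightlyReport and output_path() are hypothetical. In Luigi itself, the date would typically be declared as a class attribute such as luigi.DateParameter().

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class NightlyReport:
    """Hypothetical parameterized task: the date is a parameter, not hard-coded."""

    date: datetime.date

    def output_path(self) -> str:
        # Each parameter value maps to its own output, so every night is tracked separately.
        return f"reports/{self.date:%Y-%m-%d}.tsv"


a = NightlyReport(datetime.date(2014, 11, 1))
b = NightlyReport(datetime.date(2014, 11, 1))
c = NightlyReport(datetime.date(2014, 11, 2))
print(a == b)           # True: same parameters, same task
print(a.output_path())  # reports/2014-11-01.tsv
print(c.output_path())  # reports/2014-11-02.tsv
```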
Using tasks, targets, and parameters, Luigi lets you express arbitrary dependencies in code, rather than using some kind of awkward config DSL. This is really useful because in the real world, dependencies are often very messy. For instance, here are some examples of the dependencies you might encounter:
(These diagrams are from a Luigi presentation in late 2014 at the NYC Data Science meetup.)
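Because dependencies are ordinary code, a task can compute them, for example by fanning in over a date range. The sketch below is a toy: task names are plain strings such as "weekly_report:2014-11-08", and requires() and dependency_graph() are hypothetical helpers, not Luigi's API. It uses Python's standard-library graphlib (Python 3.9+) to derive a valid execution order.

```python
import datetime
from graphlib import TopologicalSorter


def requires(task: str) -> list:
    """Hypothetical dependency function: plain code decides what each task needs."""
    kind, _, day = task.partition(":")
    date = datetime.date.fromisoformat(day)
    if kind == "weekly_report":
        # Fan-in: the report needs the previous 7 days of daily aggregates.
        return [f"daily_agg:{date - datetime.timedelta(days=i)}" for i in range(1, 8)]
    if kind == "daily_agg":
        return [f"raw_logs:{date}"]
    return []  # raw_logs has no upstream tasks


def dependency_graph(root: str) -> dict:
    """Walk requires() from the root, mapping each task to its direct dependencies."""
    graph, stack = {}, [root]
    while stack:
        task = stack.pop()
        if task in graph:
            continue
        graph[task] = requires(task)
        stack.extend(graph[task])
    return graph


graph = dependency_graph("weekly_report:2014-11-08")
order = list(TopologicalSorter(graph).static_order())
print(len(order))  # 15
print(order[-1])   # weekly_report:2014-11-08
```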