Parameters¶
Parameters is the Luigi equivalent of creating a constructor for each Task.
Luigi requires you to declare these parameters by instantiating
Parameter
objects on the class scope:
class DailyReport(luigi.contrib.hadoop.JobTask):
date = luigi.DateParameter(default=datetime.date.today())
# ...
By doing this, Luigi can take care of all the boilerplate code that
would normally be needed in the constructor.
Internally, the DailyReport object can now be constructed by running
DailyReport(datetime.date(2012, 5, 10))
or just DailyReport()
.
Luigi also creates a command line parser that automatically handles the
conversion from strings to Python types.
This way you can invoke the job on the command line eg. by passing --date 2012-05-10
.
The parameters are all set to their values on the Task object instance, i.e.
d = DailyReport(datetime.date(2012, 5, 10))
print(d.date)
will return the same date that the object was constructed with. Same goes if you invoke Luigi on the command line.
Instance caching¶
Tasks are uniquely identified by their class name and values of their parameters. In fact, within the same worker, two tasks of the same class with parameters of the same values are not just equal, but the same instance:
>>> import luigi
>>> import datetime
>>> class DateTask(luigi.Task):
... date = luigi.DateParameter()
...
>>> a = datetime.date(2014, 1, 21)
>>> b = datetime.date(2014, 1, 21)
>>> a is b
False
>>> c = DateTask(date=a)
>>> d = DateTask(date=b)
>>> c
DateTask(date=2014-01-21)
>>> d
DateTask(date=2014-01-21)
>>> c is d
True
Insignificant parameters¶
If a parameter is created with significant=False
,
it is ignored as far as the Task signature is concerned.
Tasks created with only insignificant parameters differing have the same signature but
are not the same instance:
>>> class DateTask2(DateTask):
... other = luigi.Parameter(significant=False)
...
>>> c = DateTask2(date=a, other="foo")
>>> d = DateTask2(date=b, other="bar")
>>> c
DateTask2(date=2014-01-21)
>>> d
DateTask2(date=2014-01-21)
>>> c.other
'foo'
>>> d.other
'bar'
>>> c is d
False
>>> hash(c) == hash(d)
True
Parameter visibility¶
Using ParameterVisibility
you can configure parameter visibility. By default, all
parameters are public, but you can also set them hidden or private.
>>> import luigi
>>> from luigi.parameter import ParameterVisibility
>>> luigi.Parameter(visibility=ParameterVisibility.PRIVATE)
ParameterVisibility.PUBLIC
(default) - visible everywhere
ParameterVisibility.HIDDEN
- ignored in WEB-view, but saved into database if save db_history is true
ParameterVisibility.PRIVATE
- visible only inside task.
Parameter types¶
In the examples above, the type of the parameter is determined by using different
subclasses of Parameter
. There are a few of them, like
DateParameter
,
DateIntervalParameter
,
IntParameter
,
FloatParameter
, etc.
Python is not a statically typed language and you don’t have to specify the types
of any of your parameters.
You can simply use the base class Parameter
if you don’t care.
The reason you would use a subclass like DateParameter
is that Luigi needs to know its type for the command line interaction.
That’s how it knows how to convert a string provided on the command line to
the corresponding type (i.e. datetime.date instead of a string).
Setting parameter value for other classes¶
All parameters are also exposed on a class level on the command line interface. For instance, say you have classes TaskA and TaskB:
class TaskA(luigi.Task):
x = luigi.Parameter()
class TaskB(luigi.Task):
y = luigi.Parameter()
You can run TaskB
on the command line: luigi TaskB --y 42
.
But you can also set the class value of TaskA
by running
luigi TaskB --y 42 --TaskA-x 43
.
This sets the value of TaskA.x
to 43 on a class level.
It is still possible to override it inside Python if you instantiate TaskA(x=44)
.
All parameters can also be set from the configuration file. For instance, you can put this in the config:
[TaskA]
x: 45
Just as in the previous case, this will set the value of TaskA.x
to 45 on the class level.
And likewise, it is still possible to override it inside Python if you instantiate TaskA(x=44)
.
Parameter resolution order¶
Parameters are resolved in the following order of decreasing priority:
Any value passed to the constructor, or task level value set on the command line (applies on an instance level)
Any value set on the command line (applies on a class level)
Any configuration option (applies on a class level)
Any default value provided to the parameter (applies on a class level)
See the Parameter
class for more information.