luigi.contrib.external_program

Template tasks for running external programs as luigi tasks.

This module is intended primarily for cases where you need to call a single external program or shell script and it is enough to specify program arguments and environment variables.

If you need to run multiple commands, chain them together, or pipe output from one command to the next, you're probably better off using something like plumbum and wrapping plumbum commands in normal luigi Tasks.

Classes

ExternalProgramRunContext(proc)

ExternalProgramTask(*args, **kwargs)

Template task for running an external program in a subprocess

ExternalPythonProgramTask(*args, **kwargs)

Template task for running an external Python program in a subprocess

Exceptions

ExternalProgramRunError(message, args[, ...])

class luigi.contrib.external_program.ExternalProgramTask(*args, **kwargs)

Template task for running an external program in a subprocess

The program is run using subprocess.Popen, with its arguments passed as a list generated by program_args() (the first element should be the executable). See subprocess.Popen for details.

You must override program_args() to specify the arguments you want. You can optionally override program_environment() if you want to control the environment variables (see ExternalPythonProgramTask for an example).

By default, the output (stdout and stderr) of the external program is captured and displayed after the execution has ended. This behaviour can be overridden by passing --capture-output False.
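To make the mechanism concrete, here is a standard-library sketch of roughly what happens to the list returned by program_args(): it is started with subprocess.Popen, both streams are captured, and they are reported once the process has ended. The helper name run_and_capture is hypothetical, and this is an approximation for illustration, not luigi's implementation.

```python
import subprocess
import sys


def run_and_capture(args, env=None):
    """Hypothetical helper: run args (executable first) and capture both streams.

    Returns (returncode, stdout, stderr) only after the process has finished,
    mirroring the default "capture, then display" behaviour described above.
    """
    proc = subprocess.Popen(
        args,                    # a list, not a shell string
        env=env,                 # None means "inherit the current environment"
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    stdout, stderr = proc.communicate()  # waits for the process to exit
    return proc.returncode, stdout, stderr


if __name__ == "__main__":
    code, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(code, out.strip())
```

Passing a list rather than a single command string means no shell is involved, so arguments containing spaces need no quoting.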

capture_output

A Parameter whose value is a bool. A BoolParameter has an implicit default value of False unless another default is configured. For the command line interface, this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is called implicit parsing (the default). However, in some situations you might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. If the parameter value is omitted, it is still considered True, but to avoid ambiguities during argument parsing, always place bool parameters after the task family on the command line when using explicit parsing.

You can toggle between the two parsing modes on a per-parameter basis via

import luigi

class MyTask(luigi.Task):
    implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
    explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)

or globally by

luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING

for all bool parameters instantiated after this line.
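The difference between the two modes can be illustrated with Python's argparse. This is only an analogy. It assumes that implicit parsing behaves like a store_true flag and explicit parsing like a flag with an optional value. It is not how luigi builds its own parser, and str_to_bool is a hypothetical helper.

```python
import argparse


def str_to_bool(value):
    """Accept the explicit values 'true'/'false' (case-insensitive)."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


parser = argparse.ArgumentParser()
# Implicit parsing analogue: presence of the flag means True, absence means False.
parser.add_argument("--implicit-bool", action="store_true")
# Explicit parsing analogue: the flag takes an optional value, a bare flag still
# means True, and the default here is True (the case where explicit values help).
parser.add_argument("--explicit-bool", nargs="?", const=True, default=True,
                    type=str_to_bool)

if __name__ == "__main__":
    print(parser.parse_args(["--implicit-bool"]))
    print(parser.parse_args(["--explicit-bool", "false"]))
```

With nargs="?", a token written right after a bare --explicit-bool is consumed as its value. This is the kind of ambiguity the recommendation about placing bool parameters after the task family is meant to avoid.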

stream_for_searching_tracking_url

Specifies which output stream is searched for a tracking URL. It may be set to 'stdout', 'stderr' or 'none'.

Default value is ‘none’, so URL tracking is not performed.

tracking_url_pattern

Regex pattern used to search for a URL in the logs of the external program.

If a log line matches the regex, the first capture group of the match is set as the tracking URL for the job in the web UI. Example: 'Job UI is here: (https?://.*)'.

Default value is None, so URL tracking is not performed.
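A standard-library sketch of the search these two parameters describe: scan the lines of the selected stream and take the first capture group of the first matching line. The function find_tracking_url is hypothetical and only approximates what the task does while the program runs. In particular, the use of re.search (matching anywhere in the line) is an assumption.

```python
import re


def find_tracking_url(lines, pattern, stream="stdout"):
    """Return the first capture group of the first line matching pattern.

    `lines` are assumed to come from the stream selected by
    stream_for_searching_tracking_url. A stream of 'none' or an empty
    pattern disables the search, as described above.
    """
    if stream == "none" or not pattern:
        return None
    regex = re.compile(pattern)
    for line in lines:
        match = regex.search(line)
        if match:
            return match.group(1)  # first capture group becomes the URL
    return None


log = [
    "Starting job...",
    "Job UI is here: http://example.com:8088/job/42",
]

if __name__ == "__main__":
    print(find_tracking_url(log, r"Job UI is here: (https?://.*)", stream="stderr"))
```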

program_args()

Override this method to map your task parameters to the program arguments

Returns:

list to pass as args to subprocess.Popen

program_environment()

Override this method to control environment variables for the program

Returns:

dict mapping environment variable names to values

property always_log_stderr

When True, stderr will be logged even if program execution succeeded

Override it to return False to log stderr only when program execution fails.

build_tracking_url(logs_output)

This method is intended for transforming a pattern match found in the logs into a URL.

Parameters:

logs_output: the match found for self.tracking_url_pattern

Returns:

a tracking URL for the task

run()

The task run method, to be overridden in a subclass.

See Task.run

class luigi.contrib.external_program.ExternalProgramRunContext(proc)

kill_job(captured_signal=None, stack_frame=None)

exception luigi.contrib.external_program.ExternalProgramRunError(message, args, env=None, stdout=None, stderr=None)

class luigi.contrib.external_program.ExternalPythonProgramTask(*args, **kwargs)

Template task for running an external Python program in a subprocess

A simple extension of ExternalProgramTask that adds two luigi.parameter.Parameters: one for setting a virtualenv and one for extending the PYTHONPATH.

virtualenv

An optional parameter (OptionalParameter) for setting a virtualenv.

extra_pythonpath

An optional parameter (OptionalParameter) for extending the PYTHONPATH.

program_environment()

Override this method to control environment variables for the program

Returns:

dict mapping environment variable names to values