luigi.contrib.pig module

Apache Pig support. Example configuration section in luigi.cfg:

[pig]
# pig home directory
home: /usr/share/pig
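
As a rough illustration of how this section is read, the sketch below parses the same text with Python's standard `configparser` rather than Luigi's own configuration loader. The fallback value and the `bin/pig` suffix are assumptions made for the example; this page does not confirm either.

```python
import configparser
import os

# The same section as above, inlined so the example is self-contained.
CFG_TEXT = """
[pig]
# pig home directory
home: /usr/share/pig
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

# Assumed fallback, used only when the [pig] section has no "home" key.
pig_home = parser.get("pig", "home", fallback="/usr/share/pig")

# Assumption: the Pig launcher script lives at <home>/bin/pig.
pig_command = os.path.join(pig_home, "bin", "pig")
print(pig_command)
```
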
class luigi.contrib.pig.PigJobTask(*args, **kwargs)[source]

Bases: Task

pig_home()[source]
pig_command_path()[source]
pig_env_vars()[source]

Dictionary of environment variables that should be set when running Pig.

Example:

return { 'PIG_CLASSPATH': '/your/path' }

pig_properties()[source]

Dictionary of properties that should be set when running Pig.

Example:

return { 'pig.additional.jars': '/path/to/your/jar' }
pig_parameters()[source]

Dictionary of parameters that should be set for the Pig job.

Example:

return { 'YOUR_PARAM_NAME': 'Your param value' }
pig_options()[source]

List of options that will be appended to the Pig command.

Example:

return ['-x', 'local']
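
The four hooks above can be collected in one place. The sketch below uses a plain mixin class so that it runs without Luigi installed. `ExamplePigSettings` is a hypothetical name, and the values are the placeholder ones from the examples above.

```python
class ExamplePigSettings:
    """Hypothetical mixin that supplies the four configuration hooks.

    Meant to be combined with luigi.contrib.pig.PigJobTask, for example
    ``class MyJob(ExamplePigSettings, PigJobTask)``.
    """

    def pig_env_vars(self):
        # Environment variables to set when running Pig.
        return {'PIG_CLASSPATH': '/your/path'}

    def pig_properties(self):
        # Pig properties to set for the run.
        return {'pig.additional.jars': '/path/to/your/jar'}

    def pig_parameters(self):
        # Parameters to set for the Pig job.
        return {'YOUR_PARAM_NAME': 'Your param value'}

    def pig_options(self):
        # Extra options appended to the Pig command.
        return ['-x', 'local']


settings = ExamplePigSettings()
for hook in ('pig_env_vars', 'pig_properties', 'pig_parameters', 'pig_options'):
    print(hook, '->', getattr(settings, hook)())
```

When used in a real task, list the mixin before PigJobTask in the base classes so that its methods take precedence. The task still has to define output() and pig_script_path().
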
output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

pig_script_path()[source]

Return the path to the Pig script to be run.
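
To show how the hooks relate to each other, here is one plausible way to assemble a Pig command line from their return values. It is a standalone sketch, not Luigi's implementation: `build_pig_command`, `write_key_values`, and the `INPUT` parameter are hypothetical names. It assumes Pig's `-param_file`, `-propertyFile`, and `-f` command-line flags.

```python
import os
import tempfile


def write_key_values(mapping, path):
    """Write a dict to ``path`` as one ``key=value`` pair per line."""
    with open(path, 'w', encoding='utf-8') as handle:
        for key, value in mapping.items():
            handle.write('%s=%s\n' % (key, value))


def build_pig_command(command_path, script_path, options=(),
                      parameters=None, properties=None, work_dir=None):
    """Return the argument list for a Pig invocation (hypothetical helper).

    If ``work_dir`` is not given, a temporary directory is created and the
    caller is responsible for removing it afterwards.
    """
    cmd = [command_path] + list(options)
    work_dir = work_dir or tempfile.mkdtemp(prefix='pig_cmd_')

    if parameters:
        param_file = os.path.join(work_dir, 'job.params')
        write_key_values(parameters, param_file)
        cmd += ['-param_file', param_file]

    if properties:
        prop_file = os.path.join(work_dir, 'job.properties')
        write_key_values(properties, prop_file)
        cmd += ['-propertyFile', prop_file]

    # The script to run goes last.
    return cmd + ['-f', script_path]


cmd = build_pig_command(
    '/usr/share/pig/bin/pig',
    '/path/to/script.pig',
    options=['-x', 'local'],
    parameters={'INPUT': '/data/in'},
)
print(cmd)
```

The helper returns a list rather than a single string, so it can be passed to `subprocess` without shell quoting.
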

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

track_and_progress(cmd)[source]
class luigi.contrib.pig.PigRunContext[source]

Bases: object

kill_job(captured_signal=None, stack_frame=None)[source]
exception luigi.contrib.pig.PigJobError(message, out=None, err=None)[source]

Bases: RuntimeError