luigi.contrib.hadoop_jar module

Provides functionality to run a Hadoop job using a Jar.


Coerce input arguments to use temporary files when used for output.

Return a list of temporary file pairs (tmpfile, destination path) and a list of arguments.

Convert each HdfsTarget to the string form of its path.
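The coercion described above can be illustrated with a small standalone sketch. It does not use Luigi: `FakeHdfsTarget`, its `exists` flag, `coerce_output_args`, and the `-tmp-` suffix are hypothetical names chosen for illustration, and the rule "a target that does not exist yet is an output" is an assumption of this sketch rather than a statement about the library's exact logic.

```python
import uuid


class FakeHdfsTarget:
    """Hypothetical stand-in for an HDFS target: a path plus an existence flag."""

    def __init__(self, path, exists):
        self.path = path
        self.exists = exists


def coerce_output_args(args):
    """Return (tmp_pairs, new_args) for a list of job arguments.

    Targets that already exist are treated as inputs and replaced by their
    path. Targets that do not exist are treated as outputs and redirected to
    a temporary path; the (tmp_path, destination) pair is recorded so the
    caller can move the result into place once the job has finished.
    """
    tmp_pairs = []
    new_args = []
    for arg in args:
        if isinstance(arg, FakeHdfsTarget):
            if arg.exists:
                new_args.append(arg.path)
            else:
                destination = arg.path.rstrip("/")
                tmp_path = "{}-tmp-{}".format(destination, uuid.uuid4().hex)
                tmp_pairs.append((tmp_path, destination))
                new_args.append(tmp_path)
        else:
            # Anything that is not a target is passed through as a string.
            new_args.append(str(arg))
    return tmp_pairs, new_args
```

After the job succeeds, a caller would rename each temporary path to its destination, which is what makes the output appear atomically.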

exception luigi.contrib.hadoop_jar.HadoopJarJobError[source]

Bases: exceptions.Exception

class luigi.contrib.hadoop_jar.HadoopJarJobRunner[source]

Bases: luigi.contrib.hadoop.JobRunner

JobRunner for hadoop jar commands. Used to run a HadoopJarJobTask.

run_job(job, tracking_url_callback=None)[source]

class luigi.contrib.hadoop_jar.HadoopJarJobTask(*args, **kwargs)[source]

Bases: luigi.contrib.hadoop.BaseHadoopJobTask

A job task for hadoop jar commands that defines a jar and an optional main method.


Path to the jar for this Hadoop Job.


Optional main method for this Hadoop Job.


If True, then rewrite output arguments to be temp locations and atomically move them into place after the job finishes.


Set this to run the hadoop command remotely via ssh. It must be a dict of the form {"host": "myhost", "key_file": None, "username": None, ["no_host_key_check": False]}, where the bracketed entry is optional.
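As a rough illustration of how such a dict might be turned into an ssh command prefix, here is a standalone sketch. The helper name `build_ssh_prefix` is hypothetical, the choice of ssh options is an assumption, and the sketch assumes that "host", "key_file", and "username" must all be filled in before a command can be built; the library's own checks may differ.

```python
def build_ssh_prefix(ssh_config):
    """Build an ssh argument list from a config dict (illustrative only)."""
    host = ssh_config.get("host")
    key_file = ssh_config.get("key_file")
    username = ssh_config.get("username")
    # Assumption of this sketch: all three values are required.
    if not host or not key_file or not username:
        raise ValueError("ssh config needs host, key_file and username")

    prefix = ["ssh", "-i", key_file, "-o", "BatchMode=yes"]
    if ssh_config.get("no_host_key_check", False):
        # Standard OpenSSH options that skip host key verification.
        prefix += [
            "-o", "UserKnownHostsFile=/dev/null",
            "-o", "StrictHostKeyChecking=no",
        ]
    prefix.append("{}@{}".format(username, host))
    return prefix
```

The resulting list would be placed in front of the hadoop command so that the whole invocation runs on the remote host.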


Returns the list of arguments to pass to the job (placed after hadoop jar <jar> <main>).
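The ordering hadoop jar <jar> <main> <args> can be sketched as a plain function. `build_hadoop_jar_command` is a hypothetical helper written for this illustration, not part of the module, and the jar path and class name in the usage note are made-up values.

```python
def build_hadoop_jar_command(jar, main=None, args=()):
    """Assemble a 'hadoop jar' argument list (illustrative only).

    The jar path is required, the main class is optional, and the job
    arguments follow in the order given.
    """
    if not jar:
        raise ValueError("a jar path is required")
    command = ["hadoop", "jar", jar]
    if main:
        command.append(main)
    command.extend(str(arg) for arg in args)
    return command
```

For example, `build_hadoop_jar_command("wordcount.jar", "com.example.WordCount", ["/in", "/out"])` yields `["hadoop", "jar", "wordcount.jar", "com.example.WordCount", "/in", "/out"]`, while omitting the main class simply drops that element.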