luigi.contrib.hadoop_jar module

Provides functionality to run a Hadoop job using a jar.

luigi.contrib.hadoop_jar.fix_paths(job)[source]

Coerce the job's arguments to use temporary files when they are used for output.

Return a list of temporary file pairs (tmpfile, destination path) and a list of arguments.

Converts each HdfsTarget to a string for the path.
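
A rough, hypothetical sketch of the coercion described above (not luigi's actual implementation); it assumes each argument is either a target-like object with a path attribute and an exists() method, or a plain value:

    def coerce_output_args(args):
        # Hypothetical helper mirroring the fix_paths() description above.
        tmp_files = []    # pairs of (temporary path, final destination path)
        fixed_args = []
        for arg in args:
            if hasattr(arg, "path"):       # e.g. an HdfsTarget
                if not arg.exists():       # missing targets are treated as outputs
                    tmp_path = arg.path + "-temp"   # placeholder temp naming scheme
                    tmp_files.append((tmp_path, arg.path))
                    fixed_args.append(tmp_path)
                else:                      # existing targets (inputs) become plain paths
                    fixed_args.append(arg.path)
            else:
                fixed_args.append(str(arg))
        return tmp_files, fixed_args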

exception luigi.contrib.hadoop_jar.HadoopJarJobError[source]

Bases: Exception

class luigi.contrib.hadoop_jar.HadoopJarJobRunner[source]

Bases: JobRunner

JobRunner for hadoop jar commands. Used to run a HadoopJarJobTask.

run_job(job, tracking_url_callback=None)[source]
class luigi.contrib.hadoop_jar.HadoopJarJobTask(*args, **kwargs)[source]

Bases: BaseHadoopJobTask

A job task for hadoop jar commands that defines a jar and an (optional) main method.

jar()[source]

Path to the jar for this Hadoop Job.

main()[source]

Optional main method for this Hadoop Job.

job_runner()[source]
atomic_output()[source]

If True, then rewrite output arguments to be temp locations and atomically move them into place after the job finishes.
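
For example, a subclass could opt out of this behaviour when the jar manages its own output placement (the jar path below is made up):

    from luigi.contrib.hadoop_jar import HadoopJarJobTask


    class DirectOutputJarTask(HadoopJarJobTask):
        """Hypothetical task that writes straight to its final output paths."""

        def jar(self):
            return "/opt/jobs/my-job.jar"  # assumed location of the jar

        def atomic_output(self):
            return False  # disable the temp-location rewrite described above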

ssh()[source]

Set this to run the hadoop command remotely via ssh. It needs to be a dict that looks like {"host": "myhost", "key_file": None, "username": None, ["no_host_key_check": False]}
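
As an illustrative sketch (the host name and jar path are assumptions), a task can point ssh() at a remote gateway so the hadoop command runs there instead of locally:

    from luigi.contrib.hadoop_jar import HadoopJarJobTask


    class RemoteJarTask(HadoopJarJobTask):
        """Hypothetical task whose hadoop command is launched over ssh."""

        def jar(self):
            return "/opt/jobs/my-job.jar"  # assumed path on the remote host

        def ssh(self):
            # None for key_file and username falls back to the ssh client defaults
            return {"host": "gateway.example.com", "key_file": None, "username": None}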

args()[source]

Returns a list of args to pass to the job (after hadoop jar <jar> <main>).
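
Putting the pieces together, a minimal hypothetical subclass might look like the following; the jar path, main class, and HDFS paths are illustrative assumptions, not defaults of the module:

    import luigi
    from luigi.contrib.hadoop_jar import HadoopJarJobTask
    from luigi.contrib.hdfs import HdfsTarget


    class WordCount(HadoopJarJobTask):
        """Hypothetical task running the classic WordCount example jar."""

        def jar(self):
            # path to the jar on the machine that runs `hadoop jar` (assumed)
            return "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"

        def main(self):
            # optional main class inside the jar (assumed)
            return "org.apache.hadoop.examples.WordCount"

        def output(self):
            return HdfsTarget("/tmp/wordcount-output")

        def args(self):
            # passed after `hadoop jar <jar> <main>`; HdfsTargets are converted
            # to path strings (temporary ones when atomic_output() is True)
            return ["/tmp/wordcount-input", self.output()]


    if __name__ == "__main__":
        luigi.run(main_task_cls=WordCount)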