luigi.contrib.hadoop_jar module

Provides functionality to run a Hadoop job using a JAR file.

luigi.contrib.hadoop_jar.fix_paths(job)

Coerce the job's arguments so that any argument used for output points to a temporary file.

Returns a list of temporary file pairs (tmpfile, destination path) and the list of arguments.

Each HdfsTarget is converted to the string for its path.
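The behaviour described above can be sketched without Luigi or a Hadoop cluster. This is a simplified illustration, not Luigi's actual implementation: FakeHdfsTarget and fix_paths_sketch are hypothetical names, the temporary-path suffix is made up, and the sketch assumes that a target which does not yet exist is the one being used for output.

```python
import random


class FakeHdfsTarget:
    """Hypothetical stand-in for an HDFS target; not part of Luigi."""

    def __init__(self, path, exists=False):
        self.path = path
        self._exists = exists

    def exists(self):
        return self._exists


def fix_paths_sketch(args, atomic_output=True):
    """Return (tmp_files, new_args) for a list of job arguments."""
    tmp_files = []  # (temporary path, destination path) pairs
    new_args = []
    for arg in args:
        if isinstance(arg, FakeHdfsTarget):
            if arg.exists() or not atomic_output:
                # Assumption: an existing target is an input, so pass its path through.
                new_args.append(arg.path)
            else:
                # Assumption: a missing target is an output, so write to a temp path.
                dest = arg.path.rstrip("/")
                tmp = "%s-tmp-%09d" % (dest, random.randrange(10**9))
                tmp_files.append((tmp, dest))
                new_args.append(tmp)
        else:
            # Anything that is not a target is passed along as a string.
            new_args.append(str(arg))
    return tmp_files, new_args


tmp_files, new_args = fix_paths_sketch(
    [FakeHdfsTarget("/data/in", exists=True), FakeHdfsTarget("/data/out/"), "--verbose"]
)
```

After the job finishes, a caller would move each temporary path in tmp_files to its destination path, which is what makes the output appear atomically.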

exception luigi.contrib.hadoop_jar.HadoopJarJobError

Bases: Exception

class luigi.contrib.hadoop_jar.HadoopJarJobRunner

JobRunner for hadoop jar commands. Used to run a HadoopJarJobTask.

class luigi.contrib.hadoop_jar.HadoopJarJobTask(*args, **kwargs)

A job task for hadoop jar commands that defines a jar and an (optional) main method.

atomic_output(): If this returns True, output arguments are rewritten to temporary locations and atomically moved into place after the job finishes.

ssh(): Set this to run the hadoop command remotely via ssh. It needs to be a dict that looks like {"host": "myhost", "key_file": None, "username": None, ["no_host_key_check": False]}, where the bracketed key is optional.
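As an illustration of the dict shape described above, the following sketch builds such a dict from keyword arguments. The helper name make_ssh_config is hypothetical and not part of Luigi, and the host and username values are placeholders. Which keys the job runner actually requires is not stated here, so the helper performs no validation.

```python
def make_ssh_config(host, key_file=None, username=None, no_host_key_check=False):
    """Build a dict in the documented shape (hypothetical helper, not Luigi API)."""
    return {
        "host": host,
        "key_file": key_file,
        "username": username,
        # Documented as optional; included here with its documented default.
        "no_host_key_check": no_host_key_check,
    }


ssh_config = make_ssh_config("myhost", username="hadoop")
```

A task subclass could return a dict like this from its ssh() method.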

args(): Returns a list of arguments to pass to the job (after hadoop jar <jar> <main>).
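To show where these arguments end up, here is a minimal sketch of assembling the command line in the order given above. The function build_hadoop_jar_command is hypothetical, the jar name, class name, and paths are placeholders, and the sketch assumes the executable is simply called hadoop; the real job runner may resolve the command differently.

```python
def build_hadoop_jar_command(jar, main=None, args=()):
    """Assemble an argv list: hadoop jar <jar> [<main>] <args...>."""
    command = ["hadoop", "jar", jar]
    if main:
        # The main entry is optional, so it is only added when given.
        command.append(main)
    command.extend(str(a) for a in args)
    return command


command = build_hadoop_jar_command(
    "wordcount.jar", "org.example.WordCount", ["/data/in", "/data/out"]
)
print(" ".join(command))
# prints "hadoop jar wordcount.jar org.example.WordCount /data/in /data/out"
```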
