luigi.contrib.scalding module¶
-
luigi.contrib.scalding.
logger
= <logging.Logger object>¶ Scalding support for Luigi.
Example configuration section in luigi.cfg:
[scalding] # scala home directory, which should include a lib subdir with scala jars. scala-home: /usr/share/scala # scalding home directory, which should include a lib subdir with # scalding-*-assembly-* jars as built from the official Twitter build script. scalding-home: /usr/share/scalding # provided dependencies, e.g. jars required for compiling but not executing # scalding jobs. Currently required jars: # org.apache.hadoop/hadoop-core/0.20.2 # org.slf4j/slf4j-log4j12/1.6.6 # log4j/log4j/1.2.15 # commons-httpclient/commons-httpclient/3.1 # commons-cli/commons-cli/1.2 # org.apache.zookeeper/zookeeper/3.3.4 scalding-provided: /usr/share/scalding/provided # additional jars required. scalding-libjars: /usr/share/scalding/libjars
-
class
luigi.contrib.scalding.
ScaldingJobRunner
[source]¶ Bases:
luigi.contrib.hadoop.JobRunner
JobRunner for pyscald commands. Used to run a ScaldingJobTask.
-
class
luigi.contrib.scalding.
ScaldingJobTask
(*args, **kwargs)[source]¶ Bases:
luigi.contrib.hadoop.BaseHadoopJobTask
A job task for Scalding that define a scala source and (optional) main method.
requires() should return a dictionary where the keys are Scalding argument names and values are sub tasks or lists of subtasks.
For example:
{'input1': A, 'input2': C} => --input1 <Aoutput> --input2 <Coutput> {'input1': [A, B], 'input2': [C]} => --input1 <Aoutput> <Boutput> --input2 <Coutput>
-
source
()[source]¶ Path to the scala source for this Scalding Job
Either one of source() or jar() must be specified.
-
jar
()[source]¶ Path to the jar file for this Scalding Job
Either one of source() or jar() must be specified.
-
atomic_output
()[source]¶ If True, then rewrite output arguments to be temp locations and atomically move them into place after the job finishes.
-
requires
()[source]¶ The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
-