luigi.contrib.spark module

class luigi.contrib.spark.SparkSubmitTask(*args, **kwargs)[source]
    Bases: luigi.contrib.external_program.ExternalProgramTask

    Template task for running a Spark job.

    Supports running jobs on Spark local, standalone, Mesos, or YARN.

    See http://spark.apache.org/docs/latest/submitting-applications.html for more information.

    name = None
    entry_class = None
    app = None
    always_log_stderr = False
    stream_for_searching_tracking_url = 'stderr'
    tracking_url_pattern
    pyspark_python
    pyspark_driver_python
    hadoop_user_name
    spark_version
    spark_submit
    master
    deploy_mode
    jars
    packages
    py_files
    files
    conf
    properties_file
    driver_memory
    driver_java_options
    driver_library_path
    driver_class_path
    executor_memory
    driver_cores
    supervise
    total_executor_cores
    executor_cores
    queue
    num_executors
    archives
    hadoop_conf_dir
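Most of the properties above mirror spark-submit options; unless overridden on the task, they are typically read from luigi's [spark] configuration section. A minimal sketch of such a section, with hypothetical values (key names generally match the property names above, with dashes in place of underscores):

    [spark]
    spark-submit: /usr/local/spark/bin/spark-submit
    master: spark://spark.example.com:7077
    deploy-mode: client
    executor-memory: 2g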
    program_environment()[source]
        Override this method to control environment variables for the program.

        Returns: dict mapping environment variable names to values
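For orientation, a minimal sketch of a SparkSubmitTask subclass; the task name, jar path, entry class, and environment value below are illustrative, not part of luigi:

    import luigi
    from luigi.contrib.spark import SparkSubmitTask

    class CleanLogs(SparkSubmitTask):
        """Hypothetical task submitting a pre-built Spark application jar."""

        # Values for the properties documented above
        app = 'build/libs/clean-logs-assembly.jar'  # illustrative jar path
        entry_class = 'com.example.CleanLogs'       # illustrative main class

        def program_environment(self):
            # Extend the inherited environment rather than replacing it;
            # the HADOOP_USER_NAME override is purely illustrative.
            env = super(CleanLogs, self).program_environment()
            env['HADOOP_USER_NAME'] = 'etl'
            return env

        def output(self):
            return luigi.LocalTarget('/tmp/clean-logs')  # illustrative target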
class luigi.contrib.spark.PySparkTask(*args, **kwargs)[source]
    Bases: luigi.contrib.spark.SparkSubmitTask

    Template task for running an inline PySpark job.

    Simply implement the main method in your subclass. You can optionally define package names to be distributed to the cluster with py_packages (uses luigi's global py-packages configuration by default).

    app = '/home/docs/checkouts/readthedocs.org/user_builds/luigi/envs/latest/lib/python2.7/site-packages/luigi-2.8.13-py2.7.egg/luigi/contrib/pyspark_runner.py'
    name
    py_packages
    files

    setup(conf)[source]
        Called by the pyspark_runner with a SparkConf instance that will be used to instantiate the SparkContext.

        Parameters: conf – SparkConf
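A sketch of an inline job, assuming the conventional main(self, sc, *args) signature (sc being the SparkContext created by pyspark_runner); the class name, paths, and SparkConf setting are illustrative:

    from luigi.contrib.spark import PySparkTask

    class InlineWordCount(PySparkTask):
        """Hypothetical inline PySpark word count."""

        # Illustrative paths; a real task would usually derive these
        # from parameters or from its input()/output() targets.
        src = 'hdfs:///data/text/input'
        dst = 'hdfs:///data/text/counts'

        def setup(self, conf):
            # Tune the SparkConf before the SparkContext is instantiated
            conf.set('spark.ui.showConsoleProgress', 'false')

        def main(self, sc, *args):
            (sc.textFile(self.src)
               .flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b)
               .saveAsTextFile(self.dst))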