luigi.contrib.spark module¶
- class luigi.contrib.spark.SparkSubmitTask(*args, **kwargs)[source]¶
Bases: ExternalProgramTask
Template task for running a Spark job.
Supports running jobs on Spark local, standalone, Mesos or YARN.
See http://spark.apache.org/docs/latest/submitting-applications.html for more information.
- name = None¶
- entry_class = None¶
- app = None¶
- always_log_stderr = False¶
- stream_for_searching_tracking_url = 'stderr'¶
Defines which output stream is searched for the tracking URL; may be set to ‘stdout’, ‘stderr’ or ‘none’ (‘none’ disables URL tracking).
For Spark tasks the default is ‘stderr’, since spark-submit logs the tracking URL there.
- property tracking_url_pattern¶
Optional regex pattern searched for in the chosen stream; the first capture group of a matching log line is used as the task’s tracking URL.
- property pyspark_python¶
- property pyspark_driver_python¶
- property hadoop_user_name¶
- property spark_version¶
- property spark_submit¶
- property master¶
- property deploy_mode¶
- property jars¶
- property packages¶
- property py_files¶
- property files¶
- property conf¶
- property properties_file¶
- property driver_memory¶
- property driver_java_options¶
- property driver_library_path¶
- property driver_class_path¶
- property executor_memory¶
- property driver_cores¶
- property supervise¶
- property total_executor_cores¶
- property executor_cores¶
- property queue¶
- property num_executors¶
- property archives¶
- property hadoop_conf_dir¶
- program_environment()[source]¶
Override this method to control environment variables for the program.
- Returns:
dict mapping environment variable names to values
- class luigi.contrib.spark.PySparkTask(*args, **kwargs)[source]¶
Bases: SparkSubmitTask
Template task for running an inline PySpark job.
Simply implement the main method in your subclass.
You can optionally define package names to be distributed to the cluster with py_packages (uses luigi’s global py-packages configuration by default).
- app = '/home/docs/checkouts/readthedocs.org/user_builds/luigi/envs/stable/lib/python3.9/site-packages/luigi/contrib/pyspark_runner.py'¶
- property name¶
- property py_packages¶
- property files¶
- property pickle_protocol¶
- setup(conf)[source]¶
Called by the pyspark_runner with a SparkConf instance that will be used to instantiate the SparkContext.
- Parameters:
conf – SparkConf