luigi.contrib.batch module
AWS Batch wrapper for Luigi
From the AWS website:
AWS Batch enables you to run batch computing workloads on the AWS Cloud.
Batch computing is a common way for developers, scientists, and engineers to access large amounts of compute resources, and AWS Batch removes the undifferentiated heavy lifting of configuring and managing the required infrastructure. AWS Batch is similar to traditional batch computing software. This service can efficiently provision resources in response to jobs submitted in order to eliminate capacity constraints, reduce compute costs, and deliver results quickly.
See AWS Batch User Guide for more details.
To use AWS Batch, you create a jobDefinition JSON that defines a docker run command, and then submit this JSON to the API to queue up the task. Behind the scenes, AWS Batch auto-scales a fleet of EC2 Container Service instances, monitors the load on these instances, and schedules the jobs.
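As a rough illustration of what such a jobDefinition might contain, the sketch below builds one as a Python dict. The field names follow the AWS Batch RegisterJobDefinition request shape as best understood here; the definition name, image, command, and resource values are hypothetical placeholders, not values from this module.

```python
import json

# Hypothetical job definition: the name, image, and command are placeholders.
job_definition = {
    "jobDefinitionName": "example-transcode",
    "type": "container",
    "containerProperties": {
        "image": "example/transcoder:latest",
        "vcpus": 1,
        "memory": 512,  # MiB
        # "Ref::<name>" placeholders are filled in from the job's parameters.
        "command": ["transcode", "--codec", "Ref::codec", "Ref::input_key"],
    },
    # Default values for the placeholders used in the command above.
    "parameters": {"codec": "h264"},
}

# The dict serializes directly to the JSON document described above.
print(json.dumps(job_definition, indent=2))
```

The command list plays the role of the docker run arguments, and the parameters mapping supplies defaults that a submitted job can override.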
This boto3-powered wrapper allows you to create Luigi Tasks to submit Batch jobDefinitions. You can either pass a dict (mapping directly to the jobDefinition JSON) or an Amazon Resource Name (ARN) for a previously registered jobDefinition.
Requires:
- boto3 package
- Amazon AWS credentials discoverable by boto3 (e.g., by using aws configure from awscli)
- An enabled AWS Batch job queue configured to run on a compute environment.
Written and maintained by Jake Feala (@jfeala) for Outlier Bio (@outlierbio)
class luigi.contrib.batch.BatchClient(poll_time=10)
Bases: object
get_job_status(job_id)
Retrieve the status of a job from the AWS Batch API.
Parameters: job_id (str) – AWS Batch job UUID
Returns: one of {SUBMITTED|PENDING|RUNNABLE|STARTING|RUNNING|SUCCEEDED|FAILED}
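The status values listed for get_job_status lend themselves to a simple polling loop. The following is a minimal sketch, not the module's own implementation: wait_for_terminal_status is a hypothetical helper, and it takes the status lookup as a callable so that it runs without AWS access. In real use the callable would presumably be a BatchClient instance's get_job_status.

```python
import time

# SUCCEEDED and FAILED are the two states after which a job no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_terminal_status(get_status, job_id, poll_time=10, sleep=time.sleep):
    """Poll get_status(job_id) until the job succeeds or fails.

    Returns the final status string. `sleep` is injectable for testing.
    """
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(poll_time)


# Stand-in for a real status lookup: yields a fixed sequence of states.
fake_statuses = iter(["SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED"])
final = wait_for_terminal_status(
    lambda _job_id: next(fake_statuses), "example-job-id", poll_time=0
)
print(final)
```

Passing the lookup in as an argument keeps the loop independent of boto3, which makes it easy to exercise with canned status sequences.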
class luigi.contrib.batch.BatchTask(*args, **kwargs)
Bases: luigi.task.Task
Base class for an Amazon Batch job
Amazon Batch requires you to register “job definitions”, which are JSON descriptions for how to issue the docker run command. This Luigi Task requires a pre-registered Batch jobDefinition name passed as a Parameter.
Parameters:
- job_definition (str) – name of pre-registered jobDefinition
- job_name – name of specific job, for tracking in the queue and logs.
- job_queue – name of job queue where job is going to be submitted.
job_definition = Parameter
job_name = OptionalParameter (defaults to None)
job_queue = OptionalParameter (defaults to None)
poll_time = IntParameter (defaults to 10)
parameters
Override to return a dict of parameters for the Batch Task