luigi.contrib.batch module

AWS Batch wrapper for Luigi

From the AWS website:

AWS Batch enables you to run batch computing workloads on the AWS Cloud.

Batch computing is a common way for developers, scientists, and engineers to access large amounts of compute resources, and AWS Batch removes the undifferentiated heavy lifting of configuring and managing the required infrastructure. AWS Batch is similar to traditional batch computing software. This service can efficiently provision resources in response to jobs submitted in order to eliminate capacity constraints, reduce compute costs, and deliver results quickly.

See AWS Batch User Guide for more details.

To use AWS Batch, you create a jobDefinition JSON that defines a docker run command, and then submit this JSON to the API to queue up the task. Behind the scenes, AWS Batch auto-scales a fleet of EC2 Container Service instances, monitors the load on these instances, and schedules the jobs.
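As a rough illustration of the jobDefinition JSON described above, the sketch below builds one as a Python dict and serializes it. The field names follow the AWS Batch RegisterJobDefinition request shape. The definition name, image, command, and resource values are made-up placeholders, not anything this module prescribes.

```python
import json

# Hypothetical job definition; the name, image, and command are placeholders.
job_definition = {
    "jobDefinitionName": "example-job",
    "type": "container",
    "containerProperties": {
        "image": "centos",
        "vcpus": 1,
        "memory": 512,
        # "Ref::message" is a placeholder filled in from the job's parameters.
        "command": ["echo", "Ref::message"],
    },
}

# Serialize to the JSON text that would be registered with AWS Batch.
job_definition_json = json.dumps(job_definition, indent=2)
print(job_definition_json)
```

The container's command list plays the role of the arguments to docker run, and each Ref:: token is replaced at submission time by the matching entry in the job's parameters.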

This boto3-powered wrapper allows you to create Luigi Tasks to submit Batch jobDefinitions. You can either pass a dict (mapping directly to the jobDefinition JSON) or an Amazon Resource Name (ARN) for a previously registered jobDefinition.

Requires:

  • boto3 package

  • Amazon AWS credentials discoverable by boto3 (e.g., by using aws configure from awscli)

  • An enabled AWS Batch job queue configured to run on a compute environment.

Written and maintained by Jake Feala (@jfeala) for Outlier Bio (@outlierbio)

exception luigi.contrib.batch.BatchJobException[source]

Bases: Exception

class luigi.contrib.batch.BatchClient(poll_time=10)[source]

Bases: object

get_active_queue()[source]

Get name of first active job queue
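One plausible reading of "active" is a queue whose state is ENABLED and whose status is VALID. The sketch below applies that filter to a hand-written dict shaped like a DescribeJobQueues response. The helper name first_active_queue and the queue names are hypothetical, and this is not necessarily how the method is implemented.

```python
def first_active_queue(response):
    """Return the name of the first queue that is ENABLED and VALID."""
    for queue in response["jobQueues"]:
        if queue["state"] == "ENABLED" and queue["status"] == "VALID":
            return queue["jobQueueName"]
    raise LookupError("No job queue with state=ENABLED and status=VALID")


# Hand-written sample shaped like a DescribeJobQueues response.
sample_response = {
    "jobQueues": [
        {"jobQueueName": "paused-queue", "state": "DISABLED", "status": "VALID"},
        {"jobQueueName": "main-queue", "state": "ENABLED", "status": "VALID"},
    ]
}

print(first_active_queue(sample_response))  # prints "main-queue"
```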

get_job_id_from_name(job_name)[source]

Retrieve the first job ID matching the given name

get_job_status(job_id)[source]

Retrieve the job status from the AWS Batch API

Parameters:

job_id (str) – AWS Batch job UUID

Returns one of {SUBMITTED|PENDING|RUNNABLE|STARTING|RUNNING|SUCCEEDED|FAILED}

get_logs(log_stream_name, get_last=50)[source]

Retrieve log stream from CloudWatch

submit_job(job_definition, parameters, job_name=None, queue=None)[source]

Wrap submit_job with useful defaults

wait_on_job(job_id)[source]

Poll job status until the job reaches a terminal state (SUCCEEDED or FAILED)
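A minimal sketch of this kind of polling loop, assuming SUCCEEDED and FAILED are the terminal states and that a failure should raise. The names wait_for_job and JobFailed are hypothetical stand-ins, and the status source is a fake iterator rather than a real AWS call.

```python
import time


class JobFailed(Exception):
    """Stand-in for BatchJobException in this sketch."""


def wait_for_job(get_status, job_id, poll_time=10, sleep=time.sleep):
    """Poll get_status(job_id) until the job succeeds or fails."""
    while True:
        status = get_status(job_id)
        if status == "SUCCEEDED":
            return True
        if status == "FAILED":
            raise JobFailed("Job {} failed".format(job_id))
        sleep(poll_time)  # still in progress; wait before polling again


# Fake status source that walks through a typical job lifecycle.
statuses = iter(["SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED"])
result = wait_for_job(
    lambda job_id: next(statuses),
    "example-job-id",
    poll_time=0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(result)
```

Passing the sleep function as an argument keeps the loop testable without waiting. In real use the default time.sleep and a poll_time of several seconds would avoid hammering the API.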

register_job_definition(json_fpath)[source]

Register a job definition with AWS Batch, using a JSON file

class luigi.contrib.batch.BatchTask(*args, **kwargs)[source]

Bases: Task

Base class for an AWS Batch job

AWS Batch requires you to register “job definitions”, which are JSON descriptions of how to issue the docker run command. This Luigi Task requires the name of a pre-registered Batch jobDefinition, passed as a Parameter.

Parameters:
  • job_definition (str) – name of pre-registered jobDefinition

  • job_name – name of specific job, for tracking in the queue and logs.

  • job_queue – name of job queue where job is going to be submitted.

job_definition = Parameter
job_name = OptionalParameter (defaults to None)
job_queue = OptionalParameter (defaults to None)
poll_time = IntParameter (defaults to 10)

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

property parameters

Override to return a dict of parameters for the Batch Task
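To show what such a dict is for, the sketch below mimics locally how AWS Batch substitutes job parameters into Ref:: placeholders in a job definition's command. It assumes the dict returned by parameters is forwarded as the job's parameters at submission. The helper fill_placeholders and the message key are hypothetical, and the real substitution is performed by the AWS Batch service, not by this module.

```python
def fill_placeholders(command, parameters):
    """Replace each "Ref::<name>" token with parameters[<name>]."""
    prefix = "Ref::"
    filled = []
    for token in command:
        name = token[len(prefix):] if token.startswith(prefix) else None
        if name is not None and name in parameters:
            filled.append(parameters[name])
        else:
            filled.append(token)  # leave ordinary arguments untouched
    return filled


# What a subclass's `parameters` property might return (hypothetical key).
parameters = {"message": "hello"}
# Command as it might appear in the registered job definition.
command = ["echo", "Ref::message"]

print(fill_placeholders(command, parameters))  # prints ['echo', 'hello']
```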