luigi.contrib.rdbms module

A common module for Postgres-like databases, such as Postgres or Redshift.

class luigi.contrib.rdbms.CopyToTable(*args, **kwargs)[source]

Bases: luigi.task.MixinNaiveBulkComplete, luigi.contrib.rdbms._MetadataColumnsMixin, luigi.task.Task

An abstract task for inserting a data set into an RDBMS.

Usage:

Subclass and override the following attributes:

  • host
  • database
  • user
  • password
  • table
  • columns
  • port
host
database
user
password
table
port
columns = []
null_values = (None,)
column_separator = '\t'
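
As a rough illustration of how null_values and column_separator might be applied when rows are written out for a bulk copy, here is a minimal sketch. It is not the library's own code: format_row is a hypothetical helper, and the \N null marker is an assumption borrowed from the PostgreSQL COPY text format.

```python
NULL_VALUES = (None,)       # mirrors the null_values default above
COLUMN_SEPARATOR = "\t"     # mirrors the column_separator default above


def format_row(row, null_values=NULL_VALUES, sep=COLUMN_SEPARATOR):
    """Hypothetical helper: render one row as a single delimited line.

    This sketch does not escape separators that occur inside values.
    """
    cells = []
    for value in row:
        if value in null_values:
            cells.append(r"\N")  # assumed null marker (PostgreSQL COPY text format)
        else:
            cells.append(str(value))
    return sep.join(cells)


rows = [("2024-01-01", 3, None), ("2024-01-02", 5, "ok")]
lines = [format_row(r) for r in rows]
```

Adding more entries to null_values (for example an empty string) would make those values load as NULL as well.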
create_table(connection)[source]

Override to provide code for creating the target table.

By default it will be created using types (optionally) specified in columns.

If overridden, use the provided connection object to set up the table, so that the table is created and the data is inserted in the same transaction.
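
The idea of building the table from typed columns on the caller's connection can be sketched as follows. This is an illustration only, using an in-memory sqlite3 database as a stand-in for the real RDBMS; the table name, the column list, and the (name, type) pair layout are assumptions made for the example.

```python
import sqlite3

# Hypothetical declaration: each column is a (name, type) pair.
columns = [("day", "TEXT"), ("clicks", "INTEGER")]
table = "daily_clicks"  # hypothetical table name


def create_table(connection):
    """Create the table on the caller's connection, without committing."""
    coldefs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    connection.cursor().execute(f"CREATE TABLE {table} ({coldefs})")


connection = sqlite3.connect(":memory:")  # stand-in for the real database
create_table(connection)

# Inspect the result: the second field of each table_info row is the column name.
created = [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]
```

Leaving the commit to the caller is what keeps table creation and the subsequent insert in one transaction.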

update_id

This update id will be a unique identifier for this insert on this table.

output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note
If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

init_copy(connection)[source]

Override to perform custom queries.

Any code here will be performed in the same transaction as the main copy, just prior to copying data. Example use cases include truncating the table or removing all data older than X from the database to keep a rolling window of data available in the table.

post_copy(connection)[source]

Override to perform custom queries.

Any code here will be performed in the same transaction as the main copy, just after copying data. Example use cases include cleansing data in a temp table prior to insertion into the real table.
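
The ordering of the hooks around the main copy can be sketched as below. This is a simplified stand-in, not the library's run logic: it uses an in-memory sqlite3 database, a hypothetical events table, and plain functions in place of the task methods.

```python
import sqlite3


def init_copy(connection):
    # Runs just before the copy: drop rows that fell out of the rolling window.
    connection.execute("DELETE FROM events WHERE day < '2024-01-02'")


def copy(cursor, rows):
    # The main copy: insert the new rows.
    cursor.executemany("INSERT INTO events (day, clicks) VALUES (?, ?)", rows)


def post_copy(connection):
    # Runs just after the copy: cleanse the freshly loaded data.
    connection.execute("DELETE FROM events WHERE clicks < 0")


connection = sqlite3.connect(":memory:")  # stand-in for the real database
connection.execute("CREATE TABLE events (day TEXT, clicks INTEGER)")
connection.execute("INSERT INTO events VALUES ('2024-01-01', 7)")
connection.commit()

try:
    init_copy(connection)
    copy(connection.cursor(), [("2024-01-03", 4), ("2024-01-03", -1)])
    post_copy(connection)
    connection.commit()      # all three steps become visible together
except Exception:
    connection.rollback()    # any failure undoes all three steps
    raise

remaining = connection.execute(
    "SELECT day, clicks FROM events ORDER BY day"
).fetchall()
```

Because there is a single commit at the end, a failure in any hook leaves the table as it was before the task started.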

copy(cursor, file)[source]
class luigi.contrib.rdbms.Query(*args, **kwargs)[source]

Bases: luigi.task.MixinNaiveBulkComplete, luigi.task.Task

An abstract task for executing an RDBMS query.

Usage:

Subclass and override the following attributes:

  • host
  • database
  • user
  • password
  • table
  • query

Optionally override:

  • autocommit

Subclass and override the following methods:

  • output
host
database
user
password
table
query
autocommit
run()[source]

The task run method, to be overridden in a subclass.

See Task.run

output()[source]

Override with an RDBMS Target (e.g. PostgresTarget or RedshiftTarget) to record execution in a marker table.

update_id

Override to provide a custom ‘update_id’ signature in the marker table for task instances of Query subclasses.
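
To make the role of update_id in a marker table concrete, here is a minimal sketch of the general pattern: a row keyed by update_id is written once the work is done, and its presence is what marks the task as complete. It is an assumption-laden illustration, not the Target implementation: the table name, its columns, and the touch and exists helpers are hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3

MARKER_TABLE = "table_updates"  # assumed marker table name


def touch(connection, update_id, table):
    """Record that the work identified by update_id has been done."""
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {MARKER_TABLE} "
        "(update_id TEXT PRIMARY KEY, target_table TEXT)"
    )
    connection.execute(
        f"INSERT OR REPLACE INTO {MARKER_TABLE} VALUES (?, ?)",
        (update_id, table),
    )


def exists(connection, update_id):
    """Return True if a marker row for update_id is present."""
    try:
        row = connection.execute(
            f"SELECT 1 FROM {MARKER_TABLE} WHERE update_id = ?", (update_id,)
        ).fetchone()
    except sqlite3.OperationalError:  # marker table not created yet
        return False
    return row is not None


connection = sqlite3.connect(":memory:")  # stand-in for the real database
update_id = "MyQuery_2024_01_01"          # hypothetical signature

before = exists(connection, update_id)
touch(connection, update_id, "events")
connection.commit()
after = exists(connection, update_id)
```

Two task instances that produce the same update_id would be treated as the same unit of work under this pattern, which is why a custom signature needs to be unique per logical run.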