luigi.contrib.rdbms module

A common module for Postgres-like databases, such as PostgreSQL or Redshift.

class luigi.contrib.rdbms.CopyToTable(*args, **kwargs)

Bases: MixinNaiveBulkComplete, _MetadataColumnsMixin, Task

An abstract task for inserting a data set into an RDBMS.

Usage:

Subclass and override the following attributes:

  • host

  • database

  • user

  • password

  • table

  • columns

  • port

abstract property host
abstract property database
abstract property user
abstract property password
abstract property table
property port
columns = []
null_values = (None,)
column_separator = '\t'
create_table(connection)

Override to provide code for creating the target table.

By default, the table is created using the types (optionally) specified in columns.

If overridden, use the provided connection object to set up the table, so that the table is created and the data is inserted in the same transaction.
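
As a rough illustration of how typed columns could translate into a table definition, the sketch below builds a CREATE TABLE statement from (name, type) pairs. The helper name build_create_table is hypothetical and is not part of Luigi. The sketch assumes that columns is given either as plain names or as (name, type) pairs, and that plain names carry too little information to derive a table.

```python
def build_create_table(table, columns):
    """Return a CREATE TABLE statement for (name, type) column pairs.

    Hypothetical helper for illustration; not part of Luigi.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    # Plain column names carry no type information, so no DDL can be derived.
    if any(isinstance(col, str) or len(col) != 2 for col in columns):
        raise NotImplementedError(
            "column types not specified for %r; override create_table()" % table
        )
    coldefs = ", ".join("{} {}".format(name, sqltype) for name, sqltype in columns)
    return "CREATE TABLE {} ({})".format(table, coldefs)


ddl = build_create_table("events", [("id", "INT"), ("name", "TEXT")])
```

An overriding create_table(connection) would typically run such a statement through connection.cursor().execute(...) and leave the commit to the task, so that table creation and the data load share one transaction.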

property update_id

This update id will be a unique identifier for this insert on this table.

abstract output()

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note

If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

init_copy(connection)

Override to perform custom queries.

Any code here will be performed in the same transaction as the main copy, just prior to copying data. Example use cases include truncating the table or removing all data older than X in the database to keep a rolling window of data available in the table.

post_copy(connection)

Override to perform custom queries.

Any code here will be performed in the same transaction as the main copy, just after copying data. Example use cases include cleansing data in a temp table prior to insertion into the real table.
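
The ordering of init_copy, copy and post_copy within a single transaction can be sketched with the standard-library sqlite3 module. RollingCopy below is a hypothetical stand-in that only mimics the hook order described here. It is not a Luigi class, and its run method and the rows argument to copy are assumptions made for the illustration.

```python
import sqlite3


class RollingCopy:
    """Hypothetical stand-in that mimics the hook order; not a Luigi class."""

    table = "events"

    def init_copy(self, connection):
        # Runs just before the copy: clear out rows from the previous load.
        connection.execute("DELETE FROM {}".format(self.table))

    def copy(self, cursor, rows):
        # The main copy: insert the new rows.
        cursor.executemany(
            "INSERT INTO {} (id, name) VALUES (?, ?)".format(self.table), rows
        )

    def post_copy(self, connection):
        # Runs just after the copy: drop rows that fail a sanity check.
        connection.execute("DELETE FROM {} WHERE name IS NULL".format(self.table))

    def run(self, connection, rows):
        try:
            self.init_copy(connection)
            self.copy(connection.cursor(), rows)
            self.post_copy(connection)
            connection.commit()  # all three steps become visible together
        except Exception:
            connection.rollback()  # a failure in any step undoes all of them
            raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.execute("INSERT INTO events VALUES (0, 'stale')")
conn.commit()

RollingCopy().run(conn, [(1, "a"), (2, None), (3, "c")])
loaded = conn.execute("SELECT id, name FROM events ORDER BY id").fetchall()
```

Because the commit happens only after post_copy, a reader of the table never sees the intermediate state in which the old rows are gone and the new ones have not yet been cleaned.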

abstract copy(cursor, file)

class luigi.contrib.rdbms.Query(*args, **kwargs)

Bases: MixinNaiveBulkComplete, Task

An abstract task for executing an RDBMS query.

Usage:

Subclass and override the following attributes:

  • host

  • database

  • user

  • password

  • table

  • query

Optionally override:

  • port

  • autocommit

  • update_id

Subclass and override the following methods:

  • run

  • output

abstract property host

Host of the RDBMS. Implementations should support the hostname:port form to encode the port.

property port

Override to specify port separately from host.
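
One way an implementation might decode a host given in hostname:port form is sketched below. The function name split_host_port and the default port of 5432 (the usual PostgreSQL port) are assumptions for this example and are not taken from Luigi.

```python
def split_host_port(host, default_port=5432):
    """Split 'hostname:port' into (hostname, port).

    Hypothetical helper; falls back to default_port when no port is encoded.
    """
    name, sep, port = host.rpartition(":")
    if not sep:
        # No colon present, so the whole string is the hostname.
        return host, default_port
    return name, int(port)


with_port = split_host_port("db.example.com:5439")
without_port = split_host_port("localhost")
```

This simple split does not handle bracketed IPv6 literals. Overriding port separately avoids the parsing question altogether.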

abstract property database
abstract property user
abstract property password
abstract property table
abstract property query
property autocommit
property update_id

Override to create a custom marker table ‘update_id’ signature for Query subclass task instances.

abstract run()

The task run method, to be overridden in a subclass.

See Task.run

abstract output()

Override with an RDBMS Target (e.g. PostgresTarget or RedshiftTarget) to record execution in a marker table.
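
The marker-table idea can be sketched with sqlite3: a target counts as existing once a row for its update_id has been recorded. MarkerTarget, the table name table_updates and its two-column layout are illustrative assumptions. This is not the implementation of PostgresTarget or RedshiftTarget.

```python
import sqlite3


class MarkerTarget:
    """Hypothetical marker-table target; not Luigi's PostgresTarget."""

    marker_table = "table_updates"  # illustrative name

    def __init__(self, connection, table, update_id):
        self.connection = connection
        self.table = table
        self.update_id = update_id
        connection.execute(
            "CREATE TABLE IF NOT EXISTS {} ("
            "update_id TEXT PRIMARY KEY, target_table TEXT)".format(self.marker_table)
        )

    def exists(self):
        # The task is considered complete once its update_id has been recorded.
        row = self.connection.execute(
            "SELECT 1 FROM {} WHERE update_id = ?".format(self.marker_table),
            (self.update_id,),
        ).fetchone()
        return row is not None

    def touch(self):
        # Record that the query for this update_id has been executed.
        self.connection.execute(
            "INSERT OR REPLACE INTO {} VALUES (?, ?)".format(self.marker_table),
            (self.update_id, self.table),
        )
        self.connection.commit()


conn = sqlite3.connect(":memory:")
target = MarkerTarget(conn, "events", "LoadEvents_2024_01_01")
before = target.exists()
target.touch()
after = target.exists()
```

In this sketch a run method would execute its query and then call touch(), so that a later scheduling pass finds the marker row and skips the task.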