luigi.contrib.presto module

class luigi.contrib.presto.presto(*args, **kwargs)[source]

Bases: Config

host = Parameter (defaults to localhost): Presto host
port = IntParameter (defaults to 8090): Presto port
user = Parameter (defaults to anonymous): Presto user
catalog = Parameter (defaults to hive): Default catalog
password = Parameter (defaults to None): User password
protocol = Parameter (defaults to https): Presto connection protocol
poll_interval = FloatParameter (defaults to 1.0): How often, in seconds, to poll the Presto REST interface for a progress update

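Luigi Config classes read their values from the configuration section named after the class, so the parameters above would normally be set in a [presto] section. The sketch below only illustrates the override-with-fallback behaviour using the standard library; the host name and values in CFG_TEXT are made up, and presto_setting is a hypothetical helper, not part of Luigi.

```python
import configparser

# Defaults as listed above for the `presto` config class (password omitted).
DEFAULTS = {
    "host": "localhost",
    "port": "8090",
    "user": "anonymous",
    "catalog": "hive",
    "protocol": "https",
    "poll_interval": "1.0",
}

# Hypothetical luigi.cfg content; every value here is illustrative.
CFG_TEXT = """
[presto]
host = presto.example.com
port = 8443
user = etl
poll_interval = 2.5
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)


def presto_setting(name):
    """Return the configured value, falling back to the documented default."""
    return parser.get("presto", name, fallback=DEFAULTS[name])


settings = {
    "host": presto_setting("host"),
    "port": int(presto_setting("port")),
    "user": presto_setting("user"),
    "catalog": presto_setting("catalog"),      # not set above, so the default applies
    "protocol": presto_setting("protocol"),    # not set above, so the default applies
    "poll_interval": float(presto_setting("poll_interval")),
}
print(settings)
```

Options that are absent from the section keep their defaults, so a configuration file only needs to list the values that differ.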
class luigi.contrib.presto.PrestoClient(connection, sleep_time=1)[source]

Bases: object

Helper class wrapping pyhive.presto.Connection for executing Presto queries and tracking their progress

property percentage_progress
Returns:

overall progress of the query, as a percentage

property info_uri
Returns:

query UI link

execute(query, parameters=None, mode=None)[source]
Parameters:
  • query – query to run

  • parameters – parameters to be injected into the query

  • mode – “fetch” yields rows; “watch” yields log entries

Returns:
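The two modes of execute can be consumed as in the sketch below. It relies only on the signature and properties documented above; StubClient is a hypothetical stand-in so the example runs without a Presto server, and fetch_rows and watch_progress are illustrative helper names, not part of Luigi.

```python
class StubClient:
    """Hypothetical stand-in mimicking the documented PrestoClient surface."""

    def __init__(self, rows, progress_steps):
        self._rows = rows
        self._progress_steps = progress_steps
        self.percentage_progress = 0.0

    def execute(self, query, parameters=None, mode=None):
        if mode == "watch":
            # Yield one log entry per simulated progress update.
            for pct in self._progress_steps:
                self.percentage_progress = pct
                yield {"progress": pct}
        elif mode == "fetch":
            # Yield the result rows one by one.
            yield from self._rows
        else:
            raise ValueError("this stub requires an explicit mode")


def fetch_rows(client, query, parameters=None):
    """Collect all rows produced by a query run in "fetch" mode."""
    return list(client.execute(query, parameters, mode="fetch"))


def watch_progress(client, query, parameters=None):
    """Record percentage_progress after each log entry in "watch" mode."""
    seen = []
    for _entry in client.execute(query, parameters, mode="watch"):
        seen.append(client.percentage_progress)
    return seen


client = StubClient(rows=[(1, "a"), (2, "b")], progress_steps=[25.0, 50.0, 100.0])
rows = fetch_rows(client, "SELECT id, name FROM t")
progress = watch_progress(client, "SELECT id, name FROM t")
print(rows, progress)
```

With a real PrestoClient, the same helpers would take a client built from a pyhive.presto.Connection in place of the stub.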

class luigi.contrib.presto.WithPrestoClient(name, bases, attrs)[source]

Bases: Register

A metaclass for injecting PrestoClient as a _client field into a new instance of class T. Presto connection options are taken from the fields of the T instance. Fields should have the same names as in pyhive.presto.Cursor.

Custom class creation for namespacing.

Also register all subclasses.

When the set or inherited namespace evaluates to None, set the task namespace to whatever the currently declared namespace is.
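The general technique of injecting a client through a metaclass can be sketched as follows. This is not Luigi's implementation: FakeClient, WithClient, Report, and the OPTION_NAMES tuple are all hypothetical names chosen for illustration, and the sketch ignores the task registration that Register performs.

```python
class FakeClient:
    """Hypothetical stand-in for PrestoClient; it only records its options."""

    def __init__(self, **options):
        self.options = options


class WithClient(type):
    """Metaclass that attaches a `_client` to every new instance."""

    # Assumed option names; the real metaclass takes them from pyhive.
    OPTION_NAMES = ("host", "port", "user", "catalog")

    def __call__(cls, *args, **kwargs):
        # Build the instance as usual, then read connection options from it.
        instance = super().__call__(*args, **kwargs)
        options = {
            name: getattr(instance, name)
            for name in WithClient.OPTION_NAMES
            if hasattr(instance, name)
        }
        instance._client = FakeClient(**options)
        return instance


class Report(metaclass=WithClient):
    host = "localhost"
    port = 8090
    user = "anonymous"


report = Report()
print(report._client.options)
```

Because the options are read from the finished instance, a class only needs to expose fields with the expected names; fields it does not define (catalog here) are simply left out.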

class luigi.contrib.presto.PrestoTarget(client, catalog, database, table, partition=None)[source]

Bases: Target

Target for Presto-accessible tables

count()[source]
exists()[source]
Returns:

True if the given table exists and the given partition contains at least one row; False if the partition has no rows or the table is absent
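The rule described above can be sketched with two hypothetical helpers. The sketch assumes that partition is a mapping of column names to values and that the driver accepts %s placeholders; build_count_query, target_exists, and TableMissing are illustrative names, and the query PrestoTarget actually issues may differ.

```python
class TableMissing(Exception):
    """Hypothetical error standing in for a 'table does not exist' failure."""


def build_count_query(catalog, database, table, partition=None):
    """Return (sql, params) counting rows, filtered by partition when given."""
    sql = "SELECT COUNT(*) AS cnt FROM {}.{}.{}".format(catalog, database, table)
    params = []
    if partition:
        clauses = []
        for column, value in partition.items():
            clauses.append("{} = %s".format(column))
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def target_exists(run_count):
    """True only if the count succeeds and is positive; False if the table is absent."""
    try:
        return run_count() > 0
    except TableMissing:
        return False


def raise_missing():
    # Simulates running the count against a table that is absent.
    raise TableMissing("table is absent")


sql, params = build_count_query("hive", "web", "events", {"ds": "2024-01-01"})
print(sql, params)
print(target_exists(lambda: 3), target_exists(lambda: 0), target_exists(raise_missing))
```

An empty partition and a missing table both count as "not there yet", which is what lets a scheduler treat the target as incomplete in either case.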

class luigi.contrib.presto.PrestoTask(*args, **kwargs)[source]

Bases: Query

Task for executing Presto queries. During execution, the tracking URL and percentage progress are set.

property host

Host of the RDBMS. Implementation should support hostname:port to encode port.

property port

Override to specify port separately from host.

property user
property username
property schema
property password
property catalog
property poll_interval
property source
property partition
property protocol
property session_props
property requests_session
property requests_kwargs
query = None
run()[source]

The task run method, to be overridden in a subclass.

See Task.run

output()[source]

Override with an RDBMS Target (e.g. PostgresTarget or RedshiftTarget) to record execution in a marker table