luigi.contrib.bigquery module¶
-
class
luigi.contrib.bigquery.
CreateDisposition
[source]¶ Bases:
object
-
CREATE_IF_NEEDED
= 'CREATE_IF_NEEDED'¶
-
CREATE_NEVER
= 'CREATE_NEVER'¶
-
-
class
luigi.contrib.bigquery.
WriteDisposition
[source]¶ Bases:
object
-
WRITE_TRUNCATE
= 'WRITE_TRUNCATE'¶
-
WRITE_APPEND
= 'WRITE_APPEND'¶
-
WRITE_EMPTY
= 'WRITE_EMPTY'¶
-
-
class
luigi.contrib.bigquery.
QueryMode
[source]¶ Bases:
object
-
INTERACTIVE
= 'INTERACTIVE'¶
-
BATCH
= 'BATCH'¶
-
-
class
luigi.contrib.bigquery.
SourceFormat
[source]¶ Bases:
object
-
AVRO
= 'AVRO'¶
-
CSV
= 'CSV'¶
-
DATASTORE_BACKUP
= 'DATASTORE_BACKUP'¶
-
NEWLINE_DELIMITED_JSON
= 'NEWLINE_DELIMITED_JSON'¶
-
-
class
luigi.contrib.bigquery.
FieldDelimiter
[source]¶ Bases:
object
The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF8. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence ” ” to specify a tab separator. The default value is a comma (‘,’).
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load
-
COMMA
= ','¶
-
TAB
= '\t'¶
-
PIPE
= '|'¶
-
-
class
luigi.contrib.bigquery.
DestinationFormat
[source]¶ Bases:
object
-
AVRO
= 'AVRO'¶
-
CSV
= 'CSV'¶
-
NEWLINE_DELIMITED_JSON
= 'NEWLINE_DELIMITED_JSON'¶
-
-
class
luigi.contrib.bigquery.
Encoding
[source]¶ Bases:
object
[Optional] The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.
BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.
-
UTF_8
= 'UTF-8'¶
-
ISO_8859_1
= 'ISO-8859-1'¶
-
-
class
luigi.contrib.bigquery.
BQDataset
(project_id, dataset_id, location)¶ Bases:
tuple
Create new instance of BQDataset(project_id, dataset_id, location)
-
dataset_id
¶ Alias for field number 1
-
location
¶ Alias for field number 2
-
project_id
¶ Alias for field number 0
-
-
class
luigi.contrib.bigquery.
BQTable
[source]¶ Bases:
luigi.contrib.bigquery.BQTable
Create new instance of BQTable(project_id, dataset_id, table_id, location)
-
dataset
¶
-
uri
¶
-
-
class
luigi.contrib.bigquery.
BigQueryClient
(oauth_credentials=None, descriptor='', http_=None)[source]¶ Bases:
object
A client for Google BigQuery.
For details of how authentication and the descriptor work, see the documentation for the GCS client. The descriptor URL for BigQuery is https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest
-
dataset_exists
(dataset)[source]¶ Returns whether the given dataset exists. If regional location is specified for the dataset, that is also checked to be compatible with the remote dataset, otherwise an exception is thrown.
param dataset: type dataset: BQDataset
-
make_dataset
(dataset, raise_if_exists=False, body=None)[source]¶ Creates a new dataset with the default permissions.
Parameters: - dataset (BQDataset) –
- raise_if_exists – whether to raise an exception if the dataset already exists.
Raises: luigi.target.FileAlreadyExists – if raise_if_exists=True and the dataset exists
-
delete_dataset
(dataset, delete_nonempty=True)[source]¶ Deletes a dataset (and optionally any tables in it), if it exists.
Parameters: - dataset (BQDataset) –
- delete_nonempty – if true, will delete any tables before deleting the dataset
-
list_datasets
(project_id)[source]¶ Returns the list of datasets in a given project.
Parameters: project_id (str) –
-
list_tables
(dataset)[source]¶ Returns the list of tables in a given dataset.
Parameters: dataset (BQDataset) –
-
get_view
(table)[source]¶ Returns the SQL query for a view, or None if it doesn’t exist or is not a view.
Parameters: table (BQTable) – The table containing the view.
-
update_view
(table, view)[source]¶ Updates the SQL query for a view.
If the output table exists, it is replaced with the supplied view query. Otherwise a new table is created with this view.
Parameters: - table (BQTable) – The table to contain the view.
- view (str) – The SQL query for the view.
-
run_job
(project_id, body, dataset=None)[source]¶ Runs a BigQuery “job”. See the documentation for the format of body.
Note
You probably don’t need to use this directly. Use the tasks defined below.
Parameters: dataset (BQDataset) –
-
copy
(source_table, dest_table, create_disposition='CREATE_IF_NEEDED', write_disposition='WRITE_TRUNCATE')[source]¶ Copies (or appends) a table to another table.
Parameters: - source_table (BQTable) –
- dest_table (BQTable) –
- create_disposition (CreateDisposition) – whether to create the table if needed
- write_disposition (WriteDisposition) – whether to append/truncate/fail if the table exists
-
-
class
luigi.contrib.bigquery.
BigQueryTarget
(project_id, dataset_id, table_id, client=None, location=None)[source]¶ Bases:
luigi.target.Target
-
class
luigi.contrib.bigquery.
MixinBigQueryBulkComplete
[source]¶ Bases:
object
Allows to efficiently check if a range of BigQueryTargets are complete. This enables scheduling tasks with luigi range tools.
If you implement a custom Luigi task with a BigQueryTarget output, make sure to also inherit from this mixin to enable range support.
-
class
luigi.contrib.bigquery.
BigQueryLoadTask
(*args, **kwargs)[source]¶ Bases:
luigi.contrib.bigquery.MixinBigQueryBulkComplete
,luigi.task.Task
Load data into BigQuery from GCS.
-
source_format
¶ The source format to use (see
SourceFormat
).
-
write_disposition
¶ What to do if the table already exists. By default this will fail the job.
See
WriteDisposition
-
schema
¶ Schema in the format defined at https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schema.
If the value is falsy, it is omitted and inferred by BigQuery.
-
max_bad_records
¶ The maximum number of bad records that BigQuery can ignore when reading data.
If the number of bad records exceeds this value, an invalid error is returned in the job result.
-
field_delimiter
¶ The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character.
-
source_uris
()[source]¶ The fully-qualified URIs that point to your data in Google Cloud Storage.
Each URI can contain one ‘*’ wildcard character and it must come after the ‘bucket’ name.
-
skip_leading_rows
¶ The number of rows at the top of a CSV file that BigQuery will skip when loading the data.
The default value is 0. This property is useful if you have header rows in the file that should be skipped.
-
allow_jagged_rows
¶ Accept rows that are missing trailing optional columns. The missing values are treated as nulls.
If false, records with missing trailing columns are treated as bad records, and if there are too many bad records,
an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.
-
ignore_unknown_values
¶ Indicates if BigQuery should allow extra values that are not represented in the table schema.
If true, the extra values are ignored. If false, records with extra columns are treated as bad records,
and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
The sourceFormat property determines what BigQuery treats as an extra value:
CSV: Trailing columns JSON: Named values that don’t match any column names
-
allow_quoted_new_lines
¶ Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.
-
-
class
luigi.contrib.bigquery.
BigQueryRunQueryTask
(*args, **kwargs)[source]¶ Bases:
luigi.contrib.bigquery.MixinBigQueryBulkComplete
,luigi.task.Task
-
write_disposition
¶ What to do if the table already exists. By default this will fail the job.
See
WriteDisposition
-
create_disposition
¶ Whether to create the table or not. See
CreateDisposition
-
flatten_results
¶ Flattens all nested and repeated fields in the query results. allowLargeResults must be true if this is set to False.
-
query
¶ The query, in text form.
-
udf_resource_uris
¶ Iterator of code resource to load from a Google Cloud Storage URI (gs://bucket/path).
-
use_legacy_sql
¶ Whether to use legacy SQL
-
-
class
luigi.contrib.bigquery.
BigQueryCreateViewTask
(*args, **kwargs)[source]¶ Bases:
luigi.task.Task
Creates (or updates) a view in BigQuery.
The output of this task needs to be a BigQueryTarget. Instances of this class should specify the view SQL in the view property.
If a view already exist in BigQuery at output(), it will be updated.
-
view
¶ The SQL query for the view, in text form.
-
-
class
luigi.contrib.bigquery.
ExternalBigQueryTask
(*args, **kwargs)[source]¶ Bases:
luigi.contrib.bigquery.MixinBigQueryBulkComplete
,luigi.task.ExternalTask
An external task for a BigQuery target.
-
class
luigi.contrib.bigquery.
BigQueryExtractTask
(*args, **kwargs)[source]¶ Bases:
luigi.task.Task
Extracts (unloads) a table from BigQuery to GCS.
This tasks requires the input to be exactly one BigQueryTarget while the output should be one or more GCSTargets from luigi.contrib.gcs depending on the use of destinationUris property.
-
destination_uris
¶ The fully-qualified URIs that point to your data in Google Cloud Storage. Each URI can contain one ‘*’ wildcard character and it must come after the ‘bucket’ name.
Wildcarded destinationUris in GCSQueryTarget might not be resolved correctly and result in incomplete data. If a GCSQueryTarget is used to pass wildcarded destinationUris be sure to overwrite this property to suppress the warning.
-
print_header
¶ Whether to print the header or not.
-
field_delimiter
¶ The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character.
-
destination_format
¶ The destination format to use (see
DestinationFormat
).
-
compression
¶ Whether to use compression.
-
-
luigi.contrib.bigquery.
BigqueryClient
¶
-
luigi.contrib.bigquery.
BigqueryTarget
¶
-
luigi.contrib.bigquery.
MixinBigqueryBulkComplete
¶
-
luigi.contrib.bigquery.
BigqueryLoadTask
¶
-
luigi.contrib.bigquery.
BigqueryRunQueryTask
¶
-
luigi.contrib.bigquery.
BigqueryCreateViewTask
¶
-
luigi.contrib.bigquery.
ExternalBigqueryTask
¶