luigi.contrib.hdfs.config module
You can choose which HDFS client to use by setting the “client” option under the “hdfs” section of the configuration, or by passing the --hdfs-client command line option.
“hadoopcli” is the slowest, but should work out of the box. “snakebite” is the fastest, but requires Snakebite to be installed.
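As a minimal sketch, the snippet below builds the configuration fragment described above with Python's standard configparser module. The section name ("hdfs") and option name ("client") come from the text. Picking "snakebite" as the value and rendering to an in-memory buffer rather than a real configuration file are assumptions made purely for illustration.

```python
import configparser
import io

# Build the [hdfs] section with the "client" option described above.
config = configparser.ConfigParser()
config["hdfs"] = {"client": "snakebite"}  # "hadoopcli" is the documented default

# Render the fragment as it would appear in a configuration file.
buffer = io.StringIO()
config.write(buffer)
rendered = buffer.getvalue()
print(rendered)
```

The printed fragment can be pasted into the configuration file that Luigi reads. The same choice can instead be made per run with the --hdfs-client option.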
class luigi.contrib.hdfs.config.hdfs(*args, **kwargs)
    Bases: luigi.task.Config

    client_version = IntParameter (defaults to None)

    effective_user = OptionalParameter (defaults to None): Optionally specifies the effective user for snakebite. If not set, the environment variable HADOOP_USER_NAME is used, else USER

    snakebite_autoconfig = BoolParameter (defaults to False)

    namenode_host = OptionalParameter (defaults to None)

    namenode_port = IntParameter (defaults to None)

    client = Parameter (defaults to hadoopcli)

    tmp_dir = OptionalParameter (defaults to None)
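The fallback order documented for effective_user can be sketched as follows. This is an illustration of the documented behaviour, not Luigi's own code: the helper name resolve_effective_user and its parameters are hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
import os


def resolve_effective_user(configured=None, environ=None):
    """Hypothetical helper mirroring the documented fallback order."""
    # Allow a custom mapping to be passed in so the logic is easy to test.
    environ = os.environ if environ is None else environ
    # 1. An explicitly configured effective_user wins.
    if configured:
        return configured
    # 2. Otherwise use HADOOP_USER_NAME, and 3. fall back to USER.
    return environ.get("HADOOP_USER_NAME") or environ.get("USER")


print(resolve_effective_user(None, {"HADOOP_USER_NAME": "hdfs", "USER": "bob"}))
```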
class luigi.contrib.hdfs.config.hadoopcli(*args, **kwargs)
    Bases: luigi.task.Config

    command = Parameter (defaults to hadoop): The hadoop command. split() is run on it, so you can pass something like "hadoop --param"

    version = Parameter (defaults to cdh4): Can also be cdh3 or apache1
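To illustrate what running split() on the command value implies, here is a small sketch. It assumes the split in question is Python's plain str.split() with no arguments, which breaks on whitespace and does not understand shell quoting. The second command string is a made-up example.

```python
# The example value from the parameter description.
command = "hadoop --param"
argv = command.split()
print(argv)  # ['hadoop', '--param']

# Plain str.split() ignores quotes, so a quoted argument containing
# a space is broken into two pieces (hypothetical command string).
quoted = "hadoop --opt 'a b'"
quoted_argv = quoted.split()
print(quoted_argv)  # ['hadoop', '--opt', "'a", "b'"]
```

Under that assumption, extra flags can be supplied through the command option as long as none of them contain embedded whitespace.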
luigi.contrib.hdfs.config.get_configured_hadoop_version()
    CDH4 (hadoop 2+) has a slightly different syntax for interacting with hdfs via the command line.
    The default version is CDH4, but one can override this setting with “cdh3” or “apache1” in the hadoop section of the config in order to use the old syntax.
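The override described above can be sketched with the standard configparser module. The section name ("hadoop"), the option name ("version"), and the three accepted values are taken from the text. The function read_hadoop_version is a hypothetical stand-in rather than Luigi's implementation, and rejecting unknown values with ValueError is an assumption of this sketch.

```python
import configparser

# Example configuration overriding the default version.
CONFIG_TEXT = """
[hadoop]
version = cdh3
"""

# Values named in the documentation above.
KNOWN_VERSIONS = ("cdh4", "cdh3", "apache1")


def read_hadoop_version(text):
    """Hypothetical reader: return the configured version, defaulting to cdh4."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback covers both a missing section and a missing option.
    version = parser.get("hadoop", "version", fallback="cdh4").lower()
    if version not in KNOWN_VERSIONS:
        raise ValueError("unknown hadoop version: " + version)
    return version


print(read_hadoop_version(CONFIG_TEXT))  # cdh3
print(read_hadoop_version(""))           # cdh4
```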