luigi.contrib.hdfs.config module

You can configure which client to use by setting the “client” option under the “hdfs” section of the configuration, or by using the --hdfs-client command-line option. “hadoopcli” is the slowest, but should work out of the box.
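
For example, to pick the client explicitly (a sketch; “hadoopcli” is simply the out-of-the-box default mentioned above), the configuration could contain:

    [hdfs]
    client = hadoopcli

The same choice can be made per invocation with --hdfs-client hadoopcli on the command line.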

class luigi.contrib.hdfs.config.hdfs(*args, **kwargs)[source]

Bases: Config

client_version = IntParameter (defaults to None)
namenode_host = OptionalParameter (defaults to None)
namenode_port = IntParameter (defaults to None)
client = Parameter (defaults to hadoopcli)
tmp_dir = OptionalParameter (defaults to None)
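
As a rough sketch of how these options are read (assuming they were set in the “hdfs” section of the configuration; the values shown in the comments are only illustrative):

    from luigi.contrib.hdfs.config import hdfs

    hdfs_config = hdfs()                # reads the [hdfs] section
    print(hdfs_config.client)           # e.g. "hadoopcli" (the default)
    print(hdfs_config.namenode_host)    # e.g. "namenode.example.com", or None if unset
    print(hdfs_config.namenode_port)    # e.g. 50070, or None if unset
    print(hdfs_config.tmp_dir)          # None unless configured
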
class luigi.contrib.hdfs.config.hadoopcli(*args, **kwargs)[source]

Bases: Config

command = Parameter (defaults to hadoop): The hadoop command. split() is run on it, so you can pass something like "hadoop --param"
version = Parameter (defaults to cdh4): Can also be cdh3 or apache1
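
A small usage sketch, assuming the options live in a “hadoopcli” section of the configuration (the values in the comments are only illustrations):

    from luigi.contrib.hdfs.config import hadoopcli

    cli = hadoopcli()
    print(cli.version)          # "cdh4" unless overridden with "cdh3" or "apache1"
    print(cli.command.split())  # e.g. "hadoop --param" becomes ["hadoop", "--param"]
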
luigi.contrib.hdfs.config.load_hadoop_cmd()[source]

luigi.contrib.hdfs.config.get_configured_hadoop_version()[source]

CDH4 (Hadoop 2+) has a slightly different syntax for interacting with HDFS via the command line.

The default version is CDH4, but this can be overridden with “cdh3” or “apache1” in the hadoop section of the config in order to use the old syntax.
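
A minimal sketch combining the two helpers, assuming load_hadoop_cmd() returns the configured command already split into a list of arguments (per the split() behaviour described for hadoopcli.command above):

    import subprocess

    from luigi.contrib.hdfs.config import (
        get_configured_hadoop_version,
        load_hadoop_cmd,
    )

    version = get_configured_hadoop_version()        # e.g. "cdh4", "cdh3" or "apache1"
    cmd = load_hadoop_cmd() + ['fs', '-ls', '/tmp']
    print(version, cmd)
    subprocess.call(cmd)                             # runs e.g. "hadoop fs -ls /tmp"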

luigi.contrib.hdfs.config.get_configured_hdfs_client()[source]

This is a helper that fetches the configuration value for ‘client’ in the [hdfs] section. When ‘client’ isn’t configured, it returns the default client, which retains backwards compatibility.
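
For example (a sketch; the client names follow the ones used in this module):

    from luigi.contrib.hdfs.config import get_configured_hdfs_client

    client = get_configured_hdfs_client()
    if client == 'hadoopcli':
        print('using the hadoop command line client')
    else:
        print('using the configured client: %s' % client)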

luigi.contrib.hdfs.config.tmppath(path=None, include_unix_username=True)[source]

@param path: target path for which a temporary location is needed
@type path: str
@type include_unix_username: bool
@rtype: str

Note that include_unix_username might work on Windows too.
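
A usage sketch (the exact value returned depends on the configured tmp_dir, on the current unix user name when include_unix_username is True, and typically on a randomized component, so the output will vary):

    from luigi.contrib.hdfs.config import tmppath

    # temporary location derived from a target path
    print(tmppath('/user/alice/report.tsv'))

    # temporary location without a target path or user name component
    print(tmppath(include_unix_username=False))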