luigi.contrib.hdfs.webhdfs_client module

A luigi file system client that wraps around the hdfs library (a webhdfs client).

This is a sensible, fast alternative to snakebite, in particular for Python 3 users, since snakebite did not support Python 3 at the time of writing (Dec 2015).

Note: This wrapper client is not yet feature complete. As with most software, the authors only implemented the features they needed. If you need more of the file system operations wrapped, please implement them and contribute them back.

class luigi.contrib.hdfs.webhdfs_client.webhdfs(*args, **kwargs)

Bases: luigi.task.Config

port = IntParameter (defaults to 50070): Port for webhdfs
user = Parameter (defaults to ''): If empty, falls back to the $USER environment variable
client_type = ChoiceParameter (defaults to insecure): Type of hdfs client to use. Choices: {insecure, kerberos}
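As a rough sketch of how the three documented defaults combine, the snippet below parses a hypothetical [webhdfs] section with the standard-library configparser, applies the 50070 and insecure defaults, and falls back to $USER when user is empty. The section name is assumed to follow the Config class name, luigi's own config loader is not used, and the function name resolve_webhdfs_settings is illustrative only.

```python
import configparser
import os

# Hypothetical luigi.cfg contents. The section name "webhdfs" is an assumption
# based on the Config class name; luigi's own config loading is not used here.
EXAMPLE_CFG = """
[webhdfs]
port = 50070
client_type = kerberos
"""

CLIENT_TYPES = ("insecure", "kerberos")


def resolve_webhdfs_settings(cfg_text, env=None):
    """Return (port, user, client_type) using the documented defaults."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["webhdfs"] if parser.has_section("webhdfs") else {}

    port = int(section.get("port", 50070))
    # An empty or missing user falls back to the $USER environment variable.
    user = section.get("user", "") or env.get("USER")
    client_type = section.get("client_type", "insecure")
    if client_type not in CLIENT_TYPES:
        raise ValueError(f"client_type must be one of {CLIENT_TYPES}, got {client_type!r}")
    return port, user, client_type


print(resolve_webhdfs_settings(EXAMPLE_CFG, env={"USER": "alice"}))
```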

class luigi.contrib.hdfs.webhdfs_client.WebHdfsClient(host=None, port=None, user=None, client_type=None)

Bases: luigi.contrib.hdfs.abstract_client.HdfsFileSystem

A webhdfs client that tries to conform to luigi's interface for file existence.

It is built on the client API of the hdfs library (HdfsCLI).
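A minimal sketch of how a wrapper taking host, port, user and client_type could build its underlying client. It assumes the wrapper forms an http://host:port URL and picks hdfs.InsecureClient for "insecure" and hdfs.ext.kerberos.KerberosClient for "kerberos". To keep the sketch runnable without the hdfs package, both classes are replaced by hypothetical stand-ins passed in through a factories dict; make_client is an illustrative name, not part of luigi.

```python
# Hypothetical stand-ins for HdfsCLI's hdfs.InsecureClient and
# hdfs.ext.kerberos.KerberosClient, so the sketch runs without the library.
class FakeInsecureClient:
    def __init__(self, url, user=None):
        self.url, self.user = url, user


class FakeKerberosClient:
    def __init__(self, url):
        self.url = url


def make_client(host, port, user, client_type, factories):
    """Pick a client class by client_type and point it at http://host:port."""
    url = f"http://{host}:{port}"
    if client_type == "insecure":
        # Insecure (simple auth) clients identify themselves by user name.
        return factories["insecure"](url=url, user=user)
    if client_type == "kerberos":
        # Kerberos clients authenticate with a ticket rather than a user name.
        return factories["kerberos"](url=url)
    raise ValueError(f"unknown client_type: {client_type!r}")


FACTORIES = {"insecure": FakeInsecureClient, "kerberos": FakeKerberosClient}
client = make_client("namenode.example.com", 50070, "alice", "insecure", FACTORIES)
print(client.url, client.user)
```

With the real library installed, the factories dict would instead map to hdfs.InsecureClient and hdfs.ext.kerberos.KerberosClient.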

walk(path, depth=1)

exists(path)

Returns True if the path exists and False otherwise.
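An existence check like this can be built on a status lookup: a successful lookup means the path exists, a "File does not exist" error means it does not, and any other error is re-raised rather than reported as False. The sketch below uses a stand-in error class and a hypothetical in-memory client; matching on that exact message prefix is an assumption, not a quote of luigi's code.

```python
class HdfsError(Exception):
    """Stand-in for hdfs.util.HdfsError from the HdfsCLI library."""


class FakeStatusClient:
    """Hypothetical in-memory client exposing only a status() call."""

    def __init__(self, paths):
        self.paths = set(paths)

    def status(self, path):
        if path not in self.paths:
            raise HdfsError(f"File does not exist: {path}")
        return {"path": path}


def exists(client, path):
    """True if status() succeeds, False on a 'does not exist' error."""
    try:
        client.status(path)
        return True
    except HdfsError as err:
        if str(err).startswith("File does not exist: "):
            return False
        raise  # other errors (permissions, network) are not swallowed


client = FakeStatusClient({"/data/a.txt"})
print(exists(client, "/data/a.txt"), exists(client, "/data/b.txt"))
```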

upload(hdfs_path, local_path, overwrite=False)
download(hdfs_path, local_path, overwrite=False, n_threads=-1)
remove(hdfs_path, recursive=True, skip_trash=False)

Remove the file or directory at the given path.

Parameters:
  • hdfs_path (str) – a path within the FileSystem to remove.
  • recursive (bool) – if the path is a directory, recursively remove the directory and all of its descendants. Defaults to True.
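The recursive flag only matters for directories. The sketch below models it on a hypothetical in-memory namespace (a set of absolute paths, with directories listed explicitly) rather than calling WebHDFS: a non-empty directory can only be removed with recursive=True, and doing so also removes everything beneath it.

```python
def remove(namespace, target, recursive=True):
    """Return a copy of `namespace` with `target` removed.

    `namespace` is a hypothetical set of absolute paths in which directories
    appear explicitly alongside the files they contain.
    """
    prefix = target.rstrip("/") + "/"
    descendants = {p for p in namespace if p.startswith(prefix)}
    if target not in namespace:
        raise FileNotFoundError(target)
    if descendants and not recursive:
        # Refuse to drop a non-empty directory unless asked to recurse.
        raise OSError(f"{target} is a non-empty directory")
    return namespace - descendants - {target}


NS = {"/d", "/d/a.txt", "/d/sub", "/d/sub/b.txt", "/e.txt"}
print(sorted(remove(NS, "/d")))  # ['/e.txt']
```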
read(hdfs_path, offset=0, length=None, buffer_size=None, chunk_size=1024, buffer_char=None)
move(path, dest)

Move a file, as one would expect.
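A rename on a remote file system can fail if the destination's parent directory does not exist yet. One plausible reading of "as one would expect", sketched here as an assumption, is to create any missing parent directories first and then rename. FakeFs is a hypothetical in-memory stand-in, not part of luigi or HdfsCLI.

```python
import posixpath


class FakeFs:
    """Hypothetical in-memory namespace: a set of directories and a dict of files."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = {}

    def exists(self, path):
        return path in self.dirs or path in self.files

    def makedirs(self, path):
        # Create path and every missing ancestor, like `mkdir -p`.
        while path not in self.dirs:
            self.dirs.add(path)
            path = posixpath.dirname(path)

    def rename(self, src, dst):
        if src not in self.files:
            raise FileNotFoundError(src)
        self.files[dst] = self.files.pop(src)


def move(fs, path, dest):
    """Move a file, creating the destination's parent directory if needed."""
    parent = posixpath.dirname(dest.rstrip("/"))
    if parent and not fs.exists(parent):
        fs.makedirs(parent)
    fs.rename(path, dest)
```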

mkdir(path, parents=True, mode=493, raise_if_exists=False)

Has no return value (just like WebHDFS).
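The mode default of 493 is the decimal form of 0o755 (7*64 + 5*8 + 5 = 493). WebHDFS expresses permissions as octal digits (for example 755), so a decimal mode has to be re-expressed before it is sent. A minimal sketch of one such conversion; the helper name to_webhdfs_permission is hypothetical.

```python
def to_webhdfs_permission(mode):
    """Turn an int mode such as 0o755 (== 493) into the octal-digit form 755."""
    # oct(493) == '0o755'; dropping the '0o' prefix leaves the digits '755'.
    return int(oct(mode)[2:])


assert 0o755 == 493
print(to_webhdfs_permission(493))  # 755
```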

chmod(path, permissions, recursive=False)

Raise a NotImplementedError exception.

chown(path, owner, group, recursive=False)

Raise a NotImplementedError exception.

count(path)

Raise a NotImplementedError exception.

copy(path, destination)

Raise a NotImplementedError exception.
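Because chmod, chown and copy raise NotImplementedError instead of acting, code written against the generic HdfsFileSystem interface that might be handed this client can treat those calls as optional. A small sketch of such a guard; StubClient and try_optional are hypothetical, and whether silently skipping an operation is acceptable depends on the caller.

```python
class StubClient:
    """Hypothetical client whose optional operations are unimplemented."""

    def chmod(self, path, permissions, recursive=False):
        raise NotImplementedError("chmod is not implemented by this client")

    def copy(self, path, destination):
        raise NotImplementedError("copy is not implemented by this client")


def try_optional(operation, *args, **kwargs):
    """Call an optional operation; return False instead of failing if unsupported."""
    try:
        operation(*args, **kwargs)
        return True
    except NotImplementedError:
        return False


print(try_optional(StubClient().chmod, "/data", 0o755))
```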

put(local_path, destination)

Restricted version of upload.

get(path, local_destination)

Restricted version of download.
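put and get expose fewer options than upload and download: they take just a source and a destination and rely on the defaults for everything else. The sketch below shows that delegation pattern on a hypothetical RecordingClient that logs calls instead of contacting HDFS. Each wrapper passes its arguments in the (hdfs_path, local_path) order that upload and download declare; this is an illustration of the pattern, not the client's source.

```python
class RecordingClient:
    """Hypothetical client that records calls instead of talking to HDFS."""

    def __init__(self):
        self.calls = []

    def upload(self, hdfs_path, local_path, overwrite=False):
        self.calls.append(("upload", hdfs_path, local_path, overwrite))

    def download(self, hdfs_path, local_path, overwrite=False, n_threads=-1):
        self.calls.append(("download", hdfs_path, local_path, overwrite, n_threads))

    def put(self, local_path, destination):
        # Restricted: no overwrite option; map onto upload(hdfs_path, local_path).
        self.upload(destination, local_path)

    def get(self, path, local_destination):
        # Restricted: always uses download()'s default overwrite and n_threads.
        self.download(path, local_destination)


c = RecordingClient()
c.put("/tmp/local.csv", "/data/remote.csv")
print(c.calls)
```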

listdir(path, ignore_directories=False, ignore_files=False, include_size=False, include_type=False, include_time=False, recursive=False)

Return an iterable of the files rooted at path. This is intended to be a recursive listing.

Parameters: path (str) – a path within the FileSystem to list.

Note: This method is optional; not all FileSystem subclasses implement it.
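The ignore_directories, ignore_files and recursive flags can be read as filters over the entries below path. The sketch below applies them to a hypothetical {path: type} map, where the type strings "FILE" and "DIRECTORY" follow the values WebHDFS reports in a FileStatus. include_size, include_type and include_time are not modelled, and the sketch does not claim that WebHdfsClient honours every flag.

```python
import posixpath


def listdir(entries, path, ignore_directories=False, ignore_files=False,
            recursive=False):
    """List paths below `path` from a hypothetical {path: "FILE" | "DIRECTORY"} map."""
    base = path.rstrip("/") or "/"
    prefix = base if base == "/" else base + "/"
    result = []
    for entry, kind in sorted(entries.items()):
        if entry == base or not entry.startswith(prefix):
            continue
        # Without recursion, keep only direct children of `path`.
        if not recursive and posixpath.dirname(entry) != base:
            continue
        if ignore_directories and kind == "DIRECTORY":
            continue
        if ignore_files and kind == "FILE":
            continue
        result.append(entry)
    return result


ENTRIES = {
    "/data": "DIRECTORY",
    "/data/a.csv": "FILE",
    "/data/logs": "DIRECTORY",
    "/data/logs/1.log": "FILE",
    "/other.txt": "FILE",
}
print(listdir(ENTRIES, "/data", recursive=True))
```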

touchz(path)

Create an empty file at path, like hdfs dfs -touchz, using the WebHDFS “write” command.
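Creating an empty file this way amounts to writing zero bytes with overwrite disabled, so an existing file is not clobbered. The sketch below uses a hypothetical in-memory client whose write(path, data, overwrite) mirrors the shape of HdfsCLI's Client.write; raising FileExistsError is a simplification of whatever error the server actually returns.

```python
class FakeWriteClient:
    """Hypothetical in-memory stand-in for a client with a write() call."""

    def __init__(self):
        self.files = {}

    def write(self, path, data="", overwrite=False):
        if path in self.files and not overwrite:
            raise FileExistsError(path)
        self.files[path] = data


def touchz(client, path):
    """Create a zero-length file at `path`, refusing to overwrite an existing one."""
    client.write(path, data="", overwrite=False)


c = FakeWriteClient()
touchz(c, "/flags/_SUCCESS")
print(c.files)
```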