luigi.contrib.hdfs.snakebite_client module

A Luigi file system client that wraps snakebite.

Originally written by Alan Brenner <alan@magnetic.com> github.com/alanbbr

class luigi.contrib.hdfs.snakebite_client.SnakebiteHdfsClient[source]

Bases: luigi.contrib.hdfs.abstract_client.HdfsFileSystem

An HDFS client using snakebite. Because snakebite has a Python API, it is roughly 100 times faster than the Hadoop CLI client, which shells out to a Java program on every file system operation.

static list_path(path)[source]

get_bite()[source]

If Luigi has forked, we have a different PID, and need to reconnect.
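The reconnect-after-fork idea can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: PidAwareCache and fake_factory are hypothetical names, and the factory stands in for whatever actually opens the snakebite connection.

```python
import os


class PidAwareCache:
    """Cache one client per process; rebuild it when the PID changes."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical callable that opens a connection
        self._client = None
        self._pid = None         # PID of the process that built the client

    def get(self):
        current_pid = os.getpid()
        if self._client is None or self._pid != current_pid:
            # First use, or we are running in a forked child: reconnect.
            self._client = self._factory()
            self._pid = current_pid
        return self._client


calls = []


def fake_factory():
    # Stand-in for opening a real connection; records each call.
    calls.append(os.getpid())
    return object()


cache = PidAwareCache(fake_factory)
first = cache.get()
second = cache.get()  # same process: the cached client is reused
cache._pid = -1       # simulate the stale state a forked child would inherit
third = cache.get()   # PID mismatch: a new client is built
```

Comparing PIDs on every access is cheap, and it avoids sharing one socket between a parent and its forked workers.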

exists(path)[source]

Use snakebite.test to check file existence.

Parameters: path (string) – path to test
Returns: boolean, True if path exists in HDFS

move(path, dest)[source]

Use snakebite.rename, if available.

Parameters:
  • path (either a string or sequence of strings) – source file(s)
  • dest (string) – destination file (single input) or directory (multiple)
Returns:

list of renamed items
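Because path may be a single string or a sequence of strings, callers and wrappers usually normalize it first. The sketch below shows one way to do that. as_path_list and dest_must_be_directory are hypothetical helpers written for this example, not part of the module.

```python
def as_path_list(path):
    """Return a list of paths, whether given one string or a sequence."""
    if isinstance(path, (list, tuple)):
        return list(path)
    return [str(path)]


def dest_must_be_directory(path):
    """With several sources, dest has to name a directory."""
    return len(as_path_list(path)) > 1


single = as_path_list("/data/a.txt")
several = as_path_list(("/data/a.txt", "/data/b.txt"))
needs_dir = dest_must_be_directory(several)
```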

rename_dont_move(path, dest)[source]

Use snakebite.rename_dont_move, if available.

Parameters:
  • path (string) – source path (single input)
  • dest (string) – destination path
Returns:

True if succeeded

Raises:

snakebite.errors.FileAlreadyExistsException

remove(path, recursive=True, skip_trash=False)[source]

Use snakebite.delete, if available.

Parameters:
  • path (either a string or a sequence of strings) – delete-able file(s) or directory(ies)
  • recursive (boolean, default is True) – delete directory trees, like *nix rm -r
  • skip_trash (boolean, default is False (use trash)) – if True, delete items immediately instead of moving them into the trash first
Returns:

list of deleted items

chmod(path, permissions, recursive=False)[source]

Use snakebite.chmod, if available.

Parameters:
  • path (either a string or sequence of strings) – update-able file(s)
  • permissions (octal) – *nix style permission number
  • recursive (boolean, default is False) – change just listed entry(ies) or all in directories
Returns:

list of all changed items

chown(path, owner, group, recursive=False)[source]

Use snakebite.chown/chgrp, if available.

One of owner or group must be set. Just setting group calls chgrp.

Parameters:
  • path (either a string or sequence of strings) – update-able file(s)
  • owner (string) – new owner, can be blank
  • group (string) – new group, can be blank
  • recursive (boolean, default is False) – change just listed entry(ies) or all in directories
Returns:

list of all changed items
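The rule for choosing between chown and chgrp can be written out as a small decision function. This is only a sketch of the documented behaviour. plan_ownership_change is a hypothetical name, the "owner:group" spec format is an assumption, and so is raising ValueError when both arguments are blank.

```python
def plan_ownership_change(owner, group):
    """Return (operation, spec) for the documented owner/group rule."""
    if owner and group:
        # Assumed spec format when both values are given.
        return ("chown", "{}:{}".format(owner, group))
    if owner:
        return ("chown", owner)
    if group:
        # Only the group is set, so this becomes a chgrp.
        return ("chgrp", group)
    # Assumption: reject the call when neither value is set.
    raise ValueError("one of owner or group must be set")


both = plan_ownership_change("alice", "analysts")
owner_only = plan_ownership_change("alice", "")
group_only = plan_ownership_change("", "analysts")
```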

count(path)[source]

Use snakebite.count, if available.

Parameters: path (string) – directory to count the contents of
Returns: dictionary with content_size, dir_count and file_count keys
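
A short sketch of consuming the returned dictionary follows. The sample values are made up for illustration and summarize_count is a hypothetical helper. Only the three key names come from the description above.

```python
def summarize_count(result):
    """Format a count() style dictionary as one readable line."""
    return "{file_count} files, {dir_count} directories, {content_size} bytes".format(
        **result
    )


# Hypothetical values; a real call would return the numbers for an HDFS directory.
sample = {"content_size": 2048, "dir_count": 3, "file_count": 12}
summary = summarize_count(sample)
```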

copy(path, destination)[source]

Raise a NotImplementedError exception.

put(local_path, destination)[source]

Raise a NotImplementedError exception.

get(path, local_destination)[source]

Use snakebite.copyToLocal, if available.

Parameters:
  • path (string) – HDFS file
  • local_destination (string) – path on the system running Luigi

get_merge(path, local_destination)[source]

Use snakebite.getmerge to implement this.

Parameters:
  • path – HDFS directory
  • local_destination – path on the system running Luigi
Returns:

merge of the directory

mkdir(path, parents=True, mode=493, raise_if_exists=False)[source]

Use snakebite.mkdir, if available.

Snakebite’s mkdir method allows control over full path creation, so by default, tell it to build a full path to work like hadoop fs -mkdir.

Parameters:
  • path (string) – HDFS path to create
  • parents (boolean, default is True) – create any missing parent directories
  • mode (octal, default 0755, which the signature shows as decimal 493) – *nix style owner/group/other permissions
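
The signature shows mode=493 while the parameter list says 0755. These are the same number in decimal and octal, as the standard-library check below illustrates (Python 3 spells the octal literal 0o755).

```python
import stat

mode = 0o755  # Python 3 octal literal for rwxr-xr-x

as_decimal = int(mode)  # 493
as_octal = oct(mode)    # '0o755'

# stat.filemode needs a file-type bit to render the leading 'd'.
rendered = stat.filemode(stat.S_IFDIR | mode)  # 'drwxr-xr-x'
```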

listdir(path, ignore_directories=False, ignore_files=False, include_size=False, include_type=False, include_time=False, recursive=False)[source]

Use snakebite.ls to get the list of items in a directory.

Parameters:
  • path (string) – the directory to list
  • ignore_directories (boolean, default is False) – if True, do not yield directory entries
  • ignore_files (boolean, default is False) – if True, do not yield file entries
  • include_size (boolean, default is False (do not include)) – include the size in bytes of the current item
  • include_type (boolean, default is False (do not include)) – include the type (d or f) of the current item
  • include_time (boolean, default is False (do not include)) – include the last modification time of the current item
  • recursive (boolean, default is False (do not recurse)) – list subdirectory contents
Returns:

yields one item per entry: a string (the path) or, if any of the include_* settings is true, a tuple starting with the path, followed by the requested include_* items in order
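
The shape of the yielded items can be illustrated with plain dictionaries standing in for directory entries. This is a sketch under stated assumptions, not the library's code: shape_entries is a hypothetical function, and the entry keys (path, length, file_type, modification_time) are assumed names for this example.

```python
def shape_entries(entries, ignore_directories=False, ignore_files=False,
                  include_size=False, include_type=False, include_time=False):
    """Yield a path string, or a tuple when any include_* flag is set."""
    for entry in entries:
        if ignore_directories and entry["file_type"] == "d":
            continue
        if ignore_files and entry["file_type"] == "f":
            continue
        row = [entry["path"]]
        if include_size:
            row.append(entry["length"])
        if include_type:
            row.append(entry["file_type"])
        if include_time:
            row.append(entry["modification_time"])
        # A lone path is yielded as a string, anything more as a tuple.
        yield tuple(row) if len(row) > 1 else row[0]


# Hypothetical entries; the values are invented for the example.
sample = [
    {"path": "/data", "length": 0, "file_type": "d", "modification_time": 1},
    {"path": "/data/part-0", "length": 42, "file_type": "f", "modification_time": 2},
]
plain = list(shape_entries(sample))
sized = list(shape_entries(sample, ignore_directories=True,
                           include_size=True, include_type=True))
```

Since the item type depends on the flags, callers that toggle include_* settings should be ready to unpack tuples rather than assume plain strings.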

touchz(path)[source]

Raise a NotImplementedError exception.
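
Since copy, put and touchz raise NotImplementedError, calling code may want to fall back to another client for those operations. The sketch below shows the pattern with stand-in classes. PartialClient, FullClient and copy_with_fallback are hypothetical names and do not exist in Luigi.

```python
class PartialClient:
    """Stand-in for a client whose copy is not implemented."""

    def copy(self, path, destination):
        raise NotImplementedError("copy is not supported by this client")


class FullClient:
    """Stand-in for a client that supports copy; records each call."""

    def __init__(self):
        self.copied = []

    def copy(self, path, destination):
        self.copied.append((path, destination))


def copy_with_fallback(primary, fallback, path, destination):
    """Try the primary client, then the fallback on NotImplementedError."""
    try:
        primary.copy(path, destination)
        return "primary"
    except NotImplementedError:
        fallback.copy(path, destination)
        return "fallback"


fallback_client = FullClient()
used = copy_with_fallback(PartialClient(), fallback_client, "/a", "/b")
```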