luigi.contrib.hdfs.snakebite_client module
A Luigi file system client that wraps snakebite.
Originally written by Alan Brenner <alan@magnetic.com> github.com/alanbbr
-
class luigi.contrib.hdfs.snakebite_client.SnakebiteHdfsClient
Bases: luigi.contrib.hdfs.abstract_client.HdfsFileSystem
An HDFS client using snakebite. Because snakebite has a Python API, it is about 100 times faster than the hadoop CLI client, which shells out to a Java program on every file system operation.
-
exists(path)
Use snakebite.test to check file existence.
Parameters: path (string) – path to test
Returns: boolean, True if path exists in HDFS
-
move(path, dest)
Use snakebite.rename, if available.
Parameters: - path (either a string or sequence of strings) – source file(s)
- dest (string) – destination file (single input) or directory (multiple)
Returns: list of renamed items
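The dest rule above works as follows: dest is the new file name for a single input and a directory for multiple inputs. The pure-Python sketch below illustrates that rule without calling snakebite. `planned_destinations` is a hypothetical helper. It assumes that each source moved into a directory keeps its base name.

```python
import posixpath
from typing import List, Sequence, Tuple, Union


def planned_destinations(
    path: Union[str, Sequence[str]], dest: str
) -> List[Tuple[str, str]]:
    """Hypothetical helper: map each source path to where a move would put it.

    Assumption: a single string source is renamed to ``dest`` itself, while a
    sequence of sources is moved into ``dest`` as a directory.
    """
    if isinstance(path, str):
        # Single input: dest names the new file.
        return [(path, dest)]
    # Multiple inputs: dest is a directory; each source keeps its base name.
    return [
        (src, posixpath.join(dest, posixpath.basename(src.rstrip("/"))))
        for src in path
    ]


print(planned_destinations("/tmp/part-0", "/data/out"))
# [('/tmp/part-0', '/data/out')]
print(planned_destinations(["/tmp/a", "/tmp/b"], "/data/out"))
# [('/tmp/a', '/data/out/a'), ('/tmp/b', '/data/out/b')]
```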
-
rename_dont_move(path, dest)
Use snakebite.rename_dont_move, if available.
Parameters: - path (string) – source path (single input)
- dest (string) – destination path
Returns: True if succeeded
Raises: snakebite.errors.FileAlreadyExistsException
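The contract of rename_dont_move is that it renames a single path but fails instead of overwriting an existing destination. The sketch below models that contract against an in-memory dictionary. `InMemoryFS` is hypothetical. Python's built-in `FileExistsError` stands in for snakebite.errors.FileAlreadyExistsException, so no HDFS or snakebite code is involved.

```python
from typing import Dict


class InMemoryFS:
    """Hypothetical stand-in for an HDFS client, storing files in a dict.

    FileExistsError plays the role of
    snakebite.errors.FileAlreadyExistsException in this sketch.
    """

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def rename_dont_move(self, path: str, dest: str) -> bool:
        if path not in self.files:
            raise FileNotFoundError(path)
        if dest in self.files:
            # Never overwrite: an existing destination is an error.
            raise FileExistsError(dest)
        self.files[dest] = self.files.pop(path)
        return True


fs = InMemoryFS()
fs.files["/tmp/result.tmp"] = b"data"
fs.rename_dont_move("/tmp/result.tmp", "/data/result")  # returns True

fs.files["/tmp/again.tmp"] = b"other"
try:
    fs.rename_dont_move("/tmp/again.tmp", "/data/result")
except FileExistsError:
    print("destination already exists; source left in place")
```

Because the second call fails and leaves the source in place, a writer can safely publish output by writing to a temporary path and then renaming it.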
-
remove(path, recursive=True, skip_trash=False)
Use snakebite.delete, if available.
Parameters: - path (either a string or a sequence of strings) – file(s) or directory(ies) to delete
- recursive (boolean, default is True) – delete directory trees, like *nix rm -r
- skip_trash (boolean, default is False (use trash)) – if True, delete items permanently instead of moving them to the trash first
Returns: list of deleted items
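The recursive flag decides whether a delete reaches below a directory. The toy model below computes which entries of a set of paths a delete would remove. `paths_to_delete` is a hypothetical helper. Refusing a non-empty directory when recursive=False is an assumption of this model, not a statement about snakebite's exact behaviour.

```python
from typing import Iterable, List, Set, Union


def paths_to_delete(
    existing: Set[str], path: Union[str, Iterable[str]], recursive: bool = True
) -> List[str]:
    """Hypothetical helper: list which entries a delete would remove.

    ``existing`` holds every file and directory path in a toy namespace.
    With recursive=True a directory and everything under it is removed,
    like ``rm -r``; with recursive=False a non-empty directory is refused.
    """
    targets = [path] if isinstance(path, str) else list(path)
    doomed: Set[str] = set()
    for target in targets:
        target = target.rstrip("/") or "/"
        # Match on "dir/" so that "/data" does not capture "/database".
        prefix = target if target.endswith("/") else target + "/"
        children = {p for p in existing if p.startswith(prefix)}
        if children and not recursive:
            raise OSError(f"{target} is a non-empty directory")
        if target in existing:
            doomed.add(target)
        doomed |= children
    return sorted(doomed)


tree = {"/data", "/data/a", "/data/sub", "/data/sub/b", "/database"}
print(paths_to_delete(tree, "/data"))
# ['/data', '/data/a', '/data/sub', '/data/sub/b']
```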
-
chmod(path, permissions, recursive=False)
Use snakebite.chmod, if available.
Parameters: - path (either a string or sequence of strings) – file(s) to update
- permissions (octal) – *nix-style permission number
- recursive (boolean, default is False) – if False, change only the listed entries; if True, also change everything inside listed directories
Returns: list of all changed items
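Because permissions is a *nix mode number, three spellings denote the same value: 0o755 in Python source, 493 in decimal, and the string "755" read in base 8. The arithmetic is shown below. Both helper names are hypothetical. Accepting a string is a convenience of this sketch, not something the chmod signature above guarantees.

```python
from typing import Union


def parse_permissions(permissions: Union[int, str]) -> int:
    """Accept an int (e.g. 0o755) or an octal string (e.g. "755")."""
    if isinstance(permissions, str):
        return int(permissions, 8)
    return permissions


def to_rwx(mode: int) -> str:
    """Render the owner/group/other bits of ``mode`` as rwx triplets."""
    triplets = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        triplets.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(triplets)


print(parse_permissions("755"), to_rwx(0o755))  # 493 rwxr-xr-x
print(to_rwx(parse_permissions("640")))          # rw-r-----
```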
-
chown(path, owner, group, recursive=False)
Use snakebite.chown/chgrp, if available.
At least one of owner or group must be set; if only group is set, chgrp is called.
Parameters: - path (either a string or sequence of strings) – file(s) to update
- owner (string) – new owner, can be blank
- group (string) – new group, can be blank
- recursive (boolean, default is False) – if False, change only the listed entries; if True, also change everything inside listed directories
Returns: list of all changed items
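The owner/group rule reduces to a small dispatch: chown when an owner is given, and chgrp when only a group is given. The sketch below captures that rule. `ownership_call` is a hypothetical helper. Combining both values as the common "owner:group" spec is an assumption of this sketch.

```python
from typing import Optional, Tuple


def ownership_call(owner: Optional[str], group: Optional[str]) -> Tuple[str, str]:
    """Hypothetical helper: pick the operation and argument for chown().

    Assumes the common "owner:group" spec when both values are given.
    """
    if owner and group:
        return ("chown", f"{owner}:{group}")
    if owner:
        return ("chown", owner)
    if group:
        # Only the group is set, so fall back to chgrp.
        return ("chgrp", group)
    raise ValueError("at least one of owner or group must be set")


print(ownership_call("alice", "analysts"))  # ('chown', 'alice:analysts')
print(ownership_call("", "analysts"))       # ('chgrp', 'analysts')
```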
-
count(path)
Use snakebite.count, if available.
Parameters: path (string) – directory to count the contents of
Returns: dictionary with content_size, dir_count and file_count keys
-
get(path, local_destination)
Use snakebite.copyToLocal, if available.
Parameters: - path (string) – HDFS file
- local_destination (string) – path on the system running Luigi
-
get_merge(path, local_destination)
Use snakebite's getmerge to concatenate the files in an HDFS directory into a single local file.
Parameters: - path – HDFS directory
- local_destination – path on the system running Luigi
Returns: merge of the directory
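To show what a "merge of the directory" means, the sketch below runs the same operation entirely on the local file system. `merge_directory` is a hypothetical local analogue, not the snakebite call. Merging in file-name order and skipping subdirectories are assumptions of this sketch.

```python
import os
import shutil
import tempfile


def merge_directory(src_dir: str, local_destination: str) -> str:
    """Local analogue of getmerge: concatenate the files in src_dir.

    Assumption of this sketch: parts are merged in file-name order and
    subdirectories are skipped.
    """
    with open(local_destination, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            part = os.path.join(src_dir, name)
            if os.path.isfile(part):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)  # stream instead of reading whole files
    return local_destination


with tempfile.TemporaryDirectory() as tmp:
    parts_dir = os.path.join(tmp, "output")
    os.mkdir(parts_dir)
    for name, data in [("part-00001", b"b\n"), ("part-00000", b"a\n")]:
        with open(os.path.join(parts_dir, name), "wb") as f:
            f.write(data)
    merged = merge_directory(parts_dir, os.path.join(tmp, "merged.txt"))
    with open(merged, "rb") as f:
        print(f.read())  # b'a\nb\n'
```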
-
mkdir(path, parents=True, mode=493, raise_if_exists=False)
Use snakebite.mkdir, if available.
Snakebite’s mkdir method allows control over full path creation, so by default it is told to build the full path, to work like hadoop fs -mkdir.
Parameters: - path (string) – HDFS path to create
- parents (boolean, default is True) – create any missing parent directories
- mode (octal, default 0755, shown as 493 in decimal) – *nix style owner/group/other permissions
- raise_if_exists (boolean, default is False) – if True, raise an error when path already exists
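The parents and raise_if_exists flags can be modelled over a toy set of existing directories. `dirs_to_create` is a hypothetical helper. It ignores mode and never touches HDFS, and the Python exception types are stand-ins for whatever the real client raises.

```python
import posixpath
from typing import List, Set


def dirs_to_create(existing: Set[str], path: str, parents: bool = True,
                   raise_if_exists: bool = False) -> List[str]:
    """Hypothetical helper: which directories would mkdir create, top down?"""
    path = posixpath.normpath(path)
    if path in existing:
        if raise_if_exists:
            raise FileExistsError(path)
        return []  # already there and not asked to complain
    missing = []
    current = path
    # Walk upward until an existing ancestor (or the root) is reached.
    while current not in existing and current != "/":
        missing.append(current)
        current = posixpath.dirname(current)
    missing.reverse()  # parents must be created before children
    if len(missing) > 1 and not parents:
        raise FileNotFoundError(f"parent of {path} does not exist")
    return missing


print(dirs_to_create({"/", "/data"}, "/data/2024/01"))
# ['/data/2024', '/data/2024/01']
```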
-
listdir(path, ignore_directories=False, ignore_files=False, include_size=False, include_type=False, include_time=False, recursive=False)
Use snakebite.ls to get the list of items in a directory.
Parameters: - path (string) – the directory to list
- ignore_directories (boolean, default is False) – if True, do not yield directory entries
- ignore_files (boolean, default is False) – if True, do not yield file entries
- include_size (boolean, default is False (do not include)) – include the size in bytes of each item
- include_type (boolean, default is False (do not include)) – include the type (d or f) of each item
- include_time (boolean, default is False (do not include)) – include the last modification time of each item
- recursive (boolean, default is False (do not recurse)) – also list the contents of subdirectories
Returns: a generator that yields each path as a string or, if any include_* flag is True, a tuple of the path followed by the requested values in the order size, type, time
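The shape of what listdir yields depends on the include_* flags. The sketch below reproduces that shaping over plain dictionaries. `format_entries` is a hypothetical helper, and recursion is not modelled. The keys "path", "length", "file_type" and "modification_time" (in milliseconds) are assumed to mirror snakebite's ls entries.

```python
import datetime
from typing import Dict, Iterable, Iterator, Tuple, Union


def format_entries(entries: Iterable[Dict], ignore_directories: bool = False,
                   ignore_files: bool = False, include_size: bool = False,
                   include_type: bool = False, include_time: bool = False
                   ) -> Iterator[Union[str, Tuple]]:
    """Hypothetical helper: shape ls-style entries the way listdir describes."""
    for entry in entries:
        if ignore_directories and entry["file_type"] == "d":
            continue
        if ignore_files and entry["file_type"] == "f":
            continue
        row = [entry["path"]]
        if include_size:
            row.append(entry["length"])
        if include_type:
            row.append(entry["file_type"])
        if include_time:
            # Assumes modification_time is milliseconds since the epoch.
            row.append(datetime.datetime.fromtimestamp(entry["modification_time"] / 1000))
        # A bare path when no extras were requested, else a tuple.
        yield tuple(row) if len(row) > 1 else row[0]


entries = [
    {"path": "/data/a.csv", "length": 120, "file_type": "f", "modification_time": 1_600_000_000_000},
    {"path": "/data/sub", "length": 0, "file_type": "d", "modification_time": 1_600_000_000_000},
]
print(list(format_entries(entries)))
# ['/data/a.csv', '/data/sub']
print(list(format_entries(entries, ignore_directories=True, include_size=True, include_type=True)))
# [('/data/a.csv', 120, 'f')]
```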
-