luigi.contrib.hdfs.webhdfs_client

A luigi file system client that wraps around the hdfs-library (a webhdfs client)

Note. This wrapper client is not feature complete yet. As with most software, the authors only implement the features they need. If you need to wrap more of the file system operations, please do so and contribute your changes back.

Classes

WebHdfsClient([host, port, user, client_type])

A webhdfs client that tries to conform to luigi's interface for file existence.

webhdfs(*args, **kwargs)

class luigi.contrib.hdfs.webhdfs_client.webhdfs(*args, **kwargs)[source]
port

Parameter whose value is an int.

user

Parameter whose value is a str, and a base class for other parameter types.

Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:

class MyTask(luigi.Task):
    foo = luigi.Parameter()

class RequiringTask(luigi.Task):
    def requires(self):
        return MyTask(foo="hello")

    def run(self):
        print(self.requires().foo)  # prints "hello"

This makes it possible to instantiate multiple tasks, eg MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.

When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:

  • Any value provided on the command line:

    • To the root task (eg. --param xyz)

    • Then to the class, using the qualified task name syntax (eg. --TaskA-param xyz).

  • Any value provided in the configuration file, using the [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See Parameters from config Ingestion

  • Any default value set using the default flag.

Parameter objects may be reused, but you must then set the positional=False flag.
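The resolution order above can be sketched in plain Python. This is only an illustration of the priority rule, not luigi's implementation; the function resolve_param and its argument names are hypothetical.

```python
# Illustrative sketch only: this is NOT how luigi resolves parameters internally.
_MISSING = object()  # sentinel meaning "no value supplied at this level"


def resolve_param(explicit=_MISSING, cli_root=_MISSING, cli_qualified=_MISSING,
                  config=_MISSING, default=_MISSING):
    """Return the first supplied value, checking sources in falling priority."""
    for value in (explicit, cli_root, cli_qualified, config, default):
        if value is not _MISSING:
            return value
    raise ValueError("no value available for parameter")


# An explicit constructor argument wins over everything else.
assert resolve_param(explicit=44, default=0) == 44
# Without an explicit or command-line value, the config value beats the default.
assert resolve_param(config="from-config", default="fallback") == "from-config"
```

A sentinel is used instead of None so that None itself can be a legitimate supplied value.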

client_type

A parameter which takes two values:

  1. an instance of Iterable and

  2. the class of the variables to convert to.

In the task definition, use

class MyTask(luigi.Task):
    my_param = luigi.ChoiceParameter(choices=[0.1, 0.2, 0.3], var_type=float)

At the command line, use

$ luigi --module my_tasks MyTask --my-param 0.1

Consider using EnumParameter for a typed, structured alternative. This class can perform the same role when all choices are the same type and transparency of parameter value on the command line is desired.
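As a rough sketch of what such a parameter has to do with a command-line string, the snippet below converts the raw text with var_type and then checks it against the allowed choices. The helper parse_choice is hypothetical and is not part of luigi.

```python
def parse_choice(raw, choices, var_type=str):
    """Convert a command-line string with var_type and check it against choices."""
    value = var_type(raw)
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {list(choices)!r}")
    return value


# "--my-param 0.1" arrives as the string "0.1" and is converted to a float.
assert parse_choice("0.1", [0.1, 0.2, 0.3], var_type=float) == 0.1
```

Because the value is converted before the membership test, all choices need to share the type given as var_type.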

class luigi.contrib.hdfs.webhdfs_client.WebHdfsClient(host=None, port=None, user=None, client_type=None)[source]

A webhdfs client that tries to conform to luigi's interface for file existence.

The library uses this API.

property url
property client
walk(path, depth=1)[source]
exists(path)[source]

Returns true if the path exists and false otherwise.
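A common use of exists is to guard an upload. The sketch below is written against any object that offers exists(path) and upload(hdfs_path, local_path, overwrite=False) with the signatures listed in this section. FakeClient and upload_if_missing are hypothetical names used for illustration; with a real cluster one would presumably pass a WebHdfsClient instance instead.

```python
class FakeClient:
    """Hypothetical in-memory stand-in exposing exists() and upload() only."""

    def __init__(self):
        self.files = {}

    def exists(self, path):
        return path in self.files

    def upload(self, hdfs_path, local_path, overwrite=False):
        if hdfs_path in self.files and not overwrite:
            raise FileExistsError(hdfs_path)
        self.files[hdfs_path] = local_path


def upload_if_missing(client, hdfs_path, local_path):
    """Upload local_path unless hdfs_path already exists; return True if uploaded."""
    if client.exists(hdfs_path):
        return False
    client.upload(hdfs_path, local_path)
    return True


client = FakeClient()
assert upload_if_missing(client, "/data/a.txt", "a.txt") is True
assert upload_if_missing(client, "/data/a.txt", "a.txt") is False
```

The check and the upload are two separate calls, so this pattern is not atomic: another writer could create the path in between.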

upload(hdfs_path, local_path, overwrite=False)[source]
download(hdfs_path, local_path, overwrite=False, n_threads=-1)[source]
remove(hdfs_path, recursive=True, skip_trash=False)[source]

Remove the file or directory at hdfs_path.

Parameters:
  • hdfs_path (str) – a path within the FileSystem to remove.

  • recursive (bool) – if the path is a directory, recursively remove the directory and all of its descendants. Defaults to True.

read(hdfs_path, offset=0, length=None, buffer_size=None, chunk_size=1024, buffer_char=None)[source]
move(path, dest)[source]

Move a file, as one would expect.

mkdir(path, parents=True, mode=493, raise_if_exists=False)[source]

Has no return value (just like WebHDFS).
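The default mode=493 in the signature is the decimal rendering of the octal permission 0o755, which is presumably how it is written in the source. A quick check with the standard library:

```python
import stat

mode = 493  # default shown in the mkdir signature

print(oct(mode))                           # 0o755
print(stat.filemode(stat.S_IFDIR | mode))  # drwxr-xr-x
```

In other words, the owner may read, write and traverse the directory, while group and others may only read and traverse it.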

chmod(path, permissions, recursive=False)[source]

Raise a NotImplementedError exception.

chown(path, owner, group, recursive=False)[source]

Raise a NotImplementedError exception.

count(path)[source]

Raise a NotImplementedError exception.

copy(path, destination)[source]

Raise a NotImplementedError exception.
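Since chmod, chown, count and copy raise NotImplementedError, code that should work with several file system clients may want to treat them as optional. The following is a minimal sketch; call_if_supported and StubClient are hypothetical names, and the stub only mirrors the documented behaviour of chmod.

```python
def call_if_supported(method, *args, **kwargs):
    """Call method; return (True, result), or (False, None) if it is not implemented."""
    try:
        return True, method(*args, **kwargs)
    except NotImplementedError:
        return False, None


class StubClient:
    """Hypothetical stand-in whose chmod raises, as documented above."""

    def chmod(self, path, permissions, recursive=False):
        raise NotImplementedError("chmod is not supported")

    def exists(self, path):
        return True


client = StubClient()
supported, _ = call_if_supported(client.chmod, "/data", 0o644)
assert supported is False
```

Returning a flag alongside the result keeps "not supported" distinguishable from a method that legitimately returns None.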

put(local_path, destination)[source]

Restricted version of upload

get(path, local_destination)[source]

Restricted version of download

listdir(path, ignore_directories=False, ignore_files=False, include_size=False, include_type=False, include_time=False, recursive=False)[source]

Return a list of files rooted in path.

This returns an iterable of the files rooted at path. This is intended to be a recursive listing.

Parameters:

path (str) – a path within the FileSystem to list.

Note: This method is optional; not all FileSystem subclasses implement it.

touchz(path)[source]

Create an empty file at path using the WebHDFS “write” command.