luigi.contrib.hdfs.webhdfs_client
A Luigi file system client that wraps the hdfs library (a webhdfs client).
Note: This wrapper client is not feature complete yet. As with most software, the authors implemented only the features they needed. If you need to wrap more of the file system operations, please do so and contribute back.
Classes
WebHdfsClient: a webhdfs client that tries to conform to Luigi's interface for file existence.
- class luigi.contrib.hdfs.webhdfs_client.webhdfs(*args, **kwargs)[source]
- port
Parameter whose value is an int.
- user
Parameter whose value is a str, and a base class for other parameter types.
Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:
    import luigi

    class MyTask(luigi.Task):
        foo = luigi.Parameter()

    class RequiringTask(luigi.Task):
        def requires(self):
            return MyTask(foo="hello")

        def run(self):
            print(self.requires().foo)  # prints "hello"
This makes it possible to instantiate multiple tasks, e.g. MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.
When a task is instantiated, it will first use any argument as the value of the parameter, e.g. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:
- Any value provided on the command line:
  - To the root task (e.g. --param xyz)
  - Then to the class, using the qualified task name syntax (e.g. --TaskA-param xyz).
- With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See Parameters from config Ingestion.
- Any default value set using the default flag.
Parameter objects may be reused, but you must then set the positional=False flag.
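The falling-priority lookup described above can be sketched in plain Python. This is an illustration only, not Luigi's implementation: the `resolve_param` helper, its argument names, and the `_MISSING` sentinel are hypothetical names introduced for this example, and `ValueError` is a stand-in for however the real library reports a missing value.

```python
_MISSING = object()  # sentinel meaning "no value supplied at this level"


def resolve_param(explicit=_MISSING, cli_root=_MISSING, cli_qualified=_MISSING,
                  config=_MISSING, default=_MISSING):
    """Return the first supplied value, in falling priority (hypothetical helper)."""
    # Order mirrors the documented priority: constructor argument, command line
    # (root task form, then qualified task name form), config, default.
    for candidate in (explicit, cli_root, cli_qualified, config, default):
        if candidate is not _MISSING:
            return candidate
    # Stand-in error; nothing was supplied at any level.
    raise ValueError("no value provided and no default set")


# An explicit argument wins over everything else.
chosen_explicit = resolve_param(explicit=44, config=7, default=0)
# With no argument or command-line value, the config value beats the default.
chosen_config = resolve_param(config=7, default=0)
# With nothing else supplied, the default is used.
chosen_default = resolve_param(default=0)
```

A sentinel object is used instead of `None` so that `None` itself could still be passed as a legitimate value at any level.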
- client_type
- A parameter which takes two values:
  1. an instance of Iterable, and
  2. the class of the variables to convert to.
In the task definition, use
    class MyTask(luigi.Task):
        my_param = luigi.ChoiceParameter(choices=[0.1, 0.2, 0.3], var_type=float)
At the command line, use
$ luigi --module my_tasks MyTask --my-param 0.1
Consider using EnumParameter for a typed, structured alternative. This class can perform the same role when all choices are the same type and transparency of parameter value on the command line is desired.
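The idea of converting a command-line string with `var_type` and then checking it against `choices` can be sketched without Luigi. The `parse_choice` helper below is a hypothetical name used only for illustration and does not claim to reproduce how ChoiceParameter is implemented.

```python
def parse_choice(raw, choices, var_type=str):
    """Convert a command-line string with var_type and check it against choices.

    Hypothetical helper for illustration; not part of Luigi.
    """
    value = var_type(raw)
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {sorted(choices)}")
    return value


# "0.1" arrives as text from the command line and is converted to a float.
accepted = parse_choice("0.1", choices=[0.1, 0.2, 0.3], var_type=float)

# A value outside the allowed choices is rejected.
try:
    parse_choice("0.5", choices=[0.1, 0.2, 0.3], var_type=float)
    rejected = False
except ValueError:
    rejected = True
```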
- class luigi.contrib.hdfs.webhdfs_client.WebHdfsClient(host=None, port=None, user=None, client_type=None)[source]
A webhdfs client that tries to conform to Luigi's interface for file existence.
The library uses this API.
- property url
- property client
- remove(hdfs_path, recursive=True, skip_trash=False)[source]
Remove file or directory at location path.
- Parameters:
path (str) – a path within the FileSystem to remove.
recursive (bool) – if the path is a directory, recursively remove the directory and all of its descendants. Defaults to True.
- read(hdfs_path, offset=0, length=None, buffer_size=None, chunk_size=1024, buffer_char=None)[source]
- mkdir(path, parents=True, mode=493, raise_if_exists=False)[source]
Has no return value (just like WebHDFS).
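The default `mode=493` in the signature is the decimal form of the octal permission value `0o755` (read, write, and execute for the owner; read and execute for group and others). A quick check using only the standard library, independent of Luigi:

```python
import stat

mode = 493  # default from the mkdir signature

# Decimal 493 and octal 0o755 are the same integer.
as_octal = oct(mode)

# stat.filemode renders permission bits the way `ls -l` does; the directory
# type bit is added here only so the leading character is "d".
as_text = stat.filemode(stat.S_IFDIR | mode)
```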
- listdir(path, ignore_directories=False, ignore_files=False, include_size=False, include_type=False, include_time=False, recursive=False)[source]
Return a list of files rooted in path.
This returns an iterable of the files rooted at path. This is intended to be a recursive listing.
- Parameters:
path (str) – a path within the FileSystem to list.
Note: This method is optional; not all FileSystem subclasses implement it.