Installation#
You can directly install it within your virtual environment:
pip install renameit
Additional library support for a specific cloud provider can be installed as extra requirement:
pip install renameit[aws, azure, google]
If you don't want to worry about cluttering an existing production environment, we provide a PEX file for each release that can be directly downloaded:
curl https://github.com/Hasan-J/renameit/releases/download/{release-name}/renameit.pex -o renameit
We also publish a docker image to docker hub for each release:
docker pull hasanj/renameit:<tag-name>
NOTE: Both pex file and docker image contain all dependencies
Usage#
renameit needs some kind of user input to tell it what kind of jobs to run.
Users can basically define a list of jobs that will be run in sequence (job parralalism could be introduced in later versions).
Configure your jobs#
Provide job configuration file (.ini, .toml, .json, .yaml, .yml) in the user directory,
~/.config/renameit/config.<extension> for Linux and macos or %USERPROFILE%/.config/renameit/config.<extension>
for windows.
config.yaml example
version: 0.1
jobs:
- name: local_job_1
operation: copy
file_system:
type: local
args:
source_dir: /data/source_files
target_dir: /data/target_files
recursive: True
rename_handler:
type: basic
args:
method: add_prefix
value: TEST_PREFIX_
Run jobs:#
Using virtual environment or pex file#
renameit
or with a custom config file:
renameit --config-path /custom/path/config.yaml
Using docker:#
Docker run needs some additional mount points for the config file,
suppose you have a local dir that contains the config file /my/configs/config.yaml
docker run --rm -v '/my/configs:/configs' hasanj/renameit:<tag-name> \
renameit --config-path /configs/config.yaml
If you're using the local file system, you also need to provide mount point for directories that should contain
source and renamed files:
docker run --rm \
-v '/my/configs:/configs' \
-v '/data:/data' \
hasanj/renameit:<tag-name> renameit --config-path /configs/config.yaml
Optionally change log level
Define environment variable LOGLEVEL before running the command, default is DEBUG.
Job Configuration schema#
Lets walk through the above example to understand the different components needed:
- First
versionkey defines the config schema version (fixed/latest is0.1) jobskey contains the list of jobs the user needs to run-
Each job dict needs to have the following key value pairs:
- name: user-defined string value
- operation: can be either
copyorrename(depends onfile_system) - file_system: the file storage system where files are located (available file systems)
- rename_handler: specifies what type of file name transformation should be done and how (available rename handlers)
run the above job definition
Assuming we are running on linux with local dir /data/source_files containing 2 files:
/data/source_files/file1.txt
/data/source_files/subfolder/file2.txt
output:
/home/foo:~$ export LOGLEVEL=INFO; renameit --config-path config.yaml
2022-01-01 04:00:00 PM [INFO] Parsing 'yaml' config file from '/home/foo/config.yaml'
2022-01-01 04:00:00 PM [INFO] Running job local_job_1
2022-01-01 04:00:00 PM [INFO] Copying: /data/source_files/file1.txt to /data/target_files/TEST_PREFIX_file1.txt
2022-01-01 04:00:00 PM [INFO] Copying: /data/source_files/subfolder/file2.txt to /data/target_files/subfolder/TEST_PREFIX_file2.txt
File systems#
| type | description | operations | docs |
|---|---|---|---|
| local | Operate on local file system | copy, rename | FileSystemLocal |
| azure_datalake_storage | Operate on azure datalake storage | rename | FileSystemAzureDatalakeStorage |
| aws_s3 | Operate on aws s3 storage | copy | FileSystemAwsS3 |
| google_cloud_storage | Operate on google cloud storage | copy, rename | FileSystemGoogleCloudStorage |
Rename handlers#
| type | description | docs |
|---|---|---|
| basic | Provides simple file name transformations | BasicHandler |
| regex | Use regular expressions to match and replace patterns in file names | RegexHandler |
| datetime | Use datetime values and place them in file names | DateTimeHandler |