Unveiling a Python wrapper script that uses multi-threaded compression for swift archive creation. The script pipes the output of tar through pigz, a parallel implementation of gzip, to speed up the creation of .tar.gz archives.
Ever had to create a .tar.gz archive for a large directory and found yourself twiddling your thumbs as you waited for the process to complete? My Python wrapper script could be the answer you’re looking for. Designed to run from the shell, it creates a .tar.gz archive from any directory. It uses pigz to compress on multiple threads, which makes the process considerably faster.
In this blog post, we’ll cover how to install the prerequisites, how to use the script, and how the Python code that powers it all works.
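Before we get started, note that the script relies on the following Python modules. All three are part of the Python standard library, so they ship with Python and need no separate installation: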
os
argparse
subprocess
Additionally, pigz needs to be installed on your system. On a Mac, you can install it using Homebrew with the following command:
brew install pigz
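Before running the script, it can help to confirm that both tar and pigz can actually be found on your PATH. The following is a minimal sketch using shutil.which from the standard library. The helper name missing_tools is hypothetical and is not part of the script in this post.

```python
import shutil


def missing_tools(tools=("tar", "pigz")):
    """Return the names from `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("tar and pigz are both available.")
```

If pigz shows up as missing right after installation, opening a new shell so that PATH is refreshed is usually worth trying first.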
Simply copy the Python script from the bottom of this post into a file named parallelized_tar.py or any other name you prefer.
Execute the script in the shell using the following command:
python script_name.py /path/to/input/folder /path/to/output/folder --threads 10
In the command, script_name.py serves as a stand-in for whatever name you’ve assigned to the saved script. Substitute /path/to/input/folder with the specific directory you wish to compress into an archive, and replace /path/to/output/folder with the target directory where you want the resulting .tar.gz archive to reside. The archive takes its name from the last segment of the provided input folder path, so the command above produces folder.tar.gz inside the output folder.
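The naming rule can be sketched in isolation. The helper name archive_path and the example paths below are hypothetical and used only for illustration; the logic mirrors the os.path calls in the script.

```python
import os


def archive_path(input_folder, output_folder):
    """Return the path where the .tar.gz archive would be written."""
    # normpath strips a trailing slash so basename returns the last segment
    name = os.path.basename(os.path.normpath(input_folder))
    return os.path.join(output_folder, name + ".tar.gz")


# On macOS or Linux this prints /backups/photos.tar.gz
print(archive_path("/data/projects/photos/", "/backups"))
```

Note that a trailing slash on the input path does not change the result, because the path is normalized before its last segment is taken.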
The script uses command-line arguments for better flexibility:
input_folder: The path to the folder you want to archive.
output_folder: The path where you’d like the .tar.gz archive to be saved.
--threads: Optional. The number of threads that pigz will use for compression. The default is 10 threads.
Here’s a quick rundown of what the code does:
import os
import argparse
import subprocess


def create_tar_gz(input_folder, output_folder, num_threads):
    # Create the output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    # Use the last part of the input_folder path as the archive name
    archive_name = os.path.basename(os.path.normpath(input_folder)) + '.tar.gz'
    output_path = os.path.join(output_folder, archive_name)

    try:
        # Pipe tar's output straight into pigz so archiving and compression overlap
        with open(output_path, 'wb') as f_out:
            p1 = subprocess.Popen(["tar", "cf", "-", input_folder], stdout=subprocess.PIPE)
            p2 = subprocess.Popen(["pigz", "-p", str(num_threads)], stdin=p1.stdout, stdout=f_out)
            # Close our copy of the pipe so tar receives SIGPIPE if pigz exits early
            p1.stdout.close()
            p2.communicate()
            p1.wait()
        # Report success only if both commands exited cleanly
        if p1.returncode != 0 or p2.returncode != 0:
            raise RuntimeError(f"tar exited with {p1.returncode}, pigz with {p2.returncode}")
        print(f"Archive created successfully at {output_path}")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Create a tar.gz archive from an input folder.')
    parser.add_argument('input_folder', type=str, help='Path to the input folder.')
    parser.add_argument('output_folder', type=str, help='Path to the output folder.')
    parser.add_argument('--threads', type=int, default=10, help='Number of threads to use.')
    args = parser.parse_args()
    create_tar_gz(args.input_folder, args.output_folder, args.threads)
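One detail worth knowing: tar records member paths as they were given on the command line, so archiving /path/to/input/folder stores the whole path/to/input/folder hierarchy inside the archive. A possible variation, which is not part of the script above, is to use tar's -C option so that only the final folder name is stored. The helper name build_tar_command is hypothetical, and this is a sketch rather than a drop-in replacement.

```python
import os


def build_tar_command(input_folder):
    """Build a tar command that stores paths relative to the folder's parent."""
    # abspath resolves relative inputs and drops any trailing slash
    folder = os.path.abspath(input_folder)
    parent, name = os.path.split(folder)
    # -C makes tar change into `parent` before adding `name` to the archive
    return ["tar", "-cf", "-", "-C", parent, name]


print(build_tar_command("/path/to/input/folder"))
```

The returned list could be passed to the first subprocess.Popen call in place of ["tar", "cf", "-", input_folder]. Extracting the archive would then yield a single top-level directory named after the input folder.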
If you encounter any errors while using the script, the most common cause is a missing dependency: make sure pigz and the required Python libraries are installed.
In the fast-paced world of technology, time is of the essence. With this Python wrapper script for parallelized .tar.gz archive creation, you can speed up your archiving operations and save valuable time. Whether you’re backing up important data or transferring files, this tool makes the process smooth, swift, and efficient.
Feel free to download the script, customize it as per your requirements, and let us know your thoughts in the comments section below.
Happy archiving!