Live HBase Cluster Backup Using Export & Import Utility

As part of a recent project, we implemented a live HBase cluster backup mechanism for one of our clients. The goal was a simple, efficient backup strategy that supports both full and incremental backups using native HBase utilities. In this post, we walk through the script-based solution we developed to automate those backups with HBase’s native Export and Import utilities.
Why Use HBase Export/Import for Backups?
HBase’s Export utility provides a flexible way to export table data to HDFS, while the Import utility allows restoring it. This approach is ideal for:
- Live cluster backups (no downtime required).
- Point-in-time recovery with incremental backups.
- Cross-cluster data migration.
While the import functionality is still undergoing testing, this blog focuses on the export process, which forms the backbone of the backup workflow.
Features of the Script
⦁ Supports full and incremental backups
    ⦁ Full backup: captures all data versions up to the current timestamp.
    ⦁ Incremental backup: backs up changes from the last 4 hours.
⦁ Automated timestamp management
    ⦁ Dynamically calculates start/end timestamps for incremental backups.
    ⦁ Organizes backups in HDFS and copies them to a local path.
⦁ Simple command-line interface
    ⦁ Specify the table name and backup type (full/incremental) as arguments.
Prerequisites
⦁ An HBase cluster, with permission to run commands as the hdfs user.
⦁ Hadoop CLI tools installed on the node running the script.
⦁ Sufficient HDFS and local disk space for backups.
Script Overview
Usage Guide
# Full Backup
sh hbase_backup.sh [TBL_NAME] FULL_BACKUP
# Incremental Backup (last 4 hours)
sh hbase_backup.sh [TBL_NAME] INCREMENTAL_BACKUP
Key Parameters
⦁ TBL_NAME: HBase table to back up.
⦁ BACKUP_TYPE: FULL_BACKUP or INCREMENTAL_BACKUP.
⦁ Backup paths:
⦁ HDFS: /apps/hbase/[BACKUP_TYPE]/[TABLE_NAME]_[TIMESTAMP]
⦁ Local: /hbase-backup/[FULL|INCREMENTAL]
Step-by-Step Explanation
1. Full Backup Workflow
⦁ Captures all data versions by setting versionNumber=2147483647 (Integer.MAX_VALUE, the largest value the Export job accepts for its versions argument).
⦁ Exports data from the earliest timestamp (-2147483648) to the current time.
⦁ Copies the backup from HDFS to the local path /hbase-backup/FULL.
Command Example:
sh hbase_backup.sh my_table FULL_BACKUP
2. Incremental Backup Workflow
⦁ Backs up changes from 4 hours prior to the current time.
⦁ Uses versionNumber=2147483647 to limit versions (adjustable based on use case).
⦁ Stores backups in /hbase-backup/INCREMENTAL.
Command Example:
sh hbase_backup.sh my_table INCREMENTAL_BACKUP
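For reference, the 4-hour window used by incremental mode can be computed with GNU date. HBase cell timestamps are in epoch milliseconds, hence the appended 000. A minimal sketch (the variable names mirror the script; the echo line is ours for illustration):

```shell
# Start/end of the incremental window, as epoch milliseconds.
# Appending "000" converts whole seconds to milliseconds, matching
# the granularity HBase uses for cell timestamps.
backupStartTimestamp="$(date --date='4 hours ago' +%s)000"
backupEndTimestamp="$(date +%s)000"
echo "Exporting edits between $backupStartTimestamp and $backupEndTimestamp"
```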
3. Export Logic
The script uses HBase’s org.apache.hadoop.hbase.mapreduce.Export class to dump table data to HDFS:
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
[TABLE_NAME] \
[HDFS_BACKUP_PATH] \
[VERSIONS] \
[START_TIMESTAMP] \
[END_TIMESTAMP]
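Substituting concrete values makes the call easier to read. The snippet below only assembles and prints the command rather than running it against a cluster; my_table, the backup path, and the timestamp suffix are placeholders for illustration. Versions, start time, and end time are optional positional arguments of the Export job:

```shell
# Assemble the Export invocation with example values (not executed here).
TABLE_NAME="my_table"                                    # placeholder table
HDFS_BACKUP_PATH="/apps/hbase/FULL_BACKUP/${TABLE_NAME}_202501011200"
VERSIONS="2147483647"        # Integer.MAX_VALUE: export every cell version
START_TS="0"                 # from the beginning of time
END_TS="$(date +%s)000"      # up to now, in milliseconds

CMD="sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export $TABLE_NAME $HDFS_BACKUP_PATH $VERSIONS $START_TS $END_TS"
echo "$CMD"
```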
4. Copying Backups to Local Disk
After exporting to HDFS, the script runs:
hadoop fs -copyToLocal [HDFS_PATH] [LOCAL_PATH]
Code
#!/bin/bash
CURRENTTIME="$(date +'%Y%m%d%H%M')"
DATE="$(date +'%Y%m%d')"
TABLE_NAME=$1
EXPORT_CMD="sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export"

# Export a table to HDFS.
# Args: TABLE, HDFS_BASE_PATH, TIMESTAMP, VERSIONS, START_TS, END_TS
exportHbaseTables()
{
    echo "Backing up HBase table $1 ..."
    $EXPORT_CMD "$1" "$2/${1}_${3}" "$4" "$5" "$6"
    hadoop fs -ls "$2/${1}_${3}"
    echo "Backup location on HDFS: $2/"
    echo "Backup complete."
}

# Copy an HDFS backup directory to the local filesystem.
dumptables() {
    echo "Dumping HDFS backup file to local filesystem..."
    mkdir -p "$2"
    hadoop fs -copyToLocal "$1" "$2"
}

backupType=$2
if [ -z "$TABLE_NAME" ] || [ -z "$backupType" ]; then
    echo -e "For a complete backup, use the below command:"
    echo -e "usage: sh filename.sh TBL_NAME FULL_BACKUP\n"
    echo -e "For an incremental backup (last 4 hours), use the below command:"
    echo -e "usage: sh filename.sh TBL_NAME INCREMENTAL_BACKUP\n"
    exit 1
fi

# Base path for the HDFS backup location
BASE_PATH="/apps/hbase"

if [ "$backupType" == "FULL_BACKUP" ]; then
    # Complete backup of all versions
    echo -e "HBASE BACKUP: Starting FULL_BACKUP\n"
    backupStartTimestamp="-2147483648"      # earliest possible timestamp
    backupEndTimestamp="$(date +%s)000"     # now, in milliseconds
    versionNumber="2147483647"              # Integer.MAX_VALUE: keep all versions
    # Create the backup base path
    BACKUP_BASE_PATH="$BASE_PATH/$backupType"
    echo -e "Starting FULL BACKUP - data will be stored in $BACKUP_BASE_PATH/"
    exportHbaseTables "$TABLE_NAME" "$BACKUP_BASE_PATH" "$CURRENTTIME" "$versionNumber" "$backupStartTimestamp" "$backupEndTimestamp"
    # Dump HDFS to local
    dumptables "$BACKUP_BASE_PATH/${TABLE_NAME}_$CURRENTTIME" "/hbase-backup/FULL"
elif [ "$backupType" == "INCREMENTAL_BACKUP" ]; then
    echo -e "HBASE BACKUP: Starting INCREMENTAL_BACKUP\n"
    backupStartTimestamp="$(date --date='4 hours ago' +%s)000"
    backupEndTimestamp="$(date +%s)000"
    versionNumber="2147483647"
    BACKUP_BASE_PATH="$BASE_PATH/$backupType"
    echo -e "Starting INCREMENTAL BACKUP - data will be stored in $BACKUP_BASE_PATH/"
    exportHbaseTables "$TABLE_NAME" "$BACKUP_BASE_PATH" "$CURRENTTIME" "$versionNumber" "$backupStartTimestamp" "$backupEndTimestamp"
    # Dump HDFS to local
    dumptables "$BACKUP_BASE_PATH/${TABLE_NAME}_$CURRENTTIME" "/hbase-backup/INCREMENTAL"
else
    echo -e "Enter a valid backup type: FULL_BACKUP or INCREMENTAL_BACKUP\n"
    exit 1
fi
Key Notes
⦁ Import Testing: While the export process is stable, the import utility requires thorough testing (e.g., restoring backups to a test cluster).
⦁ Retention Policy: Add cleanup logic for old backups to avoid disk exhaustion.
⦁ Security: Ensure sensitive data is encrypted in transit and at rest.
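The retention note above can be implemented with a small helper. This is a minimal sketch, not part of the original script; the prune_backups name, the 7-day window, and the /hbase-backup layout are our assumptions:

```shell
# prune_backups ROOT DAYS: remove backup directories directly under ROOT
# whose modification time is older than DAYS days (local filesystem only).
prune_backups() {
    local root="$1" days="$2"
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +
}

# Example: keep 7 days of local FULL backups (path used by the script above).
# prune_backups /hbase-backup/FULL 7
```

The HDFS side would need `hadoop fs -rm -r` against a live cluster, so it is left out here; a daily cron entry calling prune_backups covers the local copies.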
Upcoming: Import Utility
⦁ Automate Retention: Add cron jobs to delete backups older than the required number of days.
⦁ Restore Validation: Test the import script to restore backups from both HDFS and local paths into HBase.
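For the curious, the restore side would look roughly like the following. As noted above it is still untested on our end, and the table name and path are placeholders; the snippet only assembles and prints the command. The target table must already exist with matching column families, since Import replays the exported cells rather than recreating the schema:

```shell
# Build (but do not run) the Import counterpart of the Export command.
TABLE_NAME="my_table"                                                   # placeholder
HDFS_BACKUP_PATH="/apps/hbase/FULL_BACKUP/${TABLE_NAME}_202501011200"   # placeholder

IMPORT_CMD="sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Import $TABLE_NAME $HDFS_BACKUP_PATH"
echo "$IMPORT_CMD"
```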
Stay tuned for a follow-up post!
Feedback or suggestions?
Have you implemented a different backup strategy or faced any challenges with HBase data exports? We’d love to hear your thoughts in the comments!


