
Live HBase Cluster Backup Using Export & Import Utility

As part of a recent project, we implemented a live HBase cluster backup mechanism for one of our clients. The goal was a simple, efficient backup strategy that supports both full and incremental backups using native HBase utilities. In this post, we walk through the script-based solution we developed to automate those backups with HBase’s Export and Import utilities.

Why Use HBase Export/Import for Backups?

HBase’s Export utility provides a flexible way to dump table data to HDFS, while the Import utility restores it. This approach is ideal for:

  • Live cluster backups (no downtime required).
  • Point-in-time recovery with incremental backups.
  • Cross-cluster data migration.

While the import functionality is still undergoing testing, this blog focuses on the export process, which forms the backbone of the backup workflow.

Features of the Script

  • Supports full and incremental backups
      • Full backup: captures all data versions up to the current timestamp.
      • Incremental backup: backs up changes from the last 4 hours.
  • Automated timestamp management
      • Dynamically calculates start/end timestamps for incremental backups.
      • Organizes backups in HDFS and copies them to a local path.
  • Simple command-line interface
      • Specify the table name and backup type (full/incremental) as arguments.

Prerequisites

  • HBase cluster with HDFS user permissions.
  • Hadoop CLI tools installed on the node running the script.
  • Sufficient HDFS and local disk space for backups.

Script Overview

Usage Guide

# Full backup
sh hbase_backup.sh [TBL_NAME] FULL_BACKUP

# Incremental backup (last 4 hours)
sh hbase_backup.sh [TBL_NAME] INCREMENTAL_BACKUP

Key Parameters

  • TBL_NAME: HBase table to back up.
  • BACKUP_TYPE: FULL_BACKUP or INCREMENTAL_BACKUP.
  • Backup paths:
      • HDFS: /apps/hbase/[BACKUP_TYPE]/[TABLE_NAME]_[TIMESTAMP]
      • Local: /hbase-backup/[FULL|INCREMENTAL]
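To make the path layout concrete, here is a minimal sketch of how the script assembles the HDFS backup path. The table name and timestamp below are illustrative examples, not values from the post:

```shell
# Illustration of how the backup path is assembled from its parts.
BASE_PATH="/apps/hbase"
backupType="FULL_BACKUP"
TABLE_NAME="my_table"                 # example table name
CURRENTTIME="202501010000"            # example of date +'%Y%m%d%H%M'
HDFS_PATH="$BASE_PATH/$backupType/${TABLE_NAME}_${CURRENTTIME}"
echo "$HDFS_PATH"   # /apps/hbase/FULL_BACKUP/my_table_202501010000
```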

Step-by-Step Explanation

1. Full Backup Workflow

  • Captures all versions of the data by setting versionNumber=2147483647 (the maximum 32-bit signed integer, i.e. effectively unlimited versions).
  • Exports data from the earliest timestamp (-2147483648) to the current time.
  • Copies the backup from HDFS to the local path /hbase-backup/FULL.

Command Example:

sh hbase_backup.sh my_table FULL_BACKUP

2. Incremental Backup Workflow

  • Backs up changes from 4 hours before the current time.
  • Uses versionNumber=2147483647 to limit versions (adjustable based on use case).
  • Stores backups in /hbase-backup/INCREMENTAL.

Command Example:

sh hbase_backup.sh my_table INCREMENTAL_BACKUP
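The 4-hour window comes from simple date arithmetic. A quick sketch of the calculation (requires GNU date for the --date option; HBase cell timestamps are epoch milliseconds, which is why the script appends "000" to the seconds value):

```shell
# Compute the incremental export window: `date +%s` prints epoch seconds,
# while HBase timestamps are milliseconds, hence the appended "000".
backupStartTimestamp="$(date --date='4 hours ago' +%s)000"
backupEndTimestamp="$(date +%s)000"
echo "Export window: $backupStartTimestamp -> $backupEndTimestamp"
```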

3. Export Logic

The script uses HBase’s org.apache.hadoop.hbase.mapreduce.Export class to dump table data to HDFS:

sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
    [TABLE_NAME] \
    [HDFS_BACKUP_PATH] \
    [VERSIONS] \
    [START_TIMESTAMP] \
    [END_TIMESTAMP]

4. Copying Backups to Local Disk

After exporting to HDFS, the script runs:

hadoop fs -copyToLocal [HDFS_PATH] [LOCAL_PATH]  

Code

#!/bin/bash

CURRENTTIME="$(date +'%Y%m%d%H%M')"
TABLE_NAME=$1
backupType=$2
EXPORT_CMD="sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export"

# Export one table to HDFS.
# Arguments: table, HDFS base path, timestamp suffix, versions, start ts, end ts
exportHbaseTables()
{
    echo "Backing up HBase tables ..."
    $EXPORT_CMD "$1" "$2/${1}_${3}" "$4" "$5" "$6"
    hadoop fs -ls "$2/${1}_${3}"
    echo "Backup location on HDFS: hadoop fs -ls $2/"
    echo "Backup complete ..."
}

# Copy an HDFS backup directory to the local filesystem.
dumptables() {
    echo "Dumping HDFS backup file to local filesystem..."
    hadoop fs -copyToLocal "$1" "$2"
}

if [ -z "$TABLE_NAME" ] || [ -z "$backupType" ]; then
    echo -e "For a complete backup, use the command below:"
    echo -e "usage: sh hbase_backup.sh TBL_NAME FULL_BACKUP\n"
    echo -e "For an incremental backup (last 4 hours), use:"
    echo -e "usage: sh hbase_backup.sh TBL_NAME INCREMENTAL_BACKUP\n"
    exit 1
fi

# Base path for the HDFS backup location
BASE_PATH="/apps/hbase"

if [ "$backupType" == "FULL_BACKUP" ]; then
    # Complete backup of all versions
    echo -e "HBASE BACKUP: Starting FULL_BACKUP\n"
    backupStartTimestamp="-2147483648"
    backupEndTimestamp="$(date +%s)000"
    versionNumber="2147483647"
    BACKUP_BASE_PATH="$BASE_PATH/$backupType"
    echo -e "Starting FULL BACKUP - data will be stored in $BACKUP_BASE_PATH/"
    exportHbaseTables "$TABLE_NAME" "$BACKUP_BASE_PATH" "$CURRENTTIME" "$versionNumber" "$backupStartTimestamp" "$backupEndTimestamp"
    # Dump HDFS backup to local disk
    dumptables "$BACKUP_BASE_PATH/${TABLE_NAME}_$CURRENTTIME" "/hbase-backup/FULL"
elif [ "$backupType" == "INCREMENTAL_BACKUP" ]; then
    # Incremental backup: changes from the last 4 hours
    echo -e "HBASE BACKUP: Starting INCREMENTAL_BACKUP\n"
    backupStartTimestamp="$(date --date='4 hours ago' +%s)000"
    backupEndTimestamp="$(date +%s)000"
    versionNumber="2147483647"
    BACKUP_BASE_PATH="$BASE_PATH/$backupType"
    echo -e "Starting INCREMENTAL BACKUP - data will be stored in $BACKUP_BASE_PATH/"
    exportHbaseTables "$TABLE_NAME" "$BACKUP_BASE_PATH" "$CURRENTTIME" "$versionNumber" "$backupStartTimestamp" "$backupEndTimestamp"
    # Dump HDFS backup to local disk
    dumptables "$BACKUP_BASE_PATH/${TABLE_NAME}_$CURRENTTIME" "/hbase-backup/INCREMENTAL"
else
    echo -e "Enter a valid backup type: FULL_BACKUP or INCREMENTAL_BACKUP\n"
    exit 1
fi

Key Notes

  • Import testing: while the export process is stable, the import utility still requires thorough testing (e.g., restoring backups to a test cluster).
  • Retention policy: add cleanup logic for old backups to avoid disk exhaustion.
  • Security: ensure sensitive data is encrypted in transit and at rest.
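To make the retention note concrete, here is a minimal cleanup sketch. The 7-day cutoff and the /hbase-backup/{FULL,INCREMENTAL}/<table>_<timestamp> directory layout are assumptions for illustration, not part of the backup script:

```shell
# Retention sketch: delete local backup directories older than a cutoff.
# Assumes backups live two levels below the root, e.g.
# /hbase-backup/FULL/my_table_202501010000
RETENTION_DAYS=7
LOCAL_BACKUP_ROOT="/hbase-backup"
if [ -d "$LOCAL_BACKUP_ROOT" ]; then
    find "$LOCAL_BACKUP_ROOT" -mindepth 2 -maxdepth 2 -type d \
        -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
fi
# The HDFS side can be cleaned up similarly, e.g.:
#   hadoop fs -rm -r -skipTrash /apps/hbase/FULL_BACKUP/<old_backup_dir>
```

Scheduled from cron (for example, a hypothetical `0 2 * * * sh hbase_backup_cleanup.sh` entry), this keeps disk usage bounded.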

Upcoming: Import Utility

  • Automate retention: add cron jobs to delete backups older than the required number of days.
  • Restore validation: test the import script to restore backups from both HDFS and local paths into HBase.
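As a preview of that restore step, the Import counterpart of Export would look roughly like this. The table name and backup path are illustrative, and the command needs a live cluster, so this sketch only assembles and prints the invocation; note the target table must already exist with matching column families before importing:

```shell
# Restore sketch: Import is the counterpart of the Export class used above.
# Names here are illustrative; the command is printed, not executed.
TABLE_NAME="my_table"
BACKUP_PATH="/apps/hbase/FULL_BACKUP/my_table_202501010000"
IMPORT_CMD="sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Import $TABLE_NAME $BACKUP_PATH"
echo "$IMPORT_CMD"
```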

Stay tuned for a follow-up post!

Feedback or suggestions?

Have you implemented a different backup strategy or faced any challenges with HBase data exports? We’d love to hear your thoughts in the comments!
