Secure and Reliable Long-Term Backups

Secure and Reliable Long-Term Backups on Linux 🔐

After many years of experience protecting information in all kinds of infrastructure, I have some hints to share if you are looking for a reliable, yet simple and affordable, solution for personal backups.

At some point in the first years of my career I decided to study Digital Forensics, where I needed to understand pretty well how storage devices work, how data is organized by different file systems, and how to recover lost fragments of data without damaging an already fragile storage device.

With this knowledge in my head, I had the opportunity to work on some very interesting cases. Curiously, most of them did not involve investigating malicious activity, but were in fact about recovering information that was lost through either human error or hardware failure. It didn’t take long until I realized that data recovery work is completely inefficient, time-consuming, and boring. Also, in almost all the cases I worked on, the issues could have been completely avoided in much simpler, much cheaper, and much more efficient ways if a good backup policy had been in place.

So, backup is extremely important if you want peace of mind. At the enterprise level there are plenty of nice solutions, but the options for individuals are not always accessible, flexible, or simple enough.

In this article I will share with you my recommendation for a personal backup strategy that is reliable, simple, and affordable.

Physical Device

When it comes to long-term backups on Linux, making the right choice of hardware and software is essential. Let’s start by considering the HDD and SSD options.

HDD (Hard Disk Drive)

Advantages

  • Proven reliability for cold storage (rarely powered on, periodic syncs).
  • Low risk of data loss due to power-off degradation.
  • Cost-effective for large capacities.
  • Predictable aging - mechanical parts wear, but data often stays intact for years.

Disadvantages

  • More vulnerable to mechanical failure if handled improperly.
  • Slower read/write speeds and higher latency.
  • Requires careful storage conditions (temperature/humidity).

SSD (Solid State Drive)

Advantages

  • Fast read/write and low latency.
  • More resistant to physical shocks since there are no moving parts.
  • Silent and energy-efficient.

Disadvantages

  • Higher risk of data degradation over long periods without power.
  • Limited write cycles - modern drives improved but still finite.
  • Higher cost per TB.

My experience with HDDs

I have a few backup HDDs that are 10-12 years old, still working pretty well, and whose data is intact. I never bought the most expensive disks and never cared much about brands, but I was always diligent about the technical specifications, selecting good disks based on my own criteria.

In this context, there was one isolated case involving one of the first disks I set up for backups. The disk is no longer in use, but I kept it for testing purposes. On this particular disk I noticed a couple of pictures that had become corrupted (gradual magnetic degradation) after about 10 years without much activity. The files were still listed, but I could not open them because some of their fragments had been lost over such a long time. This was an extreme case, but it confirmed the importance of powering the disk on regularly and, ideally, reading and writing data. I have never had a hardware failure, but it is better not to wait for the worst and to replace disks at reasonable intervals.

Since my purpose for this disk was to experiment and learn, I applied some techniques to recover data chunks and rebuild the file fragments, so it was possible to see the corrupted pictures again, at least partially, since some fragments could not be recovered. In practice it was like looking at a picture that had been cut horizontally, which at least gives some idea of what a corrupted image file looks like.
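One way to notice this kind of silent corruption early is to keep a checksum manifest next to the backup and re-verify it every time the disk is powered on. The sketch below is only an illustration under my own assumptions: the function names and paths are hypothetical, it relies on the GNU versions of `find`, `sort`, `xargs`, and `sha256sum`, and the manifest is expected to live outside the directory being checked.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record and verify SHA-256 checksums for a backup tree.

# Write a manifest of every file under "$1" to the file "$2".
# Keep the manifest outside "$1", otherwise it would try to hash itself.
create_manifest() {
    local dir=$1 manifest=$2
    # Paths are stored relative to the backup root, so the manifest
    # stays valid when the disk is mounted somewhere else.
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-read every file and compare it against the manifest.
# Returns non-zero if any file is missing or its content changed.
verify_manifest() {
    local dir=$1 manifest=$2
    # With no file argument, sha256sum --check reads the manifest from stdin.
    ( cd "$dir" && sha256sum --check --quiet ) < "$manifest"
}
```

For example, `create_manifest /mnt/backup/photos ~/photos.sha256` after a sync and `verify_manifest /mnt/backup/photos ~/photos.sha256` on the next power-up. A failed verification only tells you which files changed; it cannot repair them.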

RPM (Revolutions Per Minute)

If you go with HDDs, a natural question is the rotation speed (RPM), which ultimately sets physical limits on how fast data can be read or written. The common options available to end users are 5400 and 7200 RPM.

In general, a new backup disk receives a large volume of data when it is first created, but later backups tend to be incremental and the speed won’t have a relevant impact. You can always leave it synchronizing and go get a coffee. :)

Something that may be worth considering is portability and the space needed to store your disks. But this is more of a personal preference and not a big deal for the final solution. Take a look at some considerations for the two most common options.

5400 RPM (Lower speed, higher longevity)

Common in archival and energy-efficient drives. Best for cold storage and long-term backups.

Pros

  • Lower power consumption, less heat, longer lifespan.
  • Smaller disks with good portability.

Cons

  • Slower speeds (~80-120 MB/s).

7200 RPM (Standard speed, faster performance)

Common in performance HDDs. Best for regular backups, external storage, NAS.

Pros

  • Faster (~150-250 MB/s), good balance of speed & durability.

Cons

  • Slightly higher power use and heat.
  • Bigger disks with less portability.

Consideration

Essentially, a backup disk doesn’t need to be super fast. Also, in general, a high risk of physical shocks is not expected. Instead, backup disks are more commonly stored in safe locations for long periods.
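To put the speed difference in perspective, here is a rough back-of-the-envelope estimate. The data sizes are hypothetical, the throughput values are taken from the approximate ranges listed above, and real transfers will be slower with many small files or a slow USB enclosure.

```shell
#!/usr/bin/env bash
# Rough copy-time estimate in whole minutes (integer arithmetic, rounded down).
# $1 = data size in GB (decimal), $2 = sustained throughput in MB/s.
transfer_minutes() {
    local size_gb=$1 speed_mbs=$2
    echo $(( size_gb * 1000 / speed_mbs / 60 ))
}

transfer_minutes 1000 100   # initial 1 TB copy at about 100 MB/s
transfer_minutes 1000 200   # the same copy at about 200 MB/s
transfer_minutes 20 100     # a 20 GB incremental sync at about 100 MB/s
```

This prints 166, 83, and 3: the faster disk saves a bit more than an hour once, on the initial copy, while a typical incremental sync takes a few minutes either way.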

So, after many years applying this backup strategy, I have no doubt that HDDs are my preferred option for long-term backups with infrequent access. They stay reliable for longer and are much cheaper than SSDs, which allows a reasonable replacement policy with minimal investment. Actually, with the money saved you can buy two HDDs and also have a redundant backup, which is great.

Other technical aspects can help but should not have a big impact on the final solution.

File System Layer

I have used Ext3, Ext4, XFS and, more recently, Btrfs extensively. The truth is that for a personal backup solution there is no relevant difference, as long as you are using a robust and stable file system. A long time ago I concluded that Ext3 was the best available option for this scenario and, if I remember correctly, one important criterion I had in mind back then was the availability of forensic tools, just in case I needed to recover data for any reason. I know, it seems a bit like overthinking, but since there is no shortcut for experience, I preferred to be on the safe side regarding information protection.

Later I moved to Ext4 as soon as I considered it stable enough to replace Ext3. At some point I also used XFS and have nothing to complain about. Based on my last assessment a few years ago, XFS still has fewer forensic tools, but honestly I no longer expect to jump into any data recovery activity. It is simply unnecessary.

You are good with any of these options, but if you are preparing something new, I strongly recommend considering Btrfs. Let’s take a look at some details about these three options: Ext4, XFS and Btrfs.

Ext4

Key features

  • Supports volumes up to 1 EiB and files up to 16 TiB.
  • Uses journaling to protect against corruption.
  • Delayed allocation for better performance.

Pros

  • Stability and reliability - it is widely used and mature.
  • High performance for general workloads.
  • Easier to recover in case of system issues.

Cons

  • No built-in checksum for data integrity.
  • Lacks native snapshot or redundancy features.

XFS

Key features

  • Supports 8 EiB volumes and 8 EiB file sizes.
  • Uses journaling and metadata logging for data integrity.
  • Optimized for parallel I/O and high-performance workloads.

Pros

  • Excellent for large files and high I/O throughput.
  • Mature and scalable.

Cons

  • No data checksums (only metadata).
  • Does not allow shrinking partitions so you need to consider the long-term during the partition design.
  • No built-in redundancy or snapshots.

Btrfs (B-tree File System)

Key features

  • Supports 16 EiB volumes and 16 EiB file sizes.
  • Copy-on-Write (CoW) for enhanced data integrity.
  • Built-in checksum, snapshots, and RAID.

Pros

  • Checksums for both metadata and data to ensure great integrity.
  • Supports snapshots, compression, and subvolumes.
  • Offers redundancy on metadata/data, even on single disks.
  • Self-healing features when redundancy is enabled.

Cons

  • Slightly lower performance in some scenarios compared to Ext4 and XFS.
  • Complex configurations for redundancy on single disks.
  • RAID 5/6 features do not seem stable yet.

Consideration

For backup and long-term storage, Btrfs offers superior integrity protection (checksums, data duplication, self-healing).

It’s better suited than Ext4 or XFS for detecting and correcting bit rot. I really like the idea of RAID 1-style duplication at the file system level (on a single disk, Btrfs provides this through its DUP profile, which stores two copies of each block). I know this does not prevent hardware failures, but it can definitely mitigate the issue mentioned earlier, where some pictures were corrupted because some fragments were lost.

Ext4 and XFS outperform Btrfs, but honestly that is not a big deal for a long-term backup solution. Btrfs is my recommendation for a new implementation.
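In practice, the detection and self-healing described here happen when the data is actually read, so a periodic scrub is the natural companion to duplicated data. A minimal sketch follows, assuming the backup is mounted at a hypothetical `/mnt/backup`. The `run` wrapper and `DRY_RUN` variable are my own additions so that nothing is executed until you opt in.

```shell
#!/usr/bin/env bash
# Sketch of a periodic integrity check for a mounted Btrfs backup.
# DRY_RUN defaults to 1, so commands are only printed until you opt in.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 = mount point of the backup file system (hypothetical: /mnt/backup)
scrub_backup() {
    local mnt=$1
    # Read all data and metadata and verify the checksums; -B waits until done.
    run sudo btrfs scrub start -B "$mnt"
    # Summary of the last scrub, including corrected and uncorrectable errors.
    run sudo btrfs scrub status "$mnt"
    # Per-device error counters accumulated since they were last reset.
    run sudo btrfs device stats "$mnt"
}
```

Calling `scrub_backup /mnt/backup` prints the three commands. Running it as `DRY_RUN=0 scrub_backup /mnt/backup` executes them. When a checksum mismatch is found and a good second copy exists, the scrub rewrites the damaged block from that copy.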

Encryption

You have custody of your data and it is your duty to keep it protected against unauthorized access. So, for me, encryption is non-negotiable. The question is which solution is best.

I recently considered using the hardware encryption offered by some brands. But when I researched a little more, the first frustration was the lack of good software for Linux. It also bothers me to entrust my information to closed solutions whose implementation I can’t properly assess. Finally, each brand has its own solution, which makes them inflexible and hard to migrate.

So this latest research only reinforced my recommendation to use an open-source encryption solution that I know works well on any Linux system and on any disk. For me, LUKS has been the winner for more than a decade and will likely remain my recommendation for some time yet.

LUKS Encryption

Why Use LUKS (Linux Unified Key Setup)?

  • Provides full-disk encryption with strong algorithms (AES-256).
  • Multiple key slots for different passphrases or keyfiles.
  • Native support in most Linux distributions.
  • Protects against data theft, even if the disk is stolen.
  • Can be auto-unlocked using keyfiles, TPM, or smart cards.

Consideration

If your backup contains sensitive data, LUKS is a must. It’s easy to configure, secure, and flexible.
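As an illustration of the key-slot flexibility mentioned above, the sketch below adds a second passphrase and inspects the result. The device path `/dev/sdX1` and the backup file name are placeholders. The header backup step is an optional extra of my own, not something the setup requires, and the `run` wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: manage LUKS key slots on an existing LUKS partition.
# DRY_RUN defaults to 1, so commands are only printed until you opt in.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 = LUKS partition (placeholder: /dev/sdX1)
# $2 = file that will receive the header backup (hypothetical path)
manage_luks_keys() {
    local part=$1 header_backup=$2
    # Add a second passphrase to a free key slot (asks for an existing one first).
    run sudo cryptsetup luksAddKey "$part"
    # Show the cipher, key size, and which key slots are in use.
    run sudo cryptsetup luksDump "$part"
    # Optional: save a copy of the header, which holds all key slots.
    run sudo cryptsetup luksHeaderBackup "$part" --header-backup-file "$header_backup"
}
```

For example, `manage_luks_keys /dev/sdX1 ~/backup-disk-header.img` prints the three commands. If you do keep a header backup, store it somewhere other than the backup disk itself and treat it as sensitive, since it still contains the encrypted key material.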

LVM (Logical Volume Manager)

If, like me, you come from the old days and spent nights manually moving data between disks or partitions to satisfy new space demands from the business, we can easily agree that LVM is a must if you want to enjoy your free time. But is LVM also a good option for a backup disk?

LVM Benefits

  • Flexible volume management: resize partitions, add new disks.
  • Snapshots support for backups/testing.
  • Striping and mirroring (if spanning multiple disks).

LVM Downsides

  • Adds complexity - requires maintenance and understanding.
  • Mirroring across single disk partitions offers limited protection.
  • If LVM metadata is corrupted, it can complicate recovery.

Consideration

LVM is great for flexible storage management, especially if you anticipate growth or need snapshots. But for single-disk backup setups it’s often unnecessary, especially if you use Btrfs, which can also replace some of the benefits of LVM. Another downside is that using LVM on an external disk requires admin privileges to mount it properly, because the system needs to read the LVM metadata.

So, LVM for single-disk backup setups is not necessary and can bring avoidable complexity.

Summary

Hardware

  • Use a quality HDD available on the market with a good cost-benefit ratio.
  • Power up at least once a month for sync and integrity checks.
Below I provide some example commands using a 5 TB, 5400 RPM disk.
In this example, the disk is recognized by my computer as /dev/sdb.
Make sure to adapt the commands to your environment, otherwise you may lose data.
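Because a wrong device name is the easiest way to destroy data here, a small guard before any destructive command can help. This is only a sketch: the function name is hypothetical and the checks are deliberately simple (is it a block device, is anything on it mounted).

```shell
#!/usr/bin/env bash
# Refuse to continue unless "$1" is a block device with nothing mounted,
# then print its details so they can be compared with the physical label.
check_target_disk() {
    local dev=$1
    if [ ! -b "$dev" ]; then
        echo "error: $dev is not a block device" >&2
        return 1
    fi
    # Any non-empty MOUNTPOINT line means the disk or a partition is in use.
    if lsblk -no MOUNTPOINT "$dev" | grep -q .; then
        echo "error: $dev has mounted file systems" >&2
        return 1
    fi
    # Size, model, serial number, and transport (usb, sata, ...) of the disk.
    lsblk -o NAME,SIZE,MODEL,SERIAL,TRAN "$dev"
}
```

A possible use is `check_target_disk /dev/sdb && echo "details above match my backup disk?"`, proceeding only after comparing the model and serial number with the label on the drive.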

Partitioning

  • Create a partition big enough to accommodate your growing data until the planned date to replace.
  • Using Btrfs with duplicated data (RAID 1-style, the DUP profile on a single disk) reduces the usable space of the partition by about 50%. Take this into account.
    • For example, a 3 TB partition formatted with Btrfs and duplicated data can store about 1.5 TB.
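# Create a new GPT partition table (this erases the existing one)
sudo parted /dev/sdb -- mklabel gpt

# Create a 3TB partition
sudo parted -a optimal /dev/sdb -- mkpart primary btrfs 0% 3000GB

# Give partition 1 a descriptive GPT name
sudo parted /dev/sdb -- name 1 lukscrypt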
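The sizing rule from the bullets above can be written as a quick calculation. The numbers are hypothetical, and the estimate ignores file system metadata overhead, so treat the result as a lower bound.

```shell
#!/usr/bin/env bash
# Partition size in GB needed when every block is stored twice (Btrfs DUP).
# $1 = current data in GB, $2 = expected growth in GB per year,
# $3 = years until the disk is planned to be replaced.
required_partition_gb() {
    local current_gb=$1 growth_gb_per_year=$2 years=$3
    echo $(( (current_gb + growth_gb_per_year * years) * 2 ))
}

# 800 GB today, growing about 100 GB per year, disk replaced after 5 years
required_partition_gb 800 100 5
```

This prints 2600, which fits in the 3000 GB partition created above with some room to spare.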

Encryption (LUKS)

  • Encrypt the partition with LUKS.
  • Use strong passphrases or keyfiles.
# Initialize LUKS on the new partition
sudo cryptsetup luksFormat /dev/sdb1

# Open and map it to a device (we'll call it "luks_partition")
sudo cryptsetup open /dev/sdb1 luks_partition
  • Configure auto-unlock via crypttab and fstab if desired.
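For reference, here is a sketch of what the two entries could look like. The helper functions only print the lines; they do not modify any file. The UUID, key file path, and mount point are placeholders. I am assuming a key file that was previously added to a key slot with `cryptsetup luksAddKey`, and the `nofail` option so that booting without the disk attached does not fail.

```shell
#!/usr/bin/env bash
# Print candidate lines for /etc/crypttab and /etc/fstab (nothing is written).

# crypttab fields: <mapping name> <device> <key file> <options>
crypttab_entry() {
    local name=$1 uuid=$2 keyfile=$3
    printf '%s UUID=%s %s luks,nofail\n' "$name" "$uuid" "$keyfile"
}

# fstab fields: <device> <mount point> <type> <options> <dump> <pass>
fstab_entry() {
    local name=$1 mountpoint=$2
    printf '/dev/mapper/%s %s btrfs defaults,nofail 0 0\n' "$name" "$mountpoint"
}

# The UUID of the LUKS partition can be read with:
#   sudo cryptsetup luksUUID /dev/sdb1
crypttab_entry luks_partition "<uuid-of-sdb1>" /root/backup-disk.key
fstab_entry luks_partition /mnt/backup
```

Keep in mind that a key file stored on the computer means anyone with root access to that computer can unlock the backup, so for a disk that spends most of its time in a drawer, typing the passphrase may be the better trade-off.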

Filesystem (Btrfs)

  • Format the encrypted partition with Btrfs:
sudo mkfs.btrfs --metadata dup --data dup --label "my-encrypted-backup" /dev/mapper/luks_partition
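Once the file system exists, a typical backup session is: unlock, mount, confirm the duplication profile, copy, unmount, lock. The sketch below strings these steps together. The mount point and source directory are hypothetical, `rsync` is my own choice of copy tool, and the `run` wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of one backup session on the encrypted Btrfs disk.
# DRY_RUN defaults to 1, so commands are only printed until you opt in.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 = LUKS partition, $2 = mapping name, $3 = mount point, $4 = source directory
backup_cycle() {
    local part=$1 name=$2 mnt=$3 src=$4
    run sudo cryptsetup open "$part" "$name"
    run sudo mkdir -p "$mnt"
    run sudo mount "/dev/mapper/$name" "$mnt"
    # Data and Metadata should both report the DUP profile here.
    run sudo btrfs filesystem df "$mnt"
    # Copy new and changed files; the trailing slash copies the directory contents.
    run sudo rsync -a "$src/" "$mnt/"
    run sudo umount "$mnt"
    run sudo cryptsetup close "$name"
}
```

With the names used in this article, that would be `backup_cycle /dev/sdb1 luks_partition /mnt/backup /home/user/documents`, where the last two paths are examples to replace with your own.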

Conclusion

The generic solution proposed here has been tested for more than a decade and has proven to be simple, cheap, and very reliable. Most importantly, it allows anyone to create a robust backup solution without spending a lot of money, thanks to open-source technology. The only investment is in hardware, which is also minimized by the great level of coverage provided by the software layer.

There are many interesting discussions that go very deep into the tiny details of hardware specifications, file systems, encryption, etc. I certainly support any discussion that helps spread knowledge. But I also understand that most people are not experts, do not always have time to research, or have no access to proper mentorship when diving into so many technical details. I want to give you the information you need in a simplified and didactic way, so you can understand the technology more quickly and protect your data independently, with peace of mind.