A Precarious State of Storage

According to Dictionary.com, precarious has the following definition:

precarious
  1. dependent on circumstances beyond one's control; uncertain; unstable; insecure
  2. dependent on the will or pleasure of another; liable to be withdrawn or lost at the will of another
  3. exposed to or involving danger; dangerous; perilous; risky
  4. having insufficient, little, or no foundation

This is the story of how I managed to get the storage on one of my systems into a precarious state, and back out.

Over the course of the last few months, I've been rather mean to the storage of one of my systems, not the least of which was putting it through multiple file system (ext4 and Reiser) expansion and reduction cycles. Many of the expansions were done online (mounted) and all of the reductions were done offline (unmounted). I did this because of 1) how I run systems that are tight on disk space (120 GB SSD) and 2) my personal preferences.

Specifically, I LOVE LVM and have been embracing it for the better part of 15 years. As such, I tend to use separate LVs / FSs for various directories on my Linux systems. Since I tend to keep the LVs / FSs small, occasionally ~> frequently one of them will run out of space. Thankfully, due to the wonderful nature of LVM, I can expand the LV and file system. More recently I've been able to do this expansion with the file system online (mounted). Sometimes this expansion is temporary, so I will ultimately shrink the LV / file system back down to regain free space in the VG. I want the free space in the VG because I tend to create LVs / FSs as needed, even if only temporarily. Recently I've even started to use LVs as virtual disks for VMs, so they get created, used, and destroyed as transient test VMs.

After repeating this cycle for a number of months, I ended up with my root (/) file system taking longer and longer to shrink, indicating that there was (likely) an underlying file system metadata issue. (Don't ask why I was repeatedly growing and shrinking the root (/) file system, because I'm not going to tell. Suffice it to say it has to do with the less than optimal installation.) Thankfully the system (notebook) in question had an eSATA port, so I could hook up another hard drive.
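
For context, one round of that grow / shrink cycle looks something like this (a minimal sketch with made-up names and sizes; rootvg / homelv are stand-ins, and ext4 / resize2fs is assumed, since ReiserFS uses resize_reiserfs instead):

  # Grow an LV and its ext4 file system online (mounted)
  lvextend -L +2G /dev/rootvg/homelv
  resize2fs /dev/rootvg/homelv

  # Shrink it back down offline (unmounted) to reclaim space in the VG
  # (always shrink the file system first, then the LV, never the other way)
  umount /home
  e2fsck -f /dev/rootvg/homelv
  resize2fs /dev/rootvg/homelv 4G
  lvreduce -L 4G /dev/rootvg/homelv
  mount /home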

Below is a high-level overview of the process that I went through to resolve my problems.

  1. Hook up the external eSATA drive and partition it into two partitions, one for booting and the other as a PV to hold the VG.
  2. Enable (luksFormat) and activate (luksOpen) LUKS encryption on the second partition.
  3. Turn the encrypted device into a physical volume (pvcreate) for use with LVM.
  4. Extend the root volume group (vgextend) by adding the new PV to it.
  5. Create a new temporary logical volume (lvcreate -n rootNew rootvg) for each of the file systems (other than /boot) that I needed to work with.
  6. Format (mkfs.ext4) the first partition and temporary LVs.
  7. Make temporary mount points (mkdir /mnt/root) for each of the file systems that I needed to work with.
  8. Mount the temporary file systems (mount /dev/rootvg/rootNew /mnt/root).
  9. Recursively copy data from the old file system, /, to the new file system, /mnt/root, being careful not to get into a recursion loop by using the "--one-file-system" option to cp (see the copy sketch after this list).
  10. Check the error level of the copy (echo $?) to make sure there were no errors. Note: You have to do this IMMEDIATELY after the cp; any other command will replace cp's error level with its own.
  11. Do a quick sanity check and pipe the output of find into wc to count the number of items on the source and destination file systems. Again, pass an option to find (-xdev) to tell it not to cross file system boundaries.
  12. Repeat the last three steps for each additional file system.
  13. Install GRUB to make the first partition bootable. (grub, root (hd1,0), setup (hd1), quit)
  14. Update the copy of /etc/fstab on the new root LV (rootNew) to reflect the new devices, specifically rootNew for / and /dev/sdb1 for /boot. Update fstab for any additional LVs / FSs that you are migrating.
  15. Unmount the temporary mount points.
  16. Reboot the system and boot off of the new boot disk (/dev/sdb1).
  17. Presuming that all went well, you are now booted off of the new LVs / FSs.
  18. Migrate (pvmove) all LVs to the external disk to vacate the internal disk (see the pvmove sketch after this list).
  19. Remove (vgreduce) the internal disk from the VG.
  20. Wipe LVM information (pvremove) from the internal disk. - This is optional, but I find it better to gracefully back things out of service to a blank state.
  21. Remove the partition that was holding the PV. - This too is optional, but I find it better to gracefully back things out of service to a blank state.
  22. Remove the partition that was the old /boot file system. - This is also optional, but I find it better to gracefully back things out of service to a blank state.
  23. Now that I was no longer dependent on the internal disk, I was able to move it to another system where I could perform some SSD tweaks on it. (Set up Over Provisioning and tweak partition alignment.)
  24. I used a mixture of gpart and fdisk to partition the disk: gpart to find the proper sector numbers so that the partitions aligned with my SSD's Erase Block Size, then fdisk with those sector numbers because I wanted an MBR, not a GPT, on this system (see the alignment sketch after this list).
  25. Once the SSD was tweaked, I could re-introduce it to the system and reverse the process.
  26. Re-create /boot on the SSD (same steps as above: format, copy, GRUB, /etc/fstab).
  27. Re-introduce the SSD to the volume group.
  28. Migrate all of the logical volumes back from the external drive to the internal drive.
  29. Reboot the system on the internal drive.
  30. Gracefully remove the external drive from the VG. (vgreduce, pvremove, (de)partition)
  31. Presuming that all went well, you can now safely delete (lvremove) or rename (lvrename) the old logical volumes.
  32. With the old logical volumes out of the way, you can rename the new logical volumes to the old names and tweak /etc/fstab accordingly.
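
To make the first half of that list more concrete, here is roughly what getting everything onto the external drive looks like at the command line. This is a minimal sketch rather than the exact commands I ran: the device names (/dev/sdb1, /dev/sdb2), the VG name (rootvg), and the LV size are stand-ins for whatever your system uses.

  # Encrypt the second partition of the external drive and open it
  cryptsetup luksFormat /dev/sdb2
  cryptsetup luksOpen /dev/sdb2 cryptNew

  # Turn the opened device into a PV and add it to the existing VG
  pvcreate /dev/mapper/cryptNew
  vgextend rootvg /dev/mapper/cryptNew

  # Create, format, and mount a temporary LV for the root file system
  lvcreate -L 20G -n rootNew rootvg
  mkfs.ext4 /dev/sdb1                  # the new /boot partition
  mkfs.ext4 /dev/rootvg/rootNew
  mkdir /mnt/root
  mount /dev/rootvg/rootNew /mnt/root

  # Copy the data without crossing file system boundaries
  cp -a --one-file-system /. /mnt/root
  echo $?                              # check IMMEDIATELY after the cp

  # Quick sanity check: the item counts should match (or be very close)
  find / -xdev | wc -l
  find /mnt/root -xdev | wc -l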
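
The bootloader and the later vacate-the-internal-disk steps looked roughly like this. Again a sketch, not gospel: GRUB legacy is assumed (this system uses /boot/grub/menu.lst), hd1 stands in for the external drive, and /dev/sda2 stands in for the internal PV (use its /dev/mapper name instead if it also sits under LUKS).

  # Make the external drive bootable with GRUB legacy
  grub
  grub> root (hd1,0)
  grub> setup (hd1)
  grub> quit

  # Once booted from the external drive, vacate the internal disk
  pvmove /dev/sda2            # migrate all extents off of the internal PV
  vgreduce rootvg /dev/sda2   # remove the internal PV from the VG
  pvremove /dev/sda2          # wipe the LVM label from it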
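
The alignment arithmetic from the gpart / fdisk step is simple enough to show: a partition is aligned when its starting sector is a multiple of the erase block size divided by the sector size. The numbers below are purely illustrative (a 512 KiB erase block and 512-byte sectors), not my SSD's actual geometry.

  # 512 KiB erase block / 512-byte sectors = 1024 sectors per erase block,
  # so every partition should start on a multiple of 1024 sectors.
  fdisk -u /dev/sda      # -u: display and accept sector numbers, not cylinders
  # e.g. start partition 1 at sector 2048            (2048 = 2 x 1024)
  #      if partition 1 ends at sector 1050000, start partition 2 at 1050624
  #      (the next multiple of 1024 above 1050000 is 1026 x 1024 = 1050624)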

After going through the entire process, I managed to migrate my system to an external drive, vacating my internal drive in the process, and then back again, without any data loss. Along the way, I re-created any precarious file systems, thereby removing any potential problems that might bite me in the future.

Note: That's the rough process that I went through. There were some additional steps, particularly related to LUKS encryption and what my system's configuration expects. I did have to tweak GRUB (/boot/grub/menu.lst) and /etc/fstab to reflect the new UUIDs of various devices.
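
For those UUID tweaks, the general pattern is to read the new UUIDs with blkid and substitute them into /etc/fstab and /boot/grub/menu.lst by hand. A rough illustration follows; the UUID and device names are made up, not copied from my system.

  # Read the UUIDs of the new devices
  blkid /dev/sdb1 /dev/mapper/cryptNew

  # /etc/fstab - mount /boot by UUID instead of by device name
  UUID=1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d  /boot  ext4  defaults  1 2

  # /boot/grub/menu.lst (GRUB legacy) - point root= at the new root LV
  #   kernel /vmlinuz-<version> root=/dev/rootvg/rootNew ro ...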