
I did something stupid, corrupted my main linux install, and am juggling disks to back everything up. In the process I've run into a dd and/or LUKS issue I don't know how to deal with. dd does not appear to be creating a true clone!

The original disk is fine, except that the install is broken. My data still is intact. I stuck it in an external USB chassis and attached it to my laptop (exact same version of Ubuntu as the main PC was).

fdisk shows the standard LUKS encrypted constellation of 3 partitions (all ext4):

/dev/sda1 is boot,

/dev/sda2 is an extended partition consuming the rest of the disk

/dev/sda5 is the same size as /dev/sda2, but for LUKS.

If I run "cryptsetup luksOpen /dev/sdb5" and then mount, I can access the contents of the disk just fine.

I then stuck that disk and a spare disk I had floating around into my lobotomized main PC, and booted off a live ubuntu stick. Both disks were recognized and I ran:

"dd if=/dev/sda of=/dev/sdb bs=4M status=progress" 

and waited 3 hours. It ran without complaint.

I doubt it matters, but the source disk is a 1.8 TB SSD and the destination is a 3 TB HDD.

I stuck the 3 TB disk into an external chassis and attached it to my laptop.

Now, fdisk only shows /dev/sdb1 and /dev/sdb2. These look correct, but without /dev/sdb5 I can't mount the LUKS encrypted volume and I can't access my data.

My understanding is that dd copies every byte, and there's no hidden metadata it misses, but I'm not an expert on modern disk controllers. Am I missing something (other than /dev/sdb5)?

Is there something I need to do on the laptop? The passphrase should be the same if it is a true byte-clone of the original. I assume there's nothing keyed to the disk serial #, since that strikes me as something nobody would want in a software-based encryption scheme.

Any insight will be greatly appreciated! I'm hesitant to do anything until I've ascertained I can access the data on my backup disk.

gdisk output for 5 TB disk:

GPT fdisk (gdisk) version 1.0.1

EBR signature for logical partition invalid; read 0x0000, but should be 0xAA55
Error reading logical partitions! List may be truncated!
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present

Found invalid GPT and valid MBR; converting MBR to GPT format in memory.

Disk /dev/sdb: 1220942646 sectors, 4.5 TiB
Logical sector size: 4096 bytes
Disk identifier (GUID): REDACTED
Partition table holds up to 128 entries
First usable sector is 6, last usable sector is 1220942640
Partitions will be aligned on 256-sector boundaries
Total free space is 1220444971 sectors (4.5 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          499711   1.9 GiB     8300  Linux filesystem

Cheers, Ken

Mitch
  • 107,631
Kensmosis
  • 141
  • Both computers were running ubuntu 16.04. Kernel is 4.4.0-79-generic. – Kensmosis Oct 17 '20 at 00:36
  • A 3TB drive is supposed to be GPT partitioned. Was it partitioned before? Your dd will have turned it into a 1.8TB MBR drive, but it may still have a backup GPT partition table at the end. Post this: sudo gdisk -l /dev/sdX where the X is whichever letter the drive is now seen as (sda, sdb, sdc, etc.). – oldfred Oct 17 '20 at 02:31
  • Are you sure that device names in the live environment matched what you were used to? – vidarlo Oct 17 '20 at 10:38

2 Answers


It appears this is related to the use of USB chassis to test the 3.5" backup HDDs. Although the answer is unrelated to dd, I'll post it in case someone runs into a similar issue.

The original drive uses 512 byte sectors, but apparently most external 3.5" USB chassis use a chipset which decides that windows is the only OS anyone could possibly want to use. Bill Gates says sectors are 4K, so they must be 4K. The chipset cheerfully reports the original start sector numbers for the partitions, but it also reports a 4K sector size, so linux converts those sector numbers to the wrong byte offsets. That also explains the 14.6T partition size reported.
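
To make the arithmetic concrete, here is a small sketch using partition 1 from the gdisk output in the question (start sector 2048, end sector 499711). The function names are made up for illustration. The only facts it relies on are the 512-byte sector size of the original drive and the 4096-byte size the chassis reports.

```shell
#!/bin/sh
# Byte offset of a partition = start sector * sector size.
# offset_bytes and size_bytes are illustrative helper names, not standard tools.

offset_bytes() {
    # $1 = start sector, $2 = sector size in bytes
    echo $(( $1 * $2 ))
}

size_bytes() {
    # $1 = start sector, $2 = end sector (inclusive), $3 = sector size in bytes
    echo $(( ($2 - $1 + 1) * $3 ))
}

echo "true offset (512-byte sectors):      $(offset_bytes 2048 512)"
echo "offset as seen through the chassis:  $(offset_bytes 2048 4096)"
echo "true size of partition 1:            $(size_bytes 2048 499711 512)"
echo "size as seen through the chassis:    $(size_bytes 2048 499711 4096)"
```

With 512-byte sectors, partition 1 starts 1 MiB into the disk and is about 243 MiB long. Multiplying the same sector numbers by 4096 moves the start to 8 MiB and inflates the size to the 1.9 GiB that gdisk printed, so every partition appears eight times further in and eight times larger than it really is.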

Using a loopback device with the right offset does allow mounting and access of the partitions --- though I still seem to be unable to access the LUKS sdb5 partition (since it isn't reported), even when I put in the address calculated from the offset on the original drive. The data's there, but the screwy reporting keeps cryptsetup from working. Not sure if there is a workaround other than getting a less presumptuous usb chassis.
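
For anyone wanting to try the loopback route, the general shape of it is sketched below. This is a dry run that only prints the commands. The device name /dev/sdX, the start sector 501760 and the mapper name backup_crypt are placeholders invented for the example, not values from my disks; the real start sector has to be read from fdisk on the original drive (or on the clone attached via SATA). I can't promise it will succeed behind a chassis that misreports the sector size.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that expose a LUKS partition at a
# manually computed byte offset. Nothing here touches a real device.

luks_loop_cmds() {
    disk=$1            # whole-disk device, e.g. /dev/sdX (placeholder)
    start_sector=$2    # start sector of the LUKS partition on the ORIGINAL drive
    name=$3            # arbitrary device-mapper name (placeholder)

    # The original drive uses 512-byte sectors, so the offset is computed
    # with 512 regardless of what the chassis reports.
    offset=$(( start_sector * 512 ))

    echo "losetup --find --show --read-only --offset $offset $disk"
    # losetup prints the loop device it picked; substitute it for /dev/loopN.
    echo "cryptsetup open --type luks --readonly /dev/loopN $name"
}

# Hypothetical values, for illustration only.
luks_loop_cmds /dev/sdX 501760 backup_crypt
```

Both steps are read-only, so a wrong offset should simply make cryptsetup refuse the device as not being a LUKS volume rather than damage anything.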

Plugging the HDDs (via SATA) into the original PC does show an sdb5 partition. dd apparently did work as expected, even copying the disk ID. My 2.5" USB chassis (used for attaching the original drive to the laptop) didn't have the same problem. Either it wasn't designed for windows or maybe it just associates a block size of 512 with SSDs. In either case, it reports the correct sector size, and I can access the LUKS partition.

So... a cautionary tale about assuming that even the simplest devices won't try to "correct" you. All is for the best in the best of all possible worlds.

Kensmosis
  • 141
  • 4K sectors is not about microsoft or windows. It's about modern drives. You can read more for instance here. It's a change pushed not by Microsoft, but by advancing requirements. – vidarlo Oct 25 '20 at 09:04
  • The use of 4K sectors is indeed due to the demands of larger drives, but the blithe auto-resizing behavior of some USB chassis chips appears to have originated with a need for backward compatibility with windows XP (or so I've read, since I have no inside knowledge of the matter). The issue isn't the 4K sector size. It's that the controller chip decides that it's smarter than the user and simply reports a 4K sector size regardless of the disk's actual sector size. – Kensmosis Oct 26 '20 at 15:15

[EDIT] After spending an inordinate amount of time drafting, revising, finding references and citations, and posting this answer, I'm embarrassed, to say the least, to find that the OP had already answered their own question. My first thought was to delete my answer, since most of it is irrelevant, but that is hard to do after investing so much time in it. Instead, I hope to address the newly discovered issue mentioned in the OP's unmarked answer.

Updated issue:

Using a loopback device with the right offset does allow mounting and access of the partitions --- though I still seem to be unable to access the LUKS sdb5 partition (since it isn't reported), even when I put in the address calculated from the offset on the original drive. The data's there, but the screwy reporting keeps cryptsetup from working. Not sure if there is a workaround other than getting a less presumptuous usb chassis.

Since it seems to work while connected internally and you've determined the issue is the chassis, why not just recopy the drive and change your bs value? After all, when you plugged the original drive in through the chassis, you were able to access files, so it was recognized in that way. So it's odd that the original drive could be read through the chassis but not the new copy unless...

...by chance, part of my answer did happen to apply. In that case, for the loop-device scenario in which you calculated the offset from the original drive, you can try this script to determine the offset being reported when the drive is connected over USB, which may get around the partition not being recognized.

Otherwise, the simpler but more tedious fix would be to reformat the spare drive before recopying the main drive onto it, so that when it's plugged in as an external USB device, the chipset doesn't get confused and cause linux to compute an incorrect byte offset.

[END EDIT]


The Old and Inapplicable Oversimplified Explanation

What's happening is that you probably didn't format the spare disk you found before using dd to back up your main linux install, and your main install's drive had an MBR partition scheme while the spare disk had a GPT partition scheme.

@oldfred already pointed this out in a comment, but the output of the following command, which was requested there, has not been posted yet:

sudo gdisk -l /dev/sdb

(I've replaced the X with b to match how the disk appears on the system that reported the GPT/MBR-related error, i.e. the laptop.)

My theory was this: dd copies the drive byte for byte, so when you cloned the whole source disk onto the larger spare disk, it laid the source's MBR partition scheme (partition table and headers included) on top of the spare's existing GPT partition scheme. I assumed the copy then ran afoul of GPT's protective MBR, which is designed to prevent MBR-based disk utilities from misrecognizing and possibly overwriting a GPT disk, and that as a result the proper header information for /dev/sdb5 (or /dev/sda5, whichever the case may be) never landed on the spare. (Strictly speaking, dd writing to the raw device overwrites the protective MBR like any other sector, which is one more reason this explanation turned out not to apply.) And if you're wondering why dd didn't report any of this as an error when it finished, I can't tell you without the output dd printed at the end, which usually looks something like this:

6+1 records in
6+1 records out
3454 bytes (3.5 kB) copied, 8.3585e-05 s, 41.3 MB/s

If dd had hit such an error, it would have reported it in real time rather than at the end. Failing that, you'd probably have to look in your kernel log messages (dmesg, or /var/log/kern.log) for more detail, in case it was treated as a hardware error. You may also find smartctl -x /dev/sda useful here, since a failed write to a region before the start of a partition might show up there as well as in the kernel log. That seems especially likely since this error wasn't reported until afterwards:

Found invalid GPT and valid MBR; converting MBR to GPT format in memory.

Given how complicated this "oversimplified answer" is getting, I won't go into the specifics of how MBR disk partitioning conflicts with GPT partition table headers and GPT partition entries, or how BIOS, UEFI, or UEFI with CSM affects that. Pursuing the "why" any further is probably beyond the scope of resolving your issue, but hopefully this has offered some insight.

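To catch this in the future, you can check the backup copy for problems before going to the trouble of connecting it to the laptop. Note that fsck checks a filesystem, not a whole partitioned disk, so point it at a partition that holds one, and use -n so that it only reports problems instead of changing anything:

sudo fsck -n /dev/sdb1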
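
fsck only tells you whether a filesystem is consistent, not whether the clone holds the same bytes as the source. A byte-level comparison is another option. The sketch below demonstrates the idea on two scratch files standing in for a smaller source disk and a larger destination disk; the file names and sizes are made up for the demonstration. On real hardware the files would be the two block devices and the byte count would come from blockdev --getsize64 on the source.

```shell
#!/bin/sh
# Sketch: verify a clone byte for byte when the destination is larger than
# the source. Uses scratch files so it can be run safely anywhere.
set -eu

work=$(mktemp -d)
src="$work/src.bin"    # stands in for the smaller source disk
dst="$work/dst.bin"    # stands in for the larger destination disk

head -c 1048576 /dev/urandom > "$src"   # 1 MiB of data
head -c 2097152 /dev/zero > "$dst"      # 2 MiB, initially empty

# Clone the source onto the start of the destination without truncating it.
dd if="$src" of="$dst" bs=65536 conv=notrunc 2>/dev/null

# Compare only as many bytes as the source holds; the rest of the larger
# destination is unrelated leftover space.
src_bytes=$(( $(wc -c < "$src") ))
if cmp -n "$src_bytes" "$src" "$dst"; then
    echo "clone verified"
fi

# Real-disk equivalent (assumes GNU cmp and util-linux blockdev):
#   sudo cmp -n "$(sudo blockdev --getsize64 /dev/sda)" /dev/sda /dev/sdb
```

Run from the live stick right after dd finishes, a check like this would show whether the copy itself is good before any USB enclosure gets involved.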

Which reminds me, when dd completed "without complaint", was the live boot stick system able to see the third partition?? Because if it could, was there any attempt to access your data from there? If yes to both, then that might invalidate everything I just thought whereby your hard drive enclosure's controller might be the suspect.

Ahh well...

Moving Forward - Backup of your data

Depending on what your ultimate goal is, whether it is:

  1. to make a backup of just your data, so that it isn't lost while you fix your main PC and you can copy it back once the PC is fixed; or
  2. to create an exact duplicate of your main PC's disk, so that if an attempt to fix the "broken install" goes wrong you can start over from the beginning

how you proceed will vary. The main difference between the two scenarios is what happens if all else fails:

  1. You would have to resort to a fresh install of your desired operating system and copy your data over by mounting the backup as an external or internal drive. Only personal files would be restored, so any apps, packages, programs, or settings would need to be redownloaded, reinstalled, or reset (more work: file-data-only restoration).
  2. You can keep trying, starting over from where you are now, until your system boots again without needing to reinstall or copy over anything (more tries: full system restoration).

Since you mentioned that you're "hesitant to do anything until [you've] ascertained [you] can access [your] data on [your] backup disk", I will assume you just want a backup of your data while you go about fixing your PC. I will also assume that the extended partition shown by fdisk as /dev/sda2 holds nothing you need to preserve beyond /dev/sda5, which is the only one you mentioned as relevant:

...but without /dev/sdb5 I can't mount the LUKS encrypted volume and I can't access my data.

In that case, I would suggest changing your dd options. Rather than backing up the entire disk by copying it directly onto the spare disk, you could either (a) write an image of the entire disk to a file by pointing dd's of= at an image file on the mounted spare disk, which you can later attach as a virtual disk and so avoid any partitioning conflicts on the spare drive; or, better yet, (b) change dd's if= to copy only the partition you need, which would be /dev/sda5 using the device names from the live-stick session with both disks attached.

Since you didn't use the oflag=direct or oflag=sync options when copying the disks, I strongly suggest you review the dd man page for options if you do either of the above.
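
As a rough sketch of the image-file approach, the script below runs dd against a scratch file instead of a real partition, so it can be tried safely. The file names are stand-ins, and /mnt/spare in the final comment is a hypothetical mount point for the spare disk, not something taken from the question. conv=fsync is a GNU dd option that flushes the output to disk before dd exits.

```shell
#!/bin/sh
# Sketch: back up to an image file instead of cloning disk-to-disk.
# Demonstrated on a scratch file so no real disk is touched.
set -eu

work=$(mktemp -d)
src="$work/source.bin"   # stands in for the partition, e.g. /dev/sda5
img="$work/sda5.img"     # stands in for an image file on the mounted spare disk

head -c 1048576 /dev/urandom > "$src"

# Copy the source into the image and flush it to disk before dd exits.
dd if="$src" of="$img" bs=65536 conv=fsync 2>/dev/null

# Confirm the image is an exact copy.
cmp "$src" "$img" && echo "image matches source"

# On real hardware the same call would look something like:
#   sudo dd if=/dev/sda5 of=/mnt/spare/sda5.img bs=4M conv=fsync status=progress
```

Because the image is an ordinary file on a freshly formatted filesystem, the partition table of the spare disk never enters into it, which sidesteps the kind of conflict discussed above.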

Addressing Your Questions

My understanding is that dd copies every byte, and there's no hidden metadata it misses, but I'm not an expert on modern disk controllers. Am I missing something (other than /dev/sdb5)?

Yes, you're right, but as we've discovered, minor write errors affecting that hidden metadata could have been overlooked without proper logging: dd read it fine, it just couldn't write it. It's difficult to say what you're missing, since you already seem to know quite a bit, but I would at least mention that dd makes a copy of every bit, byte for byte. It goes so far as to copy the empty space or NULL bytes of unpartitioned areas, and even unmarked "empty space" where files once existed but were deleted, which means those files could still be recovered from a dd copy that included that space. dd can also do a lot more than simply read from stdin and write to stdout, although that takes some additional options, but I digress.

If anything, I would venture that what people most often forget to consider is what copying every byte actually means in a given scenario: what is being read, what medium holds the data, and whether the destination differs in size or type from the source. The key point is that if dd can't read something, dd can't write it. So when copying a filesystem from one drive to another, even a nearly identical one, anything encoded at the hardware level (information used for hardware addressing that isn't stored in the file or block device itself) won't be carried over or changed. For some that's obvious, for others it's not. To be sure: if it exists on a block and can be read by the computer, it can be copied, even if it is never revealed to the user.

Is there something I need to do on the laptop? The passphrase should be the same if it is a true byte-clone of the original.

As we've surmised by now, it's not the laptop causing the issue but the exact duplication of data onto mismatched hardware and mismatched partition schemes (more so the latter, as explained later), which resulted in a partition that the OS can't recognize and therefore can't mount. As for the passphrase being the same: yes, you're right, it is the same.

To take it a step further, as noted in the LUKS faqs under 1.2 Warnings - Cloning/Imaging:

If you clone or image a LUKS container, you make a copy of the LUKS header and the master key will stay the same! That means that if you distribute an image to several machines, the same master key will be used on all of them, regardless of whether you change the passphrases.

So yes, you're right, it is the same. But your issue is not that the passphrase isn't recognized; it's that the partition itself isn't.

More Elaborate Explanation (without getting too technical)

The mismatched partition schemes became the suspect after examining the error message provided, but that conclusion also rested on a number of assumptions about details that were omitted from the question, probably because they seemed unrelated to the issue. These considerations included:

  • Knowing the partition scheme of the main linux system's SSD. I determined it to be MBR, since /dev/sda2 was described as an extended partition and those only exist in MBR. The missing piece for conclusive evidence is whether the PC was running in BIOS/Legacy mode or UEFI with CSM support.
  • Without knowing the motherboard's BIOS or UEFI settings, we don't know whether the live boot stick was launched in legacy or UEFI mode, though it had to match the desktop to be able to recognize both disks when attached. We also don't know what "recognized" meant in practice: a proper byte-for-byte copy with dd requires the disks to be offline (unmounted), so you wouldn't have mounted them to see whether you could actually access them. Nor do we know what system was used to make the copy or what version of dd was used.
  • Knowing for certain the partition scheme of the spare hard drive. There isn't much information on the spare drive's origins: what format it was in or where it came from (a Linux PC or a Windows PC). That would have greatly helped in working out what was already on the destination where the errors were found.
  • Finally, we knew the laptop could see the drive before the backup was made, so it had to have CSM support if running UEFI, or be booted in BIOS mode. Since it could access the first 2 partitions of the backup but not see the third, LUKS-encrypted partition, it stands to reason that the problem had nothing to do with OS recognition and everything to do with missing partition header information.

Additional Notes & Considerations

A few minor notes to help you get help or find answers more efficiently in the future. Feel free to take them or leave them; I offer them only as suggestions, not as criticism or ridicule, since I'm sure anyone could just as easily point out mistakes of mine in this post (which is to say I am no better than you), with all due respect:

  • In case it is not already known, "partition" and "volume" are not usually interchangeable and technically refer to different units of storage on a hard drive. A volume is a single accessible storage area with a single file system, whereas a partition is a logical division of a hard disk. Although it might be just semantics for this post, keeping them distinct helps avoid misinterpretation in other scenarios when trying to understand the events that take place.
  • Please try to be consistent with your references. You show fdisk output listing /dev/sda1, /dev/sda2 and /dev/sda5 but then refer to running cryptsetup on /dev/sdb5. I understand that device names change from system to system, but this makes it hard to tell what you did, when, and where. For example, since fdisk shows partitions on /dev/sda, I can assume from context that it was run on the laptop, and from the input flag (if=) of your dd command I can assume /dev/sda is the source disk. But if cryptsetup was run on the same drive that fdisk and dd refer to, it must have been on a different system for the drive to get a different device name. Conversely, if it was run on the laptop, then either some swapping took place or it's not the same drive. It may not matter much in this case, but in other cases it could mean the difference between one error and another.
  • With the two points above in mind, it becomes even more confusing when you describe the source disk as a 1.8 TB SSD and the destination as a 3 TB HDD but then show gdisk output for a 5 TB disk. That suggests either a typo in the reference to a 3 TB disk, or that you meant a 3 TB partition of an HDD, and the latter would point to an issue with the command options used rather than something else.
  • Though you said "I doubt it matters," the size and type of the drives are quite significant when using dd to make backups, especially when the data is important, so it's good to note those details. Had the cause not been what is described here, I would have brought up problems that can go unreported after using dd (such as errors that only show up in dmesg), which are more likely when transferring from a smaller drive to a larger one or from a solid-state drive to a hard disk drive, along with the dd options that help with performance issues, cache errors, and ultimately the reliability of the data being transferred. (I've deleted my original summary of how all that matters, since it wouldn't help resolve your issue anyway.)
Doedigo
  • 81
  • After posting, I realized the OP answered their own question but didn't mark it as such. /shootme – Doedigo Jan 16 '22 at 04:30
  • Apologies for that! I didn't realize this was the etiquette (it felt self-serving to accept my own answer to my question). I fixed it. This said, I found your answer very informative! I personally ran into the issue (on another occasion when setting up a drive for mirroring), that dd'ing copies EVERYTHING. My bios wouldn't recognize either drive because both identified themselves as the same to it. Since then I've been much more careful. Thanks for the answer, and sorry about the poor followup on my part! – Kensmosis Jan 17 '22 at 16:59
  • I can understand that feeling and would feel the same way if I were you so no problem at all. I should have been more thorough in reading the provided answer as I read the part about it still not working and must've drifted off and failed to read the rest so it's my own demise I have caused. Thank you tho, I appreciate your reply and thank you for your kind words. StackExchange does have some very unique customs but ones I learn to appreciate at some point in the time heh. in any case, thanks again, and take care! – Doedigo Jan 19 '22 at 19:21