
Yesterday, I did an update/dist-upgrade. Today, I powered on the machine and it hung at the loading screen with the logo and cycling dots. I've waited at this screen for about an hour several times, with no result. If I interrupt Upstart with Ctrl-Alt-Del, bootup resumes and completes, but it puts me at a tty console login. X does start a few seconds later, but a dialog saying the graphics are configured incorrectly comes up immediately. Update: The X issue was resolved by running apt-get install nvidia-current. The interrupt issue still stands.

Unfortunately, every lead I've found as to why this might be happening has turned into a dead end. Here's my boot.log (from /var/log) showing where I interrupted the startup. You can see it hangs just as it starts "enable remaining boot-time encrypted block devices" (this is from cryptdisks), but removing that service makes no difference. I've tried pretty much everything from this Mint bug report, which describes symptoms nearly identical to mine, to no avail. At this point, I'm fairly sure that cryptdisks is a red herring, and that it's something else entirely.

I've also found that resuming startup from recovery mode seems to load things in a different order. Upstart still hangs, but not after cryptdisks. If I press Ctrl-Alt-Del, it brings me to the graphical login manager instead of a tty, and I can log in successfully. However, the system still isn't fully functional: USB plug and play doesn't seem to work, I can't use my second monitor, and I have to run start resolvconf manually to access the internet. Here's the boot.log from one of those startups.

I should add that I am encrypting my HDD with LUKS, and the hang happens after I successfully enter the decryption password. Here's my fstab, in case it's relevant:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/ubuntu--vg-root /               ext4    errors=remount-ro 0       1
# /boot was on /dev/sda1 during installation
UUID=9e7c1e90-f3e4-4075-b3b0-e3ccb6d933c7 /boot           ext2    defaults        0       2
/dev/mapper/ubuntu--vg-swap_1 none            swap    sw              0       0

What's going on here?

Chaosed0
  • Are you using NVIDIA or some other proprietary graphics driver? If so, you need to reinstall the driver after upgrading your kernel. To avoid this, you can have dkms installed when you install the driver, so you don't have to reinstall it every time you upgrade the kernel. Aside from that, you should be able to boot an older kernel by choosing "Advanced options" (or something like that) from the GRUB screen, and then choosing one of the older kernel versions. – mchid Dec 18 '14 at 21:09
  • this might help http://askubuntu.com/questions/141606/how-to-fix-the-system-is-running-in-low-graphics-mode-error – mchid Dec 18 '14 at 21:12
  • Forgot to put this in the OP: I did try reverting to the kernel I was using yesterday, and it didn't seem to help. I don't have any NVIDIA-specific package or fglrx installed. I feel like the low graphics mode is a symptom, not a cause, but I'll investigate further in case I do have something proprietary. – Chaosed0 Dec 18 '14 at 21:16
  • Running lspci revealed that I am (and was) using an NVIDIA card, but did not have the drivers installed. Running apt-get install nvidia-current got me to the graphical login after a normal boot, but I still have to interrupt Upstart with Ctrl-Alt-Del, and I still have to start the resolvconf service manually to access the internet. Maybe they are unrelated problems. – Chaosed0 Dec 18 '14 at 21:31

2 Answers


The root cause was a huge number of files in my /tmp directory.

I'd used the /tmp directory to store millions of files earlier. It turns out that having that many files there makes the service that cleans /tmp at boot take a long, long time (duh). After I moved the files out of /tmp, the problem was solved. It had nothing to do with the upgrade; that was just a coincidence.
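
Before blaming an upgrade, it may be worth checking how much is sitting in /tmp. Below is a minimal Python (3.6+) sketch that gives a rough total. `count_entries` is a hypothetical helper name and is not part of the original answer. Directories it cannot read are skipped, so the result is a lower bound.

```python
import os


def count_entries(root: str) -> int:
    """Count all files and directories below `root`, not counting `root` itself.

    Walks iteratively with os.scandir and does not follow symlinks, so a
    symlink is counted once but never descended into.
    """
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    total += 1
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except (PermissionError, FileNotFoundError):
            # Unreadable or vanished directories are skipped,
            # so the total is a lower bound.
            continue
    return total


if __name__ == "__main__":
    print(f"/tmp contains {count_entries('/tmp')} entries")
```

Run it as root to see everything. A count in the millions, as in this case, suggests the boot-time clean-up will take a long time.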


In case it helps anyone later, here's the process I used to figure it out. I enabled the "Magic SysRq key" by changing /etc/sysctl.d/10-magic-sysrq.conf. Then I reproduced the problem by rebooting. When startup hung, I pressed Alt-SysRq-t. This dumped the following into the kernel buffer, which I read using dmesg:

[   36.318527] SysRq : Show Blocked State
[   36.318696]   task                        PC stack   pid father
[   36.318719] find            D ffff88041dd93480     0   839    788 0x00000000
[   36.318721]  ffff880405d07a48 0000000000000082 ffff880401136000 ffff880405d07fd8
[   36.318723]  0000000000013480 0000000000013480 ffff880401136000 ffff88041dd93d18
[   36.318725]  ffff88041dfab460 0000000000000002 ffffffff811ef380 ffff880405d07ac0

It dumps a lot more than this, but this is the relevant part. It shows that the blocked task is find; the D means uninterruptible sleep, typically waiting on disk I/O. From there, a knowledgeable friend recognized that the /tmp cleaning service was a likely culprit.

Chaosed0

Thank you, Chaosed0, for coming back with your solution (i.e. a huge number of files in /tmp). [I tried to post this as a comment, but I don't have enough reputation points.]

I ran into the same issue with Ubuntu Server 14.04, and it was very difficult to diagnose until I found your post.

When I rebooted the machine, it would appear to get blocked right before it would normally show the login console. It could be unblocked by pressing Ctrl+Alt+Del, which printed a log message saying that wait-for-state (rcplymouth-shutdown) had been terminated. That log message sent me down the wrong path: I poked at various plymouth scripts and then tried to disable plymouth completely :-(

In fact, the boot process wasn't deadlocked; it was just waiting for the clean-up of /tmp to complete. That machine had tens of thousands of files under /tmp, so the clean-up was taking a long, long, long time.

So the fix for me was to boot into recovery mode, get a root shell, and then run rm -rf /tmp/*. After an hour or so, the rm job completed. Then I rebooted, and everything worked normally.
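
The same clean-up can be sketched in Python, assuming you are in the recovery-mode root shell described above. `empty_directory` is a hypothetical helper, not something the answer used. One difference from `rm -rf /tmp/*` is that the shell glob `*` skips names starting with a dot, while this function removes everything inside the directory.

```python
import os
import shutil


def empty_directory(path: str) -> int:
    """Delete everything inside `path` but keep `path` itself.

    Returns the number of top-level entries removed. Unlike `rm -rf path/*`,
    this also removes names starting with '.', which the shell glob skips.
    """
    # Snapshot the listing first so we are not deleting while iterating.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Real directories are removed recursively.
            shutil.rmtree(entry.path)
        else:
            # Files, sockets and symlinks (even to directories) are unlinked,
            # never followed.
            os.unlink(entry.path)
    return len(entries)
```

Because it also deletes hidden entries such as the X sockets under /tmp, it is only safe when nothing else is using /tmp, as in recovery mode. Do not run it on a running desktop.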

It would be great if a log message could be printed when the clean-up of /tmp starts.

David Foerster
jamuir