
I'm running Ubuntu 16.04 on a tiny device that the techs have placed on the very top of a server rack. It has become unresponsive 3 times in the last 4 days, with nothing logged to say it was shutting off.

The power light stayed on, etc., but I had no indication that the OS was still functioning while the device was unresponsive (I only have atop and rsyslog actively logging).

I'm going to give them back the device with "sensors" running in a cron job, logging to a file, until I can prove that heat is why it's getting turned off. But before I do: should I have known that the machine was turning off because of some ACPI shutdown trigger, one that ought to have been logged?

I'm guessing this might be hardware specific, but it seems strange that I'm not getting any sort of trigger from the kernel that it's about to go casters up.
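
For reference, here is a minimal sketch of the kind of logger I have in mind, as an alternative to parsing the output of "sensors". It assumes the kernel exposes temperatures under /sys/class/thermal (in millidegrees Celsius), which may not be true on every board. The function names are my own, and the log path is passed on the command line. Each line is fsynced so that the last reading has a better chance of surviving an abrupt power-off.

```python
#!/usr/bin/env python3
"""Append one line of thermal-zone readings to a log file and fsync it."""
import glob
import os
import sys
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                # sysfs reports millidegrees Celsius
                temps[os.path.basename(zone)] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone unreadable or malformed; skip it
    return temps


def format_line(temps, now=None):
    """Build a single timestamped log line from a {zone: degrees_C} dict."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    readings = " ".join(
        "{}={:.1f}C".format(name, value) for name, value in sorted(temps.items())
    )
    return "{} {}".format(stamp, readings)


def append_and_sync(path, line):
    """Append the line and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


# Only runs when invoked as a script with exactly one argument (the log path).
if __name__ == "__main__" and len(sys.argv) == 2:
    append_and_sync(sys.argv[1], format_line(read_zone_temps()))
```

A crontab entry could then run it once a minute with a log file as the only argument; the script location and log file name are up to you.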

Peter Turner
  • Does this 'tiny device' have a baytrail processor from Intel? – WinEunuuchs2Unix Mar 01 '17 at 20:41
  • @win yeah, I think so. It's an Atom E3825 – Peter Turner Mar 02 '17 at 14:09
  • You need this solution: http://askubuntu.com/questions/803640/system-freezes-completely-with-intel-bay-trail but I have new info about kernels 4.8, 4.9 and 4.10 to link later when I find it again. – WinEunuuchs2Unix Mar 02 '17 at 15:53
  • OK, I had noticed that before Ubuntu 16.04 LTS came around; in my experience it was fixed. Glad to know there was a fix for it, but I think this was heat related: I've got about 15 of these puppies out in the field right now, and the ones with good ventilation aren't hanging. Also, I think when that sleep state thing happens, the computer continues to function (if not, then it's a different bug I've experienced). – Peter Turner Mar 02 '17 at 17:36
  • The documented side effect of the intel_idle.max_cstate kernel parameter is increased heat, IIRC. – WinEunuuchs2Unix Mar 02 '17 at 18:11
  • If you think it's hardware specific, do you think editing in a link to a pastie containing the output of sudo lshw might be useful? – Elder Geek Mar 14 '17 at 14:02
  • @elder company firewall blocks pastebin and most other sharing sites. I could do a GitHub gist, but that's probably against the spirit of the site. Is there a particular part of lshw that would be useful that I could edit in? I see some ACPI stuff and the processor description. – Peter Turner Mar 14 '17 at 14:22
  • I can't find anything that would indicate that that course of action would be against the spirit of the site. I did however find this which seems to indicate that using a github gist is perfectly acceptable. – Elder Geek Mar 14 '17 at 15:46

1 Answer


I'm not sure what policy this site has on "original research" but...

I managed to get my fanless computer up to 84 degrees Celsius by putting it in a cookie tin with some rocks and a coffee warmer while running the stress command, and it shut down without leaving any logs.

Ubuntu said the critical temp was 110.
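
To compare a live reading against whatever critical threshold the kernel itself advertises, something like the sketch below might help. It is only an illustration: it assumes the thermal zones under /sys/class/thermal expose trip_point_N_type and trip_point_N_temp files, which not every driver provides, and the function name is made up. It also cannot see any cutoff enforced by firmware or hardware below the OS, which is my guess (not a confirmed fact) as to why the box died at 84 with nothing in the logs.

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a small sysfs-style text file."""
    with open(path) as f:
        return f.read().strip()


def critical_margins(base="/sys/class/thermal"):
    """Return [(zone, current_C, critical_C)] for zones with a critical trip point."""
    results = []
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            current = int(_read(os.path.join(zone, "temp"))) / 1000.0
        except (OSError, ValueError):
            continue  # no usable current reading for this zone
        for type_path in sorted(glob.glob(os.path.join(zone, "trip_point_*_type"))):
            try:
                if _read(type_path) != "critical":
                    continue
                # trip_point_N_type sits next to trip_point_N_temp
                temp_path = type_path[: -len("type")] + "temp"
                critical = int(_read(temp_path)) / 1000.0
            except (OSError, ValueError):
                continue
            results.append((os.path.basename(zone), current, critical))
    return results


if __name__ == "__main__":
    for zone, current, critical in critical_margins():
        print("{}: {:.1f}C now, critical at {:.1f}C".format(zone, current, critical))
```

If this prints nothing, the board probably does not publish a critical trip point through this interface, and the 110 figure came from somewhere else (for example the coretemp driver that "sensors" reads).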

Peter Turner