I just discovered thermald for preventing machines from overheating. I'd like some basic suggestions on how to modify its XML configuration file. Below is the one I have at /etc/thermald/thermal-conf.xml. From some examples I have skimmed online, it seems to be set to start preventing overheating at 55 °C (if I am reading the <Temperature>55000</Temperature> line correctly), but my cores still reach 94 °C with the fans running.
I am using an Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz machine.
<?xml version="1.0"?>
<!--
use "man thermal-conf.xml" for details
-->
<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
<Name>Generic X86 Laptop Device</Name>
<ProductName>EXAMPLE_SYSTEM</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
<ThermalSensor>
<Type>TSKN</Type>
<AsyncCapable>1</AsyncCapable>
</ThermalSensor>
</ThermalSensors>
<ThermalZones>
<ThermalZone>
<Type>SKIN</Type>
<TripPoints>
<TripPoint>
<SensorType>TSKN</SensorType>
<Temperature>55000</Temperature>
<type>passive</type>
<ControlType>SEQUENTIAL</ControlType>
<CoolingDevice>
<index>1</index>
<type>rapl_controller</type>
<influence> 100 </influence>
<SamplingPeriod> 16 </SamplingPeriod>
</CoolingDevice>
<CoolingDevice>
<index>2</index>
<type>intel_powerclamp</type>
<influence> 100 </influence>
<SamplingPeriod> 12 </SamplingPeriod>
</CoolingDevice>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
</Platform>
<!-- Thermal configuration example only -->
<Platform>
<Name>Example Platform Name</Name>
<!--UUID is optional, if present this will be matched -->
<!-- Both product name and UUID can contain
wild card "*", which matches any platform
-->
<UUID>Example UUID</UUID>
<ProductName>Example Product Name</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
<ThermalSensor>
<!-- New Sensor with a type and path -->
<Type>example_sensor_1</Type>
<Path>/some_path</Path>
<AsyncCapable>0</AsyncCapable>
</ThermalSensor>
<ThermalSensor>
<!-- Already present in thermal sysfs,
enable this or add/change config
For example, here we are indicating that
sensor can do async events to avoid polling
-->
<Type>example_thermal_sysfs_sensor</Type>
<!-- If async capable, then we don't need to poll -->
<AsyncCapable>1</AsyncCapable>
</ThermalSensor>
<ThermalSensor>
<!-- Example of a virtual sensor. This sensor
depends on other real sensor or
virtual sensor.
E.g. here the temp will be
temp of example_sensor_1 * 0.5 + 10
-->
<Type>example_virtual_sensor</Type>
<Virtual>1</Virtual>
<SensorLink>
<SensorType>example_sensor_1</SensorType>
<Multiplier> 0.5 </Multiplier>
<Offset> 10 </Offset>
</SensorLink>
</ThermalSensor>
</ThermalSensors>
<ThermalZones>
<ThermalZone>
<Type>Example Zone type</Type>
<TripPoints>
<TripPoint>
<SensorType>example_sensor_1</SensorType>
<!-- Temperature at which to take action -->
<Temperature> 75000 </Temperature>
<!-- max/passive/active
If a MAX type is specified, then
daemon will use PID control
to aggressively throttle to avoid
reaching this temp.
-->
<type>max</type>
<!-- SEQUENTIAL | PARALLEL
When a trip point temp is violated, then
number of cooling device can be activated.
If control type is SEQUENTIAL then
It will exhaust first cooling device before trying
next.
-->
<ControlType>SEQUENTIAL</ControlType>
<CoolingDevice>
<index>1</index>
<type>example_cooling_device</type>
<!-- Influence will be used to order cooling devices.
The cooling device with the highest influence
will be used first.
-->
<influence> 100 </influence>
<!-- Delay in using this cdev, this takes some time
to actually cool a zone
-->
<SamplingPeriod> 12 </SamplingPeriod>
</CoolingDevice>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
<CoolingDevices>
<CoolingDevice>
<!--
Cooling device can be specified
by a type and optionally a sysfs path
If the type is already present in thermal sysfs
no need of a path.
Compensation can use min/max and step size
to increasingly cool the system.
Debounce period can be used to force
a waiting period for action
-->
<Type>example_cooling_device</Type>
<MinState>0</MinState>
<IncDecStep>10</IncDecStep>
<ReadBack> 0 </ReadBack>
<MaxState>50</MaxState>
<DebouncePeriod>5000</DebouncePeriod>
<!--
If there are no PID parameters,
compensation increases stepwise, and exponentially
if a single step is not able to change the trend.
Alternatively, PID parameters can be specified;
then the next step will use a PID calculation using
the provided PID constants.
-->
<PidControl>
<kp>0.001</kp>
<kd>0.0001</kd>
<ki>0.0001</ki>
</PidControl>
</CoolingDevice>
</CoolingDevices>
</Platform>
</ThermalConfiguration>
<!-- END -->
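On the units: the <Temperature> value appears to be in millidegrees Celsius, the same unit the kernel reports under /sys/class/thermal, so 55000 would indeed be 55 °C. As a rough way to cross-check what each thermal zone is currently reporting, here is a minimal Python sketch. It assumes the standard thermal sysfs layout (thermal_zoneN/type and thermal_zoneN/temp). The function names and the base_dir parameter are made up for illustration and are not part of thermald.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs-style millidegree reading such as '55000' to 55.0."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base_dir: str = "/sys/class/thermal") -> dict:
    """Return {zone_dir_name: (sensor_type, temp_celsius)} for each zone.

    base_dir is a hypothetical parameter so the function can be pointed
    at a test directory. Zones whose files cannot be read are skipped.
    """
    zones = {}
    for zone in sorted(Path(base_dir).glob("thermal_zone*")):
        try:
            sensor_type = (zone / "type").read_text().strip()
            temp_c = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric reading
        zones[zone.name] = (sensor_type, temp_c)
    return zones


if __name__ == "__main__":
    for name, (sensor_type, temp_c) in read_thermal_zones().items():
        print(f"{name}: {sensor_type} {temp_c:.1f} C")
```

Run while the machine is under load, this should show which sensor type (for example x86_pkg_temp) actually tracks the hot cores, which is the name a trip point would need to reference.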
Following the suggestion from @heynnema, I have deleted the configuration file, stopped thermald, and run sudo thermald --no-daemon --loglevel=info. The output follows, but how is this supposed to help me construct a new, more efficient configuration file?
$ sudo thermald --no-daemon --loglevel=info
[1649408071][INFO]RAPL domain count 1
[1649408071][INFO]RAPL domain count 1
[1649408071][MSG]22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
[1649408071][INFO]Running on a vanilla kernel
[1649408071][MSG]Polling mode is enabled: 4
[1649408071][INFO]sensor_update: type TSKN
[1649408071][INFO]sensor_update: type acpitz
[1649408071][INFO]sensor_update: type x86_pkg_temp
[1649408071][INFO]sensor_update: type pch_cometlake
[1649408071][INFO]sensor_update: type NGFF
[1649408071][INFO]sensor_update: type TMEM
[1649408071][INFO]sensor_update: type B0D4
[1649408071][INFO]sensor_update: type TVGA
[1649408071][INFO]thd_read_default_thermal_sensors loaded 8 sensors
[1649408071][INFO]dts /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][INFO]INT3400 Base path is
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]sensor index:2 TSKN /sys/class/thermal/thermal_zone2/ Async:0
[1649408071][INFO]sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
[1649408071][INFO]sensor index:5 pch_cometlake /sys/class/thermal/thermal_zone5/ Async:0
[1649408071][INFO]sensor index:3 NGFF /sys/class/thermal/thermal_zone3/ Async:0
[1649408071][INFO]sensor index:1 TMEM /sys/class/thermal/thermal_zone1/ Async:0
[1649408071][INFO]sensor index:6 B0D4 /sys/class/thermal/thermal_zone6/ Async:0
[1649408071][INFO]sensor index:4 TVGA /sys/class/thermal/thermal_zone4/ Async:0
[1649408071][INFO]sensor index:8 hwmon /sys/class/hwmon/hwmon5/temp1_input Async:0
[1649408071][INFO]sensor index:9 hwmon /sys/class/hwmon/hwmon5/temp2_input Async:0
[1649408071][INFO]sensor index:10 hwmon /sys/class/hwmon/hwmon5/temp3_input Async:0
[1649408071][INFO]thd_read_default_cooling devices loaded 14 cdevs
[1649408071][INFO]ppcc limits max:47000000 min:10000000 min_win:28000000 step:1000000
[1649408071][INFO]set_pid_param 14 [-1000.100,10]
[1649408071][INFO]Use Default pstate drv settings
[1649408071][INFO]sysfs create failed
[1649408071][INFO]INT3400 Base path is
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]name = package-0
[1649408071][INFO]name = dram
[1649408071][INFO]sysfs read failed /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/constraint_0_max_power_uw
[1649408071][INFO]:powercap RAPL invalid max power limit range
[1649408071][INFO]Calculate dynamically phy_max
[1649408071][INFO]set_pid_param 18 [-0.4.0,0]
[1649408071][INFO]13: ath10k_thermal, C:0 MN: 0 MX:100 ST:1 pt:/sys/class/thermal/ rd_bk 1
[1649408071][INFO]1: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]11: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]8: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]6: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]4: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]2: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]12: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]0: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]10: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]9: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]7: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]5: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]3: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]14: rapl_controller, C:47000000 MN: 47000000 MX:10000000 Inc ST:-2000000 Dec ST:-1000000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/ rd_bk 1
[1649408071][INFO]15: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1
[1649408071][INFO]16: rapl_controller_dram, C:100000000 MN: 100000000 MX:0 ST:-500000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/ rd_bk 1
[1649408071][INFO]17: LCD, C:0 MN: 0 MX:120000 ST:12000 pt:/sys/class/backlight/intel_backlight/ rd_bk 1
[1649408071][INFO]18: amdgpu, C:0 MN: 0 MX:0 ST:0 pt: rd_bk 1
[1649408071][INFO]thd_read_default_thermal_zones loaded 7 zones
[1649408071][INFO]INT3400 Base path is
[1649408071][INFO]zone cpu will be created
[1649408071][INFO]dts zone /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][INFO]/sys/class/hwmon/hwmon6/name->dell_smm
[1649408071][INFO]/sys/class/hwmon/hwmon4/name->pch_cometlake
[1649408071][INFO]/sys/class/hwmon/hwmon2/name->BAT0
[1649408071][INFO]/sys/class/hwmon/hwmon0/name->AC
[1649408071][INFO]/sys/class/hwmon/hwmon7/name->ath10k_hwmon
[1649408071][INFO]/sys/class/hwmon/hwmon5/name->coretemp
[1649408071][INFO]Buggy max temp: to close to critical 90000
[1649408071][INFO]Core temp DTS :critical 100000, max 90000, psv 95000
[1649408071][INFO]node type: Element, name: CoolingDevice value: rapl_controller
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_pstate
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_powerclamp
[1649408071][INFO]node type: Element, name: CoolingDevice value: cpufreq
[1649408071][INFO]node type: Element, name: CoolingDevice value: Processor
[1649408071][INFO]CDEVS order specified in thermal-cpu-cdev-order.xml
[1649408071][INFO]/sys/class/hwmon/hwmon3/name->nouveau
[1649408071][INFO]/sys/class/hwmon/hwmon1/name->acpitz
[1649408071][INFO]INT3400 Base path is
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]failed to open /dev/acpi_thermal_rel
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]
ZONE DUMP BEGIN
[1649408071][INFO]
[1649408071][INFO]Zone 8: cpu, Active:1 Bind:0 Sensor_cnt:1
[1649408071][INFO]..sensors..
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
[1649408071][INFO]..trips..
[1649408071][INFO]index 0: type:passive temp:95000 hyst:0 zone id:8 sensor id:65535 control_type:1 cdev size:4
[1649408071][INFO]cdev[0] rapl_controller, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[1] intel_pstate, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[2] intel_powerclamp, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[3] Processor, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]index 1: type:polling temp:85500 hyst:0 zone id:8 sensor id:7 control_type:0 cdev size:0
[1649408071][INFO]
[1649408071][INFO]
ZONE DUMP END
[1649408071][INFO]Current user preference is 0
[1649408071][INFO]thd_engine_thread begin
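One possible way to turn this output into something usable is to pull out just the sensor names, cooling device names, and trip points, since those are the identifiers a configuration file refers to. The following Python sketch does that with regular expressions. The patterns are written against the line shapes in the log above only, so they are an assumption about the format rather than a documented interface, and the function name is hypothetical.

```python
import re

# Patterns modelled on the log lines shown above (not an official format).
SENSOR_RE = re.compile(r"\]sensor index:(\d+) (\S+) (\S+) Async:(\d)")
CDEV_RE = re.compile(r"\](\d+): (\w+), C:(-?\d+) MN: (-?\d+) MX:(-?\d+)")
TRIP_RE = re.compile(r"\]index (\d+): type:(\w+) temp:(\d+)")


def summarize_thermald_log(text: str) -> dict:
    """Collect sensors, cooling devices and trip points from a thermald log.

    Sensors map name -> sysfs path and cooling devices map
    name -> (min_state, max_state). Repeated names keep the last entry seen.
    Trip temperatures are converted from millidegrees to degrees Celsius.
    """
    sensors, cdevs, trips = {}, {}, []
    for line in text.splitlines():
        m = SENSOR_RE.search(line)
        if m:
            sensors[m.group(2)] = m.group(3)
            continue
        m = CDEV_RE.search(line)
        if m:
            cdevs[m.group(2)] = (int(m.group(4)), int(m.group(5)))
            continue
        m = TRIP_RE.search(line)
        if m:
            trips.append((m.group(2), int(m.group(3)) / 1000.0))
    return {"sensors": sensors, "cooling_devices": cdevs, "trips": trips}
```

Fed the log above (for instance after saving it to a file and passing the contents to summarize_thermald_log), this would list x86_pkg_temp among the sensors, rapl_controller and intel_powerclamp among the cooling devices, and the default passive trip at 95 °C.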
After the edit, this is my configuration file, yet the core temperatures still go up to 90 °C:
~$ cat /etc/thermald/thermal-conf.xml
<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
<Name>Generic X86 Laptop Device</Name>
<ProductName>*</ProductName>
<Preference>QUIET</Preference>
<ThermalZones>
<ThermalZone>
<Type>cpu</Type>
<TripPoints>
<TripPoint>
<SensorType>x86_pkg_temp</SensorType>
<Temperature>55000</Temperature>
<type>passive</type>
<ControlType>PARALLEL</ControlType>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
</Platform>
</ThermalConfiguration>
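Note that, unlike the original example, this edited file declares a trip point without any <CoolingDevice> entries. To see at a glance what a given file actually declares, and to confirm that it is well-formed XML at all, a small Python sketch along these lines could be used. It relies only on the element names visible in the files above. The function name is hypothetical, and the sketch does not check whether thermald falls back to default cooling devices when none are listed.

```python
import xml.etree.ElementTree as ET


def summarize_trip_points(xml_text: str) -> list:
    """Return one dict per <TripPoint> in a thermal-conf.xml style document.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    Temperatures are converted from millidegrees to degrees Celsius.
    """
    root = ET.fromstring(xml_text)
    summary = []
    for zone in root.iter("ThermalZone"):
        zone_type = (zone.findtext("Type") or "").strip()
        for trip in zone.iter("TripPoint"):
            raw_temp = (trip.findtext("Temperature") or "0").strip()
            summary.append({
                "zone": zone_type,
                "sensor": (trip.findtext("SensorType") or "").strip(),
                "temp_c": int(raw_temp) / 1000.0,
                # Types of the cooling devices attached to this trip point.
                "cooling_devices": [
                    (cdev.findtext("type") or "").strip()
                    for cdev in trip.findall("CoolingDevice")
                ],
            })
    return summary
```

For the file above, the single entry would show zone cpu, sensor x86_pkg_temp, 55.0 °C, and an empty cooling device list.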
Additional info:
~$ ls -al /etc/thermald
total 32
drwxr-xr-x 2 root root 4096 Apr 8 16:32 .
drwxr-xr-x 159 root root 12288 Apr 5 09:03 ..
-rw-r--r-- 1 root root 4605 Jan 15 2019 backup
-rw-rw-r-- 1 username username 816 Apr 8 16:32 thermal-conf.xml
-rw-r--r-- 1 root root 508 Jan 15 2019 thermal-cpu-cdev-order.xml
And also this seems relevant (thermald inactive?):
$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2022-04-08 10:54:28 CEST; 1 weeks 0 days ago
Main PID: 1328 (code=exited, status=0/SUCCESS)
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 08 10:54:26 Precision-3551 systemd[1]: Stopping Thermal Daemon Service...
Apr 08 10:54:26 Precision-3551 thermald[1328]: Terminating ...
Apr 08 10:54:27 Precision-3551 thermald[1328]: terminating on user request ..
Apr 08 10:54:28 Precision-3551 systemd[1]: thermald.service: Succeeded.
Apr 08 10:54:28 Precision-3551 systemd[1]: Stopped Thermal Daemon Service.
I have now reactivated it with sudo service thermald restart and now:
$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-04-15 22:26:23 CEST; 2s ago
Main PID: 609438 (thermald)
Tasks: 2 (limit: 18622)
Memory: 1.3M
CGroup: /system.slice/thermald.service
└─609438 /usr/sbin/thermald --systemd --dbus-enable --adaptive
Apr 15 22:26:23 Precision-3551 systemd[1]: Starting Thermal Daemon Service...
Apr 15 22:26:23 Precision-3551 systemd[1]: Started Thermal Daemon Service.
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: Polling mode is enabled: 4
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
thermald, and it seems supported by Ubuntu. The laptop is pretty new. I am running some jobs (with cpulimit 50 to be honest, but still...), and that's why it is running hot. – Py-ser Apr 01 '22 at 13:40
lshw seems probably too much? – Py-ser Apr 01 '22 at 13:53
grep --max-count=1 "model name" /proc/cpuinfo gives, for example: model name : Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz edit the information into your question. – Doug Smythies Apr 01 '22 at 14:02
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz I'll edit the main post – Py-ser Apr 01 '22 at 14:10
xml ? – Py-ser Apr 01 '22 at 16:26
/etc/thermald/thermal-conf.xml file. – Doug Smythies Apr 01 '22 at 17:41
man thermald and man thermal-conf.xml. The thermal-conf.xml file that you used is a generic one that's an example only. First remove it altogether, and restart thermald. It'll try and run in a default configuration if it doesn't find the .xml file. See how that works. Otherwise, stop thermald, and manually run it with sudo thermald --no-daemon --loglevel=info and let thermald tell you itself what it finds, and use that to write your own .xml file. – heynnema Apr 01 '22 at 17:46
sudo systemctl status thermald ? – Doug Smythies Apr 15 '22 at 15:02