
I just discovered thermald for preventing machines from overheating, and I'd like some basic suggestions on how to modify its XML configuration file. Below is the one I have at /etc/thermald/thermal-conf.xml. From some examples I have skimmed online, it seems to be set to start preventing overheating at 55 C (if I am reading the <Temperature>55000</Temperature> line correctly), but my cores still reach 94 C even with the fans running.

I am using an Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz machine.

<?xml version="1.0"?>

<!-- use "man thermal-conf.xml" for details -->

<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
    <Name>Generic X86 Laptop Device</Name>
    <ProductName>EXAMPLE_SYSTEM</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <Type>TSKN</Type>
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>SKIN</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>TSKN</SensorType>
                    <Temperature>55000</Temperature>
                    <type>passive</type>
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>rapl_controller</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 16 </SamplingPeriod>
                    </CoolingDevice>
                    <CoolingDevice>
                        <index>2</index>
                        <type>intel_powerclamp</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>
            </TripPoints>
        </ThermalZone>
    </ThermalZones>
</Platform>

<!-- Thermal configuration example only -->
<Platform>
    <Name>Example Platform Name</Name>
    <!-- UUID is optional, if present this will be matched -->
    <!-- Both product name and UUID can contain wild card "*", which matches any platform -->
    <UUID>Example UUID</UUID>
    <ProductName>Example Product Name</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <!-- New Sensor with a type and path -->
            <Type>example_sensor_1</Type>
            <Path>/some_path</Path>
            <AsyncCapable>0</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Already present in thermal sysfs, enable this or add/change config.
                 For example, here we are indicating that sensor can do async events to avoid polling -->
            <Type>example_thermal_sysfs_sensor</Type>
            <!-- If async capable, then we don't need to poll -->
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Example of a virtual sensor. This sensor depends on other real sensor or virtual sensor.
                 E.g. here the temp will be temp of example_sensor_1 * 0.5 + 10 -->
            <Type>example_virtual_sensor</Type>
            <Virtual>1</Virtual>
            <SensorLink>
                <SensorType>example_sensor_1</SensorType>
                <Multiplier> 0.5 </Multiplier>
                <Offset> 10 </Offset>
            </SensorLink>
        </ThermalSensor>
    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>Example Zone type</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>example_sensor_1</SensorType>
                    <!-- Temperature at which to take action -->
                    <Temperature> 75000 </Temperature>
                    <!-- max/passive/active
                         If a MAX type is specified, then
                         daemon will use PID control
                         to aggressively throttle to avoid
                         reaching this temp.
                    -->
                    <type>max</type>
                    <!-- SEQUENTIAL | PARALLEL
                         When a trip point temp is violated, then
                         a number of cooling devices can be activated.
                         If control type is SEQUENTIAL then
                         it will exhaust the first cooling device before trying
                         the next.
                    -->
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>example_cooling_device</type>
                        <!-- Influence will be used to order cooling devices.
                             The cooling device with the highest influence
                             will be used first.
                        -->
                        <influence> 100 </influence>
                        <!-- Delay in using this cdev, this takes some time
                             to actually cool a zone
                        -->
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>
            </TripPoints>
        </ThermalZone>
    </ThermalZones>
    <CoolingDevices>
        <CoolingDevice>
            <!--
                Cooling device can be specified
                by a type and optionally a sysfs path.
                If the type is already present in thermal sysfs,
                there is no need for a path.
                Compensation can use min/max and step size
                to increasingly cool the system.
                Debounce period can be used to force
                a waiting period for action.
            -->
            <Type>example_cooling_device</Type>
            <MinState>0</MinState>
            <IncDecStep>10</IncDecStep>
            <ReadBack> 0 </ReadBack>
            <MaxState>50</MaxState>
            <DebouncePeriod>5000</DebouncePeriod>
            <!--
                If there are no PID parameters,
                compensation increases step-wise and exponentially
                if a single step is not able to change the trend.
                Alternatively, PID parameters can be specified;
                then the next step will use a PID calculation with
                the provided PID constants.
            -->
            <PidControl>
                <kp>0.001</kp>
                <kd>0.0001</kd>
                <ki>0.0001</ki>
            </PidControl>
        </CoolingDevice>
    </CoolingDevices>
</Platform>
</ThermalConfiguration>
<!-- END -->
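To see at a glance what a file like the one above actually configures, it can help to list its trip points. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the function name summarize_config is hypothetical, and only the elements that appear in this thread are read. Applied to the file above, the first <Platform> would come out with ProductName EXAMPLE_SYSTEM, a SKIN zone, and a 55 C passive trip point on the TSKN sensor (the <Temperature> value is converted from millidegrees Celsius).

```python
import xml.etree.ElementTree as ET


def summarize_config(xml_text: str) -> list:
    """Return one dict per <TripPoint> in a thermal-conf.xml document.

    Only the elements used in this thread are read; anything else is ignored.
    """
    root = ET.fromstring(xml_text)
    trips = []
    for platform in root.findall("Platform"):
        product = (platform.findtext("ProductName") or "").strip()
        for zone in platform.iter("ThermalZone"):
            zone_type = (zone.findtext("Type") or "").strip()
            for trip in zone.iter("TripPoint"):
                temp = trip.findtext("Temperature")
                trips.append({
                    "product": product,
                    "zone": zone_type,
                    "sensor": (trip.findtext("SensorType") or "").strip(),
                    # <Temperature> is in millidegrees Celsius: 55000 -> 55.0
                    "celsius": int(temp.strip()) / 1000 if temp and temp.strip() else None,
                    "type": (trip.findtext("type") or "").strip(),
                    # Cooling devices listed directly under this trip point
                    "cdevs": [(cd.findtext("type") or "").strip()
                              for cd in trip.findall("CoolingDevice")],
                })
    return trips


if __name__ == "__main__":
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/thermald/thermal-conf.xml"
    with open(path, encoding="utf-8") as f:
        for trip in summarize_config(f.read()):
            print(trip)
```

Run it as python3 summarize_config.py /path/to/thermal-conf.xml (the script name is only an example); it reads the file and does not touch thermald.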

Following the suggestion from @heynnema, I deleted the configuration file, stopped thermald, and ran sudo thermald --no-daemon --loglevel=info. The output is below, but how is this supposed to help me write a new, more effective configuration file?

$ sudo thermald --no-daemon --loglevel=info
[1649408071][INFO]RAPL domain count 1
[1649408071][INFO]RAPL domain count 1
[1649408071][MSG]22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
[1649408071][INFO]Running on a vanilla kernel
[1649408071][MSG]Polling mode is enabled: 4
[1649408071][INFO]sensor_update: type TSKN
[1649408071][INFO]sensor_update: type acpitz
[1649408071][INFO]sensor_update: type x86_pkg_temp
[1649408071][INFO]sensor_update: type pch_cometlake
[1649408071][INFO]sensor_update: type NGFF
[1649408071][INFO]sensor_update: type TMEM
[1649408071][INFO]sensor_update: type B0D4
[1649408071][INFO]sensor_update: type TVGA
[1649408071][INFO]thd_read_default_thermal_sensors loaded 8 sensors 
[1649408071][INFO]dts /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]sensor index:2 TSKN /sys/class/thermal/thermal_zone2/ Async:0 
[1649408071][INFO]sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1 
[1649408071][INFO]sensor index:5 pch_cometlake /sys/class/thermal/thermal_zone5/ Async:0 
[1649408071][INFO]sensor index:3 NGFF /sys/class/thermal/thermal_zone3/ Async:0 
[1649408071][INFO]sensor index:1 TMEM /sys/class/thermal/thermal_zone1/ Async:0 
[1649408071][INFO]sensor index:6 B0D4 /sys/class/thermal/thermal_zone6/ Async:0 
[1649408071][INFO]sensor index:4 TVGA /sys/class/thermal/thermal_zone4/ Async:0 
[1649408071][INFO]sensor index:8 hwmon /sys/class/hwmon/hwmon5/temp1_input Async:0 
[1649408071][INFO]sensor index:9 hwmon /sys/class/hwmon/hwmon5/temp2_input Async:0 
[1649408071][INFO]sensor index:10 hwmon /sys/class/hwmon/hwmon5/temp3_input Async:0 
[1649408071][INFO]thd_read_default_cooling devices loaded 14 cdevs 
[1649408071][INFO]ppcc limits max:47000000 min:10000000  min_win:28000000 step:1000000
[1649408071][INFO]set_pid_param 14 [-1000.100,10]
[1649408071][INFO]Use Default pstate drv settings
[1649408071][INFO]sysfs create failed 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]name = package-0
[1649408071][INFO]name = dram
[1649408071][INFO]sysfs read failed /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/constraint_0_max_power_uw
[1649408071][INFO]:powercap RAPL invalid max power limit range 
[1649408071][INFO]Calculate dynamically phy_max 
[1649408071][INFO]set_pid_param 18 [-0.4.0,0]
[1649408071][INFO]13: ath10k_thermal, C:0 MN: 0 MX:100 ST:1 pt:/sys/class/thermal/ rd_bk 1 
[1649408071][INFO]1: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]11: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]8: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]6: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]4: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]2: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]12: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]0: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]10: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]9: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]7: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]5: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]3: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]14: rapl_controller, C:47000000 MN: 47000000 MX:10000000 Inc ST:-2000000 Dec ST:-1000000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/ rd_bk 1 
[1649408071][INFO]15: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
[1649408071][INFO]16: rapl_controller_dram, C:100000000 MN: 100000000 MX:0 ST:-500000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/ rd_bk 1 
[1649408071][INFO]17: LCD, C:0 MN: 0 MX:120000 ST:12000 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
[1649408071][INFO]18: amdgpu, C:0 MN: 0 MX:0 ST:0 pt: rd_bk 1 
[1649408071][INFO]thd_read_default_thermal_zones loaded 7 zones 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]zone cpu will be created 
[1649408071][INFO]dts zone /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][INFO]/sys/class/hwmon/hwmon6/name->dell_smm
[1649408071][INFO]/sys/class/hwmon/hwmon4/name->pch_cometlake
[1649408071][INFO]/sys/class/hwmon/hwmon2/name->BAT0
[1649408071][INFO]/sys/class/hwmon/hwmon0/name->AC
[1649408071][INFO]/sys/class/hwmon/hwmon7/name->ath10k_hwmon
[1649408071][INFO]/sys/class/hwmon/hwmon5/name->coretemp
[1649408071][INFO]Buggy max temp: to close to critical 90000
[1649408071][INFO]Core temp DTS :critical 100000, max 90000, psv 95000
[1649408071][INFO]node type: Element, name: CoolingDevice value: rapl_controller
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_pstate
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_powerclamp
[1649408071][INFO]node type: Element, name: CoolingDevice value: cpufreq
[1649408071][INFO]node type: Element, name: CoolingDevice value: Processor
[1649408071][INFO]CDEVS order specified in thermal-cpu-cdev-order.xml
[1649408071][INFO]/sys/class/hwmon/hwmon3/name->nouveau
[1649408071][INFO]/sys/class/hwmon/hwmon1/name->acpitz
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]
ZONE DUMP BEGIN
[1649408071][INFO]
[1649408071][INFO]Zone 8: cpu, Active:1 Bind:0 Sensor_cnt:1
[1649408071][INFO]..sensors..
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
[1649408071][INFO]..trips..
[1649408071][INFO]index 0: type:passive temp:95000 hyst:0 zone id:8 sensor id:65535 control_type:1 cdev size:4
[1649408071][INFO]cdev[0] rapl_controller, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[1] intel_pstate, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[2] intel_powerclamp, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[3] Processor, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]index 1: type:polling temp:85500 hyst:0 zone id:8 sensor id:7 control_type:0 cdev size:0
[1649408071][INFO]
[1649408071][INFO]
ZONE DUMP END
[1649408071][INFO]Current user preference is 0
[1649408071][INFO]thd_engine_thread begin
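One way to use a log like this is to pull out only the lines that matter for a config file. Below is a minimal sketch, assuming the output has been saved to a text file: it collects the sensor types thermald found (the candidate <SensorType> values, such as x86_pkg_temp or TSKN, with their sysfs paths) and the trip points from the zone dump. For this log it would report the default passive trip for the cpu zone at temp:95000, i.e. 95 C. The function names and regular expressions are assumptions based only on the lines shown here, not on a documented log format.

```python
import re

# Patterns inferred from the lines in this log, e.g.
#   sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
#   index 0: type:passive temp:95000 hyst:0 ...
SENSOR_RE = re.compile(r"sensor index:(\d+) (\S+) (\S+) Async:(\d)")
TRIP_RE = re.compile(r"index (\d+): type:(\w+) temp:(\d+)")


def parse_sensors(log: str) -> dict:
    """Map sensor index -> (sensor type, sysfs path).

    A sensor can be printed more than once (again in the zone dump);
    keying by index drops the duplicates.
    """
    return {int(idx): (stype, path) for idx, stype, path, _ in SENSOR_RE.findall(log)}


def parse_trips(log: str) -> list:
    """Return (trip type, temperature in C) for each trip in the zone dump."""
    return [(ttype, int(temp) / 1000) for _, ttype, temp in TRIP_RE.findall(log)]


if __name__ == "__main__":
    import sys
    text = sys.stdin.read()
    for idx, (stype, path) in sorted(parse_sensors(text).items()):
        print(idx, stype, path)
    print(parse_trips(text))
```

For example, paste the terminal output into a file such as thermald.log and run python3 parse_thermald_log.py < thermald.log (both file names are hypothetical).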

After the edit, this is my configuration file, yet the core temperature still goes up to 90 C:

~$ cat /etc/thermald/thermal-conf.xml
<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic X86 Laptop Device</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

Additional info:

~$ ls -al /etc/thermald
total 32
drwxr-xr-x   2 root      root       4096 Apr  8 16:32 .
drwxr-xr-x 159 root      root      12288 Apr  5 09:03 ..
-rw-r--r--   1 root      root       4605 Jan 15  2019 backup
-rw-rw-r--   1 username username   816 Apr  8 16:32 thermal-conf.xml
-rw-r--r--   1 root      root        508 Jan 15  2019 thermal-cpu-cdev-order.xml

And also this seems relevant (thermald inactive?):

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-04-08 10:54:28 CEST; 1 weeks 0 days ago
   Main PID: 1328 (code=exited, status=0/SUCCESS)

Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 08 10:54:26 Precision-3551 systemd[1]: Stopping Thermal Daemon Service...
Apr 08 10:54:26 Precision-3551 thermald[1328]: Terminating ...
Apr 08 10:54:27 Precision-3551 thermald[1328]: terminating on user request ..
Apr 08 10:54:28 Precision-3551 systemd[1]: thermald.service: Succeeded.
Apr 08 10:54:28 Precision-3551 systemd[1]: Stopped Thermal Daemon Service.
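The journal repeats the same pair of messages. "I/O warning : failed to load external entity" is what libxml2 reports when it cannot open the file at all (in the --no-daemon run above, the file had just been deleted), while an XML syntax error would also end in "could not parse file". A minimal sketch that tells these cases apart before restarting thermald; check_config and check_xml_text are hypothetical names, and the check covers only readability, well-formedness, and a <ThermalConfiguration> root, not whether thermald accepts the contents.

```python
import xml.etree.ElementTree as ET


def check_xml_text(text: str) -> str:
    """Return "ok" for well-formed XML with a <ThermalConfiguration> root, else a reason."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    if root.tag != "ThermalConfiguration":
        return f"unexpected root element <{root.tag}>"
    return "ok"


def check_config(path: str = "/etc/thermald/thermal-conf.xml") -> str:
    """Distinguish a missing or unreadable file from one that is not valid XML."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except FileNotFoundError:
        return "missing"
    except OSError as err:  # e.g. permission denied
        return f"unreadable: {err}"
    return check_xml_text(text)


if __name__ == "__main__":
    print(check_config())
```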

I have now restarted it with sudo service thermald restart, and now:

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-04-15 22:26:23 CEST; 2s ago
   Main PID: 609438 (thermald)
      Tasks: 2 (limit: 18622)
     Memory: 1.3M
     CGroup: /system.slice/thermald.service
             └─609438 /usr/sbin/thermald --systemd --dbus-enable --adaptive

Apr 15 22:26:23 Precision-3551 systemd[1]: Starting Thermal Daemon Service...
Apr 15 22:26:23 Precision-3551 systemd[1]: Started Thermal Daemon Service.
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: Polling mode is enabled: 4
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp

Py-ser
  • Separate issue maybe, but if you are getting temps of 94 C you are only a few degrees from a thermal shutdown. I would be concerned about why it is running so hot. Is there a reason you installed this to control the temp? Are you sure all of the fans are working correctly and that all vents are clear of dust and dirt? – David Apr 01 '22 at 13:32
  • I was reading this conversation where they suggest thermald, and it seems to be supported by Ubuntu. The laptop is pretty new. I am running some jobs (with cpulimit 50, to be honest, but still...), and that's why it is running hot. – Py-ser Apr 01 '22 at 13:40
  • Please give your processor make and model. – Doug Smythies Apr 01 '22 at 13:40
  • OK, well, you got it hot enough to make coffee. – David Apr 01 '22 at 13:46
  • @DougSmythies How? lshw is probably too much? – Py-ser Apr 01 '22 at 13:53
  • grep --max-count=1 "model name" /proc/cpuinfo gives, for example: model name : Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz. Edit the information into your question. – Doug Smythies Apr 01 '22 at 14:02
  • model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz. I'll edit the main post. – Py-ser Apr 01 '22 at 14:10
  • @DougSmythies I am not sure whether I should interpret this as a suggestion to lower my trip point temperature. Is that all I have to change in my xml? – Py-ser Apr 01 '22 at 16:26
  • Replace the entire /etc/thermald/thermal-conf.xml file. – Doug Smythies Apr 01 '22 at 17:41
  • A lesson on how to configure thermald could take a while. First check man thermald and man thermal-conf.xml. The thermal-conf.xml file that you used is a generic one that's an example only. First remove it altogether, and restart thermald. It'll try and run in a default configuration if it doesn't find the .xml file. See how that works. Otherwise, stop thermald, and manually run it with sudo thermald --no-daemon --loglevel=info and let thermald tell you itself what it finds, and use that to write your own .xml file. – heynnema Apr 01 '22 at 17:46
  • I'll put an answer with my thermal-conf.xml for you to look at. – heynnema Apr 01 '22 at 17:48
  • what do you get for sudo systemctl status thermald? – Doug Smythies Apr 15 '22 at 15:02

1 Answer


From the comments:

A lesson on how to configure thermald could take a while. First check man thermald and man thermal-conf.xml. The thermal-conf.xml file that you used is a generic one that's an example only. First remove it altogether, and restart thermald. It'll try and run in a default configuration if it doesn't find the .xml file. See how that works. Otherwise, stop thermald, and manually run it with sudo thermald --no-daemon --loglevel=info and let thermald tell you itself what it finds, and use that to write your own .xml file.

Here's my thermal-conf.xml file...

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Dell Inspiron-7700-AIO</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>65000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                        <CoolingDevice>
                                                <index>0</index>
                                                <type>Fan</type>
                                                <influence>30</influence>
                                                <SamplingPeriod>10</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>5</index>
                                                <type>Processor</type>
                                                <influence>80</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>13</index>
                                                <type>intel_powerclamp</type>
                                                <influence>100</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>
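This file refers to cooling devices by their <type> names (Fan, Processor, intel_powerclamp), and the names available differ between machines: the cooling-device list in the question's log includes Processor and intel_powerclamp but no Fan. A minimal sketch that lists the kernel's cooling devices from /sys/class/thermal; cooling_device_types is a hypothetical name. thermald's own controls such as rapl_controller and intel_pstate have paths elsewhere in the log (under /sys/devices/...), so they will not show up in this list.

```python
from pathlib import Path


def cooling_device_types(base: str = "/sys/class/thermal") -> list:
    """Return (device, type) for each cooling_device* under base, in numeric order."""
    def number(path: Path) -> int:
        # cooling_device12 -> 12; anything unexpected sorts first
        suffix = path.name[len("cooling_device"):]
        return int(suffix) if suffix.isdigit() else -1

    devices = []
    for dev in sorted(Path(base).glob("cooling_device*"), key=number):
        try:
            devices.append((dev.name, (dev / "type").read_text().strip()))
        except OSError:
            continue  # unreadable entry; skip it
    return devices


if __name__ == "__main__":
    for name, ctype in cooling_device_types():
        print(name, ctype)
```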

Update #1:

Minimal thermal-conf.xml file...

Just edit the <Name>, <SensorType>, and <Temperature> values. Then restart thermald as a daemon, or run it manually to observe what goes on.

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>
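The same minimal file can also be generated from those three values, which avoids typos in the tags. A sketch using Python's standard xml.etree.ElementTree (ET.indent needs Python 3.9 or later); minimal_config is a hypothetical helper name, and the temperature is passed in degrees Celsius and written out in millidegrees.

```python
import xml.etree.ElementTree as ET


def minimal_config(name: str, sensor_type: str, temp_c: float) -> str:
    """Build the minimal thermal-conf.xml above from <Name>, <SensorType>
    and a trip temperature given in degrees Celsius."""
    root = ET.Element("ThermalConfiguration")
    platform = ET.SubElement(root, "Platform")
    ET.SubElement(platform, "Name").text = name
    ET.SubElement(platform, "ProductName").text = "*"  # match any product
    ET.SubElement(platform, "Preference").text = "QUIET"
    zone = ET.SubElement(ET.SubElement(platform, "ThermalZones"), "ThermalZone")
    ET.SubElement(zone, "Type").text = "cpu"
    trip = ET.SubElement(ET.SubElement(zone, "TripPoints"), "TripPoint")
    ET.SubElement(trip, "SensorType").text = sensor_type
    # The file uses millidegrees Celsius: 55 C -> 55000
    ET.SubElement(trip, "Temperature").text = str(round(temp_c * 1000))
    ET.indent(root, space="    ")  # pretty-print (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(minimal_config("Generic", "x86_pkg_temp", 55), end="")
```

With these arguments it prints a file with the same elements and values as the minimal example above; save the output, copy it to /etc/thermald/thermal-conf.xml with sudo, and restart thermald.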

To stress the CPU and observe what happens with the temps, first install Vitals https://extensions.gnome.org/extension/1460/vitals/ and set it to display the CPU package temperature and fan speed. Then run yes in a terminal (it prints "y" endlessly and keeps a CPU core busy; stop it with Ctrl+C) and watch what happens to the CPU temperature. You can also install the stress utility to do the same as yes, but with more control.

heynnema
  • I am not sure how to read the terminal output after deleting the conf file, but I have noticed that the output does not include any "Temperature" information, which seems odd to me. An explicit temperature threshold was the only thing I clearly understood in the original configuration file... – Py-ser Apr 03 '22 at 17:44
  • @Py-ser I've added a minimal thermal-conf.xml file for you to start with. Run thermald manually, with no .xml file, and find the three values that I call out and edit my minimal example. Then you can install the .xml file, re-run thermald manually, and check that it's doing what you expect. It'll take a little time to understand it all. – heynnema Apr 03 '22 at 18:46
  • @Py-ser See additions to Update #1 in my answer. – heynnema Apr 03 '22 at 18:52
  • I have Vitals installed; that's how I know my cores' temp. I am still unsure what the suggested edit would improve. My current conf file has a trip point even lower than the one suggested in your answer. You say "Just edit <Name>, <SensorType>, and <Temperature>." How? My limited knowledge only allows me to copy/paste the sensor type from the current configuration file, and the temperature is already pretty low as far as I can tell. Such an edit doesn't seem like it would be effective? – Py-ser Apr 04 '22 at 10:31
  • @Py-ser I don't know which .xml file you're using now. What were your results with no .xml file? My answer shows two working .xml files. The second one is minimal and only requires final customization by you... the three values that I quoted. You get those values by watching sudo thermald --no-daemon --loglevel=info. I know there's a lot of information there, but as I mentioned, it'll just take some time for you to understand it all. Maybe printing it out will help you discover the values that you need. – heynnema Apr 04 '22 at 12:23
  • I have edited the question to add the info after I removed the original configuration file. The original one was posted in the original post, and that was the one I was using. In the output there are several sensor types, indices, etc., and I couldn't find a real guide to navigating the output and picking out what I need to put in my configuration file. If you know of one, please add it to your answer. – Py-ser Apr 08 '22 at 09:02
  • @Py-ser I reviewed your thermald log. It looks like my minimal .xml file should work for you. You just need to customize the and values, if need be, and then restart thermald. Then use the YES or stress tests to observe thermald in operation. – heynnema Apr 08 '22 at 13:48
  • Then what should I do after I observe how the system is stressed? I already stressed it with my workload, and thermald didn't seem to be effective even though the configuration was basically the same as the one suggested in this answer. Let's say I stress the system and the temp goes up: what do I change then in the thermald configuration? – Py-ser Apr 15 '22 at 08:08
  • If you're using the .xml from my Update #1, then you just have to change the value in <Temperature>65000</Temperature>. Try 55000. – heynnema Apr 15 '22 at 12:15
  • 55000 was the original value (see the original post). Even when I change it as you suggest, the package id 0 temperature goes up to 90 degrees. Something is wrong. – Py-ser Apr 15 '22 at 13:44
  • @Py-ser You're using my minimal .xml file, correct? 90 degrees F or C? – heynnema Apr 15 '22 at 14:13
  • @Py-ser Edit your question and show me cat /etc/thermald/thermal-conf.xml. – heynnema Apr 15 '22 at 14:15
  • Edited my question. 90 C. – Py-ser Apr 15 '22 at 14:33
  • @Py-ser I've made a small change to my minimal .xml file. Please replace your current one with the new one and retest. If that doesn't work, with the new .xml file in place, stop the thermald service, and manually do sudo thermald --no-daemon --loglevel=info again, and let me take a look. – heynnema Apr 15 '22 at 15:06
  • @Py-ser Also show me ls -al /etc/thermald. – heynnema Apr 15 '22 at 15:08
  • Good point. It shouldn't matter, but I have sometimes changed /etc/thermald/thermal-cpu-cdev-order.xml. I am curious to see if the deletion of the passive type directive makes any difference. – Doug Smythies Apr 15 '22 at 15:36
  • @DougSmythies I hope it does... as I'm almost out of ideas. This is really not something you can teach/troubleshoot via comments/chat. Any further ideas are welcomed :-) – heynnema Apr 15 '22 at 16:08
  • Added info. Looks like thermald was stopped? I restarted it and I'll keep an eye on the sensors, but what about "No temp sysfs for reading raw temp"? – Py-ser Apr 15 '22 at 20:31
  • @Py-ser You're using my latest minimum .xml file, yes? The file "username username 816 Apr 8 16:32 thermal-conf.xml" should be root:root. The first "sudo systemctl status thermald" was from back when you used the example .xml file, and should be ignored. The second "sudo systemctl status thermald" is better, but may show a problem. Best way to know is to run YES or STRESS, and watch TOP while you watch the temps/fans. – heynnema Apr 15 '22 at 21:11
  • The minimal .xml file I am using is the one I recently posted in the updated question. I am not sure about the "root:root" part; should I change the file's ownership? – Py-ser Apr 16 '22 at 08:01
  • @Py-ser Make sure that you're using the latest edit for the minimal .xml file posted in Update #1 in my answer. As I mentioned earlier, I made some small changes to possibly help with your situation. If you ls -al /etc/thermald you'll see that the .xml file is under your username, instead of root. You'll need to do a sudo chown root:root /etc/thermald/thermal-conf.xml to change that. – heynnema Apr 16 '22 at 08:06
  • @Py-ser If you change your .xml to reflect my latest edit in Update #1, make sure to restart thermald so it can pick up the newer .xml file. – heynnema Apr 16 '22 at 08:10
  • @Py-ser If you watch the top command in one terminal window, while you're running either the YES or STRESS in another terminal window, you'll see one process for each CPU pop up when thermald needs to control the CPUs to reduce temps. – heynnema Apr 16 '22 at 08:12
  • Let's continue this thread in this chat room... https://chat.stackexchange.com/rooms/135598/optimize-thermal-daemon – heynnema Apr 16 '22 at 08:21
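heynnema points out that the ls -al output shows thermal-conf.xml owned by username rather than root, and fixes it with sudo chown root:root /etc/thermald/thermal-conf.xml. A tiny sketch to check ownership before restarting thermald; owned_by_root is a hypothetical name, and it only reads the file's metadata without changing anything.

```python
import os


def owned_by_root(path: str = "/etc/thermald/thermal-conf.xml") -> bool:
    """True if path exists and both its owner and group are root (uid/gid 0)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return st.st_uid == 0 and st.st_gid == 0


if __name__ == "__main__":
    path = "/etc/thermald/thermal-conf.xml"
    if not owned_by_root(path):
        print(f"fix with: sudo chown root:root {path}")
```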