I have a bash script
#!/bin/bash
# Enable nvidia-smi settings so they are persistent the whole time the system is on.
nvidia-smi -pm 1
# Define the various overclocking settings (powerLimit in watts)
powerLimit="100"
coreOffset="150"
memoryOffset="1000"
targetFanSpeed="40"
TOTAL_GPU=5
GPU_INDEX=0
while [ "$GPU_INDEX" -lt "$TOTAL_GPU" ]; do
    nvidia-smi -i "$GPU_INDEX" -pl "$powerLimit"
    # Quote each attribute string so the shell cannot treat [...] as a glob pattern
    nvidia-settings -a "[gpu:$GPU_INDEX]/GpuPowerMizerMode=1"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUMemoryTransferRateOffset[3]=$memoryOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUGraphicsClockOffset[3]=$coreOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUFanControlState=1"
    nvidia-settings -a "[fan:$GPU_INDEX]/GPUTargetFanSpeed=$targetFanSpeed"
    GPU_INDEX=$((GPU_INDEX + 1))
done
It sets an overclock on the GPUs installed in my system. I am trying to run this script at startup so that the settings are reapplied automatically after a reboot. To do so I have edited my root crontab with the entries
0 0 * * * reboot -h now
@reboot bash /home/rig0/Documents/startup/1060OC.sh > /home/rig0/Documents/startup/1060OC.log
I am redirecting the output of the bash script into a log file to make sure that each setting is applied successfully. My first, minor problem is that some of the text that is output when I run the bash script from a terminal is not captured in this file.
For instance the log output from the cron job is (for just one GPU)
Enabled persistence mode for GPU 00000000:07:00.0.
...
Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 150.00 W.
All done.
But when run in the console the text reads (for just one GPU)
Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 100.00 W.
All done.
Attribute 'GPUPowerMizerMode' (rig0-System-Product-Name:0[gpu:4]) assigned
value 1.
Attribute 'GPUMemoryTransferRateOffset' (rig0-System-Product-Name:0[gpu:4])
assigned value 1000.
Attribute 'GPUGraphicsClockOffset' (rig0-System-Product-Name:0[gpu:4])
assigned value 150.
Attribute 'GPUFanControlState' (rig0-System-Product-Name:0[gpu:4]) assigned
value 1.
Attribute 'GPUTargetFanSpeed' (rig0-System-Product-Name:0[fan:4]) assigned
value 40.
Why do I not see the "extra" text when the output is redirected to the log file in the cron job?
I assume I need to do something like this U&L answer and add the redirect 2>&1, which from earlier reading I think means to redirect stderr to stdout for the command?
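To illustrate what 2>&1 changes, here is a minimal sketch. The function emit_both is a hypothetical stand-in for any command that writes to both streams, and the temporary log files are only for the demonstration. With a plain > redirect only stdout reaches the file, while adding 2>&1 after it sends stderr to the same place.

```shell
#!/bin/bash
# Hypothetical stand-in for a command that reports on both streams.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_stdout_only=$(mktemp)
log_both=$(mktemp)

# Only stdout is redirected; stderr still goes to the terminal
# (or, under cron, to wherever cron sends it).
emit_both > "$log_stdout_only"

# stdout goes to the file first, then stderr is pointed at the same target.
# Order matters: "2>&1" must come after "> file".
emit_both > "$log_both" 2>&1

echo "--- stdout only ---"; cat "$log_stdout_only"
echo "--- stdout and stderr ---"; cat "$log_both"
```

Applied to the crontab entry above, that would presumably mean appending 2>&1 after the log file path, so that any error messages from the script end up in 1060OC.log as well.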
Though... I think the real issue, and the main question of this post, is that none of the settings from the cron job bash script are actually applied (I need to check; I think the wattage is set, but I forget).
Are these settings not actually set because the cron job runs before the Nvidia X server starts up?
Basically, the calls to nvidia-settings ... don't take effect because the X server using the nvidia driver is not yet running at the time the cron job is run?
Is there a way I could check, and wait, for the X server to be running in my bash script? This would allow the cron job to wait until the settings it tries to effect are available.
Maybe I can add the proposed solution from this SO answer ?
EDIT: I found an old RedHat archive that gave me a way to make the bash script wait for the X server to start up
# Wait for the X server to startup
echo "Waiting for the X server to startup..."
XON=""
while [ "$XON" == "" ]; do
    /bin/sleep 5
    echo " Checking for X server."
    XON=$(ps ax | grep -v grep | grep -i xorg)
    echo " Result:[$XON]"
done
echo "X server started. Setting overclock settings..."
Which does seem to wait for the X server to start up based on the log file output
Waiting for the X server to startup...
Checking for X server.
Result:[ 978 tty7 Rs+ 0:01 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch]
X server started. Setting overclock settings...
Enabled persistence mode for GPU 00000000:01:00.0.
Though the same issues persist. The settings do not take hold, and the output I would expect from the nvidia-settings commands is not present in the log.
Do I need to wait for nvidia-something to start maybe..?
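One possible variation on the wait loop above is to poll with a timeout, so the script cannot hang forever if X never comes up. The sketch below is only that, a sketch: wait_for_path is a hypothetical helper, and it assumes the desktop runs on display :0, for which the X server normally creates a socket at /tmp/.X11-unix/X0. The socket existing does not by itself guarantee that nvidia-settings can connect.

```shell
#!/bin/bash
# wait_for_path PATH TIMEOUT_SECONDS  (hypothetical helper)
# Polls once per second until PATH exists.
# Returns 0 as soon as it exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
    local path=$1 timeout=$2 waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example call (assumes display :0, i.e. the socket /tmp/.X11-unix/X0):
#   wait_for_path /tmp/.X11-unix/X0 60 || { echo "X did not start" >&2; exit 1; }
```

Checking for a file rather than grepping the process list avoids matching unrelated processes whose command line happens to contain "xorg", though either approach only shows that X has started, not that the client is allowed to talk to it.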
EDIT2: First, I've moved the same cron command into my rc.local because that seems like a more correct solution.
Next, I've redirected the output with 2>&1 and found the error message
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings
--help` for usage information.
This seems to be a common problem on headless systems trying to run the nvidia-settings command, and it seems to be solved with export DISPLAY=:0. Though, given that I have a head'ed system, I am not sure whether this is the correct solution, or exactly what the effects of export DISPLAY=:0 are, especially on a head'ed system.
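As a rough sketch of what that could look like at the top of the script: DISPLAY only tells X clients such as nvidia-settings which display to connect to, and here it affects only this script and the commands it starts. Both values below are assumptions. :0 matches the display in the Xorg command line logged above, and the XAUTHORITY path is copied from that command line's -auth argument; whether setting it is needed depends on how the display manager handles authorization.

```shell
#!/bin/bash
# Assumption: the local desktop session is display :0
# (the logged Xorg command line shows ":0").
export DISPLAY=:0

# Assumption: authorization file taken from the "-auth" argument in the
# logged Xorg command line; may be unnecessary or different on other setups.
export XAUTHORITY=/var/run/lightdm/root/:0

# Only attempt the call if the tool can be found in this environment's PATH.
if command -v nvidia-settings >/dev/null 2>&1; then
    nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
else
    echo "nvidia-settings not found in PATH" >&2
fi
```

Since cron and rc.local run without a desktop session's environment, neither variable is set there by default, which would be consistent with the "control display is undefined" error.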
Doing some research...
2>&1 to capture stderr as well as stdout - you should then be able to see errors in the logfile which may give a clue as to what's not working. (Often it's things like executables not being in cron's default path - where is nvidia-smi located, for example?) – steeldriver Jan 18 '18 at 03:08
nvidia-smi should be in cron's path because the power limits are confirmed to be set. But nvidia-settings probably isn't then. I'll add 2>&1 to capture the output and see if that is indeed the issue, and use absolute paths to nvidia-settings then. // I just tried to run the same command from the cron in rc.local with the same results. Though, I think rc.local should see nvidia-settings given when it runs..? But I'm too novice to know. – KDecker Jan 18 '18 at 03:17
nvidia-settings, and is subsequently solved by export DISPLAY=:0. Though, I don't exactly understand what this means, and seeing as I have a head'ed system, I think it would be a mistake to blindly copypasta that. – KDecker Jan 18 '18 at 03:34
DISPLAY=:0 is just a way to tell nvidia-settings which X display to connect to - I don't think any harm can come from it. – steeldriver Jan 18 '18 at 13:54