
I have a bash script

#!/bin/bash

# Enable nvidia-smi settings so they are persistent the whole time the system is on.
nvidia-smi -pm 1

# Define the various overclocking settings (powerLimit in watts)
powerLimit="100"
coreOffset="150"
memoryOffset="1000"
targetFanSpeed="40"

TOTAL_GPU=5
GPU_INDEX=0
while [ "$GPU_INDEX" -lt "$TOTAL_GPU" ]; do
    nvidia-smi -i "$GPU_INDEX" -pl "$powerLimit"
    # Quote the bracketed targets so the shell never treats them as glob patterns
    nvidia-settings -a "[gpu:$GPU_INDEX]/GpuPowerMizerMode=1"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUMemoryTransferRateOffset[3]=$memoryOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUGraphicsClockOffset[3]=$coreOffset"
    nvidia-settings -a "[gpu:$GPU_INDEX]/GPUFanControlState=1"
    nvidia-settings -a "[fan:$GPU_INDEX]/GPUTargetFanSpeed=$targetFanSpeed"
    GPU_INDEX=$((GPU_INDEX + 1))
done

The script sets an overclock on the GPUs installed in my system. I am trying to run it at startup so that the settings are reapplied automatically after a reboot. To do so, I have edited my root crontab with the following entries:

0 0 * * * reboot -h now
@reboot bash /home/rig0/Documents/startup/1060OC.sh > /home/rig0/Documents/startup/1060OC.log

I am redirecting the output of the bash script into a log file to make sure that each setting is applied successfully. My first, minor problem is that some of the text that is printed when I run the script from a terminal is not captured in this file.

For instance, the log output from the cron job is (for just one GPU):

Enabled persistence mode for GPU 00000000:07:00.0.
...
Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 150.00 W.
All done.

But when run in the console, the text reads (for just one GPU):

Power limit for GPU 00000000:07:00.0 was set to 100.00 W from 100.00 W.
All done.
  Attribute 'GPUPowerMizerMode' (rig0-System-Product-Name:0[gpu:4]) assigned
  value 1.
  Attribute 'GPUMemoryTransferRateOffset' (rig0-System-Product-Name:0[gpu:4])
  assigned value 1000.
  Attribute 'GPUGraphicsClockOffset' (rig0-System-Product-Name:0[gpu:4])
  assigned value 150.
  Attribute 'GPUFanControlState' (rig0-System-Product-Name:0[gpu:4]) assigned
  value 1.
  Attribute 'GPUTargetFanSpeed' (rig0-System-Product-Name:0[fan:4]) assigned
  value 40.

Why do I not see the "extra" text when the output is redirected to the log file in the cron job?

I assume I need to do something like this U&L answer and add the redirect 2>&1, which, from earlier reading, I think means redirecting stderr to stdout for the command?
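
As a quick sanity check of what 2>&1 does, here is a minimal sketch. The emit function and the temporary log files are hypothetical stand-ins, not part of my actual script:

```shell
#!/bin/bash

# Hypothetical stand-in for a command that writes to both output streams.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_dir=$(mktemp -d)

# Only stdout is redirected; stderr still goes to the terminal
# (or, under cron, into the mail cron sends for the job).
emit > "$log_dir/stdout_only.log"

# stdout goes to the file, then stderr is pointed at the same place.
emit > "$log_dir/both.log" 2>&1

echo "stdout_only.log:"; cat "$log_dir/stdout_only.log"
echo "both.log:";        cat "$log_dir/both.log"
```

If this is right, the order matters: the 2>&1 has to come after the > file redirect, so the crontab entry would end in `> /home/rig0/Documents/startup/1060OC.log 2>&1`.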


Though... I think the real issue, and the main question of this post, is that none of the settings from the cron job's bash script are actually applied (I need to check; I think the wattage is set, but I forget).

Are these settings not actually set because the cron job runs before the Nvidia X server starts up?

Basically, do the calls to nvidia-settings ... not take effect because the X server using the nvidia driver is not yet running at the time the cron job runs?


Is there a way I could check for, and wait for, the X server to be running in my bash script? This would allow the cron job to wait until the settings it tries to change are available.

Maybe I can add the proposed solution from this SO answer?


EDIT: I found an old Red Hat archive that gave me a way to make the bash script wait for the X server to start up:

# Wait for the X server to startup
echo "Waiting for the X server to startup..."
XON=""
while [ "$XON" == "" ]; do
    /bin/sleep 5
    echo "    Checking for X server."
    XON=$(ps ax | grep -v grep | grep -i xorg)
    echo "    Result:[$XON]"
done
echo "X server started. Setting overclock settings..."

This does seem to wait for the X server to start up, based on the log file output:

Waiting for the X server to startup...
    Checking for X server.
    Result:[  978 tty7     Rs+    0:01 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch]
X server started. Setting overclock settings...
Enabled persistence mode for GPU 00000000:01:00.0.

However, the same issues persist. The settings do not take hold, and the output I would expect from the nvidia-settings commands is not present in the log.

Do I need to wait for nvidia-something to start maybe..?
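
One variation I could try, as a sketch only: the Xorg process showing up in ps does not necessarily mean the server is accepting connections yet, so the loop could instead wait for the display socket, with a timeout so it cannot hang forever. This assumes display :0 uses the conventional socket path /tmp/.X11-unix/X0; wait_for_path and its arguments are names I made up for illustration:

```shell
#!/bin/bash

# wait_for_path PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as PATH exists, or 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
    local path="$1"
    local timeout="${2:-60}"
    local interval="${3:-1}"
    local waited=0

    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

In the startup script this would be called as `wait_for_path /tmp/.X11-unix/X0 120 5 || exit 1` before the nvidia-settings calls. I have no evidence yet that this changes the outcome; it only makes the wait more specific and bounded.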


EDIT2: First, I've moved the same cron command into my rc.local because it seems like a more correct solution.

Next, I've redirected the output with 2>&1 and found the error message:

Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings
       --help` for usage information.

This seems to be a common problem on headless systems trying to run the nvidia-settings command, and it seems to be solved with export DISPLAY=:0. However, given that my system is not headless (it has a monitor attached), I am not sure whether this is the correct solution, or exactly what effect export DISPLAY=:0 has, especially on a non-headless system.
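
From what I have read so far, DISPLAY only tells X clients which display to connect to, so a sketch of what I would try at the top of the script looks like this. The XAUTHORITY line is an assumption on my part: the path is the -auth argument from the Xorg command line in my log output, and I do not know yet whether root actually needs it:

```shell
#!/bin/bash

# Tell X clients such as nvidia-settings which display to connect to.
export DISPLAY=:0

# Assumption: point at the authority file lightdm passed to Xorg via -auth.
# This may be unnecessary, or the path may differ on other setups.
export XAUTHORITY=/var/run/lightdm/root/:0

# Guard against nvidia-settings not being in the PATH of cron or rc.local.
if command -v nvidia-settings >/dev/null 2>&1; then
    nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
else
    echo "nvidia-settings not found in PATH ($PATH)" >&2
fi
```

With 2>&1 on the log redirect, either the "assigned value" lines or a new error message should then show up in the log, which would at least tell me whether the control display was the only problem.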

Doing some research...

KDecker
  • Yes you probably need to add 2>&1 to capture stderr as well as stdout - you should then be able to see errors in the logfile which may give a clue as to what's not working. (Often it's things like executables not being in cron's default path - where is nvidia-smi located, for example?) – steeldriver Jan 18 '18 at 03:08
  • Ohhh, hmm, nvidia-smi should be in cron's path because the power limits are confirmed to be set. But nvidia-settings probably isn't, then. I'll add 2>&1 to capture the errors, see if that is indeed the issue, and use the absolute path to nvidia-settings if so. // I just tried to run the same command from the cron in rc.local with the same results. Though, I think rc.local should see nvidia-settings given when it runs..? But I'm too novice to know. – KDecker Jan 18 '18 at 03:17
  • @steeldriver I've made another edit. But after some research, it might be a rather specific issue. Basically the same issue is encountered on headless systems trying to run nvidia-settings, and is subsequently solved by export DISPLAY=:0. Though, I don't exactly understand what this means, and seeing as my system is not headless, I think it would be a mistake to blindly copy-paste that. – KDecker Jan 18 '18 at 03:34
  • Setting DISPLAY=:0 is just a way to tell nvidia-settings which X display to connect to - I don't think any harm can come from it. – steeldriver Jan 18 '18 at 13:54
