
Much like what happened to this guy, my GPU has disappeared from my device list. Neither lspci nor lshw shows the graphics card, and I don't understand what caused the problem.

System info:

  • Thinkpad P1 (2nd Gen) with Ubuntu 20.04 that was upgraded from 19.10.
  • Kernel version is 5.4.
  • GPU NVIDIA Quadro T1000.

The problem actually appeared after a firmware upgrade (BIOS and motherboard), so I attempted a downgrade, but fwupdmgr downgrade returned no available downgrade options. I then booted from an Ubuntu live image, where the GPU was detected and seemed to work just fine. Therefore, the problem cannot be firmware-related.

I searched for useful error hints using journalctl, dmesg and /var/log/syslog, without much success. If you think they would be helpful, I can attach the logs.

I don't know whether it is a kernel module problem or something else. I feel I have tried everything I could, without success. I even purged all NVIDIA drivers and NVIDIA-related libraries, but that cannot be the reason why not even lshw shows the device...

P.S. Using the nvidia-installer to try to force the installation of the drivers didn't work either.



EDIT: New insights found.

Digging inside the journal I found this line:

kernel: nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug

(full output below [1])

Therefore I disabled and blacklisted the nouveau driver to see whether it was interfering, but without much success. The new output does not contain any entry regarding the deactivation of 0000:01:00.0 (which is the graphics card); it only states

kernel: pci 0000:01:00.0: Removing from iommu group 1

(full output below [2])

But I have no clue what it means, nor did I find anything useful on the web.
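
One way to check whether the kernel still has the device on the bus, independently of lspci, is to look for its sysfs node. This is a minimal sketch that assumes the GPU address 0000:01:00.0 from the journal; the function name pci_device_present and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# Report whether a PCI device node exists in sysfs.
# Usage: pci_device_present ADDRESS [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; it is a parameter only so that the check
# can be pointed at a copy of the tree (hypothetical convenience).
pci_device_present() {
    addr=$1
    root=${2:-/sys}
    [ -d "$root/bus/pci/devices/$addr" ]
}

# The address below is the one that appears in the journal output.
if pci_device_present "0000:01:00.0"; then
    echo "0000:01:00.0 is present on the PCI bus"
else
    echo "0000:01:00.0 is not present on the PCI bus"
fi
```

If the node is missing, writing 1 to /sys/bus/pci/rescan as root asks the kernel to re-enumerate the bus; whether the device then stays visible depends on what removed it in the first place.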


Requested in a comment: As you can see only the Intel graphics card is shown:

:~$ lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
Subsystem: Lenovo UHD Graphics 630 (Mobile)
Kernel driver in use: i915
Kernel modules: i915

Second request:

:~$ ls -al /dev | grep nvidia
crw-rw-rw-   1 root     root    195,     0 lug  2 19:40 nvidia0
crw-rw-rw-   1 root     root    195,   255 lug  2 19:40 nvidiactl
crw-rw-rw-   1 root     root    195,   254 lug  2 19:40 nvidia-modeset
crw-rw-rw-   1 root     root    235,     0 lug  2 19:40 nvidia-uvm
crw-rw-rw-   1 root     root    235,     1 lug  2 19:40 nvidia-uvm-tools

:~$ grep 125 /etc/group
colord:x:125:

Third request: output of /proc/cmdline

BOOT_IMAGE=/vmlinuz-5.4.0-40-generic root=/dev/mapper/vgubuntu-root ro quiet nosplash noacpi noapic pcie_port_pm=off

I should clarify that the presence or absence of any of the last three options didn't make any difference, except for increased fan usage.
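
Since changes to GRUB_CMDLINE_LINUX_DEFAULT only take effect after update-grub and a reboot, it can be worth confirming which options the running kernel actually received. This is a small sketch; the helper name has_boot_option and its optional file argument are assumptions for illustration.

```shell
#!/bin/sh
# Succeed if OPTION appears as a whole word in the kernel command line.
# Usage: has_boot_option OPTION [CMDLINE_FILE]
# CMDLINE_FILE defaults to /proc/cmdline.
has_boot_option() {
    opt=$1
    file=${2:-/proc/cmdline}
    # Put each whitespace-separated word on its own line, then look for
    # an exact, literal, whole-line match.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$opt"
}

# Check the three options discussed above.
for o in noacpi noapic pcie_port_pm=off; do
    if has_boot_option "$o"; then
        echo "$o: active"
    else
        echo "$o: not set"
    fi
done
```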

Fourth request:


----:~$ grep -r nvidia /lib/udev/rules.d/ /etc/udev/rules.d
/lib/udev/rules.d/61-gdm.rules:# disable Wayland when using the proprietary nvidia driver
/lib/udev/rules.d/61-gdm.rules:DRIVER=="nvidia", RUN+="/usr/lib/gdm3/gdm-disable-wayland"
/lib/udev/rules.d/71-nvidia.rules:SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"
/lib/udev/rules.d/71-nvidia.rules:# Start and stop nvidia-persistenced on power on and power off
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", TAG+="systemd", ENV{SYSTEMD_WANTS}="nvidia-persistenced.service"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-modeset module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-modeset"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-modeset"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-drm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-drm"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-drm"
/lib/udev/rules.d/71-nvidia.rules:# Load and unload nvidia-uvm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-uvm"
/lib/udev/rules.d/71-nvidia.rules:ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-uvm"
/lib/udev/rules.d/71-nvidia.rules:# This will create the device nvidia device nodes
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/ub-device-create"
/lib/udev/rules.d/71-nvidia.rules:# Create the device node for the nvidia-uvm module
/lib/udev/rules.d/71-nvidia.rules:ACTION=="add", DEVPATH=="/module/nvidia_uvm", SUBSYSTEM=="module", RUN+="/sbin/ub-device-create"
/lib/udev/rules.d/71-u-d-c-gpu-detection.rules:ACTION=="add", SUBSYSTEMS=="pci", DRIVERS=="nvidia", RUN+="/bin/touch /run/u-d-c-nvidia-was-loaded"


[1] output with nouveau driver

----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
giu 22 19:21:28 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
giu 22 19:21:28 ----- kernel: pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
giu 22 19:21:28 ----- kernel: nouveau: detected PR support, will not use DSM
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: enabling device (0002 -> 0003)
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: NVIDIA TU117 (167000a1)
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: bios: version 90.17.31.00.23
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 01033f46 04600020
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: DCB conn 03: 00002348
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: unknown connector type 48
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: unknown connector type 48
giu 22 19:21:28 ----- kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
giu 22 19:21:28 ----- kernel: nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: Disabling MSI
giu 22 19:21:30 ----- kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
giu 22 19:21:30 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
giu 22 19:21:31 ----- kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22
giu 22 19:21:31 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
giu 22 19:21:31 ----- systemd-udevd[761]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
giu 22 19:21:31 ----- audit[1241]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- audit[1241]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- kernel: audit: type=1400 audit(1592846491.687:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1241 comm="apparmor_parser"
giu 22 19:21:31 ----- kernel: audit: type=1400 audit(1592846491.687:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1241 comm="apparmor_parser"

[2] output without nouveau driver


-----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
giu 23 08:15:49 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
giu 23 08:15:49 ----- kernel: nvidia: module license 'NVIDIA' taints kernel.
giu 23 08:15:49 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
giu 23 08:15:49 ----- kernel: NVRM: No NVIDIA graphics adapter found!
giu 23 08:15:49 ----- kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
giu 23 08:15:50 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
giu 23 08:15:50 ----- kernel: NVRM: No NVIDIA graphics adapter found!
giu 23 08:15:50 ----- kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
giu 23 08:15:52 ----- audit[1086]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- audit[1086]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- kernel: audit: type=1400 audit(1592892952.070:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1086 comm="apparmor_parser"
giu 23 08:15:52 ----- kernel: audit: type=1400 audit(1592892952.070:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1086 comm="apparmor_parser"

One more journalctl output, taken after manually installing all the NVIDIA software (including the latest driver) and booting with the option pcie_port_pm=off (as suggested by @nobody):

-----:~$ journalctl -b | grep -i '01:00\|nvidia\|nouveau'
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: [10de:1fb9] type 00 class 0x030000
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x10: [mem 0xed000000-0xedffffff]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x24: [io  0x2000-0x207f]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: [10de:10fa] type 00 class 0x040300
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: reg 0x10: [mem 0xee000000-0xee003fff]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: vgaarb: bridge control possible
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xee080000-0xee0fffff pref]
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: Adding to iommu group 1
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: Adding to iommu group 1
lug 02 09:47:59 ----- kernel: nvidia: loading out-of-tree module taints kernel.
lug 02 09:47:59 ----- kernel: nvidia: module license 'NVIDIA' taints kernel.
lug 02 09:47:59 ----- kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
lug 02 09:47:59 ----- kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
lug 02 09:47:59 ----- kernel: nvidia 0000:01:00.0: enabling device (0002 -> 0003)
lug 02 09:47:59 ----- kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
lug 02 09:47:59 ----- kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
lug 02 09:47:59 ----- kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  440.100  Fri May 29 08:14:04 UTC 2020
lug 02 09:47:59 ----- kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
lug 02 09:47:59 ----- kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
lug 02 09:47:59 ----- kernel: nvidia-uvm: Loaded the UVM driver, major device number 235.
lug 02 09:47:59 ----- kernel: pci 0000:01:00.1: Removing from iommu group 1
lug 02 09:47:59 ----- kernel: pci 0000:01:00.0: Removing from iommu group 1
lug 02 09:48:02 ----- kernel: audit: type=1400 audit(1593676082.870:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- kernel: audit: type=1400 audit(1593676082.870:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- audit[1113]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1113 comm="apparmor_parser"
lug 02 09:48:02 ----- audit[1113]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1113 comm="apparmor_parser"
lug 02 09:48:03 ----- systemd[1]: Starting NVIDIA Persistence Daemon...
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Verbose syslog connection opened
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Now running with user ID 125 and group ID 132
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Started (1231)
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 125 has read and write permissions for those files.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: PID file unlocked.
lug 02 09:48:03 ----- nvidia-persistenced[1228]: nvidia-persistenced failed to initialize. Check syslog for more details.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: PID file closed.
lug 02 09:48:03 ----- nvidia-persistenced[1231]: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
lug 02 09:48:03 ----- nvidia-persistenced[1231]: Shutdown (1231)
lug 02 09:48:03 ----- systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
lug 02 09:48:03 ----- systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
lug 02 09:48:03 ----- systemd[1]: Failed to start NVIDIA Persistence Daemon.
Niccco
  • Please [edit] your question and add output of lspci -k | grep -EA3 'VGA|3D|Display' terminal command. – Pilot6 Jun 17 '20 at 21:18
  • What happens if you (re)install the proprietary NVidia drivers and/or boot with noacpi? – Landak Jun 28 '20 at 22:50
  • @Landak Thanks for the hint. About the drivers, I was receiving errors such as "no suitable candidate for this driver" or something of that sort. About the noacpi option, I had never heard of it; I will try it this evening. – Niccco Jun 30 '20 at 07:38
  • @Landak just for clarity I tried to manually install the driver (manually because the nvidia-installer fails) and nothing happened. I also tried to add the options noacpi and noapic in the grub config file (at the line GRUB_CMDLINE_LINUX_DEFAULT), but the GPU is still unreachable. – Niccco Jun 30 '20 at 16:52
  • Please, can you try the boot option pcie_port_pm=off – nobody Jul 01 '20 at 15:19
  • @nobody I added that option as well and then updated grub, but I see no difference in the various outputs, not even in the one from journalctl. I will edit the question with one more journalctl output, even though I don't think it will be of much use. – Niccco Jul 02 '20 at 07:54
  • As far as I can see, no one has suggested this before. You should try booting from an Ubuntu 19.10 Live (or another one) and see whether the card shows up in lspci. The last time I saw a vanishing Nvidia card, it was present neither in any Linux nor in Windows. In that case it meant that the notebook motherboard was at fault and had to be replaced. Also, check the BIOS (UEFI); some of them have an option to disable the 2nd VGA. If the system can't see the card physically (lspci), then trying out different drivers is futile. – lev258 Jul 02 '20 at 08:13
  • You mentioned seeing it on a Live sometime before, but you should try again. By forcing the Live to use it you might flip a switch that will make it work again normally. – lev258 Jul 02 '20 at 08:22
  • What the heck. lspci -nnk. Have you checked whether /dev/nvidia* is present? ls -al /dev | grep nvidia and grep 125 /etc/group – nobody Jul 02 '20 at 09:16
  • P.S disable secureboot too, please. – nobody Jul 02 '20 at 09:21
  • Thank you everyone for the comments. @nobody secure boot has always been disabled; I'm updating the question with the output of the two commands you asked; lspci doesn't show device 01:00.0 which is my nvidia card. – Niccco Jul 02 '20 at 17:56
  • @lev258 In the meantime I bought a second hard drive and installed Ubuntu 20.04 on it, to see whether the problem also exists in an up-to-date version (the live image is not up to date), and from there EVERYTHING works: second monitor, dual and single graphics card usage (nvidia-prime on-demand/nvidia), and lspci and lshw show the NVIDIA card. From my understanding something powers off the card even before any driver is loaded, but I don't know how to debug such a thing (who has the power to do that? where do I debug it apart from the journal? how do I prevent it?) – Niccco Jul 02 '20 at 17:56
  • Can you add the content of /proc/cmdline? There could be some switch added by grub that somehow disables the GPU. – Simon Sudler Jul 03 '20 at 06:23
  • @SimonSudler I did as you requested. – Niccco Jul 03 '20 at 09:42
  • grep -r nvidia /lib/udev/rules.d/ /etc/udev/rules.d last idea from me :(. – nobody Jul 03 '20 at 14:13
  • Thanks everyone anyway! I guess I will reset the system sooner or later – Niccco Jul 04 '20 at 13:57
  • For testing purposes, please replace /lib/udev/rules.d/71-nvidia.rules with this one: https://pastebin.ubuntu.com/p/yCYwK8JRXT/ Afterwards run sudo update-initramfs -u -k $(uname -r), which updates only the currently running kernel, then reboot. – nobody Jul 05 '20 at 10:22
  • Thanks @nobody. The only visible difference I found after doing so is that at boot time I got the error "Failed to start NVIDIA Persistence Daemon" 5 times instead of only once – Niccco Jul 06 '20 at 11:56
  • related: https://askubuntu.com/questions/737289/nvidia-gpu-is-not-detected – Zanna Nov 13 '20 at 16:52

1 Answer


Please see the helpful answer by generix here: https://forums.developer.nvidia.com/t/no-matter-which-drivers-i-install-i-cannot-boot-my-ubuntu-20-04-lts-beyond-a-black-screen/127510/9

  1. switch to nvidia (again): sudo prime-select nvidia
  2. delete /lib/udev/rules.d/50-pm-nvidia.rules (and delete /lib/udev/rules.d/80-pm-nvidia.rules too)
  3. remove stray blacklist files: sudo rm /lib/modprobe.d/blacklist-nvidia.conf /etc/modprobe.d/blacklist-nvidia.conf
  4. update the initrd: sudo update-initramfs -u
  5. reboot
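
Before deleting anything, it may help to see which of the files named in steps 2 and 3 are actually present. This is a minimal sketch; the function name list_prime_leftovers and its optional root-prefix argument are hypothetical helpers and not part of nvidia-prime.

```shell
#!/bin/sh
# Print which of the files named in steps 2 and 3 exist under ROOT.
# Usage: list_prime_leftovers [ROOT]
# ROOT defaults to the empty string, i.e. the real filesystem.
list_prime_leftovers() {
    root=${1:-}
    for f in \
        /lib/udev/rules.d/50-pm-nvidia.rules \
        /lib/udev/rules.d/80-pm-nvidia.rules \
        /lib/modprobe.d/blacklist-nvidia.conf \
        /etc/modprobe.d/blacklist-nvidia.conf
    do
        [ -e "$root$f" ] && echo "$f"
    done
    return 0
}

# Prints nothing if none of the files is present.
list_prime_leftovers
```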

See also this Nvidia Forums post where generix noted:

Ok, it’s this:

/lib/udev/rules.d/80-pm-nvidia.rules:

ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{remove}="1"

it’s removing the nvidia gpu from the bus. The rest of the file are rules to prepare for render offload. This looks like something changed in Ubuntu’s prime-select/nvidia-prime again, don’t really know what’s the point of doing this…

For me, sudo prime-select nvidia usually does the trick, but if not, manually deleting /lib/udev/rules.d/50-pm-nvidia.rules and /lib/udev/rules.d/80-pm-nvidia.rules and a reboot should do it.
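
To see whether any installed rule still removes the GPU this way, the rule directories can be searched for the ATTR{remove}="1" assignment quoted above. This is a sketch that assumes the rule is written exactly as in that quote (no extra spaces); the function name find_pci_remove_rules is made up for illustration.

```shell
#!/bin/sh
# List udev rule files that contain the literal assignment ATTR{remove}="1".
# Usage: find_pci_remove_rules DIR...
find_pci_remove_rules() {
    # -r: recurse, -l: print file names only, -F: match the string literally
    grep -rlF 'ATTR{remove}="1"' "$@" 2>/dev/null
}

# Search the two rule directories used elsewhere in this thread.
find_pci_remove_rules /lib/udev/rules.d /etc/udev/rules.d \
    || echo "no matching rules found"
```

Any file it prints is a candidate for the manual deletion described above; the search only reads the files and changes nothing.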

Thankfully, these udev rules are going away in 20.10 (groovy) (see changelog of nvidia-prime 0.8.15), so such "Nvidia GPU disappearing" problems will soon become a thing of the past.

Zanna
Anthony Fok