Chapter 9. Known Issues

The following problems still exist in this release and are in the process of being resolved.

Known Issues

Cache Aliasing

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver such as NVIDIA's graphics driver, this can lead to hardware stability problems and system lockups.

NVIDIA has encountered bugs in some Linux kernel versions that lead to cache aliasing. Although some systems run without problems when cache aliasing occurs, other systems experience severe stability problems, including random lockups. Users experiencing stability problems due to cache aliasing should update to a kernel that does not cause cache aliasing.

Valgrind

The NVIDIA OpenGL implementation makes use of self-modifying code. To force Valgrind to retranslate this code after it is modified, you must run Valgrind with the command-line option:

    --smc-check=all

Without this option, Valgrind may execute stale code, causing incorrect behavior and reports of the form:

    ==30313== Invalid write of size 4

Driver fails to initialize when MSI interrupts are enabled

The Linux NVIDIA driver uses Message Signaled Interrupts (MSI) by default. This provides compatibility and scalability benefits, mainly because it avoids IRQ sharing.

Some systems have problems supporting MSI while working correctly with virtual wire interrupts. These problems manifest as an inability to start X with the NVIDIA driver, or as CUDA initialization failures. In these cases, the NVIDIA driver reports an error indicating that the NVIDIA kernel module does not appear to be receiving interrupts generated by the GPU.

Problems have also been observed with suspend/resume while MSI is enabled. All known problems of this kind have been fixed, but if you see suspend/resume problems that did not occur with previous drivers, disabling MSI may help.

NVIDIA is working on a long-term solution to improve the driver's out-of-the-box compatibility with system configurations that do not fully support MSI.

MSI interrupts can be disabled via the NVIDIA kernel module parameter "NVreg_EnableMSI=0". This can be set on the command line when loading the module, or more appropriately via your distribution's kernel module configuration files (such as those under /etc/modprobe.d/).
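As an illustration, the modprobe.d route usually means a line such as "options nvidia NVreg_EnableMSI=0" in a file under /etc/modprobe.d/ (standard modprobe.d syntax; the file name is up to you). After reloading the module, it can be worth reading the setting back. The C sketch below assumes the loaded driver lists its parameters in /proc/driver/nvidia/params, one "Name: value" pair per line with the "NVreg_" prefix dropped (for example "EnableMSI: 0"). That path, that format, and the helper names nv_param_value() and nv_read_params() are assumptions for illustration, not a documented interface.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up `key` in text made of "Name: value" lines (an assumed format).
 * Returns the value as a long, or -1 if no line with that exact name exists. */
long nv_param_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require ':' right after the name so a longer name that merely
         * starts with `key` is not matched. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step past the newline to the next entry */
    }
    return -1;
}

/* Read up to size-1 bytes of the file at `path` into buf and NUL-terminate it.
 * Returns 0 on success, -1 if the buffer is empty or the file cannot be opened.
 * The path to pass (e.g. /proc/driver/nvidia/params) is an assumption. */
int nv_read_params(const char *path, char *buf, size_t size)
{
    FILE *f;
    size_t n;

    if (size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

A caller would read the file into a buffer with nv_read_params() and then expect nv_param_value(buf, "EnableMSI") to return 0 if MSI was disabled. A result of -1 means the name was not found, which may simply mean the assumed format does not match your driver.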

Console restore behavior

The Linux NVIDIA driver uses the nvidia-modeset module for console restore whenever possible. Currently, this improved console restore mechanism is used on systems that boot with the UEFI Graphics Output Protocol driver, and on systems that use supported VESA linear graphical modes. VGA text, color index, planar, banked, and some linear modes are not supported; on systems using these modes, the driver falls back to the older console restore method.

When the new console restore mechanism is in use and the nvidia-modeset module is initialized (e.g., because an X server is running on a different VT, nvidia-persistenced is running, or the nvidia_drm module is loaded with the modeset=1 parameter), nvidia-modeset responds to hot-plug events by displaying the console on as many displays as it can. Note that, to save power, it may not display the console on all connected displays.

Vulkan and device enumeration

Starting with X.Org X server version 1.20.7, it is possible to enumerate all NVIDIA devices in the system when the application is able to open a connection to the X server. However, such applications can only create an Xlib or XCB swapchain on the device driving the X screen. That device can be identified with the vkGetPhysicalDeviceSurfaceSupportKHR() API.

With X server versions prior to 1.20.7, it is not possible to enumerate multiple devices if one of them will be used to present to an X11 swapchain. Multiple devices can still be enumerated, even if one of them is driving an X screen, when the devices are used only for Vulkan offscreen rendering or for presenting to a display swapchain. In that case, make sure the application cannot open a display connection to an X server, for example by unsetting the DISPLAY environment variable.
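DISPLAY can be unset in the shell that launches the application, for example "env -u DISPLAY ./your-app", where ./your-app stands for your program. It can also be unset from inside the application. A minimal C sketch of the in-process variant follows. It assumes the driver only looks for an X server through DISPLAY when the Vulkan instance is created, so the call must come before vkCreateInstance() and before any Xlib or XCB call. The function name hide_x_display_for_vulkan() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose setenv()/unsetenv() in strict C modes */
#endif
#include <stdlib.h>

/* Remove DISPLAY from this process's environment so that a later
 * vkCreateInstance() cannot find an X server to connect to (assumption:
 * the driver locates the X server only through DISPLAY).
 * Returns 1 if DISPLAY was set and has been removed, 0 if it was not set,
 * and -1 if unsetenv() failed. */
int hide_x_display_for_vulkan(void)
{
    if (getenv("DISPLAY") == NULL)
        return 0;
    return unsetenv("DISPLAY") == 0 ? 1 : -1;
}
```

This hides the X server from the whole process, so the same process cannot afterwards create an Xlib or XCB swapchain.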

Restricting access to GPU performance counters

NVIDIA Developer Tools allow developers to debug, profile, and develop software for NVIDIA GPUs. GPU performance counters are integral to these tools. For security reasons, access to GPU performance counters is by default restricted to root and other users with the CAP_SYS_ADMIN capability. If developers require access to the NVIDIA Developer Tools, a system administrator can accept the security risk and allow access to users without the CAP_SYS_ADMIN capability.
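Before changing any module parameter, you may want to check whether a given process already has CAP_SYS_ADMIN. The C sketch below reads the effective capability mask from the "CapEff:" line of /proc/self/status, which is a hexadecimal bitmask. It then tests bit 21, the value of CAP_SYS_ADMIN in <linux/capability.h>. This approximates the driver's check but is not necessarily the exact test the driver performs. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bit number of CAP_SYS_ADMIN (defined as 21 in <linux/capability.h>). */
#define CAP_SYS_ADMIN_BIT 21

/* Given the text of a /proc/<pid>/status file, return 1 if the effective
 * capability mask ("CapEff:", hexadecimal) includes CAP_SYS_ADMIN, 0 if it
 * does not, and -1 if no CapEff line is present. */
int status_has_cap_sys_admin(const char *status)
{
    const char *p = strstr(status, "CapEff:");
    unsigned long long mask;

    if (p == NULL)
        return -1;
    /* strtoull skips the tab after the colon and parses the hex digits. */
    mask = strtoull(p + strlen("CapEff:"), NULL, 16);
    return (int)((mask >> CAP_SYS_ADMIN_BIT) & 1ULL);
}

/* Apply the check to the calling process. Returns -1 if the file is unreadable. */
int self_has_cap_sys_admin(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    buf[n] = '\0';
    fclose(f);
    return status_has_cap_sys_admin(buf);
}
```

A result of 0 for the process that will run the profiler suggests it falls under the default restriction described above.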

Wider access to GPU performance counters can be granted by setting the nvidia.ko kernel module parameter "NVreg_RestrictProfilingToAdminUsers=0". This can be set on the command line when loading the module, or more appropriately via your distribution's kernel module configuration files (such as those under /etc/modprobe.d/).

Driver fails to initialize with some versions of RHEL 8

Some Red Hat Enterprise Linux 8 kernels have a bug that causes driver initialization to fail with an error such as:

    NVRM: Xid (PCI:0000:09:00): 79, pid=2172, GPU has fallen off the bus.
    NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
    NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x26:0x65:1239)
    NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0

See the Red Hat knowledge base article https://access.redhat.com/solutions/5825061 to find the specific affected and fixed kernel versions.

Driver fails to load on Linux kernel versions 5.18 through 5.18.19 with CONFIG_X86_KERNEL_IBT enabled

On CPUs that support Indirect Branch Tracking (IBT), the NVIDIA driver fails to load on Linux kernel versions 5.18 through 5.18.19 when IBT is enabled, reporting the following error:

    traps: Missing ENDBR:

This issue does not occur with Linux kernels that include the following commit:

    commit 3c6f9f77e618 (objtool: Rework ibt and extricate from stack validation)

This commit is included in Linux kernel 5.19 and later, and the NVIDIA driver's IBT support works with kernels that contain it. On kernels without this commit, use the kernel boot parameter "ibt=off" as a workaround.
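A script or installer could test whether a given machine falls in the affected window in two ways. It could compare the running kernel release against the 5.18 through 5.18.19 range. It could also look for the "ibt=off" workaround on the kernel command line, as found in /proc/cmdline. The C sketch below does both. The function names are hypothetical. Distribution kernels may backport commit 3c6f9f77e618 without changing their version number, so the version test is only a heuristic. The sketch does not check whether the CPU supports IBT or whether CONFIG_X86_KERNEL_IBT is enabled.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Return 1 if a kernel release string such as "5.18.7-arch1-1" lies in the
 * 5.18 through 5.18.19 range, 0 if it does not, -1 if it cannot be parsed.
 * A bare "5.18" (or "5.18-rc1") is treated as 5.18.0. */
int kernel_in_ibt_affected_range(const char *release)
{
    int major = 0, minor = 0, patch = 0;
    int fields = sscanf(release, "%d.%d.%d", &major, &minor, &patch);

    if (fields < 2)
        return -1;
    return (major == 5 && minor == 18 && patch <= 19) ? 1 : 0;
}

/* Return 1 if the kernel command line contains the exact token "ibt=off". */
int cmdline_has_ibt_off(const char *cmdline)
{
    const char *token = "ibt=off";
    const size_t len = strlen(token);
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        /* Only accept whole, whitespace-delimited tokens. */
        int starts = (p == cmdline) || isspace((unsigned char)p[-1]);
        int ends = (p[len] == '\0') || isspace((unsigned char)p[len]);
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Check the running kernel via uname(2); returns -1 if uname() fails. */
int running_kernel_in_ibt_affected_range(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return kernel_in_ibt_affected_range(u.release);
}
```

For example, kernel_in_ibt_affected_range("5.18.7-arch1-1") returns 1 and kernel_in_ibt_affected_range("5.19.0") returns 0.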

Notebooks

If you are using a notebook see the "