When MST and atomic were introduced to nouveau, another structure that
could contain a drm_connector embedded within it was introduced; struct
nv50_mstc. This meant that we no longer would be able to simply loop
through our connector list and assume that nouveau_connector() would
return a proper pointer for each connector, since the assertion that
all connectors coming from nouveau have a full nouveau_connector struct
became invalid.
Unfortunately, none of the actual code that looped through connectors
ever got updated, which means that we've been causing invalid memory
accesses for quite a while now.
An example that was caught by KASAN:
[ 201.038698] ==================================================================
[ 201.038792] BUG: KASAN: slab-out-of-bounds in nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038797] Read of size 4 at addr ffff88076738c650 by task kworker/0:3/718
[ 201.038800]
[ 201.038822] CPU: 0 PID: 718 Comm: kworker/0:3 Tainted: G O 4.18.0-rc4Lyude-Test+ #1
[ 201.038825] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[ 201.038882] Workqueue: events nouveau_display_hpd_work [nouveau]
[ 201.038887] Call Trace:
[ 201.038894] dump_stack+0xa4/0xfd
[ 201.038900] print_address_description+0x71/0x239
[ 201.038929] ? nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038935] kasan_report.cold.6+0x242/0x2fe
[ 201.038942] __asan_report_load4_noabort+0x19/0x20
[ 201.038970] nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038998] ? nvif_notify_put+0x1f0/0x1f0 [nouveau]
[ 201.039003] ? kmsg_dump_rewind_nolock+0xe4/0xe4
[ 201.039049] nouveau_display_init.cold.12+0x34/0x39 [nouveau]
[ 201.039089] ? nouveau_user_framebuffer_create+0x120/0x120 [nouveau]
[ 201.039133] nouveau_display_resume+0x5c0/0x810 [nouveau]
[ 201.039173] ? nvkm_client_ioctl+0x20/0x20 [nouveau]
[ 201.039215] nouveau_do_resume+0x19f/0x570 [nouveau]
[ 201.039256] nouveau_pmops_runtime_resume+0xd8/0x2a0 [nouveau]
[ 201.039264] pci_pm_runtime_resume+0x130/0x250
[ 201.039269] ? pci_restore_standard_config+0x70/0x70
[ 201.039275] __rpm_callback+0x1f2/0x5d0
[ 201.039279] ? rpm_resume+0x560/0x18a0
[ 201.039283] ? pci_restore_standard_config+0x70/0x70
[ 201.039287] ? pci_restore_standard_config+0x70/0x70
[ 201.039291] ? pci_restore_standard_config+0x70/0x70
[ 201.039296] rpm_callback+0x175/0x210
[ 201.039300] ? pci_restore_standard_config+0x70/0x70
[ 201.039305] rpm_resume+0xcc3/0x18a0
[ 201.039312] ? rpm_callback+0x210/0x210
[ 201.039317] ? __pm_runtime_resume+0x9e/0x100
[ 201.039322] ? kasan_check_write+0x14/0x20
[ 201.039326] ? do_raw_spin_lock+0xc2/0x1c0
[ 201.039333] __pm_runtime_resume+0xac/0x100
[ 201.039374] nouveau_display_hpd_work+0x67/0x1f0 [nouveau]
[ 201.039380] process_one_work+0x7a0/0x14d0
[ 201.039388] ? cancel_delayed_work_sync+0x20/0x20
[ 201.039392] ? lock_acquire+0x113/0x310
[ 201.039398] ? kasan_check_write+0x14/0x20
[ 201.039402] ? do_raw_spin_lock+0xc2/0x1c0
[ 201.039409] worker_thread+0x86/0xb50
[ 201.039418] kthread+0x2e9/0x3a0
[ 201.039422] ? process_one_work+0x14d0/0x14d0
[ 201.039426] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 201.039431] ret_from_fork+0x3a/0x50
[ 201.039441]
[ 201.039444] Allocated by task 79:
[ 201.039449] save_stack+0x43/0xd0
[ 201.039452] kasan_kmalloc+0xc4/0xe0
[ 201.039456] kmem_cache_alloc_trace+0x10a/0x260
[ 201.039494] nv50_mstm_add_connector+0x9a/0x340 [nouveau]
[ 201.039504] drm_dp_add_port+0xff5/0x1fc0 [drm_kms_helper]
[ 201.039511] drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
[ 201.039518] drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
[ 201.039525] drm_dp_mst_link_probe_work+0x71/0xb0 [drm_kms_helper]
[ 201.039529] process_one_work+0x7a0/0x14d0
[ 201.039533] worker_thread+0x86/0xb50
[ 201.039537] kthread+0x2e9/0x3a0
[ 201.039541] ret_from_fork+0x3a/0x50
[ 201.039543]
[ 201.039546] Freed by task 0:
[ 201.039549] (stack is not available)
[ 201.039551]
[ 201.039555] The buggy address belongs to the object at ffff88076738c1a8
which belongs to the cache kmalloc-2048 of size 2048
[ 201.039559] The buggy address is located 1192 bytes inside of
2048-byte region [ffff88076738c1a8, ffff88076738c9a8)
[ 201.039563] The buggy address belongs to the page:
[ 201.039567] page:ffffea001d9ce200 count:1 mapcount:0 mapping:ffff88084000d0c0 index:0x0 compound_mapcount: 0
[ 201.039573] flags: 0x8000000000008100(slab|head)
[ 201.039578] raw: 8000000000008100 ffffea001da3be08 ffffea001da25a08 ffff88084000d0c0
[ 201.039582] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ 201.039585] page dumped because: kasan: bad access detected
[ 201.039588]
[ 201.039591] Memory state around the buggy address:
[ 201.039594] ffff88076738c500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 201.039598] ffff88076738c580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 201.039601] >ffff88076738c600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[ 201.039604] ^
[ 201.039607] ffff88076738c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 201.039611] ffff88076738c700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 201.039613] ==================================================================
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Every codepath in nouveau that loops through the connector list
currently does so using the old method, which is prone to race
conditions from MST connectors being created and destroyed. This has
been causing a multitude of problems, including memory corruption from
trying to access connectors that have already been freed!
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The bo array has req->nr_buffers elements so the > should be >= so we
don't read beyond the end of the array.
Fixes: a1606a9596 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It was possible for this to be skipped when shutting down MST streams, and
leaving the core channel interlocked with a wndw channel update that never
happens - leading to a hung display.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested-By: Lyude Paul <lyude@redhat.com>
Ctxdmas for cursors from all heads are setup in the core channel, and due
to us tracking allocated handles per-window, we were failing with -EEXIST
on multiple-head setups trying to allocate duplicate handles.
The cursor code is hardcoded to use the core channel vram ctxdma already,
so just skip ctxdma allocation for cursor fbs to fix the issue.
Fixes: 5bca1621c0 ("drm/nouveau/kms/nv50-: move fb ctxdma tracking into windows")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Inserted wait-for-gr-idle in the places it seems that RM does it, seems
to prevent some random mmio timeouts on Quadro GV100.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It's better to use "list_for_each_entry_from_reverse" for iterating list
than "for loop" as it makes the code more clear to read.
This patch replace "for loop" with "list_for_each_entry_from_reverse"
and "start" variable with "cstate" which helps in refactoring
the code and also "cstate" variable is more commonly used in the other
functions.
changes in v2:
"start" variable is removed, before "cstate" variable was removed
but "cstate" is more common so preferred "cstate" over "start".
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A NV34 GPU was seeing temp and pwm entries in hwmon, which would error
out when read. These should not have been visible, but also the whole
hwmon object should just not have been registered in the first place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The method struct vga_switcheroo_handler::get_client_id() is defined
as returning an 'enum vga_switcheroo_client_id' but the implementation
in this driver, nouveau_dsm_get_client_id(), returns an 'int'.
Fix this by returning 'enum vga_switcheroo_client_id' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The method struct drm_connector_helper_funcs::mode_valid is defined
as returning an 'enum drm_mode_status' but the driver implementation
for this method uses an 'int' for it.
Fix this by using 'enum drm_mode_status' in the driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
VEID support hacked in here, as it's the most convenient place for now.
Will be refined once it's better understood.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When only the position of a window changes, there's no need to submit
an image update as well.
Will be required to support the overlays, and Volta windows.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Window visibility is going to become a little more complicated with the
upcoming LUT changes, so store the calculated value to avoid needing to
recalculate the armed state again.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Instead of windows returning their core channel interlock mask if they
know core has been modified, it's recorded unconditionally and used if
required when update methods are emitted.
This will be required to support Volta.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>