#dri-devel on 2024-02-23 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:20 vliaskov has quit [Remote host closed the connection]

00:31 iive has quit [Quit: They came for me...]

00:37 kzd has quit [Quit: kzd]

00:51 jrelvas has joined #dri-devel

01:12 CounterPillow has quit [Read error: Connection reset by peer]

01:13 CounterPillow has joined #dri-devel

01:24 columbarius has joined #dri-devel

01:26 co1umbarius has quit [Ping timeout: 480 seconds]

01:27 yyds has joined #dri-devel

01:31 kzd has joined #dri-devel

01:34 alanc has quit [Remote host closed the connection]

01:35 alanc has joined #dri-devel

01:42 flynnjiang has quit [Remote host closed the connection]

01:43 flynnjiang has joined #dri-devel

01:52 jrelvas has quit [Ping timeout: 480 seconds]

01:58 Calandracas has quit [Remote host closed the connection]

02:00 Kayden has quit [Quit: to the sky!]

02:02 Calandracas has joined #dri-devel

02:20 yyds has quit []

02:20 yyds has joined #dri-devel

02:47 konstantin_ has joined #dri-devel

02:47 konstantin is now known as Guest603

02:47 konstantin_ is now known as konstantin

02:50 Guest603 has quit [Ping timeout: 480 seconds]

03:00 fjdegroo has joined #dri-devel

03:18 yyds has quit [Remote host closed the connection]

03:25 heat has quit [Ping timeout: 480 seconds]

03:40 anujp has quit [Ping timeout: 480 seconds]

03:44 kts has joined #dri-devel

03:44 aravind has joined #dri-devel

03:57 davispuh has quit [Ping timeout: 480 seconds]

04:05 kts has quit [Ping timeout: 480 seconds]

04:07 kts has joined #dri-devel

04:08 kts has quit []

04:11 bmodem has joined #dri-devel

04:20 anujp has joined #dri-devel

04:34 KetilJ has joined #dri-devel

04:40 KetilJohnsen has quit [Ping timeout: 480 seconds]

04:44 surajkandpal has joined #dri-devel

05:04 YuGiOhJCJ has joined #dri-devel

05:08 simon-perretta-img has quit [Ping timeout: 480 seconds]

05:24 sarthakbhatt has joined #dri-devel

05:29 sarthakbhatt has quit [Remote host closed the connection]

05:30 sukuna has quit [Remote host closed the connection]

05:31 sukuna has joined #dri-devel

05:32 sukuna has quit [Remote host closed the connection]

05:32 sukuna has joined #dri-devel

05:44 ungeskriptet has joined #dri-devel

05:50 anujp has quit [Ping timeout: 480 seconds]

05:51 kts has joined #dri-devel

06:02 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

06:03 ungeskriptet has joined #dri-devel

06:04 Company has joined #dri-devel

06:09 ungeskriptet is now known as Guest621

06:09 ungeskriptet has joined #dri-devel

06:15 Guest621 has quit [Ping timeout: 480 seconds]

06:15 ungeskriptet is now known as Guest623

06:15 ungeskriptet has joined #dri-devel

06:17 glennk has joined #dri-devel

06:18 florida has joined #dri-devel

06:19 florida has quit []

06:21 Guest623 has quit [Ping timeout: 480 seconds]

06:30 Jeremy_Rand_Talos has quit [Remote host closed the connection]

06:30 Jeremy_Rand_Talos has joined #dri-devel

06:32 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

06:32 ungeskriptet has joined #dri-devel

06:35 Leopold has quit [Remote host closed the connection]

06:36 Leopold has joined #dri-devel

06:37 kts has quit [Ping timeout: 480 seconds]

06:40 Ryback_ has quit [Remote host closed the connection]

06:40 lstrano_ has joined #dri-devel

06:41 ungeskriptet is now known as Guest626

06:41 ungeskriptet has joined #dri-devel

06:42 lstrano has quit [Ping timeout: 480 seconds]

06:44 Guest626 has quit [Ping timeout: 480 seconds]

06:45 ungeskriptet is now known as Guest628

06:45 ungeskriptet has joined #dri-devel

06:49 lstrano_ has quit [Ping timeout: 480 seconds]

06:50 Guest628 has quit [Ping timeout: 480 seconds]

06:51 ungeskriptet is now known as Guest629

06:51 ungeskriptet has joined #dri-devel

06:57 Guest629 has quit [Ping timeout: 480 seconds]

07:02 ungeskriptet is now known as Guest631

07:02 ungeskriptet has joined #dri-devel

07:02 Kayden has joined #dri-devel

07:04 surajkandpal has quit [Ping timeout: 480 seconds]

07:08 Guest631 has quit [Ping timeout: 480 seconds]

07:16 Duke`` has joined #dri-devel

07:18 mvlad has joined #dri-devel

07:21 ungeskriptet is now known as Guest633

07:22 ungeskriptet has joined #dri-devel

07:26 sima has joined #dri-devel

07:27 Guest633 has quit [Ping timeout: 480 seconds]

07:28 simon-perretta-img has joined #dri-devel

07:32 ungeskriptet is now known as Guest635

07:32 ungeskriptet has joined #dri-devel

07:33 zdobersek has quit [Read error: Network is unreachable]

07:34 zdobersek has joined #dri-devel

07:34 ungeskriptet has quit []

07:37 Guest635 has quit [Ping timeout: 480 seconds]

07:42 tzimmermann has joined #dri-devel

07:52 surajkandpal has joined #dri-devel

08:00 sghuge has quit [Remote host closed the connection]

08:00 ninjaaaaa has quit [Read error: Connection reset by peer]

08:00 simondnnsn has quit [Read error: Connection reset by peer]

08:00 ninjaaaaa has joined #dri-devel

08:00 jsa has joined #dri-devel

08:00 sghuge has joined #dri-devel

08:01 simondnnsn has joined #dri-devel

08:07 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

08:07 TMM has joined #dri-devel

08:10 kzd has quit [Ping timeout: 480 seconds]

08:13 Guest572 is now known as rgallaispou

08:14 KetilJohnsen has joined #dri-devel

08:21 KetilJ has quit [Ping timeout: 480 seconds]

08:28 vliaskov has joined #dri-devel

08:35 cmarcelo has quit [Remote host closed the connection]

08:35 rpigott has quit [Read error: Connection reset by peer]

08:35 pitust has quit [Read error: Connection reset by peer]

08:35 ella-0 has quit [Remote host closed the connection]

08:35 sumoon has quit [Remote host closed the connection]

08:35 rosefromthedead has quit [Remote host closed the connection]

08:35 kuruczgy has quit [Remote host closed the connection]

08:35 kennylevinsen has quit [Remote host closed the connection]

08:35 ifreund has quit [Remote host closed the connection]

08:35 mainiomano has quit [Remote host closed the connection]

08:35 kchibisov has quit [Remote host closed the connection]

08:35 cmarcelo has joined #dri-devel

08:35 kennylevinsen has joined #dri-devel

08:35 kuruczgy has joined #dri-devel

08:35 mainiomano has joined #dri-devel

08:35 ella-0 has joined #dri-devel

08:35 rosefromthedead has joined #dri-devel

08:36 sumoon has joined #dri-devel

08:36 kchibisov has joined #dri-devel

08:36 ifreund has joined #dri-devel

08:36 rpigott has joined #dri-devel

08:36 pitust has joined #dri-devel

08:46 tursulin has joined #dri-devel

08:58 tanty has quit [Ping timeout: 480 seconds]

09:00 <pq> tzimmermann, what do you think of using fbdev UAPI to drive keyboard RGB leds? :-p

09:00 <tzimmermann> pq, wat? go away!

09:00 <pq> lol

09:01 <tzimmermann> wasn't the discussion about auxdisplay?

09:02 <pq> yeah, I saw fbdev code in the auxdisplay driver mentioned.

09:02 * ccr nukes RGB leds from the orbit

09:02 <pq> is there another UAPI for auxdisplay, too?

09:04 <pq> I couldn't tell if cfag12864b.c had any UAPI in it, but cfag12864bfb.c seems to use fbdev things? Are they parts of the same driver, or two separate drivers for the same thing?

09:05 <tzimmermann> auxdisplay is "all the rest" that didn't fit anywhere else AFAICT

09:05 <pq> just wondering and stirring the pot, no big deal for me :-)

09:06 <tzimmermann> i've just glanced over that discussion. OMG

09:07 <tzimmermann> please let us not treat keyboard leds like regular displays

09:07 <pq> :-D

09:07 <pq> btw. kernel docs say: "The cfag12864bfb describes a framebuffer device (/dev/fbX)."

09:07 <tzimmermann> fbdev and drm should be reserved for displays that show the user's console or desktop

09:08 jkrzyszt has joined #dri-devel

09:08 <tzimmermann> but not some status information or blinky features

09:09 <tzimmermann> pq, indeed. some of the auxdisplay HW seems to be some kind of led device. so there's an fbdev device for it. whether that makes sense is questionable

09:20 tanty has joined #dri-devel

09:22 <tzimmermann> i think jani made a good point about handling these leds in the input subsys

09:39 lemonzest has quit [Quit: WeeChat 4.2.1]

09:46 bolson has quit [Remote host closed the connection]

09:46 lemonzest has joined #dri-devel

09:53 bmodem has quit [Ping timeout: 480 seconds]

10:05 shankaru has quit [Remote host closed the connection]

10:26 ninjaaaaa has quit [Read error: Connection reset by peer]

10:26 simondnnsn has quit [Read error: Connection reset by peer]

10:28 simondnnsn has joined #dri-devel

10:30 ninjaaaaa has joined #dri-devel

10:30 Leopold has quit [Ping timeout: 480 seconds]

10:31 Leopold has joined #dri-devel

10:31 apinheiro has joined #dri-devel

10:32 bmodem has joined #dri-devel

10:36 simondnnsn has quit [Ping timeout: 480 seconds]

10:37 surajkandpal has quit [Ping timeout: 480 seconds]

10:41 simondnnsn has joined #dri-devel

11:07 flynnjiang has quit [Ping timeout: 480 seconds]

11:08 rasterman has joined #dri-devel

11:43 kts has joined #dri-devel

11:56 Leopold has quit [Remote host closed the connection]

11:56 Leopold has joined #dri-devel

12:01 Leopold has quit [Remote host closed the connection]

12:02 Leopold_ has joined #dri-devel

12:08 bmodem has quit [Ping timeout: 480 seconds]

12:21 kts has quit [Remote host closed the connection]

12:26 DodoGTA has quit [Remote host closed the connection]

12:26 DodoGTA has joined #dri-devel

12:26 Calandracas has quit [Remote host closed the connection]

12:30 kts has joined #dri-devel

12:31 Calandracas has joined #dri-devel

12:47 kts_ has joined #dri-devel

12:48 kts_ has quit [Remote host closed the connection]

12:52 fireburn has quit []

12:52 fireburn has joined #dri-devel

12:54 kts has quit [Ping timeout: 480 seconds]

13:05 YuGiOhJCJ has quit [Remote host closed the connection]

13:06 YuGiOhJCJ has joined #dri-devel

13:24 aravind has quit [Ping timeout: 480 seconds]

13:31 yyds has joined #dri-devel

13:33 linusw has joined #dri-devel

13:34 kts has joined #dri-devel

13:34 kts has quit [Remote host closed the connection]

13:45 kts has joined #dri-devel

13:57 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

14:07 tanty has quit [Quit: Ciao!]

14:14 tanty has joined #dri-devel

14:35 DodoGTA has quit [Quit: DodoGTA]

14:35 heat has joined #dri-devel

14:36 DodoGTA has joined #dri-devel

14:42 Jeremy_Rand_Talos_ has joined #dri-devel

14:43 DodoGTA has quit [Quit: DodoGTA]

14:43 DodoGTA has joined #dri-devel

14:49 Jeremy_Rand_Talos has quit [Ping timeout: 480 seconds]

14:50 <mareko> karolherbst: if it's useful, radeonsi could do SVM where CPU pointer == GPU pointer

14:51 <mareko> karolherbst: we can implement pipe_screen::resource_from_user_memory to do that by default with the amdgpu kernel driver, or based on a flag

14:54 Dr_Who has joined #dri-devel

15:12 rgallaispou has left #dri-devel [#dri-devel]

15:19 ninjaaaaa has quit [Ping timeout: 480 seconds]

15:19 simondnnsn has quit [Ping timeout: 480 seconds]

15:20 <karolherbst> mareko: yeah.. that's how I plan to implement non-system SVM

15:20 kzd has joined #dri-devel

15:21 <karolherbst> I have a prototype based on iris, but it blew up the application's VM

15:21 <karolherbst> kinda need to find some time and properly think it all through

15:21 KetilJohnsen has quit [Ping timeout: 480 seconds]

15:22 <karolherbst> the biggest issue is just how to synchronize the VMs on both sides properly

15:24 <karolherbst> like.. if the driver allocates a bo, which could be used for global memory, it probably also needs to `mmap` at the same location on the CPU side. Or like mmap on the CPU first and then just place the bo at the same location on the GPU side

15:25 <karolherbst> and my plan was to add an "SVM" flag to the `pipe_resource` flags so the driver knows it's an SVM thing or so

15:27 <karolherbst> sadly, the story for discrete GPUs is way more complex, because you obviously don't want to operate on host memory, just mapped on both sides and I didn't even get to the point where I'd do memory migration

15:41 Calandracas has quit [Remote host closed the connection]

15:42 <DemiMarie> robclark: if kernel submission does not protect the GPU or its firmware in any way, then userspace submission is an improvement!

15:42 Haaninjo has joined #dri-devel

15:45 <robclark> no, I think more likely it is just a false sense of security, tbh

15:45 jrelvas has joined #dri-devel

15:45 jrelvas has quit [Remote host closed the connection]

15:49 Thymo has quit [Quit: ZNC - http://znc.in]

15:58 <DemiMarie> robclark: I see! Do GPU vendors generally do a decent job at writing firmware?

15:58 bolson has joined #dri-devel

15:59 <DemiMarie> robclark: how hard will it be to proxy the doorbells? It is strictly unsafe to pass MMIO to a VM under Intel unless the MMIO behaves like memory, in that reads return the last-written value and both reads and writes complete in bounded time.

15:59 <robclark> well, there is a pretty wide range of what can be called firmware, ranging from things that have some sort of RTOS to things that are somewhat more limited

16:00 <robclark> but on-gpu escapes are much rarer than more mundane UAF type bugs

16:01 macromorgan_ has joined #dri-devel

16:01 macromorgan_ has quit [Remote host closed the connection]

16:01 Calandracas has joined #dri-devel

16:01 <mareko> karolherbst: what I meant is that we can assign any GPU address to any buffer if the address range is unused, and the whole address range used by the CPU is always unused because our GPU allocations choose addresses that CPU allocations wouldn't use, and that's for SVM. For resource_from_user_memory, we can use the CPU pointer as the requested GPU address for the buffer, which is the most trivial case.

16:01 <mareko> resource_create is more involved because you would have to pass the desired GPU address to it.

16:02 <robclark> hmm, I'm not entirely familiar w/ the doorbell issue.. I would have expected it to work because that is basically how sr-iov works (although maybe not on past/current devices, idk)

16:04 <karolherbst> mareko: like.. if I'd allocate a pipe_resource and would map it, the mapped address would also need to be the same as seen on the GPU

16:04 <karolherbst> but

16:05 <karolherbst> if the driver can promise that the addresses of GPU bos are either reserved or can't be used by CPU allocators, that might be good enough. Then there is the question of how synchronization would work between the host and the GPU

16:06 macromorgan has quit [Ping timeout: 480 seconds]

16:07 <mareko> like I said, our GPU BOs use address that CPU allocators wouldn't use

16:07 <mareko> *addresses

16:07 <karolherbst> but it also highly depends if we are talking about system SVM or not here. For non system SVM the allocations are explicit. For system SVM any CPU pointer needs to be valid also for the GPU

16:07 <karolherbst> like.. wouldn't or won't use?

16:08 <mareko> won't

16:08 <karolherbst> who is managing the VM for radeonsi btw? Is that the kernel or is it done in userspace?

16:08 macromorgan has joined #dri-devel

16:09 <mareko> our GPU VM design is that all CPU addresses that the process can use are currently never used by GPU allocations

16:09 <mareko> so the kernel could mirror the whole process address space

16:09 <mareko> into the GPU

16:09 <karolherbst> okay

16:09 <karolherbst> yeah, that sounds more like it's designed to implement system SVM things :)

16:10 <mareko> it seems, but no

16:10 <mareko> amdkfd does that mirroring, while amdgpu requires explicit VM map calls

16:10 <karolherbst> so I guess host allocations still need to be "imported" via userptrs or something

16:12 <karolherbst> mareko: I think I just have two questions then: 1. if I map a pipe_resource, can the mapped CPU pointer be the same as its GPU address? and 2. Could I allocate a `pipe_resource` in a way that it's placed at a given address?

16:12 <karolherbst> like..

16:12 <karolherbst> given address as "on the CPU side there is an allocation I want to mirror inside VRAM"

16:13 <karolherbst> there is a USM extension which allows for very explicit placement and migration, so I might want to have the ability to move memory between system RAM and VRAM, but placed at the same address on both sides

16:13 <mareko> it's about page table mirroring, VRAM or GTT placement doesn't matter

16:13 <mareko> 2 is trivial, you can choose the GPU address for any created pipe_resource

16:13 <mareko> and any imported pipe_resource

16:14 <karolherbst> okay, and it can also be an address which already exists on the CPU's side?

16:15 <mareko> yes

16:15 <karolherbst> okay, yeah, that should be good enough then

16:15 <mareko> I don't know about 1 since we assign GPU addresses that the CPU process wouldn't use, so I don't if mmap can even use them

16:15 <mareko> *I don't know

16:17 <karolherbst> 1 is a hard requirement by CL sadly. I could do the reverse way: allocate on the host and then userptr import it, but... how would I get the memory to be migrated into VRAM?

16:17 <mareko> you wouldn't

16:18 <mareko> 1 is only dependent on mmap being usable, not on the driver

16:19 <karolherbst> yeah :) that's the problem and why I'd like to allocate a pipe_resource instead and just make sure the address to which it gets mapped is the same. `mmap` does allow you to specify where you want to map something though

16:19 <karolherbst> but it's not guaranteed to succeed afaik

16:19 <karolherbst> but that's something I could play around with

16:19 <mareko> if you use a normal BO and you access it with a CPU and the BO is in invisible VRAM, it will cause a CPU page fault and the kernel will migrate it to GTT and keep it there

16:21 <karolherbst> could I force the migration without having to access it? Or would I have to touch every page? Or just one page?

16:21 <karolherbst> like.. if I can read at offset 0x0 and it would migrate the entire allocation that's good enough

16:21 <karolherbst> though I don't really need that, as memory migration is just a hint on the API level

16:21 <karolherbst> explicit migration I mean

16:22 <mareko> the migration is forced by touching the page with a CPU, and it migrates the whole buffer

16:22 <karolherbst> okay

16:22 <mareko> recent CPUs and BIOSes allow all VRAM to be visible

16:23 <karolherbst> so the only thing to figure out would be the mapping thing then. But yeah, a driver guaranteeing that bo's won't overlap with CPU allocation is indeed a big help

16:24 <karolherbst> or rather, with mappings in general

16:29 <MrCooper> note that any CPU reads from VRAM will throw you off a performance cliff

16:31 <karolherbst> yeah, that's why I want to be able to have allocations on both sides at the same address

16:31 <karolherbst> so I can do explicit migrations

16:34 tzimmermann has quit [Quit: Leaving]

16:39 anujp has joined #dri-devel

16:40 yyds has quit [Remote host closed the connection]

16:40 junaid has joined #dri-devel

16:46 junaid has quit [Remote host closed the connection]

16:55 <MrCooper> https://gitlab.freedesktop.org/drm/amd/-/issues/3195 looks like birdie is going for the triple crown of getting banned on LWN, Phoronix and fdo GitLab

16:57 <CounterPillow> didn't know Phoronix even banned people

17:00 <CounterPillow> >What would need to happen would be for the media player to be able to ask the compositor if it can just hand it the raw YUV video data. If the compositor supports that and uses display planes to handle it, then the media player can just share the YUV images rather than RGB, cutting out the GFX work in the media player.

17:00 <CounterPillow> mpv already has a VO for this (dmabuf_wayland)

17:00 <karolherbst> impressive

17:01 <CounterPillow> it's a fairly low amount of code last I checked

17:04 <CounterPillow> obviously, it will never be the default, because both compositors and hardware are too spotty with their implementations, and it likely uses a lower quality scaler than mpv's current defaults, and also iirc it currently requires hwdec which is also unlikely to be turned on by default judging by how often AMD manages to find ways to break it

17:05 <MrCooper> CounterPillow: "banned" in quotes, he's been posting as "avis" ever since the birdie account was banned, doesn't even try to hide it's him, but nothing happens

17:06 <CounterPillow> heh

17:06 <karolherbst> anyway.. if people are under the impression that users are wasting time or are in other ways disrespectful, we can certainly discuss this, but I haven't really seen much from birdie on gitlab besides maybe wasting time or making some out-of-place remarks

17:07 <MrCooper> even so, a Phoronix "ban" is some kind of achievement I guess

17:07 <MrCooper> karolherbst: yeah I was joking, let's hope he's not just getting warmed up though

17:07 <karolherbst> nah

17:07 <CounterPillow> yeah it's probably better not to shittalk people here even if they deserve it

17:07 <karolherbst> birdie has been active on the gitlab for years now, so I guess it's fine

17:08 <karolherbst> well.. "active"

17:12 <MrCooper> CounterPillow: if he doesn't want to get called out for what he's posting on the Phoronix forums, he can always stop, we'd all be better off for it

17:12 <CounterPillow> Personally I simply do not read places with a high frequency of bad posts

17:15 <Lynne> 9a00a360ad8bf0e32d41a8d4b4610833d137bb59 causes segfaults on wayland

17:15 <Lynne> is pelloux here to discuss? I'd rather not send a revert MR

17:17 tursulin has quit [Ping timeout: 480 seconds]

17:18 <MrCooper> pepp: ^

17:20 <pepp> Lynne: annoying. Do you have more details?

17:23 <Lynne> mpv and firefox crash instantly, in libvulkan_radeon

17:23 <Lynne> running on sway with the vulkan backend

17:24 <Lynne> wlroots generally causes programs to do a swapchain rebuild twice in quick succession on init, maybe it's related to this?

17:25 ity has quit [Remote host closed the connection]

17:25 ity has joined #dri-devel

17:28 <pepp> Lynne: I guess it's missing a "if (chain->wsi_wl_surface)" check

17:29 <DemiMarie> Regarding SVM: SVM can be perfectly compatible with virtio-GPU because while the guest userspace program doesn’t make any explicit requests to make memory accessible to the GPU, the guest kernel can make these requests to the host.

17:30 <DemiMarie> CounterPillow: is hardware decoding unreliable under desktop Linux?

17:30 <CounterPillow> yes

17:30 <DemiMarie> CounterPillow: why?

17:31 <CounterPillow> bugs

17:31 <DemiMarie> in what?

17:31 <CounterPillow> the driver

17:31 <Lynne> pepp: can confirm that fixes it

17:32 <DemiMarie> Is this because of distributions shipping old versions of Mesa?

17:32 jkrzyszt has quit [Ping timeout: 480 seconds]

17:32 <CounterPillow> no

17:32 <CounterPillow> new bugs are added all the time

17:32 <DemiMarie> What makes it more reliable under Windows/macOS/ChromeOS/Android/etc?

17:33 <pepp> Lynne: thx, I'll open a MR soon

17:34 <CounterPillow> never said it was more reliable there, but for Android/macOS/ChromeOS it's definitely more engineering resources invested

17:35 <DemiMarie> I thought ChromeOS just used upstream drivers.

17:35 <CounterPillow> Not always true, and most importantly they usually do not ship AMD hardware as far as I know

17:36 <DemiMarie> Are Intel’s drivers more reliable?

17:36 <CounterPillow> I don't know since I don't use Intel, but they sure seem to be judging by the number of mpv bugs opened concerning vaapi misbehaving

17:37 <mareko> karolherbst: radeonsi can change BO placement, but if there is not enough memory, it's only a hint

17:38 Dark-Show has joined #dri-devel

17:39 <mattst88> CounterPillow: we have AMD Chromebooks nowadays, FYI

17:40 <mattst88> they're using the video encode/decode drivers in Mesa

17:40 <CounterPillow> Boy I sure do hope they pre-validate all input then because you can get 100% repeatable GPU resets by feeding AMD's VCN corrupt H.264 streams

17:40 Company has quit [Read error: Connection reset by peer]

17:41 <mattst88> I don't work on the AMD stuff directly, but from what I've heard the video driver stability is not great

17:42 <mattst88> e.g. we split out radv into a separate package that can be updated independently of the radeonsi driver (which is great) and the video driver (which is not great, AFAIK)

17:43 <robclark> _eventually_ we will have some gitlab ci for amd video.. IIRC it is still blocked on some deqp-runner MR

17:45 <CounterPillow> That'd be great, especially if it tests all still relevant generations of VCN (one of the more frustrating parts is reporting a bug and then being told that the AMD engineers don't have that hardware to reproduce it on)

17:46 <robclark> it would be re-using the existing gitlab ci farms... I know we've sent some amd chromebooks to the collabora farm, but I doubt it is exhaustive.

17:46 <CounterPillow> :(

17:47 <robclark> someone sufficiently motivated and w/ enough hw is ofc welcome to host their own ci farms to expand the hw coverage

17:48 <CounterPillow> I am planning on setting up a lava lab eventually but it does feel a bit silly that AMD's driver team does not have access to AMD's hardware

17:49 <robclark> 🤷

17:49 davispuh has joined #dri-devel

17:49 <DemiMarie> CounterPillow: is it mostly old hardware?

17:50 <robclark> it would ofc be nice if hw vendors ran or sponsored ci farms.. but it can quickly turn into a large project depending on how far back in # of gens you go

17:50 <CounterPillow> In my case, AMD Picasso isn't *that* old, and no, mpv has just had a bug filed caused by a 7900 XT's hardware decoder which is the current gen

17:51 <CounterPillow> I've seen the "sorry we don't have that hardware" response for issues reported on 6xxx series cards, i.e. previous gen

17:51 <DemiMarie> Oh dear

17:52 <DemiMarie> Seems like they only support the most recent hardware generation.

17:52 <CounterPillow> it's not a matter of policy, I don't think

17:54 ity has quit [Ping timeout: 480 seconds]

17:54 ity has joined #dri-devel

17:54 Marcand has joined #dri-devel

17:57 rasterman has quit [Quit: Gettin' stinky!]

17:59 junaid has joined #dri-devel

18:04 <abhinav__> daniels Hi, GM. Just wanted to check with you on https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/1193 . If this is still the right way to submit this request

18:08 <robclark> abhinav__: I think daniels is migrating drm-misc to gitlab so that shell accounts will no longer be needed

18:10 <abhinav__> robclark yes, that's why I wanted to check whether the old process of applying for committer access still holds true or what the new method would be .... as only existing committers will be migrated to gitlab, not new ones

18:13 <robclark> I think just the last step changes, to gitlab permissions instead of account creation.. hmm, and I guess you just need to configure your ssh pub key in gitlab. Otherwise the process should be the same

18:16 <abhinav__> robclark got it, Yes I have already uploaded my pub keys to my gitlab account ...

18:17 ity has quit [Ping timeout: 480 seconds]

18:18 <daniels> yeah, it would just be gitlab permissions so you don't need to fill out most of that form

18:18 <daniels> robclark: afaik we only have stoney

18:20 <abhinav__> daniels got it, so approvals will still happen on that form i assume though?

18:23 <daniels> mripard wanted to try out the 'access request' button, but don't worry, we'll give you access :) and it should be moved early next week

18:29 kts has quit [Ping timeout: 480 seconds]

18:31 ity has joined #dri-devel

18:32 <abhinav__> daniels thanks :)

18:33 heat has quit [Remote host closed the connection]

18:33 kts has joined #dri-devel

18:33 heat has joined #dri-devel

18:37 ity has quit [Remote host closed the connection]

18:37 <mareko> karolherbst: we could also make radeonsi use amdkfd to get system SVM, but it would need a new winsys

18:38 ity has joined #dri-devel

18:39 <karolherbst> yeah... system SVM makes implementing all this stuff way easier, but I don't have anything which actually requires it

18:40 <karolherbst> normal SVM is used by SYCL or chipStar (HIP on CL), so that's why it's relatively important to support at some point

18:45 zackr has joined #dri-devel

18:51 <llyyr> https://gitlab.freedesktop.org/mesa/mesa/-/commit/9a00a360ad8bf0e32d41a8d4b4610833d137bb59 this commit breaks nearly every wayland native application that uses vulkan, including applications I launch with MESA_LOADER_DRIVER_OVERRIDE=zink

18:52 <llyyr> can reproduce with mpv --gpu-api=vulkan (and plplay), as well as "MESA_LOADER_DRIVER_OVERRIDE=zink ffplay [video]"

18:57 Thymo has joined #dri-devel

19:02 vliaskov has quit []

19:02 fjdegroo has quit [Read error: Connection reset by peer]

19:03 <llyyr> fixed by this diff https://0x0.st/H5t4.txt

19:03 <llyyr> I'll open a MR if that looks right

19:05 Thymo has quit [Ping timeout: 480 seconds]

19:07 Thymo has joined #dri-devel

19:09 fjdegroo has joined #dri-devel

19:09 <ity> Hi, hopefully a quick question, does libglx only allow access to the GPU that the X11 Server is running on?

19:10 <ity> Slash is there smth in the DRI protocol for choosing between GPUs that the X11 server is connected to

19:10 Thymo has quit []

19:14 Thymo has joined #dri-devel

19:14 lstrano has joined #dri-devel

19:18 <daniels> llyyr: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27767

19:18 Duke`` has quit []

19:18 Duke`` has joined #dri-devel

19:18 <llyyr> ah

19:23 simon-perretta-img has quit [Ping timeout: 480 seconds]

19:23 simon-perretta-img has joined #dri-devel

19:28 <agd5f> CounterPillow, we have access to all generations of hardware in general at least at engineering board level. OEM specific platforms or boards are a different matter.

19:28 <CounterPillow> agd5f: then it is strange to me that I've seen "I don't have access to this hardware" as a response to something multiple times, referring to non-OEM models.

19:28 <CounterPillow> There seems to be some breakdown in communication

19:29 <agd5f> CounterPillow, do you have an example?

19:29 <CounterPillow> No, I don't have links from 6 months ago handy

19:29 <agd5f> CounterPillow, not every engineer has every board, but as a team, we have a hardware library where you can get the boards

19:31 simon-perretta-img has quit [Ping timeout: 480 seconds]

19:32 <agd5f> CounterPillow, that said, we have remote developers and it's not always feasible to send them one of every board, so sometimes we need to reach out to someone in the office to repro issues, etc., which can take time

19:32 simon-perretta-img has joined #dri-devel

19:35 <CounterPillow> agd5f: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9497#note_2248945 (not sure if this person is an AMD employee though)

19:41 <CounterPillow> looks like they are

19:41 <agd5f> CounterPillow, Thong is an AMD employee and he never said he didn't have access to the hardware.

19:44 oneforall2 has quit [Remote host closed the connection]

19:45 simon-perretta-img has quit [Ping timeout: 480 seconds]

19:46 <DemiMarie> Do AMD kernel drivers have problems recovering from GPU resets?

19:48 oneforall2 has joined #dri-devel

19:48 <CounterPillow> yes

19:48 <agd5f> DemiMarie, they can, depending on the nature of the hang and hardware involved

19:48 <DemiMarie> What is the reason for this?

19:49 <DemiMarie> agd5f: will this be fixed in the future?

19:49 <CounterPillow> I've never seen amdgpu recover successfully on either picasso or a 7900 XT or the zen4 igpu

19:49 <llyyr> DemiMarie: just leave an h264 video playing with vaapi for 5-6 hours, you'll almost definitely get a reset within that time on an RDNA 2/3 gpu

19:49 <llyyr> a reset that it doesn't recover from, that is

19:50 <agd5f> DemiMarie, It's mostly older hardware. newer stuff should be in pretty good shape

19:51 <DemiMarie> For context: this makes supporting AMD GPUs in virtualization use-cases (with virtio-GPU native contexts) significantly less appealing.

19:51 <DemiMarie> agd5f: is it possible to just do a full GPU reset, wipe out everything in VRAM, give everyone a context lost error, and continue?

19:51 <agd5f> DemiMarie, yes

19:52 <agd5f> but most userspace doesn't handle context lost so even if the kernel resets everything, userspace is left in a bad state

19:52 <DemiMarie> agd5f: does that mean that the zen4 iGPU not recovering is a bug?

19:52 <airlied> daniels: Linus has merged my tree, i can give you a week :-)

19:53 <DemiMarie> agd5f: I see, so that is a bug in all sorts of userspace programs?

19:53 <ccr> uhh.

19:53 * DemiMarie wonders if non-robust contexts should cause SIGABRT when a context loss happens

19:54 <agd5f> DemiMarie, right. On other OSes, the desktop environment is robust aware and if it sees a context lost, it rebuilds its state, creates a new context and continues

19:55 <DemiMarie> agd5f: I guess that means that bugs should be reported against various Wayland compositors.

19:58 <DemiMarie> agd5f: what is the status of LeftoverLocals mitigations for AMD GPUs?

19:58 <agd5f> DemiMarie, in progress

20:00 <DemiMarie> agd5f: does the hardware make it quite difficult?

20:00 <DemiMarie> IIRC Google shipped something for ChromeOS.

20:00 <DemiMarie> Will there be a way to enforce the mitigations at the kernel driver level?

20:02 <agd5f> DemiMarie, I don't think I'm at liberty to discuss the details at this point

20:02 shiva has joined #dri-devel

20:03 <DemiMarie> agd5f: will the details be made available in the future?

20:03 <agd5f> yes

20:06 <DemiMarie> Context: I’m going to be working on GPU acceleration for Qubes OS and working LeftoverLocals protection, preferably at the kernel driver or firmware level, is a hard requirement there.

20:07 <DemiMarie> The reason the location of the mitigations matters is that the userspace driver will be running in the guest, which is not trusted.

20:07 simon-perretta-img has joined #dri-devel

20:23 <zamundaaa[m]> <CounterPillow> "I've never seen amdgpu recover..." <- I've seen it recover correctly lots of times with a 6800XT, and also once with a 7900XTX (only reset that has happened on it so far, triggered by Doom Eternal)

20:24 <zamundaaa[m]> You just need to use the one compositor that supports recovering from GPU resets :)

20:24 <zmike> weston ?

20:25 <zamundaaa[m]> KWin

20:25 <daniels> obviously weston uses the gpu so perfectly that we never need to recover

20:26 <CounterPillow> zamundaaa[m]: I use KWin, and I don't think the problem was the compositor considering dmesg kept getting spammed with amdgpu trying to reset

20:28 <zamundaaa[m]> I have seen a reset loop happen once before as well, on the 6800 XT. I thought that was fixed though, it hasn't happened in a while

20:33 kts has quit [Ping timeout: 480 seconds]

20:38 <agd5f> I have my doubts as to whether this stuff will ever work very reliably on consumer level Linux in general just due to the nature of the ecosystem. There are tons of distros and they all use slightly different combinations and versions of components and no one can reasonably test all of those, plus all of the OSVs and IHVs focus the vast majority of their testing on their enterprise offerings.

20:39 shiva has quit []

20:42 <CounterPillow> Ah, the good ol' "Linux is too diverse to support" excuse when it's surprisingly always your component that crashes.

20:42 Duke`` has quit [Remote host closed the connection]

20:44 Duke`` has joined #dri-devel

20:44 Marcand has quit [Ping timeout: 480 seconds]

20:47 <DemiMarie> agd5f: The only things that should matter here are the KMD version and the firmware version

20:47 <DemiMarie> And the hardware itself, obviously.

20:48 ungeskriptet has joined #dri-devel

20:48 <zamundaaa[m]> Mesa can also matter. Until recently, reset handling in RadeonSi was borked

20:49 <agd5f> DemiMarie, and the compositor version and the mesa version and the LLVM version

20:49 <CounterPillow> The long-haired Linux smellies are simply asking too much of us when we have to make sure we don't have bugs in our firmware, our kernel driver, and our userspace driver

20:50 <zamundaaa[m]> agd5f: KWin has supported GPU resets for a loooong time

20:50 <zamundaaa[m]> And it's been 90% functional for almost always. In the remaining cases it would just crash, which is still better than a hang

20:50 <DemiMarie> zamundaaa: What is the consequence of that? Applications not being able to deal with `VK_ERROR_DEVICE_LOST`/`GL_CONTEXT_LOST` reliably?

20:51 <zamundaaa[m]> There were two issues, one was that RadeonSi never reported the GPU reset as being over

20:51 <DemiMarie> agd5f: userspace should not determine whether the KMD can reset the GPU successfully

20:52 <DemiMarie> agd5f: So LLVM problems can be solved by either having Mesa bundle LLVM, or by having Mesa stop using LLVM to generate AMD GPU code.

20:53 <zamundaaa[m]> The other one was related to shared contexts, and meant that after re-creating OpenGL state, KWin would still get the context reset reported by RadeonSi on the new context, despite everything being fine

20:53 <agd5f> CounterPillow, there are combinations of components that work great and others that do not. Say what you will about windows or android, it's a lot easier to test once and verify that it will work everywhere. It's not feasible to test every combination of driver, firmware, rest of kernel, UMD, LLVM, compositor, etc.
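
agd5f's point about untested combinations is multiplicative: the number of distinct stacks is the product of the number of versions of each component in circulation. The Python illustration below uses the components agd5f lists, but the version counts (3 of each) are invented purely for the arithmetic and are not measured from any distro.

```python
from math import prod

# Components from agd5f's list; the version counts are hypothetical.
STACK = {"driver": 3, "firmware": 3, "rest_of_kernel": 3,
         "umd": 3, "llvm": 3, "compositor": 3}


def combinations(stack):
    """Distinct stacks if every version of every component can meet every other."""
    return prod(stack.values())


def pinned(stack, fixed):
    """Combinations left when the components named in `fixed` are pinned to one version."""
    return prod(1 if name in fixed else n for name, n in stack.items())


print(combinations(STACK))  # 3**6
print(pinned(STACK, {"driver", "rest_of_kernel", "umd", "compositor"}))
```

Under these assumptions there are 3^6 = 729 stacks. Pinning the kernel (driver and the rest of it), Mesa and the compositor, roughly what LaserEyess proposes further down, still leaves firmware and LLVM free: 3 × 3 = 9 combinations.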

20:53 <agd5f> DemiMarie, and we should also bundle kernel and mesa and firmware into one repo as well if we really want to get solid

20:53 <zamundaaa[m]> agd5f: GPU reset handling on the application side is luckily very simple, so there isn't a lot of variation

20:54 <zamundaaa[m]> It's pretty much just if (hasResetHappened()) recreateEglContexts()
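
zamundaaa's one-liner can be expanded into a minimal Python sketch of the per-frame check. It assumes the reset-status model of the GL robustness extensions: glGetGraphicsResetStatus returns a non-NO_ERROR status once a reset is detected and returns GL_NO_ERROR again when the reset has completed. That is also why a driver that never reports the reset as over, as described for RadeonSi above, leaves the compositor stuck. No GPU is involved: `ResetStatus`, `FakeDriver`, `handle_frame` and `recreate_contexts` are hypothetical stand-ins, not EGL or Mesa API.

```python
from enum import Enum


class ResetStatus(Enum):
    # Hypothetical stand-ins for GL_NO_ERROR and GL_{GUILTY,INNOCENT,UNKNOWN}_CONTEXT_RESET.
    NO_ERROR = 0
    GUILTY = 1
    INNOCENT = 2
    UNKNOWN = 3


class FakeDriver:
    """Hypothetical driver mock that replays a scripted sequence of reset statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_graphics_reset_status(self):
        # Plays the role of glGetGraphicsResetStatus(); NO_ERROR once the script runs out.
        return self._statuses.pop(0) if self._statuses else ResetStatus.NO_ERROR


def handle_frame(driver, recreate_contexts, max_polls=100):
    """Return True if the contexts were recreated during this frame."""
    if driver.get_graphics_reset_status() is ResetStatus.NO_ERROR:
        return False  # no reset: render as usual
    # A reset was reported: wait until the driver says it is over before
    # building new contexts. The bound keeps a driver that never reports
    # completion from hanging the caller forever.
    for _ in range(max_polls):
        if driver.get_graphics_reset_status() is ResetStatus.NO_ERROR:
            recreate_contexts()  # destroy old contexts, create new ones, re-upload state
            return True
    raise RuntimeError("GPU reset never reported as complete")
```

In a real compositor the status would be polled once per frame rather than in a tight loop; the loop here only makes the "wait for completion" step explicit.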

20:54 <DemiMarie> agd5f: from my PoV, the obvious solution to this is fuzzing

20:54 <agd5f> I'm not talking about GPU reset specifically, just general GPU stack stability. Like you can have a good combination of KMD and firmware, but if UMD is bad, you'll just keep getting resets

20:55 <DemiMarie> agd5f: what should distros do?

20:55 <DemiMarie> always take the latest kernel and latest Mesa?

20:55 <CounterPillow> not ship AMD code since they're the only ones with this recurrent quality problem

20:55 <LaserEyess> you don't need to test every combination of driver, firmware, software. Test the upstream kernel and upstream mesa, and pick a DE to test, it doesn't matter

20:55 <zamundaaa[m]> CounterPillow: comments like that really don't help

20:56 ungeskriptet is now known as Guest688

20:56 ungeskriptet has joined #dri-devel

20:57 <agd5f> LaserEyess, sure until distro X decides to pull in a new firmware or stick with an older mesa release, then you have an untested combination

20:58 <LaserEyess> but that's not your problem, and if said distro is doing that then, well, they're doing something wrong

20:58 <agd5f> LaserEyess, but that is what users use

20:59 <LaserEyess> well I'm addressing the point of, for example, amdgpu bugs that are reproducible on drm-tip, or the stable linux kernel, or one of linus's -rc's

20:59 <CounterPillow> Does breaking older user space with newer firmware count as an uapi break or is it fine because it's in firmware?

20:59 <tleydxdy> who are these "users"?

21:00 <tleydxdy> I doubt firmware change should affect anything beyond kmd

21:00 <tleydxdy> if it did it's a kmd issue

21:01 Guest688 has quit [Ping timeout: 480 seconds]

21:06 ungeskriptet is now known as Guest689

21:06 ungeskriptet has joined #dri-devel

21:11 ungeskriptet has quit []

21:11 ungeskriptet has joined #dri-devel

21:12 <DemiMarie> tleydxdy: those users are people using distros like Debian stable

21:12 Guest689 has quit [Ping timeout: 480 seconds]

21:12 <tleydxdy> shouldn't they get support from debian?

21:12 <tleydxdy> like amd is not in the position to do anything

21:13 <tleydxdy> I would think the "users" for amd would be the upstream projects

21:13 <tleydxdy> in that case there's only one support target: "tip of tree"

21:14 <DemiMarie> but that has no humans actually using it, except for dev

21:14 <robclark> DemiMarie: for LL bnieuwenhuizen made a mitigation that clears lmem in mesa.. configured via driconf. This is what we are shipping w/ CrOS but others are free to use it until we get something better from amd

21:14 <tleydxdy> like the other reports help catch bugs that's good, but it's unrealistic to fully support them

21:15 <tleydxdy> I can spin up a distro tmr that only ships known bad configs from vendor X and X would need to support my users?

21:15 <DemiMarie> tleydxdy: “you have to be running this development version to get help” is not reasonable to expect from end-users

21:16 <tleydxdy> yes, but amd also can't ship packages to debian stable

21:16 <tleydxdy> so debian stable need to fix the issue

21:16 <tleydxdy> not amd directly

21:16 <tleydxdy> and if the fix is in upstream, they can backport

21:17 <tleydxdy> "try latest upstream" is a reasonable ask if you are reporting an issue to upstream

21:17 <LaserEyess> DemiMarie: the distro is the user, for example ubuntu. When you get a bug on ubuntu, you report it to their issue tracker, and a developer there should be your primary PoC. That developer should be the one coming to AMD if it's an AMD bug, and that developer should be able to run a development system

21:17 <LaserEyess> in fact people pay canonical for that service

21:18 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

21:18 ungeskriptet has joined #dri-devel

21:19 <tleydxdy> I mean if you are paying money that's a different story, whoever took your money should make sure you get fixed

21:19 <tleydxdy> if you pay amd a contract then sure, run hanamontana os and get direct support

21:20 <LaserEyess> I'm talking about the support contracts that many linux vendors offer

21:20 <LaserEyess> amd does not offer support for those, the linux vendors do

21:20 <LaserEyess> even free distros have bug trackers

21:20 <LaserEyess> it's the same thing, just with volunteer time and not a contract

21:20 <DemiMarie> tleydxdy: “latest released kernel and Mesa” would be something that is realistic to expect at least some users to run

21:20 <DemiMarie> “tip of tree” isn’t

21:20 <DemiMarie> not least because IIUC neither Linux nor Mesa actually recommend running it

21:22 junaid has quit [Remote host closed the connection]

21:23 <tleydxdy> well tip of tree might be a poor choice of word for me. but I was pretty sure e.g. linux would want you try linux-next at least

21:23 <tleydxdy> if you report bug directly to there

21:26 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

21:26 ungeskriptet has joined #dri-devel

21:27 <tleydxdy> in any case I don't think hardware vendors should concern themselves with anything other than the latest upstream projects (i.e. direct consumer of their code) when it comes to test coverage. unless they got support contracts that mandate otherwise of course

21:29 <agd5f> ROCm is super stable running RHEL with our packaged drivers. In that case we can make sure you are using a well validated combination of firmwares, driver code, and core OS components because both AMD and RH test the hell out of it. fedora, less so.

21:33 <tleydxdy> yeah, give money to rhel might be the end lesson here

21:35 <DemiMarie> agd5f: From what I have seen, Intel is stable on Fedora, too.

21:36 nukelet has joined #dri-devel

21:37 <DemiMarie> What this sounds like to me is that the various interfaces are unstable.

21:37 <tleydxdy> I sure hope fedora gets tested otherwise wouldn't every rhel update be a QA hell?

21:37 <DemiMarie> tleydxdy: that kind of stuff is why Linux on the desktop has a bad reputation

21:37 <DemiMarie> tleydxdy: to me, “latest upstream projects” means “latest release version”

21:40 <DemiMarie> robclark: is this race-free? In other words, is it guaranteed that GPU preemption can’t happen before that command stream finishes?

21:41 <robclark> it clears at the end of each shader, so as long as there isn't mid-shader preemption it should be ok
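
The mitigation robclark describes can be illustrated with a toy Python model. It is not Mesa code: local memory is a plain list, a "shader" is a function that writes into it, and the names `LocalMemory` and `run_workgroup` are hypothetical. The point is only the ordering argument: if every shader zeroes local memory as its final step, the next workgroup scheduled on the same compute unit finds nothing left to read.

```python
class LocalMemory:
    """Hypothetical toy model of one compute unit's local memory (LDS)."""

    def __init__(self, size):
        self.words = [0] * size


def run_workgroup(lmem, values, clear_at_end):
    """Run a toy 'shader' on lmem and return a snapshot of what it saw on entry.

    That snapshot is what a malicious workgroup scheduled after another
    process's workgroup could read back.
    """
    seen = list(lmem.words)
    for i, value in enumerate(values):
        lmem.words[i] = value
    if clear_at_end:
        # The mitigation: zero local memory as the shader's final step.
        for i in range(len(lmem.words)):
            lmem.words[i] = 0
    return seen


lm = LocalMemory(4)
run_workgroup(lm, [7, 8, 9], clear_at_end=False)     # "victim"
print(run_workgroup(lm, [], clear_at_end=False))     # "attacker" sees leftovers
```

Without the clear the second workgroup reads back `[7, 8, 9, 0]`; with `clear_at_end=True` it sees only zeros. The model silently assumes nothing else can run between the write and the clear, which is exactly the mid-shader preemption question raised next.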

21:41 Duke`` has quit [Ping timeout: 480 seconds]

21:45 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

21:45 ungeskriptet has joined #dri-devel

21:46 <DemiMarie> Is mid-shader preemption guaranteed not to happen?

21:48 ungeskriptet has quit []

21:48 ungeskriptet has joined #dri-devel

21:49 <robclark> better question for someone from amd but I wouldn't expect mid-shader preemption

21:49 <robclark> ie. seems like it would be a hard thing to implement in hw

21:49 <agd5f> DemiMarie, on AMD hardware mid-shader preemption is only supported on the user queues used by ROCm. Kernel managed queues are not preempted

21:51 <DemiMarie> agd5f: is one reason that ROCm queues can be preempted that they do not have access to fixed-function blocks?

21:51 <agd5f> only compute queues support mid-shader preemption. GFX is always at draw boundaries

21:52 <agd5f> due to fixed function hardware
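
robclark's condition and agd5f's answer combine into a simple ordering check: an end-of-shader clear only closes the leak if the hardware cannot switch to another context between the shader's first write to local memory and the clear. The Python toy model below encodes that reasoning. The queue labels, step names and both functions are hypothetical illustrations of this discussion, not identifiers from amdgpu, ROCm or Mesa.

```python
# Toy shader: the last step is the end-of-shader clear described by robclark.
SHADER = ["write_lmem", "compute", "clear_lmem"]


def switch_points(queue, n_steps):
    """Step indices after which another context may be scheduled (hypothetical model)."""
    if queue == "rocm_compute_user":
        # Compute user queues: mid-shader preemption, so after any step.
        return list(range(n_steps))
    if queue in ("gfx", "kernel_managed"):
        # GFX preempts at draw boundaries and kernel-managed queues are not
        # preempted, so the next context only runs after the shader finishes.
        return [n_steps - 1]
    raise ValueError(f"unknown queue type: {queue}")


def leak_window_exists(steps, points):
    """True if a context switch can land after the first write but before the clear."""
    first_write = steps.index("write_lmem")
    clear = steps.index("clear_lmem")
    return any(first_write <= p < clear for p in points)


for q in ("gfx", "kernel_managed", "rocm_compute_user"):
    print(q, leak_window_exists(SHADER, switch_points(q, len(SHADER))))
```

Under this model only the mid-shader-preemptible compute user queues leave a window, which matches why the clear-at-end approach is considered adequate for GFX work.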

22:08 mvlad has quit [Remote host closed the connection]

22:09 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

22:09 ungeskriptet has joined #dri-devel

22:11 jsa has quit []

22:22 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

22:22 ungeskriptet has joined #dri-devel

22:22 Marcand has joined #dri-devel

22:23 <DemiMarie> I see.

22:23 <DemiMarie> Hopefully future hardware will support preemption of fixed-function units.

22:25 <DemiMarie> Right now it seems that GFX is a second-class citizen when it comes to robustness.

22:26 ungeskriptet is now known as Guest696

22:26 ungeskriptet has joined #dri-devel

22:30 ungeskriptet has quit []

22:30 ungeskriptet has joined #dri-devel

22:31 Guest696 has quit [Ping timeout: 480 seconds]

22:31 sima has quit [Ping timeout: 480 seconds]

22:33 ungeskriptet has quit []

22:33 ungeskriptet has joined #dri-devel

22:44 apinheiro has quit [Quit: Leaving]

22:47 benjaminl has quit [Ping timeout: 480 seconds]

22:48 Marcand has quit [Ping timeout: 480 seconds]

22:52 jhli has quit []

23:03 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

23:03 TMM has joined #dri-devel

23:04 ungeskriptet has quit [Quit: Ping timeout (120 seconds)]

23:04 ungeskriptet has joined #dri-devel

23:08 ungeskriptet has quit []

23:10 ungeskriptet has joined #dri-devel

23:16 dakr has quit [Quit: ZNC 1.8.2 - https://znc.in]

23:16 dakr has joined #dri-devel

23:37 ungeskriptet is now known as Guest699

23:37 ungeskriptet has joined #dri-devel

23:38 dviola has quit [Quit: WeeChat 4.2.1]

23:41 Guest699 has quit [Ping timeout: 480 seconds]

23:45 ungeskriptet is now known as Guest701

23:45 ungeskriptet has joined #dri-devel

23:47 ungeskriptet has quit [Remote host closed the connection]

23:47 ungeskriptet has joined #dri-devel

23:49 Guest701 has quit [Ping timeout: 480 seconds]

23:57 ungeskriptet is now known as Guest703

23:57 ungeskriptet has joined #dri-devel