user982492 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Hibyehello has quit [Ping timeout: 480 seconds]
Hibyehello has joined #asahi-gpu
Hibyehello has quit [Ping timeout: 480 seconds]
JTL has quit [Remote host closed the connection]
JTL has joined #asahi-gpu
JTL has quit [Remote host closed the connection]
JTL has joined #asahi-gpu
jhan has quit [Remote host closed the connection]
jhan has joined #asahi-gpu
jhan has quit [Ping timeout: 480 seconds]
bisko has joined #asahi-gpu
Hibyehello has joined #asahi-gpu
<lina>
alyssa: I can't reproduce your splat but I think I know what happened. I thought there would be something in drm_sched to wait for in-progress jobs when a scheduler/entity gets destroyed, but it doesn't look like it... so you probably had a GPU job submitted, then the userspace process aborted, then the kernel freed all the scheduler stuff, and when the GPU job completed it crashed because the scheduler was gone.
<lina>
To make things even more confusing, the job completion isn't via a job reference, it's via a fence...
<lina>
I've added a reference from the job to the scheduler and I think that will fix it... since the job should never get destroyed until the scheduler cleans it up from its main loop, which can only happen when the fence gets signaled or fails, so that should mean the scheduler always outlives the job outlives the fence...
<lina>
This ownership/lifetime stuff is so subtle and completely undocumented in C APIs, it's such a mess T_T
<lina>
I'll look at the piglit stuff next though I'm less worried about that since I think we've always had corner case GPU crash bugs (as much as I've tried to eliminate them...)
<lina>
And GPU crashes are better behaved now, at least it doesn't just hang your system
pjakobsson has joined #asahi-gpu
<lina>
Okay, reproduced the sched thing with something deliberate (8Kx8K glmarks getting killed in a loop) ^^
<lina>
Let's see if my fix worked...
<lina>
Okay, I fixed the splat but I have another issue... I'm leaking slots somewhere with this workload, it runs out
<lina>
... and I can't reproduce it now? ;;
<lina>
Why do I get the feeling this is drm_sched again... something like killing the entity stops jobs from being run, but doesn't cancel/free pending jobs...
<lina>
Ohhh wait, I think I'm never calling the entity cleanup function. Okay, that one's on me then...
<lina>
Wait no I do
jhan has joined #asahi-gpu
<lina>
Ah, this could be a bad firmware interaction... I do know I invalidate context before waiting for jobs to complete, which is probably a bad idea. Maybe that just kills things and leaves the jobs dangling, never to complete.
<lina>
Nope, this is getting signaled... so why is the scheduler not cleaning this up?
DarkShadow44 has joined #asahi-gpu
stickytoffee has quit [Quit: brb]
<lina>
And this time I crashed RTKit ^^;;
<lina>
That could just be the context issue I mentioned though... but first I want to find out why I'm leaking jobs...
stickytoffee has joined #asahi-gpu
<lina>
This is weeeird... the job cleanup callback gets called but the job doesn't get dropped sometimes?
nyilas has joined #asahi-gpu
<lina>
Ohh... am I deadlocking by any chance?
<lina>
Yeeeah...
<lina>
Okay, I can't put a reference to the scheduler in the job, because if it is the last reference dropping the scheduler from the job cleanup callback deadlocks ;;
<lina>
Maybe drm_sched_stop() before killing the scheduler will do what I want...?
<lina>
No, that only cleans up completed jobs and detaches the callbacks; it doesn't actually free pending jobs because it assumes you want to restart the queue later...
<lina>
I think I need to modify the C side for this, this is just broken, I have no idea how to safely wrap this API without duplicating job tracking...
<lina>
Reproduced the RTKit crash... now did my GpuContext thing fix it?
<lina>
[ 83.025087] asahi 206400000.gpu: Allocator: Corruption after object of type asahi::fw::fragment::RunFragmentG13V12_3 at 0xffffffa00009be00:0x928 + 0x0..0x5
<lina>
Ooooo that's a new one
MajorBiscuit has joined #asahi-gpu
jhan has quit [Ping timeout: 480 seconds]
kode54 has quit [Quit: Ping timeout (120 seconds)]
kode54 has joined #asahi-gpu
hightower2 has joined #asahi-gpu
<lina>
streaming-texture (or something) is OOMing for me...
<lina>
Excluding that though, I got through a piglit run ^^
<lina>
Trying again with higher GL...
<lina>
Still works ^^
<lina>
Let me run with a fix for that corruption warning and see how that goes...
<lina>
*Wild guess* those fields might have something to do with preemption, that sounds like the kind of thing piglit would end up triggering...
<lina>
I think it's fixed! ^^
<lina>
alyssa: Please uprev your kernel, I think I fixed both issues ^^
jhan has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
<lina>
alyssa: Also this is now rebased on 6.2 with DCP changes, so you might need a m1n1 update too (I did)