#asahi-gpu on 2023-06-15 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

00:09 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

00:12 ourdumbfuture has joined #asahi-gpu

00:34 mlp has quit [Read error: Connection reset by peer]

00:35 mlp has joined #asahi-gpu

02:06 odak_ has joined #asahi-gpu

02:21 odak_ has quit [Quit: odak_]

02:21 odak_ has joined #asahi-gpu

02:30 <lina> alyssa: What I call clusters is what Apple calls "mGPU"s. M1 has just one. We need to test whatever you end up with on t600x and check whether the core ID register is globally unique or we need to get the mGPU ID somewhere else and add it in... hopefully it's unique.

03:18 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

03:24 PyroPeter has joined #asahi-gpu

03:26 pyropeter3 has quit [Ping timeout: 480 seconds]

03:31 odak_ has quit [Quit: odak_]

03:32 odak_ has joined #asahi-gpu

03:54 maria has quit [Remote host closed the connection]

03:55 maria has joined #asahi-gpu

04:00 possiblemeatball has joined #asahi-gpu

04:36 <lina> alyssa: BTW, the firefox register spilling thing was some crazy shadertoy, not webrender, so don't worry about that ^^

04:46 possiblemeatball has quit [Quit: Quit]

05:11 systwi has quit [Ping timeout: 480 seconds]

05:16 cylm has joined #asahi-gpu

05:32 hightower2 has quit [Ping timeout: 480 seconds]

06:07 systwi has joined #asahi-gpu

06:17 nsklaus has joined #asahi-gpu

06:52 odak_ has quit [Quit: odak_]

07:11 nimprod3l has joined #asahi-gpu

07:29 cylm has quit [Ping timeout: 480 seconds]

07:59 cylm has joined #asahi-gpu

08:07 mkurz has quit [Quit: Konversation terminated!]

08:07 mkurz has joined #asahi-gpu

08:28 nimprod3l has quit [Quit: Leaving]

08:40 hightower2 has joined #asahi-gpu

09:21 cylm has quit [Ping timeout: 480 seconds]

09:29 djorz has quit [Ping timeout: 480 seconds]

10:20 zocker has quit [Quit: Lost terminal]

10:21 <alyssa> lina: right, ok. Interestingly Apple gets the "# of concurrent tiles per core" parameter from a uniform, and doesn't ever explicitly use "# of cores" in the shader

10:21 <alyssa> meaning, the shaders they produce are portable across GPUs regardless of configuration

10:21 <alyssa> (just needing the driver to use the right constants for the implementation)

10:22 <alyssa> Ohh.. I guess the hierarchy in Apple terms is like

10:22 <alyssa> "core -> cluster -> mGPU"?

10:22 <alyssa> whereas you call those

10:22 <alyssa> "frags -> core -> cluster"?

10:22 <alyssa> maybe?

10:26 <alyssa> as for the shadertoy, yes if you run shadertoys you get to pick up the pieces

10:27 hightower3 has joined #asahi-gpu

10:27 alyssa has quit [Quit: leaving]

10:33 hightower2 has quit [Ping timeout: 480 seconds]

10:40 ourdumbfuture has joined #asahi-gpu

10:43 cylm has joined #asahi-gpu

10:58 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

11:02 ourdumbfuture has joined #asahi-gpu

11:26 hightower3 has quit [Ping timeout: 480 seconds]

11:34 <lina> alyssa: Frags is Apple terminology and I don't actually know what it means

11:34 <lina> What is a cluster according to apple? I thought I made that one up

11:35 <lina> As far as I'm concerned it's just core -> mGPU/cluster

11:37 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

11:41 <lina> alyssa: For compute context switching, they have a table as part of their data structure that maps core numbers, and my understanding is that is used for GPUs with disabled cores to avoid allocating memory that will never be used, since I'm guessing the core ID has gaps on those. So I'm surprised they don't use it for eMRT?

11:42 <lina> Also on M2 Pro/Max machines, all GPUs have "missing" cores due to the way they designed the chips... (clusters have uneven numbers of cores and they implemented it by just having phantom cores logically which are always marked disabled)...
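The core-number table lina describes could look roughly like the sketch below. It assumes the set of enabled cores is available as a bitmask; the function name, the mask value, and the use of -1 for disabled cores are hypothetical illustrations, not taken from the firmware data structures. The idea is that physical core IDs with gaps are mapped to dense slot indices, so per-core memory is only allocated for cores that actually exist.

```python
def build_core_table(core_mask: int, max_cores: int) -> list[int]:
    """Map each physical core ID to a dense slot index, or -1 if the core is disabled."""
    table = []
    next_slot = 0
    for core_id in range(max_cores):
        if core_mask & (1 << core_id):
            # Enabled core: hand out the next unused slot.
            table.append(next_slot)
            next_slot += 1
        else:
            # Disabled or phantom core: no slot, so no memory is reserved for it.
            table.append(-1)
    return table


# Hypothetical mask: 8 logical cores, with cores 3 and 7 marked disabled.
table = build_core_table(0b01110111, 8)
print(table)  # [0, 1, 2, -1, 3, 4, 5, -1]
```

With a table like this, a per-core buffer only needs as many entries as there are enabled cores (6 in the example) rather than one per logical core ID.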

11:45 ourdumbfuture has joined #asahi-gpu

12:03 odak_ has joined #asahi-gpu

12:13 chadmed has quit [Remote host closed the connection]

12:14 chadmed has joined #asahi-gpu

13:01 mlp has quit [Read error: Connection reset by peer]

13:02 mlp has joined #asahi-gpu

13:02 alyssa has joined #asahi-gpu

13:03 <alyssa> I'm glad I took the easy way out because I'm getting close to something workable

13:03 <alyssa> (-:

13:03 <lina> Nice! ^^

13:03 <alyssa> Hopefully I can finish that off today, it's only 9am lol

13:03 <alyssa> (I've been up for a few hours I don't know my sleep schedule these days)

13:03 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

13:03 <alyssa> After this, the only fail left will be due to nir_opt_preamble speculating loads

13:04 <alyssa> lina: what's the story with soft fault?

13:04 <alyssa> [Although probably I should figure out how to fix that properly since Adreno doesn't have soft fault and it also uses the same broken pass..]

13:05 <lina> It's a kernel parameter, asahi.fault_control=0xb

13:05 <alyssa> so just need a bit of uapi to tell userspace whether it's on or not?

13:06 ourdumbfuture has joined #asahi-gpu

13:06 <lina> Yeah ^^

13:06 <lina> It should just be a compatible feature flag

13:07 <alyssa> OK

13:09 <lina> If you want to test, just flip the kernel arg and plumb it into a new feature you fake on or something, I can put it into the kernel next time I sit down to hack on it ^^

13:09 <lina> I think different bits in fault_control might enable different things, but I don't know what's what yet...

13:09 <lina> I think there's probably one global enable and then specific feature bits or something...

13:10 <lina> I'm not sure if we should just pass that to userspace as-is as an integer without understanding it though, it is parsed by the firmware as far as I know, not straight a hardware register.

13:10 <alyssa> Interesting

13:10 <alyssa> How'd you find that 0xb value?

13:10 <alyssa> ("macOS uses it")

13:10 <lina> That's the macO-yes

13:11 <lina> And I guess 0 was disable/fault

13:11 <alyssa> Ok, how'd you figure out that's fault control and that setting not-0xb is safe?

13:11 <lina> Guesswork...

13:11 <alyssa> sensible

13:11 <alyssa> 13:10 < lina> I'm not sure if we should just pass that to userspace as-is as an integer without understanding it though, it is parsed by the firmware as far as I know, not straight a hardware register.

13:11 <lina> There are a few "interesting" clusters of flags in the giant initdata structures and that's one of them

13:11 <lina> Well away from all the power management gunk

13:11 <alyssa> IMO, do *not* pass through the value unparsed if it's not 100% hardware defined

13:12 <alyssa> since that's a ticking uapi given the fw randomly changes on uprevs

13:12 <lina> Yeah, so then just a feature flag, only problem is once we do understand it we might want to split that into separate feature flags... but that's okay I guess

13:12 <lina> I guess if == 0xb then set the flag, otherwise don't set it, for now

13:12 <lina> Since we know that's the safe macOS behavior

13:12 <alyssa> Yeah, ++

13:13 <alyssa> if we add more bits later... ugly UAPI > breaking userspace

13:13 <lina> Yeah
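The rule agreed on above (expose a single feature flag only when fault_control equals the known macOS value, and never pass the raw integer to userspace) can be sketched as follows. The flag name and bit position are hypothetical; the log does not specify the actual uapi, and the real change would live in the kernel driver rather than in Python.

```python
# Value from the discussion: the setting macOS uses.
SAFE_FAULT_CONTROL = 0xB

# Hypothetical feature bit; the real uapi name and bit are not given in the log.
FEAT_SOFT_FAULTS = 1 << 0


def compat_features(fault_control: int) -> int:
    """Return the feature bits to report to userspace for a given fault_control."""
    features = 0
    # Only the exact known-safe value sets the flag. Any other value, including
    # ones that share some bits with 0xb, reports no soft fault support, because
    # the meaning of the individual bits is not understood yet.
    if fault_control == SAFE_FAULT_CONTROL:
        features |= FEAT_SOFT_FAULTS
    return features


print(bool(compat_features(0xB) & FEAT_SOFT_FAULTS))  # True
print(bool(compat_features(0x0) & FEAT_SOFT_FAULTS))  # False
```

Keeping the raw value out of the interface means a firmware update that changes how fault_control is parsed would not change what userspace sees.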

13:13 <lina> You could try adding this to the kernel yourself, maybe try a bit of kernel rust~ ✨

13:14 <alyssa> Might be fun

13:14 <alyssa> This isn't blocking CTS though

13:15 <alyssa> since we need to support the !softfault case even if we end up shipping softfault

13:15 <lina> OK ^^

13:15 <alyssa> (if only because it makes Mesa debugging massively easier)

13:16 <lina> TBH I want to enable softfault by default, it's going to make things a lot more stable for people I bet, since hard faults can kill unrelated jobs and also trigger other existing bugs...

13:16 <alyssa> Yeah

13:16 <alyssa> What I mean is, even if literally every other user in existence has soft fault enabled, I want hard faults on my dev machines for my own use, and I don't want enabling faults to regress CTS lol

13:17 <lina> I suspect that setting only affects stuff fetched from shader cores (textures and device loads), so I think we'll still get hard faults for other brokenness like bad command streams or TVB problems

13:17 <lina> And yeah, fair ^^

13:18 <alyssa> soft fault was the hummus in my macOS mesa debug PITA

13:18 <lina> wwwwwwwww

13:19 <lina> Hey I know you enjoyed my mesa fault decoding code ^^

13:20 <alyssa> It's growing on me like bacteria

13:25 <alyssa> so... might have a branch passing CTS on Monday

13:25 <lina> ^^

13:26 <alyssa> not sure if you'll be able to kick off CTS runs next week though if your machines are in random boxes ;)

13:26 <alyssa> ahead-of-schedule is a good thing, hey

13:26 <lina> The machines are not going in boxes ^^

13:27 <lina> But I'm more worried about whether I'll be in a state to kick off CTS runs and fix the remaining M2 Pro/Max things...

13:28 <lina> See signal...

13:32 mort_5 is now known as mort_

13:34 <_jannau__> are there complicated requirements for CTS runs? I could cover M1 Max, M1 Ultra and M2

13:35 <alyssa> _jannau__: ooh, that might be helpful indeed instead of me doing 1 run and Lina doing 17000 :-p

13:35 <alyssa> and no, not particularly complicated

13:36 <alyssa> I mean the CTS is kinda annoying but. "git clone this repo, cmake build it, build mesa, run ./cts-runner, wait a zillion hours, zip up all the files produced and send me them"

13:37 <alyssa> there's a little bit more work to do the actual submission but that's on my end

13:38 odak_ has quit [Quit: odak_]

13:38 odak_ has joined #asahi-gpu

13:42 <alyssa> The most annoying part is that the CTS is slow

13:42 <alyssa> GLES3.1 on Mali took ~12h if I remember

13:42 <alyssa> your M1 Max will not have that problem :-p

13:43 <alyssa> (CTS is mostly CPU-bound. and the "real" CTS runs are basically single-threaded.)

13:44 <alyssa> (For development and CI, anholt's deqp-runner shards the CTS across all available CPUs, which scales almost linearly. But that's not valid for official CTS submissions.)
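As a generic illustration of what sharding a test list across CPUs means, here is a minimal round-robin sketch. It is not deqp-runner's actual algorithm or interface; the function name and the sample test names are made up for the example.

```python
import os


def shard_caselist(cases, n_workers=None):
    """Split a list of test names into one sub-list per worker, round-robin."""
    # Default to one shard per available CPU; fall back to 1 if the count is unknown.
    n = n_workers or os.cpu_count() or 1
    return [cases[i::n] for i in range(n)]


# Made-up test names, split across two workers.
shards = shard_caselist(["test.a", "test.b", "test.c", "test.d", "test.e"], 2)
print(shards)  # [['test.a', 'test.c', 'test.e'], ['test.b', 'test.d']]
```

Each shard can then run in its own process, which is why wall-clock time drops roughly in proportion to the number of CPUs when the tests are CPU-bound.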

13:57 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

14:09 hightower2 has joined #asahi-gpu

14:52 <alyssa> ok.. my blending is not working with multiple draws together

14:52 <alyssa> draw #1 writes to an image, draw #2 reads it

14:53 <alyssa> i'm putting wait_pix's in but that doesn't seem to be strong enough

14:53 <alyssa> I guess I'll look at how rasterizer order groups work in metal

14:54 <alyssa> (This functionality is definitely supposed to work)

14:58 odak_ has quit [Quit: odak_]

14:58 odak_ has joined #asahi-gpu

15:01 nimprod3l has joined #asahi-gpu

15:16 ourdumbfuture has joined #asahi-gpu

15:23 odak_ has quit [Quit: odak_]

15:42 <i509vcb> Hmm agx_border says customBorderColorWithoutFormat can't be supported but zink requires that to be supported. Shader hack later I guess could cover that?

15:57 mkurz has quit [Ping timeout: 480 seconds]

16:00 nimprod3l has quit [Quit: Leaving]

16:23 <alyssa> i509vcb: zink doesn't really require customBorderColorWithoutFormat, it already doesn't use it on turnip for perf reasons

16:23 <alyssa> easy to patch Zink to do what we need

16:23 <alyssa> The bigger issue is DXVK/VKD3D

16:26 mkurz has joined #asahi-gpu

16:41 rhysmdnz has quit [Quit: Bridge terminating on SIGTERM]

16:41 Guest2821 has quit [Quit: Bridge terminating on SIGTERM]

16:45 Jamie has joined #asahi-gpu

16:45 rhysmdnz has joined #asahi-gpu

16:45 Jamie is now known as Guest3184

16:48 possiblemeatball has joined #asahi-gpu

17:09 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

17:24 mkurz has quit [Remote host closed the connection]

17:26 <alyssa> -----

17:26 <alyssa> MORE spicy barriers? ah bah oui!

17:27 <alyssa> 2, 2, 10

17:27 <alyssa> 3, 2, 10

17:27 <alyssa> 0, 0, 4

17:27 <alyssa> 2, 1, 10

17:27 <alyssa> the 0, 0, 4 is notably different here

17:28 <alyssa> also interesting that it's inserting these after the write, implicitly

17:28 <alyssa> also, sample_mask 255, 1 at the top of the program

17:28 <alyssa> also never did figure out what wait_pix 512, 3 was about

17:29 <alyssa> the barriers are only inserted when raster order groups are used

17:29 <alyssa> flush the write to other invocations, I guess

17:30 ourdumbfuture has joined #asahi-gpu

17:33 <alyssa> Ooh

17:33 <alyssa> 0, 0, 4 is the barrier I had discovered by bruteforce

17:33 <alyssa> and called flush_memory_to_texture

17:34 <alyssa> but actually the problem is HSR, i think

17:35 <alyssa> yeah ok lol

17:35 <alyssa> yep ok blend works now

17:39 <alyssa> next up, fix partial renders

17:39 <alyssa> which should be trivial

17:42 <alyssa> fixed

17:47 <alyssa> still seeing some HSR related issues..

17:47 possiblemeatball has quit [Quit: Quit]

17:47 <alyssa> but fewer at least

17:47 <alyssa> good enough to play supertuxkart with all render targets spilled to memory

17:48 <alyssa> (aka "my Apple M1 is an immediate mode renderer now!")

18:00 cylm has quit [Read error: Connection reset by peer]

18:15 <alyssa> Ooh tasty new bit in the PBE descriptor for sRGB

18:15 <alyssa> looks like bit 125

18:16 aafeke_ has joined #asahi-gpu

18:32 <alyssa> lina: I am now observing that MSRTT makes eMRT significantly more annoying

18:32 <alyssa> so yet another reason to defer that until we have a use case

18:32 <alyssa> the only thing I know that uses it is WebGL in Chromium

18:33 zzywysm has joined #asahi-gpu

18:33 <alyssa> for iPhone SoCs, it'd be a BIG deal for perf there

18:33 <alyssa> for the desktop class stuff, possibly we can eat the hit

18:41 <alyssa> Oh uffffffffff

18:42 <alyssa> fragment shader side effects + sample shading = 🤯

18:56 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

19:01 aafeke_ has quit [Quit: aafeke_]

19:01 aafeke_ has joined #asahi-gpu

19:09 ourdumbfuture has joined #asahi-gpu

19:23 mkurz has joined #asahi-gpu

19:57 possiblemeatball has joined #asahi-gpu

20:16 aafeke_ has quit [Quit: aafeke_]

20:17 aafeke_ has joined #asahi-gpu

20:27 aafeke_ has quit [Quit: aafeke_]

20:27 aafeke_ has joined #asahi-gpu

20:31 <alyssa> many hacks involved but passed the KHR-GLES tests with compression + preambles disabled

20:33 hightower2 has quit [Remote host closed the connection]

20:48 alyssa has quit [Quit: leaving]

21:36 aafeke_ has quit [Ping timeout: 480 seconds]

22:19 possiblemeatball has quit [Quit: Quit]

22:44 skoobasteeve has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

22:45 skoobasteeve has joined #asahi-gpu

22:47 skoobasteeve has quit []

22:48 skoobasteeve has joined #asahi-gpu

23:16 nsklaus has quit [Ping timeout: 480 seconds]