ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
ourdumbfuture has joined #asahi-gpu
mlp has quit [Read error: Connection reset by peer]
mlp has joined #asahi-gpu
odak_ has joined #asahi-gpu
odak_ has quit [Quit: odak_]
odak_ has joined #asahi-gpu
<lina> alyssa: What I call clusters is what Apple calls "mGPU"s. M1 has just one. We need to test whatever you end up with on t600x and check whether the core ID register is globally unique or we need to get the mGPU ID somewhere else and add it in... hopefully it's unique.
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
PyroPeter has joined #asahi-gpu
pyropeter3 has quit [Ping timeout: 480 seconds]
odak_ has quit [Quit: odak_]
odak_ has joined #asahi-gpu
maria has quit [Remote host closed the connection]
maria has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
<lina> alyssa: BTW, the firefox register spilling thing was some crazy shadertoy, not webrender, so don't worry about that ^^
possiblemeatball has quit [Quit: Quit]
systwi has quit [Ping timeout: 480 seconds]
cylm has joined #asahi-gpu
hightower2 has quit [Ping timeout: 480 seconds]
systwi has joined #asahi-gpu
nsklaus has joined #asahi-gpu
odak_ has quit [Quit: odak_]
nimprod3l has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
cylm has joined #asahi-gpu
mkurz has quit [Quit: Konversation terminated!]
mkurz has joined #asahi-gpu
nimprod3l has quit [Quit: Leaving]
hightower2 has joined #asahi-gpu
cylm has quit [Ping timeout: 480 seconds]
djorz has quit [Ping timeout: 480 seconds]
zocker has quit [Quit: Lost terminal]
<alyssa> lina: right, ok. Interestingly Apple gets the "# of concurrent tiles per core" parameter from a uniform, and doesn't ever explicitly use "# of cores" in the shader
<alyssa> meaning, the shaders they produce are portable across GPUs regardless of configuration
<alyssa> (just needing the driver to use the right constants for the implementation)
<alyssa> Ohh.. I guess the hierarchy in Apple terms is like
<alyssa> "core -> cluster -> mGPU"?
<alyssa> whereas you call those
<alyssa> "frags -> core -> cluster"?
<alyssa> maybe?
<alyssa> as for the shadertoy, yes if you run shadertoys you get to pick up the pieces
hightower3 has joined #asahi-gpu
alyssa has quit [Quit: leaving]
hightower2 has quit [Ping timeout: 480 seconds]
ourdumbfuture has joined #asahi-gpu
cylm has joined #asahi-gpu
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
ourdumbfuture has joined #asahi-gpu
hightower3 has quit [Ping timeout: 480 seconds]
<lina> alyssa: Frags is Apple terminology and I don't actually know what it means
<lina> What is a cluster according to apple? I thought I made that one up
<lina> As far as I'm concerned it's just core -> mGPU/cluster
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<lina> alyssa: For compute context switching, they have a table as part of their data structure that maps core numbers, and my understanding is that is used for GPUs with disabled cores to avoid allocating memory that will never be used, since I'm guessing the core ID has gaps on those. So I'm surprised they don't use it for eMRT?
<lina> Also on M2 Pro/Max machines, all GPUs have "missing" cores due to the way they designed the chips... (clusters have uneven numbers of cores and they implemented it by just having phantom cores logically which are always marked disabled)...
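[Editor's sketch] The core-number table lina describes can be pictured as a remap from sparse hardware core IDs to dense indices, so that disabled or phantom cores never get a slot. The following is a minimal illustration only: the function name, the bitmask layout, and the assumption that a global core ID is `cluster * cores_per_cluster + core` are all hypothetical, and whether the core ID register is globally unique is exactly the open question raised earlier in the log.

```python
def build_core_remap(cluster_masks, cores_per_cluster):
    """Map sparse core IDs to dense indices, skipping disabled cores.

    cluster_masks: one bitmask per cluster (mGPU); bit i set means the
        core in slot i is enabled.
    cores_per_cluster: logical core slots per cluster, counting phantom
        cores that are always marked disabled.

    ASSUMPTION (not confirmed by the log): the global core ID is
    cluster * cores_per_cluster + core.
    """
    remap = {}
    dense = 0
    for cluster, mask in enumerate(cluster_masks):
        for core in range(cores_per_cluster):
            if mask & (1 << core):
                # Only enabled cores consume a dense slot.
                remap[cluster * cores_per_cluster + core] = dense
                dense += 1
    return remap
```

With such a table, per-core memory can be sized by `len(remap)` instead of by the highest core ID.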
ourdumbfuture has joined #asahi-gpu
odak_ has joined #asahi-gpu
chadmed has quit [Remote host closed the connection]
chadmed has joined #asahi-gpu
mlp has quit [Read error: Connection reset by peer]
mlp has joined #asahi-gpu
alyssa has joined #asahi-gpu
<alyssa> I'm glad I took the easy way out because I'm getting close to something workable
<alyssa> (-:
<lina> Nice! ^^
<alyssa> Hopefully I can finish that off today, it's only 9am lol
<alyssa> (I've been up for a few hours I don't know my sleep schedule these days)
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<alyssa> After this, the only fail left will be due to nir_opt_preamble speculating loads
<alyssa> lina: what's the story with soft fault?
<alyssa> [Although probably I should figure out how to fix that properly since Adreno doesn't have soft fault and it also uses the same broken pass..]
<lina> It's a kernel parameter, asahi.fault_control=0xb
<alyssa> so just need a bit of uapi to tell userspace whether it's on or not?
ourdumbfuture has joined #asahi-gpu
<lina> Yeah ^^
<lina> It should just be a compatible feature flag
<alyssa> OK
<lina> If you want to test, just flip the kernel arg and plumb it into a new feature you fake on or something, I can put it into the kernel next time I sit down to hack on it ^^
<lina> I think different bits in fault_control might enable different things, but I don't know what's what yet...
<lina> I think there's probably one global enable and then specific feature bits or something...
<lina> I'm not sure if we should just pass that to userspace as-is as an integer without understanding it though, it is parsed by the firmware as far as I know, not straight a hardware register.
<alyssa> Interesting
<alyssa> How'd you find that 0xb value?
<alyssa> ("macOS uses it")
<lina> That's the macO-yes
<lina> And I guess 0 was disable/fault
<alyssa> Ok, how'd you figure out that's fault control and that setting not-0xb is safe?
<lina> Guesswork...
<alyssa> sensible
<alyssa> 13:10 < lina> I'm not sure if we should just pass that to userspace as-is as an integer without understanding it though, it is parsed by the firmware as far as I know, not straight a hardware register.
<lina> There are a few "interesting" clusters of flags in the giant initdata structures and that's one of them
<lina> Well away from all the power management gunk
<alyssa> IMO, do *not* pass through the value unparsed if it's not 100% hardware defined
<alyssa> since that's a ticking uapi given the fw randomly changes on uprevs
<lina> Yeah, so then just a feature flag, only problem is once we do understand it we might want to split that into separate feature flags... but that's okay I guess
<lina> I guess if == 0xb then set the flag, otherwise don't set it, for now
<lina> Since we know that's the safe macOS behavior
<alyssa> Yeah, ++
<alyssa> if we add more bits later... ugly UAPI > breaking userspace
<lina> Yeah
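[Editor's sketch] The rule agreed on above (set a feature flag only when fault_control is exactly 0xb, and never hand the raw value to userspace) could look roughly like this. The constant names and the bit position are hypothetical, chosen for illustration; the real driver is kernel Rust and its UAPI may differ.

```python
MACOS_FAULT_CONTROL = 0xB  # the value macOS uses, per the discussion

# Hypothetical feature bit; the real UAPI bit assignment is not in the log.
FEAT_SOFT_FAULTS = 1 << 0


def compat_features(fault_control):
    """Return the feature bitmask reported to userspace.

    The firmware-parsed fault_control value is not passed through.
    Only the one configuration known to match macOS sets the flag;
    every other value reports soft faults as unavailable.
    """
    feats = 0
    if fault_control == MACOS_FAULT_CONTROL:
        feats |= FEAT_SOFT_FAULTS
    return feats
```

If individual bits are understood later, more flags can be added next to this one without changing what the existing flag means.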
<lina> You could try adding this to the kernel yourself, maybe try a bit of kernel rust~ ✨
<alyssa> Might be fun
<alyssa> This isn't blocking CTS though
<alyssa> since we need to support the !softfault case even if we end up shipping softfault
<lina> OK ^^
<alyssa> (if only because it makes Mesa debugging massively easier)
<lina> TBH I want to enable softfault by default, it's going to make things a lot more stable for people I bet, since hard faults can kill unrelated jobs and also trigger other existing bugs...
<alyssa> Yeah
<alyssa> What I mean is, even if literally every other user in existence has soft fault enabled, I want hard faults on my dev machines for my own use, and I don't want enabling faults to regress CTS lol
<lina> I suspect that setting only affects stuff fetched from shader cores (textures and device loads), so I think we'll still get hard faults for other brokenness like bad command streams or TVB problems
<lina> And yeah, fair ^^
<alyssa> soft fault was the hummus in my macOS mesa debug PITA
<lina> wwwwwwwww
<lina> Hey I know you enjoyed my mesa fault decoding code ^^
<alyssa> It's growing on me like bacteria
<alyssa> so... might have a branch passing CTS on Monday
<lina> ^^
<alyssa> not sure if you'll be able to kick off CTS runs next week though if your machines are in random boxes ;)
<alyssa> ahead-of-schedule is a good thing, hey
<lina> The machines are not going in boxes ^^
<lina> But I'm more worried about whether I'll be in a state to kick off CTS runs and fix the remaining M2 Pro/Max things...
<lina> See signal...
mort_5 is now known as mort_
<_jannau__> are there complicated requirements for CTS runs? I could cover M1 Max, M1 Ultra and M2
<alyssa> _jannau__: ooh, that might be helpful indeed instead of me doing 1 run and Lina doing 17000 :-p
<alyssa> and no, not particularly complicated
<alyssa> I mean the CTS is kinda annoying but. "git clone this repo, cmake build it, build mesa, run ./cts-runner, wait a zillion hours, zip up all the files produced and send me them"
<alyssa> there's a little bit more work to do the actual submission but that's on my end
odak_ has quit [Quit: odak_]
odak_ has joined #asahi-gpu
<alyssa> The most annoying part is that the CTS is slow
<alyssa> GLES3.1 on Mali took ~12h if I remember
<alyssa> your M1 Max will not have that problem :-p
<alyssa> (CTS is mostly CPU-bound. and the "real" CTS runs are basically single-threaded.)
<alyssa> (For development and CI, anholt's deqp-runner shards the CTS across all available CPUs, which scales almost linearly. But that's not valid for official CTS submissions.)
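[Editor's sketch] Sharding a test list across CPUs, as mentioned above, can be illustrated with a simple round-robin split. This is a generic sketch and not deqp-runner's actual scheduling; the function name is hypothetical.

```python
def shard_caselist(cases, num_shards):
    """Split a list of test names round-robin into num_shards lists.

    Each shard can then be handed to a separate worker process.
    Generic illustration only; not how deqp-runner itself schedules.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    for i, case in enumerate(cases):
        shards[i % num_shards].append(case)
    return shards
```

In practice `num_shards` would typically be the CPU count (for example `os.cpu_count()`), which is why wall-clock time drops roughly in proportion to the number of cores.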
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
hightower2 has joined #asahi-gpu
<alyssa> ok.. my blending is not working with multiple draws together
<alyssa> draw #1 writes to an image, draw #2 reads it
<alyssa> i'm putting wait_pix's in but that doesn't seem to be strong enough
<alyssa> I guess I'll look at how rasterizer order groups work in metal
<alyssa> (This functionality is definitely supposed to work)
odak_ has quit [Quit: odak_]
odak_ has joined #asahi-gpu
nimprod3l has joined #asahi-gpu
ourdumbfuture has joined #asahi-gpu
odak_ has quit [Quit: odak_]
<i509vcb> Hmm agx_border says customBorderColorWithoutFormat can't be supported but zink requires that to be supported. Shader hack later I guess could cover that?
mkurz has quit [Ping timeout: 480 seconds]
nimprod3l has quit [Quit: Leaving]
<alyssa> i509vcb: zink doesn't really require customBorderColorWithoutFormat, it already doesn't use it on turnip for perf reasons
<alyssa> easy to patch Zink to do what we need
<alyssa> The bigger issue is DXVK/VKD3D
mkurz has joined #asahi-gpu
rhysmdnz has quit [Quit: Bridge terminating on SIGTERM]
Guest2821 has quit [Quit: Bridge terminating on SIGTERM]
Jamie has joined #asahi-gpu
rhysmdnz has joined #asahi-gpu
Jamie is now known as Guest3184
possiblemeatball has joined #asahi-gpu
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
mkurz has quit [Remote host closed the connection]
<alyssa> -----
<alyssa> MORE spicy barriers? ah bah oui!
<alyssa> 2, 2, 10
<alyssa> 3, 2, 10
<alyssa> 0, 0, 4
<alyssa> 2, 1, 10
<alyssa> the 0, 0, 4 is notably different here
<alyssa> also interesting that it's inserting these after the write, implicitly
<alyssa> also, sample_mask 255, 1 at the top of the program
<alyssa> also never did figure out what wait_pix 512, 3 was about
<alyssa> the barriers are only inserted when raster order groups are used
<alyssa> flush the write to other invocations, I guess
ourdumbfuture has joined #asahi-gpu
<alyssa> Ooh
<alyssa> 0, 0, 4 is the barrier I had discovered by bruteforce
<alyssa> and called flush_memory_to_texture
<alyssa> but actually the problem is HSR, i think
<alyssa> yeah ok lol
<alyssa> yep ok blend works now
<alyssa> next up, fix partial renders
<alyssa> which should be trivial
<alyssa> fixed
<alyssa> still seeing some HSR related issues..
possiblemeatball has quit [Quit: Quit]
<alyssa> but fewer at least
<alyssa> good enough to play supertuxkart with all render targets spilled to memory
<alyssa> (aka "my Apple M1 is an immediate mode renderer now!")
cylm has quit [Read error: Connection reset by peer]
<alyssa> Ooh tasty new bit in the PBE descriptor for sRGB
<alyssa> looks like bit 125
aafeke_ has joined #asahi-gpu
<alyssa> lina: I am now observing that MSRTT makes eMRT significantly more annoying
<alyssa> so yet another reason to defer that until we have a use case
<alyssa> the only thing I know that uses it is WebGL in Chromium
zzywysm has joined #asahi-gpu
<alyssa> for iPhone SoCs, it'd be a BIG deal for perf there
<alyssa> for the desktop class stuff, possibly we can eat the hit
<alyssa> Oh uffffffffff
<alyssa> fragment shader side effects + sample shading = 🤯
ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
aafeke_ has quit [Quit: aafeke_]
aafeke_ has joined #asahi-gpu
ourdumbfuture has joined #asahi-gpu
mkurz has joined #asahi-gpu
possiblemeatball has joined #asahi-gpu
aafeke_ has quit [Quit: aafeke_]
aafeke_ has joined #asahi-gpu
aafeke_ has quit [Quit: aafeke_]
aafeke_ has joined #asahi-gpu
<alyssa> many hacks involved but passed the KHR-GLES tests with compression + preambles disabled
hightower2 has quit [Remote host closed the connection]
alyssa has quit [Quit: leaving]
aafeke_ has quit [Ping timeout: 480 seconds]
possiblemeatball has quit [Quit: Quit]
skoobasteeve has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
skoobasteeve has joined #asahi-gpu
skoobasteeve has quit []
skoobasteeve has joined #asahi-gpu
nsklaus has quit [Ping timeout: 480 seconds]