ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
Guest2061 has quit []
oftc- has joined #panfrost
oftc- has left #panfrost [#panfrost]
xdarklight has joined #panfrost
xdarklight has quit []
oftc- has joined #panfrost
oftc- has left #panfrost [#panfrost]
xdarklight has joined #panfrost
xdarklight has quit []
xdarklight has joined #panfrost
<icecream95> alyssa: I like how one of the big features of tiling GPUs is that you can do vertex shading for one frame while fragment shading the next, but panfrost.ko makes that not really work at all
<alyssa> Yeah... :(
<alyssa> patches welcome on the mesa side....
<icecream95> I would try using kbase, but because of all the mediatek hacks I couldn't get it to work properly on duet
<icecream95> jekstrand: Want to rewrite the kernel driver?
jambalaya has quit [Quit: Off to see the wizard.]
<icecream95> alyssa: Here's a merge request which does the complete opposite of improving the situation: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15367
<alyssa> I, uh, I think you just NAK'd your own patch there
<icecream95> Time for s/nodearray/boarray/ then?
nlhowell is now known as Guest2069
nlhowell has joined #panfrost
Guest2069 has quit [Ping timeout: 480 seconds]
camus has quit [Remote host closed the connection]
camus has joined #panfrost
camus has quit []
Danct12 has quit [Remote host closed the connection]
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
camus has joined #panfrost
chewitt has quit [Ping timeout: 480 seconds]
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #panfrost
MajorBiscuit has joined #panfrost
<bbrezillon> jekstrand: R-b on vulkan-1.0 patch => done
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
jernej has joined #panfrost
rtp has quit [Remote host closed the connection]
rtp has joined #panfrost
rtp has quit []
rtp has joined #panfrost
rtp has quit [Quit: Reconnecting]
rtp has joined #panfrost
rtp has quit []
rtp has joined #panfrost
rasterman has joined #panfrost
jambalaya has joined #panfrost
Danct12 has joined #panfrost
MTCoster has joined #panfrost
rkanwal has joined #panfrost
rkanwal has left #panfrost [#panfrost]
jambalaya has quit [Quit: Off to see the wizard.]
nlhowell is now known as Guest2092
nlhowell has joined #panfrost
Guest2092 has quit [Ping timeout: 480 seconds]
rkanwal has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
JulianGro has joined #panfrost
simon-perretta-img has quit [Quit: Leaving]
simon-perretta-img has joined #panfrost
wcs has joined #panfrost
wcs has quit []
simon-perretta-img has quit [Quit: Leaving]
simon-perretta-img has joined #panfrost
simon-perretta-img has quit []
simon-perretta-img has joined #panfrost
simon-perretta-img has quit []
simon-perretta-img has joined #panfrost
nlhowell is now known as Guest2101
nlhowell has joined #panfrost
Danct12 has quit [Quit: Quit]
<jekstrand> bbrezillon: Thanks!
Guest2101 has quit [Ping timeout: 480 seconds]
* jekstrand assigns marge
<jekstrand> alyssa: Didn't you hear? I don't write GL code anymore. :P
<jekstrand> icecream95: Not especially. I probably could but I don't really like kernel hacking. :-/
erlehmann has quit [Ping timeout: 480 seconds]
<robmur01> what could the kernel do better? (with the proviso that if it's not memory attributes I'm probably not the one to fix it)
erlehmann has joined #panfrost
Danct12 has joined #panfrost
<jekstrand> Last panvk run completed in 8:39:41. It's coming down!
<alyssa> Woo!
<alyssa> this is with deqp-runner I assume?
<jekstrand> yup
<jekstrand> On a single board
<jekstrand> But it means that I can kick off a run at EoD and have results in the morning.
<jekstrand> Which is such a huge step forward from the 30h runs
<alyssa> *nod*
<jekstrand> I'm hoping to get a pile of it merged today so that we can get upstream there, not just my branch.
<jekstrand> bbrezillon reviewed most of the patches, I just need to make them into final MRs and review the index buffers patches.
<jekstrand> Oh, and add CI bits to them so the newly working stuff gets some amount of testing.
<jekstrand> There's also something busted with texturing right now that I'd like to figure out.
<daniels> jekstrand: you've disabled coredumps completely, right?
erlehmann has quit []
<jekstrand> daniels: Yes. :)
<jekstrand> I think
<jekstrand> The "should I coredump?" is set to /bin/false
<jekstrand> Even without coredumps, though, deqp is such a pig to restart
<jekstrand> Just starting up and enumerating all 500k tests takes 1-2s.
<jekstrand> 1.8M cases, rather.
<jekstrand> Damn... is it really that high?!?
<alyssa> I think I've switched to sway out of spite
<HdkR> \o/
<alyssa> logisim causes me physical pain
karolherbst_ has joined #panfrost
karolherbst has quit [Read error: Connection reset by peer]
karolherbst_ is now known as karolherbst
jambalaya has joined #panfrost
<jekstrand> alyssa: Are you aware of any issues with setting Z to REPEAT in the sampler for non-3D images?
<jekstrand> Or anyone else?
<jekstrand> Seems to be broken but Vulkan doesn't know the image dimensionality when creating the sampler.
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> jekstrand: It.. should be fine?
<alyssa> jekstrand: Please send me a failing pandecode
<alyssa> there are restrictions for non-normalized coordinates (RECT textures), maybe that's it?
nlhowell has joined #panfrost
<alarumbe> Hi cphealy
<alarumbe> did you have a chance to test run the crash dump analyser ?
rasterman has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
<jekstrand> alyssa: It is non-normalized. :)
MajorBiscuit has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
<alyssa> jekstrand: yes that's your problem, non-normalized is heavily restricted
<alyssa> there ought to be a way to handle in VK since the DDK is conformant
<alyssa> just conceptually, non-normalized + REPEAT requires some seriously nontrivial arithmetic in hardware
<alyssa> so it makes a lot of sense to not
rkanwal has quit [Ping timeout: 480 seconds]
<jekstrand> alyssa: I'll look at what all Vulkan restrictions I can employ. There's got to be something. :D
<alyssa> jekstrand: Ahhh!
<alyssa> f'king
<alyssa> i got it
<alyssa> VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01075
<alyssa> If unnormalizedCoordinates is VK_TRUE, addressModeU and addressModeV must each be either VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE or VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER
<alyssa> together with
<alyssa> When unnormalizedCoordinates is VK_TRUE, images the sampler is used with in the shader have the following requirements:
<alyssa> The viewType must be either VK_IMAGE_VIEW_TYPE_1D or VK_IMAGE_VIEW_TYPE_2D.
<alyssa> so the s/t wrap modes are clamp which we want
<alyssa> and the r wrap mode doesn't matter, since it *can't* matter because non-normalized ==> !3D in vulkan
<alyssa> (not in the hardware, but that's not your problem)
<alyssa> so I think you just want
<alyssa> cfg.wrap_mode_r = normalized_coordinates ? translate(wrap_mode_r) : CLAMP_TO_EDGE
<alyssa> if normalized_coordinates is set, it's correct
<jekstrand> alyssa: Or I may be able to disable some CTS tests. :)
<alyssa> and if normalized_coordinates is not set, it doesn't matter, therefore not incorrect
<jekstrand> But I need to look at it more closely
<alyssa> jekstrand: Not needed, I'm convinced that one line patch does what you need
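The one-line patch alyssa describes can be sketched in isolation as below. This is only an illustration: the enums `api_wrap` and `hw_wrap` and the functions `translate_wrap` and `pick_wrap_mode_r` are hypothetical stand-ins, not the actual panvk or Mali descriptor names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the API-side and hardware-side wrap enums. */
enum api_wrap { API_WRAP_REPEAT, API_WRAP_CLAMP_TO_EDGE, API_WRAP_CLAMP_TO_BORDER };
enum hw_wrap  { HW_WRAP_REPEAT, HW_WRAP_CLAMP_TO_EDGE, HW_WRAP_CLAMP_TO_BORDER };

/* Hypothetical translation from the API wrap mode to the hardware one. */
enum hw_wrap translate_wrap(enum api_wrap mode)
{
    switch (mode) {
    case API_WRAP_REPEAT:          return HW_WRAP_REPEAT;
    case API_WRAP_CLAMP_TO_BORDER: return HW_WRAP_CLAMP_TO_BORDER;
    case API_WRAP_CLAMP_TO_EDGE:
    default:                       return HW_WRAP_CLAMP_TO_EDGE;
    }
}

/* With normalized coordinates, honour the requested R wrap mode.
 * With unnormalized coordinates the image cannot be 3D in valid Vulkan,
 * so the R mode is unused and is forced to a value the hardware accepts. */
enum hw_wrap pick_wrap_mode_r(bool normalized_coordinates, enum api_wrap wrap_r)
{
    return normalized_coordinates ? translate_wrap(wrap_r)
                                  : HW_WRAP_CLAMP_TO_EDGE;
}
```

For example, `pick_wrap_mode_r(false, API_WRAP_REPEAT)` yields `HW_WRAP_CLAMP_TO_EDGE`, while the normalized case passes the requested mode through unchanged.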
<alyssa> mali has a lot of restrictions around unnormalized samplers but they're the same as the VK ones (...possibly because of Arm lobbying Khronos ;-p)
<jekstrand> alyssa: Cool. I'll type the patch once I've convinced me too. :)
<jekstrand> alyssa: Maybe I'll author it to you and RB it. :)
<alyssa> jekstrand: Lol. Fair enough
<alyssa> FWIW, if normalized_coordinates=0, then:
<alyssa> - LODs are ignored, round(min_lod) is used
<alyssa> - mipmap_mode is ignored, nearest is used
<alyssa> - anisotropy is disabled
<alyssa> - wrap modes must be clamp_to_edge or clamp_to_border
<alyssa> in vulkan, that's justified by
<icecream95> Ah... good ol' Mali, faulting whenever an unused field is set wrong :)
<alyssa> - VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01074
<alyssa> - VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01073
<alyssa> - VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01076
<alyssa> - VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01075 for u/v, together with that one line patch and
<alyssa> When unnormalizedCoordinates is VK_TRUE, images the sampler is used with in the shader have the following requirements:
<alyssa> The viewType must be either VK_IMAGE_VIEW_TYPE_1D or VK_IMAGE_VIEW_TYPE_2D.
<alyssa> so yes I am convinced you're good as long as the vulkan usage is valid and you do `if(!normalized) wrap_mode_r = CLAMP`
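The restrictions listed above can be collected into a small validity check. This is a rough sketch that avoids the real Vulkan headers: `struct sampler_info` and the enums are hypothetical mirrors of the relevant `VkSamplerCreateInfo` fields, and only the rules quoted in the conversation are checked (the specification has further rules for unnormalized samplers).

```c
#include <stdbool.h>

/* Hypothetical mirrors of the Vulkan enums, to keep the sketch self-contained. */
enum address_mode { ADDR_REPEAT, ADDR_MIRRORED_REPEAT,
                    ADDR_CLAMP_TO_EDGE, ADDR_CLAMP_TO_BORDER };
enum mipmap_mode  { MIPMAP_NEAREST, MIPMAP_LINEAR };
enum view_type    { VIEW_1D, VIEW_2D, VIEW_3D, VIEW_CUBE };

/* Hypothetical subset of VkSamplerCreateInfo. */
struct sampler_info {
    bool unnormalized_coordinates;
    enum address_mode address_mode_u, address_mode_v, address_mode_w;
    enum mipmap_mode mipmap_mode;
    float min_lod, max_lod;
    bool anisotropy_enable;
};

bool is_clamp(enum address_mode m)
{
    return m == ADDR_CLAMP_TO_EDGE || m == ADDR_CLAMP_TO_BORDER;
}

/* Checks the sampler-side rules quoted in the conversation. */
bool unnormalized_sampler_is_valid(const struct sampler_info *s)
{
    if (!s->unnormalized_coordinates)
        return true; /* the rules only apply to unnormalized samplers */

    return s->mipmap_mode == MIPMAP_NEAREST              /* ...-01073 */
        && s->min_lod == 0.0f && s->max_lod == 0.0f      /* ...-01074 */
        && is_clamp(s->address_mode_u)
        && is_clamp(s->address_mode_v)                   /* ...-01075 */
        && !s->anisotropy_enable;                        /* ...-01076 */
    /* address_mode_w is deliberately not checked: nothing quoted above
     * constrains it, which is why the driver has to override it. */
}

/* Image-view rule: only 1D and 2D views may be sampled unnormalized. */
bool view_ok_for_unnormalized(enum view_type t)
{
    return t == VIEW_1D || t == VIEW_2D;
}
```

Because `address_mode_w` may legally hold REPEAT here, the check passes for exactly the kind of sampler that needs the `wrap_mode_r` override.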
<jekstrand> \o/
<alyssa> icecream95: It's on brand.
<jekstrand> alyssa: I'm in 3 IRC conversations now and dealing with blog review feedback. I'll get to it after the dust settles. :)
<alyssa> Fair.
<alyssa> was i supposed to review a blog
<anarsoul> alyssa: hm, do these unnormalized coords requirements go back to utgard?
<alyssa> anarsoul: No idea
<anarsoul> :(
<alyssa> Likely, REPEAT + unnormalized requires extra hw
<anarsoul> I guess we're good unless someone complains :)
<icecream95> alyssa: It also requires a division (well, modulo), doesn't it?
<alyssa> Yeah
<alyssa> that's the extra hw in question
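To make the "extra hw" concrete, here is a minimal software sketch of the two wrap behaviours on an unnormalized (texel-space) integer coordinate. The function names are made up for illustration and say nothing about how Mali actually implements wrapping.

```c
/* REPEAT on a texel-space coordinate: a true modulo by the texture size,
 * which is the division-like operation mentioned above. */
int wrap_repeat(int coord, int size)
{
    int m = coord % size;          /* C's % can yield a negative result */
    return m < 0 ? m + size : m;   /* fold it back into [0, size) */
}

/* CLAMP_TO_EDGE on a texel-space coordinate: two comparisons, no division. */
int wrap_clamp_to_edge(int coord, int size)
{
    if (coord < 0)
        return 0;
    if (coord >= size)
        return size - 1;
    return coord;
}
```

With normalized coordinates, by contrast, REPEAT only needs the fractional part of the coordinate, which is why the unnormalized case is the expensive one.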
<alyssa> ..how does this work for texelFetch..
<alyssa> oh texelFetch + wrapping is UB, nice
<icecream95> alyssa: Annoyingly, texelFetch of negative coordinates does not wrap around on Mali..
<icecream95> ("Why would the hardware make it signed anyway?")
<alyssa> also reasonable, though?
<jekstrand> Why would it wrap around? texelFetch() in GL takes a signed integer for position
<icecream95> jekstrand: But wouldn't a negative position always be OOB (and so undefined)?
<icecream95> The problem is that 16-bit coordinates cannot be used for a texture which is >32768 wide or high
<jekstrand> Depends on your OOB behavior, I guess.
<jekstrand> With Vulkan image robustness, OOB behavior is defined.
<jekstrand> And negative values are considered OOB, I believe.
<icecream95> So then the question is: Why did GL/Vulkan make the position signed?
<jekstrand> idk. That decision was made a long time ago.
<jekstrand> Also, most implementations can't handle texture sizes larger than 2^14 anyway, so it's a bit moot.
<jekstrand> Maybe Mali can but it's weird
<alyssa> ERROR - dEQP error: 9 = TEX_SINGLE.texel_offset.array_enable.2d.rgba.explicit.s32 39, r0, r0[1], sr_count:5
<alyssa> i'm going to need that RA extension sooner than later aren't I.. grumble..
<icecream95> But it does cause problems when you have a texture buffer you lowered to a 2^16-wide texture
<alyssa> guess I'll do the obvious thing on my Valhall branch that realistically won't be merged for a while and we'll figure out something better .. later ..
<icecream95> alyssa: You mean increase the shift for the constraints?
<alyssa> Yeah
<jekstrand> icecream95: So lower it to a 2^15-wide texture and throw &0x7fff on the coordinate?
<icecream95> jekstrand: But that takes more ALU ops than just converting the coordinates to 32-bit
<icecream95> (Probably)
<jekstrand> Then convert. ALU is cheap relative to memory, especially on mobile.
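jekstrand's suggestion could look roughly like the sketch below: a linear texture-buffer index is split into a 2D coordinate on a 2^15-wide texture, so both components fit in a signed 16-bit value. The names `texel_coord` and `lower_buffer_index` are hypothetical, and the sketch assumes the index is below 2^30 so the row number also fits.

```c
#include <stdint.h>

/* Hypothetical 2D coordinate whose components fit in signed 16 bits. */
struct texel_coord {
    int16_t x;
    int16_t y;
};

/* Split a linear buffer index into (x, y) on a 32768-wide texture.
 * Assumes index < 2^30, so that index >> 15 stays below 32768. */
struct texel_coord lower_buffer_index(uint32_t index)
{
    struct texel_coord c;
    c.x = (int16_t)(index & 0x7fff);  /* column: the low 15 bits */
    c.y = (int16_t)(index >> 15);     /* row: the remaining bits */
    return c;
}
```

This costs one AND and one shift per fetch, the ALU overhead icecream95 weighs against simply widening the coordinates to 32 bits.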
<alyssa> icecream95: The arguments to FETCH on Bifrost are u32 so I'm not sure I understand the issue
<icecream95> alyssa: I guess this was Midgard then?
nlhowell has quit [Ping timeout: 480 seconds]
<alyssa> yes, midgard takes s16/s32, don't know why
<alyssa> Pass: 26174, Fail: 1453, Crash: 320, Warn: 65, Skip: 124, Missing: 3858, Flake: 6, Duration: 17:11, Remaining: 5:45
<alyssa> close enough stopping there
<alyssa> dunno what missing is about
<alyssa> Passed: 302/302 (100.0%)
<alyssa> eyy
<alyssa> next up, fixing MRT
<anarsoul> 6 flakes though
<alyssa> i hate this code
<alyssa> on the other hand, editing the control flow graph to support MRT sounds suss
<alyssa> then again it's a lot less likely to break violently
<alyssa> might not be such a bad idea, actually..
<alyssa> jekstrand: So how do we feel about blend shaders :-V
<jekstrand> alyssa: Uh... they're annoying?
<alyssa> what if i just
<icecream95> alyssa: If we get rid of blend shaders, then at least you won't be able to break dual-source blending again
<alyssa> don't support them on bifrost+
<alyssa> icecream95: rude
<alyssa> accurate, but rude :-p
<alyssa> we're really not supposed to key shaders to blend state or framebuffer formats.
<alyssa> admittedly it's not as "bad" as e.g. on AGX
<alyssa> since the fast path will still be fast / no keys
<icecream95> But make sure that the tilebuffer wait is not put too far up in the shader, or it will make things slower when there is a lot of overdraw
<alyssa> (blendable format + fixed function blend mode .. if you ever are doing not that, perf will fall off a cliff anyway and maybe the compiler perf is not the problem here)
<alyssa> and it means blend constants can be handled in a way that's not stupid
<alyssa> faster execution by a few instructions, and maybe better i-cache utilization
<alyssa> icecream95: shouldn't make a difference? I'm just talking about inlining blend shaders into the fragment shaders that use them, like we do in panvk
<icecream95> "i-cache". Because now prefetching is only done for one shader, not two?
<alyssa> "maybe"
<alyssa> Not sure what the rules are for the i-cache
<alyssa> but I wouldn't be surprised if blend shaders (that are far away in memory) hit a bad case
<alyssa> easier RA because now we don't need bifrost blend shader ABI
<alyssa> only GL is affected and if we're thinking VK long term, I mean..
<alyssa> We have to support blend shaders on Midgard due to a hw limitation, but that's a different compiler backend so who cares
<icecream95> alyssa: So can we now give the +BLEND any set of source registers, not only R0-R3?
<icecream95> About tilebuffer wait.. if the LD_TILE is near the start of the shader, then other shaders writing to the same pixel have to finish blending first