ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
chrisf has joined #panfrost
soreau has quit [Read error: Connection reset by peer]
soreau has joined #panfrost
vstehle has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
mmind00 has quit [Server closed connection]
mmind00 has joined #panfrost
tolszak has joined #panfrost
chrisf has quit [Remote host closed the connection]
tolszak has quit [Ping timeout: 480 seconds]
sigmaris has quit [Server closed connection]
sigmaris has joined #panfrost
tomeu829 has quit []
tomeu has joined #panfrost
<tomeu> bbrezillon: just one idea about the surprising perf results: could it be related to out-of-order writes to WC buffers flushing caches and thus making unrelated operations slower?
stepri01 has quit [Server closed connection]
stepri01 has joined #panfrost
<bbrezillon> tomeu: I'd expect the memcpy to be in-order, but the address relocation definitely looks more random, though we do try to patch addresses in the order they appear in the descriptor
<bbrezillon> but why would writes to a WC buffer (AKA uncached on ARM) mess with the caches?
evx256 has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
<bbrezillon> tomeu: what's true though, is the fact that the memcpy will trash the cache, and we then patch addresses in a second pass. So it's like an extra load compared to the SW implementation, which reads the vk_cmd_queue_entry once and emits all the descs that result from this cmd.
<tomeu> and btw, regarding the unexpected page fault: somebody reported in LKML that they were getting a surprising amount of minor faults with panfrost vs. libmali
<tomeu> wonder if the two issues could be related
<tomeu> and one more btw: could it be worth checking that the memcpy happens sequentially?
<bbrezillon> tomeu: the BO page fault thing, I addressed with MAP_POPULATE in the mmap() we have in panfrost_bo_mmap()
<bbrezillon> and it indeed helped reduce the diff between the 2 implems
<tomeu> ah, very cool
<bbrezillon> it definitely makes sense to pre-populate the page table since pages are already pinned and mapped GPU-side
<bbrezillon> well, there's a bit of overhead, so I'm actually not sure it makes sense to pre-populate in all situations, but if we know we'll be accessing the whole buffer, it makes sense
<bbrezillon> but, at the same time, if we just want to access a sub-portion of the buffer, we'd rather extend panfrost_bo_mmap() to take an offset and a size, so we don't mmap() the whole thing
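The two ideas above (pre-faulting with MAP_POPULATE, and mapping only a sub-range) can be sketched roughly as follows. This is not the real panfrost_bo_mmap(): `map_range()`, `unmap_range()` and `struct mapped_range` are hypothetical names invented for illustration, and an ordinary file descriptor stands in for a BO's mmap handle. The only calls relied on are the standard Linux `mmap()`, `munmap()` and `sysconf()`.

```c
#define _GNU_SOURCE /* for MAP_POPULATE on Linux */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapping (not a Mesa structure). */
struct mapped_range {
    void *base;    /* page-aligned address returned by mmap() */
    size_t length; /* length passed to mmap(), needed again for munmap() */
    uint8_t *ptr;  /* address of the requested offset inside the mapping */
};

/* Map bytes [offset, offset + size) of fd. mmap() requires a page-aligned
 * file offset, so the offset is rounded down and the slack is added to the
 * length. With populate set, MAP_POPULATE asks the kernel to pre-fault the
 * pages instead of taking one minor fault per page on first access. */
static bool
map_range(int fd, off_t offset, size_t size, bool populate,
          struct mapped_range *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || size == 0)
        return false;

    off_t aligned = offset - (offset % page);
    size_t slack = (size_t)(offset - aligned);
    int flags = MAP_SHARED | (populate ? MAP_POPULATE : 0);

    void *base = mmap(NULL, size + slack, PROT_READ | PROT_WRITE,
                      flags, fd, aligned);
    if (base == MAP_FAILED) {
        perror("mmap");
        return false;
    }

    out->base = base;
    out->length = size + slack;
    out->ptr = (uint8_t *)base + slack;
    return true;
}

static void
unmap_range(struct mapped_range *range)
{
    int ret = munmap(range->base, range->length);
    assert(ret == 0);
    (void)ret;
}
```

Whether pre-populating pays off depends on how much of the mapping is actually touched, which is the trade-off described above: a caller that reads the whole buffer would pass `populate = true`, while one that only patches a few words might map a small range without it.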
tolszak has joined #panfrost
narmstrong has quit [Read error: Connection reset by peer]
narmstrong has joined #panfrost
rasterman has joined #panfrost
jernej has quit [Server closed connection]
jernej_ has joined #panfrost
pjakobsson has joined #panfrost
Stary has quit [Server closed connection]
Stary has joined #panfrost
mriesch has quit [Server closed connection]
mriesch has joined #panfrost
br has quit [Server closed connection]
br has joined #panfrost
megi1 has quit [Server closed connection]
megi1 has joined #panfrost
CounterPillow_ has quit [Server closed connection]
CounterPillow has joined #panfrost
nlhowell has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
camus1 has joined #panfrost
robmur01 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
nlhowell is now known as Guest2245
nlhowell has joined #panfrost
Guest2245 has quit [Ping timeout: 480 seconds]
JulianGro has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
tolszak has quit [Ping timeout: 480 seconds]
tjcorley has quit [Ping timeout: 480 seconds]
tjcorley has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
<alyssa> Hmm will I add a new feature to the Midgard compiler, for the sake of deleting a bunch of code for a Valhall Vulkan driver
<alyssa> L e g a cy
<anarsoul> hehe
<alyssa> gl_FragCoord is a varying on midgard, this is inconvenient for blit shaders
<alyssa> but I think there's a LD_SPECIAL selector that behaves like the Valhall preloaded fragment position
<anarsoul> it's a real varying?
<alyssa> iSH
<alyssa> *Ish
<alyssa> Oh, hum
<anarsoul> sounds like a rudiment from utgard to me :)
<alyssa> jekstrand: The gl_FragCoord vs varying thing for blit shaders is a bit more complicated
<alyssa> For scaling blits, varying interp abstracts away the src vs dst coords
<anarsoul> gl_FragCoord is loaded using load varying instruction on utgard, but it's not a real varying (i.e. no need to allocate space for it in varying stream)
<alyssa> yep, same as midgard then
<alyssa> jekstrand: So would need to consider the cost of changing coordinate systems if the varying is replaced by gl_FragCoord
<alyssa> and would still have the f2i conversion
<jekstrand> sure
<alyssa> Admittedly I am not sure how a pile of ALU compares to a varying load
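The "pile of ALU" mentioned here can be pictured with a small CPU-side sketch. It assumes a plain axis-aligned scaling blit and models what a blit shader would have to compute if it started from gl_FragCoord instead of an interpolated varying: a change of coordinate system from destination pixels to source texels, followed by the float-to-int conversion. `struct blit_rect`, `struct texel` and `frag_coord_to_src_texel()` are hypothetical names, not Panfrost code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rectangle in pixel units: [x0, x1) x [y0, y1). */
struct blit_rect {
    float x0, y0, x1, y1;
};

struct texel {
    int32_t x, y;
};

/* Convert a fragment position (pixel centre, e.g. 64.5) in the destination
 * rectangle into an integer texel coordinate in the source rectangle.
 * This is the extra arithmetic that varying interpolation would otherwise
 * hide: one subtract, one multiply and one add per axis, plus the f2i. */
static struct texel
frag_coord_to_src_texel(float frag_x, float frag_y,
                        struct blit_rect dst, struct blit_rect src)
{
    assert(dst.x1 != dst.x0 && dst.y1 != dst.y0);

    /* The scale factors are constant per blit, so a real shader could
     * receive them precomputed (an assumption, not something the log states). */
    float scale_x = (src.x1 - src.x0) / (dst.x1 - dst.x0);
    float scale_y = (src.y1 - src.y0) / (dst.y1 - dst.y0);

    float fx = src.x0 + (frag_x - dst.x0) * scale_x;
    float fy = src.y0 + (frag_y - dst.y0) * scale_y;

    /* C casts truncate toward zero; this matches floor() only for
     * non-negative coordinates, which is all this sketch handles. */
    struct texel t = { (int32_t)fx, (int32_t)fy };
    return t;
}
```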
<jekstrand> alyssa: Thinking about how one would do a vkCmdDrawFill()... I don't know if you could have "regular" varyings at that point.
<alyssa> (Noting that the varying load doesn't really do memory access, I mean, it's a quad it'll be 100% cache hit rate)
<jekstrand> per-instance attributes would be fine, I guess. And push constants or uniforms.
<jekstrand> But not per-vertex attributes since there are no vertices.
* alyssa downloads vulkan spec
<jekstrand> alyssa: I expect that memorized by tomorrow. :P
<alyssa> yOu'Re nOt tHe bOsS oF mE
<alyssa> <daniels> alyssa: I expect that memorized by tomorrow.
<jekstrand> hehe
<alyssa> I don't see that command
<alyssa> oh, vkCmdFillBuffer?
<jekstrand> There is no such thing
<alyssa> that writes to a buffer so it'll use a compute shader
<alyssa> ?
<jekstrand> Thinking about our discussion from yesterday about a special draw that can use the tile load shader
<daniels> alyssa: *memorised
<alyssa> daniels: caught red handed
<alyssa> jekstrand: right, ok
<jekstrand> alyssa: At that poing, you pretty much have to use gl_FragCoord and math
<jekstrand> *point
<alyssa> I don't see the problem?
<alyssa> Maybe I'm not understanding
<jekstrand> Maybe I don't understand how those shaders work?
<alyssa> Ordinarily, we specify
<alyssa> { vertex shader, vertex attributes, varying buffers } + { fragment shader, varying buffers }
<alyssa> varying buffers in real memory
<alyssa> For pre-frame shaders (and full screen jobs on v9), we specify only
<alyssa> { fragment shader, varying buffers }
<alyssa> where the vertices are implicitly (0, 0), (width, 0), (0, height), (width, height)
<jekstrand> Right, so you can drop some stuff in those varying buffers
<alyssa> right
<jekstrand> But where do the interpolants come from?
<alyssa> interpolants?
<jekstrand> barycentrics
<jekstrand> I presume those varying buffers don't have a value per-pixel. They have to be interpolated somehow
<alyssa> barys are internal to the hw
<alyssa> the shader just does a LD_VARY
<alyssa> (with the sample mode and the varying index)
<jekstrand> So how does that work with full-screen draws? Barys are relative to the primitive.
<jekstrand> Or is it relative to a triangle that covers the upper-left or something like that?
<alyssa> Exactly the same as a GL_QUAD with the four corners of the screen
<jekstrand> Ok. Makes sense
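As a conceptual model of the exchange above: with the implicit corners (0, 0), (width, 0), (0, height), (width, height), a varying read through LD_VARY behaves as if the four per-corner values from the varying buffer were interpolated across the screen. The sketch below uses bilinear interpolation as a stand-in; how the hardware actually computes it is internal and not described in this log. `struct vec2`, `lerp2()` and `interp_fullscreen()` are hypothetical names.

```c
#include <assert.h>

struct vec2 {
    float x, y;
};

static struct vec2
lerp2(struct vec2 a, struct vec2 b, float t)
{
    struct vec2 r = { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
    return r;
}

/* corners[] holds one varying value per implicit vertex, in the order
 * (0, 0), (width, 0), (0, height), (width, height). Returns the value a
 * fragment at position (px, py) would see, under the bilinear assumption. */
static struct vec2
interp_fullscreen(const struct vec2 corners[4], float px, float py,
                  float width, float height)
{
    assert(width > 0.0f && height > 0.0f);

    float u = px / width;
    float v = py / height;

    struct vec2 top = lerp2(corners[0], corners[1], u);
    struct vec2 bottom = lerp2(corners[2], corners[3], u);
    return lerp2(top, bottom, v);
}
```

For a scaling blit, putting the source rectangle's corners in the varying buffer makes each fragment receive its source coordinate directly, which is the sense in which the varying abstracts away source versus destination coordinates.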
<jekstrand> On Intel with RECTLIST, it's 3 vertices that get auto-extended out to a rectangle and you end up with barys outside [0, 1]
<jekstrand> The other trick is, without running a VS, I'm not sure how you would specify the varying buffers through the Vulkan API. That seems like something where the format would be very mali-specific.
<alyssa> Yeah
<alyssa> I'm not trying to make this nice and Vulkan generic or anything
<alyssa> at this point -- just trying to do something sane for Mali alone, for both GL and VK
<alyssa> bbrezillon: has already done the hard parts ^^
<alyssa> just need to get it closer to what the hardware wants
<alyssa> hey uh what does blorp stand for
<alyssa> "pan_blit" is not a good prefix if this grows clear/copy/mipmap functionality
<alyssa> blahli
<alyssa> bland
<jekstrand> Blit Or Resolve Pass, originally. These days, it's Blit Over Render Pipe
<jekstrand> I go back and forth a bit on whether everyone should build their own BLORP or we should build a u_blitter for Vulkan.
<alyssa> forget vulkan, the u_blitter we have for Gallium is inefficient on Mali
<alyssa> and doesn't work for tilebuffer preloads (...I tried... it sucked hard...) so you still grow your own blitter
<jekstrand> :(
<jekstrand> bummer
<anarsoul> alyssa: why is it inefficient? (except redundant vertex stage)
<alyssa> anarsoul: that
* anarsoul still needs to implement blitter for lima to get msaa working, u_blitter cannot do msaa resolve for utgard :(
<alyssa> and on bifrost+, also redundant tiling stage
<alyssa> and no good way to coalesce blits for mipmap gen
<anarsoul> and how much is overhead?
<alyssa> \shrug/
<alyssa> Doesn't really matter now, the code is there and works and needs to be there for panvk
<alyssa> and improving it more is easier than doing a vk_blitter that works less... well?
dhewg has joined #panfrost
rtp has joined #panfrost
HdkR has joined #panfrost
tolszak has joined #panfrost
skl131313 has joined #panfrost
skl131313_ has joined #panfrost
skl131313_ has quit [Remote host closed the connection]
tjcorley has quit [Ping timeout: 480 seconds]
fahien has joined #panfrost
tjcorley has joined #panfrost
psydroid has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
evx256 has joined #panfrost
megi1 has quit []