ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
chrisf has joined #panfrost
soreau has quit [Read error: Connection reset by peer]
soreau has joined #panfrost
vstehle has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
mmind00 has quit [Server closed connection]
mmind00 has joined #panfrost
tolszak has joined #panfrost
chrisf has quit [Remote host closed the connection]
tolszak has quit [Ping timeout: 480 seconds]
sigmaris has quit [Server closed connection]
sigmaris has joined #panfrost
tomeu829 has quit []
tomeu has joined #panfrost
<tomeu>
bbrezillon: just one idea about the surprising perf results: could it be related to out-of-order writes to WC buffers flushing caches and thus making unrelated operations slower?
stepri01 has quit [Server closed connection]
stepri01 has joined #panfrost
<bbrezillon>
tomeu: I'd expect the memcpy to be in-order, but the address relocation definitely looks more random, though we do try to patch addresses in the order they appear in the descriptor
<bbrezillon>
but why would writes to a WC buffer (AKA uncached on ARM) mess with the caches?
evx256 has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #panfrost
<bbrezillon>
tomeu: what's true though, is the fact that the memcpy will trash the cache, and we then patch addresses in a second pass. So it's like an extra load compared to the SW implementation, which reads the vk_cmd_queue_entry once and emits all the descs that result from this cmd.
<tomeu>
and btw, regarding the unexpected page fault: somebody reported in LKML that they were getting a surprising amount of minor faults with panfrost vs. libmali
<tomeu>
wonder if the two issues could be related
<tomeu>
and one more btw: could it be worth checking that the memcpy happens sequentially?
<bbrezillon>
tomeu: the BO page fault thing, I addressed with MAP_POPULATE in the mmap() we have in panfrost_bo_mmap()
<bbrezillon>
and it indeed helped reduce the diff between the 2 implems
<tomeu>
ah, very cool
<bbrezillon>
it definitely makes sense to pre-populate the page table since pages are already pinned and mapped GPU-side
<bbrezillon>
well, there's a bit of overhead, so I'm actually not sure it makes sense to pre-populate in all situations, but if we know we'll be accessing the whole buffer, it makes sense
<bbrezillon>
but, at the same time, if we just want to access a sub-portion of the buffer, we'd rather extend panfrost_bo_mmap() to take an offset and a size, so we don't mmap() the whole thing
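[Editor's sketch] The trade-off discussed above (pre-faulting a mapping versus mapping only a sub-range) can be illustrated with a small model. This is not the Mesa code: panfrost_bo_mmap() is C, and the helper map_range() below, its parameters, and the temporary file standing in for a buffer object are all assumptions made for illustration. MAP_POPULATE is Linux-specific and only exposed by Python 3.10+, so the sketch falls back to a no-op flag elsewhere.

```python
import mmap
import os
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY
# MAP_POPULATE is Linux-specific and only exposed by Python 3.10+; 0 is a no-op fallback.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


def map_range(fd, offset, size, populate=False):
    """Map bytes [offset, offset + size) of fd (hypothetical helper).

    Returns (mapping, delta). The requested bytes start at mapping[delta],
    because the offset passed to mmap must be page-aligned.
    """
    aligned = offset - (offset % PAGE)
    delta = offset - aligned
    flags = mmap.MAP_SHARED | (MAP_POPULATE if populate else 0)
    mapping = mmap.mmap(fd, size + delta, flags=flags,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=aligned)
    return mapping, delta


def demo():
    """Write through a sub-range mapping of a temp file and read it back."""
    with tempfile.TemporaryFile() as f:
        f.truncate(4 * PAGE)  # stand-in for a 4-page buffer object
        fd = f.fileno()
        mapping, delta = map_range(fd, PAGE + 16, 8, populate=True)
        mapping[delta:delta + 8] = b"panfrost"
        mapping.flush()
        mapped_len = len(mapping)
        mapping.close()
        data = os.pread(fd, 8, PAGE + 16)
    return data, mapped_len
```

Calling demo() maps only the bytes that are needed (plus the alignment slack) instead of all four pages, and asks the kernel to populate the page table up front so the first access does not take a minor fault.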
tolszak has joined #panfrost
narmstrong has quit [Read error: Connection reset by peer]
narmstrong has joined #panfrost
rasterman has joined #panfrost
jernej has quit [Server closed connection]
jernej_ has joined #panfrost
pjakobsson has joined #panfrost
Stary has quit [Server closed connection]
Stary has joined #panfrost
mriesch has quit [Server closed connection]
mriesch has joined #panfrost
br has quit [Server closed connection]
br has joined #panfrost
megi1 has quit [Server closed connection]
megi1 has joined #panfrost
CounterPillow_ has quit [Server closed connection]
CounterPillow has joined #panfrost
nlhowell has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
camus1 has joined #panfrost
robmur01 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
nlhowell has quit [Ping timeout: 480 seconds]
nlhowell has joined #panfrost
nlhowell is now known as Guest2245
nlhowell has joined #panfrost
Guest2245 has quit [Ping timeout: 480 seconds]
JulianGro has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
tolszak has quit [Ping timeout: 480 seconds]
tjcorley has quit [Ping timeout: 480 seconds]
tjcorley has joined #panfrost
MajorBiscuit has quit [Ping timeout: 480 seconds]
<alyssa>
Hmm will I add a new feature to the Midgard compiler, for the sake of deleting a bunch of code for a Valhall Vulkan driver
<alyssa>
L e g a cy
<anarsoul>
hehe
<alyssa>
gl_FragCoord is a varying on midgard, this is inconvenient for blit shaders
<alyssa>
but I think there's a LD_SPECIAL selector that behaves like the Valhall preloaded fragment position
<anarsoul>
it's a real varying?
<alyssa>
iSH
<alyssa>
*Ish
<alyssa>
Oh, hum
<anarsoul>
sounds like a rudiment from utgard to me :)
<alyssa>
jekstrand: The gl_FragCoord vs varying thing for blit shaders is a bit more complicated
<alyssa>
For scaling blits, varying interp abstracts away the src vs dst coords
<anarsoul>
gl_FragCoord is loaded using load varying instruction on utgard, but it's not a real varying (i.e. no need to allocate space for it in varying stream)
<alyssa>
yep, same as midgard then
<alyssa>
jekstrand: So would need to consider the cost of changing coordinate systems if the varying is replaced by gl_FragCoord
<alyssa>
and would still have the f2i conversion
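[Editor's sketch] The "cost of changing coordinate systems" mentioned above is the ALU work needed to turn a destination fragment position into a source texel when no interpolated varying does it for you. A minimal model, assuming axis-aligned rectangles given as (x, y, width, height) and a fragment position sampled at the pixel centre; the function name and rectangle layout are hypothetical, not taken from the Panfrost blit shaders:

```python
import math


def blit_src_texel(frag_x, frag_y, dst_rect, src_rect):
    """Map a destination fragment centre to an integer source texel.

    Rectangles are (x, y, width, height) in pixels. The subtract,
    multiply and add are the extra ALU work; the floor models the
    f2i conversion that is needed either way.
    """
    dx, dy, dw, dh = dst_rect
    sx, sy, sw, sh = src_rect
    u = sx + (frag_x - dx) * (sw / dw)
    v = sy + (frag_y - dy) * (sh / dh)
    return math.floor(u), math.floor(v)
```

For a 2x downscale from an 8x8 source to a 4x4 destination, blit_src_texel(1.5, 1.5, (0, 0, 4, 4), (0, 0, 8, 8)) gives texel (3, 3). With a varying, the scale and offset are baked into the per-corner values and only the final conversion remains in the shader.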
<jekstrand>
sure
<alyssa>
Admittedly I am not sure how a pile of ALU compares to a varying load
<jekstrand>
alyssa: Thinking about how one would do a vkCmdDrawFill()... I don't know if you could have "regular" varyings at that point.
<alyssa>
(Noting that the varying load doesn't really do memory access, I mean, it's a quad it'll be 100% cache hit rate)
<jekstrand>
per-instance attributes would be fine, I guess. And push constants or uniforms.
<jekstrand>
But not per-vertex attributes since there are no vertices.
* alyssa
downloads vulkan spec
<jekstrand>
alyssa: I expect that memorized by tomorrow. :P
<alyssa>
yOu'Re nOt tHe bOsS oF mE
<alyssa>
<daniels> alyssa: I expect that memorized by tomorrow.
<jekstrand>
hehe
<alyssa>
I don't see that command
<alyssa>
oh, vkCmdFillBuffer?
<jekstrand>
There is no such thing
<alyssa>
that writes to a buffer so it'll use a compute shader
<alyssa>
?
<jekstrand>
Thinking about our discussion from yesterday about a special draw that can use the tile load shader
<daniels>
alyssa: *memorised
<alyssa>
daniels: caught red handed
<alyssa>
jekstrand: right, ok
<jekstrand>
alyssa: At that poing, you pretty much have to use gl_FragCoord and math
<jekstrand>
*point
<alyssa>
I don't see the problem?
<alyssa>
Maybe I'm not understanding
<jekstrand>
Maybe I don't understand how those shaders work?
<alyssa>
For pre-frame shaders (and full screen jobs on v9), we specify only
<alyssa>
{ fragment shader, varying buffers }
<alyssa>
where the vertices are implicitly (0, 0), (width, 0), (0, height), (width, height)
<jekstrand>
Right, so you can drop some stuff in those varying buffers
<alyssa>
right
<jekstrand>
But where do the interpolants come from?
<alyssa>
interpolants?
<jekstrand>
barycentrics
<jekstrand>
I presume those varying buffers don't have a value per-pixel. They have to be interpolated somehow
<alyssa>
barys are internal to the hw
<alyssa>
the shader just does a LD_VARY
<alyssa>
(with the sample mode and the varying index)
<jekstrand>
So how does that work with full-screen draws? Barys are relative to the primitive.
<jekstrand>
Or is it relative to a triangle that covers the upper-left or something like that?
<alyssa>
Exactly the same as a GL_QUAD with the four corners of the screen
<jekstrand>
Ok. Makes sense
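[Editor's sketch] What the fragment shader observes from a varying load on such a full-screen job can be modelled as interpolation between four per-corner values placed at (0, 0), (width, 0), (0, height), (width, height). The hardware does this internally; the bilinear formula below is only a model of the result and matches it for varyings that are linear in screen position (such as blit texture coordinates). The function name and argument order are assumptions.

```python
def interp_fullscreen(corners, x, y, width, height):
    """Interpolate a scalar varying at pixel position (x, y).

    corners holds the values at (0, 0), (width, 0), (0, height),
    (width, height), in that order.
    """
    c00, c10, c01, c11 = corners
    s = x / width
    t = y / height
    top = c00 + (c10 - c00) * s      # along the y = 0 edge
    bottom = c01 + (c11 - c01) * s   # along the y = height edge
    return top + (bottom - top) * t
```

For example, with a source x coordinate running from 0 to 8 across a 64-pixel-wide framebuffer, interp_fullscreen((0.0, 8.0, 0.0, 8.0), 16, 5, 64, 64) evaluates to 2.0.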
<jekstrand>
On Intel with RECTLIST, it's 3 vertices that get auto-extended out to a rectangle and you end up with barys outside [0, 1]
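[Editor's sketch] The remark about barycentrics leaving [0, 1] follows from plain geometry: if a rectangle is described by three vertices and the fourth corner is implied, that fourth corner lies outside the triangle the three vertices form. The vertex order used here is an assumption chosen for illustration, not a statement about Intel's actual RECTLIST convention.

```python
def barycentrics(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


# Three given corners of a 4x4 rectangle; the fourth is implied as b + c - a.
A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
D = (B[0] + C[0] - A[0], B[1] + C[1] - A[1])
```

Here barycentrics(D, A, B, C) is (-1.0, 1.0, 1.0): the weights still sum to one, but one of them is negative, so attribute interpolation extrapolates past the triangle to cover the rest of the rectangle.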
<jekstrand>
The other trick is, without running a VS, I'm not sure how you would specify the varying buffers through the Vulkan API. That seems like something where the format would be very mali-specific.
<alyssa>
Yeah
<alyssa>
I'm not trying to make this nice and Vulkan generic or anything
<alyssa>
at this point -- just trying to do something sane for Mali alone, for both GL and VK
<alyssa>
bbrezillon: has already done the hard parts ^^
<alyssa>
just need to get it closer to what the hardware wants
<alyssa>
hey uh what does blorp stand for
<alyssa>
"pan_blit" is not a good prefix if this grows clear/copy/mipmap functionality
<alyssa>
blahli
<alyssa>
bland
<jekstrand>
Blit Or Resolve Pass, originally. These days, it's Blit Over Render Pipe
<jekstrand>
I go back and forth a bit on whether everyone should build their own BLORP or we should build a u_blitter for Vulkan.
<alyssa>
forget vulkan, the u_blitter we have for Gallium is inefficient on Mali
<alyssa>
and doesn't work for tilebuffer preloads (...I tried... it sucked hard...) so you still grow your own blitter
<jekstrand>
:(
<jekstrand>
bummer
<anarsoul>
alyssa: why is it inefficient? (except redundant vertex stage)
<alyssa>
anarsoul: that
* anarsoul
still needs to implement blitter for lima to get msaa working, u_blitter cannot do msaa resolve for utgard :(
<alyssa>
and on bifrost+, also redundant tiling stage
<alyssa>
and no good way to coalesce blits for mipmap gen
<anarsoul>
and how much is overhead?
<alyssa>
\shrug/
<alyssa>
Doesn't really matter now, the code is there and works and needs to be there for panvk
<alyssa>
and improving it more is easier than doing a vk_blitter that works less.. well?
dhewg has joined #panfrost
rtp has joined #panfrost
HdkR has joined #panfrost
tolszak has joined #panfrost
skl131313 has joined #panfrost
skl131313_ has joined #panfrost
skl131313_ has quit [Remote host closed the connection]