ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs - <macc24> i have been here before it was popular
<jekstrand> bbrezillon: New panvk secondary command buffer patches:
<jekstrand> bbrezillon: I think maybe I actually like that. I may hate it in the morning, though. (-:
macc24 has quit [Ping timeout: 480 seconds]
macc24 has joined #panfrost
camus has joined #panfrost
camus1 has quit [Read error: Connection reset by peer]
JulianGro has quit [Remote host closed the connection]
vstehle has quit [Ping timeout: 480 seconds]
vstehle has joined #panfrost
Daanct12 has joined #panfrost
<tomeu> jekstrand: can't you just run the CTS in the same way as the CI?
Daanct12 has quit [Read error: Connection reset by peer]
Daanct12 has joined #panfrost
<Daanct12> looks like i was able to crash my pinephone pro
<Daanct12> i type `timedemo 1` in ioq3 console (latest git) and suddenly everything slows down
<Daanct12> but if i kill q3 process then panfrost throws up some error and then crash
<Daanct12> this is on sway though, not too sure if it happens on x11
<bbrezillon> jekstrand: just wanted to have all the descriptor emission code in panvk_[vX_]cs.{c,h}. If we're worried about the perf cost of this extra function call, I'd rather make all emit helpers inline functions defined in panvk_cs.h, because I kinda like this separation (panvk_vX_cmd_buffer.c is already quite big without those emit functions)
erlehmann has quit [Ping timeout: 480 seconds]
cphealy has quit [Ping timeout: 480 seconds]
pendingchaos has quit [Ping timeout: 480 seconds]
<jekstrand> tomeu: It only runs a tiny subset of tests
<jekstrand> tomeu: If I want to know how much we fixed, it won't tell me.
<tomeu> yeah, but in theory the ones that pass
<jekstrand> Yeah, but when adding new features, it doesn't say what more passes
<tomeu> if we pass more, we should extend coverage in CI
<tomeu> ah, right
<tomeu> for that we need full runs, but hopefully you don't need to do that often
<tomeu> jekstrand: you could easily send whole runs to CI and shard, but would be good to coordinate that so you don't bog down devices for too long
<tomeu> we have 7 vim3 boards you could use for that
<jekstrand> tomeu: If I'm actually hacking on panvk more than a few hours/week, I'll easily dominate 7 boards. :-/
<jekstrand> If doing local runs is a real problem, I may just have daniels or guy send me a few more. It's not like vim3s are expensive.
<tomeu> yeah, hopefully you won't spend too much time taking care of your farm
<tomeu> or if you want to add valhall support to panvk, you can get much more powerful boards with mediatek socs
<bbrezillon> tomeu: can I get one too ? :P
<daniels> bbrezillon: of course, what do you need?
<bbrezillon> I was just kidding. Don't know when I'll get back to panvk dev, and there's a lot to address on Bifrost before that
<tomeu> they are actually relatively cheap, but come with a screen attached, so not ideal for a farm on your desk :)
erlehmann has joined #panfrost
rasterman has joined #panfrost
MajorBiscuit has joined #panfrost
camus has quit [Remote host closed the connection]
camus has joined #panfrost
tjcorley has quit [Ping timeout: 480 seconds]
pendingchaos has joined #panfrost
<macc24> i also want a mt8192 machine definitely-for-development-i-promise
<macc24> xD
Rathann has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
erlehmann has joined #panfrost
JulianGro has joined #panfrost
camus1 has joined #panfrost
camus has quit [Read error: Connection reset by peer]
<bbrezillon> jekstrand: there's no generic EndCommandBuffer() implem anymore. I guess I should implement it directly in the primary command buffer implementation
<bbrezillon> I guess that'd work if we had the concept of cmd_dispatch_table, and we'd only force the beginning of the device dispatch table to the default cmd dispatchers
<bbrezillon> but even then, we'd need default implems for End/BeginCommandBuffer and all other cmd entrypoints that don't start with 'Cmd'
<bbrezillon> unless we consider those to not be cmd entrypoints, but I'm not convinced that's a good idea
Daanct12 has quit [Quit: Quit]
<bbrezillon> hm, actually it works if I move this vk_device_dispatch_table_from_entrypoints() at the beginning
<bbrezillon> if you're happy with this version, I'll update the MR
_99 has joined #panfrost
_99 has left #panfrost [#panfrost]
camus1 has quit []
cphealy has joined #panfrost
nlhowell has joined #panfrost
nlhowell is now known as Guest1705
nlhowell has joined #panfrost
Guest1705 has quit [Ping timeout: 480 seconds]
<jekstrand> bbrezillon: Yeah, I guess we'll need begin/end
<jekstrand> Woah! deqp-vk run finished in 10 hours! Must have gotten rid of a lot of crashes.
<jekstrand> bbrezillon: RE: allocation scope. Yeah, SCOPE_COMMAND is wrong. I suspect we want SCOPE_OBJECT or SCOPE_DEVICE
<jekstrand> VK_SYSTEM_ALLOCATION_SCOPE_OBJECT specifies that the allocation is scoped to the lifetime of the Vulkan object that is being created or used.
<jekstrand> So I think we want OBJECT
<bbrezillon> okay, seems to match my understanding then
<bbrezillon> and I realized panvk was missing a custom secondary_CmdBindDescriptorSets ...
<jekstrand> bbrezillon: Yeah, I've been thinking about that one.
<jekstrand> bbrezillon: I think what we want to do is add vk_device::[un]ref_pipeline_layout function pointers and say you have to reference count if you want to use command recording.
<jekstrand> Then we can implement all the manual stuff in src/vulkan/runtime/vk_cmd_enqueue_manual.c or something
<bbrezillon> sounds better than what I did in dozen...
<jekstrand> If we're going to have panvk, lavapipe, and dozen all using this thing, we may as well try to share the manual enqueue funcs.
pjakobsson has joined #panfrost
<jekstrand> bbrezillon: Yeah, pipeline and descriptor set layouts have annoying timelines. We reference count descriptor set layouts in ANV for $REASONS.
<jekstrand> s/timelines/lifetimes/
<bbrezillon> I ended up copying relevant data from set layouts to pipeline layout in dozen
pjakobsson_ has quit [Ping timeout: 480 seconds]
<bbrezillon> I think you were the one suggesting that :)
<bbrezillon> but even with that, it still makes the manual CmdBindDescriptorSets() kind of ugly/open-coded
<jekstrand> Why? Ref the pipeline layout and stuff everything in the struct. When you destroy, unref.
<bbrezillon> you mean the set layouts?
<jekstrand> BindDescriptorSets takes a pipeline layout
<bbrezillon> yeah, I was just digressing
<bbrezillon> since you mentioned the pipeline and set layout being uncorrelated
<bbrezillon> but sure, your solution works fine and avoids open-coding the helper in 3 drivers
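The ref/unref idea discussed above can be sketched in plain C. This is only an illustration under stated assumptions: `struct layout`, `layout_ref`, `layout_unref` and `struct bind_sets_cmd` are hypothetical names, not Mesa's API, and the counter is not atomic, which a real driver would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's pipeline layout object. */
struct layout {
    unsigned refcount;   /* not atomic: single-threaded sketch only */
    unsigned set_count;
};

static struct layout *layout_create(unsigned set_count)
{
    struct layout *l = calloc(1, sizeof(*l));
    if (!l)
        return NULL;
    l->refcount = 1;     /* the creator holds the first reference */
    l->set_count = set_count;
    return l;
}

static struct layout *layout_ref(struct layout *l)
{
    assert(l->refcount > 0);
    l->refcount++;
    return l;
}

/* Returns 1 when this call freed the object, 0 otherwise. */
static int layout_unref(struct layout *l)
{
    assert(l->refcount > 0);
    if (--l->refcount == 0) {
        free(l);
        return 1;
    }
    return 0;
}

/* Hypothetical recorded BindDescriptorSets command. */
struct bind_sets_cmd {
    struct layout *layout;
    unsigned first_set;
};

/* Recording takes a reference so the layout outlives the app's destroy call. */
static void bind_sets_cmd_init(struct bind_sets_cmd *cmd, struct layout *l,
                               unsigned first_set)
{
    cmd->layout = layout_ref(l);
    cmd->first_set = first_set;
}

/* Destroying the recorded command drops that reference again. */
static int bind_sets_cmd_finish(struct bind_sets_cmd *cmd)
{
    int freed = layout_unref(cmd->layout);
    cmd->layout = NULL;
    return freed;
}
```

With this shape, the application can drop its own reference while a recorded command still holds one; the layout is only freed when the command is finished.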
<jekstrand> bbrezillon: I've not read all your comments yet. Been doing e-mail and chat back-log so far. Do you want me to keep going on secondaries and try to get it actually working and passing non-trivial tests? Or did you want to run with it some more?
<bbrezillon> jekstrand: I had it passing the basic tests
<bbrezillon> at least on panfrost
erlehmann has quit [Ping timeout: 480 seconds]
pjakobsson has quit [Remote host closed the connection]
<jekstrand> bbrezillon: Old version or patches on mine?
<bbrezillon> yours
<bbrezillon> triggered a lavapipe CI run
<jekstrand> \o/
<bbrezillon> to see if we regress things when transitioning to cmd_queue wrappers
<jekstrand> Thanks!
<jekstrand> Of course!
erlehmann has joined #panfrost
<jekstrand> Where'd you push the updated branch? I'll take a look.
<jekstrand> I've got a pretty awesome lavapipe testing machine. :D
<jekstrand> cool. I'll pull and debug once I read some IMG comments.
<jekstrand> And... my vim3 died :(
Rathann has quit [Quit: Leaving]
<tomeu> my vim3 died as well, the odroid n2 seems more reliable
<macc24> imagine having your computers break before they become obsolete this statement was sponsored by business laptop fans
<bbrezillon> jekstrand: hm, I was looking at the descriptor_set_layout refcount logic, and it looks like the allocator passed to vkDestroyDescriptorSetLayout is ignored. Isn't that a problem if the caller uses an allocator that's different from the device allocator?
<jekstrand> It means they always get the device allocator
<jekstrand> Which sucks but it is what it is
<bbrezillon> ah, right, you pass the device allocator in the create path
davidlt has joined #panfrost
erlehmann has quit [Ping timeout: 480 seconds]
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #panfrost
robmur01 has quit [Quit: Leaving]
MajorBiscuit has quit [Quit: WeeChat 3.4]
* jekstrand plugs in serial in hopes of figuring out why his VIM3 is dying
<jekstrand> panfrost kernel bugs. :(
<HdkR> Welcome to the kernel bug party \o/
<bbrezillon> jekstrand: ok, so there's one bug remaining in lvp after the transition to vk_cmd_queue helpers
<jekstrand> bbrezillon: Oh, can you throw me the latest branch?
<jekstrand> bbrezillon: I'm looking at an old one and it's blowing up bad
<bbrezillon> already pushed
<jekstrand> kk
<jekstrand> bbrezillon: What's the bug?
<jekstrand> bbrezillon: I'm seeing it blow up on PIPELINE_BARRIER2 right now
<bbrezillon> uh, apparently the last version introduced new bugs :)
<bbrezillon> but I think it all resolves to the same issue
<bbrezillon> vk_common_Xxx -> Xxx2 wrappers
<jekstrand> yup
<jekstrand> How did lavapipe handle that before?
<bbrezillon> we don't know what lvp implements in lvp_execute_cmd_buffer()
<daniels> zmike studiously avoiding this channel ;)
<HdkR> Just make panvk good enough to run typical x86 games and he'll be forced to come over :P
<daniels> HdkR: but how could you possibly run x86-64 games on aarch64 ... ?
<HdkR> I hear there is a Fosdem talk from a little known project that sheds some light on this
<jekstrand> bbrezillon: What I don't get is how it avoided generating those commands before.
<bbrezillon> it just had lvp_CmdXxx auto-generated with vk_commands_gen.py
<bbrezillon> so the core wasn't overloading the CmdXxx implementation with its own wrapper
<jekstrand> bbrezillon: I think I'm seeing what's going on mayb
<jekstrand> *maybe
<jekstrand> and might have a plan
<bbrezillon> guess we could pass a handler table where each entry is a function that takes a cmd_entry and a void pointer, and then use that table for both filling the device dispatch table, and automating lvp_execute_cmd_buffer() a bit
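A handler table along the lines suggested above might look roughly like this. It is a sketch only: `enum cmd_type`, `struct cmd_entry`, `cmd_handler` and `execute_cmds` are made-up names for illustration and do not correspond to the real vk_cmd_queue structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command identifiers; a real queue has many more. */
enum cmd_type {
    CMD_DRAW,
    CMD_DISPATCH,
    CMD_PIPELINE_BARRIER2,
    CMD_TYPE_COUNT
};

/* Hypothetical recorded command. */
struct cmd_entry {
    enum cmd_type type;
    unsigned arg;
};

/* Each handler takes a recorded entry and an opaque driver pointer. */
typedef void (*cmd_handler)(const struct cmd_entry *entry, void *data);

struct exec_state {
    unsigned draws;
    unsigned dispatches;
};

static void handle_draw(const struct cmd_entry *entry, void *data)
{
    ((struct exec_state *)data)->draws += entry->arg;
}

static void handle_dispatch(const struct cmd_entry *entry, void *data)
{
    ((struct exec_state *)data)->dispatches += entry->arg;
}

/* One table describes what the driver implements. A NULL slot means the
 * driver has no handler for that command. */
static const cmd_handler handlers[CMD_TYPE_COUNT] = {
    [CMD_DRAW] = handle_draw,
    [CMD_DISPATCH] = handle_dispatch,
};

/* Replays recorded commands; returns how many had no handler. */
static size_t execute_cmds(const struct cmd_entry *cmds, size_t count,
                           void *data)
{
    size_t unhandled = 0;

    for (size_t i = 0; i < count; i++) {
        cmd_handler h = handlers[cmds[i].type];
        if (h)
            h(&cmds[i], data);
        else
            unhandled++;
    }
    return unhandled;
}
```

The same table could, in principle, also be consulted when filling the dispatch table, so that only commands with a non-NULL handler get an enqueue entrypoint.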
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> bbrezillon: pushed my branch. Should work now.
<jekstrand> bbrezillon: I'm going to start moving hand-written wrappers
<bbrezillon> jekstrand: panvk/exec-cmd-if-second ?
<jekstrand> bbrezillon: yup
<bbrezillon> I only see my changes there
<jekstrand> bbrezillon: I squashed things
<bbrezillon> mind pointing to the relevant commit?
<jekstrand> look at the lavapipe commit
<jekstrand> it's got a new function in there with an allowlist
<bbrezillon> ok, so that's done manually
<bbrezillon> got it
<jekstrand> I don't see another way
<jekstrand> Also, we've been fighting the auto-generation in lavapipe for a long time because of stuff like that
<jekstrand> It can now start using 2 wrappers if we have a list.
pendingchaos has quit [Remote host closed the connection]
pendingchaos has joined #panfrost
<bbrezillon> me neither, but I thought you had a brilliant idea :)
<jekstrand> Nope. I just did the typing. :)
<bbrezillon> okay, I'll respin with those changes and push that tomorrow then
<jekstrand> I'll send out an MR with some of the changes and the lavapipe stuff before EOD
<jekstrand> I'm trying to move some lavapipe stuff into common code now for hand-written enqueue funcs
<bbrezillon> great!
<bbrezillon> Cc me on the MR and I'll review it
<jekstrand> cool
pendingchaos_ has joined #panfrost
pendingchaos has quit [Read error: Connection reset by peer]
pendingchaos_ is now known as pendingchaos
erlehmann has joined #panfrost
cphealy has quit []
davidlt has quit [Ping timeout: 480 seconds]
cphealy has joined #panfrost
jernej has quit [Remote host closed the connection]
jernej has joined #panfrost
<jekstrand> bbrezillon: CI is running now.
<jekstrand> bbrezillon: I've also re-pushed my panvk/exec-cmd-if-second branch on top of that.
<jekstrand> bbrezillon: I've not done CmdBindDescriptorSets yet
tanty has quit []
tanty has joined #panfrost
tanty has quit []
tanty has joined #panfrost
<jekstrand> bbrezillon: Ok, I should have BindDescriptorSets now. Only compile-tested but should roughly work.
nlhowell has quit [Ping timeout: 480 seconds]
<icecream95> *phew*, using ioctl and mmap in Rust is a lot harder than in C.. hopefully I won't need to do any more of those
<icecream95> alyssa: Fittingly, commit afbc24a234e in 22.0 is one of your Panfrost commits :)
<icecream95> It wasn't AFBC related, though..
rasterman has joined #panfrost
<anarsoul> icecream95: unsafe {} ?
<icecream95> anarsoul: I've got it working now, I don't need suggestions. I did end up with four unsafe blocks in total..
<anarsoul> hope your unsafe blocks are safe :)
<icecream95> They caused a few kernel Oopses before I fixed the bugs there ;)
rasterman has quit [Quit: Gettin' stinky!]
<jekstrand> I think you're generally allowed to assume the kernel is safe in rust. :)
<jekstrand> Does layered rendering really require geometry shaders on Mali?
<jekstrand> I'm seeing pan_prepare_rt assert that `last_layer == first_layer`
<icecream95> jekstrand: There is no HW support for geometry shaders..
<icecream95> How does layered rendering work?
<icecream95> (From the API point of view)
<jekstrand> On some hardware, you can output a layer ID from the vertex shader
<jekstrand> In GL, I think it requires outputting it from the geometry shader unless you have an extension
<jekstrand> Trying to figure out how it's all linked in Vulkan ATM
<icecream95> One way would be to store a position for every layer, but make it e.g. all zeroes for inactive layers
<icecream95> Then you can run the tiler job for each layer
<icecream95> Another way is to use the same tiler heap in multiple fragment jobs, and check against the layer ID in the fragment shader
<jekstrand> You don't want to broadcast a primitive across all layers. I want to select the layer from the VS
<icecream95> So my suggestion is to write the position for one layer, but every other layer sets it to zero and the tiler removes the primitive
<jekstrand> You only render to one layer at a time
<jekstrand> Well, you have them all bound but each primitive only goes to one
<jekstrand> I guess if shaderOutputLayer isn't supported and you don't support GS, layered rendering is a no-op and everything goes to layer 0.
<jekstrand> Kind-of weird that it's allowed at all in that case, though.
<icecream95> Usually vertex shaders write a single output position, but it should be possible to have multiple outputs, one for each layer
<jekstrand> I don't want one for each layer
<jekstrand> I only want one output position
<jekstrand> I just need to be able to direct it to a particular layer
<icecream95> ..but if you don't write to the others then the tiler will read uninitialised data for the vertex
<icecream95> (So if you memset the BO to 0 then you don't have to write to the others)
<jekstrand> :-/
<jekstrand> On a tiler, maybe you want one vertex data BO per layer?
<jekstrand> That seems kinda crazy
<icecream95> (I don't know how the blob does this, if it supports the feature; this is just thinking what could work from my knowledge of the tiler unit)
<jekstrand> That's fair
<icecream95> "vertex data BO per layer". Which vertex data? Position data can be separated from varyings, so the draw descriptors for each layer would share the varyings, but use a different buffer for positions
<icecream95> But a third way of implementing it would be: Allocate an index buffer per layer, and in the vertex shader only add the vertex to the index buffer corresponding to the layer, so that the tiler doesn't see the other vertices. This cuts down on memory bandwidth, because you only have to write up to four bytes per layer, rather than 16 for a position
<jekstrand> yeah
<icecream95> Well.. that's easy for drawArrays, but when you already have an index buffer it could get tricky
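The index-buffer-per-layer idea for the drawArrays case can be modelled on the CPU as follows. This is a sketch, not how any driver or the hardware does it: `bin_triangles_by_layer` and `struct layer_indices` are hypothetical, and taking a triangle's layer from its first vertex is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 4
#define MAX_INDICES 64

/* One small index buffer per layer (4 bytes per index). */
struct layer_indices {
    uint32_t idx[MAX_INDICES];
    size_t count;
};

/* Bins non-indexed triangles into per-layer index buffers.
 * vertex_layer[i] is the layer the (hypothetical) vertex shader chose for
 * vertex i. The layer of a triangle is taken from its first vertex, which
 * is an assumption of this sketch. Returns the number of triangles binned. */
static size_t bin_triangles_by_layer(const uint8_t *vertex_layer,
                                     size_t vertex_count,
                                     struct layer_indices out[MAX_LAYERS])
{
    size_t binned = 0;

    for (size_t v = 0; v + 2 < vertex_count; v += 3) {
        uint8_t layer = vertex_layer[v];
        if (layer >= MAX_LAYERS)
            continue;               /* out-of-range layer: drop primitive */

        struct layer_indices *dst = &out[layer];
        if (dst->count + 3 > MAX_INDICES)
            continue;               /* buffer full: drop primitive */

        for (size_t k = 0; k < 3; k++)
            dst->idx[dst->count++] = (uint32_t)(v + k);
        binned++;
    }
    return binned;
}
```

Each layer's tiler job would then consume only its own index buffer, so vertices routed to other layers are never seen by it.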
<icecream95> (The blob does not expose the feature, which is not a surprise)
<jekstrand> the blob doesn't expose geometry shaders?
<icecream95> It does, but not multiViewport
<jekstrand> This is different from multiviewport
<jekstrand> multiviewport just lets you have a different viewport per layer. The base GS feature adds layered rendering.
<icecream95> does?
<jekstrand> At least in GL
<jekstrand> Vulkan also ties them together weirdly
<icecream95> Oh.. even if the shaderOutputLayer feature is not enabled, then it can be used from geometry shaders
<jekstrand> Yes
<jekstrand> It's very strange
<jekstrand> shaderOutputLayer really means shaderOtherThanGeometryOutputLayer
<icecream95> Here's a fourth way of implementing the feature: Rewrite the fixed-function tiler in OpenCL, and then you can add whatever features you like to it. I've already written a software tiler in C that mostly works
<jekstrand> hehe. Sure. :)
<jekstrand> Does bifrost run with a subgroup size of 4 in the FS?
<jekstrand> I'm reading the nir_op_fddx code and that's the only thing that makes sense
<icecream95> jekstrand: The "Arm Mali GPU Datasheet" lists v6 Bifrost as having a "warp width" of 4, v7 having 8, and Valhall having 16
<icecream95> +CLPER.i32 on v7 supports subgroup sizes of 2, 4 and 8
<jekstrand> Ok. That explains the weird &ing then
<jekstrand> It uses 1 and 2 where I expected ~2 and ~1 but on a warp size of 4, it's the same.
<icecream95> Note the BI_SUBGROUP_SUBGROUP4 below
simon-perretta-img has joined #panfrost
<jekstrand> Ah
<jekstrand> Yeah, that makes sense
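The mask equivalence mentioned above is easy to check by brute force. The helper below is a hypothetical illustration of the arithmetic only, not code from the Bifrost compiler: it tests whether `lane & 1` matches `lane & ~2` and `lane & 2` matches `lane & ~1` for every lane ID below a given width.

```c
#include <assert.h>

/* Returns 1 if masking with 1 and 2 gives the same result as masking
 * with ~2 and ~1 for every lane ID in [0, width), 0 otherwise. */
static int masks_agree(unsigned width)
{
    assert(width >= 1);

    for (unsigned lane = 0; lane < width; lane++) {
        if ((lane & 1u) != (lane & ~2u))
            return 0;
        if ((lane & 2u) != (lane & ~1u))
            return 0;
    }
    return 1;
}
```

For a width of 4 the lane ID only has bits 0 and 1, so clearing one bit is the same as keeping the other and the check passes. For a width of 8 it fails, for example at lane 4, where `4 & 1` is 0 but `4 & ~2` is 4.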
<jekstrand> Ok, I think I know how to do coarse/fine on bifrost. Not sure how much anyone cares but it's easy so I'll type it tomorrow once my machine is freed up.