ChanServ changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
yuyichao_ has quit [Quit: Konversation terminated!]
<alyssa> ..part of me wants to support g13p just for the lulz
<alyssa> (A14 iPhones)
<alyssa> I'm under the impression iOS apps can talk to the AGXAccelerator IOKit service themselves (based on Project Zero reports I've read), making that technically possible,
<alyssa> though maybe it's sandboxed and just has a WebKit exception that's used in those exploit chains..
yuyichao has joined #asahi-gpu
yuyichao_ has joined #asahi-gpu
yuyichao has quit [Ping timeout: 480 seconds]
chipxxx has quit [Read error: Connection reset by peer]
kov has quit [Quit: Coyote finally caught me]
appleboy711[m] has joined #asahi-gpu
<appleboy711[m]> I wonder how the M2 GPU reverse engineering is going.
joske has joined #asahi-gpu
<joske> jannau: thanks! I can try it tonight after the kids are asleep 😉
SSJGZ has joined #asahi-gpu
bisko has joined #asahi-gpu
bisko has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
joske has quit [Ping timeout: 480 seconds]
<jannau> alyssa: not far off from my background in video/multimedia
<jannau> lina, alyssa: driver is stable enough for > 1h of video playback in firefox
<ar> is the "rebooting the gpu on every frame" hack still in place?
<jannau> yes, no updates since the last stream. The demo is working for XDC; don't expect updates until after
<ar> i'm honestly amazed it works so well
goldsoultheory has joined #asahi-gpu
chipxxx has joined #asahi-gpu
<lina> You mean as smooth as it does? Thank Apple's awesome PM ^^
<lina> It being stable is no surprise though, it's a great way to clear cached state
<lina> By the way, we have at least one other user running the crazy it-works branch. I told her to just make a laundry list of what apps are broken, so we can go through it at our leisure as we get to fixing things.
<Ella[m]> I'm going to be trying to get my Vulkan driver working on it next Tuesday
<lina> The uAPI needs a full revamp to really support Vulkan, but I guess you should be able to test basic rendering jobs as is? Just keep in mind the Z clipping is hardcoded to OpenGL mode right now...
<lina> You need to flip bit 0 to 0 for non-GL Z clip (I need to move that into the uAPI ASAP...)
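In other words, a non-GL client just clears that bit wherever the demo interface carries it. A minimal sketch, assuming a hypothetical flags word and bit name (the demo uAPI's real field is not shown in the log):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical name: per the discussion above, bit 0 of some render-command
     * flags word selects OpenGL-style Z clipping. */
    #define AGX_RENDER_ZCLIP_GL (1u << 0)

    static uint32_t agx_zclip_flags(uint32_t flags, bool gl)
    {
        /* set bit 0 for GL clipping, clear it for non-GL (Vulkan) clipping */
        return gl ? (flags | AGX_RENDER_ZCLIP_GL) : (flags & ~AGX_RENDER_ZCLIP_GL);
    }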
<lina> Ella[m]: Thank you for working on this! Having a Vulkan implementation (even if incomplete) is going to be super useful to make sure we don't mess up the real uAPI design ^^
<Sobek[m]> Is the uAPI of GPU drivers covered by the stability guarantees in Linux releases? Is introducing a new uAPI version, associated with a newer and better driver for the same hardware and coordinated changes in mesa, something that happens regularly?
<lina> Yes and not really, which is why it's important to get this right the first time!
<lina> It's also why I don't think this driver is going to be upstreamed any time soon, because rushing it makes no sense. I want to see GL 3.1 and OpenCL and at least some Vulkan running before we even attempt it.
<lina> But that's okay, it can live downstream for a while!
<lina> I had a long discussion with jekstrand and alyssa about uAPI design, so I have a good plan I think... but I really want to see it proven before we jump to upstreaming!
<lina> The current uAPI is just a demo, so all of that is going away entirely.
<lina> For now, the first step is to just move into the uAPI all the registers that GL actually needs to pass the CTS (and also that GL mode bit). Most of that render command design can carry over to the real uAPI progressively.
<lina> But the way the driver does memory management, queue allocation, waiting, and scheduling is going to be completely redesigned for the real uAPI
<Ella[m]> <lina> "https://github.com/AsahiLinux/..."; <- I'll probably just flip it in the kernel when I'm testing next week. I want to try and get some basic WSI stuff working but I shouldn't need OpenGL at all for that.
<lina> Sounds good! It's just Z clip so if it's wrong you'll just get messed up clipping, it won't otherwise break everything. When I tested GL mesa on non-GL mode, I was getting half a cube. Vulkan on GL mode is probably just the opposite, so it'll probably display fine as long as nothing is behind the camera?
<Ella[m]> probably
skoobasteeve has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
skoobasteeve has joined #asahi-gpu
skoobasteeve has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
skoobasteeve has joined #asahi-gpu
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
goldsoultheory has joined #asahi-gpu
joske_ has joined #asahi-gpu
<joske_> OMG it works!! Thx lina, thx jannau!!!!
<joske_> running gnome-shell --wayland from the console on 16GB M1 MBA
<joske_> booted from u-boot
<joske_> is there a trick to get lightdm to startup at boot?
<alyssa> lightdm will be broken
<jannau> there is afaik no display manager which doesn't use X
<joske_> oh, ok, no worries then
<daniels> gdm is Wayland-native
<joske_> oh and thx alyssa too!!!
<joske_> funny but in mate terminal (when running in wayland?) every time it gets focus, the window shrinks a bit :-D
joske_ has quit []
<alyssa> I do wonder how I managed to break glamor on this driver
<karolherbst> heh... I know there are some funny things going on in gnome where stuff depends on correct rendering, otherwise it just doesn't work...
<jannau> daniels: good to know
<alyssa> on Bifrost it was lack of register spilling that bit me, but AGX has twice as many registers available to a single thread, so that should be ok
<karolherbst> like in the past there were screwed-up colors for big endian systems, and besides just everything having the wrong color, the UI didn't work at all
<alyssa> lot of real control flow in the shaders, but we're passing everything relevant in deqp-gles2, so I don't think that's it; the control flow handling we have now is suboptimal but "obviously" correct
<karolherbst> famous last words
<alyssa> based on the corruption I saw in the images I'm more inclined to blame a GL driver bug than a compiler one
<jannau> resizing weston-terminal inside weston behaves strangely as well. The window border is not at the same place as the mouse cursor
<alyssa> oh, hhhh I wonder actually
<alyssa> wonder if modifiers are botched
<alyssa> obviously there are no DRM modifiers on macOS, so that code is all untested up until now
<daniels> karolherbst: mutter used to do window picking by rendering with solid colours and reading back, but not for a while now
<karolherbst> yeah.. maybe it's fixed now
<alyssa> daniels: sorry what
<alyssa> that's... novel...
<karolherbst> alyssa: well.. try to come up with _good_ reasons why rendering wrong colors makes the UI break like completely :P
<karolherbst> and with completely I mean: you can't do anything except looking at the screen
<alyssa> this is wrong
<alyssa> for debug I would "assert(whandle->offset == 0);" there to see if that's causing any issue
<daniels> alyssa: it’s actually less uncommon than you might think - or was at least
<alyssa> I don't think it's uncommon, I'd just like to know if that's causing people's problems ;)
<alyssa> this depends on (handle->layer == 0)
<alyssa> also seemingly missing "handle->format = rsrc->layout.format;" there
<alyssa> or maybe "handle->format = rsrc->base.format;" ... would need to think for a bit to figure out whether the difference matters
<alyssa> (I think no because you can't share depth/stencil targets across process boundaries so we should be ok)
DarkShadow44 has quit [Quit: ZNC - https://znc.in]
DarkShadow44 has joined #asahi-gpu
<alyssa> also seemingly missing "handle->size = rsrc->layout.size_B;"
<alyssa> missing assert(level == 0) here
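Collected, those suggestions might land roughly as follows; a sketch only, with trimmed stand-ins for gallium's winsys_handle and the driver's agx_resource rather than the real definitions:

    #include <assert.h>
    #include <stdint.h>

    /* Trimmed stand-ins, not the real gallium/driver structs. */
    struct winsys_handle {
        unsigned offset, layer;
        uint64_t format, size;
    };

    struct agx_resource {
        struct { uint64_t format, size_B; } layout;
    };

    static void
    agx_fill_handle(const struct agx_resource *rsrc, struct winsys_handle *handle,
                    unsigned level)
    {
        assert(handle->offset == 0); /* debug: flag nonzero-offset handles */
        assert(handle->layer == 0);  /* the code silently depends on layer 0 */
        assert(level == 0);          /* ...and on miplevel 0 */

        handle->format = rsrc->layout.format; /* or rsrc->base.format, see above */
        handle->size = rsrc->layout.size_B;
    }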
<alyssa> this loop is fundamentally bogus and will return modifiers that aren't actually valid
<alyssa> need to split out 2 functions:
<alyssa> bool agx_linear_allowed(const struct agx_resource *pres) {
<alyssa> return pres->base.target == PIPE_BUFFER || agx_is_2d(pres->base.target);
<alyssa> }
<alyssa> bool agx_twiddled_allowed(const struct agx_resource *pres) {
<alyssa> return true;
<alyssa> }
<alyssa> ok i guess the second function isn't needed
<alyssa> and then there are two distinct cases
<alyssa> the first is when WE select the modifier, in which case it's just the logic without the for loop on top
<alyssa> and at the end can assert
<alyssa> agx_linear_allowed || modifier != LINEAR
<alyssa> agx_twiddled_allowed || modifier != TWIDDLED
<alyssa> the second case is when the CLIENT selects the modifier, in which case we have to pick from that set conditional on those allowed checks
<alyssa> this if-stmt is way too late
<alyssa> agx_twiddled_allowed should then have
<alyssa> !(pres->base.bind & (PIPE_BIND_DISPLAY_TARGET | PIPE_BIND_SCANOUT | PIPE_BIND_LINEAR))
<alyssa> for PIPE_BIND_SHARED, we can and should twiddle
<alyssa> (DISPLAY_TARGET is mostly for software rasterizers but we can't delete it because reasons, SCANOUT is for buffers shared with the display/dcp, SHARED is for buffers shared across processes like window surfaces, LINEAR is an old hack that won't die)
<alyssa> this should only be for DISPLAY_TARGET, it can and will break for SCANOUT and SHARED which can't go through the displaytarget path
<alyssa> though I don't think that last part will actually matter because winsys will be NULL on linux
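Pulling the review together, here is a self-contained sketch of that two-case selection; the types, the MOD_TWIDDLED value, and the helper signatures are placeholders rather than mesa's real definitions:

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder definitions standing in for gallium/drm_fourcc. */
    enum pipe_texture_target { PIPE_BUFFER, PIPE_TEXTURE_2D, PIPE_TEXTURE_3D };
    #define PIPE_BIND_DISPLAY_TARGET (1u << 0)
    #define PIPE_BIND_SCANOUT        (1u << 1)
    #define PIPE_BIND_LINEAR         (1u << 2)
    #define MOD_LINEAR    0          /* DRM_FORMAT_MOD_LINEAR */
    #define MOD_TWIDDLED  1          /* hypothetical twiddled modifier */
    #define MOD_INVALID   UINT64_MAX /* stand-in for DRM_FORMAT_MOD_INVALID */

    struct agx_resource {
        enum pipe_texture_target target;
        unsigned bind;
    };

    static bool agx_is_2d(enum pipe_texture_target t) { return t == PIPE_TEXTURE_2D; }

    static bool agx_linear_allowed(const struct agx_resource *pres)
    {
        return pres->target == PIPE_BUFFER || agx_is_2d(pres->target);
    }

    static bool agx_twiddled_allowed(const struct agx_resource *pres)
    {
        /* Buffers tied to the display or explicitly linear can't be twiddled;
         * PIPE_BIND_SHARED is deliberately absent: shared surfaces should twiddle. */
        return !(pres->bind & (PIPE_BIND_DISPLAY_TARGET | PIPE_BIND_SCANOUT |
                               PIPE_BIND_LINEAR));
    }

    /* Case 1: the driver picks the modifier itself. */
    static uint64_t agx_select_modifier(const struct agx_resource *pres)
    {
        uint64_t mod = agx_twiddled_allowed(pres) ? MOD_TWIDDLED : MOD_LINEAR;
        assert(agx_linear_allowed(pres) || mod != MOD_LINEAR);
        assert(agx_twiddled_allowed(pres) || mod != MOD_TWIDDLED);
        return mod;
    }

    /* Case 2: the client supplies a list; pick only from what's allowed. */
    static uint64_t agx_select_modifier_from_list(const struct agx_resource *pres,
                                                  const uint64_t *mods, size_t n)
    {
        for (size_t i = 0; i < n; ++i) {
            if (mods[i] == MOD_TWIDDLED && agx_twiddled_allowed(pres))
                return mods[i];
            if (mods[i] == MOD_LINEAR && agx_linear_allowed(pres))
                return mods[i];
        }
        return MOD_INVALID;
    }

Twiddled is preferred whenever it's allowed; linear is only the fallback, and the asserts catch the impossible combination.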
<alyssa> ---
<alyssa> the modifiers thing is the most fundamental breakage here but yeah
<sven> Code review via irc? :D
<lina> I don't know what I'm doing with mesa anyway, I just copied and pasted bits of Panfrost code so I'm not surprised it's broken... ^^;;;
<alyssa> giggle
<alyssa> lina: also assuming the panfrost code is correct... brave
<karolherbst> what are those registers GL needs btw? Do you have some other ways of accessing that besides adding ioctls? Or is that mostly in the "context initialization" area?
<lina> They'd just be fields in the render struct. It's job data like everything else already there.
<lina> It's just that we're still figuring out what fields of the corresponding giant firmware render structures actually need to be exposed to userspace and which don't.
<karolherbst> mhhh, normally I'd expect that userspace just fills command buffers, sends them to the kernel driver and that's it. But it kind of sounds like that doesn't really exist here?
<lina> In this case we've been learning that some of those fields correspond directly to hardware registers (some of them fully or partially match PowerVR registers...), and since that ABI is dictated by hardware and therefore fixed, it's safe to expose to userspace directly.
<karolherbst> and it's more like filling actual structs with data to describe jobs, is that correct?
<lina> In this case the kernel fills command buffers because Apple decided to architect their firmware such that those command buffers have to be trusted.
<karolherbst> ahhh
<karolherbst> yeah okay, that's.. problematic
<lina> Well, it's not really problematic, it's just... annoying
<lina> Userspace still fills the actual draw/shader bind/etc command stream of course
<lina> It's just that the fixed-size job description structures need to be populated by the kernel, partially from userspace data
<karolherbst> the hardware I know is more flexible. E.g. on nvidia you can assign "classes" to contexts which have privileged access, which the kernel only allows for itself, and userspace can only allocate classes without special privileges
<karolherbst> okay
<lina> So we make up an arbitrary uABI for that to contain the data we need, and the kernel just copies it over to the firmware structs and fires the job
<karolherbst> yeah, that's not as terrible as I initially thought then
<karolherbst> so you can batch multiple render jobs into one submission and the kernel just sanitizes, submits and fences it
<alyssa> karolherbst: Nothing to sanitize
<lina> Depends on what you mean by "render job"
<alyssa> our submit ioctl has fields for the registers that userspace is allowed to set
<karolherbst> right.. I meant it more like error checking and filling out proper structs
<alyssa> (all render pass config)
<lina> alyssa: There are some things to sanitize (e.g. not exceeding max # of attachments), but it's fairly obvious stuff.
<karolherbst> lina: one shader invocation
<alyssa> lina: Oh. Delight.
<karolherbst> like "draw this shader pipeline"
<lina> karolherbst: Ah yes, you can do that as many times as you want.
<alyssa> Is that actually a security issue though? Why does kernel need to sanitize?
<lina> One job submission to the kernel = one batch of rendering to the same set of render targets
<alyssa> How is that different from any other invalid cmdbuf that userspace can prepare and that GPU will fault on
<lina> If you switch render targets, that does need a separate job
<lina> (Because this is a tiler)
<karolherbst> okay
<alyssa> lina: "one batch of rendering to the same set of render targets" is called a render pass :)
<lina> alyssa: The kernel needs to copy the attachment list over for one, but I also don't think the firmware will appreciate the count being > the list size
<karolherbst> so as long as things stay simple it's fine, but if games switch render targets often it's bad
<lina> That's the only variable length structure involved here that I can think of though.
<karolherbst> though usually command buffers get so huge that flushing is a must anyway
<lina> Everything else is static
<alyssa> lina: Yeah, that's fair ... I guess supplying bad data to firmware is different than supplying bad data to hardware
<alyssa> given the trust model
<lina> Supplying bad data to firmware crashes the GPU and your entire machine in practice...
<lina> So yeah
<alyssa> yeah okay
<alyssa> delightful.
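As a sketch of what that kernel-side sanitizing amounts to: bounds-check the only variable-length input, then copy it into the trusted fixed-size firmware descriptor. All struct and field names here are hypothetical, and the limit is made up:

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>

    #define AGX_MAX_ATTACHMENTS 16 /* made-up limit */

    struct agx_attachment { uint64_t gpu_va; uint32_t size; uint32_t flags; };

    /* Hypothetical submission args coming in from userspace. */
    struct agx_submit_args {
        uint32_t attachment_count;
        struct agx_attachment attachments[AGX_MAX_ATTACHMENTS];
        /* ...the render-pass registers userspace is allowed to set... */
    };

    /* Hypothetical trusted, fixed-size firmware job descriptor. */
    struct agx_fw_render_job {
        uint32_t attachment_count;
        struct agx_attachment attachments[AGX_MAX_ATTACHMENTS];
        /* ...plenty of kernel-owned fields... */
    };

    static int agx_fill_job(struct agx_fw_render_job *job,
                            const struct agx_submit_args *args)
    {
        /* The firmware won't appreciate a count larger than the list size. */
        if (args->attachment_count > AGX_MAX_ATTACHMENTS)
            return -EINVAL;

        job->attachment_count = args->attachment_count;
        memcpy(job->attachments, args->attachments,
               args->attachment_count * sizeof(job->attachments[0]));
        return 0;
    }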
<karolherbst> at least for GL that's easy to solve
<karolherbst> for vulkan.... it's super annoying
<karolherbst> you will have to work around that limit
<alyssa> what limit?
<lina> I think separate submissions per render pass is typical for all tiled architectures?
<karolherbst> attachment lists
<lina> I don't think AGX is special here
<lina> Wait what's hard about attachment lists?
<alyssa> yeah i don't see the problem either
<karolherbst> yeah.. it's not special, but vulkan allows applications to construct cmd buffers directly
<karolherbst> and submit them explicitly
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<karolherbst> so you have to make sure the kernel accepts all of them
<karolherbst> either you split it up or something else
<lina> Yeah but userspace will know about the max ahead of time
<karolherbst> right
<lina> Every GPU has a max # of render targets, right?
<lina> I don't think this is special
goldsoultheory has joined #asahi-gpu
<karolherbst> well... depends
<karolherbst> nvidia doesn't because it's all done in the command buffer
<lina> Apple lists those limits in a big PDF somewhere
<karolherbst> switching render targets is just a command
<lina> Yes, but there is a max # of *simultaneous* render targets
<lina> As in MRT
<karolherbst> we do have more virtual limits, like the number of BOs allowed in a submission, but that's pretty much it
<lina> Of the same dimensions
goldsoultheory has quit []
<lina> Right?
<karolherbst> right.. hardware limitations exist and those are usually exposed in vulkan as limits
<lina> You can do as many render passes to as many separate sets of render targets as you want in AGX, each one just has to be a separate job submission
<karolherbst> yeah... just means that you will have to do multiple submissions when submitting a vulkan cmd buffer
<lina> Sure, but that's life for tiled architectures...
<karolherbst> and make sure each batch is correct while recording
<karolherbst> yeah
<lina> I mean you need to make sure everything submitted is correct anyway, you wouldn't want to feed the kernel something it rejects at any point
<karolherbst> I just never had to deal with those problems myself :D
<lina> That's a bug in mesa
<karolherbst> we don't have that in nvidia
<lina> Yeah but I'm sure you have hardware limits that cannot be exceeded for correct output, right?
<karolherbst> the only reason the kernel rejects is if the context crashes or we run out of memory
<lina> No difference to this ^^
<karolherbst> there is no metadata attached when doing submissions
<karolherbst> it's literally just memory
<karolherbst> but yes, we have to make sure in userspace to not cause the context to crash, but that's something the kernel isn't involved with
<lina> But the command buffers themselves are going to be full of fields telling the hardware what to do... what's the difference between the kernel rejecting your command buffer and the hardware bugging out? Both indicate the driver did the wrong thing and allowed something to go through that isn't allowed.
<karolherbst> right, not saying they are fundamentally different, just some hardware doesn't need anything to be configured on the kernel side for things to run
<lina> Yes, this is funny hardware ^^
<karolherbst> nvidia does all of its submission in userspace even
<lina> I'm just saying this isn't a big deal in the grand scheme of things, as far as hardware differences go
<karolherbst> at least for vulkan and compute
<karolherbst> yeah.. probably
goldsoultheory has joined #asahi-gpu
<alyssa> yes, NVIDIA's hardware is the most perfect thing in the universe, everything else pales in comparison, we all know, jekstrand won't stop talking about it :-p
<karolherbst> :D
<alyssa> as far as gripes I have with AGX, this is not one
<karolherbst> did he
<karolherbst> though my hope is to get to know more about other hardware with my CL work
<alyssa> AGX is a bog standard tiler
<alyssa> although it's unique in how much it does on the shader cores and how little fixed function hardware it has
<karolherbst> yeah.. I've heard
<alyssa> even relative to powervr
<karolherbst> probably makes sense to some degree to be more flexible here, as you know inputs/outputs are just memory anyway
<karolherbst> though implementing geometry shaders sounds like pain on agx?
<lina> Everyone knows Intel GPUs are the best and most perfect GPUs ^^
<karolherbst> uhhh
<karolherbst> I don't think I ever heard anybody at Intel saying that even
<karolherbst> but I also never talked to actual hw engineers at Intel
<karolherbst> at least modern intel GPUs are all scalar, finally
<lina> (That was a joke, click the link. That's my favorite WTF hardware graphics bug I saw while looking at GPU drivers.)
<lina> I have no idea how that even happens.
<alyssa> lina: honestly the linked code seems mild by hasvk standards
<karolherbst> lina: usually by hw engineers getting too nifty
<alyssa> (The hasvk driver recently branched off anv, it's because Haswell can haz vk)
<karolherbst> there is probably some good reason behind it once you hear it, but it's a total mystery for anybody else
<lina> The addresses are so random...
<alyssa> karolherbst: no! bad! you're thinking about good hardware design, stop it!
<alyssa> ;-p
<karolherbst> :D
<karolherbst> lina: yeah.. until you hear the reason
<alyssa> it's my understanding intel's gfx hw is some of the most pathological in the industry
<karolherbst> it is
DarkShadow44 has quit [Quit: ZNC - https://znc.in]
<alyssa> at least historically... IDK if gfx12 is any better in practice
DarkShadow44 has joined #asahi-gpu
<karolherbst> auto vectorization is a religion at intel
<karolherbst> that's the reason they got vectorized ISAs for that long
<karolherbst> some ditched them 15 years ago
<karolherbst> not sure when intel went full scalar though.. gfx9?
<alyssa> most GPU ISAs are vector at least a little bit
<karolherbst> oh sure.. for memory
<alyssa> lina: ^ if you haven't seen that thread it's good for a laugh
<karolherbst> but intel went this insane way where compute and something was scalar, but the other stages vectorized
<karolherbst> it's really idiotic
<karolherbst> alyssa: yeah.. very good thread
<karolherbst> there is even this joke story about intel engineers trying to beat cuda by using auto vectorization
<alyssa> "Intel's [GPU ISA] is basically designed to keep CS PhD compiler engineers entertained solving unnecessarily hard problems"
<lina> What is Arduino?
<karolherbst> and asked "how did nvidia solved this issue" - "they didn't, they don't use vectors" - "sure, but how did they solve auto vectorization?"
<alyssa> lina: misspelling of Adreno, qualcomm's thing
<lina> Ohhh wwwwwwwww
<karolherbst> fun read
<sven> „at the bottom of the pile… Broadcom“ what a surprise! :p
<karolherbst> that's also one reason why CL is so inherently bad most of the times
<karolherbst> people don't write scalar applications, they make heavy use of vectors and it sucks
<karolherbst> anyway, that blog post sums it all up nicely
<alyssa> sven: we don't talk about broadcom, no no no...
<karolherbst> what's that about broadcom?
<alyssa> karolherbst: y tu, de ti, ni un sonido saldra!
<alyssa> (sorry I don't know the lyrics in English)
<karolherbst> :D
<karolherbst> anyway.. hw design matters for writing good driver impls, and some care less than others
<alyssa> [Apparently "hey sis, I want, not a sound out of you" ... less dramatic than the Spanish :p]
bisko has joined #asahi-gpu
* karolherbst goes into panic mode because flight preparations
<karolherbst> I had the smart idea of taking a flight at 6 am
<alyssa> Woof
<karolherbst> yeah... so my plan was to try to stay awake through the night and sleep on my flight to the US
<karolherbst> oh well...
<karolherbst> let's see how that all goes
<alyssa> good luck
<sven> I’m just amazed how everything Broadcom makes seems to be horrible
<Ella[m]> from the work I've seen from v3dv, I second that
<alyssa> Ella[m]: We really should come up with a plan for getting agxvk into shape.
<alyssa> Definitely want a New Vulkan(TM) driver instead of Old Vulkan
<alyssa> Common vk state tracking + common vk_meta
<Ella[m]> yes, I've made some notes on what order I want to do things in and I've already integrated the new state tracking stuff
<alyssa> awesome
<alyssa> and possibly also pure dynamic rendering with common render pass emulation, but we're still trying to figure out if that can be made to work efficiently with tilers
<Ella[m]> That's what I've been doing so far
<alyssa> which?
<Ella[m]> pure dynamic rendering
<alyssa> OK
<alyssa> The trouble is that you leave a lot of perf on the table for renderpasses with multiple subpasses
<Ella[m]> yeah :/
<alyssa> Ella[m]: BTW, do you have Khronos access? (maybe through Igalia?)
<Ella[m]> No I don't
<alyssa> ack
<alyssa> Anyway, here's my take:
<alyssa> Render passes don't map to how AGX works *anyway*
<alyssa> Metal's rendering model doesn't really map to AGX either, but it's a helluva lot closer
<karolherbst> is metal really as great/terrible as people say?
<alyssa> karolherbst: Definitely my favourite graphics API
<alyssa> the programming model is a paragon of sanity by comparison.
<karolherbst> compared to GL or vulkan?
<alyssa> /* no comment */
<alyssa> Ella[m]: Not sure how much you know about how render targets work on AGX
<alyssa> I figured out most of the details in August but haven't had time to update Mesa with the knowledge yet.
<alyssa> For the benefit of everyone else in the room I'll summarize:
<alyssa> Everything happens via fragment shaders. Everything!
<karolherbst> maybe I should try to use metal once and then I have data and can decide to hate or love it or something
<alyssa> There is no tilebuffer. It's just local memory, same as compute shaders!
<alyssa> How does that work? Well, there are 3 handy instructions:
<karolherbst> does something like helper invocations even exist?
<alyssa> 1. local pixel store. This works like a local store (as compute kernels use), but the destination is implicitly indexed by the fragment coordinate. It's a formatted store, meaning you specify the format in the instruction and it does conversion (f32 -> rgba8) in hardware for you.
<alyssa> The shader passes in a byte offset, a format, and a value to store.
<alyssa> This is how fragment shaders write out their results to the "tilebuffer".
<alyssa> 2. local pixel load. The above in reverse. This is how we can load the "tilebuffer" to implement blending.
<alyssa> 3. copy local block to image.
<alyssa> This is the most subtle. ("unkB1" in applegpu right now)
<alyssa> This takes in an image descriptor (usually loaded into a texture state register with BIND_TEXTURE)
<alyssa> and a format and a byte offset
<alyssa> It interprets a block of local memory according to the format/byte offset provided, and blits it to the right spot in the image you pass in.
<alyssa> Generally this instruction is used in the special "end-of-tile" program.
<alyssa> That is a fragment-like shader that runs once per tile, rather than once per pixel, after all regular fragment shading is done.
<alyssa> The driver prepares this shader based on the API state.
<alyssa> For example, to implement N render targets with different formats, the driver will bind N texture state registers with descriptors mapping each render target in memory with the format of the attachment in memory.
<alyssa> with a shader that contains N block copy instructions, each specifying the format and location of the render target in the tilebuffer
chipxxx has quit [Read error: Connection reset by peer]
<alyssa> (I expect those formats need not match exactly but they must be compatible for the results to make sense.)
<alyssa> For completeness, there's also a special "background" program that runs for every pixel before any other fragment shader. It should load or clear all render targets as necessary.
<alyssa> (For loads, the driver will again bind N texture state registers with texture descriptors mapping the render target's existing contents)
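A loose sketch of how a driver might assemble such an end-of-tile program from that description; every name here (the builder, the emit helpers, the descriptor type) is invented for illustration:

    #include <stdint.h>

    struct agx_shader_builder; /* opaque, hypothetical */

    /* Hypothetical stub emitters for the instructions described above. */
    static void emit_bind_texture(struct agx_shader_builder *b, unsigned slot,
                                  const void *descriptor)
    { (void)b; (void)slot; (void)descriptor; /* stub: real encoding unknown */ }

    static void emit_block_image_store(struct agx_shader_builder *b, unsigned slot,
                                       uint16_t tib_format, uint16_t tib_offset_B)
    { (void)b; (void)slot; (void)tib_format; (void)tib_offset_B; /* "unkB1" */ }

    struct agx_render_target {
        const void *descriptor;  /* maps the image in memory (format, layout) */
        uint16_t tib_format;     /* format the RT occupies in the tilebuffer */
        uint16_t tib_offset_B;   /* byte offset within the tile's local memory */
    };

    /* One block-copy per render target: bind its descriptor to a texture
     * state register, then blit that slice of local memory out to the image. */
    static void
    agx_build_end_of_tile(struct agx_shader_builder *b,
                          const struct agx_render_target *rts, unsigned nr_rts)
    {
        for (unsigned i = 0; i < nr_rts; ++i) {
            emit_bind_texture(b, i, rts[i].descriptor);
            emit_block_image_store(b, i, rts[i].tib_format, rts[i].tib_offset_B);
        }
    }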
<alyssa> ---------------
<alyssa> So how does any of this map to subpasses?
<alyssa> Ideally, the "background -> draws -> end of tile" pipeline is a single *render pass*
<alyssa> The background program initializes the tilebuffer for the *first* subpass.
<alyssa> Before and after each subpass, there is a fixed layout of the tilebuffer: mappings of colour attachments to offsets and tilebuffer formats.
<alyssa> Fragment shaders are keyed to the layout before and after, determining how input attachments are read and colour is written respectively.
<alyssa> Finally, the end of tile program writes out only what's left at the end with STORE_OPs
<alyssa> ------------
<alyssa> This can all be done with render passes, but it doesn't really map in a totally clean way.
<alyssa> Conceptually what we actually want is:
<alyssa> vkBeginRendering(the layout at the END, that will actually be stored to memory)
<alyssa> Draws where input attachments are lowered to "tilebuffer load" and outputs are lowered to "tilebuffer store", keyed to the explicit tilebuffer layouts
<alyssa> (what Metal calls "imageblocks")
<alyssa> vkEndRendering()
<alyssa> So to implement all this subpass gooiness, we'd want to be doing an internal dynamic rendering for the whole render pass, with some sort of internal VK_MESA_pixel_local_storage extension
<alyssa> (EXT_pixel_local_storage is a GLES extension adding imageblocks.)
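The state such a lowering would key on is small. A hypothetical sketch of the explicit tilebuffer layout description (names invented here):

    #include <stdint.h>

    #define AGX_MAX_TIB_ATTACHMENTS 8 /* made-up bound */

    /* Hypothetical explicit tilebuffer ("imageblock") layout: what each
     * attachment looks like while it lives in local memory. Fragment shaders
     * would be keyed on the layout in force before and after their subpass. */
    struct agx_tilebuffer_layout {
        uint32_t nr_attachments;
        struct {
            uint16_t offset_B; /* byte offset inside the tile's local memory */
            uint16_t format;   /* tilebuffer format, not the memory format */
        } att[AGX_MAX_TIB_ATTACHMENTS];
    };

    /* Input-attachment reads lower to local pixel loads at att[i].offset_B,
     * colour writes lower to local pixel stores; only the layout live at
     * vkEndRendering() time is written back to memory by the end-of-tile
     * program, honouring the STORE_OPs. */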
<alyssa> -----
<alyssa> My motivated reasoning says that we could extend the render pass emulation in the Vulkan core to do that for tilers (implement with dynamic rendering + imageblocks)
<alyssa> Both Mali and AGX want that common lowering
<alyssa> I assume PowerVR too given the heritage.
<alyssa> I don't know if v3dv could use it.
<alyssa> I don't think turnip can, because qualcomm isn't a traditional tiler.
<alyssa> I think it can be done... I don't think I want to be the one to do it ;)
<karolherbst> I am actually wondering if we'll see more hardware like that even in the discrete space, where stages work more like this
MatrixTravelerbot[m]1 has quit []
<alyssa> karolherbst: The conceptual block diagram is the same as Mali, the hardware is just a lot less opinionated.
<karolherbst> alyssa: sure.. but I could also imagine that nvidia or AMD are more aggressive about killing off stages, but maybe they are simply less concerned about space and keep their stuff. At least for nvidia I know that they already just use L1 or L2 cache for all those stage-special memory things
<alyssa> sure
<lina> I ended up reading the entire ispc saga...
<karolherbst> it's fun, isn't it?
<lina> It is!
<lina> Also it seems so obvious in retrospect, shaders for CPUs... why wasn't that a thing before?
<karolherbst> though I am not sure myself if I read the full thing, might have stopped somewhere in the middle. The first part is in itself already a good story about intel here :D
<lina> I read until the leaving intel part
<lina> The stories of Intel internal corporate culture explain so much...
<karolherbst> the main problem is that for different problems you have to use a different mindset. Normal code written in C doesn't optimize well, and normal C devs never had to think much about data parallelization and all of that
<alyssa> lina: All of the "this GPU is scalar/vector" drama makes a lot more sense when you realize subgroup-style SIMD is equivalent to vec4 SIMD :)
<karolherbst> if you can use avx/sse all the way, it can be super fast, but normal code doesn't give you that
<karolherbst> and never will
<karolherbst> yeah...
<lina> alyssa: Yeah ^^
<karolherbst> corporate culture is more important than "technical reasoning" for most things
<karolherbst> or well.. the more relevant reason why things turned out how they did
<lina> Yup...
<karolherbst> not even wanting to say that it's a bad thing, because most of the time "technical points" are just mindsets/culture put into something abstract and ticked off with a "yep, that's how it must be"
<karolherbst> in the end what works decides what's the best thing, not some abstract reasoning
<karolherbst> uhhh that "On jerks and institutional acceptance of them" part reminds me of some...
<karolherbst> I am now actually wondering what would happen if we mixed llvmpipe with ispc...
<karolherbst> though llvmpipe is already doing similar things
<karolherbst> but also not sure how much it embraces AVX
compassion4 has joined #asahi-gpu
compassion has quit [Ping timeout: 480 seconds]
compassion4 is now known as compassion
<alyssa> llvmpipe already does ispc's 1 Weird Trick
<karolherbst> yeah .. for the most part
<karolherbst> funny thing is though, that pocl is roughly 10x as fast as llvmpipe in luxmark
<karolherbst> but I am also not super concerned about llvmpipe with CL, because it's easy to just use pocl
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
goldsoultheory has joined #asahi-gpu
goldsoultheory has quit []
joske has joined #asahi-gpu
<joske> some observations from running Lina's GPU driver on M1 MBA 16GB: doesn't boot reliably, seems I *need* to edit the kernel command line in grub, or it won't boot, and it seems that loglevel must be set to 8
<joske> once booted, wifi doesn't work: complains about some crypto things missing in the kernel while they ARE available: https://paste.debian.net/1255756/
<joske> when running GNOME in wayland, function keys are not working (can not change volume etc), while I'm pretty sure that worked before in GNOME wayland
goldsoultheory has joined #asahi-gpu
<joske> desktop does seem to be stable once running
joske has quit [Quit: Leaving]
yuyichao_ has quit [Read error: Connection reset by peer]
joske has joined #asahi-gpu
<joske> confirmed function keys are working in released asahi kernel in GNOME wayland
yuyichao_ has joined #asahi-gpu
joske has quit []
<alyssa> those sound like kernel misconfiguration or busted initramfs or something
RowanG[m] has quit [Quit: Client limit exceeded: 20000]
joske has joined #asahi-gpu
<joske> kernel misconfiguration is possible of course (it's an old config I used when I was still using 4k kernel)
<joske> not using an initrd
joske has quit []
<alyssa> where is the wifi firmware coming from
<alyssa> also, modules
joske has joined #asahi-gpu
<joske> both from the rootfs, this was always working without initrd
<alyssa> did you install up to date modules?
<alyssa> is the local version sane?
<joske> yes (make modules_install)
<alyssa> ok
<alyssa> i'm not a kernel dev [dubious - discuss], not sure I can help other than ask questions about stuff I've screwed up in the past :D
<karolherbst> is it fixed after loading the modules?
<joske> brcmfmac was loaded
<karolherbst> sure, but also the crypto stuff?
<joske> and I've built all crypto stuff into kernel (=y)
<karolherbst> well... but why does it complain then?
<joske> no idea
<karolherbst> sure all of those symbols are actually there?
<karolherbst> uhm.. configs
<joske> hmm how can I check?
<karolherbst> either in your .config or the one getting installed to /boot/
<joske> I mean the symbols
<karolherbst> I am sure iwd just checks the config
<sven> I don’t see how enabling the GPU driver could break any of that tbh. Maybe the debug part due to some timing, but the rest very much sounds like config issues
<joske> yes it could very well be I'm messing something up
<karolherbst> I am not entirely sure _how_ iwd checks it.. maybe you have to make sure to also install the config or something
<karolherbst> maybe iwd checks against =m
<joske> initially it was =m
<karolherbst> that could be it
<joske> then I saw it didn't find it, then I built again as =y
<karolherbst> normally the rule is: initramfs or no support
<joske> but it was the same
<karolherbst> and that network stuff is usually built around the fact that you have an initramfs
<karolherbst> if you leave this assumption you are generally on your own and I advise you to use an initramfs
<joske> I was not aware of that rule ;-). And as I said it had always worked without (been using my own kernel with sven's 4k patch for many months)
<karolherbst> sure.. and something might have changed
<alyssa> to be clear you're not using 4k pages now?
<joske> I appreciate any help of course, and I don't want to send you in a wild goose chase
<joske> no 16k
<karolherbst> maybe a config of those got disabled by accident or...
<sven> Using that patch also means you get to keep the parts if things break! ;)
<joske> haha but it never did!
<joske> it has worked just long enough until chromium and vscode supported 16k pages
<joske> then I re-installed with asahi reference distro and release kernel
<karolherbst> the annoying part is that loading order kind of matters... might be simpler to start with a stock config taken from somewhere
<joske> yes, I'll do that
<alyssa> remember kids, you always win a race if you can ignore it
<karolherbst> build and install from that to make sure it boots and loads the network successfully
<karolherbst> and then only make very very very small changes and run another test
<joske> I'll use the config of the asahi kernel as base, and do olddefconfig
<joske> problem is the release kernel doesn't boot with the boot.bin with the new device tree
<karolherbst> right.. but that shouldn't be a problem if you only keep the old config.. probably
<joske> yes I can try that
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
goldsoultheory has joined #asahi-gpu
<jannau> wifi/iwd works for me with the gpu driver enabled
<joske> that's good to know thx, so please ignore me for now
<joske> jannau: do the function keys work?
<jannau> I haven't tested on a m1 macbook
<karolherbst> I think I am 🤏 so close to getting an M1 apple macbook myself...
<joske> ok thx
<joske> karolherbst: you won't regret it, even without the GPU driver, the machine is very usable in linux
<sven> karolherbst: you know you want to! You won’t regret it :p
<alyssa> karolherbst: and I think I am 🤏 so close to getting an M2 macbook :-p
<karolherbst> I am the kind of person who'd just go with the m2 ultra straight away
<karolherbst> ...
<joske> that won't be a macbook though :-D
<karolherbst> yeah...
<sven> why not both!
<karolherbst> jo... I mean.. the ultra is already expensive enough as it is
<joske> maybe wait for the M2 extreme!
<karolherbst> ahh it's m1 max actually
<karolherbst> those marketing people
<alyssa> I cannot understand the marketing.
<karolherbst> s/the//
<Ella[m]> I wrote a script to regularly scrape second-hand sites until I could find something I could afford
<joske> I have an M1 max MBP for work, I installed asahi on it for a few hours (not allowed in fact), it flies
<alyssa> I prefer the code names just to get away from the ridiculousness.
<karolherbst> should have called it SS M1 SS2 M1....
<sven> they listened too much to the USB-IF
<karolherbst> is the M2 equal to the M1 for the bits which matter btw?
<karolherbst> though I don't think I need the GPU power, because what usually happens is, that once nvidia release new GPUs, I get them anyway
<sven> it had some new hardware and changed a few things around. Don’t think anyone looked at the gpu in detail yet afaik
<karolherbst> k.. atm I am looking at the 14-inch one... 16 might be a tad too big.. dunno
<karolherbst> oh well...
<sven> im still happy with my 13 inch m1 air. Except that it only does a single external screen
<karolherbst> ehhh
<sven> but we’ll see if I can „fix“ that
<karolherbst> is that really a limitation though?
<joske> just get the air, in single core perf, it's the same, and NO noise
<karolherbst> I had an argument on the dri-devel mailing list with somebody who wanted to always use the discrete GPU for external ports (there are laptops where you can one-time flip to the discrete one) so they can plug in 4 displays instead of 3.... and I was like: seriously?
<karolherbst> and make it the default for all laptops with such a flip
joske has quit [Remote host closed the connection]
<sven> I just want two external screens and not use the internal screen when I’m at home tbh
<karolherbst> ahh...
<karolherbst> unless apple messed up the hw, it should be possible... hopefully
<karolherbst> could also be hardcoded in their firmware
<alyssa> I'm boring and only use a single monitor
<sven> yeah, I have an idea how to hack it. They do support two separate DP streams for tiled displays
<jannau> no, the internal display controller seems to be special
<sven> and I know how to reroute those to two separate ports
<karolherbst> I also only use one, but my intel GPU is already busy with two 4k screens... :D
<sven> now the only question remaining is whether the display controller firmware will hate me for that
<karolherbst> it probably will
<karolherbst> question is: does it matter?
<karolherbst> and is the firmware signed?
<sven> fair enough :)
<sven> it’s unfortunately signed, yeah
<karolherbst> mhhh
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<jannau> the firmware will most likely crash
<sven> maybe
<karolherbst> sooo.. there might be a way to get around signed firmware, which... ehh I know because of reasons
<karolherbst> not sure if that would work on apple hardware though
<sven> maybe it’ll at least work with two identical displays. We‘ll see
<karolherbst> it also depends on how they verify the signature
<karolherbst> and how the secret is stored
<karolherbst> but yeah.... it's pointless if one needs to do heavy changes to the firmware
<sven> They load the fw and lock down the processor with fancy hw features before we get control
<jannau> dcp firmware is loaded on boot by the system bootloader
<karolherbst> sure
<karolherbst> nvidia did the same, but people still managed to find a way to leak the signature keys
<karolherbst> hw isn't perfect
<sven> We’d need the iPhone root signature keys afaik :D
<karolherbst> uhhh...
<karolherbst> I suspect apple has a bit more experience with making sure it's secure than nvidia does
<sven> yeah…
<karolherbst> nintendo wasn't amused
<karolherbst> btw
<jannau> apple is certainly not perfect, the dcp firmware was used to pwn iOS
<sven> oh, I’m sure i could get rop code exec on the DCP in a weekend or so
<sven> but that’s nothing we want to rely on
<sven> I’m only willing to try and abuse their tiled display support
<karolherbst> yeah... but it's better to make a decision once it's known what the limitations are and if one could simply binary patch something... though even that is very annoying
<sven> yeah
<sven> but binary patching would already require an exploit somewhere
<jannau> two (mostly) identical displays is probably too much of a limitation, but it should be possible to build a thunderbolt/usb4 dock which emulates a tiled display
<sven> and I’m specifically working on this hw because I don’t have to write or rely on exploits ;)
<sven> I’d be very happy with that two mostly identical display limitation fwiw
<sven> but let’s see what happens first
goldsoultheory has joined #asahi-gpu
<jannau> sure, but I could imagine that it's a reason why such a product doesn't exist
<alyssa> sven: so that's a "no" on iPad M1 bootrom exploits from you? :-p
<jannau> no matter how large you write that it only works with 2 identical displays, you will be swamped with support requests
<sven> yeah, and maybe also why apple doesn’t support that configuration on m1
<sven> alyssa: yup :p
<alyssa> Mainly I don't understand why they have the DCP architecture at all on anything other than the watches
<alyssa> the M1 Ultra dies are comically inefficient with the DCPs..
<sven> without DCP we’d have to write fun parts like DP link training ourselves :/
<karolherbst> wait.. so tiled displays work?
<karolherbst> any max resolution there?
<sven> yes, but only if they use thunderbolt and not MST
<karolherbst> heh...
<sven> 6k I think
<karolherbst> okay so... tiled displays are just two displays
<jannau> we suspect that the m1 ultra supports 6 6k displays instead of just 4 like claimed by apple
<karolherbst> I mean like literally two displays
<karolherbst> DP has a flag in the edid to mark them as such, so the OS can properly deal with that
<sven> yup, which is why I hope that I can hack that to support two actually separate displays on two ports
<karolherbst> yeah... that should be possible.. worst case you just modify the edid the firmware sees.. but dunno how that's all wired up on the hardware
<karolherbst> so TB usually has twice the bandwidth of DP, so requiring TB for 6k might even make sense
<karolherbst> there is also this lane assignment business going on
<karolherbst> and one could argue that you don't want to waste all 4 lanes on DisplayPort
<alyssa> if I never again see the word "crossbar" it will be too soon
<sven> what’s so bad about crossbar? :p
<karolherbst> how are crossbar related to that anyway?
<jannau> routing dp from any DCP to any USB4 port, but no cross-die routing on the m1 ultra
<jannau> but I think alyssa has something else in mind
<alyssa> jannau: no, that :)
<alyssa> multiplexers! wow!
* karolherbst should stop trolling trolls on phoronix
SSJ_GZ has joined #asahi-gpu
SSJGZ has quit [Remote host closed the connection]
<alyssa> karolherbst: "Go compiles fast because the compiler isn't great at optimizing. "
* alyssa grabs popcorns
<karolherbst> it gets better
<alyssa> Eh, i think that's the best line :-p
<karolherbst> sad
<alyssa> I really liked Go when I used it
* karolherbst never used go
<alyssa> I don't write the sort of software that Go shines at, but I really liked what I saw when I was playing with it
<alyssa> except for the error handling
<karolherbst> but it seems like go isn't such a controversial topic, so it's easier to engage with those "anti rust" folks
<alyssa> i swear i need a keybinding to insert "if err != nil { return err }"
<karolherbst> you mean like "?"?
<alyssa> yes
<karolherbst> but I am actually interested in all those fake arguments people pull out just to say why Rust doesn't belong in anything or something
<karolherbst> my favourite is still "it shouldn't have a stdlib"
<alyssa> karolherbst: I wonder what would happen if you claimed serde was part of the stdlib :-p
<karolherbst> I still have ~3 hours until I have to leave
<karolherbst> alyssa: I already said that package management is an important key feature of rust :D
<alyssa> okay but itertools
<karolherbst> but sure.. I could say that pulling in serde would be cool
<alyssa> why is itertools not stdlib
<karolherbst> because it's still experimental?
<alyssa> if we're doing the whole functional thing
<karolherbst> rust does have this formal "get shit into stdlib" process
<karolherbst> and crates are used to toy around first
<karolherbst> there are other things pulled from crates into stdlib all the time
<karolherbst> num_integer e.g. has some nice things
<dottedmag> alyssa: w.r.t. things Go shines at: I'm doing a Wayland compositor in Go for fun (under macOS for double-fun), and it still kinda-shines.
SSJ_GZ has quit [Ping timeout: 480 seconds]
<alyssa> dottedmag: Interesting
<karolherbst> as long as more and more things are getting pulled into stdlib we might not even need crates long term in mesa
<karolherbst> though not having serde would be a big downside :(
<alyssa> I was referring to web services and such
<alyssa> (or a Gemini service, in my case)
<karolherbst> it's kind of sad though how those "software purists" always think the entire world is bad and they are the only ones who really get it
<i509VCB> I am kind of on the train of "the standard library is where libraries go to die". You need to be very selective of what goes in.
<karolherbst> sure and I think the rust community is very very selective on that
<alyssa> karolherbst: what about those who think the entire world is bad and nobody gets it including them? :P
<karolherbst> heh
<i509VCB> why would mesa need to depend on serde specifically? I'm guessing there is some need somewhere to manage json or xml?
<karolherbst> shader caching
<i509VCB> so you are looking at putting rust in nir?
<karolherbst> nah, it's more about metadata
<i509VCB> I'm not sure where we are so I may sound lost
<karolherbst> I have a few structs I have to serialize
<karolherbst> and atm I handroll the code
<karolherbst> but come on.. why do I have to hand roll serializing enums and stuff
<i509VCB> derive in serde does work, but it can be a little too heavy in some use cases
<karolherbst> yeah.. but I only need it for trivial enough things
<karolherbst> it's not a biggy, but that's one place where e.g. using serde would make sense
<karolherbst> even though it's doing much more than we actually need
<karolherbst> don't even need a special format.. a plain binary dump is enough
<karolherbst> maybe serde is actually too big for that
<i509VCB> serde is also a little awkward to deal with when regarding binary formats
<karolherbst> yeah... though we have a little trick: shader caches are only valid for the same mesa binary
<i509VCB> bincode v2 moved away from serde (but still supports it) because it's a binary format
<karolherbst> if you got a new version, the cache is invalid
<alyssa> I used a neat library for this recently
<karolherbst> so what I really need is just an easy way to map between data types and [u8]
<alyssa> well, 9 months ago 'recently'
<alyssa> nom
<alyssa> I think that's only the parsing half of things
<karolherbst> mostly that enum to int mapping is annoying, though I could just unsafe it I think?
yrlf has quit [Quit: The Lounge - https://thelounge.chat]
yrlf has joined #asahi-gpu
<i509VCB> I'd think a TryFrom implementation that returns an Err given an invalid value would make more sense
<i509VCB> That way you can just do self.enum_value as u32
<i509VCB> (repr(C) is an example, could be repr(u32) or whatever)
<karolherbst> well.. returning an Error doesn't help with anything though
<karolherbst> because the fallback is to just compile the source regardless
<karolherbst> but yeah.. maybe repr(C) would help
<alyssa> unsafe { transmute } all the things
* alyssa is not very good at Rust
<karolherbst> mhhh
<karolherbst> already having enough unsafe blocks :D
<i509VCB> Or `impl From<u32> for Option<TheEnum>` could be another option
<karolherbst> though I am sure that 1st place for unsafe per LOC goes to the stdlib
<karolherbst> i509VCB: my point was more that I don't want to have to write the code for it :)
<alyssa> mako template generating rust code? :-p
<karolherbst> because when it gets changed, the code needs to get updated
<karolherbst> heh...
<karolherbst> I think there are also crates making that int <-> enum conversion less painful
<karolherbst> but then I also want to just declare a type to get derived serialize/deserialize methods
M152ari[m] has joined #asahi-gpu
<i509VCB> sure, I get it, you don't want more ticking timebombs that explode when you forget to update little things.
<i509VCB> an int <-> enum derive macro sounds interesting
<karolherbst> I think this was the one I found way earlier: https://docs.rs/int-enum/latest/int_enum/trait.IntEnum.html
<i509VCB> looks abandoned sadly
<karolherbst> or it's the first crate to be "done"
<i509VCB> Well the open pull request is my metric of that
<i509VCB> The author seems to exist still, I guess it could be brought back up
<karolherbst> or maybe it's part of serde now
<karolherbst> it does seem to use serde I think...
<karolherbst> ahh yes
<karolherbst> but not sure if serde supports that out of the box now
<karolherbst> but looking at bindgen I think rust really needs to improve handling of "int enums", to make interacting with C code less painful
<i509VCB> Isn't there an option to generate rust enums in bindgen?
<alyssa> karolherbst: especially bit enums
<karolherbst> i509VCB: yeah, and then 4 other enum types
<i509VCB> Although bitflags are a bit out of it: https://docs.rs/bindgen/0.60.1/bindgen/struct.Builder.html#method.rustified_enum
<karolherbst> yeah...
<karolherbst> all those enum types have pro and cons
<karolherbst> I think I use all of them in mesa...
<i509VCB> You can mark it as a bitfield enum, but you have to name all the types... https://docs.rs/bindgen/0.60.1/bindgen/struct.Builder.html#method.bitfield_enum
<karolherbst> I know :)
<karolherbst> still.. they are annoying to use
<karolherbst> could be worse though
goldsoultheory has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bluetail4 has joined #asahi-gpu
bluetail has quit [Ping timeout: 480 seconds]
bluetail4 is now known as bluetail
laintree has quit [Remote host closed the connection]
lain has joined #asahi-gpu
juleeandres[m] has joined #asahi-gpu