#wayland on 2024-09-27 — irc logs at oftc.irclog.whitequark.org

2024-07-16 04:52 ChanServ changed the topic of #wayland to: https://wayland.freedesktop.org | Discussion about the Wayland protocol and its implementations, plus libinput

00:04 <mclasen> I was thinking of the xeyes one that went by recently

00:46 fmuellner_ has quit [Remote host closed the connection]

01:59 Company has quit [Ping timeout: 480 seconds]

02:08 Company has joined #wayland

03:05 mxz_ has joined #wayland

03:10 mxz has quit [Ping timeout: 480 seconds]

03:11 mxz__ has quit [Ping timeout: 480 seconds]

03:13 mxz_ is now known as mxz

03:27 Company has quit [Remote host closed the connection]

03:27 nerdopolis has quit [Ping timeout: 480 seconds]

05:02 mxz_ has joined #wayland

05:27 sima has joined #wayland

05:29 rv1sr has joined #wayland

05:41 kasper93 has quit [Ping timeout: 480 seconds]

05:49 pbsds is now known as Guest4689

05:49 pbsds has joined #wayland

05:53 Guest4689 has quit [Ping timeout: 480 seconds]

06:12 RAOF has quit [Remote host closed the connection]

06:12 RAOF has joined #wayland

06:17 tzimmermann has joined #wayland

06:36 leon-anavi has joined #wayland

06:39 iomari891 has joined #wayland

06:48 iomari891 has quit [Ping timeout: 480 seconds]

06:49 kts has joined #wayland

06:52 mripard has joined #wayland

06:54 eruditehermit has joined #wayland

07:27 coldfeet has joined #wayland

07:44 rgallaispou has joined #wayland

07:58 abhimanyu has joined #wayland

07:58 kasper93 has joined #wayland

08:03 kts has quit [Quit: Leaving]

08:12 kts has joined #wayland

08:38 paulk has quit [Ping timeout: 480 seconds]

08:40 paulk has joined #wayland

08:47 yrlf has quit [Quit: The Lounge - https://thelounge.chat]

08:48 yrlf has joined #wayland

08:48 yrlf has quit []

08:48 yrlf has joined #wayland

08:50 yrlf has quit []

08:50 yrlf has joined #wayland

09:29 rasterman has joined #wayland

09:36 plutuniun has joined #wayland

09:36 plutuniun has quit [Remote host closed the connection]

09:36 plutuniun has joined #wayland

09:39 plutoniun has quit [Ping timeout: 480 seconds]

10:05 kts has quit [Quit: Leaving]

10:06 kestrel has joined #wayland

10:17 gallo has quit [Remote host closed the connection]

10:20 kts has joined #wayland

10:23 gallo has joined #wayland

10:56 ___nick___ has joined #wayland

10:56 ___nick___ has quit []

10:59 ___nick___ has joined #wayland

11:09 kts has quit [Quit: Leaving]

11:21 nerdopolis has joined #wayland

11:58 eruditehermit has quit [Ping timeout: 480 seconds]

12:00 Company has joined #wayland

12:01 <nerdopolis> Curiosity: What would be needed in Wayland or Mesa for display servers to survive the removal of the simpledrm device when the kernel replaces it with the real card?

12:07 <kennylevinsen> many display servers already support new GPUs showing up, but at least wlroots/sway doesn't support changing the device that acts as primary renderer

12:07 <kennylevinsen> so it would need realize the transition is going on, throw away its current renderer and start over with the new one

12:08 <zamundaaa[m]> nerdopolis: simpledrm -> real GPU driver shouldn't need changes in Wayland

12:08 <kennylevinsen> yeah it's just display server logic

12:08 <zamundaaa[m]> A real GPU driver going away, so that you can't use linux dmabuf anymore, that would be more tricky

12:11 <nerdopolis> Yeah, I guess the core protocol is fine, I should have not phrased it that way. (Unless if clients also need to change to be aware)

12:11 <kennylevinsen> well clients might want to start uisng the new renderer

12:12 <kennylevinsen> if they had previously started with a software renderer

12:12 <kennylevinsen> whether intentionally or through llvmpipe

12:14 kts has joined #wayland

12:20 <nerdopolis> That would make sense. But for the ones that don't care, like say kdialog or some greeter would be fine?

12:23 <karolherbst> kennylevinsen: mhhh, that reminds me of how macos handles those things. There is a "I support switching over to a new rendered" opt-in flag for applications and the OS signals to applications that this is gonna to happen (because the GPU is going away or because of other reasons), so applications get explicitly told which device to use and when.

12:24 <karolherbst> otherwise they all render on the discrete one (which has its problem for other reasons)

12:30 lsd|2 has joined #wayland

12:37 <nerdopolis> Does simpledrm even support dmabuf?

12:40 <Company> switching GPUs is not really supported anywhere because it basically never happens

12:40 <Company> so it's not worth spending time on

12:40 <Company> same for gpu resets

12:40 <DemiMarie> GPU resets absolutely happen

12:41 <DemiMarie> I'm looking at AMD VAAPI here.

12:41 <DemiMarie> And Vulkan video

12:41 <Company> I write a Vulkan driver on AMD, I know that GPU resets do happen

12:41 <Company> but I don't think I've ever had one in GTK's issue tracker for example

12:42 <DemiMarie> I doubt GTK gets the bug reports

12:42 <DemiMarie> Though I will happily write one for GTK not handling device loss.

12:42 <Company> maybe - Mutter goes pretty crazy on GPU resets

12:42 <DemiMarie> That should be fixed

12:43 <nerdopolis> Switching GPUs is starting to be an issue, especially with simpledrm. It was reported against mutter first https://gitlab.gnome.org/GNOME/mutter/-/issues/2909 which was worked around with a timeout, and then sddm

12:43 <DemiMarie> wlroots is working on it, as is Smithay

12:43 <Company> I usually reboot when it happens

12:43 <DemiMarie> KWin handles it already

12:43 <DemiMarie> Company: that is horrible UX

12:44 <Company> you're welcome to implement it, write tests to make sure it doesn't regress and fix all the issues with it

12:44 <nerdopolis> The issue is that amdgpu takes a longer time to start up, so /dev/dri/card0 is actually simpledrm, and then the display server starts using it. Then when amdgpu (or other drivers) finishes simpledrm goes away and gets replaced with /dev/dri/card1

12:44 <DemiMarie> Company: Or not use Mutter

12:45 <Company> DemiMarie: yeah, if you commonly reset your gpu, that's probably not a bad idea - though I would suggest not resetting the gpu

12:45 <jadahl> in mutter the intention to handle it like a gpu reset where everything graphical just starts from scratch. but the situation that simpledrm introduces is not what it is intended for, as simpledrm showing up for a short little while then gets replaced causes a broken bootup experience, so unless it can't be handled kernel side, might need to work around it by waiting a bit if we end up with simpledrm to

12:45 <jadahl> see if anything more real shows up

12:45 <DemiMarie> That said, it might be simpler for Mutter to crash intentionally when the GPU resets, so the user can log back in.

12:46 <jadahl> because we don't want to start rendering with simpledrm, then switching to amdgpu

12:46 <jadahl> (at bootup)

12:46 <DemiMarie> Company: Do AMD GPUs support recovery from resets, or is it usually impossible on that hardware?

12:47 <Company> jadahl: what do you do with all the wl_buffers you (no longer) have in Mutter? Tell every app to send a new one and wait until they sent you one?

12:47 <nerdopolis> jadahl: But what about systems that don't have driver support, and only support simpledrm? Are they going to be stuck with an 8 second timeout?

12:48 <jadahl> Company: we don't really handle gpu resets, so now we don't do anything. in a branch we do a trick for wlshm buffers to have them redraw, but it doesn't handle switching dmabuf main device etc

12:48 <Company> DemiMarie: I'm pretty sure it can be made to work somehow, because Windows seems to be able to do it (the OS, not the apps)

12:48 Hypfer is now known as Guest4714

12:48 Hypfer has joined #wayland

12:49 <jadahl> nerdopolis: that is the annoying part, they'd get a slower boot experience because the kernel has not the slightest clue if a gpu will show up ever after boot

12:49 <Company> DemiMarie: also, when installing new drivers on Windows it tends to work

12:49 <karolherbst> the one situation where changing the renderer makes sense if you e.g. want to build your compositor in the way, that you have multiple rendering context per display/GPU so you won't have to do the render on discrete GPU -> composite on integrated GPU -> scanout on discrete GPU round trip and stay local to one GPU. So if you move a window from one GPU to another, the compositor _could_ ask the applications to switch the renderer as well to

12:49 <jadahl> (even if it's connected already etc)

12:49 <karolherbst> save on e.g. PCIe bandwidth which is a significant bottleneck with higher resolutions e.g.

12:49 <Company> jadahl: I was wondering about the dmabufs

12:49 <jadahl> Company: the compositor would switch main device, and the clients would need to come up with new buffers

12:49 <DemiMarie> karolherbst: how significant a bottleneck?

12:50 <karolherbst> depends on a looot of things

12:50 <Company> jadahl: right, so you'd potentially be left without buffers for surfaces for a (likely short) while

12:50 <karolherbst> I was playing around with some PCIe link things in nouveau a few years back and I saw differences of over 25%

12:50 <karolherbst> in fps numbers

12:50 <jadahl> karolherbst: or gpu hotplug a beefy one

12:50 <karolherbst> yeah. or that

12:51 <karolherbst> but games usually don't support it, so they would just stick with one

12:51 <mclasen> jadahl: you could give mutter some config to turn off the wait

12:51 <jadahl> Company: indeed, one would need to wait for a little bit to avoid avoidable glitches

12:51 <karolherbst> but the point is, that the round trip to the integrated GPU causes bottlenecks on the PCIe link

12:51 <karolherbst> (but to fix this you'd probably have to rewrite almost all compositors)

12:51 <jadahl> mclasen: how would one set that automatically ?

12:51 <mclasen> you won't

12:51 kts has quit [Quit: Leaving]

12:51 <zamundaaa[m]> Demi: I think Company meant to say they're writing a renderer, not a driver

12:51 <DemiMarie> karolherbst: does it really take a full rewrite?

12:52 <mclasen> but a user who cares about fast booting without a gpu could set it

12:52 <karolherbst> not a full one

12:52 <karolherbst> but like instead of having one rendering context for all displays, you need to be more dynamic

12:52 <zamundaaa[m]> amdgpu does support recovering from GPU resets, though it's not completely 100% reliable

12:52 <DemiMarie> zamundaaa: would you mind explaining further?

12:52 <karolherbst> and that can cause quit significiant reworks

12:52 Hypfer is now known as Guest4715

12:52 <karolherbst> *quite

12:52 <karolherbst> and then decide where something should be rendered where

12:52 <zamundaaa[m]> Demi: in some situations, recovery just fails for some reason

12:52 leandrohrb56 has joined #wayland

12:52 <jadahl> mclasen: sure. it's unfortunate this seems to be needed :(

12:52 <zamundaaa[m]> I don't know the exact details

12:52 Guest4714 has quit [Ping timeout: 480 seconds]

12:53 <DemiMarie> I think the hard solution is the best option

12:53 <mclasen> jadahl: yeah, after all these years, booting is still a problem :(

12:53 <DemiMarie> At least if there is enough resources to do it.

12:53 <karolherbst> but yeah.. the PCIe situation with eGPUs is even worse, because you usually don't have a x8/x16 connection, but x4

12:54 <Company> random data: I lose ~10% performance by moving my GTK benchmark to the monitor that is connected to the iGPU

12:55 <Company> which makes no sense because the screen updates only 60x per second anyway, not the 2000x that the benchmark updates itself

12:55 <Company> but it drops from 2050fps to 1850fps

12:56 <Company> probably overhead because the dGPU has to copy the frame to the iGPU

12:56 <karolherbst> if rendering on the discrete GPU?

12:56 <Company> yeah, GTK stays on the dGPU

12:56 <karolherbst> but yeah.. if you have more load on the PCIe bus, command submission will also be slower probably

12:56 <Company> it's just Mutter having to shuffle the buffer from the dGPU to the iGPU

12:57 Guest4715 has quit [Ping timeout: 480 seconds]

12:57 <Company> we have ~150k data per frame, at 2000fps that's 300MB/s

12:57 <karolherbst> I've played around with changing the PCIe bus speed in nouveau when I've done the reclocking work. On desktops none of that mattered up, single digit perf gains at most, but on a laptop it was absoultely brutal how much faster things went

12:57 Hypfer has joined #wayland

12:57 <Company> actually, probably more because that's just vertex buffer + texture data, not the commands

12:57 <Company> this is on a desktop

12:58 <karolherbst> oh I mean desktop as in single GPU

12:58 <Company> Radeon 6500 dGPU and whatever is in the Ryzen 5700G as the iGPU

12:59 <Company> but the speeds for reading data from the dGPU are slooooow anyway

12:59 <DemiMarie> Even when using DMA?

12:59 <karolherbst> it apparently also matters enough that OS add features so that a dGPU can claim an entire display for itself so no round trips what so ever happen and it matters

12:59 <karolherbst> DMA is still using PCIe

12:59 iconoclasthero has joined #wayland

13:00 <karolherbst> you can only push soo much data over PCIe

13:00 <DemiMarie> Can't one usually push quite a few buffers?

13:00 <Company> DemiMarie: I worked on dmabuf-to-cpu-memory stuff recently and it takes ~200ms to copy at 8kx8k image from the GPU

13:00 <karolherbst> PCIe 4.0 x16 is like 32GiB/s

13:00 <Company> note: *from* the GPU, not *to* the GPU

13:00 <karolherbst> VRAM can be like.. 1TiB/s

13:01 <DemiMarie> Company: 8K is way out of scope for now

13:01 <Company> yeah, but that's still a lot less than I expected

13:01 <DemiMarie> For my work at least

13:01 <Company> PCIe does 32GB/s, this is more like 1GB/s

13:01 <DemiMarie> Seems like a driver bug or hardware bug worth reporting.

13:02 <Company> and a 30x difference in speeds is noticable

13:02 <karolherbst> but yeah.. if somebody wants to experiment with splitting compositing between all the GPUs/displays and make apps not have to round-trip to the iGPU, that would be a super interesting experiment to see

13:02 <DemiMarie> I mean I want to initially round-trip via shm buffers initially, because it makes the validation logic so much simpler.

13:02 <Company> karolherbst: before any of that, I need to support switching GPUs in GTK ;)

13:03 <karolherbst> :')

13:03 glennk has joined #wayland

13:03 <DemiMarie> But that is because I am starting with software rendering as the baseline.

13:03 lanodan has joined #wayland

13:03 <karolherbst> could be a new protocol where the compositor tells clients to change them, or where it simply causes the GL context to go "context_lost" or something, but yeah....

13:04 <karolherbst> I'd really be interested in anybody investigating this area

13:04 <Company> the linux-dmabuf tranches tell you which GPU to prefer, no?

13:04 <Company> I mean, ideally, with a supporting compositor

13:05 <karolherbst> I mean as in dynamically switching

13:05 <YaLTeR[m]> cosmic-comp does the split rendering from what i understand (each GPU renders the outputs it's presenting), though i believe GPU selection happens via separate Wayland sockets where a given GPU is advertised

13:05 <karolherbst> like maybe you disconnect your AC and the compositor forces all apps to go the the iGPU

13:05 <Company> karolherbst: I'd expect to get new tranches

13:05 <karolherbst> YaLTeR[m]: ohh interesting, I should check that out

13:06 <Company> YaLTeR[m]: the problem with that is that I suspect the dGPU is still faster for that output, so just using the GPU that the monitor is conencted to might not be what's best

13:06 <Company> also: people connect their monitors to the wrong GPU all the time

13:07 <Company> there's lots of reddit posts about that

13:07 <karolherbst> yeah.. but the pcie round-trip overhead could be worse

13:07 <YaLTeR[m]> it's actually not that hard to do with smithay infra (in general render with an arbitrary GPU). But it makes doing custom rendering stuff somewhat annoying

13:07 <YaLTeR[m]> Company: a random half of the usb-C ports on my laptop connect to the igpu and the other half to the dgpu

13:08 <YaLTeR[m]> certainly makes it convenient to test multi GPU issues :p

13:08 <karolherbst> oh yeah.. my laptop is USB-C -> iGPU all normal connectors -> dGPU

13:08 <Company> yay

13:08 <karolherbst> there are apparently also laptops where you can flip it

13:08 <Company> I only have my setup for testing

13:09 <karolherbst> and then there are laptops which have eDP on both GPUs and you can move the internal display to the other GPU

13:09 <Company> like what i did 10 minutes ago

13:09 <karolherbst> at runtime

13:09 <YaLTeR[m]> I have that too

13:09 <YaLTeR[m]> Not at runtime tho I don't think

13:09 <karolherbst> you can even make the transition look almost seamingless if you use PSR while the transtion is happening

13:10 iconoclasthero has quit [Ping timeout: 480 seconds]

13:10 <Company> karolherbst: fwiw, changing GPUs in GTK would not be too hard to implement (at least with Vulkan) - but I've never seen a need for it

13:10 <karolherbst> I know that some people were interested in getting the eDP GPU switch working

13:10 <karolherbst> yeah.. it might not matter much for gtk apps. Maybe more for apps who also use GL/VK themselves for heavy rendering

13:11 <karolherbst> and then AC -> move to dGPU, disconnect AC -> move to iGPU

13:12 <Company> (there are GTK apps that do heavy rendering)

13:12 <karolherbst> but my setup is already cursed an the dual 4K setup causing the iGPU heavy suffering

13:12 <karolherbst> and apps just ain't at 60fps all the time

13:12 <Company> that can easily be the app

13:13 <karolherbst> (or gnome-shell even)

13:13 <Company> because software rendering at 4k gets to its limits

13:13 <karolherbst> but yeah... dual 4K is heavy

13:13 lsd|2 has quit [Quit: KVIrc 5.2.2 Quasar http://www.kvirc.net/]

13:13 <linkmauve> vnd, Weston’s current version is 14, so 8 or 10 are very out of date and likely unsupported, you probably should upgrade that first. I don’t know if the current version supports plane offloading better on your SoC though.

13:13 <Company> plus, software rendering has to fight the app for CPU time

13:14 <karolherbst> sure, but it's hardware rendering here

13:14 <Company> hardware rendering at 4k is fine - at least for GTK apps

13:14 <karolherbst> also on a small intel GPU?

13:15 <Eighth_Doctor> karolherbst: the framework's all-usb-c connections make testing USB-C to GPU pretty easy

13:15 <Company> it should be

13:15 <karolherbst> yeah... but it isn't always here :)

13:15 <Company> not sure how smallthough

13:15 <Eighth_Doctor> and oh my god it's so damn hard to find a good dock that works reliably

13:15 <karolherbst> well it's not terrible

13:15 <karolherbst> but definetly not smooth

13:15 <Eighth_Doctor> a friend of mine and I went through 4 docks from different vendors and none of them worked as advertised because of different quirks with each one

13:16 <karolherbst> but a lot of it is also gnome-shell, and sometimes also just GPU/CPU not clocking up quickly enough

13:16 <Eighth_Doctor> I just want a dock that works 😭

13:16 <karolherbst> as they starve each other

13:16 nerdopolis has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

13:16 <karolherbst> but I suspect that's a different issue and not necessarily only perf related

13:16 <Company> probably

13:16 nerdopolis has joined #wayland

13:17 <Company> my Tigerlake at 4k gets around 600fps - so I'd expect an older GPU to halve that and a more demanding GTK app to halve that again

13:17 <nerdopolis> I think with the simpledrm case it is somewhat harder in some ways as it's not a GPU reset, but /dev/dri/card0 just completely goes away instead

13:17 <karolherbst> I think it's just all things together here

13:18 <Company> if you then do full redraws on 2 monitors with it, you get close to 60fps

13:18 <karolherbst> e.g. the gnome window overview puts the GPU at like 60% load, but is still not smooth

13:18 <Company> I learned recently that that's usually too many flushes

13:18 <karolherbst> but no idea what's going on there, nor did I check, nor do I think it's related to where things are rendered. Though I can imagine a laptop on AC doing it on the dGPU could speed things up

13:18 <karolherbst> yeah.. could be

13:19 <Company> solution: use Vulkan, there you can just never flush and have even more lag!

13:19 <karolherbst> anyway.. I think it totally makes sense to experiment more with how this all works on dual GPU setups. The question is just how much does it actually matter

13:19 <karolherbst> heh

13:20 <karolherbst> like if a laptop could move entirely to the dGPU, invcluding the desktop and all displays, it could make the experience smoother on insane setups (imagine like 8K displays)

13:20 <karolherbst> and also driving the internal display via the dGPU

13:20 <Company> the problem with that is that you guys screwed up the APIs so much

13:20 <karolherbst> heh

13:20 <zamundaaa[m]> karolherbst: I'm sometimes using an eGPU connected to a 5120x1440@120 display. Without triple buffering, the experience was *terrible*

13:21 <Company> that app devs don't want to touch multi-gpu

13:21 <karolherbst> right...

13:21 <Company> Vulkan is much nicer there

13:21 <karolherbst> yeah with GL it's a mess

13:21 <Company> but everyone but me seems stuck on GL

13:21 <karolherbst> that's why I was wondering if a wayland protocol could help here and the compositor signals via it what GPU to use

13:22 <karolherbst> and the apps only get "you recreate your rendering context now and don't care about the details"

13:22 <karolherbst> and then it magically uses the other GPU

13:22 <Company> what people really want is the GLContext magically doing the right thing

13:22 <karolherbst> yeah.. that's somewhat how it works on macos for like 15 years already

13:23 <karolherbst> they get an event telling them to recreate their rendering stuff

13:23 <karolherbst> and that's basically it

13:23 <Company> that's somewhat complicated though, because that needs fbconfig negotiation and all that

13:23 <karolherbst> (which also means they have to requery capabilities, and because of that and other reasons it's opt-in)

13:24 <karolherbst> or maybe it's opt-out now, dunno

13:24 <Company> too much work for too little benefit I think

13:24 <karolherbst> probably

13:25 <karolherbst> but as I said, if somebody wants to experiment with all of this and comes around with "look, this makes everything sooper dooper smooth, and games run at 20% more fps" that would certainly be a data point

13:25 <Company> it would

13:25 <Company> and I bet it would only work on 1 piece of hardware

13:25 <Company> and a different laptop, probably from the same vendor, would get 20% slower with the same code

13:25 <karolherbst> maybe

13:26 <karolherbst> maybe it doesn't matter much at all

13:26 <Company> I think it does matter on some setups

13:26 <karolherbst> and then eGPUs never became a huge think and dual GPU laptops are also icky enough so a lot of people avoid them

13:26 <Company> because you want to go to the dgpu when not on battery but stay on the igpu on battery

13:26 <karolherbst> yeah

13:27 <Company> first I need to make gnome-maps use the GPU for rendering the map

13:27 <Company> so I have something that can hammer the GPU

13:28 <Company> then I'll look at switching between different ones

13:28 <jadahl> karolherbst: there is already a 'main device' event that allows the compositor to communicate what gpu to use for non-scanout

13:28 <Company> I don't think HDR conversions are enough

13:29 <karolherbst> jadahl: that's for startup only or also dynamically at runtime?

13:30 <jadahl> its dynamic

13:30 <karolherbst> ahh, I see

13:31 <Company> https://wayland.app/protocols/linux-dmabuf-v1#zwp_linux_dmabuf_feedback_v1:event:main_device

13:34 <zamundaaa[m]> karolherbst: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/268

13:35 <zamundaaa[m]> In some configurations, especially with eGPUs, the difference can be far larger than 20%

13:36 <karolherbst> yeah, I can imagine

13:37 coldfeet has quit [Remote host closed the connection]

13:38 <nerdopolis> Compositors might have to be changed to support the possibility of the primary GPU going away, correct?

13:40 <kennylevinsen> there isn't a concept of a primary GPU on the system, but applications that depend on a GPU - the display server included - need to do something to handle it going away

13:40 azerov has joined #wayland

13:47 <nerdopolis> I'm still more thinking in the case of simpledrm going away I guess. during the transition when simpledrm goes away and the new GPU device initializes...

13:49 <Company> usually there's 2 things involved: 1. bringing up the new GPU, and adapting to the changes in behavior (ie it may or may not support certain features) and 2. figuring out what to do with the data stored in the GPU's VRAM

13:51 Moprius has joined #wayland

13:51 Brainium has joined #wayland

13:52 <nerdopolis> I think at least things using simpledrm should all be using software rendering, correct?

13:54 Moprius has quit [Remote host closed the connection]

13:56 <jadahl> nerdopolis: yes. i guess in theory one could have an accelerated render node with no display controller part, where one renders with acceleration, but displays it via simpledrm, but that is probably not a very common setup

13:59 abhimanyu has quit [Quit: The Lounge - https://thelounge.chat]

14:00 abhimanyu has joined #wayland

14:02 <nerdopolis> Probably not, simpledrm is only there if the real driver hasn't loaded yet, OR they have some obscure card that is not supported by the kernel at all. I mean I guess I never tested the bootvga driver being unsupported, with the secondary device having a valid mode setting driver...

14:03 Hypfer has quit [Ping timeout: 480 seconds]

14:18 <emersion> it can happen

14:18 <emersion> e.g. nouveau doesn't have support for a newer card

14:18 <emersion> (yet)

14:18 <emersion> in general, any case where kernel is old and hw is new

14:20 <nerdopolis> Ah, that makes sense

14:29 iconoclasthero has joined #wayland

14:33 <MrCooper> geez, you leave for a couple of hours of grocery shopping, and this channel explodes :)

14:35 <MrCooper> Company: we get a fair number of bug reports about AMD GPU hangs against Xwayland (and probably other innocent projects), thanks to radeonsi questionably killing the process after a GPU reset if there's a non-robust GL context

14:35 <MrCooper> Company DemiMarie: AMD GPU resets generally work fine, the issue is that most user space can't survive GPU resets yet

14:36 <Company> I have no idea how you'd want to handle GPU resets in general

14:36 <DemiMarie> Company: Recreate all GPU-side state.

14:36 <Company> like, you'd need to guarantee there's no critical data on the GPU

14:37 <MrCooper> Company: mutter should copy buffer data across PCIe only once per output frame, not per client frame

14:37 <DemiMarie> Exactly

14:37 <MrCooper> Company: in a nutshell, throw away the GPU context and create a new one

14:37 <DemiMarie> Having critical data on the GPU is a misdesign.

14:38 <Company> is it?

14:38 <kennylevinsen> Apps will have to rerender and submit new frames, compositor will need to rerender and have windows be black until it gets new frames...

14:38 <Company> so, let's assume I have a drawing app - do I need to replicate the drawing on the CPU anticipating a reset?

14:38 <DemiMarie> Company: no, you just rerender everything

14:38 <Company> or can I do the drawing on the GPU until the user saves their document?

14:40 <Company> I mean, for a compositor that's easy - you just tell all the apps to send you a new buffer

14:40 <Company> because there's no critical data on the GPU

14:40 <Company> but if you have part of the application's document on the GPU?

14:40 <DemiMarie> Company: Don't drawing apps typically keep some state beyond the bitmap?

14:40 <MrCooper> except there's no mechanism for that yet

14:40 <kennylevinsen> even for an app I would expect that the state that drove rendering exists in system memory to allow a later rerender

14:40 <DemiMarie> That would be a misdesign

14:40 <DemiMarie> kennylevinsen: exactly

14:40 <Company> kennylevinsen: esepcially with compute becoming more common, I'd expect that to not be the case

14:40 <zamundaaa[m]> MrCooper: when a GPU reset happens, apps that are GPU accelerated will know on their own to reallocate

14:41 <zamundaaa[m]> Company: in some cases, data could get lost, yes

14:41 <DemiMarie> Company: then rerun the compute job

14:41 <kennylevinsen> DemiMarie: I don't think it's appropriate to call it a misdesign per say, there could be uses where the caveat of state being lost on GPU reset is acceptable

14:41 <zamundaaa[m]> Just like with the application or PC crashing for any other reason, apps should do regular saving / backups

14:41 <kennylevinsen> I just don't expect that to generally be the case

14:41 <MrCooper> zamundaaa[m]: the app needs to actively handle it, the vast majority don't

14:41 <DemiMarie> Generally, you should preserve the inputs of what went into the computation until the output is safely in CPU memory or on disk

14:41 <kennylevinsen> "safely in CPU memory" heh

14:41 <DemiMarie> Which is a bug in most apps

14:42 <zamundaaa[m]> MrCooper: yes, but in that case, requesting a new buffer is useless anyways

14:42 <DemiMarie> kennylevinsen: you get what I mean

14:42 <MrCooper> that's a separate issue

14:42 <DemiMarie> So yes, GTK should be able to recover from GPU resets.

14:42 <zamundaaa[m]> how so? If the app handles the GPU reset, it can just submit a new buffer to the compositor after recovering from one

14:43 <MrCooper> zamundaaa[m]: if the compositor recovers from the reset after a client, it might not be able to use the last attached client buffer anymore, in which case it would need to ask for a new one

14:43 <zamundaaa[m]> The only reason some compositors need to request new buffers from apps is that they release wl_shm buffers after uploading the content to the GPU

14:44 <zamundaaa[m]> MrCooper: right, if the compositor wants to explicitly avoid using possibly tainted buffers

14:44 <zamundaaa[m]> or rather, buffers with garbage content

14:45 <MrCooper> that's not the issue, it's not being able to access the contents anymore

14:45 <zamundaaa[m]> MrCooper: it can access the contents just fine

14:45 <MrCooper> not sure it's really needed though, keeping around the dma-buf fds might be enough

14:46 <zamundaaa[m]> Not the original one, if the buffer is from before the GPU reset, but it can still read from the buffers and get something as the result

14:46 <DemiMarie> zamundaaa: is that true on all GPUs?

14:46 <DemiMarie> I would not be surprised if that just caused another GPU fault.

14:47 <Company> my main problem is that GTK wants to use immutable objects that do not change once created - and a GPU reset changes those objects

14:47 <Company> so now you need a method to recover from immutable objects mutating

14:48 <Company> which is kinda like wl_buffer

14:48 <DemiMarie> Company: Can you make each object transparently recreate its own GPU state?

14:48 <Company> which is suddenly no longer immutable either because the GPU just decided it's bad now

14:48 <DemiMarie> Or recreate everything from the API objects?

14:49 <Company> DemiMarie: not if it's a GL texture object

14:49 <Company> and no idea about dmabuf texture objects

14:49 <DemiMarie> Company: those are not immutable

14:49 <Company> and even if I could recreate them, they'd suddenly have new sunc points

14:49 <DemiMarie> anything that is on the GPU is mutable

14:49 <kennylevinsen> the world would be much nicer if GPUs didn't reset :)

14:50 <Company> DemiMarie: those are immutable in GTK per API contract - just like wl_buffers

14:50 <DemiMarie> Company: that seems like an API bug then

14:50 <llyyr> you dont need to deal with gpu resets if you don't do hardware acceleration

14:50 <DemiMarie> Apps need to recreate GPU buffers if needed

14:50 <psykose> you don't need to deal with software if you don't have hardware yea

14:51 <DemiMarie> kennylevinsen: I believe Intel GPUs come close. They guarantee that a fault will not affect non-faulting contexts unless there are kernel driver or firmware bugs.

14:51 <zamundaaa[m]> Demi: I *think* that it's a guarantee drivers with robust memory access have to make

14:51 <Company> DemiMarie: that's the question - you can decide that things are mutable, but then suddenly eveyrthing becomes mutable and you have a huge amount of code to write

14:52 <DemiMarie> Company: that seems like the price of hardware acceleration to me

14:52 <kennylevinsen> DemiMarie: I imagine the cause of resets is generally such bugs, so not sure how helpful that guarantee is

14:52 <kennylevinsen> but amdgpu does have above-average reset occurrence

14:52 <Company> DemiMarie: same thing about mmap() - the kernel could just mutate your memory and send you a signal so you need to recreate it - why not?

14:52 <DemiMarie> kennylevinsen: On many GPUs userspace bugs can bring down the whole GPU.

14:53 <DemiMarie> Company: because CPUs provide proper software fault containment

14:53 <Company> DemiMarie: I don't think that's a useful design though - I think a useful design is one where the kernel doesn't randomly fuck with memory

14:53 <Company> DemiMarie: so make it happen on the GPU

14:54 <DemiMarie> Company: Complain to the driver writers and hardware vendors, not me.

14:54 <Company> I am

14:54 <DemiMarie> Via which channels.

14:54 <DemiMarie> ?

14:55 <Company> but I think it's fine if I just write my code assuming those things can't happen and wait for hardware to fix their stuff

14:55 <kennylevinsen> Company: currently, resets happen Quite Often™ on consumer hardware

14:55 <Company> instead of designing an overly complex API working around that misdesign

14:55 <kennylevinsen> so you probably have to expect them for now

14:55 <DemiMarie> I can say that on some GPUs, you may be able to get that guarantee at a performance penalty, because no more than one context will be able to use the GPU at a time.

14:56 <Company> kennylevinsen: not really - people complain way more about other things

14:56 <DemiMarie> kennylevinsen: how often do you see them on Intel?

14:57 checkfoc_us9 has quit []

14:58 <Company> there's also this tendency of the lower layer developers to just punt all their errors to the higher layers and then blame those devs for not handling them

14:58 <Company> which is also not helpful

14:58 <kennylevinsen> there is indeed an issue with hardware issues getting pushed to software, but we tend to get stuck dealing with the hardware we got as our users have it

14:58 <Company> "the application should just handle it" is a very good excuse

14:59 <Company> my favorite example of that is still malloc()

14:59 checkfoc_us9 has joined #wayland

15:00 <kennylevinsen> DemiMarie: I'm not a hardware reliability database - anecdotally, I have only seen a few i915 resets, but have had on amdgpu where opening chrome or vscode would cause a reset within 10-30 minutes, which was painful before sway handled resets

15:00 <DemiMarie> Company: GPU hardware makes fault containment much harder than CPU hardware does.

15:00 tzimmermann has quit [Quit: Leaving]

15:00 <DemiMarie> kennylevinsen: that makes sense

15:01 <kennylevinsen> s/have had/have had periods/

15:01 <Company> on Intel, a DEVICE_LOST because of my Vulkan skills don't reset the whole GPU

15:01 <Company> on AMD, a DEVICE_LOST makes me reboot

15:01 <DemiMarie> Company: that's what I expect

15:01 <kennylevinsen> here, DEVICE_LOST just causes apps that don't handle context resets to exit

15:02 <zamundaaa[m]> kennylevinsen: about "amdgpu does have above-average reset occurrence", not so fun fact, amdgpu GPU resets are the currently third most common crash reason we get reported for plasmashell

15:02 <kennylevinsen> but to amd's credit, resets have reduced and they also appear to resort to context loss less often?

15:03 <kennylevinsen> zamundaaa[m]: dang

15:03 <Company> dunno, I write code that doesn't lose devices

15:03 <Company> I don't want to reboot ;)

15:03 <zamundaaa[m]> kennylevinsen: in my experience, GPU resets are at least recovered from correctly lately

15:03 <zamundaaa[m]> While plasmashell may crash, KWin recovers, and some other apps do as well

15:04 <DemiMarie> zamundaaa: what are the first two?

15:04 <kennylevinsen> zamundaaa[m]: it was a huge user experience improvement when sway grew support for handling context loss

15:04 <zamundaaa[m]> Xwayland's the bigger problem

15:04 <zamundaaa[m]> Demi: something Neon specific, and something X11 specific

15:05 <kennylevinsen> hmm yeah, losing xwayland is more jarring even if relaunched

15:05 <DemiMarie> Company: in theory, I agree that GPUs should be more robust. In practice, GTK should deal with device loss if it doesn't want bad UX.

15:05 <zamundaaa[m]> kennylevinsen: it's worse. Sometimes KWin hangs in some xcb function when Xwayland kicks the bucket

15:05 <DemiMarie> zamundaaa: Neon?

15:06 <kennylevinsen> oof

15:06 <Company> DemiMarie: I think it's not important enough for me to care about - and I expect it to get less important over time

15:06 <zamundaaa[m]> Demi: KDE Neon had too old Pipewire, which had some bug or something. I don't know the whole story, but it should be solved as users migrate to the next update

15:06 <Company> DemiMarie: but if someone wants to write patches improving things - go ahead

15:09 <Company> same thing with malloc() btw - Gnome still aborts on malloc failure just like it did 20 years ago. I'm sure that could be improved but nobody has bothered yet

15:10 <DemiMarie> zamundaaa: AMD is working on process isolation, which will hopefully make things better, but it will be off by default unless distros decide otherwise.

15:11 <Company> AMD turns everything off that may make fps go down

15:13 <nerdopolis> I feel like the case where the main driver is slow to load, and the login manager greeter display server uses simpledrm, and then gets stuck in limbo when the kernel kicks it out is starting to be more common too

15:14 <nerdopolis> kwin handles it the best because of kwin_wayland_wrapper, (but only Qt applications so far)

15:15 <nerdopolis> other display servers like Weston, they hang when I boot with modprobe.blacklist virtio_gpu have them start with simpledrm and then modprobe virtio-gpu

15:15 rgallaispou has quit [Read error: Connection reset by peer]

15:16 <kennylevinsen> amdgpu being as slow to load as it is should also really be fixed...

15:19 <jadahl> kennylevinsen: I'd also like a generic "i'm gonna start trying to load a gfx driver now" signalling so one can conditionalize waiting in userspace on that

15:20 <jadahl> but it seems non-trivial to make such a thing possible

15:23 <MrCooper> Company: malloc never fails, for a very good approximation of "never"

15:23 kenny1 has quit [Ping timeout: 480 seconds]

15:24 <DemiMarie> kennylevinsen: what makes it so slow?

15:24 <kennylevinsen> you'd have to ask amdgpu devs that question

15:24 <kennylevinsen> firmware loading perhaps?

15:25 <Company> MrCooper: that took a while though - 25 years ago when glib started aborting, malloc() did fail way more

15:26 <MrCooper> DemiMarie: one big issue ATM is that the amdgpu kernel module is humongous, so just loading it and applying code relocations takes a long time

15:27 <MrCooper> Company: I'll have to take your word for it, can't remember ever seeing it fail in the 25 years I've been using Linux

15:27 <MrCooper> of course one can make it fail by disabling overcommit, that likely results in a very bad experience though

15:28 <DemiMarie> MrCooper: why the kernel module so large? Is it because of the huge amount of copy-and-pasted code between versions?

15:28 <Company> MrCooper: when I worked on GStreamer in the early 2000s, I saw that happen sometimes

15:28 <Company> also because multimedia back then took lots of memory

15:28 <MrCooper> DemiMarie: mostly because it supports a huge variety of HW

15:29 <Company> *lots of memory relative to system memory

15:30 <DemiMarie> Also I wonder if in some cases the initramfs will only have simpledrm, with the hardware-specific drivers only available once the root filesystem loads.

15:38 <nerdopolis> I think there is one distro that actually does that with the initramfs, I could be wrong

15:40 <nerdopolis> DemiMarie: And I think it does it if the volume is not encrypted or something, so it's not all installs. I think its ubuntu, but don't quote me on that

15:45 kts has joined #wayland

15:49 <nerdopolis> Yeah it is Ubuntu that sometimes doesn't load modesetting drivers in the initrd https://github.com/systemd/systemd/issues/3259

16:04 kenny has joined #wayland

16:15 plutuniun has quit [Remote host closed the connection]

16:20 kenny has quit [Ping timeout: 480 seconds]

16:35 iconoclasthero has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

16:40 kenny has joined #wayland

16:48 lsd|2 has joined #wayland

16:51 coldfeet has joined #wayland

16:56 ___nick___ has quit [Ping timeout: 480 seconds]

17:02 ___nick___ has joined #wayland

17:30 Company has quit [Remote host closed the connection]

17:32 leon-anavi has quit [Remote host closed the connection]

17:37 kts has quit [Quit: Leaving]

17:42 <wlb> wayland-protocols Merge request !342 opened by () governance: introduce workflow improvements https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/342 [governance], [In 30 day discussion period]

18:04 kestrel has quit [Quit: The Lounge - https://thelounge.chat]

18:19 plutuniom has joined #wayland

18:21 FreeFull has quit [Quit: tmux update]

18:22 FreeFull has joined #wayland

18:23 agd5f_ has joined #wayland

18:25 cyrinux has quit []

18:25 cyrinux has joined #wayland

18:28 eruditehermit has joined #wayland

18:29 agd5f has quit [Ping timeout: 480 seconds]

18:37 lsd|2 has quit [Quit: KVIrc 5.2.2 Quasar http://www.kvirc.net/]

19:08 iomari891 has joined #wayland

19:16 iomari891 has quit [Ping timeout: 480 seconds]

19:32 yaslam has quit [Read error: Connection reset by peer]

19:39 coldfeet has quit [Quit: leaving]

20:03 ___nick___ has quit [Remote host closed the connection]

20:06 coldfeet has joined #wayland

20:07 coldfeet has quit []

20:07 coldfeet has joined #wayland

20:14 coldfeet has quit [Quit: Leaving]

20:15 coldfeet has joined #wayland

20:17 coldfeet has quit []

20:18 coldfeet has joined #wayland

20:23 yaslam has joined #wayland

20:42 lsd|2 has joined #wayland

20:45 coldfeet has quit [Quit: Leaving]

21:05 yrlf has quit [Ping timeout: 480 seconds]

21:14 yrlf has joined #wayland

21:17 mripard has quit [Quit: mripard]

21:35 gryffus has joined #wayland

21:38 abhimanyu has quit [Quit: The Lounge - https://thelounge.chat]

21:50 agd5f_ has quit [Read error: Connection reset by peer]

21:52 rasterman has quit [Quit: Gettin' stinky!]

21:52 rv1sr has quit []

21:54 vincejv has quit [Ping timeout: 480 seconds]

21:57 agd5f has joined #wayland

22:00 fmuellner has joined #wayland

22:04 glennk has quit [Ping timeout: 480 seconds]

22:06 vincejv has joined #wayland

22:45 sima has quit [Ping timeout: 480 seconds]

23:00 nerdopolis has quit [Ping timeout: 480 seconds]

23:11 lsd|2 has quit [Quit: KVIrc 5.2.2 Quasar http://www.kvirc.net/]

23:22 feaneron has quit [Quit: feaneron]

23:52 iomari891 has joined #wayland