#dri-devel on 2023-09-06 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:03 Company has quit [Quit: Leaving]

00:11 <DemiMarie> Is there a “how can I get started in Linux graphics” document anywhere? Asking because Qubes might be interested in Intel SR-IOV, Xen virtio-GPU native contexts, or some combination.

00:11 <DemiMarie> Is it safe to assume that if (unprivileged) userspace can freeze the GPU and the kernel fails to recover from it that is a kernel bug?

00:15 youmukon1 has joined #dri-devel

00:17 YuGiOhJCJ has joined #dri-devel

00:20 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

00:26 YuGiOhJCJ has quit [Remote host closed the connection]

00:27 YuGiOhJCJ has joined #dri-devel

00:37 ngcortes has quit [Ping timeout: 480 seconds]

00:47 co1umbarius has joined #dri-devel

00:48 columbarius has quit [Ping timeout: 480 seconds]

00:52 flynnjiang has joined #dri-devel

00:56 youmukonpaku1337 has joined #dri-devel

01:00 DragoonAethis has quit [Quit: hej-hej!]

01:00 DragoonAethis has joined #dri-devel

01:00 youmukon1 has quit [Ping timeout: 480 seconds]

01:01 mvchtz has quit [Quit: WeeChat 3.5]

01:02 mvchtz has joined #dri-devel

01:06 jewins has quit [Ping timeout: 480 seconds]

01:09 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

01:49 yuq825 has joined #dri-devel

02:08 flynnjiang has quit [Remote host closed the connection]

02:35 ayaka_ has joined #dri-devel

02:42 ayaka_ has quit [Remote host closed the connection]

02:45 <alyssa> karolherbst: I think the next big compute stack should be rusticl on zink on radv with the amd llvm backend

02:45 <alyssa> LLVM -> SPIRV -> NIR -> SPIRV -> NIR -> LLVM

02:45 <alyssa> :P

02:59 <HdkR> How many layers of translation can we go?

03:09 luc has joined #dri-devel

03:10 dri-logger has joined #dri-devel

03:10 mareko_ has joined #dri-devel

03:12 dri-logg1r has quit [Ping timeout: 480 seconds]

03:12 glisse has quit [Ping timeout: 480 seconds]

03:12 mareko has quit [Ping timeout: 480 seconds]

03:15 glisse has joined #dri-devel

03:18 <luc> hi all, discussions on the mailing list made by replying to the **cover letter** of a patch series won't be recorded on https://patchwork.freedesktop.org/series/123257/, will they?

03:28 anarsoul|2 has quit [Read error: No route to host]

03:28 anarsoul has joined #dri-devel

03:32 tristan has joined #dri-devel

03:33 tristan is now known as Guest1985

04:09 yout has joined #dri-devel

04:12 yout has quit [Read error: Connection reset by peer]

04:19 Guest1985 has quit [Ping timeout: 480 seconds]

04:28 Duke`` has joined #dri-devel

04:29 yyds has joined #dri-devel

04:31 yyds has quit [Remote host closed the connection]

04:33 Haaninjo has joined #dri-devel

04:34 yyds has joined #dri-devel

04:38 Duke`` has quit [Ping timeout: 480 seconds]

05:00 fab has joined #dri-devel

05:13 sima has joined #dri-devel

05:16 itoral has joined #dri-devel

05:20 ohmltb^ has joined #dri-devel

05:24 kzd has quit [Ping timeout: 480 seconds]

05:24 bmodem has joined #dri-devel

05:35 i-garrison has quit []

05:36 i-garrison has joined #dri-devel

05:38 tzimmermann has joined #dri-devel

05:45 mauld has quit [Remote host closed the connection]

05:50 mauld has joined #dri-devel

05:57 mszyprow has joined #dri-devel

06:03 vjaquez has quit [Remote host closed the connection]

06:06 youmukonpaku1337 has joined #dri-devel

06:06 fab has quit [Quit: fab]

06:10 vjaquez has joined #dri-devel

06:19 crabbedhaloablut has joined #dri-devel

06:25 An0num0us has joined #dri-devel

06:33 <itoral> karolherbst: jasuarez told me you were having some trouble with fences in v3d, maybe I can assist you with that

06:35 frieder has joined #dri-devel

06:38 fab has joined #dri-devel

06:53 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

06:53 TMM has joined #dri-devel

06:55 mripard has joined #dri-devel

07:00 sghuge has quit [Remote host closed the connection]

07:00 sghuge has joined #dri-devel

07:04 donaldrobson_ has joined #dri-devel

07:05 donaldrobson has quit [Ping timeout: 480 seconds]

07:10 youmukonpaku1337 has quit [Remote host closed the connection]

07:10 youmukonpaku1337 has joined #dri-devel

07:13 lplc has joined #dri-devel

07:43 elongbug has joined #dri-devel

07:44 lynxeye has joined #dri-devel

07:45 flynnjiang has joined #dri-devel

07:49 swalker__ has joined #dri-devel

07:51 swalker_ has joined #dri-devel

07:51 swalker_ is now known as Guest2000

07:56 rasterman has joined #dri-devel

07:57 swalker__ has quit [Ping timeout: 480 seconds]

07:59 vliaskov has joined #dri-devel

08:02 <airlied> alyssa: okay I've slogged the functions MR to a place of great happiness :-) I'd appreciate when you next have time!

08:05 youmukonpaku1337 has quit [Remote host closed the connection]

08:05 youmukonpaku1337 has joined #dri-devel

08:06 itoral has quit [Remote host closed the connection]

08:06 itoral has joined #dri-devel

08:14 <pq> gildekel, Weston is unfortunate in that respect: it does not handle "link-status" property at all. It does have "HOTPLUG" uevent handling.

08:15 Leopold_ has quit [Remote host closed the connection]

08:17 <pq> gildekel, I don't know what would happen in Weston on link training failure. OTOH, it does handle connectors appearing and disappearing, but that's not what you're asking about.

08:17 rgallaispou has quit [Quit: Leaving.]

08:20 <pq> gildekel, I can't see Weston code reacting to mode list pruning in any way. It might be simply not implemented, so I'd guess it'd malfunction anyway already, so your kernel changes can't make it worse.

08:20 milek7 has quit [Remote host closed the connection]

08:22 Leopold_ has joined #dri-devel

08:24 rgallaispou has joined #dri-devel

08:26 Ahuj has joined #dri-devel

08:28 youmukon1 has joined #dri-devel

08:31 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

08:36 fab has quit [Read error: No route to host]

08:37 fab has joined #dri-devel

08:48 donaldrobson_ has quit [Ping timeout: 480 seconds]

08:49 donaldrobson has joined #dri-devel

08:50 YuGiOhJCJ has quit [Remote host closed the connection]

08:51 YuGiOhJCJ has joined #dri-devel

08:53 youmukonpaku1337 has joined #dri-devel

08:55 An0num0us has quit [Ping timeout: 480 seconds]

08:57 youmukon1 has quit [Read error: Connection reset by peer]

09:01 milek7 has joined #dri-devel

09:02 itoral has quit [Quit: Leaving]

09:09 fab has quit [Ping timeout: 480 seconds]

09:14 dos1 has quit [Ping timeout: 480 seconds]

09:15 luc has quit [Remote host closed the connection]

09:17 dos1 has joined #dri-devel

09:20 youmukonpaku1337 has quit [Remote host closed the connection]

09:21 youmukonpaku1337 has joined #dri-devel

09:21 <emersion> pq, most likely a monitor would go black and weston would continue to use it as-if nothing happened

09:21 <emersion> pq, doesn't weston reload the modelist on HOTPLUG=1?

09:22 <emersion> (wlroots doesn't, it only does when link-status=bad)

09:23 <emersion> pq, a link-status property change results in HOTPLUG=1 uevent

09:23 <emersion> with also CONNECTOR and PROPERTY set

09:23 <emersion> in the uevent

09:24 <emersion> so if you don't read CONNECTOR/PROPERTY, it just looks like a normal HOTPLUG=1 uevent

09:24 <emersion> the kernel uses HOTPLUG=1 as a generic way of informing userspace that something in the KMS state changed
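
A minimal sketch of the distinction emersion describes, assuming the uevent has already been read into a key/value dict (for example through libudev). The helper name `classify_drm_uevent` and its return strings are hypothetical; only the `HOTPLUG`, `CONNECTOR`, and `PROPERTY` keys come from the discussion above.

```python
def classify_drm_uevent(props):
    """Classify a DRM uevent from its key/value properties (hypothetical helper)."""
    if props.get("HOTPLUG") != "1":
        return "not-hotplug"
    connector = props.get("CONNECTOR")
    prop = props.get("PROPERTY")
    # When both IDs are present, the event points at one property on one
    # connector (e.g. link-status). Without them it is a generic
    # "something in the KMS state changed" notification.
    if connector is not None and prop is not None:
        return f"property-change connector={connector} property={prop}"
    return "generic-hotplug"


if __name__ == "__main__":
    print(classify_drm_uevent({"HOTPLUG": "1"}))
    print(classify_drm_uevent({"HOTPLUG": "1", "CONNECTOR": "95", "PROPERTY": "5"}))
```

A compositor that ignores `CONNECTOR` and `PROPERTY` would take the generic branch for every event, which is why a link-status change looks like a plain hotplug to it.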

09:28 youmukonpaku1337 has quit [Remote host closed the connection]

09:28 youmukonpaku1337 has joined #dri-devel

09:29 Haaninjo has quit [Quit: Ex-Chat]

09:38 <karolherbst> alyssa: obviously

09:39 <karolherbst> it might be what we have to use until aco can deal with function calls :D

09:39 <karolherbst> doing that in zink should be trivial

09:48 An0num0us has joined #dri-devel

09:56 swalker_ has joined #dri-devel

09:56 swalker_ is now known as Guest2011

09:58 Guest2011 has quit [Remote host closed the connection]

09:58 Guest2000 has quit [Remote host closed the connection]

10:02 youmukonpaku1337 has quit [Remote host closed the connection]

10:02 youmukonpaku1337 has joined #dri-devel

10:05 <pq> emersion, no, Weston does things backwards: it updates the mode list only after the frontend has already picked a mode, and that only happens when the frontend inits or changes the mode. *facepalm*

10:06 <pq> Weston does update pretty much everything *else* on hotplug event

10:06 <pq> and communicates the change to the frontend, so the frontend can choose what to do

10:06 <pq> just... not the modes

10:08 <pq> so, nothing to worry about from kernel side, Weston is already broken there

10:10 DottorLeo has joined #dri-devel

10:10 <DottorLeo> hi!

10:11 <DottorLeo> has someone successfully used Mesa on a phone? what is the best chipset supported by Mesa?

10:12 youmukonpaku1337 has quit [Remote host closed the connection]

10:12 youmukonpaku1337 has joined #dri-devel

10:12 mauld has quit [Ping timeout: 480 seconds]

10:15 <javierm> tzimmermann: I see that drivers/video/fbdev/au1100fb.c also has some logic when !defined(CONFIG_FRAMEBUFFER_CONSOLE) && defined(CONFIG_LOGO)

10:15 Company has joined #dri-devel

10:15 <javierm> tzimmermann: in its au1100fb_drv_remove() callback, it does a au1100fb_fb_blank(VESA_POWERDOWN, &fbdev->info)

10:15 <javierm> not sure why, because that driver doesn't show the logo like drivers/video/fbdev/au1200fb.c does

10:16 <javierm> tzimmermann: but maybe you drop that too in your series?

10:16 <pq> gildekel, *cough* https://gitlab.freedesktop.org/wayland/weston/-/issues/124

10:17 <javierm> tzimmermann: and drivers/video/fbdev/xilinxfb.c has the same logic in xilinxfb_release() but also doesn't show the logo, maybe a copy&paste thing between these old drivers?

10:20 youmukonpaku1337 has quit [Remote host closed the connection]

10:20 youmukonpaku1337 has joined #dri-devel

10:22 <tzimmermann> javierm, my cleanups were specifically for fb_prepare_logo() and fb_show_logo(), so that fb_logo.c can be made optional easily. can these other cleanups be sent out separately?

10:22 <javierm> tzimmermann: yes, sure

10:23 <javierm> I was mentioning just in case you missed those

10:26 <tzimmermann> why these drivers do this is not so clear to me. that code should be in the core, if any

10:26 <tzimmermann> javierm, the definition of the blanking constants is kinda confusing https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/fb.h#L303

10:28 youmukonpaku1337 has quit [Remote host closed the connection]

10:29 youmukonpaku1337 has joined #dri-devel

10:30 <pq> tzimmermann, I can guess what those mean in the signal, but I don't know why the H and V suspend modes exist.

10:31 <javierm> tzimmermann: yeah, me neither, but it is weird that they do it on their remove/release but nothing on driver probe

10:31 <javierm> that's why I think it is some kind of leftover or wrong copy&paste

10:33 itoral has joined #dri-devel

10:33 <tzimmermann> pq, i think this corresponds to DPMS levels. wasn't it such that the old CRTs would resume faster/slower on different levels?

10:33 <pq> the monitor of the original IBM PC comes to mind, which would let smoke out if HSYNC pin was left in a wrong state for a too long moment.

10:34 <pq> tzimmermann, yes, kind of.

10:34 <pq> I presume that if you keep the syncs flying, the CRT circuitry stays alive and warm, and the high voltage remains.

10:35 <pq> if you stop HSYNC, you lose high voltage, because usually the flyback transformer is driven directly from that, without an oscillator of its own.

10:36 <pq> getting the high voltage back may take a moment, but what takes a lot longer is if the CRT filament cools down.

10:36 <tzimmermann> "which would let smoke out if HSYNC pin was left in a wrong state" that's hilarious!

10:36 <pq> I don't what controls the CRT heater.

10:37 mauld has joined #dri-devel

10:37 <pq> *I don't know

10:37 <tzimmermann> thanks for the HW details

10:38 <pq> vacuum tubes! \o/

10:38 ap51 has joined #dri-devel

10:39 youmukonpaku1337 has quit [Remote host closed the connection]

10:40 youmukonpaku1337 has joined #dri-devel

10:40 <pq> filament... I mean the cathode

10:43 <ccr> similar kind of issue existed with certain Commodore PET models. you could, by manipulating some video register(s), cause damage to the CRT monitor.

10:43 <pq> yeah, the IBM PC monitor was something. That was before VGA. With VGA, it was still possible to let the smoke out of some monitors by sending a too high hsync frequency.

10:44 <pq> precisely because hsync was used to drive the flyback transistor, and too high freq. caused it to melt

10:45 <javierm> pq: I remember that a DPMS change caused a noisy click on my old CRT monitor

10:46 <javierm> in fact, every mode set was noisy IIRC

10:47 <pq> I guess relays, maybe protecting the above mentioned components. :-)

10:47 <pq> javierm, or was that accompanied with a vibrating image that then settled down?

10:47 <javierm> pq: I don't remember vibrating image, only the noise

10:47 <pq> ok

10:48 <javierm> unless the settle down was too quick for my eye to notice :)

10:48 <pq> degaussing was a fun feature of CRT monitors too

10:48 <javierm> tzimmermann: https://marcin.juszkiewicz.com.pl/2012/12/10/how-to-fry-speakers-in-your-chromebook/ is a more recent example

10:48 <javierm> of SW causing things to melt :)

10:49 <pq> isn't Asahi also needing to deal with explicit power control to avoid melting those speakers too?

10:50 <tzimmermann> javierm, "Is this howto useful? I think it is. Cause if you have device broken in some way and you want to get it replaced you can just run it and hope for replacement instead of repair." :D

10:51 <vsyrjala> iirc n900 had a pulseaudio filter to protect the speakers from unwanted frequencies. good luck if you wanted to not use pulse :/

10:52 <karolherbst> itoral: yeah.. so trying to bring up rusticl on v3d, but the compute job I'm enqueuing is never signaling the fence or something... I honestly don't exactly know what's happening, that's my patch so far: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/c3d472346704d1a95609f992128dc18a4352919e

10:53 youmukonpaku1337 has quit [Remote host closed the connection]

10:53 <karolherbst> I'm probably just missing something.. dunno

10:53 youmukonpaku1337 has joined #dri-devel

10:54 <karolherbst> I could also make a gallium trace if you think that would help you understand what's going on

10:55 <itoral> so what is happening exactly? you submit a compute job and it times out?

10:56 tristianc6704 has quit [Ping timeout: 480 seconds]

10:58 * ccr shakes fist at pulseaudio

11:00 <karolherbst> itoral: yes

11:00 <youmukonpaku1337> pulse audio more like pulse midio

11:01 <itoral> karolherbst: what do you see in dmesg?

11:01 <karolherbst> itoral: nothing

11:01 <itoral> nothing? uh, that is very weird

11:01 <karolherbst> I know..

11:01 <itoral> are you sure the job is being submitted?

11:01 <karolherbst> I suspect something with buffer tracking is off.. or context stuff

11:01 <karolherbst> yes

11:01 <karolherbst> and the ioctl doesn't return an error

11:02 <itoral> ugh, that is so weird... could you send an e-mail to me with instructions on how to replicate the issue? (assuming it is not too much work)

11:03 <itoral> in fact, if you create an issue in gitlab with the details it would be even better

11:03 <itoral> I can then try to replicate, and if I do, I figure I should be able to find what is happening

11:03 <karolherbst> I suspect it's something pipe_context related

11:04 <karolherbst> ehh wait...

11:04 <karolherbst> I booted it up again and now it works...

11:04 <itoral> uh, maybe the gpu was in a silly state :-/

11:04 <karolherbst> yeah....

11:04 <karolherbst> I think there were MMU faults in the dmesg, but they were triggered by other jobs? Maybe they messed up the GPU state I guess..

11:05 <itoral> yes, that could be

11:05 <karolherbst> v3d fec00000.v3d: MMU error from client L2T (0) at 0x3400, write violation, pte invalid

11:05 <karolherbst> v3d fec00000.v3d: MMU error from client L2T (0) at 0xa00, write violation, pte invalid

11:05 <karolherbst> that's what I had seen, but if one job triggering those messes up all the future ones that's kinda a problem for CTS runs :'(

11:06 <karolherbst> maybe it's also something else

11:06 <itoral> there is usually one of those on every boot... but you should only see that one

11:06 <karolherbst> this boot I see none

11:06 <itoral> if you are seeing more than that, and in particular, if you are seeing those when you run your workload then there is something wrong

11:06 <karolherbst> now I have to deal with other errors like "unknown NIR ALU inst: div 64 %76 = u2u64 %75" :'(

11:07 <karolherbst> yeah..

11:07 <karolherbst> I try to track down what's causing those errors

11:07 <itoral> ugh, we don't support 64bit alu :-(

11:07 <karolherbst> right...

11:07 <karolherbst> probably needs more int64 lowering

11:07 <karolherbst> but then it requires packing

11:07 <itoral> yep

11:08 <karolherbst> also.. the hw doesn't support fma?

11:08 <itoral> nope :-(

11:08 <karolherbst> :'(

11:08 <itoral> yeah...

11:08 <karolherbst> CL requires _precise_ fma :')

11:08 <karolherbst> so fma will be lowered to software

11:09 <itoral> right

11:09 <karolherbst> though I wonder if we can improve the lowering in libclc :D

11:09 <karolherbst> anyway..

11:09 <karolherbst> u2u64 lowering needs pack_64_2x32_split

11:11 <karolherbst> int64 support is optional though, and I don't know why emulated fma even needs u2u64...

11:12 sarahwalker has joined #dri-devel

11:14 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

11:15 <karolherbst> uhh.. the lowering stores the mantissa as a long

11:15 karolherbst is now known as karolherbst_

11:15 <karolherbst_> pain

11:15 karolherbst_ is now known as karolherbst

11:15 <itoral> there is a lower_pack_64_2x32_split right?

11:16 tristan has joined #dri-devel

11:16 <karolherbst> ohh indeed

11:16 tristan is now known as Guest2017

11:20 youmukonpaku1337 has quit [Remote host closed the connection]

11:20 youmukonpaku1337 has joined #dri-devel

11:23 <karolherbst> itoral: it stopped working :') and nothing in dmesg

11:24 ap51 has quit [Ping timeout: 480 seconds]

11:25 <karolherbst> ehh wait.. that's just the nir opts looping infinitely.. nvm

11:25 <karolherbst> why is int64 lowering so cursed

11:26 <itoral> ouch!

11:26 <karolherbst> added lower_int64 into my opt loop, because opt_algebraic's pack lowering adds u2u64

11:27 <karolherbst> and uhm.. it doesn't stop now

11:27 <karolherbst> :D

11:27 <karolherbst> ahh yes

11:27 <karolherbst> nir_pack_64_2x32_split lowering uses u2u64 and u2u64 lowering uses nir_pack_64_2x32_split
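
A toy model of why two lowerings that each emit the other's opcode never reach a fixed point. This is not NIR; the `RULES` table, the `lower_until_stable` helper, and the iteration cap are all assumptions made for illustration.

```python
# Toy model, not NIR: each rule rewrites one opcode into a list of opcodes.
RULES = {
    "pack_64_2x32_split": ["u2u64", "ishl", "ior"],
    "u2u64": ["pack_64_2x32_split"],
}


def lower_until_stable(ops, rules, max_iters=8):
    """Apply the rules until nothing changes; return (ops, converged)."""
    for _ in range(max_iters):
        new_ops = []
        changed = False
        for op in ops:
            if op in rules:
                new_ops.extend(rules[op])
                changed = True
            else:
                new_ops.append(op)
        ops = new_ops
        if not changed:
            return ops, True
    # Every pass still made "progress", so the loop gave up at the cap.
    return ops, False


if __name__ == "__main__":
    print(lower_until_stable(["u2u64"], RULES))
```

Dropping either rule from the table lets the loop settle, which matches the idea of giving the backend a way to handle one of the two opcodes directly.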

11:27 <karolherbst> :')

11:27 <itoral> oh boy :)

11:28 <karolherbst> do you think lower_pack_64_2x32_split can be easily implemented though?

11:29 <karolherbst> ehhh

11:29 <karolherbst> pack_64_2x32 I mean

11:29 milek7 has quit [Remote host closed the connection]

11:30 <itoral> but that would require that we are actually able to have 64bit defs around, no?

11:31 <itoral> I mean, we would need an ssa that can reference an actual 64bit value

11:32 <karolherbst> not necessarily

11:34 <karolherbst> the lowering is literally doing `nir_pack_64_2x32_split(b, x32, nir_imm_int(b, 0))`

11:34 <karolherbst> could just.. uhm.. treat it as 32 bit :D

11:34 <itoral> oh

11:34 <karolherbst> I guess it would get messy with load/stores

11:34 <itoral> well... sure, we can do that XD

11:34 <karolherbst> and how that 64 bit value is used

11:34 <karolherbst> but I think when translating from nir that nonsense _could_ be handled in a hacky way

11:35 youmukon1 has joined #dri-devel

11:35 <karolherbst> other int64 lowering isn't as nice though

11:35 <karolherbst> just the conversion ones just place a 0 in the upper bits

11:36 <karolherbst> maybe we should just have alternative nir lowering

11:36 <karolherbst> `(('pack_64_2x32_split', a, b), ('ior', ('u2u64', a), ('ishl', ('u2u64', b), 32)), 'options->lower_pack_64_2x32_split')` mhh
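
A quick arithmetic check of the two identities quoted in this exchange, modelled on plain Python integers rather than on NIR itself. The function names mirror the opcode names, but the implementations here are assumptions written only to show that the rewrite rule and the `pack_64_2x32_split(x32, 0)` form of `u2u64` agree.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def u2u64(x):
    """Zero-extend a 32-bit unsigned value to 64 bits."""
    return x & MASK32


def pack_64_2x32_split(lo, hi):
    """Build a 64-bit value from a low and a high 32-bit word."""
    return (((hi & MASK32) << 32) | (lo & MASK32)) & MASK64


def pack_via_rule(a, b):
    """The rewrite from the chat: ior(u2u64(a), ishl(u2u64(b), 32))."""
    return (u2u64(a) | (u2u64(b) << 32)) & MASK64


if __name__ == "__main__":
    a, b = 0xDEADBEEF, 0x12345678
    print(hex(pack_64_2x32_split(a, b)), hex(pack_via_rule(a, b)))
    print(hex(u2u64(a)), hex(pack_64_2x32_split(a, 0)))
```

Both directions are correct on their own; the trouble described above only appears when both are enabled as lowerings at the same time.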

11:36 <karolherbst> itoral: the thing is... u2u64 is also trivial to implement on 32 bit hardware

11:37 <karolherbst> but yeah... I suspect for it to not be very cursed, the backend IR would need some support for 64 bit values

11:37 <karolherbst> which it probably should have for vectorized load/stores anyway

11:37 <karolherbst> unless you don't have vectorized loads/stores

11:38 <karolherbst> or I just ignore fma for now...

11:38 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

11:39 <itoral> maybe we would like a lowering that works in terms of uvec2 instead of u64

11:39 <itoral> that's what we do for the vulkan device address extension

11:39 <karolherbst> mhh.. maybe

11:39 <itoral> which expects device addresses to be 64-bit
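
A sketch of what working "in terms of uvec2" means for arithmetic, assuming a 64-bit value is carried as a `(lo, hi)` pair of 32-bit words. The helper `add_2x32` is hypothetical and is not taken from the v3d driver; it only shows the carry handling a 32-bit ALU would need.

```python
MASK32 = 0xFFFFFFFF


def add_2x32(a, b):
    """Add two 64-bit values stored as (lo, hi) pairs of 32-bit words."""
    lo = (a[0] + b[0]) & MASK32
    # The low word wrapped around exactly when the result is smaller
    # than one of the inputs.
    carry = 1 if lo < a[0] else 0
    hi = (a[1] + b[1] + carry) & MASK32
    return (lo, hi)


def to_u64(pair):
    """Reassemble a (lo, hi) pair for checking against native 64-bit math."""
    return (pair[1] << 32) | pair[0]


if __name__ == "__main__":
    print(add_2x32((0xFFFFFFFF, 0), (1, 0)))
```

An address plus a 32-bit offset is the common case for device addresses, and it needs only this add-with-carry pattern.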

11:40 <karolherbst> translating u2u64 to a vec2 shouldn't be too hard

11:40 <karolherbst> or rather

11:40 ap51 has joined #dri-devel

11:40 <karolherbst> ehh.. I guess it depends more on where that 64 bit value is coming from

11:41 <alyssa> airlied: :+1:

11:41 <karolherbst> yeah.. anyway, if not doing fma stuff things seem to work

11:41 <karolherbst> - random compiler crashes :')

11:41 <itoral> cool

11:41 <itoral> hahaha

11:42 <karolherbst> _but_ basic kernels seem to work

11:42 <alyssa> ...does videocore deserve CL

11:43 <karolherbst> no, but there is high demand

11:43 <karolherbst> it has 32 bit pointers :')

11:43 <karolherbst> itoral: the bigger problem is 8/16 bit handling which is mandatory in CL

11:43 <karolherbst> at least for ints

11:44 <karolherbst> let's see if I can trigger this MMU fault or if my proper implementation of set_global_binding fixed that one

11:44 <karolherbst> RIP https://gist.githubusercontent.com/karolherbst/a60d48fa8a748fbf580d40fc0ba286c6/raw/793fcc44b74b126889e6a96ba912bdbf78c40fef/gistfile1.txt

11:44 youmukonpaku1337 has joined #dri-devel

11:45 <itoral> v3d doesn't support native 8bit/16bit int alu, but I guess it can be done in 32-bit?

11:45 <karolherbst> yeah

11:45 <karolherbst> more or less

11:45 <karolherbst> it gets a bit fishy around load/stores but I think we have lowering in place for most of it

11:45 <karolherbst> `nir_lower_bit_size`
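
A rough illustration of doing 8-bit integer math on a 32-bit ALU: compute wide, then truncate or sign-extend the result. This only shows the general idea; it is not how `nir_lower_bit_size` is implemented, and the helper names are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def iadd8_in_32(a, b):
    """8-bit unsigned add done with a 32-bit add followed by truncation."""
    wide = (a + b) & MASK32   # what the 32-bit ALU would produce
    return wide & 0xFF        # keep only the low 8 bits


def sign_extend8(x):
    """Interpret the low 8 bits of x as a signed value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


if __name__ == "__main__":
    print(iadd8_in_32(200, 100))
    print(sign_extend8(iadd8_in_32(0x7F, 1)))
```

Loads and stores are the harder part, as noted above, because the memory access itself has to be narrow even when the arithmetic is not.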

11:45 <itoral> those MMU errors are a real issue

11:46 <itoral> probably some out-of-bounds access

11:46 <karolherbst> mhhh

11:46 tristianc6704 has joined #dri-devel

11:46 <itoral> [ 1091.642212] v3d fec00000.v3d: MMU error from client L2T (0) at 0x0, write violation, pte invalid

11:46 <itoral> that one is pretty clear, seems to be trying to write something at a NULL address

11:47 <karolherbst> mhhh

11:47 youmukon1 has quit [Ping timeout: 480 seconds]

11:47 <karolherbst> could be something in my set_global_binding impl

11:48 <karolherbst> the model CL has with memory is kinda annoying

11:48 <karolherbst> but I'm also not sure the way I've implemented load/store global is actually correct

11:48 <karolherbst> but it's just doing 32 bit addresses anyway

11:51 <itoral> mmm... I think nir_intrinsic_load_global takes a 64-bit value no?

11:51 <itoral> that's why we added nir_intrinsic_load_global_2x32

11:51 <itoral> (which is what we use in vulkan for device addresses)

11:52 <karolherbst> ahh

11:52 <karolherbst> it's shared memory stuff

11:52 <karolherbst> itoral: not necessarily

11:53 <karolherbst> it can take a 32 bit one, but that depends on the API

11:53 <karolherbst> CL supports multiple pointer sizes and the device simply reports what it supports

11:53 <karolherbst> so if the runtime claims 32 bit pointers, all pointers are 32 bit

11:53 <itoral> aha

11:53 <karolherbst> anyway..

11:53 <karolherbst> seeing those mmu errors when using shared memory

11:53 <karolherbst> which makes sense

11:53 <karolherbst> because...

11:53 <karolherbst> uhm

11:54 <karolherbst> pipe_grid_info::variable_shared_mem

11:54 <karolherbst> so CL has a cursed model of shared memory

11:54 <karolherbst> you have 1. in kernel declared static shared memory blocks

11:54 <karolherbst> and 2. you can pass arbitrary sized shared memory blocks as kernel parameters into the runtime

11:55 <karolherbst> so you have a static part (declared in nir_shader) + a variable part (set via pipe_grid_info::variable_shared_mem at launch_grid time)
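
A small sketch of the size calculation this implies, assuming the driver knows the static size from the compiled shader and receives the variable part at launch time. The function names are hypothetical; only the split into a static and a variable part comes from the discussion.

```python
def shared_alloc_size(static_size, variable_shared_mem):
    """Total shared memory for one launch: static part plus variable part."""
    return static_size + variable_shared_mem


def static_only_alloc_size(static_size, variable_shared_mem):
    """What a driver allocates if it ignores the launch-time variable part."""
    return static_size


if __name__ == "__main__":
    # Kernel with no static shared memory and 256 bytes passed at launch.
    print(shared_alloc_size(0, 256))
    print(static_only_alloc_size(0, 256))
```

The second helper returns 0 for a kernel whose shared memory is entirely variable, so nothing would be allocated even though the kernel writes to shared memory.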

11:55 <karolherbst> I guess I just need to support that one then :')

11:55 <itoral> oh, I see

11:56 <karolherbst> I've added a shader_info::cs::has_variable_shared_mem so backend compilers can deal with some of it

11:56 <karolherbst> or drivers in general

11:57 <karolherbst> luckily we know if a shader has variable shared mem at compile time, so drivers can deal with that in some proper way

11:58 <karolherbst> ohh

11:58 <karolherbst> I suspect that null pointer is when it's all variable and no shared memory block is allocated

11:58 <karolherbst> maybe

12:03 <karolherbst> mhhh

12:04 milek7 has joined #dri-devel

12:04 <karolherbst> I guess some of the shared_size handling in the compiler needs to be fixed as well

12:06 <itoral> gotta go now but if you hit anything on v3d where you need assistance feel free to ping me or open an issue and I'll try to help

12:06 <karolherbst> okay, cool

12:07 milek7 has quit [Remote host closed the connection]

12:08 itoral has quit [Quit: Leaving]

12:10 <karolherbst> nice... I've added _broken_ variable_shared_size handling, but at least some/all of the MMU faults are gone

12:23 fab has joined #dri-devel

12:28 paulk-bis has quit []

12:28 paulk has joined #dri-devel

12:29 youmukonpaku1337 has quit [Remote host closed the connection]

12:29 elongbug has quit [Read error: Connection reset by peer]

12:30 youmukonpaku1337 has joined #dri-devel

12:30 elongbug has joined #dri-devel

12:39 youmukonpaku1337 has quit [Remote host closed the connection]

12:39 youmukonpaku1337 has joined #dri-devel

12:44 yyds has quit [Remote host closed the connection]

12:44 yyds has joined #dri-devel

12:50 DottorLeo has quit [Quit: Konversation terminated!]

12:53 kts has joined #dri-devel

12:58 kts_ has joined #dri-devel

12:59 jdavies has joined #dri-devel

12:59 jdavies is now known as Guest2029

13:03 kts has quit [Ping timeout: 480 seconds]

13:20 <austriancoder> which existing nir pass would do this simple transformation?

13:20 <austriancoder> from:

13:20 <austriancoder> 32x4 %5 = load_const (0x00000000, 0x00000000, 0x00000000, 0x00000000) = (0.000000, 0.000000, 0.000000, 0.000000)

13:20 <austriancoder> @store_reg (%5 (0x0, 0x0, 0x0, 0x0), %12) (base=0, wrmask=xyzw, legacy_fsat=0)

13:20 <austriancoder> to:

13:20 <austriancoder> 32 %5 = load_const (0x00000000) = (0.000000)

13:20 <austriancoder> @store_reg (%5.xxxx (0x0, 0x0, 0x0, 0x0), %12) (base=0, wrmask=xyzw, legacy_fsat=0)

13:21 <austriancoder> I think my irc client has messed up the last messages :( https://www.irccloud.com/pastebin/kY2ts6HY/

13:30 elongbug_ has joined #dri-devel

13:31 bmodem has quit [Ping timeout: 480 seconds]

13:35 yyds has quit [Remote host closed the connection]

13:35 An0num0us has quit [Ping timeout: 480 seconds]

13:36 Guest2017 has quit [Ping timeout: 480 seconds]

13:37 yyds has joined #dri-devel

13:38 elongbug has quit [Ping timeout: 480 seconds]

13:39 youmukon1 has joined #dri-devel

13:41 ap51 has quit [Ping timeout: 480 seconds]

13:45 youmukonpaku1337 has quit [Read error: Connection reset by peer]

13:48 <karolherbst> austriancoder: sounds like a job for nir_opt_reuse_constants, but I don't think nir_instr_set_add_or_rewrite (which it uses, and which is also used by opt_cse) looks into vectors

13:49 <austriancoder> karolherbst: or maybe nir_opt_shrink_vectors .. which I try to hack to support the new nir register thing

13:49 <karolherbst> mhhh

13:49 <karolherbst> I _think_ it could rather be a mix of two passes

13:49 <karolherbst> reuse_constants to use the .xxxx swizzle

13:49 <karolherbst> and shrink vectors could ditch the unused ones
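
A sketch of the combined effect being discussed: detect a constant vector whose components are all equal, keep one scalar, and reference it through a replicated swizzle. The helper `splat_constant` is hypothetical and works on plain Python lists; it is not the logic of either NIR pass.

```python
def splat_constant(components):
    """If every component is equal, return (scalar_components, swizzle).

    Returns None when the vector cannot be reduced to a single scalar.
    """
    if components and all(c == components[0] for c in components):
        # One surviving component, read once per original channel.
        return [components[0]], "x" * len(components)
    return None


if __name__ == "__main__":
    print(splat_constant([0, 0, 0, 0]))
    print(splat_constant([0, 1, 0, 1]))
```

The second call returns `None`, which corresponds to the partial case raised a few lines later, where only some pairs of components match.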

13:51 kts_ has quit []

13:51 pekkari has joined #dri-devel

13:52 <karolherbst> dunno though, it just kinda feels like a cse problem

13:52 <karolherbst> and cse is currently based on entire ssa values afaik

13:53 <karolherbst> I can also see that in the future we should be able to cse parts of vectors, what if you have xy and zw being equal in a vec4?

13:55 <austriancoder> karolherbst: in the future .. are you working on something?

13:55 <karolherbst> I'm not

13:55 <karolherbst> I was just thinking out loud :D

13:58 <austriancoder> :)

14:12 itsmeluigi has joined #dri-devel

14:13 Ahuj has quit [Ping timeout: 480 seconds]

14:20 mszyprow has quit [Ping timeout: 480 seconds]

14:20 frieder has quit [Remote host closed the connection]

14:23 bmodem has joined #dri-devel

14:28 <gildekel> emersion:

14:28 <gildekel> `7:53 PM <emersion> gildekel: yeah i think 0 modes is a kernel bug`

14:28 <gildekel> I have seen 0-mode connectors in the past, which did seem weird to me. But I am suggesting that connectors in such a bad state should be pruned by userspace. How does sway handle these cases?

14:28 <gildekel>

14:29 <gildekel> pq:

14:29 <gildekel> `*cough* https://gitlab.freedesktop.org/wayland/weston/-/issues/124`

14:29 <gildekel> Oh boy - that's a 5 year old bug.. heh

14:30 kzd has joined #dri-devel

14:30 <emersion> pretty sure sway misbehaves

14:31 <emersion> when the kernel exposes a 0-mode connector

14:31 <gildekel> Well, complete link-training failures on SST sources should be fairly uncommon, as the link-training fallback logic usually salvages the process

14:32 <gildekel> however, link-training fallback is currently not implemented for MST in i915, which may cause userspaces to hit this case more often

14:32 <gildekel> So, if my series is approved, you may be running into more modeless connectors in a bad state

14:33 mareko_ is now known as mareko

14:33 <emersion> imho still a kernel bug when a 0-mode connector is exposed

14:33 <emersion> sima: ^

14:34 <emersion> when i say "kernel mode", i mean that sway should not fix it

14:34 <emersion> s/mode/bug/

14:35 <gildekel> Hmm.. I am not entirely convinced that's a bug. How would you otherwise signal userspace that that connector is in a bad state?

14:35 <gildekel> you can't just prune it in DRM

14:35 rsalvaterra_ has quit []

14:35 <gildekel> userspace would be left confused about why the connector is completely missing. A connector with 0 modes has a state that userspace can parse

14:35 <gildekel> inform end users

14:36 youmukonpaku1337 has joined #dri-devel

14:37 <emersion> you mark it as disconnected, if it really isn't usable, gildekel

14:38 <gildekel> That's also an option I am suggesting, as an alternative. But that's still not an accurate state for the connector, as it is actually connected. There was successful communication, DPCD reads, modes, link-training even

14:39 <gildekel> If it is marked as disconnected, again, userspace may be left wondering how come a connected display is ignored without sufficient signal.

14:41 youmukon1 has quit [Read error: Connection reset by peer]

14:41 yuq825 has quit [Remote host closed the connection]

14:42 <zamundaaa[m]> Yeah it would be nice to be able to show the user that the display is connected and just not working. But if we don't get more information than "it doesn't work" then it's probably not too useful vs just not detecting the display in the first place

14:42 <zamundaaa[m]> Because "it doesn't work" is something the user will notice by themselves already

14:42 An0num0us has joined #dri-devel

14:43 youmukon1 has joined #dri-devel

14:43 <gildekel> Agreed, but seeing a connected display misbehaving is better than seeing nothing. Also, DRM defines link-training failures to be signaled to userspace via uevents and a connector prop "link-status" to be BAD

14:43 <karolherbst> also kernel changing behavior even if userspace is buggy is still a kernel regression

14:43 rsalvaterra has joined #dri-devel

14:44 rgallaispou has quit [Read error: Connection reset by peer]

14:45 mauld has quit [Remote host closed the connection]

14:45 <gildekel> No arguments there, karolherbst. But the current state of i915 is that there's a gap when link-training completely fails. The failed attempt does not issue a uevent to userspace, and, at least ChromeOS, is left thinking that the modeset was successful and the display is operational.

14:45 <karolherbst> right...

14:45 <karolherbst> I think that goes back to the point if userspace can even do anything with that information

14:45 <karolherbst> can it recover? probably not

14:45 <zamundaaa[m]> Perhaps a solution could be to add a new connection state? DRM_MODE_CONNECTED_LINK_FAILURE or something like that?

14:46 <karolherbst> can the kernel tell the user what to do to fix this situation or is it simply broken no matter what?

14:47 <gildekel> Well, if you get a connected connector with no modes, you could potentially signal to the user that the connector failed to link-train

14:47 <karolherbst> what does this tell the user even?

14:47 <gildekel> and suggest to replace the cable, or try a simpler display set up

14:47 <karolherbst> what should the user do with this information

14:47 <karolherbst> okay

14:47 <karolherbst> but "disconnected" on a connected display kinda means "your cable might be bonkers" already

14:47 <karolherbst> _but_

14:48 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

14:48 <karolherbst> I think there is value to tell the user that the cable might be bonkers

14:48 youmukonpaku1337 has joined #dri-devel

14:48 <gildekel> hey, displays are hard. We face the same difficulty in ChromeOS. We are thinking to show a bubble with a link to possible troubleshooting around displays

14:48 <karolherbst> right

14:48 <gildekel> it could be anything from bad cable to incompatibility, to driver bugs.

14:48 <karolherbst> okay

14:48 <karolherbst> but does the way the kernel reports this matter here?

14:49 <gildekel> It does, I think. If a connector comes back simply disconnected, then how do you signal userspace something went wrong vs. nothing was detected on the link?

14:49 <karolherbst> not disagreeing with your idea to report 0 modes, it's just icky if it breaks userspace

14:49 <gildekel> It's ok, this discussion is healthy, and exactly what I was hoping for

14:50 <gildekel> I am trying to make all our lives better.. not harder

14:50 <karolherbst> yeah...

14:51 <gildekel> At the end of the day, I want to be able to know in ChromeOS why a connector that succeeded to modeset is not coming up. That's where it all started. I call those "zombie" displays

14:51 <gildekel> I realized it is because i915 stops issuing uevents to users once RBR x 1 Lane fails

14:51 <karolherbst> it's just that because this is uapi territory, not breaking userspace is just more important than a clean UAPI

14:51 <gildekel> Your userspace is already broken

14:51 <gildekel> you see zombie displays, at best

14:51 youmukon1 has quit [Read error: Connection reset by peer]

14:51 <gildekel> if you're running i915, that is

14:52 <karolherbst> fair

14:52 <gildekel> and btw, this change is currently affecting i915 alone

14:52 <karolherbst> I guess it depends on how bad the regression is. turning not seeing anything on the display into a compositor crash might be reason enough

14:52 <karolherbst> if it's just the compositor doing something silly vs not doing something at all it might not matter

14:53 <gildekel> if you do not ignore 0 mode connectors, then, depending on how you choose modes, you'll hopefully just send a disable modeset request..?

14:53 <karolherbst> at least if the display is still black, the user could e.g. just check the cables (which they probably will) and the system might recover

14:54 <gildekel> Sure, that's true. But wouldn't a better user experience be that userspace signals the user that the display is in trouble?

14:54 <karolherbst> if the display is black, what can you even signal to the user?

14:54 <gildekel> via parsing the state of a connected connector in link-status=bad, as DRM dictates?

14:54 <karolherbst> though you still have your primary one...

14:55 <karolherbst> gildekel: oh.. I meant in case all connectors are bad or something, but I think this issue is about MST specifically, but then again, what if the MST display is the only display connected anyway

14:55 <gildekel> `10:54 AM <karolherbst> if the display is black, what can you even signal to the user?`

14:55 <gildekel> Unfortunately, that's a case that can't be alleviated by anything we do

14:56 <gildekel> if the only display a user has doesn't come up...

14:56 <gildekel> no amount of signaling can help

14:56 <gildekel> unless we emit audio

14:56 <gildekel> but there's no end to this

14:56 <karolherbst> but yeah... I think _if_ marking this state in a special way helps userspace to report something more meaningful to the user of the system then that's reason enough to be more explicit unless there will be regressions

14:58 <gildekel> Agreed. I personally believe that a controlled state is better than undefined behavior, which is "zombie" displays. Also don't forget that you produce frames in these scenarios.

14:58 <karolherbst> right...

14:58 <gildekel> At least in ChromeOS we do. Frames, mouse warping, layouts, the whole thing.

14:59 <daniels> I think a pragmatic compromise would be a tweak on what Karol suggested: for current userspace, expose as disconnected, but for userspace which opts in with a client cap, expose as connected-but-useless

15:00 <daniels> the fact that auxch works isn’t much comfort to userspace; any userspace that’s weird enough to care about the distinction can opt in to receive the new status

15:05 jewins has joined #dri-devel

15:05 pekkari has quit [Remote host closed the connection]

15:05 kts has joined #dri-devel

15:06 mauld has joined #dri-devel

15:06 <gildekel> Alright. Obviously there's some discomfort around the suggested change. I'll think this through some more. Thanks for the input all. Thanks for the suggestion daniels

15:07 <daniels> np!

15:08 <daniels> your usecase makes total sense, I think it just needs finesse to keep other userspace doing the right thing is all

15:09 <gildekel> I completely understand. That's why I initiated the discussion. I don't want to break anyone's stuff.

15:09 tzimmermann has quit [Quit: Leaving]

15:10 <gildekel> My view is that the solution solidifies/extends DRM specs around complete link-status failures for both SST and MST cases, but I also agree that it's not worth regressions.

15:15 alyssa has left #dri-devel [#dri-devel]

15:17 An0num0us has quit [Ping timeout: 480 seconds]

15:17 Duke`` has joined #dri-devel

15:18 pekkari has joined #dri-devel

15:22 pekkari has quit [Remote host closed the connection]

15:36 yyds has quit [Remote host closed the connection]

15:36 <daniels> you can add it to the list of things we’d do differently if we were greenfielding it, rather than something that was mostly bashed together at GUADEC 2007 ;)

15:36 <daniels> *than building on top of

15:38 pekkari has joined #dri-devel

15:44 <seanpaul_> how about adding a new value to the link-status property which is "terminal" or something

15:44 <seanpaul_> then we don't need a new connector status and _maybe_ don't need a new client cap

15:45 <daniels> so the connector status is disconnected but the link status is terminal?

15:45 <seanpaul_> connector would be connected

15:46 <seanpaul_> modes could be pruned or not (sway should be resilient in both cases, but meh)

15:46 <daniels> mm, but then you’d still have userspace blithely trying to light up a display which can never be used

15:46 <seanpaul_> link-status == bad typically means "try another modeset", whereas link-status == terminal means don't bother
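
The distinction seanpaul_ draws here between the two link-status values can be sketched as a small decision function on the compositor side. This is only a minimal sketch under stated assumptions: the enums and the `connector_view` struct are local stand-ins rather than libdrm types, `LINK_STATUS_TERMINAL` is the hypothetical value being proposed and does not exist in the DRM uAPI, and `action_for` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins. GOOD and BAD mirror the two existing "link-status"
 * values; TERMINAL is the hypothetical third value proposed above. */
enum link_status {
    LINK_STATUS_GOOD = 0,
    LINK_STATUS_BAD = 1,
    LINK_STATUS_TERMINAL = 2, /* hypothetical, not in the uAPI */
};

enum action {
    ACTION_NONE,          /* nothing to do for this connector */
    ACTION_RETRY_MODESET, /* link-status == bad: try another modeset */
    ACTION_GIVE_UP,       /* link-status == terminal: don't bother */
};

/* Hypothetical snapshot of what a compositor read for one connector. */
struct connector_view {
    bool connected;
    enum link_status link;
};

static enum action action_for(const struct connector_view *c)
{
    assert(c != NULL);

    /* Disconnected connectors are ignored regardless of link-status. */
    if (!c->connected)
        return ACTION_NONE;

    switch (c->link) {
    case LINK_STATUS_BAD:
        return ACTION_RETRY_MODESET;
    case LINK_STATUS_TERMINAL:
        return ACTION_GIVE_UP;
    case LINK_STATUS_GOOD:
    default:
        /* Unknown values are treated like GOOD instead of aborting. */
        return ACTION_NONE;
    }
}
```

The `default` branch is deliberate: a compositor that tolerates values it does not recognize keeps working if the kernel later grows a new one.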

15:46 <zamundaaa[m]> seanpaul_: Old userspace would just overwrite that link-status value with "Good"

15:46 <daniels> ‘connected’ = ‘pixels will appear somewhere’

15:47 <seanpaul_> zamundaaa[m]: that's probably ok, they're broken anyways

15:47 <seanpaul_> daniels: i don't think that's necessarily true, connectors are connected before modeset

15:49 <seanpaul_> we have this link-status property, it seems wasteful to introduce an entirely new signal which means almost the same thing

15:50 pekkari has quit [Quit: Konversation terminated!]

15:50 <daniels> right, but once you do a modeset, you can reasonably expect that they’ll probably work

15:51 <daniels> like, semantically to userspace, the expectation is ‘this is a connector you can and probably should light up’, not ‘there’s a cable plugged in but it will never work’

15:53 <seanpaul_> i guess my point is that we already have an exception mechanism to that reasonable expectation, so why not use that

16:09 <sima> daniels, seanpaul_ gildekel I think emersion 's proposal of just marking terminally fubar connectors as disconnected makes the most sense

16:09 camus1 has joined #dri-devel

16:09 <sima> userspace can handle that, and most userspace will handle that without falling over

16:09 <sima> plus if you magically recover the link, you can change to connected again and throw an uevent out to userspace

16:10 <sima> so I'm not sure why we need to add another awkward corner case here

16:10 <sima> since connected + 0 modes very much means there's a working screen there, we just don't know how to drive it

16:11 <sima> such screens do exist (or at least have, on vga, way back)

16:11 <gildekel> sima:

16:11 <gildekel> `userspace can handle that, and most userspace will handle that without falling over`

16:11 <gildekel> But that would mean we lose all potential signal that a connector is in a bad state vs. not connected..

16:11 <gildekel> I am not completely opposed to this solution, to be clear

16:12 <sima> gildekel, why do you care?

16:12 <gildekel> Because I would like to be able to provide some information to the user that a display is connected, but there is a connectivity issue

16:12 <gildekel> I would like to provide feedback, in any form

16:12 <sima> like from a user pov, what's the difference between a badly plugged in cable and a shit cable?

16:12 <dottedmag> Maybe add a property to a connector saying "why this thing is disconnected"? With absence of value meaning "dunno, probably no cable"

16:12 <sima> in both cases they think it should work, but it doesn't

16:14 <sima> and in both cases they'll figure out that something is shit, and any attempts at further debugging feel a bit silly to me

16:14 <sima> like maybe the connector on the board is dented, and no amount of cable replacement is going to fix anything

16:14 <gildekel> In one case, they are left on their own. In the other, the OS validates their circumstances and provides some feedback

16:14 <gildekel> I see that as a better user experience.

16:14 camus has quit [Ping timeout: 480 seconds]

16:14 <sima> gildekel, yeah but what is the user going to do?

16:15 <sima> you can't tell them whether it's the laptop, cable, or sink that's busted

16:15 <sima> just "something is wrong"

16:15 <sima> which ... they know, it doesn't work

16:15 <zamundaaa[m]> sima: you can give them a hint, a list of things to do

16:15 <gildekel> ^

16:15 <zamundaaa[m]> For users that don't know anything about computers that's pretty helpful

16:15 <gildekel> In ChromeOS, we plan on providing a link to a Display Troubleshoot page

16:16 <gildekel> Displays are hard. period. We can provide basic education, where things can go wrong, cable management, etc.

16:16 <gildekel> vs. leave them on their own to open yet another bug in which "my displays don't turn on"

16:16 <gildekel> and call us all monkeys with keyboards on reddit

16:16 <gildekel> ^ (real story)

16:18 <sima> yeah that part is unfortunately not optional :-/

16:19 <sima> but yeah I still think what we should do is 1. handle this within current kms semantics, i.e. set the connector to disconnected or something if it's terminally busted

16:19 <sima> 2. do some extension on top of that, with the userspace glue to handle it, but that extension needs to extend, not change the rules

16:20 <sima> maybe if we set it to disconnected we can abuse link-status=terminal, but that feels a bit risky

16:24 <gildekel> Well, in that case, how about zamundaaa[m]'s original suggestion to add a connector state? why isn't that sufficient? It'll be in terminal state + link-status bad after the final link training attempt. It should be sufficient to say that if a connector != CONNECTED then userspaces ignore it.. no?

16:25 <gildekel> And then, should a userspace care, it can parse connectors in terminal state for extra signal

16:27 mszyprow has joined #dri-devel

16:35 <sima> gildekel, that's 1&2 together

16:36 <sima> and I'm not super keen on auditing everything whether it copes with a new connector state

16:36 <seanpaul_> sima: can you elaborate on "risky"? it actually might work even better this way since you can mark the base connector as terminal for MST cases

16:36 mszyprow has quit [Ping timeout: 480 seconds]

16:36 <sima> that's even more risky than extending link-status, because almost every kms userspace looks at connector state

16:36 <seanpaul_> it's already disconnected

16:36 <dottedmag> default: fprintf(stderr, "Unknown connector state"); exit(1);

16:37 <dottedmag> this kind of risky?

16:37 <sima> yup

16:39 <zmike> mareko: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24849

16:40 <seanpaul_> sima: to clarify, i was asking about why link-status extension was risky

16:40 <seanpaul_> (especially if the connector is disconnected)

16:41 <sima> seanpaul_, well same, but since the users of that are a lot more limited it's probably going to be fine, plus if it's already disconnected even more chances it's going to be fine

16:41 <sima> but fundamentally the no regression rule means that userspace is right, no matter how stupid

16:41 <seanpaul_> ok, understood, was curious if there was anything beyond that

16:41 <sima> and upgrading a "my external screen doesn't work" bug to a "my compositor just died" bug isn't good

16:42 <sima> seanpaul_, nah just general uapi paranoia

16:42 <gildekel> Agreed. I can work with a disconnected connector with link-status=terminal

16:42 <seanpaul_> perhaps i should tighten the straps on my tinfoil hat

16:42 <gildekel> you guys get a tinfoil hat? :o

16:43 <gildekel> sima: I'll add you to future revisions of the series, if you don't mind.

16:45 <zamundaaa[m]> I agree that link-status terminal + disconnected connector would be a good solution

16:45 <sima> gildekel, little wrapper with docs and all (plus extending uapi docs for the properties too ofc) would be good

16:46 <sima> since to avoid races you have to set to disconnected before updating link-status, then uevent

16:46 Jeremy_Rand_Talos__ has quit [Remote host closed the connection]

16:46 <sima> and we don't want to give drivers any other way to get to link-status=terminal I think

16:46 Jeremy_Rand_Talos__ has joined #dri-devel

16:47 <gildekel> aye aye, Captain :)

16:48 <gildekel> We already take a similar approach when we set the link status to bad in i915, especially with the recursive function to set downstream MST ports to BAD as well

16:48 <gildekel> so I feel like this would make sense.

16:55 <seanpaul_> i assume for the MST case we're going to mark link-status terminal on the base connector and leave everything else as-is?

16:55 <gildekel> I think we should mark all MST topology as terminal as well

16:55 <gildekel> Userspaces tend to abstract these details

16:55 <gildekel> Or, rather, let me speak for ChromeOS

16:56 <gildekel> Alternatively, we can mark all downstream ports as BAD, but why...?

16:58 <seanpaul_> hmm, but usually if all connectors are disconnected from an MST branch, they are destroyed

16:59 <seanpaul_> so that would have the unintended consequence of leaving a fully disconnected MST topology

17:00 <gildekel> Does the clean up occur independently of a re-probe? because otherwise, a failed link-training even would change the status of the connectors, and, I would assume, the follow-up uevent will trigger the cleanup

17:01 An0num0us has joined #dri-devel

17:01 <seanpaul_> > a failed link-training even would change the status of the connectors

17:01 <seanpaul_> i don't think it does/would

17:02 <gildekel> Well, it doesn't, currently. But it will if my series is accepted.

17:02 <gildekel> A part of the change is to modify the state of all downstream MST ports after a failed link-train

17:02 <seanpaul_> no, all status would stay the same, base would stay disconnected and sinks would stay connected

17:03 <gildekel> Not sure I am following.

17:03 <seanpaul_> link training would fail on the base connector

17:03 <gildekel> correct, and the new state would be propagated to the downstream ports as well.

17:04 Jeremy_Rand_Talos__ has quit [Remote host closed the connection]

17:04 <gildekel> that's a part of the work we have to do after the base connector fails.

17:04 <seanpaul_> but those connectors have connected status

17:04 <seanpaul_> so you would end up with link-status terminal and status_connected

17:04 Jeremy_Rand_Talos__ has joined #dri-devel

17:05 <seanpaul_> so i think what you want to do is leave the downstream ports as CONNECTED, leave the base connector as DISCONNECTED and just update link-status on the base connector to terminal

17:05 <gildekel> I must be misunderstanding you somehow. Here's what I plan to do:

17:05 <gildekel> 1) Base connector fails link-training

17:05 <gildekel> 2) Modify its status to disconnected, mark it as terminal

17:05 <gildekel> 3) If connector is MST, recursively mark its downstream ports to disconnected + terminal

17:06 <gildekel> 4) send uevent
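
The four steps gildekel lists can be sketched as follows, using the ordering sima asked for earlier (disconnected first, then link-status, then the uevent). This is a toy model and not i915 or DRM core code: the structs, the enums, `LINK_TERMINAL`, `mark_terminal` and `handle_link_training_failure` are all hypothetical names, and the uevent is modeled as a counter. Step 3 is contested by seanpaul_ in the messages that follow, so treat the recursion as one of the options under discussion.

```c
#include <assert.h>
#include <stddef.h>

enum conn_status { CONN_CONNECTED, CONN_DISCONNECTED };
enum link_status { LINK_GOOD, LINK_BAD, LINK_TERMINAL /* hypothetical */ };

#define MAX_PORTS 4

/* Toy connector: a base connector may have downstream MST ports. */
struct connector {
    enum conn_status status;
    enum link_status link;
    struct connector *ports[MAX_PORTS];
    size_t num_ports;
};

/* Steps 2 and 3: mark this connector, then recurse into its ports. */
static void mark_terminal(struct connector *c)
{
    assert(c != NULL);
    assert(c->num_ports <= MAX_PORTS);

    /* Status is flipped before link-status to avoid exposing a
     * connected connector whose link-status is already terminal. */
    c->status = CONN_DISCONNECTED;
    c->link = LINK_TERMINAL;

    for (size_t i = 0; i < c->num_ports; i++)
        mark_terminal(c->ports[i]);
}

/* Step 1 has already happened when this is called; step 4 is modeled
 * by bumping a counter exactly once, after all state is updated. */
static void handle_link_training_failure(struct connector *base,
                                         int *uevents_sent)
{
    assert(uevents_sent != NULL);
    mark_terminal(base);
    (*uevents_sent)++;
}
```

Sending a single uevent after the whole topology is updated means userspace re-probes once and never observes a half-updated tree.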

17:06 <seanpaul_> re: 2) it's already disconnected

17:06 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

17:06 <gildekel> It's not.. not at this point..

17:07 <gildekel> This is during link-training.. the connector comes in in a "good state" to link-training. Connector is connected with link-status=GOOD by kernel or userspace.

17:08 <gildekel> We can sync offline if you want

17:08 <seanpaul_> hmm, when does the base connector transition to disconnected in the successful case?

17:08 <gildekel> Oh. I see what you mean.

17:09 <gildekel> That's a good point. But shouldn't really matter.

17:09 <seanpaul_> re: 3) we currently don't support keeping the MST topology alive when all sinks are disconnected, so this would change current behavior

17:11 <gildekel> But then we're back to square one. We have connected connectors producing zombie displays..

17:11 <gildekel> on MST

17:11 <seanpaul_> you would have to inspect the base connector's link-status and do the recursion in userspace

17:12 <gildekel> why not just let the connectors die out, and have the base connector signal that?

17:12 mauld has quit [Ping timeout: 480 seconds]

17:12 <seanpaul_> which, conceptually kind of makes sense b/c the link-status between source & mst branch device is bad, but the link in between branch devices may not be

17:17 <gildekel> Ok. So do we at least change the link-status to terminal/bad on the downstream ports?

17:17 <gildekel> I would assume we d.

17:17 <gildekel> do*

17:20 <seanpaul_> i'd say no

17:21 donaldrobson has quit [Ping timeout: 480 seconds]

17:21 <gildekel> But from this source's point of view, these connectors are unusable.

17:21 <gildekel> doesn't matter if another potential source can use them

17:22 kzd has quit [Ping timeout: 480 seconds]

17:23 sukrutb_ has joined #dri-devel

17:32 vliaskov has quit [Remote host closed the connection]

17:33 junaid has joined #dri-devel

17:39 mszyprow has joined #dri-devel

17:41 ngcortes has joined #dri-devel

17:45 <zamundaaa[m]> <seanpaul_> "you would have to inspect the..." <- That would not be backwards compatible with current userspace

17:51 <gildekel> I believe he was suggesting a solution to proper pruning of the misbehaving connectors in userspace. If left un-handled, then it should produce the same behavior in which the connectors appear to be connected and operational.

17:51 <gildekel> ...a solution for* proper pruning...

17:53 <zamundaaa[m]> Sure, I guess that would kinda-ish maybe be fine. But it would be quite different from how userspace operates right now

17:53 junaid has quit [Remote host closed the connection]

17:54 <zamundaaa[m]> by which I mean that most userspace completely ignores that MST is a thing at all. It's abstracted away by the kernel, and all we get as information about it at all is the mst path property

17:54 <gildekel> correct, that I agree. That will not change.

17:55 <gildekel> The current behavior is that after a failed link-training, all (base) connectors are left connected, marked as link-status=BAD, and no more uevents are issued. So userspace thinks the last modeset succeeded.

17:55 <gildekel> This applies to the MST connectors as well

17:56 <gildekel> If we modify only the base connector to be in terminal state (as it's already marked disconnected by the MST topology manager), and leave the MST connectors as they are (connected and link-status good), then userspace sees them as always

17:56 <gildekel> the difference is that userspaces that wish to parse the new state, can look up the base connector of the MST ports, and see that it's in a bad state, and prune the MST connectors

17:56 <gildekel> Otherwise, you'll have your zombie displays, as always, as the MST connectors are marked connected and ready

17:59 <zamundaaa[m]> I don't think that's a good solution, it's inconsistent between different connectors

18:00 <zamundaaa[m]> If you need to keep MST downstream ports as connected for kernel-internal reasons, then keep that mess in the kernel. If you want all userspace to do a recursion on the connectors and mark them as disconnected, then why not do that in the kernel when sending connector information to userspace?

18:03 <gildekel> To clarify, I am with you on this. I would rather mark the downstream connectors as disconnected/terminal

18:03 <gildekel> but there seems to be more disagreement here. And justly so, because doing that _will_ change current behavior in userspace

18:05 mszyprow has quit [Ping timeout: 480 seconds]

18:11 tertl8 has joined #dri-devel

18:11 <zamundaaa[m]> What behavior would it change? If you're referring to MST connectors being marked as disconnected instead of removed, that's not a change that can break userspace (which isn't already fundamentally broken)

18:17 <gildekel> They will not be marked as disconnected, but removed entirely.. since MST support removes the topology entirely if all downstream connectors are marked as disconnected

18:17 <gildekel> so you go from having MST connectors showing up, to having no MST connectors

18:17 <gildekel> that's the change. Whether or not it's safe or acceptable is a different discussion

18:23 <zamundaaa[m]> But that's only in the kernel. I don't particularly care about what you do in there - if you have to keep connectors marked as connected internally, then do that

18:24 sarahwalker has quit [Remote host closed the connection]

18:24 <zamundaaa[m]> What I'm saying is that when userspace calls drmModeGetConnector, you should - inside the kernel - check if link-status is terminal, and set the connection status to disconnected for userspace

18:25 <zamundaaa[m]> Then you can simply mark all the affected MST connectors as connected + link-status=terminal inside the kernel, and userspace will see disconnected + link-status=terminal
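
What zamundaaa[m] describes amounts to a translation step at the point where the kernel reports connector state to userspace. A minimal sketch, assuming a hypothetical `LINK_TERMINAL` value and made-up names (`kernel_connector`, `status_for_userspace`); the real `drmModeGetConnector` path is not modeled here:

```c
#include <assert.h>
#include <stddef.h>

enum conn_status { CONN_CONNECTED, CONN_DISCONNECTED };
enum link_status { LINK_GOOD, LINK_BAD, LINK_TERMINAL /* hypothetical */ };

/* Kernel-internal view of a connector (toy model). */
struct kernel_connector {
    enum conn_status status; /* what the MST topology code keeps */
    enum link_status link;
};

/* Status as it would be reported to userspace: a terminal link always
 * reads as disconnected, whatever the kernel tracks internally. */
static enum conn_status status_for_userspace(const struct kernel_connector *c)
{
    assert(c != NULL);

    if (c->link == LINK_TERMINAL)
        return CONN_DISCONNECTED;
    return c->status;
}
```

With this shape the MST topology can stay alive internally (connected + terminal) while userspace only ever sees disconnected + terminal.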

18:35 alanc has quit [Remote host closed the connection]

18:36 alanc has joined #dri-devel

18:42 Guest2029 has quit [Ping timeout: 480 seconds]

18:45 gouchi has joined #dri-devel

18:48 lynxeye has quit [Quit: Leaving.]

18:50 Haaninjo has joined #dri-devel

18:55 mauld has joined #dri-devel

19:07 <gildekel> That's a possibility, but sounds like a debugging hell

19:07 ngcortes has quit [Ping timeout: 480 seconds]

19:22 oneforall2 has quit [Ping timeout: 480 seconds]

19:24 mszyprow has joined #dri-devel

19:27 bmodem has quit [Ping timeout: 480 seconds]

19:36 oneforall2 has joined #dri-devel

19:36 Mangix has quit [Read error: Connection reset by peer]

19:38 Mangix has joined #dri-devel

19:40 guru_ has joined #dri-devel

19:41 sima has quit [Ping timeout: 480 seconds]

19:44 Danct12 has quit [Remote host closed the connection]

19:47 neniagh has quit []

19:48 oneforall2 has quit [Ping timeout: 480 seconds]

19:49 milek7 has joined #dri-devel

19:52 Danct12 has joined #dri-devel

19:59 Mangix has quit [Ping timeout: 480 seconds]

19:59 Mangix has joined #dri-devel

20:07 <Kayden> hitting some test failures with an MR on virpipe. anyone know how to reproduce those? looks like something with virgl_test_server and llvmpipe?

20:08 neniagh has joined #dri-devel

20:08 <Kayden> hm I guess I just run the server and then LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=virpipe program, huh

20:12 tintou has joined #dri-devel

20:12 <tintou> yes

20:13 <Kayden> tintou: thanks!

20:13 <tintou> To run the test server as in the CI, you can look at https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/.gitlab-ci/deqp-runner.sh?ref_type=heads#L137 to give the right arguments and environment variables

20:13 <Kayden> how do I get it building virgl_test_server? -Dgallium-drivers=virgl isn't sufficient it seems

20:14 <tintou> that's in the virglrenderer project https://gitlab.freedesktop.org/virgl/virglrenderer

20:14 <Kayden> oh, or that's...not part of mesa, got it

20:15 fab has quit [Quit: fab]

20:18 mszyprow has quit [Ping timeout: 480 seconds]

20:19 <Kayden> virglrenderer is what translates the shaders?

20:20 <Kayden> looks like it, yep

20:21 <Kayden> ah, from TGSI, so there's ntt too

20:27 kzd has joined #dri-devel

20:29 <Kayden> now to figure out if this is a virglrenderer bug or a mesa glsl language frontend bug

20:39 oneforall2 has joined #dri-devel

20:40 guru_ has quit [Ping timeout: 480 seconds]

20:41 Duke`` has quit [Ping timeout: 480 seconds]

20:41 <Kayden> arg, this is confusing.

20:42 <Kayden> so on my barrier optimization MR, I'm getting rid of unnecessary barrier modes

20:42 <Kayden> which causes virglrenderer to emit memoryBarrierBuffer() and memoryBarrierAtomicCounter() instead of the full memoryBarrier(). sensible

20:42 <Kayden> but it's doing #version 140 #extension GL_ARB_shader_storage_buffer_object : require

20:42 oneforall2 has quit [Remote host closed the connection]

20:43 <Kayden> mesa only provides those functions in GLSL 4.30 or ESSL 3.10 or if #extension GL_ARB_compute_shader : enable

20:43 oneforall2 has joined #dri-devel

20:44 <Kayden> GL_ARB_compute_shader, however, does not mention a #extension directive

20:44 <airlied> I assume that means all compute shaders should have them

20:44 <Kayden> right. however, this is a vertex shader

20:45 <airlied> okay that seems like a problem then :-)

20:46 <Kayden> The functions memoryBarrierShared() and groupMemoryBarrier() are available only in compute shaders; the other functions are available in all shader types.

20:46 <Kayden> so they should be there. but when

20:47 <Kayden> there's some thought that maybe it's just adding them as an interaction with the other specs...except...memoryBarrier() is provided by ARB_shader_image_load_store, not ARB_shader_storage_buffer_object

20:48 <airlied> I'd probably just fix virgl on the mesa side by sending more barrier modes than necessary and file a bug to get virglrenderer fixed

20:48 <Kayden> I guess virglrenderer ought to be doing an #extension GL_ARB_compute_shader : require when using memory barriers?

20:48 <Kayden> yeah, was going to say, not sure how to synchronize changes in the two projects

20:49 <Kayden> so right now I'm just enabling the pass for all drivers in st/glsl

20:49 <airlied> I don't think the GL_ARB_compute_shader makes sense at all in mes

20:49 <airlied> mesa

20:49 <Kayden> oh?

20:49 mauld has quit [Ping timeout: 480 seconds]

20:50 <airlied> since it doesn't look specified

20:50 <Kayden> yeah.

20:51 <airlied> ssbo spec says "Additionally, the shading language provides the memoryBarrier() function to control the relative order of memory accesses within individual shader invocations and provides various memory qualifiers controlling how the memory corresponding to individual variables is accessed.

20:51 <airlied> "

20:51 <airlied> then never mentions it again

20:52 <airlied> it's only then mentioned in the image one, which seems like virglrenderer should be emitting then

20:53 <Kayden> yeah, that's where the original memoryBarrier is defined

20:54 <Kayden> makes me think that mesa ought to be adding the memoryBarrierShared/AtomicCounter/Image/Buffer() variants if ARB_compute_shader is supported at all, without an #extension directive, whenever memoryBarrier() is available

20:55 <Kayden> ah

20:55 <Kayden> right, okay, so here's the thing, I guess

20:56 <Kayden> it wasn't possible to create these situations outside of compute shaders before my pass.

20:56 <Kayden> because you didn't have memoryBarrier*() subvariants outside of compute shaders, or without turning them on yourself somehow

20:56 <Kayden> hm. I guess you could turn them on and do it though

20:57 <Kayden> vrend_shader.c does do ctx->shader_req_bits |= SHADER_REQ_IMAGE_LOAD_STORE; for the full barrier but nothing for the others

20:58 <Kayden> airlied: you might also be interested in a lavapipe fix: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842/diffs?commit_id=93ea6fdd706c430e38cc9f738c15a484ad311867

21:03 ngcortes has joined #dri-devel

21:04 <zmike> is that actually affecting anything? the scopes below should be all the filtering needed

21:05 rasterman has quit [Quit: Gettin' stinky!]

21:05 <Kayden> yeah, dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f was failing on zink+lavapipe without the fix, with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842/

21:06 idr has joined #dri-devel

21:07 <Kayden> was getting these barriers: http://whitecape.org/paste/lvp-diff.txt

21:09 <zmike> 🤔

21:09 <zmike> where are the NONE scope barriers coming from?

21:10 <Kayden> those are just memoryBarrier() I believe

21:12 <Kayden> yeah, test has memoryBarrier(); barrier();

21:12 <Kayden> in a compute shader

21:12 <Kayden> there's no SSBO or shared or global access, only images

21:13 <zmike> ah okay so it's a OpMemoryBarrier with scope=NONE?

21:13 <Kayden> but the barrier() still ought to synchronize the invocations I think

21:13 <Kayden> well, it's GLSL

21:15 <zmike> seems legit

21:15 <zmike> take my rb

21:16 <zamundaaa[m]> gildekel: you could always add a CAP that exposes the "real" state to userspace. Please just don't make connectors behave differently depending on their role in the MST topology

21:18 <Kayden> zmike: thank you!

21:23 gouchi has quit [Remote host closed the connection]

21:24 <Kayden> robclark: I'm hitting a freedreno failure in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842 (barrier mode optimizations) in piglit's spec/arb_compute_shader/execution/simple-barrier-atomics. not sure how best to debug this since I don't have an a6xx handy

21:24 <Kayden> guessing I deleted some barriers that you were relying on

21:25 <Kayden> or the atomic counters aren't being accessed via derefs when nir_opt_barrier_modes() is called so it's failing to see them. the place I call the pass seems to be working for other drivers though

21:29 mauld has joined #dri-devel

21:30 <airlied> Kayden: I'm a bit worried about emitting nir barriers on shader stages that haven't been traditionally used to seeing them

21:31 <airlied> though it's likely the problem is going to be mostly sw/virgl where we don't really have specified paths

21:31 <Kayden> airlied: seems like other than the extension directive thing, it ought to work

21:32 <Kayden> I guess I could make the pass optional via a nir_shader_compiler_options flag

21:32 <robclark> Kayden: I'll look in a few.. but you could probably use drm-shim... also I'm a big fan of asserts when it comes to what form a shader is in (since otherwise it is pretty much a mess knowing the right time and order for passes)
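
A minimal sketch of the "assert on shader form" idea robclark mentions, assuming a toy pass runner that records which passes have already been applied. `Shader`, `run_pass`, `requires`, and `forbids` are hypothetical names for illustration and are not Mesa APIs:

```python
class Shader:
    """Toy shader object that only tracks which passes have run."""

    def __init__(self):
        self.passes_run = []  # names of passes already applied, in order


def run_pass(shader, name, requires=(), forbids=()):
    """Record pass `name`, asserting the shader is in the expected form first."""
    for dep in requires:
        # The pass expects the form produced by `dep`.
        assert dep in shader.passes_run, f"{name} must run after {dep}"
    for later in forbids:
        # The pass expects a form that `later` would already have destroyed.
        assert later not in shader.passes_run, f"{name} must run before {later}"
    shader.passes_run.append(name)
```

With checks like these, calling a pass at the wrong point in the pipeline fails immediately at the call site instead of silently producing a miscompiled shader.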

21:33 <idr> I just read issue 13 in the ARB_compute_shader spec, now I'm re-reading the IRC log...

21:34 <Kayden> robclark: oh can I? that'd be awesome

21:35 <robclark> well, assuming diffing nir_print and/or IR3_SHADER_DEBUG=disasm output is enough to spot the issue..

21:36 <Kayden> probably would be

21:37 <idr> Kayden: Based on my reading of issue 13, I think the "bug" is in virgl, but it occurs because it's getting something it doesn't expect.

21:38 <idr> Outside of a compute shader, a memory barrier of any sort should get turned into memoryBarrier().

21:40 mszyprow has joined #dri-devel

21:40 <Kayden> idr: why?

21:41 <Kayden> oh, you mean, when virgl translates back to GLSL.

21:41 <Kayden> because the sub-functions don't exist there

21:41 <Kayden> right

21:41 <idr> Right.

21:41 <idr> I was going to double-check what GLSL 4.30 says. That may affect things too.

21:42 <idr> If it's converting to GLSL 4.30, then... maybe.

21:44 <Kayden> it's using 1.40 + #extensions

21:44 <idr> I looked at the 4.60 spec because it's what was handy... memoryBarrierAtomicCounter, memoryBarrierBuffer, and memoryBarrierImage exist in all stages there, so I assume that's 4.30+ behavior.

21:44 <Kayden> robclark: thanks for the drm-shim pointer, I can run the shaders now and I see what's going on :)

21:44 <robclark> \o/

21:46 <Kayden> idr: Yeah. I think that's the intention

21:47 <Kayden> idr: so it's really just the undefined mess of #extension enabling in pre-4.30

21:47 An0num0us has quit [Ping timeout: 480 seconds]

21:49 <idr> Which I interpret as "those new 4.30 functions only exist in compute shaders."

21:50 <Kayden> *nods*

21:51 <airlied> yeah I think just forcing virgl to emit full barriers on non-compute might be the best plan

21:51 <Kayden> yeah

21:52 <airlied> we can't fix virglrenderer to fix this bug, it would need feature flags etc and new mesa needs to run on old virglrenderer
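
A minimal sketch of the fallback discussed above, assuming a hypothetical translator that picks one GLSL barrier call per shader stage. `NARROW_BARRIERS` and `glsl_barrier_for` are illustrative names, not virgl or nir_to_tgsi code; the three narrow built-ins are the ones idr lists:

```python
# Hypothetical model, not virgl or nir_to_tgsi code.
# The narrower built-ins are assumed safe to emit only in compute shaders
# when targeting pre-4.30 GLSL with #extension directives.
NARROW_BARRIERS = {
    "atomic_counter": "memoryBarrierAtomicCounter()",
    "buffer": "memoryBarrierBuffer()",
    "image": "memoryBarrierImage()",
}


def glsl_barrier_for(stage: str, kind: str) -> str:
    """Return the GLSL call to emit for a memory barrier of `kind` in `stage`."""
    if stage == "compute" and kind in NARROW_BARRIERS:
        return NARROW_BARRIERS[kind]
    # Any other stage, or any unknown kind, gets a full memory barrier.
    return "memoryBarrier()"
```

A full barrier is stronger than necessary on non-compute stages, but it stays valid on an old virglrenderer, which is the constraint airlied raises.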

21:53 <airlied> anyone know about the v3d cpu tasks and why they aren't just compute shaders?

21:54 alyssa has joined #dri-devel

21:55 <alyssa> idr: IIRC when I reworked barriers, I noticed there was virglrenderer brokenness worked around in nir-to-tgsi

21:55 <alyssa> I have paged out all details but it's worth looking in ntt

21:57 <Kayden> alyssa: yeah, I could either hack around it in nir_to_tgsi, or in a virgl renderer pass to put them back, or just add a nir_shader_compiler_options flag to avoid calling my pass on non-compute for virgl

21:57 <alyssa> Kayden: what i meant is, I think there's already brokenness here

21:57 <alyssa> may or may not be my fault

21:57 <alyssa> may or may not affect your pass

21:57 <Kayden> okay :)

21:59 <Kayden> apparently my atomic counter handling is busted, too. glsl_to_nir uses nir_var_mem_ssbo as the mode for those. but the nir_deref_var is nir_var_uniform with glsl_type atomic_uint

21:59 <Kayden> and my pass is currently running before nir_lower_atomics_to_ssbo
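
One reading of the atomic counter problem Kayden describes, as a toy model: the barrier is tagged with the SSBO mode, but before lowering the counter is still a uniform deref of type atomic_uint, so a pass that prunes barrier modes by looking at deref modes never sees an SSBO access. The `Deref` class, the mode strings, and `prune_barrier` are hypothetical stand-ins, not NIR data structures or the actual pass:

```python
from dataclasses import dataclass

# Toy stand-ins for NIR variable modes; illustrative only.
SSBO, UNIFORM, IMAGE, SHARED = "ssbo", "uniform", "image", "shared"


@dataclass(frozen=True)
class Deref:
    mode: str       # variable mode of the deref
    type_name: str  # GLSL type name, e.g. "atomic_uint"


def accessed_modes(derefs, count_atomic_counters_as_ssbo):
    """Collect the memory modes a shader touches."""
    modes = set()
    for d in derefs:
        if (count_atomic_counters_as_ssbo
                and d.mode == UNIFORM and d.type_name == "atomic_uint"):
            # Not yet lowered: the deref is uniform, but barriers for
            # atomic counters are tagged with the SSBO mode.
            modes.add(SSBO)
        else:
            modes.add(d.mode)
    return modes


def prune_barrier(barrier_modes, derefs, count_atomic_counters_as_ssbo=False):
    """Drop barrier modes that no deref in the shader accesses."""
    return set(barrier_modes) & accessed_modes(derefs, count_atomic_counters_as_ssbo)
```

In this model, `prune_barrier({SSBO}, [Deref(UNIFORM, "atomic_uint")])` returns an empty set, so the barrier is lost, while passing `count_atomic_counters_as_ssbo=True` keeps the SSBO mode. Running the pruning only after the counters have been lowered to SSBO accesses would have the same effect.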

22:00 a-865 has quit [Ping timeout: 480 seconds]

22:03 itsmeluigi has quit [Quit: Konversation terminated!]

22:03 youmukonpaku1337 has joined #dri-devel

22:03 youmukonpaku1337 has quit []

22:04 youmukonpaku1337 has joined #dri-devel

22:04 crabbedhaloablut has quit []

22:04 <alyssa> i kinda want to delete nir atomic counters

22:05 <Kayden> would be a fan

22:06 <Kayden> I guess it would impact r600

22:06 <idr> I think that's the only hardware that ever did anything special for atomic counters. :(

22:07 <Kayden> yeah

22:07 <idr> I have some vague recollection of warts on the spec because of that.

22:07 youmukon1 has joined #dri-devel

22:07 <idr> It wasn't us for once!

22:07 youmukonpaku1337 has quit []

22:07 <alyssa> idr: lol

22:08 <alyssa> idr: I can blame your employer for spending months of my life on geometry shaders though, right?

22:08 youmukon1 has quit []

22:08 youmukonpaku1337 has joined #dri-devel

22:08 <alyssa> (-:

22:10 a-865 has joined #dri-devel

22:11 <idr> Geometry shaders were for sure a group effort.

22:11 <idr> Honestly... I'd blame Microsoft for requiring GS and fp64.

22:12 <idr> Khronos was just keeping up with the Joneses.

22:12 <airlied> the fp64 requirement was one of the dumbest self owns in graphics :-P

22:12 <airlied> pretty sure dx never actually required it, GL just had to one up them

22:13 <karolherbst> yeah.. who actually thought that was a good idea

22:13 <karolherbst> though without it, nobody would have implemented fp64 probably...

22:13 <karolherbst> or it would have been some enterprise only feature only exposed on 5 gpus

22:14 <airlied> which is exactly how it should have been :-)

22:14 <karolherbst> :D

22:14 <idr> MS must have required it, or it never would have ended up in Intel GPUs.

22:14 <Kayden> hmm, FEATURE_LEVEL_11 with D3D11_FEATURE_DOUBLES

22:16 * Kayden not adept at cross-referencing MS docs yet

22:16 <Kayden> but yeah unfortunate for sure

22:20 <alyssa> there was a brief, terrible time when Mali GPUs had native hardware for fp64

22:20 <alyssa> not just fp64

22:20 <alyssa> fp64vec2!

22:21 <alyssa> So you can add two doubles in one instruction! :-D

22:21 youmukon1 has joined #dri-devel

22:21 youmukon1 has quit []

22:22 <airlied> Kayden: yes so it was optional in d3d11, gl4.0 should have just left it as an extension

22:23 <airlied> or adopted some sort of capabilities for exts

22:23 youmukon1 has joined #dri-devel

22:23 youmukonpaku1337 has quit [Read error: Connection reset by peer]

22:32 mszyprow has quit [Ping timeout: 480 seconds]

22:35 <karolherbst> alyssa: but why...

22:35 <alyssa> karolherbst: the same genius 128-bit ALU that brought you native vec16 instructions

22:36 <karolherbst> I mean.. I kinda see the point of a 128 bit alu, but native fp64?

22:37 xypron has quit [Remote host closed the connection]

22:37 <karolherbst> I wonder if I finally have a CTS run on v3d without crashing the GPU

22:37 youmukon1 has quit [Read error: Connection reset by peer]

22:37 youmukonpaku1337 has joined #dri-devel

22:38 youmukon1 has joined #dri-devel

22:38 youmukonpaku1337 has quit [Read error: Connection reset by peer]

22:46 columbarius has joined #dri-devel

22:47 co1umbarius has quit [Ping timeout: 480 seconds]

22:49 <Kayden> huh, this is new

22:50 <Kayden> went to go test my updated MR !24842 to make sure I fixed the virpipe and freedreno regressions, and... https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/979688 says "You are not authorized to run this manual job" on any of the container steps. I used to click the play button there to cause it to cascade through the other jobs and actually test stuff

22:50 <Kayden> did something change? am I supposed to be doing this differently now?

22:51 <Kayden> oh.

22:51 <Kayden> wasn't signed in, heh

22:52 <Kayden> (signed in twice today but various browser tabs are having...problems. *shrug*)

22:54 simon-perretta-img_ has joined #dri-devel

22:58 simon-perretta-img has quit [Ping timeout: 480 seconds]

23:02 YuGiOhJCJ has joined #dri-devel

23:05 <DemiMarie> Could one implement Venus on top of WebGL?

23:06 <DemiMarie> airlied: hard disagree on fp64, you kind of need it for many scientific computing workloads

23:09 <airlied> exactly so we don't need it :-)

23:09 <airlied> like intel even dropped it from their hw going forward

23:10 <HdkR> I like the NVIDIA approach. One fp64 pipeline per SM for compatibility, if you need real perf then buy the compute focused cards :D

23:11 simon-perretta-img_ has quit [Ping timeout: 480 seconds]

23:18 alanc has quit [Remote host closed the connection]

23:19 Haaninjo has quit [Quit: Ex-Chat]

23:21 heat has joined #dri-devel

23:44 <DemiMarie> airlied HdkR: who is the “we” in “so we don’t need it”?

23:45 <airlied> 99% of graphics/compute developers, the GL4 API mandating it etc

23:45 <airlied> like it was fine as an extension for specialist hardware like dx11 did it

23:46 <karolherbst> fp64 is unusable on any desktop GPU anyway

23:46 youmukonpaku1337 has joined #dri-devel

23:54 youmukon1 has quit [Ping timeout: 480 seconds]

23:55 kts has quit [Ping timeout: 480 seconds]