#dri-devel on 2023-06-29 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:09 psykose has quit [Remote host closed the connection]

00:10 psykose has joined #dri-devel

00:14 columbarius has joined #dri-devel

00:15 co1umbarius has quit [Ping timeout: 480 seconds]

01:00 ngcortes has quit [Ping timeout: 480 seconds]

01:10 heat has quit [Remote host closed the connection]

01:30 kzd has quit [Quit: kzd]

01:43 kzd has joined #dri-devel

01:48 dviola has left #dri-devel [WeeChat 4.0.0]

01:48 dviola has joined #dri-devel

01:58 digetx has quit [Ping timeout: 480 seconds]

01:59 digetx has joined #dri-devel

02:01 Danct12 is now known as Guest4502

02:01 Danct12 has joined #dri-devel

02:01 Danct12 has quit []

03:05 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

03:05 TMM has joined #dri-devel

03:10 orbea1 has quit []

03:11 orbea has joined #dri-devel

03:56 aravind has joined #dri-devel

04:24 <orowith2os> karolherbst: was poking around the Mesa source tree, rusticl specifically, and noticed that rusticl is currently on Rust 2018. Have you not gotten around to updating to 2021 yet, or do you want to stay on 2018 for backwards compat, or?

04:27 <orowith2os> I'd love to give it a small attempt myself to see if I can manage it, test my skills, but want to check with you first.

04:28 aravind has quit [Ping timeout: 480 seconds]

04:31 bmodem has joined #dri-devel

04:38 lina has quit [Quit: Lost terminal]

04:40 Duke`` has joined #dri-devel

04:50 bmodem has quit [Ping timeout: 480 seconds]

04:50 bmodem has joined #dri-devel

04:51 bmodem has quit [Excess Flood]

04:51 bmodem has joined #dri-devel

05:06 kzd has quit [Ping timeout: 480 seconds]

05:07 <orowith2os> never mind, git logs were goofy 😛

05:07 <orowith2os> it seems like it uses Rust 2021 already? At least, it uses the stdlib of 2021.

05:16 fab has joined #dri-devel

05:17 zf has quit [Ping timeout: 480 seconds]

05:17 Duke`` has quit [Ping timeout: 480 seconds]

05:21 zf has joined #dri-devel

05:34 bmodem has quit [Ping timeout: 480 seconds]

05:39 bmodem has joined #dri-devel

05:46 Ultra has quit [Ping timeout: 480 seconds]

05:46 Ultra has joined #dri-devel

05:47 zf has quit [Ping timeout: 480 seconds]

05:51 lina has joined #dri-devel

05:56 itoral has joined #dri-devel

05:58 zf has joined #dri-devel

05:59 sgruszka has joined #dri-devel

06:07 f11f12 has joined #dri-devel

06:11 fab has quit [Quit: fab]

06:20 rasterman has joined #dri-devel

06:23 bgs has joined #dri-devel

06:29 mbrost has quit [Ping timeout: 480 seconds]

06:38 tursulin has joined #dri-devel

06:56 fab has joined #dri-devel

06:58 sghuge has quit [Remote host closed the connection]

06:58 sghuge has joined #dri-devel

07:02 tzimmermann has joined #dri-devel

07:07 mbrost has joined #dri-devel

07:09 fab has quit [Read error: Connection reset by peer]

07:09 fab has joined #dri-devel

07:10 fab is now known as Guest4518

07:10 eukara has quit [Ping timeout: 480 seconds]

07:12 bmodem has quit [Ping timeout: 480 seconds]

07:12 Guest4518 has quit []

07:13 bmodem has joined #dri-devel

07:14 bmodem has quit [Excess Flood]

07:14 fab_ has joined #dri-devel

07:14 bmodem has joined #dri-devel

07:15 fab_ is now known as Guest4519

07:16 jhli has quit [Remote host closed the connection]

07:18 frieder has joined #dri-devel

07:20 frankbinns has joined #dri-devel

07:20 mbrost has quit [Ping timeout: 480 seconds]

07:33 aravind has joined #dri-devel

07:35 <MrCooper> pq: "If FB_ID is non-0, solid_fill blob is ignored" is backwards compatible, isn't it?

07:36 bmodem has quit [Ping timeout: 480 seconds]

07:36 bmodem has joined #dri-devel

07:37 bmodem has quit [Excess Flood]

07:37 bmodem has joined #dri-devel

07:38 JohnnyonF has joined #dri-devel

07:46 JohnnyonFlame has quit [Ping timeout: 480 seconds]

07:53 cwegener1 has quit [Quit: WeeChat 3.8]

07:57 JohnnyonFlame has joined #dri-devel

08:00 kxkamil2 has quit []

08:03 elongbug has joined #dri-devel

08:03 Ahuj has joined #dri-devel

08:04 JohnnyonF has quit [Ping timeout: 480 seconds]

08:10 djbw has quit [Read error: Connection reset by peer]

08:16 i509vcb has quit [Quit: Connection closed for inactivity]

08:22 <lordheavy> Any way to debug "Couldn't create Clang invocation." with rusticl ? already tried RUSTICL_DEBUG=clc

08:22 <pq> MrCooper, I suppose so, if FB_ID=0 is not the (only) way to disable the plane.

08:23 <MrCooper> hmm, good point, that may be the case

08:23 <pq> thinking about someone leaving a non-0 solid_fill blob behind

08:24 <tarceri> Anyone got a setup with nvidia binary drivers installed and can test a piglit shader_test for me?

08:25 <MrCooper> pq: maybe CRTC_ID=0 disables the plane as well

08:30 <pq> MrCooper, I guess, but what does userspace rely on?

08:30 <pq> maybe DRM core required both CRTC_ID and FB_ID to be 0 together right now? That would work.

08:30 kxkamil has joined #dri-devel

08:31 Guest4519 has quit [Read error: Connection reset by peer]

08:31 aravind has quit [Ping timeout: 480 seconds]

08:32 <pq> OTOH, I don't even care about this kind of "new userspace left-overs" compatibility with old userspace, because it seems no-one else does either. And it gets exponentially more difficult when new things get added if done this ad hoc way.

08:35 lina has quit [Ping timeout: 480 seconds]

08:37 <MrCooper> looks like drm_atomic_plane_check does enforce that both are (non-)0

08:40 Haaninjo has joined #dri-devel

08:42 <pq> it would need extending to solid_fill blob, too

08:42 <pq> to be future-proof

08:54 <MrCooper> that would break currently existing user space though

08:55 quantum5 has joined #dri-devel

08:58 quantum5- has quit [Ping timeout: 480 seconds]

08:59 alanc has quit [Remote host closed the connection]

09:00 alanc has joined #dri-devel

09:03 <pq> MrCooper, yeah, OTOH when someone then adds a third way of putting content to planes, we would not be able to ad hoc make that backward-compatible, because the "all are guaranteed 0 for disable plane" card was already used and discarded.

09:03 <MrCooper> right
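
The rule MrCooper and pq discuss above can be sketched as a small model. This is Python rather than kernel C and is not the actual drm_atomic_plane_check code; the helper name `check_plane_state` and the `strict_solid_fill` switch (standing in for pq's suggested extension to the solid_fill blob) are hypothetical and exist only for illustration.

```python
def check_plane_state(crtc_id: int, fb_id: int, solid_fill_blob: int = 0,
                      strict_solid_fill: bool = False) -> bool:
    """Return True if this (hypothetical) plane state would be accepted."""
    # Rule as described in the discussion: CRTC_ID and FB_ID must be
    # zero together or non-zero together.
    if (crtc_id == 0) != (fb_id == 0):
        return False
    # Hypothetical extension: a disabled plane must not keep a stale
    # solid_fill blob either. This is the variant that could reject
    # existing user space which only clears CRTC_ID and FB_ID.
    if strict_solid_fill and crtc_id == 0 and solid_fill_blob != 0:
        return False
    return True
```

With `strict_solid_fill=False`, a disabled plane with a left-over blob is accepted, which is the "new userspace left-overs" case pq mentions; with `strict_solid_fill=True` the same state is rejected.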

09:15 Haaninjo has quit [Ping timeout: 480 seconds]

09:20 <karolherbst> orowith2os: should be 2021

09:21 <karolherbst> atm we set the rustc req to 1.60, but I think there might be reason enough to bump that soon, now that the kernel is at 1.68.2 and firefox ESR at 1.65

09:22 fab_ has joined #dri-devel

09:22 fab_ is now known as Guest4527

09:22 cmichael has joined #dri-devel

09:31 YuGiOhJCJ has joined #dri-devel

10:07 MrCooper has quit [Quit: Leaving]

10:07 MrCooper has joined #dri-devel

10:11 MrCooper has quit [Remote host closed the connection]

10:34 MrCooper has joined #dri-devel

10:40 MrCooper has quit [Remote host closed the connection]

10:41 MrCooper has joined #dri-devel

10:55 funtoomen has joined #dri-devel

10:57 <funtoomen> Hi, I have recently read a Phoronix article about you switching to BLAKE3 instead of SHA1. If BLAKE3 is a cryptographic hash function, wouldn't it be faster to use a non-cryptographic hash function or even a checksum function? Do you need the benefits of cryptographic hash functions over other hash/checksum functions for the purpose of uniquely identifying Vulkan shaders?

10:58 <pendingchaos> yes, it needs to be cryptographic

10:58 <funtoomen> Why so?

10:59 <pendingchaos> because collisions aren't handled

11:01 <HdkR> Some internal hashing uses xxhash for things that don't need cryptographic

11:07 vliaskov has joined #dri-devel

11:11 <funtoomen> pendingchaos: what do you mean? wouldn't collisions happen almost never?

11:12 <pendingchaos> with a cryptographic hash, yes

11:12 <pendingchaos> not with a non-cryptographic hash

11:13 <pq> But how much is not too much? There is a difference when you have an adversary that is intentionally attempting to cause a collision, and hitting one just by sheer bad luck.

11:14 <dottedmag> funtoomen: Any performance suggestions should come with benchmark information, thanks

11:16 <pendingchaos> pq: I'm not sure I understand the question

11:16 <pendingchaos> a single collision is bad, because then the shader cache will use the wrong shader binary

11:16 <funtoomen> dottedmag: i mean, im just asking. i dont know nothing about graphics driver development, but i know enough cryptology to know that cryptographic hash functions come but some drawbacks

11:17 <funtoomen> s/come but/come with/

11:17 <pq> All hash functions with hash shorter than input have collisions. With cryptographic hash functions it is just much harder to intentionally cause collisions, but they can still happen accidentally.

11:19 <pq> when does that theoretical concern become a practical concern, I have no clue

11:20 <pendingchaos> probably never

11:20 <pq> why?

11:22 <pq> Is the goal of making intentionally finding collisions as hard as possible equivalent to the goal of reducing the possibility that two shader texts collide by accident?

11:22 <pq> funtoomen, any idea?

11:23 <pendingchaos> because cryptographic hash functions are very good and have a relatively large output size

11:23 <pendingchaos> I would expect the former goal to help in the latter

11:25 <funtoomen> pq: in my opinion even a checksum function should make the collision *very* unlikely, and would come with quite some performance. but as i said i know nothing about graphics driver development, im just a cryptology enthusiast.

11:26 <pq> funtoomen, really nice to hear from that side :-)

11:28 <psykose> shader cache is trusted input so it does have the implication of someone wanting to intentionally collide it under some scenario

11:28 elongbug has quit [Remote host closed the connection]

11:28 elongbug has joined #dri-devel

11:33 <pq> psykose, but if an adversary is able to run shaders, would they also not have write access to the cache files? Maybe not on WebGL perhaps?

11:33 <psykose> other way around, can't run shaders but can write to cache

11:34 <pq> how?

11:34 <pq> you mean the cache is poisoned whatever ways, and then a legit app falls prey?

11:34 <psykose> perhaps

11:34 <psykose> i mean it's obviously a very niche scenario

11:35 <psykose> hm, though it is possible how you say it too

11:35 <pq> in that case, why wouldn't the attacker just look at the original cache what hashes have been used, and simply replace their contents?

11:35 <pq> guaranteed hit the legit app starts the next time

11:36 <pq> regardless of hash functions

11:36 <psykose> that's true :)

11:38 <funtoomen> so you use cryptographic functions just because the make almost impossible thing (collision) even more imposible?

11:38 <funtoomen> s/because the/because they/

11:39 <HdkR> Targeted collisions are very much a concern

11:40 <pq> I suppose saving the original shader text in the cache for detecting collisions would be prohitive from both performance and legal perspective? and storage?

11:40 <pq> *prohibitive

11:41 <funtoomen> HdkR: but if someone could try and target a collision wouldn't the machine already be compromised?

11:42 <pq> not with WebGL, I suppose - unless you consider WebGL itself to compromise the machine in the first place

11:43 <HdkR> Remote execution doesn't necessarily mean the full system is compromised

11:43 <HdkR> Shutting down attack vectors is good :)

11:45 <karolherbst> yeah soo.. if it comes to hashes, the _correct_ way of using them is to verify that the key matches the data set, though that would blow up our disk cache and make it more expensive, but... yeah, atm we have this potential problem of a key loading a different cache item

11:45 <funtoomen> HdkR: Yeah, i guess you are right. Im curious what the performance drawbacks are, would be cool if someone more competent than me did a benchmark.

11:45 <dottedmag> Also Amdahl's law. How much time does hashing take after switching to blake3? That's why I asked for benchmarking: if hashing cost is now in the noise, then replacing the hash with the one that executes in no time won't improve anything.

11:45 <karolherbst> but a cryptographic "safe" hash kinda mitigates that problem enough so people rely on it being sane
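
What karolherbst calls the "correct" way, verifying that the key really belongs to the stored data, can be sketched roughly as follows. This is not Mesa's disk cache: `VerifyingCache`, `strong_key` and `weak_key` are hypothetical names, and `hashlib.blake2b` is only a stand-in because BLAKE3 is not in the Python standard library. The sketch keeps the full source next to each entry, which is exactly the storage cost the discussion worries about.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


def strong_key(source: bytes) -> bytes:
    # Stand-in for the real cache key function (assumption: BLAKE2b here).
    return hashlib.blake2b(source, digest_size=32).digest()


def weak_key(source: bytes) -> bytes:
    # Deliberately terrible "hash" used only to provoke a collision.
    return bytes([len(source) % 256])


class VerifyingCache:
    """Cache that stores the original source and checks it on every hit."""

    def __init__(self, key_fn: Callable[[bytes], bytes] = strong_key) -> None:
        self._key_fn = key_fn
        self._entries: Dict[bytes, Tuple[bytes, bytes]] = {}

    def put(self, source: bytes, binary: bytes) -> None:
        self._entries[self._key_fn(source)] = (source, binary)

    def get(self, source: bytes) -> Optional[bytes]:
        entry = self._entries.get(self._key_fn(source))
        if entry is None:
            return None
        stored_source, binary = entry
        # A key match alone is not trusted: treat a mismatch as a miss.
        if stored_source != source:
            return None
        return binary
```

With `weak_key`, two different sources of equal length share a key, and the lookup for the second one returns a miss instead of the first one's binary. Without the comparison, the same lookup would hand back the wrong binary, which is the failure pendingchaos describes.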

11:46 <funtoomen> ok, i think i get it now.

11:47 <HdkR> Blake3 is actually quite quick for being a cryptographic hash which helps :)

11:47 <karolherbst> but anyway.. sha1 is already broken, soo...

11:48 <funtoomen> yeah, thats why they switched i guess

11:48 <karolherbst> I am wondering if any use actually hit a cache collision already...

11:48 <karolherbst> *user

11:48 <funtoomen> same

11:48 <karolherbst> *hash

11:49 <pq> they probably tried a different mesa version next - does that invalidate the whole cache?

11:49 <funtoomen> would love i someone calculated the probability based on the size of hash cache

11:49 <funtoomen> s/love i/love if/
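
The probability funtoomen asks about can be estimated with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n cache entries and a b-bit hash. This is a rough sketch that only covers accidental collisions and assumes the hash behaves like a uniformly random function and that the full digest is used as the key; the entry counts passed in are made-up inputs, not measurements of any real cache.

```python
import math


def collision_probability(entries: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one accidental collision."""
    exponent = entries * (entries - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # 32 bits is the size of a typical checksum; 160 and 256 bits are the
    # digest sizes of SHA-1 and of BLAKE3's default output.
    for bits in (32, 160, 256):
        print(bits, collision_probability(1_000_000, bits))
```

Under these assumptions a 32-bit checksum reaches roughly even odds of a collision at around 77,000 entries, while for 160-bit or 256-bit digests the estimate stays vanishingly small even for a million entries. That matches pendingchaos's "probably never" for accidental collisions and says nothing about deliberately constructed ones.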

11:49 <karolherbst> pq: yes

11:49 <pq> so that would blow the problem away

11:50 <karolherbst> well... we use the build-id of the so files

11:50 <funtoomen> thank you all, i have to go

11:50 funtoomen has quit [Quit: Page closed]

11:50 <psykose> dottedmag: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22387 is like 50% already just over sha1 for full cache hits

11:50 <psykose> not sure what that 2.4s ratio is of cache:everything-else

11:51 <psykose> but it's really just low territory at that point i guess
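
dottedmag's Amdahl's law argument can be made concrete with the standard formula: if hashing accounts for a fraction p of the total time and is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). The fractions used below are invented for illustration; nobody in the discussion measured them.

```python
def overall_speedup(hash_fraction: float, hash_speedup: float) -> float:
    """Amdahl's law: total speedup when only the hashing part gets faster."""
    if not 0.0 <= hash_fraction <= 1.0:
        raise ValueError("hash_fraction must be between 0 and 1")
    if hash_speedup <= 0.0:
        raise ValueError("hash_speedup must be positive")
    return 1.0 / ((1.0 - hash_fraction) + hash_fraction / hash_speedup)


if __name__ == "__main__":
    # Hypothetical: hashing is 2% of the run and becomes free.
    print(overall_speedup(0.02, float("inf")))
    # Hypothetical: hashing is half of the run and becomes twice as fast.
    print(overall_speedup(0.5, 2.0))
```

If hashing were only 2% of the run, even a hash that costs nothing would make the whole thing about 1.02 times faster, which is why a benchmark of the hashing share matters before swapping hash functions again.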

11:51 <karolherbst> I wonder if it's better to just depend on a lib implementing blake3 instead of shipping one ourselves....

11:51 <psykose> well, it is a lib

11:51 <psykose> it's blake3 reference copied into the tree

11:51 <psykose> :D

11:51 <karolherbst> yeah...

11:52 <psykose> just like sha1 was and xxhash also is i think

11:52 <karolherbst> could also just depend on libblake instead or something :P

11:52 <psykose> aye

11:53 <pq> good luck adding any dependency to mesa, or even bumping existing ones

11:53 <karolherbst> ohh seems like blake3 is so fast, because it actually bothered to keep SIMD in mind

11:54 <dottedmag> I remember maintaining libsha1 (a copy-paste from somewhere else, SHA1 only) in an embedded distro just to build kdrive (or was it fontconfig?). It wasn't that fun.

11:54 <karolherbst> so the speed mainly comes from the fact you can run stuff in parallel

11:55 <karolherbst> fun.. seems like the rust impl of blake3 is the "main" one

11:56 <psykose> eh, i'd say the C one is also maintained and meant to be used

11:56 <psykose> not like some thing someone threw in

11:56 <karolherbst> yeah, but you can't run it with multiple threads

11:56 <psykose> well, on the same input no

11:57 <psykose> but shaders are parallelisable per-shader anyway, no?

11:57 <psykose> i.e. multiple hash threads, each takes 1, ..

11:57 <karolherbst> it's not limited to one operation

11:57 <karolherbst> you can run hashing one thing in parallel

11:57 <karolherbst> (at least with the rust impl)

11:58 <karolherbst> `b3sum` e.g. does this

11:59 <psykose> yeah, i'm referring to the C version and what you can do anyway, abstractly for things that don't parallel on one input

11:59 <psykose> personally i always found that to be an easier model
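
The model psykose describes, several hash workers that each take one whole input, might look like the sketch below. It says nothing about how Mesa schedules shader compilation: `hash_one` and `hash_many` are hypothetical helpers, and `hashlib.blake2b` again stands in for BLAKE3. Python threads are used only to show the structure, and whether they give a real speedup depends on the interpreter and the input sizes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def hash_one(data: bytes) -> str:
    # Each input is hashed sequentially by exactly one worker.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def hash_many(inputs: Sequence[bytes], workers: int = 4) -> List[str]:
    """Hash independent inputs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_one, inputs))
```

Because no single input is split, this works with any hash function, including ones that cannot be parallelised internally.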

11:59 <karolherbst> right

11:59 <karolherbst> it probably doesn't make much sense if compilation of shaders happens in parallel already

11:59 <karolherbst> but it seems like most of the speed actually comes from doing it multithreaded

12:00 <karolherbst> yeah...

12:01 <psykose> are you sure? with b3sum it's actually slower with more threads on a random thing i tested

12:01 <karolherbst> huh.. weird

12:02 <psykose> ah, no

12:02 <psykose> i was misreading

12:02 <karolherbst> I wonder how its speed compares to sha1sum? with 1 thread

12:02 <karolherbst> but yeah.. it looks like 8-16 threads is somehow the sweet spot

12:03 <karolherbst> at least according to the paper

12:03 <karolherbst> ohh.. the "5.2 Multi-threading" section explains it quite well

12:04 <karolherbst> makes it sound like you can actually also do it on the GPU fairly easy

12:04 <psykose> https://img.ayaya.dev/N3dIgJKYNQFu

12:04 <karolherbst> okay

12:04 <psykose> personally though i'd say the User: time is the most interesting one, and how it's 2x faster than sha1sum

12:04 <karolherbst> yeah, so it's not _that_ much faster single threaded

12:04 <karolherbst> still faster tho

12:04 <psykose> more threads is 'nice' but you can't in general rely on it because it is more actual cpu resource even if wall is lower

12:05 <karolherbst> yeah

12:05 <psykose> (compression algorithms usually have a scaling issue there, like if you do more than -T4 on zstd you'll get less time but at like -T16 you're not even halving the time and using over 4x the cpu, so it becomes very wasteful unless you have a dedicated reason to do it, etc)

12:06 <karolherbst> yeah so in blake3 you chunk the input and each thread can operate on those chunks, and then the main thread chains it all together in order

12:06 <psykose> yeah, it scales quite efficiently
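
The chunk-then-combine idea karolherbst describes can be sketched as a toy tree hash. This is NOT BLAKE3 and its digests are not compatible with it: the 1024-byte chunk size is taken from BLAKE3, but the leaf and parent functions use `hashlib.blake2b` as a stand-in, the way an odd node is carried up is a simplification, and all names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

CHUNK_SIZE = 1024  # chunk size borrowed from BLAKE3; the rest is simplified


def _leaf(chunk: bytes) -> bytes:
    # The prefix byte separates leaf hashes from parent hashes.
    return hashlib.blake2b(b"\x00" + chunk, digest_size=32).digest()


def _parent(left: bytes, right: bytes) -> bytes:
    return hashlib.blake2b(b"\x01" + left + right, digest_size=32).digest()


def tree_hash(data: bytes, workers: int = 4) -> bytes:
    """Hash chunks independently, then combine the results pairwise in order."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # Leaves do not depend on each other, so they can be hashed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level: List[bytes] = list(pool.map(_leaf, chunks))
    # Combining is cheap: each round halves the number of nodes.
    while len(level) > 1:
        merged = [_parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd node is carried up unchanged
        level = merged
    return level[0]
```

The result does not depend on the number of workers, only on the data, because the combine step always walks the chunk digests in their original order.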

12:06 <karolherbst> you could probably calculate all those chunks on a GPU and just chain it on the CPU :D

12:07 <karolherbst> but kinda cool

12:08 <karolherbst> being able to make full use of SIMD is nice

12:08 <karolherbst> at least it's a way more sane programming model they had in mind it seems

12:08 <psykose> nerd :p

12:08 <karolherbst> :P

12:09 <karolherbst> well.. most other things vectorize a linear operation which doesn't get you anywhere

12:12 itoral has quit [Remote host closed the connection]

12:14 YuGiOhJCJ has quit [Remote host closed the connection]

12:14 Guest4527 has quit [Quit: Guest4527]

12:19 kts has joined #dri-devel

12:23 frankbinns has quit [Remote host closed the connection]

12:31 <dottedmag> karolherbst: We put a shader into shader so you can hash while you hash?

12:31 kts has quit [Ping timeout: 480 seconds]

12:33 <karolherbst> yes

12:37 <javierm> tzimmermann: nice cleanup series. I only read the cover-letter for now but agree on the direction. I'll try to review the patches tomorrow

12:38 <javierm> tzimmermann: btw, I rebased last night the RFC to split FB in FB_CORE and FB. After your recent fbdev cleanups, I could drop two patches and now are only Kconfig and makefile changes :)

12:40 <tzimmermann> thanks, javierm

12:41 <tzimmermann> i wanted to use the firmware edid with simpledrm, but found that the rsp code is slightly chaotic. hence the cleanup

12:42 <javierm> tzimmermann: right. But having screen_info defined only for the arches that use it would be great. I remember having build issues due some arches missing ifdefery for that

12:43 lina has joined #dri-devel

12:44 <tzimmermann> yes. and we can even do better, i think. i outline in the cover letter that we could enable it only when there are actual users. it's not in the patchset, but a follow-up would be straight forward.

12:44 <javierm> tzimmermann: yup, I read that. That's why I said agree on the direction :)

12:45 <tzimmermann> javierm, BTW have you seen https://gitlab.freedesktop.org/drm/amd/-/issues/2649

12:46 <tzimmermann> for some odd config, the fbdev console doesn't set up correctly. i'm trying to wrap my head around it, but it's confusing

12:46 <tzimmermann> if the primary display is off, the usb-attached monitors remain off as well.

12:46 <tzimmermann> it's a regression

12:46 <javierm> tzimmermann: hmm, no I haven't seen that bug before. Let me read it...

12:49 fab has joined #dri-devel

12:52 <MrCooper> tzimmermann: I suspect the lid being closed might just matter indirectly as well, e.g. via timing; AFAICT the fundamental issue is that the fbdev emulation doesn't correctly handle the hot-plugged DP MST connector

12:54 <tzimmermann> it does work if the old output_poll_changed callback has been set. i cannot get how this affects fbdev state. maybe i'll fill the code with printks and let the reporter run it before and after.

12:59 <javierm> tzimmermann, MrCooper: if amdgpu is using the generic fbdev emulation then setting .output_poll_changed should not be needed indeed

12:59 <javierm> I guess will have to dig that driver code

13:01 <MrCooper> "should" being the key word :)

13:01 <javierm> MrCooper: right :)

13:01 <tzimmermann> right. it "should" not be needed. so there's a bug somewhere

13:04 rasterman has quit [Quit: Gettin' stinky!]

13:12 sgruszka has quit [Ping timeout: 480 seconds]

13:18 <pq> Has anyone else had problems that when program and Mesa are built with ASan, ASan itself segfaults on exit (radeonsi), or everything futex-deadlocks before the program finished (llvmpipe)?

13:19 <pq> the program is Weston, fwiw

13:21 simon-perretta-img has joined #dri-devel

13:21 <pq> That's Mesa build with glvnd. Without glvnd I get reports of leaks inside Mesa with radeonsi, and the same futex deadlock with llvmpipe.

13:22 <pq> and I'm pretty sure it's not Weston leaking anything, since eglTerminate + eglReleaseThread should clean up.

13:25 illwieckz has quit [Ping timeout: 480 seconds]

13:35 illwieckz has joined #dri-devel

13:38 sgruszka has joined #dri-devel

13:45 <lordheavy> Ok, i have more information about clang failure and rusticl - adding this simple patch https://paste.xinu.at/6CTg/

13:45 <MrCooper> pq: I think the EGLDisplay is inevitably leaked without EGL_KHR_display_reference (which Mesa doesn't support yet: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10118)

13:45 <lordheavy> gives me error: unknown argument: '-no-opaque-pointers'

13:46 <lordheavy> i think it's related to archlinux - so not a mesa bug - but this kind of information is useful instead of just the message 'Couldn't create Clang invocation'

13:47 <psykose> lordheavy: that looks like the mesa is using clang16 which doesn't recognise the no-opaque mode anymore

13:48 <psykose> https://gitlab.freedesktop.org/mesa/mesa/-/issues/7468 intel clc is not ported yet, so only works with 15 afaict

13:48 <psykose> that said i'm not a developer for that, just what i know of of that combination more generally :)

13:49 JohnnyonFlame has quit [Ping timeout: 480 seconds]

13:49 <psykose> code part is https://gitlab.freedesktop.org/mesa/mesa/-/blob/9ca1bb3cf8f2f4d9378ceb8ae39e6f853fb900b0/src/compiler/clc/clc_helpers.cpp#L787

13:49 <psykose> and you're right, error should be logged i think

13:50 <lordheavy> psykose: oh, thanks - now it's time to patch and test

13:50 <psykose> i don't think there will be a trivial patch unless it was accidentally left in there, as being opaque-pointers compatible usually takes a bunch more work

13:50 <psykose> good luck however

13:51 JohnnyonFlame has joined #dri-devel

13:57 <pq> MrCooper, oh. In that case I'd kinda expect to see more leaks than I do. Would leaking the display also leak shader bits?

13:58 <MrCooper> that's been my assumption, not sure though

13:59 <MrCooper> valgrind always reports tons of leaks in Mesa code for me with a Wayland compositor or Xwayland; I've been assuming they're mostly due to this, might be wishful thinking though :)

13:59 fab_ has joined #dri-devel

14:00 <pq> https://gitlab.freedesktop.org/-/snippets/7648 are the leaks I see from the main thread. There were more in other mesa created threads.

14:00 fab_ is now known as Guest4544

14:00 <lordheavy> psykose: great, removing '-no-opaque-pointers' at least fixed my issue with clinfo ;)

14:00 <psykose> :D

14:05 Haaninjo has joined #dri-devel

14:06 fab has quit [Ping timeout: 480 seconds]

14:06 Guest4544 is now known as fab

14:06 fab is now known as Guest4545

14:09 Company has joined #dri-devel

14:21 junaid has joined #dri-devel

14:23 <javierm> MrCooper, tzimmermann: after staring at the amdgpu code for a long time, the only thing that I can think of is that drm_fbdev_generic_setup() is only called when there are available connectors

14:23 <javierm> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2169

14:23 <javierm> if after probing the driver, a connector is added then the generic fbdev won't be set-up ?

14:24 <tzimmermann> javierm, indeed. that's some weird code

14:25 <tzimmermann> that whole condition should be removed IMHO

14:25 <javierm> which might cause the issue since the drm_fbdev_generic_setup() -> drm_fbdev_generic_client_hotplug() won't happen

14:26 <MrCooper> javierm: sounds plausible, if it wasn't for the eDP connector existing even with the lid closed here (even with connected status)

14:27 <MrCooper> also, fbdev emulation works on the internal panel if I open the lid

14:27 <javierm> MrCooper: that's not what the shared dmesg log says though... at least how I read it

14:27 <javierm> [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:78:eDP-1] status updated from unknown to connected

14:27 <tzimmermann> javierm, for the output_poll_changed to work, you'd still need generic_setup

14:27 <javierm> tzimmermann: hmm, right

14:27 <tzimmermann> still, that branch should probably go

14:28 <javierm> tzimmermann, MrCooper: but just by reading the code, I can't see a reason why drm_fb_helper_output_poll_changed() is needed that drm_fbdev_generic_setup() doesn't already

14:28 <MrCooper> javierm: FWIW, the lid of this laptop (which is affected by the same or at least very similar issue) is currently closed, and drm_info says eDP status is connected

14:29 <javierm> MrCooper: yeah, my suspicion was wrong if drm_fbdev_generic_setup() is needed for mode_config->output_poll_changed

14:29 <tzimmermann> javierm, MrCooper. IMHO there's something with the handling of deferred_setup https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2329 as if it fails once and then cannot recover

14:29 sgruszka has left #dri-devel [#dri-devel]

14:29 <tzimmermann> but i cannot really point to the issue

14:33 <tzimmermann> at probe we call generic_setup()

14:33 <tzimmermann> it simulates a hotplug to initialize the display: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fbdev_generic.c#L343

14:33 Ahuj has quit [Ping timeout: 480 seconds]

14:34 <tzimmermann> this apparently worked, as there's no error message in that debug log

14:36 <javierm> tzimmermann: yeah, some debug log in drm_fbdev_generic_setup() when it succeeds would be useful

14:37 <tzimmermann> the output_poll_changed callback is only called by the two functions starting at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_probe_helper.c#L691

14:37 <tzimmermann> it's immediately followed by client_dev_hotplug(), which calls our client code

14:38 <tzimmermann> and it should end up in the same place as output_poll_changed, namely drm_fb_helper_hotplug_event: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fbdev_generic.c#L272

14:39 <javierm> yeah

14:39 <javierm> that's my conclusion as well so I don't understand why is failing...

14:40 <tzimmermann> and i'm pretty sure that we take this branch, because our initial simulated hotplug did not fail. so dev->fb_helper should be set at this point

14:40 <javierm> yes

14:43 <tzimmermann> javierm, i don't find this line in the debug logs: https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L2085

14:43 <tzimmermann> grep for hotplug_event and there's only the sysfs stuff

14:44 <tzimmermann> maybe it fails in this condition: https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L2077

14:44 kzd has joined #dri-devel

14:44 <tzimmermann> that would still return an err of 0

14:47 <macromorgan> anyone have experience with tinydrm panel drivers? I'm confused about the rs pin and how I make it work with my SPI controller...

14:50 <javierm> tzimmermann: which would be !fb_helper->fb since is unlikely that the mutex grabbing in drm_master_internal_acquire() would fail

14:50 <javierm> so it seems your intuition is correct and the problem is in the delayed outplug path

14:51 <javierm> *hotplug

14:51 <tzimmermann> javierm, there's x11 running! so drm_master_internal_acquire() should fail

14:51 <tzimmermann> x11 is the drm master already

14:52 <tzimmermann> after doing the described vt-switch (alt+f2), the code would run last_close IIRC

14:52 f11f12 has quit [Quit: Leaving]

14:53 <tzimmermann> and last_close ultimately brings us to https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L232

14:53 <tzimmermann> where delayed_hotplug is being handled

14:53 <tzimmermann> i still don't see the issue, though :/

14:53 <MrCooper> in my case, gnome-shell is most certainly not running yet at that point, plymouth might be though

14:56 JohnnyonFlame has quit [Ping timeout: 480 seconds]

14:58 <javierm> tzimmermann: yeah porque if (do_delayed) then drm_fb_helper_hotplug_event() should be called

14:58 <javierm> err, because. I don't know why I mixed spanish and english in the same sentence haha

14:58 <javierm> tzimmermann: https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L262

15:00 <tzimmermann> de nada

15:00 <javierm> :D

15:01 <javierm> tzimmermann: so I'm also not seeing the issue... MrCooper maybe you can add some debug logs in drm_fb_helper_hotplug_event() and figure out whether is called or not on switch to VT ?

15:01 alyssa has joined #dri-devel

15:03 <tzimmermann> maybe we have set deferred_setup https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L1945

15:04 <javierm> tzimmermann: but wouldn't that be the case too for the drm_fb_helper_output_poll_changed() -> drm_fb_helper_hotplug_event() path ?

15:05 <MrCooper> can do

15:05 <tzimmermann> then we'd return early: https://elixir.bootlin.com/linux/v6.1.35/source/drivers/gpu/drm/drm_fb_helper.c#L241
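
The control flow tzimmermann and javierm walk through can be summarised in a small state model. It is reconstructed from the conversation and deliberately simplified; it is not a transcription of drm_fb_helper.c. The class `FbHelperModel`, its field names and the log strings are hypothetical, and what a hotplug event does while `deferred_setup` is set (retrying the initial configuration) is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FbHelperModel:
    has_fb: bool = True            # fbdev framebuffer already created
    other_master: bool = False     # e.g. X11 currently holds DRM master
    deferred_setup: bool = False   # initial config was postponed
    delayed_hotplug: bool = False  # a hotplug is waiting for fbdev restore
    log: List[str] = field(default_factory=list)

    def hotplug_event(self) -> int:
        if self.deferred_setup:
            # Assumption: the event retries the initial configuration here.
            self.log.append("retry initial config")
            return 0
        if not self.has_fb or self.other_master:
            # The condition discussed at 14:44: remember the event for later
            # and still return 0, so no error shows up in the logs.
            self.delayed_hotplug = True
            self.log.append("hotplug delayed")
            return 0
        self.log.append("reprobe outputs")
        return 0

    def restore_fbdev_mode(self) -> None:
        """What last_close / a VT switch ends up calling in this model."""
        if self.deferred_setup:
            # The early return mentioned at 15:05: nothing below runs.
            self.log.append("restore skipped")
            return
        do_delayed = self.delayed_hotplug
        self.delayed_hotplug = False
        if do_delayed:
            self.hotplug_event()
```

In this model a hotplug that arrives while another master is active is parked and only replayed by the next restore. If `deferred_setup` is set at restore time, a parked hotplug is never replayed, which is one way the "fails once and then cannot recover" suspicion could play out.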

15:09 <tzimmermann> MrCooper, thanks

15:10 padovan4 has joined #dri-devel

15:10 <javierm> MrCooper: great. I'm out of ideas

15:10 padovan4 is now known as padovan

15:11 Duke`` has joined #dri-devel

15:17 <MrCooper> thanks for the brainstorming guys

15:19 alyssa has quit [Quit: alyssa]

15:21 Haaninjo has quit [Ping timeout: 480 seconds]

15:22 bgs has quit [Remote host closed the connection]

15:22 alyssa has joined #dri-devel

15:25 tzimmermann has quit [Quit: Leaving]

15:34 Ahuj has joined #dri-devel

15:38 bmodem has quit [Ping timeout: 480 seconds]

15:42 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

15:42 TMM has joined #dri-devel

15:49 illwieckz has quit [Ping timeout: 480 seconds]

15:53 frieder has quit [Remote host closed the connection]

16:02 Haaninjo has joined #dri-devel

16:06 Haaninjo has quit []

16:16 clever has quit [Ping timeout: 480 seconds]

16:17 illwieckz has joined #dri-devel

16:18 cmichael has quit [Quit: Leaving]

16:21 <agd5f> javierm, we don't set up the fbdev code if the GPU doesn't have any display hardware on the GPU.

16:23 <agd5f> some GPUs may not have any display IPs at all, others may have display IPs, but no physical connectors on the board.

16:28 Ahuj has quit [Ping timeout: 480 seconds]

16:35 dylanchapell has joined #dri-devel

16:38 <javierm> agd5f: yeah, I think I understood the rationale of that logic. But that wasn't the issue anyways as tzimmermann mentioned. It's likely something in the deferred setup

16:38 <javierm> macromorgan: sorry, I missed your message before. What problem do you have, with what panel driver ?

16:40 benjaminl has joined #dri-devel

16:46 benjamin1 has quit [Ping timeout: 480 seconds]

16:59 benjamin1 has joined #dri-devel

17:01 <macromorgan> I'm trying to create a new panel driver for tinydrm based on this: https://github.com/FunKey-Project/linux/blob/FunKey_S/drivers/staging/fbtft/fb_st7789v.c

17:02 benjaminl has quit [Read error: Connection reset by peer]

17:02 rasterman has joined #dri-devel

17:02 <macromorgan> I'm having issues understanding though how to handle the RS pin (which is hardwired to the MISO pin)

17:03 djbw has joined #dri-devel

17:03 benjaminl has joined #dri-devel

17:03 <macromorgan> if I define the pinctrl for the SPI bus, won't that block me from using the MISO pin as the RS pin? Is there a helper function that does that in Linux I'm missing?

17:08 <javierm> macromorgan: there's already a panel driver for this chip I see: drivers/gpu/drm/panel/panel-sitronix-st7789v.c

17:09 benjamin1 has quit [Ping timeout: 480 seconds]

17:10 <macromorgan> that's for initing the panel via SPI but displaying it via DPI, if I'm not mistaken

17:10 <macromorgan> Mine is to both init and display via SPI

17:10 <macromorgan> I assumed that needed a different setup

17:13 <javierm> macromorgan: ah Ok. But still I wonder if it wouldn't be better to extend that driver to support both DPI and SPI transports

17:13 <javierm> mripard ^

17:14 <macromorgan> If that's the route we want to go. Honestly I'm just trying to get the panel to work, then I can worry about making it mainline conformant

17:15 <macromorgan> this is the first time I've worked with a pure SPI panel that used the MISO pin as a "switch" to note if we're sending data or commands

17:23 <javierm> macromorgan: yeah, that's normal for some of these SPI panels. It's usually called D/C (data or command) and not RS though (which sounds more like reset?)

17:24 mbrost has joined #dri-devel

17:24 <javierm> macromorgan: and what you do usually is to use a GPIO to toggle that pin

17:28 <javierm> macromorgan: it seems it's called D/CX in your chip datasheet, looking at the "8.4 Serial Interface" section in https://newhavendisplay.com/content/datasheets/ST7789V.pdf

17:29 kts has joined #dri-devel

17:29 <javierm> macromorgan: "In 4-lines serial interface, data packet contains just transmission byte and control bit D/CX is transferred by the D/CX pin"

17:29 <macromorgan> yep... sadly in my implementation it's hooked to the MISO pin

17:30 Guest4336 has quit []

17:30 leandrohrb5 has quit [Quit: The Lounge - https://thelounge.chat]

17:30 italove7 has quit []

17:30 dwlsalmeida has quit [Quit: The Lounge - https://thelounge.chat]

17:30 leandrohrb is now known as leandrohrb5

17:31 <javierm> macromorgan: I see... and you must use 4-wire, you can't support a 3-wire SPI setup?

17:31 dwlsalmeida8 has joined #dri-devel

17:32 <javierm> because with 3-wire you can have the D/C bit as a part of the 9-bit payload

17:32 <macromorgan> I honestly don't know

17:32 <macromorgan> new to SPI displays honestly

17:33 <javierm> macromorgan: look at the "8.4.2 Command write mode" section in the datasheet I shared

17:33 <macromorgan> okay will do

17:34 <javierm> you can either send a 9-bit payload (where the first bit is the D/CX to let the controller know whether the payload is data or a command) or an 8-bit payload (where the D/CX is out-of-band using a pin)

17:35 <macromorgan> so if I send an 8 bit payload over a 3-wire interface I should be golden right?

17:36 <macromorgan> I guess I can try that and see if it falls flat on its face or not

17:36 <javierm> macromorgan: yeah, that won't work because the chip won't know what you are sending it...

17:36 <macromorgan> ohh, wait, you mean send a 9 bit payload over a 3 wire interface

17:36 tursulin has quit [Ping timeout: 480 seconds]

17:37 <javierm> macromorgan: yes

17:37 <macromorgan> okay, let's try that :-)

17:37 <javierm> the 9-bit is <1-bit D/CX, 8-bit payload>
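
The 9-bit framing javierm describes can be sketched as follows. This is a minimal illustration, not driver code: `frame_9bit` and `pack_words` are hypothetical helper names, and it assumes D/CX = 0 marks a command and D/CX = 1 marks data, with the D/CX bit sent first. The packing step only matters if the SPI controller cannot send 9-bit words natively.

```python
def frame_9bit(byte: int, is_data: bool) -> int:
    """Build one 9-bit word: D/CX in bit 8 (sent first), payload in bits 7..0.

    Assumption: D/CX = 0 means command, D/CX = 1 means data.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("payload must fit in 8 bits")
    return (int(is_data) << 8) | byte


def pack_words(words) -> bytes:
    """Pack 9-bit words MSB-first into bytes, zero-padding the final byte."""
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently held in the accumulator
    out = bytearray()
    for word in words:
        bits = (bits << 9) | (word & 0x1FF)
        nbits += 9
        # Emit every complete byte that is available.
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        # Keep only the bits that have not been emitted yet.
        bits &= (1 << nbits) - 1
    if nbits:
        # Left-align the leftover bits in a final, zero-padded byte.
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)
```

For example, `frame_9bit(0x2A, False)` yields the command word `0x02A`, while `frame_9bit(0x2A, True)` yields the data word `0x12A`.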

17:37 <javierm> macromorgan: but your chip has to be configured to use that interface

17:37 <javierm> see "6.2 Interface Logic Pins" section

17:38 <javierm> pins IM[3-0] are used for that but I don't know whether those are accessible on your design

17:38 <javierm> you want those to be 0 1 0 1 according to that datasheet

17:39 <macromorgan> they are not available and I don't know how they are configured... let me check the panel display sheet in case it has something on it

17:39 <javierm> err, it seems 1 1 0 1. I misread

17:39 <macromorgan> https://cdn.hackaday.io/files/1649347056536256/ALIBABA_SAEF_SF-TC154B-8377A-N_annote.pdf

17:41 <javierm> macromorgan: I've to leave but I had to implement something like that for the ssd130x driver: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/solomon/ssd130x-spi.c#L21

17:41 <macromorgan> okay, thank you for your help. Gives me something to look at more

17:41 <javierm> but if you can't use a GPIO, then that's not an option for you :(

17:42 <javierm> macromorgan: you are welcome. Now that I think about it 3-wire SPI should also be supported for the ssd130x, maybe I should implement that too

17:42 jhli has joined #dri-devel

18:14 kts has quit [Ping timeout: 480 seconds]

18:16 dviola has quit [Ping timeout: 480 seconds]

18:18 eukara has joined #dri-devel

18:18 <anholt_> daniels: thanks. looks like I was already picking the radv runners, so I just need to --stress and we'll be good.

18:21 junaid has quit [Quit: leaving]

18:26 JohnnyonFlame has joined #dri-devel

18:33 dylanchapell has quit [Ping timeout: 480 seconds]

18:37 illwieckz has quit [Ping timeout: 480 seconds]

18:37 diego has joined #dri-devel

18:37 clever has joined #dri-devel

18:37 kasper93 has joined #dri-devel

18:41 rasterman has quit [Quit: Gettin' stinky!]

18:42 illwieckz has joined #dri-devel

18:44 <daniels> nice

18:51 LexSfX has joined #dri-devel

19:10 hussam has joined #dri-devel

19:11 <hussam> Hello. Does OpenCL 3.0 work with mesa on a 620 intel hd?

19:12 mbrost has quit [Ping timeout: 480 seconds]

19:14 kzd has quit [Ping timeout: 480 seconds]

19:38 junaid has joined #dri-devel

19:53 neonking has joined #dri-devel

20:00 kzd has joined #dri-devel

20:00 vliaskov has quit []

20:07 gouchi has joined #dri-devel

20:09 gouchi has quit []

20:20 CATS has quit [Ping timeout: 480 seconds]

20:25 Guest4545 has quit [Quit: Guest4545]

20:29 * alyssa glares at dEQP-EGL.functional.render.multi_context.*

20:29 <alyssa> I'm running it in a loop and it's passing reliably.

20:29 <alyssa> But I've SEEN it flake T_T

20:30 <karolherbst> hussam: yes

20:30 <karolherbst> well.. mostly

20:31 <alyssa> apparently my stress test isn't stressful enough

20:31 <karolherbst> you need help hitting flakes? tried running 200 threads in parallel?

20:31 <alyssa> That's an idea..

20:31 <hussam> karolherbst: What meson options do I need?

20:31 <karolherbst> that's how I fixed all the CL flakes I had left 🙃

20:31 <karolherbst> well.. maybe not 200, but...

20:32 <karolherbst> using "stress" to keep your CPU busy does help with finding CPU related flakes 🙃
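
The "run many copies in parallel" approach to hitting flakes can be sketched as below. This is an assumption-laden illustration, not the tooling anyone in the channel used: `hunt_flakes` is a hypothetical helper, and the command being run is a placeholder for whatever test invocation is under investigation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def hunt_flakes(cmd, iterations=100, jobs=8):
    """Run `cmd` `iterations` times, `jobs` at a time.

    Returns the indices of the runs that exited with a non-zero status.
    """
    def run_once(_index):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode

    # Threads are enough here: each one just waits on a child process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(run_once, range(iterations)))
    return [i for i, code in enumerate(codes) if code != 0]


if __name__ == "__main__":
    # Placeholder command; substitute the real test invocation.
    failures = hunt_flakes([sys.executable, "-c", "pass"], iterations=20, jobs=4)
    print(f"{len(failures)} failing runs: {failures}")
```

Raising `jobs` well above the number of CPU cores oversubscribes the machine, which is the point when chasing timing-dependent failures.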

20:32 <karolherbst> hussam: gallium-rusticl=true

20:33 <hussam> will that generate the icd file?

20:33 <karolherbst> yes

20:33 <hussam> Thank you. I will try now.

20:34 CATS has joined #dri-devel

20:41 <karolherbst> jenatali: ever seen a "Attribute does not match Module context!" error?

20:42 <hussam> karolherbst: Done. hashcat says no devices found.

20:42 <jenatali> karolherbst: Sounds like you've got two LLVM contexts?

20:42 <karolherbst> jenatali: so I have a user seeing this: https://pastebin.com/eiQn4R21

20:42 <karolherbst> mhh but yeah..

20:42 <karolherbst> could be something silly inside gentoo again...

20:43 <karolherbst> but also makes no sense really..

20:43 <karolherbst> hussam: need to set RUSTICL_ENABLE=iris

20:43 <hussam> yay. that worked.

20:43 <karolherbst> it might or might not work correctly yet

20:43 <jenatali> Weird

20:44 <karolherbst> it's good enough to pass the CTS and run random stuff... but.. there are still issues I want to tackle before enabling anything by default :F

20:44 <karolherbst> jenatali: yeah...

20:44 <karolherbst> I'll ask for a LD_DEBUG=libs...

20:45 <karolherbst> jenatali: this "llvm::compression::zstd::compress" confuses me....

20:46 <psykose> how so

20:46 <karolherbst> why would zstd::compress be called when reporting an error?

20:47 <karolherbst> I'm sure it's just some optimized build and figuring out the symbols is screwed up

20:47 JohnnyonF has joined #dri-devel

20:47 <psykose> hhmm

20:47 <psykose> yeah, you're right

20:48 <psykose> that does not track at all so it's wrong symbols or similar brokenness

20:49 <psykose> well, maybe there's some magic path where the function call inside report_fatal_error is so broken it jumps into zstd::compress which then aborts on something :D

20:49 <psykose> corrupted memory can do anything

20:49 <karolherbst> jenatali: https://pastebin.com/JcLzijD3 .... mhhhh

20:49 <karolherbst> I don't know, but having two llvms....

20:50 <jenatali> Yeah 100% that's the problem

20:50 <karolherbst> `libigdfcl` that's intel, no?

20:50 <karolherbst> yep...

20:50 <karolherbst> mhhh

20:51 <karolherbst> :pain:

20:51 <psykose> i thought having two llvms just aborted on init

20:51 <psykose> or does that not happen on glibc

20:51 <karolherbst> good question

20:51 <psykose> on musl at least the second one will abort due to some Options thing being double registered, inside llvm's constructor

20:52 <karolherbst> yeah...

20:52 <karolherbst> that's usually what happens

20:52 <karolherbst> maybe something else happens now

20:52 <karolherbst> but anyway.... uhhh

20:52 <karolherbst> can we torch llvm? :D

20:53 <psykose> but cute dragon :(

20:53 <karolherbst> we keep the dragon

20:54 <karolherbst> I wonder if we really have to runtime load llvm....

20:54 <karolherbst> and just load it in a way it's only private to us

20:54 JohnnyonFlame has quit [Ping timeout: 480 seconds]

20:54 <jenatali> Like static linking...?

20:54 <karolherbst> but uhhh.....

20:54 <karolherbst> mnhhh

20:55 <karolherbst> I don't know if I'm in the mood for that kind of bikeshed coming to use

20:55 <karolherbst> *us

20:55 <karolherbst> _but_

20:55 <karolherbst> we could tell distributions if they support multiple llvm versions at once, then they either static link or we close all bugs

20:55 <psykose> it's a distribution issue to make sure everything in mesa path has the same llvm version, yes

20:56 <psykose> but it really is very obvious 99% of the time, you just get a load abort..

20:56 <karolherbst> ehh it's not inside mesa

20:56 <psykose> i don't know why this case is different

20:56 <karolherbst> mesa is fine

20:56 <karolherbst> well

20:56 <karolherbst> soo

20:56 <karolherbst> ever heard of the Vulkan ICD thing? so this was an idea they took from CL and CL does the same thing

20:56 <psykose> i am completely clueless about anything ICD :D

20:56 <karolherbst> the issue with CL is, that... all (like.. almost all) cl impls use LLVM

20:57 <karolherbst> loading multiple implementations at once

20:57 <psykose> yeah, all gotta match

20:57 <karolherbst> so the user has Intel's CL stack (on LLVM-15) and mesa (on LLVM-16)

20:57 <karolherbst> and the ICD dlopens those CL impls

20:57 <psykose> ah, i think that's the issue

20:58 <psykose> if you dlopen the conflicting llvm much later you get into this state

20:58 <karolherbst> yep, the user already confirmed it :)

20:58 <psykose> but yeah, it's a pain

20:58 <karolherbst> sooo.. the icd loads with `RTLD_LAZY|RTLD_LOCAL`

20:58 <karolherbst> _but_

20:59 <karolherbst> there is a `// | RTLD_DEEPBIND` in the code...

20:59 <karolherbst> and I wonder if that would fix it...
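
The flag combinations being discussed can be sketched with Python's `ctypes`, which passes its `mode` argument to `dlopen`. This is only an illustration of the flags, not the ICD loader's code: `icd_dlopen_flags` and `load_vendor_library` are hypothetical names, and whether `RTLD_DEEPBIND` actually avoids the LLVM clash is the open question in the chat, not something this sketch demonstrates.

```python
import ctypes
import os


def icd_dlopen_flags(deepbind: bool = False) -> int:
    """Return the dlopen flags mentioned above: RTLD_LAZY | RTLD_LOCAL.

    With deepbind=True, RTLD_DEEPBIND is added when the platform defines it
    (it is a glibc extension; musl, for example, does not provide it).
    """
    flags = os.RTLD_LAZY | os.RTLD_LOCAL
    if deepbind:
        flags |= getattr(os, "RTLD_DEEPBIND", 0)
    return flags


def load_vendor_library(path: str, deepbind: bool = False) -> ctypes.CDLL:
    """Load a shared library with the flags above.

    Note: ctypes may itself force immediate binding, so RTLD_LAZY here
    mainly documents intent rather than guaranteeing lazy resolution.
    """
    return ctypes.CDLL(path, mode=icd_dlopen_flags(deepbind))
```

`RTLD_LOCAL` keeps the loaded library's symbols out of the global namespace for later loads; `RTLD_DEEPBIND` additionally makes the library prefer its own symbols over already-loaded global ones.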

20:59 <karolherbst> but really.. this part is really broken on linux

21:00 <psykose> i dunno, it sounds like it would just somewhat hide it more

21:00 <karolherbst> maybe

21:00 <karolherbst> but this is something we have to figure out I guess

21:00 <psykose> famously musl also does not have that, but distro side that's a 5 second patch for me so idc personally

21:00 <karolherbst> yeah so some distros support multiple LLVM versions

21:00 <karolherbst> like gentoo

21:00 <psykose> and similarly on distro side "match all the llvms" is just what i do

21:00 <karolherbst> and ubuntu

21:00 <airlied> and fedora :)

21:00 <psykose> e.g. in alpine mesa is 15 because blender is 15

21:00 <karolherbst> and I think fedora as well in theory

21:01 <karolherbst> I think dlopening llvm ourselves is probably the way to go here :/

21:01 <karolherbst> but llvmpipe devs will hate us

21:02 hikiko has quit [Ping timeout: 480 seconds]

21:02 <karolherbst> but uhhh...

21:02 <karolherbst> why is it such a huge issue with llvm

21:02 <psykose> it would be the same with any multiple-abi-versions dep

21:02 <psykose> llvm is just the famous one here :)

21:02 <karolherbst> ahh right.. because symbol versioning is broken or something

21:03 <psykose> clearly what we need is more market fragmentation

21:03 <psykose> so others start using definitelynotllvm that doesn't conflict

21:03 <karolherbst> yeah, maybe we just have to make more people run into this issue so it finally gets addressed

21:04 <psykose> libc side i don't think there's anyone super interested in addressing this in some meaningful way that i've seen for the loader

21:04 <karolherbst> but honestly.. why couldn't the icd dlopen our libraries in a way that dependencies are private to the library or something :/

21:04 <psykose> could be wrong

21:05 Kayden has quit [Quit: change locations]

21:06 <karolherbst> khronos' own loader uses RTLD_NOW mhhh

21:07 JohnnyonFlame has joined #dri-devel

21:13 <karolherbst> so the user had used khronos loader which is using RTLD_NOW mhhh

21:13 <karolherbst> I wonder if the loader should be fixed here...

21:14 JohnnyonF has quit [Ping timeout: 480 seconds]

21:16 <karolherbst> does anybody know if "RTLD_LOCAL" gets applied to dependencies as well?

21:17 <karolherbst> so if one loads Intel's CL impl with RTLD_LOCAL, would its LLVM dep be only local to it?

21:20 Duke`` has quit [Ping timeout: 480 seconds]

21:21 Kayden has joined #dri-devel

21:24 <psykose> if you mean things that are DT_NEEDED on the cl object and not something it itself also later dlopens then i think so

21:24 <psykose> didn't test though

21:24 <psykose> this stuff is so broken though i wouldn't be surprised if it's actually the opposite of that :D

21:25 <karolherbst> yeah...

21:25 <karolherbst> sooo.. I'm checking if it works with how ocl-icd loads things

21:25 <karolherbst> and if it does, I just file a bug against khronos loader or change it to RTLD_LOCAL

21:26 <karolherbst> because on fedora intel's stack also installs with llvm-15 :')

21:26 <psykose> yeah idk how that works at all

21:26 <psykose> i imagine mesa being 16 on any distro isn't actually functional for any later deps

21:26 <psykose> but there's always magic

21:27 <karolherbst> what magic

21:27 <psykose> magic of magic, the unknown :D

21:27 <psykose> (it's bedtime for me)

21:27 <psykose> nini karol

21:27 <karolherbst> rude

21:27 <psykose> <3

21:27 <karolherbst> :3

21:32 junaid has quit [Remote host closed the connection]

21:33 Daanct12 has joined #dri-devel

21:37 <karolherbst> yeah sooo...

21:37 <karolherbst> with ocl-icd it just works it seems

21:37 tursulin has joined #dri-devel

21:38 <karolherbst> or at least I think it does... mhh

21:38 <karolherbst> ehh no, both are built against llvm-15.. uhhh

21:39 Guest4502 has quit [Ping timeout: 480 seconds]

21:47 JohnnyonFlame has quit [Ping timeout: 480 seconds]

21:55 Kayden has quit [Quit: change loc]

21:57 <karolherbst> yeah dunno.. on fedora it just works

21:57 <karolherbst> must be a gentoo bug then

21:59 tursulin has quit [Ping timeout: 480 seconds]

22:05 <karolherbst> mattst88: so apparently on gentoo if a process loads llvm-15 and llvm-16 it crashes in weird ways. On fedora the same thing seems to work. Maybe fedora is doing something dodgy to make it work. Maybe gentoo also needs to do something dodgy. I have no idea, but just wanted to let you know

22:06 <karolherbst> this can happen if a user has intel's and mesa's CL impls installed and they are compiled against different LLVM versions

22:12 ngcortes has joined #dri-devel

22:15 <mattst88> karolherbst: ugh :(

22:15 <mattst88> oh, separately, did you make an MR with the patch Gentoo is carrying? the clang resource dir one

22:26 <alyssa> anholt_: any pointers about dEQP-EGL.functional.render.multi_context*?

22:26 <anholt_> alyssa: nope

22:26 <alyssa> Wheeee

22:26 <anholt_> disable any job reordering?

22:26 <alyssa> We're hitting very rare flakiness for it on Asahi.. but I see it's also on freedreno flake list so I'm thinking it's not driver specific

22:27 <alyssa> disabling the shader disk cache seems to make the flakes all but disappear, though I think I still hit.. the flake once with cache disabled

22:29 * alyssa should try to reproduce the flakiness on panfrost

22:33 <alyssa> looking through Mesa CI Daily Reports, I see a similar test (`wayland-dEQP-EGL.functional.color_clears.multi_context.gles1.rgba8888_window`) flaked(?) on llvmpipe

22:33 <alyssa> which seems like a harbinger

22:34 <alyssa> another related test is on the rpi3 flake list but I suspect the whole test group (not just that one) is flaky https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20452/diffs

22:35 <alyssa> I can't really imagine what could be going wrong for the multi_context tests (which AFAICT are still single-threaded?) to flake so rarely across... every driver that's running them in CI, seemingly?

22:36 <karolherbst> mattst88: I planned to do so tomorrow

22:43 Haaninjo has joined #dri-devel

22:43 benjamin1 has joined #dri-devel

22:50 benjaminl has quit [Ping timeout: 480 seconds]

22:53 <karolherbst> uhh multi_context?

22:54 <karolherbst> I think I was also seeing flakes in nouveau with that...

22:55 <karolherbst> but yeah, multi_context is single threaded

22:55 leo60228- has joined #dri-devel

22:57 leo60228 has quit [Ping timeout: 480 seconds]

22:57 i509vcb has joined #dri-devel

22:57 gtn has joined #dri-devel

22:57 Haaninjo has quit [Quit: Ex-Chat]

22:58 kzd has quit [Ping timeout: 480 seconds]

22:58 <alyssa> karolherbst: I don't understand how a single threaded test can flake on every driver

22:58 rasterman has joined #dri-devel

23:01 <alyssa> https://rosenzweig.io/flaker.xml if anyone is curious

23:02 <alyssa> I notice some interesting rectangular corruption

23:02 <alyssa> IDK if tile boundaries but.. doesn't seem natural

23:03 <alyssa> it just flakes /so rarely/ that I don't know how to debug this monster

23:03 <anholt_> MSAA only?

23:04 <alyssa> not sure, will try to capture more qpa's

23:05 <alyssa> also sometimes see GPU timeouts (though not faults). this seems a distinct symptom from the fails.

23:06 <anholt_> wonder how much happier everything would be if we did multisample render to single sampled in the winsys.

23:06 <alyssa> heh

23:06 gtn has quit []

23:08 <alyssa> this would be a lot easier if I could actually reproduce the damn flake

23:09 <alyssa> got another fail, this time config 48, EGL_SAMPLES=4

23:09 kzd has joined #dri-devel

23:12 <alyssa> 47, EGL_SAMPLES=2

23:13 <alyssa> another 47

23:13 <alyssa> a lot of 47

23:18 <alyssa> 47 twice more

23:18 <alyssa> while that's a little odd

23:18 <alyssa> it's testing 24 configs, only 3(?) of which are EGL_SAMPLES=0

23:19 <alyssa> so while I have not observed a non-MSAA failure, that's not wholly unexpected

23:19 rasterman has quit [Quit: Gettin' stinky!]

23:22 <alyssa> hacked up the driver to pretend not to support MSAA, let's go

23:26 JohnnyonFlame has joined #dri-devel

23:28 <alyssa> without MSAA, so far no fails observed after over 4000 iterations

23:29 <alyssa> going to let this keep going just in case, but I think this is indeed solid evidence that yes, it's MSAA related.

23:29 <alyssa> anholt_: nice one :+1:

23:29 shashanks_ has joined #dri-devel

23:35 shashanks__ has quit [Ping timeout: 480 seconds]

23:35 <alyssa> ok, 10k iterations with no fail

23:35 <alyssa> yeah, I'd say this is indeed MSAA only.

23:43 eukara has quit [Ping timeout: 480 seconds]

23:43 elongbug has quit [Read error: Connection reset by peer]

23:59 <airlied> src/compiler/nir/nir_opt_algebraic.c: In function ‘nir_opt_algebraic’:

23:59 <airlied> src/compiler/nir/nir_opt_algebraic.c:1374082: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers

23:59 <airlied> 1374082 | nir_foreach_function_impl(impl, shader) {

23:59 <airlied> nothing to see here, only 1.4M loc file :-P