#dri-devel on 2023-02-17 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:01 <anholt> I'm not sure that the time savings we're talking about here is worth the complexity. I'd be more interested in what compiler tunables there might be to get us "symbol backtraces, maybe function input values if they're cheap" rather than a full most-debuggable binary.

00:02 smilessh has joined #dri-devel

00:03 <anholt> given how allergic most mesa devs are to thinking how to interact with the CI already, making any additional complexity for debugging makes it less and less likely that anyone ever uses it.

00:03 <DavidHeidelberg[m]> anholt: when transferring outside of us and network is under load, sometimes the transfers are really slow, so every MB seems to matter (in terms of performance).

00:03 <DavidHeidelberg[m]> *US

00:04 <DavidHeidelberg[m]> btw. with the docs ( https://mesa.pages.freedesktop.org/-/mesa/-/jobs/34548032/artifacts/public/debugging.html#working-with-core-dumps-generated-by-ci ) it doesn't seem to be so hard to set up with a few commands and debug. It's in general a normal coredump

00:04 exit70_ has quit []

00:08 <anholt> why is the debug.dwp not included in the unstripped mesa tarball?

00:09 <DavidHeidelberg[m]> anholt: because otherwise it would have to be linked with the linker at compilation time, which would slow down the build

00:09 <DavidHeidelberg[m]> so the unstripped build only contains references to the debug.dwp (more exactly, to the .dwo files inside the .dwp)

00:10 exit70 has joined #dri-devel

00:17 <anholt> we're linking debug info at compile time today, right?

00:17 <Lynne> have to say, descriptor buffers are so much nicer to work with

00:17 danvet has quit [Ping timeout: 480 seconds]

00:19 rmckeever has joined #dri-devel

00:21 <DavidHeidelberg[m]> anholt: with my MR we can use split-debug (so not put debug into .o, but .dwo), otherwise sure

00:22 <anholt> I'm trying to understand why you need the split-debug complexity for the unstripped tarball.

00:23 <DavidHeidelberg[m]> anholt: in general, this page sums it up https://gcc.gnu.org/wiki/DebugFission ; for smaller projects it doesn't matter, but Mesa is already large enough for the difference to be seen

00:23 <anholt> (I haven't done the work myself, but I really suspect there's something better we could choose in our debugoptimized build's -g options that could make the cost of debug symbols low enough that it would give us debuggability without extra tarballs even)

00:25 <DavidHeidelberg[m]> anholt: well, even with full debug with debugoptimized it's not perfect since we lose all details, I assume doing something between would produce very limited results (not saying it couldn't produce something useful ofc)

00:26 <DavidHeidelberg[m]> I used my MR a few times and I have to say I would prefer to have a meson 'debug' build, but ofc that's useless for flakes.

00:27 Zopolis4 has quit []

00:39 djbw has quit [Read error: Connection reset by peer]

00:41 djbw has joined #dri-devel

00:42 pcercuei has quit [Quit: dodo]

01:02 Haaninjo has quit [Quit: Ex-Chat]

01:05 anholt has quit [Ping timeout: 480 seconds]

01:06 liyi__ has joined #dri-devel

01:15 stuart has quit []

01:18 kts has joined #dri-devel

01:20 anholt has joined #dri-devel

01:33 kts has quit [Quit: Leaving]

01:53 kzd has quit [Quit: kzd]

01:55 columbarius has joined #dri-devel

01:57 co1umbarius has quit [Ping timeout: 480 seconds]

01:58 alyssa has left #dri-devel [#dri-devel]

02:09 fxkamd has quit [Remote host closed the connection]

02:09 ngcortes has quit [Remote host closed the connection]

02:09 fxkamd has joined #dri-devel

02:09 ngcortes has joined #dri-devel

02:10 Peuc has quit [Remote host closed the connection]

02:10 Peuc has joined #dri-devel

02:11 sauce has quit [Remote host closed the connection]

02:12 sauce has joined #dri-devel

02:13 unerlige1 has quit [Remote host closed the connection]

02:13 unerlige1 has joined #dri-devel

02:13 Sachiel has quit [Remote host closed the connection]

02:14 Sachiel has joined #dri-devel

02:16 kzd has joined #dri-devel

02:20 naseer__ has quit [Read error: Network is unreachable]

02:20 naseer__ has joined #dri-devel

02:20 warpme_____ has quit [Read error: Network is unreachable]

02:20 warpme_____ has joined #dri-devel

02:21 orbea has quit [Remote host closed the connection]

02:21 orbea has joined #dri-devel

02:25 ngcortes has quit [Read error: Connection reset by peer]

02:33 kzd has quit [Ping timeout: 480 seconds]

02:33 konstantin has joined #dri-devel

02:35 kzd has joined #dri-devel

02:35 nchery has quit [Ping timeout: 480 seconds]

02:39 konstantin_ has quit [Ping timeout: 480 seconds]

02:43 <karolherbst> uhh... why can't I trigger the fails CI runs into locally :(

03:05 aravind has joined #dri-devel

03:05 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

03:05 TMM has joined #dri-devel

03:28 pallavim has quit [Ping timeout: 480 seconds]

03:39 oneforall2 has quit [Remote host closed the connection]

03:40 oneforall2 has joined #dri-devel

03:47 kzd has quit [Quit: kzd]

03:52 kzd has joined #dri-devel

04:04 Zopolis4 has joined #dri-devel

04:32 macromorgan has joined #dri-devel

04:32 bmodem has joined #dri-devel

05:03 heat has quit [Ping timeout: 480 seconds]

05:03 JohnnyonFlame has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

05:05 kzd_ has joined #dri-devel

05:06 kzd has quit [Ping timeout: 480 seconds]

05:09 liyi__ has quit [Ping timeout: 480 seconds]

05:14 kzd has joined #dri-devel

05:18 liyi__ has joined #dri-devel

05:20 kzd_ has quit [Ping timeout: 480 seconds]

05:29 junaid has joined #dri-devel

05:33 junaid_ has joined #dri-devel

05:46 kzd_ has joined #dri-devel

05:52 kzd has quit [Ping timeout: 480 seconds]

05:57 kzd_ has quit []

06:07 kzd has joined #dri-devel

06:18 kzd has quit [Quit: kzd]

06:34 bgs has joined #dri-devel

06:46 kzd has joined #dri-devel

06:55 junaid_ has quit [Remote host closed the connection]

06:55 junaid has quit [Remote host closed the connection]

06:57 kzd has quit [Quit: kzd]

06:59 aravind has quit [Ping timeout: 480 seconds]

07:05 tzimmermann has joined #dri-devel

07:06 pjakobsson has quit []

07:21 alanc has quit [Remote host closed the connection]

07:22 danvet has joined #dri-devel

07:23 bluetail98 has quit []

07:28 alanc has joined #dri-devel

07:34 danvet has quit [Ping timeout: 480 seconds]

07:41 jkrzyszt has joined #dri-devel

07:41 danvet has joined #dri-devel

07:44 nchery has joined #dri-devel

07:47 <daniels> karolherbst: which driver?

07:58 sghuge has quit [Remote host closed the connection]

07:58 sghuge has joined #dri-devel

08:02 fab has joined #dri-devel

08:03 rasterman has joined #dri-devel

08:07 mvlad has joined #dri-devel

08:25 junaid has joined #dri-devel

08:28 junaid has quit [Read error: No route to host]

08:29 junaid has joined #dri-devel

08:30 vliaskov has joined #dri-devel

08:30 fab has quit [Ping timeout: 480 seconds]

08:33 danvet has quit [Ping timeout: 480 seconds]

08:33 ice9 has joined #dri-devel

08:34 danvet has joined #dri-devel

08:36 <linkmauve> “23:28:30 gfxstrand> IDK, Intel has managed to evolve their hardware for 15 years without deleting interesting formats.”, /me cries in ASTC.

08:43 danvet has quit [Ping timeout: 480 seconds]

08:44 danvet has joined #dri-devel

08:54 phasta has joined #dri-devel

08:57 <MrCooper> DavidHeidelberg[m]: FWIW, -ggdb/-ggdb3 might improve debuggability of debugoptimized builds, probably at the cost of bigger debuginfo though

09:05 junaid has quit [Remote host closed the connection]

09:05 Company has quit [Quit: Leaving]

09:07 pochu has joined #dri-devel

09:08 tursulin has joined #dri-devel

09:09 ahajda has joined #dri-devel

09:11 MajorBiscuit has joined #dri-devel

09:20 danvet has quit [Ping timeout: 480 seconds]

09:21 danvet has joined #dri-devel

09:22 Haaninjo has joined #dri-devel

09:29 darkbasic4 has joined #dri-devel

09:38 rmckeever has quit [Quit: Leaving]

09:42 pcercuei has joined #dri-devel

09:57 <HdkR> linkmauve: Everyone cries in ASTC, just like the hardware designers

09:57 apinheiro has joined #dri-devel

10:04 <karolherbst> daniels: llvmpipe mainly

10:15 danvet has quit [Ping timeout: 480 seconds]

10:17 danvet has joined #dri-devel

10:27 Haaninjo has quit [Quit: Ex-Chat]

10:43 darkbasic4 has quit [Remote host closed the connection]

10:43 sgruszka has joined #dri-devel

10:43 sgruszka has quit [Remote host closed the connection]

10:47 sgruszka has joined #dri-devel

10:55 jdavies has joined #dri-devel

10:56 jdavies is now known as Guest5126

10:58 jdavies_ has joined #dri-devel

10:59 jdavies_ has quit [Remote host closed the connection]

11:03 junaid has joined #dri-devel

11:04 Guest5126 has quit [Ping timeout: 480 seconds]

11:11 junaid has quit [Remote host closed the connection]

11:21 liyi__ has quit [Ping timeout: 480 seconds]

11:28 co1umbarius has joined #dri-devel

11:30 columbarius has quit [Ping timeout: 480 seconds]

11:36 anholt_ has joined #dri-devel

11:37 devilhorns has joined #dri-devel

11:39 anholt has quit [Ping timeout: 480 seconds]

12:00 bmodem has quit [Ping timeout: 480 seconds]

12:04 ppascher has quit [Ping timeout: 480 seconds]

12:06 camus has quit []

12:07 kts has joined #dri-devel

12:08 kts has quit []

12:10 kts has joined #dri-devel

12:10 kts has quit [Remote host closed the connection]

12:11 kts has joined #dri-devel

12:31 YuGiOhJCJ has quit [Ping timeout: 480 seconds]

12:34 <DavidHeidelberg[m]> MrCooper: sounds good, anyway I don't see -ggdb vs -ggdbX documented in GCC docs

12:34 YuGiOhJCJ has joined #dri-devel

12:39 <psykose> it's in the manpage

12:39 <psykose> the documentation doesn't say anything useful however

12:40 <psykose> you can grep ggdb here https://man7.org/linux/man-pages/man1/gcc.1.html

12:40 <psykose> it just repeats the same shit as -g

12:41 <psykose> unsure if it does anything at all

12:41 jkrzyszt has quit [Remote host closed the connection]

12:44 <psykose> output size is the same though the sha changes

12:44 <psykose> maybe what i built is just not reproducible

12:44 <psykose> as for ggdbX it's in the ggdblevel part

12:45 <psykose> -g3 -> ggdb3 -g2 -> ggdb2

13:11 phasta has quit [Quit: Leaving]

13:18 <karolherbst> where can I check what CTS version/tag a test is using?

13:18 <karolherbst> or is it all the same?

13:20 srslypascal is now known as Guest5140

13:20 srslypascal has joined #dri-devel

13:21 Guest5140 has quit [Read error: Connection reset by peer]

13:36 kts has quit [Quit: Leaving]

13:52 agd5f_ has joined #dri-devel

13:59 agd5f has quit [Ping timeout: 480 seconds]

14:02 Daaanct12 has quit [Quit: Quitting]

14:07 <karolherbst> I'm now even on the same CTS version and can't trigger the fails from CI :(

14:18 grillo has joined #dri-devel

14:28 grillo has left #dri-devel [#dri-devel]

14:31 grillo_0 has joined #dri-devel

14:31 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

14:39 ahajda_ has joined #dri-devel

14:42 heat has joined #dri-devel

14:44 ahajda has quit [Ping timeout: 480 seconds]

14:44 ahajda has joined #dri-devel

14:47 ice9 has quit [Ping timeout: 480 seconds]

14:51 ahajda_ has quit [Ping timeout: 480 seconds]

14:54 ahajda_ has joined #dri-devel

14:58 agd5f has joined #dri-devel

15:00 ahajda has quit [Ping timeout: 480 seconds]

15:03 agd5f_ has quit [Ping timeout: 480 seconds]

15:08 agd5f_ has joined #dri-devel

15:14 agd5f has quit [Ping timeout: 480 seconds]

15:20 agd5f_ has quit [Ping timeout: 480 seconds]

15:24 agd5f has joined #dri-devel

15:24 Zopolis4 has quit []

15:31 pochu has quit []

15:34 kzd has joined #dri-devel

15:36 vliaskov has quit [Remote host closed the connection]

15:38 devilhorns has quit []

15:40 kts has joined #dri-devel

15:42 ahajda__ has joined #dri-devel

15:43 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

15:43 TMM has joined #dri-devel

15:43 ahajda___ has joined #dri-devel

15:48 jrayhawk has quit [Quit: leaving]

15:48 ahajda_ has quit [Ping timeout: 480 seconds]

15:50 ahajda__ has quit [Ping timeout: 480 seconds]

15:54 ahajda___ has quit [Ping timeout: 480 seconds]

15:58 fxkamd has quit []

16:08 agd5f_ has joined #dri-devel

16:12 Duke`` has joined #dri-devel

16:13 <DavidHeidelberg[m]> hakzsam: I suppose you don't have Helen Android patches included in the VKCTS uprev, right?

16:13 <hakzsam> nope?

16:14 <DavidHeidelberg[m]> There are a few patches included in Helen's tree which cannot be upstreamed yet, if you apply them it would be best

16:15 agd5f has quit [Ping timeout: 480 seconds]

16:15 <hakzsam> I will try to remember, thanks

16:22 gouchi has joined #dri-devel

16:24 sgruszka has quit [Remote host closed the connection]

16:27 <emersion> can someone review this? https://patchwork.freedesktop.org/series/109887/

16:27 <emersion> just simple logging stuff

16:28 <karolherbst> okay.. so my MR regresses stuff, just not on my machine :'(

16:36 <karolherbst> this makes no sense...

16:36 sewn has joined #dri-devel

16:37 <karolherbst> gfxstrand: any idea how https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161/diffs?commit_id=e908e08deb198153d92075889815192edb12eb30 could break OpenGL? I honestly don't see a path to this code from a GL perspective

16:37 <sewn> is this the right channel to ask for mesa compiling help? im experiencing a weird mesa build failure.

16:39 jrayhawk has joined #dri-devel

16:41 <karolherbst> oh uhhh....

16:41 djbw has quit [Read error: Connection reset by peer]

16:42 <karolherbst> oops

16:42 <jenatali> wael: Yeah, probably

16:44 <soreau> sewn: yes

16:44 <sewn> despite zstd being disabled via -Dzstd=disabled, libvulkan for amd drivers attempt to link to it, which causes a build failure

16:45 tzimmermann has quit [Quit: Leaving]

16:47 <soreau> is this with mesa git or a release?

16:47 <karolherbst> I think I found it...

16:47 <sewn> it also attempts to link to udev, which in the manually specified (variable, -Dpkg_config_path meson option) pkg config paths, it tries to link to it as well.

16:47 <sewn> release 22.3.5

16:47 djbw has joined #dri-devel

16:47 Company has joined #dri-devel

16:57 heat has quit [Read error: No route to host]

16:57 tursulin has quit [Ping timeout: 480 seconds]

16:57 heat has joined #dri-devel

16:59 vyivel has quit [Remote host closed the connection]

16:59 vyivel has joined #dri-devel

17:10 agd5f has joined #dri-devel

17:15 agd5f_ has quit [Ping timeout: 480 seconds]

17:21 pallavim has joined #dri-devel

17:26 ZenWalker has quit [Ping timeout: 480 seconds]

17:30 kts has quit [Quit: Leaving]

17:31 unerlige1 has left #dri-devel [#dri-devel]

17:31 kzd has quit [Quit: kzd]

17:31 unerlige has joined #dri-devel

17:34 kzd has joined #dri-devel

17:40 simon-perretta-img has quit [Ping timeout: 480 seconds]

17:43 bluetail98 has joined #dri-devel

17:46 tobiasjakobi has joined #dri-devel

17:47 tobiasjakobi has quit []

17:49 jkrzyszt has joined #dri-devel

17:52 MajorBiscuit has quit [Quit: WeeChat 3.6]

17:54 ZenWalker has joined #dri-devel

17:56 <gfxstrand> karolherbst: Uh... what?

17:56 <gfxstrand> Yeah, that makes no sense.

17:58 smilessh has quit [Ping timeout: 480 seconds]

18:13 heat_ has joined #dri-devel

18:13 heat has quit [Read error: No route to host]

18:19 <DemiMarie> jenatali: what does “crash” mean in this context? If it means that the kernel driver or GPU firmware crashed, that is a bug in the kernel or GPU firmware.

18:19 bestest has joined #dri-devel

18:19 <jenatali> It means that the GPU hung, which can also indicate a bug in the usermode driver generating commands that would hang the GPU, or a bug in an app

18:24 <bestest> I'm suffering startup crashes on mesa for the game Minit, which uses 32-bit YoYo Games Linux Runner 1.3 and appears to suffer from the issues described here https://gitlab.freedesktop.org/mesa/mesa/-/issues/1310 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4181 . As far as I can tell, these issues were fixed 2 years ago, but I'm on the latest stable for mesa and the game still crashes. Was there a regression, or was the

18:24 <bestest> issue never properly fixed, or am I simply doing something wrong?

18:27 <eric_engestrom> sewn: that's really weird; the code that handles this is simple enough that I'm sure there's no bug in it:

18:27 <eric_engestrom> https://gitlab.freedesktop.org/mesa/mesa/-/blob/mesa-22.3.5/meson.build#L1597

18:27 <eric_engestrom> zstd is replaced with an empty dependency if you pass `-D zstd=disabled`, so anything after that will never even know zstd is a thing

18:28 <sewn> just to be sure, im building this with lib32 in mind, and pkg config path is set to /usr/lib32/pkgconfig

18:30 <bestest> Well, it does crash; is there any info I can provide that would help?

18:30 <bestest> I tried to get a backtrace, but it just returns no stack

18:31 <bestest> Oh, I'm sorry, I misread, my apologies

18:35 heat_ has quit [Read error: No route to host]

18:35 heat has joined #dri-devel

18:39 <soreau> sewn: have you tried removing the build directory and trying again?

18:40 <sewn> im building it with a package manager, so technically yes

18:40 <eric_engestrom> sewn: that shouldn't make any difference with that; it might not find things or find things it can't use if the lib32 config is missing/wrong, but that's it

18:40 <sewn> eric_engestrom: it says udev is found but is not actually in pkg config path

18:47 bestest has quit [Quit: Leaving]

18:54 kzd_ has joined #dri-devel

19:00 kzd has quit [Ping timeout: 480 seconds]

19:02 <karolherbst> gfxstrand: yeah... I still have no idea, but apparently my new version changes things and I replaced `deref->var->type` with `deref->type`...

19:02 <karolherbst> and now the llvmpipe CI tests aren't failing anymore

19:02 <karolherbst> or rather.. randomly crashing

19:02 <karolherbst> I still have no idea how that path is even hit..

19:03 <jenatali> Yeah deref->var would only be valid for direct variable derefs, but if you've got arrays, it wouldn't be set
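The point about `deref->var` versus `deref->type` can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Var` and `Deref` classes and both helper functions are hypothetical stand-ins, not Mesa's NIR API, and the "not set" case is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Var:
    """Hypothetical stand-in for a shader variable."""
    name: str
    type: str


@dataclass
class Deref:
    """Hypothetical stand-in for one link in a deref chain."""
    type: str                          # type of the value this deref refers to
    var: Optional[Var] = None          # only set on a direct variable deref
    parent: Optional["Deref"] = None   # set on array (and other child) derefs


def type_via_var(deref: Deref) -> str:
    # The fragile pattern: assumes every deref has a variable attached.
    if deref.var is None:
        raise ValueError("deref has no variable attached")
    return deref.var.type


def type_via_deref(deref: Deref) -> str:
    # The robust pattern: every deref carries its own type.
    return deref.type


base = Deref(type="vec4[8]", var=Var("handles", "vec4[8]"))
elem = Deref(type="vec4", parent=base)   # array element, no var set

print(type_via_deref(base))  # works for the variable deref
print(type_via_deref(elem))  # works for the array deref too
```

In this model `type_via_var(elem)` raises, while `type_via_deref` works for both links in the chain, which matches the behaviour described above for array derefs.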

19:06 bluetail986 has joined #dri-devel

19:07 <DemiMarie> jenatali: are GPUs generally unable to prevent malicious userspace from freezing them?

19:07 <jenatali> In my experience, yes

19:07 <DemiMarie> Is this because of the lack of instruction-level preemption?

19:08 <jenatali> Effectively, running a shader on the GPU is kind of like running a userspace process. Someone authored the shader code. If it does something like dereference a null pointer... it's got to crash somehow

19:08 <jenatali> Some GPUs can report those types of errors, others just hang

19:09 konstantin_ has joined #dri-devel

19:09 <jenatali> That's about the extent of my knowledge though

19:09 <DemiMarie> I expect the shader to crash, but it should not take down other stuff on the GPU.

19:10 <jenatali> Right. Newer GPUs don't have to take down the whole GPU, that particular program would just hang, and then the host OS can reset the engine and set the GPU to work on a different task

19:11 iive has joined #dri-devel

19:11 <DemiMarie> This is especially important for VR/AR where a malicious shader must not be able to prevent the VR/AR from updating, as otherwise the human user might get sick.

19:11 agd5f_ has joined #dri-devel

19:11 <gfxstrand> karolherbst: Oh.... Yeah, deref->var->type wasn't going to work

19:12 <karolherbst> yeah.. but I'm more confused on how llvmpipe even hit that path...

19:13 <karolherbst> I think I still have a regression with AMD, but I can figure out what's wrong there...

19:13 bluetail98 has quit [Ping timeout: 480 seconds]

19:13 nchery is now known as Guest5166

19:13 nchery has joined #dri-devel

19:13 <karolherbst> all arb_bindless_texture related, which kind of makes sense *sigh*

19:15 neko2 has joined #dri-devel

19:15 konstantin has quit [Ping timeout: 480 seconds]

19:17 neko2 has left #dri-devel [#dri-devel]

19:18 agd5f has quit [Ping timeout: 480 seconds]

19:21 Guest5166 has quit [Ping timeout: 480 seconds]

19:21 danvet has quit [Read error: Connection reset by peer]

19:22 ngcortes has joined #dri-devel

19:25 neko2 has joined #dri-devel

19:25 <neko2> whoops of course I forgot nick reg

19:25 <neko2> hey all. I've encountered a rather nasty amdgpu reset/crash loop today that I can trigger pretty reliably. wondering if this the right channel or if I need to direct it somewhere else... memory is fuzzy but I'm sure it was dri something or another.

19:27 <karolherbst> neko2: probably just want to file a bug on gitlab or something, but there is a #radeon channel for more AMD specific things

19:27 rmckeever has joined #dri-devel

19:28 <neko2> right, I feel like it is more of a kernel thing anyway. yes there is a crash triggered by a program which I've yet to get to the bottom of, but there is this loop of reset fails on top.

19:28 <karolherbst> ohh it's more of a hardware thing

19:28 <neko2> I suspect sp

19:28 <neko2> *so

19:28 <neko2> I already think this hw is cursed tbh

19:28 <karolherbst> the hardware doesn't really support fault recovery, so all the kernel can do is a full GPU reset

19:28 <karolherbst> and often that doesn't go as well

19:29 <neko2> shall I just upload the dmesg somewhere so you can see if it's one of those such cases

19:29 <karolherbst> _but_ ultimately it's also Userspace sending faulty commands to the hardware

19:29 <karolherbst> mhh

19:29 danvet has joined #dri-devel

19:30 <karolherbst> yeah, I guess it makes sense to file a kernel bug, but also a mesa bug (if it's triggered through GL/VK)

19:30 <karolherbst> Userspace obviously shouldn't send garbage, but the kernel should also handle recovery better

19:30 <neko2> in this case the offender was libreoffice of all things

19:31 <neko2> I can only survive if I disable its acceleration

19:31 <karolherbst> mhh

19:31 <neko2> here is the log https://paste.rs/peC

19:31 <karolherbst> acceleration through OpenCL by any chance or general acceleration?

19:31 <neko2> I think general GL

19:31 <neko2> it didn't get past the splash screen

19:31 <neko2> it'd hang, a ring timeout would fire, the reset loop would happen for a while

19:32 <neko2> eventually it gives up and hangs on something else, and after that I lose system responsiveness

19:32 <karolherbst> yeah.. if the GPU keeps getting broken commands it will just crash again

19:33 <neko2> huh. but the kernel can identify it's soffice.bin. surely if a crash occurs the best thing is to punt the offender

19:33 <neko2> obv I'm not a kernel dev

19:33 <karolherbst> mhh, not really, or at least that's not what people would like to do

19:33 <neko2> well I mean it's standard for userspace

19:33 <neko2> you segfault, you have a bug, your results are meaningless

19:33 <neko2> you get murdered

19:34 <karolherbst> sure.. but it could also just reap the userspace process if it keeps crashing multiple times

19:34 <karolherbst> doesn't have to check the process name

19:34 <karolherbst> anyway.. there are multiple bugs to be fixed :)

19:35 <neko2> in any case, I already expect my hardware doesn't help with the recovery process, given that it's raven ridge on a quirky motherboard

19:35 <karolherbst> newer GPUs aren't better

19:35 <neko2> still, I have encountered enough bugs at this point to know this combo is a source of curses

19:35 <pixelcluster> neko2: it indeed starts out looking like a relatively "normal" gpu hang ("ring comp_1.1.0 timeout")

19:35 <karolherbst> well.. newer AMD gpu's

19:35 <pixelcluster> the recovery seems *really* cursed though, I never saw it fail like that

19:35 <karolherbst> some vendors care more about GPU resets, some.. don't

19:35 <neko2> karolherbst: yeahhhhh ngl as much as I hate it, think my next system will be team blue gpu

19:35 <pixelcluster> karolherbst: well it can work

19:36 <karolherbst> well, yes... but you have to reset the entire GPU

19:36 <neko2> pixelcluster: like I said. I think asrock have fucked several things in this firmware

19:36 <karolherbst> others aren't that cursed

19:36 <pixelcluster> it does work quite well on the steam deck

19:36 <neko2> ... wait heck, language check, rip

19:36 <karolherbst> yeah.. not saying that it doesn't work most of the time

19:36 <neko2> (sorry. I'll keep the f-strikes tactical but I think that one was deserved, this bios is truly awful in many ways)

19:36 <karolherbst> but I've seen it fail miserably and also seen it recovering properly

19:37 <pixelcluster> I think that is because recovery is better tested in the kernels they ship

19:37 <karolherbst> nah.. not all people are from the US here :P

19:37 <neko2> like to get this far with the crash loop, I had to disable iommu because this firmware's handling of this is likely broken based on looking up those crash logs (I don't have those sadly, something broke in saving the logs)

19:37 <neko2> it would fail on even the first reset otherwise

19:37 <pixelcluster> rip

19:38 <neko2> so that doesn't bode well

19:38 <karolherbst> mhh I can also imagine that enabling iommu makes things worse, because I don't think any developer tests with iommu enabled tbh

19:38 <neko2> side note: iommu.strict can be ignored in the cmdline, I had left that there but at this point I disabled it in firmware due to being broken on this system

19:38 <karolherbst> well.. can always file bugs

19:39 <neko2> not actually sure where to start with the crash loop filing that

19:39 <neko2> where do I even file a bug for amdgpu's kernel side

19:39 <karolherbst> good question

19:39 <neko2> even if they tell me "your hardware is screeeeewed"

19:40 <karolherbst> https://gitlab.freedesktop.org/drm/amd/-/issues

19:40 <neko2> (I do suspect a 50% chance of this at this point)

19:40 <karolherbst> I mean.. it's their hardware, they probably know it's screwed

19:40 <karolherbst> :P

19:41 <karolherbst> but yeah.. in 95% of the cases where hardware is blamed, it's actually a software bug

19:41 <neko2> well when I say hardware, I mean the firmware intimately tied to it

19:42 <neko2> there were a few flags reading it that suggested something terribly wrong was occurring at a low level

19:42 <karolherbst> maybe

19:42 <karolherbst> but that doesn't really matter for GPU resets

19:42 <neko2> oh? I was under the impression the firmware had to be in cooperation to reset properly...

19:42 <neko2> at least as far as the primary GPU is concerned

19:43 <karolherbst> the GPU's firmware yes, but not really the motherboard one

19:43 <karolherbst> once you are in your OS the firmware doesn't really do much with the GPU anymore

19:43 <neko2> right. well, it's an integrated one, on a 2200G, so I doubt there would be weird firmware for it...?

19:43 <karolherbst> yeah and even if.. the driver has to deal with nasty GPU firmware

19:44 <karolherbst> OEMs usually get tools from nvidia/AMD to customize the GPU firmware, but that still follows rules

19:45 <neko2> just trying some keywords in the search atm to look for dupes

19:46 <jenatali> Why are apps awful...

19:46 <karolherbst> because they are written by humans :P

19:47 <jenatali> GFXBench apparently hardcodes VK_FORMAT_A2R10G10B10, which is optional, when they could just as easily use VK_FORMAT_A2B10G10R10, which is required

19:47 <karolherbst> maybe VK_FORMAT_A2R10G10B10 was faster on nvidia?

19:47 gouchi has quit [Remote host closed the connection]

19:48 mvlad has quit [Remote host closed the connection]

19:48 <jenatali> Fine, then check format support and use it if it's there, don't just assume that it is

19:48 gouchi has joined #dri-devel

19:48 <karolherbst> heh.. using different formats per GPU in a benchmark? that ain't fair :P

19:48 <neko2> when did nvidia ever play fair...

19:48 <neko2> *ducks*

19:49 <jenatali> Then... use the one that's guaranteed to be there
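The "check support, otherwise fall back to the required format" approach suggested above could look roughly like the following. This is a sketch, not GFXBench or Vulkan code: the format names are plain strings copied as written in the chat, and `is_supported` is a hypothetical callback standing in for a real format-support query.

```python
from typing import Callable

# Names as written in the chat; treated here as opaque strings.
OPTIONAL_FORMAT = "VK_FORMAT_A2R10G10B10"   # optional, per the chat
REQUIRED_FORMAT = "VK_FORMAT_A2B10G10R10"   # required, per the chat


def pick_format(is_supported: Callable[[str], bool]) -> str:
    """Prefer the optional format, but only if the device reports it."""
    if is_supported(OPTIONAL_FORMAT):
        return OPTIONAL_FORMAT
    # The required format is guaranteed, so no further check is needed.
    return REQUIRED_FORMAT


# A device that only exposes the required format:
print(pick_format(lambda fmt: fmt == REQUIRED_FORMAT))
```

Always returning `REQUIRED_FORMAT` would be the simpler option, at the cost of never using the optional format where it is available.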

19:49 <neko2> ok, I'm not seeing any results thus far that have my particular looping crash issue. so I think it's safe to say I can file a new bug report there.

19:49 <karolherbst> you could submit patches and see what they say :D

19:50 <jenatali> Is that an OSS benchmark? I didn't think it was

19:50 <karolherbst> I mean.. you probably have access to the source, no?

19:51 <jenatali> Eh some groups at MSFT do, I don't

19:52 <karolherbst> heh

19:52 <neko2> karolherbst: btw, assuming the specific trigger (whatever the heck libreoffice is doing) is never asked for in fixing the crash loop, what would then be the steps to go through to fix the bug with libreoffice's whatever-it's-doing causing a crash to begin with? I presume I'd need API traces or such

19:52 <karolherbst> but I suspect they'll say no "because that would make older results invalid"

19:52 nchery has quit [Read error: Connection reset by peer]

19:52 <neko2> (and I imagine that'd be more the mesa side in terms of not sending something that'd crash to the gpu when given dubious input)

19:52 <karolherbst> neko2: probably?

19:53 <karolherbst> yeah soo there are always different pov here

19:53 <karolherbst> you could also argue that libreoffice might use the API incorrectly (if that's the case)

19:53 <neko2> oh, so it's not clear whose fault it is yet

19:53 <karolherbst> but regardless of that, mesa shouldn't really end up crashing the system

19:53 <karolherbst> and the kernel should be able to recover...

19:54 <karolherbst> but it also always depends on the things libreoffice is doing

19:54 <neko2> I personally think both sides should be corrected if possible tbh, especially if it runs on an older (stable?) kernel that has that bug

19:54 <karolherbst> some APIs specify that they can bring down the system if used incorrectly

19:54 <neko2> wait, I'm getting my layers confused now, nvm

19:54 <neko2> karolherbst: texture handles? x)

19:55 <neko2> (which I am told are sometimes just essentially GPU pointers)

19:55 <karolherbst> well.. anything where you hand in actual pointers can cause funky problems

19:55 <karolherbst> bindless_textures can be such thing in OpenGL e.g.

19:55 <neko2> right, bindless textures, that was what I was thinking of

19:56 <neko2> I am reminded of the confusingly named attrib pointer functions in openGL which were offsets, not actual pointers, despite the prototype... if I'm remembering that right anyway

19:57 <karolherbst> in any case, unprivileged Userspace shouldn't be able to bring down the system, because that's a CVE level bug

19:57 nchery has joined #dri-devel

19:58 <neko2> yeah, there was already a joke in archlinux-offtopic on libera earlier, what a way to DoS an amdgpu system, send them a spreadsheet to open (or in fact anything that LO is set to open)

19:59 <neko2> all I was doing was trying to just view a spreadsheet I'd been sent. T_T alas s!@# happens

20:00 <neko2> anyway, thanks all for the input, it seems there's a clear order to at least get that reset bug fixed, then at least once I can trigger the crash without bringing the house down I can then do more useful debugging of whatever the heck libreoffice is doing.

20:02 Haaninjo has joined #dri-devel

20:02 <jenatali> And of course using ABGR instead of ARGB works just fine

20:03 agd5f has joined #dri-devel

20:08 agd5f_ has quit [Ping timeout: 480 seconds]

20:14 jfalempe has quit [Quit: Leaving]

20:14 <cmarcelo> jenatali: what's the minimum required version of MSVC for Mesa?

20:15 <jenatali> cmarcelo: Either VS2019 or VS2022, not sure if we'd dropped 2019 yet

20:15 <jenatali> Any particular reason?

20:15 <jenatali> Ah CI still builds with 2019

20:17 <cmarcelo> jenatali: designated initializers... oldest clang / gcc we require support them even without C++20, I know "MSVC 2019 16.1" (is this what CI has?) does support it under /C++20. wondering if there's another flag for that in MSVC.

20:17 <jenatali> cmarcelo: No, it's only supported in C++20 mode

20:17 <cmarcelo> jenatali: context: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/36599451 line 726

20:18 <jenatali> Build that test as C++20?

20:18 ngcortes has quit [Ping timeout: 480 seconds]

20:19 <cmarcelo> jenatali: would you be ok with "if msvc: set C++20"?

20:19 <jenatali> Yeah, fine by me

20:20 <jenatali> For that test at least, not sure if we want to upgrade the whole tree

20:20 <cmarcelo> sure

20:25 <jenatali> cmarcelo: Ping me in the MR if you want an ack from me on the patch :)

20:26 <cmarcelo> Cool tks.

20:27 ngcortes has joined #dri-devel

20:30 nchery has quit [Remote host closed the connection]

20:30 nchery has joined #dri-devel

20:39 ngcortes has quit [Ping timeout: 480 seconds]

20:53 ngcortes has joined #dri-devel

20:56 junaid has joined #dri-devel

21:02 idr has joined #dri-devel

21:03 <idr> jenatali: Does this look even close to correct: https://gitlab.freedesktop.org/idr/mesa/-/commit/8a3bb8f71b22fccb4f323e539e0ad758f9cff8fc

21:04 agd5f_ has joined #dri-devel

21:05 <jenatali> idr: Seems plausible. Where'd you find the 1930 number?

21:06 <idr> jenatali: https://stackoverflow.com/questions/70013/how-to-detect-if-im-compiling-code-with-a-particular-visual-studio-version

21:06 <idr> Other sources say 2019 needs /std:c++latest while 2022 can use /std:c++20.

21:07 <idr> Surely 38 random forum posts can't steer me wrong.

21:07 <cmarcelo> https://devblogs.microsoft.com/cppblog/msvc-cpp20-and-the-std-cpp20-switch/

21:07 <jenatali> Yeah I'm just not sure if that's the version number that meson detects

21:08 <jenatali> Lemme see what it says for my compiler

21:08 <cmarcelo> (shared just to note that eventually even 2019 got the c++20 switch too, but not sure if it is old enough)

21:10 <idr> 2019 16.11... but maybe not 16.9 or 16.10?

21:10 agd5f has quit [Ping timeout: 480 seconds]

21:10 <jenatali> Yeah I think the first version number with support is 19.29

21:10 <jenatali> https://github.com/mesonbuild/meson/blob/master/mesonbuild/compilers/cpp.py#L735

21:11 <jenatali> That lines up with the output from the build job (https://gitlab.freedesktop.org/idr/mesa/-/jobs/36605312) saying the version number is 19.29.30146

21:12 <jenatali> _MSC_VER is a macro available in the source, which apparently has no relation to the version reported through stdout? I dunno it's all a mess

21:13 <jenatali> Oh I see, 1930 == 19.30, that makes sense. So yeah, < 19.29 instead of < 1930 is what you want
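
A small sketch of the relationship worked out above: the `_MSC_VER` integer is the compiler's major.minor version with the dot removed, so 1929 corresponds to 19.29 and 1930 to 19.30. The helper names are hypothetical, and the 19.29 threshold is taken from this conversation rather than verified independently.

```cpp
#include <cassert>
#include <string>

// Convert an _MSC_VER-style integer (e.g. 1929) to the "major.minor"
// form printed by the compiler (e.g. "19.29").
inline std::string msc_ver_to_version(int msc_ver)
{
   assert(msc_ver >= 0);
   const int major = msc_ver / 100;
   const int minor = msc_ver % 100;
   // Pad the minor part to two digits so 1900 becomes "19.00".
   return std::to_string(major) + "." + (minor < 10 ? "0" : "") +
          std::to_string(minor);
}

// Assumption from the chat: 19.29 is the first compiler version that
// accepts the C++20 switch.
inline bool has_cpp20_switch(int msc_ver)
{
   return msc_ver >= 1929;
}
```

So a check written against the version string reported at build time compares with 19.29, while a check in source code compares `_MSC_VER` with 1929.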

21:13 <idr> Yeah... I was just typing something like that. :)

21:13 kzd_ has quit []

21:15 <idr> Maybe "if 'c++20' in cc.get_options()" would be better?

21:15 <idr> dcbaker: ^^^

21:15 agd5f has joined #dri-devel

21:15 neko2 has quit [Quit: leaving]

21:15 <idr> Or cpp_stds?

21:16 kzd has joined #dri-devel

21:18 <dcbaker> idr: yeah, if you can, `override_options : ['cpp_std=c++20']` or `=c++latest` (meson should understand both), assuming a new enough version

21:19 <dcbaker> In famous words, I have some patches that I should finish up that would make that all more robust, but...

21:19 apinheiro has quit [Ping timeout: 480 seconds]

21:20 agd5f_ has quit [Ping timeout: 480 seconds]

21:23 * idr tries that...

21:27 <jenatali> Thanks. Sorry MSVC is a bit of a headache here :(

21:27 <demarchi> rodrigovivi: as we were talking earlier today, the subdir-ccflags in drivers/gpu/drm/xe/Makefile is applying the cflags to the whole dir instead of just to the display-related compilation units... do you know if there is a way to do one of the options below? 1) add a separate Makefile in the xe/display dir, so subdir-ccflags applies only to that, but still link everything in the xe.ko; or 2) replace the subdir-ccflags with something

21:27 <demarchi> else so it only applies to the display/%.o objects?

21:28 <jenatali> I wish it just supported designated initializers without having to be in C++20 mode

21:29 <dcbaker> I blame the C++ committee

21:30 agd5f_ has joined #dri-devel

21:31 <idr> dcbaker: Can you elaborate on "override_options"?

21:31 <idr> This did not work: https://gitlab.freedesktop.org/idr/mesa/-/commit/4753aa729b20165bf95c85baad154ab51505ca6c

21:31 <idr> https://gitlab.freedesktop.org/idr/mesa/-/jobs/36607114

21:33 <dcbaker> @idr, ah, `override_options` is a keyword to pass to a build target like `cpp_args`, but it tells meson "Hey, you know that default option I told you about? Yeah, ignore that, do this instead" so you'd write something like `cpp_std_override = ['cpp_std=c++latest']\n executable(..., override_options : cpp_std_override)`

21:33 <dcbaker> sorry, I should have been more clear about that

21:34 <dcbaker> which will stop meson from putting two c++ standard arguments into the command line

21:34 <dcbaker> src/intel/compiler does that with c++17

21:36 agd5f has quit [Ping timeout: 480 seconds]

21:37 <idr> So... how do I do that to select between option A, option B, or nothing?

21:38 <idr> Because the "obvious" things don't work.

21:39 <idr> I might have a thing that's good enough...

21:39 agd5f_ has quit [Ping timeout: 480 seconds]

21:40 <idr> https://gitlab.freedesktop.org/idr/mesa/-/commit/605d405a7e00f8c859e9f1e07ce6f9af5274dea4

21:40 <idr> That builds locally. :shrug:

21:41 <jenatali> That looks reasonable to me

21:41 <idr> jenatali: We'll see if it also looks reasonable to the CI. :)

21:44 Duke`` has quit [Ping timeout: 480 seconds]

21:45 <rodrigovivi> demarchi: I really don't know... I believe that that separated file under the display dir could do the trick... the worst part that is the one I marked with XXX in the xe/Makefile I believe can be now removed after your patch to include the files directly or to remove the need for the i915 files...

21:49 <demarchi> # XXX: Needed for i915 register definitions. Will be removed after xe-regs

21:49 <demarchi> this?

21:49 <demarchi> this is a nop

21:50 <demarchi> the line above it will add the include to all .o

21:50 Zopolis4 has joined #dri-devel

21:50 <demarchi> oh... you mean, if disabling display in the kconfig

21:53 <demarchi> rodrigovivi: no, we can't remove, unless we add more ifdefs around the code. I fixed several of those by removing display completely and checking the errors, but there are some hard ones:

21:54 <demarchi> drivers/gpu/drm/xe/xe_device_types.h -> display/intel_display_core.h -> the-world.h

21:55 <demarchi> and some files rely on this indirect include, like xe_pci.c

22:08 <rodrigovivi> ouch :(

22:09 apinheiro has joined #dri-devel

22:09 <rodrigovivi> it would be good to have something cleaner for this display reuse...

22:10 krushia has joined #dri-devel

22:12 danvet has quit [Ping timeout: 480 seconds]

22:12 danvet has joined #dri-devel

22:23 danvet has quit [Ping timeout: 480 seconds]

22:24 danvet has joined #dri-devel

22:45 <gfxstrand> Does anyone else remember this crazy loader bug where it falls over on vkGetPhysicalDeviceProperties2KHR() if you support gpdp2 but not 1.1?

22:45 <gfxstrand> Or maybe it's a crazy CTS bug?

22:45 <jenatali> It's a CTS bug

22:46 <gfxstrand> Oh, so someone does remember it. :)

22:46 <jenatali> The CTS doesn't enable the extension for gpdp2

22:46 <gfxstrand> That'll do it

22:46 <jenatali> Yeah... I was tripping over it constantly until I flipped on 1.1

22:46 <gfxstrand> Ugh

22:46 <jenatali> I assumed it was a regression but according to the history for those tests, nope

22:46 <jenatali> And I didn't see any issues filed about it in a quick skim. I probably should've filed one

22:47 <gfxstrand> I'm guessing it wasn't a problem until Mesa started doing the right thing and returning NULL if you don't enable an extension

22:48 <jenatali> Yeah I'd believe that

22:48 <gfxstrand> alright, I'll see if I can fix the CTS quick.

22:48 <gfxstrand> I'd make my intern do that but I want her to still like me. (-:

22:50 <demarchi> rodrigovivi: for now I'm keeping a hack commit on top "Undo display", that at least lets me test if the rest is moving in the right direction

22:51 <demarchi> we may need to rethink the display integration

22:51 <demarchi> i.e. I know the way it is right now is temporary, but it shouldn't be causing issues to the rest of the driver

22:53 junaid has quit [Remote host closed the connection]

22:53 danvet has quit [Ping timeout: 480 seconds]

22:54 gouchi has quit [Remote host closed the connection]

22:55 jkrzyszt has quit [Remote host closed the connection]

22:59 darkapex has quit [Remote host closed the connection]

22:59 macromorgan is now known as Guest5176

22:59 macromorgan has joined #dri-devel

22:59 darkapex has joined #dri-devel

23:02 danvet has joined #dri-devel

23:07 Guest5176 has quit [Ping timeout: 480 seconds]

23:08 jkrzyszt has joined #dri-devel

23:18 nchery has quit [Ping timeout: 480 seconds]

23:19 nchery has joined #dri-devel

23:19 apinheiro has quit [Quit: Leaving]

23:19 bgs has quit [Remote host closed the connection]

23:19 Haaninjo has quit [Quit: Ex-Chat]

23:21 <gfxstrand> Ok, now that I figured out how to actually get SSH to work...

23:27 jkrzyszt has quit [Remote host closed the connection]

23:28 macromorgan is now known as Guest5180

23:28 macromorgan has joined #dri-devel

23:28 <Ristovski> I love it when I reboot and get a random beep code but I can't reproduce it anymore

23:28 Guest5180 has quit [Read error: Connection reset by peer]

23:28 macromorgan is now known as Guest5181

23:28 macromorgan has joined #dri-devel

23:29 <psykose> they should have replays but for Life

23:30 rasterman has quit [Quit: Gettin' stinky!]

23:36 Guest5181 has quit [Ping timeout: 480 seconds]

23:36 <Ristovski> hmm maybe there already is `irltrace`, and it's being used to edit-and-replay Mike until he makes zink get 9000FPS in every benchmark possible

23:37 <zmike> sweatytowelguy.jpg

23:37 agd5f has joined #dri-devel

23:42 jkrzyszt has joined #dri-devel

23:45 warpme_____ has quit []

23:54 danvet has quit [Ping timeout: 480 seconds]

23:57 <gfxstrand> Why are these tests creating custom instances?!?