#dri-devel on 2022-05-02 — irc logs at oftc.irclog.whitequark.org

2022-03-22 11:57 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:02 adjtm is now known as Guest3314

00:02 Guest3314 has quit [Read error: Connection reset by peer]

00:02 adjtm has joined #dri-devel

00:08 co1umbarius has joined #dri-devel

00:10 columbarius has quit [Ping timeout: 480 seconds]

01:41 <karolherbst> airlied: yeah... I think we truncate pointers somewhere

01:43 jewins has quit [Ping timeout: 480 seconds]

01:56 nvishwa1 has quit [Remote host closed the connection]

01:56 nvishwa1 has joined #dri-devel

02:17 pushqrdx has joined #dri-devel

02:20 lemonzest has quit [Quit: WeeChat 3.4]

02:20 <karolherbst> airlied: I found it

02:21 <karolherbst> something with wrong scratch size and indirect accesses :(

02:29 <karolherbst> yeah... I kind of need a better solution here

02:31 digetx has quit [Read error: Connection reset by peer]

02:31 digetx has joined #dri-devel

03:14 hch12907 has quit [Ping timeout: 480 seconds]

03:25 mhenning has quit [Quit: mhenning]

03:47 elongbug_ has joined #dri-devel

03:47 elongbug__ has quit [Read error: Connection reset by peer]

04:17 sdutt has joined #dri-devel

04:29 mszyprow has joined #dri-devel

04:47 Duke`` has joined #dri-devel

04:48 mszyprow has quit [Ping timeout: 480 seconds]

05:10 hch12907 has joined #dri-devel

05:18 itoral has joined #dri-devel

05:20 itoral_ has joined #dri-devel

05:26 itoral has quit [Ping timeout: 480 seconds]

05:29 lemonzest has joined #dri-devel

05:37 kts has joined #dri-devel

05:41 mszyprow has joined #dri-devel

05:49 Duke`` has quit [Ping timeout: 480 seconds]

06:03 mclasen has joined #dri-devel

06:13 nchery has joined #dri-devel

06:21 kts has quit [Ping timeout: 480 seconds]

06:22 mclasen has quit [Ping timeout: 480 seconds]

06:27 tzimmermann has joined #dri-devel

06:32 mclasen has joined #dri-devel

06:52 frieder has joined #dri-devel

07:22 jfalempe has joined #dri-devel

07:31 itoral_ has quit [Remote host closed the connection]

07:32 itoral_ has joined #dri-devel

07:35 hch12907 has quit [Ping timeout: 480 seconds]

07:35 nvishwa1 has quit [Read error: Connection reset by peer]

07:37 rgallaispou has joined #dri-devel

07:42 danvet has joined #dri-devel

07:44 itoral_ has quit [Remote host closed the connection]

07:45 itoral has joined #dri-devel

07:47 itoral has quit [Remote host closed the connection]

07:48 itoral has joined #dri-devel

07:50 itoral has quit [Remote host closed the connection]

07:51 itoral has joined #dri-devel

07:56 thellstrom has joined #dri-devel

07:58 lynxeye has joined #dri-devel

07:59 itoral has quit [Remote host closed the connection]

07:59 hch12907 has joined #dri-devel

07:59 itoral has joined #dri-devel

08:12 itoral has quit [Remote host closed the connection]

08:13 itoral has joined #dri-devel

08:15 itoral has quit [Remote host closed the connection]

08:16 itoral has joined #dri-devel

08:17 itoral has quit [Remote host closed the connection]

08:18 itoral has joined #dri-devel

08:20 itoral has quit [Remote host closed the connection]

08:21 itoral has joined #dri-devel

08:22 itoral has quit [Remote host closed the connection]

08:23 itoral has joined #dri-devel

08:26 itoral has quit [Remote host closed the connection]

08:27 itoral has joined #dri-devel

08:29 itoral has quit [Remote host closed the connection]

08:29 itoral has joined #dri-devel

08:30 mclasen has quit [Ping timeout: 480 seconds]

08:36 itoral has quit [Remote host closed the connection]

08:37 itoral has joined #dri-devel

08:38 pcercuei has joined #dri-devel

08:39 itoral has quit [Remote host closed the connection]

08:39 itoral has joined #dri-devel

08:42 mclasen has joined #dri-devel

08:43 rasterman has joined #dri-devel

08:45 itoral has quit [Remote host closed the connection]

08:45 itoral has joined #dri-devel

08:47 itoral has quit [Remote host closed the connection]

08:48 itoral has joined #dri-devel

08:49 itoral has quit [Remote host closed the connection]

08:49 itoral has joined #dri-devel

08:51 thellstrom has quit [Ping timeout: 480 seconds]

08:51 apinheiro has joined #dri-devel

08:53 sdutt has quit [Ping timeout: 480 seconds]

08:54 itoral has quit [Remote host closed the connection]

08:54 itoral has joined #dri-devel

08:57 gawin has joined #dri-devel

08:58 itoral has quit [Remote host closed the connection]

08:59 itoral has joined #dri-devel

09:03 anarsoul|2 has joined #dri-devel

09:03 anarsoul has quit [Read error: Connection reset by peer]

09:10 itoral has quit [Remote host closed the connection]

09:10 itoral has joined #dri-devel

09:15 itoral has quit [Remote host closed the connection]

09:16 itoral has joined #dri-devel

09:18 anarsoul|2 has quit [Ping timeout: 480 seconds]

09:18 anarsoul has joined #dri-devel

09:21 itoral has quit [Remote host closed the connection]

09:22 itoral has joined #dri-devel

09:23 itoral has quit [Remote host closed the connection]

09:23 itoral has joined #dri-devel

09:30 itoral has quit [Remote host closed the connection]

09:30 itoral has joined #dri-devel

09:34 itoral has quit [Remote host closed the connection]

09:34 itoral has joined #dri-devel

09:43 MajorBiscuit has joined #dri-devel

09:48 mclasen has quit [Ping timeout: 480 seconds]

09:53 itoral has quit [Remote host closed the connection]

09:53 itoral has joined #dri-devel

09:59 elongbug__ has joined #dri-devel

10:03 itoral has quit [Remote host closed the connection]

10:03 itoral has joined #dri-devel

10:06 elongbug_ has quit [Ping timeout: 480 seconds]

10:15 itoral has quit [Remote host closed the connection]

10:16 itoral has joined #dri-devel

10:20 itoral has quit [Remote host closed the connection]

10:21 itoral has joined #dri-devel

10:22 itoral has quit [Remote host closed the connection]

10:22 itoral has joined #dri-devel

10:25 mclasen has joined #dri-devel

10:26 itoral has quit [Remote host closed the connection]

10:27 xtarun has joined #dri-devel

10:27 itoral has joined #dri-devel

10:28 xtarun has quit []

10:33 YuGiOhJCJ has joined #dri-devel

10:33 itoral has quit [Remote host closed the connection]

10:33 flacks has quit [Quit: Quitter]

10:33 itoral has joined #dri-devel

10:40 flacks has joined #dri-devel

10:44 itoral has quit [Remote host closed the connection]

10:44 itoral has joined #dri-devel

10:55 itoral has quit [Remote host closed the connection]

10:55 itoral has joined #dri-devel

11:01 mclasen has quit [Ping timeout: 480 seconds]

11:02 <tzimmermann> javierm, mripard sent me a bug report about fbdev hotunplugging going wrong: https://github.com/raspberrypi/linux/issues/5011

11:02 <tzimmermann> i guess we need to repork this

11:03 itoral has quit [Remote host closed the connection]

11:03 itoral has joined #dri-devel

11:04 <tzimmermann> 'rework' :)

11:04 itoral has quit [Remote host closed the connection]

11:05 itoral has joined #dri-devel

11:10 * javierm reads

11:12 itoral has quit [Remote host closed the connection]

11:13 <tzimmermann> javierm, there's a kernel log, which shows plymouth crashing. https://github.com/raspberrypi/linux/files/8573600/log.txt I suspect that plymouth tries to access fbdev framebuffer memory after the platform device has been unplugged

11:13 itoral has joined #dri-devel

11:15 <javierm> tzimmermann: yes, I'm reading the log now

11:17 <karolherbst> what I want is: containers to run any distribution desktop on any machine

11:21 itoral has quit [Remote host closed the connection]

11:22 itoral has joined #dri-devel

11:24 itoral has quit [Remote host closed the connection]

11:25 itoral has joined #dri-devel

11:31 <javierm> tzimmermann: so if I understand the problem correctly, it's like this: 1) simplefb is registered and /dev/fb0 is opened by plymouth, get_fb_info() is called

11:31 itoral has quit [Remote host closed the connection]

11:31 <javierm> 2) then at some point vc4 DRM driver registers and kicks out simplefb fbdev, then platform_device_unregister() is called

11:32 itoral has joined #dri-devel

11:32 <javierm> tzimmermann: 3) but file->private_data still has a reference to struct fb_info that got in fb_open() and so in fb_release() things go boom

11:33 <tzimmermann> javierm, i didn't step through it. i was just guessing

11:33 <tzimmermann> where did you see the get_fb_info call?

11:34 <javierm> tzimmermann: yeah, I'm guessing too by reading the code and the log

11:34 <javierm> tzimmermann: I did not but in the call stack there's fb_release() and a mutex_lock()

11:35 <javierm> tzimmermann: so I think that's either a use-after-free in fb_release() when doing struct fb_info * const info = file->private_data and then lock_fb_info(info) or a locking issue

11:35 <javierm> tzimmermann: that's why I thought it's safer to drop the lock and re-acquire it rather than making assumptions about the locking status

11:36 <javierm> the __mutex_lock_slowpath and __mutex_lock.constprop.9 in the crash log are also suspicious

11:41 <javierm> [ 11.365003] Unable to handle kernel paging request at virtual address fe7e014c

11:41 <javierm> [ 11.365039] [fe7e014c] *pgd=00000000

11:41 <javierm> seems to be a NULL pointer deref

11:43 <tzimmermann> javierm, AFAICT your theory aligns with a bug report that was on linux-fbdev this weekend: https://lore.kernel.org/linux-fbdev/ab099144-2db2-caae-7a59-94211111a6cf@suse.de/T/#m93fb3184c1de6f5d444081a21128f9c703d93b53

11:44 <javierm> tzimmermann: oh, that does sound like the same issue indeed

11:44 <javierm> :(

11:44 <javierm> another corner case... I want to disable CONFIG_FB so much

11:46 <tzimmermann> fbdev worked well until we tried to fix it :)

11:47 <javierm> tzimmermann: for some definition of well :)

11:49 <tzimmermann> javierm, i don't think the provided patch is correct, though. seems like it's papering over the issue

11:49 <javierm> tzimmermann: absolutely agree

11:50 <javierm> tzimmermann: but also... what's the point of keeping the device if the real driver will take over the display controller anyways

11:52 <javierm> tzimmermann: so I wonder if we shouldn't just prevent this NULL pointer deref from happening, i.e. add an if (!info) return or something like that

11:53 <tzimmermann> and return -ENODEV? if this already fixes the problem, i'm all for it

11:54 <javierm> tzimmermann: yeah. But I don't know how to reproduce the issue. Maybe answering the person that proposed the patch with this suggestion ?

11:55 <tzimmermann> javierm, maybe mripard or the reporter of the bug

11:55 <javierm> tzimmermann: because if someone mmap'ed the /dev/fb0, things are not going well anyways after let's say vc4 probes

11:56 <javierm> the /dev/fb0 for simplefb I mean

11:56 <tzimmermann> javierm, i suspected this was the problem. but i've never encountered the error anywhere

11:57 <tzimmermann> javierm, after looking at get_fb_info(), i realized that we don't seem to clear registered_fb[i] anywhere after calling platform_device_unregister()

11:57 <tzimmermann> so the pointer is still there (?)

11:57 <tzimmermann> maybe that's the problem

11:57 <mripard> tzimmermann: I can try to reproduce and test it if you want, or you can comment on the bug report

11:57 <tzimmermann> i have to take another look

11:58 <mripard> I'm probably not going to be able to test for the next couple of days though, so commenting on the bug report might be the fastest option

11:58 <tzimmermann> mripard, thanks for the offer. i'll try first, but i'm really busy today

11:58 <javierm> tzimmermann: I have an rpi4 and can't reproduce it; I've also been testing recently with simplefb to cover all cases for https://lore.kernel.org/lkml/20220429084253.1085911-1-javierm@redhat.com/

11:58 <tzimmermann> if nothing helps, i'll get back to you

11:59 <tzimmermann> i have an rpi3+

11:59 <javierm> but I'm using Fedora... maybe something in the distro stresses it differently

11:59 <javierm> like how we configure plymouth by default or whatever

12:01 <javierm> tzimmermann: I don't think that's true, it's set to NULL in do_unregister_framebuffer()

12:01 <javierm> tzimmermann: platform_device_unregister() will call the driver's .remove handler that will call unregister_framebuffer()

12:02 <javierm> I guess it's a driver bug actually? It shouldn't unregister if it's opened?

12:04 q4a has joined #dri-devel

12:06 itoral has quit [Remote host closed the connection]

12:07 heat has joined #dri-devel

12:07 itoral has joined #dri-devel

12:07 <tzimmermann> javierm, i guess i missed that

12:08 <tzimmermann> i probably have to reproduce it

12:09 itoral has quit [Remote host closed the connection]

12:09 itoral has joined #dri-devel

12:10 <javierm> tzimmermann: do you know at what point the .release file ops is called during do_unregister_framebuffer() ?

12:10 <javierm> because that seems to be the root cause. The .fb_release callback shouldn't be called after the framebuffer has been unregistered

12:12 itoral has quit [Remote host closed the connection]

12:13 <tzimmermann> javierm, file->private_data is still set, i guess

12:14 itoral has joined #dri-devel

12:15 itoral has quit [Remote host closed the connection]

12:16 itoral has joined #dri-devel

12:16 gawin has quit [Ping timeout: 480 seconds]

12:18 cheako has quit [Quit: Connection closed for inactivity]

12:19 <tzimmermann> javierm, could we acquire the registration lock around https://elixir.bootlin.com/linux/v5.17.5/source/drivers/video/fbdev/core/fbmem.c#L1441

12:19 <tzimmermann> ?

12:20 <javierm> tzimmermann: I don't think so. Because if I'm reading correctly, we already do registration_lock -> fb_info -> lock

12:20 <tzimmermann> or we leave the fb_info around until the final reference has been dropped

12:20 <javierm> so that would cause an ABBA deadlock

12:20 <javierm> tzimmermann: but I'm writing a patch for you to read

12:20 <tzimmermann> thank you

12:22 gawin has joined #dri-devel

12:29 heat has quit [Ping timeout: 480 seconds]

12:33 itoral has quit [Remote host closed the connection]

12:34 itoral has joined #dri-devel

12:35 itoral has quit [Remote host closed the connection]

12:36 itoral has joined #dri-devel

12:36 <javierm> tzimmermann: I actually found two bugs in fbmem

12:37 <javierm> not related to your patches but things that were there before already

12:39 itoral has quit []

12:42 <javierm> tzimmermann: https://paste.centos.org/view/raw/ab974d99 and https://paste.centos.org/view/raw/565f1fcd

12:45 <javierm> ups, patch 1/2 doesn't compile :P https://paste.centos.org/view/raw/c8955e39 instead

12:47 frieder has quit [Ping timeout: 480 seconds]

12:50 <javierm> tzimmermann: btw, the assumption that the fb_info reference can change beneath the feet of a user-space program that's holding an fd is scary, but that's how fbdev is...

12:51 <javierm> I'm just making .fb_release() consistent with fb_ioctl(), fb_mmap(), etc

12:52 agd5f has joined #dri-devel

12:54 kts has joined #dri-devel

13:03 rkanwal has joined #dri-devel

13:07 <tzimmermann> javierm, oh! so file_fb_info() is where the magic happens

13:08 apinheiro has quit [Ping timeout: 480 seconds]

13:08 frieder has joined #dri-devel

13:10 <javierm> tzimmermann: yeah and fb_release() was not using that accessor
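
The hazard discussed above can be modelled in userspace. This is only a minimal sketch, assuming a simplified registered_fb[] table and an accessor in the spirit of file_fb_info(); the struct layouts and every model_* name are hypothetical and are not the actual fbmem.c code. It shows a release path that goes through the accessor and bails out with -ENODEV once the framebuffer has been unregistered, instead of trusting the pointer cached in file->private_data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FB_MAX 4

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fb_info { int node; int open_count; };
struct file { int minor; void *private_data; };

/* Model of the table of currently registered framebuffers. */
static struct fb_info *registered_fb[FB_MAX];

/* Accessor in the spirit of file_fb_info(): only trust the cached pointer
 * if it is still the one registered for this minor. */
static struct fb_info *model_file_fb_info(const struct file *file)
{
    struct fb_info *info = file->private_data;

    if (file->minor < 0 || file->minor >= FB_MAX)
        return NULL;
    if (info == NULL || info != registered_fb[file->minor])
        return NULL; /* unregistered (or replaced) since open */
    return info;
}

/* Open caches the fb_info pointer in the file, as described in the log. */
static int model_fb_open(struct file *file, int minor)
{
    if (minor < 0 || minor >= FB_MAX || registered_fb[minor] == NULL)
        return -ENODEV;
    file->minor = minor;
    file->private_data = registered_fb[minor];
    registered_fb[minor]->open_count++;
    return 0;
}

/* Unregistering clears the table slot; the file still holds the old pointer. */
static void model_unregister(int minor)
{
    registered_fb[minor] = NULL;
}

/* Release through the accessor rather than dereferencing private_data. */
static int model_fb_release(struct file *file)
{
    struct fb_info *info = model_file_fb_info(file);

    if (info == NULL)
        return -ENODEV; /* device went away while the fd was open */
    assert(info->open_count > 0);
    info->open_count--;
    return 0;
}
```

In this model, opening minor 0, calling model_unregister(0), and then calling model_fb_release() returns -ENODEV without touching the stale structure. Whether the real fix needs more than this (for example around locking) is not something the sketch tries to answer.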

13:10 <javierm> I've tested both patches in my rpi4 and at least didn't find any regressions

13:10 <javierm> tzimmermann, mripard: posted to the list, let me know what you think folks

13:27 mvlad has joined #dri-devel

13:50 jewins has joined #dri-devel

13:52 <javierm> tzimmermann: dropped patch 1/2, fbdev is a mine field so the less we touch it, the better :)

14:02 sdutt has joined #dri-devel

14:05 sdutt has quit []

14:05 sdutt has joined #dri-devel

14:05 fxkamd has joined #dri-devel

14:05 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

14:10 MrCooper_ is now known as MrCooper

14:16 Thymo_ has quit []

14:17 <jekstrand> dcbaker, jljusten: Ping on a waffle release

14:19 rasterman has quit [Quit: Gettin' stinky!]

14:20 Thymo has joined #dri-devel

14:20 <MrCooper> ajax: "is it actually that hard to predict which image will be returned next? it's the one with the lowest sbc" isn't always true, e.g. with direct scanout of a mailbox swapchain

14:21 <MrCooper> ajax: we can get rid of the WSI thread with Xwayland, if Wayland compositors & Xwayland are fixed to handle mailbox properly (replacing an older buffer only once the newer one is actually ready)

14:32 tzimmermann has quit [Quit: Leaving]

14:42 apinheiro has joined #dri-devel

14:46 gawin has quit [Ping timeout: 480 seconds]

14:47 <jekstrand> karolherbst, imirkin: A bit of help debugging nouveau?

14:47 <jekstrand> Getting this from Mesa:

14:47 <jekstrand> nvc0_screen_create:1168 - Error allocating PGRAPH context for M2MF: -16

14:47 <jekstrand> And this from the kernel:

14:47 <jekstrand> [ 835.464588] nouveau 0000:17:00.0: gr: fecs falcon already acquired by gr!

14:47 <jekstrand> [ 835.464593] nouveau 0000:17:00.0: gr: init failed, -16

14:47 <karolherbst> jekstrand: need signed firmware

14:47 <jekstrand> karolherbst: Ok... and those aren't in linux-firmware?

14:47 <karolherbst> they are, except for ampere

14:47 <jekstrand> karolherbst: This is an RTX 2080 FWIW

14:47 <jekstrand> 2060, rather

14:48 <karolherbst> but they might not be inside your initramfs

14:48 <jekstrand> Yeah... I wondered about that

14:48 <jekstrand> I'll try to debug that. Thanks

14:49 <karolherbst> jekstrand: there is a lsinitrd command

14:49 <karolherbst> ohh wait

14:49 <karolherbst> jekstrand: I think you might actually hit some bug

14:49 <karolherbst> :(

14:49 <karolherbst> what kernel are you on?

14:49 <jekstrand> Looks like I don't have nvidia firmware

14:50 <karolherbst> ahh

14:50 <karolherbst> I hope that is it, but...

14:52 <dcbaker> jekstrand: jljusten: I’m working on it. I’ve been working on the xorg release scripts to work with waffle like they do with mesa

15:01 ella-0 has joined #dri-devel

15:04 <cwabbott> ugh, this is so annoying...

15:04 <cwabbott> ../src/freedreno/vulkan/tu_cmd_buffer.c: In function ‘vk2tu_single_stage’:

15:04 <cwabbott> ../src/freedreno/vulkan/tu_cmd_buffer.c:3061:4: error: case label does not reduce to an integer constant

15:04 <cwabbott> 3061 | case VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT:

15:04 <cwabbott> | ^~~~

15:04 ella-0_ has quit [Read error: Connection reset by peer]

15:05 <cwabbott> whyyy can't we have nice things like 64-bit enums

15:33 heat has joined #dri-devel

15:35 <jenatali> cough C++ can do it cough ;)
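
The compiler error above is what C produces when a case label is not an integer constant expression. A minimal sketch of the problem and one workaround, assuming the 64-bit flag is declared as a `static const` object rather than an enum constant; all names below are hypothetical and only mimic the shape of the real flags:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit flag values, declared the way a C header has to
 * declare constants that may not fit in an int-sized enum. */
typedef uint64_t stage_flags2;
static const stage_flags2 STAGE_2_DRAW_INDIRECT_BIT = 0x2ULL;
static const stage_flags2 STAGE_2_VERTEX_SHADER_BIT = 0x8ULL;
static const stage_flags2 STAGE_2_COPY_BIT          = 0x100000000ULL;

enum hw_stage { HW_STAGE_NONE, HW_STAGE_DRAW, HW_STAGE_VS, HW_STAGE_COPY };

/* Writing `case STAGE_2_DRAW_INDIRECT_BIT:` here is what GCC rejects in C:
 * a `static const` object is not an integer constant expression.
 * An if-chain sidesteps the restriction at the cost of some verbosity. */
static enum hw_stage single_stage_to_hw(stage_flags2 stage)
{
    /* This sketch expects exactly one bit to be set. */
    assert(stage != 0 && (stage & (stage - 1)) == 0);

    if (stage == STAGE_2_DRAW_INDIRECT_BIT)
        return HW_STAGE_DRAW;
    if (stage == STAGE_2_VERTEX_SHADER_BIT)
        return HW_STAGE_VS;
    if (stage == STAGE_2_COPY_BIT)
        return HW_STAGE_COPY;
    return HW_STAGE_NONE;
}
```

Another option would be macro constants, which are usable in case labels, but that only helps if the header defining the flags can be changed.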

15:38 anujp has joined #dri-devel

15:45 <jekstrand> karolherbst: Not working. :(

15:45 <karolherbst> noo :(

15:45 <karolherbst> what is dmesg saying?

15:45 <jekstrand> karolherbst: Still getting a "pmu: firmware unavailable" message, though. Maybe I need to dracut harder?

15:45 <karolherbst> nah, we don't have PMU firmware

15:46 <karolherbst> we had some fixes in that area though

15:46 <karolherbst> but all this firmware stuff is just sooo annoying

15:46 <jekstrand> That's the only firmware message I see

15:46 <karolherbst> doing that without any documentation even more so

15:46 <karolherbst> jekstrand: yeah.. can be a bug then

15:47 <jekstrand> karolherbst: Here's dmesg |grep nouveau: https://paste.centos.org/view/3f3318d9

15:47 <karolherbst> :( yeah.. seems to be a bug

15:48 <karolherbst> Ben is working on some of those though

15:48 pjakobsson has joined #dri-devel

15:49 frieder has quit [Remote host closed the connection]

15:49 <karolherbst> jekstrand: the main problem here is, that probably not even nvidia knows what's wrong, it's all so painful if it comes to firmware

15:50 <jekstrand> :(

15:50 <jekstrand> And here I thought a 2060 was supposed to work. :P

15:51 <karolherbst> they give us different firmware, because we don't get the PMU stuff, so we get mostly untested firmware :), nice, isn't it?

15:51 <karolherbst> jekstrand: well... some do

15:51 <karolherbst> that's the neat part

15:51 <karolherbst> all GPUs seem to be different

15:51 <karolherbst> some work, some.. a little, and some don't :)

15:51 <karolherbst> the turings I've got here all work

15:51 <jekstrand> :(

15:52 iive has joined #dri-devel

15:52 pjakobsson_ has quit [Ping timeout: 480 seconds]

15:52 <karolherbst> there might be a trick we could do though, but not quite sure where to put it

15:53 <karolherbst> there are many unknowns here, like there is a pre image we could use in the vbios, but also just resetting everything before loading our stuff _could_ help.. it's annoying

15:53 <karolherbst> and ben might already have patches which fix your issue

15:53 <karolherbst> but they might break others :)

15:54 <karolherbst> that "sec2: unhandled intr 00000010" looks odd btw

15:54 <jekstrand> Does it matter that this card has never been booted w/ windows? I would hope not. Sticky initialization would be a mess.

15:54 <karolherbst> jekstrand: we do have an internal bug with a partner hitting your issue though

15:54 <karolherbst> so there is some incentive to get that fixed

15:56 <karolherbst> jekstrand: no, it doesn't matter, at least not if you cold reboot

15:56 nvishwa1 has joined #dri-devel

15:58 <jekstrand> Well, if you've got kernel patches you'd like me to try, I can do that. The good news is that the card is plugged into the beefy machine that builds kernels fast. :)

15:58 <karolherbst> yeah.. I'll try to ping ben and see if he has any ideas

15:58 ybogdano has joined #dri-devel

15:58 apinheiro has quit [Ping timeout: 480 seconds]

16:03 <karolherbst> another possibility is that the firmware is buggy and we need a new one from nvidia.. wouldn't even be the first time that happens

16:04 <jekstrand> :-/

16:07 <karolherbst> jekstrand: mind booting with nouveau.debug=trace and share the log?

16:08 <jekstrand> Yeah... Just a second. Gotta hook up a keyboard.

16:16 <jenatali> Hm... I seem to have found an app that's using the same resource on 2 contexts at the same time :(

16:16 rsalvaterra has joined #dri-devel

16:17 <karolherbst> jenatali: sounds like fun

16:17 <jenatali> That's one word for it

16:17 <karolherbst> I mean.. what's the problem here?

16:17 <jenatali> Our driver's not robust against that, but I wonder, is Mesa/Gallium in general?

16:17 <jenatali> I'm hitting linked list corruption specifically

16:17 <jekstrand> karolherbst: https://paste.centos.org/view/9c970baa

16:17 <karolherbst> it's up to the frontend to make sure things don't break I think. With pipe_resource being shareable objects, the drivers have to be thread safe there anyway

16:18 <karolherbst> screen operations in general need to be thread safe as well

16:18 <jenatali> Is that true? I thought the pipe context followed the GL threading rules, which is that you can't use the same resource on 2 contexts at once

16:18 <jenatali> Sure, screen ops are thread-safe

16:18 <karolherbst> jenatali: ehh.. your buffer overflowed :(

16:18 <karolherbst> ...

16:18 <karolherbst> I meant jekstrand

16:19 <karolherbst> jenatali: pipe_context yes, that's unsafe, but the pipe_resource has to be thread safe

16:19 <anholt> jenatali: GL threading rules let you use the same texture on 2 contexts at once.

16:19 <jenatali> anholt: I could've sworn I did a spec dive a couple months back that said you couldn't

16:19 FireBurn has quit [Quit: Konversation terminated!]

16:19 <karolherbst> well.. gallium expects pipe_resources to be thread safe

16:20 <jenatali> Maybe that was only for read-after-write hazards though... maybe multiple readers does need to work and that's where I'm busted

16:20 <karolherbst> yeah..., I had some fun with that as well, because map/unmap are so annoying

16:21 <karolherbst> but you have to expect that unmap and map can happen from different contexts even

16:21 <jekstrand> karolherbst: ?

16:21 <anholt> for RAW between threads, the app needs to be sure that the first rendering has finished (glFinish() was traditional), and then do a bind in the reading context.

16:21 <jenatali> Yeah that's fine for us, as long as it's not racy

16:21 apinheiro has joined #dri-devel

16:21 <jekstrand> anholt: Don't you mean glFlush()?

16:21 <jekstrand> Or does it really need glFinish()?

16:21 <anholt> jekstrand: glFlush() is so ambiguous. it would have worked on Mesa for a long time, but wouldn't today.

16:22 <karolherbst> jekstrand: your kmsg buffer is not big enough

16:22 <karolherbst> try "log_buf_len=4M" or something

16:23 <anholt> and given in general how ambiguous flush was, people would sprinkle flushes and finishes around their apps

16:23 <anholt> (see also: the number of games that glFlush before glXSwapBuffers()!)

16:24 <anholt> ajax: thank you for always XInitThreads()ing

16:24 <anholt> long overdue

16:27 <jekstrand> karolherbst: https://people.freedesktop.org/~jekstrand/nouveau.dmesg

16:27 <karolherbst> that looks better, thanks

16:31 MajorBiscuit has quit [Quit: WeeChat 3.4]

16:33 <karolherbst> jekstrand: mhh yeah.. seems like everything is in order until we try to set up the fifo

16:35 <karolherbst> if we just knew what that 0x10 interrupt means

16:36 <jekstrand> :(

16:36 Duke`` has joined #dri-devel

16:38 maxzor has joined #dri-devel

16:44 <karolherbst> worst case it simply means "you messed up" :)

16:45 <karolherbst> which would be more information than we have today

16:46 <jekstrand> I don't think *I* messed up. :P

16:52 <karolherbst> yeah... I don't think so :)

16:53 <karolherbst> it's just such a mess without any docs and... *sigh*

16:53 linkmauve has left #dri-devel [#dri-devel]

16:53 nchery has quit [Ping timeout: 480 seconds]

16:54 linkmauve has joined #dri-devel

16:54 <karolherbst> I just hope something happens so debugging this wouldn't be so painful anymore

16:55 nchery has joined #dri-devel

16:56 <jekstrand> :-/

17:01 lynxeye has quit [Quit: Leaving.]

17:04 bcheng has joined #dri-devel

17:07 tobiasjakobi has joined #dri-devel

17:07 tobiasjakobi has quit []

17:07 bcheng has quit [Remote host closed the connection]

17:11 bcheng has joined #dri-devel

17:13 bcheng has quit [Remote host closed the connection]

17:17 mszyprow has quit [Ping timeout: 480 seconds]

17:18 ybogdano has quit [Ping timeout: 480 seconds]

17:22 bcheng has joined #dri-devel

17:25 bcheng has quit [Remote host closed the connection]

17:44 bcheng has joined #dri-devel

17:46 <Kayden> mareko: with util_queue, if I want to add a bunch of jobs to run in parallel, then wait for all of them to be done...do I have to add a fence on each job, and individually wait on those? or I could just util_queue_finish I suppose, though that might wait on additional jobs

17:46 <Kayden> or is there a way to have a fence for "after all of these jobs are done"

17:48 <Kayden> (guessing no, util_queue_fence seems to be tri-state rather than counting)
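
One way to get an "after all of these jobs are done" signal on top of a signalled/unsignalled fence is a counter shared by the batch. A minimal sketch, assuming C11 atomics; struct job_group and its functions are hypothetical and are not part of util_queue. The jobs are not actually run on threads here; the sketch only shows the counting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting completion object for a batch of jobs. */
struct job_group {
    atomic_uint pending; /* jobs added but not yet finished */
};

static void job_group_init(struct job_group *g)
{
    atomic_init(&g->pending, 0);
}

/* Call once per job, before the job is handed to the queue. */
static void job_group_add(struct job_group *g)
{
    atomic_fetch_add(&g->pending, 1);
}

/* Call at the end of each job. Returns true only for the job that brings
 * the count to zero; that job could then signal a single ordinary fence. */
static bool job_group_done(struct job_group *g)
{
    unsigned prev = atomic_fetch_sub(&g->pending, 1);

    assert(prev > 0); /* more completions than additions is a bug */
    return prev == 1;
}

/* Non-blocking check; a real waiter would block on the fence instead. */
static bool job_group_is_idle(struct job_group *g)
{
    return atomic_load(&g->pending) == 0;
}
```

A caveat with this shape: if jobs can finish while others are still being added, the count can hit zero early, so either all job_group_add() calls happen before the first job is submitted, or the submitter holds one extra count and drops it after the last submission.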

18:02 alyssa has joined #dri-devel

18:02 * alyssa wonders if any drivers depend on this NIR bug..

18:02 <karolherbst> alyssa: yes

18:03 <alyssa> :D

18:03 <karolherbst> btw, what bug?

18:03 <alyssa> karolherbst: writes_memory is set for fragment shaders without side effects if they are linked to a vertex shader producing transform feedback varyings

18:04 <alyssa> (causing various backend opts to be disabled)

18:04 <karolherbst> this sounds like somebody which needs to be like that

18:04 <karolherbst> :D

18:04 <karolherbst> *something

18:05 <karolherbst> alyssa: "shader->info.writes_memory = shader->info.has_transform_feedback_varyings;" mhh

18:05 <alyssa> Ye

18:05 <alyssa> unfortunately, not the reason my code is broken

18:06 <karolherbst> why unfortunately? Be happy, so you won't have to figure out why this needs to be set :p

18:09 <alyssa> hnnngh

18:10 <alyssa> still can't figure out where this 1 pix bug comes from

18:21 hch12907 has quit [Ping timeout: 480 seconds]

18:27 ybogdano has joined #dri-devel

18:29 <jekstrand> alyssa: oh?

18:29 <jekstrand> alyssa: Oh, my...

18:33 Emmy_ has quit [Remote host closed the connection]

18:35 <dcbaker> mattst88: what's the status of waffle!107, I think that's the last thing on the list before the next waffle release

18:36 <mattst88> dcbaker: I think it's ready. I was hoping to get chadv to take a quick look at it. I'll ping him about it

18:39 alanc has quit [Remote host closed the connection]

18:39 alanc has joined #dri-devel

18:43 <Kayden> ouch, I just realized that there's texcompress_s3tc_tmp.h and util/format/u_format_s3tc.c

18:43 <Kayden> we should probably consolidate on one S3TC implementation...

18:44 <airlied> Kayden: don't they include each other?

18:44 <Kayden> somewhat, but util_format_dxtn_pack_rgba_8unorm for example looks pretty duplicated

18:49 maxzor has quit [Ping timeout: 480 seconds]

18:53 <karolherbst> ohh right.. I wanted to figure out scratch space actually

18:53 Haaninjo has joined #dri-devel

18:54 <karolherbst> jekstrand: sooo.. I think I found a kernel doing weird stuff.. it writes 12 bytes into the scratch buffer and then uses values from global mem to access it... I didn't look yet on what we should do, but this could become ugly

18:55 mszyprow has joined #dri-devel

18:56 anarsoul|2 has joined #dri-devel

18:56 <airlied> karolherbst: is that the problem with luxmark/llvmpipe?

18:57 <karolherbst> yes

18:57 <karolherbst> it's the size specifically

18:57 <karolherbst> so I cloned what compiler/clc was doing by resetting the size of scratch space and get it recalculated, but...

18:57 <karolherbst> seems to cause it to crash with llvmpipe

18:57 <karolherbst> I'll check what the actual offsets are that we get and try to figure out what's the correct thing here

18:59 rasterman has joined #dri-devel

18:59 * airlied looks at the spirv-llvm-translator opaque ptrs issue, gonna be fun

18:59 <karolherbst> airlied: I'd look into coroutines first

18:59 <karolherbst> opaque pointers can be disabled at runtime

19:00 <karolherbst> which we might have to do until it gets resolved

19:00 <karolherbst> but that coroutine change we can't work around as it seems

19:00 anarsoul has quit [Ping timeout: 480 seconds]

19:04 anarsoul|2 has quit [Ping timeout: 480 seconds]

19:08 <airlied> karolherbst: yeah coroutines are first when I get it building, was just looking around the minefield

19:08 <alyssa> jekstrand: woof

19:08 anarsoul has joined #dri-devel

19:09 <jekstrand> alyssa: I need more of your patches.

19:11 <alyssa> jekstrand: which ones?

19:11 <alyssa> the make vk go brr patches?

19:11 <jekstrand> idk. Maybe the ones for null shaders?

19:11 mszyprow has quit [Ping timeout: 480 seconds]

19:12 <jekstrand> And... I've got a kernel oops on trying to access a null user pointer

19:13 apinheiro has quit [Ping timeout: 480 seconds]

19:14 <alyssa> mmh, I can take a look

19:14 <alyssa> am 'supposed' to be doing valhall but you know, this is more fun ;-P

19:14 <alyssa> https://gitlab.freedesktop.org/alyssa/mesa/-/commits/vk3/

19:14 <alyssa> this should be everything

19:15 <alyssa> the worklist stuff there isn't necessary I just suck at git rebase ;p

19:15 <jekstrand> I saw you pushed some of them

19:15 <alyssa> yes, the subset bbrezillon reviewed

19:16 <alyssa> Pass: 16739, Fail: 7, Crash: 252, Skip: 20793, Duration: 8:21, Remaining: 0

19:16 <alyssa> I think I broke something :V

19:20 <alyssa> oh.. dce..

19:28 mvlad has quit [Remote host closed the connection]

19:32 <alyssa> Separate shader lowering. Ugh.

19:32 adjtm has quit [Quit: Leaving]

19:39 <karolherbst> why is printf such a terrible interface? :(

19:39 <alyssa> C

19:40 <alyssa> karolherbst: until OpenCL supports %n it sucks ;-p

19:40 <karolherbst> :D

19:40 <karolherbst> never ever

19:45 <mattst88> dcbaker: merged \o/

19:49 apinheiro has joined #dri-devel

19:55 <karolherbst> airlied: ehh.. scratch support looks.. weird

19:55 <karolherbst> airlied: I don't really get what the assignment to bld.scratch_ptr is supposed to do?

19:56 <karolherbst> specifically that "shader->scratch_size * type.length" part

19:58 <jekstrand> Why is dma_fence_release calling an IRQ handler?!?

20:08 <airlied> karolherbst: a scratch value for each lane? not sure if that is even a thing

20:08 <karolherbst> airlied: huh? scratch mem is thread private, no?

20:09 <airlied> yes so we run 8 threads at once

20:09 <airlied> "threads"

20:09 <karolherbst> right..

20:09 <airlied> terminology gets too blurry around threads

20:09 <karolherbst> I think I am just confused what "type" is here

20:10 <airlied> it's the basic shader type, 32-bit x lanes

20:10 <karolherbst> ahh

20:10 <airlied> width is 32, length is number of vector lanes, 4 or 8 usually

20:11 <airlied> a scratch write should only be written for the active lanes
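
A rough sketch of the per-lane scratch sizing discussed above. Only the `scratch_size * type.length` product comes from the conversation; the lane-major layout, the function names, and the byte-wise store are illustrative assumptions, not llvmpipe's actual code.

```python
def scratch_alloc_size(scratch_size: int, lanes: int) -> int:
    """Total bytes needed when every vector lane gets its own scratch region."""
    return scratch_size * lanes


def lane_scratch_offset(scratch_size: int, lane: int, offset: int) -> int:
    """Byte offset of `offset` inside the region owned by `lane`.

    Assumes a lane-major layout (lane 0's region, then lane 1's, ...).
    """
    if not 0 <= offset < scratch_size:
        raise ValueError("offset outside this lane's scratch region")
    return lane * scratch_size + offset


def masked_scratch_store(buf, scratch_size, exec_mask, offset, values):
    """Store one byte per lane, skipping lanes whose exec_mask entry is 0."""
    for lane, active in enumerate(exec_mask):
        if active:
            buf[lane_scratch_offset(scratch_size, lane, offset)] = values[lane]
```

With a 12-byte scratch space and 8 lanes this would allocate 96 bytes, and an out-of-range per-lane offset is rejected rather than silently landing in a neighbouring lane's region.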

20:12 <karolherbst> right

20:13 <airlied> though not sure how a 12-byte write is handled :-P

20:13 <karolherbst> I think something is going very wrong here, but I can't really put my finger on what exactly

20:13 <karolherbst> airlied: there is no 12 byte write, just the scratch space is that big

20:13 <airlied> ah

20:14 <karolherbst> mhh, I think I want to figure out if the read/writes are actually OOB

20:15 <karolherbst> huh

20:17 <airlied> probably dump a bunch of lp_build_print_value in emit_store_scratch

20:17 <karolherbst> I tried that, but printf doesn't really get flushed when the JIT is crashing or something? dunno

20:17 <karolherbst> I don't think I get the last values out of it

20:17 <airlied> pretty sure it's not buffered

20:18 <karolherbst> the last values I got were all 0 and it crashed

20:18 <airlied> so if you print before/after the lp_build_pointer_set then it didn't crash in there

20:18 famfo has quit []

20:21 <airlied> granted it might have loaded a value that crashes it later :-P

20:21 <karolherbst> possibly

20:22 <karolherbst> I doubt it though

20:22 <karolherbst> the offset comes directly out of a global mem buffer

20:22 <karolherbst> and I am sure it's all constant

20:22 <karolherbst> the array it indirects on is of size 1

20:23 <karolherbst> we could cheat a lot and just assume it's the right index... :D

20:24 maxzor has joined #dri-devel

20:30 <karolherbst> \o/

20:31 <karolherbst> airlied: it finally happened

20:31 <karolherbst> Thread 85 "rusticl queue t" received signal SIGSEGV, Segmentation fault.

20:31 <karolherbst> [Switching to Thread 0x7fffdc54e640 (LWP 3906609)]

20:31 <karolherbst> 0x00007fffe91afbd0 in llvm::Value::getType (this=0x0) at /home/kherbst/git/llvm-project/llvm/include/llvm/IR/Value.h:255

20:31 <karolherbst> ehh wait... I think that's something else

20:31 <karolherbst> ahh yeah.. that's on me :D

20:32 <karolherbst> I thought I hit that weird assert/crash

20:35 <karolherbst> airlied: mhhhh.. is there an easy way to print all currently bound resources?

20:35 <karolherbst> I start to believe it's something else... or.. well.. multiple things or something

20:35 <airlied> not that I know of

20:37 heat has quit [Remote host closed the connection]

20:38 heat has joined #dri-devel

20:39 <karolherbst> airlied: yeah.. so it's loading from an address which is neither the scratch mem buffer, nor any of the buffers passed in as kernel args

20:40 <karolherbst> and there are no images/samplers

20:53 adjtm has joined #dri-devel

20:55 <karolherbst> argh... and I thought I was close to fixing it..

21:00 <karolherbst> okay...

21:00 <karolherbst> I think I got it

21:02 <karolherbst> airlied: dammit.. we do 32 bit stuff on a 64 bit pointer :( and I am sure we overflow

21:03 <karolherbst> mov edx,DWORD PTR [rsp+rcx*8+0x47a0]

21:03 <karolherbst> mov edx,DWORD PTR [rdx]

21:03 <karolherbst> rdx is 0x7fff0153e21f

21:03 <karolherbst> rsp = 0x7fffce7f5c00

21:03 <karolherbst> rcx = 3

21:04 <karolherbst> ehh wait..

21:04 <karolherbst> ehh

21:04 <karolherbst> assembly is weird

21:05 <HdkR> s/assembly/x86

21:05 <karolherbst> I still think it's part of the load_scratch thing

21:05 <karolherbst> let me move back a bit more

21:06 HankB_ has quit []

21:06 <karolherbst> ehh.. I had to move one additional byte back

21:07 rkanwal has quit [Read error: No route to host]

21:07 <karolherbst> now it's a 64 bit load

21:07 rkanwal has joined #dri-devel

21:10 mszyprow has joined #dri-devel

21:11 pushqrdx has quit [Read error: Connection reset by peer]

21:12 maxzor has quit [Ping timeout: 480 seconds]

21:17 heat has quit [Remote host closed the connection]

21:17 heat has joined #dri-devel

21:19 <karolherbst> weird pointer: 0x7fff00000003

21:20 <karolherbst> so that's one part of the base

21:20 <karolherbst> and the offset of 0x153e21c gets added to it

21:21 <karolherbst> and that gives me the invalid pointer 0x7fff0153e21f

21:21 <karolherbst> nothing mapped at 0x7fff00000000
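
The address arithmetic above can be checked with a few lines of Python. The constants are the ones quoted in the log; the helper name `split_halves` is made up for illustration. Splitting the base into 32-bit halves shows what makes it look suspicious: the upper half resembles the top of a userspace pointer, while the lower half is just the small integer 3.

```python
MASK32 = 0xFFFFFFFF


def split_halves(ptr: int) -> tuple[int, int]:
    """Return the (upper, lower) 32-bit halves of a 64-bit value."""
    return ptr >> 32, ptr & MASK32


base = 0x7FFF00000003    # the "weird pointer" from the log
offset = 0x153E21C       # offset that gets added to it
addr = base + offset

print(hex(addr))         # 0x7fff0153e21f, the faulting address
hi, lo = split_halves(base)
print(hex(hi), hex(lo))  # 0x7fff 0x3
```

This only reproduces the numbers; it does not by itself say which operation produced the mixed-up value.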

21:26 <mareko> Kayden: each job is independent, you can either wait manually for each job or call finish

21:27 <mareko> finish is slower

21:31 <mareko> finish is a group barrier like in GLSL, meaning that no newer job can execute if finish is waiting

21:32 <mareko> perhaps we could implement finish in a better way

21:36 <mareko> hopefully nobody is using finish in a perf-critical path, right zink?

21:44 <zmike> what

21:48 <ajax> ugh that reminds me

21:49 <mareko> util_queue_finish

21:49 Duke`` has quit [Ping timeout: 480 seconds]

21:49 <zmike> only on context/screen destroy

22:01 <karolherbst> airlied: huh... am I missing something or is the loop value result inside emit_load_scratch read without writing to it?

22:04 <karolherbst> ohh, it's init with 0

22:04 <karolherbst> nvm then

22:04 <karolherbst> anyway.. I think emit_load_scratch is wrong

22:04 <karolherbst> not sure what exactly yet

22:07 <airlied> are there inactive lanes?

22:07 <karolherbst> it looks like it, yes

22:07 <karolherbst> at least some values stay 0 within those vectors

22:08 <airlied> you can print exec_mask

22:08 <airlied> but it shouldn't do any loads for those lanes

22:08 <karolherbst> the problem isn't that

22:08 <airlied> like in theory inactive lanes should never do load/stores, but there might be a bug

22:08 <karolherbst> it constructs wrong pointers for some lanes

22:09 <karolherbst> like really bogus ones

22:09 <karolherbst> temp_res = 0 0 140735371329600 140733193388035 0 0 0 0

22:09 <karolherbst> (gdb) p/x 140735371329600

22:09 <karolherbst> $4 = 0x7fff81d0c040

22:09 <ajax> would it be legal to implement just the GetSwapchainStatus bit of VK_KHR_shared_presentable_image and not try to support any of the extra refresh modes

22:09 <karolherbst> (gdb) p/x 140733193388035

22:09 <karolherbst> $5 = 0x7fff00000003

22:09 <karolherbst> the first one is okay

22:09 <karolherbst> the second isn't

22:10 <ajax> i feel like that'd be a legal gl move but i'm not sure how vulkan convention rolls

22:10 <karolherbst> not sure if that has to do with inactive lanes, but..

22:10 <karolherbst> airlied:

22:10 <karolherbst> exec_mask = 0 0 -1 -1 -1 0 0 0

22:10 <karolherbst> temp_res = 0 0 140735236972608 140733193388035 0 0 0 0

22:10 <karolherbst> temp_res is dumped between LLVMBuildLoad and LLVMBuildInsertElement

22:10 <karolherbst> it segfaults on the lane having a 140733193388035 in between

22:11 <karolherbst> it tries to load 0x7fff0153e21f, but it's based on that past bogus value
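
To make the dump above easier to read, here is a small sketch that pairs each lane's `exec_mask` entry with its `temp_res` value in hex. The two lists are copied from the log; the function name `describe_lanes` and the tuple format are assumptions made for illustration.

```python
exec_mask = [0, 0, -1, -1, -1, 0, 0, 0]
temp_res = [0, 0, 140735236972608, 140733193388035, 0, 0, 0, 0]


def describe_lanes(mask, values):
    """Return (lane, is_active, hex_value) for every vector lane."""
    return [(lane, m != 0, hex(v))
            for lane, (m, v) in enumerate(zip(mask, values))]


for lane, active, value in describe_lanes(exec_mask, temp_res):
    print(f"lane {lane}: active={active} value={value}")
```

Lane 3 is active and holds 0x7fff00000003, the value karolherbst identifies as bogus; the inactive lanes all hold 0, consistent with the result being initialised to 0.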

22:12 <karolherbst> result contains the right pointer though

22:12 <karolherbst> 0x7fffc8010040

22:12 <karolherbst> ehh well..

22:13 <karolherbst> nvm result

22:13 nchery has quit [Read error: Connection reset by peer]

22:14 nchery has joined #dri-devel

22:14 <Kayden> mareko: thanks, that clarifies things!

22:15 Jookia has joined #dri-devel

22:15 Jookia has left #dri-devel [#dri-devel]

22:17 cheako has joined #dri-devel

22:17 neonking has quit [Remote host closed the connection]

22:17 neonking has joined #dri-devel

22:19 apinheiro has quit [Remote host closed the connection]

22:26 <Viciouss> I have some trouble with a vblank timeout after upgrading from android 11 to 12 with an exynos4412 device. It will turn the screen black, but the device seems to work normally aside from that, I can adb in, sounds continue playing. I can reproduce this consistently.

22:26 <Viciouss> I'm using the android common kernel 5.10.101 with some patches for my device, this is happening on mesa 21.3.8, I also tried 22.0.2 as well as main with the same result. Here is the warning that comes with it: https://privatebin.net/?817901b4067fe684#56xaciREjDKvubLaCeM5umnQw6cJA4JS6TVkiPZqpfpa

22:31 danvet has quit [Ping timeout: 480 seconds]

22:32 Haaninjo has quit [Quit: Ex-Chat]

22:32 Jookia has joined #dri-devel

22:32 <Jookia> is the mesa3d.org sysadmin around here?

22:45 ybogdano has quit [Ping timeout: 480 seconds]

22:48 mclasen has joined #dri-devel

22:58 mclasen_ has joined #dri-devel

22:59 mclasen has quit [Ping timeout: 480 seconds]

23:01 <dcbaker> Jookia: you'll probably have more luck at #_oftc_#freedesktop:matrix.org , that's where the sysadmins generally hang out

23:01 <Jookia> ah

23:02 <Jookia> nevermind then

23:03 <Jookia> if someone could pass on that the archive.mesa3d.org certificate is broken in gnutls that'd be great :)

23:03 <Sachiel> I'm guessing that just means #freedesktop and the matrix client turned it into something else

23:04 <heat> Jookia, if the cert is broken in gnutls i'd guess it's gnutls that's broken, not the cert

23:04 <Jookia> heat: it's the server config sending duplicate certs

23:05 <Jookia> yes it's a gnutls bug, but it's fixable serverside too

23:09 <jekstrand> Is it just me or does the framebuffer_fetch spec say that if it's enabled in the shader, it's basically always on. Like, if I'm reading it correctly, you can just not write gl_FragData and it'll output the same color as before.

23:09 <jekstrand> Or you can just write gl_FragData.x to only modify one component

23:09 <jekstrand> in theory, anyway.

23:11 Jookia has left #dri-devel [#dri-devel]

23:18 <airlied> karolherbst: do I need anything beyond your rusticl/wip branch to reproduce?

23:18 <anholt> 90% of mediump tests passing with my vtn relaxed precision support. that means I'm basically done, right?

23:18 <karolherbst> nope

23:19 <anholt> airlied: will lvp want 16-bit math for mediump?

23:19 <karolherbst> airlied: ... I am still not quite sure what's going on, but I think _something_ calculates a wrong pointer, stores it into scratch mem and... uses it for a load, but ufff...

23:20 heat has quit [Remote host closed the connection]

23:20 <airlied> anholt: probably worth enabling just for testing, it should work

23:20 <airlied> the only 16-bit problem I remember having is around uniform readback for GL

23:20 <airlied> karolherbst: oh that would be annoying to track down

23:20 <anholt> not a case of "yeah, avx should love it, turn it on"?

23:20 <karolherbst> airlied: yes...

23:21 lumag_ has quit [Ping timeout: 480 seconds]

23:21 <karolherbst> but I don't think that's it

23:22 <airlied> anholt: don't think it magically made anything faster in the past

23:22 <karolherbst> there is some weirdness going on with types

23:22 <karolherbst> like it feels like something uses 32 bit although it should be 64

23:24 <karolherbst> airlied: is there a good place to dump the "final" nir shader of llvmpipe?

23:24 <karolherbst> ehh right before lp_build_nir_llvm I guess

23:25 <airlied> yeah there

23:25 mszyprow has quit [Ping timeout: 480 seconds]

23:26 <karolherbst> ahhh.. now it doesn't crash but simply renders garbage :D

23:27 <airlied> also LP_NUM_THREADS=1 might help to see things better

23:27 <karolherbst> yeah, I already set that one

23:29 <karolherbst> airlied: yeah.. well...

23:29 <karolherbst> https://gist.githubusercontent.com/karolherbst/d817e52205fbd89703a7634c0f570bf6/raw/aeb56f667852e163cbb8a7c21ea37d02c7d95df8/gistfile1.txt

23:30 <karolherbst> question is.. is this correct or not

23:30 pcercuei has quit [Quit: dodo]

23:31 maxzor has joined #dri-devel

23:31 <karolherbst> ssa_28 seems to be the loaded base ptr

23:31 <karolherbst> weird...

23:31 icecream95 has joined #dri-devel

23:33 <karolherbst> but I think it's crashing on that one, as the x86 assembly looked very close to that ones

23:33 <karolherbst> *one

23:33 mhenning has joined #dri-devel

23:33 <karolherbst> but..

23:34 <karolherbst> ohhh.. let me check something

23:35 <karolherbst> offset 0x00000038 mhh

23:35 <karolherbst> let me check if the input buffer even contains valid stuff

23:36 neonking_ has joined #dri-devel

23:40 maxzor has quit [Ping timeout: 480 seconds]

23:42 neonking has quit [Ping timeout: 480 seconds]

23:46 morphis has quit [Ping timeout: 480 seconds]

23:46 morphis has joined #dri-devel

23:49 iive has quit []

23:52 ppascher has quit [Ping timeout: 480 seconds]

23:57 <karolherbst> ahhh.. why is this bug soo annoying

23:57 <karolherbst> now it stopped crashing and simply renders incorrectly :(