ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
adjtm is now known as Guest3314
Guest3314 has quit [Read error: Connection reset by peer]
adjtm has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
<karolherbst> airlied: yeah... I think we truncate pointers somewhere
jewins has quit [Ping timeout: 480 seconds]
nvishwa1 has quit [Remote host closed the connection]
nvishwa1 has joined #dri-devel
pushqrdx has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
<karolherbst> airlied: I found it
<karolherbst> something with wrong scratch size and indirect accesses :(
<karolherbst> yeah... I kind of need a better solution here
digetx has quit [Read error: Connection reset by peer]
digetx has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
mhenning has quit [Quit: mhenning]
elongbug_ has joined #dri-devel
elongbug__ has quit [Read error: Connection reset by peer]
sdutt has joined #dri-devel
mszyprow has joined #dri-devel
Duke`` has joined #dri-devel
mszyprow has quit [Ping timeout: 480 seconds]
hch12907 has joined #dri-devel
itoral has joined #dri-devel
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
lemonzest has joined #dri-devel
kts has joined #dri-devel
mszyprow has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
nchery has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
mclasen has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
mclasen has joined #dri-devel
frieder has joined #dri-devel
jfalempe has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
itoral_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
nvishwa1 has quit [Read error: Connection reset by peer]
rgallaispou has joined #dri-devel
danvet has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
thellstrom has joined #dri-devel
lynxeye has joined #dri-devel
itoral has quit [Remote host closed the connection]
hch12907 has joined #dri-devel
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
pcercuei has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
mclasen has joined #dri-devel
rasterman has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
thellstrom has quit [Ping timeout: 480 seconds]
apinheiro has joined #dri-devel
sdutt has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
gawin has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
anarsoul|2 has joined #dri-devel
anarsoul has quit [Read error: Connection reset by peer]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
anarsoul|2 has quit [Ping timeout: 480 seconds]
anarsoul has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
MajorBiscuit has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
elongbug__ has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
elongbug_ has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
mclasen has joined #dri-devel
itoral has quit [Remote host closed the connection]
xtarun has joined #dri-devel
itoral has joined #dri-devel
xtarun has quit []
YuGiOhJCJ has joined #dri-devel
itoral has quit [Remote host closed the connection]
flacks has quit [Quit: Quitter]
itoral has joined #dri-devel
flacks has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
<tzimmermann> javierm, mripard sent me a bug report about fbdev hotunplugging going wrong: https://github.com/raspberrypi/linux/issues/5011
<tzimmermann> i guess we need to repork this
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<tzimmermann> 'rework' :)
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
* javierm reads
itoral has quit [Remote host closed the connection]
<tzimmermann> javierm, there's a kernel log, which shows plymouth crashing. https://github.com/raspberrypi/linux/files/8573600/log.txt I suspect that plymouth tries to access fbdev framebuffer memory after the platform device has been unplugged
itoral has joined #dri-devel
<javierm> tzimmermann: yes, I'm reading the log now
<karolherbst> what I want is: containers to run any distribution desktop on any machine
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<javierm> tzimmermann: so if I understand the problem correctly, it's like this: 1) simplefb is registered and /dev/fb0 opened by plymouth, get_fb_info() is called
itoral has quit [Remote host closed the connection]
<javierm> 2) then at some point vc4 DRM driver registers and kicks out simplefb fbdev, then platform_device_unregister() is called
itoral has joined #dri-devel
<javierm> tzimmermann: 3) but file->private_data still has a reference to struct fb_info that got in fb_open() and so in fb_release() things go boom
<tzimmermann> javierm, i didn't step through it. i was just guessing
<tzimmermann> where did you see the get_fb_info call?
<javierm> tzimmermann: yeah, I'm guessing too by reading the code and the log
<javierm> tzimmermann: I did not but in the call stack there's fb_release() and a mutex_lock()
<javierm> tzimmermann: so I think that's either a use-after-free in fb_release() when doing struct fb_info * const info = file->private_data and then lock_fb_info(info) or a locking issue
<javierm> tzimmermann: that's why I thought that's more safe to drop the lock and re-acquire rather than making assumptions about the locking status
<javierm> the __mutex_lock_slowpath and __mutex_lock.constprop.9 in the crash log are also suspicious
<javierm> [ 11.365003] Unable to handle kernel paging request at virtual address fe7e014c
<javierm> [ 11.365039] [fe7e014c] *pgd=00000000
<javierm> seems to be a NULL pointer deref
<tzimmermann> javierm, AFAICT your theory aligns with a bug report that was on linux-fbdev this weekend: https://lore.kernel.org/linux-fbdev/ab099144-2db2-caae-7a59-94211111a6cf@suse.de/T/#m93fb3184c1de6f5d444081a21128f9c703d93b53
<javierm> tzimmermann: oh, that does sound like the same issue indeed
<javierm> :(
<javierm> another corner case... I want to disable CONFIG_FB so much
<tzimmermann> fbdev worked well until we tried to fix it :)
<javierm> tzimmermann: for some definition of well :)
<tzimmermann> javierm, i don't think the provided patch is correct, though. seems like it's papering over the issue
<javierm> tzimmermann: absolutely agree
<javierm> tzimmermann: but also... what's the point of keeping the device if the real driver will take over the display controller anyways
<javierm> tzimmermann: so I wonder if we shouldn't just prevent this NULL pointer deref to happen, i.e: add a if (!info) return or something like that
<tzimmermann> and return -ENODEV? if this already fixes the problem, i'm all for it
<javierm> tzimmermann: yeah. But I don't know how to reproduce the issue. Maybe answering the person that proposed the patch with this suggestion ?
<tzimmermann> javierm, maybe mripard or the reporter of the bug
<javierm> tzimmermann: because if someone mmap'ed the /dev/fb0, things are not going well anyways after let's say vc4 probes
<javierm> the /dev/fb0 for simplefb I mean
<tzimmermann> javierm, i suspected this was the problem. but i've never encountered the error anywhere
<tzimmermann> javierm, after looking at get_fb_info(), i realized that we don't seem to clear registered_fb[i] anywhere after calling platform_device_unregister()
<tzimmermann> so the pointer is still there (?)
<tzimmermann> maybe that's the problem
<mripard> tzimmermann: I can try to reproduce and test it if you want, or you can comment on the bug report
<tzimmermann> i have to take another look
<mripard> I'm probably not going to be able to test for the next couple of days though, so commenting on the bug report might be the fastest option
<tzimmermann> mripard, thanks for the offer. i'll try first, but i'm really busy today
<javierm> tzimmermann: I have an rpi4 and can't reproduce it, I've also been testing recently with simplefb to cover all cases for https://lore.kernel.org/lkml/20220429084253.1085911-1-javierm@redhat.com/
<tzimmermann> if nothing helps, i'll get back to you
<tzimmermann> i have an rpi3+
<javierm> but I'm using Fedora... maybe something in the distro that stress differently
<javierm> like how we configure plymouth by default or whatever
<javierm> tzimmermann: I don't think that's true, it's set to NULL in do_unregister_framebuffer()
<javierm> tzimmermann: platform_device_unregister() will call the driver's .remove handler that will call unregister_framebuffer()
<javierm> I guess is a driver bug actually? That shouldn't unregister if is opened ?
q4a has joined #dri-devel
itoral has quit [Remote host closed the connection]
heat has joined #dri-devel
itoral has joined #dri-devel
<tzimmermann> javierm, i guess i missed that
<tzimmermann> i probably have to reproduce it
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<javierm> tzimmermann: do you know at what point the .release file ops is called during do_unregister_framebuffer() ?
<javierm> because that seems to be the root cause. The .fb_release callback shouldn't be called after the framebuffer has been unregistered
itoral has quit [Remote host closed the connection]
<tzimmermann> javierm, file->private_data is still set, i guess
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
cheako has quit [Quit: Connection closed for inactivity]
<tzimmermann> javierm, could we acquire the registration lock around https://elixir.bootlin.com/linux/v5.17.5/source/drivers/video/fbdev/core/fbmem.c#L1441
<tzimmermann> ?
<javierm> tzimmermann: I don't think so. Because if I'm reading correctly, we already do registration_lock -> fb_info -> lock
<tzimmermann> or we leave the fb_info around until the final reference has been dropped
<javierm> so that would cause an ABBA deadlock
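The lock-ordering worry above can be sketched as a tiny lock-order checker. This is a hypothetical Python illustration, not kernel code: the lock names come from the discussion, while `find_lock_order_inversion` and the second, inverted path are assumptions made up for the example.

```python
from itertools import combinations

def find_lock_order_inversion(paths):
    """Return a pair of locks that two paths take in opposite orders, or None.

    Each path is the ordered list of locks a code path acquires while still
    holding the earlier ones.
    """
    seen = set()  # (outer, inner) pairs observed so far
    for path in paths:
        for outer, inner in combinations(path, 2):
            if (inner, outer) in seen:
                return (inner, outer)  # earlier path took them the other way round
            seen.add((outer, inner))
    return None

# Order described in the chat: registration_lock first, then the fb_info lock.
existing = ["registration_lock", "fb_info.lock"]
# Hypothetical second path that takes the same two locks in the opposite order.
inverted = ["fb_info.lock", "registration_lock"]

inversion = find_lock_order_inversion([existing, inverted])
print(inversion)
```

Two paths that agree on the order never report an inversion; only the opposite nesting does, which is the ABBA pattern.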
<javierm> tzimmermann: but I'm writing a patch for you to read
<tzimmermann> thank you
gawin has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<javierm> tzimmermann: I actually found two bugs in fbmem
<javierm> not related to your patches but things that were there before already
itoral has quit []
<javierm> ups, patch 1/2 doesn't compile :P https://paste.centos.org/view/raw/c8955e39 instead
frieder has quit [Ping timeout: 480 seconds]
<javierm> tzimmermann: btw, the assumption that the fb_info reference can change beneath the feet of a user-space program that's holding an fd is scary, but that's how fbdev is...
<javierm> I'm just making .fb_release() consistent with fb_ioctl(), fb_mmap(), etc
agd5f has joined #dri-devel
kts has joined #dri-devel
rkanwal has joined #dri-devel
<tzimmermann> javierm, oh! so file_fb_info() is where the magic happens
apinheiro has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
<javierm> tzimmermann: yeah and fb_release() was not using that accessor
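The idea discussed here, going through an accessor instead of trusting file->private_data, can be modeled in a few lines. This is a simplified Python sketch under assumptions: the `File` and `FbInfo` classes, the table size of 32, and the exact check are illustrative, and the real kernel code may differ.

```python
import errno

FB_MAX = 32                      # assumed table size, for illustration only
registered_fb = [None] * FB_MAX  # model of the registration table, indexed by minor

class FbInfo:
    def __init__(self, name):
        self.name = name

class File:
    def __init__(self, minor, private_data):
        self.minor = minor
        self.private_data = private_data  # fb_info captured at open time

def file_fb_info(file):
    """Return the fb_info only if it is still the one registered for this minor."""
    info = registered_fb[file.minor]
    return info if info is file.private_data else None

def fb_release(file):
    info = file_fb_info(file)
    if info is None:
        return -errno.ENODEV  # device went away while the fd was open
    return 0

# simplefb registers, plymouth opens /dev/fb0, then the real driver kicks simplefb out.
simplefb = FbInfo("simplefb")
registered_fb[0] = simplefb
fd = File(0, simplefb)
before = fb_release(fd)
registered_fb[0] = None          # unregister clears the slot
after = fb_release(fd)
print(before, after)
```

The stale pointer in `private_data` is never dereferenced once the slot has been cleared, which is the consistency with the other file operations mentioned above.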
<javierm> I've tested both patches in my rpi4 and at least didn't find any regressions
<javierm> tzimmermann, mripard: posted to the list, let me know what you think folks
mvlad has joined #dri-devel
jewins has joined #dri-devel
<javierm> tzimmermann: dropped patch 1/2, fbdev is a mine field so the less we touch it, the better :)
sdutt has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
fxkamd has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
MrCooper_ is now known as MrCooper
Thymo_ has quit []
<jekstrand> dcbaker, jljusten: Ping on a waffle release
rasterman has quit [Quit: Gettin' stinky!]
Thymo has joined #dri-devel
<MrCooper> ajax: "is it actually that hard to predict which image will be returned next? it's the one with the lowest sbc" isn't always true, e.g. with direct scanout of a mailbox swapchain
<MrCooper> ajax: we can get rid of the WSI thread with Xwayland, if Wayland compositors & Xwayland are fixed to handle mailbox properly (replacing an older buffer only once the newer one is actually ready)
tzimmermann has quit [Quit: Leaving]
apinheiro has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
<jekstrand> karolherbst, imirkin: A bit of help debugging nouveau?
<jekstrand> Getting this from Mesa:
<jekstrand> nvc0_screen_create:1168 - Error allocating PGRAPH context for M2MF: -16
<jekstrand> And this from the kernel:
<jekstrand> [ 835.464588] nouveau 0000:17:00.0: gr: fecs falcon already acquired by gr!
<jekstrand> [ 835.464593] nouveau 0000:17:00.0: gr: init failed, -16
<karolherbst> jekstrand: need signed firmware
<jekstrand> karolherbst: Ok... and those aren't in linux-firmware?
<karolherbst> they are, except for ampere
<jekstrand> karolherbst: This is an RTX 2080 FWIW
<jekstrand> 2060, rather
<karolherbst> but they might not be inside your initramfs
<jekstrand> Yeah... I wondered about that
<jekstrand> I'll try to debug that. Thanks
<karolherbst> jekstrand: there is a lsinitrd command
<karolherbst> ohh wait
<karolherbst> jekstrand: I think you might actually hit some bug
<karolherbst> :(
<karolherbst> what kernel are you on?
<jekstrand> Looks like I don't have nvidia firmware
<karolherbst> ahh
<karolherbst> I hope that is it, but...
<dcbaker> jekstrand: jljusten: I’m working on it. I’ve been working on the xorg release scripts to work with waffle like they do with mesa
ella-0 has joined #dri-devel
<cwabbott> ugh, this is so annoying...
<cwabbott> ../src/freedreno/vulkan/tu_cmd_buffer.c: In function ‘vk2tu_single_stage’:
<cwabbott> ../src/freedreno/vulkan/tu_cmd_buffer.c:3061:4: error: case label does not reduce to an integer constant
<cwabbott> 3061 | case VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT:
<cwabbott> | ^~~~
ella-0_ has quit [Read error: Connection reset by peer]
<cwabbott> whyyy can't we have nice things like 64-bit enums
heat has joined #dri-devel
<jenatali> cough C++ can do it cough ;)
anujp has joined #dri-devel
<jekstrand> karolherbst: Not working. :(
<karolherbst> noo :(
<karolherbst> what is dmesg saying?
<jekstrand> karolherbst: Still getting a "pmu: firmware unavailable" message, though. Maybe I need to dracut harder?
<karolherbst> nah, we don't have PMU firmware
<karolherbst> we had some fixes in that area though
<karolherbst> but all this firmware stuff is just sooo annoying
<jekstrand> That's the only firmware message I see
<karolherbst> doing that without any documentation even more so
<karolherbst> jekstrand: yeah.. can be a bug then
<jekstrand> karolherbst: Here's dmesg |grep nouveau: https://paste.centos.org/view/3f3318d9
<karolherbst> :( yeah.. seems to be a bug
<karolherbst> Ben is working on some of those though
pjakobsson has joined #dri-devel
frieder has quit [Remote host closed the connection]
<karolherbst> jekstrand: the main problem here is, that probably not even nvidia knows what's wrong, it's all so painful if it comes to firmware
<jekstrand> :(
<jekstrand> And here I thought a 2060 was supposed to work. :P
<karolherbst> they give us different firmware, because we don't get the PMU stuff, so we get mostly untested firmware :), nice, isn't it?
<karolherbst> jekstrand: well... some do
<karolherbst> that's the neat part
<karolherbst> all GPUs seem to be different
<karolherbst> some work, some.. a little, and some don't :)
<karolherbst> the turings I've got here all work
<jekstrand> :(
iive has joined #dri-devel
pjakobsson_ has quit [Ping timeout: 480 seconds]
<karolherbst> there might be a trick we could do though, but not quite sure where to put it
<karolherbst> there are many unknowns here, like there is a pre image we could use in the vbios, but also just resetting everything before loading our stuff _could_ help.. it's annoying
<karolherbst> and ben might already have patches which fix your issue
<karolherbst> but they might break others :)
<karolherbst> that "sec2: unhandled intr 00000010" looks odd btw
<jekstrand> Does it matter that this card has never been booted w/ windows? I would hope not. Sticky initialization would be a mess.
<karolherbst> jekstrand: we do have an internal bug with a partner hitting your issue though
<karolherbst> so there is some incentive to get that fixed
<karolherbst> jekstrand: no, it doesn't matter, at least not if you cold reboot
nvishwa1 has joined #dri-devel
<jekstrand> Well, if you've got kernel patches you'd like me to try, I can do that. The good news is that the card is plugged into the beefy machine that builds kernels fast. :)
<karolherbst> yeah.. I'll try to ping ben and see if he has any ideas
ybogdano has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
<karolherbst> another possibility is that the firmware is buggy and we need a new one from nvidia.. wouldn't even be the first time that happens
<jekstrand> :-/
<karolherbst> jekstrand: mind booting with nouveau.debug=trace and share the log?
<jekstrand> Yeah... Just a second. Gotta hook up a keyboard.
<jenatali> Hm... I seem to have found an app that's using the same resource on 2 contexts at the same time :(
rsalvaterra has joined #dri-devel
<karolherbst> jenatali: sounds like fun
<jenatali> That's one word for it
<karolherbst> I mean.. what's the problem here?
<jenatali> Our driver's not robust against that, but I wonder, is Mesa/Gallium in general?
<jenatali> I'm hitting linked list corruption specifically
<karolherbst> it's up to the frontend to make sure things don't break I think. pipe_resources being shareable objects, the drivers have to be thread safe there anyway
<karolherbst> screen operations in general need to be thread safe as well
<jenatali> Is that true? I thought the pipe context followed the GL threading rules, which is that you can't use the same resource on 2 contexts at once
<jenatali> Sure, screen ops are thread-safe
<karolherbst> jenatali: ehh.. your buffer overflowed :(
<karolherbst> ...
<karolherbst> I meant jekstrand
<karolherbst> jenatali: pipe_context yes, that's unsafe, but the pipe_resource has to be thread safe
<anholt> jenatali: GL threading rules let you use the same texture on 2 contexts at once.
<jenatali> anholt: I could've sworn I did a spec dive a couple months back that said you couldn't
FireBurn has quit [Quit: Konversation terminated!]
<karolherbst> well.. gallium expects pipe_resources to be thread safe
<jenatali> Maybe that was only for read-after-write hazards though... maybe multiple readers does need to work and that's where I'm busted
<karolherbst> yeah..., I had some fun with that as well, because map/unmap are so annoying
<karolherbst> but you have to expect that unmap and map can happen from different contexts even
<jekstrand> karolherbst: ?
<anholt> for RAW between threads, the app needs to be sure that the first rendering has finished (glFinish() was traditional), and then do a bind in the reading context.
<jenatali> Yeah that's fine for us, as long as it's not racy
apinheiro has joined #dri-devel
<jekstrand> anholt: Don't you mean glFlush()?
<jekstrand> Or does it really need glFinish()?
<anholt> jekstrand: glFlush() is so ambiguous. it would have worked on Mesa for a long time, but wouldn't today.
<karolherbst> jekstrand: your kmsg buffer is not big enough
<karolherbst> try "log_buf_len=4M" or something
<anholt> and given in general how ambiguous flush was, people would sprinkle flushes and finishes around their apps
<anholt> (see also: the number of games that glFlush before glXSwapBuffers()!)
<anholt> ajax: thank you for always XInitThreads()ing
<anholt> long overdue
<karolherbst> that looks better, thanks
MajorBiscuit has quit [Quit: WeeChat 3.4]
<karolherbst> jekstrand: mhh yeah.. seems like everything is in order until we try to set up the fifo
<karolherbst> if we would just know what that 0x10 interrupt means
<jekstrand> :(
Duke`` has joined #dri-devel
maxzor has joined #dri-devel
<karolherbst> worst case it simply means "you messed up" :)
<karolherbst> which would be more information than we have today
<jekstrand> I don't think *I* messed up. :P
<karolherbst> yeah... I don't think so :)
<karolherbst> it's just such a mess without any docs and... *sigh*
linkmauve has left #dri-devel [#dri-devel]
nchery has quit [Ping timeout: 480 seconds]
linkmauve has joined #dri-devel
<karolherbst> I just hope something happens so debugging this wouldn't be so painful anymore
nchery has joined #dri-devel
<jekstrand> :-/
lynxeye has quit [Quit: Leaving.]
bcheng has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
bcheng has quit [Remote host closed the connection]
bcheng has joined #dri-devel
bcheng has quit [Remote host closed the connection]
mszyprow has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
bcheng has joined #dri-devel
bcheng has quit [Remote host closed the connection]
bcheng has joined #dri-devel
<Kayden> mareko: with util_queue, if I want to add a bunch of jobs to run in parallel, then wait for all of them to be done...do I have to add a fence on each job, and individually wait on those? or I could just util_queue_finish I suppose, though that might wait on additional jobs
<Kayden> or is there a way to have a fence for "after all of these jobs are done"
<Kayden> (guessing no, util_queue_fence seems to be tri-state rather than counting)
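A counting fence of the kind asked about here could look roughly like the following. This is a generic Python sketch of the concept, not the util_queue API: `CountingFence` and `run_batch` are hypothetical names, and a standard thread pool stands in for the queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class CountingFence:
    """Signals once a fixed number of jobs have completed."""
    def __init__(self, count):
        self._remaining = count
        self._cond = threading.Condition()

    def signal(self):
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._remaining == 0)

def run_batch(pool, jobs):
    """Submit all jobs, then block until exactly this batch is done."""
    fence = CountingFence(len(jobs))
    results = [None] * len(jobs)

    def wrap(i, job):
        try:
            results[i] = job()
        finally:
            fence.signal()  # count down even if the job raised

    for i, job in enumerate(jobs):
        pool.submit(wrap, i, job)
    fence.wait()  # unrelated jobs in the same pool are not waited on
    return results

with ThreadPoolExecutor(max_workers=4) as pool:
    squares = run_batch(pool, [lambda n=n: n * n for n in range(8)])
print(squares)
```

Unlike a queue-wide finish, the wait here only covers the jobs that share the fence.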
alyssa has joined #dri-devel
* alyssa wonders if any drivers depend on this NIR bug..
<karolherbst> alyssa: yes
<alyssa> :D
<karolherbst> btw, what bug?
<alyssa> karolherbst: writes_memory is set for fragment shaders without side effects if they are linked to a vertex shader producing transform feedback varyings
<alyssa> (causing various backend opts to be disabled)
<karolherbst> this sounds like somebody which needs to be like that
<karolherbst> :D
<karolherbst> *something
<karolherbst> alyssa: "shader->info.writes_memory = shader->info.has_transform_feedback_varyings;" mhh
<alyssa> Ye
<alyssa> unfortunately, not the reason my code is broken
<karolherbst> why unfortunately? Be happy, so you won't have to figure out why this needs to be set :p
<alyssa> hnnngh
<alyssa> still can't figure out where this 1 pix bug comes from
hch12907 has quit [Ping timeout: 480 seconds]
ybogdano has joined #dri-devel
<jekstrand> alyssa: oh?
<jekstrand> alyssa: Oh, my...
Emmy_ has quit [Remote host closed the connection]
<dcbaker> mattst88: what's the status of waffle!107, I think that's the last thing on the list before the next waffle release
<mattst88> dcbaker: I think it's ready. I was hoping to get chadv to take a quick look at it. I'll ping him about it
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<Kayden> ouch, I just realized that there's texcompress_s3tc_tmp.h and util/format/u_format_s3tc.c
<Kayden> we should probably consolidate on one S3TC implementation...
<airlied> Kayden: don't they include each other?
<Kayden> somewhat, but util_format_dxtn_pack_rgba_8unorm for example looks pretty duplicated
maxzor has quit [Ping timeout: 480 seconds]
<karolherbst> ohh right.. I wanted to figure out scratch space actually
Haaninjo has joined #dri-devel
<karolherbst> jekstrand: sooo.. I think I found a kernel doing weird stuff.. it writes 12 bytes into the scratch buffer and then uses values from global mem to access it... I didn't look yet on what we should do, but this could become ugly
mszyprow has joined #dri-devel
anarsoul|2 has joined #dri-devel
<airlied> karolherbst: is that the problem with luxmark/llvmpipe?
<karolherbst> yes
<karolherbst> it's the size specifically
<karolherbst> so I cloned what compiler/clc was doing by resetting the size of scratch space and get it recalculated, but...
<karolherbst> seems to cause it to crash with llvmpipe
<karolherbst> I'll check what's the actual offsets we get and try to figure out what's the correct thing here
rasterman has joined #dri-devel
* airlied looks at the spirv-llvm-translator opaque ptrs issue, gonna be fun
<karolherbst> airlied: I'd look into coroutines first
<karolherbst> opaque pointers can be disabled at runtime
<karolherbst> which we might have to do until it gets resolved
<karolherbst> but that coroutine change we can't work around as it seems
anarsoul has quit [Ping timeout: 480 seconds]
anarsoul|2 has quit [Ping timeout: 480 seconds]
<airlied> karolherbst: yeah coroutines are first when I get it building, was just looking around the minefield
<alyssa> jekstrand: woof
anarsoul has joined #dri-devel
<jekstrand> alyssa: I need more of your patches.
<alyssa> jekstrand: which ones?
<alyssa> the make vk go brr patches?
<jekstrand> idk. Maybe the ones for null shaders?
mszyprow has quit [Ping timeout: 480 seconds]
<jekstrand> And... I've got a kernel oops on trying to access a null user pointer
apinheiro has quit [Ping timeout: 480 seconds]
<alyssa> mmh, I can take a look
<alyssa> am 'supposed' to be doing valhall but you know, this is more fun ;-P
<alyssa> this should be everything
<alyssa> the worklist stuff there isn't necessary I just suck at git rebase ;p
<jekstrand> I saw you pushed some of them
<alyssa> yes, the subset bbrezillon reviewed
<alyssa> Pass: 16739, Fail: 7, Crash: 252, Skip: 20793, Duration: 8:21, Remaining: 0
<alyssa> I think I broke something :V
<alyssa> oh.. dce..
mvlad has quit [Remote host closed the connection]
<alyssa> Separate shader lowering. Ugh.
adjtm has quit [Quit: Leaving]
<karolherbst> why is printf such a terrible interface? :(
<alyssa> C
<alyssa> karolherbst: until OpenCL supports %n it sucks ;-p
<karolherbst> :D
<karolherbst> never ever
<mattst88> dcbaker: merged \o/
apinheiro has joined #dri-devel
<karolherbst> airlied: ehh.. scratch support looks.. weird
<karolherbst> airlied: I don't really get what the assignment to bld.scratch_ptr is supposed to do?
<karolherbst> specifically that "shader->scratch_size * type.length" part
<jekstrand> Why is dma_fence_release calling an IRQ handler?!?
<airlied> karolherbst: a scratch value for each lane? not sure if that is even a thing
<karolherbst> airlied: huh? scratch mem is thread private, no?
<airlied> yes so we run 8 threads at once
<airlied> "threads"
<karolherbst> right..
<airlied> terminology gets too blurry around threads
<karolherbst> I think I am just confused what "type" is here
<airlied> it's the basic shader type, 32-bit x lanes
<karolherbst> ahh
<airlied> width is 32, length is number of vector lanes, 4 or 8 usually
<airlied> a scratch write should only be written for the active lanes
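The per-lane scratch idea (scratch_size times the number of lanes, written only for active lanes) can be sketched as follows. This is a Python model under a loud assumption: a lane-major layout where lane L owns bytes [L * scratch_size, (L + 1) * scratch_size). The real llvmpipe layout may differ, and `store_scratch` is a hypothetical helper.

```python
def store_scratch(scratch, scratch_size, exec_mask, offsets, values):
    """Write one byte per active lane into that lane's private slice."""
    for lane, active in enumerate(exec_mask):
        if not active:
            continue  # inactive lanes must not touch memory
        off = offsets[lane]
        if not 0 <= off < scratch_size:
            raise IndexError(
                f"lane {lane}: offset {off} outside scratch of {scratch_size} bytes")
        scratch[lane * scratch_size + off] = values[lane]

scratch_size = 12                 # bytes per lane, as in the kernel discussed above
length = 8                        # number of vector lanes
scratch = bytearray(scratch_size * length)

exec_mask = [False, False, True, True, False, False, False, False]
offsets = [0, 0, 4, 11, 0, 0, 0, 0]
values = [0xAA] * length

store_scratch(scratch, scratch_size, exec_mask, offsets, values)
print(scratch[2 * scratch_size + 4], scratch[3 * scratch_size + 11])
```

The bounds check is where an out-of-range indirect offset would show up in this model, instead of silently landing in another lane's slice.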
<karolherbst> right
<airlied> though not sure how a 12-byte write is handled :-P
<karolherbst> I think something is going very wrong here, but I can't really put my finger on what exactly
<karolherbst> airlied: there is no 12-byte write, just the scratch space is that big
<airlied> ah
<karolherbst> mhh, I think I want to figure out if the read/writes are actually OOB
<karolherbst> huh
<airlied> probably dump a bunch of lp_build_print_value in emit_store_scratch
<karolherbst> I tried that, but printf doesn't really get flushed when the JIT is crashing or something? dunno
<karolherbst> I don't think I get the last values out of it
<airlied> pretty sure it's not buffered
<karolherbst> the last values I got were all 0 and it crashed
<airlied> so if you print before/after the lp_build_pointer_set then it didn't crash in there
famfo has quit []
<airlied> granted it might have loaded a value that crashes it later :-P
<karolherbst> possibly
<karolherbst> I doubt it though
<karolherbst> the offset comes directly out of a global mem buffer
<karolherbst> and I am sure it's all constant
<karolherbst> the array it indirects on is of size 1
<karolherbst> we could cheat a lot and just assume it's the right index... :D
maxzor has joined #dri-devel
<karolherbst> \o/
<karolherbst> airlied: it finally happened
<karolherbst> Thread 85 "rusticl queue t" received signal SIGSEGV, Segmentation fault.
<karolherbst> [Switching to Thread 0x7fffdc54e640 (LWP 3906609)]
<karolherbst> 0x00007fffe91afbd0 in llvm::Value::getType (this=0x0) at /home/kherbst/git/llvm-project/llvm/include/llvm/IR/Value.h:255
<karolherbst> ehh wait... I think that's something else
<karolherbst> ahh yeah.. that's on me :D
<karolherbst> I thought I hit that weird assert/crash
<karolherbst> airlied: mhhhh.. is there an easy way to print all currently bound resources?
<karolherbst> I start to believe it's something else... or.. well.. multiple things or something
<airlied> not that I know of
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
<karolherbst> airlied: yeah.. so it's loading from a address which is neither the scratch mem buffer, nor any of the buffers passed in as kernel args
<karolherbst> and there are no images/samplers
adjtm has joined #dri-devel
<karolherbst> argh... and I thought I was close to fixing it..
<karolherbst> okay...
<karolherbst> I think I got it
<karolherbst> airlied: dammit.. we do 32 bit stuff on a 64 bit pointer :( and I am sure we overflow
<karolherbst> mov edx,DWORD PTR [rsp+rcx*8+0x47a0]
<karolherbst> mov edx,DWORD PTR [rdx]
<karolherbst> rdx is 0x7fff0153e21f
<karolherbst> rsp = 0x7fffce7f5c00
<karolherbst> rcx = 3
<karolherbst> ehh wait..
<karolherbst> ehh
<karolherbst> assembly is weird
<HdkR> s/assembly/x86
<karolherbst> I still think it's part of the load_scratch thing
<karolherbst> let me move back a bit more
HankB_ has quit []
<karolherbst> ehh.. I had to move one additional byte back
rkanwal has quit [Read error: No route to host]
<karolherbst> now it's a 64 bit load
rkanwal has joined #dri-devel
mszyprow has joined #dri-devel
pushqrdx has quit [Read error: Connection reset by peer]
maxzor has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
<karolherbst> weird pointer: 0x7fff00000003
<karolherbst> so that's one part of the base
<karolherbst> and the offset of 0x153e21c gets added to it
<karolherbst> and that gives me the invalid pointer 0x7fff0153e21f
<karolherbst> nothing mapped at 0x7fff00000000
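The arithmetic above can be checked with a short Python sketch. It splits the bogus base into its upper and lower 32-bit halves and adds the offset from the log; the helper name split64 is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def split64(ptr):
    """Return the (upper, lower) 32-bit halves of a 64-bit value."""
    return ptr >> 32, ptr & MASK32

bogus_base = 0x7fff00000003   # "weird pointer" from the log
offset = 0x153e21c            # offset added to it

hi, lo = split64(bogus_base)
faulting = bogus_base + offset

print(hex(hi), hex(lo))  # 0x7fff 0x3
print(hex(faulting))     # 0x7fff0153e21f
```

The upper half looks like the top of a normal userspace address while the lower half is just 3. One possible reading, not established in the log, is that a 32-bit value replaced the low half of a 64-bit pointer.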
<mareko> Kayden: each job is independent, you can either wait manually for each job or call finish
<mareko> finish is slower
<mareko> finish is a group barrier like in GLSL, meaning that no newer job can execute if finish is waiting
<mareko> perhaps we could implement finish in a better way
<mareko> hopefully nobody is using finish in a perf-critical path, right zink?
<zmike> what
<ajax> ugh that reminds me
<mareko> util_queue_finish
Duke`` has quit [Ping timeout: 480 seconds]
<zmike> only on context/screen destroy
<karolherbst> airlied: huh... am I missing something or is the loop value result inside emit_load_scratch read without writing to it?
<karolherbst> ohh, it's init with 0
<karolherbst> nvm then
<karolherbst> anyway.. I think emit_load_scratch is wrong
<karolherbst> not sure what exactly yet
<airlied> are there inactive lanes?
<karolherbst> it looks like it, yes
<karolherbst> at least some values stay 0 within those vectors
<airlied> you can print exec_mask
<airlied> but it shouldn't do any loads for those lanes
<karolherbst> the problem isn't that
<airlied> like in theory inactive lanes should never do load/stores, but there might be a bug
<karolherbst> it constructs wrong pointers for some lanes
<karolherbst> like really bogus ones
<karolherbst> temp_res = 0 0 140735371329600 140733193388035 0 0 0 0
<karolherbst> (gdb) p/x 140735371329600
<karolherbst> $4 = 0x7fff81d0c040
<ajax> would it be legal to implement just the GetSwapchainStatus bit of VK_KHR_shared_presentable_image and not try to support any of the extra refresh modes
<karolherbst> (gdb) p/x 140733193388035
<karolherbst> $5 = 0x7fff00000003
<karolherbst> the first one is okay
<karolherbst> the second isn't
<ajax> i feel like that'd be a legal gl move but i'm not sure how vulkan convention rolls
<karolherbst> not sure if that has to do with inactive lanes, but..
<karolherbst> airlied:
<karolherbst> exec_mask = 0 0 -1 -1 -1 0 0 0
<karolherbst> temp_res = 0 0 140735236972608 140733193388035 0 0 0 0
<karolherbst> temp_res is dumped between LLVMBuildLoad and LLVMBuildInsertElement
<karolherbst> it segfaults on the lane having a 140733193388035 in between
<karolherbst> it tries to load 0x7fff0153e21f, but it's based on that past bogus value
<karolherbst> result contains the right pointer though
<karolherbst> 0x7fffc8010040
<karolherbst> ehh well..
<karolherbst> nvm result
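To make the lane dump above easier to read, here is a small Python sketch that pairs each exec_mask entry with the matching temp_res entry and prints the value in hex. The values are copied from the log; the helper name describe_lanes is hypothetical, and treating any non-zero mask entry as "active" is an assumption about how the mask is encoded.

```python
# Values copied from the dump in the log (8 lanes).
exec_mask = [0, 0, -1, -1, -1, 0, 0, 0]
temp_res = [0, 0, 140735236972608, 140733193388035, 0, 0, 0, 0]

def describe_lanes(mask, values):
    """Return (lane, active, hex value) tuples; non-zero mask means active."""
    return [(lane, m != 0, hex(v))
            for lane, (m, v) in enumerate(zip(mask, values))]

rows = describe_lanes(exec_mask, temp_res)
for lane, active, value in rows:
    print(lane, "active" if active else "inactive", value)

# Active lanes whose upper 32 bits look fine but whose lower 32 bits are tiny.
suspicious = [lane for lane, active, value in rows
              if active and 0 < (int(value, 16) & 0xFFFFFFFF) < 0x1000]
```

With the dump above, the suspicious list singles out lane 3, the lane holding 0x7fff00000003. The 0x1000 threshold is arbitrary and only serves to flag low halves that look like small integers rather than addresses.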
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
<Kayden> mareko: thanks, that clarifies things!
Jookia has joined #dri-devel
Jookia has left #dri-devel [#dri-devel]
cheako has joined #dri-devel
neonking has quit [Remote host closed the connection]
neonking has joined #dri-devel
apinheiro has quit [Remote host closed the connection]
<Viciouss> I have some trouble with a vblank timeout after upgrading from android 11 to 12 with an exynos4412 device. It will turn the screen black, but the device seems to work normally aside from that, I can adb in, sounds continue playing. I can reproduce this consistently.
<Viciouss> I'm using the android common kernel 5.10.101 with some patches for my device, this is happening on mesa 21.3.8, I also tried 22.0.2 as well as main with the same result. Here is the warning that comes with it: https://privatebin.net/?817901b4067fe684#56xaciREjDKvubLaCeM5umnQw6cJA4JS6TVkiPZqpfpa
danvet has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
Jookia has joined #dri-devel
<Jookia> is the mesa3d.org sysadmin around here?
ybogdano has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
mclasen_ has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
<dcbaker> Jookia: you'll probably have more luck at #_oftc_#freedesktop:matrix.org , that's where the sysadmins generally hang out
<Jookia> ah
<Jookia> nevermind then
<Jookia> if someone could pass on that the archive.mesa3d.org certificate is broken in gnutls that'd be great :)
<Sachiel> I'm guessing that just means #freedesktop and the matrix client turned it into something else
<heat> Jookia, if the cert is broken in gnutls i'd guess it's gnutls that's broken, not the cert
<Jookia> heat: it's the server config sending duplicate certs
<Jookia> yes it's a gnutls bug, but it's fixable serverside too
<jekstrand> Is it just me or does the framebuffer_fetch spec say that if it's enabled in the shader, it's basically always on? Like, if I'm reading it correctly, you can just not write gl_FragData and it'll output the same color as before.
<jekstrand> Or you can just write gl_FragData.x to only modify one component
<jekstrand> in theory, anyway.
Jookia has left #dri-devel [#dri-devel]
<airlied> karolherbst: do I need anything beyond your rusticl/wip branch to reproduce?
<anholt> 90% of mediump tests passing with my vtn relaxed precision support. that means I'm basically done, right?
<karolherbst> nope
<anholt> airlied: will lvp want 16-bit math for mediump?
<karolherbst> airlied: ... I am still not quite sure what's going on, but I think _something_ calculates a wrong pointer, stores it into scratch mem and... uses it for a load, but ufff...
heat has quit [Remote host closed the connection]
<airlied> anholt: probably worth enabling just for testing, it should work
<airlied> the only 16-bit problem I remember having is around uniform readback for GL
<airlied> karolherbst: oh that would be annoying to track down
<anholt> not a case of "yeah, avx should love it, turn it on"?
<karolherbst> airlied: yes...
lumag_ has quit [Ping timeout: 480 seconds]
<karolherbst> but I don't think that's it
<airlied> anholt: don't think it magically made anything faster in the past
<karolherbst> there is some weirdness going on with types
<karolherbst> like it feels like something uses 32 bit although it should be 64
<karolherbst> airlied: is there a good place to dump the "final" nir shader of llvmpipe?
<karolherbst> ehh right before lp_build_nir_llvm I guess
<airlied> yeah there
mszyprow has quit [Ping timeout: 480 seconds]
<karolherbst> ahhh.. now it doesn't crash but simply renders garbage :D
<airlied> also LP_NUM_THREADS=1 might help to see things better
<karolherbst> yeah, I already set that one
<karolherbst> airlied: yeah.. well...
<karolherbst> question is.. is this correct or not
pcercuei has quit [Quit: dodo]
maxzor has joined #dri-devel
<karolherbst> ssa_28 seems to be the loaded base ptr
<karolherbst> weird...
icecream95 has joined #dri-devel
<karolherbst> but I think it's crashing on that one, as the x86 assembly looked very close to that ones
<karolherbst> *one
mhenning has joined #dri-devel
<karolherbst> but..
<karolherbst> ohhh.. let me check something
<karolherbst> offset 0x00000038 mhh
<karolherbst> let me check if the input buffer even contains valid stuff
neonking_ has joined #dri-devel
maxzor has quit [Ping timeout: 480 seconds]
neonking has quit [Ping timeout: 480 seconds]
morphis has quit [Ping timeout: 480 seconds]
morphis has joined #dri-devel
iive has quit []
ppascher has quit [Ping timeout: 480 seconds]
<karolherbst> ahhh.. why is this bug soo annoying
<karolherbst> now it stopped crashing and simply renders incorrectly :(