itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<tzimmermann>
'rework' :)
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
* javierm
reads
itoral has quit [Remote host closed the connection]
<tzimmermann>
javierm, there's a kernel log, which shows plymouth crashing. https://github.com/raspberrypi/linux/files/8573600/log.txt I suspect that plymouth tries to access fbdev framebuffer memory after the platform device has been unplugged
itoral has joined #dri-devel
<javierm>
tzimmermann: yes, I'm reading the log now
<karolherbst>
what I want is: containers to run any distribution desktop on any machine
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<javierm>
tzimmermann: so if I understand the problem correctly, it's like this: 1) simplefb is registered and /dev/fb0 opened by plymouth, get_fb_info() is called
itoral has quit [Remote host closed the connection]
<javierm>
2) then at some point vc4 DRM driver registers and kicks out simplefb fbdev, then platform_device_unregister() is called
itoral has joined #dri-devel
<javierm>
tzimmermann: 3) but file->private_data still has a reference to the struct fb_info that it got in fb_open() and so in fb_release() things go boom
<tzimmermann>
javierm, i didn't step through it. i was just guessing
<tzimmermann>
where did you see the get_fb_info call?
<javierm>
tzimmermann: yeah, I'm guessing too by reading the code and the log
<javierm>
tzimmermann: I did not but in the call stack there's fb_release() and a mutex_lock()
<javierm>
tzimmermann: so I think that's either a use-after-free in fb_release() when doing struct fb_info * const info = file->private_data and then lock_fb_info(info) or a locking issue
<javierm>
tzimmermann: that's why I thought it's safer to drop the lock and re-acquire it rather than making assumptions about the locking status
<javierm>
the __mutex_lock_slowpath and __mutex_lock.constprop.9 in the crash log are also suspicious
<javierm>
[ 11.365003] Unable to handle kernel paging request at virtual address fe7e014c
<javierm>
tzimmermann: oh, that does sound like the same issue indeed
<javierm>
:(
<javierm>
another corner case... I want to disable CONFIG_FB so much
<tzimmermann>
fbdev worked well until we tried to fix it :)
<javierm>
tzimmermann: for some definition of well :)
<tzimmermann>
javierm, i don't think the provided patch is correct, though. seems like it's papering over the issue
<javierm>
tzimmermann: absolutely agree
<javierm>
tzimmermann: but also... what's the point of keeping the device if the real driver will take over the display controller anyway
<javierm>
tzimmermann: so I wonder if we shouldn't just prevent this NULL pointer deref from happening, i.e: add an if (!info) return or something like that
<tzimmermann>
and return -ENODEV? if this already fixes the problem, i'm all for it
<javierm>
tzimmermann: yeah. But I don't know how to reproduce the issue. Maybe answering the person that proposed the patch with this suggestion ?
<tzimmermann>
javierm, maybe mripard or the reporter of the bug
<javierm>
tzimmermann: because if someone mmap'ed the /dev/fb0, things are not going well anyways after let's say vc4 probes
<javierm>
the /dev/fb0 for simplefb I mean
<tzimmermann>
javierm, i suspected this was the problem. but i've never encountered the error anywhere
<tzimmermann>
javierm, after looking at get_fb_info(), i realized that we don't seem to clear registered_fb[i] anywhere after calling platform_device_unregister()
<tzimmermann>
so the pointer is still there (?)
<tzimmermann>
maybe that's the problem
<mripard>
tzimmermann: I can try to reproduce and test it if you want, or you can comment on the bug report
<tzimmermann>
i have to take another look
<mripard>
I'm probably not going to be able to test for the next couple of days though, so commenting on the bug report might be the fastest option
<tzimmermann>
mripard, thanks for the offer. i'll try first, but i'm really busy today
<javierm>
tzimmermann: btw, the assumption that the fb_info reference can change beneath the feet of a user-space program that's holding an fd is scary, but that's how fbdev is...
<javierm>
I'm just making .fb_release() consistent with fb_ioctl(), fb_mmap(), etc
agd5f has joined #dri-devel
kts has joined #dri-devel
rkanwal has joined #dri-devel
<tzimmermann>
javierm, oh! so file_fb_info() is where the magic happens
apinheiro has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
<javierm>
tzimmermann: yeah and fb_release() was not using that accessor
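[editor's note] The accessor pattern discussed above can be sketched in plain user-space C. This is a minimal model, not the kernel code: `struct fb_info`, `struct file`, `registered_fb[]` and `file_fb_info()` are heavily simplified stand-ins, `fb_release_checked()` is a hypothetical name, and the sketch assumes that unregistering a framebuffer clears its `registered_fb[]` slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FB_MAX 32

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fb_info { int open_count; };
struct file { int minor; void *private_data; };

static struct fb_info *registered_fb[FB_MAX];

/* Return the fb_info cached at open time only if it is still the one
 * registered for this minor; otherwise the device is gone or replaced. */
static struct fb_info *file_fb_info(const struct file *file)
{
    struct fb_info *info;

    assert(file->minor >= 0 && file->minor < FB_MAX);
    info = registered_fb[file->minor];
    if (info != file->private_data)
        info = NULL;
    return info;
}

/* Model of a release path that validates the pointer before using it. */
static int fb_release_checked(struct file *file)
{
    struct fb_info *info = file_fb_info(file);

    if (!info)
        return -ENODEV; /* device was unplugged underneath the open fd */
    info->open_count--;
    return 0;
}
```

With this shape, a release on an fd whose framebuffer has been unregistered returns -ENODEV instead of dereferencing the stale pointer stored in `private_data`.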
<javierm>
I've tested both patches in my rpi4 and at least didn't find any regressions
<javierm>
tzimmermann, mripard: posted to the list, let me know what you think folks
mvlad has joined #dri-devel
jewins has joined #dri-devel
<javierm>
tzimmermann: dropped patch 1/2, fbdev is a mine field so the less we touch it, the better :)
sdutt has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
fxkamd has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
MrCooper_ is now known as MrCooper
Thymo_ has quit []
<jekstrand>
dcbaker, jljusten: Ping on a waffle release
rasterman has quit [Quit: Gettin' stinky!]
Thymo has joined #dri-devel
<MrCooper>
ajax: "is it actually that hard to predict which image will be returned next? it's the one with the lowest sbc" isn't always true, e.g. with direct scanout of a mailbox swapchain
<MrCooper>
ajax: we can get rid of the WSI thread with Xwayland, if Wayland compositors & Xwayland are fixed to handle mailbox properly (replacing an older buffer only once the newer one is actually ready)
tzimmermann has quit [Quit: Leaving]
apinheiro has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
<jekstrand>
karolherbst, imirkin: A bit of help debugging nouveau?
<jekstrand>
Getting this from Mesa:
<jekstrand>
nvc0_screen_create:1168 - Error allocating PGRAPH context for M2MF: -16
<jekstrand>
And this from the kernel:
<jekstrand>
[ 835.464588] nouveau 0000:17:00.0: gr: fecs falcon already acquired by gr!
<jekstrand>
[ 835.464593] nouveau 0000:17:00.0: gr: init failed, -16
<karolherbst>
jekstrand: need signed firmware
<jekstrand>
karolherbst: Ok... and those aren't in linux-firmware?
<karolherbst>
they are, except for ampere
<jekstrand>
karolherbst: This is an RTX 2080 FWIW
<jekstrand>
2060, rather
<karolherbst>
but they might not be inside your initramfs
<jekstrand>
Yeah... I wondered about that
<jekstrand>
I'll try to debug that. Thanks
<karolherbst>
jekstrand: there is a lsinitrd command
<karolherbst>
ohh wait
<karolherbst>
jekstrand: I think you might actually hit some bug
<karolherbst>
:(
<karolherbst>
what kernel are you on?
<jekstrand>
Looks like I don't have nvidia firmware
<karolherbst>
ahh
<karolherbst>
I hope that is it, but...
<dcbaker>
jekstrand: jljusten: I’m working on it. I’ve been working on the xorg release scripts to work with waffle like they do with mesa
ella-0 has joined #dri-devel
<cwabbott>
ugh, this is so annoying...
<cwabbott>
../src/freedreno/vulkan/tu_cmd_buffer.c: In function ‘vk2tu_single_stage’:
<cwabbott>
../src/freedreno/vulkan/tu_cmd_buffer.c:3061:4: error: case label does not reduce to an integer constant
<cwabbott>
3061 | case VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT:
<cwabbott>
| ^~~~
ella-0_ has quit [Read error: Connection reset by peer]
<cwabbott>
whyyy can't we have nice things like 64-bit enums
heat has joined #dri-devel
<jenatali>
cough C++ can do it cough ;)
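[editor's note] The compiler error above comes from how 64-bit flag values have to be declared in C: enumeration constants are `int`-sized before C23, so the Vulkan headers declare the `VK_PIPELINE_STAGE_2_*` values as `static const` objects, and a const-qualified object is not an integer constant expression in C. The sketch below shows one possible workaround, an if chain. All names in it are hypothetical, and it is not necessarily what the freedreno code ended up doing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names, declared in the same "static const" style the
 * Vulkan headers use for 64-bit flag values when compiled as C. */
typedef uint64_t stage_flags2;
static const stage_flags2 STAGE_2_DRAW_INDIRECT_BIT = 0x00000002ULL;
static const stage_flags2 STAGE_2_VERTEX_SHADER_BIT = 0x00000008ULL;

enum hw_stage { HW_STAGE_NONE, HW_STAGE_DRAW_INDIRECT, HW_STAGE_VERTEX };

/* "case STAGE_2_DRAW_INDIRECT_BIT:" would be rejected here, because the
 * label must be an integer constant expression. An if chain compares
 * against the objects at run time instead. */
static enum hw_stage single_stage_to_hw(stage_flags2 stage)
{
    if (stage == STAGE_2_DRAW_INDIRECT_BIT)
        return HW_STAGE_DRAW_INDIRECT;
    if (stage == STAGE_2_VERTEX_SHADER_BIT)
        return HW_STAGE_VERTEX;
    return HW_STAGE_NONE;
}
```

A `#define` with a `ULL` literal would also work as a case label; the if chain just avoids shadowing the header's own names.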
anujp has joined #dri-devel
<jekstrand>
karolherbst: Not working. :(
<karolherbst>
noo :(
<karolherbst>
what is dmesg saying?
<jekstrand>
karolherbst: Still getting a "pmu: firmware unavailable" message, though. Maybe I need to dracut harder?
<karolherbst>
nah, we don't have PMU firmware
<karolherbst>
we had some fixes in that area though
<karolherbst>
but all this firmware stuff is just sooo annoying
<jekstrand>
That's the only firmware message I see
<karolherbst>
doing that without any documentation even more so
<karolherbst>
Ben is working on some of those though
pjakobsson has joined #dri-devel
frieder has quit [Remote host closed the connection]
<karolherbst>
jekstrand: the main problem here is that probably not even nvidia knows what's wrong, it's all so painful when it comes to firmware
<jekstrand>
:(
<jekstrand>
And here I thought a 2060 was supposed to work. :P
<karolherbst>
they give us different firmware, because we don't get the PMU stuff, so we get mostly untested firmware :), nice, isn't it?
<karolherbst>
jekstrand: well... some do
<karolherbst>
that's the neat part
<karolherbst>
all GPUs seem to be different
<karolherbst>
some work, some.. a little, and some don't :)
<karolherbst>
the turings I've got here all work
<jekstrand>
:(
iive has joined #dri-devel
pjakobsson_ has quit [Ping timeout: 480 seconds]
<karolherbst>
there might be a trick we could do though, but not quite sure where to put it
<karolherbst>
there are many unknowns here, like there is a pre image we could use in the vbios, but also just resetting everything before loading our stuff _could_ help.. it's annoying
<karolherbst>
and ben might already have patches which fix your issue
<karolherbst>
but they might break others :)
<karolherbst>
that "sec2: unhandled intr 00000010" looks odd btw
<jekstrand>
Does it matter that this card has never been booted w/ windows? I would hope not. Sticky initialization would be a mess.
<karolherbst>
jekstrand: we do have an internal bug with a partner hitting your issue though
<karolherbst>
so there is some incentive to get that fixed
<karolherbst>
jekstrand: no, it doesn't matter, at least not if you cold reboot
nvishwa1 has joined #dri-devel
<jekstrand>
Well, if you've got kernel patches you'd like me to try, I can do that. The good news is that the card is plugged into the beefy machine that builds kernels fast. :)
<karolherbst>
yeah.. I'll try to ping ben and see if he has any ideas
ybogdano has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
<karolherbst>
another possibility is that the firmware is buggy and we need a new one from nvidia.. wouldn't even be the first time that happens
<jekstrand>
:-/
<karolherbst>
jekstrand: mind booting with nouveau.debug=trace and share the log?
<jekstrand>
Yeah... Just a second. Gotta hook up a keyboard.
<jenatali>
Hm... I seem to have found an app that's using the same resource on 2 contexts at the same time :(
rsalvaterra has joined #dri-devel
<karolherbst>
jenatali: sounds like fun
<jenatali>
That's one word for it
<karolherbst>
I mean.. what's the problem here?
<jenatali>
Our driver's not robust against that, but I wonder, is Mesa/Gallium in general?
<jenatali>
I'm hitting linked list corruption specifically
<karolherbst>
it's up to the frontend to make sure things don't break I think. pipe_resources being shareable objects, the drivers have to be thread safe there anyway
<karolherbst>
screen operations in general need to be thread safe as well
<jenatali>
Is that true? I thought the pipe context followed the GL threading rules, which is that you can't use the same resource on 2 contexts at once
<jenatali>
Sure, screen ops are thread-safe
<karolherbst>
jenatali: ehh.. your buffer overflowed :(
<karolherbst>
...
<karolherbst>
I meant jekstrand
<karolherbst>
jenatali: pipe_context yes, that's unsafe, but the pipe_resource has to be thread safe
<anholt>
jenatali: GL threading rules let you use the same texture on 2 contexts at once.
<jenatali>
anholt: I could've sworn I did a spec dive a couple months back that said you couldn't
FireBurn has quit [Quit: Konversation terminated!]
<karolherbst>
well.. gallium expects pipe_resources to be thread safe
<jenatali>
Maybe that was only for read-after-write hazards though... maybe multiple readers does need to work and that's where I'm busted
<karolherbst>
yeah..., I had some fun with that as well, because map/unmap are so annoying
<karolherbst>
but you have to expect that unmap and map can happen from different contexts even
<jekstrand>
karolherbst: ?
<anholt>
for RAW between threads, the app needs to be sure that the first rendering has finished (glFinish() was traditional), and then do a bind in the reading context.
<jenatali>
Yeah that's fine for us, as long as it's not racy
apinheiro has joined #dri-devel
<jekstrand>
anholt: Don't you mean glFlush()?
<jekstrand>
Or does it really need glFinish()?
<anholt>
jekstrand: glFlush() is so ambiguous. it would have worked on Mesa for a long time, but wouldn't today.
<karolherbst>
jekstrand: your kmsg buffer is not big enough
<karolherbst>
try "log_buf_len=4M" or something
<anholt>
and given in general how ambiguous flush was, people would sprinkle flushes and finishes around their apps
<anholt>
(see also: the number of games that glFlush before glXSwapBuffers()!)
<anholt>
ajax: thank you for always XInitThreads()ing
<karolherbst>
jekstrand: mhh yeah.. seems like everything is in order until we try to set up the fifo
<karolherbst>
if we would just know what that 0x10 interrupt means
<jekstrand>
:(
Duke`` has joined #dri-devel
maxzor has joined #dri-devel
<karolherbst>
worst case it simply means "you messed up" :)
<karolherbst>
which would be more information than we have today
<jekstrand>
I don't think *I* messed up. :P
<karolherbst>
yeah... I don't think so :)
<karolherbst>
it's just such a mess without any docs and... *sigh*
linkmauve has left #dri-devel [#dri-devel]
nchery has quit [Ping timeout: 480 seconds]
linkmauve has joined #dri-devel
<karolherbst>
I just hope something happens so debugging this wouldn't be so painful anymore
nchery has joined #dri-devel
<jekstrand>
:-/
lynxeye has quit [Quit: Leaving.]
bcheng has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
bcheng has quit [Remote host closed the connection]
bcheng has joined #dri-devel
bcheng has quit [Remote host closed the connection]
mszyprow has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
bcheng has joined #dri-devel
bcheng has quit [Remote host closed the connection]
bcheng has joined #dri-devel
<Kayden>
mareko: with util_queue, if I want to add a bunch of jobs to run in parallel, then wait for all of them to be done...do I have to add a fence on each job, and individually wait on those? or I could just util_queue_finish I suppose, though that might wait on additional jobs
<Kayden>
or is there a way to have a fence for "after all of these jobs are done"
<Kayden>
(guessing no, util_queue_fence seems to be tri-state rather than counting)
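[editor's note] The "counting" fence Kayden is asking about can be sketched with a C11 atomic counter. This is a hypothetical helper, not part of util_queue: `struct job_group` and the `job_group_*` functions are invented for illustration, and the blocking wait (for example a condition variable signalled by the last job) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not part of util_queue. */
struct job_group { atomic_uint pending; };

static void job_group_init(struct job_group *g)
{
    atomic_init(&g->pending, 0);
}

/* Call once per job, before the job is submitted. */
static void job_group_add(struct job_group *g)
{
    atomic_fetch_add(&g->pending, 1);
}

/* Call from the job's completion path; returns true for the last job. */
static bool job_group_done(struct job_group *g)
{
    unsigned prev = atomic_fetch_sub(&g->pending, 1);

    assert(prev > 0); /* more completions than submissions is a bug */
    return prev == 1;
}

/* Non-blocking check that every counted job has completed. */
static bool job_group_is_idle(struct job_group *g)
{
    return atomic_load(&g->pending) == 0;
}
```

One caveat of this design: all jobs should be counted before the first one can complete (or the submitter should hold an extra count during submission), otherwise the counter may briefly reach zero while more jobs are still being added.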
alyssa has joined #dri-devel
* alyssa
wonders if any drivers depend on this NIR bug..
<karolherbst>
alyssa: yes
<alyssa>
:D
<karolherbst>
btw, what bug?
<alyssa>
karolherbst: writes_memory is set for fragment shaders without side effects if they are linked to a vertex shader producing transform feedback varyings
<alyssa>
(causing various backend opts to be disabled)
<karolherbst>
this sounds like something which needs to be like that
<alyssa>
unfortunately, not the reason my code is broken
<karolherbst>
why unfortunately? Be happy, so you won't have to figure out why this needs to be set :p
<alyssa>
hnnngh
<alyssa>
still can't figure out where this 1 pix bug comes from
hch12907 has quit [Ping timeout: 480 seconds]
ybogdano has joined #dri-devel
<jekstrand>
alyssa: oh?
<jekstrand>
alyssa: Oh, my...
Emmy_ has quit [Remote host closed the connection]
<dcbaker>
mattst88: what's the status of waffle!107, I think that's the last thing on the list before the next waffle release
<mattst88>
dcbaker: I think it's ready. I was hoping to get chadv to take a quick look at it. I'll ping him about it
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<Kayden>
ouch, I just realized that there's texcompress_s3tc_tmp.h and util/format/u_format_s3tc.c
<Kayden>
we should probably consolidate on one S3TC implementation...
<airlied>
Kayden: don't they include each other?
<Kayden>
somewhat, but util_format_dxtn_pack_rgba_8unorm for example looks pretty duplicated
maxzor has quit [Ping timeout: 480 seconds]
<karolherbst>
ohh right.. I wanted to figure out scratch space actually
Haaninjo has joined #dri-devel
<karolherbst>
jekstrand: sooo.. I think I found a kernel doing weird stuff.. it writes 12 bytes into the scratch buffer and then uses values from global mem to access it... I didn't look yet on what we should do, but this could become ugly
mszyprow has joined #dri-devel
anarsoul|2 has joined #dri-devel
<airlied>
karolherbst: is that the problem with luxmark/llvmpipe?
<karolherbst>
yes
<karolherbst>
it's the size specifically
<karolherbst>
so I cloned what compiler/clc was doing by resetting the size of scratch space and get it recalculated, but...
<karolherbst>
seems to cause it to crash with llvmpipe
<karolherbst>
I'll check what's the actual offsets we get and try to figure out what's the correct thing here
rasterman has joined #dri-devel
* airlied
looks at the spirv-llvm-translator opaque ptrs issue, gonna be fun
<karolherbst>
airlied: I'd look into coroutines first
<karolherbst>
opaque pointers can be disabled at runtime
<karolherbst>
which we might have to do until it gets resolved
<karolherbst>
but that coroutine change we can't work around as it seems
anarsoul has quit [Ping timeout: 480 seconds]
anarsoul|2 has quit [Ping timeout: 480 seconds]
<airlied>
karolherbst: yeah coroutines are first when I get it building, was just looking around the minefield
<alyssa>
jekstrand: woof
anarsoul has joined #dri-devel
<jekstrand>
alyssa: I need more of your patches.
<alyssa>
jekstrand: which ones?
<alyssa>
the make vk go brr patches?
<jekstrand>
idk. Maybe the ones for null shaders?
mszyprow has quit [Ping timeout: 480 seconds]
<jekstrand>
And... I've got a kernel oops on trying to access a null user pointer
apinheiro has quit [Ping timeout: 480 seconds]
<alyssa>
mmh, I can take a look
<alyssa>
am 'supposed' to be doing valhall but you know, this is more fun ;-P
<ajax>
would it be legal to implement just the GetSwapchainStatus bit of VK_KHR_shared_presentable_image and not try to support any of the extra refresh modes
<karolherbst>
(gdb) p/x 140733193388035
<karolherbst>
$5 = 0x7fff00000003
<karolherbst>
the first one is okay
<karolherbst>
the second isn't
<ajax>
i feel like that'd be a legal gl move but i'm not sure how vulkan convention rolls
<karolherbst>
not sure if that has to do with inactive lanes, but..
<karolherbst>
temp_res is dumped between LLVMBuildLoad and LLVMBuildInsertElement
<karolherbst>
it segfaults on the lane having a 140733193388035 in between
<karolherbst>
it tries to load 0x7fff0153e21f, but it's based on that past bogus value
<karolherbst>
result contains the right pointer though
<karolherbst>
0x7fffc8010040
<karolherbst>
ehh well..
<karolherbst>
nvm result
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
<Kayden>
mareko: thanks, that clarifies things!
Jookia has joined #dri-devel
Jookia has left #dri-devel [#dri-devel]
cheako has joined #dri-devel
neonking has quit [Remote host closed the connection]
neonking has joined #dri-devel
apinheiro has quit [Remote host closed the connection]
<Viciouss>
I have some trouble with a vblank timeout after upgrading from android 11 to 12 with an exynos4412 device. It will turn the screen black, but the device seems to work normally aside from that, I can adb in, sounds continue playing. I can reproduce this consistently.
<dcbaker>
Jookia: you'll probably have more luck at #_oftc_#freedesktop:matrix.org , that's where the sysadmins generally hang out
<Jookia>
ah
<Jookia>
nevermind then
<Jookia>
if someone could pass on that the archive.mesa3d.org certificate is broken in gnutls that'd be great :)
<Sachiel>
I'm guessing that just means #freedesktop and the matrix client turned it into something else
<heat>
Jookia, if the cert is broken in gnutls i'd guess it's gnutls that's broken, not the cert
<Jookia>
heat: it's the server config sending duplicate certs
<Jookia>
yes it's a gnutls bug, but it's fixable serverside too
<jekstrand>
Is it just me or does the framebuffer_fetch spec say that if it's enabled in the shader, it's basically always on. Like, if I'm reading it correctly, you can just not write gl_FragData and it'll output the same color as before.
<jekstrand>
Or you can just write gl_FragData.x to only modify one component
<jekstrand>
in theory, anyway.
Jookia has left #dri-devel [#dri-devel]
<airlied>
karolherbst: do I need anything beyond your rusticl/wip branch to reproduce?
<anholt>
90% of mediump tests passing with my vtn relaxed precision support. that means I'm basically done, right?
<karolherbst>
nope
<anholt>
airlied: will lvp want 16-bit math for mediump?
<karolherbst>
airlied: ... I am still not quite sure what's going on, but I think _something_ calculates a wrong pointer, stores it into scratch mem and... uses it for a load, but ufff...
heat has quit [Remote host closed the connection]
<airlied>
anholt: probably worth enabling just for testing, it should work
<airlied>
the only 16-bit problem I remember having is around uniform readback for GL
<airlied>
karolherbst: oh that would be annoying to track down
<anholt>
not a case of "yeah, avx should love it, turn it on"?
<karolherbst>
airlied: yes...
lumag_ has quit [Ping timeout: 480 seconds]
<karolherbst>
but I don't think that's it
<airlied>
anholt: don't think it magically made anything faster in the past
<karolherbst>
there is some weirdness going on with types
<karolherbst>
like it feels like something uses 32 bit although it should be 64
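[editor's note] The bogus lane value from the gdb output earlier, 140733193388035 (0x7fff00000003), is consistent with that suspicion: its upper half looks like the top of a user-space pointer and its lower half is the small integer 3. The sketch below only demonstrates that arithmetic. It is one possible reading of the value, not a diagnosis, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into its upper and lower 32-bit halves. */
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }

/* Model a 32-bit store landing on the low half of a 64-bit slot:
 * the upper half keeps whatever was there before. */
static uint64_t clobber_low32(uint64_t slot, uint32_t value)
{
    return (slot & 0xffffffff00000000ull) | value;
}
```

For example, `clobber_low32(0x7fffc8010040, 3)` yields 0x7fff00000003; the pointer-shaped input is borrowed from the log purely as an illustration.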
<karolherbst>
airlied: is there a good place to dump the "final" nir shader of llvmpipe?
<karolherbst>
ehh right before lp_build_nir_llvm I guess
<airlied>
yeah there
mszyprow has quit [Ping timeout: 480 seconds]
<karolherbst>
ahhh.. now it doesn't crash but simply renders garbage :D
<airlied>
also LP_NUM_THREADS=1 might help to see things better