ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard + Bifrost + Valhall - Logs https://oftc.irclog.whitequark.org/panfrost
manu has quit [Ping timeout: 480 seconds]
Consolatis_ has joined #panfrost
Consolatis_ is now known as Consolatis
DPA2 has quit [Ping timeout: 480 seconds]
<Mary> linkmauve: what version of mesa? Going to update to mainline to see if there is any kind of regression, but latest main works here in a meson devenv
<linkmauve> Mary, f1f93ff1f68, which was master two weeks ago.
<linkmauve> I haven’t tried a stable release yet on this board, things usually work better on master in Mesa.
<Mary> linkmauve: tried on latest stable-rc + mesa main and I get everything working under a meson devenv
<Mary> looking at linux master, it seems to only have my prio restriction patch so there shouldn't be any big difference with stable-rc
<Mary> linkmauve: can you give me some logs? or maybe give meson devenv a try and see if it works under there?
<linkmauve> I usually build on a different computer than the rk3588, so installing a package is easier for me. Which logs are you interested in?
<linkmauve> I’m building main as we talk, so I can test on a build from today.
<Mary> linkmauve: probably clinfo to see if anything is failing on rusticl side
<Mary> I never really tested rusticl outside of devenv so maybe it's that? (maybe karolherbst has some ideas?)
<linkmauve> With any debug env var? Because otherwise jagan_’s log is exactly the same as mine.
<Mary> hmm... I build in debugoptimized, let me try a proper release build just in case
<Mary> linkmauve: tried with release no issues, I even installed the library and loaded it just fine too with RUSTICL_ENABLE=panfrost OCL_ICD_VENDORS=/opt/local/panfrost/etc/OpenCL/vendors clinfo
<Mary> the only diff is that I have libclc installed and don't have that warning you both have so maybe it's related?
<Mary> (Testing on Fedora on a Rock 5B with the latest stable-rc kernel)
DPA has joined #panfrost
rasterman has joined #panfrost
rasterman has quit [Quit: Gettin' stinky!]
pbrobinson has joined #panfrost
<karolherbst> Mary: should just work in a devenv
<karolherbst> PipeScreen::new might be a good place to look with gdb and see what happens
<karolherbst> _maybe_ `load_screens` inside mesa/pipe/device.rs
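A sketch of the gdb session karolherbst is suggesting (assuming a debug build with symbols; the breakpoint names come from the discussion above, and the Rust ones may be mangled differently in a real build, which is why `rbreak` with a regex is handy):

```
$ RUSTICL_ENABLE=panfrost gdb --args clinfo
(gdb) rbreak PipeScreen::new       # rbreak sets breakpoints by regex, useful for mangled Rust symbols
(gdb) rbreak load_screens
(gdb) run
(gdb) bt                           # on a hit, see how far screen creation gets
```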
warpme has quit []
rasterman has joined #panfrost
soxrok2212 has quit []
soxrok2212 has joined #panfrost
kinkinkijkin has joined #panfrost
erle has quit [Quit: K-lined]
<linkmauve> Mary, I don’t have the libclc issue here, so here is my entire log: https://linkmauve.fr/files/panfrost-rusticl.txt
<linkmauve> karolherbst, it would be nice to have RUSTICL_ENABLE=list or RUSTICL_ENABLE=help to make sure we don’t typo it.
<linkmauve> And maybe error out if we try to enable a driver which doesn’t exist.
<linkmauve> Hmm, despite having built Mesa in debug mode, /usr/lib/libRusticlOpenCL.so.1 only has very few symbols.
<linkmauve> Not true, when I strip it it gets lighter from 50 MiB to 9.9 MiB, but there are no symbols inside, especially not PipeScreen::new() nor PipeScreen::load_screens().
<karolherbst> linkmauve: rusticl only exports three symbols, but maybe something went wrong with the rust flags?
<karolherbst> rusticl here is 166MiB
<linkmauve> I only built the panfrost and rocket Gallium drivers, and the panfrost (panvk) Vulkan driver.
<karolherbst> yeah.. maybe it's that
<linkmauve> So it being smaller might be related.
<karolherbst> I have a couple more drivers
<karolherbst> linkmauve: can you break on panfrost_create_screen instead and see if it gets called?
<linkmauve> I could add llvmpipe to see if the issue comes from rusticl or from panfrost.
<karolherbst> yeah.. having llvmpipe built is always a good idea
<linkmauve> Although, the exact same Mesa build gives a clinfo result on a Mali-G52.
<karolherbst> could be some odd difference in supported stuff
<linkmauve> karolherbst, no it isn’t.
<linkmauve> (panfrost_create_screen)
<karolherbst> mhhhhhh
<karolherbst> _odd_
<karolherbst> mhh wait.. the screen gets created after filtering...
<karolherbst> break on pipe_loader_probe then
<karolherbst> it might get called multiple times
<linkmauve> This one does work.
<karolherbst> but it should return something if panfrost works alright
<karolherbst> and maybe dump the driver
<linkmauve> It gets called exactly twice.
<linkmauve> Breakpoint 1, pipe_loader_probe (devs=0x0, ndev=0, with_zink=true) at ../mesa/src/gallium/auxiliary/pipe-loader/pipe_loader.c:64
<linkmauve> Breakpoint 1, pipe_loader_probe (devs=0xaaaaaab92990, ndev=1, with_zink=true) at ../mesa/src/gallium/auxiliary/pipe-loader/pipe_loader.c:64
<karolherbst> you want to dump the devs
<karolherbst> specially driver_name
<linkmauve> (gdb) p devs[0]
<linkmauve> $2 = (struct pipe_loader_device *) 0x0
<karolherbst> (I think I should add some debug options for the loader part...
<karolherbst> )
<linkmauve> Yes please. :)
<karolherbst> mhhh wait...
<karolherbst> you need to dump it at the end of the func
<karolherbst> like at the `return n;` line or so
<linkmauve> $3 = {type = PIPE_LOADER_DEVICE_SOFTWARE, u = {pci = {vendor_id = 0, chip_id = 0}}, driver_name = 0xfffff769b7f8 "swrast", ops = 0xfffff7bb7fc0 <pipe_loader_sw_ops>, option_cache = {
<linkmauve> info = 0x0, values = 0x0, tableSize = 0}, option_info = {info = 0x0, values = 0x0, tableSize = 0}}
<linkmauve> So it tries to load swrast despite it not even being built in Mesa?
<karolherbst> yeah.. that's kinda normal
<karolherbst> anyway
<karolherbst> panfrost isn't there
<linkmauve> I’ll try with llvmpipe enabled now.
<karolherbst> so it's not linked in or the loader refuses to load it....
<karolherbst> anyway, not an issue in rusticl (tm) (I think)
<karolherbst> soo.. the first call to pipe_loader_probe is for the hardware accelerated drivers, the second is for sw
<karolherbst> ehh wait
<karolherbst> the first counts the backends
<karolherbst> and the second iterates
<karolherbst> anyway, the problem is, that something goes wrong there
<linkmauve> On the Mali-G52 board (a rk3568), I see it:
<linkmauve> (gdb) p *devs[0]
<linkmauve> $4 = {type = PIPE_LOADER_DEVICE_PLATFORM, u = {pci = {vendor_id = 0, chip_id = 0}}, driver_name = 0xaaaaaab977b0 "panfrost", ops = 0xfffff7bb83d8 <pipe_loader_drm_ops>, option_cache = {
<linkmauve> info = 0x0, values = 0x0, tableSize = 0}, option_info = {info = 0x0, values = 0x0, tableSize = 0}}
<linkmauve> (gdb) p *devs[1]
<linkmauve> $5 = {type = PIPE_LOADER_DEVICE_SOFTWARE, u = {pci = {vendor_id = 0, chip_id = 0}}, driver_name = 0xfffff7694de8 "swrast", ops = 0xfffff7bb83c0 <pipe_loader_sw_ops>, option_cache = {
<linkmauve> info = 0x0, values = 0x0, tableSize = 0}, option_info = {info = 0x0, values = 0x0, tableSize = 0}}
<karolherbst> 🙃
<linkmauve> Exact same Mesa build.
<karolherbst> right...
<karolherbst> you want to step through `n += backends[i](&devs[n], MAX2(0, ndev - n));`
<karolherbst> and see what happens
<karolherbst> I'm sure it's something silly
<karolherbst> pipe_loader_drm_probe_internal is what's called for hw drivers
<karolherbst> maybe it can't open the renderer node, maybe pipe_loader_drm_probe_fd_nodup fails for weird reasons
<karolherbst> who knows
<linkmauve> It doesn’t even go to pipe_loader_drm_probe_fd_nodup().
<karolherbst> so it fails to open the renderer node
<linkmauve> It does on the G52 board.
<karolherbst> I suspect some permission stuff going wrong
<linkmauve> crw-rw-rw- 1 root render 226, 128 Sep 11 01:17 renderD128
<linkmauve> That should be usable by anyone.
<karolherbst> sure, but random_things can happen and mesa fails to get an fd
<linkmauve> Oh uh, eglinfo also gives me swrast!
<karolherbst> yeah.. something something, maybe the kernel driver fails to do something
<karolherbst> you should check what error you get when mesa tries to open the renderer node
<karolherbst> should be in loader_open_device
<karolherbst> there is a log_ thing, but you might want to print errno after open just in case
<linkmauve> [ 10.193714] [drm] Initialized panthor 1.0.0 for fb000000.gpu on minor 0
<linkmauve> (gdb) p (int)errno
<linkmauve> $3 = 22
<linkmauve> So EINVAL.
<karolherbst> yeah mhhh no idea about that one :) sounds like either a panfrost or... something else bug
<linkmauve> panthor*
<karolherbst> or panthor
<karolherbst> Mary: ^^ open returns EINVAL when opening the file descriptor
<linkmauve> How do you enable more verbose DRM logs in the kernel?
<karolherbst> there is drm.debug you can flip at runtime, but that can be a bit verbose if you don't know the right flags to enable
<linkmauve> Hmm, just says [28734.372029] [drm:drm_stub_open] four times.
<linkmauve> I wrote 0xff in /sys/module/drm/parameters/debug.
<karolherbst> yeah.. probably need to check if panthor has any debug flags
<linkmauve> Not according to modinfo.
<linkmauve> modinfo drm describes the drm.debug bits btw.
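For reference, drm.debug is a bitmask (category values per drm_print.h / modinfo drm); 0xff enables everything and is very chatty, so a narrower mask is usually nicer:

```
# bits: 0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME, 0x10 ATOMIC, 0x20 VBL
echo 0x3f | sudo tee /sys/module/drm/parameters/debug
# reproduce the failure while watching dmesg -w, then quiet it again:
echo 0 | sudo tee /sys/module/drm/parameters/debug
```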
<karolherbst> yeah... but I don't have any ideas on how to debug this issue :) Maybe something needs to happen before one can open the node, but since GL also fails, I suspect it's some general issue
<linkmauve> Yeah, panvk also fails, all three of them.
<linkmauve> I’ll try stable 6.11 instead of mainline.
<linkmauve> If this works I’ll bisect the kernel.
<linkmauve> 6.11.0 does open() fine!
<linkmauve> And clinfo and eglinfo do work, but PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1 vulkaninfo still doesn’t.
<linkmauve> I’ll bisect.
urja has quit [Read error: Connection reset by peer]
urja has joined #panfrost
<linkmauve> 13 steps remaining.
<karolherbst> impressive... wouldn't have considered a kernel regression
<linkmauve> Yeah me neither.
kinkinkijkin has quit [Quit: Leaving]
<linkmauve> … I should have moved from SD to NVMe before doing this bisect, I’m bottlenecked by the IO speed of extracting the modules. ^^'
<Mary> karolherbst: that's weird...
<Mary> you don't set any priority on the context creation right?
<linkmauve> Bisecting: 28 revisions left to test after this (roughly 5 steps)
<linkmauve> I should have used ccache.
<karolherbst> Mary: nope, but it already fails at loader time, so just opening the renderer node
<Mary> Okay so it's not my changes around group priority then...
<Mary> linkmauve, karolherbst: 6.11.1-rc1 is fine and that contains all the changes for panthor that are in current master
<Mary> there is a merge commit on master tho
<Mary> so maybe the issue is coming from that?
cphealy has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
<linkmauve> Bisecting: 6 revisions left to test after this (roughly 3 steps)
<linkmauve> Mary, karolherbst, jagan_, 641bb4394f405cba498b100b44541ffc0aed5be1 is the first bad commit.
<karolherbst> mhhhh
<linkmauve> https://linkmauve.fr/files/panthor-bisect.txt in case anyone wants to reproduce.
<karolherbst> you could check if it works with the previous commit just to be sure
<karolherbst> but yeah.. it kinda makes sense, maybe you need to add that flag to panthor as well?
<karolherbst> `.fop_flags= FOP_UNSIGNED_OFFSET,` I mean
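The fix being suggested, as a sketch; the real panthor_drm_driver_fops in drivers/gpu/drm/panthor/panthor_drv.c has more fields and the surrounding ones here are illustrative, the one addition is .fop_flags:

```c
static const struct file_operations panthor_drm_driver_fops = {
	.open = drm_open,
	.release = drm_release,
	.unlocked_ioctl = drm_ioctl,
	/* After commit 641bb4394f40, drivers opt in to unsigned offsets here
	 * instead of setting FMODE_UNSIGNED_OFFSET at open() time; without
	 * this flag, opening the render node fails with EINVAL. */
	.fop_flags = FOP_UNSIGNED_OFFSET,
};
```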
<linkmauve> I did, 8447d848e1dc was a good one.
<linkmauve> I’m testing with a revert of 641bb4394f405cba498b100b44541ffc0aed5be1 on master, hopefully nothing depends on it yet.
<linkmauve> Yup, with that commit reverted clinfo and eglinfo work fine again. :)
<karolherbst> annoying :)
<linkmauve> The commit message mentions that panthor_drm_driver_fops must now set FMODE_UNSIGNED_OFFSET, but it doesn’t touch that struct at all.
<linkmauve> I will now try doing exactly that.
<linkmauve> I expect this to be the fix.
<Mary> So I guess it got lost along the way...
<linkmauve> Bingo!
<linkmauve> I’ll send the patch shortly.
<linkmauve> Alright, sent.
<linkmauve> Oops, I forgot to say I tested on a Rock-5B, but I guess it’s not that important.
<linkmauve> And on that, good night everyone. :)
cphealy has joined #panfrost
CaptainIRS has quit [Read error: Connection reset by peer]
CaptainIRS has joined #panfrost