#dri-devel on 2021-06-24 — irc logs at oftc.irclog.whitequark.org

2021-06-22 12:29 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:09 adjtm has quit [Read error: Connection reset by peer]

00:19 ngcortes has quit [Remote host closed the connection]

00:23 adjtm has joined #dri-devel

00:24 sarnex has joined #dri-devel

00:28 sarnex has quit []

00:29 sarnex has joined #dri-devel

00:32 undvasistas[m] has joined #dri-devel

00:42 fcarrijo has quit []

01:03 GloriousEggroll has quit [Quit: Death is life's way of telling you you've been fired.]

01:06 ddavenport has joined #dri-devel

01:09 reinist12 has joined #dri-devel

01:16 ppascher has quit [Ping timeout: 480 seconds]

01:28 Lucretia has quit []

01:38 agners has quit [Quit: WeeChat 3.2]

01:42 nsneck has quit [Remote host closed the connection]

01:43 MrCooper has quit [Remote host closed the connection]

01:43 nsneck has joined #dri-devel

01:43 adjtm has quit [Read error: Connection reset by peer]

01:43 adjtm has joined #dri-devel

01:43 MrCooper has joined #dri-devel

01:46 sarnex has quit [Ping timeout: 480 seconds]

01:47 Lightkey has quit [Ping timeout: 480 seconds]

01:48 camus1 has joined #dri-devel

01:50 sarnex has joined #dri-devel

01:50 <zmike> update: despite turnip coming out with a strong lead, and then ANV surging ahead into "but also this one little thing here" shedsmanship, lavapipe has officially taken first place in the multidraw implementation contest

01:50 <zmike> the organizers still have their eyes peeled to see what will happen with the second and third place prizes

01:56 <robclark> zmike: is this some sort of reviews-per-second benchmark :-P

01:56 Lightkey has joined #dri-devel

01:57 <zmike> if it was, airlied would be winning :P

01:57 <robclark> heheh

01:58 hch12907_ has joined #dri-devel

01:59 reinist12 has quit []

02:00 hch12907 has quit [Ping timeout: 480 seconds]

02:11 boistordu has quit [Remote host closed the connection]

02:11 boistordu has joined #dri-devel

02:14 luzipher_ has joined #dri-devel

02:20 luzipher__ has quit [Ping timeout: 480 seconds]

02:21 blue__penquin has joined #dri-devel

02:25 camus1 has quit [Remote host closed the connection]

02:25 camus has joined #dri-devel

02:27 ddavenport has quit [Remote host closed the connection]

02:41 fcarrijo has joined #dri-devel

02:44 khfeng has joined #dri-devel

03:01 gpoo has quit [Ping timeout: 480 seconds]

03:09 ppascher has joined #dri-devel

03:10 gpoo has joined #dri-devel

03:39 gpoo has quit [Ping timeout: 480 seconds]

03:43 <zmike> I think we gotta cut down on these iris jobs in ci until there's more hw or something

03:43 <zmike> they seem to get stuck way too often

03:48 cedric has joined #dri-devel

03:49 bluebugs has quit [Ping timeout: 480 seconds]

03:49 fcarrijo has quit []

03:51 yoslin has joined #dri-devel

03:54 yoslin has quit []

03:57 yoslin has joined #dri-devel

04:02 mattrope has quit [Remote host closed the connection]

04:08 luzipher__ has joined #dri-devel

04:14 luzipher_ has quit [Ping timeout: 480 seconds]

04:18 ddavenport has joined #dri-devel

04:27 andrey-konovalov has joined #dri-devel

04:50 tzimmermann has joined #dri-devel

04:53 aravind has joined #dri-devel

04:57 Duke`` has joined #dri-devel

04:59 mattrope has joined #dri-devel

05:00 luzipher_ has joined #dri-devel

05:06 luzipher__ has quit [Ping timeout: 480 seconds]

05:19 thellstrom1 has joined #dri-devel

05:19 thellstrom has quit [Remote host closed the connection]

05:30 <marcan> is this the right place to talk about KMS shenanigans?

05:30 <airlied> yeah probably

05:30 <marcan> I've been looking at how Apple did their display controller on M1 and... this is going to be fun

05:31 <marcan> they run a big blob on a side-CPU, and speak to it over a mailbox protocol. great, I don't have to implement DP training and isochronous memory bandwidth calculations and all that stuff

05:31 <marcan> ... but in exchange, the mailbox protocol is, shall we say, "interesting"

05:32 <marcan> as far as I can tell, for some parts what they did is literally move IOKit drivers to the firmware, and stuck shims in front

05:32 sdutt has quit [Remote host closed the connection]

05:32 <airlied> seems like a winning design

05:32 <airlied> is the interface sane or some sort of C++ RPC?

05:32 <marcan> for others there's another kind of protocol with synchronous messages/calls in both directions, as well as async ones, and such

05:32 <marcan> I'm not entirely sure quite yet how the regular marshaling works yet

05:33 <airlied> like nvidia does something similiar

05:33 <airlied> the kernel drivers talks to the modesetting firmware

05:33 <marcan> I do know that where data structures are involved, they have their own version of json-but-binary

05:33 <marcan> e.g. they don't get the EDID, they get a giant serialized blob describing everything verbosely, already parsed, as well as available modes and constraints etc

05:34 <marcan> this is only for certain things, not all messages (e.g. calls for displaying frames and such use pure binary structures, not this)

05:35 <airlied> is there any sort of per connector split?

05:35 <airlied> I suppose the other problems is the interface in anyway stable

05:35 <airlied> like will an update just trash the API

05:36 <marcan> right now on M1 machines I think there are two completely separate controllers with processors, and then there is a crossbar; not entirely sure who manages the crossbar, if these things or the host directly

05:36 <marcan> and yes, I was going to mention that

05:37 <marcan> I don't know how stable this ABI is; the firmware is shipped with macOS and becomes part of the bootloader stuff that runs before we get control

05:37 <marcan> however, it *is* per OS install

05:37 <marcan> so we are insulated from macOS updates, but it means that if we want to pin a version, our install script has to grab it from Apple's CDN and pull out the blobs from there

05:37 <airlied> my other worry would be they do fw updates from the OS drivers post-boot

05:37 <marcan> they do not

05:37 <marcan> it is not stored separately

05:38 <marcan> basically the bootloader-that-runs-before-us loads this firmware from the "OS" partition (which for linux will just contain our bootloader because we obviously aren't putting Linux on APFS)

05:38 <marcan> so you can dual-boot and macOS will never screw Linux up on this

05:38 mattrope has quit [Remote host closed the connection]

05:38 <marcan> that much is great

05:38 <airlied> okay then it mostly sounds like how nvidia hw works, probably with a different split in functionality

05:40 <airlied> from an atomic modesetting pov it would be good to know if you can submit all the state for all the connectors at once

05:40 <marcan> if they are different DCPs I assume not, but Apple know their display pipelines so they must have *something* to make this work

05:41 <airlied> sounds like a lot of tracing required and monitor plugging/unplugging :-)

05:41 <marcan> the other thing is while Apple can just tie firmware versions and the OS version together, we're eventually going to have to support more than one version (to get bugfixes and features), and if the ABI is unstasble that means we do need to support at least a certain subset of versions in the same kernel

05:41 <marcan> yes :-)

05:42 <marcan> I'm currently running macOS in a VM and just worked out the mailbox/ringbuffer protocols, so I have full dumps of every command/event/etc sent and received

05:42 <airlied> yeah hopefully the fw advertises a version

05:42 <marcan> oh we will know the version from iBoot anyway, that won't be an issue

05:42 <marcan> I'm sure there are a dozen ways to get that

05:43 <marcan> I'm writing the shim-bootloader that bridges the apple world to the devicetree world, so any info we need I can put in the devicetree

05:43 <marcan> if you're curious, here's a log from boot to login screen: https://marcan.st/transf/dcplog.txt

05:45 <airlied> you'll also have to figure out how audio works I suppose

05:46 <marcan> yeah, there's an endpoint for audio, but obviously the audio hardware is completely separate and there has to be some link at some point

05:47 <marcan> at this point I'm wondering how I'm going to deal with all this marshalling in the kernel... especially if we end up having to support multiple ABIs. Also that json-like thing.

05:51 <marcan> on the plus side, hopefully this will make supporting newer chips relatively trivial, if most of the differences are abstracted out

05:52 <marcan> I guess I should probably read up on IOKit since it will probably explain a lot of the concepts I'm seeing here, heh

05:52 <marcan> and here I was hoping to avoid that :-)

05:52 <airlied> marcan: it's probably not much worse than ACPI

05:53 <marcan> okay yeah

05:53 <marcan> but I was also hoping not to have to write another ACPI-like framework for this :-)

05:54 <airlied> marcan: any idea if any other hw looks the same?

05:54 <marcan> there are many other CPUs with mailboxes like this, but I don't know if the mailbox protocols are all along these lines. DCP is one of the worse ones I think.

05:54 <marcan> the GPU also uses one though

05:55 <marcan> in fact I was looking at DCP in part to have the basics worked out enough to then move to GPU, especially since the GPU has an unknown special MMU while the DCP uses the "standard" Apple IOMMU that we already know about (DART)

05:55 <airlied> marcan: metal transported over it?

05:56 <marcan> nah, alyssa already worked out the shaders and basica drawing and stuff, she's got like 70% of the GLES tests passing on macOS?

05:56 <marcan> that's issuing IOKit calls on macOS directly to their driver

05:56 <airlied> does the driver then forward those to the hw? or do stuff?

05:56 <marcan> so it's mostly about working out memory management, command submission, and preemption (which they do support, that's one of the big things the firmware does)

05:57 <marcan> I think it mostly forwards, though due to the preemption the firmware has to be aware of e.g. render target details, which is why it's higher level than you might expect

05:57 <marcan> we'll find out once I put this tracer on the GPU device

05:57 <marcan> I literally worked out the mailbox message passing structures yesterday :)

06:13 <airlied> marcan: nice! probably worth building a userspace prototype until you can nail down how much marshalling etc is going to be needed

06:17 ddavenport has quit [Remote host closed the connection]

06:18 Duke`` has quit [Ping timeout: 480 seconds]

06:19 mlankhorst has joined #dri-devel

06:24 ddavenport has joined #dri-devel

06:31 RobertC has joined #dri-devel

06:39 <marcan> aaa

06:39 <marcan> whoops, I guess that TCP connection wasn't as dead as I thought

06:40 <marcan> airlied: my current playground is actually using the device remotely via USB, running in my bootloader, controlled via python scripts on the other side

06:40 <marcan> that's actually how that hypervisor works - all that tracer stuff is python via a remote proxy, only the core is running locally

06:40 <marcan> so I'm going to do the same for this, build a prototype driver for it in python

06:40 <marcan> same for the GPU, I want to render a triangle before I start on the kernel driver

06:42 <marcan> hoping that eventually that tooling can help spit out metadata, thunks, or whatever we decide to go with for the kernel marshalling

06:42 <marcan> one cute thing this also lets me do, in theory, is just halt macos and hijack DCP and issue whatever commands I want, then resume it

06:43 <marcan> which should be interesting for experiments

06:43 <marcan> (I can also modify or suppress commands macOS sends)

06:43 <marcan> though how much of this I actually want to implement "properly" depends on how useful it is :)

06:43 RobertC has quit [Ping timeout: 480 seconds]

06:44 <marcan> this is what the top level looks like for the tracer that spat out that .txt file: https://github.com/AsahiLinux/m1n1/blob/main/proxyclient/hv/trace_dcp.py

06:44 <marcan> though I need to refactor the message handling, in particular the inbuf/outbuf thing, now that I have a better idea of how it works. it's more symmetric than I thought.

06:49 gouchi has joined #dri-devel

06:49 gouchi has quit []

06:51 luzipher__ has joined #dri-devel

06:57 luzipher_ has quit [Ping timeout: 480 seconds]

07:17 pnowack has joined #dri-devel

07:17 pnowack has quit [Remote host closed the connection]

07:17 pnowack has joined #dri-devel

07:20 <bbrezillon> danvet: yes, I was planning to introduce a 'disable implicit deps' flag in v2, and daniels already pointed me to jekstrand's work ;-)

07:20 rasterman has joined #dri-devel

07:21 ppascher has quit [Ping timeout: 480 seconds]

07:21 rasterman has quit []

07:22 rasterman has joined #dri-devel

07:23 luzipher_ has joined #dri-devel

07:25 kem has quit [Ping timeout: 480 seconds]

07:26 ppascher has joined #dri-devel

07:28 kem has joined #dri-devel

07:29 luzipher__ has quit [Ping timeout: 480 seconds]

07:32 ddavenport has quit [Remote host closed the connection]

07:35 andrey-konovalov has quit [Ping timeout: 480 seconds]

07:43 danvet has joined #dri-devel

07:46 <pepp> MrCooper: all pixels of an app are copied twice (DRI3 buffers -> window pixmap -> compositor framebuffer), right?

07:46 lynxeye has joined #dri-devel

07:49 <MrCooper> right

07:58 <pepp> thanks. One more question: AFAICT using gdb the window pixmap is still updated for unredirected fullscreen windows. So unredirected means = the window pixmap is sent directly to kms / the display hardware?

08:04 <MrCooper> right, for unredirected windows, the window pixmap is the screen (scanout) pixmap

08:04 <MrCooper> the compositor is not involved with presentation for those

08:05 <MrCooper> page flipping turns the client pixmap into the screen pixmap

08:06 <pepp> MrCooper: I see. Thanks!

08:08 <MrCooper> np (actually Present page flipping doesn't replace the screen pixmap itself, but close enough logically)

08:08 randomher0 has joined #dri-devel

08:29 camus1 has joined #dri-devel

08:29 camus has quit [Read error: Connection reset by peer]

08:30 Lucretia has joined #dri-devel

08:33 <danvet> https://lore.kernel.org/dri-devel/20210622165511.3169559-1-daniel.vetter@ffwll.ch/T/#m7ab7743fa4d0d3b99eef06ceb27c9947201b76c1

08:33 <danvet> ^^ anyone feel like reviewing?

08:34 <danvet> it's the last one in that series

08:37 luzipher__ has joined #dri-devel

08:43 luzipher_ has quit [Ping timeout: 480 seconds]

08:44 thellstrom has joined #dri-devel

08:49 thellstrom1 has quit [Ping timeout: 480 seconds]

08:58 dllud_ has quit [Ping timeout: 480 seconds]

09:13 elongbug has joined #dri-devel

09:17 dllud has joined #dri-devel

09:22 ppascher has quit [Ping timeout: 480 seconds]

09:34 bcarvalho_ has joined #dri-devel

09:37 bcarvalho has quit [Read error: Connection reset by peer]

09:41 MrCooper has quit [Quit: Leaving]

09:42 MrCooper has joined #dri-devel

10:02 <HdkR> Anyone know if meson supports static-pie? :)

10:02 <ccr> mmm .. pie.

10:03 <HdkR> cmake is a bit..wonky with static-pie

10:19 hch12907_ is now known as hch12907

10:21 bcarvalho_ has quit []

10:26 <daniels> zmike: 'get stuck' as in just wander off and die whilst running Piglit/dEQP? if so, more hardware won't help that, but we will get the heartbeat sorted

10:33 <danvet> mripard, https://lore.kernel.org/dri-devel/20210622165511.3169559-1-daniel.vetter@ffwll.ch/T/#m7ab7743fa4d0d3b99eef06ceb27c9947201b76c1 maybe you?

10:47 pcercuei has joined #dri-devel

10:47 aiddamse has joined #dri-devel

10:48 aiddamse has quit []

10:48 aiddamse has joined #dri-devel

10:49 aiddamse has quit []

10:51 karolherbst has quit [Ping timeout: 480 seconds]

10:52 aiddamse has joined #dri-devel

10:52 aravind has quit [Remote host closed the connection]

10:53 aravind has joined #dri-devel

10:53 aiddamse has quit []

11:07 <danvet> tzimmermann, I think IS_ENABLED is more standard kernel style instead of #ifdef in the code

11:07 <danvet> but it's also a bit a bikeshed :-)

11:08 <tzimmermann> danvet, no problem i go this jani's suggestion

11:22 <danvet> daniels, I wonder whether we should document a bunch of the interop issues

11:22 <danvet> like "when am I actually on Android"

11:22 <danvet> and stuff like that

11:24 <daniels> yeah, it's kinda tricky since there are three audiences really; driver developers who need to know how to write a driver/subsystem that doesn't suck, userspace developers who need to know what to expect and what they can do, confused people from other parts of the kernel wondering wtf any of this is :)

11:25 <daniels> what you wrote ... isn't incorrect ... but it also doesn't really explain this stuff to people who don't already know it

11:25 karolherbst has joined #dri-devel

11:29 <daniels> danvet: so yeah, I guess maybe one on dma_resv for kernel people which just documents what the fence slots are for (WAR/WAW/RAW hazard avoidance) and how to access/update them in the kernel, then a link to a separate uapi section which explains why & when you use each?

11:29 <danvet> daniels, yeah something like that

11:30 <danvet> daniels, I think trying to smash the uapi spec (well "what should the full stack achieve") and the kernel driver docs into one won't go well here

11:31 <daniels> goldilocks ftw

11:31 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

11:39 luzipher_ has joined #dri-devel

11:39 yk has joined #dri-devel

11:40 adjtm is now known as Guest422

11:40 adjtm has joined #dri-devel

11:42 gpoo has joined #dri-devel

11:45 luzipher__ has quit [Ping timeout: 480 seconds]

11:47 Guest422 has quit [Ping timeout: 480 seconds]

12:05 <danvet> mlankhorst, https://lore.kernel.org/dri-devel/20210622165511.3169559-1-daniel.vetter@ffwll.ch/T/#m7ab7743fa4d0d3b99eef06ceb27c9947201b76c1 care to take a quick look an r-b?

12:05 <danvet> then I could smash that series in

12:07 pcercuei has quit [Quit: Lost terminal]

12:09 pcercuei has joined #dri-devel

12:14 <pcercuei> Is it normal/expected that systemd-logind is listed as a client of both my integrated intel and discreet nvidia GPUs? That means nouveau never goes to auto-suspend state...

12:17 <pcercuei> I guess I'd better ask on the systemd channel

12:20 <danvet> daniels, https://paste.debian.net/1202242/ sufficient amounts of s/should/must/ for the part we have already?

12:20 <danvet> pcercuei, nouveau auto-suspend is a bit shotgun approach and not very smart

12:20 <danvet> and yeah logind is supposed to keep that fd around

12:20 <danvet> it needs it for managing the vt switch dance

12:21 <danvet> pcercuei, proper auto-suspend would only keep the hw alive if an output is alive or if rendering is going on

12:21 <danvet> but that means you need to have code to swap out all the buffers before you auto-suspend

12:21 <danvet> and the locking gets a _lot_ more funny

12:22 <pcercuei> danvet: understood

12:22 <danvet> with integrated gpu it's a lot easier since the memory can't disappear, so you only have to care about saving/restoring device state

12:23 <danvet> pcercuei, I think for discrete the only realistic option would be a idle timer

12:23 <danvet> which first needs to swap out all bo to system memory

12:23 <danvet> and then drop the last runtime pm reference

12:23 <danvet> because taking dma_resv_lock from within runtime pm callbacks will deadlock

12:24 <danvet> maybe with some ttm helpers it wouldn't be too onerous to get it all going, but the first driver will suffer real bad

12:25 andrey-konovalov has joined #dri-devel

12:26 <pcercuei> danvet: from what I can see, I have no BOs

12:27 <danvet> pcercuei, yeah, but if you push the runtime pm down from "any open file handles" you might

12:27 <danvet> so minimally you need to keep track of that

12:27 <pcercuei> So the PM could be set to "active" when creating a BO, and "idle" when destroying all BOs, no?

12:28 <danvet> yeah, but there's also everything else

12:28 <danvet> I have no idea what else nouveau sets up on file open

12:28 <danvet> maybe chat with karolherbst

12:29 <karolherbst> uhghrhhg... my head still spins from all the fencing and bo code we have in nouveau :D

12:29 dllud has quit [Read error: Connection reset by peer]

12:29 <karolherbst> what's the question/problem?

12:30 <danvet> daniels, that diff good enough for an ack?

12:30 <pcercuei> karolherbst: I noticed that systemd-logind keeps a handle to my discreet's nvidia (nouveau) GPU, which causes nouveau to never auto-suspend the hardware

12:30 <karolherbst> pcercuei: that's not the cause of it never auto suspending

12:30 <karolherbst> that's normal as logind assigns devices to seats or something.. dunno the details

12:30 <karolherbst> but that's normal

12:31 <karolherbst> and it doesn't prevent auto pm from happening

12:31 ppascher has joined #dri-devel

12:31 <karolherbst> pcercuei: do you have external displays attached?

12:31 <pcercuei> I do

12:31 <karolherbst> but anyway.. check that the GPU is set to auto, that the HDA device on that GPU is set to auto and the bus controller they are on

12:32 <karolherbst> pcercuei: chances are, the display is on the nvidia GPU

12:32 <daniels> danvet: honestly I think just scrap the whole 'e.g. OpenGL is ... and Vulkan is ...' section because it's confusing

12:32 <daniels> with that, ack

12:32 <karolherbst> check /sys/class/drm/card1-*/ if any of the connectors are connected

12:33 <danvet> daniels, https://paste.debian.net/1202243/ like this?

12:33 <danvet> and a note in the commit message that the full driver stack/api discussion will be added in drm-uapi.rst later on?

12:33 <pcercuei> karolherbst: no, I only have card0-* stuff

12:33 <karolherbst> okay

12:34 dllud has joined #dri-devel

12:34 <pcercuei> it's a laptop so (AFAIK) the nvidia GPU is render-only

12:34 <karolherbst> pcercuei: ehh.. no

12:35 <karolherbst> there are plenty of laptops where the GPU is not render only

12:35 <karolherbst> it sounds stupid, but some use the nvidia GPU for displays

12:35 <karolherbst> except.. you use DP-MST via USB-C where the intel is in charge again :D

12:35 <karolherbst> anyway

12:35 <pcercuei> Ok. I believe you know that better than I do

12:35 <danvet> daniels, https://paste.debian.net/1202244/ end result

12:35 <karolherbst> I'd need the output of "lspci -tvv" and "grep . /sys/bus/pci/devices/*/power/control"

12:37 aravind has quit [Remote host closed the connection]

12:37 aravind has joined #dri-devel

12:39 <pcercuei> karolherbst: https://pastebin.com/raw/9kpzZJJq

12:39 <karolherbst> mhhhhhhhh

12:39 <karolherbst> pcercuei: dmesg as wlel

12:39 <karolherbst> *well

12:40 <karolherbst> could be that one bug we have.. but still checking. dmesg should be able to help

12:41 <pcercuei> dmesg: https://pastebin.com/VSCu67Pu

12:41 bcarvalho has joined #dri-devel

12:42 <karolherbst> *sigh*.. yeah... I think it's on snd_hda_intel

12:43 <karolherbst> pcercuei: I bet the 1.00.01 device is active? (cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_status)

12:43 <karolherbst> *1.00.1

12:43 <karolherbst> I think Roy hit this issue as well..

12:43 <pcercuei> correct

12:43 <karolherbst> yeah...

12:44 <karolherbst> pcercuei: mind joining #nouveau? then we can discuss with Rspliet

12:44 <pcercuei> How is snd_hda_intel related in any way to nouveau?

12:44 <pcercuei> Sure

12:45 <bbrezillon> danvet, lynxeye: while adding/testing a new panfrost IGT test I noticed something weird https://gitlab.freedesktop.org/-/snippets/2247, can you tell me if that's expected?

12:46 <danvet> no idea about drm/scheduler, but in i915 we have both

12:46 <danvet> where we let jobs linger after fd close (apparently there's some reason for it somewhere)

12:46 <danvet> and where we actively tear them all down

12:46 <danvet> I guesss to make it actually race-free you'd need drm/scheduler support for teardown

12:47 <bbrezillon> ok, I thought drm_sched_entity_destroy() was taking care of that already

12:48 <danvet> mripard, thx

12:51 <mripard> you're welcome :)

12:51 <mripard> was it the only patch in that series?

12:52 <daniels> danvet: yep, ack, thankyou!

12:52 <danvet> mripard, yeah I think there all good now

12:55 <lynxeye> bbrezillon: Hm, you can't take jobs back that are already queued in the HW and your "scheduler pop'ed the job from the entity, but didn't quite submit it to HW yet" is just a corner case of this. What's the problem with letting the job execute at that point?

13:03 <bbrezillon> lynxeye: that's what we were doing, but stepri01 (who is not on this channel :-/) suggested that we killed in-flight jobs instead of letting them finish

13:03 stepri01 has joined #dri-devel

13:07 <lynxeye> bbrezillon: Can you actually kill jobs already on the HW runqueue, without any hickup?

13:07 <bbrezillon> lynxeye: IIRC, one argument was that we might want to relax the timeout on compute jobs at some point, and if we do that, we need to make sure process that get killed get their GPU job killed too

13:08 <bbrezillon> lynxeye: yep, there's a HARD_STOP feature that's designed for that

13:08 <stepri01> yes Mali (Midgard onwards at least) supports killing jobs ('hard stop') without affecting other jobs

13:09 <stepri01> can't we just check in the run_job() callback if the entity has gone away and simply immediately fail the job at that point?

13:13 thellstrom has quit [Remote host closed the connection]

13:14 <danvet> bbrezillon, you can't relax the timeout much really

13:14 <danvet> there's some wiggle room

13:14 <danvet> but if you want real long-running compute, you need to be able to preempt them

13:14 <danvet> so that you can stuff other stuff in-between

13:14 <danvet> also, that means no more dma_fence for these compute jobs

13:15 haasn` has quit []

13:16 haasn has joined #dri-devel

13:16 <danvet> daniels can perhaps fill you in on a bunch of the glorious details

13:16 <haasn> Does DRM_FORMAT_GR88 map at all to Vulkan or EGL? I have a client who claims he only gets GR88 format buffers from some source (vaapi?), but I cannot see a way to actually specify this DRM format with either Vulkan or EGL when importing the dmabuf

13:16 <bbrezillon> stepri01: not easily, at least not without some sort of synchronization (because the entity will be freed at some point, and we need to make sure we compare job->entity to a valid pointer)

13:17 <mareko> anholt: hi, is there anything to do to move this forward? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11339

13:17 <stepri01> bbrezillon: what I really mean is not free the entity immediately, but mark it as dead. Then the normal submission logic can run, but instead of actually submitting the jobs just fail them immediately

13:19 <bbrezillon> danvet: you said you had support for both options in intel the drive, why do you need the 'kill jobs immediately' solution there?

13:19 <lynxeye> bbrezillon: I don't see how your scenario is happening. drm_sched_entity_fini waits until the entity is idle, so at that point the jobs are either still on the entity runqueue and will be killed by the scheduler, or submitted to the HW queue. So killing the job on the HW queue should work at that point.

13:21 <lynxeye> I don't see that race window where the job is pop'ed from the entity, but not yet on the HW queue. At least if the code is working as intended.

13:22 <haasn> Or is the intent of dmabufs that the importing API (EGL,Vulkan) implementation can access DMAbuf internals to disambiguage RG88 and GR88?

13:22 <haasn> I see some lines of code that seem to be doing that

13:23 <danvet> bbrezillon, it started as a misguided, because incomplete, attempt at supporting long running compute jobs

13:23 <danvet> bbrezillon, we had preempt and killing jobs on fd close

13:24 <danvet> but still dma_fence, so you could abuse dma_fence on a long running compute job to hang the kernel in all kinds of funny places

13:25 <bbrezillon> lynxeye: unfortunately that's not what I see here, I added traces in the close and hw_submit path, and I see this sequence 1/enter drv->run_job() 2/loop over all in-flight jobs to kill those attached to the entity in the close path 3/continue in drv->run_job() and submit the job that was supposed to be destroyed

13:26 sdutt has joined #dri-devel

13:31 boistordu has quit [Ping timeout: 480 seconds]

13:32 <bbrezillon> lynxeye: https://gitlab.freedesktop.org/bbrezillon/linux/-/blob/v5.13-for-mesa-ci/drivers/gpu/drm/panfrost/panfrost_job.c and https://gitlab.freedesktop.org/-/snippets/2249

13:33 RobertC has joined #dri-devel

13:34 <lynxeye> bbrezillon: Then I think that's a bug. drm_sched_entity_destroy should only return once all not-yet-submitted jobs are killed and the main scheduler thread submitted the job it got from the entity runlist.

13:35 <lynxeye> After this function returned the job should be submitted to the HW and you should find it for your HARD_STOP handling.

13:37 thellstrom has joined #dri-devel

13:37 <zmike> daniels: something like that I guess

13:38 <lynxeye> bbrezillon: Huh? Yea, the complete(&entity->entity_idle) in the scheduler main loop happens before the submit to HW. I think that should be moved to after the submit is done.

13:39 <bbrezillon> lynxeye: I can try that

13:44 <ishitatsuyuki> I'm seeing what seems to be suboptimal divergence analysis in ACO and want to debug it. Is there any handy ACO_DEBUG option for this?

13:45 <pendingchaos> ishitatsuyuki: no

13:46 <ishitatsuyuki> ok, guess I'll just fiddle with the shader/ISA dump then

13:46 <mareko> zmike: did you mean that all commits are Rb? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11428#note_960894

13:46 <bbrezillon> stepri01: right, but the entity is part of the file_priv object which we free in the close path, so I'd prefer a solution where I don't need to refcount that one too :)

13:47 <stepri01> yes, obviously fixing this in the common code is even better :)

13:48 <pendingchaos> ishitatsuyuki: with https://pastebin.com/raw/Ksrqt7iZ and RADV_DEBUG=nocache,shaders,preoptir , you can shaders with the results of divergence analysis below the "NIR shader before instruction selection:" line

13:49 <ishitatsuyuki> neat trick

13:49 <zmike> mareko: all except the one that's ab

13:49 <zmike> sorry, missed your comment

13:49 <zmike> my inbox is a disaster

13:50 blue__penquin has quit []

13:52 <bbrezillon> lynxeye: I can't be 100% sure, but it seems to fix the issue

13:54 <bbrezillon> at least I don't see those panfrost_job_run() traces after the entity_destroy() calls anymore

13:55 <lynxeye> bbrezillon: Yea, I think it's just that nobody was thinking about this race window, as it's not a problem for all drivers that just let jobs finish once they are on the HW runqueue. For those drivers "job is on HW runqueue" and "job is picked up by scheduler thread" is the same thing, so you can declare the entity idle earlier.

13:56 <lynxeye> If you really need to make sure the job is on the HW runqueue, so you can find it there for killing the idle need to happen later. I don't think moving this complete after the submit has any downside for the other existing drivers.

14:04 <ishitatsuyuki> re: DA, DA was actually correctly marking the variable as uniform but VMEM loads were emitted instead, maybe someone changed it on purpose

14:06 <pendingchaos> if the memory read might have been written with a vmem store, ACO has to use vmem loads because smem uses a different cache

14:06 <pendingchaos> currently, ACO just checks if a potentially aliasing ssbo was written anywhere in the shader

14:07 <pendingchaos> also, smem can't be used for coherent/volatile loads on gfx6/7

14:09 luzipher__ has joined #dri-devel

14:09 <mlankhorst> danvet: looks like the prepare_fb/cleanup_fb could be a generic helper too, but I don't know what other drivers do in their fb functions

14:15 luzipher_ has quit [Ping timeout: 480 seconds]

14:23 cphealy has joined #dri-devel

14:29 thellstrom has quit [Quit: thellstrom]

14:29 <ishitatsuyuki> ended up solving the SMEM thing by removing writes from the shader

14:30 <ishitatsuyuki> it's probably the time to properly split my buffers by purpose instead of using a giant read-write one

14:30 <ishitatsuyuki> thanks pendingchaos, your guidance was very helpful

14:31 <bbrezillon> lynxeye: thx for your help with the sched 'bug/limitation'

14:34 sdutt has quit []

14:34 sdutt has joined #dri-devel

14:50 mattrope has joined #dri-devel

14:56 RobertC has quit [Ping timeout: 480 seconds]

15:01 NiksDev has joined #dri-devel

15:08 camus has joined #dri-devel

15:09 camus1 has quit [Remote host closed the connection]

15:20 pcercuei has quit [Quit: brb]

15:22 pcercuei has joined #dri-devel

15:22 <danvet> mlankhorst, lots more in that series is about making generic bits generic for prepare/cleanup_fb

15:23 <danvet> mlankhorst, so not sure what you're talking about

15:25 pcercuei has quit []

15:26 pcercuei has joined #dri-devel

15:33 bcarvalho_ has joined #dri-devel

15:34 bcarvalho_ has quit []

15:34 bcarvalho_ has joined #dri-devel

15:35 bcarvalho has quit [Read error: Connection reset by peer]

15:35 dllud has quit [Ping timeout: 480 seconds]

15:42 libv has joined #dri-devel

15:42 bcarvalho_ has quit []

15:43 bcarvalho has joined #dri-devel

15:45 libv_ has quit [Ping timeout: 480 seconds]

15:45 mbrost has joined #dri-devel

15:47 dllud has joined #dri-devel

15:48 pcercuei has quit [Quit: brb]

15:49 pcercuei has joined #dri-devel

15:51 <mlankhorst> ah k, I'll take a look at the whole series then

15:59 GloriousEggroll has joined #dri-devel

16:02 Duke`` has joined #dri-devel

16:06 <danvet> mlankhorst, can also look at drm-misc-next :-)

16:09 stepri01 has quit [Quit: leaving]

16:19 khfeng has quit [Ping timeout: 480 seconds]

16:30 bcarvalho_ has joined #dri-devel

16:30 bcarvalho has quit [Read error: Connection reset by peer]

16:31 gouchi has joined #dri-devel

16:39 thellstrom has joined #dri-devel

16:42 thellstrom1 has joined #dri-devel

16:48 thellstrom has quit [Ping timeout: 480 seconds]

16:48 aravind has quit [Remote host closed the connection]

16:54 cedric is now known as bluebugs

17:00 ngcortes has joined #dri-devel

17:06 ngcortes has quit [Remote host closed the connection]

17:07 elongbug has quit [Remote host closed the connection]

17:10 Peste_Bubonica has joined #dri-devel

17:16 pcercuei_ has joined #dri-devel

17:16 pcercuei has quit [Read error: Connection reset by peer]

17:17 alanc has quit [Remote host closed the connection]

17:17 alanc has joined #dri-devel

17:35 nsneck has quit [Ping timeout: 480 seconds]

17:36 lynxeye has quit [Quit: Leaving.]

17:40 gouchi has quit [Remote host closed the connection]

17:48 pcercuei_ has quit []

17:50 pcercuei has joined #dri-devel

17:53 thellstrom1 has quit []

17:54 nsneck has joined #dri-devel

18:02 thellstrom has joined #dri-devel

18:32 tobiasjakobi has quit [Remote host closed the connection]

18:34 pcercuei has quit [Quit: brb]

18:37 tzimmermann has quit [Quit: Leaving]

18:37 pcercuei has joined #dri-devel

18:45 thellstrom has quit [Remote host closed the connection]

18:54 gouchi has joined #dri-devel

18:54 pcercuei has quit [Read error: Connection reset by peer]

18:54 pcercuei_ has joined #dri-devel

19:03 pcercuei_ has quit []

19:04 thellstrom has joined #dri-devel

19:05 alyssa has joined #dri-devel

19:06 <alyssa> zmike: Reading through the atom ordering stuff

19:06 pcercuei has joined #dri-devel

19:06 <alyssa> for hardware that has fixed-function for everything (vertex attributes, blend modes, render targets, etc -- that is, hw whose GL drivers do no variants)

19:07 <alyssa> is Zink happy there?

19:07 <zmike> zink is never happy

19:07 <alyssa> IIUC a lot of this "variants for everything!" is an AMD specific issue (and Apple..)

19:07 <zmike> are you reading the issue or the MR

19:07 <alyssa> the issue

19:07 <zmike> oh

19:08 <zmike> that's a waste of time

19:08 <alyssa> thanks, she says after finishing reading it

19:09 <alyssa> trying to understand the goals here, is all

19:09 <alyssa> I also suppose the way GL drivers are written differs from VK

19:09 <zmike> the goal for ordering is just to enable drivers to be able to use values from states as they come in instead of having to do everything at draw

19:10 <alyssa> E.g. on Mali (Bifrost), we can either bake blend state into the shader variant or do it as dynamic state

19:10 <alyssa> gallium panfrost does it always dynamic

19:10 <alyssa> vulkan panfrost, we're talking about always baking it in

19:10 <zmike> yes, and with this you would know that any time you get a blend state you're guaranteed to have your shader state already

19:10 <alyssa> so for blend modes, zink+panvk would be inherently worse than panfrost, but that's a purely design decision, not an inherent one

19:11 <alyssa> (and if Zink perf is something we want to optimize for, we can do the same dynamic state stuff in the vk side. It just really sucks.)

19:11 <alyssa> bbrezillon: ^

19:11 <zmike> zink+panvk is setting off my future ptsd again, pls avoid mentioning this

19:11 <alyssa> [I would rather not optimize for this but I'm also not on vk]

19:12 <alyssa> zmike: Did I mention yet that Collabora is hiring? ;-)

19:12 * zmike dives head first into a steam sale

19:18 cphealy has quit [Ping timeout: 480 seconds]

19:19 cphealy has joined #dri-devel

19:28 gouchi has quit [Remote host closed the connection]

19:29 gouchi has joined #dri-devel

19:36 * vsyrjala getting 5 KiB/s from steam. sale must be going well

19:36 alyssa has left #dri-devel [#dri-devel]

19:47 pnowack has quit [Quit: pnowack]

19:48 pcercuei has quit [Read error: Connection reset by peer]

19:49 <mdnavare> hwentlan: Has anyone tried using the VRRTest App developed by Nixola and looked at the Target fps and actual FPS measurements? I am trying to understand how the lua scripts are calculating the actual fps in VRR case with Vsync Off , since in VRR enabled case that should be obtained from the flip done events generated by the kernel

19:51 <vsyrjala> our code doesn't handle async flip + vrr. as step 1 we should just reject async flips when vrr is enabled

19:58 <mdnavare> vsyrjala: So in the async flip request we can check if VRR prop set and if so then reject the flip?

19:59 <mdnavare> vsyrjala: Actually whats confusing here is the vsync off mode in VRR test app, not sure if that then just sends Async flips to the driver?

19:59 <vsyrjala> i would assume so

19:59 mwk has quit [Remote host closed the connection]

20:00 mwk has joined #dri-devel

20:00 <mdnavare> vsyrjala: So then in that case, we should be testing only with Vsync ON mode with this VRR Test app right?

20:01 <vsyrjala> yes. until we handle this correctly. i think i had some kind of idea how to handle this combination, but the details escape me right now

20:02 <mdnavare> vsyrjala: But should Async flips work with VRR ?

20:02 <mdnavare> ever?

20:03 <vsyrjala> i don't see much point in doing async+vrr, but it could be done

20:03 <vsyrjala> i should say that atm we'll just keep running at the lowest refresh rate if you do async flips

20:04 <vsyrjala> the "fix" would be to trigger pushes even for async flips, but that introduces some corner cases because we may now get new flips while the previous push is still pending

20:06 <vsyrjala> hence the easy thing to do is just refuse async flips for now

20:08 <mdnavare> vsyrjala: Yes I agree, but can we set up a separate call to discuss how we want toc orrectly handle or decide never to handle async flips + VRR

20:18 pcercuei has joined #dri-devel

20:18 ngcortes has joined #dri-devel

20:19 <anholt> danvet: in "drm/sched: Split drm_sched_job_init", it looks like you should have updated v3d and didn't?

20:32 rasterman has quit [Quit: Gettin' stinky!]

20:32 <danvet> anholt, ah yes, and that explains why you didn't get cc'ed on that

20:33 mlankhorst has quit [Ping timeout: 480 seconds]

20:33 mbrost_ has joined #dri-devel

20:35 mbrost has quit [Ping timeout: 480 seconds]

20:41 pcercuei has quit [Quit: brb]

20:47 pcercuei has joined #dri-devel

21:03 yevhenii has joined #dri-devel

21:03 yk has quit [Read error: Connection reset by peer]

21:17 <anholt> danvet: honestly I'm surprised I even opened the mail, I want so little to do with email workflow. and especially the kernel.

21:25 gouchi has quit [Remote host closed the connection]

21:27 jewins has joined #dri-devel

21:35 flto has quit [Remote host closed the connection]

21:37 flto has joined #dri-devel

21:38 danvet has quit [Ping timeout: 480 seconds]

21:46 ddavenport has joined #dri-devel

21:53 Duke`` has quit [Ping timeout: 480 seconds]

22:07 camus1 has joined #dri-devel

22:09 NiksDev has quit [Ping timeout: 480 seconds]

22:14 camus has quit [Ping timeout: 480 seconds]

22:26 orbea1 has joined #dri-devel

22:26 orbea1 has quit []

22:26 orbea1 has joined #dri-devel

22:27 ngcortes has quit [Remote host closed the connection]

22:27 orbea1 has quit []

22:27 orbea1 has joined #dri-devel

22:28 orbea has quit [Ping timeout: 480 seconds]

22:28 orbea1 has quit []

22:29 orbea has joined #dri-devel

22:38 CME has quit []

22:38 CME has joined #dri-devel

22:42 karolherbst has quit [Quit: Konversation terminated!]

22:43 Peste_Bubonica has quit [Ping timeout: 480 seconds]

22:43 ddavenport has quit [Remote host closed the connection]

22:45 Peste_Bubonica has joined #dri-devel

22:47 camus has joined #dri-devel

22:49 camus1 has quit [Read error: Connection reset by peer]

22:49 <zmike> anyone have an idea why this ci job has a 30 minute countdown timer showing up for...whatever this is ? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/11276347

22:53 <jenatali> It's downloading the source... very slowly

22:54 <zmike> clearly daniels twisting the pipe into a knot again

22:54 YuGiOhJCJ has joined #dri-devel

22:54 flto has quit [Quit: Leaving]

22:55 flto has joined #dri-devel

22:56 <ccr> someone playing Pipe Mania with CI?

22:58 pcercuei has quit [Quit: dodo]

22:58 karolherbst has joined #dri-devel

22:58 <daniels> it’s one of the GStreamer-provided runners sucking the Mesa git repo through a straw

22:59 <zmike> ah

22:59 bcarvalho_ has quit [Ping timeout: 480 seconds]

23:04 Peste_Bubonica has quit [Quit: Leaving]

23:04 ngcortes has joined #dri-devel

23:05 gpoo has quit [Ping timeout: 480 seconds]

23:09 libv has quit [Ping timeout: 480 seconds]

23:15 gpoo has joined #dri-devel

23:18 luzipher_ has joined #dri-devel

23:22 libv has joined #dri-devel

23:24 luzipher__ has quit [Ping timeout: 480 seconds]

23:25 andrey-konovalov has quit [Ping timeout: 480 seconds]