ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
adjtm has quit [Read error: Connection reset by peer]
ngcortes has quit [Remote host closed the connection]
adjtm has joined #dri-devel
sarnex has joined #dri-devel
sarnex has quit []
sarnex has joined #dri-devel
undvasistas[m] has joined #dri-devel
fcarrijo has quit []
GloriousEggroll has quit [Quit: Death is life's way of telling you you've been fired.]
ddavenport has joined #dri-devel
reinist12 has joined #dri-devel
ppascher has quit [Ping timeout: 480 seconds]
Lucretia has quit []
agners has quit [Quit: WeeChat 3.2]
nsneck has quit [Remote host closed the connection]
MrCooper has quit [Remote host closed the connection]
nsneck has joined #dri-devel
adjtm has quit [Read error: Connection reset by peer]
adjtm has joined #dri-devel
MrCooper has joined #dri-devel
sarnex has quit [Ping timeout: 480 seconds]
Lightkey has quit [Ping timeout: 480 seconds]
camus1 has joined #dri-devel
sarnex has joined #dri-devel
<zmike> update: despite turnip coming out with a strong lead, and then ANV surging ahead into "but also this one little thing here" shedsmanship, lavapipe has officially taken first place in the multidraw implementation contest
<zmike> the organizers still have their eyes peeled to see what will happen with the second and third place prizes
<robclark> zmike: is this some sort of reviews-per-second benchmark :-P
Lightkey has joined #dri-devel
<zmike> if it was, airlied would be winning :P
<robclark> heheh
hch12907_ has joined #dri-devel
reinist12 has quit []
hch12907 has quit [Ping timeout: 480 seconds]
boistordu has quit [Remote host closed the connection]
boistordu has joined #dri-devel
luzipher_ has joined #dri-devel
luzipher__ has quit [Ping timeout: 480 seconds]
blue__penquin has joined #dri-devel
camus1 has quit [Remote host closed the connection]
camus has joined #dri-devel
ddavenport has quit [Remote host closed the connection]
fcarrijo has joined #dri-devel
khfeng has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
ppascher has joined #dri-devel
gpoo has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
<zmike> I think we gotta cut down on these iris jobs in ci until there's more hw or something
<zmike> they seem to get stuck way too often
cedric has joined #dri-devel
bluebugs has quit [Ping timeout: 480 seconds]
fcarrijo has quit []
yoslin has joined #dri-devel
yoslin has quit []
yoslin has joined #dri-devel
mattrope has quit [Remote host closed the connection]
luzipher__ has joined #dri-devel
luzipher_ has quit [Ping timeout: 480 seconds]
ddavenport has joined #dri-devel
andrey-konovalov has joined #dri-devel
tzimmermann has joined #dri-devel
aravind has joined #dri-devel
Duke`` has joined #dri-devel
mattrope has joined #dri-devel
luzipher_ has joined #dri-devel
luzipher__ has quit [Ping timeout: 480 seconds]
thellstrom1 has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
<marcan> is this the right place to talk about KMS shenanigans?
<airlied> yeah probably
<marcan> I've been looking at how Apple did their display controller on M1 and... this is going to be fun
<marcan> they run a big blob on a side-CPU, and speak to it over a mailbox protocol. great, I don't have to implement DP training and isochronous memory bandwidth calculations and all that stuff
<marcan> ... but in exchange, the mailbox protocol is, shall we say, "interesting"
<marcan> as far as I can tell, for some parts what they did was literally move IOKit drivers into the firmware and stick shims in front
sdutt has quit [Remote host closed the connection]
<airlied> seems like a winning design
<airlied> is the interface sane or some sort of C++ RPC?
<marcan> for others there's another kind of protocol with synchronous messages/calls in both directions, as well as async ones, and such
<marcan> I'm not entirely sure yet how the regular marshaling works
<airlied> like nvidia does something similar
<airlied> the kernel driver talks to the modesetting firmware
<marcan> I do know that where data structures are involved, they have their own version of json-but-binary
<marcan> e.g. they don't get the EDID, they get a giant serialized blob describing everything verbosely, already parsed, as well as available modes and constraints etc
<marcan> this is only for certain things, not all messages (e.g. calls for displaying frames and such use pure binary structures, not this)
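(For illustration, a purely hypothetical C sketch of the message framing marcan describes: a small header identifying the endpoint and call type, followed by either a raw binary struct or the serialized "binary json" payload. Every name and field layout here is invented; the real DCP protocol was still being reverse engineered at this point.)

```c
#include <stdint.h>

/* Hypothetical mailbox message framing, invented for illustration. */
enum dcp_msg_type {
	DCP_MSG_CALL,   /* synchronous call, expects a reply */
	DCP_MSG_REPLY,  /* reply to an earlier call, matched by tag */
	DCP_MSG_ASYNC,  /* asynchronous notification, no reply */
};

struct dcp_msg_hdr {
	uint8_t  endpoint; /* which firmware service this targets */
	uint8_t  type;     /* enum dcp_msg_type */
	uint16_t tag;      /* pairs replies with their calls */
	uint32_t len;      /* payload length in bytes */
	/* 'len' bytes follow: a plain binary struct on hot paths
	 * (e.g. frame presentation), or the verbose serialized
	 * format for things like mode lists and parsed EDID data */
};
```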
<airlied> is there any sort of per connector split?
<airlied> I suppose the other problem is: is the interface in any way stable
<airlied> like will an update just trash the API
<marcan> right now on M1 machines I think there are two completely separate controllers with processors, and then there is a crossbar; not entirely sure who manages the crossbar, whether these things or the host directly
<marcan> and yes, I was going to mention that
<marcan> I don't know how stable this ABI is; the firmware is shipped with macOS and becomes part of the bootloader stuff that runs before we get control
<marcan> however, it *is* per OS install
<marcan> so we are insulated from macOS updates, but it means that if we want to pin a version, our install script has to grab it from Apple's CDN and pull out the blobs from there
<airlied> my other worry would be they do fw updates from the OS drivers post-boot
<marcan> they do not
<marcan> it is not stored separately
<marcan> basically the bootloader-that-runs-before-us loads this firmware from the "OS" partition (which for linux will just contain our bootloader because we obviously aren't putting Linux on APFS)
<marcan> so you can dual-boot and macOS will never screw Linux up on this
mattrope has quit [Remote host closed the connection]
<marcan> that much is great
<airlied> okay then it mostly sounds like how nvidia hw works, probably with a different split in functionality
<airlied> from an atomic modesetting pov it would be good to know if you can submit all the state for all the connectors at once
<marcan> if they are different DCPs I assume not, but Apple know their display pipelines so they must have *something* to make this work
<airlied> sounds like a lot of tracing required and monitor plugging/unplugging :-)
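(For reference, the uapi side of airlied's atomic question: a single drmModeAtomicCommit can carry plane/CRTC/connector state for the whole device at once. A minimal libdrm sketch, with lookup and error handling elided; the plane IDs, fb IDs and the FB_ID property ID are assumed to have been discovered earlier via drmModeGetResources/drmModeObjectGetProperties.)

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* One atomic commit updating two planes at once; whether the hardware
 * can actually latch all of it together is the per-DCP question above. */
static int flip_two_planes(int fd, uint32_t plane1, uint32_t plane2,
			   uint32_t fb_id_prop, uint32_t fb1, uint32_t fb2)
{
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int ret;

	if (!req)
		return -1;

	drmModeAtomicAddProperty(req, plane1, fb_id_prop, fb1);
	drmModeAtomicAddProperty(req, plane2, fb_id_prop, fb2);

	/* Both updates land in the same commit, or the commit fails. */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
	drmModeAtomicFree(req);
	return ret;
}
```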
<marcan> the other thing is while Apple can just tie firmware versions and the OS version together, we're eventually going to have to support more than one version (to get bugfixes and features), and if the ABI is unstable that means we do need to support at least a certain subset of versions in the same kernel
<marcan> yes :-)
<marcan> I'm currently running macOS in a VM and just worked out the mailbox/ringbuffer protocols, so I have full dumps of every command/event/etc sent and received
<airlied> yeah hopefully the fw advertises a version
<marcan> oh we will know the version from iBoot anyway, that won't be an issue
<marcan> I'm sure there are a dozen ways to get that
<marcan> I'm writing the shim-bootloader that bridges the apple world to the devicetree world, so any info we need I can put in the devicetree
<marcan> if you're curious, here's a log from boot to login screen: https://marcan.st/transf/dcplog.txt
<airlied> you'll also have to figure out how audio works I suppose
<marcan> yeah, there's an endpoint for audio, but obviously the audio hardware is completely separate and there has to be some link at some point
<marcan> at this point I'm wondering how I'm going to deal with all this marshalling in the kernel... especially if we end up having to support multiple ABIs. Also that json-like thing.
<marcan> on the plus side, hopefully this will make supporting newer chips relatively trivial, if most of the differences are abstracted out
<marcan> I guess I should probably read up on IOKit since it will probably explain a lot of the concepts I'm seeing here, heh
<marcan> and here I was hoping to avoid that :-)
<airlied> marcan: it's probably not much worse than ACPI
<marcan> okay yeah
<marcan> but I was also hoping not to have to write another ACPI-like framework for this :-)
<airlied> marcan: any idea if any other hw looks the same?
<marcan> there are many other CPUs with mailboxes like this, but I don't know if the mailbox protocols are all along these lines. DCP is one of the worse ones I think.
<marcan> the GPU also uses one though
<marcan> in fact I was looking at DCP in part to have the basics worked out enough to then move to GPU, especially since the GPU has an unknown special MMU while the DCP uses the "standard" Apple IOMMU that we already know about (DART)
<airlied> marcan: metal transported over it?
<marcan> nah, alyssa already worked out the shaders and basic drawing and stuff, she's got like 70% of the GLES tests passing on macOS?
<marcan> that's issuing IOKit calls on macOS directly to their driver
<airlied> does the driver then forward those to the hw? or do stuff?
<marcan> so it's mostly about working out memory management, command submission, and preemption (which they do support, that's one of the big things the firmware does)
<marcan> I think it mostly forwards, though due to the preemption the firmware has to be aware of e.g. render target details, which is why it's higher level than you might expect
<marcan> we'll find out once I put this tracer on the GPU device
<marcan> I literally worked out the mailbox message passing structures yesterday :)
<airlied> marcan: nice! probably worth building a userspace prototype until you can nail down how much marshalling etc is going to be needed
ddavenport has quit [Remote host closed the connection]
Duke`` has quit [Ping timeout: 480 seconds]
mlankhorst has joined #dri-devel
ddavenport has joined #dri-devel
RobertC has joined #dri-devel
<marcan> aaa
<marcan> whoops, I guess that TCP connection wasn't as dead as I thought
<marcan> airlied: my current playground is actually using the device remotely via USB, running in my bootloader, controlled via python scripts on the other side
<marcan> that's actually how that hypervisor works - all that tracer stuff is python via a remote proxy, only the core is running locally
<marcan> so I'm going to do the same for this, build a prototype driver for it in python
<marcan> same for the GPU, I want to render a triangle before I start on the kernel driver
<marcan> hoping that eventually that tooling can help spit out metadata, thunks, or whatever we decide to go with for the kernel marshalling
<marcan> one cute thing this also lets me do, in theory, is just halt macos and hijack DCP and issue whatever commands I want, then resume it
<marcan> which should be interesting for experiments
<marcan> (I can also modify or suppress commands macOS sends)
<marcan> though how much of this I actually want to implement "properly" depends on how useful it is :)
RobertC has quit [Ping timeout: 480 seconds]
<marcan> this is what the top level looks like for the tracer that spat out that .txt file: https://github.com/AsahiLinux/m1n1/blob/main/proxyclient/hv/trace_dcp.py
<marcan> though I need to refactor the message handling, in particular the inbuf/outbuf thing, now that I have a better idea of how it works. it's more symmetric than I thought.
gouchi has joined #dri-devel
gouchi has quit []
luzipher__ has joined #dri-devel
luzipher_ has quit [Ping timeout: 480 seconds]
pnowack has joined #dri-devel
pnowack has quit [Remote host closed the connection]
pnowack has joined #dri-devel
<bbrezillon> danvet: yes, I was planning to introduce a 'disable implicit deps' flag in v2, and daniels already pointed me to jekstrand's work ;-)
rasterman has joined #dri-devel
ppascher has quit [Ping timeout: 480 seconds]
rasterman has quit []
rasterman has joined #dri-devel
luzipher_ has joined #dri-devel
kem has quit [Ping timeout: 480 seconds]
ppascher has joined #dri-devel
kem has joined #dri-devel
luzipher__ has quit [Ping timeout: 480 seconds]
ddavenport has quit [Remote host closed the connection]
andrey-konovalov has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
<pepp> MrCooper: all pixels of an app are copied twice (DRI3 buffers -> window pixmap -> compositor framebuffer), right?
lynxeye has joined #dri-devel
<MrCooper> right
<pepp> thanks. One more question: AFAICT using gdb, the window pixmap is still updated for unredirected fullscreen windows. So unredirected means the window pixmap is sent directly to KMS / the display hardware?
<MrCooper> right, for unredirected windows, the window pixmap is the screen (scanout) pixmap
<MrCooper> the compositor is not involved with presentation for those
<MrCooper> page flipping turns the client pixmap into the screen pixmap
<pepp> MrCooper: I see. Thanks!
<MrCooper> np (actually Present page flipping doesn't replace the screen pixmap itself, but close enough logically)
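(A minimal sketch of the KMS call underneath what MrCooper describes: instead of copying, scanout is flipped to the framebuffer backing the client's pixmap. fd, crtc_id and the client's fb_id are assumed to be set up already.)

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Flip scanout to the client's buffer rather than copying it.
 * DRM_MODE_PAGE_FLIP_EVENT requests a flip-done event on the fd,
 * which the X server's Present code uses to report completion. */
static int flip_to_client_buffer(int fd, uint32_t crtc_id, uint32_t fb_id)
{
	return drmModePageFlip(fd, crtc_id, fb_id,
			       DRM_MODE_PAGE_FLIP_EVENT, NULL);
}
```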
randomher0 has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
Lucretia has joined #dri-devel
<danvet> ^^ anyone feel like reviewing?
<danvet> it's the last one in that series
luzipher__ has joined #dri-devel
luzipher_ has quit [Ping timeout: 480 seconds]
thellstrom has joined #dri-devel
thellstrom1 has quit [Ping timeout: 480 seconds]
dllud_ has quit [Ping timeout: 480 seconds]
elongbug has joined #dri-devel
dllud has joined #dri-devel
ppascher has quit [Ping timeout: 480 seconds]
bcarvalho_ has joined #dri-devel
bcarvalho has quit [Read error: Connection reset by peer]
MrCooper has quit [Quit: Leaving]
MrCooper has joined #dri-devel
<HdkR> Anyone know if meson supports static-pie? :)
<ccr> mmm .. pie.
<HdkR> cmake is a bit..wonky with static-pie
hch12907_ is now known as hch12907
bcarvalho_ has quit []
<daniels> zmike: 'get stuck' as in just wander off and die whilst running Piglit/dEQP? if so, more hardware won't help that, but we will get the heartbeat sorted
pcercuei has joined #dri-devel
aiddamse has joined #dri-devel
aiddamse has quit []
aiddamse has joined #dri-devel
aiddamse has quit []
karolherbst has quit [Ping timeout: 480 seconds]
aiddamse has joined #dri-devel
aravind has quit [Remote host closed the connection]
aravind has joined #dri-devel
aiddamse has quit []
<danvet> tzimmermann, I think IS_ENABLED is more standard kernel style than #ifdef in the code
<danvet> but it's also a bit a bikeshed :-)
<tzimmermann> danvet, no problem, I got this from jani's suggestion
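(The two styles being contrasted, with a made-up config option and helper. IS_ENABLED() keeps the call compiled, so it is always parsed and type-checked, and relies on dead-code elimination; the preprocessor hides it entirely.)

```c
/* #ifdef style: the code disappears from the build entirely */
#ifdef CONFIG_DRM_FOO
	drm_foo_setup(dev);
#endif

	/* IS_ENABLED() style, generally preferred in kernel code:
	 * the branch is always type-checked and the compiler drops
	 * it when the option is off */
	if (IS_ENABLED(CONFIG_DRM_FOO))
		drm_foo_setup(dev);
```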
<danvet> daniels, I wonder whether we should document a bunch of the interop issues
<danvet> like "when am I actually on Android"
<danvet> and stuff like that
<daniels> yeah, it's kinda tricky since there are three audiences really; driver developers who need to know how to write a driver/subsystem that doesn't suck, userspace developers who need to know what to expect and what they can do, confused people from other parts of the kernel wondering wtf any of this is :)
<daniels> what you wrote ... isn't incorrect ... but it also doesn't really explain this stuff to people who don't already know it
karolherbst has joined #dri-devel
<daniels> danvet: so yeah, I guess maybe one on dma_resv for kernel people which just documents what the fence slots are for (WAR/WAW/RAW hazard avoidance) and how to access/update them in the kernel, then a link to a separate uapi section which explains why & when you use each?
<danvet> daniels, yeah something like that
<danvet> daniels, I think trying to smash the uapi spec (well "what should the full stack achieve") and the kernel driver docs into one won't go well here
<daniels> goldilocks ftw
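(A rough sketch of the kernel-side usage daniels summarizes, using the dma_resv API names of this era: writers install the exclusive fence, readers add shared fences, which is how the WAR/WAW/RAW hazards are avoided. Locking error handling is elided.)

```c
#include <linux/dma-resv.h>

/* Attach a job's fence to a buffer's reservation object: exclusive
 * slot for writes, shared slots for reads. */
static int attach_job_fence(struct dma_resv *resv,
			    struct dma_fence *fence, bool is_write)
{
	int ret = 0;

	dma_resv_lock(resv, NULL);
	if (is_write) {
		dma_resv_add_excl_fence(resv, fence);
	} else {
		ret = dma_resv_reserve_shared(resv, 1);
		if (!ret)
			dma_resv_add_shared_fence(resv, fence);
	}
	dma_resv_unlock(resv);
	return ret;
}
```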
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
luzipher_ has joined #dri-devel
yk has joined #dri-devel
adjtm is now known as Guest422
adjtm has joined #dri-devel
gpoo has joined #dri-devel
luzipher__ has quit [Ping timeout: 480 seconds]
Guest422 has quit [Ping timeout: 480 seconds]
<danvet> then I could smash that series in
pcercuei has quit [Quit: Lost terminal]
pcercuei has joined #dri-devel
<pcercuei> Is it normal/expected that systemd-logind is listed as a client of both my integrated intel and discrete nvidia GPUs? That means nouveau never goes to auto-suspend state...
<pcercuei> I guess I'd better ask on the systemd channel
<danvet> daniels, https://paste.debian.net/1202242/ sufficient amounts of s/should/must/ for the part we have already?
<danvet> pcercuei, nouveau auto-suspend is a bit of a shotgun approach and not very smart
<danvet> and yeah logind is supposed to keep that fd around
<danvet> it needs it for managing the vt switch dance
<danvet> pcercuei, proper auto-suspend would only keep the hw alive if an output is alive or if rendering is going on
<danvet> but that means you need to have code to swap out all the buffers before you auto-suspend
<danvet> and the locking gets a _lot_ more funny
<pcercuei> danvet: understood
<danvet> with integrated gpu it's a lot easier since the memory can't disappear, so you only have to care about saving/restoring device state
<danvet> pcercuei, I think for discrete the only realistic option would be an idle timer
<danvet> which first needs to swap out all bo to system memory
<danvet> and then drop the last runtime pm reference
<danvet> because taking dma_resv_lock from within runtime pm callbacks will deadlock
<danvet> maybe with some ttm helpers it wouldn't be too onerous to get it all going, but the first driver will suffer real bad
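(A minimal sketch of the idle-timer plumbing danvet outlines, using the standard runtime-PM autosuspend helpers; the foo_gpu_* driver names are invented. The hard part he mentions, swapping every BO out to system memory before the final put without taking dma_resv_lock from the runtime-PM callback, is deliberately not shown.)

```c
#include <linux/pm_runtime.h>

static void foo_gpu_mark_busy(struct device *dev)
{
	pm_runtime_get_sync(dev);	/* hold the device awake */
}

static void foo_gpu_mark_idle(struct device *dev)
{
	pm_runtime_mark_last_busy(dev);
	pm_runtime_put_autosuspend(dev); /* suspends after the delay */
}

static void foo_gpu_pm_init(struct device *dev)
{
	pm_runtime_set_autosuspend_delay(dev, 5000); /* 5s idle timer */
	pm_runtime_use_autosuspend(dev);
	pm_runtime_allow(dev);
}
```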
andrey-konovalov has joined #dri-devel
<pcercuei> danvet: from what I can see, I have no BOs
<danvet> pcercuei, yeah, but if you push the runtime pm down from "any open file handles" you might
<danvet> so minimally you need to keep track of that
<pcercuei> So the PM could be set to "active" when creating a BO, and "idle" when destroying all BOs, no?
<danvet> yeah, but there's also everything else
<danvet> I have no idea what else nouveau sets up on file open
<danvet> maybe chat with karolherbst
<karolherbst> uhghrhhg... my head still spins from all the fencing and bo code we have in nouveau :D
dllud has quit [Read error: Connection reset by peer]
<karolherbst> what's the question/problem?
<danvet> daniels, that diff good enough for an ack?
<pcercuei> karolherbst: I noticed that systemd-logind keeps a handle to my discrete nvidia (nouveau) GPU, which causes nouveau to never auto-suspend the hardware
<karolherbst> pcercuei: that's not the cause of it never auto suspending
<karolherbst> that's normal as logind assigns devices to seats or something.. dunno the details
<karolherbst> but that's normal
<karolherbst> and it doesn't prevent auto pm from happening
ppascher has joined #dri-devel
<karolherbst> pcercuei: do you have external displays attached?
<pcercuei> I do
<karolherbst> but anyway.. check that the GPU is set to auto, that the HDA device on that GPU is set to auto and the bus controller they are on
<karolherbst> pcercuei: chances are, the display is on the nvidia GPU
<daniels> danvet: honestly I think just scrap the whole 'e.g. OpenGL is ... and Vulkan is ...' section because it's confusing
<daniels> with that, ack
<karolherbst> check /sys/class/drm/card1-*/ if any of the connectors are connected
<danvet> daniels, https://paste.debian.net/1202243/ like this?
<danvet> and a note in the commit message that the full driver stack/api discussion will be added in drm-uapi.rst later on?
<pcercuei> karolherbst: no, I only have card0-* stuff
<karolherbst> okay
dllud has joined #dri-devel
<pcercuei> it's a laptop so (AFAIK) the nvidia GPU is render-only
<karolherbst> pcercuei: ehh.. no
<karolherbst> there are plenty of laptops where the GPU is not render only
<karolherbst> it sounds stupid, but some use the nvidia GPU for displays
<karolherbst> except.. you use DP-MST via USB-C where the intel is in charge again :D
<karolherbst> anyway
<pcercuei> Ok. I believe you know that better than I do
<danvet> daniels, https://paste.debian.net/1202244/ end result
<karolherbst> I'd need the output of "lspci -tvv" and "grep . /sys/bus/pci/devices/*/power/control"
aravind has quit [Remote host closed the connection]
aravind has joined #dri-devel
<pcercuei> karolherbst: https://pastebin.com/raw/9kpzZJJq
<karolherbst> mhhhhhhhh
<karolherbst> pcercuei: dmesg as well
<karolherbst> could be that one bug we have.. but still checking. dmesg should be able to help
bcarvalho has joined #dri-devel
<karolherbst> *sigh*.. yeah... I think it's on snd_hda_intel
<karolherbst> pcercuei: I bet the 01:00.1 device is active? (cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_status)
<karolherbst> I think Roy hit this issue as well..
<pcercuei> correct
<karolherbst> yeah...
<karolherbst> pcercuei: mind joining #nouveau? then we can discuss with Rspliet
<pcercuei> How is snd_hda_intel related in any way to nouveau?
<pcercuei> Sure
<bbrezillon> danvet, lynxeye: while adding/testing a new panfrost IGT test I noticed something weird https://gitlab.freedesktop.org/-/snippets/2247, can you tell me if that's expected?
<danvet> no idea about drm/scheduler, but in i915 we have both
<danvet> where we let jobs linger after fd close (apparently there's some reason for it somewhere)
<danvet> and where we actively tear them all down
<danvet> I guess to make it actually race-free you'd need drm/scheduler support for teardown
<bbrezillon> ok, I thought drm_sched_entity_destroy() was taking care of that already
<danvet> mripard, thx
<mripard> you're welcome :)
<mripard> was it the only patch in that series?
<daniels> danvet: yep, ack, thankyou!
<danvet> mripard, yeah I think there all good now
<lynxeye> bbrezillon: Hm, you can't take jobs back that are already queued in the HW and your "scheduler pop'ed the job from the entity, but didn't quite submit it to HW yet" is just a corner case of this. What's the problem with letting the job execute at that point?
<bbrezillon> lynxeye: that's what we were doing, but stepri01 (who is not on this channel :-/) suggested that we kill in-flight jobs instead of letting them finish
stepri01 has joined #dri-devel
<lynxeye> bbrezillon: Can you actually kill jobs already on the HW runqueue, without any hickup?
<bbrezillon> lynxeye: IIRC, one argument was that we might want to relax the timeout on compute jobs at some point, and if we do that, we need to make sure processes that get killed get their GPU jobs killed too
<bbrezillon> lynxeye: yep, there's a HARD_STOP feature that's designed for that
<stepri01> yes Mali (Midgard onwards at least) supports killing jobs ('hard stop') without affecting other jobs
<stepri01> can't we just check in the run_job() callback if the entity has gone away and simply immediately fail the job at that point?
thellstrom has quit [Remote host closed the connection]
<danvet> bbrezillon, you can't relax the timeout much really
<danvet> there's some wiggle room
<danvet> but if you want real long-running compute, you need to be able to preempt them
<danvet> so that you can stuff other stuff in-between
<danvet> also, that means no more dma_fence for these compute jobs
haasn` has quit []
haasn has joined #dri-devel
<danvet> daniels can perhaps fill you in on a bunch of the glorious details
<haasn> Does DRM_FORMAT_GR88 map at all to Vulkan or EGL? I have a client who claims he only gets GR88 format buffers from some source (vaapi?), but I cannot see a way to actually specify this DRM format with either Vulkan or EGL when importing the dmabuf
<bbrezillon> stepri01: not easily, at least not without some sort of synchronization (because the entity will be freed at some point, and we need to make sure we compare job->entity to a valid pointer)
<mareko> anholt: hi, is there anything to do to move this forward? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11339
<stepri01> bbrezillon: what I really mean is not free the entity immediately, but mark it as dead. Then the normal submission logic can run, but instead of actually submitting the jobs just fail them immediately
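(A sketch of stepri01's suggestion, under the assumption that the entity is kept alive and flagged rather than freed; the stock drm_sched_entity has no such 'dead' flag, and the foo_* names are invented.)

```c
#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/* run_job() callback that fails jobs from a dead entity instead of
 * submitting them. 'dead' is a hypothetical flag set on file close,
 * while the entity itself stays allocated until it drains. */
static struct dma_fence *foo_run_job(struct drm_sched_job *sched_job)
{
	struct foo_job *job = to_foo_job(sched_job);

	if (READ_ONCE(job->entity->dead))
		return ERR_PTR(-ECANCELED);

	return foo_hw_submit(job);
}
```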
<bbrezillon> danvet: you said you had support for both options in the intel driver, why do you need the 'kill jobs immediately' solution there?
<lynxeye> bbrezillon: I don't see how your scenario is happening. drm_sched_entity_fini waits until the entity is idle, so at that point the jobs are either still on the entity runqueue and will be killed by the scheduler, or submitted to the HW queue. So killing the job on the HW queue should work at that point.
<lynxeye> I don't see that race window where the job is pop'ed from the entity, but not yet on the HW queue. At least if the code is working as intended.
<haasn> Or is the intent of dmabufs that the importing API (EGL, Vulkan) implementation can access dmabuf internals to disambiguate RG88 and GR88?
<haasn> I see some lines of code that seem to be doing that
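(For context: EGL_EXT_image_dma_buf_import takes the DRM fourcc verbatim, so DRM_FORMAT_GR88 is at least expressible at the API level; whether a given implementation accepts it is another question, which is what haasn is running into. A minimal sketch, with the dmabuf fd, size and stride assumed to come from the producer, e.g. vaapi:)

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>	/* libdrm header providing DRM_FORMAT_GR88 */

/* Import a single-plane GR88 dmabuf as an EGLImage; the fourcc is
 * passed through unchanged, so GR88 vs RG88 is unambiguous here. */
static EGLImage import_gr88(EGLDisplay dpy, int fd,
			    int width, int height, int stride)
{
	const EGLAttrib attrs[] = {
		EGL_WIDTH, width,
		EGL_HEIGHT, height,
		EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_GR88,
		EGL_DMA_BUF_PLANE0_FD_EXT, fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
		EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
		EGL_NONE,
	};

	return eglCreateImage(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
			      NULL, attrs);
}
```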
<danvet> bbrezillon, it started as a misguided, because incomplete, attempt at supporting long running compute jobs
<danvet> bbrezillon, we had preempt and killing jobs on fd close
<danvet> but still dma_fence, so you could abuse dma_fence on a long running compute job to hang the kernel in all kinds of funny places
<bbrezillon> lynxeye: unfortunately that's not what I see here, I added traces in the close and hw_submit path, and I see this sequence 1/enter drv->run_job() 2/loop over all in-flight jobs to kill those attached to the entity in the close path 3/continue in drv->run_job() and submit the job that was supposed to be destroyed
sdutt has joined #dri-devel
boistordu has quit [Ping timeout: 480 seconds]
RobertC has joined #dri-devel
<lynxeye> bbrezillon: Then I think that's a bug. drm_sched_entity_destroy should only return once all not-yet-submitted jobs are killed and the main scheduler thread submitted the job it got from the entity runlist.
<lynxeye> After this function returned the job should be submitted to the HW and you should find it for your HARD_STOP handling.
thellstrom has joined #dri-devel
<zmike> daniels: something like that I guess
<lynxeye> bbrezillon: Huh? Yea, the complete(&entity->entity_idle) in the scheduler main loop happens before the submit to HW. I think that should be moved to after the submit is done.
<bbrezillon> lynxeye: I can try that
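(The reordering under discussion, sketched against the rough shape of drm_sched_main() in drm/scheduler at the time, heavily abridged: the entity must not be declared idle until the job is really on the hardware queue, otherwise entity teardown can race with run_job() exactly as bbrezillon observed.)

```c
	/* abridged scheduler main loop, with the fix applied */
	sched_job = drm_sched_entity_pop_job(entity);
	if (!sched_job) {
		complete(&entity->entity_idle);
		continue;
	}

	fence = sched->ops->run_job(sched_job);
	complete(&entity->entity_idle);	/* moved from before run_job() */
```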
<ishitatsuyuki> I'm seeing what seems to be suboptimal divergence analysis in ACO and want to debug it. Is there any handy ACO_DEBUG option for this?
<pendingchaos> ishitatsuyuki: no
<ishitatsuyuki> ok, guess I'll just fiddle with the shader/ISA dump then
<mareko> zmike: did you mean that all commits are Rb? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11428#note_960894
<bbrezillon> stepri01: right, but the entity is part of the file_priv object which we free in the close path, so I'd prefer a solution where I don't need to refcount that one too :)
<stepri01> yes, obviously fixing this in the common code is even better :)
<pendingchaos> ishitatsuyuki: with https://pastebin.com/raw/Ksrqt7iZ and RADV_DEBUG=nocache,shaders,preoptir, you can see shaders with the results of divergence analysis below the "NIR shader before instruction selection:" line
<ishitatsuyuki> neat trick
<zmike> mareko: all except the one that's ab
<zmike> sorry, missed your comment
<zmike> my inbox is a disaster
blue__penquin has quit []
<bbrezillon> lynxeye: I can't be 100% sure, but it seems to fix the issue
<bbrezillon> at least I don't see those panfrost_job_run() traces after the entity_destroy() calls anymore
<lynxeye> bbrezillon: Yea, I think it's just that nobody was thinking about this race window, as it's not a problem for all drivers that just let jobs finish once they are on the HW runqueue. For those drivers "job is on HW runqueue" and "job is picked up by scheduler thread" is the same thing, so you can declare the entity idle earlier.
<lynxeye> If you really need to make sure the job is on the HW runqueue so you can find it there for killing, the idle completion needs to happen later. I don't think moving this complete after the submit has any downside for the other existing drivers.
<ishitatsuyuki> re: DA, DA was actually correctly marking the variable as uniform but VMEM loads were emitted instead, maybe someone changed it on purpose
<pendingchaos> if the memory read might have been written with a vmem store, ACO has to use vmem loads because smem uses a different cache
<pendingchaos> currently, ACO just checks if a potentially aliasing ssbo was written anywhere in the shader
<pendingchaos> also, smem can't be used for coherent/volatile loads on gfx6/7
luzipher__ has joined #dri-devel
<mlankhorst> danvet: looks like the prepare_fb/cleanup_fb could be a generic helper too, but I don't know what other drivers do in their fb functions
luzipher_ has quit [Ping timeout: 480 seconds]
cphealy has joined #dri-devel
thellstrom has quit [Quit: thellstrom]
<ishitatsuyuki> ended up solving the SMEM thing by removing writes from the shader
<ishitatsuyuki> it's probably the time to properly split my buffers by purpose instead of using a giant read-write one
<ishitatsuyuki> thanks pendingchaos, your guidance was very helpful
<bbrezillon> lynxeye: thx for your help with the sched 'bug/limitation'
sdutt has quit []
sdutt has joined #dri-devel
mattrope has joined #dri-devel
RobertC has quit [Ping timeout: 480 seconds]
NiksDev has joined #dri-devel
camus has joined #dri-devel
camus1 has quit [Remote host closed the connection]
pcercuei has quit [Quit: brb]
pcercuei has joined #dri-devel
<danvet> mlankhorst, lots more in that series is about making generic bits generic for prepare/cleanup_fb
<danvet> mlankhorst, so not sure what you're talking about
pcercuei has quit []
pcercuei has joined #dri-devel
bcarvalho_ has joined #dri-devel
bcarvalho_ has quit []
bcarvalho_ has joined #dri-devel
bcarvalho has quit [Read error: Connection reset by peer]
dllud has quit [Ping timeout: 480 seconds]
libv has joined #dri-devel
bcarvalho_ has quit []
bcarvalho has joined #dri-devel
libv_ has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
dllud has joined #dri-devel
pcercuei has quit [Quit: brb]
pcercuei has joined #dri-devel
<mlankhorst> ah k, I'll take a look at the whole series then
GloriousEggroll has joined #dri-devel
Duke`` has joined #dri-devel
<danvet> mlankhorst, can also look at drm-misc-next :-)
stepri01 has quit [Quit: leaving]
khfeng has quit [Ping timeout: 480 seconds]
bcarvalho_ has joined #dri-devel
bcarvalho has quit [Read error: Connection reset by peer]
gouchi has joined #dri-devel
thellstrom has joined #dri-devel
thellstrom1 has joined #dri-devel
thellstrom has quit [Ping timeout: 480 seconds]
aravind has quit [Remote host closed the connection]
cedric is now known as bluebugs
ngcortes has joined #dri-devel
ngcortes has quit [Remote host closed the connection]
elongbug has quit [Remote host closed the connection]
Peste_Bubonica has joined #dri-devel
pcercuei_ has joined #dri-devel
pcercuei has quit [Read error: Connection reset by peer]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
nsneck has quit [Ping timeout: 480 seconds]
lynxeye has quit [Quit: Leaving.]
gouchi has quit [Remote host closed the connection]
pcercuei_ has quit []
pcercuei has joined #dri-devel
thellstrom1 has quit []
nsneck has joined #dri-devel
thellstrom has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
pcercuei has quit [Quit: brb]
tzimmermann has quit [Quit: Leaving]
pcercuei has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
gouchi has joined #dri-devel
pcercuei has quit [Read error: Connection reset by peer]
pcercuei_ has joined #dri-devel
pcercuei_ has quit []
thellstrom has joined #dri-devel
alyssa has joined #dri-devel
<alyssa> zmike: Reading through the atom ordering stuff
pcercuei has joined #dri-devel
<alyssa> for hardware that has fixed-function for everything (vertex attributes, blend modes, render targets, etc -- that is, hw whose GL drivers do no variants)
<alyssa> is Zink happy there?
<zmike> zink is never happy
<alyssa> IIUC a lot of this "variants for everything!" is an AMD specific issue (and Apple..)
<zmike> are you reading the issue or the MR
<alyssa> the issue
<zmike> oh
<zmike> that's a waste of time
<alyssa> thanks, she says after finishing reading it
<alyssa> trying to understand the goals here, is all
<alyssa> I also suppose the way GL drivers are written differs from VK
<zmike> the goal for ordering is just to enable drivers to be able to use values from states as they come in instead of having to do everything at draw
<alyssa> E.g. on Mali (Bifrost), we can either bake blend state into the shader variant or do it as dynamic state
<alyssa> gallium panfrost does it always dynamic
<alyssa> vulkan panfrost, we're talking about always baking it in
<zmike> yes, and with this you would know that any time you get a blend state you're guaranteed to have your shader state already
<alyssa> so for blend modes, zink+panvk would be inherently worse than panfrost, but that's purely a design decision, not an inherent one
<alyssa> (and if Zink perf is something we want to optimize for, we can do the same dynamic state stuff in the vk side. It just really sucks.)
<alyssa> bbrezillon: ^
<zmike> zink+panvk is setting off my future ptsd again, pls avoid mentioning this
<alyssa> [I would rather not optimize for this but I'm also not on vk]
<alyssa> zmike: Did I mention yet that Collabora is hiring? ;-)
* zmike dives head first into a steam sale
cphealy has quit [Ping timeout: 480 seconds]
cphealy has joined #dri-devel
gouchi has quit [Remote host closed the connection]
gouchi has joined #dri-devel
* vsyrjala getting 5 KiB/s from steam. sale must be going well
alyssa has left #dri-devel [#dri-devel]
pnowack has quit [Quit: pnowack]
pcercuei has quit [Read error: Connection reset by peer]
<mdnavare> hwentlan: Has anyone tried using the VRRTest app developed by Nixola and looked at the target FPS and actual FPS measurements? I am trying to understand how the Lua scripts calculate the actual FPS in the VRR case with Vsync off, since in the VRR-enabled case that should be obtained from the flip-done events generated by the kernel
<vsyrjala> our code doesn't handle async flip + vrr. as step 1 we should just reject async flips when vrr is enabled
<mdnavare> vsyrjala: So in the async flip request we can check if the VRR prop is set and if so reject the flip?
<mdnavare> vsyrjala: Actually, what's confusing here is the Vsync-off mode in the VRR test app; not sure if that just sends async flips to the driver?
<vsyrjala> i would assume so
mwk has quit [Remote host closed the connection]
mwk has joined #dri-devel
<mdnavare> vsyrjala: So then in that case, we should be testing only with Vsync ON mode with this VRR Test app right?
<vsyrjala> yes. until we handle this correctly. i think i had some kind of idea how to handle this combination, but the details escape me right now
<mdnavare> vsyrjala: But should Async flips work with VRR ?
<mdnavare> ever?
<vsyrjala> i don't see much point in doing async+vrr, but it could be done
<vsyrjala> i should say that atm we'll just keep running at the lowest refresh rate if you do async flips
<vsyrjala> the "fix" would be to trigger pushes even for async flips, but that introduces some corner cases because we may now get new flips while the previous push is still pending
<vsyrjala> hence the easy thing to do is just refuse async flips for now
<mdnavare> vsyrjala: Yes I agree, but can we set up a separate call to discuss how we want to correctly handle, or decide never to handle, async flips + VRR
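(A sketch of the "step 1" vsyrjala describes: reject the combination during atomic check until async flip + VRR is handled properly. The field names follow i915 conventions, but the exact function and placement are assumptions.)

```c
/* Refuse async page flips while VRR is enabled on the CRTC. */
static int intel_async_flip_check_vrr(const struct intel_crtc_state *crtc_state)
{
	if (crtc_state->uapi.async_flip && crtc_state->vrr.enable)
		return -EINVAL;	/* unsupported combination for now */

	return 0;
}
```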
pcercuei has joined #dri-devel
ngcortes has joined #dri-devel
<anholt> danvet: in "drm/sched: Split drm_sched_job_init", it looks like you should have updated v3d and didn't?
rasterman has quit [Quit: Gettin' stinky!]
<danvet> anholt, ah yes, and that explains why you didn't get cc'ed on that
mlankhorst has quit [Ping timeout: 480 seconds]
mbrost_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: brb]
pcercuei has joined #dri-devel
yevhenii has joined #dri-devel
yk has quit [Read error: Connection reset by peer]
<anholt> danvet: honestly I'm surprised I even opened the mail, I want so little to do with email workflow. and especially the kernel.
gouchi has quit [Remote host closed the connection]
jewins has joined #dri-devel
flto has quit [Remote host closed the connection]
flto has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
ddavenport has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
camus1 has joined #dri-devel
NiksDev has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
orbea1 has joined #dri-devel
orbea1 has quit []
orbea1 has joined #dri-devel
ngcortes has quit [Remote host closed the connection]
orbea1 has quit []
orbea1 has joined #dri-devel
orbea has quit [Ping timeout: 480 seconds]
orbea1 has quit []
orbea has joined #dri-devel
CME has quit []
CME has joined #dri-devel
karolherbst has quit [Quit: Konversation terminated!]
Peste_Bubonica has quit [Ping timeout: 480 seconds]
ddavenport has quit [Remote host closed the connection]
Peste_Bubonica has joined #dri-devel
camus has joined #dri-devel
camus1 has quit [Read error: Connection reset by peer]
<zmike> anyone have an idea why this ci job has a 30 minute countdown timer showing up for...whatever this is ? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/11276347
<jenatali> It's downloading the source... very slowly
<zmike> clearly daniels twisting the pipe into a knot again
YuGiOhJCJ has joined #dri-devel
flto has quit [Quit: Leaving]
flto has joined #dri-devel
<ccr> someone playing Pipe Mania with CI?
pcercuei has quit [Quit: dodo]
karolherbst has joined #dri-devel
<daniels> it’s one of the GStreamer-provided runners sucking the Mesa git repo through a straw
<zmike> ah
bcarvalho_ has quit [Ping timeout: 480 seconds]
Peste_Bubonica has quit [Quit: Leaving]
ngcortes has joined #dri-devel
gpoo has quit [Ping timeout: 480 seconds]
libv has quit [Ping timeout: 480 seconds]
gpoo has joined #dri-devel
luzipher_ has joined #dri-devel
libv has joined #dri-devel
luzipher__ has quit [Ping timeout: 480 seconds]
andrey-konovalov has quit [Ping timeout: 480 seconds]