alanc-away has quit [Remote host closed the connection]
alanc-away has joined #dri-devel
mbrost_ has quit [Remote host closed the connection]
mbrost_ has joined #dri-devel
mbrost__ has joined #dri-devel
mbrost has joined #dri-devel
mbrost_ has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
mbrost__ has quit [Ping timeout: 480 seconds]
orbea has quit [Remote host closed the connection]
orbea has joined #dri-devel
Zopolis4_ has joined #dri-devel
itoral has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
Company has quit [Quit: Leaving]
bmodem has joined #dri-devel
mbrost has joined #dri-devel
mbrost_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
dcz has joined #dri-devel
tzimmermann has joined #dri-devel
camus1 has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
rasterman has joined #dri-devel
vliaskov has joined #dri-devel
camus has joined #dri-devel
frieder has joined #dri-devel
frieder has quit [Remote host closed the connection]
i-garrison has quit [Ping timeout: 480 seconds]
i-garrison has joined #dri-devel
kzd has quit [Quit: kzd]
jfalempe has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
pjakobsson has quit [Remote host closed the connection]
frieder has joined #dri-devel
Zopolis4_ has quit [Quit: Connection closed for inactivity]
frieder has quit []
frieder has joined #dri-devel
tursulin has joined #dri-devel
pcercuei has joined #dri-devel
kts has joined #dri-devel
urja has quit [Read error: Connection reset by peer]
urja has joined #dri-devel
Haaninjo has joined #dri-devel
<airlied>
Venemo: hey you might know, do mesh shader outputs get fixed-function clipped like any outputs from the vert stages?
Ziemas has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
vliaskov has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
pochu has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
swalker_ has joined #dri-devel
swalker_ is now known as Guest262
swalker__ has joined #dri-devel
<javierm>
danvet: I wrote the patch you suggested for mutter on Friday but have some issues with the compat layer; the damage clips are added to the plane's atomic state before committing
<javierm>
tzimmermann: probably you can help here too ^
mbrost_ has quit []
Guest262 has quit [Ping timeout: 480 seconds]
elongbug has joined #dri-devel
<bbrezillon>
danvet: With drm_sched_resubmit_jobs() being deprecated, I wonder who's supposed to set the job's parent field back to something non-NULL before drm_sched_start() is called (those are set to NULL in drm_sched_stop(), when the fence callback is removed). Are drivers supposed to manually iterate over the pending_list to set it up?
<mareko>
airlied: mesh shaders are just like vertex shaders, but are launched as compute shaders that must produce "VS outputs"
<emersion>
javierm: ah, so what's missing?
<javierm>
emersion: only a single thing that danvet mentioned, I've just asked Zack on the mailing list if he plans to post a v3 or if we could help with that
<javierm>
basically not using a DRIVER_VIRTUAL driver cap and instead adding a new plane type
<javierm>
a DRM_PLANE_TYPE_VIRTUAL_CURSOR or something like that, since other than the cursor, the virtual machine KMS drivers behave according to the uAPI contract
<javierm>
emersion: but I'm very happy that this is almost ready to land, I first thought that we would need to implement the whole thing to get virtio-gpu out of the mutter atomic deny list
YuGiOhJCJ has joined #dri-devel
<emersion>
hm, not sure a new plane type makes sense
<emersion>
you'd also need the not-fun part: igt
_xav_ has quit [Ping timeout: 480 seconds]
_xav_ has joined #dri-devel
djbw has quit [Ping timeout: 480 seconds]
mbrost_ has quit [Ping timeout: 480 seconds]
<danvet>
javierm, don't use dirtyfb and page_flip together
<danvet>
dirtyfb is for frontbuffer rendering, no page flip
<danvet>
and hence the fb check to make sure that you actually report damage for the current fb
<danvet>
page_flip is for a new buffer and has an implicit damage of everything
<danvet>
if you want both page flip and damage, you must use atomic
<danvet>
javierm, I don't have an account there, please add this to the mr ^^^
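For reference, the atomic path danvet describes looks roughly like this from userspace; a minimal libdrm sketch, assuming plane_id and the FB_ID / FB_DAMAGE_CLIPS property IDs (prop_fb_id, prop_damage here) were looked up beforehand and the plane is otherwise already configured:

    #include <errno.h>
    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int flip_with_damage(int fd, uint32_t plane_id, uint32_t fb_id,
                         uint32_t prop_fb_id, uint32_t prop_damage,
                         const struct drm_mode_rect *clips, uint32_t num_clips)
    {
        uint32_t blob_id;
        int ret;

        /* Damage rectangles are passed as a blob of struct drm_mode_rect. */
        ret = drmModeCreatePropertyBlob(fd, clips,
                                        num_clips * sizeof(*clips), &blob_id);
        if (ret)
            return ret;

        drmModeAtomicReq *req = drmModeAtomicAlloc();
        if (!req) {
            drmModeDestroyPropertyBlob(fd, blob_id);
            return -ENOMEM;
        }
        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
        drmModeAtomicAddProperty(req, plane_id, prop_damage, blob_id);
        /* One commit: new buffer plus its damage, page-flip event requested. */
        ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);
        drmModeAtomicFree(req);
        drmModeDestroyPropertyBlob(fd, blob_id);
        return ret;
    }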
<danvet>
bbrezillon, I'm lost, or well not on top of sched discussions enough to have any idea of what you should do :-/
<danvet>
emersion, btw on your hdr hackfest summary, unstable uabi isn't a thing in upstream
<emersion>
hm
<danvet>
all the "hide it behind knobs" we've done was for uapi we planned to keep forever, but weren't sure we had all the pieces and that the kernel+userspace stack was bug-free enough to unleash it on users
<danvet>
but uapi you plan to actually break/remove is a completely different kind of thing
<danvet>
unless you are very careful about not leaking users and make sure all users can fall back to something you won't get regression reports for
<danvet>
it'll be a regression and the experimental uabi becomes rather permanent
<emersion>
i'm personally in the "let's not merge vendor specific stuff" camp FWIW
<emersion>
but this is something we wanted to discuss on-list
<emersion>
maybe it's no big deal if it's stable
<danvet>
I'm no fan of vendor kms props either
<emersion>
harry seemed okay with maintaining this uAPI forever for this hw
<danvet>
your writeup at least sounded like that's the compromise because you can remove it again
<danvet>
and there's solid chances that's not how it'll play out
<emersion>
yeah… that's how we wanted to compromise
<danvet>
yeah if it's for some specific gpu just to get things going it might be ok
zzoon_2 has quit [Ping timeout: 480 seconds]
<danvet>
yeah that's not really how it works :-)
<emersion>
okay, good to know
<emersion>
yup, it would just be for AMD DCN 3
<emersion>
but then nothing stops somebody else from wanting to do the same thing
<emersion>
or expand to more hw
<emersion>
then things get into a meh state
<danvet>
yeah I think we need really solid reasons why we really can't pull off a good per-plane color mgmt api without going vendor specific at first
<emersion>
anyways, this is all mostly to deal with the generic uAPI being stuck
<emersion>
i really hope we can un-stuck it now
<emersion>
a lot of frustration seems to come from the cost of adding new uAPI
<emersion>
(from me first tbh)
<emersion>
we've also discussed mandatory vkms impl for new uAPI BTW
<emersion>
(which is at odds with the above)
<emersion>
(vkms not caught up enough for this to be practical just yet anyways)
<Venemo>
airlied: yes, mesh shader output rasterization works exactly the same as any other
<javierm>
danvet: ah, I see. So then it was my misunderstanding of how the legacy KMS API should be used. Thanks for the clarification, I'll just drop that MR and instead focus on using atomic for virtio-gpu
<danvet>
emersion, yeah I think vkms for blending uapi would be nice, so that you can validate the igts against something everyone can use
<emersion>
javierm: nice :)
<javierm>
emersion, danvet: it was meant to be a temporary workaround but I understand now that legacy KMS + damage clipping + page flip isn't a supported combination
<javierm>
emersion, danvet: about uAPI, I remember that in media/v4l2 at some point new features were merged but the uAPI just not exposed
<danvet>
yeah that'd be an option too
<javierm>
that seems to be a good compromise to at least experiment and try to find patterns between the different drivers to define a generic uAPI
<emersion>
uAPI not exposed?
<emersion>
how does that work?
<javierm>
emersion: people will need to run a patched kernel but it's better than keeping all the patches out-of-tree for a long time
<emersion>
ah, so just a patch to #define ENABLE_WHATEVER basically?
<emersion>
that's a bit weird
<javierm>
emersion: it is, yeah. But better than getting stuck with a uAPI that was defined too early
<emersion>
yea
<ccr>
what is needed is a time machine. travel to future to acquire the perfect uAPI and see what mistakes were made, etc.
anholt_ has joined #dri-devel
<bbrezillon>
danvet: any idea who I should ask?
<danvet>
könig? or whoever marked that function obsolete
<bbrezillon>
also tried hooking up native fence support, as you suggested last time, and it's not clear to me when the core can check the FW seqno against the drm_sched_job->parent->seqno
anholt has quit [Ping timeout: 480 seconds]
<bbrezillon>
danvet: yep, it's könig, but he doesn't seem to be on IRC
<danvet>
yeah könig's not on irc
<bbrezillon>
oh well, I'll write an email
invertedoftc096 has quit []
heat has joined #dri-devel
elongbug has quit [Remote host closed the connection]
<bbrezillon>
mbrost__: thanks, I think I had a version with the custom run_wq already
<emersion>
i'd like to check that i can grab the EDID at least
robmur01 has joined #dri-devel
<mbrost__>
cool, that should be the latest
<swick[m]>
emersion: A0h can contain either EDID or DisplayID, A4h can only contain DisplayID afaiu
<emersion>
i see
<swick[m]>
but A0h should be readable
<mbrost__>
I'll probably get all drm sched changes out today on the list
<emersion>
hm
CME has quit []
<bbrezillon>
mbrost__: drm_sched_main() still has a loop to dequeue jobs until there's a reason to stop dequeuing (dep not signaled, or all job slots filled). I was wondering if it wouldn't be better to dequeue one job at a time (or at least limit the number of jobs you dequeue) so that you don't block other work items scheduled on the same run_wq.
<bbrezillon>
That's particularly important if we want to use the same ordered (single-threaded) workqueue for both the job dequeueing and tdr, to guarantee that nothing tries to queue stuff to the HW while we're resetting it
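A rough sketch of the one-job-per-work-item pattern being described here (not the actual drm_sched code; my_sched, sched_dequeue_one() and sched_submit_to_hw() are hypothetical helpers):

    #include <linux/workqueue.h>

    struct my_job;
    struct my_sched {
        struct workqueue_struct *submit_wq;   /* possibly shared / ordered */
        struct work_struct submit_work;
    };

    static void my_sched_submit_work(struct work_struct *w)
    {
        struct my_sched *sched = container_of(w, struct my_sched, submit_work);
        struct my_job *job = sched_dequeue_one(sched);

        if (!job)
            return;     /* deps not signaled or all job slots filled */

        sched_submit_to_hw(sched, job);

        /* Re-queue ourselves instead of looping, so other work items on the
         * same workqueue (e.g. the TDR) get a chance to run in between. */
        queue_work(sched->submit_wq, &sched->submit_work);
    }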
swalker__ has quit [Remote host closed the connection]
<jenatali>
Down to 41 CTS fails :)
greenjustin_ has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
greenjustin has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
mszyprow has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
<mbrost>
bbrezillon, that is kinda tricky
<mbrost>
give me a few and let see what it looks like
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
<mbrost>
but let's say you use the system_wq, you can have as many cores on the machine running in parallel as long as each item is its own work_struct
<mbrost>
the system_wq is just a wrapper for a pool of kthreads I think
<mbrost>
so IMO I don't see each work_struct running in a loop while it has work as that big of a deal
tursulin has quit [Ping timeout: 480 seconds]
<mbrost>
Another option would be to pass a unique run_wq to each scheduler and now you get timeslicing
tlwoerner has quit [Quit: Leaving]
<mbrost>
if you have more work_struct than cores...
JohnnyonFlame has joined #dri-devel
<mbrost>
or pass in a few ordered wq to many schedulers
tlwoerner has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
<mbrost>
but either way let's prototype 1 dequeue per pass of the main loop if you think that would be better, I'm not going to argue this one either way
Guest304 has quit [Remote host closed the connection]
greenjustin_ is now known as greenjustin
<bbrezillon>
mbrost: so, multithreaded workqueue improves the situation, yes, but ckönig was actually suggesting using a single-threaded workqueue, so we don't have to call drm_sched_stop/start() to stop/start the drm_sched while we're resetting the GPU
<mbrost>
well you can pass in any run wq you want but I'm confused how you could get away with not calling start / stop
<mbrost>
the existing code calls start / stop on the kthread
<mbrost>
we call start / stop in our reset flows and they're rock solid, esp when compared to the i915
<mbrost>
do you mean the same ordered wq as the reset one / tdr?
<bbrezillon>
well, if you have a single-threaded wq for your drm_sched workers, and you queue your reset worker to this queue, you're guaranteed that the reset function will only be executed when all drm_sched workers are idle
<bbrezillon>
yes, that's what ckönig suggested, if I'm correct
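Sketched out, that ordered-workqueue arrangement looks something like this (names illustrative): an ordered wq runs at most one item at a time, so queuing the reset work on the same wq as the submission work serializes them without extra locking.

    #include <linux/errno.h>
    #include <linux/workqueue.h>

    /* One ordered (single-threaded) workqueue shared by job submission and
     * the reset handler. */
    static struct workqueue_struct *sched_wq;

    static int my_sched_wq_init(void)
    {
        sched_wq = alloc_ordered_workqueue("my-sched", 0);
        return sched_wq ? 0 : -ENOMEM;
    }

    /* Because sched_wq executes at most one work item at a time, the reset
     * work only starts once any in-flight submission work item has finished,
     * and no submission work runs while the reset work is executing. */
    static void my_sched_queue_submit(struct work_struct *submit_work)
    {
        queue_work(sched_wq, submit_work);
    }

    static void my_sched_queue_reset(struct work_struct *reset_work)
    {
        queue_work(sched_wq, reset_work);
    }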
<mbrost>
yea that is true
<mbrost>
nothing's preventing you from doing that now, but yea in that case i guess dequeuing 1 item would make a bit more sense
<mbrost>
i don't think I want to design Xe to share a WQ though, calling start / stop is easy enough
<bbrezillon>
mbrost: ok. I guess we need a replacement for drm_sched_resubmit_jobs() then...
<bbrezillon>
didn't check what you were using in the Xe driver
<mbrost>
and like I said our reset flows are rock solid, I've written tests to hammer these paths as I was super paranoid about this not working after working on the i915
<mbrost>
we just kill the entity scheduler if it hangs
<bbrezillon>
drm_sched_stop()
<bbrezillon>
I guess
<mbrost>
no recovery, I think this is faith's idea for all VM bind drivers
<mbrost>
We have buy-in from our UMDs
<mbrost>
1 sec, i'll point you to our TDR callback
<bbrezillon>
you mean remaining jobs are just cancelled?
<mbrost>
that function should be pretty well commented
alanc-away has quit []
alanc has joined #dri-devel
<bbrezillon>
hm, okay. So that's for the 'one entity is going crazy, but the GPU and FW-proc are still functional' case
<mbrost>
yes
<bbrezillon>
we do have global hangs where we need to stop all entities, reset the GPU and FW, and start all entities (and by start, I mean kick the previously active entities at the FW level, and call drm_sched_start())
<mbrost>
We also have GT resets (entire GPU is trashed) but in practice that should be impossible or very, very rare
<mbrost>
in that case, we try to find the bad entity, ban it, resubmit the others
<bbrezillon>
that's where synchronization gets messy, because the reset operation, which is always queued on a workqueue, has to wait for all drm_sched workers to be idle
<mbrost>
A GT reset basically should only be triggered by junk hardware or if our KMD has a bug
<mbrost>
yea, let me point you to our GT reset code...
<bbrezillon>
mbrost: would you mind replying to Christian regarding the drm_sched_{stop,start}()? I feel like your opinions are diverging here, and more and more drivers keep depending on your work ;-)
<mbrost>
yea we do have to loop over every entity and stop it
<bbrezillon>
yep, that's what we were planning to do initially
<mbrost>
there is nothing in the current interfaces stopping you from doing it either way
<bbrezillon>
it's just that, with drm_sched_resubmit_jobs() being deprecated, I was a bit lost as to what drivers were supposed to do to reassign the drm_sched_job::parent fence
<mbrost>
that is news to me it being deprecated
Duke`` has joined #dri-devel
<mbrost>
that seems like a unilateral decision, drivers should be allowed to do this either way
<mbrost>
ugh... I need to pay better attention to the list
<bbrezillon>
was merged in 6.3-rc1 apparently
<mbrost>
Xe doesn't need most of the nonsense in that function anyways
<mbrost>
really all we need is to loop over pending jobs and call run_job
<bbrezillon>
well, the only thing we'd need in the powervr driver is a loop re-assigning the parent fence, we don't even need to call run_job() (the ringbuf should already be filled and its content preserved, unless I'm mistaken)
<mbrost>
ok maybe we just open code the resubmit
<bbrezillon>
just felt odd to iterate over the pending_list manually
<mbrost>
in Xe the parent fence would be the same anyways
<bbrezillon>
but if that's how it's supposed to be done, I'm fine with that
<bbrezillon>
yep, same here, we don't re-allocate the fence
<bbrezillon>
it's the same object, same seqno
<mbrost>
and ref count is still correct too
<bbrezillon>
in any case, we should document what drivers are expected to do between the stop and start calls, because it's unclear right now
<bbrezillon>
like, the bare minimum is to re-assign parent fences, otherwise pending jobs are dropped when drm_sched_start() is called
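Something like the following is presumably that bare minimum, done between drm_sched_stop() and drm_sched_start(); a hedged sketch, where my_job_hw_fence() is a hypothetical driver helper returning the preserved HW fence (same object, same seqno), and the "parent" fence discussed above is reached via the job's scheduler fence in current mainline:

    #include <drm/gpu_scheduler.h>
    #include <linux/dma-fence.h>

    /* Called after drm_sched_stop() and before drm_sched_start(): put the HW
     * fences back so drm_sched_start() can re-arm its completion callbacks
     * instead of treating the pending jobs as already finished. */
    static void my_driver_reattach_parent_fences(struct drm_gpu_scheduler *sched)
    {
        struct drm_sched_job *s_job;

        list_for_each_entry(s_job, &sched->pending_list, list) {
            /* Ring content survived the reset, so the original fence
             * (same object, same seqno) is still the right one. */
            s_job->s_fence->parent = dma_fence_get(my_job_hw_fence(s_job));
        }
    }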
<mbrost>
again I don't think we should force a driver to do anything
<mbrost>
I'm confused by that last statement
<bbrezillon>
well, it's not about forcing them to do anything, but if you call drm_sched_start() after having kicked the queue, your driver is likely to end up with use-after-free bugs
<mbrost>
in Xe you can call run_job as many times as you want, you always get the same fence back
<bbrezillon>
at least that's what happens if you keep in-flight jobs in some internal list/queue
<mbrost>
which is assigned to the parent
<mbrost>
it is a fence which looks at a memory location for a value to be greater than or equal to the job's seqno
<mbrost>
Seems to work just fine, it would be fun if we had some benchmarks to see if this made a difference
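For illustration, a seqno-comparison fence along those lines could look roughly like this (a sketch, not the actual Xe code; struct and field names are made up):

    #include <linux/dma-fence.h>
    #include <linux/io.h>

    struct my_hw_fence {
        struct dma_fence base;
        u32 __iomem *seqno_loc;  /* memory location the FW/HW writes */
        u32 wait_seqno;          /* this job's seqno */
    };

    /* dma_fence_ops.signaled: the fence is done once the value at seqno_loc
     * has reached (>=) the job's seqno; signed subtraction handles wrap. */
    static bool my_hw_fence_signaled(struct dma_fence *f)
    {
        struct my_hw_fence *fence = container_of(f, struct my_hw_fence, base);

        return (s32)(readl(fence->seqno_loc) - fence->wait_seqno) >= 0;
    }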
<jenatali>
anholt_: <3
<zmike>
anholt++
<anholt_>
I swear we need to have static int called = 0; assert(called++ < 1000) in piglit_probe_pixel.
<jenatali>
Sounds like a good idea to me
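A self-contained version of that guard, for whoever wants to try it (the 1000 limit and the helper name are just illustrative); it would be called at the top of piglit's single-pixel probe helpers:

    #include <assert.h>

    /* Trip an assert after too many single-pixel probes, to flag tests that
     * should be using a bulk/rect probe instead. */
    static void probe_pixel_call_budget(void)
    {
        static int called = 0;

        assert(called++ < 1000);
    }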
thellstrom has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
<DemiMarie>
danvet emersion: What about hiding the uAPI behind BROKEN?
<DemiMarie>
I assume no distro will ship a CONFIG_BROKEN=y kernel, and if they do (or patch out the dependency) they get to keep both pieces.
stuarts has joined #dri-devel
<emersion>
DemiMarie: what is CONFIG_BROKEN?
ngcortes has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
<DemiMarie>
emersion: It is a catchall, never-enabled Kconfig entry used to disable broken code (hence the name). If there is a Kconfig option that should never actually be set, one can use `depends on BROKEN` to make sure it is in fact never set.
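In Kconfig terms the pattern described above would look something like this (the symbol name is made up):

    config DRM_SOME_EXPERIMENTAL_UAPI
            bool "Some experimental DRM uAPI (can never be enabled)"
            depends on BROKEN
            help
              BROKEN has no prompt and is never selected, so this option (and
              the uAPI behind it) stays compiled out unless the dependency is
              patched away.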
<emersion>
i see
dcz has joined #dri-devel
Zopolis4_ has joined #dri-devel
mbrost_ has joined #dri-devel
ngcortes has joined #dri-devel
mszyprow has joined #dri-devel
mbrost__ has joined #dri-devel
konstantin_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
gawin has quit [Quit: Konversation terminated!]
konstantin has quit [Ping timeout: 480 seconds]
mbrost_ has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
pallavim has quit [Ping timeout: 480 seconds]
apinheiro has quit [Quit: Leaving]
mszyprow has quit [Ping timeout: 480 seconds]
mbrost__ has quit [Ping timeout: 480 seconds]
i509vcb has joined #dri-devel
pallavim has joined #dri-devel
mbrost has joined #dri-devel
mbrost_ has joined #dri-devel
mbrost_ has quit [Remote host closed the connection]
mbrost_ has joined #dri-devel
iive has quit [Quit: They came for me...]
mbrost has quit [Ping timeout: 480 seconds]
RSpliet has quit [Quit: Bye bye man, bye bye]
RSpliet has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
alatiera2 has joined #dri-devel
alatiera has quit [Ping timeout: 480 seconds]
alatiera2 is now known as alatiera
<airlied>
zmike: you got any clues on the lvp memory model fails that CI is seeing?
<penguin42>
TToTD: If you hit a PROFILING_INFO_NOT_AVAILABLE it might be because you missed a wait() on an event - it took me a while to figure that out
<penguin42>
especially since ROCm didn't complain about it
<zmike>
airlied: I had a ticket where I was tracking all the known fails
<zmike>
if it's not in there I have no idea
<airlied>
ah there was a flake lodged by DavidHeidelberg[m]
<DavidHeidelberg[m]>
?
<airlied>
you logged a lavapipe flake a while back, seems to be more prevalent now, will have to make the effort to track it down I suppose
FireBurn has joined #dri-devel
<airlied>
zmike: just saw 20520 fell through the cracks
<DavidHeidelberg[m]>
right
<jenatali>
Anybody know who I should ping to get opinions on !22800?
<airlied>
jenatali: whoever wrote your qsort :-P
<jenatali>
Heh, blame the C standard authors for not requiring it to be stable :P
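The usual workaround (a generic sketch, not necessarily what !22800 does) is to make the comparator itself stable by carrying the original index along as a tie-breaker:

    #include <stdlib.h>

    struct keyed {
        int key;
        size_t orig_index;      /* filled in before sorting */
    };

    static int cmp_stable(const void *pa, const void *pb)
    {
        const struct keyed *a = pa, *b = pb;

        if (a->key != b->key)
            return (a->key > b->key) - (a->key < b->key);
        /* Equal keys: fall back to the original order. */
        return (a->orig_index > b->orig_index) - (a->orig_index < b->orig_index);
    }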
<zmike>
airlied: oh I forgot about that
<zmike>
I'm not actually sure it's correct now that I look again
oneforall2 has quit [Remote host closed the connection]
alanc has quit [Remote host closed the connection]
<airlied>
Venemo, mareko : do mesh shaders interact with transform feedback?
<airlied>
ah d3d12 doesn't seem to support it at least
<jenatali>
Right
greenjustin has quit [Ping timeout: 480 seconds]
<jenatali>
Huh, actually having working caches seems to have taken ~20 minutes off my CTS run time. I'll take it
<jenatali>
Maybe it means I can bump the CI factor from 4 to 3...