ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
co1umbarius has quit [Ping timeout: 480 seconds]
nchery is now known as Guest189
nchery has joined #dri-devel
co1umbarius has joined #dri-devel
Guest189 has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
alyssa has left #dri-devel [#dri-devel]
aravind has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
hch12907 has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
apinheiro has quit [Ping timeout: 480 seconds]
sdutt has quit [Ping timeout: 480 seconds]
sdutt has joined #dri-devel
aravind has joined #dri-devel
heat has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
<robclark>
the problem with user signaled fences is not *just* memory management... userspace can also indefinitely block atomic commits from the wq, and eventually (depending on kernel config) things will reboot due to a hung task in the kernel.. we occasionally have that problem in CrOS when there are compositor bugs (because of sw_sync and how it is used to paper over android<->wayland<->compositor impedance mismatches), ie. you can end up w/ the gpu
<robclark>
waiting on a fence that never signals, and an atomic commit waiting on a fence that the gpu would have signaled if it wasn't stuck, etc.. it can result in fence dependency chains that are hard to understand and look like kernel bugs when in fact they are not
<HdkR>
Just convert surfaceflinger to Wayland. EZ PZ ;)
mwk has quit [Remote host closed the connection]
mwk has joined #dri-devel
<robclark>
heh, if android were sane, things would be much easier
<HdkR>
So true
<graphitemaster>
airlied, Thanks for the link.
<graphitemaster>
I wish more was known about how WDDM and the graphics stack works on Windows
<graphitemaster>
It does feel like they nailed that design, because all issues pertaining to sync, preemption, and hw scheduling appear solved there.
<graphitemaster>
They just have the usual WSI and UI issues of HDR and DPI that everyone else has.
<robclark>
I think if we had two things, user signaled fences could *perhaps* be sane: (1) some sort of way to dump out the chain of fence dependencies when something goes wrong, which means dma-fence needs to somehow know when signaling fence B depends on fence A being signaled, and (2) some sort of reasonably short (couple seconds at most) give-up timer in the kernel that goes ahead and signals unsignaled user fences for userspace
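A minimal sketch of the "give-up timer" idea in (2), as hypothetical kernel code: struct user_fence, user_fence_giveup() and user_fence_install_giveup_timer() are invented names for illustration and not existing DRM API (fence init/ops and locking are omitted).

    /* Hypothetical sketch only: force-signal a userspace-controlled fence after
     * a bounded timeout so dma_fence waiters (atomic commits, GPU jobs) can make
     * forward progress even if userspace never signals it. */
    #include <linux/dma-fence.h>
    #include <linux/jiffies.h>
    #include <linux/timer.h>

    struct user_fence {
            struct dma_fence base;          /* assumed initialized elsewhere */
            struct timer_list giveup;       /* fires if userspace never signals */
    };

    static void user_fence_giveup(struct timer_list *t)
    {
            struct user_fence *uf = from_timer(uf, t, giveup);

            /* Signal on behalf of the stuck userspace. */
            if (!dma_fence_is_signaled(&uf->base))
                    dma_fence_signal(&uf->base);
    }

    static void user_fence_install_giveup_timer(struct user_fence *uf,
                                                unsigned long timeout_ms)
    {
            timer_setup(&uf->giveup, user_fence_giveup, 0);
            mod_timer(&uf->giveup, jiffies + msecs_to_jiffies(timeout_ms));
    }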
<robclark>
graphitemaster: preemption is an orthogonal issue.. it "just" needs a combination of driver and hw support, not really anything in terms of core framework
<airlied>
graphitemaster: wddm rebooted their ecosystem, so they didn't have to deal with it
<airlied>
it's a bit hard to do that on Linux, land of compositor choice :-P
<airlied>
robclark: I think as long as you have compositors checking fence status on the cpu before using submitted buffers things should work
<airlied>
it's just a lot of compositors don't work like that
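A rough sketch of the check airlied describes, assuming the compositor already holds the buffer's acquire fence as a sync_file-style fd (how it obtains that fd is out of scope): poll the fd with a zero timeout and skip the buffer this frame if the fence hasn't signaled yet.

    #include <poll.h>
    #include <stdbool.h>

    /* Returns true if the fence behind this fd has already signaled.
     * sync_file fds report POLLIN once their fence signals, and a zero
     * timeout means the compositor never blocks here. */
    static bool fence_fd_signaled(int fence_fd)
    {
        struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
        return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
    }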
<graphitemaster>
I was told here by others that preemption requires a replumb of the entire Linux graphics stack, so I feel like it's only orthogonal in specifics but not in the scope and amount of work necessary to get it, as is the case with explicit sync here. Plus there are subtle areas of overlap with it, I believe, like the notion of long-running compute (and draw dispatches) which do complete but shouldn't just be torn down by the kernel as
<graphitemaster>
it signals unsignaled fences (in your (2) want there), i.e. some sort of queue flag that says "don't signal unsignaled fences to avoid deadlocks, preempt instead"
<robclark>
I mean, that isn't going to help if you have long running shaders (like compute things if the hw doesn't have an independent ring for them).. that actually needs a combination of driver and hw support to preempt a running draw/grid
<robclark>
ignoring that problem, it is just a compositor issue, not a driver issue ;-)
mhenning has joined #dri-devel
<airlied>
robclark: well you definitely want separate compute queues for long running tasks, otherwise it's nuts
<robclark>
until you encounter a creative shadertoy, then even that doesn't help ;-)
<graphitemaster>
Sure, there's more work involved. Ideally what I want (which is not strictly part of any API yet) is something in Vulkan that exposes the handling of deadlocks to the user as a queue initialization flag. So you have the DONTCARE type, which treats work on that queue how it currently is, or whatever the default should be; a TIMEOUT type, which is sort of like TDR (but also for compute), so this would be the kernel signaling
<graphitemaster>
unsignaled fences and tearing down / resetting the context if need be (with a user-defined timeout, but a min timeout policy in the kernel that you cannot go below); and then a third option type of PREEMPT, which has basically the same semantics as TIMEOUT except it doesn't tear down the context, it just preempts. These queue types/options would be exposed on a case-by-case basis by driver+hw support
<graphitemaster>
So if devs want PREEMPT, they query support for it, then can initialize their queue that way and begin recording commands into it and get those semantics.
<graphitemaster>
The spec would then put a min requirement on DONTCARE and TIMEOUT, with PREEMPT being something only supported by modern GPUs and drivers that put the effort in.
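Sketched as a hypothetical Vulkan extension, the queue flag being described might look roughly like the following; nothing like this exists in the spec today, and every type, enum value, and field name below is invented for illustration.

    #include <vulkan/vulkan.h>

    /* Hypothetical: how a queue would ask the driver/kernel to handle hangs. */
    typedef enum VkQueueHangPolicyEXT {
        VK_QUEUE_HANG_POLICY_DONT_CARE_EXT = 0, /* today's default behaviour */
        VK_QUEUE_HANG_POLICY_TIMEOUT_EXT   = 1, /* TDR-like: signal fences and
                                                   reset/tear down the context */
        VK_QUEUE_HANG_POLICY_PREEMPT_EXT   = 2, /* preempt instead of reset;
                                                   only where hw/driver allow */
    } VkQueueHangPolicyEXT;

    /* Hypothetical pNext struct chained into VkDeviceQueueCreateInfo. */
    typedef struct VkDeviceQueueHangPolicyCreateInfoEXT {
        VkStructureType      sType;     /* an invented EXT sType value */
        const void          *pNext;
        VkQueueHangPolicyEXT hangPolicy;
        uint64_t             timeoutNs; /* clamped by a kernel min/max policy */
    } VkDeviceQueueHangPolicyCreateInfoEXT;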
<jekstrand>
graphitemaster: Oh, there are lots of problems they have. :)
<robclark>
What happens when some rude process sets a very long timeout ;-)
<jekstrand>
Don't get me wrong. WDDM2 is definitely a step forward and better than what we have on most of Linux today.
<jekstrand>
But to say they solved all the problems is a bit much.
<karolherbst>
robclark: don't let processes do it :P
<graphitemaster>
robclark, min **and** max timeout policy in the kernel most likely :D
<graphitemaster>
Set some reasonable defaults
<karolherbst>
OpenCL kind of has this issue and the solution is: don't do it in the kernel
<karolherbst>
or on hw even
<jekstrand>
The solution to OpenCL's problems is mid-kernel preemption
<karolherbst>
that's not what I meant
<karolherbst>
I meant user signalled fences
<robclark>
what is a reasonable default, though.. it kinda depends on the use-case, which isn't a thing the kernel knows
<jekstrand>
Or userspace
<jekstrand>
At least not in drivers
<graphitemaster>
I mean TDR is like 5 seconds on Windows right?
<robclark>
the only "true" soln is preemption.. and that can be "hard"
<karolherbst>
so in CL the application can signal events/fences, good luck with coming up with anything reasonable here
<jekstrand>
graphitemaster: I think that's the default but there's registry keys for it
<robclark>
I mean 1/5 fps isn't great, right?
<graphitemaster>
I mean you could degrade anything taking longer to use PREEMPT by default if the HW supports it
<graphitemaster>
So if you hit TDR, switch queue to PREEMPT
<karolherbst>
and if the hw doesn't?
<graphitemaster>
Then do what is currently done, reset
<karolherbst>
which isn't legal in the CL world :P
<karolherbst>
all I try to say here is, there are use cases we can't solve this way
<karolherbst>
and sometimes stuff can run for minutes on purpose
<graphitemaster>
OpenCL kernels can be massaged to be made reentrant and preemptive at the actual runtime layer though.
<karolherbst>
that's kernel stuff
<karolherbst>
but what about user fences?
<graphitemaster>
Oh I dunno about user fences :|
<karolherbst>
well, the application can control certain events :)
<karolherbst>
I've talked with jekstrand about it and whether all this fancy kernel stuff could be used for it. In the end the only reasonable answer is: no
<karolherbst>
it won't work
<graphitemaster>
It might just be time to put all this stuff in a sarcophagus and define new things that only apply to new APIs and extensions, screw back compat.
<graphitemaster>
Sort of what Windows did with WDDM
<karolherbst>
yeah
<graphitemaster>
jekstrand, Well if it does have issues, they've done a remarkably good job hiding them from user-space and developers of applications.
<graphitemaster>
While everything in the Linux graphics stack seems to constantly butt heads with developers writing applications.
<graphitemaster>
Which is a terrible leaky abstraction I might add.
<karolherbst>
anyway... for CL you can split up work in smaller pieces and just execute the same kernel multiple times, which is good enough (tm)
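A minimal sketch of that splitting idea using the standard OpenCL host API; queue and kernel are assumed to already be set up, and the chunk size is just an illustrative knob.

    #include <CL/cl.h>

    /* Run the same kernel over slices of a 1D global range so that no single
     * submission runs long enough to look like a hang. */
    static cl_int run_in_chunks(cl_command_queue queue, cl_kernel kernel,
                                size_t total_items, size_t chunk_items)
    {
        for (size_t off = 0; off < total_items; off += chunk_items) {
            size_t offset = off;
            size_t count  = total_items - off < chunk_items ?
                            total_items - off : chunk_items;
            cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1,
                                                &offset, &count, NULL,
                                                0, NULL, NULL);
            if (err != CL_SUCCESS)
                return err;
            clFinish(queue); /* bound how much work is outstanding at once */
        }
        return CL_SUCCESS;
    }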
<jekstrand>
graphitemaster: Don't get me wrong, WDDM2 is much better
<jekstrand>
But it's not flawless. :)
<graphitemaster>
Nothing is flawless.
<jekstrand>
But, frankly, if we could get there on Linux, I wouldn't worry too much about also trying to fix the flaws. It'd still be way better than where we are.
<graphitemaster>
karolherbst, This is what I had to do at work with GLSL compute shaders: basically, instead of running my whole sim frame, I'm measuring each dispatch, dispatch indirect, draw, and draw indirect, and trying to keep the actual calls for a sim step below 10ms so I have 6ms for rendering, to keep things running at 60fps and the desktop from locking up. It's so much additional complexity and work, but it does work. There's a whole
<graphitemaster>
time prediction thing in there too since the queries take a couple frames to return their results.
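A simplified sketch of that measuring approach with standard GL timer queries, reduced to one timed dispatch per frame; the ring of query objects reflects that results arrive a few frames later, and the 10ms budget is the number from the chat.

    #include <epoxy/gl.h>
    #include <stdbool.h>

    #define QUERY_RING 4                 /* results lag by a couple of frames */
    static GLuint   queries[QUERY_RING]; /* created once with glGenQueries() */
    static unsigned frame;

    /* Wrap one compute dispatch in a GL_TIME_ELAPSED query. */
    static void timed_dispatch(GLuint x, GLuint y, GLuint z)
    {
        GLuint q = queries[frame % QUERY_RING];
        glBeginQuery(GL_TIME_ELAPSED, q);
        glDispatchCompute(x, y, z);
        glEndQuery(GL_TIME_ELAPSED);
    }

    /* A few frames later: did the measured work fit in its ~10 ms budget? */
    static bool dispatch_within_budget(void)
    {
        GLuint q = queries[(frame + 1) % QUERY_RING]; /* oldest entry in the ring */
        GLint available = 0;
        glGetQueryObjectiv(q, GL_QUERY_RESULT_AVAILABLE, &available);
        if (!available)
            return true;                 /* no result yet, assume it was fine */

        GLuint64 ns = 0;
        glGetQueryObjectui64v(q, GL_QUERY_RESULT, &ns);
        return ns < 10ull * 1000 * 1000; /* 10 ms out of the 16 ms frame */
    }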
<graphitemaster>
It's totally unnecessary on Windows, but it does a better job than Windows' preemption so I'm just doing it by default now.
<karolherbst>
graphitemaster: well for compute we don't need to have such tight schedules. But often those kernels can also be super huge...
<karolherbst>
or well.. not even huge, just stupid
<karolherbst>
one benchmark just copies TB of memory within a loop :)
<graphitemaster>
We're a GL application, it's remarkable that at least Windows NV is able to run compute separate from draws ...
<graphitemaster>
Relying on a lot of driver magic there.
<graphitemaster>
AMD runs at 1/38th the speed
<graphitemaster>
The preemption helps.
<karolherbst>
yeah.. new hw being able to preempt is really nice, even though there are really strict rules to that
<graphitemaster>
Anyways I do think that if users could control scheduling of work queues based on their needs rather than the dumb thing we have now, at least in APIs like Vulkan where the norm is "yeah you can BSOD or black screen a running OS from user-space, we're not safe or secure" ... it would feel at home and give developers the control they actually want. It's already bad enough that the only control we have in Vulkan is separation of compute and
<graphitemaster>
graphics commands to achieve some magical "async compute", which is not even guaranteed either, since there are a ton of drivers that only report a combined graphics+compute queue (NV)
<karolherbst>
yep
<karolherbst>
it's not for free
<graphitemaster>
There should be queue priorities, scheduling types (deadline, immediate, deferred, preempt, timeout), and control of those timeouts, etc.
<karolherbst>
makes the hw more complex
<karolherbst>
context switching is really expensive, and on nv hw it's an opt in feature
<jekstrand>
scheduling isn't the biggest problem. Preemption and dealing with long-running jobs is.
<graphitemaster>
I mean you only need one in the hw, instruction-level preemption, you can do all other forms of scheduling in software on top.
<karolherbst>
ehh mid kernel/shader preemption is opt in I mean
<jekstrand>
graphitemaster: Hah, you say instruction-level like that's the easy one.
<karolherbst>
graphitemaster: bye bye performance
<karolherbst>
we should stop trying to do CPU like things on GPUs :P
<graphitemaster>
The hardware can do a lot of the work of context management; a full shadow copy of all state for a context is probably remarkably simple to do with modern GPU designs, where there's a lot of video memory and now things like direct storage to SSDs. I would say to hell with the idea of preemption on GPUs with limited memory / shared memory and tilers. I'd keep preemption strictly a desktop-class feature.
<graphitemaster>
I only care about BFGPUs :P
<karolherbst>
:D
<karolherbst>
if it would only be that simple
<karolherbst>
we got firmware doing just context switching
<karolherbst>
if that's what you mean by "the hardware can" sure, if not, then well.. ;)
<graphitemaster>
I think for draws it's probably quite expensive, there's a lot of state there.
<graphitemaster>
But for compute work I feel like the amount of state needed is probably less than actual modern x86 CPUs today
<karolherbst>
it's probably simpler for compute, yes
<karolherbst>
I wouldn't say it's not much, but
<icecream95>
On at least Mali GPUs you could probably preempt fragment jobs just by disabling anything that hasn't been rendered yet in the tile enable map
<karolherbst>
but for compute we can do different things. Worst case we schedule one block at a time and check if we should switch over to graphics or something :P
LexSfX has quit []
sdutt has quit [Ping timeout: 480 seconds]
<robclark>
icecream95: modern adreno has kinda two levels of preemption, the "small hammer" cooperative preemption in between tile passes, and the "big hammer" which is the fallback if you don't reach end of tile pass in time, which involves saving/restoring all the gpu state as well as gmem (tile buffer) save/restore.. although there is some not-insignificant driver work to support both modes. The latter is defn way more expensive than
<robclark>
cpu task preemption ;-)
<robclark>
there is quite a bit of sqe fw involved in the latter
<graphitemaster>
I keep hearing about the existence of pre-emption in some hardware and drivers on Linux
<graphitemaster>
Yet I've yet to find a desktop Linux configuration that doesn't lock up the moment you run anything on it that eats a lot of GPU time.
<graphitemaster>
So I'm still strongly in the camp of "it doesn't exist", because no one can prove to me it exists, sort of like other written-about things that don't have proof :P
<robclark>
like I said, newer hw + fw can do it.. but there is a lot of driver work needed too.. that is the missing piece.. and absolutely nothing at drm framework level needed ;-)
<graphitemaster>
By driver work, do you mean KMD for the GPU or mesa here?
<robclark>
some desktop gpu's make it easier in the more limited case of compute vs 3d by having separate rings where 3d can preempt compute.. but that only solves the more limited problem of long running compute jobs.. (but tbf that is the easier problem to tackle since *way* less state to save/restore)
<graphitemaster>
Or both
<robclark>
both
LexSfX has joined #dri-devel
<graphitemaster>
Right
<graphitemaster>
How exactly does GPU virtualization work on Linux at all if you do not have pre-emption?
<airlied>
SRIOV
<airlied>
they just parcel out the hw units
<airlied>
I'm surprised the nvidia driver doesn't get it right on linux
<graphitemaster>
Would it be possible to solve pre-emption in a similar way, parcel out the hw units per queue :P
<graphitemaster>
Run the whole OS and driver stack per application lol
<robclark>
virtualization is also more or less orthogonal.. (but the answer to your question is more or less: either (a) api level virtualization which sucks for performance or (b) vendor specific soln... I've been spending more of my time on making virt work decently, otherwise I might have been spending time on preemption ;-))
<robclark>
airlied: *if* the hw supports partitioning like that
<airlied>
robclark: yeah and sriov hw is fairly limited to servers
<robclark>
because CrOS is a big fan of VMs on things that are very much not "$$$ is no object" servers, I've been having fun in that area ;-)
sdutt has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
<airlied>
like for VMs just running apps on a desktop, fair resource sharing really isn't required
<airlied>
an android app inside a VM can work the same as a CrOS native app or a linux app in another vm
<robclark>
yeah, it is the exact same problem if there is more than one GPU user, regardless of whether one is in a VM or not
<robclark>
(but VMs do make memory management and things like cpufreq and scheduling much more entertaining)
<graphitemaster>
They have hot-swap GPUs, pre-emption has to be possible for that.
<graphitemaster>
You can physically pre-empt one
<graphitemaster>
With your hands, mechanically
<graphitemaster>
Server people have it good
<robclark>
I mean, *that* sort of preemption doesn't make a great user experience ;-)
<graphitemaster>
Just unplug and plug the GPU every 16ms
<robclark>
I mean, sure.. we can also just kill gpu jobs that take more than 16ms.. which is fine as long as you don't have any gpu jobs that take more than 16ms ;-)
cef is now known as Guest199
cef has joined #dri-devel
Guest199 has quit [Ping timeout: 480 seconds]
<graphitemaster>
robclark, oh it doesn't resume when you physically remove and insert a gpu?
<graphitemaster>
damn, that's not as cool as I thought then
<graphitemaster>
Like even if the software handles context reset event correctly?
<robclark>
not unless userspace handles it by starting again from scratch.. and then you are back at the same problem, only 16ms later ;-)
<graphitemaster>
Right
<robclark>
if you intend to make forward progress, it is not the approach I would recommend ;-)
<graphitemaster>
It should be as seamless as plugging in headphones. I just resume the music from where it was, I don't have to listen to the song from the beginning again :D
<graphitemaster>
Ironically I imagine stateful sound cards of the 80s and 90s had similar problems, ones with midi hardware support and what not.
<robclark>
to make forward progress you need to be able to save current state in some way that it can be restored and resumed.. and GPUs have a *lot* of state
<robclark>
(less so for compute.. compute is a simpler subset of the same basic problem)
<graphitemaster>
We solved all of this by taking it out of hardware and doing it all in software.
mhenning has quit [Quit: mhenning]
<graphitemaster>
I mean that's a possibility too, you could immediately switch to a software implementation when the hardware resets to make forward progress
<graphitemaster>
Then reupload all state when it comes back online
<graphitemaster>
Keep a shadow copy of all state on the CPU
<graphitemaster>
Quite expensive but somewhat neat.
<graphitemaster>
Windows sort of does that doesn't it
<graphitemaster>
When you have no graphics drivers it reinitializes the whole graphics subsystem when the drivers are installed / up to date
<graphitemaster>
In the mean time it runs purely in software
<jekstrand>
Not at all
<jekstrand>
If you TDR on Windows, you get a TDR. You're done.
<jekstrand>
Apple's GL implementation carried around shadow copies so they could migrate apps between GPUs but that's the only one I know of that's actually ever done full shadowing.
<graphitemaster>
Sure, with TDR. I'm talking about when you have a non-accelerated desktop and install graphics drivers or update them, it can re-init the stack
<jekstrand>
If you update your drivers, I think all active apps get a TDR
<graphitemaster>
I do find it funny that everyone is like "x is bad, let's replace it", then they replace it with something worse or something that still has to speak x
Company has quit [Quit: Leaving]
bmodem has joined #dri-devel
Daanct12 has joined #dri-devel
<jekstrand>
IDK that Wayland is worse. It's different. It has a different set of problems.
<jekstrand>
And, yeah, it has to speak X because backwards compatibility forever!
lromwoo^ has quit [Ping timeout: 480 seconds]
lromwoo^ has joined #dri-devel
aravind has joined #dri-devel
Duke`` has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
sdutt_ has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
aravind has quit [Remote host closed the connection]
aravind has joined #dri-devel
Daanct12 has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
itoral has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
Daanct12 has quit [Quit: Leaving]
aravind has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
ppascher has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
ahajda__ has joined #dri-devel
lemonzest has joined #dri-devel
Surkow|laptop has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
AndrewR has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
<dj-death>
anybody knows whether the fd passed to the driver screen_create vfunc in gallium is owned by the driver?
nchery has quit [Read error: Connection reset by peer]
sdutt_ has quit [Remote host closed the connection]
zackr has quit [Remote host closed the connection]
sdutt_ has joined #dri-devel
abhinav__ has quit [Quit: Ping timeout (120 seconds)]
abhinav__ has joined #dri-devel
zackr has joined #dri-devel
jessica_24 has quit [Quit: Ping timeout (120 seconds)]
dri-logg1r has quit [Remote host closed the connection]
dri-logger has joined #dri-devel
mslusarz has quit [Remote host closed the connection]
mslusarz has joined #dri-devel
anarsoul has quit [Read error: Connection reset by peer]
mclasen has quit []
mclasen has joined #dri-devel
agd5f has quit [Remote host closed the connection]
tlwoerner has quit [Remote host closed the connection]
agd5f has joined #dri-devel
tlwoerner has joined #dri-devel
ceyusa has quit [Remote host closed the connection]
gpiccoli has quit [Quit: Bears...Beets...Battlestar Galactica]
samueldr_ has quit [Remote host closed the connection]
samueldr has joined #dri-devel
ceyusa has joined #dri-devel
gpiccoli has joined #dri-devel
cheako has quit [Quit: Connection closed for inactivity]
tursulin has joined #dri-devel
<MrCooper>
graphitemaster: preemption is mostly a HW / driver problem, not really related to explicit vs implicit sync (preemption is working with the latter with some drivers)
apinheiro has joined #dri-devel
lynxeye has joined #dri-devel
jkrzyszt has joined #dri-devel
AndrewR has joined #dri-devel
<MrCooper>
graphitemaster: also, note that explicit vs implicit sync isn't one global binary choice; implicit sync is perfectly adequate for some things, and can be emulated just for those things using explicit sync at lower levels
bmodem has joined #dri-devel
rasterman has joined #dri-devel
gouchi has joined #dri-devel
bmodem has quit [Remote host closed the connection]
gouchi has quit [Quit: Quitte]
pcercuei has joined #dri-devel
jhli has quit [Remote host closed the connection]
jhli has joined #dri-devel
mdnavare has quit [Remote host closed the connection]
mdnavare has joined #dri-devel
mwalle has quit [Quit: WeeChat 3.0]
mwalle has joined #dri-devel
LexSfX has quit [Remote host closed the connection]
LexSfX has joined #dri-devel
sdutt_ has quit [Ping timeout: 480 seconds]
mi6x3m has joined #dri-devel
<mi6x3m>
hey, i need some info as to what is going on. I start something with MESA_LOADER_DRIVER_OVERRIDE=swrast then i get this output https://pastebin.com/mSMSQ9jG but LLVM is used normally
simon-perretta-img has joined #dri-devel
<pq>
mi6x3m, maybe also tell what you actually want, and how you ended up with that variable?
<pq>
I mean, is "swrast" even a driver name?
<mi6x3m>
it is, I am trying to test out different games with hardware and software rendering
<pq>
swrast is usually just a generic term referring to some software rasterizer, or maybe explicitly to the "classic swrast" which I don't think exists anymore.
<mi6x3m>
so I override the crocus driver with swrast
<mi6x3m>
well what is the name of the software rasterizer in driver terms?
<pq>
I don't know
<pq>
not in terms of that variable at least
<mi6x3m>
is there another var to override the driver?
<pq>
also llvmpipe *is* a swrast
<mi6x3m>
yes I know but it's selected automatically after the system reports that swrast can't be loaded
<mi6x3m>
so swrast is considered a driver but llvmpipe isn't, very weird
<mi6x3m>
if I set the var to be =llvmpipe it says no driver with that name exists
<pq>
aha
<pq>
ok, I do see that swrast_dri.so exists at least
<pq>
the way I've forced a software renderer is LIBGL_ALWAYS_SOFTWARE=1, which gets me llvmpipe then.
<pq>
or "true", says Mesa docs
<pq>
and GALLIUM_DRIVER when you want softpipe
rkanwal has joined #dri-devel
yogesh_mohan has joined #dri-devel
vyivel has quit [Read error: No route to host]
vyivel has joined #dri-devel
<mi6x3m>
ah, interesting, might be a remnant of the past then
<pq>
it's a good question why it doesn't work
<pq>
maybe the swrast loading code is a remnant of the past and doesn't work through a more standard mechanism
hch12907 has quit [Ping timeout: 480 seconds]
hch12907 has joined #dri-devel
<mi6x3m>
pq, seems to be the case indeed
icecream95 has quit [Ping timeout: 480 seconds]
<MrCooper>
pq: FWIW, there are cases where LIBGL_ALWAYS_SOFTWARE=1 has never had an effect, but MESA_LOADER_DRIVER_OVERRIDE=swrast works
<MrCooper>
mi6x3m: those error messages are a red herring AFAICT, it ends up falling back to swrast anyway :)
<pq>
what cases would those be?
<pq>
yeah, it fails loading swrast, so it falls back to swrast :-P
<mi6x3m>
thanks MrCooper :)
<MrCooper>
pq: not sure exactly, but jadahl / swick were hitting it for mutter testing
<mi6x3m>
my use case is also rather extreme as will be unveiled shortly but I do wanna test all paths
<MrCooper>
you know how to set up suspense
<jadahl>
MrCooper: "never" - it has, at least long long ago
<MrCooper>
jadahl: we talked about this before :) it never worked for those particular cases
<jadahl>
but I have memories (at least from years ago) that it worked :P
<jadahl>
you're telling me I'm crazy?
<MrCooper>
unless I misinterpreted the Git history
<pq>
but swrast is not just one driver, is it? There's more than llvmpipe, or have all the others been deleted by now?
<MrCooper>
jadahl: I suspect you were hitting a different case back then
<jadahl>
MrCooper: it's true it didn't when I thought it did (when adding that documentation)
<jadahl>
perhaps
<mi6x3m>
MrCooper, it's a project so absurd it'll be made illegal by sane world governments
<jadahl>
but that env var should probably either be removed, or at least removed from the documentation, unless it's actually fixed
<MrCooper>
pq: there was only ever one swrast_dri.so
<MrCooper>
yes, multiple Gallium drivers now (or the classic swrast driver before)
<mi6x3m>
i think the driver is swrast and it has 2 options
<mi6x3m>
so GALLIUM_DRIVER should be =swrast with LLVMPIPE_ENABLED=0/1
<MrCooper>
jadahl: somebody who cares would just need to add the handling where it's missing
<pq>
swr is gone now, right?
<MrCooper>
pq: MESA_LOADER_DRIVER_OVERRIDE is for selecting the DRI driver, there's GALLIUM_DRIVER for selecting the Gallium driver
<pq>
confusing
<MrCooper>
e.g. MESA_LOADER_DRIVER_OVERRIDE=swrast GALLIUM_DRIVER=softpipe
<mi6x3m>
quite confusing
<jadahl>
perhaps reading LIBGL_ALWAYS_SOFTWARE at the same place as MESA_LOADER_DRIVER_OVERRIDE is read would be enough
<pq>
so there are or have been at least 5 software rasterizers: classic swrast, softpipe, llvmpipe, swr, and zink+lavapipe. Did I miss any? :-)
<MrCooper>
yeah, seems like it should, in all the same places
<mi6x3m>
thanks friends, this gives me some overview!!!
<MrCooper>
pq: I think that covers them
<airlied>
mi6x3m: try kms_swrast maybe
<MrCooper>
oh yeah, no libGL errors with that
<mi6x3m>
airlied, this worked :)
<mi6x3m>
what's the difference
<pq>
but then the "kms" in it is a lie? and both still exist as .so files.
<MrCooper>
I wonder if there's any point still in having them as separate names
<pq>
they are the same file, yes, but both file names are installed in file system
<MrCooper>
all *_dri.so are hardlinks to the same mega driver
<pq>
Debian stable disagrees by having two different mega drivers, but I suppose that's just legacy.
<pq>
hmm, classic drivers I guess
rsalvaterra_ has joined #dri-devel
rsalvaterra_ is now known as rsalvaterra
<MrCooper>
interestingly, they expose different sets of GLX extensions
<MrCooper>
only swrast_dri.so exposes GLX_EXT_buffer_age and sync/swap control extensions, only kms_swrast_dri.so exposes GLX_ARB_context_flush_control & GLX_ARB_create_context_robustness
<MrCooper>
seems like there's a mess to be cleaned up here
<mi6x3m>
i discovered it, i want a share of the loot
<MrCooper>
mi6x3m: you get to file a GitLab issue :P
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<mi6x3m>
i'll be a president of issues then
lromwoo^ has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
kisak has quit [Quit: leaving]
kisak has joined #dri-devel
alanc has quit [Remote host closed the connection]
mattst88 has quit [Read error: Connection reset by peer]
alanc has joined #dri-devel
mattst88 has joined #dri-devel
sagar_ has quit [Remote host closed the connection]
sagar_ has joined #dri-devel
lromwoo^ has joined #dri-devel
Company has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
Danct12 has quit [Quit: Quitting]
`join_subline has quit [Remote host closed the connection]
siqueira has quit []
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
siqueira has joined #dri-devel
anarsoul|2 has quit [Ping timeout: 480 seconds]
anarsoul has joined #dri-devel
lromwoo^ has quit [Ping timeout: 480 seconds]
`join_subline has joined #dri-devel
rsalvaterra has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
rsalvaterra has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.5]
lromwoo^ has joined #dri-devel
RSpliet has quit [Quit: Bye bye man, bye bye]
sdutt has joined #dri-devel
ppascher has joined #dri-devel
RSpliet has joined #dri-devel
lemonzest has joined #dri-devel
bcheng has quit [Remote host closed the connection]
bcheng has joined #dri-devel
lromwoo^ has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
* alyssa
wonders how to evaluate latency scheduling
<HdkR>
microprofiling!
<HdkR>
:)
<alyssa>
HdkR: sounds interesting, details? :p
<alyssa>
what specifically in the perf counters etc is interesting?
<HdkR>
Probably shader execution cycles
lromwoo^ has joined #dri-devel
bertje__ has joined #dri-devel
lromwoo^ has quit [Remote host closed the connection]
<dolphin>
alyssa: step A) hope there is a free running clock on the HW that you can access from the executing workload
<dolphin>
and also from the CPU via some MMIO
<dolphin>
see i-g-t tests/i915/gem_exec_latency
<alyssa>
Yes, there's a shader clock although I never finished wiring it up because it needed kernel changes
<alyssa>
OTOH, those changes are needed for vulkan too I think
Surkow|laptop has joined #dri-devel
<dolphin>
in my experience, that's the best way to get some real wall clock numbers
<dolphin>
we also have further micros to split that number into smaller items
<dolphin>
but it's at least a reasonable way to check that all the other micros add up to the total latency
lynxeye has quit [Quit: Leaving.]
<alyssa>
OK
<dolphin>
and that's actually the latency folks care about, wall clock time from when you submit from userspace to when the workload starts on the GPU/whatever
<alyssa>
to be clear I'm talking about latency within the shader
<dolphin>
oh :)
<alyssa>
i.e. scheduling texture instructions early enough in the shader that we don't stall accessing the results
<dolphin>
I guess that answer will be much more hardware specific
bertje__ has quit [Ping timeout: 480 seconds]
bertje__ has joined #dri-devel
<alyssa>
yeah, for sure
bertje__ has quit [Remote host closed the connection]
<jenatali>
I can work around it (c++20 or 17) for now but we'll have to resolve that at some point probably
`join_subline has quit [Remote host closed the connection]
mbrost has quit [Read error: Connection reset by peer]
kts_ has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
lemonzest has quit [Quit: WeeChat 3.5]
ahajda_ has quit [Read error: Connection reset by peer]
* jekstrand
is really starting to hate the RADV meta code...
ahajda_ has joined #dri-devel
`join_subline has joined #dri-devel
<jekstrand>
Not that it's necessarily bad code or anything. It's just that begin/end rendering are hopelessly intertwined.
<airlied>
well if vulkan had just done no subpasses up front :-P
<HdkR>
Just means we need to add vkBegin and vkEnd.
<jekstrand>
airlied: It's not that. I've got that *mostly* detangled.
<jekstrand>
With the renderpass code, I'm resetting everything when you save the state as a sanity measure. Turns out a lot of things expect to be able to rely on the old render pass or attachments right up until the last moment. :-/
lumag_ has joined #dri-devel
ahajda_ has quit [Remote host closed the connection]
icecream95 has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
`join_subline has quit [Remote host closed the connection]
`join_subline has joined #dri-devel
camus has joined #dri-devel
Danct12 has joined #dri-devel
rkanwal has quit [Read error: No route to host]
rkanwal has joined #dri-devel
camus1 has quit [Ping timeout: 480 seconds]
`join_subline has quit [Remote host closed the connection]
heat has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
<cwabbott>
jekstrand: I'm gone for the next few weeks
<jekstrand>
cwabbott: :-/
mclasen_ has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Remote host closed the connection]
jfalempe has quit []
oneforall2 has joined #dri-devel
<jstultz>
danvet: jekstrand: hey! I'm trying to get my head around some of the umf / drm_syncobj discussions.. curious if there was a sense of how non-drm drivers (cameras, decoders, etc) would interplay with the move to drm_syncobj for everything? At least w/ sync_files we're not so subsystem specific.
<jekstrand>
jstultz: I don't think we're yet set on drm_syncobj for everything
<jstultz>
jekstrand: ah, sorry, i'm maybe over-emphasizing some of danvet's comments. But are folks thinking wider than the drm/ dir w/ umf?
<jekstrand>
I don't think we have a clear path, TBH.
<airlied>
drm_syncobj for anything that is in drm
<jekstrand>
We need to keep DRM working with other components
<airlied>
if it has to interact with other stuff outside the drm device then sync files is fine
<airlied>
but drm interfaces should be restricted to syncobjs and sync_files should be explicit conversions
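The explicit conversions airlied mentions already exist as libdrm helpers; a minimal sketch (error handling omitted, drm_fd assumed to be an open DRM device/render node fd):

    #include <stdint.h>
    #include <xf86drm.h>

    /* Export the fence currently attached to a drm_syncobj as a sync_file fd,
     * e.g. to hand it to something outside DRM. */
    static int syncobj_to_sync_file(int drm_fd, uint32_t syncobj, int *out_fd)
    {
        return drmSyncobjExportSyncFile(drm_fd, syncobj, out_fd);
    }

    /* Replace a drm_syncobj's fence with the one backing a sync_file fd. */
    static int sync_file_to_syncobj(int drm_fd, uint32_t syncobj, int sync_fd)
    {
        return drmSyncobjImportSyncFile(drm_fd, syncobj, sync_fd);
    }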
<jekstrand>
And if we move to UMF, the dma_fence finite time guarantees get tricky but maybe, in a UMF world, with "modern" fencing, we can do that with a simple timeout.
<jekstrand>
But I don't think there's any assumption that things like v4l will need to start interacting with UMF any time soon.
<jekstrand>
Or that it's even tractable, honestly.
<jekstrand>
The goal is to get DRM drivers running as fast as possible with all the fancy stuff for compute etc.
<jekstrand>
Once you leave the DRM world, I think we're ok with converting to sync_file.
<jekstrand>
I think
<jekstrand>
That's a load bearing "I think"
<jstultz>
heh
<jekstrand>
There's a whole lot of design space to trim down still.
<jekstrand>
But I don't see us being able to ever get away from having to interact with sync_file so we're going to have to figure out how to make that work somehow.
<jekstrand>
If the best solution we can find sucks too much than maybe plumb UMF through to other things.
<jekstrand>
Hopefully, we'll find something that doesn't suck too much.
<jstultz>
jekstrand: so, the sync_file API seems pretty bounded. In a lot of the discussions issues w/ dma_fences are transposed onto sync_files, but couldn't sync_files be backed by something else?
<jstultz>
jekstrand: or is there some aspect where existing userland understands it as a dma_fence and so behavior is fixed?
<jekstrand>
jstultz: Possibly but the guarantees we provide to userspace are basically the same guarantees as we have for dma_fence.
<jekstrand>
Which is to say that while we certainly could swap out the internals, I don't see it helping solve the fundamental problem.
<jstultz>
jekstrand: ok, so behavioral guarantees leaked.
<jekstrand>
Because if it works for sync_file, it works for dma_fence and so we may as well just wrap <SYNC_THING> in a dma_fence.
<jekstrand>
jstultz: I don't know that it's so much a leak as just very sensible semantics that both have.
<jekstrand>
The primary such semantic being the finite time guarantees.
<danvet>
jekstrand, timeout alone can deadlock
<danvet>
it's like "mutex_lock_timeout fixes my deadlock" approach to locking design
<jekstrand>
danvet: Yes, I know timeout isn't enough.
<jekstrand>
I'm waving my hands and hoping that future me is somehow smarter.
<danvet>
jekstrand, yeah just wanted to make sure, because the siren calls of "this should be simpler, why isn't it" is so strong on this topic :-)
<danvet>
jekstrand, same
<jekstrand>
danvet: Oh, sure.
mvlad has quit [Remote host closed the connection]
bgs has quit [Read error: Connection reset by peer]
bgs has joined #dri-devel
<danvet>
jstultz, I'm kinda assuming that socs and smaller systems will stay with the dma_fence/sync_file semantics for quite some more time
heat_ has joined #dri-devel
heat has quit [Read error: No route to host]
<airlied>
let's just create drm2, definitely be simpler :-P
morphis has quit [Ping timeout: 480 seconds]
<daniels>
jstultz: well for V4L2, the UMF story is exactly the same as the pre-UMF story - V4L2 doesn't do anything at all and it's your problem to sort out :P
<daniels>
(unless something's changed since I last looked)
morphis has joined #dri-devel
pcercuei has quit [Quit: dodo]
<jstultz>
daniels: are solutions wanted in that space?
<jenatali>
Hm... is there still a need for libglapi.so now that all drivers are Gallium drivers?
<jstultz>
daniels: maybe that sounded snarky - not my intent. trying to better understand the reason a wider solution between the subsystems might not be useful; seems like buffers being filled by v4l2 devices heading to the gpu or display would want to provide similar signaling
<jstultz>
daniels: if it's just different subsystems focused on their squares of space, or if there are other politics involved
kts_ has quit []
<airlied>
jenatali: I think it's used on the xserver side
<jenatali>
airlied: Oh like used directly rather than as an implementation detail of GLX/EGL?
<airlied>
I think it uses dri drivers by both paths and it needs to be shared
<airlied>
so its own loader and via glx/egl
<jenatali>
I see
* airlied
forgets why shared glapi exists though now
<jenatali>
It made sense to me when I was thinking about GL+GLES dispatching to either a classic or gallium driver
<jenatali>
Having it as a muxer essentially
<jenatali>
But now you could just embed the shared glapi in the gallium megadriver and be done with it
<idr>
Yeah... it was at least partially there to make sure that functions that existed in both GL and GLES had the same dispatch offset.
<jenatali>
I'm debating doing that for Windows - now that I split the gallium megadriver out there too, there's no reason for libglapi.dll to exist I think
<danvet>
jstultz, v4l2 had some patches to add sync_file support but they never landed
<danvet>
so someone once cared enough to type some code, but never enough to merge it
<danvet>
otoh the people using sync_file tend to not use v4l2 much, for whatever reasons
<airlied>
I wonder if that patch is shipping in android kernels
<jstultz>
danvet: and if that support was revived, you don't see it as problematic integrating w/ the umf later on? Some of the interesting bits, from my understanding of the timeline semaphore stuff, are how you can set things up and repeat the pipeline over and over. v4l2 frames coming in seem similarly repetitive, and it might be nice to be able to avoid the busy work of having to re-generate single-shot sync_files over and over.
rasterman has quit [Quit: Gettin' stinky!]
<danvet>
jstultz, so in an ideal world v4l would adopt drm_syncobj (we can rename it) and android would have mesa vk stacks reusing all the drm_syncobj infrastructure
<danvet>
in reality I think android doesn't use v4l at all and adopts upstream gpu stuff at a snail's pace at best, so none of this matters
<danvet>
jstultz, for interop the really nasty part of dma_fence is the interaction with memory shrinkers
<danvet>
but v4l pre-reserves all buffers, so doing umf interop once you have drm_syncobj on v4l side should be trivial
<danvet>
any time you have a v4l driver that does dynamic memory management it probably should have been a drm driver instead :-)
<airlied>
like I think the biggest drm_syncobj block is they are identified by handing off the drm file descriptor
<jstultz>
danvet: is renaming drm_syncobj sufficient? the ioctls all take drm devices, no?
<danvet>
anyway I should have started sleeping like 2-3 hours ago ...
<airlied>
we'd have to introduce fd semantics somewhere
<danvet>
jstultz, it's a stand-alone fd too
<danvet>
like sync_file
tursulin has quit [Read error: Connection reset by peer]
<danvet>
jstultz, but yeah need some decoupling, maybe even a syscall
<jstultz>
danvet: ok
<airlied>
oh yeah we do share them as fds as well, it's just to make sure they aren't sync_file by accident
<danvet>
jstultz, kinda like the gem_bo -> dma_buf fd trick we played
<daniels>
jstultz: no snark taken :) from my understanding V4L2 isn't optimistic when you dequeue output (no signal before completion), and so far they haven't seen much need to push fencing down into the queue path either, because their usecases are so straightline that you don't gain much from having done so
<daniels>
jstultz: I have no meaningful opinion on whether this is good or bad myself
<jstultz>
danvet: ok, sounds good. i really do appreciate the discussion (I know the nihilism is strong), and this helps me paint a more coherent picture.
<daniels>
(I'm sure this is to some extent coloured by how difficult it is to introduce new V4L2 uAPI ...)