ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
cphealy has quit [Quit: Leaving]
<graphitemaster> You folks want to hear the most cursed thing ever
cphealy has joined #dri-devel
pnowack has quit [Quit: pnowack]
<graphitemaster> Since fancy pants Linux still lacks GPU preemption that works for compute dispatch I decided to make the entire frame reentrant by remembering which compute call was made and then coming back to complete the ones after that, then cutting up all the dispatches to really tiny thread counts spread over many dispatch calls, with the heuristic being "when time gets too long, yield". The thing is I did that with the most cursed hack ever.
<graphitemaster> Instead of changing any code I just bound a timer query to GL_QUERY_BUFFER, enabled GL_RASTERIZER_DISCARD, filled a 4x4 depth buffer with the value 16 for 16ms, set the depth test to GL_LESS, wrote a vertex shader that reads the query buffer and writes out gl_Position.z to it.
<graphitemaster> Then used conditional rendering around every dispatch call invoking this shader.
<graphitemaster> The draw call that is invoked for that vertex shader is GL_POINTS, of 1, so basically the depth test is being used to perform a "has this frame exceeded 16ms", if it does it fails the depth test and the conditional render's query object gets no samples passed... so the dispatch calls guarded by it do not run.
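[editor's sketch] The scheme graphitemaster describes (cut dispatches into small slices, stop issuing them once the frame budget is spent, resume from the remembered position next frame) can be modelled on the CPU as below. This is only a rough illustration: in the actual hack the budget check runs on the GPU through a timer query, a depth test and conditional rendering, whereas here a host-side clock stands in for it. The names `split_dispatch` and `run_frame` are hypothetical.

```python
import time


def split_dispatch(total_groups, chunk):
    """Cut one large dispatch into (first_group, group_count) slices."""
    return [(first, min(chunk, total_groups - first))
            for first in range(0, total_groups, chunk)]


def run_frame(dispatches, resume_at, budget_s, clock=time.perf_counter):
    """Run dispatches[resume_at:] until the time budget is used up.

    Returns the index of the first dispatch that did NOT run, so the
    caller can pass it back as `resume_at` on the next frame.
    """
    start = clock()
    i = resume_at
    while i < len(dispatches):
        # Stand-in for the GPU-side "has this frame exceeded 16ms" test.
        if clock() - start >= budget_s:
            break  # yield: everything from i onwards waits for next frame
        dispatches[i]()
        i += 1
    return i
```

A caller would keep the returned index across frames and start again from 0 once it reaches `len(dispatches)`.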
<graphitemaster> Look what you made me do Linux
Haaninjo has quit [Quit: Ex-Chat]
<graphitemaster> But also there you go, that's one way you can enable preemption :P
<Plagman> you can pre-empt anything with realtime compute on amd hw fwiw
hch12907 has joined #dri-devel
<mareko> AMD has lots of preemption options, like mid-command-buffer preemption where it can switch to a different command buffer between draws and dispatches; compute shaders can also be suspended to memory and resumed later
<Plagman> you can pre-empt mid-graphics-draw with compute though right?
<mareko> yes, compute is completely parallel with graphics so it's not really preemption, they just share compute units with everything else
<Plagman> yeah, it kinda is if you have realtime priority set up
thellstrom has joined #dri-devel
thellstrom1 has quit [Read error: Connection reset by peer]
tursulin has quit [Read error: Connection reset by peer]
<mareko> there are 3 levels of preemption: 1) which command buffer is selected to execute on a queue (multiple command buffers can be in progress on the same hw queue) 2) which queue is selected to create work on CUs (but it's impossible to evict already-created graphics work from CUs) 3) which wave is selected for execution (to use ALU and load/store units)
columbarius has joined #dri-devel
alanc has quit [Remote host closed the connection]
co1umbarius has quit [Ping timeout: 480 seconds]
alanc has joined #dri-devel
<graphitemaster> mareko, There's nothing in the Linux graphics stack that indicates this works though. You run a long compute kernel and the whole Desktop freezes.
ybogdano has quit [Ping timeout: 480 seconds]
<graphitemaster> Even on AMD.
<Plagman> are you running it on a vulkan compute-only queue?
<Plagman> that's how you get real parallelism
<Plagman> you can easily confirm with gpuvis
<graphitemaster> OpenGL sadly
<Plagman> yeah don't use that
<graphitemaster> Don't have much choice I'm afraid
<graphitemaster> It's pretty ridiculous how reliant we are on Windows' WDDM scheduler to keep the OS actually usable.
lemonzest has quit [Quit: WeeChat 3.3]
<graphitemaster> Kind of sad Linux never came up with a good design here.
sarnex has quit [Quit: Quit]
* robclark wonders how gpu preemption isn't pretty much completely hw and hw specific kernel driver dependent
<jenatali> The granularity is. The mechanism is. But the choice of when to do it based on the granularity is up to the OS (for WDDM)
sarnex has joined #dri-devel
<robclark> I mean.. I guess I'm most familiar w/ adreno.. where you have cooperative preemption (which goes back a few gens, but relies on userspace to insert yield points where it could be safe to preempt (not mid-draw, generally it is between tile passes when there isn't *too* much state to save/restore)) and non-cooperative (a6xx and later.. but it is *much* more expensive because gpu's have a lot of state).. there isn't really much
<robclark> more the kernel mode driver can do
<robclark> if hw has separate compute ring, that makes things easier.. but also means you need to use something more clever than gl (maybe dx is more vk like in this regard.. idk too much about dx)
<robclark> I still don't see too much how this is a "linux problem" ;-)
<jenatali> Adreno supports preemption on Windows but I haven't checked the granularity they claim there. And of course it's black box as to how for me
<robclark> afaict it waits for some amount of time to hopefully hit a cooperative yield point (so it is only register state save/restore and not also gmem/etc).. and if that doesn't happen in time it goes the more expensive route
<robclark> we could support that on linux.. we don't yet.. it would take some kernel and mesa work.. but it isn't really like "linux" is going to help or hurt us there any
<robclark> for compute specifically, there is less state to save/restore.. so you could (in hw) have async compute ring, like amd, to solve the more specific problem of compute things making desktop sluggish.. but that means using something more clever than gl as your api
<clever> in the case of v3d, there is registers to control which cores can do vertex, fragment, and compute shaders
<clever> so you can for example reserve 2 cores for vertex/fragment, and then compute cant stall you out
<robclark> right.. but if you are using gl, you have no clue if you need to preserve the illusion of implicit sync between compute and 3d work
<robclark> (or rather the gl driver doesn't)
<clever> yeah
<robclark> not a linux problem... but a gl problem ;-)
<clever> ive not done any gl compute stuff
<kisak> ideally, the end user would want a GPU equivalent to nice that they can tune, then the kernel and driver would translate that to what the hardware would need to do to make the most of that instead of realtime/high/medium/low bins. I don't expect that idealistic view to match reality.
<clever> something i still want, is just a simple gpu usage %
<robclark> yeah, reality is more complicated than ideal scenario there ;-)
<robclark> simple gpu usage % is doable.. but in hw specific way.. perfetto gives a somewhat generic way to export that information for profiling and is already supported to some degree by some drivers
<clever> my main reason for wanting a gpu%, is to know which thing i should upgrade first
<clever> the cpu or the gpu?
<robclark> well, I suppose with v3d that isn't really a choice :-P
<clever> yeah, for that system, you cant swap parts
<clever> but for my desktop, i can
<robclark> for adreno, if you don't want to mess with the full perfetto setup, there is fdperf in mesa
<robclark> I think other drivers have similar things
<clever> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XTX [Radeon R7 260X/360]
<clever> ive got one of these, and an fx-8350 cpu
<imirkin> there's a radeontop, no?
<clever> *looks*
<clever> already in my package manager, installing!
<DrNick> umr would be better
<clever> imirkin: i can see that 3% is going to the graphics pipe when just watching youtube
<clever> hmmm, what is the difference between vram and gtt?
<imirkin> gtt = system memory
<clever> ah
<imirkin> as accessed via DMA by the PCIe device
<clever> 80% of my 2gig of vram is used, and basically no gtt is used
<clever> i dont even have any games open, so where did all of my vram go? lol
<imirkin> well, every application thinks that GL is awesome now
<imirkin> so a lot of that is just buttons.
<clever> i do also have a lot of tabs open in chrome
<clever> but i have rebooted recently, well, it locked up solid overnight
camus has joined #dri-devel
camus1 has quit [Remote host closed the connection]
<clever> i'll have to keep an eye on this next time i have a game open
<clever> some games can still get a decent 60fps, but others run at 5fps
<clever> i also need to play with gallium hud more
<clever> imirkin: so if gtt is basically just a temporary zone in host ram, for things to park in until dma grabs them, why does gtt have to be 3gig?
<imirkin> it's memory that the gpu can access
<imirkin> not necessarily temporary
<imirkin> when used successfully, the gpu would fetch that data once
<clever> does that result in the host having 3gig less ram for other purposes?
<imirkin> same as any application that's taking up 3gb of memory
<imirkin> it can be swapped out/etc
<clever> ahh
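[editor's sketch] The VRAM and GTT figures that radeontop displays can also be read directly from sysfs when the amdgpu kernel driver is bound to the card. The snippet below is a minimal sketch under that assumption: the `card0` path is a guess about the local system, and a card running on the older radeon kernel driver may not expose these `mem_info_*` files at all. The helper names are hypothetical.

```python
from pathlib import Path


def read_int(path):
    """Read a sysfs file that holds a single integer (bytes)."""
    return int(Path(path).read_text().strip())


def memory_usage(device_dir):
    """Return {pool: (used_bytes, total_bytes, percent)} for VRAM and GTT."""
    device_dir = Path(device_dir)
    usage = {}
    for pool in ("vram", "gtt"):
        used = read_int(device_dir / f"mem_info_{pool}_used")
        total = read_int(device_dir / f"mem_info_{pool}_total")
        percent = 100.0 * used / total if total else 0.0
        usage[pool] = (used, total, percent)
    return usage


if __name__ == "__main__":
    # Assumption: the GPU of interest is card0 and uses amdgpu.
    dev = Path("/sys/class/drm/card0/device")
    if (dev / "mem_info_vram_total").exists():
        for pool, (used, total, pct) in memory_usage(dev).items():
            print(f"{pool}: {used >> 20} MiB / {total >> 20} MiB ({pct:.1f}%)")
    else:
        print("no amdgpu mem_info files found under", dev)
```

As imirkin notes, GTT is ordinary system memory that the GPU can reach over PCIe, so the GTT total is a limit rather than memory permanently taken away from the host.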
jewins has quit [Ping timeout: 480 seconds]
<clever> 49624 frames in 5.0 seconds = 9924.727 FPS
<clever> if i run `vblank_mode=0 glxgears` then i can cause a pretty high load on most of the radeontop bars
<clever> and with vsync on, due to how simple the polygons are, it's barely more than just desktop rendering
<clever> which reminds me, composition is off
camus has quit []
camus has joined #dri-devel
reductum has quit []
reductum has joined #dri-devel
boistordu_ex has joined #dri-devel
mszyprow_ has joined #dri-devel
boistordu_old has quit [Ping timeout: 480 seconds]
mszyprow_ has quit [Ping timeout: 480 seconds]
camus1 has joined #dri-devel
camus has quit [Remote host closed the connection]
jewins has joined #dri-devel
heat has joined #dri-devel
jewins has quit [Quit: jewins]
ngcortes has quit [Ping timeout: 480 seconds]
ella-0 has joined #dri-devel
ella-0_ has quit [Remote host closed the connection]
alatiera0 has joined #dri-devel
alatiera has quit [Ping timeout: 480 seconds]
utf12 has quit []
mclasen has quit [Ping timeout: 480 seconds]
hch12907 has quit [Remote host closed the connection]
hch12907 has joined #dri-devel
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
hch12907 has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
fxkamd has quit []
sdutt has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
tzimmermann has joined #dri-devel
Daanct12 has joined #dri-devel
hch12907_ has joined #dri-devel
Daaanct12 has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
itoral has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Ping timeout: 480 seconds]
camus has joined #dri-devel
mvlad has joined #dri-devel
Danct12 has joined #dri-devel
thellstrom1 has joined #dri-devel
Daaanct12 has quit [Ping timeout: 480 seconds]
hch12907 has joined #dri-devel
aravind has joined #dri-devel
thellstrom has quit [Ping timeout: 480 seconds]
camus1 has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
Danct12 has quit [Ping timeout: 480 seconds]
mattrope has quit [Read error: Connection reset by peer]
camus1 has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
aravind has quit [Ping timeout: 480 seconds]
alanc has joined #dri-devel
<ishitatsuyuki> any idea how write-combining is set for VRAM BAR?
<ishitatsuyuki> I'm on amdgpu
<ishitatsuyuki> I can see a few calls in amdgpu_bo_init; but if I understand correctly most of them are noop because the target is not a RAM region
mszyprow_ has joined #dri-devel
JohnnyonF has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
adjtm has quit [Remote host closed the connection]
adjtm has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
tursulin has joined #dri-devel
rauji_ has quit [Remote host closed the connection]
<ishitatsuyuki> for context, I'm debugging an issue where certain combinations of resizable BAR and above 4G decoding would cause extremely slow (~10x) BAR writes
<ishitatsuyuki> if anyone knows of a known hardware erratum for this please let me know
ppascher has quit [Ping timeout: 480 seconds]
ppascher has joined #dri-devel
<GyrosGeier> ishitatsuyuki, that should be set up through the MTRRs
<ishitatsuyuki> GyrosGeier: if I understand correctly, MTRR isn't touched at all if PAT is initialized
<GyrosGeier> hmm
<GyrosGeier> my expectation would be that if someone asks for a 4 GB mapping with write combining, a 4 GB chunk covered by an MTRR with WC enabled is chosen as the bus address for the mapping
<tzimmermann> danvet, lyude acked the dp-rename patch. can i keep you a-b for the other patches?
pcercuei has joined #dri-devel
<danvet> tzimmermann, oh I figured you only want an ack on the rename bikeshed, I didn't look really at the other stuff
<tzimmermann> danvet, i'd need at least an a-b on the patchset. lyude has acked the renaming (the bikesheded part). the other patches should be non-controversial
<tzimmermann> i think i only linked patch 3 because of the discussion there
<danvet> I guess I need the link again then
<GyrosGeier> hm
<GyrosGeier> MTRR and PAT seem to be used together
dviola has joined #dri-devel
<ishitatsuyuki> GyrosGeier: if you follow the MTRR function call, it will early return when pat is enabled
<GyrosGeier> like, if the PAT doesn't specify anything, the MTRR is used to determine caching behaviour
<GyrosGeier> hm
<ishitatsuyuki> yes, in the processor it's used together but the kernel does not touch it
<ishitatsuyuki> although I'm kinda skeptical whether this is a kernel issue since some combination can fuck this up on Windows too
<GyrosGeier> because https://www.kernel.org/doc/Documentation/x86/pat.txt talks about how you can set up a large uncacheable area and then enable write-combining in a subrange through PAT
<GyrosGeier> IIRC the default MTRR setup would have WC enabled for > 4 GB anyway
<ishitatsuyuki> hm?
<ishitatsuyuki> just a minute let me paste my /proc/mtrr
<GyrosGeier> ah wait
<ishitatsuyuki> http://ix.io/3M8C
<GyrosGeier> that changed
<GyrosGeier> you seem to have an entirely different setup than I do
<GyrosGeier> dmesg for me says that the default is WB, and then it uses three MTRRs to carve out UC areas
<GyrosGeier> since you have WT/WB in the MTRRs, presumably the default is UC
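[editor's sketch] To compare setups like the two being discussed, the variable-range entries in /proc/mtrr can be parsed and checked against a bus address. This is a rough sketch that assumes the usual `regNN: base=0x... (...MB), size=...MB, count=N: type` line layout. /proc/mtrr does not list the default memory type, so an address covered by no entry falls back to a default that has to be found elsewhere (GyrosGeier reads it from dmesg). The function names are hypothetical, and the contents of the linked paste are not reproduced here.

```python
import re

# One variable-range MTRR line as printed in /proc/mtrr.
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B, count=\d+: (?P<type>[\w-]+)"
)
UNIT = {"K": 1 << 10, "M": 1 << 20}


def parse_mtrr(text):
    """Parse /proc/mtrr text into a list of range dictionaries."""
    entries = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m:
            base = int(m["base"], 16)
            size = int(m["size"]) * UNIT[m["unit"]]
            entries.append({"reg": int(m["reg"]), "base": base,
                            "end": base + size, "type": m["type"]})
    return entries


def types_covering(entries, addr):
    """Memory types of every MTRR range that contains addr.

    An empty list means no variable MTRR covers the address, so the
    (unlisted) default type applies.
    """
    return [e["type"] for e in entries if e["base"] <= addr < e["end"]]
```

Feeding it `open("/proc/mtrr").read()` together with the BAR's bus address (from `lspci -v`, for instance) shows whether any MTRR covers the BAR. When PAT is in use, the effective type also depends on the page attributes of the mapping, which this sketch does not look at.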
rgallaispou has joined #dri-devel
rgallaispou has left #dri-devel [#dri-devel]
rgallaispou has joined #dri-devel
rgallaispou has quit []
<danvet> yeah ack on the set
<tzimmermann> danvet, thanks a lot
<tzimmermann> we may want to address the overall organization of the drm sources in a separate discussion. i think several people have asked for changes
<tzimmermann> like sub-directories for drivers and helpers
<hakzsam> mslusarz: any reasons to not merge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11779 ?
Major_Biscuit has joined #dri-devel
Major_Biscuit has quit []
MajorBiscuit has joined #dri-devel
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
ahajda has joined #dri-devel
rasterman has joined #dri-devel
Company has joined #dri-devel
pnowack has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
ahajda has quit []
ahajda has joined #dri-devel
mclasen has joined #dri-devel
thellstrom1 has quit []
thellstrom has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
MajorBiscuit has quit [Quit: WeeChat 3.3]
MajorBiscuit has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
<cwabbott> hakzsam: dschuermann: so, when looking at some freedreno stuff I realized that if has_fsub is true, opt_algebraic_late will combine a * b - c into fsub(fmul(a, b), c) instead of the (obviously better) ffma(a, b, fneg(c))
<cwabbott> I assume radv/radeonsi would be affected too
illwieckz has quit [Ping timeout: 480 seconds]
<cwabbott> that's because the fsub combining is before ffma combining in the list of patterns
<cwabbott> am I missing something there?
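[editor's sketch] The ordering effect cwabbott describes can be reproduced with a toy first-match rewriter. This is not NIR's matcher and the two rules are simplified stand-ins, not the literal patterns from nir_opt_algebraic. It only shows why, with first-match semantics, putting the fsub rule ahead of the ffma rule leaves a * b - c as fsub(fmul(a, b), c).

```python
def match(pattern, expr, env):
    """Match expr against pattern; bare strings in a pattern are variables."""
    if isinstance(pattern, str):
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    return (isinstance(expr, tuple) and len(expr) == len(pattern)
            and expr[0] == pattern[0]
            and all(match(p, e, env) for p, e in zip(pattern[1:], expr[1:])))


def substitute(template, env):
    """Build the replacement expression from the bound variables."""
    if isinstance(template, str):
        return env[template]
    return (template[0],) + tuple(substitute(t, env) for t in template[1:])


def rewrite(expr, rules):
    """Rewrite children first, then apply the FIRST rule that matches."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, rules) for e in expr[1:])
    for search, replace in rules:
        env = {}
        if match(search, expr, env):
            return rewrite(substitute(replace, env), rules)
    return expr


# Simplified stand-ins for the two late-algebraic patterns.
FSUB = (("fadd", "a", ("fneg", "b")), ("fsub", "a", "b"))
FFMA = (("fadd", ("fmul", "a", "b"), "c"), ("ffma", "a", "b", "c"))

# x * y - z, written the way it looks before the late pass.
expr = ("fadd", ("fmul", "x", "y"), ("fneg", "z"))

print(rewrite(expr, [FSUB, FFMA]))  # fsub rule listed first
print(rewrite(expr, [FFMA, FSUB]))  # ffma rule listed first
```

With the fsub rule first, the fadd is consumed before the ffma rule ever sees it. With the order swapped, the same input becomes a single ffma with a negated third operand.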
illwieckz has joined #dri-devel
camus1 has quit []
<danvet> mripard, tzimmermann there seems to be a few patches stuck in drm-misc-fixes
alatiera0 is now known as alatiera
<dschuermann> cwabbott: guess you are right. we also combine in the backend, so we don't immediately recognize such things ;)
<cwabbott> wait, you're combining ffma in the backend? that's probably a bit too much
<cwabbott> we really ought to be able to combine ffma in nir, and if we can't then it's better to fix it in nir because everyone else benefits
mwk has quit [Remote host closed the connection]
mwk has joined #dri-devel
<danvet> mripard, looks like no pull happened over holidays, was really no one around of all three?
<danvet> mripard, tzimmermann if you get a pull out today I can still include it, otherwise please make sure these patches don't miss next week
<danvet> really not great if we block fixes for a month in -misc every merge window :-(
<cwabbott> total instructions in shared programs: 1561330 -> 1551619 (-0.62%)
<cwabbott> big oof
<tzimmermann> danvet, there was nothing in drm-misc-next-fixes over the holidays
<tzimmermann> i sent out a PR yesterday with a single patch
<tzimmermann> dunno about drm-misc-next
<mripard> danvet: I'm on the drm-misc-fixes PR
<tzimmermann> IIRC rc6 was before christmas. so there were no PRs for drm-misc-next over the holidays
<danvet> tzimmermann, people misplace stuff or there was no -fixes pull last week
<danvet> and I'm talking about drm-misc-fixes
<danvet> tzimmermann, mripard since you're around pls either of you drop a pull for that into my inbox
<danvet> also would be great if you guys coordinate this directly instead of me poking you
<danvet> like if -fixes missed the release make sure it's handed over to -next-fixes somehow
<danvet> how I don't care, as long as it's not me who has to go look for it :-)
<tzimmermann> danvet, that -fixes PR was just sent by maxime. besides that, every thing seems to be business as usual
<tzimmermann> no?
<danvet> tzimmermann, not seeing it show up yet here?
<danvet> tzimmermann, the no-as-usual part is that I have to ask for the pr instead of it just showing up :-)
<tzimmermann> hmm, ok
<danvet> or you checking when you do -next-fixes whether anyone misplaced a patch in -fixes
<danvet> which committers do routinely
<danvet> this aint the first time we got a few bugfixes stuck for a few weeks, which just isn't that great
<danvet> or mripard not telling you that there's a few fixes in -fixes that missed the release
<tzimmermann> i cherry-picked a few vmwgfx patches into -next-fixes. but that was before christmas. i'll double-check for other patches
<tzimmermann> but it was really quite recently
<tzimmermann> for those stuck patches, why would mripard tell me? we're not interfering with each others tasks, unless someone specifically asks for it
<danvet> well I look at drm-misc as a team, not 3 individuals that just happen to use the same git tree
<danvet> handover only going through drm.git is a bit of a bug, not a feature I think
<dschuermann> cwabbott it's only recent that we switched to nir ffma in the first place :P
<danvet> mripard, I'm still not seeing that PR from you anywhere, got a lore link?
<dschuermann> but interesting idea.. I'll have a look how much effect the backend combining still has
<mripard> danvet: the PR was sent 30 minutes ago
<cwabbott> dschuermann: if you find anything else, I'd be interested because fixing it in NIR will probably also help freedreno
<cwabbott> freedreno is in pretty much the same boat as RDNA2 in this case because we support fsub, have ffma that's actually fused, and have neg modifiers
<cwabbott> we can also put even two immediates in a fma by sticking them in the const file
MajorBiscuit has quit [Quit: WeeChat 3.3]
MajorBiscuit has joined #dri-devel
<danvet> tzimmermann, thx
<danvet> mripard, well it's still stuck for me, no idea
<danvet> also only just now showed up for me on lore
<mripard> danvet: and to be fair, there's patches stuck in drm-misc-fixes during the merge window all the time. so yeah, maybe I tend to forget about it, but I don't really see what the big fuss is, or why we should pass the blame around
<danvet> mripard, two are from last week and didn't make it into the release
<danvet> pls yeah they get misplaced all the time by committers, I just like if they show up in a PR without me having to ask
<danvet> *plus
<tzimmermann> danvet, i found two patches that i can cherry-pick into -next-fixes. but test-building will take a bit.
<tzimmermann> ^ in -drm-misc-next
<danvet> tzimmermann, oh those I'm not worried about really since presumably committers didn't consider them fixes
<danvet> so except if you pushed them yourself I wouldn't just cherry-pick them
kts has joined #dri-devel
<tzimmermann> i misplaced one of them. i'll send out a PR if i can. if you don't merge it, the next PR for drm-misc-next is soonish anyway
<tzimmermann> danvet, ^ i also found several patches with Fixes tags in drm-misc-next that could be cherry-picked into -fixes. But these Fixes tags refer to months-old releases (v5.10). So they are maybe not really fixes
<tzimmermann> could we enforce something via 'dim'? so that it recommends a branch if the patch has a Fixes tag?
<tzimmermann> danvet, mripard, ^
<mripard> tzimmermann: I'm not really sure about that. Some fixes are not important enough to (like fixes for bugs that don't have any effect)
<mripard> *to be fast-tracked
<tzimmermann> mripard, i'm just thinking aloud :)
<mripard> yeah, I know :)
<mripard> the rule I've seen in other subsystems is that if it fixes something (bug, regression), then it goes in as a fix, otherwise it goes through to the next/devel branch
kts has quit [Quit: Konversation terminated!]
<danvet> mripard, tzimmermann yeah there's a disagreement in the wider community about these
<danvet> some (like gregkh) say anything with Fixes: should be cherry-picked to stable
<danvet> some sternly disagree, for good reasons
<danvet> it's tricky
<danvet> agd5f, tell me if you plan to fix the pr compiler fail right away and I'll wait a bit more with sending out the pull
<danvet> otherwise next week I guess
<tzimmermann> i don't have a clear position on this
<graphitemaster> robclark, the main point of comparison here is that WDDM on Windows supports mid-command buffer preemption for both compute and graphics queues - GL on there with long running compute kernels does not stall the desktop at all, neither do graphics tasks, though admittedly those just trigger TDR, but if you disable TDR the OS does keep the system responsive and usable. Nothing like this seems to exist for Linux though, long running compute
<graphitemaster> kernels in Vulkan on a compute-queue, or on a graphics+compute queue in OpenGL just completely deadlock the desktop and I've tried Intel, AMD, and NV, both open source and proprietary drivers (in the case of NV), and it's consistently bad. I've been able to pretty much deadlock the system and even prevent entering a kernel TTY.
<danvet> graphitemaster, hangcheck is supposed to clean up the mess
<graphitemaster> Even with a robust context too fwiw.
<danvet> but because reasons, drivers set the hangcheck to ridiculously long timeouts on compute queues
<danvet> which is not great
<GyrosGeier> nV drivers also fall over if you ask them whether they can render to a remote X server
<danvet> also i915 didn't have working hangcheck for a while
<GyrosGeier> "this NULL pointer here doesn't know"
<graphitemaster> hangcheck just stops running the thing though doesn't it?
<danvet> graphitemaster, yeah, it just prevents the hard hang
<graphitemaster> Yeah that's not really a good solution either :P
<danvet> the real fix, i.e. mid batch preemption, is really tricky
<graphitemaster> But yeah this is something I wish I could actually sponsor the development of getting improved.
<danvet> graphitemaster, ask jekstrand for the dma-fence vs compute presentation at plumbers last year
<tzimmermann> danvet, FYI i sent a PR with the two additional fixes.
<graphitemaster> Maybe if I have some pull at work I guess go to Igalia or Collabora
<danvet> that pretty much covers the entire problem in all its glory
<danvet> graphitemaster, so before you allocate budget, maybe check that out to understand how bad the dead alley is we're in :-/
MajorBiscuit has quit [Ping timeout: 480 seconds]
<graphitemaster> :(
<danvet> graphitemaster, it's not completely dead, I do think we have a way out
<danvet> but it's a lot more than "ah just change CS buffers so we can preempt them"
<graphitemaster> I've heard Apple has run into the same problem and will be reworking their entire driver stack to account for this in Metal.
<graphitemaster> Windows had some insanely forward thinking.
<danvet> yes and yes
<danvet> graphitemaster, also check out the recording if it exists, there's been some good further discussions on how we could get out of this dead end
* danvet not sure where it is
<graphitemaster> If you get a link I'd love to watch it.
<danvet> graphitemaster, https://linuxplumbersconf.org/event/11/contributions/1115/ this has the slides at least
<danvet> daniels, ^^ you know where/if recordings exist?
sdutt has joined #dri-devel
MajorBiscuit has joined #dri-devel
<danvet> agd5f, I just realized that the compile fail happens even before your pull, but it doesn't happen on 5.16
<danvet> so something is busted, but also readded your pull again
<danvet> also why does udelay not have a consistent limit for "too high value", why is this per-arch
nchery has joined #dri-devel
mlankhorst has joined #dri-devel
kts has joined #dri-devel
mlankhorst has quit [Remote host closed the connection]
kts has quit [Ping timeout: 480 seconds]
imre has quit [Remote host closed the connection]
<tursulin> dim rebuild-tip is hitting a conflict in drivers/gpu/drm/amd/display/dc/core/dc_link.c, anyone from AMD can take care of it?
<agd5f> danvet, looking at it today. thanks
<danvet> agd5f, maybe if you can first check out current drm-tip
<danvet> there's some conflicts between your pull and what's in 5.16
<danvet> I just went with what's in your public, it looked like any bugfixes you cherrypicked are included
<danvet> but maybe double-check
<danvet> against your internal tree or whatever
<danvet> so I can give the right instructions to linus
kts has joined #dri-devel
<agd5f> danvet, sure. thanks
<danvet> thx
mattrope has joined #dri-devel
<jenatali> Hm... this GCC warning seems... very bogus
<jenatali> Release-only, only with GCC 10: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17636342. That size value should only ever be 0 or 4. Dunno how it's getting anything other than that
mbrost has joined #dri-devel
fxkamd has joined #dri-devel
<agd5f> danvet, looks good. testing it now
Duke`` has joined #dri-devel
pnowack has quit [Quit: pnowack]
eukara has quit [Read error: Connection reset by peer]
macromorgan is now known as Guest569
Guest569 has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
<rodrigovivi> has anyone seen this gitlab issue when forking a project to a personal namespace: "An error occurred while forking the project. Please try again."
<rodrigovivi> I like how descriptive the error message is :/
<agd5f> danvet, looks good. thanks
<danvet> agd5f, thx a lot for checking, I'll go type some pr mail now
gawin has joined #dri-devel
<gawin> mareko: do you perhaps know why r300 is adding one extra write to output in VS?
<anholt> rodrigovivi: #freedesktop is the place to report infra issues, and make sure to include the specific repo.
illwieckz has quit [Ping timeout: 480 seconds]
<jekstrand> danvet: You keep pointing people at that presentation as if it contains answers. :-P
<danvet> jekstrand, VK_DEVICE_LOST to do the seamless upgrade from umf to dma-fence for sync is absolutely an answer I'm proud of :-P
<danvet> also the fact I call this "seamless" and am proud of it just shows how few answers there even are :-(
<danvet> jekstrand, plus the recording _does_ answer the question "I want preemptible compute, how hard can it be"
<jekstrand> danvet: Yes, it does contain that answer.
<jekstrand> Answer: VK_ERROR_ENGINEERS_LOST_IN_THE_WEEDS
<jekstrand> And the current situation is definitely VK_SUBOPTIMAL_KHR
<danvet> graphitemaster, ^^
MajorBiscuit has quit [Quit: WeeChat 3.3]
illwieckz has joined #dri-devel
<daniels> boom-tish
robertfoss has quit [Ping timeout: 480 seconds]
<zmike> breaking news, jekstrand pivoting to humor columnist
<daniels> ghostwriter at Super Good Code LLC
<daniels> sorry, *executive ghostwriter
luckyxxl has joined #dri-devel
robertfoss has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
Viciouss has quit [Quit: The Lounge - https://thelounge.chat]
Viciouss has joined #dri-devel
Viciouss has quit []
JohnnyonF has quit [Ping timeout: 480 seconds]
<jekstrand> I'll have you know that I don't shadow-write for SGC. That's all zmike. I won't take any of his credit/blame.
<zmike> wait...is THIS a hint?!
<zmike> have I hired you and forgotten about it?!?!?!?/
<ccr> it's one of those scifi plots where you erase your memory for safety purposes
ybogdano has joined #dri-devel
<jekstrand> zmike: I've been shadow-writing ever since you stopped wearing tin foil hats.
<jekstrand> Oops... Wasn't supposed to let that one out of the bag...
<zmike> I knew I felt smarter since last october but I couldn't figure out why
<austriancoder> lets say I want to emulate EXT_texture_sRGB and I would like to do it like fake_rgtc via u_transfer_helper.. do I miss some detail/blocker? Or should it be done in the shader?
hch12907_ has joined #dri-devel
Viciouss has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
<imirkin> austriancoder: well, you can adjust the srgb-ness at runtime
<imirkin> austriancoder: so if you pre-decode the texture contents, you'll have to reencode them when that sampler parameter gets flipped
<imirkin> (or tex param? i forget)
<jekstrand> If you push srgbness into the shader, you can apply the srgb curve manually
<imirkin> in the shader it means you need a key (or uniform) to determine which textures to do the srgb decoding on
<jekstrand> Yup
<jekstrand> Honestly, a key might not be too bad. Depends on the workload.
<imirkin> i suspect the shader approach is much more feasible than transfer-time
<jekstrand> But a side-band uniform wouldn't be terrible either. Assuming you have integer hardware, all you need is a bitfield.
<jekstrand> If you don't have integer hardware, it's still doable, just more annoying.
<imirkin> non-integer hw generally implies not-a-ton-of-uniforms too
<austriancoder> jekstrand: its for non integer hw
<austriancoder> imirkin: I have 64 uniforms (vec4)
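[editor's sketch] The shader-side approach discussed above boils down to two pieces: the standard sRGB-to-linear curve, and a per-texture flag that says whether to apply it. The Python below models that logic on the CPU. The bit test uses only floor, divide, multiply and subtract to mimic what hardware without integer support would have to do. Storing the flags as a float bitmask in one uniform component, and the function names, are assumptions made for illustration and not anything the driver is known to do.

```python
import math


def srgb_to_linear(c):
    """Standard sRGB decode for one colour channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def bit_is_set_float(mask, index):
    """Test bit `index` of `mask` without integer ops.

    Shift right by dividing by 2**index and flooring, then take the
    result modulo 2 with another floor. Only exact while `mask` fits
    in the float mantissa (24 bits for a 32-bit float).
    """
    shifted = math.floor(mask / (2.0 ** index))
    return shifted - 2.0 * math.floor(shifted / 2.0) == 1.0


def decode_texel(texel, unit, srgb_mask):
    """Apply sRGB decode to RGB (never alpha) if the unit's bit is set."""
    r, g, b, a = texel
    if bit_is_set_float(srgb_mask, unit):
        r, g, b = (srgb_to_linear(ch) for ch in (r, g, b))
    return (r, g, b, a)
```

Because srgb-ness can be flipped at runtime, as imirkin points out, a mask like this would have to be re-uploaded whenever the relevant sampler or texture state changes. A shader key would trade that upload for extra shader variants.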
<graphitemaster> Finished watching the presentation. I like the idea of user-space memory fences, and option 1 where you replumb everything to support it - I'd keep dmabuf for backwards compat but then set minimum hardware specs (i.e having preemption) on compositors / wayland
gawin has quit [Quit: Konversation terminated!]
hch12907 has joined #dri-devel
<graphitemaster> That's my vote, not that anyone cares, as for interop, just deprecate dmabuf with modern graphics and compute contexts, only allow them for backwards compat and if you decide to use import then you lose out on umf
* austriancoder plays with the shader approach
<graphitemaster> daniels, jekstrand thonks?
<daniels> graphitemaster: that's a load-bearing 'just' you've got there
<graphitemaster> I mean when you make the context that's info you'd know ... so it feels like a good just, the hard part to me seems if you use interop which can also be deprecated with newer contexts
<graphitemaster> basically just rely on the context version as a mechanism to move things forward rather than trying to be forwards compat too
mbrost has quit [Remote host closed the connection]
<graphitemaster> who gives a crap if interop and dmabuf work with gl 4.7 whenever the heck that comes out, or the next version of es / egl / vulkan / x11 / wayland
mbrost has joined #dri-devel
<graphitemaster> backwards compat means supporting those clients that are already using older contexts
<graphitemaster> anyways that's my two cents
hch12907_ has quit [Ping timeout: 480 seconds]
<jekstrand> There's a fun problem in there where we don't know what x11/wayland supports at the time we load the Vulkan driver and start creating synchronization objects. And Vulkan makes fall-back options... difficult.
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
<graphitemaster> So propose a new extension to communicate that information, the absence of the extension indicates the system cannot possibly support umf itself
<graphitemaster> I realize that's very hand-wavy and involves probably more than I realize
<graphitemaster> But transferring some information is also not rocket science :P
* ccr blinks his eyes at unexpected crocus crash.
<zmike> feel free to open tickets on the khronos github for vulkan spec
<daniels> Vulkan doesn't work like EGL does - you start working with the device and creating objects directly, then you speak to the window system almost as an afterthought only to create your target surfaces
<jekstrand> Which is the right way to do it!
<jekstrand> That wasn't a mistake, either at the time or in retrospect.
<daniels> so it's not that no-one can be bothered typing up the transit of a single bit, it's that Vulkan does not and cannot possibly know the information it needs, until it's too late
<jekstrand> But it does make this problem hard to solve.
mbrost has quit [Ping timeout: 480 seconds]
hch12907_ is now known as hch12907
<graphitemaster> I don't think that's actually all that difficult of a problem to solve, once the windowing system is aware of Vulkan all it has to do is reset the context :P
<graphitemaster> Do a wait for idle, throw out the context, then replace the sync objects with the proper ones backed by umf
<ajax> neither xserver nor (say) mutter have any concept of the client using vulkan
<graphitemaster> why is everything designed so oblivious to the existence of reality on Linux lol
<ajax> neither do they have any way to push a context reset to the client, really
<ajax> "the server" here would be the drm
<graphitemaster> Fine, define a socket protocol for communicating this information, wayland has to probe it, vulkan has to probe it, everything has to probe it
<graphitemaster> make a new daemon for it
<graphitemaster> call it a graphics hypervisor service
<jekstrand> Oh, great... Now vkCreateInstance is making DBus calls..
<graphitemaster> I didn't say DBus XD
<graphitemaster> Could also just hack some new stuff into gem / drm
<graphitemaster> some new ioctl everyone needs to probe
<jekstrand> except the kernel doesn't know either. We don't want to impose window-system restrictions on a headless compute process.
<jekstrand> It's either a giant negotiation disaster or we need a seamless fall-back.
<graphitemaster> The kernel just provides a communication channel, the driver would poke info into it, other systems would read it out, user-space would impose a convention on the contents of it and what it means, the kernel doesn't need to care - I mean it could just be a dumb buffer literally in the drm/gem sense, I mean the kernel already has the concept of scan out for systems with no GPUs I don't see the harm
Company has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
pnowack has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.3]
sdutt has quit []
sdutt has joined #dri-devel
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
luckyxxl has quit []
gawin has joined #dri-devel
ngcortes has joined #dri-devel
JohnnyonFlame has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
gouchi has joined #dri-devel
gawin has quit [Quit: Konversation terminated!]
gawin has joined #dri-devel
idr has joined #dri-devel
mszyprow_ has quit [Ping timeout: 480 seconds]
agd5f_ has joined #dri-devel
gawin_ has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
gawin has quit [Ping timeout: 480 seconds]
ahajda has quit []
gouchi has quit [Remote host closed the connection]
ngcortes_ has joined #dri-devel
ngcortes has quit [Read error: Connection reset by peer]
ngcortes_ has quit [Remote host closed the connection]
ngcortes has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
heat has joined #dri-devel
pnowack has quit [Quit: pnowack]
heat has quit [Read error: No route to host]
heat has joined #dri-devel
agd5f_ has quit []
agd5f has joined #dri-devel
<zmike> is there a function that converts glsl_base_type to nir_alu_type?
<zmike> nvm
gawin_ has quit []
gawin has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
mvlad has quit [Quit: Leaving]
gouchi has quit [Remote host closed the connection]
<jekstrand> zmike: I think so
<zmike> yea I found it
<zmike> no doubt you were off at your new job with Lego trying to avoid stepping barefoot on the bricks laying around your new corner office
<jekstrand> Working for LEGO could be good fun. Except I've looked at bits of their SW stack and it's atrocious.
<zmike> perfect place for you to parachute in and clean everything up
<jekstrand> I did write an optimizing compiler for Lego robots once
<zmike> if it wasn't so late in the day and I hadn't just spent the past 2 hours battling nvidia's shader compiler I'd be writing a blog post now
<HdkR> Being a Lego Master Builder would be a neat job
<jekstrand> It would.
Company has quit [Quit: Leaving]
thellstrom has quit [Remote host closed the connection]
thellstrom has joined #dri-devel
<jenatali> imirkin: You'd said something about a nir pass that converts sample_mask_in to a single bit for sample-frequency execution?
<imirkin> i did say something about that
<jenatali> I did a bit of digging but didn't quickly find it
<jenatali> You wouldn't happen to know the name, would you?
<imirkin> pretty sure intel and v3d use it
<imirkin> so i'd look in those
<jenatali> Alright, thanks
<imirkin> hrmph
<imirkin> perhaps i was thinking of the helper invocation lowering
<imirkin> which also uses the sample mask
<jenatali> Yeah, I did see that one
<imirkin> intel does it "by hand": fs_visitor::emit_samplemaskin_setup
<imirkin> (in src/intel/compiler/brw_fs.cpp)
<jenatali> Yeah that's what I'm about to do, it's not hard
<jekstrand> jenatali: I wrote a different one for single-sample recently
<jenatali> To smash sample_mask_in to 1?
<jekstrand> yeah
<jekstrand> And sample_id to 0 and a bunch of other stuff
<jenatali> Yeah, I found myself with one of those yesterday. Apparently writing output sample mask in single-sample is ignored in GL, but writing 0 will still kill pixels in D3D
<imirkin> jenatali: there's a subtle difference between single-sample and non-multi-sample
<imirkin> it's largely academic
<jenatali> Right, I suppose that's what I meant
<imirkin> basically when multisampling is disabled, gl_SampleMask does nothing
<imirkin> this can actually happen when you have a msaa surface bound
<imirkin> but you rasterize without msaa
<imirkin> etc
<imirkin> the value is essentially broadcast to all the samples
<imirkin> (this is not a useful thing to do. just a spec corner.)
<jenatali> imirkin: D3D has a feature for that we call target-independent rasterization (TIR). Apparently we added it for D2D?
<jekstrand> for forced per-sample, I wouldn't be opposed to a NIR pass either. But there's not much to put in that. Basically just sample mask in
<jenatali> Yeah, I'll just do it in the backend, not a big deal
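The sample-mask lowerings discussed above can be sketched roughly as follows. This is a plain Python model with hypothetical function names, not the NIR pass or the Intel backend code. It assumes that under forced per-sample execution each invocation should see only its own sample's bit, and it models the GL vs. D3D difference for single-sample output masks as described in the conversation. How D3D treats a non-zero mask whose bit 0 is clear is an assumption here.

```python
def per_sample_mask_in(coverage_mask, sample_id):
    """sample_mask_in for one per-sample invocation.

    Keeps only the bit for the sample being shaded, and only if that
    sample is actually covered.
    """
    return coverage_mask & (1 << sample_id)


def single_sample_sysvals():
    """Constants substituted when rendering single-sampled."""
    return {"sample_mask_in": 1, "sample_id": 0}


def pixel_survives_single_sample(api, written_mask):
    """Effect of a shader-written sample mask in single-sample rendering.

    GL: the write is ignored. D3D: writing 0 still kills the pixel.
    Assumption: for D3D only bit 0 matters in this case.
    """
    if api == "gl":
        return True
    if api == "d3d":
        return (written_mask & 1) != 0
    raise ValueError("unknown api: " + api)
```

For example, with coverage `0b1011` the invocation for sample 1 sees `0b0010`, while the invocation for sample 2 sees `0`.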
heat has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
<Lyude> HdkR: One of my roommates used to work for one of the Lego stores in one of the malls around here, and apparently while he was working at the store Jamie Berard went to buy legos from before Jamie became a Lego master builder
<Lyude> oops, that sentence came out wonky. "Apparently he was working at the same store that Jamie Berard bought legos from before they were big"
<HdkR> Lyude: Caught a lego celebrity in the wild :P
<Lyude> yep!
<Lyude> On a positive note as well, while my roommate says they'd never take the job at Lego again it -was- supposedly the best retail job they ever had
danvet has quit [Ping timeout: 480 seconds]
heat_ has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
heat has quit [Ping timeout: 480 seconds]
LexSfX has quit []
kts has quit [Quit: Konversation terminated!]
LexSfX has joined #dri-devel
heat_ has quit [Remote host closed the connection]
pcercuei has quit [Quit: Lost terminal]
pcercuei has joined #dri-devel
LexSfX has quit []