ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
rasterman has quit [Quit: Gettin' stinky!]
tobiasjakobi has quit [Read error: Connection reset by peer]
nchery has quit [Remote host closed the connection]
tobiasjakobi has joined #dri-devel
vivijim has joined #dri-devel
tursulin has quit [Read error: Connection reset by peer]
mbrost has quit [Ping timeout: 480 seconds]
* anholt_ wishes gitlab's jobs interface could search by job name
mbrost has joined #dri-devel
ngcortes has quit [Remote host closed the connection]
nchery has joined #dri-devel
nchery has quit [Remote host closed the connection]
vivijim has quit [Ping timeout: 480 seconds]
enick_805 has left #dri-devel [#dri-devel]
DrNick has joined #dri-devel
jhli has quit [Ping timeout: 480 seconds]
vivekk has joined #dri-devel
sdutt_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
sdutt has quit [Ping timeout: 480 seconds]
vivek has quit [Ping timeout: 480 seconds]
boistordu has joined #dri-devel
boistordu_old has quit [Ping timeout: 480 seconds]
icecream95[m] is now known as icecream95
Company has quit [Read error: Connection reset by peer]
xyene has joined #dri-devel
camus has joined #dri-devel
camus1 has quit [Remote host closed the connection]
<xyene> hi -- out of curiosity, is there any reason EGL_EXT_swap_buffers_with_damage isn't supported in an llvmpipe EGL context? (or am I just doing something wrong...?)
<airlied> don't think anyone has written support for sw paths
aravind has joined #dri-devel
Duke`` has joined #dri-devel
<xyene> airlied: thanks for confirming. Would there be interest in supporting this? My motivation is trying to use Firefox through a VNC session, a frustrating experience :)
<xyene> I tried tracing through the code to understand how the extension is reported as present or not, and how llvmpipe ends up calling wl_surface.commit for Wayland, but got lost in the indirections
<xyene> could I maybe get some pointers on how this is structured?
<airlied> xyene: yeah it's pretty convoluted, let me see if I can find a pointer
<airlied> src/egl/drivers/dri2/platform_wayland.c is the main place
<airlied> dri2_initialize_wayland_swrast needs to add the extension, but I'm not sure how much the llvmpipe paths are able to do partial updates
<airlied> dri2_wl_swrast_swap_buffers would need a with_damage version
NiksDev has joined #dri-devel
<xyene> airlied: thanks, that makes sense. I guess what I'm most unclear about is how the dri2_egl_display_vtbl ends up being referenced in llvmpipe
<xyene> "but I'm not sure how much the llvmpipe paths are able to do partial updates" hmm, I'm almost certainly missing something here, but isn't the extension "just" about reporting to the compositor what has changed? or is there some internal Mesa state that I'm overlooking here
<airlied> xyene: yeah it just passes the damage rects, so maybe it shouldn't matter to the llvmpipe bits at all
<airlied> xyene: llvmpipe itself doesn't really deal with swapbuffers
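Note on the damage rects mentioned above: EGL_EXT_swap_buffers_with_damage takes rectangles as (x, y, width, height) with the origin at the bottom-left of the surface, while Wayland surface damage uses a top-left origin, so a with_damage path would need a vertical flip before reporting damage. A minimal Python sketch of that conversion follows; the function name is hypothetical and this is an illustration only, not code from platform_wayland.c.

```python
def egl_damage_to_wayland(rects, surface_height):
    """Convert a flat EGL damage list [x, y, w, h, ...] (bottom-left origin)
    into (x, y, w, h) tuples that use a top-left origin."""
    if len(rects) % 4 != 0:
        raise ValueError("expected 4 integers per rectangle")
    converted = []
    for i in range(0, len(rects), 4):
        x, y, w, h = rects[i:i + 4]
        # Flip vertically: the top edge of the rectangle becomes its new y.
        converted.append((x, surface_height - y - h, w, h))
    return converted
```

Only the y coordinate changes; x, width and height pass through untouched.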
itoral has joined #dri-devel
<xyene> airlied: got it, I think I understand how this flows now. one more question: how do people typically test their mesa changes? I can think of replacing the current mesa or running in a VM, wondering if there's a middle ground
<airlied> xyene: install to a local prefix, and set LD_LIBRARY_PATH is probably the easiest, esp for EGL
<xyene> great, will give that a go when I get a chance. thanks!
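A rough sketch of the local-prefix approach airlied describes, written as a small Python helper that prepends the install's library directory to LD_LIBRARY_PATH. The prefix path and the library subdirectory are assumptions (the subdirectory varies by distro and build options), and the helper name is hypothetical.

```python
import os

def env_with_local_mesa(prefix, libdir="lib", base_env=None):
    """Return a copy of the environment with <prefix>/<libdir> searched first."""
    env = dict(os.environ if base_env is None else base_env)
    local = os.path.join(prefix, libdir)
    previous = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the locally installed libraries win over the system ones.
    env["LD_LIBRARY_PATH"] = local + (os.pathsep + previous if previous else "")
    return env

# Example use (hypothetical install location):
#   import subprocess
#   subprocess.run(["eglinfo"], env=env_with_local_mesa("/home/user/mesa-install", "lib64"))
```

Running the test program through a helper like this leaves the system Mesa untouched, which is the middle ground between replacing it and using a VM.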
tobiasjakobi has quit [Remote host closed the connection]
lemonzest has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
sdutt_ has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
frieder has joined #dri-devel
alanc has joined #dri-devel
thellstrom has joined #dri-devel
pcercuei has joined #dri-devel
mlankhorst has joined #dri-devel
sdutt_ has joined #dri-devel
rasterman has joined #dri-devel
mattst88 has quit [Ping timeout: 480 seconds]
pnowack has joined #dri-devel
danvet has joined #dri-devel
jkrzyszt has joined #dri-devel
vivekk has quit [Ping timeout: 480 seconds]
mattst88 has joined #dri-devel
i-garrison has quit []
i-garrison has joined #dri-devel
tursulin has joined #dri-devel
K`den has joined #dri-devel
Kayden has quit [Remote host closed the connection]
<hakzsam> anholt_: what's the status of this MR https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2640 ? it seems outdated, at least for RADV.
Ahuj has joined #dri-devel
K`den is now known as Kayden
itoral_ has joined #dri-devel
itoral has quit [Remote host closed the connection]
camus1 has joined #dri-devel
camus has quit [Remote host closed the connection]
orbea1 has joined #dri-devel
orbea has quit [Ping timeout: 480 seconds]
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
camus has joined #dri-devel
hikiko has quit [Ping timeout: 480 seconds]
camus1 has quit [Ping timeout: 480 seconds]
hikiko has joined #dri-devel
Surkow|laptop has quit [Quit: 418 I'm a teapot - NOP NOP NOP]
Surkow|laptop has joined #dri-devel
hikiko has quit [Remote host closed the connection]
hikiko has joined #dri-devel
hikiko_ has joined #dri-devel
hikiko has quit [Read error: Connection reset by peer]
Net147 has quit [Remote host closed the connection]
Net147 has joined #dri-devel
Lucretia has quit []
Lucretia has joined #dri-devel
aravind has quit []
sdutt_ has quit [Read error: Connection reset by peer]
Company has joined #dri-devel
iive has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
frieder has quit [Ping timeout: 480 seconds]
sven has joined #dri-devel
vivijim has joined #dri-devel
frieder has joined #dri-devel
Peste_Bubonica has joined #dri-devel
swick has joined #dri-devel
sdutt has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
sdutt has quit []
orbea1 has quit []
orbea has joined #dri-devel
<alyssa> Afraid to ask but is Mesa IR still used for anything?
sdutt has joined #dri-devel
<alyssa> kusma: is saying in another channel "only for classic drivers that aren't i965"
<alyssa> which seems sufficiently obscure to ignore in my diagram
<ajax> that could only maybe be i915? and r200 if we used it for ATI_fragment_shader maybe?
<ajax> either way: don't care
<alyssa> oh and ffvertex_prog, ughhh
* alyssa had thought that was at least tgsi
<alyssa> wait maybe that is tgsi
<alyssa> no i don't think so ugh
* alyssa adds more nodes to her digraph
<pinchartl> stupid question, how does one implement mode filtering with DP ? the max link frequency depends on link training, so I can't implement a bridge .mode_valid() or .mode_fixup()/.atomic_check(), as the result will depend on the result of the mode set
<danvet> pinchartl, mode_valid filters per the latest probe results
<danvet> if the atomic_commit goes south, you just light up the pipe anyway
<danvet> and set the link status flag to bad
<danvet> and take the newly discovered limits into account in your mode_valid
<danvet> at least until the next hotunplug
<pinchartl> ok, thanks
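The flow danvet outlines can be modelled roughly as below: mode_valid filters against the limits known so far, and a failed link training tightens those limits before the link status is flagged bad. This is a toy Python model under stated assumptions, not kernel code: the class and method names are hypothetical, rates are raw per-lane rates in Gbit/s, and 8b/10b coding is assumed (80% of the raw rate carries data).

```python
class DpLinkLimits:
    """Tracks the best link configuration currently believed to work."""

    def __init__(self, max_rate_gbps, max_lanes):
        # Limits taken from the latest probe of the sink.
        self.current = (max_rate_gbps, max_lanes)
        self.link_status_good = True

    def mode_valid(self, pixel_clock_mhz, bpp=24):
        rate, lanes = self.current
        # 8b/10b channel coding leaves 80% of the raw rate for pixel data.
        available_gbps = rate * lanes * 0.8
        required_gbps = pixel_clock_mhz * bpp / 1000.0
        return required_gbps <= available_gbps

    def training_failed(self, working_rate_gbps, working_lanes):
        # Update the limits first so mode_valid already filters with them,
        # then flag the link as bad so userspace re-picks a mode.
        self.current = (working_rate_gbps, working_lanes)
        self.link_status_good = False

    def reprobe(self, max_rate_gbps, max_lanes):
        # After a hot-unplug the discovered limits are forgotten.
        self.current = (max_rate_gbps, max_lanes)
        self.link_status_good = True
```

With this model, a mode that passed before the failed commit can be rejected afterwards, which is why the mode list userspace sees changes once the link status goes bad.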
<pinchartl> should I perform link training at probe time ?
<danvet> nah
<danvet> well, if you're bored maybe
<danvet> usually it should work
<pinchartl> :-)
<danvet> so for link-status userspace gets an uevent
<danvet> and should realize that the display list has changed
<danvet> and pick something else
<danvet> some don't
<danvet> nothing you can do there
<pinchartl> bored isn't really the right term to describe my current situation I think
<danvet> we haz docs even
<pinchartl> so I'll go for best effort
<pinchartl> thanks
<danvet> oh the kerneldoc for https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#c.drm_connector_set_link_status_property should explain that your mode_valid needs to filter already before you call this
<danvet> or there's races possible
<danvet> would be good to clarify
i-garrison has quit [Remote host closed the connection]
i-garrison has joined #dri-devel
f11f12 has joined #dri-devel
shash_sha has quit [Remote host closed the connection]
f11f12 has quit [Quit: Leaving]
hanetzer has quit [Quit: WeeChat 3.2]
Ahuj has quit [Ping timeout: 480 seconds]
<alyssa> jekstrand: what is fsat(NaN)?
<jekstrand> alyssa: I think IEEE rules would say NaN. I kind-of want it to be 0, though.
<bnieuwenhuizen> why not 1?
<alyssa> i think our hardware really wants it to be 0 too
<alyssa> or 1 or undefined
<jekstrand> It should either be NaN or some undefined value in the range [0, 1]
<jekstrand> alyssa: I smell a piglit test
<alyssa> afaict on Mali min(max(f(x), 0.0), 1.0) is IEEE correct but f.sat(x) is not
<alyssa> which interacts with vk
<jekstrand> min/max have different IEEE rules
<pendingchaos> alyssa: it's 0.0
<pendingchaos> IIRC it doesn't have to be if the fsat isn't exact though
<alyssa> jekstrand: but then clamp(x, 0.0, 1.0) -> fsat(x) is inexact?
* jekstrand just realized he has "real" access to IEEE standards
hanetzer has joined #dri-devel
* jekstrand downloads the IEEE-754-2019 PDF
<jekstrand> I should probably get it printed and bound.
<alyssa> lol
Duke`` has joined #dri-devel
<Sachiel> cover your walls with it
<Sachiel> standard wallpaper
<ccr> make paper boat(s) out of the print, so they float .. "we all float down here"
<alyssa> jekstrand: what about fsat(-0.0)?
<alyssa> valhall reports that as +0.0
<alyssa> but fsat_signed(-0.0) = -0.0
<alyssa> clamp(NaN, -1.0, 1.0) = -1.0
<alyssa> clamp(-NaN, -1.0, 1.0) = -1.0
<alyssa> clamp(-NaN, 0.0, 1.0) = 0.0
<jekstrand> The IEEE spec says nothing about fsat. It's not a defined IEEE float operation
<alyssa> clamp(NaN, 0.0, 1.0) = 0.0
<alyssa> max(NaN, 0.0) = 0.0
<alyssa> max(-NaN, 0.0) = 0.0
<alyssa> max(-0.0, 0.0) = 0.0
<alyssa> those are all the interesting rules on Valhall... I assume older mali behaves the same
<alyssa> (I have valhall computerator handy, don't have equiv for older)
<jekstrand> Ok, from a Spec POV, here's my conclusion:
mareko has quit [Remote host closed the connection]
<jekstrand> IEEE doesn't specify sat, so it's up to GLSL/SPIR-V
mareko has joined #dri-devel
vivek has joined #dri-devel
<jekstrand> SPIR-V doesn't spec it either. It only has clamp. Let's assume sat(x) = clamp(x, 0, 1)
<jekstrand> SPIR-V specifies clamp(x, minVal, maxVal) = min(max(x, minVal), maxVal)
<jekstrand> So sat(x) = clamp(x, 0, 1) = min(max(x, 0), 1)
<jekstrand> Which means sat(NaN) = 0
<alyssa> not NaN?
<jekstrand> Correct
<jekstrand> And not 1
<alyssa> so python is wrong then
<alyssa> ?
<jekstrand> Python has different rules
<alyssa> Ug!
<jekstrand> I had to go through two levels of SPIR-V to get to IEEE there
<alyssa> does C have correct rules for this stuff? I'd like to know which of the Mali rules are correct
<jekstrand> Define "correct"
<jekstrand> IEEE only specifies behavior of min/max
<alyssa> bit identical to what OpenCL does without -ffast-math and to Vulkan with all the strict float controls on
<jekstrand> And IEEE has two different min/max rules. There's minimum/maximum and minimumNumber/maximumNumber
<jekstrand> SPIR-V and DXIL/DXBC both follow minimum/maximum
<alyssa> = fminf?
<jekstrand> In IEEE-754, minimum(x, NaN) = minimum(NaN, x) = x
<alyssa> and signed zero?
<jekstrand> Bother, I mean SPIR-V and DXIL/DXBC follow minimumNumber/maximumNumber
<pendingchaos> SPIR-V has separate NMin/NMax/NClamp (D3D behaviour) and FMin/FMax/FClamp (GLSL behaviour) opcodes
<pendingchaos> which operand is the result of FMin/FMax/FClamp if one operand is NaN is undefined
<jekstrand> +0 compares greater than -0
<jekstrand> So fmax(-0, +0) = +0
<jekstrand> pendingchaos: Bah. I forgot about nmin/nmax
<alyssa> bloody
<jekstrand> alyssa: Mali does sat(-0) = -0?
<alyssa> jekstrand: sat(-0) = +0
<jekstrand> alyssa: That's fine with my derived IEEE rules
<jekstrand> Looks like NIR's constant folding *should* be correct
<jekstrand> I'd really love some constant-folding unit tests
<alyssa> All the cases there are correct under "use the other operand if one operand is NaN" rules
<jekstrand> I started writing some eons ago
<alyssa> which is fine for FMin/FMax
rgallaispou has joined #dri-devel
<alyssa> but I don't know if it's fine for NMin/NMax
<danvet> robclark, did you send the pull request for your msm drm/sched conversion?
<jekstrand> alyssa: NMin/NMax are "use the other operand if one operand is NaN"
<alyssa> jekstrand: oh, perfect.
<danvet> melissawen just sent out a v3d rfc which will add more rebase pain for me ...
<jekstrand> alyssa: FMin/FMax are looser: "Use either operand if one operand is NaN"
<alyssa> jekstrand: So valhall, at least, matches IEEE rules exactly.
<jekstrand> alyssa: \o/
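To make the conclusion above concrete, here is a small Python sketch of sat(x) = min(max(x, 0), 1) under the "use the other operand if one operand is NaN" rule, with +0 treated as greater than -0. The function names are hypothetical and this is not NIR's constant-folding code. It also shows why Python's built-in min/max are a poor reference: they rely on plain comparisons, so the result with a NaN operand depends on argument order.

```python
import math

def max_number(a, b):
    """Maximum that returns the non-NaN operand when exactly one is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # +0 compares greater than -0 for this operation.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def min_number(a, b):
    """Minimum that returns the non-NaN operand when exactly one is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def sat(x):
    # sat(x) = clamp(x, 0, 1) = min(max(x, 0), 1)
    return min_number(max_number(x, 0.0), 1.0)

# Python's built-ins propagate the NaN in this operand order instead.
builtin_result = min(max(float("nan"), 0.0), 1.0)
```

Under these rules sat(NaN) is 0 and sat(-0.0) is +0.0, matching the Valhall behaviour alyssa reports above.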
vivek has quit []
<robclark> danvet: yeah, I sent that last week
<jekstrand> alyssa: It'd be awesome if we had a pile of SPIR-V (to avoid the GLSL compiler) shader-runner tests which compare HW to NIR's constant folding of various ops.
<danvet> oh cool need rebase asap then
<alyssa> jekstrand: I go back to uni in a month I can't be nerdsniped now!
<jekstrand> :)
<alyssa> oh wait
<alyssa> Validating Numerical Floating-Point Correctness in Graphics Compilers by Rosenzweig et al
<alyssa> yeah just ping me in September 🙃
<jekstrand> alyssa: Who's in the "et al"?
<alyssa> jekstrand: Thanks for volunteering
<melissawen> danvet, tbh, I expect to have your series already applied (because I believe it is almost there), before I can apply mine
<jekstrand> alyssa: Do NOT put me on that author list. :P
<danvet> melissawen, ok I'll try and rebase mine
* jekstrand doesn't need random ARM people asking him about the IEEE correctness of their own hardware.
V has quit [Ping timeout: 481 seconds]
V has joined #dri-devel
nchery has joined #dri-devel
V has quit [Ping timeout: 480 seconds]
<alyssa> Alright, the Valhall spec now documents NaN and signed zero behaviour of clamps
<jekstrand> alyssa: I love that you're writing your own spec. 😂
<zmike> +1 for FDC Willard
frieder has quit [Remote host closed the connection]
V has joined #dri-devel
<graphitemaster> You know what floating point needs? More NaNs
<Sachiel> LOD based NaN
NiksDev has quit [Ping timeout: 480 seconds]
<graphitemaster> What we do is eliminate single-precision and integers in the hardware, leaving on double-precision. Then we use NaN boxing for the integers, like a lot of managed language VMs.
<graphitemaster> s/on/only
<graphitemaster> That's your new GPU
<jekstrand> graphitemaster: You think that's a joke....
<graphitemaster> Descriptor sets and all that, just some NaN tagging.
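For readers unfamiliar with the NaN boxing graphitemaster refers to: the idea is to hide a non-float value in the payload bits of a quiet NaN. A minimal Python sketch, assuming a 32-bit unsigned payload and no type tag bits (real VMs usually reserve some payload bits for a tag); the function names are hypothetical, and payload preservation through float loads and stores is assumed, which holds on common platforms but is not guaranteed everywhere.

```python
import struct

QNAN_BITS = 0x7FF8_0000_0000_0000  # exponent all ones plus the quiet bit
PAYLOAD_MASK = 0xFFFF_FFFF         # low 32 bits carry the integer

def box_u32(value):
    """Hide a 32-bit unsigned integer in the payload of a quiet NaN."""
    if not 0 <= value <= PAYLOAD_MASK:
        raise ValueError("value does not fit in 32 bits")
    bits = QNAN_BITS | value
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox_u32(boxed):
    """Recover the integer from a double produced by box_u32."""
    bits = struct.unpack("<Q", struct.pack("<d", boxed))[0]
    if bits & QNAN_BITS != QNAN_BITS:
        raise ValueError("not a boxed value")
    return bits & PAYLOAD_MASK
```

Every boxed value is a NaN as far as floating-point arithmetic is concerned, so the integer only exists for code that inspects the raw bits.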
<graphitemaster> The future is WebGPU anyways I've been told, may as well have the hardware execute JS directly, JS shaders.
<graphitemaster> jekstrand, Oh this exists already?
<jekstrand> graphitemaster: In OpenGL ES 2 (or maybe it's 3?), integers only require at most 24 bits so you can use floats for them.
<jekstrand> On AMD, they love mediump integers which they map to 24-bit and use the float hardware.
<jekstrand> Or at least they used to
<jekstrand> Other hardware has played similar tricks.
<jekstrand> The AMD 24-bit integer thing is "real" integers, not emulated. But they run them through the float hardware because it's faster.
<jekstrand> Again, this is from many years ago so may not be true on RDNA or GCN.
<graphitemaster> Makes sense, floating point mul is faster than integer mul even on x86_64 XD
jhli has joined #dri-devel
<mareko> the AMD 24-bit integer is really just the mul instruction doing i24 * i24 = i32
<jekstrand> What I know is that when we were spec'ing Vulkan, the AMD guy kept pushing for mediump integers. Apparently, they were a bit his pet.
<jekstrand> That and HW atomic counters.
gouchi has joined #dri-devel
<mareko> I'll just say that 24-bit mediump integers and HW atomic counters are irrelevant today
<jekstrand> Yeah
<jekstrand> That's why they aren't in Vulkan. :D
<zmike> Vulkan: only the most relevant technologies. And transform feedback.
<bnieuwenhuizen> jekstrand: just need to make the UBO/SSBO size limit small enough I can use it for all my address calculations
<mareko> was he Graham Sellers?
<jekstrand> mareko: Yes
<mareko> he joined nvidia last year
<jekstrand> Yeah, I know
<bnieuwenhuizen> didn't nvidia have something with 24-bit multiplies as well?
<mareko> bnieuwenhuizen: everybody has that because it's free, it's the 24-bit multiplier used for FP32
<jekstrand> We don't
<jekstrand> I mean, I'm sure the HW has such a multiplier but we don't have an instruction for it
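As a rough illustration of the "i24 * i24 = i32" multiply mareko describes: only the low 24 bits of each operand take part, which is why the 24-bit significand multiplier already present for FP32 can be reused. The Python sketch below models that behaviour; the function names are hypothetical and the details of any particular hardware instruction are not implied.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def mul_i24(a, b):
    """i24 * i24 -> i32: bits above the low 24 of each operand are ignored."""
    product = sign_extend(a, 24) * sign_extend(b, 24)
    # Keep the low 32 bits of the product, interpreted as signed.
    return sign_extend(product, 32)
```

The result is only the same as a full 32-bit multiply when both operands actually fit in 24 bits, which is the condition bnieuwenhuizen's address calculations would need.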
<hch12907> zmike: to be fair, transform feedback was added for the D3D compatibility layer
<graphitemaster> Oh good, I'm glad to see even driver devs hate xfb
<jekstrand> graphitemaster: With a passion
<mareko> so do hw devs
<graphitemaster> It's not really relevant now with compute shaders I'd say.
<graphitemaster> Can we kill it, how do we kill it
<danylo> It's useful for debugging
<graphitemaster> Ironically, my engine has a really ugly emulation of compute shaders in terms of image load/store and xfb
<hch12907> why would you need to emulate compute shaders?
<jekstrand> GLES3?
<mareko> compute can't handle primitive restart
<mareko> hence transform feedback can't be fully done with compute
<graphitemaster> Apple is like the worst, but imageLoad/Store doesn't exist there either, so I do some even uglier stuff there.
<graphitemaster> And yeah WebGL 2 (ES3)
<jekstrand> It's not just primitive restart. Compute can't really do GS or tessellation either.
<graphitemaster> Khronos really dropped the ball adopting compute when D3D had it for years. Pushing their OpenCL nonsense instead and all the context switch overhead of that.
<mareko> yeah
<jekstrand> You could do GS if you knew the number of primitives but it'd mean processing all vertices 3x
<bnieuwenhuizen> well, with shared memory I'm sure one could write an appropriate processing loop
<bnieuwenhuizen> that doesn't have too many inefficiencies
<mareko> compute also can't do variable-sized buffer append
<jekstrand> Yeah, you could build a VS output cache in SLM.
<jekstrand> But that doesn't fix the ordering problems
<bnieuwenhuizen> just run a compaction pass at the end
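One generic way to get an ordered, variable-sized append of the kind discussed above is a count pass, an exclusive prefix sum over the counts, and a scatter pass. The Python sketch below shows only the data flow; it is an assumption-laden illustration (hypothetical function name, sequential execution) and says nothing about how costly the equivalent synchronization is on a GPU or what any driver does.

```python
from itertools import accumulate

def ordered_append(inputs, expand):
    """Emit a variable number of outputs per input while keeping input order."""
    items = list(inputs)
    # Pass 1: every "invocation" produces its outputs independently.
    per_item = [list(expand(item)) for item in items]
    counts = [len(outputs) for outputs in per_item]
    # Exclusive prefix sum: offsets[i] is where item i starts writing.
    offsets = list(accumulate(counts, initial=0))
    result = [None] * offsets[-1]
    # Pass 2: scatter. Iterating in reverse shows the store order is
    # decided by the offsets, not by the order the work happens to run in.
    for i in reversed(range(len(items))):
        result[offsets[i]:offsets[i] + counts[i]] = per_item[i]
    return result
```

The offsets decouple launch order from buffer store order, at the price of the extra pass and the scan.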
<danvet> robclark, your drm/sched conversion isn't quite right
<danvet> drm_sched_job_init is the point of no return, you're not allowed to fail after that
<danvet> or the scheduler gets very confused
<bnieuwenhuizen> I'm sure with some smartness wrt in what order work gets launched we can do something decent here
<danvet> which is what my patch set with the split into init/arm is meant to fix
<mareko> launch order != buffer store order
<danvet> robclark, I'll whack a FIXME comment on there and do a proposed fix or so
<graphitemaster> Maybe driver devs can explain why GS has to happen _after_ the VS, rather than before. It's somewhat useless in my opinion to happen after. It's also ridiculous that the hardware has to consume geometry shader outputs serially (rendered in input order).
<danvet> robclark, luckily only needs the first prep patch to sort out
<bnieuwenhuizen> if it helps I don't think AMD HW gets good parallelism in practice with GS either
<bnieuwenhuizen> so it is not like we have to support full N workgroups with compute
<mareko> bnieuwenhuizen: why not good parallelism?
<mareko> graphitemaster: no reason, just MS requirement
NiksDev has joined #dri-devel
<bnieuwenhuizen> mareko: dunno, just never seen anything even remotely good at keeping lots of GS waves in flight
<mareko> bnieuwenhuizen: it's fast enough to feed the rasterizer
<graphitemaster> I don't think anyone except Intel gets good performance from GS. It's my understanding NV and AMD both require round-trips through memory. I think AMD uses on-chip cache and has to deal with weird wrapping behavior and NV uses off-chip DRAM with really high latency. Intel iGPUs have massive register files for it (this is based on my really dated knowledge also I ain't a driver dev)
<bnieuwenhuizen> guess it might be different with NGG
<mareko> graphitemaster: using memory is not a disadvantage
<graphitemaster> Anyways why are we talking about how to fix GS. Can we also kill them too. No more GS, XFB, or TESS. Let's just go mesh already.
<graphitemaster> GL will never get an extension will it
<bnieuwenhuizen> graphitemaster: need mesh shader support in vulkan first ...
<mareko> bnieuwenhuizen: I think we allow "num CUs * 4" wave64s for GS and VS by default
<mareko> at least
<danvet> robclark, on the upside drm_sched_job_init is at the right place already for my series :-)
moa is now known as bluebugs
<robclark> yeah, if we could just defer creating the fence, I think that (from memory) is probably all that is needed
<mareko> I think only AMD has the necessary special hw that can do XFB in compute
<mareko> but not with tess and GS
<mareko> actually GS would be fine, tess would need to be emulated in compute
<mareko> the driver effort would be so large that it's not worth it
<dcbaker> Soooo... There's nothing on the 21.2 milestone. Does anyone object to just having the 21.2 release today?
<alyssa> jekstrand: Well if Arm won't give us the spec I will... ;)
<graphitemaster> I'm going to be finally dabbling in Vulkan as a serious thing soon and there's basically nothing about it I like, how do I come around to it XD
<alyssa> graphitemaster: Mali doesn't have any GS/tess support. Arm lowers into massive compute shaders. this is the reason panfrost is capped at GL3.1
Peste_Bubonica has quit [Quit: Leaving]
<mareko> alyssa: how does it do compute when GS/tess produce a variable number of primitives? how does it do primitive restart with compute?
<bnieuwenhuizen> dcbaker: we might have some cache issues with radv in the single-file cache. But given that it also made its way to stable releases not sure it is worth holding the release over that
<alyssa> mareko: No idea, I didn't care to read 10,000 lines of Mali assembly to find out either.
<bnieuwenhuizen> so at the moment just targeting having it fixed by .1
<mareko> alyssa: it's impossible in compute shaders as defined by APIs
<alyssa> One trick the driver can do that the app can't is allocate a massive BO that's not backed by any memory
<alyssa> and then the kernel maps pages dynamically on page fault
<graphitemaster> What makes it impossible?
<bnieuwenhuizen> mareko: the trick is to do synchronization using atomics
<alyssa> "grow on page fault", "growable", or "heap" memory
<dcbaker> bnieuwenhuizen: I'm working out some bugs with our tooling at the moment, so if you decide you want to hold it up let me know. jenatali completely broke my assumption that no one would do something like `Closes: https://github.com/Microsoft/CL/10000` in a mesa commit, lol
<mareko> bnieuwenhuizen: what kind of synchronization?
<jenatali> dcbaker: Oops :)
<alyssa> so that lets a driver-internal compute shader pretend to do dynamic mem alloc
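The "grow on page fault" idea alyssa describes can be modelled as a large reserved range whose pages only get backing memory on first touch. The Python class below is a toy model of that behaviour; the names and the 4096-byte page size are assumptions, and nothing here reflects the actual kernel or driver interfaces.

```python
class GrowableHeap:
    """A big reserved range whose pages get backing memory on first touch."""

    PAGE_SIZE = 4096

    def __init__(self, reserved_size):
        self.reserved_size = reserved_size
        self.pages = {}  # page index -> bytearray, created lazily

    def _page(self, address):
        if not 0 <= address < self.reserved_size:
            raise IndexError("access outside the reserved range")
        index = address // self.PAGE_SIZE
        if index not in self.pages:
            # The "page fault": back this page with memory only now.
            self.pages[index] = bytearray(self.PAGE_SIZE)
        return self.pages[index], address % self.PAGE_SIZE

    def write_byte(self, address, value):
        page, offset = self._page(address)
        page[offset] = value

    def read_byte(self, address):
        page, offset = self._page(address)
        return page[offset]

    def committed_bytes(self):
        return len(self.pages) * self.PAGE_SIZE
```

A shader writing an unpredictable amount of data only pays for the pages it touches, which is what lets driver-internal code pretend to allocate dynamically.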
<dcbaker> you're actually not the only one
<bnieuwenhuizen> dcbaker: not holding from my side
<alyssa> jenatali: Nicely done 😉
<dcbaker> I also discovered that people are writing `Closes: #1, #2` which I also never expected and don't handle :)
mbrost has joined #dri-devel
<dcbaker> Which I only discovered because your bug actually caused the script to throw an uncaught exception
<dcbaker> the comma was just getting truncated and bug #2 wouldn't end up in the release notes :/
<jekstrand> dcbaker: I try to do "Closes: https://gitlab.freedesktop.org/..." so you can click the link from the git history.
<mareko> alyssa: static alloc would be sufficient with some wastage, that's not a show stopper, the ordering is
<bnieuwenhuizen> also because pasting an URL is easier than typing a number
<jekstrand> But if jenatali wants that to work with MSFT github, he's going to have to write his own scripts. :P
<dcbaker> jekstrand: yeah, I handle that, but I just assumed that `Closes: https://` was going to be a mesa link, so the script tried to say that his commit closed a mesa Issue that doesn't exist
<jekstrand> :D
<jenatali> Eh I don't need tooling to work with external repos, it just seemed good to reference it
<jenatali> If you want me to not use "Closes: " in the future, I can avoid it :P
<jekstrand> dcbaker: What about http? :P
<dcbaker> Nope, I'm just going to fix the script :)
<dcbaker> jekstrand: If you use `http:` you deserve to be ignored 😜
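A sketch of how the cases raised in this exchange could be handled when parsing `Closes:` trailers: comma-separated `#N` references, full issue URLs for the project, and links to external trackers that should be skipped rather than mistaken for project issues. This is not dcbaker's script; the function name and the exact issue URL pattern are assumptions.

```python
import re

# Assumed shape of a project issue URL; anything else counts as external.
ISSUE_URL = re.compile(r"^https://gitlab\.freedesktop\.org/mesa/mesa/-/issues/(\d+)$")
SHORT_REF = re.compile(r"^#(\d+)$")

def closed_issues(commit_message):
    """Collect project issue numbers from 'Closes:' trailers."""
    issues = []
    for line in commit_message.splitlines():
        if not line.lower().startswith("closes:"):
            continue
        # Handle 'Closes: #1, #2' by splitting on commas.
        for ref in line[len("closes:"):].split(","):
            ref = ref.strip()
            match = SHORT_REF.match(ref) or ISSUE_URL.match(ref)
            if match:
                issues.append(int(match.group(1)))
            # Other references (external trackers, http:// links) are skipped.
    return issues
```

Skipping unknown references instead of raising keeps one unusual commit message from breaking release-note generation.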
<mareko> I tried to emulate GS-like ordering with atomics and the performance was 0.002% of the compute power I had, I think Mali has the same problem
<bnieuwenhuizen> now I wonder if you can close merge requests from a commit
<Sachiel> you can
<Sachiel> oh, wait, merge requests no
<alyssa> mareko: that sounds about right
<alyssa> It's a checkbox feature for Arm
<alyssa> The Mali optimization guide has this to say about GS:
<alyssa> - Do not use GS with TS
<alyssa> - Do not use GS with XFB
<alyssa> - Do not use GS with indirect draws
<alyssa> - Do not use GS
<mareko> tess has the same problem
<alyssa> yep.
<mareko> and primitive restart too
<alyssa> and indirect draws is a similar problem
<alyssa> on Bifrost anyway, where the driver is expected to alloc varyings (not the hw/fw)
mbrost_ has joined #dri-devel
Peste_Bubonica has joined #dri-devel
<graphitemaster> Ten years ago the thought of an AMD in your pocket was only a thought if you lived in the north and couldn't find hand warmers.
<graphitemaster> How times have changed.
<jekstrand> mareko: It's almost like someone should take some slightly older AMD GPU IP and fork it to create a mobile GPU....
<graphitemaster> Qualcomm claims that the Snapdragon 820's Adreno 530 supports D3D11 and 12.
<graphitemaster> Weird seeing desktop APIs which lack render passes on a mobile TBDR GPU.
<Venemo> mareko is this the problem that is being solved by ds_ordered_count?
<jekstrand> graphitemaster: There are a few Windows devices that run on them these days. Before then, I don't know why they bothered to implement D3D besides their engineers wanting to be "big boys"
ngcortes has joined #dri-devel
<mareko> Venemo: yes
<airlied> i think running windows has always been a stretch goal :-p
<mareko> windows is cool :)
vivek has joined #dri-devel
<HdkR> Now if only their D3D driver didn't crash when doing the most basic things :)
<dcbaker> jenatali: And the more I poked at the code the more thoroughly broken I discovered it was.
<alyssa> jekstrand: To this day, Mali advertises D3D support
<jenatali> You're welcome? :)
<alyssa> I am not convinced said D3D driver was ever written
<alyssa> but also to this day the hw has random knobs for "GL mode" and "D3D mode" ...
<FLHerne> Doesn't nine work on panfrost now?
<FLHerne> Or was that freedreno
<alyssa> s/Mali/Arm/
<FLHerne> well, they're right, they just got someone else to write the driver for them :p
<alyssa> Pfheh
<ajax> technically any turing-complete processor supports d3d12 if you apply enough thrust to the pig, so,
<hch12907> throw compute shaders at the gpu until it supports d3d12 :D
<graphitemaster> Yeah good luck with it running well though. Can't do everything in software. Stop killing my friendly fixed function units
<HdkR> llvmpipe, swr, softpipe, and WARP can do everything in software. It's fine ;)
<ajax> swr doesn't have ARB_compute_shader
<ajax> (more horrifying is that softpipe _does_)
<graphitemaster> I meant software as in on the GPU. Like implement the entire rasterizer, blending, sampling, etc, in one compute shader.
<Venemo> mareko: BTW here is a silly thought. What if we disable vertex reuse and set the vertex group size to be divisible by 3, for VS and TES. Then, every workgroup except the last one would have the same number of vertices
<airlied> ajax: oh that was back when I considered lavapipe on softpipe :-P
<Venemo> This may work for GS too unless its output is not compile time known
<airlied> ajax: someone should write softpipe tess support (by which i mean nobody should)
anujp has joined #dri-devel
<daniels> jekstrand, airlied: I think it’s pretty non-optional for the Surface Pro X to do D3D tbqh
rasterman has quit [Quit: Gettin' stinky!]
jhli has quit [Read error: Connection reset by peer]
<airlied> daniels: if we learned one thing from the Surface Pro X, it's that the whole device was very optional
<HdkR> I really like the device. Shame about that Linux support :(
jhli has joined #dri-devel
<alyssa> graphitemaster: That's not far from how ray tracing in d3d/vk works tbh
<bnieuwenhuizen> alyssa: but we get to add new fixed function hardware
<bnieuwenhuizen> (some of us at least ...)
<alyssa> jekstrand: Just found my 3rd distinct bug in modifier prop >___>
<alyssa> failing to propagate through swizzles on fsat
<alyssa> and that's somewhat nontrivial, since reswizzling an op isn't always possible
urja has quit [Read error: Connection reset by peer]
<jekstrand> alyssa: This is part of why NIR does modifiers at the end, if at all.
<alyssa> Yeah, seriously
<alyssa> unfortunately this whole isel thing is exactly the "end"
<alyssa> ;)
<jekstrand> And why I wrote nir_search. No one should ever have to think about swizzles. If they do, they'll get them wrong.
urja has joined #dri-devel
<jekstrand> Including me
<alyssa> Yeah...
<ajax> airlied: i kind of want to replace softpipe with llvmpipe-to-llir-plus-interpreter
mareko has quit [Remote host closed the connection]
mareko has joined #dri-devel
<jekstrand> NIRPipe!
mslusarz has quit [Remote host closed the connection]
mslusarz has joined #dri-devel
<alyssa> jekstrand: Unless you're ok with adding 150 new _mali alu ops and 50 _mali intrinsics i think i have to think about these things myself
<jekstrand> alyssa: Yeah, that might be a bit much...
<alyssa> jekstrand: For Bifrost in particular (and to a lesser extent Valhall), there are limitations on what swizzles are allowed on what sources of which instructions
<jekstrand> alyssa: Awesome!
<jekstrand> alyssa: We have a bit of that but really only when you start playing with fp64 and, at that point, you've already lost.
<alyssa> So I need either a bi_lower_swizzle pass or a bi_nir_lower_swizzle pass regardless
<alyssa> And I'd argue the latter is a lot easier to get wrong, since the mapping of NIR to Bifrost instructions is nontrivial
<jekstrand> Yup
<alyssa> => innocuous changes to NIR->BIR cause random swizzle bugs
ppascher has quit [Ping timeout: 480 seconds]
<hch12907> could be fun to implement a NIRpipe that interprets NIR
<alyssa> anholt_: started one i think
<alyssa> of course my vote is for ccpipe
<alyssa> libgccjit supports 68k, right? :-p
<ajax> assuming gcc itself does, yeah
<alyssa> Also - does anyone have opinions about GLSL standalone being used for integration tests in `meson test`?
<alyssa> Like, ensuring that a given .shader_test compiles to a given binary.
<alyssa> The use case being simple shaders where there is an unambiguous optimal compile that should be maintained.
<alyssa> OTOH those shaders are trivial to build with nir_builder
<alyssa> So maybe throwing GLSL into the mix is just complicating things
ppascher has joined #dri-devel
<graphitemaster> Going to make a Captcha where I show the same checkerboard texture (gif) zooming in and out under different filtering options with the caption "select all images that are min=%s" or "all images that are mag=%s"
<graphitemaster> Good way to train TAs.
tzimmermann has quit [Quit: Leaving]
<alyssa> graphitemaster: Help it turns out my CSC317 TA is a robot!
nirmoy has joined #dri-devel
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
sarnex has quit [Quit: Quit]
dreda has quit [Ping timeout: 480 seconds]
Danct12 has quit [Read error: Connection reset by peer]
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
Danct12 has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
sdutt has quit [Remote host closed the connection]
jkrzyszt has quit [Ping timeout: 480 seconds]
Peste_Bubonica has quit [Quit: Leaving]
<anholt_> hakzsam: pretty old at this point. not sure.
V has quit []
V has joined #dri-devel
K`den is now known as Kayden
<alyssa> jekstrand: In the same way that FABSNEG and FCLAMP pseudo ops make abs/neg/clamp propagation easier, I'm wondering if I wouldn't like to define a set of pseudo ops operating on 1-bit booleans so I can do the lowering more intelligently
<alyssa> Not yet sure what ops those would be
<alyssa> ...and argh! here's yet another bug!
<alyssa> this one only seems to affect valhall thankfully
<alyssa> confusion around whether BI_SWIZZLE_H00 means ".xx" or ".x and zext" or ".x and sext" or ".x and f2f32" or ".xyxy with i8vec4"
<alyssa> because i think it means all of them context dependent right now and ... no, this is intractable.
<alyssa> I could use that obscure spec pub right about now
nirmoy has quit []
mlankhorst has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
sarnex has joined #dri-devel
<anholt_> any existing code anyone know of i could look at for how to parse samplers from nir uniform variable declarations (in the presence of structs and arrays)?
<anholt_> hmm. looks like the trick is that the actual sampler used in an array of structs got lowered out of the struct to its own decl that's just an array of samplers, so ignore samplers deep in a struct.
thellstrom has quit [Remote host closed the connection]
thellstrom has joined #dri-devel
<jekstrand> anholt_: Yup, that's the answer.
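The rule anholt_ arrives at can be illustrated with a toy type model: count samplers in top-level declarations and arrays of samplers, and ignore samplers nested in structs, since those were lowered out into their own declarations. The type encoding and function names below are hypothetical and unrelated to NIR's real type API.

```python
SAMPLER = ("sampler",)
FLOAT = ("float",)

def array(length, element):
    return ("array", length, element)

def struct(*fields):
    return ("struct", fields)

def sampler_count(decl_type):
    """Number of samplers a uniform declaration contributes."""
    kind = decl_type[0]
    if kind == "sampler":
        return 1
    if kind == "array":
        _, length, element = decl_type
        return length * sampler_count(element)
    # Structs are skipped on purpose: samplers inside them are assumed to
    # have been lowered out into separate sampler (array) declarations.
    return 0

def sampler_uniforms(uniforms):
    """Return (name, sampler count) for every uniform that holds samplers."""
    return [(name, sampler_count(t)) for name, t in uniforms
            if sampler_count(t) > 0]
```

Counting the struct member as well as its lowered declaration would double-count the same sampler, which is why the struct branch returns zero.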
soreau has quit [Remote host closed the connection]
soreau has joined #dri-devel
<jenatali> That still strikes me as such a weird design for the shader language, to allow samplers in structs
<anholt_> so painful
<bnieuwenhuizen> is this for bindless?
<anholt_> !8044
danvet has quit [Ping timeout: 480 seconds]
<alyssa> when you have an MR # memorized you know it's too late
<mareko> Venemo: you can still have draws where every workgroup is partial
<mareko> assuming 1 workgroup per draw
<alyssa> dcbaker: "Currently, there's no test code... Of course, that means it's broken. In a lot of different ways, it turns out."
<alyssa> I really should've started writing tests, oh, 5 years ago..
idr has quit [Ping timeout: 480 seconds]
macromorgan has quit [Remote host closed the connection]
macromorgan has joined #dri-devel
iive has quit []