<karolherbst>
why aren't we const folding u2u(0) to 0? something isn't right here...
<karolherbst>
airlied: yeah.. well.. your choice. If something works with clover, it should also work with rusticl.. more or less. At least nothing from the explicit expectation changes
<karolherbst>
airlied: image_size is busted
<karolherbst>
airlied: so it gets the coords via load_input, llvmpipe replaces it with 0 somewhere
<karolherbst>
but between the load and the image_size is a u2u
<karolherbst>
so it can't read out the const value
<airlied>
shouldn't that u2u disappear? :-P
<karolherbst>
well.. yeah? :P
<karolherbst>
but I am a little confused....
<karolherbst>
what's the first arg to image_size anyway?
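A toy model of the folding being asked about above. It assumes a made-up tuple IR, not NIR's real data structures or pass names: an unsigned width conversion of a constant folds to a constant, so a consumer such as image_size could read the value straight out of its operand.

```python
# Toy IR (hypothetical, not Mesa's): an instruction is a tuple.
#   ("const", bits, value)      an immediate
#   ("u2u", bits, src)          unsigned conversion of src to `bits` width
#   ("load_input", bits, idx)   a value not known at compile time

def fold(instr):
    """Return a ("const", ...) tuple if instr folds to a constant, else instr."""
    if instr[0] == "u2u":
        _, bits, src = instr
        src = fold(src)
        if src[0] == "const":
            # Unsigned conversion truncates or zero-extends: mask to new width.
            return ("const", bits, src[2] & ((1 << bits) - 1))
        return ("u2u", bits, src)
    return instr

def as_const(instr):
    """What a consumer would call to read a constant operand, or None."""
    folded = fold(instr)
    return folded[2] if folded[0] == "const" else None
```

With this model, `as_const(("u2u", 32, ("const", 64, 0)))` yields 0, while a `u2u` sitting on top of an unresolved `load_input` yields `None`, which matches the symptom described: the constant only becomes visible once the load itself has been replaced.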
<DanaG>
I've always found it odd that there are "warnings" with no message, just a backtrace. "There's a problem at this address in the neighborhood." But what is the problem? <no answer>
<DanaG>
WARNING: CPU: 3 PID: 3101 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:395 dce_aux_transfer_raw+0x28d/0x2f0 [amdgpu]
tursulin has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
<FLHerne>
Did it work before with that kernel, or what did you have before?
<DanaG>
I think the only thing that changed is that I tried the GPU in an HP T730 thin client's PCIe slot, and I'm not sure how many watts that slot can supply, but it's not a high-wattage card.
<DanaG>
And I had a previous WX 4100 go broken when I tried it in that slot, but couldn't be sure it wasn't a Supermicro board I had at the time that did it.
icecream95 has quit [Remote host closed the connection]
<FLHerne>
All PCIe x16 slots should support 75W by spec
<FLHerne>
I guess an OEM thin client might not bother
<FLHerne>
but I really doubt underpowering a card would permanently damage it anyway
<FLHerne>
I'd try some other kernels and/or file a bug
<FLHerne>
5.15 (and even some backports to 5.14) have definitely had regressions on certain hardware
<DanaG>
I can try booting an older one.
danvet has joined #dri-devel
<DanaG>
I should also note that I have a displayport KVM switch, which can be finicky (even though it's an optical displayport cable on the output). Actually, let me try direct connecting it...
ahajda has joined #dri-devel
slattann has quit []
<DanaG>
I'll also check if it works properly in Windows.
nchery has joined #dri-devel
<FLHerne>
those both sound like good ideas
LexSfX has quit [Ping timeout: 480 seconds]
LexSfX has joined #dri-devel
adjtm has quit [Quit: Leaving]
adjtm has joined #dri-devel
LexSfX has quit []
adjtm has quit []
<HdkR>
`UBSAN: shift-out-of-bounds in /build/linux-HMZHpV/linux-5.15.0/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:601:30` The UBSAN in amdgpu is pretty questionable as well
<HdkR>
Could be UBSAN is tweaking some behaviour just enough that it is breaking things and needs to not be enabled :)
<DanaG>
Interesting, I disabled the EFI CSM, and now I at least get a framebuffer at boot, but still get the flip timeout. Now going to try old kernel.
icecream95 has joined #dri-devel
LexSfX has joined #dri-devel
<DanaG>
5.13 kernel: same timeout.
<javierm>
tzimmermann: answered to the list. What you mention was my first thought as well but then realized that wasn't the correct thing to do
jkrzyszt has joined #dri-devel
shankaru has quit [Read error: Connection reset by peer]
<DanaG>
Driver seems to work okay in Windows, at least under a quick test. When the WX 4100 failed, it was dying in Windows too, but this is better at least.
<DanaG>
I'm pondering getting a W6400 for this machine to mess with KVM. Has that GPU had the dang PCI reset problems fixed?
<DanaG>
I do have IOMMU and ACS and ARI enabled, I wonder if any of the settings in that group are relevant? I recall seeing Navi get a quirk to disable ARI.
<DanaG>
This isn't a Navi, though.
rasterman has joined #dri-devel
<DanaG>
Looks like amdgpu.dc=0 makes it not die. I guess it's bugreport time, indeed. But first, just time for sleep.
frieder has joined #dri-devel
<tzimmermann>
javierm, the way dp_aux_chardev and dp_cec work is a bit messy. but we cannot really do much about it ATM
slattann has joined #dri-devel
<javierm>
tzimmermann: yeah, I've sent yet another proposal in the thread
<tzimmermann>
javierm, one alternative is to provide empty stubs of drm_dp_dpcd_write()/read()/etc if DISPLAY_DP_HELPER has been disabled. that would resolve the linker error at least
<DanaG>
With dc disabled, I see [drm:amdgpu_atombios_dp_process_aux_ch.constprop.0 [amdgpu]] dp_aux_ch flags not zero
<javierm>
tzimmermann: indeed, I can include that in v3 if you don't agree with the latest option I mentioned
<javierm>
I would just kill all these user configurable options, I think that made more sense before you made the split but have little value nowadays
<DanaG>
That reminds me, I had hangs on aarch64 on my wx 4100, until I switched from active DP-HDMI to passive DP-HDMI (target device is pikvm so doesn't need active).
<DanaG>
Backtrace mentioned CEC.
<DanaG>
I don't have that cec backtrace handy, though.
<emersion>
pq: has anyone replied to "It's bit moot to e.g. render everything in electrical 10 bit RGB, if the link is just going to squash that into electrical 8 bit RGB, right?"
DanaG has quit [Remote host closed the connection]
<pq>
emersion, nope
MrCooper_ is now known as MrCooper
<emersion>
okay. wondering if that would be sane default behavior
<pq>
emersion, but MrCooper_ did point out that electrical 8 bpc FB may not be a reason to turn link bpc down to 8 too, because the KMS color pipeline can have more precision.
<MrCooper>
on a different (though somewhat related) topic: with temporal dithering, the effective observable bpc can be higher than the HW bpc, right?
<pq>
that's the idea, I believe, yes
<pq>
or even spatial dithering - you rarely look at individual pixels
<MrCooper>
then requiring a minimum HW bpc might artificially exclude some scenarios which would actually work as intended
<pq>
probably, as long as we have no idea if dithering is there or not
<MrCooper>
swick: ^ so user space actually can't know the required minimum HW bpc
<MrCooper>
I guess drivers could be allowed to take dithering into account for the minimum bpc
<MrCooper>
or maybe dithering should be explicitly controlled as well (e.g. apparently some people physically cannot bear temporal dithering)
<pq>
I would want explicit control and knowledge of dithering.
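A rough numeric sketch of why temporal dithering can make the observable bpc exceed the HW bpc. It assumes 10-bit content on an 8-bit output and a fixed four-frame pattern; that pattern is chosen purely for illustration and is not any driver's or display's actual algorithm.

```python
def temporal_dither(value10, frames=4):
    """Return 8-bit codes whose time average approximates a 10-bit value."""
    base, rem = divmod(value10, 4)   # going from 10 to 8 bits drops 2 bits
    codes = []
    for i in range(frames):
        # Within each group of 4 frames, show the next code up on `rem` frames.
        code = base + (1 if (i % 4) < rem else 0)
        codes.append(min(code, 255))  # clamp at the top of the 8-bit range
    return codes

def observed_level(codes):
    """Time-averaged level, expressed back on the 10-bit scale."""
    return sum(codes) * 4 / len(codes)
```

For example, the 10-bit value 514 has no exact 8-bit code, but alternating between 129 and 128 averages out to it over four frames. A check against the raw HW bpc alone would not see this.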
lemonzest has joined #dri-devel
gouchi has joined #dri-devel
gouchi has quit []
rasterman has quit [Quit: Gettin' stinky!]
pcercuei has joined #dri-devel
digetx has quit [Read error: Connection reset by peer]
<javierm>
tzimmermann: funny that the first local patch I had here to just bypass the link error was "depends on DRM && DRM_DISPLAY_HELPER && DRM_DISPLAY_DP_HELPER"
<javierm>
tzimmermann: but that didn't feel quite right to me for the reasons I mentioned in the thread. I didn't expect this change to be that controversial :)
rasterman has quit [Remote host closed the connection]
rasterman has joined #dri-devel
<danvet>
mripard, no, never got wired through
<danvet>
but probably a good idea to do so?
<danvet>
the rough plan was to reuse the async cursor flip stuff of some sorts
<danvet>
and let drivers figure out which exact kind of async we really need
<danvet>
but the idea of flip_done was that that would hide this all sufficiently well
Daanct12 has quit [Remote host closed the connection]
<javierm>
tzimmermann: sure. Btw, Dan's report about patch 1/5 missing a mutex_unlock(&info->mm_lock) before return seems correct
<tzimmermann>
javierm, yes. i forgot that mutex_unlock.
<karolherbst>
jekstrand, airlied: mhh.. I think we have to disable user resources for non 1D images as we currently have no way of specifying custom slices :(
thellstrom has joined #dri-devel
nvishwa1_ has quit [Ping timeout: 480 seconds]
<karolherbst>
but fixing the interfaces... uhh... maybe I should read up on how the GL stuff works for using that
Daanct12 has joined #dri-devel
<karolherbst>
mhh, I guess from a GL perspective it was only valid for buffers anyway
<karolherbst>
mhh.. and textures? oh wow
<karolherbst>
so the problem is, that llvmpipe just calculates its own pitch/slice and that breaks stuff
ella-0 has joined #dri-devel
ella-0_ has quit [Read error: Connection reset by peer]
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #dri-devel
frieder has quit [Remote host closed the connection]
<tzimmermann>
and whenever a plane updates, either sync or async, the commit acquires crtc->mutex?
Daanct12 has quit [Quit: Leaving]
rasterman has joined #dri-devel
<karolherbst>
jekstrand: ... treating llvmpipe as a GPU makes the image stuff pass :(
<karolherbst>
maybe we can get a waiver for CPU impls
<karolherbst>
at least I'd try
<danvet>
tzimmermann, yeah the locking is the same
<tzimmermann>
danvet, ah, thanks. related question: concurrent updates to planes of the same crtc will never interfere, because they are serialized via crtc->mutex?
<danvet>
not quite
<danvet>
on the sw state, yes
<danvet>
on the hw state, nonblocking updates are pushed through without holding any locks
<danvet>
and ordering is ensured by waiting for/signalling drm_crtc_commit appropriately
<danvet>
which makes this all a bit more complicated
<tzimmermann>
that is basically what happens in commit_tail, right?
<tzimmermann>
the hw-state update
<tzimmermann>
danvet, but two concurrent hw-state updates are serialized by DRM's atomic helpers, right?
<danvet>
yeah should be
<danvet>
the real fun only starts when you have cross crtc state
<danvet>
hence the epic discussions recently with Lyude on dp mst state
<danvet>
lynxeye, random thought really, but have you looked at moving etnaviv over to shmem helpers?
<tzimmermann>
danvet, that's luckily not the case
<tzimmermann>
danvet, i'm still having that ast bug where mouse movement interferes with modesetting
<danvet>
yeah as long as any hw is strictly attached to either crtc or connector you should be fine with atomic helpers
<danvet>
even when the connector moves around, it keeps track of that stuff and should order it all
<tzimmermann>
and i'm looking for ways they could overlap
<danvet>
tzimmermann, is that with my patch to nuke legacy cursor already applied?
<tzimmermann>
danvet, no without the patch
<tzimmermann>
but i cannot even reproduce it. the reporter fixed it by repeatedly setting I/O registers
<tzimmermann>
maybe the HW is too slow to catch up with the rest
<tzimmermann>
it doesn't look like the correct fix
<danvet>
uh yeah that's pretty horrible
<danvet>
I would try with the legacy cursor patch applied, I think it can cause stuff like this
<danvet>
if it's something else then I guess a spinlock around all the indexed register writes is what's needed
<tzimmermann>
ok, i'll try the patch
<tzimmermann>
if the atomic updates are serialized, maybe the ast HW simply needs time after a full modeswitch before it accepts new commands; just guessing here.
<danvet>
hm yeah maybe that could be it too
<danvet>
that it's not a race, but the hw being a bit slow, and you actually have to hammer the index reg until it goes through
<danvet>
I guess that could be the 3rd option really
<danvet>
and for that case ofc no spinlock needed
<danvet>
I guess we could test this by adding some tracing to the index_reg functions?
<danvet>
if they run concurrently, then there's an sw bug
<danvet>
if they never run concurrently, then probably a hw issue
soreau has quit [Read error: Connection reset by peer]
<tzimmermann>
thanks for confirming
<danvet>
like just do an atomic_inc/dec around the code and complain if it's ever elevated
<danvet>
and test with that
soreau has joined #dri-devel
<tzimmermann>
good idea
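The inc/dec check suggested above could look roughly like this userspace model. Everything here is hypothetical: `OverlapDetector` and `write_index_reg` are illustrative names, and a lock-protected counter stands in for the kernel's atomic_inc/atomic_dec.

```python
import threading

class OverlapDetector:
    """Counts callers inside a guarded section and records any overlap."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for an atomic counter
        self._in_flight = 0
        self.overlaps = 0

    def __enter__(self):
        with self._lock:
            self._in_flight += 1
            if self._in_flight > 1:
                # Another caller is already inside: that would be a sw race.
                self.overlaps += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._in_flight -= 1
        return False

index_reg_guard = OverlapDetector()

def write_index_reg(regs, index, value):
    """Hypothetical indexed register write: select the index, then the data."""
    with index_reg_guard:
        regs["index"] = index
        regs["data"] = value
```

If `overlaps` stays at zero while the bug reproduces, the accesses never ran concurrently, which would point at the hardware rather than at missing locking.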
<tzimmermann>
thank god ast HW is all fully documented with example code and reference drivers directly from the manufacturer /sarcasm
<lynxeye>
danvet: Yea, I've had that on my list for a while, but didn't really get around to having an opinion yet due to other things having higher prio.
<danvet>
lynxeye, I think once the shrinker stuff that's in the works has landed, there's really not anything left for shmem helpers
<danvet>
so might actually be good to have etnaviv using it, to make sure that stuff all fits
itoral has quit []
<lynxeye>
danvet: agreed. The shrinker patches bumped this up a bit on my prio list, but still not at the top.
<danvet>
yeah makes sense, just wanted to make sure you've seen this
<danvet>
I'm expecting the shrinker patches to take some time still anyway, the locking is a bit of a mess
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<danvet>
sravn_, I'm assuming you're also pushing the fbcon patch you acked?
slattann has quit [Ping timeout: 480 seconds]
<swick>
pq: MrCooper: it's not only dithering, it's the complete color pipeline we would have to know about to get the effective bpc
<swick>
we would have to know the precision before and after each block and every time a block goes from a higher to lower bpc there could be dithering
<swick>
so either user space has to know about all that stuff or min_bpc must be the effective minimum bpc
<pq>
swick, IOW, would you reject an atomic commit if min_bpc happens to be larger than, say, CTM block precision?
<pq>
that's an interesting idea, but I wonder what that means for discoverability of working configurations...
<pq>
it seems every new KMS property adds a new dimension to the combinatorial explosion
<swick>
pq: if the CTM block is used in a way that the precision is below min_bpc then yes
<pq>
At least with the color pipeline we have the option to not use any of it. Link bpc we cannot ignore.
<swick>
it could be passthrough and retain the full pipe precision or dithering the CTM result could increase the effective bpc
<swick>
yeah
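One way to picture the "effective minimum bpc" idea from this exchange is a walk over the pipeline stages. This is only a sketch under strong assumptions: the stage dictionaries, the `passthrough` and `dither` flags, and the rule that dithering fully hides a precision drop are all invented for illustration and do not correspond to any existing KMS property.

```python
def effective_bpc(source_bpc, stages):
    """Walk the pipeline and track the lowest precision that is not hidden."""
    bpc = source_bpc
    for stage in stages:
        if stage.get("passthrough"):
            continue  # an unused block keeps the full pipe precision
        if stage["bpc"] < bpc and not stage.get("dither", False):
            bpc = stage["bpc"]  # precision is lost and nothing masks it
    return bpc

def accept_commit(min_bpc, source_bpc, stages):
    """Reject when the effective precision falls below the requested minimum."""
    return effective_bpc(source_bpc, stages) >= min_bpc
```

With a 10 bpc source, an 8 bpc CTM stage and a 10 bpc link, a request for `min_bpc=10` would be rejected; marking the CTM as passthrough or dithered flips the answer. That also shows how each extra flag widens the space of configurations user space would have to probe.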
<pq>
I'm kinda hoping I could just ignore the precisions of the color pipeline hardware blocks, until the "libcamera for KMS" exists.
Company has joined #dri-devel
<swick>
honestly if drivers ignore all of that in the beginning I wouldn't be too mad
sh-zam has joined #dri-devel
heat has joined #dri-devel
<swick>
I would assume that most hardware is designed to provide the precision it can drive the display at, too
sh_zam has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
jewins has joined #dri-devel
mdroper has joined #dri-devel
<rgallaispou>
Hi. I'm playing with kms_color to test the gamma property. After the test ends, the pointer associated with the last gamma lut passed to the kernel is still in use when wayland-weston is started. Is my driver broken, or is it standard behavior to keep the gamma lut?
<pq>
rgallaispou, it's standard KMS behavior. Weston lacks resetting most KMS properties.
<pq>
these properties in particular are in my todo to fix in Weston
<pq>
I've had fun with fbdev in HDR mode...
<rgallaispou>
pq: okay it's good to know
nchery has quit [Read error: Connection reset by peer]
<rgallaispou>
pq: yes, I can imagine why
<rgallaispou>
Thanks anyway :)
<marex>
sigh, seems like powervr driver is effectively back to being dead
<ajax>
oh?
<marex>
is there any activity ?
<ajax>
!16040 was last touched on monday. i've gone six months without touching some of my MRs.
<marex>
every time I tried to bring it up on HW with powervr I have here, I found the kernel driver is outdated, specific firmware does not work or is unavailable or I have the wrong version with no way of getting the right version ... userspace at least compiles
Haaninjo has joined #dri-devel
<kj>
The 1.17 update is stuck in an internal process. I was going to ping people tomorrow since I haven't received approval from all the necessary people
<marex>
that doesn't help me, the hardware I have ships with blob built for API 1.16 . Until there is easy support for different APIs , the powervr stuff is unusable except for one specific SoC model
<kj>
There should be a 1.17 firmware binary released soon so that might be of some use
<marex>
kj: and that works on all SoCs and thus powervr revisions ?
<marex>
kj: or is this one specific binary for one specific SoC again ?
<ajax>
do we actually do anything with the semaphore passed to vkAcquireNextImageKHR ?
<mlankhorst>
karolherbst: Currently not doing much on locking, patch you linked seems sane, hence I'm worried that it probably breaks. ;)
<ajax>
:q
<kj>
marex: It likely wouldn't work on all platforms out there but it might be worth trying out. I think it might just work however we still need to upstream the firmware binary and the pvrsrvkm kernel module + the mesa side changes which are stuck in the internal process
nchery has joined #dri-devel
<ajax>
getting the distinct impression that nobody understands wsi
<karolherbst>
mlankhorst: :D
<karolherbst>
mlankhorst: if it breaks something that would worry me even more
<mlankhorst>
You haven't written many locking patches to i915 then.
<karolherbst>
I didn't, but the code looks obviously wrong though
<karolherbst>
well, the current one that is
<emersion>
ajax: /me adds to WSI related quote list
<marex>
kj: is there a tree with up-to-date kernel driver ?
<ajax>
okay, got it. what we do with that semaphore is we require that the backend ANI actually acquire an image synchronously, and then we signal the sema on our way out
<ajax>
this seems: bad
sdutt has joined #dri-devel
<ajax>
i mean, not wrong, but also not good
thellstrom has quit [Ping timeout: 480 seconds]
sdutt has quit []
sdutt has joined #dri-devel
<marex>
kj: so yes, that kernel driver is based on some 6 months old kernel version, ancient
thellstrom has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
<marex>
I'll just stop here
<melissawen>
mripard, danvet, related to the previous discussion on async flip (I guess): I see in drm_mode_atomic_ioctl we are aborting when there is an ASYNC_FLIP flag and there is also a comment in crtc->async_flip that `It's not wired up for the atomic IOCTL itself yet`
<melissawen>
so how do we usually handle async page flip in an atomic context? we just don't do it right now?
<melissawen>
are there any drivers doing it in a custom implementation, for example?
<melissawen>
I'm looking into this topic and ended up quite confused on what we can currently do or not...
iive has joined #dri-devel
<mlankhorst>
modifying a single fb is usually the most that is allowed async
jkrzyszt has quit [Ping timeout: 480 seconds]
mdroper has quit [Read error: Connection reset by peer]
pcercuei has quit [Quit: brb]
pcercuei has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
nvishwa1 has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
thellstrom1 has joined #dri-devel
rgallaispou has left #dri-devel [#dri-devel]
thellstrom has quit [Ping timeout: 480 seconds]
sdutt has quit [Remote host closed the connection]
<Lyude>
even managed to get it so that we can change payload->vcpi_start_slot during the atomic commit like I needed
<danvet>
Lyude, hm why do we only copy stuff over that late?
<Lyude>
danvet: because we're not holding any locks by the time we start committing the state potentially in non-blocking modesetting, right?
<danvet>
or can we compute the vcpi slots only when we do the actual commit and not precompute things?
<Lyude>
yeah you can't precompute them
<Lyude>
because you'd need to know what order the driver is bringing up the payloads in order to do that
<danvet>
ah ok
<danvet>
hm
<danvet>
I guess it's not super atomic-y though, but should work
<danvet>
deserves a huge comment for that state that it doesn't work quite like the others
<Lyude>
yeah, luckily the thing about start slots is they're totally irrelevant to any actual state computation - so the values there also don't really matter until commit time
<danvet>
yeah I think just a comment that we use this as scratch pad and that it works like hw state essentially - i.e. races are prevented by careful ordering, not locking
<danvet>
and also a comment that the drm_crtc_commit completion provides the necessary cpu memory barriers for this to work correctly
<danvet>
since strictly speaking you're doing a fancy lockless thing here
<Lyude>
mhm - gotcha, was planning on doing something like that once I start cleaning this series up
sravn_ has quit []
<anholt>
jekstrand: were you objecting to making tex->txl lowering optional? or just asking for explanation
ybogdano has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
mbrost_ has joined #dri-devel
<jekstrand>
It still feels kinda bogus to me.
<anholt>
frontend shading language doesn't have txl(shadowcube) or txl(shadow2darray)
<jekstrand>
anholt: It's an obviously correct transform that most hardware needs.
<anholt>
but NIR insists on making those, because...?
<jekstrand>
Ugh...
mbrost has quit [Read error: Connection reset by peer]
<anholt>
so then the drivers have to back out the lowering they didn't want
<jekstrand>
which drivers is this causing a problem for? virgl, maybe, I guess.
<anholt>
nir_lower_tex has a bunch of options, and then there's this one non-optional thing it does, too.
<karolherbst>
jekstrand: nv50
<karolherbst>
we don't have lod sources for those
<anholt>
nouveau's the one that didn't have a workaround for NIR yet.
* jekstrand
wonders if NV hardware does the right thing there automatically or if shadow sampling in vertex shaders is just busted and no one cares.
<anholt>
in ntt (virgl) and radeonsi we recognize that NIR did the thing we didn't want and back it out.
<karolherbst>
jekstrand: we just don't have it on nv50
<karolherbst>
the hw is.. broken :)
<jekstrand>
Bingo!
<karolherbst>
so we can only pass 4 sources into tex
<karolherbst>
and most of them are filled up with coords
<anholt>
jekstrand: uh, do you positively know that VS texturing on nv50 doesn't set lod to 0?
<karolherbst>
and then you add shadow and you got 4
<jekstrand>
anholt: I have no idea. But I'd kind-of like someone to prove that it does before they say the workaround isn't needed.
<anholt>
given that nvc0 got a knob for lod 0, but the knob isn't on nv50, I would easily believe that it's a shader stage knob on older, then an instruction knob later when they realize the same HW bits would shave instructions in the FS.
<karolherbst>
we can't encode the lod for shadowcube or shadow2darray
<karolherbst>
on nv50 that is
<karolherbst>
there is nothing
<jekstrand>
I'm not arguing that you can't encode it
<jekstrand>
I'm asking if VS texturing works
<jekstrand>
Maybe it does by automagic
<karolherbst>
I think it doesn't :)
<karolherbst>
imirkin said something about parts being broken, but nvidia exposing that anyway and fails to compile
<karolherbst>
something like that
<jekstrand>
:facepalm:
<DanaG>
I'm curious, how hard would it be for somebody to add DRI_PRIME support to the `ast` driver? I'd like to be able to DRI_PRIME offload, or use xrandr offload to render on the Radeon and dump into the ASPEED's framebuffer. If making it not bog down the Radeon would require skipping 3/4 of the frames, or make it allow horrible tearing, that would be fine.
<anholt>
karolherbst: that was about bias.
<karolherbst>
ahh
<karolherbst>
so the lod works, but the bias is dropped, right?
<jekstrand>
Maybe it's just too many years at Intel but I don't assume hardware does things right.
<karolherbst>
anyway.. we have 4 sources and have to do something
<jekstrand>
I assume someone noticed VS texturing was broken on nv50 and went "ugh... we need to force LOD to 0, let's add an instruction bit"
<karolherbst>
on later gens we can encode up to 8 sources
<karolherbst>
but we do have a special zero lod flag as well
<anholt>
jekstrand: I've never understood. What is the argument for why NIR *should* turn tex into txl on VS on all drivers?
<anholt>
like, we don't force many lowering passes on everyone, why this one?
<jekstrand>
anholt: Yeah, it's weird. I'm not opposed to making it optional, in principle. I'm opposed to saying it breaks nv50 when it may actually be sort-of fixing nv50 and we're all too lazy to care and figure out why.
<anholt>
it's valid, but also why not turn tex into txb(bias=0) in the FS? that's also legal.
<karolherbst>
I think it would be fine to force this lowering, but then we have to explicitly say : tex won't reach the driver or something
Emmy_ has quit [Remote host closed the connection]
<anholt>
and my argument is: you shouldn't be adding tex srcs when you don't have to. and nir-to-tgsi wishes you wouldn't.
<jekstrand>
I also don't like instructions having subtly different behavior per-stage unless they're stage-specific.
sdutt has quit [Remote host closed the connection]
sdutt has joined #dri-devel
<jenatali>
FWIW, D3D only has shadow tex with implicit mip, and shadow txl for level 0, and for VS only the latter is allowed. There's no arbitrary level txl
<anholt>
(nir-to-tgsi has to back it out, because then virgl would need to recognize txl(shadowcube/2darray, lod=0) and turn it back into tex since txl on those samplers is illegal!)
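The lowering under debate, and the "back it out" step that drivers currently carry, can be sketched with a toy instruction model. The dictionaries and the `lower_implicit_lod` option name are hypothetical, not nir_lower_tex's real interface. The model also treats fragment as the only stage with implicit LOD and ignores the compute extensions.

```python
# Sampler kinds that, per the discussion, have no txl form in the frontend
# shading language.
NO_EXPLICIT_LOD = {"shadowcube", "shadow2darray"}

def lower_tex(instr, stage, lower_implicit_lod=True):
    """instr is a dict with 'op', 'sampler' and optionally 'lod'."""
    if instr["op"] == "tex" and stage != "fragment" and lower_implicit_lod:
        # No implicit derivatives in this stage, so make level 0 explicit.
        return {**instr, "op": "txl", "lod": 0}
    return instr

def back_out_txl(instr):
    """What a backend does when the lowering above is not optional."""
    if (instr["op"] == "txl" and instr.get("lod") == 0
            and instr["sampler"] in NO_EXPLICIT_LOD):
        restored = {k: v for k, v in instr.items() if k != "lod"}
        restored["op"] = "tex"
        return restored
    return instr
```

Making the first function's behavior optional removes the need for the second one in backends that cannot encode the extra LOD source, which is the trade-off being argued over here.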
<jekstrand>
Why does texturing behave a little differently in 2/3 of the stages? Uh... we didn't think about it when designing GL until it was too late?
<karolherbst>
well back then you only had two stages, no?
<karolherbst>
or was there a time with just one stage even? :D
<jekstrand>
There have always been >= 2
slattann has quit [Read error: Connection reset by peer]
<jenatali>
Isn't it more like 1/6 stages? FS is the only one that does implicit LOD
<jekstrand>
jenatali: Compute too, with some extensions.
Emmy_ has joined #dri-devel
<jenatali>
Ah right, I forgot about those extensions
<anholt>
we definitely had one stage in the fragment program days.
<karolherbst>
couldn't we make those instructions behave the same inside nir?
<jekstrand>
anholt: right...
<karolherbst>
sure.. glsl is stupid, but why should we carry the stupid over
<jekstrand>
karolherbst: That's my argument. :)
<karolherbst>
yeah...
<jekstrand>
If VIRGL wants to translate back to GLSL, it's got to deal with GLSL being stupid. Same for Zink.
<jekstrand>
Maybe we want a txz opcode which is txl with lod0?
<karolherbst>
I am not opposed to check for a zero lod inside codegen, especially as we do have that special lz flag and could make use of it, but...
<karolherbst>
jekstrand: maybe
<anholt>
jekstrand: TGSI's got one of those.
<karolherbst>
tex_lz?
<anholt>
yep
<jekstrand>
anholt: Are those allowed for shadow and cube?
<anholt>
nvc0 and radeonsi have it in hw.
<anholt>
yes
<jekstrand>
Then why not translate nir_texop_txl with lod=0 to tex_lz?
<anholt>
jekstrand: in ntt? because the ntt-consuming drivers don't have support for it, it's a cap.
<jekstrand>
oh
<anholt>
the reason to fix nir_lower_tex was because if we can get a decision for where to fix NIR's undesired lowering, we can get nouveau onto NIR on all chipsets and not go the ntt route for it.
<anholt>
and then I can finally land !8044
<jekstrand>
Yeah, I know.
<karolherbst>
well.. we could handle that in some way inside from_nir, but... I also kind of prefer to move lowering from codegen into nir :)
<jekstrand>
And I really want to land 8044
<karolherbst>
but...
<karolherbst>
maybe it does make sense to add a tex_lz? but that's going to be messy
<anholt>
right. so karolherbst wants the fixup not in nir frontend. imirkin wants the fixup not in the backend, because it's not legal in the shading language. jekstrand wants the fixup not in nir because VS tex instead of txl is silly. I want someone to budge because I just want to move on with my life.
<jekstrand>
anholt: I know
* jekstrand
kind-of wants to replace the entire nouveau compiler. :-P
* karolherbst
wants the same
<jekstrand>
But that's not going to happen today. :)
rasterman has quit [Quit: Gettin' stinky!]
<karolherbst>
yeah... so my reason for moving stuff into nir is, so that codegen shrinks
<karolherbst>
if we can rely on using nir, I can throw out quite a lot of code
<jekstrand>
I guess it's probably fine. I really don't like "nouveau is sloppy" to be the reason we carry tech debt in NIR.
<jekstrand>
But I can get over it. glsl_to_tgsi is more tech debt than this tiny bit of lowering.
<karolherbst>
well.. if we are expected to fix stuff up when going from nir to codegen, that's fine by me
<jekstrand>
So we're still going the right direction.
<karolherbst>
I just don't want to add more stuff into codegens lowering
<anholt>
jekstrand: when I look at doing this in NIR, it feels like paying down tech debt because so many drivers have to reverse the undesired thing nir adds.
<jekstrand>
anholt: You say "so many" and it's really just 2 AFAICT.
<anholt>
(though, that argument doesn't hold so much because radeonsi and nvc0 would want to recognize lod==zero anyway since they can do tex_lz on all stages)
<jekstrand>
radeonsi isn't undoing anything.
ybogdano has joined #dri-devel
<jekstrand>
In fact, with the NIR lowering, radeonsi doesn't need to be doing its stage check which is the point.
<jekstrand>
And neither does nouveau except for one piece of hardware we aren't sure works.
<jekstrand>
But whatever, I don't want to keep arguing.
<jekstrand>
As long as we can come up with a better name for the bit (I suggested one), I guess I'm fine adding it.
aravind has joined #dri-devel
slattann has joined #dri-devel
<ajax>
jekstrand: want to pick your brain about !4037 if you have a minute
<jekstrand>
ajax: Sure, what about it?
<ajax>
hah, race condition, i just posted a comment on the mr
<jekstrand>
ajax: Feel free to pull the first 6 into a different MR and land them.
neonking has quit [Ping timeout: 480 seconds]
<ajax>
hmm. could really use a way to tunnel arbitrary events back through an XGE channel.
<zmike>
ajax: really really really need that xlib change ^^^^^^
<ajax>
you know what especially sucks
<ajax>
i probably want to fix that in glvnd's frontend too
<ajax>
let's see if i remember how to do x module releases
slattann has quit []
<ajax>
ugh okay.
<ajax>
jekstrand: so the irritating thing here is, we're using x11/present in a totally cromulent way, we're passing in the idle fence and waiting for it before we hand it back from ANI.
<jekstrand>
yup
<ajax>
the bug is that the way present releases the pixmap is it calls into the ddx to flush rendering, and glamor treats that "flush" literally as glFlush instead of anything stronger.
<ajax>
so, iiuc, all xserver has done is submit commands to the device, it hasn't waited for their completion and it can't guarantee those commands get submitted before the client takes back over
<ajax>
(assuming single-queue to the hardware from the kernel, but with no particular ordering among drm clients)
ngcortes has joined #dri-devel
<anholt>
this sounds correct to me -- at the time we wrote that, everyone had implicit sync, and glFlush() got you to the kernel.
<jekstrand>
ajax: Yup. That's why we still need implicit sync
neonking has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
stuart has joined #dri-devel
rasterman has joined #dri-devel
<DanaG>
One other idea for the DRI_PRIME with ASPEED: have the ASPEED pull from the Radeon, instead of having the Radeon push to the ASPEED.
<ajax>
DanaG: i didn't think aspeed chips had enough dma to do that
DanaG has quit [Remote host closed the connection]
<ajax>
jekstrand: is ARB_sync explicit enough here? if glamor did glFenceSync and waited for it to pass before emitting the present-idle event, good enough?
<ajax>
is there any benefit to that over just glFinish, too
<jekstrand>
ajax: Both would stall in X and kill any pipelining
<ajax>
i don't need to stall, i have a main loop and i can ClientWaitSync(timeout=0) just fine.
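The non-blocking variant ajax describes can be modelled as a queue of fences polled once per main-loop iteration. This is a plain-Python stand-in with invented names (`Fence`, `PresentQueue`); it does not call GL, and `client_wait(timeout=0)` only mimics the status-check behavior of a zero-timeout ClientWaitSync.

```python
class Fence:
    """Stand-in for a GL sync object; `signaled` flips when the GPU is done."""

    def __init__(self):
        self.signaled = False

    def client_wait(self, timeout=0):
        # With timeout=0 this only reports status and never blocks.
        return self.signaled

class PresentQueue:
    def __init__(self):
        self.pending = []      # (fence, pixmap) pairs awaiting completion
        self.idle_events = []  # pixmaps already reported idle to the client

    def release(self, pixmap):
        """Called where a plain flush used to be: queue a fence instead."""
        fence = Fence()
        self.pending.append((fence, pixmap))
        return fence

    def poll(self):
        """Run once per main-loop iteration; emits idle events when ready."""
        still_pending = []
        for fence, pixmap in self.pending:
            if fence.client_wait(timeout=0):
                self.idle_events.append(pixmap)
            else:
                still_pending.append((fence, pixmap))
        self.pending = still_pending
```

The server never sleeps in this scheme, but the idle event is still delayed until the GPU has finished with the buffer, which is the cost to pipelining that jekstrand raises in the messages that follow.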
gawin has joined #dri-devel
<jekstrand>
I don't mean it stalls X I mean we have to do a full round-trip through userspace and possibly wait inside the client (with whatever implications that has) before work gets submitted.
<jekstrand>
With the likely solution being num_buffers++
<ajax>
i don't follow "before work gets submitted" here. the idle fence wouldn't get touched until after the gl fence passed, and we (wsi) wait on the idle fence before ANI will give it back to the app
<jekstrand>
With the way things are today, as soon as X has submitted its compositing job or blit, it can flush, hand us back the buffer, and we can hand it off to the app so they can start rendering to it.
<ajax>
yes i am suggesting glamor would be better
<ajax>
can easily be, small numbers of lines of code.
<jekstrand>
If we glFinish or wait on a fence in X11, regardless of where that wait happens, the client can't even start building command buffers to render until after the GPU is done with that buffer.
eukara has quit []
YuGiOhJCJ has joined #dri-devel
<ajax>
walk me through that? what part of the buffer state is so mutable while it's busy that you can't even start?
eukara has joined #dri-devel
<ajax>
it's not going to change size
<ajax>
it's probably not getting memmoved anywhere
<jekstrand>
The app can't start until it gets it back from ANI because it doesn't know which image ANI is going to return next.
<jekstrand>
So we want to be able to return it from ANI the moment X knows a buffer is going to be free
<jekstrand>
That's the whole reason ANI comes with fences and semaphores.
<ajax>
(thinking)
<ajax>
that sounds a bit like you want to virtualize the mapping between VkImages and X Pixmaps?
<jekstrand>
You might want me to. :P
<jekstrand>
That's something that has been discussed but no.
<jekstrand>
We want to know the actual BO that's going to become free
<jekstrand>
And have a fence/semaphore
<jekstrand>
that tells you when it's actually free
<jekstrand>
The theory being that there's no point in the app trying to race with X on its current composite anyway.
<jekstrand>
(Unless that app is a VR thing but those are crazy and special and don't want X in the way to begin with)
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<ajax>
fine.
<ajax>
how much of a stall ReadPixels
<ajax>
is ReadPixels, excuse me
<ajax>
there _has_ to be some way to learn that a particular command buffer has been completed without stalling the whole chip
<ajax>
how else do you ever reclaim old command bufs
<ajax>
enh. i guess the ring buffer just tells you when a command goes into the gpu, not when it finishes.
<airlied>
karolherbst: oh you have to specify row/image strides on user buffers? seems dangerous :-P
<karolherbst>
airlied: well... the user provides the buffer
<airlied>
karolherbst: does the gallium interface allow that?
<ajax>
jekstrand: is it actually that hard to predict which image will be returned next? it's the one with the lowest sbc.
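ajax's heuristic, as a minimal sketch. It assumes the WSI layer tracks, per swapchain image, the sbc (swap buffer count) of the present that last used it; `predict_next_image` and the dictionary layout are hypothetical and not taken from any driver.

```python
def predict_next_image(last_sbc_by_image: dict) -> int:
    """Return the index of the image expected to go idle first.

    last_sbc_by_image maps image index -> sbc of the present that last
    used that image. The oldest present (lowest sbc) should be released first.
    """
    return min(last_sbc_by_image, key=last_sbc_by_image.get)


# Three images in flight; image 1 was presented earliest.
next_image = predict_next_image({0: 12, 1: 10, 2: 11})
```

This only predicts which image comes back, not when it becomes free, which is why a fence or semaphore is still needed alongside it.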
<karolherbst>
nope
sravn has joined #dri-devel
<airlied>
karolherbst: did clover deal with it somehow?
<karolherbst>
airlied: not at all
<karolherbst>
I think it relies on resource_from_user_memory to fail
<karolherbst>
the GL interface also doesn't allow custom strides which is a bit ... strange
<airlied>
karolherbst: ah so cl_image_desc is the thing that specs it?
<karolherbst>
yeah
<karolherbst>
AMD_pinned_memory is the GL extension btw
<airlied>
should check the vulkan ext
<airlied>
just for completeness
<HdkR>
pinned_memory is such a wacky extension
<karolherbst>
it looks broken
<airlied>
karolherbst: pinned memory seems buffer only
<karolherbst>
so you specify the host ptr and say how big the texture is, but... not a word about strides?!?
<karolherbst>
could be
<karolherbst>
but
<karolherbst>
you can read pixels out of it
<karolherbst>
boxed
<airlied>
you use the gl packing to do that then
<HdkR>
SSBOs, UBOs, pixels. Dolphin-emu should abuse all of those with pinned_memory
<karolherbst>
but yeah.. looks like the underlying memory needs to be a plain buffer
<jekstrand>
ajax: It's impossible for the app to predict.
<jekstrand>
ajax: The driver might be able to predict it, sure, but that doesn't solve any problems.
<karolherbst>
airlied: yeah
<ajax>
it does if it means ANI can return a promise and the fences/semas actually work
<ajax>
right?
<jekstrand>
ajax: But the only way to actually provide that promise is if the driver then stalls later because we can't actually pass the fence we get from X11 off to the kernel.
<jekstrand>
Most drivers do use submit threads these days (or can) so we could, in theory, do it.
<ajax>
unless we fix x11
<jekstrand>
But oof
<ajax>
which i keep telling you i know how to do
<jekstrand>
What are we going to fix in x11?
<karolherbst>
airlied: anyway.. the thing is, CL requires custom pitches and gallium doesn't allow it :(
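Why the pitch matters for images but not for plain buffers can be shown with a little address arithmetic. This is a sketch only: `texel_offset` and `tight_row_pitch` are hypothetical helpers, and the pitch parameters play the role of `image_row_pitch` / `image_slice_pitch` in `cl_image_desc`.

```python
def tight_row_pitch(width: int, bytes_per_texel: int) -> int:
    """Row pitch when rows are packed back to back with no padding."""
    return width * bytes_per_texel


def texel_offset(x: int, y: int, z: int, bytes_per_texel: int,
                 row_pitch: int, slice_pitch: int) -> int:
    """Byte offset of texel (x, y, z) in a linear user-provided buffer."""
    return z * slice_pitch + y * row_pitch + x * bytes_per_texel


# 5x4 RGBA8 image (4 bytes per texel), texel (2, 3) in slice 0.
bpt, width, height = 4, 5, 4
packed = tight_row_pitch(width, bpt)                        # 20 bytes per row
off_packed = texel_offset(2, 3, 0, bpt, packed, packed * height)
# Same texel when the user pads every row out to 32 bytes.
off_padded = texel_offset(2, 3, 0, bpt, 32, 32 * height)
```

If the driver assumes the packed layout while the user buffer is padded, every row after the first is read from the wrong address, so the pitch has to be passed down with the user pointer.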
<jekstrand>
I'm still unclear on that
<karolherbst>
it's fine for buffers, because it doesn't matter
<jekstrand>
The only "fix" for x11 is if it starts using syncobj instead of shmfence
<airlied>
karolherbst: guess you get to fixing gallium then :)
<ajax>
or, for its shmfences to reflect explicit sync instead of implicit
<karolherbst>
airlied: it seems that way :)
<jekstrand>
ajax: Then we're back to driver threads
<ajax>
i mean i have one of those for wsi for fifo modes anyway...
abws has joined #dri-devel
<jekstrand>
Unless it's an explicit sync primitive I can hand off to the kernel as part of my exec, we have to manage the whole VkQueue with a driver thread and wait for the fence in the driver before we submit to X11
<karolherbst>
airlied: but actually, I think it's time to upstream some patches atm, so I am a bit reluctant to add more stuff at this point :D
<karolherbst>
the MR is way too big already anyway
<jekstrand>
At which point X11 not being able to use kernel sync primitives is impacting driver design. Please, no.
<jekstrand>
(Impacting far deeper than a bit of annoyance in the WSI code, that is.)
<karolherbst>
jekstrand: ohh btw.. would you mind if I fold your rusticl reference stuff into the "initial" commit? I'd like to clean it all up and would rather prefer to fix the existing commits than to add a bunch of new ones :D
<jekstrand>
karolherbst: fine with me
<karolherbst>
okay, cool
<airlied>
karolherbst: do both :-)
<airlied>
create some upstreaming MRs, hack away while reviewers hold back your brilliant code :-P
<karolherbst>
:D
<karolherbst>
well..
<karolherbst>
I don't want people to review patches if I change stuff 100 patches later
<airlied>
karolherbst: oh the upstreaming MR should be cleanly rewritten usually
<karolherbst>
so I want to move all "fixups" into the original code
<karolherbst>
yeah
<karolherbst>
that's my plan for now :P
<airlied>
hopefully next week I can get back to trying to get any images working
<karolherbst>
or maybe we could leave it in late, but also call it inside lp_get_disk_shader_cache if necessary?
<karolherbst>
anyway, I need it for caching the libclc
ybogdano has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
mbrost_ has quit [Read error: Connection reset by peer]
Duke`` has quit [Ping timeout: 480 seconds]
GeorgesStavracasfeaneron[m] has joined #dri-devel
<zmike>
dcbaker: do you prefer to look over backport MRs or should I just marge them?
frieder has joined #dri-devel
<jenatali>
Georges Stavracas (feaneron): Yes, but you're not registered correctly, so only other people who are connected via Matrix can see your messages
frieder has quit [Remote host closed the connection]
famfo has quit [Ping timeout: 480 seconds]
famfo has joined #dri-devel
<airlied>
karolherbst: yeah that patch would be fine to init it early I think
<karolherbst>
airlied: ohh.. how easy would you think would it be to wire up function calling support in llvmpipe?
<airlied>
karolherbst: I had it mostly working, except for implicit args
<karolherbst>
airlied: well... creating the screen also ends up spawning all the threads sadly
<airlied>
and flow control :-P
<karolherbst>
ahh
<airlied>
karolherbst: ah yeah that bit is messy isn't it
<karolherbst>
wait.. so glProgramStringARB "simply" replaces the current glsl source code?
<jenatali>
Looks like they have a comment after END and it seems Mesa's parser doesn't like that
<karolherbst>
or is that ARB shader stuff?
<jenatali>
This... is not a thing I thought I'd be debugging today
<karolherbst>
(I am too young to know this)
<jenatali>
karolherbst: It's an assembly shader
ngcortes has quit [Ping timeout: 480 seconds]
<karolherbst>
ahh okay, so my first guess was indeed correct
<karolherbst>
I am sure they do it because of performance
<karolherbst>
as everybody knows, assembly is faster
<jenatali>
Uh huh
gawin has quit [Ping timeout: 480 seconds]
<karolherbst>
heh nice.. it still crashes with undef to zero
famfo_znc has joined #dri-devel
famfo has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
gawin has joined #dri-devel
<jenatali>
Ugh... I think it looks like Mesa's lexer for ARB programs doesn't like that the last comment line in this program doesn't have a newline
<HdkR>
Oh no, ARB
danvet has quit [Ping timeout: 480 seconds]
icecream95 has joined #dri-devel
<karolherbst>
jenatali: well.. I'd say it's an application bug then :)
<karolherbst>
case closed
<jenatali>
But it works on all Windows GL drivers...
<karolherbst>
ehh.. the spec disagrees with me, I say the spec is wrong
<karolherbst>
"Comments begin with the character "#" and are terminated by a newline, a carriage return, or the end of the program array." :(
<HdkR>
I think someone actually complained about this a year or so ago
<Sachiel>
not ending a text file with a newline should always be a bug
ngcortes has joined #dri-devel
<karolherbst>
Sachiel: it's windows
<karolherbst>
they seem to like that
<karolherbst>
:P
ybogdano has joined #dri-devel
<karolherbst>
jenatali: workaround: insert a new line at the end of the program :)
<jenatali>
I'm tempted
<karolherbst>
I am sure we copy the string anyway, so we can also just add it :D
<jenatali>
Yeah actually that's probably the easiest thing...
<HdkR>
`/* The newline of shame gets added to all ARB programs as a workaround */`
ahajda_ has joined #dri-devel
ahajda has quit [Read error: Connection reset by peer]
<jenatali>
Yeah
<karolherbst>
I wouldn't be surprised if that's even better from a perf perspective
<jenatali>
Otherwise apparently the only other option is for EOF to generate a different character that can be matched against? WTF flex?
<karolherbst>
the alternative is to make the parser/lexer more complicated
<jenatali>
Yeah
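The comment rule quoted from the spec, and the trailing-newline workaround, can be modelled like this. It is an illustration under assumptions: both regexes and `add_trailing_newline` are hypothetical, and the stricter pattern is only a model of a lexer that insists on a newline, not Mesa's actual flex rule.

```python
import re

# Spec-style rule: a comment runs from '#' up to a newline, a carriage
# return, or the end of the program string.
COMMENT_SPEC = re.compile(r"#[^\r\n]*")

# Hypothetical stricter rule: the comment must be closed by a newline.
COMMENT_NEEDS_NEWLINE = re.compile(r"#[^\n]*\n")


def add_trailing_newline(program: str) -> str:
    """Workaround: make sure the program text ends with a newline."""
    return program if program.endswith("\n") else program + "\n"


# An assembly program whose last line is a comment with no newline after it.
prog = "!!ARBfp1.0\nMOV result.color, fragment.color;\nEND\n# trailing comment"

spec_match = COMMENT_SPEC.search(prog)                  # matches at end of input
strict_match = COMMENT_NEEDS_NEWLINE.search(prog)       # no match: no newline
fixed_match = COMMENT_NEEDS_NEWLINE.search(add_trailing_newline(prog))
```

Since the string is copied anyway, appending the newline leaves the lexer rules untouched, which is the trade-off being weighed against handling end of input in the lexer.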
maxzor has joined #dri-devel
pcercuei has quit [Quit: dodo]
mbrost has quit [Ping timeout: 480 seconds]
* jekstrand
realizes he knows way too much about NaN and begins questioning life choices
<jenatali>
There we go, fixed by !16230, as ugly as it is
<HdkR>
ah, probably fixes a bunch of things compiling to ARB with CG?
illwieckz has joined #dri-devel
<HdkR>
Probably also fixes dolphin-emu 2.0 era OpenGL then :P
<jenatali>
Probably?
<jenatali>
Our compat folks just flagged this particular app because it doesn't need GL to render, but if you add in a Mesa GL impl, then it stops rendering
<jenatali>
So it's technically a regression by adding GL support
tursulin has quit [Read error: Connection reset by peer]
<jenatali>
Anyways, reviews or acks welcome. I have no idea if that code has an owner these days
morphis has quit [Ping timeout: 480 seconds]
morphis has joined #dri-devel
maxzor has quit [Ping timeout: 480 seconds]
<airlied>
karolherbst: you can't create a context before libclc? though that would be pretty ugly
<dcbaker>
zmike: just let me know that you’ve merged stuff so I don’t force push over it
<airlied>
the other option would be to add a flag to screen creation I suppose
<karolherbst>
airlied: the problem isn't that I can't, the thing is, I don't want to :P
<karolherbst>
I kind of load the libclc when I create the device struct
<karolherbst>
and that's like really really early
<airlied>
plumbing a flag through would also be messy
<karolherbst>
yeah...
ahajda_ has quit []
<zmike>
dcbaker: kk
<airlied>
karolherbst: the other option could be to delay disk cache thread init