ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<robclark>
DemiMarie: so one thing that is currently being enabled is setting up a fixed mapping: say, take an 8GB window, set up an unbacked anonymous read-only mmap in the VMM and map that into the guest. Then, on the host side, dynamically map GEM buffers (backed by whatever) into that window, and when they are unmapped, overwrite the VMM mapping with an anonymous read-only mmap again. Couldn't you do something like that for VRAM?
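A minimal C model of the fixed-window idea robclark describes, for illustration only. It assumes page-granular slots, uses a 16-page array as a stand-in for the 8GB window, and marks unmapped pages with a hypothetical `PLACEHOLDER` value playing the role of the anonymous read-only mapping. What it shows: the guest-visible layout never changes; only what backs each page on the host side does.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: WINDOW_PAGES stands in for the 8GB window; each entry records which
 * host buffer backs that page of the window. */
#define WINDOW_PAGES 16
#define PLACEHOLDER  UINT64_MAX  /* hypothetical marker: "anon r/o mapping" */

typedef struct {
    uint64_t backing[WINDOW_PAGES]; /* buffer handle per page, or PLACEHOLDER */
} fixed_window;

/* At setup the whole window is the read-only placeholder; the guest maps it once. */
static void window_init(fixed_window *w) {
    for (size_t i = 0; i < WINDOW_PAGES; i++)
        w->backing[i] = PLACEHOLDER;
}

/* Host side: bind a buffer of npages into the first free run of pages.
 * Returns the starting page index, or -1 if no run is large enough. */
static long window_map(fixed_window *w, uint64_t handle, size_t npages) {
    size_t run = 0;
    for (size_t i = 0; i < WINDOW_PAGES; i++) {
        run = (w->backing[i] == PLACEHOLDER) ? run + 1 : 0;
        if (run == npages) {
            size_t start = i + 1 - npages;
            for (size_t j = start; j <= i; j++)
                w->backing[j] = handle;
            return (long)start;
        }
    }
    return -1;
}

/* Host side: on unmap, put the placeholder back. The guest's view of the window
 * is never torn down; it just reads the placeholder again. */
static void window_unmap(fixed_window *w, size_t start, size_t npages) {
    for (size_t j = start; j < start + npages && j < WINDOW_PAGES; j++)
        w->backing[j] = PLACEHOLDER;
}
```

First-fit search is used only to keep the sketch short; a real VMM would more likely use a proper range allocator.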
<DemiMarie>
robclark: Xen does not (yet) support unmapping emulated BARs via MMU notifier, I think.
<DemiMarie>
So the kernel driver can’t unmap anything.
<robclark>
tbh, I'm not super familiar with Xen, or with whether there are x86 vs Arm differences in how this works, but from a hw standpoint, shouldn't it be two independent stages of address translation (VA -> IPA -> PA)?
<Lynne>
bcheng: correct, only on navi3x, have tried to replicate on navi2x, but haven't seen it happen there
pcercuei has quit [Quit: dodo]
<DemiMarie>
robclark: yes, but the problem is that Linux doesn’t have control of the IPA -> PA page tables. That’s Xen’s job.
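A toy sketch of the two-stage walk under discussion, assuming 4 KiB pages and tiny flat tables (all names are hypothetical). It illustrates DemiMarie's point: whoever owns stage 2 (Xen here) is the only party that can revoke or redirect an IPA -> PA mapping, so the Linux kernel running on top cannot do it on its own.

```c
#include <stdint.h>

/* Stage 1 (VA -> IPA) is owned by the guest kernel; stage 2 (IPA -> PA) by the
 * hypervisor. INVALID marks an unmapped entry. */
#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
#define NPAGES     8
#define INVALID    UINT32_MAX

typedef struct { uint32_t frame[NPAGES]; } page_table;

/* Translate through one stage; returns INVALID on a fault. */
static uint32_t stage_lookup(const page_table *pt, uint32_t addr) {
    uint32_t page = addr >> PAGE_SHIFT;
    if (page >= NPAGES || pt->frame[page] == INVALID)
        return INVALID;
    return (pt->frame[page] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* Full walk: a fault in either stage faults the access. */
static uint32_t translate(const page_table *s1, const page_table *s2, uint32_t va) {
    uint32_t ipa = stage_lookup(s1, va);
    return ipa == INVALID ? INVALID : stage_lookup(s2, ipa);
}
```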
KDDLB07 has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
jfalempe has quit [Remote host closed the connection]
jfalempe has joined #dri-devel
flynnjiang has joined #dri-devel
<robclark>
oh, hmm, because the "host" is in a VM too. idk, it mostly seems like a sw/Xen problem, but the AMD folks seem to be interested in Xen, so they are better connected to the problem
<eric_engestrom>
tintou: thanks! I would've noticed eventually xD
<eric_engestrom>
kusma: yeah, the bucket was configured to auto-delete the kernels after a couple of months, and apparently it's really hard to configure them better because they've been working on it for many months now 😅
<eric_engestrom>
the workaround so far has been for someone who has access to the gfx-ci/linux repo to re-run the jobs to re-compile & upload the kernels
<kusma>
OK, uh, can someone please do that so I can merge again? :D
<eric_engestrom>
(that's the one for your case for instance)
<eric_engestrom>
anyone who has access can re-run the jobs in the latest pipeline on that page
<kusma>
Looks like something is running right now?
<kusma>
Or was, now it's pending, I guess?
<eric_engestrom>
yup, looks like DavidHeidelberg just re-ran them
<eric_engestrom>
"Created just now by David Heidelberg"
<kusma>
Awesome :)
<eric_engestrom>
yeah, the last re-run was Feb 11, so I think someone has to go around and re-run everything exactly every 2 months
<DavidHeidelberg>
I have an MR for dropping them; I just didn't manage to stress-test it enough, since I know some freedreno problems popped up somewhat randomly. So hopefully these will be dropped soon in phase 1, and then we move new jobs to the forever-lasting S3 bucket of images
<eric_engestrom>
would be nice if the bucket was no longer configured to delete everything 🙃
<DavidHeidelberg>
yeah, before Daniel left, he threw us a nicely prepared bucket for that :D
<eric_engestrom>
please ping me on the MR when you merge it, I will need to make sure it's backported to the stable branch(es)
The_Company has joined #dri-devel
jfalempe has quit [Ping timeout: 480 seconds]
paulk-ter has joined #dri-devel
paulk-bis has quit [Ping timeout: 480 seconds]
jfalempe has joined #dri-devel
jfalempe has quit [Read error: Connection reset by peer]
guludo has quit [Remote host closed the connection]
guludo has joined #dri-devel
karolherbst has quit [Quit: Konversation terminated!]
<zamundaaa[m]>
pq: if I can restart the compositor and it works again, then the drm device can't be dead. I think it's just the logic around pageflip timeouts not being built to allow the compositor to recover
<pq>
zamundaaa[m], I made a distinction between "DRM device" and "opened DRM device".
<pq>
maybe you need to close the device and open it again, but what would tell you to do that...
<zamundaaa[m]>
Good idea, that might be a usable workaround for when pageflip timeouts happen
<pq>
if you really are supposed to re-open the device, then it would be similar to the case of "hot-unplug + hotplug", which a severe device reset might look like.
<pq>
I still think the error must be something other than EBUSY for that case.
<zamundaaa[m]>
I don't think compositors are supposed to do that; the kernel is just broken. Hot-unplug has the same problem too: pending pageflips never arrive, which tripped up KWin's logic before
warpme has quit [Read error: Connection reset by peer]
warpme has joined #dri-devel
<zamundaaa[m]>
With hot-unplug, of course, it can easily be worked around because you don't need the device anymore. Just not waiting for the pageflip to arrive is fine there and avoids all the issues
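A hypothetical sketch of per-CRTC flip bookkeeping in a compositor, assuming a monotonic millisecond clock and a compositor-chosen timeout. It captures the hot-unplug shortcut described here (stop waiting once a removal event is seen) and keeps the timed-out-but-not-removed case as an explicit state, since that is the case without a good answer yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* A flip is "pending" from a successful commit until its completion event arrives. */
typedef struct {
    bool     pending;
    uint64_t submitted_ms;
} flip_state;

typedef enum { FLIP_IDLE, FLIP_WAITING, FLIP_TIMED_OUT, FLIP_ABANDONED } flip_status;

/* device_removed: a udev "remove" event for this device was seen.
 * timeout_ms: a compositor policy choice, not a kernel constant. */
static flip_status check_flip(flip_state *f, uint64_t now_ms,
                              uint64_t timeout_ms, bool device_removed) {
    if (!f->pending)
        return FLIP_IDLE;
    if (device_removed) {
        /* Hot-unplug: the completion event will never come, so stop waiting. */
        f->pending = false;
        return FLIP_ABANDONED;
    }
    if (now_ms - f->submitted_ms >= timeout_ms)
        return FLIP_TIMED_OUT;   /* still pending; recovery is the open question */
    return FLIP_WAITING;
}
```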
<pq>
you mean compositors are not supposed to re-open a device?
<zamundaaa[m]>
I don't think that should be needed
<pq>
it's not a literal re-open though; it's more like the DRM device disappeared and another DRM device appeared, which may or may not happen to reuse the same DRM device node name. So it's really no different from hot-unplug.
<zamundaaa[m]>
If the kernel were to actually signal device removal + re-adding it for GPU resets, that would break compositors though
<pq>
(maybe the DRM docs said something about re-using device node names, I forget)
<zamundaaa[m]>
As in, currently KWin just quits when the compositing GPU gets removed
<pq>
Compositors have to be able to handle device removal and additions anyway, because eGPU.
<pq>
currently they don't, I think, but they should
zxrom has joined #dri-devel
<pq>
a GPU reset could be harsh enough that it really looks like device removal, e.g. when all VRAM contents are lost.
<zamundaaa[m]>
Yes, that would be nice and is planned for KWin, but doesn't change anything for the kernel regression policy. If the primary GPU gets unplugged, the compositor goes down right now, so the kernel can't do that for GPU resets that otherwise work
<pq>
but, if the GPU reset is not that harsh, then the kernel really does need to make sure that pending flips are eventually completed.
<pq>
what's the regression?
<zamundaaa[m]>
KWin quits when it could recover just fine if the GPU reset was just treated like a normal GPU reset
<zamundaaa[m]>
From an EGL perspective, the EGLDisplay stays valid during a GPU reset, so actually removing the device would break all the apps too
<pq>
all that depends on what the reset does, right?
<pq>
I'm saying there are only two reasonable outcomes: either the page flip completes eventually, or the device effectively gets removed, depending on the kind of reset.
<zamundaaa[m]>
Unless the GPU actually gets hotunplugged, the EGLDisplay always stays valid. Even then it stays valid, you just can't create new contexts with it
<pq>
and I think you can use the existing DRM docs to make your case on that for a bug report
<pq>
I didn't say anything about EGLDisplay yet.
<pq>
What I do say is that the GPU does not need to be physically unplugged for the DRM device to be removed. A harsh enough reset or failure can do that too.
tshikaboom has joined #dri-devel
<zamundaaa[m]>
It doesn't get removed even with a full reset or failure. In the latter case it maybe should, but that's not super relevant for me - either way the screens don't get updated anymore
<zamundaaa[m]>
But if we were to add new drm uAPI to signal "you might wanna reopen the drm node" then that would be an acceptable solution. Only the actual udev remove event would break stuff
<pq>
if a page flip is stuck, it should still remain stuck, even if the KMS client goes away, and the new KMS client should still get EBUSY.
<pq>
maybe the driver does have a timeout, but it's longer than kwin's timeout?
<pq>
or just takes a little bit longer than you wait to recover
<pq>
but you said kwin won't recover even if it waits indefinitely?
<zamundaaa[m]>
It didn't recover even after waiting for a while
<pq>
unless maybe a switch to fbcon in the meantime hammers in more resets, like a full modeset
<pq>
quite strange
yyds has quit [Remote host closed the connection]
<zamundaaa[m]>
Yes. Either way, whatever is going on, we need a better way to deal with pageflip timeouts. Or any way to deal with them really
dorcasli_ has joined #dri-devel
rasterman has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
dorcaslitunya has quit [Ping timeout: 480 seconds]
warpme has quit []
<pq>
zamundaaa[m], IMO a page flip timing out is one of two things: a driver bug, or the device disappeared (which means you don't get EBUSY if you try to flip again; you get a different error).
karolherbst has joined #dri-devel
<pq>
Device disappearance is communicated in two ways: udev device removal event, and UAPI returning errors. The errors might not be unique to device disappearance, but I think EBUSY is not it.
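pq's rule can be sketched as a classifier for the result of retrying a flip after a timeout. This assumes the libdrm convention where `drmModePageFlip()` and `drmModeAtomicCommit()` return 0 or a negative errno, and it assumes an unplugged DRM device reports `ENODEV`; treat that mapping as an assumption rather than a documented contract.

```c
#include <errno.h>

typedef enum {
    RETRY_OK,           /* flip accepted: nothing is stuck any more */
    RETRY_STILL_STUCK,  /* EBUSY: previous flip still pending -> likely a driver bug */
    RETRY_DEVICE_GONE,  /* device removed: handle it like hot-unplug */
    RETRY_OTHER
} retry_class;

/* ret: 0 on success or a negative errno value. */
static retry_class classify_flip_retry(int ret) {
    if (ret == 0)
        return RETRY_OK;
    switch (-ret) {
    case EBUSY:  return RETRY_STILL_STUCK;
    case ENODEV: return RETRY_DEVICE_GONE;
    default:     return RETRY_OTHER;
    }
}
```

Pairing this with the udev removal event covers both of the channels through which device disappearance is communicated.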
<zamundaaa[m]>
I agree, but currently it's a driver bug that isn't being taken care of. If the driver can detect the pageflip timing out (all of them do), then it could also do something about it in many cases
<pq>
Yes. You have the grounds to complain about a driver bug, and the DRM docs can back you up. :-)
<zamundaaa[m]>
I'll make a thread on the mailing list about it
<pq>
cool!
Mangix has quit [Read error: Connection reset by peer]
Mangix has joined #dri-devel
krumelmonster has quit [Ping timeout: 480 seconds]
krumelmonster has joined #dri-devel
zxrom has quit []
kts has joined #dri-devel
dorcasli_ has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
itoral has quit [Remote host closed the connection]
kts has quit [Quit: Leaving]
opotin656 has quit [Ping timeout: 480 seconds]
padovan4 has quit [Ping timeout: 480 seconds]
dorcaslitunya has joined #dri-devel
karolherbst has quit [Quit: Konversation terminated!]
karolherbst has joined #dri-devel
mlankhorst has quit [Remote host closed the connection]
crabbedhaloablut has quit []
apinheiro has quit [Quit: Leaving]
crabbedhaloablut has joined #dri-devel
warpme has joined #dri-devel
heat has joined #dri-devel
zdobersek_ has left #dri-devel [#dri-devel]
zdobersek has joined #dri-devel
<MrCooper>
pq zamundaaa[m]: maybe closing the DRM file description happens to clean up whatever was lingering and causing EBUSY
<pq>
that would be a bug, IMO
<pq>
but of course it could
kts has joined #dri-devel
<MrCooper>
indeed it would, just a possible explanation why restarting the compositor worked
tlwoerner has joined #dri-devel
pickl has quit [Remote host closed the connection]
karolherbst has quit [Quit: Konversation terminated!]
greenjustin has quit [Remote host closed the connection]
kzd has joined #dri-devel
dorcasli_ has joined #dri-devel
hikiko has joined #dri-devel
dorcaslitunya has quit [Ping timeout: 480 seconds]
macromorgan has joined #dri-devel
macromorgan has quit [Quit: Leaving]
dorcasli_ has quit [Remote host closed the connection]
Haaninjo has joined #dri-devel
fab has quit [Quit: fab]
macromorgan has joined #dri-devel
pzanoni has joined #dri-devel
karolherbst has joined #dri-devel
sudeepd has joined #dri-devel
karolherbst has quit [Remote host closed the connection]
junaid has quit [Quit: Lost terminal]
karolherbst has joined #dri-devel
greenjustin has joined #dri-devel
karolherbst has quit []
fab has joined #dri-devel
dorcaslitunya has joined #dri-devel
<randevouz>
I am incapable of reading too much W3C material, but I have not entirely given up. The way I see things, my own is scarily similar: on March 5 in dri-devel IRC, under a Babylonian nick, I rambled about transitioning to a smaller value. In fact, if every ALU bank transitions, it's also possible to upconvert per ALU, then transition, then branch, which is what I would call a triplet, and that can be done with batched queries, technically based on tests and logic or common sense; that would also have to work, as long as the predicate eliminates the odd or even branch correctly. I would have to elaborate that it's definitely possible, but my failure to provide anything useful that straightforwardly functions is already pissing off everyone.
randevouz was kicked from #dri-devel by ChanServ [You are not permitted on this channel]
dorcaslitunya has quit [Ping timeout: 480 seconds]
karolherbst has joined #dri-devel
<Lynne>
tchar: airlied: how is film grain supposed to be signalled as not supported on intel?
<Lynne>
the return codes for vkGetPhysicalDeviceVideoCapabilitiesKHR are strictly specified and mention no cases where film grain may not be supported
<randevouz>
I do not want to write a new language or duplicate stuff, but I am really in trouble with filtering all the specs; it feels like time would be better spent doing it from scratch. https://github.com/SmartDataAnalytics/minds ; if I posted all the links I work through, the channel would fill up with only my text. It seems I cannot process so much info alone, and finding a needle in the haystack is more difficult than coding it on my own, which also takes months to get anywhere. Damn, it seems like most links are possibly clickbait: the words are invitingly intriguing, but the code does not seem right, with no compilation or testing done; among the projects I have investigated so far, the code does not look correct.
dorcaslitunya has joined #dri-devel
warpme has quit []
hansg has quit [Quit: Leaving]
u-amarsh04 has quit []
kts has quit [Remote host closed the connection]
zxrom has joined #dri-devel
dorcaslitunya has quit [Ping timeout: 480 seconds]
DodoGTA has quit [Quit: DodoGTA]
jsa has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
u-amarsh04 has joined #dri-devel
DodoGTA has joined #dri-devel
DodoGTA has quit []
DodoGTA has joined #dri-devel
surajkandpal has quit [Ping timeout: 480 seconds]
kts_ has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
riteo has quit [Ping timeout: 480 seconds]
<tchar>
Lynne: that looks like an oversight in the spec, we never had an implementation to test with that didn't support filmgrain at all, so this likely never came up :(
* tchar
though I understand Intel does support filmgrain, albeit with older generations following a "non-standard" algorithm.
Company has quit [Quit: Leaving]
<Lynne>
nah
<Lynne>
it simply did a shader filmgrain
<Lynne>
which vulkan can't do
<Lynne>
don't fall for their marketing scams
<tchar>
heh, but does it matter how they do the filmgrain in the driver? anything passes as a conformant filmgrain process it would appear
hansg has joined #dri-devel
<bcheng>
I thought the film grain support reporting is in the video profile
<tchar>
bcheng: it is there, but it seems we missed adding an error code for cases where the implementation can't support it when you signal it in the profile
<tchar>
maybe we could rely on VK_ERROR_VIDEO_PROFILE_CODEC_NOT_SUPPORTED_KHR for the AV1-specific codec bits, in theory, but it's a bit opaque
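A sketch of the application-side fallback being discussed, with hypothetical result codes standing in for `VkResult`. The caller is assumed to have queried capabilities with `vkGetPhysicalDeviceVideoCapabilitiesKHR` twice, once with `VkVideoDecodeAV1ProfileInfoKHR::filmGrainSupport` set and once without. Treating `VK_ERROR_VIDEO_PROFILE_CODEC_NOT_SUPPORTED_KHR` on the first query as "film grain unsupported" is only the tentative reading suggested here, not spec-mandated behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the VkResult values that matter here. */
typedef enum {
    QUERY_OK,
    QUERY_PROFILE_CODEC_NOT_SUPPORTED, /* ~ VK_ERROR_VIDEO_PROFILE_CODEC_NOT_SUPPORTED_KHR */
    QUERY_OTHER_ERROR
} query_result;

typedef struct {
    bool supported;            /* AV1 decode is usable at all */
    bool driver_applies_grain; /* false -> the application synthesizes grain itself */
} av1_grain_plan;

/* with_grain / without_grain: results of the two capability queries.
 * The second result is only consulted when the first query fails. */
static av1_grain_plan plan_av1_film_grain(query_result with_grain,
                                          query_result without_grain) {
    av1_grain_plan plan = { false, false };
    if (with_grain == QUERY_OK) {
        plan.supported = true;
        plan.driver_applies_grain = true;
    } else if (with_grain == QUERY_PROFILE_CODEC_NOT_SUPPORTED &&
               without_grain == QUERY_OK) {
        plan.supported = true; /* decode without grain; apply it in the app */
    }
    return plan;
}
```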
hansg has quit [Quit: Leaving]
The_Company has quit [Ping timeout: 480 seconds]
hansg has joined #dri-devel
<mareko>
DemiMarie: you are overestimating the resources that we have - if you want Xen to work perfectly on a specific chip and configuration, you can sign a contract with AMD
hansg has quit [Quit: Leaving]
<bcheng>
tchar: oh, i see...
simondnnsn has quit [Ping timeout: 480 seconds]
ninjaaaaa has quit [Ping timeout: 480 seconds]
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
<DemiMarie>
mareko: and that would be far beyond the resources *we* have. We need Xen support to work on past, current, and future chips, too. The only way I know to do that is to make Xen completely transparent to the driver, so that code that works without Xen automatically works with it.
anujp has quit [Ping timeout: 480 seconds]
frankbinns1 has quit [Ping timeout: 480 seconds]
tzimmermann has quit [Quit: Leaving]
Kayden has quit [Ping timeout: 480 seconds]
ninjaaaaa has quit [Ping timeout: 480 seconds]
simondnnsn has quit [Ping timeout: 480 seconds]
simondnnsn has joined #dri-devel
<alyssa>
ugh glmark
<alyssa>
don't use SRC_ALPHA blending and then just write a constant 1.0 in the FS >.<
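For reference, assuming the usual SRC_ALPHA / ONE_MINUS_SRC_ALPHA pairing with an ADD blend equation, a constant fragment alpha of 1.0 reduces blending to a plain overwrite of the destination:

```latex
C = C_s\,\alpha_s + C_d\,(1 - \alpha_s)
  \;\overset{\alpha_s = 1}{=}\; C_s \cdot 1 + C_d \cdot 0 = C_s
```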
Kayden has joined #dri-devel
kts_ has quit [Ping timeout: 480 seconds]
vliaskov has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
<airlied>
Lynne: I think you are meant to fail early if the user specifies film grain and you don't support it
hikiko has quit []
anujp has joined #dri-devel
<airlied>
I dislike that, as I don't think it's discoverable or scalable
randevouz has quit [Remote host closed the connection]
<Lynne>
yeah, when you submit the profile to test, but the error code is what I'm wondering about
<airlied>
dj-death: yeah no good ideas there, except use huc fw i think
<Lynne>
tchar: their driver may lie to us and do film grain by itself, but mesa wouldn't do that, would it?
iive has joined #dri-devel
<DemiMarie>
mareko: there might be a better option, though.
sgruszka has quit [Ping timeout: 480 seconds]
frieder has quit [Remote host closed the connection]
<alyssa>
Lynne: "you wouldn't film grain in mesa" meme template here
<mareko>
alyssa: st/mesa could optimize that
<alyssa>
mareko: It could... not sure if anything other than this goofy benchmark hits it tho
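A hypothetical sketch of the fold being suggested, using a made-up simplified blend description rather than any real gallium or st/mesa structure. It assumes an ADD blend equation, the same factors for color and alpha, and that shader analysis has already proven the fragment shader's alpha output is the constant 1.0.

```c
#include <stdbool.h>

typedef enum { BF_ONE, BF_ZERO, BF_SRC_ALPHA, BF_ONE_MINUS_SRC_ALPHA } blend_factor;

typedef struct {
    bool enabled;
    blend_factor src, dst; /* shared by color and alpha; equation is ADD */
} blend_state;

/* With alpha == 1.0, SRC_ALPHA behaves as ONE and ONE_MINUS_SRC_ALPHA as ZERO. */
static blend_factor fold_factor(blend_factor f, bool alpha_is_one) {
    if (!alpha_is_one) return f;
    if (f == BF_SRC_ALPHA) return BF_ONE;
    if (f == BF_ONE_MINUS_SRC_ALPHA) return BF_ZERO;
    return f;
}

static blend_state optimize_blend(blend_state b, bool fs_alpha_is_const_one) {
    if (!b.enabled) return b;
    b.src = fold_factor(b.src, fs_alpha_is_const_one);
    b.dst = fold_factor(b.dst, fs_alpha_is_const_one);
    if (b.src == BF_ONE && b.dst == BF_ZERO)
        b.enabled = false; /* result == source color, so blending can be skipped */
    return b;
}
```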
<tchar>
Lynne: i hope not! In case you really don't want the driver to do film grain, you can forcibly flag apply_grain off in the std headers and handle it in the application
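A sketch of that suggestion, using a hypothetical trimmed-down stand-in for the AV1 film grain parameters from the Vulkan Video std headers (only `apply_grain` and a seed are modeled; the real structures carry much more). The application keeps a copy of the bitstream's grain parameters for its own grain pass and clears `apply_grain` before submitting the decode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; scaling points, AR coefficients, etc. are omitted. */
typedef struct {
    bool     apply_grain;
    uint16_t grain_seed;
} av1_grain_params;

/* Save the grain parameters for the application's own pass, then stop the
 * driver from applying grain. Returns true if the application must apply it. */
static bool take_over_film_grain(av1_grain_params *submitted, av1_grain_params *saved) {
    memcpy(saved, submitted, sizeof *saved);
    submitted->apply_grain = false;
    return saved->apply_grain;
}
```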
JohnnyonFlame has quit [Ping timeout: 480 seconds]
Kayden has quit [Quit: Leaving]
<tjaalton>
gfxstrand: hi, after enabling nvk it seems to be trying to find syn using pkg-config, which seems wrong?
Marcand_ has joined #dri-devel
<dj-death>
airlied: yeah :(
<dj-death>
airlied: or do the nasty kind of tricks we're doing with the compute queue
<dj-death>
airlied: when some operation isn't supported, call in another engine to do work for you
<dj-death>
airlied: in that case could do a compute shader generating commands for you
<dj-death>
airlied: mentaaaaal
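A CPU model of the trick described above, for illustration only: each compute invocation owns a fixed-size slot in the command buffer, so invocations can write their commands without synchronizing with each other. The copy operation, the opcode, and the packet layout are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DWORDS_PER_CMD 4

typedef struct { uint32_t src_offset, dst_offset, size; } copy_job;

/* One "invocation": translate jobs[id] into a hypothetical copy packet
 * written at a slot derived only from its own id. */
static void gen_cmd(const copy_job *jobs, uint32_t *cmdbuf, size_t id) {
    uint32_t *p = cmdbuf + id * DWORDS_PER_CMD;
    p[0] = 0xC0DE0001u;  /* hypothetical opcode */
    p[1] = jobs[id].src_offset;
    p[2] = jobs[id].dst_offset;
    p[3] = jobs[id].size;
}

/* On the GPU this loop would be the dispatch grid. */
static void dispatch_gen(const copy_job *jobs, uint32_t *cmdbuf, size_t count) {
    for (size_t id = 0; id < count; id++)
        gen_cmd(jobs, cmdbuf, id);
}
```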
sudeepd_ has joined #dri-devel
jsa has joined #dri-devel
sudeepd has quit [Ping timeout: 480 seconds]
mripard has quit [Remote host closed the connection]
simon-perretta-img has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
warpme has joined #dri-devel
Marcand_ has quit [Ping timeout: 480 seconds]
ninjaaaaa has joined #dri-devel
simondnnsn has quit [Read error: Connection reset by peer]
ninjaaaaa has quit [Remote host closed the connection]
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
warpme has quit [Read error: Connection reset by peer]
simondnnsn has quit [Ping timeout: 480 seconds]
simon-perretta-img has joined #dri-devel
ced117 has joined #dri-devel
simondnnsn has joined #dri-devel
heat has quit [Read error: No route to host]
heat has joined #dri-devel
fab has quit [Quit: fab]
junaid has quit [Remote host closed the connection]
Mangix has quit [Read error: Connection reset by peer]
Mangix has joined #dri-devel
ced117_ has joined #dri-devel
simon-perretta-img has quit [Ping timeout: 480 seconds]
ced117 has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
frankbinns1 has joined #dri-devel
anujp has quit [Ping timeout: 480 seconds]
anujp has joined #dri-devel
simon-perretta-img has joined #dri-devel
<alyssa>
dj-death: intel-clc thanks you
<gfxstrand>
tjaalton: Yeah, you need to set MESON_PACKAGE_CACHE_DIR to tell it to look at the Debian packages
frankbinns1 has quit [Ping timeout: 480 seconds]
Calandracas has quit [Remote host closed the connection]
pzanoni has quit [Quit: pzanoni]
Calandracas has joined #dri-devel
Calandracas has quit [Remote host closed the connection]
vliaskov has joined #dri-devel
Calandracas has joined #dri-devel
<dj-death>
alyssa: it's super slow though
rasterman has quit [Quit: Gettin' stinky!]
simon-perretta-img has quit [Ping timeout: 480 seconds]
<Calandracas>
does the llvmpipe rusticl backend behave meaningfully differently from the other backends? radeonsi, iris, and nouveau are all giving me the expected (correct) results, but llvmpipe is outputting pure garbage
<Calandracas>
^ could obviously be bad code too, am trying to rule out my code being the problem
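One way to rule out application bugs, sketched under the assumption that the kernel's expected output can be computed on the host from the same inputs: compare each driver's output against that reference and report the first index that diverges. The tolerance scheme (absolute or relative) and its values are caller choices, not anything rusticl prescribes.

```c
#include <math.h>
#include <stddef.h>

/* Returns the index of the first element outside tolerance, or -1 if all match.
 * A NaN in got[] makes both comparisons false, so it is reported as a mismatch. */
static long first_mismatch(const float *ref, const float *got, size_t n,
                           float abs_tol, float rel_tol) {
    for (size_t i = 0; i < n; i++) {
        float diff = fabsf(ref[i] - got[i]);
        if (!(diff <= abs_tol || diff <= rel_tol * fabsf(ref[i])))
            return (long)i;
    }
    return -1;
}
```

Running the same check against radeonsi, iris, nouveau, and llvmpipe output shows whether llvmpipe is the only outlier and where it first diverges.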
junaid has quit [Quit: Lost terminal]
vliaskov_ has joined #dri-devel
<alyssa>
dj-death: aww
vliaskov has quit [Ping timeout: 480 seconds]
<mort_>
wow panthor is merged, I didn't realize that; this is extremely exciting
mattst88 has quit [Quit: leaving]
pzanoni has joined #dri-devel
mattst88 has joined #dri-devel
mattst88 has quit []
mattst88 has joined #dri-devel
simon-perretta-img has joined #dri-devel
Leopold has quit [Remote host closed the connection]
mattst88 has quit []
mattst88 has joined #dri-devel
Leopold_ has joined #dri-devel
Leopold_ has quit [Remote host closed the connection]
Leopold has joined #dri-devel
mattst88 has quit [Quit: leaving]
<alyssa>
=D
mattst88 has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
guludo has quit [Quit: WeeChat 4.2.1]
dejavuaround has joined #dri-devel
<dejavuaround>
Mar 13 18:23:19 <randevouz>in theory it's simpler to do the proposed triplets, cause otherwise the compiler analysis goes more difficult, you first upconvert/uptranslate a value in first alu then downtranslate it to smaller, in other words alu has some permutes, and it selects one, but from the value one digit is missing, it kind of clips onto that gap, and the subtraction result exposes the upconvert which needs a predicate that has to be smaller and downconvert, if every alu transitions this way no compiler analysis needs to be done it's space complexity tradeoff, you trade slightly more space by complexity, otherwise compiler has to keep tables of alus that go smaller and bigger those are splitting the instructions into two half's, so it only upconverts when it has to. but the space needed to upconvert is very tiny compared to alu permute banks, so the analysis is not worth it.
<dejavuaround>
Mar 13 18:48:25 <randevouz>in theory that is not so complex , since operands are odd and even do not need predicates, branches need predicates, and on the fork maximum alu amount predicates mandate the predicate usage, so if you take shorter branch , you always predicate if you take longer branch you could predicate only if shorter one has already graduated, and also every alu must transition to smaller value again, cause this is for book keeping and so to speak scaffold or garbage elimination, otherwise you immediately get stale or wrong content, and under this assumption that hashes are produced so that every alu transitions, it can always be done in parallel, in theory offers huge throughput and low latency, massive fast computation.
jkrzyszt has quit [Ping timeout: 480 seconds]
<dejavuaround>
Mar 13 19:07:19 <randevouz>and my phone is being hammered by all sorts of personal subjects, with click bates as i am the hero, well my definition is not matching about heroism, some immune system strengths i have , but in reality i would had considered myself a hero if i saved myself in the past from huge trouble i get into, by having been just smarter, and my relations with love of my life had long since finished, she finished it, and i confirm i no longer have plans with that lady, she treated me very bad , and cheated only, the last which is another subject of this flood i get on my phone. Sure she is sorry but she never liked me in real time and i care not about it anymore.
<dejavuaround>
Mar 13 19:52:15 <randevouz>Xen programmers are the only ones that fully support vidpn from ms apis, it is Microsoft's Randr, i wanted to do it too, but looked that they managed it pretty good. And of course i was very insulted by humiliations of sexual kind and i do no longer deal with any of the trash people who did it daily basis, and likely they get charged/sanctioned/penalized , but i am not disappointed about technology, i see all being done except my last project, which is upsetting one, in a way that none would want to share such code it puts one on the highlight immediately too, where many do not want to be at. all except the super trooper engines actually have been implemented and it was much harder work, then the engines which would upset the world and cause a lot of trouble.
<dejavuaround>
Mar 13 19:53:09 <randevouz>*than
<dejavuaround>
Mar 13 20:06:57 <randevouz>so it's indeed possible but i leave now, someone might have it, lot of institutions go down if something alike gets enough popularity and usage, puts a lot of good people at risk, and assaults are granted against the publisher too. Chaotic results likely indeed. So i leave from that research now, and do not share much either for safety.
<dejavuaround>
For fuck sakes you do things wrong, does not matter what grain it is, you could do whatever you want if you had brain.
<dejavuaround>
So my resignation is given you are all retards. They message me when you get handled all.
<dejavuaround>
so they took my voice, fuck you assholes.