<liyi__>
hi All, I send Linux kernel patches to dri-devel@lists.freedesktop.org using liyi@loongson.cn, but all the emails are refused by the server. The bounce reason is: Can not connect to recipient's server because of unstable network or firewall filter. rcpt handle timeout,last handle info: Can not connect to lists.freedesktop.org:2610:10:20:722:a800:ff:fe36:1795:25. The bounce code is "can not connect to", but my other emails are OK. I don't know who I
<liyi__>
should contact, can anyone help me? Thank you.
<liyi__>
or anywhere I can get help, thank you
pa has quit [Ping timeout: 480 seconds]
<agd5f>
liyi__ try on #freedesktop
Zopolis4 has joined #dri-devel
JohnnyonF has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
bmodem1 has quit [Ping timeout: 480 seconds]
srslypascal has quit [Quit: Leaving]
heat has quit [Ping timeout: 480 seconds]
srslypascal has joined #dri-devel
cphealy has quit [Quit: Leaving]
jdavies_ has quit [Remote host closed the connection]
<danvet>
tzimmermann, it was largely motivated by the crtc helper->atomic helper transitional helpers
<danvet>
since we ditched those there's not much point keeping the hooks that made the transitional helpers possible around
<tzimmermann>
danvet. i see. using modeset_nofb to program the display mode would be nice to get symmetry between atomic_enable/_disable. but it doesn't appear like it's supposed to be used :)
<mripard>
those instances are incredibly hard to track and reproduce, so it's not really easy to figure out what's wrong and provide more details unfortunately :/
jkrzyszt has quit [Remote host closed the connection]
<danvet>
mripard, yeah ... there's kinda two things
<danvet>
one is that this shouldn't be possible, if you're still using hw resources at the point of state destroy something is very busted
<danvet>
(minus the legacy cursor hack this should hold)
<mripard>
yeah, the source of the corruption seems to be the same: we overwrite the hardware descriptors while they are in use
<danvet>
the other thing is that your atomic_check is kinda going out into a bit of silliness
<mripard>
but we indeed seem to trigger this from multiple paths
<danvet>
atomic_check is supposed to be against the sw state of things, not against some arbitrarily delayed hw state update
<danvet>
I guess in practice it doesn't matter much because people don't really use a lot of planes?
<danvet>
mripard, the way this is supposed to work is that you sufficiently delay the flip_done completion
<danvet>
i.e. the crtc event
<danvet>
if you signal that too early then the state gets shot down too early
<danvet>
so any delaying you might need should be in there, not later on
<danvet>
assuming the legacy cursor bypass is closed off
aravind has quit [Ping timeout: 480 seconds]
<danvet>
so in a way if flip_done fires too early, then in theory also your app could start rendering too early
<danvet>
since the flip_done is tied to the drm_event (it's the same thing)
<mripard>
because, if we leave the legacy cursor stuff aside, all commits are supposed to wait for flip_done before freeing the state, so if we free the state too soon then it must be that flip_done is signalled too soon?
<danvet>
yup
<mripard>
yeah, that makes sense
<danvet>
so either the flip_done is too early (would be driver bug) or we don't wait for flip_done (legacy cursor plus maybe a helper bug we haven't found yet?)
<danvet>
mripard, maybe add a check to crtc destroy to warn if flip_done isn't done
<danvet>
check that not having the legacy cursor removal patch hits that
<danvet>
and then go hunting?
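A minimal sketch of the check danvet suggests, written in Python rather than kernel C. `CrtcState`, its `flip_done` field, and `destroy_state` are hypothetical stand-ins, not the real DRM structures or helpers; the only point is that destroying a state whose flip_done never completed should be loud.

```python
import warnings
from dataclasses import dataclass


@dataclass
class CrtcState:
    """Hypothetical stand-in for a CRTC state; not the real drm_crtc_state."""
    name: str
    flip_done: bool = False  # set once the hardware has latched this state


def destroy_state(state: CrtcState) -> bool:
    """Destroy a state, warning if flip_done never completed.

    Returns True if the destroy was clean, False if it came too early.
    """
    if not state.flip_done:
        warnings.warn(f"{state.name}: destroyed before flip_done completed")
        return False
    return True


if __name__ == "__main__":
    clean = CrtcState("commit-1", flip_done=True)
    early = CrtcState("commit-2")  # flip_done was never signalled
    print(destroy_state(clean))
    print(destroy_state(early))
```

With a check like this in place, the warning firing only when the legacy cursor path is still enabled would point at that path; the warning firing elsewhere would point at a driver signalling flip_done too early.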
<mripard>
yeah, that's a good idea
<mripard>
thanks
<mripard>
for the first patch, it looks like mod_delayed_work is what we need indeed, are you ok with me fixing it up while applying?
camus has joined #dri-devel
camus1 has quit [Read error: Connection reset by peer]
<agd5f>
danvet, I don't think so? Don't remember off hand.
pcercuei has quit [Read error: Connection reset by peer]
pcercuei has joined #dri-devel
liyi__ has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Ping timeout: 480 seconds]
<danvet>
agd5f, ah I caught up on that entire story, looks like it was just a classic refcount loop which könig already sorted out
<danvet>
the gem_bo holding a ref on the dma_buf and the dma_buf holding a ref on the gem_bo for the fbdev bo
<danvet>
so yeah that won't ever get freed :-)
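The refcount loop danvet describes can be illustrated with a toy model. This is a sketch under stated assumptions: `RefCounted`, `get`, `put`, and `make_loop` are hypothetical names for illustration, not the kernel's kref or dma-buf API.

```python
class RefCounted:
    """Hypothetical manually refcounted object; freed when the count hits zero."""

    def __init__(self, name):
        self.name = name
        self.refs = 1       # the creator's reference
        self.holds = None   # object this one keeps a reference on
        self.freed = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
            if self.holds is not None:
                self.holds.put()  # release the reference we were holding


def make_loop():
    gem_bo = RefCounted("gem_bo")
    dma_buf = RefCounted("dma_buf")
    gem_bo.holds = dma_buf.get()  # gem_bo holds a ref on dma_buf
    dma_buf.holds = gem_bo.get()  # dma_buf holds a ref on gem_bo
    return gem_bo, dma_buf


if __name__ == "__main__":
    gem_bo, dma_buf = make_loop()
    gem_bo.put()   # creator drops its reference
    dma_buf.put()  # creator drops its reference
    # Each object is still pinned by the other, so neither is freed.
    print(gem_bo.refs, dma_buf.refs, gem_bo.freed, dma_buf.freed)
```

Both counts stay at 1 after every outside reference is gone, which is why neither object is ever released unless one side of the loop is broken explicitly.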
Cyrinux9 has quit []
<danvet>
agd5f, the patches you linked are kinda different problem, I was thinking of exported dma-buf having to hold a reference to the drm_device
<danvet>
because holding a module reference alone does not actually keep the drm_device alive against hotunplug
<danvet>
or manual unbinding through sysfs
<danvet>
they are separate refcounts
Cyrinux9 has joined #dri-devel
<agd5f>
danvet, yeah, that was the only one I could think of off hand that amd was working on
<danvet>
agd5f, the hotunplug stuff is laid on ice for now?
<danvet>
iirc we discussed properly refcounting dma-buf and dma-fence as part of the big threads there ...
MajorBiscuit has joined #dri-devel
<agd5f>
danvet, I don't recall that issue off hand. The big issues we've run into with hotplug were a locking bug in the PCI core around hotplugs at suspend/resume time and just general issues around trying to prevent access to hardware when the device is no longer present
<agd5f>
within the driver
<danvet>
agd5f, I think what you need is someone holding a dma-buf
<danvet>
but no one holding a drm fd open
<danvet>
then the drm_device goes away, and you'll oops on the dma-buf
<danvet>
same with syncobj/sync_file fd
<danvet>
and well any dma_fence that another driver might be holding somewhere
<danvet>
agd5f, so it's more an exploit situation than a real world usage situation
<agd5f>
ah, ok
<danvet>
but if we still go boom in pci code in real world usage ... oh well :-/
<danvet>
ogabbay, ^^
<danvet>
it would still be nice to fix the refcount in drm_prime.c I think instead of each driver rolling their own
<ogabbay>
danvet: yeah, I'm pretty sure that will happen in our case as well. Although I would imagine that doing any hotplug on our device will break the device, or the driver, or both
<danvet>
whynotboth.meme
<ogabbay>
TomerTayar[m]: let's try this, hot unplug the device while we export dma buf and then close dma buf and see what happens
macromorgan has quit [Quit: Leaving]
jkrzyszt has joined #dri-devel
fxkamd has quit []
fxkamd has joined #dri-devel
sauce has quit [Quit: sauce]
sauce has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
devilhorns has quit []
rasterman has quit [Quit: Gettin' stinky!]
dviola has quit [Quit: WeeChat 3.7.1]
Zopolis4 has quit []
aravind has quit [Ping timeout: 480 seconds]
lemonzest has quit [Quit: WeeChat 3.6]
<karolherbst>
can we burn LLVM to the ground?
<karolherbst>
honestly..
<jenatali>
What now?
<karolherbst>
jenatali: remember my resource path fix?
<jenatali>
Vaguely
<karolherbst>
turns out.. it doesn't fix it on _all_ systems
<karolherbst>
and "clang -print-resource-dir " is indeed the only reliable option
<jenatali>
Just embed the header. It's tiny
<karolherbst>
(ノಠ益ಠ)ノ彡┻━┻
<karolherbst>
uhh
<karolherbst>
but what if the header gets fixed?
<jenatali>
Then rebuild with the newer LLVM
<karolherbst>
or changed or _something_ and then compilation fails, because the updated one isn't included
<karolherbst>
yeah well...
<karolherbst>
that's what I wanted to prevent from happening (any update to llvm requiring to rebuild mesa)
<karolherbst>
maybe I have to bring this up to LLVM folks so they give us a proper API to fetch that resource directory....
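For reference, the `clang -print-resource-dir` approach mentioned above could look roughly like this from a build or setup script. This is a sketch in Python, not what Mesa does: the function names are hypothetical, and the assumption that the header lives under `include/` inside the resource directory should be checked against the installed clang.

```python
import os
import subprocess


def clang_resource_dir(clang="clang", run=subprocess.run):
    """Ask the clang driver for its resource directory.

    `run` is injectable so the function can be exercised without clang installed.
    """
    result = run(
        [clang, "-print-resource-dir"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def opencl_c_base_header(resource_dir):
    """Assumed location of opencl-c-base.h inside the resource directory."""
    return os.path.join(resource_dir, "include", "opencl-c-base.h")
```

Calling `opencl_c_base_header(clang_resource_dir())` at run time avoids baking a versioned path into the binary, at the cost of needing the `clang` executable to be present and to match the libclang in use.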
bmodem has quit [Ping timeout: 480 seconds]
<karolherbst>
apparently I could pass CLANG_RESOURCE_DIR as the second arg to GetResourcesPath, but CLANG_RESOURCE_DIR might hardcode the version as well :'(
pallavim has joined #dri-devel
<karolherbst>
I just want libclang to find that silly opencl-c-base.h file
<karolherbst>
why is that so annoying
<karolherbst>
and stupid
<jenatali>
Yeah that's kind of ridiculous
<jenatali>
I guess there's one upside to statically linking
<jenatali>
With the big downside being it requires actually building LLVM, which... oof
<jenatali>
Our build machines that we have to use for building the code we ship take 2 hours to compile LLVM
<karolherbst>
yeah...
<karolherbst>
distributions will probably also kill us
<karolherbst>
I still hope there is some super magic clang option we can pass so it magically finds it, but...
<karolherbst>
I wonder if not setting `c->getHeaderSearchOpts().ResourceDir` at all would work?
<karolherbst>
is there actually a reason we are setting that?
alyssa has left #dri-devel [#dri-devel]
nchery has joined #dri-devel
ice9 has quit [Ping timeout: 480 seconds]
phasta has quit [Quit: Leaving]
<robclark>
danvet: sry that one fell off my radar.. it's sort of like somehow email inbox isn't a good way to manage patches ;-)
<danvet>
robclark, yeah who'd have guessed
<danvet>
tbh I've been a disaster the past weeks on this too
<robclark>
also, some waitboost discussion on v2 (because, my bad, I replied to what wasn't the most recent version of patch)
<danvet>
so I figured I'll compensate by pinging a few people who also missed things :-)
<karolherbst>
can we just stop doing mailing lists?
<danvet>
how would we survive without that pain
<danvet>
robclark, oh I forgot to cc you on my ping on v3
<karolherbst>
just write to dri-devel: "sunsetting emails. We won't accept _any_ patch through emails anymore, starting tomorrow. No discussion. Deal with it :3"
<robclark>
danvet: yeah, I noticed that the way I notice most patches.. by blind luck :-P
lemonzest has joined #dri-devel
<robclark>
karolherbst: I guess there would be some villagers with pitchforks.. and also I don't have a good plan for how to deal w/ s-o-b vs committer-email.. we've _kinda_ been using MRs for drm/msm (but only ff and mainly as a way to integrate CI so far)
<karolherbst>
not caring
<karolherbst>
(which in regards to bikeshedding on mailing lists is the only sustainable option)
<danvet>
robclark, you can just not sob?
<karolherbst>
robclark: I have some magic for ya
<danvet>
I mean drm-misc with commit rights does just not have maintainer sobs on there
<danvet>
they probably would scream if marge starts s-o-bing patches :-P
<karolherbst>
that's why I have my dim stuff
<karolherbst>
you do "dim tag nouveau 4235" and it adds the tags through your machine
<danvet>
needs an MR and then pinging demarchi to land it
<karolherbst>
yeah.. and clean ups
<karolherbst>
I want to make it possible to use SSH as well
<karolherbst>
and stuff
<karolherbst>
_but_
<karolherbst>
we can use the gitlab API for automating such nonsense
kzd has joined #dri-devel
<karolherbst>
it also has the problem of requiring accounts to have public emails and stuff
<danvet>
yeah, I think some iterations on how to best integrate gitlab api stuff into dim will be needed
MajorBiscuit has joined #dri-devel
<danvet>
karolherbst, well you shouldn't ever forge a sob for someone else, so this shouldn't be a problem
<karolherbst>
it's more for review tags
<karolherbst>
or acks
<danvet>
ah yeah for those it's a bit more tricky
<karolherbst>
like you could add ack tags from anybody "approving" the MR
<danvet>
otoh if the mail is private, just put the gitlab user https: link in there instead?
<karolherbst>
something
<robclark>
hmm, ideally some way that only one of the driver maintainers could assign to margebot and margebot setting the commit-author and SoB based on who assigned it to her?
<danvet>
it'll fuck up everyone's parser, but oh well :-)
<karolherbst>
but that needs more ML bikeshedding on lkml
<danvet>
robclark, uh that smells like going back to maintainer model from committer model
<danvet>
defo no
<demarchi>
karolherbst: let's not overload terms... s/tag/trailers/
<karolherbst>
heh, people are angry if you mess with their parsers
<danvet>
you really don't need maintainer sob
<danvet>
no one has screamed about that for the years we've been doing it by now
<danvet>
and people have screamed about everything else that the commit right stuff caused
<danvet>
so really, don't worry
<karolherbst>
:D
<danvet>
I mean we put sob onto merge requests instead, which no one else does
<danvet>
or at least didn't, maybe that's changing
<karolherbst>
ehhh...
<robclark>
danvet: when I say maintainer I'm thinking of (currently) myself, lumag, and abhinav for drm/msm.. and similar for other driver.. ie. more than a single person but someone who is familiar enough with that driver to approve changes
Duke`` has joined #dri-devel
<javierm>
danvet: I wonder what some of the complaints / issues are that the commit right stuff caused
<robclark>
we aren't yet to the point where we can just completely rely on CI
lumag has joined #dri-devel
<lumag>
robclark, I might sound like an oldschooler here, but I prefer the ML model. Combined with patchwork and b4 it significantly simplifies my work as a co-maintainer, reviewer and patch author. It allows me to collect all the review and testing tags in a sensible way.
<lumag>
And the problem with merge requests is that CI is far from being perfect.
<lumag>
For example, CI was completely happy with the recent db410c breakage.
jkrzyszt has quit [Remote host closed the connection]
<lumag>
So I still do a lot of manual testing, and it's simpler with `b4 am`
<lumag>
The only benefit from MRs is that the developers will have to apply their patches on top of a sane revision rather than dumping the stuff that 'was tested on top of 5.15 fork'
tzimmermann has quit [Quit: Leaving]
Duke`` has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
Duke`` has joined #dri-devel
alyssa has joined #dri-devel
<cwabbott>
dschuermann_: the more I think about it, the more I think we can't assume that undef is convergent
<cwabbott>
or rather, we can't assume that things computed from undef values are convergent
<cwabbott>
because our current model assumes that you can get a new value every time you use an undef value, sure, but then ALU operations can narrow down the possibilities so that it becomes constant
Company has joined #dri-devel
<cwabbott>
think like "foo = undef & 1; bar = foo & ~foo;"
<cwabbott>
optimizations that effectively insert readfirstlane somewhere (which is pretty common) can't assume "foo" is convergent and insert a readfirstlane because "bar" would go from being always 0 to being either 0 or 1 (assuming foo wound up being in some vector register)
<cwabbott>
that's less likely to happen with a "pure" undef but might happen if instead we have a phi node where one of the sources is undef
<cwabbott>
the case I found is slightly different but sort-of in the same vein, where an optimization assuming the result is the same in all lanes blows up because it's undef
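cwabbott's "foo = undef & 1; bar = foo & ~foo" example can be simulated per lane. The sketch below is a toy Python model, not NIR or ACO code: lanes are list entries, `readfirstlane` is modelled as broadcasting lane 0, and the undef values are just arbitrary integers chosen for illustration.

```python
def readfirstlane(values):
    """Broadcast lane 0's value to every lane."""
    return [values[0]] * len(values)


def bar_original(undef):
    """bar = foo & ~foo, computed per lane from the same foo."""
    foo = [u & 1 for u in undef]
    return [f & ~f for f in foo]  # 0 in every lane, whatever undef was


def bar_with_readfirstlane(undef):
    """Same expression after one use of foo was wrongly treated as convergent."""
    foo = [u & 1 for u in undef]
    uniform_foo = readfirstlane(foo)  # inserted by the hypothetical optimization
    return [u & ~f for u, f in zip(uniform_foo, foo)]


if __name__ == "__main__":
    undef = [1, 0, 3, 2]  # arbitrary per-lane garbage standing in for undef
    print(bar_original(undef))            # [0, 0, 0, 0]
    print(bar_with_readfirstlane(undef))  # [0, 1, 0, 1]
```

The second result is no longer the constant 0 the program was entitled to expect, which is the miscompile: the value derived from undef was not actually the same in all lanes.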
jkrzyszt has joined #dri-devel
<robclark>
lumag: what was the db410c breakage, and how can we make CI better catch it?
<lumag>
Got them from a local market, so the label is different, but the overall appearance is the same
<daniels>
linaro used to make one, but iirc it only wired HPD, and didn’t have an EEPROM
sgruszka has quit [Remote host closed the connection]
<robclark>
in theory we are supposed to get some chamelium v3 boards at some point..
<robclark>
which should be useful for igt testing.. I guess maybe we need to ask for one for pdx farm db410c/db820c's.
<daniels>
Amazon have plenty which have EEPROM these days, apparently so you can run a real desktop on your cryptocurrency miners and RDP into them
smilessh has quit [Ping timeout: 480 seconds]
<robclark>
heh
MajorBiscuit has quit [Quit: WeeChat 3.6]
frieder has quit [Remote host closed the connection]
rasterman has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
kts has joined #dri-devel
junaid has joined #dri-devel
dviola has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
junaid has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
kts has quit [Quit: Leaving]
tursulin has quit [Ping timeout: 480 seconds]
tobiasjakobi has joined #dri-devel
stuart has joined #dri-devel
<dcbaker>
anholt_: I've got "5c246e21b7 Revert "freedreno/a5xx: Fix clip_mask"" nominated for 23.0 since it's a revert, but it isn't immediately obvious to me whether this is something we really want to backport, do you have opinions?
<anholt_>
dcbaker: I think so -- it fixes GPU hangs on that board.
<dcbaker>
anholt_: I'll pull it then, thanks!
tobiasjakobi has quit []
<alyssa>
anholt_: I meant to ask about that commit, if the clip mask gets lowered by mesa/st, why do other drivers (including fd/a6xx) handle it?
<alyssa>
or is this more PIPE_CAP voodoo?
<anholt_>
alyssa: a6xx doesn't lower it through st
<anholt_>
because something about its weird gs lowering.
<alyssa>
delight
<alyssa>
I ask because AGX supports clip distance, but 1. there is no way to disable (must use lower_clip_disable) and worse 2. they are sysval only outputs and cannot be interpolated (must duplicate the store_output and write to some non-sysval slot)
<alyssa>
currently I just use lower_clip_fs
<alyssa>
long term plan is to deal with #2 when we get to enabling clipDistance in AGXV, and then Zink should do #1 for us too :~p
<DemiMarie>
zmike: doesn’t the crash also mean there is a GPU kernel driver bug that needs to be fixed?
<gfxstrand>
italove: The good(?) news is that I think your patch is correct. The bad news is that now we have to figure out what's going on with glReadPixels().
<gfxstrand>
I'll leave that to you. :)
<gfxstrand>
I've got a big enough headache already. :P
<italove>
gfxstrand: thanks for the help with that + the review :)
pcercuei has quit [Read error: Connection reset by peer]
pcercuei has joined #dri-devel
<gfxstrand>
yw
<gfxstrand>
I forgot how much I hate GL...
gouchi has quit [Remote host closed the connection]
<gfxstrand>
So thanks for that little reminder. :P
<alyssa>
gfxstrand: ...Is Vulkan better?
<alyssa>
You're always telling me how much you hate stuff in Vulkan :~P
<gfxstrand>
In this regard, yes.
<gfxstrand>
GL is full of "As for thing X, driver will choose thing Y which is at least as good as X, then you can call glGetParameter* with GL_Y to get Y. Also, when you do Z, we'll check that Y is ... and fail otherwise"
<gfxstrand>
*ask for
<alyssa>
ah, yes, well, yes
<gfxstrand>
In Vulkan, there's none of this "Ask for X, get Y nonsense". The app queries for what's possible and then does that and the driver gives it what it asked for.
<gfxstrand>
Makes the spec SO much easier to read.
<alyssa>
Makes apps harder to write though
<alyssa>
I assume
<gfxstrand>
Also, means you don't have 3 different parameters and a calculated thing to keep in sync and constantly be trying to remember which one you need to use in any given situation.
<gfxstrand>
Yes and no. Thing is... most apps don't want GL's fuzziness.
<alyssa>
Yeah, that's fair
<alyssa>
I'm mostly wondering about stuff like e.g. AGX doesn't support 24-bit depth
<alyssa>
GL will implicitly promote to 32-bit
<gfxstrand>
You don't need to ask for at least 8 bits. You just say "give me RGBA8888" and the driver does it because no one's upgrading you to 11 bits for no good reason.
<alyssa>
VK we just don't expose that format
<gfxstrand>
Yup
<alyssa>
which would* mean that games are just going to fall over and break if they have never seen a GPU without 24-bit depth support
<alyssa>
limiting title selection
<alyssa>
* if not for RADV also not advertising 24-bit depth
<gfxstrand>
And for most apps, they can write a tiny loop which looks at what formats are available and selects 32-bit if 24-bit isn't available.
<gfxstrand>
Vulkan apps are pretty used to that.
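The "tiny loop" gfxstrand mentions might look like the following. This is a sketch in Python rather than real Vulkan code: the format names mirror Vulkan's depth formats, but `supported` is a hypothetical set standing in for what an app would learn by querying the driver per format (for example with vkGetPhysicalDeviceFormatProperties), and the driver contents shown are illustrative.

```python
# Preference order: 24-bit depth first, then 32-bit fallbacks.
PREFERRED_DEPTH_FORMATS = (
    "D24_UNORM_S8_UINT",
    "D32_SFLOAT_S8_UINT",
    "D32_SFLOAT",
)


def pick_depth_format(supported, preferred=PREFERRED_DEPTH_FORMATS):
    """Return the first preferred format the driver reports as supported."""
    for fmt in preferred:
        if fmt in supported:
            return fmt
    raise RuntimeError("no usable depth format")


if __name__ == "__main__":
    # Hypothetical driver that does not expose 24-bit depth.
    no_d24 = {"D32_SFLOAT_S8_UINT", "D32_SFLOAT", "D16_UNORM"}
    print(pick_depth_format(no_d24))  # D32_SFLOAT_S8_UINT
```

An app written this way keeps working on hardware without 24-bit depth, whereas one that hardcodes the 24-bit format fails as soon as the driver does not advertise it.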
<alyssa>
I wonder if that sort of thing could be driconf'd too?
<alyssa>
if there's some old binary app that you're trying to coax to run on your platform?
<alyssa>
("old" includes stuff written today but for hw 10 years from now)
<gfxstrand>
In theory, the HW 10 years from now will support a superset of what today's hardware does.
* alyssa
doubtful
<gfxstrand>
Well, for any given vendor
* alyssa
doubtful
<alyssa>
Maybe I've been burned by Mali removing features over time? :p
<gfxstrand>
IDK, Intel has managed to evolve their hardware for 15 years without deleting interesting formats.
<gfxstrand>
Yeah, I think you might be. :)
<alyssa>
Arm removed a huge number of formats a few years ago
<gfxstrand>
woof
<alyssa>
and we're still sorting out the pain
<alyssa>
you're not going to miss them but if you have a 2007 title that uses them... I'm glad mesa/st can emulate
<gfxstrand>
Well, Vulkan has a required baseline format set that apps can always rely on and it's enough to cover most cases.
<alyssa>
in the parallel universe where it was VK the whole time...
<gfxstrand>
If you're doing something truly esoteric, you may need to actually query for something.
ngcortes has quit [Remote host closed the connection]
ngcortes has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
ngcortes has quit [Ping timeout: 480 seconds]
frankbinns has quit [Remote host closed the connection]
darkapex has quit [Remote host closed the connection]
darkapex has joined #dri-devel
anholt_ has quit [Quit: Leaving]
anholt has joined #dri-devel
dviola has joined #dri-devel
ngcortes has joined #dri-devel
jkrzyszt has quit [Ping timeout: 480 seconds]
<HdkR>
t/win 4
<DavidHeidelberg[m]>
Quiz: How often do you want to debug something which is failing in CI with gdb?
<DavidHeidelberg[m]>
I'm currently more and more considering the costs of distributing unstripped mesa (NOT sending it to the runners, just uploading) and debug symbols (also aside, but uploaded with each mesa build)
<DavidHeidelberg[m]>
One of the huge advantages I see is debugging flakes. On the other hand, for that purpose it would be better to have not the debugoptimized we use now, but debug... which is overkill ofc.
<zmike>
tbh just printing a backtrace from gdb would be a huge improvement for crashes
<DavidHeidelberg[m]>
zmike: good. anyway have to warn again, it's `debugoptimized` so no luxury of seeing variables & stuff
<DavidHeidelberg[m]>
my calculation is around extra 250M of unstripped mesa + 250M debug.dwp, these files just stay uploaded until removed (or used for debugging)
<DavidHeidelberg[m]>
and that's for each build which gets distributed to the test jobs class
mvlad has quit [Remote host closed the connection]
<zmike>
what kind of impact does that have on download times for the jobs
<DavidHeidelberg[m]>
none
<DavidHeidelberg[m]>
anyway, I've been too sensationalist. Looking at my unstripped zst archive and debug.dwp, it's 81M + 137M for arm64
<DavidHeidelberg[m]>
zmike: the distributed stuff is the same as before, just "few extra kbytes" for the split dwarf references
<zmike>
hm cool
<DavidHeidelberg[m]>
if you need debug, you download the full-sized unstripped binary compatible with the one running in CI (stripped)
<DavidHeidelberg[m]>
what could be spotted I guess is the build job uploading these files to s3 (I would love if the job could postpone it until after "finishing")
<DavidHeidelberg[m]>
thinking aloud. Mesa has reproducible builds I assume. Maybe we could re-generate the build and upload in a different job
ngcortes has joined #dri-devel
<DavidHeidelberg[m]>
so for CI we would build without uploading unstripped & debug.dwp and if we need to generate these files, we could run separate job
<DavidHeidelberg[m]>
ugly thing would be 7 duplicated jobs in the pipeline
<anholt>
we don't strip armhf debugoptimized mesa already. is there something stopping us from just dropping stripping for arm64 and x86?
<DavidHeidelberg[m]>
anholt: upload times to the runners?