ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ds` has quit [Quit: ...]
ds` has joined #freedesktop
chomwitt has quit [Ping timeout: 480 seconds]
<robclark>
daniels: I enjoyed the MR Label Maker pun (and also seems like something gitlab should have supported out of the box for like ever) ;-)
alpernebbi has joined #freedesktop
alpernebbi_ has quit [Ping timeout: 480 seconds]
ximion has quit []
genpaku has quit [Read error: Connection reset by peer]
genpaku has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
eroux has joined #freedesktop
danvet has joined #freedesktop
chomwitt has joined #freedesktop
vbenes has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
itoral has joined #freedesktop
immibis has quit [Remote host closed the connection]
immibis has joined #freedesktop
ximion has joined #freedesktop
ximion has quit []
thaller has quit [Quit: Leaving]
thaller has joined #freedesktop
ngcortes has quit [Quit: Leaving]
<bentiss>
FWIW, we have a gitlab security update pending. I plan on doing that in the next few minutes
<bentiss>
sergi: it might make more sense to merge the piglit bump first, so we keep bisectability
<pq>
Thanks for top-posting! (i.e. telling me the answer before I asked the question) :-D
<sergi>
Thanks bentiss. You may have seen my 'testonly' merge requests to 'uprev piglit in mesa'. We made a tool to 'uprev mesa in virglrenderer', and I'm doing a PoC to see whether we can extend it to 'uprev piglit in mesa'.
<bentiss>
gitaly-3 is still having trouble spinning up; I might have to reboot some nodes to clean up its state
<bentiss>
sergi: no worries. gallo[m] had already done some work on finding new issues with the piglit uprev, so I didn't want you to spend time on it when the job had already been done
<bentiss>
the update to 15.5.2 went fine except for the gitaly pods that are still pending their update
<bentiss>
and gitaly-3 is still down
<tomeu>
bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations
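As a rough illustration of the daily check tomeu describes, here is a minimal sketch; the pinned-commit file, branch naming, and push options are assumptions for the example, not the actual tool:

    #!/usr/bin/env python3
    # Minimal sketch of a daily piglit-uprev check (illustrative only).
    # PIGLIT_URL is real; PIN_FILE and the branch/push conventions are assumed.
    import subprocess
    from pathlib import Path

    PIGLIT_URL = "https://gitlab.freedesktop.org/mesa/piglit.git"
    PIN_FILE = Path("ci/piglit-commit.txt")  # hypothetical file pinning the piglit commit

    def latest_piglit_commit() -> str:
        # Ask the remote for the current tip of piglit's main branch.
        out = subprocess.run(
            ["git", "ls-remote", PIGLIT_URL, "refs/heads/main"],
            check=True, capture_output=True, text=True,
        ).stdout
        return out.split()[0]

    def main() -> None:
        new = latest_piglit_commit()
        old = PIN_FILE.read_text().strip()
        if new == old:
            return  # nothing to uprev today
        # Bump the pin on a branch; the CI run on the resulting MR shows
        # which test expectations need to be regenerated.
        branch = f"uprev/piglit-{new[:12]}"
        subprocess.run(["git", "switch", "-c", branch], check=True)
        PIN_FILE.write_text(new + "\n")
        subprocess.run(["git", "commit", "-am", f"ci: uprev piglit to {new}"], check=True)
        subprocess.run(["git", "push", "-o", "merge_request.create", "origin", branch], check=True)

    if __name__ == "__main__":
        main()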
<bentiss>
and.... all gitaly pods are up to date as of now
<bentiss>
tomeu: yes, this is really nice :)
<tomeu>
the main goal is to avoid people trying to uprev a component after a few weeks, then finding lots of tests failing and having to investigate them
<bentiss>
tomeu: now is a very good time to do it, because there are tests breaking :)
<tomeu>
yeah, with piglit it will happen pretty often
<tomeu>
we will be doing something similar for other components such as the kernel, deqp, etc
<bentiss>
tomeu, sergi: is there a close-enough ETA for opening that uprev MR, or will you need a bit more time (like a couple of weeks)?
<bentiss>
because having seen that, I figure that for bisectability we should probably bump piglit first in my minio-to-s3 MR, and then do the server switch
<bentiss>
though bisectability will only be ensured while we keep minio-packet.fd.o up, i.e. until it shuts down on Nov 30 at the latest
<sergi>
there is no close ETA for this uprev, bentiss. It's in a proof-of-concept stage
<sergi>
by this uprev, I mean the one made by this uprev tool we are developing
<mupuf>
tomeu, sergi: this sounds great, but how can one guarantee that these background checks do not prevent merges in Mesa because the runners are unavailable?
<bentiss>
sergi: OK, then I'll continue pushing !19076
<bentiss>
mupuf: honestly if one extra pipeline a day is blocking MRs, we are screwed
<mupuf>
bentiss: oh, so it wouldn't be a stress test, running like 10 times or more?
<bentiss>
mupuf: that's not what I understood from the messages above "09:59:33 tomeu | bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations"
<tomeu>
yep, and will be scheduled when there is less load
<mupuf>
sounds good!
<bentiss>
tomeu: if you need a specific job to check just when mupuf is pushing patches, that can be arranged :)
<mupuf>
tomeu: do you know how much capacity a single pipeline uses for your lava-backed jobs?
<mupuf>
as in: mesa uses 10, but we have 25 of them in the farm
<bentiss>
s/arranged/done/
<tomeu>
we will be retrying jobs if we find flakiness on the first try, but hopefully that won't happen often
<tomeu>
mupuf: we use all of them!
<tomeu>
or well, that's the idea, in practice some device types have some slack
<mupuf>
hmm... that doesn't help alleviate my concern then :D
<mupuf>
well, I guess if a job takes less than 10 minutes, it isn't too bad
<tomeu>
well, we have queueing and a limit on how long jobs are expected to take
<tomeu>
same with all runners
<tomeu>
around 10 minutes is the target
<mupuf>
do you have any prioritization of jobs in lava?
<mupuf>
as in, jobs coming from Marge are higher priority than the ones coming from other users
<tomeu>
yes, pre-merge testing has priority over the others
<tomeu>
ah, not for Marge
<tomeu>
but the other day I got some stats, and surprisingly few people trigger pipelines besides Marge
<mupuf>
so, you have one gitlab runner per project per DUT? How else could you prioritize on the LAVA side?
<mupuf>
good :p
<tomeu>
ah no, we have one machine per lab that has one gitlab runner instance per device type
<mupuf>
oh, and the gitlab runner has parallel = N?
<tomeu>
and each gitlab runner advertises as many slots as DUTs of that type, plus one
<mupuf>
I see, interesting
<tomeu>
so the queueing happens mostly on the gitlab side
<tomeu>
mind that the lab is shared with kernelci.org and other CIs, but those aren't pre-merge and thus have lower priority
<tomeu>
plus, those jobs are typically very small
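To illustrate the slot arithmetic tomeu describes (one gitlab-runner per device type, advertising one more slot than there are DUTs, so the queueing ends up on the GitLab side), a tiny sketch with a made-up inventory:

    # Illustrative only: hypothetical DUT inventory for one lab.
    DUTS_PER_TYPE = {
        "rk3399-gru-kevin": 6,
        "apq8016-sbc": 4,
        "sun50i-h6-pine-h64": 2,
    }

    for device_type, duts in DUTS_PER_TYPE.items():
        # "as many slots as DUTs of that type, plus one"
        slots = duts + 1
        print(f"{device_type}: {duts} DUTs -> advertise {slots} concurrent jobs")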
<mupuf>
yeah :s I hope to get to hack on GitLab soon to introduce two endpoints: one to list all the jobs that could be run by the runner that queries it, and one to pick a particular job
<tomeu>
but really, regarding HW availability, I think we should switch at some point in the near future towards increasing coverage without increasing DUTs
<tomeu>
it is hard to scale further otherwise, and there is a lot of FOSS software that isn't being tested atm
* mupuf
keeps delaying putting "his" navi21 in pre-merge as he has a mortal fear of preventing merges
<mupuf>
Agreed
<tomeu>
mupuf: have you seen the script we have to stress test jobs before we enable them in CI?
<mupuf>
yes, I have, thanks to daniels :)
<mupuf>
thanks for writing it
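For reference, a sketch (not the actual script) of what such a stress loop could look like against the GitLab API, with the project path, branch, and job name assumed:

    #!/usr/bin/env python3
    # Sketch of a job stress loop: run the same pipeline several times on a
    # branch and count how often one job fails, to gauge flakiness before
    # enabling it pre-merge. PROJECT, REF and JOB_NAME are assumptions.
    import os
    import time
    import requests

    API = "https://gitlab.freedesktop.org/api/v4"
    PROJECT = "mesa%2Fmesa"      # URL-encoded project path (assumed)
    REF = "my-ci-branch"         # branch with the candidate job enabled
    JOB_NAME = "a630_vk"         # hypothetical job under test
    HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}
    RUNS = 10

    failures = 0
    for _ in range(RUNS):
        # Start a fresh pipeline on the branch.
        pipeline = requests.post(f"{API}/projects/{PROJECT}/pipeline",
                                 headers=HEADERS, params={"ref": REF}).json()
        # Wait for the pipeline to reach a terminal state.
        while True:
            status = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}",
                                  headers=HEADERS).json()["status"]
            if status in ("success", "failed", "canceled", "skipped"):
                break
            time.sleep(60)
        # Check how the job of interest did in this run.
        jobs = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}/jobs",
                            headers=HEADERS, params={"per_page": 100}).json()
        failures += any(j["name"] == JOB_NAME and j["status"] == "failed" for j in jobs)

    print(f"{JOB_NAME}: {failures}/{RUNS} runs failed")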
<mupuf>
btw, on the topic of increasing code coverage without increasing hw needs too much: post-merge testing in mesa
<tomeu>
np, so with that and some monitoring, you can probably get stuff merged and still sleep soundly at night :)
* bentiss
never understood why post-merge was considered a good thing
<tomeu>
yeah, we can go that way and probably need to do so anyway (CL CTS...), but post-merge and manual jobs are close to useless if there isn't a team of people keeping an eye on them and keeping them green
<bentiss>
especially when we do ff-merge
<tomeu>
well, post-merge can be better than no testing, but someone needs to keep an eye on it
<mupuf>
tomeu: yep... which is what I do, but outside of Mesa, which makes it even more painful :D
<tomeu>
otherwise, it is probably worse :)
<mupuf>
bentiss: well, let's put it this way: can we have pre-merge testing for 10 generations of AMD GPUs?
<bentiss>
tomeu: but if post-merge testing has a chance to fail... why not do the test in pre-merge instead of breaking someone's HW?
<bentiss>
mupuf: we should
<tomeu>
that is probably the main problem with kernelci.org, you have a great way of knowing when stuff breaks, but not a great plan on what to do when that happens
<bentiss>
mupuf: if we care
<mupuf>
tomeu: I know what you mean :)
<bentiss>
mupuf: let's be honest, it's much easier to ask the person who broke things to fix it than to have someone else figure it out based on a post-merge result
<mupuf>
for Intel, we had fast-feedback on all hardware, and full testing on limited generations
<mupuf>
I think this is probably a great approach
<tomeu>
I'm not sure how many intel gens we test currently in mesa CI, but I think it will be close to 10
<tomeu>
and we are adding more
<mupuf>
bentiss: I can't overstate how much I agree with you
<tomeu>
and that is without the vendor's intervention :p
<bentiss>
heh :)
<mupuf>
tomeu: yeah, it is good to have someone so interested in keeping chromebooks tested :p
<tomeu>
it's great to have somebody interested in mesa not regressing ;)
<mupuf>
;)
<mupuf>
We received 70 chromebooks at Intel after I presented our efforts at XDC 2017.... which was hosted by Google :p
MajorBiscuit has joined #freedesktop
<mupuf>
that being said, testing Intel platforms is easy on the infra side: try getting 10 machines sporting some RTX 4090 :D
<mupuf>
You can have two per breaker in the US
<tomeu>
was a great XDC for CI :)
<mupuf>
Maybe 3 in europe
<tomeu>
yeah, we are going to think about that for nvk
<mupuf>
luckily, they run VKCTS FAAASSSSTTT
<tomeu>
hmm, I think VKCTS is CPU-bound, right?
<mupuf>
The new hosts we bought for CI have 5950X
<mupuf>
16 cores, 32 threads
<tomeu>
due to the software rendering (something we need to address to do more testing with the same amount of HW)
<mupuf>
you want to bake the expectations? Would be a great idea, yeah :)
<mupuf>
so, to go back to post-merge testing in Mesa. I think we could try to re-use the artifacts generated by the Marge job, and just run more CI jobs on main
<mupuf>
failures would send an email to whoever wanted the job in post-merge
<mupuf>
and it is their responsibility to keep it clean or disable the job until they do
<mupuf>
they can use manual testing in their MR to iterate on patches
<bentiss>
you should probably send that email to the person who submitted the change
<mupuf>
ah, right, good idea, we should be able to get that information
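A sketch of the notification half of that idea, using the GitLab API: look at the latest finished pipeline on main, pick out failed jobs following a (hypothetical) post-merge naming convention, and mail the author of the commit it ran on. Project path, job-name prefix, and SMTP settings are assumptions:

    #!/usr/bin/env python3
    # Illustrative sketch only; job-name prefix and mail setup are assumed.
    import os
    import smtplib
    from email.message import EmailMessage
    import requests

    API = "https://gitlab.freedesktop.org/api/v4"
    PROJECT = "mesa%2Fmesa"          # URL-encoded project path (assumed)
    HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}
    POSTMERGE_PREFIX = "postmerge:"  # hypothetical job-name convention

    # Latest finished pipeline on main.
    pipeline = requests.get(f"{API}/projects/{PROJECT}/pipelines", headers=HEADERS,
                            params={"ref": "main", "scope": "finished", "per_page": 1}).json()[0]

    # Failed jobs in that pipeline that belong to the post-merge set.
    jobs = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}/jobs",
                        headers=HEADERS, params={"per_page": 100}).json()
    failed = [j["name"] for j in jobs
              if j["status"] == "failed" and j["name"].startswith(POSTMERGE_PREFIX)]

    if failed:
        # Mail the author of the commit the pipeline ran on, as suggested above.
        commit = requests.get(f"{API}/projects/{PROJECT}/repository/commits/{pipeline['sha']}",
                              headers=HEADERS).json()
        msg = EmailMessage()
        msg["Subject"] = f"post-merge CI failures on main ({pipeline['sha'][:12]})"
        msg["From"] = "postmerge-ci@example.org"
        msg["To"] = commit["author_email"]
        msg.set_content("Failed jobs:\n" + "\n".join(failed) + "\n\n" + pipeline["web_url"])
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)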
<bentiss>
and it will probably be: "why did my job fail? I spent too much time fixing it, let's enable it in pre-merge"
<mupuf>
if that leads to improving the execution time of our testing in order to be able to test more in pre-merge, that's success for everyone!
<bentiss>
I might be stubborn, but I still don't get why post-merge tests are considered good. We have a test that prevents a bug, so let's enable it (unless it's flaky)
<mupuf>
machine time, there are no other reasons
<bentiss>
then fix the tests :)
<mupuf>
improve the tests, and increase the size of farms, yep!
<bentiss>
increasing the size of farms won't actually give you much, unless you can use matrix testing on multiple identical machines
<bentiss>
because you want marge to do testing on just one MR at a time
<mupuf>
yes, that's what I meant, increasing the amount of duplicated machines