ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ds` has quit [Quit: ...]
ds` has joined #freedesktop
chomwitt has quit [Ping timeout: 480 seconds]
<robclark> daniels: I enjoyed the MR Label Maker pun (and also seems like something gitlab should have supported out of the box for like ever) ;-)
alpernebbi has joined #freedesktop
alpernebbi_ has quit [Ping timeout: 480 seconds]
ximion has quit []
genpaku has quit [Read error: Connection reset by peer]
genpaku has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
eroux has joined #freedesktop
danvet has joined #freedesktop
chomwitt has joined #freedesktop
vbenes has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
itoral has joined #freedesktop
immibis has quit [Remote host closed the connection]
immibis has joined #freedesktop
ximion has joined #freedesktop
ximion has quit []
thaller has quit [Quit: Leaving]
thaller has joined #freedesktop
ngcortes has quit [Quit: Leaving]
<bentiss> FWIW, we have a gitlab security update pending. I plan on doing it in the next few minutes
mvlad has joined #freedesktop
<bentiss> sergi: FWIW, there is already a MR bumping piglit: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19076 Feel free to take anything you need there
<bentiss> sergi: it might make more sense to merge the piglit bump first, so we keep bisectability
<pq> Thanks for top-posting! (i.e. telling me the answer before I asked the question) :-D
<sergi> Thanks bentiss. You may have seen my 'testonly' merge requests to 'uprev piglit in mesa'. We made a tool to 'uprev mesa in virglrenderer', and I'm doing a PoC to see whether we can extend it to 'uprev piglit in mesa'.
<bentiss> gitaly-3 is still having issues spinning up, I might have to reboot some nodes to clean its state
<bentiss> sergi: no worries. gallo[m] already did some work finding new issues with the piglit uprev, so I didn't want you to spend time on it when the job has already been done
<bentiss> the update to 15.5.2 went fine except for the gitaly pods that are still pending their update
<bentiss> and gitaly-3 is still down
<tomeu> bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations
<bentiss> and.... all gitaly pods are up to date as of now
<bentiss> tomeu: yes, this is really nice :)
<tomeu> the main goal is to avoid people trying to uprev a component after a few weeks, then finding lots of tests failing and having to investigate them
<bentiss> tomeu: this is a very good time to do it now because there are breaking tests :)
<tomeu> yeah, with piglit it will happen pretty often
<tomeu> we will be doing something similar for other components such as the kernel, deqp, etc
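A minimal sketch of what such a daily uprev check could look like, assuming python-gitlab; the pinned-revision file path, the branch names, and piglit's default branch are illustrative guesses, not the actual tool tomeu and sergi are building:

    # Hypothetical daily "uprev piglit in mesa" check (assumes python-gitlab;
    # the pin file path and branch names below are made up for illustration).
    import gitlab

    gl = gitlab.Gitlab("https://gitlab.freedesktop.org", private_token="...")
    mesa = gl.projects.get("mesa/mesa")
    piglit = gl.projects.get("mesa/piglit")

    # Latest upstream piglit revision (assuming its default branch is "main").
    new_rev = piglit.commits.list(ref_name="main", per_page=1, get_all=False)[0].id

    # Hypothetical single file in which Mesa pins the piglit revision.
    pin = mesa.files.get(file_path=".gitlab-ci/piglit-version.txt", ref="main")

    if pin.decode().decode().strip() != new_rev:
        branch = f"uprev-piglit-{new_rev[:12]}"
        mesa.branches.create({"branch": branch, "ref": "main"})
        pin.content = new_rev + "\n"
        pin.save(branch=branch, commit_message=f"ci: uprev piglit to {new_rev[:12]}")
        # Open a draft MR; updated test expectations would be added on top.
        mesa.mergerequests.create({
            "source_branch": branch,
            "target_branch": "main",
            "title": f"Draft: ci: uprev piglit to {new_rev[:12]}",
        })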
<bentiss> tomeu, sergi, is there a close enough ETA for opening that uprev MR, or will you need a bit more time (like a couple of weeks)?
<bentiss> because having seen that, I figure for bisectability we probably should bump piglit first in my minio-to-s3 MR, and then do the server switch
<bentiss> though bisectability will only be ensured while we keep minio-packet.fd.o up, i.e., it'll shut down on Nov 30 at the latest
<sergi> there is no close ETA for this uprev, bentiss. It's in a proof-of-concept stage
<sergi> by this uprev, I mean the one made by this uprev tool we are developing
<mupuf> tomeu, sergi: this sounds great, but how can one guarantee that these background checks do not prevent merges in Mesa because the runners are unavailable?
<bentiss> sergi: OK, then I'll continue pushing !19076
<bentiss> mupuf: honestly if one extra pipeline a day is blocking MRs, we are screwed
<mupuf> bentiss: oh, so it wouldn't be a stress test, running like 10 times or more?
<bentiss> mupuf: that's not what I understood from the messages above "09:59:33 tomeu | bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations"
<tomeu> yep, and will be scheduled when there is less load
<mupuf> sounds good!
<bentiss> tomeu: if you need a specific job to check just when mupuf is pushing patches, that can be arranged :)
<mupuf> tomeu: do you know how much capacity a single pipeline uses, for your lava-backed jobs?
<mupuf> as in: mesa uses 10, but we have 25 of them in the farm
<bentiss> s/arranged/done/
<tomeu> we will be retrying jobs if we find flakiness on the first try, but hopefully that won't happen often
<tomeu> mupuf: we use all of them!
<tomeu> or well, that's the idea, in practice some device types have some slack
<mupuf> hmm... that doesn't help alleviate my concern then :D
<mupuf> well, I guess if a job takes less than 10 minutes, it isn't too bad
<tomeu> well, we have queueing and a limit on how long jobs are expected to take
<tomeu> same with all runners
<tomeu> around 10 minutes is the target
<mupuf> do you have any prioritization of jobs in lava?
<mupuf> as in, jobs coming from Marge are higher priority than the ones coming from other users
<tomeu> yes, pre-merge testing has priority over the others
<tomeu> ah, not for Marge
<tomeu> but the other day I got some stats, and surprisingly few people trigger pipelines, besides Marge
<mupuf> so, you have one gitlab runner per project per DUT? How can you prioritize on the lava side otherwise?
<mupuf> good :p
<tomeu> ah no, we have one machine per lab that has one gitlab runner instance per device type
<mupuf> oh, and the gitlab runner has parallel = N?
<tomeu> and each gitlab runner advertises as many slots as DUTs of that type, plus one
<mupuf> I see, interesting
<tomeu> so the queueing happens mostly on the gitlab side
<tomeu> mind that the lab is shared with kernelci.org and other CIs, but those aren't pre-merge and thus have lower priority
<tomeu> plus, those jobs are typically very small
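A toy sketch of the runner layout tomeu describes (one gitlab-runner entry per device type, advertising one slot more than the number of DUTs so queueing happens mostly on the GitLab side); the device names and counts are invented, and the printed config.toml fragment is only indicative:

    # Illustrative only: derive per-device-type gitlab-runner limits from DUT
    # counts, following the "slots = DUTs + 1" rule described above.
    duts = {"rk3399-gru-kevin": 6, "sun50i-h6-pine-h64": 4}  # invented numbers

    def runner_limits(duts_per_type):
        return {name: count + 1 for name, count in duts_per_type.items()}

    for name, limit in runner_limits(duts).items():
        # rough shape of the per-runner section in gitlab-runner's config.toml
        print(f'[[runners]]\n  name = "lava-{name}"\n  limit = {limit}\n')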
<mupuf> yeah :s I hope to get to hack on Gitlab soon to introduce two endpoints: one to list all the jobs that could be run by the runner that queries it, and one to pick a particular job
<tomeu> btw, there is monitoring in place to keep an eye on how well this is all working: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7613
<mupuf> yeah, I have seen this :)
<tomeu> but really, regarding HW availability, I think we should switch at some point in the near future towards increasing coverage without increasing DUTs
<tomeu> it is hard to scale further otherwise, and there is a lot of FOSS software that isn't being tested atm
* mupuf keeps delaying putting "his" navi21 in pre-merge as he has a mortal fear of preventing merges
<mupuf> Agreed
<tomeu> mupuf: have you seen the script we have to stress test jobs before we enable them in CI?
<mupuf> yes, I have, thanks to daniels :)
<mupuf> thanks for writing it
<mupuf> btw, on the topic of increasing code coverage without increasing hw needs too much: post-merge testing in mesa
<tomeu> np, so with that and some monitoring, you can probably get stuff merged and still sleep soundly at night :)
* bentiss never understood why post-merge was considered a good thing
<tomeu> yeah, we can go that way and probably need to do so anyway (CL CTS...), but post-merge and manual jobs are close to useless if there isn't a team of people keeping an eye on them and keeping them green
<bentiss> especially when we do ff-merge
<tomeu> well, post-merge can be better than no testing, but someone needs to keep an eye on it
<mupuf> tomeu: yep... which is what I do but outside of Mesa which makes it even more painful :D
<tomeu> otherwise, it is probably worse :)
<mupuf> bentiss: well, let's put it this way: can we have pre-merge testing for 10 generations of AMD GPUs?
<bentiss> tomeu: but if post-merge testing has a chance to fail... why not do the test in pre-merge instead of breaking someone's HW?
<bentiss> mupuf: we should
<tomeu> that is probably the main problem with kernelci.org, you have a great way of knowing when stuff breaks, but not a great plan on what to do when that happens
<bentiss> mupuf: if we care
<mupuf> tomeu: I know what you mean :)
<bentiss> mupuf: let's be honest, it's much easier to ask the person who broke things to fix it than to have someone else figure it out based on a post-merge result
<mupuf> for Intel, we had fast-feedback on all hardware, and full testing on limited generations
<mupuf> I think this is probably a great approach
<tomeu> I'm not sure how many intel gens we test currently in mesa CI, but I think it will be close to 10
<tomeu> and we are adding more
<mupuf> bentiss: I can't overstate how much I agree with you
<tomeu> and that is without the vendor's intervention :p
<bentiss> heh :)
<mupuf> tomeu: yeah, it is good to have someone so interested in keeping chromebooks tested :p
<tomeu> it's great to have somebody interested in mesa not regressing ;)
<mupuf> ;)
<mupuf> We received 70 chromebooks at Intel after I presented our efforts at XDC 2017.... which was hosted by Google :p
MajorBiscuit has joined #freedesktop
<mupuf> that being said, testing Intel platforms is easy on the infra side: try getting 10 machines sporting some RTX 4090 :D
<mupuf> You can have two per breaker in the US
<tomeu> was a great XDC for CI :)
<mupuf> Maybe 3 in europe
<tomeu> yeah, we are going to think about that for nvk
<mupuf> luckily, they run VKCTS FAAASSSSTTT
<tomeu> hmm, I think VKCTS is CPU-bound, right?
<mupuf> The new hosts we bought for CI have 5950X
<mupuf> 16 cores, 32 threads
<tomeu> due to the software rendering (something we need to address to do more testing with the same amount of HW)
<mupuf> you want to bake the expectations? Would be a great idea, yeah :)
<mupuf> so, to go back to post-merge testing in Mesa. I think we could try to re-use the artifacts generated by the Marge job, and just run more CI jobs on main
<mupuf> failures would send an email to whoever wanted the job in post-merge
<mupuf> and it is their responsibility to keep it clean or disable the job until they do
<mupuf> they can use manual testing in their MR to iterate on patches
<bentiss> you should probably send that email to the person who submitted the change
<mupuf> ah, right, good idea, we should be able to get that information
<bentiss> and it will probably be: "why did my job fail? I spent too much time fixing it, let's enable it in pre-merge"
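A rough sketch of that notification idea, assuming python-gitlab; how the post-merge jobs are selected and how the mail is actually sent are left out, and the whole flow is hypothetical:

    # Hypothetical post-merge watcher: find failed jobs in the latest pipeline
    # on main and report them to the commit author (assumes python-gitlab).
    import gitlab

    gl = gitlab.Gitlab("https://gitlab.freedesktop.org", private_token="...")
    mesa = gl.projects.get("mesa/mesa")

    pipeline = mesa.pipelines.list(ref="main", order_by="id", sort="desc",
                                   per_page=1, get_all=False)[0]
    failed = pipeline.jobs.list(scope="failed", get_all=True)
    if failed:
        author = mesa.commits.get(pipeline.sha).author_email
        names = ", ".join(job.name for job in failed)
        print(f"would mail {author}: post-merge jobs failed: {names}")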
<mupuf> if that leads to improving the execution time of our testing in order to be able to test more in pre-merge, that's success for everyone!
<bentiss> I might be stubborn, but I still don't get why post-merge tests are considered good. We have a test that prevents a bug, let's enable it (unless it's flaky)
<mupuf> machine time, there are no other reasons
<bentiss> then fix the tests :)
<mupuf> improve the tests, and increase the size of farms, yep!
<bentiss> increasing the size of farms won't give you much actually, unless you can use matrix testing on multiple identical machines
<bentiss> because you want marge to do testing on just one MR at a time
<mupuf> yes, that's what I meant, increasing the amount of duplicated machines
egbert has quit [Ping timeout: 480 seconds]
egbert has joined #freedesktop
Haaninjo has joined #freedesktop
<karolherbst> daniels: do you have a way of checking why label maker added the 'rusticl' label here? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19498
<karolherbst> also.. it probably shouldn't create new labels
<jenatali> The labels are case sensitive, so someone probably typoed and added a lowercase one
vbenes has quit [Ping timeout: 480 seconds]
<karolherbst> that would be our label bot, but I'd like to know why :P
<karolherbst> I at least thought I did the right thing when changing the config
<karolherbst> I suspect the title triggered it though
<jenatali> Lowercase needs to be upper on the rhs
<jenatali> The title section is right though
<karolherbst> ahh.. probably, let me try that then
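Not MR Label Maker's actual configuration, just a toy illustration of the case-sensitivity pitfall jenatali points out: a lowercase label on the right-hand side matches nothing that already exists, so GitLab ends up creating a brand-new label:

    # Toy title -> label mapping showing the case-sensitivity pitfall; the rule
    # and label names are invented, not the real mr-label-maker config.
    import re

    existing_labels = {"Rusticl", "CI"}
    title_rules = {r"\brusticl\b": "rusticl"}   # RHS should be "Rusticl"

    def labels_for(title):
        wanted = {label for pattern, label in title_rules.items()
                  if re.search(pattern, title, re.IGNORECASE)}
        return wanted, wanted - existing_labels  # second set would be created anew

    print(labels_for("rusticl: fix queue flushing"))
    # -> ({'rusticl'}, {'rusticl'})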
<karolherbst> wondering if that stuff gets auto deployed
Mark[m]123 has joined #freedesktop
itoral has quit [Remote host closed the connection]
<eric_engestrom> karolherbst: I don't think it does, iirc daniels has to pull the new version and restart the service
<jenatali> That's my understanding as well
Leopold has joined #freedesktop
AbleBacon has joined #freedesktop
pendingchaos_ has joined #freedesktop
ybogdano has joined #freedesktop
pendingchaos has quit [Ping timeout: 480 seconds]
pendingchaos has joined #freedesktop
pendingchaos_ has quit [Ping timeout: 480 seconds]
xyb has joined #freedesktop
xyb has quit []
xyb has joined #freedesktop
xyb has quit []
<daniels> he sure does
___nick___ has joined #freedesktop
MajorBiscuit has quit [Ping timeout: 480 seconds]
<daniels> (it's already deployed)
ybogdano has quit [Quit: The Lounge - https://thelounge.chat]
Leopold has quit [Remote host closed the connection]
ybogdano has joined #freedesktop
mvlad has quit [Remote host closed the connection]
kem has quit [Ping timeout: 480 seconds]
___nick___ has quit []
kem has joined #freedesktop
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ybogdano has joined #freedesktop
ngcortes has joined #freedesktop
strugee has quit [Ping timeout: 480 seconds]
___nick___ has quit [Ping timeout: 480 seconds]
Guest69 is now known as frytaped
danvet has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
<DavidHeidelberg[m]> Hey! What is the current solution used for s3.freedesktop.org called?
karolherbst_ has joined #freedesktop
karolherbst is now known as Guest381
karolherbst_ is now known as karolherbst
danilo has joined #freedesktop
Guest381 has quit [Ping timeout: 480 seconds]
dakr has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
danilo has quit [Ping timeout: 480 seconds]
<daniels> DavidHeidelberg[m]: that's ceph with opa + istio. do you want a catchy name for an MR summary, or to look into the components?
<DavidHeidelberg[m]> daniels: I need to know if that stuff knows how to generate checksums and send them as header content
<DavidHeidelberg[m]> something like Content-MD5
dakr has joined #freedesktop
Lyude has quit [Read error: Connection reset by peer]
Lyude has joined #freedesktop
dakr has quit [Read error: No route to host]
dakr has joined #freedesktop
<DavidHeidelberg[m]> daniels: exactly x-amz-content-sha256 (which ceph should support). Maybe it needs to be enabled?
dakr has quit [Quit: ZNC 1.8.2+deb2 - https://znc.in]
dakr has joined #freedesktop
eroux has quit [Ping timeout: 480 seconds]
<bentiss> DavidHeidelberg[m]: FWIW, OPA + Istio validate the request and return true/false to let it go through to ceph or not
<bentiss> so maybe x-amz-content-sha256 will be properly validated by ceph, but worst case we can do that in OPA easily
<bentiss> DavidHeidelberg[m]: what's the use case?
<daniels> DavidHeidelberg[m]: ^ if you can elaborate more (inc. pointers to MRs?) then we can help better :)
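For reference, a sketch of how a client could send integrity checksums with an S3 upload, assuming boto3; whether the s3.freedesktop.org stack (OPA/Istio in front of Ceph RGW) actually validates them is exactly the open question above, and the endpoint, bucket and credentials are placeholders:

    # Sending content checksums with an S3 upload (boto3 sketch; placeholders).
    import base64
    import hashlib
    import boto3

    data = b"artifact contents"
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()

    s3 = boto3.client("s3", endpoint_url="https://s3.freedesktop.org",
                      aws_access_key_id="...", aws_secret_access_key="...")

    # botocore already sets an x-amz-content-sha256 header as part of SigV4
    # signing (the payload hash or UNSIGNED-PAYLOAD); Content-MD5 and the newer
    # x-amz-checksum-sha256 can be requested explicitly as below.
    s3.put_object(Bucket="some-bucket", Key="some/key", Body=data,
                  ContentMD5=md5_b64,
                  ChecksumAlgorithm="SHA256")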