ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
ds` has quit [Quit: ...]
ds` has joined #freedesktop
chomwitt has quit [Ping timeout: 480 seconds]
<robclark>
daniels: I enjoyed the MR Label Maker pun (and also seems like something gitlab should have supported out of the box for like ever) ;-)
alpernebbi has joined #freedesktop
alpernebbi_ has quit [Ping timeout: 480 seconds]
ximion has quit []
genpaku has quit [Read error: Connection reset by peer]
genpaku has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
ybogdano has joined #freedesktop
ybogdano has quit [Ping timeout: 480 seconds]
eroux has joined #freedesktop
danvet has joined #freedesktop
chomwitt has joined #freedesktop
vbenes has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
itoral has joined #freedesktop
immibis has quit [Remote host closed the connection]
immibis has joined #freedesktop
ximion has joined #freedesktop
ximion has quit []
thaller has quit [Quit: Leaving]
thaller has joined #freedesktop
ngcortes has quit [Quit: Leaving]
<bentiss>
FWIW, we have a gitlab security update pending. I plan on doing that in the next few minutes
<bentiss>
sergi: it might make more sense to merge the piglit bump first, so we keep bisectability
<pq>
Thanks for top-posting! (i.e. telling me the answer before I asked the question) :-D
<sergi>
Thanks bentiss. You may have seen my 'testonly' merge requests to 'uprev piglit in mesa'. We made a tool to 'uprev mesa in virglrenderer', and I'm doing a PoC to see whether we can extend it to 'uprev piglit in mesa'.
<bentiss>
gitaly-3 is still having trouble spinning up; I might have to reboot some nodes to clean up its state
<bentiss>
sergi: no worries. gallo[m] had already done some work on finding new issues with the piglit uprev, so I didn't want you to spend time on it when the job had already been done
<bentiss>
the update to 15.5.2 went fine except for the gitaly pods that are still pending their update
<bentiss>
and gitaly-3 is still down
<tomeu>
bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations
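As a rough illustration of the daily check tomeu describes, here is a minimal sketch; the pinned-commit file, branch naming, and push options are assumptions for the example, not the actual tool:

    #!/usr/bin/env python3
    # Minimal sketch of a daily piglit-uprev check (illustrative only).
    # PIGLIT_URL is real; PIN_FILE and the branch/push conventions are assumed.
    import subprocess
    from pathlib import Path

    PIGLIT_URL = "https://gitlab.freedesktop.org/mesa/piglit.git"
    PIN_FILE = Path("ci/piglit-commit.txt")  # hypothetical file pinning the piglit commit

    def latest_piglit_commit() -> str:
        # Ask the remote for the current tip of piglit's main branch.
        out = subprocess.run(
            ["git", "ls-remote", PIGLIT_URL, "refs/heads/main"],
            check=True, capture_output=True, text=True,
        ).stdout
        return out.split()[0]

    def main() -> None:
        new = latest_piglit_commit()
        old = PIN_FILE.read_text().strip()
        if new == old:
            return  # nothing to uprev today
        # Bump the pin on a branch; the CI run on the resulting MR shows
        # which test expectations need to be regenerated.
        branch = f"uprev/piglit-{new[:12]}"
        subprocess.run(["git", "switch", "-c", branch], check=True)
        PIN_FILE.write_text(new + "\n")
        subprocess.run(["git", "commit", "-am", f"ci: uprev piglit to {new}"], check=True)
        subprocess.run(["git", "push", "-o", "merge_request.create", "origin", branch], check=True)

    if __name__ == "__main__":
        main()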
<bentiss>
and.... all gitaly pods are up to date as of now
<bentiss>
tomeu: yes, this is really nice :)
<tomeu>
the main goal is to avoid people trying to uprev a component after a few weeks, then finding lots of tests failing and having to investigate them
<bentiss>
tomeu: now is a very good time to do it, because there are tests breaking :)
<tomeu>
yeah, with piglit it will happen pretty often
<tomeu>
we will be doing something similar for other components such as the kernel, deqp, etc
<bentiss>
tomeu, sergi: is there a close-enough ETA for opening that uprev MR, or will you need a bit more time (like a couple of weeks)?
<bentiss>
because having seen that, I figure that for bisectability we should probably bump piglit first in my minio-to-s3 MR, and then do the server switch
<bentiss>
though bisectability will only be ensured while we keep minio-packet.fd.o up, i.e. until it shuts down on Nov 30 at the latest
<sergi>
there is no close ETA for this uprev, bentiss. It's in a proof-of-concept stage
<sergi>
by this uprev, I mean the one made by this uprev tool we are developing
<mupuf>
tomeu, sergi: this sounds great, but how can one guarantee that these background checks do not prevent merges in Mesa because the runners are unavailable?
<bentiss>
sergi: OK, then I'll continue pushing !19076
<bentiss>
mupuf: honestly if one extra pipeline a day is blocking MRs, we are screwed
<mupuf>
bentiss: oh, so it wouldn't be a stress test, running like 10 times or more?
<bentiss>
mupuf: that's not what I understood from the messages above "09:59:33 tomeu | bentiss: the idea is to have a tool checking daily what could break (if anything) with an uprev, and also propose a patch with the uprev and any changes to the test expectations"
<tomeu>
yep, and will be scheduled when there is less load
<mupuf>
sounds good!
<bentiss>
tomeu: if you need a specific job to check just when mupuf is pushing patches, that can be arranged :)
<mupuf>
tomeu: do you know how much capacity a single pipeline uses for your lava-backed jobs?
<mupuf>
as in: mesa uses 10, but we have 25 of them in the farm
<bentiss>
s/arranged/done/
<tomeu>
we will be retrying jobs if we find flakiness on the first try, but hopefully that won't happen often
<tomeu>
mupuf: we use all of them!
<tomeu>
or well, that's the idea, in practice some device types have some slack
<mupuf>
hmm... that doesn't help alleviate my concern then :D
<mupuf>
well, I guess if a job takes less than 10 minutes, it isn't too bad
<tomeu>
well, we have queueing and a limit on how long jobs are expected to take
<tomeu>
same with all runners
<tomeu>
around 10 minutes is the target
<mupuf>
do you have any prioritization of jobs in lava?
<mupuf>
as in, jobs coming from Marge are higher priority than the ones coming from other users
<tomeu>
yes, pre-merge testing has priority over the others
<tomeu>
ah, not for Marge
<tomeu>
but the other day I got some stats, and surprisingly few people trigger pipelines besides Marge
<mupuf>
so, you have one gitlab runner per project per DUT? How else could you prioritize on the LAVA side?
<mupuf>
good :p
<tomeu>
ah no, we have one machine per lab that has one gitlab runner instance per device type
<mupuf>
oh, and the gitlab runner has parallel = N?
<tomeu>
and each gitlab runner advertises as many slots as DUTs of that type, plus one
<mupuf>
I see, interesting
<tomeu>
so the queueing happens mostly on the gitlab side
<tomeu>
mind that the lab is shared with kernelci.org and other CIs, but those aren't pre-merge and thus have lower priority
<tomeu>
plus, those jobs are typically very small
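To illustrate the slot arithmetic tomeu describes (one gitlab-runner per device type, advertising one more slot than there are DUTs, so the queueing ends up on the GitLab side), a tiny sketch with a made-up inventory:

    # Illustrative only: hypothetical DUT inventory for one lab.
    DUTS_PER_TYPE = {
        "rk3399-gru-kevin": 6,
        "apq8016-sbc": 4,
        "sun50i-h6-pine-h64": 2,
    }

    for device_type, duts in DUTS_PER_TYPE.items():
        # "as many slots as DUTs of that type, plus one"
        slots = duts + 1
        print(f"{device_type}: {duts} DUTs -> advertise {slots} concurrent jobs")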
<mupuf>
yeah :s I hope to get to hack on GitLab soon to introduce two endpoints: one to list all the jobs that could be run by the runner that queries it, and one to pick a particular job
<tomeu>
but really, regarding HW availability, I think we should switch at some point in the near future towards increasing coverage without increasing DUTs
<tomeu>
it is hard to scale further otherwise, and there is a lot of FOSS software that isn't being tested atm
* mupuf
keeps delaying putting "his" navi21 in pre-merge as he has a mortal fear of preventing merges
<mupuf>
Agreed
<tomeu>
mupuf: have you seen the script we have to stress test jobs before we enable them in CI?
<mupuf>
yes, I have, thanks to daniels :)
<mupuf>
thanks for writing it
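For reference, a sketch (not the actual script) of what such a stress loop could look like against the GitLab API, with the project path, branch, and job name assumed:

    #!/usr/bin/env python3
    # Sketch of a job stress loop: run the same pipeline several times on a
    # branch and count how often one job fails, to gauge flakiness before
    # enabling it pre-merge. PROJECT, REF and JOB_NAME are assumptions.
    import os
    import time
    import requests

    API = "https://gitlab.freedesktop.org/api/v4"
    PROJECT = "mesa%2Fmesa"      # URL-encoded project path (assumed)
    REF = "my-ci-branch"         # branch with the candidate job enabled
    JOB_NAME = "a630_vk"         # hypothetical job under test
    HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}
    RUNS = 10

    failures = 0
    for _ in range(RUNS):
        # Start a fresh pipeline on the branch.
        pipeline = requests.post(f"{API}/projects/{PROJECT}/pipeline",
                                 headers=HEADERS, params={"ref": REF}).json()
        # Wait for the pipeline to reach a terminal state.
        while True:
            status = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}",
                                  headers=HEADERS).json()["status"]
            if status in ("success", "failed", "canceled", "skipped"):
                break
            time.sleep(60)
        # Check how the job of interest did in this run.
        jobs = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}/jobs",
                            headers=HEADERS, params={"per_page": 100}).json()
        failures += any(j["name"] == JOB_NAME and j["status"] == "failed" for j in jobs)

    print(f"{JOB_NAME}: {failures}/{RUNS} runs failed")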
<mupuf>
btw, on the topic of increasing code coverage without increasing hw needs too much: post-merge testing in mesa
<tomeu>
np, so with that and some monitoring, you can probably get stuff merged and still sleep soundly at night :)
* bentiss
never understood why post-merge was considered a good thing
<tomeu>
yeah, we can go that way and probably need to do so anyway (CL CTS...), but post-merge and manual jobs are close to useless if there isn't a team of people keeping an eye on them and keeping them green
<bentiss>
especially when we do ff-merge
<tomeu>
well, post-merge can be better than no testing, but someone needs to keep an eye on it
<mupuf>
tomeu: yep... which is what I do, but outside of Mesa, which makes it even more painful :D
<tomeu>
otherwise, it is probably worse :)
<mupuf>
bentiss: well, let's put it this way: can we have pre-merge testing for 10 generations of AMD GPUs?
<bentiss>
tomeu: but if post-merge testing has a chance to fail... why not do the test in pre-merge instead of breaking someone's HW?
<bentiss>
mupuf: we should
<tomeu>
that is probably the main problem with kernelci.org, you have a great way of knowing when stuff breaks, but not a great plan on what to do when that happens
<bentiss>
mupuf: if we care
<mupuf>
tomeu: I know what you mean :)
<bentiss>
mupuf: let's be honest, it's much easier to ask the person who broke things to fix it than to have someone else figure it out based on a post-merge result
<mupuf>
for Intel, we had fast-feedback on all hardware, and full testing on limited generations
<mupuf>
I think this is probably a great approach
<tomeu>
I'm not sure how many intel gens we test currently in mesa CI, but I think it will be close to 10
<tomeu>
and we are adding more
<mupuf>
bentiss: I can't overstate how much I agree with you
<tomeu>
and that is without the vendor's intervention :p
<bentiss>
heh :)
<mupuf>
tomeu: yeah, it is good to have someone so interested in keeping chromebooks tested :p
<tomeu>
it's great to have somebody interested in mesa not regressing ;)
<mupuf>
;)
<mupuf>
We received 70 chromebooks at Intel after I presented our efforts at XDC 2017.... which was hosted by Google :p
MajorBiscuit has joined #freedesktop
<mupuf>
that being said, testing Intel platforms is easy on the infra side: try getting 10 machines sporting some RTX 4090 :D
<mupuf>
You can have two per breaker in the US
<tomeu>
was a great XDC for CI :)
<mupuf>
Maybe 3 in europe
<tomeu>
yeah, we are going to think about that for nvk
<mupuf>
luckily, they run VKCTS FAAASSSSTTT
<tomeu>
hmm, I think VKCTS is CPU-bound, right?
<mupuf>
The new hosts we bought for CI have 5950X
<mupuf>
16 cores, 32 threads
<tomeu>
due to the software rendering (something we need to address to do more testing with the same amount of HW)
<mupuf>
you want to bake the expectations? Would be a great idea, yeah :)
<mupuf>
so, to go back to post-merge testing in Mesa. I think we could try to re-use the artifacts generated by the Marge job, and just run more CI jobs on main
<mupuf>
failures would send an email to whoever wanted the job in post-merge
<mupuf>
and it is their responsibility to keep it clean or disable the job until they do
<mupuf>
they can use manual testing in their MR to iterate on patches
<bentiss>
you should probably send that email to the person who submitted the change
<mupuf>
ah, right, good idea, we should be able to get that information
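A sketch of the notification half of that idea, using the GitLab API: look at the latest finished pipeline on main, pick out failed jobs following a (hypothetical) post-merge naming convention, and mail the author of the commit it ran on. Project path, job-name prefix, and SMTP settings are assumptions:

    #!/usr/bin/env python3
    # Illustrative sketch only; job-name prefix and mail setup are assumed.
    import os
    import smtplib
    from email.message import EmailMessage
    import requests

    API = "https://gitlab.freedesktop.org/api/v4"
    PROJECT = "mesa%2Fmesa"          # URL-encoded project path (assumed)
    HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}
    POSTMERGE_PREFIX = "postmerge:"  # hypothetical job-name convention

    # Latest finished pipeline on main.
    pipeline = requests.get(f"{API}/projects/{PROJECT}/pipelines", headers=HEADERS,
                            params={"ref": "main", "scope": "finished", "per_page": 1}).json()[0]

    # Failed jobs in that pipeline that belong to the post-merge set.
    jobs = requests.get(f"{API}/projects/{PROJECT}/pipelines/{pipeline['id']}/jobs",
                        headers=HEADERS, params={"per_page": 100}).json()
    failed = [j["name"] for j in jobs
              if j["status"] == "failed" and j["name"].startswith(POSTMERGE_PREFIX)]

    if failed:
        # Mail the author of the commit the pipeline ran on, as suggested above.
        commit = requests.get(f"{API}/projects/{PROJECT}/repository/commits/{pipeline['sha']}",
                              headers=HEADERS).json()
        msg = EmailMessage()
        msg["Subject"] = f"post-merge CI failures on main ({pipeline['sha'][:12]})"
        msg["From"] = "postmerge-ci@example.org"
        msg["To"] = commit["author_email"]
        msg.set_content("Failed jobs:\n" + "\n".join(failed) + "\n\n" + pipeline["web_url"])
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)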
<bentiss>
and it will probably be: "why did my job fail? I spent too much time fixing it, let's enable it in pre-merge"
<mupuf>
if that leads to improving the execution time of our testing in order to be able to test more in pre-merge, that's success for everyone!
<bentiss>
I might be stubborn, but I still don't get why post-merge tests are considered good. We have a test that prevents a bug, so let's enable it (unless it's flaky)
<mupuf>
machine time, there are no other reasons
<bentiss>
then fix the tests :)
<mupuf>
improve the tests, and increase the size of farms, yep!
<bentiss>
increasing the size of farms won't actually give you much, unless you can use matrix testing on multiple identical machines
<bentiss>
because you want marge to do testing on just one MR at a time
<mupuf>
yes, that's what I meant, increasing the amount of duplicated machines