#dri-devel on 2023-03-12 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:09 <alyssa> android-virgl-llvmpipe is way too slow

00:09 <alyssa> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37907988

00:09 <alyssa> Marge failed the pipeline because that was 5 seconds over its 20 minute budget

00:10 <alyssa> probably the time was spent waiting for a runner but still

00:22 Zopolis4 has quit []

00:38 smilessh has joined #dri-devel

00:47 <DavidHeidelberg[m]> alyssa: daniels limited it to the 20 minutes recently, feel free to bump it to the 25 in the MR

00:47 <DavidHeidelberg[m]> I would do it, bit I'm on the phone

00:51 <alyssa> I feel like we need a better approach to CI

00:52 <alyssa> because this isn't scaling and not for lack of herculean efforts trying

00:56 <alyssa> Maybe driver teams queueing up all their MRs for the week (that they've certified are good) and assigning as a unit to Marge

00:56 <alyssa> so that the Marge queue is freed up for big common code changes that really do need the extra CI checks

00:57 <alyssa> doesn't fix reliability but it saves resources, and the fewer pipelines you assign to Marge the fewer fails you'll see statistically

01:01 wind has joined #dri-devel

01:01 <alyssa> if that queueing is happening at a team level (and not an individual level), then you sort out the rebase conflicts "offline"

01:01 <alyssa> and only a single person on the team has to actually interact with the upstream CI (rather than the teensy subset that's needed for the team to develop amongst themselves)

01:02 <alyssa> not to throw one member of the team under the bus, that role can (and probably should) rotate

01:03 <alyssa> but, like, if I only had to interact with Marge once a fortnight, and Lina interacted with Marge once a fortnight, and asahi/mesa became a canonical integration tree that got synced upstream every week... that would eliminate a lot of the emotional burden, I think

01:03 <alyssa> the associated problem with downstream canonical integration is that it can move review downstream too, which we don't want

01:03 <alyssa> ideally review continues to happen out on the open in mesa/mesa, just doesn't get merged to upstream immediately

01:04 <alyssa> one kludge way to do this is to use the "Needs merge" milestone for stuff that's nominally ready but needs to be queued up with other work from that team before hitting marge

01:04 <alyssa> and then having some out of band way to sync that downstream for integration for the week

01:04 <alyssa> a better way might be having branches on mesa/mesa for each driver team

01:05 <alyssa> mesa/mesa:asahi, mesa/mesa:panfrost, whatever

01:05 <alyssa> such that MRs that are for a single driver (as opposed to something for common code) gets the MR targeted against its team branch instead of mesa's main branch directly

01:06 <alyssa> so then the weekly Marge exercise is just MR'ing mesa/mesa:team against mesa/mesa:main

01:06 <alyssa> this gives people the satisfaction of seeing their code merged in gitlab even if it's not upstream yet

01:08 windleaves has quit [Ping timeout: 480 seconds]

01:12 columbarius has joined #dri-devel

01:14 co1umbarius has quit [Ping timeout: 480 seconds]

01:43 <DavidHeidelberg[m]> alyssa: there was dying runners this week, valve and our farm, this week the situation SHOULD get better (due to retries), sadly bad luck occured and it got worse :D

01:44 <airlied> turned off issues for public again

01:44 <airlied> since spammer is live in action

01:44 <alyssa> DavidHeidelberg[m]: this week was certainly worse than usual, but I've been talking about these problems for years..

01:44 <alyssa> airlied: WOULD YOU LIKE TO WATCH FREE MOVIES ABOUT MESA?!

01:45 <DavidHeidelberg[m]> I would like. Where? :D

01:45 <alyssa> DavidHeidelberg[m]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/

01:45 <alyssa> just start, er, watching the MRs :p

01:45 <alyssa> it's FREE!

01:45 <alyssa> aco MR hung on debian-mingw32-x86_64

01:45 <alyssa> why are we even testing mingw in CI wtf?

01:45 <alyssa> can we delete that job?

01:46 <alyssa> better question why do we support building in mingw at all?

01:46 <alyssa> what possible benefit is that brining us

01:46 <alyssa> bringing

01:46 <alyssa> dcbaker:

01:46 <DavidHeidelberg[m]> + gitlab instability problems, yeah

01:47 <alyssa> DavidHeidelberg[m]: I mean. Yes. Those are all true in isolation

01:47 <DavidHeidelberg[m]> Nah, it's not FREE. We're here because we are not free.

01:47 <alyssa> But if you look at the bigger picture... the model we have is fundamentally unsustainable

01:48 <alyssa> pre-merge CI on every single configuration for every single merge request, serialized across all MRs to an upstream project with dozens of full time people, where the test coverage is fundamentally extremely intensive for each piece of hw/sw

01:48 <DavidHeidelberg[m]> I agree with you.

01:48 <alyssa> No matter how amazingly competent the people running it are... that model is fundamentally unsustainable

01:48 <kisak> >_> I'm here because it's libre ... but I don't do much around here except complain.

01:49 <DavidHeidelberg[m]> That's why I pushed for `retry: 1`. To handle these flakes which .. sometimes... happen

01:49 <alyssa> (and while I'm not going to name names, other long time Mesa devs have also express very unhappy about this)

01:49 <alyssa> expressed being

01:49 heat has quit [Read error: Connection reset by peer]

01:49 <alyssa> yes, the retry helps. but that doesn't fix the deeper sustainability issue

01:49 <DavidHeidelberg[m]> If we have no long term failures (farm, runners, bad job), it can work I believe.

01:49 heat has joined #dri-devel

01:50 <alyssa> I don't think so

01:50 <alyssa> The cost to run CI that way is quadratic over time

01:50 * DavidHeidelberg[m] tends to be sometimes optimistic

01:50 <alyssa> The more Mesa grows, the more commits are coming in AND the more coverage every commit becomes subject to

01:50 <alyssa> so unless our budget is growing quadratically (hint: it's not), at some point we exhaust our very finite resources and the whole thing collapses

01:51 <DavidHeidelberg[m]> I shouls get more pesimistic. But then I say CI is useless waste of power.

01:51 <alyssa> Well, if it was *useless*, I'd say we should delete it all

01:51 <alyssa> It isn't useless. Good targetted CI catches real bugs.

01:51 <alyssa> The problem is that bad CI is worse than no CI.

01:52 <DavidHeidelberg[m]> Well, we also try optimize over time more and more, so quadratic is maybe too exsaturated

01:52 <DavidHeidelberg[m]> I agree.

01:52 <alyssa> So... under the current approach to CI, the growth of Mesa means a quadratic increase in CI cost / power consumption / wall clock time / stability / whatever metric we're looking at.

01:52 <DavidHeidelberg[m]> Sure.

01:53 <alyssa> Either we reduce how much we run in CI (by limiting coverage to what *really* matters, not what we think might matter, and having to budget how much we put in the pipeline)

01:54 <alyssa> or we reduce how often we run CI (e.g. by requiring that vendor MRs be batched up and start at staging trees, to try to keep the # of Marge MRs per week roughly constant even as the # of commits increases)

01:54 <alyssa> or both

01:54 <alyssa> So far both approaches have been wildly unpopular

01:54 <kisak> alyssa: it should be noted that has already happened in the past and adjustments were made to get back in budget

01:55 <DavidHeidelberg[m]> I would say batched stuff could be problem when looking at (often) flakes. Bisectability will decrease

01:55 <alyssa> kisak: Yes, I'm aware. Every once in a while we have a panic, get our act together for a bit, and then slowly the frog boils back

01:55 <DavidHeidelberg[m]> :D

01:55 <alyssa> which gets back to my original point: what we're doing is fundamentally unsustainable

01:55 <alyssa> technically, ecologically, emotionally

01:56 <DavidHeidelberg[m]> Just wanted to being the car industry, but you summed up nicely

01:56 <alyssa> whether your axis is fd.o's budget in $'s, or kgs of co2 the ci farms are emitting, or # of mesa developers and fd.o sysadmins we burn out

01:56 <alyssa> this is unsustainable

01:57 <DavidHeidelberg[m]> BTW. Be prepared to drop rust with lto to C89 with -G0 equivalent :D

01:57 <DavidHeidelberg[m]> *-g0

01:57 <alyssa> Hum?

01:57 <alyssa> IDK what -g0 is

01:57 <DavidHeidelberg[m]> Hmm. Too tired -O0

01:58 <DavidHeidelberg[m]> I've been playing too much with -ggdb and -g recently...

01:58 <alyssa> mm

01:58 <alyssa> I can start with the containers

01:58 <alyssa> Why do we need to test 3 different Linux distributions on x86?

01:59 <alyssa> but only for build testing?

01:59 <DavidHeidelberg[m]> Alpine has musl libx; Debian we love; someone cares about Fedora (RedHat? ;) )

01:59 <alyssa> Is building against musl libc in premerge CI bringing us any value?

01:59 <alyssa> Keeping in mind it's a build test and not a runtime test

02:00 <alyssa> Could we drop that job and instead do an alpine build once a release, and if there are build regressions, fix them then right before the branchpoint?

02:00 <alyssa> Likewise for Fedora?

02:00 <DavidHeidelberg[m]> We don't break stuff for musl based distros. That's kinda nice behavior :) + other sets of warning from different set libraries

02:01 <alyssa> having something in premerge CI is very different from not breaking

02:01 <DavidHeidelberg[m]> The once per release - could be solution.

02:01 <alyssa> The coverage can still exist but it doesn't need to run more than once a release or once a month or whatever

02:01 <alyssa> having something in premerge CI is very different from not breaking it

02:01 <alyssa> gitlab ci is not a fundamentally bad place to have "build all the things" but premerge ci is

02:01 <DavidHeidelberg[m]> When I added it (alpine) we had usually lot runners available. Situation changed a bit recently

02:01 <alyssa> I'm just working my way down the list

02:02 <alyssa> But my point is: we know that there is a cost to the Alpine and Fedora jobs (or Debian, I don't care which distro we pick for CI, it doesn't matter to me)

02:02 <alyssa> Is there a benefit to having them in premerge? Are they catching real bugs?

02:02 devinaut has joined #dri-devel

02:02 <alyssa> I can't think of any they've caught (that debian/x86 didn't also catch)

02:02 <DavidHeidelberg[m]> I think time to time some ifdefs for alpine

02:03 <alyssa> Sure. That's easy to fix up once a release or once a month. Doesn't need to be once a commit.

02:03 <alyssa> Especially given it's build testing only.

02:03 <DavidHeidelberg[m]> I agreed already on that one.

02:03 <alyssa> All the same considerations go for "weird" architectures, namely anything that doesn't feed into a software or hardware test after

02:03 <alyssa> s390x, ppc64el builds

02:04 <DavidHeidelberg[m]> The trick is these jobs are heavily cached and non-blocking CI

02:05 <DavidHeidelberg[m]> Except load of container and linking, it usually doesn't do much job

02:05 <alyssa> Do we need build tests for all the combinatorics of {gcc, clang} x {debug, release}? Would halving that to gcc+debug and clang+release (or vice versa) get most of the bang for the buck?

02:05 <alyssa> Why are their clover jobs still? we're committed to not supporting clover anymore anyway, amber it.

02:05 <alyssa> Why can't the rusticl job be part of the regular gcc or clang build?

02:06 <alyssa> Why is Vulkan special?

02:06 <alyssa> Why is fedora-release getting us value over the existing release build testing?

02:06 <DavidHeidelberg[m]> I have feeling you have cancel mood :D

02:06 <alyssa> Why are we premerge testing mingw?

02:06 <alyssa> david i haven't tweeted since 2022 i am itching to cancel ;-p

02:06 <DavidHeidelberg[m]> With you approach we end with 3 jobs :D

02:07 <alyssa> Yes. And if those 3 jobs are reliable and we can count on them, it'd be a big win overall.

02:07 <DavidHeidelberg[m]> and I'm not saying it won't solve the CI problem :D

02:07 <alyssa> A lot easier to manage for sysadmins, a lot better for developers, no real value lost

02:08 <alyssa> Checksum-based trace testing is a HUGE bug bear of mine, because it is fundamentally unsound and contrary to the GL spec

02:08 <alyssa> which means it is a constant source of trouble (=> cost to all of developers sysadmins and the machines/infra itself from wasted pipelines going to update checksums)

02:08 <alyssa> because it fundamentally cannot work according to the GL spec

02:09 <DavidHeidelberg[m]> Agree. But there is tradeoff in complexity. :/

02:09 <alyssa> and for all the resources it's wasted, it has yet to catch a single actual bug in my memory

02:09 <alyssa> (resources both human and machine)

02:09 <DavidHeidelberg[m]> I had to drop many cool traces because onepixel reproducibility.

02:10 <alyssa> we have to run deqp in CI, that's nonnegotiable

02:10 <alyssa> which means every other CI job has to be measured not as "can this catch issues?" but rather "will this in practice catch issues that the deqp coverage will miss?"

02:11 <alyssa> for piglit this answer is "absolutely yes", for trace based testing the answer is "in theory maybe but not for the pain it's worth"

02:12 <alyssa> given you need a human in the loop to review the trace changes, having it in premerge is unacceptable for that reason alone

02:12 <alyssa> but also you don't lose much from running them once a {release,month,whatever} if you're an interested party in a particular driver

02:15 <Lynne> radv you lying bastard!

02:16 <Lynne> I give vkGetMemoryHostPointerPropertiesEXT pointer 0x7f0fe1b94000 and it happily returns memoryTypeBits 0x20

02:16 <bnieuwenhuizen> Lynne: ?

02:16 <Lynne> but when I try to actually host map the memory address, with a length of 2076672, it returns VK_ERROR_INVALID_EXTERNAL_HANDLE

02:16 <Lynne> both the address and length are mod the pagesize

02:16 <bnieuwenhuizen> how did you get the memory?

02:17 <Lynne> just regular system ram, aligned malloc

02:17 <bnieuwenhuizen> hmm, that should succeed

02:17 <Lynne> it apparently also fails on ANV, but that's not enough to get attention in here :)

02:17 <alyssa> performance trace-based testing, you're up next

02:18 <alyssa> I don't know how this is supposed to work and I don't want to know

02:18 <bnieuwenhuizen> what do you mean with host map btw?

02:18 <alyssa> is that not actually CI and just (ab)using CI to fed grafana?

02:19 <bnieuwenhuizen> alyssa: trace based testing is just using traces as testcases. Maybe you're wanting perf numbers with that?

02:19 <alyssa> bnieuwenhuizen: which msg is this replying to?

02:19 <alyssa> my issue with the trace-based tests or me not understanding why there are -performance versions

02:19 <HdkR> Lynne: GPU page size or CPU page size? :D

02:20 <bnieuwenhuizen> alyssa: the grafana thingy

02:20 <alyssa> oh

02:20 <bnieuwenhuizen> HdkR: should be the same on AMD

02:20 <alyssa> I know freedreno has a perf dashboard, I guess they're abusing CI to feed it with data

02:20 <Lynne> yup, minImportedHostPointerAlignment = 4096

02:20 <HdkR> bnieuwenhuizen: ooo fancy

02:21 <bnieuwenhuizen> of course major benefits if you use hugepage or similar (GPU really likes 64k pages)

02:21 <alyssa> performance-rules has allow_failure so I guess I don't care for the purpose of this rant

02:21 <Lynne> by host-map, I mean I map the host memory as a VkDeviceMemory and use it to back a VkBuffer

02:21 <bnieuwenhuizen> Lynne: anything in dmesg?

02:21 <Lynne> no, empty

02:21 <alyssa> so... software-renderer you're up... why is there an llvmpipe-piglit-clover job when we're not supporting clover and there's ALSO a rusticl job? why trace based testing (see above issue)?

02:25 <alyssa> layered-backends: do I even want to ask about the spirv2dxil job, how is that even in scope for upstream testing? virgl traces has the usual trace problems and has caused problems for me personally with correct NIR changes, what value is that providing to upstream Mesa to justify its inclusion in premerge? similar for zink traces?

02:25 <bnieuwenhuizen> Lynne: maybe try some errno dumping? https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commit/eb12053b213d91b2970d3043a8bb4c6540fb9554

02:26 <Lynne> sure, give me a sec

02:27 <alyssa> quite frankly, with my upstream Mesa hat on, I am NAK'ing checksum-based trace testing in pre-merge CI for any driver.

02:27 <alyssa> I can do that now apparently XP

02:27 <bnieuwenhuizen> also IIRC there stupidly there is a kernel build option that needs to be enabled to make it work

02:27 <bnieuwenhuizen> thought it was kinda default ish, but at least ChromeOS at some point managed to disable it

02:27 <Lynne> "host ptr import fail: 1 : Operation not permitted"

02:28 <bnieuwenhuizen> thx, let me check

02:30 <Lynne> err, apparently it's not ordinary ram

02:31 <Lynne> it's actually device memory

02:31 <bnieuwenhuizen> oh that wouldn't work

02:31 <Lynne> yeah, I thought so, my fault, thanks

02:32 <alyssa> The bottom line is "some company paid to have this in Mesa CI" is fundamentally an UNACCEPTABLE reason to put something in upstream CI

02:32 <alyssa> because it's a cost that EVERYONE pays

02:32 <alyssa> for every item in CI

02:32 devinaut has quit []

02:33 <alyssa> For any job in CI the cost the community pays to have it needs to be measured against the benefit the community gains from it

02:33 <alyssa> and if the cost exceeds the benefit -- as it does in the case of a number of the jobs I mentioned above -- it does not deserve to be in premerge

02:33 <alyssa> even if there's a billion dollar corporate sponsor for the CI coverage

02:34 <Lynne> I wouldn't be able to detect if an address is device memory, by any chance, right?

02:34 <bnieuwenhuizen> failure to import? :P

02:34 <bnieuwenhuizen> but no, not really

02:34 <alyssa> it is a simple cost-benefit analysis, and if the private bigcorp reaps the benefit while the commons pays the cost.. that's unacceptable

02:34 <Lynne> or could vkGetMemoryHostPointerPropertiesEXT be changed to return !VK_SUCCESS?

02:35 <bnieuwenhuizen> we could try to do an import there

02:35 <bnieuwenhuizen> I think the other weird case is mmapped files, I don't think those are supported either

02:37 <Lynne> would be nice if the function to check if a pointer can be imported actually checks, it's where my fallback is hooked up to

02:40 ice9 has joined #dri-devel

02:45 ice9 has quit [Read error: Connection reset by peer]

02:57 <alyssa> what a surprise, !20553 is currently in the long pile of running traces

02:57 <alyssa> would be merged if not for the traces

02:57 <alyssa> ^already

03:08 JohnnyonFlame has quit [Ping timeout: 480 seconds]

03:20 <alyssa> Aahahahaha and the pipeline failed because of a trace job flaking

03:21 <alyssa> how many times do i need to say that trace jobs cannot be in premerge testing

03:24 <alyssa> Trace-based testing. does not. belong in upstream premerge

03:24 <alyssa> Every bit of premerge CI coverage is a cost that EVERYONE pays

03:25 <alyssa> and unless there's that benefit in turn, it is a burden on EVERYONE and needs to go

03:25 <alyssa> and given that the value proposition of checksum based trace testing is essentially nil, I see no reason to keep it.

03:26 <alyssa> do what you want post-merge but this is an unacceptable burden for the community to bear

03:30 lemonzest has quit [Quit: WeeChat 3.6]

03:35 Zopolis4 has joined #dri-devel

03:39 lemonzest has joined #dri-devel

03:45 <lina> alyssa: Another advantage of having asahi/main as integration point is we could probably add our own custom CI without having to worry about its stability being an issue for other teams ^^

03:45 <lina> (Like once we actually have runners)

03:46 <lina> Or some other branch specific to us

03:50 <lina> Like if I hav some runners in my closet it's probably good enough for us but I don't want to be responsible for breaking CI for everyone if my internet goes down ^^;;

03:53 <alyssa> lina: Responsible. I appreciate that :)

03:53 * HdkR sweats in pile of ARM boards

04:05 <Lynne> if you've got too much of them, you can turn them into bricks of fabulous doorstops very easily if they're called "rockchip" and carry the number 3399, just call dd

04:31 orbea has quit [Quit: You defeated orbea! 2383232 XP gained!]

04:33 orbea has joined #dri-devel

04:36 digetx has quit [Ping timeout: 480 seconds]

05:01 digetx has joined #dri-devel

05:06 agd5f_ has joined #dri-devel

05:10 <lina> Actually, would it make sense to gate pre-merge CI on the tags?

05:11 <lina> Like only run CI specific to the drivers affected

05:11 <lina> And then full CI can run periodically on the main branch

05:12 agd5f has quit [Ping timeout: 480 seconds]

05:14 Company has quit [Quit: Leaving]

05:15 <alyssa> lina: Pre-merge CI is gated on the files updated

05:16 <alyssa> mesa/.gitlab-ci/test-source-dep.yml controls that

05:16 <alyssa> tags aren't used since they get stale easily

05:19 alyssa has quit [Quit: leaving]

05:24 bgs has joined #dri-devel

05:36 <daniels> DavidHeidelberg[m]: I limited it to 20min as it usually completes in 8min; 25min is in no way normal or good

05:42 agd5f has joined #dri-devel

05:47 agd5f_ has quit [Ping timeout: 480 seconds]

05:48 heat has quit [Ping timeout: 480 seconds]

06:03 krushia has quit [Quit: Konversation terminated!]

06:19 RSpliet has quit [Quit: Bye bye man, bye bye]

06:19 RSpliet has joined #dri-devel

06:25 nishiyama has joined #dri-devel

06:37 nishiyama has quit []

06:57 agd5f_ has joined #dri-devel

07:04 agd5f has quit [Ping timeout: 480 seconds]

07:07 Duke`` has joined #dri-devel

07:21 chipxxx has quit [Remote host closed the connection]

07:23 chipxxx has joined #dri-devel

07:30 chipxxx has quit []

07:31 chipxxx has joined #dri-devel

07:45 robobub has joined #dri-devel

07:55 Zopolis4 has quit []

07:58 sghuge has quit [Remote host closed the connection]

07:58 sghuge has joined #dri-devel

08:07 sghuge has quit [Ping timeout: 480 seconds]

08:36 danvet has joined #dri-devel

08:37 fab has joined #dri-devel

08:56 YuGiOhJCJ has joined #dri-devel

09:02 agd5f has joined #dri-devel

09:08 agd5f_ has quit [Ping timeout: 480 seconds]

09:45 agd5f_ has joined #dri-devel

09:46 JohnnyonFlame has joined #dri-devel

09:49 hansg has joined #dri-devel

09:51 agd5f has quit [Ping timeout: 480 seconds]

09:57 ManMower has quit [Ping timeout: 480 seconds]

10:04 kzd has quit [Quit: kzd]

10:12 Zopolis4 has joined #dri-devel

10:17 <Newbyte> This page links to the Mesa issue tracker as the place to report bugs: https://docs.mesa3d.org/bugs.html

10:17 <Newbyte> But the link is a 404. What gives?

10:20 <ccr> due to spammer issues the issue tracking is currently set to project members only (afaik)

10:20 <Newbyte> thanks

10:20 pcercuei has joined #dri-devel

10:29 <psykose> it makes them invisible to non-members too :D

10:36 Haaninjo has joined #dri-devel

10:41 gouchi has joined #dri-devel

10:46 evadot has quit [Quit: quit]

10:49 evadot has joined #dri-devel

10:50 <ccr> unfortunately, but I'm sure someone is working on a better solution.

11:03 bluetail9 has joined #dri-devel

11:03 bluetail9 has quit []

11:04 bluetail has joined #dri-devel

11:11 fab has quit [Remote host closed the connection]

11:17 fab has joined #dri-devel

11:20 agd5f has joined #dri-devel

11:25 fab has quit [Quit: fab]

11:26 agd5f_ has quit [Ping timeout: 480 seconds]

11:30 fab has joined #dri-devel

11:40 fab has quit [Quit: fab]

11:41 fab has joined #dri-devel

11:43 <DavidHeidelberg[m]> daniels: I agree, my point was if something wen't wrong and it takes around 21-23 minutes it's still better compromise to have 25 instead of 1h before if that lead to job finishing

11:45 <DavidHeidelberg[m]> daniels: as I'm looking into the Daily and the failure rate, maybe we should disable it for now + I'm thinking about moving alpine and fedora into nightly runs, since figuring out build failure isn't that hard and it happens only rarely

11:57 agd5f_ has joined #dri-devel

12:00 kts has joined #dri-devel

12:03 agd5f has quit [Ping timeout: 480 seconds]

12:10 Danct12 has joined #dri-devel

12:28 Haaninjo has quit [Read error: Connection reset by peer]

12:32 Haaninjo has joined #dri-devel

12:36 kts has quit [Quit: Konversation terminated!]

12:55 jernej_ has quit [Remote host closed the connection]

12:55 jernej has joined #dri-devel

12:57 MrCooper has quit [Remote host closed the connection]

12:57 jernej has quit []

12:57 jernej has joined #dri-devel

12:57 MrCooper has joined #dri-devel

13:12 kode54 has quit [Quit: Ping timeout (120 seconds)]

13:13 sjfricke[m] has quit []

13:15 kode54 has joined #dri-devel

13:15 Leopold__ has quit [Remote host closed the connection]

13:17 Leopold has joined #dri-devel

13:27 jernej has quit [Remote host closed the connection]

13:29 jernej has joined #dri-devel

13:29 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

13:35 ManMower has joined #dri-devel

13:37 zehortigoza has quit [Remote host closed the connection]

13:43 Company has joined #dri-devel

13:51 jdavies has joined #dri-devel

13:52 jdavies is now known as Guest7468

13:52 bmodem has joined #dri-devel

13:54 nekit has quit [Quit: The Lounge - https://thelounge.chat]

13:56 nekit has joined #dri-devel

13:59 Guest7468 has quit [Ping timeout: 480 seconds]

14:09 srslypascal is now known as Guest7472

14:09 srslypascal has joined #dri-devel

14:11 Guest7472 has quit [Ping timeout: 480 seconds]

14:23 bmodem has quit [Ping timeout: 480 seconds]

14:28 kts has joined #dri-devel

14:37 <jenatali> alyssa: re mingw, some downstream folks apparently wanted that - not us fwiw

14:38 <jenatali> Re spirv2dxil, it's a compiler only job. Its main purpose was to stress the DXIL backend when fed Vulkan SPIR-V, but now that Dozen is more mature we can probably retire it

14:41 <daniels> DavidHeidelberg[m]: yeah but honestly it’s just hiding actual root causes and making it harder to solve the actual problem

14:42 <DavidHeidelberg[m]> what I was thinking is moving our daily treshold for reporting jobs to 15 and 30 minutes, instead of 30 and 60 minutes (enqueued etc.)

14:43 <DavidHeidelberg[m]> so we would see it. I has to agree with Alyssa that it's so annoying from developer POV and, how we care about it doesn't really matter, but they should have passing marge pipelines no matter how we reach it

14:50 junaid has joined #dri-devel

15:01 Zopolis4 has quit []

15:41 srslypascal is now known as Guest7479

15:41 srslypascal has joined #dri-devel

15:42 <daniels> sure, but at some stage it's unusable anyway - we could accept that jobs take 3-4x the runtime, which probably means making the marge timeout 3h, and at that point we can only merge 8 MRs per day

15:48 Guest7479 has quit [Ping timeout: 480 seconds]

16:12 hansg has quit [Quit: Leaving]

16:29 kts has quit [Quit: Konversation terminated!]

16:33 junaid has quit [Ping timeout: 480 seconds]

16:41 pcercuei has quit [Quit: bbl]

16:42 junaid has joined #dri-devel

16:43 <cheako> Do ppl know that issues were removed from gitlab/mesa?

16:44 <hch12907> you need an account to access them, I think

16:44 <cheako> good catch, but no I'm logged in.

16:45 <ccr> only available project members, e.g. people with certain access level. due to recent spam issues.

16:45 <ccr> available to

16:53 <cheako> I'm trying to provide more information, I can wait/when should I try again?

16:58 <daniels> cheako: opened them up now

17:02 <cheako> :)

17:25 fab_ has joined #dri-devel

17:25 fab_ is now known as Guest7487

17:31 fab has quit [Ping timeout: 480 seconds]

17:35 smilessh has quit [Ping timeout: 480 seconds]

17:40 alyssa has joined #dri-devel

17:40 <alyssa> jenatali: yep, I am aware that the mingw job wasn't you

17:40 <alyssa> whether I'm happy about it or not, the windows jobs have earned their place :p

17:41 <alyssa> (the VS2019 ones)

17:42 <alyssa> which is why I was wondering what benefit it *was* providing

17:43 * jenatali shrugs

17:46 <daniels> the vmware team do most of their work on top of mingw

17:47 <alyssa> OK, I don't think I recalled that

17:49 <alyssa> So then the question is -- what benefit is there to the job (i.e. what issues will it catch that the combination of linux gcc + windows vs2019 will not catch), what cost is it to premerge ci, and how much of that benefit could be recovered with some form of post-merge coverage (likely almost all of it, because build failures are easy to deal with for relevant stakeholders... given that there is Windows

17:49 <alyssa> CI it should be a rare event to see a mingw only failure)

17:50 <jenatali> I'd be inclined to agree, post-merge seems more appropriate

17:50 <alyssa> benefit measured as P(legitimate fail in mingw | windows vs2019 passes AND gcc linux passes)

17:51 <daniels> I'm not sure that post-merge has any more value than just not having it ever, because all that happens is that you get used to seeing that stuff has failed and ignoring it

17:51 <alyssa> The question is who is "you"

17:51 <daniels> either way, I've disabled the job for now as it's broken in some kind of exotic way

17:51 <alyssa> If the "you" is "Alyssa", then that seems... fine? I don't do anything that's liable to change mingw in interesting ways and from an upstream perspective mingw is not something we're committed to supporting, just committed not to kicking from the tree.

17:51 <jenatali> Right, if there's no stakeholders, then nobody will ever fix it, and post-merge is the same as never running it

17:52 <alyssa> If the "you" is "an interested mingw stakeholder", say VMware, then if the coverage is getting them benefit, they will monitor the post-merge and act appropriately

17:52 <anholt_> post-merge is, effectively, me. I've got plenty of chasing CI already, no thanks.

17:52 <anholt_> (in the form of the nightly runs)

17:52 <alyssa> and if that mingw stakeholder doesn't care then... if there's no benefit in premerge or postmerge then there's no benefit to having the coverage full stop and it should just be removed

17:52 <jenatali> daniels: Want to ping lygstate for the mingw fails? I think he cares

17:53 <daniels> jenatali: oh, thanks for the pointer

17:53 <jenatali> (I'm still mostly on vacation, just happened to see a relevant topic for me in the one minute of scroll back I read)

17:54 <zmike> jenatali: go vacation harder!

17:54 <alyssa> I guess that's my point. If there is a stakeholder who cares, then they will monitor the nightly run and act accordingly.

17:54 <alyssa> if there's no stakeholder who cares, there's no value in the job running at all, and.. that's fine?

17:54 <jenatali> I'm sitting in a hotel lobby waiting to go to the airport lol. I've vacationed hard enough

17:54 <anholt_> alyssa: there is no mechanism for nightly alerting.

17:54 <zmike> jenatali: oh okay, proceed

17:54 <anholt_> it would be great if there was

17:54 <alyssa> anholt_: ugh. I see.

17:55 <alyssa> to be clear "anholt_ monitors all the nightly mingw jobs" is not the proposal and NAK to that because that's a terrible idea

17:56 <APic> Uh huh.

17:56 <anholt_> +1 to deleting clover job. It was introduced when rusticl was first landing and "make sure we don't break clover" seemed more reasonable. On the other hand, I don't think I've seen it flake.

17:58 <HdkR> How soon until it is +1 to deleting Clover? :)

17:59 <anholt_> I'm +1 to deleting clover right now.

17:59 <alyssa> same here

17:59 <daniels> srs

17:59 <DavidHeidelberg[m]> 🎊

17:59 <anholt_> but the rusticl dev has been hesitant until feature parity

17:59 <anholt_> (which, afaik, is close)

17:59 <alyssa> if "clover is deleted" is the only thing that comes out of this burnout fuel hell weekend

17:59 <alyssa> still a net positive

17:59 <alyssa> :p

18:00 <DavidHeidelberg[m]> Can someone update https://www.mesa3d.org/ Current release: 22.3.7 . Anyway Clover will stay in 23.0, which is not that far apart anyway

18:01 <HdkR> mesamatrix doesn't track clover versus rusticl features, I'm sad :P

18:02 <daniels> alyssa: we also now have shared runners which aren't being DoSed by some impressively resourceful crypto miners

18:03 <alyssa> shitcoin really does ruin everything it touches

18:14 <DavidHeidelberg[m]> before the fate of Clover is fulfulled, can we agree on decreasing the load by dropping the clover CI jobs? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19385#note_1818805 If yes, I'll prepare MR so we can do small amendment to the CI and remove three jobs

18:14 <daniels> DavidHeidelberg[m]: sure, sounds good

18:15 <alyssa> DavidHeidelberg[m]: ++

18:21 <DavidHeidelberg[m]> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21865

18:25 alyssa has left #dri-devel [#dri-devel]

18:25 alyssa has joined #dri-devel

18:25 alyssa has left #dri-devel [#dri-devel]

18:31 <DavidHeidelberg[m]> haven't thought about it yet HOW to do it, but when we putting farm down, we should omit running CI. Only on bringup.

18:38 <jenatali> Yeah, would really be nice if there was a way to avoid running hardware CI jobs for unrelated config changes, like bumping a Windows container image...

19:09 alanc has quit [Remote host closed the connection]

19:09 alanc has joined #dri-devel

19:13 MajorBiscuit has joined #dri-devel

19:13 heat has joined #dri-devel

19:14 kzd has joined #dri-devel

19:18 gio has joined #dri-devel

19:19 <airlied> anholt_: the clover job preexisted rusticl

19:20 <airlied> by a long time

19:20 <airlied> and it has csught a some llvmpipe regressions

19:21 <airlied> now rusticl will eventually catch them, just not sure it does yet

19:23 MajorBiscuit has quit [Quit: WeeChat 3.6]

19:25 <anholt_> airlied: yeah, misread a commit. you're right.

19:27 anholt_ has quit [Quit: Leaving]

19:28 <eric_engestrom> DavidHeidelberg[m]: https://gitlab.freedesktop.org/mesa/mesa3d.org/-/merge_requests/163 merged, website will be updated in a couple of minutes

19:30 <DavidHeidelberg[m]> Thank you :)

19:31 <APic> ☺

19:49 tobiasjakobi has joined #dri-devel

19:58 tobiasjakobi has quit [Ping timeout: 480 seconds]

20:07 Zopolis4 has joined #dri-devel

20:17 lemonzest has quit [Quit: WeeChat 3.6]

20:19 DPA has quit [Ping timeout: 480 seconds]

20:30 Leopold has quit []

20:33 Leopold has joined #dri-devel

20:37 konstantin_ has joined #dri-devel

20:39 khfeng_ has joined #dri-devel

20:41 konstantin has quit [Ping timeout: 480 seconds]

20:42 khfeng has quit [Ping timeout: 480 seconds]

20:43 DPA has joined #dri-devel

20:48 YuGiOhJCJ has joined #dri-devel

20:50 junaid has quit [Ping timeout: 480 seconds]

21:35 bgs has quit [Remote host closed the connection]

21:36 pcercuei has joined #dri-devel

21:36 Leopold has quit [Ping timeout: 480 seconds]

21:36 Duke`` has quit [Ping timeout: 480 seconds]

21:37 camus has quit [Ping timeout: 480 seconds]

21:45 gouchi has quit [Remote host closed the connection]

21:49 Leopold_ has joined #dri-devel

22:07 DPA has quit [Ping timeout: 480 seconds]

22:09 tarceri has quit [Ping timeout: 480 seconds]

22:16 Zopolis4 has quit []

22:29 Guest7487 has quit []

22:41 tarceri has joined #dri-devel

22:54 a-865 has quit [Quit: ChatZilla 0.15 [SeaMonkey 2.53.15/20230108172623]]

22:56 a-865 has joined #dri-devel

23:11 DPA has joined #dri-devel

23:15 mattst88 has quit [Ping timeout: 480 seconds]

23:18 mattst88 has joined #dri-devel

23:20 pcercuei has quit [Quit: dodo]

23:26 tarceri has quit [Ping timeout: 480 seconds]

23:37 Danct12 is now known as Guest7507

23:37 tarceri has joined #dri-devel

23:42 Haaninjo has quit [Quit: Ex-Chat]