ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<alyssa> android-virgl-llvmpipe is way too slow
<alyssa> Marge failed the pipeline because that was 5 seconds over its 20 minute budget
<alyssa> probably the time was spent waiting for a runner but still
Zopolis4 has quit []
smilessh has joined #dri-devel
<DavidHeidelberg[m]> alyssa: daniels limited it to 20 minutes recently, feel free to bump it to 25 in the MR
<DavidHeidelberg[m]> I would do it, but I'm on the phone
<alyssa> I feel like we need a better approach to CI
<alyssa> because this isn't scaling and not for lack of herculean efforts trying
<alyssa> Maybe driver teams queueing up all their MRs for the week (that they've certified are good) and assigning as a unit to Marge
<alyssa> so that the Marge queue is freed up for big common code changes that really do need the extra CI checks
<alyssa> doesn't fix reliability but it saves resources, and the fewer pipelines you assign to Marge the fewer fails you'll see statistically
wind has joined #dri-devel
<alyssa> if that queueing is happening at a team level (and not an individual level), then you sort out the rebase conflicts "offline"
<alyssa> and only a single person on the team has to actually interact with the upstream CI (rather than the teensy subset that's needed for the team to develop amongst themselves)
<alyssa> not to throw one member of the team under the bus, that role can (and probably should) rotate
<alyssa> but, like, if I only had to interact with Marge once a fortnight, and Lina interacted with Marge once a fortnight, and asahi/mesa became a canonical integration tree that got synced upstream every week... that would eliminate a lot of the emotional burden, I think
<alyssa> the associated problem with downstream canonical integration is that it can move review downstream too, which we don't want
<alyssa> ideally review continues to happen out on the open in mesa/mesa, just doesn't get merged to upstream immediately
<alyssa> one kludge way to do this is to use the "Needs merge" milestone for stuff that's nominally ready but needs to be queued up with other work from that team before hitting marge
<alyssa> and then having some out of band way to sync that downstream for integration for the week
<alyssa> a better way might be having branches on mesa/mesa for each driver team
<alyssa> mesa/mesa:asahi, mesa/mesa:panfrost, whatever
<alyssa> such that MRs that are for a single driver (as opposed to something for common code) gets the MR targeted against its team branch instead of mesa's main branch directly
<alyssa> so then the weekly Marge exercise is just MR'ing mesa/mesa:team against mesa/mesa:main
<alyssa> this gives people the satisfaction of seeing their code merged in gitlab even if it's not upstream yet
windleaves has quit [Ping timeout: 480 seconds]
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
<DavidHeidelberg[m]> alyssa: there were dying runners this week, Valve's and our farm; this week the situation SHOULD get better (due to retries), but sadly bad luck occurred and it got worse :D
<airlied> turned off issues for public again
<airlied> since spammer is live in action
<alyssa> DavidHeidelberg[m]: this week was certainly worse than usual, but I've been talking about these problems for years..
<DavidHeidelberg[m]> I would like to see that. Where? :D
<alyssa> just start, er, watching the MRs :p
<alyssa> it's FREE!
<alyssa> aco MR hung on debian-mingw32-x86_64
<alyssa> why are we even testing mingw in CI wtf?
<alyssa> can we delete that job?
<alyssa> better question why do we support building in mingw at all?
<alyssa> what possible benefit is that bringing us
<alyssa> dcbaker:
<DavidHeidelberg[m]> + gitlab instability problems, yeah
<alyssa> DavidHeidelberg[m]: I mean. Yes. Those are all true in isolation
<DavidHeidelberg[m]> Nah, it's not FREE. We're here because we are not free.
<alyssa> But if you look at the bigger picture... the model we have is fundamentally unsustainable
<alyssa> pre-merge CI on every single configuration for every single merge request, serialized across all MRs to an upstream project with dozens of full time people, where the test coverage is fundamentally extremely intensive for each piece of hw/sw
<DavidHeidelberg[m]> I agree with you.
<alyssa> No matter how amazingly competent the people running it are... that model is fundamentally unsustainable
<kisak> >_> I'm here because it's libre ... but I don't do much around here except complain.
<DavidHeidelberg[m]> That's why I pushed for `retry: 1`. To handle these flakes which .. sometimes... happen
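[In GitLab CI terms, that retry setting is a per-job keyword. A minimal hypothetical sketch, not Mesa's actual config; the job name and script are invented:]

```yaml
# Hypothetical GitLab CI job illustrating the retry keyword.
# Bare `retry: 1` retries once on any failure; the expanded form
# below retries only on infrastructure-type failures, so a genuine
# test regression still fails the pipeline on the first run.
example-test-job:
  stage: test
  script:
    - ./run-tests.sh   # invented script name
  retry:
    max: 1
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```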
<alyssa> (and while I'm not going to name names, other long time Mesa devs have also expressed being very unhappy about this)
heat has quit [Read error: Connection reset by peer]
<alyssa> yes, the retry helps. but that doesn't fix the deeper sustainability issue
<DavidHeidelberg[m]> If we have no long term failures (farm, runners, bad job), it can work I believe.
heat has joined #dri-devel
<alyssa> I don't think so
<alyssa> The cost to run CI that way is quadratic over time
* DavidHeidelberg[m] tends to be sometimes optimistic
<alyssa> The more Mesa grows, the more commits are coming in AND the more coverage every commit becomes subject to
<alyssa> so unless our budget is growing quadratically (hint: it's not), at some point we exhaust our very finite resources and the whole thing collapses
<DavidHeidelberg[m]> I should get more pessimistic. But then I'd say CI is a useless waste of power.
<alyssa> Well, if it was *useless*, I'd say we should delete it all
<alyssa> It isn't useless. Good targeted CI catches real bugs.
<alyssa> The problem is that bad CI is worse than no CI.
<DavidHeidelberg[m]> Well, we also try to optimize more and more over time, so quadratic is maybe too exaggerated
<DavidHeidelberg[m]> I agree.
<alyssa> So... under the current approach to CI, the growth of Mesa means a quadratic increase in CI cost / power consumption / wall clock time / stability / whatever metric we're looking at.
<DavidHeidelberg[m]> Sure.
<alyssa> Either we reduce how much we run in CI (by limiting coverage to what *really* matters, not what we think might matter, and having to budget how much we put in the pipeline)
<alyssa> or we reduce how often we run CI (e.g. by requiring that vendor MRs be batched up and start at staging trees, to try to keep the # of Marge MRs per week roughly constant even as the # of commits increases)
<alyssa> or both
<alyssa> So far both approaches have been wildly unpopular
<kisak> alyssa: it should be noted that this has already happened in the past and adjustments were made to get back in budget
<DavidHeidelberg[m]> I would say batched stuff could be a problem when looking at (often) flakes. Bisectability will decrease
<alyssa> kisak: Yes, I'm aware. Every once in a while we have a panic, get our act together for a bit, and then slowly the frog boils back
<DavidHeidelberg[m]> :D
<alyssa> which gets back to my original point: what we're doing is fundamentally unsustainable
<alyssa> technically, ecologically, emotionally
<DavidHeidelberg[m]> Just wanted to bring in the car industry, but you summed it up nicely
<alyssa> whether your axis is fd.o's budget in $'s, or kgs of co2 the ci farms are emitting, or # of mesa developers and fd.o sysadmins we burn out
<alyssa> this is unsustainable
<DavidHeidelberg[m]> BTW. Be prepared to drop rust with lto to C89 with -g0 equivalent :D
<alyssa> Hum?
<alyssa> IDK what -g0 is
<DavidHeidelberg[m]> Hmm. Too tired -O0
<DavidHeidelberg[m]> I've been playing too much with -ggdb and -g recently...
<alyssa> mm
<alyssa> I can start with the containers
<alyssa> Why do we need to test 3 different Linux distributions on x86?
<alyssa> but only for build testing?
<DavidHeidelberg[m]> Alpine has musl libc; Debian we love; someone cares about Fedora (RedHat? ;) )
<alyssa> Is building against musl libc in premerge CI bringing us any value?
<alyssa> Keeping in mind it's a build test and not a runtime test
<alyssa> Could we drop that job and instead do an alpine build once a release, and if there are build regressions, fix them then right before the branchpoint?
<alyssa> Likewise for Fedora?
<DavidHeidelberg[m]> We don't break stuff for musl based distros. That's kinda nice behavior :) + another set of warnings from a different set of libraries
<alyssa> having something in premerge CI is very different from not breaking
<DavidHeidelberg[m]> The once per release - that could be a solution.
<alyssa> The coverage can still exist but it doesn't need to run more than once a release or once a month or whatever
<alyssa> having something in premerge CI is very different from not breaking it
<alyssa> gitlab ci is not a fundamentally bad place to have "build all the things" but premerge ci is
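[One way to express the "keep the coverage, but out of premerge" idea in GitLab CI is to gate the job on scheduled pipelines. A hypothetical sketch; the job name and build commands are invented, not Mesa's real config:]

```yaml
# Hypothetical: the Alpine/musl build still exists in CI, but only
# runs in scheduled (nightly/monthly) pipelines, never pre-merge.
alpine-musl-build:
  stage: build
  script:
    - meson setup _build
    - ninja -C _build
  rules:
    # run only when triggered by a pipeline schedule...
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: on_success
    # ...and never otherwise (e.g. merge request pipelines)
    - when: never
```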
<DavidHeidelberg[m]> When I added it (alpine) we usually had a lot of runners available. The situation changed a bit recently
<alyssa> I'm just working my way down the list
<alyssa> But my point is: we know that there is a cost to the Alpine and Fedora jobs (or Debian, I don't care which distro we pick for CI, it doesn't matter to me)
<alyssa> Is there a benefit to having them in premerge? Are they catching real bugs?
devinaut has joined #dri-devel
<alyssa> I can't think of any they've caught (that debian/x86 didn't also catch)
<DavidHeidelberg[m]> I think from time to time some ifdefs for Alpine
<alyssa> Sure. That's easy to fix up once a release or once a month. Doesn't need to be once a commit.
<alyssa> Especially given it's build testing only.
<DavidHeidelberg[m]> I agreed already on that one.
<alyssa> All the same considerations go for "weird" architectures, namely anything that doesn't feed into a software or hardware test after
<alyssa> s390x, ppc64el builds
<DavidHeidelberg[m]> The trick is these jobs are heavily cached and non-blocking in CI
<DavidHeidelberg[m]> Except for loading the container and linking, it usually doesn't do much work
<alyssa> Do we need build tests for all the combinatorics of {gcc, clang} x {debug, release}? Would halving that to gcc+debug and clang+release (or vice versa) get most of the bang for the buck?
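[That halved matrix could be written with GitLab's `parallel: matrix`. A hypothetical sketch; the job name, variable names, and commands are invented:]

```yaml
# Hypothetical: build only the two "diagonal" compiler/buildtype
# pairs instead of all four {gcc, clang} x {debug, release} combos.
build-combos:
  stage: build
  parallel:
    matrix:
      - CC: gcc
        BUILDTYPE: debug
      - CC: clang
        BUILDTYPE: release
  script:
    - CC=$CC meson setup _build --buildtype=$BUILDTYPE
    - ninja -C _build
```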
<alyssa> Why are there clover jobs still? we're committed to not supporting clover anymore anyway, amber it.
<alyssa> Why can't the rusticl job be part of the regular gcc or clang build?
<alyssa> Why is Vulkan special?
<alyssa> Why is fedora-release getting us value over the existing release build testing?
<DavidHeidelberg[m]> I have a feeling you're in a cancel mood :D
<alyssa> Why are we premerge testing mingw?
<alyssa> david i haven't tweeted since 2022 i am itching to cancel ;-p
<DavidHeidelberg[m]> With your approach we end up with 3 jobs :D
<alyssa> Yes. And if those 3 jobs are reliable and we can count on them, it'd be a big win overall.
<DavidHeidelberg[m]> and I'm not saying it won't solve the CI problem :D
<alyssa> A lot easier to manage for sysadmins, a lot better for developers, no real value lost
<alyssa> Checksum-based trace testing is a HUGE bugbear of mine, because it is fundamentally unsound and contrary to the GL spec
<alyssa> which means it is a constant source of trouble (=> cost to all of developers sysadmins and the machines/infra itself from wasted pipelines going to update checksums)
<alyssa> because it fundamentally cannot work according to the GL spec
<DavidHeidelberg[m]> Agree. But there is tradeoff in complexity. :/
<alyssa> and for all the resources it has wasted, it has yet to catch a single actual bug in my memory
<alyssa> (resources both human and machine)
<DavidHeidelberg[m]> I had to drop many cool traces because of one-pixel reproducibility issues.
<alyssa> we have to run deqp in CI, that's nonnegotiable
<alyssa> which means every other CI job has to be measured not as "can this catch issues?" but rather "will this in practice catch issues that the deqp coverage will miss?"
<alyssa> for piglit this answer is "absolutely yes", for trace based testing the answer is "in theory maybe but not for the pain it's worth"
<alyssa> given you need a human in the loop to review the trace changes, having it in premerge is unacceptable for that reason alone
<alyssa> but also you don't lose much from running them once a {release,month,whatever} if you're an interested party in a particular driver
<Lynne> radv you lying bastard!
<Lynne> I give vkGetMemoryHostPointerPropertiesEXT pointer 0x7f0fe1b94000 and it happily returns memoryTypeBits 0x20
<bnieuwenhuizen> Lynne: ?
<Lynne> but when I try to actually host map the memory address, with a length of 2076672, it returns VK_ERROR_INVALID_EXTERNAL_HANDLE
<Lynne> both the address and length are multiples of the pagesize
<bnieuwenhuizen> how did you get the memory?
<Lynne> just regular system ram, aligned malloc
<bnieuwenhuizen> hmm, that should succeed
<Lynne> it apparently also fails on ANV, but that's not enough to get attention in here :)
<alyssa> performance trace-based testing, you're up next
<alyssa> I don't know how this is supposed to work and I don't want to know
<bnieuwenhuizen> what do you mean with host map btw?
<alyssa> is that not actually CI and just (ab)using CI to feed grafana?
<bnieuwenhuizen> alyssa: trace based testing is just using traces as testcases. Maybe you're wanting perf numbers with that?
<alyssa> bnieuwenhuizen: which msg is this replying to?
<alyssa> my issue with the trace-based tests, or me not understanding why there are -performance versions?
<HdkR> Lynne: GPU page size or CPU page size? :D
<bnieuwenhuizen> alyssa: the grafana thingy
<alyssa> oh
<bnieuwenhuizen> HdkR: should be the same on AMD
<alyssa> I know freedreno has a perf dashboard, I guess they're abusing CI to feed it with data
<Lynne> yup, minImportedHostPointerAlignment = 4096
<HdkR> bnieuwenhuizen: ooo fancy
<bnieuwenhuizen> of course major benefits if you use hugepage or similar (GPU really likes 64k pages)
<alyssa> performance-rules has allow_failure so I guess I don't care for the purpose of this rant
<Lynne> by host-map, I mean I map the host memory as a VkDeviceMemory and use it to back a VkBuffer
<bnieuwenhuizen> Lynne: anything in dmesg?
<Lynne> no, empty
<alyssa> so... software-renderer you're up... why is there an llvmpipe-piglit-clover job when we're not supporting clover and there's ALSO a rusticl job? why trace based testing (see above issue)?
<alyssa> layered-backends: do I even want to ask about the spirv2dxil job? how is that even in scope for upstream testing? virgl traces has the usual trace problems and has caused problems for me personally with correct NIR changes; what value is that providing to upstream Mesa to justify its inclusion in premerge? similar for zink traces?
<Lynne> sure, give me a sec
<alyssa> quite frankly, with my upstream Mesa hat on, I am NAK'ing checksum-based trace testing in pre-merge CI for any driver.
<alyssa> I can do that now apparently XP
<bnieuwenhuizen> also IIRC, stupidly, there is a kernel build option that needs to be enabled to make it work
<bnieuwenhuizen> thought it was kinda default ish, but at least ChromeOS at some point managed to disable it
<Lynne> "host ptr import fail: 1 : Operation not permitted"
<bnieuwenhuizen> thx, let me check
<Lynne> err, apparently it's not ordinary ram
<Lynne> it's actually device memory
<bnieuwenhuizen> oh that wouldn't work
<Lynne> yeah, I thought so, my fault, thanks
<alyssa> The bottom line is "some company paid to have this in Mesa CI" is fundamentally an UNACCEPTABLE reason to put something in upstream CI
<alyssa> because it's a cost that EVERYONE pays
<alyssa> for every item in CI
devinaut has quit []
<alyssa> For any job in CI the cost the community pays to have it needs to be measured against the benefit the community gains from it
<alyssa> and if the cost exceeds the benefit -- as it does in the case of a number of the jobs I mentioned above -- it does not deserve to be in premerge
<alyssa> even if there's a billion dollar corporate sponsor for the CI coverage
<Lynne> I wouldn't be able to detect if an address is device memory, by any chance, right?
<bnieuwenhuizen> failure to import? :P
<bnieuwenhuizen> but no, not really
<alyssa> it is a simple cost-benefit analysis, and if the private bigcorp reaps the benefit while the commons pays the cost.. that's unacceptable
<Lynne> or could vkGetMemoryHostPointerPropertiesEXT be changed to return !VK_SUCCESS?
<bnieuwenhuizen> we could try to do an import there
<bnieuwenhuizen> I think the other weird case is mmapped files, I don't think those are supported either
<Lynne> would be nice if the function to check if a pointer can be imported actually checks, it's where my fallback is hooked up to
ice9 has joined #dri-devel
ice9 has quit [Read error: Connection reset by peer]
<alyssa> what a surprise, !20553 is currently in the long pile of running traces
<alyssa> would already be merged if not for the traces
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<alyssa> Aahahahaha and the pipeline failed because of a trace job flaking
<alyssa> how many times do i need to say that trace jobs cannot be in premerge testing
<alyssa> Trace-based testing. does not. belong in upstream premerge
<alyssa> Every bit of premerge CI coverage is a cost that EVERYONE pays
<alyssa> and unless there's that benefit in turn, it is a burden on EVERYONE and needs to go
<alyssa> and given that the value proposition of checksum based trace testing is essentially nil, I see no reason to keep it.
<alyssa> do what you want post-merge but this is an unacceptable burden for the community to bear
lemonzest has quit [Quit: WeeChat 3.6]
Zopolis4 has joined #dri-devel
lemonzest has joined #dri-devel
<lina> alyssa: Another advantage of having asahi/main as integration point is we could probably add our own custom CI without having to worry about its stability being an issue for other teams ^^
<lina> (Like once we actually have runners)
<lina> Or some other branch specific to us
<lina> Like if I have some runners in my closet it's probably good enough for us but I don't want to be responsible for breaking CI for everyone if my internet goes down ^^;;
<alyssa> lina: Responsible. I appreciate that :)
* HdkR sweats in pile of ARM boards
<Lynne> if you've got too many of them, you can very easily turn them into fabulous doorstop bricks if they're called "rockchip" and carry the number 3399, just call dd
orbea has quit [Quit: You defeated orbea! 2383232 XP gained!]
orbea has joined #dri-devel
digetx has quit [Ping timeout: 480 seconds]
digetx has joined #dri-devel
agd5f_ has joined #dri-devel
<lina> Actually, would it make sense to gate pre-merge CI on the tags?
<lina> Like only run CI specific to the drivers affected
<lina> And then full CI can run periodically on the main branch
agd5f has quit [Ping timeout: 480 seconds]
Company has quit [Quit: Leaving]
<alyssa> lina: Pre-merge CI is gated on the files updated
<alyssa> mesa/.gitlab-ci/test-source-dep.yml controls that
<alyssa> tags aren't used since they get stale easily
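[A hypothetical sketch of what such file-based gating looks like in GitLab CI, in the spirit of (but not copied from) .gitlab-ci/test-source-dep.yml; the template name and paths are illustrative:]

```yaml
# Hypothetical: a driver's jobs extend a rules template that only
# fires when that driver's sources (or shared compiler code) are
# touched by the MR, so unrelated changes skip its CI entirely.
.asahi-rules:
  rules:
    - changes:
        - src/asahi/**/*
        - src/compiler/**/*
      when: on_success
    - when: never
```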
alyssa has quit [Quit: leaving]
bgs has joined #dri-devel
<daniels> DavidHeidelberg[m]: I limited it to 20min as it usually completes in 8min; 25min is in no way normal or good
agd5f has joined #dri-devel
agd5f_ has quit [Ping timeout: 480 seconds]
heat has quit [Ping timeout: 480 seconds]
krushia has quit [Quit: Konversation terminated!]
RSpliet has quit [Quit: Bye bye man, bye bye]
RSpliet has joined #dri-devel
nishiyama has joined #dri-devel
nishiyama has quit []
agd5f_ has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
chipxxx has quit [Remote host closed the connection]
chipxxx has joined #dri-devel
chipxxx has quit []
chipxxx has joined #dri-devel
robobub has joined #dri-devel
Zopolis4 has quit []
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
sghuge has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
fab has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
agd5f has joined #dri-devel
agd5f_ has quit [Ping timeout: 480 seconds]
agd5f_ has joined #dri-devel
JohnnyonFlame has joined #dri-devel
hansg has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
ManMower has quit [Ping timeout: 480 seconds]
kzd has quit [Quit: kzd]
Zopolis4 has joined #dri-devel
<Newbyte> This page links to the Mesa issue tracker as the place to report bugs:
<Newbyte> But the link is a 404. What gives?
<ccr> due to spammer issues the issue tracking is currently set to project members only (afaik)
<Newbyte> thanks
pcercuei has joined #dri-devel
<psykose> it makes them invisible to non-members too :D
Haaninjo has joined #dri-devel
gouchi has joined #dri-devel
evadot has quit [Quit: quit]
evadot has joined #dri-devel
<ccr> unfortunately, but I'm sure someone is working on a better solution.
bluetail9 has joined #dri-devel
bluetail9 has quit []
bluetail has joined #dri-devel
fab has quit [Remote host closed the connection]
fab has joined #dri-devel
agd5f has joined #dri-devel
fab has quit [Quit: fab]
agd5f_ has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
<DavidHeidelberg[m]> daniels: I agree, my point was that if something went wrong and it takes around 21-23 minutes, it's still a better compromise to have 25 instead of the 1h we had before, if that leads to the job finishing
<DavidHeidelberg[m]> daniels: as I'm looking into the Daily and the failure rate, maybe we should disable it for now + I'm thinking about moving alpine and fedora into nightly runs, since figuring out a build failure isn't that hard and it happens only rarely
agd5f_ has joined #dri-devel
kts has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
Danct12 has joined #dri-devel
Haaninjo has quit [Read error: Connection reset by peer]
Haaninjo has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
jernej_ has quit [Remote host closed the connection]
jernej has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
jernej has quit []
jernej has joined #dri-devel
MrCooper has joined #dri-devel
kode54 has quit [Quit: Ping timeout (120 seconds)]
sjfricke[m] has quit []
kode54 has joined #dri-devel
Leopold__ has quit [Remote host closed the connection]
Leopold has joined #dri-devel
jernej has quit [Remote host closed the connection]
jernej has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
ManMower has joined #dri-devel
zehortigoza has quit [Remote host closed the connection]
Company has joined #dri-devel
jdavies has joined #dri-devel
jdavies is now known as Guest7468
bmodem has joined #dri-devel
nekit has quit [Quit: The Lounge -]
nekit has joined #dri-devel
Guest7468 has quit [Ping timeout: 480 seconds]
srslypascal is now known as Guest7472
srslypascal has joined #dri-devel
Guest7472 has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
<jenatali> alyssa: re mingw, some downstream folks apparently wanted that - not us fwiw
<jenatali> Re spirv2dxil, it's a compiler only job. Its main purpose was to stress the DXIL backend when fed Vulkan SPIR-V, but now that Dozen is more mature we can probably retire it
<daniels> DavidHeidelberg[m]: yeah but honestly it’s just hiding actual root causes and making it harder to solve the actual problem
<DavidHeidelberg[m]> what I was thinking is moving our daily threshold for reporting jobs to 15 and 30 minutes, instead of 30 and 60 minutes (enqueued etc.)
<DavidHeidelberg[m]> so we would see it. I have to agree with Alyssa that it's so annoying from a developer POV; how much we care about it doesn't really matter, they should have passing Marge pipelines no matter how we reach it
junaid has joined #dri-devel
Zopolis4 has quit []
srslypascal is now known as Guest7479
srslypascal has joined #dri-devel
<daniels> sure, but at some stage it's unusable anyway - we could accept that jobs take 3-4x the runtime, which probably means making the marge timeout 3h, and at that point we can only merge 8 MRs per day
Guest7479 has quit [Ping timeout: 480 seconds]
hansg has quit [Quit: Leaving]
kts has quit [Quit: Konversation terminated!]
junaid has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: bbl]
junaid has joined #dri-devel
<cheako> Do ppl know that issues were removed from gitlab/mesa?
<hch12907> you need an account to access them, I think
<cheako> good catch, but no I'm logged in.
<ccr> only available to project members, e.g. people with a certain access level. due to recent spam issues.
<cheako> I'm trying to provide more information, I can wait/when should I try again?
<daniels> cheako: opened them up now
<cheako> :)
fab_ has joined #dri-devel
fab_ is now known as Guest7487
fab has quit [Ping timeout: 480 seconds]
smilessh has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> jenatali: yep, I am aware that the mingw job wasn't you
<alyssa> whether I'm happy about it or not, the windows jobs have earned their place :p
<alyssa> (the VS2019 ones)
<alyssa> which is why I was wondering what benefit it *was* providing
* jenatali shrugs
<daniels> the vmware team do most of their work on top of mingw
<alyssa> OK, I don't think I recalled that
<alyssa> So then the question is -- what benefit is there to the job (i.e. what issues will it catch that the combination of linux gcc + windows vs2019 will not catch), what cost is it to premerge ci, and how much of that benefit could be recovered with some form of post-merge coverage (likely almost all of it, because build failures are easy to deal with for relevant stakeholders... given that there is Windows CI it should be a rare event to see a mingw only failure)
<jenatali> I'd be inclined to agree, post-merge seems more appropriate
<alyssa> benefit measured as P(legitimate fail in mingw | windows vs2019 passes AND gcc linux passes)
<daniels> I'm not sure that post-merge has any more value than just not having it ever, because all that happens is that you get used to seeing that stuff has failed and ignoring it
<alyssa> The question is who is "you"
<daniels> either way, I've disabled the job for now as it's broken in some kind of exotic way
<alyssa> If the "you" is "Alyssa", then that seems... fine? I don't do anything that's liable to change mingw in interesting ways and from an upstream perspective mingw is not something we're committed to supporting, just committed not to kicking from the tree.
<jenatali> Right, if there's no stakeholders, then nobody will ever fix it, and post-merge is the same as never running it
<alyssa> If the "you" is "an interested mingw stakeholder", say VMware, then if the coverage is getting them benefit, they will monitor the post-merge and act appropriately
<anholt_> post-merge is, effectively, me. I've got plenty of chasing CI already, no thanks.
<anholt_> (in the form of the nightly runs)
<alyssa> and if that mingw stakeholder doesn't care then... if there's no benefit in premerge or postmerge then there's no benefit to having the coverage full stop and it should just be removed
<jenatali> daniels: Want to ping lygstate for the mingw fails? I think he cares
<daniels> jenatali: oh, thanks for the pointer
<jenatali> (I'm still mostly on vacation, just happened to see a relevant topic for me in the one minute of scroll back I read)
<zmike> jenatali: go vacation harder!
<alyssa> I guess that's my point. If there is a stakeholder who cares, then they will monitor the nightly run and act accordingly.
<alyssa> if there's no stakeholder who cares, there's no value in the job running at all, and.. that's fine?
<jenatali> I'm sitting in a hotel lobby waiting to go to the airport lol. I've vacationed hard enough
<anholt_> alyssa: there is no mechanism for nightly alerting.
<zmike> jenatali: oh okay, proceed
<anholt_> it would be great if there was
<alyssa> anholt_: ugh. I see.
<alyssa> to be clear "anholt_ monitors all the nightly mingw jobs" is not the proposal and NAK to that because that's a terrible idea
<APic> Uh huh.
<anholt_> +1 to deleting clover job. It was introduced when rusticl was first landing and "make sure we don't break clover" seemed more reasonable. On the other hand, I don't think I've seen it flake.
<HdkR> How soon until it is +1 to deleting Clover? :)
<anholt_> I'm +1 to deleting clover right now.
<alyssa> same here
<daniels> srs
<DavidHeidelberg[m]> 🎊
<anholt_> but the rusticl dev has been hesitant until feature parity
<anholt_> (which, afaik, is close)
<alyssa> if "clover is deleted" is the only thing that comes out of this burnout fuel hell weekend
<alyssa> still a net positive
<alyssa> :p
<DavidHeidelberg[m]> Can someone update Current release: 22.3.7 . Anyway Clover will stay in 23.0, which is not that far apart anyway
<HdkR> mesamatrix doesn't track clover versus rusticl features, I'm sad :P
<daniels> alyssa: we also now have shared runners which aren't being DoSed by some impressively resourceful crypto miners
<alyssa> shitcoin really does ruin everything it touches
<DavidHeidelberg[m]> before the fate of Clover is fulfilled, can we agree on decreasing the load by dropping the clover CI jobs? If yes, I'll prepare an MR so we can do a small amendment to the CI and remove three jobs
<daniels> DavidHeidelberg[m]: sure, sounds good
<alyssa> DavidHeidelberg[m]: ++
alyssa has left #dri-devel [#dri-devel]
alyssa has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
<DavidHeidelberg[m]> haven't thought yet about HOW to do it, but when we take a farm down, we should omit running CI on it, except on bringup.
<jenatali> Yeah, would really be nice if there was a way to avoid running hardware CI jobs for unrelated config changes, like bumping a Windows container image...
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
MajorBiscuit has joined #dri-devel
heat has joined #dri-devel
kzd has joined #dri-devel
gio has joined #dri-devel
<airlied> anholt_: the clover job preexisted rusticl
<airlied> by a long time
<airlied> and it has csught a some llvmpipe regressions
<airlied> now rusticl will eventually catch them, just not sure it does yet
MajorBiscuit has quit [Quit: WeeChat 3.6]
<anholt_> airlied: yeah, misread a commit. you're right.
anholt_ has quit [Quit: Leaving]
<eric_engestrom> DavidHeidelberg[m]: merged, website will be updated in a couple of minutes
<DavidHeidelberg[m]> Thank you :)
<APic> ☺
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Ping timeout: 480 seconds]
Zopolis4 has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
DPA has quit [Ping timeout: 480 seconds]
Leopold has quit []
Leopold has joined #dri-devel
konstantin_ has joined #dri-devel
khfeng_ has joined #dri-devel
konstantin has quit [Ping timeout: 480 seconds]
khfeng has quit [Ping timeout: 480 seconds]
DPA has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
bgs has quit [Remote host closed the connection]
pcercuei has joined #dri-devel
Leopold has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
DPA has quit [Ping timeout: 480 seconds]
tarceri has quit [Ping timeout: 480 seconds]
Zopolis4 has quit []
Guest7487 has quit []
tarceri has joined #dri-devel
a-865 has quit [Quit: ChatZilla 0.15 [SeaMonkey 2.53.15/20230108172623]]
a-865 has joined #dri-devel
DPA has joined #dri-devel
mattst88 has quit [Ping timeout: 480 seconds]
mattst88 has joined #dri-devel
pcercuei has quit [Quit: dodo]
tarceri has quit [Ping timeout: 480 seconds]
Danct12 is now known as Guest7507
tarceri has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]