#freedesktop on 2023-03-13 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org

00:09 anholt has joined #freedesktop

00:12 danvet has quit [Ping timeout: 480 seconds]

01:12 co1umbarius has joined #freedesktop

01:13 ximion has quit [Quit: Detached from the Matrix]

01:14 columbarius has quit [Ping timeout: 480 seconds]

01:19 ximion has joined #freedesktop

01:28 Leopold_ has quit [Remote host closed the connection]

01:40 Leopold_ has joined #freedesktop

02:25 systwi has joined #freedesktop

02:29 systwi_ has quit [Ping timeout: 480 seconds]

03:21 ximion has quit [Quit: Detached from the Matrix]

03:41 chip_x has joined #freedesktop

03:46 chipxxx has quit [Ping timeout: 480 seconds]

05:04 karolherbst has quit [Read error: Connection reset by peer]

05:04 karolherbst has joined #freedesktop

05:30 agd5f has joined #freedesktop

05:36 agd5f_ has quit [Ping timeout: 480 seconds]

05:58 agd5f_ has joined #freedesktop

06:05 agd5f has quit [Ping timeout: 480 seconds]

06:22 <bentiss> sergi: mind if I stop all of your scheduled pipelines in https://gitlab.freedesktop.org/gfx-ci-bot/mesa? You are basically DoSing all the farms by running 1 mesa pipeline every 10 minutes when it takes ~50 min to run

06:22 <bentiss> actually I'm not even asking

06:28 <bentiss> sergi: and on top of that: ERROR: Job failed: failed to pull image "alpine:latest" with specified policies [always]: Error response from daemon: toomanyrequests: You have reached your pull rate limit.

06:28 thaller has joined #freedesktop

06:28 <bentiss> so not only did you DoS the farms, you even locked us out of pulling from docker.io by adding a ton of pulls :(

06:48 thaller has quit [Quit: Leaving]

06:48 thaller has joined #freedesktop

07:06 danvet has joined #freedesktop

07:06 <sergi> Hi bentiss, I'm not DoSing but diagnosing the farms. Or at least this was the purpose. Last week I prepared a pipeline that launches a job per runner tag and only does an echo. This was made to see how the scheduler works, because with the team we've been seeing issues with how jobs are picked from the queues or how they are enqueued

07:07 <bentiss> sergi: yes, I understand the rationale behind it, but this is DoSing the farm

07:07 <bentiss> we had ~700 pending jobs, ~650 were yours

07:08 <bentiss> and so the regular mesa pipelines can't run anymore

07:09 <sergi> I see. The initial idea was to schedule it every 2 minutes, but I scheduled it every 10, thinking that 2 might be too much. When 10 was already too much

07:09 <bentiss> your idea kind of works when the farms are alive and not used, but as soon as they are heavily used, you are creating more jobs than they can handle, and if a farm goes down, you still have hundreds of pending jobs on that farm that can't be executed
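
[Editor's sketch] The overload bentiss describes follows from simple arithmetic: a pipeline started every 10 minutes that takes ~50 minutes to finish keeps several copies in flight at once, and a farm that is down only accumulates pending jobs. The snippet below is a rough illustration, not anything that was run on the fd.o infrastructure; `jobs_per_pipeline_on_farm` is a hypothetical parameter, since the log does not say how many jobs each pipeline sent to a given farm.

```python
def pipelines_in_flight(interval_min: float, duration_min: float) -> float:
    """Little's law: average number in the system = arrival rate * time in system."""
    return duration_min / interval_min


def pending_after_outage(interval_min: int, jobs_per_pipeline_on_farm: int,
                         outage_min: int) -> int:
    """Jobs piled up for a farm that executes nothing for outage_min minutes.

    jobs_per_pipeline_on_farm is an assumed value, not taken from the log.
    """
    pipelines_started = outage_min // interval_min
    return pipelines_started * jobs_per_pipeline_on_farm


if __name__ == "__main__":
    # One pipeline every 10 minutes, ~50 minutes each (figures from the log).
    print(pipelines_in_flight(10, 50))
    # The "once per hour" schedule suggested later in the conversation.
    print(pipelines_in_flight(60, 50))
    # Hypothetical: 3 jobs per pipeline aimed at a farm that is down for 2 hours.
    print(pending_after_outage(10, 3, 120))
```

With the hourly schedule, fewer than one pipeline is in flight on average, which matches the "ok-ish" assessment below.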

07:09 <sergi> It seems the attempt to help diagnose has even contributed to the problem

07:09 <bentiss> sergi: honestly, once per hour would be ok-ish, but not sure it'll help

07:10 <bentiss> sergi: the solution would be to have either a dedicated runner per farm that can handle those jobs out of the capacity of the other runners, or have a dedicated machine in each farm that can ping/report values of the runners

07:11 <bentiss> but it's school dropoff time here, I'll be back later

07:12 <sergi> thanks for managing that and sorry for contributing to the issue.

07:12 <sergi> see you later

07:50 <nirbheek_> Hugs to all admins today, thank you for handling the runner issues over the weekend <3

07:53 <bentiss> sigh... so we've got one user claiming he's going to work on mesa and camera, but used his brand new privileges for building android testing... :(

07:53 <bentiss> for references https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/527

08:00 <bentiss> daniels: looks like JaswantTeja escaped the sandbox

08:02 <bentiss> that guy asked for privileges 2 days ago, and created 3 projects he since removed: ohing.git prawn.git and test.git

08:09 * mupuf is happy he asked users to specify their intent

08:10 <bentiss> FWIW, JaswantTeja also requested access

08:11 <bentiss> I wonder if we should force the issue to be public

08:11 <nirbheek_> > I can't create new projects or fork an existing one. I am not a spammer and I want to contribute, so please add me to the list of internal users!

08:11 <nirbheek_> What an ass

08:11 <bentiss> nirbheek_: he is following a template we give them

08:12 <nirbheek_> Ahh, I see, that text is from the template

08:12 <nirbheek_> His description was `As a linux engineer, i need to work on it & contribute for the community of freedesktop. So, it will be a great opportunity if I get the access. Thanks in Advance!`

08:12 <bentiss> nirbheek_: the "As a linux engineer, i need to work on it & contribute for the community of freedesktop. So, it will be a great opportunity if I get the access. Thanks in Advance!" is fully from him

08:12 <nirbheek_> Should we require new users to be either vouched by an existing user or prove identification in some other way?

08:13 <bentiss> nirbheek_: the initial idea was to discriminate bots/humans

08:13 <nirbheek_> Ah, I see

08:13 <bentiss> but I think I'm going to lock down the runners even if it means that virglrenderer pipelines will fail

08:13 <bentiss> cause escaping the sandbox is *really* worrying

08:14 <bentiss> the thing though is the arm runners don't have a recent enough podman version :(

08:15 <nirbheek_> I wonder if this is his github profile: https://github.com/JaswantTeja/

08:16 <nirbheek_> @bentiss was sunnyteja201@gmail.com his email id for gitlab too?

08:16 <bentiss> nirbheek_: yep, you got him

08:16 <nirbheek_> I am going to talk to him on Telegram

08:17 <bentiss> nirbheek_: enjoy!

08:17 <nirbheek_> His full name is Jaswant Teja Mamidisetti

08:18 <nirbheek_> Website: https://jaswanthteja.me/

08:19 <nirbheek_> Very odd person. Smells like a 16 year old script kiddie.

08:19 <bentiss> "An Inter Student who aspires to become a Developer" -> well, looks like you are entirely radiated from the freedesktop community

08:20 <bentiss> on his tweeter: Got my Alchemy University acceptance letter! -> I wonder if we can not send an email to his future hierarchy :)

08:20 <bentiss> actually, looks like it's "free", so maybe not formal enough

08:21 <nirbheek_> He is studying for JEE (the engineering entrance exams in India), he is in 11th

08:22 vbenes has joined #freedesktop

08:22 <bentiss> the amount of information people post on internet is just desperating...

08:22 <bentiss> I mean, we almost know everything on him after a quick look

08:22 <mupuf> yeah, and talking about him like that makes me feel uncomfortable

08:22 <nirbheek_> I think we should just be glad that we were pwned by someone who wasn't trying to compromise the software projects hosted by fdo

08:23 <bentiss> nirbheek_: 2 different farms

08:23 <nirbheek_> He saw a vuln and went for it, what looks like, out of sheer gall

08:23 <mupuf> yeah, if you do manage to talk to him, ask him to disclose how he pwned us...

08:23 <bentiss> the gitlab running instance itself is way more restrictive

08:23 <bentiss> mupuf: I can probably find that in the backups

08:23 <bentiss> ohing.git was the name

08:24 <mupuf> I see, yeah, probably the best way to go

08:24 <bentiss> talking about backups because he deleted the project

08:25 <nirbheek_> He replied to me on telegram and I immediately got a furniture delivery IRL, so... brb

08:26 <mupuf> lol

08:26 <bentiss> he sent you furnitures???? :)

08:26 <bentiss> that was fast :)

08:26 <bentiss> (kidding in case it was unclear)

08:27 <nirbheek_> I am telling him I am really impressed with him

08:27 <mupuf> ...

08:28 <nirbheek_> I want to get him to admit what he did

08:28 <mupuf> don't play games, just say this is uncool: lying, abusing common infrastructure

08:28 <mupuf> destroying the tools that created the software he likely uses

08:28 <nirbheek_> I will

08:28 <nirbheek_> I have dealt with people like this before

08:29 <mupuf> ok... /me runs away. Don't want to see this

08:33 <nirbheek_> He is admitting what he did and naming other people too(!?)

08:34 <bentiss> please give me the list

08:35 <nirbheek_> He gave someone else his creds, apparently

08:36 <nirbheek_> Currently scolding him

08:45 * nirbheek_ uploaded an image: (102KiB) < https://matrix.org/_matrix/media/v3/download/matrix.org/XQHZiOTHfPJfozkoJQKwLsNh/photo_2023-03-13_14-09-44.jpg >

08:45 <nirbheek_> His accomplices are also crapping their pants now

08:49 ___nick___ has joined #freedesktop

08:49 <nirbheek_> bentiss: not sure what to do here now, it doesn't look like he knows his accomplices

08:50 <bentiss> nirbheek_: ok, thanks anyway (I must confess I haven't understood a word in your screenshot ;-P)

08:50 <bentiss> well, at least they know it's bad

08:52 <nirbheek_> Do we have any contacts at namecheap?

08:52 <nirbheek_> The main guy's website is https://cyberknight777.dev and the whois info is protected by namecheap

08:53 <bentiss> "Your friendly neighbourhood kernel developer" means that we can probably blacklist him in the kernel community, no?

08:54 <nirbheek_> I don't think he actually contributes

08:54 <bentiss> heh: https://cyberknight777.dev/posts/2021/09/in-the-making-spamprotection-rs/ we should ask him for input then :)

08:55 <bentiss> (for our spam problem)

08:55 <nirbheek_> He also hosts on linode, do we have contacts there?

08:56 * bentiss barely has contacts outside of fdo

08:56 <nirbheek_> Maybe daniels has, or someone else

08:57 <bentiss> ouch, pulling the full backup with the repos takes more than 1 hour

08:57 <nirbheek_> This guy is a real menace:... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/HQmWuKaajeTRWvzTQHUYwypq>)

08:58 <bentiss> sigh

09:00 mvlad has joined #freedesktop

09:00 <bentiss> TBH, I wish I was allowed to also prune the gitlab db from all of the non contributing users :(

09:00 <mupuf> welp... not surprised, but I guess it is good to be reminded that not everyone is a self-centered asshole

09:01 <nirbheek_> So, this guy has a github account and he has admitted what he did, and is unrepentant

09:01 <nirbheek_> I think with enough contacts we can make him have actual consequences for his actions

09:01 <bentiss> nirbheek_: https://gitlab.com/cyberknight777/tg-bomb -> that's not good for you :(

09:02 <nirbheek_> Seems to require a phone number

09:03 <nirbheek_> I joined his chat https://t.me/knightschat and tried to download the history, but he kicked me before I could

09:04 <bentiss> it's that dead simple to spam an account on telegram?????

09:04 * bentiss is horrified

09:06 <nirbheek_> Hm, interestingly, the chat export is still happening

09:06 AbleBacon has quit [Read error: Connection reset by peer]

09:14 <nirbheek_> This was one of his accomplices: https://github.com/dakkshesh07 he is part of the github developer program

09:16 Haaninjo has joined #freedesktop

09:17 <emersion> similar activity on wine's gitlab fwiw https://gitlab.winehq.org/sanjay7kumar11/rep

09:17 <nirbheek_> Yep, same guy

09:17 <nirbheek_> "cyberknight77"

09:19 <nirbheek_> emersion: do you have contacts at linode or namecheap?

09:19 <emersion> nope

09:32 <bentiss> daniels: even kata-1 has been compromised...

09:33 <bentiss> though it looks like it's running privileged podman

09:35 ___nick___ has quit []

09:36 <daniels> bentiss: fwiw, there's more to sergi's pipelines than you might think, but let's deal with that later

09:36 <bentiss> daniels: I agree, and I'm not arguing against the idea

09:36 Haaninjo has quit [Quit: Ex-Chat]

09:36 <bentiss> just the implementation :)

09:37 ___nick___ has joined #freedesktop

09:37 ___nick___ has quit []

09:39 ___nick___ has joined #freedesktop

10:10 robobub has quit []

10:28 MrCooper has quit [Remote host closed the connection]

10:29 MrCooper has joined #freedesktop

10:51 <alatiera> nirbheek_ so is that guy indeed 16?

10:52 <nirbheek_> cyberknight777 is 20 or older, rest are 16

11:05 nirbheek_ is now known as nirbheek

11:25 vyivel has quit [Remote host closed the connection]

11:25 vyivel has joined #freedesktop

11:26 <MrCooper> wonder if https://gitlab.freedesktop.org/mesa/piglit/-/issues/88 is legit or suspicious

11:35 <daniels> MrCooper: the username does match the user who filed the GitHub PR, and it's an @qq.com which matches China rather than our current friends, so I'd be inclined to say it's OK

11:42 <daniels> bentiss: so back to the scheduled pipeline, one thing we didn't realise is that 'auto-cancel redundant pipelines' seemingly doesn't apply to scheduled runners

11:42 <daniels> there's definitely a bug in GitLab itself where it just doesn't schedule jobs on tagged runners sometimes, no matter how much free capacity there is

11:43 <daniels> the only way to get the job scheduled is to schedule another job for the same runner, which bumps the check timestamp the runner passes, and forces it to see that there are more jobs available

11:43 <daniels> you can see this when you have jobs that have been queued for 30+min, then you cancel it and retry it and it gets processed instantly

11:44 <daniels> I've seen on Rails that if you query the job queue for that runner (i.e. executing exactly what the runner endpoint would've done), that there is a job there queued for it - but it never checks the queue because the runner passes a last-check token which is the most current one, so it skips checking the queue and just tells the runner there are no more jobs available

11:44 <daniels> the jobs in sergi's pipeline are dummy jobs - they just echo and return immediately, taking no time at all - they're only there to pump the job queue
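
[Editor's sketch] The behaviour daniels describes can be illustrated with a toy model. This is not GitLab's code: `ToyCoordinator`, its `token` counter, and the `bump=False` switch are hypothetical names used only to show how a "last check" marker can hide a queued job, and why enqueueing a dummy job for the same runner releases it.

```python
class ToyCoordinator:
    """Toy model of a job server that lets a runner skip the queue lookup."""

    def __init__(self):
        self.queue = []   # pending job names for a single runner
        self.token = 0    # server-side "last update" marker

    def enqueue(self, job, bump=True):
        # bump=False models the suspected bug: the job lands in the queue
        # but the marker the runner compares against is not refreshed.
        self.queue.append(job)
        if bump:
            self.token += 1

    def request_job(self, runner_token):
        """Return (job or None, token the runner should send next time)."""
        # Fast path: the runner's marker is current, so the queue is never
        # inspected and the runner is told there is nothing to do.
        if runner_token == self.token:
            return None, runner_token
        if self.queue:
            # Hand out a job and leave the runner's marker stale so that
            # its next poll inspects the queue again.
            return self.queue.pop(0), runner_token
        # Queue really is empty: let the runner catch up with the marker.
        return None, self.token


if __name__ == "__main__":
    server = ToyCoordinator()
    runner_token = 0
    server.enqueue("stuck-job", bump=False)
    print(server.request_job(runner_token))   # queue is skipped
    server.enqueue("echo-dummy")              # the "pump" job
    print(server.request_job(runner_token))   # stuck job is finally served
```

In this model the dummy job does no useful work itself; its only effect is to change the marker so the runner looks at the queue again.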

11:48 <alatiera> lol

11:49 <zmike> is swrast ci operational?

11:52 <sergi> bentiss and daniels, yes my jobs were dummies but I haven't thought about the effect on the "toomanyrequests" to docker.io

11:54 <bentiss> daniels, sergi: sorry I was worried this morning about the DoS, and took immediate action. But again, I'm not against such a thing, and if that's the only way of solving the issue, then yes, we should go for it

11:55 <daniels> zmike: sadly no, waiting for anholt to get back so I can figure it out with them

11:55 <zmike> alrighty

11:55 <daniels> bentiss: yeah, no problem :) it is the dumbest possible solution for sure, but I can't see much else atm. we can rework it so it pulls one of the ci-templates from harbor at least, which should be a no-op on all the runners as it'll always be cached?

11:55 <bentiss> It's just that it was one more thing to add to the pile of crap we had since the beginning of the year, between spammers and hackers... And I would expect a minimum of monitoring for these jobs

11:56 <bentiss> yeah, using harbor is fine

11:56 <daniels> bentiss: yeah, I asked sergi to put it in place to solve one of the biggest fires we had in Mesa since we run enough jobs (on isolated hw-specific runners which don't see new jobs into the queue until the next MR ...) that it triggers fairly regularly - given that he has children at home and I don't, the monitoring was that I keep an eye on it, but I was somewhat distracted as you know :P

11:56 <daniels> sorry about that

11:56 <bentiss> no worries

11:57 <bentiss> BTW, the one thing I also disliked was the gfx-ci-bot is actually a real account, linked to sergi's email

11:57 <daniels> bentiss: you'd prefer it was a project bot account?

11:57 <bentiss> that's not a good way of doing things. We should have used a proper bot

11:58 <daniels> mmm, we originally did that, but gfx-ci-bot does quite a few things, including needing to be able to push to multiple repos

11:58 <bentiss> yeah, because if anything happens to sergi or if he loses his email credentials we are screwed

11:58 <daniels> tbf the credentials are in Collabora's BitWarden, but yeah, I get the point

11:58 <bentiss> right...

11:58 <daniels> should we just make the email gfx-ci-bot@no.invalid?

11:58 <bentiss> maybe we can find a better middle ground then

11:59 <bentiss> @collabora.com would be better, no?

11:59 <bentiss> because I'm sure that if I see such an email in a user account that does DoS, I'll just nuke it :)

12:00 <bentiss> gfx-ci-bot-collabora@no.invalid if you prefer

12:00 <daniels> but yeah, for example one of the things gfx-ci-bot does is, if there have been changes in piglit/virglrenderer, then it'll push to some repos to build a testing MR of Mesa using updated versions of those two projects, then raise an MR on Mesa to bump the dependencies - we can't do that with a project token

12:00 <daniels> bentiss: yeah, that's a good idea

12:01 <bentiss> daniels: I just spent the past 2 hours discussing with whot about my gitlab-validate-users project I mentioned

12:01 <bentiss> and this rust part is becoming a generic webhook facility for gitlab

12:01 <sergi> I thought the use of my email address in the bot helps to identify me in case something went wrong

12:02 <bentiss> basically, you could register that webhook in piglit/virglrenderer for code push, and if it gets a change, it'll run any python script you want with the proper credentials
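
[Editor's sketch] A minimal illustration of the webhook idea bentiss outlines, assuming a GitLab push event delivered as JSON. The `X-Gitlab-Token` header and the `object_kind` and `project.path_with_namespace` fields are part of GitLab's webhook payload; the routing table, the script names, and `select_script` itself are hypothetical and are not taken from the actual project.

```python
import hmac

# Hypothetical routing table: project path -> script to run on a push.
# Every entry here is a placeholder, not real configuration.
HOOKS = {
    "mesa/piglit": "uprev_piglit.py",
    "example/virglrenderer": "uprev_virglrenderer.py",
}


def select_script(headers, payload, secret, hooks=HOOKS):
    """Return the script configured for this event, or None to ignore it."""
    # Reject requests that do not carry the shared secret token.
    token = headers.get("X-Gitlab-Token", "")
    if not hmac.compare_digest(token.encode(), secret.encode()):
        return None
    # Only react to code pushes, not to issues, MRs, pipelines, etc.
    if payload.get("object_kind") != "push":
        return None
    project = payload.get("project", {}).get("path_with_namespace")
    return hooks.get(project)


if __name__ == "__main__":
    event = {
        "object_kind": "push",
        "ref": "refs/heads/main",
        "project": {"path_with_namespace": "mesa/piglit"},
    }
    print(select_script({"X-Gitlab-Token": "s3cret"}, event, "s3cret"))
```

A real service would then launch the selected script with the credentials from its own configuration; that step is left out here because the log does not describe how it is done.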

12:03 <sergi> And, now it's long ago, I preferred not to use my user, because this uprev project is a team tool and shouldn't be a single person project

12:03 <daniels> sergi: yeah, and you can't register with an invalid email so that's fair enough - I've made it go to one which contains the contact details but also ends in .fdo.invalid, so we'll never try to route e.g. a password-reset email

12:03 <daniels> bentiss: interesting - wouldn't that require something like Vault to get a token which could then create a MR on Mesa though?

12:03 <bentiss> sergi: yes, and it's valuable, but if you look through my eyes this morning: "oh, there is a user named gfx-ci-bot DoSing the farms, it must be legit. Well, wait a minute that user is an actual user, so is it a hacker?, let's just nuke, talk later"

12:04 <sergi> bentiss, it's a newbie XD

12:04 <daniels> haha

12:04 <sergi> I mean, I'm the newbie

12:05 <bentiss> daniels: right now I intend to have the token in the yaml config, stored on an internal tree (or just in kubernetes), so one token per hookiedookie instance

12:05 <daniels> well, one thing to do at least is to nuke all the invalid tags which will never get scheduled

12:05 <bentiss> (we plan on renaming it hookiedookie)

12:05 <daniels> bentiss: gotcha, nice

12:06 <sergi> I understand how this morning it was impossible to distinguish if this behaviour was for good or for bad. But for sure, it has disturbed the infra

12:06 <bentiss> and as we were talking this morning, I also realized I could watch for the pushes in the config repo, and automatically reload it through a webhook :)

12:06 <daniels> bentiss: oh that's cute, and might resolve my current wondering of how to auto-update various bots and tasks we have running in k8s

12:07 <daniels> (e.g. it would be nice if the marge-bot update process was something more nuanced than 'ask daniels to change the tag')

12:07 <bentiss> sergi: again, don't blame yourself, we all make mistakes. The fact that I recognized the name prevented me from nuking the account, so don't worry.

12:07 <bentiss> daniels: I know it would be useful to others :)

12:07 <bentiss> whot: told you ^^ :)

12:07 * bentiss needs to grab some lunch, bbl

12:11 <alatiera> bentiss, daniels another thing I wanted to do for a while, was to default images created with ci-templates to run as non-root users

12:11 <daniels> alatiera: fr fr

12:11 <daniels> I'd need to check how that works with nested KVM, since we depend on that heavily for both Mesa and Weston

12:11 <alatiera> last I looked at it there was some buildah issue

12:12 <daniels> but yeah, if we can make it work then I'd be completely on board with enforcing unprivileged across the board

12:12 <daniels> (& will keep looking into how to make Kata fly today, had some v bad timing with some other work stuff which has dragged me away but trying to come back to it)

12:12 <alatiera> np np

12:13 <alatiera> my plan is to wipe one runner and set it up as group runner so at least MRs can be back working

12:13 <alatiera> which should buy us some time to figure out how to lock down the rest

12:14 <daniels> cool cool, I'll let you know how I get on with Kata - you should be able to use the cloud-init stuff to provision new htz runners as well

12:20 <alatiera> yea

12:25 MajorBiscuit has joined #freedesktop

12:40 vkareh has joined #freedesktop

12:50 agd5f has joined #freedesktop

12:55 agd5f_ has quit [Ping timeout: 480 seconds]

12:56 agd5f_ has joined #freedesktop

12:58 vbenes has quit [Remote host closed the connection]

13:02 agd5f has quit [Ping timeout: 480 seconds]

13:12 vbenes has joined #freedesktop

14:26 Leopold_ has quit [Remote host closed the connection]

14:28 vbenes has quit [Quit: Leaving.]

14:28 vbenes has joined #freedesktop

14:29 Leopold has joined #freedesktop

14:39 <MrCooper> ugh, now ccache is hanging in F36 containers as well

14:46 <daniels> MrCooper: job?

14:46 <daniels> if it's still running I can SSH in and stare at it

14:46 <MrCooper> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37970650 , I cancelled it

14:47 <MrCooper> I was hitting this when I tried upgrading to F37 first, that's why I settled for F36 for now

14:47 <MrCooper> looks like ccache has been upgraded from 4.5.1 to 4.7.4 in F36

14:48 <MrCooper> which is the same version as in F37

14:53 mohamexiety has joined #freedesktop

14:55 <mohamexiety> daniels: hey again! not sure if you remember but a few days ago I came with a weird issue where I was having a really really bad connection to the fdo gitlab. you mentioned that maybe there was some node in the middle that was having issues or so and suggested looking for alternative routing

14:55 <mohamexiety> well I tried cloudflare WARP and now everything works well, so thanks a lot again!

14:55 <MrCooper> daniels: it was hanging in a futex syscall, https://github.com/ccache/ccache/issues/1244 sounds similar

15:00 <daniels> mohamexiety: glad to hear it!

15:01 <daniels> MrCooper: hmm but we don’t use redid

15:01 <daniels> *redis

15:04 <MrCooper> can't reproduce locally in podman either

15:07 <MrCooper> would dropping ccache from the fedora image be acceptable for now?

15:16 <daniels> MrCooper: yeah, just do that for now, it's not on the critical path to hardware testing anyway

15:17 <daniels> MrCooper: thanks for looking into it :)

15:17 <MrCooper> no worries, thanks

15:35 <Wallbraker> Seems like something happened to some of the Windows runners on the CI, got a job stuck because there no runners for the tags: docker, windows, 2022.

15:36 <daniels> Wallbraker: yeah they're gone for now, awaiting a rebuild

15:36 <Wallbraker> Hours, days, weeks?

15:36 <Wallbraker> Thanks for the info!

15:37 <daniels> absolutely no clue I'm afraid

15:37 agd5f_ has quit []

15:37 <daniels> but weeks would be surprising

15:37 agd5f has joined #freedesktop

15:39 <Wallbraker> Heh, I was being pessimistic. :p

15:40 <Wallbraker> Okay I'll disable the windows build if things doesngive it a few hours before disabling the windows

15:40 <Wallbraker> Okay I'll disable the windows build if things don't improve by tomorrow.

15:41 <MrCooper> daniels: hmm, why can't I retry https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37974766 ?

15:41 <eric_engestrom> MrCooper: because gitlab already retried the job

15:41 <daniels> yeah, it retried and succeeded in https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37975697

15:42 <eric_engestrom> thanks to DavidHeidelberg[m]'s `retry: 1`, anything that fails gets auto-retried once by gitlab

15:43 <MrCooper> sigh, thanks

15:45 <daniels> (that's backed up by a bunch of automated monitoring we have that e.g. tells you which specific dEQP cases flaked when you retried a job and it succeeded; we're not just blindly retrying for no reason)
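
[Editor's sketch] The combination described here, one automatic retry plus monitoring that records when a retry rescued a job, can be sketched as follows. `run_with_retry` and the shape of its result are hypothetical; they only mirror the idea behind `retry: 1` and flake reporting, not the actual Mesa CI tooling.

```python
def run_with_retry(job, retries=1):
    """Run job() up to 1 + retries times and report whether a retry rescued it.

    job is any callable returning True on success and False on failure.
    """
    attempts = 0
    while True:
        attempts += 1
        if job():
            # A pass after at least one failure is recorded as a flake,
            # so it can be investigated instead of silently hidden.
            return {"passed": True, "attempts": attempts, "flaked": attempts > 1}
        if attempts > retries:
            return {"passed": False, "attempts": attempts, "flaked": False}


if __name__ == "__main__":
    outcomes = iter([False, True])   # fails once, then passes
    print(run_with_retry(lambda: next(outcomes)))
```

The `flaked` field is the part that keeps the retry from being blind: a job that only passed on its second attempt still shows up in the report.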

15:52 <MrCooper> man, I fell into a rabbit hole: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37976025

15:54 <alatiera> Wallbraker unlikely they will be up by tomorrow

15:55 <Wallbraker> alatiera: Okay, thanks for letting me know. Any reason for the outage?

15:56 <alatiera> the runners are fine-ish from the look of it, unlike the linux ones, but I wanna wait for the idiot poking at our runners to lose interest before enabling them again

15:56 <alatiera> I guess could enable one of them since I am gonna rebuild them eventually anyway

15:56 <alatiera> or rather rollback the snapshot

15:57 <Wallbraker> Somebody hacking/DoSs the runners?

15:57 <Wallbraker> Also that would be great.

15:57 <daniels> Wallbraker: yep

15:58 <alatiera> yea indeed, lets enable one of them and see how it goes

15:58 <Wallbraker> Sigh

15:58 <Wallbraker> Thanks for that!

16:02 craftyguy has quit [Remote host closed the connection]

16:03 craftyguy has joined #freedesktop

16:04 craftyguy has quit []

16:11 <alatiera> Wallbraker, daniels one of the windows runner is up

16:12 <alatiera> will try to keep an eye on usage

16:12 <alatiera> and will revert the vm to the snapshot if anything goes bad

16:12 craftyguy has joined #freedesktop

16:15 <Wallbraker> Our job succeeded, thanks!

16:20 jarthur has joined #freedesktop

16:26 <anholt> daniels: so, what's the summary from the weekend? I've still got mesa-swrast turned off.

16:28 Leopold has quit [Remote host closed the connection]

16:30 genpaku has quit [Read error: Connection reset by peer]

16:31 Leopold_ has joined #freedesktop

16:32 genpaku has joined #freedesktop

16:48 Leopold_ has quit [Remote host closed the connection]

16:50 Leopold_ has joined #freedesktop

16:50 rcampbell has joined #freedesktop

16:52 rcampbell has quit []

16:53 rcampbell has joined #freedesktop

16:53 rcampbell has quit [Remote host closed the connection]

16:56 rcampbell has joined #freedesktop

16:58 <bentiss> anholt: basically we have been hacked, we kicked them out and we set everything back is a longer todo list to make runners more secure, which will be a pain to everybody

16:58 <bentiss> s/everything back is a/everything back *with* a/

16:58 <anholt> bentiss: so, containers escaped?

16:59 <bentiss> yep

16:59 <anholt> that's what I was seeing on mesa-swrast

17:00 <bentiss> anholt: nirbheek managed to talk to them, and some are just kids (16) but "the brain" is like 20 and has no remorse

17:00 <bentiss> so when you deal with that kind of ass, you don't have much choice

17:01 <daniels> anholt: containers escaped, found out exactly who was doing it, banned them, know what to look for in future

17:02 <daniels> anholt: I'm working on Kata but I very likely won't have that done today

17:02 <anholt> ok, but we don't have a way to prevent the container escape?

17:02 <daniels> ^

17:02 * anholt searches for kata, finds out

17:03 <daniels> anholt: 'what if every container was also an ephemeral VM'

17:03 <anholt> I mean, it seems like the obvious sensible thing

17:03 <daniels> yeah, obvious, sensible, also comes with a number of hazards to do it properly

17:03 <anholt> I'm shocked.

17:03 <daniels> anyway, I'm going to figure that out and provide a cloud-init patch, hopefully tomorrow but might end up being more like Wednesday

17:04 <anholt> well, now that I've figured out how to cloud-init, I can at least follow along I guess :)

17:04 <daniels> itmt if you provision new ones, I'm staring at our current ones looking for any kind of escape and not seeing it, and we know which new users to look for

17:04 <daniels> and then it's easy to either reprovision with kata, or if you have some way of sharing GCE access then I can do it

17:06 <zmike> is it known that the a630-traces job is stalling out? I think this is the second time today for me https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37982645

17:07 <daniels> zmike: it's stalling in uploading the images to minio, so either the network is rubbish or s3 is crawling

17:07 <anholt> it's at replay_minio_upload_images, and at least a long time ago that was slow for unknown reasons when lots of images changed.

17:07 <daniels> otoh, it literally just now completed

17:07 <zmike> yeah but the pipeline has already timed out

17:07 <daniels> so next time, ask me and anholt to open it earlier so it can complete in front of all our eyes

17:08 <zmike> shrug

17:08 <daniels> hmm rly? only starting traces at >=42min is already disastrous

17:08 <zmike> I've had two attempts to marge this today time out

17:08 <daniels> ohhhhh it got retried after it timed out because S3 again

17:08 <zmike> and freedreno isn't hitting this codepath so I know it's unrelated

17:08 <zmike> ah

17:08 <zmike> ok

17:08 <zmike> well then at least the cause is known

17:08 <daniels> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37979873#L1331

17:20 ximion has joined #freedesktop

17:21 <anholt> we still don't have any plans for how we get fd.o to be less 503-happy, do we?

17:23 mohamexiety has quit []

17:25 <bentiss> anholt: we might be able to simply spin up more pods to handle the workloads, I've just never done that with rados gateway

17:29 <daniels> bentiss: could you please do that? it's pretty frequent, and we do seem to get long 502/503/504 blackouts - like 5min+ at a time

17:30 <bentiss> daniels: honestly not today :(

17:30 <mupuf> anyone having a clue what may have landed in mesa that would cause the valve vkcts jobs to spew "ERROR - dEQP error: error: XDG_RUNTIME_DIR not set in the environment." in a loop?

17:30 <bentiss> I can try to work on that tomorrow

17:31 AbleBacon has joined #freedesktop

17:31 * mupuf cancelled the marge job and will investigate that tomorrow

17:34 <daniels> bentiss: yeah, no prob at all :)

17:34 <daniels> mupuf: presumably having Wayland at least available as the window system, and dEQP trying to use that

17:35 <daniels> mupuf: I suspect the answer is to set HWCI_START_WESTON in your job environment

17:36 <mupuf> thx!

17:36 <daniels> either that or just skip all the Wayland WSI tests

17:36 <daniels> np

17:36 <daniels> bentiss: thankyou very much <3

17:37 <bentiss> daniels: thank you too for also handling these things :)

17:40 <daniels> bentiss: team work makes the dream work

17:40 <bentiss> heh

18:06 MajorBiscuit has quit [Ping timeout: 480 seconds]

18:13 <bentiss> actually, this might be easy (multiple pods for radosgw)

18:16 <bentiss> daniels, anholt: done via https://gitlab.freedesktop.org/freedesktop/helm-gitlab-config/-/commit/2af51be5d419aad8c3f3f69023568c0701ee9acd we'll see how it behaves in the next few days

18:17 <anholt> bentiss: thank you!

18:18 <bentiss> alright the fdo-opa just started too, so all radosgw have now 3 pods instead of 1

18:29 ybogdano is now known as Guest7588

18:29 ybogdano has joined #freedesktop

18:30 ybogdano has quit []

18:45 <alatiera> bentiss, daniels is the podman executor setup somewhere in git btw?

18:46 <alatiera> also did we register another placeholder-job runner?

18:46 <bentiss> alatiera: we use https://gitlab.freedesktop.org/freedesktop/helm-gitlab-config/-/blob/master/gitlab-runner-provision/generate-cloud-init.py to generate the cloud-init

18:46 <bentiss> and I don't know if placeholder-job has been set up

18:47 <alatiera> oh I see, thanks

18:47 * alatiera hasn't looked enough into cloud-init yet

18:54 Kayden has quit [Quit: _> office]

19:00 abrotman has quit [Remote host closed the connection]

19:00 abrotman has joined #freedesktop

19:01 ybogdano has joined #freedesktop

19:07 mvlad has quit [Remote host closed the connection]

19:09 vkareh has quit [Quit: WeeChat 3.6]

19:09 alanc has quit [Remote host closed the connection]

19:09 alanc has joined #freedesktop

19:30 <alatiera> hmm added a group runner in gst and have an mr pipeline but it doesn't seem to pick up jobs from the mr

19:31 <alatiera> it does pick up jobs in gst/gst though

19:31 <alatiera> any ideas?

19:36 abrotman has quit [Remote host closed the connection]

19:36 abrotman has joined #freedesktop

19:45 <daniels> bentiss: this is awesome, thanks a lot

19:46 <daniels> alatiera: I've never tried to use group runners, sorry - it depends on the MR though as to whether the pipeline executes in user or group context - check the path for the pipeline if it says alatiera/gst or gst/gst

19:47 <alatiera> supposedly the parent context pipeline is an EE feature

19:47 <daniels> if it's the wrong one, you'll want to look at https://docs.gitlab.com/ee/ci/pipelines/merge_request_pipelines.html#use-with-forked-projects
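The path check daniels describes can be sketched roughly as below. This is only an illustration, not anything from the channel: the function name `pipeline_context` is hypothetical, and its two arguments are assumed to be the project path shown on the pipeline page and the path of the MR's target project.

```python
def pipeline_context(pipeline_project_path: str, target_project_path: str) -> str:
    """Classify where an MR pipeline runs (hypothetical helper).

    Returns "parent" when the pipeline belongs to the MR's target project
    (e.g. gst/gst) and "fork" when it belongs to any other project
    (e.g. alatiera/gst).
    """
    if pipeline_project_path == target_project_path:
        return "parent"
    return "fork"


if __name__ == "__main__":
    # Paths taken from the discussion; both are just examples.
    print(pipeline_context("alatiera/gst", "gst/gst"))
    print(pipeline_context("gst/gst", "gst/gst"))
```

A group runner registered on the parent group would presumably only be offered jobs in the "parent" case, which matches what alatiera observed.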

19:47 <daniels> hmm

19:47 <alatiera> but checking if it gets triggered if I have permissions

19:47 <daniels> 'Moved to GitLab Premium in 13.9.'

19:47 <daniels> daaaaaaaaaaaaamn.

19:48 <alatiera> ha! yeap only fork pipelines it is

19:48 <alatiera> bentiss there goes the dream

19:52 <alatiera> guess I will need a shared-runner token for now

19:52 <alatiera> but will keep the runner unpriv at least

19:53 <alatiera> still treat it as throwaway until we figure out something though

19:53 <bentiss> honestly, we can probably have something like marge-bot that forces the pipeline to run in the context of the target project if we need. But I'm surprised it's premium only now

19:54 <daniels> sent you the token

19:54 * bentiss really wants to have hookiedookie ready now

19:54 <alatiera> bentiss marge will trigger a fork pipeline too

19:54 <daniels> bentiss: oh, that's a good point, and gst does have marge-bot - it would just need to be modified to push into a throwaway branch of the parent project rather than the downstream project

19:54 <alatiera> we'd have to clone -> close -> create new

19:55 <bentiss> because we could have a /run-pipeline, and we can have a way to copy the MR in the target, and run the pipeline from it

19:55 <alatiera> and then I guess have the bot merge manually

19:55 <daniels> alatiera: ugh yeah, of course

19:55 <bentiss> but we could also use parent-child pipeline with git clone policy never and we provide our own sha

19:56 <alatiera> ah hmm

19:56 <alatiera> so the child pipeline would be on the fork I am guessing

19:57 <alatiera> I think I recall an issue in gitlab for sharing the runners

19:58 <bentiss> I honestly don't have the brain today for thinking through all the quirks, but basically we probably want: user submits an MR in his fork, a developer in the project approves it, this triggers a new pipeline in the target project with the given sha, and we wait for the results

19:58 <__tim> we could add the runner to marge and active devs, then it would work for merge requests

19:59 <__tim> tedious though, and not great for drive-by patches, but what can you do

20:01 <bentiss> actually even easier: if we set the pre-clone hook like I have in the issue, the shared runner will only execute if the MR is from a project member or is approved

20:01 <__tim> right, I thought that wasn't quite ready yet

20:02 <bentiss> The missing bits are: a proper repo to store that hook, and the runners' configuration, which needs to be extended

20:02 <bentiss> so should be easy enough to implement, except I'm terrible at giving names to projects :)

20:02 <alatiera> https://gitlab.com/gitlab-org/gitlab/-/issues/19451

20:02 <alatiera> there is no hope

20:02 * bentiss always wanted to rename helm-gitlab-configure and helm-gitlab-omnibus to something else :)

20:03 Haaninjo has joined #freedesktop

20:03 <alatiera> __tim we'd need to add a runner instance/config for each contributor

20:04 <alatiera> oh hmm

20:04 <alatiera> so shared runner but abort if there is no mr

20:04 <alatiera> as long as that can't be overwritten somehow it might work

20:04 <bentiss> yeah, if there is no mr or if the project or user is not allowed
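The gating rule discussed here (abort when there is no MR, otherwise run only for project members or approved MRs) could be sketched as a small decision function. This is a hedged illustration and not the logic of the pre-clone hook from the issue: `runner_may_execute` and its three boolean inputs are hypothetical, and how a hook would learn these facts is deliberately left out.

```python
def runner_may_execute(has_mr: bool, author_is_member: bool, mr_approved: bool) -> bool:
    """Decide whether a shared runner should execute a job (hypothetical sketch).

    has_mr           -- the job belongs to a merge request pipeline
    author_is_member -- the MR author is a member of the target project
    mr_approved      -- a developer of the target project approved the MR
    """
    if not has_mr:
        # No merge request at all: abort.
        return False
    # With an MR, either membership or an approval is enough.
    return author_is_member or mr_approved


if __name__ == "__main__":
    # Drive-by patch from a non-member, not yet approved.
    print(runner_may_execute(has_mr=True, author_is_member=False, mr_approved=False))
    # Same patch after a developer approved it.
    print(runner_may_execute(has_mr=True, author_is_member=False, mr_approved=True))
```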

20:05 <alatiera> found another issue https://gitlab.com/gitlab-org/gitlab/-/issues/336530

20:05 <bentiss> technically it would go in a repo on gitlab.fd.o/freedesktop, so if it's hacked, then we are doomed

20:05 * alatiera mumbles about open core

20:07 <__tim> then we have bigger problems anyway :)

20:07 <__tim> which is what you meant I guess

20:07 <bentiss> yeah

20:07 <alatiera> I was thinking more about if you can overwrite the pre-clone hook

20:08 abrotman has quit [Remote host closed the connection]

20:08 abrotman has joined #freedesktop

20:09 <bentiss> it's on the runner config

20:09 <bentiss> it's a script it runs. The fact that it can download a file and execute it is a side effect :)

20:09 <bentiss> so technically, when the pre-clone script executes, we are not even in the job's environment, so not much the job can do

20:10 <bentiss> it's also on a separate container

20:11 <bentiss> alatiera: in case you need more info and you missed when I reposted it yesterday: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/438

20:12 <alatiera> bentiss I saw the issue, part of why I was expecting it to work!

20:12 <bentiss> 9 months ago, it was working

20:12 <bentiss> :)

20:12 <alatiera> heh

20:12 <alatiera> by accident probably

20:12 <bentiss> and given that it is shared runners, none of the restrictions for premium apply

20:12 <alatiera> but yes I do recall spending hours trying to understand the pipeline contexts

20:13 <alatiera> hmm

20:14 <alatiera> if we hardcode a mr check, we also need to check the group namespace

20:14 <alatiera> and like have_mr + mr_parent_namespace=gst|mesa|etc

20:14 <bentiss> should be easy enough to extend https://gitlab.freedesktop.org/freedesktop/test-ci/-/raw/main/runner-gating.sh

20:15 <bentiss> as long as you stick with grep and wget, it works

20:16 <bentiss> alatiera: the nice thing is that groups can't be created by the normal users. So we have some control over them, and all groups are technically valid ones

20:16 <alatiera> we say that all toplevel groups are autoallowed since only admins can make them
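The "have_mr + parent namespace" idea might look roughly like this. It is a sketch under stated assumptions, not an extension of runner-gating.sh: the helper names are hypothetical, and the set of admin-created groups is passed in by hand (`gst` and `mesa` are simply the names mentioned above), because a user's personal namespace such as `alatiera` is also a top-level path segment and cannot be told apart from a group by the path alone.

```python
def top_level_namespace(project_path: str) -> str:
    """Return the first path segment, e.g. "gst" for "gst/gst" (hypothetical helper)."""
    return project_path.split("/", 1)[0]


def mr_targets_group_project(has_mr: bool, target_project_path: str, groups: set) -> bool:
    """True when there is an MR and its target project lives under a known group.

    `groups` is an assumed, externally supplied set of top-level group names.
    """
    if not has_mr:
        return False
    return top_level_namespace(target_project_path) in groups


if __name__ == "__main__":
    groups = {"gst", "mesa"}  # placeholder names taken from the chat
    print(mr_targets_group_project(True, "gst/gst", groups))
    print(mr_targets_group_project(True, "alatiera/gst", groups))
```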

20:16 <bentiss> yeah

20:17 <alatiera> will take a stab at it

20:17 <bentiss> and we can probably have some contact with the people if there is an issue in a group

20:18 <bentiss> anyway, it's been a long day here. I'll be back tomorrow I think

20:18 <alatiera> the stupid thing is that we can't even do "maintainers only trigger pipelines" like it is on github

20:18 <alatiera> sure we will get pipelines running automatically, but also the things we have to do to get there..

20:22 wizard5623 has joined #freedesktop

20:26 abrotman has quit [Remote host closed the connection]

20:26 abrotman has joined #freedesktop

20:34 danvet has quit [Ping timeout: 480 seconds]

20:44 abrotman has quit [Remote host closed the connection]

20:44 abrotman has joined #freedesktop

20:49 DodoGTA has quit [Quit: DodoGTA]

20:51 DodoGTA has joined #freedesktop

20:55 Kayden has joined #freedesktop

20:55 Kayden has quit [Remote host closed the connection]

20:56 Kayden has joined #freedesktop

21:06 DodoGTA has quit [Remote host closed the connection]

21:08 ___nick___ has quit [Ping timeout: 480 seconds]

21:08 DodoGTA has joined #freedesktop

21:19 ybogdano is now known as Guest7599

21:19 Guest7588 is now known as ybogdano

21:27 Leopold_ has quit []

21:35 Leopold_ has joined #freedesktop

21:48 Leopold_ has quit [Remote host closed the connection]

21:51 Leopold_ has joined #freedesktop

22:17 <alatiera> so remember when I said that parent pipelines from forks are EE? I lied

22:17 <alatiera> it is Premium which is different apparently

22:17 <alatiera> enabled on selfhosted but needs a subscription on gitlab.com

22:18 <alatiera> so we could have group runners just for maintainers/developers I guess

22:18 <alatiera> and throw away shared runners

22:27 <jenatali> alatiera: Are the Windows runners in a good enough state to re-enable for Mesa? Or should we wait longer before doing that?

22:27 <alatiera> jenatali I've turned one of them back on

22:27 <alatiera> there isn't any sign they were tampered with

22:27 <jenatali> Got it, so half capacity at the moment

22:27 <alatiera> however I will roll back the snapshot at some point within the week

22:28 <jenatali> Ok

22:28 <alatiera> or if the kids come back

22:41 wizard5623 has quit []

22:50 Guest7599 has quit [Ping timeout: 480 seconds]

22:52 anholt has quit [Quit: Leaving]

22:52 anholt has joined #freedesktop

23:28 DodoGTA has quit [Quit: DodoGTA]

23:31 DodoGTA has joined #freedesktop

23:51 DodoGTA has quit [Remote host closed the connection]

23:52 DodoGTA has joined #freedesktop

23:53 Kayden has quit [Quit: go home before fire alarm tests]