#freedesktop on 2023-03-12 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org

01:12 columbarius has joined #freedesktop

01:14 co1umbarius has quit [Ping timeout: 480 seconds]

01:45 <airlied> spammer on again, i turned off mesa public issues again

01:48 strugee_ has joined #freedesktop

01:51 strugee has quit [Ping timeout: 480 seconds]

01:51 strugee_ is now known as strugee

02:50 ___nick___ has quit []

02:52 ___nick___ has joined #freedesktop

02:53 ___nick___ has quit []

02:56 ___nick___ has joined #freedesktop

03:03 miracolix has quit [Ping timeout: 480 seconds]

03:11 richar has joined #freedesktop

03:17 richar has quit [Quit: Page closed]

03:21 miracolix has joined #freedesktop

04:57 jarthur has quit [Ping timeout: 480 seconds]

05:06 agd5f_ has joined #freedesktop

05:12 agd5f has quit [Ping timeout: 480 seconds]

05:42 agd5f has joined #freedesktop

05:47 agd5f_ has quit [Ping timeout: 480 seconds]

06:57 agd5f_ has joined #freedesktop

06:59 ximion has quit [Quit: Detached from the Matrix]

07:04 agd5f has quit [Ping timeout: 480 seconds]

07:21 chipxxx has quit [Remote host closed the connection]

07:23 chipxxx has joined #freedesktop

07:30 chipxxx has quit []

07:31 chipxxx has joined #freedesktop

07:45 robobub has joined #freedesktop

08:36 danvet has joined #freedesktop

09:02 agd5f has joined #freedesktop

09:08 agd5f_ has quit [Ping timeout: 480 seconds]

09:45 agd5f_ has joined #freedesktop

09:51 agd5f has quit [Ping timeout: 480 seconds]

10:36 Haaninjo has joined #freedesktop

11:20 agd5f has joined #freedesktop

11:26 agd5f_ has quit [Ping timeout: 480 seconds]

11:57 agd5f_ has joined #freedesktop

12:03 agd5f has quit [Ping timeout: 480 seconds]

12:28 Haaninjo has quit [Read error: Connection reset by peer]

12:32 Haaninjo has joined #freedesktop

12:57 MrCooper has quit [Remote host closed the connection]

12:57 MrCooper has joined #freedesktop

13:09 <__tim> hrm, I'm seeing ./xmrig -a rx -o stratum+ssl://rx.unmineable.com:443 -u REP:0x41f16b4cCB10FE72936C1787A0c54A683cA0977C.7#n4t1-uiz9 -p x --threads=64 on the gst htz2 runner

13:09 <__tim> that doesn't look good

13:09 <__tim> runner-ybqhg69i-project-44165466-concurrent-1-a56fc4b75d9ed0cf-build-2

13:11 <daniels> __tim: I’ve been suspecting the same on our shared runners - guess someone introduced a container escape

13:11 <daniels> going to burn and recreate them when they’re back home

13:11 <__tim> this seems to be running inside a docker though

13:11 <daniels> but definitely running in VMs would be a great start

13:11 <daniels> ah ok

13:11 <daniels> interesting

13:13 <daniels> __tim: the admin UI shows no jobs running on htz2 btw …

13:14 <__tim> I killed it

13:14 <__tim> so presumably the job failed then

13:14 <__tim> until it comes back

13:14 <__tim> saw the same on other runners

13:15 <daniels> hmm, they all look legit atm?

13:15 Leopold__ has quit [Remote host closed the connection]

13:16 <__tim> I killed all those jobx

13:16 <__tim> I'll let you know when one comes back

13:16 <__tim> runner-ybqhg69i-project-44165466-concurrent-1-a56fc4b75d9ed0cf-build-2 was one of the docker jobs on htx-2

13:16 <__tim> dunno if that project id is enough to tell where it came from

13:17 Leopold has joined #freedesktop

13:18 <daniels> yeah, that does help, thankyou! I’ll check on it later - the gitlab-runner logs should also be able to link that to a job ID

13:18 <__tim> thanks

13:20 <alatiera> monday is gonna be fun I see

13:21 <alatiera> we could put the gst runners inside kvm, but they also don't have access to anything else

13:21 <alatiera> so unless someone manages to write some malware that persists through wipes, it will be about the same

13:22 <daniels> I was thinking more like Kata containers, so each job has its own ephemeral VM

13:22 <alatiera> does gitlab have a kata executor?

13:23 <alatiera> though then we will need to do all the docker run things manually too

13:23 <daniels> I believe that’s more about podman/CRI than GitLab as well

13:23 <alatiera> unless the executor already does kata + podman/docker images

13:23 <daniels> s/ as well//

13:23 <alatiera> oh hmm

13:23 <daniels> yeah I dunno - will look into it tonight

13:24 <alatiera> indeed I recall hearing about a cri runtime with kata

13:26 <alatiera> indeed its even possible to replace docker's default runtime with kata in place

13:31 <alatiera> https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/containerd-kata.md

13:32 <alatiera> wrong file, https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-run-docker-with-kata.md

13:33 <alatiera> though we will need to do a custom vm/kernel setup in order to have working virtiofs

13:59 <i-garrison> more spam in gitlab pulseaudio issues

13:59 ximion has joined #freedesktop

14:06 <__tim> daniels, running again on htz2 / htz3 / htz4 in case you can see anything in the admin iface (will kill it in a bit)

14:07 <daniels> __tim: hmmm, when did they start?

14:07 <__tim> I don't know

14:07 <__tim> but it looks like the ssh connection got severed and I can't log in again right now

14:09 <daniels> hmmm. tbh you should probably consider that host compromised and rebuild at this point. going to do that for the equinix ones when I’m back at my laptop

14:10 <__tim> yeah, probably

14:10 <__tim> anyway, can't look more right now, back later

14:10 <daniels> o/

14:35 <bentiss> ouch, sth weird is happening: 699 pending jobs...

14:37 <bentiss> daniels: so all the equinix runners are compromised?

14:37 <daniels> bentiss: that’s my working assumption

14:37 * bentiss can't seem to log in on to them (though it's just not answering ssh)

14:39 <bentiss> they are handling jobs though

14:39 <bentiss> maybe I can use the tmate magic to get an ssh connection on to one of them

14:39 <daniels> yeah just crazy slowly. tpm noted that he could no longer SSH to his either, and I have seen job runtimes like 4x what they should be. so it all adds up

14:40 <daniels> we can use the Equinix SOS console but not sure what the root pw is …

14:40 <bentiss> yeah, the root password is trached after 24h

14:40 <daniels> I think prob easiest is to just burn & recreate, and in parallel I’ll figure something kata-like tonight?

14:40 <bentiss> would be great if you can, yeah

14:41 <daniels> in about an hour

14:42 <bentiss> BTW, if we recreate them we need to add one packet mupuf requested during the week (I'll need to find it in the backlog)

14:42 <mupuf> podman-plugins

14:42 <bentiss> thanks :)

14:42 <mupuf> that would be nice :)

14:44 <bentiss> daniels: the console gives a bunch of systemd-journald[1366018]: Failed to open system journal: Not a directory

14:56 <daniels> bentiss: …

14:58 <__tim> ours are definitely compromised

15:01 <__tim> user accounts/dirs purged, ssh keys installed into /root/.ssh/authorized_keys

15:09 <bentiss> __tim: ok, so that's worrying, because if they manage to get control of all runners, thye might do so again if we spin up new ones

15:10 <bentiss> __tim: you were running docker, not podman?

15:11 <__tim> afaik yes

15:11 <bentiss> daniels: maybe a quickest solution would be to run the runners in user, not root, and keep the privileged onbe for virglrenderer onlu

15:11 <bentiss> only

15:11 <__tim> (unless alatiera changed something recently)

15:14 <bentiss> __tim: k, so it's not podman related then

15:21 <bentiss> looks like the host override for ssh.tmate.io is still there

15:21 * alatiera touched nothing

15:22 <bentiss> daniels: I confirm, all CPU are 100%

15:23 <bentiss> (I used the tmate trick on with a local server

15:23 <bentiss> )

15:23 <alatiera> let's wipe them all and figure how to re provision them tmr

15:23 <bentiss> but now, I am stuck in podman :)

15:23 <alatiera> is there a button in the admin panel to like disable all shared runners?

15:24 <alatiera> other thing we could do is have only group runners and only merge pipelines, no branch pipelines, no fork pipelines

15:24 <__tim> alatiera, I rebooted the gst-htz- runners into a rescue system for now, so they are essential offline for now

15:24 <alatiera> it would mean that in order to trigger a pipeline there must be an MR

15:24 <alatiera> __tim nice, thanks

15:25 <alatiera> or rather, you don't need a merge request but the group runners will only pick up merge pipelines or ones from branches on the repos in the group

15:25 <alatiera> mesa/mesa, not forks unless shared explicitly

15:25 <bentiss> alatiera: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/438

15:26 <bentiss> that was one proposal I had for these cases

15:26 <bentiss> daniels: I'm going to nuke all of them except 13, I am logged on it.

15:27 <__tim> bentiss, that issue says "This is a confidential draft" but the issue isn't actually marked as such?

15:27 <bentiss> It used to be confidential, I opened it once I got some acks

15:27 <__tim> ah ok

15:28 <alatiera> bentiss yea seems to be inline with what I was imagining

15:28 <alatiera> gst is already using merge pipelines

15:28 <bentiss> painful, but should be OK-ish

15:28 <alatiera> I could probably look over and port other smaller groups/repos in gitlab

15:28 <bentiss> now I need to find a CVE that I can use to evade the podman sandbox...

15:30 <alatiera> I should ask bart if he ever looked into kata containers before

15:32 <bentiss> alatiera: IIRC we looked, but never could get it working properly and the current solution was working... so...

15:33 <alatiera> oh fuck I wonder if the windows runners are also pawned

15:35 <alatiera> they were 100% usage up until I logged in

15:35 <alatiera> encouraging

15:35 <bentiss> I guess time for cutting all CI entirely

15:36 <alatiera> it was probably some CI job that just finished hopefully

15:36 * bentiss crosses fingers

15:38 <bentiss> alatiera: actually if you could log in, then maybe it's not compromised. We are basically unable to do anything on the compromised hosts

15:38 <bentiss> well, unless spwaning a container thorugh CI and logging in it

15:39 <daniels> bentiss: mmm, weston also uses kvm, and others using docker-in-docker are going to need privileged

15:39 <alatiera> yea I think the windows ones are fine but probably nto for long

15:40 <alatiera> I will disable the runner for now

15:40 <bentiss> daniels: kvm was fine in usermode

15:40 <bentiss> it was virglrenderer that was failing

15:40 <daniels> oh right, why was virgl failing?

15:40 <alatiera> d-in-d is a nogo

15:40 <alatiera> but its probably only used to build docker images hopefully?

15:41 <daniels> some projects like dbus use dind; virgl is using ci-templates tho

15:41 <bentiss> daniels: I couldn't really understand why

15:41 <bentiss> alatiera: d-in-d is already not working anymore, because the runners are behind podman these days

15:42 <alatiera> oh nice

15:42 <__tim> alatiera, windows seems to be working fine at least (cerbero msvc job at least running fine)

15:42 <__tim> if it was running miners you'd notice ;)

15:42 <alatiera> yea I just saw that job

15:42 <__tim> it's not terribly important if you want to stop it

15:43 <alatiera> not useful without the linux runners anyway

15:43 <alatiera> so yea I stopped the runner

15:49 <bentiss> daniels: I rebooted 13, and can log back in

15:50 <bentiss> and there are 3 suspicious ssh keys indeed

15:51 <bentiss> one with a plain email :)

15:55 <bentiss> yes, on that compromised machine, if I run podman logs runner-cvxra4bi-project-44165466-concurrent-1-6cd38c9473c64837-build-2 -> definitely a crypto handling

15:56 <bentiss> xmrig and everythin

15:57 <bentiss> but it doesn't seem to be a compromising job. Just a crypto one

15:57 <bentiss> but the project id is suspicious, because we are not anywhere near id 44165466

15:58 <daniels> mupuf: podman-plugins doesn't seem to exist as a package - what is it that you wanted installed?

15:58 <mupuf> daniels: what distro are you using?

15:59 <bentiss> it's debian stable with a custom repo for podman

15:59 <mupuf> ack!

15:59 <bentiss> so not surprising that it doesn't exist

16:01 * mupuf will need to figure out what is important, and which package provides it

16:01 <mupuf> so... ignore my request for now

16:01 <mupuf> sorry for the noise

16:01 <mupuf> even arch doesn't have such a package

16:02 <alatiera> what's project 19974

16:03 <bentiss> daniels: so what's the plan with -16 and -18?

16:03 <bentiss> alatiera: gfx-ci-bot/mesa

16:03 <alatiera> pheww

16:03 <alatiera> also its still on windows 2019!

16:04 <alatiera> I had almost forgot about that runner

16:04 <bentiss> I'm afraid they are using a bug in gitlab... because that project id doesn't exist

16:04 <bentiss> runner-cvxra4bi-project-44165466-concurrent-1-6cd38c9473c64837-build-2 I mean

16:06 <daniels> bentiss: I was going to leave them for now - so they're at least usable - whilst I try to bring up kata in the background on another runner

16:06 <bentiss> daniels: ack

16:07 <bentiss> the thing that worries me is that I don't know what they did to escape the container

16:12 <bentiss> ohhh. /etc/gitlab-runner/config.toml is polluted with a lot of new registrations

16:12 <bentiss> towards gitlab.com

16:15 <DavidHeidelberg[m]> outch

16:15 <bentiss> so AFAICT, the chain was -> we are compromised -> they register a gitlab.com runner to their project -> they mine crypto

16:15 <bentiss> next question, how were we compromised????

16:17 <daniels> bentiss: oh ... !

16:17 <daniels> you can compromise it from a job if you have --privileged

16:18 <bentiss> daniels: I kept a copy of the file in /root/compromised_config.toml

16:18 <bentiss> daniels: then instead of kata, maybe we can just use usermode podman

16:18 <daniels> we can do both!

16:19 <bentiss> daniels: but that also mean that that person registered before March 2 when we blocked provite repos

16:19 <mupuf> lovely...

16:19 <bentiss> or... that it's one of the last person requesting access

16:19 <bentiss> which is easy to track :)

16:20 <mupuf> but yeah, privileged runners == not so good. The problem is that ci-templates currently requires it

16:20 <bentiss> BTW, do we want to reenable mesa issues -> https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/526 ?

16:20 <mupuf> (it used to work, but a newer version of buildah fails un unprivileged runners)

16:20 <bentiss> mupuf: nope, it was working fine with usermode podman

16:21 <mupuf> ha, I see! so the problem is that I did not finish this transition in my farm just yet

16:21 <mupuf> thanks

16:21 <mupuf> one more thing I need to do :D

16:21 <mupuf> well, the code is written, just not released

16:29 <bentiss> daniels: it could be interesting to ping our friends at gitlab about that project (44165466). All I have is a username in an path which doesn't match to a public gitlab.com account: builds/chanakyan.j/

16:30 <bentiss> https://gitlab.com/chanakyan does returns something...

16:30 <bentiss> and FWIW, one of the ssh key was chana@LAPTOP-K58CTK2G

16:34 <alatiera> that's sloppy if that's it lol

16:36 <bentiss> alatiera: I even have an ssh key with an email :) (@gmail.com)

16:36 <alatiera> but you can put anything on the key description

16:36 <alatiera> but yea that's funny

16:36 <alatiera> wait wait, the priv part of the key too? :O

16:37 <bentiss> no no, the ssh-rsa XXXXXX foo@bar

16:37 <bentiss> which can be anything

16:40 <bentiss> daniels: FWIW, -15 had the placeholder job, and is gone. You might want to bring this one back in

16:51 <daniels> bentiss: yep, will do thanks

16:51 <daniels> bentiss: indeed I just found both chanakyan.j and dakshesh07

16:51 <bentiss> daniels: where id you find them?

16:51 <daniels> on the arm runner, running a job from chanakyan.j/llvm-tc-build (which has since been deleted) correlates with the time the gitlab-runner config was modified

16:51 <bentiss> FWIW, dakshesh07 was the one I reported on github and that was messing with us a few month ago

16:52 <daniels> yeah, I remember him from LLVM builds

16:52 <daniels> dakks@ was another of the SSH keys in there

16:52 <bentiss> yep

16:52 <daniels> I've also got an email written to gitlab.com

16:52 <bentiss> nice :)

16:53 <bentiss> daniels: which arm runner was it, 7 or 8?

16:53 <daniels> both ...

16:54 <bentiss> finding that job in the list of jobs is going to be cumbersome :(

16:55 <daniels> the project was deleted, so the jobs are gone too

16:56 <bentiss> ah, damn

16:56 <bentiss> though you got it all written down in your email, so we are fine

16:58 <daniels> https://gitlab.freedesktop.org/chanakyan.j/llvm-tc-build/-/jobs/37755096 was definitely the starting point

19:09 alanc has quit [Remote host closed the connection]

19:09 alanc has joined #freedesktop

19:13 MajorBiscuit has joined #freedesktop

19:23 MajorBiscuit has quit [Quit: WeeChat 3.6]

19:27 anholt_ has quit [Quit: Leaving]

20:20 ximion has quit [Quit: Detached from the Matrix]

20:30 Leopold has quit []

20:33 Leopold has joined #freedesktop

21:07 ___nick___ has quit [Ping timeout: 480 seconds]

21:11 <DavidHeidelberg[m]> daniels: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37939577 waiting time 40 minutes and it wasn't by looking at lava.collabora.dev it seems it wasn't under pressure.

21:36 Leopold has quit [Ping timeout: 480 seconds]

21:49 Leopold_ has joined #freedesktop

22:53 ximion has joined #freedesktop

23:42 Haaninjo has quit [Quit: Ex-Chat]

23:51 AbleBacon has joined #freedesktop