ChanServ changed the topic of #freedesktop to: infrastructure and online services || for questions about projects, please see each project's contact || for discussions about specifications, please use or
columbarius has joined #freedesktop
co1umbarius has quit [Ping timeout: 480 seconds]
<airlied> spammer on again, i turned off mesa public issues again
strugee_ has joined #freedesktop
strugee has quit [Ping timeout: 480 seconds]
strugee_ is now known as strugee
___nick___ has quit []
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
miracolix has quit [Ping timeout: 480 seconds]
richar has joined #freedesktop
richar has quit [Quit: Page closed]
miracolix has joined #freedesktop
jarthur has quit [Ping timeout: 480 seconds]
agd5f_ has joined #freedesktop
agd5f has quit [Ping timeout: 480 seconds]
agd5f has joined #freedesktop
agd5f_ has quit [Ping timeout: 480 seconds]
agd5f_ has joined #freedesktop
ximion has quit [Quit: Detached from the Matrix]
agd5f has quit [Ping timeout: 480 seconds]
chipxxx has quit [Remote host closed the connection]
chipxxx has joined #freedesktop
chipxxx has quit []
chipxxx has joined #freedesktop
robobub has joined #freedesktop
danvet has joined #freedesktop
agd5f has joined #freedesktop
agd5f_ has quit [Ping timeout: 480 seconds]
agd5f_ has joined #freedesktop
agd5f has quit [Ping timeout: 480 seconds]
Haaninjo has joined #freedesktop
agd5f has joined #freedesktop
agd5f_ has quit [Ping timeout: 480 seconds]
agd5f_ has joined #freedesktop
agd5f has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Read error: Connection reset by peer]
Haaninjo has joined #freedesktop
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #freedesktop
<__tim> hrm, I'm seeing ./xmrig -a rx -o stratum+ssl:// -u REP:0x41f16b4cCB10FE72936C1787A0c54A683cA0977C.7#n4t1-uiz9 -p x --threads=64 on the gst htz2 runner
<__tim> that doesn't look good
<__tim> runner-ybqhg69i-project-44165466-concurrent-1-a56fc4b75d9ed0cf-build-2
<daniels> __tim: I’ve been suspecting the same on our shared runners - guess someone introduced a container escape
<daniels> going to burn and recreate them when they’re back home
<__tim> this seems to be running inside a docker though
<daniels> but definitely running in VMs would be a great start
<daniels> ah ok
<daniels> interesting
<daniels> __tim: the admin UI shows no jobs running on htz2 btw …
<__tim> I killed it
<__tim> so presumably the job failed then
<__tim> until it comes back
<__tim> saw the same on other runners
<daniels> hmm, they all look legit atm?
Leopold__ has quit [Remote host closed the connection]
<__tim> I killed all those jobs
<__tim> I'll let you know when one comes back
<__tim> runner-ybqhg69i-project-44165466-concurrent-1-a56fc4b75d9ed0cf-build-2 was one of the docker jobs on htz2
<__tim> dunno if that project id is enough to tell where it came from
Leopold has joined #freedesktop
<daniels> yeah, that does help, thank you! I’ll check on it later - the gitlab-runner logs should also be able to link that to a job ID
<__tim> thanks
<alatiera> monday is gonna be fun I see
<alatiera> we could put the gst runners inside kvm, but they also don't have access to anything else
<alatiera> so unless someone manages to write some malware that persists through wipes, it will be about the same
<daniels> I was thinking more like Kata containers, so each job has its own ephemeral VM
<alatiera> does gitlab have a kata executor?
<alatiera> though then we will need to do all the docker run things manually too
<daniels> I believe that’s more about podman/CRI than GitLab as well
<alatiera> unless the executor already does kata + podman/docker images
<daniels> s/ as well//
<alatiera> oh hmm
<daniels> yeah I dunno - will look into it tonight
<alatiera> indeed I recall hearing about a cri runtime with kata
<alatiera> indeed it's even possible to replace docker's default runtime with kata in place
<alatiera> though we will need to do a custom vm/kernel setup in order to have working virtiofs
<i-garrison> more spam in gitlab pulseaudio issues
ximion has joined #freedesktop
<__tim> daniels, running again on htz2 / htz3 / htz4 in case you can see anything in the admin iface (will kill it in a bit)
<daniels> __tim: hmmm, when did they start?
<__tim> I don't know
<__tim> but it looks like the ssh connection got severed and I can't log in again right now
<daniels> hmmm. tbh you should probably consider that host compromised and rebuild at this point. going to do that for the equinix ones when I’m back at my laptop
<__tim> yeah, probably
<__tim> anyway, can't look more right now, back later
<daniels> o/
<bentiss> ouch, sth weird is happening: 699 pending jobs...
<bentiss> daniels: so all the equinix runners are compromised?
<daniels> bentiss: that’s my working assumption
* bentiss can't seem to log in to them (though it's just not answering ssh)
<bentiss> they are handling jobs though
<bentiss> maybe I can use the tmate magic to get an ssh connection on to one of them
<daniels> yeah just crazy slowly. tpm noted that he could no longer SSH to his either, and I have seen job runtimes like 4x what they should be. so it all adds up
<daniels> we can use the Equinix SOS console but not sure what the root pw is …
<bentiss> yeah, the root password is trashed after 24h
<daniels> I think prob easiest is to just burn & recreate, and in parallel I’ll figure something kata-like tonight?
<bentiss> would be great if you can, yeah
<daniels> in about an hour
<bentiss> BTW, if we recreate them we need to add one package mupuf requested during the week (I'll need to find it in the backlog)
<mupuf> podman-plugins
<bentiss> thanks :)
<mupuf> that would be nice :)
<bentiss> daniels: the console gives a bunch of systemd-journald[1366018]: Failed to open system journal: Not a directory
<daniels> bentiss: …
<__tim> ours are definitely compromised
<__tim> user accounts/dirs purged, ssh keys installed into /root/.ssh/authorized_keys
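One quick triage step for the rogue keys __tim describes is to diff each host's authorized_keys against the set of keys the admins actually installed. A minimal Python sketch; all key blobs here are fake examples, with only the attacker's key comment taken from later in this log:

```python
# Flag unexpected entries in an authorized_keys file by comparing key blobs
# against a known-good allowlist. All key material here is fake.

def parse_authorized_keys(text):
    """Yield (key_blob, comment) for each key line, skipping blanks/comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Per-key options may precede the key type, so scan for it.
        for i, tok in enumerate(parts):
            if tok.startswith(("ssh-", "ecdsa-")):
                blob = parts[i + 1]
                comment = " ".join(parts[i + 2:])
                yield blob, comment
                break

# Keys we installed ourselves (fake example blob).
known_good = {"AAAAB3FAKEGOODKEY"}

sample = """\
ssh-rsa AAAAB3FAKEGOODKEY admin@host
ssh-rsa AAAAB3FAKEBADKEY chana@LAPTOP-K58CTK2G
"""

suspicious = [(blob, comment)
              for blob, comment in parse_authorized_keys(sample)
              if blob not in known_good]
print(suspicious)  # the attacker's key stands out by its comment
```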
<bentiss> __tim: ok, so that's worrying, because if they manage to get control of all runners, they might do so again if we spin up new ones
<bentiss> __tim: you were running docker, not podman?
<__tim> afaik yes
<bentiss> daniels: maybe the quickest solution would be to run the runners in usermode, not root, and keep the privileged one for virglrenderer only
<__tim> (unless alatiera changed something recently)
<bentiss> __tim: k, so it's not podman related then
<bentiss> looks like the host override for is still there
* alatiera touched nothing
<bentiss> daniels: I confirm, all CPUs are at 100%
<bentiss> (I used the tmate trick with a local server)
<alatiera> let's wipe them all and figure out how to re-provision them tmr
<bentiss> but now, I am stuck in podman :)
<alatiera> is there a button in the admin panel to like disable all shared runners?
<alatiera> other thing we could do is have only group runners and only merge pipelines, no branch pipelines, no fork pipelines
<__tim> alatiera, I rebooted the gst-htz- runners into a rescue system, so they are essentially offline for now
<alatiera> it would mean that in order to trigger a pipeline there must be an MR
<alatiera> __tim nice, thanks
<alatiera> or rather, you don't need a merge request but the group runners will only pick up merge pipelines or ones from branches on the repos in the group
<alatiera> mesa/mesa, not forks unless shared explicitly
<bentiss> that was one proposal I had for these cases
<bentiss> daniels: I'm going to nuke all of them except 13, I am logged on it.
<__tim> bentiss, that issue says "This is a confidential draft" but the issue isn't actually marked as such?
<bentiss> It used to be confidential, I opened it once I got some acks
<__tim> ah ok
<alatiera> bentiss yea seems to be in line with what I was imagining
<alatiera> gst is already using merge pipelines
<bentiss> painful, but should be OK-ish
<alatiera> I could probably look over and port other smaller groups/repos in gitlab
<bentiss> now I need to find a CVE that I can use to evade the podman sandbox...
<alatiera> I should ask bart if he ever looked into kata containers before
<bentiss> alatiera: IIRC we looked, but never could get it working properly and the current solution was working... so...
<alatiera> oh fuck I wonder if the windows runners are also pwned
<alatiera> they were at 100% usage up until I logged in
<alatiera> encouraging
<bentiss> I guess time for cutting all CI entirely
<alatiera> it was probably some CI job that just finished hopefully
* bentiss crosses fingers
<bentiss> alatiera: actually if you could log in, then maybe it's not compromised. We are basically unable to do anything on the compromised hosts
<bentiss> well, unless spawning a container through CI and logging into it
<daniels> bentiss: mmm, weston also uses kvm, and others using docker-in-docker are going to need privileged
<alatiera> yea I think the windows ones are fine but probably not for long
<alatiera> I will disable the runner for now
<bentiss> daniels: kvm was fine in usermode
<bentiss> it was virglrenderer that was failing
<daniels> oh right, why was virgl failing?
<alatiera> d-in-d is a nogo
<alatiera> but it's probably only used to build docker images hopefully?
<daniels> some projects like dbus use dind; virgl is using ci-templates tho
<bentiss> daniels: I couldn't really understand why
<bentiss> alatiera: d-in-d is already not working anymore, because the runners are behind podman these days
<alatiera> oh nice
<__tim> alatiera, windows seems to be working fine at least (cerbero msvc job at least running fine)
<__tim> if it was running miners you'd notice ;)
<alatiera> yea I just saw that job
<__tim> it's not terribly important if you want to stop it
<alatiera> not useful without the linux runners anyway
<alatiera> so yea I stopped the runner
<bentiss> daniels: I rebooted 13, and can log back in
<bentiss> and there are 3 suspicious ssh keys indeed
<bentiss> one with a plain email :)
<bentiss> yes, on that compromised machine, if I run podman logs runner-cvxra4bi-project-44165466-concurrent-1-6cd38c9473c64837-build-2 -> definitely crypto mining
<bentiss> xmrig and everything
<bentiss> but it doesn't seem to be a compromising job. Just a crypto one
<bentiss> but the project id is suspicious, because we are not anywhere near id 44165466
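The container names quoted above embed the project id, so it can be pulled out mechanically when triaging many runners. A small sketch, assuming the naming scheme observed in this log (runner-&lt;token&gt;-project-&lt;id&gt;-concurrent-&lt;n&gt;-...):

```python
import re

# gitlab-runner container names as seen in this log embed the project id:
#   runner-<runner token>-project-<project id>-concurrent-<n>-...
NAME_RE = re.compile(r"^runner-[0-9a-z]+-project-(\d+)-concurrent-(\d+)")

def project_id(container_name):
    """Extract the GitLab project id from a runner container name, or None."""
    m = NAME_RE.match(container_name)
    return int(m.group(1)) if m else None

# The suspicious container from the log maps back to project 44165466.
print(project_id("runner-cvxra4bi-project-44165466-concurrent-1-6cd38c9473c64837-build-2"))
```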
<daniels> mupuf: podman-plugins doesn't seem to exist as a package - what is it that you wanted installed?
<mupuf> daniels: what distro are you using?
<bentiss> it's debian stable with a custom repo for podman
<mupuf> ack!
<bentiss> so not surprising that it doesn't exist
* mupuf will need to figure out what is important, and which package provides it
<mupuf> so... ignore my request for now
<mupuf> sorry for the noise
<mupuf> even arch doesn't have such a package
<alatiera> what's project 19974
<bentiss> daniels: so what's the plan with -16 and -18?
<bentiss> alatiera: gfx-ci-bot/mesa
<alatiera> pheww
<alatiera> also it's still on windows 2019!
<alatiera> I had almost forgotten about that runner
<bentiss> I'm afraid they are using a bug in gitlab... because that project id doesn't exist
<bentiss> runner-cvxra4bi-project-44165466-concurrent-1-6cd38c9473c64837-build-2 I mean
<daniels> bentiss: I was going to leave them for now - so they're at least usable - whilst I try to bring up kata in the background on another runner
<bentiss> daniels: ack
<bentiss> the thing that worries me is that I don't know what they did to escape the container
<bentiss> ohhh. /etc/gitlab-runner/config.toml is polluted with a lot of new registrations
<bentiss> towards
<DavidHeidelberg[m]> ouch
<bentiss> so AFAICT, the chain was -> we are compromised -> they register a runner to their project -> they mine crypto
<bentiss> next question, how were we compromised????
<daniels> bentiss: oh ... !
<daniels> you can compromise it from a job if you have --privileged
<bentiss> daniels: I kept a copy of the file in /root/compromised_config.toml
<bentiss> daniels: then instead of kata, maybe we can just use usermode podman
<daniels> we can do both!
<bentiss> daniels: but that also means that that person registered before March 2 when we blocked private repos
<mupuf> lovely...
<bentiss> or... that it's one of the last people requesting access
<bentiss> which is easy to track :)
<mupuf> but yeah, privileged runners == not so good. The problem is that ci-templates currently requires it
<bentiss> BTW, do we want to reenable mesa issues -> ?
<mupuf> (it used to work, but a newer version of buildah fails on unprivileged runners)
<bentiss> mupuf: nope, it was working fine with usermode podman
<mupuf> ha, I see! so the problem is that I did not finish this transition in my farm just yet
<mupuf> thanks
<mupuf> one more thing I need to do :D
<mupuf> well, the code is written, just not released
<bentiss> daniels: it could be interesting to ping our friends at gitlab about that project (44165466). All I have is a username in a path which doesn't match a public account: builds/chanakyan.j/
<bentiss> does return something...
<bentiss> and FWIW, one of the ssh key was chana@LAPTOP-K58CTK2G
<alatiera> that's sloppy if that's it lol
<bentiss> alatiera: I even have an ssh key with an email :) (
<alatiera> but you can put anything on the key description
<alatiera> but yea that's funny
<alatiera> wait wait, the priv part of the key too? :O
<bentiss> no no, the ssh-rsa XXXXXX foo@bar
<bentiss> which can be anything
<bentiss> daniels: FWIW, -15 had the placeholder job, and is gone. You might want to bring this one back in
<daniels> bentiss: yep, will do thanks
<daniels> bentiss: indeed I just found both chanakyan.j and dakshesh07
<bentiss> daniels: where did you find them?
<daniels> on the arm runner, running a job from chanakyan.j/llvm-tc-build (which has since been deleted) correlates with the time the gitlab-runner config was modified
<bentiss> FWIW, dakshesh07 was the one I reported on github and who was messing with us a few months ago
<daniels> yeah, I remember him from LLVM builds
<daniels> dakks@ was another of the SSH keys in there
<bentiss> yep
<daniels> I've also got an email written to
<bentiss> nice :)
<bentiss> daniels: which arm runner was it, 7 or 8?
<daniels> both ...
<bentiss> finding that job in the list of jobs is going to be cumbersome :(
<daniels> the project was deleted, so the jobs are gone too
<bentiss> ah, damn
<bentiss> though you got it all written down in your email, so we are fine
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
MajorBiscuit has joined #freedesktop
MajorBiscuit has quit [Quit: WeeChat 3.6]
anholt_ has quit [Quit: Leaving]
ximion has quit [Quit: Detached from the Matrix]
Leopold has quit []
Leopold has joined #freedesktop
___nick___ has quit [Ping timeout: 480 seconds]
<DavidHeidelberg[m]> daniels: waiting time 40 minutes, and by looking at it, it seems it wasn't under pressure.
Leopold has quit [Ping timeout: 480 seconds]
Leopold_ has joined #freedesktop
ximion has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
AbleBacon has joined #freedesktop