ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
Seirdy has joined #freedesktop
Seirdy has quit []
Seirdy has joined #freedesktop
Haaninjo has quit [Quit: Ex-Chat]
<ishitatsuyuki> I'm trying to subscribe to dri-devel@lists.freedesktop.org but I don't get a confirmation email. What could be the reason?
jstein has quit []
ximion has quit [Ping timeout: 480 seconds]
ximion has joined #freedesktop
ximion has quit []
pendingchaos has quit [Quit: No Ping reply in 180 seconds.]
pendingchaos has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
danvet has joined #freedesktop
<emersion> daniels: hmmm, sorry to bother you again about this but… has the CI runner registration token changed? i'm getting a 403
<emersion> i'll avoid this next time by caching the runner token once registered
<daniels> yeah caching is good :)
<emersion> thanks again!
<tomeu> daniels: bentiss: extra slow download from packet to hetzner: https://gitlab.freedesktop.org/tomeu/mesa/-/jobs/16438979
<tomeu> mesa.tar.gz 25% |******** | 79.2M 0:25:23 ETA
<bentiss> __tim: ^^ hey, can we get the output of "mtr minio-packet.freedesktop.org -w -b" and the iperf3 test?
<MrCooper> ishitatsuyuki: FWIW, subscribing to lists via e-mail is no longer supported, only via the website
ximion has joined #freedesktop
<ishitatsuyuki> MrCooper: I tried from website too and it doesn't work
<ishitatsuyuki> (I suppose you read the mail server logs?)
<ishitatsuyuki> does the list require approval to subscribe? if not, I suspect that something is broken with recaptcha
ximion has quit []
<daniels> oof
<daniels> ok, Mailman is now working its way through a lot of outbound mail
<ishitatsuyuki> damn
<ishitatsuyuki> thanks daniels :)
<daniels> heh, np
Seirdy has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop
<bentiss> __tim: thanks. Though the fact that the download is stuck but iperf is showing good results implies that the issue is on our side
<__tim> I think so too, but I'm not sure what it could be that would have such drastic effects
<__tim> unless something somewhere is failing utterly at TCP
<bentiss> likewise, we have a connection opened from the 51 IP address with a very low bandwidth rate
<__tim> it happens the other way round too though: there were issues where uploads were at trickle speeds (20-50kB/sec), which could possibly be explained by lots of concurrent downloads clogging the pipe or something, but then when the network pressure (if that even exists) eases, those connections never ramp up to reasonable speeds and instead get stuck at their low speeds
<bentiss> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/16440376 seems to be at ~200KB/s according to iftop
<bentiss> I wonder if this is related to the wget/busybox version the gitlab runner is using
pendingchaos has quit [Ping timeout: 480 seconds]
<bentiss> __tim: I'll add you to the cc list of equinix, to give them a heads up. Can I use your centricular email from your gitlab profile?
pendingchaos has joined #freedesktop
<daniels> or the machines aren't in I/O death are they?
<daniels> if you're trying to download loads of stuff at the same time ...
<bentiss> daniels: iperf3 would show some issues, not the full bandwidth available, no?
<__tim> bentiss, sure
<bentiss> __tim: thanks, sending it now
<daniels> bentiss: I mean disk I/O, like does it make a difference if it's writing to FS vs. purely in memory
<__tim> it doesn't feel like a network infrastructure issue to me, since we're getting consistently ok readings using iperf (unless there's some filtering somewhere for https ports that we're not aware of, but seems unlikely doesn't it)
<bentiss> __tim: agreed, but I want to get back to Equinix since I mentioned that issue to them recently. He might also have an idea or 2 of what could be wrong
<bentiss> daniels: could be, but the mesa.tar.gz file is used often, while writes to the disk are not so frequent
<bentiss> I mean, I'm not sure the write IO on minio-packet.fd.o is taking over for an extended period of time
<daniels> bentiss: of course, what I mean is on the Hetzner side
<bentiss> oh, maybe
<daniels> i.e. that the output from minio-packet is fine, the Equinix<->Hetzner network is fine, the local Hetzner network interface is fine, but the slowness is in the Hetzner machine writing to local disk
<daniels> (I have no evidence of this, just a random guess)
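One way to test that guess would be to time the same amount of data written to a local file (forced to disk with fsync) versus kept purely in memory, run on the Hetzner runner itself. This is a minimal sketch, assuming Python is available there; the function names and sizes are hypothetical, and the numbers only mean something relative to each other on the same machine.

```python
import os
import tempfile
import time


def time_writes(num_chunks=64, chunk_size=1 << 20, sync=True, directory=None):
    """Write num_chunks * chunk_size bytes to a temp file; return (bytes, seconds).

    With sync=True, os.fsync() forces the data to the block device, so the
    timing includes real disk I/O rather than just filling the page cache.
    """
    chunk = b"\0" * chunk_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(num_chunks):
                f.write(chunk)
            f.flush()
            if sync:
                os.fsync(f.fileno())
        elapsed = time.monotonic() - start
    finally:
        os.unlink(path)
    return num_chunks * chunk_size, elapsed


def time_memory(num_chunks=64, chunk_size=1 << 20):
    """Append the same amount of data to an in-memory buffer; return (bytes, seconds)."""
    chunk = b"\0" * chunk_size
    buf = bytearray()
    start = time.monotonic()
    for _ in range(num_chunks):
        buf += chunk
    return len(buf), time.monotonic() - start
```

If `time_writes()` is dramatically slower than `time_memory()` while the network looks healthy, that would point at local disk pressure rather than the Equinix<->Hetzner link.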
<__tim> Wouldn't that show up across the board everywhere?
<bentiss> dumb idea: next time we see that, on the runner we run a manual `wget https://minio-packet.freedesktop.org/git-cache/mesa/mesa/mesa.tar.gz` -> if this works, then busybox is the problem
<__tim> and in that case I would expect download to start fast (while write caches in RAM are filled) and then to slow down
<__tim> I don't know if it's busybox, we're seeing these slow speeds elsewhere too
<__tim> in fact, if I do a wget https://minio-packet.freedesktop.org/git-cache/mesa/mesa/mesa.tar.gz on the runner now I get 160-180kB/sec, with pretty much nothing else going on (no gitlab jobs)
<bentiss> heh, that's where those 160KB/s are coming from :)
<__tim> wget -O /dev/null shows same speed fwiw
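To take busybox and wget out of the picture entirely, the same check can be reproduced with a different HTTP client that discards the body, like `wget -O /dev/null`. A minimal Python sketch follows; the helper names are hypothetical. It also records per-interval rates, which would show whether a slow transfer ever ramps up or stays stuck at its initial speed.

```python
import time
import urllib.request


def drain(stream, chunk_size=64 * 1024, interval=1.0):
    """Read a stream to EOF, discarding the data (like `wget -O /dev/null`).

    Returns (total_bytes, elapsed_seconds, samples), where samples holds the
    transfer rate in KiB/s for each interval of roughly `interval` seconds.
    """
    total = 0
    bytes_since_last = 0
    samples = []
    start = last = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        bytes_since_last += len(chunk)
        now = time.monotonic()
        if now - last >= interval:
            samples.append(bytes_since_last / (now - last) / 1024)
            last, bytes_since_last = now, 0
    return total, time.monotonic() - start, samples


def measure_url(url, timeout=30):
    """Download url once; return (total_bytes, overall KiB/s, per-interval samples)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        total, elapsed, samples = drain(resp)
    overall = total / elapsed / 1024 if elapsed > 0 else float("inf")
    return total, overall, samples
```

For example, `measure_url("https://minio-packet.freedesktop.org/git-cache/mesa/mesa/mesa.tar.gz")` run on the runner should land near the 160-180kB/sec seen above if the slowness is not specific to wget.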
* bentiss just deleted the minio pod to see if a restart works
<bentiss> __tim: can you retry a wget now?
<MrCooper> FWIW, we did find in Mesa's CI that any file writes can cause meson tests to time out, taking orders of magnitude longer than normally; indicates very high block layer and possibly also memory pressure on the runners
<__tim> bentiss, 10MB/sec
<bentiss> still not the best but way better
<bentiss> FWIW, the pod was up for 256d, so maybe it kind of blacklisted your ips
jenatali has quit [Quit: Bridge terminating on SIGTERM]
pv has quit [Quit: Bridge terminating on SIGTERM]
colemickens has quit []
heftig[m] has quit [Quit: Bridge terminating on SIGTERM]
jjardon[m] has quit []
dcbaker has quit [Quit: Bridge terminating on SIGTERM]
unrznbl[m] has quit []
HayashiEsme[m] has quit []
dev[m] has quit []
scorpion2185[m] has quit []
x[m] has quit []
ignapk[m] has quit []
alatiera_afk[m] has quit []
nielsdg has quit []
gagallo7[m] has quit []
eriki73[m] has quit []
yasin-f[m] has quit []
thejonny has quit []
nirbheek has quit [Quit: Bridge terminating on SIGTERM]
gnfzdz[m] has quit []
msmith12[m] has quit []
Sumera[m] has quit []
ewlsh[m] has quit []
kusma has quit [Quit: Bridge terminating on SIGTERM]
gkiagia has quit []
razze[m] has quit []
tinywrkb has quit []
zzag[m] has quit []
chrysn[m] has quit []
shadeslayer has quit [Quit: Bridge terminating on SIGTERM]
halfline has quit []
muhlinux[m] has quit []
mooff[m] has quit [Quit: Bridge terminating on SIGTERM]
ivyl has quit [Quit: Bridge terminating on SIGTERM]
Sheogorath[m] has quit []
tintou has quit []
yk has joined #freedesktop
<bentiss> will see if that is enough for the time being
<__tim> MrCooper, only on the htz runners, or in general?
<__tim> I have not seen any memory pressure on the htz runners. It's usually single digit GBs out of 128GB "used", with more than half used for caches, and 50-ish GB "free"
<__tim> only times I've seen problems was when there was some kind of coredump bombing going on from some mesa jobs, but I think that was a long time ago
<MrCooper> it could happen on the packet runners as well, but IIRC the timeouts happened more often on htz ones (I could honestly misremember though); did you check during the afternoon in North America?
<daniels> __tim: fwiw we just globally disabled core dumping at the kernel level
mooff[m] has joined #freedesktop
dev[m] has joined #freedesktop
<__tim> bentiss, still getting slow downloads when doing the initial git clone fwiw (200kB/sec-ish) from htz3
<__tim> is that a different pod?
<daniels> yeah, totally different service ...
alatiera_afk[m] has joined #freedesktop
chrysn[m] has joined #freedesktop
colemickens has joined #freedesktop
dcbaker has joined #freedesktop
eriki73[m] has joined #freedesktop
ewlsh[m] has joined #freedesktop
gagallo7[m] has joined #freedesktop
gkiagia has joined #freedesktop
gnfzdz[m] has joined #freedesktop
halfline[m] has joined #freedesktop
HayashiEsme[m] has joined #freedesktop
heftig[m] has joined #freedesktop
ignapk[m] has joined #freedesktop
Guest7762 has joined #freedesktop
jenatali has joined #freedesktop
jjardon[m] has joined #freedesktop
kusma has joined #freedesktop
msmith12[m] has joined #freedesktop
muhlinux[m] has joined #freedesktop
nielsdg has joined #freedesktop
nirbheek_ has joined #freedesktop
pv has joined #freedesktop
razze[m] has joined #freedesktop
scorpion2185[m] has joined #freedesktop
Sheogorath[m] has joined #freedesktop
shadeslayer has joined #freedesktop
x[m]1 has joined #freedesktop
Sumera[m] has joined #freedesktop
thejonny has joined #freedesktop
tintou has joined #freedesktop
tinywrkb has joined #freedesktop
unrznbl[m] has joined #freedesktop
yasin-f[m] has joined #freedesktop
zzag[m] has joined #freedesktop
jarthur has joined #freedesktop
Guest7762 has quit []
enick_484 has joined #freedesktop
enick_484 has quit []
enick_484 has joined #freedesktop
<jenatali> alatiera: I'm looking to understand why a Mesa CI job on the Windows runner is hanging, and I can't reproduce it locally. Any chance I could get access to the machine to try to get more info like a crash dump of a hung job?
<jenatali> daniels suggested you might be able to help
<alatiera> sure thing
<alatiera> what is the type of the hang
<alatiera> also __tim ^
<jenatali> Just some outstanding tests that aren't passing, failing, or crashing
<alatiera> is it a new regression or new tests?
<jenatali> I'm trying to change the tests over to use the waffle framework instead of freeglut, and for some reason that causes the tests to hang
<jenatali> But I use waffle locally and I don't see this kind of hang
<__tim> does it hang 'every time'? I mean, is it reproducible or just now?
<jenatali> Yeah, this is 3 for 3 with the same test count remaining
<jenatali> To be clear, I'm pretty sure this is a problem either in the test or in the component being tested, but since it doesn't crash there are no logs for me to even know which tests hang, let alone why
<MrCooper> running piglit with (the Windows equivalent of) -1 -v might let you know which test it is at least
<jenatali> MrCooper: Thanks! I'll give that a shot
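If running serially with verbose output still doesn't make the culprit obvious, a small watchdog that runs each test command on its own with a timeout would name the hanging ones. A minimal Python sketch, assuming each test can be launched as a separate command; the command lists passed in are hypothetical. Note that `subprocess.run` kills the child when the timeout expires, so to capture a crash dump you would rerun the reported test alone and attach a debugger or dump tool before it is killed.

```python
import subprocess


def find_hangs(commands, timeout_s):
    """Run each (name, argv) pair in turn with a per-test timeout.

    Returns a list of (name, status), where status is the process return
    code, or "TIMEOUT" if the test ran longer than timeout_s seconds
    (in which case subprocess.run kills it before moving on).
    """
    results = []
    for name, argv in commands:
        try:
            proc = subprocess.run(argv, timeout=timeout_s, capture_output=True)
            results.append((name, proc.returncode))
        except subprocess.TimeoutExpired:
            results.append((name, "TIMEOUT"))
    return results
```

Filtering the result for `"TIMEOUT"` gives the short list of tests worth attaching a debugger to on the runner.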
<alatiera> does it hang also on the other windows runner?
<alatiera> do note that the gst windows runner doesn't have a gpu
<jenatali> "The other" runner? I didn't realize there was more than one
<jenatali> And yeah, I'm aware there's no gpu
<__tim> you can add your own personal gitlab runners if you have suitable machines at hand fwiw (in case you suspect it's a docker / gitlab-runner env thing)
<alatiera> well, for windows you need a windows server for that
<alatiera> daniels: do I recall correctly that there are 2 Windows runners in fd.o?
thaller is now known as Guest7795
thaller has joined #freedesktop
Guest7795 has quit [Ping timeout: 480 seconds]
___nick___ has joined #freedesktop
<daniels> alatiera: erm, just the htz one - the other ones haven't been running for a long time now
___nick___ has quit []
<alatiera> ah I see, had not kept up with it
___nick___ has joined #freedesktop
___nick___ has quit []
___nick___ has joined #freedesktop
ximion has joined #freedesktop
<jenatali> alatiera: So would it be possible to get access to that machine so I can generate/retrieve a crash dump from the hung test?
jarthur has quit [Ping timeout: 480 seconds]
<alatiera> yes, one sec
<alatiera> jenatali: can you send me an ssh key in a DM or to jordan at centricular dot com?
ngcortes has joined #freedesktop
Thymo_ has joined #freedesktop
Thymo has quit [Ping timeout: 480 seconds]
Thymo_ has quit [Read error: Connection reset by peer]
Thymo has joined #freedesktop
Haaninjo has joined #freedesktop
<alatiera> daniels: do you remember how you had set up ssh into powershell on the windows runner, by any chance?
ngcortes has quit [Ping timeout: 480 seconds]
___nick___ has quit [Ping timeout: 480 seconds]
<daniels> Go to Windows Settings -> Apps & Features -> Optional Features -> Add a Feature and enable the OpenSSH server
<daniels> Create C:\ProgramData\ssh\administrators_authorized_keys with your SSH key; find it in File Explorer, go to Properties -> Security -> Advanced, and select Disable Inheritance, copying the inherited permissions to the current file; then remove any Users/Authenticated Users ACL entries, so only SYSTEM and Administrators have any access
<daniels> Go to Services and either manually start the OpenSSH server, or set it to start automatically
<daniels> Make sure your current network is Private so you can get through the firewall
<daniels> You should now be able to SSH into your Windows machine, and get a lame CMD.EXE prompt
<daniels> there's a way to make pwsh the default, but you don't want to for reasons I don't entirely remember (basically it's like changing your UNIX shell to csh; stuff just fails to work)
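Before fighting with ssh itself, it can help to confirm that the OpenSSH server is actually reachable through the firewall. A minimal Python sketch of a plain TCP connect check; the helper name is hypothetical, and the host you pass it is whatever address the Windows machine has on your network.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection, an unreachable host, or a timeout (typical of a
    firewall silently dropping packets) all count as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A refusal usually means the service isn't running, while a timeout more often means the firewall (e.g. the network not being marked Private) is dropping the connection.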
danvet has quit [Ping timeout: 480 seconds]
pendingchaos has quit [Ping timeout: 480 seconds]
<jenatali> daniels: There's a missing step here for the runner. Since the Linux host machine acts as a NAT, something also needs to forward the SSH port to it (I think - I'm way out of my depth here)
<daniels> jenatali: that's probably correct, yeah, but one for alatiera ...
<daniels> (I mean I wrote that for my laptop)
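On a Linux host the usual fix would be a DNAT rule or the hypervisor's own port-forwarding option; purely to illustrate what that forwarding step does, here is a userspace TCP forwarder built on Python's asyncio. This is a swapped-in technique, not how the fd.o runner is set up, and the listening port and guest address in `main()` (2222 -> 192.168.122.10:22) are hypothetical.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def _handle(client_reader, client_writer, target_host, target_port):
    """Connect to the target and shuttle bytes in both directions."""
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()
        return
    await asyncio.gather(
        _pipe(client_reader, up_writer),  # client -> guest sshd
        _pipe(up_reader, client_writer),  # guest sshd -> client
    )


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Listen on listen_host:listen_port and forward each connection to the target."""
    async def handler(reader, writer):
        await _handle(reader, writer, target_host, target_port)

    return await asyncio.start_server(handler, listen_host, listen_port)


async def main():
    # Hypothetical addresses: host port 2222 -> Windows guest's OpenSSH on port 22.
    server = await start_forwarder("0.0.0.0", 2222, "192.168.122.10", 22)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` on the host would then let `ssh -p 2222 <host>` reach the guest, assuming the guest address is right and nothing else already uses port 2222.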
pendingchaos has joined #freedesktop
Seirdy has quit []
ngcortes has joined #freedesktop
Seirdy has joined #freedesktop
Seirdy has quit [Ping timeout: 480 seconds]
Seirdy has joined #freedesktop