daniels changed the topic of #freedesktop to: GitLab is currently down for upgrade; will be a while before it's back || https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
Ekho has quit [Quit: CORE ERROR, SYSTEM HALTED.]
co1umbarius has joined #freedesktop
columbarius has quit [Ping timeout: 480 seconds]
<zmike>
a630 overloaded again?
Ekho has joined #freedesktop
Ekho has quit []
Ekho has joined #freedesktop
peelz has joined #freedesktop
peelz is now known as Guest2082
Consolatis_ has joined #freedesktop
Consolatis is now known as Guest2083
Consolatis_ is now known as Consolatis
Guest2083 has quit [Ping timeout: 480 seconds]
Ekho has quit []
Ekho has joined #freedesktop
Consolatis has quit [Ping timeout: 480 seconds]
Consolatis has joined #freedesktop
egbert is now known as Guest2094
egbert has joined #freedesktop
Guest2094 has quit [Ping timeout: 480 seconds]
ximion has quit [Quit: Detached from the Matrix]
swatish2 has joined #freedesktop
agd5f_ has joined #freedesktop
agd5f has quit [Ping timeout: 480 seconds]
tzimmermann has joined #freedesktop
bmodem has joined #freedesktop
scrumplex_ has joined #freedesktop
scrumplex has quit [Ping timeout: 480 seconds]
sima has joined #freedesktop
i-garrison has quit []
i-garrison has joined #freedesktop
An0num0us has joined #freedesktop
DodoGTA has quit [Quit: DodoGTA]
DodoGTA has joined #freedesktop
DodoGTA has quit []
DodoGTA has joined #freedesktop
pkira has joined #freedesktop
AbleBacon has quit [Read error: Connection reset by peer]
Haaninjo has joined #freedesktop
pkira_ has joined #freedesktop
pkira has quit [Ping timeout: 480 seconds]
swatish21 has joined #freedesktop
swatish2 is now known as Guest2118
swatish21 is now known as swatish2
Guest2118 has quit [Ping timeout: 480 seconds]
Ahuj has joined #freedesktop
pkira__ has joined #freedesktop
pkira_ has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
vbenes has joined #freedesktop
vbenes has quit [Quit: Leaving.]
ximion has joined #freedesktop
vbenes has joined #freedesktop
ximion has quit [Quit: Detached from the Matrix]
<DavidHeidelberg[m]>
zmike: it's usually overloaded, I'm working on switching some jobs to the less utilized a660 :)
vkareh has joined #freedesktop
<zmike>
🤕
vbenes has quit [Ping timeout: 480 seconds]
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #freedesktop
vbenes has joined #freedesktop
tjaalton has left #freedesktop [#freedesktop]
vbenes has quit [Quit: Leaving.]
pq has quit [Ping timeout: 480 seconds]
pq has joined #freedesktop
heapify has joined #freedesktop
AbleBacon has joined #freedesktop
enunes- has quit [Remote host closed the connection]
enunes has joined #freedesktop
An0num0us has quit [Ping timeout: 480 seconds]
An0num0us has joined #freedesktop
tzimmermann has quit [Quit: Leaving]
heapify has quit [Remote host closed the connection]
ximion has joined #freedesktop
heapify has joined #freedesktop
heapify is now known as heapheap
<ndufresne>
This is hitting me all the time now while I make editorial changes to messages that are otherwise unreadable: "Your comment could not be updated because your comment has been recognized as spam. Please, change the content to proceed."
heapheap has quit []
bmodem has quit [Ping timeout: 480 seconds]
<mupuf>
ndufresne: it often happens if you edit too fast
<anholt_>
I don't actually get how this would go together, but if you can document something in docs.mesa3d.org for lab setup, I'd love to enable it for mine and google's.
vbenes has joined #freedesktop
Ahuj has joined #freedesktop
Haaninjo has joined #freedesktop
<DavidHeidelberg[m]>
zmike: what I was thinking is to move some x86 jobs to aarch64, since aarch64 is not that utilized.. but there are also fewer of them, so I don't think it's that meaningful
<anholt_>
if fdo is overprovisioned on aarch64 vs x86, then we should be provisioning more x86 relative to aarch64, rather than shuffling around every project's testing.
<daniels>
last I looked 2 was a bit overprovisioned and 1 would be quite underprovisioned
<daniels>
bentiss: so how does logging work these days? I see elasticsearch is no longer there and sentry doesn't have anything backing the svc
<bentiss>
daniels: head to the IP address of the loki-stack-grafana svc in the logging namespace
<bentiss>
login with the credentials from the secret fdo-loki-grafana in the logging namespace (not entirely sure about this)
<bentiss>
then go to explore, select the namespace, the app, the pod, and/or the line filtering
Ahuj has quit [Ping timeout: 480 seconds]
<bentiss>
(just checked the secret is the correct one)
<bentiss>
daniels: I switched from elastic to loki and it's much simpler, because it's plain text filtering, but you can still have fancy stuff like json parsing
Kayden has quit [Quit: to JF]
<daniels>
bentiss: ok I think I'm getting the hang of it, thanks!
Kayden has joined #freedesktop
<mupuf>
anholt_: ack!
<mupuf>
Thx
<anholt_>
oh, neat. gitlab is making progress on non-docker-machine support for autoscaling on cloud.
sima has quit [Ping timeout: 480 seconds]
thaller has joined #freedesktop
vbenes has quit [Quit: Leaving.]
<daniels>
anholt_: yeah, they started rewriting provisioning stuff a few months back
<daniels>
last I checked it wasn't yet ready, but getting there
<anholt_>
having gitlab automatically spawn VMs up to some maximum and manage their life cycle feels like the dream.
Haaninjo has quit [Quit: Ex-Chat]
<daniels>
bentiss: ooi do you regularly dig deep through logs, i.e. look for things over time? what I was trying to do is to see how many !200 responses we get for runner-gating.sh, so we can figure out why jobs are/were dying before even starting
<daniels>
what I've got so far is that I've had to do line filters (contains 'runner-gating.sh', does not contain '"status":200') before the json filter, else the queries time out once you get beyond like 6 hours
<daniels>
even then, once I have a total pipeline that uses expressions on the json unpack to only keep the vars I want (status+duration_ms), with or without a count expr, I pretty quickly end up in timeouts
<daniels>
I can make it to a week on workhorse (shorter lines?), and less than that on webservice
<daniels>
(and if I try to get log lines rather than counts, that drops to like ... 3 days)
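For reference, the ordering daniels describes (cheap line filters first, JSON parsing afterwards) can be sketched as a small Python helper that assembles a LogQL metric query. This is only an illustration: the function name, the `namespace`/`app` label names and their values, and the default `1h` window are assumptions, not the actual freedesktop.org setup; only the `runner-gating.sh` and `"status":200` filters come from the discussion above.

```python
def build_gating_query(namespace: str, app: str, window: str = "1h") -> str:
    """Build a LogQL query counting non-200 runner-gating.sh requests by status.

    The label names used in the stream selector are hypothetical.
    """
    # Stream selector: narrows the search to one app's log streams.
    selector = f'{{namespace="{namespace}", app="{app}"}}'
    # Plain-text line filters come first, so lines are discarded
    # before the more expensive JSON parser ever sees them.
    line_filters = '|= `runner-gating.sh` != `"status":200`'
    # Extract only the field we aggregate on, rather than every JSON key.
    parser = '| json status="status"'
    return (
        f"sum by (status) "
        f"(count_over_time({selector} {line_filters} {parser} [{window}]))"
    )


if __name__ == "__main__":
    print(build_gating_query("gitlab", "webservice"))
```

Whether this avoids the timeouts over longer ranges would still depend on log volume; the sketch only shows the filter-before-parse shape of the query.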
swatish2 has quit [Ping timeout: 480 seconds]
vkareh has quit [Quit: WeeChat 4.0.4]
<daniels>
anyway, in the only cases I can see of runner-gating.sh failing in the past few days, what I see is that by the time nginx reads the request data, the client has already given up (499 error code, which is nginx's way of saying the client hung up, and sub-millisecond $request_time)
<daniels>
so that would maybe indicate that in those cases, nginx isn't actually servicing client requests quickly enough, that they're all hanging around in a queue for ... however long it takes for curl to give up?
<daniels>
bentiss: anyway, atm we do 10 requests per job (2x fetching script + 4x sub-fetches); I wonder if we could reduce that a bunch by having the runners fetch and cache as much as possible locally?
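A rough sketch of what "fetch and cache locally" could look like on a runner, in Python. Everything here is hypothetical: the `cached_fetch` helper, the cache directory layout, and the five-minute freshness window are illustrative assumptions, and the real gating scripts may need a different invalidation policy.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def cached_fetch(
    url: str,
    cache_dir: str,
    fetch: Callable[[str], bytes],
    max_age_s: float = 300.0,
) -> bytes:
    """Return the body for url, reusing a local copy younger than max_age_s.

    fetch is whatever actually performs the HTTP request; it is injected
    so the caching logic can be exercised without a network.
    """
    # One cache file per URL, named by the URL's hash.
    path = Path(cache_dir) / hashlib.sha256(url.encode()).hexdigest()
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return path.read_bytes()

    data = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file and rename, so a concurrent job
    # never reads a half-written script.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

With something along these lines, repeated jobs on the same runner would hit the server only when the cached copy has expired, instead of on every fetch.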
<daniels>
(also, am I right to guess that the $CI_* runner variables aren't available to pre-clone-sources/pre-build scripts?)