ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
scrumplex has joined #freedesktop
scrumplex_ has quit [Ping timeout: 480 seconds]
DragoonAethis has quit [Quit: hej-hej!]
DragoonAethis has joined #freedesktop
Turkish-Men has quit [Quit: Leaving]
marcheu_ has joined #freedesktop
marcheu has quit [Ping timeout: 480 seconds]
dri-logger has quit [Ping timeout: 480 seconds]
marcheu_ has quit [Ping timeout: 480 seconds]
dri-logger has joined #freedesktop
marcheu has joined #freedesktop
dri-logg1r has joined #freedesktop
eluks has quit [Remote host closed the connection]
eluks has joined #freedesktop
dri-logger has quit [Ping timeout: 480 seconds]
marcheu has quit [Ping timeout: 480 seconds]
marcheu has joined #freedesktop
ybogdano has quit [Remote host closed the connection]
ybogdano has joined #freedesktop
m5zs7k has quit [Ping timeout: 480 seconds]
m5zs7k has joined #freedesktop
swatish2 has joined #freedesktop
haver has quit [Ping timeout: 480 seconds]
AbleBacon has quit [Read error: Connection reset by peer]
haver has joined #freedesktop
tzimmermann has joined #freedesktop
pjakobsson has joined #freedesktop
jsa1 has joined #freedesktop
pjakobsson has quit [Remote host closed the connection]
pjakobsson has joined #freedesktop
<hakzsam> mesa CI is very slow these days, the queue is like ~10h long :/
sghuge has quit [Remote host closed the connection]
sghuge has joined #freedesktop
pjakobsson has quit [Remote host closed the connection]
lsd|2 has joined #freedesktop
swatish21 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
lsd|2 has quit [Quit: KVIrc 5.2.2 Quasar http://www.kvirc.net/]
sima has joined #freedesktop
ximion has quit [Remote host closed the connection]
swatish21 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
<daniels> hakzsam: yeah I know, some of the build jobs were obscenely long which I just managed to trim down last night
<hakzsam> great, thanks for looking into it!
<daniels> there's definitely some worrying per-runner variability which I think means we have noisy neighbours competing too much for memory perhaps
<daniels> but yeah, we now have two Intel compilers to build, NVK and Rusticl to build and link in Rust which takes about a year, ACO not getting any smaller, and we get to eat more of that pain as we run clc in every build
<daniels> so atm the best thing anyone can do to help out would be to make compile times faster - either the C/C++/Rust compile and build, or the NIR compiler chain to build all the CLC
<daniels> istr alyssa mentioning nir_validate also got super slow recently?
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
<mupuf> daniels: thanks <3
jsa1 has quit [Ping timeout: 480 seconds]
<daniels> np
guludo has joined #freedesktop
swatish21 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
jsa1 has joined #freedesktop
swatish2 has joined #freedesktop
swatish21 has quit [Ping timeout: 480 seconds]
<karolherbst> mhh, I guess I could see how to make clc compiles faster
<karolherbst> I did think about adding pch support, which might help a bit here
swatish2 has quit [Ping timeout: 480 seconds]
mvlad has joined #freedesktop
swatish2 has joined #freedesktop
<psykose> there's a libanv file that takes like 50s to compile @_@
<dj-death> one of the genX ?
<psykose> it's a bit variable every run; in my trace it came to 42s for src/intel/vulkan/libanv_per_hw_ver300.a.p/genX_cmd_buffer.c.o
<karolherbst> I'm sure the intel clc stuff would benefit a lot from pch, because it pulls in the expensive internal opencl header, because of that intel subgroup extension
<karolherbst> though not quite sure how to integrate it well into the build system... maybe as its own target
<karolherbst> and then users depend on it
<karolherbst> if somebody is motivated enough to figure it out, please do
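A rough sketch of what that standalone-target shape could look like in meson, with hypothetical names (prog_clang, opencl_headers.h) and illustrative clang flags; the real wiring would need the CL version and options to match every consumer:
    # meson.build sketch: precompile the heavy OpenCL header once as its
    # own target, then have each clc invocation depend on it
    opencl_pch = custom_target(
      'opencl-c-pch',
      input : 'opencl_headers.h',
      output : 'opencl_headers.h.pch',
      # flag spelling is illustrative; clang must see the same -cl-std
      # and options as the consumers or it rejects the pch
      command : [prog_clang, '-x', 'cl-header', '-cl-std=CL3.0',
                 '@INPUT@', '-o', '@OUTPUT@'],
    )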
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
<karolherbst> daniels: I'm wondering if we could improve the situation a lot by cleaning up the libasan/libubsan situation. Why not enable both at once for all builds and just use that instead of splitting it all up? Might require some work here and there, but it would at least cut down the number of jobs, though it might make the individual pipelines slower,
<karolherbst> though that might not matter much
<psykose> longest jobs for a default build locally https://img.ayaya.dev/eWQDkrK6xMfn
<karolherbst> interesting...
<karolherbst> I think somebody mentioned that the genx macros are super heavy
<karolherbst> and looking at that genx_init_state.c file I can see how it explodes in size
<karolherbst> anyway.. guess I'll look into the pch stuff, because I wanted to do that anyway
<psykose> if you want the full perfetto trace by chance: https://img.ayaya.dev/XALK1KwzgfyQ (it includes -ftime-trace data, so the same query needs 'where depth = 0')
<psykose> wonder why nir_opt_algebraic.py takes 15s
todi1 has joined #freedesktop
todi has quit [Ping timeout: 480 seconds]
<daniels> karolherbst: asan/ubsan have a pretty huge runtime cost - especially in memory - and can't be used together iirc
<daniels> oh, it's asan and msan that are mutually exclusive
<daniels> but yeah, they'll massively slow down the run
<karolherbst> maybe could merge ubsan+asan at least
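For what it's worth, asan and ubsan can already be combined in a single meson build; a sketch of the one-build idea (the build dir name is arbitrary):
    # one sanitizer job instead of separate asan and ubsan ones;
    # address+undefined combine fine, asan+msan do not
    meson setup build-san -Db_sanitize=address,undefined -Db_lundef=false
    meson compile -C build-san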
<dj-death> intel-clc should not be a thing anymore
<dj-death> code is still there but the default build options should avoid building it
<dj-death> we only need mesa_clc instead
<dj-death> psykose: I've noticed it takes a while indeed
<dj-death> psykose: the number of headers included is pretty bad on those files, it's mostly everything pulled by anv_private.h
<psykose> not sure how includes show in -ftime-trace tbh, like 95% of the time is in backend passes, but maybe it takes more time to shake them out
<psykose> though it does add up to a lot, plenty of files with >50% frontend
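For anyone wanting to reproduce those numbers: -ftime-trace is clang-only and writes one JSON trace per object file, which the perfetto UI loads directly (a sketch; paths are illustrative):
    # emit a per-object time trace next to each .o (clang required)
    CC=clang CXX=clang++ meson setup build -Dc_args=-ftime-trace -Dcpp_args=-ftime-trace
    meson compile -C build
    # each object gets a sibling .json; open it in https://ui.perfetto.dev
    # and query the top-level slices (depth = 0) to split frontend/backend time
    ls build/src/intel/vulkan/libanv_per_hw_ver300.a.p/*.json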
pjakobsson has joined #freedesktop
<bentiss> daniels: Anything to add to the new banner I created at https://gitlab.freedesktop.org/admin/broadcast_messages?
ximion has joined #freedesktop
<bentiss> in a way I'd like to be dramatic so we get volunteers, but OTOH, it wouldn't be fair to say we are cutting operations on April 30
<daniels> bentiss: it lgtm, thankyou
<daniels> hopefully we don't end up having to kill fd.o :P
<bentiss> k
<bentiss> well, one solution could be to migrate to gitlab.com and let them deal with the infra :)
<bentiss> but it's probably worse because of vendor lock-in :(
<bentiss> daniels: also I had a chat this morning with whot about AI scrapers, and one solution would be to use cloudflare to chase them out. Would that be something that can be on the table as well?
<daniels> I'd be fine with CloudFlare, or we might still have contacts at fastly as well
<daniels> hmmm, is that notification undismissable? ouch :)
<pinchartl> it sounds like something that shouldn't be dismissed until we find a solution
<bentiss> daniels: I wanted to make a point for a couple of days before making it dismissable :)
<pinchartl> what's needed? sponsorship from a datacenter host? money? other things? all of that?
<bentiss> pinchartl: all of that I think
<bentiss> we were running on datacenter sponsorship, which made things easy, but right now I wish we just had money to spend, so we could handle the situation at the financial level, not at the technical one
<DragoonAethis> bentiss: roughly how many resources does GitLab need?
<DragoonAethis> CPU cores, storage, etc
<DragoonAethis> Runners are going to be a mess anyways, but we have some spare hardware and we might be able to figure out something
<bentiss> DragoonAethis: currently my main concern is the egress data: 56TB in the past month alone. So if we start splitting the runners across various datacenters, this is going to explode
<bentiss> and cost a bunch
<pinchartl> :-O
<bentiss> (many thanks to the AI scrapers as well on that)
<DragoonAethis> by "some" hardware I might mean a rack full of Broadwells
<DragoonAethis> so not great, not terrible
<bentiss> right now we have interesting machines, but they are split up so we can share more load, so if we have a dozen not-great runners, that might still make a difference
<pinchartl> (we should all get back to NNTP and let the tech bros destroy the rest of the internet. then we'll rebuild on the ashes)
<bentiss> heh
<Consolatis> that sounds like a plan
* pinchartl thinks of the last scene of season 1 of Mr Robot
<daniels> well, there was that gopher-v2 thing that was basically exactly what you're describing
<pinchartl> maybe a CI infrastructure designed on the assumption that resources are infinite (I'm not talking about fd.o in particular, but about CI solutions in general) is not sustainable?
<bentiss> DragoonAethis: for gitlab, I'd like to have: a managed DB that I don't have to worry about (the disk used right now is 400GB, and it's growing); then we have 4 gitaly services (each of them between 200 and 500GB of data on disk), ideally on 4 separate machines/nodes; and then there are multiple services like webservers, sidekiq, marge-bot, and the pages webserver, which should all fit in 2
<bentiss> to 3 machines; and last, we need a ton of S3 storage: roughly 65TB
<pinchartl> every time I push a branch to the linux-media CI I feel bad (and that doesn't use the fd.o shared runners)
<bentiss> FTR, I've put all the numbers on a message to the board yesterday, not sure it's public though
<bentiss> (and we should probably bring the technical discussion on a public issue)
<daniels> pinchartl: what would you prefer?
<DragoonAethis> bentiss: The compute I could deliver, the storage and managed DB - not so much
<pinchartl> daniels: I don't know yet. I'm not sure anyone has really thought about alternative architectures (at least I haven't read about any) that would significantly cut CPU time and bandwidth
<DragoonAethis> lemme check what the boxes have, one sec
<pinchartl> but recompiling a kernel from scratch with 10 compilers/archs for every commit... that's just crazy
<psykose> normally caching (like ccache) or incrementally building on top of older build dirs (this fell out of fashion) helps with that, but the fundamental issue is that, to test what you care about, there is always a combinatorial explosion of options
<bentiss> oooh -> https://www.fastly.com/products/storage "zero egress fees"... daniels: if you have contacts at fastly that might be very interesting for us :)
<psykose> multiple build types * archs * test suites = mega time :D
<psykose> there's no way to reduce that without losing coverage of something
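The usual ccache arrangement in a GitLab job looks roughly like this, assuming the runner persists a cache directory between runs:
    # point ccache at a directory the CI cache keeps between jobs; meson
    # picks ccache up automatically when it is in PATH
    export CCACHE_DIR="$CI_PROJECT_DIR/.ccache"
    ccache --max-size=5G
    meson setup build && meson compile -C build
    ccache --show-stats   # the hit rate shows whether the cache survived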
swatish2 has quit [Ping timeout: 480 seconds]
<pinchartl> and caches require more disk space
<psykose> yup
<karolherbst> dj-death: right, but I wanted to look into pch support, because intel_subgroups requires pulling in this other header which adds a significant amount of time to compile times
<pinchartl> but that could be cheaper than CPU time? and maybe also less costly from an environmental point of view? I don't have visibility on that
<daniels> the time-honoured cycle is that people start with shitty but small CI ('I don't need these templates because I don't understand them'), then they have shitty and big CI so we go talk to them, then they fix it to have big but relatively efficient CI
<svuorela> KDE iirc runs their/our gitlab on a couple of hetzner machines. I'm not sure fdo is bigger... though it has weirder test setups, at least for mesa.
<bentiss> svuorela: we are hosting a few kernel repos, and they are big... I wish we could run on just a couple of machines
kj2 has joined #freedesktop
<karolherbst> oh wow... using pch takes the libintel shaders from 1s to 0.1s
<karolherbst> just need to figure out how the details all work there...
<karolherbst> it e.g. complains if the specified CL version differs
<karolherbst> let's see what it does in regards to the ray tracing ones
<Consolatis> I wonder how much of that 56TB egress traffic is from CI and by how much caching on the runner side + binary diffs could lower the amount (assuming most of the CI traffic is container images for the runners)
<mupuf> Consolatis: I would think most of the egress is build artifacts
<mupuf> The GitLab runner does not deduplicate its downloads :s
<bentiss> container images are already cached on the runners. The only info I can get is that we pulled almost 5TB from indico+ci-stats+s3.fd.o (s3.fd.o being the public facing s3)
<bentiss> so we get ~50TB of build artifacts, git pulling and AI scrapers
<bentiss> and registry.fd.o and all the various pages websites we host
<Ford_Prefect> Do we have any sort of outreach happening to get more infra etc.? Anything users can help amplify?
todi1 has quit []
todi has joined #freedesktop
ximion has quit [Remote host closed the connection]
soreau has joined #freedesktop
<soreau> I saw the banner on gitlab about the server provider.. migration doesn't sound like fun
<soreau> It seemed like fd.o gitlab was coming together pretty well on its current space
<soreau> It would be nice if I could do something to help, but I can only offer thanks to everyone who helps maintain the fd.o space, and hope that the transition will somehow change things for the better
<bentiss> soreau: thanks for the kind words :)
<soreau> 👍
jsa1 has quit [Ping timeout: 480 seconds]
haaninjo has joined #freedesktop
<karolherbst> yeah.. looks good from a quick glance. The driver taking the most time to build seems to be intel, hopefully somebody figures out how to make the genxml stuff faster
<bentiss> daniels: anything to add/correct on https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/2011 ?
<bentiss> sigh... I forgot about cloudflare/fastly
<jenatali> bentiss: Out of curiosity do you have any numbers for what replacements might end up costing? I... doubt that I can convince folks to pony up finances but I wouldn't have any chance to even try if there's no budget ask
<bentiss> jenatali: yesterday Arek made a rough guesstimate, and by being overly generous he got between 10k and 15k a month
AbleBacon has joined #freedesktop
<bentiss> I think we can cut this down by a fair bit, but we are talking roughly 100K a year, maybe less, maybe more
<jenatali> 👍
<bentiss> (that's another emoji I can not see ;-) )
<daniels> bentiss: nope, it's super thorough
<bentiss> \o/
<daniels> thank you for the writeup :)
<bentiss> copy/paste from the previous email mostly :)
<daniels> Ford_Prefect: the writeup there is probably the start of the outreach. I think a couple of people pinged a couple of people but it didn't really go anywhere. a lot more people are now pinging a lot more people and trying to figure out who would be able to contribute how much
<daniels> Ford_Prefect: is Asymptotic feeling flush? :)
* bentiss hangs up for today
tzimmermann has quit [Quit: Leaving]
<Ford_Prefect> daniels: haha, if only!
swatish2 has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
random_james has quit [Remote host closed the connection]
random_james has joined #freedesktop
jsa1 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
<mupuf> bentiss: "We don't owe Equinix anything" --> did you mean that Equinix doesn't owe us anything?
<bentiss> mupuf: maybe... It's too late for me to think straight :)
<mupuf> ha ha
<bentiss> feel free to correct these kinds of mistakes
<mupuf> you meant to say that equinix did not have to give us this service, right?
<mupuf> if so, I'll fix it
<bentiss> yeah, thanks
<bentiss> I also hesitated to put something like "any trolling would be moderated and/or start ban actions", but we are a good community, right?
<mupuf> hehe
<karolherbst> bentiss: there was an interesting idea for messing with bots that I saw today: have a URL disallowed in robots.txt and just ban anything that accesses it
<vyivel> i would definitely access that url out of curiosity
<bentiss> karolherbst: good idea, but those bots don't even bother with robots.txt, so how are they going to pull that URL in the first place?
<karolherbst> some random meta html tag
<karolherbst> vyivel: yeah, but we could also add comments to not do that :D
<pinchartl> "do *NOT* press the big red button"
<karolherbst> there are also things like this: https://zadzmo.org/code/nepenthes/
<pinchartl> and iocaine
<karolherbst> but yeah...
<karolherbst> those AI crawlers are a pain
<karolherbst> would be fun to try out a few things post migration
<karolherbst> also once we have better monitoring of those things
<pinchartl> do we have any leverage on (some of) those companies through developers who work there?
karolherbst4 has joined #freedesktop
karolherbst has quit [Read error: Connection reset by peer]
<pinchartl> I suppose openai wouldn't care if they can't access fd.o
<pinchartl> and all the chinese AI companies wouldn't either
karolherbst4 has quit []
karolherbst has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
<DavidHeidelberg> is cleaning up the container registries of forks useful, or is the impact 0.00%?
<DavidHeidelberg> (for the upcoming FDO migration)
<ivyl> bentiss: another CDN we could reach out to is bunny.net. I've heard some good things about them. What would be a good time to contact them? We can possibly even budget something like that.
<pinchartl> speaking of container registry, is there a way to automatically delete older container images ? when I modify the libcamera CI in a way that triggers a container rebuild, I don't need to keep the old container images forever. I try to delete them manually when I remember
<pinchartl> (it won't save much, as we rarely update containers these days, but still)
<karolherbst> now that I *checks notes* eliminated 5 seconds from the mesa compile time, I'm sure this solves all our mesa CI overload issues
<pinchartl> ivyl: thanks
<ivyl> I've been using it on some other gitlab instance. Sadly it looks like it may be disabled on fdo: `This project's cleanup policy for tags is not enabled. Please contact your administrator.`
<DavidHeidelberg> yup
<pinchartl> it could be useful to link to that from the FDO_EXPIRES_AFTER documentation. I initially thought FDO_EXPIRES_AFTER was about cleaning up old images
<pinchartl> ah, if it's not available, then it's not an option :-)
<ivyl> Maybe it can be enabled? It's optional per project anyway.
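If an admin does enable it, GitLab exposes the per-project cleanup policy through the projects API; a sketch with a hypothetical project id:
    # enable tag cleanup for one project (id 1234 is made up)
    curl --request PUT --header "PRIVATE-TOKEN: $TOKEN" \
      "https://gitlab.freedesktop.org/api/v4/projects/1234" \
      --data 'container_expiration_policy_attributes[enabled]=true' \
      --data 'container_expiration_policy_attributes[older_than]=90d'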
mvlad has quit [Remote host closed the connection]
ximion has joined #freedesktop
jsa1 has joined #freedesktop
Kayden has quit [Quit: upgrade distro]
jsa1 has quit [Ping timeout: 480 seconds]
<DragoonAethis> bentiss: another option that comes to mind - purchase the hardware and run it at a colo dc
<DragoonAethis> Extreme one-time cost, pretty chill afterwards
<robclark> isn't that more or less what we had pre-gitlab ;-)
MrBonkers has quit [Quit: Ping timeout (120 seconds)]
MrBonkers has joined #freedesktop
haaninjo has quit [Quit: Ex-Chat]
<__tim> kemper might be out of disk space
<__tim> remote: remote: fatal: unable to write loose object file: No space left on device
<__tim> remote: hint: The 'hooks/post-update' hook was ignored because it's not set as executable.
<__tim> remote: error: remote unpack failed: unpack-objects abnormal exit
<__tim> remote: hint: You can disable this warning with `git config advice.ignoredHook false`.
<__tim> remote: To ssh://kemper.freedesktop.org/git/gstreamer/gstreamer
<__tim> remote: ! [remote rejected] 1.24.12 -> 1.24.12 (unpacker error)
<__tim> remote: error: failed to push some refs to 'ssh://kemper.freedesktop.org/git/gstreamer/gstreamer'
<__tim> sorry, that should've been two lines only
donte has joined #freedesktop
donte has left #freedesktop [#freedesktop]
sima has quit [Ping timeout: 480 seconds]