daniels changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
<bentiss>
DavidHeidelberg: yeah, s3 is fine. The only question is how to get your file there. If you can upload it through a job, then that's easy enough. If you need to manually upload it, we might need to create a special bucket for manual uploads. But we'll have to get a JWT from gitlab for that...
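A rough sketch of the "upload it through a job" path, assuming the bucket accepts a simple JWT-authenticated PUT; the bucket name, artifact name and token variable here are illustrative, not the actual fd.o setup:
    # Inside a GitLab CI job: push a file to the S3-compatible storage.
    # S3_JWT and the bucket/path are placeholders for whatever the job is
    # actually granted access to.
    curl --fail --retry 3 -X PUT \
         -H "Authorization: Bearer ${S3_JWT}" \
         -T my-artifact.tar.zst \
         "https://s3.freedesktop.org/some-bucket/${CI_PROJECT_PATH}/my-artifact.tar.zst"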
Haaninjo has joined #freedesktop
<mupuf>
bentiss: I would like to add a new test container in mesa, for arm64 testing. This would be used by both imagination and freedreno. Looking at the current code, it seems like we are storing the arm64 rootfs in two places: S3 (for lava), and in a container (bundled with other stuff and extra dependencies) in baremetal. Ideally, I feel like we should have one container that contains all the userspace stuff (stored as zstd:chunked to reduce
<mupuf>
bandwidth when making new versions) which could be used directly by gitlab runners / CI-tron, then have the lava/baremetal jobs create the rootfs they need at run time by extracting the container to NFS and downloading/extracting the kernel they want just like they currently do, except using skopeo/podman rather than wget. Do you have any thoughts on this? Is storage/bandwidth cost the same
<mupuf>
between S3 and the container registry?
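A minimal sketch of the "extract the container to NFS" step mupuf describes, assuming podman is available on the machine that populates the NFS export; the image name and export path are hypothetical:
    # Pull the (hypothetical) arm64 userspace image and unpack its
    # flattened filesystem into the directory the NFS server exports.
    IMAGE=registry.freedesktop.org/mesa/mesa/arm64-rootfs:latest
    podman pull "$IMAGE"
    # "podman create" only records the command, nothing is executed,
    # so /bin/true is just a placeholder.
    CTR=$(podman create "$IMAGE" /bin/true)
    podman export "$CTR" | tar -x -C /srv/nfs/rootfs
    podman rm "$CTR"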
zxrom has joined #freedesktop
<bentiss>
mupuf: no real thoughts on this. storage/bandwidth cost is the same between s3 and the container registry, as it's the same backend (one is serving files directly, the other has a container registry in front of an internal s3)
<mupuf>
good, this matches my expectations
<bentiss>
it just feels like more work to pull the container and extract it, when direct s3 can be cached by a local proxy
<mupuf>
the container can be cached by a local proxy just as well, no?
<bentiss>
yeah but you need to extract it every time
<bentiss>
from an admin pov, I have more visibility on the s3 filesystem when it's used directly, as the paths encode the content directly (no hash checksums in the paths)
<bentiss>
so if we run out of storage, I can more easily pinpoint who is responsible
<mupuf>
oh, actually, that's what happens with the current approach (the rootfs is downloaded and extracted every time), whereas the container would be extracted on the first download and just copied to NFS
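A sketch of that "extract on first download, then just copy" behaviour, keyed on the image digest so a new version triggers exactly one extraction; all paths and the image name are hypothetical:
    IMAGE=registry.freedesktop.org/mesa/mesa/arm64-rootfs:latest
    podman pull "$IMAGE"
    DIGEST=$(podman image inspect --format '{{.Digest}}' "$IMAGE")
    CACHE="/var/cache/rootfs/${DIGEST#sha256:}"
    if [ ! -d "$CACHE" ]; then
        # First time this digest is seen: extract once into the cache.
        CTR=$(podman create "$IMAGE" /bin/true)
        mkdir -p "$CACHE"
        podman export "$CTR" | tar -x -C "$CACHE"
        podman rm "$CTR"
    fi
    # Every job after that only pays for the copy to the NFS export.
    cp -a "$CACHE/." /srv/nfs/rootfs/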
<bentiss>
when using the registry, with everybody copying the image, we have more chances of ending up with stale intermediate images and no idea whether we can delete them
nwm has joined #freedesktop
<mupuf>
The container way should be faster than the current solution, on top of reducing bandwidth needs when new versions get downloaded, thanks to zstd:chunked
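For reference, a sketch of pushing with zstd:chunked (needs a reasonably recent podman; the image name is hypothetical), which is what lets clients fetch only the chunks that changed between versions:
    podman push --compression-format zstd:chunked \
        registry.freedesktop.org/mesa/mesa/arm64-rootfs:latest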
<mupuf>
oh, right, accountability is indeed an issue
<bentiss>
long story short: I won't prevent you from having one common container for everything if that's simpler
<mupuf>
(not that we don't already have the problem with all the other containers we build for mesa)
<bentiss>
for accountability, we should have a script that goes over the entire container registry, looks at the labels and purges the out-of-date images
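A sketch of what such a purge script could look like, using the GitLab container registry API to walk repositories and tags and skopeo to read the image labels; the "expires-after" label, the project id and the token variable are assumptions, not an existing convention:
    #!/bin/sh
    # Walk every registry repository of a project, inspect each tag's
    # labels, and delete images whose (hypothetical) expiry label is past.
    PROJECT=176   # hypothetical numeric project id
    API="https://gitlab.freedesktop.org/api/v4/projects/$PROJECT/registry/repositories"
    curl -s -H "PRIVATE-TOKEN: $TOKEN" "$API" \
      | jq -r '.[] | "\(.id) \(.path)"' \
      | while read -r repo_id path; do
        for tag in $(curl -s -H "PRIVATE-TOKEN: $TOKEN" "$API/$repo_id/tags" | jq -r '.[].name'); do
            expires=$(skopeo inspect "docker://registry.freedesktop.org/$path:$tag" \
                      | jq -r '.Labels["expires-after"] // empty')
            if [ -n "$expires" ] && [ "$(date -d "$expires" +%s)" -lt "$(date +%s)" ]; then
                curl -s -X DELETE -H "PRIVATE-TOKEN: $TOKEN" "$API/$repo_id/tags/$tag"
            fi
        done
    done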
<bentiss>
mupuf: I'm not saying we don't have the problem, just that this part is easier to see
<mupuf>
yeah, got you :)
<mupuf>
alright, so containers win for local reproducibility of the CI environment and bandwidth reduction, but they lose on ease of storage accountability
<bentiss>
sounds like a good summary, yeah
<mupuf>
Ack, got it!
<mupuf>
thanks a lot :)
<bentiss>
no worries (not sure I helped much TBH)
lynxeye has joined #freedesktop
<mupuf>
bentiss: You confirmed that both end up hitting the same place, and you shared your accounting concerns
<daniels>
mupuf: well, if you rewrote LAVA and bare-metal to consume containers rather than a tarball URL, then fine
<daniels>
but short of that, we'd have to have something which would pull the container, tar it up, store it somewhere and access it, which is ... exactly what we do now?
<daniels>
so I'm not really sure what problem that's solving atm
<mupuf>
daniels: yeah, to really reduce duplication, we would have to rewrite both
<mupuf>
What this would solve is imagination and freedreno being able to run tests on arm64
<mupuf>
We consume containers, so replicating what baremetal does would just make the problem worse
<mupuf>
Podman has a rootfs mode, but it would be very inefficient to use
<daniels>
img and fdno are already running tests on arm64 though?
<daniels>
or do you mean enabling those on ci-tron
<mupuf>
Yeah, new jobs using CI-tron
<daniels>
yeah I mean, short of rewriting LAVA/bare-metal to both look exactly like ci-tron does (or completely breaking all the existing jobs, or making them much slower than they already are), the best thing for now is just to replicate what all the other jobs do
<daniels>
we'll never get to the point where DUTs are directly consuming containers because most of them are way too slow (either CPU, I/O, or both - plus limited RAM) to take the hit of unpacking a container image during startup
bmodem has quit [Ping timeout: 480 seconds]
<daniels>
there could be some kind of local cache which takes a container image, flattens it out to a single tarball, and uploads it to some kind of storage service which can be used by the ultimate NFS host, but again that's already what we have
mvlad has joined #freedesktop
<daniels>
(we're not hitting s3 every time because that would obviously be insane - we have multi-tiered local caching proxies and monitoring on hit rates etc - so s3 is primarily just there for better remote visibility)
bmodem has joined #freedesktop
<mupuf>
I'm confused, because what you guys are doing is just a cruder version of a container runtime, and we could replace that to do the exact same thing without changing anything for the DUTs
<mupuf>
I'll give it a try, maybe I've missed something
<daniels>
well sure, whatever you need to do to provide LAVA a URL to a single flat tarball which contains the rootfs
<daniels>
I'm not really sure what the difference would be with what you're proposing but happy to review MRs
<daniels>
(there's no container runtime on the DUTs obviously, they just boot a single rootfs over NFS)
<mupuf>
Yep
<mupuf>
Lava is the one extracting the rootfs, right?
<mupuf>
Makes sense for its design
<mupuf>
So the best we could do is change baremetal not to package the rootfs in its image and instead download it from the registry
<mupuf>
So the rootfs job would push both a container AND a tarball
<daniels>
that sounds like a nice cleanup, yeah
<mupuf>
I looked into creating a custom manifest that would point to the same layer... but it doesn't seem possible
<mupuf>
Maybe we could get an http link to the layer for lava to consume though
<mupuf>
Will investigate
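A sketch of what that investigation could start from: with the standard registry v2 API you can resolve the manifest and get a plain HTTPS URL for a layer blob, which is what LAVA would then download. The repository path is hypothetical, and GitLab's /jwt/auth endpoint is assumed to hand out anonymous pull tokens for public images:
    REPO=mesa/mesa/arm64-rootfs
    TOKEN=$(curl -s "https://gitlab.freedesktop.org/jwt/auth?service=container_registry&scope=repository:$REPO:pull" \
            | jq -r .token)
    MANIFEST=$(curl -s -H "Authorization: Bearer $TOKEN" \
        -H "Accept: application/vnd.oci.image.manifest.v1+json" \
        -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
        "https://registry.freedesktop.org/v2/$REPO/manifests/latest")
    # If the image is built as a single flattened layer, that layer *is*
    # the rootfs tarball LAVA needs.
    DIGEST=$(echo "$MANIFEST" | jq -r '.layers[0].digest')
    echo "https://registry.freedesktop.org/v2/$REPO/blobs/$DIGEST"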
<daniels>
indeed LAVA extracts the rootfs - the job you submit has a URL to a tarball, then when the job is scheduled to a given DUT, the local worker attached to that DUT pulls the tarball and unpacks it to the NFS root path
<mupuf>
Ack
<daniels>
so yeah, LAVA could be modified to unpack a container image to that path instead
<mupuf>
Thanks!
<daniels>
it just hasn't been the low-hanging fruit to date
<mupuf>
Right, and I doubt I'll be hacking on it
<mupuf>
But baremetal seems more ripe for improvements