ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
ultra has quit [Quit: Ultra]
ultra has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
sukrutb has quit [Ping timeout: 480 seconds]
leo60228 has quit [Read error: No route to host]
leo60228 has joined #dri-devel
JohnnyonF has quit []
yuq825 has joined #dri-devel
<DemiMarie> gfxstrand: is it okay if I send you a private message? I have some rather in-depth questions regarding the future of GPUs and security.
mwk[m] has joined #dri-devel
sukrutb has joined #dri-devel
<DemiMarie> In Qubes OS, for instance, security is the absolute highest priority. Right now, Qubes OS is stuck with software rendering, so even a 2x performance hit over bare metal would still be a huge win. Direct-to-firmware submission means that the GPU firmware could come under fire from some pretty high-level attackers. Mobile GPU firmware has been designed to defend against malicious userspace for quite a while now, but most desktop GPU firmware has not, and relying on the security of a closed-source privileged blob does not give me a warm fuzzy feeling. The more protections between guests and the GPU, the better, so long as they do not have the stupendous attack surface that running the shader compiler on the host entails.
<DemiMarie> Intel SR-IOV is on by default in Windows Sandbox IIUC, so that should be pretty safe. Apple has always cared a LOT about app sandbox escapes, and while the AGX driver must go off of what alyssa and lina could reverse-engineer, I consider the risk of a vulnerability due to this to be comparable to (if not less than!) the risk of a memory corruption hole in one of the other drivers. AMD and especially Nvidia are quite concerning, though.
sukrutb has quit [Remote host closed the connection]
aravind has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
heat_ has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
heat_ has joined #dri-devel
ondracka has joined #dri-devel
ondracka has quit [Quit: Leaving]
bmodem has joined #dri-devel
JohnnyonFlame has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
sgruszka has joined #dri-devel
itoral has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
sima has joined #dri-devel
pallavim has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
tzimmermann has joined #dri-devel
Company has quit [Quit: Leaving]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
<doras> jenatali: I see. Thanks for letting me know. I wasn't familiar with this guideline.
<lina> DemiMarie: I already found and reported a shader-to-full-system-control CVE in Apple's GPU stack but that one never affected Linux ^^
frieder has joined #dri-devel
<dolphin> DemiMarie: I guess the overall challenge is that until a few years ago GPUs were designed mostly with performance in mind; security has only been added in the last half-dozen years
<lina> Given the kernel side is written in Rust I'd be surprised if you can find something to actually exploit from userland directly in the kernel driver other than a DoS (there's plenty of ways to DoS the system anyway...)
<tjaalton> dcbaker: is mesa 23.0.3 last of the series?
<lina> And Apple did fix the one hole in their firmware privilege separation I could find...
<dolphin> If you look at the hardware, there was global address space used by all applications on older hardware, and I think nouveau still doesn't care to zero VRAM passing between different applications.
<lina> We do zero... the one thing that I'm not so sure about is cross-process tile memory leakage. I don't know whether the shader cores/firmware guarantee tile memory is cleared when switching between VM contexts. We could mitigate that using a shader program prelude in kernel-managed, GPU-RO memory, although it would have some performance cost and complicate the UAPI...
jfalempe has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
rauji___ has quit []
aravind has quit [Ping timeout: 480 seconds]
mauld has joined #dri-devel
aravind has joined #dri-devel
Jeremy_Rand_Talos__ has joined #dri-devel
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
pochu has joined #dri-devel
rasterman has joined #dri-devel
<jani> lumag: https://intel-gfx-ci.01.org/ and maybe #intel-gfx-ci or ping DragoonAethis
<MrCooper> karolherbst_: the end of .gitlab-ci/container/cross_build.sh has a workaround for that
lynxeye has joined #dri-devel
pcercuei has joined #dri-devel
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
vliaskov has joined #dri-devel
kts has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
cmichael has joined #dri-devel
aravind has joined #dri-devel
sarahwalker has joined #dri-devel
itoral has quit [Remote host closed the connection]
<MrCooper> tjaalton: looks like chromium doesn't take the Mesa version into account correctly: https://bugs.chromium.org/p/chromium/issues/detail?id=1442633
<karolherbst_> MrCooper: it doesn't. That workaround is because the foreign packages would install foreign python. I need native and foreign, but they install both into the same locations
karolherbst_ is now known as karolherbst
<MrCooper> gotcha
<karolherbst> I already use that workaround, but with rustc/bindgen we need both and it's annoying :(
kts has quit [Quit: Konversation terminated!]
bmodem1 has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
<DragoonAethis> lumag: Re: Patchwork tests, we don't have the setup documented anywhere unfortunately, but the tl;dr is that you can access Patchwork via its API: https://gitlab.freedesktop.org/patchwork-fdo/patchwork-fdo/-/blob/master/docs/rest.rst - from that you pick up events like new series/revs, trigger a pipeline on the CI system of your choice, pull the latest target kernel for a given project, apply mbox (grab it from /api/1.0/series/{id}/revisions/{rev}/mbox/), build, run tests, report success/failures over the API for whichever stage failed
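The flow DragoonAethis outlines (pick up a new series/revision, fetch its mbox, run apply/build/test, report which stage failed) can be sketched as below. This is a minimal illustration, not Intel's tooling: only the mbox path pattern comes from the message above, while `SeriesEvent`, `run_pipeline`, the stage names and the `run_stage` callback are hypothetical stand-ins for whatever CI system runs the jobs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SeriesEvent:
    """A new series/revision picked up from Patchwork's event feed (hypothetical shape)."""
    series_id: int
    revision: int


# Illustrative stage names; a real pipeline maps these to CI jobs.
STAGES = ("apply", "build", "test")


def mbox_url(base: str, event: SeriesEvent) -> str:
    """URL of the mbox for one revision, following the path quoted in the message."""
    return (f"{base.rstrip('/')}/api/1.0/series/{event.series_id}"
            f"/revisions/{event.revision}/mbox/")


def run_pipeline(event: SeriesEvent,
                 run_stage: Callable[[str, SeriesEvent], bool]) -> dict:
    """Run the stages in order and stop at the first failure.

    The returned dict is what would be reported back over the Patchwork API;
    "stage" names the stage that failed, or is None on success.
    """
    failed: Optional[str] = None
    for stage in STAGES:
        if not run_stage(stage, event):
            failed = stage
            break
    return {
        "series": event.series_id,
        "revision": event.revision,
        "result": "failure" if failed else "success",
        "stage": failed,
    }
```

Stopping at the first failing stage keeps the report specific ("build failed") rather than a single pass/fail bit.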
<DragoonAethis> On the hardware testing, good starting points are either https://mupuf.org/blog/2021/02/08/setting-up-a-ci-system-preparing-your-test-machine/ (ping mupuf) or https://www.lavasoftware.org/index.html (ARM-friendly) - we have some in-house scripting to make those tests go brrr, but we're moving towards a cleaner setup with b2c et al
<mupuf> DragoonAethis, lumag: FYI, we now also support arm64 in our infra
<mupuf> riscv64 has experimental support
<mupuf> and armv6 should also be supported very soon
<DragoonAethis> mupuf: Uh, v6, not v7?
<mupuf> DragoonAethis: if you have a v7 CPU, it will run v6 code
macromorgan is now known as Guest1741
macromorgan has joined #dri-devel
<mupuf> the difference between the two is more related to the NEON instructions, which isn't something that impacts system applications
<DragoonAethis> Yeah, it's just odd, I remember the last v6 boards that did any sort of graphics being ancient and slow
<mupuf> oh, right, yeah
<mupuf> just wanted to go for maximum compatibility
<mupuf> but b2c is not designed for gfx testing, is it? It is designed for running containers from an initrd as fast as possible
<mupuf> if people want to use it to test NVME drivers, it's up to them ;)
Guest1741 has quit [Ping timeout: 480 seconds]
<DragoonAethis> Well, if you're testing NVMe drivers on ARMv6, you'll have a bad time with CPUs not keeping up either :D
<DragoonAethis> But sure, I guess Pi Picos etc could work in there
pochu has quit [Quit: leaving]
pochu has joined #dri-devel
macromorgan is now known as Guest1743
macromorgan has joined #dri-devel
kts has joined #dri-devel
macromorgan is now known as Guest1744
macromorgan has joined #dri-devel
Guest1743 has quit [Ping timeout: 480 seconds]
Guest1744 has quit [Ping timeout: 480 seconds]
bmodem1 has quit [Ping timeout: 480 seconds]
<mupuf> DragoonAethis: lol, yeah, but that's not the point ;)
* mupuf wants to be able to test i2c drivers :D
<lumag> DragoonAethis, thanks for the pointer. I didn't know about the /project/N/events/ API.
<lumag> mupuf, we mostly default to a custom version drm-ci via gitlab. But I will take a look, thanks
<mupuf> lumag: FYI, I started working on a patchwork to gitlab bridge some time ago: https://gitlab.freedesktop.org/mupuf/patchwork-bridge/-/blob/bridge/bridge.py
<mupuf> the bridge will apply the patches on top of a tree and push that to a branch when a new revision comes
<mupuf> and it will report back to patchwork when it is done, so that an email can be sent from there
<mupuf> we can move that to gfx-ci/patchwork-bridge if you want to work on it :)
<lumag> mupuf, probably we can use gitlab actions to ping patchwork?
<mupuf> you mean scheduled pipelines? yeah, that's what the project does
* mupuf would suggest not putting this pipeline in drm-ci though
<mupuf> it would be best to have one project responsible for the mirroring for multiple projects, rather than having every project replicate the same thing over and over again
<mupuf> drm-ci already did that by importing mesa CI verbatim, let's not add more :D
<DragoonAethis> mupuf: I'm working on something a bit weirder right now, because we have A Mess(tm)
* mupuf is intrigued
<DragoonAethis> Basically we've got GitHub for all the internal projects and GitLab for some public stuff
<DragoonAethis> And then Patchwork that talks with mailing lists and shares some GitLab projects
<DragoonAethis> Internal rules and governance for attaching GitHub runners makes it somewhat painful to work with, we can't host some CI bits on the public GitLab instance either because I'd like to share as much infra as possible between internal and public flows
<DragoonAethis> And we have 5 people who know Jenkins already and 1 who knows GitHub/Lab CI
<DragoonAethis> So
<DragoonAethis> Jenkins <-> Forge Bridge <-> GitHub/GitLab/Patchwork/Gerrit/whatever you want
<DragoonAethis> Basically an API that keeps track of all the triggers, started pipelines, reporting back results using the appropriate upstream APIs
<DragoonAethis> And most importantly, it keeps track of what processes are in progress - all the public CI queue stuff is "fake", so to speak
<DragoonAethis> And we don't really know where each process starts and where it breaks until someone comes and yells at us
<DragoonAethis> And that's supposed to finally be the one source that just binds it all into sensible queues, what is running and what stopped unexpectedly, etc
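The bookkeeping described here (one place that knows which pipelines were triggered, which are still running, and which went quiet without reporting) can be sketched as a small in-memory tracker. Everything below (`PipelineTracker`, the state names, the heartbeat timeout) is a hypothetical illustration, not the actual Forge Bridge.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pipeline:
    source: str              # e.g. "patchwork", "gitlab", "github"
    ref: str                 # upstream identifier (series/revision, MR, PR, ...)
    state: str = "queued"
    last_seen: float = 0.0   # time of the last heartbeat


@dataclass
class PipelineTracker:
    timeout: float           # seconds without a heartbeat before a run counts as stalled
    runs: Dict[str, Pipeline] = field(default_factory=dict)

    def trigger(self, run_id: str, source: str, ref: str, now: float) -> None:
        self.runs[run_id] = Pipeline(source, ref, "queued", now)

    def heartbeat(self, run_id: str, now: float) -> None:
        run = self.runs[run_id]
        run.state, run.last_seen = "running", now

    def finish(self, run_id: str, ok: bool) -> Pipeline:
        run = self.runs[run_id]
        run.state = "success" if ok else "failure"
        return run  # the caller reports this through the matching upstream API

    def stalled(self, now: float) -> List[str]:
        """Runs still queued/running whose last heartbeat is older than the timeout."""
        return sorted(rid for rid, r in self.runs.items()
                      if r.state in ("queued", "running")
                      and now - r.last_seen > self.timeout)
```

The `stalled()` query is the part that answers "what stopped unexpectedly" without waiting for someone to notice.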
JohnnyonFlame has joined #dri-devel
<DavidHeidelberg[m]> jljusten: ping, you have nice piglit Debian package, what would you say to push piglit nightlies into experimental?
<hakzsam> alyssa: gfxstrand: would you be able to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23254 please?
smilessh has joined #dri-devel
smiles_1111 has quit [Ping timeout: 480 seconds]
<mupuf> DragoonAethis: good luck...
MajorBiscuit has joined #dri-devel
Company has joined #dri-devel
<tjaalton> MrCooper: you mean comment #15?
<MrCooper> yeah, also in the Fedora bug report people are hitting the issue after upgrading to a newer upstream version
<MrCooper> so I'm afraid changing (PACKAGE_)VERSION won't help after all
<tjaalton> meh
<MrCooper> I do wonder if Mesa couldn't check whether the cached data is compatible though
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
heat_ has quit [Read error: No route to host]
heat has joined #dri-devel
<DragoonAethis> mupuf: Thanks, I'm gonna need it ;-;
kts has quit [Quit: Konversation terminated!]
fxkamd has joined #dri-devel
junaid has joined #dri-devel
yuq825 has quit []
Haaninjo has joined #dri-devel
<alyssa> hakzsam: done to the best of my ability
<hakzsam> thanks!
heat_ has joined #dri-devel
kzd has joined #dri-devel
heat has quit [Read error: No route to host]
aravind has quit [Ping timeout: 480 seconds]
<alyssa> any time
<alyssa> well,
<alyssa> most times
<alyssa> 9am-5pm Mon-Fri, excluding stat holidays illness or planned vacations
<alyssa> weird express "any time",.
<alyssa> expression
<DemiMarie> lina: I strongly recommend adding the shader prelude. If it turns out to be unnecessary then it can just be removed, but it seems that adding it would be a uAPI break and Linus doesn’t like those.
aravind has joined #dri-devel
idr has joined #dri-devel
rauji___ has joined #dri-devel
<gfxstrand> DemiMarie: I don't think your notion that mobile GPU vendors are doing a good job at security holds up. Just a couple of years ago, someone had a WebGPU app which could reboot your phone.
<alyssa> gfxstrand: you sure it wasn't a feature
<alyssa> WebReboot?
<alyssa> if we're allowed a WebUSB why not expose all the other drivers to arbitrary JS right?
<DavidHeidelberg[m]> is somehow implied, that we still have problem with running spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-uint_uvec4-double_dmat3x4_array2-position.shader_test inside mesa and we need to patch piglit 3 years later (read with spongebob voice) after initial introduction ? https://gitlab.freedesktop.org/mesa/mesa/-/commit/576f7b6ea52d39406df119b336396bfa41628726
<DavidHeidelberg[m]> MrCooper: ^ do you recall how serious it was? :P I guess with flake/fails/skips list we can drop the patch now
<MrCooper> it may no longer be needed in practice, though conceptually running a slightly different set of tests each time still seems odd
<lina> DemiMarie: I don't think we should add anything until we understand if there is a problem at all or not. I think the basic mitigation can be done without a UAPI break, you just need UAPI changes to get back some of the efficiency you lost.
kts has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
pallavim has joined #dri-devel
pochu has quit [Quit: leaving]
tzimmermann has quit [Quit: Leaving]
<DemiMarie> lina: fair! Hopefully this can be understood before merge.
kts has joined #dri-devel
<DemiMarie> gfxstrand: people are at least looking.
sgruszka has quit [Remote host closed the connection]
kts has quit []
<gfxstrand> DemiMarie: They're looking on desktop, too. In fact, desktop has typically been a decade ahead of mobile when it comes to a lot of basic security stuff like per-context page tables.
alyssa has left #dri-devel [#dri-devel]
<a1batross> Hi there! Did anybody started mainlining rk628d bridge?
<gfxstrand> My point is that the notion that mobile is more secure is a total misnomer.
Haaninjo has quit [Ping timeout: 480 seconds]
<DemiMarie> gfxstrand: I was not aware of that, thanks! That might alleviate some of mwk @mwk:invisiblethingslab.com’s concerns.
frieder has quit [Remote host closed the connection]
<DavidHeidelberg[m]> MrCooper: slightly different? what does that mean?
<gfxstrand> That doesn't mean there are no bugs. Of course there are bugs. Some of them may even have security implications. With closed-source firmware our ability to find and fix bugs is indeed limited. But the notion that Mali and Qualcomm have better firmwares than Nvidia or AMD just because they're mobile is just nonsense.
<MrCooper> DavidHeidelberg[m]: as an example for illustration, say there are 6 tests A B C D E F; each piglit invocation would randomly run a different subset of those, e.g. "B C F" then "A B E"...
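MrCooper's example can be made concrete with a toy selector. This is not how piglit picks its tests; `pick_subset` and the fixed seed are assumptions used only to show the difference between an unseeded choice (a different subset on each run) and a seeded one (the same subset every time, so CI expectation lists stay stable).

```python
import random
from typing import List, Optional, Sequence


def pick_subset(tests: Sequence[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k tests.

    With seed=None, random.Random seeds itself from system entropy, so the
    choice differs between runs; with a fixed seed it is reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(tests), k))


TESTS = ["A", "B", "C", "D", "E", "F"]
```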
<DavidHeidelberg[m]> MrCooper: we could fix that to run everything? ... and then don't use it in CI anyway because of how long it will run :D
<MrCooper> yeah, not sure how bad it would be
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
<MrCooper> oh, and actually I don't remember if the subset changes per invocation, or just per build of piglit
fab has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
<robclark> gfxstrand, DemiMarie: most of the security issues I see getting reported thru bug bounty program are just kernel issues (race conditions where userspace can trigger UAF, etc.. handles are horrible).. but I'd probably not be super excited about userspace cmd submission from vm guest directly to gpu fw
heat_ has quit [Read error: No route to host]
heat_ has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
junaid has quit [Remote host closed the connection]
djbw has joined #dri-devel
<gfxstrand> dschuermann, cwabbott: Any suggestions for papers to read about SSA spilling?
<gfxstrand> I found the Braun and Hack paper that ACO cites. I guess I'll start by reading that one.
<gfxstrand> Looks like ACO and ir3 both cite that paper. I guess that's the one, then.
cmichael has quit [Quit: Leaving]
agd5f_ has quit []
agd5f has joined #dri-devel
djbw has quit [Read error: Connection reset by peer]
<DemiMarie> robclark: exactly!
Major_Biscuit has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
djbw has joined #dri-devel
benjaminl has joined #dri-devel
sarahwalker has quit [Remote host closed the connection]
<cwabbott> gfxstrand: yes, that's the one
<cwabbott> at least for once there's actually a paper to point to that more-or-less describes the algorithm you actually need to use and isn't totally impractical
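The core of the Braun and Hack approach is Belady's MIN policy driven by next-use distances: when more values are live than there are registers, keep the ones that will be used soonest and evict the rest, storing only those that are still needed and not already in memory. The sketch below applies that rule to a single straight-line SSA block. It assumes every instruction defines exactly one value and reads at most `k` operands, treats nothing as live-out, and leaves out the paper's handling of loops, block-entry sets and spill placement. The `(dst, srcs)` IR and the output format are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Block = Sequence[Tuple[str, List[str]]]   # (destination, sources) per instruction


def next_use(block: Block, value: str, start: int) -> float:
    """Distance from `start` to the next instruction reading `value` (inf if none)."""
    for j in range(start, len(block)):
        if value in block[j][1]:
            return j - start
    return math.inf


def spill_block(block: Block, k: int) -> List[Tuple[str, str]]:
    """Return ("def", v), ("spill", v) and ("reload", v) operations in order."""
    regs = set()      # values currently held in registers (W in the paper)
    spilled = set()   # values that already have a copy in memory (S in the paper)
    out = []

    def limit(at: int, budget: int, spills: list) -> None:
        # Keep the `budget` values used soonest; evict everything else.
        nonlocal regs
        ordered = sorted(regs, key=lambda v: (next_use(block, v, at), v))
        for v in ordered[budget:]:
            # Dead values (no further use) are simply dropped without a store.
            if v not in spilled and next_use(block, v, at) != math.inf:
                spills.append(("spill", v))
                spilled.add(v)
        regs = set(ordered[:budget])

    for i, (dst, srcs) in enumerate(block):
        reloads = [s for s in dict.fromkeys(srcs) if s not in regs]
        regs.update(reloads)
        spilled.update(reloads)      # a reloaded value still has its memory copy
        spills: list = []
        limit(i, k, spills)          # make room for the operands
        limit(i + 1, k - 1, spills)  # make room for the result
        out.extend(spills)
        out.extend(("reload", v) for v in reloads)
        out.append(("def", dst))
        regs.add(dst)
    return out
```

With two registers, `a` is needed again only after `b`, `c` and `d`, so it is the one evicted (and later reloaded) when `c` is defined.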
lynxeye has quit [Quit: Leaving.]
<jenatali> alyssa: Ping for !23173
sukrutb has joined #dri-devel
Major_Biscuit has quit [Ping timeout: 480 seconds]
eukara has joined #dri-devel
<dcbaker> tjaalton: I've cut one last release just now, so 23.0.4 will be the last. Sorry for being *so* late with it.
<tjaalton> dcbaker: okay, thanks!
alyssa has joined #dri-devel
<alyssa> anholt_: Is nir-to-tgsi expected to handle instructions where multiple sources are indirect?
<alyssa> It doesn't seem to work, since addr_reg[0] gets used twice (with the second index clobbering the first), but maybe this was impossible to happen before I went playing with nir_reg
AndrewR has joined #dri-devel
<eric_engestrom> dcbaker: do you feel like taking on 23.2?
<dcbaker> yeah, I can take on 23.2
<eric_engestrom> (no worries if you want to skip that one)
<eric_engestrom> dcbaker: see !23205 for the 23.2 schedule
<tjaalton> +1 for early august release :)
<tjaalton> early-to-mid
<robclark> alyssa: the hw that I'm familiar with that has indirect reg access only has a single address register.. so you can't have an instruction that uses two different addr reg values
<alyssa> robclark: Hmm, ok
<alyssa> Well, this pass is mostly for ir3's benefit so I can avoid this at a NIR level
<alyssa> Maybe I botched locals_to_reg and there's a reason it didn't do this before
<alyssa> Oh.
<alyssa> The real reason is that locals_to_reg generates moves and doesn't coalesce them, whereas my new thing can coalesce the moves..
<alyssa> That'd do it, I guess
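robclark's constraint (a single address register, so at most one indirect operand per instruction) can be met by copying all but one indirect source into temporaries first, which is roughly what the uncoalesced moves from locals_to_reg provided implicitly. The toy pass below shows the idea on a made-up IR where a source is either a plain name or an `(array, index)` pair; it is a hypothetical illustration, not the NIR or ir3 code.

```python
from itertools import count
from typing import List, Tuple, Union

Src = Union[str, Tuple[str, str]]      # "x" (direct) or ("arr", "i") (indirect)
Instr = Tuple[str, str, List[Src]]     # (opcode, destination, sources)


def is_indirect(src: Src) -> bool:
    return isinstance(src, tuple)


def split_indirect_sources(instrs: List[Instr]) -> List[Instr]:
    """Ensure each instruction reads at most one indirect source by
    moving the extra ones into fresh temporaries beforehand."""
    temps = count()
    out: List[Instr] = []
    for op, dst, srcs in instrs:
        new_srcs: List[Src] = []
        seen_indirect = False
        for src in srcs:
            if is_indirect(src) and seen_indirect:
                tmp = f"t{next(temps)}"
                out.append(("mov", tmp, [src]))  # this mov uses the address register on its own
                new_srcs.append(tmp)
            else:
                seen_indirect = seen_indirect or is_indirect(src)
                new_srcs.append(src)
        out.append((op, dst, new_srcs))
    return out
```

Coalescing those moves away afterwards is exactly what reintroduces the two-indirect-source case, so a pass like this has to run after (or be respected by) the coalescing step.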
<alyssa> robclark: How competent is ir3's backend copyprop of array registers, btw?
<robclark> you can throw extra mov's at the frontend, no prob
<alyssa> sounds good
<robclark> (also, in what case is ir3 involved with anything that uses ntt?)
<alyssa> I have helpers to chase movs at NIR->backend time, but they're really geared for "legacy" backends ... If ir3 can just eat the moves in the frontend it's probably cleaner
<alyssa> (It's not. I'm ripping out nir_register which means I have to worry about every backend.)
<robclark> ahh
<alyssa> (Today I'm porting nir_lower_locals_to_reg to the new intrinsics and debugging fallout on softpipe+nir-to-tgsi since)
<eric_engestrom> dcbaker: don't forget to post the 23.0.x on the website too :)
<eric_engestrom> (`./post_release.py 23.0.1` and adjust the date to when it actually was)
<dcbaker> thanks for the reminder! I always forget about the website :(
oneforall2 has quit [Remote host closed the connection]
Haaninjo has joined #dri-devel
oneforall2 has joined #dri-devel
Danct12 has joined #dri-devel
jewins has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
<DemiMarie> robclark: the best solution would be for the GPU firmware to be open source and subject to public security review. Ideally formal methods would be used to prove at least the absence of runtime errors, such as memory corruption or undefined behavior.
<airlied> https://arxiv.org/abs/2305.12784 even the hw doesn't like you :-P
<DemiMarie> IMO data-dependent power consumption is as much a vulnerability as data-dependent timing.
<DemiMarie> Or DVFS needs to be disabled.
<DemiMarie> If DVFS is enabled and data-dependent power consumption is present, there is a security problem unless the DVFS is based on something (like instruction counts) that is data independent.
<alyssa> airlied: interesting paper
JohnnyonF has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
cheako has joined #dri-devel
Leopold_ has quit [Remote host closed the connection]
Leopold has joined #dri-devel
BobBeck9 has quit []
BobBeck has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
junaid has quit [Remote host closed the connection]
Lyude has quit [Quit: Bouncer restarting]
<karolherbst> is there a way to make the kernel build system not append silly things like "-dirty" to my kernel version?
Lyude has joined #dri-devel
<alyssa> karolherbst: yes
<alyssa> CONFIG_LOCALVERSION_AUTO
<karolherbst> so I set that to y and it's doing something sane?
<alyssa> karolherbst: no, unset it
<alyssa> that's the CONFIG_APPEND_SILLY_THINGS option
<karolherbst> well.. it's not set for me
<alyssa> uhhh
<karolherbst> maybe it's something inside installkernel doing it.. dunno
Duke`` has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
benjaminl has joined #dri-devel
JohnnyonF has quit [Ping timeout: 480 seconds]
heat_ has quit [Remote host closed the connection]
heat has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
macromorgan is now known as Guest1777
macromorgan has joined #dri-devel
eloy_ has quit [Ping timeout: 480 seconds]
benjamin1 has joined #dri-devel
eloy_ has joined #dri-devel
Guest1777 has quit [Ping timeout: 480 seconds]
benjaminl has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
rauji___ has quit []
<jenatali> How upset would people be if I added a nir option to accept mediump in the backend? I.e. to disable the late opts that turn it into actual 16bit converts
<alyssa> jenatali: define "mediump"
<alyssa> x2ymp conversions?
<alyssa> or something else?
<jenatali> Yeah
<alyssa> meh
<jenatali> Just the conversions
<alyssa> I really want to kill off nir_shader_compiler_options but failing that, .. yeah
<alyssa> if nobody has deleted options->intel_vec4 after all this time
<alyssa> disabling f2fmp lowering is mild.
<jenatali> Unless you want to move it into args to the algebraic opt directly, I'd be fine with that too
<alyssa> yeah that's what I'm waffling about
<alyssa> The way nir_opt_algebraic was supposed to work is that all the drivers would have their own passes
<alyssa> but.. whoops
<jenatali> Basically, DXIL has a bit which says whether 16bit types should be min-precision or native, and if the only thing that produces 16-bit types are f2fmp then I don't want to set the native bit
<alyssa> sure
<alyssa> I think there's a possible future where nir_opt_algebraic.py just generates lists of rules based on an options dict
<alyssa> and then each driver would define their own backend_nir_opt_algebraic passes inheriting those rules (setting the options they need and also appending their own rules)
<jenatali> I like the sound of that, though there's some benefit to be had by making a large rule list instead of a bunch of independent passes
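The idea of generating per-driver rule lists from an options dict, instead of checking nir_shader_compiler_options at runtime, might look roughly like this. Everything here is hypothetical: the rule tuples only loosely imitate the search/replace style of nir_opt_algebraic.py, and the option name `lower_mediump` and the helper `rules_for` are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (search, replace, required option or None). Option names are invented.
Rule = Tuple[object, object, Optional[str]]

BASE_RULES: List[Rule] = [
    (("iadd", "a", 0), "a", None),                     # always valid
    (("imul", "a", 1), "a", None),                     # always valid
    (("f2fmp", "a"), ("f2f16", "a"), "lower_mediump"), # only if the backend wants real 16-bit converts
]


def rules_for(options: Dict[str, bool],
              extra: Sequence[Rule] = ()) -> List[Tuple[object, object]]:
    """Build one driver's rule list at generation time: keep the base rules
    whose option is enabled, then append the driver's own rules."""
    def enabled(opt: Optional[str]) -> bool:
        return opt is None or options.get(opt, False)

    selected = [(s, r) for s, r, opt in BASE_RULES if enabled(opt)]
    selected += [(s, r) for s, r, opt in extra if enabled(opt)]
    return selected
```

Each generated list would then become one driver-specific pass, which is where jenatali's concern about many small passes versus one large rule list comes in.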
<karolherbst> soo.. like the nir_shader_compiler_options stuff?
<alyssa> karolherbst: nir_shader_compiler_options is checked at runtime
<karolherbst> right...
<alyssa> jenatali: nir_opt_algebraic.py would realistically still be a big rules list
<jenatali> And I'd probably want some way to keep some options at runtime
<karolherbst> mhhh
<alyssa> anyway this is just kinda me daydreaming
<alyssa> because my actual big thing right now is torching nir_register
<karolherbst> good luck
<jenatali> Looking forward to it
<alyssa> which is the NIR equivalent of rolling a boulder uphill
<alyssa> pretty close to the point where it becomes blocked on "convert 15 drivers" instead of plumbing
<jenatali> That analogy implies someone is going to bring it back
<karolherbst> I can rely on you pinging me on more stuff I have to fix in codegen to make that reality?
<alyssa> jenatali: No, it implies it's liable to roll down, smack me in the face, and make me roll down with it like in the cartoons
<alyssa> karolherbst: mais oui
<jenatali> :P
<karolherbst> I wonder if I'll really figure out a way to move all the stupid lowering post SSA or if I just torch codegen once NAK is done
<karolherbst> though it's probably fine.. most of those passes are SSA compatible already
<karolherbst> and some are obsolete
pcercuei has quit [Quit: dodo]
<alyssa> jenatali: the good news is that, like in cartoons, it's totally harmless to me
<jenatali> Heh
<DemiMarie> How does Mesa handle running out of memory?
<karolherbst> we try to handle malloc fails, but it's all pointless as malloc doesn't fail anyway
Leopold___ has joined #dri-devel
<robclark> same way everything else does... crash!
<karolherbst> well.. you wanna kill processes on high memory pressure
<karolherbst> but yeah...
<karolherbst> the way things work on linux is, that... you can't handle geoing out of memory anyway
<karolherbst> *going
<robclark> "cooperative-low-memory-killing"
<zf> unless of course you're running out of something other than physical memory, or you're not on linux :-)
<alyssa> karolherbst: speaking of my NIR reworks, I think https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23208 needs a marge assign
<karolherbst> ohh right.. I wanted to test that first
benjaminl has joined #dri-devel
<Hi-Angel> Ohhh
<Hi-Angel> I'm stupid
<Hi-Angel> Never mind, I'm just still waiting for the 22.3.x release
<Hi-Angel> I thought that was it, sorry
Leopold has quit [Ping timeout: 480 seconds]
<Hi-Angel> My friend is just on Fedora, which doesn't update to the next major Mesa release yet, so I'm waiting for the previous bugfix release to tell my friend she can update her system
<Hi-Angel> So, anyway, sorry for the confusion
<robclark> hmm, fedora is on 23.0.3
<Lynne> speaking of OOM, I wish linux had a mode to make malloc fail
<Hi-Angel> Oh, then wait
<Hi-Angel> I am right
<karolherbst> Lynne: you can disable overcommitting
<Lynne> that would make all the effort we spend on ffmpeg surviving that worth it
<Hi-Angel> dcbaker: yeah, right, the previous 23.0.3 release was a month ago, and the 23.0.4 from yesterday didn't include the commit :c
benjamin1 has quit [Ping timeout: 480 seconds]
<airlied> Hi-Angel: was considering pushing 23.1.1 into f38
<karolherbst> Lynne: thing is.. sometimes you overcommit on purpose
<Hi-Angel> airlied: awww, nice!
<karolherbst> Lynne: there are also issues in regards to fork()
<zf> there is also ulimit -v
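The two knobs mentioned here can be poked at from Python on Linux. The sketch reads the overcommit policy from /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting, where allocations can actually fail) and shows the `ulimit -v` route by running a child interpreter with RLIMIT_AS capped, so an oversized allocation fails up front with MemoryError. The 1 GiB limit and 1.5 GiB allocation are arbitrary assumptions, and this is Linux-only.

```python
import resource
import subprocess
import sys

OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "strict"}


def overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Current vm.overcommit_memory policy as a name."""
    with open(path) as f:
        return OVERCOMMIT_MODES[int(f.read().strip())]


# Child program: try to allocate 1.5 GiB, more than the limit set below.
CHILD = """
try:
    buf = bytearray(3 << 29)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def try_alloc_with_limit(limit_bytes: int = 1 << 30) -> str:
    """Run CHILD with RLIMIT_AS (what `ulimit -v` sets) capped at limit_bytes."""
    def cap_address_space() -> None:
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    result = subprocess.run([sys.executable, "-c", CHILD],
                            preexec_fn=cap_address_space,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

The catch karolherbst points out still applies: a hard address-space cap also breaks programs that reserve large virtual ranges on purpose.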
<dcbaker> hi-angel: the release was ready last week, but I got busy and didn't make it. Since it was fully validated I made it today as-is. If I pulled more patches today I'd restart the validation cycle and it would be a couple days till the release happened, when there were already a number of really important fixes queued
<dcbaker> I can make a 23.0.5 if there's critical stuff that needs to go in and cut one more release
<Hi-Angel> I see
<Hi-Angel> No, it's not critical, basically I don't think it affects anyone but one user…
<Hi-Angel> Alrighty then… will be waiting for 23.1.1 to get into f38 C:
<karolherbst> anyway... all of the malloc error handling is purely cosmetic
<karolherbst> speaking of overcommiting, Kayden: the SVM work on iris is kinda cursed, because I don't think people will like it if the driver unconditionally allocates like multiple TB of virtual memory... so maybe frontends need to opt in on a screen level or something...
<karolherbst> I kinda wished all of that wouldn't be so painful to implement
<Kayden> yeah, that sounds kinda awful
<karolherbst> maybe I don't use malloc and do some mmap magic in svmAlloc instead...
<karolherbst> but uhh...
<Lynne> karolherbst: I meant only when it makes sense, not always when running out of memory
<karolherbst> "when it makes sense"?
<Kayden> TBH I'm not really sure why you need to do that
<karolherbst> Kayden: well.. the thing is, I have to allocate memory which has the identical address on the GPU and CPU side
<Kayden> there are really only a couple places where we have specific buffers that have to be in certain places
<Lynne> yeah, when the kernel decides to OOM you
<karolherbst> and I can't really fail it
<Lynne> gives you half a chance of at least closing down
<karolherbst> I can probably call malloc until I hit a hole in the GPUs vm
<Kayden> you could probably just MMAP_FIXED on the CPU side the spot for our binding tables and so on
<karolherbst> or I call mmap and use a good starting address
<Kayden> to make sure that nothing gets malloc'd there
<karolherbst> yeah.. I do all of that
<karolherbst> _but_
<karolherbst> _OTHER is the annoying heap
<Kayden> the one with no restrictions of any kind is the annoying one? :)
<karolherbst> yeah
<karolherbst> because
<karolherbst> malloc shouldn't allocate memory at the same location
<Kayden> it sounds like we just need to make our GPU allocations visible on the CPU side
<karolherbst> right
<Kayden> at one point anv (well, hasvk) used memfds and userptr for allocations
<Kayden> couldn't we just mmap nothing to get a CPU address and then use that as the GPU address?
<Kayden> and then just tell util_vma that there's a BO at this address, instead of letting it pick arbitrarily
<karolherbst> yeah.. that's kinda my alternative idea
<karolherbst> I'd still like to make that opt in, so we don't mmap for nothing
<Kayden> yeah, it'd be nice to know if SVM is required
<Kayden> not sure how expensive that'd be, honestly
<Kayden> it might not be too bad... it's probably not free though
<karolherbst> ehh.. as long as it's all done on allocation it's fine
<Kayden> suballocator and BO caches both do mitigate that, yeah.
<karolherbst> and then I can still just reuse malloc's return value, because every bo allocation cuts a hole via mmap.. it's a lot of mmaps, but... that's probably better than sizing OTHER to e.g. 1TB and cutting such a hole
<karolherbst> it's still expensive, but oh well...
<karolherbst> mhhhh
<karolherbst> yeah no... I mean.. maybe I dig a bit on how intel's stack is doing that, but when I looked into it, it's really hard to follow
<karolherbst> though I don't think they bother with all those different heaps in the first place
rasterman has quit [Quit: Gettin' stinky!]
<karolherbst> Kayden: actually... I think I could really just skip malloc, it's a bit cursed, but it might not be too bad... so if we back bos with mmap, and I use mmap instead of malloc, I could e.g. say mmap to find the first gap at a starting address
<karolherbst> fragmentation would be a problem, but...
<karolherbst> anyway.. I'd check what other drivers do first before I settle on any approach
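Kayden's suggestion (let the CPU side pick the address with mmap, then tell the GPU VA allocator that a BO lives at that same address instead of letting it choose) can be sketched with a toy heap. `ToyVmaHeap` and its methods are hypothetical and only loosely modelled on the idea of a driver VMA heap. A real driver would also have to bind the BO at that GPU address and keep the CPU range reserved so malloc never hands it out.

```python
import ctypes
import mmap
from typing import List, Optional, Tuple


class ToyVmaHeap:
    """First-fit GPU virtual-address heap that can also reserve a caller-chosen range."""

    def __init__(self, start: int, size: int) -> None:
        self.holes: List[Tuple[int, int]] = [(start, start + size)]  # sorted [lo, hi) free ranges

    def alloc(self, size: int, align: int = 4096) -> Optional[int]:
        """Let the heap pick an address (the normal, non-SVM path)."""
        for idx, (lo, hi) in enumerate(self.holes):
            addr = -(-lo // align) * align  # round lo up to `align`
            if addr + size <= hi:
                self._carve(idx, addr, addr + size)
                return addr
        return None

    def alloc_addr(self, addr: int, size: int) -> bool:
        """Reserve exactly [addr, addr + size), e.g. an address the CPU already owns."""
        for idx, (lo, hi) in enumerate(self.holes):
            if lo <= addr and addr + size <= hi:
                self._carve(idx, addr, addr + size)
                return True
        return False

    def _carve(self, idx: int, a: int, b: int) -> None:
        lo, hi = self.holes.pop(idx)
        for piece in ((b, hi), (lo, a)):  # re-insert non-empty leftovers in order
            if piece[0] < piece[1]:
                self.holes.insert(idx, piece)


def cpu_address(buf: mmap.mmap) -> int:
    """Virtual address of an anonymous, writable mapping, obtained via ctypes."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))
```

Usage: `addr = cpu_address(mmap.mmap(-1, size))` followed by `heap.alloc_addr(addr, size)` gives a BO whose GPU and CPU addresses match. Non-SVM allocations keep using `alloc()` and simply avoid the reserved ranges.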
Danct12 has quit [Ping timeout: 480 seconds]
vliaskov has quit [Remote host closed the connection]
Haaninjo has quit [Quit: Ex-Chat]
ngcortes has joined #dri-devel
rsalvaterra has quit [Read error: Connection reset by peer]