#dri-devel on 2023-09-07 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:02 <DemiMarie> airlied: which compute workloads (other than ML) do _not_ need FP64?

00:05 <airlied> let me turn that around, what compute workloads exist anymore that aren't ML :-P

00:07 <jtatz[m]> airlied: physics simulations

00:10 co1umbarius has joined #dri-devel

00:11 <alyssa> airlied: ray tracing

00:11 kzd has quit [Quit: kzd]

00:12 columbarius has quit [Ping timeout: 480 seconds]

00:13 <DemiMarie> I did a cell growth simulation when I was an intern in college.

00:14 <airlied> yeah I'm just trolling because running those with the fp64 emulation you get on nearly all modern desktop hw totally sucks

00:15 <HdkR> alyssa: Raytracing is watertight in fp32 space until you're like a million points away from origin, it's fine :P
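A rough sketch of the fp32 precision point above, assuming positions are stored as IEEE 754 single-precision floats. The helper name `f32_ulp` is made up for illustration; it measures the gap between adjacent representable values at a given distance from the origin.

```python
import struct

def f32_ulp(x: float) -> float:
    """Gap between x (as a float32) and the next larger float32.

    Assumes x is positive, finite, and exactly representable in float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    cur = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - cur

for distance in (1.0, 1_000.0, 1_000_000.0):
    print(distance, f32_ulp(distance))
```

At a distance of 1,000,000 units the spacing is 0.0625, so whether that counts as "fine" depends on the scale of the scene.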

00:19 <DemiMarie> Is the Asahi driver blocked on the same drm_sched issues that Xe and PanCSF are?

00:19 <DemiMarie> As an onlooker it seems that drm_sched is a very bad API.

00:20 <airlied> the only thing worse than a shader scheduler design is per-driver scheduler design

00:20 <airlied> it's not really an issue, it's just necessary public design work

00:21 <DemiMarie> “shader scheduler”?

00:22 <DemiMarie> At the very least writing drivers that don’t UAF when unbound should not be so hard.

00:22 <airlied> shared

00:22 <airlied> I cannot type shared anymore, my fingers autocorrect it

00:23 <airlied> you'd think it was simple, but turns out it's not, hence design

00:24 <DemiMarie> Am I correct that the current design is bad?

00:25 <airlied> the current design is fine for the things it was designed for

00:25 <airlied> we are now trying to use it for new things

00:25 <airlied> hence it needs redesign

00:26 <DemiMarie> what about the lifetime problems with Panfrost Asahi Lina pointed out?

00:26 <airlied> we are now trying to use it for new things

00:26 <DemiMarie> and some old drivers had bugs

00:26 <airlied> not sure what was unclear about that statement

00:27 <airlied> the xe, asahi, pancsf, nouveau use case is all new things

00:28 <DemiMarie> I’m referring to the Panfrost unbind bug

00:28 <DemiMarie> Also I’m surprised that Nouveau is a new thing, considering that it is an old driver.

00:29 <airlied> feel free to learn how it all works and suggest designs :-)

00:30 <DemiMarie> yeah, as an infosec person I find that I am often more able to provide constraints than to solve those constraints

00:34 <Lynne> radeon vii was nice, it had more fp64 units than pretty much everything consumer before or since, and even had hbm memory

00:34 <psykose> i don't think i knew a single person that had one, for some reason

00:36 <Lynne> it was in the shadow of the 1080, much like the 2080 was, nvidia dropped the ball and made a future-proof card

00:36 <DemiMarie> I do hope that Xe will have fewer bugs than i915; bugs in i915 are the reason GPU acceleration in Qubes OS will need to be off by default even once it is implemented.

00:38 <airlied> there are unreported security bugs in i915?

00:38 rz_ has joined #dri-devel

00:40 <q66> i mainly hope xe will actually finally be merged into the kernel within my lifetime

00:41 <q66> so that i can actually finally use my a750 for what i wanted to use it for (non-x86 computers)

00:43 rz has quit [Ping timeout: 480 seconds]

00:46 <emersion> gildekel: i think the "bad cable" issue is not just with 0 modes, it also happens when the kernel has pruned some but not all modes

00:46 <emersion> in that case as well you'd want to signal back to the user that _something_ is wrong

00:47 <emersion> so i don't think using 0 modes as an indication that "something is wrong" works very well

00:47 <emersion> a new prop might be better

00:50 <DemiMarie> airlied: “no known unpatched security bugs” is not enough for Qubes OS. Qubes OS also cares about how many security bugs there have been in total, especially compared to the amount of effort put in to finding them.

00:50 <airlied> DemiMarie: how do you use computers at all?

00:51 <airlied> not saying GPUs are beacons of greatness, but total security bugs in the kernel or any other software surely swamp it

00:53 <DemiMarie> airlied: I did `git log -p v6.1..v6.1.51 -- drivers/gpu/drm/i915` and looked at the individual commit messages to see what looked like a security bug. I found 13, one of which was GVT-g only (which would not matter for virtio-GPU native contexts) and 4 of which were mitigated by wrapping all ioctl() calls in a global lock. That leaves 8. Qubes OS has had 8 security bugs _total_ in the same time period.

00:53 peelz has joined #dri-devel

00:54 <airlied> I think that means you aren't looking closely enough at other pieces of the kernel then :-)

00:54 peelz is now known as Guest2082

00:54 <DemiMarie> airlied: nah, it’s because we don’t trust most of Linux to be a security boundary :-)

00:55 <DemiMarie> You are absolutely correct about there being way more vulns elsewhere, but the vast majority of them don’t impact Qubes OS in any way.

00:55 <airlied> do you pass any devices into the guest like you would the GPU?

00:55 <airlied> or should you just figure out how to use venus?

00:56 <DemiMarie> We use PCI pass-through for Ethernet and USB controllers. That’s all handled by the Xen hypervisor and various other Xen Project code.

00:57 <DemiMarie> Venus is out of the question because it means running the shader compiler on the host.

00:57 <airlied> I don't think xe will be any better there, or any gpu driver, the fact they fixed that many bugs surprises me

00:58 <DemiMarie> I would absolutely love to be wrong about my analysis!

00:59 <robclark> airlied: the situation isn't great for webgpu/webgl or anywhere that the gpu is exposed to hostile content.. (and vm is same sort of situation).. I think virtgpu native ctx helps because at least 2/3 issues seem to be handle lifetime which require attacker to control two threads with access to drm driver (which native-ctx doesn't give you) and the remainder require abusing ioctls in a way that you can't nearly as easily with

00:59 <robclark> native-ctx.. For non vm case I'd really like to have webgl/webgpu mesa in dedicated sandbox without direct access to kernel but using something like native-ctx over a sort of hardened vtest

01:00 <robclark> turns out, since flash was deprecated, the bad guys moved on to find the next easiest low hanging fruit

01:00 <DemiMarie> robclark: thank you!!!!!! can you explain what you mean by “abusing ioctls”?

01:01 <DemiMarie> Is this because native contexts basically validate everything again in userspace?

01:02 rz_ is now known as rz

01:02 xypron has joined #dri-devel

01:02 <robclark> that could be things like passing impossibly long arrays or weird params.. not sure native-ctx protects everything but if it protects 90% then it makes a useful layer in the defense-in-depth perspective.. and the host part of it is an easy place to add extra protection/sanity-checking
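A minimal sketch of the kind of host-side sanity check described above, under loud assumptions: the request layout, `MAX_ENTRIES`, `ENTRY_SIZE`, and `validate_submit` are all hypothetical and do not correspond to any real virtgpu or virglrenderer interface.

```python
MAX_ENTRIES = 4096   # assumed upper bound, purely illustrative
ENTRY_SIZE = 16      # assumed size in bytes of one array entry

def validate_submit(num_entries: int, payload: bytes) -> None:
    """Reject requests whose declared array length is implausible or
    does not match the payload that was actually received."""
    if not 0 < num_entries <= MAX_ENTRIES:
        raise ValueError(f"implausible entry count: {num_entries}")
    if len(payload) != num_entries * ENTRY_SIZE:
        raise ValueError("payload size does not match declared entry count")

validate_submit(2, bytes(32))  # a well-formed request passes silently
```

The idea is only that the declared length is checked against both a fixed cap and the received data before anything is forwarded to the kernel driver.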

01:03 <robclark> I'm of the opinion that it is too easy to get code execution in UMD (mesa or otherwise) for that to be considered a layer in the sense of defense-in-depth

01:04 <robclark> so UMD not having access to kernel is a useful thing.. maybe small perf penalty but that is a useful tradeoff when you are exposing it to the interwebs

01:05 <airlied> but what you give the umd access to would just be an API wrapping ioctls in ioctls?

01:05 <robclark> (also, native-ctx and core virglrenderer is small enough that re-writing parts of it in rust over time is tractable)

01:05 <robclark> umd would have access to a socket

01:06 <robclark> to "vtest" server (well, vtest will serve as the prototype but we should re-write in rust)

01:07 <airlied> yeah like I've considered we should have that for containers anyways

01:07 <airlied> but also just virgl in case we don't have native drivers inside the container

01:07 <robclark> yeah.. depends on degree of trust of thing using umd but if you don't trust it we want another layer.. umd should just be considered "crumple zone"

01:08 <robclark> no vrend pls.. the vrp program is having a heyday with vrend

01:09 <airlied> well maybe the venus version rather than the glsl one :-P

01:09 <robclark> vrend is just an express lane to what you don't want to have the bad guys get at

01:09 <robclark> yeah, venus is more sane.. but still, expect to get code exe in umd and then once again you have access to kernel

01:10 <DemiMarie> robclark: are you with ChromeOS?

01:10 <DemiMarie> because if so then I think there is room for collaboration

01:11 <robclark> DemiMarie: yes, CrOS gpu team

01:11 <DemiMarie> robclark: THANK YOU

01:12 <DemiMarie> (caps intentional)

01:12 <robclark> I get to live with the terror that is vm's and web content exposed to gpu drivers ;-)

01:13 <DemiMarie> Situation in Qubes is that we are caught between users someone might burn a 0day on and GUI toolkit authors who decide that software rendering eating 80% CPU is not a bug.

01:13 <robclark> (and tbh.. I haven't looked at games that have user contributed content like roblox.. not sure if that should also be scary yet)

01:14 Company has quit [Quit: Leaving]

01:16 <robclark> I'd recommend against enabling webgpu and probably webgl if you can.. since web is probably where users are most exposed.. toolkit, well that depends on if users are just running random apps, but I think there is less risk there.. you can more easily choose not to run $random_exe than avoid malicious ads, etc..

01:16 <robclark> but ofc depends on the user's threat model

01:17 <DemiMarie> robclark: In Qubes OS our assumption is that the villains have already popped your web browser.

01:17 <robclark> some might want to stick to sw rendering in guest

01:17 <DemiMarie> So 0days in Chrome, Firefox, etc are very much in scope

01:19 <DemiMarie> And yes, I would very much prefer to stick with SW rendering, but unfortunately the GTK4 devs have decided that SW rendering eating tons of CPU is perfectly okay.

01:19 youmukon1 has joined #dri-devel

01:19 <DemiMarie> robclark: mind if I DM you to avoid distracting the rest of the people here?

01:20 <robclark> np

01:20 lemonzest has quit [Quit: WeeChat 4.0.4]

01:22 yyds has joined #dri-devel

01:24 youmukonpaku1337 has quit [Remote host closed the connection]

01:25 <airlied> has anyone documented how slow/much cpu gtk4 is using?

01:26 lemonzest has joined #dri-devel

01:27 <q66> considering i've successfully used gtk4 on a 2005 power mac, i don't think it's as significant of an issue as presented

01:28 youmukonpaku1337 has joined #dri-devel

01:29 feng_ has joined #dri-devel

01:30 <q66> qubes os and similar folks are just great at creating an environment where everything is a footgun and then complaining they shot themselves in the foot

01:31 <airlied> could probably expend some time to see if we can make gtk4 use llvmpipe linear paths more

01:31 <airlied> once I finally get gnome-shell across the line

01:32 <DemiMarie> q66: Qubes OS developer here, my hope is that GPU security can be more than just “don’t give GPU access to untrusted code”.

01:32 <DemiMarie> airlied: that would be absolutely amazing, thanks!

01:32 <q66> yes, you made it obvious enough already

01:32 <emersion> DemiMarie: have you considered submitting GTK patches to optimize the software rendering paths?

01:32 <airlied> or fixing llvmpipe to be faster

01:33 <emersion> i think it's perfectly reasonable for GTK devs to not focus on the software rendering perf, since it's not used by most people

01:33 <q66> the main typical cpu bottleneck in gtk4 is not software or gpu rendering, but overreliance on simd'ed paths through graphene

01:33 youmukon1 has quit [Ping timeout: 480 seconds]

01:33 <emersion> as always in the open-source world, if you care, you can just submit patches

01:33 <q66> if you compile graphene with scalar you will *actually* see your performance (in gnome shell and elsewhere) go down the drain

01:34 <psykose> that makes total sense tho

01:35 <q66> the annoying part is half the api implementations graphene exposes are in the header

01:36 <robclark> I'm ok with exposing g-s to GPU (esp if you block extensions).. that doesn't terrify me as much as web browser.. I'm ok with exposing skia (or whatever ffox uses) to GPU for web browser.. but webgpu and to a _slightly_ lesser extent, webgl.. not so much

01:36 <DemiMarie> emersion: GTK software rendering is deprecated, but airlied’s suggestion of speeding up llvmpipe is a worthwhile thought

01:37 <emersion> have you found out why it is deprecated?

01:37 <emersion> maybe it's because it's missing a maintainer?

01:38 <q66> presumably because software rendering was using cairo and cairo sucks

01:38 <q66> (not sure if that's the actual reason, but it sounds likely)

01:39 <q66> for one it's been kinda dead for quite a while and for two its design is fundamentally flawed

01:40 <DemiMarie> emersion: It's deprecated because it doesn’t support 3D transformations.

01:40 <emersion> 3D… in a UI toolkit?

01:41 <emersion> oh well

01:41 <DemiMarie> emersion: yup

01:41 Daanct12 has joined #dri-devel

01:47 <DemiMarie> As far as submitting patches goes, is there a guide to getting started in Mesa anywhere?

01:48 feng_ has quit []

01:50 <kisak> https://docs.mesa3d.org/submittingpatches.html

01:50 <robclark> there is https://docs.mesa3d.org/submittingpatches.html if you have patches to make llvmpipe running in guest / untrusted env faster.. but pls don't think of mesa as a hardened layer in defense

01:57 <DemiMarie> kisak robclark: it’s less a matter of “what is the procedure used” and more “how do I not get lost in millions of lines of code”

01:58 <robclark> a journey of a million LoC starts with one LoC?

01:58 <q66> emersion: what's weird about that

01:58 <q66> being able to do generic matrix transformations is pretty useful

01:58 <q66> and considering the toolkit is already opengl-driven, it makes sense

01:59 <robclark> our docs are better than they've been in a long time.. but digging into a large code base is always going to be hard

01:59 <DemiMarie> robclark: good point, maybe it is just a skill I should learn

01:59 <q66> i mean, css allows them too for instance

01:59 <DemiMarie> I did manage to dig into Linux and fix some Xen memory management issues after all.

02:00 <robclark> I guess, for less trusted apps your focus would be on llvmpipe being a good-enough alternative... so profile and go from there?

02:01 youmukon1 has joined #dri-devel

02:02 <DemiMarie> I guess the first place to start would be the algorithmic stuff airlied is working on

02:02 <robclark> but always, these things are prioritizing.. what things are trusted enough that you can expose to gpu (after combining with native-context, etc).. what things are completely untrusted and you just want to live with llvmpipe, and such

02:02 <DemiMarie> yup

02:02 <robclark> some things 99% is good.. some things it takes 99.99999%, etc

02:03 <DemiMarie> indeed

02:03 <DemiMarie> and also some risk is unavoidable, and taking the risk might be worth it in some cases

02:04 <DemiMarie> But at this point I think hypothetical discussion has gone as far as it can and I need to start thinking about where (and when) I can start actually doing stuff

02:04 <robclark> :-)

02:05 youmukonpaku1337 has quit [Read error: Connection reset by peer]

02:06 <DemiMarie> Thanks to everyone here for the feedback (extremely useful!) and for putting up with all of my questions.

02:06 xzhan34_ has quit [Remote host closed the connection]

02:06 <DemiMarie> Sorry for taking up so much of people’s time.

02:07 xzhan34 has joined #dri-devel

02:08 <robclark> np.. it is an interesting topic.. I think gpu security is getting better but at the same time as security has been improving in various places (such as deprecating flash) the bad guys don't retire, they just look for the next least-hard surface.. cat-and-mouse and all that

02:09 <DemiMarie> yeah, that is my understanding too

02:10 <DemiMarie> I do hope that newer drivers will at least have cleaner uAPIs that are easier to implement properly.

02:15 <robclark> maybe rust in kernel will make handle lifetimes harder to screw up.. that is literally most of the issues I see go by (fortunately it is by defn a race condition issue which single threaded native ctx saves you from)..

02:16 <DemiMarie> yeah

02:16 <DemiMarie> Xe rewritten in Rust would be amazing

02:17 <emersion> q66: matrix transformations can be 2D, not necessarily 3D

02:17 <emersion> (just use 3x3 instead of 4x4)
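A small sketch of the 3x3 point above: 2D affine transforms in homogeneous coordinates, where a 4x4 matrix would only be needed for 3D. The helper names (`translate`, `rotate`, `matmul3`, `apply`) are illustrative and not taken from GTK or graphene.

```python
import math

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices (applies b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    """Transform the 2D point (x, y), treated as the column (x, y, 1)."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w

# Rotate by 90 degrees about the origin, then shift 10 units along x.
m = matmul3(translate(10.0, 0.0), rotate(math.pi / 2))
print(apply(m, 1.0, 0.0))
```

Translation, rotation, and scaling in the plane all fit in this form; perspective-style 3D effects are where the extra row and column come in.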

02:21 egbert is now known as Guest2094

02:21 <q66> emersion: yeah but it's handy to be able to do generic 3d transforms as well considering these days toolkits are not just strictly defined widget layouts but rather more generic/freeform graphics toolkits

02:22 egbert has joined #dri-devel

02:23 <DemiMarie> BTW if you want the actual commits I thought were security issues I can send them to you robclark. They were all backported to stable and I assume that CrOS picked them up from there.

02:26 Guest2094 has quit [Ping timeout: 480 seconds]

02:37 crabbedhaloablut has joined #dri-devel

02:39 heat has quit [Ping timeout: 480 seconds]

03:12 <daniels> Kayden: fwiw .gitlab-ci/bin/ci_run_n_monitor.py is what you want for triggering manual jobs, rather than hitting play a few times and cancel a million times

03:14 <Sachiel> oh, that's good to know

03:15 <daniels> for one it makes your life a lot easier, and for two the target regex avoids swamping hardware runners with unnecessary jobs which block marge

03:16 youmukonpaku1337 has joined #dri-devel

03:21 youmukon1 has quit [Remote host closed the connection]

03:22 <Kayden> ah neat, so I can just pick a specific job I want to run and it'll get all the deps for that and not everything else?

03:22 <Kayden> thanks for the tip :)

03:41 tristan has joined #dri-devel

03:42 tristan is now known as Guest2100

03:42 ngcortes has quit [Ping timeout: 480 seconds]

03:45 <daniels> Kayden: correct, and np!

04:05 <airlied> initial look suggests hitting the linear path for gtk4 shaders might be hard, the frag shaders are pretty complex

04:08 ZeZu has quit [Quit: off to see the wizard]

04:08 ZeZu has joined #dri-devel

04:11 kzd has joined #dri-devel

04:14 a-865 has quit [Ping timeout: 480 seconds]

04:17 ungeskriptet0 has quit []

04:17 ungeskriptet has joined #dri-devel

04:25 a-865 has joined #dri-devel

04:27 kzd has quit [Ping timeout: 480 seconds]

04:29 jewins has quit [Ping timeout: 480 seconds]

04:32 youmukonpaku1337 has quit [Remote host closed the connection]

04:33 youmukonpaku1337 has joined #dri-devel

04:33 flynnjiang has quit [Ping timeout: 480 seconds]

04:33 flynnjiang has joined #dri-devel

04:42 Duke`` has joined #dri-devel

04:53 agd5f_ has joined #dri-devel

04:59 agd5f has quit [Ping timeout: 480 seconds]

05:00 fab has joined #dri-devel

05:01 kzd has joined #dri-devel

05:12 Duke`` has quit [Ping timeout: 480 seconds]

05:18 kzd has quit [Quit: kzd]

05:22 itoral has joined #dri-devel

05:26 ngcortes has joined #dri-devel

05:26 kzd has joined #dri-devel

05:28 tzimmermann has joined #dri-devel

05:32 elongbug_ has quit [Read error: Connection reset by peer]

05:35 bmodem has joined #dri-devel

05:38 flynnjiang has quit [Ping timeout: 480 seconds]

05:39 flynnjiang has joined #dri-devel

05:45 sima has joined #dri-devel

05:47 kzd_ has joined #dri-devel

05:48 itoral_ has joined #dri-devel

05:51 kzd has quit [Ping timeout: 480 seconds]

05:54 itoral has quit [Ping timeout: 480 seconds]

05:54 i-garrison has quit []

05:55 i-garrison has joined #dri-devel

05:59 <HdkR> `[20147.176454] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered` Nice, I found a game that has a crash on demand function

05:59 mszyprow has joined #dri-devel

06:00 kzd_ has quit []

06:01 <HdkR> https://cdn.discordapp.com/attachments/797528613787926538/1149222796958912583/Screenshot_2023-09-06_at_10.59.39_PM.png It absolutely destroys both my Adreno GPU and AMD GPU if it loses focus

06:01 flynnjiang has quit [Read error: Connection reset by peer]

06:01 neutron has joined #dri-devel

06:02 flynnjiang has joined #dri-devel

06:02 <neutron> Xorg people here?

06:04 fab has quit [Quit: fab]

06:06 kzd has joined #dri-devel

06:11 Guest2100 has quit [Ping timeout: 480 seconds]

06:12 <HdkR> https://gitlab.freedesktop.org/mesa/mesa/-/issues/9766 Created a bug for the amd/adreno peeps if they want to try and fix it :)

06:13 rellla has quit [Ping timeout: 480 seconds]

06:13 rellla has joined #dri-devel

06:24 kzd has quit [Quit: kzd]

06:24 rasterman has joined #dri-devel

06:32 <rpigott> HdkR: what top is that?

06:32 rasterman has quit [Quit: Gettin' stinky!]

06:32 kzd has joined #dri-devel

06:33 <HdkR> rpigott: nvtop

06:33 <HdkR> Supports Intel, AMD, NVIDIA, and Adreno

06:33 <rpigott> hm nice

06:35 ngcortes has quit [Ping timeout: 480 seconds]

06:35 rasterman has joined #dri-devel

06:36 <HdkR> I saw some Panfrost patches on the ML for supporting the fdinfo interface, so theoretically adding Panfrost to it would be easy

06:37 <HdkR> If those patches land anyway

06:40 frieder has joined #dri-devel

06:49 youmukon1 has joined #dri-devel

06:50 <hakzsam> dcbaker: can you take care of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24749 please?

06:51 yuq825 has joined #dri-devel

06:51 <kode54> gee

06:51 <kode54> nvtop can't report GPU Mem for Intel

06:52 An0num0us has joined #dri-devel

06:53 youmukonpaku1337 has quit [Remote host closed the connection]

06:54 <HdkR> Hope for those hwmon patches to land or something I guess

06:57 tobiasjakobi has joined #dri-devel

06:57 tobiasjakobi has quit []

06:59 sarnex has quit [Quit: Quit]

06:59 sarnex has joined #dri-devel

07:00 sghuge has quit [Remote host closed the connection]

07:00 sghuge has joined #dri-devel

07:01 DodoGTA has quit [Quit: DodoGTA]

07:02 DodoGTA has joined #dri-devel

07:02 DodoGTA has quit []

07:03 DodoGTA has joined #dri-devel

07:09 <daniels> HdkR: alarumbe already sent a PR to add Panfrost support

07:10 <HdkR> daniels: To nvtop?

07:10 <daniels> yeah

07:10 <HdkR> Nice~

07:13 ngcortes has joined #dri-devel

07:16 kzd has quit [Quit: kzd]

07:16 youmukonpaku1337 has joined #dri-devel

07:20 youmukon1 has quit [Read error: Connection reset by peer]

07:22 flynnjiang has quit [Quit: Leaving]

07:23 kzd has joined #dri-devel

07:30 sarahwalker has joined #dri-devel

07:32 fab has joined #dri-devel

07:32 swalker_ has joined #dri-devel

07:33 swalker_ is now known as Guest2114

07:36 Haaninjo has joined #dri-devel

07:37 youmukon1 has joined #dri-devel

07:38 sarahwalker has quit [Ping timeout: 480 seconds]

07:39 simon-perretta-img has joined #dri-devel

07:41 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

07:43 rgallaispou has joined #dri-devel

07:44 rgallaispou has quit [Remote host closed the connection]

07:47 kzd has quit [Quit: kzd]

07:48 lynxeye has joined #dri-devel

07:49 rgallaispou has joined #dri-devel

07:50 vliaskov has joined #dri-devel

07:51 youmukonpaku1337 has joined #dri-devel

07:55 youmukon1 has quit [Read error: Connection reset by peer]

07:56 donaldrobson has joined #dri-devel

07:58 tristan has joined #dri-devel

07:59 tristan is now known as Guest2116

08:01 youmukon1 has joined #dri-devel

08:05 youmukonpaku1337 has quit [Read error: Connection reset by peer]

08:13 flynnjiang has joined #dri-devel

08:15 neutron has quit [Remote host closed the connection]

08:15 dtmrzgl has quit []

08:23 dtmrzgl has joined #dri-devel

08:26 ap51 has joined #dri-devel

08:28 itoral_ has quit [Remote host closed the connection]

08:28 itoral_ has joined #dri-devel

08:31 youmukonpaku1337 has joined #dri-devel

08:36 youmukon1 has quit [Ping timeout: 480 seconds]

08:39 youmukon1 has joined #dri-devel

08:43 youmukonpaku1337 has quit [Remote host closed the connection]

08:45 Ahuj has joined #dri-devel

08:46 ngcortes has quit [Read error: Connection reset by peer]

08:50 <tzimmermann> airlied, sima, please merge last week's drm-misc-next-fixes: https://lore.kernel.org/dri-devel/20230901070123.GA6987@linux-uq9g/T/#u

08:55 illwieckz has quit [Ping timeout: 480 seconds]

09:02 camus1 has quit [Remote host closed the connection]

09:03 camus has joined #dri-devel

09:04 illwieckz has joined #dri-devel

09:15 <karolherbst> itoral_: soo.. I've fixed a bunch of other issues and I think the GPU can get into an invalid state through atomic operations. No idea _why_ but something weird seems to happen there

09:16 Haaninjo has quit [Quit: Ex-Chat]

09:33 <itoral_> not sure what could be wrong with those, I've never seen any issues and there are plenty of tests for these in CTS... are those issues repeatable or do they happen randomly?

09:38 <itoral_> I assume they are 32-bit atomics and the addresses are 32-bit aligned, right?

09:44 <karolherbst> uhm.. didn't check

09:49 Guest2116 has quit [Ping timeout: 480 seconds]

10:05 Cyrinux9474 has quit []

10:07 Cyrinux9474 has joined #dri-devel

10:11 <karolherbst> itoral_: the odd thing is, it might also be something else, because global atomics are not supported by the compiler anyway, but still the atomic tests are the one making the GPU go all weird

10:11 youmukon1 has quit [Remote host closed the connection]

10:11 youmukonpaku1337 has joined #dri-devel

10:18 <karolherbst> ohhhh....

10:18 <karolherbst> uhhh

10:19 <karolherbst> itoral_: I know what's happening...

10:19 <karolherbst> itoral_: sooo.. if the compiler hits an unknown intrinsic it continues compiling ...

10:19 <karolherbst> and then throws a random broken shader at the GPU

10:19 <karolherbst> and apparently the GPU doesn't like it
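A toy sketch of the failure mode described above and of the "fail loudly" alternative. Everything here is hypothetical: the intrinsic names, the `HANDLERS` table, and the emitted strings stand in for a real backend and are not the v3d compiler's code.

```python
class UnsupportedIntrinsic(Exception):
    """Raised instead of silently emitting a broken shader."""

# Hypothetical table of intrinsics this toy backend knows how to emit.
HANDLERS = {
    "load_a": lambda args: f"LOAD_A {args}",
    "store_b": lambda args: f"STORE_B {args}",
}

def compile_intrinsics(intrinsics):
    """Translate (name, args) pairs, aborting on the first unknown name
    rather than skipping it and continuing with a partial program."""
    out = []
    for name, args in intrinsics:
        handler = HANDLERS.get(name)
        if handler is None:
            raise UnsupportedIntrinsic(name)
        out.append(handler(args))
    return out

print(compile_intrinsics([("load_a", 0), ("store_b", 1)]))
```

Stopping at compile time means the unsupported case shows up as an error in userspace instead of as undefined behaviour on the GPU.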

10:20 <austriancoder> has somebody written a gpu opcode verifier? looking for inspiration before porting the old etnaviv one

10:20 Daaanct12 has joined #dri-devel

10:20 <karolherbst> I'll throw in asserts and see if dealing with it fixes all the issues I've seeing

10:20 <karolherbst> *I'm seeing

10:20 elongbug has joined #dri-devel

10:21 Daanct12 has quit [Ping timeout: 480 seconds]

10:21 <karolherbst> https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/f43102a114629d4ea80a009004d2c3fe61c68abe

10:22 <karolherbst> I wouldn't be surprised if the GPU gets into an infinite loop or other fun things

10:22 <itoral_> yeah, could be

10:22 <karolherbst> does the kernel driver handle those cases where the GPU is getting stuck with a job?

10:23 <itoral_> I think the job would timeout and you would see a message in dmesg

10:23 <karolherbst> mhh

10:24 <karolherbst> well.. I didn't see a timeout at least

10:24 <karolherbst> but userspace also waits infinitely

10:25 <itoral_> either way, if we are submitting shaders with bogus load/store ops (due to unsupported intrinsics) I guess anything can happen :P

10:25 <karolherbst> maybe something else is going on, but I'll do another full run and see if I still see the GPU getting stuck

10:25 <karolherbst> yeah...

10:27 <karolherbst> anyway, I'll probably create an MR with some initial patches tomorrow, because I already have a few things I might want to get some feedback on

10:29 <karolherbst> mhh, still seeing the GPU getting stuck, let's see what it is this time

10:29 mauld has quit [Remote host closed the connection]

10:31 <austriancoder> freedreno's computerator looks nice

10:36 <kode54> looks like Intel's compiler has an unreachable intrinsic check as well, but it only asserts if NDEBUG isn't set

10:36 <kode54> unless unreachable() is just a crash

10:37 Company has joined #dri-devel

10:43 mauld has joined #dri-devel

10:45 <HdkR> kode54: unreachable should be a SIGABRT or SIGILL depending on arch

10:47 <kode54> gotcha

10:47 <kode54> so the issue must be something else

10:48 <kode54> the atomic mention above made me think that

10:48 <kode54> some games using that on xe.ko seem to just crash the GuC easily

10:48 <kode54> or perhaps it is something else

10:58 vliaskov has quit [Remote host closed the connection]

11:08 <tintou> olv: Hi there! I have a question, when you introduced perfetto you added the "slow" category, but I don't find any user of it, what was the plan behind this?

11:09 jdavies has joined #dri-devel

11:09 jdavies is now known as Guest2130

11:11 Guest2130 has quit []

11:13 <karolherbst> itoral_: how can I dump what shader is generated?

11:15 <karolherbst> anyway.. the hang I'm seeing seems to happen randomly with some of the buffer tests (which literally just try to do some global load/stores) but I'm also seeing some invalid memory accesses

11:15 <itoral_> V3D_DEBUG=cs should dump NIR, VIR and QPU assembly for compute shaders

11:16 <karolherbst> anyway.. the callback I've written for nir_lower_mem_access_bit_sizes is probably incorrect..

11:20 <itoral_> do we need that? v3dv supports vk_khr_{16,8}bit_storage, so regular ubo/ssbo/shared load/store is supported

11:21 <itoral_> or is that for unaligned accesses?

11:21 <karolherbst> CL has vec16

11:21 <karolherbst> but you'll also see things like vec4x8 load/stores

11:22 <karolherbst> it kinda sounded like that only single component load/stores are supported in 8/16 bit

11:22 <karolherbst> but yeah.. CL also has unaligned accesses as it also has packed structs like C
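A simplified sketch of what lowering a narrow or unaligned load to aligned 32-bit accesses involves, assuming little-endian memory whose size is a multiple of 4. This is a plain Python illustration of the idea, not the algorithm used by nir_lower_mem_access_bit_sizes; `load_bytes_via_words` is a made-up name.

```python
def load_bytes_via_words(memory: bytes, offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset` using only aligned 32-bit word loads."""
    first = offset // 4 * 4             # round down to a word boundary
    last = (offset + size + 3) // 4 * 4  # round up to a word boundary
    out = bytearray()
    for word_addr in range(first, last, 4):
        # One aligned 32-bit load.
        word = int.from_bytes(memory[word_addr:word_addr + 4], "little")
        for i in range(4):
            # Keep only the bytes that fall inside the requested range.
            if offset <= word_addr + i < offset + size:
                out.append((word >> (8 * i)) & 0xFF)
    return bytes(out)

memory = bytes(range(16))
print(list(load_bytes_via_words(memory, 3, 6)))
```

The 6-byte read at offset 3 touches three aligned words, which is the cost packed structs and odd vector sizes impose on hardware that only has 32-bit vector access.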

11:22 <itoral_> uh, ok

11:24 <karolherbst> my current implementation: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/44129edd6e63efde94a00e3d0f89c6b0e1f94956

11:24 <karolherbst> it's kinda copied from asahi I think, but also adjusted

11:24 <karolherbst> but I think the issue I'm seeing is something else...

11:37 <itoral_> that code looks valid for us too, we are also in the bucket of only supporting 32-bit vector access

11:37 <karolherbst> yeah...

11:37 <karolherbst> I think something else is going very wrong, but I just don't know what.. maybe the way I've implemented load/store global is messing up with stuff

11:37 frankbinns has quit [Remote host closed the connection]

11:38 <karolherbst> but it generally also works

11:38 frankbinns has joined #dri-devel

11:39 <itoral_> if you can create an issue with details on how to replicate this I can try to give it a go tomorrow and see if I can figure out what is happening

11:40 <karolherbst> here is the commit to add global mem support to v3d in case you can see something which isn't necessarily correct there: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/a641d1fb1cfbea8cf32d66be0941bf2f70d658e6

11:40 <itoral_> or rather next Monday, I won't be online tomorrow :)

11:40 <karolherbst> fair enough :D

11:41 <karolherbst> mhh

11:41 <karolherbst> maybe memory gets deallocated while it's still in use...

11:41 <karolherbst> which.. shouldn't happen

11:43 <tnt> I'm a bit unfamiliar with the schedule for mesa releases but how/when is the cut-off for 23.3 features done ? Like how long would I have to get (non-bug-fixes) patches merged onto main for them to make it in 23.3 ?

11:54 <itoral_> karolherbst: this is what I did when I added support for global_2x32 in our compiler, maybe it is a good reference to go through that and check if your global intrinsics are handled properly in all the right places

11:54 <itoral_> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275/diffs?commit_id=fa03d9c8be3751da00d2365533211ae8e034498f

11:55 <itoral_> for example, you may need to do something for our v3d_nir_lower_load_store_bitsize pass, etc

11:56 <itoral_> which I don't see in your patch

11:57 <itoral_> and we do have code there to handle global_2x32

11:57 itoral_ has quit [Quit: Leaving]

11:59 yyds has quit [Remote host closed the connection]

12:10 <daniels> zmike: this seems obviously bad rendering, guess it should either be fixed or just disable that trace on zink-tu-a618 until it is https://mesa.pages.freedesktop.org/-/mesa/-/jobs/48663147/artifacts/results/summary/results/trace@zink-a618@gputest@pixmark-piano-v2.trace.html

12:11 <zmike> I guess I'll take your word for it since I can't see the reference?

12:14 tristan has joined #dri-devel

12:15 tristan is now known as Guest2134

12:16 youmukonpaku1337 has quit [Remote host closed the connection]

12:16 youmukonpaku1337 has joined #dri-devel

12:17 alyssa has quit [Quit: alyssa]

12:18 fab has quit [Quit: fab]

12:23 youmukonpaku1337 has quit [Remote host closed the connection]

12:24 youmukonpaku1337 has joined #dri-devel

12:26 heat has joined #dri-devel

12:29 <daniels> yeah, I suspect a whole load of reference images fell out at some point, but the sheet music doesn't look great?

12:30 fab has joined #dri-devel

12:32 Daaanct12 has quit [Quit: WeeChat 4.0.4]

12:33 Daanct12 has joined #dri-devel

12:36 <zmike> maybe that's how it's supposed to look?

12:36 <zmike> not sure I've ever looked at results for this trace before

12:37 youmukonpaku1337 has quit [Remote host closed the connection]

12:38 youmukonpaku1337 has joined #dri-devel

12:47 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

12:59 Daanct12 has quit [Quit: WeeChat 4.0.4]

13:01 vliaskov has joined #dri-devel

13:08 MrCooper has quit [Remote host closed the connection]

13:08 MrCooper has joined #dri-devel

13:40 Guest2134 has quit [Ping timeout: 480 seconds]

13:42 <gildekel> emersion: `a new prop might be better`

13:42 <gildekel> I believe we reached an agreement in which a new link-status=TERMINAL is added, and the connector is marked as disconnected, as you suggested initially.

13:44 <gildekel> There is still some debate around how to propagate this state down an MST topology, but I think that at this point, I'll write some code and the discussion can continue via the review process, or once people get a chance to digest the new approach.

13:50 yuq825 has quit [Remote host closed the connection]

13:55 youmukonpaku1337 has quit [Remote host closed the connection]

13:55 youmukonpaku1337 has joined #dri-devel

14:03 rasterman has quit [Quit: Gettin' stinky!]

14:11 fab has quit [Read error: Connection reset by peer]

14:12 pq has quit [Ping timeout: 480 seconds]

14:16 youmukonpaku1337 has quit [Remote host closed the connection]

14:17 youmukonpaku1337 has joined #dri-devel

14:17 kzd has joined #dri-devel

14:18 pq has joined #dri-devel

14:30 ap51 has quit [Ping timeout: 480 seconds]

14:33 danylo is now known as Guest2143

14:33 danylo has joined #dri-devel

14:33 enunes- has quit [Remote host closed the connection]

14:33 enunes has joined #dri-devel

14:34 youmukon1 has joined #dri-devel

14:37 periontus has joined #dri-devel

14:37 periontus was kicked from #dri-devel by ChanServ [You are not permitted on this channel]

14:38 Guest2143 has quit [Ping timeout: 480 seconds]

14:38 youmukonpaku1337 has quit [Read error: Connection reset by peer]

14:39 An0num0us has quit [Ping timeout: 480 seconds]

14:43 fab has joined #dri-devel

14:48 junaid has joined #dri-devel

14:48 mediaim has joined #dri-devel

14:55 <mediaim> That technology is very safe, clock lines need emfi absorber protection, and there is a hw tpm and hw shifted base window i.e MMIO bar circulates all the time, and the procedure uses ciphered instruction to enter into secure enclave, where memory protection is toggled out on request, hence i had serious doubt that estonians got the satellite sent to anywhere, cause their first revision of the hw did not have mmu, but third revision had

14:55 <mediaim> some ppram reram security in fact, which was in random phase. So this they specified later, maybe it was in the first revision too of the estcube, than technically it was possible that the satellite was orbiting , all in all you need to have some security to make those things , and self-driving cars and satellites can have very strong and basically non compromise able security, international space station relay too, cause security is

14:55 <mediaim> theoretically at max, hw trojans should be scanned out from such hw, the last which would be the only way to compromise such systems.

14:57 kts has joined #dri-devel

15:05 jewins has joined #dri-devel

15:09 An0num0us has joined #dri-devel

15:21 Duke`` has joined #dri-devel

15:23 kts has quit [Ping timeout: 480 seconds]

15:26 tzimmermann has quit [Quit: Leaving]

15:26 <mediaim> So what would be beneficial is to make some a category electrician and programming certificate, there is many opportunities, i have looked what boston robotics systems do, and have some British factory demo streams, as to how well things can be arranged, but human life is that it can not be so paranoid either, you want to live your life, not in cage like putin now has to do, so this is always a trade-off.

15:35 <mattst88> mediaim: please stop spamming us. you've been doing this for 10+ years and we do not appreciate it

15:35 <mattst88> we have tried to explain this to you. please stop spamming

15:37 <mediaim> I do not think that people who assault behind the back consistently are doing well in life, you get torn too, you have been doing this all my existence. And your line count is way longer on IRC too, but your cunt actions deserve a penalty.

15:39 <mattst88> please do not talk about off-topic things in our IRC channels

15:39 <mattst88> it is not welcome here

15:41 <mediaim> fair enough if you please stop being suicidal terrorist, we have an agreement!

15:42 <mattst88> I agree. I will not be a suicidal terrorist

15:42 <mediaim> ack, good bye.

15:42 mediaim has quit [Quit: Leaving]

15:52 mszyprow has quit [Ping timeout: 480 seconds]

15:55 <zmike> mattst88 4 prez

15:56 * mattst88 is stunned

15:56 <idr> WHAT HAPPEN?

15:56 <karolherbst> :O

16:02 <DemiMarie> kode54: Can you report the GuC crashes to Intel and have them fix them? GuC crashes smell of a potential security problem to me.

16:11 lynxeye has quit [Quit: Leaving.]

16:23 Guest2114 has quit [Remote host closed the connection]

16:27 <olv> tintou, it was for trace events that were too heavy to be enabled by default. There is indeed no user.

16:32 donaldrobson has quit [Ping timeout: 480 seconds]

16:41 junaid has quit [Remote host closed the connection]

16:54 <robclark> mattst88: wow

16:55 <mattst88> robclark: all because I didn't have permissions to akick too :P

16:55 bmodem has quit [Ping timeout: 480 seconds]

16:57 kzd has quit [Quit: kzd]

16:57 <kisak> For some reason I have trouble believing that both parties will honor that agreement, but best of luck to both of you.

16:57 <mattst88> hahaha

16:57 <idr> Lol

16:58 Jeremy_Rand_Talos__ has quit [Remote host closed the connection]

16:59 Jeremy_Rand_Talos__ has joined #dri-devel

17:01 tertl8 has quit [Quit: Connection closed for inactivity]

17:08 DemiMarie has left #dri-devel [#dri-devel]

17:10 kzd has joined #dri-devel

17:11 youmukonpaku1337 has joined #dri-devel

17:13 youmukon1 has quit [Read error: Connection reset by peer]

17:18 frieder has quit [Remote host closed the connection]

17:24 kzd has quit [Quit: kzd]

17:24 junaid has joined #dri-devel

17:27 kzd has joined #dri-devel

17:31 kzd has quit []

17:35 mbrost has joined #dri-devel

17:53 kzd has joined #dri-devel

17:57 kzd has quit []

18:05 Cyrinux9474 has quit []

18:07 Cyrinux9474 has joined #dri-devel

18:24 Ahuj has quit [Ping timeout: 480 seconds]

18:47 Ahuj has joined #dri-devel

18:50 Haaninjo has joined #dri-devel

19:07 Ahuj has quit [Ping timeout: 480 seconds]

19:11 Kayden has quit [Quit: to JF]

19:12 oneforall2 has quit [Remote host closed the connection]

19:15 oneforall2 has joined #dri-devel

19:20 sukrutb_ has quit []

19:21 fab has quit [Quit: fab]

19:29 gouchi has joined #dri-devel

19:36 Kayden has joined #dri-devel

19:43 junaid has quit [Ping timeout: 480 seconds]

19:48 kzd has joined #dri-devel

19:49 youmukon1 has joined #dri-devel

19:50 youmukon1 has quit []

19:51 youmukon1 has joined #dri-devel

19:51 youmukon1 has quit []

19:51 youmukon1 has joined #dri-devel

19:52 youmukon1 has quit []

19:52 youmukonpaku1337 has quit [Ping timeout: 480 seconds]

19:52 youmukonpaku1337 has joined #dri-devel

19:57 * ccr blinks.

19:58 <karolherbst> I'm going crazy... setting `V3D_DEBUG=` causes MMU faults?

19:58 <karolherbst> and yes, I'm setting an empty value

19:59 <karolherbst> I should run with libasan...

20:00 rasterman has joined #dri-devel

20:03 junaid has joined #dri-devel

20:06 mszyprow has joined #dri-devel

20:11 sima has quit [Ping timeout: 480 seconds]

20:12 crabbedhaloablut has quit []

20:23 kzd has quit [Quit: kzd]

20:38 Haaninjo has quit [Quit: Ex-Chat]

20:39 Duke`` has quit [Ping timeout: 480 seconds]

20:42 mszyprow has quit [Ping timeout: 480 seconds]

20:52 kzd has joined #dri-devel

20:52 heat has quit [Read error: Connection reset by peer]

20:53 heat has joined #dri-devel

20:58 youmukonpaku1337 has quit [Remote host closed the connection]

20:58 youmukonpaku1337 has joined #dri-devel

21:04 junaid has quit [Remote host closed the connection]

21:14 rasterman has quit [Quit: Gettin' stinky!]

21:16 youmukonpaku1337 has quit [Quit: WeeChat 4.0.4]

21:32 youmukonpaku1337 has joined #dri-devel

21:40 gouchi has quit [Remote host closed the connection]

21:41 An0num0us has quit [Ping timeout: 480 seconds]

21:44 <karolherbst> maybe it's something on the kernel side...

21:51 Cyrinux9474 has quit []

22:02 An0num0us has joined #dri-devel

22:06 frankbinns has quit [Remote host closed the connection]

22:10 Cyrinux9474 has joined #dri-devel

22:10 An0num0us has quit [Ping timeout: 480 seconds]

22:25 kts has joined #dri-devel

22:35 youmukonpaku1337 has quit [Read error: Connection reset by peer]

22:45 vliaskov has quit [Remote host closed the connection]

22:52 <mareko> nah, he continues on #radeon

22:52 <Sachiel> just tell him you will also not be a suicidal terrorist

22:58 mbrost has quit [Ping timeout: 480 seconds]

23:00 <Lynne> yet another hard enough crash on amd to prevent reisub from working

23:01 <Lynne> does the hardware have enough tools to allow for at least a full reset without restarting the entire system?

23:03 <karolherbst> you shouldn't have to reboot your system, yes

23:04 fxkamd has joined #dri-devel

23:08 kzd has quit [Quit: kzd]

23:27 <Lynne> something to look forward to, I guess

23:28 <Lynne> how do windows users have less catastrophic crashes like these?

23:29 <Lynne> and how do amd's pro drivers avoid crashing altogether? do they insert sanitization in shaders during compilation?

23:29 <bnieuwenhuizen> amd's pro drivers can generally crash just as much

23:30 <bnieuwenhuizen> though difference in userspace drivers and version of the kernel driver can mean you're avoiding specific bugs

23:30 <bnieuwenhuizen> the big issue with amd reset is often loss of VRAM, if you have something more than that, it is likely just the kernel driver being buggy again :|

23:31 <bnieuwenhuizen> (the initial hang is likely a userspace issue of the app or userspace driver of course)

23:31 <Lynne> I meant their pro drivers for workstation GPUs, I think they had some guarantees about them not crashing

23:32 <bnieuwenhuizen> linux amdgpu-pro is mostly just the open source amdgpu kernel driver (+ shim to support multiple kernels) + amdvlk (with the LLVM compiler swapped out for their internal one)

23:32 <bnieuwenhuizen> not sure if they have a special distro for pro on linux outside of this these days?

23:32 kzd has joined #dri-devel

23:32 <bnieuwenhuizen> windows wise I have no clue

23:33 <Lynne> yeah, I think it was for windows

23:33 <bnieuwenhuizen> might just be a more complete test matrix

23:33 <Lynne> disappointing to hear that amdgpu-pro is just repackaged amdvlk, I was hoping it was another implementation of vulkan video which maybe I could have experimented with

23:34 <bnieuwenhuizen> I mean it may have some extra bits

23:34 <Lynne> ah

23:34 <bnieuwenhuizen> like RT was for the longest time available in amdgpu-pro and not amdvlk

23:34 <bnieuwenhuizen> but if amdvlk has vulkan video I doubt the impl will be meaningfully different on -pro

23:35 <bnieuwenhuizen> (that said, -pro might have some amf bits)

23:35 kts has quit [Quit: Leaving]

23:35 <Lynne> yup, pro has amf

23:37 <Lynne> also amdvlk has the hdr bits wired up

23:39 <Lynne> btw, airlied: this happened with bwdif_vulkan, I'm guessing something still isn't entirely right, maybe sync-wise

23:40 <bnieuwenhuizen> hdr swapchain bits, or is there something video related?

23:41 <Lynne> the hdr swapchain bits

23:43 kzd has quit [Quit: kzd]

23:51 Danct12 has quit [Remote host closed the connection]

23:57 YuGiOhJCJ has joined #dri-devel