JohnnyonFlame has quit [Read error: Connection reset by peer]
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
warpme has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
tursulin has joined #dri-devel
davispuh has joined #dri-devel
frieder has joined #dri-devel
frieder has quit []
frieder has joined #dri-devel
surajkandpal has joined #dri-devel
jsa has joined #dri-devel
qflex has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
surajkandpal has quit [Ping timeout: 480 seconds]
lemonzest has quit [Quit: WeeChat 4.2.1]
surajkandpal has joined #dri-devel
lynxeye has joined #dri-devel
surajkandpal has quit [Read error: Connection reset by peer]
lemonzest has joined #dri-devel
shalem has joined #dri-devel
surajkandpal has joined #dri-devel
mvlad has joined #dri-devel
yuq825 has quit [Ping timeout: 480 seconds]
yuq825 has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
<jfalempe> sima: I sent a v8 of drm_panic yesterday, can you check if the locking is better this time?
<jfalempe> sima: I directly register the primary plane now, so there is no need to walk through the plane list.
<sima> jfalempe, will try to take a look, still behind on mails
<sima> walking the plane list shouldn't be an issue though, since the planes are invariant over the lifetime of a drm_device
<sima> but the primary plane is probably the most common case, so it might still be a good idea to avoid issues
<jfalempe> sima: that simplifies the code a bit too, since the primary plane is where you really want to draw the panic screen.
flynnjiang has quit [Quit: flynnjiang]
bolson has quit [Remote host closed the connection]
frieder has joined #dri-devel
u-amarsh04 has quit [Quit: Konversation terminated!]
rasterman has joined #dri-devel
u-amarsh04 has joined #dri-devel
warpme has quit []
warpme has joined #dri-devel
<sima> jfalempe, yeah makes sense
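(For reference, the plane-list walk sima describes would look roughly like the sketch below. This is an illustration of the pattern, not jfalempe's actual v8 patch; find_primary_plane is a made-up helper name.)

    /* Hypothetical sketch: find a primary plane by walking the plane
     * list, which is safe because the list is invariant over the
     * lifetime of the drm_device. Not the actual drm_panic code.
     */
    #include <drm/drm_device.h>
    #include <drm/drm_plane.h>

    static struct drm_plane *find_primary_plane(struct drm_device *dev)
    {
            struct drm_plane *plane;

            drm_for_each_plane(plane, dev) {
                    if (plane->type == DRM_PLANE_TYPE_PRIMARY)
                            return plane;
            }
            return NULL;
    }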
yuq825 has quit []
sukuna has quit [Remote host closed the connection]
vliaskov has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
narmstrong_ has quit []
narmstrong has joined #dri-devel
yyds has quit [Remote host closed the connection]
surajkandpal1 has joined #dri-devel
shalem has quit [Remote host closed the connection]
surajkandpal has quit [Ping timeout: 480 seconds]
rgallaispou has quit [Read error: Connection reset by peer]
apinheiro has joined #dri-devel
surajkandpal1 has quit [Ping timeout: 480 seconds]
kxkamil has quit []
guludo has joined #dri-devel
kts has joined #dri-devel
Namarrgon has quit [Ping timeout: 480 seconds]
kxkamil has joined #dri-devel
guludo has quit [Ping timeout: 480 seconds]
guludo has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
dorcaslitunyaVM has joined #dri-devel
yyds has joined #dri-devel
dorcaslitunya has joined #dri-devel
kts has joined #dri-devel
gouchi has joined #dri-devel
gouchi has quit [Remote host closed the connection]
Company has joined #dri-devel
Dorc has joined #dri-devel
surajkandpal has joined #dri-devel
rgallaispou has joined #dri-devel
gio has joined #dri-devel
<dorcaslitunya> Hello, my name is Dorcas and I am currently interested in participating in X.Org Endless Vacation of Code. I am looking for a mentor and an available project to work on for this program. I am interested in DRM projects but I am open to any available project, very willing to learn, and eager to grow through the mentor and the program. Any leads appreciated!
<dorcaslitunya> In case I am unavailable on IRC, I can be reached by email at anonolitunya@gmail.com
kts has quit [Quit: Konversation terminated!]
shalem has joined #dri-devel
shalem has quit []
hansg has quit [Quit: Leaving]
hansg has joined #dri-devel
Namarrgon has joined #dri-devel
cmichael has joined #dri-devel
itoral has quit [Remote host closed the connection]
frieder has quit [Ping timeout: 480 seconds]
dorcaslitunya has quit [Remote host closed the connection]
frieder has joined #dri-devel
surajkandpal has quit []
aravind has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
yyds_ has joined #dri-devel
krumelmonster has quit [Ping timeout: 480 seconds]
dorcaslitunyaVM has quit [Ping timeout: 480 seconds]
yyds has quit [Ping timeout: 480 seconds]
apinheiro has quit [Quit: Leaving]
rgallaispou has left #dri-devel [#dri-devel]
bolson has joined #dri-devel
krumelmonster has joined #dri-devel
RSpliet has quit [Quit: Bye bye man, bye bye]
RSpliet has joined #dri-devel
yyds_ has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
dorcaslitunyaVM has joined #dri-devel
dorcaslitunyaVM has quit [Remote host closed the connection]
<tzimmermann> sima, airlied, everybody seems to be sending feature PRs for drm-next. do you still accept them? abhinav__ asked for a final drm-misc-next PR for something needed by msm
<sima> tzimmermann, I feel like I'll leave that to airlied :-)
<tzimmermann> ok. i have no preference
<sima> yeah same
dorcaslitunya has quit [Remote host closed the connection]
dviola has quit [Ping timeout: 480 seconds]
rppt has joined #dri-devel
dorcaslitunyaVM has quit [Ping timeout: 480 seconds]
crabbedhaloablut has quit []
crabbedhaloablut has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<gfxstrand> eric_engestrom: Are you around? Could you take a very quick look at the meson bits of !27832?
<gfxstrand> dj-death: Did you see my comment on the descriptor buffer MR itself?
<dj-death> gfxstrand: yeah
<dj-death> gfxstrand: I think I implemented what you suggested
<dj-death> gfxstrand: it's relocated constants
<dj-death> gfxstrand: I just need access to the value to relocate in the shader when doing the shader upload
<dj-death> gfxstrand: once the runtime change is merged, the ugly thing you pointed out in the DB MR will go away
frieder has quit [Ping timeout: 480 seconds]
<gfxstrand> dj-death: My suggestion was that you store the SAMPLER_STATE and whatever else you need to create the sampler in the anv_shader_bin itself and serialize it along with the rest of the shader. Then the only thing needed on upload is the device, not the pipeline layout.
<gfxstrand> I haven't thought through what a mess that'll make of YCbCr, though.
<dj-death> gfxstrand: so every shader upload would allocate a SAMPLER_STATE?
<dj-death> gfxstrand: the spec guarantees that the sampler used stays alive for the lifetime of the shader, so that would be an extra copy for not much
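(A rough sketch of the layout gfxstrand is proposing; the struct and field names below are invented for illustration and are not the real anv definitions.)

    /* Illustration only: embed the packed sampler state in the shader
     * binary so it is serialized and cached together with the shader,
     * and upload needs only the device, not the pipeline layout.
     */
    #include <stdint.h>

    struct anv_embedded_sampler_example {
            uint32_t sampler_state[4];   /* packed SAMPLER_STATE dwords */
            uint32_t border_color[4];    /* border color data, if used */
    };

    struct anv_shader_bin_example {
            /* ... shader code, relocations, etc. ... */
            uint32_t num_embedded_samplers;
            struct anv_embedded_sampler_example *embedded_samplers;
    };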
meymar has joined #dri-devel
<meymar> karolherbst: mwk eat shit idiots.
<meymar> Where as world would come down to depending anything what you do, it would actually look as big of a chaos as the code you do.
meymar was kicked from #dri-devel by ChanServ [You are not permitted on this channel]
<gfxstrand> dj-death: I was assuming your hashing would fix it
<dj-death> gfxstrand: yeah, even if I do what you suggest, that's still not solving the cache lookup
<dj-death> oh... I see what you're suggesting now
<dj-death> yeah, I guess that works
<dj-death> I'd just have to allocate additional samplers
frieder has joined #dri-devel
<dj-death> but that sounds closer to what sampler synthesizing does
<dj-death> it should remove the need for the change
<gfxstrand> Yeah, that was my thought
<gfxstrand> That way the sampler is really part of the shader and gets cached properly, and you're not trusting the app to give you the same thing in the descriptor set every time.
<gfxstrand> I mean, they had better, otherwise it won't work and/or you had better hash and cache properly.
<gfxstrand> But it also prevents issues where you have exactly the same shader, differing by one sampler bit, and you get a cache collision
<dj-death> yeah, we're already hashing things properly
<dj-death> it's just the sampler creation that's not what you suggest
<dj-death> just border colors...
<gfxstrand> *sigh* Border color...
<gfxstrand> Why did we add border color back to Vulkan? 😢
<daniels> wasn't it for zmike?
<zmike> yes and thank you
<dj-death> gfxstrand: thanks btw
<gfxstrand> yw
<gfxstrand> Sorry for chucking a spanner in the works. 😅
<dj-death> there is still time to ship this sucker this week
<dj-death> unless CI fails me
Dorc has quit [Ping timeout: 480 seconds]
frieder has quit [Remote host closed the connection]
dogukan has joined #dri-devel
Haaninjo has joined #dri-devel
<mripard> sima: if everything's working fine for the drm gitlab repo, could you let sfr know to update linux-next?
<sima> mripard, ah good point
<mripard> I'm not sure I have the street cred to do so :)
kts has joined #dri-devel
karolherbst_ has joined #dri-devel
iive has joined #dri-devel
karolherbst has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
karolherbst_ is now known as karolherbst
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #dri-devel
alanc has quit [Remote host closed the connection]
simondnnsn has quit [Ping timeout: 480 seconds]
ninjaaaaa has quit [Ping timeout: 480 seconds]
ninjaaaaa has joined #dri-devel
simondnnsn has joined #dri-devel
Marcand has joined #dri-devel
Marcand has quit [Remote host closed the connection]
cmichael has quit [Quit: Leaving]
lynxeye has quit [Quit: Leaving.]
tzimmermann has quit [Quit: Leaving]
heat has joined #dri-devel
krushia has joined #dri-devel
ninjaaaaa has quit [Ping timeout: 480 seconds]
simondnnsn has quit [Ping timeout: 480 seconds]
simondnnsn has joined #dri-devel
ninjaaaaa has joined #dri-devel
junaid has joined #dri-devel
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
Jeremy_Rand_Talos__ has joined #dri-devel
hansg has quit [Quit: Leaving]
<DemiMarie> Is it possible to guarantee that GPUs will not crash by validating the commands produced by the compiler?
<airlied> no
<DemiMarie> Why?
<DemiMarie> I assume no firmware or hardware bugs.
<airlied> turing completeness
<DemiMarie> That only applies to arbitrary commands, not commands that are required to conform to certain restrictions.
<airlied> compilers produce instructions from arbitrary shader input
<DemiMarie> See: Native Client
<DemiMarie> “Crash” means “faults” for this purpose.
<airlied> faulting is fine as long as the gpu and driver recover
<airlied> you cannot avoid faults since someone can just write a shader that takes an arbitrary ubo value and tries to load from it
<DemiMarie> How often is recovery successful in practice?
<airlied> depends on the gpu driver and gpu I suppose; intel and nvidia seem to be pretty good, amd has its issues but faults usually don't take it down
riteo has joined #dri-devel
<lstrano> DemiMarie: you assume no firmware or hardware bugs?
halves has quit [Quit: o/]
halves has joined #dri-devel
halves has quit []
halves has joined #dri-devel
<DemiMarie> lstrano: No unknown ones, yes. Same as how people writing code for a CPU assume no unknown CPU hardware or microcode bugs.
alanc has joined #dri-devel
psykose has quit [Remote host closed the connection]
psykose has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
dorcaslitunya has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
dorcaslitunya has joined #dri-devel
dorcaslitunya has quit [Remote host closed the connection]
<Company> DemiMarie: can't you just write a program that crashes when it validates correctly and doesn't crash when the program would crash - and then feed it itself?
<DemiMarie> Company: Such a program would be rejected.
<Company> that's my attempt at the halting problem
<DemiMarie> airlied: Is it mostly VCN problems that bring down the entire GPU?
<Company> like, isn't that literally the halting problem you're trying to solve?
<DemiMarie> Company: No. What I am asking is not, “Will XYZ program break the GPU?”, but rather “Does XYZ program conform to the restrictions I have set, which are enough to ensure that there are no unrecoverable faults?”.
<agd5f> DemiMarie, GPU page faults never directly hang the GPU. They trigger an interrupt, writes are dropped, and reads return 0 or something like that. Where it can be a problem is if the page fault leads to a problem due to the data returned. E.g., if the data is garbage and some hardware is waiting on a bit in that memory that never changes
<DemiMarie> agd5f: which hardware would that be?
<agd5f> AMD GPUs
<DemiMarie> Do AMD GPUs have such uninterruptible polling loops?
<DemiMarie> That seems like a hardware or firmware bug to me.
<Company> DemiMarie: but that turns your accepted programs into a small subset of all programs, because Turing-completeness would cause the halting problem
<DemiMarie> Company: which is fine in this use-case
<DemiMarie> because any program can be transformed into one that is in this subset.
<agd5f> DemiMarie, fixed-function hardware can't be preempted
<DemiMarie> but this discussion is also too hypothetical and so should probably be dropped
<mattst88> yay for self-awareness :D
<DemiMarie> agd5f: can whatever is feeding it be preempted and forced to give the fixed-function hardware what it needs, so that it returns garbage instead of hanging forever?
<DemiMarie> agd5f: the problem is that resetting the GPU seems to often take out not just the thing that froze, but also a bunch of other stuff
<DemiMarie> the blast radius is the problem
<agd5f> DemiMarie, you can take a perf hit and only allow a single application on gfx at a given time
<DemiMarie> agd5f: how big a perf hit?
<DemiMarie> Fundamentally, my question is about why the hardware is designed the way it is.
<agd5f> depends on what you are trying to run
<DemiMarie> agd5f: typical desktop
<agd5f> mainly single-app use cases, probably not that much. It will have more impact when you have multiple apps active, since they can't run in parallel
<DemiMarie> It's not a question about how to work around the problems with existing hardware, but rather me trying to understand why hardware is designed as it is.
<agd5f> DemiMarie, to maximize throughput
<DemiMarie> agd5f: how does a wide blast radius help with that?
<pixelcluster> agd5f: re: "GPU page faults never directly hang the GPU" really? always? this doesn't quite match what I see; AFAICT when page faults happen, the faulting waves get halted, and I'm not sure if there is a way to un-halt them
<DemiMarie> for instance, why can't whatever feeds the rasterizers just be forced to feed the rasterizer with fake input (zeros?) if it is taking too long?
dviola has joined #dri-devel
<agd5f> pixelcluster, there is a debug module parameter you can set to force a halt on a fault, but that is not the default
<pixelcluster> but why does my compute shader get halted then if I use global_store_dword to write to NULL? this happens without any debug parameters set, on any kernel version
<DemiMarie> agd5f: in practice, does a fault ever not result in a hang, unless one is doing a pure compute workload?
<agd5f> DemiMarie, yes
<DemiMarie> agd5f: what does the rasterizer do if the vertex shader faults?
<abhinav__> yes sima airlied, one more PR from misc would be great for our feature
<agd5f> DemiMarie, consider something like a page fault in a command buffer or a resource descriptor or sampler descriptor. When that state gets loaded into the state machine, it might be an invalid combination that results in a deadlock somewhere in the hardware state.
<DemiMarie> agd5f: because the hardware is not designed to be robust against malicious input?
<DemiMarie> that seems like a hardware bug to me
<agd5f> DemiMarie, too many combinations of state to validate
<DemiMarie> agd5f: and they don't try to use e.g. formal methods that work no matter how complex the system is?
<DemiMarie> more generally, why is the rasterizer shared between all the shader cores?
<DemiMarie> that seems to be the cause of the behavior that I am observing
<DemiMarie> AMD's fix for LeftoverLocals being "do not let different contexts run in parallel" tells me that, unless I am missing something, there is a lot of internal sharing between different contexts on the GPU
<eric_engestrom> gfxstrand: I'm around, but I've hurt my wrist on top of already having a ton of non-work stuff taking all my time (and it will only increase for the next 2 months :/); looking at your MR now
<DemiMarie> robclark: for AMD GPUs, I'm wondering if not allowing more than one VM to use the GPU in parallel would be a good idea.
<agd5f> DemiMarie, the design is largely the same for everyone's GPUs.
<pixelcluster> having hardware potentially deadlock is not that big of a deal, yeah
<DemiMarie> is it guaranteed to only be a deadlock, and not e.g. a corruption of state belonging to others' shaders (which would potentially allow privilege escalation)?
<DemiMarie> or a leak of state from others' shaders?
<pixelcluster> not having a way to shut down everything from one specific context (while leaving everything from other contexts untouched) is an issue, and it would seem like other vendors can provide that
<DemiMarie> pixelcluster: exactly!
<DemiMarie> robclark: my concern is that AMD GPUs might have a lot more state (architectural or otherwise) that is shared between contexts than it seems
<robclark> limiting to a single process using the gpu at a time is a thing you want for preventing information leaks.. not sure if released yet but AFAIU something is in the works (not sure if fw or kernel level)
<pixelcluster> the rasterizers being shared isn't an issue in practice, I'm pretty sure
<DemiMarie> robclark: on AMD or everywhere?
rasterman has quit [Quit: Gettin' stinky!]
<pixelcluster> I don't think more than one app can use the raster hw at a time anyway, and that has always been this way
<robclark> well, probably everywhere for GPUs which can have multiple processes active at a time (think LL type issues)
<DemiMarie> I don't want to get GPU acceleration shipped only for marmarek to have to issue a Qubes Security Bulletin for "oops, we didn't properly isolate stuff on the GPU."
<DemiMarie> robclark: I thought Intel and Nvidia were immune to LL
<agd5f> DemiMarie, there is no more or less state shared between contexts. The issue is that there is only one of each fixed-function block, so it can only be used by one thing at a time. If a particular fixed-function block hangs, you have to reset the whole engine. Maybe other vendors can just reset a particular block.
<DemiMarie> agd5f: I suspect being able to reset individual blocks is the difference
<robclark> I think intel already had sufficient state clearing (which I think implies they don't have multiple processes running at the same time).. nv I'm not really sure about the situation, we don't use 'em in any chromebooks
<DemiMarie> Intel having enough state clearing is good
<DemiMarie> If it is a kernel fix, what about forbidding multiple VMs from sharing the GPU at the same time, and then letting the guest decide if it wants to let multiple processes from itself run simultaneously?
<DemiMarie> robclark: also, is it okay if I send you a direct message about something that is related to Chrome security but not graphics?
<robclark> I'm not sure what level of granularity is possible.. but if it is possible it could be useful to isolate VMs without isolating every process
<robclark> sure
<robclark> note that with nctx the host kernel sees a single process with multiple dev file open()s for a single VM
<DemiMarie> robclark: I think that would be useful too
tursulin has quit [Ping timeout: 480 seconds]
<airlied> mripard: maybe just send one more PR, I don't think it'll matter at this point
<DemiMarie> robclark: I'm pinging you a few times in various Qubes issues
alyssa has joined #dri-devel
<alyssa> karolherbst: do we support work_group_scan_*?
<karolherbst> no
<alyssa> k
<karolherbst> those are crazy to implement anyway
<alyssa> :(
* alyssa wants them for geometry shaders
<karolherbst> they are like subgroup ops, but over the entire block
<alyssa> i know, that's why i want them
<karolherbst> I think there is nothing against implementing them, I just suspect it's a lot of lowering
<alyssa> sure
<alyssa> also, is there any standard ballot in CL?
<alyssa> I polyfilled it in asahi_clc but that seems.. wrong
<karolherbst> there is cl_khr_subgroup_ballot
<alyssa> ahaha nice thanks :)
<karolherbst> it's in no way wired up, because I think the CTS requires weird features to test it or something...
<karolherbst> but I doubt it's hard to implement and might just work once the clc bits are added
<alyssa> meh I'll just use my hack for now
<karolherbst> okay
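(For reference, cl_khr_subgroup_ballot provides sub_group_ballot(); a minimal usage sketch, where ballot_nonzero is a made-up example function:)

    #pragma OPENCL EXTENSION cl_khr_subgroup_ballot : enable

    /* Each invocation contributes one predicate bit; the uint4 result
     * packs the votes of the whole subgroup.
     */
    uint4 ballot_nonzero(int value)
    {
        return sub_group_ballot(value != 0);
    }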
* alyssa doing gpu horrors
<karolherbst> guess I should look into this soon, so it's better supported in mesa
<dj-death> karolherbst: how does that work in divergent control flow?
<DemiMarie> Thanks to everyone here for taking their time to help me as I try to understand GPUs and design GPU acceleration for Qubes OS!
<karolherbst> dj-death: through cl_khr_subgroup_non_uniform_vote maybe?
<gfxstrand> airlied: I've already done that part twice... I find it hard to believe that NVIDIA would be the insane one of the three.
<karolherbst> ohh wait
<karolherbst> maybe you can only do that in kernel functions? mhhh
<karolherbst> 🙃
<karolherbst> but yeah.. makes sense because otherwise you wouldn't be able to calculate how much local memory a kernel entry point would use...
<karolherbst> alyssa: I guess you need to pass it in as a pointer and then deal with the size at runtime, kinda like the variable shared mem stuff
<alyssa> exciting
<karolherbst> though.. I mean it's driver internal stuff, just keep track of how much shared memory you need 🙃
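(To make the lowering concrete: a work-group scan can be built from subgroup scans plus a shared-memory pass over the per-subgroup totals, with the scratch passed in as a __local pointer as karolherbst describes. The sketch below is illustrative OpenCL C using cl_khr_subgroups, not Mesa's actual lowering; wg_scan_inclusive_add is a made-up name.)

    /* Must be called in uniform control flow (it uses barrier()).
     * Assumes the caller sized subgroup_sums for the number of
     * subgroups in the work-group (driver-internal knowledge).
     */
    int wg_scan_inclusive_add(int value, __local int *subgroup_sums)
    {
        int scan = sub_group_scan_inclusive_add(value);

        /* The last invocation of each subgroup holds that subgroup's
         * total; publish it to the shared scratch.
         */
        if (get_sub_group_local_id() == get_sub_group_size() - 1)
            subgroup_sums[get_sub_group_id()] = scan;
        barrier(CLK_LOCAL_MEM_FENCE);

        /* Add the totals of all preceding subgroups. */
        for (uint i = 0; i < get_sub_group_id(); i++)
            scan += subgroup_sums[i];

        return scan;
    }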
<alyssa> (-:
* alyssa will continue this Later
mvchtz has quit [Ping timeout: 480 seconds]
mvchtz has joined #dri-devel
guludo has quit [Quit: WeeChat 4.2.1]
<mdnavare_> hwentlan_: Ping?
danylo has quit [Quit: Ping timeout (120 seconds)]
danylo has joined #dri-devel
<mdnavare_> hwentlan_: vsyrjala airlied sima: For a VRR mode, according to the VESA definition, the VTOTAL value is stretched out to match the VRR refresh rate. Since for VRR we ignore MSA, the VSE and VSS pulses are ignored by the sink, so would it suffice to create a VRR mode that only differs in its VTOTAL value, calculated as clock / (htotal * vrr_refresh_rate)?
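(mdnavare_'s formula as code, for clarity; vrr_vtotal is a hypothetical helper, and mode->clock is in kHz as in struct drm_display_mode, hence the 1000x scaling:)

    #include <drm/drm_modes.h>
    #include <linux/math.h>

    /* Stretch VTOTAL so that clock / (htotal * vtotal) hits the
     * requested VRR refresh rate; illustrative sketch only, ignoring
     * overflow for very large pixel clocks.
     */
    static unsigned int vrr_vtotal(const struct drm_display_mode *mode,
                                   unsigned int vrr_refresh_hz)
    {
            return DIV_ROUND_CLOSEST(mode->clock * 1000u,
                                     mode->htotal * vrr_refresh_hz);
    }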
mvchtz is now known as Guest1337
mvchtz has joined #dri-devel
Guest1337 has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
jsa has quit [Read error: Connection reset by peer]
vliaskov has quit [Remote host closed the connection]
<Kayden> gfxstrand: congratulations on 1.3 :)
mvchtz has quit [Ping timeout: 480 seconds]
<gfxstrand> Kayden: Thanks!
<gfxstrand> I've now written 2 conformant 1.3 drivers. :D
mvchtz has joined #dri-devel
glennk has quit [Ping timeout: 480 seconds]
<airlied> gfxstrand: you caught up :-P
<gfxstrand> To myself, yes.
<gfxstrand> Oh, right...
<gfxstrand> airlied: But you can't really claim to have made RADV support 1.3
<gfxstrand> So here's a question: If I rewrite NVK in rust, does that count as having written 3 Vulkan drivers?
<alyssa> yes
<alyssa> do it
<alyssa> infinite conformant vk driver glitch
mvchtz is now known as Guest1340
mvchtz has joined #dri-devel
<airlied> gfxstrand: hey, having a company step in and staff up a team to make radv support 1.3 seems like a better win :-P
zhiwang1 has quit [Quit: Connection closed for inactivity]
<alyssa> skill issue
<airlied> but yes, a rust rewrite could win; I don't think you can nerdsnipe me into lvp rust :-P
Guest1340 has quit [Ping timeout: 480 seconds]
<zmike> surely you'd have to start from the foundation of lavapipe anyway