<u-amarsh04>
I'm trying to git bisect mesa under Debian / Devuan and have only git bisected kernels before. Are there any up-to-date guides to help?
<karolherbst>
u-amarsh04: you want to use `meson devenv` so you won't have to install stuff
<karolherbst>
but that's pretty much it
<karolherbst>
just write your reproduce script and use `git bisect run`
<karolherbst>
or do it manually if you can't automate it
<u-amarsh04>
so if I have the git source in /usr/src/mesa and, from that directory, have run "meson setup ../meson-test", what next?
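A minimal sketch of the reproduce script karolherbst mentions, assuming the build directory from `meson setup ../meson-test` and a reproduce command supplied by the user. The file name `bisect_mesa.py`, the helper names, and the `./repro.sh` command in the usage note are hypothetical. The exit codes follow the `git bisect run` convention: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad.

```python
#!/usr/bin/env python3
"""Hypothetical bisect_mesa.py: one step of `git bisect run` for mesa."""
import subprocess
import sys

SKIP = 125  # git bisect run: "this commit cannot be tested, skip it"


def bisect_exit_code(build_rc, test_rc):
    """Map build/test return codes onto the codes git bisect run expects."""
    if build_rc != 0:
        return SKIP  # commit does not build, so it cannot be judged
    return 0 if test_rc == 0 else 1  # 0 = good, 1 = bad


def run_step(build_dir, repro_cmd):
    """Build the checked-out commit, then run the reproducer uninstalled."""
    build = subprocess.run(["meson", "compile", "-C", build_dir])
    if build.returncode != 0:
        return bisect_exit_code(build.returncode, None)
    # meson devenv runs the command in an environment that points at the
    # build directory, so nothing has to be installed system-wide.
    test = subprocess.run(["meson", "devenv", "-C", build_dir, *repro_cmd])
    return bisect_exit_code(0, test.returncode)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # argv[1]: meson build directory, argv[2:]: the reproduce command
    sys.exit(run_step(sys.argv[1], sys.argv[2:]))
```

With the script stored outside the mesa checkout, a run could look like `git bisect start`, `git bisect bad <bad-commit>`, `git bisect good <good-commit>`, then `git bisect run /path/to/bisect_mesa.py ../meson-test ./repro.sh`. The reproduce command has to exit non-zero when the bug shows up.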
heat has joined #dri-devel
adarshgm has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
<dj-death>
karolherbst: I'm starting to think lower_vars_to_ssa needs an upgrade
<dj-death>
karolherbst: we should really be able to remap everything if all accesses are not indexed with dynamic values
<karolherbst>
yeah...
<dj-death>
take the whole struct size
<dj-death>
split it into vec4s
<dj-death>
rebuild the casts
<dj-death>
I mean even vec16 right
<dj-death>
if it's all constant offsets, it'll get split correctly
<karolherbst>
it would probably be enough to just take into account the actual location instead of just looking at the logical paths
<karolherbst>
though we don't always have this information when calling into vars_to_ssa
<karolherbst>
we could require explicit types, but then we need to fix a couple of passes that can choke on that
<karolherbst>
though maybe only nir_opt_memcpy needs fixing
<karolherbst>
ehh
<dj-death>
explicit types is another issue on its own :)
<karolherbst>
maybe those things got fixed actually
<karolherbst>
well
<karolherbst>
you don't know the size of a struct without explicit types
<karolherbst>
so taking the size into account is impossible without that
<dj-death>
the fact that we lower explicit types and the explicit_stride is still 0
<dj-death>
that's completely broken
<dj-death>
that's probably step0
<karolherbst>
I sadly don't know what the reason for that was, but gfxstrand might be able to tell
<karolherbst>
though I think explicit stride is only set in vtn by creating the type
<dj-death>
it's never set on structs
<dj-death>
only arrays & matrix
<karolherbst>
yeah, because structs don't have explicit strides
<karolherbst>
I think the bug is rather that we shouldn't rely on the explicit stride being set at all
<karolherbst>
_though_
<karolherbst>
having like this duality also sucks
<karolherbst>
but this smells like "spend a month to rework all of this" to change how we do things here
adarshgm has joined #dri-devel
<dj-death>
can't we just special case with nir->info.stage == MESA_SHADER_KERNEL and glsl_get_cl_size ? :)
<karolherbst>
I don't see how that would actually help? Because the fix is to not rely on the explicit stride anyway
<karolherbst>
I think
<karolherbst>
though I think that assigning a stride to every type is probably the better long term solution here and just rely on that. In any case, we should discuss with gfxstrand before somebody spends a lot of time on this and then we decide something else
Leopold_ has quit [Remote host closed the connection]
<thellstrom>
Has there been any discussion about using __GFP_ACCOUNT for user-mode triggered allocations in the kernel mode drivers? I can imagine that starting to add it to existing drivers may break existing setups. But for new drivers, is this a good time to audit for such allocations and add it?
* alyssa
would like to get rid of SHADER_KERNEL someday
kts has quit [Ping timeout: 480 seconds]
<sima>
thellstrom, doing that for system ram gpu allocations is pretty much what the last big cgroups discussion boiled down to
<sima>
it's just kinda a pile of work
<karolherbst>
alyssa: that requires fixing zink :')
<sima>
thellstrom, wrt the backwards compat thing, I figured we'll just do a Kconfig
<karolherbst>
or well.. rework how gl sampler/textures work
<sima>
since it's kinda a distro choice
<sima>
but that means when we have it, we should at least try to account consistently across drivers
<karolherbst>
not sure we have any other reasons why there is still KERNEL
<sima>
which I don't think is that much work since almost everyone uses helpers nowadays
<sima>
the other fun part is that you kinda need a cgroups aware shrinker or it'll not work out great at all
<thellstrom>
sima: I was thinking of initially adding it to things we can't really shrink. Like persistent structure allocations triggered by user-space etc. But yeah for shrinkable buffer object memory, cgroups-aware shrinkers would be needed.
<sima>
thellstrom, I fear a bit that if we're very piecemeal then we need a new opt-in every time we add a substantial amount of memory
<sima>
but if we start out with gem bo accounting first, then adding the other bits should only ever really catch abusive applications that e.g. create a ton of ctx they don't actually use
<sima>
thellstrom, t j mercier did work on this last, including some charge transfer stuff that android would need, for otherwise it all lands in the central allocator binder process
adarshgm has quit [Ping timeout: 480 seconds]
<sima>
thellstrom, I chatted with mlankhorst about this earlier this week and also dropped a few links there, can dig them out again if you want
<thellstrom>
NP. I can ping mlankhorst if needed. This was more like a general question whether it was a "No, don't do that" thing.
kts has joined #dri-devel
<zmike>
dj-death: is there any progress on making anv work again or do I need to just locally revert however many patches
<dj-death>
zmike: not yet
<zmike>
🤕
<karolherbst>
could use llvm-16 locally
vliaskov has joined #dri-devel
itoral has quit [Remote host closed the connection]
<dj-death>
zmike: probably going to implement that stupid scratch to ssa plan
<u-amarsh04>
karolherbst - thanks, I think that I have it working now
<karolherbst>
dj-death: how would that work though?
<karolherbst>
like...
<karolherbst>
we need to be able to optimize it all before explicit_types
<karolherbst>
or rather...
<karolherbst>
before explicit_io
<karolherbst>
but maybe running explicit_types twice or moving scratch size calculation elsewhere might actually work?
<dj-death>
after explicit_io
<dj-death>
because I know my shaders
<dj-death>
with a special intel_clc --fuck-you-llvm-17
<karolherbst>
but then you can't calculate a new scratch_size
<karolherbst>
ahh.. so intel_clc specific pass then?
<dj-death>
I assume it'll be 0
<karolherbst>
right...
<dj-death>
dirty
<karolherbst>
probably good enough until we properly fix it
<dj-death>
but at least that'll make zmike happy
<dj-death>
yeah
adarshgm has joined #dri-devel
<zmike>
generally being able to init drivers does make me happy, yes
<zmike>
MrCooper: did you have additional comments or just the one
adarshgm has quit [Ping timeout: 480 seconds]
Leopold has joined #dri-devel
yyds has joined #dri-devel
apinheiro has quit [Quit: Leaving]
kts has quit [Ping timeout: 480 seconds]
Leopold has quit [Remote host closed the connection]
Leopold has joined #dri-devel
yyds has quit [Remote host closed the connection]
kts has joined #dri-devel
kts has quit [Remote host closed the connection]
kts has joined #dri-devel
Company has joined #dri-devel
sukrutb has joined #dri-devel
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
sukrutb has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
f11f12 has joined #dri-devel
bolson has joined #dri-devel
<dj-death>
karolherbst: it's interesting that all the function stuff is using scratch
<dj-death>
karolherbst: and yet once you inline it goes away magically
<alyssa>
dj-death: i hit this with libagx too, yeah
<alyssa>
The big culprit is return values
<dj-death>
but that seems okay for me
<alyssa>
which are derefs in NIR so we're forced to use scratch variables for me
<alyssa>
s/me/them/
<alyssa>
but after inlining, vars_to_ssa chews through them and makes the variables go away
<alyssa>
but scratch_size is never decremented
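The stale `scratch_size` alyssa describes can be illustrated with a toy model. This is only a sketch: `ToyShader` and the three helpers are hypothetical names, not NIR API, and the byte sizes are made up. It assumes a size-assignment step that only ever grows the recorded size, so the value shrinks only when it is reset and recomputed after the variables are gone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShader:
    # name -> size in bytes of each function-temp variable (made-up values)
    temps: dict = field(default_factory=dict)
    scratch_size: int = 0


def assign_scratch(shader):
    """Grow scratch_size to cover the remaining temps; never shrinks it."""
    needed = sum(shader.temps.values())
    shader.scratch_size = max(shader.scratch_size, needed)


def promote_to_ssa(shader, names):
    """Stand-in for vars_to_ssa: variables vanish, scratch_size is untouched."""
    for name in names:
        shader.temps.pop(name, None)


def recompute_scratch(shader):
    """Reset the size first, then assign again from the surviving temps."""
    shader.scratch_size = 0
    assign_scratch(shader)
```

In this model, promoting a return-value temp to SSA leaves `scratch_size` at its old value even if `assign_scratch` is called again; only `recompute_scratch` brings it down.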
tomba_ has quit [Remote host closed the connection]
<karolherbst>
dj-death: why is it interesting that functions use scratch? where else would you put it (besides SSA values)?
<karolherbst>
or do you mean the function parameters?
<dj-death>
karolherbst: yeah parameters
<karolherbst>
that's probably because kernels are cursed...
<karolherbst>
you can read up on some of the kernel wrapper we emit inside spirv_to_nir
<karolherbst>
but memory has to go somewhere anyway
<dj-death>
could be something else
<dj-death>
like stack
<karolherbst>
but I think most of it is due to llvm placing anything bigger always in memory? but the spirv way of passing things by value is also by putting them into function memory
<karolherbst>
and then pass in pointers
<karolherbst>
and yeah.. CL allows you to take pointers to stack memory :)
<karolherbst>
though I think the reason we end up with so many pointers as function arguments is because that's how LLVM and the translator work
<dj-death>
because for instance execute in RT pipelines goes into a different memory location than scratch for us
<karolherbst>
mhhh
<karolherbst>
we could potentially make use of other memory regions, but not quite sure what that would all look like as CL C really has a stronger model here than GLSL
kts has joined #dri-devel
kzd has joined #dri-devel
anujp has joined #dri-devel
<jenatali>
dj-death: isn't scratch the same as stack?
<dj-death>
jenatali: what do you mean?
<jenatali>
alyssa: yeah see the long discussion around scratch_size yesterday and how it's currently terrible. You have to reset it and re-run vars to explicit types after optimizations
<jenatali>
dj-death: my interpretation of scratch is that it's equivalent to the C concept of stack memory
<dj-death>
jenatali: yeah, that's not really how our HW works
<dj-death>
jenatali: we have a special way to store scratch, which offsets magically per lane
<karolherbst>
jenatali: depends on how you look at it... technically CL private is more like thread local storage and CL local is more like CPU C stack?
<karolherbst>
just without dedicated hardware
<dj-death>
jenatali: but that only works if you don't reorder threads through shader calls (which our HW does)
<dj-death>
jenatali: so we have to store stack stuff somewhere else, independent of HW thread location
<karolherbst>
wait...
<karolherbst>
that's kinda cursed
<jenatali>
Oh sure, scratch is like stack from the single threaded POV, where local is like stack from the SIMT POV
<karolherbst>
dj-death: so you basically have to use a global buffer and offset per thread manually?
<dj-death>
karolherbst: correct
<dj-death>
for the stack
<dj-death>
we only use that in the RT pipelines atm
<karolherbst>
yeah.. figures
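The "global buffer, offset per thread manually" scheme karolherbst and dj-death agree on comes down to simple address arithmetic. The sketch below is illustrative only: the function name, the parameters, and the fixed-size slice per thread are assumptions, not a description of the actual Intel RT stack layout.

```python
def stack_address(base, thread_id, per_thread_size, offset):
    """Address of `offset` inside the stack slice owned by `thread_id`.

    Assumes one global buffer starting at `base`, carved into equal slices
    indexed by a logical thread id rather than by hardware thread location.
    """
    if not 0 <= offset < per_thread_size:
        raise ValueError("offset outside this thread's stack slice")
    return base + thread_id * per_thread_size + offset
```

Because the slice is chosen by a logical id, the address stays valid even if the hardware moves the thread to a different lane between shader calls.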
<karolherbst>
anyway... trying to figure out how to fix this other issue with scratch..
<karolherbst>
dj-death: uhh.. what's the easiest way to figure out which kernels (as in the CL C code) cause issues?
<dj-death>
karolherbst: from what I've seen it's mostly structure
<dj-death>
karolherbst: and temps
<karolherbst>
I mean.. sure, but I still want to see the code so I can copy it instead of trying to figure out the examples myself
<dj-death>
karolherbst: like a fairly large private structure (20 dwords) passed by pointer to a function
<dj-death>
and the function picking some values to do something else with it
<dj-death>
I can only give you some example of what is causing problems there
<dj-death>
s/there/here/
<karolherbst>
yeah, that would be good enough for now
<dj-death>
okay, let me try to generate a few
<karolherbst>
like.. do you have the code or should I just dump whatever gets passed into compiler_shader?
<karolherbst>
the reason llvm-17 ends up with u64* here is that the translator "guesses" the function signature of the called function by looking at the pointer types it has atm, because LLVM doesn't have typed pointers anymore
<karolherbst>
soooo
<karolherbst>
e.g. if you call into memcpy, it might not be a void* thing, but a u64* thing or whatever
<karolherbst>
anyway.. now we optimize that one cast to a `deref_struct` thing :D
<karolherbst>
soooo
<karolherbst>
okay
<karolherbst>
here is the _actual_ difference
<karolherbst>
`nir_opt_memcpy` was able to recognize that the copy copies between two values of the same type and deconstructs it
<karolherbst>
or rather
<karolherbst>
converts memcpy_deref -> copy_deref
<karolherbst>
and we end up using scratch with llvm-17 because of nir_lower_memcpy
tzimmermann has quit [Quit: Leaving]
yyds has joined #dri-devel
<karolherbst>
but ultimately that means that the cast -> deref_struct optimization breaks this use pattern
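The memcpy_deref -> copy_deref conversion karolherbst describes can be modelled roughly as below. This is a toy sketch, not the real `nir_opt_memcpy`: the `Memcpy` and `Copy` classes, the string type names, and the size table are all hypothetical. It only shows why a memcpy whose two sides no longer have the same type stays a memcpy and is left for a later lowering pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memcpy:
    dst_type: str
    src_type: str
    size: int  # bytes copied


@dataclass(frozen=True)
class Copy:
    type_name: str  # whole-value copy of one typed variable


# Hypothetical type sizes; 80 bytes matches a 20-dword private struct.
TYPE_SIZES = {"struct foo": 80, "u64": 8}


def opt_memcpy(instr, type_sizes):
    """Turn a memcpy into a typed copy when both sides share one type
    and the byte count covers exactly one value of that type."""
    same_type = instr.dst_type == instr.src_type
    if same_type and type_sizes.get(instr.dst_type) == instr.size:
        return Copy(instr.dst_type)
    return instr  # types differ or partial copy: leave the memcpy alone
```

For example, `opt_memcpy(Memcpy("struct foo", "struct foo", 80), TYPE_SIZES)` yields a `Copy`, while the same call with one side seen as `"u64"` returns the memcpy unchanged.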
<abhinav__>
jani thanks, I will use drm-tip and post those again. we will need one help though, once reviewed and merged onto intel tree, we will need a tag with those patches so that we can base our tree on top of that as one series of ours needs those.
fireburn has quit []
<lumag>
jani, regarding abhinav__'s request. The depending series has mostly passed the reviews, so if possible we'd like to skip unnecessary delays.
jkrzyszt has quit [Ping timeout: 480 seconds]
<karolherbst>
do we have a handy nir helper to iterate over all uses of a def?
<dj-death>
karolherbst: oh thanks, will give this a try
<karolherbst>
what a mess honestly..
<karolherbst>
but I think a fix here might actually also help optimizing better in a few corner cases regardless of llvm being weird
<karolherbst>
maybe we want to have better memcpy opts 🤷
<dj-death>
testing ETA 45mn :)
bolson has quit [Remote host closed the connection]
bolson has joined #dri-devel
<HdkR>
8
cyrinux has quit []
cyrinux has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
mbrost has joined #dri-devel
Haaninjo has joined #dri-devel
ced117 has quit [Remote host closed the connection]
ced117 has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
frieder has quit [Remote host closed the connection]
mbrost has quit [Read error: Connection reset by peer]
Kayden has quit [Quit: -> JF]
gouchi has joined #dri-devel
<zmike>
mareko: you probably know this off the top of your head - is it legal for e.g., a vertex attrib to be set with vertexArrayAttribFormat(R8_UINT) and then the shader attribute is type int?
<zmike>
I've skimmed core and glsl specs and I haven't yet found anything disallowing it
Guest2903 has quit []
fab has joined #dri-devel
fab has quit []
fab has joined #dri-devel
fab is now known as Guest2922
tursulin has quit [Ping timeout: 480 seconds]
Guest2922 has quit []
fab_ has joined #dri-devel
fab_ is now known as Guest2926
Guest2926 has quit []
sima has quit [Ping timeout: 480 seconds]
konstantin_ has joined #dri-devel
konstantin is now known as Guest2928
konstantin_ is now known as konstantin
Guest2928 has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
Duke`` has quit [Ping timeout: 480 seconds]
mvlad has quit [Remote host closed the connection]
gouchi has quit [Remote host closed the connection]
iive has joined #dri-devel
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
sarthakbhatt has joined #dri-devel
<sarthakbhatt>
Hi, I'm trying to fork the mesa repo but unfortunately I'm not able to fork it. I'm kinda new to mesa.