ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
JohnnyonFlame has joined #dri-devel
<jenatali>
Pretty sure you can do anything with a specialization constant
nchery_ has joined #dri-devel
<zmike>
there are some values that must be actual constants
<jenatali>
Just means you need to run the specialization pass from SPIRV-Tools before sending it to the Vulkan driver
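(For context, the plain driver-side specialization route being discussed looks roughly like the sketch below: a value is fed in as a specialization constant when the compute pipeline is created. A minimal sketch, not rusticl or zink code; the constant ID 0 and the "main" entry point are assumptions. Values that must be real SPIR-V constants instead need the SPIRV-Tools pass jenatali mentions before this point.)

    /* Minimal sketch: bake a value into a compute pipeline via a Vulkan
     * specialization constant. Assumes the SPIR-V declares an OpSpecConstant
     * decorated with SpecId 0 and an entry point called "main". */
    #include <stdint.h>
    #include <vulkan/vulkan.h>

    VkPipeline
    create_variant(VkDevice dev, VkPipelineLayout layout,
                   VkShaderModule module, uint32_t shared_mem_size)
    {
       const VkSpecializationMapEntry entry = {
          .constantID = 0,
          .offset = 0,
          .size = sizeof(shared_mem_size),
       };
       const VkSpecializationInfo spec = {
          .mapEntryCount = 1,
          .pMapEntries = &entry,
          .dataSize = sizeof(shared_mem_size),
          .pData = &shared_mem_size,
       };
       const VkComputePipelineCreateInfo info = {
          .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
          .stage = {
             .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
             .stage = VK_SHADER_STAGE_COMPUTE_BIT,
             .module = module,
             .pName = "main",
             .pSpecializationInfo = &spec,
          },
          .layout = layout,
       };
       VkPipeline pipeline = VK_NULL_HANDLE;
       vkCreateComputePipelines(dev, VK_NULL_HANDLE, 1, &info, NULL, &pipeline);
       return pipeline;
    }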
mbrost__ has quit [Ping timeout: 480 seconds]
<zmike>
😬
<zmike>
at that point may as well just do normal pipeline variants
mbrost__ has joined #dri-devel
<karolherbst>
yeah... shared mem is terrible.. isn't there a vulkan extension or something?
<zmike>
could be
<zmike>
check the registry
nchery has quit [Ping timeout: 480 seconds]
<karolherbst>
doesn't seem to exist
<karolherbst>
heh.. but shared mem isn't why luxmark kills the GPU anyway
<zmike>
for now maybe just detect whether you're on zink and create a different compute shader for each variable size, setting the static size to the total size
mbrost_ has joined #dri-devel
<karolherbst>
uhhhhhh
<zmike>
unless you want to plug in spec constant / pipeline variants
<karolherbst>
you know that this value is completely variable?
<zmike>
yep
<karolherbst>
I'm not creating 65000 shaders
<zmike>
well if you want to do other stuff this weekend...
<zmike>
I'm saying I can do the variants next week
<zmike>
but this would work as a temporary solution
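(A rough sketch of that temporary solution: cache one compute-state object per shared-mem size and bake the size in statically. The compile callback below is a hypothetical stand-in for "recompile the kernel with the size as a static allocation", not existing rusticl/zink code.)

    /* Sketch of the stop-gap: keep one compute state per shared-mem size
     * seen so far instead of a truly variable size. */
    #include <stdint.h>
    #include <stdlib.h>

    struct cs_variant {
       uint32_t shared_size;
       void *cso;               /* e.g. result of pipe->create_compute_state() */
       struct cs_variant *next;
    };

    static void *
    get_cs_variant(struct cs_variant **cache, uint32_t shared_size,
                   void *(*compile_with_static_shared_size)(uint32_t size))
    {
       for (struct cs_variant *v = *cache; v; v = v->next) {
          if (v->shared_size == shared_size)
             return v->cso;
       }

       struct cs_variant *v = calloc(1, sizeof(*v));
       v->shared_size = shared_size;
       v->cso = compile_with_static_shared_size(shared_size);
       v->next = *cache;
       *cache = v;
       return v->cso;
    }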
<karolherbst>
I think I'll try to fix why zink trashes the GPU context instead :)
<karolherbst>
ahh... right.. I used to use them, but it was fine.. maybe it's not with radv
<zmike>
if you have it installed you can use ZINK_DEBUG=validation
<karolherbst>
but maybe I should just in case
mbrost_ has quit [Ping timeout: 480 seconds]
<karolherbst>
mhh.. it's not printing anything, must be perfect code then
<karolherbst>
ehh.. I'll just run the cts on anv and see what's the difference
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
<jenatali>
karolherbst: fwiw we'd need variants with shared memory size too, if you wanted to do it in rusticl so any drivers that need it don't have to do it
<karolherbst>
jenatali: I kind of plan on splitting it out so backends can deal with it however they want to
<karolherbst>
but maybe the test is doing something super silly? like using the same device multiple times?
<alyssa>
type=invalid? thinking
<zmike>
karolherbst: massif?
<karolherbst>
zmike: mhh?
<zmike>
to see where the allocation is
<karolherbst>
not in userspace
<zmike>
ah
<karolherbst>
the process is using 1.0% of mem :(
<karolherbst>
probably some GPU buffers or something? I dunno
<zmike>
if you're on amd you can use radeontop or nvtop to see vram utilization in realtime
<karolherbst>
it's anv sadly
<karolherbst>
heh on radv that test just works
<zmike>
find a 🍺 then
<karolherbst>
uses like 100MB of memory
<alyssa>
one line fix nvm
<karolherbst>
"Successfully created all 200 contexts." uhhhhh
<zmike>
man-blinking.gif
<HdkR>
Does it actually run out of memory or does it run into the 65k VMA region limit? :D
<karolherbst>
I guess if a context creates more than 200MB of stuff I can see it running out of memory :D
<karolherbst>
HdkR: actual physical memory
<karolherbst>
but on the GPU side I think
<HdkR>
Very fancy
<karolherbst>
ehhh... but what are those 200 contexts anyway?....
<karolherbst>
I am sure I only create one screen...
<karolherbst>
and a context is per queue...
<karolherbst>
*pipe_context
<karolherbst>
ohh.. it actually creates queues and kernels and programs per cl_context
<karolherbst>
zmike: what's the vk type a pipe_context maps to? vkDevice? vkQueue?
<zmike>
there isn't one
<karolherbst>
anyway.. sounds like an anv bug to me :)
<zmike>
does indeed
lyudess has joined #dri-devel
<jenatali>
karolherbst: When you run on zink, which / how many devices do you show?
<karolherbst>
just one
<karolherbst>
whatever is the active/first vulkan device
Lyude has quit [Read error: Connection reset by peer]
<karolherbst>
zink can't be loaded multiple times yet afaik
<jenatali>
I see
<karolherbst>
I'm sure it's not because of zink, just because there is no "iterate all render nodes" thing
<karolherbst>
anyway.. on radv it works
<karolherbst>
on anv it OOMs
<zmike>
you could probably do it by changing the icd env between loads
<karolherbst>
heh...
<karolherbst>
what a dirty hack, but I like it
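(The hack in question would look something like the sketch below: point the Vulkan loader's VK_ICD_FILENAMES at a single ICD manifest at a time between screen creations. The manifest paths and create_zink_screen() are illustrative, not real rusticl code.)

    /* Sketch of "change the icd env between loads": restrict the Vulkan
     * loader to one ICD manifest before each zink screen is created. */
    #include <stdlib.h>

    extern void create_zink_screen(void);   /* hypothetical loader entry */

    static const char *icd_manifests[] = {
       "/usr/share/vulkan/icd.d/radeon_icd.x86_64.json",
       "/usr/share/vulkan/icd.d/intel_icd.x86_64.json",
    };

    void
    probe_all_icds(void)
    {
       for (unsigned i = 0; i < sizeof(icd_manifests) / sizeof(icd_manifests[0]); i++) {
          setenv("VK_ICD_FILENAMES", icd_manifests[i], 1 /* overwrite */);
          create_zink_screen();   /* zink now only sees this one driver */
       }
    }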
kem has quit [Ping timeout: 480 seconds]
khfeng has joined #dri-devel
khfeng has quit []
khfeng has joined #dri-devel
kem has joined #dri-devel
<illwieckz>
dirty hacks need love too
<karolherbst>
yo, but that is seriously dirty as I'd have to parse env vars and locations and all of that
<zmike>
I guess maybe I should try adding a fallback in zink that progressively iterates through devices if the first one fails? 🤔
<karolherbst>
zmike: I need something like pipe_loader_probe just for vulkan
<zmike>
what is the end goal here
<karolherbst>
being able to load multiple devices through zink
<zmike>
yeah but do you care what you're loading, or are you okay with just loading anything?
<karolherbst>
I have to load _all_ devices
<zmike>
yeah so probably just adding fallback handling in zink would be fine
<karolherbst>
at some point I want to either use the native gallium driver or the zink one though
<karolherbst>
so maybe we need a pipe_loader_probe which falls back to zink for the specific render node instead
<karolherbst>
or something
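(Something like the sketch below is what is being hand-waved at here: walk the render nodes, prefer the native gallium driver, and fall back to zink for the same node. All three helpers are hypothetical stand-ins, not existing pipe_loader API.)

    /* Sketch of a probe loop with per-node zink fallback. */
    #include <stddef.h>

    struct pipe_screen;

    extern size_t enumerate_render_nodes(const char **paths, size_t max);
    extern struct pipe_screen *load_native_driver(const char *render_node);
    extern struct pipe_screen *load_zink_for_node(const char *render_node);

    size_t
    probe_all_devices(struct pipe_screen **screens, size_t max)
    {
       const char *nodes[16];
       size_t n = enumerate_render_nodes(nodes, 16);   /* e.g. /dev/dri/renderD* */
       size_t count = 0;

       for (size_t i = 0; i < n && count < max; i++) {
          struct pipe_screen *s = load_native_driver(nodes[i]);
          if (!s)
             s = load_zink_for_node(nodes[i]);          /* the proposed fallback */
          if (s)
             screens[count++] = s;
       }
       return count;
    }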
heat has quit [Ping timeout: 480 seconds]
<zmike>
hm
<jenatali>
That's how we handle our D3D layered impls
<jenatali>
Would be nice if we could do that for the CL/GL/VK layered impls too, but for now we just take all devices and put them after the native impls
alyssa has left #dri-devel [#dri-devel]
YuGiOhJCJ has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
jagan_ has quit [Remote host closed the connection]
heat has joined #dri-devel
kts has quit [Quit: Leaving]
JoshuaAshton has quit [Ping timeout: 480 seconds]
gawin has joined #dri-devel
kts has joined #dri-devel
gouchi has joined #dri-devel
Company has quit [Quit: Leaving]
gawin has quit [Ping timeout: 480 seconds]
kem has quit [Ping timeout: 480 seconds]
kem has joined #dri-devel
sarnex has quit [Read error: Connection reset by peer]
sarnex has joined #dri-devel
chipxxx has quit [Remote host closed the connection]
MajorBiscuit has joined #dri-devel
chipxxx has joined #dri-devel
Compy has joined #dri-devel
djbw has joined #dri-devel
Compy has left #dri-devel [#dri-devel]
Compy_ has joined #dri-devel
<Compy_>
Good morning all. I'm working on a baseline image targeting an Allwinner H2, MALI GPU (buildroot, linux 5.10.47). I've installed mesa3d (21.1.8) and the gallium lima driver, opengl EGL/ES and libdrm (2.4.107). Linux modules for KMSDRM are compiled in, and I see kmsg saying that lima has loaded/been detected successfully, however I can't get any KMSDRM devices in things like SDL2. From what I can tell, the
<Compy_>
KMSDRM_drmModeGetResources call is failing (the underlying IOCTL call). Any idea where I can start to look at this?
Lucretia has quit []
Lucretia has joined #dri-devel
kts has quit [Quit: Leaving]
MajorBiscuit has quit [Quit: WeeChat 3.5]
<pinchartl>
Compy_: my usual approach to this kind of problem is to trace it in the kernel. there are debugging options you can enable in the DRM/KMS core to output debug messages to the kernel log, tracing calls from userspace. if that's not enough, I usually add printk() statements to the code paths to locate the source of the error. this debugging strategy may be too influenced by a lifetime of kernel
<pinchartl>
development though :-)
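(The DRM/KMS core knob pinchartl is referring to is the drm.debug bitmask; a sketch of flipping it at runtime from a test program is below. 0x1f is assumed to cover core/driver/KMS/prime/atomic on a 5.10-era kernel; the same value can go on the kernel command line as drm.debug=0x1f. Needs root, and the messages land in dmesg.)

    /* Sketch: enable DRM core debug output at runtime via sysfs. */
    #include <stdio.h>

    int
    enable_drm_debug(void)
    {
       FILE *f = fopen("/sys/module/drm/parameters/debug", "w");
       if (!f) {
          perror("drm.debug");
          return -1;
       }
       fprintf(f, "0x1f\n");   /* assumed bitmask: core/driver/kms/prime/atomic */
       fclose(f);
       return 0;
    }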
<Compy_>
Hahah, I'm doing the same thing with printk() statements. I'll see if I can up that debug verbosity in the DRM/KMS core. Thanks a ton for the response to my semi-vague situation. Always appreciated :)
<pinchartl>
I'll have to do something similar to get GL-accelerated composition on an i.MX8MP (with etnaviv, not lima), so I sympathize :-)
JohnnyonFlame has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
JoshuaAshton has joined #dri-devel
columbarius has joined #dri-devel
alyssa has joined #dri-devel
<alyssa>
I'm seeing incorrect rendering of the in-game stars in Neverball on upstream Mesa.
<alyssa>
I expect all drivers are affected and all drivers need that fix
<alyssa>
But apparently I play more Neverball than you guys ;-p
Namarrgon has quit [Quit: WeeChat 3.6]
Namarrgon has joined #dri-devel
Namarrgon has quit []
Namarrgon has joined #dri-devel
<karolherbst>
alyssa: mind checking if "./build/test_conformance/buffers/test_buffers buffer_map_read_uint" fails randomly without the sync hacks you were talking about? I see it on zink and was wondering if it's the same thing you were seeing, because I can also fix it by hard syncing stuff
<alyssa>
I am not in an OpenCL branch right now so I can't test easily
<alyssa>
but ... everything was failing randomly, even early on in test_basic
<karolherbst>
ohh, I see
<karolherbst>
yeah, that's something else then
<karolherbst>
though that might be fixed with my MRs merged now
<karolherbst>
mapping of memory was bonkers and literally only worked perfectly on iris
<alyssa>
Delightful
<karolherbst>
yeah... but the new code should be much better.. at least it works on radeonsi and nouveau without issues
<karolherbst>
but still something is wrong with mapping memory on zink... :/
sdutt has joined #dri-devel
<alyssa>
nod
<alyssa>
I have some advice for anyone considering writing a compiler for a VLIW vec4 GPU
<alyssa>
Don't
<karolherbst>
+1
<karolherbst>
the apple stuff is like that, no?
<alyssa>
Not at all!
<alyssa>
AGX is pure scalar, dynamic scheduling, basically like a CPU except for control flow
<karolherbst>
ohh.. then I confused it with something else
<karolherbst>
or was it just that the encoding is variable in size?
<alyssa>
Yeah
<alyssa>
that's just encoding though
<karolherbst>
right... that I mistook that for something else
<karolherbst>
*Then
<alyssa>
(and means that AGX programs are much smaller than any Mali on avg)
<karolherbst>
I am actually curious if that is a benefit or drawback overall
pcercuei has quit [Read error: Connection reset by peer]
<alyssa>
which?
pcercuei has joined #dri-devel
<karolherbst>
variable length
<alyssa>
Oh.. \shrug/
<alyssa>
Definitely good for icache use
<alyssa>
Makes no difference to sw
<alyssa>
No idea how much complexity it adds to the decoder though
<karolherbst>
yeah.. you probably need smaller caches, but nvidia was like: let's just go with 128b fixed length and yolo it
<karolherbst>
and it's very very wasteful overall
<karolherbst>
but maybe they just put bigger caches
<RSpliet>
reckon the icache has partially-decoded ops or like micro-ops inside?
<RSpliet>
Not that you'd be able to tell from the outside :-)
dv_ has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
<karolherbst>
zmike: what's the thing I have to do for zink to make sure an operation on one pipe_context is actually visible in another one... e.g. a buffer_subdata
dv_ has joined #dri-devel
<zmike>
karolherbst: depends on the type of buffer? if you actually get a direct mapping for it then it'll be visible immediately
<zmike>
but if it's vram then you'll get a staging slice, so you have to unmap/flush
<zmike>
after that the buffer should automatically synchronize other usage
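(In gallium terms the sequence zmike is describing comes out roughly like the sketch below: write and flush on the upload context, wait for the fence, then map from the other context. A sketch only, not rusticl's actual code; error handling omitted.)

    /* Rough gallium-level sketch of "write on one context, read on another".
     * A VRAM buffer gets a staging slice for the subdata, so the write only
     * becomes visible after the upload context is flushed. */
    #include "pipe/p_context.h"
    #include "pipe/p_screen.h"
    #include "util/u_inlines.h"

    static void
    upload_then_read(struct pipe_context *upload_ctx, struct pipe_context *queue_ctx,
                     struct pipe_screen *screen, struct pipe_resource *buf,
                     const void *data, unsigned size)
    {
       /* write through the helper context */
       upload_ctx->buffer_subdata(upload_ctx, buf, PIPE_MAP_WRITE, 0, size, data);

       /* make the write visible to other contexts */
       struct pipe_fence_handle *fence = NULL;
       upload_ctx->flush(upload_ctx, &fence, 0);
       screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
       screen->fence_reference(screen, &fence, NULL);

       /* now a read mapping on the other context should see the data */
       struct pipe_transfer *xfer;
       void *map = pipe_buffer_map(queue_ctx, buf, PIPE_MAP_READ, &xfer);
       /* ... check contents ... */
       pipe_buffer_unmap(queue_ctx, xfer);
    }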
<karolherbst>
mhhh, yeah so it seems like the only case where it's always doing the correct thing is if I use a userptr/hostptr
<karolherbst>
mhh
<karolherbst>
something is very broken with all the memory consistency stuff
<zmike>
are you flushing the transfer map?
<karolherbst>
nope
<karolherbst>
just unmapping them
<karolherbst>
which I guess should be good enough, no?
<zmike>
that should be sufficient, though if you're using cross-context like that you also have to flush the context
<karolherbst>
yeah.. I always flush the helper context I use for data uploads
<zmike>
I'd expect that to be sufficient
<zmike>
can you describe to me your usage in terms of gallium calls
<karolherbst>
yeah.. it's weird
<zmike>
are you actually doing subdata or are you manually map/unmap
<zmike>
and what is the other usage for the buffer
<karolherbst>
subdata for initial data uploads, then a resource_copy + unsynchronized maps/unmaps or shadow resources... I think there are multiple bugs/issues
<karolherbst>
the shadow stuff seems to work reliably though
<karolherbst>
so I suspect something being odd with unsynchronized maps
<karolherbst>
and the test cases with initial uploads are also flaky
<zmike>
it sounds like you probably want to be using the threaded context replace buffer hook?
<karolherbst>
mhh.. how would that look?
<zmike>
create buffer A, subdata, bind as descriptor or whatever, dispatch compute, create buffer B with same size, subdata, replace storage (src=B, dst=A), dispatch compute, ...
<zmike>
ah, the issue might be with rebinds actually
<karolherbst>
not doing any compute launches though
<karolherbst>
that's all just plain copies
<zmike>
step through zink_buffer_map and see if it's discarding
<zmike>
that might be altering the behavior you're expecting
<zmike>
though it should be the same behavior as radeonsi
<karolherbst>
yeah... I'll check
<zmike>
I realize now that I don't think I hooked up anything for rebinding a buffer that's used as a global bind
<zmike>
so if that happens everything's fucked
<karolherbst>
it's not getting bound
<zmike>
just saying
<karolherbst>
there is really no kernel involved here
<karolherbst>
k
<zmike>
you still haven't told me what you're actually doing so I'm just saying things as I think of them
<karolherbst>
" subdata for initial data uploads, then a resource_copy + unsynchronized maps/unmaps or shadow reasoures" that's really all
<zmike>
I mean in terms of the exact command stream
<zmike>
GALLIUM_TRACE would've been great here if it worked
<karolherbst>
what would I need to do to hook it up?
<zmike>
there's some debug_wrap thing
<zmike>
inline_debug_helper.h
<zmike>
debug_screen_wrap
<zmike>
with that working you should be able to do GALLIUM_TRACE=t.xml <exe> and have it dump a huge xml thing that you can then use like `src/gallium/tools/trace/dump.py -N -p t.xml > dump`
<karolherbst>
it seems to be doing that automatically, but it's not setting less callbacks
<karolherbst>
ohh wait.. I guess I have to unwrap it
<zmike>
?
<zmike>
the point is that you use the wrapped screen+context from rusticl
<zmike>
and it'll trace everything you do
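(The hook-up being described boils down to one call at screen-creation time, roughly as below. debug_screen_wrap() is the real helper zmike names from target-helpers/inline_debug_helper.h; the surrounding function is illustrative.)

    /* Sketch: wrap the native screen so GALLIUM_TRACE (and the other debug
     * wrappers) can interpose. Every pipe_context must then be created from
     * the wrapped screen for its calls to show up in the trace. */
    #include "target-helpers/inline_debug_helper.h"

    static struct pipe_screen *
    wrap_screen_for_tracing(struct pipe_screen *native_screen)
    {
       /* returns a wrapper when the debug env vars are set, otherwise the
        * original screen is handed straight back */
       return debug_screen_wrap(native_screen);
    }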
<karolherbst>
sure, but that trace context is already created by something else
<karolherbst>
mhh, let me check the code, maybe I miss something here
<zmike>
ergh I gotta paste you my filter command sometime
<karolherbst>
heh.. seems like the test indeed launches kernels.. strange and annoying
<zmike>
sed -i '/set_constant_buffer/d;/bind_sampler/d;/get_param/d;/get_shader_param/d;/is_format_supported/d;/get_compute_param/d;/get_compiler_options/d;/get_disk_shader_cache/d;/get_name/d;/get_vendor/d;/resource_bind_backing/d;/allocate_memory/d;/free_memory/d;/set_prediction_mode/d;/fence_reference/d;/delete_sampler_state/d' dump
<zmike>
just sending so you have it and can use it
<karolherbst>
... figures
<zmike>
so where's the issue happening
<zmike>
lot of resource_copy_region calls
<karolherbst>
yeah...
<zmike>
hm in the pruned version I see L215 for example you're mapping resource_12 on pipe_1
<karolherbst>
hard to say, the application basically just maps memory and checks if it has the correct values, but it does map asynchronously which is a huge pain
<zmike>
then doing resource_copy_region on pipe_2 with resource_12 as dst
<zmike>
or no misread
<karolherbst>
yeah.. pipe_1 is the helper context dealing with random stuff I don't have a cl_queue for
<zmike>
the copy is a little later, not immediate
<karolherbst>
the thing is...
<karolherbst>
the first buffer_map you see is already returning the pointer the application checks
<karolherbst>
and it checks that it contains the stuff that happens after the map
<karolherbst>
until the flush+fence_finish stuff
<karolherbst>
so everything between resource_creates is more or less one test case
<karolherbst>
I do some shadow resources, so sometimes it doesn't add up
<zmike>
can you just prune it down to one failing case?
<zmike>
it would be easier to know exactly what's happening that way
<karolherbst>
ohh.. I could let it crash on the first failing test
<karolherbst>
that should work
<zmike>
(i.e., edit the test to run only one case that fails)
<karolherbst>
should be obvious where the second one starts
<karolherbst>
(after the flushing)
Jeremy_Rand_Talos__ has joined #dri-devel
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
<zmike>
so...it's creating a resource, subdata, flush, map, set_global_binding, and then it fails?
<zmike>
I assume the kernel is writing to it?
<karolherbst>
yeah.. I think so
<karolherbst>
yep
<zmike>
ok and can you verify whether this is a BAR allocation that zink is doing?
<zmike>
i.e. when it maps, is it directly mapping?
<zmike>
because you're creating it with usage=0, which means it should be attempting BAR
<zmike>
which should be host-visible
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
Jeremy_Rand_Talos__ has joined #dri-devel
<zmike>
ahhhh
<zmike>
you're not creating it as persistent!
<karolherbst>
because it isn't
<zmike>
the mapping?
<zmike>
it sure seems to be
<karolherbst>
it's DIRECTLY | UNSYNCHRONIZED
<zmike>
that's not enough if you're expecting to read data out of it across a dispatch
<karolherbst>
I don't, I want the current data, that's all
<zmike>
but you're running a kernel that writes to it
<karolherbst>
the application is responsible for syncing
<karolherbst>
the application checks after the flushes happen
<karolherbst>
I just have to map very very early
<zmike>
I don't see anything that would cause this to sync?
<zmike>
there's vulkan calls that should be happening here
<karolherbst>
the flush and fence_finish?
<zmike>
yeah but I think you still need to flush/invalidate the mapped memory range
<karolherbst>
mhhh.. that might be
<zmike>
which is why you need persistent
<zmike>
or at least transfer_flush_region
<karolherbst>
but I don't want persistent
<karolherbst>
yeah.. I guess I have to use transfer_flush_region
<karolherbst>
but that's all only for WRITE access and not READ :/
<karolherbst>
it's very annoying
<karolherbst>
but it might not matter
<karolherbst>
so how I understand transfer_flush_region is that when a buffer is mapped with FLUSH_EXPLICIT, cached data is only _written_ back on transfer_flush_region
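(What that looks like at the gallium level, as a sketch and not rusticl's code; as noted above, this path only covers writes.)

    /* Sketch of the transfer_flush_region path: map with FLUSH_EXPLICIT,
     * write a sub-range, then tell the driver exactly which bytes need to be
     * flushed before unmapping. */
    #include <stdint.h>
    #include <string.h>
    #include "pipe/p_context.h"
    #include "util/u_box.h"
    #include "util/u_inlines.h"

    static void
    write_range(struct pipe_context *ctx, struct pipe_resource *buf,
                unsigned offset, unsigned size, const void *data)
    {
       struct pipe_transfer *xfer;
       uint8_t *map = pipe_buffer_map(ctx, buf,
                                      PIPE_MAP_WRITE | PIPE_MAP_FLUSH_EXPLICIT,
                                      &xfer);
       memcpy(map + offset, data, size);

       struct pipe_box box;
       u_box_1d(offset, size, &box);                 /* bytes actually written */
       ctx->transfer_flush_region(ctx, xfer, &box);  /* flush just that range */

       pipe_buffer_unmap(ctx, xfer);
    }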
<zmike>
well you can see if this fixes it by just setting persistent on the map
<karolherbst>
anyway.. I'm not using "DIRECTLY | UNSYNCHRONIZED" on non-UMA systems, so RADV should get its shadow buffer and that should all work fine regardless
<zmike>
and nothing else
<karolherbst>
sure, but I absolutely don't want to set persistent
<zmike>
nothing else extra*
<zmike>
yeah yeah but for testing
<karolherbst>
just the map or also the resource?
<zmike>
map
<karolherbst>
still fails
<zmike>
hm
<zmike>
what's the memory_barrier call there for?
<karolherbst>
just something clover was doing after launch_grid and I copied it over
<karolherbst>
might not be needed or might be.. dunno
<zmike>
you only need that if you're synchronizing between gpu operations
<zmike>
which this doesn't appear to be doing
<zmike>
not that it's harmful
<karolherbst>
yeah.. perf opts are for later and stuff
<zmike>
hm
<zmike>
not really a perf opt, zink will no-op it anyway
<karolherbst>
ahh
<zmike>
definitely looks like something weird happening
<karolherbst>
so the only real requirement I have is that after flush + fence_finish the current data is visible through the mapped ptrs
<karolherbst>
and normally I only try to map directly when I know it's safe
<karolherbst>
like on UMA systems
<zmike>
yeah the only thing I can think of is that you're not getting a real map of the buffer somehow
<karolherbst>
don't do it on discrete GPUs e.g.
<karolherbst>
might be
<karolherbst>
but zink doesn't seem to shadow it either
<zmike>
but if you've stepped through zink_buffer_map you can check that pretty easily
<zmike>
have you tried mapping with just PIPE_MAP_READ and not also PIPE_MAP_WRITE
<karolherbst>
yeah.. it calls map_resource on the res directly
<karolherbst>
huh.. I could try that, but it does look like it gives me the real deal directly
<zmike>
alright
<karolherbst>
so
<karolherbst>
I know a way of fixing it
<karolherbst>
if I just stall the pipeline the tests are passing
<zmike>
not sure then, I'd probably have to look at it
<zmike>
stall the pipeline?
<karolherbst>
like sleeping in my worker thread on the second context
mhenning has joined #dri-devel
<karolherbst>
helper context stuff happens on the application thread, actual context stuff (cl_queue) happens inside a worker thread
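(Roughly the shape being described, as an illustrative pthreads sketch rather than rusticl's actual Rust implementation: the helper context is used directly on the application thread, while each cl_queue owns a pipe_context that only its worker thread ever touches. The work-item layout and callbacks are assumptions.)

    /* Sketch: per-queue worker thread draining a work queue; all use of the
     * queue's pipe_context stays on that thread. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct pipe_context;

    struct work_item {
       void (*run)(struct pipe_context *ctx, void *data);
       void *data;
       struct work_item *next;
    };

    struct cl_queue_thread {
       struct pipe_context *ctx;      /* owned by the worker thread only */
       pthread_mutex_t lock;
       pthread_cond_t cond;
       struct work_item *head, *tail;
       bool shutdown;
    };

    static void
    enqueue(struct cl_queue_thread *q, struct work_item *item)
    {
       item->next = NULL;
       pthread_mutex_lock(&q->lock);
       if (q->tail)
          q->tail->next = item;
       else
          q->head = item;
       q->tail = item;
       pthread_cond_signal(&q->cond);
       pthread_mutex_unlock(&q->lock);
    }

    static void *
    worker(void *arg)
    {
       struct cl_queue_thread *q = arg;
       pthread_mutex_lock(&q->lock);
       while (!q->shutdown || q->head) {
          while (!q->head && !q->shutdown)
             pthread_cond_wait(&q->cond, &q->lock);
          struct work_item *item = q->head;
          if (!item)
             continue;                 /* woken for shutdown, nothing queued */
          q->head = item->next;
          if (!q->head)
             q->tail = NULL;
          pthread_mutex_unlock(&q->lock);

          item->run(q->ctx, item->data);   /* pipe_context use stays here */
          free(item);

          pthread_mutex_lock(&q->lock);
       }
       pthread_mutex_unlock(&q->lock);
       return NULL;
    }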
<karolherbst>
so probably just a sync issue between the two contexts or something?
<zmike>
that's incredibly bizarre
<karolherbst>
I could sleep and make another trace
<karolherbst>
maybe it looks differente nough to spot something
<karolherbst>
yeah.. somebody asked for that as a separate feature, but I thought I can just combine it to one env var
<karolherbst>
should also help with zink and native not claiming the same device
<karolherbst>
sooo.. zink is using PIPE_MAP_DISCARD_RANGE.. and also sets UNSYNCHRONIZED, DISCARD_WHOLE_RESOURCE.. but it's still ending up with a direct map_resource thing
djbw has quit [Read error: Connection reset by peer]
<zmike>
hm
<zmike>
I don't have a great explanation then
<karolherbst>
maybe MAP_ONCE changes things...
<zmike>
subdata should be fine on its own
<karolherbst>
yeah... dunno
<karolherbst>
doesn't seem to be though
<zmike>
I can only speculate that the flush+fence is forcing mapped memory invalidation/flush
<karolherbst>
mhhh... yeah, let me debug a bit further