ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<airlied>
ah it ultra sucks because srgb dxt
<airlied>
that might be solvable
<airlied>
or maybe not, need to dig more
<karolherbst>
airlied: yeah.. but I suspect llvmpipe might be more optimized for graphics than for compute and I know that e.g. pocl can be 10 times as fast as llvmpipe :/ But that might have other reasons (like not using libclc)
<airlied>
you'd hope pocl would be faster, it has developers
<airlied>
but then again I thought that for swiftshader
<airlied>
I'd guess a day with pocl vs llvmpipe would figure it out
<airlied>
I know if you drop the coroutines the compute can get a little faster, but 10x is unlikely :-P
<karolherbst>
well...
<airlied>
the ncnn devs gave me some benchmarks a long time ago and seemed pretty happy with the results from them
<karolherbst>
yeah... not sure
<karolherbst>
maybe compute is really terrible because of libclc routines
<karolherbst>
and using the normal libc math lib is way more efficient
<airlied>
yeah there are many places you could wire up faster bits
ngcortes has quit [Ping timeout: 480 seconds]
<airlied>
ah * After linearized sRGB values require more than 8bits. the defeater is found
<zmike>
victory?
<marex>
pinchartl: thanks, trivial nitpick here and there
<airlied>
nope still seems slow if i hack around it
<karolherbst>
mostly compiler, but also a lot of runtime being broken
<alyssa>
karolherbst: the CTS results, are those for nouveau+rusticl? or radeonsi?
<alyssa>
or llvmpipe?
<alyssa>
you do a lot of OpenCL :-p
<karolherbst>
radeonsi
<alyssa>
Ah!
<alyssa>
nice
<karolherbst>
got asked on the CL khronos call today if rusticl can help with getting CL more useful on the desktop, and I was like: well... the plan is to ship it by default through mesa long term :P
<alyssa>
niiice
Leopold__ has joined #dri-devel
<karolherbst>
less than 100 fails.. guess that might be possible by XDC... wtf
<karolherbst>
though I think we should still get iris to pass conformance before that :D
<karolherbst>
why are those pile of images_kernel_image_methods crashing...
Leopold_ has quit [Ping timeout: 480 seconds]
<karolherbst>
ehhh
<karolherbst>
radeonsi can't handle a i2i32 on a deref_var to an image
<karolherbst>
figures
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
Jeremy_Rand_Talos__ has joined #dri-devel
Leopold_ has joined #dri-devel
Leopold__ has quit [Ping timeout: 480 seconds]
<alyssa>
opencl/linux/arm64 is a big chicken and egg problem tbh
<alyssa>
there are few useful workloads, probably because driver support is so poor
<alyssa>
but driver support is poor because why write a driver with no workload?
yuq825 has joined #dri-devel
<karolherbst>
yeah...
<alyssa>
other than darktable i'm not sure what workloads I could test with that don't require anbox or fex/box86
<alyssa>
and you mentioned darktable's cl backend was somehow slower than its cpu..?
<alyssa>
don't get me wrong passing CL CTS is a huge flex
<alyssa>
but not sure what it buys us in the short term
<karolherbst>
not quite sure... I guess one would have to dig into why it's not faster or something
<karolherbst>
probably pipeline stalls
<alyssa>
yeah..
<alyssa>
many (if not most) bugs I hit from CL CTS I would've eventually hit with Vulkan
<alyssa>
so even if CL never ships it was certainly a good preparedness exercise, the valhall compiler should be ready for vulkan now
<alyssa>
(including with all the funny 8 bit exts)
<karolherbst>
I actually hit a bug in AMD's LLVM backend ...
<alyssa>
nice.
<alyssa>
I hit many many bugs in Panfrost's NIR backend ... ;)
<karolherbst>
anyway.. I think it's possible to write a competent and low overhead driver in CL, I am just convinced that it's impossible for applications to use it efficiently :D
<karolherbst>
we really need a CL_MESA_dont_allow_stupid_crap
<karolherbst>
I am sure if everyone would posix_memalign their host_ptrs to 0x1000 before using them, they might get more perf
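A minimal sketch of the host-side allocation being suggested, assuming a 4 KiB (0x1000) page size. The helper names are hypothetical, and whether an aligned host_ptr actually buys performance depends on the runtime; the pointer would then be passed to clCreateBuffer with CL_MEM_USE_HOST_PTR (that call is omitted so the example stays self-contained).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HOST_PTR_ALIGN 0x1000u /* assumed page size: 4 KiB */

/* Round a size up to a whole number of pages. */
static size_t round_up_to_page(size_t size)
{
    return (size + HOST_PTR_ALIGN - 1) & ~(size_t)(HOST_PTR_ALIGN - 1);
}

/* Allocate a page-aligned, page-padded host buffer; release it with free(). */
static void *alloc_host_ptr(size_t size)
{
    void *ptr = NULL;
    if (size == 0 || posix_memalign(&ptr, HOST_PTR_ALIGN, round_up_to_page(size)) != 0)
        return NULL;
    /* posix_memalign guarantees the requested alignment */
    assert(((uintptr_t)ptr & (HOST_PTR_ALIGN - 1)) == 0);
    return ptr;
}
```

Padding the size to a full page as well means the buffer never shares its last page with unrelated heap data, which is the usual reason to do both.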
<alyssa>
heh
<karolherbst>
though I actually also don't know what's so low perf in darktable.. is it us using libclc too often? is it stalls in launch_grid, because of... weird things?
<karolherbst>
no clue :)
<karolherbst>
there is one fun thing we could do: support out of order queues
<karolherbst>
and load balance across multiple contexts for things not depending on each other
<karolherbst>
but ufff
<karolherbst>
could be tons of work
<alyssa>
t_t
<karolherbst>
heh.. is it just me or is si_buffer_from_user_memory busted if it fails?
<karolherbst>
ahh yeah.. I try to create an image from user memory and this totally doesn't work for radeonsi :)
<karolherbst>
alyssa: anyway.. I am sure out of order queues are a fun project :P
<karolherbst>
_maybe_ we could use TC as the backend impl
<karolherbst>
but that's not really out of order, is it?
<airlied>
alyssa: camera processing might be a cl user
<karolherbst>
face detection, but in open source :P
<karolherbst>
uhm.. recognition I mean
<karolherbst>
fancy "choose the best filters" AI magic
<karolherbst>
I might have to teach nir_to_llvm to emit deref_vars to pointers... :/
<karolherbst>
maybe I just deal with the deref_var there for drivers like radeonsi wanting derefs...
cheako has joined #dri-devel
bmodem has joined #dri-devel
aravind has joined #dri-devel
ybogdano has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
kem has quit [Ping timeout: 480 seconds]
fxkamd has quit []
mhenning has quit [Quit: mhenning]
Company has quit [Quit: Leaving]
pcercuei has quit [Ping timeout: 480 seconds]
kem has joined #dri-devel
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #dri-devel
bgs has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
tzimmermann has joined #dri-devel
JohnnyonFlame has quit [Quit: No Ping reply in 180 seconds.]
JohnnyonFlame has joined #dri-devel
sdutt_ has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
epoll has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Quit: No Ping reply in 180 seconds.]
JohnnyonFlame has joined #dri-devel
itoral has joined #dri-devel
epoll has joined #dri-devel
Leopold_ has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
ppascher has joined #dri-devel
JohnnyonFlame has quit [Read error: No route to host]
JohnnyonFlame has joined #dri-devel
kts has joined #dri-devel
kts has quit []
neoXite has quit []
ella-0 has quit []
mbrost has joined #dri-devel
jewins has quit [Read error: Connection reset by peer]
jkrzyszt has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.5]
Daaanct12 has joined #dri-devel
Daanct12 has quit [Read error: Connection reset by peer]
aravind has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
tursulin has joined #dri-devel
aravind has joined #dri-devel
danvet has joined #dri-devel
MajorBiscuit has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
Major_Biscuit has joined #dri-devel
rasterman has joined #dri-devel
jfalempe has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
swalker__ has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest1601
<pq>
mdnavare, debugfs makes no sense if you want userspace to use it as part of normal operations.
lynxeye has joined #dri-devel
<pq>
mdnavare, please, no bidirectional KMS properties. They are a mistake.
<pq>
mdnavare, vsyrjala, what does the kernel do if userspace attempts to program a video mode that exceeds the monitor frequency limits? Is it failed, does it silently do something else, or does the kernel do exactly what userspace told it (which then maybe results in link-status failure later)?
swalker__ has quit [Ping timeout: 480 seconds]
<pq>
mdnavare, vsyrjala, I think attempting to set and use VRR should follow the same scheme as setting a video mode. E.g. if userspace enables VRR, the driver enables VRR, period. The monitor might malfunction or link-status might fail, but that's what userspace asked for.
<pq>
emersion, ^
<pq>
It's different if the source-side hardware or driver *cannot* do something, that needs to be an immediate failure.
aravind has quit [Read error: Connection reset by peer]
<emersion>
basically, does ALLOW_MODESET guarantee no visual artifacts even if the screen does dumb stuff
<emersion>
if there's no such guarantee, how can we reword the docs to account for this
<danvet>
emersion, yeah maybe include a warning that sinks might be stupid
<emersion>
danvet: thanks for the suggestion, sounds fine to me
<danvet>
we can't really guarantee what's going on there though
<emersion>
right…
<danvet>
so kernel can only guarantee that stuff is perfectly atomic (i.e. the infoframe change goes out with the right picture) up to the sink
<danvet>
which might actually physically be in the external display with mst hubs and stuff :-)
<danvet>
*up to the sink but not including whatever is going on there
<danvet>
unless userspace wants the kernel to carry the heuristics/spec language/whatever to check which infoframe changes would result in flickering
<danvet>
which I guess could make some sense, but maybe better to clarify that in the docs for the respective properties
<emersion>
i think it depends
<danvet>
emersion, pq my understanding of the flip event was always "old fb can now be safely overwritten"
<danvet>
and not anything about when the pixels show up, that's what the timestamp is for
<emersion>
if an infoframe triggers a visual artifact on all screens, then it would be good to gate that behind ALLOW_MODESET
<danvet>
I guess it would make sense to clarify that
<emersion>
all screens, as in, all sinks
<danvet>
but that's perhaps a job for the patch documenting the vblank event stuff
<pq>
danvet, hmm? I'm not sure I ever talked about the time instant of *sending* a flip event.
<danvet>
pq, the dumbest possible compositor is allowed to start rendering right after it gets the event back
<danvet>
with just double buffering
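To make that rule concrete, here is a hypothetical model (no libdrm calls, names made up) of the dumbest double-buffered compositor: the arrival of the flip event, not the on-screen time carried in its timestamp, is what releases the old front buffer for rendering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical double-buffered output using buffers 0 and 1. */
struct dbuf_output {
    int front;          /* buffer currently being scanned out */
    bool flip_pending;  /* a flip was submitted and its event has not arrived yet */
};

/* Buffer the compositor may render into next, or -1 if it must wait for the
 * flip event (with only two buffers, both are busy while a flip is pending). */
static int dbuf_render_target(const struct dbuf_output *o)
{
    return o->flip_pending ? -1 : 1 - o->front;
}

/* Submit the back buffer for display. */
static void dbuf_submit_flip(struct dbuf_output *o)
{
    assert(!o->flip_pending);
    o->flip_pending = true;
}

/* Flip-complete event: the back buffer became front, and the old front
 * buffer may now be overwritten. */
static void dbuf_on_flip_event(struct dbuf_output *o)
{
    assert(o->flip_pending);
    o->front = 1 - o->front;
    o->flip_pending = false;
}
```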
<danvet>
I thought we've documented this somewhere
<pq>
danvet, of course. But why are you telling me this?
tobiasjakobi has joined #dri-devel
<danvet>
pq, there seems to be a confusion going on in the thread for the kerneldoc patch for this
tobiasjakobi has quit [Remote host closed the connection]
fahien has joined #dri-devel
<emersion>
danvet: "the update is completed" would just mean "the new frame is displayed on-screen"
<pq>
I guess the doc does not make a sufficient difference between the time instants of *receiving* a page flip event, and the *timestamp* carried by the event?
<emersion>
not "the uevent is sent"
<danvet>
emersion, that's not actually the definition
<danvet>
emersion, it's also not an uevent
<danvet>
pq, yeah, and judging from the discussion that seems like worthy of clarification
<danvet>
hm there's actually a 3rd meaning in there
<danvet>
as interaction with vblank events
<emersion>
i never got the difference between vblank and page_flip events
<danvet>
emersion, it's the same structure
<danvet>
vblank events you get back from the vblank related ioctl
<danvet>
page_flip from page_flips
<emersion>
"the vblank related ioctl"?
<danvet>
the interaction rule is that if you emit a flip immediately after a vblank event for frame N
<danvet>
then the flip will hit frame N+1
<danvet>
emersion, VBLANK and GET_SEQUENCE/QUEUE_SEQUENCE
<danvet>
the stuff in drm_vblank.c
<emersion>
ah
<emersion>
so they trigger at the same time, but in response to different IOCTLs?
<danvet>
yeah
<danvet>
also any compositor that schedules flips really should know about the vblank ones, or I have no idea how anything works
<emersion>
hmmm?
<emersion>
why would a compositor use the vblank callback, if it doesn't use VBLANK and GET_SEQUENCE/QUEUE_SEQUENCE?
<danvet>
emersion, it should use one of these
<pq>
Weston uses the vblank ioctl only to find out the phase of the scanout cycle when it starts updating an output after a pause.
<danvet>
or how do you schedule future flips
<emersion>
i don't follow
<danvet>
like client does a buffer flip for a specific frame
<danvet>
somewhen in the future
<pq>
while updates are rolling continuously, there is no need for vblank ioctls, Weston drives based on page flips alone.
<danvet>
pq, yeah but what if you stop doing updates?
<pq>
Weston uses the vblank ioctl only to find out the phase of the scanout cycle when it starts updating an output after a pause.
<emersion>
wlroots just does an atomic commit immediately when resuming the rendering loop
<danvet>
emersion, uh that wastes a frame, that's not good?
<emersion>
how does this waste a frame?
<danvet>
you do a flip?
<pq>
it doesn't waste a frame, but the timings of the first frame are not predicted
<danvet>
ah ok, I figured the flip is just to get the timing and not doing anything yet
<emersion>
ah, no
<emersion>
repaint and flip, as usual
<danvet>
still, do you all just convert future frames to time and run with that?
<emersion>
there's no concept of future frame
<danvet>
the xorg way is to schedule a vblank event one frame ahead
<pq>
Wayland has no way to schedule future frames yet.
<danvet>
ah ok
<danvet>
that explains I guess
<emersion>
even if there was, the compositor must be careful, there might be multiple clients
<danvet>
I guess we can kerneldoc this all when that wayland extension shows up
<pq>
I do fully expect scheduling to be completely timestamp based, not frame counter
<pq>
because VRR
<danvet>
emersion, oh sure, you can't just block for that event, you might have an entire pile of them in flight
Daaanct12 has quit [Quit: Quitting]
<danvet>
pq, and just fake for the old x clients in xwayland?
<emersion>
oh, so you use them to wait for a specific vblank, i see
<pq>
yeah
<danvet>
emersion, it's an event on a pollable fd, it's just one of the wake sources in your main loop
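A sketch of what "just one of the wake sources" looks like in a poll()-based main loop. In a real compositor the DRM fd would be one entry, and when it becomes readable you would call libdrm's drmHandleEvent() to dispatch the vblank or page-flip handlers; here pipes stand in for the fds so the example runs on its own, and the helper name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WAKE_SOURCES 8

/* Block for up to timeout_ms (-1 = forever) until one of the fds is readable.
 * Returns the index of the first readable fd, or -1 on timeout or error. */
static int wait_for_wake_source(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_WAKE_SOURCES];

    assert(nfds > 0 && nfds <= MAX_WAKE_SOURCES);
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
        return -1;
    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            return i; /* for the DRM fd, drain it with drmHandleEvent() */
    }
    return -1;
}
```

Because many events can be in flight at once, the loop never blocks on one specific event; it wakes on whatever is readable and dispatches.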
<pq>
old x clients wouldn't understand VRR anyway if they wanted to schedule frames
<danvet>
pq, well it's more clock drift
<danvet>
when you have a non-vrr display
<danvet>
but I guess for stuff that far out no one cares
<pq>
sure, and there will be a feedback loop I presume
<danvet>
pq, well originally that was the vblank event stuff
<danvet>
otherwise you get to drive that feedback loop through the compositor and have a bunch more wakeups
<danvet>
maybe at least
<pq>
Xwayland can adjust at every fully presented frame as it gets the timestamp and scanout period back.
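A minimal sketch of turning a presented timestamp plus the scanout period into the next predicted presentation time, assuming a fixed (non-VRR) refresh period and nanosecond units; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Predict the first vblank strictly after now_ns from the timestamp of the
 * last presented frame and a fixed refresh period (all in nanoseconds).
 * Only meaningful at a fixed refresh rate; with VRR the period varies. */
static uint64_t next_presentation_ns(uint64_t last_ns, uint64_t period_ns, uint64_t now_ns)
{
    assert(period_ns > 0);
    if (now_ns < last_ns)
        return last_ns; /* timestamp is still in the future: use it as-is */
    uint64_t periods_elapsed = (now_ns - last_ns) / period_ns;
    return last_ns + (periods_elapsed + 1) * period_ns;
}
```

The same arithmetic converts "frame N+k" into last_ns + k * period_ns. Clock drift between this estimate and the hardware is why re-anchoring on every presented timestamp (the feedback loop mentioned above) is still needed.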
<pq>
not sure I get that, Xwayland needs to present through the compositor anyway
<pq>
maybe that's about frame callback event and presentation-time presented event each cause a separate wakeup in Xwayland?
Akari has quit [Remote host closed the connection]
rasterman has quit [Quit: Gettin' stinky!]
sdutt_ has quit [Ping timeout: 480 seconds]
fahien has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
bnieuwenhuizen has quit [Quit: Bye]
bnieuwenhuizen has joined #dri-devel
mbrost has joined #dri-devel
fahien has joined #dri-devel
Lucretia has quit []
fab has joined #dri-devel
Lucretia has joined #dri-devel
camus has quit []
camus has joined #dri-devel
<jani>
pushed something, and getting drm-tip rebuild conflicts in stuff I didn't touch at all. was the conflict there before? can someone else try the drm-tip rebuild?
<vsyrjala>
conflicts in amdgpu
<vsyrjala>
and msm
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
mbrost_ has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
<jani>
yeah, definitely not my doing... but I'm wondering if rerere has all the right bits if I just had a different git version
<pq>
vsyrjala, re: that above thread; why would *any* KMS property update other than FB_ID require implicit full plane damage? The FB contents are not changing, only what you do with them changes.
<tursulin>
huge drm-tip rebuild conflict in amdgpu - anyone from amd around to look into it?
heat has joined #dri-devel
<tursulin>
plus one conflict in msm
<tursulin>
airlied ^^^ you merged amd-next and msm-next just today - did drm-tip rebuild for you? wondering if I hit one of those "too old git for correct rerere" issues, although my git is 2.34.1 so I thought should be good.
rgallaispou has quit []
gawin has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
chipxxx has joined #dri-devel
chipxxx has quit [Remote host closed the connection]
chipxxx has joined #dri-devel
Guest1601 has quit [Remote host closed the connection]
<Ristovski>
ok this Vega APU is surprisingly nice (got a Ryzen 5700g)
oneforall2 has quit [Remote host closed the connection]
MrCooper has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
gawin has joined #dri-devel
iive has joined #dri-devel
<karolherbst>
jekstrand: if I want to lower deref chains only for certain instructions (e.g. do it for image_deref_order, but not for other image_derefs), do we have helpers to clone the chain and only lower that or something? Kind of in a mess here with radeonsi wanting image_derefs, but I need the actual lowered thing for image_deref_order/format lowering.
<karolherbst>
though I might be able to teach nir_to_llvm to deal with deref_vars of images by fetching the location, but not sure if that wouldn't mess with all the other things it's doing :/
<airlied>
tursulin, jani : sorry i bet my rebuild failed, will fix it up
<airlied>
okay will fix it up now, sorry for noise
pochu has quit [Quit: leaving]
Major_Biscuit has quit [Ping timeout: 480 seconds]
<jimjams>
does DRM guarantee that users will get a DRM_EVENT_FLIP_COMPLETE for every NON_BLOCK page flip that gets submitted, or are there cases where a flip won't complete even if submission is successful?
fab has joined #dri-devel
<emersion>
jimjams: if you passed PAGE_FLIP_EVENT… only a very serious issue would prevent the driver from sending it
<emersion>
e.g. OOM
<jimjams>
Awesome, thanks!
fab has quit [Quit: fab]
<airlied>
jani, tursulin : okay fixed now
<jekstrand>
karolherbst: Yeah... nir_explicit_io_address_from_deref for explicit.
<karolherbst>
okay, cool, hope that helps
<jekstrand>
karolherbst: Typically, the way this is done is you go through and do your own lowering first then maybe dead code then run the pass you want to deal with everything else.
ngcortes has joined #dri-devel
<jekstrand>
karolherbst: If you want derefs to lower to more derefs, I don't know that we have an easy way to do that.
<karolherbst>
jekstrand: yeah... thing is, I really have to keep the original derefs there for radeonsi
<jekstrand>
yeah...
<karolherbst>
but nir_explicit_io_address_from_deref should be okay and we don't expect any indirects anyway
<karolherbst>
anyway.. I think I already lowered everything once I get to my lowering, so it should be fine
<vsyrjala>
emersion: there shouldn't be memory allocations once the ioctl has returned, so oom should be fine
<jekstrand>
karolherbst: Worst case, you have to do some deref chasing. It's annoying but something I'd expect clover or rusticl wants to do at least some.
<karolherbst>
yeah...
<mdnavare>
pq: To your question on what the kernel does if userspace attempts to request VRR at a freq outside of the VRR range: e.g. if the monitor range is 40-75Hz and userspace requests 80Hz, the shortest vblank possible corresponds to Vmin, so it will be capped at 75Hz; and if userspace requests say 25Hz, the HW terminates vblank at Vmax, so the min possible refresh rate will be 40Hz
<mdnavare>
pq: Hope this answers your question
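The behaviour described above amounts to clamping the effective refresh rate into the sink's VRR range. A toy model, assuming integer Hz and the 40-75Hz range from the example (the function name is hypothetical, and real hardware works in vblank lengths rather than Hz):

```c
#include <assert.h>

/* Effective refresh rate when userspace asks for requested_hz on a VRR sink
 * with range [min_hz, max_hz]: vblank cannot be shorter than at max_hz nor
 * longer than at min_hz, so the result is clamped into the range. */
static unsigned vrr_effective_hz(unsigned requested_hz, unsigned min_hz, unsigned max_hz)
{
    assert(min_hz > 0 && min_hz <= max_hz);
    if (requested_hz > max_hz)
        return max_hz;
    if (requested_hz < min_hz)
        return min_hz;
    return requested_hz;
}
```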
<karolherbst>
jekstrand: what address format do I have to use?
<karolherbst>
mhh though maybe offset is good enough?
<emersion>
vsyrjala: i was thinking sending the uevent might require allocs, but maybe not
<vsyrjala>
shouldn't
<emersion>
maybe commit_tail requires allocs
<emersion>
ideally these would indeed be made in advance
<bl4ckb0ne>
how does symbol exporting work for apis like gl and egl
<bl4ckb0ne>
im updating the egl header and a couple of symbols disappeared
tursulin has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
gawin has quit [Remote host closed the connection]
<alyssa>
jekstrand: this reminds me that we never resolved "drivers should be able to lower their own I/O" in GL like they can in VK
<alyssa>
I think nir_lower_explicit_io can do all the UBO and SSBO lowering needed for AGX... if only I could call it
<alyssa>
clearly the solution is to just rewrite the driver in vk
<alyssa>
admittedly it's unclear to me where the UBO base address comes from in nir_lower_explicit_io
<alyssa>
I guess this really is tied to the VK descriptor set model
Daanct12 has joined #dri-devel
Leopold_ has joined #dri-devel
rasterman has joined #dri-devel
Danct12 has quit [Read error: Connection reset by peer]
Daaanct12 has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
gio_ has joined #dri-devel
gio has quit [Ping timeout: 480 seconds]
<alyssa>
glehmann: "aco: Combine s_abs and s_sub/s_add to s_absdiff"
<alyssa>
NIR has uabs_isub/usub opcodes
<alyssa>
which iirc were subtly different but I don't remember the details
fahien has quit [Ping timeout: 480 seconds]
<alyssa>
is that relevant?
<alyssa>
maybe not who knows
<alyssa>
the bfm optimizations I'd be happy to do in NIR
<alyssa>
we already *have* a bfm opcode, just need opt_algebraic rules ...
<idr>
Which MR is this?
<idr>
!18870?
<idr>
(That punctuation looks really weird. Lol.)
* idr
wonders if those patterns arise from lowering ACO does during codegen...
<alyssa>
idr: yeah
<alyssa>
i don't have any amd hardware I'm just nosy
<alyssa>
idV
<Ristovski>
Wait so "vddgfx" is literally the same as "SVI2_Core" on the Ryzen 5700g :/
<alyssa>
idr: btw, What are your thoughts on noltis-based ffma?
<alyssa>
(or noltis in general)
<idr>
I like the idea, but I haven't reviewed it carefully.
<idr>
There are a lot of places where there are opportunities to apply some kind of "search based" code optimization.
<idr>
At one point I also experimented with using a branch-and-bound search algorithm, but it wasn't a great fit for the problem at hand.
<alyssa>
fair enough ^^
<idr>
All of the '(is_used_once)' stuff in opt_algebraic is a hack to approximate a real global minimization solution. #foodforthought
<alyssa>
I mean
<alyssa>
Memory bandwidth on mobile chips is so scarce that I have never seen a real ALU-bound workload on Mali in my life, so I have no evidence that isel actually matters. Food for thought.
<idr>
Fair.
mvlad has quit [Remote host closed the connection]
<idr>
The biggest benchmarkable difference seems to be when it affects either loop unrolling or spilling.
<alyssa>
sure
<alyssa>
TBD whether ALU will be more relevant on the big power hungry Apple chips
<alyssa>
I hear we're not totally starved for memory bw on those, might be more fun for a change :p
<alyssa>
OTOH they're still tilers so...
<karolherbst>
mareko: what happens to unaligned ubo loads, e.g. "%37 = call nsz arcp float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %11, i32 66, i32 0) #3"? I kind of get the impression that 8 and 16 bit ubo loads are a little broken
<karolherbst>
at least I get the same result as the same ubo load loading from 64 instead of 66
<bl4ckb0ne>
eric_engestrom: any idea what could cause the symbols to go away in the EGL header update?
tzimmermann has quit [Quit: Leaving]
<glehmann>
alyssa: oh we do have bfm?
<glehmann>
the question is if we want to use it in opt_algebraic for (1 << a) - 1
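One way to check whether (1 << a) - 1 can safely become a bfm is to model both under 32-bit NIR-style semantics. The masking below (shift counts and bfm operands taken mod 32, bfm taking (bits, offset)) is an assumption based on recollection of nir_opcodes.py, not a quote of it; under that assumption the two agree for every a, including a >= 32.

```c
#include <assert.h>
#include <stdint.h>

/* ishl with the shift count masked to 5 bits (assumed NIR semantics). */
static uint32_t ishl32(uint32_t x, uint32_t s)
{
    return x << (s & 31u);
}

/* bfm(bits, offset): `bits` ones starting at bit `offset`, with both operands
 * masked to 5 bits (assumed semantics). */
static uint32_t bfm32(uint32_t bits, uint32_t offset)
{
    bits &= 31u;
    offset &= 31u;
    return ((UINT32_C(1) << bits) - 1u) << offset;
}

/* The source pattern being matched: (1 << a) - 1. */
static uint32_t low_mask_pattern(uint32_t a)
{
    return ishl32(1u, a) - 1u;
}
```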
<alyssa>
glehmann: nir has way too many bitfield ops and nobody knows what they're for
<alyssa>
how is bitfield_extract different from bfe? don't worry about it
<zmike>
what in the sam hill is bfe
<alyssa>
(a >> b) & ((1 << m) - 1)
<alyssa>
maybe
<bl4ckb0ne>
eric_engestrom: ha i think I got it
<glehmann>
alyssa: bitfield_extract can extract the full value with bits == 32, bfe can't
<glehmann>
iirc
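The distinction glehmann recalls, sketched as two hypothetical C helpers: one special-cases bits == 32 to return the whole value, the other masks bits to 5 bits so that 32 wraps to 0. These model the difference described, not the exact NIR opcode definitions. Note that the formula (a >> b) & ((1 << m) - 1) is undefined behaviour in C for m == 32, which is why the full-width case needs explicit handling.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width-capable extract: bits == 32 returns the whole value. */
static uint32_t extract_full_width(uint32_t base, uint32_t offset, uint32_t bits)
{
    assert(offset < 32 && bits <= 32 && offset + bits <= 32);
    if (bits == 32)
        return base; /* avoids 1u << 32, which is undefined in C */
    return (base >> offset) & ((UINT32_C(1) << bits) - 1u);
}

/* Masked-operand extract: bits == 32 wraps to 0 and yields 0. */
static uint32_t extract_masked_bits(uint32_t base, uint32_t offset, uint32_t bits)
{
    offset &= 31u;
    bits &= 31u;
    if (bits == 0)
        return 0;
    if (offset + bits >= 32) /* field reaches the top bit: no mask needed */
        return base >> offset;
    return (base >> offset) & ((UINT32_C(1) << bits) - 1u);
}
```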
Leopold_ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
<alyssa>
(a >> b) & ((1 << m) - 1)
<alyssa>
oops
<alyssa>
glehmann: oh that's very clear from the names ;-p
<mareko>
karolherbst: ac_nir_to_llvm should convert them to 32-bit loads, but tarceri__ discovered that it's indeed broken
<karolherbst>
yeah.. I saw that
<karolherbst>
are there any patches for that?
<karolherbst>
or are there 16/8 bit loads I could use in the meantime?
<mareko>
tarceri__'s revert
<mareko>
or writing a NIR lowering pass converting loads to 32 bits
everfree has quit [Quit: leaving]
fahien has joined #dri-devel
<mareko>
the hw can only do ubo loads with 32-bit granularity, the 2 lowest offset bits are ignored
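This also explains the earlier observation that a load at byte offset 66 returns the same data as one at 64. A sketch of what a lowering pass would do, written as plain C over a byte array rather than NIR: every load becomes an aligned 32-bit load and the sub-dword value is shifted out. Assumptions: little-endian buffer contents, 16-bit loads at least 2-byte aligned; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the hardware: a 32-bit UBO load that ignores the 2 lowest offset
 * bits. Bytes are assembled little-endian regardless of host endianness. */
static uint32_t hw_ubo_load32(const uint8_t *ubo, uint32_t offset)
{
    uint32_t a = offset & ~3u;
    return (uint32_t)ubo[a] | (uint32_t)ubo[a + 1] << 8 |
           (uint32_t)ubo[a + 2] << 16 | (uint32_t)ubo[a + 3] << 24;
}

/* Lowered 8-bit load: aligned dword load, then shift the wanted byte down. */
static uint8_t ubo_load8_lowered(const uint8_t *ubo, uint32_t offset)
{
    return (uint8_t)(hw_ubo_load32(ubo, offset) >> ((offset & 3u) * 8u));
}

/* Lowered 16-bit load; 2-byte alignment means it never straddles two dwords. */
static uint16_t ubo_load16_lowered(const uint8_t *ubo, uint32_t offset)
{
    assert((offset & 1u) == 0);
    return (uint16_t)(hw_ubo_load32(ubo, offset) >> ((offset & 3u) * 8u));
}
```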
<karolherbst>
yeah.. I can work on a lowering pass once I make it somewhat working
<karolherbst>
ahh, I could find the MR for whatever reason
<karolherbst>
let's see if I can get it to apply cleanly here
<karolherbst>
cool
<karolherbst>
that fixes my issue
everfree has joined #dri-devel
<karolherbst>
that probably also fixes a bunch of other issues
chipxxx has quit [Read error: No route to host]
<karolherbst>
rusticl CTS runs are the slowest on radeonsi :( damn 3D images
<alyssa>
mali is slower (-:
<karolherbst>
yeah...
<karolherbst>
I mean.. I only have to wait 20 minutes, which is still good though
oneforall2 has quit [Remote host closed the connection]
<karolherbst>
wondering if I hit conformance today or tomorrow...
<karolherbst>
I might have to reorder tests a bit in my runner, because this one subtest just runs for like 15 minutes alone :(
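Starting the 15-minute subtest first is the classic longest-processing-time-first (LPT) heuristic. A sketch, assuming per-test durations are known from a previous run and tests are handed greedily to the least-loaded worker; the runner's real structure is unknown and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORKERS 64

/* qsort comparator: longest duration first. */
static int cmp_duration_desc(const void *pa, const void *pb)
{
    unsigned a = *(const unsigned *)pa, b = *(const unsigned *)pb;
    return (a < b) - (a > b);
}

/* Hand tests out in the given order, each to the currently least-loaded
 * worker, and return the makespan (when the last worker finishes). */
static unsigned greedy_makespan(const unsigned *durations, size_t n, unsigned workers)
{
    unsigned load[MAX_WORKERS] = { 0 };
    unsigned makespan = 0;

    assert(workers > 0 && workers <= MAX_WORKERS);
    for (size_t i = 0; i < n; i++) {
        unsigned w = 0;
        for (unsigned j = 1; j < workers; j++)
            if (load[j] < load[w])
                w = j;
        load[w] += durations[i];
    }
    for (unsigned j = 0; j < workers; j++)
        if (load[j] > makespan)
            makespan = load[j];
    return makespan;
}

/* LPT: sort longest-first (in place), then assign greedily. */
static unsigned lpt_makespan(unsigned *durations, size_t n, unsigned workers)
{
    qsort(durations, n, sizeof(*durations), cmp_duration_desc);
    return greedy_makespan(durations, n, workers);
}
```

With durations {2, 2, 2, 2, 15} minutes on two workers, handing tests out in order finishes after 19 minutes, while longest-first finishes after 15.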
<karolherbst>
"Pass 2320 Fails 29 Crashes 3"
oneforall2 has joined #dri-devel
<airlied>
karolherbst: btw I did a pocl vs llvmpipe on luxmark, it does kick our ass :-P
<karolherbst>
I know
<karolherbst>
but not much I can do about that except trying to optimize llvmpipe :p
<karolherbst>
there is one weird thing I noticed though
<karolherbst>
CPU utilization is capped a lot
<karolherbst>
like it's only using 12 threads max or something
<karolherbst>
airlied: but if you want something to kick ass, try running the C++ version
<karolherbst>
it's ridiculous
<airlied>
I thought reporting better vector sizes and group size suggestions might help, but it didn't
<alyssa>
only 20 minutes! wow
<karolherbst>
alyssa: on iris it's like 8
<alyssa>
!!!
<alyssa>
wow.
<karolherbst>
well... 20 core machine
<alyssa>
mali is integrated :(
<karolherbst>
yeah... and the CTS uses a lot of CPU
<karolherbst>
though.. illwieckz might be able to tell you how to do distributed CL and run the runner on beefy x86 and use your mali for CL :D
<alyssa>
thinking
<karolherbst>
samplerless reads are broken for 1D....
<karolherbst>
but works perfectly fine for 2D
<karolherbst>
odd
vliaskov has quit [Remote host closed the connection]
heat has quit [Read error: No route to host]
heat has joined #dri-devel
<jekstrand>
alyssa: Yeah.... I was working on that at one point, then I tried assigning to a minion, then I gave up. Now I have no minions. (-:
<jekstrand>
alyssa: IDK if we actually want drivers lowering their own or if we just want to let them provide a nir_address_mode via a cap
gouchi has quit [Remote host closed the connection]
<alyssa>
daniels: Can we get jekstrand some minions?
Duke`` has quit [Ping timeout: 480 seconds]
<karolherbst>
ohh wait.. samplerless is only broken for 1D buffer
<alyssa>
jekstrand: It's still not clear to me how the Gallium driver supplies the base of an SSBO/UBO
<alyssa>
with NIR lowering
<alyssa>
I guess with Vulkan, it gets lowered to a load_global_constant of the descriptor set?
<alyssa>
and the VK driver needs to match the descriptor set UBO to the NIR ABI?
<jekstrand>
alyssa: We'd have a load_ssbo_address(index) intrinsic
<alyssa>
Right ok
<alyssa>
we already have that
iive has quit [Ping timeout: 480 seconds]
<jekstrand>
alyssa: For a bindful driver like iris that uses index+offset already, it would just lower that to vec2(index, 0).
<jekstrand>
alyssa: For a driver that needs to push in an address like AGX or panfrost, it would do a push constant load or something.
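The two cases above, modelled as a hypothetical driver hook that resolves load_ssbo_address(index) either to an (index, offset) pair or to a 64-bit address taken from a table of base addresses the driver pushes to the shader. None of these types are Mesa's; they only illustrate how the same offset arithmetic can sit on top of either kind of base.

```c
#include <assert.h>
#include <stdint.h>

enum buf_addr_kind { BUF_ADDR_INDEX_OFFSET, BUF_ADDR_GLOBAL64 };

struct buf_addr {
    enum buf_addr_kind kind;
    uint32_t index;   /* binding slot the address was resolved from */
    uint64_t value;   /* byte offset (INDEX_OFFSET) or full GPU address (GLOBAL64) */
};

/* Bindful driver (iris-like): the "address" is just vec2(index, 0). */
static struct buf_addr ssbo_address_bindful(uint32_t index)
{
    return (struct buf_addr){ BUF_ADDR_INDEX_OFFSET, index, 0 };
}

/* Address-based driver (AGX/panfrost-like): the base comes from a table the
 * driver pushes to the shader, e.g. as push constants. */
static struct buf_addr ssbo_address_pushed(const uint64_t *pushed_bases, uint32_t index)
{
    return (struct buf_addr){ BUF_ADDR_GLOBAL64, index, pushed_bases[index] };
}

/* Offset arithmetic is identical for both forms. */
static struct buf_addr buf_addr_add(struct buf_addr a, uint64_t offset)
{
    a.value += offset;
    return a;
}
```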
<Ristovski>
Hmmm, why does it look like amdgpu isn't releasing claimed GTT memory? After killing everything and dropping into tty, `free` still reports 1.4GB used which is about what the peak of GTT usage was in TF2 in the same session.
<alyssa>
Sure
chipxxx has joined #dri-devel
<Ristovski>
radeontop says only 55MB in GTT
<jekstrand>
alyssa: I don't remember why we gave up.
sdutt has quit [Ping timeout: 480 seconds]
<jekstrand>
alyssa: In fact, GL does use nir_lower_explicit_io() today. It just has it hard-coded to index+offset
<jekstrand>
And doesn't use any sort of load_base_address intrinsic
<jekstrand>
alyssa: I seem to recall you being the reason I typed that. :P
<alyssa>
I seem to recall so as well
<alyssa>
I don't even know what gl_nir_lower_buffers is
<jekstrand>
alyssa: It lowers derefs to index+offset for UBOs and SSBOs
chaim has quit [Quit: Konversation terminated!]
<alyssa>
right, ok
<alyssa>
I have read that MR twice now and am still a bit lost about what needs to happen to land and delete nir_lower_ssbo and avoid writing nir_lower_ubo (-:
<alyssa>
(strictly I already typed out nir_lower_ubo in my local tree)