tlwoerner__ has quit [Remote host closed the connection]
tlwoerner__ has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
aravind has joined #dri-devel
oneforall2 has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
<mareko>
ACO seems to work very well with radeonsi
mwalle has quit [Quit: WeeChat 3.8]
Duke`` has joined #dri-devel
fab has joined #dri-devel
itoral has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
Mangix has quit [Read error: Connection reset by peer]
Mangix has joined #dri-devel
itoral has quit [Remote host closed the connection]
OftenTimeConsuming has quit [Remote host closed the connection]
OftenTimeConsuming has joined #dri-devel
itoral has joined #dri-devel
kts has joined #dri-devel
rz has quit [Remote host closed the connection]
rz has joined #dri-devel
sima has joined #dri-devel
fab has quit [Quit: fab]
kts_ has joined #dri-devel
<airlied>
agd5f: backmerged pushed out
kts has quit [Ping timeout: 480 seconds]
kts_ has quit []
aravind has quit [Remote host closed the connection]
aravind has joined #dri-devel
mszyprow has joined #dri-devel
jfalempe has joined #dri-devel
jfalempe has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
<mupuf>
mareko: great to hear! Any more you want to share?
<mupuf>
As in, do you mean it is close to being functionally and performance equivalent? Or is it equal, or better? What apps did you use for your testing?
frieder has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
jfalempe has joined #dri-devel
fab has joined #dri-devel
crabbedhaloablut has joined #dri-devel
glennk has joined #dri-devel
donaldrobson has joined #dri-devel
donaldrobson has quit [Remote host closed the connection]
rgallaispou has joined #dri-devel
kts has joined #dri-devel
apinheiro has joined #dri-devel
sgm has quit [Ping timeout: 480 seconds]
sgm has joined #dri-devel
lynxeye has joined #dri-devel
frieder has quit [Quit: Leaving]
frieder has joined #dri-devel
frieder_ has joined #dri-devel
mwalle has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
frieder has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
hansg has joined #dri-devel
linkmauve has joined #dri-devel
heat has joined #dri-devel
i509vcb has quit [Quit: Connection closed for inactivity]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
yyds_ has joined #dri-devel
mvlad has joined #dri-devel
yyds has quit [Ping timeout: 480 seconds]
dtmrzgl has quit []
yyds has joined #dri-devel
yyds_ has quit [Ping timeout: 480 seconds]
drobson has joined #dri-devel
drobson has quit [Remote host closed the connection]
idr has quit [Ping timeout: 480 seconds]
fab has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
nsa1 has joined #dri-devel
cmichael has joined #dri-devel
nsa1 has left #dri-devel [#dri-devel]
glennk has quit [Ping timeout: 480 seconds]
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #dri-devel
biju has joined #dri-devel
drobson has joined #dri-devel
cazzacarna has joined #dri-devel
cazzacarna has quit []
ChaosPrincess has quit [Quit: ChaosPrincess]
hansg has quit [Quit: Leaving]
kts has joined #dri-devel
macslayer has quit [Remote host closed the connection]
cazzacarna has joined #dri-devel
JohnnyonFlame has joined #dri-devel
hansg has joined #dri-devel
hansg has quit []
glennk has joined #dri-devel
yyds has quit [Remote host closed the connection]
kts has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
luc has joined #dri-devel
dtmrzgl has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
sgm has quit [Remote host closed the connection]
sgm has joined #dri-devel
glennk has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
kts has quit []
<luc>
hi all, recently I did some experiments on an aarch64 platform. I replaced memcpy[1] with __memcpy_aarch64_simd[2] in _mesa_store_compressed_texsubimage. It turns out that the latter is almost 1x slower than the former. If I understand correctly, what _mesa_store_compressed_texsubimage() does is copy data from RAM to VRAM. I don't know why SIMD does worse under these circumstances.
<karolherbst>
luc: memcpy is already implemented efficiently via simd instructions
<karolherbst>
glibc chooses what is the fastest given the hw and input
idr has joined #dri-devel
<karolherbst>
also compilers might replace memcpy by something better as well
<luc>
compared to the ARM-software version, I noticed that glibc just doesn't use SIMD/FP registers. I wonder how they (SIMD/FP registers) make a difference.
<karolherbst>
yeah.. but I'd trust them to know what they are doing, and apparently they do
<karolherbst>
but it might be best to check with gdb what actually happens on that memcpy
<karolherbst>
compilers are free to skip going through libc on any memcpy call, so it might just be that the compiler does something even smarter
<karolherbst>
and by using something besides memcpy you take that freedom away from compilers
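A minimal sketch of the point karolherbst makes above, assuming GCC or Clang with optimization enabled. The helper names `load_u32` and `copy_block` are hypothetical and not part of Mesa or glibc. A fixed-size `memcpy` is usually lowered to a plain load/store by the compiler, while a variable-size one usually ends up in whichever libc implementation was selected at runtime (such as the `__memcpy_generic` luc saw in gdb).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned buffer.
 * The size is a compile-time constant, so an optimizing compiler is free to
 * emit a single load here instead of calling into libc. */
static uint32_t load_u32(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Hypothetical helper: variable-size copy. The length is only known at run
 * time, so this typically becomes a real call to the libc memcpy. */
static void copy_block(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

Calling a named routine such as `__memcpy_aarch64_simd` directly means the compiler sees an ordinary function call and can no longer make either choice. Disassembling `load_u32` in gdb is one way to check what the compiler actually emitted.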
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<luc>
I've checked that with gdb. It is indeed __memcpy_generic in [1] above that is chosen, so I guess what is slow are instructions such as load/store q0..7
DodoGTA has quit [Quit: DodoGTA]
DodoGTA has joined #dri-devel
Daanct12 has quit [Quit: WeeChat 4.1.2]
<karolherbst>
luc: out of curiosity, did you try the sve version?
frieder_ has quit []
frieder has joined #dri-devel
<luc>
karolherbst: not yet, because my cpu is armv8-a; according to the ARM reference, sve was introduced in armv8.2-a
<karolherbst>
could check in /proc/cpuinfo but yeah.. it's kinda hard to find out when sve was actually introduced
<luc>
in fact, __ARM_FEATURE_SVE is not defined by my compiler
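The two checks mentioned above answer different questions: `__ARM_FEATURE_SVE` says what the compiler was told to target, while `/proc/cpuinfo` says what the CPU supports. A rough sketch of both follows. It reads the Linux hardware capability bits through `getauxval` instead of parsing `/proc/cpuinfo`, which is a substitution and not what was suggested in the channel. The helper names `built_with_sve` and `cpu_has_sve` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_SVE on arm64 kernels that know about SVE */
#endif

/* getauxval() returns unsigned long; make sure the capability bits fit. */
static_assert(sizeof(unsigned long) >= 4, "hwcap bits need at least 32 bits");

/* Hypothetical helper: was this translation unit compiled with SVE enabled
 * (for example via -march=armv8.2-a+sve)? */
static bool built_with_sve(void)
{
#if defined(__ARM_FEATURE_SVE)
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: does the running CPU/kernel report SVE support?
 * Returns false on platforms where the check is not available. */
static bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}
```

A macro that is undefined only means the build flags do not enable SVE. The runtime check is the one that reflects the hardware.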
<jenatali>
Huh ok. When I did it a few years ago I was seeing closer to like 72. Not sure if it was just slow execution or if it was slow compilation
<karolherbst>
yeah.. maybe rusticl being heavily threaded helps
<karolherbst>
though
<karolherbst>
the CTS built in release mode helps a lot
<karolherbst>
but yeah... I interface with a `pipe_context` only from a special worker thread, which allows some kind of parallelism
<karolherbst>
(I should compile programs in parallel though...)
<karolherbst>
but I have a script which runs like everything in an hour, parallelized
<karolherbst>
or under 10 minutes with wimpy and some annoying and irrelevant tests disabled
<karolherbst>
MrCooper: ahh
<karolherbst>
MrCooper: on my end I have nextafter, remainder and remquo failing sometimes
<karolherbst>
but also something with half vstore/vload
<karolherbst>
I'll look into the vstore/vload stuff first then
<MrCooper>
~30 tests fail here ever since I started testing, these were passing until today though
<karolherbst>
jenatali: maybe something serializes on conversion/math_brute_force on your end? Those tests are already threaded themselves and run on multiple CL queues
<karolherbst>
and conversions is like 60% of the runtime
<karolherbst>
at least for me
<jenatali>
Yeah they just take forever
<jenatali>
I haven't tried to do a full run recently and I'm working on perf currently so hopefully it'll be faster when I'm done
<karolherbst>
Test Conversions passed in 28495.6525979s on iris
<karolherbst>
roughly 8 hours full profile
<jenatali>
I think my last fails actually disappeared since I last looked too (hooray shared / external libraries) so I might be able to actually submit for CL3.0 certification
<karolherbst>
nice
<karolherbst>
jenatali: full or embedded profile?
<karolherbst>
I guess full as you don't have the image restriction issue with d3d
<karolherbst>
like.. GL doesn't split samplers and textures
<karolherbst>
so most drivers only support 32 read only images
<jenatali>
Oh right
<karolherbst>
and radeonsi wasn't interested unless anything actually needs more, as it's otherwise just pointless overhead :D
<jenatali>
Yeah the one main benefit of using an external runtime+driver
<jenatali>
Right
<karolherbst>
MrCooper: I see
<karolherbst>
MrCooper: one concerning issue is that I _sometimes_ hit this assert: test_bruteforce: ../src/gallium/auxiliary/util/u_inlines.h:83: pipe_reference_described: Assertion `count != 1' failed.
<karolherbst>
kinda need to figure out what that's all about
gpiccoli_ has quit [Read error: Connection reset by peer]
simon-perretta-img has quit [Ping timeout: 480 seconds]
gpiccoli_ has joined #dri-devel
dviola has joined #dri-devel
frieder has quit [Remote host closed the connection]
gpiccoli has quit [Ping timeout: 480 seconds]
<MrCooper>
karolherbst: nope, fails even with MESA_SHADER_CACHE_DISABLE=1
<karolherbst>
mhhh
<karolherbst>
I wonder if my issue is the same, but it's quite random
rgallaispou has quit []
<karolherbst>
and I need to run ~7 times with the cache disabled to either hit it or not
glennk has quit [Ping timeout: 480 seconds]
biju has quit [Quit: Konversation terminated!]
fab has quit [Remote host closed the connection]
<cmarcelo>
jenatali: best part of that MSVC news for me: struct whatever w = {}; will work for it now.
<karolherbst>
pain.. I always bisect towards nonsense commits :(
fab has joined #dri-devel
<jenatali>
cmarcelo: Hm? Is that a thing being added in C23?
<MrCooper>
karolherbst: it's been consistent for me so far, I've only done a low double-digit number of tests though
<cmarcelo>
jenatali: yes. you can use = {} instead of = {0} to zero initialize structs.. that is helpful in some edge cases too (nested structs etc). it was already supported in clang/gcc as compiler extensions for a while.
<jenatali>
Oh cool
<karolherbst>
MrCooper: mhh.. maybe I'm debugging a different bug then
<MrCooper>
seems likely
simon-perretta-img has joined #dri-devel
flom84 has quit [Quit: Leaving]
<karolherbst>
let's see how many attempts it will take to find the culprit :')
<cmarcelo>
jenatali: and from my understanding it also will zero the padding bits (!)
<jenatali>
:O
<karolherbst>
it's not already guaranteed?
<karolherbst>
or will {} != { 0 } then?
<cmarcelo>
I don't think it is guaranteed :-( my understanding is that it will be different. trying to parse out the spec proposals.
Mangix has quit [Read error: Connection reset by peer]
Mangix has joined #dri-devel
drobson has quit [Ping timeout: 480 seconds]
<bwidawsk>
so there were a few patches which landed for 23.3 (started with 9ec9849c85e8202cb) that leandrohrb56 authored and that emersion and daniels reviewed which essentially stop me from using VKMS as an EGL renderer. I'm wondering what the right path would be for me to run my test suite now
<bwidawsk>
at least I think this is the case...
<emersion>
bwidawsk: why do they stop you from doing that?
<bwidawsk>
I think the main one is I lose dmabuf import apparently
<emersion>
sounds like a bug
<cmarcelo>
karolherbst: AFAICT "= {0}" isn't guaranteed to also zero the padding. empty initializer "= {}" guarantees that.
<karolherbst>
cursed
<karolherbst>
the same for the compiler extensions?
<bwidawsk>
maybe it's my fault, let me check something else
glennk has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
<bwidawsk>
daniels, emersion, leandrohrb56: It was my mistake. It was falling back to gles renderer instead of using pixman as it was supposed to be.
<cmarcelo>
karolherbst: the GCC extension seems to do that (zero padding), although it's not really documented. also looks like in practice gcc/clang already treat "={0}" == "={}". will keep an eye open to see what MSVC will do here.
<karolherbst>
yeah, it's also often faster to just initialize it all in one go, because vector instructions
<vsyrjala>
iirc c23 mandates ={} to make sense. ie. padding is also zeroed
<cmarcelo>
vsyrjala: yes
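A small sketch of the three ways of zeroing a struct that this exchange touches on. The struct `sample` and the helper names are hypothetical. It assumes a compiler that accepts the empty initializer, either in C23 mode or as the GCC/Clang extension cmarcelo mentions. Only `memset` is relied on here to zero every byte, padding included; the padding behaviour of `= {0}` and of pre-C23 `= {}` is left unchecked because, per the discussion, it is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: most ABIs insert padding between 'tag' and 'value'. */
struct sample {
    char tag;
    int value;
};

/* Classic form: all members become zero; padding is not promised. */
static struct sample make_zero_init(void)
{
    struct sample s = {0};
    return s;
}

/* Empty initializer: standard in C23, a compiler extension before that. */
static struct sample make_empty_init(void)
{
    struct sample s = {};
    return s;
}

/* memset zeroes every byte of the object, padding included. */
static void zero_all_bytes(struct sample *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}

/* Returns 1 if every byte in [p, p + n) is zero, 0 otherwise. */
static int all_bytes_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

The distinction matters mostly when a struct is hashed or compared with `memcmp`, where stray padding bytes change the result even though all members are equal.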
FL4SHK[m] has joined #dri-devel
<vsyrjala>
oh that was exactly what is being discussed :)
* vsyrjala
didn't look far back
<vsyrjala>
if only constexpr for functions had been included as well :(
<karolherbst>
mareko: while you are here, are you aware of any recent regression inside radeonsi in regards to the shader compiler _sometimes_ producing different/wrong code? Should be a 2-3 week old change, but I'm still having trouble figuring out what's actually going on here. Just wondering if you know something
<mupuf>
mareko: thanks!
<mareko>
karolherbst: if it's the bitset thing, try to use CLEAR instead of SET
<mareko>
in si_compute.c
<mareko>
for the saved registers
<karolherbst>
not quite sure yet.. I need to run a test ~15 times to properly detect the regression, so my git bisect runs are kinda... unreliable so far
<karolherbst>
yeah.. MrCooper bisected to that I think
<mareko>
BITSET_SET_RANGE is 100% wrong, it should be CLEAR
<karolherbst>
okay, thanks :) will try that out then
<mareko>
the previous code used a bitmask and it set 0
<karolherbst>
that one inside si_launch_grid?
<mareko>
there should be only one in that file
<karolherbst>
okay, must be that one then
<mareko>
it's rather obvious from the bad commit
<karolherbst>
yeah, now that I found the spot it indeed looks wrong
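To illustrate the class of bug mareko describes (a conversion from a bitmask to a bitset that sets a range where the old code cleared it), here is a standalone sketch. It does not reproduce the code in si_compute.c or Mesa's bitset macros; `bits_set_range` and `bits_clear_range` are hypothetical helpers, and the inclusive `end` argument is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Hypothetical helper: set bits start..end (inclusive) in a word array. */
static void bits_set_range(uint32_t *words, unsigned start, unsigned end)
{
    assert(start <= end);
    for (unsigned b = start; b <= end; b++)
        words[b / WORD_BITS] |= UINT32_C(1) << (b % WORD_BITS);
}

/* Hypothetical helper: clear bits start..end (inclusive) in a word array.
 * This is the bitset equivalent of the old "mask &= ~range" pattern. */
static void bits_clear_range(uint32_t *words, unsigned start, unsigned end)
{
    assert(start <= end);
    for (unsigned b = start; b <= end; b++)
        words[b / WORD_BITS] &= ~(UINT32_C(1) << (b % WORD_BITS));
}
```

If the old code marked a group of saved registers as "not saved" by writing zeros into a mask, the bitset version has to call the clear variant. Calling the set variant flips the meaning for exactly those registers, which would fit the intermittent failures described above.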
i509vcb has joined #dri-devel
<karolherbst>
will need to run the test in a loop for a while to be sure it fixes it :)
<soreau>
could it cause gpu hangs or app crashes?
<karolherbst>
I've seen such happening but not sure if it was caused by that
<mareko>
only with rusticl, clover, or CDNA
<karolherbst>
mareko: yeah.. so it looks better, do you want to submit an MR or should I?
<karolherbst>
I'll do more testing to make sure it's better
<mareko>
feel free to do it
<karolherbst>
okay, once I run more tests I'll open one then
ChaosPrincess has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
<ChaosPrincess>
is there any documentation on how tessellation control shaders are compiled? even a very simple one that only sets the levels and passes through one variable (tes_color[gl_InvocationID] = tcs_color[gl_InvocationID]) turns into a huge pile of bcsels and control flow.
tursulin has quit [Ping timeout: 480 seconds]
luben has joined #dri-devel
fab has quit [Quit: fab]
mszyprow has joined #dri-devel
hansg has quit [Quit: Leaving]
Leopold_ has quit [Remote host closed the connection]
glennk has quit [Read error: Connection reset by peer]
Leopold_ has joined #dri-devel
macromorgan has quit [Quit: Leaving]
<airlied>
ChaosPrincess: for what gpu?
<ChaosPrincess>
asahi, but that is me dumping nir quite early, right at the beginning of agx_compile_variant
Duke`` has quit [Ping timeout: 480 seconds]
<airlied>
not sure where they lower tess to compute and do actual tessellation
<airlied>
might need alyssa to appear
<ChaosPrincess>
they don't. i am looking at input nir that is being passed from opengl compiler to driver-specific code
<airlied>
NIR_DEBUG=print_tcs might be a good place to look
<jenatali>
ChaosPrincess: IIRC it comes out of the GLSL frontend that way
<ChaosPrincess>
print_tcs says the offending pass is gl_nir_lower_buffers
<jenatali>
Ah, looks like it's probably nir_lower_indirect_derefs which just isn't wrapped in NIR_PASS_V so it doesn't print
luben has quit [Ping timeout: 480 seconds]
vsyrjala has quit [Remote host closed the connection]
YuGiOhJCJ has joined #dri-devel
iive has joined #dri-devel
Company has joined #dri-devel
macromorgan has joined #dri-devel
eukara has joined #dri-devel
luben has joined #dri-devel
sima has quit [Ping timeout: 480 seconds]
luben has quit [Remote host closed the connection]