QwertyChouskie has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
<graphitemaster>
Is there anything special w.r.t data hazard when writing to multiple faces of a cubemap from a compute shader?
<graphitemaster>
Like I have a compute dispatch that does six work groups on the Z dimension and I'm indexing the face of the cubemap to write to based on gl_GlobalInvocationID.z, but for some reason I'm getting very strange results on Intel mesa, works everywhere else.
<graphitemaster>
If I do a dispatch per face the problem goes away
<mareko>
I wonder if we really need PREFERRED_IR=TGSI if drivers use nir_to_tgsi internally
<ishitatsuyuki>
graphitemaster, sounds fine to me so probably an ANV bug.
<ishitatsuyuki>
hm wait
<ishitatsuyuki>
that might have something to do with non uniform index
<ishitatsuyuki>
let's look up the spec
aravind has quit [Ping timeout: 480 seconds]
<ishitatsuyuki>
so I think it's fine if you are passing the face as the z component in imageStore, but if you treat it as an array there might be a problem
fahien has joined #dri-devel
<graphitemaster>
Not an array, the image being written to is layout(rgba16f) restrict writeonly imageCube, just a plain old imageStore(..., ivec3(gl_GlobalInvocationID), value)
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<graphitemaster>
Problem also goes away if I add a glFinish after the glDispatchCompute
<ishitatsuyuki>
no idea about gl unfortunately
<ishitatsuyuki>
i only do vk stuff these days ;)
<HdkR>
graphitemaster: Probably need a glMemoryBarrier
nchery has quit [Read error: Connection reset by peer]
<graphitemaster>
I have a GL_TEXTURE_FETCH_BARRIER_BIT before I use the written result
<HdkR>
Might need GL_SHADER_IMAGE_ACCESS_BARRIER_BIT
<graphitemaster>
I don't use imageLoad though, it's a regular texture()
<graphitemaster>
But I could try adding that I guess
<HdkR>
But you're using imageStore in the compute shader
<HdkR>
It affects load and store ordering
kts has joined #dri-devel
<graphitemaster>
Not sure I understand. It's a single dispatch of 6 work groups on Z, there's no intermediate time to add a barrier call, just when I use the written result later and I was under the impression you only issue the barrier for how you'll use the resource, not for how it was prepared
<graphitemaster>
In this case I'm using it as a texture(), so the only barrier needed should be GL_TEXTURE_FETCH_BARRIER_BIT at least according to the spec.
<MrCooper>
whoever tleydxdy is talking to, we can't see their messages on IRC
<tleydxdy>
yeah, the matrix bridge always kicks me out without telling me too :(
<doras>
How about now?
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
QwertyChouskie has joined #dri-devel
QwertyChouskie has quit [Ping timeout: 480 seconds]
<Rayyan>
hello doras from IRC
<doras>
:)
<doras>
I'll repeat what I wrote:
<doras>
So apparently the user experience with AMD GPUs in games is abysmal without setting pp_power_profile_mode to 3D_FULL_SCREEN, and it gets worse the more powerful the GPU is (and thus the higher the chance that it won't be 100% utilized by the game).
<doras>
It either means that AMD's heuristics are (very) bad at keeping the GPU clocks relatively stable in common workloads, or that there's a real need for user space to communicate the type of content being presented at any given time.
<clever>
doras: would that apply to cards this old?
<clever>
ive noticed abysmal performance in some games
<clever>
5 fps or less
HerrSpliet has joined #dri-devel
<tleydxdy>
doras: I did some searching, apparently even under Windows it has problems going into high performance graphics modes automatically
<tleydxdy>
I think the port is just too bad
RSpliet has quit [Ping timeout: 480 seconds]
HerrSpliet has quit [Read error: No route to host]
RSpliet has joined #dri-devel
<doras>
clever: I doubt it has the necessary kernel support for power management. The issue I'm referring to is stutter, most likely caused by the GPU clocks ramping up and down too often due to aggressive power management.
<tleydxdy>
you say most likely, have you checked?
<tleydxdy>
like logging the clocks or something
<clever>
doras: ahh
<clever>
tleydxdy: how can the clocks be viewed? radeontop graphs them with bars, but its hard to see the change over time or feed it into grafana
<clever>
yeah, theres the problem, `cat pp_dpm_mclk` contains a *!
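For logging the clocks over time rather than watching radeontop bars, a minimal Python sketch is shown here. It assumes the usual "<level>: <freq>Mhz" layout of pp_dpm_sclk and pp_dpm_mclk, where the active level carries a trailing `*`. The sysfs path, the function names, and the sample text are assumptions for illustration, and the card index may differ per system.

```python
import re
import time
from pathlib import Path

# Assumed location; the card index may differ on your system.
SCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_sclk")

def active_clock_mhz(text):
    """Return the clock (in MHz) of the level marked with '*', or None."""
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?", line, re.IGNORECASE)
        if m and m.group(3):
            return int(m.group(2))
    return None

def log_active_clock(path=SCLK_PATH, interval_s=0.1, samples=100):
    """Print 'unix_time mhz' lines that a time-series tool could ingest."""
    for _ in range(samples):
        print(time.time(), active_clock_mhz(path.read_text()))
        time.sleep(interval_s)

# Hypothetical sample contents, not taken from a real card.
SAMPLE = "0: 300Mhz\n1: 1000Mhz *\n2: 1375Mhz\n"
print(active_clock_mhz(SAMPLE))
```

Calling log_active_clock() on a machine with the file present prints one timestamped sample per interval, which makes ramping visible as a series of changing values.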
<tleydxdy>
hmm interestingly, on my system the pp_power_profile_mode is 3D_FULL_SCREEN already
<tleydxdy>
I'm in my desktop, not running any games
<clever>
pp_power_profile_mode has a * beside 3D_FULL_SCREEN for me too
<tleydxdy>
yeah that's the currently selected one
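The `*` convention can also be read programmatically. The sketch below is a rough Python parser that returns the row marked with `*`. The sample table is hypothetical: the exact column layout of pp_power_profile_mode varies between GPU generations, and only the values mentioned in this conversation are filled in.

```python
import re

def selected_profile(text):
    """Return (index, name) of the row marked with '*', or None."""
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+([A-Z0-9_]+)\s*(\*)?\s*:", line)
        if m and m.group(3):
            return int(m.group(1)), m.group(2)
    return None

# Hypothetical sample; real files have more columns and rows.
SAMPLE = """\
NUM        MODE_NAME     SCLK_UP_HYST   SCLK_DOWN_HYST SCLK_ACTIVE_LEVEL
  0   BOOTUP_DEFAULT :        -                -                -
  1   3D_FULL_SCREEN*:        0              100               30
  2     POWER_SAVING :       10                -               30
"""
print(selected_profile(SAMPLE))
```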
<clever>
the only thing ive got running that could really be using opengl is the compositing window manager
<clever>
and perhaps youtube in chrome, but its currently paused
<clever>
no change when i untick composition in xfce
danvet has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
<clever>
tleydxdy: what do the numbers within pp_power_profile_mode mean?
<pinchartl>
what's the best way to test if a KMS device supports plane scaling ? I can create a test commit, but I'd like a method that I can use before I have any frame buffer available
<tleydxdy>
*_HYST are ramp up and ramp down time?
<tleydxdy>
* I think
<clever>
that covers 4 of them, which just leaves the active levels for sclk and mclk
<tleydxdy>
yeah no idea for those
<clever>
ah, and i'm guessing an UP_HYST of 0, means it can ramp up instantly
<clever>
but a larger DOWN_HYST means it takes time to fall again
<clever>
i'm reminded of some cpufreq bugs i had decades ago, where the cpu would freeze for 0.5 seconds every time the freq changed
<clever>
just running "ls -l" would make the cpu ramp up
<clever>
and right as i type a reply, it decides to ramp down again
<clever>
the ps2 buffer overflows during the down-ramp, and it loses key-up eventsssssssssssssssssssssssssssssss
<clever>
at the time there was no up/down hyst controls, but all scaling was controlled by a userland daemon, which had a configurable polling interval
<tleydxdy>
but SCLK_UP_HYST Delay before sclk is increased (in milliseconds)
<clever>
slower polling made it slow to both rampup and rampdown
<tleydxdy>
SCLK_ACTIVE_LEVEL Workload required before sclk levels change (in %)
<clever>
so it was far less twitchy
<tleydxdy>
etc
<clever>
ahhh
<clever>
so in my current profile, it only has to have >30% load for 0ms, before it will ramp up the sclk
<clever>
but it must be <30% for 100ms to ramp down, i think
<tleydxdy>
might be
<clever>
and the power-save one, waits 10ms at >30% before it ramps up
<clever>
so its less eager to drain your battery
<clever>
but at 60fps, 1 frame is 16ms
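To see how these delays interact with a roughly 16 ms frame, here is a toy Python model. It is an assumption-laden sketch, not how the amdgpu firmware actually decides: it ticks in 1 ms steps, treats the GPU as busy for the first part of each frame, and ignores the ACTIVE_LEVEL percentage entirely. The function name and the busy time of 5 ms are made up for illustration.

```python
def count_clock_switches(up_hyst_ms, down_hyst_ms, busy_ms, frame_ms=16, frames=60):
    """Toy model: count clock changes over a number of frames.

    The clock goes high once the GPU has been busy for more than up_hyst_ms
    in a row, and low once it has been idle for more than down_hyst_ms in a row.
    """
    high = False
    busy_run = idle_run = 0
    switches = 0
    for tick in range(frames * frame_ms):
        busy = (tick % frame_ms) < busy_ms
        if busy:
            busy_run += 1
            idle_run = 0
            if not high and busy_run > up_hyst_ms:
                high = True
                switches += 1
        else:
            idle_run += 1
            busy_run = 0
            if high and idle_run > down_hyst_ms:
                high = False
                switches += 1
    return switches

# Up delay 0 ms, down delay 100 ms: ramps up once and stays up.
print(count_clock_switches(0, 100, busy_ms=5))
# Up delay 10 ms with only 5 ms of work per frame: never ramps up.
print(count_clock_switches(10, 0, busy_ms=5))
# Hypothetical short down delay of 5 ms: the clock flips twice every frame.
print(count_clock_switches(0, 5, busy_ms=5))
```

In this model, a down delay shorter than the idle gap inside a frame makes the clock flip twice per frame, which is the kind of ramping suspected above as the cause of stutter.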
<tleydxdy>
then the problem becomes why is doras not in this 3D mode automatically
<clever>
and why am i in 3d mode, when all i have open is the compositor and chrome
<clever>
hmmm, but none of the other profiles really fit my current state
<tleydxdy>
yeah
<karolherbst>
everything is 3d these days
<clever>
bootup wouldnt make sense, i'm not in a fullscreen 3d game, powersave/video/vr/compute are all that remain