ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
benjamin1 has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
<cmarcelo>
anyone that knows / maintains margebot: wondering if I just put marge in an odd state by pushing on an MR that marge pushed to.
<cmarcelo>
(is there a "marge log" somewhere I can peek at in those cases?)
<airlied>
cmarcelo: was it processing that MR?
<airlied>
if so you should unassign it, and kill any pipelines it was running
<cmarcelo>
oh, I missed canceling the pipeline (it was already failing)
<airlied>
yeah I think if it gets cancelled, marge might wake up before the 1hr expiry
<airlied>
though it may not
<cmarcelo>
it did wake up
<cmarcelo>
thanks
nashpa has quit []
kzd has quit [Quit: kzd]
a-865 has joined #dri-devel
dliviu has joined #dri-devel
yyds has joined #dri-devel
penguin42 has quit [Remote host closed the connection]
kzd has joined #dri-devel
memleak has joined #dri-devel
co1umbarius has joined #dri-devel
<memleak>
Hey I was debugging PREEMPT_RT latency spikes with amdgpu and radeon DRM drivers, I finally have a consistent stack trace now which is exceeding 30-50 microseconds (occasionally even spikes to above 200 microseconds)
<airlied>
okay so yes you are hitting a hw register and hw takes time to react
<memleak>
ok :)
<airlied>
not sure there's much that can be done about it
<airlied>
mmio register reads/writes can stall the cpu, don't think there's any nice way around it
<memleak>
well hey! that at least solves the mystery!
<memleak>
and it's not user error!
<memleak>
Thank you! :D
memleak has quit [Remote host closed the connection]
Company has quit [Remote host closed the connection]
heat has quit [Remote host closed the connection]
kzd has quit [Ping timeout: 480 seconds]
crabbedhaloablut has joined #dri-devel
kzd has joined #dri-devel
Daanct12 has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]
Daanct12 has quit [Quit: WeeChat 4.0.4]
Daanct12 has joined #dri-devel
ngcortes_ has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
rz has quit [Remote host closed the connection]
rz has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
slattann has joined #dri-devel
memleak has joined #dri-devel
<memleak>
hey airlied I just wanted to come back and say I'm sorry for possibly annoying the shit out of you years ago, I was really hyper, I talked too much and I was a handful for everybody.
kzd has quit [Ping timeout: 480 seconds]
<memleak>
I was in junior high when I first started dabbling with X.org, anyways, thank you for everything.
memleak has quit []
Lyude has quit [Ping timeout: 480 seconds]
slattann has quit [Quit: Leaving.]
Lyude has joined #dri-devel
sukrutb has joined #dri-devel
sarnex has quit [Ping timeout: 480 seconds]
sarnex has joined #dri-devel
Duke`` has joined #dri-devel
fab has joined #dri-devel
i-garrison has quit []
i-garrison has joined #dri-devel
bmodem has joined #dri-devel
junaid has joined #dri-devel
junaid has quit [Remote host closed the connection]
itoral has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
bmodem has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
bmodem has quit [Excess Flood]
tzimmermann has joined #dri-devel
bmodem has joined #dri-devel
fab has quit [Quit: fab]
ungeskriptet has quit [Ping timeout: 480 seconds]
ungeskriptet has joined #dri-devel
sima has joined #dri-devel
rasterman has joined #dri-devel
jkrzyszt has joined #dri-devel
fab has joined #dri-devel
An0num0us has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
f11f12 has joined #dri-devel
f11f12 has quit [Remote host closed the connection]
paulk has quit [Quit: WeeChat 3.0]
paulk has joined #dri-devel
vliaskov has joined #dri-devel
pcercuei has joined #dri-devel
f11f12 has joined #dri-devel
Ahuj has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest2563
swalker__ has joined #dri-devel
frieder has joined #dri-devel
Guest2563 has quit [Ping timeout: 480 seconds]
elongbug__ has joined #dri-devel
elongbug_ has quit [Remote host closed the connection]
hansg has joined #dri-devel
elongbug__ has quit [Remote host closed the connection]
elongbug__ has joined #dri-devel
fab has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
qyliss has quit [Quit: bye]
qyliss has joined #dri-devel
qyliss has quit [Remote host closed the connection]
<itoral>
karolherbst: doesn't make any sense to me... if you don't set V3D_DEBUG at all, don't you see any mem faults?
An0num0us has quit [Ping timeout: 480 seconds]
<karolherbst>
itoral: correct
<karolherbst>
maybe something something VM placement or something
<karolherbst>
I think the shader accesses OOB no matter what, but I'll debug more thoroughly today on what's going on here... I was able to get rid of that error by doubling buffer sizes
<karolherbst>
it's just _very_ confusing that setting that env var makes a difference :D
<karolherbst>
the value of v3d_mesa_debug doesn't change, but I suspect something changes in the handling of that env var which changes something else? dunno.. it's just very odd :D
donaldrobson has quit [Ping timeout: 480 seconds]
donaldrobson has joined #dri-devel
<itoral>
yeah, I think that what happens is that for some reason when the envvars are set some allocation patterns change and that makes some OOB accesses land into valid memory addresses
mripard_ has joined #dri-devel
mripard has quit [Read error: Connection reset by peer]
<itoral>
karolherbst: do these tests use global address intrinsics to read memory from a buffer that is then used to compute global addresses for other global reads/writes?
Daanct12 has quit [Ping timeout: 480 seconds]
<itoral>
I ask because if that is not happening then you can do a simple trick to identify the bad access(es): you drop all global reads/writes from the kernel (for example by not emitting the global intrinsic from the compiler) and then start putting them back into the kernel one by one until you see the OOB error again
mripard has joined #dri-devel
<itoral>
actually, you could also use this tactic even if you use the results from a read to compute the address for follow-up reads, since you are adding later global intrinsics progressively one by one
<itoral>
once you know the first global intrinsic that causes the problem then we can just look at how the address is generated to figure out what is wrong
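The re-enabling tactic itoral describes can be sketched on the host side as a simple linear search. This is only an illustration under stated assumptions: `first_faulting_access` and the `faults` callback are hypothetical names, and `faults` stands in for "rebuild the kernel with only these global accesses emitted, run the test, and report whether the mem fault appears". Nothing here is Mesa or v3d API.

```python
from typing import Callable, List, Optional, Sequence


def first_faulting_access(
    accesses: Sequence[str],
    faults: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Re-enable accesses one at a time, in program order.

    Returns the first access whose addition makes the run fault,
    or None if the run never faults.
    """
    enabled: List[str] = []
    for access in accesses:
        enabled.append(access)
        # Hypothetical: rebuild and rerun with only `enabled` emitted.
        if faults(enabled):
            return access
    return None


if __name__ == "__main__":
    # Stand-in for a real rebuild-and-run step (assumption for the demo).
    def fake_run(enabled: Sequence[str]) -> bool:
        return "store dst[tid]" in enabled

    print(first_faulting_access(["load a", "store dst[tid]", "load b"], fake_run))
```

Because accesses are added back in program order, a later read that depends on an earlier one is only enabled after its producer, which is why the tactic still works when addresses are computed from previous reads. With an intermittent fault like the one in this log, each step would probably need several runs before being declared clean.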
mriesch has quit [Remote host closed the connection]
donaldrobson has quit [Ping timeout: 480 seconds]
cmichael has joined #dri-devel
donaldrobson has joined #dri-devel
mripard_ has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
jdavies has joined #dri-devel
jdavies is now known as Guest2575
mriesch has joined #dri-devel
Guest2575 has quit [Ping timeout: 480 seconds]
tarceri_ has joined #dri-devel
<karolherbst>
itoral: yeah, it's just one sized buffer bound and then read/write to
<karolherbst>
the kernel is really trivial
<karolherbst>
it's literally this:
<karolherbst>
int tid = get_global_id(0);
<karolherbst>
dst[tid] = ((1<<16)+1);
enunes has joined #dri-devel
<karolherbst>
I wonder if the test is slightly buggy... maybe I pass the buffer size into it and see what I can do with that
enunes has quit []
jhli_ has joined #dri-devel
tarceri has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
<karolherbst>
mhhhhh
<karolherbst>
itoral: I think it has something to do with how the kernel is launched
<karolherbst>
there is an OOB read and if I cap the tid to the buffer size it doesn't cause those
<karolherbst>
_but_
<karolherbst>
the kernel also launches threads according to the buffer size so that should be impossible
<karolherbst>
however.. CL has a cursed feature.. :D printf
jhli has quit [Ping timeout: 480 seconds]
<karolherbst>
ahh I can't use it as it needs global atomics, which I haven't looked at yet
<karolherbst>
huh....
sukrutb has quit []
egbert is now known as Guest2576
egbert has joined #dri-devel
<karolherbst>
itoral: mhhh... maybe it's also something to do with me overclocking the rpi with +400 MHz...
<karolherbst>
let me try without it first
anholt__ has joined #dri-devel
<karolherbst>
ahh no..
Guest2576 has quit [Ping timeout: 480 seconds]
<karolherbst>
but higher CPU load does make it more likely at least.. yeah so something odd is going on
anholt_ has quit [Ping timeout: 480 seconds]
<karolherbst>
I think the test is doing silly things...
mauld has quit [Ping timeout: 480 seconds]
penguin42 has joined #dri-devel
elongbug_ has joined #dri-devel
elongbug__ has quit [Read error: Connection reset by peer]
elongbug__ has joined #dri-devel
elongbug_ has quit [Ping timeout: 480 seconds]
mauld has joined #dri-devel
<glehmann>
how do the fdot_replicated opcodes work? can they have any number of output components or does e.g. fdot4_replicated always have a 4 component output?
<itoral>
overclocking shouldn't really have any impact
<karolherbst>
well.. I'm quite close to the point where increasing the clock a bit further causes the CPU to do wrong things :D
<karolherbst>
I've configured it in a way to not increase the voltage over the limit
<karolherbst>
but yeah.. the setting is fine it seems and never caused any problems
<karolherbst>
it clearly reads OOB but I have no idea why...
<itoral>
so capping the TID fixes the issue? mmm...
<karolherbst>
ehh.. no
<karolherbst>
I just got (un)lucky
<itoral>
ah :)
<itoral>
is that write to dst the only global address access in the kernel?
<karolherbst>
now I'm running "stress -c8" in the background and things are a bit more interesting
<itoral>
interesting, in that case the only way we can have an OOB is that tid is out of bounds.... have you tried making the dst buffer larger and write the tid into it?
<itoral>
then inspect the buffer when you trigger the mem faults and check if the tids are sane
<itoral>
I don't quite imagine why they wouldn't be, but something weird is happening so...
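One way to do the inspection itoral suggests, sketched under assumptions: the destination buffer is allocated larger than the launch size, pre-filled with a sentinel, and the kernel is changed to do `dst[tid] = tid`. The `SENTINEL` value, the `find_bad_tids` helper and the layout of the readback are all hypothetical; the real test harness may differ.

```python
SENTINEL = 0xFFFFFFFF  # assumed fill value, chosen so it cannot be a valid tid


def find_bad_tids(dump, launched):
    """Return (index, value) pairs inconsistent with each thread writing its own tid.

    Slots below `launched` should hold their own index; slots at or above
    it should still hold the sentinel.
    """
    bad = []
    for index, value in enumerate(dump):
        expected = index if index < launched else SENTINEL
        if value != expected:
            bad.append((index, value))
    return bad


if __name__ == "__main__":
    # Simulated readback: 4 threads launched into a 6-slot buffer,
    # with one unexpected extra write at slot 4.
    readback = [0, 1, 2, 3, 4, SENTINEL]
    print(find_bad_tids(readback, launched=4))
```

A slot past the launch size that holds its own index would point at more threads running than expected; a slot inside the range that still holds the sentinel would point at a missing or misdirected write.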
<karolherbst>
maybe something with the shader?
<itoral>
can you dump the kernel with V3D_DEBUG=cs?
<karolherbst>
yeah... something is odd
<karolherbst>
doing this instead makes the fault go away: if (&dst[tid] < 0x70000 || &dst[tid] >= 0x80000) dst[tid] = ((1<<16)+1);
<karolherbst>
at least it seems that way
<karolherbst>
itoral: the odd thing is, the test passes no matter what, so maybe it's just more threads running than expected? Anyway, will dump the plain shader
devarsh_ has joined #dri-devel
<itoral>
why would that if fix anything? isn't dst bound to different addresses in various iterations? At least it looks like that from the traces you pasted
<karolherbst>
yeah.. there are three pre allocated buffers in that test
<itoral>
karolherbst: does v3d_csd_choose_workgroups_per_supergroup return a number other than 1?
<karolherbst>
each 16384 elements big, once with int/int2/int4
<karolherbst>
itoral: nah, that's always 1 it seems
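For reference, the byte sizes of the three buffers karolherbst mentions follow directly from the element count and the OpenCL C vector sizes (int is 4 bytes, int2 is 8, int4 is 16). The small sketch below just does that arithmetic; `buffer_bytes` is a hypothetical helper, not part of the test.

```python
ELEMENTS = 16384  # element count quoted in the log
SIZES = {"int": 4, "int2": 8, "int4": 16}  # OpenCL C type sizes in bytes


def buffer_bytes(elements=ELEMENTS):
    """Map each element type to the byte size of a buffer of `elements` items."""
    return {name: elements * size for name, size in SIZES.items()}


for name, nbytes in buffer_bytes().items():
    print(f"{name}: {nbytes} bytes ({nbytes:#x})")
```

The int buffer comes out at 0x10000 bytes, the same width as the 0x70000..0x80000 window used in the guarded store earlier. Whether that match is meaningful is not established by the log.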
<MrCooper>
daniels: FWIW, sending plain-text e-mails with Thunderbird works mostly fine for me (still on 102 though, since one extension I use doesn't support 115 yet); I disabled mailnews.send_plaintext_flowed and set mailnews.wraplength to 0
<austriancoder>
what is the official definition of a nir system value?
bmodem has quit [Ping timeout: 480 seconds]
<mareko>
austriancoder: a value that doesn't come from the user
<austriancoder>
mareko: thanks
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
Danct12 has quit [Read error: Connection reset by peer]
jewins has joined #dri-devel
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
mbrost has joined #dri-devel
hansg has joined #dri-devel
fxkamd has joined #dri-devel
idr_ has joined #dri-devel
vliaskov has quit [Remote host closed the connection]
yuq825 has quit [Remote host closed the connection]
idr has quit [Ping timeout: 480 seconds]
fxkamd has quit []
bmodem has joined #dri-devel
kzd has joined #dri-devel
idr_ has quit []
idr has joined #dri-devel
kzd has quit [Quit: kzd]
rasterman has quit [Quit: Gettin' stinky!]
mripard has quit [Quit: mripard]
Duke`` has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
Danct12 has joined #dri-devel
shashanks has joined #dri-devel
frieder has quit [Remote host closed the connection]
cmichael has quit [Quit: Leaving]
benjaminl has joined #dri-devel
soreau has quit [Ping timeout: 480 seconds]
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
sarahwalker has quit [Remote host closed the connection]
<zf>
I hesitate to immediately file a bug, since this could be our bug, although the validation layers don't complain... but I'd appreciate if someone could give me some pointers where to look in the source?
sima has quit [Ping timeout: 480 seconds]
<zf>
since I am wholly unfamiliar with nir
JohnnyonFlame has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
mmx_in_orbit__ has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
macromorgan has joined #dri-devel
ngcortes has joined #dri-devel
<DemiMarie>
zf: if the validation layers don’t complain and your program isn’t corrupting memory (use Address Sanitizer to check that), it’s a Mesa bug
nchery has quit [Ping timeout: 480 seconds]
<DemiMarie>
That error message means that Mesa is generating invalid IR; it’s the equivalent of an internal compiler error in GCC, Clang, or MSVC.
<zf>
well, it could always be a bug in the validation layers, i.e. a missing validation
<DemiMarie>
is your program open source?
<zf>
but I can certainly file a mesa bug on the assumption that it's safe
<DemiMarie>
yeah
<zf>
yes, this is actually something we're running into in the Wine self test suite
<DemiMarie>
Ah, so that is why you have Windows-style pathnames :)
<DemiMarie>
If you have a small reproducer that should help the Mesa developers fix the issue.
<zf>
yeah, that's... the hard part
<zf>
it's Vulkan, so "small reproducer" isn't really a thing
<zf>
and Wine is not exactly a lightweight piece of software
<zf>
if this was GL, I could record an apitrace, but I guess no such thing exists for Vulkan
a-865 has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
<DemiMarie>
It probably should
<Sachiel>
gfxreconstruct exists
<Sachiel>
you can go back a while and see if the same issue exists and if not, try bisecting
<DemiMarie>
BTW geometry shaders are generally not very efficient, so if this shader comes from Wine then it is probably best to use something else
<zf>
we're a translation layer, so we do need the geometry shaders :-)
<zf>
thanks, I'll try gfxreconstruct
<zf>
I don't see the issue with stock distribution Mesa, but I wouldn't be surprised if that's because it's built with NDEBUG
<DemiMarie>
To quote the Arm Mali docs (or possibly an old version of them): “Most use-cases for Geometry shading are better handled by compute shaders.” and “Find a better solution to your problem. Geometry shaders are not your solution.”
<zf>
trust me, I'm well aware of the problems with geometry shaders, but we don't really have a choice in the matter
<DemiMarie>
It is actually possible to emulate geometry shaders using nothing but compute shaders.
pcercuei has quit [Quit: dodo]
<DemiMarie>
AGX will need to do that because Apple hardware doesn’t support geometry shaders at all.
<Sachiel>
not being in control of either the source of the shaders nor the driver, it'd be a ton of work to probably get a ton of weird failure cases
<zf>
and worse performance
<DemiMarie>
That said, I understand you not wanting to take that route.
<DemiMarie>
Probably
mauld has quit [Remote host closed the connection]
mauld has joined #dri-devel
heat_ has quit [Remote host closed the connection]
An0num0us has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
ngcortes_ has joined #dri-devel
ngcortes has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
shashanks_ has joined #dri-devel
idr_ has joined #dri-devel
idr has quit [Ping timeout: 480 seconds]
shashanks has quit [Ping timeout: 480 seconds]
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
lemonzest has quit [Quit: WeeChat 4.0.4]
mbrost_ has joined #dri-devel
mbrost has quit [Read error: Connection reset by peer]