ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
nsneck_ has quit [Remote host closed the connection]
<gawin>
(though maybe I'm not familiar with all hardware limitations)
<airlied>
Venemo: in general, I expect rdna2 to be too much different there
<Venemo>
airlied: different how?
<airlied>
sorry not to be much different
<Venemo>
airlied: I'm afraid I don't have enough context to see what exactly the problem is. Is it not possible for the driver to look at some registers and decide which part hung?
<Venemo>
It is difficult to believe that it hasn't occurred to anyone in the HW design team to add some ability to diagnose these kinds of problems.
<imirkin>
Venemo: if it's hung, where does the information backing the register data come from?
<Venemo>
imirkin: I suppose it could read the same place where umr reads it from?
<airlied>
does umr give you enough info?
<airlied>
for power mgmt type hangs I rarely got useful info out of umr
<Venemo>
airlied: it gives much more than what dmesg has.
<airlied>
I suppose you could try and do something like i915 error state, where it records a bunch of stuff into a kernel memory allocation that you dump out through debugfs
<Venemo>
Some classes of problems are quite nicely handled by umr actually, e.g. if the problem is clearly shader related it can dump a lot.
pcercuei has quit [Quit: dodo]
<Venemo>
My hope would be that some of this could be automated and dumped into the dmesg log so we don't have to ask users to mess with umr
<Venemo>
It is already too difficult to explain how to get the dmesg log after a GPU hang
<Venemo>
airlied: however that being said umr definitely doesn't always give 'enough' info but it's miles better than what dmesg has.
<bnieuwenhuizen>
Venemo: might be interesting to add devcoredump support to the kernel to dump all the raw data umr uses and then have a separate decoding app that we can then use
<Venemo>
That would be indeed way better than what we have now
<Venemo>
It would at least give us some hint about these random hangs that users seem to have sometimes.
tursulin has quit [Read error: Connection reset by peer]
<mmx_in_orbit>
emersion and everyone: how to use noop gallium driver?
<imirkin>
stick GALLIUM_DRIVER=noop in env
<imirkin>
erm ... wait, that's not quite right
<imirkin>
let me check
<imirkin>
GALLIUM_NOOP=true
<imirkin>
mmx_in_orbit: --^
<mmx_in_orbit>
thank you
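(Usage note: as far as I can tell, GALLIUM_NOOP=true doesn't select a different driver but wraps the real driver's screen in the no-op pipe, so it's set in the environment of the client under test, e.g. GALLIUM_NOOP=true glxgears, where glxgears is just an illustrative client.)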
kenjigashu has quit [Ping timeout: 480 seconds]
Haaninjo has quit [Quit: Ex-Chat]
vivijim has quit [Ping timeout: 480 seconds]
<mmx_in_orbit>
imirkin: where'd you get that from may i ask?
<ajax>
airlied: i don't know how far down the dd.h rabbit hole you are, but i took a whack a while ago at dropping the unused slots from dd_function_table, feel free to steal: https://paste.centos.org/view/raw/1d21273a
<airlied>
ajax: I already whacked a lot of the unused ones
<ajax>
excellent
<airlied>
in master
<airlied>
or main
<airlied>
I haven't whacked objectpurge yet
<airlied>
I'm mostly removing live dd.h entries in favour of direct calls
<airlied>
see 14073/14100
<ajax>
ooh shiny
<airlied>
contemplating merging mesa + state_tracker in various ways
<airlied>
not sure how good an idea it is
<airlied>
once I fix the build issues CI is throwing up :-P
<dcbaker>
airlied: seems like an obvious idea to me, all the other state trackers are that way
<airlied>
dcbaker: yeah I think it makes a lot of sense, just have to pick some examples of how to move forward
<dcbaker>
I actually figured mareko would be all over that, it seems like an obvious way to lower CPU overhead
<airlied>
ah the mesa tests haven't linked to gallium before now, but now they need to if we direct call into st
<airlied>
to make the tests build, would be good if you could glance at it
<dcbaker>
I think that also all goes away in my move mesa to src/gallium
<airlied>
dang it, the standalone compiler uses the NewProgram hook
<airlied>
better bring that one back
dviola has quit [Quit: WeeChat 3.3]
mmx_in_orbit_2 has joined #dri-devel
mmx_in_orbit has quit [Quit: Leaving]
mmx_in_orbit has joined #dri-devel
mmx_in_orbit has quit []
anholt has quit [Remote host closed the connection]
mmx_in_orbit has joined #dri-devel
anholt has joined #dri-devel
Duke`` has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
<dcbaker>
airlied: because I totally wasn't clear, `r-b` on that meson patch
<airlied>
dcbaker: thanks
<airlied>
removing mtypes.h from the brw compiler opens a can of worms :-P
sdutt has quit [Remote host closed the connection]
camus has joined #dri-devel
<dcbaker>
oh yeah, I tried that too
<dcbaker>
mtypes is deeply invasive
<dcbaker>
I'm pretty sure even NIR ends up needing mtypes
camus1 has quit [Ping timeout: 480 seconds]
<airlied>
I've disentangled it quite a bit
<dcbaker>
I should get back to trying to rip the glsl list implementation out of nir and the brw compiler...
danvet has joined #dri-devel
tzimmermann has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
camus1 has joined #dri-devel
mmx_in_orbit_ has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
mmx_in_orbit has quit [Ping timeout: 480 seconds]
mmx_in_orbit__ has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
mmx_in_orbit_ has quit [Ping timeout: 480 seconds]
mlankhorst_ has joined #dri-devel
jkrzyszt has joined #dri-devel
<kusma>
Hmmpf. Thinking about it a tiny bit more, moving SWR to the amber branch seems like it might be a mistake... It's fundamentally different from the other drivers we moved there, as it's a Gallium driver. So if that lives on the amber branch, we also need to keep Gallium around there. Otherwise, we could have culled all of Gallium from that branch as well...
ramaling has joined #dri-devel
pnowack has joined #dri-devel
rasterman has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<danvet>
hm does dri3proto not transport fourcc or am I just blind?
tursulin has joined #dri-devel
<daniels>
danvet: you’re not blind … fourcc is implied by the combination of the bit depth + bpp of the pixmap you store into, plus the visual of the destination window
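A worked example of the mapping daniels describes, as a hypothetical consumer-side helper; the depth/bpp pairs shown are common cases and the channel order is an assumption, since dri3 itself carries no fourcc on the wire:

  #include <stdint.h>
  #include <drm_fourcc.h> /* libdrm header */

  /* hypothetical helper: derive a DRM fourcc from an X pixmap's depth + bpp */
  static uint32_t fourcc_for_pixmap(int depth, int bpp)
  {
          if (depth == 24 && bpp == 32)
                  return DRM_FORMAT_XRGB8888;
          if (depth == 32 && bpp == 32)
                  return DRM_FORMAT_ARGB8888;
          if (depth == 16 && bpp == 16)
                  return DRM_FORMAT_RGB565;
          return DRM_FORMAT_INVALID; /* unknown combination */
  }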
camus1 has quit [Remote host closed the connection]
<robclark>
mripard: I ran into a fun issue w/ fw_devlink vs swapping dsi vs bridge probe order..
<robclark>
afaict what is going on is that nothing triggers the bridge to re-probe after the dsi-host is registered
<robclark>
not sure if you hit that issue? I'm starting to poke around and figure out how fw_devlink works..
Duke`` has joined #dri-devel
<agd5f>
airlied, Venemo: for power management related hangs, we have a new STB dump where we can get a dump of basically the hardware and firmware sequences. Currently only supported on newer APUs and RDNA dGPUs. It's available via debugfs.
<agd5f>
STB = Smart Trace Buffer
<agd5f>
for engine related hangs, we can dump more of what umr does, but it gets a bit tricky with stuff like gfxoff and powergating
<agd5f>
like reading registers in a block which is not powered up will cause hangs
<robclark>
agd5f: have you looked at devcoredump stuff? We have that wired up to dump gpu/gmu/fw state and cmdstream when there are gpu hangs.. (and also wired that up to CrOS crash reporting infrastructure).. it has been pretty useful for us
boistordu_ex has quit [Remote host closed the connection]
<agd5f>
robclark, no, I'll take a look
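For reference, a minimal sketch of what bnieuwenhuizen's devcoredump suggestion could look like on the kernel side; amdgpu_snapshot_state() is a hypothetical stand-in for whatever collects the raw state umr reads:

  #include <linux/devcoredump.h>
  #include <linux/vmalloc.h>

  /* hypothetical: collects the raw state umr reads into one vmalloc'd blob */
  void *amdgpu_snapshot_state(size_t *len);

  static void gpu_hang_coredump(struct device *dev)
  {
          size_t len;
          void *state = amdgpu_snapshot_state(&len);

          if (!state)
                  return;

          /* devcoredump takes ownership of the vmalloc'd buffer and exposes
           * it under /sys/class/devcoredump for a userspace decoder to read */
          dev_coredumpv(dev, state, len, GFP_KERNEL);
  }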
jkrzyszt has quit [Remote host closed the connection]
<danvet>
Lyude, vsyrjala some dp fixes from Kees Cook, can you pls take care?
alyssa has joined #dri-devel
* alyssa
keeps wondering how compile time would be affected if we used dynarrays of instructions instead of linked lists
ella-0 has joined #dri-devel
<imirkin>
alyssa: assuming you want to insert instructions in the middle / move stuff around, negatively i'd assume
ella-0_ has quit [Read error: Connection reset by peer]
<alyssa>
imirkin: Maybe
<alyssa>
But how often are we inserting instructions in an arbitrary /middle/, as opposed to inserting immediately before/after a cursor while already iterating the instructions
<jenatali>
How would that change performance characteristics of insertion?
<alyssa>
jenatali: ^^
<jenatali>
Sure, but it's still a middle. Even if you have a cursor to it and have been iterating, you still need to move everything else out of the way to do the insertion?
<alyssa>
The latter can be handled by "allocate shadow array of instructions. for each instruction { if (want to insert at cursor) { append to shadow } ... append instruction to shadow }. replace array with shadow and free original"
<imirkin>
alyssa: i dunno about nir, but in nouveau, pretty often. phi's / jumps are at the begin/end of a bb
<imirkin>
and we insert "regular" instructions after phi's / before jumps
<imirkin>
we could store those lists completely separately i suppose and provide a virtual iterator
<alyssa>
You copy everything once for the whole optimization pass, not once per instruction inserted
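A minimal sketch of the rebuild-pass scheme alyssa is describing, assuming fixed-size instructions; the predicate and the inserted instruction are hypothetical stand-ins, and error handling is omitted:

  #include <stdbool.h>
  #include <stdlib.h>

  struct instr { int op; }; /* fixed-size instruction, as in the panfrost backends */

  struct block {
          struct instr *instrs;
          size_t count, cap;
  };

  static void push(struct block *b, struct instr ins)
  {
          if (b->count == b->cap) {
                  b->cap = b->cap ? b->cap * 2 : 16;
                  b->instrs = realloc(b->instrs, b->cap * sizeof(*b->instrs));
          }
          b->instrs[b->count++] = ins;
  }

  /* hypothetical stand-ins for a real pass's logic */
  static bool want_insert_before(const struct instr *i) { (void)i; return false; }
  static struct instr make_extra_instr(void) { struct instr i = { 0 }; return i; }

  /* One pass: every instruction is copied exactly once, insertions included. */
  static void run_pass(struct block *b)
  {
          struct block shadow = { 0 };

          for (size_t i = 0; i < b->count; i++) {
                  if (want_insert_before(&b->instrs[i]))
                          push(&shadow, make_extra_instr());
                  push(&shadow, b->instrs[i]);
          }

          free(b->instrs);
          *b = shadow; /* swap the shadow in, amortizing the copy over the pass */
  }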
<anholt>
coooool. hooked up r300 shader-db so I could quantify its NIR conversion, but it turns out shader-db's ./run just produces a sigill at the end on my hw.
<jenatali>
Oh sure, amortize the cost by doing full clone/swap instead. Interesting idea, would be interesting to experiment
<imirkin>
we also have a pass to flip orders of instructions
<imirkin>
(when they don't depend on each other, and flipping the order can result in better packing)
<alyssa>
jenatali: right, both this approach and linked lists are amortized O(1), just a question of constants
<alyssa>
overhead of iterating linked list (versus nice packed caches), versus overhead of memcpy'ing entire instructions (instead of just looking at prev/next pointers)
<alyssa>
imirkin: nouveau handles phis?
<jenatali>
Sure, but it increases the overhead of running any pass that would need to do modification since it needs to start copying the entire block
<alyssa>
Right
<imirkin>
alyssa: nouveau was the first SSA compiler in mesa by like ... 5 years?
<imirkin>
actually i guess sb was also SSA. so less than that.
<anholt>
alyssa: well, instructions are variable-sized, so unless you're going to give them bounds or internal pointers, you need a pointer between your array and the actual instr.
<alyssa>
anholt: nod, that's fair. panfrost backend instructions are fixed-size... then again I should strongly consider not doing that...
<anholt>
not looking forward to a world of "well, we don't want to add a const index to this intrinsic because that would expand the size of every nir instr."
* alyssa
has that world in her backend
<alyssa>
Probably was the wrong decision but hey
<anholt>
oh. someone clever put -march=native in the shader-db Makefile.
<clever>
wasnt me!
<imirkin>
clever: prove it
<clever>
imirkin: i don't have commit access :P
<cmarcelo>
alyssa: also the scheme we use with destination (dest.ssa) will require a bunch of references to be updated, right? would probably need a different way to reference that.
<imirkin>
clever: clever
<imirkin>
(sorry, couldn't resist...)
<idr>
alyssa, imirkin: If the dynarray uses something like the old "buffer gap" structure that old text editors used to use, insertions at the cursor should be very efficient... even when the cursor moves around a bit.
<idr>
It might be an interesting experiment, anyway.
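A minimal sketch of the buffer-gap idea idr mentions, again assuming fixed-size instructions: the array's free space sits at the cursor, so inserting there is a single store, and only moving the cursor pays a memmove:

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  struct instr { int op; }; /* fixed-size instruction assumed */

  struct gap_buf {
          struct instr *data;        /* capacity elements in total */
          size_t gap_start, gap_end; /* [gap_start, gap_end) is the gap */
  };

  /* Moving the cursor shifts elements across the gap. */
  static void move_cursor(struct gap_buf *g, size_t pos)
  {
          if (pos < g->gap_start) {
                  size_t n = g->gap_start - pos;
                  memmove(&g->data[g->gap_end - n], &g->data[pos], n * sizeof(*g->data));
                  g->gap_end -= n;
          } else if (pos > g->gap_start) {
                  size_t n = pos - g->gap_start;
                  memmove(&g->data[g->gap_start], &g->data[g->gap_end], n * sizeof(*g->data));
                  g->gap_end += n;
          }
          g->gap_start = pos;
  }

  /* Insertion at the cursor is O(1) while the gap has room. */
  static bool insert_at_cursor(struct gap_buf *g, struct instr ins)
  {
          if (g->gap_start == g->gap_end)
                  return false; /* gap full; a real version would grow and re-gap */
          g->data[g->gap_start++] = ins;
          return true;
  }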
<alyssa>
cmarcelo: Ah, right for NIR ... in the bifrost backend at least, the dest.ssa pointers are collected while iterating in the optimization pass itself... an idea from ACO
<alyssa>
(Also "just" a constant factor different, not sure which wins out.)
* airlied
wonders could you add some read/write counters for each instr block and try to work out the ratios
<alyssa>
perf counters in your compiler? likelier than you'd think :-p
mlankhorst has quit [Ping timeout: 480 seconds]
<airlied>
alyssa: then you could use the perf counters to pick an optimal path using ML :-P
hikiko has joined #dri-devel
nchery has quit [Quit: Leaving]
hikiko_ has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
ngcortes has joined #dri-devel
nchery has joined #dri-devel
mszyprow has joined #dri-devel
<robclark>
mripard: ok, I tracked it down.. the ti bridge wasn't returning -EPROBE_DEFER.. patch sent
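For context, a minimal sketch of the pattern involved, assuming a bridge driver that needs its DSI host up first; everything beyond the drm_mipi_dsi API is illustrative:

  #include <drm/drm_mipi_dsi.h>
  #include <linux/errno.h>
  #include <linux/of.h>

  static int bridge_attach_dsi(struct device_node *host_node)
  {
          struct mipi_dsi_host *host = of_find_mipi_dsi_host_by_node(host_node);

          /* If the DSI host hasn't registered yet, defer instead of failing,
           * so the driver core re-probes the bridge once the host shows up. */
          if (!host)
                  return -EPROBE_DEFER;

          /* ... register the DSI device on the host and attach ... */
          return 0;
  }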
<anholt>
I should probably also do some piglit testing to make sure I haven't trashed desktop GL
<anholt>
just, ugh. unstable gpus suck.
<imirkin>
anholt: having an optimizing compiler helps ;)
<anholt>
not having an optimizing compiler in the backend might honestly help r300 a lot
<imirkin>
anholt: well, i meant somewhere in the pipeline
<anholt>
it's so very slow
<imirkin>
r300's compiler was not really that optimizing
<anholt>
it has loop unrolling!
<imirkin>
nv30 would also benefit most likely
<anholt>
it's got to be so great!
<imirkin>
as esp nv3x itself (not nv4x) has zero support for loops
<imirkin>
anholt: if you do happen to pick up this work for nv30 gallium driver, feel free to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8544 . i didn't push it because i was concerned about double-compiling shaders unnecessarily...
<anholt>
imirkin: ah, good to know about. yeah, will do.
<ajax>
anholt: did we ever get shaderdb for llvmpipe working? there's an issue against shaderdb that's still open
<anholt>
I don't think I've heard of anyone working on that
<ajax>
hmph. adds to todo.
<imirkin>
anholt: i know you're not signing up for this, but i'd also welcome a "direct" nir backend for nv30. if you're interested :)
<alyssa>
ajax: Sounds like a can of worms... would that be shader-db stats for the LLVM IR, or a different set of stats for each target architecture, or stats for one representative architecture (cue bikeshedding which one, I preemptively vote arm64)
<imirkin>
anholt: the only thing i'm concerned about with this approach is the scalar -> vector conversion. it could result in a lot of stupidity. but i guess what matters is whether it does in practice for shaders that target those arch's
<alyssa>
I guess even LLVM IR is loaded, because IIRC gallivm emits target-specific intrinsics for things like SSE
<ajax>
alyssa: i kind of want to hook up the llir interpreter as an llvmpipe mode _anyway_. at which point, go ahead and emit code for whatever non-native cpu if you're just compiling the shaders.
Haaninjo has joined #dri-devel
<alyssa>
ajax: would that get rid of softpipe?
<alyssa>
since I assume llir interpreting works on architectures llvm doesn't support
<ajax>
in the most rubegoldbergian of ways, but sure yeah why not
<alyssa>
:D
<ajax>
that's the intent anyway. i think that's approximately the state of llvm on m68k for example
<Sachiel>
that's gitlab's merge thing though, marge will rebase when it gets to it and if it can't, it will complain and assign it back to you
<FireBurn>
But I'm not seeing a merge request for it yet
<FireBurn>
Sachiel, thanks
oneforall2 has quit [Quit: Leaving]
idr has joined #dri-devel
<marex>
well urgh ...
<marex>
even if I run kmscube and monitor the process's RSS, I see it growing
<marex>
slowly, over time, there is an increase
mclasen has quit [Ping timeout: 480 seconds]
gawin has joined #dri-devel
pcercuei has quit [Quit: dodo]
Duke`` has quit [Ping timeout: 480 seconds]
<alyssa>
marex: out of interest -- what about number of open file descriptors?
* alyssa
had an fd leak
<marex>
alyssa: there is 8 of them open thus far
<alyssa>
yeah nope nvm
<marex>
I'll let kmscube run for a bit longer to see whether it grows further; it stopped growing around the 200-second mark
<marex>
there is also much more significant growth when running the glmark2-es2-drm jellyfish test, but I suspect that might be glmark not freeing up some context?
<mmx_in_orbit>
airlied: sudo ninja install just finished for me. so what is the envvar for blackhole rendering via r600 with your code?
mszyprow has joined #dri-devel
<HdkR>
The Intel blackhole render extension isn't an environment variable; it's enabled with glEnable(GL_BLACKHOLE_RENDER_INTEL) in code
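For instance, a minimal sketch assuming an epoxy-based GL loader and a current context; 0x83FC is the enum value from the GL_INTEL_blackhole_render extension spec:

  #include <epoxy/gl.h>

  #ifndef GL_BLACKHOLE_RENDER_INTEL
  #define GL_BLACKHOLE_RENDER_INTEL 0x83FC
  #endif

  static void enable_blackhole_render(void)
  {
          if (epoxy_has_gl_extension("GL_INTEL_blackhole_render"))
                  glEnable(GL_BLACKHOLE_RENDER_INTEL); /* subsequent draws are discarded */
  }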
<idr>
Hrm... that MR is still not getting past the CI.
<idr>
The failure that I thought was a flake, I now got twice in a row.
<idr>
One of the traces on Apollolake segfaults... but not in Mesa.
<anholt>
segfaults? not in the runs I've seen of yours
<anholt>
the runs I looked at the rendering had changed