<dt9>
danvet: regarding your question - no, no one rewrote gem_exec_schedule yet
<danvet>
dt9, I'll cc you and adixit on some patch
<danvet>
I think I'll just do a quick hack
<danvet>
for my problem
<danvet>
but if my understanding is correct, that test should be converted to softpin unconditionally
<danvet>
since relocations are getting in the way of the test logic
<danvet>
and there are some not-entirely-convincing hacks to avoid the issues
<danvet>
so softpin everywhere for that test will maybe make it a bit more reliable
<dt9>
danvet: relocations implicitly help keep the pipeline busy; with softpin it's not so easy to do that, because after closing/freeing an offset (in the allocator) we can stall the pipeline - a second bo will get the same offset (previously freed by the first bo), so it has to wait for vma reuse
<dt9>
we can use pseudo-allocations with incremented offsets (based on the sizes of previous allocations) to get behavior similar to relocations
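A minimal sketch of the pseudo-allocation idea described above, assuming nothing about the real IGT allocator API (pseudo_alloc and next_offset are made-up names): offsets only ever grow, so a fresh bo never lands on an address that a just-freed bo is still vacating.

```c
#include <stdint.h>

/* Hypothetical helper, not the IGT allocator API: hand out monotonically
 * increasing offsets sized to each allocation, so freed addresses are never
 * recycled and nothing has to wait for vma reuse. */
static uint64_t next_offset;

static uint64_t pseudo_alloc(uint64_t size, uint64_t alignment)
{
	/* alignment is assumed to be a power of two */
	uint64_t offset = (next_offset + alignment - 1) & ~(alignment - 1);

	next_offset = offset + size;	/* never hand out an earlier offset again */
	return offset;
}
```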
<danvet>
dt9, with this test it's the other way round
<danvet>
the relocations can cause stalls, so we need to assign fixed addresses for all buffers upfront
<danvet>
maybe the testcase even needs to be converted to use hardcoded addresses (it's kinda doing that right now)
<dt9>
danvet: which subtest do you mean?
<danvet>
anything that uses __store_dword() iirc
<dt9>
one of the tests I rewrote in my private branch uses a similar store_dword()
<dt9>
I can check how much effort is required to quickly change this to softpin
<danvet>
dt9, I don't think it's a case of "quickly"
<danvet>
and I think my change (need to recheck my analysis) is a really small change
<dt9>
danvet: yes, but the change to migrate to softpin is not straightforward and requires changes in the spinner (I still have this on a private branch)
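For reference, the softpin side of this discussion: instead of passing a relocation array, the test pins each object at a fixed GTT address with EXEC_OBJECT_PINNED, so the kernel never has to patch (and potentially stall on) the batch. A rough sketch against the upstream i915 uapi; the helper name and the choice of address are made up.

```c
#include <stdint.h>
#include <string.h>
#include <drm/i915_drm.h>

/* Sketch: describe a softpinned BO for execbuf.  gtt_addr is chosen by the
 * test up front; EXEC_OBJECT_PINNED tells the kernel it must place the
 * object exactly there, so no relocations are needed. */
static void exec_object_softpin(struct drm_i915_gem_exec_object2 *obj,
				uint32_t handle, uint64_t gtt_addr)
{
	memset(obj, 0, sizeof(*obj));
	obj->handle = handle;
	obj->offset = gtt_addr;			/* fixed address, known up front */
	obj->flags = EXEC_OBJECT_PINNED |
		     EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
	/* relocation_count stays 0: nothing for the kernel to patch */
}
```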
<Sumera>
melissawen, danvet: what is a good way to debug memory errors?
<Sumera>
I have a feeling it's because kfree() is not being called somewhere, but I tried changing that, still no show :/
<danvet>
Sumera, hm I'd dump how big the allocation is
<danvet>
maybe we're trying a huge resolution
<danvet>
above what kzalloc can allocate
<danvet>
then compare which allocations work and which don't, if it's only the big ones that fail, that's probably the bug
<danvet>
if it's random, then there's another reason
<danvet>
but if that's all you get, you're probably over the kmalloc limit
<danvet>
since if we're actually running low on memory there's usually a big splat of additional information from the allocator
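A hedged sketch of the instrumentation suggested here: print the requested size right before the allocation so the failing cases can be compared against the successful ones. Function and parameter names are placeholders, not the actual driver code.

```c
#include <linux/slab.h>
#include <linux/printk.h>

/* Placeholder names, for illustration only: dump the size that ends up in
 * kzalloc() so it can be correlated with which resolutions fail. */
static void *alloc_fb_data(unsigned int width, unsigned int height,
			   unsigned int cpp)
{
	size_t size = (size_t)width * height * cpp;
	void *data;

	pr_info("allocating %zu bytes for %ux%u (cpp=%u)\n",
		size, width, height, cpp);

	data = kzalloc(size, GFP_KERNEL);
	if (!data)
		pr_err("kzalloc of %zu bytes failed\n", size);

	return data;
}
```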
<Sumera>
danvet: this happens only for the virtual_hw case tho, won't the memory being requested be the same for both the virtual and non-virtual cases?
<Sumera>
I will check the size being allocated in the meanwhile and get back to you in some time
<danvet>
Sumera, hm maybe, I'd check to be sure
<danvet>
it's just that usually when kmalloc fails, you get a few pages of allocator dumps in dmesg
<danvet>
hm maybe dmesg debug level is only showing critical stuff?
<danvet>
that could be another one
<danvet>
__GFP_NOWARN is the flag for "I know how to handle allocation errors here and even expect them, don't freak out when there's no memory"
<danvet>
and we don't set that in the case you're hitting
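The flag being described, shown in a minimal sketch (placeholder function name): a caller that expects large allocations to fail passes __GFP_NOWARN and handles the fallback itself, whereas the default behavior, without the flag, is what produces the allocator splat in dmesg.

```c
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Sketch: a caller that can cope with a failed kmalloc asks the allocator
 * not to warn, then falls back to vmalloc.  Without __GFP_NOWARN a failure
 * here would dump a warning plus allocator state to dmesg. */
static void *alloc_big_buffer(size_t size)
{
	void *buf = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);

	if (!buf)
		buf = vzalloc(size);	/* expected fallback path */

	return buf;
}
```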
<Sumera>
danvet: yeah, could be, none of my printks (even after using KERN_CRIT) were showing up, so I changed the config and the tree is building rn.
<berylline>
i don't want to sound like a pest, but i asked a question yesterday and it was this
<berylline>
[18:49:46] <berylline> another question that i wanted to ask: is there any way to trace what's going on with GPUs on ARM without mmiotrace?
<berylline>
[18:50:00] <berylline> i know there's a method like what panwrap was made for
<berylline>
<berylline> but i was also wondering if there are any methods of tracing GPU hardware reads, writes and other things besides what i've already mentioned
<robclark>
berylline: you disconnected before anyone had a chance to answer.. but I don't believe mmiotrace works on arm (it certainly didn't many years back when I started poking at gpus on arm devices).. AFAIK everyone uses LD_PRELOAD shims to wrap ioctls
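For context, the LD_PRELOAD approach being referred to: a small shared object that interposes ioctl(), logs the call, and forwards it to the real libc implementation. This is a generic sketch, not panwrap itself; decoding of driver-specific ioctl payloads is left out.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>

/* Minimal ioctl interposer: build as a shared object and run the userspace
 * driver with LD_PRELOAD pointing at it to log every ioctl it issues. */
int ioctl(int fd, unsigned long request, ...)
{
	static int (*real_ioctl)(int, unsigned long, ...);
	va_list ap;
	void *arg;
	int ret;

	if (!real_ioctl)
		real_ioctl = (int (*)(int, unsigned long, ...))
			     dlsym(RTLD_NEXT, "ioctl");

	va_start(ap, request);
	arg = va_arg(ap, void *);
	va_end(ap);

	ret = real_ioctl(fd, request, arg);
	fprintf(stderr, "ioctl(fd=%d, req=0x%lx, arg=%p) = %d\n",
		fd, request, arg, ret);
	return ret;
}
```

Built with something like cc -shared -fPIC -o wrap.so wrap.c -ldl and loaded via LD_PRELOAD=./wrap.so, this captures the traffic between the closed userspace driver and the kernel without needing mmiotrace.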
<berylline>
robclark: yeah, i understand that mmiotrace doesn't support ARM, which was why i asked. and i disconnected because i had to do something
<berylline>
it's a shame that mmiotrace doesn't support ARM, really :(
<robclark>
tbh, mmiotrace hasn't really been needed that much.. the only thing really directly touching hw is the kernel part, and android requires that to be open source
<berylline>
true. i also know that the SGX GPUs that i'm going to study have an open-source kernel module which contains service commands
<berylline>
which i think is going to be interesting to refer to for looking at the traffic that goes on between the closed-source components and the kernel module
<berylline>
although i don't think all of the communication coming from those components is going to involve the kernel module
<berylline>
but i can't really say unless i get my hands on a BeagleBone Black or something else like it
<bl4ckb0ne>
is it normal to experience gl3.3 failure when updating the vulkan spec?
<ccr>
I wonder if anyone has realized that pretty much all of the gallium debugging stuff works by pure chance? I mean, the drivers and auxiliary/driver_* stuff all wrap pipe_screen into one of their own structs, basically typecasting to pipe_screen, which is kind of a problem when more than one thing does the same.
<zmike>
shhhh we don't talk about that
<ccr>
I see :P too bad I ran into an issue where things explode because the driver messes with the trace component's data, corrupting a pointer -> kaboom
<ccr>
I guess I'll "fix" it locally with something like struct trace_screen { struct pipe_screen base; int bogus_padding[1024]; ...
<ccr>
I was already trying to figure out a better solution, but this looks like a rather deep-rooted issue and would require some major overhauls to fix properly
<zmike>
ccr: where are you encountering this?
<zmike>
padding won't help since the driver is then failing to update its own data
<zmike>
I fixed cases of this recently in iris and llvmpipe
<ccr>
with crocus, so it's kinda an out-of-mainline thing
<zmike>
ah
<zmike>
maybe the same as what I had in iris then?
<airlied>
ccr: point me at it and I'll port over the iris fix
<zmike>
anything in the driver accessing resource->screen will explode
<ccr>
could be, but I don't really see why this kind of issue wouldn't occur with any driver .. depending on what the struct wrapping pipe_screen has after "base"
<zmike>
ideally drivers don't trigger that behavior with wrapped pointers
<ccr>
struct crocus_screen {
<ccr>
uint32_t refcount;
<ccr>
struct pipe_screen base;
<ccr>
<zmike>
so yeah, same as the iris one it sounds like
<ccr>
mainline iris has the same, is your fix in one of your trees?
<zmike>
no, it's in iris
<airlied>
zmike: oh I see the fix you did recently, I'll pull it over
<ccr>
ah
<zmike>
👍
<ccr>
I assumed it would've been some kind of struct thing, ok
<zmike>
it is, you just have to approach it from the other direction
<ccr>
sounds .. scary
<zmike>
trace bugs are
<ccr>
well, I'd say this is more of a design failure overall, but shrug
<zmike>
cool that it's working with crocus!
<zmike>
haha
<ccr>
crocus is working super well, for me at least.
<ccr>
super well with my .. haswell *badadum tssh*
<airlied>
pushed the orig_screen fix to crocus
<ccr>
airlied, hooray
<airlied>
may even open an MR against main this week
<ccr>
../src/gallium/drivers/crocus/crocus_resource.c:320:23: error: implicit declaration of function ‘crocus_screen_ref’; did you mean ‘crocus_pscreen_ref’? [-Werror=implicit-function-declaration]
<zmike>
oops
<airlied>
dang snb is so slow to compile
<ccr>
works \:D/
<HdkR>
oop, I should double check if that llvmpipe patch works for me rather than claiming it'll work for me :P