ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular
chewitt has joined #panfrost
Bennett has quit [Remote host closed the connection]
camus has joined #panfrost
camus1 has quit [Read error: Connection reset by peer]
nlhowell has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
jambalaya has quit [Remote host closed the connection]
nlhowell has joined #panfrost
jambalaya has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
JulianGro has joined #panfrost
camus has joined #panfrost
camus1 has quit [Ping timeout: 480 seconds]
soreau has quit [Ping timeout: 480 seconds]
rasterman has joined #panfrost
macc24 has joined #panfrost
camus1 has joined #panfrost
camus has quit [Remote host closed the connection]
camus has joined #panfrost
camus1 has quit [Remote host closed the connection]
soreau has joined #panfrost
camus1 has joined #panfrost
camus has quit [Read error: Connection reset by peer]
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
hyrc has quit []
camus1 has quit [Remote host closed the connection]
camus has joined #panfrost
nlhowell has joined #panfrost
<alyssa> Oh ho ho wait
<alyssa> the automatic varying allocation on Valhall ... the varyings get allocated into the tiler heap?
<alyssa> adorable.
<alyssa> (0.0, 0.0, 1.0, 1.0), (0, 256.0, 1.0, 1.0), (256.0, 0.0, 1.0, 1.0), (256.0, 256.0, 1.0, 1.0)
<alyssa> that sure looks like my quad
<rasterman> that does look like a quad ...
<rasterman> tho zw are 1.0?
<rasterman> (xyzw) odd...
<rasterman> this is dumping the tiler heap as vecs?
<rasterman> (vec4)
<alyssa> rasterman: mm
<alyssa> have I forgotten how opengl works is that what happened here
<rasterman> the 1.0's ?
<alyssa> mm
<rasterman> mm ... memory management?
<alyssa> oh geez did i just. ugh
* alyssa reboots
<alyssa> it would be lovely if I didn't have the NDK and my demo on different partitions of the same machine but well
<rasterman> oooh 'droid
nlhowell is now known as Guest6704
nlhowell has joined #panfrost
<alyssa> juggling too many OSes
* rasterman grimaces
Guest6704 has quit [Ping timeout: 480 seconds]
<alyssa> yep i've committed opengl crimes
<rasterman> hehehe
<alyssa> (0, 0, 0.5, 1.0), (0, 256.0, 0.5, 1.0), (25.0, 0.0, 0.5, 1.0), (256.0, 256.0, 0.5, 1.0)
<alyssa> this seems a lot less sketchy
<alyssa> still broken but I fixed my initial OpenGL crimes
<rasterman> that doesnt look like a quad... :)
<alyssa> er 25 should be 256
<alyssa> but otherwise, still should render at least one pixel
<alyssa> hm I wonder if I can hook up perf counters
<rasterman> aaah ok- thats more like a quad :)
<rasterman> well a screen grid aligned quad :)
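A minimal sketch of why the second dump looks plausible, assuming the dumped values are post-viewport window coordinates for a 256x256 viewport with the default depth range [0, 1]. The formula is the standard OpenGL viewport transform; the NDC quad and the function name are hypothetical, chosen only to reproduce the numbers quoted above (with the 25 read as 256).

```python
def viewport_transform(ndc, x0=0.0, y0=0.0, width=256.0, height=256.0,
                       near=0.0, far=1.0):
    """Map normalized device coordinates (each in [-1, 1]) to window coordinates."""
    x, y, z = ndc
    xw = (x + 1.0) * width / 2.0 + x0
    yw = (y + 1.0) * height / 2.0 + y0
    # Depth maps [-1, 1] onto [near, far], so NDC z = 0 lands at 0.5 by default.
    zw = (far - near) / 2.0 * z + (far + near) / 2.0
    return (xw, yw, zw)


# Hypothetical full-screen quad at NDC z = 0 (an assumption, not from the log).
quad_ndc = [(-1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]
quad_window = [viewport_transform(v) for v in quad_ndc]
```

Under these assumptions a depth of 0.5 is what a quad at NDC z = 0 produces, whereas a window depth of 1.0 would put the quad exactly on the far plane.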
<alyssa> I'm just trying to render a nonzero number of pixels, right now everything is getting culled and it's probably entirely my fault
<rasterman> z problems?
<alyssa> (culled or clipped or scissored or otherwise discarded before attempting the depth test or rendering)
<rasterman> or it literally isnt even trying to render ?
<alyssa> it literally isn't trying to render
<rasterman> ok - got it before fragments
<rasterman> hmmm
<alyssa> yeah, I know that because I set the fragment shader / blend descriptor / depth stencil descriptor addresses to nonsense and it doesn't fault
<alyssa> I do see a seriously weird looking framebuffer descriptor, maybe that's an interesting bit
<alyssa> I should probably also confirm that the android sample I'm taking bits from actually, y'know, works ...
<rasterman> ohhh
<rasterman> u never saw it working? hahahahah
<rasterman> speaking of working...
<rasterman> something in drm-next has screwed panfrost / midgard /rk3399
<alyssa> mumble
<rasterman> 5.15rc land
<rasterman> i'm getting kernel oopses
<rasterman> drm hangs (kms is broken) ... lots of un-fun-times
<alyssa> ok maybe the android demo itself is broken ugh
<alyssa> alyssa..
<rasterman> hahahaa
<rasterman> always check the thing you want to run is right to begin with... :)
<alyssa> https://backtick.town/~bloom/triangle0.bmp okay so I definitely have a working sample
<rasterman> bmp?
<rasterman> geee...
<alyssa> you underestimate my lazy
<rasterman> well you have pretty pink and white triangles... :)
<rasterman> i hate it when bisects screw up. i already had this screw up twice
<rasterman> like once it told me its been fixed between a range of like 8 commit revs
<rasterman> this time i got a "this rev is broken"
<rasterman> ... finally
<rasterman> now i have to manually check because i don't trust this...
<alyssa> mmh
<alyssa> maybe I should try to bisect from the other side, then
<alyssa> start scribbling over DDK memory and see when it breaks o:)
<rasterman> hahahahaha
<alyssa> dougallj did this on apple to great success
<rasterman> actually i'm really curious if over time morello can make this less likely. e.g. memory is "owned" by a shared lib
<rasterman> and ONLY code executing from within that shared lib can write to it...
<alyssa> whose side are you on :p
camus1 has joined #panfrost
<rasterman> and to allow writing to memory the lib may alloc - it has to explicitly export a pointer to do that then "unexport" it to revoke such access
<rasterman> (and only code inside that shlib mappings can do the export/unexport)
<rasterman> that'd be nice...
<alyssa> whose side are you on :p
<rasterman> and .. i'm on my side...
<rasterman> as someone who writes lots of shlibs... i'm tired of apps scribbling over data structs the shlib manages then deciding to blame the shlib :)
<rasterman> it'd be also nice to lock mem down to a specific thread too :)
<rasterman> the same way
camus has quit [Ping timeout: 480 seconds]
<rasterman> i've had to debug problems before where someone ran code from a thread... and it worked 99.9999% of the time
<rasterman> then every now and again the app under some heavy testing would lock up
<rasterman> a linked list data struct would become a looped list (infinite now with no beginning/end)
<alyssa> rust
<rasterman> it eventually turned out to be writing from the thread (didnt know it was doing that), and the apis handling that were not threadsafe and intended to be called from mainloop only
<rasterman> rewriting everything in rust is not a viable solution :)
<rasterman> sit down and spend 5 years going nowhere and just re-writing code in rust (assuming you already gained expert status in rust too)
<rasterman> so no ability to move on...
<rasterman> probably will take 10y actually not 5y
<rasterman> AND in the process you are likely to add new bugs as previously well debugged code/algorithms get rewritten and have new bugs - sure... no "memory stomping" bugs... but other new logic ones :)
<HdkR> rasterman: Tagged memory can do what you're wanting. Allocate a tag per library and enforce ownership semantics that way
<HdkR> Of course it won't work in all cases since you need to be careful about giving away ownership
<HdkR> Most APIs just pass around memory without a care
<rasterman> this problem above got solved by sliding a new object handle system beneath the existing api... object * ptrs became references in a table and needed a lookup. those tables are TLS and have other sanity checks (like checksums/hashes) and this then stops the ability to even access an object from a thread where it did not exist UNLESS you explicitly expose another thread's context
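The handle scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual library code: the class name, the 8-bit check field, and the use of CRC32 are all hypothetical, and the "explicitly expose another thread's context" escape hatch is not modeled.

```python
import threading
import zlib


class HandleTable:
    """Per-thread table mapping opaque integer handles to objects (hypothetical)."""

    def __init__(self):
        # Each thread sees its own 'slots' dict via thread-local storage.
        self._tls = threading.local()

    def _slots(self):
        if not hasattr(self._tls, "slots"):
            self._tls.slots = {}
            self._tls.next_index = 1
        return self._tls.slots

    def create(self, obj):
        slots = self._slots()
        index = self._tls.next_index
        self._tls.next_index += 1
        # Small check value stored in the handle's low byte as a sanity check.
        check = zlib.crc32(repr((index, id(obj))).encode()) & 0xFF
        slots[index] = (check, obj)
        return (index << 8) | check

    def lookup(self, handle):
        index, check = handle >> 8, handle & 0xFF
        entry = self._slots().get(index)
        if entry is None or entry[0] != check:
            raise LookupError("handle is not valid on this thread")
        return entry[1]


def lookup_from_other_thread(table, handle):
    """Try the lookup on a fresh thread; return the exception raised, or None."""
    result = []

    def worker():
        try:
            table.lookup(handle)
            result.append(None)
        except LookupError as exc:
            result.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result[0]
```

Because the table lives in thread-local storage, a handle created on the main loop simply does not resolve on any other thread, so a stray cross-thread call fails loudly instead of corrupting a list.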
<rasterman> HdkR: you mean MTE? or you mean eg aligning your allocs to e.g. 16 bytes and using lower 4 bits as the tag?
<HdkR> Yes, MTE
<rasterman> yeah. mte is nice. :)
<alyssa> Ok, DDK does /not/ like me deleting its position shader resource table
<rasterman> but the above would not have been solved by mte as it'd have been a valid ptr just accessed from the wrong thread
<rasterman> but definitely having some kind of export/import ptr in an api gives you a point of control
<rasterman> forcing alignment e.g. to 64bytes would allow 6 bits which would be nicer :)
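The arithmetic behind the alignment remarks, as a small sketch: aligning allocations to a power of two leaves log2(alignment) low address bits free for a tag. This models the software low-bit scheme mentioned above, not MTE itself, and the function names are hypothetical.

```python
def tag_bits_for_alignment(alignment):
    """Number of low address bits that are always zero for a given alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return alignment.bit_length() - 1


def pack(address, tag, alignment):
    """Store a small tag in the unused low bits of an aligned address."""
    mask = alignment - 1
    if address & mask:
        raise ValueError("address is not aligned")
    if not 0 <= tag <= mask:
        raise ValueError("tag does not fit in the free low bits")
    return address | tag


def unpack(tagged, alignment):
    """Split a tagged value back into (address, tag)."""
    mask = alignment - 1
    return tagged & ~mask, tagged & mask
```

With these definitions, 16-byte alignment yields 4 tag bits and 64-byte alignment yields 6. Any code that dereferences or dumps such a value has to mask the tag off first.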
<rasterman> but the above object api solved it by making it an indirection and it made it essentially impossible to access an obj when not intended ... so that got done.
<rasterman> but it was just an example of the kinds of things a proper capability system can do
<rasterman> MTE is like a capability system for poor people. 4 bits... :)
<rasterman> it's nice to have and slide in to today's arch. but a full 128bits is even nicer :)
<rasterman> but there's a lot of research/work to do to bring in the idea of exporting/importing ptrs between "domains of ownership". it's not too common.
<alyssa> hmm what is 112 bytes
<alyssa> "2^4 * 7" "real helpful"
<alyssa> ok yep words 30,31 of the idvs helper payload are the near/far planes
<HdkR> Woo RE
<alyssa> so it looks like I should target my investigation at this resource thing in position shaders, and the LEA_ATTR instruction (or whatever it actually is)
<alyssa> but.. this isn't even accessing the resources. clearly I've broken multiple things
<rasterman> AAAARGH
<rasterman> why? bisect is wrong ... wtf...
<alyssa> sounds like me trying to valhall
<alyssa> ah!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<alyssa> resources is a f'ing tagged pointer
<rasterman> :)
<alyssa> this is not the first time i've burned a lot of time on this bug
<alyssa> but... it's not going further than this outer layer on the resource table too?
<alyssa> definitely getting closer .. I think
<alyssa> i mean. maybe.
<alyssa> I don't remember this being painful on Bifrost... I remember it being really bad on Midgard... medium-bad on AGX...
<alyssa> remark: valhall seems to issue vertex jobs in groups of 4
<alyssa> i.e. if you draw a single triangle it still does the extra vertex, arguably incorrectly but..
<alyssa> the tiler heap is suspiciously empty except for my vertices. Guessing whatever goes wrong, is going wrong at most as late as tiling
chewitt has quit [Quit: Zzz..]
<HdkR> alyssa: Does that mean that you'll get two line segments and four points executing? :D
<alyssa> HdkR: likely
<HdkR> That's cute
<alyssa> will try this again later
<alyssa> still, progress .. I think
<alyssa> and still not really "stuck" .. just getting bored of the tug of war with the hw :p
camus has joined #panfrost
camus1 has quit [Ping timeout: 480 seconds]
macc24 has quit [Ping timeout: 480 seconds]
bbrezillon has quit [Read error: Connection reset by peer]
bbrezillon has joined #panfrost
camus1 has joined #panfrost
camus has quit [Ping timeout: 480 seconds]
camus has joined #panfrost
camus1 has quit [Read error: Connection reset by peer]
alpernebbi has quit [Ping timeout: 480 seconds]
megi has quit [Remote host closed the connection]
megi has joined #panfrost
nlhowell has quit [Ping timeout: 480 seconds]
<daniels> rasterman: cherry-pick the top two from https://gitlab.freedesktop.org/robclark/msm/-/commits/v5.16-rc1-plus-fixes if you haven't already got them
<rasterman> daniels: is that actually it?
<daniels> rasterman: if your oops points to dma_scheduler/dma_resv/dma_fence UAF, yeah
<robclark> I assume daniels meant the first two
<daniels> yeah, first two, soz
<rasterman> i've tried bisecting like 4 times now - each time it ends up at a different hash ... and well... manually pre/post that its also broken... it's been pissing me off :|
<rasterman> it's been causing all sorts of fun side effects like network port not working right and other weird side effects too.
<rasterman> i'll give that a shot to stuff those in and see. i closed up my bisecting terms for the day :)
alpernebbi has joined #panfrost
<rasterman> actually let me try now
<rasterman> i still have it up - just had to reconnect
<robmur01> protip: always assume any kernel before about -rc3 to be catastrophically broken. These days I typically don't even bother bisecting things unless I'm still hitting them mid-cycle, except when there's some likelihood of it being related to something I've done :)
<rasterman> i was hoping to figure it out before rel... :)
<rasterman> but i've been a bit baffled that a bisect hasnt reliably pointed to something to scratch my head over :|
<rasterman> but i didnt like the idea of a release having this broken
<robmur01> bisecting across merges is an arse at the best of times, but particularly when those merges are branches with different bases all over the place
<rasterman> yeah... so i find. it makes me yearn for linear history :)
<robmur01> what's worst though is when the bisect result is utterly nonsensical but actually true
<rasterman> you mean it found the commit but the commit itself makes no sense as to why that would cause it?
* robmur01 remembers figuring out when the merge of the input tree broke USB on Juno...