#panfrost on 2021-11-24 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://oftc.irclog.whitequark.org/panfrost - <macc24> i have been here before it was popular

00:41 chewitt has joined #panfrost

00:46 Bennett has quit [Remote host closed the connection]

02:28 camus has joined #panfrost

02:29 camus1 has quit [Read error: Connection reset by peer]

02:39 nlhowell has joined #panfrost

02:47 nlhowell has quit [Ping timeout: 480 seconds]

02:52 jambalaya has quit [Remote host closed the connection]

02:52 nlhowell has joined #panfrost

02:52 jambalaya has joined #panfrost

03:00 nlhowell has quit [Ping timeout: 480 seconds]

04:55 camus1 has joined #panfrost

05:01 camus has quit [Ping timeout: 480 seconds]

05:06 JulianGro has joined #panfrost

06:49 camus has joined #panfrost

06:53 camus1 has quit [Ping timeout: 480 seconds]

07:13 soreau has quit [Ping timeout: 480 seconds]

07:57 rasterman has joined #panfrost

08:12 macc24 has joined #panfrost

08:39 camus1 has joined #panfrost

08:40 camus has quit [Remote host closed the connection]

10:17 camus has joined #panfrost

10:18 camus1 has quit [Remote host closed the connection]

10:45 soreau has joined #panfrost

10:45 camus1 has joined #panfrost

10:46 camus has quit [Read error: Connection reset by peer]

11:00 camus1 has quit [Remote host closed the connection]

11:00 camus has joined #panfrost

12:16 camus1 has joined #panfrost

12:21 camus has quit [Ping timeout: 480 seconds]

13:56 hyrc has quit []

14:58 camus1 has quit [Remote host closed the connection]

14:58 camus has joined #panfrost

15:07 nlhowell has joined #panfrost

15:52 <alyssa> Oh ho ho wait

15:53 <alyssa> the automatic varying allocation on Valhall ... the varyings get allocated into the tiler heap?

15:53 <alyssa> adorable.

15:54 <alyssa> (0.0, 0.0, 1.0, 1.0), (0, 256.0, 1.0, 1.0), (256.0, 0.0, 1.0, 1.0), (256.0, 256.0, 1.0, 1.0)

15:55 <alyssa> that sure looks like my quad

16:24 <rasterman> that does look like a quad ...

16:25 <rasterman> tho zw are 1.0?

16:25 <rasterman> (xyzw) odd...

16:25 <rasterman> this is dumping the tiler heap as vecs?

16:25 <rasterman> (vec4)

16:30 <alyssa> rasterman: mm

16:30 <alyssa> have I forgotten how opengl works is that what happened here

16:31 <rasterman> the 1.0's ?

16:31 <alyssa> mm

16:32 <rasterman> mm ... memory management?

16:33 <alyssa> oh geez did i just. ugh

16:33 * alyssa reboots

16:33 <alyssa> it would be lovely if I didn't have the NDK and my demo on different partitions of the same machine but well

16:34 <rasterman> oooh 'droid

16:34 nlhowell is now known as Guest6704

16:34 nlhowell has joined #panfrost

16:34 <alyssa> juggling too many OSes

16:34 * rasterman grimaces

16:36 Guest6704 has quit [Ping timeout: 480 seconds]

16:36 <alyssa> yep i've committed opengl crimes

16:36 <rasterman> hehehe

16:48 <alyssa> (0, 0, 0.5, 1.0), (0, 256.0, 0.5, 1.0), (25.0, 0.0, 0.5, 1.0), (256.0, 256.0, 0.5, 1.0)

16:48 <alyssa> this seems a lot less sketchy

16:48 <alyssa> still broken but I fixed my initial OpenGL crimes

16:50 <rasterman> that doesnt look like a quad... :)

16:51 <alyssa> er 25 should be 256

16:51 <alyssa> but otherwise, still should render at least one pixel

16:51 <alyssa> hm I wonder if I can hook up perf counters

16:51 <rasterman> aaah ok- thats more like a quad :)

16:52 <rasterman> well a screen grid aligned quad :)

16:52 <alyssa> I'm just trying to render a nonzero number of pixels, right now everything is getting culled and it's probably entirely my fault

16:53 <rasterman> z problems?

16:53 <alyssa> (culled or clipped or scissored or otherwise discarded before attempting the depth test or rendering)

16:53 <rasterman> or it literally isnt even trying to render ?

16:53 <alyssa> it literally isn't trying to render

16:53 <rasterman> ok - got it before fragments

16:53 <rasterman> hmmm

16:53 <alyssa> yeah, I know that because I set the fragment shader / blend descriptor / depth stencil descriptor addresses to nonsense and it doesn't fault

16:57 <alyssa> I do see a seriously weird looking framebuffer descriptor, maybe that's an interesting bit

16:58 <alyssa> I should probably also confirm that the android sample I'm taking bits from actually, y'know, works ...

16:59 <rasterman> ohhh

16:59 <rasterman> u never saw it working? hahahahah

16:59 <rasterman> speaking of working...

17:00 <rasterman> something in drm-next has screwed panfrost / midgard /rk3399

17:00 <alyssa> mumble

17:00 <rasterman> 5.15rc land

17:00 <rasterman> i'm getting kernel oopses

17:00 <rasterman> drm hangs (kms is broken) ... lots of un-fun-times

17:54 <alyssa> ok maybe the android demo itself is broken ugh

17:54 <alyssa> alyssa..

17:54 <rasterman> hahahaa

17:54 <rasterman> always check the thing you want to run is right to begin with... :)

17:57 <alyssa> https://backtick.town/~bloom/triangle0.bmp okay so I definitely have a working sample

17:57 <rasterman> bmp?

17:57 <rasterman> geee...

17:57 <alyssa> you underestimate my lazy

17:58 <rasterman> well you have pretty pink and white triangles... :)

17:58 <rasterman> i hate it when bisects screw up. i already had this screw up twice

17:59 <rasterman> like once it told me its been fixed between a range of like 8 commit revs

17:59 <rasterman> this time i got a "this rev is broken"

17:59 <rasterman> ... finally

17:59 <rasterman> now i have to manually check because i don't trust this...

17:59 <alyssa> mmh

17:59 <alyssa> maybe I should try to bisect from the other side, then

18:00 <alyssa> start scribbling over DDK memory and see when it breaks o:)

18:00 <rasterman> hahahahaha

18:00 <alyssa> dougallj did this on apple to great success

18:01 <rasterman> actually i'm really curious if over time morello can make this less likely. e.g. memory is "owned" by a shared lib

18:01 <rasterman> and ONLY code executing from within that shared lib can write to it...

18:01 <alyssa> whose side are you on :p

18:01 camus1 has joined #panfrost

18:02 <rasterman> and to allow writing to memory the lib may alloc - it has to explicitly export a pointer to do that then "unexport" it to revoke such access

18:02 <rasterman> (and only code inside that shlib mappings can do the export/unexport)

18:02 <rasterman> that'd be nice...

18:02 <alyssa> whose side are you on :p

18:02 <rasterman> and .. i'm on my side...

18:02 <rasterman> as someone who writes lots of shlibs... i'm tired of apps scribbling over data structs the shlib manages then deciding to blame the shlib :)

18:03 <rasterman> it'd be also nice to lock mem down to a specific thread too :)

18:03 <rasterman> the same way

18:04 camus has quit [Ping timeout: 480 seconds]

18:04 <rasterman> i've had to debug problems before where someone ran code from a thread... and it worked 99.9999% of the time

18:04 <rasterman> then every now and again the app under some heavy testing would lock up

18:04 <rasterman> a linked list data struct would become a looped list (infinite now with no beginning/end)

18:04 <alyssa> rust

18:05 <rasterman> it eventually turned out to be that it was writing from the thread (didnt know it was doing that), and the apis it called were not threadsafe and intended to be called from mainloop only

18:05 <rasterman> rewriting everything in rust is not a viable solution :)

18:06 <rasterman> sit down and spend 5 years going nowhere and just re-writing code in rust (assuming you already gained expert status in rust too)

18:06 <rasterman> so no ability to move on...

18:06 <rasterman> probably will take 10y actually not 5y

18:07 <rasterman> AND in the process you are likely to add new bugs as previously well debugged code/algorithms get rewritten and have new bugs - sure... no "memory stomping" bugs... but other new logic ones :)

18:08 <HdkR> rasterman: Tagged memory can do what you're wanting. Allocate a tag per library and enforce ownership semantics that way

18:08 <HdkR> Of course it won't work in all cases since you need to be careful about giving away ownership

18:09 <HdkR> Most APIs just pass around memory without a care

18:09 <rasterman> this problem above got solved by sliding a new object handle system beneath the existing api... object * ptrs became references in a table and needed a lookup. those tables are TLS and have other sanity checks (like checksums/hashes) and this then stops the ability to even access an object from a thread where it did not exist UNLESS you explicitly expose another thread's context

18:09 <rasterman> HdkR: you mean MTE? or you mean eg aligning your allocs to e.g. 16 bytes and using lower 4 bits as the tag?

18:09 <HdkR> Yes, MTE

18:10 <rasterman> yeah. mte is nice. :)

18:10 <alyssa> Ok, DDK does /not/ like me deleting its position shader resource table

18:10 <rasterman> but the above would not have been solved by mte as it'd have been a valid ptr just accessed from the wrong thread

18:11 <rasterman> but definitely having some kind of export/import ptr in an api gives you a point of control

18:11 <rasterman> forcing alignment e.g. to 64bytes would allow 6 bits which would be nicer :)

18:12 <rasterman> but the above object api solved it by making it an indirection and it made it essentially impossible to access an obj when not intended ... so that got done.
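
The handle indirection rasterman describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual library code: the `HandleTable` class, its method names, and the handle layout (owner id in the high bits, slot index in the low 32 bits) are all hypothetical, and the owner check stands in for the checksum/hash sanity checks mentioned above.

```python
import itertools
import threading


class HandleTable:
    """Hypothetical per-thread table mapping opaque integer handles to objects."""

    def __init__(self):
        self._local = threading.local()        # each thread sees its own slots
        self._owner_ids = itertools.count(1)   # distinct id per owning thread
        self._lock = threading.Lock()

    def _state(self):
        local = self._local
        if not hasattr(local, "objects"):
            with self._lock:
                local.owner = next(self._owner_ids)
            local.next_index = 0
            local.objects = {}
        return local

    def create(self, obj):
        """Store obj in the calling thread's table and return its handle."""
        state = self._state()
        index = state.next_index
        state.next_index += 1
        state.objects[index] = obj
        # Assumed layout: owner id above bit 32, slot index below it.
        return (state.owner << 32) | index

    def lookup(self, handle):
        """Resolve a handle; fails unless the calling thread owns it."""
        state = self._state()
        owner, index = handle >> 32, handle & 0xFFFFFFFF
        if owner != state.owner or index not in state.objects:
            raise LookupError("handle is not valid on this thread")
        return state.objects[index]
```

A handle created on one thread resolves there and raises `LookupError` anywhere else, so a stray cross-thread call fails loudly at the lookup instead of silently corrupting a list.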

18:12 <rasterman> but it was just an example of the kinds of things a proper capability system can do

18:12 <rasterman> MTE is like a capability system for poor people. 4 bits... :)

18:13 <rasterman> it's nice to have and slide in to today's arch. but a full 128bits is even nicer :)

18:14 <rasterman> but there's a lot of research/work to do to bring in the idea of exporting/importing ptrs between "domains of ownership". it's not too common.

18:21 <alyssa> hmm what is 112 bytes

18:21 <alyssa> "2^4 * 7" "real helpful"

18:30 <alyssa> ok yep words 30,31 of the idvs helper payload are the near/far planes
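
One way to check a finding like this is to reinterpret the raw words of a dumped payload as floats. The sketch below assumes 32-bit little-endian words holding IEEE-754 single-precision values; the log does not state the word size or the actual values, so the payload contents and the `word_as_float` helper are made up for illustration.

```python
import struct


def word_as_float(words, index):
    """Reinterpret the 32-bit word at `index` as an IEEE-754 float."""
    return struct.unpack("<f", struct.pack("<I", words[index]))[0]


# Made-up payload: everything zero except a far plane of 1.0 in word 31.
payload = [0] * 32
payload[30] = 0x00000000  # bit pattern of 0.0 (assumed near plane)
payload[31] = 0x3F800000  # bit pattern of 1.0 (assumed far plane)

near = word_as_float(payload, 30)
far = word_as_float(payload, 31)
```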

18:35 <HdkR> Woo RE

18:38 <alyssa> so it looks like I should target my investigation at this resource thing in position shaders, and the LEA_ATTR instruction (or whatever it actually is)

18:45 <alyssa> but.. this isn't even accessing the resources. clearly I've broken multiple things

18:45 <rasterman> AAAARGH

18:46 <rasterman> why? bisect is wrong ... wtf...

18:50 <alyssa> sounds like me trying to valhall

18:52 <alyssa> ah!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

18:52 <alyssa> resources is a f'ing tagged pointer

18:52 <rasterman> :)

18:53 <alyssa> this is not the first time i've burned a lot of time on this bug
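
For readers unfamiliar with the trick: a tagged pointer stores extra bits in the low bits of an aligned address, which is the same idea rasterman mentions above (16-byte alignment frees 4 bits, 64-byte alignment frees 6). The sketch below splits such a value into address and tag. The alignment and the example value are assumptions; the log does not say how many bits the Valhall resources pointer actually uses or what they mean.

```python
def tag_bits(alignment):
    """Number of low bits left free by a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return alignment.bit_length() - 1


def split_tagged_pointer(value, alignment):
    """Return (address, tag) for a pointer tagged in its low bits."""
    mask = (1 << tag_bits(alignment)) - 1
    return value & ~mask, value & mask


# Hypothetical value: a 16-byte-aligned address with tag 3 in the low bits.
address, tag = split_tagged_pointer(0x7F0012345643, 16)
```

Treating the raw value as an address without masking the tag points a few bytes past the real structure, which is the kind of mistake that can cost an afternoon.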

19:01 <alyssa> but... it's not going further than this outer layer on the resource table too?

19:02 <alyssa> definitely getting closer .. I think

19:10 <alyssa> i mean. maybe.

19:11 <alyssa> I don't remember this being painful on Bifrost... I remember it being really bad on Midgard... medium-bad on AGX...

19:20 <alyssa> remark: valhall seems to issue vertex jobs in groups of 4

19:20 <alyssa> i.e. if you draw a single triangle it still does the extra vertex, arguably incorrectly but..

19:21 <alyssa> the tiler heap is suspiciously empty except for my vertices. Guessing whatever goes wrong, is going wrong at most as late as tiling

19:22 chewitt has quit [Quit: Zzz..]

19:25 <HdkR> alyssa: Does that mean that you'll get two line segments and four points executing? :D

19:25 <alyssa> HdkR: likely

19:25 <HdkR> That's cute

19:27 <alyssa> will try this again later

19:27 <alyssa> still, progress .. I think

19:27 <alyssa> and still not really "stuck" .. just getting bored of the tug of war with the hw :p

19:33 camus has joined #panfrost

19:37 camus1 has quit [Ping timeout: 480 seconds]

20:07 macc24 has quit [Ping timeout: 480 seconds]

21:05 bbrezillon has quit [Read error: Connection reset by peer]

21:10 bbrezillon has joined #panfrost

21:47 camus1 has joined #panfrost

21:51 camus has quit [Ping timeout: 480 seconds]

22:30 camus has joined #panfrost

22:31 camus1 has quit [Read error: Connection reset by peer]

23:06 alpernebbi has quit [Ping timeout: 480 seconds]

23:12 megi has quit [Remote host closed the connection]

23:13 megi has joined #panfrost

23:19 nlhowell has quit [Ping timeout: 480 seconds]

23:23 <daniels> rasterman: cherry-pick the top two from https://gitlab.freedesktop.org/robclark/msm/-/commits/v5.16-rc1-plus-fixes if you haven't already got them

23:23 <rasterman> daniels: is that actually it?

23:24 <daniels> rasterman: if your oops points to dma_scheduler/dma_resv/dma_fence UAF, yeah

23:24 <robclark> I assume daniels meant the first two

23:24 <daniels> yeah, first two, soz

23:24 <rasterman> i've tried bisecting like 4 times now - each time it ends up at a different hash ... and well... manually pre/post that its also broken... it's been pissing me off :|

23:25 <rasterman> it's been causing all sorts of fun side effects like network port not working right and other weird side effects too.

23:25 <rasterman> i'll give that a shot to stuff those in and see. i closed up my bisecting terms for the day :)

23:30 alpernebbi has joined #panfrost

23:44 <rasterman> actually let me try now

23:44 <rasterman> i still have it up - just had to reconnect

23:48 <robmur01> protip: always assume any kernel before about -rc3 to be catastrophically broken. These days I typically don't even bother bisecting things unless I'm still hitting them mid-cycle, except when there's some likelihood of it being related to something I've done :)

23:48 <rasterman> i was hoping to figure it out before rel... :)

23:48 <rasterman> but i've been a bit baffled that a bisect hasnt reliably pointed to something to scratch my head over :|

23:49 <rasterman> but i didnt like the idea of a release having this broken

23:52 <robmur01> bisecting across merges is an arse at the best of times, but particularly when those merges are branches with different bases all over the place

23:54 <rasterman> yeah... so i find. it makes me yearn for linear history :)

23:55 <robmur01> what's worst though is when the bisect result is utterly nonsensical but actually true

23:56 <rasterman> you mean it found the commit but the commit itself makes no sense as to why that would cause it?

23:56 * robmur01 remembers figuring out when the merge of the input tree broke USB on Juno...