#asahi-gpu on 2023-08-17 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:46 ChanServ changed the topic of #asahi-gpu to: Asahi Linux GPU development (no user support, NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu

01:42 jeisom has quit [Ping timeout: 480 seconds]

02:04 <i509vcb> alyssa: looking at https://gitlab.freedesktop.org/asahi/mesa/-/blob/agxv/main/src/vulkan/runtime/vk_meta_copy.c#L558 multiple layers aren't handled yet, is this something more complicated than just run do_copy at different locations a few times or does some other part of agxv need to compensate for this?

02:05 <alyssa> shrug

02:06 <alyssa> I'm told Collabora is working on finishing vk_meta_copy so I wasn't touching it

02:06 <i509vcb> ok

02:09 <i509vcb> Well I knew it would happen eventually that I'd cause kde to hang

02:09 <i509vcb> Didn't expect running a semaphore test to do that

02:10 <alyssa> yeah the process isolation on these things isn't.. great

02:10 <alyssa> when I was doing Mesa on macOS, i got macOS to fail in all sorts of creative ways..

02:11 <i509vcb> I'm guessing this type of hang I can't fix from just trying to ssh in...

02:11 <alyssa> It might be

02:11 <alyssa> I regularly have to ssh in and pkill sway and that's enough to fix

02:11 <alyssa> haven't had to reboot in months, lina's kernel is rock solid <3

02:13 <i509vcb> I think the compositor hung if I can't adjust volume, well I don't know the local IP offhand so I guess I need to reboot anyways

02:15 <i509vcb> Wayland compositor handover would be great for this type of situation...

02:24 <i509vcb> Wondering is there a reason why dmesg gets swamped with "Blocking due to compute queue full"?

02:24 <alyssa> apparently i have that too, that's neat

02:25 <i509vcb> I noticed that spam when I ran dEQP-VK.synchronization.basic.binary_semaphore.chain which failed before a reboot and works after a reboot

02:25 <alyssa> Oof

02:28 <i509vcb> > WorkQueue: Cannot submit, but queue is empty?

02:28 <i509vcb> That seems weird

02:39 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

03:01 VinDuv has quit [Ping timeout: 480 seconds]

03:21 marvin24_ has joined #asahi-gpu

03:29 marvin24 has quit [Ping timeout: 480 seconds]

03:36 <lina> That sounds like a bug...

03:37 <lina> The driver is really good with memory safety stuff but there are still definitely bugs in the queuing logic/etc (plus drm_sched problems) that can cause things to get stuck I think...

03:37 <lina> (And then if a fence never gets signaled that can jam all of KDE)

03:38 <i509vcb> Flipping the bits in agxv to turn on timeline semaphores just seems to hang forever on dEQP-VK.synchronization.basic.timeline_semaphore.one_queue

03:38 <i509vcb> I guess this isn't tested so no idea what could be wrong there

03:38 <lina> Oh it might be that timeline stuff is just completely broken

03:38 <lina> That is totally untested, I just wrote it blind...

03:39 <lina> If you have a branch/testcase I can use to try it I can look at it tomorrow ^^

03:40 <i509vcb> Give me a min and I'll push my current agxv branch so you can look at that

03:43 <i509vcb> https://gitlab.freedesktop.org/i509VCB/mesa/-/commits/agxv/timeline-semaphore

03:44 <i509vcb> dEQP-VK.synchronization.basic.timeline_semaphore.* all hang I believe

03:44 Z750 has quit [Quit: Ping timeout (120 seconds)]

03:45 Z750 has joined #asahi-gpu

03:45 <i509vcb> I also tested with setting sync_types[1] to &device->sync_timeline_type.sync and then putting NULL at [2] but that just fails on an assert in common code

03:45 <i509vcb> It's possible I don't exactly know what to do with the vk_sync stuff

03:47 <alyssa> i509vcb: welcome to how the kernel driver is built

03:47 <i509vcb> however other drivers seem to do fine with null at [2] and [1] being the timeline sync type

03:47 <alyssa> lina writes kernel code without being able to test, i write mesa code and find the kernel is broken, lina debugs kernel with my mesa branch, i fix my mesa branch once the kernel is fixed because whoops i also wrote buggy mesa code

03:47 <alyssa> or sometimes the other way around

03:47 <alyssa> :~)

03:48 <alyssa> the broken branch hot potato

03:48 <alyssa> lina and i are experts :~)

03:49 <i509vcb> The assert I hit has this ominous "We can only have one timeline mode" comment

04:07 cylm_ has joined #asahi-gpu

05:53 Whistler_ has joined #asahi-gpu

06:04 bisko has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

07:34 cylm_ has quit [Ping timeout: 480 seconds]

07:54 cr1901_ has joined #asahi-gpu

07:54 cr1901 has quit [Read error: Connection reset by peer]

08:16 jennifilm has joined #asahi-gpu

08:33 WindowPa- has joined #asahi-gpu

08:33 WindowPain has quit [Read error: Connection reset by peer]

10:10 ourdumbfuture has joined #asahi-gpu

10:17 cylm_ has joined #asahi-gpu

10:33 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

11:08 cylm_ has quit [Ping timeout: 480 seconds]

11:21 zane has joined #asahi-gpu

11:25 jeisom has joined #asahi-gpu

11:30 ourdumbfuture has joined #asahi-gpu

11:45 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

11:52 ourdumbfuture has joined #asahi-gpu

12:29 iyes has joined #asahi-gpu

12:31 ATiltedTree_ has joined #asahi-gpu

12:33 compassion1785 has quit [Quit: lounge quit]

12:33 ATiltedTree has quit [Ping timeout: 480 seconds]

12:34 <iyes> So I made an account on FDO gitlab, but I can't fork asahi/mesa? It says project limit reached. Do I need some special permissions on my account? How am I going to be able to hack on the source code and make MRs if I can't fork?

12:36 ATiltedTree_ is now known as ATiltedTree

12:45 <_jannau__> iyes: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/wikis/home#how-can-i-contribute-to-an-existing-project-or-create-a-new-one

12:46 <iyes> Oh! Okay!

13:03 anarcat[m] has joined #asahi-gpu

13:03 compassion1785 has joined #asahi-gpu

13:05 cylm_ has joined #asahi-gpu

13:59 Dementor has joined #asahi-gpu

14:02 <iyes> Lol after not having worked with C for years, only Rust, reading mesa source code now (studying NIR apis) and seeing all the various pointers to different structs everywhere, is filling me with dread and paranoia. :D Everything feels so fragile! I guess I'll get used to it...

14:03 <alyssa> you will not :3

14:03 <alyssa> c is terrifying

14:07 <rosefromthedead> hahahah

14:07 <rosefromthedead> "you're in for a world of pain :3"

14:10 <iyes> Does mesa have any sort of browse-able API docs generated from the source code itself? something similar to what rustdoc/docs.rs provides for rust projects?

14:12 <iyes> Or am I too spoiled by rust's nice dev tools and having unreasonable expectations here? :D

14:20 <alyssa> a bit

14:20 <alyssa> https://docs.mesa3d.org/nir/index.html

14:21 <alyssa> https://docs.mesa3d.org/vulkan/index.html

14:21 <alyssa> https://docs.mesa3d.org/gallium/index.html

14:22 compassion1785 has quit [Quit: lounge quit]

14:23 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

14:26 <iyes> Yea I looked at those things, but they seem to be incomplete and human written. In Rust projects (and C/C++ projects that use something like Doxygen), it's so nice to be able to learn a codebase by using a web browser to search around and click around to explore all the different functions, structs, etc. :)

14:27 <iyes> I guess no such thing exists here. It's fine. I got LSP with clangd working, and I can navigate around the source code easily from my text editor. I'll try to study things that way.

14:28 compassion1785 has joined #asahi-gpu

14:29 compassion1785 has quit []

14:37 compassion1785 has joined #asahi-gpu

14:37 possiblemeatball has joined #asahi-gpu

14:38 zane has quit [Quit: WeeChat 4.0.3]

14:43 <alyssa> that's doxygen

14:43 <alyssa> iirc

15:35 possiblemeatball has quit [Quit: Quit]

15:48 <iyes> What is an agx_bo ?

15:48 <iyes> Or I guess, a "BO" more generally?

15:49 <i509vcb> buffer object

15:50 <iyes> Oh! *facepalms*

15:51 <iyes> That makes sense, that user-facing apis like UBOs/VBOs/SSBOs/etc would be backed by common code for gpu buffers more generally.

15:54 possiblemeatball has joined #asahi-gpu

16:01 <iyes> How does mesa know what GL extensions to advertise as supported (say, in glxinfo output)? What exactly in the source code of a given driver (say, in our case asahi) results in a given extension being reported as supported/available?

16:01 <alyssa> iyes: based on the PIPE_CAPs in agx_pipe.c

16:02 <alyssa> although I would encourage new developers to focus on Vulkan, there's a lot more low-hanging fruit there for both bugs and features

16:02 <alyssa> the GL driver, I kinda did all the easy stuff and now it's just the hard stuff left :p

16:02 <iyes> Hahah, understandable

16:02 <i509vcb> I think some parts of gallium also check the screen vtable as well

16:02 <i509vcb> for extensions
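
A rough sketch of the idea in the exchange above, not Mesa's actual code: the driver answers capability queries, and the frontend turns each answer into an advertised (or hidden) extension. Every identifier here (`sketch_cap`, `sketch_driver_get_param`, `sketch_rules`, `sketch_ext_advertised`) is hypothetical, and the returned values are arbitrary illustrations rather than what the asahi driver reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PIPE_CAP values; not the real Mesa enum. */
enum sketch_cap {
    SKETCH_CAP_CUBE_MAP_ARRAY,
    SKETCH_CAP_SAMPLE_SHADING,
};

/* Driver side: one answer per capability. Values are arbitrary examples. */
static int sketch_driver_get_param(enum sketch_cap cap)
{
    switch (cap) {
    case SKETCH_CAP_CUBE_MAP_ARRAY: return 0; /* not supported in this sketch */
    case SKETCH_CAP_SAMPLE_SHADING: return 1; /* supported in this sketch */
    default:                        return 0; /* unknown caps default to off */
    }
}

/* Frontend side: a table tying each extension name to the cap it needs. */
struct sketch_ext_rule {
    const char *ext_name;
    enum sketch_cap cap;
};

static const struct sketch_ext_rule sketch_rules[] = {
    { "GL_ARB_texture_cube_map_array", SKETCH_CAP_CUBE_MAP_ARRAY },
    { "GL_ARB_sample_shading",         SKETCH_CAP_SAMPLE_SHADING },
};

/* An extension is advertised only if the driver reports its cap as nonzero. */
static bool sketch_ext_advertised(const char *ext_name)
{
    assert(ext_name != NULL);
    for (size_t i = 0; i < sizeof(sketch_rules) / sizeof(sketch_rules[0]); i++) {
        if (strcmp(sketch_rules[i].ext_name, ext_name) == 0)
            return sketch_driver_get_param(sketch_rules[i].cap) != 0;
    }
    return false; /* extensions without a rule are never advertised here */
}
```

In this toy model, flipping a single return value in `sketch_driver_get_param` is what makes an extension appear or disappear from the advertised list.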

16:02 <iyes> I still want to learn how things work, though

16:03 <iyes> Out of curiosity, alyssa , based on your judgement, how hard are cubemap array textures? Sample rate shading?

16:04 <alyssa> cubemap arrays are easy

16:04 <alyssa> but the GL extension depends on geometry shaders for silly historical reasons

16:04 <alyssa> *GLES

16:04 <alyssa> sample shading is hard but already works, just waiting to be released.

16:05 <iyes> Oh okay that makes sense why cube map arrays are not advertised as supported in GLES/GL

16:05 <iyes> That's unfortunate, it was the reason why Bevy game engine's 3d rendering did not work on Asahi when I tried it.

16:06 <i509vcb> I do know that you can run wgpu in its downlevel gles 3.0 mode currently

16:06 <i509vcb> but the fancy stuff will need vulkan for wgpu to expose it

16:07 <iyes> Yep yep, Bevy works on asahi when using the GLES backend

16:07 <iyes> But Bevy's 3D PBR materials use cube map arrays

16:07 <iyes> So 3D examples errored on the missing extension

16:08 <iyes> But anyway, this is off-topic

16:08 <i509vcb> Back before wgpu-hal and in the gfx-hal days, there was a proper GL 4.6 backend but that never really got ported over

16:09 ourdumbfuture has joined #asahi-gpu

16:19 <i509vcb> lina: assuming the driver supports timeline sync objects the branch I linked last night should be in a testable state

16:24 iyes has quit [Ping timeout: 480 seconds]

16:26 <i509vcb> where does the timeline kernel branch live again?

16:30 nela has quit [Ping timeout: 480 seconds]

16:33 iyes has joined #asahi-gpu

16:34 nela has joined #asahi-gpu

16:41 Guest8982 has quit [Quit: Bridge terminating on SIGTERM]

16:41 rhysmdnz has quit [Quit: Bridge terminating on SIGTERM]

16:44 iyes has quit [Ping timeout: 480 seconds]

16:45 rhysmdnz has joined #asahi-gpu

16:46 Guest9188 has joined #asahi-gpu

16:46 cr1901_ is now known as cr1901

16:47 iyes has joined #asahi-gpu

16:48 <Mary> Hmm agxv_GetPhysicalDeviceImageFormatProperties2 isn't filling maxExtent.depth, what would be an appropriate value to report?

16:49 iyes has quit []

16:51 jeisom has quit [Quit: Leaving]

16:52 <i509vcb> https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#features-extentperimagetype

16:52 <i509vcb> 1d and 2d have a depth of 1

16:52 <i509vcb> 3d not so sure

16:53 crabbedhaloablut has joined #asahi-gpu

16:54 <alyssa> Mary: arrays can have up to 2048 layers

16:54 <Mary> we report 8192 for maxImageDimension3D so I suppose that's what we should be doing here

16:54 <Mary> oh

16:54 <alyssa> 3D can be up to 8192 depth

16:54 <Mary> Will fix that then, thanks
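
A minimal sketch of the limits quoted above (depth 1 for 1D and 2D, up to 2048 array layers, up to 8192 depth for 3D). The enum and struct are hypothetical stand-ins rather than the real Vulkan or agxv types; the only fact added from outside the conversation is that Vulkan requires 3D images to have exactly one array layer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkImageType; not the real Vulkan enum. */
enum sketch_image_type {
    SKETCH_IMAGE_1D,
    SKETCH_IMAGE_2D,
    SKETCH_IMAGE_3D,
};

struct sketch_limits {
    uint32_t max_depth;        /* would feed maxExtent.depth */
    uint32_t max_array_layers; /* would feed maxArrayLayers */
};

/* Limits as stated in the discussion. */
#define SKETCH_MAX_ARRAY_LAYERS 2048u
#define SKETCH_MAX_3D_DEPTH     8192u

static struct sketch_limits sketch_limits_for(enum sketch_image_type type)
{
    struct sketch_limits l;

    assert(type == SKETCH_IMAGE_1D || type == SKETCH_IMAGE_2D ||
           type == SKETCH_IMAGE_3D);

    if (type == SKETCH_IMAGE_3D) {
        l.max_depth = SKETCH_MAX_3D_DEPTH;
        l.max_array_layers = 1; /* 3D images cannot be arrayed in Vulkan */
    } else {
        l.max_depth = 1; /* 1D and 2D images always have depth 1 */
        l.max_array_layers = SKETCH_MAX_ARRAY_LAYERS;
    }
    return l;
}
```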

16:59 <i509vcb> alyssa do you happen to know what kernel branch the timeline stuff is implemented in?

17:01 jeisom has joined #asahi-gpu

17:04 <alyssa> I thought it's in what we ship

17:07 <i509vcb> I was going to try to debug myself but I guess it happens with what's shipped so

17:10 <alyssa> gpu/rust-wip branch anyway

17:10 <alyssa> although that has uapi bumpage

17:10 <alyssa> which i was going to do next week

17:10 <alyssa> rebase agxv i mean

17:13 <i509vcb> Well there are other things I can do in the driver until then

17:15 jeisom has quit [Ping timeout: 480 seconds]

18:12 jeisom has joined #asahi-gpu

18:12 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

18:12 possiblemeatball has quit [Quit: Quit]

18:19 ourdumbfuture has joined #asahi-gpu

18:19 VinDuv has joined #asahi-gpu

19:18 <i509vcb> restarting kde over ssh... I'd think just pkill -9 kwin_wayland would do it, but now I'm on a black screen

19:19 <i509vcb> What's meant to be the intended way to do this?

19:49 <alyssa> i509vcb: I use multi-user.target instead of graphical.target and start (sway in my case most of the time) manually from the tty on boot, so then the pkill returns the tty

19:49 <alyssa> not sure what the Right way is

20:10 crabbedhaloablut has quit []

20:11 <i509vcb> Not sure where to look to handle nir_intrinsic_load_num_workgroups in agx_emit_intrinsic

20:14 <i509vcb> Hmm nvm I see something in asahi gallium

20:17 <alyssa> i509vcb: the nontriviality is CmdDispatchIndirect

20:17 <alyssa> you don't (in general) know the number of workgroups on the cpu

20:20 <i509vcb> in gallium you seem to load the number of workgroups in a sideband storage?

20:21 <i509vcb> although that might not be what's good for vulkan

20:22 <alyssa> the way I handle this in gallium is..

20:22 <alyssa> 1. For direct dispatch, I upload the workgroup count to GPU memory at dispatch time.

20:23 <alyssa> 2. Now regardless of direct or indirect, we have a GPU address to the workgroup count in the common format

20:23 <alyssa> 3. That GPU address is mapped to uniform registers for any shader that reads num_workgroups

20:23 <alyssa> 4. num_workgroups turns into a read from those uniform registers

20:24 <i509vcb> indirect you are told how many workgroups you have?

20:25 <alyssa> no, indirect the app passes the count in gpu memory

20:26 <alyssa> (it could be written by some other compute shader, you can't read it from the cpu)
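
The four steps above can be sketched with a toy model in which "GPU memory" is a plain byte array and addresses are offsets into it. This is only an illustration of the described scheme, not the gallium driver's code; all names (`gpu_mem`, `upload`, `dispatch_direct`, `dispatch_indirect`, `shader_load_num_workgroups`) are hypothetical, and the uniform-register mapping of step 3 is reduced to storing the address in the dispatch record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "GPU memory": an address is a byte offset into this array. */
static uint8_t gpu_mem[256];
static uint32_t gpu_alloc_cursor;

/* Three 32-bit counts, the same shape as an indirect dispatch record. */
struct grid {
    uint32_t x, y, z;
};

/* Bump-allocate space in the toy GPU memory and copy data into it. */
static uint32_t upload(const void *data, uint32_t size)
{
    assert(gpu_alloc_cursor + size <= sizeof(gpu_mem));
    uint32_t addr = gpu_alloc_cursor;
    memcpy(&gpu_mem[addr], data, size);
    gpu_alloc_cursor += size;
    return addr;
}

/* What a shader needing num_workgroups is handed: just an address. */
struct dispatch {
    uint32_t grid_addr;
};

/* Step 1: a direct dispatch uploads the CPU-known count at dispatch time. */
static struct dispatch dispatch_direct(uint32_t x, uint32_t y, uint32_t z)
{
    struct grid g = { x, y, z };
    return (struct dispatch){ .grid_addr = upload(&g, sizeof(g)) };
}

/* Indirect: the count already lives in GPU memory; the CPU never reads it. */
static struct dispatch dispatch_indirect(uint32_t app_grid_addr)
{
    return (struct dispatch){ .grid_addr = app_grid_addr };
}

/* Steps 2 to 4: either way the shader reads the count through the address. */
static struct grid shader_load_num_workgroups(const struct dispatch *d)
{
    struct grid g;
    memcpy(&g, &gpu_mem[d->grid_addr], sizeof(g));
    return g;
}
```

The point of the design is that the shader-side read is identical for both paths; only the source of the address differs.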

20:28 <alyssa> .

20:41 possiblemeatball has joined #asahi-gpu

20:51 <jannau> lina: did you find an alternative solution for display/gpu testing on the m2 ultra? I finally got dcpext0/dp-altmode to work on t6020

20:54 <i509vcb> there was an agx opcode reference somewhere if I recall?

20:54 <jannau> anyway, I'll prepare a branch for testing on the mac pro (and the m2 max macbook pro)

20:54 <alyssa> i509vcb: https://dougallj.github.io/applegpu/docs.html

21:16 <i509vcb> I learned of nir_lower_compute_system_values which seems to handle nir_intrinsic_load_num_workgroups

21:17 <i509vcb> is there a reason the gallium driver has its own bespoke thing there with AGX_SYSVAL_TABLE_GRID?

21:17 <i509vcb> (v3dv and a few other drivers use lower_compute_system_values)

21:23 karolherbst has quit [Remote host closed the connection]

21:24 karolherbst has joined #asahi-gpu

21:27 <alyssa> i509vcb: because that doesn't actually handle it

21:27 <alyssa> it just relies on the driver passing in a constant count

21:28 <alyssa> which again you don't know for indirect dispatch

21:29 <i509vcb> ok so agxv can't use that

21:29 <alyssa> that's an optimization for mesh shaders

21:29 <alyssa> not compute

21:29 <alyssa> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334

21:29 alyssa has quit [Quit: alyssa]

22:08 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

22:10 ourdumbfuture has joined #asahi-gpu

22:25 ourdumbfuture has quit [Quit: My Mac has gone to sleep. ZZZzzz…]

22:27 ourdumbfuture has joined #asahi-gpu

22:30 manawyrm has quit [Quit: Read error: 2.99792458 x 10^8 meters/second (Excessive speed of light)]

22:31 manawyrm has joined #asahi-gpu

23:12 cylm_ has quit [Ping timeout: 480 seconds]

23:22 <jannau> lina, marcan: https://github.com/AsahiLinux/m1n1/pull/320 required m1n1 changes, can be merged immediately

23:24 <jannau> kernel branch: https://github.com/jannau/linux/tree/asahi-6.4-7pre-dcpext_dp_t602x - only works on t602x with 13.5 fw, tested on j474s

23:25 <jannau> only atc1 works; on all machines that should be the port next to the usb-c port with serial

23:28 <jannau> requires 'pd_ignore_unused' (haven't checked why yet), there seems to be some kind of memory hazard in afk.c or dptx.c. works for me only if compiled with clang

23:29 <jannau> This is a problem on fedora since clang and bindgen use libllvm.so from llvm-15/16 which breaks build with asahi

23:32 <jannau> other than that it seems to work ok. comes up reliably at boot and after display standby. only lightly tested

23:44 alyssa has joined #asahi-gpu

23:45 <alyssa> since rebasing I'm getting weird Chromium issues

23:45 <alyssa> sometimes it fails to start properly and spams stderr with messages like

23:45 <alyssa> Errors:

23:45 <alyssa> link failed but did not provide an info log

23:45 <alyssa> [4983:4983:0817/194416.716259:ERROR:shared_context_state.cc(81)] Skia shader compilation error

23:45 <alyssa> I can't tell if this is a Mesa bug or a Chromium one

23:45 <alyssa> MESA_SHADER_CACHE_DISABLE=1 'seems' to work around it, which points at a Mesa bug but it's hard to say

23:46 <alyssa> The on-screen symptom is that Chromium rendering is totally broken

23:46 <jannau> alyssa: Chromium has its own shader cache

23:46 <jannau> ~/.config/chromium/Default/GPUCache

23:48 anarcat[m] has left #asahi-gpu [#asahi-gpu]

23:48 <jannau> see https://github.com/AsahiLinux/linux/issues/72 there is an upstream bug but I think we're partly to blame as well since the mesa version doesn't change

23:56 <alyssa> jannau: oh, good

23:56 <alyssa> here I was worried this was a regression I introduced ;~P

23:56 <alyssa> thanks :)

23:58 <alyssa> OpenGL core profile version string: 3.1 Mesa 23.3.0-devel (git-2866d6991e)

23:58 <alyssa> this has the commit hash

23:58 <alyssa> IDK why they're checking only the renderer