#dri-devel on 2023-12-13 — irc logs at oftc.irclog.whitequark.org

2022-12-21 00:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:06 gegoxaren[m] has joined #dri-devel

00:08 mszyprow has quit [Ping timeout: 480 seconds]

00:15 karolherbst has quit [Quit: Konversation terminated!]

00:15 karolherbst has joined #dri-devel

00:21 iive has quit [Quit: They came for me...]

00:37 pcercuei has quit [Quit: dodo]

00:37 Net147 has quit [Read error: Connection reset by peer]

00:37 crabbedhaloablut has quit []

00:37 Net147 has joined #dri-devel

00:42 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

00:42 TMM has joined #dri-devel

00:56 soreau has quit [Ping timeout: 480 seconds]

01:04 soreau has joined #dri-devel

01:07 flynnjiang has joined #dri-devel

01:10 flynnjiang has quit []

01:14 OftenTimeConsuming has quit [Remote host closed the connection]

01:15 OftenTimeConsuming has joined #dri-devel

01:16 luc has quit [Quit: Page closed]

01:17 luc has joined #dri-devel

01:27 Leopold_ has quit [Remote host closed the connection]

01:28 Leopold has joined #dri-devel

01:28 co1umbarius has joined #dri-devel

01:30 columbarius has quit [Ping timeout: 480 seconds]

01:50 oneforall2 has quit [Remote host closed the connection]

01:50 oneforall2 has joined #dri-devel

01:52 yyds has joined #dri-devel

02:00 DragoonAethis has quit [Quit: hej-hej!]

02:00 DragoonAethis has joined #dri-devel

02:22 yuq825 has joined #dri-devel

02:27 mripard_ has joined #dri-devel

02:29 Namarrgon has joined #dri-devel

02:34 mripard has quit [Ping timeout: 480 seconds]

02:34 angerctl has quit [Ping timeout: 480 seconds]

02:52 yyds has quit []

02:52 yyds has joined #dri-devel

02:55 <luc> cwabbott: I noticed that glibc started using simd/fp registers in https://sourceware.org/git/?p=glibc.git;a=commit;h=e6f3fe362f1aab78b1448d69ecdbd9e3872636d3, but my test is on the older glibc version (2.31), which does NOT contain instructions like `ldr q0, [src]`. So I guess __memcpy_aarch64_simd() might not be faster than normal LD/ST if the destination of the memcpy is vram

03:03 luc has quit [Quit: Page closed]

03:09 <HdkR> ldq q0 is asimd

03:10 <HdkR> Oh, does not contain those

03:41 luben has joined #dri-devel

03:49 neniagh has quit [Ping timeout: 480 seconds]

03:57 heat has quit [Ping timeout: 480 seconds]

03:58 neniagh has joined #dri-devel

04:04 YuGiOhJCJ has quit [Ping timeout: 480 seconds]

04:07 lanodan has quit [Ping timeout: 480 seconds]

04:07 bbrezillon has quit [Ping timeout: 480 seconds]

04:07 aissen has quit [Ping timeout: 480 seconds]

04:15 aissen has joined #dri-devel

04:15 bbrezillon has joined #dri-devel

04:22 bmodem has joined #dri-devel

04:51 kts has joined #dri-devel

04:56 luben has quit [Ping timeout: 480 seconds]

05:22 Haaninjo has joined #dri-devel

05:23 Haaninjo has quit []

05:46 kts has quit [Quit: Leaving]

06:00 kts has joined #dri-devel

06:01 psykose has quit [Remote host closed the connection]

06:01 ptrc has quit [Remote host closed the connection]

06:02 ptrc has joined #dri-devel

06:09 psykose has joined #dri-devel

06:11 itoral has joined #dri-devel

06:13 sadlerap3 has quit [Remote host closed the connection]

06:13 sadlerap3 has joined #dri-devel

06:38 kts has quit [Ping timeout: 480 seconds]

06:44 urja has quit [Ping timeout: 480 seconds]

06:59 mripard_ has quit []

06:59 mripard has joined #dri-devel

07:09 mszyprow has joined #dri-devel

07:14 kzd has quit [Ping timeout: 480 seconds]

07:29 jsa has joined #dri-devel

07:54 sima has joined #dri-devel

08:00 sghuge has quit [Remote host closed the connection]

08:00 sghuge has joined #dri-devel

08:00 fab has joined #dri-devel

08:00 mvlad has joined #dri-devel

08:06 macslayer has quit [Ping timeout: 480 seconds]

08:11 tursulin has joined #dri-devel

08:13 <MrCooper> karolherbst: BTW, any reason rusticl on radeonsi couldn't be tested in CI?

08:24 oneforall2 has quit [Ping timeout: 480 seconds]

08:26 oneforall2 has joined #dri-devel

08:29 Leopold has quit [Remote host closed the connection]

08:30 Leopold has joined #dri-devel

08:34 hansg has joined #dri-devel

08:37 Leopold has quit [Remote host closed the connection]

08:37 Leopold has joined #dri-devel

08:41 rgallaispou has joined #dri-devel

08:41 lynxeye has joined #dri-devel

08:42 thaytan has quit [Ping timeout: 480 seconds]

08:47 <airlied> MrCooper: getting CL cts into a useful form, though we could just pick a couple of main tests

08:48 <MrCooper> I was thinking of piglit

08:48 <MrCooper> that should have caught this regression and a few before at least

08:49 vliaskov has joined #dri-devel

08:50 simon-perretta-img has quit [Ping timeout: 480 seconds]

08:51 simon-perretta-img has joined #dri-devel

09:12 crabbedhaloablut has joined #dri-devel

09:16 i509vcb has quit [Quit: Connection closed for inactivity]

09:20 <pq> jani, what if you forward-declare an enum, use it in struct definition, and then include the definition of the enum which results in a different size? Or maybe you just copy the struct without ever having the enum definition?

09:21 pcercuei has joined #dri-devel

09:23 <emersion> pq: it appears the compiler remembers the enum has unknown size, and will require a size definition before you can do these things

09:24 <jani> pq: it's an incomplete type similar to a struct/union forward declaration, and can't use it before you know the size

09:24 <jani> https://gcc.gnu.org/onlinedocs/gcc/Incomplete-Enums.html

09:26 <pq> cool, thanks!

09:27 apinheiro has joined #dri-devel

09:29 simon-perretta-img has quit [Ping timeout: 480 seconds]

09:31 Leopold has quit [Remote host closed the connection]

09:31 <jani> I know it's a bit hacky, but if you can use it to avoid pulling in some headers everywhere, I'll use it

09:32 <emersion> it's non-standard, so i won't use it

09:33 simon-perretta-img has joined #dri-devel

09:33 <jani> fair, though the kernel is explicitly non-standard

09:34 <jani> I'd also avoid it outside of the kernel

09:34 <emersion> yeah

09:42 heat has joined #dri-devel

09:44 Nefsen402 has quit [Remote host closed the connection]

09:44 bl4ckb0ne has quit [Remote host closed the connection]

09:44 emersion has quit [Remote host closed the connection]

09:44 bl4ckb0ne has joined #dri-devel

09:44 Nefsen402 has joined #dri-devel

09:44 emersion has joined #dri-devel

09:58 sukrutb has quit [Ping timeout: 480 seconds]

10:10 pjakobsson has quit [Remote host closed the connection]

10:12 cmichael has joined #dri-devel

10:15 drobson has joined #dri-devel

10:17 biju has joined #dri-devel

10:19 vliaskov has quit [Read error: Connection reset by peer]

10:19 janek has joined #dri-devel

10:22 pa- has quit []

10:23 pa has joined #dri-devel

10:23 drobson has quit [Ping timeout: 480 seconds]

10:29 Leopold has joined #dri-devel

10:33 DodoGTA has quit [Quit: DodoGTA]

10:35 DodoGTA has joined #dri-devel

10:38 kts has joined #dri-devel

10:39 kts has quit []

10:45 DodoGTA has quit [Quit: DodoGTA]

10:48 DodoGTA has joined #dri-devel

10:50 sgm has quit [Remote host closed the connection]

10:50 sgm has joined #dri-devel

10:56 kts has joined #dri-devel

10:59 kts has quit []

11:00 rasterman has joined #dri-devel

11:07 columbarius has joined #dri-devel

11:07 co1umbarius has quit [Ping timeout: 480 seconds]

11:09 kts has joined #dri-devel

11:25 <tomba> Looking at the atomic helpers, the current sequence when enabling the video pipeline is: crtc enable, bridge pre_enable, encoder enable, bridge enable. Crtc's enable happening before the bridge's pre_enable strikes me as a bit odd, especially as bridge's pre_enable documentation says "The display pipe (i.e. clocks and timing signals) feeding this bridge will not yet be running when this callback is called". Anyone have insight on why the sequence has

11:25 <tomba> evolved to be such? Does the DRM framework expect that there's always an encoder which will somehow gate the signal from CRTC, until the encoder enable is called?

11:25 kts has quit [Ping timeout: 480 seconds]

11:47 <karolherbst> MrCooper: not really

11:47 <karolherbst> piglit is better than nothing I guess, but I also kinda want the CL CTS to be tested, it's just a pain to do

11:48 <mupuf> karolherbst: what's different about cl vs gl/gles/vk?

11:49 <karolherbst> mupuf: it's a bunch of binaries with no consistent way for fetching subtests, see my own runner dealing with that nonsense: https://gitlab.freedesktop.org/karolherbst/opencl_cts_runner/-/blob/master/clctsrunner.py

11:49 <mupuf> thanks!

11:49 <karolherbst> `def create(cls, id, file):` specifically

11:50 <karolherbst> so I have a bunch of different regexes to parse the help thing.. and one test has a special option for it

11:50 <karolherbst> it's all messy

11:50 <mupuf> wow

11:51 <karolherbst> I think it would be easier to fix the CTS to be more consistent here instead 🙃 probably

11:51 <karolherbst> yeah.. the image ones are really crazy as they have flags you can pass in

11:51 <karolherbst> like the image format + order and such as well

11:51 <karolherbst> it's nice for testing, but a pain for creating such a list

11:52 <mupuf> https://www.khronos.org/conformance/adopters/conformant-products/opencl <-- it is surprising to still see submissions here

11:52 <karolherbst> _maybe_ it would make sense to write a binary/lib translating from deqp style naming to CL CTS
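A rough sketch of the adapter idea in the message above: a table maps dEQP-style dotted case names to a CL CTS binary plus a subtest argument. The `cl` prefix, the binary names and the subtest names are illustrative assumptions, not the real CTS layout. Building that table from binaries that each list their subtests differently is the hard part, and this sketch skips it.

```python
# Hypothetical mapping between dEQP-style case names and CL CTS invocations.
# The prefix, binary names and subtest names are made up for illustration.
from typing import NamedTuple


class Invocation(NamedTuple):
    binary: str
    args: tuple


# Assumed layout: one entry per CTS binary, listing the subtests it accepts.
SUBTESTS = {
    "test_basic": ("hostptr", "fpmath"),
    "test_api": ("get_platform_info",),
}


def case_names(prefix="cl"):
    """Enumerate every known subtest as a dotted, dEQP-style case name."""
    names = []
    for binary, subtests in sorted(SUBTESTS.items()):
        group = binary[len("test_"):]
        names.extend(f"{prefix}.{group}.{sub}" for sub in subtests)
    return names


def to_invocation(case, prefix="cl"):
    """Translate one dotted case name back to a binary and its arguments."""
    head, group, sub = case.split(".", 2)
    binary = f"test_{group}"
    if head != prefix or sub not in SUBTESTS.get(binary, ()):
        raise KeyError(f"unknown case: {case}")
    return Invocation(binary, (sub,))
```

A runner that already understands dEQP case lists could then consume `case_names()` and spawn whatever `to_invocation()` returns for each case.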

11:52 <karolherbst> mupuf: why surprising though? :D

11:52 <mupuf> I thought no one cared

11:52 <karolherbst> Intel is pretty big in CL

11:53 <karolherbst> and arm as well

11:53 <karolherbst> nah

11:53 <karolherbst> that was like 5-10 years ago

11:53 <karolherbst> today they care more

11:53 <karolherbst> the CL WG is pretty active even

11:53 <karolherbst> currently people work on making it more vulkan like by adding command buffers and stuff

11:53 <mupuf> so many conformance results, so few CL apps

11:53 <karolherbst> cl_khr_command_buffer

11:54 <mupuf> I see, good to hear

11:54 <karolherbst> mupuf: davinci resolve is one CL app :D

11:54 <karolherbst> it's a pro video editing tool

11:54 <mupuf> right

11:54 <karolherbst> but yeah.. it's more used in professional apps than like linux desktop ones

11:54 <karolherbst> photoshop also uses CL for some stuff?

11:54 <mupuf> and I guess some AI stuff may be using CL

11:55 <karolherbst> otherwise in the foss world you have darktable being able to use CL

11:55 <karolherbst> ahh yeah

11:55 <karolherbst> openvino uses CL on intel

11:55 <karolherbst> it's a framework doing ONNX base AI/ML stuff

11:55 <karolherbst> and can be used to some degree with tensorflow/pytorch/etc...

11:56 <karolherbst> mupuf: the sad part is simply that besides CUDA we only have CL as a cross vendor API which doesn't suck

11:56 <karolherbst> though that _might_ change with the CXL stuff

11:56 <mupuf> SYCL was also supposed to change that

11:56 <karolherbst> uhm..

11:56 <karolherbst> ha

11:56 <karolherbst> no

11:57 <karolherbst> SYCL is C++ only for starters and SYCL is a _compile time_ API, there is no runtime specified

11:57 <karolherbst> sooooo

11:57 <mupuf> :o

11:57 <karolherbst> if your toolchain only supports AMD GPUs, your app only supports AMD GPUs

11:57 <karolherbst> luckily the runtime intel worked on layers on top of CL

11:57 <karolherbst> which brings us back to CL anyway

11:57 <karolherbst> *toolchain

11:58 <karolherbst> so yeah....

11:58 <karolherbst> mupuf: that's kinda what's going on atm: https://www.linuxfoundation.org/press/announcing-unified-acceleration-foundation-uxl

11:58 <karolherbst> but that just formed

11:59 <karolherbst> and it's unknown what comes out of it

11:59 <mupuf> Ah, UXL, not CXL

11:59 <mupuf> that was confusing ;)

11:59 <karolherbst> ahh yeah.. my bad

11:59 <mupuf> np

12:00 <mupuf> too many three letter acronyms

12:00 <mupuf> Thanks Karol, keep up the good work!

12:00 <karolherbst> :) thanks

12:01 <karolherbst> but yeah...

12:01 <karolherbst> I kinda want to test the CL CTS in CI

12:03 <karolherbst> actually I kinda like my deqp adapter idea....

12:05 <mupuf> that would be an easy path forward, yeah

12:06 <karolherbst> just also slow if it simply would `execv` the binaries...

12:06 <mupuf> deqp is extremely slow at test enumeration

12:06 <mupuf> so don't worry

12:06 <karolherbst> I wonder if one could `dlopen` them and just call into their `main` function instead...

12:07 <mupuf> how many tests are there, and how long is a typical runtime?

12:07 <karolherbst> all tests in non wimpy mode are like between 3 and 70 hours apparently

12:07 <karolherbst> 70 as that's what jenatali needed for a full CL CTS run

12:08 <karolherbst> wimpy reduces amount of iteration in arithmetic tests

12:08 <karolherbst> if I pass `wimpy` and `quick` into my runner it's like 10 minutes parallelized

12:08 <karolherbst> but it skips a bunch of corner case tests

12:08 <karolherbst> subtests without taking image formats/order into account is roughly 2500

12:08 <mupuf> I see

12:09 <karolherbst> but you could split those up if you wanted to

12:09 <karolherbst> probably around 5000 then

12:09 <mupuf> sure, but let's just say that it is a little silly for it to take that long

12:09 <mupuf> just like igt used to take a month to run

12:09 <karolherbst> it's testing a lot of stuff

12:09 <karolherbst> like

12:09 <karolherbst> arithmetic precision

12:10 <karolherbst> and it iterates over a bunch of random values just to make sure the runtime is okay there

12:10 <mupuf> right, but knowing what to test is an important thing too

12:10 <karolherbst> yeah

12:10 <mupuf> I'm sure the wimpy mode is a good start anyway

12:10 <karolherbst> yeah

12:10 <karolherbst> it's good enough

12:11 <karolherbst> it doesn't catch all subnormal/nan related corner cases, but whatever

12:11 <mupuf> 10 minutes means it could run on one runner

12:11 <karolherbst> that's 10 minutes on a 20 core machine

12:11 <mupuf> is it cpu-limited?

12:11 <karolherbst> yes

12:11 <karolherbst> compiling a lot of C code

12:11 <karolherbst> well some tests

12:11 <karolherbst> some tests are GPU limited

12:11 <mupuf> we have 16 cores runners

12:11 <karolherbst> I could do a 4 core run and see how that changes things

12:12 <mupuf> (5950X, for navi21/31)

12:12 <mupuf> 4 cores run? It would only make sense for lavapipe

12:12 camus has quit [Remote host closed the connection]

12:12 <mupuf> For bare metal runners, our slowest runners are the steam decks

12:13 <karolherbst> yeah... I just want to see how slong things become if you limit on the CPU side aggressively

12:13 <karolherbst> *slow

12:13 <karolherbst> but it's kinda wild.. some tests utilize the GPU at 100%, others at like 0.1%

12:14 <karolherbst> and most isn't even runtime, it's just validation on the CTS side

12:14 <karolherbst> as you know... it calculates the same thing also on the CPU for checking the result

12:21 <karolherbst> at least on intel CPU util idles around 8% while running the CTS :')

12:21 <karolherbst> (with 4 cores)

12:22 <karolherbst> now it's 0% :')

12:23 kts has joined #dri-devel

12:23 yyds has quit [Remote host closed the connection]

12:23 <mupuf> ha ha, you are latency bound :D Everything just keeps waiting on each other

12:24 <karolherbst> mhhh

12:24 <karolherbst> doubt

12:24 <karolherbst> my cores are still at 400% in userspace

12:24 <karolherbst> ehh

12:24 <karolherbst> each one at 100%

12:25 <karolherbst> I mean.. the CL CTS for 90% of its time runs the same code on the CPU and checks if the result is correct, so if it wouldn't be CPU bound it would be kinda sad

12:25 <pq> I suspect a simple misread idle vs. usage % :-)

12:25 <karolherbst> but it kinda depends on the tests, some are more GPU bound, for non optimal code reasons

12:26 <karolherbst> pq: no seriously.. some of the tests are just doing 100% math

12:27 <karolherbst> if you validate multiple 10 thousands results from e.g. `sin` that's what you get

12:27 <karolherbst> (and all the other builtins)
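To make the "same code on the CPU" point concrete, here is a minimal sketch of validating device results against a CPU reference with a ULP tolerance. The function names and the tolerance are assumptions for illustration and do not come from the CL CTS. It also measures ULPs in double precision via `math.ulp`, whereas a test of 32-bit float builtins would measure them in float32.

```python
import math


def ulp_error(result, reference):
    """Distance between result and reference in units of the reference's ULP."""
    if math.isnan(reference) or math.isnan(result):
        # NaN only matches NaN; anything else is an unbounded error.
        return 0.0 if math.isnan(reference) and math.isnan(result) else math.inf
    if math.isinf(reference) or math.isinf(result):
        # Infinities must match exactly, including sign.
        return 0.0 if result == reference else math.inf
    return abs(result - reference) / math.ulp(reference)


def validate(device_results, inputs, reference_fn, max_ulp):
    """Recompute every input on the CPU and return the failing cases."""
    failures = []
    for x, got in zip(inputs, device_results):
        want = reference_fn(x)
        if ulp_error(got, want) > max_ulp:
            failures.append((x, got, want))
    return failures
```

Every element costs one reference evaluation on the host, so with tens of thousands of inputs per builtin the CPU side dominates, which matches the load pattern described above.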

12:27 <pq> karolherbst, I think mupuf read "usage" when you said "idle".

12:27 <karolherbst> ahh I see

12:28 <mupuf> pq: indeed

12:28 <karolherbst> ehh wait

12:28 <karolherbst> I wrote CPU

12:28 <karolherbst> I meant GPU

12:28 <karolherbst> duh

12:28 <pq> lol

12:28 <mupuf> makes more sense :p

12:28 <karolherbst> my fault 🙃

12:28 <karolherbst> mhh but yeah.. limiting to 4 cores kinda doubles the CL CTS runtime on my intel GPU

12:29 <karolherbst> could be worse

12:29 bmodem has quit [Ping timeout: 480 seconds]

12:30 <karolherbst> but we also have a wimpy factor option in some tests, which just adjusts how many iterations are done

12:31 <karolherbst> anyway.. maybe I'll play around with the deqp idea and see how bad it would be

12:33 <mupuf> karolherbst: sounds like a good idea. deqp support is found in a lot of tools, so if you can keep compatibility to it, it would be easiest

12:33 <karolherbst> yeah..

12:34 <mupuf> maybe it could even land in clcts, but that's not a requirement

12:35 <karolherbst> I wonder how hard it would be to convert all those testing to deqp actually, maybe I should bring it up at the CL WG next year as well.. but that kinda requires everybody else wanting it also :D

12:35 <karolherbst> but migrating the entire code base is probably quite a bit of work

12:36 bmodem has joined #dri-devel

12:39 <tomeu> haven't been following, but if somebody is going to make any big changes to the CTSes, it would be great if caching of golden results was taken into account

12:39 <tomeu> once I implemented that in my test suite, I started finding concurrency bugs in the kernel driver...

12:42 itoral has quit [Remote host closed the connection]

12:42 <mupuf> tomeu: I guess for images, it can make sense... but for precision/arithmetics tests, I doubt caching would improve performance :s

12:43 <tomeu> well, anything that computes something expensive in the CPU that is used to compare it with the driver's output

12:44 <tomeu> guess that is the case if it's CPU bound
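A minimal sketch of the golden-result caching idea, assuming the CPU reference can be serialized as JSON and is fully determined by the test name and its parameters. The function name, the on-disk layout and the hashing scheme are illustrative choices, not taken from any existing test suite.

```python
import hashlib
import json
from pathlib import Path


def cached_reference(cache_dir, test_name, params, compute):
    """Return the CPU reference for (test_name, params), computing it at most once.

    `compute` is called with `params` only on a cache miss; its result must be
    JSON-serializable so it can be stored on disk between runs.
    """
    # Stable key: the same name and parameters always hash to the same file.
    key = json.dumps([test_name, params], sort_keys=True).encode()
    path = Path(cache_dir) / (hashlib.sha256(key).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text())
    result = compute(params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

This only pays off when the reference is expensive relative to reading a file, and it is only valid for tests whose inputs are deterministic. Tests that draw fresh random inputs on every run would need a fixed seed to be part of `params`.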

12:54 <mupuf> it doesn't have to be expensive, it can be that there are just too many tests. Imagine a test that checks that the gpu can increment a variable... for every format and for every acceptable value in the format

12:54 <mupuf> that seems to be what clcts is doing in some cases... not much to cache there

12:55 <mupuf> but... maybe the issue is that the tests are stupid :D

12:55 <mupuf> and instead a lot of operations could be tested at the same time, and only decomposed if the final result doesn't match expectations

12:56 <mupuf> but that requires serious work

12:56 <mupuf> glad you are taking cpu time seriously in the design for the test suite!

12:59 <jenatali> Honestly the CL CTS does at least test things the right way, multithreaded and batching work together. It just does an insane amount of work

13:04 yyds has joined #dri-devel

13:09 <mupuf> jenatali: good to hear too!

13:19 bmodem has quit [Remote host closed the connection]

13:19 bmodem has joined #dri-devel

13:25 kts has quit [Ping timeout: 480 seconds]

13:47 kts has joined #dri-devel

13:47 <Venemo> Lynne: ping, I am trying to reproduce the multiplanar issue, can you give me a hand, please? it seems your command produces a video file with 1 frame, and I am unsure how best to view it

13:48 bmodem has quit [Ping timeout: 480 seconds]

13:48 <Venemo> Lynne: gnome's video app only shows a black screen, and vlc only shows the frame for a very brief time

13:48 <Venemo> Lynne: however, the output seems to be broken with or without enabling the transfer queue, so I am convinced the issue is not in the transfer queue implementation

13:52 fab has quit [Quit: fab]

13:59 yuq825 has left #dri-devel [#dri-devel]

13:59 urja has joined #dri-devel

14:06 hansg has quit [Ping timeout: 480 seconds]

14:06 <Lynne> Venemo: sure

14:06 janek has quit [Remote host closed the connection]

14:07 <Lynne> try "ffmpeg -init_hw_device vulkan -i test.png -vf format=yuv422p10le,hwupload,hwdownload,format=yuv422p10le,format=rgba -c:v png -y test_output.png"

14:07 konstantin_ has joined #dri-devel

14:08 <Venemo> Lynne: looks good

14:08 <Lynne> compared to the input image?

14:08 urja has quit [Quit: WeeChat 4.0.2]

14:09 <Venemo> they look the same to me

14:10 <Lynne> let me test again, maybe it fixed itself

14:10 <Lynne> which patches do I apply aside from the multiplane one?

14:10 <Venemo> I tested this branch here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25594

14:11 konstantin has quit [Ping timeout: 480 seconds]

14:11 <Venemo> the other branch does not expose the transfer queue yet

14:11 <karolherbst> mupuf: yeah so on radeonsi I kinda end at 22 minutes

14:11 sgm has quit [Remote host closed the connection]

14:12 sgm has joined #dri-devel

14:14 <Lynne> segfault in radv_GetPhysicalDeviceQueueFamilyProperties2?

14:15 <Lynne> I thought we fixed that bug

14:17 <Lynne> Venemo: ah, the sparse binding queue was added, so the count's off again, https://0x0.st/H3RO.diff fixes it

14:18 <Lynne> I am seeing corruption here on 10bits: https://0x0.st/H3RV.png

14:19 <Venemo> okay, give me a moment

14:19 <mupuf> karolherbst: on what host cpu?

14:19 <karolherbst> intel i7-12700

14:19 <mupuf> ack, with 4 threads, right?

14:19 <karolherbst> yes

14:19 <karolherbst> and it has the cooling to run at max clock all the time :)

14:20 <karolherbst> I _could_ check how long it takes on my steamdeck, but then I have to figure out how to do that first

14:21 <mupuf> yeah, not important

14:22 <mupuf> I think expecting about 45-60 would be somewhat accurate

14:26 <jenatali> FL4SHK: you're not authed with NickServ so IRC folks aren't seeing your messages. This is the right place though

14:26 FL4SHK has joined #dri-devel

14:27 <pq> emersion, I'd like to wash my hands off that heap naming discussion now, there is nothing useful I could say.

14:28 <emersion> ahah

14:28 <emersion> well thanks for your replies, i think they've been useful

14:29 <pq> glad you think so, I just feel butting in somewhere that's not my business :-p

14:30 <FL4SHK[m]> hello

14:30 <pq> FL4SHK[m], hello, we see you now.

14:30 <FL4SHK[m]> cool

14:31 <FL4SHK[m]> I was wondering, how difficult would it be to develop either a GCC version of LLVMPipe for a custom GPU, (given that I know how to write a GCC backend, which I do) or to develop a new Mesa driver in general?

14:32 <FL4SHK[m]> I am a hardware/software developer and I plan on creating a new FPGA-based workstation.

14:32 <FL4SHK[m]> I would be happy with something like 500 MHz for the system (doable with my Xilinx ZCU104)

14:32 urja has joined #dri-devel

14:33 <FL4SHK[m]> This would be a passion project but I'd be happy to open source everything

14:34 mszyprow has quit [Ping timeout: 480 seconds]

14:35 <FL4SHK[m]> I typically do that for anything public I make anyway

14:36 <FL4SHK[m]> at least for stuff made outside of work :)

14:41 <karolherbst> FL4SHK[m]: gcc doesn't support those kind of use cases

14:42 <karolherbst> or uhm..

14:42 <karolherbst> mhh

14:42 <karolherbst> maybe with libgccjit.so it actually does...

14:42 <karolherbst> though not sure what input it expects

14:43 <FL4SHK[m]> I see

14:44 <karolherbst> but it looks very very C centric

14:44 <karolherbst> I think it would be a cool research project in terms of "how good is the gcc jit"

14:44 <karolherbst> but not sure we'd have any plans of merging it unless it has strong benefits over llvmpipe

14:44 <FL4SHK[m]> I see

14:45 <FL4SHK[m]> perhaps I could write an LLVM backend for the GPU then

14:45 <karolherbst> at least having a more stable API/ABI would be a benefit

14:45 <FL4SHK[m]> that would be nice yes

14:45 urja has quit [Read error: Connection reset by peer]

14:45 <karolherbst> but in general we kinda prefer having the GPU's backend compiler all inside mesa

14:46 <karolherbst> as doing GPU backends in gcc and llvm kinda... well.. have their disadvantages and we move away from that

14:46 urja has joined #dri-devel

14:46 <FL4SHK[m]> okay, right

14:47 Haaninjo has joined #dri-devel

14:47 <FL4SHK[m]> well, I was actually hoping to make my GPU have really long vectors instead of making a conventional GPU

14:47 <karolherbst> please don't

14:47 <FL4SHK[m]> otherwise it'd be like the CPU core

14:47 <karolherbst> or rather

14:47 <karolherbst> not inside the ISA

14:47 <FL4SHK[m]> hm?

14:48 <karolherbst> vector ISAs have too many drawbacks and everybody moves away from them

14:48 <FL4SHK[m]> why?

14:48 <karolherbst> (if they haven't already)

14:48 <karolherbst> GPU ISAs are mostly scalar

14:48 <FL4SHK[m]> I see

14:48 <karolherbst> makes it easier to optimize code

14:48 <karolherbst> so every SIMD lane is just a scalar program from the ISA point of view

14:48 <FL4SHK[m]> I could make a regular GPU then

14:48 <karolherbst> and the hardware just runs e.g. 32 threads in a SIMD group

14:48 <karolherbst> with each executing the same instruction

14:49 <karolherbst> makes it easier to parallelize data as you won't run into the issue of "what if you can only use 3 of your 32 SIMD lanes, because of program code"

14:49 <karolherbst> vectorization within a thread is destined to fail

14:50 <karolherbst> so you get the most perf if you don't rely on it
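A toy model of the execution style described above: every lane runs the same scalar program, and the hardware steps the whole group through it together. The per-lane execution mask used here for branches is a common way to handle divergence but is an addition for illustration, as are the function name and the utilization metric. It does not model any particular GPU.

```python
# Toy model of SIMT execution: one scalar program, many lanes, one mask.
# The mask handling and the example branch are illustrative, not a real ISA.

def run_branch(values, cond, then_op, else_op):
    """Execute `then_op(v) if cond(v) else else_op(v)` across all lanes.

    Each side that at least one lane needs is issued for the whole group;
    the execution mask decides which lanes commit a result. Returns the new
    values and the fraction of issued lane slots that did useful work.
    """
    mask = [cond(v) for v in values]
    out = list(values)
    issued = 0
    useful = 0
    for side_mask, op in ((mask, then_op), ([not m for m in mask], else_op)):
        if not any(side_mask):
            continue  # no lane wants this side, so the group skips it
        issued += len(values)
        for lane, active in enumerate(side_mask):
            if active:
                out[lane] = op(values[lane])
                useful += 1
    return out, (useful / issued if issued else 1.0)
```

With 32 lanes where only three take the `then` side, both sides are issued and half of the lane slots are idle. When all lanes agree, only one side is issued and utilization is 1.0. The program itself stays scalar in both cases, so the compiler never has to find vectorizable work inside a single thread.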

14:50 <karolherbst> however.. some GPUs have e.g. vec2 operations for fp16 or 128 bit wide memory load/stores

14:51 <FL4SHK[m]> I could do that

14:51 <FL4SHK[m]> the ISA I was going to go with is designed to reduce memory traffic

14:51 <karolherbst> but in the end feel free to experiment :D

14:52 mszyprow has joined #dri-devel

14:52 <karolherbst> yeah.. so some ISAs have "scalar" or "uniform" registers which are special cases where one instruction inside the SIMD group has the same result across all lanes

14:52 <karolherbst> so optimizations like that exist

14:53 <karolherbst> but that's alu <-> register traffic stuff

14:57 shashanks has joined #dri-devel

14:59 rgallaispou1 has joined #dri-devel

14:59 konstantin_ is now known as konstantin

15:00 yyds has quit [Remote host closed the connection]

15:02 rgallaispou2 has joined #dri-devel

15:03 rgallaispou1 has quit [Read error: Connection reset by peer]

15:04 rgallaispou has quit [Ping timeout: 480 seconds]

15:06 <Venemo> Lynne: I still don't see it

15:06 <Venemo> Lynne: what GPU do you use?

15:07 <Lynne> 7900XTX, give me a sec to try on a 6900XT

15:09 dliviu has quit [Ping timeout: 480 seconds]

15:13 <Venemo> Lynne: I am also trying on 7900XTX

15:13 <Venemo> sorry but the issue just doesn't happen here

15:13 <Venemo> are you sure we are doing the same thing?

15:15 dliviu has joined #dri-devel

15:15 <Lynne> you're using the sample image I'm using, right: https://gitlab.freedesktop.org/mesa/mesa/uploads/2ffa09962eb83f2e1f7de2d919b549ec/test.png

15:15 <Venemo> exactly

15:16 <Lynne> you're running "ffmpeg -init_hw_device vulkan -i test.png -vf format=yuv422p,hwupload,hwdownload,format=yuv422p,format=rgba -c:v png -y test_out.png"

15:16 <Lynne> with export RADV_PERFTEST=transfer_queue

15:16 <Venemo> I tried both with and without, both commands produce the same output

15:17 <Lynne> no green stripe across the image?

15:18 <Venemo> Lynne: this is how it looks here: https://i.imgur.com/B08PvuU.jpg

15:18 <Venemo> I have ffmpeg-6.0.1-3.fc39.x86_64 in case that matters

15:19 <Lynne> it does, that's before the large vulkan patchset was merged

15:19 <Venemo> let me try again

15:19 <Lynne> could you update to 6.1?

15:19 fab has joined #dri-devel

15:19 <Venemo> ehh

15:20 <Venemo> there doesn't seem to be a fedora build for ffmpeg 6.1 yet, what is the easiest way for me to get it?

15:21 <Lynne> git clone, ./configure && make build?

15:21 <Venemo> ehh, okay

15:22 <Lynne> on the bright side, it's less time to compile than a minimal mesa and doesn't depend on llvm

15:23 <Venemo> Lynne: what package do I need for this one? nasm/yasm not found or too old. Use --disable-x86asm for a crippled build.

15:23 <kwizart> Venemo, 6.1 is in rpmfusion for f40 (you could use a container or chroot ?)

15:24 <Lynne> Venemo: nasm

15:26 <Venemo> kwizart: I would prefer not to

15:26 <Venemo> Lynne: got it, it's building now

15:28 lanodan has joined #dri-devel

15:28 <Venemo> Lynne: awesome, I got the green stripe now

15:29 <Lynne> it should disappear if you remove the transfer_queue perftest

15:30 <Venemo> it does indeed

15:35 <Lynne> it also does happen on 6900XT, but the stripe is twice as large as 7900

15:35 <Lynne> *long

15:36 <Venemo> whatever the issue is, it's probably the same problem

15:36 <Venemo> on both GPUs

15:37 mvlad has quit [Remote host closed the connection]

15:41 <tomeu> gfxstrand: what would you think of adding some of these instructions to NIR? https://www.tensorflow.org/mlir/tfl_ops

15:41 <tomeu> it doesn't need to be this dialect, can be something different, but of equivalent functionality and level of abstraction

15:43 mszyprow has quit [Ping timeout: 480 seconds]

15:45 <gfxstrand> tomeu: A bunch of those we already have.

15:46 <gfxstrand> tomeu: The first question that comes to mind is how big is a tensor?

15:46 <gfxstrand> NIR vectors are currently limited to the SPIR-V limit of 16

15:46 <gfxstrand> With some limitations on exactly what sizes are allowed but those can probably be lifted.

15:46 <karolherbst> ~~question is for how long still~~

15:46 <tomeu> they can be megabytes big, but I'm not sure a tensor can be mapped as a vector

15:47 <gfxstrand> tomeu: Oh... Okay, that changes my mental model.

15:48 <gfxstrand> So what is a tensor then? Is it an opaque object that's backed by memory somewhere?

15:51 <tomeu> yep, with attributes such as dimensions (4 is common) and data type

15:52 <tomeu> guess I should investigate a bit more what others are doing for generating machine code from MLIR, I just got really frustrated by having to reinvent NIR in my NPU driver

15:52 <tomeu> there is a cute graph at https://www.tensorflow.org/mlir/overview

15:53 <tomeu> there is a mlir-to-spirv translator out there, but I'm not sure what is the level of abstraction of the output

15:54 <tomeu> ie. if convolution operations have been lowered to CL spirv or are still there

15:58 <Venemo> Lynne: interesting. it would seem that there are 3 buffer->image and image->buffer copies and the 3rd copy seems to miss a part of the image

16:00 <gfxstrand> tomeu: So, my gut feeling is that if you do it all with intrinsics, find yourself a suffix you're happy with and you can make as many as you'd like.

16:02 <tomeu> ok, I will play with it after holidays and comment back

16:02 hansg has joined #dri-devel

16:02 <gfxstrand> It's unclear to me how tensor ops would fit into NIR long-term.

16:03 <gfxstrand> If it's a good match, we may want to add a new op type for tensor ops and make them more first-class.

16:03 <gfxstrand> My biggest fear is that tensor NIR will end up looking so different from regular NIR that we might as well have different IRs.

16:04 <gfxstrand> But I haven't thought about it hard enough for that fear to be an opinion. It's more of a "Hey, there's a mountain over there and I've heard rumors of dragons so, uh... watch out!"

16:08 <Lynne> Venemo: luma plane looks fine, so it's the chroma planes
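For reference while reading the plane discussion, a small sketch of how a planar 4:2:2 frame splits into three planes, presumably one per buffer/image copy mentioned above. It assumes tightly packed rows with no pitch alignment, which a real driver or ffmpeg need not use. The function name is hypothetical. `bytes_per_sample` would be 1 for yuv422p and 2 for yuv422p10le, where each 10-bit sample is stored in 16 bits.

```python
def yuv422_planes(width, height, bytes_per_sample):
    """Return (name, width, height, size_in_bytes) for each plane of planar 4:2:2.

    4:2:2 keeps full vertical chroma resolution and halves it horizontally.
    Rows are assumed tightly packed, which real drivers need not guarantee.
    """
    chroma_width = (width + 1) // 2  # round up for odd widths
    dims = (
        ("Y", width, height),
        ("U", chroma_width, height),
        ("V", chroma_width, height),
    )
    return [(name, w, h, w * h * bytes_per_sample) for name, w, h in dims]
```

The two chroma planes are each half the size of the luma plane, so a copy that uses luma dimensions or offsets for a chroma plane, or the reverse, would cover the wrong region.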

16:11 kzd has joined #dri-devel

16:16 Duke`` has joined #dri-devel

16:16 <tomeu> yeah, I also see the tensor type as the main difficulty here

16:20 <gfxstrand> If it remains an opaque thing, that's easy enough.

16:26 urja has quit [Ping timeout: 480 seconds]

16:36 urja has joined #dri-devel

16:37 mszyprow has joined #dri-devel

16:38 cmichael has quit [Remote host closed the connection]

16:38 macslayer has joined #dri-devel

16:38 cmichael has joined #dri-devel

16:51 Company has quit [Quit: Leaving]

16:51 cmichael has quit []

16:56 mvlad has joined #dri-devel

17:04 JohnnyonFlame has joined #dri-devel

17:19 <Venemo> Lynne: only the 3rd thingy seems wrong

17:19 <Venemo> but I don't yet see why

17:40 nashpa has joined #dri-devel

17:42 dliviu has quit [Ping timeout: 480 seconds]

17:43 <Lynne> in case you're wondering why you couldn't replicate on 6.0: it doesn't use multiplane images (ever)

17:45 <FL4SHK[m]> is LLVMPipe not going to be supported in the future?

17:46 koike has joined #dri-devel

17:46 <koike> o/ I'm trying to run dim setup, but it fails to update the rerere cache and asks me to run git branch --set-upstream-to=<remote>/<branch> rerere-cache. I already removed the rerere-cache branch and worktree to see if that would fix it, but no luck. I'm new to the dim tool, so I was wondering if anyone could give me pointers here

17:46 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

17:46 TMM has joined #dri-devel

17:46 kts has quit [Quit: Leaving]

17:46 vliaskov has joined #dri-devel

17:49 kts has joined #dri-devel

17:49 mszyprow has quit [Ping timeout: 480 seconds]

17:49 <koike> (never mind, looks like the branch didn't really get removed, it seems it worked now, sorry for the noise and thanks for :rubberduck: xD )

17:51 kts has quit []

17:52 Jscob has joined #dri-devel

17:55 urja has quit [Ping timeout: 480 seconds]

17:57 <jenatali> FL4SHK: LLVMPipe runs on the CPU, not a GPU

17:58 <jenatali> There exist drivers that use LLVM for generating GPU code. I think it's just radeonsi at this point. But it has nothing to do with LLVMPipe

17:58 <FL4SHK[m]> Okay

17:58 <Venemo> Lynne: how is the buffer uploaded to the GPU?

17:58 <kisak> Intel has a couple fingers into llvm (OpenCL?)

17:59 <FL4SHK[m]> How difficult would it be to develop a Mesa driver for a new GPU then?

17:59 <FL4SHK[m]> I'm assuming it'd be hard...

17:59 Jscob has quit [Read error: Connection reset by peer]

18:00 <jenatali> LLVM is used in frontends like rusticl, yeah. I dunno how much it's used for GPU backends, especially from Mesa

18:00 <jenatali> FL4SHK: It depends

18:00 <gfxstrand> Baseline is "hard". It only goes up from there depending on hardware and what APIs you want to support.

18:00 <gfxstrand> I mean, multiple highschoolers have successfully written Mesa drivers, so...

18:01 <FL4SHK[m]> Hm

18:01 <gfxstrand> But also I've been head down on NVK for 1.5 years and we're just now starting to play games and I'm one of the best there is.

18:01 <idr> Totally average, every day high school students...

18:01 <gfxstrand> From Normal High

18:01 <FL4SHK[m]> I see

18:01 <FL4SHK[m]> I'll keep that in mind

18:02 <gfxstrand> What are you wanting to make a driver for?

18:02 <FL4SHK[m]> a custom FPGA-based GPU

18:02 <FL4SHK[m]> I have part of the instruction set written up

18:02 <FL4SHK[m]> my goal is to have a 500 MHz workstation

18:02 rasterman has quit [Quit: Gettin' stinky!]

18:03 <FL4SHK[m]> which should be possible with the hardware I've got

18:03 <FL4SHK[m]> A Xilinx ZCU104

18:03 <FL4SHK[m]> I know it's a lot of work

18:07 urja has joined #dri-devel

18:08 vliaskov has quit [Ping timeout: 480 seconds]

18:08 <Lynne> Venemo: memory map image on RAM to a vkbuffer, then vkbuffer->vkimage copy

18:08 <Lynne> same but in reverse for downloads

18:09 <Venemo> Lynne: is it possible that there is a sync bug in there somehow?

18:09 <Lynne> validation passes

18:10 <Lynne> we do a barrier before each copy too

18:10 <Venemo> I'm not sure if that is relevant here. by the same logic I could say radv passes the cts

18:10 <Lynne> (it doesn't?)

18:10 <Venemo> it does

18:11 <Venemo> or what do you mean?

18:11 <Lynne> nothing

18:11 <Lynne> disabling host-mapping and falling back to a RAM->vkbuffer + vkbuffer->vkimage copy doesn't help

18:11 <Venemo> it is very curious that only some middle part of the image is missing and the rest is correct

18:12 <Lynne> it's always the same part too, everywhere, so it's not a sync issue, I think

18:12 <Lynne> it seems like it could be alignment related somehow, though not sure

18:13 <Venemo> alignment of what?

18:15 <heat> gfxstrand, was NVK considerably harder cuz nvidia? or do you reckon it didn't matter much?

18:15 <heat> well, doesn't, it's still ongoing work ofc :)

18:15 <Venemo> Lynne: what is very peculiar here is that it fails the same way even if I force the code to copy the image line-by-line

18:16 <gfxstrand> heat: It's hard because we're going straight to "can play D3D11 and D3D12 games"

18:17 <gfxstrand> If you just want enough of OpenGL ES 2 to get a desktop up and going it's significantly easier.

18:17 <FL4SHK[m]> I'd like to be able to run some lower end emulators

18:18 <FL4SHK[m]> eventually

18:18 <gfxstrand> NVIDIA hardware is quite nice, actually. That's not at all the problem.

18:19 <karolherbst> the tldr on nvidia is, that the hardware is designed for driver developers

18:20 <soreau> is there a gl(es) driver for nvk-capable hw too, or you mean running $compositor on zink?

18:20 <Venemo> soreau: there is nouveau, like there always has been

18:20 <gfxstrand> There's a GL driver but Zink+NVK is already starting to outpace it

18:20 <soreau> I see

18:21 <soreau> well, don't forget there's a forest in the trees, somewhere..

18:22 <Lynne> Venemo: <some> alignment, after all, if the image has nicer dimensions it looks fine

18:22 <heat> gfxstrand, i thought it was a PITA to get docs on nvidia hw though? or did you folks solve that situation already?

18:22 <Lynne> I chose that image because it's all odd-sized

18:22 <gfxstrand> We have headers now

18:23 <gfxstrand> Which is a big step up

18:23 <gfxstrand> For the ISA, some folks have access to some docs and we have the PTX docs public which are often helpful.

18:23 <gfxstrand> But developing any GPU driver involves a certain amount of R/E anyway

18:30 sukrutb has joined #dri-devel

18:31 Omax_ has quit [Ping timeout: 480 seconds]

18:34 iive has joined #dri-devel

18:35 Omax has joined #dri-devel

18:35 <Venemo> Lynne: does it behave better with even-sized images?

18:41 nashpa has quit []

18:44 dliviu has joined #dri-devel

18:44 <FL4SHK[m]> if the GPU is open source surely there's no R/E involved other than reading the code

18:45 hansg has quit [Quit: Leaving]

18:45 <karolherbst> *doubt*

18:45 <FL4SHK[m]> Doubt for what?

18:45 <karolherbst> GPUs are quite complex

18:45 <idr> FL4SHK[m]: GPUs are complex. Some of the RE is, "What happens if I do these things together in a way nobody really thought about?"

18:45 <karolherbst> and reverse engineering isn't limited to binary blobs

18:46 <FL4SHK[m]> ah

18:46 <FL4SHK[m]> gotcha

18:46 <karolherbst> but yeah.. you might have to figure out how your open source GPU behaves doing certain things as it might not be obvious from the code

18:46 <karolherbst> maybe "debugging" would be the better term here? dunno :)

18:46 <FL4SHK[m]> right

18:47 <heat> i've gone through the intel GPU docs a fair bit... safe to say they don't tell you all the things you need to know

18:47 <FL4SHK[m]> In my case I will be developing both the GPU and the driver

18:47 <karolherbst> and usually: hw trumps the spec/code in any argument

18:47 <heat> and between the thousands of pages of docs and the i915 kernel driver... yeah i gave up on that pretty quickly :)))

18:49 lynxeye has quit [Quit: Leaving.]

18:49 <gfxstrand> Well, reverse engineering is just debugging something you don't have the capacity to change.

18:49 <airlied> also a gpu is a lot more than just compute execution units

18:49 <gfxstrand> So you're just replacing debugging something you can't change with debugging something you can.

18:50 <karolherbst> :D fair

18:50 <airlied> those are the fun pieces, but for a useful graphics gpu, you'd probably want texture units at least, and maybe hw blending

18:51 alanc has quit [Remote host closed the connection]

18:52 alanc has joined #dri-devel

18:54 <vsyrjala> iirc some pirate once said: "hardware docs are more of what you'd call guidelines than an actual description of how the hardware really works"

18:55 <gfxstrand> hehe

18:55 <gfxstrand> Pretty much

18:55 <dj-death> when it's not outright lies

18:56 <karolherbst> hard to tell if anybody actually lies there

18:56 <gfxstrand> That's the best part of not having docs. There's nothing to lie to me!

18:56 <FL4SHK[m]> <airlied> "those are the fun pieces, but..." <- not sure there's much more I'll be including in mine

18:56 <karolherbst> but the code to design the hw doesn't have to match it, for various reasons, including bugs

18:56 <dj-death> yeah

18:56 <dj-death> gfxstrand: nice little feeling not to have to trust anybody ;)

18:57 <karolherbst> at least nvidia just leaves out the parts they don't want to share :D

18:58 <vsyrjala> i feel docs are a bit of a double edged sword sometimes. inexperienced developers tend to blindly trust what they read there, and then nothing works correctly

19:13 <daniels> just like the average CV, they're aspirational but unreliable

19:13 <karolherbst> or they end up debugging their own code for days, until somebody tells them the docs are wrong, or they come to that conclusion on their own, or in the worst case they give up

19:15 urja has quit [Ping timeout: 480 seconds]

19:15 Leopold has quit [Remote host closed the connection]

19:16 urja has joined #dri-devel

19:27 i509vcb has joined #dri-devel

19:27 biju has quit []

19:28 mszyprow has joined #dri-devel

19:29 Leopold has joined #dri-devel

19:35 V has quit [Ping timeout: 480 seconds]

19:36 V has joined #dri-devel

19:37 mszyprow has quit [Ping timeout: 480 seconds]

19:48 Mangix_ has joined #dri-devel

19:49 Mangix has quit [Ping timeout: 480 seconds]

20:03 gouchi has joined #dri-devel

20:08 Mangix_ has quit [Ping timeout: 480 seconds]

20:12 azerov has quit []

20:14 <Lynne> Venemo: yup

20:17 <Lynne> it's only the width that matters, the height can be odd

20:20 <Lynne> correction: it happens on 1024x1024 images too

20:21 <Lynne> corruption seems to depend on dimensions but so far seems to be random

20:22 <Lynne> the corruption on 1024x1024 does look like an incorrect stride issue for the chroma
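
The stride hypothesis above can be illustrated with a small sketch. This is not the radv code: `write_rows` and `read_rows` are hypothetical helpers, and the plane is a plain byte array. Rows of a half-width chroma plane are packed with the correct stride and then read back with the luma stride, which is one way an "incorrect stride" symptom could arise.

```python
# Hypothetical illustration: read a chroma plane back with the wrong row stride.

def write_rows(rows, stride):
    """Pack rows into a flat buffer, one row every `stride` bytes."""
    buf = bytearray(stride * len(rows))
    for y, row in enumerate(rows):
        buf[y * stride : y * stride + len(row)] = row
    return buf

def read_rows(buf, width, height, stride):
    """Read `height` rows of `width` bytes, assuming `stride` bytes per row."""
    out = []
    for y in range(height):
        start = y * stride
        row = bytes(buf[start : start + width])
        out.append(row.ljust(width, b"\x00"))  # reads past the end become zeros
    return out

luma_width, height = 8, 4
chroma_width = luma_width // 2  # 4:2:2 halves the chroma width only

# Each chroma row is filled with its own row index so skew is easy to spot.
chroma = [bytes([y] * chroma_width) for y in range(height)]

buf = write_rows(chroma, stride=chroma_width)
good = read_rows(buf, chroma_width, height, stride=chroma_width)
bad = read_rows(buf, chroma_width, height, stride=luma_width)
```

With the wrong stride, every other source row is skipped and the tail of the plane is never reached, so only part of the image comes out right.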

20:24 yann-kaelig has joined #dri-devel

20:26 mszyprow has joined #dri-devel

20:27 rsripada has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

20:27 rsripada has joined #dri-devel

20:47 jsa has quit []

20:49 <karolherbst> okay.. the u_trace stuff has a use-after-free :')

20:49 <karolherbst> apparently util_queue_finish doesn't properly clean up the thread started in util_queue_init

20:50 <karolherbst> mhh maybe it's more of a u_queue issue then

20:57 rsalvaterra has quit []

20:58 rsalvaterra has joined #dri-devel

20:59 sima has quit [Ping timeout: 480 seconds]

21:06 tursulin has quit [Ping timeout: 480 seconds]

21:06 Dr_Who has quit [Read error: Connection reset by peer]

21:07 fab has quit [Quit: fab]

21:08 <Venemo> Lynne: so, if you take the same image, but add 1 pixel to each side, it will work?

21:08 i-garrison has quit [Remote host closed the connection]

21:09 i-garrison has joined #dri-devel

21:11 konstantin_ has joined #dri-devel

21:12 <Venemo> Lynne: really weird. can you give me an image to reproduce with at 1024x1024?

21:14 konstantin has quit [Ping timeout: 480 seconds]

21:17 konstantin has joined #dri-devel

21:19 mvlad has quit [Remote host closed the connection]

21:19 yann-kaelig has quit [Ping timeout: 480 seconds]

21:21 konstantin_ has quit [Ping timeout: 480 seconds]

21:31 <Lynne> better yet, I can teach you to make your own: "ffmpeg -i test.png -vf crop=w=1359:h=1791 -y test_cropped.png"

21:32 <Lynne> you can use any larger image as an input and crop to whatever size you need, the ffmpeg command to upload+download always does a format conversion to yuv422

21:33 gouchi has quit [Remote host closed the connection]

21:34 <Venemo> Lynne: by any chance, have you tried if this works on amdvlk?

21:36 yann-kaelig has joined #dri-devel

21:37 <Venemo> it probably will

21:37 <Lynne> err, no, I haven't, it's a pain to use since it hamfistedly overrides the default driver

21:37 <Venemo> if it's any consolation, I've found 3 other bugs while chasing this, but none of them has anything to do with your use case sadly

21:39 azerov has joined #dri-devel

21:41 Duke`` has quit [Ping timeout: 480 seconds]

21:43 azerov has quit []

21:43 azerov has joined #dri-devel

21:48 yann-kaelig has quit [Ping timeout: 480 seconds]

21:51 yann-kaelig has joined #dri-devel

21:53 <Lynne> bad news: it works just fine on amdvlk

21:54 <Venemo> it's not bad news, it means whatever the problem is, it's solvable

21:57 <Venemo> Lynne: can you help me understand how the 3 planes are combined into a single image that I can see in the file?

21:57 <Lynne> err, turns out it works because amdvlk doesn't support vulkan's multiplane 422

21:58 <Venemo> heh

21:59 a-865 has quit [Quit: ChatZilla 0.18b1 [SeaMonkey 2.53.18/20231107000034]]

21:59 <Lynne> yeah, so after downloading, there's a conversion step to turn it into rgb to compress it with png and output it

22:00 <Lynne> you can get the exact data out directly via .nut, but you'll just have to launch vlc with a cli arg to start in paused mode rather than play

22:01 <Lynne> internally, the vkformat 1000156004 is used for the images, with the vkimage being non-disjoint, regularly allocated, optimally-tiled
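
For readers who want to decode a raw value like 1000156004, here is a minimal sketch. It assumes the usual Vulkan numbering for extension enums, 1000000000 + (extension number - 1) * 1000 + offset, and a small name table copied from my reading of `vulkan_core.h`. The helper `split_extension_enum` and the `YCBCR_FORMATS` table are hypothetical names, not part of any Vulkan API.

```python
# Hypothetical helper: split a Vulkan extension enum value into its parts.

EXTENSION_ENUM_BASE = 1_000_000_000

def split_extension_enum(value):
    """Return (extension_number, offset) for an extension-defined enum value."""
    if value < EXTENSION_ENUM_BASE:
        raise ValueError("not an extension-defined enum value")
    ext_index, offset = divmod(value - EXTENSION_ENUM_BASE, 1000)
    return ext_index + 1, offset

# Assumed subset of the formats added by VK_KHR_sampler_ycbcr_conversion
# (extension number 157), keyed by offset.
YCBCR_FORMATS = {
    2: "VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM",
    3: "VK_FORMAT_G8_B8R8_2PLANE_420_UNORM",
    4: "VK_FORMAT_G8_B8_R8_3PLANE_422_UNORM",
    5: "VK_FORMAT_G8_B8R8_2PLANE_422_UNORM",
    6: "VK_FORMAT_G8_B8_R8_3PLANE_444_UNORM",
}

ext_number, offset = split_extension_enum(1000156004)
name = YCBCR_FORMATS.get(offset, "unknown") if ext_number == 157 else "unknown"
print(ext_number, offset, name)
```

Under those assumptions the value maps to a three-plane 4:2:2 format, which fits the three buffer<->image copies Venemo saw earlier.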

22:03 a-865 has joined #dri-devel

22:14 azerov has quit []

22:15 azerov has joined #dri-devel

22:24 azerov has quit []

22:25 azerov has joined #dri-devel

22:33 <Venemo> Lynne: I think I found the solution... who the hell would have thought that the size of a plane is not the same as the image extent... the issue seems to be gone if I use vk_format_get_plane_width/height

22:33 <Venemo> give me a few minutes to update the branch

22:40 <jenatali> Venemo: That's what subsampling does, though
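
The point about subsampling can be sketched as follows. This does not reproduce Mesa's `vk_format_get_plane_width`/`vk_format_get_plane_height`. `plane_extent` and the `SUBSAMPLING` table are hypothetical, and rounding odd sizes up is an assumption, since the log does not say how the driver rounds.

```python
# Hypothetical sketch: per-plane extent of a planar YCbCr image.

# (horizontal divisor, vertical divisor) applied to the chroma planes.
SUBSAMPLING = {
    "444": (1, 1),  # chroma at full resolution
    "422": (2, 1),  # chroma width halved, height unchanged
    "420": (2, 2),  # chroma width and height halved
}

def plane_extent(width, height, plane, layout):
    """Return (width, height) of `plane`; plane 0 is luma, others are chroma."""
    if plane == 0:
        return width, height
    div_x, div_y = SUBSAMPLING[layout]
    # Ceiling division; the rounding direction is an assumption of this sketch.
    return -(-width // div_x), -(-height // div_y)

for plane in range(3):
    print(plane, plane_extent(1024, 1024, plane, "422"))
```

For 4:2:2 only the chroma width shrinks, which matches Lynne's earlier observation that the width matters and the height does not. Using the full image extent for a chroma-plane copy would cover twice the width the plane actually has.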

22:42 <Venemo> I learn something new and interesting every day in this job

22:45 smaeul has quit [Quit: Down for maintenance...]

22:45 apinheiro has quit [Quit: Leaving]

22:46 ykaelig has joined #dri-devel

22:49 yann-kaelig has quit [Ping timeout: 480 seconds]

22:53 yann-kaelig has joined #dri-devel

22:55 ykaelig has quit [Ping timeout: 480 seconds]

23:00 ykaelig has joined #dri-devel

23:02 yann-kaelig has quit [Ping timeout: 480 seconds]

23:10 sukrutb has quit [Ping timeout: 480 seconds]

23:14 mszyprow has quit [Ping timeout: 480 seconds]

23:33 crabbedhaloablut has quit []

23:39 mszyprow has joined #dri-devel

23:48 msizanoen[m] has quit []

23:54 <Venemo> Lynne: I've updated the MR now, can you pls check if the issue is gone?

23:55 mszyprow has quit [Read error: Connection reset by peer]

23:59 <Lynne> Venemo: yup, fully works now!

23:59 linyaa_ is now known as linyaa