#dri-devel on 2022-12-14 — irc logs at oftc.irclog.whitequark.org

2022-08-14 19:45 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:02 <mattst88> DemiMarie: I'm unaware of any security issues in the GuC, past or present. I'm just aware of consistent performance problems when using it for command submission

00:02 iive has quit [Quit: They came for me...]

00:08 * airlied can't wait until someone hits gsprm fw with a fuzzer :-P

00:11 <airlied> then we can buffer overflow it and upload our own fw :-P

00:11 <alyssa> marcan: ^^ have fun

00:12 <airlied> get some rust on risc-v fw :-P

00:12 alyssa has quit [Quit: leaving]

00:12 alyssa has joined #dri-devel

00:12 ahajda__ has quit []

00:12 <alyssa> that's more lina and jekstrand's dept

00:21 <Lynne> airlied: rebased my patchset, going to compare indices with nvdec again

00:41 ybogdano has quit [Read error: Connection reset by peer]

00:52 <airlied> Lynne: now I can't get more than 2 good frames :-(

00:53 <Lynne> did I mess anything up?

00:53 <airlied> not sure, will have to check out what is going wrong

00:55 <Lynne> oh, I didn't merge your previous diff

00:58 <airlied> okay hacked up to 5 frames going again

01:12 <marcan> alyssa: :p

01:13 zf has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

01:13 Haaninjo has quit [Quit: Ex-Chat]

01:16 co1umbarius has joined #dri-devel

01:18 columbarius has quit [Ping timeout: 480 seconds]

01:19 zf has joined #dri-devel

01:20 <airlied> Lynne: so do you get any good frames from nvidia?

01:20 sarnex_ has quit []

01:21 sarnex has joined #dri-devel

01:33 zf has quit [Ping timeout: 480 seconds]

01:34 YuGiOhJCJ has joined #dri-devel

01:37 <Lynne> no, I mentioned last night it broke stuff there

01:37 <airlied> Lynne: yeah just wondering how badly

01:38 <Lynne> as badly as on AMD without your diff, let me check

01:40 <Lynne> yeah, I think so

01:40 <Lynne> with the difference nvidia's first 2 gops have completely green frames on I-frames if the indices are incorrect

01:46 * airlied resorts to reading the dxvk h265 spec

01:53 <DemiMarie> mattst88: then why use it instead of execlists and vexeclists?

01:55 zf has joined #dri-devel

01:56 <mattst88> DemiMarie: I honestly don't know. I've heard why it *could* be better, but I've only ever been aware of it being worse (e.g. in terms of GPU clock management)

01:57 <mattst88> what I heard when I was at Intel was that the GuC was created because Windows syscall overhead was super high, so being able to submit via the GuC avoided that overhead

01:58 <mattst88> and then because it existed, the powers that be wanted to have only one supported path, so naturally Linux must switch to GuC even if it didn't suffer from the problem the GuC was created to solve

01:58 <mattst88> but I don't know any of that first hand

01:59 <jekstrand> Part of the problem is that no one ever did the work to make i915 use the GuC properly.

02:00 <jekstrand> I think if you use the GuC as a proper submission firmware and stop trying to treat it like an execlist back-end, it probably works a lot better.

02:00 <jekstrand> Which doesn't mean you need to open it up to direct submit from userspace.

02:00 <jekstrand> direct-submit and "use the GuC properly" are two totally different things.

02:02 <alyssa> jekstrand: So, I hear there's this 3 letter company that has direct-submit for their hardware under Linux..

02:04 <DemiMarie> mattst88: what about pushing back on that and telling them that they need to keep supporting execlists?

02:04 ngcortes has quit [Quit: Leaving]

02:04 <mattst88> DemiMarie: we did for years

02:04 <DemiMarie> then what happened?

02:04 <DemiMarie> did they change the hardware to make it impossible?

02:04 <mattst88> IIRC there were plans to switch to GuC by default for SKL, then for KBL, then for CNL (which never shipped), etc etc

02:05 <jekstrand> Execlists still exist but they're untested, entirely unused on Windows, and they want to get rid of them.

02:05 <jekstrand> They're also pretty shitty, actually

02:05 <DemiMarie> In what ways?

02:05 <mattst88> I don't know of any hardware changes that would make it impossible, but anything untested is bound to be broken

02:05 <DemiMarie> Untested by who?

02:05 <mattst88> anyone

02:05 <DemiMarie> Even i915?

02:05 <jekstrand> Untested by the people who will catch bugs fast enough to fix them.

02:06 <alyssa> Wait, if Windows syscall overhead is the problem

02:06 <mattst88> I think GuC submission is finally on-by-default on Alderlake, so I expect any bugs in the execlist path to pretty much be ignored going forward until it bitrots and ends up being removed

02:06 <alyssa> What's wrong with Android syscall overhead

02:06 <mattst88> alyssa: does Intel ship any hardware used in Android products? :)

02:06 <jekstrand> As for what's wrong with them: They're horribly racy.

02:06 <DemiMarie> What do you mean?

02:07 Company has quit [Quit: Leaving]

02:07 <jekstrand> There's like 10 execlist ports and in order to keep the GPU full, you're supposed to cycle between them except if you smash registers at the wrong time, you might end up preempting and swapping out your job without knowing it.

02:07 <jekstrand> Also, the feedback mechanism to figure out what's running and what the status was is pretty racy too.

02:07 <jekstrand> It's a horrible hardware design.

02:08 <DemiMarie> Is the GuC much nicer?

02:08 <jekstrand> IDK how nice it is internally but the interface is pretty much what you want for a competent modern driver.

02:08 <DemiMarie> How does it compare to nvidia’s GSP?

02:09 <jekstrand> It presents some number (like 1k? 16k?) of virtual queues and you just stick stuff on them and it load balances between all the active ones, respecting priority.
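
A toy model of the queue interface jekstrand describes above, as a rough sketch only. The class name, the method names, and the tie-break rule (least recently served queue wins among equal priorities) are assumptions made for illustration; nothing here reflects how the GuC firmware actually schedules work.

```python
from collections import deque
from itertools import count


class ToySubmitter:
    """Hypothetical model: many virtual queues, picked by priority."""

    def __init__(self):
        self._queues = {}       # queue id -> pending jobs
        self._priority = {}     # queue id -> priority (higher wins)
        self._last_served = {}  # queue id -> tick of last service
        self._tick = count(1)

    def add_queue(self, qid, priority):
        self._queues[qid] = deque()
        self._priority[qid] = priority
        self._last_served[qid] = 0

    def submit(self, qid, job):
        # The caller just sticks work on a queue; no port juggling.
        self._queues[qid].append(job)

    def next_job(self):
        # Only queues with pending work are considered active.
        active = [q for q, jobs in self._queues.items() if jobs]
        if not active:
            return None
        top = max(self._priority[q] for q in active)
        candidates = [q for q in active if self._priority[q] == top]
        # Assumed balancing rule: least recently served queue goes first.
        qid = min(candidates, key=lambda q: self._last_served[q])
        self._last_served[qid] = next(self._tick)
        return qid, self._queues[qid].popleft()
```

The point of the sketch is the shape of the interface: the driver only appends to queues, and selection among them happens on the other side.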

02:09 <DemiMarie> Also I wonder if one of the advantages of such processors is that because they are so small, they can just busy spin all the time and not wake up the host.

02:09 <jekstrand> It's roughly the same design concept as nvidia's GSP. I'm sure NVIDIA did it better because they're NVIDIA and Intel is Intel but it's the same basic idea.

02:09 yuq825 has joined #dri-devel

02:10 <airlied> gsp is the same submit fw as pre-gsp

02:10 <airlied> gsp didn't really change that piece too much

02:11 <alyssa> mattst88: I meant for another vendor pushing userspace submit that sells mostly Android

02:11 <DemiMarie> airlied: then what does gsp change? and why is it like 40MB?

02:16 <airlied> DemiMarie: it changes the init procedure really

02:16 <airlied> instead of the cpu driver loading up all the individual firmwares in a complicated and horrible sequence

02:16 <airlied> they offload a chunk of that work to the GSP, which has all the sequence logic etc

02:16 <DemiMarie> airlied: what is so horrible about it? and does having the GSP just mean that the GSP firmware writers had to write that horrible code?

02:17 <DemiMarie> and why does the GSP FW have to be so much larger than all the other FW combined?

02:17 <airlied> DemiMarie: because it contains a lot of versions of things for lots of gpus

02:17 <Lynne> airlied: compared values for nvdec and vulkan, found a small diff in the dbp slot, fixed it

02:17 <jenatali> airlied: Did you mean the DXVA spec or did you really mean DXVK?

02:17 <Lynne> now both amd and nvidia do 5 frames correctly, then get messed up output

02:17 <airlied> jenatali: lols DXVA

02:17 <airlied> Lynne: win!

02:18 <airlied> Lynne: so frame 6 is the first where the slot indices don't match anymore

02:18 <Lynne> they get messed up output in the same way visually, so that is indeed a win

02:18 <Lynne> yup

02:18 <DemiMarie> jekstrand: Is Intel known for making bad GPUs?

02:18 <airlied> they aren't known for making good GPUs

02:18 <Lynne> batum-pssst

02:19 <jenatali> Ok that's what I thought lol

02:19 <jekstrand> DemiMarie: I mean, they work. But NVIDIA's clearly better at it.

02:19 <jekstrand> All GPUs are horrible somewhere.

02:19 <DemiMarie> Any more than CPUs?

02:20 <jekstrand> ¯\_(ツ)_/¯

02:22 <lina> I'm impressed that Apple managed to do mostly driver-transparent multi-GPU scalability on a tiler that actually works

02:22 <alyssa> same

02:22 <lina> They probably deserve some credit in GPU-land for that one ^^

02:22 <DemiMarie> lina: hello! why is this so impressive?

02:23 <lina> Splitting work across multiple GPUs is pretty hard! And then a tile has to do it twice, with a shuffle step in the middle.

02:23 <lina> *tiler

02:23 <alyssa> i'm just impressed someone managed to make a fast tiler

02:23 <DemiMarie> what was the hardest part about writing asahi.ko?

02:24 <alyssa> me neighsaying rust?

02:24 <lina> I'm not sur--- yeah maybe that ^^

02:24 <DemiMarie> alyssa: were you skeptical at first?

02:25 <alyssa> very

02:25 <DemiMarie> why, and what changed?

02:25 <alyssa> thought the bindings would be hell and it would delay release

02:26 <alyssa> from what I can tell, they were and it did, but it was worth it because the driver is rock solid and that would never have happened in C

02:26 <alyssa> and I'd rather release a completely stable driver a few months late than race to push out a broken one that architecturally can never get fixed

02:27 <lina> To be honest, figuring out how to work around the stack placement issue was harder than the abstractions I think ^^

02:27 <alyssa> woof

02:27 <DemiMarie> I would say that “never” is a strong word, but (a) Intel and AMD’s drivers are not one-person efforts and (b) at least Intel’s driver makes completely broken assumptions about Linux’s Page Attribute Table

02:27 <lina> The abstractions were mostly just having to learn how to model it properly, but it's not that much code and not very complicated

02:27 <alyssa> lina: so... could I interest you in writing another kernel driver in Rust? (-:

02:27 <lina> Which one? ^^

02:28 <alyssa> Mali :-p

02:28 <lina> wwwww

02:28 <DemiMarie> alyssa: I thought there was already a driver for that

02:28 <lina> marcan mentioned something about DCP...

02:29 <HdkR> Speaking of Asahi, should I take a look at the asahi drm API yet?

02:29 <DemiMarie> DCP?

02:29 <lina> If you're interested! It's pretty minimal and only really implements what I need, but I think the design is reasonable (though I'm sure there's things to be improved)

02:30 <DemiMarie> Does anyone here wish that they could require userspace libraries that matched the kernel version, the way illumos and macOS do?

02:30 <lina> DemiMarie: Apple's display controller

02:30 <HdkR> I want to make sure the API won't make FEX too terribly upset once 32-bit applications start talking to it.

02:30 <alyssa> HdkR: No. the UAPI is going to be completely torched.

02:30 <HdkR> :O

02:30 <alyssa> i mean

02:30 <lina> Oh I thought HdkR meant the abstractions

02:30 <alyssa> the current UAPI architecturally has no sync

02:30 <lina> The UAPI no, yeah, that is provisional

02:30 <alyssa> lina: he wants to know when to start thunking the UAPI for FEX on Asahi :-p

02:31 <HdkR> Indeed

02:31 <alyssa> i don't mean "no implicit sync"

02:31 <lina> It should just work on 32-bit

02:31 <alyssa> or "no explicit sync"

02:31 <alyssa> i mean "no sync" ;-p

02:31 <lina> There are no pointer-sized pointers

02:31 <alyssa> lina: speaking of when are you going to start torching the UAPI

02:31 <alyssa> after plumbing compute?

02:31 <DemiMarie> lina: does 32-bit even matter in this case?

02:31 <lina> Not really, no, neither side cares

02:31 <HdkR> Yes!

02:32 <HdkR> I care :<

02:32 <DemiMarie> lina: does the hardware even support 32-bit userspace?

02:32 <lina> I meant neither driver side cares that one is 32-bit

02:32 <lina> DemiMarie: No, this is for emulation

02:32 <DemiMarie> lina: emulation of what?

02:32 <lina> x86

02:33 <HdkR> It's a tricky problem when an x86 GL/Vulkan communicates to an AArch64 DRM uapi :D

02:33 <lina> HdkR: TBH, I'd be more worried about 64-bit assumptions in the driver breaking in 32-bit builds more than the UAPI (though I didn't notice any so far)

02:33 <qyliss> DemiMarie: (FEX as mentioned above is an x86 emulator)

02:33 <lina> But the UAPI design should require no thunking at all, and if it does that's a bug

02:33 <lina> So you should be able to start using it just fine already, and see if you run into driver issues in 32-bit builds
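
A small sketch of why a UAPI with no pointer-sized fields needs no thunking between 32-bit and 64-bit userspace. The field names and layout below are hypothetical and are not taken from the asahi UAPI; the only point illustrated is that fixed-width fields give one layout on every ABI, while a native pointer field does not.

```python
import struct

# Hypothetical argument block: two 64-bit addresses, then two 32-bit fields.
# "<" selects standard sizes with no native padding, so this layout does
# not depend on whether the process is 32-bit or 64-bit.
SUBMIT_ARGS = struct.Struct("<QQII")  # cmdbuf_addr, result_addr, queue_id, flags


def pack_submit(cmdbuf_addr, result_addr, queue_id, flags):
    """Serialize the hypothetical argument block."""
    return SUBMIT_ARGS.pack(cmdbuf_addr, result_addr, queue_id, flags)


# For contrast: "P" is a native pointer, so this struct's size follows the
# ABI of the running process (4-byte pointers on 32-bit, 8-byte on 64-bit).
NATIVE_PTR_ARGS = struct.Struct("@PII")
```

With the fixed-width form, an emulator can pass the bytes through unchanged; the pointer-carrying form is the kind that would need translating.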

02:33 <DemiMarie> linaalyssa: I had a cursed idea for avoiding a stable UAPI: have the driver provide a shared library that userspace is expected to dlopen(), and require userspace to do everything via that library. If you try to make an ioctl and your instruction emulator is *not* in that DSO your process just dies.

02:34 <qyliss> (not that it was mentioned above that it's an x86 emulator, but that it was mentioned at all)

02:34 <DemiMarie> qyliss: Interesting!

02:34 <lina> DemiMarie: That's just kicking the problem to another layer...

02:35 <HdkR> lina: Aye, I'm not expecting the uapi to have problems, but the sooner I can get it in to CI the better from my PoV. The rest of the problems are...likely to be painful

02:35 <DemiMarie> lina: good point, unless that DSO *is* Mesa 🤣

02:35 <HdkR> Just make sure not to allocate 4GB of contiguous VA space in your driver :<

02:35 <alyssa> "linaalyssa" is this a ship name >->

02:35 <lina> And then we already do that... there's a UAPI version field (that will probably go away once it's stable, but for now it refuses to load the driver if there is a mismatch) ^^

02:36 <DemiMarie> alyssa: ship name?

02:36 <alyssa> i will have you know that is very much not what's happening here thanks

02:36 <lina> HdkR: I don't think we do that ww

02:36 <HdkR> We'd find out quickly if it did :D

02:36 <DemiMarie> (This is where I really wish IRC supported proper quoted replies)

02:37 <lina> The only weirdness is the 16K page issue as you know (only 16K kernels are supported right now)

02:37 <qyliss> DemiMarie: https://en.wikipedia.org/wiki/Shipping_(fandom)

02:37 <HdkR> lina: I like the "right now" on the end there

02:37 <DemiMarie> qyliss: that is what I thought, but thanks for confirming!

02:37 <lina> sven has a WIP patch to make 4K kernels work for the general IOMMUs, but for the GPU I need to get the shmem helper to allocate 16K contiguous aligned pages and... that seems like it'll be painful

02:37 <lina> (if anyone has any ideas there I'm all ears!)

02:38 <HdkR> Also need to get that TSO bit wired up to a prctl at some point

02:38 <lina> Ah yeah, that too

02:38 <lina> Ping marcan about that

02:38 <HdkR> marcan: Gimme the TSO bit in a prctl please.

02:38 <lina> I think he said something about putting it in m1n1 as a stopgap so you can start testing?

02:39 <marcan> yeah, I can give you a systemwide global toggle in m1n1 very easily. prctl shouldn't be hard but that's another one for the kernel bikeshedding list I'm sure...

02:39 <DemiMarie> Would io_uring_cmd be useful for GPU drivers?

02:39 <HdkR> :)

02:40 <Lynne> airlied: could you renew your diff? it expired

02:40 <airlied> Lynne: not sure I have a clean version of it right now

02:40 <airlied> unless I have a git stash somewhere

02:40 <airlied> https://paste.centos.org/view/dff25c14 might have most of it

02:41 <airlied> Lynne: might also be worth trying the memsets on nvidia

02:41 <DemiMarie> jekstrand: how complex is the format of each queue w.r.t. parsing, etc? IOW: for a security-sensitive system like Qubes, should VFs be made to submit via the kernel for validation?

02:51 <Lynne> airlied: with all of your changes, I get 2 correct frames, with just my new changes + -1ing the arrays, I get 5 frames

02:51 <Lynne> on nvidia, it doesn't really help

02:52 <Lynne> the way the 2 implementations go messed up is different, it's like radv gets completely invalid refs, while nvidia just gets wrong refs

03:01 Daanct12 has joined #dri-devel

03:03 camus has joined #dri-devel

03:05 <airlied> Lynne: yay I win

03:06 <airlied> but no idea how to cleanly win :-P

03:06 <Lynne> I can't really find any flaws with the logic in the code

03:06 <airlied> https://paste.centos.org/view/e4e2cd5c

03:06 <marcan> can I say I'm glad our video enc/dec hardware has nothing to do with the GPU?

03:06 <marcan> stashing it in the GPU sounds like triple the pain :p

03:06 <Lynne> RefPicSetStCurrBefore exactly matches nvdec

03:07 <airlied> Lynne: see how that patch works for nvidia

03:07 <Lynne> idx is just a counter that counts over the total refs

03:07 <airlied> it needs "refinement"

03:07 <Lynne> tmp2 counts over all refs+keyframe

03:07 <airlied> marcan: I think video enc/dec is horrible no matter where you do it

03:07 <airlied> Lynne: so what I worked out is approx this:

03:07 <marcan> fair :)

03:07 <airlied> the Before/After list is an index into the RefPicList

03:08 <Lynne> on nvidia, it breaks in such a way that it looks like the wrong refs are used

03:08 <airlied> the drivers generate the RefPicList from the VK structure ordering

03:08 <airlied> now I've no idea how nvidia generate their RefPicList

03:09 <Lynne> RefPicList you say? we have a list named like that in our vaapi code

03:10 <Lynne> if you're meant to generate that same list via the given 3 lists of before/after/cur refs, should compare them

03:10 <airlied> nope it is what those lists refer to

03:10 <airlied> you can't generate it from them

03:10 <airlied> the values in the before/after/cur arrays are indexes into the refpiclist

03:10 <airlied> the refpiclist contains indexes into the DPB
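
The double indirection airlied describes can be sketched as follows. This only mirrors the description in this log (before/after/cur entries index the RefPicList, whose entries index the DPB); the function name and sample values are made up, and it has not been checked against the H.265 or Vulkan video specifications.

```python
def resolve_refs(dpb, ref_pic_list, ref_pic_set):
    """Map a before/after/cur array to pictures via two lookups.

    ref_pic_set entries index ref_pic_list;
    ref_pic_list entries index the DPB.
    """
    return [dpb[ref_pic_list[i]] for i in ref_pic_set]


# Made-up example data: four DPB slots, three entries in the RefPicList.
dpb = ["pic_a", "pic_b", "pic_c", "pic_d"]
ref_pic_list = [2, 0, 3]
st_curr_before = [1, 0]
st_curr_after = [2]

before = resolve_refs(dpb, ref_pic_list, st_curr_before)
after = resolve_refs(dpb, ref_pic_list, st_curr_after)
```

Under this reading, the before/after arrays cannot be compared directly with the RefPicList, since they live one level of indirection apart.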

03:14 <airlied> Lynne: okay with all those fixed up B frames work for me as well

03:14 <airlied> now the question is whether we've just created a radv specific API user

03:16 <Lynne> I compared vaapi's refpiclist with RefPicSetStCurrAfter

03:16 <Lynne> they match if j + (j >= key_idx); gets changed to j; in vulkan

03:17 <Lynne> changing this value in vulkan does not actually change the output at all

03:17 <airlied> https://paste.centos.org/view/raw/63b9fcc3 is a cleaned up works on radv version

03:18 <airlied> Lynne: they aren't the same

03:18 <airlied> so shouldn't match

03:18 <airlied> RefPicList[RefPicSetStCurrAfter[i]] is how it works

03:19 <airlied> the question on vulkan is how the driver builds RefPicList

03:19 <airlied> since it's not passed in, so it must come from the slotIndexes in the picture list

03:19 <airlied> reference list

03:19 <airlied> tbh I think the spec is vague here

03:24 <Lynne> hey, that's pretty good, works here on everything I threw at it!

03:25 <Lynne> let me test on nvidia

03:26 <Lynne> novidya :(

03:26 <Lynne> either nvidia's implementation is broken, or we've created a radv-specific parameter version

03:26 <airlied> I don't think the spec is clear enough on those arrays

03:27 <airlied> in fact I think it just hides them in "h265 specifics"

03:27 <Lynne> it fails after frame 5 like before, with an output that makes it look like ref frames are blank

03:28 <airlied> I'm writing a public issue to see what we can find out

03:28 <airlied> it might be worth trying to change slots[j] to j

03:29 <airlied> Lynne: okay filed 2010

03:30 <Lynne> changing slots[j] to j fixes nvidia

03:30 <airlied> Lynne: okay

03:30 <airlied> who is right :-P

03:30 <Lynne> we are, until proven otherwise

03:30 * airlied goes to read slotIndex for the n+1th time

03:30 <Lynne> I'll ping the nvidia guy on the thread to take a look at it along with h264

03:31 <Lynne> he still hasn't checked h264 despite finishing that weeks ago

03:31 <Lynne> well, onwards to encoding now! the night is still young!

03:33 <airlied> Lynne: I tried changing radv to work with j, but the hw seems to dislike it

03:33 <Lynne> neat, it seems like the magic internal 10bit to 8bit conversion works

03:33 <airlied> Lynne: okay I can make radv work with j

03:35 <airlied> so maybe that is the correct answer

03:37 <Lynne> btw 10bit doesn't work because the driver only returns NV12 as a possible surface to decode into

03:38 <Lynne> I mean it works through magic, but you get 8bit output, not 10

03:38 <airlied> I suspect the surface format is just the first step

03:39 <airlied> https://paste.centos.org/view/f2634ea4 works for me now on radv

03:39 <airlied> Lynne: got a video/command line I can test 10bit with?

03:50 heat_ has joined #dri-devel

03:50 heat has quit [Read error: Connection reset by peer]

03:50 <airlied> Lynne: radv should report a 10-bit format now

03:53 <airlied> and might even do the right thing now

03:55 <Lynne> how do you deal with the messed up p010 format, which isn't really p010?

03:55 <Lynne> because in vulkan, the 6bit padding is in the LSBs

03:56 <Lynne> whilst everyone else puts the padding in the MSBs afaik for p010

03:57 <airlied> that's why I need some testing, I've no idea about how it works

03:58 <airlied> what does nvidia return?

03:58 <Lynne> it returns an image, which looks wrong

03:58 <airlied> but what format?

03:59 <Lynne> well, VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16

04:01 <Lynne> updated my repo

04:01 <Lynne> radv returns an init error when trying to decode

04:01 <Lynne> command line is

04:02 <Lynne> ./ffmpeg_g -init_hw_device "vulkan=vk:0,debug=1" -hwaccel vulkan -hwaccel_output_format vulkan -i <INPUT> -loglevel debug -filter_hw_device vk -vf hwdownload,format=p016 -c:v rawvideo -an -y <OUTPUT>

04:03 <Lynne> for a test file, give me a sec

04:04 <airlied> Lynne: do you fill lumaBitDepth out correctly in the profile?

04:04 * airlied assumes that will be 10-bit

04:05 <airlied> or maybe I'm mixing up how I should be deciding on the formats to report

04:06 <Lynne> no

04:06 <Lynne> err, yes

04:07 <Lynne> sample - https://files.lynne.ee/TEST_HEVC_QP8_WITH_B_32GOP_10BIT.mkv

04:07 <Lynne> lumadepth is set to VK_VIDEO_COMPONENT_BIT_DEPTH_10_BIT_KHR

04:08 <Lynne> what endianness did vulkan specify packed stuff to be in?

04:08 <Lynne> was it big or little?

04:09 <airlied> doh, fixed the start of it, now it dies later

04:10 <airlied> the spec writes out the expectation in sentences for all the packed formats

04:10 <Lynne> yeah, it mentions no endianness there, so I presume either native endian or big endian?

04:12 <airlied> VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 specifies an unsigned normalized multi-planar format that has a 10-bit G component in the top 10 bits of each 16-bit word of plane 0, and a two-component, 32-bit BR plane 1 consisting of a 10-bit B component in the top 10 bits of the word in bytes 0..1, and a 10-bit R component in the top 10 bits of the word in bytes 2..3, with the bottom 6 bits of

04:12 <airlied> each word unused.

04:12 <airlied> so that seems like it says, G10 is top bits

04:13 <Lynne> so big endian?

04:14 <Lynne> usually endianness is on a lower level so they don't mention it and assume native endian

04:16 <airlied> " The in-memory ordering of bytes within a component is determined by the host endianness."

04:18 <Lynne> right, native then
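
A short sketch of the distinction reached above: "top 10 bits of the 16-bit word" describes the value of the word, while the order of its two bytes in memory is a separate, host-dependent matter. The helper name is illustrative.

```python
# A 10-bit sample of all ones placed in the top 10 bits of a 16-bit word.
WORD = 0x3FF << 6  # 0xFFC0, bottom 6 bits unused

# The same word has two possible in-memory byte sequences.
little = WORD.to_bytes(2, "little")  # b"\xc0\xff"
big = WORD.to_bytes(2, "big")        # b"\xff\xc0"


def sample_from_bytes(raw, byteorder):
    """Rebuild the 16-bit word from memory, then take its top 10 bits."""
    word = int.from_bytes(raw, byteorder)
    return word >> 6
```

Reading the bytes with the matching byte order gives back the sample; reading little-endian bytes as big-endian gives a different value, which is the kind of mismatch the question about endianness was probing.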

04:20 bmodem has joined #dri-devel

04:20 <airlied> Lynne: okay gets further now

04:20 heat_ has quit [Ping timeout: 480 seconds]

04:31 <airlied> Lynne: so you are looking at h264 encoding first?

04:33 <Lynne> doesn't really matter whether it's hevc or h264, hardware encoding APIs are all clean enough to let you implement anything with the same codepath

04:34 <Lynne> give me a few moments to verify 10bit uploading and downloading works so you can be sure the driver's wrong

04:41 <Lynne> err... no 10bit formats are supported at all by radv?

04:41 <Lynne> no VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16, no VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16

04:41 <Lynne> not listed in vulkaninfo --show-formats

04:42 cengiz_io has joined #dri-devel

04:42 <airlied> oh I should see why not

04:43 <airlied> I'm just showing them in the video formats, didn't check out the radv bits

04:46 <Lynne> on nvidia, verified VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16 works

04:46 <Lynne> but VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 gives clearly wrong output

04:46 <Lynne> I'm not sure if it's a driver bug or I'm doing something odd (shouldn't be, it's just upload+download)

04:47 <airlied> Lynne: I'm seeing FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 in supported format

04:48 <Lynne> not here on navi21

04:49 <airlied> Lynne: using the top of my branch?

04:50 <airlied> https://paste.centos.org/view/raw/f9be8edb

04:54 <Lynne> yup, works now, output looks wrong

04:54 <Lynne> you can use "./ffmpeg_g -init_hw_device "vulkan=vk:0,debug=1" -i <INPUT> -filter_hw_device vk -vf "format=p016,hwupload,hwdownload,format=p016" -c:v rawvideo -y TEST.nut" to test

04:54 <Lynne> I'm compiling on my intel machine so I have a third ref

04:57 <alyssa> I think I'm doing something nuts

04:57 <HdkR> :O

04:57 <alyssa> and it's for a bad cause

04:57 <alyssa> (moar fps on bifrost)

04:59 <Lynne> not supported on intel too

05:03 <airlied> Lynne: why do you call it P016?

05:05 <Lynne> well, vulkan puts the padding in the LSBs for p010, ffmpeg expects the padding in the MSBs since that's what everyone else uses afaik

05:05 <Lynne> so treating it like p016 is a way of solving this, provided the padding is always 0s

05:07 <airlied> the microsoft docs seem to suggest ls

05:07 <airlied> lsb

05:07 <airlied> "When the graphics hardware reads a surface that contains a 10-bit representation, it should ignore the low-order 6 bits of each channel. If a surface contains valid 16-bit data, however, it should be identified as a 16-bit surface."

05:12 <Lynne> right, you're right, ffmpeg does expect padding in the LSBs

05:12 <Lynne> (why do I remember it being the other way around? can't remember)

05:17 <Lynne> right, now I remember, ffmpeg expects padding for 3plane formats in the LSBs

05:17 <Lynne> *MSBs

05:18 <Lynne> so yuv420p10 is <padding><data>
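
A minimal sketch of the two 10-bit layouts being discussed, assuming 16-bit words throughout: P010-style words carry the data in the top 10 bits with padding in the low 6, and yuv420p10-style words carry the data in the low 10 bits. The function names are made up, and the sketch works on plain lists of integers rather than real image planes.

```python
def p010_to_low_aligned(words):
    """<data><padding> words to <padding><data> words."""
    return [(w >> 6) & 0x3FF for w in words]


def low_aligned_to_p010(samples):
    """<padding><data> words to <data><padding> words, zero padding."""
    return [(s & 0x3FF) << 6 for s in samples]


def padding_is_zero(words):
    """True if the low 6 bits of every P010-style word are zero.

    This is the condition mentioned above under which treating the
    data as p016 is harmless.
    """
    return all(w & 0x3F == 0 for w in words)
```

For example, p010_to_low_aligned([0xFFC0]) yields [1023], and converting back with low_aligned_to_p010 restores 0xFFC0.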

05:19 <Lynne> updated my repo, use p010 in both places in the command line

05:19 <Lynne> output is still wrong

05:19 <Lynne> and output is wrong in slightly different ways on nvidia and radv...

05:21 lina has quit [Remote host closed the connection]

05:21 <alyssa> HdkR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20311/diffs?commit_id=7103a3b7380c43f07378e3d050bf2172f468b83c

05:21 <alyssa> Maybe I got a bit carried away.

05:22 kchibisov_ has left #dri-devel [#dri-devel]

05:22 <airlied> Lynne: you seeing the yellow and red on radv?

05:23 <Lynne> yup, on nvidia it's read and green

05:23 <Lynne> *red

05:24 kchibisov has joined #dri-devel

05:26 <HdkR> alyssa: ooo, fancy swizzles

05:26 lina has joined #dri-devel

05:27 <airlied> Lynne: with the cmd line above I don't seem to be hitting radv at all

05:28 <airlied> at least not the decoder

05:29 <Lynne> yeah, that's right, it's just an upload+download test

05:29 <airlied> ah cool

05:29 <Lynne> if you'd like to run decoding, use the command line you've used previously but with p010 instead of nv12

05:30 <Lynne> but if upload+download looks wrong, it's probably good to fix that first

05:32 <airlied> Lynne: btw they are only ratifying decode this year, encode is into next q

05:34 <Lynne> huh, okay. I bet this information alone is under NDA, so I'll pretend I didn't hear it.

05:37 off^ has joined #dri-devel

05:39 * airlied gotta finish for today, but yeah forget I said it, but also why I'm not as pushed on encode :-P

05:41 aravind has joined #dri-devel

05:42 <Lynne> I am slightly miffed, no video devs/users were involved in the standardization process, officially or unofficially (I did come in here and ask a few times for a bone, since it would've really saved on rewriting vulkan code while I was writing it)

05:44 <lina> I'm starting to suspect that the remaining KWin and Firefox glitches/issues I'm running into might be related to CPU out of order execution or other craziness like that... let's see if I can get down to the bottom of it today, but this is really weird...

05:44 <Lynne> and at the end, the only serious implementation was written by the busiest person in the world and someone not really paid to work on it

05:44 <lina> Valgrind makes it work properly, apitrace makes it work properly...

05:45 <airlied> Lynne: yes it's pretty crazy, the commitments from the gpu vendors have been pretty lax

05:45 dakr has quit [Read error: No route to host]

05:47 dakr has joined #dri-devel

05:48 Duke`` has joined #dri-devel

05:49 <DemiMarie> lina: are you using break-before-make when changing GPU page tables?

05:51 <lina> It's not that, the badness happens at the application level

05:52 <DemiMarie> Still, you really ought to be

05:52 <DemiMarie> This assumes that the GPU works like the CPU, which it probably does.

05:52 <lina> We always just map or unmap, there are no mutation operations

05:54 <DemiMarie> I see

06:09 lemonzest has joined #dri-devel

06:13 nchery has quit [Ping timeout: 480 seconds]

06:13 tzimmermann has joined #dri-devel

06:17 bgs has joined #dri-devel

06:17 itoral has joined #dri-devel

06:21 Duke`` has quit [Ping timeout: 480 seconds]

06:31 danvet has joined #dri-devel

06:35 nchery has joined #dri-devel

06:36 ahajda__ has joined #dri-devel

06:46 <Lynne> airlied: 10bit decoding actually works

06:46 <Lynne> it's the format itself which seems to be wrong

06:47 <Lynne> if you add ,format=yuv420p10 after the ,format=p010 the output is correct

06:47 <Lynne> for both decoding and the hwupload+download test

06:48 <Lynne> OH, now I realize why

06:49 <Lynne> no one bothered to standardize packed format for NUT

06:49 <Lynne> (the container/protocol I'm working on and should've been working on instead does support it, but it's still wip)

06:50 <Lynne> if you save it to raw raw or convert it to planar yuv it works, so no issues with anything

06:57 <Lynne> yup, works on Nvidia as well

06:58 <Lynne> the output, however, isn't fully okay on Nvidia, the colors are off

06:59 <Lynne> RADV's output is compliant and agrees with the software decoder, but Nvidia has visible chroma artifacts if you turn down the gamma

07:02 <Lynne> only happens on 10bits

07:03 <Lynne> either way, a problem with nvidia, probably misreading header values

07:05 kts has joined #dri-devel

07:13 <Lynne> so, uh, now there really is nothing left to do but work on encoding

07:25 <Lynne> (also, btw, since vulkan has a mechanism to expose which formats vulkan can decode into, for 10bits, radv ought to signal both nv12 and p010 for 10bit content and let clients choose the best one)

07:29 alanc has quit [Remote host closed the connection]

07:29 <Lynne> also the driver ought to signal 8 for maximum active refs for hevc, not 17

07:30 alanc has joined #dri-devel

07:31 <Lynne> (I think, if maximum active refs is the total number of refs a frame can use rather than the total number of refs in the DPB)

07:40 <airlied> Lynne: oh okay I'll fix the format outputs

07:44 <airlied> okay pushed out both of those fixes

07:44 mvlad has joined #dri-devel

07:44 <Lynne> I updated my repo and post with I think most of the spec issues I found

07:48 <Lynne> something seems to have gone wrong, tons of validation layers now

07:52 camus has quit [Remote host closed the connection]

07:54 <airlied> oops I think I mistyped

07:54 frieder has joined #dri-devel

07:54 <airlied> oh no that isn't it

07:54 camus has joined #dri-devel

07:55 <airlied> the image format stuff looks broken somewhere

07:55 rasterman has joined #dri-devel

07:55 <Lynne> no, my code was bad, it's good now

07:57 rmckeever has quit [Quit: Leaving]

07:58 lynxeye has joined #dri-devel

07:59 <Lynne> branch updated; it's 9 in the morning here

08:06 <Lynne> by the way, decoding is veeery slow compared to vaapi, 1080p hevc is 700fps for vaapi but only 60fps for vulkan

08:06 <Lynne> where is the extra overhead coming from?

08:07 <Lynne> no downloading or presenting at all, this is just ffmpeg decoding and then doing nothing with the frames

08:08 warpme_____ has joined #dri-devel

08:08 <Lynne> seems very CPU-bound, perf would probably expose it, but my perf is broken after I forgot to update it after my kernel, so that's a mystery for another time

08:11 fab has joined #dri-devel

08:12 tursulin has joined #dri-devel

08:13 junaid has joined #dri-devel

08:15 macromorgan has quit [Read error: Connection reset by peer]

08:22 junaid has quit [Remote host closed the connection]

08:23 <airlied> Lynne: not sure where, probably not getting the pipelining very right, or waiting for some hw takes a lot longer

08:29 jkrzyszt has joined #dri-devel

08:36 sgruszka has joined #dri-devel

08:49 MajorBiscuit has joined #dri-devel

08:54 dcz_ has joined #dri-devel

08:57 junaid has joined #dri-devel

08:58 rgallaispou has joined #dri-devel

09:04 apinheiro has joined #dri-devel

09:04 djbw has quit [Read error: Connection reset by peer]

09:09 aknautiy_ has quit [Remote host closed the connection]

09:12 Akari has joined #dri-devel

09:12 Daaanct12 has joined #dri-devel

09:27 YuGiOhJCJ has quit [Quit: YuGiOhJCJ]

09:31 Daaanct12 has quit [Remote host closed the connection]

09:37 junaid has quit [Ping timeout: 480 seconds]

09:43 junaid has joined #dri-devel

09:44 agd5f has quit [Ping timeout: 480 seconds]

09:53 junaid has quit [Ping timeout: 480 seconds]

09:56 sarahwalker has joined #dri-devel

09:58 sgruszka has quit [Ping timeout: 480 seconds]

10:10 vliaskov has joined #dri-devel

10:14 Company has joined #dri-devel

10:16 TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

10:16 TMM has joined #dri-devel

10:25 dcz_ has quit [Ping timeout: 480 seconds]

10:26 srslypascal is now known as Guest2105

10:27 srslypascal has joined #dri-devel

10:27 MajorBiscuit has quit [Ping timeout: 480 seconds]

10:29 devilhorns has joined #dri-devel

10:33 Guest2105 has quit [Ping timeout: 480 seconds]

10:33 sgruszka_ has joined #dri-devel

10:33 natto has quit []

10:38 natto has joined #dri-devel

10:38 natto has quit []

10:42 natto has joined #dri-devel

10:43 <daniels> mupuf: you were already looking at fixing this right? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/32797788

10:43 <mupuf> daniels: Yeah, let me check if it used the fixed version or not

10:43 <daniels> oh, mid-air collision :)

10:44 <mupuf> yeah, it was using the older version. Let me check that we are indeed using the new versions now

10:44 natto has quit []

10:44 <mupuf> yeah, we are. So, it shouldn't happen anymore (we retry 3 times to download artifacts, and if it still fails, we ignore the missing files and pretend all is fine)
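
The retry behaviour mupuf describes (three download attempts, then carry on without the files) can be sketched as below. This is only an illustration in C, not the actual CI code: fetch_fn, fetch_artifacts_best_effort and flaky_fetch are hypothetical names, and the download itself is replaced by a stand-in callback.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ATTEMPTS 3

/* Hypothetical download callback: returns true on success. */
typedef bool (*fetch_fn)(void *ctx);

/*
 * Try the download up to MAX_ATTEMPTS times. Returns true if the artifacts
 * were fetched. Returns false if every attempt failed; the caller is then
 * expected to continue without the files instead of failing the job.
 */
static bool fetch_artifacts_best_effort(fetch_fn fetch, void *ctx,
					int *attempts_used)
{
	assert(fetch && attempts_used);

	for (*attempts_used = 1; *attempts_used <= MAX_ATTEMPTS; (*attempts_used)++) {
		if (fetch(ctx))
			return true;
	}
	*attempts_used = MAX_ATTEMPTS;
	return false;
}

/* Stand-in for a flaky download: fails until the counter in ctx hits zero. */
static bool flaky_fetch(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```

A caller would log a warning when the function returns false and then continue, which matches the "ignore the missing files" behaviour described above.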

10:44 natto has joined #dri-devel

10:47 natto has quit []

10:49 natto has joined #dri-devel

10:59 srslypascal has quit [Ping timeout: 480 seconds]

11:16 natto has quit []

11:18 natto has joined #dri-devel

11:51 MajorBiscuit has joined #dri-devel

12:02 pcercuei has joined #dri-devel

12:08 srslypascal has joined #dri-devel

12:25 Daanct12 has quit [Quit: Quitting]

12:29 jkrzyszt has quit [Remote host closed the connection]

12:32 aravind has quit [Remote host closed the connection]

12:34 junaid has joined #dri-devel

12:36 MajorBiscuit has quit [Quit: WeeChat 3.6]

12:36 MajorBiscuit has joined #dri-devel

12:37 Lucretia has quit [Read error: Connection reset by peer]

12:40 jkrzyszt has joined #dri-devel

12:41 sarahwalker has quit [Remote host closed the connection]

12:42 Lucretia has joined #dri-devel

12:48 srslypascal has quit [Quit: Leaving]

12:56 jkrzyszt has quit [Remote host closed the connection]

13:01 dcz_ has joined #dri-devel

13:14 heat_ has joined #dri-devel

13:16 junaid has quit [Ping timeout: 480 seconds]

13:18 jkrzyszt has joined #dri-devel

13:20 itoral has quit [Remote host closed the connection]

13:20 sgruszka_ has quit [Ping timeout: 480 seconds]

13:27 columbarius has joined #dri-devel

13:28 co1umbarius has quit [Ping timeout: 480 seconds]

13:34 kts has quit [Quit: Leaving]

13:45 co1umbarius has joined #dri-devel

13:45 sarahwalker has joined #dri-devel

13:47 columbarius has quit [Ping timeout: 480 seconds]

13:49 <LordKalma> so I managed to finish that driver that was causing me to bang my head against the wall

13:49 <LordKalma> https://github.com/ruilvo/panel-jinglitai-jlt4013a

13:49 <LordKalma> do you reckon this is upstream-worthy material?

13:55 sgruszka has joined #dri-devel

13:57 devilhorns has quit []

14:03 yuq825 has left #dri-devel [#dri-devel]

14:05 agd5f has joined #dri-devel

14:05 <CounterPillow> imho don't credit their author if they chose to not comply with the GPL, you wrote the source after all

14:06 <CounterPillow> oh nvm, you do that

14:06 <agd5f> bnieuwenhuizen, it was sent a while ago, just waiting for the linux-firmware maintainers to apply it

14:06 jkrzyszt has quit [Remote host closed the connection]

14:08 <bnieuwenhuizen> agd5f: I think they just did :) thx!

14:08 <agd5f> excellent

14:09 columbarius has joined #dri-devel

14:09 co1umbarius has quit [Ping timeout: 480 seconds]

14:12 kts has joined #dri-devel

14:17 <LordKalma> CounterPillow, it's just in the readme.md as a factoid

14:19 <CounterPillow> LordKalma: I think instead of the pr_ commands with a manual prefix you should use the dev_info macro with the device as an argument (you should be able to get it from struct drm_panel's dev member)

14:19 <LordKalma> in fact, we can remove those debug messages now

14:19 <CounterPillow> if you can obtain a datasheet of the panel somehow and document what some of those magic values are, that would be pretty cool too.

14:19 <LordKalma> I do have a datasheet

14:20 <LordKalma> I don't remember where I got the controller datasheet

14:20 <LordKalma> I know I have the *panel* datasheet legitimately

14:20 <CounterPillow> that's fine, doesn't have to be linked to

14:20 <LordKalma> as in, I asked the supplier

14:21 <CounterPillow> naming the commands (0xEB, 0xEC, 0xED) in actual defines would be nice

14:21 <LordKalma> fair as well

14:22 <LordKalma> as you can probably guess this was an adventure

14:23 <LordKalma> I first reverse engineered the source code from the driver on the vendor's firmware

14:23 <CounterPillow> yeah, I figured when you mentioned ghidra was involved :D

14:23 <LordKalma> but the panel didn't work... then another interested person showed up...

14:23 <LordKalma> and we discovered there was a second half (well, first half) of the driver code... in u-boot

14:23 <LordKalma> u-boot initialized the panel, the driver only picked up an initialized panel

14:24 <LordKalma> the crap vendors do... hahaha

14:24 <CounterPillow> Yeah :/

14:24 <LordKalma> I get it's so they can have the AMAZING(TM) splash screen

14:24 <LordKalma> but still...

14:26 <LordKalma> you can see the commits... I did the kernel part, someone else did the whole process of figuring out the boot sequence

14:26 <LordKalma> so it's rough around the edges

14:26 Akari has quit [Quit: segmentation fault (core dumped)]

14:28 <CounterPillow> btw that pr_warn should be a dev_warn too. More importantly, if it uses SPI, is there some register stuff going on that you could use a regmap for, instead of manually writing the byte sequence on the wire?

14:29 <CounterPillow> also that do { } while (0) in the try macro seems strange to me

14:30 <CounterPillow> it seems to only break if it fails? what?

14:30 <LordKalma> that was written by my colleague.. I too was wondering why it was

14:30 <LordKalma> apparently it's a very common macro thing

14:30 <CounterPillow> oh, while(0), not while(1), silly me

14:30 <CounterPillow> okay I see

14:30 <LordKalma> https://stackoverflow.com/a/4674580/5168563

14:30 <LordKalma> ^that

14:32 <LordKalma> it's the poor man's way of making a complicated thing a single statement

14:32 pcercuei has quit [Read error: Connection reset by peer]

14:32 <CounterPillow> looks like that's done elsewhere in the kernel as well
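
For readers unfamiliar with the idiom discussed above, here is a minimal sketch of a "try" macro wrapped in do { } while (0). It is not the macro from LordKalma's driver: TRY, send_step and init_sequence are hypothetical names chosen for illustration. The wrapper makes the macro expand to exactly one statement, so it can sit in an unbraced if/else and still take a trailing semicolon.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: evaluate expr once and, if it yields a nonzero
 * error code, return that code from the enclosing function.
 */
#define TRY(expr)                     \
	do {                          \
		int err_ = (expr);    \
		if (err_)             \
			return err_;  \
	} while (0)

/* Stand-in for one init command: fails with -EIO when asked to. */
static int send_step(int fail, int *count)
{
	if (fail)
		return -EIO;
	(*count)++;
	return 0;
}

static int init_sequence(int with_reset, int fail_last, int *steps_done)
{
	assert(steps_done != NULL);
	*steps_done = 0;

	/* With a bare { } block instead of do/while, "TRY(...); else" would not compile. */
	if (with_reset)
		TRY(send_step(0, steps_done));
	else
		puts("skipping reset");

	TRY(send_step(fail_last, steps_done));
	return 0;
}
```

The loop body runs exactly once, because the condition is the constant 0; the construct exists purely for its syntactic effect.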

14:33 <LordKalma> re regmaps: not sure about those because of the DCX pin

14:33 pcercuei has joined #dri-devel

14:33 <LordKalma> that's "weird" SPI, not "normal" SPI

14:33 <CounterPillow> I see

14:33 <LordKalma> there's this special DCX wire

14:33 <LordKalma> you can see that st7701s_write_data does gpiod_set_value(ctx->dcx, 1); and st7701s_write_command does gpiod_set_value(ctx->dcx, 0);

14:36 <LordKalma> that controller supports 9 bit SPI and 16 bit SPI without DCX wire, and 8 bit SPI with DCX wire

14:36 <LordKalma> this is just the easier way

14:36 <LordKalma> (and the way the panel we're using is wired anyway)
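
The 8-bit SPI with DCX scheme described above can be sketched in user space with a mock bus. Only the DCX polarity comes from the discussion (0 for commands, 1 for data, as in st7701s_write_command and st7701s_write_data); struct mock_bus, bus_send, panel_write_command and panel_write_data are hypothetical stand-ins, not the driver's code and not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock bus: records each byte together with the DCX level it was sent at. */
struct mock_bus {
	int dcx;              /* current level of the DCX line */
	uint8_t bytes[16];    /* bytes clocked out so far */
	uint8_t dcx_log[16];  /* DCX level observed for each byte */
	size_t len;
};

static void bus_send(struct mock_bus *bus, uint8_t byte)
{
	assert(bus->len < sizeof(bus->bytes));
	bus->bytes[bus->len] = byte;
	bus->dcx_log[bus->len] = (uint8_t)bus->dcx;
	bus->len++;
}

/* DCX low tells the controller that the byte is a command. */
static void panel_write_command(struct mock_bus *bus, uint8_t cmd)
{
	bus->dcx = 0;
	bus_send(bus, cmd);
}

/* DCX high tells the controller that the bytes are data for that command. */
static void panel_write_data(struct mock_bus *bus, const uint8_t *data, size_t n)
{
	bus->dcx = 1;
	for (size_t i = 0; i < n; i++)
		bus_send(bus, data[i]);
}
```

In a real driver the DCX assignment would be a GPIO write and bus_send an SPI transfer; the point here is only that the same eight data bits mean different things depending on the side-band line.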

14:36 <LordKalma> we're reverse engineering the software of a radio haha

14:36 <CounterPillow> Looks like you might be able to use regmaps with an appropriate regmap_config https://elixir.bootlin.com/linux/latest/source/include/linux/regmap.h#L291

14:37 jkrzyszt has joined #dri-devel

14:38 <LordKalma> I'll look into it, thanks

14:46 zehortigoza has quit [Remote host closed the connection]

14:47 <javierm> CounterPillow, LordKalma: that's exactly what the ssd130x-spi driver does: https://elixir.bootlin.com/linux/v6.1-rc8/source/drivers/gpu/drm/solomon/ssd130x-spi.c#L21

14:48 <LordKalma> oh cool, thanks

14:51 zehortigoza has joined #dri-devel

14:53 <LordKalma> javierm, where is ssd130x_spi_write actually used?

14:54 <javierm> LordKalma: https://elixir.bootlin.com/linux/v6.1-rc8/source/drivers/gpu/drm/solomon/ssd130x-spi.c#L57

14:55 <javierm> it's the regmap .write implementation for the SPI driver

14:56 <javierm> LordKalma: the driver logic is independent of the transport bus used, and just gets a regmap: https://elixir.bootlin.com/linux/v6.1-rc8/source/drivers/gpu/drm/solomon/ssd130x.c#L972

14:57 <LordKalma> ahh I see

14:57 <LordKalma> thanks

14:58 <javierm> LordKalma: you are welcome. The driver then just does regmap_bulk_write(ssd130x->regmap, ...) or regmap_write(ssd130x->regmap, ...) and doesn't care whether it is using 4-wire SPI or I2C

14:58 <javierm> I would suggest a similar design

14:59 fab has quit [Quit: fab]

15:00 <javierm> LordKalma: especially since you said that the chip also supports 9-bit and 16-bit SPI, so you could have different write implementations for those
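
The design javierm suggests, core logic that only sees an abstract write operation with one backend per wiring mode, can be sketched without the kernel as follows. This is not the regmap API: struct panel_transport, panel_core_set and the two backends are hypothetical, and the backends only count the bits they would clock out. The 9-bit backend assumes the data/command flag travels as an extra bit in each word, which is how such modes commonly work but is not confirmed for this controller in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport interface, loosely analogous to a regmap bus. */
struct panel_transport {
	int (*write)(void *ctx, uint8_t reg, const uint8_t *val, size_t n);
	void *ctx;
};

/* Core driver logic: knows about registers, not about wires. */
static int panel_core_set(const struct panel_transport *t, uint8_t reg,
			  uint8_t val)
{
	assert(t && t->write);
	return t->write(t->ctx, reg, &val, 1);
}

/* Backend for 8-bit SPI: the DCX flag is carried on a separate GPIO. */
static int write_8bit_dcx(void *ctx, uint8_t reg, const uint8_t *val, size_t n)
{
	(void)reg;
	(void)val;
	*(size_t *)ctx += 8 * (1 + n);
	return 0;
}

/* Backend for 9-bit SPI: assumed to carry the flag as a 9th bit per word. */
static int write_9bit(void *ctx, uint8_t reg, const uint8_t *val, size_t n)
{
	(void)reg;
	(void)val;
	*(size_t *)ctx += 9 * (1 + n);
	return 0;
}
```

Swapping the write pointer is the only change needed to move the same core code to a different wiring, which is the property the ssd130x driver gets from regmap.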

15:05 kts has quit [Quit: Leaving]

15:05 jkrzyszt has quit [Ping timeout: 480 seconds]

15:11 macromorgan has joined #dri-devel

15:27 junaid has joined #dri-devel

15:27 vliaskov has quit [Remote host closed the connection]

15:32 Duke`` has joined #dri-devel

15:32 junaid has quit [Remote host closed the connection]

15:33 fab has joined #dri-devel

15:54 tchar_ has quit []

15:54 tchar has joined #dri-devel

16:10 jkrzyszt has joined #dri-devel

16:13 lemonzest has quit [Quit: WeeChat 3.6]

16:14 alyssa has left #dri-devel [#dri-devel]

16:23 JohnnyonFlame has joined #dri-devel

16:40 jkrzyszt has quit [Remote host closed the connection]

16:41 frieder has quit [Remote host closed the connection]

16:41 djbw has joined #dri-devel

16:50 srslypascal has joined #dri-devel

16:58 <Lynne> airlied: wrote https://lynne.ee/drafts/vulkan-video-decoding.html

16:58 <Lynne> some proofreading would be nice

17:02 jkrzyszt has joined #dri-devel

17:03 sgruszka has quit [Remote host closed the connection]

17:06 <psykose> the 'to configure' section does && make -j40 but then says type make -j0 to build it, should just drop the former

17:06 <psykose> (minor nit)

17:07 maxzor has joined #dri-devel

17:07 <Lynne> fixed it, thanks

17:08 <digetx> tarceri: hello, could you please take a look at these Mesa cache MRs once you have time:

17:08 <digetx> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18551

17:08 <digetx> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19328

17:08 <digetx> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20256

17:10 bmodem has quit [Ping timeout: 480 seconds]

17:12 lynxeye has quit [Quit: Leaving.]

17:12 <jenatali> Lynne: Good read :)

17:13 <Ristovski> Lynne: are there benchmarks for encode/decode vs lets say VAAPI on radeonsi?

17:13 <Ristovski> but indeed, good read

17:13 tursulin has quit [Ping timeout: 480 seconds]

17:13 <jenatali> Btw, not sure if you'd seen, but we recently helped Intel bring VAAPI to Windows

17:15 <Ristovski> jenatali: didn't Intel also use DXVK on Windows in their latest GPU stack (for their new dGPUs iirc?), quite interesting

17:15 <jenatali> Seems that way

17:15 <Ristovski> yeah: https://game.intel.com/story/intel-arc-graphics-directx9

17:16 <Ristovski> mentioned only in the readme though :^)

17:39 sarahwalker has quit [Ping timeout: 480 seconds]

17:54 <airlied> Lynne: looks good

18:04 <Lynne> still a bit dry, I'll let it sit for half a day

18:04 <Lynne> couldn't figure out where the overhead came from

18:04 <Lynne> it was all from libc

18:04 <Lynne> but removing pretty much everything didn't make it better

18:08 <Lynne> libc/__subtf3 and kernel/clear_page_erms

18:09 JohnnyonFlame has quit [Read error: Connection reset by peer]

18:11 <HdkR> oh no, subtf3?

18:11 <HdkR> Who's using long double over here?

18:12 <HdkR> shame and smite

18:13 <Lynne> got rid of all vulkan calls during decoding, frames are only allocated once at the start from a pool, there's literally nothing that ought to make decoding wait

18:18 <Lynne> oh...

18:18 <Lynne> allocating 114 megs per-frame does that to you

18:19 <Lynne> guess even libc's allocator isn't fast enough, I'll have to implement proper size calculations and pooling
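
The pooling Lynne mentions can be sketched as a small free list of fixed-size buffers: a released buffer is kept and handed out again instead of going back to the allocator every frame. This is a minimal single-threaded illustration, not the FFmpeg implementation; struct buf_pool and the pool_* functions are hypothetical names, and POOL_SLOTS is an arbitrary cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4

/* Fixed-size buffer pool: released buffers are cached for reuse. */
struct buf_pool {
	size_t buf_size;
	void *free_list[POOL_SLOTS];
	size_t nfree;
	size_t fresh_allocs;  /* how often malloc() was actually called */
};

static void pool_init(struct buf_pool *p, size_t buf_size)
{
	assert(p && buf_size > 0);
	p->buf_size = buf_size;
	p->nfree = 0;
	p->fresh_allocs = 0;
}

/* Hand out a cached buffer if one exists, otherwise allocate a new one. */
static void *pool_get(struct buf_pool *p)
{
	if (p->nfree)
		return p->free_list[--p->nfree];
	p->fresh_allocs++;
	return malloc(p->buf_size);
}

/* Return a buffer to the pool; free it only when the cache is full. */
static void pool_put(struct buf_pool *p, void *buf)
{
	if (!buf)
		return;
	if (p->nfree < POOL_SLOTS)
		p->free_list[p->nfree++] = buf;
	else
		free(buf);
}

static void pool_destroy(struct buf_pool *p)
{
	while (p->nfree)
		free(p->free_list[--p->nfree]);
}
```

With a get/put pair per frame, the allocator is hit once instead of once per frame, which also avoids the kernel repeatedly clearing fresh pages for very large allocations.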

18:24 tzimmermann has quit [Quit: Leaving]

18:30 ybogdano has joined #dri-devel

18:32 heat__ has joined #dri-devel

18:32 heat_ has quit [Read error: No route to host]

19:03 djbw has quit [Read error: Connection reset by peer]

19:06 morphis has quit []

19:13 <Lynne> for those interested, for h264, vulkan is 30% slower than vaapi

19:15 <Lynne> could be improved by possibly using sparse buffers for slice data

19:15 agd5f has quit [Read error: Connection reset by peer]

19:15 <Lynne> and host mapping

19:16 <Lynne> the kernel seems to spend a bit of time in prepare_transfer

19:16 wens_ has joined #dri-devel

19:16 wens has quit [Read error: Connection reset by peer]

19:18 MajorBiscuit has quit [Ping timeout: 480 seconds]

19:22 agd5f has joined #dri-devel

19:31 rmckeever has joined #dri-devel

19:41 JoniSt has joined #dri-devel

19:44 <JoniSt> Hey, I'm trying to debug a rather nasty GPU hang on my W6800 at the moment that occurs when I launch ROCm workloads. I get SMU hangs. I noticed that the graphics clock of my card goes to its maximum as soon as ROCm starts up (but way before there's actual load on the GPU)... Does anyone know what code exactly pins the GPU at max frequency? Is it ROCm itself, amdkfd, or amdgpu?

19:47 <kisak> probably worth asking in the radeon-specific channel

19:48 <JoniSt> Ah, true :)

19:50 jkrzyszt has quit [Remote host closed the connection]

20:03 djbw has joined #dri-devel

20:04 Danct12 has quit [Quit: Quitting]

20:06 Danct12 has joined #dri-devel

20:19 apinheiro has quit [Ping timeout: 480 seconds]

20:28 alyssa has joined #dri-devel

20:28 <alyssa> return PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_FREEDRENO;

20:28 <alyssa> this is a normal thing to have in panfrost right?

20:28 <alyssa> normal.

20:28 <alyssa> thing.

20:40 q4a has joined #dri-devel

20:40 <q4a> it's already in zink

20:40 jkrzyszt has joined #dri-devel

20:42 <q4a> src/gallium/drivers/zink/zink_screen.c

20:45 <alyssa> zink uses it for turnip which is less egregious :p

20:46 Jeremy_Rand_Talos_ has quit [Remote host closed the connection]

20:47 Jeremy_Rand_Talos_ has joined #dri-devel

21:06 bgs has quit [Remote host closed the connection]

21:08 turol has quit [Ping timeout: 480 seconds]

21:17 mvlad has quit [Remote host closed the connection]

21:29 ybogdano has quit [Ping timeout: 480 seconds]

21:30 Haaninjo has joined #dri-devel

21:31 rmckeever has quit [Quit: Leaving]

21:32 danvet has quit [Ping timeout: 480 seconds]

21:47 rasterman has quit [Quit: Gettin' stinky!]

21:48 dcz_ has quit [Ping timeout: 480 seconds]

21:59 Duke`` has quit [Ping timeout: 480 seconds]

22:10 apinheiro has joined #dri-devel

22:12 JoniSt has quit [Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/]

22:20 rgallaispou1 has joined #dri-devel

22:20 fab has quit [Quit: fab]

22:22 heat__ has quit [Read error: Connection reset by peer]

22:22 heat__ has joined #dri-devel

22:24 rgallaispou1 has quit [Read error: Connection reset by peer]

22:25 rgallaispou has quit [Ping timeout: 480 seconds]

22:26 rgallaispou has joined #dri-devel

22:32 <Lynne> implemented pooling for headers in hevcdec, it's better now, 500fps for typical low bitrate 1080p

22:32 <Lynne> still not quite vaapi's 670fps

22:33 Akari has joined #dri-devel

22:36 <Lynne> "24.93% [kernel] [k] vcn_v3_0_ring_patch_cs_in_place"

22:36 <Lynne> hello

22:37 ahajda_ has joined #dri-devel

22:37 * airlied will be offline for a bit of today, will hopefully get time to dig a bit into perf later

22:37 <Lynne> np, I can move on to encoding now

22:42 <agd5f> Lynne, vcn_v3_0_ring_patch_cs_in_place is required because the VCN engines are asymmetric on some chips (e.g., first engine supports all codecs, second instance doesn't). So we need to see what codec the user is requesting and make sure the job ends up on the right engine

22:44 ahajda__ has quit [Ping timeout: 480 seconds]

22:48 <Lynne> right

22:49 <Lynne> on VAAPI, that symbol's nowhere near the top

22:49 <Lynne> is scheduling done differently?

22:50 <Lynne> btw nvidia speed for hevc 1080p - 800fps nvdec, 390 vulkan

22:53 heat__ has quit [Remote host closed the connection]

22:53 <agd5f> shouldn't be any different

22:54 * airlied wonders can vulkan expose the asymmetry to avoid that

22:55 <agd5f> airlied, we purposely did that so that the kernel could load balance properly

22:55 <agd5f> otherwise how do you know which queue to use in userspace

22:56 <agd5f> I guess we could have added a codec flag to the context or something like that
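
The placement problem agd5f describes (engines with different codec support, plus load balancing) can be sketched as picking the least-loaded engine whose capability mask includes the requested codec. This is a toy model, not the amdgpu code and not what vcn_v3_0_ring_patch_cs_in_place does internally; the enum, struct engine and pick_engine are hypothetical, and any capability table fed to it is invented by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Codec bits for a capability mask (illustrative values only). */
enum codec {
	CODEC_H264 = 1u << 0,
	CODEC_HEVC = 1u << 1,
	CODEC_AV1  = 1u << 2,
};

struct engine {
	uint32_t codecs;  /* bitmask of codecs this engine can decode */
	unsigned queued;  /* jobs currently queued on this engine */
};

/*
 * Return the index of the least-loaded engine that supports the codec,
 * or -1 if no engine does.
 */
static int pick_engine(const struct engine *e, int n, uint32_t codec)
{
	int best = -1;

	assert(e && n > 0);
	for (int i = 0; i < n; i++) {
		if (!(e[i].codecs & codec))
			continue;
		if (best < 0 || e[i].queued < e[best].queued)
			best = i;
	}
	return best;
}
```

If the codec were known up front, for example through a flag on the context as floated above, a lookup like this could happen once at setup time rather than by inspecting every submitted job.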

22:56 heat has joined #dri-devel

22:57 pcercuei has quit [Quit: dodo]

23:06 pjakobsson_ has joined #dri-devel

23:11 pjakobsson has quit [Ping timeout: 480 seconds]

23:22 jkrzyszt has quit [Remote host closed the connection]

23:37 jkrzyszt has joined #dri-devel

23:41 turol has joined #dri-devel

23:45 tobiasjakobi has joined #dri-devel

23:46 tobiasjakobi has quit [Remote host closed the connection]

23:48 apinheiro has quit [Quit: Leaving]

23:53 jkrzyszt has quit [Ping timeout: 480 seconds]

23:57 warpme_____ has quit []