ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<mattst88> DemiMarie: I'm unaware of any security issues in the GuC, past or present. I'm just aware of consistent performance problems when using it for command submission
iive has quit [Quit: They came for me...]
* airlied can't wait until someone hits gsprm fw with a fuzzer :-P
<airlied> then we can buffer overflow it and upload our own fw :-P
<alyssa> marcan: ^^ have fun
<airlied> get some rust on risc-v fw :-P
alyssa has quit [Quit: leaving]
alyssa has joined #dri-devel
ahajda__ has quit []
<alyssa> that's more lina and jekstrand's dept
<Lynne> airlied: rebased my patchset, going to compare indices with nvdec again
ybogdano has quit [Read error: Connection reset by peer]
<airlied> Lynne: now I can't get more than 2 good frames :-(
<Lynne> did I mess anything up?
<airlied> not sure, will have to check out what is going wrong
<Lynne> oh, I didn't merge your previous diff
<airlied> okay hacked up to 5 frames going again
<marcan> alyssa: :p
zf has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Haaninjo has quit [Quit: Ex-Chat]
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
zf has joined #dri-devel
<airlied> Lynne: so do you get any good frames from nvidia?
sarnex_ has quit []
sarnex has joined #dri-devel
zf has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
<Lynne> no, I mentioned last night it broke stuff there
<airlied> Lynne: yeah just wondering how badly
<Lynne> as badly as on AMD without your diff, let me check
<Lynne> yeah, I think so
<Lynne> with the difference nvidia's first 2 gops have completely green frames on I-frames if the indices are incorrect
* airlied resorts to reading the dxvk h265 spec
<DemiMarie> mattst88: then why use it instead of execlists and vexeclists?
zf has joined #dri-devel
<mattst88> DemiMarie: I honestly don't know. I've heard why it *could* be better, but I've only ever been aware of it being worse (e.g. in terms of GPU clock management)
<mattst88> what I heard when I was at Intel was that the GuC was created because Windows syscall overhead was super high, so being able to submit via the GuC avoided that overhead
<mattst88> and then because it existed, the powers that be wanted to have only one supported path, so naturally Linux must switch to GuC even if it didn't suffer from the problem the GuC was created to solve
<mattst88> but I don't know any of that first hand
<jekstrand> Part of the problem is that no one ever did the work to make i915 use the GuC properly.
<jekstrand> I think if you use the GuC as a proper submission firmware and stop trying to treat it like an execlist back-end, it probably works a lot better.
<jekstrand> Which doesn't mean you need to open it up to direct submit from userspace.
<jekstrand> direct-submit and "use the GuC properly" are two totally different things.
<alyssa> jekstrand: So, I hear there's this 3 letter company that has direct-submit for their hardware under Linux..
<DemiMarie> mattst88: what about pushing back on that and telling them that they need to keep supporting execlists?
ngcortes has quit [Quit: Leaving]
<mattst88> DemiMarie: we did for years
<DemiMarie> then what happened?
<DemiMarie> did they change the hardware to make it impossible?
<mattst88> IIRC there were plans to switch to GuC by default for SKL, then for KBL, then for CNL (which never shipped), etc etc
<jekstrand> Execlists still exist but they're untested, entirely unused on Windows, and they want to get rid of them.
<jekstrand> They're also pretty shitty, actually
<DemiMarie> In what ways?
<mattst88> I don't know of any hardware changes that would make it impossible, but anything untested is bound to be broken
<DemiMarie> Untested by who?
<mattst88> anyone
<DemiMarie> Even i915?
<jekstrand> Untested by the people who will catch bugs fast enough to fix them.
<alyssa> Wait, if Windows syscall overhead is the problem
<mattst88> I think GuC submission is finally on-by-default on Alderlake, so I expect any bugs in the execlist path to pretty much be ignored going forward until it bitrots and ends up being removed
<alyssa> What's wrong with Android syscall overhead
<mattst88> alyssa: does Intel ship any hardware used in Android products? :)
<jekstrand> As for what's wrong with them: They're horribly racy.
<DemiMarie> What do you mean?
Company has quit [Quit: Leaving]
<jekstrand> There's like 10 execlist ports and in order to keep the GPU full, you're supposed to cycle between them except if you smash registers at the wrong time, you might end up preempting and swapping out your job without knowing it.
<jekstrand> Also, the feedback mechanism to figure out what's running and what the status was is pretty racy too.
<jekstrand> It's a horrible hardware design.
<DemiMarie> Is the GuC much nicer?
<jekstrand> IDK how nice it is internally but the interface is pretty much what you want for a competent modern driver.
<DemiMarie> How does it compare to nvidia’s GSP?
<jekstrand> It presents some number (like 1k? 16k?) virtual queues and you just stick stuff on them and it load balances between all the active ones, respecting priority.
<DemiMarie> Also I wonder if one of the advantages of such processors is that because they are so small, they can just busy spin all the time and not wake up the host.
<jekstrand> It's roughly the same design concept as nvidia's GSP. I'm sure NVIDIA did it better because they're NVIDIA and Intel is Intel but it's the same basic idea.
yuq825 has joined #dri-devel
<airlied> gsp is the same submit fw as pre-gsp
<airlied> gsp didn't really change that piece too much
<alyssa> mattst88: I meant for another vendor pushing userspace submit that sells mostly Android
<DemiMarie> airlied: then what does gsp change? and why is it like 40MB?
<airlied> DemiMarie: it changes the init procedure really
<airlied> instead of the cpu driver loading up all the individual firmwares in a complicated and horrible sequence
<airlied> they offload a chunk of that work to the GSP, which has all the sequence logic etc
<DemiMarie> airlied: what is so horrible about it? and does having the GSP just mean that the GSP firmware writers had to write that horrible code?
<DemiMarie> and why does the GSP FW have to be so much larger than all the other FW combined?
<airlied> DemiMarie: because it contains a lot of versions of things for lots of gpus
<Lynne> airlied: compared values for nvdec and vulkan, found a small diff in the dbp slot, fixed it
<jenatali> airlied: Did you mean the DXVA spec or did you really mean DXVK?
<Lynne> now both amd and nvidia do 5 frames correctly, then get messed up output
<airlied> jenatali: lols DXVA
<airlied> Lynne: win!
<airlied> Lynne: so frame 6 is the first where the slot indices don't match anymore
<Lynne> they get messed up output in the same way visually, so that is indeed a win
<Lynne> yup
<DemiMarie> jekstrand: Is Intel known for making bad GPUs?
<airlied> they aren't known for making good GPUs
<Lynne> batum-pssst
<jenatali> Ok that's what I thought lol
<jekstrand> DemiMarie: I mean, they work. But NVIDIA's clearly better at it.
<jekstrand> All GPUs are horrible somewhere.
<DemiMarie> Any more than CPUs?
<jekstrand> ¯\_(ツ)_/¯
<lina> I'm impressed that Apple managed to do mostly driver-transparent multi-GPU scalability on a tiler that actually works
<alyssa> same
<lina> They probably deserve some credit in GPU-land for that one ^^
<DemiMarie> lina: hello! why is this so impressive?
<lina> Splitting work across multiple GPUs is pretty hard! And then a tile has to do it twice, with a shuffle step in the middle.
<lina> *tiler
<alyssa> i'm just impressed someone managed to make a fast tiler
<DemiMarie> what was the hardest part about writing asahi.ko?
<alyssa> me neighsaying rust?
<lina> I'm not sur--- yeah maybe that ^^
<DemiMarie> alyssa: were you skeptical at first?
<alyssa> very
<DemiMarie> why, and what changed?
<alyssa> thought the bindings would be hell and it would delay release
<alyssa> from what I can tell, they were and it did, but it was worth it because the driver is rock solid and that would never have happened in C
<alyssa> and I'd rather release a completely stable driver a few months late than race to push out a broken one that architecturally can never get fixed
<lina> To be honest, figuring out how to work around the stack placement issue was harder than the abstractions I think ^^
<alyssa> woof
<DemiMarie> I would say that “never” is a strong word, but (a) Intel and AMD’s drivers are not one-person efforts and (b) at least Intel’s driver makes completely broken assumptions about Linux’s Page Attribute Table
<lina> The abstractions were mostly just having to learn how to model it properly, but it's not that much code and not very complicated
<alyssa> lina: so... could I interest you in writing another kernel driver in Rust? (-:
<lina> Which one? ^^
<alyssa> Mali :-p
<lina> wwwww
<DemiMarie> alyssa: I thought there was already a driver for that
<lina> marcan mentioned something about DCP...
<HdkR> Speaking of Asahi, should I take a look at the asahi drm API yet?
<DemiMarie> DCP?
<lina> If you're interested! It's pretty minimal and only really implements what I need, but I think the design is reasonable (though I'm sure there's things to be improved)
<DemiMarie> Does anyone here wish that they could require userspace libraries that matched the kernel version, the way illumos and macOS do?
<lina> DemiMarie: Apple's display controller
<HdkR> I want to make sure the API won't make FEX too terribly upset once 32-bit applications start talking to it.
<alyssa> HdkR: No. the UAPI is going to be completely torched.
<HdkR> :O
<alyssa> i mean
<lina> Oh I thought HdkR meant the abstractions
<alyssa> the current UAPI architecturally has no sync
<lina> The UAPI no, yeah, that is provisional
<alyssa> lina: he wants to know when to start thunking the UAPI for FEX on Asahi :-p
<HdkR> Indeed
<alyssa> i don't mean "no implicit sync"
<lina> It should just work on 32-bit
<alyssa> or "no explicit sync"
<alyssa> i mean "no sync" ;-p
<lina> There are no pointer-sized pointers
<alyssa> lina: speaking of when are you going to start torching the UAPI
<alyssa> after plumbing compute?
<DemiMarie> lina: does 32-bit even matter in this case?
<lina> Not really, no, neither side cares
<HdkR> Yes!
<HdkR> I care :<
<DemiMarie> lina: does the hardware even support 32-bit userspace?
<lina> I meant neither driver side cares that one is 32-bit
<lina> DemiMarie: No, this is for emulation
<DemiMarie> lina: emulation of what?
<lina> x86
<HdkR> It's a tricky problem when an x86 GL/Vulkan communicates to an AArch64 DRM uapi :D
<lina> HdkR: TBH, I'd be more worried about 64-bit assumptions in the driver breaking in 32-bit builds more than the UAPI (though I didn't notice any so far)
<qyliss> DemiMarie: (FEX as mentioned above is an x86 emulator)
<lina> But the UAPI design should require no thunking at all, and if it does that's a bug
<lina> So you should be able to start using it just fine already, and see if you run into driver issues in 32-bit builds
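A minimal sketch of why a UAPI with "no pointer-sized pointers" should need no thunking for a 32-bit caller: if every user pointer travels as a fixed 64-bit integer, the argument layout is the same size whatever the caller's pointer width. The field names and layout here are hypothetical and are not the actual asahi UAPI.

```python
import struct

# Hypothetical submit argument (NOT the real asahi UAPI): the user pointer
# is carried as a fixed 64-bit integer, followed by two 32-bit fields.
FIXED_LAYOUT = "<QII"  # u64 user_ptr, u32 count, u32 flags


def pack_submit(user_ptr, count, flags):
    """Serialize the hypothetical argument struct with fixed-width fields."""
    return struct.pack(FIXED_LAYOUT, user_ptr, count, flags)


# The fixed-width layout has one size regardless of who builds it.
fixed_size = struct.calcsize(FIXED_LAYOUT)

# A native pointer field, by contrast, changes size with the ABI (4 or 8),
# which is what would force a translation layer between 32-bit userspace
# and a 64-bit kernel.
native_ptr_size = struct.calcsize("P")

blob = pack_submit(0x1000, 2, 0)
```

A 32-bit client simply zero-extends its pointers into the 64-bit field, so both sides agree on the byte layout without any per-ioctl conversion.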
<DemiMarie> linaalyssa: I had a cursed idea for avoiding a stable UAPI: have the driver provide a shared library that userspace is expected to dlopen(), and require userspace to do everything via that library. If you try to make an ioctl and your instruction emulator is *not* in that DSO your process just dies.
<qyliss> (not that it was mentioned above that it's an x86 emulator, but that it was mentioned at all)
<DemiMarie> qyliss: Interesting!
<lina> DemiMarie: That's just kicking the problem to another layer...
<HdkR> lina: Aye, I'm not expecting the uapi to have problems, but the sooner I can get it in to CI the better from my PoV. The rest of the problems are...likely to be painful
<DemiMarie> lina: good point, unless that DSO *is* Mesa 🤣
<HdkR> Just make sure not to allocate 4GB of contiguous VA space in your driver :<
<alyssa> "linaalyssa" is this a ship name >->
<lina> And then we already do that... there's a UAPI version field (that will probably go away once it's stable, but for now it refuses to load the driver if there is a mismatch) ^^
<DemiMarie> alyssa: ship name?
<alyssa> i will have you know that is very much not what's happening here thanks
<lina> HdkR: I don't think we do that ww
<HdkR> We'd find out quickly if it did :D
<DemiMarie> (This is where I really wish IRC supported proper quoted replies)
<lina> The only weirdness is the 16K page issue as you know (only 16K kernels are supported right now)
<HdkR> lina: I like the "right now" on the end there
<DemiMarie> qyliss: that is what I thought, but thanks for confirming!
<lina> sven has a WIP patch to make 4K kernels work for the general IOMMUs, but for the GPU I need to get the shmem helper to allocate 16K contiguous aligned pages and... that seems like it'll be painful
<lina> (if anyone has any ideas there I'm all ears!)
<HdkR> Also need to get that TSO bit wired up to a prctl at some point
<lina> Ah yeah, that too
<lina> Ping marcan about that
<HdkR> marcan: Gimme the TSO bit in a prctl please.
<lina> I think he said something about putting it in m1n1 as a stopgap so you can start testing?
<marcan> yeah, I can give you a systemwide global toggle in m1n1 very easily. prctl shouldn't be hard but that's another one for the kernel bikeshedding list I'm sure...
<DemiMarie> Would io_uring_cmd be useful for GPU drivers?
<HdkR> :)
<Lynne> airlied: could you renew your diff? it expired
<airlied> Lynne: not sure I have a clean version of it right now
<airlied> unless I have a git stash somewhere
<airlied> https://paste.centos.org/view/dff25c14 might have most of it
<airlied> Lynne: might also be worth trying the memsets on nvidia
<DemiMarie> jekstrand: how complex is the format of each queue w.r.t. parsing, etc? IOW: for a security-sensitive system like Qubes, should VFs be made to submit via the kernel for validation?
<Lynne> airlied: with all of your changes, I get 2 correct frames, with just my new changes + -1ing the arrays, I get 5 frames
<Lynne> on nvidia, it doesn't really help
<Lynne> the way the 2 implementations go messed up is different, it's like radv gets completely invalid refs, while nvidia just gets wrong refs
Daanct12 has joined #dri-devel
camus has joined #dri-devel
<airlied> Lynne: yay I win
<airlied> but no idea how to cleanly win :-P
<Lynne> I can't really find any flaws with the logic in the code
<marcan> can I say I'm glad our video enc/dec hardware has nothing to do with the GPU?
<marcan> stashing it in the GPU sounds like triple the pain :p
<Lynne> RefPicSetStCurrBefore exactly matches nvdec
<airlied> Lynne: see how that patch works for nvidia
<Lynne> idx is just a counter that counts over the total refs
<airlied> it needs "refinement"
<Lynne> tmp2 counts over all refs+keyframe
<airlied> marcan: I think video enc/dec is horrible no matter where you do it
<airlied> Lynne: so what I worked out is approx this:
<marcan> fair :)
<airlied> the Before/After list is an index into the RefPicList
<Lynne> on nvidia, it breaks in such a way that it looks like the wrong refs are used
<airlied> the drivers generate the RefPicList from the VK structure ordering
<airlied> now I've no idea how nvidia generate their RefPicList
<Lynne> RefPicList you say? we have a list named like that in our vaapi code
<Lynne> if you're meant to generate that same list via the given 3 lists of before/after/cur refs, should compare them
<airlied> nope it is what those lists refer to
<airlied> you can't generate it from them
<airlied> the values in the before/after/cur arrays are indexes into the refpiclist
<airlied> the refpiclist contains indexes into the DPB
<airlied> Lynne: okay with all those fixed up B frames work for me as well
<airlied> now the question is whether we've just created a radv specific API user
<Lynne> I compared vaapi's refpiclist with RefPicSetStCurrAfter
<Lynne> they match if j + (j >= key_idx); gets changed to j; in vulkan
<Lynne> changing this value in vulkan does not actually change the output at all
<airlied> https://paste.centos.org/view/raw/63b9fcc3 is a cleaned up works on radv version
<airlied> Lynne: they aren't the same
<airlied> so shouldn't match
<airlied> RefPicList[RefPicSetStCurrAfter[i]] is how it works
<airlied> the question on vulkan is how the driver builds RefPicList
<airlied> since it's not passed in, so it must come from the slotIndexes in the picture list
<airlied> reference list
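The two-level lookup airlied describes at this point in the discussion can be sketched roughly as follows. This only illustrates the indirection `RefPicList[RefPicSetStCurrAfter[i]]` followed by a DPB lookup; the array contents are made up, and how a given Vulkan driver actually builds its RefPicList is exactly the open question in the chat.

```python
def resolve_refs(ref_pic_set, ref_pic_list, dpb):
    """Map each entry of a RefPicSet* array to the DPB picture it names."""
    resolved = []
    for i in ref_pic_set:
        dpb_slot = ref_pic_list[i]      # first hop: RefPicList entry
        resolved.append(dpb[dpb_slot])  # second hop: picture in that DPB slot
    return resolved


# Hypothetical DPB holding four decoded pictures, indexed by slot.
dpb = ["poc0", "poc4", "poc2", "poc1"]
# Hypothetical RefPicList: position in the list -> DPB slot.
ref_pic_list = [2, 0, 3, 1]
# Hypothetical RefPicSetStCurrAfter: positions within RefPicList.
ref_pic_set_st_curr_after = [1, 3]

# Going through RefPicList, as described above.
after_refs = resolve_refs(ref_pic_set_st_curr_after, ref_pic_list, dpb)

# Treating the same values as direct DPB slots selects different pictures,
# which is the kind of mismatch that shows up as wrong reference frames.
direct_refs = [dpb[i] for i in ref_pic_set_st_curr_after]
```

With these made-up values the two interpretations disagree on every reference, so a client and a driver that assume different meanings for the arrays would decode the first few frames correctly only by coincidence.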
<airlied> tbh I think the spec is vague here
<Lynne> hey, that's pretty good, works here on everything I threw at it!
<Lynne> let me test on nvidia
<Lynne> novidya :(
<Lynne> either nvidia's implementation is broken, or we've created a radv-specific parameter version
<airlied> I don't think the spec is clear enough on those arrays
<airlied> in fact I think it just hides them in "h265 specifics"
<Lynne> it fails after frame 5 like before, with an output that makes it look like ref frames are blank
<airlied> I'm writing a public issue to see what we can find out
<airlied> it might be worth trying to change slots[j] to j
<airlied> Lynne: okay filed 2010
<Lynne> changing slots[j] to j fixes nvidia
<airlied> Lynne: okay
<airlied> who is right :-P
<Lynne> we are, until proven otherwise
* airlied goes to read slotIndex for the n+1th time
<Lynne> I'll ping the nvidia guy on the thread to take a look at it along with h264
<Lynne> he still hasn't cheged h264 despite finishing that weeks ago
<Lynne> well, onwards to encoding now! the night is still young!
<airlied> Lynne: I tried changing radv to work with j, but the hw seems to dislike it
<Lynne> neat, it seems like the magic internal 10bit to 8bit conversion works
<airlied> Lynne: okay I can make radv work with j
<airlied> so maybe that is the correct answer
<Lynne> btw 10bit doesn't work because the driver only returns NV12 as a possible surface to decode into
<Lynne> I mean it works through magic, but you get 8bit output, not 10
<airlied> I suspect the surface format is just the first step
<airlied> https://paste.centos.org/view/f2634ea4 works for me now on radv
<airlied> Lynne: got a video/command line I can test 10bit with?
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
<airlied> Lynne: radv should report a 10-bit format now
<airlied> and might even do the right thing now
<Lynne> how do you deal with the messed up p010 format, which isn't really p010?
<Lynne> because in vulkan, the 6bit padding is in the LSBs
<Lynne> whilst everyone else puts the padding in the MSBs afaik for p010
<airlied> that's why I need some testing, I've no idea about how it works
<airlied> what does nvidia return?
<Lynne> it returns an image, which looks wrong
<airlied> but what format?
<Lynne> well, VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16
<Lynne> updated my repo
<Lynne> radv returns an init error when trying to decode
<Lynne> command line is
<Lynne> ./ffmpeg_g -init_hw_device "vulkan=vk:0,debug=1" -hwaccel vulkan -hwaccel_output_format vulkan -i <INPUT> -loglevel debug -filter_hw_device vk -vf hwdownload,format=p016 -c:v rawvideo -an -y <OUTPUT>
<Lynne> for a test file, give me a sec
<airlied> Lynne: do you fill lumaBitDepth out correctly in the profile?
* airlied assumes that will be 10-bit
<airlied> or maybe I'm mixing up how I should be deciding on the formats to report
<Lynne> no
<Lynne> err, yes
<Lynne> lumadepth is set to VK_VIDEO_COMPONENT_BIT_DEPTH_10_BIT_KHR
<Lynne> what endianness did vulkan specify packed stuff to be in?
<Lynne> was it big or little?
<airlied> doh, fixed the start of it, now it dies later
<airlied> the spec writes out the expectation in sentences for all the packed formats
<Lynne> yeah, it mentions no endianness there, so I presume either native endian or big endian?
<airlied> VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 specifies an unsigned normalized multi-planar format that has a 10-bit G component in the top 10 bits of each 16-bit word of plane 0, and a two-component, 32-bit BR plane 1 consisting of a 10-bit B component in the top 10 bits of the word in bytes 0..1, and a 10-bit R component in the top 10 bits of the word in bytes 2..3, with the bottom 6 bits of each word unused.
<airlied> so that seems like it says, G10 is top bits
<Lynne> so big endian?
<Lynne> usually endianness is on a lower level so they don't mention it and assume native endian
<airlied> " The in-memory ordering of bytes within a component is determined by the host endianness."
<Lynne> right, native then
bmodem has joined #dri-devel
<airlied> Lynne: okay gets further now
heat_ has quit [Ping timeout: 480 seconds]
<airlied> Lynne: so you are looking at h264 encoding first?
<Lynne> doesn't really matter whether it's hevc or h264, hardware encoding APIs are all clean enough to let you implement anything with the same codepath
<Lynne> give me a few moments to verify 10bit uploading and downloading works so you can be sure the driver's wrong
<Lynne> err... no 10bit formats are supported at all by radv?
<Lynne> no VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16, no VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16
<Lynne> not listed in vulkaninfo --show-formats
cengiz_io has joined #dri-devel
<airlied> oh I should see why not
<airlied> I'm just showing them in the video formats, didn't check out the radv bits
<Lynne> on nvidia, verified VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16 works
<Lynne> but VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 gives clearly wrong output
<Lynne> I'm not sure if it's a driver bug or I'm doing something odd (shouldn't be, it's just upload+download)
<airlied> Lynne: I'm seeing FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 in supported format
<Lynne> not here on navi21
<airlied> Lynne: using the top of my branch?
<Lynne> yup, works now, output looks wrong
<Lynne> you can use "./ffmpeg_g -init_hw_device "vulkan=vk:0,debug=1" -i <INPUT> -filter_hw_device vk -vf "format=p016,hwupload,hwdownload,format=p016" -c:v rawvideo -y TEST.nut" to test
<Lynne> I'm compiling on my intel machine so I have a third ref
<alyssa> I think I'm doing something nuts
<HdkR> :O
<alyssa> and it's for a bad cause
<alyssa> (moar fps on bifrost)
<Lynne> not supported on intel too
<airlied> Lynne: why do you call it P016?
<Lynne> well, vulkan puts the padding in the LSBs for p010, ffmpeg expects the padding in the MSBs since that's what everyone else uses afaik
<Lynne> so treating it like p016 is a way of solving this, provided the padding is always 0s
<airlied> the Microsoft docs seem to suggest ls
<airlied> lsb
<airlied> "When the graphics hardware reads a surface that contains a 10-bit representation, it should ignore the low-order 6 bits of each channel. If a surface contains valid 16-bit data, however, it should be identified as a 16-bit surface."
<Lynne> right, you're right, ffmpeg does expect padding in the LSBs
<Lynne> (why do I remember it being the other way around? can't remember)
<Lynne> right, now I remember, ffmpeg expects padding for 3plane formats in the LSBs
<Lynne> *MSBs
<Lynne> so yuv420p10 is <padding><data>
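The two sample layouts being compared here can be sketched as below: a P010-style word keeps the 10 data bits at the top of each 16-bit word (padding in the LSBs), while a yuv420p10-style word keeps them at the bottom (padding in the MSBs). The helper names are hypothetical; only the bit positions come from the discussion and the quoted spec text.

```python
import struct

PAD_BITS = 6          # 16-bit word minus 10 bits of data
SAMPLE_MASK = 0x3FF   # largest 10-bit value


def low_aligned_to_p010(sample):
    """<padding><data> (yuv420p10 style) -> <data><padding> (P010 style)."""
    return (sample & SAMPLE_MASK) << PAD_BITS


def p010_to_low_aligned(word):
    """<data><padding> (P010 style) -> <padding><data> (yuv420p10 style)."""
    return (word >> PAD_BITS) & SAMPLE_MASK


# Full-scale 10-bit sample in both layouts.
low = 0x3FF
high = low_aligned_to_p010(low)

# Byte order within each word follows the host; on a little-endian host
# the low byte (which holds the padding bits) is stored first.
le_bytes = struct.pack("<H", high)
```

Because the padding bits are zero, a P010-style word read as a plain 16-bit sample is just the 10-bit value scaled by 64, which is why treating such data as p016 gives a usable picture.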
<Lynne> updated my repo, use p010 in both places in the command line
<Lynne> output is still wrong
<Lynne> and output is wrong in slightly different ways on nvidia and radv...
lina has quit [Remote host closed the connection]
<alyssa> Maybe I got a bit carried away.
kchibisov_ has left #dri-devel [#dri-devel]
<airlied> Lynne: you seeing the yellow and red on radv?
<Lynne> yup, on nvidia it's read and green
<Lynne> *red
kchibisov has joined #dri-devel
<HdkR> alyssa: ooo, fancy swizzles
lina has joined #dri-devel
<airlied> Lynne: with the cmd line above I don't seem to be hitting radv at all
<airlied> at least not the decoder
<Lynne> yeah, that's right, it's just an upload+download test
<airlied> ah cool
<Lynne> if you'd like to run decoding, use the command line you've used previously but with p010 instead of nv12
<Lynne> but if upload+download looks wrong, it's probably good to fix that first
<airlied> Lynne: btw they are only ratifying decode this year, encode is into next q
<Lynne> huh, okay. I bet this information alone is under NDA, so I'll pretend I didn't hear it.
off^ has joined #dri-devel
* airlied gotta finish for today, but yeah forget I said it, but also why I'm not as pushed on encode :-P
aravind has joined #dri-devel
<Lynne> I am slightly miffed, no video devs/users were involved in the standardization process, officially or unofficially (I did come in here and ask a few times for a bone, since it would've really saved on rewriting vulkan code while I was writing it)
<lina> I'm starting to suspect that the remaining KWin and Firefox glitches/issues I'm running into might be related to CPU out of order execution or other craziness like that... let's see if I can get down to the bottom of it today, but this is really weird...
<Lynne> and at the end, the only serious implementation was written by the busiest person in the world and someone not really paid to work on it
<lina> Valgrind makes it work properly, apitrace makes it work properly...
<airlied> Lynne: yes it's pretty crazy, the commitment from the gpu vendors has been pretty lax
dakr has quit [Read error: No route to host]
dakr has joined #dri-devel
Duke`` has joined #dri-devel
<DemiMarie> lina: are you using break-before-make when changing GPU page tables?
<lina> It's not that, the badness happens at the application level
<DemiMarie> Still, you really ought to be
<DemiMarie> This assumes that the GPU works like the CPU, which it probably does.
<lina> We always just map or unmap, there are no mutation operations
<DemiMarie> I see
lemonzest has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
bgs has joined #dri-devel
itoral has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
nchery has joined #dri-devel
ahajda__ has joined #dri-devel
<Lynne> airlied: 10bit decoding actually works
<Lynne> it's the format itself which seems to be wrong
<Lynne> if you add ,format=yuv420p10 after the ,format=p010 the output is correct
<Lynne> for both decoding and the hwupload+download test
<Lynne> OH, now I realize why
<Lynne> no one bothered to standardize packed format for NUT
<Lynne> (the container/protocol I'm working on and should've been working on instead does support it, but it's still wip)
<Lynne> if you save it to raw or convert it to planar yuv it works, so no issues with anything
<Lynne> yup, works on Nvidia as well
<Lynne> the output, however, isn't fully okay on Nvidia, the colors are off
<Lynne> RADV's output is compliant and agrees with the software decoder, but Nvidia has visible chroma artifacts if you turn down the gamma
<Lynne> only happens on 10bits
<Lynne> either way, a problem with nvidia, probably misreading header values
kts has joined #dri-devel
<Lynne> so, uh, now there really is nothing left to do but work on encoding
<Lynne> (also, btw, since vulkan has a mechanism to expose which formats vulkan can decode into, for 10bits, radv ought to signal both nv12 and p010 for 10bit content and let clients choose the best one)
alanc has quit [Remote host closed the connection]
<Lynne> also the driver ought to signal 8 for maximum active refs for hevc, not 17
alanc has joined #dri-devel
<Lynne> (I think, if maximum active refs is the total number of refs a frame can use rather than the total number of refs in the DPB)
<airlied> Lynne: oh okay I'll fix the format outputs
<airlied> okay pushed out both of those fixes
mvlad has joined #dri-devel
<Lynne> I updated my repo and post with I think most of the spec issues I found
<Lynne> something seems to have gone wrong, tons of validation layers now
camus has quit [Remote host closed the connection]
<airlied> oops I think I mistyped
frieder has joined #dri-devel
<airlied> oh no that isn't it
camus has joined #dri-devel
<airlied> the image format stuff looks broken somewhere
rasterman has joined #dri-devel
<Lynne> no, my code was bad, it's good now
rmckeever has quit [Quit: Leaving]
lynxeye has joined #dri-devel
<Lynne> branch updated; it's 9 in the morning here
<Lynne> by the way, decoding is veeery slow compared to vaapi, 1080p hevc is 700fps for vaapi but only 60fps for vulkan
<Lynne> where is the extra overhead coming from?
<Lynne> no downloading or presenting at all, this is just ffmpeg decoding and then doing nothing with the frames
warpme_____ has joined #dri-devel
<Lynne> seems very CPU-bound, perf would probably expose it, but my perf is broken after I forgot to update it after my kernel, so that's a mystery for another time
fab has joined #dri-devel
tursulin has joined #dri-devel
junaid has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
junaid has quit [Remote host closed the connection]
<airlied> Lynne: not sure where, probably not getting the pipelining very right, or waiting for some hw takes a lot longer
jkrzyszt has joined #dri-devel
sgruszka has joined #dri-devel
MajorBiscuit has joined #dri-devel
dcz_ has joined #dri-devel
junaid has joined #dri-devel
rgallaispou has joined #dri-devel
apinheiro has joined #dri-devel
djbw has quit [Read error: Connection reset by peer]
aknautiy_ has quit [Remote host closed the connection]
Akari has joined #dri-devel
Daaanct12 has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
Daaanct12 has quit [Remote host closed the connection]
junaid has quit [Ping timeout: 480 seconds]
junaid has joined #dri-devel
agd5f has quit [Ping timeout: 480 seconds]
junaid has quit [Ping timeout: 480 seconds]
sarahwalker has joined #dri-devel
sgruszka has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
Company has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
dcz_ has quit [Ping timeout: 480 seconds]
srslypascal is now known as Guest2105
srslypascal has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
devilhorns has joined #dri-devel
Guest2105 has quit [Ping timeout: 480 seconds]
sgruszka_ has joined #dri-devel
natto has quit []
natto has joined #dri-devel
natto has quit []
natto has joined #dri-devel
<daniels> mupuf: you were already looking at fixing this right? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/32797788
<mupuf> daniels: Yeah, let me check if it used the fixed version or not
<daniels> oh, mid-air collision :)
<mupuf> yeah, it was using the older version. Let me check that we are indeed using the new versions now
natto has quit []
<mupuf> yeah, we are. So, it shouldn't happen anymore (we retry 3 times to download artifacts, and if it still fails, we ignore the missing files and pretend all is fine)
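The retry behaviour mupuf describes (try the download three times, then carry on without the file) could look roughly like this. It is a sketch only: `fetch` is a hypothetical callable standing in for whatever the CI tooling really uses, and the choice of `OSError` as the failure signal is an assumption.

```python
def fetch_with_retries(fetch, name, attempts=3):
    """Call fetch(name) up to `attempts` times; return None if all fail."""
    for _ in range(attempts):
        try:
            return fetch(name)
        except OSError:
            continue  # transient failure: try again
    return None       # give up quietly and let the job carry on


def make_flaky_fetch(failures):
    """Build a hypothetical fetcher that fails `failures` times, then works."""
    state = {"left": failures}

    def fetch(name):
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("download failed")
        return "contents of " + name

    return fetch


recovered = fetch_with_retries(make_flaky_fetch(2), "results.tar")
missing = fetch_with_retries(make_flaky_fetch(5), "results.tar")
```

Returning `None` instead of raising is what lets the job "pretend all is fine": the caller can skip the missing artifact rather than fail the whole run.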
natto has joined #dri-devel
natto has quit []
natto has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
natto has quit []
natto has joined #dri-devel
MajorBiscuit has joined #dri-devel
pcercuei has joined #dri-devel
srslypascal has joined #dri-devel
Daanct12 has quit [Quit: Quitting]
jkrzyszt has quit [Remote host closed the connection]
aravind has quit [Remote host closed the connection]
junaid has joined #dri-devel
MajorBiscuit has quit [Quit: WeeChat 3.6]
MajorBiscuit has joined #dri-devel
Lucretia has quit [Read error: Connection reset by peer]
jkrzyszt has joined #dri-devel
sarahwalker has quit [Remote host closed the connection]
Lucretia has joined #dri-devel
srslypascal has quit [Quit: Leaving]
jkrzyszt has quit [Remote host closed the connection]
dcz_ has joined #dri-devel
heat_ has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
jkrzyszt has joined #dri-devel
itoral has quit [Remote host closed the connection]
sgruszka_ has quit [Ping timeout: 480 seconds]
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Leaving]
co1umbarius has joined #dri-devel
sarahwalker has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
<LordKalma> so I managed to finish that driver that was causing me to bang my head against the wall
<LordKalma> do you reckon this is upstream-worthy material?
sgruszka has joined #dri-devel
devilhorns has quit []
yuq825 has left #dri-devel [#dri-devel]
agd5f has joined #dri-devel
<CounterPillow> imho don't credit their author if they chose to not comply with the GPL, you wrote the source after all
<CounterPillow> oh nvm, you do that
<agd5f> bnieuwenhuizen, it was sent a while ago, just waiting for the linux-firmware maintainers to apply it
jkrzyszt has quit [Remote host closed the connection]
<bnieuwenhuizen> agd5f: I think they just did :) thx!
<agd5f> excellent
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
<LordKalma> CounterPillow, it's just in the readme.md as a factoid
<CounterPillow> LordKalma: I think instead of the pr_ commands with a manual prefix you should use the dev_info macro with the device as an argument (you should be able to get it from struct drm_panel's dev member)
<LordKalma> in fact, we can remove those debug messages now
<CounterPillow> if you can obtain a datasheet of the panel somehow and document what some of those magic values are, that would be pretty cool too.
<LordKalma> I do have a datasheet
<LordKalma> I don't remember where I got the controller datasheet
<LordKalma> I know I have the *panel* datasheet legitimately
<CounterPillow> that's fine, doesn't have to be linked to
<LordKalma> as in, I asked the supplier
<CounterPillow> naming the commands (0xEB, 0xEC, 0xED) in actual defines would be nice
<LordKalma> fair as well
<LordKalma> as you can probably guess this was an adventure
<LordKalma> I first reverse engineered the source code from the driver on the vendor's firmware
<CounterPillow> yeah, I figured when you mentioned ghidra was involved :D
<LordKalma> but the panel didn't work... then another interested person showed up...
<LordKalma> and we discovered there was a second half (well, first half) of the driver code... in u-boot
<LordKalma> u-boot initialized the panel, the driver only picked up an initialized panel
<LordKalma> the crap vendors do... hahaha
<CounterPillow> Yeah :/
<LordKalma> I get it's so they can have the AMAZING(TM) splash screen
<LordKalma> but still...
<LordKalma> you can see the commits... I did the kernel part, someone else did the whole process of figuring out the boot sequence
<LordKalma> so it's rough around the edges
Akari has quit [Quit: segmentation fault (core dumped)]
<CounterPillow> btw that pr_warn should be a dev_warn too, but more importantly, if it uses SPI, is there some register stuff going on that you could just use a regmap for, instead of manually writing the byte sequence on the wire?
<CounterPillow> also that do { } while (0) in the try macro seems strange to me
<CounterPillow> it seems to only break if it fails? what?
<LordKalma> that was written by my colleague.. I too was wondering why it was
<LordKalma> apparently it's a very common macro thing
<CounterPillow> oh, while(0), not while(1), silly me
<CounterPillow> okay I see
<LordKalma> ^that
<LordKalma> it's the poor man's way of making a complicated thing a single statement
pcercuei has quit [Read error: Connection reset by peer]
<CounterPillow> looks like that's done elsewhere in the kernel as well
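For readers following along, here is a minimal sketch of the do { } while (0) idiom discussed above. The TRY macro, fake_write and init_sequence are hypothetical names for illustration and are not taken from LordKalma's driver; the point is only that the wrapper turns a multi-statement body into one statement that behaves correctly under if/else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fallible write: 0 on success, negative on error. */
static int fake_write(int value)
{
    return value < 0 ? -1 : 0;
}

/*
 * Hypothetical TRY macro: evaluate expr and return its error code from the
 * enclosing function on failure. The do { } while (0) wrapper makes the
 * expansion a single statement that needs a trailing semicolon, so it is
 * safe inside an unbraced if/else.
 */
#define TRY(expr)               \
    do {                        \
        int try_ret_ = (expr);  \
        if (try_ret_ < 0)       \
            return try_ret_;    \
    } while (0)

static int init_sequence(int a, int b, int *steps_done)
{
    assert(steps_done != NULL);
    *steps_done = 0;
    if (a != 0)
        TRY(fake_write(a));   /* expands to exactly one statement */
    else
        TRY(fake_write(0));
    (*steps_done)++;
    TRY(fake_write(b));
    (*steps_done)++;
    return 0;
}
```

Without the wrapper, a bare { ... } block followed by the caller's semicolon would end the if statement early and orphan the else.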
<LordKalma> re regmaps: not sure about those because of the DCX pin
pcercuei has joined #dri-devel
<LordKalma> that's "weird" SPI, not "normal" SPI
<CounterPillow> I see
<LordKalma> there's this special DCX wire
<LordKalma> you can see that st7701s_write_data does gpiod_set_value(ctx->dcx, 1); and st7701s_write_command does gpiod_set_value(ctx->dcx, 0);
<LordKalma> that controller supports 9 bit SPI and 16 bit SPI without DCX wire, and 8 bit SPI with DCX wire
<LordKalma> this is just the easier way
<LordKalma> (and the way the panel we're using is wired anyway)
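A rough user-space sketch of the 8-bit SPI plus DCX scheme described above, assuming only what the log states: DCX is driven low for a command byte and high for data bytes. The fake_panel type, set_dcx and spi_send are hypothetical stand-ins for the real GPIO and SPI calls, so this is a model of the wire protocol and not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire log standing in for the real GPIO + SPI hardware. */
struct wire_event {
    int dcx;        /* level of the DCX line while the byte was clocked out */
    uint8_t byte;
};

struct fake_panel {
    int dcx;                      /* current DCX level */
    struct wire_event log[32];
    size_t count;
};

/* Stand-in for gpiod_set_value(ctx->dcx, level). */
static void set_dcx(struct fake_panel *p, int level)
{
    p->dcx = level;
}

/* Stand-in for an 8-bit SPI transfer; records each byte with the DCX level. */
static int spi_send(struct fake_panel *p, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p->count >= sizeof(p->log) / sizeof(p->log[0]))
            return -1;
        p->log[p->count].dcx = p->dcx;
        p->log[p->count].byte = buf[i];
        p->count++;
    }
    return 0;
}

/* Command byte: DCX low. */
static int write_command(struct fake_panel *p, uint8_t cmd)
{
    assert(p != NULL);
    set_dcx(p, 0);
    return spi_send(p, &cmd, 1);
}

/* Parameter/data bytes: DCX high. */
static int write_data(struct fake_panel *p, const uint8_t *data, size_t len)
{
    assert(p != NULL);
    set_dcx(p, 1);
    return spi_send(p, data, len);
}
```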
<LordKalma> we're reverse engineering the software of a radio haha
<CounterPillow> Looks like you might be able to use regmaps with an appropriate regmap_config https://elixir.bootlin.com/linux/latest/source/include/linux/regmap.h#L291
jkrzyszt has joined #dri-devel
<LordKalma> I'll look into it, thanks
zehortigoza has quit [Remote host closed the connection]
<javierm> CounterPillow, LordKalma: that's exactly what the ssd130x-spi driver does: https://elixir.bootlin.com/linux/v6.1-rc8/source/drivers/gpu/drm/solomon/ssd130x-spi.c#L21
<LordKalma> oh cool, thanks
zehortigoza has joined #dri-devel
<LordKalma> javierm, where is ssd130x_spi_write actually used?
<javierm> it's the regmap .write implementation for the SPI driver
<javierm> LordKalma: the driver logic is independent of the transport bus used, and just gets a regmap: https://elixir.bootlin.com/linux/v6.1-rc8/source/drivers/gpu/drm/solomon/ssd130x.c#L972
<LordKalma> ahh I see
<LordKalma> thanks
<javierm> LordKalma: you are welcome. The driver then just does regmap_bulk_write(ssd130x->regmap, ...) or regmap_write(ssd130x->regmap, ...) and doesn't care whether it is using 4-wire SPI or I2C
<javierm> I would suggest a similar design
fab has quit [Quit: fab]
<javierm> LordKalma: especially since you said that the chip also supports 9-bit and 16-bit SPI, so you could have different write implementations for those
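The design javierm suggests can be modelled in plain C with a function pointer per transport. This is only a sketch of the idea, not the regmap API: panel_bus, bus8_dcx_write, bus9_write and panel_send are hypothetical names, and the 9-bit encoding (data/command flag carried as bit 8 of each word) is an assumption made for illustration, not something taken from the controller datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport interface; in the kernel this role is played by a regmap. */
struct panel_bus {
    int (*write)(struct panel_bus *bus, int is_data, uint8_t byte);
    uint16_t last_word;   /* last word put on the (fake) wire */
    int last_dcx;         /* last DCX level, or -1 if the bus has no DCX line */
};

/* 8-bit SPI with a separate DCX line. */
static int bus8_dcx_write(struct panel_bus *bus, int is_data, uint8_t byte)
{
    bus->last_dcx = is_data ? 1 : 0;
    bus->last_word = byte;
    return 0;
}

/* 9-bit SPI: assumed here to carry the data/command flag as bit 8 of each word. */
static int bus9_write(struct panel_bus *bus, int is_data, uint8_t byte)
{
    bus->last_dcx = -1;
    bus->last_word = (uint16_t)(((is_data ? 1u : 0u) << 8) | byte);
    return 0;
}

/* Core driver logic: knows commands and parameters, not the wiring. */
static int panel_send(struct panel_bus *bus, uint8_t cmd,
                      const uint8_t *params, size_t n)
{
    assert(bus != NULL && bus->write != NULL);
    int ret = bus->write(bus, 0, cmd);
    for (size_t i = 0; ret == 0 && i < n; i++)
        ret = bus->write(bus, 1, params[i]);
    return ret;
}
```

The same panel_send call works against either bus, which is the property that lets one core driver serve several wiring variants.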
kts has quit [Quit: Leaving]
jkrzyszt has quit [Ping timeout: 480 seconds]
macromorgan has joined #dri-devel
junaid has joined #dri-devel
vliaskov has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
junaid has quit [Remote host closed the connection]
fab has joined #dri-devel
tchar_ has quit []
tchar has joined #dri-devel
jkrzyszt has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
alyssa has left #dri-devel [#dri-devel]
JohnnyonFlame has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
frieder has quit [Remote host closed the connection]
djbw has joined #dri-devel
srslypascal has joined #dri-devel
<Lynne> some proofreading would be nice
jkrzyszt has joined #dri-devel
sgruszka has quit [Remote host closed the connection]
<psykose> the 'to configure' section does && make -j40 but then says type make -j0 to build it, should just drop the former
<psykose> (minor nit)
maxzor has joined #dri-devel
<Lynne> fixed it, thanks
<digetx> tarceri: hello, could you please take a look at these Mesa cache MRs once you have time:
bmodem has quit [Ping timeout: 480 seconds]
lynxeye has quit [Quit: Leaving.]
<jenatali> Lynne: Good read :)
<Ristovski> Lynne: are there benchmarks for encode/decode vs, let's say, VAAPI on radeonsi?
<Ristovski> but indeed, good read
tursulin has quit [Ping timeout: 480 seconds]
<jenatali> Btw, not sure if you'd seen, but we recently helped Intel bring VAAPI to Windows
<Ristovski> jenatali: didn't Intel also use DXVK on Windows in their latest GPU stack (for their new dGPUs iirc?), quite interesting
<jenatali> Seems that way
<Ristovski> mentioned only in the readme though :^)
sarahwalker has quit [Ping timeout: 480 seconds]
<airlied> Lynne: looks good
<Lynne> still a bit dry, I'll let it sit for half a day
<Lynne> couldn't figure out where the overhead came from
<Lynne> it was all from libc
<Lynne> but removing pretty much everything didn't make it better
<Lynne> libc/__subtf3 and kernel/clear_page_erms
JohnnyonFlame has quit [Read error: Connection reset by peer]
<HdkR> oh no, subtf3?
<HdkR> Who's using long double over here?
<HdkR> shame and smite
<Lynne> got rid of all vulkan calls during decoding, frames are only allocated once at the start from a pool, there's literally nothing that ought to make decoding wait
<Lynne> oh...
<Lynne> allocating 114 megs per-frame does that to you
<Lynne> guess even libc's allocator isn't fast enough, I'll have to implement proper size calculations and pooling
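A minimal sketch of the kind of pooling Lynne mentions, assuming nothing about the actual FFmpeg implementation: buffers are handed back to a small fixed set of slots and reused when large enough, so the allocator is only hit when no idle buffer fits. buf_pool, pool_get, pool_put and POOL_SLOTS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4

struct buf_pool {
    void *slot[POOL_SLOTS];   /* idle buffers ready for reuse */
    size_t cap[POOL_SLOTS];   /* capacity of each idle buffer */
    size_t fresh_allocs;      /* how many times malloc was actually hit */
};

/* Get a buffer of at least `size` bytes, reusing an idle one when possible. */
static void *pool_get(struct buf_pool *p, size_t size, size_t *cap_out)
{
    assert(p != NULL && cap_out != NULL);
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p->slot[i] && p->cap[i] >= size) {
            void *buf = p->slot[i];
            *cap_out = p->cap[i];
            p->slot[i] = NULL;
            return buf;
        }
    }
    void *buf = malloc(size);
    if (buf) {
        *cap_out = size;
        p->fresh_allocs++;
    }
    return buf;
}

/* Hand a buffer back; it is freed only if every slot is already occupied. */
static void pool_put(struct buf_pool *p, void *buf, size_t cap)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->slot[i]) {
            p->slot[i] = buf;
            p->cap[i] = cap;
            return;
        }
    }
    free(buf);
}
```

Pairing this with tighter size calculations matters as much as the reuse itself: a pool of 114 MB buffers still touches a lot of freshly cleared pages the first time each one is used.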
tzimmermann has quit [Quit: Leaving]
ybogdano has joined #dri-devel
heat__ has joined #dri-devel
heat_ has quit [Read error: No route to host]
djbw has quit [Read error: Connection reset by peer]
morphis has quit []
<Lynne> for those interested, for h264, vulkan is 30% slower than vaapi
<Lynne> could be improved by possibly using sparse buffers for slice data
agd5f has quit [Read error: Connection reset by peer]
<Lynne> and host mapping
<Lynne> the kernel seems to spend a bit of time in prepare_transfer
wens_ has joined #dri-devel
wens has quit [Read error: Connection reset by peer]
MajorBiscuit has quit [Ping timeout: 480 seconds]
agd5f has joined #dri-devel
rmckeever has joined #dri-devel
JoniSt has joined #dri-devel
<JoniSt> Hey, I'm trying to debug a rather nasty GPU hang on my W6800 at the moment that occurs when I launch ROCm workloads. I get SMU hangs. I noticed that the graphics clock of my card goes to its maximum as soon as ROCm starts up (but way before there's actual load on the GPU)... Does anyone know what code exactly pins the GPU at max frequency? Is it ROCm itself, amdkfd, or amdgpu?
<kisak> probably worth asking in the radeon-specific channel
<JoniSt> Ah, true :)
jkrzyszt has quit [Remote host closed the connection]
djbw has joined #dri-devel
Danct12 has quit [Quit: Quitting]
Danct12 has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> return PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_FREEDRENO;
<alyssa> this is a normal thing to have in panfrost right?
<alyssa> normal.
<alyssa> thing.
q4a has joined #dri-devel
<q4a> it's already in zink
jkrzyszt has joined #dri-devel
<q4a> src/gallium/drivers/zink/zink_screen.c
<alyssa> zink uses it for turnip which is less egregious :p
Jeremy_Rand_Talos_ has quit [Remote host closed the connection]
Jeremy_Rand_Talos_ has joined #dri-devel
bgs has quit [Remote host closed the connection]
turol has quit [Ping timeout: 480 seconds]
mvlad has quit [Remote host closed the connection]
ybogdano has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
rmckeever has quit [Quit: Leaving]
danvet has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
dcz_ has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
apinheiro has joined #dri-devel
JoniSt has quit [Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/]
rgallaispou1 has joined #dri-devel
fab has quit [Quit: fab]
heat__ has quit [Read error: Connection reset by peer]
heat__ has joined #dri-devel
rgallaispou1 has quit [Read error: Connection reset by peer]
rgallaispou has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
<Lynne> implemented pooling for headers in hevcdec, it's better now, 500fps for typical low bitrate 1080p
<Lynne> still not quite vaapi's 670fps
Akari has joined #dri-devel
<Lynne> "24.93% [kernel] [k] vcn_v3_0_ring_patch_cs_in_place"
<Lynne> hello
ahajda_ has joined #dri-devel
* airlied will be offline for a bit of today, will hopefully get time to dig a bit into perf later
<Lynne> np, I can move on to encoding now
<agd5f> Lynne, vcn_v3_0_ring_patch_cs_in_place is required because the VCN engines are asymmetric on some chips (e.g., first engine supports all codecs, second instance doesn't). So we need to see what codec the user is requesting and make sure the job ends up on the right engine
ahajda__ has quit [Ping timeout: 480 seconds]
<Lynne> right
<Lynne> on VAAPI, that symbol's nowhere near the top
<Lynne> is scheduling done differently?
<Lynne> btw nvidia speed for hevc 1080p - 800fps nvdec, 390 vulkan
heat__ has quit [Remote host closed the connection]
<agd5f> shouldn't be any different
* airlied wonders whether vulkan can expose the asymmetry to avoid that
<agd5f> airlied, we purposely did that so that the kernel could load balance properly
<agd5f> otherwise how do you know which queue to use in userspace
<agd5f> I guess we could have added a codec flag to the context or something like that
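A toy model of the routing problem agd5f describes, assuming a per-engine codec capability mask and a queue depth. It is not the amdgpu code: the codec enum, struct engine and pick_engine are hypothetical, and which codecs each engine supports is made up for the example. It only shows why the codec has to be known before a job can be load balanced across asymmetric engines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codec bits; real hardware capabilities differ per chip. */
enum codec {
    CODEC_H264 = 1 << 0,
    CODEC_HEVC = 1 << 1,
    CODEC_AV1  = 1 << 2,
};

struct engine {
    unsigned caps;     /* bitmask of codecs this engine can handle */
    unsigned queued;   /* jobs currently queued on this engine */
};

/*
 * Pick the least-loaded engine that supports `codec`.
 * Returns the engine index, or -1 if no engine can run the job.
 */
static int pick_engine(const struct engine *e, size_t n, unsigned codec)
{
    assert(e != NULL);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!(e[i].caps & codec))
            continue;
        if (best < 0 || e[i].queued < e[(size_t)best].queued)
            best = (int)i;
    }
    return best;
}
```

With a codec flag on the context, as floated above, a lookup like this could run once at context creation instead of on every submission.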
heat has joined #dri-devel
pcercuei has quit [Quit: dodo]
pjakobsson_ has joined #dri-devel
pjakobsson has quit [Ping timeout: 480 seconds]
jkrzyszt has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
turol has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
apinheiro has quit [Quit: Leaving]
jkrzyszt has quit [Ping timeout: 480 seconds]
warpme_____ has quit []