ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<graphitemaster> Everything being the same old single GDDR / HBM is so ridiculous.
<mareko> is that a joke? :)
<graphitemaster> It's not a joke, no. I want two separate types of memory and two separate cache systems with explicit intrinsics in my shading language to flush and synchronize the caches
<graphitemaster> shared memory doesn't count, it's too small and it's not programmable.
<graphitemaster> Also it eats away at cache
<graphitemaster> (programmable as in I can upload data to it from the CPU like any other resource)
<graphitemaster> AMD has ubiquitous tiled resources which basically live in their own world and have a different page size and everything
<graphitemaster> So it's not an insane ask.
<mattst88> is https://pastebin.com/HtT7yUcK a bit surprising to anyone?
<anholt> not surprising to me (unfortunately)
<anholt> but, also, are you trying to do the string table thing for perf metrics, by chance?
<imirkin> mattst88: maor const
<imirkin> mattst88: try static const char *const season[]
<mattst88> ugh. not a huge deal, as the next step was to replace the pointers with just an index into the table
<mattst88> anholt: yeah
<anholt> mattst88: that won't help you, anyway, because your [0] = are relocs, and your season[2] is a reloc
<mattst88> imirkin: wow, that helps with gcc, but not clang (!!!)
ybogdano has quit [Ping timeout: 480 seconds]
<mattst88> anholt: yeah, I was just trying to make it a clean intermediate step
<anholt> I see
<imirkin> mattst88: hmmm ... i guess yeah, unclear if there's a way to say that an array's pointers are immutable
<imirkin> (or maybe that's not what it's even complaining about? dunno)
<imirkin> mattst88: also -std=c11 could help? or hurt :)
<mattst88> imirkin: yeah, still doesn't work with clang :(
<imirkin> is the clang error the same?
<mattst88> yeah
<mattst88> > t.c:12:43: error: initializer element is not a compile-time constant
<gawin> just use constexpr from c++ /s
<anholt> if it's your intermediate step, just drop the consts and move on, C is not your friend here.
<alyssa> mareko: Mali does GS+XFB with no assistance from the hardware
<mattst88> anholt: yeah, that's the plan
<alyssa> well, technically there's some special formats for loading attribute data with funny topologies (patches or adjacency or something)
<alyssa> other than that none, it's 1000s of lines of assembly in a half dozen compute kernels that monkey patch the command stream
<mattst88> well, except I already made the intel_perf_query_counter data into static const buffers, so no can do there.
<mattst88> oh wow, I can just drop the const
nchery has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
<jenatali> FWIW it wouldn't work with MSVC either I think
<mattst88> thanks
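A minimal C sketch of the initializer problem discussed above; the real code is in the pastebin, so the array name is taken from the chat and everything else is made up:

```c
/* Illustrative only -- names loosely based on the chat, not the pastebin. */
static const char *const season[] = { "winter", "spring", "summer", "autumn" };

/* This is the kind of line that trips the error: the value of season[2] is a
 * pointer that only gets fixed up by the dynamic linker (a relocation), so it
 * is not a compile-time constant.  With the array fully const, GCC folds it
 * anyway; Clang (and, per jenatali, likely MSVC) rejects it. */
/* static const char *picked = season[2]; */

/* An address constant or a plain index keeps the initializer constant: */
static const char *const *picked_addr = &season[2];
static const unsigned picked_index = 2;
```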
<alyssa> jekstrand: oh gosh is "v2f32" llvm syntax
<alyssa> is Mali assembly syntax just llvm syntax
<alyssa> what have i done
<jekstrand> alyssa: Maybe?
<anholt> alyssa: got started on your reimplementing llvm project, clearly.
<alyssa> anholt: Noooo
<ccr> anholt, imirkin, https://reviews.llvm.org/D76096
<ccr> eh, I meant mattst88 ^
<mattst88> ccr: oh, interesting. thanks for the link
<bylaws> alyssa: if you set the DF_0_GLOBAL flag on a .so and then dlopen it, it'll act as LD_PRELOAD for all subsequently loaded libs
fxkamd has quit []
ppascher has joined #dri-devel
<alyssa> Wild
<DrNick> you mean DF_1_GLOBAL?
LexSfX has joined #dri-devel
mszyprow_ has quit [Ping timeout: 480 seconds]
* alyssa is going to /part because overstimulated
<alyssa> over in #panfrost if you need me
alyssa has left #dri-devel [#dri-devel]
<DrNick> glibc's dlfcn.h says #define DF_1_GLOBAL 0x00000002 /* Set RTLD_GLOBAL for this object. */ but doesn't actually implement it, Solaris has the define but documents it as unused, apparently it came from the *BSDs
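At runtime, the effect DF_1_GLOBAL is meant to have is the same as loading the object with RTLD_GLOBAL; a small sketch (the library name is made up):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* libhook.so is a placeholder.  Symbols from it become visible to every
     * object loaded after this point, which is the LD_PRELOAD-like behaviour
     * described above. */
    void *handle = dlopen("libhook.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    dlclose(handle);
    return 0;
}
```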
tursulin has quit [Read error: Connection reset by peer]
<jekstrand> Ok, I take it all back. Docker is for the birds!
<anholt> I don't use docker much for my actual dev, but I'm curious: what trouble did you get in?
<jekstrand> Oh, running things under qemu is taking forever
<anholt> oh, yeah. do not recommend.
linearcannon has quit [Read error: Connection reset by peer]
<jekstrand> I thought I had a pretty good setup with running under qemu and then kicking off to icecream with real x86_64 binaries of the aarch64 compiler for the actual building.
<jekstrand> But qemu can't even run fast enough to keep icecream fed.
<anholt> just use a meson cross file?
<anholt> not sure why you'd run any of your build stuff under qemu. unit tests run under qemu and that's hard enough.
<jekstrand> If I were on debian, that'd be easy. :-/
<jekstrand> Fedora's cross-build support is horrible
<anholt> oh.
<anholt> at that point you're stuck with using a sysroot to your arm64 chroot.
<jekstrand> Yeah, I played with sysroot a bit but I've not figured out how to get it to stop messing up stdc++ includes
<jekstrand> My pi4, on the other hand, doesn't seem to have much trouble at all keeping icecream full.
<jekstrand> Maybe that's what I do?
<jekstrand> Except I/O sucks on the pi
<jekstrand> because it's an sd card
<imirkin> what's icecream btw?
<imirkin> (doesn't feel like it'd be easy to search for that ...)
<imirkin> thanks
<jekstrand> distcc but better
<imirkin> low bar
<imirkin> anyways, neat
<jekstrand> In particular, it can cross-build
<jekstrand> as in I'm building on my pi4 with a bunch of the build jobs happening on an i9
<imirkin> right
<imirkin> i do the same with my laptop
<imirkin> (but it's same arch)
iive has quit []
karolherbst has quit [Read error: Connection reset by peer]
karolherbst has joined #dri-devel
The_Company has quit []
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
gawin has quit [Ping timeout: 480 seconds]
alatiera5 has joined #dri-devel
alatiera has quit [Ping timeout: 480 seconds]
<graphitemaster> does glsl have any occupancy query functions
<graphitemaster> stuff like cudaOccupancyMaxActiveBlocksPerMultiprocessor
agx_ has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
<idr> That looks like an API query for how much parallelism a shader will have.
<idr> I don't know of anything like that in GL or Vulkan.
<imirkin> iirc there are nv exts to expose some of that stuff
<imirkin> ah hm. i was probably thinking of NV_shader_thread_(group|shuffle). but that's something else.
<graphitemaster> Interesting how AMD implements it for HIP: https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/src/hip_platform.cpp#L317
<graphitemaster> Sadly does not apply to NV. Though I wonder how tricky it would be to change it to support both.
mclasen has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
linearcannon has joined #dri-devel
ngcortes has quit [Ping timeout: 480 seconds]
agx_ has quit [Read error: Connection reset by peer]
agx has joined #dri-devel
pzanoni` has joined #dri-devel
jewins1 has joined #dri-devel
ramaling_ has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
pzanoni has quit [Ping timeout: 480 seconds]
mattrope has quit [Ping timeout: 480 seconds]
ramaling has quit [Ping timeout: 480 seconds]
mattrope has joined #dri-devel
<jekstrand> imirkin, graphitemaster: There's an NV Vulkan spec for it: VK_NV_shader_sm_builtins
<jekstrand> VkPhysicalDeviceShaderSMBuiltinsPropertiesNV::shaderWarpsPerSM
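A sketch of querying those limits through the extension jekstrand names, assuming the (NVIDIA-only) VK_NV_shader_sm_builtins extension is exposed by the device:

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Assumes VK_NV_shader_sm_builtins has been enabled/advertised. */
static void print_sm_builtins(VkPhysicalDevice pdev)
{
    VkPhysicalDeviceShaderSMBuiltinsPropertiesNV sm = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_SM_BUILTINS_PROPERTIES_NV,
    };
    VkPhysicalDeviceProperties2 props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &sm,
    };

    vkGetPhysicalDeviceProperties2(pdev, &props);
    printf("SMs: %u, warps per SM: %u\n",
           sm.shaderSMCount, sm.shaderWarpsPerSM);
}
```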
shankaru has joined #dri-devel
idr has quit [Quit: Leaving]
<HdkR> Uh oh. I have a new panel and it doesn't work with amdgpu
<jekstrand> Why does the raspberry pi 4 not have an NVMe disk? SD cards suck.
<HdkR> IO is hard for ARM :D
<HdkR> Should have bought a Xavier if you want /too/ much IO
jewins1 has quit [Ping timeout: 480 seconds]
<jekstrand> This VIM3 has an NVME and it seems to work. It takes like 5 minutes to boot but then it seems to have enough I/O
<jekstrand> Not gonna win any awards but it's non-terrible
ramaling has joined #dri-devel
pzanoni has joined #dri-devel
<austriancoder> jekstrand: why not use a Debian container with meson cross files? Works superbly here on my fedora installation.
<jekstrand> Can you then run those binaries on a fedora system?
pzanoni` has quit [Ping timeout: 480 seconds]
ramaling_ has quit [Ping timeout: 480 seconds]
<jekstrand> I'm not against containers, so long as I don't have to qemu them because qemu sucks a whole lot more than I remember.
<jekstrand> Then again, last time I did serious qemu, I was comparing it to a 1st gen beagle board so...
<airlied> the last time I did series qemu I wrote a virt gpu
Wally has joined #dri-devel
<airlied> serious doh
<jekstrand> parallel qemu is better. :P
<airlied> jekstrand: if you'd just bought an M1 like alyssa suggested you'd have saved more time :-P
<jekstrand> airlied: Probably.
<airlied> and you'd also be able to distract yourself by implementing a vulkan driver for apple, or zink on moltenvk!
<jekstrand> lol
<jekstrand> In all seriousness, they're not that expensive....
<jekstrand> But then I'd insist on running Fedora on it and I'd spend like a week figuring out how to do that. (-:
<airlied> like it could be worse you could be trying to use android on arm
<jekstrand> I'm not using android
tzimmermann has joined #dri-devel
mattrope has quit [Remote host closed the connection]
* jekstrand has a panvk build \o/
<jekstrand> And a CTS to go with it
shankaru has quit [Ping timeout: 480 seconds]
Wally has quit [Quit: Page closed]
sdutt_ has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
itoral has joined #dri-devel
Duke`` has joined #dri-devel
mclasen has quit [Ping timeout: 480 seconds]
mszyprow_ has joined #dri-devel
ahajda has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
danvet has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
dllud has joined #dri-devel
dllud_ has quit [Read error: Connection reset by peer]
<daniels> jekstrand: unfortunately it’s eMMC rather than NVMe
mlankhorst has joined #dri-devel
Company has joined #dri-devel
MajorBiscuit has joined #dri-devel
mvlad has joined #dri-devel
tursulin has joined #dri-devel
sdutt_ has quit [Ping timeout: 480 seconds]
<daniels> but yeah, RPi is notoriously not blessed with I/O
<javierm> daniels, jekstrand: what I do is to install the rpi4 edk2 firmware on a uSD and an EFI install on a USB3 disk
<pepp> Kayden: probably next week
pcercuei has joined #dri-devel
JohnnyonF has joined #dri-devel
<Kayden> okay, thanks!
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<daniels> javierm: EFI! Red Hat's changed you :(
<dolphin> at least a couple of months ago that EFI firmware worked really poorly on the rpi4
<dolphin> the boot time increased a *lot* and it was unreliable in making a successful boot :/
<javierm> daniels :D
Lucretia has quit []
Lucretia has joined #dri-devel
rasterman has joined #dri-devel
kuter has joined #dri-devel
<kuter> window help
<kuter> exit
kuter has quit []
kuter has joined #dri-devel
kuter has quit []
kts has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
kts has joined #dri-devel
sagar__ has quit [Remote host closed the connection]
kts has quit []
sagar__ has joined #dri-devel
Haaninjo has joined #dri-devel
boistordu has joined #dri-devel
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<javierm> pinchartl: I agree with you but don't think that making the arm drivers honour the nomodeset param makes things worse than the status quo
elongbug has joined #dri-devel
<pinchartl> looks like useless code to me :-)
<javierm> pinchartl: at least users could have a known way to disable the drm drivers rather than figuring out if it is built-in and having to use initcall_blacklist=rcar_du_init or modprobe.blacklist, etc
<pinchartl> having per-subsystem ways to disable drivers doesn't sound like the best idea though
itoral has quit []
<javierm> pinchartl: fair
<pinchartl> I'm not strictly opposed to that series, but I doubt it will be useful in most drivers
<javierm> it's surprising how the meaning of nomodeset changed over time. It started as a way to force text mode in vgacon and nowadays it is used by gdm to decide if the wayland session should be disabled
<pq> There is something really strange in that gdm.rules file anyway, as if physical seats didn't exist as a concept.
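A rough sketch of what the series under discussion amounts to in a driver's probe path, assuming a drm_firmware_drivers_only()-style helper that reports whether nomodeset was given is what the driver keys off:

```c
#include <drm/drm_drv.h>
#include <linux/platform_device.h>

static int example_drm_probe(struct platform_device *pdev)
{
	/* Assumes drm_firmware_drivers_only() checks the nomodeset param: one
	 * well-known switch instead of per-driver initcall_blacklist /
	 * modprobe.blacklist tricks. */
	if (drm_firmware_drivers_only())
		return -ENODEV;

	/* normal device setup elided */
	return 0;
}
```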
Lucretia has quit []
Lucretia has joined #dri-devel
tobiasjakobi has joined #dri-devel
mclasen has joined #dri-devel
pnowack has joined #dri-devel
devilhorns has joined #dri-devel
<javierm> danvet, tzimmermann, pq: another thing that could ease writing new tiny drm drivers is to have a drivers/gpu/drm/tiny/tiny-skeleton.c, similar to drivers/usb/usb-skeleton.c for usb
<tzimmermann> javierm, how well does this work?
<javierm> tzimmermann: I don't know, never wrote a usb driver :)
<javierm> but when writing drivers the first thing I do is to copy one existing that is as similar as possible to my HW, so having a template would be useful
<tzimmermann> javierm, i never tried such templates in practice.
<tzimmermann> i do the same and try to follow the execution flow of an existing driver. that's not possible with the skeleton drivers, so i never had much use for them
<javierm> tzimmermann: yes, but at least you can codify good practices and conventions there. So people could start from there and fill in the callbacks rather than removing / modifying an existing driver
<javierm> but maybe you are right and there's not much use for those in practice
<tzimmermann> javierm, i really don't know
<graphitemaster> jekstrand, Seems like it doesn't offer much for an already compiled program though. You still have to guess the number of vgprs and sgprs used to compute occupancy by hand, unlike the HIP and Cuda functions.
<graphitemaster> It's not impossible, the binary shaders returned from both NV and AMD have a program header that describes it. So I can do it by hand and likely will do it by hand.
<tzimmermann> javierm, i think, what i would find useful is a document on the reimplementation of tiny/cirrus.c. it would walk newbies through that driver's code and explain what the individual functions do and how they work together. cirrus is for qemu, so it's easy to tinker with it
<javierm> tzimmermann: that's a good idea too
nchery has joined #dri-devel
ahajda_ has joined #dri-devel
ahajda has quit [Read error: Connection reset by peer]
Haaninjo has quit [Quit: Ex-Chat]
Peste_Bubonica has joined #dri-devel
MrCooper has joined #dri-devel
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
fxkamd has joined #dri-devel
<danvet> javierm, we have the drm_device skeleton in the docs
<danvet> I think maybe a simple pipe skeleton might be useful
<danvet> but the trouble with display chips is they are all wildly different
<danvet> like even within simple
<danvet> and you drop easily out of simple (e.g. as soon as you have planes)
<danvet> so I think examples are good, but maybe we need more composable examples ...
<tzimmermann> yes, that's the basic driver layout
<javierm> danvet: ah, cool. I missed that
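For reference, a rough shape such a tiny-skeleton.c could take, built on the drm_simple_display_pipe helpers tiny drivers used around this time; names are made up, and hardware programming, connector/panel wiring, error paths and module registration are all elided:

```c
/* Illustrative skeleton only -- not an existing in-tree driver. */
#include <drm/drm_atomic_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_gem_cma_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_mode_config.h>
#include <drm/drm_simple_kms_helper.h>
#include <linux/platform_device.h>

struct skel_device {
	struct drm_device drm;
	struct drm_simple_display_pipe pipe;
};

static void skel_pipe_enable(struct drm_simple_display_pipe *pipe,
			     struct drm_crtc_state *crtc_state,
			     struct drm_plane_state *plane_state)
{
	/* program the controller and start scanning out plane_state->fb */
}

static void skel_pipe_disable(struct drm_simple_display_pipe *pipe)
{
	/* stop scanout */
}

static void skel_pipe_update(struct drm_simple_display_pipe *pipe,
			     struct drm_plane_state *old_state)
{
	/* flush damage / point the hardware at the new framebuffer */
}

static const struct drm_simple_display_pipe_funcs skel_pipe_funcs = {
	.enable = skel_pipe_enable,
	.disable = skel_pipe_disable,
	.update = skel_pipe_update,
};

static const struct drm_mode_config_funcs skel_mode_config_funcs = {
	.fb_create = drm_gem_fb_create,
	.atomic_check = drm_atomic_helper_check,
	.atomic_commit = drm_atomic_helper_commit,
};

DEFINE_DRM_GEM_CMA_FOPS(skel_fops);

static const struct drm_driver skel_driver = {
	.driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
	.fops = &skel_fops,
	.name = "tiny-skeleton",
	DRM_GEM_CMA_DRIVER_OPS,
};

static const uint32_t skel_formats[] = { DRM_FORMAT_XRGB8888 };

static int skel_probe(struct platform_device *pdev)
{
	struct skel_device *skel;
	int ret;

	skel = devm_drm_dev_alloc(&pdev->dev, &skel_driver,
				  struct skel_device, drm);
	if (IS_ERR(skel))
		return PTR_ERR(skel);

	ret = drmm_mode_config_init(&skel->drm);
	if (ret)
		return ret;

	skel->drm.mode_config.max_width = 4096;
	skel->drm.mode_config.max_height = 4096;
	skel->drm.mode_config.funcs = &skel_mode_config_funcs;

	/* a real driver passes its connector here or attaches a bridge */
	ret = drm_simple_display_pipe_init(&skel->drm, &skel->pipe,
					   &skel_pipe_funcs, skel_formats,
					   ARRAY_SIZE(skel_formats), NULL, NULL);
	if (ret)
		return ret;

	drm_mode_config_reset(&skel->drm);

	ret = drm_dev_register(&skel->drm, 0);
	if (ret)
		return ret;

	drm_fbdev_generic_setup(&skel->drm, 0);
	return 0;
}
/* platform_driver / module boilerplate elided */
```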
hikiko has joined #dri-devel
<hikiko> hello! I was looking at my MRs and there's this one: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/211 for DRM that fixes the kms-steal-crtc test. I've never tested it properly because I lack the hardware (I based my fix on a compiler warning about the size of an input parameter in util_fill_pattern). Would anyone mind taking a look at it as it's currently rotting? :D
<hikiko> (it's for drm not mesa)
mattrope has joined #dri-devel
jewins has joined #dri-devel
sdutt has joined #dri-devel
<jekstrand> javierm: Yeah, that might be better for the pi. I don't have any particularly good USB media at the moment, though. What do you use?
craft has joined #dri-devel
craft has quit [Remote host closed the connection]
craft has joined #dri-devel
<javierm> and write to the USB media using the arm-image-installer script, because it allows me to set an ssh key, add a console cmdline param, etc
<javierm> i.e: sudo arm-image-installer --image=Fedora-Workstation-35-1.2.aarch64.raw.xz --target=none --media=/dev/sdb --addconsole --addkey=id_rsa.pub --norootpass --resizefs
tobiasjakobi has quit [Remote host closed the connection]
Company has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
<javierm> jekstrand: oh, you meant the USB drive ? just a cheap sandisk 64 GiB USB3 stick. It's still way faster and more reliable than a SD card :)
rpigott has quit [Read error: Connection reset by peer]
rpigott has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
<jekstrand> javierm: Oh, well I do have a couple of those and can easily buy more. :)
craft has quit [Remote host closed the connection]
craft has joined #dri-devel
lemonzest has joined #dri-devel
<tzimmermann> danvet, do you have further comments on the virtfb thing?
<danvet> virtfb thing?
<danvet> oh right
<tzimmermann> that flag
<danvet> let me grab the latest
<danvet> tzimmermann, I was blind
<danvet> I'll reply and explain :-)
<tzimmermann> ok
<tzimmermann> there's no hurry
* danvet was indeed missing something
<danvet> anyway sent it out, contains r-b with some bikeshed requests for your consideration
<tzimmermann> saw it. thanks a lot
<graphitemaster> Does anyone know where I can find out the binary format of shaders on NV - like when you dump them you seem to get the NVfp/NVvp stuff as plain text, but there's a program header on there too and I was just wondering if that has been reverse engineered / documented somewhere, maybe in nouveau (unlikely though), envytools, anything?
<daniels> graphitemaster: -> #nouveau
<graphitemaster> Thanks
mareko has quit [Read error: Connection reset by peer]
mslusarz has quit [Read error: Connection reset by peer]
marcheu has quit [Read error: Connection reset by peer]
gouchi has joined #dri-devel
dri-logger has quit [Read error: Connection reset by peer]
glisse has quit [Read error: Connection reset by peer]
Duke`` has joined #dri-devel
mszyprow_ has quit [Ping timeout: 480 seconds]
mslusarz has joined #dri-devel
dri-logger has joined #dri-devel
JohnnyonFlame has joined #dri-devel
JohnnyonF has quit [Ping timeout: 480 seconds]
alatiera5 is now known as alatiera
sdutt has quit []
sdutt has joined #dri-devel
mszyprow_ has joined #dri-devel
<austriancoder> Why does st/main not skip triangle draws when FRONT_AND_BACK culling is enabled? d3d12 and svga are doing this in draw_vbo and etnaviv will soon too
mszyprow_ has quit [Ping timeout: 480 seconds]
<jekstrand> How insane would it be to have a vk_descriptor_set_layout base struct?
<jekstrand> bnieuwenhuizen, dj-death: ^^
<zmike> containing what
<jekstrand> descriptor types if nothing else
<jekstrand> I'm looking at how tractable it would be to do common vk_descriptor_update_template
<jekstrand> And I at least need to know the type for each thing.
<jekstrand> Actually... I don't. That's already in the VkDescriptorUpdateTemplateEntry. \o/
<jekstrand> But, also common descriptor set stuff could maybe be useful one day.
mbrost has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
craft has quit [Remote host closed the connection]
ybogdano has joined #dri-devel
Milardo has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
MajorBiscuit has quit [Quit: WeeChat 3.3]
<jenatali> austriancoder: I think technically front and back culling is different in that it's supposed to run the VS for side effects and be reflected in stats
<bnieuwenhuizen> jekstrand: I think Josh actually has a JIT for the template thing lying around somewhere ... Not sure if we still want to move to that in RADV
<jekstrand> a JIT?
<bnieuwenhuizen> A JIT compiler creating x86 code for the templatew updates
<jekstrand> Oof
<jekstrand> Is that necessary?
<jekstrand> Genuine question.
<jekstrand> I don't know how hot templates actually get in apps.
<bnieuwenhuizen> AFAIU there have been complaints about template update perf in general and it is a significant cost in dxvk since all descriptor updates go through the template path
<bnieuwenhuizen> though the gains weren't super great IIRC, hence I'm not sure if that was still going forward
<zmike> templates are extremely hot
<zmike> basemark, for example, is like 90% template updating
<jekstrand> Good, then I'll have something to benchmark. :)
<jekstrand> I noticed panvk doesn't have templates yet and I'd rather implement it generic than implement it in panvk if we can do so and have it fast.
<austriancoder> jenatali: makes sense somehow. my hardware has no way to set front and back culling register wise. the blob driver also skips triangle draws when FRONT_AND_BACK culling is enabled.. so I will do the same in draw_vbo
<jenatali> Makes sense. I think I probably need to switch us to disabling rasterization instead of dropping the draw. Xfb/ssbo stuff should still happen I think
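A hedged sketch of that draw_vbo early-out, loosely modelled on the description above; the helper name is made up, and jenatali's caveat (XFB/SSBO side effects and statistics) still applies:

```c
#include <stdbool.h>
#include "pipe/p_state.h"
#include "util/u_prim.h"

/* Hypothetical helper: true when face culling would discard every primitive
 * of this draw.  Only triangle topologies are affected by face culling;
 * points and lines must still be drawn. */
static bool
cull_discards_all(const struct pipe_rasterizer_state *rast,
                  const struct pipe_draw_info *info)
{
   return rast->cull_face == PIPE_FACE_FRONT_AND_BACK &&
          u_reduced_prim(info->mode) == PIPE_PRIM_TRIANGLES;
}
```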
ybogdano has quit [Remote host closed the connection]
<jekstrand> Bah... My beautiful plan for fast updates requires a lock. :-(
<jekstrand> daaaaang... vk-gl-cts is 6.9G when built. My poor VIM3 only has 12G of eMMC
<jekstrand> I have room for one mesa build and one vk-gl-cts build and no room for test results. :-/
<jekstrand> wait, no. It should have 32G of eMMC
<lstrano> jekstrand: dnf install fuse-sshfs
mvlad has quit [Quit: Leaving]
<jekstrand> Yeah, I just need to grow the partition
* jekstrand figures out how to do that
<jekstrand> Ah, much better.
<jekstrand> Now deqp-runner won't crash from running out of disk space. :)
mlankhorst has quit [Ping timeout: 480 seconds]
lstrano_ has quit []
lstrano_ has joined #dri-devel
<jekstrand> Ok, I give up, maybe. I don't have a plan for fast template updates that doesn't involve piles of allocation or a lock in the VkDescriptorUpdateTemplate.
lstrano_ has quit []
<jekstrand> I guess that's why we have the extension in the first place. :-/
<jekstrand> Maybe the data structure can at least be common?
<jekstrand> Save a bit of copy+pasta?
<jekstrand> Yeah, I think that's the new plan.
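A sketch of what that shared data structure could look like: a flattened copy of the template entries that a driver walks at update time. The struct and helper are hypothetical, not an existing Mesa API; only the VkDescriptorUpdateTemplateEntry offset/stride semantics come from the spec:

```c
#include <vulkan/vulkan.h>

/* Hypothetical common representation of a descriptor update template. */
struct vk_template_entry {
   VkDescriptorType type;
   uint32_t binding;
   uint32_t array_element;
   uint32_t count;
   size_t offset;   /* into the application's pData */
   size_t stride;   /* between consecutive descriptors in pData */
};

static void
template_for_each(const struct vk_template_entry *entries, uint32_t num_entries,
                  const void *pData,
                  void (*write_one)(uint32_t binding, uint32_t elem,
                                    VkDescriptorType type, const void *src))
{
   for (uint32_t e = 0; e < num_entries; e++) {
      for (uint32_t i = 0; i < entries[e].count; i++) {
         /* src points at a VkDescriptorImageInfo, VkDescriptorBufferInfo or
          * VkBufferView, depending on the descriptor type. */
         const char *src = (const char *)pData +
                           entries[e].offset + i * entries[e].stride;
         write_one(entries[e].binding, entries[e].array_element + i,
                   entries[e].type, src);
      }
   }
}
```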
<cwabbott> jekstrand: robclark: I took a bit of a look at pipeline cache in ir3, and it's gonna require a lot of rewriting
<jekstrand> cwabbott: I saw your post. :-/
<cwabbott> I'm still trying to figure out whether we can use the common vk cache infrastructure without depending on it in ir3
mszyprow_ has joined #dri-devel
<cwabbott> but even if we did, we have all this caching stuff in ir3 and I guess we'd have to pull out the bits we need and "shut it off" in ir3
<cwabbott> not great
<robclark> cwabbott: fwiw, the situation where the lack of a pipeline cache became fairly noticeable was with disk_cache disabled.. but not sure that matters from PoV of "how this all fits together".. and it is easy enough to disable the ir3 disk cache
<robclark> but I'll need to sit down and look at the vk pipeline cache helper mr to have a more intelligent response
iive has joined #dri-devel
<cwabbott> robclark: fwiw, from freedreno's point of view, I think you can think of the pipeline cache as a weird combination of ir3_program_state cache and variant cache
<cwabbott> well that, plus it's also used for caching NIR shaders (i.e. stuff that mesa/st does in gallium land)
Milardo has quit []
<cwabbott> it has an interface similar to ir3_cache if you squint hard enough, except that you can also serialize and deserialize it
<cwabbott> so it's more used for caching shader binaries rather than cmdstream
<robclark> not sure if it is a dumb idea or not, but I suppose if you serialize the nir as well you could create a new ir3_shader and then deserialize the ir3_shader_variant from that?
ybogdano has joined #dri-devel
<cwabbott> I guess, but that's more overhead than just sharing the ir3_shader_variant
<cwabbott> we'll probably be caching the ir3_shader
<cwabbott> at which point, I guess we technically wouldn't need to reference count the variants?
<cwabbott> just stop serializing/deserializing them, and have serializing/deserializing the shader also serialize the variants it owns
<cwabbott> and not even have the variants themselves in the vk pipeline cache
mszyprow_ has quit [Ping timeout: 480 seconds]
<cwabbott> yeah, that might be a workable plan
mclasen has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
devilhorns has quit []
<robclark> cwabbott: I started down that path before, but the thing is new variants can show up on any draw
ahajda_ has quit [Read error: Connection reset by peer]
<cwabbott> robclark: not with vulkan
<robclark> which is why I eventually gave up and made the variant the thing that gets cached/serialized
<robclark> right, I meant w/ gl
<cwabbott> so caching the shader might be the answer for vulkan
<cwabbott> although, there is the disk_cache thingy as a backup
<cwabbott> which might screw up that plan
<cwabbott> turns out lots of games don't actually use the pipeline cache and we have to provide a default one
<robclark> it would be trivial enough to add an arg to ir3_compiler to disable its in-built disk_cache, if that simplifies things for vk
<cwabbott> except that the default one can't just dump the entire state after compiles are done, like the app is supposed to do
<cwabbott> because the driver obviously doesn't know when compiles are done
<robclark> hmm, I thought the default pipeline cache thing solved that.. but tbh I've not had a chance to look at that yet
pnowack has quit [Ping timeout: 480 seconds]
<agd5f> Lyude, I think Greg's comments are short sighted. If the patch is valid, I don't see a reason to not commit it
<Lyude> I'm fine with that, I'm mainly just wondering if linus will be unhappy with something like that or not
<Lyude> ( @ agd5f )
<agd5f> we got some fixes from umn folks and I'm not planning to revert them. The ones we got were valid bug fixes at least.
<agd5f> I don't have the time to retype valid patches. That also has ethical implications
<cwabbott> robclark: the default pipeline cache is just disk_cache
<cwabbott> which has the same problem of new variants showing up after caching the shader, if you cache the ir3_shader
mclasen has quit [Ping timeout: 480 seconds]
<robclark> cwabbott: I suppose we could keep the existing path, but also add a path that serializes the ir3_shader and its (presumably) single variant perhaps?
<cwabbott> robclark: with the pipeline cache a ir3_shader can have multiple variants
<cwabbott> assuming we get a cache hit
<robclark> hmm
<cwabbott> atm there's only one variant
<cwabbott> so yeah, maybe caching the variant is still best, dunno
<cwabbott> I think that requires partially extracting the vulkan cache code into src/util and making it derive from vk_pipeline_cache_object
pnowack has joined #dri-devel
Haaninjo has joined #dri-devel
<cwabbott> yeah, the more I think about it the less sure I am that it's viable
<cwabbott> disk_cache doesn't handle "updating" something with the same hash, it assumes things being cached are immutable
ngcortes has joined #dri-devel
<cwabbott> but vk_pipeline_cache would probably get messed up if we change the hash of something suddenly
<robclark> can the pipeline cache cache arbitrary things (ie both shader and variant)?
<cwabbott> yes, it can
<cwabbott> other drivers have multiple levels of cache
<cwabbott> so, a cache for spirv->nir
<cwabbott> plus an overall cache that caches all the binaries (variants) given the nir shaders + other state
<cwabbott> plus a post-linking cache (similar to existing ir3 cache)
<cwabbott> you can be as creative as you want with the combinations
<robclark> I guess you can cache ir3_shader and ir3_shader_variant separately.. although tbh if you cache the spirv->nir you don't really need to cache the shader itself
<cwabbott> well, there are a bunch of lowering steps between spirv->nir and the shader
<robclark> well, cache spirv->lowering->nir then?
<robclark> or is tu handling some variant stuff outside of ir3?
<cwabbott> tu does have some variant-like stuff
<cwabbott> for example multi-pos output depends on the multiview mask which we get from the subpass
<cwabbott> the multi-pos lowering happens in tu_create_shader
Lucretia has quit []
<robclark> so mesa/st does some variant stuff, and has its own variant key.. so I guess you could have tu_variant_key and have three levels of caching ;-)
<cwabbott> like I said, we can get as creative as we want :)
<cwabbott> but I think we do need to cache variants unfortunately
<cwabbott> I mean, we have to cache ir3_shader_variants "ourself"
<robclark> it would be easy enough to expose the variant key and serialization stuff so it could be re-used directly from tu
Lucretia has joined #dri-devel
<jenatali> Is there a ready-made pass that lowers varying doubles into 2x uints, and deals with loads/stores appropriately?
<anholt> jenatali: feel free to steal any part of nir_to_tgsi_lower_64bit_intrinsic()
<jenatali> Cool
<jekstrand> jenatali: nir_lower_io_lower_64bit_to_32
<jekstrand> automagic!
<jenatali> There it is, that's what I'm looking for, thanks
<anholt> ooh!
Daanct12 has joined #dri-devel
<jenatali> I assumed there must've been one, given all the restrictions I saw in the spec about that
<anholt> that said, with how much NTT I've got waiting for review right now, probably not going to bother cleaning up
<jekstrand> :-/
<jenatali> Looking closer I'm not dealing with varyings yet, but I'll need that soon enough probably
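Wiring that option up is essentially a one-liner in the driver's I/O lowering; a sketch, where type_size_vec4 stands in for whatever slot-counting callback the driver already passes to nir_lower_io:

```c
#include "nir.h"

/* type_size_vec4 is a placeholder for the driver's existing callback. */
static void
lower_64bit_varyings(nir_shader *s,
                     int (*type_size_vec4)(const struct glsl_type *, bool))
{
   /* Splits 64-bit varying loads/stores into pairs of 32-bit ones as part
    * of the regular I/O lowering. */
   NIR_PASS_V(s, nir_lower_io, nir_var_shader_in | nir_var_shader_out,
              type_size_vec4, nir_lower_io_lower_64bit_to_32);
}
```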
<jekstrand> Sorry. Some of that's waiting on me. :-|
<jekstrand> Hopefully, I'll be able to start chipping away at the review backlog next week. This week's been burned doing Collabora new hire stuff and farting around trying to figure out a good aarch64 setup.
<jekstrand> And writing a blog post
<jenatali> I'm 2/3 of the remaining extensions down to get GL4.0 :) just need to finish plumbing fp64 which should be easy enough given that there's full software support if I need it
Danct12 has quit [Ping timeout: 480 seconds]
<imirkin> jenatali: tess and gpu_shader5 are generally the hard ones in that list. you're in the home stretch
<jenatali> Yep!
<jenatali> The transform feedback 2 and 3 ones gave me a bit of a headache, especially since xfb3 actually started using multiple GS streams that I thought I'd already done with gpu_shader5
<imirkin> hehe
<jenatali> I'm pretty sure there's no piglits for positive tests for indexed queries though
<jenatali> Since I didn't implement those and I'm not seeing failures
<imirkin> uhm
<imirkin> there def are
<jenatali> Maybe they're just not in the quick_gl or quick_shader passes, hm
<imirkin> not sure precisely what quick_gl does
<zmike> use the full gpu profile
<zmike> there's definitely tests
<jenatali> Ack, I'll dig harder
<zmike> no I mean literally the profile is named 'gpu'
<imirkin> perhaps they fail for other reasons ;)
<jenatali> Oh I see. But no they're not failing for other reasons, unless their skip conditions are just completely broken :P
<imirkin> jenatali: so there's at least arb_transform_feedback_overflow_query
<imirkin> which definitely tests for it. but perhaps you don't have that ext?
<jenatali> Right
<zmike> isn't that 4.4 or something?
<imirkin> jenatali: arb_gpu_shader5/execution/xfb-streams.c
<jenatali> Huh, that's passing for me...
* jenatali sees why
<imirkin> for (i = 0; i < STREAMS; i++) {
<imirkin> glBeginQueryIndexed(GL_PRIMITIVES_GENERATED, i, queries[i]);
<imirkin> perhaps the test is easy to pass. dunno :)
<jenatali> If the shader writes 1 primitive to each stream then sure I'd pass it, right now all of those would just be stream 0 queries lol
<jenatali> Annnnd yep that's what it does lol
ngcortes has quit [Ping timeout: 480 seconds]
<imirkin> not a great test then :)
<zmike> I think the enhanced layouts tests do more with streams
<jenatali> I'll go ahead and hook up the index and maybe I'll just pass those tests when I get to them. Or when I dig into CTS for these new features
<imirkin> there are def CTS tests for this stuff too
<jenatali> Yeah I'd assumed so
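The indexed-query pattern the piglit test exercises, in a minimal form (query objects, the multi-stream GS program, and GL loader setup are assumed to exist already):

```c
#include <epoxy/gl.h>

static void draw_with_stream_queries(const GLuint *queries, int streams)
{
   /* One GL_PRIMITIVES_GENERATED query per vertex stream. */
   for (int i = 0; i < streams; i++)
      glBeginQueryIndexed(GL_PRIMITIVES_GENERATED, i, queries[i]);

   glDrawArrays(GL_POINTS, 0, 4);

   for (int i = 0; i < streams; i++)
      glEndQueryIndexed(GL_PRIMITIVES_GENERATED, i);

   /* As noted above, a driver that treats every stream as stream 0 still
    * passes when the GS emits exactly one primitive per stream. */
}
```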
ngcortes has joined #dri-devel
mlankhorst has joined #dri-devel
Peste_Bubonica has quit [Quit: Leaving]
<jekstrand> Ugh... Looks like the Vulkan CTS is broken for 1.0. :-/
<jekstrand> Specifically, stuff calling GetPhysicalDeviceProperties2 unconditionally...
<imirkin> should add an option to drivers to expose the minimum stuff?
<imirkin> (to help test cts ;) )
<jekstrand> Yeah...
<imirkin> or maybe fuzz it
<jekstrand> Or I can just enable VK_KHR_get_physical_device_properties2 in panvk and forget about it.
<imirkin> hehehe
<imirkin> i wonder which will take longer
<imirkin> adding a switch to the driver, or fixing cts
<jekstrand> Nothing takes longer than fixing dEQP bugs
nico_ has joined #dri-devel
nico_ has left #dri-devel [#dri-devel]
<anarsoul> jekstrand: I just finished reading https://www.jlekstrand.net/jason/blog/2022/01/in-defense-of-nir/ - that's a really nice post :)
Haaninjo has quit [Quit: Ex-Chat]
Duke`` has quit [Ping timeout: 480 seconds]
<graphitemaster> Yeah, a really nice post.
nsneck has joined #dri-devel
ngcortes has quit [Read error: Connection reset by peer]
LexSfX has quit []
nsneck has quit [Quit: bye]
nsneck has joined #dri-devel
hch12907 has joined #dri-devel
LexSfX has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
jfalempe has quit [Quit: Leaving]
hch12907 has joined #dri-devel
<jenatali> Ugh. Why don't we have double-precision ffract
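One way to lower it from ops that do exist, since fract(x) == x - floor(x); a nir_builder sketch, not an existing pass:

```c
#include "nir_builder.h"

/* x is assumed to be a 64-bit float value. */
static nir_ssa_def *
lower_dfract(nir_builder *b, nir_ssa_def *x)
{
   return nir_fsub(b, x, nir_ffloor(b, x));
}
```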
<FLHerne> jekstrand: The existence of that post makes me worry Intel want to ditch NIR and do some nonsensical over-the-wall thing with IBC :-(
<FLHerne> also that you've left
gouchi has quit [Remote host closed the connection]
hch12907_ has quit [Ping timeout: 480 seconds]
hch12907_ has joined #dri-devel
hch12907 has quit [Ping timeout: 480 seconds]
moony has joined #dri-devel
hch12907 has joined #dri-devel
tjaalton has joined #dri-devel
iive has quit [Ping timeout: 480 seconds]
urja has joined #dri-devel
hch12907_ has quit [Ping timeout: 480 seconds]
<anholt> argh, wtf. skqp built for amd64: runs vk backend tests fine. skqp built for arm64 with the same flags: opens libvulkan, but doesn't even log anything under VK_LOADER_DEBUG=all, never gets to the driver, acts as if the tests don't exist.
<anholt> gagallo7[m]: any idea?
iive has joined #dri-devel
<anholt> oh. well done, skia. opens libvulkan.so instead of libvulkan.so.1 because library abis are for chumps. arm system didn't have libvulkan-dev, so the bare .so link was missing.
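The kind of fallback that avoids this class of surprise: try the versioned SONAME first and only fall back to the bare .so, which exists only when the -dev package is installed; a small sketch:

```c
#include <dlfcn.h>
#include <stddef.h>

static void *open_vulkan_loader(void)
{
   void *lib = dlopen("libvulkan.so.1", RTLD_NOW | RTLD_LOCAL);
   if (!lib)
      lib = dlopen("libvulkan.so", RTLD_NOW | RTLD_LOCAL);
   return lib; /* NULL if neither is present */
}
```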
mszyprow_ has joined #dri-devel
<HdkR> Ouch
ngcortes has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
mlankhorst has quit [Ping timeout: 480 seconds]
hch12907_ has joined #dri-devel