ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
Daanct12 has joined #dri-devel
pnowack has quit [Quit: pnowack]
jewins has quit [Ping timeout: 480 seconds]
tursulin has quit [Ping timeout: 480 seconds]
garrison has joined #dri-devel
i-garrison has quit [Read error: Connection reset by peer]
mclasen has quit []
mclasen has joined #dri-devel
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
camus1 has quit []
co1umbarius has joined #dri-devel
camus has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
<mareko> what's faster - virgl or zink+venus?
ella-0 has joined #dri-devel
ella-0_ has quit [Remote host closed the connection]
Company has quit [Quit: Leaving]
mclasen has quit []
mclasen has joined #dri-devel
aravind has joined #dri-devel
<jekstrand> mareko: I'm not sure if venus is far enough along to answer that question. My suspicion is that venus would be quite a bit faster once things are tuned.
mclasen has quit []
mclasen has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
shankaru has joined #dri-devel
xd123 has quit [Remote host closed the connection]
shankaru has quit [Quit: Leaving.]
shankaru has joined #dri-devel
sdutt_ has joined #dri-devel
sdutt has quit [Read error: Connection reset by peer]
<anholt> mareko: angle+venus is already looking better than virgl iirc, so I would expect zink+venus to be similarly good for non-tilers.
pnowack has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
pnowack has quit []
<airlied> jekstrand: did you ever get anywhere on generic ycbcr lowering?
pnowack has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
danvet has joined #dri-devel
Duke`` has joined #dri-devel
mattrope has quit [Read error: Connection reset by peer]
<airlied> jekstrand: found the stagnant branch :)
fxkamd has quit []
YuGiOhJCJ has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
itoral has joined #dri-devel
pallavim has joined #dri-devel
saurabhg has joined #dri-devel
lemonzest has joined #dri-devel
kbommu has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
camus has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
vnayana has joined #dri-devel
mclasen has quit []
alanc has quit [Remote host closed the connection]
mclasen has joined #dri-devel
frieder has joined #dri-devel
alanc has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
tzimmermann has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
co1umbarius has quit [Remote host closed the connection]
itoral has quit [Remote host closed the connection]
frankbinns has joined #dri-devel
itoral has joined #dri-devel
frankbinns has quit []
frankbinns has joined #dri-devel
saurabh_1 has joined #dri-devel
co1umbarius has joined #dri-devel
saurabhg has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
jkrzyszt has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
tursulin has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
MajorBiscuit has joined #dri-devel
MajorBiscuit has quit []
MajorBiscuit has joined #dri-devel
lynxeye has joined #dri-devel
nchery has quit [Quit: Leaving]
sdutt_ has quit [Ping timeout: 481 seconds]
camus has joined #dri-devel
<karolherbst> jekstrand: event/queue blocking stuff implemented :)
* karolherbst hopes stuff doesn't deadlock now
saurabh_1 has quit [Ping timeout: 480 seconds]
<karolherbst> I don't really hate how I implemented stuff in the end :D
mclasen has quit []
mclasen has joined #dri-devel
rasterman has joined #dri-devel
pcercuei has joined #dri-devel
<tjaalton> any idea why CONFIG_SYSFB_SIMPLEFB makes booting with nomodeset fail? the display is just stuck at showing 'loading initramfs' but the machine boots up fine
<tjaalton> at least 5.15 and up seem affected
<pq> There were long discussions about what nomodeset should actually do and how to unify that across drivers. Maybe that's the change?
<tjaalton> where was that?
<pq> here and dri-devel mailing list I think
<javierm> tjaalton: what's your DRM driver ?
<tjaalton> I see some patchset from javierm
<tjaalton> javierm: i915
<tjaalton> if the hw is too new and the driver doesn't support it yet, the situation is mostly the same
<tjaalton> -> no output
<javierm> tjaalton: and do you have CONFIG_DRM_SIMPLEDRM enabled ?
<tjaalton> yes
<javierm> tjaalton: can you share your boot log ?
<javierm> or dmesg output
<tjaalton> sure, hang on
rkanwal has joined #dri-devel
<javierm> and btw, the nomodeset handling for PCI DRM drivers didn't really change, only for platform DRM drivers
<javierm> but simpledrm didn't handle some setups where efifb was working correctly
<pq> cool
<tjaalton> well I'm not seeing simpledrm loading
<tjaalton> so could be it's a distro thing
<pq> Am I being too much of a PITA in the "drm: Add GPU reset sysfs event" thread?
<tjaalton> javierm: https://dpaste.com//EQWVLSWRR
<javierm> tjaalton: ah, that would explain it. Because CONFIG_SYSFB_SIMPLEFB will make the kernel register a "simple-framebuffer" platform device instead of an "efi-framebuffer" platform device
<javierm> tjaalton: so it seems your distro has CONFIG_FB_EFI and CONFIG_SYSFB_SIMPLEFB enabled instead of CONFIG_DRM_SIMPLEDRM and CONFIG_SYSFB_SIMPLEFB
<javierm> efifb driver is registered but no "efi-framebuffer" platform device to match so the driver is never probed
<tjaalton> CONFIG_DRM_SIMPLEDRM is enabled
<tjaalton> all three are
<javierm> tjaalton: I see
<emersion> jekstrand: where are we at with this "extract implicit fence" kernel patch?
<emersion> i'd really like to make use of it
<javierm> tjaalton: so I guess that simpledrm then fails to request the I/O memory region since that was already grabbed by efifb
<javierm> tjaalton: it may be worth increasing the debug output, i.e "debug initcall_debug log_buf_len=16M ignore_loglevel"
<javierm> tjaalton: but it's a kernel config issue. You want CONFIG_DRM_SIMPLEDRM=y and CONFIG_FB_EFI not set
<tjaalton> ah
<tjaalton> I'll give that a go
<javierm> so the output would be firmware -> simpledrm -> native DRM driver or firmware -> simpledrm (with nomodeset)
<javierm> but right now it's firmware -> efifb (and the "efi-framebuffer" pdev is never registered)
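A minimal sketch of the name-matching behaviour javierm is describing, assuming only the standard platform-bus rules: platform drivers bind to devices by exact name, so once sysfb registers a "simple-framebuffer" device, a driver that matches "efi-framebuffer" has nothing to probe. The driver name mirrors the real efifb driver; the stub body is illustrative.

    #include <linux/module.h>
    #include <linux/platform_device.h>

    static int efifb_probe(struct platform_device *pdev)
    {
        /* the real probe (drivers/video/fbdev/efifb.c) maps the firmware
         * framebuffer; a stub is enough to show the binding rule */
        return 0;
    }

    static struct platform_driver efifb_driver = {
        .probe  = efifb_probe,
        .driver = { .name = "efi-framebuffer" }, /* binds only to this exact device name */
    };
    module_platform_driver(efifb_driver);
    /* With CONFIG_SYSFB_SIMPLEFB=y the firmware framebuffer is registered as
     * "simple-framebuffer", so efifb_probe() above is never called. */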
mclasen has quit []
mclasen has joined #dri-devel
saurabhg has joined #dri-devel
flacks has quit [Quit: Quitter]
itoral has quit [Remote host closed the connection]
<tjaalton> sounds like the options should conflict if they can't both be set
itoral has joined #dri-devel
flacks has joined #dri-devel
<javierm> tjaalton: agreed. There's a series improving that, one sec I'll find it
<javierm> pq: re: "drm: Add GPU reset sysfs event" thread - I believe you are raising good points with your concerns
<pq> thanks :-)
<tjaalton> javierm: thanks, I'll read the thread
<javierm> tjaalton: but I think you are correct and CONFIG_SYSFB_SIMPLEFB should conflict with CONFIG_FB_EFI and CONFIG_FB_VESA
<javierm> tjaalton: the problem is that even when CONFIG_SYSFB_SIMPLEFB is enabled, as a fallback either an "efi-framebuffer" or a "vesa-framebuffer" is registered
<javierm> but in your case the video mode was compatible with what's supported by simple{fb,drm}, and so a "simple-framebuffer" was registered instead of an "efi-framebuffer"
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
<karolherbst> jekstrand, airlied: implementing all that user event stuff really helped a ton: Pass 1968 Fails 181 Crashes 29 Timeouts 0: 100%| :)
qyliss has quit [Quit: bye]
qyliss has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
saurabhg has quit [Ping timeout: 480 seconds]
piggz has quit [Quit: Konversation terminated!]
piggz has joined #dri-devel
<karolherbst> doing a run with image support disabled, because that seems to account for most of the failures
saurabhg has joined #dri-devel
sagar__ has quit [Remote host closed the connection]
sagar__ has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
rkanwal has quit [Remote host closed the connection]
rkanwal has joined #dri-devel
ahajda has joined #dri-devel
sagar__ has quit [Remote host closed the connection]
itoral has quit []
sagar__ has joined #dri-devel
saurabhg has quit [Ping timeout: 480 seconds]
rkanwal has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
mvlad has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
cyrozap has quit [Quit: Client quit]
cyrozap has joined #dri-devel
cheako has joined #dri-devel
thellstrom has joined #dri-devel
pjakobsson has joined #dri-devel
jewins has joined #dri-devel
oneforall2 has joined #dri-devel
sdutt has joined #dri-devel
sdutt has quit []
sdutt has joined #dri-devel
aravind has quit [Read error: Connection reset by peer]
aravind has joined #dri-devel
ella-0_ has joined #dri-devel
ella-0 has quit [Read error: Connection reset by peer]
fxkamd has joined #dri-devel
thellstrom has quit [Remote host closed the connection]
FireBurn has quit [Quit: Konversation terminated!]
kbommu has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
mattrope has joined #dri-devel
<karolherbst> Pass 1983 Fails 89 Crashes 6 Timeouts 0: 100%| with images disabled :)
saurabhg has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
fxkamd has quit []
saurabhg has quit [Ping timeout: 480 seconds]
<alyssa> Woo!
lemonzest has quit [Quit: WeeChat 3.4]
<jekstrand> karolherbst: \o/
<jekstrand> emersion: I think once König's fence rework lands (I need to go review it some more), he'll sign off on it.
<karolherbst> yeah.. I think embedded profile CL 1.0 without images compliance is kind of the first step :D
<emersion> jekstrand: ah nice!
<karolherbst> at least this is a valid PoC that implementing CL in rust within mesa is doable
saurabhg has joined #dri-devel
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #dri-devel
<karolherbst> jekstrand: there is one thing I want to get right, and that is compiling to a device binary when the kernel is created, to fetch some stats like how many threads can be launched with the compiled binary. That information was something we never had in clover.
<karolherbst> robclark: ^^ that can also be the reason why stuff is slow with clover
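The per-binary limit karolherbst wants to surface maps to a standard CL query; a minimal sketch, assuming `kernel` and `device` were created earlier:

    #include <CL/cl.h>

    size_t max_wg = 0;
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(max_wg), &max_wg, NULL);
    /* max_wg is per compiled kernel and can be lower than the device-wide
     * CL_DEVICE_MAX_WORK_GROUP_SIZE (register pressure, local memory use,
     * etc.) -- exactly the information clover never exposed. */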
<jekstrand> karolherbst: Yeah, that'd be nice
<karolherbst> okay.. fixed enough stuff, new run :D
<karolherbst> I don't want to enable full profile :(
<karolherbst> last time I did, it only got to 28% of the tests after an hour
<jekstrand> heh
<jekstrand> karolherbst: Still running llvmpipe?
<karolherbst> yeah
<karolherbst> I have work stuff to do in the meantime, so I don't want to trash my system :D
<karolherbst> _but_
<karolherbst> I think iris should work alright now as I already implemented fencing and stuff
<jekstrand> \o/
<karolherbst> jekstrand: but my CPU isn't that slow, so I don't think it should matter all that much
<karolherbst> those conversions tests just take soo long
<karolherbst> yeah. that fencing got me those 500 passes in the end
<karolherbst> conversions and math tests use userevents, so I implemented it all in one go
frankbinns1 has joined #dri-devel
frankbinns has quit [Read error: Connection reset by peer]
frankbinns2 has joined #dri-devel
frankbinns1 has quit [Read error: Connection reset by peer]
<jekstrand> anholt: Can I use glob specifiers in fails.txt?
frankbinns2 has quit []
frankbinns has joined #dri-devel
rkanwal has joined #dri-devel
shankaru has quit [Quit: Leaving.]
rkanwal has quit []
rkanwal has joined #dri-devel
Haaninjo has joined #dri-devel
JohnnyonFlame has joined #dri-devel
i-garrison has joined #dri-devel
garrison has quit [Read error: Connection reset by peer]
rkanwal has quit [Remote host closed the connection]
rkanwal has joined #dri-devel
<graphitemaster> super upsetting AMD squandered their SSD-on-GPU thing by not rolling out their API correctly, basically set this idea up for failure so no one would ever try it again.
dinfuehr has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
<graphitemaster> Despite the fact it would've been a _huge_ paradigm shift and actually useful for games and video editing.
<imirkin> isn't MS rolling out a DirectStorage now, which i'm guessing is just P2P DMA?
<graphitemaster> DirectStorage just provisions a regular SSD for GPU access over PCIe, when AMD's SSG stuff put the SSD on the GPU and shortened the traces and allowed the GPU to directly access it.
<graphitemaster> Like without that expensive bus and protocol and slower transfer speeds.
<imirkin> you mean an SSD *on* the GPU?
kbommu has joined #dri-devel
<graphitemaster> DirectStorage is sort of the middle-ground I guess.
<graphitemaster> Yeah AMD has the "SSG", an SSD *on* the GPU
<imirkin> i.e. direct-attached or whatever
<imirkin> ah, had no idea that was a thing
<karolherbst> the hell.. map_buffers broken for subbuffers...
<MrCooper> graphitemaster: pretty sure it's on the graphics card (connected via a PCIe bridge), not on the GPU itself :)
<imirkin> i think that was implied :p
<imirkin> MrCooper: do you take issue with people calling the computer case a "CPU"? :)
<MrCooper> maybe DirectStorage can work with that as well
<MrCooper> imirkin: never heard that, but I most certainly would
rkanwal1 has joined #dri-devel
<imirkin> MrCooper: it's pretty common in non-tech circles
<imirkin> connect the monitor to the CPU :)
<vsyrjala> not connect the computer to the cpu?
<imirkin> good point.
<imirkin> although with the all-in-ones, hard to tell
<vsyrjala> we should go back to the "where is the cpu? it is hangar 2" days
<imirkin> in those days, that was accurate, of course
rkanwal has quit [Ping timeout: 480 seconds]
piggz has quit [Ping timeout: 480 seconds]
<agd5f> graphitemaster, P2P DMA has been a struggle to enable on all OSes.
jewins1 has joined #dri-devel
<agd5f> yeah, SSG was just a PCIe bridge with GPU and SSDs
<pq> Ok, I give up on the GPU reset sysfs event. Trying to figure out what it's used for is like pulling teeth. Have fun, go wild. I promise to try very hard to not care anymore.
<agd5f> pq, are we talking past each other maybe?
jewins has quit [Ping timeout: 480 seconds]
<agd5f> pq, the requirement we have is to provide an event that an application can listen for to be notified that a GPU reset took place.
<agd5f> it could be a telemetry app to provide better data about GPU hangs, or it could be a daemon that wants to do something more like I described.
<agd5f> we just need a notification regardless of the use case
<agd5f> other drivers already have notifications. I guess we could add an amdgpu specific one as well.
<emersion> but other drivers have driver-specific notifications to be able to support robustness, i think?
<emersion> pq, :(
<emersion> agd5f: introducing a new uAPI doesn't work if you come with "some userspace which might exist in the future might need it"
<emersion> the userspace already has to exist (as in: patches posted somewhere) by the time the kernel-side uAPI patches are submitted
<emersion> (perfectly fine to send RFCs before userspace exists, but i'm explaining the merge criteria here)
mbrost has joined #dri-devel
rkanwal has joined #dri-devel
rkanwal1 has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
<agd5f> emersion, I understand the merge criteria. I wasn't proposing to merge this without it.
Duke`` has joined #dri-devel
piggz_ has joined #dri-devel
<anarsoul> out of curiosity, will there be in-person XDC this year?
rkanwal has quit [Ping timeout: 480 seconds]
<karolherbst> anarsoul: I guess that's really hard to tell at this point, although... depends on the country
<karolherbst> but I hope so because I started to lose all my patience with those anti vaxers
<karolherbst> and I stop to care
piggz_ has left #dri-devel [#dri-devel]
piggz has joined #dri-devel
<HdkR> Plan is for Minneapolis this year right?
<karolherbst> ohh, true
piggz has left #dri-devel [#dri-devel]
piggz has joined #dri-devel
<piggz> probably our compositor doesn't support dmabuf
<karolherbst> piggz: probable indeed
aravind has quit [Ping timeout: 480 seconds]
<karolherbst> piggz: looking at the MR it does point towards MRs for compositors
ybogdano has joined #dri-devel
idr has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
jkrzyszt has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
<MrCooper> leandrohrb: ^ is it intentional that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11248 made the Wayland dma-buf protocol required?
<alyssa> anarsoul: if so I'm hoping for a hybrid option
<anarsoul> :(
<alyssa> but it does look like it'll be in person
saurabhg has quit [Ping timeout: 480 seconds]
<anarsoul> alyssa: 2h flight from Toronto :)
gouchi has joined #dri-devel
<alyssa> been reflecting a lot on flying to conferences, over the pandemic, and how the environmental impact of aviation stacks up against the quality of online conferences
<alyssa> Haven't reached any firm conclusions yet
<karolherbst> alyssa: same
<alyssa> I do notice quite a bit of pressure for me to fly to in-person conferences.
<alyssa> Not a point for or against, just an observation.
<karolherbst> alyssa: well, as long as I don't cross continents I try to use the train, but I know that can be a pain in North America for instance
<alyssa> Yeah... Train is a good option for me to get to any major city in Ontario or Quebec, and that's about it.
<alyssa> In theory New York City would be doable too, in practice it would be expensive (even relative to flying) and a very long day and I don't think cross-border trains have restarted yet (since the pandemic)
<jekstrand> anholt: Is there some sort of a script for updating the fails files?
gouchi has quit [Remote host closed the connection]
<tjaalton> javierm: disabling CONFIG_FB_EFI didn't help, it still fails the same way
MajorBiscuit has quit [Quit: WeeChat 3.4]
<zmike> karolherbst: any chance you're looking to add a push constant interface to gallium?
<karolherbst> zmike: not at this point
<karolherbst> maybe in a few years? dunno. When perf starts to matter for CL :D
<zmike> just checking
<karolherbst> yeah, although I am not sure we really need it as cbuf0 is already a little special, but oh well
<karolherbst> would be better to make it explicit
<zmike> I would have used it in lavapipe to avoid some awfulness I'll have to do instead
<karolherbst> like using cbuf0 for push buffers?
<karolherbst> ehh
<karolherbst> push constants
<HdkR> alyssa: Obviously the solution is to rent an electric vehicle and drive
<karolherbst> HdkR: auto drive
<HdkR> :P
<HdkR> Never. Not until we get level 5 everywhere. So Never.
<karolherbst> well, eCars are not sustainable in the short term anyway
<karolherbst> so...
frankbinns has quit [Remote host closed the connection]
<ajax> no cars, as we currently define them, are sustainable
<karolherbst> just for someone who hopes that not having to change would in any way save our planet :)
<karolherbst> ajax: :D
<karolherbst> sure
<HdkR> aye
<tjaalton> javierm: does it really require drm=y, simpledrm=y?
<karolherbst> I survived until today without a drivers license :D
nsneck has joined #dri-devel
<karolherbst> although I guess for the US that's the only thing which would work out short term.. eCars I mean...
<zmike> cbuf0 is push constants now in lavapipe but it's also gonna have to be uniforms
<zmike> which will be terrible
<karolherbst> mhhh
<alyssa> zmike: ... how would a push constant interface in gallium work?
<zmike> it'd be like
<zmike> gallium: hey you got some push constants?
<zmike> me: yeah here you go
<zmike> gallium: cool, thanks
<alyssa> r-b
<karolherbst> I'd guess it won't be used by GL :D
<karolherbst> or mhh
<karolherbst> for small uniform buffers maybe
<karolherbst> ?
<karolherbst> dunno how much space devices usually have for that
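For reference, a sketch of the kind of hook zmike is asking about, modeled on the existing set_constant_buffer() callback; the name and signature are hypothetical, since no such gallium interface exists (and the discussion below concludes it shouldn't):

    #include "pipe/p_context.h"   /* struct pipe_context, pipe_shader_type */

    /* hypothetical pipe_context member -- not real Mesa API */
    void (*set_push_constants)(struct pipe_context *ctx,
                               enum pipe_shader_type shader,
                               unsigned offset, unsigned size,
                               const void *data);  /* contents copied at call time */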
<HdkR> Gallium sounds like a cool friend. Someone that would lend you a twenty for some dinner.
<zmike> gallium totally would
<alyssa> karolherbst: 512 bytes on Mali
<alyssa> (Bifrost and newer)
<karolherbst> alyssa: ufffff
<zmike> but then tc would steal that $20 and use it for coke and mentos
<karolherbst> that's not even enough for CL inputs I think
kbommu has quit [Remote host closed the connection]
<karolherbst> ahh no, 512 are good for CL 1.0
<karolherbst> can't do CL 1.1
<karolherbst> unless you use a const buffer
<karolherbst> or something else
<alyssa> (About 256 bytes on Midgard. Can be as little as 128 bytes in bad circumstances)
<karolherbst> alyssa: the absolute minimum is 256 in CL
<karolherbst> only way out: Custom device type
<karolherbst> but I guess you have const buffers, so you are fine anyway
<alyssa> for clover/rusticl, I just want you to set a cbuf to whatever you need and use that
<karolherbst> alyssa: sure.. but what would you use push constants for otherwise?
<alyssa> and the driver will sort out the optimal path itself
<karolherbst> the input buffer is generally small
<karolherbst> ahh yeah
<karolherbst> but hence zmike asking for an explicit push constants API so gallium could do that as well
<alyssa> It should not and can't
<alyssa> and this will be a mess in panvk
<karolherbst> why not?
<alyssa> because the device constraints are far too complicated to express as API
<jekstrand> Same with ANV. We'll figure out the push for you.
<alyssa> so if you want good perf, you need to just trust the driver
<alyssa> bind a cbuf and let us handle it
<karolherbst> okay
* jekstrand believes putting push constants in Vulkan was a bad choice
<karolherbst> then we won't have such an API
<alyssa> also please kill the weirdo clover cbuf API call thx
<alyssa> set_kernel_input or something
<alyssa> just make it a UBO bind please
<alyssa> (/cbuf)
<karolherbst> alyssa: yeah, that's my plan
<jekstrand> alyssa: The plan is to kill most of it.
frieder has quit [Remote host closed the connection]
<alyssa> jekstrand: delightful
<karolherbst> and add other stuff
<jekstrand> alyssa: The only clover-specific thing will be the one that binds a buffer and returns a fixed address.
<alyssa> nOooo
<karolherbst> alyssa: we have to extract information from compiled binaries
<karolherbst> like max thread count and the likes
<jekstrand> And maybe that
<karolherbst> since the client can be explicit about thread blocks, we have to report what a compiled binary can do, as this can be lower than what the device is capable of in the best case
Thaodan has joined #dri-devel
<HdkR> When I first heard of Vulkan push constants, I was hoping it was a construct to encode UBO updates directly in the command buffer. How sad I was when it wasn't that :(
<jekstrand> It should have been, in retrospect.
<jekstrand> I think there's ways we could have defined it that would have been less horrible.
<jekstrand> But we were all pretty enamoured with "represent the hardware!" at the time. :-/
<HdkR> So what we really need is a VK_EXT_UBO_Push_Constant and tell everyone to use that instead :P
<karolherbst> :D
<jekstrand> Something like that, maybe
pallavim has quit [Ping timeout: 480 seconds]
<jekstrand> IMO, there are two big problems with the push API:
<jekstrand> 1) The ridiculously low minimum limit of 128B
<karolherbst> uhhh
<jekstrand> 2) The fact that you need new SPIR-V to switch between push and a UBO
<karolherbst> ehhh
<zmike> yeah that 128 limit is insane
<karolherbst> sounds horrible
<HdkR> 2) is the one that messes with me the most
<karolherbst> and I've seen horrible things
<jekstrand> The 128B limit is thanks primarily to Qualcomm and AMD, IIRC.
<karolherbst> AMD?
<HdkR> Also oof at 128byte limit. If it is the GPU CP just consuming the work then it should be as large as the hardware can do memory updates :P
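The limit being complained about is queryable; a minimal sketch, assuming `phys_dev` is a valid VkPhysicalDevice:

    #include <vulkan/vulkan.h>

    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(phys_dev, &props);
    /* the spec only guarantees 128 bytes here -- the minimum jekstrand
     * is referring to */
    uint32_t max_push = props.limits.maxPushConstantsSize;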
<zmike> advanced minimalist devices
* karolherbst is still surprised how NVidia managed to make cbuf accesses as fast as registers
<karolherbst> it's even a valid idea to move constants into cbufs if that kills stupid movs
<karolherbst> as this can make shaders faster
<HdkR> karolherbst: SSBO access with uniform flag set in the latest hardware is nearly as fast even ;)
<karolherbst> HdkR: yeah..... I've heard
<jekstrand> them's some nice caches
<karolherbst> so CL constant mem living in global mem doesn't even matter all that much
<karolherbst> jekstrand: and huge
<karolherbst> like seriously huge
<karolherbst> I think we have like 18? cbufs with 64kb
<karolherbst> and for compute 10 of them are disabled for shader accesses
<jekstrand> The new LSC in Intel DG2 is supposed to be pretty spiffy but it still requires a SEND message so it's lots of extra shader work.
<karolherbst> and I suspect they are used for something else instead :)
<karolherbst> like as a huge cache
<HdkR> 20MB L1 + 40MB L2 in the compute cards is huge cache.
vnayana has quit [Ping timeout: 480 seconds]
<karolherbst> jekstrand: we can directly access ubos from inside alu instructions as sources
<karolherbst> HdkR: uhh
mclasen has quit []
<karolherbst> there is one huge drawback though
<karolherbst> non uniform indirect accesses to ubos are terrible
mclasen has joined #dri-devel
<jekstrand> karolherbst: Yeah. That's where the NV ISA seriously wins
<karolherbst> it's so terrible, it can be slower than global mem
<HdkR> Don't ever do non-uniform UBO access on Nvidia hardware. It's so bad.
<karolherbst> it really is
<HdkR> It serializes every active thread's access. So >32x slowdown
<karolherbst> but if you do such things on a GPU it's kind of your own fault
<karolherbst> don't do non uniform stuff
<karolherbst> it's that simple
* karolherbst still needs to wire up uniform regs
<HdkR> At least SSBO is your way out when you start going non-uniform
<karolherbst> yeah...
<HdkR> And uniform registers are definitely worth wiring up :D
<karolherbst> yep
<karolherbst> but you know.. perf and nouveau...
<HdkR> Free 32x perf increase for uniform work. Might help when running at idle clocks ;)
<imirkin> HdkR: doesn't speed up memory
<karolherbst> I am actually wondering how those uniform regs even work, but I suspect they put threads idle or something else weird
<karolherbst> just saving regs can't be the solution
<karolherbst> or maybe it is?
<karolherbst> imirkin: correct, but non uniform cbuf indirects are ____slow_____
<HdkR> imirkin: Pain all around
<imirkin> how often do those happen
<imirkin> compared to, say, RT writes.
<karolherbst> never outside of CTS'
<imirkin> ;)
<karolherbst> although I think in the compute world you see those crazy things
<HdkR> as far as I understand, everyone knows the Nvidia indirect cbuf pain so they all switch to SSBOs :D
<karolherbst> lol
<karolherbst> I know some of them
<karolherbst> :D
<imirkin> i've definitely seen like DX9 games have a ton of that
<imirkin> they have this weird conversion
<imirkin> which thinks it's a good idea to combine a bunch of unrelated uniforms together too
<karolherbst> imirkin: indirects, sure, but also non uniform?
<imirkin> oh. dunno.
<karolherbst> as long as the indirects are uniform that's all fine
<javierm> tjaalton: and simpledrm did probe ?
<karolherbst> non uniform is what kills perf
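A small OpenCL C illustration of the distinction: both reads below are indirect, but only the second is non-uniform (the kernel is illustrative only).

    __kernel void cbuf_access(__constant float *cb,
                              __global float *out,
                              int base)
    {
        int gid = get_global_id(0);
        float fast = cb[base];      /* uniform indirect: every lane reads the
                                       same address -- the pattern cbuf
                                       hardware handles well */
        float slow = cb[gid % 64];  /* non-uniform indirect: each lane reads a
                                       different address -- serialized per lane
                                       on NVIDIA, potentially slower than
                                       global memory */
        out[gid] = fast + slow;
    }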
<javierm> tjaalton: you could try booting with the increased debug using the kernel cmdline params I mentioned above and share your boot log
mbrost has quit [Ping timeout: 480 seconds]
<karolherbst> ehh "ERROR: Kernel preprocessor __OPENCL_VERSION__ undefined!kernel_preprocessor_macros FAILED" :/
<HdkR> imirkin: https://github.com/sebbbi/perftest grep for `cbuffer{float4} load linear` on that page and watch how Nvidia (And intel?) falls apart on non-uniform access :D
<imirkin> i believe ya
<alyssa> karolherbst: non-uniform uniforms, kinda funny
<karolherbst> alyssa: yep
<anholt> jekstrand: sorry, I was out at an appointment this morning. the fails I saw looked like maybe duplicate tests being run, which we should fix by not running duplicate tests. or, in a pinch, cat results/failures.txt >> anv-fails.txt.
<karolherbst> they are called UBOs for a reason
<Sachiel> undefined behavior objects
<jekstrand> anholt: I got it sorted. A bunch of stuff went fail -> crash and I didn't move the fails.
<jekstrand> anholt: It's all fixed in gerrit CLs (most of which have been merged)
<jekstrand> Sachiel: Apart from the one depth/stencil resolve CL, where are we at on recent CTS on ANV?
<alyssa> Sachiel: undefined behaviour objects include NaNs, infinities, and negative zeros
<alyssa> forming undefined bheaviour algebras
<anholt> jekstrand: I was looking to do a deqp-runner and deqp uprev soon anyway. are all the necessary shas listed in your xfails there?
<Sachiel> jekstrand: should be mostly fine, except on bdw. Though there are a couple tooling_info crashes that you have an MR for somewhere
<karolherbst> ehhh.. why is clang crashing :(
<Sachiel> jekstrand: that's from my last rc2 run though, so if we go to the currently last stable we might have more issues that got fixed since
<jekstrand> anholt: I've got change-ids for the two new groups in !14961
mbrost has joined #dri-devel
<karolherbst> next: clEnqueueBarrierWithWaitList
<karolherbst> ehhh
<karolherbst> is it just me or does this function feel pointless
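For context, the call in question; on an in-order queue with an empty wait list it is effectively a no-op, which is where the "pointless" feeling comes from (queue assumed created earlier):

    #include <CL/cl.h>

    cl_event barrier_ev;
    clEnqueueBarrierWithWaitList(queue, 0, NULL, &barrier_ev);
    /* all commands enqueued after this point wait for everything enqueued
     * before it -- meaningful on out-of-order queues, implicit on in-order */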
* karolherbst wonders if I have to run the tests as a CL 1.0 platform so that old APIs are tested as well
JohnnyonFlame has quit [Ping timeout: 480 seconds]
alyssa has left #dri-devel [#dri-devel]
lemonzest has joined #dri-devel
<tjaalton> javierm: ok I'll try it tomorrow. simpledrm did not get loaded
iive has joined #dri-devel
<tjaalton> javierm: also, what benefit does simpledrm bring compared to efifb? :)
<airlied> tjaalton: wayland can run
<airlied> or at least gnome-shell
<zmike> dcbaker / eric_engestrom: need a 22.1 milestone if either of you get a minute
<tjaalton> airlied: alright
<tjaalton> quick googling says nvidia doesn't work with simpledrm?
<javierm> tjaalton: the nvidia driver does not register an emulated fbdev device, so VT is not working
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
<javierm> tjaalton: it seems it always relied on efifb to register the fbdev that was used by fbcon
gouchi has joined #dri-devel
<tjaalton> javierm: okay, good to know, will have to wait for them to fix it
<dcbaker> that release is coming up, we're just 4 weeks away!
<zmike> from branchpoint?
<dcbaker> yeah
<dcbaker> April 13th
<zmike> whew
<Sachiel> monthly releases!
ngcortes has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
mvlad has quit [Remote host closed the connection]
<dcbaker> Sachiel: you can be the release manager then
<Sachiel> I'm not complaining
mbrost has quit [Ping timeout: 480 seconds]
<dcbaker> lol
prahal has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
DPA- has joined #dri-devel
DPA has quit [Ping timeout: 480 seconds]
mclasen has quit []
mclasen has joined #dri-devel
DPA- has quit [Ping timeout: 480 seconds]
DPA has joined #dri-devel
mbrost has joined #dri-devel
<karolherbst> I was just wondering why all those library related compiler test fail.. turns out I never implemented creating libraries :D
tzimmermann has quit [Quit: Leaving]
rasterman has quit [Quit: Gettin' stinky!]
lynxeye has quit []
<anholt> https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/374 : this is a pretty big deal for GL dma-buf testing, any other driver teams able to take a look?
<karolherbst> jekstrand: do you remember if something special needed to be done for extern kernels?
<karolherbst> kind of have the issue that clc complains about the same kernel being defined twice
<karolherbst> but one program is just having an extern declaration
<jekstrand> I don't remember anything about that
<jekstrand> I know we have to let extern stuff through to compile libclc
<jekstrand> But I'm not remembering extern kernels
<karolherbst> yeah....
<karolherbst> I just get this message: "The entry point "CopyBuffer", with execution model Kernel, was already defined."
<karolherbst> not sure if that's something wrong with clc or...
<karolherbst> the spirv does generate a stub kernel though
<karolherbst> mhhh
piggz has quit [Ping timeout: 480 seconds]
mbrost has quit [Read error: Connection reset by peer]
Haaninjo has quit [Quit: Ex-Chat]
<karolherbst> mhh
<karolherbst> I also don't see anything special inside CLOn12
<karolherbst> jekstrand: ohh.. I bet spirv 1.0 can't do it
<karolherbst> but I am wondering why 1.0 is used anyway
mclasen has quit []
mclasen has joined #dri-devel
<karolherbst> ehhh.. annoying
mclasen has quit []
mclasen has joined #dri-devel
mbrost has joined #dri-devel
<karolherbst> jenatali: do we have something upstream?
mclasen has quit []
mclasen has joined #dri-devel
<karolherbst> oh wow is that annoying
<karolherbst> that's causing like 10 fails :(
<airlied> karolherbst: you have the latest translator where I fixed the kernel wrappers?
<karolherbst> nope
<karolherbst> "Pass 2103 Fails 61 Crashes 7 Timeouts 0: 100%|" :) that goes smoother than I actually expected
<karolherbst> airlied: I kind of tried with whatever fedora ships
<airlied> karolherbst: okay I don't think the fix landed in Fedora
<karolherbst> did your fix even land upstream?
<airlied> yes it eventually landed upstream
<karolherbst> ohh nice
<airlied> only took 6 months
<karolherbst> I am still on llvm-13 I think
<karolherbst> yep
<airlied> karolherbst: what spriv-llvm-translator do you have installed?
jcdutton has joined #dri-devel
<karolherbst> 13.0.0
<jcdutton> Is this a channel to discuss things like bugs in mesa?
<airlied> dang it doesn't cherry-pick cleanly
<airlied> jcdutton: yes
<jcdutton> I am working with getting a GPU working over PCIe on an aarch64 platform, and getting bus errors when mesa tries to write stuff to the GPU. Essentially due to an alignment problem.
<jcdutton> So, my question is, why does mesa use memcpy when it should instead use a memcpy function that does aligned writes.
<jekstrand> Probably because memcpy is the obvious thing
<jekstrand> Can you be a bit more specific about what Mesa driver and where it's using an invalid memcpy?
<jcdutton> and search for "memcpy" on that web page
<karolherbst> I guess the main reason is, that some of the drivers are written with "x86 only" in mind, so stuff just assumes x86
<jekstrand> And things mapped across a PCIe BAR on aarch64 isn't exactly a use-case anyone has spent much time on.
<karolherbst> and will probably never work 100%
<karolherbst> because Arm is Arm and some SoC just miss out some MMU features or whatever random things which will lead to breakage
<karolherbst> also firmware also kernel drivers also....
<karolherbst> so.. patches are welcomed :)
<jekstrand> If memcpy to a mmap'd buffer doesn't work, you're going to have problems with a lot more than just mesa...
<jekstrand> That line you're seeing in Mesa is similar to things apps do all the time.
<karolherbst> well in this case it's about unaligned access, no?
<jekstrand> Maybe we need to fix memcpy?
<jcdutton> jekstrand, the problem is specific to a memcpy to a mmap that involves the PCIe bus
<karolherbst> jekstrand: yeah.....
<karolherbst> soo..
<karolherbst> people bring that up a lot of times
<karolherbst> thing is
<karolherbst> some Arm SoC are fine with that
<karolherbst> some... aren't
<anholt> memcpy doing unaligned accesses on armv8 would be broken. I'd guess the kernel isn't trapping it and fixing it up for you when it's bus memory, but the problem really is the unaligned access which is supposed to be invalid on arm.
<karolherbst> so do you want to hurt everybody because some of the crappy ones are not as feature rich?
<karolherbst> it's a tough call
<anholt> given rpi, I would be very unsurprised to hear that this involved some dodgy asm by raspberry pi.
<karolherbst> anholt: on all of arm or just on some of arm?
<karolherbst> I am hearing too many stories about things just work differently on this level on each Arm Soc
<jekstrand> anholt: Someone trying too hard to make memcpy fast?
<jcdutton> Well, in general, is the libc6 memcpy supposed to be alignment aware, or not. I don't actually know
<karolherbst> jcdutton: on x86 alignment only matters for things like SSE/AVX
<anholt> karolherbst: paging back in some ancient memory from rpi, but: it used to trap and emulate on 32-bit, then 64-bit decided they regretted the emulation and it was now illegal and you'd fault, but this then leaked across to 32-bit and now 32-bit arm running on 64 wouldn't work any more.
<jekstrand> memcpy is supposed to copy bytes of data between two pointers. If ARM requires alignment, it needs to handle that.
<anholt> jekstrand: I would bet money on rpi having replaced libc's memcpy.
<karolherbst> anholt: ... sounds messy
<anholt> karolherbst: it's arm!
<jekstrand> anholt: Of course... *sigh*
<karolherbst> anholt: yeah.. I know :D
<karolherbst> jekstrand: thing is.. do you think we would be the first one to try fixing memcpy?
<karolherbst> I think the issue was something along the lines that for normal memcpy it's all fine
<karolherbst> but once you involved device mem it breaks due to alignment
<jekstrand> karolherbst: I'd rather fix memcpy than replace every copy of memcpy in Mesa with a custom wrapper because memcpy is broken by someone writing too much custom asm.
<karolherbst> jekstrand: yeah.. but mempcy just works on sys mem :)
<jcdutton> karolherbst, or maybe memcpy is not the problem. If memcpy is not supposed to be alignment aware, we really need a different memcpy-like function that is specifically alignment aware, so things like mesa can use the alignment version only when needed.
<karolherbst> and once you fix it for io mem, the random dudes will arrive shouting about 5% sys perf regressions
<jekstrand> karolherbst: Yes, but the app is also going to memcpy into buffer memory and we're not going to also fix all those.
<karolherbst> jcdutton: bingo
<karolherbst> jekstrand: don't tell me
<karolherbst> I'm just explaining why it hasn't been fixed yet
<karolherbst> this issue is old
<karolherbst> nobody fixed it until now, do you really want to jump into this bikeshed of epic proportions?
<jekstrand> If you just fix it in mesa you'll maybe fix GL 1.2 and GLES 1.
<jekstrand> The moment an app uses a vertex buffer, you're toast
<karolherbst> sure
<karolherbst> so?
<anholt> the other piece I'd point out here is: there's lots of system memcpy happening on lots of arm platforms in mesa. so, what's special about this rpi and pcie case? this is why I'm betting on some mad rpi hax having broken memcpy.
<karolherbst> fix the apps using aligned memcpys :)
<jekstrand> So there's no point in putting hundreds of hacks in Mesa to fix a busted platform.
<karolherbst> jekstrand: yeah. that's the conclusion everybody reaches
<karolherbst> anholt: pcie is iomem where SoCs usually just have stolen ram?
<jcdutton> Are there arm platforms that can do memcpy across PCIe that do not need alignment?
<karolherbst> x86 I'd say
<karolherbst> because that stuff just works there
<karolherbst> and some Arm SoCs
<karolherbst> don't ask which one
<jekstrand> karolherbst: That doesn't matter. Unless we're also going to go "fix" every app on the planet, there's no point "fixing" mesa.
<anholt> karolherbst: from the kernel's perspective it's not importantly different.
<karolherbst> I just got told that some are better than others
lemonzest has quit [Quit: WeeChat 3.4]
<karolherbst> jekstrand: correct
<jekstrand> Or we can fix them all at the same time by fixing memcpy()
<karolherbst> jekstrand: try it
Duke`` has quit [Ping timeout: 480 seconds]
<karolherbst> I am just saying you won't be the first one to try
<karolherbst> anholt: but does it matter for the hw?
<jcdutton> Arn't there other CPUs that throw bus errors? I know those ancient sparc cpus did.
<karolherbst> might be
sdutt has quit [Read error: Connection reset by peer]
<karolherbst> but PCIe drivers are usually written against x86 in mind
<karolherbst> powerpc if you are adventurous
<HdkR> Ouch, unaligned IO memcpy? Asking for trouble there :P
<jekstrand> And, on those CPUs, memcpy works.
<icecream95> Even if memcpy is fixed, compiler optimisations might still generate code that causes SIGBUS
<karolherbst> :D
<karolherbst> okay, now that was something I wasn't aware of
<jcdutton> The problem with trying to fix it in memcpy is that making the dest aligned and the src not is relatively easy, and does not really add much of a performance hit.
<karolherbst> but it makes sense
<anholt> icecream95: that is not my experience on ARM that sigbusses on unaligned access.
<jcdutton> It is far more difficult to do a memcpy that does both src and dest alignment
<anholt> jcdutton: and yet memcpy does have to handle it.
<karolherbst> maybe I should ask nvidia on how they are going to handle that or are already
<jcdutton> anholt, but the app, like mesa, knows better whether the src or the dest needs alignment, so it could call a different memcpy function depending on which is needed, and therefore avoid the performance hit.
Anorelsan has joined #dri-devel
<karolherbst> jcdutton: the problem isn't mesa
<karolherbst> it's all the apps getting mapped memory
<jekstrand> jcdutton: Sure, we can use memcpy_aligned() as a perf optimization. But it's an optimization, not for correctness. memcpy() has to work.
<anholt> jcdutton: theoretically, it could. sure. yet, if your memcpy is violating the requirements of the memcpy function, then you're going to waste a lot of time trying to convince everyone else to use a memcpy-with-promises instead of just normal memcpy.
<karolherbst> you can just map GPU memory into your address space and do whatever
<karolherbst> through GL/Vk
<karolherbst> yeah.. the only real fix is to fix memcpy
<karolherbst> or just declare some Arm Socs as broken
gouchi has quit [Remote host closed the connection]
<HdkR> Declare some Arm Socs broken
<jcdutton> I looked at the memcpy for sparc. It is hellishly complex asm code!
<karolherbst> that's what the ones would say who don't have that issue :D
<karolherbst> jcdutton: yep
<HdkR> unaligned memory accesses on ARM shouldn't fault except in edge case instruction usage.
<karolherbst> peak performance
<karolherbst> vectorization of memcpy comes with a price of utterly unreadable and overly complex code
<karolherbst> but it is fast
<jcdutton> but the sparc code is alignment aware, which is probably why it is so complex
<karolherbst> uhh
<karolherbst> well
<karolherbst> that's kind of the deal with vectorizations anyway
<karolherbst> the x86 memcpy is also doing it to be able to use SSE and AVX
<anholt> looks like I was flipped, it's arm32 that traps, arm64 is cool with unaligned.
<karolherbst> ahh
<HdkR> glibc memcpy is alignment aware as well. It 16byte aligns on the destination to be sane.
<HdkR> aarch64 glibc memcpy*
<HdkR> anholt: Aye, ARMv8 is good stuff :)
<HdkR> It only does a SIGBUS if you're insane and try to do unaligned atomics like certain people.
<karolherbst> HdkR: okay.. so only arm32 is broken?
<karolherbst> HdkR: please....
<HdkR> Potentially only ARMv7, AArch32 probably is also sane
<karolherbst> yeah.. it's always hard to tell
<HdkR> I haven't touched that dirty 32-bit stuff since like 2015 so hard to remember.
<karolherbst> I don't have 32 bit arm stuff at home
<karolherbst> jcdutton: anyway... the only real solution is to fix memcpy
<HdkR> Building a test...
rkanwal has joined #dri-devel
sdutt has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
<jcdutton> memcpy sse on x86 is over 3000 lines of asm code!
<karolherbst> jep
<anholt> that sounds about right.
<karolherbst> it's peak performance
<jekstrand> And, IIRC, whether it runs backwards or forwards depends on your CPU. :D
<karolherbst> you will probably never see a memcpy impl being that optimized
<jcdutton> The arm64 memcpy is currently 280 lines!
<karolherbst> so doesn't even come close to 10% of the perf
<karolherbst> how sad
<HdkR> Latest ARM spec added a wackload of memcpy and memset instructions. It damn well better be the most optimal at that point.
<anholt> jekstrand: it was real classy when intel did the backwards thing to optimize for their hardware, where the write combiner on that hardware couldn't understand backwards access.
<jekstrand> anholt: "classy" is a word you could use for it. :)
<karolherbst> jcdutton: you can implement memcpy on x86 with like one instruction, but that would be slow as heck
<bnieuwenhuizen> karolherbst: wasn't that supposed to be pretty fast on modern Intel/AMD CPUs?
<karolherbst> the risc-v memcpy impl is probably nice with their support for variable vector widths
<karolherbst> bnieuwenhuizen: probably not as fast as avx-512 optimized memcpy
<karolherbst> but yeah.. I think the idea was to make it not as messy to implement
<karolherbst> but I bet that would just mean implementing it in firmware
<bnieuwenhuizen> avx512 is going to be messy due to clocking issues
<karolherbst> that didn't change from avx2
<karolherbst> I think it's just worse
<karolherbst> well.. intel still believes in auto vectorization, so...
nchery is now known as Guest2362
nchery has joined #dri-devel
<jcdutton> anholt, Are you sure aarch64 should not fault, only aarch32 should?
<anholt> jcdutton: like 80% confident, this was a quick google after trying to remember stuff I had to fight 5 years ago
<HdkR> aarch64 shouldn't fault
<HdkR> I deal with unaligned access garbage every day in AArch64
<HdkR> uncached IO ranges over PCIe? That might, but I'd call that a hardware issue :D
<jcdutton> Ok, I did not know that. I will check whether the tests were done with aarch64 or 32
<jcdutton> Ok, it looks like there are bus errors on aarch64
<HdkR> If you're doing unaligned atomics then you're screwed
<jcdutton> avx512 looks pretty messy all over.
Guest2362 has quit [Ping timeout: 480 seconds]
<HdkR> SVE is the true ultra-wide vector ISA we deserve
<karolherbst> ehhh I have to implement CL_PROGRAM_BINARIES :(
<jcdutton> Most of the memcpy optimisations are around the scheduling. I.e. do a read from mem, go off do some other stuff, giving the read time to complete, then write it back out again.
<jcdutton> and pre-fetch
<karolherbst> yeah...
<karolherbst> none of that is trivial and memcpy is important enough to really care
<jcdutton> Hehe, I found how to make x86 bus error!
<idr> anholt: At your current rate of progress... you're going to be working on that driver again in another couple years. :)
<jcdutton> I don't understand, why would x86 have an instruction that causes it to bus error if unaligned writes are done, when x86 does not need it aligned ?
<karolherbst> jcdutton: fun
<karolherbst> jcdutton: I am not sure if all cases can be unaligned, but the tooling and libc are implemented in a way that it just doesn't matter. not sure it even matters from a kernel perspective. Would be interesting to know
<HdkR> jcdutton: Some x86 instructions explicitly require alignment
<HdkR> Like CMPXCHG16B and MOVAPS
<jcdutton> HdkR, Ok
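HdkR's MOVAPS example in intrinsics form; a sketch only, and note that on Linux the resulting fault is typically delivered as SIGSEGV (from #GP) rather than SIGBUS:

    #include <xmmintrin.h>

    int main(void)
    {
        float buf[8] __attribute__((aligned(16)));
        __m128 ok  = _mm_loadu_ps(buf + 1);  /* MOVUPS: unaligned is allowed */
        __m128 bad = _mm_load_ps(buf + 1);   /* MOVAPS: requires 16-byte
                                                alignment, faults at runtime
                                                on this address (at -O0) */
        (void)ok; (void)bad;
        return 0;
    }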
* karolherbst doesn't want to implement CL_PROGRAM_BINARIES
JohnnyonFlame has joined #dri-devel
<jcdutton> I am thinking what other applications might do mmap and then use memcpy to move data about. But, what about the use case where an app does the mmap, and then uses their own unaligned code to copy data about. e.g. inlined memcpy.
<jcdutton> I guess the only solution is to say it will only work with open source programs, so if one really wants to run app X on aarch64, they can fix it themselves.
Anorelsan has quit [Quit: Leaving]
<HdkR> jcdutton: What is the exact instruction receiving a SIGBUS on AArch64? under gdb you can `disas $pc,+4`
<jcdutton> disassembly listed there
rkanwal has quit [Ping timeout: 480 seconds]
pnowack has quit [Quit: pnowack]
<jcdutton> HdkR, in the disassembly, on aarch64, the "=> 0x0000007ff7aaeb98 <+280>: stp x12, x13, [x0]" points to the instruction that bus errored.
<HdkR> Smells like someone enabled strict alignment mode almost
<karolherbst> stp needs alignment though, no?
<karolherbst> "Unaligned accesses are allowed to addresses marked as Normal, but not to Device regions. "
<karolherbst> so everything according to docs
<jcdutton> Is there any instruction on aarch64 that does sort of "is this address a device region" ?
<karolherbst> good question
<jcdutton> one could then use that to decide to do an unaligned memcpy or resort to a slower aligned version.
<HdkR> Yea, only device region has the problem
<karolherbst> it seems like there is a way
<karolherbst> but it sounds complicated
<HdkR> It's also not available in userspace, has to be a kernel check
<karolherbst> yep
<karolherbst> one ioctl on every memcpy please
<jcdutton> hehe, that will be fast....not
<karolherbst> somehow I'd just like to see developers reactions if somebody actually proposes that
<karolherbst> jekstrand: do you think we want a stable binary format?
<karolherbst> well.. maybe not stable stable, but versioned so we can read out older ones
<karolherbst> I was thinking of just storing the spirv for now and recreate state from that. But we have to attach a bit of metadata, like what device it belongs to and stuff
<HdkR> `where 4G and 2G boards will only need to force uncached` ugh, so it is just buggy hardware reading through that issue
<karolherbst> HdkR: did you expect anything else?
<HdkR> Well it gives me more insight at least. Don't stick a dGPU on that platform.
<karolherbst> yeah...
<karolherbst> I mentioned that somewhere at the top
<HdkR> :D
<jcdutton> I think this might just be a bug in the aarch64 memcpy. Looking at it, it has got some code in there to do alignment correction, but then it uses a too-wide write that causes an unaligned write! Probably just replacing that bus-erroring asm instruction with some 4-byte writes will fix the problem.
<karolherbst> the thing is.. there aren't like 1 or 2 vendors involved in fixing this
<karolherbst> but like 500
<karolherbst> and everybody will tell why one solution is the crappiest of them all
<karolherbst> jcdutton: some Arm vendors will disagree
<jcdutton> FYI, the raspberry pi 4 has some even worse PCIe bugs
<HdkR> Don't stick a dGPU on that either :P
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
mclasen has quit []
mclasen has joined #dri-devel
<jcdutton> The pi CM4 cannot write 64 bit values to the bus at all, aligned or not!
<karolherbst> jcdutton: I mean if there is a simple fix then go for it
<karolherbst> jcdutton: sounds like broken hw to me
<karolherbst> at least none where perf matters
<jcdutton> yes, the CM4 is broken, but this other rock aarch64 SoC is better.
lemonzest has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
<jekstrand> karolherbst: idk.
<daniels> well, sounds like you've discovered why the kernel defines memcpy_{from,to}io() and memset_io() separately from the non-IO versions :P
YuGiOhJCJ has joined #dri-devel
<jcdutton> daniels, yes the kernel does use memcpy_{from,to}io(), but I think from the discussions, we are not going to try that in user space.
<jcdutton> daniels, that is kind of what I was asking for at the beginning of this discussion, but I see the problem with it now.
mclasen has quit []
<daniels> the generally-observed rule is that userspace is responsible for not making direct explicit memory accesses that aren't naturally aligned, but memcpy/memset/etc are required to ensure that all their accesses are aligned
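A minimal userspace sketch of what that rule implies for an IO destination, assuming a 64-bit target where the mapping is arm64 Device memory (which faults on unaligned access): keep every store to the destination naturally aligned, copying the unaligned head and tail bytewise. The name memcpy_to_io_user is hypothetical; the kernel's memcpy_toio() is not available from userspace.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void memcpy_to_io_user(volatile void *dst, const void *src, size_t n)
    {
        volatile uint8_t *d = (volatile uint8_t *)dst;
        const uint8_t *s = (const uint8_t *)src;

        while (n && ((uintptr_t)d & 7)) {   /* head: byte stores until aligned */
            *d++ = *s++;
            n--;
        }
        while (n >= 8) {                    /* bulk: aligned 8-byte stores */
            uint64_t tmp;
            memcpy(&tmp, s, 8);             /* src may be unaligned; load via temp */
            *(volatile uint64_t *)d = tmp;
            d += 8; s += 8; n -= 8;
        }
        while (n--)                         /* tail: remaining bytes */
            *d++ = *s++;
    }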
mclasen has joined #dri-devel
<karolherbst> yeah... I think making sure that runtimes only return aligned pointers to device memory is probably the only sane way out of that mess
<karolherbst> at least one where you don't have to pick fights
<karolherbst> but you can't prevent clients from doing weirdo shit
<karolherbst> jekstrand: mhh, so my idea was to store the spirv, add some metadata and hope that's enough, making it versioned just because we might change it.. but I really don't want to tie it to mesa's version because that kind of defeats the purpose of that interface
<daniels> karolherbst: ++
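One possible layout for the container karolherbst sketches; every name and field here is hypothetical:

    #include <stdint.h>

    struct cl_binary_header {
        uint32_t magic;        /* fixed tag identifying the container format */
        uint32_t version;      /* bumped whenever the layout changes, so old
                                  binaries can be recognized and rejected */
        uint32_t device_hash;  /* which device the binary was created for */
        uint32_t spirv_size;   /* size in bytes of the SPIR-V payload that
                                  follows the header */
    };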
<karolherbst> you know what would be fun? to catch the bus error and do a slow memcpy :D
<jcdutton> On a separate note. If one is getting strange display artifacts like in: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/336#issuecomment-1068516074
<jcdutton> how does one start debugging it?
<jcdutton> Are there any cases in mesa, where the code path is different with ARM vs X86 ?
mclasen has quit []
<HdkR> jcdutton: Yes
mclasen has joined #dri-devel
<HdkR> Mostly around data transformations though, nothing completely changing
<HdkR> Oh, and Intel userspace drivers that just break when compiled for AArch64
<karolherbst> jcdutton: I'd keep it, looks cool
<jcdutton> Just out of interest, is there any use case where one would actually need an Intel userspace driver to compile for AArch64 ?
<jcdutton> karolherbst, hehe