ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
kzd has quit [Quit: kzd]
kzd has joined #dri-devel
<mareko> gfxstrand: what is NAK?
<airlied> nvidia kompiler
<mareko> so not NCO
<mareko> maybe the NVPTX LLVM backend is the answer, who knows
<karolherbst> the answer to what? ptx is a high level language
<airlied> where's my ptx to spir-v translator :-P
<airlied> https://github.com/gthparch/NVPTX-SPIRV-Translator oh someone wrote it :-P
<karolherbst> I wonder how often we can translate in circles until something crashes
<airlied> or someone could write tcg like layer for nvidia binaries :-P
<karolherbst> uhhh
<alyssa> karolherbst: dozen + vkd3d all the things
<karolherbst> mhhh
<alyssa> or vkd3d + dozen if you prefer
leo60228- has quit [Ping timeout: 480 seconds]
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
MrCooper_ has joined #dri-devel
MrCooper has quit [Ping timeout: 480 seconds]
<idr> karolherbst: It's like that game of translating some bit of text through various human languages until you get total gibberish.
pallavim_ has joined #dri-devel
benjaminl has quit [Quit: WeeChat 3.8]
benjaminl has joined #dri-devel
<alyssa> It's like that game of translating a bit of text through several human languages until it makes no sense
pallavim has quit [Ping timeout: 480 seconds]
camus has quit []
camus has joined #dri-devel
yuq825 has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
tuxayo_ has joined #dri-devel
tuxayo_ has quit [Quit: Page closed]
tuxayo_ has joined #dri-devel
tuxayo_ has left #dri-devel [#dri-devel]
test has joined #dri-devel
test has left #dri-devel [#dri-devel]
test has joined #dri-devel
tuxayo has joined #dri-devel
<tuxayo> Hi, hola, saluton :) Does anyone know what could be missing in an AppImage to have Vulkan support? Someone worked on an AppImage for the 0ad game, and when the Vulkan renderer is enabled, it doesn't detect support (it probes for VK_KHR_surface) on seemingly any Intel or AMD GPU (so it seems to have something to do with Mesa).
<tuxayo> And it falls back on OpenGL.
<tuxayo> On an NVIDIA GPU it works (I'm assuming it was the non-libre driver).
<tuxayo> So it can find the right mesa stuff when using OpenGL but when using Vulkan it doesn't find it. But it does find the non-libre NVIDIA Vulkan driver...
<tuxayo> Any clue? Here is the main build script and it seems to do nothing in particular to give us good Mesa OpenGL support: https://github.com/0ad-matters/0ad-appimage/blob/trunk/workflow.sh
<tuxayo> And here is the head-scratching so far:
idr has quit [Quit: Leaving]
<airlied> tuxayo: probably missing the vulkan loader
<airlied> but probably also need the mesa vulkan drivers
<airlied> not sure how appimage works there
heat__ has quit [Remote host closed the connection]
heat__ has joined #dri-devel
<airlied> tuxayo: maybe also the headers to build against, not sure how NVIDIA works
<tuxayo> airlied: thanks for the hints. So likely linuxdeploy/AppRun, which builds the AppImage, takes care of the basic Mesa stuff but lacks the Vulkan loader / Mesa Vulkan drivers / headers
<airlied> yeah if I had to guess
bmodem has joined #dri-devel
aravind has joined #dri-devel
<marcan> looks like gitlab is unhappy...
rauji___ has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
test has left #dri-devel [#dri-devel]
<Nefsen402> It's an issue for me as well so it isn't localized
fxkamd has quit []
Duke`` has joined #dri-devel
JohnnyonF has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
fab has joined #dri-devel
jrayhawk has quit [Read error: Connection reset by peer]
jrayhawk has joined #dri-devel
Company has quit [Quit: Leaving]
rmckeever has joined #dri-devel
bmodem has quit []
bmodem has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
bmodem has quit [Read error: Connection reset by peer]
bmodem1 has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
bgs has joined #dri-devel
bmodem1 has quit [Ping timeout: 480 seconds]
pallavim_ has quit [Ping timeout: 480 seconds]
bmodem has joined #dri-devel
bgs has quit [Remote host closed the connection]
itoral has joined #dri-devel
rmckeever has quit [Quit: Leaving]
frieder has joined #dri-devel
yuq825 has quit [Ping timeout: 480 seconds]
bmodem has quit [Remote host closed the connection]
bmodem has joined #dri-devel
junaid has joined #dri-devel
jkrzyszt has joined #dri-devel
bmodem1 has joined #dri-devel
bmodem has quit [Read error: Connection reset by peer]
fab has quit [Quit: fab]
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
ngcortes has quit [Ping timeout: 480 seconds]
sima has joined #dri-devel
pcercuei has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
pochu has joined #dri-devel
MrCooper_ is now known as MrCooper
ngcortes has joined #dri-devel
fab_ has joined #dri-devel
fab_ is now known as Guest2311
Leopold_ has quit [Remote host closed the connection]
apinheiro has joined #dri-devel
<javierm> tzimmermann: hi, I haven't reviewed your optional fbdev series yet, but wondered what did you different than what I attempted in https://lore.kernel.org/lkml/20210827100027.1577561-1-javierm@redhat.com/t/
Leopold has joined #dri-devel
<javierm> tzimmermann: ah, I see. You want to hide all the fbdev uAPI (/dev/fb?, sysfs, etc) while I tried to only disable the "real" fbdev drivers (but keeping emulated fbdev uAPI)
<javierm> tzimmermann: so you plan to only keep the bare minimum to support fbcon, makes sense
<tzimmermann> javierm, it occurred to me that we spoke about that change at some point. but i didn't remember that you even sent a patchset. i'll give you credit in the next iteration of the patchset.
tursulin has joined #dri-devel
<tzimmermann> javierm, i'm not sure what the difference is. but i was just reading the old discussion and I left a comment about the existence of the fb device
<tzimmermann> in my patches i remove all of that. everything in devfs, sysfs and procfs is gone
<javierm> tzimmermann: yeah, I wasn't sure about the difference but after reading your cover letter I understand the difference of the approach now
<javierm> tzimmermann: I tried to keep the emulated DRM fbdev while you are also getting rid of that
<tzimmermann> fb_info will only be a data structure that connects the framebuffer device with fbcon
<javierm> tzimmermann: I think I did that because something still depended on it (maybe plymouth?) but that has been fixed already
<javierm> so I agree that your approach is better: get rid of all the uAPI for fbdev and just keep fbcon for now
<tzimmermann> there's not much in userspace that requires fbdev. i guess most doesn't even support it
<javierm> tzimmermann: yeah
<javierm> tzimmermann: I see that you will post a v2. I'll review that then
<tzimmermann> two thirds of these patches are actually bugfixes :)
<javierm> :)
<tzimmermann> javierm, your review is very welcome. i'll keep the current version up a bit longer.
ngcortes has quit [Read error: Connection reset by peer]
djbw_ has quit [Read error: Connection reset by peer]
lynxeye has joined #dri-devel
sarahwalker has joined #dri-devel
<javierm> tzimmermann: sure, I'll review v1 then
<MrCooper> DavidHeidelberg[m]: my main point was that the commit logs don't accurately reflect the situation and trade-off being made
vliaskov has joined #dri-devel
<tzimmermann> thanks, javierm
rasterman has joined #dri-devel
sgruszka has joined #dri-devel
junaid has quit [Ping timeout: 480 seconds]
<siddh> Hello, can anyone merge the revert commit [1] I had sent some time ago regarding drm macros? IIRC, the author had blindly used coccinelle and did not consider the unintended change. It is part of the drm macro series, but even if the series is not considered for merge, the revert should be since the change was incorrect.
junaid has joined #dri-devel
junaid_ has joined #dri-devel
junaid_ has quit []
<jani> siddh: it no longer applies, needs a rebase
Leopold has quit [Remote host closed the connection]
<dj-death> gfxstrand: do you remember what led you to disable compression for Anv's attachment feedback loop?
<siddh> @jani: oh okay... will send after doing a rebase
<dj-death> gfxstrand: there is a comment about the aux data being separate
<dj-death> gfxstrand: but that makes no sense to me
<dj-death> gfxstrand: texturing & rendering having different caches does, but I'm failing to see where the compressed data fits in there
Leopold_ has joined #dri-devel
bmodem has joined #dri-devel
bmodem1 has quit [Read error: Connection reset by peer]
bmodem1 has joined #dri-devel
bmodem has quit [Read error: Connection reset by peer]
<javierm> tzimmermann: not sure I got your comment about the page_size on the ssd130x driver, that's not the system memory pages but the ssd130x controller "pages", that is how they divide the screen
<javierm> tzimmermann: also, the GEM shmem allocation is only done for the shadow buffer, and that's bigger than the actual screen, since it is DRM_SHADOW_PLANE_MAX_{WIDTH,HEIGHT}
<javierm> or am I wrong on that?
tursulin has quit [Remote host closed the connection]
tursulin has joined #dri-devel
<tzimmermann> javierm, what i mean is: userspace allocates a GEM buffer, say 800 x 600. those sizes are aligned to a multiple of 64. so you'd allocate a memory block of 832 x 640 bytes. if these sizes are not divisible by 'page_size' and you do a DIV_ROUND_UP, you might end up with values that refer to areas outside the memory. for example during pageflip's memcpy(). i don't know if that can actually happen in the driver. i was
<tzimmermann> just concerned that the page_size might interfere here
bmodem has joined #dri-devel
bmodem1 has quit [Read error: Connection reset by peer]
<Hazematman> Hey, I'm working on a driver that doesn't have native support for PIPE_FORMAT_R32G32B32_FLOAT. If an OpenGL app requests that format as an RB, gallium seems to convert it to PIPE_FORMAT_R32G32B32A32_FLOAT (which is supported). Does anyone know where this happens? I've been trying to dig through the gallium infrastructure to see where it handles surface conversion, to find if it's possible to access the natively requested format. Any
<Hazematman> guidance on where I should look would be appreciated
<danylo> Hazematman: I think it chooses the compatible format with `choose_renderbuffer_format`. I guess to see where it handles the mismatch between formats you'd have to search for where `->InternalFormat` is used.
bmodem has quit [Ping timeout: 480 seconds]
jkrzyszt has quit [Remote host closed the connection]
jkrzyszt has joined #dri-devel
<javierm> tzimmermann: ah, got it. Good point, I'll look if that can happen and if is a possibility can fix on top. Thanks!
JohnnyonF has quit [Ping timeout: 480 seconds]
smiles_1111 has joined #dri-devel
benjamin1 has joined #dri-devel
benjaminl has quit [Remote host closed the connection]
<swick[m]> Lyude: I'm looking at https://gitlab.freedesktop.org/drm/intel/-/issues/8425 again. The intel eDP proprietary backlight control has a bunch of unused registers and control bits which sound like they could be the cause.
<swick[m]> jani: ^
<swick[m]> are there more details on them? I don't have the hardware to test any of that...
junaid has quit [Remote host closed the connection]
alyssa has left #dri-devel [#dri-devel]
kts has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
YuGiOhJCJ has quit [Ping timeout: 480 seconds]
OftenTimeConsuming has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
junaid has joined #dri-devel
Macdu has joined #dri-devel
Macdu has quit []
milek7 has quit [Remote host closed the connection]
vliaskov has quit [Remote host closed the connection]
milek7 has joined #dri-devel
yuq825 has joined #dri-devel
heat__ has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
apinheiro has quit [Quit: Leaving]
pochu_ has joined #dri-devel
pochu has quit [Ping timeout: 480 seconds]
Guest2311 has quit [Read error: Connection reset by peer]
jewins has joined #dri-devel
yuq825 has quit []
amber_harmonia has joined #dri-devel
fab has joined #dri-devel
airlied has quit [Ping timeout: 480 seconds]
smilessh has joined #dri-devel
pallavim has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
<mareko> DavidHeidelberg[m]: do any amd CI tests use LLVM < 15?
airlied has joined #dri-devel
smiles_1111 has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
<DavidHeidelberg[m]> mareko: I don't think so, so far all images are 15
idr has joined #dri-devel
Danct12 has joined #dri-devel
Danct12 has quit [Remote host closed the connection]
Danct12 has joined #dri-devel
mbrost has joined #dri-devel
pochu_ has quit []
kzd has joined #dri-devel
benjamin1 has quit [Ping timeout: 480 seconds]
itoral has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
JohnnyonFlame has joined #dri-devel
Company has joined #dri-devel
JohnnyonF has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Konversation terminated!]
fxkamd has joined #dri-devel
<mareko> great, thanks
frankbinns has joined #dri-devel
kts has joined #dri-devel
frieder has quit [Remote host closed the connection]
smilessh has quit [Ping timeout: 480 seconds]
<mareko> karolherbst: when do you think we can drop clover support from radeonsi?
junaid has quit [Remote host closed the connection]
mbrost has quit [Ping timeout: 480 seconds]
dfip^ has quit [Remote host closed the connection]
nchery is now known as Guest2340
nchery has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
mbrost has joined #dri-devel
<karolherbst> mareko: I want to wait until proper function calling support
<karolherbst> that's more or less the biggest regression compared to clover
sarahwalker has quit [Remote host closed the connection]
<karolherbst> I kinda plan to prototype this with llvmpipe and radeonsi given they use LLVM so it shouldn't be too hard to do, but nir needs some fixes here and there
<mareko> NIR->LLVM can't do function calls
aravind has quit [Ping timeout: 480 seconds]
<karolherbst> I know
<karolherbst> but without that we sometimes get shaders with like 2 million SSA values and RA eats through RAM and takes hours
iive has joined #dri-devel
<karolherbst> there are still some unknowns on how to do things, but my initial plan was to kinda only have function calls between the kernel and libclc
gouchi has joined #dri-devel
<karolherbst> and maybe only for functions being of specific size
<karolherbst> some of those libclc functions are massive and even use LUTs
<mareko> I don't know if LLVM supports function calls with the Mesa (non-native) ABI
benjaminl has joined #dri-devel
<mareko> LLVM compiles shaders slightly differently for radeonsi, RADV, PAL, and clover (same as ROCm)
<mareko> there is an LLVM target triple that we set, radeonsi sets amdgcn--, RADV sets amdgcn-mesa-mesa3d, and I don't know what clover sets
<karolherbst> clover has a CAP for it: PIPE_COMPUTE_CAP_IR_TARGET
<karolherbst> it's amdgcn-mesa-mesa3d as it seems
<mareko> ok
<karolherbst> we don't need an ABI because I'm not planning to link GPU binaries, so as long as the final binary works it's all fine
<karolherbst> or rather, not a stable one
<karolherbst> so whatever llvm does internally for function calls doesn't really matter here
<mareko> arsenm on #radeon might know if amdgcn-- supports function calls
<karolherbst> besides that we have a little delete clover tracker here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19385
<karolherbst> fp16 is kinda the only other thing missing, but that should be fairly trivial to add
sgruszka has quit [Ping timeout: 480 seconds]
ngcortes has joined #dri-devel
djbw_ has joined #dri-devel
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
mbrost has quit [Ping timeout: 480 seconds]
JohnnyonF has quit [Ping timeout: 480 seconds]
zf has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
nchery has quit [Ping timeout: 480 seconds]
zf has joined #dri-devel
Peuc_ has joined #dri-devel
Peuc has quit [Ping timeout: 480 seconds]
<mareko> karolherbst: given what arsenm said, the only missing thing is function call support in ac_nir_to_llvm and probably adjacent places
Leopold_ has quit [Ping timeout: 480 seconds]
MrCooper has quit [Ping timeout: 480 seconds]
<airlied> karolherbst adding functions to llvmpipe was a bit of a pain, radeonsi might be easier at least as long as it's using llvm
<airlied> but I think with llvmpipe the overheads of sticking stuff onto the stack was quite noticeable
Leopold_ has joined #dri-devel
MrCooper has joined #dri-devel
<mareko> wow the Marge queue has 15 MRs
<karolherbst> airlied: yeah.. that's why I only want to turn calls to huge libclc functions into proper calls
<mareko> for Mesa
<karolherbst> where copying them multiple times would just hurt everything
<karolherbst> airlied: I kinda want to figure out why those luxmark benchmarks explode in size and just do function calls to deal with that problem
lynxeye has quit [Quit: Leaving.]
<mareko> radeonsi also unrolls aggressively
<mareko> see si_get.c
<mareko> loops with up to 128 iterations are unrolled
<mareko> probably regardless of the loop body size
<karolherbst> mhhhh, that would be.. bad
<karolherbst> anyway, I didn't check why those shaders explode in size, I just know they end up with millions of SSA values
<karolherbst> maybe I should do that
<karolherbst> mareko: seems like opt_loop_unroll checks for 26 instructions
<karolherbst> ehh wait
<karolherbst> iterations
tzimmermann has quit [Quit: Leaving]
<karolherbst> or is it instructions?
<karolherbst> yeah.. it's instructions
gouchi has quit [Remote host closed the connection]
gouchi has joined #dri-devel
JohnnyonFlame has joined #dri-devel
<karolherbst> mareko: btw, will you create the MR for the vectorization stuff?
<HdkR> karolherbst: How does one setup rust in meson's cross files for 32-bit? Or do I just ignore 32-bit rusticl?
<karolherbst> HdkR: good question, but I guess you just set rustc and set a 32 bit target as a compiler flag
<karolherbst> I actually did that...
<HdkR> Currently meson just complains that `rust compiler binary not defined in cross or native file`
<karolherbst> ahh yes.. HdkR: rust = '/home/kherbst/.rustup/toolchains/1.59-i686-unknown-linux-gnu/bin/rustc' 🙃
<HdkR> ah
<karolherbst> I think you can potentially also set the target, but I think just pointing to a toolchain is the proper way.. dunno.. I guess it depends on how your distribution handles it if you are not using rustup
<HdkR> Currently poking around at ArchLinux
<HdkR> ah, blocked by them not supporting spirv-tools for 32-bit and I'm too lazy to build that :)
gouchi has quit [Remote host closed the connection]
<karolherbst> :)
<HdkR> Oh well, not too concerned about 32-bit CL anyway
<karolherbst> one user actually filed a bug, because some 32 bit windows app ran into problems with rusticl
<HdkR> I guess they can figure that out if they want it running under FEX :P
<karolherbst> :D
<karolherbst> fair enough
<karolherbst> at some point I also have to check out FEX on my macbook
<HdkR> Finally getting around to creating an Arch image so Asahi users can have a nicer experience
<karolherbst> but CL doesn't run there very well except for llvmpipe
<karolherbst> ahh, cool
<karolherbst> but the new and hot asahi distribution is fedora based :P
<HdkR> Next step Fedora I guess
naseer has joined #dri-devel
naseer has left #dri-devel [#dri-devel]
<DemiMarie> Is the simplest solution to the LLVM problems to stop using LLVM? Walter Bright wrote a C frontend to Digital Mars D in ~5000 lines of D, and I suspect Mesa has far more code than that that just works around LLVM problems. LLVM isn’t magic, and from what I have read it seems that its optimizations don’t really do anything useful. If one needed a C++ frontend that would be another matter, but my understanding is that none is needed.
gouchi has joined #dri-devel
<karolherbst> 1. we'd still have to maintain it 2. llvmpipe 3. C isn't just the language
benjamin1 has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
benjaminl has joined #dri-devel
zf has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<karolherbst> I won't say no if somebody comes around and writes a full C compiler + all the OpenCL API nonsense bits, but I won't do it
<DemiMarie> I see
zf has joined #dri-devel
<DemiMarie> Clang having OpenCL support is not something I expected.
<karolherbst> yeah.. we just use clangs support there
<karolherbst> they deal with most of the extension + header nonsense
<karolherbst> well.. builtins at this point, using the headers is slower than using the new and fancy stuff, which isn't headers
<karolherbst> kinda don't want to replicate all of that
rauji___ has quit []
<karolherbst> also.. writing a new C frontend is all cool and everything, but 5k just for parsing/lexing? kinda brutal
<karolherbst> anyway.. the CL bits dealing with LLVM are small, most of it is dealing with spir-v stuff.
<karolherbst> the part where LLVM matters more is on the backend side
<HdkR> Considering lexing is my least favourite part, I'll never do that :D
<karolherbst> llvmpipe and radeonsi do a lot of LLVM backend stuff, none of it is even remotely frontend related
<karolherbst> radeonsi problem will be solved with ACO, probably
<karolherbst> and to replace LLVM's use in llvmpipe we'd have to support _multiple_ CPU architectures with all their nonsense
<karolherbst> no thank you :D
benjaminl has quit [Ping timeout: 480 seconds]
<DemiMarie> Yeah LLVM is awesome at generating CPU code.
<HdkR> LLVM and the CPU side, great
<karolherbst> we even found an auto vectorization issue recently ...
<karolherbst> now radeonsi calls a nir pass to vectorize so LLVM can still mess up and we won't care
zf has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<DemiMarie> Not suprising. I imagine llvmpipe generates very easily vectorizable code.
<karolherbst> probably
<karolherbst> airlied: you might want to call nir_opt_load_store_vectorize :D
<karolherbst> in llvmpipe
zf has joined #dri-devel
<DemiMarie> Curious: what does llvmpipe-generated code wind up bottlenecking on?
<jenatali> Yeah WARP's JIT backend for multiple CPU architectures is a mess...
<HdkR> llvmpipe is usually bottlenecked on vertex processing isn't it?
<HdkR> Since that was one of the things that SWR targeted as an improvement
Guest2340 is now known as nchery
heat has quit [Read error: No route to host]
heat has joined #dri-devel
tursulin has quit [Ping timeout: 480 seconds]
heat_ has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
<karolherbst> antoniospg____: btw, did you made some progress on fp16 support?
<airlied> for most workloads it bottlenecks on memory bandwidth around fragment shading
<airlied> there are some vertex heavy workloads where binning hits hard
<Lynne> isn't the mess in writing custom jit mostly in the platform ABI differences?
<karolherbst> good thing is: we have no ABI to care about
<airlied> yeah I'd hate to have to write backends for every processor in mesa itself
<karolherbst> anyway...
<airlied> karolherbst: not sure, llvmpipe doesn't do vectors like others do vectors
<karolherbst> airlied: you still want to give nir_opt_load_store_vectorize a go :D somehow llvm is too dumb to merge loads and ditch repeated address calculations in loops
<karolherbst> nah.. it has like _nothing_ to do with vectors
<karolherbst> airlied: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9139#note_1940698 and following comments
<karolherbst> just vectorizing loads ditches some alus on address calculations
<karolherbst> it's very dumb
<karolherbst> might be some amdgpu backend specific issue though
<airlied> yeah as I said when llvmpipe translates from nir it doesn't do a whole lot of address translation itself
heat_ has quit [Remote host closed the connection]
<karolherbst> mhh, fair enough then
<airlied> but yeah I should throw it in at some point
<airlied> but I've no real way to notice it working :-P
<karolherbst> I'm just super surprised it even matters for radeonsi
heat_ has joined #dri-devel
<airlied> shaderdb someday :-P
<karolherbst> heh
<karolherbst> maybe I should check with luxmark
benjaminl has joined #dri-devel
<DemiMarie> airlied: bottlenecking on memory bandwidth explains why llvmpipe works so well on Apple Silicon.
<DemiMarie> Because they have loads of it.
<karolherbst> kinda, but less on the CPU side sadly
<HdkR> 800GB/s is very much in dGPU territory :)
<DemiMarie> Why can GPUs have so much better memory bandwidth?
<karolherbst> because it needs more
<karolherbst> the CPU seems to have slower access but it might also be because the CPU is too slow
fab has quit [Quit: fab]
kugel has quit [Ping timeout: 480 seconds]
<DemiMarie> I’m more interested in what is different about the GPU memory systems, especially on iGPUs where the DRAM and DRAM controllers are identical.
<HdkR> To note, CPUs tend to have lower latency on their memory accesses
<DemiMarie> Why is there a latency vs throughput tradeoff there?
<karolherbst> CPUs cheap out on memory bandwidth because they still use DIMMs
<airlied> the other things GPUs have is tiling
<karolherbst> and it's all very limiting
<karolherbst> Apple uses 128 bit for memory transfers
<karolherbst> where on x86 you always get 64
<airlied> tiled textures are a big advantage if you are doing all the address translation in hw
benjamin1 has quit [Ping timeout: 480 seconds]
<karolherbst> and the "channel" situation with x86 memory is also just silly
<DemiMarie> karolherbst: for iGPUs both the CPU and GPUs have the same DRAM chips, so DIMMs are not relevant here.
<karolherbst> but how do you connect the memory?
<karolherbst> the DIMM spec specifies memory operation latencies + transfer rates
<karolherbst> can't really fix that
<karolherbst> so you are just stuck with whatever that uses
<DemiMarie> How is this relevant? My point is that iGPUs have the same memory the CPU does, so it must be something other than the RAM chips.
<karolherbst> CPU memory is _slow_ on x86 systems
<DemiMarie> what part of the CPU memory is slow?
<karolherbst> you get like 50 GB/s on normal consumer systems
<karolherbst> the DIMM :)
<DemiMarie> then why does i915 not have garbage performance?
<karolherbst> it does have garbage perf
<karolherbst> the M2 is like 8 times that?
<karolherbst> the normal M2
<DemiMarie> Is this Intel-specific or does AMD also have bad memory bandwidth?
<karolherbst> same on AMD
<karolherbst> it's just that consumer systems are dual channel 64 bit at most
<karolherbst> and that's like around 50-60 GB/s
<HdkR> M2 gets 100GB/s
<DemiMarie> Does this mean that the M2 could do faster shading in software than i915 can in hardware?
<airlied> steamdeck does 88GB/s
<DemiMarie> At least on some workloads where fixed function isn’t a bottleneck
<karolherbst> mhhh.. probably not
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<karolherbst> airlied: quad channel?
<karolherbst> or what is differnet on the steamdeck apu?
<HdkR> 128-bit bus, technically quad channel because of DDR5
<karolherbst> ahhh
<airlied> karolherbst: yeah
<karolherbst> 128 bit then
<karolherbst> well.. it's easily fixable on consumer hardware, but no vendor is tackling it
<HdkR> Desktop class would get roughly equivalent on DDR5
JohnnyonFlame has joined #dri-devel
<karolherbst> Dell kinda tried that with replacing DIMMs, but that's not going anywhere as it seems
<karolherbst> HdkR: that's so sad
<HdkR> It's a shame that desktop class has been stuck on 128-bit for so long
<HdkR> 192-bit would be a cool upgrade for that segment
<karolherbst> yeah..
<HdkR> Or just go straight 256-bit since most every board supports quad dimms anyway
<karolherbst> but that's not getting you 400 or even 800 GB/s :D
<DemiMarie> Can we start working on some design specs and see of SiFive can actually build a fast chip?
<psykose> people would have riots if you removed replacable dimms
<DemiMarie> What does get one that?
<karolherbst> psykose: well.. dell suggested something better
<karolherbst> but...
<psykose> haha
<karolherbst> but we can also just stick with slow memory :D
<karolherbst> something has to change or it's the end of x86 for real
<HdkR> Does CAMM allow 128-bit per module?
<DemiMarie> karolherbst: x86 needs to die
<karolherbst> HdkR: good question
<psykose> riscv also needs to die but nobody wants to hear it
<HdkR> Four dimms of 128-bit each would get desktops a hell of a lot closer
<airlied> riscv will eat itself
<karolherbst> HdkR: probably if you call it QIMM and bumb it to 128 bit :P
<DemiMarie> airlied: eat itself?
<HdkR> :D
<karolherbst> or do 96 bit first and call them TIMMs
junaid has joined #dri-devel
<airlied> it'll just be incompatible fork after incompatible fork, until there is no "risc-v"
<DemiMarie> karolherbst: the task is not keeping x86 alive, but rather ensuring that open platforms do not die with it.
<karolherbst> don't use DIMMs
<karolherbst> that's the way
<karolherbst> just do whatever apple did with memory
<airlied> solder that shit down
<HdkR> karolherbst: Anything more than 64-bit per DIMM would be an upgrade and I'm all for it.
<airlied> or HBM it up
<karolherbst> yeah.. soldering is the solution here
<DemiMarie> Why????
<karolherbst> but "people need to replace it" no
<karolherbst> well
<karolherbst> it's either that or it dies :)
<HdkR> Memory on package is how Apple managed to get those numbers
<karolherbst> yep
<airlied> the DIMM socket is an impediment to speed
<karolherbst> and how GPUs get those numbers for years
<airlied> all sockets are
<HdkR> It's infeasible in a current spec socketed system
<DemiMarie> karolherbst: are you saying that replacable memory simply cannot be anywhere near as fast as soldered memory?
<karolherbst> the Dell thing was interesting, but not sure what peak speeds they have
<psykose> it's pretty much electrically impossible yes
<karolherbst> DemiMarie: correct
<psykose> there's too many wires and length of wire to make it fast
<DemiMarie> Even with active signal regeneration in the sockets?
<karolherbst> the RAM on the M2 is right beside the SoC
<karolherbst> like literally right beside it
<karolherbst> and it's super small
<DemiMarie> Maybe we need optical on-board interconnects
<karolherbst> the entire SoC is smaller than an entire DIMM module
<psykose> IBM was doing some serial memory thing with firmware on the ram modules
<psykose> weren't they
<HdkR> optical would introduce /more/ latencies. Short runs of optical are actually slower than just copper. Ask people that use direct-attached-copper cables in networks
<puck_> psykose: there's also CXL now
<psykose> interesting
<karolherbst> ahh CAMM is the Dell thing.. right
<DemiMarie> HdkR: Signal propogation velocity is _not_ the limiting factor here.
<karolherbst> I acutally don't know if it fixes the perf problem
<puck_> i'm reminded of the AMD 4700S
<puck_> which is very distinct and has 16GB of soldered RAM used for both the CPU and what would be the GPU but i think they fused off the APU bits
<DemiMarie> Even in optical fiber light still goes 6cm in a single clock cycle.
<DemiMarie> CPU clock
<puck_> ..but it's 16GB of *GDDR6* as main memory
<DemiMarie> At 3GHz
<karolherbst> yeah but you also have to translate it into electrical signals and all that
<puck_> which is fast but has higher latency
<DemiMarie> karolherbst: my point is that the signal integrity problems simply vanish
<karolherbst> but it comes with massive latency costs
<DemiMarie> Why
<DemiMarie> ?
<karolherbst> because translating to optical signal isn't for free?
<karolherbst> we talk single digit ns here
<DemiMarie> Let me guess: the real limitation is cost?
<DemiMarie> I know
<karolherbst> maybe?
<karolherbst> but in any case, just soldering it together solves the problem in a simpler way
<DemiMarie> And I am 99.99% certain that e.g. optical modulators have latencies far, far lower than that
<karolherbst> close to nobody actually upgrades RAM
<DemiMarie> fair
<karolherbst> and it needs more space to be replaceable and everything
<DemiMarie> my point was that a high-speed socketed system is possible, not that it is going to be cost-effective
<puck_> i wonder if we'll see an era where there's soldered-on RAM plus CXL if you really need more bandwidth (aka more distinct tiers of RAM)
<puck_> s/bandwidth/memory/
<karolherbst> soldered RAM even leads to less ewaste on average, because it needs way less space and everything
<DemiMarie> True
<DemiMarie> Honestly what I really want is for Moore’s Law to finally peter out.
<karolherbst> like to match the 800GB/s you need like... 16 DIMM slots I think? :D
<HdkR> Say that optical does solve the signal integrity problem. You now need 16 DIMMs worth of bus width to match M1/2 Ultra bandwidth
<karolherbst> but yeah...
<karolherbst> DIMM is stupid
<HdkR> Sixteen!
<DemiMarie> HdkR: yeah, not practical
<HdkR> Because the M1/2 Ultra has 8 LPDDR5 128-bit packages on it
<karolherbst> maybe CAMM would need less
<karolherbst> but...
<DemiMarie> Serious question: is there room for something that is a GPU/CPU hybrid?
<karolherbst> it's still huge
<karolherbst> the M2 24GB memory is so _tiny_
<dj-death> airlied: what's the current rule to update drm-uapi headers in mesa? take drm-next or drm-tip?
<airlied> dj-death: drm-next usually
<DemiMarie> Something made for those workloads that are easy to parallelize, but are somewhere between hard and impossible to meaningfully vectorize?
<karolherbst> DemiMarie: good question.. intel kinda tries that with AVX512, no?
<karolherbst> but....
<DemiMarie> karolherbst: anti-AVX512
<karolherbst> yeah well.. more threads would help
<karolherbst> but we are already going there
<DemiMarie> I’m thinking of stuff where the hard part is “what the heck do I do next?”
<HdkR> SVE2-512bit :P
<karolherbst> yeah.. more threads if you can parallelize
<karolherbst> more low power ones even to make use of low power consumption at lower clocks
<DemiMarie> Modern compilers are highly parallelizable, but nigh impossible to vectorize
<karolherbst> I think most CPU manufacturers will see that high perf cores give you nothing
<DemiMarie> Same
<karolherbst> and we'll end up with 4+20 core systems, 4 high perf, 20 low perf
<DemiMarie> Except for security holes
<DemiMarie> Yup
<karolherbst> intel kinda moves into having same high/low perf cores :D
<karolherbst> it's kinda funky
<DemiMarie> Xen is having a really hard time with HMP right now
<DemiMarie> Mostly because Xen’s scheduler is not HMP aware
<dj-death> airlied: apparently some amdgpu headers were pulled from neither: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986
<karolherbst> it's also funky that the difference between 12700 and 12900 was not more perf cores, but 4 more energy cores
<DemiMarie> Not at all surprised.
<karolherbst> heh
<karolherbst> 13th gen is already there
<karolherbst> 8 high perf, 16 low perf :D
<DemiMarie> is that a comment on Xen?
<karolherbst> kinda totally forgot about that
<dj-death> airlied: not quite sure what to do since we want to update the intel ones to the next drm-next
<karolherbst> so yeah.. intel is already there
Duke`` has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
kugel has joined #dri-devel
<karolherbst> I wonder when Intel kills hyperthreading
<DemiMarie> To me the problem with big cores is that the stuff they do well on are:
<DemiMarie> 1. Wizard-optimized programs written with lots of SIMD intrinsics or even assembler.
<DemiMarie> 2. Legacy single-threaded programs that cannot be parallelized.
<HdkR> Once the sram cost of duplicating all the register state costs too much die area to them :P
<DemiMarie> 3. have lots of security holes
<karolherbst> DemiMarie: well... some things are hard to parallelize, like game engines
<DemiMarie> karolherbst: why?
<karolherbst> because things depend on each other
<karolherbst> AI in games is not trivial
<karolherbst> game developers can probably explain it a lot more
<karolherbst> there are things which can happen in parallel, but it's not as trivial as it might sound at first
<karolherbst> also think sound mixing and stuff
<DemiMarie> Sound mixing should happen on another thread.
<karolherbst> yeah
<karolherbst> so that's what I meant with some things can happen in parallel
<karolherbst> but you still need high single thread cores if you want to mix more sources in realtime
<karolherbst> in some games you notice that sound sources get disabled on demand, because of load
<DemiMarie> Should that be handled by a dedicated DSP?
<karolherbst> maybe?
<karolherbst> maybe not
<karolherbst> might be not flexible enough
<DemiMarie> I wonder if graph-reduction machines might help.
<karolherbst> but the point is rather, that there will be need for perf cores
<HdkR> Just throw another E-core at the problem, homogeneous programming model is better here
<DemiMarie> Basically a processor designed for Haskell and other purely functional languages, where everything is safe to run in parallel unless a data dependency says otherwise.
<DemiMarie> Where if there is a hazard that means the program has undefined behavior because someone misused unsafePerformIO or similar.
<psykose> even "in haskell" the above issues apply
<psykose> parallelism is not magic
<karolherbst> also.. caches
<DemiMarie> HdkR: Mobile devices have lots of DSPs IIUC
<psykose> strong '1 person in 12 months, 12 people in 1 month' manager vibes
<HdkR> DemiMarie: And nobody's game uses them directly
<HdkR> Burn a Cortex-A53 to do the sound mixing, let the OS use the DSP for mixing
<DemiMarie> HdkR: maybe we need higher-level sound APIs that have “sound shaders” or similar
<karolherbst> in theory everything can be perfect, but practically we have the best outcome possible :P
<HdkR> DSP also takes up AI and modem responsibilities there...
<karolherbst> DemiMarie: cursed
<HdkR> OpenAL 2.0
<DemiMarie> karolherbst: cursed?
<karolherbst> very
sima has quit [Ping timeout: 480 seconds]
<DemiMarie> I meant, “define cursed”
<karolherbst> it just sounds cursed
<Lynne> don't AMD have some weird GPU sound mixing thing?
<DemiMarie> I mean eBPF and P4 are basically shading languages for network devices.
<karolherbst> yeah, and many think eBPF is very cursed
<karolherbst> not saying I disagree, but...
<DemiMarie> part of that is because of the need to prove termination
gouchi has quit [Remote host closed the connection]
<DemiMarie> In hardware that can be solved by having a timer interrupt.
<karolherbst> well.. on the kernel side you can also just kill a thread
<karolherbst> but you don't want to do that
<karolherbst> like never
<DemiMarie> longjmp()?
<karolherbst> so... you can't really do that with random applications, because they have to be aware of getting nuked at random points
kugel is now known as Guest2358
<karolherbst> so they have to be written against that
kugel has joined #dri-devel
<karolherbst> otherwise you risk inconsistent state
<DemiMarie> the other possibility is that if your program doesn’t finish soon enough, that’s a bug
<karolherbst> if the modules work strictly input/output based, then yeah, might be good enough
<karolherbst> but then it's more of a design thing
<karolherbst> oh sure
<karolherbst> but you still can't kill it if it doesn't know it will be killed randomly
<airlied> dj-death: just pull the intel ones and agd5f can chase down what happened with amd ones maybe
<karolherbst> you kinda have to enforce that in the programming model
<DemiMarie> yeah
Guest2358 has quit [Ping timeout: 480 seconds]
<DemiMarie> Also, I hope these conversations are interesting and not wasting people’s time! (Please let me know if either of those is false.)
<karolherbst> nah, it's fine
<psykose> what else would we be discussing
<mattst88> development of dri?
Leopold___ has joined #dri-devel
dliviu has quit [Ping timeout: 480 seconds]
<karolherbst> that X component?
<karolherbst> hell no!
<mattst88> might as well just ramble on about optical interconnects and haskell machines for 90 minutes instead :P
Leopold_ has quit [Ping timeout: 480 seconds]
Zopolis4 has joined #dri-devel
dliviu has joined #dri-devel
JohnnyonFlame has quit [Read error: Connection reset by peer]
konstantin_ has joined #dri-devel
konstantin has quit [Ping timeout: 480 seconds]
junaid has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
<Lynne> I understand fences are not quite a replacement for mutexes
<Lynne> but damn it, they should've added a wait+unsignal atomic operation on fences in vulkan
iive has quit [Quit: They came for me...]
jkrzyszt has quit [Ping timeout: 480 seconds]
JohnnyonFlame has joined #dri-devel
tango_ has quit [Ping timeout: 480 seconds]
smilessh has joined #dri-devel
tango_ has joined #dri-devel
memleak has joined #dri-devel
shashanks_ has joined #dri-devel
<memleak> Hello, I'm using kernel 6.4-rc5 (patched with PREEMPT_RT) and DRM/KMS works just fine on both AMDGPU and Radeon, I'm using an R9 290 (Hawaii) however when starting SDDM or LightDM, USB breaks
<memleak> If I use radeon then I get garbage on the screen and USB is dead; if I use AMDGPU, the screen at least looks fine but USB is also dead.
<memleak> This problem does not exist on 6.1.31 (have not tried 6.1.32 yet)
<airlied> memleak: anything in dmesg?
<memleak> I set the panic timeout to -1 (instantly reboot on panic) and enabled panic on oops, the cursor for the login screen keeps blinking and the system stays on.
<memleak> I can't quite check it once the USB is dead lol, I may have to grab a PS/2 keyboard if that works. I don't have serial debug either
<memleak> I'll try and get dmesg output
shashanks__ has quit [Ping timeout: 480 seconds]
<memleak> I have to head out, I'll be back later, just wanted to get this down in the channel. airlied nice to see you again btw, it's NTU/Alec from freenode lol
<memleak> Oh, just want to note that USB works indefinitely as long as X doesn't start :)
<airlied> oh hey, have you another machine to ssh in from?
Zopolis4 has quit [Quit: Connection closed for inactivity]