<DavidHeidelberg>
Was at a RISC-V conf., tried to run eglgears/glxgears on some of the RISC-V laptops presented, got segfaults & other unhealthy output from Mesa (sometimes 22.x :/ so, not even updated)...
<DavidHeidelberg>
I'm not crying here, but thinking.. there is like ~ 6 months until next Debian starts entering freeze (so we bump the CI and we can cover risc-v).
<DavidHeidelberg>
eventually, we could at least drop in an Alpine container with risc-v, maybe into nightly runs... not sure how well it'll run on x86_64. We don't have any risc-v machines (for now, but some will be available in the near future)
<DavidHeidelberg>
I have 3 of them at home, but they're so slow that x86_64 emulation would be 10x faster.
<DavidHeidelberg>
For the HW, what I saw is Imagination (proprietary) or AMD (PCI-e, working kinda well on SiFive). Apart from that, everything is softpipe or llvmpipe (thanks to ORCJIT :) these were the ones which worked)
bmodem has joined #dri-devel
jsa has joined #dri-devel
<airlied>
DavidHeidelberg: with PCIe it's often that the PCIe hw is just broken
<airlied>
SoC PCIE hw is rarely tested on the things GPUs want
<airlied>
good to know orcjit works :-)
<DavidHeidelberg>
surprisingly, the PCIe GPUs seemed to work well (haha, let's be honest, no AAA titles were running there, but OpenArena level games)
jernej_ is now known as jernej
* DavidHeidelberg
regrets he didn't take pictures, they usually had cool PC cases on display
<HdkR>
perhaps fortunately, I don't play with RISCV hardware with GPUs. ARM hardware has mostly broken PCIe fabric :P
The_Company has quit []
Company has joined #dri-devel
<Company>
how fast is llvmpipe on those things?
croissant_ has joined #dri-devel
mbrost has joined #dri-devel
<airlied>
no idea what sort of vector support they have
<DemiMarie>
airlied: Is SoC HW broken, or are GPU drivers the broken ones?
<airlied>
DemiMarie: usually the hw
<DemiMarie>
airlied: in what ways?
<airlied>
they don't implement PCIE conformantly
glennk has joined #dri-devel
<DemiMarie>
In what way?
<airlied>
usually they don't enact snooping support
<DavidHeidelberg>
Company: The answer usually falls into two groups: on small boards, no; on boards with PCIe (and better cores), yes, but you use a normal GPU anyway, so it doesn't matter :D
<DemiMarie>
airlied: I thought that the DMA API did not guarantee snooping.
<DavidHeidelberg>
yup, usually weird workarounds are needed, also I heard that for example recent AMD GPUs have some adjustments to work on these boards
croissant has quit [Ping timeout: 480 seconds]
<Company>
I'm just curious because people build PoS systems with those low-powered non-gpu systems
<airlied>
what has the DMA API got to do with the hw not doing it?
<Company>
and I'm waiting for the time when software GL is fast enough on those things
<airlied>
DavidHeidelberg: often the AMD adjustments are just hacks that disable a path, but overall the hw is screwed
<DemiMarie>
Company: doubt it will happen, once the CPU is fast enough I suspect they will want more things and it's back to a dedicated SW renderer.
<airlied>
there are endless threads on dri-devel with various non-x86 cpus trying to disable codepaths
<airlied>
the Loongson folks being the most horrible
<airlied>
there was one Loongson that I think used an AMD integrated GPU on an x86 southbridge
<airlied>
or rather northbridge
<HdkR>
Most ARM hardware has heartburn for nGnRnE PCIe mappings
<airlied>
always gives me bad AGP flashbacks
<DemiMarie>
airlied: My understanding is that the DMA API does not guarantee snooping, so Linux drivers that assume it are buggy from a DMA API perspective (they need explicit cache flushes).
<airlied>
DemiMarie: we don't use the DMA API
<HdkR>
AGP PTSD, oh no
<airlied>
or at least we work around its lack of support for snooping, since the GPU needs it
<DemiMarie>
Why do GPUs need snooping and not flushes?
<airlied>
hw designers gonna design hw :-)
<DemiMarie>
airlied: why can't one add SW flushes?
<airlied>
because we have userspace mappings
<airlied>
we don't just map stuff in the kernel
<airlied>
you also don't want to be throwing away your whole cache all the time
Duke`` has joined #dri-devel
<DemiMarie>
I thought one could force writeback without invalidating. Are syscalls for cache maintenance too expensive?
<airlied>
why bother adding all that when the hw is meant to support it
<DemiMarie>
Obviously any mapping that is written from both sides would be busted, but that's racy anyway.
<airlied>
you have just a bunch of code that never gets tested
<DemiMarie>
airlied: which HW?
<airlied>
PCIE hw
<DemiMarie>
Are there any Arm SoCs that get this right?
<DemiMarie>
I presume POWER does
<airlied>
the new Ampere seems good
<DemiMarie>
That's a server board
<airlied>
plugging a 16x GPU into an SoC is often hard, but I've no idea, there are a lot of SoCs
<airlied>
I think jetson might have been good
<airlied>
not sure plugging a gpu into the rpi pcie 1x ever worked :-P
<airlied>
oh maybe the rpi5 works
<introducelogics>
So you will have 66+67+128 representing powers "2+4 +max" is the last aka. 2 in power of 31 that is in internal adders collision with "65+68+128" 1+8+max how would one want to get rid of collision? so 68 added to first and 69 to second, and answer sets adjust to the fact. They are unique now, that is the first family of solutions where you stretch the possible powers to another set
<introducelogics>
for example from 130to194, where collision happens 130+4 or 130+5 goes in, but howto solve this within encoder? The ddr has read-modify-write ok, but how it would know that collision happened in intermediate internal adder? is that what you think is the main puzzle? so is correction bank either write enable to 67+130 or 68+130, so the counter is pushed to a modify base stack, the
<introducelogics>
address calculator would use base differently depending on how the bases were written, are such pagetables or hw hacks very difficult? imo probably no, or what you would think? DMA is pushed with quite high end sanity in hw along with ddr controller. as you see it is meant for such offloads, so 66 get's discarded with 65 how? We skip 65 or 64 within IR of base 130 cause those are
<introducelogics>
unique (by means of WE signal 0 at their read location), and over one you add to the collision bank. 66 get's no candidate since no such address (at read base 0), 67 is on skip where as 68 is written to base130. so the invariant sums become 66+68+130+68 is akin of 66+68, and 67+66+0 is akin for it's own. Likely one correction/collision bank is enough but needs some thought yet. I
<introducelogics>
looked at alphametrics puzzles, but all this stuff needs long hours of practicing and note taking for me yet. PCI-sig is with expenses to be accessed, so i have seen some old specs with AGP myself too that ended up in the web.
mbrost has quit [Remote host closed the connection]
halves has quit [Quit: Ping timeout (120 seconds)]
yshui has quit [Read error: Connection reset by peer]
yshui has joined #dri-devel
jbarnes has quit [Read error: Connection reset by peer]
cascardo has joined #dri-devel
jbarnes has joined #dri-devel
halves has joined #dri-devel
Kayden has joined #dri-devel
mchehab_ has joined #dri-devel
Caterpillar has quit [Read error: Network is unreachable]
Caterpillar has joined #dri-devel
<DemiMarie>
airlied: thanks for the explanation
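Note on the snooping exchange above: a minimal toy model, in C, of why a device behind a non-snooping PCIe root complex can read stale data unless the CPU cache line is written back first. Everything here (`toy_system`, `cpu_write`, `cache_clean`, `device_read`, the four-line cache) is hypothetical and invented for illustration; it is not kernel or driver code, and real hardware has far more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINES 4

/* Toy model (all names hypothetical): one CPU write-back cache in front of DRAM. */
struct toy_system {
    uint8_t dram[LINES];   /* what a non-snooping device sees */
    uint8_t cache[LINES];  /* what the CPU sees for valid lines */
    bool    valid[LINES];
    bool    dirty[LINES];
};

/* CPU store: lands in the cache only and marks the line dirty. */
static void cpu_write(struct toy_system *s, unsigned line, uint8_t v)
{
    assert(line < LINES);
    s->cache[line] = v;
    s->valid[line] = true;
    s->dirty[line] = true;
}

/* Clean: write a dirty line back to DRAM but keep it valid in the cache. */
static void cache_clean(struct toy_system *s, unsigned line)
{
    assert(line < LINES);
    if (s->valid[line] && s->dirty[line]) {
        s->dram[line] = s->cache[line];
        s->dirty[line] = false;
    }
}

/* Device read: with snooping the dirty cache data is returned,
 * without it the device only ever sees DRAM contents. */
static uint8_t device_read(const struct toy_system *s, unsigned line, bool snoop)
{
    assert(line < LINES);
    if (snoop && s->valid[line] && s->dirty[line])
        return s->cache[line];
    return s->dram[line];
}
```

In this model, after `cpu_write(&s, 1, 42)` a snooping read returns 42 while a non-snooping read still returns the old DRAM value until `cache_clean(&s, 1)` runs. The clean leaves the line valid, which matches the "writeback without invalidating" point; it says nothing about the cost of doing this for userspace mappings, which is the objection raised in the discussion.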
mbrost has quit [Ping timeout: 480 seconds]
sassefa has quit []
<HdkR>
DemiMarie: NVIDIA ARM boards get PCIe correct, including Xavier and Orin and of course Thor next year. Plus their server Grace offering of course.
sukuna has quit [Remote host closed the connection]
Duke`` has quit [Ping timeout: 480 seconds]
introducelogics has quit [Remote host closed the connection]
kzd has quit [Ping timeout: 480 seconds]
realmeninblack has joined #dri-devel
Calandracas has joined #dri-devel
fab has joined #dri-devel
<realmeninblack>
This is more of a demonstration for alternate way of doing it to dull on dma and ddr capabilities and have another view at things. Exact real procedure has been already talked about , two rounds of encoding, and you will see that the after power base marking pair round robin collision banning it leaves only somewhere 1024*1024 combinations, and we leave an error margin by the same size
<realmeninblack>
just in case, it is very clear that this is going to work, however 1024*1024 is replaced with highest number of double rounds of encoding. I have been so busy that have not coded the loop yet to see which number is the biggest in that double round set. But yes my proposal should be possible. RPI also comes with low pricetag and lots of flexibility of oss, kinda expected that hw bugs
<realmeninblack>
creep in, and very expected for many other socs as well, prolly dma as well as DDR are stable however, but none of this touches the fact that you can do transform feedback on any stage of shader engine, since the core is either unified or fixed function, the last which works in command queue based io views only, first has shader stages which are shared in arch. PCIe bugs can however
<realmeninblack>
be such kind that other IOs on unified architectures are not accessible from shader engines at a bugfluke, though i understood that most bugs reflect the rates which are not lithography bugs but more like soc trace or lane shielding or whatever bugs on the motherboard. DMA and ddr arch is hugely well designed.
Company has quit [Quit: Leaving]
realmeninblack has quit [Remote host closed the connection]
Calandracas has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
vedranm_ is now known as vedranm
frankbinns has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
u-amarsh04 has quit []
kts has quit []
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
amarsh04 has joined #dri-devel
MrCooper_ has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
epoch101_ has joined #dri-devel
epoch101 has quit [Ping timeout: 480 seconds]
kugel has quit [Remote host closed the connection]
kugel has joined #dri-devel
mvlad has joined #dri-devel
kts has joined #dri-devel
samuelig has quit []
jkrzyszt has joined #dri-devel
samuelig has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
lynxeye has joined #dri-devel
warpme has joined #dri-devel
kts has quit [Quit: Leaving]
vliaskov has joined #dri-devel
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
kaiwenjon has quit [Remote host closed the connection]
kaiwenjon has joined #dri-devel
magicalnull has joined #dri-devel
dstevenson has joined #dri-devel
<magicalnull>
now it's a bit hurry or busy situation , in another words other things eat time. But i can see that first power can be well 1+33, second 2+34, third 3+35, and null is 32+0, that yields single collision if both operands are same but they have different pc bases. 1+3 and 2+2 collide as others alike, 6+6 5+7 etc. That is the hardware way also.
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
guludo has joined #dri-devel
nerdopolis has joined #dri-devel
dolphin has joined #dri-devel
dolphin has quit []
yogesh_m1 has quit [Ping timeout: 480 seconds]
yogesh_m1 has joined #dri-devel
davispuh has joined #dri-devel
magicalnull has quit [Read error: Connection reset by peer]
kts has joined #dri-devel
apinheiro has quit [Quit: Leaving]
gallo729 has joined #dri-devel
heat has joined #dri-devel
himal has quit [Remote host closed the connection]
himal has joined #dri-devel
kts has quit [Quit: Leaving]
alane has quit []
alane has joined #dri-devel
himal has quit [Ping timeout: 480 seconds]
sassefa has joined #dri-devel
magicalnull has joined #dri-devel
nerdopolis has quit [Ping timeout: 480 seconds]
himal has joined #dri-devel
dumbbell has quit [Ping timeout: 480 seconds]
<magicalnull>
the pc indexes are pulled from the other end, 65+1-33+65+3-35+35+33=33+33+33+35+1+3=138 aka 33+33+72=138 where as 72+32+32=136 , first is 1+3 , second 2+2 , shifting one operand to 64 get's minimal 1 difference on collisions saves very little space if one wants. So that is kind of it from me, i've been spamming too hard, i am sorry.
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
docmax has joined #dri-devel
bolson has joined #dri-devel
androidui has quit [Ping timeout: 480 seconds]
magicalnull has quit [Remote host closed the connection]
ZLangJIT has joined #dri-devel
ZLangJIT is now known as androidui
vliaskov has quit [Ping timeout: 480 seconds]
dumbbell has joined #dri-devel
kzd has joined #dri-devel
mdroper has joined #dri-devel
dsimic is now known as Guest7430
dsimic has joined #dri-devel
Guest7430 has quit [Ping timeout: 480 seconds]
cambrian_invader has joined #dri-devel
alane has quit []
alane has joined #dri-devel
f11f12 has quit [Quit: Leaving]
epoch101 has joined #dri-devel
Company has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
bmodem has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
Duke`` has joined #dri-devel
coldfeet has joined #dri-devel
rcf has quit [Quit: WeeChat 3.8]
rcf has joined #dri-devel
sukuna has joined #dri-devel
cmichael has quit [Quit: Leaving]
kts has joined #dri-devel
<cheako>
Hey, I was doing "well" but not great at writing vklayers in rust. ash, where I'm getting the vulkan types from, is good at writing vulkan structs... but is bad at reading them and I've a lot of code for just doing that. If ppl are interested in writing vulkan ICDs in rust, we should share code.
<digetx>
could anyone from Intel please ack the last patch of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988 that updates CI flake expectation for Zink/ADL test, Marge refuses to apply this mesa-cache MR because Intel CI fails due to the flaky test
Lucretia has quit []
lemonzest has joined #dri-devel
lemonzest1 has quit [Ping timeout: 480 seconds]
Lucretia has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
jkrzyszt has quit [Quit: Konversation terminated!]
dstevenson has quit []
nerdopolis has joined #dri-devel
<kisak>
mattst88: fwiw, media-libs/mesa-24.1.7 USE="opencl" was FTBFS for me. After transitioning back to my irregular mesa ebuild to go back to 24.2.5, it's fine again.
<kisak>
I should have grabbed a build log snippet. Alas...
himal has quit [Ping timeout: 480 seconds]
warpme has quit []
Company has quit [Remote host closed the connection]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
cyrinux has quit []
cyrinux has joined #dri-devel
frankbinns has joined #dri-devel
heat has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
rasterman has joined #dri-devel
vliaskov has quit [Ping timeout: 480 seconds]
epoch101 has joined #dri-devel
sassefa has quit []
glennk has quit [Read error: No route to host]
jfalempe has quit [Quit: jfalempe]
sukuna has quit [Ping timeout: 480 seconds]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
nerdopolis has quit [Ping timeout: 480 seconds]
LeviYun has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
mbrost has joined #dri-devel
mbrost_ has quit [Ping timeout: 480 seconds]
glennk has joined #dri-devel
kts has quit [Quit: Leaving]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
iive has joined #dri-devel
coldfeet has quit [Remote host closed the connection]
mbrost has quit [Ping timeout: 480 seconds]
vedranm_ has joined #dri-devel
vedranm has quit [Ping timeout: 480 seconds]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
mbrost has joined #dri-devel
vedranm has joined #dri-devel
vedranm_ has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
<benjaminl>
is there a reliable way to get the nir variable associated with a {load,store}_input intrinsic?
sukuna has joined #dri-devel
mvlad has quit [Quit: Leaving]
Flerix has joined #dri-devel
Flerix has left #dri-devel [#dri-devel]
jsa1 has joined #dri-devel
jsa has quit [Ping timeout: 480 seconds]
YuGiOhJCJ has joined #dri-devel
sravn has joined #dri-devel
guludo has quit [Quit: WeeChat 4.4.2]
mbrost has joined #dri-devel
<karolherbst>
benjaminl: usually you do that before IO lowering is done on the shader, and after lowering all the relevant information should be part of the load/store instruction
sukuna has quit [Ping timeout: 480 seconds]
glennk has quit [Remote host closed the connection]
Company has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
<benjaminl>
karolherbst: thanks! figured out that there was already a pass to do exactly this for the property I was interested in :)
<karolherbst>
ahh, cool
<alyssa>
benjaminl: my usual advice is "don't"
<benjaminl>
curious about the design intention behind doing it that way instead of preserving the var association?
<alyssa>
use locations on lowered i/o instead if you can
mbrost_ has joined #dri-devel
mbrost has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
mbrost_ has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
epoch101 has quit []
jrayhawk has joined #dri-devel
jsa1 has quit [Ping timeout: 480 seconds]
<DemiMarie>
airlied: which Ampere? Ampere Altra has an erratum (PCIE_65) that makes it not work, with the workaround being to emulate unaligned accesses in the kernel.
<HdkR>
They were referring to "The new Ampere" So that would be the AmpereOne
<HdkR>
New is relative considering it's already over a year old :D
<HdkR>
Just need System76 to immediately replace their new system with a recent chip instead
sassefa has joined #dri-devel
<DemiMarie>
Also, do the Nvidia chips work with generic PCIe GPUs, or just with their own? The Nvidia driver has a workaround for the bug.
<DemiMarie>
HdkR: IMO Linux should just add an emulator for unaligned access faults and enable it by default on all Arm machines.
<HdkR>
DemiMarie: I have a Radeon Pro W7500 plugged in to my Jetson Orin
<HdkR>
Eh, maybe the unaligned handler for everything makes sense, but I'd prefer if ARM vendors just fix their broken hardware.
mbrost has quit [Ping timeout: 480 seconds]
<HdkR>
Although I definitely don't recommend buying an Orin board. It's old and has bugged atomics
<HdkR>
Thor will be a significant upgrade :)
<daniels>
trapping and fixing unaligned access is … not great for performance
Haaninjo has quit [Quit: Ex-Chat]
<HdkR>
Also hard to be entirely correct when crossing 16B and cacheline access granularies
<HdkR>
granularities*
nerdopolis has joined #dri-devel
<DemiMarie>
daniels: what about recompiling everything with `-mstrict-align`?
mbrost has joined #dri-devel
<DemiMarie>
I’d prefer for hardware to be fixed too, but in the absence of that then an unaligned access emulator is the best option I know of.
<iive>
DemiMarie, I think that's a different type of align
<DemiMarie>
iive: the idea is to prevent the compiler from generating unaligned accesses so they don't need to be trapped
<iive>
why is the compiler generating unaligned access at all on architecture that doesn't support it?
docmax_ has joined #dri-devel
epoch101 has joined #dri-devel
<iive>
I understand if there is a bug where pointer arithmetic leads to unaligned access. but stuff that is entirely controlled by the compiler...
docmax has quit [Ping timeout: 480 seconds]
mbrost_ has joined #dri-devel
<iive>
my bad, it's the same align. but isn't it supposed to be set by march or target by default?
<iive>
hum... arm arch doesn't even have the options. aarch64 does.
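For the strict-alignment exchange above, a small C sketch of the source-level side of the problem: reading and writing a 32-bit value at an arbitrary byte offset through `memcpy` rather than through a cast pointer, so the compiler chooses an access sequence that is legal for the target (including when built with the AArch64 `-mstrict-align` option). The helper names are hypothetical. This only covers accesses the compiler generates for normal memory; a library `memcpy` or hand-written assembly can still issue unaligned accesses, so it is not by itself a fix for device mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a uint32_t from a possibly unaligned address.
 * Dereferencing a cast (const uint32_t *)p would be undefined behaviour
 * when p is misaligned; a fixed-size memcpy is well defined. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: store a uint32_t to a possibly unaligned address. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

A round trip such as `store_u32_unaligned(buf + 1, x)` followed by `load_u32_unaligned(buf + 1)` returns `x` on any target and leaves the neighbouring bytes untouched. Whether the compiler lowers it to one unaligned load or to several narrower ones depends on the target and flags.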
mbrost has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
mbrost_ has quit [Ping timeout: 480 seconds]
epoch101 has quit []
<iive>
apparently ampere is aarch64.
epoch101 has joined #dri-devel
<HdkR>
Indeed it is
<HdkR>
Neoverse-N1 based cores in the Ampere Altra, custom design in the AmpereOne