<HdkR>
Good news everybody, B580 works on ARM devices
amarsh04 has quit []
u-amarsh04 has joined #dri-devel
paulk-bis has quit []
paulk has joined #dri-devel
apinheiro has joined #dri-devel
sguddati has quit [Ping timeout: 480 seconds]
kasperiaken has joined #dri-devel
<dolphin>
HdkR: Is that your picture?
heat is now known as Guest7573
heat has joined #dri-devel
Guest7573 has quit [Remote host closed the connection]
<HdkR>
dolphin: yeh
<dolphin>
Do you mind it being shared in internal chat?
<HdkR>
dolphin: Sure
<dolphin>
Nice feat btw. What is the board or is it not a public one?
<HdkR>
dolphin: It's an NVIDIA Jetson AGX Orin, which is a board that's been available for like five years now
<HdkR>
It's not like I have any non-public boards to post :P
<dolphin>
Cool. Did you need any patching or did it work out of drm-tip?
<mupuf>
HdkR: spoken like a hoarder of non-public boards would!
<HdkR>
dolphin: This was upstream v6.13 kernel without any patches
<HdkR>
mupuf: You know I have a pile of all the ARM devices :P
<mupuf>
HdkR: you indeed do! Including the super-not-cheap one!
<HdkR>
:blobsweat:
<HdkR>
I'm fearing the price of the new board this summer
<mupuf>
Although... where is your Ampere server?
<mupuf>
new board from...?
<HdkR>
mupuf: PCIe is buggered on that platform, not worth getting. Waiting for AmpereOne A96-X37 to actually launch
<HdkR>
mupuf: Jetson Thor launches this summer
<mupuf>
oooooooh!
<mupuf>
And... ROFL for the Ampere :D
<dolphin>
HdkR: Do you mind sending a dmesg with drm debug logs to the mailing list maybe? I think folks are looking into the Raspi 5 not working, so it might help to cross-reference
<HdkR>
dolphin: Pi has bugged PCIe so that's not entirely unexpected there
<dolphin>
Well, my experience with anything Pi and mainline also makes me very suspicious of it being a driver problem
<HdkR>
:D
coldfeet has joined #dri-devel
kaiwenjon has quit [Read error: Connection reset by peer]
<HdkR>
Pretty sure people poking on Pi have kernel patches to emulate unaligned accesses on device memory or something nasty
<HdkR>
and on radeon they change some of the mapping types as well
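[A minimal userspace sketch of the hazard under discussion, assuming a PCIe BAR mmap()ed as arm64 Device memory; whether a given access actually faults depends on the mapping type the kernel chose:]

```c
#include <stdint.h>
#include <string.h>

/* 'bar' is assumed to be an mmap()ed PCIe BAR that the kernel mapped as
 * Device-nGnRE memory, the arm64 default for uncached MMIO. */
uint32_t read_packed(const volatile uint8_t *bar, size_t byte_off)
{
    /* A direct cast like this may compile to a single unaligned LDR,
     * which raises an alignment fault on Device memory even though the
     * same instruction is legal on Normal memory:
     *
     *     return *(const volatile uint32_t *)(bar + byte_off);
     */

    /* Byte-wise copying sidesteps the fault, but the compiler may still
     * fuse it back into one unaligned load unless the code is built
     * with -mstrict-align (GCC/Clang, AArch64). */
    uint32_t v;
    memcpy(&v, (const void *)(bar + byte_off), sizeof(v));
    return v;
}
```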
kasperiaken has quit [Remote host closed the connection]
<daniels>
karolherbst: can you point to that in the MR please to add another skip?
griffithmonster has joined #dri-devel
rz_ has joined #dri-devel
rz has quit [Remote host closed the connection]
coldfeet has quit [Quit: Lost terminal]
<griffithmonster>
an unaligned access is probably implemented as a read-modify-write; either way it's still a PCIe (or PCI) packet, and if the hardware fails on those packets for timing reasons, such as not all lanes working, it would imo still succeed on longer bursts, so I don't quite understand you. If some lanes fail, as airlied diagnosed on x16, then an x1 config probably still works, or the whole chipset wouldn't function
<griffithmonster>
that's how I'd understand it, but I don't have very deep experience with this, been fighting too much on other fronts.
griffithmonster has quit [Remote host closed the connection]
<karolherbst>
done
guludo has joined #dri-devel
yrlf has quit [Quit: Ping timeout (120 seconds)]
yrlf has joined #dri-devel
feaneron has joined #dri-devel
nerdopolis has joined #dri-devel
rgallaispou has quit [Remote host closed the connection]
ADS_Sr has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
kugel is now known as Guest7589
kugel has joined #dri-devel
kugel is now known as Guest7590
kugel has joined #dri-devel
fab has quit [Quit: fab]
Guest7589 has quit [Ping timeout: 480 seconds]
Guest7590 has quit [Ping timeout: 480 seconds]
feaneron has quit [Quit: feaneron]
feaneron has joined #dri-devel
fab has joined #dri-devel
Company has joined #dri-devel
heat is now known as Guest7595
Guest7595 has quit [Read error: Connection reset by peer]
<daniels>
‘ERROR - dEQP error: MESA: error: Failed to allocate device memory for BO’
<daniels>
probably needs to be skipped unless parallelism gets reduced
<alyssa>
that specific test or the whole job?
<alyssa>
if it really is just OOM and not a logic bug, we'd be playing whack-a-mole with the flake list (especially when considering vk cts uprevs)
frankbinns1 has quit [Read error: Connection reset by peer]
frankbinns1 has joined #dri-devel
<daniels>
well, either you skip the tests that are too aggressive on memory allocation, or you keep running them but you reduce parallelism across the whole job
<daniels>
usually what happens is that we skip those and leave them for the -full runs which run with reduced parallelism because they can go long
<zmike>
yeah there's a few of those on the zink jobs and they're just skipped
<zmike>
it's nbd
haaninjo has joined #dri-devel
<alyssa>
fair enough
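[For concreteness, roughly how such a skip lands with deqp-runner in Mesa CI; flags are from memory and file/test names are illustrative, and -full jobs instead lower --jobs:]

```sh
# In the job's skips list (a file of test-name regexes), with a note
# for posterity:
#   OOMs under default parallelism; covered by the -full run instead
#   dEQP-VK.memory.allocation.random.*

# The runner consumes it via --skips:
deqp-runner run --deqp ./deqp-vk \
    --caselist mustpass.txt \
    --skips skips.txt \
    --jobs ${FDO_CI_CONCURRENT:-4} \
    --output results/
```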
JRepin has quit []
JRepin has joined #dri-devel
frankbinns1 has quit [Ping timeout: 480 seconds]
frankbinns has joined #dri-devel
coldfeet has joined #dri-devel
lemonzest has quit [Quit: WeeChat 4.5.1]
<DemiMarie>
HdkR: Asahi is going to take those patches. In the meantime, can't one recompile the world with strict-alignment flags?
alane_ has joined #dri-devel
alane has quit [Ping timeout: 480 seconds]
fab has quit [Quit: fab]
fab has joined #dri-devel
<chaos_princess>
what makes you think so?
valpackett has quit [Read error: Connection reset by peer]
vliaskov has quit [Remote host closed the connection]
frankbinns has joined #dri-devel
<sima>
pepp, nice work on the sched tracepoint series :-)
<sima>
I guess you have one of the sched maintainers who can push this to drm-misc for you?
<sima>
or maybe get commit rights if you plan to do more
tzimmermann has quit [Quit: Leaving]
JRepin has quit []
JRepin has joined #dri-devel
<sima>
airlied, I guess drm-misc forgot to send merge window fixes
<sima>
also nothing from intel
davispuh has joined #dri-devel
lemonzest has joined #dri-devel
cyrinux has quit []
cyrinux has joined #dri-devel
<DemiMarie>
sima: How does HMM migrate pages to the GPU?
valpackett has joined #dri-devel
hansg has quit [Quit: Leaving]
kts has quit [Quit: Leaving]
<sima>
DemiMarie, it's essentially like any other migration at the core mm datastructure level
<sima>
try_to_migrate() and migration ptes holding up do_swap_page() is the main magic
<sima>
only difference is that the driver orchestrates the overall flow, so that it can do a gpu blt copy to actually move the data instead of memcpy() on the cpu
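[A rough driver-side sketch of that flow for a single page, based on the kernel's migrate_vma API (mm/migrate_device.c) that e.g. nouveau's SVM code uses; vram_alloc_page() and gpu_blit_from_sysram() are hypothetical driver helpers:]

```c
#include <linux/migrate.h>
#include <linux/mm.h>

static int migrate_one_page_to_vram(struct vm_area_struct *vma,
                                    unsigned long addr)
{
    unsigned long src = 0, dst = 0;
    struct migrate_vma args = {
        .vma   = vma,
        .start = addr,
        .end   = addr + PAGE_SIZE,
        .src   = &src,
        .dst   = &dst,
        .flags = MIGRATE_VMA_SELECT_SYSTEM,
    };

    /* Replaces the CPU PTE with a migration entry; concurrent CPU
     * faults now block in do_swap_page() until finalize below. */
    if (migrate_vma_setup(&args))
        return -EFAULT;

    if (src & MIGRATE_PFN_MIGRATE) {
        struct page *spage = migrate_pfn_to_page(src);
        struct page *dpage = vram_alloc_page();      /* hypothetical */

        gpu_blit_from_sysram(dpage, spage);  /* GPU blt, not memcpy() */
        dst = migrate_pfn(page_to_pfn(dpage));
    }

    migrate_vma_pages(&args);     /* commit PTEs for migrated pages */
    migrate_vma_finalize(&args);  /* restore anything left behind */
    return 0;
}
```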
<DemiMarie>
sima: For context, I'm wondering if it would be reasonable to try to migrate pages to (virtual) VRAM and either wait forever for the migration to complete, or leave the pages on the CPU (with a warning in the logs) if the migration can't be done.
<sima>
oh for that issue it's just normal migration, since the device memory is coherent, so you can install cpu ptes pointing at it
<sima>
so not the device private stuff
<DemiMarie>
The device memory is also virtual, so one can have a lot of it.
<DemiMarie>
At least one should be able to in a reasonable design.
<DemiMarie>
Is migration something that can be relied on for shared virtual memory, or is it too unreliable for that?
<sima>
you need to allocate a struct page array for the entire thing, so might not want to make it too big
<sima>
but you can register more dev_pagemap ranges if you run out I guess
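[A sketch of what registering such a dev_pagemap range looks like, loosely following nouveau/amdgpu SVM setup; my_pagemap_ops (which must provide .page_free and .migrate_to_ram) is a hypothetical ops struct, and the coherent case sima describes would use MEMORY_DEVICE_COHERENT with real device addresses instead of device-private ones:]

```c
#include <linux/memremap.h>

static void *register_vram_pages(struct device *dev,
                                 struct dev_pagemap *pgmap, u64 size)
{
    /* Reserve unused physical address space to stand in for the
     * device-private pages. */
    struct resource *res =
        devm_request_free_mem_region(dev, &iomem_resource, size);
    if (IS_ERR(res))
        return ERR_CAST(res);

    pgmap->type = MEMORY_DEVICE_PRIVATE;
    pgmap->range.start = res->start;
    pgmap->range.end = res->end;
    pgmap->nr_range = 1;
    pgmap->ops = &my_pagemap_ops;
    pgmap->owner = dev;  /* matched against pgmap_owner at migrate time */

    /* Allocates struct pages for the entire range, hence "might not
     * want to make it too big". */
    return devm_memremap_pages(dev, pgmap);
}
```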
<DemiMarie>
I'm mostly thinking from the hypervisor PoV
<DemiMarie>
From the hypervisor perspective, the simplest way to support userptr with virtio-GPU on Xen is to require the guest to migrate the userptr memory to a blob object.
<sima>
yeah
<sima>
I guess you could try and see how well it works
<sima>
but if userspace is doing stuff like direct i/o and similar you might need to retry a lot and might not ever win
<DemiMarie>
I'd say that's a userspace bug.
<sima>
that's pretty much what people want userptr for though
<DemiMarie>
why?
<DemiMarie>
because you can't do O_DIRECT with dmabuf?
<sima>
yup
<DemiMarie>
Could that limitation be fixed?
<sima>
direct io into dma-buf?
<DemiMarie>
yes
<sima>
unlikely, the entire model falls apart in how memory is managed
<DemiMarie>
The underlying problem on the Xen side is that guest physical memory is completely different from the host virtual memory and not under control of Linux core MM.
<sima>
but yeah on your question, I've chatted with a bunch of core mm developers on how reliable migration is, and they just loled
<sima>
otoh stuff like cma seems to work fairly well, so maybe you're lucky
<DemiMarie>
What I want is to say that you don't get userptr unless you use an LD_PRELOAD hack that turns anon mappings into dmabuf mappings
<sima>
yeah that doesn't work at all
<sima>
like the entire fork nonsense
<sima>
or the fact that anon memory can resize
<sima>
or the complete mismatch in locking and lifetimes
<DemiMarie>
#define fork abort 😆
<sima>
the syscall to spawn a process
<DemiMarie>
Is the only way to have reliable userptr with virtio-GPU for the hypervisor to move the pages behind the guest's back?
<DemiMarie>
yes, I know, my assumption was that these workloads would not be creating child processes, but also that was mostly meant as a joke (hence the emoji)
<sima>
apparently even pytorch is fork happy enough to be a nuisance
<DemiMarie>
because dmabuf can't COW?
<sima>
before stuff like pin_user_pages(FOLL_LONGTERM) fixed all these
<sima>
yeah
<sima>
they also have a fixed size
<sima>
and mostly the assumption that memory is allocated all at once
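[The fix sima names, in sketch form: long-term pinning as GPU userptr paths do it; error handling trimmed, and exact call sites vary per driver:]

```c
#include <linux/mm.h>

/* With FOLL_PIN (implied by pin_user_pages_*), pages a device may be
 * writing to are copied eagerly at fork() instead of being made
 * copy-on-write, and FOLL_LONGTERM first migrates them out of
 * CMA/ZONE_MOVABLE so a long-lived pin can't block compaction. */
static int pin_userptr(unsigned long uaddr, int nr_pages,
                       struct page **pages)
{
    int pinned = pin_user_pages_fast(uaddr, nr_pages,
                                     FOLL_WRITE | FOLL_LONGTERM, pages);
    if (pinned < 0)
        return pinned;
    if (pinned != nr_pages) {
        unpin_user_pages(pages, pinned);  /* partial pin: back out */
        return -EFAULT;
    }
    return 0;
}
```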
<DemiMarie>
I'm surprised that one can't make O_DIRECT into dmabuf work via p2pdma or similar.
<DemiMarie>
Since it can't be done, though, that is really bad
<DemiMarie>
Do Linux GPU drivers provide enough APIs for native contexts to implement HMM?
<DemiMarie>
userspace API that is
<DemiMarie>
I think that would require being able to handle GPU page faults in user mode
<DemiMarie>
Will Linux blow up if the hypervisor points the stage 2 page tables at GPU memory?
<DemiMarie>
I wonder if this needs a conference call between GPU driver devs, core MM devs, hypervisor devs, and the people who are doing the work.
<pepp>
sima: thanks. I don't have commit rights, but I think Christian could push it
<sima>
DemiMarie, it's locking and lifetime rules, not "can the hw do it"
<sima>
so p2pdma isn't helping
<sima>
unless you mean the magic that pretends mmio ranges have struct page, which is just for direct i/o
<sima>
because that really needs struct page or it wont work
<DemiMarie>
that is getting beyond what I know about Linux core mm stuff
<DemiMarie>
I'll take your word that it won't work, though
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
chewitt has quit [Quit: Zzz..]
epoch101 has joined #dri-devel
epoch101_ has joined #dri-devel
epoch101 has quit [Ping timeout: 480 seconds]
<DemiMarie>
So if migration in the guest isn't reliable, that leaves migration done by the hypervisor, which is transparent to the guest. That is going to be harder, though.
lynxeye has quit [Quit: Leaving.]
epoch101 has joined #dri-devel
epoch101_ has quit [Ping timeout: 480 seconds]
Hazematman has quit [Quit: WeeChat 4.5.1]
epoch101_ has joined #dri-devel
epoch101 has quit [Ping timeout: 480 seconds]
Hazematman has joined #dri-devel
Hazematman has quit []
feaneron has quit [Ping timeout: 480 seconds]
JRepin has quit []
JRepin has joined #dri-devel
coldfeet has quit [Quit: Lost terminal]
feaneron has joined #dri-devel
chamlis has quit [Remote host closed the connection]
<benjaminl>
I'm observing a situation where nir_lower_io_to_temporaries changes the type of flat varyings from float to int
<benjaminl>
I'm assuming this is a bug, anybody seen it before?
ybogdano has quit [Quit: Ping timeout (120 seconds)]
YuGiOhJCJ has joined #dri-devel
<ngcortes>
zmike, something in mesa=fbacf3761f9a1cbef024e1ddd55be72f535de09c..4b0f2d1a2b30e5a288f909683203bfa3bbfacf50 is crashing deqp in CI when running zink
<ngcortes>
*Intel Mesa CI
<zmike>
ngcortes: I assume it's the same thing I fixed this morning
<ngcortes>
zmike, ah okay, I'll re-run our main gl job
<ngcortes>
weird that CI didn't pick up the fix and run it yet
ybogdano has joined #dri-devel
feaneron has quit [Quit: feaneron]
ybogdano has quit [Quit: Ping timeout (120 seconds)]
ybogdano has joined #dri-devel
Hazematman has joined #dri-devel
Hazematman has quit []
ADS_Sr has quit [Remote host closed the connection]
ADS_Sr has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
ybogdano has quit [Read error: Connection reset by peer]
Duke`` has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
ADS_Sr has quit [Remote host closed the connection]
ybogdano has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
alyssa has quit [Quit: alyssa]
<airlied>
sima: yeah I left it for a while longer just in case, but I'll send it now
sima has quit [Ping timeout: 480 seconds]
_rgallaispou has quit [Ping timeout: 480 seconds]
fab has quit [Quit: fab]
iive has joined #dri-devel
jkrzyszt has quit [Quit: Konversation terminated!]
<iive>
today I tried kernel 6.13.0, and playing a light-weight Proton game showed progressive slowdown. Reverted to an older kernel and got no issues. I'm on an RX 570 card (Polaris10 IIRC).
<iive>
are there any known issues, or should I dig further?
<iive>
mesa 24.3.3 with llvm 19.1.7
Hazematman has joined #dri-devel
Hazematman has quit []
NiGaR has quit [Ping timeout: 480 seconds]
NiGaR has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
heat has quit [Remote host closed the connection]
digetx has quit [Remote host closed the connection]
epoch101_ has quit [Ping timeout: 480 seconds]
heat has joined #dri-devel
digetx has joined #dri-devel
<HdkR>
DemiMarie: What patches? I must have missed some context
<DemiMarie>
HdkR: which message are you replying to?
<HdkR>
DemiMarie: `< DemiMarie> HdkR: Asahi is going to take those patches. In the meantime, can't one recompile the world with strict-alignment flags?`
<DemiMarie>
HdkR: the ones to emulate unaligned access faults
<HdkR>
Why do they need it? They don't even support eGPUs
<DemiMarie>
To support eGPUs
<DemiMarie>
Which I think they planned on at some point.
<HdkR>
eh
<DemiMarie>
More generally, using PCIe GPUs on many Arm platforms requires that either userspace is compiled with "no unaligned access" flags, or that the kernel emulates unaligned access faults.
<HdkR>
I believe there are more issues than just unaligned device memory accesses, aren't there? There's some hack that the Ampere tinkerers are doing to replace uncached device memory accesses with WC or something?
<DemiMarie>
They need to force PCIe mappings to Normal memory types instead of Device memory
<DemiMarie>
Device memory doesn't support unaligned access, which is why the emulation is needed.
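[A kernel-side sketch of that mapping-type change using the standard ioremap variants; which variant a given driver can safely pick is platform- and BAR-dependent:]

```c
#include <linux/pci.h>
#include <linux/io.h>

/* On arm64, ioremap() yields Device-nGnRE: strongly ordered, but
 * unaligned CPU accesses fault.  ioremap_wc() yields Normal-NC
 * (write-combined): unaligned accesses are legal, at the cost of
 * weaker ordering, so it is only suitable for framebuffer/VRAM-like
 * BARs, never for registers. */
static void __iomem *map_vram_bar(struct pci_dev *pdev, int bar)
{
    return ioremap_wc(pci_resource_start(pdev, bar),
                      pci_resource_len(pdev, bar));
}
```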
<HdkR>
I hope it works for them
<HdkR>
I'm going to keep using platforms without buggered PCIe instead