ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
nchery is now known as Guest6320
nchery has joined #dri-devel
Guest6320 has quit [Ping timeout: 480 seconds]
<airlied>
dcbaker: can we ensure 22.1.5 gets 878784dbec00d1d5cd4d3d080d72d740e3197df4 for crocus, and a680fd078c0a7574b60fbf9a7e5c9f42c97a744e, 38a2a2da3e5f7110ac53a1ffa5fe5617553895f7 for llvmpipe
<dcbaker>
airlied: sure, I’m still working through a giant backlog of patches, and probably won’t get it out today. Still hoping to get 22.2-rc1 out tonight. See what happens
mbrost_ has quit [Ping timeout: 480 seconds]
mbrost has quit [Ping timeout: 480 seconds]
<airlied>
dcbaker: no worries, just saw staging update and wanted to make sure they would land!
<dcbaker>
Cool, I work through it a few patches at a time so I can watch CI
jkrzyszt has quit [Remote host closed the connection]
gouchi has joined #dri-devel
jkrzyszt has joined #dri-devel
anholt has quit [Ping timeout: 480 seconds]
slattann has quit []
toolchains has joined #dri-devel
idr has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
whald has quit [Remote host closed the connection]
JoniSt has joined #dri-devel
fxkamd has joined #dri-devel
ybogdano has joined #dri-devel
toolchains has quit [Ping timeout: 480 seconds]
fxkamd has quit []
jkrzyszt has quit [Ping timeout: 480 seconds]
toolchains has joined #dri-devel
alyssa has joined #dri-devel
<alyssa>
cwabbott: Is there a particular reason to force lower phis to scalar?
<alyssa>
In the spiller, if a vector is spilled (ie a vectorized load_global that can't be remat due to hazards), and that value is used in another block, we get a vector phi
<alyssa>
(Or a 128-bit scalar phi if you prefer..)
<alyssa>
Either we can scalarize that in the spiller (duplicate the phi 4x, add a SPLIT into each predecessor, add a COLLECT of the phis)
bmodem has joined #dri-devel
jewins1 has joined #dri-devel
<alyssa>
Or we can just allow "wide" phis, which seemingly only requires duplicating some moves in the phi -> parallel copy lowering
bmodem has quit []
<alyssa>
The latter seems less annoying, maybe more efficient, and would allow handling 32-bit vec4 phis in NIR without any extra special-casing that I can see
<alyssa>
But maybe there is some awful edge case I'm missing (e.g. complicating live range splitting?)
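The first option alyssa describes (duplicate the phi per component, add a SPLIT into each predecessor, add a COLLECT of the phis) might look roughly like the sketch below. The tuple-based IR, the function name scalarize_phi and the "value.component" naming scheme are all hypothetical and are not taken from the actual backend.

```python
def scalarize_phi(dest, sources, components=4):
    """Scalarize one vector phi (hypothetical toy IR, not the real backend).

    dest:    name of the vector value the phi defines
    sources: dict mapping predecessor block name -> vector value name
    """
    # One SPLIT per predecessor, breaking that predecessor's vector into scalars.
    splits = {
        pred: ("SPLIT", [f"{vec}.{c}" for c in range(components)], vec)
        for pred, vec in sources.items()
    }
    # One scalar phi per component, fed by the matching SPLIT outputs.
    phis = [
        ("PHI", f"{dest}.{c}", {pred: f"{vec}.{c}" for pred, vec in sources.items()})
        for c in range(components)
    ]
    # A COLLECT after the phis rebuilds the vector for later users.
    collect = ("COLLECT", dest, [f"{dest}.{c}" for c in range(components)])
    return splits, phis, collect
```

For a vec4 phi with two predecessors this yields two SPLITs, four scalar phis and one COLLECT, which is the extra bookkeeping the "wide phi" alternative would avoid.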
danvet has quit [Remote host closed the connection]
MajorBiscuit has quit [Quit: WeeChat 3.5]
jewins has quit [Ping timeout: 480 seconds]
jfalempe has quit [Remote host closed the connection]
tursulin has quit [Ping timeout: 480 seconds]
jfalempe has joined #dri-devel
toolchains has quit [Read error: Connection timed out]
jfalempe has quit [Quit: Leaving]
fxkamd has joined #dri-devel
fxkamd has quit []
toolchains has joined #dri-devel
JohnnyonFlame has joined #dri-devel
toolchains has quit [Ping timeout: 480 seconds]
toolchains has joined #dri-devel
Duke`` has joined #dri-devel
rasterman has joined #dri-devel
sul has quit [Ping timeout: 480 seconds]
sul has joined #dri-devel
toolchains has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
mbrost has joined #dri-devel
<alyssa>
NIR doesn't want to give me vector phis anyway...
<jekstrand>
Ugh... it looks like mmap of GEM bos doesn't work with offsets. I'd forgotten that. :-/
<jekstrand>
danvet, airlied: Remember why?
<alyssa>
"gl_FragColor = something ? texture2D(foo) : bar", where everything is fp16
<alyssa>
getting good code generated on Bifrost requires opt_phi_precision before lowering alu to scalar and *not* lowering phis to scalar
<alyssa>
(make gl_FragColor a 64-bit phi, in other words)
<alyssa>
admittedly the first lowering of phis to scalar happens in mesa/st! unconditionally if doubles are lowered.
<alyssa>
lowering the phi to scalar gets 4 16-bit phis
sul has quit [Read error: Connection reset by peer]
<alyssa>
but because the backend doesn't support true 16-bit (only packed 2x16 ops), each of those scalars gets padded to 32-bit
<alyssa>
so effectively the backend has to insert a pile of u2u32 and u2u16 ops
<alyssa>
whereas if we keep the phi vectorized, everything "just works"
<alyssa>
This case probably doesn't really matter
sul has joined #dri-devel
<alyssa>
but it suggests that maybe the backend should ingest vectorized phis after all.
<alyssa>
rather, "wide" phis
<alyssa>
think "a scalar 128-bit phi" instead of "vec4 phi"
<alyssa>
which can get lowered to "scalar 128-bit moves" which can then be lowered to individual MOV.i32 instructions
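As a rough illustration of that last step, a "scalar 128-bit move" can be expanded into one 32-bit move per word. This is only a sketch: the function name, the tuple encoding, and the assumption that registers are indexed in 32-bit units are hypothetical, and it ignores the ordering problems a real parallel-copy lowering has to solve when source and destination ranges overlap.

```python
def lower_wide_move(dst, src, bits):
    """Expand one wide move into MOV.i32 instructions (hypothetical encoding).

    dst, src: base register indices, counted in 32-bit registers (assumption)
    bits:     width of the wide value, must be a multiple of 32
    """
    if bits <= 0 or bits % 32 != 0:
        raise ValueError("width must be a positive multiple of 32 bits")
    # One 32-bit move per word; overlap between dst and src is NOT handled here.
    return [("MOV.i32", dst + i, src + i) for i in range(bits // 32)]
```

Under these assumptions a 128-bit phi source becomes four MOV.i32 instructions, and a 64-bit one (the fp16 vec4 gl_FragColor case) becomes two.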
frankbinns has quit [Remote host closed the connection]
ybogdano has joined #dri-devel
toolchains has joined #dri-devel
<alyssa>
A bit concerning how hard it is to *not* scalarize phis from mesa/st...
lanodan has joined #dri-devel
lanodan has quit [Quit: WeeChat 3.5]
fahien has quit [Ping timeout: 480 seconds]
toolchains has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
lanodan has joined #dri-devel
<robclark>
jekstrand: at least partly because the offset is used to map back to the gem bo
toolchains has joined #dri-devel
fxkamd has quit []
ybogdano has quit [Ping timeout: 480 seconds]
fahien has joined #dri-devel
<jekstrand>
robclark: Sure but it shouldn't have to be an exact match. We should be able to find the BO with that offset in its range.
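The kind of lookup jekstrand is asking for (find the BO whose range contains the offset, rather than requiring an exact match) can be sketched with a sorted list and a binary search. The OffsetMap class and its methods are hypothetical names for illustration only and do not describe how the kernel's offset manager is implemented.

```python
import bisect


class OffsetMap:
    """Toy map from mmap offset ranges to buffer objects (hypothetical)."""

    def __init__(self):
        self._starts = []  # sorted range start offsets
        self._nodes = []   # (start, size, bo), kept parallel to _starts

    def add(self, start, size, bo):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._nodes.insert(i, (start, size, bo))

    def lookup_containing(self, offset, length):
        """Return (bo, offset_within_bo) if [offset, offset+length) fits in one BO."""
        # Last range that starts at or before the requested offset.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start, size, bo = self._nodes[i]
        if offset + length <= start + size:
            return bo, offset - start
        return None
```

A request that starts inside a BO but runs past its end is rejected, so a sub-range mapping can never spill into a neighbouring object.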
<lygstate>
the failure is WARNING: Uploading artifacts as "archive" to coordinator... 500 Internal Server Error id=26082354 responseStatus=500 Internal Server Error status=500 token=yPaUs7Hm
<jekstrand>
jenatali: How do you "reserve" address space so you know you can safely map something there?
<jekstrand>
The first answer there provides the (racy) solution.
<jekstrand>
I mean, there's races with Unix mmap() too, but they're different.
<jenatali>
Right, I don't think there is a non-racy solution. Which sounds awful TBH
<jekstrand>
In any case, I think I now know how I'll test this monster on Windows if that's ever needed. (It won't be unless someone adds more WDDM2 API.)
<lygstate>
What's the intention of mapping to a specific address?
<jekstrand>
Wine wants to be able to so they can control where vkMapMemory() places maps with 32-bit apps running on a 64-bit Linux userspace.
<jekstrand>
But there's nothing truly Linux-specific about the proposed feature except for the fact that Windows can't implement it.
<lygstate>
alyssa: needs bot again for sparse array
<lygstate>
Random 500 Internal Server Error, what's wrong with the artifacts server
<jekstrand>
lygstate: done
JohnnyonFlame has joined #dri-devel
<milek7>
maybe VirtualAlloc2 with MEM_RESERVE_PLACEHOLDER
<milek7>
and MapViewOfFile3 with MEM_REPLACE_PLACEHOLDER
<HdkR>
jekstrand: Oh, I see you're trying to solve my problems.
<HdkR>
Well, FEX problems. Nobody can solve the rest of my problems.
libv has quit [Remote host closed the connection]
<jenatali>
milek7: Huh hadn't seen those before, look at that
<jekstrand>
HdkR: Are those your problems? I didn't know. Someone asked for something and they were nice about it and it seemed reasonable.
alyssa has left #dri-devel [#dri-devel]
ybogdano has quit [Ping timeout: 480 seconds]
<HdkR>
jekstrand: FEX runs 32-bit applications as 64-bit so solving the memory allocation problems without thieving 256TB of VA space would be nice
<jekstrand>
I'm trying
<jekstrand>
I think things are starting to settle, maybe.
<HdkR>
I'll be curious what you come up with. I've had a few ideas that all involve kernel help which I don't ever want to touch
<HdkR>
Still affects us but this probably only matters once Hangover is wired up
<HdkR>
Since I also need to make it so regular dumb mmap(nullptr, ...) doesn't hit us :)
<jekstrand>
That I can't help you with. :)
<HdkR>
I need a kernel dev to solve my woes
<jekstrand>
I pretend to be a kernel dev sometimes
<jekstrand>
But don't ask me to touch core MM
<HdkR>
Someone once recommended adding like five arch specific syscalls and I got very sad
<lygstate>
I am curious how the Vulkan driver on Windows works.
<lygstate>
Does Windows only provide a 64-bit Vulkan driver, with 32-bit apps using it too?
<HdkR>
Nah, you need both 32-bit and 64-bit for the userspace still
<milek7>
HdkR: isn't this what MAP_32BIT is for?
<HdkR>
milek7: MAP_32BIT doesn't exist on ARM, and also only covers a 2GB window within 32-bit VA space.
<jekstrand>
lygstate: Same as 32-bit on Linux. The OS knows it's talking to 32-bit userspace and gives you a 32-bit pointer.
<jekstrand>
The reason why the Wine case is so funky is it's a 32-bit Windows app on 64-bit Linux userspace on a 64-bit Linux kernel.
<HdkR>
And the reason why FEX is so funky is that it is {32,64}-bit {Linux,Windows} on 64-bit AArch64 Linux userspace on 64-bit Linux Kernel
<lygstate>
Does the d3d9 driver in Mesa have the same issue?
<jekstrand>
lygstate: Probably.
<jekstrand>
robclark: It looks like it would be pretty easy to allow sub-ranges in the higher levels. Not sure if drivers' individual mmap() implementations allow it, though.
<robclark>
jekstrand: it could probably work.. but are you trying to save virtual addr space?
<milek7>
HdkR: ah, but then couldn't you do your own bookkeeping and use MAP_FIXED?
<jekstrand>
robclark: When trying to allow, effectively, MAP_FIXED for vkMapMemory(), not letting the client map ranges at all seems pretty mean.
<HdkR>
milek7: Yes, but ioctl, shmat, and mremap could still end up above 32-bit VA space.
<jekstrand>
robclark: It's expressly designed for 32-bit apps which are way more likely to run into VA limits
<HdkR>
milek7: also uncontrolled mmap which is expected to end up in 32-bit VA space
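The "own bookkeeping" half of milek7's suggestion could be sketched as a small first-fit allocator that hands out page-aligned addresses below 4 GiB, which a caller would then pass to mmap() with MAP_FIXED. The class name, the default base address and the 4 KiB page size are assumptions made for illustration. As HdkR points out, this only covers mappings the emulator creates itself, not ones that come from ioctl, shmat, mremap or uncontrolled mmap calls.

```python
class LowVAAllocator:
    """First-fit allocator for addresses below 4 GiB (hypothetical sketch)."""

    def __init__(self, base=0x10000, limit=1 << 32, page=4096):
        self.page = page
        self.free = [(base, limit)]  # sorted, non-overlapping [start, end) ranges

    def _round_up(self, size):
        return (size + self.page - 1) // self.page * self.page

    def alloc(self, size):
        """Return a page-aligned address for a MAP_FIXED mapping, or None."""
        size = self._round_up(size)
        for i, (start, end) in enumerate(self.free):
            if end - start >= size:
                if start + size == end:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, end)
                return start
        return None

    def release(self, addr, size):
        """Return a range to the free list and coalesce neighbours."""
        self.free.append((addr, addr + self._round_up(size)))
        self.free.sort()
        merged = []
        for start, end in self.free:
            if merged and merged[-1][1] >= start:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.free = merged
```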
<clever>
i had also theorized that MAP_32BIT could solve the problems the rpi firmware has
<clever>
the old rpi firmware api's, would use a 32bit userland pointer as an opaque token passed off to the gpu
<clever>
so when the reply comes back, userland can find its own state and deal with the reply properly
<clever>
but a 64bit pointer wont fit into that 32bit field!
<clever>
MAP_32BIT would ensure the state is at a small enough addr, that you can just truncate the 64bit pointer safely
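The truncation clever describes is only safe when the upper 32 bits of the pointer are zero, which is what a low mapping would guarantee. A minimal sketch of that check, with hypothetical function names and pointers modeled as plain integers:

```python
def pointer_to_token(ptr):
    """Pack a 64-bit pointer into a 32-bit token; refuse if it would lose bits."""
    if ptr < 0 or ptr >> 32:
        raise ValueError("pointer does not fit in a 32-bit token")
    return ptr & 0xFFFFFFFF


def token_to_pointer(token):
    """Recover the pointer; the upper 32 bits are known to be zero."""
    return token & 0xFFFFFFFF
```

A pointer such as 0x7FFF0000 round-trips unchanged, while a typical 64-bit userland address like 0x7F1234567000 is rejected instead of being silently truncated.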
<clever>
but RPF ignored my idea, and instead switched everything over to v4l based api's
<clever>
which is probably a better choice in the long run
<clever>
v4l and drm
<robclark>
jekstrand: hmm, ok running 32b app in 64b process is.. interesting.. anyways, I don't see anything obvious that would prevent lifting that restriction
<clever>
robclark: there is also a rarely used ABI, where you are using 64bit opcodes, but 32bit pointers
<clever>
because some programs dont need >4gig of ram, but do want the benefit of 64bit regs
<HdkR>
clever: Wine just does a far jump to switch the operating mode of the CPU
<robclark>
hmm, other than drm/mtk.. (that I've found so far)
<HdkR>
Because ILP32 is still incompatible
<clever>
HdkR: but that feels like an x86 only trick?
<HdkR>
That's because it is
<HdkR>
ILP32 is an architecture specific feature as well
<clever>
i looked into it before, and there is also an arm version of ILP32
<clever>
32bit pointers while in aarch64 mode
<HdkR>
Which only Apple is shipping in their Watch, it's dead otherwise
<clever>
ive thought of it, because of devices like the rpi-zero2
<clever>
its 64bit capable, but only has 512mb of ram
<robclark>
jekstrand: hmm, or, hmm, I have some doubts about things that are using cma stuff.. but I guess the overlap of that and vk stuff is kinda the empty set
<clever>
doubling the pointer size serves no use, and only wastes ram
<robclark>
jekstrand: so, umm.. send patch and see what catches fire?
<jekstrand>
robclark: Yeah, I'm looking through i915 right now and I'm not sure its mmap stuff is offset-safe. :-/
toolchains has joined #dri-devel
<robclark>
i915 I skipped ;-)
<jekstrand>
Also, if you mmap an imported dma-buf, you're trusting the other driver to be offset-safe so we'd need to audit all of dma-buf. :-/
<robclark>
jekstrand: maybe add a per-drm_gem_object flag allowing sub-buffer mapping
<jekstrand>
robclark: That's really awkward to expose via Vulkan
<robclark>
so we can just opt-in on a per driver and avoid the edge cases
<robclark>
does vk really expect to be able to map imported dmabuf fd's?
<mattst88>
clever: there are a bunch of kernel security features that only work in 64-bit mode, FWIW
<clever>
mattst88: but you can always mix a 64bit kernel with a 32bit userland
<mattst88>
ah, true
<clever>
raspi-os runs an armv6 userland, but arm_64bit=1 in config.txt gives you an aarch64 kernel, but keeps the armv6 userland
nchery has quit [Read error: Connection reset by peer]
<mattst88>
I really have doubts whether 64-bit integers are incredibly valuable enough to warrant something like x32 on arm
<clever>
depends a lot on what your workload actually is
<mattst88>
at least x32 got you twice the number of registers on x86, and it was still kind of a flop
<clever>
i think aarch64 also just has more registers overall?
<mattst88>
yeah, I'm sure there are cases
<glehmann>
jekstrand: Would only allowing placed maps for a full VkDeviceMemory object make things easier?
<robclark>
arm32 is kinda going away
<mattst88>
I think both arm and aarch64 have 32 registers
<robclark>
isn't arm dropping support from big cores in future things
<clever>
robclark: i still dont have a working aarch64 bootloader in my firmware, so i'm still 32bit only
<clever>
robclark: i think i heard something about how they are moving towards EL3/EL2/EL1 being 64bit only, and only allowing 32bit in EL0 (userland)
<clever>
there is a good deal of junk in the arm core, to support control registers in both 32 and 64bit
<urja>
that's already a thing for server cores
<clever>
but userland cant touch those control regs
<clever>
so the compat stuff can just go away, while still being compatible with a 32bit userland
<urja>
the Ampere things can do 32bit for userland only
<jekstrand>
glehmann: That's what the extension I've drafted does. It has a (currently unused) feature bit for sub-ranges but I don't expect anyone to advertise it right now. If the bit isn't set, you have to use offset=0 size=VK_WHOLE_SIZE
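The rule jekstrand states here (without the sub-range feature bit, a placed map must use offset=0 and size=VK_WHOLE_SIZE) can be written as a small check. VK_WHOLE_SIZE is the Vulkan constant (~0ULL); the function name and the boolean feature parameter are hypothetical and are not the names used in the draft extension.

```python
VK_WHOLE_SIZE = 0xFFFFFFFFFFFFFFFF  # ~0ULL, as defined by Vulkan


def placed_map_range_ok(offset, size, subrange_feature):
    """Check the offset/size rule for a placed map (hypothetical helper).

    subrange_feature: whether the (currently unused) sub-range feature bit is set.
    """
    if subrange_feature:
        # Sub-ranges allowed; other validity rules are not modeled in this sketch.
        return True
    return offset == 0 and size == VK_WHOLE_SIZE
```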
<HdkR>
robclark: Yes, 32-bit is going away, X2 and A510 already killed 32-bit support, A710 is the only one with 32-bit support on latest Cortex.
rkanwal has joined #dri-devel
<glehmann>
jekstrand: ah indeed, sorry I hadn't seen your latest update to the PR
<HdkR>
Cortex-A715 is already announced to drop 32-bit entirely, so next generation chips with X3+A715+A510 won't support 32-bit at all.
<jekstrand>
glehmann: I'm switching it out for VK_WHOLE_SIZE right now to make everything clearer
kts has quit [Ping timeout: 480 seconds]
<glehmann>
tbh 99% of the time the application is going to request the full range anyway
<jekstrand>
HdkR: Yeah.... But when you're emulating x86 on Arm, you're kinda asking for it. :-P
Guest6400 has quit [Ping timeout: 480 seconds]
<HdkR>
All aboard the pain train
<HdkR>
I feel bad for the Microsoft XTA emulation devs who need to rewrite the 32-bit emulation to AArch64 there.
<HdkR>
I was lucky enough to make the choice to NOT support AArch32
gouchi has quit [Remote host closed the connection]
toolchains has joined #dri-devel
toolchains has quit [Ping timeout: 480 seconds]
<jessica_24>
hey vsyrjala: I've posted a patch for igt kms_cursor_legacy (https://patchwork.freedesktop.org/series/106740/) switching the order of the nonblocking flip and cursor ioctl. Can you take a look at it?