alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
Lyude has quit [Ping timeout: 480 seconds]
fab has quit [Quit: fab]
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
benjaminl has joined #dri-devel
jfalempe has joined #dri-devel
benjaminl has quit [Quit: WeeChat 3.8]
tursulin has joined #dri-devel
fab has joined #dri-devel
frankbinns1 has joined #dri-devel
smiles_ has quit [Ping timeout: 480 seconds]
frankbinns has joined #dri-devel
frankbinns2 has joined #dri-devel
frankbinns1 has quit [Ping timeout: 480 seconds]
frankbinns has quit [Ping timeout: 480 seconds]
<pq>
emersion, I read your reply on the KMS color pipeline thread, and I agree with everything you wrote.
pochu has joined #dri-devel
camus1 has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<emersion>
pq, sweet!
lynxeye has joined #dri-devel
smiles_ has joined #dri-devel
swalker_ has joined #dri-devel
swalker_ is now known as Guest2793
swalker__ has joined #dri-devel
Guest2793 has quit [Ping timeout: 480 seconds]
anarsoul|2 has joined #dri-devel
anarsoul has quit [Read error: No route to host]
xroumegue has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Read error: Connection reset by peer]
camus has joined #dri-devel
camus1 has quit [Remote host closed the connection]
jewins has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
pochu has quit [Quit: leaving]
rsalvaterra has quit []
rsalvaterra has joined #dri-devel
Danct12 has joined #dri-devel
pochu has joined #dri-devel
Danct12 has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
Leopold__ has quit [Remote host closed the connection]
Leopold_ has joined #dri-devel
robmur01 has quit [Remote host closed the connection]
JohnnyonFlame has joined #dri-devel
robmur01 has joined #dri-devel
pochu has quit [Read error: Connection reset by peer]
pochu has joined #dri-devel
bmodem1 has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
Piraty has quit [Remote host closed the connection]
Piraty has joined #dri-devel
Piraty has quit []
Piraty has joined #dri-devel
vliaskov has joined #dri-devel
djbw_ has quit [Read error: Connection reset by peer]
kasper93_ is now known as kasper93
FireBurn has joined #dri-devel
heat has joined #dri-devel
<FireBurn>
Would someone mind reverting 58e67bb3c131da5ee14e4842b08e53f4888dce0a? I'm hoping to avoid it getting sent to airlied and on to Linus
<zamundaaa[m]>
Is there a way to import an EGL fence?
<zamundaaa[m]>
I'm trying to blit a texture from one GPU to another, and with NVIDIA that causes artifacts because of the lack of synchronization. Ideally I'd create an EGL fence on the source GPU, and have the other GPU wait before doing the blit with eglWaitSync, but I haven't found a way to actually get a fence for this on the destination GPU
<emersion>
look at weston maybe
FireBurn has quit [Ping timeout: 480 seconds]
<pq>
EGL_ANDROID_native_fence_sync might be the key
<zamundaaa[m]>
ah, so the fd is passed in as an attribute. Thanks!
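[editor's note: a minimal sketch of the EGL_ANDROID_native_fence_sync flow pq is pointing at, assuming two initialized EGLDisplays that both expose the extension, a current context on the source GPU, and already-fetched extension entry points; not runnable standalone, and names like `sync_blit` are illustrative]

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Extension entry points, fetched via eglGetProcAddress() elsewhere. */
static PFNEGLCREATESYNCKHRPROC eglCreateSyncKHR_;
static PFNEGLDUPNATIVEFENCEFDANDROIDPROC eglDupNativeFenceFDANDROID_;
static PFNEGLWAITSYNCKHRPROC eglWaitSyncKHR_;
static PFNEGLDESTROYSYNCKHRPROC eglDestroySyncKHR_;

static void
sync_blit(EGLDisplay src_dpy, EGLDisplay dst_dpy)
{
   /* 1. Create a native-fence sync on the source GPU and export its fd.
    * The spec requires a flush on the source context before the fd
    * becomes valid (until then the dup returns
    * EGL_NO_NATIVE_FENCE_FD_ANDROID). */
   EGLSyncKHR src_sync = eglCreateSyncKHR_(src_dpy,
                                           EGL_SYNC_NATIVE_FENCE_ANDROID,
                                           NULL);
   int fd = eglDupNativeFenceFDANDROID_(src_dpy, src_sync);

   /* 2. Import the fd on the destination GPU: the fd is passed in as an
    * attribute, and ownership of it transfers to EGL on success. */
   const EGLint attribs[] = {
      EGL_SYNC_NATIVE_FENCE_FD_ANDROID, fd,
      EGL_NONE,
   };
   EGLSyncKHR dst_sync = eglCreateSyncKHR_(dst_dpy,
                                           EGL_SYNC_NATIVE_FENCE_ANDROID,
                                           attribs);

   /* 3. Make the destination GPU wait on the fence before the blit. */
   eglWaitSyncKHR_(dst_dpy, dst_sync, 0);
   /* ... issue the cross-GPU blit here ... */

   eglDestroySyncKHR_(src_dpy, src_sync);
   eglDestroySyncKHR_(dst_dpy, dst_sync);
}
```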
<ickle>
win 16
jewins has joined #dri-devel
<emersion>
sad that there's no drm_syncobj love
f11f12 has joined #dri-devel
alyssa has joined #dri-devel
<alyssa>
gfxstrand: ok, I have something typed out to kill off abs/neg/fsat modifiers without requiring any nontrivial changes to backends
<alyssa>
(in particular, it does not require the backend to have working copyprop or dead code elimination)
<alyssa>
I hate it, but more than that I hate that we have backends that don't have DCE
<alyssa>
and, it means we actually have a chance of killing them off
<gfxstrand>
:sob
<alyssa>
so, probably worth the stupid
<alyssa>
the usual strategy--
<alyssa>
ahead-of-time trivialize pass that inserts copies to ensure fabs/fneg/fsat are folded 100% of the time,
<alyssa>
helpers to chase through fabs/fneg/fsat at backend isel time,
<alyssa>
and a guarantee to backends that fabs/fneg/fsat will be chased 100% of the time so they just need to not emit any code for them
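[editor's note: a toy model of the "chase through fabs/fneg at isel time" helper alyssa describes — chasing from the outermost modifier inward and folding each one into abs/neg flags so the modifier instructions themselves emit no code; the types and names here are hypothetical, not NIR's]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_FABS, OP_FNEG, OP_LOAD };

struct instr {
   enum op op;
   struct instr *src; /* NULL for OP_LOAD */
};

struct folded_src {
   struct instr *def; /* the underlying non-modifier def */
   bool abs, neg;
};

/* Walk outer-to-inner through modifier instructions, accumulating them
 * as source-modifier flags on the underlying def. */
static struct folded_src
chase_modifiers(struct instr *i)
{
   struct folded_src s = { .def = i, .abs = false, .neg = false };
   while (s.def->op == OP_FABS || s.def->op == OP_FNEG) {
      if (s.def->op == OP_FABS) {
         s.abs = true;          /* fabs outside absorbs any inner fneg */
      } else if (!s.abs) {
         s.neg = !s.neg;        /* fneg(fneg(x)) == x */
      }
      s.def = s.def->src;
   }
   return s;
}
```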
<gfxstrand>
Running HSW now
<gfxstrand>
Let's see how bad the damage is.
fab has quit [Remote host closed the connection]
<alyssa>
from the nir_register changes?
<alyssa>
(Intel doesn't use lower_to_source_mods anymore so thankfully it's spared of this particular abomination)
<alyssa>
only uses are ntt, etnaviv, a2xx, lima, and r600/sfn
<alyssa>
I am not volunteering to rewrite people's compilers
<alyssa>
so.. this is the consolation prize
<gfxstrand>
I'm more worried about vec-to-reg
<alyssa>
nod
<alyssa>
midgard seems happy with it
<gfxstrand>
Okay, ptn bug fixed.
swalker__ has quit [Remote host closed the connection]
<pendingchaos>
besides those, it's just a Vulkan driver
<alyssa>
gfxstrand: also, pushed nir/legacy-mods, it has your pushed fix squashed in though not the unpushed ptn fix
<hramrach>
but that documents some driver settings, not what hardware it supports
<pendingchaos>
RADV should support all AMD GPUs supporting Vulkan
<hramrach>
the moment they are released?
<pendingchaos>
there might be some delay (both because of release schedules and development effort) depending on how different the new GPU is from predecessors
<pendingchaos>
gfx1100 and gfx1101 for example, should be basically the same
<pendingchaos>
gfx1030 and gfx1100 had significant differences
<hramrach>
so how do I tell when a GPU has aged enough to be supported?
<llyyr>
rdna3 was supported pre-release already
<llyyr>
generally stuff should work on release, but it might be buggy and that gets sorted out over time
<hramrach>
That would be a nice improvement since the time that table is from
<pendingchaos>
I don't think there's any official list of RADV hardware support, so you can't easily tell
<llyyr>
radv supports all GCN/RDNA cards
<pendingchaos>
I think usually phoronix and such will release an article when a generation of gpus is supported
<llyyr>
so from hd 7000 series up to rx 7xxx
fab has joined #dri-devel
<hramrach>
yes, phoronix would probably have that
<pendingchaos>
ah, release notes also have new hardware support
<alyssa>
the whole SM5 shift mess is, as usual, a mess
<alyssa>
The obviously alternative is changing ubfe_imm to only produce ubfe if lower_bitfield_extract is set, otherwise, ubitfield_extract is produced
<alyssa>
s/obviously/obvious/
<alyssa>
It's unclear to me if that's better or worse
<alyssa>
For the _imm case, ubfe and ubitfield_extract are interchangeable (since we can just mask the immediate at build time)
<alyssa>
(or better yet, assert the immediate < 32)
<alyssa>
Hmm.. maybe I should do that actually
<alyssa>
pendingchaos: thoughts on ^^?
<alyssa>
I once again wonder if the default really should be khronos behaviour and _sm5 suffixed ops do the masked thing... meh
<alyssa>
would like to kick that can down the road again though.. I just want ubfe_imm or equivalent for agx
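[editor's note: a toy model of the ubfe_imm semantics under discussion — for immediate operands the SM5-style masked ubfe and the unmasked ubitfield_extract agree, as long as the builder masks the immediate at build time or, as suggested, just asserts it is < 32; the helper name is illustrative, not NIR's]

```c
#include <assert.h>
#include <stdint.h>

/* Extract `bits` bits of `value` starting at `offset`, asserting that
 * the immediates are already in range (the "assert the immediate < 32"
 * option), so the masked and unmasked opcodes are interchangeable. */
static uint32_t
ubfe_imm(uint32_t value, unsigned offset, unsigned bits)
{
   assert(offset < 32 && bits <= 32);
   if (bits == 0)
      return 0;
   if (bits == 32)
      return value >> offset; /* avoid the undefined 32-bit shift */
   return (value >> offset) & ((1u << bits) - 1);
}
```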
Kayden has quit [Quit: -> JF]
benjaminl has quit [Ping timeout: 480 seconds]
Haaninjo has joined #dri-devel
<jenatali>
🤷‍♂️ I looked at it but I don't really have any strong opinions on the matter
digetx is now known as Guest2834
digetx has joined #dri-devel
<alyssa>
valid
<alyssa>
maybe pendingchaos does
benjaminl has joined #dri-devel
Guest2834 has quit [Ping timeout: 480 seconds]
Kayden has joined #dri-devel
<pendingchaos>
I think building ubfe/ubitfield_extract depending on lower_bitfield_extract and using a unified helper makes sense, but having two helpers doesn't sound like a real problem
<alyssa>
sure
<alyssa>
let me know the preferred bikeshed colour and I'll paint it
kts has quit [Quit: Konversation terminated!]
rasterman has quit [Quit: Gettin' stinky!]
benjaminl has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
<alyssa>
airlied: when you have a few minutes could I pick your brain about AoS/SoA gallivm?
vliaskov has quit []
<alyssa>
usually don't like to "ask to ask" but I don't yet have a coherent question formulated
gio_ has joined #dri-devel
<airlied>
alyssa: my brain has defeated you by purging AoS/SoA knowledge, right down to which one is which
<airlied>
but yes ask and pick away
gio has quit [Read error: Connection reset by peer]
<airlied>
alyssa: also sampling AoS/SoA is slightly different to the AoS/SoA execution model
<airlied>
by default we use soa execution and mostly soa sampling but sometimes sampling goes to aos mode
<airlied>
for one narrow use case we use aos execution
kts has quit [Quit: Konversation terminated!]
<alyssa>
oh boy
<alyssa>
airlied: The basic question I have is that load_reg/store_reg take arrays of LLVMValueRefs
<alyssa>
instead of just a single LLVMValueRef for the whole vector
kts has joined #dri-devel
benjaminl has joined #dri-devel
<alyssa>
it seems in the AoS path only the [0] component is used
<alyssa>
but in the SoA path every component is used separately
<alyssa>
I guess "AoS" is like vec4 gpus and "SoA" is like scalar GPUs?
<alyssa>
in that case, why would gallivm even see vectorized NIR in the first place?
<alyssa>
why not scalarize completely in NIR, so we only need the single LLVMValueRef (corresponding to either the one component or the whole vector)?
<airlied>
probably because the core code was originally TGSI designed and TGSI is vec4
<airlied>
so it just kept doing that when I ported it to NIR, and handled vectors
<airlied>
but it doesn't really correspond to GPUs that well
<alyssa>
OK
<airlied>
in SoA mode it stores 4/8-wide scalars
<airlied>
so a vector in SoA mode is just a set of vec-len scalars each of which is 4/8 channels wide
<airlied>
depending on avx etc
<alyssa>
yes, that's how scalar GPUs work
<airlied>
oh my scalar gpus have uniform regs which llvmpipe doesn't :-P
<alyssa>
mine don't
<airlied>
AoS is a special case for storing 16-wide chars
<airlied>
so that you can process 4 8-bit RGBA pixels in one go
<airlied>
it's very limited in scope in what you can do
<airlied>
it's just to provide a fast path for blits and copies
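[editor's note: an illustrative sketch of the two layouts airlied describes, for 4 RGBA8 pixels — AoS keeps whole pixels interleaved in one 16-byte vector (what the fast blit/copy path processes in one go), while SoA keeps one array per channel, so each "scalar" is really a 4-wide vector of that channel; the struct names are hypothetical]

```c
#include <stdint.h>

#define N 4 /* pixels */

/* AoS: RGBA RGBA RGBA RGBA — one 16-wide vector of chars. */
struct aos { uint8_t px[N * 4]; };

/* SoA: RRRR GGGG BBBB AAAA — one vec-len set of channel vectors. */
struct soa { uint8_t r[N], g[N], b[N], a[N]; };

/* Deinterleave: gather each channel of every pixel into its own array. */
static struct soa
aos_to_soa(const struct aos *in)
{
   struct soa out;
   for (int i = 0; i < N; i++) {
      out.r[i] = in->px[i * 4 + 0];
      out.g[i] = in->px[i * 4 + 1];
      out.b[i] = in->px[i * 4 + 2];
      out.a[i] = in->px[i * 4 + 3];
   }
   return out;
}
```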
<alyssa>
Right, ok
<airlied>
so yes we could probably scalarize completely in NIR for the aos case, but the TGSI code still exists
<alyssa>
OK
<alyssa>
mostly i'm trying to understand why assign_dest (for example) takes an array of valuerefs instead of just one
<alyssa>
but you're saying that's just TGSI legacy?
<airlied>
what one value ref would it take?
<airlied>
if dest has 4 components
<airlied>
you can't do vectors of arrays of values in llvm IR
<alyssa>
why would you ever have that, though?
<airlied>
because we haven't scalarised 4 component stores
<alyssa>
ooh
<airlied>
though maybe in practice we have
<alyssa>
like, store_ssbo?
<airlied>
I think the main use cases are the vec4 type constructors
<airlied>
nir_vec4 etc
<alyssa>
right..
<airlied>
where you have one ssa value that is a vector of scalars but the scalars are 8-wide arrays
<alyssa>
maybe I'm objecting to the "Loop for R,G,B,A channels" in the SoA case in visit_alu
ngcortes has quit [Ping timeout: 480 seconds]
<alyssa>
not really interested in reworking this. just trying to figure out what to do for my NIR rework
<alyssa>
and today is llvmpipe
<alyssa>
day
fab has joined #dri-devel
tobiasjakobi has joined #dri-devel
benjaminl has quit [Ping timeout: 480 seconds]
<airlied>
yeah so we do all the operations once on each component of the vector, then collect the results, then store them back as an array
tobiasjakobi has quit []
<airlied>
I just didn't see the value for register stores of sticking them into an LLVM array
<airlied>
just to pull them back out again
<airlied>
since register stores actually go to memory, as opposed to just hiding inside the ssa value hash table
<alyssa>
why are there multiple components on the vector?
<alyssa>
aren't we calling lower_alu_to_scalar in the SoA case?
<alyssa>
I guess we aren't
<alyssa>
we should be, I guess
<airlied>
lavapipe does, not sure llvmpipe does
<airlied>
probably a cleanup possible there
<alyssa>
doesn't look like it does
<alyssa>
yeah.. not today's cleanup though
<alyssa>
currently defeaturing nir_register from llvmpipe
<airlied>
there's probably quite a lot of llvm side stuff that could be moved to NIR side
<alyssa>
Yeah
<airlied>
it's mostly a legacy of TGSI and whatever state nir was in when I wrote it
<alyssa>
piles of the graphics pipeline emulation code could be common NIR passes too I think
<alyssa>
llvmpipe using nir_lower_blend anyone? ;-D
<airlied>
oh that stuff is so finely hand written
<alyssa>
D=
<airlied>
I fear to tread in the blending pipeline, so many hand coded swizzle calls that I don't really understand
<gfxstrand>
That sounds like a good argument for NIRifying
<alyssa>
:crab_fire:
<airlied>
it would be, but I doubt it would get as fast
<airlied>
since it's mostly hand writing LLVM IR to optimise things
<airlied>
not sure translating NIR would achieve the same level, since NIR doesn't have a view into the LLVM 4-8 wide fun
<airlied>
alyssa: I think the other reason we don't scalarise in NIR is that with the soa/aos decision there might not be a simple point to do it
* airlied
has to wear noise cancelling headphones to compile llvm or use the preprod navi33 card I have
<alyssa>
Lool
<HdkR>
Sounds about right for most rack-mount things :D
<karolherbst>
airlied: that reminds me...
ngcortes has joined #dri-devel
<karolherbst>
btw, how did you end up connecting that one to your system?
sima has quit [Ping timeout: 480 seconds]
<airlied>
alyssa: seems to be about what I'd expect
<airlied>
karolherbst: I ended up turning my machine on its side, putting it on a cardboard box, and when I put that card in I stick another piece of cardboard box between it and the PSU to ensure it is supported
<alyssa>
airlied: =D
<karolherbst>
oof
<airlied>
I really should get a PCIe extender so I can put it flat on something
<karolherbst>
yeah.. I planned to use a PCIe extender as well...
<alyssa>
well, in that case I need help since IIRC iris doesn't build on arm
* alyssa
tries anyway in case that was fixed
tintou has joined #dri-devel
go4godvin has joined #dri-devel
go4godvin is now known as Guest2915
pinchart1 is now known as pinchartl
<alyssa>
oh, it does, cool
<alyssa>
where's my drm-shim though
<alyssa>
iris doesn't have drm-shim? :(
<alyssa>
intel_stub_gpu. right
LinuxHackerman has joined #dri-devel
<alyssa>
OK, reproduced
<alyssa>
Ohhhh
<alyssa>
Lol
<alyssa>
OK
<alyssa>
I see what happened
fkassabri[m] has joined #dri-devel
<alyssa>
whoopsies
<alyssa>
today's edition of "stupid spot the bug"
<alyssa>
and fixed
<alyssa>
well test still crashes for me because of arb_fragment_shader_interlock-image-load-store: ../src/intel/isl/isl_tiled_memcpy.c:609: choose_copy_function: Assertion `!"" "ISL_MEMCOPY_STREAMING_LOAD requires sse4.1"' failed.
<HdkR>
A bit difficult to get SSE4.1 on your Macbook
<alyssa>
Little bit yeah
<gfxstrand>
hehe... Yeah....
<gfxstrand>
I thought we had a non-SSE path
<alyssa>
gfxstrand: you do, but iris was specifically asking for streaming
<gfxstrand>
Ah, yes...
<gfxstrand>
Because it can
<gfxstrand>
Because it only runs BDW+ which is always paired with a CPU that supports SSE4.1
<gfxstrand>
Unless that GPU is an Arc in which case it could be plugged into a Raspberry Pi for all you know.
<alyssa>
Yep
<alyssa>
Well, not a raspberry pi I don't think
<alyssa>
low to mid-tier arm doesn't work with dGPUs usually
<HdkR>
Probably more a SolidRun Honeycomb or Ampere eMAG
<alyssa>
yeah
<alyssa>
server grade arm64 + dGPU
YaLTeR[m] has joined #dri-devel
<DemiMarie>
Are there any mid-grade Arm64 chips?
<jenatali>
Oof, that's a fun bug. Glad there was a test that caught it, though I'm surprised there was only one failure
<DemiMarie>
mid-grade = desktop PC class
<gfxstrand>
Apple
<gfxstrand>
Otherwise, not that I'm aware of.
halfline[m] has joined #dri-devel
Piraty has quit [Remote host closed the connection]
<DemiMarie>
Is that likely to ever change?
Piraty has joined #dri-devel
<jenatali>
Some of QC's higher end chips are approaching that IMO
Hi-Angel has joined #dri-devel
<DemiMarie>
gfxstrand: I’m a little bit salty about Apple having so many non-standard SMMUs. Means that Xen support for Apple Silicon is unlikely to ever happen.
hch12907 has joined #dri-devel
calebccff_ is now known as calebccff
<alyssa>
gfxstrand: Hmm?
<alyssa>
oh I see
<alyssa>
ok, zink is converted too
<alyssa>
I think that's enough backends for proving the design is sensible
Haaninjo has quit [Quit: Ex-Chat]
<alyssa>
i'm off for the night then
T_UNIX has joined #dri-devel
<alyssa>
pretty steady progress though
<alyssa>
getting close to taking the Draft status off, so that's exciting
<alyssa>
for me
<alyssa>
not exciting if you were ignoring it and will soon have to convert your backends :~P
<gfxstrand>
Or rather your coalesce_swizzle thing isn't quite as good for some reason.
<alyssa>
gfxstrand: I'd believe it
JohnnyonF has quit [Ping timeout: 480 seconds]
<gfxstrand>
alyssa: So the big difference as far as I can tell is that try_coalesce in lower_vec_to_movs puts the register write directly in the ALU op that generates the swizzle source. In a store_reg world, that would mean placing a store_reg immediately after.
<gfxstrand>
alyssa: Whereas in lower_vec_to_regs, you insert the store_reg at the vec location and then eliminate the swizzling mov, leaving the store_reg as-is.
<gfxstrand>
So the store_reg ends up living at the vec location.
<alyssa>
=> extra moves because that store isn't trivial
<alyssa>
?
<alyssa>
s/isn't/may not be/
<gfxstrand>
I'm not following
<alyssa>
I may not be either
<alyssa>
The reason the placement matters is presumably because putting the store_reg too late will cause nir_trivialize_registers to insert a move that won't be coalesced?
<gfxstrand>
No
<gfxstrand>
It's because, thanks to SSA, the coalescing that happens in try_coalesce works across blocks.
<gfxstrand>
It doesn't matter if the fdp4 or whatever it is happens to be 17 blocks away, if the vec is the only user, we can re-swizzle it and write the register as part of the fdp4.
<alyssa>
haswell supports control flow????????
<gfxstrand>
Yes, sadly.
<gfxstrand>
:P
<alyssa>
we dont talk about broadwell, no no no
<gfxstrand>
By contrast, when you emit the store_reg at the location of the vec and then try to coalesce later, the problem is much harder because you're moving a store_reg with insufficient information.
<gfxstrand>
Well, you have enough information
<gfxstrand>
It's possible
<gfxstrand>
Each component is written exactly once
<gfxstrand>
But it's a lot harder than when we're doing it in try_coalesce and the value we're dealing with is SSA.
* alyssa
is trying to page in enough details of the passes for this to make sense
<alyssa>
gfxstrand: I'm still not following why it matters where the store_reg instruction is placed
<alyssa>
except I guess because trivialize_registers inserts extra moves since it doesn't see across bblock boundaries
<gfxstrand>
It matters because back-end vec4 copy-prop and register coalesce suck
<alyssa>
Oh, well, yes
<idr>
Understatement of the year...
<alyssa>
gfxstrand: I can try to reintroduce try_coalesce instead of the 2 pass thing
<alyssa>
tomorrow, I mean. it's past working hours now; I just saw an interesting problem
<gfxstrand>
Yeah
<gfxstrand>
That's fine
<alyssa>
would appreciate if you can send me a small affected shader that I can play with
<alyssa>
but if not I can probably construct something
<alyssa>
I don't really remember why I did the 2 pass thing
<gfxstrand>
alyssa: It's sitting in your e-mail
<alyssa>
thanks!
<alyssa>
it appears I may have texted you the reason weeks ago but had disappearing messages on
<alyssa>
couldn't have been that important (-:
paulk has joined #dri-devel
paulk-bis has quit [Read error: Connection reset by peer]