<danvet>
some ack from vk people would be good for this I think
vliaskov has joined #dri-devel
<HdkR>
Alright cool, looks like the msm change will just work since I was already passing through padding, syncobj_wait is also a passthrough with correct arrangement, and...I'm not even watching the sync_file uapi so if it breaks, wheh
lynxeye has joined #dri-devel
apinheiro has joined #dri-devel
ice99 has joined #dri-devel
Zopolis4 has joined #dri-devel
pochu has joined #dri-devel
tzimmermann has quit [Remote host closed the connection]
<HdkR>
I should probably one day take a scroll through ioctls that are registering compat handlers
fab_ has joined #dri-devel
fab_ is now known as Guest9006
Haaninjo has joined #dri-devel
<danvet>
HdkR, anything new that requires compat handler is kinda a misdesign, we really shouldn't need these anymore
<danvet>
unfortunately android having been 32bit only for a while butchered a few :-/
<danvet>
but in drm I don't think we've screwed up since years
fab has quit [Ping timeout: 480 seconds]
<HdkR>
There's been a decent number of mess ups with tail padding on the structs
<HdkR>
Recent too
<danvet>
HdkR, care to supply an ack on-list?
<danvet>
HdkR, hm for arrays and stuff?
<danvet>
drm_ioctl() auto-corrects padding both ways, so unless you don't memset the entire struct or something it should work out
<HdkR>
nah, just ending the struct with a uint32_t and the tail alignment ends with 32-bit or 64-bit aligned depending on bitness
<HdkR>
Is the most common thing I've noticed
<danvet>
yeah I think we screw that sometimes, but it's also shouldn't hurt (unless it's an array)
<HdkR>
Changes the ioctl number which flags my CI :P
<danvet>
oh for secomp filtering and stuff like that?
<HdkR>
I didn't even think about seccomp filtering. That...might be interesting
<danvet>
sometimes we need to patch it up like DMA_BUF_SET_NAME_A/B
<HdkR>
But since FEX is translating 32-bit ioctls to 64-bit ioctls, any change in ioctl size is a flag for bad smell
<danvet>
but drm_ioctl handling padding either way absorbs these fully
Danct12 has quit [Quit: WeeChat 3.8]
tzimmermann has joined #dri-devel
<danvet>
HdkR, ah yeah if you write your own compat code then this sucks :-(
<HdkR>
Indeed!
<danvet>
just wondering whether a makefile check for uapi header structs would be doable
<danvet>
would need a ton of annotations for all the current ones that are busted though
<HdkR>
I have a ton of annotations specifically because of this. Mostly focused around drm, and I'm /definitely/ missing some
Ahuj has joined #dri-devel
<HdkR>
Then a clang script that compares member alignment, size, offset, etc between architectures
rasterman has joined #dri-devel
<danvet>
HdkR, maybe we want gitlab first to make sure it's actually used, but this sounds like something that would be really useful in upstream
<danvet>
maybe opt-in for drm only at first
<HdkR>
I will love if this could get maintained upstream at some point so I don't need to constantly glare at the ML for uapi changes :P
<HdkR>
Doing it this way since I don't annotate the struct definitions directly so I can use unmodified headers
<HdkR>
Also needs to handle opaque ioctl definitions that don't have the struct information declared in the `_IO_RW`, which is really annoying to validate
Guest9009 has quit [Ping timeout: 480 seconds]
jdavies has joined #dri-devel
jdavies is now known as Guest9010
Guest9010 has quit [Ping timeout: 480 seconds]
vliaskov has quit [Ping timeout: 480 seconds]
gio has joined #dri-devel
vliaskov has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
nimrod456 has joined #dri-devel
nimrod456 has quit [Remote host closed the connection]
fe_sch[m] has joined #dri-devel
frieder has joined #dri-devel
agneli_ has joined #dri-devel
frieder has quit [Remote host closed the connection]
frieder has joined #dri-devel
agneli has quit [Ping timeout: 480 seconds]
srslypascal has quit [Ping timeout: 480 seconds]
heat has joined #dri-devel
srslypascal has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
frieder has quit [Ping timeout: 480 seconds]
srslypascal is now known as Guest9016
srslypascal has joined #dri-devel
jaganteki has joined #dri-devel
apinheiro has joined #dri-devel
frieder has joined #dri-devel
Guest9016 has quit [Ping timeout: 480 seconds]
yuq825 has left #dri-devel [#dri-devel]
devilhorns has joined #dri-devel
tango_ has joined #dri-devel
vals_ has quit [Ping timeout: 480 seconds]
thenemesis has joined #dri-devel
srslypascal has quit [Ping timeout: 480 seconds]
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
srslypascal has joined #dri-devel
kxkamil has quit []
angerctl is now known as Namarrgon
jaganteki has quit [Remote host closed the connection]
thenemesis has quit []
thenemesis has joined #dri-devel
frieder has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
thenemesis has quit [Ping timeout: 480 seconds]
agneli has joined #dri-devel
agneli_ has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
devilhorns has quit []
Ahuj has quit [Ping timeout: 480 seconds]
gio has quit [Ping timeout: 480 seconds]
gio has joined #dri-devel
mohamexiety has joined #dri-devel
dtmrzgl has quit [Ping timeout: 480 seconds]
rosefromthedead has joined #dri-devel
fxkamd has joined #dri-devel
Dr_Who has joined #dri-devel
<eric_engestrom>
DavidHeidelberg[m]: not sure what you're trying to do with the CI compiler wrapper, but yes, right now it's there to pass `-Werror` because meson ignores `-werror=true` (which IMO is a meson bug), and in my WIP MR I'm also adding `--fatal-warnings` in that wrapper (arguable whether meson should do that, so we might have to keep the wrapper indefinitely if meson doesn't accept that)
<eric_engestrom>
DavidHeidelberg[m]: but for `clang` vs `clang-15`, you can trivially add a new 2-lines wrapper that includes the version number in the executable name
bmodem1 has joined #dri-devel
<DavidHeidelberg[m]>
My approach was to remove dash and number; ofc I can, I'm just thinking about how complex and ugly it looks :(
kts has quit [Ping timeout: 480 seconds]
jaganteki has joined #dri-devel
<DavidHeidelberg[m]>
wish meson would fix that behaviour to get along with our needs rather than this hacking around
* eric_engestrom
nods
<eric_engestrom>
the whole "`clang` something has a version number in its executable name" is a mess, and the way meson deals with it is another mess :/
<eric_engestrom>
s/something/sometimes/
<MrCooper>
eric_engestrom: FWIW, -Wl,--fatal-warnings could be added to c{,pp}_link_args, unless that breaks some meson configure checks as well?
<eric_engestrom>
MrCooper: you're right, and that's what I was doing at first actually
<eric_engestrom>
now I'm wondering why I changed this to use the wrapper instead
bmodem has quit [Ping timeout: 480 seconds]
<eric_engestrom>
but very good point, so the wrapper is only needed until meson's `--werror=true` enables `-Werror`
<MrCooper>
right
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
vandemar has joined #dri-devel
elongbug has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
<vandemar>
I keep getting segfaults in a game, after anywhere from a few minutes to 1-2 hours, running on a radeon 6700, in mesa 23.0.0 amdgpu_cs_add_buffer:674, more details https://pastebin.com/Zkup7jaV
<vandemar>
Is there anything else I should investigate, or should I file a bug, or is it a known issue? I didn't see any recent similar looking reports of segfaults, but I didn't look that far back.
<pepp>
vandemar: please file a bug
dsrt^ has quit [Remote host closed the connection]
<eric_engestrom>
MrCooper, DavidHeidelberg[m]: MR updated (although it is still blocked by the bug of telling the linker we export symbols that don't exist depending on which driver we build), but it made me wonder: why to we do `-D c_args="$(echo -n $C_ARGS)"` instead of `-D c_args="$C_ARGS"`? I don't think duplicate whitespaces are an issue, if that's what this was trying to remove
<pepp>
vandemar: identifying which call to radeon_add_to_buffer_list() from si_emit_draw_packets the lead to the crash would help
<MrCooper>
eric_engestrom: it's to get rid of the newlines
<eric_engestrom>
if we're using `>` there shouldn't be any newline?
srslypascal has quit [Quit: Leaving]
<eric_engestrom>
in yaml I mean
<eric_engestrom>
oh, perhaps a trailing newline, maybe we should have `>-`
<MrCooper>
maybe, can't remember the details
aravind has quit [Remote host closed the connection]
aravind has joined #dri-devel
<zmike>
mareko: for KHR-GL46.gpu_shader_fp64.fp64.state_query doesn't this mean the TES uniforms are also not referenced since the GS isn't reading gl_Position?
srslypascal has joined #dri-devel
<vandemar>
pepp: ok. it's line 1523 (23.0.0) in si_emit_draw_packets, in the if(index_size) branch
<pepp>
vandemar: can you print the indexbuf variable with gdb?
<vandemar>
no locals? should there be?
Guest9006 has quit []
bmodem has joined #dri-devel
kts has joined #dri-devel
frieder has quit [Remote host closed the connection]
srslypascal has quit [Ping timeout: 480 seconds]
srslypascal has joined #dri-devel
bmodem1 has quit [Ping timeout: 480 seconds]
Zopolis4 has quit []
aravind has quit [Ping timeout: 480 seconds]
aravind has joined #dri-devel
fab has joined #dri-devel
srslypascal is now known as Guest9029
kzd has joined #dri-devel
srslypascal has joined #dri-devel
gawin has joined #dri-devel
Guest9029 has quit [Ping timeout: 480 seconds]
<gfxstrand>
robclark: For your 64+32 loads, is the 32-bit offset signed or unsigned?
<robclark>
gfxstrand: IIRC signed in most (all?) cases
kts_ has joined #dri-devel
kts_ has quit []
kts has quit [Ping timeout: 480 seconds]
<gfxstrand>
robclark: Ok, so at least we have consistency there.
<gfxstrand>
etnaviv is 32+32 so no signeness issues there.
<robclark>
I think the harder thing is that there are several different encodings we can pick (ie. one option has a small const offset, but we can optionally left-shift the variable src by a PoT.. or we can choose a different option w/ larger const offset, etc)
<gfxstrand>
For NV, we can just re-materialize the add if offset > INT32_MAX
kts has joined #dri-devel
<gfxstrand>
I'm more concerned about hardware that can do non-const offsets.
<robclark>
hmm, I think we'd prefer the offset to _not_ be 64b (but happy for nir not to lower to this form if offset is too big)
<gfxstrand>
Because we can't know whether or not the offset is negative then
<robclark>
in practice I don't think you'll see many >4GB buffers ;-)
<gfxstrand>
Oh, the plan is for the offset to be 32bit
<gfxstrand>
always
<robclark>
+1
frieder has joined #dri-devel
<gfxstrand>
The question is how sign-extension is handled on the 32-bit part
<robclark>
hmm
<robclark>
I think I'd have to play with that
<gfxstrand>
NVIDIA only has const offsets and they're signed. So if it's unsigned, we just check for <= INT32_MAX and re-mat the add if it would be negative.
<gfxstrand>
For Etnaviv, it's 32+32 so there is no zero- or sign-extension
<gfxstrand>
It's freedreno where we're going to have a problem if we get it wrong.
bluetail30 has quit [Ping timeout: 480 seconds]
<gfxstrand>
For NIR internals, I think I'd like it to be unsigned because then it matches nir_address_format_64bit_global_32bit_offset.
<gfxstrand>
But NIR can be changed, hardware can't.
<gfxstrand>
It also depends a bit on how we want to deal with offsets. Array indices are typically signed so for an OpenCL-like case where you have a pointer that you're treating as an array, you may have a negative offset.
<robclark>
I'm going to hope/assume that hw handles negative offset in a sane way, but will need to do some experiments
bluetail30 has joined #dri-devel
<gfxstrand>
k
aravind has quit [Ping timeout: 480 seconds]
rosefromthedead has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
frieder has quit [Remote host closed the connection]
frieder has joined #dri-devel
soreau has quit [Quit: Leaving]
ice99 has quit [Ping timeout: 480 seconds]
<mareko>
zmike: yes
bgs has joined #dri-devel
<alyssa>
HdkR: wouldn't it be less work to implement 32-bit libGL/libVK thunking and call it a day?
<eric_engestrom>
(marcan's suggestion of looping over the devices once and picking the driver based on that, instead of going through all the devices for all the drivers until one loads)
stuarts has joined #dri-devel
<robclark>
gfxstrand: ok, so the variable src offset is unsigned 32b
<gfxstrand>
Venemo: What does AMD have for offsets on load/store_global?
<gfxstrand>
robclark: Okay...
<gfxstrand>
robclark: But some of the constant ones are signed?
<robclark>
I _think_ so.. they are also a lot smaller than 32b
<robclark>
or at least we have signed offsets in some other places
<alyssa>
gfxstrand: for AGX, 64+32 either signed or unsigned, but the offset is implicitly multiplied by the element size (4 unless you're doing 8/16-bit loads)
<alyssa>
agx_nir_lower_address sorts out the mess, not planning to use common code becuase that last offset multiply thing is off the deep ened
Company has joined #dri-devel
gouchi has quit [Quit: Quitte]
<Venemo>
gfxstrand: I think we have a vector address, scalar address and a constant address, but pendingchaos can correct me if I'm wrong
frieder has quit [Remote host closed the connection]
<robclark>
gfxstrand: yeah, immed offsets encoded in instruction are signed
<robclark>
and yeah, offset is in units of element size as well
<gfxstrand>
robclark: Are your non-constant offsets in element size units?
<robclark>
yeah
<gfxstrand>
bugger. Okay.
<gfxstrand>
So now we have Immaginappledreno?!?
<robclark>
yeah, I don't think unaligned access is a thing ;-)
<gfxstrand>
I fear I may be starting to agree with Venemo that we want to just add a signed 32-bit offset const index to load/store_global.
<gfxstrand>
nir_lower_mem_access_bit_sizes is going to hate me for it.
<gfxstrand>
Or maybe we just assert offset == 0 in that pass
<robclark>
hmm, I was kinda expecting that lower-derefs could give us a 32b offset that driver can consume..
<gfxstrand>
Yeah, there's some debate to be had about where the best place to do that offsetting is
<robclark>
if you think about array+struct dereferencing you'd design an instruction exactly like what I have ;-)
<gfxstrand>
Right up until some app does a bit of "creative" casting. :P
kts has quit [Quit: Konversation terminated!]
<robclark>
sure there isn't some weasle out clause somewhere in the spec declaring that negative array indexing is UB?
srslypascal has quit [Quit: Leaving]
<gfxstrand>
Negative array indexing is UB assuming you get the entire array access in a single index.
<gfxstrand>
However there's OpPtrAsArray which lets you effectively do (&arr[7])[-3] which would be arr[4] which is valid.
<robclark>
if the xyz[-3] is UB how does it become defined if xyz=&arr[7]?
pochu has quit [Quit: leaving]
srslypascal has joined #dri-devel
<pendingchaos>
AMD global loads on gfx6 and gfx9+ can have a limited constant offset, which can be negative on gfx9+
<pendingchaos>
for gfx6: it can do either a sgpr/vgpr address and a unsigned 32-bit sgpr offset
<pendingchaos>
for gfx9+: it can take a vgpr address, or a sgpr address and unsigned 32-bit vgpr offset
<pendingchaos>
for gfx7/8: it can take a vgpr address and offsets have to be lowered as an addition
<gfxstrand>
robclark: The UB comes in when you actually access outside arr[], not based purely on the index.
<gfxstrand>
robclark: As long as all your OpPtrAssArray end you back inside the array, you're fine.
<alyssa>
gfxstrand: wait do we know what imagination does?
<gfxstrand>
Well, as long as you never end up outside.
<gfxstrand>
alyssa: No, I'm just making silly jokes. :P
<alyssa>
the ISA on AGX is all Apple as far as we know
tursulin has quit [Ping timeout: 480 seconds]
<gfxstrand>
robclark: So if you have int arr[10] then arr[7] is valid as is (&arr[7])[-3] because that's just arr[4]. I'm not sure about (&arr[12])[-5]. That one might be UB.
<robclark>
hmm, this is only a problem if part of the offset is folded into the 64b address..
<robclark>
although maybe we can't 100% avoid that?
<gfxstrand>
Yeah
<gfxstrand>
I mean, it's not strictly speaking a problem. We can handle whatever. It just affects design a bit.
<alyssa>
appledreno
<cwabbott>
robclark: yeah, presumably the two parts could be separated by other stuff in between
<cwabbott>
pass it as a function argument, use in a phi, you name it
junaid has joined #dri-devel
<robclark>
I guess even if we limited it to cases where we could tell at compile time that the base address is the actual start of the object, that seems fine.. if you're trying to be a jerk to the compiler I don't really care if you get horrible suboptimal code ;-)
bmodem has quit [Ping timeout: 480 seconds]
<gfxstrand>
At this point, I think we want to just add a signed 32-bit offset to load/store_global like Venemo said. I don't like calling it "base" but we can add a new name. Names are cheap.
<gfxstrand>
italove: ^^
<gfxstrand>
For the non-constant case, IDK what to do. It's also a far less important case, IMO. If there's a few adds in the shader, oh well.
<robclark>
non-constant == any array access
srslypascal is now known as Guest9038
srslypascal has joined #dri-devel
<robclark>
and few adds means bunch of instructions after lowering to 32b :-(
<gfxstrand>
robclark: Yeah, but when it comes to evolving core NIR, I think we start with what we know.
<gfxstrand>
I'm not going to suggest we take away your ir3 intrinsics :)
<gfxstrand>
Just make the common load/store more capable.
<gfxstrand>
The constant offset case is the one lots of hardware can do, where signed vs. unsigned matters less, and where it's easy enough / zero-cost to do `offset / (bit_size/8)` if you need to
pallavim has joined #dri-devel
<Venemo>
gfxstrand: thx :)
<robclark>
hmm, it is easy enough to check if nir src is const.. and if we _eventually_ want to support variable offset, does that mean we end up with both const and non-const offset?
Guest9038 has quit [Ping timeout: 480 seconds]
<gfxstrand>
robclark: No. The load/store_global_offset32 versions that we'd add wouldn't have a const_index offset.
<gfxstrand>
I wish we could practically just do load/store_global_offset32 and have the driver do nir_src_is_const() on it but there seem to be too many variables right now to figure out the right thing to do globally there.
<gfxstrand>
signed vs. unsigned, offset units, etc.
<robclark>
offset units I think you can punt on... a later stage pass would turn it into driver specific intrinsic and add the approprate PoT shift (and then maybe that shift gets optimized back out again)
<robclark>
I'd look at it this way, it is a lot easier for backend to do something like that than it is to fish out 32b'ness from 64b math
<gfxstrand>
Oh, no arguments there.
pivi has joined #dri-devel
<gfxstrand>
As long as you have something to optimize to which you can do before you lower 64b math to 32b, I think that's the important thing.
<robclark>
there is still the signedness issue.. but at least offset units is easy to push to driver
pivi has quit []
<danvet>
gfxstrand, different topic, ack on the syncobj deadline from robclark?
<danvet>
it looked just like nits you had
<danvet>
just want to make sure since uapi
junaid has quit [Remote host closed the connection]
<gfxstrand>
danvet: Yeah, I'm gonna look at that again. Been on Khronos calls and thinking about NIR pointers all morning.
<danvet>
gfxstrand, pls ping me when you're all happy so I can pull robclark's pr
<danvet>
or ask for respin :-)
nekit has joined #dri-devel
jhli has joined #dri-devel
Ahuj has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
smilessh has quit [Ping timeout: 480 seconds]
<hays>
what could it mean if during the boot process, I get no hdmi output until wayland/x11 loads?
<hays>
apologize if this is the wrong channel
<ccr>
I have an ASUS motherboard that occasionally starts acting so that there's no HDMI output until Xorg starts, it apparently suddenly decides that it prefers VGA output (board has VGA, DVI and HDMI).
<ccr>
3
<hays>
hmm that is an interesting thing to check. there are 2 hdmi outputs on this thing
srslypascal has quit [Ping timeout: 480 seconds]
<hays>
kernel commandline lets you do video=HDMI-1:D video=HDMI-2:D but unclear semantics of that...
srslypascal has joined #dri-devel
<hays>
but seems worth trying
<ccr>
you could also check if UEFI/BIOS has some setting for preferred connector
<ccr>
(in my case it's a weird bug in UEFI or something)
srslypascal has quit [Read error: No route to host]
djbw has joined #dri-devel
srslypascal has joined #dri-devel
gawin has quit [Quit: Konversation terminated!]
djbw has quit [Read error: Connection reset by peer]
oliviac has joined #dri-devel
ngcortes has joined #dri-devel
soreau has joined #dri-devel
<gfxstrand>
robclark: RE deadline defaults: What I wouldn't give for Rust's Option<T> right about now....
<robclark>
Option<T> is nice for other reasons.. but doesn't really solve the issue that driver doesn't have enough information from app ;-)
<robclark>
vk extension wouild be a good thing.. but I think short term a 90% soln and driconf if needed are workable
mohamexiety has quit []
konstantin_ has joined #dri-devel
srslypascal is now known as Guest9047
srslypascal has joined #dri-devel
<gfxstrand>
robclark: No, but it solves the "how do I indicate that I don't care?" internal API question :)
<danvet>
just a real type system would be nice
konstantin has quit [Ping timeout: 480 seconds]
djbw has joined #dri-devel
<gfxstrand>
Well, yeah.
Guest9047 has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
<jenatali>
I'm curious if there's Vulkan drivers out there that are working around app bugs in descriptor indexing. Specifically out-of-bounds indexing
<jenatali>
(Unless that's supposed to be well-defined and I just missed a bit of spec language)
oliviac has quit []
flibitijibibo has quit [Read error: Connection reset by peer]
<gfxstrand>
With robustness enabled, it's supposed to be less buggy
srslypascal is now known as Guest9049
srslypascal has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
flibitijibibo has joined #dri-devel
Guest9049 has quit [Ping timeout: 480 seconds]
<jenatali>
gfxstrand: Which robustness?
<gfxstrand>
For buffers, it's robustBufferAccess, I think. For others, I don't remember the details.
<jenatali>
I'm not talking out-of-bounds within a buffer, I'm talking out-of-bounds within an array of buffers
<gfxstrand>
I know
<gfxstrand>
You're going to make me look this up, aren't you?
<jenatali>
That's a strong way to phrase it :P
flibitijibibo has quit [Ping timeout: 480 seconds]
<gfxstrand>
jenatali: I'm not seeing it in the spec anywhere
<jenatali>
Awesome. I'm still debugging DOOM Eternal and it looks like they do it
<gfxstrand>
That's believable
<DavidHeidelberg[m]>
jenatali: on which difficulty you do debug? :D
<jenatali>
David Heidelberg: When debugging an app like that, there's only one difficulty
<jenatali>
And it hurts
<DavidHeidelberg[m]>
nightmare :P ?
<alyssa>
jenatali: the natural descriptor indexing impl on some hw is going to be robust against that automatically so I could easily believe in app bugs
<alyssa>
robclark: optimizing out the shift after it's added in is tricky
<alyssa>
because `(a << b) >> b` is not equal to a in case of overflow
<gfxstrand>
No, but ((a << b) >> b) << b is
<jenatali>
alyssa: Interesting. I guess I'm confused as to what the behavior would be, if e.g. you index out-of-bounds of samplers
<alyssa>
jenatali: on valhall, return all-0 texels, iirc
anarsoul has quit [Ping timeout: 480 seconds]
<alyssa>
but I agree that's goofy
<jenatali>
Huh, cool
<alyssa>
I don't think Arm specifically designed that, more of an emergent behaviour
<alyssa>
the texture instructions check for valid descriptors and return 0's if not present (ostensibly)
<alyssa>
so I think it Just Works
<jenatali>
Huh
<alyssa>
but also I haven't tried and don't particularly care
<jenatali>
D3D leaves it as undefined behavior if you're using the native indexing functionality, but the stuff I had to do to get VK to work make it so much worse
<alyssa>
nod
anarsoul has joined #dri-devel
<alyssa>
we don't need another blog post about descriptor sets
<alyssa>
i learned that at xdc
<alyssa>
:p
ngcortes has quit [Ping timeout: 480 seconds]
Namarrgon has quit [Quit: WeeChat 3.7.1]
mvlad has quit [Remote host closed the connection]
<italove>
gfxstrand: ack on adding a const index offset to load/store_global and handling non-const separately, but then wouldn't it make sense to just use `base` so that we can fit more cleanly into `nir_opt_offsets`? It seems other load/store ops already assume `base` is an offset. Or you think that's not a good enough reason?
gouchi has joined #dri-devel
Namarrgon has joined #dri-devel
ngcortes has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.6]
soreau has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest9052
ybogdano has joined #dri-devel
soreau has joined #dri-devel
<HdkR>
alyssa: 32-bit thunking is a lot of work. Which is why we will have both. Also we still need to support all the....other 32-bit ioctl mess outside of DRM still
<alyssa>
HdkR: fair enough
<alyssa>
still seems like it ought to be the easier of the two but I don't know what I don't know
<HdkR>
:P
<alyssa>
gfxstrand: italove: Wait, are we touching core load/store_global? nak on that, not thrilled on the idea of auditing every backend
<alyssa>
or just adding new offset versions?
<alyssa>
IDk
<alyssa>
TBH, we need backend specific address arithmetic opt for some backends regardless so I'm pretty meh about the whole thing
rmckeever has joined #dri-devel
paulk-ter has quit [Remote host closed the connection]
agd5f_ has quit []
agd5f has joined #dri-devel
Ahuj has quit [Ping timeout: 480 seconds]
<gfxstrand>
alyssa: The plan was to modify load/store_global but make it so non-zero offsets are optional.
<gfxstrand>
alyssa: So the "audit" would just be running around to all the back-ends and adding `assert(nir_intrinsic_offset(intrin) == 0)` to their load/store_global handling.
<gfxstrand>
Should be pretty mechanical and fool-proof.
<Kayden>
has anyone had much luck running specviewperf2020?
<Kayden>
something with the nodejs gui stuff seems to be going wrong here and it just never displays anything on the screen at all, either when trying to use the viewset downloader, or just running the benchmarks
<Kayden>
IIRC strace just showed it sitting around blocked in a read call waiting on some socket forever
<alyssa>
gfxstrand: meh, fair enough
<alyssa>
is constant-only offsets really worth plumbing into NIR?
<alyssa>
i don't care i just don't plan on using it for 3/4 of the ISAs that I work on
<alyssa>
I'll use it for #4 i guess (Valhall)
<gfxstrand>
alyssa: I guess it depends on how much nir_ssa_scalar crawling you want back-ends doing
<alyssa>
shrug
danvet has quit [Ping timeout: 480 seconds]
<alyssa>
I don't see this materially reducing the backend crawls if backends want competent address arithm
<alyssa>
since most (all?) backends can do something more than just constant only which is the absolute lowest common denominator
ice99 has quit [Ping timeout: 480 seconds]
<gfxstrand>
Intel can't do anything. Nvidia is const-only
<alyssa>
OK
<alyssa>
so.. it sounds like NV is the only backend that actually is simplified by this.
<alyssa>
since every other backend either can't do it at all or can do something more complicated
<alyssa>
and in either case won't reduce backend crawling with a "common" crawl
<gfxstrand>
The perfect is killing the good right now...
<alyssa>
is it?
<alyssa>
I'm just not sold on what the good is
<alyssa>
We *already* have backend crawling for most of the backends that care for it, for that matter.
<gfxstrand>
You really don't want back-end crawling if your backend wants lower_int64
<gfxstrand>
crawling post-int64-lowering is a giant pain
<alyssa>
yes, this is true.
<gfxstrand>
So we need something that can get lowered at least a little earlier in the pipeline.
<alyssa>
and the reason I haven't bothered on mali, but that's more because I keep procrastinating on wiring up the real int64 that does exist lol
<gfxstrand>
If you haven't bothered on mali, then we don't "already" have it everywhere.
<alyssa>
I mean
<alyssa>
I don't plan to wire up the new thing on mali either ;-)
Dr_Who has quit [Ping timeout: 480 seconds]
<alyssa>
mostly because even doing constant-only is a huge pain on bifrost
<gfxstrand>
That's fine. That's your choice.
<gfxstrand>
It's optional. You can just not use it on bifrost.
<alyssa>
For context, AGX uses lower_int64 but I call agx_nir_lower_address immediately before nir_lower_int64 iirc
gouchi has quit [Remote host closed the connection]
<gfxstrand>
I'm just saying there's a giant difference between "Thing X already exists everywhere, you don't need to bother" and "Think X *could* exist everywhere but I haven't bothered because it's a pain."
<alyssa>
and agx_nir_lower_address lowers to load/store_agx intrinsics which map directly to the hw
<gfxstrand>
The later case is exactly when we want common things.
<alyssa>
Fair enough
<gfxstrand>
Maybe the answer is everyone has their own load/store_hw intrinsics and a lowering pass.
<alyssa>
gfxstrand: My opinion -- take it or leave it -- is add load/store_global_offset intrinsics (I think we already have the load) and add a NIR pass to do the crawl, it can live in src/compiler/nir/ if there's a second user of it
<gfxstrand>
It seems like we can do better than that.
<gfxstrand>
alyssa: Yeah, well that's the approach Venemo is NAKing.
<alyssa>
NAK and maybe Mali can use that (and call the common crawl right before lower_int64)
<alyssa>
ugh
<gfxstrand>
And where I started
<alyssa>
Venemo: what's your nak?
<gfxstrand>
That's why the perfect hates the good here. :P
<alyssa>
load_global_nak
<alyssa>
then you can't nak it
<alyssa>
or maybe you have to nak it
<alyssa>
shit
<Venemo>
I think load_global should be consistent with load_everything_else and have a const offset
<gfxstrand>
Venemo: Except that's not everything else
<gfxstrand>
That's everything else that doesn't take explicit addresses.
<Venemo>
eg. load_shared
<gfxstrand>
Wait, when did load_shared grow an extra offset?!?
<alyssa>
uhh
<alyssa>
did I miss that too?
<gfxstrand>
Maybe it always had one? Could be an artifact of history, that one.
<alyssa>
I did not know it had one?!
<alyssa>
is this broken in my backends?
<Venemo>
load_shared has a base offset
<Venemo>
so it makes sense to add it to load_global as well
apinheiro has quit [Quit: Leaving]
<Venemo>
that said, there is no hard NAK from me, it's just that I don't see why we should keep the intrinsic without the offset when we add the new one with the offset
<gfxstrand>
This goes back a long ways...
<gfxstrand>
Like pre-2018
<Venemo>
if you add a new one with a nee offset, who would actually want to use the old one?
<alyssa>
me
<Venemo>
y tho?
<alyssa>
because I'm doing all my own crawling anyway and an extra offset means one more thing to worry about
<zmike>
Kayden: link? I can try it tomorrow and let you know
<Venemo>
you can simply emit an add, it you can elect not to call the pass that collects the const offset, alyssa
<alyssa>
and the "well it'll just always be zero just trust us" is a recipe for brokenness in the future when some clever nir pass comes along in the future
<Venemo>
or* you can elect
<alyssa>
and also I need to get dinner and go to class and don't care about this bikeshed do whatever your want i'm out, peace!
alyssa has quit [Quit: leaving]
<gfxstrand>
So, RE load/store_shared. The offset appears to go all the way back but it hasn't been used by most backends since we switched to using nir_lower_explicit_io for shared.
<gfxstrand>
The base+offset stuff is typically associated with internal load/store ops, not explicit.
<gfxstrand>
It looks like Intel does support it though I doubt it ever sees a non-zero base these days.
<gfxstrand>
Intel HW doesn't have it, to be clear. The back-end emits an add.
<Venemo>
intel implemented it once I reminded them it's missing
<gfxstrand>
Oh
<Venemo>
it was last year when we started using it in the task shader lowering
<gfxstrand>
ah
<Venemo>
there was a bug due to them not supporting it
<Venemo>
anyway, I am sorry that this discussion offended alyssa, but I think it is good to have consistency between these intrinsics
<Venemo>
it is somewhat a mess though
<gfxstrand>
I'm all for consistency, I'm just not sure which way consistency favors
<gfxstrand>
From the Intel PoV, consistency favors ripping out BASE everywhere.
<Venemo>
we added load_global_amd so we can have the offsets we need, so dunno
<gfxstrand>
Well, that's the problem. :) The hardware is inconsistent.
<Venemo>
if you say we should remove the offsets from the ones that have it and we all should use backend specific ones instead, that is also fine by me
<gfxstrand>
I'm not necessarily saying we should do that.
<gfxstrand>
I'm not sure what the best thing to do is.
<gfxstrand>
I went into today with a pretty clear plan and it's been shot to hell. IDK where that leaves us.
<Venemo>
I'm sorry
<gfxstrand>
It's okay, it's how this process works.
<gfxstrand>
It's good to uncover all the inconsistencies
<jenatali>
Personally, I like having 2 intrinsics, one with a base that must be respected, and one that can never have a source. Having a nir_option for which one you prefer and having some common lowering pass that removes offsets when unwanted across all ops seems right to me
<jenatali>
That way we don't have "assert(offset == 0)" in backends that suddenly blows up because some pass started using offsets
<gfxstrand>
Well, we actually have three cases:
<gfxstrand>
1) Just an address
<gfxstrand>
2) Address + imm (where imm has some bit size and signedness)
<gfxstrand>
3) Address + offset
<jenatali>
True
<Venemo>
maybe we should have some kind of abstract NIR address type, which could be configured by the backend what it means exactly. and then we could all use the same intrinsic
<jenatali>
That already exists, and is consumed by (e.g.) lower_explicit_io to select the right intrinsics for the backend
<gfxstrand>
Intel wants 1, Nvidia wants 2, Etnaviv, Mali, and Freedreno, all want 2+2 with some hand-waved details about how big the immediates can be and the stride of the offset.
<jenatali>
Derefs are already the abstract address type
<Venemo>
I'm not talking about derefs
<gfxstrand>
*want 2+3
<gfxstrand>
Actually, I think Etnaviv just wants 3
<gfxstrand>
Venemo: What you're suggesting is pretty nir_address_mode, just way more of them than we already have.
<gfxstrand>
*pretty much
<Venemo>
maybe the load could take an arbitrary number of sources, but the meaning of those could be configured in nir_compiler_options
<gfxstrand>
That's pretty much nir_address_mode
<Venemo>
I don't see how that helps here
<gfxstrand>
It doesn't, not without way more address modes
<Venemo>
it could even part of the intrinsic itself
<gfxstrand>
But how would you describe what they mean?
<gfxstrand>
If it's all arbitrary and there isn't a machine-understandable meaning, we're back to everyeone having custom back-end intrinsics.
<Venemo>
struct { arbitrary_offset_sources : 8; scalar_offset_sources : 8; const_offset_sources: 8 } each field a bitmask that describes how to interpret each source
<Venemo>
and you can have up to 8 sources in total
<gfxstrand>
Writing correct generic code is going to be near impossible.
<gfxstrand>
We can barely do it now and that's with a way simpler system.
<gfxstrand>
That also doesn't take into account things like strides etc.
<Venemo>
well, then I guess we'll just have to keep using the backend specific intrinsics we already have
<Venemo>
I'm just throwing around random ideas, of course
<Venemo>
not necessarily saying this is any good
<gfxstrand>
How different are the rules on AMD for scalar vs. not?
<Venemo>
what kind of rules do you mean?
ybogdano has quit [Ping timeout: 480 seconds]
<Venemo>
scalar means it must be an SGPR
<Venemo>
on the NIR level it means divergence analysis must be able to prove that it's uniform
<gfxstrand>
Yeah, I know that
<gfxstrand>
But do you have different forms for SGPR vs. VGPR. Like VGPR can only do addr+imm but SGPR can do addr+offset or something like that.
<gfxstrand>
Or maybe just VGPR+SGPR
<gfxstrand>
I'm wondering how complicated the rules are and if there's any hope of getting AMD to use common code.
<Venemo>
most (not all) loads have all three kinds of offsets
<Venemo>
shared only has vgpr and imm, buffer loads have base sgpr pair, vgpr index, vgpr offset, sgpr offset and imm, scalar loads have a base sgpr pair, sgpr offset and imm, global is a bit weird I think it only has an sgpr pair base and a vgpr offset and imm
<Venemo>
thanks pendingchaos that is much more accurate for global loads than what I said
Haaninjo has quit [Quit: Ex-Chat]
<gfxstrand>
Ok, that's enough for me to know that a common NIR thing isn't going to model everything you want for AMD>
<gfxstrand>
So the question is if we can do something more useful than what we have today but that's also useful for other people and not any worse for anybody.
<Venemo>
yea
<gfxstrand>
But, IDK, everyone doing their own thing may not be totally horrile. :shrug:
pallavim_ has joined #dri-devel
pallavim_ has quit [Read error: Connection reset by peer]
pallavim has quit [Read error: Connection reset by peer]
<Venemo>
gfxstrand: in that case though, there is no need for load_global_offset either. nobody is going to want to downgrade to this when we already have a backend specific one that more accurately represents the HW
<gfxstrand>
Yeah, if everyone's doing their own thing, no bases, just addresses.
<Venemo>
so, shall we close the MR?
<gfxstrand>
Let me think some more before we do that.
<Venemo>
sure thing
Zopolis4 has quit []
mbrost has joined #dri-devel
<Venemo>
cmarcelo, mslusarz: when is it going to be time to remove NV_mesh_shader from ANV?
<Venemo>
gfxstrand: are you going to want to have NV_mesh_shader in NVK? :O
mbrost has quit [Remote host closed the connection]
mbrost has joined #dri-devel
<gfxstrand>
Venemo: No
<Venemo>
hooray :)
<Venemo>
that means we can remove that monstrosity from mesa without looking back
<cmarcelo>
Venemo: I _think_ should be fine by us, but mslusarz is more on top of mesh things, so let's wait him to chime in :)
oneforall2 has quit [Read error: Connection reset by peer]
oneforall2 has joined #dri-devel
<Venemo>
gfxstrand: I am looking to change the divergence analysis a bit, so it can distinguish between workgroup-uniform and wave-uniform. so I'd like to change nir_src_is_divergent and nir_dest_is_divergent to also take a scope, what do you think about that?