ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
rasterman has quit [Quit: Gettin' stinky!]
<karolherbst> ahh a crash :)
nchery is now known as Guest3390
nchery has joined #dri-devel
ybogdano has joined #dri-devel
rkanwal has quit [Quit: rkanwal]
rkanwal has joined #dri-devel
columbarius has joined #dri-devel
neonking__ has joined #dri-devel
Guest3390 has quit [Ping timeout: 480 seconds]
co1umbarius has quit [Ping timeout: 480 seconds]
<karolherbst> airlied: ehh.. does this scratch code even work if the value types are different?
<karolherbst> like this scratch area contains a 64 bit and a 32 bit value
<karolherbst> and I think you place the elements in a "vector", no?
<karolherbst> so I think one thread writes into offset 0x8
<karolherbst> and another thread reads 0x0 (as 64 bit) and gets a garbage pointer
neonking__ has quit [Remote host closed the connection]
neonking__ has joined #dri-devel
<airlied> karolherbst: yes it should work for 64-bit or 32-bit values
<karolherbst> well.. it has both
<airlied> assuming the bit shift is in the right place :-P
<karolherbst> I don't think it is
<karolherbst> it really looks like the content of the scratch buffer gets corrupted
<karolherbst> see those non ptr looking 32 bit values mixed in?
neonking_ has quit [Ping timeout: 480 seconds]
<karolherbst> at idx 8
<karolherbst> and 11
<karolherbst> 7 and 8 are clearly a heap pointer, but 0x7fffb00000003 is kind of a bad pointer
<airlied> btw which luxmark scene are you testing with?
<karolherbst> luxball
<karolherbst> the others won't compile :D
<karolherbst> well... I think they would compile at some point
<karolherbst> anyway
<karolherbst> I think the offset calculation in load/store scratch is wrong
<karolherbst> I am sure it all works if _all_ values are either 32 or 64 bit within scratch
<karolherbst> but not if it's mixed
<airlied> karolherbst: so it casts the scratch ptr to a 32-bit or 64-bit ptr
<airlied> then adds the offset to it
<karolherbst> I mean the thread_offsets value
<karolherbst> but weird...
<airlied> that is done before the shift though
<karolherbst> right...
<karolherbst> yeah so shift_val is different mhh
<karolherbst> yeah.. shift_val being wrong _would_ be a valid explanation here I think?
<airlied> when I break in there, I get scratch at 8 bytes for that demo
<karolherbst> it launches multiple kernels
<karolherbst> the second or third one with scratch space has 12
<airlied> oh i see it now
* airlied assumes it's not doing unaligned 64-bit loads
<karolherbst> I am sure it is :P
<karolherbst> the offsets are also a bit odd
<karolherbst> store offset = 0 1 3 4 6 7 9 10
<karolherbst> the 32 bit value gets store offset = 2 5 8 11 14 17 20 23
<karolherbst> maybe the size of 12 confuses it
<karolherbst> maybe something should align it?
<airlied> oh maybe
<karolherbst> let me try that
<airlied> might be worth trying to get that to 16 somewhere
<karolherbst> yeah
<karolherbst> but that would also explain why it works on iris
<karolherbst> airlied: question is now, should llvmpipe or the frontend work around that?
<airlied> llvmpipe I think
<karolherbst> seems to fix it
<karolherbst> but llvmpipe doesn't know the alignment of the biggest thing :(
<karolherbst> worst case you need to align to long16, no?
<airlied> should I align to 8 or just next power of two it?
<karolherbst> next power of two can hurt if you got huge scratch space
<airlied> no I think we don't ever vector load
<airlied> we always only load 32-bit or 64-bit
<karolherbst> maybe next pot _or_ long16?
<karolherbst> depending on what's smaller
<airlied> so I think 8 is probably fine
<karolherbst> airlied: CL has some stupid reqs on pointer alignments and shit though
<airlied> but when the IR gets to llvmpipe, it's pretty much load a 32-bit or load a 64-bit this number of components
<airlied> so you have to iterate it
<karolherbst> right
<airlied> it's not like we do a 256-bit fetch even if we could
<karolherbst> I am just wondering if some of the alignment fails I see are caused by this
<karolherbst> basic kernel_memory_alignment_constant
<karolherbst> basic kernel_memory_alignment_global
<karolherbst> but...
<karolherbst> could be just llvmpipe
<karolherbst> sooo.. let's benchmark on my non crappy desktop? :D
<karolherbst> I am sure this also fixes other random crashes with fp64
<karolherbst> what a pita of a bug
<airlied> thanks for digging in!
<karolherbst> airlied: yeah.. that works :)
<airlied> 16288 has the patch
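[editor's note] A minimal sketch of the arithmetic behind the scratch corruption discussed above. It is not llvmpipe's actual code. It assumes that a 64-bit access turns a byte offset into an element index with a right shift by 3 (the chat mentions shift_val but not exactly how it is applied), and the names align_pot and scratch_u64_byte_offset are hypothetical. Under that assumption a per-thread scratch size of 12 sends thread 1's 64-bit slot to byte 8, which is where thread 0 keeps its 32-bit value; rounding the size up to a multiple of 8 removes the overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static inline uint32_t align_pot(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Byte offset that a 64-bit access really touches when the byte offset is
 * converted to a 64-bit element index with ">> 3" (assumption, see above).
 * Any offset that is not a multiple of 8 is silently rounded down.
 */
static inline uint32_t scratch_u64_byte_offset(uint32_t thread,
                                               uint32_t scratch_size,
                                               uint32_t offset)
{
    uint32_t byte_offset = thread * scratch_size + offset;
    uint32_t index = byte_offset >> 3;   /* truncates when byte_offset % 8 != 0 */
    return index << 3;
}
```

With scratch_size = 12, thread 1 offset 0 lands on byte 8 instead of 12; with scratch_size = align_pot(12, 8) = 16 it lands on byte 16 as intended.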
<karolherbst> heh.. my ADL-S doesn't seem so much faster than my CML-H
<karolherbst> ahh.. it's GT-1 vs GT-2
<karolherbst> mhh, it should still be faster..
<airlied> make sure you took off the LP_NUM_THREADS :-P
<karolherbst> airlied: how can I make llvmpipe use more threads? :D
<karolherbst> airlied: nah.. I was testing iris first
<karolherbst> heh.. LP_NUM_THREADS=24 and still only 1100%
<karolherbst> where is my perf
<karolherbst> airlied: so uhm... how do I get more perf out of llvmpipe on my machine? :D
<karolherbst> guess local size of 32 isn't helping? dunno
<karolherbst> 468 points and image validation seems happy
<airlied> not really sure, it's probably limited by launch params
<karolherbst> yeah... so luxmark uses 32 threads on CPU devices
<karolherbst> and 64 on GPUs
<HdkR> Is llvmpipe still bounded by vertex heavy jobs rather than fragment?
<karolherbst> that's on CL :P
<HdkR> oh wow
<karolherbst> so.. llvmpipe is a GPU now
<karolherbst> still only 1100%
* karolherbst doesn't have 20 cores for nothing
<karolherbst> but iris seems a little slow
<karolherbst> so iris on my desktop should be around 50% faster
<karolherbst> but is only 10%
<karolherbst> intel_gpu_top says 99%
<karolherbst> ¯\_(ツ)_/¯
<karolherbst> maybe I hurt perf
<karolherbst> yeah.. no
ybogdano has quit [Ping timeout: 480 seconds]
<karolherbst> ADL-S GT1: 2719
<karolherbst> CML GT2: 2305
<karolherbst> airlied: guess I have to figure out the llvm header situation
<karolherbst> and I think I might even require llvm-14, because the opencl header stuff isn't as terribly broken there...
<karolherbst> it still is, but.. uhhh
* airlied is going to go dig into coroutines
<karolherbst> good luck
<karolherbst> ADL-S GT1 + LP: 3139
mclasen_ has quit [Ping timeout: 480 seconds]
rkanwal has quit [Quit: rkanwal]
kts has quit [Quit: Konversation terminated!]
<karolherbst> airlied: btw, your skynet email is dead
<airlied> yeah need to chase down where that server went, might have to retire it
kts has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Konversation terminated!]
elongbug__ has quit [Ping timeout: 480 seconds]
rsalvaterra_ has joined #dri-devel
rsalvaterra is now known as Guest3407
rsalvaterra_ is now known as rsalvaterra
Guest3408 has quit [Ping timeout: 480 seconds]
fxkamd has quit []
sdutt has quit []
jimjams has joined #dri-devel
mwalle has quit [Quit: WeeChat 3.0]
Duke`` has joined #dri-devel
famfo has joined #dri-devel
itoral has joined #dri-devel
consolers has joined #dri-devel
<consolers> i think since i moved from mesa-20.2 to 21.2 clinfo started segfaulting: now it segfaults when loading /usr/lib64/gallium-pipe/pipe_iris.so
<consolers> that was on 22.0 i think i hit this before and figured something out but my mind is a blank
mhenning has quit [Quit: mhenning]
jewins has quit [Read error: Connection reset by peer]
Duke`` has quit [Ping timeout: 480 seconds]
lemonzest has quit [Quit: WeeChat 3.4]
consolers has quit [Ping timeout: 480 seconds]
ppascher has joined #dri-devel
danvet has joined #dri-devel
frieder has joined #dri-devel
garrison has joined #dri-devel
i-garrison has quit [Read error: Connection reset by peer]
consolers has joined #dri-devel
<consolers> any clues on troubleshooting why clinfo is just crashing with mesa?
mvlad has joined #dri-devel
<consolers> i know it worked with mesa-20.2.0
<consolers> but apparently not since then, when i've had 21.2.1 and 22.2.0
digetx has quit [Ping timeout: 480 seconds]
digetx has joined #dri-devel
MajorBiscuit has joined #dri-devel
consolers has quit [Ping timeout: 480 seconds]
tzimmermann has joined #dri-devel
<airlied> danvet: fyi I backmerged rc5, I had an arm build fail it had a fix for
cheako has quit [Quit: Connection closed for inactivity]
<dolphin> airlied, danvet: no patches got picked up for drm-intel-fixes this week
ppascher has quit [Ping timeout: 480 seconds]
tursulin has joined #dri-devel
thellstrom has joined #dri-devel
lumag_ has joined #dri-devel
mwalle has joined #dri-devel
tzimmermann has quit [Quit: Leaving]
tzimmermann has joined #dri-devel
<tzimmermann> javierm, if you have a bit, could you please comment on https://patchwork.freedesktop.org/series/103222/ ?
<javierm> tzimmermann: sure, let me do that now
<tzimmermann> no hurries
<javierm> tzimmermann: no worries, it's just that I happen to have time now :)
thellstrom has quit [Remote host closed the connection]
rsripada_ has quit [Remote host closed the connection]
rsripada has joined #dri-devel
<javierm> tzimmermann: are you familiar with https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html ?
<tzimmermann> javierm, no sorry
<javierm> yeah, me neither. But I think it would be nice to have kunits for all the conversion helpers
xperia64_ has joined #dri-devel
<javierm> tzimmermann: I'll add that to my TODO to look at some point, which just keeps growing :)
<tzimmermann> that's a good idea with these unit tests
xperia64 has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
<javierm> tzimmermann: Ok, I comment in the list too
<mripard> javierm: I had to use it a bit recently for the clocks framework, so I can help if needed
<mripard> (it's awesome)
<javierm> mripard: great
<javierm> mripard: yes, I was in a talk about kunit at some conference (plumbers in lisbon maybe?) and thought that was awesome but never had the time to dig deeper
<javierm> mripard: thanks for the offer, I'll for sure bug you if I want to write some unit tests with kunit :)
nvishwa1 has quit [Read error: Connection reset by peer]
Lyude has quit [Ping timeout: 480 seconds]
mattrope has quit [Ping timeout: 480 seconds]
Lyude has joined #dri-devel
mattrope has joined #dri-devel
vyivel has quit [Read error: Connection reset by peer]
vyivel has joined #dri-devel
<mripard> I wanted to write some infrastructure for drivers to create unit tests in KMS, but got distracted
<mripard> maybe that would be worth adding in the TODO too
<javierm> mripard: Ok, I'll see to add that too when writing the patch for Documentation/gpu/todo.rst
<mripard> for vc4 for example, we have an atomic_check function that I have unit-tests for, but on my workstation, and it "works" with me copy/pasting the source code each and every time I need to rework it
<mripard> it's very far from optimal :)
<javierm> :D
<javierm> mripard: now you made me even more curious about kunit, gah I wish I had more time
<javierm> tzimmermann: what a nice patch series, the diff stat speaks for itself. And it's great to see that much code duplication going away
<tzimmermann> thanks :)
<tzimmermann> as i said before, i'd like to make these helpers composable, so that complex conversions can be assembled from multiple simple ones. we're not there yet, but it's a big step
<javierm> tzimmermann: it is a big step indeed
<javierm> especially since then someone reading these helpers will just have to understand drm_fb_xfrm() (which is complex, true) rather than the small differences between the different conversion helpers
<javierm> tzimmermann: and the diffstat after your patches speaks for itself :)
<tzimmermann> javierm, the next step is to use iosys_map for the pointer arguments. iosys_map will be amended with caching information. from this, we can easily detect which dbuf/sbuf need temporary buffers and which can be used as-is. we should also be able to merge drm_fb_xfrm() and drm_fb_xfrm_toio() into a single function
<javierm> tzimmermann: yup, I remember you mentioned that. Will speed up for the cases that don't use CMA/need a temp buffer
<javierm> since currently we are always doing the extra copy just in case
maxzor has joined #dri-devel
mszyprow has joined #dri-devel
pcercuei has joined #dri-devel
mszyprow has quit [Ping timeout: 480 seconds]
<pq> tzimmermann, javierm, mripard, FYI https://lists.freedesktop.org/archives/dri-devel/2022-April/349437.html has also per-line pixel conversion operations.
<pq> it's that the source or dest is always an internal 16 bpc representation used for blending in VKMS
rasterman has joined #dri-devel
jimjams has quit [Quit: Connection closed for inactivity]
<pq> tzimmermann, drm_fb_xrgb8888_to_rgb565_swab_line() sounds confusing. On one hand, the pixel formats are absolutely defined. OTOH, you add a swab.
<pq> or are these not reference to DRM_FORMAT_XRGB8888 and DRM_FORMAT_RGB565?
siqueira has quit []
lemes has quit []
melissawen has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
exit70 has quit [Quit: ZNC 1.8.2 - https://znc.in]
exit70 has joined #dri-devel
lemes has joined #dri-devel
siqueira has joined #dri-devel
melissawen has joined #dri-devel
<tzimmermann> pq, there are drivers that want a conversion+byteswap. i think we can already express that with the proper 4cc code. but conversion helpers are not there yet. i've been unifying these functions for some time and am still in the middle of it. for now, i'd prefer to keep it as-is
<tzimmermann> pq, i'll see if some of that vkms code can go into generic helpers
<pq> tzimmermann, cool, thanks :-)
<pq> also, someone who actually does kernel dev would be nice to check my review comments on that series, since I'm not familiar with kernel practices
<pq> tzimmermann, for now, the VKMS intermediate pixel format is not defined as a 4cc in order to use a struct conveniently.
apinheiro has joined #dri-devel
ppascher has joined #dri-devel
digetx has quit [Ping timeout: 480 seconds]
mclasen has joined #dri-devel
echoed has joined #dri-devel
echoed has left #dri-devel [#dri-devel]
consolers has joined #dri-devel
Lucretia has quit []
<consolers> could it be some thread thing that causes any opencl thing to segfault when loading the mesa iris gallium dll?
<consolers> i cant spot any reports on it either - except 2 on libreoffice/opencv i think from 2021 which were solved with downgrades
<consolers> and if i search for opencl google is giving me results for opened, like some ocr typo
digetx has joined #dri-devel
devilhorns has joined #dri-devel
Lucretia has joined #dri-devel
consolers has quit [Ping timeout: 480 seconds]
sagar__ has quit [Remote host closed the connection]
sagar__ has joined #dri-devel
consolers has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
rkanwal has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #dri-devel
Lucretia has quit []
Lucretia has joined #dri-devel
lemonzest has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
consolers has quit [Ping timeout: 480 seconds]
ppascher has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #dri-devel
anarsoul has quit [Quit: ZNC 1.8.2 - https://znc.in]
anarsoul has joined #dri-devel
rgallaispou has quit [Remote host closed the connection]
maxzor has quit [Ping timeout: 480 seconds]
rpigott has quit [Read error: Connection reset by peer]
icecream95 has quit [Ping timeout: 480 seconds]
itoral has quit []
<HdkR> robclark: I noticed you wanted a way to determine big.little cores. Welcome to the pain train, there is no exact way to determine this, you must use heuristics. You could peek at what FEX-Emu does to classify big versus little but it leaves open the possibility of getting things wrong.
alyssa has left #dri-devel [#dri-devel]
sdutt has joined #dri-devel
<tzimmermann> javierm, sounds good
ppascher has joined #dri-devel
rgallaispou has joined #dri-devel
hch12907 has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
MajorBiscuit has joined #dri-devel
<robclark> HdkR: I believe "capacity" is exposed in sysfs.. crosvm (or rather the thing that launches it) looks at this to setup big and little vcpu's.. but haven't had a chance to look more closely at that
<MrCooper> daniels: do you remember what happened to https://lists.x.org/archives/xorg-devel/2017-November/055172.html (v3 of DRI3 v1.2: DMA fences), i.e. why it wasn't applied or followed up on?
<daniels> MrCooper: I think we already had this discussion on IRC a couple of years ago?
<daniels> MrCooper: by the time the dust had settled between keithp and ickle, it sounded like the conclusion was that it wouldn't be landed until xserver had a smart scheduler which would wait until the fences had actually signaled before doing anything
<daniels> unfortunately mine & lfrb's time on this earth is but finite
<MrCooper> thanks, don't remember such a fight, maybe I forgot about it :/
<MrCooper> that argument does make sense to me though
<daniels> yeah, it sounded like there was zero support for simply shuttling fences through, i.e. acting as we currently do with implicit sync
<daniels> and that nothing would be landable until the server was taking decisions itself
<daniels> it does sound like a good idea in isolation, but given the time that would require, and that you'd only really see any benefit if you weren't simply proxying via Xwl, or if you were mixing present + core rendering ... eh
<MrCooper> the same thing could be done with implicit sync in principle, using dma-buf fds
rgallaispou has quit [Read error: Connection reset by peer]
<daniels> it could!
<daniels> many things are possible
<daniels> a superset of the things which are sensible :P
<MrCooper> it's arguably a requirement for proper mailbox behaviour
* daniels shrugs
rpigott has joined #dri-devel
jewins has joined #dri-devel
<javierm> tzimmermann: interesting, should we just land it then ?
<tzimmermann> javierm, sure, why not.
<javierm> tzimmermann: I wondered the same that Junxiao asked but just did the minimum change to fix this particular issue
mszyprow has joined #dri-devel
<tzimmermann> well, he has a point
<tzimmermann> then maybe do a v2 with the other interfaces fixed.
Company has joined #dri-devel
maxzor has joined #dri-devel
<daniels> MrCooper: it sounds like a good thing to do and I'm certainly not going to talk you out of it :)
<zmike> anholt: what's the deqp-runner syntax for multiple --env options?
<zmike> or any deqp-runner expert
<javierm> tzimmermann: sure. It can't do any harm I guess^Whope :)
consolers has joined #dri-devel
alyssa has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
tzimmermann has quit [Quit: Leaving]
consolers has quit [Ping timeout: 480 seconds]
mszyprow has quit [Ping timeout: 480 seconds]
fxkamd has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
<pepp> zmike: I think you just pass multiple "--env name=value" param
<zmike> hm
<zmike> maybe I was doing something else wrong
mszyprow has joined #dri-devel
MajorBiscuit has joined #dri-devel
<robclark> HdkR: fwiw: cat /sys/devices/system/cpu/cpu*/cpu_capacity
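[editor's note] A hedged sketch of reading the cpu_capacity values robclark points at and applying one possible heuristic to them. The helper names and the "cores at the maximum capacity count as big" rule are assumptions made for illustration; as noted elsewhere in this log, the file is not populated on every system and real clusterings can have more than two tiers, so this is a rough classifier and nothing more.

```c
#include <assert.h>
#include <stdio.h>

/* Read /sys/devices/system/cpu/cpuN/cpu_capacity; returns -1 if unavailable. */
static inline long read_cpu_capacity(int cpu)
{
    char path[64];
    long capacity = -1;

    assert(cpu >= 0);
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;                       /* file missing: no capacity info */
    if (fscanf(f, "%ld", &capacity) != 1)
        capacity = -1;
    fclose(f);
    return capacity;
}

/*
 * Hypothetical heuristic: count the CPUs whose capacity equals the largest
 * capacity seen. Entries of -1 (unknown) are ignored; returns 0 if nothing
 * is known.
 */
static inline int count_max_capacity_cpus(const long *caps, int n)
{
    long max = -1;
    int count = 0;

    for (int i = 0; i < n; i++)
        if (caps[i] > max)
            max = caps[i];
    if (max < 0)
        return 0;
    for (int i = 0; i < n; i++)
        if (caps[i] == max)
            count++;
    return count;
}
```

A caller would fill an array with read_cpu_capacity(i) for each CPU and fall back to some other heuristic when count_max_capacity_cpus() returns 0.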
ella-0_ has joined #dri-devel
ella-0 has quit [Read error: Connection reset by peer]
mszyprow has quit [Ping timeout: 480 seconds]
i-garrison has joined #dri-devel
garrison has quit [Read error: Connection reset by peer]
nvishwa1 has joined #dri-devel
<jekstrand> It's an infra problem. Not sure if it's fd.o or Google
<daniels> jekstrand: it's google, cf. #freedreno and also #freedesktop :P
<daniels> robclark is trying to fix it
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
<jekstrand> :(
* jekstrand goes back to fixing nir_lower_blend
<daniels> sozzers
alyssa has joined #dri-devel
<alyssa> Static functions (not marked inline) in a header don't compile in release builds, but are fine in debug builds.
<alyssa> Any idea why that might be? Ideally any code that builds in debug also builds in release.
<alyssa> (In this case, it sounds like that should've failed in a local debug build.)
<ajax> because release builds enforce -Werror=unused-function. so any file you include the header in, must reference all the statics therein
<ajax> unless they're inline, or __attribute__((unused)), or whatever
<alyssa> ajax: ok.. is there a reason debug builds don't enforce -Werror=unused-function?
<daniels> alyssa: you're calling it from an assert()
<daniels> which makes it used in debug builds, and unused in release builds
<ajax> hah, indeed, i skipped a step there
<alyssa> aaaah
<daniels> but anyway, just make it inline ... ?
<alyssa> Yeah, the correct fix is to make it inline
<alyssa> I'm more baffled why it went through my local build (and made it to CI at all)
Lucretia has quit [Remote host closed the connection]
<alyssa> instead of gcc screaming at me to mark it inline
<alyssa> I'm the kind of person that needs compilers to scream at me :p
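[editor's note] A small sketch of the situation described above, with hypothetical names (is_pot, half_of_pot). The helper is only referenced from an assert(), so once NDEBUG is defined the assert() expands to nothing and the helper becomes unreferenced. GCC's -Wunused-function applies to non-inline static functions, which is why a plain "static" helper would trip -Werror=unused-function in a release build while "static inline" does not.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Imagine this lives in a header. Declared "static inline" so that a build
 * with NDEBUG (where the assert() below disappears) does not report it as
 * an unused static function. Dropping "inline" would reproduce the failure.
 */
static inline bool is_pot(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* The only user of is_pot() is an assert(), i.e. debug-only code. */
unsigned half_of_pot(unsigned x)
{
    assert(is_pot(x));
    return x / 2;
}
```

Marking the helper with __attribute__((unused)) would be another way to silence the warning, as ajax mentions, but inline is the fix chosen in the discussion.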
<karolherbst> alyssa: some people implement things in headers and include those
<karolherbst> so if gcc would scream, it would break code :)
<alyssa> sounds like a good thing to break ;)
<karolherbst> if you are ready for that bikeshedding, please write the patch and explain why violating some weird spec is fine :D
<karolherbst> let's make it a daily thing: shitting on C and be annoyed by how bad it is or something
<alyssa> which is the spec violation?
<karolherbst> does C even know about headers ?
<karolherbst> the pre processor is probably a spec on its own
<ajax> would be somewhat weird for the C standard to both define what goes in what standard headers and not know what headers are
<karolherbst> I am sure the standard lib is another spec
<karolherbst> maybe it isn't.. :D
<ajax> open-std.org is down atm so i can't pull up n1570.pdf to check, but
<karolherbst> the C spec is cursed
<karolherbst> "??=define arraycheck(a, b) a??(b??) ??!??! b??(a??)"
<karolherbst> who is a C expert and knows what that resolves to?
Lucretia has joined #dri-devel
<karolherbst> that's right, it's #define arraycheck(a, b) a[b] || b[a]
<daniels> trigraphs are so awesome
<karolherbst> I didn't even know they existed
<ajax> they don't anymore iirc
<karolherbst> ajax: I have the C17 spec here :(
<daniels> in fairness gcc warns when you use trigraphs unless you specifically suppress it
<daniels> 'you 100% do not mean this, if you did then you can enable it but you didn't'
<karolherbst> what's the reason to add those anyway?
<ajax> i thought they were getting dropped in c23 was the rumor
<ajax> because ebcdic doesn't have all of the basic character set for c89 in its minimal subset
<ajax> so depending which s360 you find yourself on you might not have [] as, like, keys on the keyboard
<karolherbst> ohh wow.. "The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set."
<alyssa> karolherbst: "I have the C17 spec here :(" ditching clang+llvm-spirv are we now?
<karolherbst> alyssa: :D
<karolherbst> I won't comment on that
<ajax> so you cannot write those characters into files, which makes it hard for the compiler to tokenise them
<karolherbst> uhhh
<ajax> i'm blaming ebcdic here and i think there's at least one other non-ascii encoding that was partly to blame here, but
<jekstrand> hrm... nir_lower_blend really shouldn't require 32-bit for logic ops...
* karolherbst should use unicode emoticons as function names more often
nchery has joined #dri-devel
<ajax> greek alphabet in math functions please
<karolherbst> good idea actually
<karolherbst> assert becomes 🔥
<karolherbst> we have those joke programming languages, but maybe there needs to be one where ANSI chars are invalid
<ajax> every character must be from a unicode codepoint > 0xff
<alyssa> karolherbst: do it in rust :p
<karolherbst> I am not sure if the world is ready for that yet
<alyssa> a C->NIR compiler written in Rust? how hard can it be?
<alyssa> famous last
<karolherbst> mhhh
<karolherbst> don't tempt me
<alyssa> You've been tempted! :-p
<karolherbst> how much is rust self hosted, if llvm is still written in C anyway
stuart has joined #dri-devel
Duke`` has joined #dri-devel
<rgallaispou> Hi. I'm struggling with gamma again...
<rgallaispou> In drm_atomic_uapi.c:384, what is the point of this test ? Is it only to test data alignment ? Because it won't pass any error to userland if the data is aligned according to 'expected_elem_size' but out of the struct (let it be 2048 + 8). This is my current issue, shown by kms_color@pipe-a-invalid-gamma-lut-sizes: the ioctl returns 0 when it should not. How does it go on Intel/AMD sides ?
<jekstrand> Ok, here's a fun question: If someone doesn't write to gl_FragData.w but blending is such that w doesn't matter, do they get well-defined results? I think the answer is yes, unfortunately.
<hch12907> alyssa: I think I had a C parser somewhere, written in rust... maybe we can repurpose that and make a C->NIR compiler, lol
* karolherbst doesn't think he is ready for linking inside nir yet
<karolherbst> heck, not even vtn would be ready
<alyssa> jekstrand: I think so. Why is that unfortunate?
<jekstrand> alyssa: Just more juggling we have to do in nir_lower_blend
<alyssa> right, okay
<jekstrand> I think the easy thing to do is just make the variable always match the format. Then we'll even get some dead-code action happening, maybe.
<vsyrjala> rgallaispou: sounds like you're not checking that the blob has the correct size
<alyssa> jekstrand: hm, alright
<alyssa> it might be nice to nir_lower_blend for radeonsi-style shader epilogs on AGX
<alyssa> but.. meh, tbh
<jekstrand> Sure
<jekstrand> Doesn't sound like a terrible idea
<alyssa> actually, jank from shader variants on AGX with AAA games sounds like a great problem to have, don't worry about it ;)
<alyssa> (and presumably that's all Vulkan content when someday asahivk is a thing)
MajorBiscuit has quit [Ping timeout: 480 seconds]
<rgallaispou> vsyrjala: it seems it resolves to drm_atomic_replace_property_blob_from_id(), but I don't see any call to the stm driver
<rgallaispou> vsyrjala: did you mean on a userland level or on the kernel side ?
<vsyrjala> kernel. driver needs to check that
hikiko_ has joined #dri-devel
<rgallaispou> vsyrjala: okay, I'll check that, thanks
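[editor's note] A userspace model of the check vsyrjala is describing, written as a sketch and not as kernel code. struct lut_entry stands in for an 8-byte LUT element, check_gamma_lut_length is a hypothetical name, and the 256-entry hardware LUT is an assumption derived from the "2048 + 8" figure in the question. The point is that the element-size test alone accepts any multiple of 8 bytes, so a driver-side check also has to compare the entry count with what the hardware supports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one gamma LUT element: four 16-bit fields, 8 bytes total. */
struct lut_entry {
    uint16_t red, green, blue, reserved;
};

_Static_assert(sizeof(struct lut_entry) == 8, "LUT entry must be 8 bytes");

/*
 * Returns 0 if the blob holds exactly hw_entries elements, -EINVAL otherwise.
 * The first test mirrors an element-size check; the second is the size check
 * the driver is expected to add.
 */
static inline int check_gamma_lut_length(size_t blob_length, size_t hw_entries)
{
    if (blob_length % sizeof(struct lut_entry) != 0)
        return -EINVAL;                  /* not a whole number of entries */
    if (blob_length / sizeof(struct lut_entry) != hw_entries)
        return -EINVAL;                  /* well-formed, but the wrong size */
    return 0;
}
```

A blob of 2048 + 8 bytes passes the first test and is only rejected by the second one.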
hikiko has quit [Ping timeout: 480 seconds]
<MrCooper> daniels: FWIW, assuming a fence fd becomes readable when the fence is signalled, it shouldn't require a "smart scheduler": IgnoreClient if fence isn't signalled yet, AttendClient when the fd becomes readable
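[editor's note] A sketch of the readiness test MrCooper's idea relies on, under his stated assumption that a fence fd becomes readable once the fence has signalled. fence_fd_is_signalled is a hypothetical helper; it does not call into the X server, and the comments only indicate where IgnoreClient and AttendClient would come in.

```c
#include <assert.h>
#include <poll.h>

/*
 * Non-blocking check whether fd is readable.
 * Returns 1 if readable (fence signalled, under the stated assumption),
 * 0 if not yet, -1 if poll() itself failed.
 */
static inline int fence_fd_is_signalled(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    assert(fd >= 0);
    int ret = poll(&pfd, 1, 0);          /* timeout 0: return immediately */
    if (ret < 0)
        return -1;
    /* 0: the server would keep the client ignored and watch the fd.
     * 1: the server could attend to the client again. */
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

In a real server the fd would be registered with the main loop instead of being polled on demand; the helper only shows the condition being waited for.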
gouchi has joined #dri-devel
<daniels> MrCooper: sure
gouchi has quit []
hikiko has joined #dri-devel
hikiko_ has quit [Ping timeout: 480 seconds]
<MrCooper> daniels: FWIW, the context for my question is https://gitlab.freedesktop.org/xorg/xserver/-/issues/1317
slattann has joined #dri-devel
<ajax> why are there two generated copies of vk_cmd_queue.h in my build directory
<ajax> and why do they have different content
devilhorns has quit []
<ajax> and, most importantly, why for me does lavapipe include the one that doesn't declare everything
slattann has quit [Remote host closed the connection]
<zmike> rm -r build/src/vulkan
alyssa has left #dri-devel [#dri-devel]
<daniels> MrCooper: NV being special again then
frieder has quit [Remote host closed the connection]
<jenatali> Huh... I think there's a double-close fd bug for Android native fences...
imirkin_ has joined #dri-devel
<jenatali> Oh, no I just got the semantics wrong, nevermind
nvishwa1 has quit [Read error: Connection reset by peer]
imirkin_ has quit [Quit: Leaving]
lynxeye has quit [Quit: Leaving.]
tjmercier has joined #dri-devel
krushia has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
Haaninjo has joined #dri-devel
gawin has joined #dri-devel
stuart has quit [Ping timeout: 480 seconds]
eukara has quit []
eukara has joined #dri-devel
apinheiro has quit [Ping timeout: 480 seconds]
<jenatali> zmike: What does has_alpha control?
<zmike> jenatali: whether the swapchain has alpha
<zmike> XRGB or ARGB basically
<jenatali> Ah, sure, yeah I don't see any reason to not always have alpha
rasterman has quit [Quit: Gettin' stinky!]
stuart has joined #dri-devel
<HdkR> robclark: sadly capacity only works if that is actually filled out. Also only gives you an idea, you still need to make a choice in big.bigger.biggest or small.smaller.smallest weirdo clustering setups :|
nchery has quit [Ping timeout: 480 seconds]
alyssa has joined #dri-devel
<alyssa> if an OpenCL shader gets a pointer to something on the stack, what does that look like in (optimized, lowered) NIR?
<alyssa> I guess load_scratch_base_ptr
apinheiro has joined #dri-devel
<karolherbst> alyssa: yes and no
<karolherbst> I think we have enough opt passes by now to resolve a lot of those things
<karolherbst> but yes.. if it ends up as function_temp memory, that gets lowered to scratch
<karolherbst> alyssa: I just don't think we end up with nir_load_scratch_base_ptr in CL
<alyssa> Hmm
<karolherbst> the base_ptr is only relevant for shader_calls as it seems
<alyssa> huh, ok
<karolherbst> for CL we just have a scratch space starting at 0 and the driver has to allocate that
<alyssa> that's better for Mali, I guess
<karolherbst> llvmpipe just mallocs :)
<karolherbst> on Nvidia we'd use local memory
<alyssa> though er how does that work
<karolherbst> like the same as for spilled memory
<karolherbst> ehh... spilled registers
<alyssa> yeah, but those aren't spilled to 0x0
<alyssa> ..
<karolherbst> the address doesn't matter
<alyssa> even if you take an address of it..?
<karolherbst> the only thing CL cares about is alignment of the address
<karolherbst> alyssa: the neat part is, sharing those pointers across invocations is just undefined behavior
<alyssa> ...oh, I see the trick you're doing now.
<alyssa> so even if the app does something cruel like *(&x[63] + y)
<karolherbst> anyway.. we have deref_cast to get the actual pointer value
<alyssa> it still just turns into load_scratch, never load_global
<karolherbst> yep
<alyssa> excellent
<alyssa> will do the easy thing then
<karolherbst> yeah
<karolherbst> just use whatever stuff you use for indirect arrays
mvlad has quit [Remote host closed the connection]
<karolherbst> and if you don't support scratch mem yet, just port your driver over to it :D
<alyssa> heh, we have scratch
<karolherbst> ahh
<karolherbst> excellent
<alyssa> but the hw likes to mangle the addresses for cache reasons
<karolherbst> then it should just work, no?
<karolherbst> yeah.. shouldn't matter
<alyssa> and at first blush it looks like that mangling needs to be disabled for CL
<alyssa> but yeah, ok
<karolherbst> as long as the alignment stays the same
<alyssa> Yep
<karolherbst> CL has strict rules though
<alyssa> each 16 byte chunk remains as-is
<karolherbst> so int16 is 0x80 aligned
<karolherbst> ehh
<karolherbst> long16
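[editor's note] A quick sketch of where the 0x80 figure comes from. OpenCL C aligns a vector type to its size in bytes, with 3-component vectors treated as 4-component ones, so long16 (16 components of 8 bytes) needs 128-byte alignment while int16 would need 0x40. The function name cl_vector_alignment is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Required alignment in bytes of an OpenCL C vector type with the given
 * component size and component count (1, 2, 3, 4, 8 or 16).
 */
static inline size_t cl_vector_alignment(size_t component_size,
                                         unsigned components)
{
    assert(components == 1 || components == 2 || components == 3 ||
           components == 4 || components == 8 || components == 16);
    if (components == 3)
        components = 4;                  /* 3-component vectors pad to 4 */
    return component_size * components;
}
```

For example, cl_vector_alignment(8, 16) gives 0x80 for long16 and cl_vector_alignment(4, 16) gives 0x40 for int16.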
<alyssa> grumble. guess the mangling goes.
<karolherbst> it should be fine though
<alyssa> though... maybe not..?
<karolherbst> I think if the kernel wants the address we simply use the offset
<alyssa> because the app can never get to the physical pointer, only the virtual pointer starting at 0, which is aligned?
<karolherbst> deref_cast (ssa_x) whatever
<karolherbst> and that's just casting the deref thing to the constant
<karolherbst> alyssa: yeah, I think so
<karolherbst> the nir shader doesn't know the physical pointer anyway
<karolherbst> you get the offset into load_scratch/store_scratch
<karolherbst> what you do with that is up to you
nchery has joined #dri-devel
<karolherbst> I don't think we even get a load_scratch_base_ptr at all
<karolherbst> alyssa: ahhhh.. I know why I never saw any nir_load_scratch_base_ptr
<karolherbst> using nir_address_format_32bit_offset_as_64bit, for temp memory :)
<karolherbst> if you'd use nir_address_format_64bit_global _then_ you'd get load_scratch_base_ptr
<alyssa> Sure, that works great for us :)
<karolherbst> yeah.. I don't know who even wants real pointers on temp mem
<jenatali> Intel does
<karolherbst> well.. except llvmpipe
<jenatali> That's why jekstrand added the scratch base ptrs IIRC
<karolherbst> jenatali: seems to work fine without it?
<jenatali> Or maybe it was only for making it work with generic pointers
<karolherbst> maybe
<karolherbst> ahh yeah
<karolherbst> I think that's it
<alyssa> generic pointers...?
<karolherbst> because you need to allow drivers to map it into global mem
<alyssa> why does CL do this to us
<karolherbst> so load_scratch_base_ptr is the pointer into _global_ mem of the scratch space
<jenatali> It's optional in 3.0 at least
<alyssa> jenatali: optional means wontfix! :p
<jenatali> Yeah until you find some app that needs it
<jenatali> Which I hope there aren't any?
<karolherbst> alyssa: I got CL C 2.0 kernels using generic pointers to compile without any of this mess though :D
<karolherbst> jenatali: luxmark 3.1
<karolherbst> but...
<karolherbst> nir was able to resolve all generics to their original types
<jenatali> Huh really? It uses generic?
<karolherbst> yeah
<jenatali> Interesting
<karolherbst> that's why we added that alu of cast optimization
<karolherbst> so we can optimize away NULL checks on generics
<karolherbst> well.. if NULL is passed as an arg that is
<karolherbst> anyway.. my hope is, that we can always resolve those... but I am sure that function calling will make that impossible
<karolherbst> or we duplicate...
<karolherbst> dunno
<karolherbst> not a fan of having to generate worse code, just because of generics
<karolherbst> jenatali: my mistake was to expose CL C 3.0 as the "default" language, turns out, some applications assume you support CL C 2.0 then :)
<karolherbst> and the spec specifically says to only do that if you support _all_ CL C 2.0 features
<jenatali> Ah, yeah that makes sense
<karolherbst> you can still expose it in the list property
<karolherbst> just not through that single value one
<karolherbst> CL_DEVICE_OPENCL_C_VERSION needs to be 1.2
<karolherbst> CL_DEVICE_OPENCL_C_ALL_VERSIONS can list 3.0
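To illustrate why the single-value query matters: CL_DEVICE_OPENCL_C_VERSION returns a string of the form "OpenCL C <major>.<minor> <vendor info>", and an application may gate features on it. The sketch below is an assumption about what such an application-side check could look like; both function names are hypothetical and no real application's code is being quoted:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the "OpenCL C <major>.<minor> <vendor info>"
 * string returned for CL_DEVICE_OPENCL_C_VERSION. Returns 1 on success. */
static int parse_cl_c_version(const char *s, unsigned *major, unsigned *minor)
{
    assert(s && major && minor);
    return sscanf(s, "OpenCL C %u.%u", major, minor) == 2;
}

/* Hypothetical application-side check of the kind described above:
 * anything reporting 2.0 or newer is treated as having all CL C 2.0
 * features, which is the assumption that breaks when 3.0 is reported
 * without the optional features. */
static int app_assumes_cl_c_2_0(const char *version_string)
{
    unsigned major = 0, minor = 0;
    if (!parse_cl_c_version(version_string, &major, &minor))
        return 0;
    return major >= 2;
}
```

Under this check a device reporting "OpenCL C 3.0 " would be assumed to support every CL C 2.0 feature, while one reporting "OpenCL C 1.2 " would not, which matches the advice to report 1.2 in the single value and list 3.0 only in CL_DEVICE_OPENCL_C_ALL_VERSIONS.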
<airlied> i think we will need generic addresses for sycl
<karolherbst> airlied: that's fine
<karolherbst> I don't claim support for generics, not even using that address mode, but it still works fine
<karolherbst> there are just really rare corner cases where that would be required
<karolherbst> like storing it into global mem and loading it later
<karolherbst> but I think for most applications just implementing functions with generic args we can probably wing it and hope it works out
<karolherbst> airlied: is there a sycl CTS or something btw?
<karolherbst> :D
<zmike> dcbaker: I'll have a couple more backports for the next rc
<zmike> will prob do them tomorrow morning before you get up
<karolherbst> alyssa: anyway.. once you get rusticl working, I'd be interested how much breaks :D
<karolherbst> my hope is that stuff simply passes, but...
<karolherbst> alyssa: btw.. I have a patch which uses an ubo for the input buffer
<karolherbst> airlied: we might want to do the same in clover and get rid of that input stuff :D
<karolherbst> ehh wait.. radeon...
<karolherbst> *sigh*
mszyprow has joined #dri-devel
<alyssa> what's the deal with radeon cl
<karolherbst> alyssa: it uses llvm directly
<karolherbst> like directly directly
<karolherbst> has to use the AMD ABI and stuff
<alyssa> yeah... why do we support that again?
<alyssa> :p
<karolherbst> because there is no other way
<karolherbst> ask airlied for details
<alyssa> :V
<alyssa> i'm better off not knowing
<karolherbst> I am not going to support anything besides nir anyway
<karolherbst> so...
<alyssa> and panfrost isn't going to support clover once rusticl is merged, I think ;)
<karolherbst> :D
<karolherbst> but I'd be really curious how well it works
<dcbaker> @zmike: sounds good
<karolherbst> I still want to wire it up with nouveau, but for that I need to fix multithreading
<karolherbst> actually.. let me try it with my patches and see what happens
MrCooper has quit [Ping timeout: 480 seconds]
<alyssa> karolherbst: ooi what are the blockers for rusticl in-tree?
<karolherbst> alyssa: mostly just reviews?
<karolherbst> I do have a bunch of fixes for random stuff in tree though
<karolherbst> there is a list of MRs
<karolherbst> but I think I need to create more
<alyssa> ah
<mlankhorst> danvet: ping?
<karolherbst> there are 161 commits, and only 95 do rusticl stuff
<alyssa> delight
<karolherbst> most of it is bumping texture/sampler view limits
<karolherbst> and some iris fixes
<karolherbst> we also need to fix llvm for conformance, but..
<alyssa> 22.3 then?
<karolherbst> maybe?
<danvet> mlankhorst, too late here, pls ping me again tomorrow ...
<karolherbst> though 22.2 should be possible
<karolherbst> we just need reviews
<karolherbst> alyssa: most of the stuff isn't really needed though.. I could do a run without any of those patches and see how bad it would be :D
cheako has joined #dri-devel
<alyssa> "rusticl: the CTS is a piece of shit"
<alyssa> maybe some git rebase needed too? :p
<karolherbst> no, that's intentional
<karolherbst> :D
<karolherbst> although I think we might get that fixed in the CTS
<karolherbst> there are other applications broken by it though
<karolherbst> it's all so terrible
<karolherbst> really hate that we have to do it like that
<karolherbst> yeah.. I guess I'll change that at some point
nchery has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
nchery has joined #dri-devel
MrCooper has joined #dri-devel
<jekstrand> jenatali: base pointers are for making it work with generic pointers and for making it work with ray-tracing.
<karolherbst> jekstrand: you need it for ray tracing? :( sounds awful
<alyssa> is unaligned access with load/store_scratch defined?
<karolherbst> nope
<alyssa> excellent
<karolherbst> at least not inside llvmpipe as we figured out yesterday :)
<jekstrand> karolherbst: Yup. RT kernels do scratch totally differently for $REASONS
Haaninjo has quit [Quit: Ex-Chat]
<karolherbst> alyssa: anyway.. you can assume that you'll get correct alignments for everything
<karolherbst> if not, we messed up
<jekstrand> Well, actually, the reason is really simple: Scratch offsets are assigned per logical invocation, not per physical thread because invocations may move around between threads as shaders are dispatched, rays are traced, continuations happen, etc.
<jekstrand> Ok, maybe that's not simple. (-:
<karolherbst> sounds horrible
<jekstrand> It's a pretty straightforward consequence of the API
<karolherbst> I bet it was sure fun to implement all of that
<alyssa> raytracing sounds awful
<jekstrand> Eh, it's kinda fun, actually.
<karolherbst> implementing OpenCL is also kind of fun :P
* alyssa fixes piles of spilling bugs on Valhall
<karolherbst> yay
<karolherbst> are you running luxmark yet?
mszyprow has quit [Ping timeout: 480 seconds]
rasterman has joined #dri-devel
stuart has quit [Ping timeout: 480 seconds]
<alyssa> no, ES3.1 cts
danvet has quit [Ping timeout: 480 seconds]
anarsoul has quit [Ping timeout: 480 seconds]
ppascher has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
heat has joined #dri-devel
<anholt> danylo: does gfxreconstruct have a way to look at the state (particularly image contents) along the way of rendering a frame?
rasterman has joined #dri-devel
iive has joined #dri-devel
stuart has joined #dri-devel
rasterman has quit [Quit: Gettin' stinky!]
rasterman has joined #dri-devel
ppascher has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
<danylo> anholt: nope, no way to look at any state there
nchery has joined #dri-devel
<danylo> only way is to make a renderdoc capture and inspect it there, which could be tricky when you're trying to debug a hang...
fxkamd has quit []
rasterman has quit [Quit: Gettin' stinky!]
<anholt> luckily not a hang on this one, just the first 2kb of gfxbench vk-5-normal's screen being corrupted.
apinheiro has quit [Quit: Leaving]
maxzor has quit [Ping timeout: 480 seconds]
<HdkR> Is there any way to get wayland to not autodetect monitor/output removal like X?
<daniels> HdkR: ask your compositor
<HdkR> hmmm
<daniels> Wayland only does what it’s told to
<HdkR> Sadly I don't think sway has a swaymsg command to disable autodetect
lemonzest has quit [Quit: WeeChat 3.4]
<HdkR> Oh well, I'll wait for that part of the ecosystem to mature some more :)
pcercuei has quit [Quit: dodo]
eukara has quit []
<Ristovski> karolherbst: Where can I find progress on radeonsi support for rusticl? In the draft comments you mentioned that airlied is working on that part?
<karolherbst> Ristovski: dunno.. but talking with airlied on this made it sound like it would take a while, because of how super messy AMD's way of doing compute is
<Ristovski> Heh, sounds about right
<karolherbst> they have their own kernel ABI and stuff
<Ristovski> Hmm, as in amdkfd?
<karolherbst> no, shader ABI
Kayden has quit [Quit: go to office]
<Ristovski> Aaah, that makes more sense
<karolherbst> so the idea would be to wire up ACO or something, but that also sounds like a ton of work
* Ristovski reads discussion from logs
alyssa has left #dri-devel [#dri-devel]
icecream95 has joined #dri-devel
icecream95 has quit []
icecream95 has joined #dri-devel
eukara has joined #dri-devel
mclasen has quit []
mclasen has joined #dri-devel
<karolherbst> anybody ever used phoronix-test-suite with their own compiled binaries? I think it just cleans the environment making it a pita to use
<karolherbst> probably
<dschuermann> will definitely take some time to land. we first have to get rid of the remaining radv bits in aco
<Ristovski> Hmm, does it only support recent GFX or does it go all the way back to GCN1?
<karolherbst> yeah.. sounds like quite the project
<karolherbst> Ristovski: probably the same hardware radv runs on
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
<Ristovski> I see, that should cover GCN1 as well then
<Ristovski> (asking since I saw MCBP mentioned and that is GFX8+)
<karolherbst> "The test run did not produce a result." *sigh*
<karolherbst> ahh works with system bins
<karolherbst> "fun"
<karolherbst> well.. it doesn't after all
mdroper has joined #dri-devel
<karolherbst> "The test run ended quickly" yeah well...
<karolherbst> wow.. it does crash the GPU context
anarsoul has joined #dri-devel
<karolherbst> ehh "write: 512 GB in 736.9 ms: 694.8 GB/s" I have questions
gawin has quit [Ping timeout: 480 seconds]
tursulin has quit [Read error: Connection reset by peer]
<Ristovski> lol
<karolherbst> either we are that good or something is fishy
<karolherbst> I suspect we are not handling 64 bit sized things all that well
<karolherbst> "Test buffers will use GB" well..
<karolherbst> what crappy code is that
morphis has quit [Ping timeout: 480 seconds]
<karolherbst> ahh yeah.. it passes a null buffer in? wtf
morphis has joined #dri-devel
<Ristovski> unrelated PSA: https://github.com/iovisor/bpftrace is seriously OP, I just used it as a no-mess `initcall_debug` alternative and it's probably much less overhead as well. Possibilities are truly endless *goes back to profiling random crap*
<karolherbst> jekstrand: I can trigger a "[drm] rusticl queue t[1648304 context reset due to GPU hang" reliably :(
<karolherbst> something with loops
<karolherbst> like.. _long_ loops
<karolherbst> like millions of iterations
<karolherbst> "for (block = 0; block < ((1024*1024*1024/sizeof(ulong))/32); block += 256)"
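As a worked check of the quoted loop bounds, here is a small C sketch that evaluates the limit and counts the outer iterations. It assumes `ulong` is 8 bytes (as it is in OpenCL C) and models it with `uint64_t`; the function names are made up for illustration. The loop body is not shown in the log, so this says nothing about inner loops or the total work per work-item:

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound of the quoted loop: (1 GiB / sizeof(ulong)) / 32,
 * with OpenCL C's 64-bit ulong modelled as uint64_t. */
static uint64_t quoted_loop_limit(void)
{
    return (1024ull * 1024 * 1024 / sizeof(uint64_t)) / 32;
}

/* Counts how many times `for (b = 0; b < limit; b += step)` runs its body. */
static uint64_t count_iterations(uint64_t limit, uint64_t step)
{
    assert(step > 0);
    uint64_t count = 0;
    for (uint64_t block = 0; block < limit; block += step)
        count++;
    return count;
}
```

Here `quoted_loop_limit()` is 4194304 and `count_iterations(quoted_loop_limit(), 256)` is 16384, so the outer loop by itself runs 16384 times; any larger iteration count would have to come from whatever the body does.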
lumag_ has quit [Ping timeout: 480 seconds]
jhli has quit [Quit: ZNC 1.8.2 - https://znc.in]
rkanwal has quit [Ping timeout: 480 seconds]
jhli has joined #dri-devel
Kayden has joined #dri-devel
<karolherbst> yeah.. just iterating more makes it crash
<heat> Ristovski, ebpf is singlehandedly the best and worst thing in the linux kernel :D
<heat> but yeah, pretty pretty nifty. especially on networking stuff