ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
JohnnyonFlame has joined #dri-devel
Kayden has quit [Quit: Leaving]
tursulin has quit [Read error: Connection reset by peer]
Kayden has joined #dri-devel
pcercuei has quit [Quit: dodo]
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
iive has quit []
rossy has quit [Server closed connection]
rossy has joined #dri-devel
mhenning has joined #dri-devel
Kayden has quit [Quit: home]
adjtm is now known as Guest66
adjtm has joined #dri-devel
dabaiste^ has quit [Ping timeout: 480 seconds]
Guest66 has quit [Ping timeout: 480 seconds]
jljusten has quit [Server closed connection]
jljusten has joined #dri-devel
Company has quit [Quit: Leaving]
MTCoster has quit [Server closed connection]
MTCoster has joined #dri-devel
columbarius has joined #dri-devel
co1umbarius has quit [Ping timeout: 480 seconds]
markyacoub has quit [Server closed connection]
markyacoub has joined #dri-devel
dianders has quit [Server closed connection]
dianders has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
camus has joined #dri-devel
mhenning has quit [Quit: mhenning]
tales_ has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
JTL has quit [Server closed connection]
JTL has joined #dri-devel
Daanct12 has joined #dri-devel
jstultz has quit [Server closed connection]
jstultz has joined #dri-devel
cwabbott has quit [Server closed connection]
cwabbott has joined #dri-devel
nikitalita48 has quit [Server closed connection]
nikitalita48 has joined #dri-devel
CosmicPenguin has quit [Server closed connection]
CosmicPenguin has joined #dri-devel
TD-Linux has quit [Server closed connection]
TD-Linux has joined #dri-devel
Lightsword has quit [Server closed connection]
Lightsword has joined #dri-devel
jewins has quit [Ping timeout: 480 seconds]
ybogdano has quit [Ping timeout: 480 seconds]
cengiz_io has quit [Server closed connection]
cengiz_io has joined #dri-devel
radii has quit [Server closed connection]
radii has joined #dri-devel
robclark has quit [Server closed connection]
robclark has joined #dri-devel
Kayden has joined #dri-devel
seanpaul has quit [Server closed connection]
seanpaul has joined #dri-devel
bbrezill1 has joined #dri-devel
mripard_ has joined #dri-devel
narmstrong has quit [Server closed connection]
narmstrong has joined #dri-devel
mripard has quit [Ping timeout: 480 seconds]
bbrezillon has quit [Ping timeout: 480 seconds]
Lightning has quit [Server closed connection]
slattann has joined #dri-devel
Lightning has joined #dri-devel
zzag has quit [Server closed connection]
zzag has joined #dri-devel
lemonzest has joined #dri-devel
Guest1795 has quit [Server closed connection]
robink has quit [Server closed connection]
robink has joined #dri-devel
Sachiel has quit [Server closed connection]
Sachiel has joined #dri-devel
tfiga has quit [Server closed connection]
tfiga has joined #dri-devel
dschuermann has quit [Server closed connection]
dschuermann has joined #dri-devel
slattann has quit []
slattann has joined #dri-devel
fxkamd has quit []
daniels has quit [Server closed connection]
daniels has joined #dri-devel
rg3igalia has quit [Server closed connection]
rg3igalia has joined #dri-devel
arnd has quit [Server closed connection]
arnd has joined #dri-devel
steev has quit [Server closed connection]
steev has joined #dri-devel
haasn has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
zmike has quit [Server closed connection]
zmike has joined #dri-devel
haasn has joined #dri-devel
neoXite__ has quit [Server closed connection]
neoXite__ has joined #dri-devel
aravind has joined #dri-devel
fxkamd has joined #dri-devel
maxzor has joined #dri-devel
Daanct12 has quit [Read error: Connection reset by peer]
Daanct12 has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
zf has quit [Server closed connection]
zf has joined #dri-devel
Karyon has quit [Server closed connection]
Karyon has joined #dri-devel
kurufu has quit [Server closed connection]
kurufu has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
lemes has quit [Server closed connection]
lemes has joined #dri-devel
demarchi has quit [Server closed connection]
demarchi has joined #dri-devel
nchery has joined #dri-devel
SolarAquarion has quit [Server closed connection]
camus has quit [Ping timeout: 480 seconds]
slattann has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
jolan has quit [Server closed connection]
jolan has joined #dri-devel
camus has joined #dri-devel
fxkamd has quit []
exit70 has quit [Server closed connection]
exit70 has joined #dri-devel
halfline has quit [Server closed connection]
enilflah has joined #dri-devel
SolarAquarion has joined #dri-devel
fxkamd has joined #dri-devel
tales_ has quit []
jrayhawk has quit [Server closed connection]
jrayhawk has joined #dri-devel
austriancoder has quit [Server closed connection]
austriancoder has joined #dri-devel
rib___ has quit [Server closed connection]
rib___ has joined #dri-devel
mmx_in_orbit has quit [Server closed connection]
mmx_in_orbit has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
<graphitemaster> Does anyone know a reliable way to detect if something is running inside nsight?
Duke`` has joined #dri-devel
slattann has quit []
slattann has joined #dri-devel
<HdkR> graphitemaster: Check for NSIGHT_LAUNCHED or NVTX_INJECTION64_PATH environment variables.
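[ed. note] A minimal sketch of the check HdkR suggests. The variable names come straight from the channel and are assumptions, not a documented Nsight API; behavior on Windows or across Nsight versions is unverified:

```python
import os

# Marker variables suggested in the channel; assumptions, not documented API.
NSIGHT_ENV_VARS = ("NSIGHT_LAUNCHED", "NVTX_INJECTION64_PATH")

def running_under_nsight(environ=os.environ):
    """Best-effort check for an Nsight injection environment."""
    return any(var in environ for var in NSIGHT_ENV_VARS)
```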
fxkamd has quit []
sh_zam has quit [Server closed connection]
sh_zam has joined #dri-devel
angular_mike_____ has quit [Server closed connection]
angular_mike_____ has joined #dri-devel
dolphin has quit [Server closed connection]
dolphin has joined #dri-devel
jhli has quit [Server closed connection]
jhli has joined #dri-devel
mattst88 has quit [Server closed connection]
mattst88 has joined #dri-devel
itoral has joined #dri-devel
tchar has quit [Server closed connection]
tchar has joined #dri-devel
hwentlan____ has quit [Server closed connection]
hwentlan____ has joined #dri-devel
ZeZu has quit [Server closed connection]
mattrope has quit [Read error: Connection reset by peer]
ZeZu has joined #dri-devel
siqueira_ has quit [Server closed connection]
aswar002 has quit [Server closed connection]
aswar002 has joined #dri-devel
craftyguy has quit [Server closed connection]
craftyguy has joined #dri-devel
siqueira has joined #dri-devel
cyrozap has quit [Server closed connection]
cyrozap has joined #dri-devel
abws_ has quit [Ping timeout: 480 seconds]
flto has quit [Server closed connection]
mbrost has quit [Remote host closed the connection]
flto has joined #dri-devel
nchery has joined #dri-devel
danvet has joined #dri-devel
sagar__ has quit [Server closed connection]
sagar__ has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
maxzor has quit [Ping timeout: 480 seconds]
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
mclasen has quit [Server closed connection]
mclasen has joined #dri-devel
Kayden has joined #dri-devel
K`den has quit [Read error: Connection reset by peer]
K`den has joined #dri-devel
Kayden has quit [Read error: Connection reset by peer]
K`den is now known as Kayden
imirkin has quit [Server closed connection]
imirkin has joined #dri-devel
zackr has quit [Server closed connection]
zackr has joined #dri-devel
camus1 has joined #dri-devel
camus has quit [Ping timeout: 480 seconds]
LexSfX has quit [Server closed connection]
LexSfX has joined #dri-devel
DanaG_ has quit [Server closed connection]
DanaG_ has joined #dri-devel
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
frieder has joined #dri-devel
shashank_sharma has quit [Ping timeout: 480 seconds]
slattann has quit []
mvlad has joined #dri-devel
xyene has quit [Server closed connection]
xyene has joined #dri-devel
remexre has quit [Server closed connection]
remexre has joined #dri-devel
glisse has quit [Server closed connection]
glisse has joined #dri-devel
slattann has joined #dri-devel
jkrzyszt has joined #dri-devel
sagar__ has quit [Remote host closed the connection]
sagar__ has joined #dri-devel
ogabbay has quit [Server closed connection]
ogabbay has joined #dri-devel
ezequielg has quit [Server closed connection]
ezequielg has joined #dri-devel
lileo___ has quit [Server closed connection]
lileo___ has joined #dri-devel
quantum5 has quit [Server closed connection]
quantum5 has joined #dri-devel
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
pochu has joined #dri-devel
pnowack has joined #dri-devel
pnowack has quit [Remote host closed the connection]
pnowack has joined #dri-devel
hfink has quit [Server closed connection]
hfink has joined #dri-devel
macromorgan has quit [Server closed connection]
macromorgan has joined #dri-devel
gawin has joined #dri-devel
pnowack has quit [Quit: pnowack]
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
slattann has quit [Ping timeout: 480 seconds]
dabaiste^ has joined #dri-devel
tursulin has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
ifreund has quit [Server closed connection]
ifreund has joined #dri-devel
gpiccoli has quit [Server closed connection]
jessica_24 has quit [Server closed connection]
gpiccoli has joined #dri-devel
jessica_24 has joined #dri-devel
ahajda has joined #dri-devel
jbarnes has quit [Server closed connection]
jbarnes has joined #dri-devel
anholt has quit [Server closed connection]
anholt has joined #dri-devel
rgallaispou has joined #dri-devel
MajorBiscuit has joined #dri-devel
sarnex has quit [Server closed connection]
sarnex has joined #dri-devel
lynxeye has joined #dri-devel
MajorBiscuit has quit []
anarsoul has quit [Server closed connection]
anarsoul has joined #dri-devel
andrey-konovalov has quit [Server closed connection]
sneil has quit [Server closed connection]
sneil has joined #dri-devel
lemonzest has joined #dri-devel
andrey-konovalov has joined #dri-devel
rpigott has quit [Server closed connection]
rpigott has joined #dri-devel
linearcannon has quit [Server closed connection]
linearcannon has joined #dri-devel
aravind has quit [Read error: Connection reset by peer]
rgallaispou has quit [Read error: Connection reset by peer]
rkanwal has joined #dri-devel
shashanks has joined #dri-devel
MajorBiscuit has joined #dri-devel
Daaanct12 has joined #dri-devel
pcercuei has joined #dri-devel
slattann has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
`join_subline has quit [Server closed connection]
`join_subline has joined #dri-devel
dviola has quit [Server closed connection]
dviola has joined #dri-devel
eukara has quit [Server closed connection]
eukara has joined #dri-devel
shashanks has quit [Ping timeout: 480 seconds]
rgallaispou has joined #dri-devel
shankaru has joined #dri-devel
Daanct12 has joined #dri-devel
pcercuei has quit [Quit: brb]
Daaanct12 has quit [Ping timeout: 480 seconds]
Daaanct12 has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
psii has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
nchery is now known as Guest120
nchery has joined #dri-devel
<graphitemaster> HdkR, does this also work on Windows
<HdkR> Idunno
<HdkR> Give it a whirl. I'm not a Windows dev.
Guest120 has quit [Ping timeout: 480 seconds]
flacks has quit [Quit: Quitter]
flacks has joined #dri-devel
<jani> looking for good ideas to implement HDMI v2.1a HF-EEODB https://lore.kernel.org/r/8735j9j7vd.fsf@intel.com
JohnnyonFlame has joined #dri-devel
<jani> basically HF-EEODB says, set the extension count in base edid block to 1, but the *real* extension count is in the HF-EEODB data block
<jani> if you only have a struct edid pointer in kernel, the memory allocated for it depends on whether the allocator was HF-EEODB aware
<jani> even if we add helpers to determine the EDID size and extension count, I can't see a way around reviewing *all* EDID usage, transfer, allocation, everything, across *all* drivers
<jani> unless we modify the base block extension count in a HF-EEODB aware drm_get_edid()... but I fear that might have userspace implications too
<jani> vsyrjala: ajax: airlied: ^
frieder has quit [Ping timeout: 480 seconds]
<emersion> please don't mutate the EDID blob exposed to user-space
<daniels> ^
gawin has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
frieder has joined #dri-devel
<jani> emersion: daniels: ack. (though we kinda do already by throwing out the edid blocks that are invalid... but maybe that's a thing on displays of the distant past mostly)
slattann has quit []
aravind has joined #dri-devel
<jani> I'm kind of tempted to investigate adding a struct drm_edid that wraps around the raw struct edid to contain this meta info
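[ed. note] jani's problem in sketch form: the count a naive consumer reads from the base block disagrees with the real one. A rough Python model; the HF-EEODB offsets here (first CTA data block of the first extension block: tag byte 0xE2, extended tag 0x78, count in the next byte) are my reconstruction of the spec, not something stated in the channel:

```python
def edid_extension_count(edid: bytes) -> int:
    """Extension count as an HF-EEODB-aware consumer would compute it.

    Assumed layout (my reconstruction, may be off): the base block keeps
    its count at byte 126; HF-EEODB sets that byte to 1 and stores the
    real count in the first CTA data block of the first extension block,
    encoded as 0xE2 (tag 7, length 2), 0x78 (extended tag), count.
    """
    base_count = edid[126]
    if base_count == 1 and len(edid) >= 256:
        ext = edid[128:256]
        if ext[0] == 0x02 and ext[4] == 0xE2 and ext[5] == 0x78:
            return ext[6]          # the *real* extension count
    return base_count
```

A consumer that allocates `(1 + edid[126]) * 128` bytes under-allocates exactly when these two numbers disagree, which is the review problem jani describes.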
itoral_ has joined #dri-devel
itoral has quit [Ping timeout: 480 seconds]
thellstrom has joined #dri-devel
rgallaispou has quit [Read error: Connection reset by peer]
Lynne has quit [Quit: Lynne]
Lynne has joined #dri-devel
rasterman has joined #dri-devel
itoral_ has quit [Remote host closed the connection]
itoral has joined #dri-devel
jkrzyszt_ has joined #dri-devel
jkrzyszt has quit [Remote host closed the connection]
<dj-death> all the test runners are failing on intel : https://gitlab.freedesktop.org/mesa/mesa/-/jobs/20101180
rkanwal has quit [Read error: Connection reset by peer]
rkanwal has joined #dri-devel
<dj-death> looks like a network issue
lkw has joined #dri-devel
<daniels> it's not
<daniels> export CI_COMMIT_TITLE='iris: don't synchronize BO for batch decoding'
<daniels> that's why bash is dying in a cacophony of weird errors
<daniels> oh ffs
<dj-death> arg...
<dj-death> daniels: nicely spotted
<daniels> sorry about that
alarumbe has quit [Ping timeout: 480 seconds]
<dj-death> s/don't/do not/ ;)
agx has quit [Read error: Connection reset by peer]
<daniels> yeah, until the commit to fix lands
<daniels> it looks like some Debian dependencies changed so bash was no longer explicitly installed
<daniels> and rather than installing it, cristicc decided to try to POSIX-ify the script instead
<daniels> so he's going back to just installing bash
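[ed. note] The failure mode, for reference: the commit title was spliced into a single-quoted shell assignment, and the apostrophe in "don't" ended the string early. A hedged sketch of generating such a line safely, using Python's shlex.quote (the actual CI templating is GitLab's, not this helper):

```python
import shlex

def export_line(name: str, value: str) -> str:
    """Build an 'export NAME=value' shell line that survives any title."""
    return f"export {name}={shlex.quote(value)}"

# The naive form that broke the runners:
#   export CI_COMMIT_TITLE='iris: don't synchronize BO for batch decoding'
# The apostrophe closes the single-quoted string early.
title = "iris: don't synchronize BO for batch decoding"
safe = export_line("CI_COMMIT_TITLE", title)
```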
itoral has quit []
Daanct12 has joined #dri-devel
Daaanct12 has quit [Read error: Connection reset by peer]
OftenTimeConsuming has quit []
<marex> mripard_: hey, can you please review the remaining two icn6211 patches, so we can wrap this up ?
Thymo has quit [Ping timeout: 480 seconds]
OftenTimeConsuming has joined #dri-devel
nchery has quit [Remote host closed the connection]
nchery has joined #dri-devel
nchery has quit [Remote host closed the connection]
nchery has joined #dri-devel
Thymo has joined #dri-devel
jewins has joined #dri-devel
rgallaispou has joined #dri-devel
alyssa has joined #dri-devel
<alyssa> kusma: Hey, wanna go deleting TGSI caps? :-D
tobiasjakobi has joined #dri-devel
<kusma> alyssa: Uuuh, I mean... Yeah, but I feel like this is a trap.
<alyssa> TGSI_{LDEXP, DROUND, DFREXP_DLEDEXP}_SUPPORTED
<kusma> I haven't done the shader-caps yet
<alyssa> all of these are gating GLSL IR lowerings
<alyssa> which we already have NIR lowerings for
<kusma> Yeah, but we need all drivers to go via NIR first, and I don't think that's the case yet?
<alyssa> grumble
* alyssa wonders who's at the intersection of "pure TGSI path" and ES3.1+
<alyssa> Nouveau, I guess
* alyssa adds to issue graveyard
<kusma> Anything using gallivm or tgsi exec seems to prefer TGSI also
<kusma> LLVMpipe has an explicit switch for this, probably time to remove that one soonish ;)
* alyssa opened #6196 and added it to the graveyard
<alyssa> :q
<kusma> Yeah, !8044 is the MR to unblock this.
tobiasjakobi has quit []
shashanks has joined #dri-devel
alarumbe has joined #dri-devel
* alyssa sighs
<alyssa> Someday :)
Thymo_ has joined #dri-devel
Thymo has quit [Ping timeout: 480 seconds]
sdutt has joined #dri-devel
kts has joined #dri-devel
shashank_sharma has joined #dri-devel
Thymo has joined #dri-devel
psi has joined #dri-devel
shashanks has quit [Ping timeout: 480 seconds]
psii has quit [Remote host closed the connection]
Thymo_ has quit [Ping timeout: 480 seconds]
<kusma> alyssa: And what a glorious day it will be!
<kusma> At that point, we can even remove all TGSI caps, and make them options for nir_to_tgsi instead!
Thymo has quit [Ping timeout: 480 seconds]
psi has left #dri-devel [#dri-devel]
psii has joined #dri-devel
<alyssa> :-D
Thymo has joined #dri-devel
shashank_sharma is now known as shashanks
mattrope has joined #dri-devel
iive has joined #dri-devel
<karolherbst> ./build/test_conformance/images/clReadWriteImage/test_cl_read_write_images 1D: PASSED 42 of 42 sub-tests. :3
masush5[m] has joined #dri-devel
<alyssa> karolherbst: congrats!
<karolherbst> alyssa: sadly nothing besides 1D images work and I don't know exactly why that is :D
<alyssa> strides
<alyssa> tiling
<alyssa> compression
<karolherbst> something something
<karolherbst> I am sure it's strides though
<karolherbst> or that would be my first guess
<karolherbst> ahh... I know
<karolherbst> usize to u16 conversion fails, because clients are not required to init all fields of a struct...
<karolherbst> ehh wait
<karolherbst> that's not it
<karolherbst> and would be u32 anyway
<zmike> kusma: tgsi will stay in llvmpipe until softpipe is removed
<zmike> as the draw module also still uses it
<alyssa> who are we keeping softpipe around for again
<zmike> alternatively rewrite softpipe to use nir also, I suppose
<alyssa> oh right obscure archs that don't have llvmpiep nvm
<alyssa> *llvm
<karolherbst> alyssa: those exist?
<zf> is there no interest in having a conformant software rasterizer, then?
<zf> I thought that was the purpose of softpipe
<alyssa> llvmpipe is conformant IIRC
<karolherbst> softpipe is more conformant than llvmpipe?
<karolherbst> since when
<alyssa> softpipe is llvmpipe but super slow and without LLVM if you like that kinda thing
<zf> my impression was that llvmpipe intentionally violated the spec in some ways for the sake of performance
<alyssa> that was swrast, which has been deleted
<karolherbst> it passes the CTS
<karolherbst> alyssa: you mena swr
<karolherbst> *mean
<alyssa> karolherbst: maybe that one too? :-p
<karolherbst> or.... ehh.. swrast was classic, no?
<alyssa> ye
<karolherbst> anyway, llvmpipe is conformant
<zmike> llvmpipe is GL 4.5 conformant
<zmike> not ES
<karolherbst> in the end it doesn't matter
<zf> had to look this up, but at least in the past it needed GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
<zf> to be fully conformant
<zf> also GALLIVM_PERF=no_filter_hacks
<karolherbst> zf: hardware is also not perfect
<zf> well, sure, but isn't perfection kind of the point of a reference rasterizer? :D
<karolherbst> but I think the perf opts are disabled by default
<karolherbst> zf: what's the point of software rast if it's unusably slow
alyssa has left #dri-devel [#dri-devel]
<karolherbst> reference impl is kind of a myth though in GL and Vk
<karolherbst> nobody needs it and nobody cares
<zf> it can be quite useful to run tests against it
<karolherbst> and what do you want to test with it?
<karolherbst> all games have optimized paths for vendors anyway
<karolherbst> so testing that against swrast is pointless
<karolherbst> as the vendor specific paths is something you'll need to test anyway
<zf> well, I work on wine, and we try to avoid vendor-specific paths
<karolherbst> and then you can just skip swrast because that would mean 4 rounds of testing instead of just 3
<karolherbst> or 2 if you ignore intel
<zf> there's been plenty of instances where testing against swrast reveals something interesting
<karolherbst> zf: yeah well.. which isn't true
<karolherbst> ohh sure, it might reveal something interesting
<daniels> Mesa CI does tons of conformance testing of and on top of llvmpipe
<zf> swrast also has the nice feature that it's unlikely to cause a GPU reset and break one's whole desktop
<daniels> it does need those options, as you say, but it is fine if you have them
<karolherbst> zf: sure, but then some conformant sw impl is fully enough
<karolherbst> I specifically said "reference impl"
<karolherbst> that is what nobody needs
<karolherbst> ehh meant
Haaninjo has joined #dri-devel
<karolherbst> anyway, "reference implementations" are usually something the spec author(s) write in order to show how to implement things
<zf> sure
<zf> fair enough, I'm not arguing for a reference implementation, but having a fully conformant one is still quite useful IME
<karolherbst> sure, and llvmpipe is fully conformant
<zf> not reference implementation qua reference, I should say
<karolherbst> well.. at least so far the CTS checks
rgallaispou has quit [Read error: Connection reset by peer]
mbrost has joined #dri-devel
<karolherbst> but speaking about llvmpipe.. I would be curious how llvmpipe makes use of vectorized instructions if at all... but I can also see that SSE/AVX/etc.. are lacking instructions to do that properly
gio_ has quit []
fxkamd has joined #dri-devel
camus1 has quit [Ping timeout: 480 seconds]
lkw has quit [Quit: Lost terminal]
mq47 has joined #dri-devel
<ajax> karolherbst: llvmpipe goes out of its way to rearrange pixels into SoA form so it can vectorize
Lucretia has quit []
<ajax> except for the "linear" path but that uses sse2 too
mq47 has quit []
<ajax> it's about as clever a swrast as you could want, the biggest performance challenge with llvmpipe is your cpu's utter lack of bandwidth
<clever> SoA?
<ajax> "structure of arrays". four input RGBA pixels get rearranged into RRRR GGGG BBBB AAAA
<clever> ahhh right
<ajax> as opposed to the AoS of the input form
<clever> at least on the rpi gpu, it can accept data in either form i believe, the vector part of the 3d core is surprisingly flexible
<clever> but for other vector parts of the chip, SoA may be better
<clever> so which is better can vary, depending on the consumer
<DanaG_> Aside from my amdgpu + ast issues on my x86 server board, on my arm64 machine I've also had some odd lockups with traces that mention CEC (hdmi-cec)... but maybe that's more a kernel bug than a DRI bug? https://dpaste.com//BM8XND5QF
<ajax> yeah. iirc the SoA form is better for real 3d tasks since you spend more of your time in the meat of the shader, and the linear path stays in AoS because 2.5D compositor kinds of tasks don't do a whole lot of math in the fs so the swizzle in and out of SoA becomes meaningful overhead
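[ed. note] The swizzle ajax describes, shown as plain data movement. llvmpipe of course emits this as generated LLVM IR over SIMD registers, not Python lists; this only illustrates the RRRR GGGG BBBB AAAA rearrangement and the inverse shuffle whose cost makes the linear path stay AoS:

```python
def aos_to_soa(pixels):
    """[(r, g, b, a), ...] -> ([r, ...], [g, ...], [b, ...], [a, ...]).

    Each output list maps to one SIMD register holding the same channel
    from several pixels (RRRR, GGGG, ...).
    """
    r, g, b, a = zip(*pixels)
    return list(r), list(g), list(b), list(a)

def soa_to_aos(r, g, b, a):
    """The inverse shuffle, paid again on the way out of the shader."""
    return list(zip(r, g, b, a))
```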
rgallaispou has joined #dri-devel
<DanaG_> I tried amdgpu.dpm=0, but that disables the fan control, annoyingly. But amdgpu.bapm=0 seems to maybe help.
<clever> ajax: for the 3d core of the rpi, it always loads sets of 16 vertex into the vector core, but it has a wide range of modes, that can accept either
<ajax> (here speaking just of llvmpipe, i don't pretend to know the details of most modern gpus at this level)
<ajax> hey, speaking of vc4 machines
<clever> in text form, refering to a given row+col in the matrix, in 8bit packed mode, it will turn an uint32_t[4] into an uint8_t[16], breaking the bytes up
<clever> but you can also do 8bit laned, where you give a row and a byte, then it will extract the specified byte (0-3) from all 16 values in an uint32_t[16]
Lucretia has joined #dri-devel
<clever> so you could have an array of struct { uint8_t a,b,c,d; }, then you load it into the VPM, and use 8bit laned mode to read 16 a's
<ajax> clever: do you happen to have one handy and would you mind running x11perf on it a couple of times for me? i have a libX11 patch that seems like an obvious win on this machine but it's a 6-core i7 so i kinda want to check the overhead on a less potent machine
<clever> and 16bit laned mode lets you do the same with struct { uint16_t a; uint8_t b,c; }
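[ed. note] A toy model of the 8-bit laned read clever describes: byte lane n of each packed 32-bit word. The real VC4 VPM has many more addressing modes and a 16-element geometry, so treat this as an illustration only:

```python
def laned_read_u8(words, lane):
    """Byte `lane` (0-3, little-endian) of each packed 32-bit word.

    For an array of struct { uint8_t a, b, c, d; } packed into uint32s,
    lane 0 reads all the a's, lane 1 all the b's, and so on.
    """
    assert 0 <= lane <= 3
    return [(w >> (8 * lane)) & 0xFF for w in words]
```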
<clever> ajax: i do have the entire model range, pi0, pi1, pi2, pi3, pi4, pi400, and pi02!
<ajax> aces
<ajax> and then 'Xvfb -ac -noreset -screen 0 1024x768x24 :77 &'
<clever> in framebuffer mode? that driver is generally deprecated
<ajax> and then 'DISPLAY=:77 x11perf -noop -query -dot -rect{1,10} -{putimage,shmput}10', twice, once with that patch and once without
<ajax> enh, fb mode is always something we have to care about because there's always like the efifb case where i915 doesn't support your gpu yet
<ajax> if you want to the same test on hardware server that's fine too, just let me know which kind you're doing
<ajax> back story is XInitThreads is optional for a bunch of historical reasons but among them that the locking overhead would be unacceptably high
<clever> and i'm guessing a lot of the libX11 functions will just not do locking, if you dont XInitThreads() ?
<ajax> and afaict it's just not measurable anymore, so the tests there (except for -query) are all quite small in terms of work per request so they should show any overhead
<ajax> right
<clever> but thats still an overhead of having to check if locking is needed or not
<ajax> hence the 'weaksauce' comment in the commit message ;)
<ajax> point is even if i don't try super hard the overhead seems to not be there, and it makes whole classes of bugs die forever, so...
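[ed. note] A sketch of the pattern being measured: the lock is only created by XInitThreads(), so every entry point pays one null check whether or not locking is enabled. Names here are illustrative, not libX11's actual internals:

```python
import threading

class Display:
    """Toy model of libX11's optional locking (illustrative names only).

    The lock exists only after init_threads() (i.e. XInitThreads()), so
    each entry point branches on it either way; that branch is the
    overhead being discussed.
    """
    def __init__(self):
        self.lock = None          # no XInitThreads() yet

    def init_threads(self):
        self.lock = threading.Lock()

    def lock_display(self):
        if self.lock is not None:
            self.lock.acquire()

    def unlock_display(self):
        if self.lock is not None:
            self.lock.release()
```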
<clever> this sounds like its mainly a problem within the X11 client
<clever> and the exact backend used by the server doesnt matter?
<ajax> nod. as long as the server doesn't just no-op all rendering the numbers should be pretty meaningful
<mattst88> ajax: on the topic of libX11: should I open an MR to revert the commit mentioned in https://gitlab.freedesktop.org/xorg/lib/libx11/-/issues/153 ?
<clever> ajax: and since its a client thing, i can use LD_LIBRARY_PATH to force x11perf to use the new library?
<ajax> clever: yep. do it on both runs just so you're sure you're comparing same cflags instead of whatever the system libX11 happened to be built with
<clever> ah yeah, thats also critical
<clever> i also suspect that the arm core will impact things too
<clever> the bcm2835 arm1176 core doesnt really have the same mutex opcodes i think, because its single-core
<ajax> mattst88: i'm leaning no? the logic in that patch still seems to make sense, and !113 seems like a more robust fix
<mattst88> okay. do you want to press the button?
<ajax> clever: i will take as much data as you're willing to generate here ;)
<mattst88> I'll make a new release once that's fixed
<ajax> ugh, libx11 still makes you insert your own link to the MR in the commit message?
<ajax> mattst88: !125 too please?
<clever> ajax: and just to cover all of the bases, how would i check if a compositing window manager is active?
<clever> pi@pi400:~ $ x11perf -noop -query -dot -rect{1,10} -{putimage,shmput}10
<clever> usage: x11perf [-options ...]
<clever> it just prints the help text
<ajax> oh right, -pointer not -query
<clever> yep, i now see some flickering dot grids
<clever> and some signs of tearing
<ajax> 'xprop -root | grep NET_WM_CM' i think is the usual way? that's what xcompmgr looks for anyway
<ajax> ngh no that's just the atom name
<ajax> but i think 'xcompmgr -c' will fail if a cm is already running
<clever> before changing libX11 any, the benchmark has pegged Xorg at 100% cpu
<clever> so the client isnt even the bottleneck, the server is
fxkamd has quit [Remote host closed the connection]
<ajax> excellent
<ajax> i really only expect significant changes to noop and maybe dot or rect1 numbers
<ajax> the rest is controls
<clever> yeah, `xcompmgr -c` did start successfully, and when i ctrl+c'd, the shadows vanished
<clever> so it was off initially
<clever> https://gist.github.com/cleverca22/557ee4dd59180781ec23d70463e1578e the initial baseline, before changing anything
<clever> configure.ac:25: error: must install xorg-macros 1.15 or later before running autoconf/autogen
rkanwal has quit [Ping timeout: 480 seconds]
<clever> what was that package called in debian based distros?
<ajax> xutils-dev i think
<clever> that did something
<clever> building
<clever> ajax: oh, i should also comment out my cpufreq stuff, cron was set to change the cpu freq every hour, lol
<ajax> hah
<ajax> you say kms driver, is this Xorg + glamor?
<ajax> you did say Xorg i guess
<clever> root 916 18.9 1.4 201180 57724 tty7 Ssl+ 13:10 4:00 /usr/lib/xorg/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -listen tcp vt7 -novtswitch
<clever> # grep glamor /proc/916/maps
<clever> f6ac4000-f6aee000 r-xp 00000000 b3:02 5564 /usr/lib/xorg/modules/libglamoregl.so
<ajax> excellent
<clever> build is done, on master 918063298cb893bee98040c9dca45ccdb2864773
<clever> for some benchmarks, the client is getting up to 20% cpu
<clever> for others, its much lower
<clever> `putimage 10x10 square` also looks wrong
<clever> the lines are just pure chaos
<clever> is it supposed to be?
<clever> ajax: gist updated, with libX11 master
<clever> ajax: and again, with your branch
<clever> ajax: no-op is far worse looking
aravind has quit [Ping timeout: 480 seconds]
<karolherbst> ajax: yeah.. I was more thinking about running shader purely with SSE/AVX
<karolherbst> so not just launching threads executing "GPU threads" but also vectorize internally, because... that's what we do on CPUs and I think that was the goal of swr?
<clever> karolherbst: when using SSE/AVX, how many vectors of uint8_t can you load into registers at once? how wide is each vector?
<clever> from a brief look at vector extensions in aarch64, its surprisingly weak
<karolherbst> I would hope that llvm knows how to optimize, but auto vectorization being so useless overall :/
<karolherbst> clever: was more thinking about x86, but yeah, aarch64 and risc-v could benefit as well
<clever> i'm curious as to how powerful x86 is as well
<karolherbst> auto vectorization is just doomed to fail, so developers have to be smart about writing code
<ajax> 2.6x slower noops. immeasurably slower everything else.
<karolherbst> clever: _if_ you can vectorize it's fast
<clever> ive been using a cpu core that has enough register space to hold 256 vectors of uint8_t[16]
<karolherbst> and how does that help if code doesn't make use of it?
<clever> so arm having only room for something like 8 or 16 doubles, seems rather weak
<ajax> karolherbst: i feel like a lot of llvmpipe's smarts is already about moving things into vec4s before llvm gets to them
<karolherbst> ajax: probably
<karolherbst> but
darkapex has joined #dri-devel
<karolherbst> the issue is, we still execute threads 1:1 afaik
<ajax> 1:1 with what
<clever> ajax: i think the "everything else" was bottlenecked with Xorg actually drawing to the drm buffers, as instructed?
<karolherbst> gpu:cpu threads
<karolherbst> so if you launch 1024 CL kernels you get 1024 "work items" which get executed by llvmpipe
<karolherbst> sucks for scalar code
<karolherbst> I might be wrong, but I think that's what is happening
pochu has quit [Quit: leaving]
<ajax> scalar code sucks, yes.
<karolherbst> well
<karolherbst> it doesn't
<karolherbst> that's the point
<ajax> that work queue does get distributed over every thread though, and i'm pretty sure lp will do one per core
<karolherbst> on real GPUs it won't matter as you can just run 10k threads in parallel
<karolherbst> so GPUs moved to scalar ISAs and everything
<karolherbst> but on CPUs....
<ajax> again, your problem isn't the instructions you're retiring, it's the long thin tube connecting your EUs to memory
<ajax> at least for GL uses of llvmpipe. maybe CL is different.
<karolherbst> mhhh, yeah maybe it won't matter as long as memory is connected as slowly as it is on x86
<clever> i did some benchmarking of that as well on the rpi
<clever> let me find my numbers
<karolherbst> but we do have AVXed and SSEed memcpys which do speed up things
<clever> 400mhz DDR2 ram has an estimated 25.6 gigabit/sec of bandwidth, assuming you transfer data on every single clock
<clever> vectorized loads from uncached ram, to the VPU, got 23.5-23.7 gigabit/sec of throughput
<clever> when using the 4096 byte vector-load opcode
<karolherbst> yeah well.. my intel CPU has like 76.8 GB/s max
<clever> 91% of the theoretical ram bandwidth, doesnt seem like a thin tube
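clever's arithmetic, spelled out (the 32-bit bus width here is an assumption implied by the quoted 25.6 figure; DDR transfers on both clock edges):

```python
clock_hz  = 400e6   # 400 MHz DDR2 clock
edges     = 2       # DDR: one transfer per clock edge
bus_bits  = 32      # assumed bus width, implied by the quoted peak

peak_bits_per_s = clock_hz * edges * bus_bits   # 25.6e9 bits/s
measured        = 23.6e9                        # midpoint of 23.5-23.7

assert peak_bits_per_s == 25.6e9
efficiency = measured / peak_bits_per_s         # ~0.92, clever's ~91% figure
assert 0.90 < efficiency < 0.93
```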
<ajax> 25.6 GB/s, for reference, is a radeon 9800 (yes, the thing from 2003)
<karolherbst> well
<karolherbst> yeah
<karolherbst> x86 has slow memory
<karolherbst> but it's not _that_ slow
<karolherbst> I mean if there isn't enough memory bandwidth for SSE and AVX that would make them pointless, no?
<clever> karolherbst: from what ive seen of arm vector extensions, there are relatively few vector registers, and any code using it is going to be extremely load/store dense
<karolherbst> sure it would have been better to multiply cores by 8 and not add SSE/AVX but here we are
<clever> but the VPU vector extensions, are HUGE, and i could load an entire dataset in one shot, and then do an entire FIR filter in ~3 opcodes
<clever> and there is enough to spare, that i can keep the entire coefficients table in the registers, and load another dataset to FIR against
<karolherbst> clever: because big vector sizes are a waste of transistors
<clever> so i can omit an entire load on every loop
<ajax> karolherbst: i guess i'm trying to say most of the v11n that can usefully be done probably has
<karolherbst> probably
<clever> karolherbst: i think its this big, because it was originally a DSP core
<karolherbst> vectors for highly specific use cases are fine, just not for general purpose stuff
<ajax> i can't remember if lp stores textures pre-tiled-and-soa'd
<clever> the vectors are also always 16 wide, that is hard-wired
<ajax> seems like it should if it can
<clever> so its not that big of a vector, its more the number of vectors it can hold
rgallaispou has quit [Read error: Connection reset by peer]
<karolherbst> clever: I prefer 16 times more cores than vectors tbh
<clever> thats kinda what the 3d core did
<ajax> remember the other problem with keeping the cpu fed is your input data having any cache locality
<karolherbst> but that's super hard to do on CPUs actually
<karolherbst> it's a mess
<karolherbst> clever: yes and no
<clever> the 3d subsystem is 12 cores, of 16 wide vector-only compute
<karolherbst> GPUs can ignore a bunch of problems CPUs have to deal with
<ajax> and sampling linear memory is about the least cache friendly thing
<clever> yeah
<clever> the rpi 3d core is technically turing complete, but with added restrictions
<karolherbst> clever: sounds like ... 12*16 = 192 GL threads! :D
<clever> conditional branching, is based on if none/some/all of the lanes meet a condition
Duke`` has joined #dri-devel
<karolherbst> *ugghh* that reminds me of somebody who actually compared CPUs and GPUs like that
<clever> conditional execution (an opcode maybe being a no-op) has finer control
<karolherbst> yeah...
<HdkR> SVE and AVX-512 predication matches pretty well with the GPU model
<clever> the asm and most docs also treat the 3d system as a scalar core
<karolherbst> the ISA might look like scalar for most GPUs, but internally it's highly vectorized
<clever> exactly
<clever> it may look scalar, but the hw will schedule 16 threads with 16 different inputs, and 1 shader
<clever> and the illusion only breaks at conditional branching
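The scalar illusion clever describes — one instruction stream, 16 lanes, with branching decided by whether none/some/all lanes meet the condition — might be modeled roughly like this (toy code, not the actual QPU scheduler):

```python
WIDTH = 16  # hard-wired vector width on this hardware

def simt_select(cond, then_fn, else_fn, lanes):
    """Run a divergent 'if' across all lanes of one instruction stream."""
    mask = [cond(x) for x in lanes]
    if all(mask):                  # uniform: take a real branch, 'then' side
        return [then_fn(x) for x in lanes]
    if not any(mask):              # uniform: take a real branch, 'else' side
        return [else_fn(x) for x in lanes]
    # divergent: execute both sides, results predicated per lane
    return [then_fn(x) if m else else_fn(x) for x, m in zip(lanes, mask)]

lanes = list(range(WIDTH))
out = simt_select(lambda x: x < 8, lambda x: x * 2, lambda x: -x, lanes)
assert out[:8] == [0, 2, 4, 6, 8, 10, 12, 14] and out[8] == -8
```

The uniform cases are where the hardware can actually branch; the divergent case is where the illusion breaks and both paths get executed under a mask.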
<karolherbst> it would be interesting to see if there are shaders lp _could_ emit purely vectorized, acting as a variably sized GPU
<karolherbst> so sometimes it can launch 4 GPU threads in one go, sometimes only 1
<clever> there is also user controlled threading as well
<karolherbst> mhhh
<karolherbst> openmp, but not threaded, just simd'ed
<karolherbst> :D
<clever> if a shader starts a long async task (texture lookup), it can yield the core to another thread
<clever> the hw will then swap the upper and lower registers, and run a different set of threads
<clever> until they also yield
<karolherbst> ahh sure
<ajax> karolherbst: lp tasks are just a work queue, you can make them as big as you want
<clever> context switching is avoided, by just banning the use of the upper half of the registers, and swapping upper/lower
<karolherbst> ajax: yeah.. but I meant merging lp tasks into vectorized variants
<ajax> what do you think is the fundamental unit of work for an lp task
<ajax> it's not: one pixel
<clever> karolherbst: another tricky thing, is that its not even a 16x wide vector lane, its only 4x, but the pipeline helps to cheat
<ajax> they're built to be a vector workload, already
<karolherbst> ahh okay
<clever> my rough understanding of how the QPU functions, is that the pipeline is 4 stages long, so a given opcode takes 4 clock cycles to run
<karolherbst> maybe that doesn't pan out for compute then
<karolherbst> not sure
<clever> but to hide latencies where a register can only be used 4 clocks after you write
<clever> a given "thread" only runs on every 4th clock
<clever> so the pipeline, is always interleaved with 4 "threads"
<karolherbst> but I think 1024 CL threads were split into 1024 lp tasks
<ajax> well i'm describing render tasks, you could probably make compute tasks pick a different microtile size
<clever> and each of those "threads" is a 4x vector task
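clever's 4-deep pipeline trick — each "thread" issues only every 4th clock, so a register written at clock t has cleared the pipeline by the time the same thread issues again at t+4 — can be sketched as a round-robin schedule (a toy model of the description above):

```python
PIPE_DEPTH = 4  # an op's result is available PIPE_DEPTH clocks after issue

# round-robin: thread i issues on clocks where clock % PIPE_DEPTH == i
schedule = [clock % PIPE_DEPTH for clock in range(16)]
assert schedule[:8] == [0, 1, 2, 3, 0, 1, 2, 3]

# each thread's consecutive issue slots are exactly PIPE_DEPTH apart,
# so its previous result has always left the pipeline before reuse:
for thread in range(PIPE_DEPTH):
    slots = [c for c in range(16) if c % PIPE_DEPTH == thread]
    assert all(b - a == PIPE_DEPTH for a, b in zip(slots, slots[1:]))
```

With each of those 4 interleaved threads being a 4x vector task, the machine behaves like a 16-wide vector unit without ever needing a register written and read back-to-back.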
<karolherbst> ajax: yeah.. that's what I am thinking about how feasible that would be.. not that it really matters, just thinking out loud here
<ajax> probably what i'd do, if the incoming task has a 1x1 work unit, is unroll it to 2x2 and feed that as the task?
<karolherbst> CL has explicit grid sizes
<karolherbst> ehh block?
<ajax> llvm might not vec it up very much but you're at least likely to improve cache hit rate
<karolherbst> the smaller one
<karolherbst> yeah...
<karolherbst> I wouldn't trust compilers here to do the correct thing in terms of vectorization anyway
<karolherbst> but maybe things could be improved a little somehow
<ajax> i'm honestly wanting a way out of llvm
<karolherbst> mhhh
<karolherbst> as long as we can keep clang
<karolherbst> nir to x86 would be fun
<ajax> it can be a process on my system, i'm not super into it being a DSO in every process. clang _or_ llvm.
<karolherbst> yeah...
<karolherbst> well I only want clang for CL
<ajax> and like: gallium is pretty well positioned to make informed layout choices already. i don't need my jit to try super hard for that, i just need it to be able to express every vector operation i might encounter
<karolherbst> I really don't want to write a C parser which can deal with all the code out there
<ajax> someone just teach mir about avx2 already
<ajax> anyway i'm rambling
<zmike> ADAM FOCUS
<clever> ajax: want any other combinations tested? Xvfb? fkms? no kms? other pi models?
<ajax> clever: wimpiest pi model you've got
<clever> original pi1 it is!
gio has joined #dri-devel
gio has quit []
gio has joined #dri-devel
<clever> if it will boot...
Thymo has quit [Ping timeout: 480 seconds]
<ajax> ah, provocative maintenance
<clever> maybe a zero...
<clever> same soc, just a different default clock
<clever> yep, booting
<clever> You are in emergency mode. After logging in, type "journalctl -xb" to view
<clever> [ 24.666726] mmc0: read transfer error - HSTS 20
<clever> ah yeah, /boot wont mount, for unknown reasons
ybogdano has joined #dri-devel
ajax is now known as Guest158
Daaanct12 has joined #dri-devel
lkw has joined #dri-devel
alyssa has joined #dri-devel
<alyssa> eric_engestrom: dcbaker: Have we selected a date for the branch point?
<alyssa> I guess April 13 is the 'expected' date but not sure if it'll be sooner/later in practice
<dcbaker> yes.. April 13th IIRC. I'll send out the calendar update
<alyssa> ack
<alyssa> I'd really like to land Valhall support in Panfrost and trying to budget my time (and modulate my perfectionism) accordingly
<alyssa> given I have gles2 conformant (i.e. enough for accelerated desktop), it would stink if there's no support in 22.1
<alyssa> (Aiming for conformant gles31 in 22.2 regardless, but given Linux-capable Valhall devices are already in the wild 3 months is a big difference)
<dcbaker> okay. We can nudge it, or I can slip things in during the RC for you as long as it's just in panfrostland
<alyssa> Heh, thanks
<dcbaker> (or other people are happy to land common stuff)
<alyssa> We'll see how much I can get in before April 13
<alyssa> the code is written just needs to be processed thru Marge ;P
<dcbaker> my policy is "if driver teams want to bork their own drivers, that's their call" :D
rkanwal has joined #dri-devel
Daanct12 has quit [Ping timeout: 480 seconds]
slattann has joined #dri-devel
MajorBiscuit has quit [Ping timeout: 480 seconds]
<alyssa> Lol. Fair
<airlied> karolherbst: when you say CL threads what do you mean?
<karolherbst> airlied: one item in the local_work_group
<alyssa> For context of scale, my Valhall bring up branch is about 5kloc diff from main right now
<airlied> launching 1024 1x1x1? or something saner?
<alyssa> Trying to land the first 3kloc today/this week
<karolherbst> airlied: nope, launching 1024x1x1
<airlied> like it vectorizes that
<karolherbst> really? mhh
<karolherbst> maybe I missed that
<alyssa> 3 weeks to clean up and Marge the other 2kloc, sounds doable
<airlied> yes it would suck otherwise
<karolherbst> maybe I need to take a closer look
<linkmauve> karolherbst, on CPU, you can use perf to see how a given function got JIT’d to.
<airlied> it will launch 1024/8
<alyssa> (Especially if I hide the PIPE_CAPs for anything newer than gl2)
<alyssa> (For the new hw only obviously)
<airlied> karolherbst: it also uses coroutines to do barriers
<clever> Guest158: ok, wut, i can read the entire boot partition, but i cant mount it!?
<karolherbst> clever: maybe run fsck?
<clever> bcm2835.c:#define SDHSTS_CRC16_ERROR 0x20
<clever> karolherbst: an HSTS of 0x20, means the SD controller encountered a crc16 error, on the SD bus
<clever> that would imply problems between the card and soc, not the data itself
<clever> and yet, i can read the entire partition, twice, and not have any errors
<clever> fsck is also happy
<airlied> karolherbst: so it could maybe do better at dispatch when there are no barriers
<airlied> since if you have a local block size that is nuts it will vectorize, but not thread
<airlied> it only threads blocks
gawin has quit [Ping timeout: 480 seconds]
slattann has quit []
lynxeye has quit []
garrison has joined #dri-devel
fxkamd has joined #dri-devel
<karolherbst> clever: or the driver is buggy
<clever> karolherbst: it only fails like this on certain models
i-garrison has quit [Ping timeout: 480 seconds]
i-garrison has joined #dri-devel
shankaru has quit [Read error: Connection reset by peer]
shankaru has joined #dri-devel
jkrzyszt_ has quit [Ping timeout: 480 seconds]
garrison has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
DanaG_ has quit [Remote host closed the connection]
gawin has joined #dri-devel
ella-0 has joined #dri-devel
DanaG has joined #dri-devel
ella-0_ has quit [Read error: Connection reset by peer]
frieder has quit [Remote host closed the connection]
janesma has joined #dri-devel
nchery has quit [Ping timeout: 480 seconds]
shankaru has quit [Quit: Leaving.]
<karolherbst> 2D images: PASSED 42 of 42 sub-tests. :)
<karolherbst> 3D as well, yay
<karolherbst> alyssa: I forgot to use the strides from the pipe_transfer object... :D
<karolherbst> 1Darray and 2Darray are passing as well, nice
idr has joined #dri-devel
Guest158 has left #dri-devel [#dri-devel]
ajax has joined #dri-devel
<karolherbst> airlied: what's our plan for opaque pointers?
<airlied> karolherbst: I only found out about them yesterday :-)
<airlied> yeah at some point we'll have to to move to the new APIs
<karolherbst> :) I kept ignoring the issue and hoped somebody knowing LLVM would review
<karolherbst> airlied: or.... ditch clover?
<airlied> karolherbst: I think llvmpipe will needs fixes as well
<daniels> karolherbst: saving that message to pull out later when you start reviewing LLVM-adjacent work from others
<airlied> karolherbst: I don't think clover impacts it that much, and if it does, clc will have same issues
<karolherbst> airlied: annoying :(
<karolherbst> daniels: heh :D
Thymo has joined #dri-devel
<Venemo> kusma: about our talk earlier on your radv docs MR, what do you think would be the best way to also include the ACO docs on the Mesa website in the future? If I simply make a MR to move the ACO readme file to docs/ will that do?
<kusma> Venemo: You need to also convert it to rst. I *think* pandoc can do that for you...
<Venemo> kusma: does rst support tables now? Last I checked it didn't
<kusma> Venemo: It does.
<Venemo> Awesome!
<Venemo> dschuermann: how do you feel about this? I think it'd be nice to move the ACO docs there and publish on the website
<alyssa> karolherbst: strides were like my first guess right?
<karolherbst> basically
mhenning has joined #dri-devel
FireBurn has joined #dri-devel
nchery has joined #dri-devel
<Lyude> hooray, confirmed MST on radeon.ko definitely is broken
nchery has quit [Ping timeout: 480 seconds]
janesma has left #dri-devel [Leaving]
macc24 has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
<alyssa> I love broken drivers when they're not my own
macc24 has joined #dri-devel
<airlied> alyssa: btw I added you to an mr to z24 on z32 helpers :-P
<alyssa> airlied: so I saw
<Lyude> alyssa: hehe, in this case being broken is a good thing: I can just get rid of the code and not worry about it now
<airlied> but no idea if it covers your case
<alyssa> Unfortunately I need that for my pet driver and I'm currently on the clock for my work driver :-p
<Lyude> since if it's broken (and not even enabled by default apparently), I highly doubt anyone's going to miss it
<alyssa> you know, the one that pays the bills so I can buy shiny toys to write pet drivers for
<airlied> alyssa: no worries, just making sure you've seen it :-)
<alyssa> I did see it and made a mental note to try it on Asahi when I do that again
<alyssa> bifrost_tests: ../src/util/u_cpu_detect.h:138: const util_cpu_caps_t* util_get_cpu_caps(): Assertion `util_cpu_caps.nr_cpus >= 1' failed.
<alyssa> I know some of those words
<alyssa> util_cpu_detect, ugh ok
<alyssa> oh, that's a pathological failure case.
<alyssa> My unit tests call _mesa_half_to_float
<alyssa> which calls util_get_cpu_caps() .... only on x86_64!
<alyssa> airlied: Can we delete the x86-only drivers from tree?
<alyssa> it'd lighten our tree a lot
* alyssa isn't sure where to call util_cpu_detect with gtest
<alyssa> actually will just remove the float_to_half call lol
Haaninjo has quit [Remote host closed the connection]
Haaninjo has joined #dri-devel
anarsoul|2 has joined #dri-devel
anarsoul has quit [Read error: Connection reset by peer]
Danct12 has quit [Ping timeout: 480 seconds]
Daaanct12 has quit [Ping timeout: 480 seconds]
<DanaG> Speaking of non-x86, ARM has some fun stuff... memcpy or pcie write ordering issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3274
nchery has joined #dri-devel
glennk has quit [Remote host closed the connection]
gouchi has joined #dri-devel
glennk has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.4]
Danct12 has joined #dri-devel
lkw has quit [Quit: leaving]
Haaninjo has quit [Quit: Ex-Chat]
<graphitemaster> NV discussing using Intel to fab GPUs is the most hellish thing I've read in computer graphics
<graphitemaster> We are in the bad place
<alyssa> graphitemaster: but everyone has been telling me this is the good place!
<Lyude> lmao I am surprised, I didn't expect Intel to be open to anyone using their fab
<alyssa> Lyude: all companies are open to anything for enough $$$
<graphitemaster> alyssa, That's exactly what someone who doesn't want you to know we're in the bad place would say!
<Lyude> alyssa: very true
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
<alyssa> graphitemaster: *blink*
LexSfX has quit []
<graphitemaster> Intel's fab process is so far behind if NV made a GPU on it it would consume 700w of power, the chip would be the size of a CD jewel case, the GPU would take up four PCIe slots and probably require a transfer of the framebuffer to the CPU because Intel will want to slide their terrible hybrid dGPU + iGPU nonsense into it as part of the deal.
<graphitemaster> See, bad place.
mvlad has quit [Remote host closed the connection]
rcf has quit [Quit: WeeChat 3.2.1]
rcf has joined #dri-devel
LexSfX has joined #dri-devel
xroumegue has joined #dri-devel
ybogdano has quit [Read error: Connection reset by peer]
ybogdano has joined #dri-devel
<alyssa> graphitemaster: can't tell if you're forking with me
<icecream95> alyssa: getpid()?
<alyssa> that's one way to find out!
<graphitemaster> alyssa, It's called hyperbole because that is the brand image Intel has right now for their continued failure to deliver improvements on their manufacturing process and strong-holding OEMs to use integrated graphics.
<graphitemaster> I was embellishing it a little :P
<graphitemaster> Do you think I'm being too harsh on Intel
<alyssa> hyperbole, that's the sinh/cosh one right?
<DrNick> mesa's post-processing isn't suited to changing render target resolution is it
<alyssa> people use that?
<DrNick> mesa's postprocessing? no.
<graphitemaster> alyssa, I think those are called the exaggerated sines and cosine functions, they give similar results to sine and cosine but they're slightly exaggerated.
<DrNick> I was just contemplating AMD's RSR
<Lyude> graphitemaster: tbh ADL is finally actually competing so they're starting to do a bit better
<Lyude> (also I'm looking forward to the DG2 dropping, honestly very much want to use one in my next desktop)
<alyssa> graphitemaster: didn't think so
<alyssa> can we remove it yet
<graphitemaster> Remove it from what
<graphitemaster> GLSL?
columbarius has quit [Ping timeout: 480 seconds]
<alyssa> mesa's postprocessing
<alyssa> that was supposed to be @ DrNick
gouchi has quit [Remote host closed the connection]
AndroidC512L has joined #dri-devel
<graphitemaster> I was looking at the swift shader papers on improved exp/log and sin/cos
<graphitemaster> Wrote the sin/cos one in regular GLSL with floatBitsToUint and friends to simulate some of the machine-level stuff they do for what is ostensibly a software renderer
<graphitemaster> Since most of those kernels are not native on GPUs anyways
<graphitemaster> They are faster
<graphitemaster> Curious, does mesa have implementations of sin/cos/exp/log as part of NIR or what ever that are just used in-situ if the GPU lacks native instructions?
<graphitemaster> Could be interesting replacing them with the swift shader implementations if the ones in mesa are slower
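For a flavour of what such a lowering looks like in general (this is a plain range-reduce + odd Taylor polynomial, not SwiftShader's minimax-with-bit-tricks approach):

```python
import math

def sin_approx(x):
    """Range-reduce to [-pi, pi], then evaluate an odd polynomial in x**2.
    Taylor coefficients through x**11; worst-case error is bounded by the
    first omitted term, pi**13/13! ~= 4.7e-4."""
    x = math.remainder(x, math.tau)   # x now in [-pi, pi]
    x2 = x * x
    c3, c5, c7, c9, c11 = 1/6, 1/120, 1/5040, 1/362880, 1/39916800
    # Horner form of x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11!
    return x * (1 - x2*(c3 - x2*(c5 - x2*(c7 - x2*(c9 - x2*c11)))))

for i in range(1000):
    x = -10 + 0.02 * i
    assert abs(sin_approx(x) - math.sin(x)) < 1e-3
```

Real lowerings (and SwiftShader's) use tighter minimax coefficients and cheaper range reduction, but the overall shape — reduce, then a short polynomial — is the same.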
<icecream95> graphitemaster: Yup, and also IIRC in TGSI.. I think it's in nir_opt_algebraic.py
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<jekstrand> Most hardware has sin/cos/exp/log
<jekstrand> Other transcendentials like acos, atan2, etc. aren't
<icecream95> See also lp_build_log2_approx and i915_sincos_lower...
<jekstrand> I said "most"
<jekstrand> It might be useful for llvmpipe but no one cares much about i915 perf
<icecream95> jekstrand: I was pointing out that these (at least the i915 one) seem to duplicate the lowering in opt_algebraic
<alyssa> graphitemaster: Just read the sin/cos paper, these are all standard tricks AFAICT
<alyssa> (Less sophisticated than what can be done in hw these days as well)
<jekstrand> We have lowering in opt_algebraic?
* jekstrand wasn't paying attention, I guess.
<alyssa> for lima
<icecream95> lowered_sincos
columbarius has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
<jekstrand> I see. Yeah, we could probably drop the i915 one then
<icecream95> "It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests."
<jekstrand> :-/
<jekstrand> Good enough for i915, probably
gawin has quit [Ping timeout: 480 seconds]
* alyssa assigns to yeet
<alyssa> wait that sounds terrible
<alyssa> let me try again
* alyssa yeets to Marge
* zmike watches marge yeet it back
<alyssa> We'll see
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
* icecream95 hurriedly looks through the MR for something to NAK
rkanwal has quit [Ping timeout: 480 seconds]
<alyssa> Apparently I have to assign unreviewed code to Marge in order to get reviews C:
<graphitemaster> Don't yeet PRs
<icecream95> alyssa: Do you really have to touch the Bifrost ISA file like this? "<opt>acmpxchg</opt> <!-- For Valhall -->"
<alyssa> icecream95: Diff is missing context, that's part of a pseudo instruction
<alyssa> But... yes, I'm not regretting tying the IR to Bifrost's encodings
<daniels> anholt: given that GCN is now 10 years old, there's probably a case for yeeting r600 to amber tbh
<icecream95> alyssa: Generally pseudo-instructions don't need "<reserved/>" modifier values?
<alyssa> And would like to clean that up but I'd rather not block hw enablement on that
<alyssa> It's a copy paste from an actual valhall instr
<alyssa> *bifrost
<alyssa> (the real atomics)
<anholt> daniels: feels like a weird line -- we have nv30 on this side of amber, which is way worse than r600
<alyssa> The whole pseudo instruction idea is a mess and I regret it and wish I fixed it a year ago
<anholt> and crocus is handling a bunch of older hw, too.
<alyssa> anholt: daniels: IMO the relevant axis is maintenance level, which is only weakly correlated with hw age
<anholt> daniels: tbh I'd be happy if I could just merge r600 changes, like I'm doing for i915. but gerddie kinda owns it.
<alyssa> I don't know where nv30/r600 fall there. crocus seems active, though.
<daniels> anholt: ah, I'd missed that nv30 was still around
<daniels> worth prodding gerddie in any case
<airlied> r300 and r500 are still around :)
<anholt> yeah, we have a new active developer for r300, and it's not me.
<anholt> (and on that note, do we have a "add a new committer" process written down anywhere?)
<daniels> anholt: ask on irc/gitlab/list, get approval from one or more people, eventually someone adds them
<anholt> cool. well, @ondracka has been doing an incredible job fixing regressions and adding new optimization for r300.
<daniels> I'm obv biased, but https://gitlab.freedesktop.org/wayland/weston/-/blob/main/CONTRIBUTING.md#commit-rights might be a good thing to steal
<daniels> anholt: sounds like we should cling on to them before they flee
<jekstrand> Do we have i915 in CI?
<icecream95> alyssa: Won't ir_instructions end up equal to ir_instructions_with_pseudo?
<anholt> jekstrand: as a manual job, yeah, you can just go click the button
<anholt> (it's not pre-merge, because it's a bit slow, and it's on my desk and doesn't really have a guaranteed service level)
<jekstrand> anholt: Ok, but good enough I can test out my "delete sincos lowering" change. :D
<anholt> totally. and there's enough instr-count sensitive gles2 coverage that you'll know if you hurt things.
<anholt> daniels: went looking for where to put the docs, and today I found the existing instructions. https://docs.mesa3d.org/repository.html#developer-git-access
rbrune has quit [Ping timeout: 480 seconds]
<graphitemaster> Speaking of math in GLSL. Who here wants to fix integer divisions? Literally unusable in any shader because the rounding direction is undefined. I've seen the same hardware round differently just based on the values. It seems NV turns integer division into floating point so the precision is like 24 bits (might be less). At the very least some warning about using it would be nice in the shader compiler so I can detect bugs. I've
<graphitemaster> now found and corrected probably 50 such bugs.
<alyssa> icecream95: ugh. typo.
<alyssa> good NAK work
<alyssa> thanks
<alyssa> graphitemaster: Just don't do integer divisions on the GPU.
<alyssa> Just, really, don't.
<graphitemaster> I know but how do I enforce that in a team alyssa
<graphitemaster> There's no way to enforce it
<alyssa> jury rigged CI pipeline
<graphitemaster> How would you detect an integer division
<graphitemaster> Keep in mind a compiled shader using integer division will get turned into float division
<graphitemaster> And float divisions are fine
<alyssa> NIR_DEBUG=print ./app | grep -E 'idiv|udiv|imod|umod|irem|urem'
<graphitemaster> Need to trigger all the shaders
<graphitemaster> Just running the app won't do that
<graphitemaster> if integer divisions are so bad they should be removed from the shading language
<graphitemaster> at the very least there should be an #extension I can require which turns them into errors
<mareko> AMD has precise integer division, use that
<graphitemaster> AMD is the one platform (on Windows) that gets integer divisons in GLSL wrong the most :P
<jekstrand> Intel does too
<jekstrand> Intel even has a "real" integer divide instruction
<alyssa> butwhy.gif
<graphitemaster> how does one get precise integer division
<mareko> via Mesa
<alyssa> soul sale
<jekstrand> Clearly, you need Mesa on Windows. Problem solved. :-P
<alyssa> that but unironically
<icecream95> graphitemaster: while (a > 0) { a -= b; ++c; }
<icecream95> :P
<alyssa> O(a) perf woot!
gawin has joined #dri-devel
<alyssa> actually not O(a)
<graphitemaster> I just need the people creating shading languages and graphics drivers to agree on something as basic as what way integer division rounds because it's totally unusable otherwise, imagine it rounds wrong and that value accesses out of bounds ...
<alyssa> unbounded and incorrect
<alyssa> icecream95: Your excellent algorithm has a bug, it hangs if b = 0
<icecream95> Or you could try subtracting shifted versions of b, like armv7 software division
<jekstrand> graphitemaster: Are you sure it isn't specified?
<graphitemaster> it isn't!
<jekstrand> graphitemaster: You may just be hitting AMD driver bugs
<graphitemaster> all the gl specs leave the rounding direction unspecified
<zmike> mareko: do you have any comments on the gallium patches in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15504
<jekstrand> I mean, I can easily believe that...
<graphitemaster> and even the precision of the round unspecified (but at least 22 bits)
<icecream95> alyssa: It correctly returns the value 'infinity'. After infinite time, though
<alyssa> ah, I see that now
<alyssa> I still think you should optimize it to be
<alyssa> if (b == 0) { c = infinity; } else { while (a > 0) { a -= b; ++c; } }
<alyssa> we should coauthor a paper about the Icecream-Alyssa Integer Division Algorithm
<alyssa> IAIDA, the name rolls off the tongue
<icecream95> alyssa: Most integer formats don't support infinity, and anyway what if we should have returned negative infinity instead?
<icecream95> ..It doesn't work for negative inputs anyway
<jekstrand> graphitemaster: Yeah... The SPIR-V spec for OpSDiv simply says " Signed-integer division of Operand 1 divided by Operand 2."
<graphitemaster> p=1,r=0;while((n>>1)>=d)p<<=1,d<<=1;for(;p>0;p>>=1,d>>=1)if(n>=d)n-=d,r+=p;return r;
<graphitemaster> alyssa, check that O(M-N)
<graphitemaster> n is numerator, d is denominator
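graphitemaster's one-liner is classic restoring shift-subtract division, valid for non-negative n and d > 0; unpacked (variable names kept, structure hypothetical), it reads:

```python
def shift_sub_div(n, d):
    """Unsigned restoring division by repeated shifted subtraction.
    Same algorithm as graphitemaster's one-liner: align d just below n,
    then subtract shifted copies of d wherever they fit."""
    assert n >= 0 and d > 0
    p, r = 1, 0
    while (n >> 1) >= d:     # scale d (and its place value p) up
        p <<= 1
        d <<= 1
    while p > 0:             # walk back down, subtracting where possible
        if n >= d:
            n -= d
            r += p
        p >>= 1
        d >>= 1
    return r

for n in range(200):
    for d in range(1, 20):
        assert shift_sub_div(n, d) == n // d
```

This is O(log(n/d)) iterations rather than O(n/d) for the repeated-subtraction version, which is the point of the "check that" remark.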
<jekstrand> graphitemaster: Feel free to file a SPIR-V spec bug if you want
<graphitemaster> jekstrand, Yeah it's literally unusable, also all the translation layers which map Vulkan to DX12, or DX12 to Vulkan or Vulkan to Metal have silent bugs because DX actually _defines_ the rounding mode as does Metal
<jekstrand> graphitemaster: That's probably more useful than GLSL and will likely end up with people looking at the GLSL spec too
<jekstrand> graphitemaster: I suspect the intention in GLSL and SPIR-V was round-towards-zero
<jekstrand> i.e. C integer division
<jekstrand> But they just didn't bother to think that people would interpret it wrong
<graphitemaster> jekstrand, The issue is multi-faceted because the rounding direction differs depending on the sign as well and nothing is consistent there. Also C defines % in such a way that it's consistent with division, i.e (a/b)*b+a%b=a
<graphitemaster> And GLSL definitely does not honor C there at all either
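The C identity graphitemaster cites — (a/b)*b + a%b == a, with division truncating toward zero — versus floor division (what Python's // does) can be made concrete:

```python
def c_div(a, b):
    """C99-style integer division: round toward zero."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def c_rem(a, b):
    """C99 %: defined so that (a/b)*b + a%b == a always holds."""
    return a - c_div(a, b) * b

# truncation and flooring disagree exactly when the signs differ
# and b does not divide a:
assert c_div(-7, 2) == -3      # C truncates toward zero
assert -7 // 2 == -4           # Python floors toward -infinity

for a in range(-10, 11):
    for b in (-3, -2, 2, 3):
        assert c_div(a, b) * b + c_rem(a, b) == a
```

An implementation that flips between these two conventions depending on sign or magnitude satisfies neither identity, which is the "totally unusable" behaviour being complained about.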
<jekstrand> graphitemaster: SPIR-V does have OpSRem and OpSMod which do care about sign, sort-of
<jekstrand> But division going all sorts of weird ways seems wrong
<jekstrand> Worth filing an issue. Lets people know that there are devs out there suffering
<graphitemaster> I worked out after experimenting that AMD's division works different for positive numbers than it does rounding, paste incoming
<graphitemaster>      "floor"      "truncate"     "amd"
<graphitemaster> b>0  [0, b-1]     [-b+1, b-1]    [0, b+1]
<graphitemaster> b<0  [b+1, 0]     [b+1, -b-1]    [b+1, -b-1]
<graphitemaster> Which is so strange (and totally broken)
<graphitemaster> Yeah I should file something on the GH maybe
<jekstrand> It's likely to accomplish more than complaining here. :)
<graphitemaster> The WebGPU folks have been trying to figure this out too https://github.com/gpuweb/gpuweb/pull/1830
<graphitemaster> Sorry :P
<jekstrand> It
<jekstrand> It's ok. I don't mind you complaining here. #dri-devel is mostly for shit-posting, after all.
<jekstrand> And I could file the bug for you but I think it'll get more of the right attention if it's filed by someone external who's struggling with driver inconsistency in the real world than if I file something about how the spec is unclear.
<graphitemaster> My assumption is something as basic (and fundamental) as a bug in integer division wouldn't be something that requires that. I mean it shouldn't even be broken, how has this not been caught by tests.
<graphitemaster> Should need an advocacy group or presentation titled "Making division work in 2022"
<graphitemaster> s/Should/Shouldn't
<alyssa> graphitemaster: there are 2^64 test cases for idiv32
<graphitemaster> You only need two test the edge cases :P
<graphitemaster> s/two/too
<alyssa> If an implementation is right on 99% of inputs, it can easily be missed by random sampling (which the CTS does lots of)
<graphitemaster> But it's not right on 99% of inputs, it's right on 50% of inputs since 50% of pair of numbers will round differently
thellstrom has quit [Ping timeout: 480 seconds]
nchery is now known as Guest182
Guest182 has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
Peste_Bubonica has joined #dri-devel
dabaiste^ has quit [Ping timeout: 480 seconds]
Peste_Bubonica has quit [Quit: Leaving]