ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
ybogdano has quit [Ping timeout: 480 seconds]
<mareko>
adding optional pipe_context* into resource_create has just crossed my mind, and it seems it would have a few benefits
nchery has quit [Ping timeout: 480 seconds]
<gawin>
mareko: I can make new MR with revert patch, but what about shadowcube?
heat has joined #dri-devel
idr has quit [Ping timeout: 480 seconds]
<Plagman>
any plymouth experts here by any chance? rotating an image is leaving me with cropping having to do with the pre-rotated dimensions of the image and i'm not sure how to deal with it
heat_ has joined #dri-devel
co1umbarius has joined #dri-devel
columbarius has quit [Ping timeout: 480 seconds]
Bennett has quit [Remote host closed the connection]
Duke`` has quit [Remote host closed the connection]
nsneck has quit [Remote host closed the connection]
Duke`` has joined #dri-devel
gawin has quit [Ping timeout: 480 seconds]
shashank1202 has quit [Quit: Connection closed for inactivity]
JohnnyonFlame has joined #dri-devel
gawin has joined #dri-devel
pendingchaos has quit [Read error: Connection reset by peer]
adjtm has quit [Remote host closed the connection]
adjtm has joined #dri-devel
pendingchaos has joined #dri-devel
Company has joined #dri-devel
ella-0_ has joined #dri-devel
ella-0 has quit [Ping timeout: 480 seconds]
markus has quit [Ping timeout: 480 seconds]
markus has joined #dri-devel
cef is now known as Guest1602
cef has joined #dri-devel
Guest1602 has quit [Ping timeout: 480 seconds]
ella-0 has joined #dri-devel
ella-0_ has quit [Ping timeout: 480 seconds]
<jekstrand>
poll of ARM devs: alyssa, anholt_, robclark: What kind of memory do you store compiled shader binaries in? How painful is it to read back?
<jekstrand>
I'm thinking through the design of a shared vk_pipeline_cache and trying to decide if we need to always store a CPU shadow-copy of everything or if it's better to save memory by trying to avoid storing two copies of the shader.
<jekstrand>
Sadly, the way vkGetPipelineCacheData() is designed, we can't serialize and store on-the-go, we have to do it later when requested by the app so whatever we store internally has to be able to be serialized to a flat blob of data.
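For reference, this is the contract jekstrand is describing. `vkGetPipelineCacheData()` is called once with `pData == NULL` to learn the blob size, then again with a buffer. If that buffer is too small, the driver writes at most the bytes that fit and returns `VK_INCOMPLETE`. Below is a minimal C sketch of flattening in-memory entries under that contract. `struct cache_entry`, `serialize_cache()` and the `SKETCH_*` codes are hypothetical stand-ins, not Mesa's `vk_pipeline_cache` or the Vulkan headers. The header that a real Vulkan cache blob must start with is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for VK_SUCCESS / VK_INCOMPLETE. */
enum sketch_result { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 };

/* Hypothetical in-memory cache entry: a key plus a CPU-readable binary. */
struct cache_entry {
   uint8_t key[20];        /* e.g. a SHA-1 of the pipeline state */
   uint32_t code_size;
   const void *code;
};

/* Bytes one entry occupies in the flat blob: key, size field, code. */
static size_t entry_blob_size(const struct cache_entry *e)
{
   return sizeof(e->key) + sizeof(e->code_size) + e->code_size;
}

/* Two-call contract: with data == NULL, report the full size. Otherwise
 * write whole entries in order, stop at the first one that does not fit,
 * set *size to the bytes actually written, and report SKETCH_INCOMPLETE
 * if anything was left out. */
enum sketch_result
serialize_cache(const struct cache_entry *entries, size_t count,
                size_t *size, void *data)
{
   size_t total = 0;

   if (data == NULL) {
      for (size_t i = 0; i < count; i++)
         total += entry_blob_size(&entries[i]);
      *size = total;
      return SKETCH_SUCCESS;
   }

   uint8_t *out = data;
   for (size_t i = 0; i < count; i++) {
      const struct cache_entry *e = &entries[i];
      if (total + entry_blob_size(e) > *size) {
         *size = total;
         return SKETCH_INCOMPLETE;
      }
      memcpy(out + total, e->key, sizeof(e->key));
      total += sizeof(e->key);
      memcpy(out + total, &e->code_size, sizeof(e->code_size));
      total += sizeof(e->code_size);
      memcpy(out + total, e->code, e->code_size);
      total += e->code_size;
   }
   *size = total;
   return SKETCH_SUCCESS;
}
```

Because serialization happens only on request, every entry has to remain readable as plain bytes until then. That is what drives the shadow-copy question that follows.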
<clever>
jekstrand: for the rpi hw, the gpu ram isnt special, the cpu and gpu share the lower 1gig of ram, and its purely software rules deciding where to draw the lines
<clever>
jekstrand: but the 3d hw cant snoop on the arm caches, so you do still have to flush the cache to ram
<jekstrand>
So is it written and then the cache flushed or is it mapped write-combine?
<jekstrand>
Basically, do I have a read cache?
<clever>
i dont know what the current linux driver does exactly
<clever>
it could be configured to allow reading it back
<jekstrand>
I think for Intel and AMD dGPUs, we want a shadow copy simply because the memory can vanish on VK_ERROR_DEVICE_LOST.
<jekstrand>
For Intel and AMD integrated, we probably don't need that copy.
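Here is a sketch of the shadow-copy policy being weighed, assuming the driver knows whether it is running on a discrete part. `struct shader`, `shader_init()` and the other helpers are hypothetical. `gpu_map` stands for a CPU mapping of the GPU buffer. If that mapping is write-combined, the read-back path is slow.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shader object: the binary always lives in GPU-visible memory
 * (reached through the CPU mapping gpu_map) and optionally also in a
 * malloc'd CPU shadow copy. */
struct shader {
   size_t size;
   void *gpu_map;      /* CPU mapping of the GPU buffer, possibly WC */
   void *cpu_shadow;   /* NULL when no shadow copy is kept */
};

/* Upload the binary. Keep a shadow copy only on discrete GPUs, where VRAM
 * contents may be gone after VK_ERROR_DEVICE_LOST; integrated parts skip
 * it and avoid storing the binary twice. */
bool shader_init(struct shader *s, void *gpu_map, const void *binary,
                 size_t size, bool is_discrete)
{
   s->size = size;
   s->gpu_map = gpu_map;
   s->cpu_shadow = NULL;
   memcpy(gpu_map, binary, size);
   if (is_discrete) {
      s->cpu_shadow = malloc(size);
      if (s->cpu_shadow == NULL)
         return false;
      memcpy(s->cpu_shadow, binary, size);
   }
   return true;
}

/* Bytes to serialize: the shadow copy if there is one, otherwise read back
 * through the mapping (uncached-speed if WC, but not a hot path). */
const void *shader_bytes_for_serialize(const struct shader *s)
{
   return s->cpu_shadow != NULL ? s->cpu_shadow : s->gpu_map;
}

void shader_finish(struct shader *s)
{
   free(s->cpu_shadow);
   s->cpu_shadow = NULL;
}
```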
<robclark>
jekstrand: we store the shader itself in a WC GEM bo (and some push consts in the same bo).. but that is referenced by a pre-baked stateobj (which has ptr to shader bo)
<clever>
and just forces a flush when you tell the gpu to begin reading it
<jekstrand>
robclark: How much does it suck to read through that WC map?
<robclark>
jekstrand: also, atm disk_cache support is in ir3 (shared by tu and gallium).. fwiw
<robclark>
for reads it is kinda like uncached.. so not ideal
<clever>
jekstrand: no way for the ram to just vanish, its just a regular CMA allocation with linux's heap logic in full control
<robclark>
but why would we want to read it back (other than for debug stuff like dumping disasm)
<clever>
(on rpi hw)
<jekstrand>
robclark: If the read is a memcpy and it's going to start off cold anyway, how much worse is it?
<jekstrand>
robclark: VkPipelineCache requires us to be able to serialize at any time.
<robclark>
no prefetch, etc
<jekstrand>
robclark: Right. That could suck
<jekstrand>
I'm kind-of inclined to say we can eat the cost at vkGetPipelineCacheData() time and saving memory is more important.
<jekstrand>
Especially on little ARM parts
<jekstrand>
Less worried about Intel laptops with 16G of ram
<robclark>
on a6xx we *can* do cached-coherent.. but we mostly only want to do that for things we expect more CPU readback than GPU access.. on gallium side we use it for staging bo's (like texture upload/download)
<clever>
jekstrand: you could in theory clean (write back) the cache without invalidating it, so then the data exists in both cache and ram
<jekstrand>
With discrete, the memory can actually disappear if we have a device lost.
<robclark>
jekstrand: I guess the question is whether GetPipelineCacheData() is expected to be critical path.. or if only done on cache misses when game is starting up
<jekstrand>
But I'm currently thinking we just lose some objects if that happens.
<jekstrand>
robclark: It should only be done after the load screen and they've compiled all their shaders
<jekstrand>
Or on shutdown
<jekstrand>
It's very much not critical path
<robclark>
sounds like probably not critical path.. I guess you should see what flto/cwabbott/anholt_/krh/etc say, they may have stronger opinions about vk side of things (tbh, my head is still more in gl/gallium)
<robclark>
but if not critical path, I don't think WC readback is that bad
<robclark>
we have draw time WC index buffer readback for quads (ie. Minecraft) on the GL side (and that sucks somewhat) ;-)
pushqrdx has joined #dri-devel
<clever>
jekstrand: another factor is textures, on the vc4 v3d (pi0 to pi3), the textures need to be in a special tiled format, but that conversion must be done entirely in the host cpu
<clever>
jekstrand: but on vc6 v3d (pi4(00)), the textures need to instead be in a different tiled format called UIF, and there is a dedicated hw block to convert from linear/planar rgb/yuv, to uif
<clever>
jekstrand: so you need to put one form of the image into ram, flush, allocate a second region, and tell the TFU to convert into the 2nd buffer, and now that 2nd buffer isnt coherent with the arm caches
<clever>
jekstrand: so you must discard/invalidate the arm cache for that range, or youll read back the wrong data
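To make the coherency point concrete, here is a toy model of one cache line that the GPU side cannot snoop. CPU writes have to be cleaned out to RAM before the hardware reads them. After a device such as the TFU writes RAM, the CPU copy has to be invalidated, or the next read returns stale data. All names are hypothetical. The model only illustrates the ordering rules and does not reflect how the kernel's cache-maintenance API looks.

```c
#include <stdbool.h>
#include <string.h>

#define LINE_BYTES 16

/* Toy model of one memory line shared by a CPU with a write-back cache and
 * a GPU-side block that reads and writes RAM directly, without snooping. */
struct toy_mem {
   unsigned char ram[LINE_BYTES];
   unsigned char cache[LINE_BYTES];  /* CPU's cached copy of the line */
   bool cache_valid;
};

/* CPU read: served from the cache once the line is resident. */
unsigned char cpu_read(struct toy_mem *m, int i)
{
   if (!m->cache_valid) {
      memcpy(m->cache, m->ram, LINE_BYTES);
      m->cache_valid = true;
   }
   return m->cache[i];
}

/* CPU write: only the cached copy changes until the line is cleaned. */
void cpu_write(struct toy_mem *m, int i, unsigned char v)
{
   cpu_read(m, i);   /* bring the line into the cache */
   m->cache[i] = v;
}

/* Clean: write the cached line back; the line stays cached afterwards. */
void cpu_clean(struct toy_mem *m)
{
   if (m->cache_valid)
      memcpy(m->ram, m->cache, LINE_BYTES);
}

/* Invalidate: drop the cached copy so the next read refetches RAM. */
void cpu_invalidate(struct toy_mem *m)
{
   m->cache_valid = false;
}

/* Device access: straight to RAM, never touching the CPU cache. */
unsigned char dev_read(const struct toy_mem *m, int i) { return m->ram[i]; }
void dev_write(struct toy_mem *m, int i, unsigned char v) { m->ram[i] = v; }
```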
<pushqrdx>
so i figured out what i did to make firefox stop tearing without compositor and on intel/modesetting driver, but idk why it works wonder if somebody can explain
<pushqrdx>
basically i just set my gpu clock speed to max which is 1250mhz
<pushqrdx>
just doing that makes firefox stop tearing
heat has joined #dri-devel
macromorgan has quit [Read error: Connection reset by peer]
macromorgan has joined #dri-devel
<robclark>
pushqrdx: well, tearing is after all a race condition, between gpu rendering and display scanning out.. change the gpu freq and you change the timing
<HdkR>
Likely still tearing, just on the top side of the application window where you can't really see it
<pushqrdx>
HdkR even in fullscreen no tearing anywhere
<pushqrdx>
still horizontal tearing but no vertical tearing at all
<HdkR>
Oh, so tearing
<pushqrdx>
yeah but vertical tearing is most annoying i rarely ever see horizontal tearing anyways
<pushqrdx>
robclark but why would this clock speed make it almost perfectly in sync with the monitor? shouldn't they drift out of phase over time somehow
<pushqrdx>
it's not like they started exactly at the same time
<HdkR>
Upclocking likely just makes it fast enough that the tear doesn't happen across multiple lines. Staying instead in roughly one line
<pushqrdx>
makes sense this might be it
<robclark>
yeah, somehow or another you are just changing the timing and that happens to make it less noticeable
<pushqrdx>
robclark it's not less noticeable it's gone, 2 days now and i am pedantically checking every corner while scrolling (i have firefox smooth scrolling just like on macOS by enabling physics scroll) and the animation is smooth with no tearing once i set the clock to max
HaeckseAlexandra has joined #dri-devel
<clever>
robclark: but with proper design, only doing pageflips on vsync, tearing should be impossible, no matter how slow components are
slattann has joined #dri-devel
<clever>
robclark: what i dont understand, is how so many drivers can mess up such a basic thing
<robclark>
clever: you use "with proper design" when talking about x11 ;-)
<robclark>
ie. it is front buffer rendering
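Here is a toy simulation of robclark's point that X11 without a compositor is front-buffer rendering, compared with the flip-at-vblank scheme clever describes. Each buffer row records which frame last drew it. A refresh "tears" if the rows the display fetched came from different frames. The timing parameters and function names are made up for illustration.

```c
#include <stdbool.h>

#define ROWS 8   /* toy display height */

/* Simulate one refresh. The display fetches row t of scanout_src at tick t,
 * while the renderer draws frame `frame` into target from the top, starting
 * at tick `start` and covering rows_per_tick rows per tick. Returns true if
 * the fetched rows came from more than one frame, i.e. a visible tear. */
bool refresh_tears(const int *scanout_src, int *target, int frame,
                   int start, int rows_per_tick)
{
   int seen[ROWS];
   for (int t = 0; t < ROWS; t++) {
      if (t >= start) {
         int drawn = (t - start + 1) * rows_per_tick;
         if (drawn > ROWS)
            drawn = ROWS;
         for (int r = 0; r < drawn; r++)
            target[r] = frame;
      }
      seen[t] = scanout_src[t];
   }
   for (int r = 1; r < ROWS; r++)
      if (seen[r] != seen[0])
         return true;
   return false;
}

/* Page flip: swap which buffer is scanned out. Done only at vblank and,
 * in this toy, only after the back buffer has been fully drawn. */
void flip(int **front, int **back)
{
   int *tmp = *front;
   *front = *back;
   *back = tmp;
}
```

When scanout and drawing share one buffer, the result depends on where the beam happens to be. This is why changing the GPU clock (as in pushqrdx's case) moves or hides the tear rather than removing the race.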
<clever>
robclark: from what ive heard, the rpi x11, is abusing opengl for its composition, so it seems simple enough to just have 2 or 3 buffers that opengl renders into, and flip between them
<clever>
and yet even that still tears
<clever>
perhaps because the clients within x11, arent double-buffering?
<clever>
so opengl reads the client rect and tears?
<pushqrdx>
someone implemented tearfree for modesetting x11 driver
<pushqrdx>
but it never got merged for some reason
<pushqrdx>
been thinking about rebasing it and giving it a shot
<robclark>
the fact that glamor uses opengl to accel x11 rendering is orthogonal to frontbuffer vs flipchain
<clever>
i also say "abusing opengl for its composition", because the 2d kms layer on the rpi can do composition already
<robclark>
x11 is *not* a compositor (although compositing window managers can use x11 extensions to move client rendering offscreen)
<clever>
robclark: https://www.youtube.com/watch?v=JFmCin3EJIs this is a demo i wrote, using the same hw that backs kms on an rpi, 13 separate framebuffers, with per-pixel alpha, all moving on vsync, no tearing, almost zero cpu usage
<robclark>
sure, and not useful at all for x11 ;-)
tobiasjakobi has joined #dri-devel
<clever>
robclark: what if i give each x11 client, its own dedicated bitmap image in ram, and then just tell the hw to render each of those at the right xy coords?
<clever>
then moving a given window, would never tear
<robclark>
because that isn't how x11 works ;-)
<clever>
but for updates to a window, the client would still have to double-buffer itself, and ask x11 to swap
<clever>
how is x11 different? ive not used it as that low of a level yet
<robclark>
yeah, and then you have a thing that is called "wayland"
<pushqrdx>
except that wayland still sucks
<robclark>
well, I mean you were the one complaining about tearing ;-)
<pushqrdx>
and still using x11 i think this speaks enough
<clever>
robclark: how does x11 work at a low level, is it one bitmap for the whole desktop or per-client bitmaps? i can see how you might abuse stride to tell a client to render to a sub-rect, but occlusions complicate that
<robclark>
x11 is a drawing protocol basically..
<FLHerne>
pushqrdx: Wayland pretty much doesn't suck these days
<clever>
robclark: so if i was doing something basic like just hitting page-down once in firefox, what would firefox be sending to x11 to perform that action?
<clever>
FLHerne: can you screen-share your whole desktop?
<FLHerne>
I've been using it for months and don't notice that I am, which is the best thing an infrastructure layer can achieve :p
<FLHerne>
Yes, with xdg-desktop-portal and Pipewire
<robclark>
clever: modern apps are probably doing most of their rendering client side and then doing XShmPutImage to upload that to the screen, I'd guess.. but x11 proto covers everything from drawing text / lines / etc
<clever>
robclark: and XShmPutImage is basically then just an image copy, from an image in mmap'd /dev/shm, to a sub-rect of the main x11 framebuffer?
<robclark>
but unless you are using a compositing window manager, all that drawing is to the scanout buffer (modulo hacks like using a shadow buffer and incurring yet an extra copy)
<robclark>
basically
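Conceptually, the server side of the `XShmPutImage` case discussed here is a strided rectangle copy from the shared-memory image into the destination. Without a compositor, that destination is the scanout buffer. The sketch below is hypothetical and assumes 32 bpp on both sides and a rectangle already clipped to the framebuffer. The real X server path also handles clipping, other depths and damage.

```c
#include <stdint.h>
#include <string.h>

/* Copy a w x h client image (32 bpp, src_stride bytes per row) into the
 * framebuffer at (dst_x, dst_y). Rows are copied one at a time because the
 * two buffers generally have different strides. No clipping is done. */
void put_image(uint8_t *fb, size_t fb_stride,
               const uint8_t *src, size_t src_stride,
               int dst_x, int dst_y, int w, int h)
{
   const size_t bpp = 4;
   for (int y = 0; y < h; y++) {
      uint8_t *dst_row = fb + (size_t)(dst_y + y) * fb_stride
                            + (size_t)dst_x * bpp;
      memcpy(dst_row, src + (size_t)y * src_stride, (size_t)w * bpp);
   }
}
```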
<clever>
and a compositing window manager would hijack XShmPutImage, copy them to something like an opengl texture, and then let shaders do the final render?
<pushqrdx>
FLHerne nothing on wayland feels cohesive, even mouse cursor is drawn by clients and you can run into issues like each app having a different mouse cursor theme or size, Xwayland is still buggy, i can't run some of my essential X11 dependent apps
<clever>
that doesnt seem terribly hard to avoid tearing on, just set a rule that the client must not modify an image while XShmPutImage() is executing
<clever>
and have x11 double-buffer (hmm, thats a bit tricky) and only flip after the copy is entirely done
<pushqrdx>
if all these years had been spent on overhauling x11 instead of going for the new shiny thing, i am pretty sure X11 would've been working perfectly now
<clever>
to double-buffer, i would need two final buffers to render into
<pushqrdx>
it's ironic that the main argument for wayland is, x11 is old and maintaining it without breaking changes is hard
<clever>
but when i flip, i'm rendering into a buffer that is from 1 frame in the past, and doesnt have what i just rendered
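One common answer to the stale-back-buffer problem is damage tracking by buffer age, the idea that `EGL_EXT_buffer_age` exposes to GL clients. With two buffers, the back buffer is two frames old, so besides this frame's damage you also repaint the previous frame's. Below is a minimal sketch that uses row ranges as damage. `struct damage`, `damage_union()` and `repaint_back()` are hypothetical, and the union is a simple bounding range rather than an exact region.

```c
/* Half-open row range [y0, y1); empty when y0 >= y1. */
struct damage { int y0, y1; };

/* Smallest range covering both inputs (bounding-range union). */
static struct damage damage_union(struct damage a, struct damage b)
{
   if (a.y0 >= a.y1)
      return b;
   if (b.y0 >= b.y1)
      return a;
   struct damage u = { a.y0 < b.y0 ? a.y0 : b.y0,
                       a.y1 > b.y1 ? a.y1 : b.y1 };
   return u;
}

/* With two buffers the back buffer has age 2: it lacks both the previous
 * frame's damage and this frame's. Repaint the union of the two from the
 * current scene contents. */
void repaint_back(int *back, const int *scene,
                  struct damage prev, struct damage cur)
{
   struct damage d = damage_union(prev, cur);
   for (int y = d.y0; y < d.y1; y++)
      back[y] = scene[y];
}
```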
<pushqrdx>
then proceed to break everything by making a new protocol
<robclark>
clever: composite extension allows redirecting client rendering to offscreen.. it is what compositing window mgrs (gnome-shell, kwin, etc) use
<HaeckseAlexandra>
I tried building mesa on my Talos II, which has an Aspeed AST2500 GPU and an IBM POWER9 CPU. OpenGL works quite well using LLVMpipe. Now I'm trying to get Lavapipe working.
<pushqrdx>
if you're gonna break everything anyway, why don't you just overhaul the xserver from the ground up and deprecate/break the old/hard to maintain stuff
<HaeckseAlexandra>
When I try to run Vulkan Gears:
<HaeckseAlexandra>
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
<HaeckseAlexandra>
Could not find a graphics and/or presenting queue!
<HaeckseAlexandra>
Note: you can probably enable DRI3 in your Xorg config
<HaeckseAlexandra>
vulkan: No DRI3 support detected - required for presentation
<clever>
robclark: but when using the offscreen rendering, isnt that basically just one server-side bitmap per client, and XShmPutImage must still perform a costly copy into that bitmap?
<HaeckseAlexandra>
When I try to build Mesa:
<HaeckseAlexandra>
../meson.build:180:6: ERROR: Problem encountered: Unknown architecture ppc64. Please pass -Ddri-drivers to set driver options. Patches gladly accepted to fix this.
<robclark>
pushqrdx: people who know x11 much better than you or even myself concluded that wasn't possible... but the code is all there in a git tree, so have at it if you think you can do better
kem has quit [Ping timeout: 480 seconds]
* robclark would argue that wayland is that "overhaul it from the ground up" approach
<DrNick>
rename Wayland to X12!
<clever>
robclark: whoa, that <canvas> demo is neat! i can see how that would greatly simplify explaining things
<HdkR>
HaeckseAlexandra: You need to explicitly pass a `-Ddri-drivers=<list>` to meson in this case to choose which drivers you care about
<robclark>
clever: yes, when you have a window manager using the composite extension, client rendering is redirected (usually to a gl texture).. and page flipping is used.. normally a compositing window manager can avoid tearing
<HdkR>
HaeckseAlexandra: Probably also `-Dgallium-drivers=<list>` and `-Dvulkan-drivers=<list>` as well
<pushqrdx>
robclark having an opinion doesn't mean i think i can do better or know better, or even have anything but respect for devs working on wayland
<clever>
robclark: in the case of the rpi hw, the 3d unit cant use linear/raster images, so you now incur 1 copy from shm->offscreen, then a second complicated copy from offscreen->tiled-texture, then the shaders do a polygon based copy to a final buffer!
<HdkR>
HaeckseAlexandra: You can check the meson_options.txt file to see the valid arguments that can be passed to those options, comma delimited.
<clever>
robclark: but if you instead used kms with the composite extension, you could cut that all down to a single copy, and just kms composite the offscreen buffers together
<pushqrdx>
it's just that i would've loved a more non breaking change than wayland.. also wayland protocol leaves too much to interpretation
<robclark>
clever: the wayland architecture makes it a lot easier to put surfaces on overlays.. I know weston can do this.. not sure what the state of using overlays is on kwin, gnome-shell, and other wayland compositors
<clever>
robclark: also, linux enforces a silly rule of something like 8 or 10 surfaces in kms max, despite the rpi hw being more capable
<clever>
so wayland tends to still lean on opengl for that
<robclark>
if there is an upper limit on kms planes, that should be an easy restriction to lift
jewins has joined #dri-devel
<clever>
robclark: the biggest problem, is that the limit on the rpi hw, is heavily based on the memory bandwidth, and the total number of pixels the hw has to read in
<robclark>
the bigger problem is a lot of the mature desktop environments still have gl baggage from when they started out life as x11 compositing window managers
<clever>
robclark: for large windows with per-pixel alpha, it tops out at around 13, but for tiny windows (under 32x32), it tops out around 170
<clever>
i suspect fixed alpha may also help, the hw can do occlusion checking, and just not fetch covered pixels
gouchi has quit [Remote host closed the connection]
<robclark>
the atomic API (and atomic-test step) was designed with those sort of use cases in mind.. rpi might be a bit on the extreme end of things, but other hw also has cases where # of planes that can be composited depends on size/position
<clever>
the rpi hw also has 2 modes of composition
<clever>
the primary mode, is to composite in realtime, generating scanlines as needed
<clever>
but if it cant keep up, the video signal corrupts
<clever>
the second mode, is a writeback, where it uses a FIFO and the dma controller (i think) to just dump a rasterized image back into ram
<clever>
timing is far less sensitive then, so you could trivially do ~500 layers at once
kem has joined #dri-devel
craftyguy_ has joined #dri-devel
<clever>
and it's simpler than 3d, so it may be faster or more power efficient than opengl
<clever>
and then you just use the primary mode to scanout that final image
<clever>
the writeback mode, is also the only way to get 90 degree rotations (axis swaps), but axis flips (mirroring) are cheap and can be done at scanout
gouchi has joined #dri-devel
<clever>
robclark: oh, ive also noticed some very wonky behaviour in x11 with regard to the alpha channel
<clever>
xterm sets the alpha channel to fully transparent, so it just vanishes
<clever>
xeyes will render the initial eye and pupil fully opaque, but when it erases the pupil to move it, the erase uses transparent white, causing it to rip holes in the "eye"
<HaeckseAlexandra>
svp64 does not exist yet, it is a future ISA extension
craftyguy has quit [Ping timeout: 480 seconds]
<clever>
robclark: is the alpha channel basically just an undefined wild-west, and everybody just assumes the hw ignores it?
<HdkR>
HaeckseAlexandra: Sure, probably something like that
<robclark>
yeah, for x11 you can't really make assumptions about alpha channel.. in *some* cases it may be defined but display should treat it as xrgb
<clever>
robclark: had to dig around a bit, to find the flag for xrgb, but that did fix things 100%
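This shows why the XRGB flag fixes it. If the plane treats the top byte as real alpha (non-premultiplied here), xterm's alpha of 0 makes its pixels vanish. Treating the format as XRGB ignores that byte; in DRM terms, that is `DRM_FORMAT_XRGB8888` rather than `DRM_FORMAT_ARGB8888`. The sketch below shows the per-pixel difference. The function names and the blend-over-an-opaque-background setup are assumptions for illustration.

```c
#include <stdint.h>

/* One channel of non-premultiplied "over": (s*a + d*(255-a)) / 255, rounded. */
static uint8_t mix(uint8_t s, uint8_t d, uint8_t a)
{
   return (uint8_t)((s * a + d * (255 - a) + 127) / 255);
}

/* Blend an ARGB8888 source pixel over an opaque destination pixel. */
uint32_t blend_argb(uint32_t src, uint32_t dst)
{
   uint8_t a = (uint8_t)(src >> 24);
   uint32_t out = 0xff000000u;
   for (int shift = 0; shift < 24; shift += 8)
      out |= (uint32_t)mix((src >> shift) & 0xff, (dst >> shift) & 0xff, a)
             << shift;
   return out;
}

/* XRGB8888: the top byte is undefined padding, so force it opaque before
 * blending, which reduces to a plain copy of the colour channels. */
uint32_t blend_xrgb(uint32_t src, uint32_t dst)
{
   return blend_argb(src | 0xff000000u, dst);
}
```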
alyssa has left #dri-devel [#dri-devel]
<clever>
the hw supports either per-pixel alpha, or a single layer-wide alpha
<clever>
and maybe a mix of both, i didnt try that yet
<clever>
my rough understanding, is that xvideo is basically just a way to bypass the XShmPutImage, so you can shove frames directly to the gpu, and then chroma-key them in?
slattann has quit []
xexaxo has quit [Ping timeout: 480 seconds]
xexaxo has joined #dri-devel
<vsyrjala>
xv is essentially just yuv putimage. but there are magic implementations that use an overlay instead of writing the stuff to screen memory
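Whatever ends up displaying an Xv image, whether an overlay plane or a copy into screen memory, has to convert YUV to RGB somewhere. The sketch below uses the widely used 8-bit fixed-point approximation of limited-range BT.601. The matrix and range a given driver or plane actually applies depend on the hardware and the stream, so treat BT.601 as an assumption.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
   return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Limited-range BT.601 YCbCr -> RGB using 8.8 fixed point. Division by 256
 * (instead of >> 8) keeps negative intermediates portable; they clamp to 0
 * either way. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
   int c = y - 16, d = u - 128, e = v - 128;
   *r = clamp_u8((298 * c + 409 * e + 128) / 256);
   *g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
   *b = clamp_u8((298 * c + 516 * d + 128) / 256);
}
```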
slattann has joined #dri-devel
<HaeckseAlexandra>
Godot3 is now working, I will continue on Vulkan later.
<HaeckseAlexandra>
OpenGL ES 3.0 Renderer: llvmpipe (LLVM 11.0.1, 128 bits)
<HaeckseAlexandra>
Godot4 also depends on embree which has not yet been ported to PowerPC.
<clever>
vsyrjala: ah, and x86 gpu's tended to only support one such overlay?
<clever>
vsyrjala: but the rpi hw can accept any image in yuv or rgb, and with different bpp, and just magically convert them all in realtime
<vsyrjala>
most can
<clever>
maybe the problem is just that the protocol was designed when most couldnt
<clever>
and backwards compatibility is crippling it
<vsyrjala>
there's nothing especially limiting in the protocol. assuming you think putimage is all you need
<clever>
vsyrjala: but does the protocol require putimage to use the same pixel order and bpp, for every client?
<HaeckseAlexandra>
At a later time, I will also try using the RPI, using its experimental libre firmware and reverse engineered GPU. But that way is still long.
<clever>
HaeckseAlexandra: i do have xfce running fully on the open firmware, on a pi2
slattann has quit []
<vsyrjala>
clever: no. oh, one extra thing xv does get you core protocol putimage doesn't have is scaling
<clever>
vsyrjala: and that reminds me, alpha based overlays ontop of xv, are limited to the resolution of the video
<clever>
so if i'm up-scaling a 640x480 xv video onto a 1080 monitor
<clever>
the UI must render at 640x480!
<clever>
i think that software is rendering the ui into the 640x480 yuv buffer?
<clever>
because anything rendered at the x11 layer, cant alpha blend, due to the chroma-keying
<clever>
so its either look like crap, or look like crap :P
<vsyrjala>
sure. putimage is meant to not blend
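Here is the chroma-key behaviour clever describes, sketched per pixel. The overlay shows video only where the desktop pixel exactly equals the key colour. Any UI pixel blended over the key therefore stops matching and simply covers the video. The key value and the function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical key colour the application paints where video should show. */
#define COLOR_KEY 0x00ff00ffu   /* magenta, XRGB */

/* Colour-keyed overlay decision for one pixel: exact match on the key shows
 * the video, anything else shows the desktop. It is all-or-nothing, so a
 * partially transparent UI element over the video cannot blend with it. */
uint32_t overlay_pixel(uint32_t desktop, uint32_t video)
{
   return ((desktop & 0x00ffffffu) == COLOR_KEY) ? video : desktop;
}
```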
<clever>
the rpi hw also has full scaling support as well, so each image being rendered, can be scaled differently, but that does cut into the total layer count
<clever>
the control list is 4096 slots long, a 1:1 image takes up 7 slots, but a downscaled image i think takes 14 slots
<clever>
just for the fields to describe how to scale it
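Take clever's numbers at face value: a 4096-entry control list, about 7 entries for an unscaled layer and, by his recollection, about 14 for a scaled one. The list size alone then caps things at 585 unscaled or 292 scaled layers. Below is a trivial sketch of that budget check. It ignores any list terminator or per-frame overhead. In the realtime mode, the memory-bandwidth limits mentioned earlier bind long before this.

```c
#include <stdbool.h>

/* Figures quoted in the chat, not checked against hardware docs. */
#define CTL_LIST_SLOTS  4096
#define SLOTS_UNSCALED  7
#define SLOTS_SCALED    14

/* Control-list entries needed for a set of layers (no overhead counted). */
int slots_needed(int unscaled, int scaled)
{
   return unscaled * SLOTS_UNSCALED + scaled * SLOTS_SCALED;
}

bool layers_fit(int unscaled, int scaled)
{
   return slots_needed(unscaled, scaled) <= CTL_LIST_SLOTS;
}
```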
HaeckseAlexandra has left #dri-devel [#dri-devel]
pnowack has quit [Quit: pnowack]
xexaxo has quit [Ping timeout: 480 seconds]
xexaxo has joined #dri-devel
<clever>
robclark: i just checked xrestop, xfwm4 has nearly 1gig worth of `Pxm mem`!!!!
<clever>
i'm also reminded of how video in youtube/chromium sometimes glitches out
<clever>
if i kill the gpu process, the backing data for the video player seems to be lost, and stops painting
<clever>
but, it will rapidly flicker the video rect, between 2 different corrupted frames
<clever>
what i think is happening, is that something in the stack, is double-buffering, but it has forgotten to paint over the video rect
<clever>
so every time an update occurs, it modifies a frame, and swaps, but a rect in that frame is no longer painting at all
Company has quit [Read error: Connection reset by peer]
tobiasjakobi has quit [Remote host closed the connection]
flacks_ has joined #dri-devel
flacks has quit [Ping timeout: 480 seconds]
flacks_ has quit [Ping timeout: 480 seconds]
kaylie has quit [Quit: Bye!]
moony has joined #dri-devel
iive has joined #dri-devel
moony has quit []
cphealy has quit [Quit: Leaving]
moony has joined #dri-devel
bgs has joined #dri-devel
moony has quit []
moony has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
gouchi has quit [Remote host closed the connection]
rasterman has quit [Quit: Gettin' stinky!]
heat has quit [Ping timeout: 480 seconds]
pcercuei has quit [Quit: dodo]
Hi-Angel has quit [Ping timeout: 480 seconds]
nroberts has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
ceyusa has quit [Ping timeout: 480 seconds]
gawin has quit [Quit: Konversation terminated!]
pushqrdx has quit [Remote host closed the connection]
craftyguy_ is now known as craftyguy
iive has quit []
jewins has quit [Ping timeout: 480 seconds]
pendingchaos_ has joined #dri-devel
pendingchaos has quit [Read error: No route to host]