<pcercuei>
tl;dr, I'm working on it, but the road is paved with ambushes
<cphealy>
pcercuei: That's awesome! I am curious though about this "super-secret optimization" that you mention. ;-)
<pcercuei>
hugepages
<pcercuei>
switching from 4 KiB pages to 16 KiB gives a 10% boost in PocketSNES already
<cphealy>
Nice!
<cphealy>
Regarding the ETA for this update, are you looking for beta testers of the new OS?
<pcercuei>
I'm not quite there yet
<pcercuei>
Moving from fbdev was and still is a PITA
<cphealy>
Yea, where I work we are feeling the same pain transitioning from fbdev to DRM/KMS and Wayland
<pcercuei>
problem is more userspace than kernelspace
<cphealy>
Have you tried running Weston on the platform with etnaviv doing the compositing?
<cphealy>
We are using Weston with SDL2, Qt, and WPE on top of it.
<pcercuei>
No, that would add a huge overhead
<cphealy>
You mean the overhead of compositing?
<pcercuei>
What's the point of compositing, when the only thing you ever have is a fullscreen window?
<pcercuei>
Porting PocketSNES from SDL1 to SDL2 causes it to run at about 30-40% of its original speed
<cphealy>
With Weston 8, Weston is smart enough to bypass the composition step if the display controller HW (IPU) can scan out the scene. So in the case of a single fullscreen window, weston should skip the composition step and have the display controller do the work.
<pcercuei>
25fps instead of 60
<pcercuei>
So it doesn't pass through the GPU?
<cphealy>
Correct
<pcercuei>
hmm. I'd have to give it a try
<pcercuei>
I don't expect much TBH
<cphealy>
Also, if the display controller supports HW planes, Weston will try to use the HW plane capabilities of the display controller before falling back to GPU-based compositing. In the case of two fullscreen apps where one is in the foreground and the other in the background, and the foreground one has alpha transparency, the display controller might be able to handle that instead of Weston using the 3D GPU.
<pcercuei>
I have three planes, two primary and one overlay
<pcercuei>
the second primary is the IPU, which can scale and resize
<cphealy>
Do you have control of pane ordering?
<cphealy>
plane
<pcercuei>
no, Z ordering is fixed
<pcercuei>
also the two primary planes can't be used at the same time
<pcercuei>
but they can both have the overlay on top
<cphealy>
I could envision the OS having the game run fullscreen which the compositor scans out using one of the HW planes, then when the OS needs to do some pop-up or something for quitting the game or taking a screenshot or whatever, the system could render the pop-up using SDL(2) and have Weston scan it out in another plane. This allows for not paying a penalty by compositing using the 3D GPU.
<cphealy>
So in your case, game in the primary plane and the pop-up in the overlay plane.
<pcercuei>
yes, that's the idea
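The plane layout discussed above (two primary planes that cannot be active together, one overlay on top, fixed Z order) can be sketched as a small assignment routine. This is only an illustrative model, not Weston's actual plane-assignment code; the plane names and the `needs_scaling` flag are made up for the example.

```python
# Simplified, hypothetical model of the planes described in the chat:
# two primary planes that cannot be used at the same time, plus one
# overlay that always sits on top. Plane names are invented here.
PRIMARIES = ("primary0", "primary1-ipu")
OVERLAY = "overlay"


def assign_planes(layers, needs_scaling=False):
    """Map bottom-to-top layers onto hardware planes.

    Returns a dict {layer: plane}, or None when the layers do not fit,
    in which case a compositor would have to composite them itself.
    """
    if not layers:
        return {}
    if len(layers) > 2:
        return None  # only one primary plus the overlay can be shown
    # Assumption: the IPU-backed primary is chosen only when scaling is needed.
    primary = PRIMARIES[1] if needs_scaling else PRIMARIES[0]
    assignment = {layers[0]: primary}
    if len(layers) == 2:
        assignment[layers[1]] = OVERLAY
    return assignment
```

With a game and a pop-up, `assign_planes(["game", "popup"])` puts the game on a primary plane and the pop-up on the overlay; three or more layers return `None`, which stands for the fallback to GPU or software compositing.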
<cphealy>
I will say it's quite likely when exercising the display controller in this manner with Weston that you will find little bugs in the IPU and/or IPU drivers.
<pcercuei>
What? My IPU driver is perfect :D
<cphealy>
haha, of course!
<pcercuei>
The other thing we had on the old framebuffer, that we can't have with KMS: Since everything was painted to the framebuffer, which had a fixed size, you wouldn't overflow the buffer by giving userspace a bigger resolution than what it asked for
<pcercuei>
Which means that userspace could ask for *any* resolution, and the driver would give it the closest
<pcercuei>
DRM/KMS doesn't have a "get closest resolution" callback
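Since KMS offers no such callback, userspace would have to pick a mode itself from the list the connector reports (with libdrm, for instance, via `drmModeGetConnector`). A minimal sketch of that selection follows. What "closest" means is an assumption here: the smallest mode that fully covers the request, falling back to the largest mode when nothing covers it. The function name is hypothetical.

```python
def closest_mode(requested, modes):
    """Pick a supported (width, height) for a requested (width, height).

    Assumed policy: prefer the smallest mode that is at least as large as
    the request in both dimensions; otherwise return the largest mode.
    Raises ValueError if `modes` is empty.
    """
    req_w, req_h = requested

    def area(mode):
        return mode[0] * mode[1]

    covering = [m for m in modes if m[0] >= req_w and m[1] >= req_h]
    if covering:
        return min(covering, key=area)
    return max(modes, key=area)
```

For a SNES-sized request of 256x224 against modes of 320x240 and 640x480, this returns 320x240.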
<pcercuei>
Weston can run without a GPU?
<cphealy>
I believe there is a soft backend.
<cphealy>
I think in your case, you would still build it with GPU backend support though. This way for cases where you want to do things that involve more layers than you have HW plane support for, you can have the GPU do the compositing (albeit slower.)
<cphealy>
This may be fine though for such use cases, as they may occur when one is not actually playing the game.
<pcercuei>
If that means having to pull in Mesa, then the root filesystem will grow a lot in size, that won't work for some devices
<pcercuei>
limit for these is ~30 MiB
<pcercuei>
for GCW Zero and similar, bigger is fine
<cphealy>
How does the platform work without Mesa?
<pcercuei>
KMS backend of Weston requires GBM, which is Mesa anyway, so I guess I don't have a choice
<pcercuei>
software rendering
<pcercuei>
the vast majority of apps and emulators are built with SDL1, which rasterizes to the framebuffer directly
<cphealy>
Does GCW Zero and RG350 OS typically ship without GPU support then?
<pcercuei>
there is GPU support, but it's rarely used
<pcercuei>
only games that use GL use it
<cphealy>
ack
<JohnnyonFlame>
for the majority of use-cases where you'd own such a device, it makes a lot of sense
<cphealy>
Related to the SDL framebuffer vs KMS/DRM issue, IIRC Ryan Gordon was working on a bridge library that allowed playing SDL1 apps on top of SDL2.
<cphealy>
From the article: "One bit that does sound quite exciting, is that it would get older games and applications that were software-rendered over onto the GPU and getting better performance, as well as better fullscreen support and more."
<pcercuei>
its KMS backend, that is
<mntmn>
cphealy: oh that's cool, didn't know
<JohnnyonFlame>
in our case it only means adding a (slow) middle man
<cphealy>
Oh yes, the SDL2 KMS backend doesn't seem to be getting much love. The Wayland backend on the other hand did get some love and at least for my use of SDL2 on Weston works quite well.
<pcercuei>
cphealy: well, that's BS. You won't get a better performance by software-rendering into a texture, then using the GPU to apply that texture to a GBM output buffer, then send that buffer to DRM
<pcercuei>
instead of software-rendering to the GBM buffer directly
<JohnnyonFlame>
IIRC you do get perf. on higher resolutions and desktop GPUs
<JohnnyonFlame>
e.g. directdraw
<cphealy>
I wonder if the benefit is for the case where the bame is low res and the GPU is used just to scale to fullscreen?
<cphealy>
bame - game
<pcercuei>
I think so, yes. Then it's indeed better to scale in hardware
<pcercuei>
but we have the IPU for that
<cphealy>
In the GCW Zero and RG350 case, yes the IPU could be used for that. I think the reason one would want to have the GPU do it is to be more platform agnostic.
<JohnnyonFlame>
added bonus for the IPU doing conversion of different pixel formats
<cphealy>
JohnnyonFlame: Wouldn't the GPU be able to do this too?
<JohnnyonFlame>
not without a performance hit
<cphealy>
At least with the Vivante GPUs I work with, we can import textures with many different color spaces including YUYV.
<pcercuei>
GPU renders to memory
<pcercuei>
IPU sends its data to the LCDC through an internal bus
<cphealy>
Where we see a hit to performance is if we want to import a texture that is semi-planar or fully planar like NV12. This results in a slowdown as the texture sampler cannot do the CSC for free.
<cphealy>
pcercuei: I see what you mean. Yes, there is a cost to have the GPU do the scale as it takes a CPU rendered buffer, reads it into the GPU then the GPU blits it back to DRAM.
<cphealy>
If the IPU scaling support is exposed through the DRM API, it might be that you can have the game render at the native resolution then have the compositor (weston or whatnot) scale it to fullscreen using the IPU through the DRM API. This prevents the need for any customization at the SDL1/SDL2 layer and pushes it down to the compositor. In this case of a CPU rendered game, the GPU would not be used at all and the IPU would be used to scale the game to fullscreen during scanout.
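Scaling at scanout through the DRM API comes down to giving a plane a source rectangle that differs in size from its destination rectangle. The sketch below computes the standard KMS plane properties for an aspect-preserving, centered fit. The `SRC_*` and `CRTC_*` property names are the standard KMS ones, and `SRC_*` values are in 16.16 fixed point; whether a given IPU driver accepts a particular scale factor is an assumption, and the function name is hypothetical.

```python
def scaled_plane_rect(src_w, src_h, disp_w, disp_h):
    """Return KMS-style plane properties that fit a source buffer onto a
    display while keeping its aspect ratio and centering the result."""
    # Compare aspect ratios with integer cross-products (no floats).
    if src_w * disp_h >= disp_w * src_h:
        # Source is at least as wide as the display: fit to width.
        dst_w = disp_w
        dst_h = src_h * disp_w // src_w
    else:
        # Source is taller than the display: fit to height.
        dst_h = disp_h
        dst_w = src_w * disp_h // src_h
    return {
        "SRC_X": 0,
        "SRC_Y": 0,
        "SRC_W": src_w << 16,  # source sizes are 16.16 fixed point
        "SRC_H": src_h << 16,
        "CRTC_X": (disp_w - dst_w) // 2,  # center horizontally
        "CRTC_Y": (disp_h - dst_h) // 2,  # center vertically
        "CRTC_W": dst_w,
        "CRTC_H": dst_h,
    }
```

The game keeps rendering at its native size; only the destination rectangle changes with the panel resolution.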
<cphealy>
That's new enough that "draw-calls" should work.
<cphealy>
Does "GALLIUM_HUD=fps" work?
<RzR>
yes
<RzR>
cpu too
<mth>
pcercuei: technically we can't control the plane order, but you could switch the buffers and the plane configurations and that would effectively swap the planes as well
<cphealy>
RzR: When you use "GALLIUM_HUD=help", do you see reference to draw-calls?
<RzR>
nop
<pcercuei>
mth: the planes are not the same though
<cphealy>
RzR: with the apps you are running, do you have anything that dumps the OpenGL information so you can see for sure that you are using the etnaviv driver?
<cphealy>
RzR: OK, this explains why draw-calls is not working. You're not using etnaviv or the GPU. In this case 7fps isn't too bad considering it's all CPU rendered... ;-)
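One way to make that check mechanical is to look at the renderer string an application reports (what `glGetString(GL_RENDERER)` returns) and see whether it names one of Mesa's software rasterizers. A rough sketch, assuming the string has already been captured from some info tool or log; the marker list covers the Mesa software renderers known by these names and is not exhaustive, and the function name is hypothetical.

```python
# Substrings that identify Mesa's CPU-based renderers.
SOFTWARE_MARKERS = ("llvmpipe", "softpipe", "swrast")


def is_software_renderer(renderer):
    """Return True if a GL_RENDERER string points at CPU rendering."""
    lowered = renderer.lower()
    return any(marker in lowered for marker in SOFTWARE_MARKERS)
```

A `True` result only shows that rendering is on the CPU; a `False` result does not by itself prove that etnaviv is in use.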
<daniels>
pcercuei: I don't know where buildroot puts the cursor data, but you'd need that ... it sounds like their packaging is just compiling things and hoping, rather than ever testing it :\
pcercuei has joined #etnaviv
<RzR>
cphealy, yes I suspected this because cpu was loaded
<RzR>
cphealy, but it's way faster than my previous build
<cphealy>
Another thing to help tell if the GPU is being used or not is to check /proc/interrupts. If the GPU interrupt count is not incrementing, the GPU is not doing any work.
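The /proc/interrupts check can be scripted by summing the per-CPU counters of the GPU's row and comparing two readings taken a moment apart. A minimal sketch follows; the interrupt label to search for depends on the driver and device tree, so the label in the usage note is purely hypothetical.

```python
def irq_count(interrupts_text, name):
    """Sum the per-CPU counts of every /proc/interrupts row whose
    description contains `name`."""
    total = 0
    for line in interrupts_text.splitlines():
        _label, sep, rest = line.partition(":")
        if not sep or name not in rest:
            continue  # header row (no colon) or unrelated interrupt
        for token in rest.split():
            if not token.isdigit():
                break  # per-CPU counters end at the controller name
            total += int(token)
    return total
```

Read the file once, run the application for a second or two, read it again, and compare `irq_count(text, "gpu-3d")` between the two readings (with "gpu-3d" replaced by whatever label the GPU row carries on the device). An unchanged count suggests the GPU did no work in between.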