JohnnyonFlame has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
adjtm is now known as Guest755
adjtm has joined #etnaviv
Guest755 has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
ecrn has joined #etnaviv
<ecrn>
I copied and slightly modified etna_viv/attic/test2d/bitblt2d.c command stream, to just draw one full-size rectangle and to compile with the libdrm-etnaviv API - it is supposed to blit 1920x1080 buffer without offset from one bo to another
<ecrn>
it seems to work just fine, but I get 23ms per blit
<ecrn>
is this expected, or am I doing something wrong?
<ecrn>
NXP claims 600Mpix/s, but I don't know for which operations
<ecrn>
that should be <4ms for 1920x1080
<austriancoder>
ecrn: maybe try the same with galcore and get some numbers?
<lynxeye>
ecrn: It seems you are also measuring the CPU cost of allocating backing storage for the BOs, etc. If you want to measure raw GPU performance, you really need to make a run to set everything up, then do a measuring run, which just submits the cmdstream and waits for completion.
<ecrn>
the etna_bo_map? I can move it outside the timed section
<lynxeye>
Note that memory allocation for BOs happens on first use, so if you don't touch your BOs with the CPU, the etna_bo_new just sets up the container structure. The actual memory allocation only happens when you submit the cmdstream.
<lynxeye>
bo_map just sets up the mmap, if you don't touch this region with the CPU, the backing pages will not be faulted in.
<ecrn>
a single fault will allocate the whole buffer?
<lynxeye>
ecrn: With the current kernel driver implementation: yes. But there is no guarantee in the UAPI, so this might change in the future.
<ecrn>
ok, I added memset(dst, 0, bmp_size); before the timed section, src buffer is filled by cpu earlier
<ecrn>
Elapsed 23608us
<ecrn>
I can compare with g2d and galcore, but that requires another kernel and setting up the libraries
<lynxeye>
ecrn: Your dst pointer isn't volatile, so the compiler might optimize away the memset.
<ecrn>
there are no fields corresponding to the most significant bits, for example
<austriancoder>
ecrn: that's all we have .. sorry .. feel free to RE the missing bits
<ecrn>
ok, so the etna_set_state(stream, VIVS_DE_ROT_ANGLE, 0xfffffdc7); should generate the same sequence as the pasted one?
<austriancoder>
yes
<ecrn>
the dumped stream works after some adaptations
<ecrn>
but...
<ecrn>
Elapsed 24262us
<ecrn>
could it be some gpu clock frequency issue?
<lynxeye>
ecrn: Are you sure you are measuring the right thing in the etnaviv test? Also, which SoC is this?
<ecrn>
i.MX6DL
<ecrn>
I'm not sure, but now the only thing that is being timed is gen_cmd_stream_g2d(stream, bmp, bmp2, width, height); which uses only etna_set_state/etna_cmd_stream_emit
<ecrn>
and etna_cmd_stream_finish(stream);, which, as I understand it, is the thing that should be measured
<ecrn>
both bos are created with etna_bo_new(dev, bmp_size, ETNA_BO_UNCACHED);
<ecrn>
and then are mapped and filled with cpu
<ecrn>
in the g2d case I timed g2d_blit and g2d_finish
<lynxeye>
ecrn: On the 6DL the mainline kernel corrects the GPU clocks to avoid overclocking them (which is the SoC default POR state). No idea if your downstream kernel does the same. IIRC that's a 200MHz difference in 2D GPU clock rate.
<ecrn>
I have 5.8.10 from mainline, the only change was a custom device tree
<ecrn>
and the g2d test was under freescale/nxp kernel
<ecrn>
so it may be the case
<ecrn>
but that would be over 2x the clock frequency
<DPA>
Do 2 blits take twice as long? Maybe there is some latency somewhere?
<ecrn>
yes, I tested 10 blits looping gen_cmd_stream_g2d(...) and etna_cmd_stream_finish(...) and it took 233ms
<ecrn>
but I don't know if I have to reset the stream in some way after the etna_cmd_stream_finish, so that probably wasn't valid
<ecrn>
brb
ecrn has quit [Remote host closed the connection]
ecrn has joined #etnaviv
<marex>
lynxeye: hey, so I noticed the GPCv2 driver suffers from lock ups on boot
<marex>
lynxeye: at least on 8mm and 8mn it does
<marex>
lynxeye: have you seen such a thing where you flip e.g. VPUMIX PUP_REQ on and the bit does not self-clear yet ?