ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
JohnnyonFlame has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
adjtm is now known as Guest755
adjtm has joined #etnaviv
Guest755 has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
ecrn has joined #etnaviv
<ecrn> I copied and slightly modified the etna_viv/attic/test2d/bitblt2d.c command stream, to just draw one full-size rectangle and to compile against the libdrm-etnaviv API - it is supposed to blit a 1920x1080 buffer, without offset, from one bo to another
<ecrn> it seems to work just fine, but I get 23ms per blit
<ecrn> https://nopaste.net/xHrxtLtgTJ this is the command stream
lynxeye has joined #etnaviv
<ecrn> and this is how I time it
<ecrn> is this expected, or am I doing something wrong?
<ecrn> NXP claims 600Mpix/s, but I don't know for which operations
<ecrn> that should be <4ms for 1920x1080
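For reference, the arithmetic behind that estimate:

    1920 x 1080 = 2,073,600 pixels per frame
    2,073,600 px / 600,000,000 px/s ≈ 3.46 ms per full-frame blit

so the measured ~23 ms is roughly 6-7x slower than the advertised rate would allow.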
<austriancoder> ecrn: maybe try the same with galcore and get some numbers?
<lynxeye> ecrn: It seems you are also measuring the CPU cost of allocating backing storage for the BOs, etc. If you want to measure raw GPU performance, you really need to make a run to set everything up, then do a measuring run, which just submits the cmdstream and waits for completion.
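A minimal sketch of the split lynxeye describes, against the libdrm-etnaviv API. gen_cmd_stream() is a hypothetical stand-in for ecrn's command-stream builder, and the device/stream/BO setup is assumed to have happened elsewhere:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <etnaviv_drmif.h>

    /* stand-in for ecrn's 2D blit command-stream builder (hypothetical) */
    extern void gen_cmd_stream(struct etna_cmd_stream *stream);

    static int64_t now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }

    static void time_blit(struct etna_cmd_stream *stream)
    {
        /* warm-up run: BO backing allocation, page faults and GPU MMU
         * setup all land here, outside the measurement */
        gen_cmd_stream(stream);
        etna_cmd_stream_finish(stream);

        /* measuring run: only cmdstream submission + wait for completion */
        int64_t t0 = now_us();
        gen_cmd_stream(stream);
        etna_cmd_stream_finish(stream);
        printf("Elapsed %" PRId64 "us\n", now_us() - t0);
    }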
<ecrn> the etna_bo_map? I can move it outside the timed section
<lynxeye> Note that memory allocation for BOs happens on first use, so if you don't touch your BOs with the CPU, the etna_bo_new just sets up the container structure. The actual memory allocation only happens when you submit the cmdstream.
<lynxeye> bo_map just sets up the mmap, if you don't touch this region with the CPU, the backing pages will not be faulted in.
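In code, the lazy allocation lynxeye describes looks roughly like this (a sketch; dev and bmp_size as in ecrn's test):

    #include <stdint.h>
    #include <string.h>
    #include <etnaviv_drmif.h>

    static void *prefault_bo(struct etna_device *dev, uint32_t bmp_size,
                             struct etna_bo **out_bo)
    {
        /* only sets up the container structure, no memory allocated yet */
        struct etna_bo *bo = etna_bo_new(dev, bmp_size, ETNA_BO_UNCACHED);

        /* only establishes the mmap, no pages faulted in yet */
        void *map = etna_bo_map(bo);

        /* the CPU touch is what faults the backing pages in; doing it
         * (or a warm-up submit) before the timed section keeps the
         * allocation cost out of the measurement */
        memset(map, 0, bmp_size);

        *out_bo = bo;
        return map;
    }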
<ecrn> a single fault will allocate the whole buffer?
<lynxeye> ecrn: With the current kernel driver implementation: yes. But there is no guarantee in the UAPI, so this might change in the future.
<ecrn> ok, I added memset(dst, 0, bmp_size); before the timed section; the src buffer is filled by the CPU earlier
<ecrn> Elapsed 23608us
<ecrn> I can compare with g2d and galcore, but that requires another kernel and setting up the libraries
<lynxeye> ecrn: Your dst pointer isn't volatile, so the compiler might optimize away the memset.
<ecrn> added volatile, still the same
frieder has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
<ecrn> blitted! 9627us - that's with g2d
<ecrn> is there an easy way to dump the command stream that g2d has generated?
pcercuei has joined #etnaviv
<ecrn> ok, I got the fdr.out
<austriancoder> ecrn: https://github.com/etnaviv/etna_viv --> tools/dump_cmdstream.py
<ecrn> ValueError: bad typecode (must be b, B, u, h, H, i, I, l, L, q, Q, f or d)
frieder has quit [Ping timeout: 480 seconds]
<austriancoder> ecrn: sadly.. it's python2.7
frieder has joined #etnaviv
<austriancoder> ecrn: nice.. [seq 45] is what you were looking for
<ecrn> ok, thanks
<ecrn> 0x080104af, /* LOAD_STATE (1) Base: 0x012BC Size: 1 Fixp: 0 */
<ecrn> 0xfffffdc7, /* [012BC] DE.ROT_ANGLE := DST=ROT0 */
<ecrn> what is the deal with it?
<ecrn> I cannot compose the 0xfffffdc7 value with VIVS_DE_* macros
<ecrn> I put etna_set_state(stream, VIVS_DE_ROT_ANGLE, 0xfffffdc7);, but it is probably wrong
<ecrn> there are no fields corresponding to the most significant bits, for example
<austriancoder> ecrn: that's all we have .. sorry .. feel free to RE the missing bits
<ecrn> ok, so the etna_set_state(stream, VIVS_DE_ROT_ANGLE, 0xfffffdc7); should generate the same sequence as the pasted one?
<austriancoder> yes
<ecrn> the dumped stream works after some adaptations
<ecrn> but...
<ecrn> Elapsed 24262us
<ecrn> could it be some gpu clock frequency issue?
<lynxeye> ecrn: Are you sure you are measuring the right thing in the etnaviv test? Also, which SoC is this?
<ecrn> i.MX6DL
<ecrn> I'm not sure, but now the only thing that is being timed is gen_cmd_stream_g2d(stream, bmp, bmp2, width, height); which uses only etna_set_state/etna_cmd_stream_emit
<ecrn> and etna_cmd_stream_finish(stream);, which, as I understand it, is the thing that should be measured
<ecrn> both bos are created with etna_bo_new(dev, bmp_size, ETNA_BO_UNCACHED);
<ecrn> and then are mapped and filled with cpu
<ecrn> in the g2d case I timed g2d_blit and g2d_finish
<lynxeye> ecrn: On the 6DL the mainline kernel corrects the GPU clocks to avoid overclocking them (the overclocked configuration is the SoC's default POR state). No idea if your downstream kernel does the same. IIRC that's a 200MHz difference in 2D GPU clock rate.
<ecrn> I have 5.8.10 from mainline; the only change was a custom device tree
<ecrn> and the g2d test was under freescale/nxp kernel
<ecrn> so it may be the case
<ecrn> but that would mean over a 2x difference in clock frequency
<DPA> Do 2 blits take twice as long? Maybe there is some latency somewhere?
<ecrn> yes, I tested 10 blits looping gen_cmd_stream_g2d(...) and etna_cmd_stream_finish(...) and it took 233ms
<ecrn> but I don't know if I have to reset the stream in some way after the etna_cmd_stream_finish, so that probably wasn't valid
<ecrn> brb
ecrn has quit [Remote host closed the connection]
ecrn has joined #etnaviv
<marex> lynxeye: hey, so I noticed the GPCv2 driver suffers from lock ups on boot
<marex> lynxeye: at least on 8mm and 8mn it does
<marex> lynxeye: have you seen such a thing where you flip e.g. VPUMIX PUP_REQ on and the bit does not self-clear yet ?
ecrn has quit [Remote host closed the connection]
ecrn has joined #etnaviv
<ecrn> ok, so I commented out the changes from https://patchwork.kernel.org/project/linux-arm-kernel/patch/1474017371-28966-2-git-send-email-l.stach@pengutronix.de/ and the difference is very subtle, 23064-23178us vs 24015-24440us per blit
<ecrn> so it seems to be something else that causes it
frieder has quit [Remote host closed the connection]
frieder has joined #etnaviv
<ecrn> .entryPipe = gcvPIPE_3D,
<ecrn> .exitPipe = gcvPIPE_2D,
frieder_ has joined #etnaviv
<ecrn> does it matter?
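The entryPipe/exitPipe fields are from the galcore dump; with libdrm-etnaviv the pipe is instead fixed up front when the pipe object is created. A sketch, assuming (as the libdrm 2D test does) that core 1 is the 2D-capable GPU:

    struct etna_gpu *gpu = etna_gpu_new(dev, 1);   /* 2D core on i.MX6, by assumption */
    struct etna_pipe *pipe = etna_pipe_new(gpu, ETNA_PIPE_2D);
    struct etna_cmd_stream *stream = etna_cmd_stream_new(pipe, 0x300, NULL, NULL);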
frieder has quit [Ping timeout: 480 seconds]
frieder_ has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
<ecrn> 10x etna_cmd_stream_flush + a single etna_cmd_stream_finish still takes 233ms, and without the finish (only submitting) takes 4.4ms
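That gap is consistent with the API semantics: etna_cmd_stream_flush() submits the accumulated commands without waiting, while etna_cmd_stream_finish() submits and then blocks until the GPU signals completion. Roughly what the experiment looks like (now_us() as in the earlier sketch; gen_cmd_stream_g2d is ecrn's builder):

    int64_t t0 = now_us();
    for (int i = 0; i < 10; i++) {
        gen_cmd_stream_g2d(stream, bmp, bmp2, width, height);
        etna_cmd_stream_flush(stream);   /* submit only, don't wait */
    }
    etna_cmd_stream_finish(stream);      /* wait for all ten blits to retire */
    printf("10 blits: %" PRId64 "us\n", now_us() - t0);

As to the earlier reset question: in the libdrm implementation both flush and finish reset the stream's write offset (and invoke the reset_notify callback passed to etna_cmd_stream_new), so re-emitting into the same stream afterwards is the intended usage and the 10x loop should be valid.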
<marex> ecrn: could it have to do with some on-GPU cache hit/miss ?
<marex> austriancoder: ^
<ecrn> I don't think much of a 1920x1080 buffer fits into the GPU cache
JohnnyonFlame has quit [Ping timeout: 480 seconds]
ecrn has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
lynxeye has quit [Quit: Leaving.]
pcercuei has quit [Quit: dodo]