JohnnyonFlame has quit [Remote host closed the connection]
JohnnyonFlame has joined #etnaviv
adjtm is now known as Guest755
adjtm has joined #etnaviv
Guest755 has quit [Ping timeout: 480 seconds]
JohnnyonFlame has quit [Ping timeout: 480 seconds]
frieder has joined #etnaviv
ecrn has joined #etnaviv
<ecrn>
I copied and slightly modified etna_viv/attic/test2d/bitblt2d.c command stream, to just draw one full-size rectangle and to compile with the libdrm-etnaviv API - it is supposed to blit 1920x1080 buffer without offset from one bo to another
<ecrn>
it seems to work just fine, but I get 23ms per blit
<ecrn>
is this expected, or am I doing something wrong?
<ecrn>
NXP claims 600Mpix/s, but I don't know for which operations
<ecrn>
that should be <4ms for 1920x1080
<austriancoder>
ecrn: maybe try the same with galcore and get some numbers?
<lynxeye>
ecrn: It seems you are also measuring the CPU cost of allocating backing storage for the BOs, etc. If you want to measure raw GPU performance, you really need to make a run to set everything up, then do a measuring run, which just submits the cmdstream and waits for completion.
<ecrn>
the etna_bo_map? I can move it outside the timed section
<lynxeye>
Note that memory allocation for BOs happens on first use, so if you don't touch your BOs with the CPU, the etna_bo_new just sets up the container structure. The actual memory allocation only happens when you submit the cmdstream.
<lynxeye>
bo_map just sets up the mmap, if you don't touch this region with the CPU, the backing pages will not be faulted in.
<ecrn>
a single fault will allocate the whole buffer?
<lynxeye>
ecrn: With the current kernel driver implementation: yes. But there is no guarantee in the UAPI, so this might change in the future.
<ecrn>
ok, I added memset(dst, 0, bmp_size); before the timed section, src buffer is filled by cpu earlier
<ecrn>
Elapsed 23608us
<ecrn>
I can compare with g2d and galcore, but that requires another kernel and setting up the libraries
<lynxeye>
ecrn: Your dst pointer isn't volatile, so the compiler might optimize away the memset.
<ecrn>
there are no fields corresponding to the most significant bits, for example
<austriancoder>
ecrn: that's all we have .. sorry .. feel free to RE the missing bits
<ecrn>
ok, so the etna_set_state(stream, VIVS_DE_ROT_ANGLE, 0xfffffdc7); should generate the same sequence as the pasted one?
<austriancoder>
yes
<ecrn>
the dumped stream works after some adaptations
<ecrn>
but...
<ecrn>
Elapsed 24262us
<ecrn>
could it be some gpu clock frequency issue?
<lynxeye>
ecrn: Are you sure you are measuring the right thing in the etnaviv test? Also, which SoC is this?
<ecrn>
i.MX6DL
<ecrn>
I'm not sure, but now the only thing that is being timed is gen_cmd_stream_g2d(stream, bmp, bmp2, width, height); which uses only etna_set_state/etna_cmd_stream_emit
<ecrn>
and etna_cmd_stream_finish(stream);, which, as I understand it, is the thing that should be measured
<ecrn>
both bos are created with etna_bo_new(dev, bmp_size, ETNA_BO_UNCACHED);
<ecrn>
and then are mapped and filled with cpu
<ecrn>
in the g2d case I timed g2d_blit and g2d_finish
<lynxeye>
ecrn: On the 6DL the mainline kernel corrects the GPU clocks to avoid overclocking them (which is the SoC default POR state). No idea if your downstream kernel does the same. IIRC that's a 200MHz difference in 2D GPU clock rate.
<ecrn>
I have 5.8.10 from mainline, the only change was a custom device tree
<ecrn>
and the g2d test was under freescale/nxp kernel
<ecrn>
so it may be the case
<ecrn>
but that would be over 2x the clock frequency
<DPA>
Do 2 blits take twice as long? Maybe there is some latency somewhere?
<ecrn>
yes, I tested 10 blits looping gen_cmd_stream_g2d(...) and etna_cmd_stream_finish(...) and it took 233ms
<ecrn>
but I don't know if I have to reset the stream in some way after the etna_cmd_stream_finish, so that probably wasn't valid
<ecrn>
brb
ecrn has quit [Remote host closed the connection]
ecrn has joined #etnaviv
<marex>
lynxeye: hey, so I noticed the GPCv2 driver suffers from lock ups on boot
<marex>
lynxeye: at least on 8mm and 8mn it does
<marex>
lynxeye: have you seen such a thing where you flip e.g. VPUMIX PUP_REQ on and the bit does not self-clear yet ?