#linux-sunxi on 2021-08-19 — irc logs at oftc.irclog.whitequark.org

2021-07-26 22:56 ChanServ changed the topic of #linux-sunxi to: Allwinner/sunxi development - Did you try looking at our wiki? https://linux-sunxi.org - Don't ask to ask. Just ask and wait for an answer! - This channel is logged at https://oftc.irclog.whitequark.org/linux-sunxi

00:04 ftg has quit [Read error: Connection reset by peer]

00:19 cmeerw has quit [Ping timeout: 480 seconds]

00:25 apritzel has quit [Ping timeout: 480 seconds]

01:48 vagrantc has quit [Quit: leaving]

02:23 cnxsoft has joined #linux-sunxi

02:31 cnxsoft has quit [Remote host closed the connection]

04:59 mehdix has quit []

05:00 mehdix has joined #linux-sunxi

05:58 apritzel has joined #linux-sunxi

06:38 apritzel has quit [Ping timeout: 480 seconds]

07:40 cmeerw has joined #linux-sunxi

08:11 gsz has joined #linux-sunxi

08:23 ftg has joined #linux-sunxi

09:38 Mangy_Dog has joined #linux-sunxi

09:49 ftg has quit [Read error: Connection reset by peer]

10:03 hlauer has joined #linux-sunxi

11:05 apritzel has joined #linux-sunxi

11:41 Daanct12 has joined #linux-sunxi

11:47 Danct12 has quit [Ping timeout: 480 seconds]

13:34 <ndufresne> jernej: for a SUNXI tiled buffer, GStreamer expects 157696 bytes, but cedrus returns sizeimage=152064

13:35 <ndufresne> for a 352x288 video

13:37 <ndufresne> yeah, looks like the driver screwed up the size calculation

13:38 <ndufresne> it naively does height = ALIGN(height, 32); and then divides this by two for the luma

13:38 <ndufresne> but it has to do ALIGN(height, 64); for luma

13:39 <ndufresne> in this case we have 9 rows of tiles, for luma, this will allocate 4 rows, but we need 5

13:41 <ndufresne> 720p should be affected too

13:44 <ndufresne> * meant chroma in many places in that comment, sorry
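
[Editor's sketch] The two numbers quoted above can be reproduced with a few lines of arithmetic. This is only an illustration, not the cedrus or GStreamer code: it assumes 32x32-byte tiles and an NV12-style layout (full-size luma plane plus a half-height interleaved chroma plane), and the function names are hypothetical.

```python
TILE = 32  # assumed tile edge in bytes/rows for the SUNXI tiled format


def align(value, boundary):
    """Round value up to the next multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


def naive_sizeimage(width, height):
    """Align height to 32, then take half of that for chroma."""
    w = align(width, TILE)
    h = align(height, TILE)
    return w * h + w * h // 2


def padded_sizeimage(width, height):
    """Align height to 64 before halving, so chroma ends on a whole tile row."""
    w = align(width, TILE)
    luma = w * align(height, TILE)
    chroma = w * (align(height, 2 * TILE) // 2)
    return luma + chroma


if __name__ == "__main__":
    for size in ((352, 288), (1280, 720)):
        print(size, naive_sizeimage(*size), padded_sizeimage(*size))
```

Under these assumptions 352x288 gives 152064 and 157696 bytes, matching the values reported for cedrus and GStreamer, and 1280x720 also differs between the two formulas.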

13:56 chewitt has quit [Quit: Zzz..]

13:57 chewitt has joined #linux-sunxi

14:02 gamiee has quit [Remote host closed the connection]

14:03 gamiee has joined #linux-sunxi

14:45 JohnDoe_71Rus has joined #linux-sunxi

15:03 jagan_ has joined #linux-sunxi

15:38 <jernej> ndufresne: are you sure that's needed?

15:38 <jernej> this is the source of the vendor calculation for tiled NV12: https://github.com/allwinner-zh/media-codec/blob/master/sunxi-cedarx/SOURCE/vdecoder/fbm.c#L613

15:39 <jernej> looking at PIXEL_FORMAT_YUV_MB32_420

15:40 <jernej> calculation is a bit strange, as it also aligns width and height to 64 for luma

15:40 <jernej> but in the end, for 352x288 video, it would allocate 153600 bytes (if I calculate correctly)

15:41 <jernej> this is still slightly larger than the mainline driver, but less than your calculation

15:44 <jernej> of course, it could still be a bug in both, vendor and mainline drivers

15:49 <ndufresne> jernej: I'm sure yes, the last row of luma will use half of each tile from the last row of chroma

15:51 megi has quit [Quit: WeeChat 3.2]

15:51 megi has joined #linux-sunxi

15:51 <ndufresne> the layout is Luma 11x9, right now we allocate Chroma 11x4

15:52 <ndufresne> well, 11x4 + 5 tiles

15:57 <ndufresne> jernej: reading that code now, looks like they allocate each plane separately

15:58 <ndufresne> do you think the HW might need an even number of rows?

15:58 <ndufresne> that code will always round the height up to 64 for the luma plane

15:58 <ndufresne> (it could be HW specific requirement)

15:59 <jernej> I'm looking at third code now: https://github.com/linux-sunxi/libvdpau-sunxi/blob/master/surface_video.c#L105

15:59 <jernej> and it agrees with your calculation

16:00 <ndufresne> have you noticed AdapterMemPalloc(nMemSizeC*2);

16:00 <ndufresne> that code is...well

16:00 <jernej> yeah, there are many weird things

16:00 <jernej> anyway, I think you're right

16:01 <jernej> it doesn't make sense to have last row halved

16:01 <ndufresne> in gst I convert to number of tiles, I think it's less error prone, or perhaps it's just my way to think

16:02 <ndufresne> btw, I apparently never finished interlaced support
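
[Editor's sketch] Counting whole tiles instead of aligning byte sizes might look like the following. It is not the GStreamer implementation; it assumes 32x32 tiles of one byte per sample and a half-height interleaved chroma plane, and all names are hypothetical.

```python
TILE_W = 32  # assumed tile width in bytes
TILE_H = 32  # assumed tile height in rows
TILE_BYTES = TILE_W * TILE_H


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def tile_layout(width, height):
    """Return (columns, luma_rows, chroma_rows) counted in whole tiles."""
    cols = ceil_div(width, TILE_W)
    luma_rows = ceil_div(height, TILE_H)
    # The chroma plane has half the rows, and a partial tile row still
    # needs a complete row of tiles.
    chroma_rows = ceil_div(ceil_div(height, 2), TILE_H)
    return cols, luma_rows, chroma_rows


def sizeimage_from_tiles(width, height):
    cols, luma_rows, chroma_rows = tile_layout(width, height)
    return (cols * luma_rows + cols * chroma_rows) * TILE_BYTES


if __name__ == "__main__":
    print(tile_layout(352, 288), sizeimage_from_tiles(352, 288))
```

For 352x288 this yields an 11x9 luma layout and an 11x5 chroma layout, consistent with the "we need 5" rows mentioned earlier in the discussion.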

16:02 <ndufresne> split field interlacing more precisely

16:02 <jernej> I noticed the orher day :)

16:02 <jernej> *other

16:03 <jernej> I forgot to mention

16:03 <ndufresne> so now I at least got the code so that first field is held in the vb queue

16:04 <ndufresne> but I see lots of artifacts, so working on that now

16:04 <ndufresne> I think ffmpeg still requeues

16:05 <ndufresne> 79/135, so gained 3, but also removed a lot of decode errors (now it's bad checksum)

16:05 <jernej> note that you always mark reference as "frame"

16:06 <jernej> for interlaced frames, you have to correctly mark as top/bottom

16:06 <jernej> in reference lists 0 and 1

16:07 <ndufresne> not surprising, considering this was never implemented, that's my obvious next step

16:07 <ndufresne> fixing parameters is way easier than fixing queue/hold issues

16:08 <jernej> true, that's why we took easy approach for ffmpeg - I expect a lot of work making it more async

16:08 <ndufresne> I'm kind of happy to finally finish this, I knew h264 was incomplete, and never had time to finish that properly

16:09 <ndufresne> my worry about the hold flag, is that I don't know how to "un-hold" it, e.g. in live streaming, we could lose a field over the network

16:09 <jernej> hold flag is just an optimization

16:10 <jernej> it worked fine before it, by requeueing the capture buffer

16:10 <ndufresne> but as gst implements render delays, I really need it

16:10 <jernej> but that disables async optimizations

16:10 <jernej> or not

16:11 <jernej> can the same buffer be queued multiple times?

16:11 <ndufresne> you can queue the second field before you have dequeued the decoded picture for the first field

16:11 <ndufresne> * you can't

16:11 <jernej> I guess that would work if it is DMA-BUF

16:11 <ndufresne> so without hold flag, you operate in lock step

16:12 <ndufresne> yeah, if you have a dmabuf import path (which I don't have implemented here; in fact I would not know where to allocate the dmabuf from)

16:12 <jernej> ffmpeg holds hw context

16:12 <jernej> in our case, handle to drm device, which can allocate dumb buffers

16:13 <ndufresne> yeah, you can cheat the dumb allocator to respect v4l2 sizeimage (for padding needs)

16:13 <ndufresne> drivers need to not be buggy and to include MV and other HW-specific buffer sizes in sizeimage, of course

16:14 <ndufresne> as discussed with ez, this has not been validated on any of the drivers yet, we wanted to make a PoC with dmabuf-heap, which would have allowed testing this

16:14 <jernej> can't you get required sizeimage for dma-buf imported buffers?

16:15 <ndufresne> in gst, we use the dumb allocation with width=1, height=sizeimage ;-D

16:15 <ndufresne> it's the other way around, you have to tell your allocator the sizeimage needed by your importer

16:16 <ndufresne> the dmabuf is pre-allocated (unlike some other OS where you get a handle from the exporter, but it gets lazily allocated by an importer trigger)

16:16 <ndufresne> that basically forces drivers to be explicit
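
[Editor's sketch] The width=1, height=sizeimage trick mentioned above can be illustrated by building the request values that would go into a dumb-buffer create call. This is a guess at the idea, not the GStreamer code: it assumes 8 bits per pixel so that one "row" is one byte, it only builds the request without issuing any ioctl, and the function names are hypothetical. A real DRM driver may pad the pitch, in which case the returned buffer is larger than requested.

```python
def dumb_request_for(sizeimage, bpp=8):
    """Describe a 1-pixel-wide dumb buffer large enough for sizeimage bytes.

    The keys mirror the input fields of the kernel's dumb-create request
    (width, height, bpp); bpp=8 is an assumption made for this sketch.
    """
    if sizeimage <= 0:
        raise ValueError("sizeimage must be positive")
    return {"width": 1, "height": sizeimage, "bpp": bpp}


def minimum_dumb_size(request):
    """Smallest size a driver could return: unpadded pitch times height."""
    bytes_per_pixel = (request["bpp"] + 7) // 8
    pitch = request["width"] * bytes_per_pixel
    return pitch * request["height"]


if __name__ == "__main__":
    req = dumb_request_for(157696)
    print(req, minimum_dumb_size(req))
```

The point of the trick is that the allocator is told the byte size the importer needs, rather than a width and height it could interpret with its own padding rules.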

16:16 <jernej> let me check something

16:19 <jernej> if you call VIDIOC_REQBUFS with V4L2_MEMORY_DMABUF type, you actually get "empty" capture buffers with properly set sizeimage, right?

16:19 <jernej> you assign dma-buf handle to a capture buffer when you enqueue it, right?

16:24 <jernej> so, imo, you could first allocate V4L2_MEMORY_DMABUF capture buffers, read buffer info to get sizeimage and then allocate DRM dumb buffers according to sizeimage

16:24 <jernej> and just before you enqueue it, you set dma-buf fd into buffer structure

16:26 <ndufresne> jernej: well, sizeimage is returned by VIDIOC_S_FMT

16:26 <jernej> ah, true

16:26 <jernej> even easier

16:26 <ndufresne> so you simply do S_FMT, and then allocate dumb, and then REQBUFS for dmabuf import and import them

16:27 <ndufresne> it's straight forward imho, just not implemented by anyone apparently

16:27 <ndufresne> dumb or dmabuf-heap (second one seems cleaner approach)

16:28 <ndufresne> I think Kodi could be a nice candidate for that

16:29 <ndufresne> ok, so in the latest gst DPB, the pictures are duplicated per field, so that when they make it to ref list, we know if we refer to TOP or Bottom, now I just need to translate GstH264PictureField into the v4l2 equivalent
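
[Editor's sketch] The translation described above amounts to a small lookup. The three V4L2 values below are the reference-field flags from the stateless H.264 uAPI as I understand it; the string keys are hypothetical stand-ins for the GstH264PictureField values, and this is not the actual GStreamer code.

```python
# Field flags used in V4L2 stateless H.264 reference list entries.
V4L2_H264_TOP_FIELD_REF = 0x1
V4L2_H264_BOTTOM_FIELD_REF = 0x2
V4L2_H264_FRAME_REF = 0x3  # both fields

# Hypothetical stand-ins for the decoder-side picture field enum.
FIELD_TO_V4L2 = {
    "frame": V4L2_H264_FRAME_REF,
    "top": V4L2_H264_TOP_FIELD_REF,
    "bottom": V4L2_H264_BOTTOM_FIELD_REF,
}


def reference_fields(picture_field):
    """Map a picture's field type to the flag for its reference list entry."""
    try:
        return FIELD_TO_V4L2[picture_field]
    except KeyError:
        raise ValueError(f"unknown picture field: {picture_field!r}")


if __name__ == "__main__":
    for name in ("frame", "top", "bottom"):
        print(name, hex(reference_fields(name)))
```

Marking every reference as a frame, as noted earlier in the conversation, corresponds to always returning the frame value regardless of the picture's field type.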

16:31 <ndufresne> the DPB filling does not look quite right either ... but after lunch

16:32 <jernej> well, if there is no reason to allocate buffers manually, why do it?

16:38 <jernej> btw, Cedrus has an issue on 32-bit kernels with vmalloc space, especially when decoding 4k videos

16:38 <jernej> I guess adding DMA_ATTR_NO_KERNEL_MAPPING flag to both queues would help

16:57 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

16:57 jernej has joined #linux-sunxi

17:04 apritzel has quit [Ping timeout: 480 seconds]

17:12 <ndufresne> jernej: for my own knowledge, what does this flag do ?

17:14 hlauer has quit [Ping timeout: 480 seconds]

17:15 vagrantc has joined #linux-sunxi

17:18 <jernej> you can't access content of the buffer via cpu in driver

17:19 <jernej> it's not needed anyway

17:19 <jernej> bbl

17:40 <ndufresne> I see, indeed, that specific driver should not need it

17:41 <ndufresne> when we do, it's usually for HW quirks, or semantic mismatch between v4l2 and hw flow

17:41 <ndufresne> note, can't be applied to let's say raw jpeg decoder, like hantro jpeg

17:51 linusw has quit [Quit: Connection closed for inactivity]

18:15 linusw has joined #linux-sunxi

19:06 <jernej> ndufresne: hantro has this flag set for both queues, except for encoder

19:06 <jernej> but I suspect that Cedrus would need ugly workaround for HEVC, where bitstream would need to be examined

19:08 <jernej> ndufresne: do you have A20 at hand? maybe your fix also helps with it

19:13 <libv> about the wiki, i am aware, but it probably is more important to get the ported wiki implemented and tested first

19:25 gsz has quit [Quit: leaving]

19:30 jagan_ has quit [Remote host closed the connection]

19:34 JohnDoe_71Rus has quit []

19:36 hlauer has joined #linux-sunxi

19:46 apritzel has joined #linux-sunxi

20:36 <ndufresne> jernej: I do, haven't tried yet though

20:36 <ndufresne> it can't make it worse for sure ;-D

20:37 <ndufresne> jernej: I'm finally getting there, the gst way of handling split fields is counter intuitive, but I think I'm getting a hold of it now

20:37 <jernej> oh, great!

20:38 <jernej> number of tests passing is improving, I presume?

20:39 <ndufresne> last run was 95, but this one should be better

20:40 <ndufresne> took me quite a while to get cabac_mot_fld0_full working, but that unlocks a lot

20:40 <jernej> what is that?

20:40 <ndufresne> it's mbaff file

20:41 <jernej> oh, those are fun

20:41 <ndufresne> of course it's mixed, cause it's a test

20:41 <ndufresne> cedrus depends on proper poc values, well, at least relative to each other

20:42 <ndufresne> but gst splits the top and bottom fields in possibly two pictures, I was picking poc value all wrong ...

20:42 <ndufresne> 111/135

20:42 <jernej> getting close to ffmpeg :)

20:42 <ndufresne> started the day at 76

20:42 <jernej> yesterday it was even worse :)

20:43 <ndufresne> 56 iirc

20:43 <jernej> something like that

20:43 <ndufresne> I'm really glad I got time to fix that

20:44 <jernej> do you have a clue where the rest of issues are?

20:49 <ndufresne> I'm picking them one by one, so not yet no

20:50 <ndufresne> all passing up to cama1_vtc_c, so looking at that one now

20:57 <ndufresne> quite subtle, without YUView, not sure I'd have noticed the issues

20:59 ftg has joined #linux-sunxi

21:01 <ndufresne> hmm, strange, keyframe is fully decoded, should be field 3 (both top/bottom) in l0 ...

21:05 <ndufresne> jernej: so for this one it shows that poc is now right, but fields and flags in dpb and l0/l1 are not quite right yet

21:05 <ndufresne> for tomorrow !

21:06 <jernej> yeah, those flags are tricky to get them right. Anyway, nice progress!

22:29 cmeerw has quit [Ping timeout: 480 seconds]

22:55 hlauer has quit [Ping timeout: 480 seconds]

22:58 Mangy_Dog has quit [Remote host closed the connection]