ChanServ changed the topic of #linux-sunxi to: Allwinner/sunxi development - Did you try looking at our wiki? https://linux-sunxi.org - Don't ask to ask. Just ask and wait for an answer! - This channel is logged at https://oftc.irclog.whitequark.org/linux-sunxi
<ndufresne> jernej: for a SUNXI tiled buffer, GStreamer expects 157696 bytes, but cedrus returns sizeimage=152064
<ndufresne> for a 352x288 video
<ndufresne> yeah, looks like the driver screwed up size calculation
<ndufresne> it naively does height = ALIGN(height, 32); and then divides this by two for the chroma
<ndufresne> but it has to do ALIGN(height, 64); before dividing by two for the chroma
<ndufresne> in this case we have 9 rows of tiles for luma; for chroma this will allocate 4 rows, but we need 5
<ndufresne> 720p should be affected too
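To make the arithmetic concrete, here is a minimal C sketch of the two size formulas described above. `ALIGN_UP`, `sizeimage_naive` and `sizeimage_fixed` are hypothetical names, not the cedrus source. The sketch assumes 8-bit NV12 in 32x32 tiles with a 32-byte-aligned stride.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a > 0). */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Naive formula: height aligned to 32, chroma plane is half of that. */
uint32_t sizeimage_naive(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint32_t stride = ALIGN_UP(width, 32u);
    uint32_t h = ALIGN_UP(height, 32u);
    return stride * h + stride * h / 2;
}

/* Fixed formula: chroma must fill whole 32-line tile rows, so the
 * height is aligned to 64 before halving it for the chroma plane. */
uint32_t sizeimage_fixed(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint32_t stride = ALIGN_UP(width, 32u);
    uint32_t h = ALIGN_UP(height, 32u);
    return stride * h + stride * ALIGN_UP(h, 64u) / 2;
}
```

For 352x288 the naive formula gives 152064 bytes (the value cedrus reports above) and the fixed one gives 157696 bytes (the value GStreamer expects). For 1280x720 they give 1413120 and 1433600, consistent with 720p being affected too.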
<jernej> ndufresne: are you sure that's needed?
<jernej> looking at PIXEL_FORMAT_YUV_MB32_420
<jernej> calculation is a bit strange, as it also aligns width and height to 64 for luma
<jernej> but in the end, for 352x288 video, it would allocate 153600 bytes (if I calculate correctly)
<jernej> this is still slightly larger than the mainline driver, but less than your calculation
<jernej> of course, it could still be a bug in both the vendor and mainline drivers
<ndufresne> jernej: I'm sure, yes; the last row of luma will use half of each tile from the last row of chroma
<ndufresne> the layout is Luma 11x9, right now we allocate Chroma 11x4
<ndufresne> well, 11x4 + 5 tiles
<ndufresne> jernej: reading that code now, looks like they allocate each plane separately
<ndufresne> do you think the HW might need an even number of rows?
<ndufresne> that code will always round up the height of the luma plane to 64
<ndufresne> (it could be a HW-specific requirement)
<jernej> and it agrees with your calculation
<ndufresne> have you noticed AdapterMemPalloc(nMemSizeC*2);
<ndufresne> that code is...well
<jernej> yeah, there are many weird things
<jernej> anyway, I think you're right
<jernej> it doesn't make sense to have last row halved
<ndufresne> in gst I convert to number of tiles, I think it's less error prone, or perhaps it's just my way of thinking
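The same size can be derived by counting tiles instead of aligning heights, as described above for gst. A minimal sketch, assuming 32x32 tiles of 1024 bytes and an interleaved CbCr plane that uses the same tile columns as luma; the struct and function names are hypothetical, not GStreamer's.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_DIM 32u                      /* tiles are 32x32 samples */
#define TILE_BYTES (TILE_DIM * TILE_DIM)  /* 1024 bytes per 8-bit tile */

struct tiled_layout {
    uint32_t cols;        /* tiles per row, shared by both planes */
    uint32_t luma_rows;   /* tile rows in the Y plane */
    uint32_t chroma_rows; /* tile rows in the interleaved CbCr plane */
};

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
    return (n + d - 1) / d;
}

struct tiled_layout tiled_nv12_layout(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    struct tiled_layout l;
    l.cols = div_round_up(width, TILE_DIM);
    l.luma_rows = div_round_up(height, TILE_DIM);
    /* Chroma has half as many lines; a half-used tile row still
     * occupies a whole row of tiles. */
    l.chroma_rows = div_round_up(l.luma_rows, 2);
    return l;
}

uint32_t tiled_nv12_size(struct tiled_layout l)
{
    return (l.luma_rows + l.chroma_rows) * l.cols * TILE_BYTES;
}
```

For 352x288 this yields an 11x9 luma grid and an 11x5 chroma grid, i.e. (99 + 55) x 1024 = 157696 bytes, matching the ALIGN(height, 64) fix.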
<ndufresne> btw, I apparently never finished interlaced support
<ndufresne> split field interlacing more precisely
<jernej> I noticed the other day :)
<jernej> I forgot to mention
<ndufresne> so now I at least got the code so that first field is held in the vb queue
<ndufresne> but I see lots of artifacts, so working on that now
<ndufresne> I think ffmpeg still requeues
<ndufresne> 79/135, so gained 3, but also removed a lot of decode errors (now it's bad checksum)
<jernej> note that you always mark reference as "frame"
<jernej> for interlaced frames, you have to correctly mark as top/bottom
<jernej> in reference lists 0 and 1
<ndufresne> not surprising, considering this was never implemented, that's my obvious next step
<ndufresne> fixing parameters is way easier than fixing queue/hold issues
<jernej> true, that's why we took easy approach for ffmpeg - I expect a lot of work making it more async
<ndufresne> I'm kind of happy to finally finish this, I knew h264 was incomplete, and never had time to finish that properly
<ndufresne> my worry about the hold flag is that I don't know how to "un-hold" it, e.g. in live streaming, we could lose a field over the network
<jernej> hold flag is just an optimization
<jernej> it worked fine before it was introduced, by requeueing the capture buffer
<ndufresne> but as gst implements render delays, I really need it
<jernej> but that disables async optimizations
<jernej> or not
<jernej> can the same buffer be queued multiple times?
<ndufresne> you can't queue the second field before you have dequeued the decoded picture for the first field
<jernej> I guess that would work if it is DMA-BUF
<ndufresne> so without hold flag, you operate in lock step
<ndufresne> yeah, if you have a dmabuf import path (which I don't have implemented here; in fact I would not know where to allocate the dmabuf from)
<jernej> ffmpeg holds hw context
<jernej> in our case, handle to drm device, which can allocate dumb buffers
<ndufresne> yeah, you can cheat the dumb allocator to respect v4l2 sizeimage (for padding needs)
<ndufresne> drivers must not be buggy, of course, and must include MV and other HW-specific buffer sizes in sizeimage
<ndufresne> as discussed with ez, this has not been validated on any of the drivers yet; we wanted to make a PoC with dmabuf-heap, which would have allowed testing this
<jernej> can't you get required sizeimage for dma-buf imported buffers?
<ndufresne> in gst, we use the dumb allocation with width=1, height=sizeimage ;-D
<ndufresne> it's the other way around: you have to tell your allocator the sizeimage needed by your importer
<ndufresne> the dmabuf is pre-allocated (unlike some other OSes, where you get a handle from the exporter but it gets lazily allocated when an importer triggers it)
<ndufresne> that basically forces drivers to be explicit
<jernej> let me check something
<jernej> if you call VIDIOC_REQBUFS with V4L2_MEMORY_DMABUF type, you actually get "empty" capture buffers with properly set sizeimage, right?
<jernej> you assign dma-buf handle to a capture buffer when you enqueue it, right?
<jernej> so, imo, you could first allocate V4L2_MEMORY_DMABUF capture buffers, read buffer info to get sizeimage and then allocate DRM dumb buffers according to sizeimage
<jernej> and just before you enqueue it, you set dma-buf fd into buffer structure
<ndufresne> jernej: well, sizeimage is returned by VIDIOC_S_FMT
<jernej> ah, true
<jernej> even easier
<ndufresne> so you simply do S_FMT, then allocate dumb buffers, and then REQBUFS for dmabuf import and import them
<ndufresne> it's straightforward imho, just not implemented by anyone apparently
<ndufresne> dumb or dmabuf-heap (the second one seems the cleaner approach)
<ndufresne> I think Kodi could be a nice candidate for that
<ndufresne> ok, so in the latest gst DPB, the pictures are duplicated per field, so that when they make it to the ref list, we know if we refer to TOP or BOTTOM; now I just need to translate GstH264PictureField into the v4l2 equivalent
<ndufresne> the DPB filling does not look quite right either ... but after lunch
<jernej> well, if there is no reason to allocate buffers manually, why do it?
<jernej> btw, Cedrus has an issue on 32-bit kernels with vmalloc space, especially when decoding 4k videos
<jernej> I guess adding DMA_ATTR_NO_KERNEL_MAPPING flag to both queues would help
<ndufresne> jernej: for my own knowledge, what does this flag do ?
<jernej> you can't access the content of the buffer via the CPU in the driver
<jernej> it's not needed anyway
<jernej> bbl
<ndufresne> I see, indeed, that specific driver should not need it
<ndufresne> when we do, it's usually for HW quirks, or a semantic mismatch between the v4l2 and HW flow
<ndufresne> note, it can't be applied to, let's say, a raw JPEG decoder, like Hantro JPEG
<jernej> ndufresne: hantro has this flag set for both queues, except for encoder
<jernej> but I suspect that Cedrus would need ugly workaround for HEVC, where bitstream would need to be examined
<jernej> ndufresne: do you have A20 at hand? maybe your fix also helps with it
<libv> about the wiki, i am aware, but it probably is more important to get the ported wiki implemented and tested first
<ndufresne> jernej: I do, haven't tried yet though
<ndufresne> it can't make it worse for sure ;-D
<ndufresne> jernej: I'm finally getting there, the gst way of handling split fields is counter intuitive, but I think I'm getting a hold of it now
<jernej> oh, great!
<jernej> number of tests passing is improving, I presume?
<ndufresne> last run was 95, but this one should be better
<ndufresne> took me quite a while to get cabac_mot_fld0_full working, but that unlocks a lot
<jernej> what is that?
<ndufresne> it's an MBAFF file
<jernej> oh, those are fun
<ndufresne> of course it's mixed, because it's a test
<ndufresne> cedrus depends on proper POC values, well, at least relative to each other
<ndufresne> but gst splits the top and bottom fields into possibly two pictures, I was picking POC values all wrong ...
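When a parser emits the two fields of a frame as separate pictures, their POCs have to be recombined into one DPB entry with a top and a bottom order count. A minimal sketch with hypothetical types; the frame POC follows the H.264 rule PicOrderCnt(frame) = Min(TopFieldOrderCnt, BottomFieldOrderCnt).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one field picture as a field-splitting parser emits it. */
struct field_pic {
    int is_bottom; /* 0 = top field, 1 = bottom field */
    int32_t poc;   /* that field's own order count */
};

/* The pair of POCs a DPB entry carries (top/bottom field order count). */
struct dpb_pocs {
    int32_t top;
    int32_t bottom;
};

/* Combine the two field pictures of one frame, in either order. */
struct dpb_pocs merge_field_pocs(const struct field_pic *a,
                                 const struct field_pic *b)
{
    const struct field_pic *top = a->is_bottom ? b : a;
    const struct field_pic *bot = a->is_bottom ? a : b;
    assert(top->is_bottom == 0 && bot->is_bottom == 1);
    struct dpb_pocs out = { top->poc, bot->poc };
    return out;
}

/* H.264 PicOrderCnt() of a frame: the smaller of its two field POCs. */
int32_t frame_poc(struct dpb_pocs p)
{
    return p.top < p.bottom ? p.top : p.bottom;
}
```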
<ndufresne> 111/135
<jernej> getting close to ffmpeg :)
<ndufresne> started the day at 76
<jernej> yesterday it was even worse :)
<ndufresne> 56 iirc
<jernej> something like that
<ndufresne> I'm really glad I got time to fix that
<jernej> do you have a clue where the rest of issues are?
<ndufresne> I'm picking them one by one, so not yet no
<ndufresne> all passing up to cama1_vtc_c, so looking at that one now
<ndufresne> quite subtle; without YUView, not sure I'd have noticed the issues
<ndufresne> hmm, strange, keyframe is fully decoded, should be field 3 (both top/bottom) in l0 ...
<ndufresne> jernej: so for this one it shows that poc is now right, but fields and flags in dpb and l0/l1 are not quite right yet
<ndufresne> for tomorrow !
<jernej> yeah, those flags are tricky to get them right. Anyway, nice progress!