raphaelsc has quit [Remote host closed the connection]
raphaelsc has joined #io_uring
<raphaelsc>
axboe, I have some writes completing with ENOMEM when running many concurrency tests. it's not during submission, since the io_uring_req_failed tracepoint reveals nothing. the kernel is 6.12, so it shouldn't be due to the ring buffers being accounted against memlock, which IIRC was fixed in ~5.12. also, cat /proc/meminfo shows significant memory available (even though free might be low). the kernel page cache is used, so this is not DIO. the test passes with linux-aio.
<raphaelsc>
does it ring a bell to you?
<raphaelsc>
the file system is XFS.
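For context on what "completing with ENOMEM" means here: an io_uring write reports failure in the CQE's `res` field as a negative errno (bytes written on success), not via `errno`. A minimal sketch of how a test harness might classify such a completion, assuming `res` has already been read from the CQE; the enum and `classify_write_res` are hypothetical names, and the helper itself needs no liburing.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome categories for a write CQE. */
enum write_outcome {
    WRITE_OK,     /* all requested bytes were written */
    WRITE_SHORT,  /* fewer bytes than requested; the remainder must be resubmitted */
    WRITE_RETRY,  /* transient error such as -EAGAIN */
    WRITE_NOMEM,  /* -ENOMEM, the case reported in this log */
    WRITE_FAILED  /* any other negative errno */
};

/* res is cqe->res: bytes written on success, -errno on failure. */
enum write_outcome classify_write_res(int res, size_t requested)
{
    if (res < 0) {
        switch (-res) {
        case EAGAIN:
            return WRITE_RETRY;
        case ENOMEM:
            return WRITE_NOMEM;
        default:
            return WRITE_FAILED;
        }
    }
    if ((size_t)res < requested)
        return WRITE_SHORT;
    return WRITE_OK;
}
```

Treating -ENOMEM as its own category, rather than folding it into a generic failure, makes it easy to count how often it shows up under load versus with linux-aio.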
raphaelsc_ has joined #io_uring
raphaelsc has quit [Ping timeout: 480 seconds]
<axboe>
raphaelsc_: does not ring a bell. there's really nothing on the io_uring side that should return ENOMEM if memory is available; I wonder if it's bubbling back from xfs somehow?
<axboe>
retsnoop may be useful here to figure out where it's coming from
<raphaelsc_>
axboe, thanks. I was just about to look for a way to get a trace of that. I will give retsnoop a shot, thanks for the recommendation.
<raphaelsc_>
axboe, I was first wondering if there was a bug in io_uring that accidentally looked at the memlock quota and returned ENOMEM, but the failure comes from the write itself, and I am not registering buffers. XFS indeed came to mind as a plausible culprit. maybe there's an allocation somewhere in XFS that is marked as NOWAIT (for example, when expanding a btree).
<raphaelsc_>
axboe, just to confirm: writes go through io_write(), which calls the underlying implementation in the fs, and I assume that runs inside the worker thread. looks like a good candidate entry point.
<raphaelsc_>
wdyt?
<raphaelsc_>
<...>-661829 [020] ...1. 44478.405379: io_uring_complete: ring 0000000086304bb8, req 000000003d1b0121, user_data 0x50f0000a2f70, result -12, cflags 0x0 extra1 0 extra2 0
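The `result -12` field in that `io_uring_complete` event is the CQE's `res` value, and 12 is ENOMEM on Linux. When sifting a large trace for these completions, a small filter like the following may help. It is a sketch that assumes the event text is formatted like the line above (a `result <n>,` field), which may differ between kernel versions; `parse_result` and `is_enomem_completion` are hypothetical helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the "result" field from an io_uring_complete trace line.
 * Returns true and stores the value in *out if the field was found.
 * Assumes the "... result <int>, ..." layout seen in the trace above.
 */
bool parse_result(const char *line, int *out)
{
    if (strstr(line, "io_uring_complete:") == NULL)
        return false;
    const char *p = strstr(line, " result ");
    if (p == NULL)
        return false;
    /* Leading space in the format skips whitespace; %d accepts the sign. */
    return sscanf(p, " result %d", out) == 1;
}

/* True if the trace line records a completion that failed with -ENOMEM. */
bool is_enomem_completion(const char *line)
{
    int res;
    return parse_result(line, &res) && res == -ENOMEM;
}
```

Lines read from the tracefs trace buffer can be fed through `is_enomem_completion` one at a time, keeping the `user_data` of each match to correlate it with the request the test submitted.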
raphaelsc_ has quit [Remote host closed the connection]