<f00b4r0>
nbd: so I have a device with kmemleak enabled currently exposing the bug, but I'm not sure what I should be looking for, if you could maybe give me a hint?
<nbd>
f00b4r0: can you send me the kmemleak data?
<f00b4r0>
sure
<f00b4r0>
sent. Problem started around 1AM from what I can tell from collectd data, and the device experienced a prior occurrence that ended in a crash reboot. Current uptime is 14:33
<f00b4r0>
slab usage is currently "stable" (albeit huge)
<f00b4r0>
ha, dmesg has more interesting data, I'll send you that too
<f00b4r0>
done
Borromini has joined #openwrt-devel
<f00b4r0>
for context this is a night club, so at 1AM there was actually quite a bit of wifi/network activity
<f00b4r0>
logread is full of "hostapd: handle_probe_req: send failed" but I guess that's a side effect
nixuser has quit [Remote host closed the connection]
Sawzallz has joined #openwrt-devel
Sawzall has quit [Ping timeout: 480 seconds]
rua has quit [Quit: Leaving.]
dannyAAM has joined #openwrt-devel
<nbd>
f00b4r0: so kmemleak isn't turning up any real relevant leaks
<nbd>
f00b4r0: can you check the /tmp part of df -h?
<f00b4r0>
nbd: tmpfs 120.4M 436.0K 119.9M 0% /tmp
<nbd>
hm
<f00b4r0>
the leak is definitely correlated to wireless activity, from what I've seen so far
<f00b4r0>
wireless/network
<f00b4r0>
so far it only affects one type of hardware
<f00b4r0>
(ZBT-WE1326 which is good ol' 802.11ac mt7621)
<f00b4r0>
hmm that's weird. According to collectd data, it seems that around 0:40 the device *did* manage to recover a large chunk of slab unreclaimable without rebooting
<f00b4r0>
this happened during OOM most likely given the values of free/buff/cache were close to zero
<f00b4r0>
that's around timestamp 16500 in the dmesg I sent you: that's when it killed hostapd
<f00b4r0>
oh wow
<f00b4r0>
I just did a kill -9 on hostapd and slabtop showed the skbuff entries shrinking in a matter of seconds
<f00b4r0>
it's now back to "normal"
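A minimal sketch of how the skbuff observation above could be put into numbers, assuming the kernel exposes /proc/slabinfo in the usual "slabinfo - version: 2.1" layout and that Python is available on (or the file is copied off) the device. The helper names `parse_slabinfo`, `skbuff_bytes` and `read_slabinfo` are hypothetical. Note that this only counts the skbuff_* caches; the packet data itself usually lives in other caches, so treat the result as an indicator, not as the full leak size.

```python
def parse_slabinfo(text):
    """Parse /proc/slabinfo text into {cache_name: {active_objs, num_objs, objsize}}."""
    caches = {}
    for line in text.splitlines():
        # Skip the version banner, the column header and blank lines.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name = fields[0]
        active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
        caches[name] = {
            "active_objs": active_objs,
            "num_objs": num_objs,
            "objsize": objsize,
        }
    return caches


def skbuff_bytes(caches, prefix="skbuff"):
    """Approximate bytes held by caches whose name starts with `prefix`."""
    return sum(
        c["num_objs"] * c["objsize"]
        for name, c in caches.items()
        if name.startswith(prefix)
    )


def read_slabinfo(path="/proc/slabinfo"):
    """Read the slabinfo file; this normally requires root."""
    with open(path) as f:
        return f.read()
```

Calling `skbuff_bytes(parse_slabinfo(read_slabinfo()))` once before and once after killing hostapd would give two figures to compare, instead of eyeballing slabtop.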
vincejv has quit [Remote host closed the connection]
<f00b4r0>
"hostapd: netlink: recvfrom failed: No buffer space available" is shown at the end of the current log
<f00b4r0>
nbd: anything else I can collect? I'm going to have to reboot this device eventually, it doesn't look too good
<f00b4r0>
fwiw there's another device currently exposing the same problem but that one isn't running a slabinfo/kmemleak kernel
<nbd>
when this issue appears, i'd like to know if simply running 'wifi' also recovers the memory
<nbd>
or if it needs a hostapd restart
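The comparison nbd asks for (does 'wifi' alone free the memory, or is a hostapd restart needed) could be recorded with something like the sketch below. It assumes the standard /proc/meminfo "Key: value kB" layout and uses the SUnreclaim field as a stand-in for the "slab unreclaimable" figure from collectd; `parse_meminfo` and `reclaimed_kb` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field_name: value}, values as printed (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def reclaimed_kb(before_text, after_text, key="SUnreclaim"):
    """Return how many kB of `key` went away between two meminfo snapshots."""
    before = parse_meminfo(before_text)
    after = parse_meminfo(after_text)
    return before[key] - after[key]
```

The idea would be to save one snapshot of /proc/meminfo while the leak is visible, run 'wifi', save a second snapshot, and only then restart hostapd and take a third; the two differences show which step actually released the memory.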
<f00b4r0>
ok, I'll check next time. The other device just died
zfdx123 has joined #openwrt-devel
zfdx123 has quit []
rua has joined #openwrt-devel
wvdakker has quit [Read error: Connection reset by peer]
nixuser has quit [Remote host closed the connection]
Borromini has joined #openwrt-devel
<soxrok2212_>
anyone have any tips to get a qca8337 working on an octeon board? i've tried adding a switch node with qca,qca8337 compatible string and linked port0 to my first pip interface, but linux complains "Port 0 not controlled by Linux, packet dropped"
Borromini has quit [Ping timeout: 480 seconds]
minimal has joined #openwrt-devel
fakuivan has quit [Remote host closed the connection]
fakuivan has joined #openwrt-devel
Borromini has joined #openwrt-devel
maciekb7218395381 has joined #openwrt-devel
maciekb721839538 has quit [Ping timeout: 480 seconds]
maciekb7218395381 is now known as maciekb721839538
noltari has quit [Quit: Bye ~ Happy Hacking!]
noltari has joined #openwrt-devel
Borromini has quit [Ping timeout: 480 seconds]
<dwfreed>
soxrok2212_: step 1 would be to look at how other devices with the same chip do it
<soxrok2212_>
there are no octeon+qca833x devices, unfortunately
ssterling has joined #openwrt-devel
<soxrok2212_>
all the octeon devices i see have phys directly connected. none use a switch
<Habbie>
i wonder how the juniper srx210 does it
<Habbie>
because when i tcpdump i recognise some data but there's clearly some extra header/framing
<Habbie>
2x gbit, 6x100. wonder if the 6x100 is behind one gbit phy
<Habbie>
soxrok2212_, what device you got there?
<soxrok2212_>
it's a satcom modem made by cybertan
<Habbie>
ah
<Habbie>
how did you figure out the qca is in there?
<Habbie>
btw, did you need to disable usb?
<soxrok2212_>
i opened it up and looked
<soxrok2212_>
haven't disabled USB yet
<Habbie>
ok. mine crashed during kernel boot with usb enabled :)
<soxrok2212_>
oh, nope this boots just fine with it
<Habbie>
cool
<Habbie>
octpkt0: <Octeon RGMII> on obio0
<Habbie>
original software boot log is not very verbose about the networking hardware
<Habbie>
ah yes i had [ 40.184716] Port 0 receive error code 10, packet dropped
<Habbie>
[ 9.929936] Interface 0 has 2 ports (GMII)
zer0def has quit [Quit: zer0def]
zer0def has joined #openwrt-devel
<soxrok2212_>
yeah
<soxrok2212_>
from what i'm piecing together with the help of a friend, this device probably bypasses the "pip nexus" altogether
<soxrok2212_>
through the pip and using kmod-phy-qca833x, i can enable the switch ports and inter-switch traffic passes fine
<soxrok2212_>
but switch <-> cpu does not
<soxrok2212_>
it's just such a funky design, and the vendor's dts is useless
<soxrok2212_>
most other octeon boards have either ar803x phys or some vitesse phys