goliath has quit [Quit: SIGSEGV]
minimal has quit [Quit: Leaving]
PaulFertser has quit [Ping timeout: 480 seconds]
PaulFertser has joined #openwrt-devel
MatrixTravelerbot[m]1 has quit []
rua has quit [Quit: Leaving.]
valku has quit [Quit: valku]
xback has quit [Remote host closed the connection]
xback has joined #openwrt-devel
aiyion has quit [Ping timeout: 480 seconds]
lemoer_ has quit [Ping timeout: 480 seconds]
aiyion has joined #openwrt-devel
lemoer_ has joined #openwrt-devel
schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
aiyion has quit [Remote host closed the connection]
aiyion has joined #openwrt-devel
cbeznea has joined #openwrt-devel
ekathva has joined #openwrt-devel
Tapper has joined #openwrt-devel
guidosarducci has quit []
guidosarducci has joined #openwrt-devel
ptudor_ has joined #openwrt-devel
<owrt-2203-builds> Build [#149](https://buildbot.openwrt.org/openwrt-22.03/images/#builders/52/builds/149) of `bcm27xx/bcm2709` failed.
zorun has joined #openwrt-devel
ptudor has quit [Ping timeout: 480 seconds]
csrf1 has joined #openwrt-devel
goliath has joined #openwrt-devel
MaxSoniX has joined #openwrt-devel
Tapper has quit [Quit: Tapper]
Tapper has joined #openwrt-devel
srslypascal has quit [Ping timeout: 480 seconds]
cbeznea has quit [Quit: Leaving.]
srslypascal has joined #openwrt-devel
danitool has joined #openwrt-devel
<stintel> ok so the ENOSPC happens in install_file: https://gist.github.com/stintel/0d365e4bb2d1694b9c7762b5c7d32692
<stintel> looks like it might just be the sysupgrade image that is causing /tmp to fill up too much
<stintel> tmpfs 239.0M 233.1M 5.9M 98% /tmp
<stintel> after copying the sysupgrade image
<stintel> this could really use some improvement
<stintel> do a minimal space check and abort early with a message the user can read without having to do serial console
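The early space check suggested above could look roughly like the following. This is a minimal sketch, not how sysupgrade works today: the function names, the 8 MiB reserve, and the choice of Python are all assumptions made for illustration.

```python
import shutil

MIB = 1024 * 1024


def fits_with_reserve(image_size: int, free_bytes: int, reserve: int = 8 * MIB) -> bool:
    """Return True if copying image_size bytes still leaves at least reserve bytes free."""
    return free_bytes - image_size >= reserve


def check_before_upgrade(image_size: int, path: str = "/tmp") -> str:
    """Hypothetical pre-flight check: build a readable message before SSH is killed."""
    free = shutil.disk_usage(path).free  # free bytes on the filesystem holding path
    if fits_with_reserve(image_size, free):
        return "ok: enough space in %s" % path
    return "abort: %d bytes free in %s, image needs %d plus reserve" % (free, path, image_size)
```

The point of splitting the pure comparison from the filesystem query is that the comparison can be tested without touching a real tmpfs, and the message is produced while the user still has a terminal to read it on.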
robimarko has joined #openwrt-devel
cbeznea has joined #openwrt-devel
<robimarko> This is basically the same issue that has plagued low RAM devices
<stintel> robimarko: sysupgrade could use some polishing really :)
<stintel> if catching problems before kicking user from SSH is hard, we should look into logging to a file on the overlay
<stintel> so that after reboot the user can inspect there what is going on
<robimarko> \x: Well, that is one use for it
<robimarko> stintel: SSH should get killed way before anything useful other than downloading the sysupgrade archive is done
<robimarko> *Is killed
<stintel> robimarko: if you run sysupgrade on command-line, a space check before killing SSH should not be impossible though
<\x> tested it with running differing bandwidth limits on iperf https://i.imgur.com/q9geupN.png
<stintel> anyway, no time for this now so I reported an issue
<\x> based man, colo showed me the way for that rx bytes thing, hella cool
<stintel> worked around the problem by giving the VM 1GB of RAM
<robimarko> stintel: That would take some restructuring for sure
<stintel> maybe the logging to overlay would be a better approach
<robimarko> That would be really usable for any kind of error
<stintel> it would increase debuggability tremendously
<robimarko> Cause, currently if you don't catch it via serial it's gone
<stintel> yep
<stintel> debugging sysupgrade issues is horrendous
<Tapper> Wireshark 4.0 Network Protocol Analyzer Released https://www.phoronix.com/news/Wireshark-4.0-Released
f00b4r0 has quit [Remote host closed the connection]
csrf1 has quit [Ping timeout: 480 seconds]
f00b4r0 has joined #openwrt-devel
* f00b4r0 learns the hard way that procd_add_reload_interface_trigger() can fire up before the interface is "up"
<stintel> procd_add_raw_trigger "interface.*" 2000 /etc/init.d/foo restart might help you there ?
<stintel> 2000 being the delay iirc
<f00b4r0> i'm thinking procd_add_interface_trigger "interface.*.up" /etc/init.d/foo reload would work. I'm poring through code to ascertain, the documentation being scarce as it is ;)
ptudor_ is now known as ptudor
guidosarducci_ has joined #openwrt-devel
GNUmoon2 has joined #openwrt-devel
<stintel> please improve documentation where possible ;)
guidosarducci has quit [Ping timeout: 480 seconds]
<f00b4r0> stintel: hehe, I see what you're doing :D
<robimarko> As they say, patches are welcome
<f00b4r0> i'm happy to update doc when I'm _sure_ of my understanding. But with these parts, often I'm not ;P
GNUmoon has quit [Ping timeout: 480 seconds]
<stintel> yeah, docs are often scarce, did the hostapd ubus docs some time ago because this was useful for what I was working on at the time
<f00b4r0> that was very helpful
<robimarko> That reminds me, stintel you wanted to upstream ubus hostapd support?
<stintel> I asked for approval to work on that for day job but that didn't work
<robimarko> Ugh, that is a shame
<stintel> but I think it makes a lot of sense, as many SDKs out there are OpenWrt based, probably ubus implementation is way more used than wpa_supplicant/dbus combo
<robimarko> It for sure outnumbers dbus by a large margin
<stintel> it would also allow to dedup some code
<stintel> as we're currently kind of copying the ctrl_iface code in ubus code, afair
<f00b4r0> well I can't come to a conclusion from the code either. I'm going to assume that "interface.*.up" only fires _after_ the interface is actually up ;P
srslypascal has quit [Ping timeout: 480 seconds]
srslypascal has joined #openwrt-devel
<f00b4r0> oh, I found the actual doc. Wasn't looking at the right place :)
srslypascal has quit [Read error: Connection reset by peer]
Gaspare has joined #openwrt-devel
rua has joined #openwrt-devel
<owrt-2203-builds> Build [#150](https://buildbot.openwrt.org/openwrt-22.03/images/#builders/52/builds/150) of `bcm27xx/bcm2709` completed successfully.
<mrkiko> anyone using shadowsocks and know how to limit the number of simultaneously connected clients, be them authenticated or not?
<karlp> stintel: if you're still working on qoriq, https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/qoriq/Makefile;hb=HEAD#l11 should say "rootfs-part" not root-part
<stintel> karlp: thanks
Gaspare has quit [Ping timeout: 480 seconds]
<stintel> karlp: which email do I use in Reported-by ?
<Tapper> Hi any one know if the wax 204 will be added to openwrt like the wax202?
<robimarko> Is it the same SoC?
<Tapper> Don't know, can't find it on google
<Tapper> Or my google skills are just crap.
<Tapper> I am asking because I spotted a wax 204 on sale where I live for £39
<robimarko> Like always, somebody has gotta do it, there is no central planning
<Tapper> robimarko Yeah mate I know the drill.
<Tapper> Just wanted to ask that's all.
<robimarko> FCC photos show Broadcom logo on the SoC
<robimarko> So, I guess thats game over
<Tapper> Oh no!
* Tapper Spits on Broadcom
<robimarko> Not high res enough to read the part number but being Broadcom and AX its game over
<svanheule> the firmware file is also riddled with brcm strings
<robimarko> Honestly, the Redmi AX6000 is looking like a good deal with Filogic 830
bluew has quit [Ping timeout: 480 seconds]
Ansuel has joined #openwrt-devel
<\x> ahemm, time to ask, coremark of that MT7986B?
<robimarko> Its just a quad core A53 at 2GHz max
<robimarko> So pretty much in line with IPQ807x
<\x> so like 6.7k ish
<robimarko> I am too lazy to actually try installing OpenWrt on Banana Pi 3 to test
<karlp> stintel: karlp@etactica.com I guess, was looking at this for work
<stintel> karlp: thanks
<stintel> fix pushed
<karlp> so... what does that feature actually _do_ anyway? I'm lost deep in the guts of include makefile chains :)
<stintel> it exposes TARGET_ROOTFS_PARTSIZE afaik
<karlp> because _most_ places put in features boot-part _and_ rootfs-part, but some only do rootfs-part
<karlp> which is just a size option in menuconfig for generating output images right?
<stintel> yeah
<stintel> so that makes sense for devices with "variable" storage
<stintel> e.g. SD cards
<stintel> iiuc
<karlp> I'm trying to add support for a device with emmc and sd, "factory" will always be from emmc, but building an sdcard image should be fine too.
<karlp> the existing sdcard images work fine, and you can just copy them to emmc, and it works fine too, but, is there any support in openwrt sysupgrade or friends for expanding a partition/fs after install?
<stintel> I didn't think so
<stintel> I usually create an extra partition e.g. /srv on what's left of the storage
<stintel> and use that for container storage or so
<karlp> yeah, that's what I was kinda leaning towards.
<stintel> but then if your boot/root fs size is changed you lose that partition on next sysupgrade
<karlp> I do want to use the emmc boot partitions to put uboot and environment on though, instead of slamming it into extra MBR partitions on the front.
<karlp> having a bit of a wheel spin playing with that and all the different ways uboot and linux let you create/write all those portions :)
<stintel> recently got a device with eMMC and uSD, the olinuxino a64 (sunxi)
<stintel> but I've not played with it enough
<stintel> right now it boots u-boot from SD
<stintel> and I can't get the eMMC to work
<karlp> yar, I'm expanding sunxi. got it booting from both happily,
<stintel> we might want to consider an auto-expand-rootfs-or-overlay feature I guess
<karlp> the big key to getting emmc to work was the "mmc bootbus" and "mmc partconf" lines from the end of https://linux-sunxi.org/Bootable_eMMC
<stintel> although ... with sysupgrade saving changes to /boot in many cases ... this will increase potential to fail sysupgrade
<karlp> (for me at least)
<karlp> yar, I've not even _started_ looking at geting sysupgrade to work yet, just trying to build "nice" images that I can flash neatly via sunxi-fel/dfu/UMS..
<karlp> trying to avoid the fastboot mess.
<stintel> s/changes/backup/
<stintel> maybe Daniel has some ideas about the subject
<karlp> yeah, I don't have any specific issues right now, just sort of trying to put all the pieces together.
<robimarko> Resizing could be done
<robimarko> I am doing it at work with Systemd, but it should be doable without it as well
<robimarko> But you gotta fix the GPT table first to be able to expand
<robimarko> That can be easily done with parted in script mode(That is how I am doing it)
<stintel> I actually wonder .. we want to avoid FAT partition with kernel (legacy), but in this case, how does sysupgrade backup/restore even work
<karlp> on sunxi in particular do you know if the theobroma people are still around? I've been updating uboot-sunxi to 22.07, and there's some of these a31-pangolin patches that have conflicts with changes upstream, and I've got no way of testing/verifying.
<stintel> oh nice work, I was looking at that too
<stintel> I'd sent an RFT series to ML? cc people who added those patches?
<karlp> sure, I'm pretty close on some of this stuff.
<karlp> just know you'd been doing some sunxi stuff over the years as well.
<stintel> s/sent/send/
* stintel doesn't remember :D
<karlp> am considering whether to use fit images as well, instead of uImage, but I may just be making more work for myself..
<stintel> I'd definitely go for fit
<karlp> we already use fit images on other targets though, so seems like it should be "just do it"
<stintel> less u-boot env stuff requires
<stintel> geeeez
<stintel> s/requires/required/
* stintel goes for a walk
<karlp> thanks for the chat
<stintel> any time
<stintel> ping me if you want anything tested, too
<ynezz> karlp: IIRC 100-102 can be removed https://git.openwrt.org/375d8031522b8a180225327584c415a8e85ee51b
<karlp> yeah, I'd dropped most of it, I saw the upstream commits that changed it, but I left the gpio defines in, https://github.com/etactica/openwrt/commit/289490f716943f96bcf19ba1c158df56ac1e5de5 is my 22.07 upgrade so far.
<karlp> yours looks more complete though, I like it.
<karlp> I tried to keep what was left of those patches.
<karlp> so. is this a git.openwrt server config error or what? https://git.openwrt.org/?p=openwrt/staging/ynezz.git;a=patch;h=375d8031522b8a180225327584c415a8e85ee51b encodes ynezz's name in the from, but not in the signed-off-by?
<karlp> that's the "patch" link, the "raw" link works fine, *shrugs*
<Ansuel> signedoff-by is email body
<Ansuel> from needs to be encoded or some email program will fk up
<Ansuel> but now that i think about it with git format-patch also the signed off by tag is encoded
<Ansuel> MH
<f00b4r0> Ansuel: the tag isn't encoded by format-patch. Only the From, for the reason you pointed out :)
<Ansuel> thanks for clarifying that
<robimarko> git format-patch wont touch the body
<karlp> well, if you wget the "raw" link, it "works" just fine with git am, the "patch" links are therefore, IMO, ~useless?
<Ansuel> endianness question.... how does le64_to_cpu work on a 32 bit system?
<nbd> why should it make a difference if the system is 32 bit or 64 bit?
<robimarko> Same way as 64 bit variables work on 32 bit systems
<Ansuel> address size
<Ansuel> i can't understand how they write the 32 bit remaining stuff if we don't pass any pointer
floof58 has quit [Read error: Connection reset by peer]
<nbd> what's the context of that question?
<Ansuel> (practical example) mib descriptors are 64 bit... would something like *mib = le64_to_cpu(*data); work?
floof58 has joined #openwrt-devel
<Ansuel> (this is a 32bit system)
<nbd> works the same way on 32 bit and 64 bit machines in terms of behavior
<nbd> only the internals are different in the cpu
<nbd> since 64 bit can put it in a register, the 32 bit machine may need to transfer the 32 bit chunks individually
<nbd> depending on arch level, instruction set, etc.
<nbd> but you don't have to care about that
<Ansuel> ok so the handling is done internally. Had some doubt these kind of stuff wasn't supported
<nbd> the compiler deals with it for you...
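The point made above, that a little-endian 64-bit conversion gives the same value whether the machine moves it as one 64-bit word or as two 32-bit chunks, can be illustrated outside the kernel. The sketch below is only an analogy in Python using the standard `struct` module; the function names mirror the kernel macro for readability and are not the kernel implementation.

```python
import struct


def le64_to_cpu(raw: bytes) -> int:
    """Decode 8 little-endian bytes in one step, like a single 64-bit load."""
    (value,) = struct.unpack("<Q", raw)
    return value


def le64_from_halves(raw: bytes) -> int:
    """Decode the same bytes as two 32-bit little-endian words and combine them."""
    low, high = struct.unpack("<II", raw)  # the low word comes first in memory
    return (high << 32) | low


# Example bytes as they would sit in a little-endian buffer.
raw = bytes([0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11])
print(hex(le64_to_cpu(raw)))
print(le64_to_cpu(raw) == le64_from_halves(raw))
```

Both paths give the same integer, which is why the caller does not need to care how many registers the value occupies.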
<stintel> nbd: not sure you got our pings yesterday, would you mind having a look at https://patchwork.ozlabs.org/project/openwrt/list/?series=313995 ?
<stintel> ah actually there might be a v2 on the way
<stintel> janvenekamp: were you going to send v2 for ^
ekathva has quit [Remote host closed the connection]
<nbd> series looks good to me
<stintel> was this also the series that fixes truncated files on ENOSPC?
<stintel> if so, would be nice if some people who ran into that can Tested-by the series
<robimarko> Ansuel: Any chance you can check SAW version on IPQ8064?
<robimarko> Dmitry asked both of us couple of days ago, just stumbled on it by accident
<robimarko> Actually no need, I remembered I have Asrock G10 which is 8064
<Ansuel> robimarko if you want i'm doing some thing with ipq8064
<Ansuel> right now
<Ansuel> anyway i totally didn't understand the logic of saw version for this old platform
<robimarko> Ok, then dump register 0x02011FD0
<robimarko> If I am reading it right its v1 something
<Ansuel> but of v1 there are many revision for that but let me dump that reg
<robimarko> Yeah, datasheet for IPQ806x KPSS says that both minor and step are bits 15:=
<robimarko> 15:0 which makes no sense
<robimarko> As bit 16 is set
srslypascal has joined #openwrt-devel
<karlp> anyone know off the top of their heads the difference between KERNELNAME and KERNEL_NAME?
<Ansuel> (btw i also totally missed that
<Ansuel> no idea how)
<karlp> KERNELNAME seems to be for building the kernel, and KERNEL_NAME for making images? but why?
<Ansuel> right from uboot should even be better
<robimarko> Ansuel: I just used U-boot with md.l and make sure to only use single object mode
<robimarko> As its gonna crash otherwise
<robimarko> But I would say that version is probably 1.1
<robimarko> I replied with raw readings, make sure to reply as well
<robimarko> QCA will probably chime in with the actual bit definitions
<Ansuel> same values
<Ansuel> i had some fun with inspecting the saw driver some times ago and i notice this discrepancy
<Ansuel> that our revision of v1 values doesn't have a version value
<Ansuel> so probably this is why documentation is wrong and we don't have anything
<robimarko> Datasheet defines 2 registers for values
<robimarko> 0x02011FD0 APCS_VERSION
<robimarko> And 0x02091FD0 EXT_APCS_VERSION
<robimarko> So they either changed the register layout or the one in the DTS is wrong
<Ansuel> btw on my case uboot doesn't crash
<robimarko> Then register layout is different
<robimarko> As on 4019 most of SAW registers are "secure" so trying to just read them will crash it
<Ansuel> well it would be strange to have the same layout, saw changed a lot between ipq806x and ipq4019, or the entire power management itself changed between ipq806x and ipq4019
srslypascal has quit [Quit: Leaving]
<robimarko> Well, lets see what comes out of the discussion
valku has joined #openwrt-devel
<Ansuel> also 0x02011000 is kpss-gcc
<Ansuel> saw for l2 regulator on ipq806x is
<Ansuel> 0x02012000
cc0 has joined #openwrt-devel
<cc0> hi, what is the union (',') format supported by openwrt's jsonfilter?
<Ansuel> extra cursed endianness handling with values all scrambled in the switch
<Mangix> That for qca8k?
<Ansuel> yes but i'm stupid and the final code is just
<Ansuel> mib_eth_data->data[i] = le64_to_cpu(*(__le64 *)data2);
<Ansuel> i just had fun doing all manually with cast and stuff
<Ansuel> but then i notice there wasn't anything strange and it was all linear...
srslypascal has joined #openwrt-devel
Q__ has quit [Quit: Client limit exceeded: 20000]
<Ansuel> robimarko wonder what fixes tag should i use for the qca8k patch...
<Ansuel> guess the one that introduced the feature...
<Ansuel> i should reconsider my idea about myself
Tapper has quit [Read error: Connection reset by peer]
Tapper has joined #openwrt-devel
philipp64 has quit [Ping timeout: 480 seconds]
robimarko has quit [Remote host closed the connection]
robimarko has joined #openwrt-devel
<robimarko> Ansuel: Best to not reply to those kind and just ignore them
<robimarko> Usually they give up
<robimarko> And yeah, use the fixes for the initial commit adding the feature
philipp64 has joined #openwrt-devel
robimarko has quit [Quit: Leaving]
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
robimarko has quit [Quit: Leaving]
robimarko has joined #openwrt-devel
robimarko has quit []
robimarko has joined #openwrt-devel
philipp64 has quit [Quit: philipp64]
philipp64 has joined #openwrt-devel
philipp64 has quit [Ping timeout: 480 seconds]
philipp64 has joined #openwrt-devel
csrf1 has joined #openwrt-devel
cbeznea has quit [Quit: Leaving.]
MaxSoniX has quit [Quit: Konversation terminated!]
schwicht has joined #openwrt-devel
philipp64 has quit [Quit: philipp64]
schwicht has quit [Ping timeout: 480 seconds]
<janvenekamp> stintel: Yes I was going for a v2 for the uci patches.
<janvenekamp> However, I have some doubts about the best solution for some edge cases.
<janvenekamp> ndb: can I ask you about this? Or someone else here familiar with the uci code?
philipp64 has joined #openwrt-devel
<janvenekamp> nbd: ^
<nbd> i haven't touched uci in many years, so i'm not that familiar with it anymore
<nbd> but ask anyway
<janvenekamp> the problem is with calling uci_set with strcmp(ptr->section, ptr->s->e.name) != 0
<janvenekamp> or with strcmp(ptr->option, ptr->o->e.name) != 0
<janvenekamp> these cases could be considered what I would call an "illegal uci_rename"
<janvenekamp> the behaviour is different when using delta tracking or not
Borromini has joined #openwrt-devel
<janvenekamp> I think this should be straightened out, I am considering two options for this:
philipp64 is now known as Guest2454
philipp64 has joined #openwrt-devel
<janvenekamp> 1: throw UCI_ERR_INVAL when this occurs
Guest2454 has quit [Ping timeout: 480 seconds]
<janvenekamp> 2: completely ignore ptr->section and ptr->option and use ptr->s->e.name, ptr->o->e.name for uci_add_delta
<janvenekamp> What do you think?
<nbd> i think the intention was to do 2
cbeznea has joined #openwrt-devel
<janvenekamp> Ok, thank you. I am going to make a v2 with that approach.
csrf1 has quit [Ping timeout: 480 seconds]
<nbd> thanks
<robimarko> Ansuel: Stop wasting your time and mental health on that dude
<Ansuel> i couldn't resist...
<Ansuel> Dmitry answered
<robimarko> Yeah, I am just looking
<Ansuel> This is a part of l2cc, rather than SAW.
<robimarko> BTW, any ideas on how to patch this stupid ath11k TX timeout
<Ansuel> wait.... but that reg is set as saw in ipq806x dts
<Ansuel> TX timeout related to sysupgrade?
<robimarko> Yep, until that is sorted out I dont feel IPQ807x is ready
<Ansuel> yep cause sysupgrade will fail silently
<robimarko> I am also thinking about reverting ath11k decap offloading
<robimarko> As its breaking WDS
<Ansuel> mhhh i would keep the patch but just not set it
<robimarko> I dont mean to remove the code
<robimarko> Just not set the module param and thus enable it
<Ansuel> ok oh yes same idea just revert the module param
<robimarko> yeah
<Ansuel> if an user wants it he can enable it pretty easily
<Ansuel> for ath11k we have 2 way... use a big hack and give time for wpad to get killed
<Ansuel> or I need to find a correct way to repro and investigate where is the breakage... 99% there is flawed logic in how the ap peer is removed
<robimarko> So far I have not found a way to reproduce it on demand
<Ansuel> about that i have some idea like clear all the ath11k ring before removing the peer
<robimarko> It just happens after the AP is alive for a while
<Ansuel> i mean the problem here is all the indirect handling of the peer so it's probably something like the ring is handling packet while the ap is torn down
<robimarko> The thing is that it basically has TX packets in the queue
<robimarko> That never get sent
<Ansuel> so the idea is stop tx queue
<Ansuel> clear ring
<Ansuel> remove ap
<Ansuel> can also be that tx queue is never stopped
<Ansuel> let me check the code... i have spare time to waste while i compile test ipq806x
<robimarko> I kind of have a feeling that possible issue is FW getting killed or misconfigured before the peer is removed
<robimarko> And thus those packets never leave and have to timeout
<Ansuel> but we tested that with the wifi down thing
<robimarko> Thats a hack
<Ansuel> another idea would be to make the fw crash :DDD
<robimarko> It just gives it time to timeout, nothing else
<robimarko> Well, you can simulate a FW crash from debugfs on demand
<Ansuel> but the timeout should not happen so it's totally something not cleared
<Ansuel> when the ap peer is removed
<robimarko> I agree
<Ansuel> could also be that there already some patch in wlan-open that fix that
<robimarko> My "idea" is that driver tells the FW to remove the peer too early
<robimarko> And only then syncs that with mac80211
<Ansuel> my idea is that it tells to remove the peer while still transmitting or while the tx ring are still handling stuff
<robimarko> wlan-open is a dumpster fire
<robimarko> Good luck finding anything there, its all lacking description
<Ansuel> notice the juicy patch are the fix-compilation-error patch
<Ansuel> you can find all kind of fixes in there
<robimarko> I am planning to run linux-next and leave hostapd working tomorrow on it
<robimarko> And then see if it produces the same error(It should)
<Ansuel> on another topic no response on bugzilla
cbeznea has quit [Quit: Leaving.]
<Ansuel> robimarko do you have the exact error ?
<Ansuel> so i can check where is produced in the driver?
<Ansuel> (btw literally &ar->dp.num_tx_pending)
philipp64 has quit [Read error: No route to host]
philipp64 has joined #openwrt-devel
<Ansuel> on the op stop
<Ansuel> atomic_set(&ar->num_pending_mgmt_tx, 0);
<Ansuel> ok this is bad
<Ansuel> ....
<robimarko> You mean trace?
<Ansuel> i mean just the error printed in the syslog
<robimarko> I dont have it saved, it is in the bugzilla report though
<robimarko> The number of packets varies though
csrf1 has joined #openwrt-devel
<robimarko> This reminds me, its time to add NVMEM support to ath11k
<Ansuel> should be easy more or less
<robimarko> Yeah, its not hard
<robimarko> I did it before, but never cleaned it up
<Ansuel> mh
<Ansuel> there is this .flush function
<Ansuel> part of ieee80211_ops
<Ansuel> need to understand the use since it doesn't call the mac_drain_tx
<Ansuel> that communicates to the fw to remove any tx packet
<slh> robimarko: do you happen to know a way to toggle the bootorder on a soft-bricked ASRock g10 (the OEM firmware should still be fine on the other partition, as it's never touched)? I bricked mine while testing a v5.15 based build a few weeks back and am still physically distant from it and can't do a serial console based recovery for a few more weeks to come
* Ansuel hides
<robimarko> slh: The upgrade script has some logic that modifies the bootconfig part
<robimarko> And AFAIK, bootloader is reading that during boot
<Ansuel> *If the parameter @drop is set to %true, pending frames may be dropped.
<Ansuel> * @flush: Flush all pending frames from the hardware queue, making sure
<Ansuel> *that the hardware queues are empty. The @queues parameter is a bitmap
<Ansuel> *of queues to flush, which is useful if different virtual interfaces
<Ansuel> *use different hardware queues; it may also indicate all queues.
<Ansuel> *Note that vif can be NULL.
<Ansuel> *The callback can sleep.
<Ansuel> the error comes from calling .flush
<slh> robimarko: yeah, but sadly it didn't trip that logic on that test build (it did in earlier tests) and sticks to the non-booting (at least not responding over the network) partition, but I guess I'll have to go the serial console way now
<slh> interrupting early boot doesn't seem to trigger it, nor keeping the reset button pressed while powering on
<robimarko> slh: sadly, serial is the only way to go then
<slh> not a biggy, just unfortunate as I'm not near it for now
<robimarko> Ansuel: Well yeah, as that OP calls ath11k_mac_op_flush which then calls ath11k_mac_flush_tx_complete
<robimarko> But what doesnt make sense to me is why its not calling ath11k_mac_wait_tx_complete instead?
<Ansuel> we need to check how the mac80211 stop flow works
<robimarko> Ansuel: I doubt its a mac80211 issue
<Ansuel> no it's ath11k issue but i need to check if op_flush is called before stop
<robimarko> I just dont see how is it supposed to work in ath11k
<Ansuel> and how
<Ansuel> a fix can also be just if (drop) { ath11k_mac_drain_tx(ar); return; }
<robimarko> Its calling ath11k_mac_flush_tx_complete directly
<robimarko> And ath11k_mac_drain_tx which actually does flushing is never called
<Ansuel> it's called in op_stop
<Ansuel> first function
<Ansuel> drain_tx is more aggressive from what i can see
<robimarko> ok, so they are counting on that doing the thing before flush could ever get called
<Ansuel> this is why i need to understand what is called first
<Ansuel> and how
<Ansuel> ieee80211_do_stop should be the one doing the stop in theory
<robimarko> Thats the thing, according to the flush op description it should flush the packets as well
<robimarko> Cause, they are relying on stop being called first to call ath11k_mac_drain_tx
<robimarko> From my view, flush should call ath11k_mac_wait_tx_complete directly
<robimarko> As that will flush the TX queue and then call the sanity checker
<Ansuel> my only concern is that flush is also used in other context and using drain may be problematic
<Ansuel> but if i understand the code
<Ansuel> the bool drop should just say that
<Ansuel> If the parameter @drop is set to %true, pending frames may be dropped.
<Ansuel> with this true drain_tx
<robimarko> Ansuel: Well, that may be the issue
<Ansuel> the function is ieee80211_flush_queues
<Ansuel> searching if it's actually used tho
<robimarko> Cause, the way I see it they are just returning if drop is true
<robimarko> Shouldnt it be the other way around?
<robimarko> Cause: If the parameter @drop is set to %true, pending frames may be dropped.
<robimarko> But currently, ath11k_mac_flush_tx_complete is called if its not true
<Ansuel> mhhh but wait if it's true then how we reach the tx_complete?
<robimarko> I think that the condition is reversed
Borromini has quit [Quit: Lost terminal]
<robimarko> So ath11k_mac_flush_tx_complete gets called when packets arent supposed to get flushed
<robimarko> Please sanity check me as I have been sick for a week and its getting hard to work late
<Ansuel> it's correct flush with the bool false should NOT drop packet
<Ansuel> so tx purge should NOT be used so it's correct
<Ansuel> what i can't understand is why it's called with the bool on
<Ansuel> why it's not called*
<robimarko> Well, then they are doing the reverse
<robimarko> They are checking if drop is true
<robimarko> And if not continuing, otherwise just return early
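The two readings of the `drop` flag being argued over here can be compared with a toy model. This is a sketch under heavy assumptions: it is plain Python, not driver code, the class and function names are hypothetical, and it only mirrors the behaviour as described in this conversation (early return when `drop` is true, a wait-and-check otherwise) next to the alternative floated earlier (drain when `drop` is true).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRadio:
    """Hypothetical stand-in for a radio with a queue of pending TX frames."""
    pending: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def drain_tx(self) -> None:
        """Drop everything still queued (stand-in for the drain step)."""
        self.log.append("drained %d" % len(self.pending))
        self.pending.clear()

    def wait_tx_complete(self) -> bool:
        """Only verifies the queue is empty; it does not empty it."""
        if self.pending:
            self.log.append("timeout with %d pending" % len(self.pending))
            return False
        return True


def flush_as_described(radio: ToyRadio, drop: bool) -> bool:
    """Behaviour as described in the chat: return early if drop is set."""
    if drop:
        return True  # nothing is touched, frames stay queued
    return radio.wait_tx_complete()


def flush_with_drain(radio: ToyRadio, drop: bool) -> bool:
    """Alternative floated in the chat: actually drop frames when asked to."""
    if drop:
        radio.drain_tx()
        return True
    return radio.wait_tx_complete()
```

In this model, `flush_as_described(radio, True)` leaves the queue untouched, so a later `flush_as_described(radio, False)` times out unless something else (the stop path, in the discussion) drained it first. Whether that ordering actually holds in the real driver is exactly the open question in the conversation.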
<Ansuel> i think they return early as they assume you are removing the interface if you want to drop packet while purging tx and that is the first thing done on the interface stop function
<robimarko> But then any check on drop is useless
<robimarko> If you want to rely on fact that stop should be called before and packets flushed anyway
<Ansuel> we can also check what other wifi driver do with the flush
<robimarko> ath10k for example in flush flushes per station
<Ansuel> IMHO problem is that there is no handling here ath11k_mac_op_remove_interface
<Ansuel> interface is removed
<Ansuel> tx queue is not cleared
<Ansuel> packets gets lost
<Ansuel> one part of the puzzle is clear now i need to understand who call flush
<robimarko> Is it just weird to me that TX flush is just freeing SKB-s
<robimarko> There are no calls to WMI aka FW like in ath10k
<Ansuel> ath11k_mac_drain_tx does a call to purge tx
<Ansuel> robi
<Ansuel> I WONDER IF
<Ansuel> this is just a typo
<Ansuel> anyway if you want we can experiment with a simple fix
<robimarko> Ansuel: ath11k_mgmt_over_wmi_tx_purge just frees the SKB-s
<robimarko> It doesnt make a WMI call
<Ansuel> just notice wth o.O
<robimarko> So, it looks like they are dequeuing?
<robimarko> I assume that means prevent it from being sent
<robimarko> Then removing worker that handles WMI_MGMT_TX_COMPLETION_EVENTID
<robimarko> And simply freeing the SKB-s and calling it a day
<robimarko> But this synchronize_net();
<robimarko> Is worrying me
<robimarko> *
<robimarko> *synchronize_net - Synchronize with packet receive processing
<robimarko> *Wait for packets currently being received to be done.
<robimarko> *Does not block later packets from starting.
<robimarko> */
<Ansuel> in theory mac80211 has already stopped tx queue
<robimarko> Cause it waits for existing to complete, but does not prevent new ones from starting in the meantime
<Ansuel> so no more packet should come
csrf1 has quit [Ping timeout: 480 seconds]
<robimarko> I dont know, its just weird
<Ansuel> still need to understand who is calling .flush
<Ansuel> can't find that in the codeflow...
<robimarko> nbd: you still up
<robimarko> Ansuel: drv_flush from net/mac80211/driver-ops.h calls the flush op
<robimarko> And that is called by __ieee80211_flush_queues
<robimarko> Which is called by ieee80211_flush_queues
<Ansuel> yes and then ieee80211_flush_queues is not called by any function related to interface stop
<Ansuel> or remove
<Ansuel> BUT on sysupgrade .flush is called or the error doesn't make sense
<robimarko> Ansuel: It happens if you stop the interface as well
<Ansuel> since ath11k_mac_wait_tx_complete is also called only in ath11k/core
<robimarko> I would bet its .del_tx_ts cfg80211 being invoked
<robimarko> As that calls ieee80211_del_tx_ts
<robimarko> Which calls ieee80211_flush_queues
<nbd> robimarko: yes
<robimarko> nbd: Great, do you happen to know if its safe to assume that stop op is always gonna get called before flush does?
<robimarko> Cause, we are trying to chase this down: https://bugzilla.kernel.org/show_bug.cgi?id=216513
<Ansuel> nl80211_del_tx_ts
<Ansuel> wth is ts
<robimarko> It's the nl80211 call
<robimarko> AFAIK, nl80211->cfg80211->mac80211
<robimarko> LOL, just now figured you are asking about the "ts" part
<Ansuel> yep
<Ansuel> just to understand the naming
<Ansuel> well guess it's time to reset my uptime and do some test to debug the codeflow...
<Ansuel> are we sure it's something related to time and not related to the number of packets?
<robimarko> Not really
<Ansuel> did we test if a sysupgrade while an iperf is running works?
<robimarko> Not that I know of
<nbd> robimarko: are you talking about mac80211 driver ops?
<Ansuel> seems a nice idea to hammer the ring
<robimarko> nbd: Yes
<robimarko> As QCA has pretty much made the assumption that stop always gets called before flush
<nbd> that's weird
<nbd> .stop should only be called when all interfaces are removed
<nbd> as a last step for shutting down the radio
<robimarko> nbd: Well, the driver currently as far as I can tell expects that stop was called before flush
<robimarko> As nothing actually gets flushed on flush
<robimarko> It just calls the function that makes sure it was flushed
<nbd> what part of the code is your expectation based on?
<Ansuel> robi the path is .remove_interface
<Ansuel> and at the end .stop
<robimarko> nbd: ath11k_mac_op_flush
<robimarko> It just calls ath11k_mac_flush_tx_complete
<robimarko> Which despite its name is basically an error checker
<robimarko> ath11k_mac_drain_tx which actually does the flushing is called in start and stop ops
<robimarko> To me that is kind of a broken assumption
<Ansuel> but drain drops packets
<robimarko> Yeah, but only on start and stop mac80211 calls
<robimarko> Not if flush is called
<nbd> flush is per-vif
<nbd> start/stop is global
<nbd> i mean from an api perspective
<robimarko> nbd: check ath11k_mac_op_flush
<Ansuel> nbd on ath11k it's all global LOL
<hauke> Mangix: I would like to do an OpenWrt 21.02.4 and OpenWrt 22.03.1 release in the next days
<robimarko> It doesn't actually flush anything
Ansuel has quit [Read error: Connection reset by peer]
<hauke> Mangix: Do I have to do anything special in the package feed?
<nbd> it doesn't implement drop, but it's intended to wait until frames have been sent
Ansuel has joined #openwrt-devel
<Ansuel> hauke if we had some plan for 22.03.1, why did we do all the hacks with the .1? we could just release a new point release
<Mangix> hauke: I don't think so.
<robimarko> nbd: well, we are hitting a timeout while it waits to send those
<robimarko> But, it seems to be happening on mac80211 stop after the AP has been up for a while
<robimarko> And stop is supposed to actually flush them
<nbd> so maybe some frames are stuck inside a queue somewhere, which doesn't clear
<Ansuel> robi what i can't understand is that they have a timeout for the flush but nothing that triggers it
<Ansuel> drain tx just frees the skbs
<robimarko> Well yeah
<Ansuel> so it's really just wait for the ring to complete
<Ansuel> and send empty ring interrupt
<Ansuel> ...
<robimarko> My current logic is that something is not stopped
<robimarko> And thus new packets arrive after existing ones have been freed
<Ansuel> well the ring is global so new packets arrive from other interface
<Ansuel> lol...
<Ansuel> so it could even time out on the first interface_remove
<Ansuel> if a device switches to the other band
<Ansuel> ... MH
<hauke> Ansuel: The wolfssl updates are already shipped now
<robimarko> Hm, I am gonna disable 2G band
<hauke> and not everyone upgrades everything immediately
<robimarko> Just leave the 5G one for hours
<hauke> Mangix: thanks for the information
<robimarko> And see if it still triggers
<Ansuel> well if that's the case..... the driver requires a complete rework...
schwicht has joined #openwrt-devel
<Ansuel> check ath11k_mac_flush_tx_complete
<Ansuel> the num_tx_pending is global
<Ansuel> or the flush function should just be fixed to flush skb relevant to the current vif
<robimarko> So my worries about synchronize_net
<robimarko> Are kind of possible
<robimarko> As "radios" are not synchronously stopped
<nbd> i don't think this issue is related to stopping the radio
<nbd> it's related to stopping vifs
<Ansuel> yep
<nbd> and i don't buy the theory that this is caused by too many packets coming in on another vif
<nbd> the load would have to be really strong to make it run into the timeout
<Ansuel> nbd to me it seems that the timeout occurs because a device just connects to another interface
<Ansuel> and fills the tx queue
<Ansuel> robimarko but this theory doesn't work as wifi down before kill should have worked...
<nbd> i think it's far more likely that this is not about a tx queue for any vif being constantly filled
<nbd> it's either a counter imbalance
<nbd> or packets getting stuck somewhere
bluew has joined #openwrt-devel
<nbd> hauke: btw. are there any plans to update backports to 6.0?
<Ansuel> well need to examine what increases the num_tx_pending
schwicht has quit [Ping timeout: 480 seconds]
<Ansuel> can also be that some packets disappear out of existence and are never subtracted from the counter
<Ansuel> if that's the case this is a nightmare...
<nbd> it's ath11k... of course there are going to be nightmares :)
<Ansuel> ansuel@Ansuel-xps ~/ath/drivers/net/wireless/ath/ath11k master $ grep -rnw . -e num_tx_pending
<Ansuel> ./dp.h:174: atomic_t num_tx_pending;
<Ansuel> ./mac.c:7290: (atomic_read(&ar->dp.num_tx_pending) == 0),
<Ansuel> ./mac.c:7294: atomic_read(&ar->dp.num_tx_pending));
<Ansuel> ./dp_tx.c:267: atomic_inc(&ar->dp.num_tx_pending);
<Ansuel> ./dp_tx.c:310: if (atomic_dec_and_test(&ar->dp.num_tx_pending))
<Ansuel> ./dp_tx.c:339: if (atomic_dec_and_test(&ar->dp.num_tx_pending))
<Ansuel> ./dp_tx.c:719: if (atomic_dec_and_test(&ar->dp.num_tx_pending))
<Ansuel> ./dp.c:898: atomic_set(&dp->num_tx_pending, 0);
<Ansuel> MHE NOT BAD
<Ansuel> (sorry for the spam)
<Ansuel> actually there are some cases where the counter is not decreased
<Ansuel> robimarko do you have any other errors, or is logread clean for ath11k?
<hauke> nbd: I haven't found the time to update backports yet
schwicht has joined #openwrt-devel
<robimarko> Ansuel: Not that I know of
<robimarko> I am off to bed, can barely look
robimarko has quit [Quit: Leaving]
csrf1 has joined #openwrt-devel
Ansuel has quit [Quit: Probably my PC decided to sleep or I decided to sleep.]
srslypascal has quit [Remote host closed the connection]
srslypascal has joined #openwrt-devel
srslypascal has quit [Remote host closed the connection]
schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
srslypascal has joined #openwrt-devel
linusw_____ has quit []
schwicht has joined #openwrt-devel
schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
danitool has quit [Quit: Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos]
Tapper has quit [Ping timeout: 481 seconds]
goliath has quit [Quit: SIGSEGV]
xback has quit [Remote host closed the connection]
xback has joined #openwrt-devel
<karlp> feck, was tearing my hair out trying to do some extra work at home, running into weird things, patches not being applied.
<karlp> seems i had a git-src symlink in place from some earlier work :|