#openwrt-devel on 2022-10-05 — irc logs at oftc.irclog.whitequark.org

2022-09-06 06:32 rmilecki changed the topic of #openwrt-devel to: Forum: https://forum.openwrt.org/ | Wiki: http://openwrt.org/ | Release: https://downloads.openwrt.org/releases/22.03.0/ | Notes: https://openwrt.org/releases/22.03/notes-22.03.0 | Logs: https://oftc.irclog.whitequark.org/openwrt-devel

00:34 goliath has quit [Quit: SIGSEGV]

01:23 minimal has quit [Quit: Leaving]

02:34 PaulFertser has quit [Ping timeout: 480 seconds]

03:07 PaulFertser has joined #openwrt-devel

03:18 MatrixTravelerbot[m]1 has quit []

03:32 rua has quit [Quit: Leaving.]

04:42 valku has quit [Quit: valku]

04:57 xback has quit [Remote host closed the connection]

04:59 xback has joined #openwrt-devel

05:01 aiyion has quit [Ping timeout: 480 seconds]

05:01 lemoer_ has quit [Ping timeout: 480 seconds]

05:02 aiyion has joined #openwrt-devel

05:11 lemoer_ has joined #openwrt-devel

05:15 schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

05:30 aiyion has quit [Remote host closed the connection]

05:30 aiyion has joined #openwrt-devel

05:45 cbeznea has joined #openwrt-devel

06:10 ekathva has joined #openwrt-devel

07:02 Tapper has joined #openwrt-devel

07:07 guidosarducci has quit []

07:07 guidosarducci has joined #openwrt-devel

07:08 ptudor_ has joined #openwrt-devel

07:10 <owrt-2203-builds> Build [#149](https://buildbot.openwrt.org/openwrt-22.03/images/#builders/52/builds/149) of `bcm27xx/bcm2709` failed.

07:12 zorun has joined #openwrt-devel

07:12 ptudor has quit [Ping timeout: 480 seconds]

07:15 csrf1 has joined #openwrt-devel

07:16 goliath has joined #openwrt-devel

07:39 MaxSoniX has joined #openwrt-devel

07:43 Tapper has quit [Quit: Tapper]

07:44 Tapper has joined #openwrt-devel

07:48 srslypascal has quit [Ping timeout: 480 seconds]

07:50 cbeznea has quit [Quit: Leaving.]

07:52 srslypascal has joined #openwrt-devel

08:13 danitool has joined #openwrt-devel

08:22 <stintel> ok so the ENOSPC happens in install_file: https://gist.github.com/stintel/0d365e4bb2d1694b9c7762b5c7d32692

08:27 <stintel> looks like it might just be the sysupgrade image that is causing /tmp to fill up too much

08:27 <stintel> tmpfs 239.0M 233.1M 5.9M 98% /tmp

08:27 <stintel> after copying the sysupgrade image

08:28 <stintel> this could really use some improvement

08:28 <stintel> do a minimal space check and abort early with a message the user can read without having to do serial console
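
The early check stintel asks for could look roughly like the sketch below. It is only an illustration of the idea in Python, not sysupgrade's code; the names `fits` and `check_staging_space`, the `/tmp` default and the 8 MiB reserve are assumptions made up for this example.

```python
import os
import shutil


def fits(free_bytes: int, image_bytes: int, reserve_bytes: int) -> bool:
    """Return True if the image plus a safety reserve fits into the free space."""
    return free_bytes >= image_bytes + reserve_bytes


def check_staging_space(image_path: str, staging_dir: str = "/tmp",
                        reserve_bytes: int = 8 * 1024 * 1024) -> None:
    """Fail early, while the user can still read the message on their terminal.

    reserve_bytes is a hypothetical margin for whatever else the upgrade
    needs to place in the staging directory.
    """
    image_bytes = os.path.getsize(image_path)
    free_bytes = shutil.disk_usage(staging_dir).free
    if not fits(free_bytes, image_bytes, reserve_bytes):
        raise RuntimeError(
            f"not enough space in {staging_dir}: "
            f"{free_bytes} bytes free, {image_bytes + reserve_bytes} needed"
        )
```

Called before any session is torn down, a failure here would surface as a readable error instead of an ENOSPC that is only visible on the serial console.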

08:29 robimarko has joined #openwrt-devel

08:34 <stintel> https://github.com/openwrt/openwrt/issues/10898

08:36 cbeznea has joined #openwrt-devel

08:44 <robimarko> This is basically the same issue that has plagued low RAM devices

08:45 <\x> robimarko: good use of that led on MR7350 https://litter.catbox.moe/2v39gn.webm https://paste.wowlet.app/p/67PNC.txt

08:46 <stintel> robimarko: sysupgrade could use some polishing really :)

08:46 <stintel> if catching problems before kicking user from SSH is hard, we should look into logging to a file on the overlay

08:46 <stintel> so that after reboot the user can inspect there what is going on

08:47 <robimarko> \x: Well, that is one use for it

08:47 <robimarko> stintel: SSH should get killed way before anything useful other than downloading the sysupgrade archive is done

08:48 <robimarko> *Is killed

08:48 <stintel> robimarko: if you run sysupgrade on command-line, a space check before killing SSH should not be impossible though

08:48 <\x> tested it with running differing bandwidth limits on iperf https://i.imgur.com/q9geupN.png

08:48 <stintel> anyway, no time for this now so I reported an issue

08:49 <\x> based man, colo showed me the way for that rx bytes thing, hella cool

08:49 <stintel> worked around the problem by giving the VM 1GB of RAM

08:49 <robimarko> stintel: That would take some restructuring for sure

08:50 <stintel> maybe the logging to overlay would be a better approach

08:50 <robimarko> That would be really usable for any kind of error

08:50 <stintel> it would increase debuggability tremendously

08:50 <robimarko> Cause, currently if you don't catch it via serial it's gone

08:50 <stintel> yep

08:51 <stintel> debugging sysupgrade issues is horrendous

08:56 <Tapper> Wireshark 4.0 Network Protocol Analyzer Released https://www.phoronix.com/news/Wireshark-4.0-Released

08:56 f00b4r0 has quit [Remote host closed the connection]

08:56 csrf1 has quit [Ping timeout: 480 seconds]

08:59 f00b4r0 has joined #openwrt-devel

09:20 * f00b4r0 learns the hard way that procd_add_reload_interface_trigger() can fire up before the interface is "up"

09:21 <stintel> procd_add_raw_trigger "interface.*" 2000 /etc/init.d/foo restart might help you there ?

09:21 <stintel> 2000 being the delay iirc

09:23 <f00b4r0> i'm thinking procd_add_interface_trigger "interface.*.up" /etc/init.d/foo reload would work. I'm poring through code to ascertain, the documentation being scarce as it is ;)

09:24 ptudor_ is now known as ptudor

09:28 guidosarducci_ has joined #openwrt-devel

09:31 GNUmoon2 has joined #openwrt-devel

09:32 <stintel> please improve documentation where possible ;)

09:32 guidosarducci has quit [Ping timeout: 480 seconds]

09:33 <f00b4r0> stintel: hehe, I see what you're doing :D

09:33 <robimarko> As they say, patches are welcome

09:34 <f00b4r0> i'm happy to update doc when I'm _sure_ of my understanding. But with these parts, often I'm not ;P

09:34 GNUmoon has quit [Ping timeout: 480 seconds]

09:35 <stintel> yeah, docs are often scarce, did the hostapd ubus docs some time ago because this was useful for what I was working on at the time

09:35 <f00b4r0> that was very helpful

09:36 <robimarko> That reminds me, stintel you wanted to upstream ubus hostapd support?

09:37 <stintel> I asked for approval to work on that for day job but that didn't work

09:38 <robimarko> Ugh, that is a shame

09:38 <stintel> but I think it makes a lot of sense, as many SDKs out there are OpenWrt based, probably ubus implementation is way more used than wpa_supplicant/dbus combo

09:38 <robimarko> It for sure outnumber dbus by a large margin

09:38 <stintel> it would also allow to dedup some code

09:39 <stintel> as we're currently kind of copying the ctrl_iface code in ubus code, afair

09:45 <f00b4r0> well I can't come to a conclusion from the code either. I'm going to assume that "interface.*.up" only fires _after_ the interface is actually up ;P

09:46 srslypascal has quit [Ping timeout: 480 seconds]

09:49 srslypascal has joined #openwrt-devel

09:49 <f00b4r0> oh, I found the actual doc. Wasn't looking at the right place :)

09:49 srslypascal has quit [Read error: Connection reset by peer]

10:51 Gaspare has joined #openwrt-devel

10:52 rua has joined #openwrt-devel

10:55 <owrt-2203-builds> Build [#150](https://buildbot.openwrt.org/openwrt-22.03/images/#builders/52/builds/150) of `bcm27xx/bcm2709` completed successfully.

10:58 <mrkiko> anyone using shadowsocks and know how to limit the number of simultaneously connected clients, be them authenticated or not?

11:00 <karlp> stintel: if you're still working on qoriq, https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/qoriq/Makefile;hb=HEAD#l11 should say "rootfs-part" not root-part

11:02 <stintel> karlp: thanks

11:05 Gaspare has quit [Ping timeout: 480 seconds]

11:07 <stintel> karlp: which email do I use in Reported-by ?

11:08 <Tapper> Hi any one know if the wax 204 will be added to openwrt like the wax202?

11:09 <robimarko> Is it the same SoC?

11:09 <Tapper> Don't know cant find it on google

11:09 <Tapper> Or my google skills are just crap.

11:10 <Tapper> I am asking because I spotted a wax 204 on sale where I live for £39

11:12 <robimarko> Like always, somebody has gotta do it, there is no central planning

11:13 <Tapper> robimarko Yeah mate I know the drill.

11:13 <Tapper> Just wanted to ask that's all.

11:13 <robimarko> FCC photos show Broadcom logo on the SoC

11:14 <robimarko> So, I guess thats game over

11:14 <Tapper> O know!

11:14 * Tapper Spits on Broadcom

11:14 <robimarko> https://fccid.io/PY320100480/Internal-Photos/Internal-Photos-4876128

11:15 <robimarko> Not high res enough to read the part number but being Broadcom and AX its game over

11:15 <svanheule> the firmware file is also riddled with brcm strings

11:17 <robimarko> Honestly, the Redmi AX6000 is looking like a good deal with Filogic 830

11:22 bluew has quit [Ping timeout: 480 seconds]

11:25 Ansuel has joined #openwrt-devel

11:28 <\x> ahemm, time to ask, coremark of that MT7986B?

11:29 <robimarko> Its just a quad core A53 at 2GHz max

11:29 <robimarko> So pretty much in line with IPQ807x

11:29 <\x> so like 6.7k ish

11:30 <robimarko> I am too lazy to actually try installing OpenWrt on Banana Pi 3 to test

11:34 <karlp> stintel: karlp@etactica.com I guess, was looking at this for work

11:34 <stintel> karlp: thanks

11:37 <stintel> fix pushed

11:38 <karlp> so... what does that feature actually _do_ anyway? I'm lost deep in the guts of include makefile chains :)

11:38 <stintel> it exposes TARGET_ROOTFS_PARTSIZE afaik

11:38 <karlp> because _most_ places put in features boot-part _and_ rootfs-part, but some only do rootfs-part

11:38 <karlp> which is just a size option in menuconfig for generating output images right?

11:38 <stintel> yeah

11:39 <stintel> so that makes sense for devices with "variable" storage

11:39 <stintel> e.g. SD cards

11:39 <stintel> iiuc

11:39 <karlp> I'm trying to add support for a device with emmc and sd, "factory" will always be from emmc, but building an sdcard image should be fine too.

11:40 <karlp> the existing sdcard images work fine, and you can just copy them to emmc, and it works fine too, but, is there any support in openwrt sysupgrade or friends for expanding a partition/fs after install?

11:45 <stintel> I didn't think so

11:45 <stintel> I usually create an extra partition e.g. /srv on what's left of the storage

11:46 <stintel> and use that for container storage or so

11:46 <karlp> yeah, that's what I was kinda leaning towards.

11:46 <stintel> but then if your boot/root fs size is changed you lose that partition on next sysupgrade

11:47 <karlp> I do want to use the emmc boot partitions to put uboot and environment on though, instead of slamming it into extra MBR partitions on the front.

11:47 <karlp> having a bit of a wheel spin playing with that and all the different ways uboot and linux let you create/write all those portions :)

11:48 <stintel> recently got a device with eMMC and uSD, the olinuxino a64 (sunxi)

11:48 <stintel> but I've not played with it enough

11:48 <stintel> right now it boots u-boot from SD

11:48 <stintel> and I can't get the eMMC to work

11:48 <karlp> yar, I'm expanding sunxi. got it booting from both happily,

11:49 <stintel> we might want to consider an auto-expand-rootfs-or-overlay feature I guess

11:49 <karlp> the big key to getting emmc to work was the "mmc bootbus" and "mmc partconf" lines from the end of https://linux-sunxi.org/Bootable_eMMC

11:49 <stintel> although ... with sysupgrade saving changes to /boot in many cases ... this will increase potential to fail sysupgrade

11:49 <karlp> (for me at least)

11:50 <karlp> yar, I've not even _started_ looking at geting sysupgrade to work yet, just trying to build "nice" images that I can flash neatly via sunxi-fel/dfu/UMS..

11:50 <karlp> trying to avoid the fastboot mess.

11:50 <stintel> s/changes/backup/

11:51 <stintel> maybe Daniel has some ideas about the subject

11:51 <karlp> yeah, I don't have any specific issues right now, just sort of trying to put all the pieces together.

11:51 <robimarko> Resizing could be done

11:52 <robimarko> I am doing it at work with Systemd, but it should be doable without it as well

11:52 <robimarko> But you gotta fix the GPT table first to be able to expand

11:52 <robimarko> That can be easily done with parted in script mode(That is how I am doing it)

11:53 <stintel> I actually wonder .. we want to avoid FAT partition with kernel (legacy), but in this case, how does sysupgrade backup/restore even work

11:54 <karlp> on sunxi in particular do you know if the theobroma people are still around? I've been updating uboot-sunxi to 22.07, and there's some of these a31-pangolin patches that have conflicts with changes upstream, and I've got no way of testing/verifying.

11:54 <stintel> oh nice work, I was looking at that too

11:55 <stintel> I'd sent an RFT series to ML? cc people who added those patches?

11:55 <karlp> sure, I'm pretty close on some of this stuff.

11:55 <karlp> just know you'd been doing some sunxi stuff over the years as well.

11:55 <stintel> s/sent/send/

11:56 * stintel doesn't remember :D

11:56 <karlp> am considering whether to use fit images as well, instead of uImage, but I may just be making more work for myself..

11:56 <stintel> I'd definitely go for fit

11:56 <karlp> we already use fit images on other targets though, so seems like it should be "just do it"

11:56 <stintel> less u-boot env stuff requires

11:56 <stintel> geeeez

11:56 <stintel> s/requires/required/

11:56 * stintel goes for a walk

11:57 <karlp> thanks for the chat

11:57 <stintel> any time

11:57 <stintel> ping me if you want anything tested, too

11:58 <ynezz> karlp: IIRC 100-102 can be removed https://git.openwrt.org/375d8031522b8a180225327584c415a8e85ee51b

12:00 <karlp> yeah, I'd dropped most of it, I saw the upstream commits that changed it, but I left the gpio defines in, https://github.com/etactica/openwrt/commit/289490f716943f96bcf19ba1c158df56ac1e5de5 is my 22.07 upgrade so far.

12:01 <karlp> yours looks more complete though, I like it.

12:01 <karlp> I tried to keep what was left of those patches.

12:09 <karlp> so. is this a git.openwrt server config error or what? https://git.openwrt.org/?p=openwrt/staging/ynezz.git;a=patch;h=375d8031522b8a180225327584c415a8e85ee51b encodes ynezz's name in the from, but not in the signed-off-by?

12:10 <karlp> that's the "patch" link, the "raw" link works fine, *shrugs*

12:14 <Ansuel> signedoff-by is email body

12:14 <Ansuel> from needs to be encoded or some email program will fk up

12:16 <Ansuel> but now that i think about it with git format-patch also the signed off by tag is encoded

12:16 <Ansuel> MH

12:23 <f00b4r0> Ansuel: the tag isn't encoded by format-patch. Only the From, for the reason you pointed out :)

12:25 <Ansuel> thanks for clarifying that

12:29 <robimarko> git format-patch wont touch the body
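
The distinction f00b4r0 and robimarko draw (header encoded, body left alone) follows from RFC 2047, which requires non-ASCII text in mail headers to be wrapped in encoded-words, while the message body carries its own charset. The sketch below illustrates this with Python's standard `email.header` module. It does not reproduce what git or gitweb emit byte for byte (Python's `Header` defaults to base64 for UTF-8, other tools may use Q encoding), and the author name, address and the `build_patch_mail` helper are hypothetical.

```python
from email.header import Header, decode_header


def build_patch_mail(author_name: str, author_email: str, body: str) -> str:
    """Build a minimal patch mail: encoded From header, untouched body."""
    # Headers must be ASCII, so the display name becomes an RFC 2047 encoded-word.
    encoded_name = Header(author_name, "utf-8").encode()
    from_header = f"From: {encoded_name} <{author_email}>"
    # The body (including any Signed-off-by trailer) is passed through as-is.
    return f"{from_header}\n\n{body}"


def decode_from_name(from_header_value: str) -> str:
    """Decode the first encoded-word of a header value back to text."""
    raw, charset = decode_header(from_header_value)[0]
    return raw.decode(charset) if isinstance(raw, bytes) else raw


mail = build_patch_mail(
    "Jöhn Döe",            # hypothetical author with non-ASCII characters
    "john@example.org",    # hypothetical address
    "example: fix a thing\n\nSigned-off-by: Jöhn Döe <john@example.org>\n",
)
print(mail)
```

Printing `mail` shows an `=?utf-8?...?=` token in the From line and the plain name in the Signed-off-by trailer, which matches the difference karlp noticed between the two.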

12:31 <karlp> well, if you wget the "raw" link, it "works" just fine with git am, the "patch" links are therefore, IMO, ~useless?

12:48 <Ansuel> endianess question.... how le64_to_cpu works on 32 bit system ?

12:48 <nbd> why should it make a difference if the system is 32 bit or 64 bit?

12:49 <robimarko> Same way as 64 bit variables work on 32 bit systems

12:49 <Ansuel> address size

12:49 <Ansuel> i can't understand how they write the 32 bit remaining stuff if we don't pass any pointer

12:50 floof58 has quit [Read error: Connection reset by peer]

12:50 <nbd> what's the context of that question?

12:50 <Ansuel> (practical example) mib descriptor are 64 bit... would someting like *mib = le64_to_cpu(*data); works?

12:51 floof58 has joined #openwrt-devel

12:51 <Ansuel> (this is a 32bit system)

12:51 <nbd> works the same way on 32 bit and 64 bit machines in terms of behavior

12:51 <nbd> only the internals are different in the cpu

12:52 <nbd> since 64 bit can put it in a register, the 32 bit machine may need to transfer the 32 bit chunks individually

12:52 <nbd> depending on arch level, instruction set, etc.

12:52 <nbd> but you don't have to care about that

12:52 <Ansuel> ok so the handling is done internally. Had some doubt these kind of stuff wasn't supported

12:53 <nbd> the compiler deals with it for you...
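
nbd's point can be illustrated outside the kernel: a little-endian 64-bit value decodes to the same number whether it is read in one go or assembled from two 32-bit halves, which is roughly what a 32-bit CPU does behind the scenes. The sketch below is in Python and only borrows the name `le64_to_cpu` for readability; both helpers are illustrative and not the kernel macro.

```python
import struct


def le64_to_cpu(buf: bytes) -> int:
    """Decode the first 8 bytes of buf as one little-endian 64-bit value."""
    if len(buf) < 8:
        raise ValueError("need at least 8 bytes")
    return int.from_bytes(buf[:8], "little")


def le64_from_halves(buf: bytes) -> int:
    """Decode the same value as two 32-bit words, low word first."""
    if len(buf) < 8:
        raise ValueError("need at least 8 bytes")
    lo, hi = struct.unpack("<II", buf[:8])
    return (hi << 32) | lo


raw = bytes([0x78, 0x56, 0x34, 0x12, 0xEF, 0xCD, 0xAB, 0x90])
print(hex(le64_to_cpu(raw)))       # one 64-bit load
print(hex(le64_from_halves(raw)))  # two 32-bit loads, combined
```

Both calls print the same value; the caller never has to care which strategy the machine uses.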

12:53 <stintel> nbd: not sure you got our pings yesterday, would you mind having a look at https://patchwork.ozlabs.org/project/openwrt/list/?series=313995 ?

12:54 <stintel> ah actually there might be a v2 on the way

12:54 <stintel> janvenekamp: were you going to send v2 for ^

12:59 ekathva has quit [Remote host closed the connection]

12:59 <nbd> series looks good to me

13:07 <stintel> was this also the series that fixes truncated files on ENOSPC?

13:07 <stintel> if so, would be nice if some people who ran into that can Tested-by the series

13:37 <robimarko> Ansuel: Any chance you can check SAW version on IPQ8064?

13:37 <robimarko> Dmitry asked both of us couple of days ago, just stumbled on it by accident

13:40 <robimarko> Actually no need, I remembered I have Asrock G10 which is 8064

13:44 <Ansuel> robimarko if you want i'm doing some thing with ipq8064

13:44 <Ansuel> right now

13:45 <Ansuel> anyway i totally didn't understand the logic of saw version for this old platform

13:45 <robimarko> Ok, then dump register 0x02011FD0

13:45 <robimarko> If I am reading it right its v1 something

13:45 <Ansuel> but of v1 there are many revision for that but let me dump that reg

13:46 <robimarko> Yeah, datasheet for IPQ806x KPSS says that both minor and step are bits 15:=

13:46 <robimarko> 15:0 which makes no sense

13:46 <robimarko> As bit 16 is set

13:47 srslypascal has joined #openwrt-devel

13:51 <karlp> anyone know off the top of their heads the difference between KERNELNAME and KERNEL_NAME?

13:51 <Ansuel> (btw i also totally missed that

13:51 <Ansuel> no idea how)

13:51 <karlp> KERNELNAME seems to be for building the kernel, and KERNEL_NAME for making images? but why?

13:51 <Ansuel> right from uboot should even be better

13:52 <robimarko> Ansuel: I just used U-boot with md.l and make sure to only use single object mode

13:52 <robimarko> As its gonna crash otherwise

13:52 <robimarko> But I would say that version is probably 1.1

13:52 <robimarko> I replied with raw readings, make sure to reply as well

13:52 <robimarko> QCA will probably chime in with the actual bit definitions

13:53 <Ansuel> same values

13:53 <Ansuel> i had some fun with inspecting the saw driver some times ago and i notice this discrepancy

13:53 <Ansuel> that our revision of v1 values doesn't have a version value

13:53 <Ansuel> so probably this is why documentation is wrong and we don't have anything

13:54 <robimarko> Datasheet defines 2 registers for values

13:54 <robimarko> 0x02011FD0 APCS_VERSION

13:54 <robimarko> And 0x02091FD0 EXT_APCS_VERSION

13:54 <robimarko> So they either changed the register layout or the one in the DTS is wrong

13:55 <Ansuel> btw on my case uboot doesn't crash

13:55 <robimarko> Then register layout is different

13:56 <robimarko> As on 4019 most of SAW registers are "secure" so trying to just read them will crash it

13:57 <Ansuel> well it would be strange to have the same layout, saw changed a lot from ipq806x to ipq4019, or the entire power management itself changed from ipq806x to ipq4019

13:57 srslypascal has quit [Quit: Leaving]

13:57 <robimarko> Well, lets see what comes out of the discussion

13:57 valku has joined #openwrt-devel

13:59 <Ansuel> also 0x02011000 is kpss-gcc

13:59 <Ansuel> saw for l2 regulator on ipq806x is

13:59 <Ansuel> 0x02012000

14:04 cc0 has joined #openwrt-devel

14:05 <cc0> hi, what is the union (',') format supported by openwrt's jsonfilter?

14:07 <Ansuel> extra cursed endianness handling with values all scrambled in the switch

14:07 <Ansuel> https://i.postimg.cc/bNcGVSx3/image.png

14:16 <Mangix> That for qca8k?

14:17 <Ansuel> yes but i'm stupid and the final code is just

14:17 <Ansuel> mib_eth_data->data[i] = le64_to_cpu(*(__le64 *)data2);

14:17 <Ansuel> i just had fun doing all manually with cast and stuff

14:17 <Ansuel> but then i notice there wasn't anything strange and it was all linear...

14:38 srslypascal has joined #openwrt-devel

14:56 Q__ has quit [Quit: Client limit exceeded: 20000]

15:00 <Ansuel> robimarko wonder what fixes tag should i use for the qca8k patch...

15:00 <Ansuel> guess the one that introduced the feature...

15:02 <Ansuel> just discovered i'm insolent https://forum.openwrt.org/t/adding-openwrt-support-for-xiaomi-ax3600/55049/8456?u=ansuel

15:03 <Ansuel> i should reconsider my idea about myself

15:10 Tapper has quit [Read error: Connection reset by peer]

15:11 Tapper has joined #openwrt-devel

15:16 philipp64 has quit [Ping timeout: 480 seconds]

15:42 robimarko has quit [Remote host closed the connection]

15:42 robimarko has joined #openwrt-devel

15:48 <robimarko> Ansuel: Best to not reply to those kind and just ignore them

15:48 <robimarko> Usually they give up

15:49 <robimarko> And yeah, use the fixes for the initial commit adding the feature

15:51 philipp64 has joined #openwrt-devel

15:54 robimarko has quit [Quit: Leaving]

15:56 robimarko has joined #openwrt-devel

15:56 robimarko has quit []

15:57 robimarko has joined #openwrt-devel

15:59 robimarko has quit []

16:01 robimarko has joined #openwrt-devel

16:02 robimarko has quit []

16:03 robimarko has joined #openwrt-devel

16:04 robimarko has quit []

16:05 robimarko has joined #openwrt-devel

16:06 robimarko has quit []

16:07 robimarko has joined #openwrt-devel

16:14 robimarko has quit [Quit: Leaving]

16:15 robimarko has joined #openwrt-devel

16:16 robimarko has quit []

16:16 robimarko has joined #openwrt-devel

16:30 philipp64 has quit [Quit: philipp64]

17:04 philipp64 has joined #openwrt-devel

17:12 philipp64 has quit [Ping timeout: 480 seconds]

17:18 philipp64 has joined #openwrt-devel

17:45 csrf1 has joined #openwrt-devel

17:57 cbeznea has quit [Quit: Leaving.]

18:04 MaxSoniX has quit [Quit: Konversation terminated!]

18:05 schwicht has joined #openwrt-devel

18:13 philipp64 has quit [Quit: philipp64]

18:13 schwicht has quit [Ping timeout: 480 seconds]

18:15 <janvenekamp> stintel: Yes I was going for a v2 for the uci patches.

18:16 <janvenekamp> However, I have some doubts about the best solution for some edge cases.

18:16 <janvenekamp> ndb: can I ask you about this? Or someone else here familiar with the uci code?

18:16 philipp64 has joined #openwrt-devel

18:17 <janvenekamp> nbd: ^

18:19 <nbd> i haven't touched uci in many years, so i'm not that familiar with it anymore

18:19 <nbd> but ask anyway

18:22 <janvenekamp> the problem is with calling uci_set with strcmp(ptr->section, ptr->s->e.name) != 0

18:22 <janvenekamp> or with strcmp(ptr->option, ptr->o->e.name) != 0

18:24 <janvenekamp> these cases could be considered what I would call an "illegal uci_rename"

18:25 <janvenekamp> the behaviour is different when using delta tracking or not

18:26 Borromini has joined #openwrt-devel

18:27 <janvenekamp> I think this should be straightened out, I am considering two options for this:

18:28 philipp64 is now known as Guest2454

18:28 philipp64 has joined #openwrt-devel

18:29 <janvenekamp> 1: throw UCI_ERR_INVAL when this occurs

18:29 Guest2454 has quit [Ping timeout: 480 seconds]

18:30 <janvenekamp> 2: completely ignore ptr->section and ptr->option and use ptr->s->e.name, ptr->o->e.name for uci_add_delta

18:30 <janvenekamp> What do you think?

18:31 <nbd> i think the intention was to do 2

18:36 cbeznea has joined #openwrt-devel

18:37 <janvenekamp> Ok, thank you. I am going to make a v2 with that approach.

18:51 csrf1 has quit [Ping timeout: 480 seconds]

18:55 <nbd> thanks

19:04 <robimarko> Ansuel: Stop wasting your time and mental health on that dude

19:05 <Ansuel> i couldn't resist...

19:05 <Ansuel> Dmitry answered

19:05 <robimarko> Yeah, I am just looking

19:06 <Ansuel> This is a part of l2cc, rather than SAW.

19:06 <robimarko> BTW, any ideas on how to patch this stupid ath11k TX timeout

19:06 <Ansuel> wait.... but that reg is set as saw in ipq806x dts

19:07 <Ansuel> TX timeout related to sysupgrade?

19:07 <robimarko> Yep, until that is sorted out I dont feel IPQ807x is ready

19:07 <Ansuel> yep cause sysupgrade will fail silently

19:08 <robimarko> I am also thinking about reverting ath11k decap offloading

19:08 <robimarko> As its breaking WDS

19:08 <Ansuel> mhhh i would keep the patch but just not set it

19:08 <robimarko> I dont mean to remove the code

19:09 <robimarko> Just not set the module param and thus enable it

19:09 <Ansuel> ok oh yes same idea just revert the module param

19:09 <robimarko> yeah

19:09 <Ansuel> if an user wants it he can enable it pretty easily

19:09 <Ansuel> for ath11k we have 2 way... use a big hack and give time for wpad to get killed

19:10 <Ansuel> or I need to find a correct way to repro and investigate where is the breakage... 99% there is flawed logic in how the ap peer is removed

19:11 <robimarko> So far I have not found a way to reproduce it on demand

19:11 <Ansuel> about that i have some idea like clear all the ath11k ring before removing the peer

19:11 <robimarko> It just happens after the AP is alive for a while

19:11 <Ansuel> i mean the problem here is all the indirect handling of the peer so it's probably something like the ring is handling packet while the ap is torn down

19:11 <robimarko> The thing is that it basically has TX packets in the queue

19:11 <robimarko> That never get sent

19:11 <Ansuel> so the idea is stop tx queue

19:11 <Ansuel> clear ring

19:11 <Ansuel> remove ap

19:12 <Ansuel> can also be that tx queue is never stopped

19:12 <Ansuel> let me check the code... i have spare time to waste while i compile test ipq806x

19:12 <robimarko> I kind of have a feeling that possible issue is FW getting killed or misconfigured before the peer is removed

19:12 <robimarko> And thus those packets never leave and have to timeout

19:13 <Ansuel> but we tested that with the wifi down thing

19:13 <robimarko> Thats a hack

19:13 <Ansuel> another idea would be to make the fw crash :DDD

19:13 <robimarko> It just gives it time to timeout, nothing else

19:14 <robimarko> Well, you can simulate a FW crash from debugfs on demand

19:14 <Ansuel> but the timeout should not happen so it's totally something not cleared

19:14 <Ansuel> when the ap peer is removed

19:14 <robimarko> I agree

19:14 <Ansuel> could also be that there already some patch in wlan-open that fix that

19:14 <robimarko> My "idea" is that driver tells the FW to remove the peer too early

19:14 <robimarko> And only then syncs that with mac80211

19:15 <Ansuel> my idea is that it tells to remove the peer while still transmitting or while the tx ring are still handling stuff

19:15 <robimarko> wlan-open is a dumpster fire

19:15 <robimarko> Good luck finding anything there, its all lacking description

19:16 <Ansuel> notice the juicy patch are the fix-compilation-error patch

19:16 <Ansuel> you can find all kind of fixes in there

19:16 <robimarko> I am planning to run linux-next and leave hostapd working tomorrow on it

19:17 <robimarko> And then see if it produces the same error(It should)

19:17 <Ansuel> on another topic no response on bugzilla

19:18 cbeznea has quit [Quit: Leaving.]

19:19 <Ansuel> robimarko do you have the exact error ?

19:19 <Ansuel> so i can check where is produced in the driver?

19:20 <Ansuel> (btw literally &ar->dp.num_tx_pending)

19:21 philipp64 has quit [Read error: No route to host]

19:21 philipp64 has joined #openwrt-devel

19:21 <Ansuel> on the op stop

19:21 <Ansuel> atomic_set(&ar->num_pending_mgmt_tx, 0);

19:21 <Ansuel> ok this is bad

19:21 <Ansuel> ....

19:22 <robimarko> You mean trace?

19:22 <Ansuel> i mean just the error printed in the syslog

19:22 <robimarko> I dont have it saved, it is in the bugzilla report though

19:23 <robimarko> https://bugzilla.kernel.org/show_bug.cgi?id=216513

19:24 <robimarko> The number of packets varies though

19:24 csrf1 has joined #openwrt-devel

19:25 <robimarko> This reminds me, its time to add NVMEM support to ath11k

19:25 <Ansuel> should be easy more or less

19:25 <robimarko> Yeah, its not hard

19:25 <robimarko> I did it before, but never cleaned it up

19:26 <Ansuel> mh

19:26 <Ansuel> there is this .flush function

19:26 <Ansuel> part of ieee80211_ops

19:27 <Ansuel> need to understand the use since it doesn't call the mac_drain_tx

19:27 <Ansuel> that communicates to the fw to remove any tx packet

19:28 <slh> robimarko: do you happen to know a way to toggle the bootorder on a soft-bricked ASRock g10 (the OEM firmware should still be fine on the other partition, as it's never touched)? I bricked mine while testing a v5.15 based build a few weeks back and am still physically distant from it and can't do a serial console based recovery for a few more weeks to come

19:28 * Ansuel hides

19:30 <robimarko> slh: The upgrade script has some logic that modifies the bootconfig part

19:30 <robimarko> And AFAIK, bootloader is reading that during boot

19:31 <Ansuel> *If the parameter @drop is set to %true, pending frames may be dropped.

19:31 <Ansuel> * @flush: Flush all pending frames from the hardware queue, making sure

19:31 <Ansuel> *that the hardware queues are empty. The @queues parameter is a bitmap

19:31 <Ansuel> *of queues to flush, which is useful if different virtual interfaces

19:31 <Ansuel> *use different hardware queues; it may also indicate all queues.

19:31 <Ansuel> *Note that vif can be NULL.

19:31 <Ansuel> *The callback can sleep.

19:32 <Ansuel> the error comes from calling .flush

19:33 <slh> robimarko: yeah, but sadly it didn't trip that logic on that test build (it did in earlier tests) and sticks to the non-booting (at least not responding over the network) partition, but I guess I'll have to go the serial console way now

19:33 <slh> interrupting early boot doesn't seem to trigger it, nor keeping the reset button pressed while powering on

19:34 <robimarko> slh: sadly, serial is the only way to go then

19:35 <slh> not a biggy, just unfortunate as I'm not near it for now

19:35 <robimarko> Ansuel: Well yeah, as that OP calls ath11k_mac_op_flush which then calls ath11k_mac_flush_tx_complete

19:36 <robimarko> But what doesnt make sense to me is why its not calling ath11k_mac_wait_tx_complete instead?

19:36 <Ansuel> we need to check how the mac80211 stop flow works

19:37 <robimarko> Ansuel: I doubt its a mac80211 issue

19:37 <Ansuel> no it's ath11k issue but i need to check if op_flush is called before stop

19:37 <robimarko> I just dont see how is it supposed to work in ath11k

19:37 <Ansuel> and how

19:38 <Ansuel> a fix can also be just if (drop) { ath11k_mac_drain_tx(ar); return; }

19:38 <robimarko> Its calling ath11k_mac_flush_tx_complete directly

19:38 <robimarko> And ath11k_mac_drain_tx which actually does flushing is never called

19:38 <Ansuel> it's called in op_stop

19:38 <Ansuel> first function

19:39 <Ansuel> drain_tx is more aggressive from what i can see

19:39 <robimarko> ok, so they are counting on that doing the thing before flush could ever get called

19:39 <Ansuel> this is why i need to understand what is called first

19:39 <Ansuel> and how

19:40 <Ansuel> ieee80211_do_stop should be the one doing the stop in theory

19:40 <robimarko> Thats the thing, according to the flush op description it should flush the packets as well

19:41 <robimarko> Cause, they are relying on stop being called first to call ath11k_mac_drain_tx

19:42 <robimarko> From my view, flush should call ath11k_mac_wait_tx_complete directly

19:42 <robimarko> As that will flush the TX queue and then call the sanity checker

19:42 <Ansuel> my only concern is that flush is also used in other context and using drain may be problematic

19:42 <Ansuel> but if i understand the code

19:42 <Ansuel> the bool drop should just say that

19:42 <Ansuel> If the parameter @drop is set to %true, pending frames may be dropped.

19:43 <Ansuel> with this true drain_tx

19:44 <robimarko> Ansuel: Well, that may be the issue

19:45 <Ansuel> the function is ieee80211_flush_queues

19:45 <Ansuel> searching if it's actually used tho

19:45 <robimarko> Cause, the way I see it they are just returning if drop is true

19:45 <robimarko> Shouldnt it be the other way around?

19:45 <robimarko> Cause: If the parameter @drop is set to %true, pending frames may be dropped.

19:46 <robimarko> But currently, ath11k_mac_flush_tx_complete is called if its not true

19:46 <Ansuel> mhhh but wait if it's true then how we reach the tx_complete?

19:46 <robimarko> I think that the condition is reversed

19:46 Borromini has quit [Quit: Lost terminal]

19:47 <robimarko> So ath11k_mac_flush_tx_complete gets called when packets arent supposed to get flushed

19:47 <robimarko> Please sanity check me as I have been sick for a week and its getting hard to work late

19:48 <Ansuel> it's correct flush with the bool false should NOT drop packet

19:48 <Ansuel> so tx purge should NOT be used so it's correct

19:48 <Ansuel> what i can't understand is why it's called with the bool on

19:48 <Ansuel> why it's not called*

19:48 <robimarko> Well, then they are doing the reverse

19:48 <robimarko> They are checking if drop is true

19:49 <robimarko> And if not continuing, otherwise just return early

19:49 <Ansuel> i think they return early as they assume you are removing the interface if you want to drop packet while purging tx and that is the first thing done on the interface stop function

19:50 <robimarko> But then any check on drop is useless

19:50 <robimarko> If you want to rely on fact that stop should be called before and packets flushed anyway
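
The two flush behaviours debated above can be compared in a small standalone model. This is only a sketch under stated assumptions: `struct model_radio` and every `model_*` function are hypothetical stand-ins invented for illustration, not the ath11k code. The "current" variant follows robimarko's description (return early when drop is true, otherwise wait); the "suggested" variant follows Ansuel's one-line idea of draining when drop is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the radio state; not the real struct ath11k. */
struct model_radio {
	int num_tx_pending; /* frames queued to hardware, not yet completed */
	int num_dropped;    /* frames discarded by a drain */
	int num_waits;      /* how often flush waited for completion */
};

/* Models waiting for TX completion; in this model every frame completes. */
static void model_wait_tx_complete(struct model_radio *ar)
{
	assert(ar->num_tx_pending >= 0);
	ar->num_waits++;
	ar->num_tx_pending = 0;
}

/* Models a drain: pending frames are discarded instead of being sent. */
static void model_drain_tx(struct model_radio *ar)
{
	assert(ar->num_tx_pending >= 0);
	ar->num_dropped += ar->num_tx_pending;
	ar->num_tx_pending = 0;
}

/* Flush as described in the discussion: drop == true returns early. */
static void model_flush_current(struct model_radio *ar, bool drop)
{
	if (drop)
		return;
	model_wait_tx_complete(ar);
}

/* Flush with the suggested change: drop == true drains before returning. */
static void model_flush_suggested(struct model_radio *ar, bool drop)
{
	if (drop) {
		model_drain_tx(ar);
		return;
	}
	model_wait_tx_complete(ar);
}
```

In this model, calling `model_flush_current(&ar, true)` leaves `num_tx_pending` untouched, so the frames are only cleaned up if something else (the stop path, in the discussion) drains them.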

19:53 <Ansuel> we can also check what other wifi driver do with the flush

19:54 <robimarko> ath10k for example in flush flushes per station

19:57 <Ansuel> IMHO the problem is that there is no handling here in ath11k_mac_op_remove_interface

19:58 <Ansuel> interface is removed

19:58 <Ansuel> tx queue is not cleared

19:58 <Ansuel> packets get lost

20:00 <Ansuel> one part of the puzzle is clear now i need to understand who call flush

20:00 <robimarko> It is just weird to me that TX flush is just freeing SKB-s

20:00 <robimarko> There are no calls to WMI aka FW like in ath10k

20:01 <Ansuel> ath11k_mac_drain_tx does a call to purge tx

20:01 <Ansuel> robi

20:01 <Ansuel> I WONDER IF

20:01 <Ansuel> this is just a typo

20:02 <Ansuel> anyway if you want we can experiment with a simple fix

20:03 <robimarko> Ansuel: ath11k_mgmt_over_wmi_tx_purge just frees the SKB-s

20:03 <robimarko> It doesnt make a WMI call

20:04 <Ansuel> just notice wth o.O

20:04 <robimarko> So, it looks like they are dequeuing?

20:05 <robimarko> I assume that means prevent it from being sent

20:05 <robimarko> Then removing the worker that handles WMI_MGMT_TX_COMPLETION_EVENTID

20:05 <robimarko> And simply freeing the SKB-s and calling it a day

20:05 <robimarko> But this synchronize_net();

20:05 <robimarko> Is worrying me

20:06 <robimarko> *

20:06 <robimarko> *synchronize_net - Synchronize with packet receive processing

20:06 <robimarko> *Wait for packets currently being received to be done.

20:06 <robimarko> *Does not block later packets from starting.

20:06 <robimarko> */

20:06 <Ansuel> in theory mac80211 has already stopped tx queue

20:06 <robimarko> Cause it waits for existing to complete, but does not prevent new ones from starting in the meantime

20:06 <Ansuel> so no more packet should come

20:07 csrf1 has quit [Ping timeout: 480 seconds]

20:07 <robimarko> I dont know, its just weird

20:09 <Ansuel> still need to understand who is calling .flush

20:09 <Ansuel> can't find that in the codeflow...

20:09 <robimarko> nbd: you still up

20:11 <robimarko> Ansuel: drv_flush from net/mac80211/driver-ops.h calls the flush op

20:11 <robimarko> And that is called by __ieee80211_flush_queues

20:11 <robimarko> Which is called by ieee80211_flush_queues

20:11 <Ansuel> yes and then ieee80211_flush_queues is not called by any function related to interface stop

20:12 <Ansuel> or remove

20:12 <Ansuel> BUT on sysupgrade .flush is called or the error doesn't make sense

20:12 <robimarko> Ansuel: It happens if you stop the interface as well

20:12 <Ansuel> since ath11k_mac_wait_tx_complete is also called only in ath11k/core

20:13 <robimarko> I would bet its .del_tx_ts cfg80211 being invoked

20:13 <robimarko> As that calls ieee80211_del_tx_ts

20:13 <robimarko> Which calls ieee80211_flush_queues
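
The call chain robimarko traces (a flush-queues entry point, an inner helper, then a wrapper that invokes the driver's flush op) is an ordinary ops-table dispatch. A minimal sketch of that pattern follows, assuming a trimmed-down ops table: every `model_*` name is hypothetical, and the real mac80211 functions take more parameters than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_hw;

/* Hypothetical, trimmed-down ops table standing in for the driver ops. */
struct model_ops {
	void (*flush)(struct model_hw *hw, bool drop);
};

struct model_hw {
	const struct model_ops *ops;
	int flush_calls; /* how many times the driver op was reached */
	bool last_drop;  /* drop flag seen by the driver op */
};

/* Innermost wrapper: call the driver's flush op if it provides one. */
static void model_drv_flush(struct model_hw *hw, bool drop)
{
	assert(hw != NULL);
	if (hw->ops && hw->ops->flush)
		hw->ops->flush(hw, drop);
}

/* Inner helper, standing in for the double-underscore variant. */
static void model_flush_queues_inner(struct model_hw *hw, bool drop)
{
	model_drv_flush(hw, drop);
}

/* Public entry point used by the callers higher up in the stack. */
static void model_flush_queues(struct model_hw *hw, bool drop)
{
	model_flush_queues_inner(hw, drop);
}

/* Example driver op that only records that it was reached. */
static void model_driver_flush(struct model_hw *hw, bool drop)
{
	hw->flush_calls++;
	hw->last_drop = drop;
}

static const struct model_ops model_driver_ops = {
	.flush = model_driver_flush,
};
```

To find out who triggers the driver op on a running system, the same question applies to every caller of the outermost function, which is what the search for `.del_tx_ts` above is about.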

20:14 <nbd> robimarko: yes

20:15 <robimarko> nbd: Great, do you happen to know if its safe to assume that stop op is always gonna get called before flush does?

20:15 <robimarko> Cause, we are trying to chase this down: https://bugzilla.kernel.org/show_bug.cgi?id=216513

20:15 <Ansuel> nl80211_del_tx_ts

20:15 <Ansuel> wth is ts

20:16 <robimarko> Its the nl80211 call

20:16 <robimarko> AFAIK, nl80211->cfg80211->mac80211

20:16 <robimarko> LOL, just now figured you are asking about the "ts" part

20:17 <Ansuel> yep

20:17 <Ansuel> just to understand the naming

20:17 <Ansuel> well guess it's time to reset my uptime and do some test to debug the codeflow...

20:18 <Ansuel> are we sure it's something related to time and not related to amount of packet?

20:18 <robimarko> Not really

20:18 <Ansuel> did we test whether a sysupgrade works while an iperf is running?

20:18 <robimarko> Not that I know of

20:18 <nbd> robimarko: are you talking about mac80211 driver ops?

20:18 <Ansuel> seems a nice idea to hammer the ring

20:19 <robimarko> nbd: Yes

20:19 <robimarko> As QCA has pretty much made the assumption that stop always gets called before flush

20:19 <nbd> that's weird

20:20 <nbd> .stop should only be called when all interfaces are removed

20:20 <nbd> as a last step for shutting down the radio

20:21 <robimarko> nbd: Well, the driver currently as far as I can tell expects that stop was called before flush

20:21 <robimarko> As nothing actually gets flushed on flush

20:21 <robimarko> It just calls the function that makes sure it was flushed

20:23 <nbd> what part of the code is your expectation based on?

20:23 <Ansuel> robi the path is .remove_interface

20:23 <Ansuel> and at the end .stop

20:23 <robimarko> nbd: ath11k_mac_op_flush

20:23 <robimarko> It just calls ath11k_mac_flush_tx_complete

20:24 <robimarko> Which despite its name is an error checker basically

20:24 <robimarko> ath11k_mac_drain_tx which actually does the flushing is called in start and stop ops

20:25 <robimarko> To me that is kind of a broken assumption

20:25 <Ansuel> but drain drops packet

20:25 <robimarko> Yeah, but only on start and stop mac80211 calls

20:26 <robimarko> Not if flush is called

20:26 <nbd> flush is per-vif

20:26 <nbd> start/stop is global

20:26 <nbd> i mean from an api perspective

20:26 <robimarko> nbd: check ath11k_mac_op_flush

20:26 <Ansuel> nbd on ath11k it's all global LOL

20:27 <hauke> Mangix: I would like to do an OpenWrt 21.02.4 and OpenWrt 22.03.1 release in the next days

20:27 <robimarko> It doesnt actually flush anything

20:27 Ansuel has quit [Read error: Connection reset by peer]

20:27 <hauke> Mangix: Do I have to do anything special in the package feed?

20:27 <nbd> it doesn't implement drop, but it's intended to wait until frames have been sent

20:27 Ansuel has joined #openwrt-devel

20:27 <Ansuel> hauke if we had some plan for 22.03.1 why we did all the hack with the .1? we could just release a new point release

20:27 <Mangix> hauke: I don't think so.

20:28 <robimarko> nbd: well, we are hitting the timeout while it waits to send those

20:28 <robimarko> But, it seems to be happening on mac80211 stop after the AP has been up for a while

20:29 <robimarko> And stop is supposed to actually flush them

20:30 <nbd> so maybe some frames are stuck inside a queue somewhere, which doesn't clear

20:31 <Ansuel> robi what i can't understand is that they have a timeout for the flush but nothing that triggers it

20:32 <Ansuel> drain tx just free skb

20:32 <robimarko> Well yeah

20:32 <Ansuel> so it's really just wait for the ring to complete

20:32 <Ansuel> and send empty ring interrupt

20:32 <Ansuel> ...

20:32 <robimarko> My current logic is that something is not stopped

20:32 <robimarko> And thus new packets arrive after existing ones have been freed

20:33 <Ansuel> well the ring is global so new packets arrive from other interface

20:33 <Ansuel> lol...

20:33 <Ansuel> so it can be it even timeouts on the first interface_remove

20:33 <Ansuel> if a device switch to the other band

20:33 <Ansuel> ... MH

20:33 <hauke> Ansuel: The wolfssl updates are already shipped now

20:34 <robimarko> Hm, I am gonna disable 2G band

20:34 <hauke> and not everyone upgrades everything immediately

20:34 <robimarko> Just leave the 5G one for hours

20:34 <hauke> Mangix: thanks for the information

20:34 <robimarko> And see if it still triggers

20:37 <Ansuel> well if that's the case..... the driver requires a complete rework...

20:37 schwicht has joined #openwrt-devel

20:38 <Ansuel> check ath11k_mac_flush_tx_complete

20:38 <Ansuel> the num_tx_pending is global

20:39 <Ansuel> or the flush function should just be fixed to flush skb relevant to the current vif
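
Ansuel's point about a global pending counter versus a per-vif flush can be illustrated with a toy model. This is a sketch under assumptions: `struct model_dp`, `MODEL_NUM_VIFS` and the per-vif array are hypothetical, and the driver discussed above is only said to keep one global `num_tx_pending`. The model shows why a flush for one vif can appear unfinished while only another vif has frames in flight. Whether that actually causes the observed timeout is disputed in the discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_VIFS 2

/* Hypothetical data-path state with per-vif accounting. */
struct model_dp {
	int pending_per_vif[MODEL_NUM_VIFS];
};

/* What a single global counter would report: the sum over all vifs. */
static int model_total_pending(const struct model_dp *dp)
{
	int total = 0;

	for (int i = 0; i < MODEL_NUM_VIFS; i++)
		total += dp->pending_per_vif[i];
	return total;
}

/* Flush completion check based on the global counter. */
static bool model_flush_done_global(const struct model_dp *dp)
{
	return model_total_pending(dp) == 0;
}

/* Flush completion check restricted to the vif being flushed. */
static bool model_flush_done_vif(const struct model_dp *dp, int vif)
{
	assert(vif >= 0 && vif < MODEL_NUM_VIFS);
	return dp->pending_per_vif[vif] == 0;
}
```

With vif 0 idle and vif 1 holding five frames, the global check reports "not done" for a flush of vif 0 while the per-vif check reports "done".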

20:40 <robimarko> So my worries about synchronize_net

20:40 <robimarko> Are kind of possible

20:40 <robimarko> As "radios" are not synchronously stopped

20:41 <nbd> i don't think this issue is related to stopping the radio

20:41 <nbd> it's related to stopping vis

20:41 <nbd> vifs

20:41 <Ansuel> yep

20:41 <nbd> and i don't buy the theory that this is caused by too many packets coming in on another vif

20:42 <nbd> the load would have to be really strong to make it run into the timeout

20:42 <Ansuel> nbd to me it seems that the timeout occur because a device just connect to another interface

20:42 <Ansuel> and fill the tx queue

20:43 <Ansuel> robimarko but this theory doesn't work as wifi down before kill should have worked...

20:44 <nbd> i think it's far more likely that this is not about a tx queue for any vif being constantly filled

20:44 <nbd> it's either a counter imbalance

20:44 <nbd> or packets getting stuck somewhere

20:44 bluew has joined #openwrt-devel

20:45 <nbd> hauke: btw. are there any plans to update backports to 6.0?

20:45 <Ansuel> well need to examine what increases the num_tx_pending

20:45 schwicht has quit [Ping timeout: 480 seconds]

20:45 <Ansuel> can also be that some packets disappear out of existence and are never decreased from the counter

20:45 <Ansuel> if that's the case this is a nightmare...

20:46 <nbd> it's ath11k... of course there are going to be nightmares :)

20:46 <Ansuel> ansuel@Ansuel-xps  ~/ath/drivers/net/wireless/ath/ath11k   master  grep -rnw . -e num_tx_pending

20:46 <Ansuel> ./dp.h:174: atomic_t num_tx_pending;

20:46 <Ansuel> ./mac.c:7290: (atomic_read(&ar->dp.num_tx_pending) == 0),

20:46 <Ansuel> ./dp_tx.c:267: atomic_inc(&ar->dp.num_tx_pending);

20:46 <Ansuel> ./mac.c:7294: atomic_read(&ar->dp.num_tx_pending));

20:46 <Ansuel> ./dp_tx.c:310: if (atomic_dec_and_test(&ar->dp.num_tx_pending))

20:47 <Ansuel> ./dp_tx.c:339: if (atomic_dec_and_test(&ar->dp.num_tx_pending))

20:47 <Ansuel> ./dp_tx.c:719: if (atomic_dec_and_test(&ar->dp.num_tx_pending))

20:47 <Ansuel> ./dp.c:898: atomic_set(&dp->num_tx_pending, 0);

20:47 <Ansuel> MHE NOT BAD

20:47 <Ansuel> (sorry for the spam)

20:49 <Ansuel> actually there are some case where the counter is not decreased
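
The "counter imbalance" theory can be sketched with C11 atomics. This is a model under assumptions, not the driver code: `struct model_txq` and all `model_*` functions are hypothetical, and `atomic_fetch_sub(...) == 1` is used as a userspace stand-in for the kernel's `atomic_dec_and_test()`. The buggy path is invented to show the effect of a frame being freed without the matching decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical TX accounting state. */
struct model_txq {
	atomic_int num_tx_pending; /* frames submitted but not yet completed */
	int wakeups;               /* times the "queue empty" waiter is woken */
};

static void model_txq_init(struct model_txq *q)
{
	atomic_init(&q->num_tx_pending, 0);
	q->wakeups = 0;
}

/* Submit path: one more frame in flight. */
static void model_tx_enqueue(struct model_txq *q)
{
	atomic_fetch_add(&q->num_tx_pending, 1);
}

/* Completion path: decrement, and wake the waiter when reaching zero. */
static void model_tx_complete(struct model_txq *q)
{
	int old = atomic_fetch_sub(&q->num_tx_pending, 1);

	assert(old > 0);
	if (old == 1)
		q->wakeups++;
}

/* Hypothetical buggy path: the frame is freed but never accounted for. */
static void model_tx_free_without_accounting(struct model_txq *q)
{
	(void)q; /* the missing decrement is the bug being modelled */
}

/* What a flush waiting for "pending == 0" would observe. */
static bool model_flush_would_succeed(struct model_txq *q)
{
	return atomic_load(&q->num_tx_pending) == 0;
}
```

After three enqueues, two completions and one unaccounted free, the counter stays at one and the waiter is never woken, so a flush that waits for zero can only end by timing out.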

20:50 <Ansuel> robimarko do you have other error or logread is clean for ath11k ?

21:00 <hauke> nbd: I haven't found the time to update backports yet

21:05 schwicht has joined #openwrt-devel

21:18 <robimarko> Ansuel: Not that I know of

21:18 <robimarko> I am off to bed, can barely look

21:19 robimarko has quit [Quit: Leaving]

21:20 csrf1 has joined #openwrt-devel

21:45 Ansuel has quit [Quit: Probably my PC decided to sleep or I decided to sleep.]

21:47 srslypascal has quit [Remote host closed the connection]

21:47 srslypascal has joined #openwrt-devel

21:48 srslypascal has quit [Remote host closed the connection]

21:48 schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

22:15 srslypascal has joined #openwrt-devel

22:16 linusw_____ has quit []

22:31 schwicht has joined #openwrt-devel

22:43 schwicht has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

23:01 danitool has quit [Quit: Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos]

23:04 Tapper has quit [Ping timeout: 481 seconds]

23:28 goliath has quit [Quit: SIGSEGV]

23:39 xback has quit [Remote host closed the connection]

23:41 xback has joined #openwrt-devel

23:44 <karlp> feck, was tearing my hair out trying to do some extra work at home, running into weird things, patches not being applied.

23:44 <karlp> seems i had a git-src symlink in place from some earlier work :|