<dikiy>
as smaeul suggested, changing 8 to 7 helped and the test passed (no failures through several hours of testing)
<dikiy>
I would test the performance impact, but I don't know how
<dikiy>
And as for the kernel parameter: it could default to, for example, 8. But if somebody finds out (in rare cases like mine, I suppose) that 8 is not enough, they could set the kernel parameter to 7 (or 6, whatever)
<apritzel>
dikiy: I understand the idea, but a command line parameter would just be a hack
<dikiy>
The point is that nobody with good boards (the majority) should suffer because of the 5% with bad boards. And unfortunately Pine64 doesn't consider this buggy HW, and doesn't send a replacement
<apritzel>
keep in mind that the arch timer is inside the SoC, so it's not some board production issue
<gamiee>
dikiy: is it happening on a PINE A64 LTS? Or another PINE64 device?
<dikiy>
pinephone OG
<apritzel>
it's an Allwinner A64 problem
<apritzel>
there might be good and bad batches of SoCs, nobody knows
<gamiee>
Yes, I know it's an SoC problem, I'm just wondering whether it's an issue with one A64 batch or with all revisions of the A64
<gamiee>
(I wonder if it's fixed in A64-H)
<apritzel>
and this "nobody with good boards (tha majority) should suffer" is the reason I asked for a performance assessment
<dikiy>
I'm just tired of having to cross-build the kernel every time I want to get an update...
<apritzel>
because if it turns out to be totally negligible, we don't need to boil the ocean and can just decrease the mask (again)
<dikiy>
apritzel: how could I measure the performance penalty?
<apritzel>
good question, for a start it should be relatively easy to check how often we actually call this function
<apritzel>
which I guess depends on the workload
<dikiy>
apritzel: is this a function?
<apritzel>
IIRC I once measured that the actual sysreg access takes 3 cycles, so we may overthink the performance impact here
<dikiy>
as far as I can see it's some inline definition
<apritzel>
sun50i_a64_read_cnt[pv]ct_el0 are functions
<dikiy>
ah, I see
<dikiy>
can I set some counter on the syscall without recompiling the kernel?
<apritzel>
I think we already suffer because we need a workaround at all, so just slightly increasing the chance of a false positive might not be noticeable at all
<dikiy>
9->8 and now 8->7 gives like 4 times caveat, isnt it?
<dikiy>
*would give
<apritzel>
dikiy: I'd say we go from 1/1024 to 1/256, which is technically 4 times, but still with a very low probability
<dikiy>
it's like we need to wait in a loop until enough time has passed
<dikiy>
ah, so I misunderstood what the penalty is
<dikiy>
it's like we had a 1/1024 chance of getting stuck in a loop for a while, and now it would be 1/256
<apritzel>
dikiy: check the code, it just immediately reads again, and we just impose an upper limit
<apritzel>
it's not stuck, normally you would just do another read, and that's it
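For reference, the workaround apritzel is describing lives in drivers/clocksource/arm_arch_timer.c; the sketch below is a paraphrase from memory (the exact mask width, retry bound, and macro structure may differ between kernel versions), showing the retry-on-fingerprint idea:

    /*
     * Paraphrased sketch of the A64 counter-read workaround, not the
     * verbatim kernel code. Both sun50i_a64_read_cntpct_el0() and
     * sun50i_a64_read_cntvct_el0() share this logic.
     */
    static u64 notrace sun50i_a64_read_cntvct_el0(void)
    {
            u64 val;
            int retries = 150;      /* the "upper limit": never spin forever */

            do {
                    val = read_sysreg(cntvct_el0);
                    retries--;
                    /*
                     * Retry while the low bits look like the erratum
                     * fingerprint (all ones or all zeros). smaeul's
                     * suggestion amounts to changing the 8 below to a 7,
                     * i.e. treating more values as suspicious.
                     */
            } while (((val + 1) & GENMASK(8, 0)) <= 1 && retries);

            return val;
    }

In the common case the condition is false on the first pass, so the only cost on top of a plain counter read is the comparison; a false positive just means one extra read.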
<dikiy>
I lack some understanding of multiprocessing. For example, can the kernel be interrupted while it's in this loop?
<dikiy>
ah... the bug has a fingerprint, yes?
<dikiy>
like 7fffffff at the end
<apritzel>
yes, that's the idea: the erratum produces that pattern
<dikiy>
in this case it shouldn't be a big deal I think
<apritzel>
re: preemption: why would that matter? if the scheduler interrupts, we don't care about 10 cycles or so anymore
<dikiy>
should I report it somewhere?
<apritzel>
dikiy: in the normal case we just read once, and the value is fine, and there is just the general overhead of an arch timer workaround, plus the comparison (which costs nothing in the grand scheme of things, really)
<apritzel>
dikiy: if we hit the erratum case, then it's great, because we detect this, and just pay 10 or so more cycles to prevent a big problem
<apritzel>
dikiy: the only downside is that this bit pattern can of course appear just normally, without anything being wrong, that's this false positive case I mentioned above
<dikiy>
now I see that changing the mask doesn't matter much, compared with the whole workaround scheme
<apritzel>
exactly
<apritzel>
dikiy: yes, please report this to the mailing list, including the maintainers (both sunxi and arch timer)
<apritzel>
scripts/get_maintainer.pl will tell you who they are
<dikiy>
tbh I have no idea how the lists work.
<dikiy>
just google for the linux kernel mailing list?
<apritzel>
just send an email to the addresses that this Perl script outputs
<dikiy>
could somebody run the script for me? I don't have a kernel tree right now
<apritzel>
eventually you should make a patch, and then having a kernel tree is the least of your problems ;-)
<dikiy>
I could simply modify the already existent patch :)
<apritzel>
but why? You just change the code, commit that, and let "git format-patch" do the dirty work for you
<dikiy>
is it OK to send one email to all three of them?
<dikiy>
because the only command I know is git clone xD
<dikiy>
I'm not a programmer
<apritzel>
dikiy: just send an email to Chen-Yu, Jernej and Samuel, and CC: the Linux-ARM and sunxi list mentioned in that paragraph
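For anyone following along, the submission workflow apritzel sketches boils down to something like this (the URL is the mainline tree; the addresses and patch file name are placeholders, take the real ones from the script's output):

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    cd linux
    # edit drivers/clocksource/arm_arch_timer.c, then:
    git commit -as                              # -s adds your Signed-off-by
    git format-patch -1                         # writes 0001-<subject>.patch
    ./scripts/get_maintainer.pl 0001-*.patch    # prints who to send it to
    git send-email --to=<maintainer> --cc=<list> 0001-*.patch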
<dikiy>
thank you!
<apritzel>
regarding performance measurements: normally you should be able to just use ftrace, but those two functions are of course marked as "notrace" ;-)
<dikiy>
haha :D
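For completeness, the usual ftrace recipe for watching how often a kernel function gets called looks roughly like this (the function name below is a stand-in; as apritzel says, the two A64 read functions are marked notrace, so this particular trick won't catch them):

    cd /sys/kernel/tracing                    # or /sys/kernel/debug/tracing on older setups
    echo some_function > set_ftrace_filter    # hypothetical traceable function
    echo function > current_tracer
    cat trace_pipe                            # stream the hits as they happen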
dikiy has quit [Ping timeout: 480 seconds]
grming has joined #linux-sunxi
JohnDoe_71Rus has quit []
dikiy_ has joined #linux-sunxi
cnxsoft1 has quit []
grming has quit [Ping timeout: 480 seconds]
grming has joined #linux-sunxi
cnxsoft has joined #linux-sunxi
warpme____ has joined #linux-sunxi
<maz>
apritzel: is this still about the A64 arch timer jumping around?
<apritzel>
maz: yeah, I am afraid so :-(
<maz>
apritzel: I thought this one was done and dusted? Is this appearing on new HW for which the workaround is not sufficient, or was the workaround broken in the first place?
<apritzel>
the latter, apparently, it's still the same A64. For some odd reason there are chips out there that seem to be worse than we thought
<maz>
great :-(
<apritzel>
seems like not everyone reported issues, and if they did, then not to the right channels
cnxsoft has quit []
<maz>
meh. seems like 6.1 is going to be interesting on the "timer workaround" front, since I broke XGene...
JohnDoe_71Rus has joined #linux-sunxi
dikiy___ has joined #linux-sunxi
dikiy_ has quit [Ping timeout: 480 seconds]
grming has quit [Quit: Konversation terminated!]
dikiy___ has quit [Read error: Connection reset by peer]
dikiy_ has joined #linux-sunxi
<palmer>
maz: looks like the RISC-V timers are broken too!
dikiy_ has quit [Ping timeout: 480 seconds]
<gamiee>
wait, what, really?
apritzel_ has joined #linux-sunxi
vagrantc has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
dikiy has joined #linux-sunxi
apritzel_ has quit [Ping timeout: 480 seconds]
dikiy has quit [Quit: leaving]
dikiy has joined #linux-sunxi
<dikiy>
aperezdc: sorry, I've lost the connection and don't know how to recover the PM chat
<aperezdc>
dikiy: sorry, but I think we never talked in private... maybe you meant to ping someone else?
apritzel_ has joined #linux-sunxi
JohnDoe_71Rus has quit []
apritzel_ has quit [Ping timeout: 480 seconds]
<karlp>
pretty sure they meant ap<tab> to get to apritzel :)
<dikiy>
aahaha, yeah
<dikiy>
I only remember the first letters ))
<aperezdc>
Happens =)
ftg has joined #linux-sunxi
dok has quit [Ping timeout: 480 seconds]
bauen1_ has joined #linux-sunxi
bauen1 has quit [Ping timeout: 480 seconds]
apritzel_ has joined #linux-sunxi
grming has joined #linux-sunxi
<dikiy>
apritzel_:
indy has quit [Ping timeout: 480 seconds]
apritzel_ has left #linux-sunxi [#linux-sunxi]
apritzel has joined #linux-sunxi
macromorgan has quit [Remote host closed the connection]