<smaeul>
mirko: hmm, I don't know of any reason why H6 boards would randomly crash. I'm assuming you have a UART hooked up before the crash, so you would see if Linux was panicking?
<smaeul>
any non-environmental hardware issues (e.g. marginal cpufreq voltages) wouldn't show up on multiple different boards
<smaeul>
and any software problem _should_ log something to UART
apritzel has quit [Ping timeout: 480 seconds]
cnxsoft has joined #linux-sunxi
ftg has quit [Ping timeout: 480 seconds]
vagrantc has quit [Quit: leaving]
cnxsoft1 has joined #linux-sunxi
cnxsoft has quit [Read error: Connection reset by peer]
cnxsoft has joined #linux-sunxi
cnxsoft1 has quit [Read error: Connection reset by peer]
cnxsoft has quit []
rajkosto has joined #linux-sunxi
apritzel has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
Daanct12 has joined #linux-sunxi
JohnDoe_71Rus has joined #linux-sunxi
adjtm has quit [Quit: Leaving]
rajkosto has quit [Read error: Connection reset by peer]
adjtm has joined #linux-sunxi
adjtm has quit []
paulk-bis has joined #linux-sunxi
bauen1 has joined #linux-sunxi
evgeny_boger has quit [Ping timeout: 480 seconds]
apritzel has joined #linux-sunxi
rajkosto has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
evgeny_boger has joined #linux-sunxi
apritzel has joined #linux-sunxi
Daanct12 has quit [Remote host closed the connection]
cnxsoft has joined #linux-sunxi
hlauer has joined #linux-sunxi
Daanct12 has joined #linux-sunxi
evgeny_boger1 has joined #linux-sunxi
evgeny_boger has quit [Ping timeout: 480 seconds]
evgeny_boger1 has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Quit: Leaving]
bauen1_ has joined #linux-sunxi
bauen1 has quit [Ping timeout: 480 seconds]
Marcel has joined #linux-sunxi
Marcel is now known as Guest3074
ftg has joined #linux-sunxi
Guest3074 has quit []
ftg has quit [Ping timeout: 480 seconds]
JohnDoe_71Rus has quit []
cnxsoft has quit [Ping timeout: 480 seconds]
vpeter has quit [Remote host closed the connection]
vpeter has joined #linux-sunxi
JohnDoe_71Rus has joined #linux-sunxi
vagrantc has joined #linux-sunxi
hlauer has quit [Ping timeout: 480 seconds]
tllim has joined #linux-sunxi
tllim has quit []
pentabarf has joined #linux-sunxi
pentabarf is now known as pentabarf1
<pentabarf1>
smaeul: i would be very grateful if you could have a look at my code for checking the rotpk from "normal world". Is this the right approach or am i totally wrong? https://gist.github.com/pentabarf/8695459ce25b289ae068fd6091502c9c maybe it is just not working on H3
<mirko>
smaeul: uart connected, literally nothing
<mirko>
put a heat sink on it in case it's an overheating issue after all, same
<kilobyte_ch>
We are unfortunately still using the 3.4 Sunxi Kernel with NAND as Storage Medium. Any idea if there is a way to "stress test"/"benchmark"/verify the NAND Chip in such a Device?
apritzel has quit [Ping timeout: 480 seconds]
<kilobyte_ch>
Testing in the sense of whether it flips bits or does other funky stuff which the Allwinner NAND implementation can't deal with properly.
<mirko>
kilobyte_ch: maybe not quite helpful, but different nand chips handle ECC differently. e.g.: does it have an OOB area? is metadata about bad blocks / write counts stored in there? if that's the case, the driver needs to be (made) aware of that. what i'm trying to say i guess is, that if you're suspecting it not being able to deal with it, you should check 3 things: nand flash's datasheet about expected
<mirko>
write cycles per block and/or page, whether the driver makes use of the hardware's OOB space and what filesystem you use on top of the MTD layer (which i suppose you use rather than the block layer)
rajkosto has quit [Read error: Connection reset by peer]
<kilobyte_ch>
mirko: Unfortunately there is no MTD Layer with the 3.4 Sunxi NAND. I think there's also no Metadata available in the 3.4 Sunxi NAND implementation. Thus my idea to stress test chips.
<mirko>
kilobyte_ch: what are you trying to achieve? wearing out blocks? that will happen either way. or do you want to verify whether it meets the minimum erase cycles promised by the datasheet?
<mirko>
you *will* see blocks becoming bad - that's for sure
<kilobyte_ch>
mirko: sure, no question that this implementation -will- fail at some point as it lacks proper ECC. The only thing i want to achieve is to find "bad" chips with flaky bits or other defects.
<kilobyte_ch>
One idea is to write patterns like 0xFF, 0x00, 0xAA to the Chip and verify that it holds it on all cells. But I'm not sure how to achieve that in an easy manner.
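The pattern idea above could be sketched roughly as follows. This is a minimal illustration, not something from the discussion: the function name `pattern_test`, the chunk size, and the extra 0x55 pattern are assumptions. It overwrites the target, so it is destructive, and on a real device the read-back may be served from the page cache unless caches are dropped or direct I/O is used. It also only sees what the NAND driver exposes, so any remapping done below it stays hidden.

```python
import os

# Patterns to write; 0x55 is an assumed addition (the inverse of 0xAA).
PATTERNS = (0xFF, 0x00, 0xAA, 0x55)


def pattern_test(path, total_bytes, chunk_size=4096, patterns=PATTERNS):
    """Fill the first total_bytes of path with each pattern and read it back.

    Returns a list of (pattern, offset) tuples, one per chunk that did not
    read back exactly as written. DESTRUCTIVE: existing data is overwritten.
    """
    failures = []
    for pattern in patterns:
        # Write pass: fill the region with the current pattern.
        with open(path, "r+b") as f:
            written = 0
            while written < total_bytes:
                n = min(chunk_size, total_bytes - written)
                f.write(bytes([pattern]) * n)
                written += n
            f.flush()
            os.fsync(f.fileno())  # push the data down to the device

        # Read pass: compare every chunk against the expected pattern.
        with open(path, "rb") as f:
            offset = 0
            while offset < total_bytes:
                n = min(chunk_size, total_bytes - offset)
                if f.read(n) != bytes([pattern]) * n:
                    failures.append((pattern, offset))
                offset += n
    return failures


# Hypothetical usage against a scratch partition (placeholder path):
# bad = pattern_test("/path/to/scratch-device", 64 * 1024 * 1024)
```

An empty result only says that these patterns survived one write/read cycle through the driver; it says nothing about how the chip behaves after many erase cycles.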
<mirko>
kilobyte_ch: current NAND flash chips have bad blocks right from the beginning, coming out of the factory
<mirko>
you will always encounter NAND chips with bad blocks - which is considered totally normal
<mirko>
there's usually an area at the very beginning of the chip which is guaranteed not to be bad - which bootloader/kernel should reside in, so booting from it is guaranteed
<mirko>
from there on - nothing is.
<mirko>
software is supposed to deal with it - be it the file system or whatever is in between (e.g. NAND flash driver, ..). This includes wear levelling. Your software stack needs to match the underlying hardware constraints.
<kilobyte_ch>
The part with the area in the beginning is interesting, didn't know that
<kilobyte_ch>
Also matches our experience with this (badly designed) device that the bootloader is always working, but it fails at some point during starting linux stuff.
<mirko>
kilobyte_ch: datasheet of flash chip should mention the number of blocks guaranteed not to be bad - if you provide manufacturer and model i can take a look
<mirko>
kilobyte_ch: then the area might be too small to already have code loaded dealing with bad blocks - or you're doing it too late
<mirko>
bootloader + kernel + initramfs usually exceeds what's guaranteed
<mirko>
so you better have ECC functionality working as soon as possible
<kilobyte_ch>
The issue is that I can't really change anything on the device. Thus also my idea with the testing to do a kind of "burn in" test to sort out "bad" NAND chips with a higher chance of flipping bits.
<mirko>
(especially if the bootloader loads a huge initramfs)
<mirko>
kilobyte_ch: that sounds like a very bad approach. better get your ECC working as early as possible
<mirko>
or reduce all the "loaded by bootloader" stuff to fit into mentioned safe space, so bootloader doesn't need it but kernel is certain to be read and started correctly
<mirko>
(clumsy grammar/wording, hope you still get the point)
<kilobyte_ch>
At least we have two root partitions which the bootloader can switch between if it doesn't boot successfully a couple of times.
<kilobyte_ch>
Don't get me wrong, I fully agree with all your points and would also implement them if I were able/allowed to. But unfortunately I'm not.
<mirko>
all in all, it has nothing to do with linux-sunxi / allwinner - but i'm not gonna play moderator here :)
<mirko>
kilobyte_ch: two root partitions also sounds like a bad approach, given that it's not unlikely there's bad blocks in both of them
<mirko>
i use ping-pong rather for ensuring updates don't brick devices (fall back to working volume/partition, if boot wasn't successful after upgrade and/or upgrade failed)
<mirko>
it's not supposed to help against bad blocks for reasons stated above
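The ping-pong scheme described above, used as an update safety net, might look roughly like this. It is a sketch under assumptions: the state layout, the names `choose_slot` and `confirm_boot`, and the retry limit of 3 are all hypothetical, and a real bootloader would keep this state in its own persistent storage.

```python
MAX_ATTEMPTS = 3  # assumed retry limit before falling back


def choose_slot(state):
    """Pick the slot to boot and return (slot, updated_state).

    state is a dict with:
      "active":    "a" or "b", the slot that should be booted
      "attempts":  boots tried since the last update
      "confirmed": True once the running system reported a good boot
    """
    other = "b" if state["active"] == "a" else "a"

    if state["confirmed"]:
        # Known-good slot: boot it, nothing to track.
        return state["active"], dict(state)

    if state["attempts"] >= MAX_ATTEMPTS:
        # The freshly updated slot never confirmed: fall back to the
        # previous slot, which is assumed to hold the last working version.
        return other, {"active": other, "attempts": 0, "confirmed": True}

    # Unconfirmed slot with attempts left: count this try and boot it.
    return state["active"], {**state, "attempts": state["attempts"] + 1}


def confirm_boot(state):
    """Called by the booted system once it considers itself healthy."""
    return {**state, "attempts": 0, "confirmed": True}
```

The fallback only protects against a bad update. If both slots sit on blocks that go bad, neither copy is trustworthy, which is the point made above.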
<mirko>
but yeah, as initially stated: my answer(s) might not be helpful for what you actually wanna do, which I'm pretty convinced is not the right approach anyway
<kilobyte_ch>
Thanks for the explanation
<mirko>
kilobyte_ch: first i'd check how much space / how many blocks are guaranteed to be working for at least how many writes and then start finding solutions and/or workarounds from there
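That first check is plain arithmetic once the datasheet numbers are known. The sketch below uses made-up values throughout (128 KiB erase blocks, 8 guaranteed-good blocks, and the image sizes), and the helper name `fits_in_guaranteed_area` is hypothetical. It assumes each image starts on its own block boundary.

```python
def fits_in_guaranteed_area(image_sizes, block_size, guaranteed_blocks):
    """Return (blocks_needed, fits) for images stored block-aligned.

    image_sizes:       dict of name -> size in bytes
    block_size:        erase block size in bytes (from the datasheet)
    guaranteed_blocks: number of blocks the datasheet guarantees good
    """
    # Round every image up to whole blocks (ceiling division).
    needed = sum(-(-size // block_size) for size in image_sizes.values())
    return needed, needed <= guaranteed_blocks


# Illustrative numbers only, not taken from any real datasheet.
images = {
    "bootloader": 512 * 1024,
    "kernel": 4 * 1024 * 1024,
    "initramfs": 8 * 1024 * 1024,
}
needed, fits = fits_in_guaranteed_area(images, 128 * 1024, 8)
print(needed, fits)  # 100 False
```

With numbers in this range the bootloader alone may fit, but the kernel and initramfs clearly do not, so whatever reads them has to cope with bad blocks already.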
pentabarf1 has quit [Quit: Leaving.]
apritzel has joined #linux-sunxi
JohnDoe_71Rus has quit []
hlauer has joined #linux-sunxi
juri__ has quit [Ping timeout: 480 seconds]
juri_ has joined #linux-sunxi
mps has quit [Remote host closed the connection]
juri_ has quit [Read error: No route to host]
juri_ has joined #linux-sunxi
hlauer has quit [Ping timeout: 480 seconds]
<smaeul>
pentabarf: your code looks fine. most likely it means H3 does not support that function