<apritzel>
smaeul: yeah, I remember this, but thought it was about fixing the manual offset adjustment only
<apritzel>
smaeul: I wasn't aware that the generic code checks for an enabled boot partition, and uses that already
<apritzel>
which makes automatic eMMC booting actually already work - but under good conditions only
<apritzel>
wens: ouch, you are right, thanks for the heads up!
Mangy_Dog has joined #linux-sunxi
tnovotny has joined #linux-sunxi
<warpme_>
apritzel: I'm curious about your opinion, based on your ATF expertise: i'm trying to get 5.13 mainline booting on an sbc with FIP-packed ATF. SD card bootloader is current mainline uboot + vendor FIP files (platform/dram init + bl2/30/301/31). 5.13 kernel boots - but always oopses at the kernel's exec of /sbin/init ( https://pastebin.com/HLgeCP9t ). Init is a shell script. Oops happens even at a simple echo command in sbin/init. To me it looks
<warpme_>
like kernel exec of anything in user space causes an oops. This exact SD card inserted into another box with a very similar cpu (but with vendor bootloader) boots perfectly fine. I think the issue is in ATF code. Before quitting this project - i just want to be sure i understand where the root cause is. So: is it possible that ATF has a bug where bootloader & kernel run fine until the kernel's first user-space execution (kernel's
<warpme_>
sbin/init)?
<apritzel>
warpme_: blame other people's code, eh? ;-)
<apritzel>
on first thought I have a hard time imagining how EL1->EL0 transition issues would be caused by EL3 code:
<apritzel>
if the kernel already happily ran at EL2 or EL1, and the SMP bringup worked, there is little that TF-A is to blame for
<apritzel>
warpme_: well, this one sticks out: [ 0.000000] OF: fdt: Reserved memory: failed to reserve memory for node 'secmon@5000000': base 0x0000000005000000, size 3 MiB
<apritzel>
warpme_: is there both a /memreserve/ entry *and* a /reserved-memory node for the same region in the DT?
<warpme_>
dt is extracted from vendor uboot (and vendor says it works ok with armbian)
<apritzel>
warpme_: seems like there is an #address-cells and #size-cells missing in that /reserved-memory node?
<apritzel>
compare what the other kids do: git grep -hA5 reserved-memory arch/arm64/boot/dts
<warpme_>
so error in DT? (vendor claims it has mainline working ok so my assumption dt is ok seems too optimistic?)
<apritzel>
"vendor claims it has mainline working" sounds fishy ;-)
<apritzel>
what "mainline"? with what setup?
<apritzel>
if there are no #a-c and #s-c properties in the node, Linux will assume the default 2/1 (at best), so it will reserve 0 bytes (if it wouldn't bail out before)
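A minimal sketch of the cell arithmetic described above, assuming a hypothetical reg property written for 2 address cells and 2 size cells. The helper name parse_reg and the cell values are illustrative only, and this is not how the kernel's own parser is structured. It only shows why reading the same cells with the 2/1 default yields a size of 0.

```python
def parse_reg(cells, address_cells, size_cells):
    """Split a flat list of 32-bit reg cells into (address, size) pairs."""
    def join(chunk):
        # Concatenate big-endian 32-bit cells into one integer.
        value = 0
        for cell in chunk:
            value = (value << 32) | cell
        return value

    stride = address_cells + size_cells
    pairs = []
    # Trailing cells that do not fill a whole entry are ignored.
    for i in range(0, len(cells) - stride + 1, stride):
        address = join(cells[i:i + address_cells])
        size = join(cells[i + address_cells:i + stride])
        pairs.append((address, size))
    return pairs


# Hypothetical cells for a 3 MiB region at 0x05000000, written for 2/2.
reg = [0x0, 0x05000000, 0x0, 0x00300000]

for a_c, s_c in ((2, 2), (2, 1)):
    decoded = [(hex(a), hex(s)) for a, s in parse_reg(reg, a_c, s_c)]
    print("#address-cells=%d #size-cells=%d ->" % (a_c, s_c), decoded)
```

With 2/1 the third cell (the upper half of the intended size) is taken as the whole size, so the region comes out as zero bytes long.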
<warpme_>
re: " is there both a /memreserve/ entry *and* a /reserved-memory node for the same region in the DT" - seems not. can't find string "memreserve" in any dts
<apritzel>
warpme_: those things could be dynamically added through TF-A or U-Boot
<apritzel>
warpme_: but I think I see the root cause of this warning
<apritzel>
there is already a reservation for this area, in meson-g12-common.dtsi
<apritzel>
your DT adds another node, reserving the same region, but under a different name
<apritzel>
so you can remove this whole reserved-memory node in your DT, and at least this message should vanish
<warpme_>
indeed! let me try!
<apritzel>
but I guess this will not affect your oops
<apritzel>
but it's a good argument for peer review and poor quality of vendor provided code (ignoring warning messages in dmesg) ;-)
<warpme_>
hmm: i commented it out in DT and.... [ 0.000000] OF: fdt: Reserved memory: failed to reserve memory for node 'secmon@5000000': base 0x0000000005000000, size 3 MiB is still there
<warpme_>
can it be that kernel gets fdt from uboot with priority over dt from boot.scr?
<apritzel>
yes, the DT the kernel eventually sees can be quite different
<apritzel>
for instance TF-A and U-boot can add or remove entries from memory reservations
<apritzel>
actually now looking closer I see that the reservation in your DT is for some separate region (32MB after the primary reservation)
<apritzel>
so that's fine
<warpme_>
ok - should i try to add missing #address-cells and #size-cells?
<apritzel>
no, that comes through some included common dts, actually
<apritzel>
but anyway the warning is still there, which makes me think the region might also be reserved via the DTB memory reservation block
<apritzel>
is there any way you get to the prompt with that firmware setup? Using another kernel, maybe?
<apritzel>
warpme_: then you could execute: "hexdump -C /sys/firmware/fdt | head -2" to check if there is something in that block
<apritzel>
and then also check what the final /reserved-memory node looks like
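For reference, a rough sketch of reading the memory reservation block out of a flattened device tree, as an alternative to eyeballing the hexdump. It assumes the standard FDT header layout (big-endian 32-bit fields, off_mem_rsvmap as the fifth field, entries as pairs of 64-bit address and size terminated by an all-zero pair). The function names are made up for this example, and the demo blob is synthetic (header plus reservation block only), not a dump from the board discussed here.

```python
import struct

FDT_MAGIC = 0xD00DFEED


def read_memreserve(blob):
    """Return the (address, size) entries of the memory reservation block."""
    magic, _total, _off_struct, _off_strings, off_mem_rsvmap = struct.unpack_from(">5I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened device tree blob")
    entries = []
    offset = off_mem_rsvmap
    while True:
        address, size = struct.unpack_from(">2Q", blob, offset)
        if address == 0 and size == 0:  # terminating entry
            break
        entries.append((address, size))
        offset += 16
    return entries


def make_test_blob(reservations):
    """Build a synthetic header + reservation block (not a complete DTB)."""
    rsv = b"".join(struct.pack(">2Q", a, s) for a, s in reservations)
    rsv += struct.pack(">2Q", 0, 0)
    header_len = 40
    off_struct = header_len + len(rsv)
    header = struct.pack(">10I", FDT_MAGIC, off_struct, off_struct, off_struct,
                         header_len, 17, 16, 0, 0, 0)
    return header + rsv


# On a running system one would instead pass the real blob, e.g.
# read_memreserve(open("/sys/firmware/fdt", "rb").read())
blob = make_test_blob([(0x05000000, 0x00300000)])
for address, size in read_memreserve(blob):
    print("memreserve: base %#x size %#x" % (address, size))
```

An empty list means the block only holds its terminator, so any reservation would have to come from a /reserved-memory node instead.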
<warpme_>
nope. kernel oops even on simple "echo" in init script
<warpme_>
i have factory android boot log. will it be helpful?
choozy has joined #linux-sunxi
<apritzel>
you would need a copy of /sys/firmware/fdt, if that Android kernel provides this
<apritzel>
warpme_: and you are sure that /sbin/init is valid? because the kernel oopses because init dies
<warpme_>
well - this exact sd card (with this init) boots fine on another box. also - i see the kernel mounts rootfs fine. tried to replace init with a 1 line script having "echo" and it dies without any output.
<apritzel>
a script using what shell?
prefixcactus has quit [Ping timeout: 480 seconds]
prefixcactus has joined #linux-sunxi
Serge1000lyn has joined #linux-sunxi
hallyn has quit [Read error: Connection reset by peer]
qCactus has joined #linux-sunxi
prefixcactus has quit [Read error: No route to host]
prefixcactus has joined #linux-sunxi
qCactus has quit [Ping timeout: 480 seconds]
<warpme_>
apritzel: /bin/sh symlinked to busybox
<apritzel>
so then is that busybox binary sane?
<warpme_>
"binary sane" ? What exactly do you mean?
<warpme_>
if this exact copy (this SD card) of the busybox binary works well on a very similar cpu - should i assume the binary is sane?
ftg has quit [Ping timeout: 480 seconds]
<apritzel>
I meant whether that binary really works. For instance you would see a similar error if the binary was corrupted, or built for another architecture (booting a 32-bit kernel with a 64-bit userland)
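One way to check the "wrong architecture" case from another machine is to look at the ELF header of the binary. The sketch below assumes the standard ELF identification layout (class at byte 4, data encoding at byte 5, e_machine as a 16-bit field at offset 18). The function name describe_elf and the demo header are illustrative, and a valid header does not rule out corruption further into the file.

```python
import struct

# A few well-known e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}


def describe_elf(header):
    """Return (word size in bits, machine name) from the first 20 ELF bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])       # EI_CLASS
    endian = {1: "<", 2: ">"}.get(header[5])   # EI_DATA
    if bits is None or endian is None:
        raise ValueError("corrupt ELF identification bytes")
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return bits, EM_NAMES.get(machine, "machine %d" % machine)


# Synthetic little-endian 64-bit AArch64 header, for demonstration only.
# For a real check, read the first 20 bytes of the busybox binary instead.
demo = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 183)
print(describe_elf(demo))
```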
<apritzel>
warpme_: maybe this exitcode=0x00000100 gives a hint?
<warpme_>
i'm a bit out of ideas except launching this exact binary on the closest cpu i have. tested and it works. I agree with your logic and basically i think there are 2 possible explanations of my issue: 1\rootfs - while it seems to be mounted ok - in reality is not; 2\kernel somehow can't execute 64bit userland. So now the q is how to verify this. What would be your idea to verify 2\?
<apritzel>
warpme_: it looks like your first DTB and the initrd fail to load from your boot.scr script. Is that intended? (I guess so?)
<warpme_>
apritzel: yes. i wrote boot.scr in a way that it can serve setups with and without initrd. dtb loading is arranged in a way that the user can easily provide their own customised DT by dropping it into the root of the boot part
tnovotny has quit []
<warpme_>
btw: i just tried with 1 line init script (echo "hello world"; exit 0) still oops but now it is "end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000" (was 0x00000100). this probably means rootfs is accessible ok by kernel as change in script changes error code. So for me this makes hypothesis 2\ much more probable?
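A small sketch for reading the exitcode value in the panic line, assuming it uses the usual wait-status encoding (low 7 bits: terminating signal, bit 7: core dump flag, bits 8 to 15: exit status). The helper name decode_exitcode is made up for this example.

```python
def decode_exitcode(code):
    """Interpret a wait-status style exit code as printed in the panic message."""
    signal = code & 0x7F
    core_dumped = bool(code & 0x80)
    status = (code >> 8) & 0xFF
    if signal:
        suffix = " (core dumped)" if core_dumped else ""
        return "killed by signal %d%s" % (signal, suffix)
    return "exited with status %d" % status


for code in (0x00000100, 0x00000000, 0x0000000B):
    print("exitcode=0x%08x -> %s" % (code, decode_exitcode(code)))
```

Read this way, 0x00000100 would be a plain exit with status 1 and 0x00000000 a plain exit with status 0, rather than a death by signal such as 0x0000000b (signal 11).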