<karlp>
(I haven't used tiptrans for taobao shopping, but I've used them for aggregation and re-shipping for a while, no particular reason to doubt that they'll do what they say)
<gamiee>
Cool, thanks karlp!
moteen has quit [Remote host closed the connection]
vagrantc has joined #linux-sunxi
apritzel has quit [Ping timeout: 480 seconds]
moteen has joined #linux-sunxi
moteen has quit [Ping timeout: 480 seconds]
<MoeIcenowy>
weird, FEL execution seems to have some kind of signature verification
<MoeIcenowy>
fortunately it seems to be not enabled by default
<MoeIcenowy>
maybe it's part of secure boot?
<gamiee>
I hope that V853/V851 get more attention than their predecessors (V831, V531 or V5), they are quite good SoCs
moteen has joined #linux-sunxi
moteen has quit [Remote host closed the connection]
moteen has joined #linux-sunxi
<MoeIcenowy>
got uart0-helloworld-sdboot working on V853
aggi has quit [Quit: zzz]
<MoeIcenowy>
strange, aw_read_arm_cp_reg() cannot work on V853 but aw_get_stackinfo() can
JohnDoe_71Rus has quit []
aggi has joined #linux-sunxi
jakllsch has joined #linux-sunxi
jakllsch has quit [Ping timeout: 480 seconds]
jakllsch has joined #linux-sunxi
apritzel_ has joined #linux-sunxi
apritzel_ has left #linux-sunxi [#linux-sunxi]
apritzel has joined #linux-sunxi
<apritzel>
MoeIcenowy: which register are you trying to read?
<apritzel>
MoeIcenowy: and what does the memory map look like? Like the D1/R528? or more like H6?
hlauer has joined #linux-sunxi
<MoeIcenowy>
apritzel: D1
<MoeIcenowy>
apritzel: sctlr
<MoeIcenowy>
the problem is that the str instruction behaves like it never happened
<MoeIcenowy>
the value of the scheduled destination of the str instruction does not change at all
<apritzel>
caching issues? do we have the "D cache clean to PoU", "I cache invalidate", "isb" sequence (with the proper dsb's in the middle) in the FEL code?
<MoeIcenowy>
well theoretically cache shouldn't matter
<MoeIcenowy>
but who knows...
<MoeIcenowy>
can cache explain why aw_get_stackinfo() works?
<apritzel>
the more interesting question is why this worked before ;-) That's a single Cortex-A7 core?
<apritzel>
Does the BROM turn the MMU on? and can you check the SCTLR.I bit? (maybe from the BROM dump)
<MoeIcenowy>
apritzel: I think a single A7 (and a T-Head E907, but that one shouldn't be available on booting)
<hramrach>
gamiee: what's good about V853/V851?
<MoeIcenowy>
apritzel: the BROM enables I cache
<MoeIcenowy>
well if the I cache makes the execution of a thunk written to the same address not reliable, I think it's beyond what can be fixed, and maybe the whole sunxi-fel needs to be rewritten...
<apritzel>
well, not rewritten, just the right instructions in there. dsb; ic iallu; isb, for instance (or whatever ic iallu is in old money)
<hramrach>
sounds like a random bag of peripherals glued together that might work ok for some very specialized embedded applications but is pretty much useless for general purpose boards
<hramrach>
maybe you could replace A10 with this if A10 stopped being available
<hramrach>
nevermind, it has NPU but no GPU
<gamiee>
@hramrach : it's cheap and has an H.265 encoder.
<hramrach>
yes, that's something potentially useful for some very specialized applications
<gamiee>
yes, of course, it's not absolutely general purpose, but I think it has many use cases :)
<hramrach>
I would very much appreciate if it had GPU as well but it has NPU instead
<hramrach>
It's quite likely there are even cheaper and less power hungry dedicated h265 encode chips
<gamiee>
it's not
<gamiee>
and well, it should have G2D or something like that, so that is helpful
<gamiee>
(although not sure if Linux has any G2D drivers)
<MoeIcenowy>
apritzel: well it's really I cache issue
<MoeIcenowy>
I hooked aw_fel_write and let it run a thunk that disables I-Cache before every run
<hramrach>
if it's so cheap that it's cost-efficient as dedicated h265 encoder then that's potentially interesting
<MoeIcenowy>
apritzel: I think the problem is that FEL does not invalidate I-Cache when writing new code
<gamiee>
it's definitely the best SoC in terms of H.265 encoding for the price, and it can also do 4K H.265, which is pretty good
<apritzel>
MoeIcenowy: yes, architecturally we would need even more, but the circumstances (BROM FEL code executing meanwhile) and the fact that it's little cores only (A7, A53) let us get away with a lot
<MoeIcenowy>
apritzel: sent a PR
<apritzel>
MoeIcenowy: I wonder if we should put some "invalidate;isb;ret" code snippet somewhere, then execute that after writing, but before executing new code
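A minimal sketch of what such an "invalidate; isb; ret" snippet could look like as ARMv7-A (A32) machine words. `ic iallu` is the AArch64 spelling; on ARMv7 the same operation is the CP15 write ICIALLU. The names `INVALIDATE_THUNK` and `thunk_bytes` are illustrative only and are not taken from sunxi-fel, and where the snippet would be placed in SRAM is left open.

```python
import struct

# ARMv7-A (A32) encodings, one 32-bit word per instruction.
INVALIDATE_THUNK = [
    0xE3A00000,  # mov r0, #0
    0xF57FF04F,  # dsb sy                    ; complete earlier writes first
    0xEE070F15,  # mcr p15, 0, r0, c7, c5, 0 ; ICIALLU: invalidate entire I-cache
    0xF57FF04F,  # dsb sy                    ; wait for the invalidation to finish
    0xF57FF06F,  # isb sy                    ; discard already-fetched instructions
    0xE12FFF1E,  # bx lr                     ; return to the caller
]


def thunk_bytes(words):
    """Pack instruction words as little-endian bytes, ready to be written to memory."""
    return struct.pack("<%dI" % len(words), *words)


payload = thunk_bytes(INVALIDATE_THUNK)
```

The architecturally recommended sequence also invalidates the branch predictor (BPIALL); that step is left out here to keep the snippet at the "three or four instructions" scale discussed.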
<MoeIcenowy>
apritzel: I think this does not change unreliable FEL writing...
<apritzel>
but how come your aw_disable_icache code gets executed?
<apritzel>
shouldn't that itself be affected by the same problem?
<MoeIcenowy>
well I just execute it before the first actual write
<MoeIcenowy>
apritzel: well after it is done it should have no problem
<apritzel>
but this code must be written somewhere as well first?
<MoeIcenowy>
re-disabling I-cache is harmless
<MoeIcenowy>
well to be honest this only eliminates the need for an alternative thunk storage
<MoeIcenowy>
which is needed for invalidation code
<apritzel>
it doesn't really need to be an alternative location, as long as it's separate, so the I$ copy stays valid. And ideally it's just three or four instructions
<MoeIcenowy>
well I think it should be at least a cache granularity far from the main thunk storage
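The "at least a cache granularity apart" condition can be checked with a few lines of arithmetic. This is only a sketch: the function names and the example addresses are hypothetical, and the 32-byte line size is an assumption that should be confirmed against the core's documentation (or its CTR register) before relying on it.

```python
def cache_lines(addr, size, line_size=32):
    """Return the set of cache-line indices touched by [addr, addr + size)."""
    first = addr // line_size
    last = (addr + size - 1) // line_size
    return set(range(first, last + 1))


def shares_cache_line(a_addr, a_size, b_addr, b_size, line_size=32):
    """True if the two byte ranges touch at least one common cache line."""
    return bool(cache_lines(a_addr, a_size, line_size)
                & cache_lines(b_addr, b_size, line_size))


# Hypothetical layout: a 24-byte helper and a 16-byte main thunk.
too_close = shares_cache_line(0x1000, 24, 0x1010, 16)   # same 32-byte line
far_enough = shares_cache_line(0x1000, 24, 0x1020, 16)  # next line
```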
<MoeIcenowy>
and maybe when booting SPL, it should also be out of SPL storage
moteen_ has joined #linux-sunxi
<MoeIcenowy>
BTW D1 has similar issues, I may need to check it out then
<MoeIcenowy>
although it will be a big hack to let sunxi-fel support D1
moteen has quit [Ping timeout: 480 seconds]
<MoeIcenowy>
apritzel: to be honest I was too silly to realize there are separate I and D caches before you mentioned "SCTLR.I"
<MoeIcenowy>
(I mean there could be inconsistency between I and D caches)
<apritzel>
not silly, the caches in the ARM architecture are quite tricky
hlauer has quit [Ping timeout: 480 seconds]
<apritzel>
the architecture does not guarantee coherency between L1I and L1D. So whenever you write code, you need to clean the D$ to the "point of unification" (where I$ and D$ eventually meet)
<apritzel>
this is typically L2
<hramrach>
so the problem is that when you write the code it's in the data cache but not memory nor instruction cache?
<apritzel>
hramrach: yes, exactly, when the MMU and the data cache are on
<apritzel>
but then you also need to invalidate the I$, so that it actually fetches the new data from the PoU
<hramrach>
can you write the invalidation code into some sram region that would not be affected by the caching?
<apritzel>
and finally you have to execute an "isb", so that the core discards whatever it had already fetched from the I$, and re-fetches it from there
<apritzel>
hramrach: yes and no
<apritzel>
if you put invalidation code somewhere, and manage to get this into the I$, then you should be able to jump to that
<apritzel>
no for: SRAM region not affected by the caching: there is no such thing
<apritzel>
not for the I$ accesses, that is
<hramrach>
that's a PITA. You can flush the data cache by writing enough data, presumably. You can flush the instruction cache by executing enough code but you don't know what code you have
<apritzel>
the data cache is most likely not enabled in our case, as this would require turning the MMU on
<apritzel>
(because the core only knows if some area is cacheable or not through the page table entries, as there are no MTRRs, if you come from x86)
<apritzel>
so the writes end up in the SRAM
<hramrach>
presumably you can pick a region to write the code that has not been executed yet so it's not in the instruction cache, ugh
<apritzel>
yes, I think this is what happens when you write the first time, as MoeIcenowy workaround (disable the I$) relies on that
moteen_ has quit [Remote host closed the connection]
moteen has joined #linux-sunxi
<hramrach>
but how do you jump to that code?
<apritzel>
there are three FEL USB commands: read data, write data, execute
<hramrach>
so the code for jumping to some random location is there to start with
<apritzel>
actually the FEL code should invalidate the I$ if it runs with .I enabled, before it starts executing user code
<apritzel>
that's the BootROM code talking to the USB OTG device: it parses the packets, and then either reads data and returns that, or writes the data sent, or it branches to the given address
<apritzel>
if you just "ret" at the end of your provided code, then the BROM continues with the USB OTG operation, so you can send more
bauen1 has joined #linux-sunxi
<apritzel>
that's how sunxi-fel works: we upload small code snippets (for instance to program some MMIO registers, or read data from the SPI FIFO), and then execute them
<hramrach>
and if you rewrite the same location, have instruction cache enabled, and the brom does not invalidate it you get garbage
<apritzel>
yes, though not garbage, but just the old code that was there before
<hramrach>
because the cache capacity is limited you can get a mix of old and new code
moteen has quit [Ping timeout: 480 seconds]
<apritzel>
theoretically, but we write very small snippets, and the L1I is typically as big as the SRAM
<hramrach>
you get garbage anyway if the entry point of the old and new code differs
<apritzel>
but they don't, because we always re-use the same thunk address for that buffer
<apritzel>
but yeah, you cannot rely on anything, so old code is as good as garbage ;-)
<apritzel>
you can actually prove that behaviour with U-Boot: write a few instructions, followed by a "ret" somewhere (with mw.l), then use the "go" command to execute that
<apritzel>
and then change that code, and execute it again
<apritzel>
your change will most likely not be honoured
<apritzel>
(unless you had the proper invalidate sequence in the first instructions)
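The effect described here (rewrite the code, execute again, and the old code still runs) can be illustrated with a toy model. This is not how real hardware or U-Boot is implemented: `ToyCore`, its methods, and the `THUNK` address are all made up for illustration, and "code" is represented by plain Python callables.

```python
class ToyCore:
    """Toy model: memory plus an instruction cache that writes never update."""

    def __init__(self):
        self.memory = {}  # address -> "code" (a Python callable)
        self.icache = {}  # address -> copy fetched on an earlier execute

    def write(self, addr, code):
        # Data writes reach memory only; the I-cache is not told about them.
        self.memory[addr] = code

    def execute(self, addr):
        # Instruction fetch: a hit returns the cached, possibly stale, copy.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]()

    def invalidate_icache(self):
        # Stands in for "invalidate I-cache; isb".
        self.icache.clear()


core = ToyCore()
THUNK = 0x1000  # hypothetical address, reused for every snippet

core.write(THUNK, lambda: "old")
first = core.execute(THUNK)   # fetches the first snippet into the I-cache

core.write(THUNK, lambda: "new")
stale = core.execute(THUNK)   # cache hit: the old snippet runs again

core.invalidate_icache()
fresh = core.execute(THUNK)   # refetched from memory: the new snippet runs
```

In this model the second `execute` returns the old result even though memory already holds the new snippet; only the explicit invalidation makes the change visible.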
<hramrach>
and if this assessment is correct, it's not really a problem for normal operation, in which you upload a piece of code and execute it, but it becomes a problem in development, when you rewrite the memory multiple times to run different test programs
<apritzel>
a program loader must observe that sequence, especially with the data caches on
<apritzel>
but yeah, JIT's hate it, because obviously frequent I$ invalidates are not good for performance
<apritzel>
that's why later ARMv8 revisions have feature bits to advertise that L1I and L1D are actually coherent
<hramrach>
that does not come for free either but you do not have to deal with the fallout yourself
<apritzel>
yes, coherency is somewhat expensive, both in terms of complexity to build, but also in power consumption
<MoeIcenowy>
apritzel: well I think the whole FEL just depends on SRAM not being touched
<apritzel>
MoeIcenowy: possibly, but this whole "upload code snippets and execute them" must be used by them too?
<MoeIcenowy>
apritzel: yes
<MoeIcenowy>
although they only use them with FES1 and U-Boot