ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<airlied> karolherbst: btw is that all nv50->nvc0?
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
<karolherbst> airlied: no idea why, but it's really just nv50
<airlied> okay so not nv98 etc, weird
<karolherbst> ohh... I meant nv50 as in the gallium driver
<karolherbst> currently on a g98
<airlied> ah cool
<karolherbst> doing another git bisect, but removing the "default to nir for nv50" commit, and seeing where I land
JohnnyonF has quit [Read error: Connection reset by peer]
<jekstrand> Ok, well I've got a disassembler now.
<karolherbst> \o/
<jekstrand> And the current Turing macros appear horribly inefficient. IDK if they actually are or not. I'm still figuring out how some of this works.
<karolherbst> jekstrand: I think you can't make use of whatever you do in slot0 in slot1
<jekstrand> karolherbst: yeah
co1umbarius has joined #dri-devel
<jekstrand> I assumed
<jekstrand> karolherbst: The thing I don't get is how this output stuff works. I'm starting to think the outputs just feed the command parser. So if you write a 32-bit immediate, it treats that as if it's a dword on the command stream.
<jekstrand> At least, that's how it's looking.
<jekstrand> but it's hard to tell
<jekstrand> there's "emit" and "method"
<karolherbst> yeah.... not sure
columbarius has quit [Ping timeout: 480 seconds]
<karolherbst> it's all very magic to me
<jekstrand> heh
<jekstrand> I'm sure I'll figure it out. :D
<jekstrand> I think the next thing to do is to write a tiny little app which sets up a macro, executes it, and lets me see the output somehow.
<karolherbst> well.. I suspect it's more or less like the stuff we submit to the hardware
<jekstrand> If I can get that working, I'll be able to RE the whole thing.
<karolherbst> where you execute a method and hand over a 32 bit value
<jekstrand> It also has branching and IDK how that works
<karolherbst> there are some comments on that
<karolherbst> somewhere...
<karolherbst> not sure if that differs from volta+
<jekstrand> Oh, well that's fun. It uses SET_REPORT_SEMAPHORE_A to write to memory. :D
<karolherbst> yeah
<karolherbst> how else?
<jekstrand> That works. I just hadn't thought of that. :D
<karolherbst> :D
<karolherbst> fences are just mem writes really :P
<karolherbst> well.. they are a bit more special though
<jekstrand> pretty much
<karolherbst> I am just curious on how to whack the graph/context state directly :P
<karolherbst> but I guess one could find out with a little bit of tinkering and getting values back
<karolherbst> jekstrand: I assume we can also use macros to fetch the QMD version supported by hardware
<karolherbst> if you want to already start with something useful
<karolherbst> NVC0C0_CHECK_QMD_VERSION e.g.
<karolherbst> also checking how "NVC0C0_SET_QMD_VERSION" affects the value returned
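A minimal sketch of the "tiny little app" idea from above, in C. Everything here is a placeholder: push_method() and the HYP_* method offsets stand in for whatever the real class headers define; this is not the actual Turing method layout, just the shape of the upload/invoke/read-back flow being discussed.

    #include <stdint.h>

    struct pushbuf;                                 /* opaque command stream */
    void push_method(struct pushbuf *p, uint32_t mthd, uint32_t data);

    enum {                                          /* placeholder offsets */
        HYP_LOAD_MME_START_ADDRESS   = 0x0,
        HYP_LOAD_MME_INSTRUCTION_RAM = 0x4,
        HYP_SET_REPORT_SEMAPHORE_A   = 0x8,
        HYP_SET_REPORT_SEMAPHORE_B   = 0xc,
    };
    #define HYP_CALL_MME_MACRO(n) (0x100 + (n) * 8) /* placeholder */

    static void
    probe_macro(struct pushbuf *p, uint64_t report_gpu_addr,
                const uint32_t *mme_code, unsigned mme_len)
    {
        /* 1. upload the macro into the MME instruction RAM */
        push_method(p, HYP_LOAD_MME_START_ADDRESS, 0);
        for (unsigned i = 0; i < mme_len; i++)
            push_method(p, HYP_LOAD_MME_INSTRUCTION_RAM, mme_code[i]);

        /* 2. point the semaphore report at a CPU-visible readback buffer;
         *    the macro can then emit SET_REPORT_SEMAPHORE_* itself to dump
         *    whatever it computed (e.g. a QMD version probe) to memory */
        push_method(p, HYP_SET_REPORT_SEMAPHORE_A, report_gpu_addr >> 32);
        push_method(p, HYP_SET_REPORT_SEMAPHORE_B, report_gpu_addr & 0xffffffff);

        /* 3. invoke the macro: one method call carrying a 32-bit value,
         *    just like anything else submitted to the hardware */
        push_method(p, HYP_CALL_MME_MACRO(0), 0xcafe);

        /* 4. wait for idle on the CPU side, then read the dword back */
    }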
<karolherbst> ... guess what I git bisected to: mesa/st: Always generate NIR from GLSL, and use nir_to_tgsi for TGSI drivers.
<karolherbst> anholt: soo.. something is bonkers and I have no idea what... I might try to force the nir path and see if anything before the enablement could have broken it, but I suspect I'll probably run into a lot of broken mess :(
<airlied> karolherbst: what does the broken look like?
<airlied> dirty like zebra
<karolherbst> why can't I simply enable some env var globally and hope it sticks *sigh*
<airlied> karolherbst: you don't see the invalid opcode reports?
<karolherbst> I did today
<karolherbst> doesn't seem to always happen
ella-0_ has joined #dri-devel
<karolherbst> jo wtf...
<karolherbst> why doesn't gnome pick up my env var :(
ella-0 has quit [Remote host closed the connection]
<karolherbst> heh.. that's odd
khfeng has joined #dri-devel
<karolherbst> I am sure it's something with the CFG, but... uhhh
<karolherbst> it makes no sense
JoniSt has quit [Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/]
Daanct12 has quit [Remote host closed the connection]
Daanct12 has joined #dri-devel
karolherbst has quit [Read error: Connection reset by peer]
karolherbst has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
ybogdano has quit [Read error: Connection reset by peer]
<mareko> why can't Marge merge this simple change? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18302
<mareko> the CI just doesn't execute for this MR
<airlied> mareko: does it have the "others can change this branch" bit set?
kem has quit [Ping timeout: 480 seconds]
Daanct12 has quit [Remote host closed the connection]
<mareko> airlied: Marge can push but it can't run pipelines
<zmike> maybe it's a weird gap where no jobs cover the changed files?
kem has joined #dri-devel
bmodem has joined #dri-devel
saurabhg has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
a-865 has joined #dri-devel
a-865 has left #dri-devel [#dri-devel]
a-865 has joined #dri-devel
<mareko> it affects radeonsi and radv
<mareko> anholt: Marge can push but can't start pipelines: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18302
<mareko> we've been trying to merge it for 2 days
<airlied> mareko: I think it's because the branch it is in is called master
<airlied> please push it to a branch with a different name
* airlied vaguely remembers someone having the same problem a while back
kts has joined #dri-devel
chek has joined #dri-devel
chek has left #dri-devel [#dri-devel]
The_Company has quit []
xroumegue has quit [Ping timeout: 480 seconds]
Lucretia has quit [Remote host closed the connection]
agx_ has joined #dri-devel
agx has quit [Read error: Connection reset by peer]
Lucretia has joined #dri-devel
xroumegue has joined #dri-devel
bmodem has quit []
heat_ has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
fab has joined #dri-devel
kts has quit [Ping timeout: 480 seconds]
jewins has quit [Read error: Connection reset by peer]
Duke`` has quit [Ping timeout: 480 seconds]
mvlad has joined #dri-devel
danvet has joined #dri-devel
gio has joined #dri-devel
itoral has joined #dri-devel
agx_ has quit [Ping timeout: 480 seconds]
agx has joined #dri-devel
khfeng has quit [Ping timeout: 480 seconds]
fab has quit [Ping timeout: 480 seconds]
frieder has joined #dri-devel
khfeng has joined #dri-devel
<daniels> mareko: don’t have Part-of in your commit message
bluetail21514 has quit [Read error: Connection reset by peer]
bluetail21514 has joined #dri-devel
bluetail21514 has quit []
bluetail21514 has joined #dri-devel
fab has joined #dri-devel
lemonzest has joined #dri-devel
bluetail21514 has quit []
bluetail21514 has joined #dri-devel
bluetail21514 has quit [Read error: Connection reset by peer]
bluetail21514 has joined #dri-devel
bluetail21514 has quit [Read error: Connection reset by peer]
bluetail21514 has joined #dri-devel
khfeng has quit [Ping timeout: 480 seconds]
tursulin has joined #dri-devel
Haaninjo has joined #dri-devel
YuGiOhJCJ has quit [Quit: YuGiOhJCJ]
srslypascal is now known as Guest1714
srslypascal has joined #dri-devel
<lina> karolherbst: I meant abstractions (I keep confusing the two...)
<lina> Rust-for-Linux already uses bindgen for the bindings ^^
Guest1714 has quit [Ping timeout: 480 seconds]
<hakzsam> some spurious freedreno CI failures https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/678189
<dj-death> looking at nir_opt_gcm, it seems to only push instructions down into an if block if it's a load_const
jkrzyszt has joined #dri-devel
<mareko> daniels: not sure what you mean, Marge inserted Part-of
rasterman has joined #dri-devel
<MrCooper> mareko: I just added a comment explaining it
<MrCooper> but the source branch name might be an issue as well
ella-0 has joined #dri-devel
<MrCooper> anyway, just reassigned now and letting Marge rebase
ella-0_ has quit [Read error: Connection reset by peer]
<pepp> maybe the sanity job should check the branch name ("main", "master" seem to cause problems, as well as names with a "+" like !18226)
lynxeye has joined #dri-devel
bluetail215140 has joined #dri-devel
thellstrom has joined #dri-devel
bluetail215140 has quit []
bluetail21514 has quit [Ping timeout: 480 seconds]
<airlied> MrCooper: it's just going to timeout again I think
<MrCooper> yep
<MrCooper> pepp: the sanity job doesn't exist in the pipeline on the MR page
<MrCooper> the problem is GitLab doesn't create an MR pipeline
thellstrom has quit [Ping timeout: 480 seconds]
<mripard> airlied: danvet: did you receive the drm-misc-next PR from mlankhorst on 20/8?
kts has joined #dri-devel
<knr> danvet: Please comment on https://lore.kernel.org/dri-devel/20220823210612.296922-1-michal.winiarski@intel.com/ (expanding device numbers).
<knr> airlied: You mentioned on this channel that there are "other discussions", could you elaborate?
kts has quit [Ping timeout: 480 seconds]
pcercuei has joined #dri-devel
khfeng has joined #dri-devel
kts has joined #dri-devel
minecrell has quit [Quit: Ping timeout (120 seconds)]
minecrell has joined #dri-devel
<airlied> doesn't look like it's there
<airlied> knr: we were discussing whether we should just figure out a way to make it dynamic
<airlied> and we've also been talking about using control nodes for things like accessing RAS data
<mripard> yeah, I received it in my inbox but it doesn't look like it reached the ML
MrCooper has quit [Remote host closed the connection]
<mripard> mlankhorst: ^
<mripard> mlankhorst: could you resend the drm-misc-next PR?
<airlied> and also the possibility of using some space for a separate accel namespace
rkanwal has joined #dri-devel
MrCooper has joined #dri-devel
bmodem has joined #dri-devel
kts_ has joined #dri-devel
<knr> and was there any conclusion of those discussions? :) is there a way we can make forward progress on expanding the DRM device limit?
<knr> In v1 comments I suggested that perhaps we could stop doing the "static" minor reservation for minors > 192
sdutt has quit [Ping timeout: 480 seconds]
<knr> (since the current userspace will be "broken" for minors > 192 anyways)
<knr> accel seems like something orthogonal
<airlied> it is kinda, though control was for accel cards
kts_ has quit []
<airlied> but yeah i think dropping static for > 192 might be a workable plan
<airlied> esp for things like sriov vf stuff
<knr> huh? wasn't render supposed to be for accel (and control for modesetting?)
<airlied> nope control was never used for its original purpose
<airlied> card for modesetting
<airlied> some accel/gpus need an out of band RAS mechanism and a separate dev node might be a good solution
<knr> and that uses IOCTLs? or just sysfs?
<airlied> would need ioctls
<airlied> sysfs may not scale, and lacks atomicity
<knr> well, for < 192 we still have 64 minors reserved, which can be reused for a different purpose, for > 192 it's first-come first-served
<knr> the way to identify what "type" a given minor is, is somewhat annoying
<knr> since you have to rely on matching the chardev name
<knr> to given pattern
<knr> in other words - it's not self-describing from sysfs perspective
<knr> but it's workable IMO
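A toy version of the name-matching knr describes (the approach BSD libdrm takes): classify a /dev/dri entry purely by its chardev name instead of decoding the minor number. The enum and function names here are made up for illustration.

    #include <string.h>

    enum node_kind { NODE_PRIMARY, NODE_RENDER, NODE_UNKNOWN };

    static enum node_kind classify_node(const char *name)
    {
        /* /dev/dri/card* is the modesetting node, /dev/dri/renderD* the
         * render node; anything else (future accel/RAS nodes) is unknown */
        if (strncmp(name, "card", 4) == 0)
            return NODE_PRIMARY;
        if (strncmp(name, "renderD", 7) == 0)
            return NODE_RENDER;
        return NODE_UNKNOWN;
    }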
<airlied> i suppose we could make some sort of descriptive links from sysfs
<airlied> so we aren't using /dev to work it out
<emersion> keep in mind that sandboxes exist
ahajda has joined #dri-devel
<emersion> we need to make sure to not break these
<airlied> emersion: why?
<airlied> they are generally broken the whole time by misc kernel changes
<emersion> if a new libdrm just uses sysfs, it'd break in sandboxes
<emersion> by sandbox i mean flatpak etc
<airlied> but libdrm would have to be compat with old kernels
<airlied> this would just be for new things
<airlied> could make it a driver opt-in to use dynamic minors
<airlied> since 64 is still plenty for most normal use cases
<emersion> right, it depends how it's done
<emersion> if it's just a is_new_kernel() check it'd still break
narmstrong_ has quit []
narmstrong has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
<knr> emersion: doesn't sandboxing (or containers) need sysfs anyways?
<emersion> iirc it's an explicit opt-in
<knr> and I guess it's fine to just use the name - it's what BSD does with libdrm
kem has quit [Ping timeout: 480 seconds]
<knr> old libdrm (and friends... mesa and other userspace drivers) are still fine with <192, new libdrm will work with both
dv_ has quit [Quit: WeeChat 2.8]
<knr> the only thing that's worrying is test coverage, since you need to have a large number of devices connected to actually hit >192 minors
<airlied> could force a mod opt to do all > 192 for testing
<knr> yeah, that would work
<knr> so.. should I prep v3 that does things dynamically (+modparam for testing)? emersion: any objections? :)
kem has joined #dri-devel
<knr> that would be without any additional sysfs, we'd just rely on chardev name
<emersion> no objections
<emersion> chardev name sounds more reliable than magic with major/minor anyways
<emersion> i like the BSD approach of just not assigning any special meaning to dev_t numbers
<emersion> but a bit too late for that
<lina> By the way, I started writing Rust bindings for DRM, starting with GEM/shmem/MM (and the core drv/device stuff). Hopefully I'll have something to show in a few days ^^
kts has joined #dri-devel
bmodem has quit []
dos1 has quit [Quit: Kabum!]
dos1 has joined #dri-devel
<danvet> airlied, knr only thing I'm wondering on the minor allocation is whether we should even have a scheme for anything new
<danvet> i.e. allocate first 64 like we do now for render and card nodes
<knr> no, that's the idea - no scheme
<knr> first-come, first-served
<danvet> ah excellent
* danvet way behind
<knr> (for >192)
<knr> (we keep the scheme for compat)
<knr> (for <192)
<danvet> knr, so you'll do a v3 with that?
<danvet> with the do_while replaced with just a fallback to idr_alloc(start=192)?
<danvet> knr, also feel free to add an r-b: me to patch 2/2 with the locking change
<danvet> seems reasonable
<danvet> knr, maybe if you can add a patch 3/3 which adds a might_alloc(gfp); to idr_get_free?
<danvet> I think that'd be nice for lockdep coverage
<danvet> since we're not guaranteed to end up in an allocation
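Roughly what danvet's "fallback to idr_alloc(start=192)" would look like — a sketch, not the actual patch; the function name is made up and the range math assumes the existing 64-minor card/control/render carve-outs mentioned in the discussion.

    #include <linux/idr.h>

    /* assumed legacy ranges: card 0-63, control 64-127, render 128-191 */
    static int drm_minor_alloc_id(struct idr *idr, void *minor, int type)
    {
        int start = 64 * type;  /* static carve-out, kept for compat */
        int id;

        id = idr_alloc(idr, minor, start, start + 64, GFP_KERNEL);
        if (id == -ENOSPC)
            /* no scheme past 192: first-come, first-served,
             * end <= 0 means no upper bound */
            id = idr_alloc(idr, minor, 192, 0, GFP_KERNEL);
        return id;
    }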
<danvet> airlied, I thought control was for pure kms? but it never worked, since you had zero gem ioctl, so impossible to get a bo handle into that node's namespace and actually create a drm_fb
kts has quit [Quit: Konversation terminated!]
<danvet> which is why I nuked it
kts has joined #dri-devel
<knr> danvet: yeah, I'll do a v3, 3/3 or separate patch?
<danvet> knr, you can also send it out separately, it needs to go in through idr maintainer anyway
<danvet> I like to include those in the series for context, but that's just me
<danvet> also might_alloc is pretty new, so rolling it out where it's missing is a good idea in general
<danvet> knr, also just noticed that willy has added might_alloc to xarray, so he should be enthusiastic about adding it to idr
<airlied> danvet: control was originally for multi seat resource splitting
<airlied> the idea was a userspace seat manager would use it to create core nodes per seat with fixed resource assignments
<airlied> back when the dream was two separate users sharing a gpu
bluetail215140 has joined #dri-devel
<danvet> airlied, yeah we have leases now for that, kinda
<airlied> yeah we never really got the dream though :-p
<airlied> the kernel side worked and i think i had gdm patches at one point
<danvet> yeah with modern hw hardcoded splits kinda don't work, too much internal sharing
<daniels> airlied: huh, you lucked out with a single card - the machine I was making work for the other distro was 4x NV cards (which of course my employer at the time couldn't be bothered shipping to me, so I had to replicate with a homebrew setup which led to a week burnt on issues I couldn't reproduce due to VGA primary vs. PCIE enumeration order ... not that I'm still bitter or anything)
<airlied> daniels: just needed to write vga arbitration :-p
<daniels> writing & upstreaming code was an anti-goal
<daniels> (got disciplined for trying)
<airlied> danvet: yeah it would kinda suck, you'd need a system compositor that fakes kms nodes so seats think they are talking to the kernel
<airlied> danvet: it's the C way
<airlied> or rather the spaceboy way!
<daniels> mm I'm not sure system compositors are the way to go tbh. I mean we already lie to userspace about trivial details like 'planes' and 'encoders', so why not just lie harder?
<daniels> reconfiguration is always going to suck, but shrug
<daniels> sucks with a system compositor too
Jeremy_Rand_Talos_ has joined #dri-devel
<HdkR> Userspace doesn't need to know the truth anyway :P
Jeremy_Rand_Talos__ has quit [Remote host closed the connection]
<ccr> cue Obi-wan "so what I told you is true .. from a certain point of view."
<danvet> airlied, lease looks like kms node, and compositors I think already can handle the seat being opened for them
<danvet> so I think all that's left is the system compositor
<danvet> the vt switch is also moved into systemd, so that should all pan out I think
shadeslayer has joined #dri-devel
shadeslayer is now known as Guest1728
Guest1728 is now known as shadeslayer
itoral has quit []
pallavim has joined #dri-devel
<karolherbst> lina: ahh cool. I also saw you were using Pin<> for things, but didn't look closely at why exactly. Are the Rust-for-Linux abstractions using that? And what's the benefit compared to C code? It felt to me it's only really useful for self-referential things and it's more of a "hint" than actually doing anything.
Company has joined #dri-devel
<linkmauve> I’m testing my program for the first time on a 120 Hz output, and unsurprisingly it runs at twice the speed until my frame time detection code kicks in and switches from relying on eglSwapInterval(1) to usleep().
<linkmauve> Is there anything in EGL that would let me figure out that 120 Hz is a multiple of 60 and try eglSwapInterval(2)?
<linkmauve> Or should I rely on platform information, such as finding the current refresh rate of the current output?
<linkmauve> My game logic must run at 60 Hz, so I have the same issue on 50 Hz outputs and such, where I also switch to eglSwapInterval(0) in order to do timings by hand.
<pq> danvet, except that system compositor would not actually composite anything. :-) Might as well be systemd-logind managing leases if it had a config somewhere.
<pq> linkmauve, I think the fundamental problem is that you rely on the display timings to pace your simulation.
<linkmauve> The mode we switch to when they don’t correspond actually decouples that entirely.
<pq> linkmauve, so why even have the coupled mode at all?
<linkmauve> But then I can’t rely on optimisations like Weston’s delay after vsync or such things.
<pq> huh?
<linkmauve> pq, hmm, would you recommend to always eglSwapInterval(0) and pace frames with a timer?
<linkmauve> Or to keep eglSwapInterval(1) and output the last rendered frame instead?
kts has quit [Ping timeout: 480 seconds]
<pq> linkmauve, if your simulation needs to run on a timer, just make it run on a timer always. Rendering is not part of simulation, and render+display should happen only when you both have something new to show and display is ready for another frame.
ppascher has quit [Quit: Gateway shutdown]
<pq> I recommend making simulation completely independent and free-running, and that the render+display machinery sample the simulation state when the display needs an update.
<linkmauve> On a modern computer the whole simulation+render thing takes almost no time at all, so it's useful to do it as late as possible; if we start the simulation (that is, polling input) at a random unsynchronised time it can lead to weird timings.
<pq> whether you use swapInterval 0 or 1 is irrelevant, it must not affect the simulation rate.
<linkmauve> (When I say almost no time at all, I mean sub-ms.)
<linkmauve> (On an Intel GPU from five years ago.)
<pq> "as late as possible" is a whole different matter
<pq> currently Wayland has no good provision for "as late as possible", for example.
<linkmauve> Right.
khfeng has quit [Remote host closed the connection]
<pq> choosing the swap interval is more about your program structure: do you have a separate thread for render+display or do you do it in the same thread as the simulation.
<pq> ..when your goal is to render once per display refresh cycle
<linkmauve> Currently I have only a single thread for all of that, only spreading some known-safe parallelisable work to a thread pool.
<pq> right
<pq> in that case you want swap interval 0, and watch for frame callbacks yourself.
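A single-threaded sketch of what pq recommends, assuming app-specific hooks left here as externs (simulate_one_tick, wait_for_frame_callback, render_latest_state are hypothetical names). The simulation steps at a fixed 60 Hz off wall time, so game speed never depends on the display's refresh rate; a threaded design would run the tick loop on its own timer instead.

    #include <stdint.h>
    #include <time.h>

    #define TICK_NS (1000000000ull / 60)   /* game logic fixed at 60 Hz */

    extern void simulate_one_tick(void);       /* app: advance sim by 1/60 s */
    extern void wait_for_frame_callback(void); /* app: block until compositor
                                                  wants a new frame */
    extern void render_latest_state(void);     /* app: sample sim state, draw,
                                                  eglSwapBuffers */

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    void main_loop(void)
    {
        uint64_t last = now_ns(), acc = 0;

        for (;;) {
            wait_for_frame_callback();      /* the display paces rendering... */
            uint64_t now = now_ns();
            acc += now - last;
            last = now;

            /* ...but wall time paces the simulation: 0, 1, or 2 ticks per
             * frame depending on whether the output runs at 120, 60, or
             * 50 Hz */
            while (acc >= TICK_NS) {
                simulate_one_tick();
                acc -= TICK_NS;
            }

            render_latest_state();
        }
    }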
<danvet> pq, yeah it really only manages access control and revoke
<danvet> so essentially like vt switch, but with drm lease ioctls
<danvet> I guess we'd need some kind of protocol to supply a new drm fd when switching back
<danvet> since atm lease revoke is one-way
<pq> danvet, logind already works like that: compositor asks logind to open a DRM device, logind sends an fd back.
<daniels> pq: either frame callbacks or working adaptively with presentation-time
<emersion> pq, but on VT switch back, logind doesn't supply a new DRM FD
<emersion> instead, the DRM FD goes from paused to unpaused
<pq> emersion, I don't know why switching is discussed. The leases would be active simultaneously.
<pq> you can't pause leased DRM fds like you can open()'d DRM fds?
<emersion> danvet is talking about moving VT switch into user-space
<pq> doesn't pausing the original DRM fd also pause all the leased fds?
<emersion> probably, but then it's still done on the kernel side
<pq> how could it not be kernel side?
<emersion> if you have systemd manage the VT switch completely, then it's not kernel-side
<pq> isn't systemd-logind only telling the kernel to pause/revoke the fds and the kernel does the actual revocation and pausing?
<emersion> right now, yes
saurabhg has quit [Ping timeout: 480 seconds]
<pq> how else could it be?
<emersion> in the future, maybe not
<pq> how do you revoke an fd without telling the kernel about it?
<emersion> in the future, maybe systemd can take care of all that
pallavim_ has joined #dri-devel
<pq> daniels, once presentation-time has been extended to communicate the deadline, yes, but before that it doesn't help for "as late as possible". I consider having to repeatedly miss frames to figure out the deadline to be a non-solution.
pallavim has quit [Ping timeout: 480 seconds]
<daniels> right, it's the second sentence that's the issue :) I hadn't taken that as axiomatic, not least because frame events are not a tight guarantee either
pac85 has joined #dri-devel
<pq> There are no guarantees, except that the frame event is a good time to start drawing.
nonhole has joined #dri-devel
<pq> if you delay from that, it may not be a good time or it may be too late to even start
<swick> that's one thing that present_timing and present_wait get wrong as well. communicating the presentation time is not enough for scheduling your own work, only for knowing what you should draw.
<nonhole> The pci_disable_device function is defined in the kernel PCI module's source. I want to make pci_disable_device usable as a parameter of the kernel PCI module.
<nonhole> The GRUB boot command works when I type pci=disable_acs_redir=pci:12D8:2308 for the Linux PCI module.
<nonhole> When I type pci=pci_disable_device=pci:12D8:2308 to disable the hardware, it gives an unknown parameter error.
<pac85> pq: isn't it possible to measure the time the application waits for the callback and wait a safe percentage of that time before starting rendering?
<nonhole> what is the reason for this?
<pq> pac85, what would be a safe percentage? Some compositors adjust when they send frame callbacks. Most compositors have a deadline well before the target vblank.
<danvet> emersion, with leases all you can do is revoke
<danvet> not restore
<danvet> which is unlike the drm ownership for the card* node
<danvet> so either we need to extend logind+compositors to re-acquire the lease
<danvet> or we extend the kernel to restore a lease
<danvet> either way it's still a combo of logind issuing ioctl and kernel enforcing access restrictions
<pq> re-acquire would fit the general lease design I think - can be revoked at any time, deal with it
<MrCooper> pq: even with an explicit deadline in the protocol, the client still needs to keep track of how much earlier than that it needs to start to hit the deadline; and if it wants to optimize that, there's a corresponding risk for missing the deadline
<pq> pretending that a lease is just another normal DRM fd feels a bit... loaded with traps
<pq> MrCooper, there is always a risk.
<MrCooper> so not sure the explicit deadline would change all that much
<pq> it does not mean stuff should be designed to realize the risk repeatedly during normal operations
<MrCooper> it's the same basic trade-off with or without explicit deadline
<daniels> pq: the event isn't a guarantee that it's a good time to start drawing - it's the compositor's guess
<pq> Aside from scheduling issues, if a client knows the deadline, then it has all the information that can exist.
<daniels> which may or may not be good :)
<daniels> swick: it's not 'wrong' to not include forward-looking speculation on timing for future frames. it's merely less complete than a solution which does include those things.
<pq> daniels, in the compositor's opinion it is the best time to start.
<daniels> agreed
lkw has joined #dri-devel
<MrCooper> pq: it's not "best" for clients which need near 100% GPU utilization to sustain full frame rate; they need to start earlier
nonhole has quit [Quit: Page closed]
<MrCooper> (or generally when the time interval between starting a new frame and the GPU finishing is longer than a display refresh cycle)
pcercuei has quit [Quit: brb]
<MrCooper> this is assuming the Wayland compositor does proper mailbox semantics and uses only idle buffers for its output
<swick> daniels: nothing is wrong per se, you can start drawing at random times and it's not wrong, but it's objectively worse
pcercuei has joined #dri-devel
<pq> MrCooper, it's the *compositor's* opinion. If something wants to max out GPU, then it would never wait for anything.
<daniels> swick: yes. my point is that describing something which is not a perfect and complete solution which is 100% bulletproof accurate every single time for every single usecase as 'wrong' is not only in itself wrong, but seriously annoying when it derails every improvement that people are trying to make which doesn't fit into your own worldview.
<MrCooper> pq: that's not what I'm talking about
<swick> where exactly are people improving the situation?
<daniels> in the MRs you keep posting in saying that everyone must use the presentation-timing protocol which isn't anywhere near complete ...
<pac85> MrCooper: yeah, I talked with some game developers about presentation-time and they told me that some operating systems have something similar but it doesn't really help
<daniels> or in the exact point this discussion started, where you described present_wait as 'wrong' for not having the larger scope you wished it would have
<swick> oh you mean a protocol that has to be implemnented by compositors instead of just fixing the WSI?
<swick> sorry, but I see a lot of people complaining about WSI problems and choosing the easiest way to just get around the exact problem they are facing
<swick> and nobody actually investing time in unfucking the whole thing
saurabhg has joined #dri-devel
<swick> and papering over the issues doesn't improve how much work needs to be done, it only helps to keep a few more applications limping along
<pac85> pq: I think determining a safe margin is the hardest part. Rendering takes a variable time so I guess measuring expected time and variance then calculating the margin based on a tradeoff with the probability of missing the deadline could be a way but I'm sure it can get more involved.
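An illustrative sketch of pac85's idea, nothing more: keep an exponentially weighted estimate of recent render times and their deviation (the same trick TCP uses for RTO estimation), and start rendering that long, plus a few deviations of slack, before the predicted deadline. All names are made up.

    #include <stdint.h>

    struct frame_timer {
        double mean_ns;   /* EWMA of recent render durations */
        double dev_ns;    /* EWMA of their absolute deviation */
    };

    /* feed in each measured frame time (seed mean_ns with the first sample) */
    static void frame_timer_update(struct frame_timer *t, uint64_t sample_ns)
    {
        const double a = 0.125;   /* smoothing factor, TCP-RTO style */
        double err = (double)sample_ns - t->mean_ns;

        t->mean_ns += a * err;
        t->dev_ns  += a * ((err < 0.0 ? -err : err) - t->dev_ns);
    }

    /* when to start drawing for a given deadline: budget = mean + 4 * dev,
     * trading a little latency for a low chance of missing the flip */
    static uint64_t frame_timer_start_time(const struct frame_timer *t,
                                           uint64_t deadline_ns)
    {
        return deadline_ns - (uint64_t)(t->mean_ns + 4.0 * t->dev_ns);
    }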
<swick> so yeah, I'm getting annoyed by people increasing and decreasing the swapchain length because they have too few images for keeping the GPU busy or too many images for scheduling with backpressure, or when people notice that the frame callback is just a hint and, instead of decoupling the WSI from it, start adding more protocols to work around that
jewins has joined #dri-devel
pac85 has quit [Ping timeout: 480 seconds]
<MrCooper> I'm starting to think it would be better if we stopped thinking of frame events as being related to frame scheduling at all; it simply marks the point where attaching a new buffer can no longer replace a previously attached buffer
<swick> which would mark the deadline?
<MrCooper> the frame event happens to be a good time to start the next frame for simple clients, but not objectively "best" for all clients
<MrCooper> swick: there's no explicit protocol for the deadline yet; it can only be guessed based on presentation time events
<MrCooper> that said, normally the deadline should be around one refresh cycle after the previous frame event
<swick> > the point where attaching a new buffer can no longer replace a previously attached buffer
<MrCooper> that's the frame event
<swick> isn't that just another way to say "buffer attach deadline"
<MrCooper> as I said, "one refresh cycle after the previous frame event" is a good first approximation of the deadline, but it can vary (and does with mutter)
<MrCooper> and while currently it's the deadline for attaching the buffer, long term it will be the deadline for the GPU finishing
kts has joined #dri-devel
<daniels> swick: yes, it's a monumental effort to get from where we are to perfect. no-one is denying that perfection is a _long_ way away. no-one is trying to say that the smaller incremental improvements are perfect and no other work will ever need to be done after that. what we've been trying to say is that incremental improvements are achievable, whereas the presentation_time extension in its current form (with large known problems, no
<daniels> implementations, no real end users, etc) is a _long_ way away, and that we'd rather have something slightly better now to bridge the gap. your position seems to be that anything other than the perfect solution is a complete non-starter and should never be worked on by anyone.
<daniels> which would be more defensible if the perfect solution seemed even remotely achievable ...
<MrCooper> I agree in principle, though the flip side is the risk of painting ourselves into a bad corner with incremental changes
<MrCooper> so it's important to have at least a rough idea of where we want to end up long term
<alyssa> daniels: I read your most recent (3) messages out of context and thought you were talking about Mesa CI ;-p
<daniels> heh
<tomeu> same here :)
<alyssa> jasuarez: how do you feel about my conditional rendering proposal?
<alyssa> I can give a go at implementing it today if you'd like
<daniels> MrCooper: indeed the interactions could be hairy. in a lot of cases you might just have to have the incremental solutions and the full solution be mutually exclusive per-surface, and I think that's totally OK. but that also goes both ways: if the scope of presentation timing keeps on creeping infinitely (seriously, it is not a replacement for surface suspension ...) then it will become impossible to implement alongside anything
<daniels> else because it will have subsumed everything else.
zehortigoza has quit [Remote host closed the connection]
<alyssa> jasuarez: Your MR shows basically what has to be done.. not sure if I'd cherrypick anything there directly (maybe the first patch and then squash changes in) but yeah
<alyssa> At the very least we should land "gallium: add common functions for conditional render
<DemiMarie> daniels: Time to insert a rant about buggy games that insist on redrawing every loop?
<alyssa> ", but going all the way and doing the conditional_draw_vbo indirection dance would accomplish the goal of "gallium: move conditional render to gallium" without the overhead
<daniels> DemiMarie: again, 'buggy' is a subjective value judgement. there are a lot of games developers who will tell you it can't practically be done any other way, at which point it's downgraded from 'objectively bad omg no' to 'an unfortunate necessity that we have to live with'.
<alyssa> though I understand if you're burned after implementing the MR and getting that patch nak'd and would rather not respin this, which is why I'm offering to do so
<DemiMarie> daniels: why will they say that?
<daniels> maybe I'm just getting old, but I'm increasingly skeptical of tilting at windmills rather than trying to make things work well for real people
sdutt has joined #dri-devel
<Frogging101> And whether they're right is irrelevant because unless they all rewrite their games, real apps will be broken
<daniels> ^
<daniels> DemiMarie: there are any number of talks, blog posts, etc, which explain how modern game engines are designed and work out there
<DemiMarie> daniels: valid
<DemiMarie> hence the idea of (if I understand correctly) NOPing the spurious render calls
<daniels> provided you can determine that they're spurious ...
<Frogging101> DemiMarie: It's not really our place to decide whether developers' apps are buggy or not. The choice is between having them work and having them broken
<jasuarez> alyssa: sorry, was having lunch :)
<DemiMarie> Frogging101: yeah, the joys of working on open systems where one doesn’t control all of the code
<jasuarez> the idea looks good, but I understood from zmike and mareko that we're not going ahead with doing the change in gallium
<alyssa> jasuarez: zmike and mareko object to the increased overhead, particularly for zink and radeonsi
<alyssa> overriding draw_vbo in general (as patch 2 of the MR does) increases overhead for everyone, which is nak'd
<bl4ckb0ne> DemiMarie: beats working on a closed system where one doesn't control the code at all
kts has quit [Ping timeout: 480 seconds]
<alyssa> with my proposal, the new common code only executes at all if a driver opts into it (i.e. does not have hardware conditional rendering) AND the application uses conditional rendering
<alyssa> There's no change to frontends or gallium core
<jasuarez> I see... so the idea is keeping the original draw_vbo untouched
<alyssa> Yep
<jasuarez> and add a conditional-checking one that drivers opt into
<alyssa> Yep
<jasuarez> nice
<alyssa> When conditional rendering is switched on for a driver that doesn't support it, draw_vbo is swapped out for a version with the check
<jasuarez> yeah, as said, I like the idea, and I'd like to go with it
<alyssa> cool
<alyssa> do you want me to give a go at implementing it today? or do you want to own this?
<jasuarez> if you want to go ahead, no problem
<alyssa> (the urgency being that I run off to university land next week and won't have much time for Mesa until next April)
<jasuarez> if you prefer I do it, no problem either :)
<alyssa> (so if you want me to do it, it'll be today)
<jasuarez> OK, then go ahead
<alyssa> Ack
zehortigoza has joined #dri-devel
<DemiMarie> alyssa: have fun at university
<jasuarez> thanks!
<alyssa> DemiMarie: Thanks, I won't :-)
<jasuarez> :)
<DemiMarie> alyssa: dislike school?
<pq> Seems like there are two development tracks: make current stuff work vs. design how it would be ideal. Maybe those two need to be tackled and implemented completely independently. Then build a wrapper for the "current" stuff, so you can punt anyone still using that to a compatibility layer.
<alyssa> DemiMarie: I certainly prefer writing code to talking about writing code
<DemiMarie> pq: is there also a lesson for graphics API designers?
<daniels> pq: continual convergence and divergence
<alyssa> See also: academic papers, corporate meetings, ...
<DemiMarie> alyssa: you’re a professor?
<alyssa> DemiMarie: Disgruntled undergrad
<DemiMarie> alyssa: Working on Mesa as an undergrad? Wow!!
<DemiMarie> pq: the lesson that comes to my mind is that WSI functions should *always* be asynchronous
<DemiMarie> And take a callback to be called on completion
<DemiMarie> Plus providing some sort of handle that can be used with native system event loop facilities
<pq> DemiMarie, I would agree, and I know people who disagree: it's more code to write for cross-platform frameworks than blocking + threads.
<DemiMarie> pq: for one, it means that the API becomes inherently OS-dependent.
<DemiMarie> <insert rant about Microsoft not providing public API functions for integrating I/O completion ports and message loops>
pac85 has joined #dri-devel
kts has joined #dri-devel
<alyssa> zmike: btw, my conditional rendering in software proposal should also help zink a bit I think
<zmike> uh ?
<alyssa> you should be able to delete your zink_check_conditional_render code
<alyssa> and instead do, like,
<alyssa> ctx->render_condition = screen->have_EXT_conditional_rendering ? zink_render_condition : u_default_render_condition
<zmike> oh huh
<zmike> I thought I deleted that already
<alyssa> which should actually be faster than what you have now on hw/vk drivers that support conditional rendering
<zmike> the sw path is gone since gpuinfo showed majority support
<alyssa> (since the check would be optimized out of the draw-vbo path)
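Hypothetically, the opt-in could look something like this. Only u_default_render_condition is a name from the discussion; u_cond_state, u_cond_query_passes, and the struct are invented for illustration, and the hook signatures are from memory, so treat this as a sketch of the swap-out idea, not the eventual patch.

    #include "pipe/p_context.h"

    /* hypothetical: blocks on pipe->get_query_result() and compares the
     * result against the active condition */
    bool u_cond_query_passes(struct pipe_context *pipe,
                             struct pipe_query *q, bool condition);

    /* hypothetical: per-context storage for the saved hook and condition */
    struct u_cond_state {
        void (*real_draw_vbo)(struct pipe_context *,
                              const struct pipe_draw_info *, unsigned,
                              const struct pipe_draw_indirect_info *,
                              const struct pipe_draw_start_count_bias *,
                              unsigned);
        struct pipe_query *query;
        bool condition;
    };
    struct u_cond_state *u_cond_state(struct pipe_context *pipe);

    static void
    u_cond_draw_vbo(struct pipe_context *pipe,
                    const struct pipe_draw_info *info, unsigned drawid,
                    const struct pipe_draw_indirect_info *ind,
                    const struct pipe_draw_start_count_bias *draws, unsigned n)
    {
        struct u_cond_state *st = u_cond_state(pipe);

        /* the only overhead added: one query check, and only on drivers
         * that opted in AND while a render condition is actually bound */
        if (u_cond_query_passes(pipe, st->query, st->condition))
            st->real_draw_vbo(pipe, info, drawid, ind, draws, n);
    }

    void
    u_default_render_condition(struct pipe_context *pipe, struct pipe_query *q,
                               bool condition, enum pipe_render_cond_flag mode)
    {
        struct u_cond_state *st = u_cond_state(pipe);

        (void)mode; /* simplified: ignore wait/by-region modes here */
        st->query = q;
        st->condition = condition;
        /* swap draw_vbo in/out so the unconditioned fast path stays intact */
        pipe->draw_vbo = q ? u_cond_draw_vbo : st->real_draw_vbo;
    }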
<alyssa> bah, will panvk need VK_EXT_conditional_rendering then?
<zmike> uhhh
<alyssa> it'll be a pain on anything older than mali-g610
<zmike> yes?
* zmike bats lashes furiously
<zmike> zink is a future-only driver
<zmike> if the hardware is currently shipping I don't want to run anything on it
<alyssa> Yeah, I get that
<alyssa> mali-g610 is also the first mali that you can write a vulkan driver for without quitting your job
<alyssa> so. there's that.
<alyssa> if the older stuff is GLES only, fine
<zmike> sounds nice
<alyssa> if the newer stuff is VK only and Zink for GL/ES, fine
<DemiMarie> alyssa: if you are Alyssa Rosenzweig, why is the M1 driver GL only?
<DemiMarie> also, “without quitting your job”?
<swick> daniels: the problem is that I don't see the incremental improvements you talk about
<swick> decoupling the WSI from frame callbacks to some degree seems sensible, yet people get stuck on a protocol for saving power and memory
<DemiMarie> The other thing I would do if designing a WSI is call abort() if the WSI functions are misused
<pac85> DemiMarie: "Time to insert a rant about buggy games that insist on redrawing every loop?" chances are that suspending rendering in a game could be possible but keep in mind that rendering might feed other parts of the engine, for example when you can click on 3d objects to select them it's usually done by writing object ids to a buffer and reading it back. Also there are things that happen entirely on the gpu such as particle
<daniels> swick: because it's a mechanism which very explicitly informs the client that presentation is fruitless, rather than have it try to retcon whether or not it would be fruitless by trying some stuff and seeing what happens
<pac85> Also there are things that happen entirely on the gpu, such as particle simulation and rendering. Keeping all of that in mind you can see how you can't just stop rendering while running the main loop to keep things like netcode happy; it would be a lot of effort to keep everything in sync, and you'd get nothing back, because people rarely run games off screen so you are not really saving on anything
<pac85> Really you have to look at it from the developer perspective: you only do something if it adds value to what you are doing, otherwise it's a waste of time. Also, if I understand correctly, the problem is that wayland compositors will just suddenly stop calling the frame callback without any chance for the client to take action before that happens (aside from timer fallbacks and other hacks)
<pac85> Sorry for the wot
<pac85> One last thing is that not all software can be changed, and sometimes you need to change other parts around it even if it makes very little sense from a design perspective; making things work should be top priority, since no one will be happy with a system that looks well designed but breaks apps.
<daniels> swick: there are a lot of valid and good usecases for presentation time, but some of the suggestions for 'no don't do this just make presentation time do it' are an XY problem
<DemiMarie> Yeah, games are REALLY annoying because they are all proprietary, never get updates, and are full of bugs and security holes.
<alyssa> DemiMarie: gles2 is easier to spin up than vk, gles2 will get us real stuff working soon while vk1.0 needs bunches of extensions to be useful for real world Vulkan apps or to run Zink for aforementioned OpenGL apps
<alyssa> for now, opengl dominates FOSS linux
<DemiMarie> alyssa: any plans for VK eventually?
<alyssa> that will change but I'm pragmatic and writing a simple GL2 driver is way easier than writing a VK driver for use with Zink/ANGLE for GL2 apps
<alyssa> yes, absolutely
<DemiMarie> As a Rust developer: please implement robust image access
<alyssa> ella-0: has been looking at a VK driver
<swick> daniels: I already pointed out in a comment that some compositors exist which do throttle to a very low framerate instead of suspending surfaces completely
<alyssa> image access is probably already robust
<karolherbst> alyssa: I am wondering how the work would compare if one only focused on the bits zink needs to provide GL2
<DemiMarie> nice
<swick> daniels: the idea that frame callbacks arrive either at a "reasonable" rate or the surface is suspended is just not true
<alyssa> karolherbst: I guess if you don't care about passing the VK CTS (only the GLES CTS), it might be less
fab has quit [Quit: fab]
<daniels> swick: all of that is true but I don't see the relevance
<karolherbst> yeah.. that would be my impression as well
<alyssa> but there are piles of stuff that you can get away without in GL but absolutely need for VK 1.0
<karolherbst> guess we'll never find out :P
<karolherbst> yeah..
<alyssa> (layered rendering, GPU copies, ...)
<alyssa> render passes
* ccr thinks alyssa's latest blog-post was enlightening about current and future state of M* graphics https://rosenzweig.io/blog/asahi-gpu-part-6.html
fab has joined #dri-devel
fab has quit []
<agd5f> is msleep() allowed in drivers when CONFIG_PREEMPT is enabled?
<swick> daniels: the suspension protocol can't guarantee a reasonable rate of frame callbacks which means it's only useful for power and memory savings, not for fixing timing issues
<DemiMarie> Is anyone else here interested in working on virtualized graphics?
<swick> daniels: that *requires* changes in the WSI and EGL
<alyssa> karolherbst: also, the infrastructure for GL in Mesa is way more mature than VK
<alyssa> although jekstrand's heroic efforts are closing that gap
<alyssa> (u_blitter is a biggie)
<karolherbst> they do indeed
<zmike> 🙏 util 🙏 blitter 🙏
<DemiMarie> Right now, the standard for virtualized graphics is “either you have hardware SR-IOV, you render in software, or you take a big security risk”
<karolherbst> DemiMarie: yep, sounds about right
<DemiMarie> alyssa: Thanks for the explanation!
<alyssa> DemiMarie: 🙏
<karolherbst> my IRC client displays 🙏 really weirdly
<DemiMarie> karolherbst alyssa: this situation is obviously horrible and fixing it is something that has been on my mind for years
<DemiMarie> okay, maybe 1 year or a bit more
<karolherbst> DemiMarie: well.. the thing is without SR-IOV you can only implement virtualization in software really
<karolherbst> if the hw can't do the isolation itself, the driver has to
<DemiMarie> Actually more than that
<alyssa> DemiMarie: and the other open secret is that I know how to write a GL driver with Gallium -- I started my first years ago -- I don't know how to write a VK driver
<DemiMarie> karolherbst: yes, the driver has to
<DemiMarie> alyssa: Wow!!!!
<alyssa> so when I got my shiny new Apple hardware and wanted to write *some* driver, of course I'll reach for the one I know how to write :p
<karolherbst> and even with SR-IOV, I doubt it would be actually safe
<DemiMarie> karolherbst: wait what???
<karolherbst> what do you think how the hw isolates stuff
<karolherbst> it's still one gpu
<DemiMarie> SR-IOV being unsafe is shocking
lygstate has joined #dri-devel
<karolherbst> I was more thinking about side channel attacks
<lynxeye> agd5f: Why wouldn't it be allowed? The sleep is preemptible, in fact it's voluntary preemption.
<alyssa> i'm hyped for this ^
<karolherbst> like imagine you have 80 SM on an nvidia GPU, guess what SR-IOV is doing
<karolherbst> just slicing those 80 and distributing them across the virtual GPUs
<MrCooper> agd5f: don't see how "in drivers" would affect that; it depends on the execution context (process, IRQ handler, ...) of the code?
<karolherbst> but who guarantees there is no side channel attack possible
<pac85> alyssa: are you doing the reverse engineering yourself for the m1 gpu?
<karolherbst> maybe in 20 years alyssa will talk with Apple about releasing docs :P
<pac85> I've read some blog posts about the mesa driver for the m1 but I'm not sure it was you
<DemiMarie> karolherbst: I would expect the hardware has an extra layer of memory mapping the way CPUs do, and statically partitions the execution units.
<DemiMarie> (including caches, etc)
<ccr> ":D"
<karolherbst> :D
<DemiMarie> So one basically has N different GPUs
<karolherbst> ehh... no
<DemiMarie> What???
<ccr> that would be expensive
<karolherbst> it has to be cheaper than just having two GPUs otherwise SR-IOV would make no sense
lkw has quit [Ping timeout: 480 seconds]
<karolherbst> SR-IOV is already a compromise
<karolherbst> SR-IOV partitions don't have to align with actual hw boundaries of anything
<DemiMarie> karolherbst: of course it would be cheaper, those N different GPUs have 1/N the power of the original
<pac85> How are execution units partitioned? Are they allocated dynamically? If so, how are hangs managed?
<karolherbst> I would expect at least execution units to be statically partitioned, but you don't have to
<pac85> How does recovery work with virtualized GPUs?
<DemiMarie> karolherbst: I sincerely hope your concerns turn out to be unfounded, because if they are not, then Qubes is basically screwed
<karolherbst> but I wouldn't be surprised if you could extract information from neighboring execution units
<karolherbst> through timing attacks or other fun stuff
<karolherbst> DemiMarie: well.. we'll never know until somebody finds an exploit
<DemiMarie> More generally, we need some sort of solution that:... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/ORLYvhzplfhvcJTyulMcJoVn)
<karolherbst> imagine we would have talked about weirdo CPU side channel attacks on branch speculation 5 years ago :P "I mean, those are different cores, right?!? how can that be insecure"
<DemiMarie> karolherbst: those were the same core
<karolherbst> uhm... no, not all
<DemiMarie> None of the Spectre attacks can leak data across physical cores
<DemiMarie> Are you referring to a different attack than I am?
<karolherbst> ehh.. probably, I was more thinking about rowhammer or other timing attacks actually...
heat_ has joined #dri-devel
<karolherbst> but I was under the impression there was a CPU wide problem as well
<DemiMarie> karolherbst: yes, rowhammer and cache-based attacks work cross-core
<karolherbst> yeah.. cache attacks I meant
<DemiMarie> but those apply equally to the CPU and the GPU
<DemiMarie> so adding GPU acceleration at least doesn’t make the problem worse
<DemiMarie> if anything, I suspect that GPU stuff tends to be less vulnerable, because the kinds of programs that are vulnerable are not fast on a GPU
<karolherbst> well
<karolherbst> doesn't matter anyway
<karolherbst> the system firmware can read out your framebuffer anyway
<karolherbst> DemiMarie: you know what's an awesome feature? This intel management stuff comes with a VNC server
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
<DemiMarie> karolherbst: and is disabled by default
<karolherbst> sure, but the firmware can still do stuff :P
<karolherbst> but yeah...
<DemiMarie> my point is that GPU acceleration does not make any of that worse
<pac85> There have been some vulnerabilities found in consoles where a malicious shader could gain access to main memory through the IOMMU (that was configured incorrectly), is there anything that makes this kind of attack impossible nowadays?
<karolherbst> pac85: "just don't write crappy code?"
ahajda has joined #dri-devel
<karolherbst> it's not a technical issue, so you don't have any technical ways of preventing that
<DemiMarie> karolherbst: have you heard of Arc Compute?
<karolherbst> nope
<DemiMarie> They are working on GPU virtualization
danvet has quit [Ping timeout: 480 seconds]
<karolherbst> well there are some who do things like that, yeah
<pac85> Comforting, so I guess those attacks could still exist on modern hw
<karolherbst> pac85: as long as there exists software, there exist vulnerabilities
<agd5f> MrCooper, not in an interrupt context. Seeing "scheduling while atomic" errors due to msleep()s at driver load time, but only seemingly when CONFIG_PREEMPT is enabled
<DemiMarie> What I want to see is formal verification with translation verification and hardware/software co-design (where you prove that the HW actually satisfies the assumptions that the SW makes)
<karolherbst> doesn't help with anything
<pac85> You are absolutely right on that
<karolherbst> again, formal verification is a fallacy and doesn't address any problem
<agd5f> during device_pci_probe() for example
<karolherbst> it's a technical solution for a non technical problem
<DemiMarie> karolherbst: translation verification means proving that the compiler did not introduce any bugs
<MrCooper> agd5f: there you have it, the code runs in atomic context, so it can't sleep
<karolherbst> yeah.. not possible
<karolherbst> also
<karolherbst> again "it's a technical solution for a non technical problem"
<DemiMarie> karolherbst: it very much is possible, seL4 did it
<karolherbst> you think they did
<agd5f> MrCooper, device probe is atomic?
<karolherbst> of course you can have that checkmark saying "formal verification, our software is secure (tm)"
<DemiMarie> I will admit that your verification toolchain can have bugs, but it is still a much higher level of assurance than would be present otherwise
<karolherbst> but what does it even tell you? okay, somebody did some stuff, and things, but does it actually help? no clue
<MrCooper> agd5f: maybe some spinlock is locked?
<DemiMarie> karolherbst: it is not perfect (ex: the foundry could screw up, your proof checker could be unsound, etc), but it reduces the risk of a vuln by (as a wild guess) several orders of magnitude
<karolherbst> well... but you are trying to solve a non technical issue with it
<DemiMarie> karolherbst: what non technical issue?
<karolherbst> software quality
<agd5f> MrCooper, good call. let me check that
<DemiMarie> karolherbst: that is what it is meant to solve 😛
<karolherbst> it doesn't fail because $technical_reasons, it fails because big manager says you got 20 days
<MrCooper> agd5f: though then I'd expect it to complain regardless of CONFIG_PREEMPT
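A toy reproducer of the bug class being debugged here (illustrative only, not agd5f's code): with CONFIG_PREEMPT, spin_lock() bumps preempt_count, so a sleep inside the critical section trips the "scheduling while atomic" check; on a non-preempt build without CONFIG_DEBUG_ATOMIC_SLEEP the same bug can pass silently, which fits the "only with CONFIG_PREEMPT" observation.

    #include <linux/delay.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_lock);

    static void example(void)
    {
        msleep(10);                   /* fine: process context may sleep */

        spin_lock(&example_lock);     /* preempt_count is now non-zero */
        msleep(10);                   /* BUG: "scheduling while atomic" */
        spin_unlock(&example_lock);
    }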
<DemiMarie> karolherbst: OH
<DemiMarie> It does not solve that problem indeed
<DemiMarie> but also I did not realize that you were referring to that
<karolherbst> we can all talk about all the security here, but if you don't have the time to do anything securely it doesn't matter what you actually do
<pac85> Is it really feasible to formally prove the entirety of something like Linux?
<DemiMarie> I still think that formal verification of e.g. crypto code and microkernels helps some
<DemiMarie> pac85: with current tools, not even close
<karolherbst> pac85: with a lot of money yes
<karolherbst> is anybody willing to spend that money? no
<karolherbst> you'd have to spend _a_lot_ of time
<karolherbst> and then you have to verify all changes
<DemiMarie> karolherbst: that does not qualify as feasible
<karolherbst> correct
<pac85> If I had to guess I'd say the work of proving it would be more than the work put in to make it
<DemiMarie> pac85: you are correct the vast majority of the time
<karolherbst> all we can do in terms of improving software quality is to not cheap out on the development process
<karolherbst> but you can't have "fancy sounding technical solutions" for that, so it's a topic avoided by the higher-ups :P
<DemiMarie> the main exceptions are where formal verification can be used as a substitute for an otherwise incredibly expensive test suite, or to generate said test suite
alyssa has quit [Quit: leaving]
<DemiMarie> karolherbst: the other thing one can do is design software to limit the blast radius of a bug
<pac85> If you are making something with the sole purpose of being secure then you can perhaps work on coming up with a design to reduce the amount of code that you need to rely on, like microkernels do
<karolherbst> "we need more people working on that!" - "how about we do formal verification instead and fo fancy PR articles!"
<karolherbst> "our software is formal verified!11!111!!1"
<DemiMarie> Linux (and other monolithic kernels) fail miserably on that front
<DemiMarie> karolherbst: the people I know of who do formal verification do not do it for PR
lkw has joined #dri-devel
<karolherbst> DemiMarie: yeah, but that boils down to spending more time on software development
<karolherbst> because designing things right takes more time
<DemiMarie> karolherbst: yes
<karolherbst> DemiMarie: not saying all do, but it can be used to imply a higher level of security
<karolherbst> I mean... in theory it's all great
<karolherbst> but it doesn't solve the inherent issue
<pac85> I think in order to get past a certain level of security you need to start giving away other things
<DemiMarie> karolherbst: the only thing that can solve that is govt regulation
<karolherbst> money mostly
<karolherbst> DemiMarie: yeah... potentially
<karolherbst> make software companies accountable so the cost of screwing up is high
<DemiMarie> make it so that if a company sells a product commercially, and its firmware is hacked (without an unapproved change by the owner with a physical presence test) and damage results, the company is liable
<karolherbst> but then they simply get insurance or something
<karolherbst> and the money flows into insurance companies instead of developers
<DemiMarie> karolherbst: and then it might be “if we formally verify X our insurance premium is Y instead of 10 * Y”
<karolherbst> why would you do that?
<karolherbst> would have to spend more money on developers
<DemiMarie> yes, but it might be more than made up by the reduction in insurance costs
<karolherbst> better pay external advisors instead
<karolherbst> yes, but that's not the point
<DemiMarie> karolherbst: are you saying that business people suck?
<karolherbst> cost of labor is always more important to reduce than anything else :P
<karolherbst> DemiMarie: no way
<karolherbst> why would you think that
<DemiMarie> Also, before anyone freaks out about software freedom: I very explicitly left a loophole that allows users to modify products they own, and absolves the manufacturer of liability for vulnerabilities caused by such modifications.
<orbea> karolherbst: small point, but designing things poorly will result in more time spent in the long term working on issues than spending the extra time up front and doing it "correctly"
<DemiMarie> karolherbst: because that is how it came across to me, sorry for the misunderstanding
<karolherbst> orbea: sure, but that doesn't matter if you can have a project deadline in 5 instead of 24 months
<karolherbst> long term costs, who even cares about that :P
<karolherbst> nobody
<orbea> yea, time constraints are the limiting factor
<emersion> jekstrand: would it make sense to add a Vulkan ext to export a timeline semaphore to a drm_syncobj?
<DemiMarie> orbea: time to market is the root of all evil 😛
<emersion> i have a vulkan timeline semaphore but i don't know how i can export it
<emersion> VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT sounds like the wrong thing to ask for
<karolherbst> DemiMarie: honestly though. I wouldn't blame individuals directly, it's just "how the market works (tm)"
<DemiMarie> karolherbst: exactly
<DemiMarie> it is a systems problem and needs a systems solution
<emersion> VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT may work but is a bit hit-or-miss?
<karolherbst> yep
<karolherbst> technical solutions are great in fixing precise problems, but software security isn't one of those
<emersion> not even sure it's wired up in drivers
<DemiMarie> karolherbst: fixing it, no, improving the situation, yes
<DemiMarie> is Qubes OS perfect? no way
<DemiMarie> Is it an improvement? Absolutely.
<karolherbst> people are pragmatic, and that's what Qubes OS ignores. Sure, it helps those few people using it, but it's more of a mitigation thing than actually addressing the issue
<DemiMarie> karolherbst: it is indeed a mitigation
danvet has joined #dri-devel
<karolherbst> which is fine to a certain degree, but then people have stuff stored in other places and it becomes an issue you can't fix individually unless you go through a lot of pain
<karolherbst> and then you can already forget about any fancy security features if it's painful to use for most users
<DemiMarie> karolherbst: one can either give up or try to make the situation at least a little bit better
<DemiMarie> I chose the latter option
<karolherbst> sure, some nerds are safe, but all the other 99% of people are not
<DemiMarie> karolherbst: yup
<DemiMarie> and that sucks
<karolherbst> yeah
<karolherbst> you can only tell your boss that their approach sucks and you refuse to work :P
<karolherbst> individually I mean
<DemiMarie> but right now, it is the best I can do, so it is what I will do
<DemiMarie> that’s why I work on Qubes
<karolherbst> yeah, but that's not improving the situation, is what I meant
<DemiMarie> and why I brought up GPU virt improvements
<karolherbst> it helps a few nerds, but all the other people out there? no dice
Duke`` has joined #dri-devel
<karolherbst> it's valuable research work for sure
<karolherbst> and finding critical bugs
<DemiMarie> karolherbst: not just nerds, Qubes is also used by some dissidents, journalists, and other high-risk people
<karolherbst> yeah sure, but those are still "nerds" in their own area
<karolherbst> maybe high-risk people less
<DemiMarie> fair
<DemiMarie> One thing I am trying to do is chip away at the reasons people who want to use it do not
<DemiMarie> And lack of GPU acceleration is a huge one
<karolherbst> yeah... that's probably the most important thing to spend time on in qubes os
<daniels> swick: no-one has ever tried to argue otherwise ... the surface-suspension protocol doesn't try to guarantee a 'normal' refresh-rate-like frequency of frame event delivery. its existence would allow WSI/EGL, in cases where it's deemed helpful (prob driconf quirks), or explicitly-aware clients, to _avoid_ blocking on frame events. this is only weakly related to target present timing.
<DemiMarie> karolherbst: and that is why I asked for help in this channel
<karolherbst> or thinking about how to bring certain features to people without making it painful to use
<DemiMarie> because the Qubes team (myself included) are not graphics experts
<DemiMarie> The people in this room are
<karolherbst> yep
<karolherbst> and I think addressing that clearing VRAM issue is probably 90% of the overall issue
<karolherbst> thinking about virtualization and stuff is all nice, but then you come in conflict with business people and product engineering
<karolherbst> "yeah, but we want to request 5k bucks for real vGPU support"
<DemiMarie> karolherbst: not to mention market segmentation 🤮
<karolherbst> yeah
<DemiMarie> thankfully Intel is helping some
<karolherbst> so.. the most pragmatic thing is to address the info leak problem for now
<DemiMarie> (which reminds me: why is it taking so long for Intel to upstream SR-IOV support?)
<karolherbst> code quality
<karolherbst> see above :P
<DemiMarie> karolherbst: even once that is fixed, there still needs to be a way for guests to make the necessary API calls to the host
<karolherbst> we do want to make sure that the patches getting upstreamed are in order
<karolherbst> and some companies are really annoyed by that fact
<karolherbst> ask danvet or airlied for more details on that :P
<DemiMarie> If only companies would do upstream-first development
<karolherbst> that wouldn't help
<karolherbst> getting things upstream is _expensive_
<daniels> DemiMarie: seL4 has been formally verified based on an assumption of how the hardware works, which is not actually true in practice ... I know this because I've found the exact same bug in two seL4-derived 'secure' hypervisors. oops.
<DemiMarie> because of code quality?
<karolherbst> because of nitpicking and bikeshedding nerds on the mailing list, yes :P
<DemiMarie> karolherbst: time for less bikeshedding and nitpicking that does not actually help?
<karolherbst> well most comments are actually useful though
<DemiMarie> I don’t particularly care about code style, that should be handled by an automated tool
<karolherbst> but sometimes you can also have those annoying conversations
<karolherbst> it's all part of the overall situation
<karolherbst> DemiMarie: if we had _one_ code style
<DemiMarie> How much of the problem is the old-style patch submission workflow?
<karolherbst> that's actually a big thing rust has: just define one frigging code style and nobody has the right to complain
<DemiMarie> karolherbst: then make it something that an automated tool can handle?
<karolherbst> :D
<DemiMarie> That is one of the nice things about Rust, yeah
<karolherbst> DemiMarie: those tools have 5000 knobs
<karolherbst> and then people bikeshed on those knobs and what values they should get :D
<DemiMarie> karolherbst: if I were Linus I would have a single giant commit that was running clang-format on the entire kernel
<karolherbst> but what format?
<karolherbst> though I guess Linus could just decide something
<DemiMarie> Whatever resulted in the smallest diffstat
<DemiMarie> Yeah
<karolherbst> I want to see this discussion actually :D
<karolherbst> "here is the thing, please comment"
<karolherbst> daniels: oopsie
<DemiMarie> Once that was done I would tell maintainers to run clang-format as part of committing
<karolherbst> make it part of the build system :P
<DemiMarie> Yup
<DemiMarie> if your code isn’t formatted it does not build
<karolherbst> but I am sure we would end up with 100 clang format files
<karolherbst> so each subsystem and driver can have their own style
<daniels> code style isn't the issue with upstreaming stuff like this though, it's legit functional & design issues ...
<daniels> a lot of what's been submitted wouldn't pass formal verification, put it that way
<karolherbst> talking about code styles, the worst thing GNU did to the world is their code style
<ccr> and probably also a pile of perl/python/sed/whatever to fix up what clang-format ultimately makes ugly :/
<DemiMarie> daniels: yikes
danvet has quit [Ping timeout: 480 seconds]
<DemiMarie> karolherbst daniels: I am very much glad that you all take security and code quality seriously
<DemiMarie> ccr: those are clang-format bugs and should be reported
<mattst88> hm, how can I dump the SPIR-V given to a vulkan app? what environment variable am I missing?
<karolherbst> the most annoying thing is.. we were never taught to program in a secure way
<karolherbst> who actually _learns_ this
<jekstrand> emersion: Such an extension could be written. My concern is that, in future, if we replace our use of syncobj with a memory fence, applications are still going to depend on it.
<emersion> hm
<karolherbst> people still learn C in universities and it's the worst imaginable kind of C and "it's okay" or something
<emersion> but i need to grab specifically a drm_syncobj for my purposes
<karolherbst> I could point out 100 ways of improving software security without even mentioning one technical solution :D
<qyliss> DemiMarie: the problem with considering ugly formatting a bug is that if you fix it now you're going to cause havoc for all your existing users
<emersion> because then i do some drm_syncobj and sync_file stuff with it
<emersion> with KMS
<jekstrand> mattst88: If it's failing, MESA_SPIRV_FAIL_DUMP_PATH.
<qyliss> DemiMarie: Because then people running slightly different versions of the tool will get different results.
<jekstrand> mattst88: vtn_dump_shader()
<emersion> basically, i convert the drm_syncobj to a sync_file and then set it as IN_FENCE_FD in KMS
<DemiMarie> qyliss: that is what git submodules are for
<jekstrand> mattst88: If it's not failing, you may have to add one.
<qyliss> DemiMarie: I think rustfmt, for example, defines its output (for valid code) to be stable for this reason.
<DemiMarie> qyliss: yup
* karolherbst added rustfmt as part of rusticl's CI pipeline
<MrCooper> karolherbst: fun fact, C is one of the few things I'm still using now that I first learned at university
<karolherbst> at least for rust we have that issue covered :3
<karolherbst> MrCooper: mhhh, though my studies were more practical, so I also learned Java, HTML, etc...
<qyliss> And even if you do pin the formatter somehow, people still don't like it reformatting things when it's eventually upgraded.
<karolherbst> none of which I _currently_ use, but potentially could use
<karolherbst> but yeah.. the more theoretical the studies are, the worse it gets
<MrCooper> they taught me Java first as well, but I'm not using that anymore :)
<karolherbst> I also learned smalltalk-80 and lisp though :D
fab has joined #dri-devel
<DemiMarie> qyliss: we need semantic diffs and patches, so that reformatting does not cause conflicts
<karolherbst> MrCooper: I think I'm getting to the point where I use C more than I used java in the past :D
<MrCooper> (actually first they taught me Oberon, but then Wirth retired :)
<karolherbst> I was doing like 5-6 years of java stuff?
<karolherbst> maybe 7?
<qyliss> DemiMarie: that's not hard to do, just format before and after. You can even configure git to do this for you automatically when viewing diffs.
<qyliss> But the problem is that it's confusing when you're looking at a diff that was a formatting-only change, because you won't see anything.
<jekstrand> I've not touched Java in about 8.5 years. :D Even when I was, I was no good at it and hated it.
<karolherbst> what would be nice is to have a tool which says "those formatting changes are technically identical"
<karolherbst> as in: they compile to the identical AST or something
<jekstrand> karolherbst: If your formatting tool isn't resulting in identical AST, you have a problem.
<karolherbst> sure
<jekstrand> They should really only touch whitespace.
tursulin has quit [Ping timeout: 480 seconds]
<karolherbst> but do you know that as a reviewer?
<karolherbst> do you actually want to review a formatting commit touching all files
<jekstrand> My problem with automatic formatting isn't that I don't trust the tools. It's that I want git blame to work.
<karolherbst> mhh.. true
<karolherbst> that's a major problem
<qyliss> If you have reproducible builds you could have CI check it.
<karolherbst> maybe we shouldn't store source files anymore, but ASTs
<karolherbst> and editors turn that into formatted source code
<jekstrand> It's hard enough with all the code motion we have going on.
<jekstrand> karolherbst: Yeah, that's going to be great for code review. :P
<karolherbst> :D
<karolherbst> well you'd review the formatted source code of course
<karolherbst> could even diff it
<karolherbst> it would also allow nice things like "when was this function parameter added" instead of having to track down the commit, because that function was changed 100 times
<jekstrand> If you don't reformat everything every time, that's not a problem with plain source code text.
<jekstrand> Not much of one, anyway.
<karolherbst> yeah
<karolherbst> just have to start and make sure it stays formatted
<karolherbst> at least our rust code would be like that :D
<jekstrand> emersion: I'm mentally going round in circles on that one. :-/ On the one hand, I've thought multiple times of typing up an extension that adds VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNCOBJ_EXT and exposing that on all the mesa drivers that do syncobj. It would allow for sharing timeline semaphores across drivers which, TBH, would be pretty neat.
<Frogging101> Sometimes clang-format makes something less readable to my eyes so I ignore its changes
<emersion> indeed!
<jekstrand> But the question I have to ask myself is, "If, in the future, we get to where we use UMF for everything, can we still import/export syncobj?" I'm not 100% sure the answer to that question is "yes".
<karolherbst> jekstrand: also, what's there not to like about java, the 200 entries in the stacktraces? eclipse? pointless abstractions for no good reason? :P
<jekstrand> In which case, the question becomes, "Is everyone who's going to jump up and down about that feature going to be ok if we take it away from them 3 years from now?"
<karolherbst> Frogging101: sure, but peace is more important than people allowed to bikeshed on the format, which is kind of the point of having a defined format
<jekstrand> karolherbst: The objects, mostly.
<emersion> would it be feasible to remove support for that ext when needed?
<karolherbst> I disliked most java people being high and mighty about java in general :)
<Frogging101> Like sometimes I broke something up into multiple lines for a logical reason, because there's some conceptual separation, but clang format thinks it should be on the same line
<karolherbst> "but java can be fast" - " sure, but not the java you write"
<emersion> i have a fallback to good old "just wait"
<karolherbst> Frogging101: that's why you have those 5000 knobs
<jekstrand> emersion: Depends on how much is relying on it and, more importantly, if we know what all is relying on it and those maintainers are still around and we can get them fixed.
<jekstrand> emersion: One thing you can do is use OPAQUE_FD and hope you get a syncobj. You will for timeline semaphore exports on all Mesa drivers.
<jekstrand> And amdvlk, I think.
<emersion> yeah, i've seen that
<emersion> hm, i guess it's good enough, and it'd just fail later on
<emersion> in which case i can fallback
<Frogging101> karolherbst: Some settings might work well most of the time but poorly some of the time. Maybe I usually do want lines merged, except when I separated them on purpose
<emersion> it doesn't feel very good but oh well
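A minimal sketch of that OPAQUE_FD route, assuming (as jekstrand says holds for Mesa's timeline semaphores) that the opaque FD really is a drm_syncobj FD; device setup, enabling VK_KHR_external_semaphore_fd, and most error handling are omitted:

```c
/* Sketch: export a Vulkan timeline semaphore as an opaque FD, import it as
 * a drm_syncobj, materialize one timeline point as a sync_file, and use
 * that as IN_FENCE_FD in an atomic commit. dev_fd is the DRM device FD. */
#include <unistd.h>
#include <vulkan/vulkan.h>
#include <xf86drm.h>

int timeline_point_to_sync_file(VkDevice device, VkSemaphore sem,
                                uint64_t point, int dev_fd)
{
    /* 1. export the semaphore payload as an opaque FD */
    VkSemaphoreGetFdInfoKHR get_fd = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
        .semaphore = sem,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    PFN_vkGetSemaphoreFdKHR get_semaphore_fd = (PFN_vkGetSemaphoreFdKHR)
        vkGetDeviceProcAddr(device, "vkGetSemaphoreFdKHR");
    int sem_fd = -1;
    if (get_semaphore_fd(device, &get_fd, &sem_fd) != VK_SUCCESS)
        return -1;

    /* 2. import it as a timeline drm_syncobj — the step that only works
     *    if the driver's opaque FD is really a syncobj FD */
    uint32_t timeline, binary;
    drmSyncobjFDToHandle(dev_fd, sem_fd, &timeline);
    close(sem_fd);

    /* 3. copy one timeline point onto a binary syncobj and export that
     *    as a sync_file, ready for IN_FENCE_FD */
    drmSyncobjCreate(dev_fd, 0, &binary);
    drmSyncobjTransfer(dev_fd, binary, 0, timeline, point, 0);
    int sync_file_fd = -1;
    drmSyncobjExportSyncFile(dev_fd, binary, &sync_file_fd);
    drmSyncobjDestroy(dev_fd, binary);
    drmSyncobjDestroy(dev_fd, timeline);
    return sync_file_fd;
}
```

If step 2 fails on a given driver, that is exactly where the "just wait" fallback above would kick in.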
mbrost has joined #dri-devel
* karolherbst wondering if he should threaten to merge rusticl next week on Tuesday and ignore anybody whining about not catching the memo
<Frogging101> I think a good policy is to use clang-format before pushing, but with the discretion to ignore it sometimes
<karolherbst> nah
<karolherbst> this will suck once others touch it as well
<karolherbst> no, it has to be an all-or-nothing thing
<Frogging101> this is what mesa already does, so we're in the middle already
* karolherbst thinks he gave people enough opportunities to prevent the inevitable
<karolherbst> do we?
<jekstrand> Some projects have
<Frogging101> Yes. Well, in radv at least. The maintainers suggest you use it. But it's not automatically done
<jasuarez> I think Marge should do some of those checks as part of sanity check: verify format is compliant, commits contains proper R-b or A-b, etc
<karolherbst> ahh
<jekstrand> Others have just tried to keep clean by consistent typing.
<karolherbst> I think I mostly just don't want to have to think about it and it has to be some automatic magic which is just happening
<karolherbst> maybe make marge reformat it even
<karolherbst> dunno
mbrost_ has joined #dri-devel
<jasuarez> you don't need to think about it if you don't want; maybe having some automatic reformat tool that does it for you before pushing could help
<jasuarez> others can choose to do it manually, dunno
<Frogging101> I don't want it reformatted after I've pushed it because I may have chosen to format something a certain way for readability for a reason that the machine cannot detect
<karolherbst> inherent to this topic is the fact, that we'll never agree on it :P
<jasuarez> exactly :)
<ccr> the ultimate bikeshed
<jasuarez> what is "readable" for some people is not so "readable" for others
<Frogging101> I think there are some things that we can agree on, like that consistent formatting is a good thing and automated tools are useful and usually do the right thing
<Frogging101> I think the disagreement is what to do about the "usually"
<karolherbst> you know the community will be able to handle all social problems once it manages to get automated formatting done
<karolherbst> Frogging101: I think the hard part is the actual format
<karolherbst> everything else is trivial
<karolherbst> "uhh this format is all nice and all, but if this messes up _my_ awesome table, I'll nack it until it's solved"
<karolherbst> often the solution is to not care
mbrost has quit [Ping timeout: 480 seconds]
<Frogging101> in this particular conversation we haven't been arguing about what the format should be, we've been arguing about whether it should be applied automatically
<karolherbst> but some have hard feelings about their formatting :)
<Frogging101> If it messes up their table then leave the table alone
<pac85> You can exclude certain blocks of code from being formatted though
<karolherbst> Frogging101: yeah.. tables will be thrown once we get to the actual formatting
<Frogging101> that doesn't mean it's a bad format
<karolherbst> pac85: heresy!
<pac85> karolherbst: lol why?
<karolherbst> because it disables the true and only formatting
<pac85> Lol
<Frogging101> This is why I'm arguing for discretion, because then you have the easier task of agreeing on a "one size fits most" format instead of a "one size fits all" format
<pac85> For me personally I format everything then exclude certain parts if they really become less readable (not if they become ugly since that's subjective and irrelevant)
<jekstrand> I just type reasonably consistent code
<karolherbst> I just do what I think feels right
<pac85> Too lazy to format by hand; unless using the formatter requires more effort, in which case I do it manually.
<pac85> Anyway, is the iommu used by default by current gpu drivers or is the gpu virtual memory the only thing between a shader and the rest of the system?
<karolherbst> uhm.... no and... yes
<karolherbst> but I have good news for you
<karolherbst> we have this awesome feature where the GPU gets the same VM as the application has
<karolherbst> so to thin down the abstractions a little
<pac85> Interesting, how is that achieved?
<karolherbst> you mirror it
<pac85> Oh so you mirror the application vm on the gpu
<karolherbst> yeah
<pac85> That's really clever
<pac85> Til, thank you
<karolherbst> and if you access memory on the RAM side on the GPU, you handle that in a page fault handler
rkanwal has quit [Quit: rkanwal]
<jekstrand> On Intel integrated, there's no IOMMU involved. GPU page tables are the only thing sitting between shaders and system memory.
<clever> pac85: what is? :D
<jekstrand> I suspect AMD APUs are the same.
<pac85> Lol
<pac85> So integrated graphics basically go through the same memory subsystem?
<jekstrand> Yup
<jekstrand> They can even share caches
<karolherbst> well.. they use the same RAM afterall
<jekstrand> For discrete cards, I'm not sure about all of them. Intel's also go pretty much straight to memory with no IOMMU setup.
<clever> jekstrand: ive seen one cpu, with a 128mb L4 cache, so the cpu could use it when the gpu isnt
<karolherbst> in nouveau we have optional IOMMU support
<jekstrand> IDK about AMD and NVIDIA. I've not spent that much time looking at them.
<jekstrand> Most mobile things go through IOMMU, I think, but I'm not an expert there.
<karolherbst> not sure why we even support IOMMU, but probably because of Tegra
<jekstrand> If you do PCI passthrough to stick your GPU in a VM, then it's definitely going through IOMMU.
<karolherbst> sure, but that's not really something we care about atm
<karolherbst> but maybe that's one reason
<jenatali> At least on Windows I don't think anybody currently uses IOMMU
<jekstrand> I don't remember seeing any IOMMU when I read through the AMD page table code a year back.
<pac85> So before virtual memory on GPUs they were basically making all security measures useless, right?
<jekstrand> Yeah, but for most GPUs you have to go back a LONG way to not have virtual memory of any sort.
<clever> pac85: i remember seeing code on the vc4 (rpi0 to rpi3) driver, where it will audit the shader code in kernel mode, check for every potential reference to memory, and then substitute in the address of buffers you allocated
<jekstrand> Yeah, vc4 is notoriously horrible for this. :)
<clever> so while the gpu is free to do anything, the shaders passed to it are restricted
<jekstrand> Don't go running webGL on your raspberry pi unless it's a 4. :)
<clever> jekstrand: v3d (pi4) improved things, by having a dedicated mmu inside the 3d core, but the driver doesnt swap out the paging tables
<clever> so every GL client is sharing a single address space
<pac85> clever: sounds kind of fragile
<karolherbst> clever: wait what?
lkw has quit [Ping timeout: 480 seconds]
<jekstrand> clever: Yeah, that's the way Intel is pre-IVB or so.
<jekstrand> IDK what PPGTT status is on HSW/IVB. BDW+ have separate address spaces per process.
<MrCooper> jekstrand: agd5f may correct me, but I would have thought AMD APUs go through the IOMMU for system memory access
<jekstrand> clever: But that's still way better than vc4 with no IOMMU at all.
<karolherbst> clever: ohh, I didn't mean the page table stuff, more the shader parser
<jekstrand> MrCooper: That's entirely possible. Like I said, I read the page table code but wasn't really looking for that and it's complicated so I very easily could have missed it.
<clever> jekstrand: yeah, until you're using a gl based compositor, and now your password manager is in texture ram
<jekstrand> clever: Oh, sure.
<karolherbst> okay, security on old hardware is broken, but requiring the kernel to parse shaders is a bit too extreme
<clever> karolherbst: its kinda needed anyways, for the kernel to be able to fill in the physical address of buffer objects
<karolherbst> uhhh
<karolherbst> that sounds so wrong
<clever> you dont have many options, either trust the offsets userland gives you, or give userland the raw physical pointers
<clever> or parse the object, and fill in the pointers for userland
tobiasjakobi has joined #dri-devel
<karolherbst> I am actually wondering how anybody could think: yep, that sounds sane, let's design the hw this way
tobiasjakobi has quit [Remote host closed the connection]
<clever> karolherbst: v3d also has co-operative threading, a shader can run a special yield opcode, to swap out with a buddy, and that swaps the upper and lower half of the shader registers
<clever> karolherbst: but, that is only safe if both shaders use the lower half exclusively
<karolherbst> uhhh
<jekstrand> karolherbst: vc4 was never designed to be put in something you might call a PC. It was designed for set-top boxes where the vendor has total control over SW.
<clever> so the kernel has to audit the shader, and see if its using the lower-half, or the full range
<clever> and then only schedule the lower-half shaders on the same core
<clever> while full-range ones cant thread
<karolherbst> jekstrand: well... they don't have total control over the SW? :D
<karolherbst> or at least not what random people would do to it
<karolherbst> but sure
<jekstrand> I didn't say it was a smart choice
<clever> karolherbst: think the original ipod, where you cant even install apps
<clever> one of the original ipod's was videocore based
<karolherbst> clever: well.. that doesn't prevent you from actually installing stuff if you really want to :)
<jekstrand> And I'm not talking designed for what we currently call a smart TV. I mean like your super closed zero-apps DISH streaming box.
<pac85> I guess that's how you lose control over the sw lol
<clever> also, the arm isnt a trusted part of the chip, and by default, the arm core cant drive the 2d or 3d subsystems
<clever> the arm is supposed to just give orders to the VPU, and the trusted VPU firmware does all the dma-adjacent activity
<clever> which is how the rpi did gl, before mesa got involved
<pac85> I've heard that on the raspberry pi the gpu is what boots first
<karolherbst> gl inside firmware?
<clever> pac85: kinda
<clever> karolherbst: yes
<karolherbst> mhhhh
<clever> pac85: the rpi has a dual-core VPU, where the firmware runs, and then 1 or 4 arm cores, where linux runs
<clever> plus the 3d core, with its own compute cores, for shaders
<pac85> Isn't having an entire driver there slower than running it on the big cpu?
<vsyrjala> jekstrand: ivb/hsw also using aliasing ppgtt
<clever> originally, the entire opengl implementation was running on the VPU
<karolherbst> oh wow
<clever> pac85: the VPU is a 500mhz dual-core cpu, with vector extensions, it has plenty of grunt
<karolherbst> I'd be interested if that actually has any benefits of doing it this way
<pac85> Oh
<pac85> What architecture?
<clever> something custom
<clever> possibly a synopsys based DSP
<clever> the people that know, arent speaking
<pac85> So perhaps it could even offload the arm cpus
<clever> yeah
<clever> for (int i=0; i<16; i++) { int temp = a[i] * b[i]; if (store) c[i] = temp; if (accumulate) accumulator[i] += temp; }
<karolherbst> but why not just get rid of them and add 2 more arm cores :P
<clever> as an example, the VPU has a vector opcode, that can do this entire operation in 2 clock cycles, at 500mhz, non-stop
<clever> karolherbst: because the entire security model doesnt trust the arm core, and it would be a major redesign from the ground up
<clever> it was designed before trustzone was a thing
<pac85> Well that's interesting
<karolherbst> I don't trust GL implemented in firmware :P
<clever> pac85: each VPU core, has a uint8_t[64][64] matrix (a full 4096 bytes!!)
<clever> pac85: and every vector opcode, gives an XY+direction (row or column) for the 2 operands and output
<clever> it can also concat 2 or 4 vectors of uint8_t[16] to create a 16bit or 32bit value
<pac85> intetesting design, are there publicly available tool chains for it?
<clever> yes and no
<clever> the official toolchain is behind NDA
<clever> https://github.com/itszor/vc4-toolchain but some crazy people reverse engineered the ISA, and ported gcc
<pac85> Amazing work
<daniels> karolherbst: it's also just history - most of these special-purpose processors started off as basically just a DSP, and over time the CPU grew from 'occasionally issues control instructions but rarely alive' to the one which actually ran the show. mobile was exactly the same way, where the CPU was a weird annoying younger sibling to the baseband processor
<karolherbst> yeah.. probably
<clever> pac85: this is some VPU vector asm i wrote, for implementing a fir filter
<clever> it loads a pair of int16_t[r0*16] arrays, multiplies element n with element n in the other array, sums all of the products up, and returns sum>>15
<clever> pac85: when it gets a cache hit, with uint16_t[128]'s, it can do that entire computation in 77 clock cycles
<clever> which lets it compute ~6.4 million samples/second
<clever> assuming that entire [128] is in the L1 cache at all times, and not counting the loop calling it
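For reference, the scalar C form of the FIR kernel clever describes (the VPU asm retires 16 of these multiply-accumulates per vector opcode):

```c
/* Reference C for the FIR inner loop described above: multiply int16_t
 * sample/coefficient pairs element-wise, accumulate, and return sum >> 15
 * (Q15 fixed point). */
#include <stdint.h>

int16_t fir_q15(const int16_t *samples, const int16_t *coeffs, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)samples[i] * coeffs[i];
    return (int16_t)(acc >> 15);
}
```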
<clever> daniels: another crazy fact, there is an MMU between "arm physical" and real ram, because the arm is meant to be a jail for untrusted software
<clever> so you can have a dedicated chunk of ram for the decrypted video frame, and the arm can never read it, no matter how hard you exploit the arm code
<pac85> clever: amazing, I had no idea how dsp processors worked, it's pretty far from anything I've seen
<clever> and since the VPU is driving GL, the shaders can composite those secure frames into the framebuffer
<pac85> Sounds like something used for DRM (content protection)
<clever> yep
<daniels> clever: yeah, that's not entirely uncommon from that world
<daniels> TI and NXP have the same
<clever> there is also other fun stuff
<clever> you can configure the soc, such that mmio from userland just never works
<clever> so you can mmap /dev/mem all day, and can never touch a single peripheral
<clever> and various peripherals, each have their own security
<clever> the 2d core (kms, composition), is read-only by default
<clever> and has its own control register to allow arm to write
<clever> pac85: https://www.youtube.com/watch?v=GHDh9RYg6WI this is a demo i did, driving the 2d&3d cores from baremetal on the VPU, the arm core is not even enabled
<DemiMarie> Could one run Linux on the VPU?
<clever> DemiMarie: the VPU has no mmu, so you would need a no-MMU build of linux
<clever> but otherwise, it should be possible, just need to port linux
jkrzyszt has quit [Ping timeout: 480 seconds]
pac85 has quit [Read error: Connection reset by peer]
<DemiMarie> clever: what made them think they could run GL on the VPU without getting the VPU pwned?
<DemiMarie> Also, is the VPU faster than the CPU?
<clever> DemiMarie: the special arm mmu lets you protect the VPU code from the evil arm, and there is signature checks in the boot chain
danvet has joined #dri-devel
<clever> DemiMarie: originally (pi0/pi1, bcm2835), the VPU was a dualcore at 500mhz, and the arm was a single core at 700mhz, so the VPU won by having more cores and better vector extensions
<DemiMarie> clever: I get that you can protect the VPU, but if the VPU firmware is written in C/C++, I am almost certain the OpenGL implementation has a memory corruption vuln somewhere.
<clever> but the arm has progressively improved, and the VPU hasnt
<DemiMarie> now is the VPU better for anything?
<clever> DemiMarie: the VPU also has a secure/nonsecure state, certain registers can only be accessed in secure mode, and there is a syscall style interface to run functions on a whitelist
<DemiMarie> Other than e.g. a firmware TPM
<clever> DemiMarie: i did a FIR benchmark on the pi400 (the fastest arm in the rpi line), and 4 cores just barely out-did a FIR test on the VPU
<clever> DemiMarie: the pi2 VPU runs at the same rate
<DemiMarie> clever: So the VPU really is a DSP then
<clever> yeah
pac85 has joined #dri-devel
<pac85> clever: loved that demo, how did you gain enough knowledge about the 2d and 3d core to do it?
<clever> it can also saturate the dram bus to ~91% of theoretical performance, with raw uncached reads
<clever> pac85: for the 3d core, i originally drove it under linux 9 years ago, via /dev/mem: https://github.com/cleverca22/hackdriver
<DemiMarie> Can the ARM cores do that?
<DemiMarie> Is there anything Linux could offload to the VPU?
<clever> pac85: for the 2d core, i read the new KMS drivers in linux: https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/vc4/vc4_plane.c
<clever> DemiMarie: the "analog" headphone jack is one thing thats already being offloaded to the VPU
<clever> it's done through a crazy chain of FFT stuff, to change the samplerate and convert it into pwm
<clever> the dma block then feeds those pwm samples to the pwm peripheral
<clever> and a low-pass filter turns it into analog audio
<clever> from the linux side, its just regular old 48khz audio samples
<pac85> clever: thank you so much!
<DemiMarie> clever: thanks!
<DemiMarie> Could a firmware TPM run in the secure mode of the VPU?
<clever> DemiMarie: but behind the curtains, there is a conversion to 781250Hz sample rate
<DemiMarie> clever: why not have Linux run at 781250Hz?
<clever> DemiMarie: kinda, the secure boot chain is crippled by the keys being pre-burnt at the factory, so anybody can run any code on the vpu
<clever> and then an evil maid can just swap the firmware out and dump everything
<clever> DemiMarie: because thats ~16 times as much data, and its not an integer multiple of standard sample rates, so it needs some complex math to convert
<clever> which would raise the cpu usage drastically
<DemiMarie> clever: what I would implement is the DICE measured boot protocol
<clever> pac85: there is also this pdf, for the 3d core: https://docs.broadcom.com/doc/12358545
<clever> DemiMarie: for the vc4 boot rom (pi0-pi3), the bootcode.bin on the SD card has to be signed by an hmac-sha1 signature
<DemiMarie> you can upload any code you want, but you get a key that depends on an HMAC of that code, so uploading anything different means you don’t have access to your secrets
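The core of the DICE scheme DemiMarie sketches is a single key derivation; a rough sketch, with sha256() and hmac_sha256() standing in for a real crypto library:

```c
/* Sketch of the DICE derivation described above: the next stage's secret
 * (CDI) is bound to a measurement of the code being booted, so modified
 * code derives a different secret. sha256() and hmac_sha256() are
 * hypothetical stand-ins for a real crypto library; the UDS must be locked
 * away before jumping into the measured code. */
#include <stddef.h>
#include <stdint.h>

void sha256(const uint8_t *data, size_t len, uint8_t out[32]);        /* assumed */
void hmac_sha256(const uint8_t *key, size_t key_len,
                 const uint8_t *msg, size_t msg_len, uint8_t out[32]); /* assumed */

void dice_derive_cdi(const uint8_t uds[32],            /* unique device secret */
                     const uint8_t *fw, size_t fw_len,  /* next boot stage      */
                     uint8_t cdi[32])
{
    uint8_t measurement[32];
    sha256(fw, fw_len, measurement);
    hmac_sha256(uds, 32, measurement, 32, cdi);
}
```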
<clever> and the key for that sha1 is an xor of a constant in rom, and a write-once constant in OTP
<clever> but the rpi's come with the key pre-burnt
<DemiMarie> clever: yikes
<clever> and every pi within a given model, has the same key burnt in
<clever> so i can hack your pi3, by just dumping the keys on my own pi3
<clever> and then replacing your SD card
<DemiMarie> clever: I would use a different key for each model, and have `bootcode.bin` just implement DICE
kem has quit [Ping timeout: 480 seconds]
<DemiMarie> (this assumes that `bootcode.bin` can have access to a secret and then revoke access to it)
<clever> RPF also didnt care about the secure-boot on that model range, the signature checks where disabled
lynxeye has quit [Quit: Leaving.]
<clever> starting with the pi4, bcm2711B0T, they started to care, and enabled the hmac-sha1 checks
<clever> then multiple community members cracked it independently :P
<clever> DemiMarie: the bcm2711B1T then introduced a new maskrom, that can also check RSA signatures, which just cant be cracked
<clever> but the rsa pubkeys (4 in total) are in the rom, so the end-user has no way to customize it
<clever> the latest pi4 bootcode.bin, can enforce actually secure secure-boot, with a user-chosen key in SPI flash
<clever> but it has 1 weak link, you're trusting that the bootcode.bin that RPF signs, is actually secure, and that they will never sign a trojan with those keys
kts has quit [Ping timeout: 480 seconds]
<DemiMarie> clever: is the source code to that bootcode.bin available?
<clever> DemiMarie: nope
<DemiMarie> clever: why?
<clever> software licenses i assume
<clever> typical closed-source firmware
<javierm> [da87e1725ae2](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da87e1725ae2) ("fat: add renameat2 RENAME_EXCHANGE flag support")
<javierm> ups, sorry
* clever pokes cleverca22[m]
<clever> DemiMarie: when you boot a pi4, it takes one path down this graph
<clever> it starts at the maskrom (no source, but also read-only), which can load a .bin from 3 places, recovery.bin on an SD card, a tagged blob in SPI, a usb-device protocol
<clever> the official recovery.bin is meant to be on an SD card, and will re-flash the SPI to unbrick things (or upgrade)
<clever> the official bootcode.bin is meant to be inside the SPI flash, it does dram init, and then executes bootmain.elf from SPI
<karolherbst> *sigh* this nv50 issue is really bugging me
kem has joined #dri-devel
<clever> https://github.com/librerpi/lk-overlay can also be compiled into a pi4 .bin file, and then be inserted into any of those 3 spots
<clever> DemiMarie: so while there is no source, the blob can be partially replaced
<clever> DemiMarie: but i lack drivers for dram init, so that path on the graph becomes a dead-end
<karolherbst> huh.. wait a second...
<clever> bootmain.elf is also closed, it loads the config file(from spi), and then looks for start4.elf on sd/nvme/usb/tftp/https (based on the BOOT_ORDER config entry), and executes that
<clever> the official msd4.elf, then has the VPU flip the dwc usb controller into gadget mode, and it emulates an MSD, to expose whatever is on the SD controller
<clever> thats used for re-flashing emmc's soldered to a CM4
<clever> and the official start4.elf brings the system fully online, and boots linux on an arm core
<clever> lk-overlay can also be compiled into a pi4 .elf file, and has been tested in start4.elf, but should also work in bootmain.elf, in theory
<clever> but ive not investigated turning the arm on from there, so thats currently another dead-end on the graph
sdutt has quit []
<pac85> I see that the vc4 is well documented in that pdf, has the mesa driver been made without reverse engineering?
sdutt has joined #dri-devel
<clever> pac85: that pdf is only covering the 3d core, the 2d core and other vc4 peripherals are missing
<clever> pac85: and the vc6-v3d has changed undocumented things with v3d, mesa has been modified to work, but there are no docs saying what changed
<clever> overall, the 3d core just takes shaders+textures+polygons, and turns it into a flat 2d image
<pac85> clever: I see so those had to be done through re? I wonder where you'd even start when the software interacting with it is a proprietary firmware for a proprietary isa
<clever> the 2d core (HVS) takes a list of flat 2d images, dest XY's, dest W/H's, composites them together, and creates a stream of pixels for a video phy
<karolherbst> dcbaker: yo... I found the bug
<clever> pac85: i think RPF hired somebody from the DRI team, had them sign an NDA, and paid them to write the new mesa
<pac85> clever: how does releasing an oss driver not break the nda?
<clever> pac85: they were told to release the oss driver, while keeping secret anything that the oss driver didnt need
<pac85> I see
<clever> things like that magic register that is needed for the arm to even talk to the 2d core
<pac85> I guess something similar happens for amd GPUs?
<clever> which rpi engineers repeatedly claimed didnt exist
<clever> until i found it in the headers they released years ago :P
<pac85> Wow
<pac85> lol
<clever> i'm not sure if they were intentionally hiding it, or they just didnt know it existed
<clever> and couldnt be bothered to search
<clever> which leads me to believe, i know some parts of the rpi hardware, better than the official engineers
<clever> just because this is in the lizard-brain of the code, that nobody has had to touch for decades
<Shibe> does anyone know where the VAAPI implementations for each GPU are? I've noticed vaRenderPicture when doing RGB to YUV conversion is taking a fair bit of time on AMDGPU and I wanted to see if it's using some sort of slow path (doing the conversion on CPU)
<Shibe> i cant seem to find them in mesa or libva
<anholt> clever: as the author of the code: 1) I was employed by broadcom 2) nda was typical employment agreement, but I wasn't blocked from releasing code using any particular registers 3) if you don't have autogen from hardware source files (which I never fought to release), then you just write headers with whatever registers you need and think you're likely to need, which is why not all registers are in headers.
<clever> anholt: ah, wasnt sure if you were awake, didnt want to ping you
<dcbaker> karolherbst: oh boy, I love bugs
<karolherbst> it's a very annoying one
<DemiMarie> anholt: did you consider asking them to release the autogen headers + docs?
ybogdano has joined #dri-devel
<clever> anholt: i believe this auto-generated header was already public when you started, but its got the wrong base addr, and i'm not sure if it fits the linux register style
<anholt> DemiMarie: not seriously, no.
<clever> it also entirely lacks the display list "registers"
pac85 has quit [Remote host closed the connection]
<DemiMarie> also, any autogen header under GPL might raise some rather nasty concerns
<DemiMarie> GPL requires the source code to be available, and generated headers aren’t source according to the GPL’s definition
pac85 has joined #dri-devel
<DemiMarie> anholt: was this because you did not care, or because you did not think you would succeed? (not going to judge you, btw, and you don’t have to answer)
alyssa has joined #dri-devel
* alyssa kinda wishes we did 1-line spdx license comments in mesa
<karolherbst> something with unions is busted :(
<karolherbst> the most annoying kind of compiler bugs
<pac85> I guess we have to be thankful we have an oss driver at all
<DemiMarie> pac85: yup
<pac85> Given how most mobile GPUs are entirely done through re
<clever> pac85: RPF has been moving towards oss drivers, for many parts of the hw, but there are parts they are still holding out on
<pac85> Afaik
<DemiMarie> anholt: I am indeed grateful that we can have an OSS driver, and if that meant using incomplete headers, then so be it.
<anholt> DemiMarie: I had permission to do what I needed, which was surprising given the corporate environment. Given that, you don't want to bother higher-ups (who might say "why are we doing this?") until you have a more useful product to justify your requests.
<DemiMarie> anholt: Okay, that makes sense, thanks!
<DemiMarie> That is a very valid decision.
<clever> pac85: the unicam (csi camera input), x265 decode, 2d core, and 3d core all have source for driving them directly from linux, the x265 is the only weird one, all of the others already had blob drivers, so this was a choice to improve things
<anholt> ultimately I couldn't push mesa into the groups that really mattered (was only making progress in little 1%-of-devices divisions), and decided to go do something more interesting to me.
<clever> the x265 core is weird, in that it skipped the blob stage, and went right to oss
<karolherbst> dcbaker: not sure if I'll be able to fix this one easily, but at least I have a few ideas now. So I'd mildly lean towards a new RC at this point
<DemiMarie> anholt: and I am glad that you did 🙂
<clever> pac85: but the ISP (camera processing, debayer yuv->rgb and more), remains in the blob
<DemiMarie> I hope I did not come across as trying to second-guess your decisions, btw.
pzanoni has joined #dri-devel
<pac85> clever: driven by the fw?
<DemiMarie> clever: how much of the ISP stuff could be done in software (either on ARM or on VPU)?
<clever> pac85: yeah
<clever> DemiMarie: probably all of it, its just bit shifting and math, the ISP is just a dedicated core that is designed to do it very fast
<karolherbst> huh.. something _very_ odd happens
<DemiMarie> clever: that is what I figured
<DemiMarie> my general rule is that one should always have a pure-software fallback, because hardware acceleration is not guaranteed to be available
<DemiMarie> and while performance of that fallback may not be good, one should try to make it at least usable
<pac85> The problem is that things like guis have fundamentally shifted
<clever> DemiMarie: yeah, that policy is visible within mesa, how it can do entirely software gl
<DemiMarie> pac85: what do you mean?
<pac85> They used to be designed around the limitations of software rendering on old hw, like you pressed a button and that small rectangle was all that would be redrawn
<DemiMarie> Are you referring to redrawing everything instead of only what is needed?
<pac85> Now you have animations, shadows and what not
<pac85> Yeah
<pac85> Though that affects the way guis are made in many ways
<DemiMarie> Why do they do this? Wouldn’t limited redraw be faster even in HW?
<pac85> Not just the way the toolkit is implemented
<clever> pac85: 2 examples of what the 2d core alone can do, on the rpi
<anholt> DemiMarie: decent compositors and toolkits do partial redraw based on buffer age.
<clever> all that really is, is alpha blending, sliding images, and different x/y scales
<anholt> (or, for compositors, use overlays and not actually any drawing)
<clever> yeah, the images i just linked, are realtime composition, nothing is ever being redrawn
<DemiMarie> pac85: one of the GTK devs told me that one can recover some SW performance by using a very flat theme
<pac85> clever: I guess that hw is not that used by linux
<DemiMarie> Which is presumably what old toolkits always did
<clever> pac85: it is exposed via the drm/kms api, but linux limits it to 32? planes, while the hardware can do over 100 planes
<pac85> DemiMarie: mmm interesting but what about animations and such?
<clever> so almost nothing actually tries to use it to this level
<clever> everybody just goes directly to opengl composition
lemonzest has quit [Quit: WeeChat 3.5]
<pac85> clever: I see, I guess it is a question of writing sw for multiple targets that don't really have that capability
<clever> yep
<DemiMarie> pac85: what animations? those things that are often more annoying (and time-wasting) than useful?
<clever> desktop GPU's may only support 2 or 3 planes max
<clever> at that point, you cant make every button into its own plane
<DemiMarie> (I get why designers would want to include them, but I think that any software that uses them should have the ability to disable them)
<pac85> DemiMarie: yes exactly
<dcbaker> karolherbst: I’ll cut an rc today then
<karolherbst> cool, thanks
<pac85> DemiMarie: but I wonder how having them affects the architecture of the software
<DemiMarie> pac85: so my main annoyance is that we *know* how to make GUIs that work perfectly okay with software rendering, and that are perfectly functional, and yet nobody does that anymore
<pac85> Also partially redrawing things seems hard when you have, say, double buffering, wouldn't you need an intermediate buffer?
<DemiMarie> it feels like software bloat to me
<DemiMarie> pac85: yeah, but `memcpy()` is very fast
<clever> pac85: you would need to track the changes on say the last 2 frames, and do a partial redraw twice
pac85 has quit [Read error: Connection reset by peer]
<clever> DemiMarie: oh that reminds me, the dma core on the rpi has a 2d stride mode
pac85 has joined #dri-devel
<clever> DemiMarie: basically, it will copy X bytes, then advance the write pointer by Y, and repeat Z times
<clever> so it can draw into a sub-rect of a bigger 2d image
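In plain C, that 2d stride mode is just a strided row copy; a sketch:

```c
/* C equivalent of the DMA 2d stride mode described above: copy xlen bytes,
 * advance the pointers by a full pitch, repeat ylen times — a blit into a
 * sub-rectangle of a larger image. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void blit_rect(uint8_t *dst, size_t dst_pitch,
               const uint8_t *src, size_t src_pitch,
               size_t xlen, size_t ylen)
{
    for (size_t y = 0; y < ylen; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, xlen);
}
```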
<DemiMarie> clever: nice
<pac85> Like a blitter
<karolherbst> RA is broken :(
<clever> pac85: you would need to track the changes on say the last 2 frames, and do a partial redraw twice
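A minimal sketch of that damage tracking with EGL_EXT_buffer_age; struct rect and the damage helpers are made up for illustration:

```c
/* Sketch of partial redraw with EGL_EXT_buffer_age, as described above:
 * the back buffer was last shown `age` frames ago, so repaint this frame's
 * damage plus the damage of every frame this buffer missed. rect_union()
 * and whole_surface() are hypothetical helpers. */
#include <EGL/egl.h>
#include <EGL/eglext.h>

struct rect { int x, y, w, h; };
struct rect rect_union(struct rect a, struct rect b);  /* hypothetical */
struct rect whole_surface(void);                       /* hypothetical */

#define HISTORY 4                  /* deeper than any sane swapchain */
static struct rect damage_history[HISTORY];

struct rect repaint_region(EGLDisplay dpy, EGLSurface surf, struct rect damage)
{
    EGLint age = 0;
    eglQuerySurface(dpy, surf, EGL_BUFFER_AGE_EXT, &age);

    struct rect repaint = damage;
    if (age == 0 || age > HISTORY)
        repaint = whole_surface();         /* unknown contents: full redraw */
    else
        for (int i = 0; i < age - 1; i++)  /* union the missed frames */
            repaint = rect_union(repaint, damage_history[i]);

    for (int i = HISTORY - 1; i > 0; i--)  /* record this frame's damage */
        damage_history[i] = damage_history[i - 1];
    damage_history[0] = damage;
    return repaint;
}
```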
lemonzest has joined #dri-devel
<clever> pac85: also, that 3d demo i linked earlier, is fully double-buffered
<pac85> clever: I see, but when you are using hw wouldn't that end up making things slower? Unless you also introduce some kind of heuristic on whether it's better to do a total redraw or a partial redraw
<pac85> clever: does the scanout go through virtual memory?
<clever> normally, no
<clever> there is a small fifo, about 3-4 scanlines
<clever> the compositor generates a full scanline at once
<clever> and the pixel valve then reads pixels out, at the configured pixel clock, and feeds them to a phy (like hdmi)
<pac85> So I guess there is a small dual ported buffer to scan out things at the pixel clock?
<clever> but, there is a rarely used mode (exposed as a writeback port in kms), that lets you skip the pv+hdmi step, and instead read the fifo directly
<clever> kinda, there are actually 2 fifo's
<clever> the hvs has one big block of ram, where it holds 3 parallel fifos (for the 3 video outputs)
<clever> the PV then has a much smaller fifo, on the order of 32 pixels, and it will read the hvs fifo in bursts
<clever> that design, implies that the hvs fifo isnt dual-port
<pac85> clever: that's why I asked whether it goes through virtual memory, if one can scan out memory and then read back what is being scanned out you sort of bypass vm right?
<pac85> clever: I see so it's basically double buffered
<clever> kinda, let me grab another example
<anholt> pac85: there are no mmus behind the vc4 v3d or the hvs. it's all bus addresses.
<clever> yep
<clever> this video shows 2 things
<clever> the main one, is that i have too many sprites on-screen, and if they all land in the same scanline, things glitch out
<clever> because the hardware only has ~5 scanlines of framebuffer
<clever> and if the compositor cant keep up, and falls 5 scanlines behind, then it just starts outputting garbage
<pac85> That feels like 16bit consoles era quirks
<clever> yep
<clever> i also made things worse, by under-clocking the system massively
kts has joined #dri-devel
alyssa has left #dri-devel [#dri-devel]
<clever> i was trying to find the limits, and its easier when you can change the freq, and graph how the limit changes
<pac85> Makes sense
<pac85> I love this kind of stuff
<clever> *looks*
<clever> > Yes, context memory (for display list) is embedded SRAM. It can do one pixel per cycle for palettised formats (compared to 4 pixels per cycle for unscaled, and 2 pixels per cycle scaled).
<clever> pac85: how many pixels/clock the hvs can generate, as its compositing
<pac85> Is this like an internal detail of the hw or is it something exposed? I wonder how you learned the hw so deeply. I always wondered how modern hw handles scanout at this kind of level
<clever> pac85: the above is a quote from one of the rpi engineers, who answered questions on the forum
<clever> but once you know that answer, you can compute how many clocks it takes, to render any given scanline on in the display list
<pac85> I see
<clever> and you can then compare it against the pixel clock and vpu clock, and know if your going to fall behind or keep up
<clever> and can then say if a given displaylist is capable of being rendered
<clever> but the answer also reveals other things
<clever> if the hardware is capable of compositing 4 RGBA8888 pixels per clock, thats 128bits
<clever> so it must be capable of moving 128bits from ram -> hvs, on each clock
frieder has quit [Remote host closed the connection]
<clever> > Most of the high-bandwidth bus masters have 128-bit data widths to SDRAM.
<clever> pac85: and that agrees with this post
<clever> pac85: but, this is also where clock domains and conflicts come into play
<karolherbst> is there a good way to fetch a "text nir" and compile it locally? :D
<karolherbst> ohh.. maybe I can do it with the TGSI form ntt
<clever> the hvs can move 128bits/clock, at 500mhz, 64 gigabit/sec
nchery is now known as Guest1752
nchery has joined #dri-devel
<clever> pac85: the LPDDR2-400 can move 32bits/half-clock, at 400mhz, 25.6 gigabit/sec
kts has quit [Ping timeout: 480 seconds]
<clever> so full bore, the hvs is going to constantly be stalling out and waiting for dram to catch up
<pac85> clever: so in that case you would see artifacts I guess
<clever> but when compositing a scaled sprite, its only needing 64 bits/clock, 32 gigabit/sec
<pac85> Right
<clever> pac85: only if the pixel clock is high enough to actually need it to run full-bore
<clever> or you have too many sprites on the same scanline
<clever> it doesnt do occlusion testing
<clever> so if a pixel is covered by 20 opaque sprites, it just draws to it 20 times
<clever> my rough understanding, is that it will go through the entire display list, test if a given sprite is on the current scanline, and if it is, compute what line of pixels to copy from (assuming 1:1 scale)
<anholt> karolherbst: if you'd like me to take a look at something, please file a bug report with basic information like "app where this happens" and "before and after tgsi"
<clever> pac85: it will then copy 4 pixels/clock from dram to the scanline fifo, and then move on to the next sprite and repeat
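A rough software model of the compositing pass clever describes; all types and blend() here are illustrative, the real HVS does this in hardware at 4 pixels/clock:

```c
/* Model of the HVS scanline pass described above: walk the display list
 * once per scanline, no occlusion culling, every sprite overlapping the
 * line is written into the line FIFO in list order. */
#include <stdint.h>

struct sprite { int x, y, w, h; const uint32_t *pixels; int stride; };
uint32_t blend(uint32_t dst, uint32_t src);   /* hypothetical alpha blend */

void compose_scanline(uint32_t *fifo_line, int screen_w,
                      const struct sprite *list, int count, int line)
{
    for (int s = 0; s < count; s++) {
        const struct sprite *sp = &list[s];
        if (line < sp->y || line >= sp->y + sp->h)
            continue;                                /* not on this line */
        const uint32_t *src = sp->pixels + (line - sp->y) * sp->stride;
        for (int x = 0; x < sp->w && sp->x + x < screen_w; x++)
            fifo_line[sp->x + x] = blend(fifo_line[sp->x + x], src[x]);
    }
}
```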
<karolherbst> anholt: I don't think it's a bug inside the TGSI itself though
<karolherbst> mostly a pattern we didn't see before
<karolherbst> which... is interesting
<karolherbst> it's a real nouveau compiler bug
<anholt> karolherbst: you've pinged me on this a couple of times now with no useful information for me to do anything with.
<eric_engestrom> alyssa: `git grep SPDX` says we already do, and I'm definitely in favour of going more in that direction
<pac85> clever: so aside from the sprites per scanline limit you could also hit a global limit I guess
<clever> pac85: now, lets say we are driving a 1280x1024@60 screen, thats 1280 pixels per scanline, and it needs to generate 1024*60 scanlines/sec, thats ~78 million pixels/sec, not counting blanking time
<clever> but since the hvs can draw 4 pixels per clock, /4 that, and you could render 1 fullscreen sprite at ~20mhz
kem has quit [Ping timeout: 480 seconds]
<karolherbst> anholt: I think at this point it's more of a heads up, that such issues _could_ happen with other drivers. It's some weird 64 bit value stuff originating from loops and I am not sure if other drivers are handling it just fine or not
<karolherbst> at least it seems to be something new, which didn't happen before
Guest1752 has quit [Ping timeout: 480 seconds]
<clever> pac85: but lets say you have 2 1280x1024 sprites, with one containing alpha, now you need 40mhz, in theory, to draw it
heat_ has quit [Read error: No route to host]
heat has joined #dri-devel
<clever> and if you have a 1280x1024 sprite, and a 1280x20 overlay, you still need 40mhz for that 20 scanline region, or it wont be able to keep up, so we can just ignore sprite height
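The arithmetic above, in worked form:

```c
/* Worked version of the numbers above: the HVS clock needed to composite
 * `layers` full-screen unscaled layers at 4 pixels/clock, blanking ignored. */
double hvs_clock_hz(int w, int h, int refresh, int layers)
{
    double pixels_per_sec = (double)w * h * refresh; /* 1280*1024*60 ~ 78.6M */
    return pixels_per_sec * layers / 4.0;            /* 1 layer ~ 19.7 MHz,
                                                        2 layers ~ 39.3 MHz */
}
```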
alyssa has joined #dri-devel
<pac85> clever: what clock is that? Vpu clock? What does it run at by default?
<alyssa> Uglier than I had hoped but the code itself is probably sane
<clever> the VPU clock yes, it defaults to 500mhz on pi2 and up
<clever> 20 sprites, of that size, and you need 400mhz, and now your likely to run into dram bandwidth problems instead
<clever> pac85: but the pi4 has a new dram controller, way more bandwidth, and multiple slave ports, so 2 bus masters can issue orders in the same clock
<pac85> clever: I see, that's really good performance either way
<clever> the compositor can also use the 128kb L2 cache, so if you're drawing duplicates of an image, and the rows align right, you could use the cache
<clever> and that would lower dram bandwidth
<lygstate> a630-traces fails a lot
<clever> pac85: the hardware supports every pixel format in this list, so you can also halve the memory bandwidth usage, by using an rgb565, or an rgba5551
kem has joined #dri-devel
<clever> pac85: palette formats would reduce you down to 1 byte per pixel (possibly multiple pixels per byte?), but that comes at the cost of losing your 4 pixel/clock advantage, so the VPU then has to be clocked higher
<robclark> alyssa: hmm, I guess we'd had those TODOs for a long time.. I'd forgotten about 'em. OTOH we could actually implement conditional draws without too much difficulty, I think.. Anyone know a game/etc that actually uses conditional draws?
mbrost_ has quit [Ping timeout: 480 seconds]
<alyssa> robclark: for context jasuarez and I are working on moving the emulation to common gallium
<alyssa> and making it slightly more efficient in the happy path
<alyssa> implementing cond draws for real on older mali is nontrivial, at the very least
<alyssa> and I don't know any game/etc that actually uses them
<alyssa> (so I'd just like to get the check out of the hot path)
<robclark> I guess as long as there is still a perf warn if we hit conditional draws so that I realize without a lot of debugging when the day comes to actually implement it properly on hw ;-)
<alyssa> ^^
<alyssa> that is an outstanding question, how to do a perf_debug in common
<alyssa> r
<pac85> clever: does the vpu run in sync with the 2d engine? Could you do tricks like "racing the beam"
<robclark> alyssa: so, might be clever to add something to the recently added cpu_trace.h and wire it up to some sort of perfetto event in the case that perfetto is enabled
<alyssa> i know what some of those things are
<clever> pac85: somewhat, the 2d core has a status field, that declares the current scanline its on, as long as you stay ahead of that, you can likely do it
<clever> there is also an hsync interrupt, but its delayed some
mbrost_ has joined #dri-devel
<clever> each time i get an hsync interrupt, i read the current scanline from the hvs
<clever> if scanline % 5 == 0, then i set the background color to blue, else white
<clever> the big fat blue stripes you see, are because the hvs was falling behind (underclocked even more than before), and when it hit a region with 0 sprites, it sprinted ahead
<clever> and generated many scanlines, within the time between 2 hsync irq's
<clever> and the larger white stripes, are the hvs falling behind, so the scanline didnt advance between hsync irq's
<clever> so you need to know how far ahead it can sprint, and keep that far ahead of things
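The experiment boils down to a few lines; the register accessors here are hypothetical stand-ins for the real (undocumented) HVS registers:

```c
/* Sketch of the hsync-IRQ stripe experiment described above. If the HVS
 * keeps pace, the stripes come out a uniform 5 scanlines tall; sprinting
 * or stalling distorts them. */
#define WHITE 0xffffffffu
#define BLUE  0xff0000ffu

unsigned hvs_current_scanline(void);     /* hypothetical accessor */
void hvs_set_background(unsigned rgba);  /* hypothetical accessor */

void hsync_irq_handler(void)
{
    unsigned line = hvs_current_scanline();
    hvs_set_background(line % 5 == 0 ? BLUE : WHITE);
}
```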
<clever> that background fill also has a cost, i'm guessing 4 pixels/clock
<clever> so if you have a single full-screen plane, you can turn the background fill off
saurabhg has quit [Ping timeout: 480 seconds]
<pac85> clever: very nice, I'd imagine it would be totally feasible to do that old trick of shifting around a plane with precise timing to get something like parallax scrolling with just one plane
<clever> yeah, that might be possible, with some care
<clever> you would need to ensure the hvs is never falling behind, so the hvs fifo is always full, and you're stalling on the pixel clock
<clever> then you can easily predict when it does the composition
<clever> pac85: the XY position of a plane is held together in a single 32bit slot, so you can move a sprite with just 1 32bit store
<clever> but negative positions dont work
<clever> so if you want a plane to start -50 pixels to the left, you have to increment the source addr by byte_per_pixel * 50, and decrement the width by 50
<clever> and then set its position to just 0
<pac85> But it's fine if it is partially outside of the screen, right?
<clever> going beyond the right/bottom is perfectly fine
<clever> but the top/left you can't do, since x/y are unsigned
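A sketch of the left-edge clipping trick above, with invented register/field names; the (y << 16 | x) packing is only a guess at the single 32-bit position slot:

    #include <stdint.h>

    struct plane_regs { volatile uint32_t src, width, pos; };  /* hypothetical */
    struct plane {
        struct plane_regs *regs;
        uint32_t fb_addr, width, bytes_per_pixel;
    };

    void plane_set_position(struct plane *p, int x, int y)
    {
        uint32_t src = p->fb_addr;
        uint32_t w = p->width;

        if (x < 0) {                        /* clip on the left in software */
            src += (uint32_t)-x * p->bytes_per_pixel;
            w -= (uint32_t)-x;
            x = 0;                          /* hw x/y are unsigned */
        }
        p->regs->src = src;
        p->regs->width = w;
        p->regs->pos = ((uint32_t)y << 16) | (uint32_t)x;  /* one store moves it */
    }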
<clever> pac85: but using the 3d core would make this far simpler
<clever> pac85: this generates 3 vertices, each with a 12.4 fixed-point XY, and 3 varyings
<clever> the hw will then rasterize it automatically, and interpolate the varyings for every pixel
<clever> and if you just use UV for varyings, rather than RGB, you can do texture lookups, and now parallax scrolling is pretty much done, with no racing of the beam
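A guess at what packing one such vertex could look like, assuming a layout of 12.4 fixed-point screen XY followed by three float varyings; the real record layout should be checked against the spec:

    #include <stdint.h>

    struct vc4_nv_vertex {
        int16_t xs, ys;   /* screen XY in 12.4 fixed point: pixels * 16 */
        float vary[3];    /* e.g. U, V, and a padding value */
    };

    static struct vc4_nv_vertex make_vertex(float px, float py, float u, float v)
    {
        return (struct vc4_nv_vertex){
            .xs = (int16_t)(px * 16.0f),
            .ys = (int16_t)(py * 16.0f),
            .vary = { u, v, 0.0f },
        };
    }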
<pac85> clever: I was curious about whether you could implement those ancient techniques on a modern piece of hw. Thanks though, the 3d hw is interesting as well; I see how it takes lists with all of the info, so there are no gfx rings like desktop gpus
<pac85> I see it's a tbdr gpu
<clever> pac85: there kinda is a gpu ring, the binner and render control lists
<clever> pac85: this control list is capable of rendering multiple frames, and blocking on a mutex until something happens
<clever> so you can just keep appending to the ring and the hardware will just walk the list
<clever> but i wasn't aware of that when i wrote the code, and it's configured to come to a complete halt after each frame
<pac85> oh I see now how it works
<alyssa> jasuarez: I think I did what I set out to do :)
<alyssa> passing the torch back to you :)
<pac85> clever: I really have to thank you I learned a ton of stuff today
<clever> pac85: if you compare makeRender and makeBinner to section 9 in the pdf i linked a while back, you can see what each step does
alyssa has quit [Quit: leaving]
<clever> pac85: increment semaphore and wait on semaphore let the binner and renderer cooperatively stall on each other, so one can wait for the other to finish some work
<clever> there is an opcode to branch to an arbitrary 32bit addr, so your command list can loop, forming a ring
<clever> and there is a halt command, for when you hit the end of the ring
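The semaphore pairing described above, as a fragment with hypothetical emit helpers (binner_cl and render_cl stand for the two control lists): the binner signals when a frame's tile lists are done, and the render list blocks until then:

    extern void emit_increment_semaphore(struct cl *cl);  /* hypothetical */
    extern void emit_wait_semaphore(struct cl *cl);       /* hypothetical */

    /* at the end of the binning list: */
    emit_increment_semaphore(binner_cl);
    /* at the start of the render list, before consuming tile lists: */
    emit_wait_semaphore(render_cl);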
<clever> i forget where, but there was some marker thing as well
<pac85> clever: I see, I found the table, so opcode 16 is the one that branches, interesting how it also has "sublists" that work like subroutines
<clever> pac85: i think using the control/status reg (page 85), you can temporarily suspend a thread, then inspect its PC/LR, and dynamically append to its control list, and resume it
<clever> pac85: the binner will generate chunks of control list code for you, but there is no way to write a for loop to "call" those, so you have to unroll the loop, like this:
<clever> pac85: this will set the tile XY, call a sublist (which draws every polygon in that tile), then it will commit that tile to the output framebuffer
<clever> things like the depth buffer are only 1 tile big
<clever> the final rgba framebuffer is in ram, and lacks depth information
<clever> so once you commit the tile, you can't draw any more polygons, it's just a flat 2d image
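A sketch of the unrolled per-tile render list just described, with hypothetical emit helpers; per tile: set the coordinates, "call" the binner-generated sublist, then commit the tile:

    #include <stdint.h>

    struct cl;                                  /* opaque control-list writer */
    extern void emit_tile_coordinates(struct cl *cl, unsigned tx, unsigned ty);
    extern void emit_branch_to_sublist(struct cl *cl, uint32_t addr);
    extern void emit_store_tile_buffer(struct cl *cl);
    extern uint32_t tile_entry_addr(unsigned tx, unsigned ty);

    void emit_render_list(struct cl *cl, unsigned tiles_x, unsigned tiles_y)
    {
        for (unsigned ty = 0; ty < tiles_y; ty++) {
            for (unsigned tx = 0; tx < tiles_x; tx++) {
                emit_tile_coordinates(cl, tx, ty);
                /* "call" the sublist that draws this tile's polygons */
                emit_branch_to_sublist(cl, tile_entry_addr(tx, ty));
                /* commit the on-chip tile (color only) to the framebuffer */
                emit_store_tile_buffer(cl);
            }
        }
    }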
<clever> ah wait, you don't have to suspend the thread when modifying, that's what the "end" register is for, page 86
<clever> if the control list hits the end address, it automatically enters "halted at end" state
<clever> and if you change the end address while in that state, it will automatically resume
<clever> so you can just blindly append to the control list, and increment the end-addr
<clever> and whenever you need to wrap, insert a branch opcode
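A sketch of ring-style appending via the end register, as described: copy in the commands, wrap with a branch when space runs out, then bump the end address so a thread halted at the old end resumes (sizes and helper names are assumptions):

    #include <stdint.h>
    #include <string.h>

    #define BRANCH_CMD_SIZE 5   /* 1-byte opcode + 32-bit target, per the spec */

    struct cl_ring { uint8_t *base, *write; size_t size; };

    extern void emit_branch(uint8_t *at, uint8_t *target);  /* hypothetical */
    extern void set_thread_end_addr(uint8_t *end);          /* hypothetical */

    void cl_ring_append(struct cl_ring *r, const void *cmds, size_t len)
    {
        if (r->write + len + BRANCH_CMD_SIZE > r->base + r->size) {
            emit_branch(r->write, r->base);   /* wrap back to the start */
            r->write = r->base;
        }
        memcpy(r->write, cmds, len);
        r->write += len;
        /* a thread "halted at end" resumes once the end address moves
         * past the freshly appended commands */
        set_thread_end_addr(r->write);
    }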
lkw has joined #dri-devel
<pac85> clever: interesting design, so control lists are really flexible. I also understand how something like indirect draw could work
<pac85> So the binner is a piece of hardware generating control lists for the renderer, does it generate a command for each primitive?
LexSfX has quit [Read error: Connection reset by peer]
LexSfX has joined #dri-devel
dv_ has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
warpme___ has joined #dri-devel
JohnnyonFlame has joined #dri-devel
<pac85> oh it seems like the binner outputs more than just the control lists
pa has joined #dri-devel
anarsoul has quit [Quit: ZNC 1.8.2 - https://znc.in]
pa- has quit [Ping timeout: 480 seconds]
pac85 has quit [Remote host closed the connection]
pac85 has joined #dri-devel
<clever> pac85: yep, and those have the special primitive list opcodes
<clever> pac85: type 48, compressed primitive list, most likely
<clever> which is then explained on page 72
<clever> pac85: code 112 (page 71) tells the binner where it should put the control lists it generates, and where the entry-point for each tile is
<clever> and if the code can't fit within that region, there is an overflow space on page 83, "overspill binning memory block"
<clever> and it will fire an irq when it wants more
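A sketch of servicing that irq, handing the binner a fresh overspill block via the BPOA/BPOS registers from the spec's overspill section; the MMIO helper, allocator, and register offsets here are assumptions to double-check:

    #include <stdint.h>

    #define V3D_BPOA 0x0308               /* overspill address (verify offset) */
    #define V3D_BPOS 0x030c               /* overspill size (verify offset) */
    #define BIN_BLOCK_SIZE (256u * 1024u)

    extern void v3d_write(uint32_t reg, uint32_t val);  /* hypothetical MMIO */
    extern uint32_t alloc_gpu_block(uint32_t size);     /* hypothetical alloc */

    void binner_out_of_memory_irq(void)
    {
        uint32_t block = alloc_gpu_block(BIN_BLOCK_SIZE);
        v3d_write(V3D_BPOA, block);           /* where the binner may spill */
        v3d_write(V3D_BPOS, BIN_BLOCK_SIZE);  /* how much room it has there */
    }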
kem has quit [Ping timeout: 480 seconds]
nchery has quit [Read error: Connection reset by peer]
<clever> pac85: yeah, page 62 mentions placing a marker in the control list, and then counting how many markers you have hit, to figure out where in the control-ring it is, but there is no marker command!
nchery has joined #dri-devel
<pac85> clever: I see, so I guess the marker command is undocumented?
<clever> or they meant to say, use the semaphores as a marker
<pac85> Mmm
fxkamd has quit []
<clever> in this example code i wrote, line 543 will swap between the 2 output buffers
<clever> 546 makes a new rendering control list that targets the new frame
<clever> 548 updates the vertex info, so the triangle spins
kem has joined #dri-devel
<clever> 550-552 schedules the render thread, which the irq handler will then fire (but it could use semaphores)
<clever> 554-558 then starts the binner thread
<clever> 563 then waits for the irq to say binning&rendering is done
<clever> 567-576 then schedules a pageflip on the next vsync
<clever> and 585 then waits for vsync
<clever> and i just noticed a bit of a mistake, 592 schedules a second pageflip on the next vsync (overwriting the previous one)
<clever> if i used semaphores, then i could queue up several frames in here, as many as i have ram for, and just keep appending things to the control list rings
<clever> and then every time i get a frame-done irq, i can put it into a finished set of frames
<clever> and then just render one per vsync
<clever> for animations, a big queue would help keep the 3d core busy, but for games, you instead want a short queue for reaction times
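A loose paraphrase of the frame loop clever walks through above (not the actual code; every helper name is invented):

    struct fb;
    extern struct fb *fb0, *fb1;
    extern void *render_cl, *binner_cl;
    extern void make_render_list(void *cl, struct fb *fb);
    extern void update_vertices(unsigned frame);
    extern void schedule_render_thread(void *cl);
    extern void start_binner_thread(void *cl);
    extern void wait_frame_done_irq(void);
    extern void schedule_pageflip(struct fb *fb);
    extern void wait_vsync(void);

    void frame_loop(void)
    {
        struct fb *fb = fb0;
        unsigned frame = 0;

        for (;;) {
            fb = (fb == fb0) ? fb1 : fb0;      /* swap the two output buffers */
            make_render_list(render_cl, fb);   /* target the new frame */
            update_vertices(frame++);          /* make the triangle spin */
            schedule_render_thread(render_cl); /* fired from the irq handler */
            start_binner_thread(binner_cl);
            wait_frame_done_irq();             /* binning & rendering done */
            schedule_pageflip(fb);             /* present on the next vsync */
            wait_vsync();
        }
    }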
cengiz_io has joined #dri-devel
<clever> bbl
cengiz_io_ has quit [Ping timeout: 480 seconds]
gouchi has joined #dri-devel
<karolherbst> dcbaker: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18377 is the fix.. probably...
<karolherbst> sadly nobody is left anymore who has even a faint idea of how that's all supposed to work, so I will have to think about it a little more. But it fixes the nv50 regression I was seeing
<karolherbst> and hopefully some users will test that as well
<karolherbst> hesitant to push a patch when I don't have a good idea whether it's actually the correct thing to do, nor whether it actually makes sense
pac85 has quit [Ping timeout: 480 seconds]
ybogdano is now known as Guest1762
ybogdano has joined #dri-devel
Guest1762 has quit [Ping timeout: 480 seconds]
pac85 has joined #dri-devel
srslypascal is now known as Guest1763
srslypascal has joined #dri-devel
<pac85> clever: I was able to follow through your example while reading the pdf, thank you again. I'll see if I can set up my pi and try out something myself
heat has quit [Read error: No route to host]
heat has joined #dri-devel
lemonzest has quit [Quit: WeeChat 3.5]
Guest1763 has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
alyssa has joined #dri-devel
<alyssa> Is using targeted shaders in shaderdb + compiler design-by-contract a sane alternative to unit testing? Probably not :D
<alyssa> (or how I learned to stop worrying and https://gitlab.freedesktop.org/mesa/shader-db/-/merge_requests/68 )
<clever> pac85: one thing i've yet to test, and have just barely figured out, is vertex shading; all of my examples have been purely 2d inputs
ngcortes has joined #dri-devel
gouchi has quit [Quit: Quitte]
mbrost_ has quit [Ping timeout: 480 seconds]
AndroUser2 has joined #dri-devel
pac85 has quit [Read error: No route to host]
AndroUser2 has quit [Read error: Connection reset by peer]
heat has quit [Remote host closed the connection]
heat has joined #dri-devel
AndroUser2 has joined #dri-devel
AndroUser2 has quit [Remote host closed the connection]
AndroUser2 has joined #dri-devel
ybogdano has quit [Ping timeout: 480 seconds]
AndroUser2 has quit [Remote host closed the connection]
Duke`` has quit [Ping timeout: 480 seconds]
AndroUser2 has joined #dri-devel
ybogdano has joined #dri-devel
<dj-death> is there a way to combine uniforms or convergent loads (like load_ubo)?
<dj-death> in NIR
<dj-death> like, I loaded 2 vec2s and could build a vec4
<dj-death> which would still be convergent and could live in a single one of our vec8 registers
<airlied> nir_opt_load_store_vectorize?
FireBurn has joined #dri-devel
lkw has quit [Remote host closed the connection]
frankbinns has quit [Remote host closed the connection]
<pinchartl> a bit of a newbie question: how does EGL pick the platform backend at runtime or link time? I'm looking at how eglplatform.h defines things like EGLNativeDisplayType for instance, and I understand one would have to e.g. include wayland-egl.h or define USE_X11 before including eglplatform.h. but how is the corresponding eglGetDisplay() version picked?
<dj-death> airlied: of course it existed already :)
<dj-death> airlied: might need to extend it for some intel intrinsics but thanks!
pallavim_ has quit [Ping timeout: 480 seconds]
<ajax> pinchartl: eglGetDisplay() predates the egl concept of multiple platforms so the EGLNativeDisplayType is just uintptr_t and it's a Display* for X11 and you just happen to have an EGL that only works with X11
<ajax> (for arbitrary values of X11 and Display*)
<ajax> if you use eglGetPlatformDisplay then you can actually _tell_ EGL what kind of handle it is
<ajax> but if you don't then we have to guess, and basically that's checking that you can deref the pointer and then seeing if it's laid out like a Display or a wl_display
<ajax> please don't make us guess though, use GetPlatformDisplay
<pinchartl> :-)
<pinchartl> thanks
<pinchartl> what's the EGL attribute that conveys the display type to eglGetPlatformDisplay?
<ajax> not an attribute, it's the first argument
<ajax> EGLDisplay EGLAPIENTRY
<ajax> eglGetPlatformDisplay(EGLenum platform, void *native_display, const EGLAttrib *attrib_list)
<ajax> EGL_PLATFORM_WAYLAND_EXT etc
<ajax> not sure why it's like that instead of being a mandatory attrib but whatever
<pinchartl> oh I missed that for some reason
<pinchartl> thank you
<ajax> np
<pinchartl> and I agree with you, dereferencing the pointer to guess the layout is absolutely horrible. I didn't know you had to deal with such horrendous concepts, I really feel sorry :-)
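For completeness, a minimal example of the explicit path ajax recommends; eglGetPlatformDisplay is core in EGL 1.5, and EGL_PLATFORM_WAYLAND_EXT comes from EGL_EXT_platform_wayland:

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <wayland-client.h>

    /* Tell EGL exactly what kind of handle this is -- no guessing. */
    static EGLDisplay get_wayland_display(struct wl_display *wl_dpy)
    {
        return eglGetPlatformDisplay(EGL_PLATFORM_WAYLAND_EXT, wl_dpy, NULL);
    }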
mvlad has quit [Remote host closed the connection]
alanc has quit [Remote host closed the connection]
alanc has joined #dri-devel
genpaku has quit [Remote host closed the connection]
dakr has quit [Ping timeout: 480 seconds]
genpaku has joined #dri-devel
AndroUser2 has quit [Read error: Connection reset by peer]
AndroUser2 has joined #dri-devel
ngcortes has quit [Read error: Connection reset by peer]
<karolherbst> soo... I think I'll merge the rusticl stuff some time next week unless somebody points out things which should be addressed. Might address some more of the Rust review between now and then, but I think it's more or less ready to land :)
<ajax> \o/
fab has quit [Quit: fab]
<dj-death> airlied: I don't think this pass does what I'm looking for
<dj-death> airlied: it seems to combine identical load/stores
<dj-death> airlied: while I'm trying to reduce register pressure by packing more constant data into vec8s
anarsoul has joined #dri-devel
danvet has quit [Ping timeout: 480 seconds]
anarsoul has quit []
ahajda has quit [Quit: Going offline, see ya! (www.adiirc.com)]
anarsoul has joined #dri-devel
anarsoul has quit []
lygstate_ has joined #dri-devel
lygstate has quit [Read error: Connection reset by peer]
<airlied> dj-death: pretty sure it was for combining adjacent ubo loads into vectors
<airlied> dschuermann: ^ might know more
ngcortes has joined #dri-devel
anarsoul has joined #dri-devel
<pendingchaos> yes, that's what nir_opt_load_store_vectorize is for
<pendingchaos> it also combines identical and intersecting load/stores
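A rough sketch of driving the pass; the options struct and callback signature have shifted across Mesa versions, so treat the fields below as approximate and check nir.h (assumes "nir.h" is included):

    /* decide whether two adjacent UBO loads may be fused; capping at a
     * 32-bit vec4 here is purely an example policy */
    static bool
    mem_vectorize_cb(unsigned align_mul, unsigned align_offset,
                     unsigned bit_size, unsigned num_components,
                     nir_intrinsic_instr *low, nir_intrinsic_instr *high,
                     void *data)
    {
        return bit_size == 32 && num_components <= 4;
    }

    nir_load_store_vectorize_options opts = {
        .modes = nir_var_mem_ubo,
        .callback = mem_vectorize_cb,
    };
    bool progress = nir_opt_load_store_vectorize(shader, &opts);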
FireBurn has quit [Quit: Konversation terminated!]
kts has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
nchery has quit [Remote host closed the connection]
pcercuei has quit [Quit: dodo]