#dri-devel on 2021-07-17 — irc logs at oftc.irclog.whitequark.org

2021-06-22 12:29 ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar

00:01 rasterman has quit [Quit: Gettin' stinky!]

00:04 Hi-Angel has quit [Ping timeout: 480 seconds]

00:19 tursulin has quit [Remote host closed the connection]

00:19 adjtm has quit [Read error: Connection reset by peer]

00:28 jewins has quit [Ping timeout: 482 seconds]

00:28 xexaxo has quit [Read error: Connection reset by peer]

00:30 xexaxo has joined #dri-devel

00:46 cbaylis has quit [Quit: leaving]

01:06 Lightkey has quit [Ping timeout: 480 seconds]

01:16 Lightkey has joined #dri-devel

01:20 Emantor has quit [Quit: ZNC - http://znc.in]

01:20 Emantor has joined #dri-devel

01:27 adjtm has joined #dri-devel

02:02 sdutt has quit [Read error: Connection reset by peer]

02:02 sdutt has joined #dri-devel

02:11 boistordu_old has joined #dri-devel

02:17 boistordu has quit [Ping timeout: 480 seconds]

03:02 vivijim has joined #dri-devel

03:04 sumits has quit [Quit: ZNC - http://znc.in]

03:10 sumits has joined #dri-devel

03:44 Company has quit [Quit: Leaving]

03:58 sdutt has quit [Ping timeout: 480 seconds]

04:24 luckyxxl has joined #dri-devel

05:22 mattrope has quit [Read error: Connection reset by peer]

05:28 Lyude has quit [Quit: WeeChat 3.0.1]

05:55 Lyude has joined #dri-devel

06:12 mlankhorst has joined #dri-devel

06:28 Duke`` has joined #dri-devel

06:35 alanc has quit [Remote host closed the connection]

06:36 alanc has joined #dri-devel

06:47 luckyxxl has quit []

07:11 lemonzest has joined #dri-devel

07:16 soreau has quit [Quit: Leaving]

07:16 soreau has joined #dri-devel

08:07 rasterman has joined #dri-devel

08:10 zzoon[m] is now known as zzoon_holidays_till_21th[m]

08:19 yoslin has quit [Ping timeout: 480 seconds]

08:20 yoslin has joined #dri-devel

08:29 flto_ has joined #dri-devel

08:31 flto has quit [Ping timeout: 480 seconds]

08:35 danvet has joined #dri-devel

08:36 Hi-Angel has joined #dri-devel

08:43 gouchi has joined #dri-devel

08:46 gouchi has quit [Remote host closed the connection]

08:59 pcercuei has joined #dri-devel

09:01 jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]

09:20 rasterman has quit [Quit: Gettin' stinky!]

09:32 jernej has joined #dri-devel

10:04 iive has joined #dri-devel

10:09 Tooniis[m] has quit []

10:09 Tooniis[m] has joined #dri-devel

10:14 gouchi has joined #dri-devel

10:15 Tooniis[m] has quit []

10:15 Tooniis[m] has joined #dri-devel

10:27 Company has joined #dri-devel

12:05 NiksDev has joined #dri-devel

12:07 flto_ has quit []

12:08 flto has joined #dri-devel

12:29 pekkari has joined #dri-devel

12:39 camus has joined #dri-devel

12:45 camus1 has joined #dri-devel

12:47 camus has quit [Ping timeout: 480 seconds]

13:06 MrRml[m] has joined #dri-devel

13:07 pekkari has quit [Quit: Konversation terminated!]

13:19 agd5f has quit [Remote host closed the connection]

13:37 tobiasjakobi has joined #dri-devel

13:45 graphitemaster has joined #dri-devel

13:49 MrRml[m] has quit []

13:49 MrRml[m] has joined #dri-devel

13:50 MrRml[m] has quit []

13:50 MrRml[m] has joined #dri-devel

13:57 pekkari has joined #dri-devel

13:59 <graphitemaster> I'm curious, what's the rationale for mesa_glthread being enabled per-application? Are threaded optimizations still unstable and potentially application-breaking? The reason I ask is that glBufferSubData performance is consistently better on NV (proprietary) because of its threaded optimizations (__GL_THREADED_OPTIMIZATIONS defaults to 1). It's kind of a known fact in graphics circles that NV has sub-buffer uploads so fast that

13:59 <graphitemaster> application developers (myself included) go out of their way to write vendor-specific upload paths, but it appears that mesa_glthread=true gets there too (but is not the default)

14:01 <graphitemaster> Tons of applications only use persistently mapped buffers on AMD and Intel via Mesa because it tends to be faster than glBufferSubData. That seems like low hanging fruit if you can just implement one in terms of the other, why not, at least it's comparable to NV performance in my tests - alternatively, threaded optimizations bridge that gap too.

14:06 Hi-Angel has quit [Ping timeout: 480 seconds]

14:08 <graphitemaster> There's a whole blog entry on buffer mapping patterns on the website here https://docs.mesa3d.org/gallium/buffermapping.html which more or less sniffs out the patterns games use, but I feel like there's some missing information because it doesn't consider the implicit double-buffering that, well, double-buffered vsync affords. I know the NV driver does not issue draws immediately; they are deferred until the frame n+1 swap buffer

14:08 <graphitemaster> call, so it has a whole frame window to do the upload, which is moved onto a background thread. The way I read this in mesa is that the updates happen in lockstep with the frame.

14:09 <graphitemaster> I wonder, does GLTHREAD do the same thing then, move it to a background thread?

14:10 cbaylis has joined #dri-devel

14:25 flacks_ has joined #dri-devel

14:29 flacks has quit [Ping timeout: 480 seconds]

14:37 <alyssa> graphitemaster: I don't touch that code but AFAIK the short answer is "mesa_glthread helps games that don't use threading optimizations themselves, but hurts games that are well-written and do" so it's an allowlist for apps that are known to benefit instead of hurt for performance

14:47 <graphitemaster> alyssa, Even still, I wouldn't expect MapBuffer with COHERENT_BIT, and memcpy (instead of glBufferSubData) to be faster and yet it's consistently faster than current glBufferSubData in mesa in my tests. May as well just implement glBufferSubData that way then.

14:48 <graphitemaster> *MapBufferRange

14:51 mlankhorst has quit [Ping timeout: 480 seconds]

15:07 <alyssa> 🤷

15:12 tobiasjakobi has quit [Remote host closed the connection]

15:36 <glennk> what hardware is this on graphitemaster?

15:36 yoslin has quit [Quit: WeeChat 3.2]

15:45 rsalvaterra_ has joined #dri-devel

15:47 <graphitemaster> glennk, My testing hardware is a rig with AMD RX 530, a laptop with Iris Pro Graphics P580, and my desktop with RTX 3070, every machine running latest Arch Linux with mesa-21.1.4-1 (though on the desktop rig I can switch between nouveau + mesa and the proprietary drivers for testing with a glvnd and dlopen hack in my engine)

15:48 <graphitemaster> glBufferSubData is worse in mesa on all three machines and hardware configurations than MapBufferRange with PERSISTENT and COHERENT bits set.

15:49 <imirkin> glBufferSubData has to wait for that buffer to stop being used

15:50 <graphitemaster> proprietary NV GL's glBufferSubData outperforms all by a solid 60%

15:50 <graphitemaster> And this is without any fancy double buffering or offset within the buffer tricks.

15:50 <imirkin> they must buffer the data i guess?

15:50 <imirkin> instead of waiting

15:50 <graphitemaster> mesa_glthread=true runs better in my tests too.

15:51 <graphitemaster> But still nowhere near NV speeds.

15:51 rsalvaterra has quit [Ping timeout: 480 seconds]

15:53 <zmike> file a mesa ticket with a test case would be my recommendation

15:54 <zmike> drawoverhead has a similar case for this (https://gitlab.freedesktop.org/mesa/piglit/-/blob/main/tests/perf/drawoverhead.c#L533) so you might try modifying that to better represent what you're seeing

15:54 <graphitemaster> I know a few things [don't ask] that NV does for data buffering. I know they internally double buffer the updates with respect to the swap buffer call that does vsync, I know they lift those uploads off the main thread onto a background thread within their driver (setting __GL_THREADED_OPTIMIZATIONS=0 reduces performance of the upload) and this one is not within the ability of mesa but NV has stream upload compression on 3000

15:54 <graphitemaster> series GPUs where they range encode uploads with an adaptive PPM entropy encoder that removes a lot of 0s in the bitstream, hardware runs a compute shader that decodes and expands that into memory on chip.

15:55 <graphitemaster> That last part no one is doing but the proprietary driver so it's muddying my benches for sure. :|

16:00 tobiasjakobi has joined #dri-devel

16:00 <graphitemaster> Anyways that drawoverhead.c test seems to just be respecifying all the contents every draw, not actually testing glBufferSubData to existing allocated storage that you just replace.

16:01 <graphitemaster> Or ping/pong of the buffers on top of that, which is what most games/engines do, at least the ones I've looked at or worked on.

16:01 <zmike> yes, it's for buffer replacement profiling, not what you're describing

16:01 <zmike> hence why I said you could try modifying it

16:02 <graphitemaster> Ah, okay, my bad, sorry for misunderstanding :)

16:03 <zmike> np :)

16:19 <glennk> graphitemaster, that 3070, is it running pcie 3 or 4?

16:23 <graphitemaster> glennk, pcie 3 x16

16:24 <glennk> afaik radeon 530 is pcie 3 x8

16:25 <graphitemaster> Don't think that would affect upload performance for 30 MiB a frame worth of data :P

16:25 <graphitemaster> Which is 1m point sprite vertices (each vertex 32 bytes in size)

16:25 <graphitemaster> Which is my bench
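
A quick sanity check of the benchmark size quoted above, as a minimal arithmetic sketch. Only the vertex count (1m) and the vertex size (32 bytes) come from the log; the function name is made up for illustration.

```python
MIB = 1024 * 1024  # bytes per MiB


def upload_size_bytes(vertex_count: int, bytes_per_vertex: int) -> int:
    """Bytes uploaded per frame for a tightly packed vertex buffer."""
    return vertex_count * bytes_per_vertex


# Figures quoted in the log: 1m point sprite vertices, 32 bytes each.
size = upload_size_bytes(1_000_000, 32)
print(f"{size} bytes = {size / MIB:.2f} MiB per frame")
# prints "32000000 bytes = 30.52 MiB per frame"
```

So "30 MiB a frame" is a slight rounding down of about 30.5 MiB.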

16:30 xexaxo has quit [Remote host closed the connection]

16:30 vivijim has quit [Quit: Lost terminal]

16:32 xexaxo has joined #dri-devel

16:33 <glennk> hmm, so you are replacing all the contents with a single call to subdata?

16:34 xexaxo_ has joined #dri-devel

16:41 xexaxo has quit [Ping timeout: 480 seconds]

16:42 yoslin has joined #dri-devel

16:44 illwieckz has joined #dri-devel

16:46 <graphitemaster> Yeah. The code looks more like: allocate a 16 KiB buffer initially with glBufferData (nullptr for initial contents), store that size, and then if the update fits, replace with glBufferSubData; if it doesn't, grow the size by the golden ratio in a loop until it does, then call glBufferData again to make a new backing storage for it

16:46 <graphitemaster> This is how our engine works for streaming buffers, the actual frontend double buffers on top of this as well.
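
The sizing policy described above can be sketched without any GL calls. This is a hypothetical model, not the engine's code: `plan_upload` and its return values are invented for illustration, and rounding the grown size up to a whole byte is an assumption the log does not specify.

```python
import math
from typing import Tuple

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # about 1.618
INITIAL_CAPACITY = 16 * 1024           # 16 KiB, as stated in the log


def plan_upload(capacity: int, update_size: int) -> Tuple[str, int]:
    """Return which GL call the policy would issue and the resulting capacity.

    If the update fits, the existing storage is reused (glBufferSubData).
    Otherwise the capacity grows by the golden ratio until the update fits
    and the storage is respecified (glBufferData).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if update_size <= capacity:
        return ("glBufferSubData", capacity)
    new_capacity = capacity
    while new_capacity < update_size:
        # Rounding up is an assumption made for this sketch.
        new_capacity = math.ceil(new_capacity * GOLDEN_RATIO)
    return ("glBufferData", new_capacity)


print(plan_upload(INITIAL_CAPACITY, 16_000))   # fits: reuse storage
print(plan_upload(INITIAL_CAPACITY, 20_000))   # too big: grow once
```

Growing geometrically keeps the number of reallocations logarithmic in the final size, so a steady-state stream should hit the glBufferSubData branch almost every frame.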

16:49 <glennk> and what are the usage flags for bufferdata?

16:49 <graphitemaster> GL_DYNAMIC_DRAW

16:55 <glennk> as an experiment, what happens if you use STREAM_DRAW on the radeon?

16:57 manu has quit []

16:57 manu has joined #dri-devel

16:58 manu has left #dri-devel [#dri-devel]

16:58 evadot has joined #dri-devel

17:20 sdutt has joined #dri-devel

17:53 Peste_Bubonica has joined #dri-devel

18:07 tarceri_ has joined #dri-devel

18:09 tarceri has quit [Ping timeout: 480 seconds]

18:10 <graphitemaster> glennk, No performance difference between GL_STATIC_DRAW, GL_STREAM_DRAW, and GL_DYNAMIC_DRAW

18:19 <graphitemaster> What is interesting though is that there's a real performance difference on the AMD system with a compatibility GL context and a core profile (3.3) one

18:19 <graphitemaster> About 8-12% or so.

18:39 <dv_> core profile being the faster one?

18:40 <dv_> I dimly remember a weird case where some old desktop GPU actually ran faster with the compatibility context for unknown reasons

19:09 <graphitemaster> Yeah, on NV compat profiles tend to be faster in my tests. In this case it's core profile that is faster with AMD on mesa.

19:29 <glennk> graphitemaster, can you pastebin lspci -vvv for the amd card?

19:29 <glennk> on second thought, also for the nv card

19:31 flacks has joined #dri-devel

19:33 flacks_ has quit [Ping timeout: 480 seconds]

20:11 rsalvaterra_ has quit []

20:12 rsalvaterra has joined #dri-devel

20:16 pekkari has quit [Quit: Konversation terminated!]

20:21 <graphitemaster> Well that's weird, my lspci on the NV right is spitting out pcilib: sysfs_read_vpd: read failed: Input/output error

20:21 <graphitemaster> s/right/rig

20:25 <graphitemaster> NV is not going to be of much help for you though since not mesa, proprietary driver rn

20:25 <graphitemaster> https://pastebin.com/raw/38PBuSgB

20:25 <graphitemaster> But the other modules are there, I swap to nouveau when I need to test, I can do that if you want

20:26 <graphitemaster> Then rerun it, maybe the output is different.

20:28 <graphitemaster> I'm really concerned about that pcilib sysfs_read_vpd error though, it seems to happen randomly when I run lspci

20:28 <robclark> danvet: just for you, drm scheduler conversion and bonus drm_gem_object_put_locked() removal.. https://patchwork.freedesktop.org/series/92680/

20:29 <graphitemaster> Each time it happens dmesg spits out a nice "[4036884.941404] atlantic 0000:42:00.0: invalid short VPD tag 00 at offset 1"

20:29 * graphitemaster hopes his LSI controller isn't going

20:32 <danvet> robclark, oh nice

20:34 <glennk> graphitemaster, some bogus entry in pci rom for the card

20:36 <glennk> graphitemaster, so the nv card isn't mapping all of vram, just the usual 256MB window, just wanted to verify that

20:37 Hi-Angel has joined #dri-devel

20:37 <graphitemaster> I like the bar indices, 0, 1, 3, where did bar 2 go :(

20:37 <graphitemaster> region 0, 1, 3 too.

20:38 <glennk> random guess the audio subdevice uses 2

20:41 Hi-Angel has quit [Remote host closed the connection]

20:42 <graphitemaster> Hard to pastebin from the other machine since it's not on the internet but the output is ASUSTeK Computer Inc. Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] and it has two memories (one at c00000000, the other at d00000000) 64-bit prefetchable and the first one is 256 MiB, the other is 2 MiB, ditto for the bars below

20:44 <graphitemaster> The Intel machine, first 64-bit, non prefetchable [size=16M], and the second one, 64-bit prefetchable, [size=256M] ... at de000000, and b0000000 respectively.

20:45 <graphitemaster> Oh the AMD also has another 256K memory (32-bit, non-prefetchable) but for some reason it printed after the i/o port line and expansion rom so I missed it.

20:46 <graphitemaster> So it appears like all the machines are just mapping 256 MiB of vram.

20:47 <glennk> yeah without rebar thats the maximum

20:59 <glennk> so attempting to answer your question about mesa_glthread, it basically just marshals the top GL dispatch layer to separate app and driver as much as possible

21:05 gouchi has quit [Remote host closed the connection]

21:05 <graphitemaster> So it doesn't afford any additional pipelining of the upload then, just relaxes the GL thread of some work, they still operate in lock-step frame wise?

21:06 <glennk> what happens for BufferSubData is a bit driver dependent

21:07 <glennk> there's a generic codepath which just does a memcpy of the data in the marshaled stream (or malloc the buffer if its large enough)

21:09 <glennk> this path basically lets the app continue without waiting on the hardware, unless the marshal command buffer is full in which case it waits

21:10 Duke`` has quit [Ping timeout: 480 seconds]

21:13 <graphitemaster> I guess the main concern I have is if mesa encounters a draw call which sources said buffer, will it then stall waiting for the upload or is it smart enough to defer the draw call and maybe not stall at all since the contents upload in time for the actual immediate draw?

21:14 <graphitemaster> Like where does it wait if any for the upload (if it has to), is it waiting at the draw (client side), at swap buffers call (client side), or at the draw (server side)

21:15 <graphitemaster> And if you remove that data dependence with e.g. double buffering, can it shift that stall down the pipeline until the actual draw?

21:18 <glennk> so on specifically radeonsi, there's another driver thread which actually performs the kernel calls which talk to the hardware

21:18 Surkow|laptop has quit [Ping timeout: 480 seconds]

21:18 <glennk> which is where the cpu side waits on the hardware happen

21:21 rasterman has joined #dri-devel

21:27 Surkow|laptop has joined #dri-devel

21:29 lemonzest has quit [Quit: Quitting]

21:33 <glennk> so subdata with mesa_glthread on radeonsi, it basically creates a staging buffer equivalent to BufferData(WRITE_ONLY, CLIENT_STORAGE|WRITE) then keeps it mapped unsynchronized, and the marshaling memcpy's into that

21:36 sdutt has quit [Ping timeout: 480 seconds]

21:39 <graphitemaster> When does the staging buffer hit the GPU though.

21:39 <graphitemaster> I mean that sounds like the approach a persistently mapped buffer in GL would more or less end up as too.

21:40 alyssa has left #dri-devel [#dri-devel]

21:44 <glennk> so a copy from system memory staging to vram on radeon is done with CP DMA, so its whenever the hardware command stream gets to that

21:46 <glennk> which the more threads you add into the mix, the longer between the api call and the hardware processing it

21:46 <glennk> which leads me to ask: what is your app doing in between the subdata call and any draw call using that data?

21:51 silver has quit [Ping timeout: 480 seconds]

21:52 <graphitemaster> Rendering a whole other frame :P

21:53 <graphitemaster> Our engine doesn't actually ever source the contents of an upload or an update to a resource within the same frame, it's always n+1

21:53 <graphitemaster> So I update a vertex buffer (as an example), and it won't be until next frame that this vertex buffer will be sourced for a draw call.

21:54 <graphitemaster> And our engine does all its work basically at the end of a frame too, since it has to target multiple APIs, there's no "work" done in between GL draw calls, it's just a blast of GL commands one after the other followed by a swap buffers

21:55 <graphitemaster> So from the driver's perspective it just gets hit with say 800 GL function calls all immediately at once and then swap.

21:55 <graphitemaster> Which probably doesn't give it much time to do anything :P

22:03 <glennk> is the buffer object itself used by other draw calls the same frame? ie object dependency not dependency on content subrange

22:09 <graphitemaster> No, different buffer in this case, ping/pong the GLuint's, though there's some places in the engine where I do offsets within a buffer because it would be too much memory otherwise.

22:09 <graphitemaster> I don't think that makes much of a difference to be honest.

22:09 heat has joined #dri-devel

22:10 <graphitemaster> The engine also goes out of its way to only ever update buffers on offsets that are 16-byte aligned and with sizes that are a multiple of 16 too

22:11 <graphitemaster> Since that appears to make a big difference on NV proprietary on Windows and Linux.

22:20 <glennk> for alignment i would probably say match your cpu cache line size

22:21 <glennk> anything mapping vram directly on gcn/navi will have a base address that is 256 byte aligned

22:22 <glennk> not the issue here, but as a side note
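
The alignment rule discussed above (16-byte offsets and sizes, or a larger granule such as a cache line or the 256-byte base alignment glennk mentions) can be sketched as a range-widening helper. `align_range` is a hypothetical name; the 64-byte cache line used in the example is an assumption about typical CPUs, not something stated in the log.

```python
from typing import Tuple


def align_range(offset: int, size: int, alignment: int = 16) -> Tuple[int, int]:
    """Widen [offset, offset + size) so both ends sit on `alignment` boundaries.

    Returns (aligned_offset, aligned_size). `alignment` must be a power of two.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    mask = alignment - 1
    start = offset & ~mask                # round the start down
    end = (offset + size + mask) & ~mask  # round the end up
    return start, end - start


print(align_range(5, 20))          # 16-byte granule, as in the engine
print(align_range(5, 20, 64))      # assumed 64-byte cache line
print(align_range(300, 10, 256))   # 256-byte granule
```

Widening a range means the caller has to supply valid data for the extra bytes on either side, so this only works when the source copy covers the widened span.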

22:27 <graphitemaster> I mean I don't think there's an issue here other than glBufferSubData is slower than persistently mapped buffer with COHERENT for same size uploads involving a memcpy.

22:27 <graphitemaster> I would expect persistently mapped buffers to be faster if you avoided a memcpy and produced directly into it even because saving a memcpy is saving work, but this is basically the same amount of work.

22:28 <graphitemaster> And it's not the case on NV at all where the roles are swapped.

22:28 <graphitemaster> So I just find it more fascinating than anything.

22:29 <graphitemaster> I remember having to optimize the upload path for streaming years ago for different hardware and drivers, I just would've thought this is sorted out by now :P

22:29 <FLHerne> graphitemaster: fwiw, anholt_ wrote the buffermapping page you mention fairly recently, there's a few comments on the MR here https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9231

22:31 <glennk> graphitemaster, coherent persistent buffers do the driver work at map time, then its the hardware snooping the updates + your app code synchronizing

22:32 <glennk> with subdata every call needs to check hey is this buffer in flight? if not, okay, map it and memcpy, otherwise dump this data in staging buffer and emit a blit
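
The per-call decision glennk describes for subdata can be written down as a toy model. This is not Mesa code: `subdata_steps` and the step strings are invented for illustration, and a real driver weighs more state than a single in-flight flag.

```python
from typing import List


def subdata_steps(buffer_in_flight: bool) -> List[str]:
    """Steps taken for one subdata call, following the description in the log."""
    if not buffer_in_flight:
        # Nothing on the GPU is using the buffer: write it directly.
        return ["map buffer", "memcpy into mapping"]
    # The buffer is still in use: go through a staging buffer instead.
    return ["memcpy into staging buffer", "emit blit from staging to buffer"]


for in_flight in (False, True):
    print(in_flight, subdata_steps(in_flight))
```

By contrast, a coherent persistent mapping pays the driver cost once at map time and leaves the in-flight question to the application's own synchronization.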

22:32 <graphitemaster> I would expect `glBufferData` does the mapping, and `glBufferSubData` just does the same thing though

22:33 <graphitemaster> Why would SubData have to do anything other than a memcpy is what I want to know

22:33 <graphitemaster> I mean sure all the validation and what not, lets just ignore that for a moment

22:33 <glennk> there's not an option to ignore that for a conformant GL driver

22:33 <graphitemaster> I mean in the case of this discussion :P

22:34 <graphitemaster> It's not possible for a SubData update to be larger than the original backing allocation, requiring a reallocation, is it?

22:34 <graphitemaster> "size must define a range lying entirely within the buffer object's data store."

22:35 <glennk> consider an app calling subdata, draw, then subdata on an overlapping range, then draw

22:37 <graphitemaster> Right, I understand there's serialization that needs to occur in that case to prevent sourcing the content in a draw while it's being written to, the same is true with coherent mapped buffers, but if you're not causing this (e.g. double buffer, or offset within the buffer avoiding overlapping range) then surely the driver can be clever enough to see this and substitute a fast path
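
The overlap condition behind this exchange (subdata, draw, subdata on an overlapping range, draw) can be sketched as a range check. This models only the idea being discussed, not how Mesa tracks buffer usage; both function names and the (offset, size) tuple layout are assumptions.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # (offset, size) in bytes


def ranges_overlap(a: Range, b: Range) -> bool:
    """True if the half-open byte ranges a and b share at least one byte."""
    a_offset, a_size = a
    b_offset, b_size = b
    return a_offset < b_offset + b_size and b_offset < a_offset + a_size


def needs_sync(write: Range, in_flight_reads: Iterable[Range]) -> bool:
    """True if a write would touch bytes that a pending draw still reads."""
    return any(ranges_overlap(write, read) for read in in_flight_reads)


pending = [(64, 64)]                 # a pending draw reads bytes 64..127
print(needs_sync((0, 64), pending))  # disjoint write
print(needs_sync((32, 64), pending)) # write overlaps the pending read
```

Under this model a write into a disjoint range, or into the other buffer of a ping/pong pair, never needs to wait, which is the fast path graphitemaster is asking about.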

22:37 ngcortes has joined #dri-devel

22:39 <graphitemaster> Anyways talk is cheap, I should probably spend my next weekend poking around mesa to see how things work.

22:39 <glennk> yeah its not rocket science when the source code is available

22:40 <graphitemaster> Oh I fully expect it to be, I've been led to believe drivers are magic for too long XD

22:41 <graphitemaster> I think I found my answer for radeon anyways

22:42 <graphitemaster> map + memcpy + unmap

22:46 iive has quit []

22:48 <glennk> i think the path your app is hitting for radeonsi is discard old resource, map staging, memcpy, unmap, then blit to new

22:49 <graphitemaster> So for shits and giggles, suppose `radeonBufferData` did `radeon_bo_map` like it does when data != NULL, but just did not unmap it, radeonBufferSubData could then use it directly, but looks like it needs to store it in obj->Mappings like the persistent mapping does, then I suppose you'd have to unmap it when referenced by a draw. I'm just spitballing ideas at nighttime without actually profiling or anything

22:50 <glennk> you are looking at the wrong driver :p

22:50 <graphitemaster> Oh

22:50 <glennk> thats the one for pre-shader radeons

22:50 <glennk> src/gallium/drivers/radeonsi

22:51 <graphitemaster> I assume si_buffer_subdata

22:52 <graphitemaster> Same thing though, it maps with si_buffer_transfer_map, does the memcpy, and unmaps with si_buffer_transfer_unmap

22:53 <glennk> the staging bits get decided in si_buffer_transfer_map

22:56 <graphitemaster> Right so my main question is, is there a way for this si_buffer_transfer_map, which also does the si_buffer_map once it works out all the usage bits to use for that mapping, to stay mapped as a pointer in the driver so when a si_buffer_subdata is called, it skips doing the map at all and just reuses that pointer?

22:57 <graphitemaster> Like I know there's a ton of what ifs here about synchronization and stuff, I just kind of want to know if it's theoretically possible

22:57 <graphitemaster> Basically to transparently manage a persistent mapping behind the scenes for subbuffer updates.

23:02 <glennk> well if you follow the rabbit hole into radeon_drm_bo.c

23:06 <graphitemaster> Not sure what referenced_by_cs is, command stream?, it then issues what looks like an immediate flush

23:06 <graphitemaster> When mapping for write

23:07 <graphitemaster> I do see a wait in there too, infinite one.

23:08 <graphitemaster> But yeah looks like eventually it calls radeon_bo_do_map which returns the existing mapping

23:08 <graphitemaster> Though that's after also acquiring a mutex

23:08 <graphitemaster> There's a lot of overhead to get a mapping reuse

23:09 <graphitemaster> And it still appears to flush in either case.

23:09 <glennk> i think your case should hit PIPE_MAP_UNSYNCHRONIZED

23:16 <graphitemaster> Humm, yeah and si_buffer_transfer_unmap doesn't actually unmap the buffer, it just signals si_buffer_do_flush_region by the looks of it, which then hits si_copy_buffer, and then that does the real copy with si_cp_dma_copy_buffer, the staging buffer stays persistently mapped.

23:16 tobiasjakobi has quit [Remote host closed the connection]

23:20 <graphitemaster> So then that's a bit of a dead end.

23:23 silver has joined #dri-devel

23:24 <graphitemaster> Sorry for keeping you engaged on this goose chase. You've been incredibly patient and kind. I'm going to do a more proper deep dive next weekend I think on the actual AMD rig and see, maybe I'll whip together a proper testcase too you can carry upstream in that perf directory.

23:25 <graphitemaster> I'm just really fascinated with why this is the case and how I can bridge the gap here performance wise so no one has to keep writing different streaming upload code for different rigs and systems.

23:26 <graphitemaster> It's just too ridiculous to me, UE4 has 12 different streaming upload paths for OpenGL, 12.

23:26 Peste_Bubonica has quit [Quit: Leaving]

23:27 danvet has quit [Ping timeout: 480 seconds]

23:28 <glennk> whats optimal for one bit of hardware is rarely so for another

23:30 <graphitemaster> Sure, and it's always been my opinion that the dumbest most basic glBufferData+SubData in the driver should try as hard as it can to be as fast as possible for any given hardware/driver.

23:30 <graphitemaster> Since that's always been the case at least with GL performance on NV in my experience.

23:31 ngcortes has quit [Remote host closed the connection]

23:37 <glennk> btw the compression thing on nv i think is only enabled when pcie link width is < 8x, ie thunderbolt

23:44 <graphitemaster> Seems kind of silly it wouldn't use it for streaming contents if it reduces memory bandwidth which is the main problem with streaming.

23:45 <graphitemaster> I have so much compute time left over I'd gladly trade all of it for like a 50% reduction in memory bandwidth

23:45 rsalvaterra_ has joined #dri-devel

23:48 sdutt has joined #dri-devel

23:50 <glennk> a pcie 3 16x link does ~15GB/s, not a lot of cpu compressors that can output at that speed
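
To put the ~15GB/s figure next to the benchmark from earlier in the log (1m vertices of 32 bytes per frame), here is a minimal arithmetic sketch. It assumes decimal gigabytes and an ideal link with no protocol overhead; the function name is made up for illustration.

```python
def transfer_time_ms(size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to move size_bytes over a link with the given bandwidth, in ms."""
    return size_bytes / bandwidth_bytes_per_s * 1000.0


upload = 1_000_000 * 32  # bytes per frame, from the benchmark in the log
link = 15e9              # ~15 GB/s for a PCIe 3 x16 link, as quoted above

print(f"{transfer_time_ms(upload, link):.2f} ms per frame")
# prints "2.13 ms per frame"
```

Under those assumptions the raw copy takes a little over 2 ms per frame, so a compressor would have to sustain a comparable rate to avoid becoming the bottleneck itself.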

23:51 rsalvaterra has quit [Ping timeout: 480 seconds]