ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
LeviYun has quit [Ping timeout: 480 seconds]
bolson has joined #dri-devel
feaneron has quit [Ping timeout: 480 seconds]
vliaskov has quit [Ping timeout: 480 seconds]
Daanct12 has joined #dri-devel
flynnjiang has joined #dri-devel
<berberlightning> it's simply bullshit what you do, and it's like even though i said it's bullshit and described how big cow shit you do, it's like you keep on doing it, where as i already showed how to do everything properly the dependencies are handled fine, cause length is invariant it is taken from the same next power and assembled as allocation based of value and index, so let's do it with numbers
<berberlightning> then, so the fake dependencies have to work, or pseudo deps is a batter name, what happens is you add a pc compressed identifiers sequence to the hash which has a size 6k the whole data bank of every variable is only 6thousound decimal digits, you access that from the hash by adding it to the memory location as sum dudes this is only less than two bytes, two bytes max value is 65536
<berberlightning> units, and it has all the pc's enumerated in as to where to query the lengths and it does it multiaccess , so all lengths are queried together, the assembly of such bank and all the compilation process is extremely fast, the runtime is extremely fast and low power as well, the battery would not last for 7 days like linuses old employer transmeta promised, but with such performance as
<berberlightning> you offer now it would last for years. Of course it's a body blow for some companies, but however for safety reasons me and tom know that amd can not expose their clockless domain, we would beam eachother into ashes soon enough, i simulated all the blocks of miaow and opened the gtkwave diagrams timeline toggled in and out , amd knows what tom and me know, it can not be exposed for
<berberlightning> safety reasons. Because of that i made a new way, you can not directly use my code to emit waves which harm people, i call them supersonic, ultrasonic whatever is the spectre, cause clockless can do that , and i still have those molds in all over my body when they emitted such waves on me, correctly focused wave will crush your heart just like this and only wireless antenna is needed
<berberlightning> which is equipped with any modern board. they emitted both versions to me, short burst to my heart for demonstration as well as the backteria culturing out of spec wave, which made me very sick, it was some military battleground, yyeah i nearly died but came out ontop once again eventually recovered through the miracle, what they did was hope for last minute that i die, delivering the
<berberlightning> needed energy from the sky with lasers which made floods to happen on my location, room was full of water, and i was very sick, however i am truely special in tissue strength, and such people are all donors in this world, cause they are labeled through monester medicine mob, and they chip you without your approval, to take all the substances out as to what they need for vaccines etc.
<berberlightning> Though the active low does not permit that, the reptilians do not care, however i am not an alien those traits come genetically mutant ways, and this is no going back several centuries like noble prize fraud winner svande pääbo believes. It happens cause of military testings and science of mass destruction weapons. For an example the energy of nuclear bombs wave goes three times
<berberlightning> around all of the earth, and the so called mushroom where there is enough proximity and that is what my location of granny was when my dad was born, very stramge things will happen one generation forward, and me as twin dominant got +4 in kellars table, and such are so called gods or dieties that nowdays are used as cows due to result fixings. My brain scan is alive as hell, it can
<berberlightning> lit up all the darkness, but they injured my joints and i have had issues, its so active cause my skull is thicker than usual and i have pronounced forehead, brain cells are more protected and in bigger vacuum, so you just learn how to master your head, if severely milked and injured and broke off guy can do it, you can do it too. My reflexes as well as instinctive and fast thinking
berberlightning has quit [Remote host closed the connection]
berberlightning has joined #dri-devel
LeviYun has joined #dri-devel
Haaninjo has quit [Quit: Ex-Chat]
LeviYun has quit [Ping timeout: 480 seconds]
karenthedorf has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
flynnjiang has quit [Quit: flynnjiang]
<DemiMarie> dwfreed: banhammer?
<airlied> already done
flynnjiang has joined #dri-devel
berberlightning has quit [Remote host closed the connection]
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
apinheiro has quit [Quit: Leaving]
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
yyds has quit []
yyds has joined #dri-devel
alane has joined #dri-devel
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
LeviYun has joined #dri-devel
yyds has quit []
yyds has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
vedranm_ has joined #dri-devel
yyds has quit []
yyds has joined #dri-devel
yyds has quit [Remote host closed the connection]
<alyssa> how it started: I'm going to work on agx for a few minutes
<alyssa> how it's going: this is a fundamental bug in core NIR affecting every driver
<alyssa> :clown:
nerdopolis has quit [Ping timeout: 480 seconds]
yyds has joined #dri-devel
vedranm has quit [Ping timeout: 480 seconds]
Daaanct12 has joined #dri-devel
Daanct12 has quit [Read error: Connection reset by peer]
yyds has quit [Remote host closed the connection]
yyds has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
<idr> alyssa: I looked at that issue. That's... misery.
LeviYun has quit [Ping timeout: 480 seconds]
sukuna has joined #dri-devel
jsa has joined #dri-devel
bmodem has joined #dri-devel
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
Duke`` has joined #dri-devel
Company has quit [Quit: Leaving]
RAOF has quit [Remote host closed the connection]
RAOF has joined #dri-devel
jsa has quit [Ping timeout: 480 seconds]
kode54 has quit [Quit: The Lounge - https://thelounge.chat]
KDDLB has quit [Quit: The Lounge - https://thelounge.chat]
kode54 has joined #dri-devel
heat has quit [Ping timeout: 480 seconds]
LeviYun has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
kode54 has quit [Quit: The Lounge - https://thelounge.chat]
kode54 has joined #dri-devel
sima has joined #dri-devel
amarsh04 has quit []
jsa has joined #dri-devel
LeviYun has joined #dri-devel
coldfeet has joined #dri-devel
amarsh04 has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
LeviYun has quit [Ping timeout: 480 seconds]
<glehmann> there is a visual difference, but is it a bug or just a unrelated minor precision change?
<glehmann> also the "Calculating the difference..." doesn't seem to work and the more info link is a 404
flynnjiang1 has joined #dri-devel
itoral has joined #dri-devel
flynnjiang has quit [Remote host closed the connection]
warpme has joined #dri-devel
LeviYun has joined #dri-devel
tzimmermann has joined #dri-devel
frieder has joined #dri-devel
LeviYun has quit [Ping timeout: 480 seconds]
mlankhorst has quit [Ping timeout: 480 seconds]
jsa has quit [Ping timeout: 480 seconds]
alih has quit [Ping timeout: 480 seconds]
coldfeet has quit [Remote host closed the connection]
warpme has quit []
bmodem has quit [Ping timeout: 480 seconds]
jsa has joined #dri-devel
warpme has joined #dri-devel
LeviYun has joined #dri-devel
sghuge has quit [Remote host closed the connection]
sghuge has joined #dri-devel
<sima> tursulin, is the drm-intel-gt-next-2024-06-12 PR the one that didn't deliver to airlied's inbox?
LeviYun has quit [Ping timeout: 480 seconds]
* airlied doesn't have an inbox anymore :-)
<airlied> but PR's need to get to lore
bmodem has joined #dri-devel
<airlied> and match q = (s:\"pull\" t:\"airlied\" d:last.week..)
<airlied> if I take a week off my plan fails :-P
<tursulin> sima, airlied: I did not know it did not get delivered but someone just pinged me that it was missing
<airlied> I have that in my lore list, I might have failed to process it on my end
<sima> tursulin, hm I thought you'd mentioned that one of your PRs bounced on airlied's gmail
<sima> and thought maybe it was this one
<tursulin> hmm don't think that was me
<airlied> and it's all fine if a PR does bounce on my email, since I used to use patchwork and now I use lei to pull them
<airlied> I think I lost it because I was hopping around drm-misc-next pulls to make sure v3d built without warnings and jumped over it
<airlied> pulled it into my local tree now
rasterman has joined #dri-devel
LeviYun has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
vedranm has joined #dri-devel
vedranm_ has quit [Ping timeout: 480 seconds]
LeviYun has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
jmondi has quit [Quit: WeeChat 3.8]
lynxeye has joined #dri-devel
jkrzyszt has joined #dri-devel
Haaninjo has joined #dri-devel
smpl has joined #dri-devel
warpme has quit []
dolphin` has joined #dri-devel
dolphin is now known as Guest10938
dolphin` is now known as dolphin
Guest10938 has quit [Ping timeout: 480 seconds]
yyds has quit []
bmodem has quit [Ping timeout: 480 seconds]
warpme has joined #dri-devel
yyds has joined #dri-devel
sukuna has quit [Remote host closed the connection]
warpme has quit []
Mangix has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Mangix has joined #dri-devel
LeviYun has joined #dri-devel
<jfalempe> I sent a patch for review that affects multiple subsystems https://patchwork.freedesktop.org/series/135356/
<jfalempe> But it was taken in akpm-mm tree, and is now in linux-next/master, causing some build failure.
warpme has joined #dri-devel
vliaskov has quit [Read error: Connection reset by peer]
flynnjiang1 has quit []
coldfeet has joined #dri-devel
pcercuei has joined #dri-devel
kts has joined #dri-devel
dliviu has quit []
dliviu has joined #dri-devel
warpme has quit []
warpme has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
Daaanct12 has quit [Quit: WeeChat 4.3.2]
cmichael has joined #dri-devel
kts has joined #dri-devel
rgallaispou has joined #dri-devel
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
Calandracas_ has quit [Remote host closed the connection]
Calandracas has joined #dri-devel
hansg has joined #dri-devel
xroumegue has quit [Ping timeout: 480 seconds]
Hazematman has joined #dri-devel
<Hazematman> Hey all, I've been working on an MR to improve llvmpipe & lavapipe android support to work without kms_swrast, as well as improve the mesa documentation for android to include an out of tree build into an android image. I would appreciate any feedback on my MR https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344 :)
xroumegue has joined #dri-devel
bmodem has joined #dri-devel
vliaskov has joined #dri-devel
fab has joined #dri-devel
fab has quit []
<alyssa> idr: (:
<FL4SHK[m]> So if I write my own Vulkan driver, do I still benefit from a Mesa backend?
itoral has quit [Quit: Leaving]
Company has joined #dri-devel
OftenTimeConsuming has quit [Remote host closed the connection]
epoch101 has joined #dri-devel
OftenTimeConsuming has joined #dri-devel
simon-perretta-img has quit [Read error: Connection reset by peer]
simon-perretta-img has joined #dri-devel
bmodem has quit [Ping timeout: 480 seconds]
<alyssa> yes
nerdopolis has joined #dri-devel
<FL4SHK[m]> neat
<FL4SHK[m]> where does the integration with Mesa come in?
<FL4SHK[m]> I'm a little confused about that
<FL4SHK[m]> I have a GCC port I've written most of for the CPU/GPU (they will have similar instruction sets) I'm developing
<FL4SHK[m]> can I combine this with Mesa
<zmike> DavidHeidelberg: are you back now?
<Hazematman> FL4SHK[m]: Checkout `src/vulkan` in mesa, there is a bunch of common code in the folder to help implement a lot of vulkan functionality. In your own driver you would call functions from that module.
<Hazematman> Additionally you can look at the shader compiler backends for different platforms to see how they use common code for shader compilers in mesa. That kind of goes outside vulkan, but there is a lot of common code in `src/compiler/nir` for example.
<FL4SHK[m]> I see
<FL4SHK[m]> thanks
<FL4SHK[m]> can I make use of my GCC port?
heat has joined #dri-devel
<Hazematman> <FL4SHK[m]> "can I make use of my GCC port?" <- Not exactly sure what you're doing, but I assume you have a GCC port for a different cpu arch that you want to use in a cross compiler fashion? In that case yes, you just need to set up a cross compiler environment with meson. You can read this for instructions, but you should be able to set up a cross file that points to your custom gcc and use that to build mesa
<FL4SHK[m]> so I have a GCC port for the GPU
<FL4SHK[m]> that's what I'm getting at
<FL4SHK[m]> I'm doing something a bit different for my CPU/GPU design
<FL4SHK[m]> like I mentioned they will run similar instruction sets, though with some modifications if necessary
<FL4SHK[m]> I can compile regular C/C++ code with the GCC port targeting the GPU
<Hazematman> In that case I'm not exactly sure it would be as useful here... Mesa has its own compiler infrastructure, since vulkan expects SPIRV and OpenGL expects GLSL or SPIRV. For drivers in Mesa the common code typically will take SPIRV or GLSL, compile it into Mesa's IR which is called NIR, and then the drivers will ingest NIR and convert it to native machine code for their architecture.
<Hazematman> So i'm not sure what exactly you're hoping to achieve by making use of Mesa if you have your own compiler infrastructure already.
<Hazematman> There is common code for handling things like sync, swapchain, wsi, etc in Mesa that might be of interest to you
rgallaispou has quit [Read error: Connection reset by peer]
<FL4SHK[m]> so, I could just use the GCC port for running regular C/C++ on the GPU, and also write a Mesa port.
<FL4SHK[m]> what I want is to be able to run Vulkan and hopefully OpenGL on the machine
<FL4SHK[m]> the whole system is hopefully going to be a many-core machine where CPU/GPU are hooked up via an interconnect inside the FPGA
<FL4SHK[m]> I am unsure how many cores I can fit.
<FL4SHK[m]> I have 500k logic cells but none of those have to be used to implement any kind of RAM
<FL4SHK[m]> so there's a lot of space for cores if I keep them simple. One core might be 1000 or so logic cells
<FL4SHK[m]> maybe more like 2000
fab has joined #dri-devel
<FL4SHK[m]> oh, wait a minute
<FL4SHK[m]> no, the cores will be more than 2000 logic cells
<FL4SHK[m]> and there'd be fewer of them
<FL4SHK[m]> the 2000 logic cells number is from if I didn't include vector instructions
<FL4SHK[m]> but I want to include vector float instructions at least
<FL4SHK[m]> 32-bit floats that is
rgallaispou has joined #dri-devel
feaneron has joined #dri-devel
karenthedorf has quit [Remote host closed the connection]
<Hazematman> Sounds like an ambitious project 😅 just an FYI then if you want to run generic OpenGL or Vulkan application you'll need to be able to ingest GLSL or SPIRV, which I don't think GCC supports. So at some point you'll need to look at modifying GCC to support that or building a new compiler that can handle those.
<FL4SHK[m]> well
<FL4SHK[m]> yeah, it's an ambitious project
<FL4SHK[m]> haha
<FL4SHK[m]> the GCC port wasn't too bad
<FL4SHK[m]> took a couple months
<FL4SHK[m]> it's a long term project, mind you
<FL4SHK[m]> at least I'm not writing an OS too
<FL4SHK[m]> so I'll go ahead and write a SPIR-V compiler for the system
<FL4SHK[m]> I heard that you can just do a basic translation from SPIR-V to the GPU instruction set
<FL4SHK[m]> which is an assembly-to-assembly transpile
<FL4SHK[m]> I might be oversimplifying it
fab has quit [Ping timeout: 480 seconds]
alih has joined #dri-devel
<DemiMarie> Is similar CPU and GPU ISAs a bad idea?
<FL4SHK[m]> possibly
<karolherbst> yes
<FL4SHK[m]> I'm gonna try it
<karolherbst> GPUs by definition generally don't need vector instructions, because that's implicit in how they run things
<DemiMarie> I suggest going with a conventional design.
simon-perretta-img has quit [Ping timeout: 480 seconds]
<FL4SHK[m]> what I was going to try is hooking up a lot of small cores
<FL4SHK[m]> since that's a novel idea mostly
<FL4SHK[m]> apparently it's been tried with x86
<FL4SHK[m]> in the 2010s
Net147 has quit [Quit: Quit]
<DemiMarie> FL4SHK: tried and failed
<karolherbst> yeah, and it was a bad idea
<FL4SHK[m]> why did it fail?
simon-perretta-img has joined #dri-devel
<karolherbst> because running the same CPU code on a GPU with the same ISA is a wrong promise
Net147 has joined #dri-devel
<karolherbst> I suggest reading this series of blog posts: https://pharr.org/matt/blog/2018/04/18/ispc-origins
<karolherbst> as it covers this topic explicitly
meowmeow has joined #dri-devel
<DemiMarie> FL4SHK: There are three areas in which I would like to see something new.
<FL4SHK[m]> tell me
<FL4SHK[m]> my design is absolutely not finalized right now
<FL4SHK[m]> I could just go for the manycore thing for the CPU
<DemiMarie> The first is fault isolation: If a context faults or must be reset, the GPU should guarantee that other contexts are not affected.
<DemiMarie> The second is bounded-latency instruction-level preemption of all hardware, including fixed-function. This means that even a malicious shader cannot prevent itself from being preempted in a bounded amount of time, allowing the GPU to be used in real-time systems.
<alyssa> FL4SHK[m]: for mesa you really want to write a nir backend
<alyssa> it's not hard
<DemiMarie> The third is security isolation: the GPU should guarantee that information cannot be leaked across contexts.
<FL4SHK[m]> alyssa: sounds good
<FL4SHK[m]> I sitll want to hook up the GPU cores directly to my on-chip interconnect
<FL4SHK[m]> inside the FPGA
<Hazematman> DemiMarie: Isn't this mostly covered by hardware GPU contexts and per context virtual memory. Or there something I'm missing?
<FL4SHK[m]> that's what I thought too
<FL4SHK[m]> per-context virtual memory is like virtual memory per-process on a CPU right?
<Hazematman> 1 & 2 would be a game changer for real time GPU usage especially in SC systems
<Hazematman> FL4SHK[m]: yeah the same concept but applied to GPUs. A graphics context will have its own virtual address space and typically the kernel driver is responsible for mapping GPU virtual memory to physical memory
<FL4SHK[m]> I see. Also, from the looks of it, the vectorization stuff that failed for LRB was for general purpose code right?
<FL4SHK[m]> not as much for the shader code maybe?
<karolherbst> FL4SHK[m]: the point is, that once you rely on auto vectorization for performance you are kinda screwed
<FL4SHK[m]> even for shader code?
<karolherbst> that's why GPUs are generally SIMT rather than SIMD
<FL4SHK[m]> I see
<karolherbst> so the ISA looks all scalar, but implicitly the same instruction is ran on multiple threads/lanes/whatever you want to call it
<FL4SHK[m]> gotcha
<FL4SHK[m]> I can do that
<karolherbst> and you either explicitly or implicitly manage thread masks
<karolherbst> like e.g. on nvidia each instruction can be predicated to turn it off for the current thread
<karolherbst> but it still executes on other threads with the predicate being true in the same warp/subgroup
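(Editor's note: a minimal Python sketch of the predication model karolherbst describes above — every lane of a warp executes the same scalar instruction, and a per-lane predicate decides which lanes actually commit their result. The warp size and values are invented for illustration; this is a toy model, not any real GPU ISA.)

```python
# Toy SIMT execution: one scalar instruction, many lanes,
# a predicate mask gating which lanes commit.

WARP_SIZE = 8

def simt_add(dst, a, b, pred):
    """Execute `dst = a + b` across all lanes of a warp.
    Lanes whose predicate is False keep their old dst value."""
    return [ai + bi if p else di
            for di, ai, bi, p in zip(dst, a, b, pred)]

a = list(range(WARP_SIZE))     # per-lane private register values
b = [10] * WARP_SIZE
dst = [0] * WARP_SIZE
# Only even lanes are enabled; the instruction still "executes"
# on all lanes, but odd lanes are masked out.
pred = [i % 2 == 0 for i in range(WARP_SIZE)]

dst = simt_add(dst, a, b, pred)
print(dst)  # [10, 0, 12, 0, 14, 0, 16, 0]
```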
<FL4SHK[m]> can I at least make the instruction set similar to the CPU in some ways?
meowmeow has quit [Remote host closed the connection]
<FL4SHK[m]> like let's say I use some of the same instruction encoding
<FL4SHK[m]> for stuff like, ALU ops
<karolherbst> yeah, the issue isn't the ISA being the same, just often a GPU ISA is more specialized so you might as well
<FL4SHK[m]> I see
<karolherbst> but you could have certain instructions (e.g. vector ones) only work in the "CPU" mode
<karolherbst> and in the GPU mode, scalar instructions just execute in an entire subgroup
<FL4SHK[m]> I see
guludo has joined #dri-devel
<FL4SHK[m]> So that's neat
<karolherbst> each thread still has their own registers, but there is also the concept of "scalar" or "uniform" registers which are the same in each thread of a subgroup
<FL4SHK[m]> I'm going to have to study GPUs further
<FL4SHK[m]> so SIMT is like... partially SIMD right?
<FL4SHK[m]> partitioned SIMD?
<FL4SHK[m]> you have multiple SIMD units
<FL4SHK[m]> many of them
<karolherbst> well.. more like implicit SIMD
<FL4SHK[m]> that was what I was thinking of emulating
<FL4SHK[m]> hm
<FL4SHK[m]> I thought there were actual SIMD engines in the hardware; that was what I learned in school
<karolherbst> like.. e.g. those x86 SIMD instructions with a lane mask map directly to predicated scalar instructions in a SIMT ISA
<FL4SHK[m]> small SIMD engines
<FL4SHK[m]> Hm
<karolherbst> yeah.. but the main difference is, that the ISA is scalar
<karolherbst> (and internal details)
<FL4SHK[m]> I see
dbrouwer has quit []
leandrohrb56 has quit []
ndufresne has quit [Quit: The Lounge - https://thelounge.chat]
padovan85 has quit []
sre54 has quit []
<karolherbst> so.. because your ISA is scalar, you don't need auto vectorization to get great performance
<FL4SHK[m]> that makes sense
<karolherbst> and of course that also requires a different programming language e.g. glsl where you describe what each SIMD lane/SIMT thread is doing
<karolherbst> instead of looking at the entire group
<karolherbst> now a days anything vecN gets scalarized anyway
<karolherbst> *nowadays
<FL4SHK[m]> how do you get from a scalar ISA to telling the hardware what SIMD Lanes to use?
<karolherbst> (except for load/stores which some hardware can actually do wide ones of)
<FL4SHK[m]> somehow that has to be figured out
<karolherbst> all lanes execute the same instruction
<FL4SHK[m]> then how do you get the data for them?
<cwabbott> I'd say that it *is* possible to compile a SIMT language to a SIMD architecture, as long as predication is competent enough, but it requires a completely different compiler architecture
<lynxeye> the important thing is that you can switch between threads when one of them is blocked on memory, which allows you to hide memory latency without large caches and sophisticated prefetchers
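(Editor's note: lynxeye's latency-hiding point can be shown with a toy issue loop — when one warp parks on a memory load, the core issues instructions from another ready warp instead of stalling. The latency number and two-warp program are invented for the sketch.)

```python
# Toy round-robin warp scheduler: loads park a warp for a fixed
# (invented) latency; other warps fill the gap with ALU work.

MEM_LATENCY = 3  # cycles a load takes before its warp is ready again

def run(warps):
    """Each warp is a list of ops: 'alu' or 'load'.
    Returns the issue trace as (cycle, warp, op) tuples."""
    pc = [0] * len(warps)
    wake = [0] * len(warps)        # cycle at which each warp is ready
    cycle, issued = 0, []
    while any(pc[w] < len(warps[w]) for w in range(len(warps))):
        for w in range(len(warps)):
            if pc[w] < len(warps[w]) and wake[w] <= cycle:
                op = warps[w][pc[w]]
                issued.append((cycle, w, op))
                pc[w] += 1
                if op == "load":
                    wake[w] = cycle + 1 + MEM_LATENCY
                break                # one issue slot per cycle
        cycle += 1
    return issued

# Warp 0's load latency is hidden by warp 1's ALU instructions.
print(run([["load", "alu"], ["alu", "alu"]]))
# [(0, 0, 'load'), (1, 1, 'alu'), (2, 1, 'alu'), (4, 0, 'alu')]
```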
<karolherbst> FL4SHK[m]: you mean between threads in the same group?
<FL4SHK[m]> I think so
<FL4SHK[m]> you have to somehow specify where the data comes from
<lynxeye> and that's a direct consequence of the programming model, which you won't be able to realize with a SIMD model
<karolherbst> there are subgroup operations e.g. shuffle where you can move values between threads
<cwabbott> AMD's architecture, for example, is basically a SIMD core with a few goodies strapped on
<karolherbst> but normally each thread is just pulling the data it needs directly
<FL4SHK[m]> cwabbott: ah, yeah, that's what I was reading
<FL4SHK[m]> that AMD actually does have SIMD machines
<FL4SHK[m]> well, SIMD cores
<cwabbott> scalar registers are just normal registers, vector registers are SIMD registers, etc.
<FL4SHK[m]> yeah that is actually the model I was going to emulate
<cwabbott> subgroup ops are vector shuffles
<FL4SHK[m]> ooh
<FL4SHK[m]> so then
<FL4SHK[m]> is what karol was talking about for nvidia then?
<FL4SHK[m]> or is it applicable to AMD as well?
<cwabbott> just nvidia
<karolherbst> the concepts are similar, just mostly different terms being used
<karolherbst> or different models :P
<karolherbst> though AMD manages masks explicitly, no?
<FL4SHK[m]> that sounds like putting the masks into the ISA
<cwabbott> yes, AMD manages masks explicitly
<FL4SHK[m]> I'd be happy to emulate that
<cwabbott> the point i was trying to make is, it is possible to have a different model in the hardware that's more explicit
<cwabbott> and more SIMD-like
<karolherbst> yeah, fair
<cwabbott> but the *compiler* still has to be SIMT
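(Editor's note: a toy sketch of the explicit exec-mask handling cwabbott and karolherbst are discussing — the hardware view is SIMD-like, but the compiler still reasons about the original SIMT control flow, masking each side of an if/else and reconverging afterwards. The lane values and operations are invented; this is not ACO or any real AMD codegen.)

```python
# Explicit mask management for divergent control flow:
# run the "then" side under (exec & cond), the "else" side
# under (exec & ~cond), then reconverge to the entry mask.

def run_if_else(cond, then_fn, else_fn, values):
    n = len(values)
    exec_mask = [True] * n           # all lanes active on entry
    out = list(values)

    # "then" side: lanes where the condition holds
    then_mask = [e and c for e, c in zip(exec_mask, cond)]
    for i in range(n):
        if then_mask[i]:
            out[i] = then_fn(out[i])

    # "else" side: invert the condition under the entry mask
    else_mask = [e and not c for e, c in zip(exec_mask, cond)]
    for i in range(n):
        if else_mask[i]:
            out[i] = else_fn(out[i])

    # reconvergence point: entry mask is live again
    return out

vals = [1, 2, 3, 4]
cond = [v % 2 == 0 for v in vals]    # even lanes take the "then" side
print(run_if_else(cond, lambda x: x * 10, lambda x: -x, vals))
# [-1, 20, -3, 40]
```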
<FL4SHK[m]> oh, I see
<cwabbott> i.e. the register allocator has to still be aware of what the original control flow is
epoch101 has quit [Ping timeout: 480 seconds]
<FL4SHK[m]> so if I emulate AMD, I need to translate from SIMT to the ISA's more SIMD like arch?
<FL4SHK[m]> in the compiler
<cwabbott> yes, you would need to do that
<cwabbott> for example, ACO does that when translating from NIR
<FL4SHK[m]> okay, that sounds like something that could be a happy medium for my hardware design
<cwabbott> but it also keeps a representation of the original control flow
<FL4SHK[m]> I could still go with my original design idea?
<FL4SHK[m]> I see
<FL4SHK[m]> so if I port Mesa, can I still access that information in my backend?
<cwabbott> and register allocates vector registers with that
<cwabbott> so an existing backend that's written assuming a SIMD model would be mostly useless
<cwabbott> it has to at least be aware of the higher-level representation
<FL4SHK[m]> right, that's what I'm asking about. Can I access the higher-level representation from my backend?
<cwabbott> mesa's IR (NIR) is explicitly only using the higher-level representation
<FL4SHK[m]> ah, got it
<karolherbst> I think "vector registers" is kinda a dangerous term, because you still don't get a vector reg as you'd get on a SIMD x86 ISA, right? It's still a thread private register you get, you just encode "this operates on different values per thread" or did I misunderstand how that's working on AMD?
<FL4SHK[m]> I thought AMD actually did use CPU-like SIMD registers
<FL4SHK[m]> based upon their documentation I read
<cwabbott> i'd say it's more like CPU SIMD registers
<karolherbst> I think it depends on how you look at it
<FL4SHK[m]> excellent, that's exactly what I wanted to hear
<cwabbott> yes, it's sort-of a difference in semantics
<FL4SHK[m]> then my question is, if I go with CPU-like SIMD registers in my ISA, can I emulate AMD's model?
<karolherbst> if you look at the entire thing as a SIMD group, yes it makes more sense to say it's SIMD like, but if you look at it from a "single thread" perspective it kinda doesn't
<cwabbott> yes, but beware that you do have to explicitly design things to "be nice" with the SIMT model
pcercuei has quit [Quit: Lost terminal]
<cwabbott> for example, 16-bit values have to go in the upper/lower half of 32-bit values
<FL4SHK[m]> I am willing to change the hardware to better fit the SIMT model
<FL4SHK[m]> since I do have control over the ISA and stuff
<cwabbott> the "stride" for lack of a better word has to always be 32 bits
<karolherbst> I think there were people having a common ISA on both sides, but it operated on SIMD lanes for GPU "code"
<cwabbott> i.e. each vector lane must be 32 bits
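(Editor's note: the 32-bit lane "stride" cwabbott describes can be sketched as bit packing — two 16-bit values share one 32-bit register slot, living in the low and high halves, so per-lane register addressing stays uniform. The payload values below are arbitrary.)

```python
# Two 16-bit values packed into the halves of one 32-bit lane.

def pack_half_pair(lo_bits, hi_bits):
    """Pack two 16-bit payloads into one 32-bit register value:
    hi_bits in the upper half, lo_bits in the lower half."""
    return ((hi_bits & 0xFFFF) << 16) | (lo_bits & 0xFFFF)

def unpack_half_pair(reg):
    """Recover the (lower, upper) 16-bit halves of a 32-bit value."""
    return reg & 0xFFFF, (reg >> 16) & 0xFFFF

reg = pack_half_pair(0x1234, 0xABCD)
print(hex(reg))                      # 0xabcd1234
print(unpack_half_pair(reg))         # (0x1234, 0xABCD)
```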
<karolherbst> and it was just a scalar ISA
<FL4SHK[m]> oh, that's familiar to me
<FL4SHK[m]> I don't mind making it 32-bit
<FL4SHK[m]> so that would mean, you have 32-bit floats?
<FL4SHK[m]> I was hoping to go with 32-bit floats
<cwabbott> otherwise you get into a world of hurt if you try to use the higher-level control flow in your compiler
<FL4SHK[m]> since those will use less hardware
<FL4SHK[m]> oh
<karolherbst> GPUs normally don't encode the "type" in registers, they use the same registers for int and float operations
<karolherbst> but yeah.. they are most of the time 32 bit wide
<FL4SHK[m]> Okay well I'll keep that in mind
<FL4SHK[m]> right actually I was thinking of doing that as well
<FL4SHK[m]> already had that idea
<FL4SHK[m]> for the scalar stuff as well
kxkamil2 has joined #dri-devel
<FL4SHK[m]> and for the CPU as well :)
jkrzyszt_ has joined #dri-devel
<FL4SHK[m]> I need to write up a version of my ISA spec for a 64-bit version of the CPU
<FL4SHK[m]> or just modify the existing one
<karolherbst> I think the entire reason it's split in x86 is because of legacy
<cwabbott> GPUs tend to use the same cores for int and float operations
<FL4SHK[m]> I see, that makes sense
<cwabbott> because integer stuff just isn't as important
jsa1 has joined #dri-devel
<karolherbst> nvidia has explicit int units these days
<FL4SHK[m]> I've heard that before too
<karolherbst> but yeah...
<cwabbott> that naturally leads to using the same registers, whereas CPUs have the opposite tradeoff
<karolherbst> but nvidia is weird anyway
<karolherbst> if you do a float op on a result of an int alu, you need to wait one cycle more
<karolherbst> vs float -> float or int -> int
jsa has quit [Ping timeout: 480 seconds]
kxkamil has quit [Ping timeout: 480 seconds]
jkrzyszt has quit [Ping timeout: 480 seconds]
<mattst88> karolherbst: how do texture operations handle sources where some arguments are floats and some are ints? do you have to move the ints to the float register file?
<karolherbst> mattst88: it's all raw data
<karolherbst> the instruction is responsible for interpreting the data
<mattst88> right, but can the texture operation take some sources from the float reg file and some from the int file?
<karolherbst> float vs int regs don't exist
<karolherbst> it's all registers
<mattst88> oh, they use the same register file. it's just that there's a different ALU unit and some additional latency when moving results from the int ALU to the fp ALU, etc
<karolherbst> that's why NIR is also entirely untyped, because it just doesn't really make sense to have typed registers
<mattst88> yeah, gotcha
<karolherbst> mattst88: correct
<karolherbst> though on nvidia it's all werid, because the scoreboarding/latency is done at compile time
<karolherbst> so the compiler has to know those rules
<karolherbst> and results just appear in a register at some defined time
<mattst88> yeah, makes sense
<karolherbst> (which also means, that an instruction executed later can actually clobber the input of a previous instruction)
<mattst88> so NVIDIA has to do the software scoreboarding stuff in the compiler, like recent Intel GPUs?
<mattst88> and presumably has had that for much longer?
<karolherbst> yeah, it's quite old
<karolherbst> they experimented with that in kepler, but made it a full requirement with maxwell
<karolherbst> so like over 10 years roughly?
<karolherbst> it's quite complicated really. There are also instructions which read some inputs 2 cycles later and stuff 🙃
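(Editor's note: a toy model of the compile-time scoreboarding karolherbst describes — the compiler knows each instruction's fixed latency and inserts stall counts so a consumer never issues before its producer's result lands in the register. The opcodes and latency numbers are invented, not real NVIDIA values.)

```python
# Static latency scheduling: compute per-instruction stalls
# from known (invented) producer latencies.

LATENCY = {"fadd": 4, "imul": 5, "mov": 1}   # made-up cycle counts

def schedule(instrs):
    """instrs: list of (op, dst, srcs).
    Returns (op, stall) pairs, where `stall` is the number of
    cycles to wait before issuing that instruction."""
    ready_at = {}        # register -> cycle its value becomes valid
    cycle = 0
    out = []
    for op, dst, srcs in instrs:
        # wait until every source register's producer has landed
        need = max((ready_at.get(s, 0) for s in srcs), default=0)
        stall = max(0, need - cycle)
        cycle += stall + 1                    # issuing takes 1 cycle
        ready_at[dst] = cycle - 1 + LATENCY[op]
        out.append((op, stall))
    return out

prog = [("imul", "r0", ["r1", "r2"]),   # r0 ready 5 cycles later
        ("fadd", "r3", ["r0", "r4"])]   # consumes r0 -> must stall
print(schedule(prog))                   # [('imul', 0), ('fadd', 4)]
```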
warpme has quit []
TMM has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM has joined #dri-devel
<mattst88> makes sense
<alyssa> cwabbott: I assume you've seen https://github.com/ispc/ispc ?
<mattst88> it usually takes at least 5 years for some innovation in nvidia gpus to show up in intel gpus :)
<alyssa> mattst88: moof
<alyssa> mood
<cwabbott> alyssa: yes, and I assume that because it just uses llvm under the hood the codegen for more complicated things is bad
<karolherbst> cwabbott: it doesn't rely on an auto vectorizer
<alyssa> ^
<cwabbott> yes, I know
<karolherbst> it's a custom language translated to SIMD
<alyssa> afaik it's just the ACO thing
<karolherbst> at least it seems to perform better than auto vectorizers :D
<cwabbott> but by the time it hits the backend, the higher-level information is lost, no?
<karolherbst> it emits SIMD instructions/intrinsics directly afaik
<alyssa> oh you mean for RA and stuff.. yeah, presumably
<cwabbott> so it becomes a soup of predicated things where RA can't do a good job
<karolherbst> so not sure why it would matter?
<karolherbst> ahhh
<karolherbst> yeah, that could be
<cwabbott> imagine you have a loop, oops everything conflicts with everything else because it's all predicated
<cwabbott> that sort of thing
<karolherbst> though the question is: what's your alternative? use OpenMP declarations or rely on the auto-vectorizer?
<cwabbott> yeah no, there's no good alternative for CPUs, you'd need to write a different backend from scratch to do it properly
<karolherbst> yeah..
<karolherbst> I think ispc is probably good enough of a solution here without reinventing everything
<karolherbst> anyway, fascinating project nonetheless
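The SPMD-on-SIMD model being discussed, and cwabbott's register-allocation complaint about it, can be illustrated with a pure-Python stand-in: each program instance maps to a SIMD lane, and a divergent `if` becomes masked execution of both sides followed by a blend. Note that both the `then` and `else` values are live at the merge point, which is exactly the pressure/conflict problem described for predicated loops. No real ispc syntax or intrinsics appear here; this is a hedged sketch of the execution model only.

```python
# SPMD-on-SIMD in miniature: divergent control flow becomes masked
# execution of both branch sides plus a select (like a vector blend).
def spmd_select(xs):
    # source-level per-instance program: result = x * 2 if x > 0 else -x
    mask = [x > 0 for x in xs]        # per-lane predicate
    then_vals = [x * 2 for x in xs]   # both sides execute for all lanes...
    else_vals = [-x for x in xs]      # ...keeping both results live at once
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
```

Because `then_vals` and `else_vals` must coexist until the blend, a backend that only sees the predicated soup has to treat them as conflicting, inflating register pressure.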
<DavidHeidelberg> zmike: still on the around-the-world trip.. just occasionally doing something until I start at some new job :)
<gfxstrand> jenatali: Thinking about WDDM in mesa "for real" and I'm not sure I want to have a hard dependency on libdxg. Thoughts about marshalling things?
<jenatali> Marshaling?
<jenatali> In WSL you can rely on those entrypoints being available in the distro in a libdxcore.so, and in Windows they come from gdi32.dll
<gfxstrand> Yeah but let's say Ubuntu is going to ship Mesa with WDDM enabled
<gfxstrand> Does that mean Ubuntu also ships libdxg and it just doesn't do anything?
<gfxstrand> I guess that's probably fine. It's tiny.
<jenatali> That's an option, but you could also operate like we do with the d3d12/dozen driver, which ships enabled in Ubuntu AFAIK, and just dlopen
<jenatali> If libdxcore.so isn't there at runtime then you don't have WDDM anyway
<gfxstrand> Yeah, but then we have to dlsym everything
<gfxstrand> That's kinda what I was asking for thoughts on
<jenatali> Not necessarily, can't dlopen promote things into the global namespace?
<gfxstrand> Maybe we can with weak symbols of some sort
<jenatali> You'd have to allow unresolved symbols at link time though I guess for that to work
<alyssa> are Windows uapis stable ?
<jenatali> Yes
<alyssa> dang
<jenatali> All APIs provided from any Windows DLL, whether it's a kernel-accessor API or just strictly usermode, are stable once they ship in a retail OS
<gfxstrand> the pPrivateDatas, though, are anyone's guess.
<jenatali> Right, those are generally not considered stable
<jenatali> We require UMD and KMD to match because vendors have refused to commit to making those stable...
<gfxstrand> Yeah, they're usually literally just a struct in a header in a perforce tree somewhere
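The runtime-probe approach discussed above can be sketched with Python's ctypes as a stand-in for dlopen(3): look for libdxcore.so at runtime, promote its symbols with RTLD_GLOBAL (the trick mentioned for avoiding a dlsym per entrypoint, provided unresolved symbols are allowed at link time), and treat a failed load as "no WDDM". This is a hypothetical sketch; a real Mesa loader would do the same thing in C.

```python
# Probe for libdxcore.so at runtime instead of a hard link-time dependency.
import ctypes

def load_dxcore():
    try:
        # RTLD_GLOBAL promotes the library's symbols into the global
        # namespace, so previously-unresolved references can bind to them.
        return ctypes.CDLL("libdxcore.so", mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None  # no libdxcore at runtime -> no WDDM anyway

dxcore = load_dxcore()
print("WDDM available" if dxcore is not None else "WDDM unavailable")
```

On a machine without the WSL runtime the probe simply returns None, matching the point that a missing libdxcore.so means there is no WDDM to drive anyway.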
<DemiMarie> jenatali gfxstrand: does that mean that Mesa can be used as the UMD of a WDDM2 driver?
<jenatali> If the driver actually has a stable KMD interface, yeah
<jenatali> Or if the vendor is shipping Mesa as their UMD along with a matching KMD
<gfxstrand> DemiMarie: Yes, in theory
<DemiMarie> jenatali gfxstrand: Use-case is GPU acceleration in Windows guests on Qubes OS, which will require virtio-GPU native context support. That requires Mesa as the UMD and a proxy as KMD.
<DemiMarie> So the KMD interface would just be a proxy for the host’s KMD.
<DemiMarie> gfxstrand: What about in practice?
<DemiMarie> gfxstrand: what point are you trying to make?
<gfxstrand> I have it working, more-or-less
<gfxstrand> Whether or not we'll ship remains to be seen
<DemiMarie> gfxstrand: in what context are you considering shipping it?
<gfxstrand> Unclear
<gfxstrand> Right now I'm mostly concerned with proving it possible
<DemiMarie> Interesting.
<DemiMarie> Which KMD are you using?
<gfxstrand> The one AMD ships
<Kayden> radv on windows? nice
<mattst88> that's pretty amazing
* zmike cackles maniacally
<Ermine> Wowsers
<feaneron> nice
* soreau blinks
<ccr> "the science has gone too far."
<alyssa> we do what we must, because we can
* DemiMarie wonders why one would not just use AMD’s UMD on Windows
<gfxstrand> Because they're crazy!
<gfxstrand> Or because daniels bet her $100 she couldn't do it. :D
<alyssa> psh, she can do anything
<Sachiel> silly daniels making that bet when $1 would have been enough
<daniels> gfxstrand: plus the cost of my time to work out PPM encoding so I could get that awesome logo on vkcube
<daniels> also the shipping
<gfxstrand> Okay, now I have semi-competent device enumeration such that RADV doesn't try to open my NVIDIA card
<alyssa> what could go wrong with that
<DragoonAethis> gfxstrand: can I bet you another $100 to make it go the other way around? :D
<gfxstrand> What do you define as the other way around?
<DragoonAethis> Windows Vulkan blobs on amdgpu on Linux
<airlied> that's what amdgpu-pro is
<gfxstrand> Yeah, no
<gfxstrand> Also that
<airlied> I'll take the $100 now :-P
<DragoonAethis> that's cheating ;p
<soreau> shoulda stuck at a dollar
<DragoonAethis> Should we meet at XDC or something, I'll owe you a beer
<agd5f> gfxstrand, the WSL support in ROCm works that way. Basically a different ROC runtime which converts the KFD calls to dxgi calls
<zf> oop
<zf> sorry, wrong channel