<DUOLabs[m]>
Does anyone know exactly how Venus on the guest sends Vulkan commands to qemu/virgl on the host?
<airlied>
via virtio
<DUOLabs[m]>
That doesn't make things any more clear.
<airlied>
not sure what information you need though, the kernel driver talks to qemu via virtio, which is a standard transport mechanism
<DUOLabs[m]>
Yes, but looking through qemu's source code, I'm not sure where this takes place: for example, is it when virtio_gpu_simple_process_cmd processes VIRTIO_GPU_CMD_RESOURCE_CREATE_2D, or somewhere else?
<airlied>
virgl_cmd_submit_3d
<airlied>
though I haven't seen the venus code for qemu
<DUOLabs[m]>
I looked there already, but when I traced virgl_renderer_submit_cmd, no ctx passed in every went to venus, but straight to vrend proper, which is why I'm a little confused.
camus has quit []
<HdkR>
It's becoming even more opaque with the whole drm passthrough thing :D
camus has joined #dri-devel
<DUOLabs[m]>
s/every/ever/, s/venus/Venus/
smilessh has quit [Read error: Connection reset by peer]
smilessh has joined #dri-devel
<airlied>
DRM_IOCTL_VIRTGPU_EXECBUFFER seems to be the path that gets called
<airlied>
and that calls VIRTIO_GPU_CMD_SUBMIT_3D
<airlied>
and on the qemu side that seems to go into virgl_cmd_submit_3d
<airlied>
but yeah I can't see how that hooks up in virglrenderer
<airlied>
DUOLabs[m]: I think the proxy layer steps in somewhere on the renderer side
<airlied>
proxy_context_submit_cmd
<DUOLabs[m]>
No, that can't be it --- only `vrend_decode_ctx_submit_cmd` is ever called.
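A toy model of the dispatch being discussed may help frame the question. It assumes the renderer keeps a table of contexts keyed by `ctx_id` and that each context carries its own submit callback, so which handler runs depends only on how the context was created. All names below (`ContextTable`, `submit_cmd`, `vrend_submit`, `venus_submit`) are hypothetical stand-ins, not virglrenderer's API.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-context submit handlers; they only report who was called.
def vrend_submit(buf: bytes) -> str:
    return "vrend"

def venus_submit(buf: bytes) -> str:
    return "venus"

class ContextTable:
    """Maps a ctx_id to the submit handler chosen when the context was created."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[[bytes], str]] = {}

    def create(self, ctx_id: int, handler: Callable[[bytes], str]) -> None:
        self._handlers[ctx_id] = handler

    def lookup(self, ctx_id: int) -> Optional[Callable[[bytes], str]]:
        return self._handlers.get(ctx_id)

def submit_cmd(table: ContextTable, ctx_id: int, buf: bytes) -> str:
    """Route a command buffer to whichever handler owns ctx_id."""
    handler = table.lookup(ctx_id)
    if handler is None:
        # No such context: the submission cannot reach any renderer.
        raise LookupError(f"no context with ctx_id {ctx_id}")
    return handler(buf)
```

Under this assumption, seeing only the vrend handler would mean that no context was ever created with a Venus handler, which points at context creation (and the capset used there) rather than at the submit path itself.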
<tzimmermann>
narmstrong, thank you for taking care and informing us
<narmstrong>
tzimmermann: sure, waiting for some feedback on the revert and I'll push it on drm-misc-next
sgruszka has joined #dri-devel
heat_ has joined #dri-devel
heat has quit [Read error: No route to host]
pcercuei has joined #dri-devel
<MrCooper>
q66: some ARM SOCs have not-fully-compliant PCIe, which means they can't work correctly with amdgpu
pcercuei_ has joined #dri-devel
pcercuei has quit [Ping timeout: 480 seconds]
lynxeye has joined #dri-devel
Haaninjo has joined #dri-devel
djbw has quit [Read error: Connection reset by peer]
pochu has joined #dri-devel
pjakobsson has joined #dri-devel
jkrzyszt_ has joined #dri-devel
lumag_ is now known as lumag
heat_ has quit [Remote host closed the connection]
heat_ has joined #dri-devel
<jfalempe>
tzimmermann, for mgag200 I can use the GEM DMA helper, but it only works at lower resolutions because it can't allocate more than 4MB for the framebuffer. I'm wondering how other drivers are using it; are they also hitting this limitation?
<tzimmermann>
jfalempe, i've seen your response. i'm thinking how to move forward
<tzimmermann>
the other drivers with dma helpers are on SoCs. they set aside CMA areas and it apparently works. i'm a bit surprised that x86 is somewhat fragile about this
bgs has quit [Remote host closed the connection]
<jfalempe>
I think on x86, most hardware has scatter-gather DMA and can allocate smaller chunks.
<_jannau_>
there should be drivers using GEM DMA helpers on SoC behind an IOMMU. those do not need CMA but can still allocate large framebuffers
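To put the 4MB figure in perspective, here is a rough size check. It assumes 4 bytes per pixel and an unpadded pitch; neither is stated in the discussion, and the resolutions are only examples.

```python
# The 4MB contiguous-allocation limit mentioned above.
LIMIT_BYTES = 4 * 1024 * 1024

def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one framebuffer, assuming no padding at the end of each row."""
    return width * height * bytes_per_pixel

def fits_in_limit(width: int, height: int, bytes_per_pixel: int = 4) -> bool:
    """True if a single contiguous allocation of LIMIT_BYTES can hold the framebuffer."""
    return framebuffer_bytes(width, height, bytes_per_pixel) <= LIMIT_BYTES

for w, h in [(1024, 768), (1280, 1024), (1920, 1080)]:
    print(f"{w}x{h}: {framebuffer_bytes(w, h)} bytes, fits={fits_in_limit(w, h)}")
```

Under these assumptions 1024x768 needs 3 MiB and fits, while 1280x1024 already needs 5 MiB, which matches the observation that only lower resolutions work.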
<tzimmermann>
jfalempe, i don't like that the dma code is "hacked into" the damage handling (for the lack of a better description)
swalker_ has joined #dri-devel
swalker_ is now known as Guest1397
rasterman has joined #dri-devel
<jfalempe>
tzimmermann, you mean I should move more code to mgag200_dma.c, or that it shouldn't be called from the handle_damage() function?
<jfalempe>
damage is where we copy the pixels to the VRAM, so I don't see how I can do that differently.
<tzimmermann>
maybe the memcpy() can still be avoided with careful use of the SHMEM helpers.
<tzimmermann>
if physical SHMEM pages are within the lower 32-bit range, they should be dma-able. with care, they could be flushed and dma-ed directly.
<tzimmermann>
that's for a different patchset, though
<jfalempe>
hum, they also should be contiguous in physical memory, which may be unlikely.
<jfalempe>
my server has 4GB of RAM, so I thought all of it should be in the lower 32-bit range, but in practice, with CMA it often has addresses over the 32-bit limit.
<jfalempe>
tzimmermann, I will do a v2, trying to address all your comments.
<tzimmermann>
sgruszka, looks reasonable. done. someone else has to ack as well.
<tzimmermann>
maybe mlankhorst or danvet ^
greenjustin_ has quit [Remote host closed the connection]
<sgruszka>
tzimmermann: thanks!
<sima>
tzimmermann, mripard maybe, I generally try to leave this to you all
<kode54>
mismanaging my Fastmail account
<kode54>
"oops" using before/after ranges to move every archived message into the archive folders
<kode54>
resulted in all my categorized folders being emptied
<kode54>
so now I'm moving the messages based on the original rules back into their original category folders
<kode54>
"to:@vger.kernel.org" - "Moving 450,774 conversations to Linux Kernel"
neonking has joined #dri-devel
YuGiOhJCJ has joined #dri-devel
aravind has joined #dri-devel
mauld has quit [Quit: WeeChat 3.6]
cmichael has joined #dri-devel
kts has joined #dri-devel
<doras>
jenatali: any suggestion for the name of the Meson option? Should it be "feature" or "boolean"? What should be the default/auto behavior? If it's a "feature" option, we need to consider that `--auto-features enabled` is commonly used, so the `enabled` case should reflect the more commonly desired case.
<dj-death>
is there a NIR function that tells whether a block/instr can run before another block/instr ?
<dj-death>
like with if () { } else { }: if there is no loop wrapping it, you can tell that the if block will never execute before the else block
<dj-death>
those are 2 different paths
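One way to phrase the question is as reachability in the control-flow graph: block A can run before block B only if B is reachable from A. The sketch below uses a plain dictionary as the CFG and is not NIR code; the block names and the `can_run_before` helper are made up for illustration.

```python
from collections import deque
from typing import Dict, List

Cfg = Dict[str, List[str]]  # block name -> successor block names

def reachable(cfg: Cfg, src: str, dst: str) -> bool:
    """True if dst can be reached from src by following at least one edge."""
    seen = set()
    queue = deque(cfg.get(src, ()))
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(cfg.get(node, ()))
    return False

def can_run_before(cfg: Cfg, a: str, b: str) -> bool:
    """a may execute before b only if some path leads from a to b."""
    return reachable(cfg, a, b)

# Plain if/else: the two branches only meet again at the merge block.
if_else: Cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": [],
}

# Same shape wrapped in a loop: the merge block jumps back to the entry.
looped: Cfg = dict(if_else, merge=["entry"])
```

With `if_else`, `can_run_before(if_else, "then", "else")` is false in both directions, which is the "2 different paths" case. With `looped`, the back edge makes each branch reachable from the other, so either can run first across iterations.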
sukrutb__ has quit [Ping timeout: 480 seconds]
<doras>
jenatali: my current approach is a "feature" option called `opencl-external-clang-headers`, with its default `auto` behavior being "enabled". I'd rather not tie it to `microsoft-clc` in any way; common code paths changing depending on which users were enabled (and affecting other users) is quite unexpected. It does mean that you'll need to disable it explicitly for the Windows packaging use case (and CI).
<llyyr>
mesa vulkan-beta builds have been failing without -Wno-error=missing-prototypes for a while now, is this intended?
sukrutb__ has joined #dri-devel
neonking has quit [Remote host closed the connection]
nehsou^ has quit [Remote host closed the connection]
Company has joined #dri-devel
thenemesis has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
thenemesis has joined #dri-devel
heat_ has quit [Read error: Connection reset by peer]
heat has joined #dri-devel
thenemesis has quit [Read error: Connection reset by peer]
thenemesis has joined #dri-devel
idr has joined #dri-devel
greenjustin has joined #dri-devel
kzd has joined #dri-devel
<DavidHeidelberg[m]>
eric_engestrom: are u having some devices under outage? For example now I see 2 rpi jobs waiting to get triggered, while rest is running
<eric_engestrom>
DavidHeidelberg[m]: not that I know
<eric_engestrom>
checking
<DavidHeidelberg[m]>
so, maybe decrease the number of jobs by two?
<eric_engestrom>
yeah, maybe we need to tweak the jobs a bit
<DavidHeidelberg[m]>
eric_engestrom: if some jobs are going to wait, I would decrease the parallelism to avoid setting up a new job (it still increases the job runtime, but at least not by that much)
<jasuarez>
let me check
<DavidHeidelberg[m]>
thx! :)
<DavidHeidelberg[m]>
anholt: same goes for a630, I see 7 jobs running, 3 waiting. Are there currently 10 devices available?
<eric_engestrom>
anholt's on holiday iirc
<DavidHeidelberg[m]>
thx. I'll see on the next job, if still 3 devs will wait, I'll drop few tests for now
<DavidHeidelberg[m]>
*devs=devices
<jasuarez>
David Heidelberg: seems all the devices are already running jobs
<jasuarez>
there are 21 rpi4 available
<jasuarez>
and now all of busy working in your pipeline :)
<jasuarez>
s/of/are
<DavidHeidelberg[m]>
ok, so someone must have been running some tests from a different pipeline
<DavidHeidelberg[m]>
thanks!
<jasuarez>
those jobs are now running
gouchi has quit [Remote host closed the connection]
<DavidHeidelberg[m]>
jasuarez: eric_engestrom btw. rpi4 jobs, I see it runs ~ 20 minutes. You need to cut it down to 15, please.
<jasuarez>
yes, we need to do some adjustments
<DavidHeidelberg[m]>
for LAVA devices it isn't that much pain, since we can do prioritization, but if someone sends two pipelines to your farm just as pre-merge starts, it'll take more than 1h to finish
<DavidHeidelberg[m]>
with 15 minutes it would be still doable I think
<jasuarez>
yeah, totally agree
<DavidHeidelberg[m]>
Thank you :)
<jasuarez>
np!
thenemesis has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
zzoon has quit [Ping timeout: 480 seconds]
ADS_Sr has joined #dri-devel
pallavim has joined #dri-devel
<q66>
DavidHeidelberg[m]:
<q66>
oh shit
<q66>
sorry
<q66>
i was in mobile irc and accidentally tapped your name
<DavidHeidelberg[m]>
np, just read your report on Ampere and radeonsi an hour ago
<q66>
yeah still no luck with that
<q66>
I'll dig around more next week
<q66>
i'll also be putting in different RAM but i doubt that'll make a diff
thenemesis has joined #dri-devel
thenemesis_ has joined #dri-devel
thenemesis_ has quit []
MrCooper has quit [Remote host closed the connection]
MrCooper has joined #dri-devel
thenemesis has quit [Ping timeout: 480 seconds]
<DUOLabs[m]>
<airlied> "okay then that seems to be where..." <- What makes this even more interesting is that `vn_instance_init_experimental_features` returns the wrong data --- all properties are shown as 0, which is not what virglrenderer's source code says.
<DUOLabs[m]>
This implies that the command actually fails to be sent, and some default value is shown.
thenemesis has joined #dri-devel
sgruszka has quit [Remote host closed the connection]
JohnnyonF has joined #dri-devel
jfalempe has quit [Read error: Connection reset by peer]
jfalempe_ has joined #dri-devel
JohnnyonFlame has quit [Ping timeout: 480 seconds]
<DUOLabs[m]>
This probably means that vn_decode_vkGetVenusExperimentalFeatureData100000MESA_reply fails somewhere, causing instance->experimental to be NULL.
junaid has joined #dri-devel
thenemesis has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
sukrutb__ has joined #dri-devel
thenemesis has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
thenemesis has quit [Remote host closed the connection]
<DavidHeidelberg[m]>
eric_engestrom: just keep it and squash it with something else
<DavidHeidelberg[m]>
I think we seriously need to implement a CI: skip tag
<DavidHeidelberg[m]>
or maybe CI: skip-HW-test
<DavidHeidelberg[m]>
like for one linting line we run 250 devices for 10 minutes under full load. This is fcking ecology disaster (and also we have to wait)
tzimmermann has quit [Quit: Leaving]
<DavidHeidelberg[m]>
...ehm not 10 minutes but more like 15-20 minutes
<DavidHeidelberg[m]>
*ecological
<eric_engestrom>
DavidHeidelberg[m]: ack
<eric_engestrom>
and yeah, earlier I merged a .gitlab-ci/*.yml change where the fact the pipeline was created was the verification that the change was correct, but it still ran the entire pipeline with all the hardware tests
MrCooper has quit [Remote host closed the connection]
<DavidHeidelberg[m]>
I'll bring it up at Monday's team meeting; we'll need to implement something for it in Marge-bot, I assume.
thenemesis has quit [Read error: Connection reset by peer]
<eric_engestrom>
I think a label would work well, we would just need to add `if [[ "$CI_MERGE_REQUEST_LABELS" = *ci::skip* ]]; then exit 0; fi` at the top of each job that should be skipped (ie. at least all hardware jobs)
<eric_engestrom>
(or a !reference to that actually, to make it easier to edit later)
<DUOLabs[m]>
<DUOLabs[m]> "This probably means that..." <- However, while most `cs.hdr.ctx_id` values are `2` (I'm assuming the "main" Virgl context), some are `5`, which may refer to the Venus context (in that case, why is the capset not for Venus?)
<eric_engestrom>
we could even have `ci::skip-hardware-tests` and `ci::skip-everything` for instance, if we want to even skip build jobs as well sometimes
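The label idea can be sketched as a small decision function. It relies on `CI_MERGE_REQUEST_LABELS` being a comma-separated list of label names, as GitLab documents it. The `job_kind` argument and its `"hardware"` / `"build"` values are hypothetical, since how jobs would be classified was not settled in the discussion.

```python
from typing import Set

def parse_labels(raw: str) -> Set[str]:
    """Split the comma-separated label string into a set of label names."""
    return {label.strip() for label in raw.split(",") if label.strip()}

def should_skip(job_kind: str, raw_labels: str) -> bool:
    """Decide whether a job of the given kind should exit early.

    job_kind is a made-up classification ("hardware" or "build").
    """
    labels = parse_labels(raw_labels)
    if "ci::skip-everything" in labels:
        return True
    if job_kind == "hardware" and "ci::skip-hardware-tests" in labels:
        return True
    return False

# In a real job the string would come from the environment, for example:
#   os.environ.get("CI_MERGE_REQUEST_LABELS", "")
```

Matching whole label names rather than the substring `ci::skip` keeps an unrelated label that merely contains that text from skipping jobs by accident.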
junaid has quit [Ping timeout: 480 seconds]
<DUOLabs[m]>
DUOLabs[m]: However, virgl_context_lookup doesn't find any context with ctx_id 5
MrCooper has joined #dri-devel
aravind has quit [Ping timeout: 480 seconds]
thenemesis has joined #dri-devel
thenemesis has quit [Remote host closed the connection]
djbw has joined #dri-devel
<daniels>
I can’t wait until everyone just smashes their MRs in with ci::skip-everything because I am awesomely smart and how dare a machine tell me my code isn’t perfect
cmichael has quit [Quit: Leaving]
thenemesis has joined #dri-devel
smilessh has quit [Remote host closed the connection]
smilessh has joined #dri-devel
thenemesis has quit [Read error: Connection reset by peer]
<eric_engestrom>
yeah that's always the risk :/
<eric_engestrom>
I guess we could add checks that if any driver code is changed, using these labels fails the pipeline
thenemesis has joined #dri-devel
<eric_engestrom>
DavidHeidelberg[m]: kernel uprev is breaking X on freedreno :/
thenemesis has quit [Remote host closed the connection]
<DavidHeidelberg[m]>
eric_engestrom: I see :/ just a few patchlevels up....
<DavidHeidelberg[m]>
robclark: any ideas? "23-05-26 17:04:31 R SERIAL> [ 18.416476] msm-mdss: probe of 900000.display-subsystem failed with error -110"
<eric_engestrom>
(also, commit says the hash is for 6.3.4 but the image tag bump says 6.3.3; the image tag doesn't matter but it might be good to be consistent, assuming the latter is the wrong one)
<eric_engestrom>
DavidHeidelberg[m]: can you add a comment on the line with the kernel url saying which tag the hash corresponds to, to avoid having to look it up?
<DavidHeidelberg[m]>
I added the link into the MR
<robclark>
DavidHeidelberg[m]: looks like most of the changes are disp/dpu (no idea why, for ex, "drm/msm/dpu: split SM8550 catalog entry to the separate file" ended up on stable).. so should be pretty much unrelated..
thenemesis has quit [Read error: Connection reset by peer]
<DavidHeidelberg[m]>
I was thinking this one could affect a530
<DavidHeidelberg[m]>
but it doesn't look very likely :/
<robclark>
it didn't trigger any igt fails in msm-next/msm-fixes CI
<robclark>
IIRC db845c is using a bridge chip for hdmi, so could be something outside of drm/msm .. but maybe you could bisect
Cyrinux9 has quit []
Cyrinux9 has joined #dri-devel
<robclark>
(or at least narrow it down to which 6.3.y tag started having problems)
<DavidHeidelberg[m]>
I'll try to do it, but just as I reduce the 1hr qaiting for the rootfs generation
<DavidHeidelberg[m]>
*waiting
thenemesis has joined #dri-devel
paulk-bis has joined #dri-devel
sassefa has joined #dri-devel
<robclark>
thx
sassefa has left #dri-devel [#dri-devel]
sassefa has joined #dri-devel
sassefa has quit []
sassefa has joined #dri-devel
smilessh has quit [Ping timeout: 480 seconds]
paulk has quit [Ping timeout: 480 seconds]
thenemesis has quit [Remote host closed the connection]
thenemesis has joined #dri-devel
thenemesis has quit []
bgs has joined #dri-devel
junaid has joined #dri-devel
<sassefa>
Hello. I'm Surafel and I'm looking to get into DRI development. Just introducing myself for now 👋️
iive has joined #dri-devel
lynxeye has quit [Quit: Leaving.]
kts has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
sassefa has quit [Ping timeout: 480 seconds]
sassefa has joined #dri-devel
Lyude has joined #dri-devel
krushia has joined #dri-devel
idr has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
Duke`` has quit [Ping timeout: 480 seconds]
sima has quit [Ping timeout: 480 seconds]
junaid has quit [Remote host closed the connection]
bgs has quit [Remote host closed the connection]
<karolherbst>
any debugging tips when kmalloc crashes on weird addresses? Tried kasan, but my bug doesn't happen once it's enabled
vliaskov has quit [Remote host closed the connection]
<iive>
karolherbst, how about using a second crash kernel to capture a kdump of the crash?