Antonio Caggiano
November 26, 2021
With virtualization we can create multiple virtual machines on a single physical computer. The benefits are many, from creating virtual representations of different machines to making more efficient use of the available hardware. Like any real computer, a virtual machine needs an operating system (OS). In this context it is called the Guest OS, as opposed to the one running on real hardware, called the Host OS.
Running graphics applications in a Guest OS can be frustrating: they are generally hungry for computing resources, which can slow you down or degrade graphics performance. Offloading that workload to the host hardware can make a big difference. This is where the VirtIO-GPU virtual GPU device comes into play, allowing a Guest OS to send graphics commands to it through OpenGL or Vulkan. While we are already there with OpenGL, we cannot say the same for Vulkan. Well, until now.
Jump to a section: Overview | Definitions | Prerequisites | Create an image for QEMU | Running QEMU | Testing Venus | Troubleshooting | Conclusions
This blog post describes how to enable 3D acceleration of Vulkan applications in QEMU through the Venus experimental Vulkan driver for VirtIO-GPU with a local development environment.
As an alternative you could cherry-pick this commit which contains a set of scripts you could use to set up a Docker development environment.
Let us start with a brief description of the projects mentioned in this post:
The following snippets are prefixed by either (host) or (guest) to specify where they should run. Of course, in order to run something in the guest, you should have QEMU and an image already in place.
Venus requires BLOB resources support in QEMU, which in turn requires /dev/udmabuf. This is not enabled in the default Debian kernel, so make sure your kernel was built with CONFIG_UDMABUF.
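A quick way to check both requirements on the host is sketched below. It assumes your distribution exposes the running kernel's configuration at /boot/config-$(uname -r) or /proc/config.gz, which is common but not guaranteed. The function has_udmabuf_config is a helper introduced here for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU or the kernel): succeeds if the
# given kernel config file has udmabuf built in.
has_udmabuf_config() {
    grep -q '^CONFIG_UDMABUF=y$' "$1"
}

# Assumption: the running kernel's config is at one of these usual locations.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ] && has_udmabuf_config "$cfg"; then
    echo "CONFIG_UDMABUF is enabled in $cfg"
elif [ -r /proc/config.gz ] && zcat /proc/config.gz | grep -q '^CONFIG_UDMABUF=y$'; then
    echo "CONFIG_UDMABUF is enabled (from /proc/config.gz)"
else
    echo "CONFIG_UDMABUF not found; the kernel may need to be rebuilt"
fi

# The device node only exists when the driver is present in the running kernel.
if [ -c /dev/udmabuf ]; then
    echo "/dev/udmabuf is present"
else
    echo "/dev/udmabuf is missing"
fi
```

If the first check passes but the device node is missing, the running kernel is probably not the one you configured.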
Please note that you could encounter the following error with kvm on AMD when enabling BLOB support: error: kvm run failed Bad address.
Clone the virglrenderer res-sharing branch from FDO/fahien and compile it with:
(host)
git clone -b res-sharing https://gitlab.freedesktop.org/Fahien/virglrenderer.git
cd virglrenderer
meson build \
  -Dprefix=$HOME/.local \
  -Dplatforms=egl \
  -Dvenus-experimental=true \
  -Dminigbm_allocation=false \
  -Dbuildtype=debugoptimized
ninja -C build install
Clone the QEMU venus-dev branch from FDO/fahien. Then configure and compile it, enabling OpenGL, VirGL, and GTK (or SDL if you prefer this frontend):
(host)
git clone -b venus-dev https://gitlab.freedesktop.org/Fahien/qemu.git
cd qemu
mkdir build && cd build
../configure \
  --prefix=$HOME/.local \
  --target-list=x86_64-softmmu \
  --enable-kvm \
  --disable-werror \
  --enable-opengl \
  --enable-virglrenderer \
  --enable-gtk \
  --enable-sdl
make -j4 qemu-system-x86_64 && make install
Linux kernel v5.16-rc1+.
Mesa version 21.1+, configured with meson -Dvulkan-drivers=virtio-experimental.
Install vulkan-utils and run vulkaninfo | grep driver to get some info on the available Vulkan drivers.
Test vkcube.
You will need to provide QEMU an image. Here is an example of how to make one.
(host)
ISO=ubuntu-21.04-desktop-amd64.iso
wget https://releases.ubuntu.com/21.04/$ISO
IMG=ubuntu.qcow2
qemu-img create -f qcow2 $IMG 16G
# Start ubuntu installation by booting from CD-ROM.
# No need for graphics acceleration at the moment.
qemu-system-x86_64 \
  -enable-kvm \
  -M q35 \
  -smp 1 \
  -m 4G \
  -net nic,model=virtio \
  -net user,hostfwd=tcp::2222-:22 \
  -hda $IMG \
  -display gtk \
  -boot d -cdrom $ISO
Running with -d guest_errors will show error messages from the guest.
(host)
qemu-system-x86_64 \
  -enable-kvm \
  -M q35 \
  -smp 1 \
  -m 4G \
  -cpu host \
  -net nic,model=virtio \
  -net user,hostfwd=tcp::2222-:22 \
  -hda $IMG \
  -device virtio-vga-gl,context_init=true,blob=true,hostmem=4G \
  -vga none \
  -initrd /image/rootfs.cpio.gz \
  -kernel /kernel/arch/x86_64/boot/bzImage \
  -append "root=/dev/sda3 nokaslr" \
  -display gtk,gl=on,show-cursor=on \
  -usb -device usb-tablet \
  -object memory-backend-memfd,id=mem1,size=4G \
  -machine memory-backend=mem1 \
  -d guest_errors
VirtIO VGA GL (-device virtio-vga-gl) requires OpenGL support in the current QEMU display, which can be enabled with the command-line option -display gtk,gl=on.
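Before launching the guest, it can save time to confirm that your QEMU build actually provides the device. The sketch below relies on -device help, which prints the devices a QEMU binary supports. The helper supports_device and the QEMU variable are names introduced here for illustration, and the default path assumes the $HOME/.local prefix used earlier.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `-device help` text lists the device.
# $1: device name, $2: output of `qemu-system-x86_64 -device help`
supports_device() {
    printf '%s\n' "$2" | grep -q "name \"$1\""
}

# Assumption: QEMU was installed under $HOME/.local as in the build step.
QEMU="${QEMU:-$HOME/.local/bin/qemu-system-x86_64}"
if [ -x "$QEMU" ]; then
    devices=$("$QEMU" -device help 2>&1)
    if supports_device virtio-vga-gl "$devices"; then
        echo "virtio-vga-gl is available"
    else
        echo "virtio-vga-gl is missing: check --enable-opengl and --enable-virglrenderer"
    fi
else
    echo "QEMU binary not found at $QEMU"
fi
```

Matching on the quoted name avoids a false positive from the plain virtio-vga device.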
For some reason, we hit a GTK assertion due to a failure in gtk_widget_get_realized(). The solution is to run QEMU with -vga none to avoid having two scanouts, one for VGA and another for virtio-vga-gl.
I made a custom config (x86_64.config) to build VirtIO-GPU and DRM within the kernel with debug info.
(host)
git clone --depth 1 -b v5.16-rc1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git kernel
cd kernel
./scripts/kconfig/merge_config.sh arch/x86/configs/x86_64_defconfig x86_64.config
make -j12 vmlinux bzImage
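The contents of x86_64.config are not reproduced in this post. As a rough sketch of what such a fragment could contain, the options below build DRM and the VirtIO-GPU driver into the kernel and enable debug info plus the GDB helper scripts. The list and the file name venus-debug.config are assumptions made for this example, not the author's file, and your setup may need more options.

```shell
#!/bin/sh
# Write a hypothetical config fragment that merge_config.sh can consume.
# Assumption: these options are a plausible minimum, not the original x86_64.config.
cat > venus-debug.config <<'EOF'
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI=y
CONFIG_DRM=y
CONFIG_DRM_VIRTIO_GPU=y
CONFIG_DEBUG_INFO=y
CONFIG_GDB_SCRIPTS=y
EOF
echo "Wrote venus-debug.config"
```

A fragment like this would be passed to merge_config.sh in place of x86_64.config in the command above.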
Starting QEMU with our custom kernel can be done by adding the following command-line options:
-kernel arch/x86_64/boot/bzImage \
-initrd ramdisk.img \
-append "root=/dev/sda3" \
You can create ramdisk.img by running mkinitramfs -o ramdisk.img
Make sure VirGL is correctly detected and used by running the following:
(guest) glxinfo -B
If it outputs llvmpipe instead, build mesa with this configuration:
(guest)
git clone -b qemu-venus https://gitlab.freedesktop.org/Fahien/mesa.git
cd mesa
meson build \
  -Dprefix=/usr \
  -Ddri3=enabled \
  -Dglx=dri \
  -Degl=enabled \
  -Dgbm=enabled \
  -Dgallium-vdpau=disabled \
  -Dgallium-va=disabled \
  -Dvalgrind=disabled \
  -Dbuildtype=debugoptimized \
  -Ddri-drivers=[] \
  -Dgallium-drivers=swrast,virgl \
  -Dvulkan-drivers=swrast,virtio-experimental \
  -Dvulkan-layers=device-select
ninja -C build install
Then compile and run vkcube to test Venus, making sure to tell mesa the correct Vulkan ICD file name:
(guest)
sudo apt install meson build-essential libdrm-dev libgbm-dev libpng-dev libwayland-dev libxcb1-dev libvulkan-dev
git clone https://github.com/krh/vkcube.git
cd vkcube
meson build && meson compile -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/virtio_icd.x86_64.json build/vkcube
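If vkcube falls back to another driver or fails to start, it may help to confirm that the ICD manifest exists before pointing VK_ICD_FILENAMES at it. The following sketch assumes Mesa was installed with -Dprefix=/usr as above, so manifests land in /usr/share/vulkan/icd.d. The helper find_venus_icd is introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the first Venus ICD manifest found in the
# given directory, and fails if there is none.
find_venus_icd() {
    for f in "$1"/virtio_icd.*.json; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Assumption: default manifest directory for a /usr install prefix.
if icd=$(find_venus_icd /usr/share/vulkan/icd.d); then
    echo "Using $icd"
    VK_ICD_FILENAMES="$icd" vulkaninfo | grep driver \
        || echo "vulkaninfo did not report a driver"
else
    echo "No virtio ICD manifest found; check the Mesa build options"
fi
```

The glob also covers manifests for architectures other than x86_64.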
At this point the venus driver should be correctly loaded, and debug messages can be enabled by setting the VN_DEBUG environment variable to one of the following: init, result, vtest, or wsi.
To debug the guest kernel, start QEMU with -s -S:
-S stops QEMU at startup, waiting for gdb
-s starts a gdb server at localhost:1234
Your ~/.gdbinit should contain this (make sure it does not point directly to scripts/gdb/vmlinux-gdb.py):
add-auto-load-safe-path /path/to/linux/vmlinux-gdb.py
Run gdb and attach to QEMU gdb server:
(host)
gdb vmlinux
(gdb) target remote :1234
(gdb) hbreak start_kernel
(gdb) c
You can use VSCode Debug UI thanks to the Native Debug extension, attaching to gdbserver target :1234, and autorun: [ "hbreak kernel_init" ].
Sometimes your only choice is to debug with GDB, therefore here are some useful commands to keep in mind.
Command | Description |
---|---|
bt | Print backtrace |
f | Print the current frame |
f <n> | Change to frame number <n> |
list | Print out a bunch of lines of code around the current instruction pointer |
b <func_name> | Set a breakpoint |
stepi | Step one machine instruction, entering function calls |
n | Step over to the next source line within the current function |
fin | Step out of a function |
delete | Delete all breakpoints |
delete <n> | Delete breakpoint number <n> |
info sharedlibrary | Show the list of loaded shared libraries |
If your libvulkan.so fails with a segmentation fault, it would be a good idea to build it from source and debug it with gdb. Make sure to check out a version in line with your libvulkan-dev.
(guest)
git clone https://github.com/KhronosGroup/Vulkan-Loader.git vulkan-loader
cd vulkan-loader
git checkout v1.2.162
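To pick a matching tag, you can derive it from the installed package version. This is a sketch under two assumptions: the guest uses dpkg (as Ubuntu does), and the package version starts with the Vulkan version followed by a packaging suffix, for example 1.2.162.0-1. The helper loader_tag is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: turns a package version such as "1.2.162.0-1"
# into a tag of the form "v1.2.162" (major.minor.patch only).
loader_tag() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+).*/v\1/'
}

# dpkg-query prints the installed version of libvulkan-dev, if any.
if ver=$(dpkg-query -W -f='${Version}' libvulkan-dev 2>/dev/null) && [ -n "$ver" ]; then
    echo "libvulkan-dev $ver -> try: git checkout $(loader_tag "$ver")"
else
    echo "libvulkan-dev does not appear to be installed"
fi
```

Versions with an epoch prefix or an unusual scheme are not handled, so check the result against the output of git tag before relying on it.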
While you can enable Vulkan hardware acceleration by checking out the development branches following this guide, there is still further work to do on Virglrenderer and QEMU for proper upstreaming, which might need some time to complete. Virglrenderer needs to fix resource import/export between OpenGL and Vulkan contexts, and QEMU needs various patches currently under review:
To sum up, if you need assistance with graphics virtualization, we would be happy to help, so please do not hesitate to contact us.
On the other hand, if you are a developer and would be thrilled to work on the open source Linux graphics stack, check out our careers page!
Comments (25)
Trigger Huang:
Dec 15, 2021 at 11:46 AM
Hello,
>>Please note that you could encounter the following error with kvm on AMD when enabling BLOB support:
Unfortunately, I saw this error on a KVM + Intel CPU + AMDGPU platform. Could you help?
I saw an error 'error: kvm run failed Bad address' when running vkcube in the guest VM after strictly following all the steps.
However, vulkaninfo on the guest shows that my environment is good
Virtio-GPU Venus (AMD RADV NAVI14 (ACO))
My system:
CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
MemTotal: 16288232 kB
GPU on Host: AMD Radeon Pro W5500
Antonio Caggiano:
Dec 16, 2021 at 03:47 PM
Hi, we are aware of this issue, unfortunately it is a configuration we have not tested with and somebody would need to debug it. I can not give you an estimate as the change list is still under the review process.
Trigger Huang:
Dec 17, 2021 at 11:57 AM
Hi Antonio,
Thank you for the quick response.
This article is still helpful for me as I managed to set up the VirGL rendering for OpenGL in guest VM without any extra steps. :) :) :)
BTW, could you share the recommended system configuration for the current Venus?
Antonio Caggiano:
Dec 17, 2021 at 03:13 PM
Awesome!
My testing machine has an Intel x86_64 processor with integrated GPU.
Janboe Ye:
Jun 04, 2022 at 04:40 AM
Do you have chance to test on nvidia dGPU? it reports 'Virgl blob create error: Unknown error -22' on my 1070 GPU
Thanks
Antonio Caggiano:
Jun 14, 2022 at 10:20 AM
Hi Janboe Ye,
Unfortunately I do not have a Nvidia dGPU at the moment. I would try with debugging QEMU and virglrenderer, stepping into virglrenderer.c:virgl_renderer_resource_create_blob() to see exactly what is going on.
Cheers!
Mitchel Stewart:
Dec 19, 2021 at 04:11 AM
Great work, can't wait to see this get upstreamed, lots of cool projects can be done with this
Trigger Huang:
Jan 13, 2022 at 05:53 AM
Happy new year:)
This week I got a chance to debug this issue on AMD dGPU.
This issue happened in the following scenario:
1, Mesa Vulkan in the guest VM requests to create and map a blob resource (the GPA is allocated from BAR4 of the Virtio GPU PCI device)
2, Host QEMU creates an AMDGPU BO and exports it with vkGetMemoryFdKHR (RADV driver) in virgl_renderer_resource_create_blob() to get the FD
3, Host QEMU calls mmap on this FD to get the HVA of this BO in virgl_renderer_resource_map()
4, With the HVA and GPA, host QEMU calls kvm_set_user_memory_region() to insert this guest memory region into KVM
5, The AMDGPU TTM driver allocates the host pages of this BO when a page fault happens
6, kvm_mmu_page_fault() in ept_violation() is called to set up the EPT page table for this guest memory region. For the first page of this region, the EPT is set up correctly and the guest can access it through the GVA. But kvm_mmu_page_fault() fails on the second page, so QEMU reports the 'kvm run failed Bad address' error because the EPT page table was not set up successfully.
The root cause:
kvm_try_get_pfn() in kvm_mmu_page_fault() fails because the second page has a refcount of zero.
After checking the TTM driver: alloc_pages() is called to allocate host pages, and it only calls set_page_refcounted(page) for the first page.
Fix:
I have a workaround patch for anyone who wants to quickly try Venus on an AMD dGPU.
drm/amdgpu: increase ref count for pages from TTM

Signed-off-by: Trigger Huang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c875f1cdd..6d7664a1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1143,8 +1143,10 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
 	if (ret)
 		return ret;

-	for (i = 0; i < ttm->num_pages; ++i)
+	for (i = 0; i < ttm->num_pages; ++i) {
 		ttm->pages[i]->mapping = bdev->dev_mapping;
+		page_ref_inc(ttm->pages[i]);
+	}

 	return 0;
 }
@@ -1174,8 +1176,10 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
 	if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL)
 		return;

-	for (i = 0; i < ttm->num_pages; ++i)
+	for (i = 0; i < ttm->num_pages; ++i) {
 		ttm->pages[i]->mapping = NULL;
+		page_ref_dec(ttm->pages[i]);
+	}

 	adev = amdgpu_ttm_adev(bdev);
 	return ttm_pool_free(&adev->mman.bdev.pool, ttm);
According to patch f8be156be163a052a067306417cd0ff679068c97 in kernel KVM, which addresses CVE-2021-22543, KVM does not allow mapping valid but non-reference-counted pages.
So set_page_refcounted() should be called for each of the pages returned by alloc_pages() in TTM.
Maybe we need talk with people
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 72c4e6b39..043c97b3c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2382,8 +2382,10 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * would then underflow the refcount when the caller does the
 	 * required put_page. Don't allow those pages here.
 	 */
-	if (!kvm_try_get_pfn(pfn))
-		r = -EFAULT;
+	if (!kvm_try_get_pfn(pfn)) {
+		//r = -EFAULT;
+		printk("Not force EFAULT: %s %d r = %d, pfn = 0x%016llx\n", __FUNCTION__, __LINE__, r, pfn);
+	}
Alex:
Apr 03, 2022 at 09:05 AM
Hi Trigger Huang,
Is this the fix for the note above?
"Please note that you could encounter the following error with kvm on AMD when enabling BLOB support: error: kvm run failed Bad address."
I'm on a Ubuntu host with HWE kernel 5.13.0-39-generic amdgpu Vega dGPU and experiencing this after a few moments in both a Fedora 36 guest (kernel 5.17 + mesa 21.x) and Ubuntu 22.04 guest (kernel 5.15 + mesa 22.0).
Cheers!
Trigger Huang:
May 30, 2022 at 02:41 AM
Hi Alex,
Sorry for the late response as I didn't check this thread recently.
Yes, the workaround patch for the host AMDGPU driver (increasing the ref count for pages from TTM) should fix the issue "error: kvm run failed Bad address" seen when enabling BLOB support. This workaround has nothing to do with the ASIC family (Vega or Navi).
Thanks,
Trigger
Mitchel Stewart:
Jun 18, 2022 at 11:58 AM
has this been reported upstream? it would be nice if they were made aware.
Fafa Kitten:
Jun 26, 2022 at 11:26 AM
Thank you for making this patch!! I had the error: kvm run failed Bad address and using a kernel I compiled with this patch made it go away and now Venus device appears in my guest! AMD Radeon RX 5700 XT
Lorna McNeill:
Jan 17, 2022 at 03:04 PM
Hi Antonio, can you explain why you used -vga none with -device virtio-vga-gl?
Antonio Caggiano:
Jan 17, 2022 at 06:27 PM
IIRC, this was needed due to a limitation with supporting multiple QEMU scanouts. Without specifying -vga none, I believe QEMU would create two scanouts, "scanout-0" for the standard display device (-vga std) and "scanout-1" for the VirtIO-based display with virgl support (-device virtio-vga-gl). While working on this, I noticed the latter would only work on scanout-0. The quickest workaround for me was to just disable the standard display device.
Trigger Huang:
Jan 18, 2022 at 10:01 AM
Hi Antonio,
I can't single-step the guest kernel after following your instructions. Could you help?
Each time I enter the gdb command 'n' or 's', I always get interrupted by '__sysvec_apic_timer_interrupt'
Thread 1 hit Breakpoint 1, virtio_gpu_execbuffer_ioctl (dev=0xffff888100ae1000, data=0xffffc90000aa7e50, file=0xffff88810201c000) at drivers/gpu/drm/virtio/virtgpu_ioctl.c:121
121 struct virtio_gpu_device *vgdev = dev->dev_private;
(gdb) n
__sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
1102 trace_local_timer_entry(LOCAL_TIMER_VECTOR);
(gdb) c
Continuing.
Thread 1 hit Breakpoint 1, virtio_gpu_execbuffer_ioctl (dev=0xffff888100ae1000, data=0xffffc90000aa7e50, file=0xffff88810201c000) at drivers/gpu/drm/virtio/virtgpu_ioctl.c:121
121 struct virtio_gpu_device *vgdev = dev->dev_private;
(gdb) n
__sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
1102 trace_local_timer_entry(LOCAL_TIMER_VECTOR);
(gdb) bt
#0 __sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
#1 0xffffffff81c11a89 in sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1097
Backtrace stopped: Cannot access memory at address 0xffffc90000004008
Antonio Caggiano:
Jan 18, 2022 at 03:12 PM
Hi Trigger, I would try with enabling this option:
make menuconfig
> Processor type and features >
[*] Support x2apic
Trigger Huang:
Jan 19, 2022 at 03:34 AM
Hi Antonio,
Unfortunately, I still saw this issue after enable x2apic on both host & guest kernel. :)
CONFIG_X86_X2APIC=y
The single step can only work well inside function __sysvec_apic_timer_interrupt()
Antonio Caggiano:
Jan 19, 2022 at 04:59 PM
Another thing you can try is this:
https://stackoverflow.com/questions/64961631/how-to-skip-timer-interrupt-while-debugging-linux
If this does not work either, I am afraid your best options would be to just set breakpoints on the lines you want and hit C.
Trigger Huang:
Jan 20, 2022 at 06:11 AM
Hi Antonio,
Thanks for pointing out this link. This method helped me a lot, and now GDB for guest kernel worked much better than before.
Reggie:
Jan 28, 2022 at 06:01 PM
how to get this driver to work in chrome os crostini with the default debian container?
Reggie:
Jun 06, 2022 at 10:10 PM
Here is an example of this:
https://gist.github.com/Usulyre/bb33f77b225b8d9336c1f9e744114fba
Mitchel Stewart:
Jun 11, 2022 at 06:21 PM
Is this still being worked on? it would be something very nice to see working in qemu. and if so is there anywhere we can follow the progress of this and test it as it gets worked on?
DocMAX:
Feb 13, 2023 at 12:28 PM
Can somebody create a PKGBUILD for Arch Linux? I get compile errors with the qemu source mentioned here. Thanks.
WB:
Mar 05, 2023 at 01:55 AM
Hi Antonio,
that is really cool.
What is status of upstreaming qemu changes?
I checked qemu-devel pages, and see only v2 patch series, but nothing after that. Also the Fab
Daniel Stone:
Mar 06, 2023 at 02:01 PM
We've been coming back to this and expect to be able to post our changes up in the next month or two.