AMD-vi: Fix IOMMU device interrupts being overridden

Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg:

"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."

After some tracing, the problem is due to the interaction between amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table, and ivhdX device_ts are added under the acpi bus, while there is no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI interrupts are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. The IOMMU PCI function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on the IOMMU PCI function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr, so msi_data and msi_addr stay at the value 0. When pci_driver_added() loops over the children of a PCI bus and does pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 are written to the MSI capability of the IOMMU PCI function, thus explaining the errors in dmesg.

This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupts. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated.

This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.

Sponsored by: The FreeBSD Foundation
Reviewed by: jhb
Approved by: philip (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28984

(cherry picked from commit 74ada297e8978b8efda3dffdd1bb24aee7c5faa4)

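The failure mode described in this message can be modelled in a few lines of C. This is a minimal sketch, not the FreeBSD code: `struct msi_cache`, `struct msi_regs` and the three `model_*` functions are hypothetical stand-ins for dinfo->cfg.msi, the MSI capability registers of the IOMMU PCI function, and the paths through pci_enable_msi(), bus_setup_intr() and pci_cfg_restore().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached MSI state kept in dinfo->cfg.msi. */
struct msi_cache {
	uint64_t msi_addr;
	uint16_t msi_data;
};

/* Hypothetical stand-in for the MSI capability registers in the device. */
struct msi_regs {
	uint64_t addr;
	uint16_t data;
};

/*
 * Models the old path: the registers are programmed, but the interrupt
 * was set up on another device_t, so the cache is left untouched.
 */
static void
model_enable_msi_only(struct msi_regs *hw, uint64_t addr, uint16_t data)
{
	assert(hw != NULL);
	hw->addr = addr;
	hw->data = data;
}

/*
 * Models the fixed path: interrupt setup on the PCI function itself
 * records the values in the cache as well as in the registers.
 */
static void
model_setup_intr(struct msi_cache *cache, struct msi_regs *hw,
    uint64_t addr, uint16_t data)
{
	assert(cache != NULL && hw != NULL);
	cache->msi_addr = addr;
	cache->msi_data = data;
	hw->addr = addr;
	hw->data = data;
}

/* Models a config restore: whatever is cached is written to the device. */
static void
model_cfg_restore(const struct msi_cache *cache, struct msi_regs *hw)
{
	assert(cache != NULL && hw != NULL);
	hw->addr = cache->msi_addr;
	hw->data = cache->msi_data;
}
```

In this model, a restore after `model_enable_msi_only()` wipes the registers back to 0, while a restore after `model_setup_intr()` leaves them intact, which is the behaviour the new driver is meant to guarantee.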
Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot to be taken of a guest's state that can later be resumed. In the current implementation, bhyve(8) creates a UNIX domain socket that is used by bhyvectl(8) to send a request to save a snapshot (and optionally exit after the snapshot has been taken). A snapshot currently consists of two files: the first holds a copy of guest RAM, and the second file holds other guest state such as vCPU register values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of command line arguments to instantiate the same set of device models as well as a pointer to the saved snapshot.

While the current implementation is useful for several use cases, it has a few limitations. The file format for saving the guest state is tied to the ABI of internal bhyve structures and is not self-describing (in that it does not communicate the set of device models present in the system). In addition, the state saved for some device models closely matches the internal data structures, which might prove a challenge for compatibility of snapshot files across a range of bhyve versions. The file format also does not currently support versioning of individual chunks of state. As a result, the current file format is not a fixed binary format, and future revisions to save and restore will break binary compatibility of snapshot files. The goal is to move to a more flexible format that adds versioning, etc. and at that point to commit to providing a reasonable level of compatibility. As a result, the current implementation is not enabled by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option for userland builds, and the kernel option BHYVE_SNAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495

More fixes to build the kernel with a compiler that defaults to -fno-common

Using the same approach as the last commit for the files used by genassym.sh.

Obtained from: CheriBSD

Remove more manual additions of -DSMP.

Since r357598 this should no longer be necessary.

All genassym.sh usage needs offset.inc
Fix cyclic dependency after r326552.

The OBJS_DEPEND_GUESS mechanism was making vmx_genassym.o depend on all headers along with vmx_assym.h, though vmx_assym.h depends on having vmx_genassym.o present to generate. Moving the headers to DPSRCS is enough to resolve the issue as they will no longer be implicit dependencies for all objects. Because of this we need explicit OBJS_DEPEND_GUESS entries to ensure the headers are generated when needed for the *_support.o files that need them.

X-MFC-With: r326552
MFC after: 2 weeks
Sponsored by: Dell EMC

Add AMD IOMMU/AMD-Vi support in bhyve for passthrough/direct assignment to VMs. To enable AMD-Vi, set hw.vmm.amdvi.enable=1.

Reviewed by: bcr
Approved by: grehan
Tested by: rgrimes
Differential Revision: https://reviews.freebsd.org/D10049

sys/modules: normalize .CURDIR-relative paths to SRCTOP

This simplifies make output/logic

Tested with: `cd sys/modules; make ALL_MODULES=` on amd64
MFC after: 1 month
Sponsored by: Dell EMC Isilon

Exclude -flto when building *genassym.o

The build process generates *assym.h using nm from *genassym.o (which is in turn created from *genassym.c).

When compiling with link-time optimization (LTO) using -flto, .o files are LLVM bitcode, not ELF objects. This is not usable by genassym.sh, so remove -flto from those ${CC} invocations.

Submitted by: George Rimar
Reviewed by: dim
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9659

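To see why a real ELF symbol table matters here, it helps to picture how a compile-time constant can be carried through an object file at all. The following is a simplified sketch of the general idea, assuming the constant is published as the size of a global array that a symbol-table tool can then read back. The names `DEMO_ASSYM`, `DEMO_BIAS`, `struct demo_ctx` and `demo_decode()` are hypothetical and are not the macros the kernel build uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure whose field offset an assembly file would need. */
struct demo_ctx {
	long guest_rax;
	long guest_rbx;
	long guest_rip;
};

/* The bias keeps every array non-empty, even when the constant is 0. */
#define DEMO_BIAS 0x100

/*
 * Publish a constant as the size of a global array.  No code has to run:
 * the value is visible as a symbol size in the compiled object file.
 */
#define DEMO_ASSYM(name, value) char name[(value) + DEMO_BIAS]

DEMO_ASSYM(DEMO_CTX_GUEST_RIP, offsetof(struct demo_ctx, guest_rip));

/* Recover the constant from a symbol size, as a post-processing script would. */
static size_t
demo_decode(size_t symbol_size)
{
	assert(symbol_size >= DEMO_BIAS);
	return (symbol_size - DEMO_BIAS);
}
```

Under that assumption, an object file that contains LLVM bitcode instead of ELF symbols gives the post-processing step nothing to read, which matches the motivation given in the message.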
Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when vmm.ko is loaded.

Also relocate the 'justreturn' IPI handler to be alongside all other handlers.

Requested by: kib

Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.

The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks

Retire the '@' symlink. It isn't really needed and causes more problems than it solves. SYSDIR is already defined almost always and can be used instead. Working around the one case where it isn't is much easier than working around the fact that @ may not exist in 18 other places.

Differential Revision: https://reviews.freebsd.org/D1100

Add foo_genassym.c files to DPSRCS so dependencies for them are generated.

This ensures these objects are rebuilt to generate an updated header of assembly constants if needed.

Move the ACPI PM timer emulation into vmm.ko.

This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally, having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime to keep track of time).

Discussed with: grehan

Fix build to not bogusly always rebuild vmm.ko.

Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use and update vmx_support.S's include to match. Add vmx_assym.h to the SRCS so that it gets properly added to the dependency list. Add vmx_support.S to SRCS as well, so it gets built and needs fewer special-case goo. Remove now-redundant special-case goo. Finally, vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS} explicitly, that's all taken care of by beforedepend.

With these items fixed, we no longer build vmm.ko every single time through the modules on a KERNFAST build.

Sponsored by: Netflix

Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by: grehan

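The three classes listed in this message can be pictured as a small lookup. This is only an illustrative sketch: the MSR numbers are the architectural ones for the three examples the message names, while `enum msr_policy`, `msr_classify()` and `msr_policy_name()` are hypothetical and do not come from the vmm.ko sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR numbers for the examples named in the commit message. */
#define MSR_IA32_MISC_ENABLE	0x000001a0u
#define MSR_GSBASE		0xc0000101u
#define MSR_KGSBASE		0xc0000102u

/* Hypothetical labels for the three handling classes, plus a catch-all. */
enum msr_policy {
	MSR_SWITCH_EVERY_TRANSITION,	/* saved/restored on every guest/host switch */
	MSR_SWITCH_AROUND_RUN,		/* loaded on entry to the run loop, saved on exit */
	MSR_TRAP_AND_EMULATE,		/* every guest access traps to the hypervisor */
	MSR_POLICY_UNKNOWN		/* not covered by this sketch */
};

/* Map an MSR number to the class the message assigns to it. */
static enum msr_policy
msr_classify(uint32_t msr)
{
	switch (msr) {
	case MSR_GSBASE:
		return (MSR_SWITCH_EVERY_TRANSITION);
	case MSR_KGSBASE:
		return (MSR_SWITCH_AROUND_RUN);
	case MSR_IA32_MISC_ENABLE:
		return (MSR_TRAP_AND_EMULATE);
	default:
		return (MSR_POLICY_UNKNOWN);
	}
}

/* Human-readable name for a class, e.g. for debug output. */
static const char *
msr_policy_name(enum msr_policy policy)
{
	static const char *const names[] = {
		"switch on every transition",
		"switch around run loop",
		"trap and emulate",
		"unknown"
	};

	assert((size_t)policy < sizeof(names) / sizeof(names[0]));
	return (names[policy]);
}
```

The second class is the cheap one: because the host kernel never reads those MSRs, the guest values can stay loaded for the whole run instead of being swapped on each transition.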
Move the atpit device model from userspace into vmm.ko for better precision and lower latency.

Approved by: grehan (co-mentor)

Replace the userspace atpic stub with a more functional vmm.ko model.

New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by: jhb, neel
Approved by: neel (co-mentor)

Add HPET device emulation to bhyve.

bhyve supports a single timer block with 8 timers. The timers are all 32-bit and capable of being operated in periodic mode. All timers support interrupt delivery using MSI. Timers 0 and 1 also support legacy interrupt routing.

At the moment the timers are not connected to any ioapic pins but that will be addressed in a subsequent commit.

This change is based on a patch from Tycho Nightingale ([email protected]).

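Because the timers are 32-bit, periodic operation relies on arithmetic that wraps modulo 2^32. The following is a minimal sketch of that arithmetic, not the bhyve implementation; `hpet_ticks_until()` and `hpet_next_comparator()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ticks remaining until a 32-bit counter reaches the comparator.
 * Unsigned subtraction wraps, so this stays correct when the
 * comparator has already wrapped past 0 and the counter has not.
 */
static uint32_t
hpet_ticks_until(uint32_t comparator, uint32_t counter)
{
	return ((uint32_t)(comparator - counter));
}

/*
 * In periodic mode the comparator advances by one period after each
 * expiry; the addition wraps modulo 2^32 just like the counter does.
 */
static uint32_t
hpet_next_comparator(uint32_t comparator, uint32_t period)
{
	assert(period != 0);
	return ((uint32_t)(comparator + period));
}
```

For example, a comparator at 0xfffffff0 with a period of 0x20 advances to 0x10, and a counter sitting at 0xfffffff0 is then 0x20 ticks away from it.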
Move the ioapic device model from userspace into vmm.ko. This is needed for upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to manipulate the ioapic pin state.

Discussed with: grehan@
Submitted by: Tycho Nightingale ([email protected])

Remove the 'vdev' abstraction that was meant to sit on top of device models in the kernel. This abstraction was redundant because the only device emulated inside vmm.ko is the local apic and it is always at a fixed guest physical address.

Discussed with: grehan

Add in last remaining files to get AMD-SVM operational.

Submitted by: Anish Gupta ([email protected])

Fix 'make depend'.
Corral all the host state associated with the virtual machine into its own file.

This state is independent of the type of hardware assist used so there is really no need for it to be in Intel-specific code.

Obtained from: NetApp

Add support for trapping MMIO writes to local apic registers and emulating them.

The default behavior is still to present the local apic to the guest in the x2apic mode.