Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12

# 38f9e4b9 | 12-Nov-2024 | Kalesh Singh <[email protected]>
arm64: kvm: Introduce nvhe stack size constants
Refactor nvhe stack code to use NVHE_STACK_SIZE/SHIFT constants, instead of directly using PAGE_SIZE/SHIFT. This makes the code a bit easier to read, without introducing any functional changes.
Cc: Marc Zyngier <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Kalesh Singh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
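
For illustration, a minimal sketch of what such a refactor introduces (the constant names come from the commit title; their exact definition and placement are assumed):

    /* Sketch: the nVHE stack is still one page, but now says so by name. */
    #define NVHE_STACK_SHIFT    PAGE_SHIFT
    #define NVHE_STACK_SIZE     (UL(1) << NVHE_STACK_SHIFT)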

Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3

# afe789b7 | 09-Oct-2024 | John Hubbard <[email protected]>
kaslr: rename physmem_end and PHYSMEM_END to direct_map_physmem_end
For clarity. It's increasingly hard to reason about the code when KASLR is moving the boundaries around. In this case, where KASLR is randomizing the location of the kernel image within physical memory, the maximum number of address bits for physical memory has not changed.
What has changed is the ending address of memory that is allowed to be directly mapped by the kernel.
Let's name the variable, and the associated macro accordingly.
Also, enhance the comment above the direct_map_physmem_end definition, to further clarify how this all works.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Mike Rapoport (Microsoft) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Jordan Niethe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

# c5c3238d | 23-Oct-2024 | Christoph Hellwig <[email protected]>
asm-generic: provide generic page_to_phys and phys_to_page implementations
page_to_phys is duplicated by all architectures, and for some strange reason placed in <asm/io.h> where it doesn't fit at all.
phys_to_page is only provided by a few architectures despite having a lot of open coded users.
Provide generic versions in <asm-generic/memory_model.h> to make these helpers more easily usable.
Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid check. It will be added back in a generic version later.
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
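
A plausible shape for the generic helpers, sketched from the description above (the #ifndef guards and exact macro bodies are assumptions, not the verbatim patch):

    /* Sketch: generic conversions built on the existing pfn helpers. */
    #ifndef page_to_phys
    #define page_to_phys(page)  __pfn_to_phys(page_to_pfn(page))
    #endif

    #ifndef phys_to_page
    #define phys_to_page(phys)  pfn_to_page(__phys_to_pfn(phys))
    #endif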

Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7

# eeb8fdfc | 03-Sep-2024 | D Scott Phillips <[email protected]>
arm64: Expose the end of the linear map in PHYSMEM_END
The memory hot-plug and resource management code needs to know the largest address which can fit in the linear map, so set PHYSMEM_END for that purpose.
This fixes a crash at boot when amdgpu tries to create DEVICE_PRIVATE_MEMORY and is given a physical address by the resource management code which is outside the range which can have a `struct page`:
| Unable to handle kernel paging request at virtual address 000001ffa6000034
| user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000
| [000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000
| Call trace:
|  __init_zone_device_page.constprop.0+0x2c/0xa8
|  memmap_init_zone_device+0xf0/0x210
|  pagemap_range+0x1e0/0x410
|  memremap_pages+0x18c/0x2e0
|  devm_memremap_pages+0x30/0x90
|  kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu]
|  amdgpu_device_ip_init+0x674/0x888 [amdgpu]
|  amdgpu_device_init+0x7a4/0xea0 [amdgpu]
|  amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
|  amdgpu_pci_probe+0x1a0/0x560 [amdgpu]
Signed-off-by: D Scott Phillips <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
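
Since the arm64 linear map ends at PAGE_END, the new definition is presumably a one-liner along these lines (a sketch, not the verbatim patch):

    /* Sketch: the largest physical address coverable by the linear map. */
    #define PHYSMEM_END    __pa(PAGE_END - 1)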

Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6

# c034ec84 | 24-Feb-2024 | Ankit Agrawal <[email protected]>
KVM: arm64: Introduce new flag for non-cacheable IO memory
Currently, KVM for ARM64 maps at stage 2 memory that is considered device (i.e. it is not RAM) with DEVICE_nGnRE memory attributes; this setting overrides (as per the ARM architecture [1]) any device MMIO mapping present at stage 1, resulting in a set-up whereby a guest operating system cannot determine device MMIO mapping memory attributes on its own but it is always overridden by the KVM stage 2 default.
This set-up does not allow guest operating systems to select device memory attributes independently from KVM stage-2 mappings (refer to [1], "Combining stage 1 and stage 2 memory type attributes"), which turns out to be an issue in that guest operating systems (e.g. Linux) may request to map devices MMIO regions with memory attributes that guarantee better performance (e.g. gathering attribute - that for some devices can generate larger PCIe memory writes TLPs) and specific operations (e.g. unaligned transactions) such as the NormalNC memory type.
The default device stage 2 mapping was chosen in KVM for ARM64 since it was considered safer (i.e. it would not allow guests to trigger uncontained failures ultimately crashing the machine), but the aborts it can generate turned out to be asynchronous (SError) anyway, defeating the purpose.
Failure containability is a property of the platform and is independent of the memory type used for MMIO device memory mappings.
Actually, DEVICE_nGnRE memory type is even more problematic than Normal-NC memory type in terms of faults containability in that e.g. aborts triggered on DEVICE_nGnRE loads cannot be made, architecturally, synchronous (i.e. that would imply that the processor should issue at most 1 load transaction at a time - it cannot pipeline them - otherwise the synchronous abort semantics would break the no-speculation attribute attached to DEVICE_XXX memory).
This means that regardless of the combined stage1+stage2 mappings a platform is safe if and only if device transactions cannot trigger uncontained failures and that in turn relies on platform capabilities and the device type being assigned (i.e. PCIe AER/DPC error containment and RAS architecture[3]); therefore the default KVM device stage 2 memory attributes play no role in making device assignment safer for a given platform (if the platform design adheres to design guidelines outlined in [3]) and therefore can be relaxed.
For all these reasons, relax the KVM stage 2 device memory attributes from DEVICE_nGnRE to Normal-NC.
The NormalNC was chosen over a different Normal memory type default at stage-2 (e.g. Normal Write-through) to avoid cache allocation/snooping.
Relaxing S2 KVM device MMIO mappings to Normal-NC is not expected to trigger any issue on guest device reclaim use cases either (i.e. device MMIO unmap followed by a device reset) at least for PCIe devices, in that in PCIe a device reset is architected and carried out through PCI config space transactions that are naturally ordered with respect to MMIO transactions according to the PCI ordering rules.
Having Normal-NC S2 default puts guests in control (thanks to stage1+stage2 combined memory attributes rules [1]) of device MMIO regions memory mappings, according to the rules described in [1] and summarized here ([(S1) - stage1], [(S2) - stage 2]):
S1           | S2        | Result
NORMAL-WB    | NORMAL-NC | NORMAL-NC
NORMAL-WT    | NORMAL-NC | NORMAL-NC
NORMAL-NC    | NORMAL-NC | NORMAL-NC
DEVICE<attr> | NORMAL-NC | DEVICE<attr>
It is worth noting that currently, to map devices MMIO space to user space in a device pass-through use case the VFIO framework applies memory attributes derived from pgprot_noncached() settings applied to VMAs, which result in device-nGnRnE memory attributes for the stage-1 VMM mappings.
This means that a userspace mapping for device MMIO space carried out with the current VFIO framework and a guest OS mapping for the same MMIO space may result in a mismatched alias as described in [2].
Defaulting KVM device stage-2 mappings to Normal-NC attributes does not change anything in this respect, in that the mismatched aliases would only affect (refer to [2] for a detailed explanation) ordering between the userspace and GuestOS mappings resulting stream of transactions (i.e. it does not cause loss of property for either stream of transactions on its own), which is harmless given that the userspace and GuestOS access to the device is carried out through independent transactions streams.
A Normal-NC flag is not present today. So add a new kvm_pgtable_prot (KVM_PGTABLE_PROT_NORMAL_NC) flag for it, along with its corresponding PTE value 0x5 (0b101) determined from [1].
Lastly, adapt the stage2 PTE property setter function (stage2_set_prot_attr) to handle the NormalNC attribute.
The entire discussion leading to this patch series may be followed through the following links. Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/r/[email protected]
[1] section D8.5.5 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
[2] section B2.8 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
[3] sections 1.7.7.3/1.8.5.2/appendix C - DEN0029H_SBSA_7.1.pdf
Suggested-by: Jason Gunthorpe <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Signed-off-by: Ankit Agrawal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
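
A condensed sketch of the two pieces the last paragraphs describe - the new prot flag and its handling in stage2_set_prot_attr() (flag bit positions and the helper macros here are assumptions; only KVM_PGTABLE_PROT_NORMAL_NC, stage2_set_prot_attr and the 0b101 MemAttr encoding are named by the commit):

    /* Sketch: new stage-2 prot flag and its effect on the PTE MemAttr field. */
    enum kvm_pgtable_prot {
        KVM_PGTABLE_PROT_DEVICE     = BIT(3),
        KVM_PGTABLE_PROT_NORMAL_NC  = BIT(4),  /* MemAttr 0x5 (0b101), per [1] */
        /* ... */
    };

    static int stage2_set_prot_attr(struct kvm_pgtable *pgt,
                                    enum kvm_pgtable_prot prot, u64 *ptep)
    {
        bool device = prot & KVM_PGTABLE_PROT_DEVICE;
        bool normal_nc = prot & KVM_PGTABLE_PROT_NORMAL_NC;

        if (device && normal_nc)
            return -EINVAL;    /* the two are mutually exclusive */

        if (device)
            *ptep |= KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
        else if (normal_nc)
            *ptep |= KVM_S2_MEMATTR(pgt, NORMAL_NC);
        else
            *ptep |= KVM_S2_MEMATTR(pgt, NORMAL);

        /* ... access permission bits elided ... */
        return 0;
    }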

Revision tags: v6.8-rc5

# 9684ec18 | 14-Feb-2024 | Ard Biesheuvel <[email protected]>
arm64: Enable LPA2 at boot if supported by the system
Update the early kernel mapping code to take 52-bit virtual addressing into account based on the LPA2 feature. This is a bit more involved than LVA (which is supported with 64k pages only), given that some page table descriptor bits change meaning in this case.
To keep the handling in asm to a minimum, the initial ID map is still created with 48-bit virtual addressing, which implies that the kernel image must be loaded into 48-bit addressable physical memory. This is currently required by the boot protocol, even though we happen to support placement outside of that for LVA/64k based configurations.
Enabling LPA2 involves more than setting TCR.T1SZ to a lower value; there is also a DS bit in TCR that needs to be set, which changes the meaning of bits [9:8] in all page table descriptors. Since we cannot enable DS and update every live page table descriptor at the same time, let's pivot through another temporary mapping. This avoids the need to reintroduce manipulations of the page tables with the MMU and caches disabled.
To permit the LPA2 feature to be overridden on the kernel command line, which may be necessary to work around silicon errata, or to deal with mismatched features on heterogeneous SoC designs, test for CPU feature overrides first, and only then enable LPA2.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>

# 9cce9c6c | 14-Feb-2024 | Ard Biesheuvel <[email protected]>
arm64: mm: Handle LVA support as a CPU feature
Currently, we detect CPU support for 52-bit virtual addressing (LVA) extremely early, before creating the kernel page tables or enabling the MMU. We cannot override the feature this early, and so large virtual addressing is always enabled on CPUs that implement support for it if the software support for it was enabled at build time. It also means we rely on non-trivial code in asm to deal with this feature.
Given that both the ID map and the TTBR1 mapping of the kernel image are guaranteed to be 48-bit addressable, it is not actually necessary to enable support this early, and instead, we can model it as a CPU feature. That way, we can rely on code patching to get the correct TCR.T1SZ values programmed on secondary boot and resume from suspend.
On the primary boot path, we simply enable the MMU with 48-bit virtual addressing initially, and update TCR.T1SZ if LVA is supported from C code, right before creating the kernel mapping. Given that TTBR1 still points to reserved_pg_dir at this point, updating TCR.T1SZ should be safe without the need for explicit TLB maintenance.
Since this gets rid of all accesses to the vabits_actual variable from asm code that occurred before TCR.T1SZ had been programmed, we no longer have a need for this variable, and we can replace it with a C expression that produces the correct value directly, based on the value of TCR.T1SZ.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
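
The "C expression based on the value of TCR.T1SZ" could be shaped roughly like this (a sketch; the real kernel code may express it as a macro rather than an inline):

    /* Sketch: the live VA size falls directly out of the programmed T1SZ. */
    static inline u64 vabits_actual(void)
    {
        u64 t1sz = (read_sysreg(tcr_el1) >> TCR_T1SZ_OFFSET) & 0x3f;

        return 64 - t1sz;    /* VA bits = 64 - TCR_EL1.T1SZ */
    }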

Revision tags: v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6

# 32697ff3 | 13-Dec-2023 | Ard Biesheuvel <[email protected]>
arm64: vmemmap: Avoid base2 order of struct page size to dimension region
The placement and size of the vmemmap region in the kernel virtual address space is currently derived from the base2 order of the size of a struct page. This makes for nicely aligned constants with lots of leading 0xf and trailing 0x0 digits, but given that the actual struct pages are indexed as an ordinary array, this resulting region is severely overdimensioned when the size of a struct page is just over a power of 2.
This doesn't matter today, but once we enable 52-bit virtual addressing for 4k pages configurations, the vmemmap region may take up almost half of the upper VA region with the current struct page upper bound at 64 bytes. And once we enable KMSAN or other features that push the size of a struct page over 64 bytes, we will run out of VMALLOC space entirely.
So instead, let's derive the region size from the actual size of a struct page, and place the entire region 1 GB from the top of the VA space, where it still doesn't share any lower level translation table entries with the fixmap.
Acked-by: Mark Rutland <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
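
Put as arithmetic: the region needs one struct page per linear-map page, so the sizing plausibly becomes (sketch; macro names assumed):

    /* Sketch: dimension vmemmap from the real struct page size, not its base2 order. */
    #define VMEMMAP_RANGE   (PAGE_END - PAGE_OFFSET)
    #define VMEMMAP_SIZE    ((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))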

# b730b0f2 | 13-Dec-2023 | Ard Biesheuvel <[email protected]>
arm64: mm: Move fixmap region above vmemmap region
Move the fixmap region above the vmemmap region, so that the start of the vmemmap delineates the end of the region available for vmalloc and vmap allocations and the randomized placement of the kernel and modules.
In a subsequent patch, we will take advantage of this to reclaim most of the vmemmap area when running a 52-bit VA capable build with 52-bit virtual addressing disabled at runtime.
Note that the existing guard region of 256 MiB covers the fixmap and PCI I/O regions as well, so we can reduce it to 8 MiB, which is what we use in other places too.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Mark Rutland <[email protected]>

# 031e011d | 13-Dec-2023 | Ard Biesheuvel <[email protected]>
arm64: mm: Move PCI I/O emulation region above the vmemmap region
Move the PCI I/O region above the vmemmap region in the kernel's VA space. This will permit us to reclaim the lower part of the vmemmap region for vmalloc/vmap allocations when running a 52-bit VA capable build on a 48-bit VA capable system.
Also, given that PCI_IO_START is derived from VMEMMAP_END, use that symbolic constant directly in ptdump rather than deriving it from VMEMMAP_START and VMEMMAP_SIZE, as those definitions will change in subsequent patches.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Mark Rutland <[email protected]>

# 27232ba9 | 21-Dec-2023 | Andrey Konovalov <[email protected]>
kasan/arm64: improve comments for KASAN_SHADOW_START/END
Patch series "kasan: assorted clean-ups".
Code clean-ups, nothing worthy of being backported to stable.
This patch (of 11):
Unify and improve the comments for KASAN_SHADOW_START/END definitions from include/asm/kasan.h and include/asm/memory.h.
Also put both definitions together in include/asm/memory.h.
Also clarify the related BUILD_BUG_ON checks in mm/kasan_init.c.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/140108ca0b164648c395a41fbeecb0601b1ae9e1.1703188911.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

# 5cc5ed7a | 15-Dec-2023 | Wang Jinchao <[email protected]>
arm64: memory: remove duplicated include
Remove the duplicated include.
Signed-off-by: Wang Jinchao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>

Revision tags: v6.7-rc5, v6.7-rc4

# 376f5a3b | 29-Nov-2023 | Ard Biesheuvel <[email protected]>
arm64: mm: get rid of kimage_vaddr global variable
We store the address of _text in kimage_vaddr, but since commit 09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings decision"), we no longer reference this variable from modules so we no longer need to export it.
In fact, we don't need it at all so let's just get rid of it.
Acked-by: Mark Rutland <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>

Revision tags: v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5

# 3e35d303 | 30-May-2023 | Mark Rutland <[email protected]>
arm64: module: rework module VA range selection
Currently, the modules region is 128M in size, which is a problem for some large modules. Shanker reports [1] that the NVIDIA GPU driver alone can consume 110M of module space in some configurations. We'd like to make the modules region a full 2G such that we can always make use of a 2G range.
It's possible to build kernel images which are larger than 128M in some configurations, such as when many debug options are selected and many drivers are built in. In these configurations, we can't legitimately select a base for a 128M module region, though we currently select a value for which allocation will fail. It would be nicer to have a diagnostic message in this case.
Similarly, in theory it's possible to build a kernel image which is larger than 2G and which cannot support modules. While this isn't likely to be the case for any realistic kernel deployed in the field, it would be nice if we could print a diagnostic in this case.
This patch reworks the module VA range selection to use a 2G range, and improves handling of cases where we cannot select legitimate module regions. We now attempt to select a 128M region and a 2G region:
* The 128M region is selected such that modules can use direct branches (with JUMP26/CALL26 relocations) to branch to kernel code and other modules, and so that modules can reference data and text (using PREL32 relocations) anywhere in the kernel image and other modules.
This region covers the entire kernel image (rather than just the text) to ensure that all PREL32 relocations are in range even when the kernel data section is absurdly large. Where we cannot allocate from this region, we'll fall back to the full 2G region.
* The 2G region is selected such that modules can use direct branches with PLTs to branch to kernel code and other modules, and so that modules can reference data and text (with PREL32 relocations) in the kernel image and other modules.
This region covers the entire kernel image, and the 128M region (if one is selected).
The two module regions are randomized independently while ensuring the constraints described above.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Cc: Shanker Donthineni <[email protected]> Cc: Will Deacon <[email protected]> Tested-by: Shanker Donthineni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
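
A sketch of the fallback logic the two bullets above imply (the variable names module_direct_base/module_plt_base, the function name and the allocation flags are assumptions; only the 128M-then-2G ordering comes from the commit):

    /* Sketch: prefer the branch-range-friendly 128M window, else the 2G one. */
    static u64 module_direct_base;  /* 128M range: JUMP26/CALL26 reach everything */
    static u64 module_plt_base;     /* 2G range: branches go via PLTs */

    static void *module_alloc_sketch(unsigned long size)
    {
        void *p = NULL;

        if (module_direct_base)
            p = __vmalloc_node_range(size, MODULE_ALIGN,
                                     module_direct_base,
                                     module_direct_base + SZ_128M,
                                     GFP_KERNEL, PAGE_KERNEL, 0,
                                     NUMA_NO_NODE,
                                     __builtin_return_address(0));

        if (!p && module_plt_base)
            p = __vmalloc_node_range(size, MODULE_ALIGN,
                                     module_plt_base,
                                     module_plt_base + SZ_2G,
                                     GFP_KERNEL, PAGE_KERNEL, 0,
                                     NUMA_NO_NODE,
                                     __builtin_return_address(0));
        return p;
    }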

# 6e13b6b9 | 30-May-2023 | Mark Rutland <[email protected]>
arm64: kaslr: split kaslr/module initialization
Currently kaslr_init() handles a mixture of detecting/announcing whether KASLR is enabled, and randomizing the module region depending on whether KASLR is enabled.
To make it easier to rework the module region initialization, split the KASLR initialization into two steps:
* kaslr_init() determines whether KASLR should be enabled, and announces this choice, recording this to a new global boolean variable. This is called from setup_arch() just before the existing call to kaslr_requires_kpti() so that this will always provide the expected result.
* kaslr_module_init() randomizes the module region when required. This is called as a subsys_initcall, where we previously called kaslr_init().
As a bonus, moving the KASLR reporting earlier makes it easier to spot and permits it to be logged via earlycon, making it easier to debug any issues that could be triggered by KASLR.
Booting a v6.4-rc1 kernel with this patch applied, the log looks like:
| EFI stub: Booting Linux Kernel...
| EFI stub: Generating empty DTB
| EFI stub: Exiting boot services...
| [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
| [ 0.000000] Linux version 6.4.0-rc1-00006-g4763a8f8aeb3 (mark@lakrids) (aarch64-linux-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #2 SMP PREEMPT Tue May 9 11:03:37 BST 2023
| [ 0.000000] KASLR enabled
| [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
| [ 0.000000] printk: bootconsole [pl11] enabled
Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Cc: Will Deacon <[email protected]> Tested-by: Shanker Donthineni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>

Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1

# c94b1a01 | 30-May-2022 | Linus Walleij <[email protected]>
arm64: memory: Make virt_to_pfn() a static inline
Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of passing a pointer of that type to the function explicit, and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (uintptr_t) or (unsigned long) as arguments without warnings.
Since arm64 is using <asm-generic/memory_model.h> to provide __phys_to_pfn() we need to move the inclusion of that header up, so we can resolve the static inline at compile time.
Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
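
The strongly typed version is presumably just (sketch):

    /* Sketch: typed inline replacing the old polymorphic macro. */
    static inline unsigned long virt_to_pfn(const void *kaddr)
    {
        return __phys_to_pfn(virt_to_phys(kaddr));
    }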

# 504cae45 | 07-Apr-2023 | Baoquan He <[email protected]>
arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones
In commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones"), reserve_crashkernel() is called much earlier in arm64_memblock_init() to avoid forcing base-page mappings on platforms with no DMA memory zones.
With the protection on the crashkernel memory region taken off, there is no need to call reserve_crashkernel() specially in advance. The deferred invocation of reserve_crashkernel() in bootmem_init() can cover all cases. So revert the whole commit now.
Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>

# 0d3c9468 | 10-Mar-2023 | Andrey Konovalov <[email protected]>
kasan, arm64: add arch_suppress_tag_checks_start/stop
Add two new tagging-related routines arch_suppress_tag_checks_start/stop that suppress MTE tag checking via the TCO register.
These routines are used in the next patch.
[[email protected]: drop __ from mte_disable/enable_tco names] Link: https://lkml.kernel.org/r/7ad5e5a9db79e3aba08d8f43aca24350b04080f6.1680114854.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/75a362551c3c54b70ae59a3492cabb51c105fa6b.1678491668.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Weizhao Ouyang <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
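
Given that the bracketed note above names mte_enable_tco()/mte_disable_tco(), the new hooks are presumably thin wrappers (sketch):

    /* Sketch: suppress/resume MTE tag checks by toggling PSTATE.TCO. */
    #define arch_suppress_tag_checks_start()    mte_enable_tco()
    #define arch_suppress_tag_checks_stop()     mte_disable_tco()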

# 0eafff1c | 10-Mar-2023 | Andrey Konovalov <[email protected]>
kasan, arm64: rename tagging-related routines
Rename arch_enable_tagging_sync/async/asymm to arch_enable_tag_checks_sync/async/asymm, as the new name better reflects their function.
Also rename kasan_enable_tagging to kasan_enable_hw_tags for the same reason.
Link: https://lkml.kernel.org/r/069ef5b77715c1ac8d69b186725576c32b149491.1678491668.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Weizhao Ouyang <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

# 010338d7 | 23-Feb-2023 | Ard Biesheuvel <[email protected]>
arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
Our virtual KASLR displacement is a randomly chosen multiple of 2 MiB plus an offset that is equal to the physical placement modulo 2 MiB. This arrangement ensures that we can always use 2 MiB block mappings (or contiguous PTE mappings for 16k or 64k pages) to map the kernel.
This means that a KASLR offset of less than 2 MiB is simply the product of this physical displacement, and no randomization has actually taken place. Currently, we use 'kaslr_offset() > 0' to decide whether or not randomization has occurred, and so we misidentify this case.
If the kernel image placement is not randomized, modules are allocated from a dedicated region below the kernel mapping, which is only used for modules and not for other vmalloc() or vmap() calls.
When randomization is enabled, the kernel image is vmap()'ed randomly inside the vmalloc region, and modules are allocated in the vicinity of this mapping to ensure that relative references are always in range. However, unlike the dedicated module region below the vmalloc region, this region is not reserved exclusively for modules, and so ordinary vmalloc() calls may end up overlapping with it. This should rarely happen, given that vmalloc allocates bottom up, although it cannot be ruled out entirely.
The misidentified case results in a placement of the kernel image within 2 MiB of its default address. However, the logic that randomizes the module region is still invoked, and this could result in the module region overlapping with the start of the vmalloc region, instead of using the dedicated region below it. If this happens, a single large vmalloc() or vmap() call will use up the entire region, and leave no space for loading modules after that.
Since commit 82046702e288 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check"), this is much more likely to occur on systems that boot via EFI but lack an implementation of the EFI RNG protocol, as in that case, the EFI stub will decide to leave the image where it found it, and the EFI firmware uses 64k alignment only.
Fix this, by correctly identifying the case where the virtual displacement is a result of the physical displacement only.
Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
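
In code terms, the correct test plausibly becomes a threshold rather than a non-zero check (sketch; the helper name is an assumption):

    /* Sketch: only a displacement of at least one 2 MiB step counts as KASLR. */
    static inline bool kaslr_enabled(void)
    {
        return kaslr_offset() >= MIN_KIMG_ALIGN;
    }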

# 450d0e74 | 15-Jun-2022 | Zhou Guanghui <[email protected]>
memblock,arm64: expand the static memblock memory table
In a system (Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs, and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are either not reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, with EFI_UNUSABLE_MEMORY leading to the larger increase.
For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...
The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As a result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost.
Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGIONS in defining the size of the static memblock.memory array.
Allow overriding the memblock.memory array size with an architecture-defined INIT_MEMBLOCK_MEMORY_REGIONS, and make arm64 set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhou Guanghui <[email protected]> Acked-by: Mike Rapoport <[email protected]> Tested-by: Darren Hart <[email protected]> Acked-by: Will Deacon <[email protected]> [arm64] Reviewed-by: Anshuman Khandual <[email protected]> Cc: Xu Qiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
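
Following the description, the override mechanism presumably looks like this (a sketch spanning the generic and arm64 sides, not the verbatim diff):

    /* Sketch, generic side: let the architecture size memblock.memory. */
    #ifndef INIT_MEMBLOCK_MEMORY_REGIONS
    #define INIT_MEMBLOCK_MEMORY_REGIONS    INIT_MEMBLOCK_REGIONS
    #endif

    /* Sketch, arm64 side: EFI can fragment the map, so allow more regions. */
    #if defined(CONFIG_EFI)
    #define INIT_MEMBLOCK_MEMORY_REGIONS    1024
    #endif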

# 6928bcc8 | 26-Jul-2022 | Kalesh Singh <[email protected]>
KVM: arm64: Allocate shared pKVM hyp stacktrace buffers
In protected nVHE mode the host cannot directly access hypervisor memory, so we will dump the hypervisor stacktrace to a shared buffer with the host.
The minimum size for the buffer required, assuming the min frame size of [x29, x30] (2 * sizeof(long)), is half the combined size of the hypervisor and overflow stacks plus an additional entry to delimit the end of the stacktrace.
The stacktrace buffers are used later in the series to dump the nVHE hypervisor stacktrace when using protected-mode.
Signed-off-by: Kalesh Singh <[email protected]> Reviewed-by: Fuad Tabba <[email protected]> Tested-by: Fuad Tabba <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
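
Translating the sizing rule into a constant (sketch; the macro name is an assumption, and the hyp stack was one page at the time):

    /*
     * Sketch: a minimal [x29, x30] frame is 2 longs and yields one trace
     * entry, so a full unwind needs at most half the combined stack size,
     * plus one entry to delimit the end of the trace.
     */
    #define NVHE_STACKTRACE_SIZE \
        ((OVERFLOW_STACK_SIZE + PAGE_SIZE) / 2 + sizeof(long))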

# 4890cc18 | 05-Jul-2022 | Anshuman Khandual <[email protected]>
arm64/mm: Define defer_reserve_crashkernel()
Crash kernel memory reservation gets deferred, when either CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 config is enabled on the platform. This deferral also impacts overall linear mapping creation including the crash kernel itself. Just encapsulate this deferral check in a new helper for better clarity.
Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
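
The helper presumably just names the existing config check (sketch):

    /* Sketch: defer iff CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 is enabled. */
    static inline bool defer_reserve_crashkernel(void)
    {
        return IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32);
    }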

# 0d9b1ffe | 24-Jun-2022 | Ard Biesheuvel <[email protected]>
arm64: mm: make vabits_actual a build time constant if possible
Currently, we only support 52-bit virtual addressing on 64k pages configurations, and in all other cases, vabits_actual is guaranteed to equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in that case.
While at it, move the assignment out of the asm entry code - it has no need to be there.
Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
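
So the variable presumably only survives in 52-bit VA configurations, roughly (sketch):

    /* Sketch: only 52-bit VA configs need a runtime-variable VA size. */
    #if VA_BITS > 48
    extern u64 vabits_actual;
    #else
    #define vabits_actual    ((u64)VA_BITS)
    #endif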

Revision tags: v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1

# b89ddf4c | 05-Nov-2021 | Russell King <[email protected]>
arm64/bpf: Remove 128MB limit for BPF JIT programs
Commit 91fc957c9b1d ("arm64/bpf: don't allocate BPF JIT programs in module memory") restricts BPF JIT program allocation to a 128MB region to ensure BPF programs are still in branching range of each other. However this restriction should not apply to the aarch64 JIT, since BPF_JMP | BPF_CALL are implemented as a 64-bit move into a register and then a BLR instruction - which has the effect of being able to call anything without proximity limitation.
The practical reason to relax this restriction on JIT memory is that 128MB of JIT memory can be quickly exhausted, especially where PAGE_SIZE is 64KB - one page is needed per program. In cases where seccomp filters are applied to multiple VMs on VM launch - such filters are classic BPF but converted to eBPF - this can severely limit the number of VMs that can be launched. In a world where we support BPF JIT always on, turning off the JIT isn't always an option either.
Fixes: 91fc957c9b1d ("arm64/bpf: don't allocate BPF JIT programs in module memory") Suggested-by: Ard Biesheuvel <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Alan Maguire <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
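
Consistent with that reasoning, the arm64 JIT allocator can simply hand back ordinary vmalloc space (a sketch of the change as described, not the verbatim diff):

    /* Sketch: no 128MB window needed - BLR-based calls reach any address. */
    void *bpf_jit_alloc_exec(unsigned long size)
    {
        return vmalloc(size);
    }

    void bpf_jit_free_exec(void *addr)
    {
        vfree(addr);
    }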