1 //! In-memory representation of compiled machine code, with labels and fixups to 2 //! refer to those labels. Handles constant-pool island insertion and also 3 //! veneer insertion for out-of-range jumps. 4 //! 5 //! This code exists to solve three problems: 6 //! 7 //! - Branch targets for forward branches are not known until later, when we 8 //! emit code in a single pass through the instruction structs. 9 //! 10 //! - On many architectures, address references or offsets have limited range. 11 //! For example, on AArch64, conditional branches can only target code +/- 1MB 12 //! from the branch itself. 13 //! 14 //! - The lowering of control flow from the CFG-with-edges produced by 15 //! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty 16 //! edge blocks when the register allocator does not need to insert any 17 //! spills/reloads/moves in edge blocks, results in many suboptimal branch 18 //! patterns. The lowering also pays no attention to block order, and so 19 //! two-target conditional forms (cond-br followed by uncond-br) can often by 20 //! avoided because one of the targets is the fallthrough. There are several 21 //! cases here where we can simplify to use fewer branches. 22 //! 23 //! This "buffer" implements a single-pass code emission strategy (with a later 24 //! "fixup" pass, but only through recorded fixups, not all instructions). The 25 //! basic idea is: 26 //! 27 //! - Emit branches as they are, including two-target (cond/uncond) compound 28 //! forms, but with zero offsets and optimistically assuming the target will be 29 //! in range. Record the "fixup" for later. Targets are denoted instead by 30 //! symbolic "labels" that are then bound to certain offsets in the buffer as 31 //! we emit code. (Nominally, there is a label at the start of every basic 32 //! block.) 33 //! 34 //! - As we do this, track the offset in the buffer at which the first label 35 //! reference "goes out of range". We call this the "deadline". If we reach the 36 //! deadline and we still have not bound the label to which an unresolved branch 37 //! refers, we have a problem! 38 //! 39 //! - To solve this problem, we emit "islands" full of "veneers". An island is 40 //! simply a chunk of code inserted in the middle of the code actually produced 41 //! by the emitter (e.g., vcode iterating over instruction structs). The emitter 42 //! has some awareness of this: it either asks for an island between blocks, so 43 //! it is not accidentally executed, or else it emits a branch around the island 44 //! when all other options fail (see `Inst::EmitIsland` meta-instruction). 45 //! 46 //! - A "veneer" is an instruction (or sequence of instructions) in an "island" 47 //! that implements a longer-range reference to a label. The idea is that, for 48 //! example, a branch with a limited range can branch to a "veneer" instead, 49 //! which is simply a branch in a form that can use a longer-range reference. On 50 //! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional 51 //! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a 52 //! conditional branch's label reference can be fixed up with a "veneer" to 53 //! achieve a longer range. 54 //! 55 //! - To implement all of this, we require the backend to provide a `LabelUse` 56 //! type that implements a trait. This is nominally an enum that records one of 57 //! several kinds of references to an offset in code -- basically, a relocation 58 //! type -- and will usually correspond to different instruction formats. The 59 //! `LabelUse` implementation specifies the maximum range, how to patch in the 60 //! actual label location when known, and how to generate a veneer to extend the 61 //! range. 62 //! 63 //! That satisfies label references, but we still may have suboptimal branch 64 //! patterns. To clean up the branches, we do a simple "peephole"-style 65 //! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`) 66 //! informs the buffer of branches in the code and, in the case of conditionals, 67 //! the code that would have been emitted to invert this branch's condition. We 68 //! track the "latest branches": these are branches that are contiguous up to 69 //! the current offset. (If any code is emitted after a branch, that branch or 70 //! run of contiguous branches is no longer "latest".) The latest branches are 71 //! those that we can edit by simply truncating the buffer and doing something 72 //! else instead. 73 //! 74 //! To optimize branches, we implement several simple rules, and try to apply 75 //! them to the "latest branches" when possible: 76 //! 77 //! - A branch with a label target, when that label is bound to the ending 78 //! offset of the branch (the fallthrough location), can be removed altogether, 79 //! because the branch would have no effect). 80 //! 81 //! - An unconditional branch that starts at a label location, and branches to 82 //! another label, results in a "label alias": all references to the label bound 83 //! *to* this branch instruction are instead resolved to the *target* of the 84 //! branch instruction. This effectively removes empty blocks that just 85 //! unconditionally branch to the next block. We call this "branch threading". 86 //! 87 //! - A conditional followed by an unconditional, when the conditional branches 88 //! to the unconditional's fallthrough, results in (i) the truncation of the 89 //! unconditional, (ii) the inversion of the condition's condition, and (iii) 90 //! replacement of the conditional's target (using the original target of the 91 //! unconditional). This is a fancy way of saying "we can flip a two-target 92 //! conditional branch's taken/not-taken targets if it works better with our 93 //! fallthrough". To make this work, the emitter actually gives the buffer 94 //! *both* forms of every conditional branch: the true form is emitted into the 95 //! buffer, and the "inverted" machine-code bytes are provided as part of the 96 //! branch-fixup metadata. 97 //! 98 //! - An unconditional B preceded by another unconditional P, when B's label(s) have 99 //! been redirected to target(B), can be removed entirely. This is an extension 100 //! of the branch-threading optimization, and is valid because if we know there 101 //! will be no fallthrough into this branch instruction (the prior instruction 102 //! is an unconditional jump), and if we know we have successfully redirected 103 //! all labels, then this branch instruction is unreachable. Note that this 104 //! works because the redirection happens before the label is ever resolved 105 //! (fixups happen at island emission time, at which point latest-branches are 106 //! cleared, or at the end of emission), so we are sure to catch and redirect 107 //! all possible paths to this instruction. 108 //! 109 //! # Branch-optimization Correctness 110 //! 111 //! The branch-optimization mechanism depends on a few data structures with 112 //! invariants, which are always held outside the scope of top-level public 113 //! methods: 114 //! 115 //! - The latest-branches list. Each entry describes a span of the buffer 116 //! (start/end offsets), the label target, the corresponding fixup-list entry 117 //! index, and the bytes (must be the same length) for the inverted form, if 118 //! conditional. The list of labels that are bound to the start-offset of this 119 //! branch is *complete* (if any label has a resolved offset equal to `start` 120 //! and is not an alias, it must appear in this list) and *precise* (no label 121 //! in this list can be bound to another offset). No label in this list should 122 //! be an alias. No two branch ranges can overlap, and branches are in 123 //! ascending-offset order. 124 //! 125 //! - The labels-at-tail list. This contains all MachLabels that have been bound 126 //! to (whose resolved offsets are equal to) the tail offset of the buffer. 127 //! No label in this list should be an alias. 128 //! 129 //! - The label_offsets array, containing the bound offset of a label or 130 //! UNKNOWN. No label can be bound at an offset greater than the current 131 //! buffer tail. 132 //! 133 //! - The label_aliases array, containing another label to which a label is 134 //! bound or UNKNOWN. A label's resolved offset is the resolved offset 135 //! of the label it is aliased to, if this is set. 136 //! 137 //! We argue below, at each method, how the invariants in these data structures 138 //! are maintained (grep for "Post-invariant"). 139 //! 140 //! Given these invariants, we argue why each optimization preserves execution 141 //! semantics below (grep for "Preserves execution semantics"). 142 //! 143 //! # Avoiding Quadratic Behavior 144 //! 145 //! There are two cases where we've had to take some care to avoid 146 //! quadratic worst-case behavior: 147 //! 148 //! - The "labels at this branch" list can grow unboundedly if the 149 //! code generator binds many labels at one location. If the count 150 //! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we 151 //! simply abort an optimization early in a way that is always correct 152 //! but is conservative. 153 //! 154 //! - The fixup list can interact with island emission to create 155 //! "quadratic island behavior". In a little more detail, one can hit 156 //! this behavior by having some pending fixups (forward label 157 //! references) with long-range label-use kinds, and some others 158 //! with shorter-range references that nonetheless still are pending 159 //! long enough to trigger island generation. In such a case, we 160 //! process the fixup list, generate veneers to extend some forward 161 //! references' ranges, but leave the other (longer-range) ones 162 //! alone. The way this was implemented put them back on a list and 163 //! resulted in quadratic behavior. 164 //! 165 //! To avoid this fixups are split into two lists: one "pending" list and one 166 //! final list. The pending list is kept around for handling fixups related to 167 //! branches so it can be edited/truncated. When an island is reached, which 168 //! starts processing fixups, all pending fixups are flushed into the final 169 //! list. The final list is a `BinaryHeap` which enables fixup processing to 170 //! only process those which are required during island emission, deferring 171 //! all longer-range fixups to later. 172 173 use crate::binemit::{Addend, CodeOffset, Reloc}; 174 use crate::ir::function::FunctionParameters; 175 use crate::ir::{DebugTag, ExceptionTag, ExternalName, RelSourceLoc, SourceLoc, TrapCode}; 176 use crate::isa::unwind::UnwindInst; 177 use crate::machinst::{ 178 BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst, 179 }; 180 use crate::trace; 181 use crate::{MachInstEmitState, ir}; 182 use crate::{VCodeConstantData, timing}; 183 use alloc::collections::BinaryHeap; 184 use alloc::string::String; 185 use alloc::vec::Vec; 186 use core::cmp::Ordering; 187 use core::mem; 188 use core::ops::Range; 189 use cranelift_control::ControlPlane; 190 use cranelift_entity::{PrimaryMap, SecondaryMap, entity_impl}; 191 use smallvec::SmallVec; 192 193 #[cfg(feature = "enable-serde")] 194 use serde::{Deserialize, Serialize}; 195 196 #[cfg(feature = "enable-serde")] 197 pub trait CompilePhase { 198 type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 199 type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 200 } 201 202 #[cfg(not(feature = "enable-serde"))] 203 pub trait CompilePhase { 204 type MachSrcLocType: core::fmt::Debug + PartialEq + Clone; 205 type SourceLocType: core::fmt::Debug + PartialEq + Clone; 206 } 207 208 /// Status of a compiled artifact that needs patching before being used. 209 #[derive(Clone, Debug, PartialEq)] 210 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))] 211 pub struct Stencil; 212 213 /// Status of a compiled artifact ready to use. 214 #[derive(Clone, Debug, PartialEq)] 215 pub struct Final; 216 217 impl CompilePhase for Stencil { 218 type MachSrcLocType = MachSrcLoc<Stencil>; 219 type SourceLocType = RelSourceLoc; 220 } 221 222 impl CompilePhase for Final { 223 type MachSrcLocType = MachSrcLoc<Final>; 224 type SourceLocType = SourceLoc; 225 } 226 227 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 228 enum ForceVeneers { 229 Yes, 230 No, 231 } 232 233 /// A buffer of output to be produced, fixed up, and then emitted to a CodeSink 234 /// in bulk. 235 /// 236 /// This struct uses `SmallVec`s to support small-ish function bodies without 237 /// any heap allocation. As such, it will be several kilobytes large. This is 238 /// likely fine as long as it is stack-allocated for function emission then 239 /// thrown away; but beware if many buffer objects are retained persistently. 240 pub struct MachBuffer<I: VCodeInst> { 241 /// The buffer contents, as raw bytes. 242 data: SmallVec<[u8; 1024]>, 243 /// The required alignment of this buffer. 244 min_alignment: u32, 245 /// Any relocations referring to this code. Note that only *external* 246 /// relocations are tracked here; references to labels within the buffer are 247 /// resolved before emission. 248 relocs: SmallVec<[MachReloc; 16]>, 249 /// Any trap records referring to this code. 250 traps: SmallVec<[MachTrap; 16]>, 251 /// Any call site records referring to this code. 252 call_sites: SmallVec<[MachCallSite; 16]>, 253 /// Any patchable call site locations. 254 patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>, 255 /// Any exception-handler records referred to at call sites. 256 exception_handlers: SmallVec<[MachExceptionHandler; 16]>, 257 /// Any source location mappings referring to this code. 258 srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>, 259 /// Any debug tags referring to this code. 260 debug_tags: Vec<MachDebugTags>, 261 /// Pool of debug tags referenced by `MachDebugTags` entries. 262 debug_tag_pool: Vec<DebugTag>, 263 /// Any user stack maps for this code. 264 /// 265 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted 266 /// by code offset, and each stack map covers `span` bytes on the stack. 267 user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>, 268 /// Any unwind info at a given location. 269 unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 270 /// The current source location in progress (after `start_srcloc()` and 271 /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple. 272 cur_srcloc: Option<(CodeOffset, RelSourceLoc)>, 273 /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown. 274 label_offsets: SmallVec<[CodeOffset; 16]>, 275 /// Label aliases: when one label points to an unconditional jump, and that 276 /// jump points to another label, we can redirect references to the first 277 /// label immediately to the second. 278 /// 279 /// Invariant: we don't have label-alias cycles. We ensure this by, 280 /// before setting label A to alias label B, resolving B's alias 281 /// target (iteratively until a non-aliased label); if B is already 282 /// aliased to A, then we cannot alias A back to B. 283 label_aliases: SmallVec<[MachLabel; 16]>, 284 /// Constants that must be emitted at some point. 285 pending_constants: SmallVec<[VCodeConstant; 16]>, 286 /// Byte size of all constants in `pending_constants`. 287 pending_constants_size: CodeOffset, 288 /// Traps that must be emitted at some point. 289 pending_traps: SmallVec<[MachLabelTrap; 16]>, 290 /// Fixups that haven't yet been flushed into `fixup_records` below and may 291 /// be related to branches that are chomped. These all get added to 292 /// `fixup_records` during island emission. 293 pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>, 294 /// The nearest upcoming deadline for entries in `pending_fixup_records`. 295 pending_fixup_deadline: CodeOffset, 296 /// Fixups that must be performed after all code is emitted. 297 fixup_records: BinaryHeap<MachLabelFixup<I>>, 298 /// Latest branches, to facilitate in-place editing for better fallthrough 299 /// behavior and empty-block removal. 300 latest_branches: SmallVec<[MachBranch; 4]>, 301 /// All labels at the current offset (emission tail). This is lazily 302 /// cleared: it is actually accurate as long as the current offset is 303 /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should 304 /// be considered as empty. 305 /// 306 /// For correctness, this *must* be complete (i.e., the vector must contain 307 /// all labels whose offsets are resolved to the current tail), because we 308 /// rely on it to update labels when we truncate branches. 309 labels_at_tail: SmallVec<[MachLabel; 4]>, 310 /// The last offset at which `labels_at_tail` is valid. It is conceptually 311 /// always describing the tail of the buffer, but we do not clear 312 /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it 313 /// when the offset has grown past this (`labels_at_tail_off`) point. 314 /// Always <= `cur_offset()`. 315 labels_at_tail_off: CodeOffset, 316 /// Metadata about all constants that this function has access to. 317 /// 318 /// This records the size/alignment of all constants (not the actual data) 319 /// along with the last available label generated for the constant. This map 320 /// is consulted when constants are referred to and the label assigned to a 321 /// constant may change over time as well. 322 constants: PrimaryMap<VCodeConstant, MachBufferConstant>, 323 /// All recorded usages of constants as pairs of the constant and where the 324 /// constant needs to be placed within `self.data`. Note that the same 325 /// constant may appear in this array multiple times if it was emitted 326 /// multiple times. 327 used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>, 328 /// Indicates when a patchable region is currently open, to guard that it's 329 /// not possible to nest patchable regions. 330 open_patchable: bool, 331 /// Stack frame layout metadata. If provided for a MachBuffer 332 /// containing a function body, this allows interpretation of 333 /// runtime state given a view of an active stack frame. 334 frame_layout: Option<MachBufferFrameLayout>, 335 } 336 337 impl MachBufferFinalized<Stencil> { 338 /// Get a finalized machine buffer by applying the function's base source location. 339 pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> { 340 MachBufferFinalized { 341 data: self.data, 342 relocs: self.relocs, 343 traps: self.traps, 344 call_sites: self.call_sites, 345 patchable_call_sites: self.patchable_call_sites, 346 exception_handlers: self.exception_handlers, 347 srclocs: self 348 .srclocs 349 .into_iter() 350 .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc)) 351 .collect(), 352 debug_tags: self.debug_tags, 353 debug_tag_pool: self.debug_tag_pool, 354 user_stack_maps: self.user_stack_maps, 355 unwind_info: self.unwind_info, 356 alignment: self.alignment, 357 frame_layout: self.frame_layout, 358 nop_units: self.nop_units, 359 } 360 } 361 } 362 363 /// A `MachBuffer` once emission is completed: holds generated code and records, 364 /// without fixups. This allows the type to be independent of the backend. 365 #[derive(PartialEq, Debug, Clone)] 366 #[cfg_attr( 367 feature = "enable-serde", 368 derive(serde_derive::Serialize, serde_derive::Deserialize) 369 )] 370 pub struct MachBufferFinalized<T: CompilePhase> { 371 /// The buffer contents, as raw bytes. 372 pub(crate) data: SmallVec<[u8; 1024]>, 373 /// Any relocations referring to this code. Note that only *external* 374 /// relocations are tracked here; references to labels within the buffer are 375 /// resolved before emission. 376 pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>, 377 /// Any trap records referring to this code. 378 pub(crate) traps: SmallVec<[MachTrap; 16]>, 379 /// Any call site records referring to this code. 380 pub(crate) call_sites: SmallVec<[MachCallSite; 16]>, 381 /// Any patchable call site locations referring to this code. 382 pub(crate) patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>, 383 /// Any exception-handler records referred to at call sites. 384 pub(crate) exception_handlers: SmallVec<[FinalizedMachExceptionHandler; 16]>, 385 /// Any source location mappings referring to this code. 386 pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>, 387 /// Any debug tags referring to this code. 388 pub(crate) debug_tags: Vec<MachDebugTags>, 389 /// Pool of debug tags referenced by `MachDebugTags` entries. 390 pub(crate) debug_tag_pool: Vec<DebugTag>, 391 /// Any user stack maps for this code. 392 /// 393 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted 394 /// by code offset, and each stack map covers `span` bytes on the stack. 395 pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>, 396 /// Stack frame layout metadata. If provided for a MachBuffer 397 /// containing a function body, this allows interpretation of 398 /// runtime state given a view of an active stack frame. 399 pub(crate) frame_layout: Option<MachBufferFrameLayout>, 400 /// Any unwind info at a given location. 401 pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 402 /// The required alignment of this buffer. 403 pub alignment: u32, 404 /// The means by which to NOP out patchable call sites. 405 /// 406 /// This allows a consumer of a `MachBufferFinalized` to disable 407 /// patchable call sites (which are enabled by default) without 408 /// specific knowledge of the target ISA. 409 /// 410 /// Each entry is one form of nop, and these are required to be 411 /// sorted in ascending-size order. 412 pub nop_units: Vec<Vec<u8>>, 413 } 414 415 const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff; 416 const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff); 417 418 /// Threshold on max length of `labels_at_this_branch` list to avoid 419 /// unbounded quadratic behavior (see comment below at use-site). 420 const LABEL_LIST_THRESHOLD: usize = 100; 421 422 /// A label refers to some offset in a `MachBuffer`. It may not be resolved at 423 /// the point at which it is used by emitted code; the buffer records "fixups" 424 /// for references to the label, and will come back and patch the code 425 /// appropriately when the label's location is eventually known. 426 #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] 427 pub struct MachLabel(u32); 428 entity_impl!(MachLabel); 429 430 impl MachLabel { 431 /// Get a label for a block. (The first N MachLabels are always reserved for 432 /// the N blocks in the vcode.) 433 pub fn from_block(bindex: BlockIndex) -> MachLabel { 434 MachLabel(bindex.index() as u32) 435 } 436 437 /// Creates a string representing this label, for convenience. 438 pub fn to_string(&self) -> String { 439 format!("label{}", self.0) 440 } 441 } 442 443 impl Default for MachLabel { 444 fn default() -> Self { 445 UNKNOWN_LABEL 446 } 447 } 448 449 /// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is 450 /// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming 451 /// the [`OpenPatchRegion`] token in the process. 452 pub struct OpenPatchRegion(usize); 453 454 /// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example 455 /// of where you might want to use this is for patching instructions that mention constants that 456 /// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable 457 /// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token 458 /// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known, 459 /// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction 460 /// bytes, and the constants uses can be updated directly. 461 pub struct PatchRegion { 462 range: Range<usize>, 463 } 464 465 impl PatchRegion { 466 /// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer. 467 pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] { 468 &mut buffer.data[self.range] 469 } 470 } 471 472 impl<I: VCodeInst> MachBuffer<I> { 473 /// Create a new section, known to start at `start_offset` and with a size limited to 474 /// `length_limit`. 475 pub fn new() -> MachBuffer<I> { 476 MachBuffer { 477 data: SmallVec::new(), 478 min_alignment: I::function_alignment().minimum, 479 relocs: SmallVec::new(), 480 traps: SmallVec::new(), 481 call_sites: SmallVec::new(), 482 patchable_call_sites: SmallVec::new(), 483 exception_handlers: SmallVec::new(), 484 srclocs: SmallVec::new(), 485 debug_tags: vec![], 486 debug_tag_pool: vec![], 487 user_stack_maps: SmallVec::new(), 488 unwind_info: SmallVec::new(), 489 cur_srcloc: None, 490 label_offsets: SmallVec::new(), 491 label_aliases: SmallVec::new(), 492 pending_constants: SmallVec::new(), 493 pending_constants_size: 0, 494 pending_traps: SmallVec::new(), 495 pending_fixup_records: SmallVec::new(), 496 pending_fixup_deadline: u32::MAX, 497 fixup_records: Default::default(), 498 latest_branches: SmallVec::new(), 499 labels_at_tail: SmallVec::new(), 500 labels_at_tail_off: 0, 501 constants: Default::default(), 502 used_constants: Default::default(), 503 open_patchable: false, 504 frame_layout: None, 505 } 506 } 507 508 /// Current offset from start of buffer. 509 pub fn cur_offset(&self) -> CodeOffset { 510 self.data.len() as CodeOffset 511 } 512 513 /// Add a byte. 514 pub fn put1(&mut self, value: u8) { 515 self.data.push(value); 516 517 // Post-invariant: conceptual-labels_at_tail contains a complete and 518 // precise list of labels bound at `cur_offset()`. We have advanced 519 // `cur_offset()`, hence if it had been equal to `labels_at_tail_off` 520 // before, it is not anymore (and it cannot become equal, because 521 // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is 522 // conceptually empty (even though it is only lazily cleared). No labels 523 // can be bound at this new offset (by invariant on `label_offsets`). 524 // Hence the invariant holds. 525 } 526 527 /// Add 2 bytes. 528 pub fn put2(&mut self, value: u16) { 529 let bytes = value.to_le_bytes(); 530 self.data.extend_from_slice(&bytes[..]); 531 532 // Post-invariant: as for `put1()`. 533 } 534 535 /// Add 4 bytes. 536 pub fn put4(&mut self, value: u32) { 537 let bytes = value.to_le_bytes(); 538 self.data.extend_from_slice(&bytes[..]); 539 540 // Post-invariant: as for `put1()`. 541 } 542 543 /// Add 8 bytes. 544 pub fn put8(&mut self, value: u64) { 545 let bytes = value.to_le_bytes(); 546 self.data.extend_from_slice(&bytes[..]); 547 548 // Post-invariant: as for `put1()`. 549 } 550 551 /// Add a slice of bytes. 552 pub fn put_data(&mut self, data: &[u8]) { 553 self.data.extend_from_slice(data); 554 555 // Post-invariant: as for `put1()`. 556 } 557 558 /// Reserve appended space and return a mutable slice referring to it. 559 pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] { 560 let off = self.data.len(); 561 let new_len = self.data.len() + len; 562 self.data.resize(new_len, 0); 563 &mut self.data[off..] 564 565 // Post-invariant: as for `put1()`. 566 } 567 568 /// Align up to the given alignment. 569 pub fn align_to(&mut self, align_to: CodeOffset) { 570 trace!("MachBuffer: align to {}", align_to); 571 assert!( 572 align_to.is_power_of_two(), 573 "{align_to} is not a power of two" 574 ); 575 while self.cur_offset() & (align_to - 1) != 0 { 576 self.put1(0); 577 } 578 579 // Post-invariant: as for `put1()`. 580 } 581 582 /// Begin a region of patchable code. There is one requirement for the 583 /// code that is emitted: It must not introduce any instructions that 584 /// could be chomped (branches are an example of this). In other words, 585 /// you must not call [`MachBuffer::add_cond_branch`] or 586 /// [`MachBuffer::add_uncond_branch`] between calls to this method and 587 /// [`MachBuffer::end_patchable`]. 588 pub fn start_patchable(&mut self) -> OpenPatchRegion { 589 assert!(!self.open_patchable, "Patchable regions may not be nested"); 590 self.open_patchable = true; 591 OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap()) 592 } 593 594 /// End a region of patchable code, yielding a [`PatchRegion`] value that 595 /// can be consumed later to produce a one-off mutable slice to the 596 /// associated region of the data buffer. 597 pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion { 598 // No need to assert the state of `open_patchable` here, as we take 599 // ownership of the only `OpenPatchable` value. 600 self.open_patchable = false; 601 let end = usize::try_from(self.cur_offset()).unwrap(); 602 PatchRegion { range: open.0..end } 603 } 604 605 /// Allocate a `Label` to refer to some offset. May not be bound to a fixed 606 /// offset yet. 607 pub fn get_label(&mut self) -> MachLabel { 608 let l = self.label_offsets.len() as u32; 609 self.label_offsets.push(UNKNOWN_LABEL_OFFSET); 610 self.label_aliases.push(UNKNOWN_LABEL); 611 trace!("MachBuffer: new label -> {:?}", MachLabel(l)); 612 MachLabel(l) 613 614 // Post-invariant: the only mutation is to add a new label; it has no 615 // bound offset yet, so it trivially satisfies all invariants. 616 } 617 618 /// Reserve the first N MachLabels for blocks. 619 pub fn reserve_labels_for_blocks(&mut self, blocks: usize) { 620 trace!("MachBuffer: first {} labels are for blocks", blocks); 621 debug_assert!(self.label_offsets.is_empty()); 622 self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET); 623 self.label_aliases.resize(blocks, UNKNOWN_LABEL); 624 625 // Post-invariant: as for `get_label()`. 626 } 627 628 /// Registers metadata in this `MachBuffer` about the `constants` provided. 629 /// 630 /// This will record the size/alignment of all constants which will prepare 631 /// them for emission later on. 632 pub fn register_constants(&mut self, constants: &VCodeConstants) { 633 for (c, val) in constants.iter() { 634 self.register_constant(&c, val); 635 } 636 } 637 638 /// Similar to [`MachBuffer::register_constants`] but registers a 639 /// single constant metadata. This function is useful in 640 /// situations where not all constants are known at the time of 641 /// emission. 642 pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) { 643 let c2 = self.constants.push(MachBufferConstant { 644 upcoming_label: None, 645 align: data.alignment(), 646 size: data.as_slice().len(), 647 }); 648 assert_eq!(*constant, c2); 649 } 650 651 /// Completes constant emission by iterating over `self.used_constants` and 652 /// filling in the "holes" with the constant values provided by `constants`. 653 /// 654 /// Returns the alignment required for this entire buffer. Alignment starts 655 /// at the ISA's minimum function alignment and can be increased due to 656 /// constant requirements. 657 fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 { 658 let mut alignment = self.min_alignment; 659 for (constant, offset) in mem::take(&mut self.used_constants) { 660 let constant = constants.get(constant); 661 let data = constant.as_slice(); 662 self.data[offset as usize..][..data.len()].copy_from_slice(data); 663 alignment = constant.alignment().max(alignment); 664 } 665 alignment 666 } 667 668 /// Returns a label that can be used to refer to the `constant` provided. 669 /// 670 /// This will automatically defer a new constant to be emitted for 671 /// `constant` if it has not been previously emitted. Note that this 672 /// function may return a different label for the same constant at 673 /// different points in time. The label is valid to use only from the 674 /// current location; the MachBuffer takes care to emit the same constant 675 /// multiple times if needed so the constant is always in range. 676 pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel { 677 let MachBufferConstant { 678 align, 679 size, 680 upcoming_label, 681 } = self.constants[constant]; 682 if let Some(label) = upcoming_label { 683 return label; 684 } 685 686 let label = self.get_label(); 687 trace!( 688 "defer constant: eventually emit {size} bytes aligned \ 689 to {align} at label {label:?}", 690 ); 691 self.pending_constants.push(constant); 692 self.pending_constants_size += size as u32; 693 self.constants[constant].upcoming_label = Some(label); 694 label 695 } 696 697 /// Bind a label to the current offset. A label can only be bound once. 698 pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) { 699 trace!( 700 "MachBuffer: bind label {:?} at offset {}", 701 label, 702 self.cur_offset() 703 ); 704 debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET); 705 debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL); 706 let offset = self.cur_offset(); 707 self.label_offsets[label.0 as usize] = offset; 708 self.lazily_clear_labels_at_tail(); 709 self.labels_at_tail.push(label); 710 711 // Invariants hold: bound offset of label is <= cur_offset (in fact it 712 // is equal). If the `labels_at_tail` list was complete and precise 713 // before, it is still, because we have bound this label to the current 714 // offset and added it to the list (which contains all labels at the 715 // current offset). 716 717 self.optimize_branches(ctrl_plane); 718 719 // Post-invariant: by `optimize_branches()` (see argument there). 720 } 721 722 /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the 723 /// offset that it applies to. 724 fn lazily_clear_labels_at_tail(&mut self) { 725 let offset = self.cur_offset(); 726 if offset > self.labels_at_tail_off { 727 self.labels_at_tail_off = offset; 728 self.labels_at_tail.clear(); 729 } 730 731 // Post-invariant: either labels_at_tail_off was at cur_offset, and 732 // state is untouched, or was less than cur_offset, in which case the 733 // labels_at_tail list was conceptually empty, and is now actually 734 // empty. 735 } 736 737 /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`. 738 pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset { 739 let mut iters = 0; 740 while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL { 741 label = self.label_aliases[label.0 as usize]; 742 // To protect against an infinite loop (despite our assurances to 743 // ourselves that the invariants make this impossible), assert out 744 // after 1M iterations. The number of basic blocks is limited 745 // in most contexts anyway so this should be impossible to hit with 746 // a legitimate input. 747 iters += 1; 748 assert!(iters < 1_000_000, "Unexpected cycle in label aliases"); 749 } 750 self.label_offsets[label.0 as usize] 751 752 // Post-invariant: no mutations. 753 } 754 755 /// Emit a reference to the given label with the given reference type (i.e., 756 /// branch-instruction format) at the current offset. This is like a 757 /// relocation, but handled internally. 758 /// 759 /// This can be called before the branch is actually emitted; fixups will 760 /// not happen until an island is emitted or the buffer is finished. 761 pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) { 762 trace!( 763 "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}", 764 offset, label, kind 765 ); 766 767 // Add the fixup, and update the worst-case island size based on a 768 // veneer for this label use. 769 let fixup = MachLabelFixup { 770 label, 771 offset, 772 kind, 773 }; 774 self.pending_fixup_deadline = self 775 .pending_fixup_deadline 776 // Subtract one alignment here to the deadline to account for 777 // extra space taken by aligning an island. 778 .min(fixup.deadline() - I::LabelUse::ALIGN); 779 trace!("pending_fixup_deadline = {}", self.pending_fixup_deadline); 780 self.pending_fixup_records.push(fixup); 781 782 // Post-invariant: no mutations to branches/labels data structures. 783 } 784 785 /// Inform the buffer of an unconditional branch at the given offset, 786 /// targeting the given label. May be used to optimize branches. 787 /// The last added label-use must correspond to this branch. 788 /// This must be called when the current offset is equal to `start`; i.e., 789 /// before actually emitting the branch. This implies that for a branch that 790 /// uses a label and is eligible for optimizations by the MachBuffer, the 791 /// proper sequence is: 792 /// 793 /// - Call `use_label_at_offset()` to emit the fixup record. 794 /// - Call `add_uncond_branch()` to make note of the branch. 795 /// - Emit the bytes for the branch's machine code. 796 /// 797 /// Additional requirement: no labels may be bound between `start` and `end` 798 /// (exclusive on both ends). 799 pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) { 800 debug_assert!( 801 !self.open_patchable, 802 "Branch instruction inserted within a patchable region" 803 ); 804 assert!(self.cur_offset() == start); 805 debug_assert!(end > start); 806 assert!(!self.pending_fixup_records.is_empty()); 807 let fixup = self.pending_fixup_records.len() - 1; 808 self.lazily_clear_labels_at_tail(); 809 self.latest_branches.push(MachBranch { 810 start, 811 end, 812 target, 813 fixup, 814 inverted: None, 815 labels_at_this_branch: self.labels_at_tail.clone(), 816 }); 817 818 // Post-invariant: we asserted branch start is current tail; the list of 819 // labels at branch is cloned from list of labels at current tail. 820 } 821 822 /// Inform the buffer of a conditional branch at the given offset, 823 /// targeting the given label. May be used to optimize branches. 824 /// The last added label-use must correspond to this branch. 825 /// 826 /// Additional requirement: no labels may be bound between `start` and `end` 827 /// (exclusive on both ends). 828 pub fn add_cond_branch( 829 &mut self, 830 start: CodeOffset, 831 end: CodeOffset, 832 target: MachLabel, 833 inverted: &[u8], 834 ) { 835 debug_assert!( 836 !self.open_patchable, 837 "Branch instruction inserted within a patchable region" 838 ); 839 assert!(self.cur_offset() == start); 840 debug_assert!(end > start); 841 assert!(!self.pending_fixup_records.is_empty()); 842 debug_assert!( 843 inverted.len() == (end - start) as usize, 844 "branch length = {}, but inverted length = {}", 845 end - start, 846 inverted.len() 847 ); 848 let fixup = self.pending_fixup_records.len() - 1; 849 let inverted = Some(SmallVec::from(inverted)); 850 self.lazily_clear_labels_at_tail(); 851 self.latest_branches.push(MachBranch { 852 start, 853 end, 854 target, 855 fixup, 856 inverted, 857 labels_at_this_branch: self.labels_at_tail.clone(), 858 }); 859 860 // Post-invariant: we asserted branch start is current tail; labels at 861 // branch list is cloned from list of labels at current tail. 862 } 863 864 fn truncate_last_branch(&mut self) { 865 debug_assert!( 866 !self.open_patchable, 867 "Branch instruction truncated within a patchable region" 868 ); 869 870 self.lazily_clear_labels_at_tail(); 871 // Invariants hold at this point. 872 873 let b = self.latest_branches.pop().unwrap(); 874 assert!(b.end == self.cur_offset()); 875 876 // State: 877 // [PRE CODE] 878 // Offset b.start, b.labels_at_this_branch: 879 // [BRANCH CODE] 880 // cur_off, self.labels_at_tail --> 881 // (end of buffer) 882 self.data.truncate(b.start as usize); 883 self.pending_fixup_records.truncate(b.fixup); 884 885 // Trim srclocs and debug tags now past the end of the buffer. 886 while let Some(last_srcloc) = self.srclocs.last_mut() { 887 if last_srcloc.end <= b.start { 888 break; 889 } 890 if last_srcloc.start < b.start { 891 last_srcloc.end = b.start; 892 break; 893 } 894 self.srclocs.pop(); 895 } 896 while let Some(last_debug_tag) = self.debug_tags.last() { 897 if last_debug_tag.offset <= b.start { 898 break; 899 } 900 self.debug_tags.pop(); 901 } 902 903 // State: 904 // [PRE CODE] 905 // cur_off, Offset b.start, b.labels_at_this_branch: 906 // (end of buffer) 907 // 908 // self.labels_at_tail --> (past end of buffer) 909 let cur_off = self.cur_offset(); 910 self.labels_at_tail_off = cur_off; 911 // State: 912 // [PRE CODE] 913 // cur_off, Offset b.start, b.labels_at_this_branch, 914 // self.labels_at_tail: 915 // (end of buffer) 916 // 917 // resolve_label_offset(l) for l in labels_at_tail: 918 // (past end of buffer) 919 920 trace!( 921 "truncate_last_branch: truncated {:?}; off now {}", 922 b, cur_off 923 ); 924 925 // Fix up resolved label offsets for labels at tail. 926 for &l in &self.labels_at_tail { 927 self.label_offsets[l.0 as usize] = cur_off; 928 } 929 // Old labels_at_this_branch are now at cur_off. 930 self.labels_at_tail.extend(b.labels_at_this_branch); 931 932 // Post-invariant: this operation is defined to truncate the buffer, 933 // which moves cur_off backward, and to move labels at the end of the 934 // buffer back to the start-of-branch offset. 935 // 936 // latest_branches satisfies all invariants: 937 // - it has no branches past the end of the buffer (branches are in 938 // order, we removed the last one, and we truncated the buffer to just 939 // before the start of that branch) 940 // - no labels were moved to lower offsets than the (new) cur_off, so 941 // the labels_at_this_branch list for any other branch need not change. 942 // 943 // labels_at_tail satisfies all invariants: 944 // - all labels that were at the tail after the truncated branch are 945 // moved backward to just before the branch, which becomes the new tail; 946 // thus every element in the list should remain (ensured by `.extend()` 947 // above). 948 // - all labels that refer to the new tail, which is the start-offset of 949 // the truncated branch, must be present. The `labels_at_this_branch` 950 // list in the truncated branch's record is a complete and precise list 951 // of exactly these labels; we append these to labels_at_tail. 952 // - labels_at_tail_off is at cur_off after truncation occurs, so the 953 // list is valid (not to be lazily cleared). 954 // 955 // The stated operation was performed: 956 // - For each label at the end of the buffer prior to this method, it 957 // now resolves to the new (truncated) end of the buffer: it must have 958 // been in `labels_at_tail` (this list is precise and complete, and 959 // the tail was at the end of the truncated branch on entry), and we 960 // iterate over this list and set `label_offsets` to the new tail. 961 // None of these labels could have been an alias (by invariant), so 962 // `label_offsets` is authoritative for each. 963 // - No other labels will be past the end of the buffer, because of the 964 // requirement that no labels be bound to the middle of branch ranges 965 // (see comments to `add_{cond,uncond}_branch()`). 966 // - The buffer is truncated to just before the last branch, and the 967 // fixup record referring to that last branch is removed. 968 } 969 970 /// Performs various optimizations on branches pointing at the current label. 971 pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) { 972 if ctrl_plane.get_decision() { 973 return; 974 } 975 976 self.lazily_clear_labels_at_tail(); 977 // Invariants valid at this point. 978 979 trace!( 980 "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 981 self.latest_branches, self.labels_at_tail, self.pending_fixup_records 982 ); 983 984 // We continue to munch on branches at the tail of the buffer until no 985 // more rules apply. Note that the loop only continues if a branch is 986 // actually truncated (or if labels are redirected away from a branch), 987 // so this always makes progress. 988 while let Some(b) = self.latest_branches.last() { 989 let cur_off = self.cur_offset(); 990 trace!("optimize_branches: last branch {:?} at off {}", b, cur_off); 991 // If there has been any code emission since the end of the last branch or 992 // label definition, then there's nothing we can edit (because we 993 // don't move code once placed, only back up and overwrite), so 994 // clear the records and finish. 995 if b.end < cur_off { 996 break; 997 } 998 999 // If the "labels at this branch" list on this branch is 1000 // longer than a threshold, don't do any simplification, 1001 // and let the branch remain to separate those labels from 1002 // the current tail. This avoids quadratic behavior (see 1003 // #3468): otherwise, if a long string of "goto next; 1004 // next:" patterns are emitted, all of the labels will 1005 // coalesce into a long list of aliases for the current 1006 // buffer tail. We must track all aliases of the current 1007 // tail for correctness, but we are also allowed to skip 1008 // optimization (removal) of any branch, so we take the 1009 // escape hatch here and let it stand. In effect this 1010 // "spreads" the many thousands of labels in the 1011 // pathological case among an actual (harmless but 1012 // suboptimal) instruction once per N labels. 1013 if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD { 1014 break; 1015 } 1016 1017 // Invariant: we are looking at a branch that ends at the tail of 1018 // the buffer. 1019 1020 // For any branch, conditional or unconditional: 1021 // - If the target is a label at the current offset, then remove 1022 // the conditional branch, and reset all labels that targeted 1023 // the current offset (end of branch) to the truncated 1024 // end-of-code. 1025 // 1026 // Preserves execution semantics: a branch to its own fallthrough 1027 // address is equivalent to a no-op; in both cases, nextPC is the 1028 // fallthrough. 1029 if self.resolve_label_offset(b.target) == cur_off { 1030 trace!("branch with target == cur off; truncating"); 1031 self.truncate_last_branch(); 1032 continue; 1033 } 1034 1035 // If latest is an unconditional branch: 1036 // 1037 // - If the branch's target is not its own start address, then for 1038 // each label at the start of branch, make the label an alias of the 1039 // branch target, and remove the label from the "labels at this 1040 // branch" list. 1041 // 1042 // - Preserves execution semantics: an unconditional branch's 1043 // only effect is to set PC to a new PC; this change simply 1044 // collapses one step in the step-semantics. 1045 // 1046 // - Post-invariant: the labels that were bound to the start of 1047 // this branch become aliases, so they must not be present in any 1048 // labels-at-this-branch list or the labels-at-tail list. The 1049 // labels are removed form the latest-branch record's 1050 // labels-at-this-branch list, and are never placed in the 1051 // labels-at-tail list. Furthermore, it is correct that they are 1052 // not in either list, because they are now aliases, and labels 1053 // that are aliases remain aliases forever. 1054 // 1055 // - If there is a prior unconditional branch that ends just before 1056 // this one begins, and this branch has no labels bound to its 1057 // start, then we can truncate this branch, because it is entirely 1058 // unreachable (we have redirected all labels that make it 1059 // reachable otherwise). Do so and continue around the loop. 1060 // 1061 // - Preserves execution semantics: the branch is unreachable, 1062 // because execution can only flow into an instruction from the 1063 // prior instruction's fallthrough or from a branch bound to that 1064 // instruction's start offset. Unconditional branches have no 1065 // fallthrough, so if the prior instruction is an unconditional 1066 // branch, no fallthrough entry can happen. The 1067 // labels-at-this-branch list is complete (by invariant), so if it 1068 // is empty, then the instruction is entirely unreachable. Thus, 1069 // it can be removed. 1070 // 1071 // - Post-invariant: ensured by truncate_last_branch(). 1072 // 1073 // - If there is a prior conditional branch whose target label 1074 // resolves to the current offset (branches around the 1075 // unconditional branch), then remove the unconditional branch, 1076 // and make the target of the unconditional the target of the 1077 // conditional instead. 1078 // 1079 // - Preserves execution semantics: previously we had: 1080 // 1081 // L1: 1082 // cond_br L2 1083 // br L3 1084 // L2: 1085 // (end of buffer) 1086 // 1087 // by removing the last branch, we have: 1088 // 1089 // L1: 1090 // cond_br L2 1091 // L2: 1092 // (end of buffer) 1093 // 1094 // we then fix up the records for the conditional branch to 1095 // have: 1096 // 1097 // L1: 1098 // cond_br.inverted L3 1099 // L2: 1100 // 1101 // In the original code, control flow reaches L2 when the 1102 // conditional branch's predicate is true, and L3 otherwise. In 1103 // the optimized code, the same is true. 1104 // 1105 // - Post-invariant: all edits to latest_branches and 1106 // labels_at_tail are performed by `truncate_last_branch()`, 1107 // which maintains the invariants at each step. 1108 1109 if b.is_uncond() { 1110 // Set any label equal to current branch's start as an alias of 1111 // the branch's target, if the target is not the branch itself 1112 // (i.e., an infinite loop). 1113 // 1114 // We cannot perform this aliasing if the target of this branch 1115 // ultimately aliases back here; if so, we need to keep this 1116 // branch, so break out of this loop entirely (and clear the 1117 // latest-branches list below). 1118 // 1119 // Note that this check is what prevents cycles from forming in 1120 // `self.label_aliases`. To see why, consider an arbitrary start 1121 // state: 1122 // 1123 // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to 1124 // Ln, which is not aliased. 1125 // 1126 // We would create a cycle if we assigned label_aliases[Ln] 1127 // = L1. Note that the below assignment is the only write 1128 // to label_aliases. 1129 // 1130 // By our other invariants, we have that Ln (`l` below) 1131 // resolves to the offset `b.start`, because it is in the 1132 // set `b.labels_at_this_branch`. 1133 // 1134 // If L1 were already aliased, through some arbitrarily deep 1135 // chain, to Ln, then it must also resolve to this offset 1136 // `b.start`. 1137 // 1138 // By checking the resolution of `L1` against this offset, 1139 // and aborting this branch-simplification if they are 1140 // equal, we prevent the below assignment from ever creating 1141 // a cycle. 1142 if self.resolve_label_offset(b.target) != b.start { 1143 let redirected = b.labels_at_this_branch.len(); 1144 for &l in &b.labels_at_this_branch { 1145 trace!( 1146 " -> label at start of branch {:?} redirected to target {:?}", 1147 l, b.target 1148 ); 1149 self.label_aliases[l.0 as usize] = b.target; 1150 // NOTE: we continue to ensure the invariant that labels 1151 // pointing to tail of buffer are in `labels_at_tail` 1152 // because we already ensured above that the last branch 1153 // cannot have a target of `cur_off`; so we never have 1154 // to put the label into `labels_at_tail` when moving it 1155 // here. 1156 } 1157 // Maintain invariant: all branches have been redirected 1158 // and are no longer pointing at the start of this branch. 1159 let mut_b = self.latest_branches.last_mut().unwrap(); 1160 mut_b.labels_at_this_branch.clear(); 1161 1162 if redirected > 0 { 1163 trace!(" -> after label redirects, restarting loop"); 1164 continue; 1165 } 1166 } else { 1167 break; 1168 } 1169 1170 let b = self.latest_branches.last().unwrap(); 1171 1172 // Examine any immediately preceding branch. 1173 if self.latest_branches.len() > 1 { 1174 let prev_b = &self.latest_branches[self.latest_branches.len() - 2]; 1175 trace!(" -> more than one branch; prev_b = {:?}", prev_b); 1176 // This uncond is immediately after another uncond; we 1177 // should have already redirected labels to this uncond away 1178 // (but check to be sure); so we can truncate this uncond. 1179 if prev_b.is_uncond() 1180 && prev_b.end == b.start 1181 && b.labels_at_this_branch.is_empty() 1182 { 1183 trace!(" -> uncond follows another uncond; truncating"); 1184 self.truncate_last_branch(); 1185 continue; 1186 } 1187 1188 // This uncond is immediately after a conditional, and the 1189 // conditional's target is the end of this uncond, and we've 1190 // already redirected labels to this uncond away; so we can 1191 // truncate this uncond, flip the sense of the conditional, and 1192 // set the conditional's target (in `latest_branches` and in 1193 // `fixup_records`) to the uncond's target. 1194 if prev_b.is_cond() 1195 && prev_b.end == b.start 1196 && self.resolve_label_offset(prev_b.target) == cur_off 1197 { 1198 trace!( 1199 " -> uncond follows a conditional, and conditional's target resolves to current offset" 1200 ); 1201 // Save the target of the uncond (this becomes the 1202 // target of the cond), and truncate the uncond. 1203 let target = b.target; 1204 let data = prev_b.inverted.clone().unwrap(); 1205 self.truncate_last_branch(); 1206 1207 // Mutate the code and cond branch. 1208 let off_before_edit = self.cur_offset(); 1209 let prev_b = self.latest_branches.last_mut().unwrap(); 1210 let not_inverted = SmallVec::from( 1211 &self.data[(prev_b.start as usize)..(prev_b.end as usize)], 1212 ); 1213 1214 // Low-level edit: replaces bytes of branch with 1215 // inverted form. cur_off remains the same afterward, so 1216 // we do not need to modify label data structures. 1217 self.data.truncate(prev_b.start as usize); 1218 self.data.extend_from_slice(&data[..]); 1219 1220 // Save the original code as the inversion of the 1221 // inverted branch, in case we later edit this branch 1222 // again. 1223 prev_b.inverted = Some(not_inverted); 1224 self.pending_fixup_records[prev_b.fixup].label = target; 1225 trace!(" -> reassigning target of condbr to {:?}", target); 1226 prev_b.target = target; 1227 debug_assert_eq!(off_before_edit, self.cur_offset()); 1228 continue; 1229 } 1230 } 1231 } 1232 1233 // If we couldn't do anything with the last branch, then break. 1234 break; 1235 } 1236 1237 self.purge_latest_branches(); 1238 1239 trace!( 1240 "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 1241 self.latest_branches, self.labels_at_tail, self.pending_fixup_records 1242 ); 1243 } 1244 1245 fn purge_latest_branches(&mut self) { 1246 // All of our branch simplification rules work only if a branch ends at 1247 // the tail of the buffer, with no following code; and branches are in 1248 // order in latest_branches; so if the last entry ends prior to 1249 // cur_offset, then clear all entries. 1250 let cur_off = self.cur_offset(); 1251 if let Some(l) = self.latest_branches.last() { 1252 if l.end < cur_off { 1253 trace!("purge_latest_branches: removing branch {:?}", l); 1254 self.latest_branches.clear(); 1255 } 1256 } 1257 1258 // Post-invariant: no invariant requires any branch to appear in 1259 // `latest_branches`; it is always optional. The list-clear above thus 1260 // preserves all semantics. 1261 } 1262 1263 /// Emit a trap at some point in the future with the specified code and 1264 /// stack map. 1265 /// 1266 /// This function returns a [`MachLabel`] which will be the future address 1267 /// of the trap. Jumps should refer to this label, likely by using the 1268 /// [`MachBuffer::use_label_at_offset`] method, to get a relocation 1269 /// patched in once the address of the trap is known. 1270 /// 1271 /// This will batch all traps into the end of the function. 1272 pub fn defer_trap(&mut self, code: TrapCode) -> MachLabel { 1273 let label = self.get_label(); 1274 self.pending_traps.push(MachLabelTrap { 1275 label, 1276 code, 1277 loc: self.cur_srcloc.map(|(_start, loc)| loc), 1278 }); 1279 label 1280 } 1281 1282 /// Is an island needed within the next N bytes? 1283 pub fn island_needed(&self, distance: CodeOffset) -> bool { 1284 let deadline = match self.fixup_records.peek() { 1285 Some(fixup) => fixup.deadline().min(self.pending_fixup_deadline), 1286 None => self.pending_fixup_deadline, 1287 }; 1288 trace!( 1289 "checking island_needed: cur_offset = {} deadline = {} worst_case_end_of_island = {}", 1290 self.cur_offset(), 1291 deadline, 1292 self.worst_case_end_of_island(distance) 1293 ); 1294 let needed = deadline < u32::MAX && self.worst_case_end_of_island(distance) > deadline; 1295 trace!(" -> needed = {needed}"); 1296 needed 1297 } 1298 1299 /// Returns the maximal offset that islands can reach if `distance` more 1300 /// bytes are appended. 1301 /// 1302 /// This is used to determine if veneers need insertions since jumps that 1303 /// can't reach past this point must get a veneer of some form. 1304 fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset { 1305 // Assume that all fixups will require veneers and that the veneers are 1306 // the worst-case size for each platform. This is an over-generalization 1307 // to avoid iterating over the `fixup_records` list or maintaining 1308 // information about it as we go along. 1309 let island_worst_case_size = ((self.fixup_records.len() + self.pending_fixup_records.len()) 1310 as u32) 1311 * (I::LabelUse::worst_case_veneer_size()) 1312 + self.pending_constants_size 1313 + (self.pending_traps.len() * I::TRAP_OPCODE.len()) as u32; 1314 self.cur_offset() 1315 .saturating_add(distance) 1316 .saturating_add(island_worst_case_size) 1317 } 1318 1319 /// Emit all pending constants and required pending veneers. 1320 /// 1321 /// Should only be called if `island_needed()` returns true, i.e., if we 1322 /// actually reach a deadline. It's not necessarily a problem to do so 1323 /// otherwise but it may result in unnecessary work during emission. 1324 pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) { 1325 self.emit_island_maybe_forced(ForceVeneers::No, distance, ctrl_plane); 1326 } 1327 1328 /// Same as `emit_island`, but an internal API with a `force_veneers` 1329 /// argument to force all veneers to always get emitted for debugging. 1330 fn emit_island_maybe_forced( 1331 &mut self, 1332 force_veneers: ForceVeneers, 1333 distance: CodeOffset, 1334 ctrl_plane: &mut ControlPlane, 1335 ) { 1336 trace!( 1337 "emitting island at {}, distance = {distance}", 1338 self.cur_offset() 1339 ); 1340 1341 // We're going to purge fixups, so no latest-branch editing can happen 1342 // anymore. 1343 self.latest_branches.clear(); 1344 1345 // End the current location tracking since anything emitted during this 1346 // function shouldn't be attributed to whatever the current source 1347 // location is. 1348 // 1349 // Note that the current source location, if it's set right now, will be 1350 // restored at the end of this island emission. 1351 let cur_loc = self.cur_srcloc.map(|(_, loc)| loc); 1352 if cur_loc.is_some() { 1353 self.end_srcloc(); 1354 } 1355 1356 let forced_threshold = self.worst_case_end_of_island(distance); 1357 trace!("forced_threshold = {forced_threshold}"); 1358 1359 // Emit traps/constants after the island: with potentially 1360 // unbounded pending constants/traps and potentially small 1361 // deadlines, it would otherwise be possible to emit a 1362 // small-range jump, have a nearby deadline *before* the end 1363 // of pending constants/traps, and not be able to emit a 1364 // veneer in time. 1365 // 1366 // Fixups whose labels aren't yet defined (e.g. references to 1367 // pending constants/traps) are simply deferred here; they'll 1368 // be resolved in the next island or in the final fixup pass 1369 // at the end of emission. 1370 1371 // Either handle all pending fixups because they're ready or move them 1372 // onto the `BinaryHeap` tracking all pending fixups if they aren't 1373 // ready. 1374 assert!(self.latest_branches.is_empty()); 1375 trace!( 1376 "About to handle fixups at offset {}: {:?}", 1377 self.cur_offset(), 1378 self.pending_fixup_records 1379 ); 1380 for fixup in mem::take(&mut self.pending_fixup_records) { 1381 if self.should_apply_fixup(&fixup, forced_threshold) { 1382 self.handle_fixup(fixup, force_veneers, forced_threshold); 1383 } else { 1384 self.fixup_records.push(fixup); 1385 } 1386 } 1387 self.pending_fixup_deadline = u32::MAX; 1388 while let Some(fixup) = self.fixup_records.peek() { 1389 trace!( 1390 "emit_island: fixup {:?} deadline {}", 1391 fixup, 1392 fixup.deadline() 1393 ); 1394 1395 // If this fixup shouldn't be applied, that means its label isn't 1396 // defined yet and there'll be remaining space to apply a veneer if 1397 // necessary in the future after this island. In that situation 1398 // because `fixup_records` is sorted by deadline this loop can 1399 // exit. 1400 if !self.should_apply_fixup(fixup, forced_threshold) { 1401 break; 1402 } 1403 1404 let fixup = self.fixup_records.pop().unwrap(); 1405 self.handle_fixup(fixup, force_veneers, forced_threshold); 1406 } 1407 1408 // Now emit pending traps and constants. 1409 // 1410 // Note that traps are placed first since this typically happens at the 1411 // end of the function and for disassemblers we try to keep all the code 1412 // contiguously together. 1413 trace!("emitting pending traps: {:?}", self.pending_traps); 1414 for MachLabelTrap { label, code, loc } in mem::take(&mut self.pending_traps) { 1415 // If this trap has source information associated with it then 1416 // emit this information for the trap instruction going out now too. 1417 if let Some(loc) = loc { 1418 self.start_srcloc(loc); 1419 } 1420 self.align_to(I::LabelUse::ALIGN); 1421 self.bind_label(label, ctrl_plane); 1422 self.add_trap(code); 1423 self.put_data(I::TRAP_OPCODE); 1424 if loc.is_some() { 1425 self.end_srcloc(); 1426 } 1427 } 1428 1429 trace!("emitting pending constants: {:?}", self.pending_constants); 1430 for constant in mem::take(&mut self.pending_constants) { 1431 let MachBufferConstant { align, size, .. } = self.constants[constant]; 1432 let label = self.constants[constant].upcoming_label.take().unwrap(); 1433 self.align_to(align); 1434 self.bind_label(label, ctrl_plane); 1435 self.used_constants.push((constant, self.cur_offset())); 1436 self.get_appended_space(size); 1437 } 1438 1439 if let Some(loc) = cur_loc { 1440 self.start_srcloc(loc); 1441 } 1442 } 1443 1444 fn should_apply_fixup(&self, fixup: &MachLabelFixup<I>, forced_threshold: CodeOffset) -> bool { 1445 let label_offset = self.resolve_label_offset(fixup.label); 1446 trace!( 1447 "should_apply_fixup: fixup {fixup:?} label_offset {label_offset} deadline {} forced_threshold {forced_threshold} supports_veneer {}", 1448 fixup.deadline(), 1449 fixup.kind.supports_veneer() 1450 ); 1451 let result = (label_offset != UNKNOWN_LABEL_OFFSET) 1452 || ((fixup.deadline() < forced_threshold) && fixup.kind.supports_veneer()); 1453 trace!( 1454 " -> {}, {}, {} -> {result}", 1455 label_offset != UNKNOWN_LABEL_OFFSET, 1456 fixup.deadline() < forced_threshold, 1457 fixup.kind.supports_veneer() 1458 ); 1459 result 1460 } 1461 1462 fn handle_fixup( 1463 &mut self, 1464 fixup: MachLabelFixup<I>, 1465 force_veneers: ForceVeneers, 1466 forced_threshold: CodeOffset, 1467 ) { 1468 let MachLabelFixup { 1469 label, 1470 offset, 1471 kind, 1472 } = fixup; 1473 let start = offset as usize; 1474 let end = (offset + kind.patch_size()) as usize; 1475 let label_offset = self.resolve_label_offset(label); 1476 1477 if label_offset != UNKNOWN_LABEL_OFFSET { 1478 // If the offset of the label for this fixup is known then 1479 // we're going to do something here-and-now. We're either going 1480 // to patch the original offset because it's an in-bounds jump, 1481 // or we're going to generate a veneer, patch the fixup to jump 1482 // to the veneer, and then keep going. 1483 // 1484 // If the label comes after the original fixup, then we should 1485 // be guaranteed that the jump is in-bounds. Otherwise there's 1486 // a bug somewhere because this method wasn't called soon 1487 // enough. All forward-jumps are tracked and should get veneers 1488 // before their deadline comes and they're unable to jump 1489 // further. 1490 // 1491 // Otherwise if the label is before the fixup, then that's a 1492 // backwards jump. If it's past the maximum negative range 1493 // then we'll emit a veneer that to jump forward to which can 1494 // then jump backwards. 1495 let veneer_required = if label_offset >= offset { 1496 assert!((label_offset - offset) <= kind.max_pos_range()); 1497 false 1498 } else { 1499 (offset - label_offset) > kind.max_neg_range() 1500 }; 1501 trace!( 1502 " -> label_offset = {}, known, required = {} (pos {} neg {})", 1503 label_offset, 1504 veneer_required, 1505 kind.max_pos_range(), 1506 kind.max_neg_range() 1507 ); 1508 1509 if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required { 1510 self.emit_veneer(label, offset, kind); 1511 } else { 1512 let slice = &mut self.data[start..end]; 1513 trace!( 1514 "patching in-range! slice = {slice:?}; offset = {offset:#x}; label_offset = {label_offset:#x}" 1515 ); 1516 kind.patch(slice, offset, label_offset); 1517 } 1518 } else { 1519 // If the offset of this label is not known at this time then 1520 // that means that a veneer is required because after this 1521 // island the target can't be in range of the original target. 1522 assert!(forced_threshold - offset > kind.max_pos_range()); 1523 self.emit_veneer(label, offset, kind); 1524 } 1525 } 1526 1527 /// Emits a "veneer" the `kind` code at `offset` to jump to `label`. 1528 /// 1529 /// This will generate extra machine code, using `kind`, to get a 1530 /// larger-jump-kind than `kind` allows. The code at `offset` is then 1531 /// patched to jump to our new code, and then the new code is enqueued for 1532 /// a fixup to get processed at some later time. 1533 fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) { 1534 // If this `kind` doesn't support a veneer then that's a bug in the 1535 // backend because we need to implement support for such a veneer. 1536 assert!( 1537 kind.supports_veneer(), 1538 "jump beyond the range of {kind:?} but a veneer isn't supported", 1539 ); 1540 1541 // Allocate space for a veneer in the island. 1542 self.align_to(I::LabelUse::ALIGN); 1543 let veneer_offset = self.cur_offset(); 1544 trace!("making a veneer at {}", veneer_offset); 1545 let start = offset as usize; 1546 let end = (offset + kind.patch_size()) as usize; 1547 let slice = &mut self.data[start..end]; 1548 // Patch the original label use to refer to the veneer. 1549 trace!( 1550 "patching original at offset {} to veneer offset {}", 1551 offset, veneer_offset 1552 ); 1553 kind.patch(slice, offset, veneer_offset); 1554 // Generate the veneer. 1555 let veneer_slice = self.get_appended_space(kind.veneer_size() as usize); 1556 let (veneer_fixup_off, veneer_label_use) = 1557 kind.generate_veneer(veneer_slice, veneer_offset); 1558 trace!( 1559 "generated veneer; fixup offset {}, label_use {:?}", 1560 veneer_fixup_off, veneer_label_use 1561 ); 1562 // Register a new use of `label` with our new veneer fixup and 1563 // offset. This'll recalculate deadlines accordingly and 1564 // enqueue this fixup to get processed at some later 1565 // time. 1566 self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use); 1567 } 1568 1569 fn finish_emission_maybe_forcing_veneers( 1570 &mut self, 1571 force_veneers: ForceVeneers, 1572 ctrl_plane: &mut ControlPlane, 1573 ) { 1574 while !self.pending_constants.is_empty() 1575 || !self.pending_traps.is_empty() 1576 || !self.fixup_records.is_empty() 1577 || !self.pending_fixup_records.is_empty() 1578 { 1579 // `emit_island()` will emit any pending veneers and constants, and 1580 // as a side-effect, will also take care of any fixups with resolved 1581 // labels eagerly. 1582 self.emit_island_maybe_forced(force_veneers, 0, ctrl_plane); 1583 } 1584 1585 // Ensure that all labels have been fixed up after the last island is emitted. This is a 1586 // full (release-mode) assert because an unresolved label means the emitted code is 1587 // incorrect. 1588 assert!(self.fixup_records.is_empty()); 1589 assert!(self.pending_fixup_records.is_empty()); 1590 } 1591 1592 /// Finish any deferred emissions and/or fixups. 1593 pub fn finish( 1594 mut self, 1595 constants: &VCodeConstants, 1596 ctrl_plane: &mut ControlPlane, 1597 ) -> MachBufferFinalized<Stencil> { 1598 let _tt = timing::vcode_emit_finish(); 1599 1600 self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane); 1601 1602 let alignment = self.finish_constants(constants); 1603 1604 // Resolve all labels to their offsets. 1605 let finalized_relocs = self 1606 .relocs 1607 .iter() 1608 .map(|reloc| FinalizedMachReloc { 1609 offset: reloc.offset, 1610 kind: reloc.kind, 1611 addend: reloc.addend, 1612 target: match &reloc.target { 1613 RelocTarget::ExternalName(name) => { 1614 FinalizedRelocTarget::ExternalName(name.clone()) 1615 } 1616 RelocTarget::Label(label) => { 1617 FinalizedRelocTarget::Func(self.resolve_label_offset(*label)) 1618 } 1619 }, 1620 }) 1621 .collect(); 1622 1623 let finalized_exception_handlers = self 1624 .exception_handlers 1625 .iter() 1626 .map(|handler| handler.finalize(|label| self.resolve_label_offset(label))) 1627 .collect(); 1628 1629 let mut srclocs = self.srclocs; 1630 srclocs.sort_by_key(|entry| entry.start); 1631 1632 MachBufferFinalized { 1633 data: self.data, 1634 relocs: finalized_relocs, 1635 traps: self.traps, 1636 call_sites: self.call_sites, 1637 patchable_call_sites: self.patchable_call_sites, 1638 exception_handlers: finalized_exception_handlers, 1639 srclocs, 1640 debug_tags: self.debug_tags, 1641 debug_tag_pool: self.debug_tag_pool, 1642 user_stack_maps: self.user_stack_maps, 1643 unwind_info: self.unwind_info, 1644 alignment, 1645 frame_layout: self.frame_layout, 1646 nop_units: I::gen_nop_units(), 1647 } 1648 } 1649 1650 /// Add an external relocation at the given offset. 1651 pub fn add_reloc_at_offset<T: Into<RelocTarget> + Clone>( 1652 &mut self, 1653 offset: CodeOffset, 1654 kind: Reloc, 1655 target: &T, 1656 addend: Addend, 1657 ) { 1658 let target: RelocTarget = target.clone().into(); 1659 // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally 1660 // generate a label-use statement to track whether an island is possibly 1661 // needed to escape this function to actually get to the external name. 1662 // This is most likely to come up on AArch64 where calls between 1663 // functions use a 26-bit signed offset which gives +/- 64MB. This means 1664 // that if a function is 128MB in size and there's a call in the middle 1665 // it's impossible to reach the actual target. Also, while it's 1666 // technically possible to jump to the start of a function and then jump 1667 // further, island insertion below always inserts islands after 1668 // previously appended code so for Cranelift's own implementation this 1669 // is also a problem for 64MB functions on AArch64 which start with a 1670 // call instruction, those won't be able to escape. 1671 // 1672 // Ideally what needs to happen here is that a `LabelUse` is 1673 // transparently generated (or call-sites of this function are audited 1674 // to generate a `LabelUse` instead) and tracked internally. The actual 1675 // relocation would then change over time if and when a veneer is 1676 // inserted, where the relocation here would be patched by this 1677 // `MachBuffer` to jump to the veneer. The problem, though, is that all 1678 // this still needs to end up, in the case of a singular function, 1679 // generating a final relocation pointing either to this particular 1680 // relocation or to the veneer inserted. Additionally 1681 // `MachBuffer` needs the concept of a label which will never be 1682 // resolved, so `emit_island` doesn't trip over not actually ever 1683 // knowing what some labels are. Currently the loop in 1684 // `finish_emission_maybe_forcing_veneers` would otherwise infinitely 1685 // loop. 1686 // 1687 // For now this means that because relocs aren't tracked at all that 1688 // AArch64 functions have a rough size limits of 64MB. For now that's 1689 // somewhat reasonable and the failure mode is a panic in `MachBuffer` 1690 // when a relocation can't otherwise be resolved later, so it shouldn't 1691 // actually result in any memory unsafety or anything like that. 1692 self.relocs.push(MachReloc { 1693 offset, 1694 kind, 1695 target, 1696 addend, 1697 }); 1698 } 1699 1700 /// Add an external relocation at the current offset. 1701 pub fn add_reloc<T: Into<RelocTarget> + Clone>( 1702 &mut self, 1703 kind: Reloc, 1704 target: &T, 1705 addend: Addend, 1706 ) { 1707 self.add_reloc_at_offset(self.data.len() as CodeOffset, kind, target, addend); 1708 } 1709 1710 /// Add a trap record at the current offset. 1711 pub fn add_trap(&mut self, code: TrapCode) { 1712 self.traps.push(MachTrap { 1713 offset: self.data.len() as CodeOffset, 1714 code, 1715 }); 1716 } 1717 1718 /// Add a call-site record at the current offset. 1719 pub fn add_call_site(&mut self) { 1720 self.add_try_call_site(None, core::iter::empty()); 1721 } 1722 1723 /// Add a call-site record at the current offset with exception 1724 /// handlers. 1725 pub fn add_try_call_site( 1726 &mut self, 1727 frame_offset: Option<u32>, 1728 exception_handlers: impl Iterator<Item = MachExceptionHandler>, 1729 ) { 1730 let start = u32::try_from(self.exception_handlers.len()).unwrap(); 1731 self.exception_handlers.extend(exception_handlers); 1732 let end = u32::try_from(self.exception_handlers.len()).unwrap(); 1733 let exception_handler_range = start..end; 1734 1735 self.call_sites.push(MachCallSite { 1736 ret_addr: self.data.len() as CodeOffset, 1737 frame_offset, 1738 exception_handler_range, 1739 }); 1740 } 1741 1742 /// Add a patchable call record at the current offset The actual 1743 /// call is expected to have been emitted; the VCodeInst trait 1744 /// specifies how to NOP it out, and we carry that information to 1745 /// the finalized Machbuffer. 1746 pub fn add_patchable_call_site(&mut self, len: u32) { 1747 self.patchable_call_sites.push(MachPatchableCallSite { 1748 ret_addr: self.cur_offset(), 1749 len, 1750 }); 1751 } 1752 1753 /// Add an unwind record at the current offset. 1754 pub fn add_unwind(&mut self, unwind: UnwindInst) { 1755 self.unwind_info.push((self.cur_offset(), unwind)); 1756 } 1757 1758 /// Set the `SourceLoc` for code from this offset until the offset at the 1759 /// next call to `end_srcloc()`. 1760 /// Returns the current [CodeOffset] and [RelSourceLoc]. 1761 pub fn start_srcloc(&mut self, loc: RelSourceLoc) -> (CodeOffset, RelSourceLoc) { 1762 let cur = (self.cur_offset(), loc); 1763 self.cur_srcloc = Some(cur); 1764 cur 1765 } 1766 1767 /// Mark the end of the `SourceLoc` segment started at the last 1768 /// `start_srcloc()` call. 1769 pub fn end_srcloc(&mut self) { 1770 let (start, loc) = self 1771 .cur_srcloc 1772 .take() 1773 .expect("end_srcloc() called without start_srcloc()"); 1774 let end = self.cur_offset(); 1775 // Skip zero-length extends. 1776 debug_assert!(end >= start); 1777 if end > start { 1778 self.srclocs.push(MachSrcLoc { start, end, loc }); 1779 } 1780 } 1781 1782 /// Push a user stack map onto this buffer. 1783 /// 1784 /// The stack map is associated with the given `return_addr` code 1785 /// offset. This must be the PC for the instruction just *after* this stack 1786 /// map's associated instruction. For example in the sequence `call $foo; 1787 /// add r8, rax`, the `return_addr` must be the offset of the start of the 1788 /// `add` instruction. 1789 /// 1790 /// Stack maps must be pushed in sorted `return_addr` order. 1791 pub fn push_user_stack_map( 1792 &mut self, 1793 emit_state: &I::State, 1794 return_addr: CodeOffset, 1795 mut stack_map: ir::UserStackMap, 1796 ) { 1797 let span = emit_state.frame_layout().active_size(); 1798 trace!("Adding user stack map @ {return_addr:#x} spanning {span} bytes: {stack_map:?}"); 1799 1800 debug_assert!( 1801 self.user_stack_maps 1802 .last() 1803 .map_or(true, |(prev_addr, _, _)| *prev_addr < return_addr), 1804 "pushed stack maps out of order: {} is not less than {}", 1805 self.user_stack_maps.last().unwrap().0, 1806 return_addr, 1807 ); 1808 1809 stack_map.finalize(emit_state.frame_layout().sp_to_sized_stack_slots()); 1810 self.user_stack_maps.push((return_addr, span, stack_map)); 1811 } 1812 1813 /// Push a debug tag associated with the current buffer offset. 1814 pub fn push_debug_tags(&mut self, pos: MachDebugTagPos, tags: &[DebugTag]) { 1815 trace!("debug tags at offset {}: {tags:?}", self.cur_offset()); 1816 let start = u32::try_from(self.debug_tag_pool.len()).unwrap(); 1817 self.debug_tag_pool.extend(tags.iter().cloned()); 1818 let end = u32::try_from(self.debug_tag_pool.len()).unwrap(); 1819 self.debug_tags.push(MachDebugTags { 1820 offset: self.cur_offset(), 1821 pos, 1822 range: start..end, 1823 }); 1824 } 1825 1826 /// Increase the alignment of the buffer to the given alignment if bigger 1827 /// than the current alignment. 1828 pub fn set_log2_min_function_alignment(&mut self, align_to: u8) { 1829 self.min_alignment = self.min_alignment.max( 1830 1u32.checked_shl(u32::from(align_to)) 1831 .expect("log2_min_function_alignment too large"), 1832 ); 1833 } 1834 1835 /// Set the frame layout metadata. 1836 pub fn set_frame_layout(&mut self, frame_layout: MachBufferFrameLayout) { 1837 debug_assert!(self.frame_layout.is_none()); 1838 self.frame_layout = Some(frame_layout); 1839 } 1840 } 1841 1842 impl<I: VCodeInst> Extend<u8> for MachBuffer<I> { 1843 fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) { 1844 for b in iter { 1845 self.put1(b); 1846 } 1847 } 1848 } 1849 1850 impl<T: CompilePhase> MachBufferFinalized<T> { 1851 /// Get a list of source location mapping tuples in sorted-by-start-offset order. 1852 pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] { 1853 &self.srclocs[..] 1854 } 1855 1856 /// Get all debug tags, sorted by associated offset. 1857 pub fn debug_tags(&self) -> impl Iterator<Item = MachBufferDebugTagList<'_>> { 1858 self.debug_tags.iter().map(|tags| { 1859 let start = usize::try_from(tags.range.start).unwrap(); 1860 let end = usize::try_from(tags.range.end).unwrap(); 1861 MachBufferDebugTagList { 1862 offset: tags.offset, 1863 pos: tags.pos, 1864 tags: &self.debug_tag_pool[start..end], 1865 } 1866 }) 1867 } 1868 1869 /// Get the total required size for the code. 1870 pub fn total_size(&self) -> CodeOffset { 1871 self.data.len() as CodeOffset 1872 } 1873 1874 /// Return the code in this mach buffer as a hex string for testing purposes. 1875 pub fn stringify_code_bytes(&self) -> String { 1876 // This is pretty lame, but whatever .. 1877 use core::fmt::Write; 1878 let mut s = String::with_capacity(self.data.len() * 2); 1879 for b in &self.data { 1880 write!(&mut s, "{b:02X}").unwrap(); 1881 } 1882 s 1883 } 1884 1885 /// Get the code bytes. 1886 pub fn data(&self) -> &[u8] { 1887 // N.B.: we emit every section into the .text section as far as 1888 // the `CodeSink` is concerned; we do not bother to segregate 1889 // the contents into the actual program text, the jumptable and the 1890 // rodata (constant pool). This allows us to generate code assuming 1891 // that these will not be relocated relative to each other, and avoids 1892 // having to designate each section as belonging in one of the three 1893 // fixed categories defined by `CodeSink`. If this becomes a problem 1894 // later (e.g. because of memory permissions or similar), we can 1895 // add this designation and segregate the output; take care, however, 1896 // to add the appropriate relocations in this case. 1897 1898 &self.data[..] 1899 } 1900 1901 /// Get a mutable slice of the code bytes, allowing patching 1902 /// post-passes. 1903 pub fn data_mut(&mut self) -> &mut [u8] { 1904 &mut self.data[..] 1905 } 1906 1907 /// Get the list of external relocations for this code. 1908 pub fn relocs(&self) -> &[FinalizedMachReloc] { 1909 &self.relocs[..] 1910 } 1911 1912 /// Get the list of trap records for this code. 1913 pub fn traps(&self) -> &[MachTrap] { 1914 &self.traps[..] 1915 } 1916 1917 /// Get the user stack map metadata for this code. 1918 pub fn user_stack_maps(&self) -> &[(CodeOffset, u32, ir::UserStackMap)] { 1919 &self.user_stack_maps 1920 } 1921 1922 /// Take this buffer's user stack map metadata. 1923 pub fn take_user_stack_maps(&mut self) -> SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]> { 1924 mem::take(&mut self.user_stack_maps) 1925 } 1926 1927 /// Get the list of call sites for this code, along with 1928 /// associated exception handlers. 1929 /// 1930 /// Each item yielded by the returned iterator is a struct with: 1931 /// 1932 /// - The call site metadata record, with a `ret_addr` field 1933 /// directly accessible and denoting the offset of the return 1934 /// address into this buffer's code. 1935 /// - The slice of pairs of exception tags and code offsets 1936 /// denoting exception-handler entry points associated with this 1937 /// call site. 1938 pub fn call_sites(&self) -> impl Iterator<Item = FinalizedMachCallSite<'_>> + '_ { 1939 self.call_sites.iter().map(|call_site| { 1940 let handler_range = call_site.exception_handler_range.clone(); 1941 let handler_range = usize::try_from(handler_range.start).unwrap() 1942 ..usize::try_from(handler_range.end).unwrap(); 1943 FinalizedMachCallSite { 1944 ret_addr: call_site.ret_addr, 1945 frame_offset: call_site.frame_offset, 1946 exception_handlers: &self.exception_handlers[handler_range], 1947 } 1948 }) 1949 } 1950 1951 /// Get the frame layout, if known. 1952 pub fn frame_layout(&self) -> Option<&MachBufferFrameLayout> { 1953 self.frame_layout.as_ref() 1954 } 1955 1956 /// Get the list of patchable call sites for this code. 1957 /// 1958 /// Each location in the buffer contains the bytes for a call 1959 /// instruction to the specified target. If the call is to be 1960 /// patched out, the bytes in the region should be replaced with 1961 /// those given in the `MachBufferFinalized::nop` array, repeated 1962 /// as many times as necessary. (The length of the patchable 1963 /// region is guaranteed to be an integer multiple of that NOP 1964 /// unit size.) 1965 pub fn patchable_call_sites(&self) -> impl Iterator<Item = &MachPatchableCallSite> + '_ { 1966 self.patchable_call_sites.iter() 1967 } 1968 } 1969 1970 /// An item in the exception-handler list for a callsite, with label 1971 /// references. Items are interpreted in left-to-right order and the 1972 /// first match wins. 1973 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 1974 pub enum MachExceptionHandler { 1975 /// A specific tag (in the current dynamic context) should be 1976 /// handled by the code at the given offset. 1977 Tag(ExceptionTag, MachLabel), 1978 /// All exceptions should be handled by the code at the given 1979 /// offset. 1980 Default(MachLabel), 1981 /// The dynamic context for interpreting tags is updated to the 1982 /// value stored in the given machine location (in this frame's 1983 /// context). 1984 Context(ExceptionContextLoc), 1985 } 1986 1987 impl MachExceptionHandler { 1988 fn finalize<F: Fn(MachLabel) -> CodeOffset>(self, f: F) -> FinalizedMachExceptionHandler { 1989 match self { 1990 Self::Tag(tag, label) => FinalizedMachExceptionHandler::Tag(tag, f(label)), 1991 Self::Default(label) => FinalizedMachExceptionHandler::Default(f(label)), 1992 Self::Context(loc) => FinalizedMachExceptionHandler::Context(loc), 1993 } 1994 } 1995 } 1996 1997 /// An item in the exception-handler list for a callsite, with final 1998 /// (lowered) code offsets. Items are interpreted in left-to-right 1999 /// order and the first match wins. 2000 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 2001 #[cfg_attr( 2002 feature = "enable-serde", 2003 derive(serde_derive::Serialize, serde_derive::Deserialize) 2004 )] 2005 pub enum FinalizedMachExceptionHandler { 2006 /// A specific tag (in the current dynamic context) should be 2007 /// handled by the code at the given offset. 2008 Tag(ExceptionTag, CodeOffset), 2009 /// All exceptions should be handled by the code at the given 2010 /// offset. 2011 Default(CodeOffset), 2012 /// The dynamic context for interpreting tags is updated to the 2013 /// value stored in the given machine location (in this frame's 2014 /// context). 2015 Context(ExceptionContextLoc), 2016 } 2017 2018 /// A location for a dynamic exception context value. 2019 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 2020 #[cfg_attr( 2021 feature = "enable-serde", 2022 derive(serde_derive::Serialize, serde_derive::Deserialize) 2023 )] 2024 pub enum ExceptionContextLoc { 2025 /// An offset from SP at the callsite. 2026 SPOffset(u32), 2027 /// A GPR at the callsite. The physical register number for the 2028 /// GPR register file on the target architecture is used. 2029 GPR(u8), 2030 } 2031 2032 /// Metadata about a constant. 2033 struct MachBufferConstant { 2034 /// A label which has not yet been bound which can be used for this 2035 /// constant. 2036 /// 2037 /// This is lazily created when a label is requested for a constant and is 2038 /// cleared when a constant is emitted. 2039 upcoming_label: Option<MachLabel>, 2040 /// Required alignment. 2041 align: CodeOffset, 2042 /// The byte size of this constant. 2043 size: usize, 2044 } 2045 2046 /// A trap that is deferred to the next time an island is emitted for either 2047 /// traps, constants, or fixups. 2048 #[derive(Debug)] 2049 struct MachLabelTrap { 2050 /// This label will refer to the trap's offset. 2051 label: MachLabel, 2052 /// The code associated with this trap. 2053 code: TrapCode, 2054 /// An optional source location to assign for this trap. 2055 loc: Option<RelSourceLoc>, 2056 } 2057 2058 /// A fixup to perform on the buffer once code is emitted. Fixups always refer 2059 /// to labels and patch the code based on label offsets. Hence, they are like 2060 /// relocations, but internal to one buffer. 2061 #[derive(Debug)] 2062 struct MachLabelFixup<I: VCodeInst> { 2063 /// The label whose offset controls this fixup. 2064 label: MachLabel, 2065 /// The offset to fix up / patch to refer to this label. 2066 offset: CodeOffset, 2067 /// The kind of fixup. This is architecture-specific; each architecture may have, 2068 /// e.g., several types of branch instructions, each with differently-sized 2069 /// offset fields and different places within the instruction to place the 2070 /// bits. 2071 kind: I::LabelUse, 2072 } 2073 2074 impl<I: VCodeInst> MachLabelFixup<I> { 2075 fn deadline(&self) -> CodeOffset { 2076 self.offset.saturating_add(self.kind.max_pos_range()) 2077 } 2078 } 2079 2080 impl<I: VCodeInst> PartialEq for MachLabelFixup<I> { 2081 fn eq(&self, other: &Self) -> bool { 2082 self.deadline() == other.deadline() 2083 } 2084 } 2085 2086 impl<I: VCodeInst> Eq for MachLabelFixup<I> {} 2087 2088 impl<I: VCodeInst> PartialOrd for MachLabelFixup<I> { 2089 fn partial_cmp(&self, other: &Self) -> Option<Ordering> { 2090 Some(self.cmp(other)) 2091 } 2092 } 2093 2094 impl<I: VCodeInst> Ord for MachLabelFixup<I> { 2095 fn cmp(&self, other: &Self) -> Ordering { 2096 other.deadline().cmp(&self.deadline()) 2097 } 2098 } 2099 2100 /// A relocation resulting from a compilation. 2101 #[derive(Clone, Debug, PartialEq)] 2102 #[cfg_attr( 2103 feature = "enable-serde", 2104 derive(serde_derive::Serialize, serde_derive::Deserialize) 2105 )] 2106 pub struct MachRelocBase<T> { 2107 /// The offset at which the relocation applies, *relative to the 2108 /// containing section*. 2109 pub offset: CodeOffset, 2110 /// The kind of relocation. 2111 pub kind: Reloc, 2112 /// The external symbol / name to which this relocation refers. 2113 pub target: T, 2114 /// The addend to add to the symbol value. 2115 pub addend: i64, 2116 } 2117 2118 type MachReloc = MachRelocBase<RelocTarget>; 2119 2120 /// A relocation resulting from a compilation. 2121 pub type FinalizedMachReloc = MachRelocBase<FinalizedRelocTarget>; 2122 2123 /// A Relocation target 2124 #[derive(Debug, Clone, PartialEq, Eq, Hash)] 2125 pub enum RelocTarget { 2126 /// Points to an [ExternalName] outside the current function. 2127 ExternalName(ExternalName), 2128 /// Points to a [MachLabel] inside this function. 2129 /// This is different from [MachLabelFixup] in that both the relocation and the 2130 /// label will be emitted and are only resolved at link time. 2131 /// 2132 /// There is no reason to prefer this over [MachLabelFixup] unless the ABI requires it. 2133 Label(MachLabel), 2134 } 2135 2136 impl From<ExternalName> for RelocTarget { 2137 fn from(name: ExternalName) -> Self { 2138 Self::ExternalName(name) 2139 } 2140 } 2141 2142 impl From<MachLabel> for RelocTarget { 2143 fn from(label: MachLabel) -> Self { 2144 Self::Label(label) 2145 } 2146 } 2147 2148 /// A Relocation target 2149 #[derive(Debug, Clone, PartialEq, Eq, Hash)] 2150 #[cfg_attr( 2151 feature = "enable-serde", 2152 derive(serde_derive::Serialize, serde_derive::Deserialize) 2153 )] 2154 pub enum FinalizedRelocTarget { 2155 /// Points to an [ExternalName] outside the current function. 2156 ExternalName(ExternalName), 2157 /// Points to a [CodeOffset] from the start of the current function. 2158 Func(CodeOffset), 2159 } 2160 2161 impl FinalizedRelocTarget { 2162 /// Returns a display for the current [FinalizedRelocTarget], with extra context to prettify the 2163 /// output. 2164 pub fn display<'a>(&'a self, params: Option<&'a FunctionParameters>) -> String { 2165 match self { 2166 FinalizedRelocTarget::ExternalName(name) => format!("{}", name.display(params)), 2167 FinalizedRelocTarget::Func(offset) => format!("func+{offset}"), 2168 } 2169 } 2170 } 2171 2172 /// A trap record resulting from a compilation. 2173 #[derive(Clone, Debug, PartialEq)] 2174 #[cfg_attr( 2175 feature = "enable-serde", 2176 derive(serde_derive::Serialize, serde_derive::Deserialize) 2177 )] 2178 pub struct MachTrap { 2179 /// The offset at which the trap instruction occurs, *relative to the 2180 /// containing section*. 2181 pub offset: CodeOffset, 2182 /// The trap code. 2183 pub code: TrapCode, 2184 } 2185 2186 /// A call site record resulting from a compilation. 2187 #[derive(Clone, Debug, PartialEq)] 2188 #[cfg_attr( 2189 feature = "enable-serde", 2190 derive(serde_derive::Serialize, serde_derive::Deserialize) 2191 )] 2192 pub struct MachCallSite { 2193 /// The offset of the call's return address, *relative to the 2194 /// start of the buffer*. 2195 pub ret_addr: CodeOffset, 2196 2197 /// The offset from the FP at this callsite down to the SP when 2198 /// the call occurs, if known. In other words, the size of the 2199 /// stack frame up to the saved FP slot. Useful to recover the 2200 /// start of the stack frame and to look up dynamic contexts 2201 /// stored in [`ExceptionContextLoc::SPOffset`]. 2202 /// 2203 /// If `None`, the compiler backend did not specify a frame 2204 /// offset. The runtime in use with the compiled code may require 2205 /// the frame offset if exception handlers are present or dynamic 2206 /// context is used, but that is not Cranelift's concern: the 2207 /// frame offset is optional at this level. 2208 pub frame_offset: Option<u32>, 2209 2210 /// Range in `exception_handlers` corresponding to the exception 2211 /// handlers for this callsite. 2212 exception_handler_range: Range<u32>, 2213 } 2214 2215 /// A call site record resulting from a compilation. 2216 #[derive(Clone, Debug, PartialEq)] 2217 pub struct FinalizedMachCallSite<'a> { 2218 /// The offset of the call's return address, *relative to the 2219 /// start of the buffer*. 2220 pub ret_addr: CodeOffset, 2221 2222 /// The offset from the FP at this callsite down to the SP when 2223 /// the call occurs, if known. In other words, the size of the 2224 /// stack frame up to the saved FP slot. Useful to recover the 2225 /// start of the stack frame and to look up dynamic contexts 2226 /// stored in [`ExceptionContextLoc::SPOffset`]. 2227 /// 2228 /// If `None`, the compiler backend did not specify a frame 2229 /// offset. The runtime in use with the compiled code may require 2230 /// the frame offset if exception handlers are present or dynamic 2231 /// context is used, but that is not Cranelift's concern: the 2232 /// frame offset is optional at this level. 2233 pub frame_offset: Option<u32>, 2234 2235 /// Exception handlers at this callsite, with target offsets 2236 /// *relative to the start of the buffer*. 2237 pub exception_handlers: &'a [FinalizedMachExceptionHandler], 2238 } 2239 2240 /// A patchable call site record resulting from a compilation. 2241 #[derive(Clone, Debug, PartialEq)] 2242 #[cfg_attr( 2243 feature = "enable-serde", 2244 derive(serde_derive::Serialize, serde_derive::Deserialize) 2245 )] 2246 pub struct MachPatchableCallSite { 2247 /// The offset of the call's return address (i.e., the address 2248 /// after the end of the patchable region), *relative to the start 2249 /// of the buffer*. 2250 pub ret_addr: CodeOffset, 2251 2252 /// The length of the region to be patched by NOP bytes. 2253 pub len: u32, 2254 } 2255 2256 /// A source-location mapping resulting from a compilation. 2257 #[derive(PartialEq, Debug, Clone)] 2258 #[cfg_attr( 2259 feature = "enable-serde", 2260 derive(serde_derive::Serialize, serde_derive::Deserialize) 2261 )] 2262 pub struct MachSrcLoc<T: CompilePhase> { 2263 /// The start of the region of code corresponding to a source location. 2264 /// This is relative to the start of the function, not to the start of the 2265 /// section. 2266 pub start: CodeOffset, 2267 /// The end of the region of code corresponding to a source location. 2268 /// This is relative to the start of the function, not to the start of the 2269 /// section. 2270 pub end: CodeOffset, 2271 /// The source location. 2272 pub loc: T::SourceLocType, 2273 } 2274 2275 impl MachSrcLoc<Stencil> { 2276 fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> { 2277 MachSrcLoc { 2278 start: self.start, 2279 end: self.end, 2280 loc: self.loc.expand(base_srcloc), 2281 } 2282 } 2283 } 2284 2285 /// Record of branch instruction in the buffer, to facilitate editing. 2286 #[derive(Clone, Debug)] 2287 struct MachBranch { 2288 start: CodeOffset, 2289 end: CodeOffset, 2290 target: MachLabel, 2291 fixup: usize, 2292 inverted: Option<SmallVec<[u8; 8]>>, 2293 /// All labels pointing to the start of this branch. For correctness, this 2294 /// *must* be complete (i.e., must contain all labels whose resolved offsets 2295 /// are at the start of this branch): we rely on being able to redirect all 2296 /// labels that could jump to this branch before removing it, if it is 2297 /// otherwise unreachable. 2298 labels_at_this_branch: SmallVec<[MachLabel; 4]>, 2299 } 2300 2301 impl MachBranch { 2302 fn is_cond(&self) -> bool { 2303 self.inverted.is_some() 2304 } 2305 fn is_uncond(&self) -> bool { 2306 self.inverted.is_none() 2307 } 2308 } 2309 2310 /// Stack-frame layout information carried through to machine 2311 /// code. This provides sufficient information to interpret an active 2312 /// stack frame from a running function, if provided. 2313 #[derive(Clone, Debug, PartialEq)] 2314 #[cfg_attr( 2315 feature = "enable-serde", 2316 derive(serde_derive::Serialize, serde_derive::Deserialize) 2317 )] 2318 pub struct MachBufferFrameLayout { 2319 /// Offset from bottom of frame to FP (near top of frame). This 2320 /// allows reading the frame given only FP. 2321 pub frame_to_fp_offset: u32, 2322 /// Offset from bottom of frame for each StackSlot, 2323 pub stackslots: SecondaryMap<ir::StackSlot, MachBufferStackSlot>, 2324 } 2325 2326 /// Descriptor for a single stack slot in the compiled function. 2327 #[derive(Clone, Debug, PartialEq, Default)] 2328 #[cfg_attr( 2329 feature = "enable-serde", 2330 derive(serde_derive::Serialize, serde_derive::Deserialize) 2331 )] 2332 pub struct MachBufferStackSlot { 2333 /// Offset from the bottom of the stack frame. 2334 pub offset: u32, 2335 2336 /// User-provided key to describe this stack slot. 2337 pub key: Option<ir::StackSlotKey>, 2338 } 2339 2340 /// Debug tags: a sequence of references to a stack slot, or a 2341 /// user-defined value, at a particular PC. 2342 #[derive(Clone, Debug, PartialEq)] 2343 #[cfg_attr( 2344 feature = "enable-serde", 2345 derive(serde_derive::Serialize, serde_derive::Deserialize) 2346 )] 2347 pub(crate) struct MachDebugTags { 2348 /// Offset at which this tag applies. 2349 pub offset: CodeOffset, 2350 2351 /// Position on the attached instruction. This indicates whether 2352 /// the tags attach to the prior instruction (i.e., as a return 2353 /// point from a call) or the current instruction (i.e., as a PC 2354 /// seen during a trap). 2355 pub pos: MachDebugTagPos, 2356 2357 /// The range in the tag pool. 2358 pub range: Range<u32>, 2359 } 2360 2361 /// Debug tag position on an instruction. 2362 /// 2363 /// We need to distinguish position on an instruction, and not just 2364 /// use offsets, because of the following case: 2365 /// 2366 /// ```plain 2367 /// <tag1, tag2> call ... 2368 /// <tag3, tag4> trapping_store ... 2369 /// ``` 2370 /// 2371 /// If the stack is walked and interpreted with debug tags while 2372 /// within the call, the PC seen will be the return point, i.e. the 2373 /// address after the call. If the stack is walked and interpreted 2374 /// with debug tags upon a trap of the following instruction, it will 2375 /// be the PC of that instruction -- which is the same PC! Thus to 2376 /// disambiguate which tags we want, we attach a "pre/post" flag to 2377 /// every group of tags at an offset; and when we look up tags, we 2378 /// look them up for an offset and "position" at that offset. 2379 /// 2380 /// Thus there are logically two positions at every offset -- so the 2381 /// above will be emitted as 2382 /// 2383 /// ```plain 2384 /// 0: call ... 2385 /// 4, post: <tag1, tag2> 2386 /// 4, pre: <tag3, tag4> 2387 /// 4: trapping_store ... 2388 /// ``` 2389 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 2390 #[cfg_attr( 2391 feature = "enable-serde", 2392 derive(serde_derive::Serialize, serde_derive::Deserialize) 2393 )] 2394 pub enum MachDebugTagPos { 2395 /// Tags attached after the instruction that ends at this offset. 2396 /// 2397 /// This is used to attach tags to a call, because the PC we see 2398 /// when walking the stack is the *return point*. 2399 Post, 2400 /// Tags attached before the instruction that starts at this offset. 2401 /// 2402 /// This is used to attach tags to every other kind of 2403 /// instruction, because the PC we see when processing a trap of 2404 /// that instruction is the PC of that instruction, not the 2405 /// following one. 2406 Pre, 2407 } 2408 2409 /// Iterator item for visiting debug tags. 2410 pub struct MachBufferDebugTagList<'a> { 2411 /// Offset at which this tag applies. 2412 pub offset: CodeOffset, 2413 2414 /// Position at this offset ("post", attaching to prior 2415 /// instruction, or "pre", attaching to next instruction). 2416 pub pos: MachDebugTagPos, 2417 2418 /// The underlying tags. 2419 pub tags: &'a [DebugTag], 2420 } 2421 2422 /// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`. 2423 /// 2424 /// Note that `MachBuffer` was primarily written for intra-function references 2425 /// of jumps between basic blocks, but it's also quite usable for entire text 2426 /// sections and resolving references between functions themselves. This 2427 /// builder interprets "blocks" as labeled functions for the purposes of 2428 /// resolving labels internally in the buffer. 2429 pub struct MachTextSectionBuilder<I: VCodeInst> { 2430 buf: MachBuffer<I>, 2431 next_func: usize, 2432 force_veneers: ForceVeneers, 2433 } 2434 2435 impl<I: VCodeInst> MachTextSectionBuilder<I> { 2436 /// Creates a new text section builder which will have `num_funcs` functions 2437 /// pushed into it. 2438 pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> { 2439 let mut buf = MachBuffer::new(); 2440 buf.reserve_labels_for_blocks(num_funcs); 2441 MachTextSectionBuilder { 2442 buf, 2443 next_func: 0, 2444 force_veneers: ForceVeneers::No, 2445 } 2446 } 2447 } 2448 2449 impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> { 2450 fn append( 2451 &mut self, 2452 labeled: bool, 2453 func: &[u8], 2454 align: u32, 2455 ctrl_plane: &mut ControlPlane, 2456 ) -> u64 { 2457 // Conditionally emit an island if it's necessary to resolve jumps 2458 // between functions which are too far away. 2459 let size = func.len() as u32; 2460 if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) { 2461 self.buf 2462 .emit_island_maybe_forced(self.force_veneers, size, ctrl_plane); 2463 } 2464 2465 self.buf.align_to(align); 2466 let pos = self.buf.cur_offset(); 2467 if labeled { 2468 self.buf.bind_label( 2469 MachLabel::from_block(BlockIndex::new(self.next_func)), 2470 ctrl_plane, 2471 ); 2472 self.next_func += 1; 2473 } 2474 self.buf.put_data(func); 2475 u64::from(pos) 2476 } 2477 2478 fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool { 2479 crate::trace!( 2480 "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}" 2481 ); 2482 let label = MachLabel::from_block(BlockIndex::new(target)); 2483 let offset = u32::try_from(offset).unwrap(); 2484 match I::LabelUse::from_reloc(reloc, addend) { 2485 Some(label_use) => { 2486 self.buf.use_label_at_offset(offset, label, label_use); 2487 true 2488 } 2489 None => false, 2490 } 2491 } 2492 2493 fn force_veneers(&mut self) { 2494 self.force_veneers = ForceVeneers::Yes; 2495 } 2496 2497 fn write(&mut self, offset: u64, data: &[u8]) { 2498 self.buf.data[offset.try_into().unwrap()..][..data.len()].copy_from_slice(data); 2499 } 2500 2501 fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> { 2502 // Double-check all functions were pushed. 2503 assert_eq!(self.next_func, self.buf.label_offsets.len()); 2504 2505 // Finish up any veneers, if necessary. 2506 self.buf 2507 .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane); 2508 2509 // We don't need the data any more, so return it to the caller. 2510 mem::take(&mut self.buf.data).into_vec() 2511 } 2512 } 2513 2514 // We use an actual instruction definition to do tests, so we depend on the `arm64` feature here. 2515 #[cfg(all(test, feature = "arm64"))] 2516 mod test { 2517 use cranelift_entity::EntityRef as _; 2518 2519 use super::*; 2520 use crate::ir::UserExternalNameRef; 2521 use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst}; 2522 use crate::isa::aarch64::inst::{OperandSize, xreg}; 2523 use crate::machinst::{MachInstEmit, MachInstEmitState}; 2524 use crate::settings; 2525 2526 fn label(n: u32) -> MachLabel { 2527 MachLabel::from_block(BlockIndex::new(n as usize)) 2528 } 2529 fn target(n: u32) -> BranchTarget { 2530 BranchTarget::Label(label(n)) 2531 } 2532 2533 #[test] 2534 fn test_elide_jump_to_next() { 2535 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2536 let mut buf = MachBuffer::new(); 2537 let mut state = <Inst as MachInstEmit>::State::default(); 2538 let constants = Default::default(); 2539 2540 buf.reserve_labels_for_blocks(2); 2541 buf.bind_label(label(0), state.ctrl_plane_mut()); 2542 let inst = Inst::Jump { dest: target(1) }; 2543 inst.emit(&mut buf, &info, &mut state); 2544 buf.bind_label(label(1), state.ctrl_plane_mut()); 2545 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2546 assert_eq!(0, buf.total_size()); 2547 } 2548 2549 #[test] 2550 fn test_elide_trivial_jump_blocks() { 2551 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2552 let mut buf = MachBuffer::new(); 2553 let mut state = <Inst as MachInstEmit>::State::default(); 2554 let constants = Default::default(); 2555 2556 buf.reserve_labels_for_blocks(4); 2557 2558 buf.bind_label(label(0), state.ctrl_plane_mut()); 2559 let inst = Inst::CondBr { 2560 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2561 taken: target(1), 2562 not_taken: target(2), 2563 }; 2564 inst.emit(&mut buf, &info, &mut state); 2565 2566 buf.bind_label(label(1), state.ctrl_plane_mut()); 2567 let inst = Inst::Jump { dest: target(3) }; 2568 inst.emit(&mut buf, &info, &mut state); 2569 2570 buf.bind_label(label(2), state.ctrl_plane_mut()); 2571 let inst = Inst::Jump { dest: target(3) }; 2572 inst.emit(&mut buf, &info, &mut state); 2573 2574 buf.bind_label(label(3), state.ctrl_plane_mut()); 2575 2576 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2577 assert_eq!(0, buf.total_size()); 2578 } 2579 2580 #[test] 2581 fn test_flip_cond() { 2582 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2583 let mut buf = MachBuffer::new(); 2584 let mut state = <Inst as MachInstEmit>::State::default(); 2585 let constants = Default::default(); 2586 2587 buf.reserve_labels_for_blocks(4); 2588 2589 buf.bind_label(label(0), state.ctrl_plane_mut()); 2590 let inst = Inst::CondBr { 2591 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64), 2592 taken: target(1), 2593 not_taken: target(2), 2594 }; 2595 inst.emit(&mut buf, &info, &mut state); 2596 2597 buf.bind_label(label(1), state.ctrl_plane_mut()); 2598 let inst = Inst::Nop4; 2599 inst.emit(&mut buf, &info, &mut state); 2600 2601 buf.bind_label(label(2), state.ctrl_plane_mut()); 2602 let inst = Inst::Udf { 2603 trap_code: TrapCode::STACK_OVERFLOW, 2604 }; 2605 inst.emit(&mut buf, &info, &mut state); 2606 2607 buf.bind_label(label(3), state.ctrl_plane_mut()); 2608 2609 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2610 2611 let mut buf2 = MachBuffer::new(); 2612 let mut state = Default::default(); 2613 let inst = Inst::TrapIf { 2614 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2615 trap_code: TrapCode::STACK_OVERFLOW, 2616 }; 2617 inst.emit(&mut buf2, &info, &mut state); 2618 let inst = Inst::Nop4; 2619 inst.emit(&mut buf2, &info, &mut state); 2620 2621 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2622 2623 assert_eq!(buf.data, buf2.data); 2624 } 2625 2626 #[test] 2627 fn test_island() { 2628 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2629 let mut buf = MachBuffer::new(); 2630 let mut state = <Inst as MachInstEmit>::State::default(); 2631 let constants = Default::default(); 2632 2633 buf.reserve_labels_for_blocks(4); 2634 2635 buf.bind_label(label(0), state.ctrl_plane_mut()); 2636 let inst = Inst::CondBr { 2637 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2638 taken: target(2), 2639 not_taken: target(3), 2640 }; 2641 inst.emit(&mut buf, &info, &mut state); 2642 2643 buf.bind_label(label(1), state.ctrl_plane_mut()); 2644 while buf.cur_offset() < 2000000 { 2645 if buf.island_needed(0) { 2646 buf.emit_island(0, state.ctrl_plane_mut()); 2647 } 2648 let inst = Inst::Nop4; 2649 inst.emit(&mut buf, &info, &mut state); 2650 } 2651 2652 buf.bind_label(label(2), state.ctrl_plane_mut()); 2653 let inst = Inst::Nop4; 2654 inst.emit(&mut buf, &info, &mut state); 2655 2656 buf.bind_label(label(3), state.ctrl_plane_mut()); 2657 let inst = Inst::Nop4; 2658 inst.emit(&mut buf, &info, &mut state); 2659 2660 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2661 2662 assert_eq!(2000000 + 8, buf.total_size()); 2663 2664 let mut buf2 = MachBuffer::new(); 2665 let mut state = Default::default(); 2666 let inst = Inst::CondBr { 2667 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2668 2669 // This conditionally taken branch has a 19-bit constant, shifted 2670 // to the left by two, giving us a 21-bit range in total. Half of 2671 // this range positive so the we should be around 1 << 20 bytes 2672 // away for our jump target. 2673 // 2674 // There are two pending fixups by the time we reach this point, 2675 // one for this 19-bit jump and one for the unconditional 26-bit 2676 // jump below. A 19-bit veneer is 4 bytes large and the 26-bit 2677 // veneer is 20 bytes large, which means that pessimistically 2678 // assuming we'll need two veneers. Currently each veneer is 2679 // pessimistically assumed to be the maximal size which means we 2680 // need 40 bytes of extra space, meaning that the actual island 2681 // should come 40-bytes before the deadline. 2682 taken: BranchTarget::ResolvedOffset((1 << 20) - 20 - 20), 2683 2684 // This branch is in-range so no veneers should be needed, it should 2685 // go directly to the target. 2686 not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4), 2687 }; 2688 inst.emit(&mut buf2, &info, &mut state); 2689 2690 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2691 2692 assert_eq!(&buf.data[0..8], &buf2.data[..]); 2693 } 2694 2695 #[test] 2696 fn test_island_backward() { 2697 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2698 let mut buf = MachBuffer::new(); 2699 let mut state = <Inst as MachInstEmit>::State::default(); 2700 let constants = Default::default(); 2701 2702 buf.reserve_labels_for_blocks(4); 2703 2704 buf.bind_label(label(0), state.ctrl_plane_mut()); 2705 let inst = Inst::Nop4; 2706 inst.emit(&mut buf, &info, &mut state); 2707 2708 buf.bind_label(label(1), state.ctrl_plane_mut()); 2709 let inst = Inst::Nop4; 2710 inst.emit(&mut buf, &info, &mut state); 2711 2712 buf.bind_label(label(2), state.ctrl_plane_mut()); 2713 while buf.cur_offset() < 2000000 { 2714 let inst = Inst::Nop4; 2715 inst.emit(&mut buf, &info, &mut state); 2716 } 2717 2718 buf.bind_label(label(3), state.ctrl_plane_mut()); 2719 let inst = Inst::CondBr { 2720 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2721 taken: target(0), 2722 not_taken: target(1), 2723 }; 2724 inst.emit(&mut buf, &info, &mut state); 2725 2726 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2727 2728 assert_eq!(2000000 + 12, buf.total_size()); 2729 2730 let mut buf2 = MachBuffer::new(); 2731 let mut state = Default::default(); 2732 let inst = Inst::CondBr { 2733 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2734 taken: BranchTarget::ResolvedOffset(8), 2735 not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)), 2736 }; 2737 inst.emit(&mut buf2, &info, &mut state); 2738 let inst = Inst::Jump { 2739 dest: BranchTarget::ResolvedOffset(-(2000000 + 8)), 2740 }; 2741 inst.emit(&mut buf2, &info, &mut state); 2742 2743 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2744 2745 assert_eq!(&buf.data[2000000..], &buf2.data[..]); 2746 } 2747 2748 #[test] 2749 fn test_multiple_redirect() { 2750 // label0: 2751 // cbz x0, label1 2752 // b label2 2753 // label1: 2754 // b label3 2755 // label2: 2756 // nop 2757 // nop 2758 // b label0 2759 // label3: 2760 // b label4 2761 // label4: 2762 // b label5 2763 // label5: 2764 // b label7 2765 // label6: 2766 // nop 2767 // label7: 2768 // ret 2769 // 2770 // -- should become: 2771 // 2772 // label0: 2773 // cbz x0, label7 2774 // label2: 2775 // nop 2776 // nop 2777 // b label0 2778 // label6: 2779 // nop 2780 // label7: 2781 // ret 2782 2783 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2784 let mut buf = MachBuffer::new(); 2785 let mut state = <Inst as MachInstEmit>::State::default(); 2786 let constants = Default::default(); 2787 2788 buf.reserve_labels_for_blocks(8); 2789 2790 buf.bind_label(label(0), state.ctrl_plane_mut()); 2791 let inst = Inst::CondBr { 2792 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64), 2793 taken: target(1), 2794 not_taken: target(2), 2795 }; 2796 inst.emit(&mut buf, &info, &mut state); 2797 2798 buf.bind_label(label(1), state.ctrl_plane_mut()); 2799 let inst = Inst::Jump { dest: target(3) }; 2800 inst.emit(&mut buf, &info, &mut state); 2801 2802 buf.bind_label(label(2), state.ctrl_plane_mut()); 2803 let inst = Inst::Nop4; 2804 inst.emit(&mut buf, &info, &mut state); 2805 inst.emit(&mut buf, &info, &mut state); 2806 let inst = Inst::Jump { dest: target(0) }; 2807 inst.emit(&mut buf, &info, &mut state); 2808 2809 buf.bind_label(label(3), state.ctrl_plane_mut()); 2810 let inst = Inst::Jump { dest: target(4) }; 2811 inst.emit(&mut buf, &info, &mut state); 2812 2813 buf.bind_label(label(4), state.ctrl_plane_mut()); 2814 let inst = Inst::Jump { dest: target(5) }; 2815 inst.emit(&mut buf, &info, &mut state); 2816 2817 buf.bind_label(label(5), state.ctrl_plane_mut()); 2818 let inst = Inst::Jump { dest: target(7) }; 2819 inst.emit(&mut buf, &info, &mut state); 2820 2821 buf.bind_label(label(6), state.ctrl_plane_mut()); 2822 let inst = Inst::Nop4; 2823 inst.emit(&mut buf, &info, &mut state); 2824 2825 buf.bind_label(label(7), state.ctrl_plane_mut()); 2826 let inst = Inst::Ret {}; 2827 inst.emit(&mut buf, &info, &mut state); 2828 2829 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2830 2831 let golden_data = vec![ 2832 0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14 2833 0x1f, 0x20, 0x03, 0xd5, // nop 2834 0x1f, 0x20, 0x03, 0xd5, // nop 2835 0xfd, 0xff, 0xff, 0x17, // b 0 2836 0x1f, 0x20, 0x03, 0xd5, // nop 2837 0xc0, 0x03, 0x5f, 0xd6, // ret 2838 ]; 2839 2840 assert_eq!(&golden_data[..], &buf.data[..]); 2841 } 2842 2843 #[test] 2844 fn test_handle_branch_cycle() { 2845 // label0: 2846 // b label1 2847 // label1: 2848 // b label2 2849 // label2: 2850 // b label3 2851 // label3: 2852 // b label4 2853 // label4: 2854 // b label1 // note: not label0 (to make it interesting). 2855 // 2856 // -- should become: 2857 // 2858 // label0, label1, ..., label4: 2859 // b label0 2860 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2861 let mut buf = MachBuffer::new(); 2862 let mut state = <Inst as MachInstEmit>::State::default(); 2863 let constants = Default::default(); 2864 2865 buf.reserve_labels_for_blocks(5); 2866 2867 buf.bind_label(label(0), state.ctrl_plane_mut()); 2868 let inst = Inst::Jump { dest: target(1) }; 2869 inst.emit(&mut buf, &info, &mut state); 2870 2871 buf.bind_label(label(1), state.ctrl_plane_mut()); 2872 let inst = Inst::Jump { dest: target(2) }; 2873 inst.emit(&mut buf, &info, &mut state); 2874 2875 buf.bind_label(label(2), state.ctrl_plane_mut()); 2876 let inst = Inst::Jump { dest: target(3) }; 2877 inst.emit(&mut buf, &info, &mut state); 2878 2879 buf.bind_label(label(3), state.ctrl_plane_mut()); 2880 let inst = Inst::Jump { dest: target(4) }; 2881 inst.emit(&mut buf, &info, &mut state); 2882 2883 buf.bind_label(label(4), state.ctrl_plane_mut()); 2884 let inst = Inst::Jump { dest: target(1) }; 2885 inst.emit(&mut buf, &info, &mut state); 2886 2887 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2888 2889 let golden_data = vec![ 2890 0x00, 0x00, 0x00, 0x14, // b 0 2891 ]; 2892 2893 assert_eq!(&golden_data[..], &buf.data[..]); 2894 } 2895 2896 #[test] 2897 fn metadata_records() { 2898 let mut buf = MachBuffer::<Inst>::new(); 2899 let ctrl_plane = &mut Default::default(); 2900 let constants = Default::default(); 2901 2902 buf.reserve_labels_for_blocks(3); 2903 2904 buf.bind_label(label(0), ctrl_plane); 2905 buf.put1(1); 2906 buf.add_trap(TrapCode::HEAP_OUT_OF_BOUNDS); 2907 buf.put1(2); 2908 buf.add_trap(TrapCode::INTEGER_OVERFLOW); 2909 buf.add_trap(TrapCode::INTEGER_DIVISION_BY_ZERO); 2910 buf.add_try_call_site( 2911 Some(0x10), 2912 [ 2913 MachExceptionHandler::Tag(ExceptionTag::new(42), label(2)), 2914 MachExceptionHandler::Default(label(1)), 2915 ] 2916 .into_iter(), 2917 ); 2918 buf.add_reloc( 2919 Reloc::Abs4, 2920 &ExternalName::User(UserExternalNameRef::new(0)), 2921 0, 2922 ); 2923 buf.put1(3); 2924 buf.add_reloc( 2925 Reloc::Abs8, 2926 &ExternalName::User(UserExternalNameRef::new(1)), 2927 1, 2928 ); 2929 buf.put1(4); 2930 buf.bind_label(label(1), ctrl_plane); 2931 buf.put1(0xff); 2932 buf.bind_label(label(2), ctrl_plane); 2933 buf.put1(0xff); 2934 2935 let buf = buf.finish(&constants, ctrl_plane); 2936 2937 assert_eq!(buf.data(), &[1, 2, 3, 4, 0xff, 0xff]); 2938 assert_eq!( 2939 buf.traps() 2940 .iter() 2941 .map(|trap| (trap.offset, trap.code)) 2942 .collect::<Vec<_>>(), 2943 vec![ 2944 (1, TrapCode::HEAP_OUT_OF_BOUNDS), 2945 (2, TrapCode::INTEGER_OVERFLOW), 2946 (2, TrapCode::INTEGER_DIVISION_BY_ZERO) 2947 ] 2948 ); 2949 let call_sites: Vec<_> = buf.call_sites().collect(); 2950 assert_eq!(call_sites[0].ret_addr, 2); 2951 assert_eq!(call_sites[0].frame_offset, Some(0x10)); 2952 assert_eq!( 2953 call_sites[0].exception_handlers, 2954 &[ 2955 FinalizedMachExceptionHandler::Tag(ExceptionTag::new(42), 5), 2956 FinalizedMachExceptionHandler::Default(4) 2957 ], 2958 ); 2959 assert_eq!( 2960 buf.relocs() 2961 .iter() 2962 .map(|reloc| (reloc.offset, reloc.kind)) 2963 .collect::<Vec<_>>(), 2964 vec![(2, Reloc::Abs4), (3, Reloc::Abs8)] 2965 ); 2966 } 2967 } 2968