1 //! In-memory representation of compiled machine code, with labels and fixups to 2 //! refer to those labels. Handles constant-pool island insertion and also 3 //! veneer insertion for out-of-range jumps. 4 //! 5 //! This code exists to solve three problems: 6 //! 7 //! - Branch targets for forward branches are not known until later, when we 8 //! emit code in a single pass through the instruction structs. 9 //! 10 //! - On many architectures, address references or offsets have limited range. 11 //! For example, on AArch64, conditional branches can only target code +/- 1MB 12 //! from the branch itself. 13 //! 14 //! - The lowering of control flow from the CFG-with-edges produced by 15 //! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty 16 //! edge blocks when the register allocator does not need to insert any 17 //! spills/reloads/moves in edge blocks, results in many suboptimal branch 18 //! patterns. The lowering also pays no attention to block order, and so 19 //! two-target conditional forms (cond-br followed by uncond-br) can often by 20 //! avoided because one of the targets is the fallthrough. There are several 21 //! cases here where we can simplify to use fewer branches. 22 //! 23 //! This "buffer" implements a single-pass code emission strategy (with a later 24 //! "fixup" pass, but only through recorded fixups, not all instructions). The 25 //! basic idea is: 26 //! 27 //! - Emit branches as they are, including two-target (cond/uncond) compound 28 //! forms, but with zero offsets and optimistically assuming the target will be 29 //! in range. Record the "fixup" for later. Targets are denoted instead by 30 //! symbolic "labels" that are then bound to certain offsets in the buffer as 31 //! we emit code. (Nominally, there is a label at the start of every basic 32 //! block.) 33 //! 34 //! - As we do this, track the offset in the buffer at which the first label 35 //! reference "goes out of range". We call this the "deadline". If we reach the 36 //! deadline and we still have not bound the label to which an unresolved branch 37 //! refers, we have a problem! 38 //! 39 //! - To solve this problem, we emit "islands" full of "veneers". An island is 40 //! simply a chunk of code inserted in the middle of the code actually produced 41 //! by the emitter (e.g., vcode iterating over instruction structs). The emitter 42 //! has some awareness of this: it either asks for an island between blocks, so 43 //! it is not accidentally executed, or else it emits a branch around the island 44 //! when all other options fail (see `Inst::EmitIsland` meta-instruction). 45 //! 46 //! - A "veneer" is an instruction (or sequence of instructions) in an "island" 47 //! that implements a longer-range reference to a label. The idea is that, for 48 //! example, a branch with a limited range can branch to a "veneer" instead, 49 //! which is simply a branch in a form that can use a longer-range reference. On 50 //! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional 51 //! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a 52 //! conditional branch's label reference can be fixed up with a "veneer" to 53 //! achieve a longer range. 54 //! 55 //! - To implement all of this, we require the backend to provide a `LabelUse` 56 //! type that implements a trait. This is nominally an enum that records one of 57 //! several kinds of references to an offset in code -- basically, a relocation 58 //! type -- and will usually correspond to different instruction formats. The 59 //! `LabelUse` implementation specifies the maximum range, how to patch in the 60 //! actual label location when known, and how to generate a veneer to extend the 61 //! range. 62 //! 63 //! That satisfies label references, but we still may have suboptimal branch 64 //! patterns. To clean up the branches, we do a simple "peephole"-style 65 //! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`) 66 //! informs the buffer of branches in the code and, in the case of conditionals, 67 //! the code that would have been emitted to invert this branch's condition. We 68 //! track the "latest branches": these are branches that are contiguous up to 69 //! the current offset. (If any code is emitted after a branch, that branch or 70 //! run of contiguous branches is no longer "latest".) The latest branches are 71 //! those that we can edit by simply truncating the buffer and doing something 72 //! else instead. 73 //! 74 //! To optimize branches, we implement several simple rules, and try to apply 75 //! them to the "latest branches" when possible: 76 //! 77 //! - A branch with a label target, when that label is bound to the ending 78 //! offset of the branch (the fallthrough location), can be removed altogether, 79 //! because the branch would have no effect). 80 //! 81 //! - An unconditional branch that starts at a label location, and branches to 82 //! another label, results in a "label alias": all references to the label bound 83 //! *to* this branch instruction are instead resolved to the *target* of the 84 //! branch instruction. This effectively removes empty blocks that just 85 //! unconditionally branch to the next block. We call this "branch threading". 86 //! 87 //! - A conditional followed by an unconditional, when the conditional branches 88 //! to the unconditional's fallthrough, results in (i) the truncation of the 89 //! unconditional, (ii) the inversion of the condition's condition, and (iii) 90 //! replacement of the conditional's target (using the original target of the 91 //! unconditional). This is a fancy way of saying "we can flip a two-target 92 //! conditional branch's taken/not-taken targets if it works better with our 93 //! fallthrough". To make this work, the emitter actually gives the buffer 94 //! *both* forms of every conditional branch: the true form is emitted into the 95 //! buffer, and the "inverted" machine-code bytes are provided as part of the 96 //! branch-fixup metadata. 97 //! 98 //! - An unconditional B preceded by another unconditional P, when B's label(s) have 99 //! been redirected to target(B), can be removed entirely. This is an extension 100 //! of the branch-threading optimization, and is valid because if we know there 101 //! will be no fallthrough into this branch instruction (the prior instruction 102 //! is an unconditional jump), and if we know we have successfully redirected 103 //! all labels, then this branch instruction is unreachable. Note that this 104 //! works because the redirection happens before the label is ever resolved 105 //! (fixups happen at island emission time, at which point latest-branches are 106 //! cleared, or at the end of emission), so we are sure to catch and redirect 107 //! all possible paths to this instruction. 108 //! 109 //! # Branch-optimization Correctness 110 //! 111 //! The branch-optimization mechanism depends on a few data structures with 112 //! invariants, which are always held outside the scope of top-level public 113 //! methods: 114 //! 115 //! - The latest-branches list. Each entry describes a span of the buffer 116 //! (start/end offsets), the label target, the corresponding fixup-list entry 117 //! index, and the bytes (must be the same length) for the inverted form, if 118 //! conditional. The list of labels that are bound to the start-offset of this 119 //! branch is *complete* (if any label has a resolved offset equal to `start` 120 //! and is not an alias, it must appear in this list) and *precise* (no label 121 //! in this list can be bound to another offset). No label in this list should 122 //! be an alias. No two branch ranges can overlap, and branches are in 123 //! ascending-offset order. 124 //! 125 //! - The labels-at-tail list. This contains all MachLabels that have been bound 126 //! to (whose resolved offsets are equal to) the tail offset of the buffer. 127 //! No label in this list should be an alias. 128 //! 129 //! - The label_offsets array, containing the bound offset of a label or 130 //! UNKNOWN. No label can be bound at an offset greater than the current 131 //! buffer tail. 132 //! 133 //! - The label_aliases array, containing another label to which a label is 134 //! bound or UNKNOWN. A label's resolved offset is the resolved offset 135 //! of the label it is aliased to, if this is set. 136 //! 137 //! We argue below, at each method, how the invariants in these data structures 138 //! are maintained (grep for "Post-invariant"). 139 //! 140 //! Given these invariants, we argue why each optimization preserves execution 141 //! semantics below (grep for "Preserves execution semantics"). 142 //! 143 //! # Avoiding Quadratic Behavior 144 //! 145 //! There are two cases where we've had to take some care to avoid 146 //! quadratic worst-case behavior: 147 //! 148 //! - The "labels at this branch" list can grow unboundedly if the 149 //! code generator binds many labels at one location. If the count 150 //! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we 151 //! simply abort an optimization early in a way that is always correct 152 //! but is conservative. 153 //! 154 //! - The fixup list can interact with island emission to create 155 //! "quadratic island behavior". In a little more detail, one can hit 156 //! this behavior by having some pending fixups (forward label 157 //! references) with long-range label-use kinds, and some others 158 //! with shorter-range references that nonetheless still are pending 159 //! long enough to trigger island generation. In such a case, we 160 //! process the fixup list, generate veneers to extend some forward 161 //! references' ranges, but leave the other (longer-range) ones 162 //! alone. The way this was implemented put them back on a list and 163 //! resulted in quadratic behavior. 164 //! 165 //! To avoid this fixups are split into two lists: one "pending" list and one 166 //! final list. The pending list is kept around for handling fixups related to 167 //! branches so it can be edited/truncated. When an island is reached, which 168 //! starts processing fixups, all pending fixups are flushed into the final 169 //! list. The final list is a `BinaryHeap` which enables fixup processing to 170 //! only process those which are required during island emission, deferring 171 //! all longer-range fixups to later. 172 173 use crate::binemit::{Addend, CodeOffset, Reloc}; 174 use crate::ir::function::FunctionParameters; 175 use crate::ir::{DebugTag, ExceptionTag, ExternalName, RelSourceLoc, SourceLoc, TrapCode}; 176 use crate::isa::unwind::UnwindInst; 177 use crate::machinst::{ 178 BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst, 179 }; 180 use crate::trace; 181 use crate::{MachInstEmitState, ir}; 182 use crate::{VCodeConstantData, timing}; 183 use core::ops::Range; 184 use cranelift_control::ControlPlane; 185 use cranelift_entity::{PrimaryMap, SecondaryMap, entity_impl}; 186 use smallvec::SmallVec; 187 use std::cmp::Ordering; 188 use std::collections::BinaryHeap; 189 use std::mem; 190 use std::string::String; 191 use std::vec::Vec; 192 193 #[cfg(feature = "enable-serde")] 194 use serde::{Deserialize, Serialize}; 195 196 #[cfg(feature = "enable-serde")] 197 pub trait CompilePhase { 198 type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 199 type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 200 } 201 202 #[cfg(not(feature = "enable-serde"))] 203 pub trait CompilePhase { 204 type MachSrcLocType: core::fmt::Debug + PartialEq + Clone; 205 type SourceLocType: core::fmt::Debug + PartialEq + Clone; 206 } 207 208 /// Status of a compiled artifact that needs patching before being used. 209 #[derive(Clone, Debug, PartialEq)] 210 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))] 211 pub struct Stencil; 212 213 /// Status of a compiled artifact ready to use. 214 #[derive(Clone, Debug, PartialEq)] 215 pub struct Final; 216 217 impl CompilePhase for Stencil { 218 type MachSrcLocType = MachSrcLoc<Stencil>; 219 type SourceLocType = RelSourceLoc; 220 } 221 222 impl CompilePhase for Final { 223 type MachSrcLocType = MachSrcLoc<Final>; 224 type SourceLocType = SourceLoc; 225 } 226 227 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 228 enum ForceVeneers { 229 Yes, 230 No, 231 } 232 233 /// A buffer of output to be produced, fixed up, and then emitted to a CodeSink 234 /// in bulk. 235 /// 236 /// This struct uses `SmallVec`s to support small-ish function bodies without 237 /// any heap allocation. As such, it will be several kilobytes large. This is 238 /// likely fine as long as it is stack-allocated for function emission then 239 /// thrown away; but beware if many buffer objects are retained persistently. 240 pub struct MachBuffer<I: VCodeInst> { 241 /// The buffer contents, as raw bytes. 242 data: SmallVec<[u8; 1024]>, 243 /// The required alignment of this buffer. 244 min_alignment: u32, 245 /// Any relocations referring to this code. Note that only *external* 246 /// relocations are tracked here; references to labels within the buffer are 247 /// resolved before emission. 248 relocs: SmallVec<[MachReloc; 16]>, 249 /// Any trap records referring to this code. 250 traps: SmallVec<[MachTrap; 16]>, 251 /// Any call site records referring to this code. 252 call_sites: SmallVec<[MachCallSite; 16]>, 253 /// Any patchable call site locations. 254 patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>, 255 /// Any exception-handler records referred to at call sites. 256 exception_handlers: SmallVec<[MachExceptionHandler; 16]>, 257 /// Any source location mappings referring to this code. 258 srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>, 259 /// Any debug tags referring to this code. 260 debug_tags: Vec<MachDebugTags>, 261 /// Pool of debug tags referenced by `MachDebugTags` entries. 262 debug_tag_pool: Vec<DebugTag>, 263 /// Any user stack maps for this code. 264 /// 265 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted 266 /// by code offset, and each stack map covers `span` bytes on the stack. 267 user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>, 268 /// Any unwind info at a given location. 269 unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 270 /// The current source location in progress (after `start_srcloc()` and 271 /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple. 272 cur_srcloc: Option<(CodeOffset, RelSourceLoc)>, 273 /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown. 274 label_offsets: SmallVec<[CodeOffset; 16]>, 275 /// Label aliases: when one label points to an unconditional jump, and that 276 /// jump points to another label, we can redirect references to the first 277 /// label immediately to the second. 278 /// 279 /// Invariant: we don't have label-alias cycles. We ensure this by, 280 /// before setting label A to alias label B, resolving B's alias 281 /// target (iteratively until a non-aliased label); if B is already 282 /// aliased to A, then we cannot alias A back to B. 283 label_aliases: SmallVec<[MachLabel; 16]>, 284 /// Constants that must be emitted at some point. 285 pending_constants: SmallVec<[VCodeConstant; 16]>, 286 /// Byte size of all constants in `pending_constants`. 287 pending_constants_size: CodeOffset, 288 /// Traps that must be emitted at some point. 289 pending_traps: SmallVec<[MachLabelTrap; 16]>, 290 /// Fixups that haven't yet been flushed into `fixup_records` below and may 291 /// be related to branches that are chomped. These all get added to 292 /// `fixup_records` during island emission. 293 pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>, 294 /// The nearest upcoming deadline for entries in `pending_fixup_records`. 295 pending_fixup_deadline: CodeOffset, 296 /// Fixups that must be performed after all code is emitted. 297 fixup_records: BinaryHeap<MachLabelFixup<I>>, 298 /// Latest branches, to facilitate in-place editing for better fallthrough 299 /// behavior and empty-block removal. 300 latest_branches: SmallVec<[MachBranch; 4]>, 301 /// All labels at the current offset (emission tail). This is lazily 302 /// cleared: it is actually accurate as long as the current offset is 303 /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should 304 /// be considered as empty. 305 /// 306 /// For correctness, this *must* be complete (i.e., the vector must contain 307 /// all labels whose offsets are resolved to the current tail), because we 308 /// rely on it to update labels when we truncate branches. 309 labels_at_tail: SmallVec<[MachLabel; 4]>, 310 /// The last offset at which `labels_at_tail` is valid. It is conceptually 311 /// always describing the tail of the buffer, but we do not clear 312 /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it 313 /// when the offset has grown past this (`labels_at_tail_off`) point. 314 /// Always <= `cur_offset()`. 315 labels_at_tail_off: CodeOffset, 316 /// Metadata about all constants that this function has access to. 317 /// 318 /// This records the size/alignment of all constants (not the actual data) 319 /// along with the last available label generated for the constant. This map 320 /// is consulted when constants are referred to and the label assigned to a 321 /// constant may change over time as well. 322 constants: PrimaryMap<VCodeConstant, MachBufferConstant>, 323 /// All recorded usages of constants as pairs of the constant and where the 324 /// constant needs to be placed within `self.data`. Note that the same 325 /// constant may appear in this array multiple times if it was emitted 326 /// multiple times. 327 used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>, 328 /// Indicates when a patchable region is currently open, to guard that it's 329 /// not possible to nest patchable regions. 330 open_patchable: bool, 331 /// Stack frame layout metadata. If provided for a MachBuffer 332 /// containing a function body, this allows interpretation of 333 /// runtime state given a view of an active stack frame. 334 frame_layout: Option<MachBufferFrameLayout>, 335 } 336 337 impl MachBufferFinalized<Stencil> { 338 /// Get a finalized machine buffer by applying the function's base source location. 339 pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> { 340 MachBufferFinalized { 341 data: self.data, 342 relocs: self.relocs, 343 traps: self.traps, 344 call_sites: self.call_sites, 345 patchable_call_sites: self.patchable_call_sites, 346 exception_handlers: self.exception_handlers, 347 srclocs: self 348 .srclocs 349 .into_iter() 350 .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc)) 351 .collect(), 352 debug_tags: self.debug_tags, 353 debug_tag_pool: self.debug_tag_pool, 354 user_stack_maps: self.user_stack_maps, 355 unwind_info: self.unwind_info, 356 alignment: self.alignment, 357 frame_layout: self.frame_layout, 358 nop_units: self.nop_units, 359 } 360 } 361 } 362 363 /// A `MachBuffer` once emission is completed: holds generated code and records, 364 /// without fixups. This allows the type to be independent of the backend. 365 #[derive(PartialEq, Debug, Clone)] 366 #[cfg_attr( 367 feature = "enable-serde", 368 derive(serde_derive::Serialize, serde_derive::Deserialize) 369 )] 370 pub struct MachBufferFinalized<T: CompilePhase> { 371 /// The buffer contents, as raw bytes. 372 pub(crate) data: SmallVec<[u8; 1024]>, 373 /// Any relocations referring to this code. Note that only *external* 374 /// relocations are tracked here; references to labels within the buffer are 375 /// resolved before emission. 376 pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>, 377 /// Any trap records referring to this code. 378 pub(crate) traps: SmallVec<[MachTrap; 16]>, 379 /// Any call site records referring to this code. 380 pub(crate) call_sites: SmallVec<[MachCallSite; 16]>, 381 /// Any patchable call site locations refering to this code. 382 pub(crate) patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>, 383 /// Any exception-handler records referred to at call sites. 384 pub(crate) exception_handlers: SmallVec<[FinalizedMachExceptionHandler; 16]>, 385 /// Any source location mappings referring to this code. 386 pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>, 387 /// Any debug tags referring to this code. 388 pub(crate) debug_tags: Vec<MachDebugTags>, 389 /// Pool of debug tags referenced by `MachDebugTags` entries. 390 pub(crate) debug_tag_pool: Vec<DebugTag>, 391 /// Any user stack maps for this code. 392 /// 393 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted 394 /// by code offset, and each stack map covers `span` bytes on the stack. 395 pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>, 396 /// Stack frame layout metadata. If provided for a MachBuffer 397 /// containing a function body, this allows interpretation of 398 /// runtime state given a view of an active stack frame. 399 pub(crate) frame_layout: Option<MachBufferFrameLayout>, 400 /// Any unwind info at a given location. 401 pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 402 /// The required alignment of this buffer. 403 pub alignment: u32, 404 /// The means by which to NOP out patchable call sites. 405 /// 406 /// This allows a consumer of a `MachBufferFinalized` to disable 407 /// patchable call sites (which are enabled by default) without 408 /// specific knowledge of the target ISA. 409 /// 410 /// Each entry is one form of nop, and these are required to be 411 /// sorted in ascending-size order. 412 pub nop_units: Vec<Vec<u8>>, 413 } 414 415 const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff; 416 const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff); 417 418 /// Threshold on max length of `labels_at_this_branch` list to avoid 419 /// unbounded quadratic behavior (see comment below at use-site). 420 const LABEL_LIST_THRESHOLD: usize = 100; 421 422 /// A label refers to some offset in a `MachBuffer`. It may not be resolved at 423 /// the point at which it is used by emitted code; the buffer records "fixups" 424 /// for references to the label, and will come back and patch the code 425 /// appropriately when the label's location is eventually known. 426 #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] 427 pub struct MachLabel(u32); 428 entity_impl!(MachLabel); 429 430 impl MachLabel { 431 /// Get a label for a block. (The first N MachLabels are always reserved for 432 /// the N blocks in the vcode.) 433 pub fn from_block(bindex: BlockIndex) -> MachLabel { 434 MachLabel(bindex.index() as u32) 435 } 436 437 /// Creates a string representing this label, for convenience. 438 pub fn to_string(&self) -> String { 439 format!("label{}", self.0) 440 } 441 } 442 443 impl Default for MachLabel { 444 fn default() -> Self { 445 UNKNOWN_LABEL 446 } 447 } 448 449 /// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is 450 /// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming 451 /// the [`OpenPatchRegion`] token in the process. 452 pub struct OpenPatchRegion(usize); 453 454 /// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example 455 /// of where you might want to use this is for patching instructions that mention constants that 456 /// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable 457 /// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token 458 /// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known, 459 /// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction 460 /// bytes, and the constants uses can be updated directly. 461 pub struct PatchRegion { 462 range: Range<usize>, 463 } 464 465 impl PatchRegion { 466 /// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer. 467 pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] { 468 &mut buffer.data[self.range] 469 } 470 } 471 472 impl<I: VCodeInst> MachBuffer<I> { 473 /// Create a new section, known to start at `start_offset` and with a size limited to 474 /// `length_limit`. 475 pub fn new() -> MachBuffer<I> { 476 MachBuffer { 477 data: SmallVec::new(), 478 min_alignment: I::function_alignment().minimum, 479 relocs: SmallVec::new(), 480 traps: SmallVec::new(), 481 call_sites: SmallVec::new(), 482 patchable_call_sites: SmallVec::new(), 483 exception_handlers: SmallVec::new(), 484 srclocs: SmallVec::new(), 485 debug_tags: vec![], 486 debug_tag_pool: vec![], 487 user_stack_maps: SmallVec::new(), 488 unwind_info: SmallVec::new(), 489 cur_srcloc: None, 490 label_offsets: SmallVec::new(), 491 label_aliases: SmallVec::new(), 492 pending_constants: SmallVec::new(), 493 pending_constants_size: 0, 494 pending_traps: SmallVec::new(), 495 pending_fixup_records: SmallVec::new(), 496 pending_fixup_deadline: u32::MAX, 497 fixup_records: Default::default(), 498 latest_branches: SmallVec::new(), 499 labels_at_tail: SmallVec::new(), 500 labels_at_tail_off: 0, 501 constants: Default::default(), 502 used_constants: Default::default(), 503 open_patchable: false, 504 frame_layout: None, 505 } 506 } 507 508 /// Current offset from start of buffer. 509 pub fn cur_offset(&self) -> CodeOffset { 510 self.data.len() as CodeOffset 511 } 512 513 /// Add a byte. 514 pub fn put1(&mut self, value: u8) { 515 self.data.push(value); 516 517 // Post-invariant: conceptual-labels_at_tail contains a complete and 518 // precise list of labels bound at `cur_offset()`. We have advanced 519 // `cur_offset()`, hence if it had been equal to `labels_at_tail_off` 520 // before, it is not anymore (and it cannot become equal, because 521 // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is 522 // conceptually empty (even though it is only lazily cleared). No labels 523 // can be bound at this new offset (by invariant on `label_offsets`). 524 // Hence the invariant holds. 525 } 526 527 /// Add 2 bytes. 528 pub fn put2(&mut self, value: u16) { 529 let bytes = value.to_le_bytes(); 530 self.data.extend_from_slice(&bytes[..]); 531 532 // Post-invariant: as for `put1()`. 533 } 534 535 /// Add 4 bytes. 536 pub fn put4(&mut self, value: u32) { 537 let bytes = value.to_le_bytes(); 538 self.data.extend_from_slice(&bytes[..]); 539 540 // Post-invariant: as for `put1()`. 541 } 542 543 /// Add 8 bytes. 544 pub fn put8(&mut self, value: u64) { 545 let bytes = value.to_le_bytes(); 546 self.data.extend_from_slice(&bytes[..]); 547 548 // Post-invariant: as for `put1()`. 549 } 550 551 /// Add a slice of bytes. 552 pub fn put_data(&mut self, data: &[u8]) { 553 self.data.extend_from_slice(data); 554 555 // Post-invariant: as for `put1()`. 556 } 557 558 /// Reserve appended space and return a mutable slice referring to it. 559 pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] { 560 let off = self.data.len(); 561 let new_len = self.data.len() + len; 562 self.data.resize(new_len, 0); 563 &mut self.data[off..] 564 565 // Post-invariant: as for `put1()`. 566 } 567 568 /// Align up to the given alignment. 569 pub fn align_to(&mut self, align_to: CodeOffset) { 570 trace!("MachBuffer: align to {}", align_to); 571 assert!( 572 align_to.is_power_of_two(), 573 "{align_to} is not a power of two" 574 ); 575 while self.cur_offset() & (align_to - 1) != 0 { 576 self.put1(0); 577 } 578 579 // Post-invariant: as for `put1()`. 580 } 581 582 /// Begin a region of patchable code. There is one requirement for the 583 /// code that is emitted: It must not introduce any instructions that 584 /// could be chomped (branches are an example of this). In other words, 585 /// you must not call [`MachBuffer::add_cond_branch`] or 586 /// [`MachBuffer::add_uncond_branch`] between calls to this method and 587 /// [`MachBuffer::end_patchable`]. 588 pub fn start_patchable(&mut self) -> OpenPatchRegion { 589 assert!(!self.open_patchable, "Patchable regions may not be nested"); 590 self.open_patchable = true; 591 OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap()) 592 } 593 594 /// End a region of patchable code, yielding a [`PatchRegion`] value that 595 /// can be consumed later to produce a one-off mutable slice to the 596 /// associated region of the data buffer. 597 pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion { 598 // No need to assert the state of `open_patchable` here, as we take 599 // ownership of the only `OpenPatchable` value. 600 self.open_patchable = false; 601 let end = usize::try_from(self.cur_offset()).unwrap(); 602 PatchRegion { range: open.0..end } 603 } 604 605 /// Allocate a `Label` to refer to some offset. May not be bound to a fixed 606 /// offset yet. 607 pub fn get_label(&mut self) -> MachLabel { 608 let l = self.label_offsets.len() as u32; 609 self.label_offsets.push(UNKNOWN_LABEL_OFFSET); 610 self.label_aliases.push(UNKNOWN_LABEL); 611 trace!("MachBuffer: new label -> {:?}", MachLabel(l)); 612 MachLabel(l) 613 614 // Post-invariant: the only mutation is to add a new label; it has no 615 // bound offset yet, so it trivially satisfies all invariants. 616 } 617 618 /// Reserve the first N MachLabels for blocks. 619 pub fn reserve_labels_for_blocks(&mut self, blocks: usize) { 620 trace!("MachBuffer: first {} labels are for blocks", blocks); 621 debug_assert!(self.label_offsets.is_empty()); 622 self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET); 623 self.label_aliases.resize(blocks, UNKNOWN_LABEL); 624 625 // Post-invariant: as for `get_label()`. 626 } 627 628 /// Registers metadata in this `MachBuffer` about the `constants` provided. 629 /// 630 /// This will record the size/alignment of all constants which will prepare 631 /// them for emission later on. 632 pub fn register_constants(&mut self, constants: &VCodeConstants) { 633 for (c, val) in constants.iter() { 634 self.register_constant(&c, val); 635 } 636 } 637 638 /// Similar to [`MachBuffer::register_constants`] but registers a 639 /// single constant metadata. This function is useful in 640 /// situations where not all constants are known at the time of 641 /// emission. 642 pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) { 643 let c2 = self.constants.push(MachBufferConstant { 644 upcoming_label: None, 645 align: data.alignment(), 646 size: data.as_slice().len(), 647 }); 648 assert_eq!(*constant, c2); 649 } 650 651 /// Completes constant emission by iterating over `self.used_constants` and 652 /// filling in the "holes" with the constant values provided by `constants`. 653 /// 654 /// Returns the alignment required for this entire buffer. Alignment starts 655 /// at the ISA's minimum function alignment and can be increased due to 656 /// constant requirements. 657 fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 { 658 let mut alignment = self.min_alignment; 659 for (constant, offset) in mem::take(&mut self.used_constants) { 660 let constant = constants.get(constant); 661 let data = constant.as_slice(); 662 self.data[offset as usize..][..data.len()].copy_from_slice(data); 663 alignment = constant.alignment().max(alignment); 664 } 665 alignment 666 } 667 668 /// Returns a label that can be used to refer to the `constant` provided. 669 /// 670 /// This will automatically defer a new constant to be emitted for 671 /// `constant` if it has not been previously emitted. Note that this 672 /// function may return a different label for the same constant at 673 /// different points in time. The label is valid to use only from the 674 /// current location; the MachBuffer takes care to emit the same constant 675 /// multiple times if needed so the constant is always in range. 676 pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel { 677 let MachBufferConstant { 678 align, 679 size, 680 upcoming_label, 681 } = self.constants[constant]; 682 if let Some(label) = upcoming_label { 683 return label; 684 } 685 686 let label = self.get_label(); 687 trace!( 688 "defer constant: eventually emit {size} bytes aligned \ 689 to {align} at label {label:?}", 690 ); 691 self.pending_constants.push(constant); 692 self.pending_constants_size += size as u32; 693 self.constants[constant].upcoming_label = Some(label); 694 label 695 } 696 697 /// Bind a label to the current offset. A label can only be bound once. 698 pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) { 699 trace!( 700 "MachBuffer: bind label {:?} at offset {}", 701 label, 702 self.cur_offset() 703 ); 704 debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET); 705 debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL); 706 let offset = self.cur_offset(); 707 self.label_offsets[label.0 as usize] = offset; 708 self.lazily_clear_labels_at_tail(); 709 self.labels_at_tail.push(label); 710 711 // Invariants hold: bound offset of label is <= cur_offset (in fact it 712 // is equal). If the `labels_at_tail` list was complete and precise 713 // before, it is still, because we have bound this label to the current 714 // offset and added it to the list (which contains all labels at the 715 // current offset). 716 717 self.optimize_branches(ctrl_plane); 718 719 // Post-invariant: by `optimize_branches()` (see argument there). 720 } 721 722 /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the 723 /// offset that it applies to. 724 fn lazily_clear_labels_at_tail(&mut self) { 725 let offset = self.cur_offset(); 726 if offset > self.labels_at_tail_off { 727 self.labels_at_tail_off = offset; 728 self.labels_at_tail.clear(); 729 } 730 731 // Post-invariant: either labels_at_tail_off was at cur_offset, and 732 // state is untouched, or was less than cur_offset, in which case the 733 // labels_at_tail list was conceptually empty, and is now actually 734 // empty. 735 } 736 737 /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`. 738 pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset { 739 let mut iters = 0; 740 while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL { 741 label = self.label_aliases[label.0 as usize]; 742 // To protect against an infinite loop (despite our assurances to 743 // ourselves that the invariants make this impossible), assert out 744 // after 1M iterations. The number of basic blocks is limited 745 // in most contexts anyway so this should be impossible to hit with 746 // a legitimate input. 747 iters += 1; 748 assert!(iters < 1_000_000, "Unexpected cycle in label aliases"); 749 } 750 self.label_offsets[label.0 as usize] 751 752 // Post-invariant: no mutations. 753 } 754 755 /// Emit a reference to the given label with the given reference type (i.e., 756 /// branch-instruction format) at the current offset. This is like a 757 /// relocation, but handled internally. 758 /// 759 /// This can be called before the branch is actually emitted; fixups will 760 /// not happen until an island is emitted or the buffer is finished. 761 pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) { 762 trace!( 763 "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}", 764 offset, label, kind 765 ); 766 767 // Add the fixup, and update the worst-case island size based on a 768 // veneer for this label use. 769 let fixup = MachLabelFixup { 770 label, 771 offset, 772 kind, 773 }; 774 self.pending_fixup_deadline = self.pending_fixup_deadline.min(fixup.deadline()); 775 self.pending_fixup_records.push(fixup); 776 777 // Post-invariant: no mutations to branches/labels data structures. 778 } 779 780 /// Inform the buffer of an unconditional branch at the given offset, 781 /// targeting the given label. May be used to optimize branches. 782 /// The last added label-use must correspond to this branch. 783 /// This must be called when the current offset is equal to `start`; i.e., 784 /// before actually emitting the branch. This implies that for a branch that 785 /// uses a label and is eligible for optimizations by the MachBuffer, the 786 /// proper sequence is: 787 /// 788 /// - Call `use_label_at_offset()` to emit the fixup record. 789 /// - Call `add_uncond_branch()` to make note of the branch. 790 /// - Emit the bytes for the branch's machine code. 791 /// 792 /// Additional requirement: no labels may be bound between `start` and `end` 793 /// (exclusive on both ends). 794 pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) { 795 debug_assert!( 796 !self.open_patchable, 797 "Branch instruction inserted within a patchable region" 798 ); 799 assert!(self.cur_offset() == start); 800 debug_assert!(end > start); 801 assert!(!self.pending_fixup_records.is_empty()); 802 let fixup = self.pending_fixup_records.len() - 1; 803 self.lazily_clear_labels_at_tail(); 804 self.latest_branches.push(MachBranch { 805 start, 806 end, 807 target, 808 fixup, 809 inverted: None, 810 labels_at_this_branch: self.labels_at_tail.clone(), 811 }); 812 813 // Post-invariant: we asserted branch start is current tail; the list of 814 // labels at branch is cloned from list of labels at current tail. 815 } 816 817 /// Inform the buffer of a conditional branch at the given offset, 818 /// targeting the given label. May be used to optimize branches. 819 /// The last added label-use must correspond to this branch. 820 /// 821 /// Additional requirement: no labels may be bound between `start` and `end` 822 /// (exclusive on both ends). 823 pub fn add_cond_branch( 824 &mut self, 825 start: CodeOffset, 826 end: CodeOffset, 827 target: MachLabel, 828 inverted: &[u8], 829 ) { 830 debug_assert!( 831 !self.open_patchable, 832 "Branch instruction inserted within a patchable region" 833 ); 834 assert!(self.cur_offset() == start); 835 debug_assert!(end > start); 836 assert!(!self.pending_fixup_records.is_empty()); 837 debug_assert!( 838 inverted.len() == (end - start) as usize, 839 "branch length = {}, but inverted length = {}", 840 end - start, 841 inverted.len() 842 ); 843 let fixup = self.pending_fixup_records.len() - 1; 844 let inverted = Some(SmallVec::from(inverted)); 845 self.lazily_clear_labels_at_tail(); 846 self.latest_branches.push(MachBranch { 847 start, 848 end, 849 target, 850 fixup, 851 inverted, 852 labels_at_this_branch: self.labels_at_tail.clone(), 853 }); 854 855 // Post-invariant: we asserted branch start is current tail; labels at 856 // branch list is cloned from list of labels at current tail. 857 } 858 859 fn truncate_last_branch(&mut self) { 860 debug_assert!( 861 !self.open_patchable, 862 "Branch instruction truncated within a patchable region" 863 ); 864 865 self.lazily_clear_labels_at_tail(); 866 // Invariants hold at this point. 867 868 let b = self.latest_branches.pop().unwrap(); 869 assert!(b.end == self.cur_offset()); 870 871 // State: 872 // [PRE CODE] 873 // Offset b.start, b.labels_at_this_branch: 874 // [BRANCH CODE] 875 // cur_off, self.labels_at_tail --> 876 // (end of buffer) 877 self.data.truncate(b.start as usize); 878 self.pending_fixup_records.truncate(b.fixup); 879 880 // Trim srclocs and debug tags now past the end of the buffer. 881 while let Some(last_srcloc) = self.srclocs.last_mut() { 882 if last_srcloc.end <= b.start { 883 break; 884 } 885 if last_srcloc.start < b.start { 886 last_srcloc.end = b.start; 887 break; 888 } 889 self.srclocs.pop(); 890 } 891 while let Some(last_debug_tag) = self.debug_tags.last() { 892 if last_debug_tag.offset <= b.start { 893 break; 894 } 895 self.debug_tags.pop(); 896 } 897 898 // State: 899 // [PRE CODE] 900 // cur_off, Offset b.start, b.labels_at_this_branch: 901 // (end of buffer) 902 // 903 // self.labels_at_tail --> (past end of buffer) 904 let cur_off = self.cur_offset(); 905 self.labels_at_tail_off = cur_off; 906 // State: 907 // [PRE CODE] 908 // cur_off, Offset b.start, b.labels_at_this_branch, 909 // self.labels_at_tail: 910 // (end of buffer) 911 // 912 // resolve_label_offset(l) for l in labels_at_tail: 913 // (past end of buffer) 914 915 trace!( 916 "truncate_last_branch: truncated {:?}; off now {}", 917 b, cur_off 918 ); 919 920 // Fix up resolved label offsets for labels at tail. 921 for &l in &self.labels_at_tail { 922 self.label_offsets[l.0 as usize] = cur_off; 923 } 924 // Old labels_at_this_branch are now at cur_off. 925 self.labels_at_tail.extend(b.labels_at_this_branch); 926 927 // Post-invariant: this operation is defined to truncate the buffer, 928 // which moves cur_off backward, and to move labels at the end of the 929 // buffer back to the start-of-branch offset. 930 // 931 // latest_branches satisfies all invariants: 932 // - it has no branches past the end of the buffer (branches are in 933 // order, we removed the last one, and we truncated the buffer to just 934 // before the start of that branch) 935 // - no labels were moved to lower offsets than the (new) cur_off, so 936 // the labels_at_this_branch list for any other branch need not change. 937 // 938 // labels_at_tail satisfies all invariants: 939 // - all labels that were at the tail after the truncated branch are 940 // moved backward to just before the branch, which becomes the new tail; 941 // thus every element in the list should remain (ensured by `.extend()` 942 // above). 943 // - all labels that refer to the new tail, which is the start-offset of 944 // the truncated branch, must be present. The `labels_at_this_branch` 945 // list in the truncated branch's record is a complete and precise list 946 // of exactly these labels; we append these to labels_at_tail. 947 // - labels_at_tail_off is at cur_off after truncation occurs, so the 948 // list is valid (not to be lazily cleared). 949 // 950 // The stated operation was performed: 951 // - For each label at the end of the buffer prior to this method, it 952 // now resolves to the new (truncated) end of the buffer: it must have 953 // been in `labels_at_tail` (this list is precise and complete, and 954 // the tail was at the end of the truncated branch on entry), and we 955 // iterate over this list and set `label_offsets` to the new tail. 956 // None of these labels could have been an alias (by invariant), so 957 // `label_offsets` is authoritative for each. 958 // - No other labels will be past the end of the buffer, because of the 959 // requirement that no labels be bound to the middle of branch ranges 960 // (see comments to `add_{cond,uncond}_branch()`). 961 // - The buffer is truncated to just before the last branch, and the 962 // fixup record referring to that last branch is removed. 963 } 964 965 /// Performs various optimizations on branches pointing at the current label. 966 pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) { 967 if ctrl_plane.get_decision() { 968 return; 969 } 970 971 self.lazily_clear_labels_at_tail(); 972 // Invariants valid at this point. 973 974 trace!( 975 "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 976 self.latest_branches, self.labels_at_tail, self.pending_fixup_records 977 ); 978 979 // We continue to munch on branches at the tail of the buffer until no 980 // more rules apply. Note that the loop only continues if a branch is 981 // actually truncated (or if labels are redirected away from a branch), 982 // so this always makes progress. 983 while let Some(b) = self.latest_branches.last() { 984 let cur_off = self.cur_offset(); 985 trace!("optimize_branches: last branch {:?} at off {}", b, cur_off); 986 // If there has been any code emission since the end of the last branch or 987 // label definition, then there's nothing we can edit (because we 988 // don't move code once placed, only back up and overwrite), so 989 // clear the records and finish. 990 if b.end < cur_off { 991 break; 992 } 993 994 // If the "labels at this branch" list on this branch is 995 // longer than a threshold, don't do any simplification, 996 // and let the branch remain to separate those labels from 997 // the current tail. This avoids quadratic behavior (see 998 // #3468): otherwise, if a long string of "goto next; 999 // next:" patterns are emitted, all of the labels will 1000 // coalesce into a long list of aliases for the current 1001 // buffer tail. We must track all aliases of the current 1002 // tail for correctness, but we are also allowed to skip 1003 // optimization (removal) of any branch, so we take the 1004 // escape hatch here and let it stand. In effect this 1005 // "spreads" the many thousands of labels in the 1006 // pathological case among an actual (harmless but 1007 // suboptimal) instruction once per N labels. 1008 if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD { 1009 break; 1010 } 1011 1012 // Invariant: we are looking at a branch that ends at the tail of 1013 // the buffer. 1014 1015 // For any branch, conditional or unconditional: 1016 // - If the target is a label at the current offset, then remove 1017 // the conditional branch, and reset all labels that targeted 1018 // the current offset (end of branch) to the truncated 1019 // end-of-code. 1020 // 1021 // Preserves execution semantics: a branch to its own fallthrough 1022 // address is equivalent to a no-op; in both cases, nextPC is the 1023 // fallthrough. 1024 if self.resolve_label_offset(b.target) == cur_off { 1025 trace!("branch with target == cur off; truncating"); 1026 self.truncate_last_branch(); 1027 continue; 1028 } 1029 1030 // If latest is an unconditional branch: 1031 // 1032 // - If the branch's target is not its own start address, then for 1033 // each label at the start of branch, make the label an alias of the 1034 // branch target, and remove the label from the "labels at this 1035 // branch" list. 1036 // 1037 // - Preserves execution semantics: an unconditional branch's 1038 // only effect is to set PC to a new PC; this change simply 1039 // collapses one step in the step-semantics. 1040 // 1041 // - Post-invariant: the labels that were bound to the start of 1042 // this branch become aliases, so they must not be present in any 1043 // labels-at-this-branch list or the labels-at-tail list. The 1044 // labels are removed form the latest-branch record's 1045 // labels-at-this-branch list, and are never placed in the 1046 // labels-at-tail list. Furthermore, it is correct that they are 1047 // not in either list, because they are now aliases, and labels 1048 // that are aliases remain aliases forever. 1049 // 1050 // - If there is a prior unconditional branch that ends just before 1051 // this one begins, and this branch has no labels bound to its 1052 // start, then we can truncate this branch, because it is entirely 1053 // unreachable (we have redirected all labels that make it 1054 // reachable otherwise). Do so and continue around the loop. 1055 // 1056 // - Preserves execution semantics: the branch is unreachable, 1057 // because execution can only flow into an instruction from the 1058 // prior instruction's fallthrough or from a branch bound to that 1059 // instruction's start offset. Unconditional branches have no 1060 // fallthrough, so if the prior instruction is an unconditional 1061 // branch, no fallthrough entry can happen. The 1062 // labels-at-this-branch list is complete (by invariant), so if it 1063 // is empty, then the instruction is entirely unreachable. Thus, 1064 // it can be removed. 1065 // 1066 // - Post-invariant: ensured by truncate_last_branch(). 1067 // 1068 // - If there is a prior conditional branch whose target label 1069 // resolves to the current offset (branches around the 1070 // unconditional branch), then remove the unconditional branch, 1071 // and make the target of the unconditional the target of the 1072 // conditional instead. 1073 // 1074 // - Preserves execution semantics: previously we had: 1075 // 1076 // L1: 1077 // cond_br L2 1078 // br L3 1079 // L2: 1080 // (end of buffer) 1081 // 1082 // by removing the last branch, we have: 1083 // 1084 // L1: 1085 // cond_br L2 1086 // L2: 1087 // (end of buffer) 1088 // 1089 // we then fix up the records for the conditional branch to 1090 // have: 1091 // 1092 // L1: 1093 // cond_br.inverted L3 1094 // L2: 1095 // 1096 // In the original code, control flow reaches L2 when the 1097 // conditional branch's predicate is true, and L3 otherwise. In 1098 // the optimized code, the same is true. 1099 // 1100 // - Post-invariant: all edits to latest_branches and 1101 // labels_at_tail are performed by `truncate_last_branch()`, 1102 // which maintains the invariants at each step. 1103 1104 if b.is_uncond() { 1105 // Set any label equal to current branch's start as an alias of 1106 // the branch's target, if the target is not the branch itself 1107 // (i.e., an infinite loop). 1108 // 1109 // We cannot perform this aliasing if the target of this branch 1110 // ultimately aliases back here; if so, we need to keep this 1111 // branch, so break out of this loop entirely (and clear the 1112 // latest-branches list below). 1113 // 1114 // Note that this check is what prevents cycles from forming in 1115 // `self.label_aliases`. To see why, consider an arbitrary start 1116 // state: 1117 // 1118 // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to 1119 // Ln, which is not aliased. 1120 // 1121 // We would create a cycle if we assigned label_aliases[Ln] 1122 // = L1. Note that the below assignment is the only write 1123 // to label_aliases. 1124 // 1125 // By our other invariants, we have that Ln (`l` below) 1126 // resolves to the offset `b.start`, because it is in the 1127 // set `b.labels_at_this_branch`. 1128 // 1129 // If L1 were already aliased, through some arbitrarily deep 1130 // chain, to Ln, then it must also resolve to this offset 1131 // `b.start`. 1132 // 1133 // By checking the resolution of `L1` against this offset, 1134 // and aborting this branch-simplification if they are 1135 // equal, we prevent the below assignment from ever creating 1136 // a cycle. 1137 if self.resolve_label_offset(b.target) != b.start { 1138 let redirected = b.labels_at_this_branch.len(); 1139 for &l in &b.labels_at_this_branch { 1140 trace!( 1141 " -> label at start of branch {:?} redirected to target {:?}", 1142 l, b.target 1143 ); 1144 self.label_aliases[l.0 as usize] = b.target; 1145 // NOTE: we continue to ensure the invariant that labels 1146 // pointing to tail of buffer are in `labels_at_tail` 1147 // because we already ensured above that the last branch 1148 // cannot have a target of `cur_off`; so we never have 1149 // to put the label into `labels_at_tail` when moving it 1150 // here. 1151 } 1152 // Maintain invariant: all branches have been redirected 1153 // and are no longer pointing at the start of this branch. 1154 let mut_b = self.latest_branches.last_mut().unwrap(); 1155 mut_b.labels_at_this_branch.clear(); 1156 1157 if redirected > 0 { 1158 trace!(" -> after label redirects, restarting loop"); 1159 continue; 1160 } 1161 } else { 1162 break; 1163 } 1164 1165 let b = self.latest_branches.last().unwrap(); 1166 1167 // Examine any immediately preceding branch. 1168 if self.latest_branches.len() > 1 { 1169 let prev_b = &self.latest_branches[self.latest_branches.len() - 2]; 1170 trace!(" -> more than one branch; prev_b = {:?}", prev_b); 1171 // This uncond is immediately after another uncond; we 1172 // should have already redirected labels to this uncond away 1173 // (but check to be sure); so we can truncate this uncond. 1174 if prev_b.is_uncond() 1175 && prev_b.end == b.start 1176 && b.labels_at_this_branch.is_empty() 1177 { 1178 trace!(" -> uncond follows another uncond; truncating"); 1179 self.truncate_last_branch(); 1180 continue; 1181 } 1182 1183 // This uncond is immediately after a conditional, and the 1184 // conditional's target is the end of this uncond, and we've 1185 // already redirected labels to this uncond away; so we can 1186 // truncate this uncond, flip the sense of the conditional, and 1187 // set the conditional's target (in `latest_branches` and in 1188 // `fixup_records`) to the uncond's target. 1189 if prev_b.is_cond() 1190 && prev_b.end == b.start 1191 && self.resolve_label_offset(prev_b.target) == cur_off 1192 { 1193 trace!( 1194 " -> uncond follows a conditional, and conditional's target resolves to current offset" 1195 ); 1196 // Save the target of the uncond (this becomes the 1197 // target of the cond), and truncate the uncond. 1198 let target = b.target; 1199 let data = prev_b.inverted.clone().unwrap(); 1200 self.truncate_last_branch(); 1201 1202 // Mutate the code and cond branch. 1203 let off_before_edit = self.cur_offset(); 1204 let prev_b = self.latest_branches.last_mut().unwrap(); 1205 let not_inverted = SmallVec::from( 1206 &self.data[(prev_b.start as usize)..(prev_b.end as usize)], 1207 ); 1208 1209 // Low-level edit: replaces bytes of branch with 1210 // inverted form. cur_off remains the same afterward, so 1211 // we do not need to modify label data structures. 1212 self.data.truncate(prev_b.start as usize); 1213 self.data.extend_from_slice(&data[..]); 1214 1215 // Save the original code as the inversion of the 1216 // inverted branch, in case we later edit this branch 1217 // again. 1218 prev_b.inverted = Some(not_inverted); 1219 self.pending_fixup_records[prev_b.fixup].label = target; 1220 trace!(" -> reassigning target of condbr to {:?}", target); 1221 prev_b.target = target; 1222 debug_assert_eq!(off_before_edit, self.cur_offset()); 1223 continue; 1224 } 1225 } 1226 } 1227 1228 // If we couldn't do anything with the last branch, then break. 1229 break; 1230 } 1231 1232 self.purge_latest_branches(); 1233 1234 trace!( 1235 "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 1236 self.latest_branches, self.labels_at_tail, self.pending_fixup_records 1237 ); 1238 } 1239 1240 fn purge_latest_branches(&mut self) { 1241 // All of our branch simplification rules work only if a branch ends at 1242 // the tail of the buffer, with no following code; and branches are in 1243 // order in latest_branches; so if the last entry ends prior to 1244 // cur_offset, then clear all entries. 1245 let cur_off = self.cur_offset(); 1246 if let Some(l) = self.latest_branches.last() { 1247 if l.end < cur_off { 1248 trace!("purge_latest_branches: removing branch {:?}", l); 1249 self.latest_branches.clear(); 1250 } 1251 } 1252 1253 // Post-invariant: no invariant requires any branch to appear in 1254 // `latest_branches`; it is always optional. The list-clear above thus 1255 // preserves all semantics. 1256 } 1257 1258 /// Emit a trap at some point in the future with the specified code and 1259 /// stack map. 1260 /// 1261 /// This function returns a [`MachLabel`] which will be the future address 1262 /// of the trap. Jumps should refer to this label, likely by using the 1263 /// [`MachBuffer::use_label_at_offset`] method, to get a relocation 1264 /// patched in once the address of the trap is known. 1265 /// 1266 /// This will batch all traps into the end of the function. 1267 pub fn defer_trap(&mut self, code: TrapCode) -> MachLabel { 1268 let label = self.get_label(); 1269 self.pending_traps.push(MachLabelTrap { 1270 label, 1271 code, 1272 loc: self.cur_srcloc.map(|(_start, loc)| loc), 1273 }); 1274 label 1275 } 1276 1277 /// Is an island needed within the next N bytes? 1278 pub fn island_needed(&self, distance: CodeOffset) -> bool { 1279 let deadline = match self.fixup_records.peek() { 1280 Some(fixup) => fixup.deadline().min(self.pending_fixup_deadline), 1281 None => self.pending_fixup_deadline, 1282 }; 1283 deadline < u32::MAX && self.worst_case_end_of_island(distance) > deadline 1284 } 1285 1286 /// Returns the maximal offset that islands can reach if `distance` more 1287 /// bytes are appended. 1288 /// 1289 /// This is used to determine if veneers need insertions since jumps that 1290 /// can't reach past this point must get a veneer of some form. 1291 fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset { 1292 // Assume that all fixups will require veneers and that the veneers are 1293 // the worst-case size for each platform. This is an over-generalization 1294 // to avoid iterating over the `fixup_records` list or maintaining 1295 // information about it as we go along. 1296 let island_worst_case_size = ((self.fixup_records.len() + self.pending_fixup_records.len()) 1297 as u32) 1298 * (I::LabelUse::worst_case_veneer_size()) 1299 + self.pending_constants_size 1300 + (self.pending_traps.len() * I::TRAP_OPCODE.len()) as u32; 1301 self.cur_offset() 1302 .saturating_add(distance) 1303 .saturating_add(island_worst_case_size) 1304 } 1305 1306 /// Emit all pending constants and required pending veneers. 1307 /// 1308 /// Should only be called if `island_needed()` returns true, i.e., if we 1309 /// actually reach a deadline. It's not necessarily a problem to do so 1310 /// otherwise but it may result in unnecessary work during emission. 1311 pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) { 1312 self.emit_island_maybe_forced(ForceVeneers::No, distance, ctrl_plane); 1313 } 1314 1315 /// Same as `emit_island`, but an internal API with a `force_veneers` 1316 /// argument to force all veneers to always get emitted for debugging. 1317 fn emit_island_maybe_forced( 1318 &mut self, 1319 force_veneers: ForceVeneers, 1320 distance: CodeOffset, 1321 ctrl_plane: &mut ControlPlane, 1322 ) { 1323 // We're going to purge fixups, so no latest-branch editing can happen 1324 // anymore. 1325 self.latest_branches.clear(); 1326 1327 // End the current location tracking since anything emitted during this 1328 // function shouldn't be attributed to whatever the current source 1329 // location is. 1330 // 1331 // Note that the current source location, if it's set right now, will be 1332 // restored at the end of this island emission. 1333 let cur_loc = self.cur_srcloc.map(|(_, loc)| loc); 1334 if cur_loc.is_some() { 1335 self.end_srcloc(); 1336 } 1337 1338 let forced_threshold = self.worst_case_end_of_island(distance); 1339 1340 // First flush out all traps/constants so we have more labels in case 1341 // fixups are applied against these labels. 1342 // 1343 // Note that traps are placed first since this typically happens at the 1344 // end of the function and for disassemblers we try to keep all the code 1345 // contiguously together. 1346 for MachLabelTrap { label, code, loc } in mem::take(&mut self.pending_traps) { 1347 // If this trap has source information associated with it then 1348 // emit this information for the trap instruction going out now too. 1349 if let Some(loc) = loc { 1350 self.start_srcloc(loc); 1351 } 1352 self.align_to(I::LabelUse::ALIGN); 1353 self.bind_label(label, ctrl_plane); 1354 self.add_trap(code); 1355 self.put_data(I::TRAP_OPCODE); 1356 if loc.is_some() { 1357 self.end_srcloc(); 1358 } 1359 } 1360 1361 for constant in mem::take(&mut self.pending_constants) { 1362 let MachBufferConstant { align, size, .. } = self.constants[constant]; 1363 let label = self.constants[constant].upcoming_label.take().unwrap(); 1364 self.align_to(align); 1365 self.bind_label(label, ctrl_plane); 1366 self.used_constants.push((constant, self.cur_offset())); 1367 self.get_appended_space(size); 1368 } 1369 1370 // Either handle all pending fixups because they're ready or move them 1371 // onto the `BinaryHeap` tracking all pending fixups if they aren't 1372 // ready. 1373 assert!(self.latest_branches.is_empty()); 1374 for fixup in mem::take(&mut self.pending_fixup_records) { 1375 if self.should_apply_fixup(&fixup, forced_threshold) { 1376 self.handle_fixup(fixup, force_veneers, forced_threshold); 1377 } else { 1378 self.fixup_records.push(fixup); 1379 } 1380 } 1381 self.pending_fixup_deadline = u32::MAX; 1382 while let Some(fixup) = self.fixup_records.peek() { 1383 trace!("emit_island: fixup {:?}", fixup); 1384 1385 // If this fixup shouldn't be applied, that means its label isn't 1386 // defined yet and there'll be remaining space to apply a veneer if 1387 // necessary in the future after this island. In that situation 1388 // because `fixup_records` is sorted by deadline this loop can 1389 // exit. 1390 if !self.should_apply_fixup(fixup, forced_threshold) { 1391 break; 1392 } 1393 1394 let fixup = self.fixup_records.pop().unwrap(); 1395 self.handle_fixup(fixup, force_veneers, forced_threshold); 1396 } 1397 1398 if let Some(loc) = cur_loc { 1399 self.start_srcloc(loc); 1400 } 1401 } 1402 1403 fn should_apply_fixup(&self, fixup: &MachLabelFixup<I>, forced_threshold: CodeOffset) -> bool { 1404 let label_offset = self.resolve_label_offset(fixup.label); 1405 label_offset != UNKNOWN_LABEL_OFFSET || fixup.deadline() < forced_threshold 1406 } 1407 1408 fn handle_fixup( 1409 &mut self, 1410 fixup: MachLabelFixup<I>, 1411 force_veneers: ForceVeneers, 1412 forced_threshold: CodeOffset, 1413 ) { 1414 let MachLabelFixup { 1415 label, 1416 offset, 1417 kind, 1418 } = fixup; 1419 let start = offset as usize; 1420 let end = (offset + kind.patch_size()) as usize; 1421 let label_offset = self.resolve_label_offset(label); 1422 1423 if label_offset != UNKNOWN_LABEL_OFFSET { 1424 // If the offset of the label for this fixup is known then 1425 // we're going to do something here-and-now. We're either going 1426 // to patch the original offset because it's an in-bounds jump, 1427 // or we're going to generate a veneer, patch the fixup to jump 1428 // to the veneer, and then keep going. 1429 // 1430 // If the label comes after the original fixup, then we should 1431 // be guaranteed that the jump is in-bounds. Otherwise there's 1432 // a bug somewhere because this method wasn't called soon 1433 // enough. All forward-jumps are tracked and should get veneers 1434 // before their deadline comes and they're unable to jump 1435 // further. 1436 // 1437 // Otherwise if the label is before the fixup, then that's a 1438 // backwards jump. If it's past the maximum negative range 1439 // then we'll emit a veneer that to jump forward to which can 1440 // then jump backwards. 1441 let veneer_required = if label_offset >= offset { 1442 assert!((label_offset - offset) <= kind.max_pos_range()); 1443 false 1444 } else { 1445 (offset - label_offset) > kind.max_neg_range() 1446 }; 1447 trace!( 1448 " -> label_offset = {}, known, required = {} (pos {} neg {})", 1449 label_offset, 1450 veneer_required, 1451 kind.max_pos_range(), 1452 kind.max_neg_range() 1453 ); 1454 1455 if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required { 1456 self.emit_veneer(label, offset, kind); 1457 } else { 1458 let slice = &mut self.data[start..end]; 1459 trace!( 1460 "patching in-range! slice = {slice:?}; offset = {offset:#x}; label_offset = {label_offset:#x}" 1461 ); 1462 kind.patch(slice, offset, label_offset); 1463 } 1464 } else { 1465 // If the offset of this label is not known at this time then 1466 // that means that a veneer is required because after this 1467 // island the target can't be in range of the original target. 1468 assert!(forced_threshold - offset > kind.max_pos_range()); 1469 self.emit_veneer(label, offset, kind); 1470 } 1471 } 1472 1473 /// Emits a "veneer" the `kind` code at `offset` to jump to `label`. 1474 /// 1475 /// This will generate extra machine code, using `kind`, to get a 1476 /// larger-jump-kind than `kind` allows. The code at `offset` is then 1477 /// patched to jump to our new code, and then the new code is enqueued for 1478 /// a fixup to get processed at some later time. 1479 fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) { 1480 // If this `kind` doesn't support a veneer then that's a bug in the 1481 // backend because we need to implement support for such a veneer. 1482 assert!( 1483 kind.supports_veneer(), 1484 "jump beyond the range of {kind:?} but a veneer isn't supported", 1485 ); 1486 1487 // Allocate space for a veneer in the island. 1488 self.align_to(I::LabelUse::ALIGN); 1489 let veneer_offset = self.cur_offset(); 1490 trace!("making a veneer at {}", veneer_offset); 1491 let start = offset as usize; 1492 let end = (offset + kind.patch_size()) as usize; 1493 let slice = &mut self.data[start..end]; 1494 // Patch the original label use to refer to the veneer. 1495 trace!( 1496 "patching original at offset {} to veneer offset {}", 1497 offset, veneer_offset 1498 ); 1499 kind.patch(slice, offset, veneer_offset); 1500 // Generate the veneer. 1501 let veneer_slice = self.get_appended_space(kind.veneer_size() as usize); 1502 let (veneer_fixup_off, veneer_label_use) = 1503 kind.generate_veneer(veneer_slice, veneer_offset); 1504 trace!( 1505 "generated veneer; fixup offset {}, label_use {:?}", 1506 veneer_fixup_off, veneer_label_use 1507 ); 1508 // Register a new use of `label` with our new veneer fixup and 1509 // offset. This'll recalculate deadlines accordingly and 1510 // enqueue this fixup to get processed at some later 1511 // time. 1512 self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use); 1513 } 1514 1515 fn finish_emission_maybe_forcing_veneers( 1516 &mut self, 1517 force_veneers: ForceVeneers, 1518 ctrl_plane: &mut ControlPlane, 1519 ) { 1520 while !self.pending_constants.is_empty() 1521 || !self.pending_traps.is_empty() 1522 || !self.fixup_records.is_empty() 1523 || !self.pending_fixup_records.is_empty() 1524 { 1525 // `emit_island()` will emit any pending veneers and constants, and 1526 // as a side-effect, will also take care of any fixups with resolved 1527 // labels eagerly. 1528 self.emit_island_maybe_forced(force_veneers, u32::MAX, ctrl_plane); 1529 } 1530 1531 // Ensure that all labels have been fixed up after the last island is emitted. This is a 1532 // full (release-mode) assert because an unresolved label means the emitted code is 1533 // incorrect. 1534 assert!(self.fixup_records.is_empty()); 1535 assert!(self.pending_fixup_records.is_empty()); 1536 } 1537 1538 /// Finish any deferred emissions and/or fixups. 1539 pub fn finish( 1540 mut self, 1541 constants: &VCodeConstants, 1542 ctrl_plane: &mut ControlPlane, 1543 ) -> MachBufferFinalized<Stencil> { 1544 let _tt = timing::vcode_emit_finish(); 1545 1546 self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane); 1547 1548 let alignment = self.finish_constants(constants); 1549 1550 // Resolve all labels to their offsets. 1551 let finalized_relocs = self 1552 .relocs 1553 .iter() 1554 .map(|reloc| FinalizedMachReloc { 1555 offset: reloc.offset, 1556 kind: reloc.kind, 1557 addend: reloc.addend, 1558 target: match &reloc.target { 1559 RelocTarget::ExternalName(name) => { 1560 FinalizedRelocTarget::ExternalName(name.clone()) 1561 } 1562 RelocTarget::Label(label) => { 1563 FinalizedRelocTarget::Func(self.resolve_label_offset(*label)) 1564 } 1565 }, 1566 }) 1567 .collect(); 1568 1569 let finalized_exception_handlers = self 1570 .exception_handlers 1571 .iter() 1572 .map(|handler| handler.finalize(|label| self.resolve_label_offset(label))) 1573 .collect(); 1574 1575 let mut srclocs = self.srclocs; 1576 srclocs.sort_by_key(|entry| entry.start); 1577 1578 MachBufferFinalized { 1579 data: self.data, 1580 relocs: finalized_relocs, 1581 traps: self.traps, 1582 call_sites: self.call_sites, 1583 patchable_call_sites: self.patchable_call_sites, 1584 exception_handlers: finalized_exception_handlers, 1585 srclocs, 1586 debug_tags: self.debug_tags, 1587 debug_tag_pool: self.debug_tag_pool, 1588 user_stack_maps: self.user_stack_maps, 1589 unwind_info: self.unwind_info, 1590 alignment, 1591 frame_layout: self.frame_layout, 1592 nop_units: I::gen_nop_units(), 1593 } 1594 } 1595 1596 /// Add an external relocation at the given offset. 1597 pub fn add_reloc_at_offset<T: Into<RelocTarget> + Clone>( 1598 &mut self, 1599 offset: CodeOffset, 1600 kind: Reloc, 1601 target: &T, 1602 addend: Addend, 1603 ) { 1604 let target: RelocTarget = target.clone().into(); 1605 // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally 1606 // generate a label-use statement to track whether an island is possibly 1607 // needed to escape this function to actually get to the external name. 1608 // This is most likely to come up on AArch64 where calls between 1609 // functions use a 26-bit signed offset which gives +/- 64MB. This means 1610 // that if a function is 128MB in size and there's a call in the middle 1611 // it's impossible to reach the actual target. Also, while it's 1612 // technically possible to jump to the start of a function and then jump 1613 // further, island insertion below always inserts islands after 1614 // previously appended code so for Cranelift's own implementation this 1615 // is also a problem for 64MB functions on AArch64 which start with a 1616 // call instruction, those won't be able to escape. 1617 // 1618 // Ideally what needs to happen here is that a `LabelUse` is 1619 // transparently generated (or call-sites of this function are audited 1620 // to generate a `LabelUse` instead) and tracked internally. The actual 1621 // relocation would then change over time if and when a veneer is 1622 // inserted, where the relocation here would be patched by this 1623 // `MachBuffer` to jump to the veneer. The problem, though, is that all 1624 // this still needs to end up, in the case of a singular function, 1625 // generating a final relocation pointing either to this particular 1626 // relocation or to the veneer inserted. Additionally 1627 // `MachBuffer` needs the concept of a label which will never be 1628 // resolved, so `emit_island` doesn't trip over not actually ever 1629 // knowing what some labels are. Currently the loop in 1630 // `finish_emission_maybe_forcing_veneers` would otherwise infinitely 1631 // loop. 1632 // 1633 // For now this means that because relocs aren't tracked at all that 1634 // AArch64 functions have a rough size limits of 64MB. For now that's 1635 // somewhat reasonable and the failure mode is a panic in `MachBuffer` 1636 // when a relocation can't otherwise be resolved later, so it shouldn't 1637 // actually result in any memory unsafety or anything like that. 1638 self.relocs.push(MachReloc { 1639 offset, 1640 kind, 1641 target, 1642 addend, 1643 }); 1644 } 1645 1646 /// Add an external relocation at the current offset. 1647 pub fn add_reloc<T: Into<RelocTarget> + Clone>( 1648 &mut self, 1649 kind: Reloc, 1650 target: &T, 1651 addend: Addend, 1652 ) { 1653 self.add_reloc_at_offset(self.data.len() as CodeOffset, kind, target, addend); 1654 } 1655 1656 /// Add a trap record at the current offset. 1657 pub fn add_trap(&mut self, code: TrapCode) { 1658 self.traps.push(MachTrap { 1659 offset: self.data.len() as CodeOffset, 1660 code, 1661 }); 1662 } 1663 1664 /// Add a call-site record at the current offset. 1665 pub fn add_call_site(&mut self) { 1666 self.add_try_call_site(None, core::iter::empty()); 1667 } 1668 1669 /// Add a call-site record at the current offset with exception 1670 /// handlers. 1671 pub fn add_try_call_site( 1672 &mut self, 1673 frame_offset: Option<u32>, 1674 exception_handlers: impl Iterator<Item = MachExceptionHandler>, 1675 ) { 1676 let start = u32::try_from(self.exception_handlers.len()).unwrap(); 1677 self.exception_handlers.extend(exception_handlers); 1678 let end = u32::try_from(self.exception_handlers.len()).unwrap(); 1679 let exception_handler_range = start..end; 1680 1681 self.call_sites.push(MachCallSite { 1682 ret_addr: self.data.len() as CodeOffset, 1683 frame_offset, 1684 exception_handler_range, 1685 }); 1686 } 1687 1688 /// Add a patchable call record at the current offset The actual 1689 /// call is expected to have been emitted; the VCodeInst trait 1690 /// specifies how to NOP it out, and we carry that information to 1691 /// the finalized Machbuffer. 1692 pub fn add_patchable_call_site(&mut self, len: u32) { 1693 self.patchable_call_sites.push(MachPatchableCallSite { 1694 ret_addr: self.cur_offset(), 1695 len, 1696 }); 1697 } 1698 1699 /// Add an unwind record at the current offset. 1700 pub fn add_unwind(&mut self, unwind: UnwindInst) { 1701 self.unwind_info.push((self.cur_offset(), unwind)); 1702 } 1703 1704 /// Set the `SourceLoc` for code from this offset until the offset at the 1705 /// next call to `end_srcloc()`. 1706 /// Returns the current [CodeOffset] and [RelSourceLoc]. 1707 pub fn start_srcloc(&mut self, loc: RelSourceLoc) -> (CodeOffset, RelSourceLoc) { 1708 let cur = (self.cur_offset(), loc); 1709 self.cur_srcloc = Some(cur); 1710 cur 1711 } 1712 1713 /// Mark the end of the `SourceLoc` segment started at the last 1714 /// `start_srcloc()` call. 1715 pub fn end_srcloc(&mut self) { 1716 let (start, loc) = self 1717 .cur_srcloc 1718 .take() 1719 .expect("end_srcloc() called without start_srcloc()"); 1720 let end = self.cur_offset(); 1721 // Skip zero-length extends. 1722 debug_assert!(end >= start); 1723 if end > start { 1724 self.srclocs.push(MachSrcLoc { start, end, loc }); 1725 } 1726 } 1727 1728 /// Push a user stack map onto this buffer. 1729 /// 1730 /// The stack map is associated with the given `return_addr` code 1731 /// offset. This must be the PC for the instruction just *after* this stack 1732 /// map's associated instruction. For example in the sequence `call $foo; 1733 /// add r8, rax`, the `return_addr` must be the offset of the start of the 1734 /// `add` instruction. 1735 /// 1736 /// Stack maps must be pushed in sorted `return_addr` order. 1737 pub fn push_user_stack_map( 1738 &mut self, 1739 emit_state: &I::State, 1740 return_addr: CodeOffset, 1741 mut stack_map: ir::UserStackMap, 1742 ) { 1743 let span = emit_state.frame_layout().active_size(); 1744 trace!("Adding user stack map @ {return_addr:#x} spanning {span} bytes: {stack_map:?}"); 1745 1746 debug_assert!( 1747 self.user_stack_maps 1748 .last() 1749 .map_or(true, |(prev_addr, _, _)| *prev_addr < return_addr), 1750 "pushed stack maps out of order: {} is not less than {}", 1751 self.user_stack_maps.last().unwrap().0, 1752 return_addr, 1753 ); 1754 1755 stack_map.finalize(emit_state.frame_layout().sp_to_sized_stack_slots()); 1756 self.user_stack_maps.push((return_addr, span, stack_map)); 1757 } 1758 1759 /// Push a debug tag associated with the current buffer offset. 1760 pub fn push_debug_tags(&mut self, pos: MachDebugTagPos, tags: &[DebugTag]) { 1761 trace!("debug tags at offset {}: {tags:?}", self.cur_offset()); 1762 let start = u32::try_from(self.debug_tag_pool.len()).unwrap(); 1763 self.debug_tag_pool.extend(tags.iter().cloned()); 1764 let end = u32::try_from(self.debug_tag_pool.len()).unwrap(); 1765 self.debug_tags.push(MachDebugTags { 1766 offset: self.cur_offset(), 1767 pos, 1768 range: start..end, 1769 }); 1770 } 1771 1772 /// Increase the alignment of the buffer to the given alignment if bigger 1773 /// than the current alignment. 1774 pub fn set_log2_min_function_alignment(&mut self, align_to: u8) { 1775 self.min_alignment = self.min_alignment.max( 1776 1u32.checked_shl(u32::from(align_to)) 1777 .expect("log2_min_function_alignment too large"), 1778 ); 1779 } 1780 1781 /// Set the frame layout metadata. 1782 pub fn set_frame_layout(&mut self, frame_layout: MachBufferFrameLayout) { 1783 debug_assert!(self.frame_layout.is_none()); 1784 self.frame_layout = Some(frame_layout); 1785 } 1786 } 1787 1788 impl<I: VCodeInst> Extend<u8> for MachBuffer<I> { 1789 fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) { 1790 for b in iter { 1791 self.put1(b); 1792 } 1793 } 1794 } 1795 1796 impl<T: CompilePhase> MachBufferFinalized<T> { 1797 /// Get a list of source location mapping tuples in sorted-by-start-offset order. 1798 pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] { 1799 &self.srclocs[..] 1800 } 1801 1802 /// Get all debug tags, sorted by associated offset. 1803 pub fn debug_tags(&self) -> impl Iterator<Item = MachBufferDebugTagList<'_>> { 1804 self.debug_tags.iter().map(|tags| { 1805 let start = usize::try_from(tags.range.start).unwrap(); 1806 let end = usize::try_from(tags.range.end).unwrap(); 1807 MachBufferDebugTagList { 1808 offset: tags.offset, 1809 pos: tags.pos, 1810 tags: &self.debug_tag_pool[start..end], 1811 } 1812 }) 1813 } 1814 1815 /// Get the total required size for the code. 1816 pub fn total_size(&self) -> CodeOffset { 1817 self.data.len() as CodeOffset 1818 } 1819 1820 /// Return the code in this mach buffer as a hex string for testing purposes. 1821 pub fn stringify_code_bytes(&self) -> String { 1822 // This is pretty lame, but whatever .. 1823 use std::fmt::Write; 1824 let mut s = String::with_capacity(self.data.len() * 2); 1825 for b in &self.data { 1826 write!(&mut s, "{b:02X}").unwrap(); 1827 } 1828 s 1829 } 1830 1831 /// Get the code bytes. 1832 pub fn data(&self) -> &[u8] { 1833 // N.B.: we emit every section into the .text section as far as 1834 // the `CodeSink` is concerned; we do not bother to segregate 1835 // the contents into the actual program text, the jumptable and the 1836 // rodata (constant pool). This allows us to generate code assuming 1837 // that these will not be relocated relative to each other, and avoids 1838 // having to designate each section as belonging in one of the three 1839 // fixed categories defined by `CodeSink`. If this becomes a problem 1840 // later (e.g. because of memory permissions or similar), we can 1841 // add this designation and segregate the output; take care, however, 1842 // to add the appropriate relocations in this case. 1843 1844 &self.data[..] 1845 } 1846 1847 /// Get a mutable slice of the code bytes, allowing patching 1848 /// post-passes. 1849 pub fn data_mut(&mut self) -> &mut [u8] { 1850 &mut self.data[..] 1851 } 1852 1853 /// Get the list of external relocations for this code. 1854 pub fn relocs(&self) -> &[FinalizedMachReloc] { 1855 &self.relocs[..] 1856 } 1857 1858 /// Get the list of trap records for this code. 1859 pub fn traps(&self) -> &[MachTrap] { 1860 &self.traps[..] 1861 } 1862 1863 /// Get the user stack map metadata for this code. 1864 pub fn user_stack_maps(&self) -> &[(CodeOffset, u32, ir::UserStackMap)] { 1865 &self.user_stack_maps 1866 } 1867 1868 /// Take this buffer's user strack map metadata. 1869 pub fn take_user_stack_maps(&mut self) -> SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]> { 1870 mem::take(&mut self.user_stack_maps) 1871 } 1872 1873 /// Get the list of call sites for this code, along with 1874 /// associated exception handlers. 1875 /// 1876 /// Each item yielded by the returned iterator is a struct with: 1877 /// 1878 /// - The call site metadata record, with a `ret_addr` field 1879 /// directly accessible and denoting the offset of the return 1880 /// address into this buffer's code. 1881 /// - The slice of pairs of exception tags and code offsets 1882 /// denoting exception-handler entry points associated with this 1883 /// call site. 1884 pub fn call_sites(&self) -> impl Iterator<Item = FinalizedMachCallSite<'_>> + '_ { 1885 self.call_sites.iter().map(|call_site| { 1886 let handler_range = call_site.exception_handler_range.clone(); 1887 let handler_range = usize::try_from(handler_range.start).unwrap() 1888 ..usize::try_from(handler_range.end).unwrap(); 1889 FinalizedMachCallSite { 1890 ret_addr: call_site.ret_addr, 1891 frame_offset: call_site.frame_offset, 1892 exception_handlers: &self.exception_handlers[handler_range], 1893 } 1894 }) 1895 } 1896 1897 /// Get the frame layout, if known. 1898 pub fn frame_layout(&self) -> Option<&MachBufferFrameLayout> { 1899 self.frame_layout.as_ref() 1900 } 1901 1902 /// Get the list of patchable call sites for this code. 1903 /// 1904 /// Each location in the buffer contains the bytes for a call 1905 /// instruction to the specified target. If the call is to be 1906 /// patched out, the bytes in the region should be replaced with 1907 /// those given in the `MachBufferFinalized::nop` array, repeated 1908 /// as many times as necessary. (The length of the patchable 1909 /// region is guaranteed to be an integer multiple of that NOP 1910 /// unit size.) 1911 pub fn patchable_call_sites(&self) -> impl Iterator<Item = &MachPatchableCallSite> + '_ { 1912 self.patchable_call_sites.iter() 1913 } 1914 } 1915 1916 /// An item in the exception-handler list for a callsite, with label 1917 /// references. Items are interpreted in left-to-right order and the 1918 /// first match wins. 1919 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 1920 pub enum MachExceptionHandler { 1921 /// A specific tag (in the current dynamic context) should be 1922 /// handled by the code at the given offset. 1923 Tag(ExceptionTag, MachLabel), 1924 /// All exceptions should be handled by the code at the given 1925 /// offset. 1926 Default(MachLabel), 1927 /// The dynamic context for interpreting tags is updated to the 1928 /// value stored in the given machine location (in this frame's 1929 /// context). 1930 Context(ExceptionContextLoc), 1931 } 1932 1933 impl MachExceptionHandler { 1934 fn finalize<F: Fn(MachLabel) -> CodeOffset>(self, f: F) -> FinalizedMachExceptionHandler { 1935 match self { 1936 Self::Tag(tag, label) => FinalizedMachExceptionHandler::Tag(tag, f(label)), 1937 Self::Default(label) => FinalizedMachExceptionHandler::Default(f(label)), 1938 Self::Context(loc) => FinalizedMachExceptionHandler::Context(loc), 1939 } 1940 } 1941 } 1942 1943 /// An item in the exception-handler list for a callsite, with final 1944 /// (lowered) code offsets. Items are interpreted in left-to-right 1945 /// order and the first match wins. 1946 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 1947 #[cfg_attr( 1948 feature = "enable-serde", 1949 derive(serde_derive::Serialize, serde_derive::Deserialize) 1950 )] 1951 pub enum FinalizedMachExceptionHandler { 1952 /// A specific tag (in the current dynamic context) should be 1953 /// handled by the code at the given offset. 1954 Tag(ExceptionTag, CodeOffset), 1955 /// All exceptions should be handled by the code at the given 1956 /// offset. 1957 Default(CodeOffset), 1958 /// The dynamic context for interpreting tags is updated to the 1959 /// value stored in the given machine location (in this frame's 1960 /// context). 1961 Context(ExceptionContextLoc), 1962 } 1963 1964 /// A location for a dynamic exception context value. 1965 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 1966 #[cfg_attr( 1967 feature = "enable-serde", 1968 derive(serde_derive::Serialize, serde_derive::Deserialize) 1969 )] 1970 pub enum ExceptionContextLoc { 1971 /// An offset from SP at the callsite. 1972 SPOffset(u32), 1973 /// A GPR at the callsite. The physical register number for the 1974 /// GPR register file on the target architecture is used. 1975 GPR(u8), 1976 } 1977 1978 /// Metadata about a constant. 1979 struct MachBufferConstant { 1980 /// A label which has not yet been bound which can be used for this 1981 /// constant. 1982 /// 1983 /// This is lazily created when a label is requested for a constant and is 1984 /// cleared when a constant is emitted. 1985 upcoming_label: Option<MachLabel>, 1986 /// Required alignment. 1987 align: CodeOffset, 1988 /// The byte size of this constant. 1989 size: usize, 1990 } 1991 1992 /// A trap that is deferred to the next time an island is emitted for either 1993 /// traps, constants, or fixups. 1994 struct MachLabelTrap { 1995 /// This label will refer to the trap's offset. 1996 label: MachLabel, 1997 /// The code associated with this trap. 1998 code: TrapCode, 1999 /// An optional source location to assign for this trap. 2000 loc: Option<RelSourceLoc>, 2001 } 2002 2003 /// A fixup to perform on the buffer once code is emitted. Fixups always refer 2004 /// to labels and patch the code based on label offsets. Hence, they are like 2005 /// relocations, but internal to one buffer. 2006 #[derive(Debug)] 2007 struct MachLabelFixup<I: VCodeInst> { 2008 /// The label whose offset controls this fixup. 2009 label: MachLabel, 2010 /// The offset to fix up / patch to refer to this label. 2011 offset: CodeOffset, 2012 /// The kind of fixup. This is architecture-specific; each architecture may have, 2013 /// e.g., several types of branch instructions, each with differently-sized 2014 /// offset fields and different places within the instruction to place the 2015 /// bits. 2016 kind: I::LabelUse, 2017 } 2018 2019 impl<I: VCodeInst> MachLabelFixup<I> { 2020 fn deadline(&self) -> CodeOffset { 2021 self.offset.saturating_add(self.kind.max_pos_range()) 2022 } 2023 } 2024 2025 impl<I: VCodeInst> PartialEq for MachLabelFixup<I> { 2026 fn eq(&self, other: &Self) -> bool { 2027 self.deadline() == other.deadline() 2028 } 2029 } 2030 2031 impl<I: VCodeInst> Eq for MachLabelFixup<I> {} 2032 2033 impl<I: VCodeInst> PartialOrd for MachLabelFixup<I> { 2034 fn partial_cmp(&self, other: &Self) -> Option<Ordering> { 2035 Some(self.cmp(other)) 2036 } 2037 } 2038 2039 impl<I: VCodeInst> Ord for MachLabelFixup<I> { 2040 fn cmp(&self, other: &Self) -> Ordering { 2041 other.deadline().cmp(&self.deadline()) 2042 } 2043 } 2044 2045 /// A relocation resulting from a compilation. 2046 #[derive(Clone, Debug, PartialEq)] 2047 #[cfg_attr( 2048 feature = "enable-serde", 2049 derive(serde_derive::Serialize, serde_derive::Deserialize) 2050 )] 2051 pub struct MachRelocBase<T> { 2052 /// The offset at which the relocation applies, *relative to the 2053 /// containing section*. 2054 pub offset: CodeOffset, 2055 /// The kind of relocation. 2056 pub kind: Reloc, 2057 /// The external symbol / name to which this relocation refers. 2058 pub target: T, 2059 /// The addend to add to the symbol value. 2060 pub addend: i64, 2061 } 2062 2063 type MachReloc = MachRelocBase<RelocTarget>; 2064 2065 /// A relocation resulting from a compilation. 2066 pub type FinalizedMachReloc = MachRelocBase<FinalizedRelocTarget>; 2067 2068 /// A Relocation target 2069 #[derive(Debug, Clone, PartialEq, Eq, Hash)] 2070 pub enum RelocTarget { 2071 /// Points to an [ExternalName] outside the current function. 2072 ExternalName(ExternalName), 2073 /// Points to a [MachLabel] inside this function. 2074 /// This is different from [MachLabelFixup] in that both the relocation and the 2075 /// label will be emitted and are only resolved at link time. 2076 /// 2077 /// There is no reason to prefer this over [MachLabelFixup] unless the ABI requires it. 2078 Label(MachLabel), 2079 } 2080 2081 impl From<ExternalName> for RelocTarget { 2082 fn from(name: ExternalName) -> Self { 2083 Self::ExternalName(name) 2084 } 2085 } 2086 2087 impl From<MachLabel> for RelocTarget { 2088 fn from(label: MachLabel) -> Self { 2089 Self::Label(label) 2090 } 2091 } 2092 2093 /// A Relocation target 2094 #[derive(Debug, Clone, PartialEq, Eq, Hash)] 2095 #[cfg_attr( 2096 feature = "enable-serde", 2097 derive(serde_derive::Serialize, serde_derive::Deserialize) 2098 )] 2099 pub enum FinalizedRelocTarget { 2100 /// Points to an [ExternalName] outside the current function. 2101 ExternalName(ExternalName), 2102 /// Points to a [CodeOffset] from the start of the current function. 2103 Func(CodeOffset), 2104 } 2105 2106 impl FinalizedRelocTarget { 2107 /// Returns a display for the current [FinalizedRelocTarget], with extra context to prettify the 2108 /// output. 2109 pub fn display<'a>(&'a self, params: Option<&'a FunctionParameters>) -> String { 2110 match self { 2111 FinalizedRelocTarget::ExternalName(name) => format!("{}", name.display(params)), 2112 FinalizedRelocTarget::Func(offset) => format!("func+{offset}"), 2113 } 2114 } 2115 } 2116 2117 /// A trap record resulting from a compilation. 2118 #[derive(Clone, Debug, PartialEq)] 2119 #[cfg_attr( 2120 feature = "enable-serde", 2121 derive(serde_derive::Serialize, serde_derive::Deserialize) 2122 )] 2123 pub struct MachTrap { 2124 /// The offset at which the trap instruction occurs, *relative to the 2125 /// containing section*. 2126 pub offset: CodeOffset, 2127 /// The trap code. 2128 pub code: TrapCode, 2129 } 2130 2131 /// A call site record resulting from a compilation. 2132 #[derive(Clone, Debug, PartialEq)] 2133 #[cfg_attr( 2134 feature = "enable-serde", 2135 derive(serde_derive::Serialize, serde_derive::Deserialize) 2136 )] 2137 pub struct MachCallSite { 2138 /// The offset of the call's return address, *relative to the 2139 /// start of the buffer*. 2140 pub ret_addr: CodeOffset, 2141 2142 /// The offset from the FP at this callsite down to the SP when 2143 /// the call occurs, if known. In other words, the size of the 2144 /// stack frame up to the saved FP slot. Useful to recover the 2145 /// start of the stack frame and to look up dynamic contexts 2146 /// stored in [`ExceptionContextLoc::SPOffset`]. 2147 /// 2148 /// If `None`, the compiler backend did not specify a frame 2149 /// offset. The runtime in use with the compiled code may require 2150 /// the frame offset if exception handlers are present or dynamic 2151 /// context is used, but that is not Cranelift's concern: the 2152 /// frame offset is optional at this level. 2153 pub frame_offset: Option<u32>, 2154 2155 /// Range in `exception_handlers` corresponding to the exception 2156 /// handlers for this callsite. 2157 exception_handler_range: Range<u32>, 2158 } 2159 2160 /// A call site record resulting from a compilation. 2161 #[derive(Clone, Debug, PartialEq)] 2162 pub struct FinalizedMachCallSite<'a> { 2163 /// The offset of the call's return address, *relative to the 2164 /// start of the buffer*. 2165 pub ret_addr: CodeOffset, 2166 2167 /// The offset from the FP at this callsite down to the SP when 2168 /// the call occurs, if known. In other words, the size of the 2169 /// stack frame up to the saved FP slot. Useful to recover the 2170 /// start of the stack frame and to look up dynamic contexts 2171 /// stored in [`ExceptionContextLoc::SPOffset`]. 2172 /// 2173 /// If `None`, the compiler backend did not specify a frame 2174 /// offset. The runtime in use with the compiled code may require 2175 /// the frame offset if exception handlers are present or dynamic 2176 /// context is used, but that is not Cranelift's concern: the 2177 /// frame offset is optional at this level. 2178 pub frame_offset: Option<u32>, 2179 2180 /// Exception handlers at this callsite, with target offsets 2181 /// *relative to the start of the buffer*. 2182 pub exception_handlers: &'a [FinalizedMachExceptionHandler], 2183 } 2184 2185 /// A patchable call site record resulting from a compilation. 2186 #[derive(Clone, Debug, PartialEq)] 2187 #[cfg_attr( 2188 feature = "enable-serde", 2189 derive(serde_derive::Serialize, serde_derive::Deserialize) 2190 )] 2191 pub struct MachPatchableCallSite { 2192 /// The offset of the call's return address (i.e., the address 2193 /// after the end of the patchable region), *relative to the start 2194 /// of the buffer*. 2195 pub ret_addr: CodeOffset, 2196 2197 /// The length of the region to be patched by NOP bytes. 2198 pub len: u32, 2199 } 2200 2201 /// A source-location mapping resulting from a compilation. 2202 #[derive(PartialEq, Debug, Clone)] 2203 #[cfg_attr( 2204 feature = "enable-serde", 2205 derive(serde_derive::Serialize, serde_derive::Deserialize) 2206 )] 2207 pub struct MachSrcLoc<T: CompilePhase> { 2208 /// The start of the region of code corresponding to a source location. 2209 /// This is relative to the start of the function, not to the start of the 2210 /// section. 2211 pub start: CodeOffset, 2212 /// The end of the region of code corresponding to a source location. 2213 /// This is relative to the start of the function, not to the start of the 2214 /// section. 2215 pub end: CodeOffset, 2216 /// The source location. 2217 pub loc: T::SourceLocType, 2218 } 2219 2220 impl MachSrcLoc<Stencil> { 2221 fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> { 2222 MachSrcLoc { 2223 start: self.start, 2224 end: self.end, 2225 loc: self.loc.expand(base_srcloc), 2226 } 2227 } 2228 } 2229 2230 /// Record of branch instruction in the buffer, to facilitate editing. 2231 #[derive(Clone, Debug)] 2232 struct MachBranch { 2233 start: CodeOffset, 2234 end: CodeOffset, 2235 target: MachLabel, 2236 fixup: usize, 2237 inverted: Option<SmallVec<[u8; 8]>>, 2238 /// All labels pointing to the start of this branch. For correctness, this 2239 /// *must* be complete (i.e., must contain all labels whose resolved offsets 2240 /// are at the start of this branch): we rely on being able to redirect all 2241 /// labels that could jump to this branch before removing it, if it is 2242 /// otherwise unreachable. 2243 labels_at_this_branch: SmallVec<[MachLabel; 4]>, 2244 } 2245 2246 impl MachBranch { 2247 fn is_cond(&self) -> bool { 2248 self.inverted.is_some() 2249 } 2250 fn is_uncond(&self) -> bool { 2251 self.inverted.is_none() 2252 } 2253 } 2254 2255 /// Stack-frame layout information carried through to machine 2256 /// code. This provides sufficient information to interpret an active 2257 /// stack frame from a running function, if provided. 2258 #[derive(Clone, Debug, PartialEq)] 2259 #[cfg_attr( 2260 feature = "enable-serde", 2261 derive(serde_derive::Serialize, serde_derive::Deserialize) 2262 )] 2263 pub struct MachBufferFrameLayout { 2264 /// Offset from bottom of frame to FP (near top of frame). This 2265 /// allows reading the frame given only FP. 2266 pub frame_to_fp_offset: u32, 2267 /// Offset from bottom of frame for each StackSlot, 2268 pub stackslots: SecondaryMap<ir::StackSlot, MachBufferStackSlot>, 2269 } 2270 2271 /// Descriptor for a single stack slot in the compiled function. 2272 #[derive(Clone, Debug, PartialEq, Default)] 2273 #[cfg_attr( 2274 feature = "enable-serde", 2275 derive(serde_derive::Serialize, serde_derive::Deserialize) 2276 )] 2277 pub struct MachBufferStackSlot { 2278 /// Offset from the bottom of the stack frame. 2279 pub offset: u32, 2280 2281 /// User-provided key to describe this stack slot. 2282 pub key: Option<ir::StackSlotKey>, 2283 } 2284 2285 /// Debug tags: a sequence of references to a stack slot, or a 2286 /// user-defined value, at a particular PC. 2287 #[derive(Clone, Debug, PartialEq)] 2288 #[cfg_attr( 2289 feature = "enable-serde", 2290 derive(serde_derive::Serialize, serde_derive::Deserialize) 2291 )] 2292 pub(crate) struct MachDebugTags { 2293 /// Offset at which this tag applies. 2294 pub offset: CodeOffset, 2295 2296 /// Position on the attached instruction. This indicates whether 2297 /// the tags attach to the prior instruction (i.e., as a return 2298 /// point from a call) or the current instruction (i.e., as a PC 2299 /// seen during a trap). 2300 pub pos: MachDebugTagPos, 2301 2302 /// The range in the tag pool. 2303 pub range: Range<u32>, 2304 } 2305 2306 /// Debug tag position on an instruction. 2307 /// 2308 /// We need to distinguish position on an instruction, and not just 2309 /// use offsets, because of the following case: 2310 /// 2311 /// ```plain 2312 /// <tag1, tag2> call ... 2313 /// <tag3, tag4> trapping_store ... 2314 /// ``` 2315 /// 2316 /// If the stack is walked and interpreted with debug tags while 2317 /// within the call, the PC seen will be the return point, i.e. the 2318 /// address after the call. If the stack is walked and interpreted 2319 /// with debug tags upon a trap of the following instruction, it will 2320 /// be the PC of that instruction -- which is the same PC! Thus to 2321 /// disambiguate which tags we want, we attach a "pre/post" flag to 2322 /// every group of tags at an offset; and when we look up tags, we 2323 /// look them up for an offset and "position" at that offset. 2324 /// 2325 /// Thus there are logically two positions at every offset -- so the 2326 /// above will be emitted as 2327 /// 2328 /// ```plain 2329 /// 0: call ... 2330 /// 4, post: <tag1, tag2> 2331 /// 4, pre: <tag3, tag4> 2332 /// 4: trapping_store ... 2333 /// ``` 2334 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 2335 #[cfg_attr( 2336 feature = "enable-serde", 2337 derive(serde_derive::Serialize, serde_derive::Deserialize) 2338 )] 2339 pub enum MachDebugTagPos { 2340 /// Tags attached after the instruction that ends at this offset. 2341 /// 2342 /// This is used to attach tags to a call, because the PC we see 2343 /// when walking the stack is the *return point*. 2344 Post, 2345 /// Tags attached before the instruction that starts at this offset. 2346 /// 2347 /// This is used to attach tags to every other kind of 2348 /// instruction, because the PC we see when processing a trap of 2349 /// that instruction is the PC of that instruction, not the 2350 /// following one. 2351 Pre, 2352 } 2353 2354 /// Iterator item for visiting debug tags. 2355 pub struct MachBufferDebugTagList<'a> { 2356 /// Offset at which this tag applies. 2357 pub offset: CodeOffset, 2358 2359 /// Position at this offset ("post", attaching to prior 2360 /// instruction, or "pre", attaching to next instruction). 2361 pub pos: MachDebugTagPos, 2362 2363 /// The underlying tags. 2364 pub tags: &'a [DebugTag], 2365 } 2366 2367 /// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`. 2368 /// 2369 /// Note that `MachBuffer` was primarily written for intra-function references 2370 /// of jumps between basic blocks, but it's also quite usable for entire text 2371 /// sections and resolving references between functions themselves. This 2372 /// builder interprets "blocks" as labeled functions for the purposes of 2373 /// resolving labels internally in the buffer. 2374 pub struct MachTextSectionBuilder<I: VCodeInst> { 2375 buf: MachBuffer<I>, 2376 next_func: usize, 2377 force_veneers: ForceVeneers, 2378 } 2379 2380 impl<I: VCodeInst> MachTextSectionBuilder<I> { 2381 /// Creates a new text section builder which will have `num_funcs` functions 2382 /// pushed into it. 2383 pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> { 2384 let mut buf = MachBuffer::new(); 2385 buf.reserve_labels_for_blocks(num_funcs); 2386 MachTextSectionBuilder { 2387 buf, 2388 next_func: 0, 2389 force_veneers: ForceVeneers::No, 2390 } 2391 } 2392 } 2393 2394 impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> { 2395 fn append( 2396 &mut self, 2397 labeled: bool, 2398 func: &[u8], 2399 align: u32, 2400 ctrl_plane: &mut ControlPlane, 2401 ) -> u64 { 2402 // Conditionally emit an island if it's necessary to resolve jumps 2403 // between functions which are too far away. 2404 let size = func.len() as u32; 2405 if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) { 2406 self.buf 2407 .emit_island_maybe_forced(self.force_veneers, size, ctrl_plane); 2408 } 2409 2410 self.buf.align_to(align); 2411 let pos = self.buf.cur_offset(); 2412 if labeled { 2413 self.buf.bind_label( 2414 MachLabel::from_block(BlockIndex::new(self.next_func)), 2415 ctrl_plane, 2416 ); 2417 self.next_func += 1; 2418 } 2419 self.buf.put_data(func); 2420 u64::from(pos) 2421 } 2422 2423 fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool { 2424 crate::trace!( 2425 "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}" 2426 ); 2427 let label = MachLabel::from_block(BlockIndex::new(target)); 2428 let offset = u32::try_from(offset).unwrap(); 2429 match I::LabelUse::from_reloc(reloc, addend) { 2430 Some(label_use) => { 2431 self.buf.use_label_at_offset(offset, label, label_use); 2432 true 2433 } 2434 None => false, 2435 } 2436 } 2437 2438 fn force_veneers(&mut self) { 2439 self.force_veneers = ForceVeneers::Yes; 2440 } 2441 2442 fn write(&mut self, offset: u64, data: &[u8]) { 2443 self.buf.data[offset.try_into().unwrap()..][..data.len()].copy_from_slice(data); 2444 } 2445 2446 fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> { 2447 // Double-check all functions were pushed. 2448 assert_eq!(self.next_func, self.buf.label_offsets.len()); 2449 2450 // Finish up any veneers, if necessary. 2451 self.buf 2452 .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane); 2453 2454 // We don't need the data any more, so return it to the caller. 2455 mem::take(&mut self.buf.data).into_vec() 2456 } 2457 } 2458 2459 // We use an actual instruction definition to do tests, so we depend on the `arm64` feature here. 2460 #[cfg(all(test, feature = "arm64"))] 2461 mod test { 2462 use cranelift_entity::EntityRef as _; 2463 2464 use super::*; 2465 use crate::ir::UserExternalNameRef; 2466 use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst}; 2467 use crate::isa::aarch64::inst::{OperandSize, xreg}; 2468 use crate::machinst::{MachInstEmit, MachInstEmitState}; 2469 use crate::settings; 2470 2471 fn label(n: u32) -> MachLabel { 2472 MachLabel::from_block(BlockIndex::new(n as usize)) 2473 } 2474 fn target(n: u32) -> BranchTarget { 2475 BranchTarget::Label(label(n)) 2476 } 2477 2478 #[test] 2479 fn test_elide_jump_to_next() { 2480 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2481 let mut buf = MachBuffer::new(); 2482 let mut state = <Inst as MachInstEmit>::State::default(); 2483 let constants = Default::default(); 2484 2485 buf.reserve_labels_for_blocks(2); 2486 buf.bind_label(label(0), state.ctrl_plane_mut()); 2487 let inst = Inst::Jump { dest: target(1) }; 2488 inst.emit(&mut buf, &info, &mut state); 2489 buf.bind_label(label(1), state.ctrl_plane_mut()); 2490 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2491 assert_eq!(0, buf.total_size()); 2492 } 2493 2494 #[test] 2495 fn test_elide_trivial_jump_blocks() { 2496 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2497 let mut buf = MachBuffer::new(); 2498 let mut state = <Inst as MachInstEmit>::State::default(); 2499 let constants = Default::default(); 2500 2501 buf.reserve_labels_for_blocks(4); 2502 2503 buf.bind_label(label(0), state.ctrl_plane_mut()); 2504 let inst = Inst::CondBr { 2505 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2506 taken: target(1), 2507 not_taken: target(2), 2508 }; 2509 inst.emit(&mut buf, &info, &mut state); 2510 2511 buf.bind_label(label(1), state.ctrl_plane_mut()); 2512 let inst = Inst::Jump { dest: target(3) }; 2513 inst.emit(&mut buf, &info, &mut state); 2514 2515 buf.bind_label(label(2), state.ctrl_plane_mut()); 2516 let inst = Inst::Jump { dest: target(3) }; 2517 inst.emit(&mut buf, &info, &mut state); 2518 2519 buf.bind_label(label(3), state.ctrl_plane_mut()); 2520 2521 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2522 assert_eq!(0, buf.total_size()); 2523 } 2524 2525 #[test] 2526 fn test_flip_cond() { 2527 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2528 let mut buf = MachBuffer::new(); 2529 let mut state = <Inst as MachInstEmit>::State::default(); 2530 let constants = Default::default(); 2531 2532 buf.reserve_labels_for_blocks(4); 2533 2534 buf.bind_label(label(0), state.ctrl_plane_mut()); 2535 let inst = Inst::CondBr { 2536 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64), 2537 taken: target(1), 2538 not_taken: target(2), 2539 }; 2540 inst.emit(&mut buf, &info, &mut state); 2541 2542 buf.bind_label(label(1), state.ctrl_plane_mut()); 2543 let inst = Inst::Nop4; 2544 inst.emit(&mut buf, &info, &mut state); 2545 2546 buf.bind_label(label(2), state.ctrl_plane_mut()); 2547 let inst = Inst::Udf { 2548 trap_code: TrapCode::STACK_OVERFLOW, 2549 }; 2550 inst.emit(&mut buf, &info, &mut state); 2551 2552 buf.bind_label(label(3), state.ctrl_plane_mut()); 2553 2554 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2555 2556 let mut buf2 = MachBuffer::new(); 2557 let mut state = Default::default(); 2558 let inst = Inst::TrapIf { 2559 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2560 trap_code: TrapCode::STACK_OVERFLOW, 2561 }; 2562 inst.emit(&mut buf2, &info, &mut state); 2563 let inst = Inst::Nop4; 2564 inst.emit(&mut buf2, &info, &mut state); 2565 2566 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2567 2568 assert_eq!(buf.data, buf2.data); 2569 } 2570 2571 #[test] 2572 fn test_island() { 2573 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2574 let mut buf = MachBuffer::new(); 2575 let mut state = <Inst as MachInstEmit>::State::default(); 2576 let constants = Default::default(); 2577 2578 buf.reserve_labels_for_blocks(4); 2579 2580 buf.bind_label(label(0), state.ctrl_plane_mut()); 2581 let inst = Inst::CondBr { 2582 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2583 taken: target(2), 2584 not_taken: target(3), 2585 }; 2586 inst.emit(&mut buf, &info, &mut state); 2587 2588 buf.bind_label(label(1), state.ctrl_plane_mut()); 2589 while buf.cur_offset() < 2000000 { 2590 if buf.island_needed(0) { 2591 buf.emit_island(0, state.ctrl_plane_mut()); 2592 } 2593 let inst = Inst::Nop4; 2594 inst.emit(&mut buf, &info, &mut state); 2595 } 2596 2597 buf.bind_label(label(2), state.ctrl_plane_mut()); 2598 let inst = Inst::Nop4; 2599 inst.emit(&mut buf, &info, &mut state); 2600 2601 buf.bind_label(label(3), state.ctrl_plane_mut()); 2602 let inst = Inst::Nop4; 2603 inst.emit(&mut buf, &info, &mut state); 2604 2605 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2606 2607 assert_eq!(2000000 + 8, buf.total_size()); 2608 2609 let mut buf2 = MachBuffer::new(); 2610 let mut state = Default::default(); 2611 let inst = Inst::CondBr { 2612 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2613 2614 // This conditionally taken branch has a 19-bit constant, shifted 2615 // to the left by two, giving us a 21-bit range in total. Half of 2616 // this range positive so the we should be around 1 << 20 bytes 2617 // away for our jump target. 2618 // 2619 // There are two pending fixups by the time we reach this point, 2620 // one for this 19-bit jump and one for the unconditional 26-bit 2621 // jump below. A 19-bit veneer is 4 bytes large and the 26-bit 2622 // veneer is 20 bytes large, which means that pessimistically 2623 // assuming we'll need two veneers. Currently each veneer is 2624 // pessimistically assumed to be the maximal size which means we 2625 // need 40 bytes of extra space, meaning that the actual island 2626 // should come 40-bytes before the deadline. 2627 taken: BranchTarget::ResolvedOffset((1 << 20) - 20 - 20), 2628 2629 // This branch is in-range so no veneers should be needed, it should 2630 // go directly to the target. 2631 not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4), 2632 }; 2633 inst.emit(&mut buf2, &info, &mut state); 2634 2635 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2636 2637 assert_eq!(&buf.data[0..8], &buf2.data[..]); 2638 } 2639 2640 #[test] 2641 fn test_island_backward() { 2642 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2643 let mut buf = MachBuffer::new(); 2644 let mut state = <Inst as MachInstEmit>::State::default(); 2645 let constants = Default::default(); 2646 2647 buf.reserve_labels_for_blocks(4); 2648 2649 buf.bind_label(label(0), state.ctrl_plane_mut()); 2650 let inst = Inst::Nop4; 2651 inst.emit(&mut buf, &info, &mut state); 2652 2653 buf.bind_label(label(1), state.ctrl_plane_mut()); 2654 let inst = Inst::Nop4; 2655 inst.emit(&mut buf, &info, &mut state); 2656 2657 buf.bind_label(label(2), state.ctrl_plane_mut()); 2658 while buf.cur_offset() < 2000000 { 2659 let inst = Inst::Nop4; 2660 inst.emit(&mut buf, &info, &mut state); 2661 } 2662 2663 buf.bind_label(label(3), state.ctrl_plane_mut()); 2664 let inst = Inst::CondBr { 2665 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2666 taken: target(0), 2667 not_taken: target(1), 2668 }; 2669 inst.emit(&mut buf, &info, &mut state); 2670 2671 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2672 2673 assert_eq!(2000000 + 12, buf.total_size()); 2674 2675 let mut buf2 = MachBuffer::new(); 2676 let mut state = Default::default(); 2677 let inst = Inst::CondBr { 2678 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64), 2679 taken: BranchTarget::ResolvedOffset(8), 2680 not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)), 2681 }; 2682 inst.emit(&mut buf2, &info, &mut state); 2683 let inst = Inst::Jump { 2684 dest: BranchTarget::ResolvedOffset(-(2000000 + 8)), 2685 }; 2686 inst.emit(&mut buf2, &info, &mut state); 2687 2688 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2689 2690 assert_eq!(&buf.data[2000000..], &buf2.data[..]); 2691 } 2692 2693 #[test] 2694 fn test_multiple_redirect() { 2695 // label0: 2696 // cbz x0, label1 2697 // b label2 2698 // label1: 2699 // b label3 2700 // label2: 2701 // nop 2702 // nop 2703 // b label0 2704 // label3: 2705 // b label4 2706 // label4: 2707 // b label5 2708 // label5: 2709 // b label7 2710 // label6: 2711 // nop 2712 // label7: 2713 // ret 2714 // 2715 // -- should become: 2716 // 2717 // label0: 2718 // cbz x0, label7 2719 // label2: 2720 // nop 2721 // nop 2722 // b label0 2723 // label6: 2724 // nop 2725 // label7: 2726 // ret 2727 2728 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2729 let mut buf = MachBuffer::new(); 2730 let mut state = <Inst as MachInstEmit>::State::default(); 2731 let constants = Default::default(); 2732 2733 buf.reserve_labels_for_blocks(8); 2734 2735 buf.bind_label(label(0), state.ctrl_plane_mut()); 2736 let inst = Inst::CondBr { 2737 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64), 2738 taken: target(1), 2739 not_taken: target(2), 2740 }; 2741 inst.emit(&mut buf, &info, &mut state); 2742 2743 buf.bind_label(label(1), state.ctrl_plane_mut()); 2744 let inst = Inst::Jump { dest: target(3) }; 2745 inst.emit(&mut buf, &info, &mut state); 2746 2747 buf.bind_label(label(2), state.ctrl_plane_mut()); 2748 let inst = Inst::Nop4; 2749 inst.emit(&mut buf, &info, &mut state); 2750 inst.emit(&mut buf, &info, &mut state); 2751 let inst = Inst::Jump { dest: target(0) }; 2752 inst.emit(&mut buf, &info, &mut state); 2753 2754 buf.bind_label(label(3), state.ctrl_plane_mut()); 2755 let inst = Inst::Jump { dest: target(4) }; 2756 inst.emit(&mut buf, &info, &mut state); 2757 2758 buf.bind_label(label(4), state.ctrl_plane_mut()); 2759 let inst = Inst::Jump { dest: target(5) }; 2760 inst.emit(&mut buf, &info, &mut state); 2761 2762 buf.bind_label(label(5), state.ctrl_plane_mut()); 2763 let inst = Inst::Jump { dest: target(7) }; 2764 inst.emit(&mut buf, &info, &mut state); 2765 2766 buf.bind_label(label(6), state.ctrl_plane_mut()); 2767 let inst = Inst::Nop4; 2768 inst.emit(&mut buf, &info, &mut state); 2769 2770 buf.bind_label(label(7), state.ctrl_plane_mut()); 2771 let inst = Inst::Ret {}; 2772 inst.emit(&mut buf, &info, &mut state); 2773 2774 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2775 2776 let golden_data = vec![ 2777 0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14 2778 0x1f, 0x20, 0x03, 0xd5, // nop 2779 0x1f, 0x20, 0x03, 0xd5, // nop 2780 0xfd, 0xff, 0xff, 0x17, // b 0 2781 0x1f, 0x20, 0x03, 0xd5, // nop 2782 0xc0, 0x03, 0x5f, 0xd6, // ret 2783 ]; 2784 2785 assert_eq!(&golden_data[..], &buf.data[..]); 2786 } 2787 2788 #[test] 2789 fn test_handle_branch_cycle() { 2790 // label0: 2791 // b label1 2792 // label1: 2793 // b label2 2794 // label2: 2795 // b label3 2796 // label3: 2797 // b label4 2798 // label4: 2799 // b label1 // note: not label0 (to make it interesting). 2800 // 2801 // -- should become: 2802 // 2803 // label0, label1, ..., label4: 2804 // b label0 2805 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2806 let mut buf = MachBuffer::new(); 2807 let mut state = <Inst as MachInstEmit>::State::default(); 2808 let constants = Default::default(); 2809 2810 buf.reserve_labels_for_blocks(5); 2811 2812 buf.bind_label(label(0), state.ctrl_plane_mut()); 2813 let inst = Inst::Jump { dest: target(1) }; 2814 inst.emit(&mut buf, &info, &mut state); 2815 2816 buf.bind_label(label(1), state.ctrl_plane_mut()); 2817 let inst = Inst::Jump { dest: target(2) }; 2818 inst.emit(&mut buf, &info, &mut state); 2819 2820 buf.bind_label(label(2), state.ctrl_plane_mut()); 2821 let inst = Inst::Jump { dest: target(3) }; 2822 inst.emit(&mut buf, &info, &mut state); 2823 2824 buf.bind_label(label(3), state.ctrl_plane_mut()); 2825 let inst = Inst::Jump { dest: target(4) }; 2826 inst.emit(&mut buf, &info, &mut state); 2827 2828 buf.bind_label(label(4), state.ctrl_plane_mut()); 2829 let inst = Inst::Jump { dest: target(1) }; 2830 inst.emit(&mut buf, &info, &mut state); 2831 2832 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2833 2834 let golden_data = vec![ 2835 0x00, 0x00, 0x00, 0x14, // b 0 2836 ]; 2837 2838 assert_eq!(&golden_data[..], &buf.data[..]); 2839 } 2840 2841 #[test] 2842 fn metadata_records() { 2843 let mut buf = MachBuffer::<Inst>::new(); 2844 let ctrl_plane = &mut Default::default(); 2845 let constants = Default::default(); 2846 2847 buf.reserve_labels_for_blocks(3); 2848 2849 buf.bind_label(label(0), ctrl_plane); 2850 buf.put1(1); 2851 buf.add_trap(TrapCode::HEAP_OUT_OF_BOUNDS); 2852 buf.put1(2); 2853 buf.add_trap(TrapCode::INTEGER_OVERFLOW); 2854 buf.add_trap(TrapCode::INTEGER_DIVISION_BY_ZERO); 2855 buf.add_try_call_site( 2856 Some(0x10), 2857 [ 2858 MachExceptionHandler::Tag(ExceptionTag::new(42), label(2)), 2859 MachExceptionHandler::Default(label(1)), 2860 ] 2861 .into_iter(), 2862 ); 2863 buf.add_reloc( 2864 Reloc::Abs4, 2865 &ExternalName::User(UserExternalNameRef::new(0)), 2866 0, 2867 ); 2868 buf.put1(3); 2869 buf.add_reloc( 2870 Reloc::Abs8, 2871 &ExternalName::User(UserExternalNameRef::new(1)), 2872 1, 2873 ); 2874 buf.put1(4); 2875 buf.bind_label(label(1), ctrl_plane); 2876 buf.put1(0xff); 2877 buf.bind_label(label(2), ctrl_plane); 2878 buf.put1(0xff); 2879 2880 let buf = buf.finish(&constants, ctrl_plane); 2881 2882 assert_eq!(buf.data(), &[1, 2, 3, 4, 0xff, 0xff]); 2883 assert_eq!( 2884 buf.traps() 2885 .iter() 2886 .map(|trap| (trap.offset, trap.code)) 2887 .collect::<Vec<_>>(), 2888 vec![ 2889 (1, TrapCode::HEAP_OUT_OF_BOUNDS), 2890 (2, TrapCode::INTEGER_OVERFLOW), 2891 (2, TrapCode::INTEGER_DIVISION_BY_ZERO) 2892 ] 2893 ); 2894 let call_sites: Vec<_> = buf.call_sites().collect(); 2895 assert_eq!(call_sites[0].ret_addr, 2); 2896 assert_eq!(call_sites[0].frame_offset, Some(0x10)); 2897 assert_eq!( 2898 call_sites[0].exception_handlers, 2899 &[ 2900 FinalizedMachExceptionHandler::Tag(ExceptionTag::new(42), 5), 2901 FinalizedMachExceptionHandler::Default(4) 2902 ], 2903 ); 2904 assert_eq!( 2905 buf.relocs() 2906 .iter() 2907 .map(|reloc| (reloc.offset, reloc.kind)) 2908 .collect::<Vec<_>>(), 2909 vec![(2, Reloc::Abs4), (3, Reloc::Abs8)] 2910 ); 2911 } 2912 } 2913