1 //! In-memory representation of compiled machine code, with labels and fixups to 2 //! refer to those labels. Handles constant-pool island insertion and also 3 //! veneer insertion for out-of-range jumps. 4 //! 5 //! This code exists to solve three problems: 6 //! 7 //! - Branch targets for forward branches are not known until later, when we 8 //! emit code in a single pass through the instruction structs. 9 //! 10 //! - On many architectures, address references or offsets have limited range. 11 //! For example, on AArch64, conditional branches can only target code +/- 1MB 12 //! from the branch itself. 13 //! 14 //! - The lowering of control flow from the CFG-with-edges produced by 15 //! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty 16 //! edge blocks when the register allocator does not need to insert any 17 //! spills/reloads/moves in edge blocks, results in many suboptimal branch 18 //! patterns. The lowering also pays no attention to block order, and so 19 //! two-target conditional forms (cond-br followed by uncond-br) can often by 20 //! avoided because one of the targets is the fallthrough. There are several 21 //! cases here where we can simplify to use fewer branches. 22 //! 23 //! This "buffer" implements a single-pass code emission strategy (with a later 24 //! "fixup" pass, but only through recorded fixups, not all instructions). The 25 //! basic idea is: 26 //! 27 //! - Emit branches as they are, including two-target (cond/uncond) compound 28 //! forms, but with zero offsets and optimistically assuming the target will be 29 //! in range. Record the "fixup" for later. Targets are denoted instead by 30 //! symbolic "labels" that are then bound to certain offsets in the buffer as 31 //! we emit code. (Nominally, there is a label at the start of every basic 32 //! block.) 33 //! 34 //! - As we do this, track the offset in the buffer at which the first label 35 //! reference "goes out of range". We call this the "deadline". If we reach the 36 //! deadline and we still have not bound the label to which an unresolved branch 37 //! refers, we have a problem! 38 //! 39 //! - To solve this problem, we emit "islands" full of "veneers". An island is 40 //! simply a chunk of code inserted in the middle of the code actually produced 41 //! by the emitter (e.g., vcode iterating over instruction structs). The emitter 42 //! has some awareness of this: it either asks for an island between blocks, so 43 //! it is not accidentally executed, or else it emits a branch around the island 44 //! when all other options fail (see `Inst::EmitIsland` meta-instruction). 45 //! 46 //! - A "veneer" is an instruction (or sequence of instructions) in an "island" 47 //! that implements a longer-range reference to a label. The idea is that, for 48 //! example, a branch with a limited range can branch to a "veneer" instead, 49 //! which is simply a branch in a form that can use a longer-range reference. On 50 //! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional 51 //! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a 52 //! conditional branch's label reference can be fixed up with a "veneer" to 53 //! achieve a longer range. 54 //! 55 //! - To implement all of this, we require the backend to provide a `LabelUse` 56 //! type that implements a trait. This is nominally an enum that records one of 57 //! several kinds of references to an offset in code -- basically, a relocation 58 //! type -- and will usually correspond to different instruction formats. The 59 //! `LabelUse` implementation specifies the maximum range, how to patch in the 60 //! actual label location when known, and how to generate a veneer to extend the 61 //! range. 62 //! 63 //! That satisfies label references, but we still may have suboptimal branch 64 //! patterns. To clean up the branches, we do a simple "peephole"-style 65 //! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`) 66 //! informs the buffer of branches in the code and, in the case of conditionals, 67 //! the code that would have been emitted to invert this branch's condition. We 68 //! track the "latest branches": these are branches that are contiguous up to 69 //! the current offset. (If any code is emitted after a branch, that branch or 70 //! run of contiguous branches is no longer "latest".) The latest branches are 71 //! those that we can edit by simply truncating the buffer and doing something 72 //! else instead. 73 //! 74 //! To optimize branches, we implement several simple rules, and try to apply 75 //! them to the "latest branches" when possible: 76 //! 77 //! - A branch with a label target, when that label is bound to the ending 78 //! offset of the branch (the fallthrough location), can be removed altogether, 79 //! because the branch would have no effect). 80 //! 81 //! - An unconditional branch that starts at a label location, and branches to 82 //! another label, results in a "label alias": all references to the label bound 83 //! *to* this branch instruction are instead resolved to the *target* of the 84 //! branch instruction. This effectively removes empty blocks that just 85 //! unconditionally branch to the next block. We call this "branch threading". 86 //! 87 //! - A conditional followed by an unconditional, when the conditional branches 88 //! to the unconditional's fallthrough, results in (i) the truncation of the 89 //! unconditional, (ii) the inversion of the condition's condition, and (iii) 90 //! replacement of the conditional's target (using the original target of the 91 //! unconditional). This is a fancy way of saying "we can flip a two-target 92 //! conditional branch's taken/not-taken targets if it works better with our 93 //! fallthrough". To make this work, the emitter actually gives the buffer 94 //! *both* forms of every conditional branch: the true form is emitted into the 95 //! buffer, and the "inverted" machine-code bytes are provided as part of the 96 //! branch-fixup metadata. 97 //! 98 //! - An unconditional B preceded by another unconditional P, when B's label(s) have 99 //! been redirected to target(B), can be removed entirely. This is an extension 100 //! of the branch-threading optimization, and is valid because if we know there 101 //! will be no fallthrough into this branch instruction (the prior instruction 102 //! is an unconditional jump), and if we know we have successfully redirected 103 //! all labels, then this branch instruction is unreachable. Note that this 104 //! works because the redirection happens before the label is ever resolved 105 //! (fixups happen at island emission time, at which point latest-branches are 106 //! cleared, or at the end of emission), so we are sure to catch and redirect 107 //! all possible paths to this instruction. 108 //! 109 //! # Branch-optimization Correctness 110 //! 111 //! The branch-optimization mechanism depends on a few data structures with 112 //! invariants, which are always held outside the scope of top-level public 113 //! methods: 114 //! 115 //! - The latest-branches list. Each entry describes a span of the buffer 116 //! (start/end offsets), the label target, the corresponding fixup-list entry 117 //! index, and the bytes (must be the same length) for the inverted form, if 118 //! conditional. The list of labels that are bound to the start-offset of this 119 //! branch is *complete* (if any label has a resolved offset equal to `start` 120 //! and is not an alias, it must appear in this list) and *precise* (no label 121 //! in this list can be bound to another offset). No label in this list should 122 //! be an alias. No two branch ranges can overlap, and branches are in 123 //! ascending-offset order. 124 //! 125 //! - The labels-at-tail list. This contains all MachLabels that have been bound 126 //! to (whose resolved offsets are equal to) the tail offset of the buffer. 127 //! No label in this list should be an alias. 128 //! 129 //! - The label_offsets array, containing the bound offset of a label or 130 //! UNKNOWN. No label can be bound at an offset greater than the current 131 //! buffer tail. 132 //! 133 //! - The label_aliases array, containing another label to which a label is 134 //! bound or UNKNOWN. A label's resolved offset is the resolved offset 135 //! of the label it is aliased to, if this is set. 136 //! 137 //! We argue below, at each method, how the invariants in these data structures 138 //! are maintained (grep for "Post-invariant"). 139 //! 140 //! Given these invariants, we argue why each optimization preserves execution 141 //! semantics below (grep for "Preserves execution semantics"). 142 //! 143 //! # Avoiding Quadratic Behavior 144 //! 145 //! There are two cases where we've had to take some care to avoid 146 //! quadratic worst-case behavior: 147 //! 148 //! - The "labels at this branch" list can grow unboundedly if the 149 //! code generator binds many labels at one location. If the count 150 //! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we 151 //! simply abort an optimization early in a way that is always correct 152 //! but is conservative. 153 //! 154 //! - The fixup list can interact with island emission to create 155 //! "quadratic island behvior". In a little more detail, one can hit 156 //! this behavior by having some pending fixups (forward label 157 //! references) with long-range label-use kinds, and some others 158 //! with shorter-range references that nonetheless still are pending 159 //! long enough to trigger island generation. In such a case, we 160 //! process the fixup list, generate veneers to extend some forward 161 //! references' ranges, but leave the other (longer-range) ones 162 //! alone. The way this was implemented put them back on a list and 163 //! resulted in quadratic behavior. 164 //! 165 //! To avoid this, we could use a better data structure that allows 166 //! us to query for fixups with deadlines "coming soon" and generate 167 //! veneers for only those fixups. However, there is some 168 //! interaction with the branch peephole optimizations: the 169 //! invariant there is that branches in the "most recent branches 170 //! contiguous with end of buffer" list have corresponding fixups in 171 //! order (so that when we chomp the branch, we can chomp its fixup 172 //! too). 173 //! 174 //! So instead, when we generate an island, for now we create 175 //! veneers for *all* pending fixups, then if upgraded to a kind 176 //! that no longer supports veneers (is at "max range"), kick the 177 //! fixups off to a list that is *not* processed at islands except 178 //! for one last pass after emission. This allows us to skip the 179 //! work and avoids the quadratic behvior. We expect that this is 180 //! fine-ish for now: islands are relatively rare, and if they do 181 //! happen and generate unnecessary veneers (as will now happen for 182 //! the case above) we'll only get one unnecessary veneer per 183 //! branch (then they are at max range already). 184 //! 185 //! Longer-term, we could use a data structure that allows querying 186 //! by deadline, as long as we can properly chomp just-added fixups 187 //! when chomping branches. 188 189 use crate::binemit::{Addend, CodeOffset, Reloc, StackMap}; 190 use crate::ir::{ExternalName, Opcode, RelSourceLoc, SourceLoc, TrapCode}; 191 use crate::isa::unwind::UnwindInst; 192 use crate::machinst::{ 193 BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst, 194 }; 195 use crate::timing; 196 use crate::trace; 197 use cranelift_control::ControlPlane; 198 use cranelift_entity::{entity_impl, PrimaryMap}; 199 use smallvec::{smallvec, SmallVec}; 200 use std::convert::TryFrom; 201 use std::mem; 202 use std::string::String; 203 use std::vec::Vec; 204 205 #[cfg(feature = "enable-serde")] 206 use serde::{Deserialize, Serialize}; 207 208 #[cfg(feature = "enable-serde")] 209 pub trait CompilePhase { 210 type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 211 type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone; 212 } 213 214 #[cfg(not(feature = "enable-serde"))] 215 pub trait CompilePhase { 216 type MachSrcLocType: core::fmt::Debug + PartialEq + Clone; 217 type SourceLocType: core::fmt::Debug + PartialEq + Clone; 218 } 219 220 /// Status of a compiled artifact that needs patching before being used. 221 #[derive(Clone, Debug, PartialEq)] 222 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))] 223 pub struct Stencil; 224 225 /// Status of a compiled artifact ready to use. 226 #[derive(Clone, Debug, PartialEq)] 227 pub struct Final; 228 229 impl CompilePhase for Stencil { 230 type MachSrcLocType = MachSrcLoc<Stencil>; 231 type SourceLocType = RelSourceLoc; 232 } 233 234 impl CompilePhase for Final { 235 type MachSrcLocType = MachSrcLoc<Final>; 236 type SourceLocType = SourceLoc; 237 } 238 239 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 240 enum ForceVeneers { 241 Yes, 242 No, 243 } 244 245 #[derive(Clone, Copy, Debug, PartialEq, Eq)] 246 enum IsLastIsland { 247 Yes, 248 No, 249 } 250 251 /// A buffer of output to be produced, fixed up, and then emitted to a CodeSink 252 /// in bulk. 253 /// 254 /// This struct uses `SmallVec`s to support small-ish function bodies without 255 /// any heap allocation. As such, it will be several kilobytes large. This is 256 /// likely fine as long as it is stack-allocated for function emission then 257 /// thrown away; but beware if many buffer objects are retained persistently. 258 pub struct MachBuffer<I: VCodeInst> { 259 /// The buffer contents, as raw bytes. 260 data: SmallVec<[u8; 1024]>, 261 /// Any relocations referring to this code. Note that only *external* 262 /// relocations are tracked here; references to labels within the buffer are 263 /// resolved before emission. 264 relocs: SmallVec<[MachReloc; 16]>, 265 /// Any trap records referring to this code. 266 traps: SmallVec<[MachTrap; 16]>, 267 /// Any call site records referring to this code. 268 call_sites: SmallVec<[MachCallSite; 16]>, 269 /// Any source location mappings referring to this code. 270 srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>, 271 /// Any stack maps referring to this code. 272 stack_maps: SmallVec<[MachStackMap; 8]>, 273 /// Any unwind info at a given location. 274 unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 275 /// The current source location in progress (after `start_srcloc()` and 276 /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple. 277 cur_srcloc: Option<(CodeOffset, RelSourceLoc)>, 278 /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown. 279 label_offsets: SmallVec<[CodeOffset; 16]>, 280 /// Label aliases: when one label points to an unconditional jump, and that 281 /// jump points to another label, we can redirect references to the first 282 /// label immediately to the second. 283 /// 284 /// Invariant: we don't have label-alias cycles. We ensure this by, 285 /// before setting label A to alias label B, resolving B's alias 286 /// target (iteratively until a non-aliased label); if B is already 287 /// aliased to A, then we cannot alias A back to B. 288 label_aliases: SmallVec<[MachLabel; 16]>, 289 /// Constants that must be emitted at some point. 290 pending_constants: SmallVec<[VCodeConstant; 16]>, 291 /// Traps that must be emitted at some point. 292 pending_traps: SmallVec<[MachLabelTrap; 16]>, 293 /// Fixups that must be performed after all code is emitted. 294 fixup_records: SmallVec<[MachLabelFixup<I>; 16]>, 295 /// Fixups whose labels are at maximum range already: these need 296 /// not be considered in island emission until we're done 297 /// emitting. 298 fixup_records_max_range: SmallVec<[MachLabelFixup<I>; 16]>, 299 /// Current deadline at which all constants are flushed and all code labels 300 /// are extended by emitting long-range jumps in an island. This flush 301 /// should be rare (e.g., on AArch64, the shortest-range PC-rel references 302 /// are +/- 1MB for conditional jumps and load-literal instructions), so 303 /// it's acceptable to track a minimum and flush-all rather than doing more 304 /// detailed "current minimum" / sort-by-deadline trickery. 305 island_deadline: CodeOffset, 306 /// How many bytes are needed in the worst case for an island, given all 307 /// pending constants and fixups. 308 island_worst_case_size: CodeOffset, 309 /// Latest branches, to facilitate in-place editing for better fallthrough 310 /// behavior and empty-block removal. 311 latest_branches: SmallVec<[MachBranch; 4]>, 312 /// All labels at the current offset (emission tail). This is lazily 313 /// cleared: it is actually accurate as long as the current offset is 314 /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should 315 /// be considered as empty. 316 /// 317 /// For correctness, this *must* be complete (i.e., the vector must contain 318 /// all labels whose offsets are resolved to the current tail), because we 319 /// rely on it to update labels when we truncate branches. 320 labels_at_tail: SmallVec<[MachLabel; 4]>, 321 /// The last offset at which `labels_at_tail` is valid. It is conceptually 322 /// always describing the tail of the buffer, but we do not clear 323 /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it 324 /// when the offset has grown past this (`labels_at_tail_off`) point. 325 /// Always <= `cur_offset()`. 326 labels_at_tail_off: CodeOffset, 327 /// Metadata about all constants that this function has access to. 328 /// 329 /// This records the size/alignment of all constants (not the actual data) 330 /// along with the last available label generated for the constant. This map 331 /// is consulted when constants are referred to and the label assigned to a 332 /// constant may change over time as well. 333 constants: PrimaryMap<VCodeConstant, MachBufferConstant>, 334 /// All recorded usages of constants as pairs of the constant and where the 335 /// constant needs to be placed within `self.data`. Note that the same 336 /// constant may appear in this array multiple times if it was emitted 337 /// multiple times. 338 used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>, 339 } 340 341 impl MachBufferFinalized<Stencil> { 342 /// Get a finalized machine buffer by applying the function's base source location. 343 pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> { 344 MachBufferFinalized { 345 data: self.data, 346 relocs: self.relocs, 347 traps: self.traps, 348 call_sites: self.call_sites, 349 srclocs: self 350 .srclocs 351 .into_iter() 352 .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc)) 353 .collect(), 354 stack_maps: self.stack_maps, 355 unwind_info: self.unwind_info, 356 alignment: self.alignment, 357 } 358 } 359 } 360 361 /// A `MachBuffer` once emission is completed: holds generated code and records, 362 /// without fixups. This allows the type to be independent of the backend. 363 #[derive(PartialEq, Debug, Clone)] 364 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 365 pub struct MachBufferFinalized<T: CompilePhase> { 366 /// The buffer contents, as raw bytes. 367 pub(crate) data: SmallVec<[u8; 1024]>, 368 /// Any relocations referring to this code. Note that only *external* 369 /// relocations are tracked here; references to labels within the buffer are 370 /// resolved before emission. 371 pub(crate) relocs: SmallVec<[MachReloc; 16]>, 372 /// Any trap records referring to this code. 373 pub(crate) traps: SmallVec<[MachTrap; 16]>, 374 /// Any call site records referring to this code. 375 pub(crate) call_sites: SmallVec<[MachCallSite; 16]>, 376 /// Any source location mappings referring to this code. 377 pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>, 378 /// Any stack maps referring to this code. 379 pub(crate) stack_maps: SmallVec<[MachStackMap; 8]>, 380 /// Any unwind info at a given location. 381 pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>, 382 /// The requireed alignment of this buffer 383 pub alignment: u32, 384 } 385 386 const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff; 387 const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff); 388 389 /// Threshold on max length of `labels_at_this_branch` list to avoid 390 /// unbounded quadratic behavior (see comment below at use-site). 391 const LABEL_LIST_THRESHOLD: usize = 100; 392 393 /// A label refers to some offset in a `MachBuffer`. It may not be resolved at 394 /// the point at which it is used by emitted code; the buffer records "fixups" 395 /// for references to the label, and will come back and patch the code 396 /// appropriately when the label's location is eventually known. 397 #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] 398 pub struct MachLabel(u32); 399 entity_impl!(MachLabel); 400 401 impl MachLabel { 402 /// Get a label for a block. (The first N MachLabels are always reseved for 403 /// the N blocks in the vcode.) 404 pub fn from_block(bindex: BlockIndex) -> MachLabel { 405 MachLabel(bindex.index() as u32) 406 } 407 408 /// Get the numeric label index. 409 pub fn get(self) -> u32 { 410 self.0 411 } 412 413 /// Creates a string representing this label, for convenience. 414 pub fn to_string(&self) -> String { 415 format!("label{}", self.0) 416 } 417 } 418 419 impl Default for MachLabel { 420 fn default() -> Self { 421 UNKNOWN_LABEL 422 } 423 } 424 425 /// A stack map extent, when creating a stack map. 426 pub enum StackMapExtent { 427 /// The stack map starts at this instruction, and ends after the number of upcoming bytes 428 /// (note: this is a code offset diff). 429 UpcomingBytes(CodeOffset), 430 431 /// The stack map started at the given offset and ends at the current one. This helps 432 /// architectures where the instruction size has not a fixed length. 433 StartedAtOffset(CodeOffset), 434 } 435 436 impl<I: VCodeInst> MachBuffer<I> { 437 /// Create a new section, known to start at `start_offset` and with a size limited to 438 /// `length_limit`. 439 pub fn new() -> MachBuffer<I> { 440 MachBuffer { 441 data: SmallVec::new(), 442 relocs: SmallVec::new(), 443 traps: SmallVec::new(), 444 call_sites: SmallVec::new(), 445 srclocs: SmallVec::new(), 446 stack_maps: SmallVec::new(), 447 unwind_info: SmallVec::new(), 448 cur_srcloc: None, 449 label_offsets: SmallVec::new(), 450 label_aliases: SmallVec::new(), 451 pending_constants: SmallVec::new(), 452 pending_traps: SmallVec::new(), 453 fixup_records: SmallVec::new(), 454 fixup_records_max_range: SmallVec::new(), 455 island_deadline: UNKNOWN_LABEL_OFFSET, 456 island_worst_case_size: 0, 457 latest_branches: SmallVec::new(), 458 labels_at_tail: SmallVec::new(), 459 labels_at_tail_off: 0, 460 constants: Default::default(), 461 used_constants: Default::default(), 462 } 463 } 464 465 /// Current offset from start of buffer. 466 pub fn cur_offset(&self) -> CodeOffset { 467 self.data.len() as CodeOffset 468 } 469 470 /// Add a byte. 471 pub fn put1(&mut self, value: u8) { 472 self.data.push(value); 473 474 // Post-invariant: conceptual-labels_at_tail contains a complete and 475 // precise list of labels bound at `cur_offset()`. We have advanced 476 // `cur_offset()`, hence if it had been equal to `labels_at_tail_off` 477 // before, it is not anymore (and it cannot become equal, because 478 // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is 479 // conceptually empty (even though it is only lazily cleared). No labels 480 // can be bound at this new offset (by invariant on `label_offsets`). 481 // Hence the invariant holds. 482 } 483 484 /// Add 2 bytes. 485 pub fn put2(&mut self, value: u16) { 486 let bytes = value.to_le_bytes(); 487 self.data.extend_from_slice(&bytes[..]); 488 489 // Post-invariant: as for `put1()`. 490 } 491 492 /// Add 4 bytes. 493 pub fn put4(&mut self, value: u32) { 494 let bytes = value.to_le_bytes(); 495 self.data.extend_from_slice(&bytes[..]); 496 497 // Post-invariant: as for `put1()`. 498 } 499 500 /// Add 8 bytes. 501 pub fn put8(&mut self, value: u64) { 502 let bytes = value.to_le_bytes(); 503 self.data.extend_from_slice(&bytes[..]); 504 505 // Post-invariant: as for `put1()`. 506 } 507 508 /// Add a slice of bytes. 509 pub fn put_data(&mut self, data: &[u8]) { 510 self.data.extend_from_slice(data); 511 512 // Post-invariant: as for `put1()`. 513 } 514 515 /// Reserve appended space and return a mutable slice referring to it. 516 pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] { 517 let off = self.data.len(); 518 let new_len = self.data.len() + len; 519 self.data.resize(new_len, 0); 520 &mut self.data[off..] 521 522 // Post-invariant: as for `put1()`. 523 } 524 525 /// Align up to the given alignment. 526 pub fn align_to(&mut self, align_to: CodeOffset) { 527 trace!("MachBuffer: align to {}", align_to); 528 assert!( 529 align_to.is_power_of_two(), 530 "{} is not a power of two", 531 align_to 532 ); 533 while self.cur_offset() & (align_to - 1) != 0 { 534 self.put1(0); 535 } 536 537 // Post-invariant: as for `put1()`. 538 } 539 540 /// Allocate a `Label` to refer to some offset. May not be bound to a fixed 541 /// offset yet. 542 pub fn get_label(&mut self) -> MachLabel { 543 let l = self.label_offsets.len() as u32; 544 self.label_offsets.push(UNKNOWN_LABEL_OFFSET); 545 self.label_aliases.push(UNKNOWN_LABEL); 546 trace!("MachBuffer: new label -> {:?}", MachLabel(l)); 547 MachLabel(l) 548 549 // Post-invariant: the only mutation is to add a new label; it has no 550 // bound offset yet, so it trivially satisfies all invariants. 551 } 552 553 /// Reserve the first N MachLabels for blocks. 554 pub fn reserve_labels_for_blocks(&mut self, blocks: usize) { 555 trace!("MachBuffer: first {} labels are for blocks", blocks); 556 debug_assert!(self.label_offsets.is_empty()); 557 self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET); 558 self.label_aliases.resize(blocks, UNKNOWN_LABEL); 559 560 // Post-invariant: as for `get_label()`. 561 } 562 563 /// Registers metadata in this `MachBuffer` about the `constants` provided. 564 /// 565 /// This will record the size/alignment of all constants which will prepare 566 /// them for emission later on. 567 pub fn register_constants(&mut self, constants: &VCodeConstants) { 568 for (c, val) in constants.iter() { 569 let c2 = self.constants.push(MachBufferConstant { 570 upcoming_label: None, 571 align: val.alignment(), 572 size: val.as_slice().len(), 573 }); 574 assert_eq!(c, c2); 575 } 576 } 577 578 /// Completes constant emission by iterating over `self.used_constants` and 579 /// filling in the "holes" with the constant values provided by `constants`. 580 /// 581 /// Returns the alignment required for this entire buffer. Alignment starts 582 /// at the ISA's minimum function alignment and can be increased due to 583 /// constant requirements. 584 fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 { 585 let mut alignment = I::function_alignment().minimum; 586 for (constant, offset) in mem::take(&mut self.used_constants) { 587 let constant = constants.get(constant); 588 let data = constant.as_slice(); 589 self.data[offset as usize..][..data.len()].copy_from_slice(data); 590 alignment = constant.alignment().max(alignment); 591 } 592 alignment 593 } 594 595 /// Returns a label that can be used to refer to the `constant` provided. 596 /// 597 /// This will automatically defer a new constant to be emitted for 598 /// `constant` if it has not been previously emitted. Note that this 599 /// function may return a different label for the same constant at 600 /// different points in time. The label is valid to use only from the 601 /// current location; the MachBuffer takes care to emit the same constant 602 /// multiple times if needed so the constant is always in range. 603 pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel { 604 let MachBufferConstant { 605 align, 606 size, 607 upcoming_label, 608 } = self.constants[constant]; 609 if let Some(label) = upcoming_label { 610 return label; 611 } 612 613 let label = self.get_label(); 614 trace!( 615 "defer constant: eventually emit {size} bytes aligned \ 616 to {align} at label {label:?}", 617 ); 618 self.update_deadline(size, u32::MAX); 619 self.pending_constants.push(constant); 620 self.constants[constant].upcoming_label = Some(label); 621 label 622 } 623 624 /// Bind a label to the current offset. A label can only be bound once. 625 pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) { 626 trace!( 627 "MachBuffer: bind label {:?} at offset {}", 628 label, 629 self.cur_offset() 630 ); 631 debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET); 632 debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL); 633 let offset = self.cur_offset(); 634 self.label_offsets[label.0 as usize] = offset; 635 self.lazily_clear_labels_at_tail(); 636 self.labels_at_tail.push(label); 637 638 // Invariants hold: bound offset of label is <= cur_offset (in fact it 639 // is equal). If the `labels_at_tail` list was complete and precise 640 // before, it is still, because we have bound this label to the current 641 // offset and added it to the list (which contains all labels at the 642 // current offset). 643 644 self.optimize_branches(ctrl_plane); 645 646 // Post-invariant: by `optimize_branches()` (see argument there). 647 } 648 649 /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the 650 /// offset that it applies to. 651 fn lazily_clear_labels_at_tail(&mut self) { 652 let offset = self.cur_offset(); 653 if offset > self.labels_at_tail_off { 654 self.labels_at_tail_off = offset; 655 self.labels_at_tail.clear(); 656 } 657 658 // Post-invariant: either labels_at_tail_off was at cur_offset, and 659 // state is untouched, or was less than cur_offset, in which case the 660 // labels_at_tail list was conceptually empty, and is now actually 661 // empty. 662 } 663 664 /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`. 665 pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset { 666 let mut iters = 0; 667 while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL { 668 label = self.label_aliases[label.0 as usize]; 669 // To protect against an infinite loop (despite our assurances to 670 // ourselves that the invariants make this impossible), assert out 671 // after 1M iterations. The number of basic blocks is limited 672 // in most contexts anyway so this should be impossible to hit with 673 // a legitimate input. 674 iters += 1; 675 assert!(iters < 1_000_000, "Unexpected cycle in label aliases"); 676 } 677 self.label_offsets[label.0 as usize] 678 679 // Post-invariant: no mutations. 680 } 681 682 /// Emit a reference to the given label with the given reference type (i.e., 683 /// branch-instruction format) at the current offset. This is like a 684 /// relocation, but handled internally. 685 /// 686 /// This can be called before the branch is actually emitted; fixups will 687 /// not happen until an island is emitted or the buffer is finished. 688 pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) { 689 trace!( 690 "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}", 691 offset, 692 label, 693 kind 694 ); 695 696 // Add the fixup, and update the worst-case island size based on a 697 // veneer for this label use. 698 self.fixup_records.push(MachLabelFixup { 699 label, 700 offset, 701 kind, 702 }); 703 if kind.supports_veneer() { 704 self.island_worst_case_size += kind.veneer_size(); 705 self.island_worst_case_size &= !(I::LabelUse::ALIGN - 1); 706 } 707 let deadline = offset.saturating_add(kind.max_pos_range()); 708 if deadline < self.island_deadline { 709 self.island_deadline = deadline; 710 } 711 712 // Post-invariant: no mutations to branches/labels data structures. 713 } 714 715 /// Inform the buffer of an unconditional branch at the given offset, 716 /// targetting the given label. May be used to optimize branches. 717 /// The last added label-use must correspond to this branch. 718 /// This must be called when the current offset is equal to `start`; i.e., 719 /// before actually emitting the branch. This implies that for a branch that 720 /// uses a label and is eligible for optimizations by the MachBuffer, the 721 /// proper sequence is: 722 /// 723 /// - Call `use_label_at_offset()` to emit the fixup record. 724 /// - Call `add_uncond_branch()` to make note of the branch. 725 /// - Emit the bytes for the branch's machine code. 726 /// 727 /// Additional requirement: no labels may be bound between `start` and `end` 728 /// (exclusive on both ends). 729 pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) { 730 assert!(self.cur_offset() == start); 731 debug_assert!(end > start); 732 assert!(!self.fixup_records.is_empty()); 733 let fixup = self.fixup_records.len() - 1; 734 self.lazily_clear_labels_at_tail(); 735 self.latest_branches.push(MachBranch { 736 start, 737 end, 738 target, 739 fixup, 740 inverted: None, 741 labels_at_this_branch: self.labels_at_tail.clone(), 742 }); 743 744 // Post-invariant: we asserted branch start is current tail; the list of 745 // labels at branch is cloned from list of labels at current tail. 746 } 747 748 /// Inform the buffer of a conditional branch at the given offset, 749 /// targetting the given label. May be used to optimize branches. 750 /// The last added label-use must correspond to this branch. 751 /// 752 /// Additional requirement: no labels may be bound between `start` and `end` 753 /// (exclusive on both ends). 754 pub fn add_cond_branch( 755 &mut self, 756 start: CodeOffset, 757 end: CodeOffset, 758 target: MachLabel, 759 inverted: &[u8], 760 ) { 761 assert!(self.cur_offset() == start); 762 debug_assert!(end > start); 763 assert!(!self.fixup_records.is_empty()); 764 debug_assert!(inverted.len() == (end - start) as usize); 765 let fixup = self.fixup_records.len() - 1; 766 let inverted = Some(SmallVec::from(inverted)); 767 self.lazily_clear_labels_at_tail(); 768 self.latest_branches.push(MachBranch { 769 start, 770 end, 771 target, 772 fixup, 773 inverted, 774 labels_at_this_branch: self.labels_at_tail.clone(), 775 }); 776 777 // Post-invariant: we asserted branch start is current tail; labels at 778 // branch list is cloned from list of labels at current tail. 779 } 780 781 fn truncate_last_branch(&mut self) { 782 self.lazily_clear_labels_at_tail(); 783 // Invariants hold at this point. 784 785 let b = self.latest_branches.pop().unwrap(); 786 assert!(b.end == self.cur_offset()); 787 788 // State: 789 // [PRE CODE] 790 // Offset b.start, b.labels_at_this_branch: 791 // [BRANCH CODE] 792 // cur_off, self.labels_at_tail --> 793 // (end of buffer) 794 self.data.truncate(b.start as usize); 795 self.fixup_records.truncate(b.fixup); 796 while let Some(last_srcloc) = self.srclocs.last_mut() { 797 if last_srcloc.end <= b.start { 798 break; 799 } 800 if last_srcloc.start < b.start { 801 last_srcloc.end = b.start; 802 break; 803 } 804 self.srclocs.pop(); 805 } 806 // State: 807 // [PRE CODE] 808 // cur_off, Offset b.start, b.labels_at_this_branch: 809 // (end of buffer) 810 // 811 // self.labels_at_tail --> (past end of buffer) 812 let cur_off = self.cur_offset(); 813 self.labels_at_tail_off = cur_off; 814 // State: 815 // [PRE CODE] 816 // cur_off, Offset b.start, b.labels_at_this_branch, 817 // self.labels_at_tail: 818 // (end of buffer) 819 // 820 // resolve_label_offset(l) for l in labels_at_tail: 821 // (past end of buffer) 822 823 trace!( 824 "truncate_last_branch: truncated {:?}; off now {}", 825 b, 826 cur_off 827 ); 828 829 // Fix up resolved label offsets for labels at tail. 830 for &l in &self.labels_at_tail { 831 self.label_offsets[l.0 as usize] = cur_off; 832 } 833 // Old labels_at_this_branch are now at cur_off. 834 self.labels_at_tail 835 .extend(b.labels_at_this_branch.into_iter()); 836 837 // Post-invariant: this operation is defined to truncate the buffer, 838 // which moves cur_off backward, and to move labels at the end of the 839 // buffer back to the start-of-branch offset. 840 // 841 // latest_branches satisfies all invariants: 842 // - it has no branches past the end of the buffer (branches are in 843 // order, we removed the last one, and we truncated the buffer to just 844 // before the start of that branch) 845 // - no labels were moved to lower offsets than the (new) cur_off, so 846 // the labels_at_this_branch list for any other branch need not change. 847 // 848 // labels_at_tail satisfies all invariants: 849 // - all labels that were at the tail after the truncated branch are 850 // moved backward to just before the branch, which becomes the new tail; 851 // thus every element in the list should remain (ensured by `.extend()` 852 // above). 853 // - all labels that refer to the new tail, which is the start-offset of 854 // the truncated branch, must be present. The `labels_at_this_branch` 855 // list in the truncated branch's record is a complete and precise list 856 // of exactly these labels; we append these to labels_at_tail. 857 // - labels_at_tail_off is at cur_off after truncation occurs, so the 858 // list is valid (not to be lazily cleared). 859 // 860 // The stated operation was performed: 861 // - For each label at the end of the buffer prior to this method, it 862 // now resolves to the new (truncated) end of the buffer: it must have 863 // been in `labels_at_tail` (this list is precise and complete, and 864 // the tail was at the end of the truncated branch on entry), and we 865 // iterate over this list and set `label_offsets` to the new tail. 866 // None of these labels could have been an alias (by invariant), so 867 // `label_offsets` is authoritative for each. 868 // - No other labels will be past the end of the buffer, because of the 869 // requirement that no labels be bound to the middle of branch ranges 870 // (see comments to `add_{cond,uncond}_branch()`). 871 // - The buffer is truncated to just before the last branch, and the 872 // fixup record referring to that last branch is removed. 873 } 874 875 fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) { 876 if ctrl_plane.get_decision() { 877 return; 878 } 879 880 self.lazily_clear_labels_at_tail(); 881 // Invariants valid at this point. 882 883 trace!( 884 "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 885 self.latest_branches, 886 self.labels_at_tail, 887 self.fixup_records 888 ); 889 890 // We continue to munch on branches at the tail of the buffer until no 891 // more rules apply. Note that the loop only continues if a branch is 892 // actually truncated (or if labels are redirected away from a branch), 893 // so this always makes progress. 894 while let Some(b) = self.latest_branches.last() { 895 let cur_off = self.cur_offset(); 896 trace!("optimize_branches: last branch {:?} at off {}", b, cur_off); 897 // If there has been any code emission since the end of the last branch or 898 // label definition, then there's nothing we can edit (because we 899 // don't move code once placed, only back up and overwrite), so 900 // clear the records and finish. 901 if b.end < cur_off { 902 break; 903 } 904 905 // If the "labels at this branch" list on this branch is 906 // longer than a threshold, don't do any simplification, 907 // and let the branch remain to separate those labels from 908 // the current tail. This avoids quadratic behavior (see 909 // #3468): otherwise, if a long string of "goto next; 910 // next:" patterns are emitted, all of the labels will 911 // coalesce into a long list of aliases for the current 912 // buffer tail. We must track all aliases of the current 913 // tail for correctness, but we are also allowed to skip 914 // optimization (removal) of any branch, so we take the 915 // escape hatch here and let it stand. In effect this 916 // "spreads" the many thousands of labels in the 917 // pathological case among an actual (harmless but 918 // suboptimal) instruction once per N labels. 919 if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD { 920 break; 921 } 922 923 // Invariant: we are looking at a branch that ends at the tail of 924 // the buffer. 925 926 // For any branch, conditional or unconditional: 927 // - If the target is a label at the current offset, then remove 928 // the conditional branch, and reset all labels that targetted 929 // the current offset (end of branch) to the truncated 930 // end-of-code. 931 // 932 // Preserves execution semantics: a branch to its own fallthrough 933 // address is equivalent to a no-op; in both cases, nextPC is the 934 // fallthrough. 935 if self.resolve_label_offset(b.target) == cur_off { 936 trace!("branch with target == cur off; truncating"); 937 self.truncate_last_branch(); 938 continue; 939 } 940 941 // If latest is an unconditional branch: 942 // 943 // - If the branch's target is not its own start address, then for 944 // each label at the start of branch, make the label an alias of the 945 // branch target, and remove the label from the "labels at this 946 // branch" list. 947 // 948 // - Preserves execution semantics: an unconditional branch's 949 // only effect is to set PC to a new PC; this change simply 950 // collapses one step in the step-semantics. 951 // 952 // - Post-invariant: the labels that were bound to the start of 953 // this branch become aliases, so they must not be present in any 954 // labels-at-this-branch list or the labels-at-tail list. The 955 // labels are removed form the latest-branch record's 956 // labels-at-this-branch list, and are never placed in the 957 // labels-at-tail list. Furthermore, it is correct that they are 958 // not in either list, because they are now aliases, and labels 959 // that are aliases remain aliases forever. 960 // 961 // - If there is a prior unconditional branch that ends just before 962 // this one begins, and this branch has no labels bound to its 963 // start, then we can truncate this branch, because it is entirely 964 // unreachable (we have redirected all labels that make it 965 // reachable otherwise). Do so and continue around the loop. 966 // 967 // - Preserves execution semantics: the branch is unreachable, 968 // because execution can only flow into an instruction from the 969 // prior instruction's fallthrough or from a branch bound to that 970 // instruction's start offset. Unconditional branches have no 971 // fallthrough, so if the prior instruction is an unconditional 972 // branch, no fallthrough entry can happen. The 973 // labels-at-this-branch list is complete (by invariant), so if it 974 // is empty, then the instruction is entirely unreachable. Thus, 975 // it can be removed. 976 // 977 // - Post-invariant: ensured by truncate_last_branch(). 978 // 979 // - If there is a prior conditional branch whose target label 980 // resolves to the current offset (branches around the 981 // unconditional branch), then remove the unconditional branch, 982 // and make the target of the unconditional the target of the 983 // conditional instead. 984 // 985 // - Preserves execution semantics: previously we had: 986 // 987 // L1: 988 // cond_br L2 989 // br L3 990 // L2: 991 // (end of buffer) 992 // 993 // by removing the last branch, we have: 994 // 995 // L1: 996 // cond_br L2 997 // L2: 998 // (end of buffer) 999 // 1000 // we then fix up the records for the conditional branch to 1001 // have: 1002 // 1003 // L1: 1004 // cond_br.inverted L3 1005 // L2: 1006 // 1007 // In the original code, control flow reaches L2 when the 1008 // conditional branch's predicate is true, and L3 otherwise. In 1009 // the optimized code, the same is true. 1010 // 1011 // - Post-invariant: all edits to latest_branches and 1012 // labels_at_tail are performed by `truncate_last_branch()`, 1013 // which maintains the invariants at each step. 1014 1015 if b.is_uncond() { 1016 // Set any label equal to current branch's start as an alias of 1017 // the branch's target, if the target is not the branch itself 1018 // (i.e., an infinite loop). 1019 // 1020 // We cannot perform this aliasing if the target of this branch 1021 // ultimately aliases back here; if so, we need to keep this 1022 // branch, so break out of this loop entirely (and clear the 1023 // latest-branches list below). 1024 // 1025 // Note that this check is what prevents cycles from forming in 1026 // `self.label_aliases`. To see why, consider an arbitrary start 1027 // state: 1028 // 1029 // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to 1030 // Ln, which is not aliased. 1031 // 1032 // We would create a cycle if we assigned label_aliases[Ln] 1033 // = L1. Note that the below assignment is the only write 1034 // to label_aliases. 1035 // 1036 // By our other invariants, we have that Ln (`l` below) 1037 // resolves to the offset `b.start`, because it is in the 1038 // set `b.labels_at_this_branch`. 1039 // 1040 // If L1 were already aliased, through some arbitrarily deep 1041 // chain, to Ln, then it must also resolve to this offset 1042 // `b.start`. 1043 // 1044 // By checking the resolution of `L1` against this offset, 1045 // and aborting this branch-simplification if they are 1046 // equal, we prevent the below assignment from ever creating 1047 // a cycle. 1048 if self.resolve_label_offset(b.target) != b.start { 1049 let redirected = b.labels_at_this_branch.len(); 1050 for &l in &b.labels_at_this_branch { 1051 trace!( 1052 " -> label at start of branch {:?} redirected to target {:?}", 1053 l, 1054 b.target 1055 ); 1056 self.label_aliases[l.0 as usize] = b.target; 1057 // NOTE: we continue to ensure the invariant that labels 1058 // pointing to tail of buffer are in `labels_at_tail` 1059 // because we already ensured above that the last branch 1060 // cannot have a target of `cur_off`; so we never have 1061 // to put the label into `labels_at_tail` when moving it 1062 // here. 1063 } 1064 // Maintain invariant: all branches have been redirected 1065 // and are no longer pointing at the start of this branch. 1066 let mut_b = self.latest_branches.last_mut().unwrap(); 1067 mut_b.labels_at_this_branch.clear(); 1068 1069 if redirected > 0 { 1070 trace!(" -> after label redirects, restarting loop"); 1071 continue; 1072 } 1073 } else { 1074 break; 1075 } 1076 1077 let b = self.latest_branches.last().unwrap(); 1078 1079 // Examine any immediately preceding branch. 1080 if self.latest_branches.len() > 1 { 1081 let prev_b = &self.latest_branches[self.latest_branches.len() - 2]; 1082 trace!(" -> more than one branch; prev_b = {:?}", prev_b); 1083 // This uncond is immediately after another uncond; we 1084 // should have already redirected labels to this uncond away 1085 // (but check to be sure); so we can truncate this uncond. 1086 if prev_b.is_uncond() 1087 && prev_b.end == b.start 1088 && b.labels_at_this_branch.is_empty() 1089 { 1090 trace!(" -> uncond follows another uncond; truncating"); 1091 self.truncate_last_branch(); 1092 continue; 1093 } 1094 1095 // This uncond is immediately after a conditional, and the 1096 // conditional's target is the end of this uncond, and we've 1097 // already redirected labels to this uncond away; so we can 1098 // truncate this uncond, flip the sense of the conditional, and 1099 // set the conditional's target (in `latest_branches` and in 1100 // `fixup_records`) to the uncond's target. 1101 if prev_b.is_cond() 1102 && prev_b.end == b.start 1103 && self.resolve_label_offset(prev_b.target) == cur_off 1104 { 1105 trace!(" -> uncond follows a conditional, and conditional's target resolves to current offset"); 1106 // Save the target of the uncond (this becomes the 1107 // target of the cond), and truncate the uncond. 1108 let target = b.target; 1109 let data = prev_b.inverted.clone().unwrap(); 1110 self.truncate_last_branch(); 1111 1112 // Mutate the code and cond branch. 1113 let off_before_edit = self.cur_offset(); 1114 let prev_b = self.latest_branches.last_mut().unwrap(); 1115 let not_inverted = SmallVec::from( 1116 &self.data[(prev_b.start as usize)..(prev_b.end as usize)], 1117 ); 1118 1119 // Low-level edit: replaces bytes of branch with 1120 // inverted form. cur_off remains the same afterward, so 1121 // we do not need to modify label data structures. 1122 self.data.truncate(prev_b.start as usize); 1123 self.data.extend_from_slice(&data[..]); 1124 1125 // Save the original code as the inversion of the 1126 // inverted branch, in case we later edit this branch 1127 // again. 1128 prev_b.inverted = Some(not_inverted); 1129 self.fixup_records[prev_b.fixup].label = target; 1130 trace!(" -> reassigning target of condbr to {:?}", target); 1131 prev_b.target = target; 1132 debug_assert_eq!(off_before_edit, self.cur_offset()); 1133 continue; 1134 } 1135 } 1136 } 1137 1138 // If we couldn't do anything with the last branch, then break. 1139 break; 1140 } 1141 1142 self.purge_latest_branches(); 1143 1144 trace!( 1145 "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}", 1146 self.latest_branches, 1147 self.labels_at_tail, 1148 self.fixup_records 1149 ); 1150 } 1151 1152 fn purge_latest_branches(&mut self) { 1153 // All of our branch simplification rules work only if a branch ends at 1154 // the tail of the buffer, with no following code; and branches are in 1155 // order in latest_branches; so if the last entry ends prior to 1156 // cur_offset, then clear all entries. 1157 let cur_off = self.cur_offset(); 1158 if let Some(l) = self.latest_branches.last() { 1159 if l.end < cur_off { 1160 trace!("purge_latest_branches: removing branch {:?}", l); 1161 self.latest_branches.clear(); 1162 } 1163 } 1164 1165 // Post-invariant: no invariant requires any branch to appear in 1166 // `latest_branches`; it is always optional. The list-clear above thus 1167 // preserves all semantics. 1168 } 1169 1170 /// Emit a trap at some point in the future with the specified code and 1171 /// stack map. 1172 /// 1173 /// This function returns a [`MachLabel`] which will be the future address 1174 /// of the trap. Jumps should refer to this label, likely by using the 1175 /// [`MachBuffer::use_label_at_offset`] method, to get a relocation 1176 /// patched in once the address of the trap is known. 1177 /// 1178 /// This will batch all traps into the end of the function. 1179 pub fn defer_trap(&mut self, code: TrapCode, stack_map: Option<StackMap>) -> MachLabel { 1180 let label = self.get_label(); 1181 self.update_deadline(I::TRAP_OPCODE.len(), u32::MAX); 1182 self.pending_traps.push(MachLabelTrap { 1183 label, 1184 code, 1185 stack_map, 1186 loc: self.cur_srcloc.map(|(_start, loc)| loc), 1187 }); 1188 label 1189 } 1190 1191 fn update_deadline(&mut self, len: usize, max_distance: CodeOffset) { 1192 trace!("defer: eventually emit {} bytes", len); 1193 let deadline = self.cur_offset().saturating_add(max_distance); 1194 self.island_worst_case_size += len as CodeOffset; 1195 self.island_worst_case_size = 1196 (self.island_worst_case_size + I::LabelUse::ALIGN - 1) & !(I::LabelUse::ALIGN - 1); 1197 if deadline < self.island_deadline { 1198 self.island_deadline = deadline; 1199 } 1200 } 1201 1202 /// Is an island needed within the next N bytes? 1203 pub fn island_needed(&self, distance: CodeOffset) -> bool { 1204 self.worst_case_end_of_island(distance) > self.island_deadline 1205 } 1206 1207 /// Returns the maximal offset that islands can reach if `distance` more 1208 /// bytes are appended. 1209 /// 1210 /// This is used to determine if veneers need insertions since jumps that 1211 /// can't reach past this point must get a veneer of some form. 1212 fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset { 1213 self.cur_offset() 1214 .saturating_add(distance) 1215 .saturating_add(self.island_worst_case_size) 1216 } 1217 1218 /// Emit all pending constants and required pending veneers. 1219 /// 1220 /// Should only be called if `island_needed()` returns true, i.e., if we 1221 /// actually reach a deadline. It's not necessarily a problem to do so 1222 /// otherwise but it may result in unnecessary work during emission. 1223 pub fn emit_island(&mut self, ctrl_plane: &mut ControlPlane) { 1224 self.emit_island_maybe_forced(ForceVeneers::No, IsLastIsland::No, ctrl_plane); 1225 } 1226 1227 /// Same as `emit_island`, but an internal API with a `force_veneers` 1228 /// argument to force all veneers to always get emitted for debugging. 1229 fn emit_island_maybe_forced( 1230 &mut self, 1231 force_veneers: ForceVeneers, 1232 last_island: IsLastIsland, 1233 ctrl_plane: &mut ControlPlane, 1234 ) { 1235 // We're going to purge fixups, so no latest-branch editing can happen 1236 // anymore. 1237 self.latest_branches.clear(); 1238 1239 // Reset internal calculations about islands since we're going to 1240 // change the calculus as we apply fixups. 1241 self.island_deadline = UNKNOWN_LABEL_OFFSET; 1242 self.island_worst_case_size = 0; 1243 1244 // End the current location tracking since anything emitted during this 1245 // function shouldn't be attributed to whatever the current source 1246 // location is. 1247 // 1248 // Note that the current source location, if it's set right now, will be 1249 // restored at the end of this island emission. 1250 let cur_loc = self.cur_srcloc.map(|(_, loc)| loc); 1251 if cur_loc.is_some() { 1252 self.end_srcloc(); 1253 } 1254 1255 // First flush out all traps/constants so we have more labels in case 1256 // fixups are applied against these labels. 1257 // 1258 // Note that traps are placed first since this typically happens at the 1259 // end of the function and for disassemblers we try to keep all the code 1260 // contiguously together. 1261 for MachLabelTrap { 1262 label, 1263 code, 1264 stack_map, 1265 loc, 1266 } in mem::take(&mut self.pending_traps) 1267 { 1268 // If this trap has source information associated with it then 1269 // emit this information for the trap instruction going out now too. 1270 if let Some(loc) = loc { 1271 self.start_srcloc(loc); 1272 } 1273 self.align_to(I::LabelUse::ALIGN); 1274 self.bind_label(label, ctrl_plane); 1275 self.add_trap(code); 1276 if let Some(map) = stack_map { 1277 let extent = StackMapExtent::UpcomingBytes(I::TRAP_OPCODE.len() as u32); 1278 self.add_stack_map(extent, map); 1279 } 1280 self.put_data(I::TRAP_OPCODE); 1281 if loc.is_some() { 1282 self.end_srcloc(); 1283 } 1284 } 1285 1286 for constant in mem::take(&mut self.pending_constants) { 1287 let MachBufferConstant { align, size, .. } = self.constants[constant]; 1288 let label = self.constants[constant].upcoming_label.take().unwrap(); 1289 self.align_to(align); 1290 self.bind_label(label, ctrl_plane); 1291 self.used_constants.push((constant, self.cur_offset())); 1292 self.get_appended_space(size); 1293 } 1294 1295 let last_island_fixups = match last_island { 1296 IsLastIsland::Yes => mem::take(&mut self.fixup_records_max_range), 1297 IsLastIsland::No => smallvec![], 1298 }; 1299 for fixup in mem::take(&mut self.fixup_records) 1300 .into_iter() 1301 .chain(last_island_fixups.into_iter()) 1302 { 1303 trace!("emit_island: fixup {:?}", fixup); 1304 let MachLabelFixup { 1305 label, 1306 offset, 1307 kind, 1308 } = fixup; 1309 let label_offset = self.resolve_label_offset(label); 1310 let start = offset as usize; 1311 let end = (offset + kind.patch_size()) as usize; 1312 1313 if label_offset != UNKNOWN_LABEL_OFFSET { 1314 // If the offset of the label for this fixup is known then 1315 // we're going to do something here-and-now. We're either going 1316 // to patch the original offset because it's an in-bounds jump, 1317 // or we're going to generate a veneer, patch the fixup to jump 1318 // to the veneer, and then keep going. 1319 // 1320 // If the label comes after the original fixup, then we should 1321 // be guaranteed that the jump is in-bounds. Otherwise there's 1322 // a bug somewhere because this method wasn't called soon 1323 // enough. All forward-jumps are tracked and should get veneers 1324 // before their deadline comes and they're unable to jump 1325 // further. 1326 // 1327 // Otherwise if the label is before the fixup, then that's a 1328 // backwards jump. If it's past the maximum negative range 1329 // then we'll emit a veneer that to jump forward to which can 1330 // then jump backwards. 1331 let veneer_required = if label_offset >= offset { 1332 assert!((label_offset - offset) <= kind.max_pos_range()); 1333 false 1334 } else { 1335 (offset - label_offset) > kind.max_neg_range() 1336 }; 1337 trace!( 1338 " -> label_offset = {}, known, required = {} (pos {} neg {})", 1339 label_offset, 1340 veneer_required, 1341 kind.max_pos_range(), 1342 kind.max_neg_range() 1343 ); 1344 1345 if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required 1346 { 1347 self.emit_veneer(label, offset, kind); 1348 } else { 1349 let slice = &mut self.data[start..end]; 1350 trace!("patching in-range!"); 1351 kind.patch(slice, offset, label_offset); 1352 } 1353 } else { 1354 // If the offset of this label is not known at this time then 1355 // there are three possibilities: 1356 // 1357 // 1. It's possible that the label is already a "max 1358 // range" label: a veneer would not help us any, 1359 // and so we need not consider the label during 1360 // island emission any more until the very end (the 1361 // last "island" pass). In this case we kick the 1362 // label into a separate list to process once at 1363 // the end, to avoid quadratic behavior (see 1364 // "quadratic island behavior" above, and issue 1365 // #6798). 1366 // 1367 // 2. Or, we may be about to exceed the maximum jump range of 1368 // this fixup. In that case a veneer is inserted to buy some 1369 // more budget for the forward-jump. It's guaranteed that the 1370 // label will eventually come after where we're at, so we know 1371 // that the forward jump is necessary. 1372 // 1373 // 3. Otherwise, we're still within range of the 1374 // forward jump but the precise target isn't known 1375 // yet. In that case, to avoid quadratic behavior 1376 // (again, see above), we emit a veneer and if the 1377 // resulting label-use fixup is then max-range, we 1378 // put it in the max-range list. We could enqueue 1379 // the fixup for processing later, and this would 1380 // enable slightly fewer veneers, but islands are 1381 // relatively rare and the cost of "upgrading" all 1382 // forward label refs that cross an island should 1383 // be relatively low. 1384 if !kind.supports_veneer() { 1385 self.fixup_records_max_range.push(MachLabelFixup { 1386 label, 1387 offset, 1388 kind, 1389 }); 1390 } else { 1391 self.emit_veneer(label, offset, kind); 1392 } 1393 } 1394 } 1395 1396 if let Some(loc) = cur_loc { 1397 self.start_srcloc(loc); 1398 } 1399 } 1400 1401 /// Emits a "veneer" the `kind` code at `offset` to jump to `label`. 1402 /// 1403 /// This will generate extra machine code, using `kind`, to get a 1404 /// larger-jump-kind than `kind` allows. The code at `offset` is then 1405 /// patched to jump to our new code, and then the new code is enqueued for 1406 /// a fixup to get processed at some later time. 1407 fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) { 1408 // If this `kind` doesn't support a veneer then that's a bug in the 1409 // backend because we need to implement support for such a veneer. 1410 assert!( 1411 kind.supports_veneer(), 1412 "jump beyond the range of {:?} but a veneer isn't supported", 1413 kind, 1414 ); 1415 1416 // Allocate space for a veneer in the island. 1417 self.align_to(I::LabelUse::ALIGN); 1418 let veneer_offset = self.cur_offset(); 1419 trace!("making a veneer at {}", veneer_offset); 1420 let start = offset as usize; 1421 let end = (offset + kind.patch_size()) as usize; 1422 let slice = &mut self.data[start..end]; 1423 // Patch the original label use to refer to the veneer. 1424 trace!( 1425 "patching original at offset {} to veneer offset {}", 1426 offset, 1427 veneer_offset 1428 ); 1429 kind.patch(slice, offset, veneer_offset); 1430 // Generate the veneer. 1431 let veneer_slice = self.get_appended_space(kind.veneer_size() as usize); 1432 let (veneer_fixup_off, veneer_label_use) = 1433 kind.generate_veneer(veneer_slice, veneer_offset); 1434 trace!( 1435 "generated veneer; fixup offset {}, label_use {:?}", 1436 veneer_fixup_off, 1437 veneer_label_use 1438 ); 1439 // Register a new use of `label` with our new veneer fixup and 1440 // offset. This'll recalculate deadlines accordingly and 1441 // enqueue this fixup to get processed at some later 1442 // time. Note that if we now have a max-range, we instead skip 1443 // the usual fixup list to avoid quadratic behavior. 1444 if veneer_label_use.supports_veneer() { 1445 self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use); 1446 } else { 1447 self.fixup_records_max_range.push(MachLabelFixup { 1448 label, 1449 offset: veneer_fixup_off, 1450 kind: veneer_label_use, 1451 }); 1452 } 1453 } 1454 1455 fn finish_emission_maybe_forcing_veneers( 1456 &mut self, 1457 force_veneers: ForceVeneers, 1458 ctrl_plane: &mut ControlPlane, 1459 ) { 1460 while !self.pending_constants.is_empty() 1461 || !self.pending_traps.is_empty() 1462 || !self.fixup_records.is_empty() 1463 || !self.fixup_records_max_range.is_empty() 1464 { 1465 // `emit_island()` will emit any pending veneers and constants, and 1466 // as a side-effect, will also take care of any fixups with resolved 1467 // labels eagerly. 1468 self.emit_island_maybe_forced(force_veneers, IsLastIsland::Yes, ctrl_plane); 1469 } 1470 1471 // Ensure that all labels have been fixed up after the last island is emitted. This is a 1472 // full (release-mode) assert because an unresolved label means the emitted code is 1473 // incorrect. 1474 assert!(self.fixup_records.is_empty()); 1475 } 1476 1477 /// Finish any deferred emissions and/or fixups. 1478 pub fn finish( 1479 mut self, 1480 constants: &VCodeConstants, 1481 ctrl_plane: &mut ControlPlane, 1482 ) -> MachBufferFinalized<Stencil> { 1483 let _tt = timing::vcode_emit_finish(); 1484 1485 // Do any optimizations on branches at tail of buffer, as if we 1486 // had bound one last label. 1487 self.optimize_branches(ctrl_plane); 1488 1489 self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane); 1490 1491 let alignment = self.finish_constants(constants); 1492 1493 let mut srclocs = self.srclocs; 1494 srclocs.sort_by_key(|entry| entry.start); 1495 1496 MachBufferFinalized { 1497 data: self.data, 1498 relocs: self.relocs, 1499 traps: self.traps, 1500 call_sites: self.call_sites, 1501 srclocs, 1502 stack_maps: self.stack_maps, 1503 unwind_info: self.unwind_info, 1504 alignment, 1505 } 1506 } 1507 1508 /// Add an external relocation at the current offset. 1509 pub fn add_reloc(&mut self, kind: Reloc, name: &ExternalName, addend: Addend) { 1510 let name = name.clone(); 1511 // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally 1512 // generate a label-use statement to track whether an island is possibly 1513 // needed to escape this function to actually get to the external name. 1514 // This is most likely to come up on AArch64 where calls between 1515 // functions use a 26-bit signed offset which gives +/- 64MB. This means 1516 // that if a function is 128MB in size and there's a call in the middle 1517 // it's impossible to reach the actual target. Also, while it's 1518 // technically possible to jump to the start of a function and then jump 1519 // further, island insertion below always inserts islands after 1520 // previously appended code so for Cranelift's own implementation this 1521 // is also a problem for 64MB functions on AArch64 which start with a 1522 // call instruction, those won't be able to escape. 1523 // 1524 // Ideally what needs to happen here is that a `LabelUse` is 1525 // transparently generated (or call-sites of this function are audited 1526 // to generate a `LabelUse` instead) and tracked internally. The actual 1527 // relocation would then change over time if and when a veneer is 1528 // inserted, where the relocation here would be patched by this 1529 // `MachBuffer` to jump to the veneer. The problem, though, is that all 1530 // this still needs to end up, in the case of a singular function, 1531 // generating a final relocation pointing either to this particular 1532 // relocation or to the veneer inserted. Additionally 1533 // `MachBuffer` needs the concept of a label which will never be 1534 // resolved, so `emit_island` doesn't trip over not actually ever 1535 // knowning what some labels are. Currently the loop in 1536 // `finish_emission_maybe_forcing_veneers` would otherwise infinitely 1537 // loop. 1538 // 1539 // For now this means that because relocs aren't tracked at all that 1540 // AArch64 functions have a rough size limits of 64MB. For now that's 1541 // somewhat reasonable and the failure mode is a panic in `MachBuffer` 1542 // when a relocation can't otherwise be resolved later, so it shouldn't 1543 // actually result in any memory unsafety or anything like that. 1544 self.relocs.push(MachReloc { 1545 offset: self.data.len() as CodeOffset, 1546 kind, 1547 name, 1548 addend, 1549 }); 1550 } 1551 1552 /// Add a trap record at the current offset. 1553 pub fn add_trap(&mut self, code: TrapCode) { 1554 self.traps.push(MachTrap { 1555 offset: self.data.len() as CodeOffset, 1556 code, 1557 }); 1558 } 1559 1560 /// Add a call-site record at the current offset. 1561 pub fn add_call_site(&mut self, opcode: Opcode) { 1562 debug_assert!( 1563 opcode.is_call(), 1564 "adding call site info for a non-call instruction." 1565 ); 1566 self.call_sites.push(MachCallSite { 1567 ret_addr: self.data.len() as CodeOffset, 1568 opcode, 1569 }); 1570 } 1571 1572 /// Add an unwind record at the current offset. 1573 pub fn add_unwind(&mut self, unwind: UnwindInst) { 1574 self.unwind_info.push((self.cur_offset(), unwind)); 1575 } 1576 1577 /// Set the `SourceLoc` for code from this offset until the offset at the 1578 /// next call to `end_srcloc()`. 1579 pub fn start_srcloc(&mut self, loc: RelSourceLoc) { 1580 self.cur_srcloc = Some((self.cur_offset(), loc)); 1581 } 1582 1583 /// Mark the end of the `SourceLoc` segment started at the last 1584 /// `start_srcloc()` call. 1585 pub fn end_srcloc(&mut self) { 1586 let (start, loc) = self 1587 .cur_srcloc 1588 .take() 1589 .expect("end_srcloc() called without start_srcloc()"); 1590 let end = self.cur_offset(); 1591 // Skip zero-length extends. 1592 debug_assert!(end >= start); 1593 if end > start { 1594 self.srclocs.push(MachSrcLoc { start, end, loc }); 1595 } 1596 } 1597 1598 /// Add stack map metadata for this program point: a set of stack offsets 1599 /// (from SP upward) that contain live references. 1600 /// 1601 /// The `offset_to_fp` value is the offset from the nominal SP (at which the `stack_offsets` 1602 /// are based) and the FP value. By subtracting `offset_to_fp` from each `stack_offsets` 1603 /// element, one can obtain live-reference offsets from FP instead. 1604 pub fn add_stack_map(&mut self, extent: StackMapExtent, stack_map: StackMap) { 1605 let (start, end) = match extent { 1606 StackMapExtent::UpcomingBytes(insn_len) => { 1607 let start_offset = self.cur_offset(); 1608 (start_offset, start_offset + insn_len) 1609 } 1610 StackMapExtent::StartedAtOffset(start_offset) => { 1611 let end_offset = self.cur_offset(); 1612 debug_assert!(end_offset >= start_offset); 1613 (start_offset, end_offset) 1614 } 1615 }; 1616 trace!("Adding stack map for offsets {start:#x}..{end:#x}"); 1617 self.stack_maps.push(MachStackMap { 1618 offset: start, 1619 offset_end: end, 1620 stack_map, 1621 }); 1622 } 1623 } 1624 1625 impl<T: CompilePhase> MachBufferFinalized<T> { 1626 /// Get a list of source location mapping tuples in sorted-by-start-offset order. 1627 pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] { 1628 &self.srclocs[..] 1629 } 1630 1631 /// Get the total required size for the code. 1632 pub fn total_size(&self) -> CodeOffset { 1633 self.data.len() as CodeOffset 1634 } 1635 1636 /// Return the code in this mach buffer as a hex string for testing purposes. 1637 pub fn stringify_code_bytes(&self) -> String { 1638 // This is pretty lame, but whatever .. 1639 use std::fmt::Write; 1640 let mut s = String::with_capacity(self.data.len() * 2); 1641 for b in &self.data { 1642 write!(&mut s, "{:02X}", b).unwrap(); 1643 } 1644 s 1645 } 1646 1647 /// Get the code bytes. 1648 pub fn data(&self) -> &[u8] { 1649 // N.B.: we emit every section into the .text section as far as 1650 // the `CodeSink` is concerned; we do not bother to segregate 1651 // the contents into the actual program text, the jumptable and the 1652 // rodata (constant pool). This allows us to generate code assuming 1653 // that these will not be relocated relative to each other, and avoids 1654 // having to designate each section as belonging in one of the three 1655 // fixed categories defined by `CodeSink`. If this becomes a problem 1656 // later (e.g. because of memory permissions or similar), we can 1657 // add this designation and segregate the output; take care, however, 1658 // to add the appropriate relocations in this case. 1659 1660 &self.data[..] 1661 } 1662 1663 /// Get the list of external relocations for this code. 1664 pub fn relocs(&self) -> &[MachReloc] { 1665 &self.relocs[..] 1666 } 1667 1668 /// Get the list of trap records for this code. 1669 pub fn traps(&self) -> &[MachTrap] { 1670 &self.traps[..] 1671 } 1672 1673 /// Get the stack map metadata for this code. 1674 pub fn stack_maps(&self) -> &[MachStackMap] { 1675 &self.stack_maps[..] 1676 } 1677 1678 /// Get the list of call sites for this code. 1679 pub fn call_sites(&self) -> &[MachCallSite] { 1680 &self.call_sites[..] 1681 } 1682 } 1683 1684 /// Metadata about a constant. 1685 struct MachBufferConstant { 1686 /// A label which has not yet been bound which can be used for this 1687 /// constant. 1688 /// 1689 /// This is lazily created when a label is requested for a constant and is 1690 /// cleared when a constant is emitted. 1691 upcoming_label: Option<MachLabel>, 1692 /// Required alignment. 1693 align: CodeOffset, 1694 /// The byte size of this constant. 1695 size: usize, 1696 } 1697 1698 /// A trap that is deferred to the next time an island is emitted for either 1699 /// traps, constants, or fixups. 1700 struct MachLabelTrap { 1701 /// This label will refer to the trap's offset. 1702 label: MachLabel, 1703 /// The code associated with this trap. 1704 code: TrapCode, 1705 /// An optional stack map to associate with this trap. 1706 stack_map: Option<StackMap>, 1707 /// An optional source location to assign for this trap. 1708 loc: Option<RelSourceLoc>, 1709 } 1710 1711 /// A fixup to perform on the buffer once code is emitted. Fixups always refer 1712 /// to labels and patch the code based on label offsets. Hence, they are like 1713 /// relocations, but internal to one buffer. 1714 #[derive(Debug)] 1715 struct MachLabelFixup<I: VCodeInst> { 1716 /// The label whose offset controls this fixup. 1717 label: MachLabel, 1718 /// The offset to fix up / patch to refer to this label. 1719 offset: CodeOffset, 1720 /// The kind of fixup. This is architecture-specific; each architecture may have, 1721 /// e.g., several types of branch instructions, each with differently-sized 1722 /// offset fields and different places within the instruction to place the 1723 /// bits. 1724 kind: I::LabelUse, 1725 } 1726 1727 /// A relocation resulting from a compilation. 1728 #[derive(Clone, Debug, PartialEq)] 1729 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 1730 pub struct MachReloc { 1731 /// The offset at which the relocation applies, *relative to the 1732 /// containing section*. 1733 pub offset: CodeOffset, 1734 /// The kind of relocation. 1735 pub kind: Reloc, 1736 /// The external symbol / name to which this relocation refers. 1737 pub name: ExternalName, 1738 /// The addend to add to the symbol value. 1739 pub addend: i64, 1740 } 1741 1742 /// A trap record resulting from a compilation. 1743 #[derive(Clone, Debug, PartialEq)] 1744 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 1745 pub struct MachTrap { 1746 /// The offset at which the trap instruction occurs, *relative to the 1747 /// containing section*. 1748 pub offset: CodeOffset, 1749 /// The trap code. 1750 pub code: TrapCode, 1751 } 1752 1753 /// A call site record resulting from a compilation. 1754 #[derive(Clone, Debug, PartialEq)] 1755 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 1756 pub struct MachCallSite { 1757 /// The offset of the call's return address, *relative to the containing section*. 1758 pub ret_addr: CodeOffset, 1759 /// The call's opcode. 1760 pub opcode: Opcode, 1761 } 1762 1763 /// A source-location mapping resulting from a compilation. 1764 #[derive(PartialEq, Debug, Clone)] 1765 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 1766 pub struct MachSrcLoc<T: CompilePhase> { 1767 /// The start of the region of code corresponding to a source location. 1768 /// This is relative to the start of the function, not to the start of the 1769 /// section. 1770 pub start: CodeOffset, 1771 /// The end of the region of code corresponding to a source location. 1772 /// This is relative to the start of the section, not to the start of the 1773 /// section. 1774 pub end: CodeOffset, 1775 /// The source location. 1776 pub loc: T::SourceLocType, 1777 } 1778 1779 impl MachSrcLoc<Stencil> { 1780 fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> { 1781 MachSrcLoc { 1782 start: self.start, 1783 end: self.end, 1784 loc: self.loc.expand(base_srcloc), 1785 } 1786 } 1787 } 1788 1789 /// Record of stack map metadata: stack offsets containing references. 1790 #[derive(Clone, Debug, PartialEq)] 1791 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))] 1792 pub struct MachStackMap { 1793 /// The code offset at which this stack map applies. 1794 pub offset: CodeOffset, 1795 /// The code offset just past the "end" of the instruction: that is, the 1796 /// offset of the first byte of the following instruction, or equivalently, 1797 /// the start offset plus the instruction length. 1798 pub offset_end: CodeOffset, 1799 /// The stack map itself. 1800 pub stack_map: StackMap, 1801 } 1802 1803 /// Record of branch instruction in the buffer, to facilitate editing. 1804 #[derive(Clone, Debug)] 1805 struct MachBranch { 1806 start: CodeOffset, 1807 end: CodeOffset, 1808 target: MachLabel, 1809 fixup: usize, 1810 inverted: Option<SmallVec<[u8; 8]>>, 1811 /// All labels pointing to the start of this branch. For correctness, this 1812 /// *must* be complete (i.e., must contain all labels whose resolved offsets 1813 /// are at the start of this branch): we rely on being able to redirect all 1814 /// labels that could jump to this branch before removing it, if it is 1815 /// otherwise unreachable. 1816 labels_at_this_branch: SmallVec<[MachLabel; 4]>, 1817 } 1818 1819 impl MachBranch { 1820 fn is_cond(&self) -> bool { 1821 self.inverted.is_some() 1822 } 1823 fn is_uncond(&self) -> bool { 1824 self.inverted.is_none() 1825 } 1826 } 1827 1828 /// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`. 1829 /// 1830 /// Note that `MachBuffer` was primarily written for intra-function references 1831 /// of jumps between basic blocks, but it's also quite usable for entire text 1832 /// sections and resolving references between functions themselves. This 1833 /// builder interprets "blocks" as labeled functions for the purposes of 1834 /// resolving labels internally in the buffer. 1835 pub struct MachTextSectionBuilder<I: VCodeInst> { 1836 buf: MachBuffer<I>, 1837 next_func: usize, 1838 force_veneers: ForceVeneers, 1839 } 1840 1841 impl<I: VCodeInst> MachTextSectionBuilder<I> { 1842 /// Creates a new text section builder which will have `num_funcs` functions 1843 /// pushed into it. 1844 pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> { 1845 let mut buf = MachBuffer::new(); 1846 buf.reserve_labels_for_blocks(num_funcs); 1847 MachTextSectionBuilder { 1848 buf, 1849 next_func: 0, 1850 force_veneers: ForceVeneers::No, 1851 } 1852 } 1853 } 1854 1855 impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> { 1856 fn append( 1857 &mut self, 1858 labeled: bool, 1859 func: &[u8], 1860 align: u32, 1861 ctrl_plane: &mut ControlPlane, 1862 ) -> u64 { 1863 // Conditionally emit an island if it's necessary to resolve jumps 1864 // between functions which are too far away. 1865 let size = func.len() as u32; 1866 if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) { 1867 self.buf 1868 .emit_island_maybe_forced(self.force_veneers, IsLastIsland::No, ctrl_plane); 1869 } 1870 1871 self.buf.align_to(align); 1872 let pos = self.buf.cur_offset(); 1873 if labeled { 1874 self.buf.bind_label( 1875 MachLabel::from_block(BlockIndex::new(self.next_func)), 1876 ctrl_plane, 1877 ); 1878 self.next_func += 1; 1879 } 1880 self.buf.put_data(func); 1881 u64::from(pos) 1882 } 1883 1884 fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool { 1885 crate::trace!( 1886 "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}" 1887 ); 1888 let label = MachLabel::from_block(BlockIndex::new(target)); 1889 let offset = u32::try_from(offset).unwrap(); 1890 match I::LabelUse::from_reloc(reloc, addend) { 1891 Some(label_use) => { 1892 self.buf.use_label_at_offset(offset, label, label_use); 1893 true 1894 } 1895 None => false, 1896 } 1897 } 1898 1899 fn force_veneers(&mut self) { 1900 self.force_veneers = ForceVeneers::Yes; 1901 } 1902 1903 fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> { 1904 // Double-check all functions were pushed. 1905 assert_eq!(self.next_func, self.buf.label_offsets.len()); 1906 1907 // Finish up any veneers, if necessary. 1908 self.buf 1909 .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane); 1910 1911 // We don't need the data any more, so return it to the caller. 1912 mem::take(&mut self.buf.data).into_vec() 1913 } 1914 } 1915 1916 // We use an actual instruction definition to do tests, so we depend on the `arm64` feature here. 1917 #[cfg(all(test, feature = "arm64"))] 1918 mod test { 1919 use cranelift_entity::EntityRef as _; 1920 1921 use super::*; 1922 use crate::ir::UserExternalNameRef; 1923 use crate::isa::aarch64::inst::xreg; 1924 use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst}; 1925 use crate::machinst::{MachInstEmit, MachInstEmitState}; 1926 use crate::settings; 1927 use std::default::Default; 1928 use std::vec::Vec; 1929 1930 fn label(n: u32) -> MachLabel { 1931 MachLabel::from_block(BlockIndex::new(n as usize)) 1932 } 1933 fn target(n: u32) -> BranchTarget { 1934 BranchTarget::Label(label(n)) 1935 } 1936 1937 #[test] 1938 fn test_elide_jump_to_next() { 1939 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 1940 let mut buf = MachBuffer::new(); 1941 let mut state = <Inst as MachInstEmit>::State::default(); 1942 let constants = Default::default(); 1943 1944 buf.reserve_labels_for_blocks(2); 1945 buf.bind_label(label(0), state.ctrl_plane_mut()); 1946 let inst = Inst::Jump { dest: target(1) }; 1947 inst.emit(&[], &mut buf, &info, &mut state); 1948 buf.bind_label(label(1), state.ctrl_plane_mut()); 1949 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 1950 assert_eq!(0, buf.total_size()); 1951 } 1952 1953 #[test] 1954 fn test_elide_trivial_jump_blocks() { 1955 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 1956 let mut buf = MachBuffer::new(); 1957 let mut state = <Inst as MachInstEmit>::State::default(); 1958 let constants = Default::default(); 1959 1960 buf.reserve_labels_for_blocks(4); 1961 1962 buf.bind_label(label(0), state.ctrl_plane_mut()); 1963 let inst = Inst::CondBr { 1964 kind: CondBrKind::NotZero(xreg(0)), 1965 taken: target(1), 1966 not_taken: target(2), 1967 }; 1968 inst.emit(&[], &mut buf, &info, &mut state); 1969 1970 buf.bind_label(label(1), state.ctrl_plane_mut()); 1971 let inst = Inst::Jump { dest: target(3) }; 1972 inst.emit(&[], &mut buf, &info, &mut state); 1973 1974 buf.bind_label(label(2), state.ctrl_plane_mut()); 1975 let inst = Inst::Jump { dest: target(3) }; 1976 inst.emit(&[], &mut buf, &info, &mut state); 1977 1978 buf.bind_label(label(3), state.ctrl_plane_mut()); 1979 1980 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 1981 assert_eq!(0, buf.total_size()); 1982 } 1983 1984 #[test] 1985 fn test_flip_cond() { 1986 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 1987 let mut buf = MachBuffer::new(); 1988 let mut state = <Inst as MachInstEmit>::State::default(); 1989 let constants = Default::default(); 1990 1991 buf.reserve_labels_for_blocks(4); 1992 1993 buf.bind_label(label(0), state.ctrl_plane_mut()); 1994 let inst = Inst::CondBr { 1995 kind: CondBrKind::Zero(xreg(0)), 1996 taken: target(1), 1997 not_taken: target(2), 1998 }; 1999 inst.emit(&[], &mut buf, &info, &mut state); 2000 2001 buf.bind_label(label(1), state.ctrl_plane_mut()); 2002 let inst = Inst::Nop4; 2003 inst.emit(&[], &mut buf, &info, &mut state); 2004 2005 buf.bind_label(label(2), state.ctrl_plane_mut()); 2006 let inst = Inst::Udf { 2007 trap_code: TrapCode::Interrupt, 2008 }; 2009 inst.emit(&[], &mut buf, &info, &mut state); 2010 2011 buf.bind_label(label(3), state.ctrl_plane_mut()); 2012 2013 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2014 2015 let mut buf2 = MachBuffer::new(); 2016 let mut state = Default::default(); 2017 let inst = Inst::TrapIf { 2018 kind: CondBrKind::NotZero(xreg(0)), 2019 trap_code: TrapCode::Interrupt, 2020 }; 2021 inst.emit(&[], &mut buf2, &info, &mut state); 2022 let inst = Inst::Nop4; 2023 inst.emit(&[], &mut buf2, &info, &mut state); 2024 2025 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2026 2027 assert_eq!(buf.data, buf2.data); 2028 } 2029 2030 #[test] 2031 fn test_island() { 2032 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2033 let mut buf = MachBuffer::new(); 2034 let mut state = <Inst as MachInstEmit>::State::default(); 2035 let constants = Default::default(); 2036 2037 buf.reserve_labels_for_blocks(4); 2038 2039 buf.bind_label(label(0), state.ctrl_plane_mut()); 2040 let inst = Inst::CondBr { 2041 kind: CondBrKind::NotZero(xreg(0)), 2042 taken: target(2), 2043 not_taken: target(3), 2044 }; 2045 inst.emit(&[], &mut buf, &info, &mut state); 2046 2047 buf.bind_label(label(1), state.ctrl_plane_mut()); 2048 while buf.cur_offset() < 2000000 { 2049 if buf.island_needed(0) { 2050 buf.emit_island(state.ctrl_plane_mut()); 2051 } 2052 let inst = Inst::Nop4; 2053 inst.emit(&[], &mut buf, &info, &mut state); 2054 } 2055 2056 buf.bind_label(label(2), state.ctrl_plane_mut()); 2057 let inst = Inst::Nop4; 2058 inst.emit(&[], &mut buf, &info, &mut state); 2059 2060 buf.bind_label(label(3), state.ctrl_plane_mut()); 2061 let inst = Inst::Nop4; 2062 inst.emit(&[], &mut buf, &info, &mut state); 2063 2064 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2065 2066 assert_eq!(2000000 + 8, buf.total_size()); 2067 2068 let mut buf2 = MachBuffer::new(); 2069 let mut state = Default::default(); 2070 let inst = Inst::CondBr { 2071 kind: CondBrKind::NotZero(xreg(0)), 2072 2073 // This conditionally taken branch has a 19-bit constant, shifted 2074 // to the left by two, giving us a 21-bit range in total. Half of 2075 // this range positive so the we should be around 1 << 20 bytes 2076 // away for our jump target. 2077 // 2078 // There are two pending fixups by the time we reach this point, 2079 // one for this 19-bit jump and one for the unconditional 26-bit 2080 // jump below. A 19-bit veneer is 4 bytes large and the 26-bit 2081 // veneer is 20 bytes large, which means that pessimistically 2082 // assuming we'll need two veneers we need 24 bytes of extra 2083 // space, meaning that the actual island should come 24-bytes 2084 // before the deadline. 2085 taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20), 2086 2087 // This branch is in-range so no veneers are technically 2088 // be needed; however because we resolve *all* pending 2089 // fixups that cross an island when that island occurs, it 2090 // will have a veneer as well. This veneer comes just 2091 // after the one above. (Note that because the CondBr has 2092 // two instructions, the conditinoal and unconditional, 2093 // this offset is the same, though the veneer is four 2094 // bytes later.) 2095 not_taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20), 2096 }; 2097 inst.emit(&[], &mut buf2, &info, &mut state); 2098 2099 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2100 2101 assert_eq!(&buf.data[0..8], &buf2.data[..]); 2102 } 2103 2104 #[test] 2105 fn test_island_backward() { 2106 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2107 let mut buf = MachBuffer::new(); 2108 let mut state = <Inst as MachInstEmit>::State::default(); 2109 let constants = Default::default(); 2110 2111 buf.reserve_labels_for_blocks(4); 2112 2113 buf.bind_label(label(0), state.ctrl_plane_mut()); 2114 let inst = Inst::Nop4; 2115 inst.emit(&[], &mut buf, &info, &mut state); 2116 2117 buf.bind_label(label(1), state.ctrl_plane_mut()); 2118 let inst = Inst::Nop4; 2119 inst.emit(&[], &mut buf, &info, &mut state); 2120 2121 buf.bind_label(label(2), state.ctrl_plane_mut()); 2122 while buf.cur_offset() < 2000000 { 2123 let inst = Inst::Nop4; 2124 inst.emit(&[], &mut buf, &info, &mut state); 2125 } 2126 2127 buf.bind_label(label(3), state.ctrl_plane_mut()); 2128 let inst = Inst::CondBr { 2129 kind: CondBrKind::NotZero(xreg(0)), 2130 taken: target(0), 2131 not_taken: target(1), 2132 }; 2133 inst.emit(&[], &mut buf, &info, &mut state); 2134 2135 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2136 2137 assert_eq!(2000000 + 12, buf.total_size()); 2138 2139 let mut buf2 = MachBuffer::new(); 2140 let mut state = Default::default(); 2141 let inst = Inst::CondBr { 2142 kind: CondBrKind::NotZero(xreg(0)), 2143 taken: BranchTarget::ResolvedOffset(8), 2144 not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)), 2145 }; 2146 inst.emit(&[], &mut buf2, &info, &mut state); 2147 let inst = Inst::Jump { 2148 dest: BranchTarget::ResolvedOffset(-(2000000 + 8)), 2149 }; 2150 inst.emit(&[], &mut buf2, &info, &mut state); 2151 2152 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut()); 2153 2154 assert_eq!(&buf.data[2000000..], &buf2.data[..]); 2155 } 2156 2157 #[test] 2158 fn test_multiple_redirect() { 2159 // label0: 2160 // cbz x0, label1 2161 // b label2 2162 // label1: 2163 // b label3 2164 // label2: 2165 // nop 2166 // nop 2167 // b label0 2168 // label3: 2169 // b label4 2170 // label4: 2171 // b label5 2172 // label5: 2173 // b label7 2174 // label6: 2175 // nop 2176 // label7: 2177 // ret 2178 // 2179 // -- should become: 2180 // 2181 // label0: 2182 // cbz x0, label7 2183 // label2: 2184 // nop 2185 // nop 2186 // b label0 2187 // label6: 2188 // nop 2189 // label7: 2190 // ret 2191 2192 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2193 let mut buf = MachBuffer::new(); 2194 let mut state = <Inst as MachInstEmit>::State::default(); 2195 let constants = Default::default(); 2196 2197 buf.reserve_labels_for_blocks(8); 2198 2199 buf.bind_label(label(0), state.ctrl_plane_mut()); 2200 let inst = Inst::CondBr { 2201 kind: CondBrKind::Zero(xreg(0)), 2202 taken: target(1), 2203 not_taken: target(2), 2204 }; 2205 inst.emit(&[], &mut buf, &info, &mut state); 2206 2207 buf.bind_label(label(1), state.ctrl_plane_mut()); 2208 let inst = Inst::Jump { dest: target(3) }; 2209 inst.emit(&[], &mut buf, &info, &mut state); 2210 2211 buf.bind_label(label(2), state.ctrl_plane_mut()); 2212 let inst = Inst::Nop4; 2213 inst.emit(&[], &mut buf, &info, &mut state); 2214 inst.emit(&[], &mut buf, &info, &mut state); 2215 let inst = Inst::Jump { dest: target(0) }; 2216 inst.emit(&[], &mut buf, &info, &mut state); 2217 2218 buf.bind_label(label(3), state.ctrl_plane_mut()); 2219 let inst = Inst::Jump { dest: target(4) }; 2220 inst.emit(&[], &mut buf, &info, &mut state); 2221 2222 buf.bind_label(label(4), state.ctrl_plane_mut()); 2223 let inst = Inst::Jump { dest: target(5) }; 2224 inst.emit(&[], &mut buf, &info, &mut state); 2225 2226 buf.bind_label(label(5), state.ctrl_plane_mut()); 2227 let inst = Inst::Jump { dest: target(7) }; 2228 inst.emit(&[], &mut buf, &info, &mut state); 2229 2230 buf.bind_label(label(6), state.ctrl_plane_mut()); 2231 let inst = Inst::Nop4; 2232 inst.emit(&[], &mut buf, &info, &mut state); 2233 2234 buf.bind_label(label(7), state.ctrl_plane_mut()); 2235 let inst = Inst::Ret { 2236 rets: vec![], 2237 stack_bytes_to_pop: 0, 2238 }; 2239 inst.emit(&[], &mut buf, &info, &mut state); 2240 2241 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2242 2243 let golden_data = vec![ 2244 0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14 2245 0x1f, 0x20, 0x03, 0xd5, // nop 2246 0x1f, 0x20, 0x03, 0xd5, // nop 2247 0xfd, 0xff, 0xff, 0x17, // b 0 2248 0x1f, 0x20, 0x03, 0xd5, // nop 2249 0xc0, 0x03, 0x5f, 0xd6, // ret 2250 ]; 2251 2252 assert_eq!(&golden_data[..], &buf.data[..]); 2253 } 2254 2255 #[test] 2256 fn test_handle_branch_cycle() { 2257 // label0: 2258 // b label1 2259 // label1: 2260 // b label2 2261 // label2: 2262 // b label3 2263 // label3: 2264 // b label4 2265 // label4: 2266 // b label1 // note: not label0 (to make it interesting). 2267 // 2268 // -- should become: 2269 // 2270 // label0, label1, ..., label4: 2271 // b label0 2272 let info = EmitInfo::new(settings::Flags::new(settings::builder())); 2273 let mut buf = MachBuffer::new(); 2274 let mut state = <Inst as MachInstEmit>::State::default(); 2275 let constants = Default::default(); 2276 2277 buf.reserve_labels_for_blocks(5); 2278 2279 buf.bind_label(label(0), state.ctrl_plane_mut()); 2280 let inst = Inst::Jump { dest: target(1) }; 2281 inst.emit(&[], &mut buf, &info, &mut state); 2282 2283 buf.bind_label(label(1), state.ctrl_plane_mut()); 2284 let inst = Inst::Jump { dest: target(2) }; 2285 inst.emit(&[], &mut buf, &info, &mut state); 2286 2287 buf.bind_label(label(2), state.ctrl_plane_mut()); 2288 let inst = Inst::Jump { dest: target(3) }; 2289 inst.emit(&[], &mut buf, &info, &mut state); 2290 2291 buf.bind_label(label(3), state.ctrl_plane_mut()); 2292 let inst = Inst::Jump { dest: target(4) }; 2293 inst.emit(&[], &mut buf, &info, &mut state); 2294 2295 buf.bind_label(label(4), state.ctrl_plane_mut()); 2296 let inst = Inst::Jump { dest: target(1) }; 2297 inst.emit(&[], &mut buf, &info, &mut state); 2298 2299 let buf = buf.finish(&constants, state.ctrl_plane_mut()); 2300 2301 let golden_data = vec![ 2302 0x00, 0x00, 0x00, 0x14, // b 0 2303 ]; 2304 2305 assert_eq!(&golden_data[..], &buf.data[..]); 2306 } 2307 2308 #[test] 2309 fn metadata_records() { 2310 let mut buf = MachBuffer::<Inst>::new(); 2311 let ctrl_plane = &mut Default::default(); 2312 let constants = Default::default(); 2313 2314 buf.reserve_labels_for_blocks(1); 2315 2316 buf.bind_label(label(0), ctrl_plane); 2317 buf.put1(1); 2318 buf.add_trap(TrapCode::HeapOutOfBounds); 2319 buf.put1(2); 2320 buf.add_trap(TrapCode::IntegerOverflow); 2321 buf.add_trap(TrapCode::IntegerDivisionByZero); 2322 buf.add_call_site(Opcode::Call); 2323 buf.add_reloc( 2324 Reloc::Abs4, 2325 &ExternalName::User(UserExternalNameRef::new(0)), 2326 0, 2327 ); 2328 buf.put1(3); 2329 buf.add_reloc( 2330 Reloc::Abs8, 2331 &ExternalName::User(UserExternalNameRef::new(1)), 2332 1, 2333 ); 2334 buf.put1(4); 2335 2336 let buf = buf.finish(&constants, ctrl_plane); 2337 2338 assert_eq!(buf.data(), &[1, 2, 3, 4]); 2339 assert_eq!( 2340 buf.traps() 2341 .iter() 2342 .map(|trap| (trap.offset, trap.code)) 2343 .collect::<Vec<_>>(), 2344 vec![ 2345 (1, TrapCode::HeapOutOfBounds), 2346 (2, TrapCode::IntegerOverflow), 2347 (2, TrapCode::IntegerDivisionByZero) 2348 ] 2349 ); 2350 assert_eq!( 2351 buf.call_sites() 2352 .iter() 2353 .map(|call_site| (call_site.ret_addr, call_site.opcode)) 2354 .collect::<Vec<_>>(), 2355 vec![(2, Opcode::Call)] 2356 ); 2357 assert_eq!( 2358 buf.relocs() 2359 .iter() 2360 .map(|reloc| (reloc.offset, reloc.kind)) 2361 .collect::<Vec<_>>(), 2362 vec![(2, Reloc::Abs4), (3, Reloc::Abs8)] 2363 ); 2364 } 2365 } 2366