1 //! In-memory representation of compiled machine code, with labels and fixups to
2 //! refer to those labels. Handles constant-pool island insertion and also
3 //! veneer insertion for out-of-range jumps.
4 //!
5 //! This code exists to solve three problems:
6 //!
7 //! - Branch targets for forward branches are not known until later, when we
8 //!   emit code in a single pass through the instruction structs.
9 //!
10 //! - On many architectures, address references or offsets have limited range.
11 //!   For example, on AArch64, conditional branches can only target code +/- 1MB
12 //!   from the branch itself.
13 //!
14 //! - The lowering of control flow from the CFG-with-edges produced by
15 //!   [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
16 //!   edge blocks when the register allocator does not need to insert any
17 //!   spills/reloads/moves in edge blocks, results in many suboptimal branch
18 //!   patterns. The lowering also pays no attention to block order, and so
19 //!   two-target conditional forms (cond-br followed by uncond-br) can often by
20 //!   avoided because one of the targets is the fallthrough. There are several
21 //!   cases here where we can simplify to use fewer branches.
22 //!
23 //! This "buffer" implements a single-pass code emission strategy (with a later
24 //! "fixup" pass, but only through recorded fixups, not all instructions). The
25 //! basic idea is:
26 //!
27 //! - Emit branches as they are, including two-target (cond/uncond) compound
28 //!   forms, but with zero offsets and optimistically assuming the target will be
29 //!   in range. Record the "fixup" for later. Targets are denoted instead by
30 //!   symbolic "labels" that are then bound to certain offsets in the buffer as
31 //!   we emit code. (Nominally, there is a label at the start of every basic
32 //!   block.)
33 //!
34 //! - As we do this, track the offset in the buffer at which the first label
35 //!   reference "goes out of range". We call this the "deadline". If we reach the
36 //!   deadline and we still have not bound the label to which an unresolved branch
37 //!   refers, we have a problem!
38 //!
39 //! - To solve this problem, we emit "islands" full of "veneers". An island is
40 //!   simply a chunk of code inserted in the middle of the code actually produced
41 //!   by the emitter (e.g., vcode iterating over instruction structs). The emitter
42 //!   has some awareness of this: it either asks for an island between blocks, so
43 //!   it is not accidentally executed, or else it emits a branch around the island
44 //!   when all other options fail (see `Inst::EmitIsland` meta-instruction).
45 //!
46 //! - A "veneer" is an instruction (or sequence of instructions) in an "island"
47 //!   that implements a longer-range reference to a label. The idea is that, for
48 //!   example, a branch with a limited range can branch to a "veneer" instead,
49 //!   which is simply a branch in a form that can use a longer-range reference. On
50 //!   AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
51 //!   can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
52 //!   conditional branch's label reference can be fixed up with a "veneer" to
53 //!   achieve a longer range.
54 //!
55 //! - To implement all of this, we require the backend to provide a `LabelUse`
56 //!   type that implements a trait. This is nominally an enum that records one of
57 //!   several kinds of references to an offset in code -- basically, a relocation
58 //!   type -- and will usually correspond to different instruction formats. The
59 //!   `LabelUse` implementation specifies the maximum range, how to patch in the
60 //!   actual label location when known, and how to generate a veneer to extend the
61 //!   range.
62 //!
63 //! That satisfies label references, but we still may have suboptimal branch
64 //! patterns. To clean up the branches, we do a simple "peephole"-style
65 //! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
66 //! informs the buffer of branches in the code and, in the case of conditionals,
67 //! the code that would have been emitted to invert this branch's condition. We
68 //! track the "latest branches": these are branches that are contiguous up to
69 //! the current offset. (If any code is emitted after a branch, that branch or
70 //! run of contiguous branches is no longer "latest".) The latest branches are
71 //! those that we can edit by simply truncating the buffer and doing something
72 //! else instead.
73 //!
74 //! To optimize branches, we implement several simple rules, and try to apply
75 //! them to the "latest branches" when possible:
76 //!
77 //! - A branch with a label target, when that label is bound to the ending
78 //!   offset of the branch (the fallthrough location), can be removed altogether,
79 //!   because the branch would have no effect).
80 //!
81 //! - An unconditional branch that starts at a label location, and branches to
82 //!   another label, results in a "label alias": all references to the label bound
83 //!   *to* this branch instruction are instead resolved to the *target* of the
84 //!   branch instruction. This effectively removes empty blocks that just
85 //!   unconditionally branch to the next block. We call this "branch threading".
86 //!
87 //! - A conditional followed by an unconditional, when the conditional branches
88 //!   to the unconditional's fallthrough, results in (i) the truncation of the
89 //!   unconditional, (ii) the inversion of the condition's condition, and (iii)
90 //!   replacement of the conditional's target (using the original target of the
91 //!   unconditional). This is a fancy way of saying "we can flip a two-target
92 //!   conditional branch's taken/not-taken targets if it works better with our
93 //!   fallthrough". To make this work, the emitter actually gives the buffer
94 //!   *both* forms of every conditional branch: the true form is emitted into the
95 //!   buffer, and the "inverted" machine-code bytes are provided as part of the
96 //!   branch-fixup metadata.
97 //!
98 //! - An unconditional B preceded by another unconditional P, when B's label(s) have
99 //!   been redirected to target(B), can be removed entirely. This is an extension
100 //!   of the branch-threading optimization, and is valid because if we know there
101 //!   will be no fallthrough into this branch instruction (the prior instruction
102 //!   is an unconditional jump), and if we know we have successfully redirected
103 //!   all labels, then this branch instruction is unreachable. Note that this
104 //!   works because the redirection happens before the label is ever resolved
105 //!   (fixups happen at island emission time, at which point latest-branches are
106 //!   cleared, or at the end of emission), so we are sure to catch and redirect
107 //!   all possible paths to this instruction.
108 //!
109 //! # Branch-optimization Correctness
110 //!
111 //! The branch-optimization mechanism depends on a few data structures with
112 //! invariants, which are always held outside the scope of top-level public
113 //! methods:
114 //!
115 //! - The latest-branches list. Each entry describes a span of the buffer
116 //!   (start/end offsets), the label target, the corresponding fixup-list entry
117 //!   index, and the bytes (must be the same length) for the inverted form, if
118 //!   conditional. The list of labels that are bound to the start-offset of this
119 //!   branch is *complete* (if any label has a resolved offset equal to `start`
120 //!   and is not an alias, it must appear in this list) and *precise* (no label
121 //!   in this list can be bound to another offset). No label in this list should
122 //!   be an alias.  No two branch ranges can overlap, and branches are in
123 //!   ascending-offset order.
124 //!
125 //! - The labels-at-tail list. This contains all MachLabels that have been bound
126 //!   to (whose resolved offsets are equal to) the tail offset of the buffer.
127 //!   No label in this list should be an alias.
128 //!
129 //! - The label_offsets array, containing the bound offset of a label or
130 //!   UNKNOWN. No label can be bound at an offset greater than the current
131 //!   buffer tail.
132 //!
133 //! - The label_aliases array, containing another label to which a label is
134 //!   bound or UNKNOWN. A label's resolved offset is the resolved offset
135 //!   of the label it is aliased to, if this is set.
136 //!
137 //! We argue below, at each method, how the invariants in these data structures
138 //! are maintained (grep for "Post-invariant").
139 //!
140 //! Given these invariants, we argue why each optimization preserves execution
141 //! semantics below (grep for "Preserves execution semantics").
142 //!
143 //! # Avoiding Quadratic Behavior
144 //!
145 //! There are two cases where we've had to take some care to avoid
146 //! quadratic worst-case behavior:
147 //!
148 //! - The "labels at this branch" list can grow unboundedly if the
149 //!   code generator binds many labels at one location. If the count
150 //!   gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
151 //!   simply abort an optimization early in a way that is always correct
152 //!   but is conservative.
153 //!
154 //! - The fixup list can interact with island emission to create
155 //!   "quadratic island behvior". In a little more detail, one can hit
156 //!   this behavior by having some pending fixups (forward label
157 //!   references) with long-range label-use kinds, and some others
158 //!   with shorter-range references that nonetheless still are pending
159 //!   long enough to trigger island generation. In such a case, we
160 //!   process the fixup list, generate veneers to extend some forward
161 //!   references' ranges, but leave the other (longer-range) ones
162 //!   alone. The way this was implemented put them back on a list and
163 //!   resulted in quadratic behavior.
164 //!
165 //!   To avoid this, we could use a better data structure that allows
166 //!   us to query for fixups with deadlines "coming soon" and generate
167 //!   veneers for only those fixups. However, there is some
168 //!   interaction with the branch peephole optimizations: the
169 //!   invariant there is that branches in the "most recent branches
170 //!   contiguous with end of buffer" list have corresponding fixups in
171 //!   order (so that when we chomp the branch, we can chomp its fixup
172 //!   too).
173 //!
174 //!   So instead, when we generate an island, for now we create
175 //!   veneers for *all* pending fixups, then if upgraded to a kind
176 //!   that no longer supports veneers (is at "max range"), kick the
177 //!   fixups off to a list that is *not* processed at islands except
178 //!   for one last pass after emission. This allows us to skip the
179 //!   work and avoids the quadratic behvior. We expect that this is
180 //!   fine-ish for now: islands are relatively rare, and if they do
181 //!   happen and generate unnecessary veneers (as will now happen for
182 //!   the case above) we'll only get one unnecessary veneer per
183 //!   branch (then they are at max range already).
184 //!
185 //!   Longer-term, we could use a data structure that allows querying
186 //!   by deadline, as long as we can properly chomp just-added fixups
187 //!   when chomping branches.
188 
189 use crate::binemit::{Addend, CodeOffset, Reloc, StackMap};
190 use crate::ir::{ExternalName, Opcode, RelSourceLoc, SourceLoc, TrapCode};
191 use crate::isa::unwind::UnwindInst;
192 use crate::machinst::{
193     BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst,
194 };
195 use crate::timing;
196 use crate::trace;
197 use cranelift_control::ControlPlane;
198 use cranelift_entity::{entity_impl, PrimaryMap};
199 use smallvec::{smallvec, SmallVec};
200 use std::convert::TryFrom;
201 use std::mem;
202 use std::string::String;
203 use std::vec::Vec;
204 
205 #[cfg(feature = "enable-serde")]
206 use serde::{Deserialize, Serialize};
207 
208 #[cfg(feature = "enable-serde")]
209 pub trait CompilePhase {
210     type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
211     type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
212 }
213 
214 #[cfg(not(feature = "enable-serde"))]
215 pub trait CompilePhase {
216     type MachSrcLocType: core::fmt::Debug + PartialEq + Clone;
217     type SourceLocType: core::fmt::Debug + PartialEq + Clone;
218 }
219 
220 /// Status of a compiled artifact that needs patching before being used.
221 #[derive(Clone, Debug, PartialEq)]
222 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
223 pub struct Stencil;
224 
225 /// Status of a compiled artifact ready to use.
226 #[derive(Clone, Debug, PartialEq)]
227 pub struct Final;
228 
229 impl CompilePhase for Stencil {
230     type MachSrcLocType = MachSrcLoc<Stencil>;
231     type SourceLocType = RelSourceLoc;
232 }
233 
234 impl CompilePhase for Final {
235     type MachSrcLocType = MachSrcLoc<Final>;
236     type SourceLocType = SourceLoc;
237 }
238 
239 #[derive(Clone, Copy, Debug, PartialEq, Eq)]
240 enum ForceVeneers {
241     Yes,
242     No,
243 }
244 
245 #[derive(Clone, Copy, Debug, PartialEq, Eq)]
246 enum IsLastIsland {
247     Yes,
248     No,
249 }
250 
251 /// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
252 /// in bulk.
253 ///
254 /// This struct uses `SmallVec`s to support small-ish function bodies without
255 /// any heap allocation. As such, it will be several kilobytes large. This is
256 /// likely fine as long as it is stack-allocated for function emission then
257 /// thrown away; but beware if many buffer objects are retained persistently.
258 pub struct MachBuffer<I: VCodeInst> {
259     /// The buffer contents, as raw bytes.
260     data: SmallVec<[u8; 1024]>,
261     /// Any relocations referring to this code. Note that only *external*
262     /// relocations are tracked here; references to labels within the buffer are
263     /// resolved before emission.
264     relocs: SmallVec<[MachReloc; 16]>,
265     /// Any trap records referring to this code.
266     traps: SmallVec<[MachTrap; 16]>,
267     /// Any call site records referring to this code.
268     call_sites: SmallVec<[MachCallSite; 16]>,
269     /// Any source location mappings referring to this code.
270     srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>,
271     /// Any stack maps referring to this code.
272     stack_maps: SmallVec<[MachStackMap; 8]>,
273     /// Any unwind info at a given location.
274     unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
275     /// The current source location in progress (after `start_srcloc()` and
276     /// before `end_srcloc()`).  This is a (start_offset, src_loc) tuple.
277     cur_srcloc: Option<(CodeOffset, RelSourceLoc)>,
278     /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
279     label_offsets: SmallVec<[CodeOffset; 16]>,
280     /// Label aliases: when one label points to an unconditional jump, and that
281     /// jump points to another label, we can redirect references to the first
282     /// label immediately to the second.
283     ///
284     /// Invariant: we don't have label-alias cycles. We ensure this by,
285     /// before setting label A to alias label B, resolving B's alias
286     /// target (iteratively until a non-aliased label); if B is already
287     /// aliased to A, then we cannot alias A back to B.
288     label_aliases: SmallVec<[MachLabel; 16]>,
289     /// Constants that must be emitted at some point.
290     pending_constants: SmallVec<[VCodeConstant; 16]>,
291     /// Traps that must be emitted at some point.
292     pending_traps: SmallVec<[MachLabelTrap; 16]>,
293     /// Fixups that must be performed after all code is emitted.
294     fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
295     /// Fixups whose labels are at maximum range already: these need
296     /// not be considered in island emission until we're done
297     /// emitting.
298     fixup_records_max_range: SmallVec<[MachLabelFixup<I>; 16]>,
299     /// Current deadline at which all constants are flushed and all code labels
300     /// are extended by emitting long-range jumps in an island. This flush
301     /// should be rare (e.g., on AArch64, the shortest-range PC-rel references
302     /// are +/- 1MB for conditional jumps and load-literal instructions), so
303     /// it's acceptable to track a minimum and flush-all rather than doing more
304     /// detailed "current minimum" / sort-by-deadline trickery.
305     island_deadline: CodeOffset,
306     /// How many bytes are needed in the worst case for an island, given all
307     /// pending constants and fixups.
308     island_worst_case_size: CodeOffset,
309     /// Latest branches, to facilitate in-place editing for better fallthrough
310     /// behavior and empty-block removal.
311     latest_branches: SmallVec<[MachBranch; 4]>,
312     /// All labels at the current offset (emission tail). This is lazily
313     /// cleared: it is actually accurate as long as the current offset is
314     /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
315     /// be considered as empty.
316     ///
317     /// For correctness, this *must* be complete (i.e., the vector must contain
318     /// all labels whose offsets are resolved to the current tail), because we
319     /// rely on it to update labels when we truncate branches.
320     labels_at_tail: SmallVec<[MachLabel; 4]>,
321     /// The last offset at which `labels_at_tail` is valid. It is conceptually
322     /// always describing the tail of the buffer, but we do not clear
323     /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
324     /// when the offset has grown past this (`labels_at_tail_off`) point.
325     /// Always <= `cur_offset()`.
326     labels_at_tail_off: CodeOffset,
327     /// Metadata about all constants that this function has access to.
328     ///
329     /// This records the size/alignment of all constants (not the actual data)
330     /// along with the last available label generated for the constant. This map
331     /// is consulted when constants are referred to and the label assigned to a
332     /// constant may change over time as well.
333     constants: PrimaryMap<VCodeConstant, MachBufferConstant>,
334     /// All recorded usages of constants as pairs of the constant and where the
335     /// constant needs to be placed within `self.data`. Note that the same
336     /// constant may appear in this array multiple times if it was emitted
337     /// multiple times.
338     used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>,
339 }
340 
341 impl MachBufferFinalized<Stencil> {
342     /// Get a finalized machine buffer by applying the function's base source location.
343     pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> {
344         MachBufferFinalized {
345             data: self.data,
346             relocs: self.relocs,
347             traps: self.traps,
348             call_sites: self.call_sites,
349             srclocs: self
350                 .srclocs
351                 .into_iter()
352                 .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc))
353                 .collect(),
354             stack_maps: self.stack_maps,
355             unwind_info: self.unwind_info,
356             alignment: self.alignment,
357         }
358     }
359 }
360 
361 /// A `MachBuffer` once emission is completed: holds generated code and records,
362 /// without fixups. This allows the type to be independent of the backend.
363 #[derive(PartialEq, Debug, Clone)]
364 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
365 pub struct MachBufferFinalized<T: CompilePhase> {
366     /// The buffer contents, as raw bytes.
367     pub(crate) data: SmallVec<[u8; 1024]>,
368     /// Any relocations referring to this code. Note that only *external*
369     /// relocations are tracked here; references to labels within the buffer are
370     /// resolved before emission.
371     pub(crate) relocs: SmallVec<[MachReloc; 16]>,
372     /// Any trap records referring to this code.
373     pub(crate) traps: SmallVec<[MachTrap; 16]>,
374     /// Any call site records referring to this code.
375     pub(crate) call_sites: SmallVec<[MachCallSite; 16]>,
376     /// Any source location mappings referring to this code.
377     pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>,
378     /// Any stack maps referring to this code.
379     pub(crate) stack_maps: SmallVec<[MachStackMap; 8]>,
380     /// Any unwind info at a given location.
381     pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
382     /// The requireed alignment of this buffer
383     pub alignment: u32,
384 }
385 
386 const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
387 const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);
388 
389 /// Threshold on max length of `labels_at_this_branch` list to avoid
390 /// unbounded quadratic behavior (see comment below at use-site).
391 const LABEL_LIST_THRESHOLD: usize = 100;
392 
393 /// A label refers to some offset in a `MachBuffer`. It may not be resolved at
394 /// the point at which it is used by emitted code; the buffer records "fixups"
395 /// for references to the label, and will come back and patch the code
396 /// appropriately when the label's location is eventually known.
397 #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
398 pub struct MachLabel(u32);
399 entity_impl!(MachLabel);
400 
401 impl MachLabel {
402     /// Get a label for a block. (The first N MachLabels are always reseved for
403     /// the N blocks in the vcode.)
404     pub fn from_block(bindex: BlockIndex) -> MachLabel {
405         MachLabel(bindex.index() as u32)
406     }
407 
408     /// Get the numeric label index.
409     pub fn get(self) -> u32 {
410         self.0
411     }
412 
413     /// Creates a string representing this label, for convenience.
414     pub fn to_string(&self) -> String {
415         format!("label{}", self.0)
416     }
417 }
418 
419 impl Default for MachLabel {
420     fn default() -> Self {
421         UNKNOWN_LABEL
422     }
423 }
424 
425 /// A stack map extent, when creating a stack map.
426 pub enum StackMapExtent {
427     /// The stack map starts at this instruction, and ends after the number of upcoming bytes
428     /// (note: this is a code offset diff).
429     UpcomingBytes(CodeOffset),
430 
431     /// The stack map started at the given offset and ends at the current one. This helps
432     /// architectures where the instruction size has not a fixed length.
433     StartedAtOffset(CodeOffset),
434 }
435 
436 impl<I: VCodeInst> MachBuffer<I> {
437     /// Create a new section, known to start at `start_offset` and with a size limited to
438     /// `length_limit`.
439     pub fn new() -> MachBuffer<I> {
440         MachBuffer {
441             data: SmallVec::new(),
442             relocs: SmallVec::new(),
443             traps: SmallVec::new(),
444             call_sites: SmallVec::new(),
445             srclocs: SmallVec::new(),
446             stack_maps: SmallVec::new(),
447             unwind_info: SmallVec::new(),
448             cur_srcloc: None,
449             label_offsets: SmallVec::new(),
450             label_aliases: SmallVec::new(),
451             pending_constants: SmallVec::new(),
452             pending_traps: SmallVec::new(),
453             fixup_records: SmallVec::new(),
454             fixup_records_max_range: SmallVec::new(),
455             island_deadline: UNKNOWN_LABEL_OFFSET,
456             island_worst_case_size: 0,
457             latest_branches: SmallVec::new(),
458             labels_at_tail: SmallVec::new(),
459             labels_at_tail_off: 0,
460             constants: Default::default(),
461             used_constants: Default::default(),
462         }
463     }
464 
465     /// Current offset from start of buffer.
466     pub fn cur_offset(&self) -> CodeOffset {
467         self.data.len() as CodeOffset
468     }
469 
470     /// Add a byte.
471     pub fn put1(&mut self, value: u8) {
472         self.data.push(value);
473 
474         // Post-invariant: conceptual-labels_at_tail contains a complete and
475         // precise list of labels bound at `cur_offset()`. We have advanced
476         // `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
477         // before, it is not anymore (and it cannot become equal, because
478         // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
479         // conceptually empty (even though it is only lazily cleared). No labels
480         // can be bound at this new offset (by invariant on `label_offsets`).
481         // Hence the invariant holds.
482     }
483 
484     /// Add 2 bytes.
485     pub fn put2(&mut self, value: u16) {
486         let bytes = value.to_le_bytes();
487         self.data.extend_from_slice(&bytes[..]);
488 
489         // Post-invariant: as for `put1()`.
490     }
491 
492     /// Add 4 bytes.
493     pub fn put4(&mut self, value: u32) {
494         let bytes = value.to_le_bytes();
495         self.data.extend_from_slice(&bytes[..]);
496 
497         // Post-invariant: as for `put1()`.
498     }
499 
500     /// Add 8 bytes.
501     pub fn put8(&mut self, value: u64) {
502         let bytes = value.to_le_bytes();
503         self.data.extend_from_slice(&bytes[..]);
504 
505         // Post-invariant: as for `put1()`.
506     }
507 
508     /// Add a slice of bytes.
509     pub fn put_data(&mut self, data: &[u8]) {
510         self.data.extend_from_slice(data);
511 
512         // Post-invariant: as for `put1()`.
513     }
514 
515     /// Reserve appended space and return a mutable slice referring to it.
516     pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
517         let off = self.data.len();
518         let new_len = self.data.len() + len;
519         self.data.resize(new_len, 0);
520         &mut self.data[off..]
521 
522         // Post-invariant: as for `put1()`.
523     }
524 
525     /// Align up to the given alignment.
526     pub fn align_to(&mut self, align_to: CodeOffset) {
527         trace!("MachBuffer: align to {}", align_to);
528         assert!(
529             align_to.is_power_of_two(),
530             "{} is not a power of two",
531             align_to
532         );
533         while self.cur_offset() & (align_to - 1) != 0 {
534             self.put1(0);
535         }
536 
537         // Post-invariant: as for `put1()`.
538     }
539 
540     /// Allocate a `Label` to refer to some offset. May not be bound to a fixed
541     /// offset yet.
542     pub fn get_label(&mut self) -> MachLabel {
543         let l = self.label_offsets.len() as u32;
544         self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
545         self.label_aliases.push(UNKNOWN_LABEL);
546         trace!("MachBuffer: new label -> {:?}", MachLabel(l));
547         MachLabel(l)
548 
549         // Post-invariant: the only mutation is to add a new label; it has no
550         // bound offset yet, so it trivially satisfies all invariants.
551     }
552 
553     /// Reserve the first N MachLabels for blocks.
554     pub fn reserve_labels_for_blocks(&mut self, blocks: usize) {
555         trace!("MachBuffer: first {} labels are for blocks", blocks);
556         debug_assert!(self.label_offsets.is_empty());
557         self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET);
558         self.label_aliases.resize(blocks, UNKNOWN_LABEL);
559 
560         // Post-invariant: as for `get_label()`.
561     }
562 
563     /// Registers metadata in this `MachBuffer` about the `constants` provided.
564     ///
565     /// This will record the size/alignment of all constants which will prepare
566     /// them for emission later on.
567     pub fn register_constants(&mut self, constants: &VCodeConstants) {
568         for (c, val) in constants.iter() {
569             let c2 = self.constants.push(MachBufferConstant {
570                 upcoming_label: None,
571                 align: val.alignment(),
572                 size: val.as_slice().len(),
573             });
574             assert_eq!(c, c2);
575         }
576     }
577 
578     /// Completes constant emission by iterating over `self.used_constants` and
579     /// filling in the "holes" with the constant values provided by `constants`.
580     ///
581     /// Returns the alignment required for this entire buffer. Alignment starts
582     /// at the ISA's minimum function alignment and can be increased due to
583     /// constant requirements.
584     fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 {
585         let mut alignment = I::function_alignment().minimum;
586         for (constant, offset) in mem::take(&mut self.used_constants) {
587             let constant = constants.get(constant);
588             let data = constant.as_slice();
589             self.data[offset as usize..][..data.len()].copy_from_slice(data);
590             alignment = constant.alignment().max(alignment);
591         }
592         alignment
593     }
594 
595     /// Returns a label that can be used to refer to the `constant` provided.
596     ///
597     /// This will automatically defer a new constant to be emitted for
598     /// `constant` if it has not been previously emitted. Note that this
599     /// function may return a different label for the same constant at
600     /// different points in time. The label is valid to use only from the
601     /// current location; the MachBuffer takes care to emit the same constant
602     /// multiple times if needed so the constant is always in range.
603     pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel {
604         let MachBufferConstant {
605             align,
606             size,
607             upcoming_label,
608         } = self.constants[constant];
609         if let Some(label) = upcoming_label {
610             return label;
611         }
612 
613         let label = self.get_label();
614         trace!(
615             "defer constant: eventually emit {size} bytes aligned \
616              to {align} at label {label:?}",
617         );
618         self.update_deadline(size, u32::MAX);
619         self.pending_constants.push(constant);
620         self.constants[constant].upcoming_label = Some(label);
621         label
622     }
623 
624     /// Bind a label to the current offset. A label can only be bound once.
625     pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) {
626         trace!(
627             "MachBuffer: bind label {:?} at offset {}",
628             label,
629             self.cur_offset()
630         );
631         debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
632         debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
633         let offset = self.cur_offset();
634         self.label_offsets[label.0 as usize] = offset;
635         self.lazily_clear_labels_at_tail();
636         self.labels_at_tail.push(label);
637 
638         // Invariants hold: bound offset of label is <= cur_offset (in fact it
639         // is equal). If the `labels_at_tail` list was complete and precise
640         // before, it is still, because we have bound this label to the current
641         // offset and added it to the list (which contains all labels at the
642         // current offset).
643 
644         self.optimize_branches(ctrl_plane);
645 
646         // Post-invariant: by `optimize_branches()` (see argument there).
647     }
648 
649     /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
650     /// offset that it applies to.
651     fn lazily_clear_labels_at_tail(&mut self) {
652         let offset = self.cur_offset();
653         if offset > self.labels_at_tail_off {
654             self.labels_at_tail_off = offset;
655             self.labels_at_tail.clear();
656         }
657 
658         // Post-invariant: either labels_at_tail_off was at cur_offset, and
659         // state is untouched, or was less than cur_offset, in which case the
660         // labels_at_tail list was conceptually empty, and is now actually
661         // empty.
662     }
663 
664     /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
665     pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset {
666         let mut iters = 0;
667         while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL {
668             label = self.label_aliases[label.0 as usize];
669             // To protect against an infinite loop (despite our assurances to
670             // ourselves that the invariants make this impossible), assert out
671             // after 1M iterations. The number of basic blocks is limited
672             // in most contexts anyway so this should be impossible to hit with
673             // a legitimate input.
674             iters += 1;
675             assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
676         }
677         self.label_offsets[label.0 as usize]
678 
679         // Post-invariant: no mutations.
680     }
681 
682     /// Emit a reference to the given label with the given reference type (i.e.,
683     /// branch-instruction format) at the current offset.  This is like a
684     /// relocation, but handled internally.
685     ///
686     /// This can be called before the branch is actually emitted; fixups will
687     /// not happen until an island is emitted or the buffer is finished.
688     pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
689         trace!(
690             "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
691             offset,
692             label,
693             kind
694         );
695 
696         // Add the fixup, and update the worst-case island size based on a
697         // veneer for this label use.
698         self.fixup_records.push(MachLabelFixup {
699             label,
700             offset,
701             kind,
702         });
703         if kind.supports_veneer() {
704             self.island_worst_case_size += kind.veneer_size();
705             self.island_worst_case_size &= !(I::LabelUse::ALIGN - 1);
706         }
707         let deadline = offset.saturating_add(kind.max_pos_range());
708         if deadline < self.island_deadline {
709             self.island_deadline = deadline;
710         }
711 
712         // Post-invariant: no mutations to branches/labels data structures.
713     }
714 
715     /// Inform the buffer of an unconditional branch at the given offset,
716     /// targetting the given label. May be used to optimize branches.
717     /// The last added label-use must correspond to this branch.
718     /// This must be called when the current offset is equal to `start`; i.e.,
719     /// before actually emitting the branch. This implies that for a branch that
720     /// uses a label and is eligible for optimizations by the MachBuffer, the
721     /// proper sequence is:
722     ///
723     /// - Call `use_label_at_offset()` to emit the fixup record.
724     /// - Call `add_uncond_branch()` to make note of the branch.
725     /// - Emit the bytes for the branch's machine code.
726     ///
727     /// Additional requirement: no labels may be bound between `start` and `end`
728     /// (exclusive on both ends).
729     pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
730         assert!(self.cur_offset() == start);
731         debug_assert!(end > start);
732         assert!(!self.fixup_records.is_empty());
733         let fixup = self.fixup_records.len() - 1;
734         self.lazily_clear_labels_at_tail();
735         self.latest_branches.push(MachBranch {
736             start,
737             end,
738             target,
739             fixup,
740             inverted: None,
741             labels_at_this_branch: self.labels_at_tail.clone(),
742         });
743 
744         // Post-invariant: we asserted branch start is current tail; the list of
745         // labels at branch is cloned from list of labels at current tail.
746     }
747 
748     /// Inform the buffer of a conditional branch at the given offset,
749     /// targetting the given label. May be used to optimize branches.
750     /// The last added label-use must correspond to this branch.
751     ///
752     /// Additional requirement: no labels may be bound between `start` and `end`
753     /// (exclusive on both ends).
754     pub fn add_cond_branch(
755         &mut self,
756         start: CodeOffset,
757         end: CodeOffset,
758         target: MachLabel,
759         inverted: &[u8],
760     ) {
761         assert!(self.cur_offset() == start);
762         debug_assert!(end > start);
763         assert!(!self.fixup_records.is_empty());
764         debug_assert!(inverted.len() == (end - start) as usize);
765         let fixup = self.fixup_records.len() - 1;
766         let inverted = Some(SmallVec::from(inverted));
767         self.lazily_clear_labels_at_tail();
768         self.latest_branches.push(MachBranch {
769             start,
770             end,
771             target,
772             fixup,
773             inverted,
774             labels_at_this_branch: self.labels_at_tail.clone(),
775         });
776 
777         // Post-invariant: we asserted branch start is current tail; labels at
778         // branch list is cloned from list of labels at current tail.
779     }
780 
781     fn truncate_last_branch(&mut self) {
782         self.lazily_clear_labels_at_tail();
783         // Invariants hold at this point.
784 
785         let b = self.latest_branches.pop().unwrap();
786         assert!(b.end == self.cur_offset());
787 
788         // State:
789         //    [PRE CODE]
790         //  Offset b.start, b.labels_at_this_branch:
791         //    [BRANCH CODE]
792         //  cur_off, self.labels_at_tail -->
793         //    (end of buffer)
794         self.data.truncate(b.start as usize);
795         self.fixup_records.truncate(b.fixup);
796         while let Some(last_srcloc) = self.srclocs.last_mut() {
797             if last_srcloc.end <= b.start {
798                 break;
799             }
800             if last_srcloc.start < b.start {
801                 last_srcloc.end = b.start;
802                 break;
803             }
804             self.srclocs.pop();
805         }
806         // State:
807         //    [PRE CODE]
808         //  cur_off, Offset b.start, b.labels_at_this_branch:
809         //    (end of buffer)
810         //
811         //  self.labels_at_tail -->  (past end of buffer)
812         let cur_off = self.cur_offset();
813         self.labels_at_tail_off = cur_off;
814         // State:
815         //    [PRE CODE]
816         //  cur_off, Offset b.start, b.labels_at_this_branch,
817         //  self.labels_at_tail:
818         //    (end of buffer)
819         //
820         // resolve_label_offset(l) for l in labels_at_tail:
821         //    (past end of buffer)
822 
823         trace!(
824             "truncate_last_branch: truncated {:?}; off now {}",
825             b,
826             cur_off
827         );
828 
829         // Fix up resolved label offsets for labels at tail.
830         for &l in &self.labels_at_tail {
831             self.label_offsets[l.0 as usize] = cur_off;
832         }
833         // Old labels_at_this_branch are now at cur_off.
834         self.labels_at_tail
835             .extend(b.labels_at_this_branch.into_iter());
836 
837         // Post-invariant: this operation is defined to truncate the buffer,
838         // which moves cur_off backward, and to move labels at the end of the
839         // buffer back to the start-of-branch offset.
840         //
841         // latest_branches satisfies all invariants:
842         // - it has no branches past the end of the buffer (branches are in
843         //   order, we removed the last one, and we truncated the buffer to just
844         //   before the start of that branch)
845         // - no labels were moved to lower offsets than the (new) cur_off, so
846         //   the labels_at_this_branch list for any other branch need not change.
847         //
848         // labels_at_tail satisfies all invariants:
849         // - all labels that were at the tail after the truncated branch are
850         //   moved backward to just before the branch, which becomes the new tail;
851         //   thus every element in the list should remain (ensured by `.extend()`
852         //   above).
853         // - all labels that refer to the new tail, which is the start-offset of
854         //   the truncated branch, must be present. The `labels_at_this_branch`
855         //   list in the truncated branch's record is a complete and precise list
856         //   of exactly these labels; we append these to labels_at_tail.
857         // - labels_at_tail_off is at cur_off after truncation occurs, so the
858         //   list is valid (not to be lazily cleared).
859         //
860         // The stated operation was performed:
861         // - For each label at the end of the buffer prior to this method, it
862         //   now resolves to the new (truncated) end of the buffer: it must have
863         //   been in `labels_at_tail` (this list is precise and complete, and
864         //   the tail was at the end of the truncated branch on entry), and we
865         //   iterate over this list and set `label_offsets` to the new tail.
866         //   None of these labels could have been an alias (by invariant), so
867         //   `label_offsets` is authoritative for each.
868         // - No other labels will be past the end of the buffer, because of the
869         //   requirement that no labels be bound to the middle of branch ranges
870         //   (see comments to `add_{cond,uncond}_branch()`).
871         // - The buffer is truncated to just before the last branch, and the
872         //   fixup record referring to that last branch is removed.
873     }
874 
875     fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) {
876         if ctrl_plane.get_decision() {
877             return;
878         }
879 
880         self.lazily_clear_labels_at_tail();
881         // Invariants valid at this point.
882 
883         trace!(
884             "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
885             self.latest_branches,
886             self.labels_at_tail,
887             self.fixup_records
888         );
889 
890         // We continue to munch on branches at the tail of the buffer until no
891         // more rules apply. Note that the loop only continues if a branch is
892         // actually truncated (or if labels are redirected away from a branch),
893         // so this always makes progress.
894         while let Some(b) = self.latest_branches.last() {
895             let cur_off = self.cur_offset();
896             trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
897             // If there has been any code emission since the end of the last branch or
898             // label definition, then there's nothing we can edit (because we
899             // don't move code once placed, only back up and overwrite), so
900             // clear the records and finish.
901             if b.end < cur_off {
902                 break;
903             }
904 
905             // If the "labels at this branch" list on this branch is
906             // longer than a threshold, don't do any simplification,
907             // and let the branch remain to separate those labels from
908             // the current tail. This avoids quadratic behavior (see
909             // #3468): otherwise, if a long string of "goto next;
910             // next:" patterns are emitted, all of the labels will
911             // coalesce into a long list of aliases for the current
912             // buffer tail. We must track all aliases of the current
913             // tail for correctness, but we are also allowed to skip
914             // optimization (removal) of any branch, so we take the
915             // escape hatch here and let it stand. In effect this
916             // "spreads" the many thousands of labels in the
917             // pathological case among an actual (harmless but
918             // suboptimal) instruction once per N labels.
919             if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD {
920                 break;
921             }
922 
923             // Invariant: we are looking at a branch that ends at the tail of
924             // the buffer.
925 
926             // For any branch, conditional or unconditional:
927             // - If the target is a label at the current offset, then remove
928             //   the conditional branch, and reset all labels that targetted
929             //   the current offset (end of branch) to the truncated
930             //   end-of-code.
931             //
932             // Preserves execution semantics: a branch to its own fallthrough
933             // address is equivalent to a no-op; in both cases, nextPC is the
934             // fallthrough.
935             if self.resolve_label_offset(b.target) == cur_off {
936                 trace!("branch with target == cur off; truncating");
937                 self.truncate_last_branch();
938                 continue;
939             }
940 
941             // If latest is an unconditional branch:
942             //
943             // - If the branch's target is not its own start address, then for
944             //   each label at the start of branch, make the label an alias of the
945             //   branch target, and remove the label from the "labels at this
946             //   branch" list.
947             //
948             //   - Preserves execution semantics: an unconditional branch's
949             //     only effect is to set PC to a new PC; this change simply
950             //     collapses one step in the step-semantics.
951             //
952             //   - Post-invariant: the labels that were bound to the start of
953             //     this branch become aliases, so they must not be present in any
954             //     labels-at-this-branch list or the labels-at-tail list. The
955             //     labels are removed form the latest-branch record's
956             //     labels-at-this-branch list, and are never placed in the
957             //     labels-at-tail list. Furthermore, it is correct that they are
958             //     not in either list, because they are now aliases, and labels
959             //     that are aliases remain aliases forever.
960             //
961             // - If there is a prior unconditional branch that ends just before
962             //   this one begins, and this branch has no labels bound to its
963             //   start, then we can truncate this branch, because it is entirely
964             //   unreachable (we have redirected all labels that make it
965             //   reachable otherwise). Do so and continue around the loop.
966             //
967             //   - Preserves execution semantics: the branch is unreachable,
968             //     because execution can only flow into an instruction from the
969             //     prior instruction's fallthrough or from a branch bound to that
970             //     instruction's start offset. Unconditional branches have no
971             //     fallthrough, so if the prior instruction is an unconditional
972             //     branch, no fallthrough entry can happen. The
973             //     labels-at-this-branch list is complete (by invariant), so if it
974             //     is empty, then the instruction is entirely unreachable. Thus,
975             //     it can be removed.
976             //
977             //   - Post-invariant: ensured by truncate_last_branch().
978             //
979             // - If there is a prior conditional branch whose target label
980             //   resolves to the current offset (branches around the
981             //   unconditional branch), then remove the unconditional branch,
982             //   and make the target of the unconditional the target of the
983             //   conditional instead.
984             //
985             //   - Preserves execution semantics: previously we had:
986             //
987             //         L1:
988             //            cond_br L2
989             //            br L3
990             //         L2:
991             //            (end of buffer)
992             //
993             //     by removing the last branch, we have:
994             //
995             //         L1:
996             //            cond_br L2
997             //         L2:
998             //            (end of buffer)
999             //
1000             //     we then fix up the records for the conditional branch to
1001             //     have:
1002             //
1003             //         L1:
1004             //           cond_br.inverted L3
1005             //         L2:
1006             //
1007             //     In the original code, control flow reaches L2 when the
1008             //     conditional branch's predicate is true, and L3 otherwise. In
1009             //     the optimized code, the same is true.
1010             //
1011             //   - Post-invariant: all edits to latest_branches and
1012             //     labels_at_tail are performed by `truncate_last_branch()`,
1013             //     which maintains the invariants at each step.
1014 
1015             if b.is_uncond() {
1016                 // Set any label equal to current branch's start as an alias of
1017                 // the branch's target, if the target is not the branch itself
1018                 // (i.e., an infinite loop).
1019                 //
1020                 // We cannot perform this aliasing if the target of this branch
1021                 // ultimately aliases back here; if so, we need to keep this
1022                 // branch, so break out of this loop entirely (and clear the
1023                 // latest-branches list below).
1024                 //
1025                 // Note that this check is what prevents cycles from forming in
1026                 // `self.label_aliases`. To see why, consider an arbitrary start
1027                 // state:
1028                 //
1029                 // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to
1030                 // Ln, which is not aliased.
1031                 //
1032                 // We would create a cycle if we assigned label_aliases[Ln]
1033                 // = L1.  Note that the below assignment is the only write
1034                 // to label_aliases.
1035                 //
1036                 // By our other invariants, we have that Ln (`l` below)
1037                 // resolves to the offset `b.start`, because it is in the
1038                 // set `b.labels_at_this_branch`.
1039                 //
1040                 // If L1 were already aliased, through some arbitrarily deep
1041                 // chain, to Ln, then it must also resolve to this offset
1042                 // `b.start`.
1043                 //
1044                 // By checking the resolution of `L1` against this offset,
1045                 // and aborting this branch-simplification if they are
1046                 // equal, we prevent the below assignment from ever creating
1047                 // a cycle.
1048                 if self.resolve_label_offset(b.target) != b.start {
1049                     let redirected = b.labels_at_this_branch.len();
1050                     for &l in &b.labels_at_this_branch {
1051                         trace!(
1052                             " -> label at start of branch {:?} redirected to target {:?}",
1053                             l,
1054                             b.target
1055                         );
1056                         self.label_aliases[l.0 as usize] = b.target;
1057                         // NOTE: we continue to ensure the invariant that labels
1058                         // pointing to tail of buffer are in `labels_at_tail`
1059                         // because we already ensured above that the last branch
1060                         // cannot have a target of `cur_off`; so we never have
1061                         // to put the label into `labels_at_tail` when moving it
1062                         // here.
1063                     }
1064                     // Maintain invariant: all branches have been redirected
1065                     // and are no longer pointing at the start of this branch.
1066                     let mut_b = self.latest_branches.last_mut().unwrap();
1067                     mut_b.labels_at_this_branch.clear();
1068 
1069                     if redirected > 0 {
1070                         trace!(" -> after label redirects, restarting loop");
1071                         continue;
1072                     }
1073                 } else {
1074                     break;
1075                 }
1076 
1077                 let b = self.latest_branches.last().unwrap();
1078 
1079                 // Examine any immediately preceding branch.
1080                 if self.latest_branches.len() > 1 {
1081                     let prev_b = &self.latest_branches[self.latest_branches.len() - 2];
1082                     trace!(" -> more than one branch; prev_b = {:?}", prev_b);
1083                     // This uncond is immediately after another uncond; we
1084                     // should have already redirected labels to this uncond away
1085                     // (but check to be sure); so we can truncate this uncond.
1086                     if prev_b.is_uncond()
1087                         && prev_b.end == b.start
1088                         && b.labels_at_this_branch.is_empty()
1089                     {
1090                         trace!(" -> uncond follows another uncond; truncating");
1091                         self.truncate_last_branch();
1092                         continue;
1093                     }
1094 
1095                     // This uncond is immediately after a conditional, and the
1096                     // conditional's target is the end of this uncond, and we've
1097                     // already redirected labels to this uncond away; so we can
1098                     // truncate this uncond, flip the sense of the conditional, and
1099                     // set the conditional's target (in `latest_branches` and in
1100                     // `fixup_records`) to the uncond's target.
1101                     if prev_b.is_cond()
1102                         && prev_b.end == b.start
1103                         && self.resolve_label_offset(prev_b.target) == cur_off
1104                     {
1105                         trace!(" -> uncond follows a conditional, and conditional's target resolves to current offset");
1106                         // Save the target of the uncond (this becomes the
1107                         // target of the cond), and truncate the uncond.
1108                         let target = b.target;
1109                         let data = prev_b.inverted.clone().unwrap();
1110                         self.truncate_last_branch();
1111 
1112                         // Mutate the code and cond branch.
1113                         let off_before_edit = self.cur_offset();
1114                         let prev_b = self.latest_branches.last_mut().unwrap();
1115                         let not_inverted = SmallVec::from(
1116                             &self.data[(prev_b.start as usize)..(prev_b.end as usize)],
1117                         );
1118 
1119                         // Low-level edit: replaces bytes of branch with
1120                         // inverted form. cur_off remains the same afterward, so
1121                         // we do not need to modify label data structures.
1122                         self.data.truncate(prev_b.start as usize);
1123                         self.data.extend_from_slice(&data[..]);
1124 
1125                         // Save the original code as the inversion of the
1126                         // inverted branch, in case we later edit this branch
1127                         // again.
1128                         prev_b.inverted = Some(not_inverted);
1129                         self.fixup_records[prev_b.fixup].label = target;
1130                         trace!(" -> reassigning target of condbr to {:?}", target);
1131                         prev_b.target = target;
1132                         debug_assert_eq!(off_before_edit, self.cur_offset());
1133                         continue;
1134                     }
1135                 }
1136             }
1137 
1138             // If we couldn't do anything with the last branch, then break.
1139             break;
1140         }
1141 
1142         self.purge_latest_branches();
1143 
1144         trace!(
1145             "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
1146             self.latest_branches,
1147             self.labels_at_tail,
1148             self.fixup_records
1149         );
1150     }
1151 
1152     fn purge_latest_branches(&mut self) {
1153         // All of our branch simplification rules work only if a branch ends at
1154         // the tail of the buffer, with no following code; and branches are in
1155         // order in latest_branches; so if the last entry ends prior to
1156         // cur_offset, then clear all entries.
1157         let cur_off = self.cur_offset();
1158         if let Some(l) = self.latest_branches.last() {
1159             if l.end < cur_off {
1160                 trace!("purge_latest_branches: removing branch {:?}", l);
1161                 self.latest_branches.clear();
1162             }
1163         }
1164 
1165         // Post-invariant: no invariant requires any branch to appear in
1166         // `latest_branches`; it is always optional. The list-clear above thus
1167         // preserves all semantics.
1168     }
1169 
1170     /// Emit a trap at some point in the future with the specified code and
1171     /// stack map.
1172     ///
1173     /// This function returns a [`MachLabel`] which will be the future address
1174     /// of the trap. Jumps should refer to this label, likely by using the
1175     /// [`MachBuffer::use_label_at_offset`] method, to get a relocation
1176     /// patched in once the address of the trap is known.
1177     ///
1178     /// This will batch all traps into the end of the function.
1179     pub fn defer_trap(&mut self, code: TrapCode, stack_map: Option<StackMap>) -> MachLabel {
1180         let label = self.get_label();
1181         self.update_deadline(I::TRAP_OPCODE.len(), u32::MAX);
1182         self.pending_traps.push(MachLabelTrap {
1183             label,
1184             code,
1185             stack_map,
1186             loc: self.cur_srcloc.map(|(_start, loc)| loc),
1187         });
1188         label
1189     }
1190 
1191     fn update_deadline(&mut self, len: usize, max_distance: CodeOffset) {
1192         trace!("defer: eventually emit {} bytes", len);
1193         let deadline = self.cur_offset().saturating_add(max_distance);
1194         self.island_worst_case_size += len as CodeOffset;
1195         self.island_worst_case_size =
1196             (self.island_worst_case_size + I::LabelUse::ALIGN - 1) & !(I::LabelUse::ALIGN - 1);
1197         if deadline < self.island_deadline {
1198             self.island_deadline = deadline;
1199         }
1200     }
1201 
1202     /// Is an island needed within the next N bytes?
1203     pub fn island_needed(&self, distance: CodeOffset) -> bool {
1204         self.worst_case_end_of_island(distance) > self.island_deadline
1205     }
1206 
1207     /// Returns the maximal offset that islands can reach if `distance` more
1208     /// bytes are appended.
1209     ///
1210     /// This is used to determine if veneers need insertions since jumps that
1211     /// can't reach past this point must get a veneer of some form.
1212     fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset {
1213         self.cur_offset()
1214             .saturating_add(distance)
1215             .saturating_add(self.island_worst_case_size)
1216     }
1217 
1218     /// Emit all pending constants and required pending veneers.
1219     ///
1220     /// Should only be called if `island_needed()` returns true, i.e., if we
1221     /// actually reach a deadline. It's not necessarily a problem to do so
1222     /// otherwise but it may result in unnecessary work during emission.
1223     pub fn emit_island(&mut self, ctrl_plane: &mut ControlPlane) {
1224         self.emit_island_maybe_forced(ForceVeneers::No, IsLastIsland::No, ctrl_plane);
1225     }
1226 
1227     /// Same as `emit_island`, but an internal API with a `force_veneers`
1228     /// argument to force all veneers to always get emitted for debugging.
1229     fn emit_island_maybe_forced(
1230         &mut self,
1231         force_veneers: ForceVeneers,
1232         last_island: IsLastIsland,
1233         ctrl_plane: &mut ControlPlane,
1234     ) {
1235         // We're going to purge fixups, so no latest-branch editing can happen
1236         // anymore.
1237         self.latest_branches.clear();
1238 
1239         // Reset internal calculations about islands since we're going to
1240         // change the calculus as we apply fixups.
1241         self.island_deadline = UNKNOWN_LABEL_OFFSET;
1242         self.island_worst_case_size = 0;
1243 
1244         // End the current location tracking since anything emitted during this
1245         // function shouldn't be attributed to whatever the current source
1246         // location is.
1247         //
1248         // Note that the current source location, if it's set right now, will be
1249         // restored at the end of this island emission.
1250         let cur_loc = self.cur_srcloc.map(|(_, loc)| loc);
1251         if cur_loc.is_some() {
1252             self.end_srcloc();
1253         }
1254 
1255         // First flush out all traps/constants so we have more labels in case
1256         // fixups are applied against these labels.
1257         //
1258         // Note that traps are placed first since this typically happens at the
1259         // end of the function and for disassemblers we try to keep all the code
1260         // contiguously together.
1261         for MachLabelTrap {
1262             label,
1263             code,
1264             stack_map,
1265             loc,
1266         } in mem::take(&mut self.pending_traps)
1267         {
1268             // If this trap has source information associated with it then
1269             // emit this information for the trap instruction going out now too.
1270             if let Some(loc) = loc {
1271                 self.start_srcloc(loc);
1272             }
1273             self.align_to(I::LabelUse::ALIGN);
1274             self.bind_label(label, ctrl_plane);
1275             self.add_trap(code);
1276             if let Some(map) = stack_map {
1277                 let extent = StackMapExtent::UpcomingBytes(I::TRAP_OPCODE.len() as u32);
1278                 self.add_stack_map(extent, map);
1279             }
1280             self.put_data(I::TRAP_OPCODE);
1281             if loc.is_some() {
1282                 self.end_srcloc();
1283             }
1284         }
1285 
1286         for constant in mem::take(&mut self.pending_constants) {
1287             let MachBufferConstant { align, size, .. } = self.constants[constant];
1288             let label = self.constants[constant].upcoming_label.take().unwrap();
1289             self.align_to(align);
1290             self.bind_label(label, ctrl_plane);
1291             self.used_constants.push((constant, self.cur_offset()));
1292             self.get_appended_space(size);
1293         }
1294 
1295         let last_island_fixups = match last_island {
1296             IsLastIsland::Yes => mem::take(&mut self.fixup_records_max_range),
1297             IsLastIsland::No => smallvec![],
1298         };
1299         for fixup in mem::take(&mut self.fixup_records)
1300             .into_iter()
1301             .chain(last_island_fixups.into_iter())
1302         {
1303             trace!("emit_island: fixup {:?}", fixup);
1304             let MachLabelFixup {
1305                 label,
1306                 offset,
1307                 kind,
1308             } = fixup;
1309             let label_offset = self.resolve_label_offset(label);
1310             let start = offset as usize;
1311             let end = (offset + kind.patch_size()) as usize;
1312 
1313             if label_offset != UNKNOWN_LABEL_OFFSET {
1314                 // If the offset of the label for this fixup is known then
1315                 // we're going to do something here-and-now. We're either going
1316                 // to patch the original offset because it's an in-bounds jump,
1317                 // or we're going to generate a veneer, patch the fixup to jump
1318                 // to the veneer, and then keep going.
1319                 //
1320                 // If the label comes after the original fixup, then we should
1321                 // be guaranteed that the jump is in-bounds. Otherwise there's
1322                 // a bug somewhere because this method wasn't called soon
1323                 // enough. All forward-jumps are tracked and should get veneers
1324                 // before their deadline comes and they're unable to jump
1325                 // further.
1326                 //
1327                 // Otherwise if the label is before the fixup, then that's a
1328                 // backwards jump. If it's past the maximum negative range
1329                 // then we'll emit a veneer that to jump forward to which can
1330                 // then jump backwards.
1331                 let veneer_required = if label_offset >= offset {
1332                     assert!((label_offset - offset) <= kind.max_pos_range());
1333                     false
1334                 } else {
1335                     (offset - label_offset) > kind.max_neg_range()
1336                 };
1337                 trace!(
1338                     " -> label_offset = {}, known, required = {} (pos {} neg {})",
1339                     label_offset,
1340                     veneer_required,
1341                     kind.max_pos_range(),
1342                     kind.max_neg_range()
1343                 );
1344 
1345                 if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required
1346                 {
1347                     self.emit_veneer(label, offset, kind);
1348                 } else {
1349                     let slice = &mut self.data[start..end];
1350                     trace!("patching in-range!");
1351                     kind.patch(slice, offset, label_offset);
1352                 }
1353             } else {
1354                 // If the offset of this label is not known at this time then
1355                 // there are three possibilities:
1356                 //
1357                 // 1. It's possible that the label is already a "max
1358                 //    range" label: a veneer would not help us any,
1359                 //    and so we need not consider the label during
1360                 //    island emission any more until the very end (the
1361                 //    last "island" pass). In this case we kick the
1362                 //    label into a separate list to process once at
1363                 //    the end, to avoid quadratic behavior (see
1364                 //    "quadratic island behavior" above, and issue
1365                 //    #6798).
1366                 //
1367                 // 2. Or, we may be about to exceed the maximum jump range of
1368                 //    this fixup. In that case a veneer is inserted to buy some
1369                 //    more budget for the forward-jump. It's guaranteed that the
1370                 //    label will eventually come after where we're at, so we know
1371                 //    that the forward jump is necessary.
1372                 //
1373                 // 3. Otherwise, we're still within range of the
1374                 //    forward jump but the precise target isn't known
1375                 //    yet. In that case, to avoid quadratic behavior
1376                 //    (again, see above), we emit a veneer and if the
1377                 //    resulting label-use fixup is then max-range, we
1378                 //    put it in the max-range list. We could enqueue
1379                 //    the fixup for processing later, and this would
1380                 //    enable slightly fewer veneers, but islands are
1381                 //    relatively rare and the cost of "upgrading" all
1382                 //    forward label refs that cross an island should
1383                 //    be relatively low.
1384                 if !kind.supports_veneer() {
1385                     self.fixup_records_max_range.push(MachLabelFixup {
1386                         label,
1387                         offset,
1388                         kind,
1389                     });
1390                 } else {
1391                     self.emit_veneer(label, offset, kind);
1392                 }
1393             }
1394         }
1395 
1396         if let Some(loc) = cur_loc {
1397             self.start_srcloc(loc);
1398         }
1399     }
1400 
1401     /// Emits a "veneer" the `kind` code at `offset` to jump to `label`.
1402     ///
1403     /// This will generate extra machine code, using `kind`, to get a
1404     /// larger-jump-kind than `kind` allows. The code at `offset` is then
1405     /// patched to jump to our new code, and then the new code is enqueued for
1406     /// a fixup to get processed at some later time.
1407     fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) {
1408         // If this `kind` doesn't support a veneer then that's a bug in the
1409         // backend because we need to implement support for such a veneer.
1410         assert!(
1411             kind.supports_veneer(),
1412             "jump beyond the range of {:?} but a veneer isn't supported",
1413             kind,
1414         );
1415 
1416         // Allocate space for a veneer in the island.
1417         self.align_to(I::LabelUse::ALIGN);
1418         let veneer_offset = self.cur_offset();
1419         trace!("making a veneer at {}", veneer_offset);
1420         let start = offset as usize;
1421         let end = (offset + kind.patch_size()) as usize;
1422         let slice = &mut self.data[start..end];
1423         // Patch the original label use to refer to the veneer.
1424         trace!(
1425             "patching original at offset {} to veneer offset {}",
1426             offset,
1427             veneer_offset
1428         );
1429         kind.patch(slice, offset, veneer_offset);
1430         // Generate the veneer.
1431         let veneer_slice = self.get_appended_space(kind.veneer_size() as usize);
1432         let (veneer_fixup_off, veneer_label_use) =
1433             kind.generate_veneer(veneer_slice, veneer_offset);
1434         trace!(
1435             "generated veneer; fixup offset {}, label_use {:?}",
1436             veneer_fixup_off,
1437             veneer_label_use
1438         );
1439         // Register a new use of `label` with our new veneer fixup and
1440         // offset. This'll recalculate deadlines accordingly and
1441         // enqueue this fixup to get processed at some later
1442         // time. Note that if we now have a max-range, we instead skip
1443         // the usual fixup list to avoid quadratic behavior.
1444         if veneer_label_use.supports_veneer() {
1445             self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1446         } else {
1447             self.fixup_records_max_range.push(MachLabelFixup {
1448                 label,
1449                 offset: veneer_fixup_off,
1450                 kind: veneer_label_use,
1451             });
1452         }
1453     }
1454 
1455     fn finish_emission_maybe_forcing_veneers(
1456         &mut self,
1457         force_veneers: ForceVeneers,
1458         ctrl_plane: &mut ControlPlane,
1459     ) {
1460         while !self.pending_constants.is_empty()
1461             || !self.pending_traps.is_empty()
1462             || !self.fixup_records.is_empty()
1463             || !self.fixup_records_max_range.is_empty()
1464         {
1465             // `emit_island()` will emit any pending veneers and constants, and
1466             // as a side-effect, will also take care of any fixups with resolved
1467             // labels eagerly.
1468             self.emit_island_maybe_forced(force_veneers, IsLastIsland::Yes, ctrl_plane);
1469         }
1470 
1471         // Ensure that all labels have been fixed up after the last island is emitted. This is a
1472         // full (release-mode) assert because an unresolved label means the emitted code is
1473         // incorrect.
1474         assert!(self.fixup_records.is_empty());
1475     }
1476 
1477     /// Finish any deferred emissions and/or fixups.
1478     pub fn finish(
1479         mut self,
1480         constants: &VCodeConstants,
1481         ctrl_plane: &mut ControlPlane,
1482     ) -> MachBufferFinalized<Stencil> {
1483         let _tt = timing::vcode_emit_finish();
1484 
1485         // Do any optimizations on branches at tail of buffer, as if we
1486         // had bound one last label.
1487         self.optimize_branches(ctrl_plane);
1488 
1489         self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane);
1490 
1491         let alignment = self.finish_constants(constants);
1492 
1493         let mut srclocs = self.srclocs;
1494         srclocs.sort_by_key(|entry| entry.start);
1495 
1496         MachBufferFinalized {
1497             data: self.data,
1498             relocs: self.relocs,
1499             traps: self.traps,
1500             call_sites: self.call_sites,
1501             srclocs,
1502             stack_maps: self.stack_maps,
1503             unwind_info: self.unwind_info,
1504             alignment,
1505         }
1506     }
1507 
1508     /// Add an external relocation at the current offset.
1509     pub fn add_reloc(&mut self, kind: Reloc, name: &ExternalName, addend: Addend) {
1510         let name = name.clone();
1511         // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally
1512         // generate a label-use statement to track whether an island is possibly
1513         // needed to escape this function to actually get to the external name.
1514         // This is most likely to come up on AArch64 where calls between
1515         // functions use a 26-bit signed offset which gives +/- 64MB. This means
1516         // that if a function is 128MB in size and there's a call in the middle
1517         // it's impossible to reach the actual target. Also, while it's
1518         // technically possible to jump to the start of a function and then jump
1519         // further, island insertion below always inserts islands after
1520         // previously appended code so for Cranelift's own implementation this
1521         // is also a problem for 64MB functions on AArch64 which start with a
1522         // call instruction, those won't be able to escape.
1523         //
1524         // Ideally what needs to happen here is that a `LabelUse` is
1525         // transparently generated (or call-sites of this function are audited
1526         // to generate a `LabelUse` instead) and tracked internally. The actual
1527         // relocation would then change over time if and when a veneer is
1528         // inserted, where the relocation here would be patched by this
1529         // `MachBuffer` to jump to the veneer. The problem, though, is that all
1530         // this still needs to end up, in the case of a singular function,
1531         // generating a final relocation pointing either to this particular
1532         // relocation or to the veneer inserted. Additionally
1533         // `MachBuffer` needs the concept of a label which will never be
1534         // resolved, so `emit_island` doesn't trip over not actually ever
1535         // knowning what some labels are. Currently the loop in
1536         // `finish_emission_maybe_forcing_veneers` would otherwise infinitely
1537         // loop.
1538         //
1539         // For now this means that because relocs aren't tracked at all that
1540         // AArch64 functions have a rough size limits of 64MB. For now that's
1541         // somewhat reasonable and the failure mode is a panic in `MachBuffer`
1542         // when a relocation can't otherwise be resolved later, so it shouldn't
1543         // actually result in any memory unsafety or anything like that.
1544         self.relocs.push(MachReloc {
1545             offset: self.data.len() as CodeOffset,
1546             kind,
1547             name,
1548             addend,
1549         });
1550     }
1551 
1552     /// Add a trap record at the current offset.
1553     pub fn add_trap(&mut self, code: TrapCode) {
1554         self.traps.push(MachTrap {
1555             offset: self.data.len() as CodeOffset,
1556             code,
1557         });
1558     }
1559 
1560     /// Add a call-site record at the current offset.
1561     pub fn add_call_site(&mut self, opcode: Opcode) {
1562         debug_assert!(
1563             opcode.is_call(),
1564             "adding call site info for a non-call instruction."
1565         );
1566         self.call_sites.push(MachCallSite {
1567             ret_addr: self.data.len() as CodeOffset,
1568             opcode,
1569         });
1570     }
1571 
1572     /// Add an unwind record at the current offset.
1573     pub fn add_unwind(&mut self, unwind: UnwindInst) {
1574         self.unwind_info.push((self.cur_offset(), unwind));
1575     }
1576 
1577     /// Set the `SourceLoc` for code from this offset until the offset at the
1578     /// next call to `end_srcloc()`.
1579     pub fn start_srcloc(&mut self, loc: RelSourceLoc) {
1580         self.cur_srcloc = Some((self.cur_offset(), loc));
1581     }
1582 
1583     /// Mark the end of the `SourceLoc` segment started at the last
1584     /// `start_srcloc()` call.
1585     pub fn end_srcloc(&mut self) {
1586         let (start, loc) = self
1587             .cur_srcloc
1588             .take()
1589             .expect("end_srcloc() called without start_srcloc()");
1590         let end = self.cur_offset();
1591         // Skip zero-length extends.
1592         debug_assert!(end >= start);
1593         if end > start {
1594             self.srclocs.push(MachSrcLoc { start, end, loc });
1595         }
1596     }
1597 
1598     /// Add stack map metadata for this program point: a set of stack offsets
1599     /// (from SP upward) that contain live references.
1600     ///
1601     /// The `offset_to_fp` value is the offset from the nominal SP (at which the `stack_offsets`
1602     /// are based) and the FP value. By subtracting `offset_to_fp` from each `stack_offsets`
1603     /// element, one can obtain live-reference offsets from FP instead.
1604     pub fn add_stack_map(&mut self, extent: StackMapExtent, stack_map: StackMap) {
1605         let (start, end) = match extent {
1606             StackMapExtent::UpcomingBytes(insn_len) => {
1607                 let start_offset = self.cur_offset();
1608                 (start_offset, start_offset + insn_len)
1609             }
1610             StackMapExtent::StartedAtOffset(start_offset) => {
1611                 let end_offset = self.cur_offset();
1612                 debug_assert!(end_offset >= start_offset);
1613                 (start_offset, end_offset)
1614             }
1615         };
1616         trace!("Adding stack map for offsets {start:#x}..{end:#x}");
1617         self.stack_maps.push(MachStackMap {
1618             offset: start,
1619             offset_end: end,
1620             stack_map,
1621         });
1622     }
1623 }
1624 
1625 impl<T: CompilePhase> MachBufferFinalized<T> {
1626     /// Get a list of source location mapping tuples in sorted-by-start-offset order.
1627     pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] {
1628         &self.srclocs[..]
1629     }
1630 
1631     /// Get the total required size for the code.
1632     pub fn total_size(&self) -> CodeOffset {
1633         self.data.len() as CodeOffset
1634     }
1635 
1636     /// Return the code in this mach buffer as a hex string for testing purposes.
1637     pub fn stringify_code_bytes(&self) -> String {
1638         // This is pretty lame, but whatever ..
1639         use std::fmt::Write;
1640         let mut s = String::with_capacity(self.data.len() * 2);
1641         for b in &self.data {
1642             write!(&mut s, "{:02X}", b).unwrap();
1643         }
1644         s
1645     }
1646 
1647     /// Get the code bytes.
1648     pub fn data(&self) -> &[u8] {
1649         // N.B.: we emit every section into the .text section as far as
1650         // the `CodeSink` is concerned; we do not bother to segregate
1651         // the contents into the actual program text, the jumptable and the
1652         // rodata (constant pool). This allows us to generate code assuming
1653         // that these will not be relocated relative to each other, and avoids
1654         // having to designate each section as belonging in one of the three
1655         // fixed categories defined by `CodeSink`. If this becomes a problem
1656         // later (e.g. because of memory permissions or similar), we can
1657         // add this designation and segregate the output; take care, however,
1658         // to add the appropriate relocations in this case.
1659 
1660         &self.data[..]
1661     }
1662 
1663     /// Get the list of external relocations for this code.
1664     pub fn relocs(&self) -> &[MachReloc] {
1665         &self.relocs[..]
1666     }
1667 
1668     /// Get the list of trap records for this code.
1669     pub fn traps(&self) -> &[MachTrap] {
1670         &self.traps[..]
1671     }
1672 
1673     /// Get the stack map metadata for this code.
1674     pub fn stack_maps(&self) -> &[MachStackMap] {
1675         &self.stack_maps[..]
1676     }
1677 
1678     /// Get the list of call sites for this code.
1679     pub fn call_sites(&self) -> &[MachCallSite] {
1680         &self.call_sites[..]
1681     }
1682 }
1683 
1684 /// Metadata about a constant.
1685 struct MachBufferConstant {
1686     /// A label which has not yet been bound which can be used for this
1687     /// constant.
1688     ///
1689     /// This is lazily created when a label is requested for a constant and is
1690     /// cleared when a constant is emitted.
1691     upcoming_label: Option<MachLabel>,
1692     /// Required alignment.
1693     align: CodeOffset,
1694     /// The byte size of this constant.
1695     size: usize,
1696 }
1697 
1698 /// A trap that is deferred to the next time an island is emitted for either
1699 /// traps, constants, or fixups.
1700 struct MachLabelTrap {
1701     /// This label will refer to the trap's offset.
1702     label: MachLabel,
1703     /// The code associated with this trap.
1704     code: TrapCode,
1705     /// An optional stack map to associate with this trap.
1706     stack_map: Option<StackMap>,
1707     /// An optional source location to assign for this trap.
1708     loc: Option<RelSourceLoc>,
1709 }
1710 
1711 /// A fixup to perform on the buffer once code is emitted. Fixups always refer
1712 /// to labels and patch the code based on label offsets. Hence, they are like
1713 /// relocations, but internal to one buffer.
1714 #[derive(Debug)]
1715 struct MachLabelFixup<I: VCodeInst> {
1716     /// The label whose offset controls this fixup.
1717     label: MachLabel,
1718     /// The offset to fix up / patch to refer to this label.
1719     offset: CodeOffset,
1720     /// The kind of fixup. This is architecture-specific; each architecture may have,
1721     /// e.g., several types of branch instructions, each with differently-sized
1722     /// offset fields and different places within the instruction to place the
1723     /// bits.
1724     kind: I::LabelUse,
1725 }
1726 
1727 /// A relocation resulting from a compilation.
1728 #[derive(Clone, Debug, PartialEq)]
1729 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
1730 pub struct MachReloc {
1731     /// The offset at which the relocation applies, *relative to the
1732     /// containing section*.
1733     pub offset: CodeOffset,
1734     /// The kind of relocation.
1735     pub kind: Reloc,
1736     /// The external symbol / name to which this relocation refers.
1737     pub name: ExternalName,
1738     /// The addend to add to the symbol value.
1739     pub addend: i64,
1740 }
1741 
1742 /// A trap record resulting from a compilation.
1743 #[derive(Clone, Debug, PartialEq)]
1744 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
1745 pub struct MachTrap {
1746     /// The offset at which the trap instruction occurs, *relative to the
1747     /// containing section*.
1748     pub offset: CodeOffset,
1749     /// The trap code.
1750     pub code: TrapCode,
1751 }
1752 
1753 /// A call site record resulting from a compilation.
1754 #[derive(Clone, Debug, PartialEq)]
1755 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
1756 pub struct MachCallSite {
1757     /// The offset of the call's return address, *relative to the containing section*.
1758     pub ret_addr: CodeOffset,
1759     /// The call's opcode.
1760     pub opcode: Opcode,
1761 }
1762 
1763 /// A source-location mapping resulting from a compilation.
1764 #[derive(PartialEq, Debug, Clone)]
1765 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
1766 pub struct MachSrcLoc<T: CompilePhase> {
1767     /// The start of the region of code corresponding to a source location.
1768     /// This is relative to the start of the function, not to the start of the
1769     /// section.
1770     pub start: CodeOffset,
1771     /// The end of the region of code corresponding to a source location.
1772     /// This is relative to the start of the section, not to the start of the
1773     /// section.
1774     pub end: CodeOffset,
1775     /// The source location.
1776     pub loc: T::SourceLocType,
1777 }
1778 
1779 impl MachSrcLoc<Stencil> {
1780     fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> {
1781         MachSrcLoc {
1782             start: self.start,
1783             end: self.end,
1784             loc: self.loc.expand(base_srcloc),
1785         }
1786     }
1787 }
1788 
1789 /// Record of stack map metadata: stack offsets containing references.
1790 #[derive(Clone, Debug, PartialEq)]
1791 #[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
1792 pub struct MachStackMap {
1793     /// The code offset at which this stack map applies.
1794     pub offset: CodeOffset,
1795     /// The code offset just past the "end" of the instruction: that is, the
1796     /// offset of the first byte of the following instruction, or equivalently,
1797     /// the start offset plus the instruction length.
1798     pub offset_end: CodeOffset,
1799     /// The stack map itself.
1800     pub stack_map: StackMap,
1801 }
1802 
1803 /// Record of branch instruction in the buffer, to facilitate editing.
1804 #[derive(Clone, Debug)]
1805 struct MachBranch {
1806     start: CodeOffset,
1807     end: CodeOffset,
1808     target: MachLabel,
1809     fixup: usize,
1810     inverted: Option<SmallVec<[u8; 8]>>,
1811     /// All labels pointing to the start of this branch. For correctness, this
1812     /// *must* be complete (i.e., must contain all labels whose resolved offsets
1813     /// are at the start of this branch): we rely on being able to redirect all
1814     /// labels that could jump to this branch before removing it, if it is
1815     /// otherwise unreachable.
1816     labels_at_this_branch: SmallVec<[MachLabel; 4]>,
1817 }
1818 
1819 impl MachBranch {
1820     fn is_cond(&self) -> bool {
1821         self.inverted.is_some()
1822     }
1823     fn is_uncond(&self) -> bool {
1824         self.inverted.is_none()
1825     }
1826 }
1827 
1828 /// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`.
1829 ///
1830 /// Note that `MachBuffer` was primarily written for intra-function references
1831 /// of jumps between basic blocks, but it's also quite usable for entire text
1832 /// sections and resolving references between functions themselves. This
1833 /// builder interprets "blocks" as labeled functions for the purposes of
1834 /// resolving labels internally in the buffer.
1835 pub struct MachTextSectionBuilder<I: VCodeInst> {
1836     buf: MachBuffer<I>,
1837     next_func: usize,
1838     force_veneers: ForceVeneers,
1839 }
1840 
1841 impl<I: VCodeInst> MachTextSectionBuilder<I> {
1842     /// Creates a new text section builder which will have `num_funcs` functions
1843     /// pushed into it.
1844     pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> {
1845         let mut buf = MachBuffer::new();
1846         buf.reserve_labels_for_blocks(num_funcs);
1847         MachTextSectionBuilder {
1848             buf,
1849             next_func: 0,
1850             force_veneers: ForceVeneers::No,
1851         }
1852     }
1853 }
1854 
1855 impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
1856     fn append(
1857         &mut self,
1858         labeled: bool,
1859         func: &[u8],
1860         align: u32,
1861         ctrl_plane: &mut ControlPlane,
1862     ) -> u64 {
1863         // Conditionally emit an island if it's necessary to resolve jumps
1864         // between functions which are too far away.
1865         let size = func.len() as u32;
1866         if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) {
1867             self.buf
1868                 .emit_island_maybe_forced(self.force_veneers, IsLastIsland::No, ctrl_plane);
1869         }
1870 
1871         self.buf.align_to(align);
1872         let pos = self.buf.cur_offset();
1873         if labeled {
1874             self.buf.bind_label(
1875                 MachLabel::from_block(BlockIndex::new(self.next_func)),
1876                 ctrl_plane,
1877             );
1878             self.next_func += 1;
1879         }
1880         self.buf.put_data(func);
1881         u64::from(pos)
1882     }
1883 
1884     fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool {
1885         crate::trace!(
1886             "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}"
1887         );
1888         let label = MachLabel::from_block(BlockIndex::new(target));
1889         let offset = u32::try_from(offset).unwrap();
1890         match I::LabelUse::from_reloc(reloc, addend) {
1891             Some(label_use) => {
1892                 self.buf.use_label_at_offset(offset, label, label_use);
1893                 true
1894             }
1895             None => false,
1896         }
1897     }
1898 
1899     fn force_veneers(&mut self) {
1900         self.force_veneers = ForceVeneers::Yes;
1901     }
1902 
1903     fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> {
1904         // Double-check all functions were pushed.
1905         assert_eq!(self.next_func, self.buf.label_offsets.len());
1906 
1907         // Finish up any veneers, if necessary.
1908         self.buf
1909             .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane);
1910 
1911         // We don't need the data any more, so return it to the caller.
1912         mem::take(&mut self.buf.data).into_vec()
1913     }
1914 }
1915 
1916 // We use an actual instruction definition to do tests, so we depend on the `arm64` feature here.
1917 #[cfg(all(test, feature = "arm64"))]
1918 mod test {
1919     use cranelift_entity::EntityRef as _;
1920 
1921     use super::*;
1922     use crate::ir::UserExternalNameRef;
1923     use crate::isa::aarch64::inst::xreg;
1924     use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst};
1925     use crate::machinst::{MachInstEmit, MachInstEmitState};
1926     use crate::settings;
1927     use std::default::Default;
1928     use std::vec::Vec;
1929 
1930     fn label(n: u32) -> MachLabel {
1931         MachLabel::from_block(BlockIndex::new(n as usize))
1932     }
1933     fn target(n: u32) -> BranchTarget {
1934         BranchTarget::Label(label(n))
1935     }
1936 
1937     #[test]
1938     fn test_elide_jump_to_next() {
1939         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
1940         let mut buf = MachBuffer::new();
1941         let mut state = <Inst as MachInstEmit>::State::default();
1942         let constants = Default::default();
1943 
1944         buf.reserve_labels_for_blocks(2);
1945         buf.bind_label(label(0), state.ctrl_plane_mut());
1946         let inst = Inst::Jump { dest: target(1) };
1947         inst.emit(&[], &mut buf, &info, &mut state);
1948         buf.bind_label(label(1), state.ctrl_plane_mut());
1949         let buf = buf.finish(&constants, state.ctrl_plane_mut());
1950         assert_eq!(0, buf.total_size());
1951     }
1952 
1953     #[test]
1954     fn test_elide_trivial_jump_blocks() {
1955         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
1956         let mut buf = MachBuffer::new();
1957         let mut state = <Inst as MachInstEmit>::State::default();
1958         let constants = Default::default();
1959 
1960         buf.reserve_labels_for_blocks(4);
1961 
1962         buf.bind_label(label(0), state.ctrl_plane_mut());
1963         let inst = Inst::CondBr {
1964             kind: CondBrKind::NotZero(xreg(0)),
1965             taken: target(1),
1966             not_taken: target(2),
1967         };
1968         inst.emit(&[], &mut buf, &info, &mut state);
1969 
1970         buf.bind_label(label(1), state.ctrl_plane_mut());
1971         let inst = Inst::Jump { dest: target(3) };
1972         inst.emit(&[], &mut buf, &info, &mut state);
1973 
1974         buf.bind_label(label(2), state.ctrl_plane_mut());
1975         let inst = Inst::Jump { dest: target(3) };
1976         inst.emit(&[], &mut buf, &info, &mut state);
1977 
1978         buf.bind_label(label(3), state.ctrl_plane_mut());
1979 
1980         let buf = buf.finish(&constants, state.ctrl_plane_mut());
1981         assert_eq!(0, buf.total_size());
1982     }
1983 
1984     #[test]
1985     fn test_flip_cond() {
1986         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
1987         let mut buf = MachBuffer::new();
1988         let mut state = <Inst as MachInstEmit>::State::default();
1989         let constants = Default::default();
1990 
1991         buf.reserve_labels_for_blocks(4);
1992 
1993         buf.bind_label(label(0), state.ctrl_plane_mut());
1994         let inst = Inst::CondBr {
1995             kind: CondBrKind::Zero(xreg(0)),
1996             taken: target(1),
1997             not_taken: target(2),
1998         };
1999         inst.emit(&[], &mut buf, &info, &mut state);
2000 
2001         buf.bind_label(label(1), state.ctrl_plane_mut());
2002         let inst = Inst::Nop4;
2003         inst.emit(&[], &mut buf, &info, &mut state);
2004 
2005         buf.bind_label(label(2), state.ctrl_plane_mut());
2006         let inst = Inst::Udf {
2007             trap_code: TrapCode::Interrupt,
2008         };
2009         inst.emit(&[], &mut buf, &info, &mut state);
2010 
2011         buf.bind_label(label(3), state.ctrl_plane_mut());
2012 
2013         let buf = buf.finish(&constants, state.ctrl_plane_mut());
2014 
2015         let mut buf2 = MachBuffer::new();
2016         let mut state = Default::default();
2017         let inst = Inst::TrapIf {
2018             kind: CondBrKind::NotZero(xreg(0)),
2019             trap_code: TrapCode::Interrupt,
2020         };
2021         inst.emit(&[], &mut buf2, &info, &mut state);
2022         let inst = Inst::Nop4;
2023         inst.emit(&[], &mut buf2, &info, &mut state);
2024 
2025         let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2026 
2027         assert_eq!(buf.data, buf2.data);
2028     }
2029 
2030     #[test]
2031     fn test_island() {
2032         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2033         let mut buf = MachBuffer::new();
2034         let mut state = <Inst as MachInstEmit>::State::default();
2035         let constants = Default::default();
2036 
2037         buf.reserve_labels_for_blocks(4);
2038 
2039         buf.bind_label(label(0), state.ctrl_plane_mut());
2040         let inst = Inst::CondBr {
2041             kind: CondBrKind::NotZero(xreg(0)),
2042             taken: target(2),
2043             not_taken: target(3),
2044         };
2045         inst.emit(&[], &mut buf, &info, &mut state);
2046 
2047         buf.bind_label(label(1), state.ctrl_plane_mut());
2048         while buf.cur_offset() < 2000000 {
2049             if buf.island_needed(0) {
2050                 buf.emit_island(state.ctrl_plane_mut());
2051             }
2052             let inst = Inst::Nop4;
2053             inst.emit(&[], &mut buf, &info, &mut state);
2054         }
2055 
2056         buf.bind_label(label(2), state.ctrl_plane_mut());
2057         let inst = Inst::Nop4;
2058         inst.emit(&[], &mut buf, &info, &mut state);
2059 
2060         buf.bind_label(label(3), state.ctrl_plane_mut());
2061         let inst = Inst::Nop4;
2062         inst.emit(&[], &mut buf, &info, &mut state);
2063 
2064         let buf = buf.finish(&constants, state.ctrl_plane_mut());
2065 
2066         assert_eq!(2000000 + 8, buf.total_size());
2067 
2068         let mut buf2 = MachBuffer::new();
2069         let mut state = Default::default();
2070         let inst = Inst::CondBr {
2071             kind: CondBrKind::NotZero(xreg(0)),
2072 
2073             // This conditionally taken branch has a 19-bit constant, shifted
2074             // to the left by two, giving us a 21-bit range in total. Half of
2075             // this range positive so the we should be around 1 << 20 bytes
2076             // away for our jump target.
2077             //
2078             // There are two pending fixups by the time we reach this point,
2079             // one for this 19-bit jump and one for the unconditional 26-bit
2080             // jump below. A 19-bit veneer is 4 bytes large and the 26-bit
2081             // veneer is 20 bytes large, which means that pessimistically
2082             // assuming we'll need two veneers we need 24 bytes of extra
2083             // space, meaning that the actual island should come 24-bytes
2084             // before the deadline.
2085             taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20),
2086 
2087             // This branch is in-range so no veneers are technically
2088             // be needed; however because we resolve *all* pending
2089             // fixups that cross an island when that island occurs, it
2090             // will have a veneer as well. This veneer comes just
2091             // after the one above. (Note that because the CondBr has
2092             // two instructions, the conditinoal and unconditional,
2093             // this offset is the same, though the veneer is four
2094             // bytes later.)
2095             not_taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20),
2096         };
2097         inst.emit(&[], &mut buf2, &info, &mut state);
2098 
2099         let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2100 
2101         assert_eq!(&buf.data[0..8], &buf2.data[..]);
2102     }
2103 
2104     #[test]
2105     fn test_island_backward() {
2106         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2107         let mut buf = MachBuffer::new();
2108         let mut state = <Inst as MachInstEmit>::State::default();
2109         let constants = Default::default();
2110 
2111         buf.reserve_labels_for_blocks(4);
2112 
2113         buf.bind_label(label(0), state.ctrl_plane_mut());
2114         let inst = Inst::Nop4;
2115         inst.emit(&[], &mut buf, &info, &mut state);
2116 
2117         buf.bind_label(label(1), state.ctrl_plane_mut());
2118         let inst = Inst::Nop4;
2119         inst.emit(&[], &mut buf, &info, &mut state);
2120 
2121         buf.bind_label(label(2), state.ctrl_plane_mut());
2122         while buf.cur_offset() < 2000000 {
2123             let inst = Inst::Nop4;
2124             inst.emit(&[], &mut buf, &info, &mut state);
2125         }
2126 
2127         buf.bind_label(label(3), state.ctrl_plane_mut());
2128         let inst = Inst::CondBr {
2129             kind: CondBrKind::NotZero(xreg(0)),
2130             taken: target(0),
2131             not_taken: target(1),
2132         };
2133         inst.emit(&[], &mut buf, &info, &mut state);
2134 
2135         let buf = buf.finish(&constants, state.ctrl_plane_mut());
2136 
2137         assert_eq!(2000000 + 12, buf.total_size());
2138 
2139         let mut buf2 = MachBuffer::new();
2140         let mut state = Default::default();
2141         let inst = Inst::CondBr {
2142             kind: CondBrKind::NotZero(xreg(0)),
2143             taken: BranchTarget::ResolvedOffset(8),
2144             not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)),
2145         };
2146         inst.emit(&[], &mut buf2, &info, &mut state);
2147         let inst = Inst::Jump {
2148             dest: BranchTarget::ResolvedOffset(-(2000000 + 8)),
2149         };
2150         inst.emit(&[], &mut buf2, &info, &mut state);
2151 
2152         let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2153 
2154         assert_eq!(&buf.data[2000000..], &buf2.data[..]);
2155     }
2156 
2157     #[test]
2158     fn test_multiple_redirect() {
2159         // label0:
2160         //   cbz x0, label1
2161         //   b label2
2162         // label1:
2163         //   b label3
2164         // label2:
2165         //   nop
2166         //   nop
2167         //   b label0
2168         // label3:
2169         //   b label4
2170         // label4:
2171         //   b label5
2172         // label5:
2173         //   b label7
2174         // label6:
2175         //   nop
2176         // label7:
2177         //   ret
2178         //
2179         // -- should become:
2180         //
2181         // label0:
2182         //   cbz x0, label7
2183         // label2:
2184         //   nop
2185         //   nop
2186         //   b label0
2187         // label6:
2188         //   nop
2189         // label7:
2190         //   ret
2191 
2192         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2193         let mut buf = MachBuffer::new();
2194         let mut state = <Inst as MachInstEmit>::State::default();
2195         let constants = Default::default();
2196 
2197         buf.reserve_labels_for_blocks(8);
2198 
2199         buf.bind_label(label(0), state.ctrl_plane_mut());
2200         let inst = Inst::CondBr {
2201             kind: CondBrKind::Zero(xreg(0)),
2202             taken: target(1),
2203             not_taken: target(2),
2204         };
2205         inst.emit(&[], &mut buf, &info, &mut state);
2206 
2207         buf.bind_label(label(1), state.ctrl_plane_mut());
2208         let inst = Inst::Jump { dest: target(3) };
2209         inst.emit(&[], &mut buf, &info, &mut state);
2210 
2211         buf.bind_label(label(2), state.ctrl_plane_mut());
2212         let inst = Inst::Nop4;
2213         inst.emit(&[], &mut buf, &info, &mut state);
2214         inst.emit(&[], &mut buf, &info, &mut state);
2215         let inst = Inst::Jump { dest: target(0) };
2216         inst.emit(&[], &mut buf, &info, &mut state);
2217 
2218         buf.bind_label(label(3), state.ctrl_plane_mut());
2219         let inst = Inst::Jump { dest: target(4) };
2220         inst.emit(&[], &mut buf, &info, &mut state);
2221 
2222         buf.bind_label(label(4), state.ctrl_plane_mut());
2223         let inst = Inst::Jump { dest: target(5) };
2224         inst.emit(&[], &mut buf, &info, &mut state);
2225 
2226         buf.bind_label(label(5), state.ctrl_plane_mut());
2227         let inst = Inst::Jump { dest: target(7) };
2228         inst.emit(&[], &mut buf, &info, &mut state);
2229 
2230         buf.bind_label(label(6), state.ctrl_plane_mut());
2231         let inst = Inst::Nop4;
2232         inst.emit(&[], &mut buf, &info, &mut state);
2233 
2234         buf.bind_label(label(7), state.ctrl_plane_mut());
2235         let inst = Inst::Ret {
2236             rets: vec![],
2237             stack_bytes_to_pop: 0,
2238         };
2239         inst.emit(&[], &mut buf, &info, &mut state);
2240 
2241         let buf = buf.finish(&constants, state.ctrl_plane_mut());
2242 
2243         let golden_data = vec![
2244             0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14
2245             0x1f, 0x20, 0x03, 0xd5, // nop
2246             0x1f, 0x20, 0x03, 0xd5, // nop
2247             0xfd, 0xff, 0xff, 0x17, // b 0
2248             0x1f, 0x20, 0x03, 0xd5, // nop
2249             0xc0, 0x03, 0x5f, 0xd6, // ret
2250         ];
2251 
2252         assert_eq!(&golden_data[..], &buf.data[..]);
2253     }
2254 
2255     #[test]
2256     fn test_handle_branch_cycle() {
2257         // label0:
2258         //   b label1
2259         // label1:
2260         //   b label2
2261         // label2:
2262         //   b label3
2263         // label3:
2264         //   b label4
2265         // label4:
2266         //   b label1  // note: not label0 (to make it interesting).
2267         //
2268         // -- should become:
2269         //
2270         // label0, label1, ..., label4:
2271         //   b label0
2272         let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2273         let mut buf = MachBuffer::new();
2274         let mut state = <Inst as MachInstEmit>::State::default();
2275         let constants = Default::default();
2276 
2277         buf.reserve_labels_for_blocks(5);
2278 
2279         buf.bind_label(label(0), state.ctrl_plane_mut());
2280         let inst = Inst::Jump { dest: target(1) };
2281         inst.emit(&[], &mut buf, &info, &mut state);
2282 
2283         buf.bind_label(label(1), state.ctrl_plane_mut());
2284         let inst = Inst::Jump { dest: target(2) };
2285         inst.emit(&[], &mut buf, &info, &mut state);
2286 
2287         buf.bind_label(label(2), state.ctrl_plane_mut());
2288         let inst = Inst::Jump { dest: target(3) };
2289         inst.emit(&[], &mut buf, &info, &mut state);
2290 
2291         buf.bind_label(label(3), state.ctrl_plane_mut());
2292         let inst = Inst::Jump { dest: target(4) };
2293         inst.emit(&[], &mut buf, &info, &mut state);
2294 
2295         buf.bind_label(label(4), state.ctrl_plane_mut());
2296         let inst = Inst::Jump { dest: target(1) };
2297         inst.emit(&[], &mut buf, &info, &mut state);
2298 
2299         let buf = buf.finish(&constants, state.ctrl_plane_mut());
2300 
2301         let golden_data = vec![
2302             0x00, 0x00, 0x00, 0x14, // b 0
2303         ];
2304 
2305         assert_eq!(&golden_data[..], &buf.data[..]);
2306     }
2307 
2308     #[test]
2309     fn metadata_records() {
2310         let mut buf = MachBuffer::<Inst>::new();
2311         let ctrl_plane = &mut Default::default();
2312         let constants = Default::default();
2313 
2314         buf.reserve_labels_for_blocks(1);
2315 
2316         buf.bind_label(label(0), ctrl_plane);
2317         buf.put1(1);
2318         buf.add_trap(TrapCode::HeapOutOfBounds);
2319         buf.put1(2);
2320         buf.add_trap(TrapCode::IntegerOverflow);
2321         buf.add_trap(TrapCode::IntegerDivisionByZero);
2322         buf.add_call_site(Opcode::Call);
2323         buf.add_reloc(
2324             Reloc::Abs4,
2325             &ExternalName::User(UserExternalNameRef::new(0)),
2326             0,
2327         );
2328         buf.put1(3);
2329         buf.add_reloc(
2330             Reloc::Abs8,
2331             &ExternalName::User(UserExternalNameRef::new(1)),
2332             1,
2333         );
2334         buf.put1(4);
2335 
2336         let buf = buf.finish(&constants, ctrl_plane);
2337 
2338         assert_eq!(buf.data(), &[1, 2, 3, 4]);
2339         assert_eq!(
2340             buf.traps()
2341                 .iter()
2342                 .map(|trap| (trap.offset, trap.code))
2343                 .collect::<Vec<_>>(),
2344             vec![
2345                 (1, TrapCode::HeapOutOfBounds),
2346                 (2, TrapCode::IntegerOverflow),
2347                 (2, TrapCode::IntegerDivisionByZero)
2348             ]
2349         );
2350         assert_eq!(
2351             buf.call_sites()
2352                 .iter()
2353                 .map(|call_site| (call_site.ret_addr, call_site.opcode))
2354                 .collect::<Vec<_>>(),
2355             vec![(2, Opcode::Call)]
2356         );
2357         assert_eq!(
2358             buf.relocs()
2359                 .iter()
2360                 .map(|reloc| (reloc.offset, reloc.kind))
2361                 .collect::<Vec<_>>(),
2362             vec![(2, Reloc::Abs4), (3, Reloc::Abs8)]
2363         );
2364     }
2365 }
2366