========================
Display Core Debug tools
========================

In this section, you will find helpful information on debugging the amdgpu
driver from the display perspective. This page introduces debug mechanisms and
procedures to help you identify whether an issue is related to display code.

Narrow down display issues
==========================

Since the display is the driver's visual component, it is common to see users
reporting an issue as a display issue when another component causes the
problem. This section equips users to determine if a specific issue was caused
by the display component or another part of the driver.

DC dmesg important messages
---------------------------

The dmesg log is the first source of information to be checked, and amdgpu
takes advantage of this feature by logging some valuable information. When
looking for the issues associated with amdgpu, remember that each component of
the driver (e.g., smu, PSP, dm, etc.) is loaded one by one, and this
information can be found in the dmesg log. In this sense, look for the part of
the log that looks like the below log snippet::

  [    4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
  [    4.254718] [drm] register mmio base: 0xFCB00000
  [    4.254918] [drm] register mmio size: 1048576
  [    4.260095] [drm] add ip block number 0 <soc21_common>
  [    4.260318] [drm] add ip block number 1 <gmc_v11_0>
  [    4.260510] [drm] add ip block number 2 <ih_v6_0>
  [    4.260696] [drm] add ip block number 3 <psp>
  [    4.260878] [drm] add ip block number 4 <smu>
  [    4.261057] [drm] add ip block number 5 <dm>
  [    4.261231] [drm] add ip block number 6 <gfx_v11_0>
  [    4.261402] [drm] add ip block number 7 <sdma_v6_0>
  [    4.261568] [drm] add ip block number 8 <vcn_v4_0>
  [    4.261729] [drm] add ip block number 9 <jpeg_v4_0>
  [    4.261887] [drm] add ip block number 10 <mes_v11_0>

From the above example, you can see the line that reports that `<dm>`
(**Display Manager**) was loaded, which means that display can be part of the
issue. If you do not see that line, something else might have failed before
amdgpu loaded the display component, indicating that the issue is not a
display issue.
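The check described above can be scripted. The following is a minimal sketch,
not an amdgpu tool: `dm_block_id` is a hypothetical helper name, and the
pattern assumes the `add ip block number N <dm>` message format shown in the
snippet above, which may differ between kernel versions.

```shell
# Print the IP block number assigned to <dm>, reading a dmesg log on stdin.
# Prints nothing if the log contains no "<dm>" line.
# Assumption: the message format matches the snippet in this page.
dm_block_id() {
    sed -n 's/.*add ip block number \([0-9][0-9]*\) <dm>.*/\1/p' | head -n 1
}

# Usage (reading the kernel log usually requires root):
#   dmesg | dm_block_id
```

An empty result suggests that the display component was never registered, so
the failure probably happened earlier in the driver load sequence.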

After you have identified that the DM was loaded correctly, you can check for
the display version of the hardware in use, which can be retrieved from the
dmesg log with the command::

  dmesg | grep -i 'display core'

This command shows a message that looks like this::

  [    4.655828] [drm] Display Core v3.2.285 initialized on DCN 3.2

This message has two key pieces of information:

* **The DC version (e.g., v3.2.285)**: Display developers release a new DC version
  every week, and this information can be advantageous in a situation where a
  user/developer must find a good point versus a bad point based on a tested
  version of the display code. Remember from page :ref:`Display Core <amdgpu-display-core>`,
  that every week the new patches for display are heavily tested with IGT and
  manual tests.
* **The DCN version (e.g., DCN 3.2)**: The DCN block is associated with the
  hardware generation, and the DCN version conveys the hardware generation that
  the driver is currently running on. This information helps to narrow down the
  code debug area since each DCN version has its files in the DC folder per DCN
  component (from the example, the developer might want to focus on the
  files/folders/functions/structs with the dcn32 label, since those might be
  executed). However, keep in mind that DC reuses code across different DCN
  versions; for example, it is expected to have some callbacks set in one DCN
  that are the same as those from another DCN. In summary, use the DCN version
  just as a guide.
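Both values can be pulled out of the log line in one step. The sketch below is
only an illustration: `dc_versions` is a hypothetical helper name, and it
assumes the `Display Core v<X> initialized on DCN <Y>` wording shown above, so
it prints nothing for lines that use a different wording.

```shell
# Read dmesg text on stdin and print "DC=<version> DCN=<version>" for the
# first line that matches the message format shown in this page.
dc_versions() {
    sed -n 's/.*Display Core v\([0-9.]*\) initialized on DCN \([0-9.]*\).*/DC=\1 DCN=\2/p' | head -n 1
}

# Usage:
#   dmesg | dc_versions
```

Including both values in a bug report lets developers map the report to a
weekly DC release and to the relevant DCN code.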

From the dmesg log, it is also possible to get the ATOM BIOS code by using::

  dmesg | grep -i 'ATOM BIOS'

Which generates an output that looks like this::

  [    4.274534] amdgpu: ATOM BIOS: 113-D7020100-102

This type of information is useful to include in bug reports.

Avoid loading display core
--------------------------

Sometimes, it might be hard to figure out which part of the driver is causing
the issue; if you suspect that the display is not part of the problem and your
bug scenario is simple (e.g., some desktop configuration), you can try to remove
the display component from the equation. First, you need to identify the `dm`
ID from the dmesg log; for example, search for the following log::

  [    4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
  [..]
  [    4.260095] [drm] add ip block number 0 <soc21_common>
  [    4.260318] [drm] add ip block number 1 <gmc_v11_0>
  [..]
  [    4.261057] [drm] add ip block number 5 <dm>

Notice from the above example that the `dm` ID is 5 for this specific hardware.
Next, you need to run the following binary operation to identify the IP block
mask::

  0xffffffff & ~(1 << [DM ID])

From our example, the IP mask is::

  0xffffffff & ~(1 << 5) = 0xffffffdf

Finally, to disable DC, you just need to add the below parameter to the kernel
command line in your bootloader::

  amdgpu.ip_block_mask=0xffffffdf

If you can boot your system with the DC disabled and still see the issue, it
means you can rule DC out of the equation. However, if the bug disappears, you
still need to consider DC as part of the problem and keep narrowing down the
issue. In some scenarios, disabling DC is impossible since it might be
necessary to use the display component to reproduce the issue (e.g., play a
game).

**Note: This will probably lead to the absence of a display output.**
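The mask computation above can be done with shell arithmetic. This is a small
sketch; `ip_block_mask` is a hypothetical helper name, and the ID passed to it
must be the `dm` number reported by dmesg on your hardware.

```shell
# Print the mask that clears only the bit of the given IP block ID,
# i.e. 0xffffffff & ~(1 << ID), formatted as an 8-digit hex value.
ip_block_mask() {
    printf '0x%08x\n' $(( 0xffffffff & ~(1 << $1) ))
}

# Usage, for the example hardware where <dm> is block 5:
#   echo "amdgpu.ip_block_mask=$(ip_block_mask 5)"
```

For block 5 the function prints `0xffffffdf`, which matches the value derived
by hand above.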

Display flickering
------------------

Display flickering might have multiple causes; one is the lack of proper power
to the GPU or problems in the DPM switches. A good first generic verification
is to force the GPU to its highest performance level::

  bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"

The above command sets the GPU/APU to use the maximum power allowed, which
disables DPM switches. If forcing DPM levels high does not fix the issue, it
is less likely that the issue is related to power management. If the issue
disappears, there is a good chance that other components might be involved, and
the display should not be ignored since this could be a DPM issue. From the
display side, if the power increase fixes the issue, it is worth debugging the
clock configuration and the pipe split policy used in the specific
configuration.
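When running this experiment, it can help to record the previous performance
level so that it can be restored afterwards. The following is a minimal
sketch: `force_dpm_high` is a hypothetical helper name, and the default path
assumes that the GPU under test is `card0`.

```shell
# Force the performance level to "high" and print the previous value.
# $1 optionally overrides the sysfs file (default assumes card0).
force_dpm_high() {
    dpm_file=${1:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    dpm_previous=$(cat "$dpm_file") || return 1
    echo high > "$dpm_file" || return 1
    echo "$dpm_previous"
}

# Usage (as root):
#   old=$(force_dpm_high)
#   ... try to reproduce the flickering ...
#   echo "$old" > /sys/class/drm/card0/device/power_dpm_force_performance_level
```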

Display artifacts
-----------------

Users may see some screen artifacts that can be categorized into two different
types: localized artifacts and general artifacts. The localized artifacts
happen in some specific areas, such as around the UI window corners; if you see
this type of issue, there is a considerable chance that you have a userspace
problem, likely Mesa or similar. The general artifacts usually happen on the
entire screen. They might be caused by a misconfiguration at the driver level
of the display parameters, but the userspace might also cause this issue. One
way to identify the source of the problem is to take a screenshot or make a
desktop video capture when the problem happens; after checking the
screenshot/video recording, if you don't see any of the artifacts, it means
that the issue is likely on the driver side. If you can still see the
problem in the data collected, it is an issue that probably happened during
rendering, and the display code just received an already corrupted framebuffer.

Disabling/Enabling specific features
====================================

DC has a struct named `dc_debug_options`, which is statically initialized by
all DCE/DCN components based on the specific hardware characteristics. This
structure usually facilitates the bring-up phase since developers can start
with many disabled features and enable them individually. This is also an
important debug feature since users can change it when debugging specific
issues.

For example, dGPU users sometimes see a problem where a horizontal band of
flickering happens in some specific part of the screen. This could be an
indication of Sub-Viewport issues; after the users have identified the target
DCN, they can set the `force_disable_subvp` field to true in the statically
initialized version of `dc_debug_options` to see if the issue gets fixed. Along
the same lines, users/developers can also try to turn off `fams2_config` and
`enable_single_display_2to1_odm_policy`. In summary, `dc_debug_options` is a
useful tool for narrowing down the problem.

DC Visual Confirmation
======================

Display core provides a feature named visual confirmation, which is a set of
bars added at the scanout time by the driver to convey some specific
information. In general, you can enable this debug option by using::

  echo <N> > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

Where `N` is an integer that selects the specific scenario that the developer
wants to enable. You will see some of these debug cases in the following
subsections.

Multiple Planes Debug
---------------------

If you want to enable or debug multiple planes in a specific user-space
application, you can leverage a debug feature named visual confirm. To enable
it, use::

  echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

You need to reload your GUI to see the visual confirmation. When the plane
configuration changes or a full update occurs, there will be a colored bar at
the bottom of each hardware plane being drawn on the screen.

* The color indicates the format - For example, red is AR24 and green is NV12
* The height of the bar indicates the index of the plane
* Pipe split can be observed if there are two bars with a difference in height
  covering the same plane

Consider the video playback case in which a video is played in a specific
plane, and the desktop is drawn in another plane. The video plane should
feature one or two green bars at the bottom of the video depending on pipe
split configuration.

* There should **not** be any visual corruption
* There should **not** be any underflow or screen flashes
* There should **not** be any black screens
* There should **not** be any cursor corruption
* Multiple planes **may** be briefly disabled during window transitions or
  resizing but should come back after the action has finished

Pipe Split Debug
----------------

Sometimes we need to debug if DCN is splitting pipes correctly, and visual
confirmation is also handy for this case. Similar to the multiple planes case,
you can use the below command to enable visual confirmation::

  echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

In this case, if you have a pipe split, you will see one small red bar at the
bottom of the display covering the entire display width and another bar
covering the second pipe. In other words, you will see a slightly taller bar
on the second pipe.

DTN Debug
=========

DC (DCN) provides an extensive log that dumps multiple details from our
hardware configuration. You can capture those status values with the Display
Test Next (DTN) log, which is available via debugfs by using::

  cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log

Since this log is updated as the DCN status changes, you can also follow the
changes in real-time by using something like::

  sudo watch -d cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log

When reporting a bug related to DC, consider attaching this log before and
after you reproduce the bug.
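The before/after capture suggested above can be wrapped in a small helper.
This is only a sketch: `save_dtn_log`, the output directory, and the file
naming are hypothetical, and the default log path assumes DRI device 0.

```shell
# Copy the DTN log to <outdir>/dtn-<label>.log.
# $1 = label (e.g. "before"), $2 = output directory,
# $3 = optional log path (default assumes DRI device 0).
save_dtn_log() {
    mkdir -p "$2" &&
    cat "${3:-/sys/kernel/debug/dri/0/amdgpu_dm_dtn_log}" > "$2/dtn-$1.log"
}

# Usage (as root):
#   save_dtn_log before /tmp/dtn
#   ... reproduce the bug ...
#   save_dtn_log after /tmp/dtn
#   diff -u /tmp/dtn/dtn-before.log /tmp/dtn/dtn-after.log
```

Attaching both files, or the diff between them, makes it easier to see which
hardware state changed when the bug happened.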

Collect Firmware information
============================

When reporting issues, it is important to have the firmware information since
it can be helpful for debugging purposes. To get all the firmware information,
use the command::

  cat /sys/kernel/debug/dri/0/amdgpu_firmware_info

From the display perspective, pay attention to the firmware of the DMCU and
DMCUB.
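To show only the display-related entries, the output can be filtered. The
sketch below assumes that the relevant lines contain the strings DMCU or
DMCUB; `display_firmware_info` is a hypothetical helper name, and the default
path assumes DRI device 0.

```shell
# Print only the lines that mention DMCU or DMCUB (case-insensitive match).
# $1 optionally overrides the firmware info file (default assumes DRI device 0).
display_firmware_info() {
    grep -i 'dmcu' "${1:-/sys/kernel/debug/dri/0/amdgpu_firmware_info}"
}

# Usage (as root):
#   display_firmware_info
```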

DMUB Firmware Debug
===================

Sometimes, dmesg logs aren't enough. This is especially true if a feature is
implemented primarily in DMUB firmware. In such cases, all we see in dmesg when
an issue arises is some generic timeout error. So, to get more relevant
information, we can trace DMUB commands by enabling the relevant bits in
`amdgpu_dm_dmub_trace_mask`.

Currently, we support the tracing of the following groups:

Trace Groups
------------

.. csv-table::
   :header-rows: 1
   :widths: 1, 1
   :file: ./trace-groups-table.csv

**Note: Not all ASICs support all of the listed trace groups**

So, to enable just PSR tracing you can use the following command::

  # echo 0x8020 > /sys/kernel/debug/dri/0/amdgpu_dm_dmub_trace_mask

Then, you need to enable logging trace events to the buffer, which you can do
using the following::

  # echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_trace_event_en

Lastly, after you are able to reproduce the issue you are trying to debug,
you can disable tracing and read the trace log by using the following::

  # echo 0 > /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_trace_event_en
  # cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_tracebuffer

So, when reporting bugs related to features such as PSR and ABM, consider
enabling the relevant bits in the mask before reproducing the issue and
attaching the log that you obtain from the trace buffer to any bug reports that
you create.
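The three steps above can be grouped into two helpers. This is a sketch under
stated assumptions: the function names `dmub_trace_start` and
`dmub_trace_stop` are hypothetical, and the default directory assumes DRI
device 0. The mask value must come from the trace groups table.

```shell
# Set the trace mask ($1) and enable logging of trace events.
# $2 optionally overrides the debugfs directory (default assumes DRI device 0).
dmub_trace_start() {
    dmub_dir=${2:-/sys/kernel/debug/dri/0}
    echo "$1" > "$dmub_dir/amdgpu_dm_dmub_trace_mask" &&
    echo 1 > "$dmub_dir/amdgpu_dm_dmcub_trace_event_en"
}

# Disable logging of trace events and print the trace buffer.
# $1 optionally overrides the debugfs directory.
dmub_trace_stop() {
    dmub_dir=${1:-/sys/kernel/debug/dri/0}
    echo 0 > "$dmub_dir/amdgpu_dm_dmcub_trace_event_en" &&
    cat "$dmub_dir/amdgpu_dm_dmub_tracebuffer"
}

# Usage (as root), tracing PSR only:
#   dmub_trace_start 0x8020
#   ... reproduce the issue ...
#   dmub_trace_stop > dmub-trace.log
```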