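Vulnerability Title
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
Vulnerability Description
In the Linux kernel, the following vulnerability has been resolved:
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
The PCI AER model is an awkward fit for CXL error handling. A PCI device is expected to be able to recover from an AER event by escalating to a link reset, but the same reset on CXL amounts to a surprise memory hotplug of massive amounts of memory.
At present, the CXL error handler reaps some RAS register values and then optimistically unbinds the device from the cxl_mem driver. This is only a "hopeful" attempt to unplug the memory, with no guarantee that it will succeed.
An AER notification that arrives after the memdev unbind event can no longer assume the registers are mapped. Check that the memdev is bound before reaping status register values, to avoid crashes of the form:
BUG: unable to handle page fault for address: ffa00000195e9100
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page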
[...]
RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
[...]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x82/0x160
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x113/0x170
? asm_exc_page_fault+0x26/0x30
? __pfx_dpc_reset_link+0x10/0x10
? __cxl_handle_ras+0x30/0x110 [cxl_core]
? find_cxl_port+0x59/0x80 [cxl_core]
cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
cxl_error_detected+0x6c/0xf0 [cxl_core]
report_error_detected+0xc7/0x1c0
pci_walk_bus+0x73/0x90
pcie_do_recovery+0x23f/0x330
Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might need to be replaced with a new PCI_ERS_RESULT_PANIC.
CVSS Information
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Vulnerability Category
Improper Resource Shutdown or Release
Vulnerability Title
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
Vulnerability Description
In the Linux kernel, the following vulnerability has been resolved:
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
The PCI AER model is an awkward fit for CXL error handling. While the
expectation is that a PCI device can escalate to link reset to recover
from an AER event, the same reset on CXL amounts to a surprise memory
hotplug of massive amounts of memory.
At present, the CXL error handler attempts some optimistic error
handling to unbind the device from the cxl_mem driver after reaping some
RAS register values. This results in a "hopeful" attempt to unplug the
memory, but there is no guarantee that will succeed.
A subsequent AER notification after the memdev unbind event can no
longer assume the registers are mapped. Check for memdev bind before
reaping status register values to avoid crashes of the form:
BUG: unable to handle page fault for address: ffa00000195e9100
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
[...]
RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
[...]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x82/0x160
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x113/0x170
? asm_exc_page_fault+0x26/0x30
? __pfx_dpc_reset_link+0x10/0x10
? __cxl_handle_ras+0x30/0x110 [cxl_core]
? find_cxl_port+0x59/0x80 [cxl_core]
cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
cxl_error_detected+0x6c/0xf0 [cxl_core]
report_error_detected+0xc7/0x1c0
pci_walk_bus+0x73/0x90
pcie_do_recovery+0x23f/0x330
Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might
need to be replaced with a new PCI_ERS_RESULT_PANIC.
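The guard described above (check that the memdev is still bound before reaping RAS status) can be modeled in user space as follows. This is a minimal sketch, not the kernel patch: `struct fake_memdev`, `handle_error`, `reap_ras_status`, `unbind`, and the `ERS_*` values are hypothetical stand-ins, and the locking that a real handler would need to serialize against unbind is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, loosely mirroring the recover/disconnect split. */
enum ers_result { ERS_CAN_RECOVER, ERS_DISCONNECT };

/* Hypothetical model of a memory device and its RAS register mapping. */
struct fake_memdev {
    bool bound;                /* true while a driver is attached */
    const uint32_t *ras_regs;  /* valid only while bound; NULL after unbind */
};

/* Reads the first status register. Callers must ensure the mapping is valid. */
static uint32_t reap_ras_status(const struct fake_memdev *md)
{
    return md->ras_regs[0];
}

/* Models driver unbind: the register mapping goes away with the driver. */
static void unbind(struct fake_memdev *md)
{
    md->bound = false;
    md->ras_regs = NULL;
}

/*
 * Error handler with the guard: if the device is no longer bound, skip the
 * register read entirely and report a disconnect instead of dereferencing
 * an unmapped address.
 */
static enum ers_result handle_error(const struct fake_memdev *md,
                                    uint32_t *status_out)
{
    if (!md->bound || md->ras_regs == NULL)
        return ERS_DISCONNECT;

    *status_out = reap_ras_status(md);
    return ERS_CAN_RECOVER;
}
```

In this model, a first call to `handle_error` on a bound device returns the register value; after `unbind`, a second call returns `ERS_DISCONNECT` without touching `ras_regs`, which is the situation the crash trace above corresponds to when the check is missing.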
CVSS Information
N/A
Vulnerability Category
N/A
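Vulnerability Title
Linux kernel security vulnerability
Vulnerability Description
The Linux kernel is the kernel used by Linux, the open-source operating system of the Linux Foundation (USA). The Linux kernel contains a security vulnerability in RAS error handling: after the CXL.mem device has been unbound, the CXL error handler may still read RAS registers that are no longer mapped, causing a kernel page fault.
CVSS Information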
N/A
Vulnerability Category
Other