admin 管理员组文章数量: 887021
DRIVER
文章目录
- DRIVER_POWER_STATE_FAILURE蓝屏问题分析
- 1. 问题背景
- 2. 技术概要
- 2.1 PopEnableIrpWatchdog
- 2.1.1 PopComputeWatchdogTimeout
- 2.1.2 PopIrpWatchdog
- 2.2 IofCallDriver
- 2.3 PopIrpWorker
- 2.4 总结
- 3. 问题分析
- 4. 总结
- 5. 附录
DRIVER_POWER_STATE_FAILURE蓝屏问题分析
最近有网友新组装的电脑比较稳定的出现DRIVER_POWER_STATE_FAILURE
蓝屏,尝试了网上许多的办法都没有解决,请求帮忙分析一下转储文件,本文记录一下整体的分析过程。
1. 问题背景
电脑比较有规律的出现蓝屏问题,如下:
4: kd> !targetinfo
Windows 10 Built by: 22000.1.amd64fre.co_release.210604-1628,
Kernel debug session
Crash Time: Mon Jan 24 18:55:57 2022
System Uptime: 0 days 0:06:25
Version: 10.0
Processors: 12 proc (F/M/S Vendor)0 6/167/1 GenuineIntel1 6/167/1 GenuineIntel2 6/167/1 GenuineIntel3 6/167/1 GenuineIntel>4 6/167/1 GenuineIntel5 6/167/1 GenuineIntel6 6/167/1 GenuineIntel7 6/167/1 GenuineIntel8 6/167/1 GenuineIntel9 6/167/1 GenuineIntel10 6/167/1 GenuineIntel11 6/167/1 GenuineIntel
这里我们可以发现,系统运行6分钟25秒的时候出现崩溃(据反馈,开机大致7分钟出现蓝屏),和现象基本一致。蓝屏代码如下:
DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: ffff808f2bd19360, Physical Device Object of the stack
Arg3: ffffd501e185f090, nt!TRIAGE_9F_POWER on Win7 and higher, otherwise the Functional Device Object of the stack
Arg4: ffff808f2bc13970, The blocked IRP
DRIVER_POWER_STATE_FAILURE
表示是电源状态错误(在指定时间内没有处理完成电源IRP请求出现蓝屏),蓝屏堆栈如下:
# Child-SP RetAddr Call Site
00 ffffd501`e185f058 fffff801`4b9c25e7 nt!KeBugCheckEx
01 ffffd501`e185f060 fffff801`4b9c2501 nt!PopIrpWatchdogBugcheck+0xdf
02 ffffd501`e185f0d0 fffff801`4b6cc394 nt!PopIrpWatchdog+0x31
03 ffffd501`e185f120 fffff801`4b6ca984 nt!KiProcessExpiredTimerList+0x204
04 ffffd501`e185f250 fffff801`4b819a4e nt!KiRetireDpcList+0x714
05 ffffd501`e185f500 00000000`00000000 nt!KiIdleLoop+0x9e
2. 技术概要
通常,为了给一个设备发送电源相关的IRP,都是通过PoRequestPowerIrp
来完成的,这个函数的声明如下:
NTSTATUS PoRequestPowerIrp(PDEVICE_OBJECT DeviceObject,UCHAR MinorFunction,POWER_STATE PowerState,PREQUEST_POWER_COMPLETE CompletionFunction,__drv_aliasesMem PVOID Context,PIRP *Irp
);
在Win10系统下面,PoRequestPowerIrp
这个函数的实现大致如下:
- 调用
PopAllocateIrp
来创建一个IRP对象,并将完成历程设置为PopRequestCompletion
- 调用
PopQueueQuerySetIrp
来完成一个查询或者设置电源请求的IRP。
对于PopQueueQuerySetIrp
的实现可以总结为如下:
- 调用
PopEnableIrpWatchdog
给IRP设置一个看门狗定时器。 - 调用
IofCallDriver
来完成IRP。
2.1 PopEnableIrpWatchdog
PopEnableIrpWatchdog
这个函数的流程大致可以总结为如下:
- 通过
PopComputeWatchdogTimeout
计算出看门狗的超时时间。 - 然后初始化一个DPC,并设置好定时器。
- DPC的回调例程是
PopIrpWatchdog
.
2.1.1 PopComputeWatchdogTimeout
PopComputeWatchdogTimeout
是获取的如下两个值中的一个:
PopWatchdogSleepTimeout
。PopWatchdogResumeTimeout
。
这两个值的结果如下:
4: kd> dd nt!PopWatchdogSleepTimeout L1
fffff801`4c105078 0000012c
4: kd> dd nt!PopWatchdogResumeTimeout L1
fffff801`4c105150 00000078
这里PopWatchdogSleepTimeout
是300秒,跟我们开机7分钟之后蓝屏比较接近(也就是说,这个IRP大致是开机后2分钟之内发起的)。
2.1.2 PopIrpWatchdog
PopIrpWatchdog
这个是看门狗定时器的回调例程,这个函数大致实现如下:
void __thiscall __noreturn PopIrpWatchdogBugcheck(_DWORD *this, int a2)
{//...TriagePower.IrpList = (_LIST_ENTRY *)&PopIrpList;TriagePower.Signature = 0x8000u;TriagePower.Revision = 2;TriagePower.DelayedWorkQueue = (_TRIAGE_EX_WORK_QUEUE *)ExWorkerQueue;TriagePower.DelayedIoWorkQueue = (_TRIAGE_EX_WORK_QUEUE *)IoWorkerQueue;TriagePower.ThreadList = (_LIST_ENTRY *)&PopIrpThreadList;KeBugCheckEx(0x9Fu, 3u, DeviceObject, &TriagePower, Irp);
}
在这个超时回调函数中,回调用KeBugCheckEx(0x9Fu, 3u, DeviceObject, &TriagePower, Irp);
来使得系统强制蓝屏。
2.2 IofCallDriver
在电源IRP中,IofCallDriver
函数里面使用PoHandleIrp
来处理IRP的调用,PoHandleIrp
这个函数使用PopDispatchQuerySetIrp
来分发函数。
在PopDispatchQuerySetIrp
实现两个东西:
- 将IRP挂入
PopIrpWorkerList
队列中。 - 使用
KeReleaseSemaphore(&PopIrpWorkerSemaphore, 0, 1, 0)
通知PopIrpWorker
来处理。
2.3 PopIrpWorker
PopIrpWorker
这个worker是用来处理电源请求的线程,这个线程的线程栈如下:
4: kd> k*** Stack trace for last set context - .thread/.cxr resets it# Child-SP RetAddr Call Site
00 ffffd501`e1817060 fffff801`4b7324a7 nt!KiSwapContext+0x76
01 ffffd501`e18171a0 fffff801`4b734359 nt!KiSwapThread+0x3a7
02 ffffd501`e1817280 fffff801`4b72e274 nt!KiCommitThreadWait+0x159
03 ffffd501`e1817320 fffff801`4b7ab612 nt!KeWaitForSingleObject+0x234
04 ffffd501`e1817410 fffff801`4b6478f5 nt!PopIrpWorker+0x102
05 ffffd501`e18174b0 fffff801`4b819bb4 nt!PspSystemThreadStartup+0x55
06 ffffd501`e1817500 00000000`00000000 nt!KiStartSystemThread+0x34
在这个线程中等待PopIrpWorkerSemaphore
信号,然后从PopIrpWorkerList
去取出IRP执行,我们可以看一下整个Worker的响应线程调用栈,如下:
kd> !thread
THREAD 8a153040 Cid 0004.16d4 Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
Not impersonating
DeviceMap 8ac04330
Owning Process 8a1837c0 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 4667 Ticks: 2 (0:00:00:00.031)
Context Switch Count 9 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address nt!PopIrpWorker (0x81bb604e)
Stack Init c1503ca0 Current c15037dc Base c1504000 Limit c1501000 Call 00000000
Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr Args to Child
80e49ce0 8287f85f 00000000 00000000 8287fac0 ACPI!AsyncEvalObject (FPO: [Non-Fpo])
80e49d58 82879fab a0041802 8ead9a24 a763ac20 ACPI!ACPIGet+0x31f (FPO: [Non-Fpo])
80e49d94 8287c304 90dfe818 80e12300 80e49f18 ACPI!ACPIDevicePowerProcessPhase0DeviceSubPhase1+0x51 (FPO: [Non-Fpo])
80e49de0 81ab2898 828d0da0 00000000 00000000 ACPI!ACPIDevicePowerDpc+0x394 (FPO: [Non-Fpo])
80e49eb8 81ab2083 80e49f18 00000000 90dfe818 nt!KiExecuteAllDpcs+0x2b8 (FPO: [Non-Fpo])
80e49ff4 81bed85e c1503904 00000000 00000000 nt!KiRetireDpcList+0x123 (FPO: [Non-Fpo])
c1503928 81b7db53 00000008 828d0dc4 c150395c nt!KiDispatchInterrupt+0x2e (FPO: [Uses EBP] [0,0,1])
c1503938 8287a51a 910cca70 8eae4c30 00000000 nt!KfReleaseSpinLock+0x23 (FPO: [Non-Fpo])
c150395c 82897c9c 00000004 910cca70 00000006 ACPI!ACPIDeviceInitializePowerRequest+0x114 (FPO: [Non-Fpo])
c1503994 81aac394 00000004 828990b0 82896e34 ACPI!ACPIDeviceIrpDeviceFilterRequest+0xea (FPO: [Non-Fpo])
c15039e8 81aac14a c1503a18 82986c8d 910ccb44 nt!IopfCompleteRequest+0x244 (FPO: [Non-Fpo])
c15039f0 82986c8d 910ccb44 910cca70 910cca16 nt!IofCompleteRequest+0x1a (FPO: [Non-Fpo])
c1503a18 81bb636f 00963030 910cca70 90963030 pci!PciDispatchPnpPower+0xdd (FPO: [Non-Fpo])
c1503a30 81aaf96c 910ccb68 910cca70 c1503a48 nt!IopPoHandleIrp+0x27 (FPO: [Non-Fpo])
c1503a40 81bbb78a c1503a78 82895c65 90963030 nt!IofCallDriver+0x5c (FPO: [Non-Fpo])
c1503a48 82895c65 90963030 910cca70 82895bbe nt!IoCallDriver+0x10 (FPO: [Non-Fpo])
c1503a78 82872395 909338d0 910cca70 910ccb68 ACPI!ACPIFilterIrpSetPower+0xa7 (FPO: [Non-Fpo])
c1503ab0 81bb636f 909338d0 910cca70 909338d0 ACPI!ACPIDispatchIrp+0x345 (FPO: [Non-Fpo])
c1503ac8 81aaf96c 910cca70 909b50e0 c1503ae0 nt!IopPoHandleIrp+0x27 (FPO: [Non-Fpo])
c1503ad8 81bbb78a c1503b00 82c0f5be 909338d0 nt!IofCallDriver+0x5c (FPO: [Non-Fpo])
c1503ae0 82c0f5be 909338d0 910cca70 910cca70 nt!IoCallDriver+0x10 (FPO: [Non-Fpo])
c1503b00 82c26f6c 909b5028 909b5320 82c57001 storport!RaidAdapterDevicePowerstopAdapter+0x98 (FPO: [Non-Fpo])
c1503b14 82c0ed97 909b5028 910cca70 00000000 storport!RaidAdapterStopOnPowerdown+0x181cc
c1503b44 82c0ec95 910cca70 909b6cd8 909b6cd8 storport!GatewayRegisterForEmptyNotification+0x4d (FPO: [Non-Fpo])
c1503b60 82c0e551 909b50e0 00000004 910cca70 storport!RaidAdapterPowerDownDevice+0x4f (FPO: [0,0,4])
c1503b90 82c0e262 910cca70 00000002 909b50e0 storport!RaidAdapterSetDevicePowerIrp+0x89 (FPO: [Non-Fpo])
c1503bac 82c0e0d2 910f94c0 910cca70 909b5028 storport!RaidAdapterSetPowerIrp+0x48 (FPO: [Non-Fpo])
c1503bc4 82c0e049 909b5028 910f94c0 910cca70 storport!RaidAdapterPowerIrp+0x52 (FPO: [Non-Fpo])
c1503bd8 81bb61d4 909b5028 910cca70 00000000 storport!RaDriverPowerIrp+0x39 (FPO: [Non-Fpo])
c1503c38 81b120f0 a358d590 7c778104 00000000 nt!PopIrpWorker+0x186 (FPO: [Non-Fpo])
c1503c70 81bee18d 81bb604e a358d590 00000000 nt!PspSystemThreadStartup+0x4a (FPO: [Non-Fpo])
c1503c7c 00000000 00000000 0401ff00 00000023 nt!KiThreadStartup+0x15
2.4 总结
综上,我们总结一下PoRequestPowerIrp
发起电源IRP的整个过程。
- 创建一个IRP。
- 设置好IRP对应的DPC定时器看门狗,如果看门狗超时,那么就会直接蓝屏。
- 将IRP放入到
PopIrpWorkerList
队列中,提交给PopIrpWorker
来处理。 PopIrpWorker
从队列中取出IRP,然后发送IRP到指定的设备栈。
这里我们可以推测的两种情况:
- IRP正常完成,那么取消DPC的看门狗。
- 如果IRP超时,那么DPC定时器相应,然后蓝屏。
3. 问题分析
通过上面分析,我们可以看一下整个分析过程,我们可以初步得出的信息有:
- IRP发往的设备对象为
ffff808f2bd19360
. - 蓝屏时候的
TRIAGE_9F_POWER
结构地址为ffffd501e185f090
. - 超时的IRP地址为
ffff808f2bc13970
.
在Windbg中提供了!poaction
来解析上述的两个链表(PopIrpList
和PopIrpThreadList
), 如下:
4: kd> !poaction
PopAction: fffff8014c022640State..........: 0 - IdleUpdates........: 0 Action.........: NoneLightest State.: UnspecifiedFlags..........: 10000003 QueryApps|UIAllowedIrp minor......: ??System State...: UnspecifiedHiber Context..: 0000000000000000Allocated power irps (PopIrpList - fffff8014c022e20)IRP: ffff808f33a045a0 (wait-wake/S0), PDO: ffff808f338dddc0IRP: ffff808f33a389a0 (wait-wake/S0), PDO: ffff808f33910d40IRP: ffff808f3395b010 (wait-wake/S4), PDO: ffff808f33a570a0IRP: ffff808f3395c010 (wait-wake/S4), PDO: ffff808f33a95060IRP: ffff808f33957010 (wait-wake/S4), PDO: ffff808f33a86120IRP: ffff808f33959010 (wait-wake/S4), PDO: ffff808f33ab7060IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50Irp worker threads (PopIrpThreadList - fffff8014c01f970)THREAD: ffff808f2a745040 (static)THREAD: ffff808f2a744040 (static)Broadcast in progress: FALSE
Is Directed DRIPS Transition: FALSENo Device State present
针对IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50,我们看看每个设备的电源状态是否设置正常,如下:
4: kd> !podev ffff808f2bd19360
Device object is for:DriverObject 2bbd4c90
Current Irp 00000000 RefCount 0 Type 00000004 AttachedDev ffff808f2bcc5d50 DevFlags 00001040
Device queue is not busy.
Device Object Extension: ffff808f2bd19c68:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:4: kd> !podev ffff808f2bcc5d50
Device object is for:DriverObject 2a9cf060
Current Irp 00000000 RefCount 0 Type 00000032 AttachedDev ffff808f2bc0f050 DevFlags 00000000
Device queue is not busy.
Device Object Extension: ffff808f2bcc5ea0:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:4: kd> !podev ffff808f2bc0f050
Device object is for:DriverObject 2bc6e830
Current Irp 00000000 RefCount 0 Type 00000004 DevFlags 00000050
Device queue is not busy.
Device Object Extension: ffff808f2bc138e0:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:
可以发现每个设备的电源状态为DeviceState=4,看起来似乎是正常的,已经设置成功了,针对这个IRP状态如下:
4: kd> !irp ffff808f2bc13970
Irp is active with 6 stacks 4 is current (= 0xffff808f2bc13b18)No Mdl: No System Buffer: Thread 00000000: Irp stack trace. cmd flg cl Device File Completion-Context[N/A(0), N/A(0)]0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000[N/A(0), N/A(0)]0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000[IRP_MJ_POWER(16), IRP_MN_WAIT_WAKE(0)]0 0 ffff808f2bd19360 00000000 fffff8014ddee1d0-fffff8014ddee5b0 \Driver\pci ACPI!ACPIDeviceIrpDeviceFilterRequestArgs: 00000000 00000000 00000000 00000000
>[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]0 e1 ffff808f2bcc5d50 00000000 fffff8014e321b60-00000000 Success Error Cancel pending\Driver\ACPI storport!RaidAdapterPowerDownDeviceCompletionArgs: 00000000 00000001 00000004 00000000[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]0 e1 ffff808f2bc0f050 00000000 fffff8014b79f830-ffff808f2a6e1208 Success Error Cancel pending\Driver\storahci nt!PopRequestCompletionArgs: 00000000 00000001 00000004 00000000[N/A(0), N/A(0)]0 0 00000000 00000000 00000000-ffff808f2a6e1208 Args: 00000000 00000000 00000000 00000000
我们得知IRP_MN_SET_POWER
IRP没有完成导致超时蓝屏,这里有两个可疑点:
IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50
:说明这个IRP是将硬件设备设置为D3状态,对于PowerDeviceD3
表明是将设备进行关闭的调用,这对于一个存储设备来说是比较可疑的。IRP_MN_SET_POWER
IRP发往哪里进行处理了。
我们先从比较简单的情况来看,看一下IRP在什么地方进行处理了,从!irp ffff808f2bc13970
命令大致可以得到,PCI总线已经响应了电源请求处理(也就是说调用了IoCompleteRequest
),但是在完成例程中再次对IRP处理的时候失去了响应。
我们对电源处理的Worker和PCI处理的Worker线程进行查看,发现如下:
4: kd> !process 0xffff808f2a6e7040 7THREAD ffff808f2a744040 Cid 0004.0010 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertablefffff8014c022600 Semaphore Limit 0x7fffffffNot impersonatingDeviceMap ffffae88c1438c3fOwning Process ffff808f2a6e7040 Image: SystemAttached Process N/A Image: N/AWait Start TickCount 9730 Ticks: 14919 (0:00:03:53.109)Context Switch Count 63 IdealProcessor: 0 UserTime 00:00:00.000KernelTime 00:00:00.000Win32 Start Address nt!PopIrpWorker (0xfffff8014b7ab510)Stack Init ffffd501e1817530 Current ffffd501e1817020Base ffffd501e1818000 Limit ffffd501e1811000 Call 0000000000000000Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5Child-SP RetAddr : Args to Child : Call Siteffffd501`e1817060 fffff801`4b7324a7 : fffff801`47b24180 fffff801`ffffffff ffff808f`00000000 00000000`00000001 : nt!KiSwapContext+0x76ffffd501`e18171a0 fffff801`4b734359 : ffff808f`00000000 fffff801`0000000a ffffd501`e1817380 00000000`00000000 : nt!KiSwapThread+0x3a7ffffd501`e1817280 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159ffffd501`e1817320 fffff801`4b7ab612 : fffff801`4c022600 ffff808f`00000000 00000000`00000000 00000000`00000000 : nt!KeWaitForSingleObject+0x234ffffd501`e1817410 fffff801`4b6478f5 : 00000000`00000000 fffff801`4b7ab510 00000000`00000000 00000000`000000d4 : nt!PopIrpWorker+0x102ffffd501`e18174b0 fffff801`4b819bb4 : ffffbf01`2b372180 ffff808f`2a744040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55ffffd501`e1817500 00000000`00000000 : ffffd501`e1818000 ffffd501`e1811000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34THREAD ffff808f2a745040 Cid 0004.0014 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertablefffff8014c022600 Semaphore Limit 0x7fffffffNot impersonatingDeviceMap ffffae88c1438c3fOwning Process ffff808f2a6e7040 Image: SystemAttached Process N/A Image: N/AWait Start TickCount 9413 Ticks: 15236 (0:00:03:58.062)Context Switch Count 108 IdealProcessor: 0 UserTime 00:00:00.000KernelTime 00:00:00.000Win32 Start Address nt!PopIrpWorker (0xfffff8014b7ab510)Stack Init ffffd501e181f530 Current ffffd501e181f020Base ffffd501e1820000 Limit ffffd501e1819000 Call 0000000000000000Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5Child-SP RetAddr : Args to Child : Call Siteffffd501`e181f060 fffff801`4b7324a7 : fffff801`47b24180 fffff801`47b24180 ffff808f`00000000 00000000`00000003 : nt!KiSwapContext+0x76ffffd501`e181f1a0 fffff801`4b734359 : ffff808f`00000000 fffff801`00000003 ffffd501`e181f380 00007f70`00000000 : nt!KiSwapThread+0x3a7ffffd501`e181f280 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159ffffd501`e181f320 fffff801`4b7ab612 : fffff801`4c022600 ffff808f`00000000 00000000`00000000 00000000`00000000 : nt!KeWaitForSingleObject+0x234ffffd501`e181f410 fffff801`4b6478f5 : 00000000`00000000 fffff801`4b7ab510 00000000`00000000 00000000`000000da : nt!PopIrpWorker+0x102ffffd501`e181f4b0 fffff801`4b819bb4 : ffffbf01`2b1ce180 ffff808f`2a745040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55ffffd501`e181f500 00000000`00000000 : ffffd501`e1820000 ffffd501`e1819000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34THREAD ffff808f2a6d9040 Cid 0004.014c Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertablefffff8014de40a60 NotificationEventfffff8014de40a20 NotificationEventNot impersonatingDeviceMap ffffae88c1438c3fOwning Process ffff808f2a6e7040 Image: SystemAttached Process N/A Image: N/AWait Start TickCount 5451 Ticks: 19198 (0:00:04:59.968)Context Switch Count 845 IdealProcessor: 4 UserTime 00:00:00.000KernelTime 00:00:00.062Win32 Start Address ACPI!ACPIWorkerThread (0xfffff8014dde5550)Stack Init ffffd501e1b78530 Current ffffd501e1b78040Base ffffd501e1b79000 Limit ffffd501e1b72000 Call 0000000000000000Priority 16 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5Child-SP RetAddr : Args to Child : Call Siteffffd501`e1b78080 fffff801`4b7324a7 : ffffbf01`2b1ce180 00000000`ffffffff ffff808f`2eb32158 fffff801`4de0208a : nt!KiSwapContext+0x76ffffd501`e1b781c0 fffff801`4b734359 : 00180001`00000004 00000000`00000003 ffffd501`e1b783d8 ffff808f`00000000 : nt!KiSwapThread+0x3a7ffffd501`e1b782a0 fffff801`4b68ecd1 : ffff808f`00000000 fffff801`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159ffffd501`e1b78340 fffff801`4dde55ea : ffff808f`00000000 fffff801`4de40a00 00000000`00000000 ffffbf01`2b1ce180 : nt!KeWaitForMultipleObjects+0x2b1ffffd501`e1b78440 fffff801`4b6478f5 : ffff808f`2a6d9040 fffff801`4dde5550 00000000`00000000 ffff808f`2a6e7040 : ACPI!ACPIWorkerThread+0x9affffd501`e1b784b0 fffff801`4b819bb4 : ffffbf01`2b1ce180 ffff808f`2a6d9040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55ffffd501`e1b78500 00000000`00000000 : ffffd501`e1b79000 ffffd501`e1b72000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34
从上面来看,暂时没有发现相关的异常,因此从IRP来跟踪,难度已经比较大了。不过这里有个比较特殊的点就是Wait Start TickCount 5451 Ticks: 19198 (0:00:04:59.968),似乎ACPIWorkerThread
大致已经等了5分钟,这和我们发起的IRP_MN_SET_POWER
超时时间基本一致,但是这也并不能说明什么。
接下来我们看一下IRP对应设备栈的状态,大致如下:
4: kd> !devstack ffff808f2bc0f050!DevObj !DrvObj !DevExt ObjectName
> ffff808f2bc0f050 \Driver\storahci ffff808f2bc0f1a0 RaidPort0ffff808f2bcc5d50 \Driver\ACPI ffff808f2bafa420 ffff808f2bd19360 \Driver\pci ffff808f2bd194b0 NTPNP_PCI0006
!DevNode ffff808f2bbdbc40 :DeviceInst is "PCI\VEN_8086&DEV_43D2&SUBSYS_86941043&REV_11\3&11583659&0&B8"ServiceName is "storahci"
4: kd> !DevNode ffff808f2bbdbc40
DevNode 0xffff808f2bbdbc40 for PDO 0xffff808f2bd19360Parent 0xffff808f2a715c40 Sibling 0xffff808f2bbe9c40 Child 0000000000 InstancePath is "PCI\VEN_8086&DEV_43D2&SUBSYS_86941043&REV_11\3&11583659&0&B8"ServiceName is "storahci"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)StateHistory[09] = DeviceNodeAwaitingQueuedRemoval (0x30f)StateHistory[08] = DeviceNodeAwaitingQueuedDeletion (0x30e)StateHistory[07] = DeviceNodeStopped (0x30a)StateHistory[06] = DeviceNodeQueryStopped (0x309)StateHistory[05] = DeviceNodeStarted (0x308)StateHistory[04] = DeviceNodeStartPostWork (0x307)StateHistory[03] = DeviceNodeStartCompletion (0x306)StateHistory[02] = DeviceNodeStartPending (0x305)StateHistory[01] = DeviceNodeResourcesAssigned (0x304)StateHistory[00] = DeviceNodeUninitialized (0x301)StateHistory[19] = Unknown State (0x0)StateHistory[18] = Unknown State (0x0)StateHistory[17] = Unknown State (0x0)StateHistory[16] = Unknown State (0x0)StateHistory[15] = Unknown State (0x0)StateHistory[14] = Unknown State (0x0)StateHistory[13] = Unknown State (0x0)StateHistory[12] = Unknown State (0x0)StateHistory[11] = Unknown State (0x0)StateHistory[10] = Unknown State (0x0)Flags (0x6c000030) DNF_ENUMERATED, DNF_IDS_QUERIED, DNF_NO_LOWER_DEVICE_FILTERS, DNF_NO_LOWER_CLASS_FILTERS, DNF_NO_UPPER_DEVICE_FILTERS, DNF_NO_UPPER_CLASS_FILTERS
这里有一个非常奇怪的地方就是State = DeviceNodeStopped (0x30a),设备的状态已经处于Stop状态了,看起来这个状态是异常的,因为一个设备处于工作状态应该是State = DeviceNodeStarted (0x308),但是缺少信息继续深入下去了。
我们看一下整个系统有没有处于异常情况的操作状态,结果发现如下:
4: kd> !thread ffff808f2a6d4040
THREAD ffff808f2a6d4040 Cid 0004.016c Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertableffffd501e1bb7a00 SynchronizationEvent
IRP List:ffff808f40b82010: (0006,01f0) Flags: 00000000 Mdl: 00000000
Not impersonating
DeviceMap ffffae88c1438c3f
Owning Process ffff808f2a6e7040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 9546 Ticks: 15103 (0:00:03:55.984)
Context Switch Count 23469 IdealProcessor: 6
UserTime 00:00:00.000
KernelTime 00:00:01.015
Win32 Start Address nt!ExpWorkerThread (0xfffff8014b71cf50)
Stack Init ffffd501e1bb8530 Current ffffd501e1bb75c0
Base ffffd501e1bb9000 Limit ffffd501e1bb2000 Call 0000000000000000
Priority 13 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
ffffd501`e1bb7600 fffff801`4b7324a7 : ffffbf01`2b372180 fffff801`ffffffff 00000000`00000001 ffffd501`00000000 : nt!KiSwapContext+0x76
ffffd501`e1bb7740 fffff801`4b734359 : ffff808f`00000006 00000000`00000004 ffffd501`e1bb7920 00000000`00000000 : nt!KiSwapThread+0x3a7
ffffd501`e1bb7820 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159
ffffd501`e1bb78c0 fffff801`4de6f0d5 : ffffd501`e1bb7a00 ffff808f`00000000 ffff808f`2cf8f300 fffff801`4de2fb00 : nt!KeWaitForSingleObject+0x234
ffffd501`e1bb79b0 fffff801`4ddecdf0 : 00000000`00000000 ffff808f`2a6b0aa0 ffff808f`40b82010 ffff808f`2a6b0aa8 : ACPI!ACPIInitStopDevice+0xd9
ffffd501`e1bb7a30 fffff801`4ddc13d9 : ffff808f`2a6b0aa0 ffff808f`40b82002 ffff808f`40b82010 ffff808f`40b82170 : ACPI!ACPIFilterIrpRemoveDevice+0x250
ffffd501`e1bb7ae0 fffff801`4b703025 : 00000000`00000007 ffff808f`3141c180 ffff808f`40b82010 00000000`74727044 : ACPI!ACPIDispatchIrp+0x3c9
ffffd501`e1bb7b60 fffff801`63adbad3 : ffff808f`3141c180 ffff808f`3141c180 ffff808f`40b82010 ffff808f`3141c030 : nt!IofCallDriver+0x55
ffffd501`e1bb7ba0 fffff801`6396ed55 : ffff808f`3141c102 ffff808f`3141c030 00000000`00000000 ffff808f`40b82010 : dxgkrnl!DpiFdoHandleRemoveDevice+0x3b3
ffffd501`e1bb7be0 fffff801`6392c7ea : ffff808f`3141c030 ffff808f`40b82010 ffff808f`40b82010 00000000`00000000 : dxgkrnl!DpiFdoDispatchPnp+0xd5
ffffd501`e1bb7c80 fffff801`676856db : 00000000`c0000002 ffffd501`e1bb7e19 ffff808f`31587000 ffff808f`3141c030 : dxgkrnl!DpiDispatchPnp+0xea
ffffd501`e1bb7da0 fffff801`67685104 : ffff808f`3141c030 00000000`c0000002 ffff808f`31587000 ffffae88`d2ae2e70 : nvlddmkm!nvDumpConfig+0x4e423b
ffffd501`e1bb7e80 fffff801`4b703025 : ffff808f`3141c030 00000000`00000000 ffffd501`e1bb7fa0 00000000`69706e04 : nvlddmkm!nvDumpConfig+0x4e3c64
ffffd501`e1bb7eb0 fffff801`4badf364 : 00000000`c00000bb ffff808f`3141c030 00000000`00000000 ffff808f`2bd04b08 : nt!IofCallDriver+0x55
ffffd501`e1bb7ef0 fffff801`4bbcab04 : 00000000`00000002 ffff808f`2bd04120 ffff808f`2bd04ae0 ffff808f`2bd04120 : nt!IopSynchronousCall+0xf8
ffffd501`e1bb7f60 fffff801`4b76b961 : ffffae88`d2e84f00 ffff808f`2bd04ae0 00000000`00000200 00000000`0000000a : nt!IopRemoveDevice+0x108
ffffd501`e1bb8010 fffff801`4bbcba9e : ffff808f`2bd04ae0 ffffae88`00000015 00000000`00000000 cb3a4008`00200001 : nt!PnpRemoveLockedDeviceNode+0x1a9
ffffd501`e1bb8070 fffff801`4bbcb803 : ffff808f`2bd04ae0 ffffd501`e1bb80f0 ffff808f`2bd04120 00000000`00000000 : nt!PnpDeleteLockedDeviceNode+0x52
ffffd501`e1bb80b0 fffff801`4bbc9b4c : ffff808f`2bd04120 00000000`00000002 ffff808f`2bd04120 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xcf
ffffd501`e1bb8130 fffff801`4bbc6d99 : ffffd501`e1bb8270 ffff808f`2bd04a00 ffff808f`414e9d00 ffffae88`00000002 : nt!PnpProcessQueryRemoveAndEject+0x2fc
ffffd501`e1bb8210 fffff801`4bba6605 : ffffae88`d2e84f00 ffffae88`d150fe10 ffff808f`2a6e1b00 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x109
ffffd501`e1bb8240 fffff801`4b71d09f : ffff808f`2a6e1bd0 ffff808f`2a6d4040 ffff808f`414e9d90 fffff801`00000000 : nt!PnpDeviceEventWorker+0x2c5
ffffd501`e1bb82c0 fffff801`4b6478f5 : ffff808f`2a6d4040 ffffbf01`2b6b0000 ffff808f`2a6d4040 005fa4ef`b59bbfff : nt!ExpWorkerThread+0x14f
ffffd501`e1bb84b0 fffff801`4b819bb4 : ffffbf01`2b6a1180 ffff808f`2a6d4040 fffff801`4b6478a0 e0458b48`217503fa : nt!PspSystemThreadStartup+0x55
ffffd501`e1bb8500 00000000`00000000 : ffffd501`e1bb9000 ffffd501`e1bb2000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34
这个操作看起来也比较奇怪(dxgkrnl!DpiFdoHandleRemoveDevice),因为显卡设备一般来说都不会拔除掉(并且这个IRP似乎也没有被底层设备完成了),我们看下显卡设备的状态如下:
4: kd> !DevNode ffff808f2bd04ae0
DevNode 0xffff808f2bd04ae0 for PDO 0xffff808f2bd04120Parent 0xffff808f2bbd6c40 Sibling 0xffff808f2bd06ae0 Child 0xffff808f36386cb0 InstancePath is "PCI\VEN_10DE&DEV_2489&SUBSYS_150A7377&REV_A1\4&39fddc03&0&0008"ServiceName is "nvlddmkm"State = DeviceNodeRemoved (0x312)Previous State = DeviceNodeStopped (0x30a)StateHistory[13] = DeviceNodeStopped (0x30a)StateHistory[12] = DeviceNodeAwaitingQueuedRemoval (0x30f)StateHistory[11] = DeviceNodeAwaitingQueuedDeletion (0x30e)StateHistory[10] = DeviceNodeStopped (0x30a)StateHistory[09] = DeviceNodeAwaitingQueuedRemoval (0x30f)StateHistory[08] = DeviceNodeAwaitingQueuedDeletion (0x30e)StateHistory[07] = DeviceNodeStopped (0x30a)StateHistory[06] = DeviceNodeQueryStopped (0x309)StateHistory[05] = DeviceNodeStarted (0x308)StateHistory[04] = DeviceNodeStartPostWork (0x307)StateHistory[03] = DeviceNodeStartCompletion (0x306)StateHistory[02] = DeviceNodeStartPending (0x305)StateHistory[01] = DeviceNodeResourcesAssigned (0x304)StateHistory[00] = DeviceNodeUninitialized (0x301)StateHistory[19] = Unknown State (0x0)StateHistory[18] = Unknown State (0x0)StateHistory[17] = Unknown State (0x0)StateHistory[16] = Unknown State (0x0)StateHistory[15] = Unknown State (0x0)StateHistory[14] = Unknown State (0x0)Flags (0x6e0000f0) DNF_ENUMERATED, DNF_IDS_QUERIED, DNF_HAS_BOOT_CONFIG, DNF_BOOT_CONFIG_RESERVED, DNF_UNINSTALLED, DNF_NO_LOWER_DEVICE_FILTERS, DNF_NO_LOWER_CLASS_FILTERS, DNF_NO_UPPER_DEVICE_FILTERS, DNF_NO_UPPER_CLASS_FILTERSUserFlags (0x00000300) Unknown flags 0x00000300CapabilityFlags (0x00600400) WakeFromD0Unknown flags 0x00600000
处于DeviceNodeRemoved状态,显然这个也是有点异常的。
接下来我们看一下所有硬件设备树的状态信息,如下:
4: kd> !devnode 0 1
Dumping IopRootDeviceNode (= 0xffff808f2a7dbaa0)
DevNode 0xffff808f2a7dbaa0 for PDO 0xffff808f2a8cbd50InstancePath is "HTREE\ROOT\0"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)DevNode 0xffff808f2a8c6aa0 for PDO 0xffff808f2a8ced70InstancePath is "ROOT\volmgr\0000"ServiceName is "volmgr"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)DevNode 0xffff808f2e510ca0 for PDO 0xffff808f2e50cc10InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#0000000000200000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffae88c18b0990 b 0xffffae88c4069c90State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff808f2e514ca0 for PDO 0xffff808f2e50d8f0InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#0000000012E00000"ServiceName is "volume"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff808f2e515ca0 for PDO 0xffff808f2e5108f0InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#000000001AE00000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffae88c18b0910 b 0xffffae88c4069090State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff808f2e5478a0 for PDO 0xffff808f2e5148f0InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#00000013EEF00000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffae88c1859690 b 0xffffae88c4069c10State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)
竟然发现整个设备树的设备都是DeviceNodeStopped状态,是什么情况导致的呢?还未可知。
分析进行到这一步似乎在继续下去已经比较困难了,是否这里可以做一个相关猜想:由于硬件的异常,导致设备都处于Stopped状态了,这样硬件设备对于默认IRP的处理就出现问题,例如IRP_MN_SET_POWER,导致IRP处理超时蓝屏。但这仅仅是一个猜想,真实原因是否这样也不可知。
当然这个问题后续的方法可以通过动态调试,从开机开始观察整个系统和硬件设备的状态,来更进一步确认原因;由于现场环境的缺失(并且网友后面通过多次重装系统解决了),并且本人对于硬件驱动的了解也不是特别深入(上述设备的状态是否异常,已经是否有跟进一步的信息确认),本次分析就到这里了。
4. 总结
我们先来看一个问题,据反馈系统大致运行7分钟之后出现蓝屏,为什么会是这个时间呢?因为下面这个超时时间:
4: kd> dd nt!PopWatchdogSleepTimeout L1
fffff801`4c105078 0000012c
4: kd> ? 0000012c
Evaluate expression: 300 = 00000000`0000012c
300秒刚好是5分钟,估计这个IRP应该是在开机大致2分钟上下的时间产生的,因此也就导致系统在7分钟的时间出现蓝屏。
综上,上述问题的产生似乎并非软件引起,那么是否是硬件上面的不兼容呢(组装硬件的兼容性问题)?看起来比较大概率是的,但是没有最终的结论。
很遗憾,这个问题花了我周末一天的时间,依旧没有找到具体的根因?如果有网友看到本文章,有思路和建议欢迎留言给出。
5. 附录
这个问题网友后面通过重装了多次操作系统然后解决了,让其帮忙取一下正常情况的转储文件,发现设备状态如下:
0: kd> !devnode 0 1
Dumping IopRootDeviceNode (= 0xffff8d87e27f1aa0)
DevNode 0xffff8d87e27f1aa0 for PDO 0xffff8d87e27f0d70InstancePath is "HTREE\ROOT\0"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)DevNode 0xffff8d87e27f2aa0 for PDO 0xffff8d87e28e7d70InstancePath is "ROOT\volmgr\0000"ServiceName is "volmgr"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)DevNode 0xffff8d87e6527ca0 for PDO 0xffff8d87e65218f0InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#0000000000200000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffe5889e602f10 b 0xffffe5889e3b7b90State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff8d87e6528ca0 for PDO 0xffff8d87e65278f0InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#0000000012E00000"ServiceName is "volume"State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff8d87e6550ca0 for PDO 0xffff8d87e65288f0InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#000000001AE00000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffe5889e603a90 b 0xffffe5889e3b7610State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff8d87e6555ca0 for PDO 0xffff8d87e65508f0InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#00000013EEF00000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffe5889e603290 b 0xffffe5889e3b7790State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)DevNode 0xffff8d87e65568a0 for PDO 0xffff8d87e65558f0InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#000000141B000000"ServiceName is "volume"TargetDeviceNotify List - f 0xffffe5889e60f790 b 0xffffe5889e3b7f90State = DeviceNodeStopped (0x30a)Previous State = DeviceNodeQueryStopped (0x309)
这个设备状态依旧是State = DeviceNodeStopped (0x30a),但是系统仍然可以正常使用;看起来似乎挺奇怪的,但也先到此为止吧!
本文标签: DRIVER
版权声明:本文标题:DRIVER 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.freenas.com.cn/jishu/1712983665h625989.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论