Practical analysis of dump and strace commands causing watchdog problems caused by keystore deadlock

1. Problem description

Recently, a system stuck problem occurred. It is speculated that a watchdog appears in system_server. Looking at the watchdog file, it is PackageManager stuck in the KeyStore.clearUid() interface:

----- pid 1091 at 2023-10-26 09:40:10 -----
Cmd line: system_server
ABI: 'arm64'

"PackageManager" sysTid=1352
  #00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl + 4)
  #01 pc 00000000000290d0 /system/lib64/libc.so (ioctl + 136)
  #02 pc 000000000005a5e4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool) + 256)
  #03 pc 000000000005b48c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*) + 340)
  #04 pc 000000000005b1c0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 176)
  #05 pc 0000000000051318 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 72)
  #06 pc 000000000012f72c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int) + 152)
  #07 pc 0000000000406e34 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transactNative + 196)
  #08 pc 0000000000c00d5c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transact + 748)
  #09 pc 000000000065e4e8 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.IKeystoreService$Stub$Proxy.clear_uid + 216)
  #10 pc 0000000000556578 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.KeyStore.clearUid + 88)
  #11 pc 00000000014bdba4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removeKeystoreDataIfNeeded + 244)
  #12 pc 00000000014bea94 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removePackageDataLIF + 2836)
  #13 pc 00000000014900b0 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deleteInstalledPackageLIF + 688)
  #14 pc 000000000149161c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageLIF + 3532)
  #15 pc 00000000014dd554 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageX + 2724)
  #16 pc 0000000000d9735c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService$12.run + 476)
  #17 pc 0000000000ab737c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Handler.dispatchMessage + 76)
  #18 pc 0000000000aba4e0 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Looper.loop + 1264)
  #19 pc 0000000000ab9380 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.HandlerThread.run + 544)
  #20 pc 00000000007d57b4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.ServiceThread.run + 100)
  #21 pc 0000000000554d88 /system/lib64/libart.so (art_quick_invoke_stub + 584)
  #22 pc 00000000000cf6c8 /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*) + 200)
  #23 pc 000000000045c840 /system/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const & amp;, art::ArtMethod*, art::(anonymous namespace)::ArgArray* , art::JValue*, char const*) + 104)
  #24 pc 000000000045d8fc /system/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const & amp;, _jobject*, _jmethodID*, jvalue*) + 424)
  #25 pc 0000000000488a94 /system/lib64/libart.so (art::Thread::CreateCallback(void*) + 1120)
  #26 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*) + 36)
  #27 pc 0000000000023478 /system/lib64/libc.so (__start_thread + 68)

The problem occurs when some apps are installed and uninstalled from the app store. After further investigation, this problem occurred after installing and deleting an application. Using adb uninstall, you can find that the command line is stuck in uninstallation and does not return normally.

After reading the code, the thread pool registered by the keystore service is 1. If the keystoreservice is stuck when deleting this application, the subsequent module that calls the keystoreservice interface will also be stuck.

Second, use the strace command

8155 platform keystoreservice is android. keymaster is Qualcomm’s qtikeymaster. So I wrote the following three batch commands to track the system calls of system_server, keystoreservice, and qtikeymaster.

1. Write the following batch program strace_keystore.bat to start capturing the system calls of keystore and keymaster.

echo "enable adb connecting first..."

adb wait-for-device
adb root

echo Wscript.sleep Wscript.Arguments(0)>Delay.vbs
Delay.vbs 5000
delDelay.vbs

adb shell mkdir /data/strace

md d:\stracelog

adb shell "echo 512 > /sys/module/binder/parameters/debug_mask"

adb shell pidof [email protected] > d:\stracelog\keymaster.pid

adb shell pidof keystore > d:\stracelog\keystore.pid

adb shell "strace -p `pidof keystore` -p `pidof [email protected]` -o /data/strace/process -f -ff -tt"

The above command will save the trace log in the system/data/strace directory. The d:\stracelog folder in the local windows directory saves the keystore and keymaster processes when a problem occurs, so that the /data/strace/ directory LOG can be exported later to confirm the process.

2. Reproduce the problem through adb uninstall com.xxx.car.

3. Write an export_strace_logs.bat batch program to export LOG.

adb pull /data/strace/ d:\stracelog
adb shell "rm /data/strace/*"

4. Export adb log at the same time.

5. Analyze the strace log and see that the keystore is stuck at:

09:47:17.190694 ioctl(4, BINDER_WRITE_READ, 0x7ff422e268) = 0
09:47:17.199964 ioctl(7, BINDER_WRITE_READ, 0x7ff4236a78) = 0
09:47:17.200665 futex(0x7339a462c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.200762 ioctl(7, BINDER_WRITE_READ, 0x7ff4236ca8) = 0
09:47:17.200844 futex(0x7339a462c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.200929 newfstatat(AT_FDCWD, "timestamp", {st_mode=S_IFREG|0600, st_size=0, ...}, 0) = 0
09:47:17.201304 ioctl(7, BINDER_WRITE_READ, 0x7ff422e138) = 0
09:47:17.201724 mprotect(0x73bbd6e000, 4096, PROT_READ|PROT_WRITE) = 0
09:47:17.201806 mprotect(0x73bbd6e000, 4096, PROT_READ) = 0
09:47:17.202226 futex(0x73bb575080, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.202310 ioctl(7, BINDER_WRITE_READ, 0x7ff422e208) = 0
09:47:17.203316 faccessat(AT_FDCWD, "user_0/10041_USRPKEY_no506b3822wb", R_OK|W_OK) = 0
09:47:17.203436 openat(AT_FDCWD, "user_0/10041_USRPKEY_no506b3822wb", O_RDONLY) = 9
09:47:17.203536 read(9, "\3\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0"..., 32824) = 823
09:47:17.203612 read(9, "", 32001) = 0
09:47:17.203670 close(9) = 0
09:47:17.203756 ioctl(4, BINDER_WRITE_READ

This is abnormal. This situation indicates that it is waiting for the return of the binder peer. If it is in the normal idle state, the last printing of the process should be blocked in the following code:

00:05:07.682578 ioctl(7, BINDER_WRITE_READ <detached ...>

Continue to view the strace log of qtikeymaster:

9:47:17.205302 close(6) = 0
09:47:17.205372 madvise(0x765dcb4000, 102400, MADV_DONTNEED) = 0
09:47:17.205447 madvise(0x765dc92000, 102400, MADV_DONTNEED) = 0
09:47:17.205526 openat(AT_FDCWD, "/dev/hab", O_RDWR) = 6
09:47:17.205613 ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0xa, 0x4, 0x10), 0x7fe6247b68) = 0
09:47:17.206340 getuid() = 1000
09:47:17.206412 writev(5, [{iov_base="\0a\1\245s8e\351,J\f", iov_len=11}, {iov_base="\4", iov_len=1}, {iov_base=" uhab\0", iov_len=5}, {iov_base="habmm_socket_open: opened fd 6, "..., iov_len=67}], 4) = 84
09:47:17.206709 ioctl(6, _IOC(_IOC_WRITE, 0xa, 0x2, 0x18), 0x7fe6247b70) = 0
09:47:17.206960 ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0xa, 0x3, 0x18)

Finally, it is also a read function stuck in the ioctl interface. This function has no return value, indicating that it is stuck. Since keystoreservice is also a client of qtikeymaster, the security environment is implemented by the manufacturer. Stuck on qtikeymaster.

3. Confirm which interface the card is on

By adding LOG to system/security/keystore/key_store_service.cpp, it was finally confirmed that the keystore interface called when the problematic APP is initialized when connected to the Internet and not connected to the Internet is different:

Internet-enabled installation:
keystore: del USRPKEY_CMCC_SDK
keystore: del USRCERT_CMCC_SDK
keystore: del CACERT_CMCC_SDK
keystore: del USRPKEY_no506b3822wb
keystore: del USRCERT_no506b3822wb
keystore: del CACERT_no506b3822wb
keystore: attestKey
keystore: del USRPKEY_sec506b3822
keystore: del USRCERT_sec506b3822
keystore: del CACERT_sec506b3822
With network uninstallation:
keystore: clear_uid
Installation without network:
keystore: del USRPKEY_CMCC_SDK
keystore: del USRCERT_CMCC_SDK
keystore: del CACERT_CMCC_SDK
Uninstall without network:
keystore: clear_uid

The last interface that got stuck was keystore: attestKey. Asked to confirm the call stack, we need the command to print the call stack when stuck:

dump1_system_server.bat

adb shell "debuggerd -b `pidof system_server`" >d:\stracelog\dump_system_server.txt

dump2_keystore.bat

adb shell "debuggerd -b `pidof keystore`" >d:\stracelog\dump_keystore.txt

dump3_keymaster.bat

adb shell "debuggerd -b `pidof [email protected]` " >d:\stracelog\dump_keymaster.txt

Note that the debuggerd -b command will cause the current process to exit the block state and continue. Therefore, if the locks have a nested relationship, the outer layer should be dumped first, and then the root layer application should be dumped. Therefore, the order of dump in this example is system_server first, and qtikeymaster last.

The result of dump is:

----- pid 962 at 2023-10-26 20:47:27 -----
Cmd line: system_server
ABI: 'arm64'


"PackageManager" sysTid=1121
  #00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl + 4)
  #01 pc 00000000000290d0 /system/lib64/libc.so (ioctl + 136)
  #02 pc 000000000005a5e4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool) + 256)
  #03 pc 000000000005b48c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*) + 340)
  #04 pc 000000000005b1c0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 176)
  #05 pc 0000000000051318 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 72)
  #06 pc 000000000012f72c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int) + 152)
  #07 pc 0000000000406e34 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transactNative + 196)
  #08 pc 0000000000c00d5c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transact + 748)
  #09 pc 000000000065e4e8 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.IKeystoreService$Stub$Proxy.clear_uid + 216)
  #10 pc 0000000000556578 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.KeyStore.clearUid + 88)
  #11 pc 00000000014bdba4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removeKeystoreDataIfNeeded + 244)
  #12 pc 00000000014bea94 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removePackageDataLIF + 2836)
  #13 pc 00000000014900b0 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deleteInstalledPackageLIF + 688)
  #14 pc 000000000149161c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageLIF + 3532)
  #15 pc 00000000014dd554 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageX + 2724)
  #16 pc 0000000000d97634 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService$12.run + 1204)
  #17 pc 0000000000ab737c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Handler.dispatchMessage + 76)
  #18 pc 0000000000aba4e0 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Looper.loop + 1264)
  #19 pc 0000000000ab9380 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.HandlerThread.run + 544)
  #20 pc 00000000007d57b4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.ServiceThread.run + 100)
  #21 pc 0000000000554d88 /system/lib64/libart.so (art_quick_invoke_stub + 584)
  #22 pc 00000000000cf6c8 /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*) + 200)
  #23 pc 000000000045c840 /system/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const & amp;, art::ArtMethod*, art::(anonymous namespace)::ArgArray* , art::JValue*, char const*) + 104)
  #24 pc 000000000045d8fc /system/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const & amp;, _jobject*, _jmethodID*, jvalue*) + 424)
  #25 pc 0000000000488a94 /system/lib64/libart.so (art::Thread::CreateCallback(void*) + 1120)
  #26 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*) + 36)
  #27 pc 0000000000023478 /system/lib64/libc.so (__start_thread + 68)





----- pid 904 at 2023-10-26 20:47:30 -----
Cmd line: /system/bin/keystore
ABI: 'arm64'

"keystore" sysTid=904
  #00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl + 4)
  #01 pc 00000000000290d0 /system/lib64/libc.so (ioctl + 136)
  #02 pc 000000000001cfdc /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool) + 240)
  #03 pc 000000000001e018 /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*) + 60)
  #04 pc 000000000001cce4 /system/lib64/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const & amp;, android::hardware::Parcel*, unsigned int , std::__1::function<void (android::hardware::Parcel & amp;)>) + 132)
  #05 pc 00000000000148e8 /system/lib64/[email protected] (android::hardware::keymaster::V4_0::BpHwKeymasterDevice::_hidl_attestKey(android::hardware::IInterface*, android::hardware: :details::HidlInstrumentor*, android::hardware::hidl_vec<unsigned char> const & amp;, android::hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const & amp;, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> co #06 pc 00000000000184d8 / system/lib64/[email protected] (android::hardware::keymaster::V4_0::BpHwKeymasterDevice::attestKey(android::hardware::hidl_vec<unsigned char> const & amp;, android:: hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const & amp;, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android: :hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> const & amp;)>) + 160)
  #07 pc 00000000000482c4 /system/bin/keystore (android::hardware::keymaster::V4_0::support::Keymaster4::attestKey(android::hardware::hidl_vec<unsigned char> const & amp;, android:: hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const & amp;, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android: :hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> const & amp;)>) + 212)
  #08 pc 000000000003d604 /system/bin/keystore (keystore::KeyStoreService::attestKey(android::String16 const & amp;, android::security::keymaster::KeymasterArguments const & amp;, android::security::keymaster ::KeymasterCertificateChain*, int*) + 832)
  #09 pc 0000000000016554 /system/lib64/libkeystore_aidl.so (android::security::BnKeystoreService::onTransact(unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 1432)
  #10 pc 000000000004fa60 /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const & amp;, android::Parcel*, unsigned int) + 136)
  #11 pc 000000000005ab00 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int) + 520)
  #12 pc 000000000005a83c /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand() + 156)
  #13 pc 000000000005af04 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool) + 60)
  #14 pc 0000000000049484 /system/bin/keystore (main.cfi + 1364)
  #15 pc 00000000000acac8 /system/lib64/libc.so (__libc_init + 88)

"HwBinder:904_1" sysTid=941
  #00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl + 4)
  #01 pc 00000000000290d0 /system/lib64/libc.so (ioctl + 136)
  #02 pc 000000000001cfdc /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool) + 240)
  #03 pc 000000000001425c /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand() + 20)
  #04 pc 0000000000014588 /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool) + 268)
  #05 pc 000000000001c1ac /system/lib64/libhwbinder.so (android::hardware::PoolThread::threadLoop() + 24)
  #06 pc 000000000000f9d8 /system/lib64/libutils.so (android::Thread::_threadLoop(void*) + 280)
  #07 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*) + 36)
  #08 pc 0000000000023478 /system/lib64/libc.so (__start_thread + 68)

----- end 904 -----


----- pid 354 at 2023-10-26 20:47:33 -----
Cmd line: /vendor/bin/hw/[email protected]
ABI: 'arm64'

"[email protected]" sysTid=354
  #00 pc 000000000006e1c8 /system/lib64/libc.so (__ioctl + 8)
  #01 pc 00000000000290d0 /system/lib64/libc.so (ioctl + 136)
  #02 pc 0000000000001260 /vendor/lib64/libuhab.so (habmm_socket_recv + 96)
  #03 pc 0000000000002150 /vendor/lib64/libkeymasterdeviceutils.so (km_comm_send_cmd + 444)
  #04 pc 000000000000cd00 /vendor/lib64/libqtikeymaster4.so (keymasterdevice::KeymasterHalDevice::attest_key(keymaster_key_blob_t const*, keymaster_key_param_set_t const*, keymaster_cert_chain_t*) + 892)
  #05 pc 000000000000919c /vendor/lib64/libqtikeymaster4.so (keymaster::V4_0::Keymaster4Device::attestKey(android::hardware::hidl_vec<unsigned char> const & amp;, android::hardware::hidl_vec<android: :hardware::keymaster::V4_0::KeyParameter> const & amp;, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android ::hardware::hidl_vec<unsigned char>> const & amp;)>) + 132)
  #06 pc 000000000001bf18 /system/lib64/vndk-28/[email protected] (android::hardware::keymaster::V4_0::BnHwKeymasterDevice::_hidl_attestKey(android::hidl::base::V1_0 ::BnHwBase*, android::hardware::Parcel const & amp;, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel & amp;)>) + 484)
  #07 pc 000000000001e820 /system/lib64/vndk-28/[email protected] (android::hardware::keymaster::V4_0::BnHwKeymasterDevice::onTransact(unsigned int, android::hardware::Parcel const & amp;, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel & amp;)>) + 2048)
  #08 pc 000000000001d334 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::executeCommand(int) + 640)
  #09 pc 000000000001430c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand() + 196)
  #10 pc 0000000000014588 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool) + 268)
  #11 pc 0000000000000f64 /vendor/bin/hw/[email protected] (main + 128)
  #12 pc 00000000000acac8 /system/lib64/libc.so (__libc_init + 88)

----- end 354 -----


Finally, let qtikeymaster check:

keymasterdevice::KeymasterHalDevice::attest_key

4. QSEE log

Under QSEE, the tz log needs to be exported to Qualcomm for analysis. tz log can be obtained by typing the command on the QNX terminal when the problem recurs:

cat /dev/pdbg/qcore/tzdiag/qseecom
cat /dev/pdbg/qcore/tzdiag/log