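An earlier article gave a brief introduction to ABBA deadlocks. While learning the crash utility, I found it especially handy for exploring and debugging the kernel: it can inspect variables on a live system as well as the state captured after a crash. This article uses locks as an example to show how to inspect lock state with crash, which gives us one more way to debug locking problems.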

Test code

The test code was provided in the earlier article <ABBA锁介绍> (an introduction to ABBA locks) and is reused here.

spinlock test

Before running the test, let's look at the layout of the spinlock structure:


crash> struct qspinlock -o -x
struct qspinlock {
        union {
  [0x0]     atomic_t val;
            struct {
  [0x0]         u8 locked;
  [0x1]         u8 pending;
            };
            struct {
  [0x0]         u16 locked_pending;
  [0x2]         u16 tail;
            };
        };
}
SIZE: 0x4

Reading the raw bytes of the two lock variables before the test gives:


crash> rd -8 spinlock_a 4
ffffffc001542030:  00 00 00 00                                       ....
crash> rd -8 spinlock_b 4
ffffffc001542018:  00 00 00 00                                       ....

Now start the spinlock test:


echo 1 > /sys/module/test/parameters/testsuite

At this point the values of spinlock_a and spinlock_b are:


crash> rd -8 spinlock_a 4
ffffffc001542030:  01 01 00 00                                       ....
crash> rd -8 spinlock_b 4
ffffffc001542018:  01 01 00 00                                       ....

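When reading the raw bytes, keep the byte order in mind: this system is little-endian, so the bytes 01 01 00 00 correspond to val = 0x00000101. It is easier to let crash decode the structure directly: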


crash> struct qspinlock spinlock_a -x
struct qspinlock {
  {
    val = {
      counter = 0x101
    },
    {
      locked = 0x1,
      pending = 0x1
    },
    {
      locked_pending = 0x101,
      tail = 0x0
    }
  }
}

crash> struct qspinlock spinlock_b -x
struct qspinlock {
  {
    val = {
      counter = 0x101
    },
    {
      locked = 0x1,
      pending = 0x1
    },
    {
      locked_pending = 0x101,
      tail = 0x0
    }
  }
}

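Both spinlock_a and spinlock_b show the following:

locked is 1, meaning the lock is held
pending is 1, meaning one task is spinning while it waits to take the lock

Each lock is held and has a waiter, which matches the ABBA deadlock that the test code sets up.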
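The same decoding can be done by hand from the raw dump. The following is a minimal sketch, not kernel code: the mask names, the struct and both helper functions are hypothetical names chosen for this illustration, and the bit layout assumes a kernel built with NR_CPUS < 16K (8 bits of locked, pending in bit 8, tail index in bits 16-17, tail CPU + 1 in bits 18-31).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask names for this sketch; layout assumes NR_CPUS < 16K. */
#define Q_LOCKED_MASK    0x000000ffu  /* bits 0-7: lock is held */
#define Q_PENDING_MASK   0x0000ff00u  /* bits 8-15: pending byte (bit 8 used) */
#define Q_TAIL_IDX_MASK  0x00030000u  /* bits 16-17: per-CPU queue node index */
#define Q_TAIL_CPU_MASK  0xfffc0000u  /* bits 18-31: tail CPU number + 1 */

struct qspinlock_state {
    unsigned int locked;    /* non-zero when the lock is held */
    unsigned int pending;   /* non-zero when one waiter spins on the pending bit */
    unsigned int tail_idx;  /* queue node index of the last queued waiter */
    int tail_cpu;           /* CPU of the last queued waiter, -1 if queue is empty */
};

/* Assemble the 32-bit val from the four bytes printed by "rd -8" (little-endian). */
static uint32_t le_bytes_to_u32(const uint8_t *b)
{
    assert(b != NULL);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Split val into the fields that "struct qspinlock" shows in crash. */
static struct qspinlock_state decode_qspinlock(uint32_t val)
{
    struct qspinlock_state s;

    s.locked   = val & Q_LOCKED_MASK;
    s.pending  = (val & Q_PENDING_MASK) >> 8;
    s.tail_idx = (val & Q_TAIL_IDX_MASK) >> 16;
    /* The tail stores CPU + 1, so 0 means that nobody is queued. */
    s.tail_cpu = (int)((val & Q_TAIL_CPU_MASK) >> 18) - 1;
    return s;
}
```

Feeding the bytes 01 01 00 00 through le_bytes_to_u32 and then decode_qspinlock gives locked = 1, pending = 1 and an empty queue, the same picture crash printed for both locks.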

mutex test

As before, first look at the data structure:


crash> struct mutex -o -x
struct mutex {
   [0x0] atomic_long_t owner;
   [0x8] spinlock_t wait_lock;
  [0x20] struct optimistic_spin_queue osq;
  [0x28] struct list_head wait_list;
}
SIZE: 0x38

Then confirm that the owner of each mutex is 0 before the test:


crash> struct mutex.owner mutex_a -x
  owner = {
    counter = 0x0
  }
crash> struct mutex.owner mutex_b -x
  owner = {
    counter = 0x0
  }

Start the test:


echo 2 > /sys/module/test/parameters/testsuite

Now mutex.owner holds a non-zero value in both cases, which means some task holds each lock:


crash> struct mutex.owner mutex_a -x
  owner = {
    counter = 0xffffff804e286581
  }
crash> struct mutex.owner mutex_b -x
  owner = {
    counter = 0xffffff800dfd1d01
  }

For a mutex, the low bits of owner carry flags. The one that is set here is:


#define MUTEX_FLAG_WAITERS      0x01

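It means that at least one task is waiting for the lock.
Apart from the flag bits, owner holds a pointer to the task_struct of the task that holds the mutex, which the kernel extracts as follows: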


static inline struct task_struct *__owner_task(unsigned long owner)
{
        return (struct task_struct *)(owner & ~MUTEX_FLAGS);
}

Masking the flag bits off 0xffffff804e286581 gives 0xffffff804e286580, so the task holding mutex_a is:


crash> struct task_struct.pid,comm 0xffffff804e286580
  pid = 2818
  comm = "spinlock_thread"

Likewise, the task holding mutex_b is:


crash> struct task_struct.pid,comm 0xffffff800dfd1d00
  pid = 2819
  comm = "spinlock_thread"

The ps output agrees, and shows both tasks in the UN (uninterruptible) state:


crash> ps | grep 2819
   2819      2   5  ffffff800dfd1d00  UN   0.0       0      0  [spinlock_thread]
crash> ps | grep 2818
   2818      2   4  ffffff804e286580  UN   0.0       0      0  [spinlock_thread]

This is how we identify which task holds each lock.
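The owner arithmetic used above can be written down as a small user-space helper. This is a sketch under stated assumptions: the struct and decode_mutex_owner are hypothetical names for this illustration, and the HANDOFF and PICKUP flag values are taken from recent kernels' kernel/locking/mutex.h and are not shown elsewhere in this article, so check them against the kernel version being debugged.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits kept in the low bits of mutex.owner (values assumed from recent kernels). */
#define MUTEX_FLAG_WAITERS  UINT64_C(0x01)  /* tasks are waiting for the lock */
#define MUTEX_FLAG_HANDOFF  UINT64_C(0x02)  /* unlock must hand the lock to the top waiter */
#define MUTEX_FLAG_PICKUP   UINT64_C(0x04)  /* handoff done, waiting for pickup */
#define MUTEX_FLAGS         UINT64_C(0x07)

/* Hypothetical result type for this sketch. */
struct mutex_owner_info {
    uint64_t task;  /* address of the owning task_struct, 0 if unlocked */
    int waiters;
    int handoff;
    int pickup;
};

/* Split the raw owner.counter value printed by crash into pointer and flags. */
static struct mutex_owner_info decode_mutex_owner(uint64_t owner)
{
    struct mutex_owner_info info;

    info.task    = owner & ~MUTEX_FLAGS;
    info.waiters = (owner & MUTEX_FLAG_WAITERS) != 0;
    info.handoff = (owner & MUTEX_FLAG_HANDOFF) != 0;
    info.pickup  = (owner & MUTEX_FLAG_PICKUP) != 0;

    /* The pointer part never overlaps the flag bits. */
    assert((info.task & MUTEX_FLAGS) == 0);
    return info;
}
```

For the two values read above, decode_mutex_owner(0xffffff804e286581) yields task 0xffffff804e286580 with only the waiters flag set, and 0xffffff800dfd1d01 yields task 0xffffff800dfd1d00 with the same flag. Each mutex is held and has a waiter, as expected for an ABBA deadlock.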

semaphore test

First look at the semaphore structure:


crash> struct semaphore -o -x
struct semaphore {
   [0x0] raw_spinlock_t lock;
  [0x18] unsigned int count;
  [0x20] struct list_head wait_list;
}
SIZE: 0x30

The initial values are:


crash> struct semaphore.count semaphore_a
  count = 1
crash> struct semaphore.count semaphore_b
  count = 1

Now start the test:


echo 3 > /sys/module/test/parameters/testsuite

A semaphore works by having down take a resource and up give it back. Here count is 1 initially and drops to 0 after a down. After the deadlock the state is:


crash> struct semaphore.count semaphore_b
  count = 0
crash> struct semaphore.count semaphore_a
  count = 0

Both semaphore_a and semaphore_b have been taken (count is 0), which is consistent with the ABBA deadlock.
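To see why both counts end at 0, the sequence can be replayed with a simplified user-space model. This is only a sketch: sem_model, model_down, model_up and replay_abba are hypothetical names, the model tracks just a count and a number of blocked waiters, and the down order is an assumption about what the test code does (thread 1 takes a then b, thread 2 takes b then a).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a counting semaphore: no real blocking, only bookkeeping. */
struct sem_model {
    unsigned int count;    /* free units, like semaphore.count */
    unsigned int waiters;  /* tasks that would be sleeping on wait_list */
};

/* Model of down(): take a unit if one is free, otherwise record a blocked waiter. */
static bool model_down(struct sem_model *s)
{
    assert(s != NULL);
    if (s->count > 0) {
        s->count--;
        return true;   /* acquired */
    }
    s->waiters++;
    return false;      /* the caller would sleep here */
}

/* Model of up(): hand the unit to a waiter if there is one, else free it. */
static void model_up(struct sem_model *s)
{
    assert(s != NULL);
    if (s->waiters > 0)
        s->waiters--;
    else
        s->count++;
}

/* Replay the assumed ABBA order; returns true when both threads end up blocked. */
static bool replay_abba(struct sem_model *a, struct sem_model *b)
{
    bool t1_has_a = model_down(a);   /* thread 1: down(a) */
    bool t2_has_b = model_down(b);   /* thread 2: down(b) */
    bool t1_has_b = model_down(b);   /* thread 1: down(b), blocks */
    bool t2_has_a = model_down(a);   /* thread 2: down(a), blocks */

    return t1_has_a && t2_has_b && !t1_has_b && !t2_has_a;
}
```

Starting from count = 1 on both semaphores, the replay leaves count = 0 and one waiter on each, and neither thread can ever reach its up. Note that count alone only shows that the units are taken; unlike mutex.owner it does not say who took them, so the tasks sleeping on each semaphore's wait_list are the place to look next.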

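Summary

This article used crash to help locate deadlocks, which gives one more way to investigate real problems. That said, the most effective way to track down a deadlock is still, without question, the kernel's own lock-debugging CONFIG options.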
