[kernel] What happens if you use the wrong lock in a Linux kernel module?

In a kernel module, I wrote my own simplified version of find_get_page:

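  /* Look up the page cached at `offset` in `mapping`.
   * Unlike find_get_page, this does not take a reference on the page. */
  struct page *my_get_page(struct address_space *mapping, unsigned long offset)
  {
      struct page *page;

      /* Treats tree_lock as an rwlock_t, as it is in 2.6.18. */
      read_lock_irq(&mapping->tree_lock);
      page = radix_tree_lookup(&mapping->page_tree, offset);
      read_unlock_irq(&mapping->tree_lock);
      return page;
  }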

The code was copied from the 2.6.18 kernel.

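I tested it on a Xen virtual machine running a 2.6.9 kernel, and the symptoms were bizarre: the ssh connection to the server suddenly dropped, I had to reconnect, and after reconnecting I found that the module had been unloaded automatically. Only after reading the code carefully did I discover that in the 2.6.9 kernel the tree_lock in struct address_space is a spinlock_t, not an rwlock_t. My code was calling read_lock_irq on a spinlock_t, which is the wrong lock operation. So what happened next?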

read_lock_irq calls _raw_read_lock:

  static inline void _raw_read_lock(rwlock_t *rw)
  {
  #ifdef CONFIG_DEBUG_SPINLOCK
      BUG_ON(rw->magic != RWLOCK_MAGIC);
  #endif
      __build_read_lock(rw, "__read_lock_failed");
  }

Ah, the kernel checks the lock's magic number. If it does not match, BUG_ON() fires, which in turn calls BUG(), so the machine hiccuped.
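To make the mechanism concrete, here is a minimal user-space sketch of a magic-number check. It is not the kernel's code: every `demo_` name and both magic constants are made up for illustration, and it assumes that each lock type holds a lock word followed by a magic field, as in a debug build. Instead of calling BUG(), the checked function returns -1 so that the outcome can be observed.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder magic values for this sketch; not the kernel's constants. */
#define DEMO_SPIN_MAGIC 0x5350494eu
#define DEMO_RW_MAGIC   0x52574c4bu

/* Two hypothetical lock types: same layout, different magic stamp. */
typedef struct { unsigned int lock; unsigned int magic; } demo_spinlock_t;
typedef struct { unsigned int lock; unsigned int magic; } demo_rwlock_t;

static void demo_spin_lock_init(demo_spinlock_t *l)
{
    l->lock = 1;
    l->magic = DEMO_SPIN_MAGIC;
}

static void demo_rwlock_init(demo_rwlock_t *l)
{
    l->lock = 0;
    l->magic = DEMO_RW_MAGIC;
}

/* Returns 0 on success, -1 where a debug kernel would hit BUG(). */
static int demo_read_lock(demo_rwlock_t *rw)
{
    if (rw->magic != DEMO_RW_MAGIC) {
        fprintf(stderr, "BUG: bad rwlock magic %#x\n", rw->magic);
        return -1;
    }
    rw->lock++;  /* stand-in for acquiring the lock for reading */
    return 0;
}

/* Mimics the mistaken call: the spinlock's bytes are read as an rwlock. */
static int demo_read_lock_on_spinlock(const demo_spinlock_t *sp)
{
    demo_rwlock_t alias;

    memcpy(&alias, sp, sizeof alias);
    return demo_read_lock(&alias);
}
```

Calling demo_read_lock on a lock set up with demo_rwlock_init succeeds, while demo_read_lock_on_spinlock on a lock set up with demo_spin_lock_init is rejected, because the magic field carries the spinlock's stamp.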

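Without this magic number check, the machine might simply have hung, and the error would have been much harder to track down. The check is crude, but it is considerate and very effective.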
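The magic check only fires at run time, and only when CONFIG_DEBUG_SPINLOCK is set. As a complement, the sketch below shows one way the same mistake could be caught at compile time in user-space C11, by choosing the lock routine from the static type of the argument. All `demo_` names are hypothetical, and `_Generic` is a C11 feature, so this illustrates the idea rather than something that drops into a kernel of that era.

```c
/* Hypothetical stand-ins for the two kernel lock types. */
typedef struct { unsigned int lock; } demo_spinlock_t;
typedef struct { unsigned int readers; } demo_rwlock_t;

enum demo_lock_kind { DEMO_TOOK_SPIN, DEMO_TOOK_READ };

static enum demo_lock_kind demo_spin_lock(demo_spinlock_t *l)
{
    l->lock = 1;        /* stand-in for taking the spinlock */
    return DEMO_TOOK_SPIN;
}

static enum demo_lock_kind demo_read_lock(demo_rwlock_t *l)
{
    l->readers++;       /* stand-in for taking the lock for reading */
    return DEMO_TOOK_READ;
}

/* Select the routine that matches the declared type of the lock.
 * A pointer to any other type is a compile error, not a run-time surprise. */
#define demo_tree_lock(l) _Generic((l),       \
        demo_spinlock_t *: demo_spin_lock,    \
        demo_rwlock_t *:   demo_read_lock)(l)
```

With a wrapper like this, the same call site works whichever of the two types the lock field is declared as, which is the situation the module ran into when moving between kernel versions.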


This page contains a single entry by DongHao published on 08 4, 2010 3:48 PM.
