Original article: Racing against the clock -- hitting a tiny kernel race window
The kernel tries to figure out whether it can account for all references to some file by comparing the file's refcount with the number of references from inflight SKBs (socket buffers). If they are equal, it assumes that the UNIX domain sockets subsystem effectively has exclusive access to the file because it owns all references.
The problem is that struct file can also be referenced from an RCU read-side critical section (which you can't detect by looking at the refcount), and such an RCU reference can be upgraded into a refcounted reference using `get_file_rcu()`/`get_file_rcu_many()` by `__fget_files()` as long as the refcount is non-zero.
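The intended logic of `unix_gc()`: if `total_refs` equals `inflight_refs`, the `file` is considered to be owned exclusively by the `skb`, so the `skb` and the `file` can be freed together.
In that case, `__fget_files` would find that `f_count` is 0 or that `file` is NULL.
In the race, however, `file->f_count` is incremented by 1 in `__fget_files()`, so the later code in `unix_gc` does not free the memory of `file` and only decrements `f_count` by 1. This also means that `dup()` can still succeed after `close()`.
dup() -> __fget_files()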

    file = files_lookup_fd_rcu(files, fd); // fdt->fd[fd] (1)
    ...
    get_file_rcu_many(file, refs) // update: f_count+1 (2)

close() -> unix_gc()

    list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
        total_refs = file_count(u->sk.sk_socket->file); // read f_count: 1 (3)
        inflight_refs = atomic_long_read(&u->inflight); // inflight_refs: 1
        ...
        if (total_refs == inflight_refs) { // compare
            list_move_tail(&u->link, &gc_candidates);
            ...

What are the possible consequences of the `file` and the `skb` not being released together in `unix_gc()`?