VNODE(9)                      BSD Kernel Manual                       VNODE(9)

NAME

     vnode - an overview of vnodes

DESCRIPTION

     A vnode is an object in kernel memory that speaks the UNIX file interface
     (open, read, write, close, readdir, etc.). Vnodes can represent files,
     directories, FIFOs, UNIX-domain sockets, block devices, and character
     devices.

     Each vnode has a set of methods whose names start with the string 'VOP_'.
     These methods include VOP_OPEN, VOP_READ, VOP_WRITE, VOP_RENAME,
     VOP_CLOSE, and VOP_MKDIR. Many of these methods correspond closely to the
     equivalent system calls: open, read, write, rename, etc. Each file system
     (FFS, NFS, etc.) provides implementations for these methods.
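     The dispatch idea can be illustrated with a minimal userland sketch. It
     is not kernel code: the real system generates its VOP_ entry points from
     sys/kern/vnode_if.src, while struct toy_vops, TOY_VOP_OPEN, TOY_VOP_READ
     and the "memfs" file system below are hypothetical names chosen only to
     show how generic code calls through a per-file-system method table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model; not the kernel's vop descriptors. */
struct toy_vnode;

struct toy_vops {
	int (*vop_open)(struct toy_vnode *);
	int (*vop_read)(struct toy_vnode *, char *, size_t);
};

struct toy_vnode {
	const struct toy_vops *v_op;	/* per-file-system method table */
	const char *v_contents;		/* stand-in for file data */
};

/* Generic code does not know which file system it is talking to;
 * it only dispatches through the vnode's table. */
static int
TOY_VOP_OPEN(struct toy_vnode *vp)
{
	assert(vp->v_op != NULL);
	return vp->v_op->vop_open(vp);
}

static int
TOY_VOP_READ(struct toy_vnode *vp, char *buf, size_t len)
{
	assert(vp->v_op != NULL);
	return vp->v_op->vop_read(vp, buf, len);
}

/* One toy "file system" supplying its own implementations. */
static int
memfs_open(struct toy_vnode *vp)
{
	(void)vp;
	return 0;
}

static int
memfs_read(struct toy_vnode *vp, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s", vp->v_contents);
	return n < 0 ? -1 : n;	/* bytes of data, or -1 on error */
}

static const struct toy_vops memfs_vops = { memfs_open, memfs_read };
```

     A second file system would supply a different table; the generic
     TOY_VOP_ wrappers would not change.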

     The Virtual File System (VFS) library maintains a pool of vnodes. File
     systems cannot allocate their own vnodes; they must use the functions
     provided by the VFS to create and manage vnodes.

Vnode life cycle

     When a client of the VFS requests a new vnode, the vnode allocation code
     can reuse an old vnode object that is no longer in use. Whether a vnode
     is in use is tracked by the vnode reference count (v_usecount). By
     convention, each open file handle holds a reference, as does each VM
     object backed by a file. A vnode with a reference count of 1 or more will
     not be deallocated or reused to point to a different file. So, if you
     want to ensure that your vnode doesn't become a different file under
     you, make sure you hold a reference to it. A vnode that points to a
     valid file and has a reference count of 1 or more is called "active".

     When a vnode's reference count drops to zero, it becomes "inactive", that
     is, a candidate for reuse. An "inactive" vnode still refers to a valid
     file, and one can try to reactivate it using vget(9); caches do this
     frequently.

     Before the VFS can reuse an inactive vnode to refer to another file, it
     must clean all information pertaining to the old file. A cleaned out
     vnode is called a "reclaimed" vnode.

     To support forcible unmounts and the revoke(2) system call, the VFS may
     "reclaim" a vnode with a positive reference count. The "reclaimed" vnode
     is given to the dead file system, which returns errors for most
     operations. The reclaimed vnode will not be reused for another file
     until its reference count drops to zero.
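     The three states described above can be summarized in a small userland
     model. This is a sketch under stated assumptions, not kernel code: the
     boolean fields v_reclaimed and v_dead and the toy_ functions are
     hypothetical stand-ins for what the VFS tracks internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vnode life cycle; not kernel code. */
enum toy_state { TOY_ACTIVE, TOY_INACTIVE, TOY_RECLAIMED };

struct toy_vnode {
	unsigned v_usecount;	/* references held by clients */
	bool	 v_reclaimed;	/* file-system data already cleaned out */
	bool	 v_dead;	/* handed to the "dead" file system */
};

static enum toy_state
toy_state(const struct toy_vnode *vp)
{
	if (vp->v_reclaimed)
		return TOY_RECLAIMED;
	return vp->v_usecount > 0 ? TOY_ACTIVE : TOY_INACTIVE;
}

/* Reclaim is allowed on active vnodes too (forcible unmount, revoke(2)).
 * A still-referenced vnode goes to the dead file system so that later
 * operations fail instead of touching the old file. */
static void
toy_reclaim(struct toy_vnode *vp)
{
	vp->v_reclaimed = true;
	vp->v_dead = vp->v_usecount > 0;
}

/* The pool may hand the vnode out for another file only when it is
 * no longer referenced. */
static bool
toy_reusable(const struct toy_vnode *vp)
{
	return vp->v_usecount == 0;
}
```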

Vnode pool

     The getnewvnode(9) function allocates a vnode from the pool, possibly
     reusing an "inactive" vnode, and returns it to the caller. The vnode
     returned has a reference count (v_usecount) of 1.

     The vref(9) call increments the reference count on the vnode. It may only
     be used on a vnode with a reference count of 1 or greater. The vrele(9)
     and vput(9) calls decrement the reference count; vput(9) additionally
     releases the vnode lock.
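     The rules for these three calls can be expressed as assertions in a
     userland model. The toy_vref, toy_vrele and toy_vput functions and the
     v_locked field are hypothetical; they mimic the described semantics, not
     the kernel's actual signatures or implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's vref(9)/vrele(9)/vput(9). */
struct toy_vnode {
	unsigned v_usecount;
	bool	 v_locked;	/* stand-in for the vnode lock */
};

static void
toy_vref(struct toy_vnode *vp)
{
	/* Only legal when the caller already holds a reference. */
	assert(vp->v_usecount >= 1);
	vp->v_usecount++;
}

static void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->v_usecount >= 1);
	vp->v_usecount--;	/* reaching 0 makes the vnode inactive */
}

static void
toy_vput(struct toy_vnode *vp)
{
	assert(vp->v_locked);
	vp->v_locked = false;	/* vput also drops the vnode lock */
	toy_vrele(vp);
}
```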

     The vget(9) call, when used on an inactive vnode, will make the vnode
     "active" by bumping the reference count to one. When called on an active
     vnode, vget increases the reference count by one. However, if the vnode
     is being reclaimed concurrently, then vget will fail and return an error.
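     The difference between vget(9) and vref(9) is that vget may be called
     on an inactive vnode and may fail. A minimal userland sketch follows;
     toy_vget and the v_xlock field are hypothetical, and returning ENOENT is
     an assumption of this sketch rather than a documented guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of vget(9) semantics; not the kernel signature. */
struct toy_vnode {
	unsigned v_usecount;
	bool	 v_xlock;	/* reclamation in progress (cf. VXLOCK) */
};

static int
toy_vget(struct toy_vnode *vp)
{
	if (vp->v_xlock)
		return ENOENT;	/* lost the race with reclamation (assumed errno) */
	vp->v_usecount++;	/* 0 -> 1 reactivates an inactive vnode */
	return 0;
}
```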

     The vgone(9) and vgonel(9) functions orchestrate the reclamation of a
     vnode. They can be called on both active and inactive vnodes.

     When transitioning a vnode to the "reclaimed" state, the VFS calls the
     VOP_RECLAIM(9) method. File systems use this method to free any
     file-system-specific data they attached to the vnode.
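     A reclaim method typically releases whatever the file system hung off
     the vnode's v_data pointer. The sketch below is a userland model with
     hypothetical names (toyfs_reclaim, toy_vclean, struct toy_inode); it
     shows the shape of such a hook, not any real file system's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model; names are illustrative only. */
struct toy_vnode {
	void	*v_data;			/* file-system-private data */
	int	(*vop_reclaim)(struct toy_vnode *);
};

struct toy_inode {		/* per-file state a file system might keep */
	unsigned long i_number;
};

static int
toyfs_reclaim(struct toy_vnode *vp)
{
	free(vp->v_data);	/* release what the file system attached */
	vp->v_data = NULL;	/* never leave a dangling pointer behind */
	return 0;
}

/* Generic side: invoked when the vnode moves to the reclaimed state. */
static int
toy_vclean(struct toy_vnode *vp)
{
	return vp->vop_reclaim(vp);
}
```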

Vnode locks

     The vnode actually has three different types of lock: the vnode lock, the
     vnode interlock, and the vnode reclamation lock (VXLOCK).

The vnode lock

     The vnode lock and its consistent use accomplish the following:

     •   It keeps a locked vnode from changing across certain pairs of VOP_
         calls, thus preserving cached data. For example, it keeps the direc-
         tory from changing between a VOP_LOOKUP call and a VOP_CREATE. The
         VOP_LOOKUP call makes sure the name doesn't already exist in the
         directory and finds free room in the directory for the new entry. The
         VOP_CREATE can then go ahead and create the file without checking if
         it already exists or looking for free space.

     •   Some file systems rely on it to ensure that only one "thread" at a
         time is calling VOP_ vnode operations on a given file or directory.
         Otherwise, the file system's behavior is undefined.

     •   On rare occasions, code will hold the vnode lock so that a series of
         VOP_ operations occurs as an atomic unit. (Of course, this doesn't
         work with network file systems like NFSv2 that don't have any notion
         of bundling a bunch of operations into an atomic unit.)

     •   While the vnode lock is held, the vnode will not be reclaimed.
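     The first point above (a lookup followed by a create under one lock
     hold) can be modelled in userland with a pthread mutex standing in for
     the vnode lock. Everything here (struct toy_dir, toy_lookup, toy_create,
     toy_mkfile) is hypothetical and much simpler than a real directory.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define TOY_NSLOTS 8

/* Hypothetical directory vnode guarded by its "vnode lock". */
struct toy_dir {
	pthread_mutex_t	lock;
	const char	*names[TOY_NSLOTS];	/* NULL = free slot */
};

/* Like VOP_LOOKUP before a create: returns a free slot, or -1 if the
 * name already exists or the directory is full. Caller holds d->lock. */
static int
toy_lookup(struct toy_dir *d, const char *name)
{
	int i, freeslot = -1;

	for (i = 0; i < TOY_NSLOTS; i++) {
		if (d->names[i] == NULL) {
			if (freeslot == -1)
				freeslot = i;
		} else if (strcmp(d->names[i], name) == 0)
			return -1;
	}
	return freeslot;
}

/* Like VOP_CREATE: trusts the lookup result without rechecking, which
 * is only safe because the lock was held across both calls. */
static void
toy_create(struct toy_dir *d, int slot, const char *name)
{
	d->names[slot] = name;
}

static int
toy_mkfile(struct toy_dir *d, const char *name)
{
	int slot;

	pthread_mutex_lock(&d->lock);
	slot = toy_lookup(d, name);
	if (slot >= 0)
		toy_create(d, slot, name);
	pthread_mutex_unlock(&d->lock);
	return slot >= 0 ? 0 : -1;
}
```

     If the lock were dropped between the two steps, another thread could
     create the same name or take the same slot in the gap.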

     There is a discipline to using the vnode lock. Some VOP_ operations
     require that the vnode lock be held before they are called. A
     description of this rather arcane locking discipline is in
     sys/kern/vnode_if.src.

     The vnode lock is acquired by calling vn_lock(9) and released by calling
     VOP_UNLOCK(9).

     A process is allowed to sleep while holding the vnode lock.

     The implementation of the vnode lock is the responsibility of the
     individual file systems. Not all file systems implement it.

     To prevent deadlocks when acquiring locks on multiple vnodes, the lock on
     the parent directory must be acquired before the lock on the child
     directory.
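     The ordering rule can be sketched with pthread mutexes in place of vnode
     locks. This is a userland illustration with hypothetical helpers
     (toy_lock_pair, toy_unlock_pair), not vn_lock(9): as long as every
     thread locks parent before child, no thread can hold a child while
     waiting for its parent, so no lock-order cycle can form.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of vnode lock ordering; not vn_lock(9). */
struct toy_vnode {
	pthread_mutex_t	 lock;
	struct toy_vnode *parent;	/* NULL for the root */
};

static void
toy_lock_pair(struct toy_vnode *dvp, struct toy_vnode *vp)
{
	assert(vp->parent == dvp);
	pthread_mutex_lock(&dvp->lock);		/* parent first ... */
	pthread_mutex_lock(&vp->lock);		/* ... then child */
}

static void
toy_unlock_pair(struct toy_vnode *dvp, struct toy_vnode *vp)
{
	pthread_mutex_unlock(&vp->lock);
	pthread_mutex_unlock(&dvp->lock);
}
```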

Vnode interlock

     The vnode interlock (vp->v_interlock) is a spinlock. It is useful on
     multi-processor systems for acquiring a quick exclusive lock on the
     contents of the vnode. It MUST NOT be held while sleeping. (Which fields
     it covers, and how it interacts with splbio() and interrupts, is not yet
     documented.)

     Operations on this lock are a no-op on uniprocessor systems.
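     The behaviour of a spin interlock can be approximated in userland with
     a C11 atomic_flag. This sketch is an assumption-laden stand-in: the real
     v_interlock is a kernel simple lock, and which fields it guards is not
     specified here, so v_flag is used purely as an example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin interlock; not the kernel's v_interlock type. */
struct toy_vnode {
	atomic_flag	v_interlock;
	unsigned	v_flag;		/* example field, guarded by assumption */
};

static void
toy_interlock_enter(struct toy_vnode *vp)
{
	/* Busy-wait: never sleep while spinning or while holding it. */
	while (atomic_flag_test_and_set_explicit(&vp->v_interlock,
	    memory_order_acquire))
		;
}

static void
toy_interlock_exit(struct toy_vnode *vp)
{
	atomic_flag_clear_explicit(&vp->v_interlock, memory_order_release);
}

static void
toy_set_flag(struct toy_vnode *vp, unsigned flag)
{
	toy_interlock_enter(vp);
	vp->v_flag |= flag;		/* keep the critical section short */
	toy_interlock_exit(vp);
}
```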

Other Vnode synchronization

     The vnode reclamation lock (VXLOCK) is used to prevent multiple processes
     from entering the vnode reclamation code. It is also used as a flag to
     indicate that reclamation is in progress. The VXWANT flag is set by
     processes that wish to be woken up when reclamation is finished.
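     The VXLOCK/VXWANT handshake can be modelled as plain flag manipulation.
     In this single-threaded sketch the flag values, the toy_ functions and
     the wakeups counter (a stand-in for the kernel's wakeup call) are all
     hypothetical; actual sleeping and waking are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VXLOCK	0x01	/* reclamation in progress */
#define TOY_VXWANT	0x02	/* someone waits for it to finish */

/* Hypothetical flag model; not the kernel's flag values. */
struct toy_vnode {
	unsigned v_flag;
	unsigned wakeups;	/* counts simulated wakeups */
};

/* Returns false if reclamation is already in progress. */
static bool
toy_reclaim_begin(struct toy_vnode *vp)
{
	if (vp->v_flag & TOY_VXLOCK)
		return false;
	vp->v_flag |= TOY_VXLOCK;
	return true;
}

/* A process that must wait records its interest before sleeping. */
static void
toy_reclaim_want(struct toy_vnode *vp)
{
	vp->v_flag |= TOY_VXWANT;
}

static void
toy_reclaim_end(struct toy_vnode *vp)
{
	vp->v_flag &= ~TOY_VXLOCK;
	if (vp->v_flag & TOY_VXWANT) {
		vp->v_flag &= ~TOY_VXWANT;
		vp->wakeups++;		/* stand-in for waking the sleepers */
	}
}
```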

     The vwaitforio(9) call is used to wait for all outstanding write I/Os as-
     sociated with a vnode to complete.
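     Waiting for outstanding writes amounts to sleeping until a counter of
     in-flight writes (v_numoutput) reaches zero. The userland sketch below
     uses a pthread condition variable in place of the kernel's splbio() and
     sleep/wakeup machinery; toy_write_start, toy_write_done and
     toy_vwaitforio are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of v_numoutput and vwaitforio(9). */
struct toy_vnode {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	unsigned	v_numoutput;	/* writes started but not finished */
};

static void
toy_write_start(struct toy_vnode *vp)
{
	pthread_mutex_lock(&vp->mtx);
	vp->v_numoutput++;
	pthread_mutex_unlock(&vp->mtx);
}

/* I/O completion; in the kernel this runs at interrupt level, which is
 * why v_numoutput must be manipulated at splbio(). */
static void
toy_write_done(struct toy_vnode *vp)
{
	pthread_mutex_lock(&vp->mtx);
	assert(vp->v_numoutput > 0);
	if (--vp->v_numoutput == 0)
		pthread_cond_broadcast(&vp->cv);
	pthread_mutex_unlock(&vp->mtx);
}

static void
toy_vwaitforio(struct toy_vnode *vp)
{
	pthread_mutex_lock(&vp->mtx);
	while (vp->v_numoutput > 0)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&vp->cv, &vp->mtx);
	pthread_mutex_unlock(&vp->mtx);
}
```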

Version number/capability

     The vnode capability, v_id, is a 32-bit version number on the vnode.
     Every time a vnode is reassigned to a new file, the vnode capability is
     changed. This is used by code that wishes to keep pointers to vnodes but
     doesn't want to hold a reference (e.g., caches). Such code keeps both a
     struct vnode pointer and a copy of the capability, and can later compare
     the vnode's capability to its copy to see whether the vnode still points
     to the same file.

     Note: for this to work, memory assigned to hold a struct vnode can only
     be used for another purpose when all pointers to it have disappeared.
     Since the vnode pool has no way of knowing when all pointers have disap-
     peared, it never frees memory it has allocated for vnodes.

Vnode fields

     Most of the fields of the vnode structure should be treated as opaque and
     only manipulated through the proper APIs. This section describes the
     fields that are manipulated directly.

     The v_flag attribute contains miscellaneous flags related to various
     functions. They are summarized in table ...

     The v_tag attribute indicates what file system the vnode belongs to. Very
     little code actually uses this attribute and its use is deprecated. Pro-
     grammers should seriously consider using more object-oriented approaches
     (e.g. function tables). There is no safe way of defining new v_tags for
     loadable file systems. The v_tag attribute is read-only.

     The v_type attribute indicates what type of file (e.g. directory, regu-
     lar, FIFO) this vnode is. This is used by the generic code for various
     checks. For example, the read(2) system call returns an error when a read
     is attempted on a directory.
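     A generic type check of this kind can be sketched in a few lines. The
     enum below is a hypothetical subset (the real enum vtype also has
     members such as VNON, VBLK, VCHR, VLNK, VSOCK and VBAD), and returning
     EISDIR for a directory read is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of vnode types. */
enum toy_vtype { TOY_VREG, TOY_VDIR, TOY_VFIFO };

struct toy_vnode {
	enum toy_vtype v_type;
};

/* Generic check made before dispatching to the file system's read. */
static int
toy_vn_read_check(const struct toy_vnode *vp)
{
	if (vp->v_type == TOY_VDIR)
		return EISDIR;
	return 0;
}
```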

     The v_data attribute allows a file system to attach a piece of file sys-
     tem specific memory to the vnode. This contains information about the
     file that is specific to the file system.

     The v_numoutput attribute indicates the number of pending synchronous and
     asynchronous writes on the vnode. It does not track the number of dirty
     buffers attached to the vnode. The attribute is used by code like fsync
     to wait for all writes to complete before returning to the user. This at-
     tribute must be manipulated at splbio().

     The v_writecount attribute tracks the number of write calls pending on
     the vnode.

RULES

     The vast majority of vnode functions may not be called from interrupt
     context. The exceptions are bgetvp and brelvp. The following fields of
     the vnode are manipulated at interrupt level: v_numoutput, v_holdcnt,
     v_dirtyblkhd, v_cleanblkhd, v_bioflag, v_freelist, and v_synclist. Any
     access to these fields should be protected by splbio.

HISTORY

     This document first appeared in OpenBSD 2.9.

MirOS BSD #10-current         February 22, 2001

These manual pages and other documentation are copyrighted by their respective writers; their source is available at our CVSweb, AnonCVS, and other mirrors. The rest is Copyright © 2002‒2014 The MirOS Project, Germany.
This product includes material provided by Thorsten Glaser.
