Directions of UNIX at Berkeley
Marshall Kirk McKusick
Michael J. Karels
Computer Systems Research Group
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720
ABSTRACT
This paper gives a brief overview of the con-
tributions to UNIX† made by the research community
and describes the needs that prompted the distri-
butions from Berkeley. The next Berkeley system
will attempt to adapt to the current state of
technology in the areas of virtual memory and file
system interfaces. The paper makes a brief survey
of this available technological base and then
speculates on the ways in which future Berkeley
systems will use this technology.
1. The Role of Research in Maintaining System Vitality
Since the divestiture of AT&T, UNIX has become the
focus of a massive marketing effort. To succeed, this effort
must convince potential customers that the product is sup-
ported, that future versions will continue to be developed,
and that these versions will be upwardly compatible with all
past applications.
AT&T's size alone ensures that it will be around in
years to come. The increasing research, development, and
support resources that the company has allocated to UNIX
over the past 10 years provide an assurance of its commit-
ment. Its massive advertising campaign for System V, its
presence on the /usr/group UNIX standards committee, and the
_________________________
†UNIX is a registered trademark of AT&T in the US and
other countries.
April 27, 2013
publication of the System V Interface Definition testify to
the company's intention to remain compatible with past sys-
tems.
Although repeal of the law of entropy is a necessary
step along the road to a viable commercial product, this
runs counter to orderly system evolution. Be that as it may,
AT&T's major UNIX commercialization effort has succeeded in
making the system available to a much broader audience than
was previously possible.
The freezing of what previously had been an ever-
changing UNIX interface represented a major departure from
the pattern that the small but highly skilled UNIX community
had come to expect. Most early users had accounts at sites
that had the source to the programs they ran. Thus, as the
system interface evolved to reflect more current technology,
software could be changed to keep pace. Users simply updated
their programs to account for the new interface, recompiled
them, and continued to use them as before. Although this
required a large effort, it allowed the system -- and the
tools that ran on it -- to reflect changes in software tech-
nology.
At the forefront of the technological wave was AT&T's
own Bell Laboratories [Ritchie74]. It was there that the
UNIX system was born and nurtured, and it was there that its
evolution was controlled -- up through the release of the
7th Edition. Universities also were involved with the system
almost from its inception. The University of California at
Berkeley was among the first participants, playing host to
several researchers on sabbatical from the Labs. This
cooperation typified the harmony that was characteristic of
the early UNIX community. Work that was contributed to the
Labs by different members of the community helped produce a
rapidly expanding set of tools and facilities.
With the release of the 7th Edition, though, the use-
fulness of UNIX already had been clearly established, and
other organizations within AT&T began to handle the public
releases of the system. These groups took far less input
from the community as they began to freeze the system inter-
face in preparation for entry into the commercial market-
place.
As the research community continued to modify the UNIX
system, it found that it needed an organization that could
produce releases. Berkeley quickly stepped into the role.
Before the final public release of UNIX from the Labs,
Berkeley's work had been focused on the development of tools
designed to be added to existing UNIX systems. After the
AT&T freeze, though, a group of researchers at the univer-
sity found that they could easily expand their role to
include the coalescing function previously provided by the
Labs. Out of this came the first full Berkeley distribution
of UNIX (3.0BSD), complete with virtual memory -- a first
for UNIX users. The idea was so successful that System V
adopted it six years later.
1.1. Motivations for Change
At the same time that AT&T was beginning to put the
brakes on further change in UNIX, local area networks and
bitmapped workstations were just beginning to emerge from
Xerox PARC and other research centers. Users in the academic
and research community realized that there were no
production-quality operating systems capable of using such
hardware. They also saw that networking unquestionably would
be an indispensable facility in future systems research.
Though it was not clear that UNIX was the correct base on
which to build a networked system, it was clear that UNIX
offered the most expedient means by which to build such a
system.
This presented the Berkeley group with an interesting chal-
lenge: how to meet the needs of the community of users
without adding needless complexity to existing applications.
Their efforts were aided by the presence of a large and
diverse local group of users who were teaching introductory
programming, typesetting documents, developing software sys-
tems, and trying to build huge Lisp-based systems capable of
solving differential equations. In addition, they were able
to discuss current problems and hash out potential solutions
at semi-annual technical conferences run by the Usenix
organization.
The assistance of a steering committee composed of
academics, commercial vendors, DARPA researchers, and people
from the Labs made it possible for the architecture of a
networking-based UNIX system to be developed. Because the
group kept to the UNIX tradition of integrating work done by
others in preference to writing everything from scratch,
4.2BSD was released less than two years later [Joy83].
2. The Future of UNIX at Berkeley
The release of 4.3BSD in April of 1986 addressed many
of the performance problems and unfinished interfaces
present in 4.2BSD [Leffler84] [McKusick85]. Berkeley has now
embarked on a new development phase to likewise update other
old parts of the system. There are three main areas of work.
The first is to rewrite the virtual memory system to take
advantage of current technology and to provide new capabili-
ties such as mapped files and shared memory. The second is
to provide a standard interface to file systems so that mul-
tiple local and remote file systems can be supported much as
multiple networking protocols are by 4.3BSD. Finally, there
is a need to provide more internal flexibility in a way
similar to the System V Streams paradigm.
2.1. A New Virtual Memory Implementation
With the cost per byte of memory approaching that of
disks, and with file systems increasingly removed from
host machines, a new approach to the
implementation of virtual memory is necessary. In 4.3BSD the
swap space is preallocated; this limits the maximum virtual
memory that can be supported to the size of the swap area
[Babaoglu79] [Someren84]. The new system should support vir-
tual memory space at least as great as the sum of the sizes
of physical memory and swap space (a system may run with no
swap space if it has no local disk). For systems that have a
local swap disk, but utilize remote file systems, using some
memory to keep track of the contents of swap space may be
useful to avoid multiple fetches of the same data from the
file system.
The new implementation should also add new functional-
ity. Processes should be allowed to have large sparse
address spaces, to map files into their address spaces, to
map device memory into their address spaces, and to share
memory with other processes. The shared address space may
either be obtained by mapping a file into (possibly dif-
ferent) parts of the address space, or by arranging for
processes to share ``anonymous memory'' (that is, memory
that is zero-fill on demand, and whose contents are lost
when the last process unmaps the memory). This latter
approach was the one adopted by the developers of System V.
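The paper does not fix a programming interface for anonymous
shared memory. As a minimal sketch, the following C fragment
assumes the mmap call with the MAP_SHARED and MAP_ANONYMOUS flags
as found on present-day BSD and Linux systems; the function name
shared_anon_demo is hypothetical. A parent and child share one
zero-filled integer, and the contents disappear once the last
mapping is removed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* asks glibc to expose MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: share one int between a parent and its child
 * through anonymous memory.  Returns the value the parent reads
 * back after the child exits, or -1 on error. */
int shared_anon_demo(void)
{
    /* Pages are not backed by any file and are visible to both
     * processes because the mapping is shared across fork(). */
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    assert(*cell == 0);          /* anonymous memory starts zero-filled */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {              /* child: store into the shared page */
        *cell = 42;
        _exit(0);
    }

    int seen = -1;
    if (waitpid(pid, NULL, 0) == pid)
        seen = *cell;            /* parent: observe the child's store */
    munmap(cell, sizeof *cell);  /* last unmap discards the contents */
    return seen;
}
```

Sharing through a mapped file would differ only in passing an
open file descriptor and dropping the anonymous flag.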
One possible use of shared memory is to provide a
high-speed Inter-Process Communication (IPC) mechanism
between two or more cooperating processes. To ensure the
integrity of data structures in a shared region, processes
must be able to use semaphores to coordinate their access to
these shared structures. In System V, semaphores are pro-
vided as a set of system calls. Unfortunately, the use of
system calls reduces the throughput of the shared memory IPC
to that of existing IPC mechanisms. To avoid this
bottleneck, we expect that the next release of BSD will
incorporate a scheme that places the semaphores in the
shared memory segment, so that machines with a test-and-set
instruction will be able to handle the usual uncontested
``lock'' and ``unlock'' without doing two system calls. Only
in the unusual case of trying to lock an already-locked lock
or when a desired lock is being released will a system call
be required. The interface will allow a user-level imple-
mentation of the System V semaphore interface on most
machines with a much lower runtime cost [McKusick86].
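The scheme described above can be sketched in C under some loud
assumptions: C11 atomics stand in for the machine's test-and-set
instruction, sched_yield stands in for the system call that the
contended path would make, and the names shm_lock, shm_lock_acquire,
shm_lock_release, and uncontended_demo are hypothetical. This is
not the interface of [McKusick86]; in particular, a real
implementation would sleep in the kernel and would need a wakeup
on release when waiters exist.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock record meant to live inside a shared segment. */
struct shm_lock {
    atomic_flag held;        /* the test-and-set bit */
    atomic_long slow_paths;  /* times the contended path was taken */
};

void shm_lock_init(struct shm_lock *l)
{
    atomic_flag_clear(&l->held);
    atomic_init(&l->slow_paths, 0);
}

void shm_lock_acquire(struct shm_lock *l)
{
    /* Fast path: a single test-and-set, no kernel involvement. */
    while (atomic_flag_test_and_set_explicit(&l->held,
                                             memory_order_acquire)) {
        /* Slow path: the lock is already held.  A real scheme
         * would block in the kernel here; yielding the processor
         * is only a stand-in for that system call. */
        atomic_fetch_add(&l->slow_paths, 1);
        sched_yield();
    }
}

void shm_lock_release(struct shm_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Lock and unlock without contention.  Returns the protected
 * counter if the slow path was never taken, otherwise -1. */
long uncontended_demo(int rounds)
{
    struct shm_lock lock;    /* on the stack here for brevity */
    long counter = 0;

    assert(rounds >= 0);
    shm_lock_init(&lock);
    for (int i = 0; i < rounds; i++) {
        shm_lock_acquire(&lock);
        counter++;
        shm_lock_release(&lock);
    }
    return atomic_load(&lock.slow_paths) == 0 ? counter : -1;
}
```

In the uncontested case every acquire and release completes in
user mode, which is the property the text relies on for
high-throughput shared memory IPC.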
2.2. Toward a Compatible File System Interface
As network or remote file systems have been implemented
for UNIX, several stylized interfaces between the file sys-
tem implementation and the rest of the kernel have been
developed. Among these are Sun Microsystems' Virtual File
System interface (VFS) using vnodes [Sandberg85] [Klei-
man86], Digital Equipment's Generic File System (GFS) archi-
tecture [Rodriguez86], AT&T's File System Switch (FSS) [Rif-
kin86], the LOCUS distributed file system [Walker85], and
Masscomp's extended file system [Cole85]. Other remote file
systems have been implemented in research or university
groups for internal use -- notably the network file system in
the Eighth Edition UNIX system [Weinberger84] and two dif-
ferent file systems used at Carnegie Mellon University
[Satyanarayanan85]. Numerous other remote file access
methods have been devised for use within individual UNIX
processes, many of them by modifications to the C I/O
library similar to those in the Newcastle Connection [Brown-
bridge82].
Each design attempts to isolate file system-dependent
details below a generic interface and to provide a framework
within which new file systems may be incorporated. However,
each of these interfaces is different from and is incompati-
ble with the others. Each addresses somewhat different
design goals, having been based on a different starting ver-
sion of UNIX, having targeted a different set of file sys-
tems with varying characteristics, and having selected a
different set of file system primitive operations.
We have studied the various file system interfaces to
determine their generality, completeness, robustness, effi-
ciency, and aesthetics. Based on this study, we have
developed a proposal for a new file system interface that we
believe includes the best features of each of the existing
implementations. Briefly, the proposal adopts the 4.3BSD
calling convention for name lookup, but otherwise is closely
related to Sun's VFS. A prototype implementation now is
being developed. This proposal and the rationale underlying
its development have been presented to major software ven-
dors as an early step toward convergence on a compatible
file system interface [Karels86].
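The general idea of isolating file system-dependent details
below a generic interface can be sketched as a table of
operations selected per mount point. The C fragment below is
only an illustration: the names fs_ops, mount, and generic_read
and the two toy file systems are hypothetical, and the sketch
reproduces neither the proposed interface of [Karels86] nor Sun's
VFS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operations each file system type supplies (illustrative only). */
struct fs_ops {
    const char *type_name;
    /* Copy up to len bytes of the named file into buf;
     * return the byte count, or -1 if the file does not exist. */
    long (*read)(const char *name, char *buf, size_t len);
};

static long copy_out(const char *data, char *buf, size_t len)
{
    size_t n = strlen(data);

    assert(buf != NULL);
    if (n > len)
        n = len;
    memcpy(buf, data, n);
    return (long)n;
}

/* Toy "local" file system holding a single file. */
static long local_read(const char *name, char *buf, size_t len)
{
    if (strcmp(name, "motd") != 0)
        return -1;
    return copy_out("hello from local disk", buf, len);
}

/* Toy "remote" file system; a real one would talk to a server. */
static long remote_read(const char *name, char *buf, size_t len)
{
    if (strcmp(name, "motd") != 0)
        return -1;
    return copy_out("hello from the server", buf, len);
}

static const struct fs_ops local_fs  = { "local",  local_read  };
static const struct fs_ops remote_fs = { "remote", remote_read };

/* Mount table: the first matching prefix selects the file system. */
struct mount {
    const char *prefix;
    const struct fs_ops *ops;
};

static const struct mount mounts[] = {
    { "/net/", &remote_fs },
    { "/",     &local_fs  },
};

/* Generic layer: knows nothing about any file system's internals. */
long generic_read(const char *path, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof mounts / sizeof mounts[0]; i++) {
        size_t plen = strlen(mounts[i].prefix);

        if (strncmp(path, mounts[i].prefix, plen) == 0)
            return mounts[i].ops->read(path + plen, buf, len);
    }
    return -1;
}
```

Adding a file system type in this sketch means supplying one
more fs_ops table; the generic layer is unchanged, which is the
property all of the surveyed interfaces aim for.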
2.3. Changes to the Protocol Layering Interface
The original work on restructuring the UNIX character
I/O system to allow flexible configuration of the internal
modules by user processes was done at Bell Laboratories
[Ritchie84]. Known as stackable line disciplines, these
interfaces allowed a user process to open a raw terminal
port and then push on appropriate processing modules (such
as one to do line editing). This model allowed terminal pro-
cessing modules to be used with virtual-circuit network
modules to create ``network virtual terminals'' by stacking
a terminal processing module on top of a networking proto-
col.
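The stacking model can be illustrated with a small user-level
sketch in C. It is not the Eighth Edition implementation: the
names stream, stream_push, stream_input, line_edit, and strip_cr
are hypothetical, and each module is reduced to a function that
rewrites a buffer in place. Input from the device passes upward
through the modules in the order in which they were pushed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 8

/* A processing module rewrites the buffer in place and returns
 * the new length. */
typedef size_t (*module_fn)(char *buf, size_t len);

struct stream {
    module_fn modules[MAX_MODULES];  /* modules[0] is nearest the device */
    size_t depth;
};

/* Push a module on top of the stack; returns 0, or -1 if full. */
int stream_push(struct stream *s, module_fn m)
{
    assert(s != NULL && m != NULL);
    if (s->depth == MAX_MODULES)
        return -1;
    s->modules[s->depth++] = m;
    return 0;
}

/* Data arriving from the device passes through every module,
 * strictly in sequence. */
size_t stream_input(const struct stream *s, char *buf, size_t len)
{
    for (size_t i = 0; i < s->depth; i++)
        len = s->modules[i](buf, len);
    return len;
}

/* Line editing: a backspace erases the preceding character. */
size_t line_edit(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\b') {
            if (out > 0)
                out--;
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}

/* Drop carriage returns, as a network terminal module might. */
size_t strip_cr(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] != '\r')
            buf[out++] = buf[i];
    return out;
}
```

Pushing strip_cr and then line_edit onto an empty stream and
feeding it the six bytes a, b, backspace, c, carriage return,
newline leaves the three bytes a, c, newline. The single array of
modules also makes the linear shape of the model visible: there
is no place for one module to feed several others.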
The design of the networking facilities for 4.2BSD took
a different approach based on the socket interface. This
design allows a single system to support multiple sets of
networking protocols with stream, datagram, and other types
of access. Protocol modules may deal with multiplexing of
data from different connections onto a single transport
medium.
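The difference between stream and datagram access can be seen
from user level with the socket calls that descend from 4.2BSD.
The C sketch below assumes the socketpair call in the AF_UNIX
domain as provided by present-day systems; the helper name
first_read_size is hypothetical. It writes two three-byte
messages and reports how many bytes the first read returns.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected pair of sockets of the given type, send
 * "abc" and then "def", and return the size of the first read
 * on the other end, or -1 on error. */
long first_read_size(int type)
{
    int sv[2];
    char buf[16];
    long n = -1;

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    if (write(sv[0], "abc", 3) == 3 && write(sv[0], "def", 3) == 3)
        n = (long)read(sv[1], buf, sizeof buf);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first read returns exactly one message of
three bytes, because datagram sockets preserve message
boundaries. With SOCK_STREAM the data form a byte stream, so the
first read may return anything from one byte up to all six.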
A problem with stackable line disciplines, though, is
that they are inherently linear in nature. Thus, they do not
adequately model the fan-in and fan-out associated with mul-
tiplexing. The simple and elegant stackable line discipline
implementation of Eighth Edition UNIX was converted to the
full production implementation of Streams in System V
Release 3. In doing the conversion, many pragmatic issues
were addressed, including the handling of multiplexed con-
nections and commercially important protocols. Unfor-
tunately, the implementation complexity increased enor-
mously.
Because AT&T will not allow others to include Streams
unless they also change their interface to comply with the
System V Interface Definition base and Networking Extension,
we cannot use the Release 3 implementation of Streams in the
Berkeley system. Given that compatibility thus will be dif-
ficult, we feel we will have complete freedom to make our
choices based solely on technical merits. As a result, our
implementation will appear far more like the simpler stack-
able line disciplines than the more complex Release 3
Streams [Chandler86]. A socket interface will be used rather
than a character device interface, and demultiplexing will
be handled internally by the protocols in the kernel. How-
ever, like Streams, the interfaces between kernel protocol
modules will follow a uniform convention.
3. References
Babaoglu79 Babaoglu, O., W. Joy, ``Data Structures
Added in the Berkeley Virtual Memory
Extensions to the UNIX Operating System''
Computer Systems Research Group, Dept of
EECS, University of California, Berkeley,
CA 94720, USA, November 1979.
Brownbridge82 Brownbridge, D.R., L.F. Marshall, B. Ran-
dell, ``The Newcastle Connection, or
UNIXes of the World Unite!,'' Software --
Practice and Experience, Vol. 12, pp.
1147-1162, 1982.
Chandler86 Chandler, D., ``The Monthly Report - Up
the Streams Without a Standard'', UNIX
Review, Vol. 4, No. 9, pp. 6-14, September
1986.
Cole85 Cole, C.T., P.B. Flinn, A.B. Atlas, ``An
Implementation of an Extended File System
for UNIX,'' Usenix Conference Proceedings,
pp. 131-150, June, 1985.
Joy83 Joy, W., E. Cooper, R. Fabry, S. Leffler,
M. McKusick, D. Mosher, ``4.2BSD System
Manual,'' 4.2BSD UNIX Programmer's Manual,
Vol. 2c, Document #68, August 1983.
Karels86 Karels, M., M. McKusick, ``Towards a Com-
patible File System Interface,'' Proceed-
ings of the European UNIX Users Group
Meeting, Manchester, England, pp. 481-496,
September 1986.
Kleiman86 Kleiman, S., ``Vnodes: An Architecture for
Multiple File System Types in Sun UNIX,''
Usenix Conference Proceedings, pp. 238-
247, June, 1986.
Leffler84 Leffler, S., M.K. McKusick, M. Karels,
``Measuring and Improving the Performance
of 4.2BSD,'' Usenix Conference Proceed-
ings, pp. 237-252, June, 1984.
McKusick85 McKusick, M.K., M. Karels, S. Leffler,
``Performance Improvements and Functional
Enhancements in 4.3BSD,'' Usenix Confer-
ence Proceedings, pp. 519-531, June, 1985.
McKusick86 McKusick, M., M. Karels, ``A New Virtual
Memory Implementation for Berkeley UNIX,''
Proceedings of the European UNIX Users
Group Meeting, Manchester, England, pp.
451-460, September 1986.
Someren84 Someren, J. van, ``Paging in Berkeley
UNIX,'' Laboratorium voor schakeltechniek
en techniek v.d. informatieverwerkende
machines, Codenummer 051560-44(1984)01,
February 1984.
Rifkin86 Rifkin, A.P., M.P. Forbes, R.L. Hamilton,
M. Sabrio, S. Shah, K. Yueh, ``RFS Archi-
tectural Overview,'' Usenix Conference
Proceedings, pp. 248-259, June, 1986.
Ritchie74 Ritchie, D.M., K. Thompson, ``The UNIX
Time-Sharing System,'' Communications of
the ACM, Vol. 17, pp. 365-375, July, 1974.
Ritchie84 Ritchie, D.M., ``A Stream Input-Output
System,'' AT&T Bell Laboratories Technical
Journal, Vol. 63, No. 8, Part 2, pp. 1897-
1910, October 1984.
Rodriguez86 Rodriguez, R., M. Koehler, R. Hyde, ``The
Generic File System,'' Usenix Conference
Proceedings, pp. 260-269, June, 1986.
Sandberg85 Sandberg, R., D. Goldberg, S. Kleiman, D.
Walsh, B. Lyon, ``Design and Implementa-
tion of the Sun Network File System,''
Usenix Conference Proceedings, pp. 119-
130, June, 1985.
Satyanarayanan85 Satyanarayanan, M., et al., ``The ITC Dis-
tributed File System: Principles and
Design,'' Proc. 10th Symposium on Operat-
ing Systems Principles, pp. 35-50, ACM,
December, 1985.
Walker85 Walker, B.J. and S.H. Kiser, ``The LOCUS
Distributed File System,'' The LOCUS Dis-
tributed System Architecture, G.J. Popek
and B.J. Walker, ed., The MIT Press, Cam-
bridge, MA, 1985.
Weinberger84 Weinberger, P.J., ``The Version 8 Network
File System,'' Usenix Conference presenta-
tion, June, 1984.