MirOS Manual: diskperf(PAPERS)


     Performance Effects of Disk Subsystem Choices for
             VAX|- Systems Running 4.2BSD UNIX*

                   Revised July 27, 1983

                         Bob Kridle

                          mt Xinu
                      2560 9th Street
                         Suite #312
                Berkeley, California  94710

                  Marshall Kirk McKusick=

              Computer Systems Research Group
                 Computer Science Division
 Department of Electrical Engineering and Computer Science
             University of California, Berkeley
                    Berkeley, CA  94720

                          ABSTRACT

          Measurements were made of the UNIX file  sys-
     tem  throughput  for  various I/O operations using
     the most attractive currently available Winchester
     disks  and controllers attached to both the native
     busses  (SBI/CMI)  and  the  UNIBUS  on  both  VAX
     11/780s  and  VAX 11/750s. The tests were designed
     to highlight the performance of  single  and  dual
     drive subsystems operating in the 4.2BSD fast file
     system environment. Many of  the  results  of  the
     tests   were   initially   counter-intuitive   and
     revealed several  important  aspects  of  the  VAX
     implementations which were surprising to us.

          The hardware used included two  Fujitsu 2351A
     ``Eagle''  disk  drives  on  each  of two foreign-
     vendor disk controllers and  two  DEC  RA-81  disk
_________________________
-VAX, UNIBUS, and MASSBUS  are  trademarks  of  Digital
Equipment Corporation.
* UNIX is a trademark of Bell Laboratories.
=This work was supported under grants from the National
Science Foundation under grant MCS80-05144, and the De-
fense Advance Research Projects Agency (DoD) under Arpa
Order  No.  4031  monitored  by Naval Electronic System
Command under Contract No. N00039-82-C-0235.

                           - 2 -

     drives  on  a  DEC  UDA-50  disk  controller.  The
     foreign-vendor  controllers  were  Emulex   SC750,
     SC780  and  Systems  Industries  9900  native  bus
     interfaced controllers. The DEC UDA-50  controller
     is  a  UNIBUS  interfaced,  heavily  buffered con-
     troller which is the first implementation of a new
     DEC storage system architecture, DSA.

          One of the  most  important  results  of  our
     testing  was  the  correction  of  several  timing
     parameters in our device handler for devices  with
     an  RH750/RH780  type  interface  and  having high
     burst transfer  rates.  The  correction  of  these
     parameters  resulted in an increase in performance
     of over twenty percent in some cases. In addition,
     one  of the controller manufacturers altered their
     bus arbitration scheme to produce another increase
     in throughput.

Performance                - i -                    Contents

                     TABLE OF CONTENTS

1.  Motivation

2.  Equipment
2.1.    DEC UDA50 disk controller
2.2.    Emulex SC750/SC780 disk controllers
2.3.    Systems Industries 9900 disk controller
2.4.    DEC RA81 disk drives
2.5.    Fujitsu 2351A disk drives

3.  Methodology

4.  Tests

5.  Results

6.  Conclusions

Acknowledgements

References

Appendix A
A.1.     read_8192
A.2.     write_4096
A.3.     write_8192
A.4.     rewrite_8192

Performance                - 1 -                  Motivation

1. Motivation

     These benchmarks were performed  for  several  reasons.
Foremost was our desire to obtain guideline to aid in choos-
ing one the most expensive components of any VAX UNIX confi-
guration,  the  disk storage system. The range of choices in
this area has increased dramatically in the last  year.  DEC
has  become, with the introduction of the UDA50/RA81 system,
cost competitive in the area of disk storage for  the  first
time.  Emulex's  entry  into  the  VAX 11/780 SBI controller
field, the SC780, represented a important choice for  us  to
examine,  given  our  previous success with their VAX 11/750
SC750 controller and their UNIBUS controllers.  The  Fujitsu
2351A  Winchester disk drive represents the lowest cost-per-
byte disk storage known to us. In addition, Fujitsu's  repu-
tation  for  reliability  was appealing. The many attractive
aspects of these components justified a more careful  exami-
nation of their performance aspects under UNIX.

     In addition to the direct motivation of  developing  an
effective  choice  of storage systems, we hoped to gain more
insight into VAX UNIX file system  and  I/O  performance  in
general.  What generic characteristics of I/O subsystems are
most important? How important is the location  of  the  con-
troller  on  the  SBI/CMI  versus  the  UNIBUS? Is extensive
buffering in the controller essential or even important? How
much  can  be  gained  by putting more of the storage system
management and optimization function in  the  controller  as
DEC does with the UDA50?

     We also wanted to resolve particular speculation  about
the  value of storage system optimization by a controller in
a UNIX environment. Is the access optimization as  effective
as  that already provided by the existing 4.2BSD UNIX device
handlers for traditional disks? VMS disk handlers do no seek
optimization.  This  gives the UDA50 controller an advantage
over other controllers under VMS which is not likely  to  be
as  important  to  UNIX. Are there penalties associated with
greater intelligence in the controller?

     A third and last reason for evaluating  this  equipment
is  comparable  to  the  proverbial mountain climbers answer
when asked why he climbs a  particular  mountain,  ``It  was
there.''  In our case the equipment was there. We were lucky
enough to assemble all the desired disks and controllers and
get  them  installed  on a temporarily idle VAX 11/780. This
got us started collecting data. Although many of  the  tests
were later rerun on a variety of other systems, this initial
test bed was essential for working out the testing bugs  and
getting our feet wet.

Performance                - 2 -                   Equipment

2. Equipment

     Various combinations of the  three  manufacturers  disk
controllers,  and  two  pairs of Winchester disk drives were
tested on both VAX 11/780 and VAX 11/750  CPUs.  The  Emulex
and  Systems  Industries disk controllers were interfaced to
Fujitsu 2351A ``Eagle'' 404 Megabyte disk  drives.  The  DEC
UDA50  disk  controller  was  interfaced to two DEC RA81 456
Megabyte Winchester disk drives. All three controllers  were
tested  on the VAX 780 although only the Emulex and DEC con-
trollers were benchmarked on the VAX 11/750. Systems  Indus-
tries makes a VAX 11/750 CMI interface for their controller,
but we did not have time to test this device.  In  addition,
not  all  the  storage  systems  were  tested  for two drive
throughput. Each of the controllers and disk drives used  in
the benchmarks is described briefly below.

2.1. DEC UDA50 disk controller

     This is a new controller design  which  is  part  of  a
larger,  long  range  storage  architecture  referred  to as
``DSA'' or Digital Storage Archetecture. An important aspect
of  DSA  is migrating a large part of the storage management
previously handled in the operating system  to  the  storage
system.  Thus,  the  UDA50  is  a much more intelligent con-
troller than previous interfaces like the  RH750  or  RH780.
The  UDA50  handles all error correction. It also deals with
most of the physical storage parameters.  Typically,  system
software requests a logical block or sequence of blocks. The
physical locations of these blocks, their head,  track,  and
cylinder  indices,  are  determined  by  the controller. The
UDA50 also orders disk requests to maximize throughput where
possible, minimizing total seek and rotational delays. Where
multiple drives are attached to  a  single  controller,  the
UDA50 can interleave simultaneous data transfers from multi-
ple drives.

     The UDA50 is a UNIBUS  implementation  of  a  DSA  con-
troller.  It  contains  52  sectors of internal buffering to
minimize the effects of a slow UNIBUS such as the one on the
VAX-11/780.  This  buffering  also  minimizes the effects of
contention with other UNIBUS peripherals.

2.2. Emulex SC750/SC780 disk controllers

     These two models of the same  controller  interface  to
the CMI bus of a VAX 11/750 and the SBI bus of a 11/VAX 780,
respectively. To the operating system, they  emulate  either
an  RH750  or  and  RH780.  The  controllers  install in the
MASSBUS locations in the CPU cabinets and operate  from  the
VAX power suplies. They provide an ``SMD'' or Storage Module
Drive interface to the disk drives. Although a large  number
of  disk drives use this interface, we tested the controller
exclusively connected to Fujitsu 2351A disks.

Performance                - 3 -                   Equipment

     The controller ws first implemented for the  VAX-11/750
as the SC750 model several years ago. Although the SC780 was
introduced more recently, both are stable products  with  no
bugs known to us.

2.3. System Industries 9900 disk controller

     This controller is an evolution of the S.I. 9400  first
introduced  as  a  UNIBUS  SMD  interface. The 9900 has been
enhanced to include an interface to the  VAX  11/780  native
bus,  the  SBI.  It  has  also been upgraded to operate with
higher data rate drives such as the Fujitsu 2351As  we  used
in  this  test. The controller is contained in its own rack-
mounted drawer with an integral power supply. The  interface
to  the  SMD  is  a  four  module  set which mounts in a CPU
cabinet slot normally occupied by an RH780. The  SBI  inter-
face derives power from the VAX CPU cabinet power supplies.

2.4. DEC RA81 disk drives

     The RA81 is a rack-mountable 456  Megabyte  (formatted)
Winchester  disk  drive  manufactured  by DEC. It includes a
great deal of technology which is an integral  part  of  the
DEC  DSA  scheme.  The  novel  technology  includes a serial
packet based communications  protocol  with  the  controller
over  a  pair  of  mini-coaxial cables. The physical charac-
teristics of the RA81 are shown in the table below:

               ______________________________________
              | DEC RA81 Disk Drive Characteristics |
              |_____________________________________|
              | Peak Transfer Rate   2.2 Mbytes/sec.|
              | Rotational Speed     3,600 RPM      |
              | Data Sectors/Track   51             |
              | Logical Cylinders    1,248          |
              | Logical Data Heads   14             |
              | Data Capacity        456 Mbytes     |
              | Minimum Seek Time    6 milliseconds |
              | Average Seek Time    28 milliseconds|
              | Maximum Seek Time    52 milliseconds|
              |_____________________________________|

2.5. Fujitsu 2351A disk drives

     The Fujitsu 2351A disk drive is a Winchester disk drive
with  an  SMD  controller interface. Fujitsu has developed a
very good reputation for reliable storage products over  the
last  several  years.  The  2351A has the following physical
characteristics:

Performance                - 4 -                   Equipment

             __________________________________________
            | Fujitsu 2351A Disk Drive Characteristics|
            |_________________________________________|
            | Peak Transfer Rate   1.859 Mbytes/sec.  |
            | Rotational Speed     3,961 RPM          |
            | Data Sectors/Track   48                 |
            | Cylinders            842                |
            | Data Heads           20                 |
            | Data Capacity        404 Mbytes         |
            | Minimum Seek Time    5 milliseconds     |
            | Average Seek Time    18 milliseconds    |
            | Maximum Seek Time    35 milliseconds    |
            |_________________________________________|

Performance                - 5 -                 Methodology

3. Methodology

     Our goal was to evaluate the performance of the  target
peripherals  in  an environment as much like our 4.2BSD UNIX
systems as possible.  There  are  two  basic  approaches  to
creating  this  kind  of  test  environment.  These might be
termed the indirect and the direct  approach.  The  approach
used by DEC in producing most of the performance data on the
UDA50/RA81 system under VMS is what  we  term  the  indirect
approach. We chose to use the direct approach.

     The indirect approach used by DEC involves  two  steps.
First,  the  environment  in  which  performance  is  to  be
evaluated is parameterized.  In  this  case,  the  disk  I/O
characteristics  of VMS were measured as to the distribution
of various sizes of accesses and the proportion of reads and
writes.  This  parameterization  of typical I/O activity was
termed a ``vax mix.'' The second stage  involves  simulating
this mixture of I/O activities with the devices to be tested
and noting the total volume of  transactions  processed  per
unit time by each system.

     The problems encountered with  this  indirect  approach
often  have  to  do with the completeness and correctness of
the parameterization of the context environment.  For  exam-
ple, the ``vax mix'' model constructed for DECs tests uses a
random distribution of seeks to the blocks read or  written.
It  is  not likely that any real system produces a distribu-
tion of disk transfer locations which is  truly  random  and
does not exhibit strong locality characteristics.

     The methodology chosen by us is  direct  in  the  sense
that  it  uses the standard structured file system mechanism
present in the 4.2BSD UNIX operating system  to  create  the
sequence  of  locations and sizes of reads and writes to the
benchmarked equipment. We simply  create,  write,  and  read
files  as they would be by user's activities. The disk space
allocation and disk cacheing mechanism built  into  UNIX  is
used  to  produce the actual device reads and writes as well
as to determine their size and  location  on  the  disk.  We
measure  and  compare the rate at which these user files can
be written, rewritten, or read.

     The advantage of this approach is the implicit accuracy
in  testing  in the same environment in which the peripheral
will be used. Although this system does not account for  the
I/O produced by some paging and swapping, in our memory rich
environment these activities account for a relatively  small
portion of the total disk activity.

     A more significant disadvantage to the direct  approach
is  the  occasional difficulty we have in accounting for our
measured results. The apparently  straight-forward  activity
of  reading  or writing a logical file on disk can produce a

Performance                - 6 -                 Methodology

complex mixture of disk traffic. File I/O is supported by  a
file  management system that buffers disk traffic through an
internal cache, which allows writes to ba handled  asynchro-
nously.  Reads must be done synchronously, however this res-
triction is  moderated  by  the  use  of  read-ahead.  Small
changes  in the performance of the disk controller subsystem
can result in large and unexpected changes in the file  sys-
tem performance, as it may change the characteristics of the
memory contention experienced by the processor.

Performance                - 7 -                       Tests

4. Tests

     Our  battery  of  tests  consists  of  four   programs,
read_8192,  write_8192,  write_4096  and rewrite_8192 origi-
nally written by [McKusick83] to evaluate the performance of
the new file system in 4.2BSD. These programs all follow the
same model and are typified by read_8192 shown here.

        #define BUFSIZ 8192
        main( argc, argv)
        char **argv;
        {
                char buf[BUFSIZ];
                int i, j;

                j = open(argv[1], 0);
                for (i = 0; i < 1024; i++)
                        read(j, buf, BUFSIZ);
        }

The remaining programs are included in appendix A.

     These programs read, write with two different  blocking
factors, and rewrite logical files in structured file system
on the disk under test. The write programs create new  files
while  the rewrite program overwrites an existing file. Each
of these programs represents an  important  segment  of  the
typical  UNIX  file  system  activity  with the read program
representing by far the largest class and  the  rewrite  the
smallest.

     A blocking factor of  8192  is  used  by  all  programs
except  write_4096. This is typical of most 4.2BSD user pro-
grams since a standard set of I/O support routines  is  com-
monly  used  and these routines buffer data in similar block
sizes.

     For each test run, a empty eight  Kilobyte  block  file
system  was  created in the target storage system. Then each
of the four tests was run and timed. Each test was run three
times;  the first to clear out any useful data in the cache,
and the second two to insure that the  experiment  had  sta-
blized and was repeatable. Each test operated on eight Mega-
bytes of data to insure that the cache did not overly influ-
ence  the  results. Another file system was then initialized
using a basic blocking factor of four Kilobytes and the same
tests  were  run again and timed. A command script for a run
appears as follows:

Performance                - 8 -                       Tests

        #!/bin/csh
        set time=2
        echo "8K/1K file system"
        newfs /dev/rhp0g eagle
        mount /dev/hp0g /mnt0
        mkdir /mnt0/foo
        echo "write_8192 /mnt0/foo/tst2"
        rm -f /mnt0/foo/tst2
        write_8192 /mnt0/foo/tst2
        rm -f /mnt0/foo/tst2
        write_8192 /mnt0/foo/tst2
        rm -f /mnt0/foo/tst2
        write_8192 /mnt0/foo/tst2
        echo "read_8192 /mnt0/foo/tst2"
        read_8192 /mnt0/foo/tst2
        read_8192 /mnt0/foo/tst2
        read_8192 /mnt0/foo/tst2
        umount /dev/hp0g

Performance                - 9 -                     Results

5. Results

     The following tables indicate the results of  our  test
runs. Note that each table contains results for tests run on
two varieties of 4.2BSD  file  systems.  The  first  set  of
results  is  always  for a file system with a basic blocking
factor of eight Kilobytes and a fragment size of 1 Kilobyte.
The  second sets of measurements are for file systems with a
four Kilobyte block size and a one Kilobyte  fragment  size.
The  values  in  parenthesis  indicate the percentage of CPU
time used by the test program. In the case of the  two  disk
arm tests, the value in parenthesis indicates the sum of the
percentage of the test programs that were  run.  Entries  of
``n. m.'' indicate this value was not measured.

   ______________________________________________________________
  |            4.2BSD File Systems Tests - VAX 11/750           |
  |______________________________________________________________|
  |                Logically Sequential Transfers               |
  |        from an 8K/1K 4.2BSD File System (Kbytes/sec.)       |
  |_____________|_______________________|_______________________|
  | Test        |  Emulex SC750/Eagle   |  UDA50/RA81           |
  |             |                       |                       |
  |             |  1 Drive     2 Drives |  1 Drive     2 Drives |
  |_____________|_______________________|_______________________|
  | read_8192   |  490 (69%)   620 (96%)|  310 (44%)   520 (65%)|
  | write_4096  |  380 (99%)   370 (99%)|  370 (97%)   360 (98%)|
  | write_8192  |  470 (99%)   470 (99%)|  320 (71%)   410 (83%)|
  | rewrite_8192|  650 (99%)   620 (99%)|  310 (50%)   450 (70%)|
  |______________________________________________________________|
  |                Logically Sequential Transfers               |
  |         from 4K/1K 4.2BSD File System (Kbytes/sec.)         |
  |_____________|_______________________|_______________________|
  | Test        |  Emulex SC750/Eagle   |  UDA50/RA81           |
  |             |                       |                       |
  |             |  1 Drive     2 Drives |  1 Drive     2 Drives |
  |_____________|_______________________|_______________________|
  | read_8192   |  300 (60%)   400 (84%)|  210 (42%)   340 (77%)|
  | write_4096  |  320 (98%)   320 (98%)|  220 (67%)   290 (99%)|
  | write_8192  |  340 (98%)   340 (99%)|  220 (65%)   310 (98%)|
  | rewrite_8192|  450 (99%)   450 (98%)|  230 (47%)   340 (78%)|
  |_____________|_______________________|_______________________|

     Note that the rate  of  write  operations  on  the  VAX
11/750  are  ultimately CPU limited in some cases. The write
rates saturate the CPU at a lower bandwidth than  the  reads
because  they  must do disk allocation in addition to moving
the data from the user program to the disk.  The  UDA50/RA81
saturates  the  CPU  at  a  lower  transfer rate for a given
operation than the SC750/Eagle because it causes more memory
contention  with  the CPU. We do not know if this contention

Performance                - 10 -                    Results

is caused by the UNIBUS controller or the UDA50.

     The following table reports the results of test runs on
a VAX 11/780 with 4 Megabytes of main memory.

_________________________________________________________________________________________
|                        4.2BSD File Systems Tests - VAX 11/780                         |
_|________________________________________________________________________________________|
|                            Logically Sequential Transfers                             |
|                    from an 8K/1K 4.2BSD File System (Kbytes/sec.)                     |
|____________|________________________|_________________________|_______________________|
|Test        |  Emulex SC780/Eagle    |  UDA50/RA81             |  Sys. Ind. 9900/Eagle |
|            |                        |                         |                       |
|            |  1 Drive      2 Drives |  1 Drive      2 Drives  |  1 Drive     2 Drives |
|____________|________________________|_________________________|_______________________|
|read_8192   |  560 (70%)    480 (58%)|  360 (45%)    540 (72%) |  340 (41%)   520 (66%)|
|write_4096  |  440 (98%)    440 (98%)|  380 (99%)    480 (96%) |  490 (96%)   440 (84%)|
|write_8192  |  490 (98%)    490 (98%)|  220 (58%)*   480 (92%) |  490 (80%)   430 (72%)|
|rewrite_8192|  760 (100%)   560 (72%)|  220 (50%)*   180 (52%)*|  490 (60%)   520 (62%)|
_|________________________________________________________________________________________|
|                            Logically Sequential Transfers                             |
|                    from an 4K/1K 4.2BSD File System (Kbytes/sec.)                     |
|____________|________________________|_________________________|_______________________|
|Test        |  Emulex SC780/Eagle    |  UDA50/RA81             |  Sys. Ind. 9900/Eagle |
|            |                        |                         |                       |
|            |  1 Drive      2 Drives |  1 Drive      2 Drives  |  1 Drive     2 Drives |
|____________|________________________|_________________________|_______________________|
|read_8192   |  490 (77%)    370 (66%)|     n.m.         n.m.   |  200 (31%)   370 (56%)|
|write_4096  |  380 (98%)    370 (98%)|     n.m.         n.m.   |  200 (46%)   370 (88%)|
|write_8192  |  380 (99%)    370 (97%)|     n.m.         n.m.   |  200 (45%)   320 (76%)|
|rewrite_8192|  490 (87%)    350 (66%)|     n.m.         n.m.   |  200 (31%)   300 (46%)|
|____________|________________________|_________________________|_______________________|

        * the operation of the hardware was suspect during these tests.

     The dropoff in reading and writing rates  for  the  two
drive  SC780/Eagle tests are probably due to the file system
using insufficient rotational delay for these tests. We have
not fully investigated these times.

     The following table compares data rates on VAX  11/750s
directly  with  those  of  VAX  11/780s using the UDA50/RA81
storage system.

Performance                - 11 -                    Results

 _________________________________________________________________
|       4.2BSD File Systems Tests - DEC UDA50 - 750 vs. 780      |
|_________________________________________________________________|
|                 Logically Sequential Transfers                 |
|         from an 8K/1K 4.2BSD File System (Kbytes/sec.)         |
|_____________|________________________|_________________________|
| Test        |  VAX 11/750 UNIBUS     |  VAX 11/780 UNIBUS      |
|             |                        |                         |
|             |  1 Drive     2 Drives  |  1 Drive      2 Drives  |
|_____________|________________________|_________________________|
| read_8192   |  310 (44%)   520 (84%) |  360 (45%)    540 (72%) |
| write_4096  |  370 (97%)   360 (100%)|  380 (99%)    480 (96%) |
| write_8192  |  320 (71%)   410 (96%) |  220 (58%)*   480 (92%) |
| rewrite_8192|  310 (50%)   450 (80%) |  220 (50%)*   180 (52%)*|
|_________________________________________________________________|
|                 Logically Sequential Transfers                 |
|         from an 4K/1K 4.2BSD File System (Kbytes/sec.)         |
|_____________|________________________|_________________________|
| Test        |  VAX 11/750 UNIBUS     |  VAX 11/780 UNIBUS      |
|             |                        |                         |
|             |  1 Drive     2 Drives  |  1 Drive      2 Drives  |
|_____________|________________________|_________________________|
| read_8192   |  210 (42%)   342 (77%) |     n.m.         n.m.   |
| write_4096  |  215 (67%)   294 (99%) |     n.m.         n.m.   |
| write_8192  |  215 (65%)   305 (98%) |     n.m.         n.m.   |
| rewrite_8192|  227 (47%)   336 (78%) |     n.m.         n.m.   |
|_____________|________________________|_________________________|

        * the operation of the hardware was suspect during these tests.

     The higher throughput available on VAX 11/780s  is  due
to a number of factors. The larger main memory size allows a
larger file system cache. The block allocation routines  run
faster, raising the upper limit on the data rates in writing
new files.

     The next table makes the same comparison using an  Emu-
lex controller on both systems.

Performance                - 12 -                    Results

  _______________________________________________________________
 |       4.2BSD File Systems Tests - Emulex - 750 vs. 780       |
 |_______________________________________________________________|
 |                Logically Sequential Transfers                |
 |        from an 8K/1K 4.2BSD File System (Kbytes/sec.)        |
 |_____________|_______________________|________________________|
 | Test        |  VAX 11/750 CMI Bus   |  VAX 11/780 SBI Bus    |
 |             |                       |                        |
 |             |  1 Drive     2 Drives |  1 Drive      2 Drives |
 |_____________|_______________________|________________________|
 | read_8192   |  490 (69%)   620 (96%)|  560 (70%)    480 (58%)|
 | write_4096  |  380 (99%)   370 (99%)|  440 (98%)    440 (98%)|
 | write_8192  |  470 (99%)   470 (99%)|  490 (98%)    490 (98%)|
 | rewrite_8192|  650 (99%)   620 (99%)|  760 (100%)   560 (72%)|
 |_______________________________________________________________|
 |                Logically Sequential Transfers                |
 |        from an 4K/1K 4.2BSD File System (Kbytes/sec.)        |
 |_____________|_______________________|________________________|
 | Test        |  VAX 11/750 CMI Bus   |  VAX 11/780 SBI Bus    |
 |             |                       |                        |
 |             |  1 Drive     2 Drives |  1 Drive      2 Drives |
 |_____________|_______________________|________________________|
 | read_8192   |  300 (60%)   400 (84%)|  490 (77%)    370 (66%)|
 | write_4096  |  320 (98%)   320 (98%)|  380 (98%)    370 (98%)|
 | write_8192  |  340 (98%)   340 (99%)|  380 (99%)    370 (97%)|
 | rewrite_8192|  450 (99%)   450 (98%)|  490 (87%)    350 (66%)|
 |_____________|_______________________|________________________|

     The following table illustrates the  evolution  of  our
testing  process  as  both  hardware  and  software problems
effecting  the  performance  of  the   Emulex   SC780   were
corrected. The software change was suggested to us by George
Goble of Purdue University.

     The 4.2BSD  handler  for  RH750/RH780  interfaced  disk
drives  contains  several  constants  which to determine how
much time is provided between  an  interrupt  signaling  the
completion of a positioning command and the subsequent start
of a data transfer operation. These lead times are expressed
as  sectors  of  rotational delay. If they are too small, an
extra complete rotation will often  be  required  between  a
seek  and subsequent read or write operation. The higher bit
rate and rotational speed of the 2351A Fujitsu  disk  drives
required increasing these constants.

     The hardware  change  involved  allowing  for  slightly
longer  delays  in  arbitrating for cycles on the SBI bus by
starting the bus arbitration cycle a little further ahead of
when  the  data  was  ready  for transfer. Finally we had to
increase the rotational delay between consecutive blocks  in

Performance                - 13 -                    Results

the  file  because  the  higher bandwidth from the disk gen-
erated more memory contention, which slowed down the proces-
sor.

_______________________________________________________________________
| 4.2BSD File Systems Tests - Emulex SC780 Disk Controller Evolution  |
_|______________________________________________________________________|
|                   Logically Sequential Transfers                    |
|           from an 8K/1K 4.2BSD File System (Kbytes/sec.)            |
|__________|______________________|________________|__________________|
|est       |Inadequate Search Lead|OK Search Lead  |OK Search Lead    |
|          |Initial SBI Arbitratio|Init SBI Arb.   |Improved SBI Arb. |
|          |                      |                |                  |
|          |1 Drive    2 Drives   | 1 Drive 2 Drive| 1 Drive  2 Drives|
|__________|______________________|________________|__________________|
|ead_8192  |  320        370      |440 (60%)  n.m. |560 (70%) 480 (58%|
|rite_4096 |  250        270      |300 (63%)  n.m. |440 (98%) 440 (98%|
|rite_8192 |  250        280      |340 (60%)  n.m. |490 (98%) 490 (98%|
|ewrite_819|  250        290      |380 (48%)  n.m. |760 (100%)560 (72%|
_|______________________________________________________________________|
|                   Logically Sequential Transfers                    |
|           from an 4K/1K 4.2BSD File System (Kbytes/sec.)            |
|__________|______________________|________________|__________________|
|est       |Inadequate Search Lead|OK Search Lead  |OK Search Lead    |
|          |Initial SBI Arbitratio|Init SBI Arb.   |Improved SBI Arb. |
|          |                      |                |                  |
|          |1 Drive    2 Drives   | 1 Drive 2 Drive| 1 Drive  2 Drives|
|__________|______________________|________________|__________________|
|ead_8192  |  200        220      |   280     n.m. |490 (77%) 370 (66%|
|rite_4096 |  180        190      |   300     n.m. |380 (98%) 370 (98%|
|rite_8192 |  180        200      |   320     n.m. |380 (99%) 370 (97%|
|ewrite_819|  190        200      |   340     n.m. |490 (87%) 350 (66%|
|__________|______________________|________________|__________________|

Performance                - 14 -                Conclusions

6. Conclusions

     Peak available throughput is only one criterion in most
storage  system  purchasing  decisions. Most of the VAX UNIX
systems we are familiar with  are  not  I/O  bandwidth  con-
strained. Nevertheless, an adequate disk bandwidth is neces-
sary for good performance and especially to preserve  snappy
response  time.  All  of  the disk systems we tested provide
more than adequate bandwidth for  typical  VAX  UNIX  system
application. Perhaps in some I/O-intensive applications such
as image processing, more consideration should be  given  to
the  peak  throughput available. In most situations, we feel
that other factors are more important in  making  a  storage
choice  between  the  systems  we tested. Cost, reliability,
availability, and support are some  of  these  factors.  The
maturity  of  the  technology purchased must also be weighed
against the future value and expandability of newer  techno-
logies.

     Two important conclusions about storage systems in gen-
eral  can  be  drawn  from  these  tests.  The first is that
buffering can be effective in smoothing the effects of lower
bus  speeds  and  bus  contention.  Even though the UDA50 is
located on the relatively slow UNIBUS,  its  performance  is
similar  to  controllers  located  on  the  faster processor
busses. However, the SC780 with only one sector of buffering
shows  that little buffering is needed if the underlying bus
is fast enough.

     Placing more intelligence in the  controller  seems  to
hinder  UNIX system performance more than it helps. Our pro-
filing tests have indicated that UNIX spends about the  same
percentage  of time in the SC780 driver and the UDA50 driver
(about 10-14%). Normally UNIX uses  a  disk  sort  algorithm
that  separates reads and writes into two seek order queues.
The read queue has priority  over  the  write  queue,  since
reads  cause  processes  to  block, while writes can be done
asynchronously. This is particularly useful when  generating
large  files,  as  it  allows the disk allocator to read new
disk maps and begin doing new allocations while  the  blocks
allocated  out  of  the  previous  map  are written to disk.
Because the UDA50 handles all block ordering, and because it
keeps  all  requests  in  a single queue, there is no way to
force the longer seek needed to get the next disk map.  This
disfunction causes all the writes to be done before the disk
map read, which idles the disk until a new set of blocks can
be allocated.

     The additional functionality of  the  UDA50  controller
that allows it to transfer simultaneously from two drives at
once tends to make the two drive  transfer  tests  run  much
more  effectively.  Tuning  for  the single drive case works
more effectively in the two drive case than when controllers
that cannot handle simultaneous transfers are used.

Performance                - 15 -           Acknowledgements

Acknowledgements

     We thank Paul Massigilia  and  Bill  Grace  of  Digital
Equipment  Corp  for  helping us run our disk tests on their
UDA50/RA81. We also thank Rich Notari and Paul Ritkowski  of
Emulex  for making their machines available to us to run our
tests of the SC780/Eagles. Dan McKinster,  then  of  Systems
Industries,  arranged  to make their equipment available for
the tests. We appreciate the time provided by Bob Gross, Joe
Wolf, and Sam Leffler on their machines to refine our bench-
marks. Finally we thank our sponsors, the  National  Science
Foundation  under grant MCS80-05144, and the Defense Advance
Research Projects Agency (DoD) under  Arpa  Order  No.  4031
monitored  by Naval Electronic System Command under Contract
No. N00039-82-C-0235.

References

[McKusick83]        M. McKusick,  W.  Joy,  S.  Leffler,  R.
                    Fabry,  ``A Fast File System for UNIX'',
                    ACM Transactions on Computer Systems  2,
                    3. pp 181-197, August 1984.

Performance                - 16 -                 Appendix A

Appendix A

read_8192

        #define BUFSIZ 8192
        main( argc, argv)
        char **argv;
        {
                char buf[BUFSIZ];
                int i, j;

                j = open(argv[1], 0);
                for (i = 0; i < 1024; i++)
                        read(j, buf, BUFSIZ);
        }

write_4096

        #define BUFSIZ 4096
        main( argc, argv)
        char **argv;
        {
                char buf[BUFSIZ];
                int i, j;

                j = creat(argv[1], 0666);
                for (i = 0; i < 2048; i++)
                        write(j, buf, BUFSIZ);
        }

write_8192

        #define BUFSIZ 8192
        main( argc, argv)
        char **argv;
        {
                char buf[BUFSIZ];
                int i, j;

                j = creat(argv[1], 0666);
                for (i = 0; i < 1024; i++)
                        write(j, buf, BUFSIZ);
        }

Performance                - 17 -                 Appendix A

rewrite_8192

        #define BUFSIZ 8192
        main( argc, argv)
        char **argv;
        {
                char buf[BUFSIZ];
                int i, j;

                j = open(argv[1], 2);
                for (i = 0; i < 1024; i++)
                        write(j, buf, BUFSIZ);
        }

Generated on 2014-04-02 20:57:59 by $MirOS: src/scripts/roff2htm,v 1.79 2014/02/10 00:36:11 tg Exp $

These manual pages and other documentation are copyrighted by their respective writers; their source is available at our CVSweb, AnonCVS, and other mirrors. The rest is Copyright © 2002‒2014 The MirOS Project, Germany.
This product includes material provided by Thorsten Glaser.

This manual page’s HTML representation is supposed to be valid XHTML/1.1; if not, please send a bug report – diffs preferred.