MirOS Manual: 15.sed(USD)


            SED - A Non-interactive Text Editor

                       Lee E. McMahon

                          ABSTRACT

          Sed is a non-interactive context editor  that
     runs on the UNIX operating system.

     Sed is designed to be especially useful  in  three
     cases:

     1)  To  edit  files  too  large  for   comfortable
     interactive editing;
     2) To edit any size  file  when  the  sequence  of
     editing commands is too complicated to be comfort-
     ably typed in interactive mode;
     3) To perform multiple `global' editing  functions
     efficiently in one pass through the input.

     This memorandum constitutes a manual for users  of
     sed.

Introduction

Sed is a non-interactive context editor designed to be espe-
cially useful in three cases:

     1) To edit files too large for comfortable  interactive
     editing;
     2) To edit any size file when the sequence  of  editing
     commands  is too complicated to be comfortably typed in
     interactive mode;
     3) To perform multiple `global' editing functions effi-
     ciently in one pass through the input.

Since only a few lines of the input reside in  core  at  one
time, and no temporary files are used, the effective size of
file that can be edited is limited only by  the  requirement
that  the input and output fit simultaneously into available
secondary storage.

_________________________
UNIX is a registered trademark of AT&T Bell Laboratories
in the USA and other countries.

Complicated editing scripts can be  created  separately  and
given  to  sed  as  a  command file. For complex edits, this
saves considerable typing, and  its  attendant  errors.  Sed
running  from a command file is much more efficient than any
interactive editor known to the author, even if that  editor
can be driven by a pre-written script.

The principal losses of function compared to an interactive
editor are the lack of relative addressing (because of the
line-at-a-time operation) and the lack of immediate
verification that a command has done what was intended.

Sed is a lineal descendant of the UNIX editor,  ed.  Because
of  the  differences between interactive and non-interactive
operation, considerable changes have been  made  between  ed
and  sed;  even  confirmed  users  of  ed will frequently be
surprised (and probably chagrined), if they rashly  use  sed
without  reading Sections 2 and 3 of this document. The most
striking family resemblance between the two  editors  is  in
the  class  of  patterns (`regular expressions') they recog-
nize; the code for matching patterns is copied almost verba-
tim  from  the  code  for ed, and the description of regular
expressions in Section 2 is copied almost verbatim from  the
UNIX Programmer's Manual[1]. (Both code and description were
written by Dennis M. Ritchie.)

1. Overall Operation

Sed by default copies the standard  input  to  the  standard
output,  perhaps  performing one or more editing commands on
each line before writing it to the output. This behavior may
be  modified  by  flags on the command line; see Section 1.1
below.

The general format of an editing command is:

     [address1,address2][function][arguments]

One  or  both  addresses  may  be  omitted;  the  format  of
addresses  is  given  in  Section 2. Any number of blanks or
tabs may separate the addresses from the function. The func-
tion  must  be present; the available commands are discussed
in Section 3. The arguments may  be  required  or  optional,
according  to  which function is given; again, they are dis-
cussed in Section 3 under each individual function.

Tab characters and spaces at  the  beginning  of  lines  are
ignored.

1.1. Command-line Flags

Four flags are recognized on the command line:

     -a: tells sed to delay opening files created by  the  w
     function until it is applied to a line of input;
     -e: tells sed to take the next argument as  an  editing
     command;
     -f: tells sed to take the next argument as a file name;
     the  file  should  contain  editing  commands, one to a
     line;
     -n: tells sed not to copy all  lines,  but  only  those
     specified  by  p functions or p flags after s functions
     (see Sections 3.2 and 3.3).

1.2. Order of Application of Editing Commands

Before any editing is done (in fact, before any  input  file
is  even opened), all the editing commands are compiled into
a form which will be moderately efficient during the  execu-
tion  phase (when the commands are actually applied to lines
of the input file). The commands are compiled in  the  order
in  which  they are encountered; this is generally the order
in which they will be attempted at execution time. The  com-
mands  are  applied one at a time; the input to each command
is the output of all preceding commands.

The default linear order of application of editing  commands
can be changed by the flow-of-control commands, t and b (see
Section 3). Even when the order of application is changed by
these  commands, it is still true that the input line to any
command is the output of any previously applied command.
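
The chaining rule can be checked from a shell. This is a
minimal sketch, assuming a POSIX shell and a current sed;
the input words are invented for the illustration.

```shell
# Two commands given with -e are applied in order; the second
# substitution operates on the output of the first.
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')
echo "$chained"    # bird

# Reversing the order changes the result: 's/dog/bird/' finds
# nothing to replace in "cat", so only 's/cat/dog/' has an effect.
reordered=$(printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/')
echo "$reordered"  # dog
```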

1.3.  Pattern-space

The range of pattern matches is called  the  pattern  space.
Ordinarily, the pattern space is one line of the input text,
but more than one line can be read into the pattern space by
using the N command (Section 3.4).

1.4. Examples

Examples are scattered throughout  the  text.  Except  where
otherwise noted, the examples all assume the following input
text:

        In Xanadu did Kubla Khan
        A stately pleasure dome decree:
        Where Alph, the sacred river, ran
        Through caverns measureless to man
        Down to a sunless sea.

(In no case is the output of the sed  commands  to  be  con-
sidered an improvement on Coleridge.)

Example:

The command

     2q

will quit after copying the first two lines  of  the  input.
The output will be:

        In Xanadu did Kubla Khan
        A stately pleasure dome decree:
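
The example can be reproduced from a shell. The sketch below
assumes a POSIX shell and a current sed; the helper function
sample is an illustrative name, not part of sed. It also
shows the -n flag of Section 1.1 together with the p
function.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# Default mode: lines are copied to the output; quit after line 2.
first_two=$(sample | sed 2q)
printf '%s\n' "$first_two"

# With -n nothing is copied by default; only the p function prints.
third_only=$(sample | sed -n -e 3p)
printf '%s\n' "$third_only"   # Where Alph, the sacred river, ran
```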

2. ADDRESSES: Selecting lines for editing

Lines in the input file(s) to which editing commands are  to
be  applied  can  be selected by addresses. Addresses may be
either line numbers or context addresses.

The application of a group of commands can be controlled  by
one  address (or address-pair) by grouping the commands with
curly braces (`{ }') (Section 3.6).

2.1. Line-number Addresses

A line number is a decimal integer. As  each  line  is  read
from  the  input,  a  line-number  counter is incremented; a
line-number address matches (selects) the input  line  which
causes  the  internal  counter  to  equal  the address line-
number. The counter runs cumulatively through multiple input
files; it is not reset when a new input file is opened.

As a special case, the character $ matches the last line  of
the last input file.

2.2. Context Addresses

A  context  address  is  a  pattern  (`regular  expression')
enclosed  in  slashes  (`/'). The regular expressions recog-
nized by sed are constructed as follows:

     1) An ordinary character (not one  of  those  discussed
     below)  is a regular expression, and matches that char-
     acter.
     2) A circumflex `^'  at  the  beginning  of  a  regular
     expression  matches the null character at the beginning
     of a line.
     3) A dollar-sign `$' at the end of a regular expression
     matches the null character at the end of a line.
     4) The characters `\n' match an imbedded newline  char-
     acter,  but  not  the newline at the end of the pattern
     space.
     5) A  period  `.'  matches  any  character  except  the
     terminal newline of the pattern space.
     6) A regular expression followed  by  an  asterisk  `*'
     matches   any   number   (including   0)   of  adjacent
     occurrences of the regular expression it follows.
     7) A string of characters  in  square  brackets  `[  ]'
     matches any character in the string, and no others. If,
     however, the first character of the string  is  circum-
     flex  `^', the regular expression matches any character
     except the characters in the string  and  the  terminal
     newline of the pattern space.
     8) A concatenation of regular expressions is a  regular
     expression  which  matches the concatenation of strings
     matched by the components of the regular expression.
     9) A regular expression between the sequences `\('  and
     `\)'  is  identical  in effect to the unadorned regular
     expression, but has side-effects  which  are  described
     under the s command below and specification 10) immedi-
     ately below.
     10) The expression `\d' means the same string of  char-
     acters  matched  by  an expression enclosed in `\(' and
     `\)' earlier in the same pattern. Here d  is  a  single
     digit;  the string specified is that beginning with the
     dth occurrence of `\('  counting  from  the  left.  For
     example,  the  expression  `^\(.*\)\1'  matches  a line
     beginning with two repeated  occurrences  of  the  same
     string.
     11) The null regular expression standing  alone  (e.g.,
     `//')  is  equivalent  to  the  last regular expression
     compiled.

To use one of the special characters (^ $ . * [ ] \ /) as  a
literal  (to  match  an  occurrence of itself in the input),
precede the special character by a backslash `\'.

For a context address to `match' the input requires that the
whole  pattern  within the address match some portion of the
pattern space.

2.3. Number of Addresses

The commands in the  next  section  can  have  0,  1,  or  2
addresses.  Under each command the maximum number of allowed
addresses is given. It is an error for a command to have
more addresses than the maximum allowed.

If a command has no addresses, it is applied to  every  line
in the input.

If a command has one address, it is  applied  to  all  lines
which match that address.

If a command has two addresses, it is applied to  the  first
line  which matches the first address, and to all subsequent
lines until (and including) the first subsequent line  which
matches  the second address. Then an attempt is made on sub-
sequent lines to again match the first address, and the pro-
cess is repeated.

Two addresses are separated by a comma.
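
The two-address rule, including the restart after a range
ends, can be sketched as follows. This assumes a POSIX shell
and a current sed; the six-line a/b/c input and the helper
name sample are invented for the illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# After a range ends, the first address is tried again, so the
# range /a/,/b/ is selected twice in this input.
ranges=$(printf 'a\nb\nc\na\nb\nc\n' | sed -n '/a/,/b/p')
printf '%s\n' "$ranges"

# A line number and a context address may be combined in one pair:
# from line 2 through the first later line containing "sacred".
mixed=$(sample | sed -n '2,/sacred/p')
printf '%s\n' "$mixed"
```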

Examples:

        /an/            matches lines 1, 3, 4 in our sample text
        /an.*an/        matches line 1
        /^an/           matches no lines
        /./             matches all lines
        /\./            matches line 5
        /r*an/          matches lines 1, 3, 4 (number = zero!)
        /\(an\).*\1/    matches line 1
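
The table above can be checked mechanically. The sketch
below assumes a POSIX shell, a current sed, and the paste
utility; the helper names sample and matches are
illustrative. It uses the = function (Section 3.7) under -n
to print the numbers of the selected lines.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# Print the numbers of the lines selected by one context address,
# joined onto a single line.
matches() {
    sample | sed -n "$1=" | paste -s -d ' ' -
}

for address in '/an/' '/an.*an/' '/^an/' '/./' '/\./' '/r*an/' '/\(an\).*\1/'
do
    printf '%-16s %s\n' "$address" "$(matches "$address")"
done
```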

3. FUNCTIONS

All functions are named by a single character. In  the  fol-
lowing summary, the maximum number of allowable addresses is
given enclosed in parentheses,  then  the  single  character
function  name, possible arguments enclosed in angles (< >),
an expanded  English  translation  of  the  single-character
name,  and finally a description of what each function does.
The angles around the arguments are not part  of  the  argu-
ment, and should not be typed in actual editing commands.

3.1. Whole-line Oriented Functions

(2)d -- delete lines

     The d function deletes from the file (does not write to
     the output) all those lines matched by its address(es).

     It also has the side effect that  no  further  commands
     are  attempted on the corpse of a deleted line; as soon
     as the d function is executed, a new line is read  from
     the  input,  and  the  list  of editing commands is re-
     started from the beginning on the new line.

(2)n -- next line

     The n function reads the  next  line  from  the  input,
     replacing the current line. The current line is written
     to the output if it should be. The list of editing com-
     mands is continued following the n command.

(1)a\ <text> -- append lines

     The a function causes the argument <text> to be written
     to  the  output  after the line matched by its address.
     The a command is inherently multi-line; a\ must appear
     at the end of a line, and <text> may contain any number
     of lines. To preserve  the  one-command-to-a-line  fic-
     tion,  the  interior  newlines  must  be  hidden  by  a
     backslash character  (`\')  immediately  preceding  the
     newline. The <text> argument is terminated by the first
     unhidden newline (the first one  not  immediately  pre-
     ceded by backslash).

     Once an a function  is  successfully  executed,  <text>
     will  be written to the output regardless of what later
     commands  do  to  the  line  which  triggered  it.  The
     triggering  line  may  be deleted entirely; <text> will
     still be written to the output.

     The <text> is not scanned for address matches,  and  no
     editing commands are attempted on it. It does not cause
     any change in the line-number counter.

(1)i\ <text> -- insert lines

     The i function  behaves identically to the a  function,
     except  that <text> is written to the output before the
     matched line. All other comments about the  a  function
     apply to the i function as well.

(2)c\ <text> -- change lines

     The c  function  deletes  the  lines  selected  by  its
     address(es),  and  replaces  them  with  the  lines  in
     <text>. Like a and i, c must be followed by  a  newline
     hidden by a backslash; and interior new lines in <text>
     must be hidden by backslashes.

     The c command may have  two  addresses,  and  therefore
     select  a  range of lines. If it does, all the lines in
     the range are deleted, but only one copy of  <text>  is
     written  to  the output, not one copy per line deleted.
     As with a and i, <text>  is  not  scanned  for  address
     matches,  and  no editing commands are attempted on it.
     It does not change the  line-number counter.

     After a line has been  deleted  by  a  c  function,  no
     further commands are attempted on the corpse.

     If text is appended after a line by a or  r  functions,
     and the line is subsequently changed, the text inserted
     by the c function will be placed before the text of the
     a or r functions. (The r function is described in
     Section 3.3.)

          Note: Within the text put in the output  by  these
          functions, leading blanks and tabs will disappear,
          as always in sed commands. To get  leading  blanks
          and  tabs  into  the  output,  precede  the  first
          desired blank or tab by a backslash; the backslash
          will not appear in the output.

Example:

The list of editing commands:

        n
        a\
        XXXX
        d

applied to our standard input, produces:

        In Xanadu did Kubla Khan
        XXXX
        Where Alph, the sacred river, ran
        XXXX
        Down to a sunless sea.

In this particular case, the same effect would  be  produced
by either of the two following command lists:

        n               n
        i\              c\
        XXXX            XXXX
        d
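
The first command list can be run from a shell by passing it
as one multi-line argument. This is a minimal sketch,
assuming a POSIX shell and a current sed; the helper name
sample and the replacement text in the second command are
invented for the illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# The n / a\ / d list: every second line is replaced by XXXX.
appended=$(sample | sed 'n
a\
XXXX
d')
printf '%s\n' "$appended"

# c with a two-address range: all lines in the range are deleted
# and a single copy of the text is written.
changed=$(sample | sed '2,4c\
[three lines removed]')
printf '%s\n' "$changed"
```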

3.2. Substitute Function

One very important function changes parts of lines  selected
by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute

     The s function replaces part of  a  line  (selected  by
     <pattern>) with <replacement>. It can best be read:

          Substitute for <pattern>, <replacement>

     The <pattern> argument contains a pattern, exactly like
     the  patterns  in  addresses  (see 2.2 above). The only
     difference between <pattern> and a context  address  is
     that  the  context  address  must be delimited by slash
     (`/') characters; <pattern> may  be  delimited  by  any
     character other than space or newline.

     By default, only the first string matched by  <pattern>
     is replaced, but see the g flag below.

     The <replacement> argument begins immediately after the
     second  delimiting  character of <pattern>, and must be
     followed  immediately  by  another  instance   of   the
     delimiting  character.  (Thus  there  are exactly three
     instances of the delimiting character.)

     The <replacement> is not a pattern, and the  characters
     which are special in patterns do not have special mean-
     ing in <replacement>.  Instead,  other  characters  are
     special:

          &    is replaced by the string  matched  by  <pat-
               tern>

          \d   (where d is a single digit)  is  replaced  by
               the  dth  substring matched by parts of <pat-
               tern> enclosed in `\(' and  `\)'.  If  nested
               substrings  occur  in  <pattern>,  the dth is
               determined  by  counting  opening  delimiters
               (`\(').

     As in patterns, special characters may be made  literal
     by preceding them with backslash (`\').

     The <flags> argument may contain the following flags:

          g    substitute  <replacement>   for   all   (non-
               overlapping)  instances  of  <pattern> in the
               line. After a  successful  substitution,  the
               scan  for  the  next  instance  of  <pattern>
               begins just after the  end  of  the  inserted
               characters; characters put into the line from
               <replacement> are not rescanned.

          p    print the line if  a  successful  replacement
               was  done.  The  p flag causes the line to be
               written to the output if and only if  a  sub-
               stitution  was  actually  made by the s func-
               tion. Notice that  if  several  s  functions,
               each  followed by a p flag, successfully sub-
               stitute in  the  same  input  line,  multiple
               copies  of  the  line  will be written to the
               output: one for each successful substitution.

          w <filename>
               write the line to  a  file  if  a  successful
               replacement was done. The w flag causes lines
               which are actually substituted by the s func-
               tion  to  be  written  to  a  file  named  by
               <filename>. If <filename> exists  before  sed
               is  run,  it  is  overwritten;  if not, it is
               created.  The  possibilities   of   multiple,
               somewhat  different  copies of one input line
               being written are the same as for p.  A  max-
               imum  of  10 different file names may be men-
               tioned after w flags  and  w  functions  (see
               below), combined.

Examples:

The following command, applied to our standard input,

     s/to/by/w changes

produces, on the standard output:

        In Xanadu did Kubla Khan
        A stately pleasure dome decree:
        Where Alph, the sacred river, ran
        Through caverns measureless by man
        Down by a sunless sea.

and, on the file `changes':

        Through caverns measureless by man
        Down by a sunless sea.

If the nocopy option is in effect, the command:

     s/[.,;?:]/*P&*/gp

produces:

        A stately pleasure dome decree*P:*
        Where Alph*P,* the sacred river*P,* ran
        Down to a sunless sea*P.*

Finally, to illustrate the effect of the g  flag,  the  com-
mand:

     /X/s/an/AN/p

produces (assuming nocopy mode):

        In XANadu did Kubla Khan

and the command:

     /X/s/an/AN/gp

produces:

        In XANadu did Kubla KhAN
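
The flag examples, together with the \d replacement form and
an alternative delimiter, can be tried from a shell. This is
a sketch assuming a POSIX shell and a current sed; the
helper name sample and the last two one-line inputs are
invented for the illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# Without g, only the first "an" on the line is replaced.
first_only=$(sample | sed -n '/X/s/an/AN/p')
# With g, every non-overlapping "an" on the line is replaced.
every=$(sample | sed -n '/X/s/an/AN/gp')
# \1 and \2 in the replacement refer to the \( \) groups.
swapped=$(printf 'Kubla Khan\n' | sed 's/\(Kubla\) \(Khan\)/\2, \1/')
# Any delimiter may follow s; a comma avoids escaping the slashes.
moved=$(printf '/usr/bin\n' | sed 's,/usr,/opt,')

printf '%s\n' "$first_only" "$every" "$swapped" "$moved"
```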

3.3. Input-output Functions

(2)p -- print

     The print function writes the addressed  lines  to  the
     standard  output file. They are written at the time the
     p function is encountered, regardless of what  succeed-
     ing editing commands may do to the lines.

(2)w <filename> -- write on <filename>

     The write function writes the addressed  lines  to  the
     file  named  by  <filename>.  If  the  file  previously
     existed, it is overwritten; if not, it is created.  The
     lines  are written exactly as they exist when the write
     function is encountered for each  line,  regardless  of
     what subsequent editing commands may do to them.

     A maximum of ten different files may  be  mentioned  in
     write  functions  and  w  flags after s functions, com-
     bined.
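
Both ways of writing to a file can be sketched as follows,
assuming a POSIX shell, a current sed, and the mktemp
utility. The scratch directory, the file names changes and
head, and the helper name sample are choices made for the
illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

workdir=$(mktemp -d)   # scratch directory for the output files

# w flag on s: only lines on which a substitution was made are
# written to the file; every line still goes to standard output.
copied=$(sample | sed "s/to/by/w $workdir/changes")

# w function: write the addressed lines (here 1 and 2) to a file.
sample | sed -n "1,2w $workdir/head"

changed=$(cat "$workdir/changes")
head_lines=$(cat "$workdir/head")
rm -r "$workdir"

printf '%s\n' "$changed"
```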

(1)r <filename> -- read the contents of a file

     The read function reads the contents of <filename>, and
     appends them after the line matched by the address. The
     file is read and appended regardless of what subsequent
     editing  commands  do  to  the  line  which matched its
     address. If r and a functions are executed on the same
     line, the text from the a functions and the r functions
     is written to the output in the order that the functions
     are executed. If a file mentioned by an r function cannot
     be opened, it is considered a null file, not an error,
     and no diagnostic is given.

     NOTE: Since there is a limit to  the  number  of  files
     that can be opened simultaneously, care should be taken
     that no more than ten files be mentioned in w functions
     or  flags; that number is reduced by one if any r func-
     tions are present. (Only one read file is open  at  one
     time.)

Examples

Assume that the file `note1' has the following contents:

        Note:  Kubla Khan (more properly Kublai Khan; 1216-1294)
        was the grandson and most eminent successor of Genghiz
        (Chingiz) Khan, and founder of the Mongol dynasty in China.

Then the following command:

     /Kubla/r note1

produces:

        In Xanadu did Kubla Khan
        Note:  Kubla Khan (more properly Kublai Khan; 1216-1294)
        was the grandson and most eminent successor of Genghiz
        (Chingiz) Khan, and founder of the Mongol dynasty in China.
        A stately pleasure dome decree:
        Where Alph, the sacred river, ran
        Through caverns measureless to man
        Down to a sunless sea.
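
The r function, including its silent treatment of a file
that cannot be opened, can be sketched like this. The sketch
assumes a POSIX shell, a current sed, and the mktemp
utility; the one-line note text, the file name missing, and
the helper name sample are invented for the illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

workdir=$(mktemp -d)   # scratch directory for the note file
printf 'Note: Kubla Khan is more properly Kublai Khan.\n' > "$workdir/note1"

# The file is appended after the line matching /Kubla/.
with_note=$(sample | sed "/Kubla/r $workdir/note1")

# A file that cannot be opened is treated as empty.
unchanged=$(sample | sed "/Kubla/r $workdir/missing")
rm -r "$workdir"

printf '%s\n' "$with_note"
```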

3.4. Multiple Input-line Functions

Three functions, all spelled with capital letters, deal spe-
cially  with  pattern  spaces  containing imbedded newlines;
they are intended principally  to  provide  pattern  matches
across lines in the input.

(2)N -- Next line

     The next input line is appended to the current line  in
     the pattern space; the two input lines are separated by
     an imbedded newline. Pattern matches may extend  across
     the imbedded newline(s).

(2)D -- Delete first part of the pattern space

     Delete up to and including the first newline  character
     in  the  current  pattern  space.  If the pattern space
     becomes empty (the only newline was the  terminal  new-
     line),  read  another line from the input. In any case,
     begin the list  of  editing  commands  again  from  its
     beginning.

(2)P -- Print first part of the pattern space

     Print up to and including the first newline in the pat-
     tern space.

     The P and D functions are equivalent  to  their  lower-
     case  counterparts if there are no imbedded newlines in
     the pattern space.
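
The three functions can be tried from a shell. This is a
minimal sketch, assuming a POSIX shell and a current sed;
the helper name sample is illustrative. The address form $!
(every line but the last, using the ! function of Section
3.6) keeps N from being applied when no further input
exists.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# N: join each line with the one after it, then replace the
# imbedded newline.
joined=$(sample | sed -e '$!N' -e 's/\n/ + /')
printf '%s\n' "$joined"

# P: print only the first line of each two-line pattern space.
firsts=$(sample | sed -n -e '$!N' -e 'P')

# N and D: keep a sliding two-line window; only the final window
# is printed, which gives the last two lines of the input.
last_two=$(sample | sed -e '$!N' -e '$!D')
printf '%s\n' "$last_two"
```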

3.5.  Hold and Get Functions

Five functions save and retrieve part of the input for
possible later use.

(2)h -- hold pattern space

     The h function copies  the  contents  of  the  pattern
     space  into  a  hold area (destroying the previous con-
     tents of the hold area).

(2)H -- Hold pattern space

     The H function appends  the  contents  of  the  pattern
     space  to the contents of the hold area; the former and
     new contents are separated by a newline.

(2)g -- get contents of hold area

     The g function copies the contents  of  the  hold  area
     into  the  pattern  space (destroying the previous con-
     tents of the pattern space).

(2)G -- Get contents of hold area

     The G function appends the contents of the hold area to
     the  contents  of the pattern space; the former and new
     contents are separated by a newline.

(2)x -- exchange

     The exchange command interchanges the contents  of  the
     pattern space and the hold area.

Example

The commands

        1h
        1s/ did.*//
        1x
        G
        s/\n/  :/

applied to our standard example, produce:

        In Xanadu did Kubla Khan  :In Xanadu
        A stately pleasure dome decree:  :In Xanadu
        Where Alph, the sacred river, ran  :In Xanadu
        Through caverns measureless to man  :In Xanadu
        Down to a sunless sea.  :In Xanadu
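
As a further illustration of the hold area, the lines of the
input can be written in reverse order. This is a sketch
assuming a POSIX shell and a current sed; the helper name
sample is illustrative, and the address form 1! (every line
but the first) uses the ! function of Section 3.6.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# On every line but the first, G appends the lines held so far
# after the current line; h then saves the combined text. On the
# last line the pattern space holds the whole input in reverse.
reversed=$(sample | sed -n -e '1!G' -e 'h' -e '$p')
printf '%s\n' "$reversed"
```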

3.6.  Flow-of-Control Functions

These functions do no editing on the input lines,  but  con-
trol  the  application of functions to the lines selected by
the address part.

(2)! -- Don't

     The Don't command causes the next command  (written  on
     the  same  line),  to  be applied to all and only those
     input lines not selected by the address part.

(2){ -- Grouping

     The grouping command `{' causes the next  set  of  com-
     mands  to be applied (or not applied) as a block to the
     input lines selected by the addresses of  the  grouping
     command. The first of the commands under control of the
     grouping may appear on the same line as the `{'  or  on
     the next line.

     The group of commands is terminated by a  matching  `}'
     standing on a line by itself.

     Groups can be nested.

(0):<label> -- place a label

     The label function marks a place in the list of editing
     commands which may be referred to by b and t functions.
     The <label> may be any sequence of eight or fewer char-
     acters; if two different colon functions have identical
     labels, a compile time diagnostic  will  be  generated,
     and no execution attempted.

(2)b<label> -- branch to label

     The branch function causes   the  sequence  of  editing
     commands  being applied to the current input line to be
     restarted immediately after the  place  where  a  colon
     function  with  the same <label> was encountered. If no
     colon function with the same label can be  found  after
     all  the editing commands have been compiled, a compile
     time  diagnostic  is  produced,  and  no  execution  is
     attempted.

     A b function with no <label> is taken to be a branch to
     the  end  of  the  list  of  editing commands; whatever
     should be done with the current input line is done, and
     another  input  line  is read; the list of editing com-
     mands is restarted from the beginning on the new line.

(2)t<label> -- test substitutions

     The t function tests whether any  successful  substitu-
     tions  have been made on the current input line; if so,
     it branches to <label>; if not, it  does  nothing.  The
     flag which indicates that a successful substitution has
     been executed is reset by:

     1)   reading a new input line, or

     2)   executing a t function.
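
The flow-of-control functions can be sketched as follows,
assuming a POSIX shell and a current sed. The helper name
sample, the label a, and the number 1234567 are choices made
for the illustration.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# !: delete every line that does NOT contain "an".
kept=$(sample | sed '/an/!d')

# { }: apply two commands as a block to one range of lines.
grouped=$(sample | sed -n '/Alph/,/man/{
s/an/AN/
p
}')

# : and t: repeat a substitution until it no longer succeeds,
# here inserting a comma before each trailing group of 3 digits.
commas=$(printf '1234567\n' | sed ':a
s/\(.*[0-9]\)\([0-9][0-9][0-9]\)/\1,\2/
ta')

printf '%s\n' "$kept" "$grouped" "$commas"
```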

3.7. Miscellaneous Functions

(1)= -- equals

     The = function writes to the standard output  the  line
     number of the line matched by its address.

(1)q -- quit

     The q function causes the current line to be written to
     the output (if it should be), any appended or read text
     to be written, and execution to be terminated.
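
Both functions are easily tried from a shell. This is a
minimal sketch, assuming a POSIX shell and a current sed;
the helper name sample is illustrative.

```shell
# The sample text from Section 1.4 (helper name is illustrative).
sample() {
    cat <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
}

# =: with -n, print only the number of the addressed line.
count=$(sample | sed -n '$=')          # number of the last line: 5
where=$(sample | sed -n '/sacred/=')   # line containing "sacred": 3

# q: copy lines up to and including the first match, then stop.
upto=$(sample | sed '/sacred/q')

printf '%s\n' "$count" "$where" "$upto"
```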

Reference

[1]  Ken  Thompson  and  Dennis   M.   Ritchie,   The   UNIX
     Programmer's Manual. Bell Laboratories, 1978.
