MirOS Manual: 17.m4(PSD)


The M4 Macro Processor                                   PS1:17-1

                     The M4 Macro Processor

                       Brian W. Kernighan

                        Dennis M. Ritchie

                     AT&T Bell Laboratories

                  Murray Hill, New Jersey 07974

                            ABSTRACT

          M4 is a macro processor  available  on  UNIX-  and

     GCOS.  Its primary use has been as a front end for Rat-

     for for those cases where parameterless macros are  not

     adequately   powerful.   It  has  also  been  used  for

     languages as disparate as C and Cobol. M4  is  particu-

     larly  suited  for  functional  languages like Fortran,

     PL/I and C since macros are specified in  a  functional

     notation.

          M4 provides features seldom  found  even  in  much

     larger macro processors, including

-------------------------
- UNIX is a registered trademark of AT&T  Bell  Labora-
tories in the USA and other countries.

PS1:17-2                                   The M4 Macro Processor

       +  arguments

       +  condition testing

       +  arithmetic capabilities

       +  string and substring functions

       +  file manipulation

          This paper is a user's manual for M4.

Introduction

     A macro processor is a useful way to enhance  a  programming

language,  to  make  it  more  palatable  or more readable, or to

tailor it to a particular application. The #define statement in C

and  the  analogous  define  in  Ratfor are examples of the basic

facility provided by any macro processor - replacement of text by

other text.

     The M4 macro processor is an extension of a macro  processor

called  M3  which was written by D. M. Ritchie for the AP-3 mini-

computer; M3 was in turn based on a macro  processor  implemented

for  [1].  Readers  unfamiliar with the basic ideas of macro pro-

cessing may wish to read some of the discussion there.

     M4 is a suitable front end for Ratfor and C,  and  has  also

been  used  successfully  with Cobol. Besides the straightforward

replacement of one string of text by another, it provides  macros

The M4 Macro Processor                                   PS1:17-3

with  arguments,  conditional  macro  expansion, arithmetic, file

manipulation, and some specialized string processing functions.

     The basic operation of M4 is to copy its input to  its  out-

put.  As  the input is read, however, each alphanumeric ``token''

(that is, string of letters and digits) is checked. If it is  the

name  of  a  macro, then the name of the macro is replaced by its

defining text, and the resulting string is pushed back  onto  the

input  to  be  rescanned. Macros may be called with arguments, in

which case the arguments are collected and substituted  into  the

right places in the defining text before it is rescanned.

     M4 provides a collection of  about  twenty  built-in  macros

which  perform  various  useful operations; in addition, the user

can define new macros. Built-ins  and  user-defined  macros  work

exactly  the  same  way,  except that some of the built-in macros

have side effects on the state of the process.

Usage

     On UNIX, use

   m4 [files]

Each argument file is processed in order; if there are  no  argu-

ments,  or  if  an argument is `-', the standard input is read at

that point. The processed text is written on the standard output,

which may be captured for subsequent processing with

   m4 [files] >outputfile

PS1:17-4                                   The M4 Macro Processor

On GCOS, usage is identical, but the program is called ./m4.

Defining Macros

     The primary built-in function of M4 is define, which is used

to define new macros. The input

   define(name, stuff)

causes the string name to be defined  as  stuff.  All  subsequent

occurrences  of  name  will  be  replaced  by stuff. name must be

alphanumeric and must begin  with  a  letter  (the  underscore  _

counts  as  a  letter).  stuff is any text that contains balanced

parentheses; it may stretch over multiple lines.

     Thus, as a typical example,

   define(N, 100)

    ...

   if (i > N)

defines N to be 100, and uses this  ``symbolic  constant''  in  a

later if statement.

     The  left  parenthesis  must  immediately  follow  the  word

define,  to  signal  that  define  has  arguments.  If a macro or

built-in name is not followed immediately by `(', it  is  assumed

to  have  no  arguments. This is the situation for N above; it is

actually a macro with no arguments, and  thus  when  it  is  used

there need be no (...) following it.

     You should also notice that a macro name is only  recognized

The M4 Macro Processor                                   PS1:17-5

as  such if it appears surrounded by non-alphanumerics. For exam-

ple, in

   define(N, 100)

    ...

   if (NNN > 100)

the variable NNN is absolutely unrelated to the defined macro  N,

even though it contains a lot of N's.

     Things may be defined in terms of other things. For example,

   define(N, 100)

   define(M, N)

defines both M and N to be 100.

     What happens if N is redefined? Or, to say it  another  way,

is  M  defined  as  N or as 100? In M4, the latter is true - M is

100, so even if N subsequently changes, M does not.

     This behavior arises because M4  expands  macro  names  into

their  defining text as soon as it possibly can. Here, that means

that when the string N is seen as the  arguments  of  define  are

being  collected, it is immediately replaced by 100; it's just as

if you had said

   define(M, 100)

in the first place.

     If this isn't what you really want, there are two  ways  out

PS1:17-6                                   The M4 Macro Processor

of  it.  The  first,  which  is specific to this situation, is to

interchange the order of the definitions:

   define(M, N)

   define(N, 100)

Now M is defined to be the string N, so when you ask for M later,

you'll always get the value of N at that time (because the M will

be replaced by N which will be replaced by 100).

Quoting

     The more general solution is to delay the expansion  of  the

arguments  of  define by quoting them. Any text surrounded by the

single quotes ` and ' is not expanded immediately,  but  has  the

quotes stripped off. If you say

   define(N, 100)

   define(M, `N')

the quotes around the N are stripped off as the argument is being

collected,  but  they have served their purpose, and M is defined

as the string N, not 100. The general  rule  is  that  M4  always

strips off one level of single quotes whenever it evaluates some-

thing. This is true even outside of macros. If you want the  word

define  to  appear  in  the  output,  you have to quote it in the

input, as in

        `define' = 1;

     As another instance of the same thing, which is a  bit  more

The M4 Macro Processor                                   PS1:17-7

surprising, consider redefining N:

   define(N, 100)

    ...

   define(N, 200)

Perhaps regrettably, the N in the second definition is  evaluated

as  soon as it's seen; that is, it is replaced by 100, so it's as

if you had written

   define(100, 200)

This statement is ignored by M4, since you can only define things

that  look  like  names, but it obviously doesn't have the effect

you wanted. To really redefine N, you must delay  the  evaluation

by quoting:

   define(N, 100)

    ...

   define(`N', 200)

In M4, it is often wise to quote the first argument of a macro.

     If ` and ' are not convenient for  some  reason,  the  quote

characters can be changed with the built-in changequote:

   changequote([, ])

makes the new quote characters the left and right  brackets.  You

can restore the original characters with just

   changequote

PS1:17-8                                   The M4 Macro Processor

     There are two additional built-ins related to define.  unde-

fine removes the definition of some macro or built-in:

   undefine(`N')

removes the definition of  N.  (Why  are  the  quotes  absolutely

necessary?) Built-ins can be removed with undefine, as in

   undefine(`define')

but once you remove one, you can never get it back.

     The built-in ifdef provides a way to determine if a macro is

currently  defined.  In  particular, M4 has pre-defined the names

unix and gcos on the corresponding systems, so you can tell which

one you're using:

   ifdef(`unix', `define(wordsize,16)' )

   ifdef(`gcos', `define(wordsize,36)' )

makes a definition appropriate for the particular machine.  Don't

forget the quotes!

     ifdef actually permits three arguments; if the name is unde-

fined, the value of ifdef is then the third argument, as in

   ifdef(`unix', on UNIX, not on UNIX)

Arguments

     So far we have discussed the simplest form of macro process-

ing  -  replacing  one  string  by  another (fixed) string. User-

The M4 Macro Processor                                   PS1:17-9

defined macros may also have arguments, so different  invocations

can  have  different  results.  Within the replacement text for a

macro (the second argument of its define) any  occurrence  of  $n

will  be  replaced by the nth argument when the macro is actually

used. Thus, the macro bump, defined as

   define(bump, $1 = $1 + 1)

generates code to increment its argument by 1:

   bump(x)

is

   x = x + 1

     A macro can have as many arguments as you want, but only the

first  nine  are  accessible,  through  $1 to $9. (The macro name

itself is $0, although that is  less  commonly  used.)  Arguments

that  are  not  supplied  are replaced by null strings, so we can

define a macro cat which simply concatenates its arguments,  like

this:

   define(cat, $1$2$3$4$5$6$7$8$9)

Thus

   cat(x, y, z)

is equivalent to

   xyz

PS1:17-10                                  The M4 Macro Processor

$4 through $9 are null, since  no  corresponding  arguments  were

provided.

     Leading unquoted blanks, tabs, or newlines that occur during

argument  collection  are  discarded.  All  other  white space is

retained. Thus

   define(a,   b   c)

defines a to be b   c.

     Arguments are  separated  by  commas,  but  parentheses  are

counted  properly,  so  a comma ``protected'' by parentheses does

not terminate an argument. That is, in

   define(a, (b,c))

there are only two arguments; the second is literally (b,c).  And

of  course a bare comma or parenthesis can be inserted by quoting

it.

Arithmetic Built-ins

     M4 provides two built-in functions for doing  arithmetic  on

integers  (only).  The  simplest  is  incr,  which increments its

numeric argument by 1. Thus  to  handle  the  common  programming

situation  where  you want a variable to be defined as ``one more

than N'', write

   define(N, 100)

   define(N1, `incr(N)')

The M4 Macro Processor                                  PS1:17-11

Then N1 is defined as one more than the current value of N.

     The more general mechanism  for  arithmetic  is  a  built-in

called   eval,  which  is  capable  of  arbitrary  arithmetic  on

integers. It provides the operators (in decreasing order of  pre-

cedence)

        unary + and -

        ** or ^ (exponentiation)

        *  /  % (modulus)

        +  -

        ==  !=  <  <=  >  >=

        !               (not)

        & or && (logical and)

        | or ||         (logical or)

Parentheses may be used to group operations where needed. All the

operands  of  an  expression  given  to  eval  must ultimately be

numeric. The numeric value of a true relation (like  1>0)  is  1,

and  false  is 0. The precision in eval is 32 bits on UNIX and 36

bits on GCOS.

     As a simple example, suppose we want M to be 2**N+1. Then

   define(N, 3)

   define(M, `eval(2**N+1)')

As a matter of principle, it is advisable to quote  the  defining

text  for  a  macro  unless  it is very simple indeed (say just a

number); it usually gives the result you  want,  and  is  a  good

PS1:17-12                                  The M4 Macro Processor

habit to get into.

File Manipulation

     You can include a new file in the input at any time  by  the

built-in function include:

   include(filename)

inserts the contents of filename in place of the include command.

The contents of the file is often a set of definitions. The value

of include (that is, its replacement text) is the contents of the

file; this can be captured in definitions, etc.

     It is a fatal error if the file named in include  cannot  be

accessed.  To get some control over this situation, the alternate

form sinclude can be used;  sinclude  (``silent  include'')  says

nothing and continues if it can't access the file.

     It is also possible to divert the output of M4 to  temporary

files  during  processing, and output the collected material upon

command. M4  maintains  nine  of  these  diversions,  numbered  1

through 9. If you say

   divert(n)

all subsequent output is put onto the end  of  a  temporary  file

referred  to  as  n. Diverting to this file is stopped by another

divert command; in particular, divert or  divert(0)  resumes  the

normal output process.

     Diverted text is normally output all at once at the  end  of

The M4 Macro Processor                                  PS1:17-13

processing,  with  the  diversions output in numeric order. It is

possible, however, to bring back diversions at any time, that is,

to append them to the current diversion.

   undivert

brings back all diversions in numeric order,  and  undivert  with

arguments brings back the selected diversions in the order given.

The act of undiverting  discards  the  diverted  stuff,  as  does

diverting  into  a  diversion whose number is not between 0 and 9

inclusive.

     The value of undivert is not the  diverted  stuff.  Further-

more, the diverted material is not rescanned for macros.

     The built-in divnum returns  the  number  of  the  currently

active diversion. This is zero during normal processing.

System Command

     You can run any program in the local operating  system  with

the syscmd built-in. For example,

   syscmd(date)

on UNIX runs the date command. Normally syscmd would be  used  to

create a file for a subsequent include.

     To facilitate making unique file names,  the  built-in  mak-

etemp  is  provided,  with specifications identical to the system

function mktemp: a string of XXXXX in the argument is replaced by

the process id of the current process.

PS1:17-14                                  The M4 Macro Processor

Conditionals

     There is a built-in called ifelse which enables you to  per-

form arbitrary conditional testing. In the simplest form,

   ifelse(a, b, c, d)

compares the two strings a and b. If these are identical,  ifelse

returns  the  string  c;  otherwise  it  returns d. Thus we might

define a macro called compare  which  compares  two  strings  and

returns ``yes'' or ``no'' if they are the same or different.

   define(compare, `ifelse($1, $2, yes, no)')

Note the quotes, which prevent too-early evaluation of ifelse.

     If the fourth argument is missing, it is treated as empty.

     ifelse can actually have any number of arguments,  and  thus

provides  a limited form of multi-way decision capability. In the

input

   ifelse(a, b, c, d, e, f, g)

if the string a matches the string b, the result is c. Otherwise,

if  d  is the same as e, the result is f. Otherwise the result is

g. If the final argument is omitted, the result is null, so

   ifelse(a, b, c)

is c if a matches b, and null otherwise.

The M4 Macro Processor                                  PS1:17-15

String Manipulation

     The built-in len returns the length of the string that makes

up its argument. Thus

   len(abcdef)

is 6, and len((a,b)) is 5.

     The built-in substr can be used  to  produce  substrings  of

strings.  substr(s, i, n)  returns the substring of s that starts

at the ith position (origin zero), and is n characters long. If n

is omitted, the rest of the string is returned, so

   substr(`now is the time', 1)

is

   ow is the time

If i or n are out of range, various sensible things happen.

     index(s1, s2) returns the index (position) in s1  where  the

string  s2 occurs, or -1 if it doesn't occur. As with substr, the

origin for strings is 0.

     The built-in translit performs character transliteration.

   translit(s, f, t)

modifies  s  by  replacing  any  character  found  in  f  by  the

corresponding character of t. That is,

PS1:17-16                                  The M4 Macro Processor

   translit(s, aeiou, 12345)

replaces the vowels by the corresponding digits. If t is  shorter

than f, characters which don't have an entry in t are deleted; as

a limiting case, if t is not present at all,  characters  from  f

are deleted from s. So

   translit(s, aeiou)

deletes vowels from s.

     There is also a built-in called dnl which deletes all  char-

acters that follow it up to and including the next newline; it is

useful mainly for throwing away empty lines that  otherwise  tend

to clutter up M4 output. For example, if you say

   define(N, 100)

   define(M, 200)

   define(L, 300)

the newline at the end of each line is not part  of  the  defini-

tion,  so  it  is  copied  into  the  output, where it may not be

wanted. If you add dnl to each of these lines, the newlines  will

disappear.

     Another way to achieve this, due to J. E. Weythman, is

The M4 Macro Processor                                  PS1:17-17

   divert(-1)

        define(...)

        ...

   divert

Printing

     The built-in errprint writes its arguments out on the  stan-

dard error file. Thus you can say

   errprint(`fatal error')

     dumpdef is a debugging aid which dumps the  current  defini-

tions of defined terms. If there are no arguments, you get every-

thing; otherwise you get the ones you name  as  arguments.  Don't

forget to quote the names!

Summary of Built-ins

     Each entry is preceded  by  the  page  number  where  it  is

described.

PS1:17-18                                  The M4 Macro Processor

        3 changequote(L, R)

        1 define(name, replacement)

        4 divert(number)

        4 divnum

        5 dnl

        5 dumpdef(`name', `name', ...)

        5 errprint(s, s, ...)

        4 eval(numeric expression)

        3 ifdef(`name', this if true, this if false)

        5 ifelse(a, b, c, d)

        4 include(file)

        3 incr(number)

        5 index(s1, s2)

        5 len(string)

        4 maketemp(...XXXXX...)

        4 sinclude(file)

        5 substr(string, position, number)

        4 syscmd(s)

        5 translit(str, from, to)

        3 undefine(`name')

        4 undivert(number,number,...)

Acknowledgements

     We are indebted to Rick Becker, John Chambers, Doug McIlroy,

and  especially  Jim Weythman, whose pioneering use of M4 has led

to several valuable improvements. We are also deeply grateful  to

The M4 Macro Processor                                  PS1:17-19

Weythman for several substantial contributions to the code.

References

[1]  B. W. Kernighan and P. J. Plauger, Software Tools,  Addison-

     Wesley, Inc., 1976.

Generated on 2014-07-04 21:17:45 by $MirOS: src/scripts/roff2htm,v 1.79 2014/02/10 00:36:11 tg Exp $

These manual pages and other documentation are copyrighted by their respective writers; their source is available at our CVSweb, AnonCVS, and other mirrors. The rest is Copyright © 2002‒2014 The MirOS Project, Germany.
This product includes material provided by Thorsten Glaser.

This manual page’s HTML representation is supposed to be valid XHTML/1.1; if not, please send a bug report – diffs preferred.