The M4 Macro Processor PS1:17-1
The M4 Macro Processor
Brian W. Kernighan
Dennis M. Ritchie
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
M4 is a macro processor available on UNIX- and
GCOS. Its primary use has been as a front end for Rat-
for for those cases where parameterless macros are not
adequately powerful. It has also been used for
languages as disparate as C and Cobol. M4 is particu-
larly suited for functional languages like Fortran,
PL/I and C since macros are specified in a functional
notation.
M4 provides features seldom found even in much
larger macro processors, including
-------------------------
- UNIX is a registered trademark of AT&T Bell Labora-
tories in the USA and other countries.
PS1:17-2 The M4 Macro Processor
• arguments
• condition testing
• arithmetic capabilities
• string and substring functions
• file manipulation
This paper is a user's manual for M4.
Introduction
A macro processor is a useful way to enhance a programming
language, to make it more palatable or more readable, or to
tailor it to a particular application. The #define statement in C
and the analogous define in Ratfor are examples of the basic
facility provided by any macro processor - replacement of text by
other text.
The M4 macro processor is an extension of a macro processor
called M3 which was written by D. M. Ritchie for the AP-3 mini-
computer; M3 was in turn based on a macro processor implemented
for [1]. Readers unfamiliar with the basic ideas of macro pro-
cessing may wish to read some of the discussion there.
M4 is a suitable front end for Ratfor and C, and has also
been used successfully with Cobol. Besides the straightforward
replacement of one string of text by another, it provides macros
The M4 Macro Processor PS1:17-3
with arguments, conditional macro expansion, arithmetic, file
manipulation, and some specialized string processing functions.
The basic operation of M4 is to copy its input to its out-
put. As the input is read, however, each alphanumeric ``token''
(that is, string of letters and digits) is checked. If it is the
name of a macro, then the name of the macro is replaced by its
defining text, and the resulting string is pushed back onto the
input to be rescanned. Macros may be called with arguments, in
which case the arguments are collected and substituted into the
right places in the defining text before it is rescanned.
M4 provides a collection of about twenty built-in macros
which perform various useful operations; in addition, the user
can define new macros. Built-ins and user-defined macros work
exactly the same way, except that some of the built-in macros
have side effects on the state of the process.
Usage
On UNIX, use
m4 [files]
Each argument file is processed in order; if there are no argu-
ments, or if an argument is `-', the standard input is read at
that point. The processed text is written on the standard output,
which may be captured for subsequent processing with
m4 [files] >outputfile
PS1:17-4 The M4 Macro Processor
On GCOS, usage is identical, but the program is called ./m4.
Defining Macros
The primary built-in function of M4 is define, which is used
to define new macros. The input
define(name, stuff)
causes the string name to be defined as stuff. All subsequent
occurrences of name will be replaced by stuff. name must be
alphanumeric and must begin with a letter (the underscore _
counts as a letter). stuff is any text that contains balanced
parentheses; it may stretch over multiple lines.
Thus, as a typical example,
define(N, 100)
...
if (i > N)
defines N to be 100, and uses this ``symbolic constant'' in a
later if statement.
The left parenthesis must immediately follow the word
define, to signal that define has arguments. If a macro or
built-in name is not followed immediately by `(', it is assumed
to have no arguments. This is the situation for N above; it is
actually a macro with no arguments, and thus when it is used
there need be no (...) following it.
You should also notice that a macro name is only recognized
The M4 Macro Processor PS1:17-5
as such if it appears surrounded by non-alphanumerics. For exam-
ple, in
define(N, 100)
...
if (NNN > 100)
the variable NNN is absolutely unrelated to the defined macro N,
even though it contains a lot of N's.
Things may be defined in terms of other things. For example,
define(N, 100)
define(M, N)
defines both M and N to be 100.
What happens if N is redefined? Or, to say it another way,
is M defined as N or as 100? In M4, the latter is true - M is
100, so even if N subsequently changes, M does not.
This behavior arises because M4 expands macro names into
their defining text as soon as it possibly can. Here, that means
that when the string N is seen as the arguments of define are
being collected, it is immediately replaced by 100; it's just as
if you had said
define(M, 100)
in the first place.
If this isn't what you really want, there are two ways out
PS1:17-6 The M4 Macro Processor
of it. The first, which is specific to this situation, is to
interchange the order of the definitions:
define(M, N)
define(N, 100)
Now M is defined to be the string N, so when you ask for M later,
you'll always get the value of N at that time (because the M will
be replaced by N which will be replaced by 100).
Quoting
The more general solution is to delay the expansion of the
arguments of define by quoting them. Any text surrounded by the
single quotes ` and ' is not expanded immediately, but has the
quotes stripped off. If you say
define(N, 100)
define(M, `N')
the quotes around the N are stripped off as the argument is being
collected, but they have served their purpose, and M is defined
as the string N, not 100. The general rule is that M4 always
strips off one level of single quotes whenever it evaluates some-
thing. This is true even outside of macros. If you want the word
define to appear in the output, you have to quote it in the
input, as in
`define' = 1;
As another instance of the same thing, which is a bit more
The M4 Macro Processor PS1:17-7
surprising, consider redefining N:
define(N, 100)
...
define(N, 200)
Perhaps regrettably, the N in the second definition is evaluated
as soon as it's seen; that is, it is replaced by 100, so it's as
if you had written
define(100, 200)
This statement is ignored by M4, since you can only define things
that look like names, but it obviously doesn't have the effect
you wanted. To really redefine N, you must delay the evaluation
by quoting:
define(N, 100)
...
define(`N', 200)
In M4, it is often wise to quote the first argument of a macro.
If ` and ' are not convenient for some reason, the quote
characters can be changed with the built-in changequote:
changequote([, ])
makes the new quote characters the left and right brackets. You
can restore the original characters with just
changequote
PS1:17-8 The M4 Macro Processor
There are two additional built-ins related to define. unde-
fine removes the definition of some macro or built-in:
undefine(`N')
removes the definition of N. (Why are the quotes absolutely
necessary?) Built-ins can be removed with undefine, as in
undefine(`define')
but once you remove one, you can never get it back.
The built-in ifdef provides a way to determine if a macro is
currently defined. In particular, M4 has pre-defined the names
unix and gcos on the corresponding systems, so you can tell which
one you're using:
ifdef(`unix', `define(wordsize,16)' )
ifdef(`gcos', `define(wordsize,36)' )
makes a definition appropriate for the particular machine. Don't
forget the quotes!
ifdef actually permits three arguments; if the name is unde-
fined, the value of ifdef is then the third argument, as in
ifdef(`unix', on UNIX, not on UNIX)
Arguments
So far we have discussed the simplest form of macro process-
ing - replacing one string by another (fixed) string. User-
The M4 Macro Processor PS1:17-9
defined macros may also have arguments, so different invocations
can have different results. Within the replacement text for a
macro (the second argument of its define) any occurrence of $n
will be replaced by the nth argument when the macro is actually
used. Thus, the macro bump, defined as
define(bump, $1 = $1 • 1)
generates code to increment its argument by 1:
bump(x)
is
x = x • 1
A macro can have as many arguments as you want, but only the
first nine are accessible, through $1 to $9. (The macro name
itself is $0, although that is less commonly used.) Arguments
that are not supplied are replaced by null strings, so we can
define a macro cat which simply concatenates its arguments, like
this:
define(cat, $1$2$3$4$5$6$7$8$9)
Thus
cat(x, y, z)
is equivalent to
xyz
PS1:17-10 The M4 Macro Processor
$4 through $9 are null, since no corresponding arguments were
provided.
Leading unquoted blanks, tabs, or newlines that occur during
argument collection are discarded. All other white space is
retained. Thus
define(a, b c)
defines a to be b c.
Arguments are separated by commas, but parentheses are
counted properly, so a comma ``protected'' by parentheses does
not terminate an argument. That is, in
define(a, (b,c))
there are only two arguments; the second is literally (b,c). And
of course a bare comma or parenthesis can be inserted by quoting
it.
Arithmetic Built-ins
M4 provides two built-in functions for doing arithmetic on
integers (only). The simplest is incr, which increments its
numeric argument by 1. Thus to handle the common programming
situation where you want a variable to be defined as ``one more
than N'', write
define(N, 100)
define(N1, `incr(N)')
The M4 Macro Processor PS1:17-11
Then N1 is defined as one more than the current value of N.
The more general mechanism for arithmetic is a built-in
called eval, which is capable of arbitrary arithmetic on
integers. It provides the operators (in decreasing order of pre-
cedence)
unary + and -
** or ^ (exponentiation)
* / % (modulus)
+ -
== != < <= > >=
! (not)
& or && (logical and)
| or || (logical or)
Parentheses may be used to group operations where needed. All the
operands of an expression given to eval must ultimately be
numeric. The numeric value of a true relation (like 1>0) is 1,
and false is 0. The precision in eval is 32 bits on UNIX and 36
bits on GCOS.
As a simple example, suppose we want M to be 2**N+1. Then
define(N, 3)
define(M, `eval(2**N+1)')
As a matter of principle, it is advisable to quote the defining
text for a macro unless it is very simple indeed (say just a
number); it usually gives the result you want, and is a good
PS1:17-12 The M4 Macro Processor
habit to get into.
File Manipulation
You can include a new file in the input at any time by the
built-in function include:
include(filename)
inserts the contents of filename in place of the include command.
The contents of the file is often a set of definitions. The value
of include (that is, its replacement text) is the contents of the
file; this can be captured in definitions, etc.
It is a fatal error if the file named in include cannot be
accessed. To get some control over this situation, the alternate
form sinclude can be used; sinclude (``silent include'') says
nothing and continues if it can't access the file.
It is also possible to divert the output of M4 to temporary
files during processing, and output the collected material upon
command. M4 maintains nine of these diversions, numbered 1
through 9. If you say
divert(n)
all subsequent output is put onto the end of a temporary file
referred to as n. Diverting to this file is stopped by another
divert command; in particular, divert or divert(0) resumes the
normal output process.
Diverted text is normally output all at once at the end of
The M4 Macro Processor PS1:17-13
processing, with the diversions output in numeric order. It is
possible, however, to bring back diversions at any time, that is,
to append them to the current diversion.
undivert
brings back all diversions in numeric order, and undivert with
arguments brings back the selected diversions in the order given.
The act of undiverting discards the diverted stuff, as does
diverting into a diversion whose number is not between 0 and 9
inclusive.
The value of undivert is not the diverted stuff. Further-
more, the diverted material is not rescanned for macros.
The built-in divnum returns the number of the currently
active diversion. This is zero during normal processing.
System Command
You can run any program in the local operating system with
the syscmd built-in. For example,
syscmd(date)
on UNIX runs the date command. Normally syscmd would be used to
create a file for a subsequent include.
To facilitate making unique file names, the built-in mak-
etemp is provided, with specifications identical to the system
function mktemp: a string of XXXXX in the argument is replaced by
the process id of the current process.
PS1:17-14 The M4 Macro Processor
Conditionals
There is a built-in called ifelse which enables you to per-
form arbitrary conditional testing. In the simplest form,
ifelse(a, b, c, d)
compares the two strings a and b. If these are identical, ifelse
returns the string c; otherwise it returns d. Thus we might
define a macro called compare which compares two strings and
returns ``yes'' or ``no'' if they are the same or different.
define(compare, `ifelse($1, $2, yes, no)')
Note the quotes, which prevent too-early evaluation of ifelse.
If the fourth argument is missing, it is treated as empty.
ifelse can actually have any number of arguments, and thus
provides a limited form of multi-way decision capability. In the
input
ifelse(a, b, c, d, e, f, g)
if the string a matches the string b, the result is c. Otherwise,
if d is the same as e, the result is f. Otherwise the result is
g. If the final argument is omitted, the result is null, so
ifelse(a, b, c)
is c if a matches b, and null otherwise.
The M4 Macro Processor PS1:17-15
String Manipulation
The built-in len returns the length of the string that makes
up its argument. Thus
len(abcdef)
is 6, and len((a,b)) is 5.
The built-in substr can be used to produce substrings of
strings. substr(s, i, n) returns the substring of s that starts
at the ith position (origin zero), and is n characters long. If n
is omitted, the rest of the string is returned, so
substr(`now is the time', 1)
is
ow is the time
If i or n are out of range, various sensible things happen.
index(s1, s2) returns the index (position) in s1 where the
string s2 occurs, or -1 if it doesn't occur. As with substr, the
origin for strings is 0.
The built-in translit performs character transliteration.
translit(s, f, t)
modifies s by replacing any character found in f by the
corresponding character of t. That is,
PS1:17-16 The M4 Macro Processor
translit(s, aeiou, 12345)
replaces the vowels by the corresponding digits. If t is shorter
than f, characters which don't have an entry in t are deleted; as
a limiting case, if t is not present at all, characters from f
are deleted from s. So
translit(s, aeiou)
deletes vowels from s.
There is also a built-in called dnl which deletes all char-
acters that follow it up to and including the next newline; it is
useful mainly for throwing away empty lines that otherwise tend
to clutter up M4 output. For example, if you say
define(N, 100)
define(M, 200)
define(L, 300)
the newline at the end of each line is not part of the defini-
tion, so it is copied into the output, where it may not be
wanted. If you add dnl to each of these lines, the newlines will
disappear.
Another way to achieve this, due to J. E. Weythman, is
The M4 Macro Processor PS1:17-17
divert(-1)
define(...)
...
divert
Printing
The built-in errprint writes its arguments out on the stan-
dard error file. Thus you can say
errprint(`fatal error')
dumpdef is a debugging aid which dumps the current defini-
tions of defined terms. If there are no arguments, you get every-
thing; otherwise you get the ones you name as arguments. Don't
forget to quote the names!
Summary of Built-ins
Each entry is preceded by the page number where it is
described.
PS1:17-18 The M4 Macro Processor
3 changequote(L, R)
1 define(name, replacement)
4 divert(number)
4 divnum
5 dnl
5 dumpdef(`name', `name', ...)
5 errprint(s, s, ...)
4 eval(numeric expression)
3 ifdef(`name', this if true, this if false)
5 ifelse(a, b, c, d)
4 include(file)
3 incr(number)
5 index(s1, s2)
5 len(string)
4 maketemp(...XXXXX...)
4 sinclude(file)
5 substr(string, position, number)
4 syscmd(s)
5 translit(str, from, to)
3 undefine(`name')
4 undivert(number,number,...)
Acknowledgements
We are indebted to Rick Becker, John Chambers, Doug McIlroy,
and especially Jim Weythman, whose pioneering use of M4 has led
to several valuable improvements. We are also deeply grateful to
The M4 Macro Processor PS1:17-19
Weythman for several substantial contributions to the code.
References
[1] B. W. Kernighan and P. J. Plauger, Software Tools, Addison-
Wesley, Inc., 1976.
Generated on 2008-12-26 21:13:42 by $MirOS: src/scripts/roff2htm,v 1.57 2008/12/09 22:04:51 tg Exp $
These manual pages are copyrighted
by their respective writers; their source is available at our CVSweb, AnonCVS, and other mirrors.
The rest is Copyright © 2002-2008 The
MirOS Project, Germany.
This product includes material provided by Thorsten Glaser.
This manual page’s HTML representation is supposed to be valid XHTML/1.1; if not, please send a bug report – diffs preferred.