TeXpp -- TeX Preprocessor

Introduction
Macro names
Macro parameters
Defining a macro
Advanced features
Translation file
Examples
Usage
Author

1 Introduction

While one can define macros in the TeX typesetting language, their names are restricted to a very limited set. For example, one cannot write the sequence "|--" to denote derivability; instead the not so easily memorized word "\vdash" must be used. Neither "<=" nor ">=" can be used, only "\le" and "\ge", respectively. Angle brackets do not appear in the text as easily recognizable sequences (such as "<:" and ":>") but as words starting with a backslash, whose matching pair is not so easy to find.

TeX likes the \ symbol. In a mathematical text, the probability that "alpha" means the corresponding Greek letter is above 99.99 percent; it is almost inconceivable that the author wanted this sequence of five letters literally. Why, then, must we write "\alpha"?

Accents are frequently used in mathematical text. Among them only the prime is written in the natural order, as "a'"; one cannot say "a bar" or "a dot". Instead the accent is indicated by a word before the letter, as in "\tilde a".

The texpp TeX Preprocessor makes TeX source easier to write, and easier to understand at a glance, by allowing arbitrary character sequences as macro names. The preprocessor reads several files, evaluates the macro definitions found in them, and replaces later invocations by the body of the macro. The process results in a single file. Using the features of the preprocessor, the problems indicated above, as well as several others, can be solved. Naturally, several drawbacks inherent in the TeX language remain.

2 Macro names

A macro name can be any sequence of consecutive visible characters not containing the percent sign or curly brackets: %, {, and }. For example, the following are all correct macro names:

      \/      |--      /\      ...
      <=      <:       x1      M

Macro names are recognized only if they are separated from their surroundings by whitespace (space, tab, or newline) or by one of the grouping characters { and }. Thus one name can be a proper substring of another without any problem, as with "..." and "...,".
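
The separation rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the code texpp uses; the names candidate_names and invocations are made up for the example.

```python
import re

# Hypothetical helper, not part of texpp: whitespace and the grouping
# characters { and } separate candidate names. Braces are kept as tokens
# of their own, whitespace is dropped.
TOKEN = re.compile(r"[{}]|[^\s{}]+")

def candidate_names(line):
    """Split a line into the pieces that could be macro names."""
    return TOKEN.findall(line)

def invocations(line, defined):
    """Return the tokens of line that exactly match a defined macro name."""
    return [tok for tok in candidate_names(line) if tok in defined]

defined = {"...", "|--", "<="}
print(invocations("a <= b |-- c ...", defined))   # ['<=', '|--', '...']
print(invocations("a<=b, 1, ..., n", defined))    # []
```

In the second call nothing matches: "<=" is glued to its neighbours, and "...," is a different name from "...".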

3 Macro parameters

A macro can have up to nine parameters. Parameters may appear before the name, after the name, or on both sides. Like macro names, parameters are separated by whitespace (space, tab, newline) or by { and }; the bounding characters do not belong to the parameter. A parameter can itself be a macro name with further parameters. Suppose that the macro <- has a left parameter, while -> has a right parameter. Then in "-> p <-" the letter p is not a common parameter of both macros; rather, going from left to right, p is the right parameter of ->, and the parameter of <- is the expanded form of "-> p". If the macro name is within a pair of curly brackets, all of its parameters must be within that pair, too.

4 Defining a macro

A macro definition always occupies at least one full line; this line, however, can be anywhere. A macro definition has one of the following two forms:

     %define  <parameters> macroname <parameters> %text% comment
     %mdefine <parameters> macroname <parameters> %text% comment

The two forms determine when the macro name is substituted. In the first case the name is always replaced by the macro body, while in the second case it is replaced only in math and display mode, that is, between $ or $$ signs (but see section 5). A single name can have only one type of definition at a time.

If the definition does not fit on a single line, the macro text can be continued: the line should end with a \ character, which is left out of the macro text.

In the definition, parameters are denoted by "#1" up to "#9". Thus if the # sign occurs either in the macro name or in the macro text, it must be doubled. Before the macro name, and between the macro name and the % mark which starts the macro body, only parameter denotations may occur. The macro text is closed by another % mark; everything after it until the end of the line is a comment and is left out. The macro body defines the replacement text; in it parameter denotations may occur anywhere and any number of times, and they are replaced by the values of the parameters at the invocation.
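
To make the layout of a definition line concrete, here is a small Python sketch that splits one line into its parts. It is an assumption-laden illustration, not texpp's parser: the function name parse_define is invented, continuation lines are not handled, and the body is kept verbatim.

```python
import re

def parse_define(line):
    """Split one %define / %mdefine line into its parts (simple form only)."""
    directive, rest = line.split(None, 1)
    if directive not in ("%define", "%mdefine"):
        raise ValueError("not a macro definition: " + line)
    # header % body % comment; unpacking fails if the closing % is missing
    header, body, comment = rest.split("%", 2)
    tokens = header.split()
    names = [t for t in tokens if not re.fullmatch(r"#[1-9]", t)]
    if len(names) != 1:
        raise ValueError("expected exactly one macro name: " + header)
    at = tokens.index(names[0])
    return {
        "math_only": directive == "%mdefine",
        "left": tokens[:at],                  # parameters before the name
        "name": names[0].replace("##", "#"),  # a doubled # stands for one #
        "right": tokens[at + 1:],             # parameters after the name
        "body": body,                         # replacement text, verbatim
        "comment": comment.strip(),
    }

d = parse_define("%mdefine #1 ** #2 %#1^{#2}% exponentiation")
print(d["left"], d["name"], d["right"], d["body"])
```

The sketch only shows where the name, the parameters, the body, and the comment sit on the line; substituting earlier macros inside the body is a separate step.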

The macro body might contain other macro names defined earlier; they are substituted when the definition is elaborated. Thus macro definitions are static, not dynamic.

Macro names can be redefined. In this case the earlier definition is not lost, but becomes hidden. Any macro definition can be revoked by issuing the command

     %undefine macroname % comment

starting at the beginning of a line. The last definition of the macro is dropped, and the previous definition (if any) is restored. This feature can be used to define temporary macros without the risk of losing any earlier definition.
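
The hiding and restoring of definitions behaves like a stack per macro name. The following Python toy model shows the idea; the class MacroTable is hypothetical, and ignoring an %undefine for an undefined name is an assumption, since the text does not say what texpp does in that case.

```python
from collections import defaultdict

class MacroTable:
    """Toy model of define/undefine; not texpp's actual data structure."""

    def __init__(self):
        self._stacks = defaultdict(list)   # name -> list of bodies

    def define(self, name, body):
        # A new definition hides the earlier one instead of destroying it.
        self._stacks[name].append(body)

    def undefine(self, name):
        # Drop the latest definition; the previous one becomes visible again.
        # Assumption: undefining an undefined name is silently ignored.
        if self._stacks[name]:
            self._stacks[name].pop()

    def lookup(self, name):
        stack = self._stacks.get(name)
        return stack[-1] if stack else None

table = MacroTable()
table.define("<=", "\\le ")
table.define("<=", "\\leq ")      # temporary redefinition
print(table.lookup("<="))         # \leq
table.undefine("<=")
print(table.lookup("<="))         # \le
```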

5 Advanced features

5.1 Mode switches

The TeX typesetting language distinguishes math, display, and plain modes. Commands may work differently in different modes, or work in one mode only. The preprocessor tries to keep track of the current mode. The basic way to enter and leave math mode is the $ sign; for display mode it is $$. As any macro, whether defined in TeX or in the preprocessor, can change the mode, the preprocessor cannot track the mode on its own. Thus other sequences which switch mode must be declared separately as

     %mathmode <entering sequence> <leaving sequence>
     %dispmode <entering sequence> <leaving sequence>

Here "entering sequence" is the (macro) name used to switch to the given mode, and "leaving sequence" is the name which ends it. These names must occur in pairs and cannot interleave. If both sequences are the same, it need not be given twice. It can be useful to use different sequences for the beginning and the end of a displayed formula; this can prevent TeX from misinterpreting your text if you leave out a closing $$ sequence. The preprocessor tells you if modes are not properly nested. The following definitions are especially handy when using LaTeX:

     %mathmode \(  \)   % for plain math mode
     %mathmode \math  \endmath
     %dispmode \[  \]   % for displayed formulas
     %dispmode \equation \endequation

5.2 Indentation

By default, TeX ignores indentation at the beginning of lines. The preprocessor can be instructed to replace those spaces by the appropriate number of explicit spaces "\ ". This is done by the command

     %indentspaces <entering sequence> <leaving sequence>

Here, as before, if both sequences are the same, it is enough to give only the first one.
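
As an illustration of the replacement itself, the Python sketch below turns each leading space of a line into an explicit "\ ". It assumes one "\ " per space and leaves tabs untouched; how texpp treats tabs is not stated here. The function name explicit_indent is made up, and the sketch does not model the entering and leaving sequences.

```python
def explicit_indent(line):
    """Replace each leading space by the explicit TeX space "\\ "."""
    body = line.lstrip(" ")
    depth = len(line) - len(body)      # number of leading spaces
    return "\\ " * depth + body

print(explicit_indent("    x := 1"))   # \ \ \ \ x := 1
```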

5.3 Inhibiting macro replacement

Occasionally we do not want the preprocessor to go on with macro replacement. This can be achieved by using the macro "\preserve" with a single right parameter as follows:

     \preserve <anything_which_does_not_have_spaces_and_opening_brace> 
     \preserve {text in which curly brackets are balanced}

In the argument of "\preserve" there is no macro replacement; the macro name won't appear in the final text. In a macro definition the "\preserve" cannot be applied to the parameters, as expansion of parameters happens before consulting the macro text.

Special parameter handling can be instructed by the following directives:

     %preservepar  <macroname>
     %plainpar     <macroname>
     %mathpar      <macroname>

The macro name appearing in any of these directives must be defined, cannot have any left parameters, and must have at least one right parameter. All right parameters of the macro are handled according to the directive. In the first case the parameters are taken as is (no further macro substitution is made). In the second case the parameters are evaluated in plain (not math) mode, independently of the present mode, while in the last case the parameters are evaluated in math mode. It is an error (and meaningless anyway) to use the "%mathpar" directive for a macro defined with "%mdefine".

These possibilities are handy for the array environment in LaTeX:

     %mdefine btab #1 %\begin{array}{#1}%
     %mdefine etab    %\end{array}%
     %preservepar btab

The only argument of the btab macro is a sequence of the letters l, c, and r, which defines how many columns the array has and how its fields should be aligned. Naturally, this parameter should be left alone.

5.4 Text in displayed formulas

Displayed formulas very frequently contain text, for example when a definition lists several cases, or when the members of a set are described in words. Quotation marks can be used to surround such text in math mode. The preprocessor recognizes this construct: the material between quotation marks is elaborated in plain mode and then put inside an \hbox (so the material is not broken across lines). For example, $$"gcd"(25,10) = 5$$ is transformed into $$\hbox{gcd}(25,10) = 5$$. In this case "gcd" is typeset in the font used for the surrounding text.
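
The transformation of quoted text can be imitated with a single regular expression. The Python sketch below assumes that its input is already known to be the inside of a formula, and it does not model the plain-mode elaboration of the quoted material; box_quoted_text is a made-up name.

```python
import re

def box_quoted_text(formula):
    """Wrap every "..." inside a formula in an \\hbox{...}."""
    return re.sub(r'"([^"]*)"', r'\\hbox{\1}', formula)

print(box_quoted_text('"gcd"(25,10) = 5'))   # \hbox{gcd}(25,10) = 5
```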

5.5 Boxes

Boxes always switch back to plain mode. This can be indicated by using the following definitions:

     %mdefine   \hbox #1   %\hbox{#1}%
     %plainpar  \hbox

Very frequently a box is also given a size, as in "\hbox to 1.5cm {...}". Seeing this construct, the preprocessor would take "to" as the argument of \hbox. To mend the situation, the preprocessor also accepts definitions of the following form:

     %define   <macroname> #1 {#2} %replacement text % comment
     %mdefine  <macroname> #1 {#2} %replacement text % comment

The macro must have exactly two right parameters, the second of which is enclosed in curly brackets. When the macro is invoked, the second parameter must be inside curly brackets, and everything between the macro name and the opening curly bracket becomes the value of the first parameter. The first parameter can be empty. Using this, the boxes are defined as follows:

     %mdefine  \hbox #1{#2}  %\hbox #1{#2}%
     %plainpar \hbox
     %mdefine  \vbox #1{#2}  %\vbox #1{#2}%
     %plainpar \vbox

6 Translation file

After macro substitution the preprocessor can replace characters by other characters, or by sequences of characters. This can come in handy especially when using a large number of accented letters: text editors may be able to handle these characters while the TeX version at hand has problems with them. The characters to be replaced, together with their replacement sequences, are given in a separate file. Its format is as follows. Each line is either a comment or contains the definition for exactly one character to be replaced. Empty lines, as well as lines starting with a space or tab character, are comments. Otherwise the first character of the line is the one to be replaced; the replacement text is enclosed by apostrophes or by quotation marks, and only spaces and tab characters can stand between it and the character. The replacement text cannot be continued; it must fit on a single line.

A character can also be given by its code: the code, in decimal, is written after a \. (Of course, to define a single \ as the character to be replaced, the pair \\ must be given.) For example, if the letter "í" has code 237 (say), both of the next two lines work:

      í       "{\'\i}"
      \237       "\'\i "
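
The line format described above can be read with a short Python sketch. It is not texpp's reader: parse_translation_line is an invented name, decimal codes are mapped with chr() (which agrees with code 237 for "í" only under a Latin-1 style encoding), and the sketch assumes that nothing but spaces or tabs follows the closing quote.

```python
import re

# \\ (a literal backslash), \<decimal code>, or any single character;
# then optional blanks; then the text in matching ' or " quotes.
LINE = re.compile(r"""(\\\\|\\\d+|.)[ \t]*(['"])(.*)\2[ \t]*$""")

def parse_translation_line(line):
    """Return (character, replacement) or None for a comment line."""
    line = line.rstrip("\n")
    if line == "" or line[0] in " \t":
        return None                        # empty or indented: comment
    m = LINE.match(line)
    if m is None:
        raise ValueError("malformed translation line: " + line)
    key, _quote, text = m.groups()
    if key == "\\\\":
        char = "\\"                        # \\ stands for a single backslash
    elif key.startswith("\\") and len(key) > 1:
        char = chr(int(key[1:]))           # \237 -> character with code 237
    else:
        char = key
    return char, text

print(parse_translation_line(r'''\237       "\'\i "'''))
print(parse_translation_line("   an indented line is a comment"))   # None
```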

7 Examples

7.1

Let us define the macro "*8" which repeats its preceding argument eight times.

     %define #1  *2 %#1#1% temporary, for doubling
     %define #1  *8 %#1 *2 *2 *2% apply *2 three times
     %undefine *2               % not needed any more

Parameters are assigned from left to right, thus the third "*2" has its argument as the result of the replacement in "#1 *2 *2", that is "#1#1#1#1". Consequently the body of the "*8" macro will be "#1#1#1#1#1#1#1#1" as required. Using this, the result of both "x 1 *8 y" and "x{1}*8 y" will be "x11111111y". (Observe that the space after "*8" has vanished.) However, "x 1*8 y" and "x 1 *8y" remain unchanged as the macro name "*8" cannot be extracted. As another example, in "x {a b} *8 y" the space between a and b as well as the space before the opening brace survives in the replacement, but the space after the macro name vanishes. The result will be "x a ba ba ba ba ba ba ba by".

7.2

This example shows that macro definitions are elaborated at definition time and not at invocation time. This implies that a later macro definition has no effect on a macro defined earlier. This is used below.

     %define [[  %[%  now [[ is replaced by a single [
     %define ]]  %]%  similarly for ]]
     %define [   %\{%  from now on [ means \{
     %define ]   %\}%  and ] means \}

These definitions can be used to transform the formula of the first line into the formula of the second line:

     a = [ 0, 1, 2, ..., n ] + [ 2n, ..., 4n ]
     a =\{0, 1, 2, ..., n\}+\{2n, ..., 4n\}

while a reference which must go between square brackets can be enclosed in double brackets: "[[ a ]]" produces "[a]". Exchanging the first two lines and the last two lines in the above definitions yields identical replacements for both "[[" and "[", namely "\{".

7.3

Our next example shows how we can define symbols for model theory.

     %mdefine /\      %\wedge %      "and" symbol
     %mdefine \/      %\vee %        "or" symbol
     %mdefine |--     %\vdash %      derivable
     %mdefine |==     %\models %     semantically follows
     %mdefine -->     %\rightarrow % implies
     %mdefine phi     %\varphi %
     %mdefine ALL     %\forall %
     %mdefine not     %\neg %        negation symbol
     %mdefine ...     %\ldots %      ellipsis

As the preprocessor does not handle \ specially, both /\ and \/ are recognised with no problem. The spaces at the end of the macro definitions are needed as the replacing mechanism erases spaces following the macro name. Without these spaces the TeX commands would merge with the following symbols. Using these definitions we can write

     $$ |== ALL x ( phi (x) \/ not phi (x) ) $$
     $$ |-- phi --> ( not phi --> phi )$$

while the official TeX forms are

     $$\models\forall x (\varphi(x)\vee\neg\varphi(x) ) $$
     $$\vdash\varphi\rightarrow(\neg\varphi\rightarrow\varphi)$$

7.4

Sometimes it comes in handy to denote exponentiation by **. This can be done with "%mdefine #1 ** #2 %#1^{#2}%". With this definition, expanding " $( e ** { x ** 2 / 2 } ) = ( e **{x ** 2 /2} )$ " yields

     $(e^{x^{2}/2 } ) = (e^{x^{2}/2} )$

(watch for the spaces). In the definition the curly brackets around #2 are necessary, as we want the whole second argument to be in the exponent. Without these brackets the expanded form of the left hand side would be "(e^x^2/ 2 )", definitely not what was wanted.

7.5

Using the definitions below we can use ".EQ" and ".EN" to enclose displayed formulas. Beyond this, we want formula numbers to appear on the left hand side; the number is given as an argument of ".EQ":

     \def\mydispformula#1#2{#2\leqno #1} %TeX definition
     %dispmode .EQ .EN
     %define .EQ #1 %$$\mydispformula{#1}{% call TeX macro
     %define .EN    %}$$%

Using these we could write

     .EQ (1)
        a=b+c
     .EN

As the macro name has a right parameter, we must supply the parameter even if the formula has no number at all. For this purpose the empty argument "{}" can be used, as in ".EQ {}". Otherwise the first symbol of the formula is used as the number.

8 Usage

The texpp program can be invoked as "texpp [switches] file1 file2 ...". The accepted switches are

-h help

-s silent: do not write messages

-t xxxx use xxxx as translation file

-m xxxx read macro file xxxx, don't produce output

-w xxxx write the result to file xxxx

-a xxxx append result to file xxxx

-w- do not give output, check only

With the -w switch, the given file, if it exists, is erased first. With -a the new material is added at the end. If neither -a nor -w appears, the result goes to the standard output.

All the files on the argument list are processed in the order they are given; a single - indicates reading from stdin.

The environment variable TEXPP is checked first for switches and file names, its content should be similar to an argument list. Switches given after the command name have priority over the ones in the environment variable; file names there are processed before the files appearing in the argument.

Error messages are printed to the standard error, and also appear in the output marked by "%%%TeXpp Error". The program exits with value 0 if no errors were encountered, otherwise it exits with value 1.

Author

The program was written by Laszlo Csirmaz during a visit to DIMACS back in 1991. Originally it ran on a UNIX mainframe. In 1993 it was ported to the PC using Borland's C compiler; in 2001 it was ported back to Linux, which is the present version. The program is under the GPL; the source is available with a sample macro file.
