What are the different catcodes?

Warning

This page is taken from the Tralics documentation.

Some parts are specific to Tralics and do not apply to LaTeX.

This page describes the characters of each category code, from catcode 0 to catcode 16.

The following definitions will be used here:

\def\testeq#1#2{\def\tmp{#2}\ifx#1\tmp\else \toks0={#1wantd: ->#2.}
\typeout{\the\toks0}\show #1\uerror\fi}
\def\xtesteq#1#2{\xdef\tmp{#2}\ifx#1\tmp\else \toks0={#1wantd: ->}
\typeout{\the\toks0 #2.}\show #1\uerror\fi}

This can be used as

\def\foo{something} \testeq\foo{Something else}

and will produce no XML, but an error and two lines of message, like

\foo wantd: ->Something else.
\foo=macro: ->something.
Error signaled at line 352:
Undefined command \uerror.

Note the following points. Both \typeout and \show print the value on the terminal as well as in the transcript file. The argument of \typeout is fully expanded; for this reason, we store the quantity #1wantd: ->#2. in a token register and retrieve it with \the, which inhibits further expansion.
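
The effect of \the can be seen in isolation. The lines below are a minimal sketch, assuming the LaTeX behaviour of \typeout (full expansion of its argument); the macro name \sample is hypothetical and not part of the original page.

```tex
\def\sample{something}
\toks0={\sample}
% The first \sample is expanded by \typeout and is written as "something".
% The tokens delivered by \the\toks0 are not expanded further,
% so the second part is written as the control sequence name \sample.
\typeout{\sample, then \the\toks0}
```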


Unless stated otherwise, a category code (catcode for short) is an integer between 0 and 15. It is a property of input characters, used by the scanner to convert the input stream into a sequence of tokens. The character with ASCII code 37 is the character %; it is normally of category 14 and thus behaves like a start-of-comment character. You can insert such a character in the XML output via the sequence \char37 (however, \char60 produces the entity &lt;, see the \char command). In most cases, a character of catcode c is read as a command with command code c.
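
The current category code of a character can be inspected with \the\catcode. The following sketch assumes the default LaTeX catcode settings and \typeout; under those assumptions the four lines should report 0, 14, 11 and 12.

```tex
% `\<char> denotes the character code of <char>; \the\catcode returns its catcode.
\typeout{backslash: \the\catcode`\\}
\typeout{percent: \the\catcode`\%}
\typeout{letter a: \the\catcode`\a}
\typeout{digit 1: \the\catcode`\1}
```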

Tralics uses the same parsing rules as TeX, with two exceptions: characters of category 16 (see below), and the fact that the character range is not limited to 8-bit characters but extends to 16 bits. A construction like ^^^^^1d7e0 provides a character without category code; see the documentation on characters.

Catcode 0

The only character with catcode 0 is the backslash. This character is used to construct command names. The number 0 is not a command code. The command name is created as follows: if the next character is not of category 11, the name consists of that single character, and Tralics goes into state S if the character is a space, state M otherwise; if the character is of category 11, the name consists of all characters of category 11 starting with the current one, and Tralics goes into state S. The first unused character will be read again later, and a category code assigned to it. Normally, spaces after command names are ignored because the current state is S and the category code of the space character is 10. But the command can change the category code of the character that follows.

The following example shows how to change the \catcode of the letter x, so that x can be used like a backslash. In the second definition, there is no space between the digit zero and the letter x. This letter is scanned, found to be a non-digit, and pushed back into the main token list; at this moment, the catcode is still unchanged, so xfoo is read as a string of four letters and not as a command.

{\def\foo{\gdef\bar{OK}}  \catcode`x=0 xfoo}
\testeq\bar{OK}
{\def\foo{\gdef\bar{notOK}}  \catcode`x=0xfoo}
\testeq\bar{OK}
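
The rule about the state after a command name can be checked with ordinary text. This is a small sketch with hypothetical macros \word and \1; it relies only on the state rules described above.

```tex
\def\word{A}
% Multiletter name: the scanner is in state S, the spaces are skipped: AB
\word   B
% The empty group ends the name, so the following space survives: A B
\word{} B
\def\1{A}
% Name made of one non-letter: the scanner is in state M, the space survives: A B
\1 B
```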

The next example shows that the token after a control sequence like \foo can be of catcode letter (because when the character is read again, its catcode is analysed again).

{\def\bar#1{\egroup\show#1}
\def\foo{\bgroup\catcode32=11\catcode`\%=11 \bar}\foo \foo%\foo=\foo$\foo#}

In the Tralics output below, "the letter" means a character of category 11, "the character" a character of category 12.

the letter  .
the letter %.
the character =.
math shift character $.
macro parameter character #.

Catcode 1

Initially, the only character of catcode 1 is the open brace. See below.

Catcode 2

Initially, the only character of catcode 2 is the closing brace.

The following example shows that you can use other characters. In Tralics, characters of catcode 1 and 2 serve two purposes: for grouping, so that modifications are local to a group, and for delimiting arguments.

{\catcode`A1\catcode`B2
\def\fooA2B\testeq\fooA2B
\def\barA\bgroup\def\fooA3B\egroupB\bar% \bar modifies \foo locally
\testeq\foo{2}}

Note the following trick: \uppercase \relax\bgroup}. After \uppercase and similar commands, you can have an arbitrary sequence of spaces and \relax commands. The argument is delimited on the left by an implicit left brace, on the right by an explicit right brace.
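
A complete instance of this trick could look as follows. It is a sketch based on the TeX syntax for the argument of \uppercase; the text abc is only an illustration, and the expected result is ABC.

```tex
% Spaces and \relax are skipped; \bgroup plays the role of the left brace.
% The argument ends at the explicit right brace.
\uppercase \relax \bgroup abc}
```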

Catcode 3

Initially, the only character with catcode 3 is the dollar sign. It is used to enter and exit math mode and display math mode. A construct like {\catcode`x=3 \catcode`y=3 xy \sin xy} is the same as \[\sin\].
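
Any other character can be given the same role. In the sketch below, the vertical bar is chosen arbitrarily (it is assumed to have its usual catcode 12 beforehand); both lines should produce the same inline formula.

```tex
% Inside the group, | has catcode 3 and behaves like a dollar sign.
{\catcode`\|=3 |a+b|}
$a+b$
```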

Catcode 4

The only character with catcode 4 is the alignment tab character & (see the description of arrays).

Catcode 5

The only character with catcode 5 is the end-of-line character (carriage return, ASCII code 13). When TeX sees such a character, it throws away the rest of the line. If TeX is in state N, the result is a \par token; if TeX is in state M, the result is a newline token of catcode 10; otherwise (state S), the character is ignored. For Tralics, the newline token has value 10 (line-feed), and not 32 (space) as in TeX. As a result, in most cases, newline characters remain in the XML result whenever they are equivalent to a space (the purpose is to make the output more readable). Note that Tralics is in state N whenever it reads the first character of a line. The number 5 is not a command code.

Whenever Tralics sees a new line, it inserts the character defined by the \endlinechar command. This character is by default the end of line character, see \endlinechar.
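
The influence of \endlinechar can be illustrated as follows. This is a sketch relying on the standard TeX behaviour: a negative value means that nothing is appended at the end of each line, and the change only applies to lines read after the assignment.

```tex
% The end of the first line still yields a space, which terminates the number -1.
% The next lines are read without an end-of-line character: the result is abcdef.
{\endlinechar=-1
abc
def
}
```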

Catcode 6

The only character with catcode 6 is the sharp sign #. This character is used as parameter delimiter or parameter reference in macro definitions. It is also used in TeX table preambles (but neither in LaTeX nor in Tralics). In the definition of \xbar below, the quantity #1 refers to the first argument of \Ma, ##1 refers to the first argument of \Mb and ####1 could be used to refer to the first argument of \xbar. As you can see, the sharp character can be replaced by any character of catcode 6. In order to put a character of category code 6 into the body of a command, it suffices to precede it by any character of catcode 6. In the last example, the body of the \foo command is #A#A, but the printer shows it as ##AA##AA. Example

{\catcode`A6 \def\fooA1A2{\xdef\bar{A2A1}}\foo23
\testeq\bar{32}}
\def\Ma#1{\def\Mb ##1{\xdef\xbar{#1##1}}}\Ma a\Mb b
\testeq\xbar{ab}
{\catcode`A6
\def\foo{###AA#AA}\def\fooB{##AA##AA} \def\fooC{####AAAA}
\ifx\foo\fooB\else\bad\fi \ifx\foo\fooC\bad\fi}

Note

There are some subtle differences between TeX and Tralics. Assume that X has category code 6, that T has category code 1, and that you define \def\foo xX1yT#1}. If you ask TeX to print the value, you will see \foo=macro: xX1y->X1 while Tralics says \foo=macro: x#1y->#1. The reason is the following: TeX stores the macro as a single list of tokens, replacing the start of the body (here T) by a special marker, and omitting the final brace. This explains why T is not printed. The body of the macro holds a reference to the first argument, so \def\xfoo xX1y{X1} produces the same result, but \def\yfoo x#1y{#1} gives a different result. In the case of \def\foo X1#2{#1}, the character used in the body is the last one found in the argument list (here a sharp sign). On the other hand, Tralics stores separately the list of characters that come before the first argument (here x), and the delimiters of the arguments (here y for the first argument). In particular \foo, \xfoo and \yfoo use the same representation.

As a consequence, comparing macros via \ifx may produce different results; the same holds for \meaning. Consider now \def\foo X1XTX1}; TeX prints \foo=macro: X1T->X1T while Tralics prints \foo=macro: #1T->#1. Here \foo is a macro with one argument delimited by T (of category code 1), and this character is reinserted after the expansion. Tralics does not show which character it reinserts; in fact it inserts an open brace. One could use \futurelet in order to see the difference.

Finally consider \newcommand\xfoo[2][truc]{X1X2}. You will see \xfoo=macro: ->\@protected@testopt \xfoo \\xfoo {truc} in the case of TeX (this means that TeX created an auxiliary command, whose value is \\xfoo=\long macro:[#1]#2->#1#2) or \xfoo=opt \long macro: truc#2->#1#2 in the case of Tralics (this means that Tralics did not create any other command).

Catcode 7

The only character with catcode 7 is the hat (^). This character is used in math mode for superscripts. It is also used in the double hat construct: if a character of catcode 7 appears twice in a row and is followed by two hexadecimal digits, as in ^^13 and ^^ab, it is as if the character with this code had been given (here, codes 19 and 171); note that only lowercase letters are allowed as hexadecimal digits. If a character of catcode 7 appears twice in a row and is followed by a 7-bit character of code c (like ^^Z, ^^A or ^^{), it is as if TeX had seen the character of code c-64 or c+64 (whichever lies between 0 and 127). In these examples, the codes are 26, 1 and 59. The catcode of the resulting character is then examined again; for instance ^^5e is the hat character, of catcode 7.

Example.

{1^^{^^ab2^^5e^ab3^^5e^5e^ab4\def\Abc{ok}\def\bAc{OK}\^^41bc\b^^41c}
{\catcode `=7 ééab $2$ %next line should produce M
éé
$1=^^^AééT$}  %line contains hat, hat, control-A
\def\msg{A message.^^J}

This is the XML translation

<p>1;«&nbsp;&nbsp;&nbsp;4okOK
«&nbsp;<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'>
<msup><mi>x</mi> <mn>2</mn> </msup></math></formula>
 M<formula type='inline'><math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow><msup><mn>1</mn> <mo>è</mo> </msup><mo>=</mo><mi>A</mi>
<mo>&#20;</mo></mrow></math></formula> </p>

Note: The line that contains the two é characters translates as a capital M, because the last character on the line is the newline character, control-M (even though on Unix, you would expect control-J). The \msg command contains as its last token a newline character (control-J, of catcode 12), and not an end-of-line character of catcode 5. The character control-T, represented by &#20;, seems to be illegal in XML.
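
The double hat notation can also be tested without any control character. The following sketch uses two hypothetical macros \one and \two and LaTeX's \typeout; since ^^41 and ^^42 stand for A and B, and ^^5e^43 is first reduced to ^^43 and then to C, the comparison should report "same".

```tex
\def\one{^^41^^42^^5e^43}
\def\two{ABC}
% Both macros have the same parameter text and the same body.
\ifx\one\two \typeout{same}\else \typeout{different}\fi
```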

If you say ^^^^ABCD, the result is the character whose code is the hexadecimal number ABCD (each of the four characters must be a digit or a lowercase letter between a and f). Such a construct is equivalent to \char"ABCD, but it is one token, and spaces are not ignored after it. (You can also use five hats, see the documentation on characters). Example

\def\foo#1#2#3{#1=#2=#3=}
\foo^^^^0153^^^^0152^^^^0178
 ^^^^017b^^8?
&#339;=&#338;=&#376;=
&#379;x?

Catcode 8

The only character with catcode 8 is the underscore character. It is used for subscripts in math mode. See the \sp command for an example of use.

Outside math mode, you will get an error. For instance, if you say

{\catcode`x7 \catcode`y=8 a^b_c xy\sp\sb}

then Tralics will complain (but not in the same fashion as TeX).

Error signaled at line 377:
Missing dollar not inserted, token ignored: {Character ^ of catcode 7}.
Error signaled at line 377:
Missing dollar not inserted, token ignored: {Character _ of catcode 8}.
Error signaled at line 377:
Missing dollar not inserted, token ignored: {Character x of catcode 7}.
Error signaled at line 377:
Missing dollar not inserted, token ignored: {Character y of catcode 8}.
Error signaled at line 377:
Missing dollar not inserted, token ignored: \sp.
Error signaled at line 377:
Missing dollar not inserted, token ignored: \sb.

Catcode 9

Characters of catcode 9 are ignored. Initially, no character has this category code.
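
As an illustration, the exclamation mark (an arbitrary choice, assumed to have its usual catcode 12 beforehand) can be made an ignored character inside a group. This is only a sketch; the expected result is abc.

```tex
% Inside the group, every ! is dropped by the scanner.
{\catcode`\!=9 a!b!c}
```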

Catcode 10

A character of catcode 10 acts like a space. If TeX sees a character of catcode 10, the action depends on the current state. If the state is N or S, the character is ignored. Otherwise, TeX is in state M and changes to state S, and the result is a space token (character 32, category 10). The space and tab characters are of catcode 10.

Spaces are in general ignored at the start of a line, because TeX is in state N. In verbatim mode, the catcode of the space is changed, and thus spaces remain.
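
These rules imply that a run of spaces yields a single space token and that indentation is invisible. The sketch below uses two hypothetical macros \one and \two and LaTeX's \typeout; the comparison should report "same".

```tex
% Leading spaces are dropped in state N; the run of spaces between x and y
% gives one space token (state M, then state S).
     \def\one{x     y}
\def\two{x y}
\ifx\one\two \typeout{same}\else \typeout{different}\fi
```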

Catcode 11

Characters of catcode 11 (letters) can be used to make multiletter control sequences (without using \csname). By default, only ASCII letters (between a and z, or between A and Z) are of catcode 11.
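
Giving catcode 11 to another character allows it in command names; this is what LaTeX's \makeatletter does for the character @. The following sketch assumes that @ has catcode 12 beforehand, and the name \my@helper is hypothetical.

```tex
\catcode`\@=11
% While @ is a letter, this defines one command whose name is my@helper.
\def\my@helper{text}
\my@helper
\catcode`\@=12
% From here on, \my@helper would be read as \my followed by the characters @helper.
```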

Catcode 12

Characters of catcode 12 cannot be used to make multiletter control sequences. All characters not listed elsewhere are of catcode 12 (especially, all 8-bit characters).

Catcode 13

Characters of category 13 are active. They can be used only if a definition is associated with them. In Tralics, only the tilde character is of catcode 13, but the three characters _#& have a definition (the translation is the character itself). Note that, in plain TeX, the tilde character expands to \penalty \@M \  (there is a space at the end of the command), and in LaTeX to \nobreakspace{}, which is the same with a \leavevmode in front; in Tralics, the expansion is simply \nobreakspace.
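
An active character is defined like a macro. In the sketch below, the exclamation mark is chosen arbitrarily and its replacement text [bang] is purely illustrative; the expected result is Hello[bang].

```tex
% The catcode change is local to the group; ! is active on the following lines.
{\catcode`\!=13
\def!{[bang]}
Hello!}
```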

Catcode 14

Characters of catcode 14 act like a start-of-comment character. The only character with catcode 14 is the percent character.
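
Another character can be turned into a comment character in the same way. This is a sketch with an arbitrarily chosen exclamation mark; since a comment also discards the end of its line, the expected result is abcdef.

```tex
% Inside the group, ! starts a comment that runs to the end of the line.
{\catcode`\!=14
abc! this text and the end of the line are discarded
def}
```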

Catcode 15

Characters of catcode 15 are invalid. There is no invalid character in Tralics.

Catcode 16

There is no character of catcode 16 in TeX. In Tralics, this code is reserved for verbatim-like characters, defined by \DefineShortVerb. These characters act as if they were preceded by \verb. Note that the star character is not exceptional. You can use \fvset if you want to change the translation of a space.

Example:

\DefineShortVerb{\|}
Test of |\DefineShortVerb| and |\UndefineShortVerb|.
\DefineShortVerb{\+}
test 1 |toto| +x+ |+x-| +|t|+
\UndefineShortVerb{\+}
test 2 |toto| +x+ |+x-| +|t|+
espace: |+ +|\fvset{showspaces=true}|+ +|\fvset{showspaces=false}|+ +|.
\DefineShortVerb{\*}
Verbatimfoo: *+ foo +*\verb+*foo*+\verb*+foo*+
Verbatimfoo: \verb|+ foo +*foo*foo*|.

The XML output is the following

<p>Test of <hi rend='tt'>\DefineShortVerb</hi> and <hi rend='tt'>\UndefineShortVerb</hi>.

test 1 <hi rend='tt'>toto</hi> <hi rend='tt'>x</hi> <hi rend='tt'>+x-&#x200B;</hi>
  <hi rend='tt'>|t|</hi>

test 2 <hi rend='tt'>toto</hi> +x+ <hi rend='tt'>+x-&#x200B;</hi> +<hi rend='tt'>t</hi>+
espace: <hi rend='tt'>+&nbsp;+</hi><hi rend='tt'>+&blank;+</hi><hi rend='tt'>+&nbsp;+</hi>.

Verbatimfoo: <hi rend='tt'>+&nbsp;foo&nbsp;+</hi><hi rend='tt'>*foo*</hi><hi rend='tt'>foo*</hi>
Verbatimfoo: <hi rend='tt'>+&nbsp;foo&nbsp;+*foo*foo*</hi>.
</p>

We can continue the example as follows. We show how to use \SaveVerb and \UseVerb.

\SaveVerb{FE}|}|\def\FE{\UseVerb{FE}}
\DefineShortVerb{\+}
\SaveVerb{VE}+|+\def\VE{\UseVerb{VE}}
\SaveVerb{DU}|$_|\def\DU{\UseVerb{DU}} %$
\UndefineShortVerb{\+}
\UndefineShortVerb{\|}
\UndefineShortVerb{\*}
Test \FE,\VE, \DU.

The XML output is the following

<p>Test <hi rend='tt'>}</hi>,<hi rend='tt'>|</hi>, <hi rend='tt'>$_</hi>.
</p>

Source: https://www-sop.inria.fr/marelle/tralics/doc-symbols.html