Mathematical Markup Language (MathML) Version 2.0

6 Entities, Characters and Fonts

6.1 Introduction

6.1.1 The Intent of Entity Names

6.1.2 The STIX Project

6.1.3 Entity Listings

6.1.4 Non-Marking Entities

6.1.5 Printing Entity Listings

6.1.6 Special Constants

6.1.7 Alphabetical Lists

6.1.8 ISO Entity Set Groupings

6.1.9 Additional Entity Set Grouping

Notation has proved very important for mathematics. Mathematics has grown in part because of the succinctness and suggestiveness of its evolving notation. Many new signs have been developed for use in mathematical notation, and mathematicians have not held back from adopting symbols originally developed elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use in coding. It is difficult to read mathematics if glyphs are not available for presentation on specific display devices.

This situation poses a problem for the W3C Math Working Group. Producing a specification and DTD for mathematics in HTML does not naturally call for concern with more than the entities allowed in the DTD. However, as experience has shown, a long list of entities with no means to display them is of little use, and is a cause of frequent frustration in trying to use a standard. On the other hand, a large collection of glyphs or characters without a standard way to refer to them is not of much use either.

The W3C Math Working Group has therefore directly taken on the specification of part of the full mechanism of proceeding from notation to final presentation, and is collaborating with organizations undertaking the specification of the rest.

For instance, we try to use entity names that are contained in ISO TR 9573, which supersedes the ISO TR 8879 annex as far as mathematics is concerned. There are considerations of mathematical usage that do on occasion militate against this, and the TR 9573 lists need supplementing. We hope to be able to agree with the TR 9573 WG on suitable extensions, in the course of the revision of their document that they are presently undertaking.

The STIX project of the STIPUB group of scientific and technical publishers has also been working toward a common collection of mathematical symbols and names. The W3C Math Working Group expects to issue further updates on the matter of character entities as a consequence of this project's useful work. For the latest character tables and fonts information, see the W3C Math Working Group home page.

The STIX project team leader, Nico Poppelier, is a member of the W3C Math Working Group. The STIX project, set up by the STIPUB group of publishers, aims to formulate a collection of characters needed in the course of scientific and technical publishing. A database of characters in common use is being produced by collaborating publishing organizations. The team will propose to the Unicode consortium the additions to the next revision of the Unicode character set that this process shows are needed, together with the appropriate character codes. Finally the STIX project will commission the production of a complete set of fonts covering those Unicode characters for science and technology, to be made available to the public under license, but free of charge. The STIPUB group recognizes that easy availability of the characters and fonts greatly facilitates communication and publication.

This chapter of the MathML Specification contains a listing of entities for use in MathML.

To provide more background on the characters used by mathematics we have used a larger comparative database showing codes and meanings in other common math environments. The W3C Math Working Group is very grateful to Elsevier Science and to Wolfram Research (makers of Mathematica®) for making available to us so much useful data.

Some character entities, although important for the quality of print rendering, have no directly corresponding glyph marks. They are called here non-marking entities. Below is a table of those adopted for the purposes of MathML. Their roles are discussed in chapter 3 [Presentation Markup] and chapter 4 [Content Markup], respectively. The space widths given are recommendations. Some of these characters do not yet have Unicode values, so arbitrary values in the private zone, in the E8 range, have been assigned to them. The correspondence between the spacing values mentioned below and those in the Unicode descriptions is not exact, but the matches are good.

Entity name | Unicode | Description |

`&Tab;` | 0009 | tabulator stop; horizontal tabulation |

`&NewLine;` | 000A | force a line break; line feed |

`&IndentingNewLine;` | E891 | force a line break and indent appropriately on next line |

`&NoBreak;` | E892 | never break line here |

`&GoodBreak;` | E893 | if a linebreak is needed, here is a good spot |

`&BadBreak;` | E894 | if a linebreak is needed, try to avoid breaking here |

`&Space;` | 0020 | one em of space in the current font |

`&NonBreakingSpace;` | 00A0 | space that is not a legal breakpoint |

`&ZeroWidthSpace;` | 200B | space of no width at all |

`&VeryThinSpace;` | 200A | space of width 1/18 em |

`&ThinSpace;` | 2009 | space of width 3/18 em |

`&MediumSpace;` | 2005 | space of width 4/18 em |

`&ThickSpace;` | E897 | space of width 5/18 em |

`&NegativeVeryThinSpace;` | E898 | space of width -1/18 em |

`&NegativeThinSpace;` | E899 | space of width -3/18 em |

`&NegativeMediumSpace;` | E89A | space of width -4/18 em |

`&NegativeThickSpace;` | E89B | space of width -5/18 em |

`&InvisibleComma;` | E89C | used as a separator, e.g. in indices (section 3.2.4 [Operator, Fence, Separator or Accent]) |

`&ic;` | E89C | short form of `&InvisibleComma;` |

`&InvisibleTimes;` | E89E | marks multiplication when it is understood without a mark (section 3.2.4 [Operator, Fence, Separator or Accent]) |

`&it;` | E89E | short form of `&InvisibleTimes;` |

`&ApplyFunction;` | E8A0 | character showing function application in presentation tagging (section 3.2.4 [Operator, Fence, Separator or Accent]) |

`&af;` | E8A0 | short form of `&ApplyFunction;` |
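As a rough sketch of how such entities reach an application, the following Python fragment declares two of them in an internal DTD subset and lets the XML parser expand the references. `InvisibleTimes` and `ApplyFunction` are the MathML names for the rows with codes E89E and E8A0. The helper `internal_subset` and the sample expression are assumptions made purely for illustration, and the code points are the provisional private-zone values from the table above, not permanent assignments.

```python
import xml.etree.ElementTree as ET

# Provisional private-zone code points copied from the table above.
ENTITIES = {
    "InvisibleTimes": 0xE89E,
    "ApplyFunction": 0xE8A0,
}


def internal_subset(entities):
    """Build a DOCTYPE whose internal subset declares each entity (hypothetical helper)."""
    decls = "".join(
        f'<!ENTITY {name} "&#x{code:04X};">' for name, code in entities.items()
    )
    return f"<!DOCTYPE math [{decls}]>"


# Presentation markup for "2 sin x": the multiplication and the function
# application are both present in the markup but leave no visible mark.
SOURCE = internal_subset(ENTITIES) + (
    "<math><mrow>"
    "<mn>2</mn><mo>&InvisibleTimes;</mo>"
    "<mi>sin</mi><mo>&ApplyFunction;</mo><mi>x</mi>"
    "</mrow></math>"
)

root = ET.fromstring(SOURCE)

# After parsing, each <mo> holds the single character the entity stands for.
operators = [mo.text for mo in root.iter("mo")]
```

Because the document refers to the characters by name, only the declarations would need to change if the private-zone values were later replaced by official Unicode assignments.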

Since the situation concerning availability of character codes from Unicode and under ISO 9573-13 is not yet fully clear at the time of writing, we have decided to proceed conservatively.

We have taken the ISO 9573-13 proposal, as conveyed to us from
Anders Berglund, and have added a number of additional aliases based
on the practice of the mathematical typesetting community. Thus the
main influence outside ISO has been the names to be found in the TeX
community.

To facilitate comprehension of a fairly large list of names, which totals over 2000 in this case, we offer the same information in more than one form.

We have entities listed by name and sample glyphs for all of them. Each entity name is accompanied by a code for a character grouping chosen from a list given below, a short verbal description, and a Unicode hex code if there is a corresponding sample glyph to be found in ISO 10646. Those codes beginning with the hex digit E, e.g. E321, indicate assignments to the private zone of Unicode. This indicates that the character in question is not at present an official Unicode character. It is highly recommended that authors use entity names instead of Unicode values, especially for those characters in the Unicode private zone, as those values may change. It is hoped that most of these characters will become officially endorsed by Unicode and ISO under its 10646 standard in due course. In any case we expect fonts for these characters to become publicly available as the use of MathML develops. If the entity name is an alias then a reference back to the ISO form is given if there is one, and to a preferred form if not. The ISO or preferred forms have references to their alternates where they exist.
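The check for a private-zone value can be sketched in a few lines of Python. The function name `is_private_use` is an assumption for illustration. The range tested is the Private Use Area of Unicode's Basic Multilingual Plane, U+E000 through U+F8FF, so it also catches private codes whose first hex digit is F.

```python
def is_private_use(hex_code: str) -> bool:
    """Return True if the code point lies in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= int(hex_code, 16) <= 0xF8FF


# A few hex codes of the kind that appear in this chapter's listings.
SAMPLE_CODES = ["E321", "E891", "F74B", "200B", "00A0"]

# Codes flagged True are provisional and may change, so documents
# should refer to those characters by entity name only.
flags = {code: is_private_use(code) for code in SAMPLE_CODES}
```

Here `flags` marks E321, E891 and F74B as private-zone values, while 200B and 00A0 are official Unicode characters.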

*Newly Revised.* The entity listings by alphabetical and
Unicode order in section 6.1.7 [Alphabetical Lists] have now been
brought more into line with the corresponding ISO character sets, in
that if some part of a set is included then the entire set is
included. Also, ISOCHEM has been dropped. These changes have also
been reflected in the entity declarations in the DTD in appendix A [Parsing MathML].

The tables of character sets with glyphs given in section 6.1.8 [ISO Entity Set Groupings] have not been revised from the original tables. In cases where information from section 6.1.7 [Alphabetical Lists] and section 6.1.8 [ISO Entity Set Groupings] conflict, the tables in section 6.1.6 [Special Constants] and the DTD should be considered normative.

To begin, we list separately a few special characters that MathML has been a little radical in introducing. There are two for special constants and one for calculus. They too must have private Unicode values.

Entity name | Unicode | Description |

`&CapitalDifferentialD;` | F74B | D for use in differentials, e.g. within integrals |

`&DD;` | F74B | short form of `&CapitalDifferentialD;` |

`&DifferentialD;` | F74C | d for use in differentials, e.g. within integrals |

`&dd;` | F74C | short form of `&DifferentialD;` |

`&ExponentialE;` | F74D | e for use for the exponential base of the natural logarithms |

`&ee;` | F74D | short form of `&ExponentialE;` |

`&false;` | E8A7 | logical constant false |

`&ImaginaryI;` | F74E | i for use as a square root of -1 |

`&ii;` | F74E | short form of `&ImaginaryI;` |

`&NotANumber;` | E8AA | used in 4.3.2.9 |

`&true;` | E8AB | logical constant true |

The first table offered is a very large ASCII listing of printing entity names, ordered alphabetically, with upper-case preceding lower-case as in ASCII order. The Unicode numbers beginning with E are arbitrary assignments in the Private Area where there is presently no Unicode character available. When there is no Unicode offered at all it is because the characters listed can be thought of as font variations of common Roman alphabetic characters.

There is also an ASCII listing of printing entities ordered by Unicode number. Next we have collections of the entities in entity sets which are similar to the groupings in the corresponding ISO documents.
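The two orderings can be illustrated with a short Python sketch. The entries are a handful of name and code pairs taken from the tables in this chapter, and the variable names are assumptions. Python compares strings by code point, which for these ASCII names places upper-case before lower-case.

```python
# (entity name, Unicode hex code) pairs taken from the tables in this chapter.
ENTRIES = [
    ("true", "E8AB"),
    ("GoodBreak", "E893"),
    ("false", "E8A7"),
    ("Space", "0020"),
    ("BadBreak", "E894"),
    ("NotANumber", "E8AA"),
]

# Alphabetical listing in ASCII order: upper-case names precede lower-case ones.
by_name = sorted(ENTRIES, key=lambda entry: entry[0])

# Listing ordered by Unicode number: compare the hex codes as integers.
by_code = sorted(ENTRIES, key=lambda entry: int(entry[1], 16))

names_in_ascii_order = [name for name, _ in by_name]
names_in_code_order = [name for name, _ in by_code]
```

In the first ordering `Space` precedes `false` because every upper-case ASCII letter sorts before every lower-case one. In the second, `Space` comes first because 0020 is the smallest code.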

In addition, we list the above material in the groupings used by ISO 9573-13 with an additional grouping of aliases introduced. This table makes explicit the entity groupings and provides links to ASCII listings of the groups and HTML tabular listings which display the glyphs, insofar as they are to be had, as well.

The symbols for mathematics that ISO has considered are organized, for both historical and mnemonic reasons, into groupings with somewhat descriptive names. In the tables below we reproduce the newly proposed versions of these groups and give the corresponding Unicode sample glyphs. For each ISO 9573-13 group we give first an Extended version in ASCII listing which includes aliases, then a similar listing with sample glyphs, then the Basic ISO 9573-13 entity set and its version with included glyphs. The entries are organized alphabetically by entity name.

It should be noted that the sample glyphs given here are in GIF files intended for viewing on a monitor's screen at 72 dpi. They are not suitable for printing, and in particular do not constitute a set of fonts covering the symbols of mathematics. Such a set of fonts is under development in more than one context, and the MathML Working Group is engaged in ensuring that fonts will be readily publicly available. In addition, it is important to note that the Unicode numbers assigned in the private zone, beginning with hex digits E2 and above, are arbitrary and are only used here to ensure that sample glyphs are available for display. They do not constitute suggested assignments of codes.

This first block of entity sets includes mostly non-letter symbols,
along with a few letters loaded with mathematical semantics. At the
end of the block we have included the table MMALIAS of the aliases
introduced by MathML, which mostly come from the TeX community, and
MMEXTRA with the additional character entities added by MathML. Note
that some of the blocks are place-holders for a possible future
expansion of the tables.

Group | Descriptive Name | ||

ISOAMSA | Added Math Symbols: Arrows | Extended Glyphs | Basic Glyphs |

ISOAMSB | Added Math Symbols: Binary Operators | Extended Glyphs | Basic Glyphs |

ISOAMSC | Added Math Symbols: Delimiters | Extended Glyphs | Basic Glyphs |

ISOAMSN | Added Math Symbols: Negated Relations | Extended Glyphs | Basic Glyphs |

ISOAMSO | Added Math Symbols: Ordinary | Extended Glyphs | Basic Glyphs |

ISOAMSR | Added Math Symbols: Relations | Extended Glyphs | Basic Glyphs |

ISOTECH | General Technical | Extended Glyphs | Basic Glyphs |

ISOPUB | Publishing | Extended Glyphs | Basic Glyphs |

ISODIA | Diacritical Marks | Extended Glyphs | Basic Glyphs |

ISONUM | Numeric and Special Graphic | Extended Glyphs | Basic Glyphs |

ISOBOX | Box and Line Drawing | Basic Glyphs | |

MMALIAS | MathML Aliases | Basic Glyphs | |

MMEXTRA | MathML Additions | Basic Glyphs |

Mathematical literature makes common use of particular font styles. Characters representing given letters which differ only in the glyph presentation are in principle not different for the purposes of a character registry such as Unicode, which is not supposed to take into account mere font differences. However, usage has meant that both ISO and Unicode, like mathematics, recognize them as different entities. Therefore we include lists for Greek, script, open face (also known as double struck or blackboard bold), and fraktur (also known as gothic or German) fonts.

Group | Descriptive Name | ||

ISOGRK3 | Greek Symbols | ASCII | Glyphs |

ISOMSCR | Math Alphabet Script | ASCII | Glyphs |

ISOMOPF | Math Alphabet Open Face | ASCII | Glyphs |

ISOMFRK | Math Alphabet Fraktur | ASCII | Glyphs |

For reference we provide a list of the names of several other ISO font entity sets which are normally used for text. ISOGRK4 is actually a collection of emboldened forms of the Greek letters.

Group | Descriptive Name |

ISOGRK1 | Greek Letters |

ISOGRK2 | Monotoniko Greek |

ISOGRK4 | Alternative Greek Symbols |

ISOCYR1 | Russian Cyrillic |

ISOCYR2 | Non-Russian Cyrillic |

In addition to the above listed, for the sake of completeness, we provide a table of other entities not within the ISO lists which are referred to somewhere in this specification. It is not certain that all these characters, though of mathematical significance, will reach incorporation within Unicode. The W3C Math WG continues to wrestle with the problems of the characters of mathematics.

Entity name | Unicode | Description |

`&LeftSkeleton;` | E850 | start of missing information |

`&RightSkeleton;` | E851 | end of missing information |

`&LeftBracketingBar;` | F603 | left vertical delimiter |

`&RightBracketingBar;` | E604 | right vertical delimiter |

`&LeftDoubleBracketingBar;` | F605 | left double vertical delimiter |

`&RightDoubleBracketingBar;` | F606 | right double vertical delimiter |

`&HorizontalLine;` | E859 | short horizontal line |

`&VerticalLine;` | E85A | short vertical line |

`&Assign;` | E85B | assignment operator |

`&VerticalSeparator;` | E85C | vertical separating operator |

`&DoubleLeftTee;` | E30F | alias for `&Dashv;` |

`&RoundImplies;` | F524 | right double arrow with rounded head (looks like thin superset) |

`&NotSquareSubset;` | E604 | negated set-like partial order operator |

`&NotSquareSuperset;` | E615 | negated set-like partial order operator |

`&NotSubsetEqual;` | 2288 | alias of `&nsube;` |

`&NotSupersetEqual;` | 2289 | alias of `&nsupe;` |

`&DownLeftRightVector;` | F50B | left-down-right-down harpoon |

`&DownLeftTeeVector;` | F50E | left-down harpoon from bar |

`&DownLeftVectorBar;` | F50C | left-down harpoon to bar |

`&DownRightTeeVector;` | F50F | right-down harpoon from bar |

`&DownRightVectorBar;` | F50D | right-down harpoon to bar |

`&LeftArrowBar;` | 21E4 | alias for `&larrb;` |

`&LeftRightVector;` | F505 | left-up-right-up harpoon |

`&LeftTeeArrow;` | 21A4 | alias for `&mapstoleft;` |

`&LeftTeeVector;` | F509 | left-up harpoon from bar |

`&LeftVectorBar;` | F507 | left-up harpoon to bar |

`&RightArrowBar;` | 21E5 | alias for `&rarrb;` |

`&RightTeeVector;` | F50A | right-up harpoon from bar |

`&RightVectorBar;` | F508 | up-right harpoon to bar |

`&Equal;` | F431 | two consecutive equal signs |

`&GreaterGreater;` | E2F7 | alias for `&Gt;` |

`&LeftTriangleBar;` | F410 | left triangle, vertical bar |

`&LessLess;` | E2FB | alias for `&Lt;` |

`&NotCupCap;` | 226D | alias for `&nasymp;` |

`&NotEqualTilde;` | E84E | alias for `&nesim;` |

`&NotHumpDownHump;` | E616 | alias for `&nbump;` |

`&NotHumpEqual;` | E84D | alias for `&nbumpe;` |

`&NotLeftTriangleBar;` | F412 | not left triangle, vertical bar |

`&NotNestedGreaterGreater;` | F428 | not double greater-than sign |

`&NotNestedLessLess;` | F423 | not double less-than sign |

`&NotPrecedesTilde;` | E5DC | alias for `&npre;` |

`&NotRightTriangleBar;` | E870 | not vertical bar, right triangle |

`&NotSucceedsTilde;` | E837 | not succeeds or similar |

`&RightTriangleBar;` | F411 | vertical bar, right triangle |

`&Product;` | 220F | alias for `&prod;` |

`&Diamond;` | 22C4 | alias for `&diam;` |

`&Cross;` | E619 | cross or vector product |

`&Square;` | 25A1 | alias for `&squ;` |

`&DownArrowBar;` | F504 | down arrow to bar |

`&DownTeeArrow;` | 21A7 | alias for `&mapstodown;` |

`&LeftDownTeeVector;` | F519 | down-left harpoon from bar |

`&LeftDownVectorBar;` | F517 | down-left harpoon to bar |

`&LeftUpDownVector;` | F515 | up-left-down-left harpoon |

`&LeftUpTeeVector;` | F518 | up-left harpoon from bar |

`&LeftUpVectorBar;` | F516 | up-left harpoon to bar |

`&RightDownTeeVector;` | F514 | down-right harpoon from bar |

`&RightDownVectorBar;` | F512 | down-right harpoon to bar |

`&RightUpDownVector;` | F510 | up-right-down-right harpoon |

`&RightUpTeeVector;` | F513 | up-right harpoon from bar |

`&RightUpVectorBar;` | F511 | up-right harpoon to bar |

`&ShortDownArrow;` | E87F | short down arrow |

`&ShortUpArrow;` | E880 | short up arrow |

`&UpArrowBar;` | F503 | up arrow to bar |

`&UpTeeArrow;` | 21A5 | alias for `&mapstoup;` |

`&DownBreve;` | 0311 | breve, inverted (non-spacing) |

`&OverBar;` | 00AF | over bar |

`&OverBrace;` | F612 | over brace |

`&OverBracket;` | F614 | over bracket |

`&OverParenthesis;` | F610 | over parenthesis |

`&UnderBar;` | 0332 | combining low line |

`&UnderBrace;` | F613 | under brace |

`&UnderBracket;` | F615 | under bracket |

`&UnderParenthesis;` | F611 | under parenthesis |

`&EmptyVerySmallSquare;` | F530 | empty very small square |

`&FilledVerySmallSquare;` | F529 | filled very small square |

`&EmptySmallSquare;` | F527 | empty small square |

`&FilledSmallSquare;` | F528 | filled small square |

`&RuleDelayed;` | F51F | rule-delayed (colon right arrow) |