American Scientist, May-June 2009

COMPUTING SCIENCE

# Writing Math on the Web

The Web would make a dandy blackboard if only we could scribble an equation

# Marking It Up

From the outset, the primary language of the Web has been HTML (Hypertext Markup Language), in which “tags” identify various parts of a document’s structure, such as headings, paragraphs and lists. In its earliest versions, HTML was quite simple; it couldn’t even do subscripts and superscripts, so there wasn’t much hope of displaying elaborate mathematical notation.

Several revisions later, HTML does have subscript and superscript tags. And a supplementary language called CSS (Cascading Style Sheets) allows finer control over many aspects of the appearance of a Web page. Modern browsers are also equipped with an interpreter for JavaScript, a programming language, so that Web pages become not just static documents but interactive programs. Still, none of these features directly address the needs of scientific and mathematical writing. There are no HTML tags for integrals, say, or for matrices.
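To make the contrast concrete, here is a minimal Python sketch that builds an expression from HTML's `<sub>` and `<sup>` tags. The helper name `scripted` is invented for illustration. It simply wraps text in the two tags and escapes anything that would otherwise be read as markup.

```python
from html import escape

def scripted(base: str, sub: str = "", sup: str = "") -> str:
    """Wrap a base symbol with optional <sub> and <sup> elements, escaping
    any characters (such as < or &) that HTML would read as markup."""
    out = escape(base)
    if sub:
        out += f"<sub>{escape(sub)}</sub>"
    if sup:
        out += f"<sup>{escape(sup)}</sup>"
    return out

# x_i squared plus y_j squared: about as far as these two tags can go
print(scripted("x", "i", "2") + " + " + scripted("y", "j", "2"))
```

A call such as `scripted("x", "i", "2")` yields `x<sub>i</sub><sup>2</sup>`. A browser sets that subscript and superscript one after the other rather than stacking them, and nothing in this vocabulary builds an integral sign with limits or a matrix.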

Two main impediments stand in the way of presenting mathematics on the Web. First is the alphabet problem: Mathematicians have created a sprawling zoo of novel symbols and embellished or transformed versions of familiar characters. The nabla (∇) that appears in Maxwell’s equations is a notable example, and it is joined by hundreds of other unusual glyphs—∂, ∀, ⊥, ∫—not to mention the entire Greek alphabet and occasional borrowings from Hebrew and other languages. The difficulty of reproducing these characters has been alleviated to some extent by the recent adoption of Unicode fonts, which have room for a larger collection of glyphs than earlier font formats. But it’s still not to be taken for granted that every reader will have the necessary fonts installed.
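Unicode attacks the alphabet problem by giving each symbol a permanent code point. The short Python sketch below (the function name `describe` is illustrative) uses only the standard `unicodedata` module to report the code point, the matching HTML numeric character reference, and the official Unicode name for the symbols mentioned above.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return a character's code point, its HTML numeric character
    reference, and its official Unicode name."""
    cp = ord(ch)
    return f"U+{cp:04X}  &#x{cp:X};  {unicodedata.name(ch)}"

# The symbols from the text, plus a Greek letter for comparison
for ch in "∇∂∀⊥∫α":
    print(ch, describe(ch))
```

For the nabla this prints `U+2207  &#x2207;  NABLA`. The reference `&#x2207;` tells a browser which character is meant, but not how to draw it; if no installed font contains a glyph for U+2207, the reader sees a substitute such as an empty box.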

The second problem is one of layout. Mathematical notation is two-dimensional; in order to represent a matrix, say, or a summation with upper and lower bounds, it’s necessary to specify the exact x and y coordinates of symbols. Some elements of mathematical notation, such as brackets and the radical that designates a square root, vary in shape and size as well as position. Encoding such geometric information in HTML and CSS is not impossible, but it stretches the technology to its limit.
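What "specifying the coordinates" involves shows up even in a toy layout routine for a summation with limits. This is a sketch under deliberately crude assumptions: every character gets the same made-up width (`CHAR_W`), limits and the operator are scaled by made-up factors (`SCRIPT_SCALE`, `OP_SCALE`), and the vertical offsets are arbitrary. Real typesetting systems such as TeX take exact widths, heights and depths from font metrics. The names `Placed`, `width` and `layout_sum` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Toy metrics (assumptions, not real font data): every character is CHAR_W em
# wide; the limits are scaled by SCRIPT_SCALE and the operator by OP_SCALE.
CHAR_W = 0.6
SCRIPT_SCALE = 0.7
OP_SCALE = 1.5

@dataclass
class Placed:
    text: str
    x: float  # left edge, in em
    y: float  # baseline shift, in em; positive is up

def width(text: str, scale: float = 1.0) -> float:
    return len(text) * CHAR_W * scale

def layout_sum(lower: str, upper: str, body: str, op: str = "Σ") -> List[Placed]:
    """Center the upper and lower limits on the operator, then set the body
    to the right of whichever of the three is widest."""
    op_w = width(op, OP_SCALE)
    lo_w = width(lower, SCRIPT_SCALE)
    up_w = width(upper, SCRIPT_SCALE)
    col = max(op_w, lo_w, up_w)  # all three share one centered column
    center = col / 2
    return [
        Placed(upper, center - up_w / 2, 1.4),
        Placed(op, center - op_w / 2, 0.0),
        Placed(lower, center - lo_w / 2, -1.0),
        Placed(body, col + 0.2, 0.0),  # small gap before the summand
    ]

for item in layout_sum("i=1", "n", "x"):
    print(item)
```

Here `layout_sum("i=1", "n", "x")` returns four placed items, with both limits centered over and under the Σ. Even this crude version must measure all three stacked elements before it can position any of them.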

Early in the history of the Web, a group of mathematicians and other interested parties gathered to address this issue in a systematic way. The result was a new markup language called MathML, which was endorsed in 1998 by the World Wide Web Consortium. I’ll return below to the present status and future prospects of MathML, but it will suffice for now to note that most of the mathematical notation to be found on the Web is not encoded in MathML. Instead it relies on a variety of ingenious but ad hoc workarounds. Most often there is a TeX system somewhere in the background.
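For comparison, here is how MathML spells out a small equation, x² + y² = z². The Python sketch below assembles the markup with the standard library's `xml.etree.ElementTree`. The helpers `el` and `power` are illustrative, but the element names (`math`, `mrow`, `msup`, `mi`, `mn`, `mo`) and the namespace URI are those defined by MathML.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def el(tag: str, *children: ET.Element, text: Optional[str] = None) -> ET.Element:
    """Make an element with optional text content and child elements."""
    node = ET.Element(tag)
    node.text = text
    node.extend(children)
    return node

def power(base: str, exp: str) -> ET.Element:
    # <msup> has exactly two children: the base, then the superscript
    return el("msup", el("mi", text=base), el("mn", text=exp))

# x^2 + y^2 = z^2
expr = el("math", el("mrow",
    power("x", "2"), el("mo", text="+"),
    power("y", "2"), el("mo", text="="),
    power("z", "2")))
expr.set("xmlns", MATHML_NS)
markup = ET.tostring(expr, encoding="unicode")
print(markup)
```

Every identifier, number and operator gets its own tag, and `msup` pairs a base with its exponent. The TeX source for the same equation is simply `x^2 + y^2 = z^2`.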

One common strategy is to convert a mathematical expression to an image, or “bitmap.” In some cases each symbol becomes a separate image, and multiple images have to be assembled and carefully positioned to represent an equation. In other cases an entire equation is encapsulated in a single image. The practice takes us back to the pre-alphabetic tradition of pictographic writing.
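The single-image approach can be sketched in a few lines of Python. The sketch assumes that some separate renderer (not shown) turns TeX source into a PNG file and reports how far the equation descends below its baseline. The function name `equation_img`, the `eqn/` directory and the hashing scheme are hypothetical choices for illustration. Naming each file after a hash of its source means an equation that appears twice needs to be rendered only once, and the browser can reuse the cached image.

```python
import hashlib
from html import escape

def equation_img(tex: str, depth_px: int = 0, img_dir: str = "eqn") -> str:
    """Return an <img> tag for an equation assumed to have been rendered
    to PNG elsewhere. The file is named by a hash of the TeX source, so
    identical equations map to the same image file."""
    digest = hashlib.sha1(tex.encode("utf-8")).hexdigest()[:16]
    src = f"{img_dir}/{digest}.png"
    # The TeX source becomes the alt text: it is what shows if the image
    # fails to load and what a screen reader announces.
    # The negative vertical-align lowers the image by the reported depth so
    # that the equation's baseline lines up with the surrounding text.
    return (f'<img src="{src}" alt="{escape(tex)}" '
            f'style="vertical-align: -{depth_px}px">')

print(equation_img(r"\int_0^1 x^2\,dx = \frac{1}{3}", depth_px=8))
```

The `vertical-align` offset is a crude, per-image fix for alignment, one of the problems that come with pictograms; size and spacing still depend on how the image was rendered.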

Pictograms have one big advantage: Any symbol, no matter how arcane, can be displayed in any browser. But there are also drawbacks. Symbols in the images may not blend well with typefaces on the page, and it’s hard to control size, spacing and alignment. A math-intensive document could have hundreds of small images, which are slow to load and display. Images cannot be copied and pasted in the same way that text can, and the equations cannot easily be edited or revised.

But relying on fonts to supply mathematical symbols also has hazards. The author of a Web page cannot know what fonts are available on the reader’s computer, or which of the available fonts will be selected for any given symbol. Thus a page that looks fine to the author may display very differently for some readers, or could be completely indecipherable.
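The hazard can be modeled with a simplified version of per-character font fallback: for each character, use the first font in the page's `font-family` list that is installed and contains a glyph for it. The font names and coverage sets below are invented, and real browsers go further, searching other installed fonts before resorting to a placeholder glyph. The sketch only shows how the same page can resolve differently on two machines; `pick_fonts` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coverage data: the characters each installed font can draw.
# All font names here are invented for illustration.
TEXT = set("abcdefghijklmnopqrstuvwxyz0123456789 +-=/()")
AUTHOR_FONTS: Dict[str, Set[str]] = {"BodySerif": TEXT, "SymbolsB": set("∇∂∀⊥∫")}
READER_FONTS: Dict[str, Set[str]] = {"BodySerif": TEXT, "SymbolsA": set("∂∀∫")}

# The font-family list the page asks for, in order of preference
FAMILY = ["BodySerif", "SymbolsB", "SymbolsA"]

def pick_fonts(text: str, family: List[str],
               installed: Dict[str, Set[str]]) -> List[Tuple[str, Optional[str]]]:
    """For each character, take the first font in `family` that is installed
    and has a glyph for it. None means no listed font can draw it."""
    result = []
    for ch in text:
        font = next((f for f in family if ch in installed.get(f, set())), None)
        result.append((ch, font))
    return result

formula = "∇f = ∂f/∂x"
for who, fonts in [("author", AUTHOR_FONTS), ("reader", READER_FONTS)]:
    print(who, pick_fonts(formula, FAMILY, fonts))
```

On the author's machine every character is covered, with ∇ and ∂ both drawn from `SymbolsB`. On the reader's machine ∇ has no font at all, and ∂ comes from `SymbolsA`, a different design from the one the author saw.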