Speaking of Mathematics

Brian Hayes

Parsing Mathematics

AsTeR has three main components. A recognizer parses LaTeX notation and creates an internal representation that is easier for the program to manipulate. An audio formatting language, called AFL, renders the parsed text using both speech and nonspeech sounds. The third component is a facility for audio browsing, or actively traversing the structure of a document.

The recognizer extracts structure and ideally even meaning from the TeX-encoded text. When given a mathematical expression, it parses the entire input before the audio rendering begins. Looking ahead in the text is something that even simple speech-synthesis systems may have to do—for example, a question mark at the end of a sentence can alter the intonation of the beginning—but those systems never rearrange the words spoken. AsTeR must carry out some deeper transformations.

A simple example is the audio formatting of the expression log_{10} x, which a listener might prefer to hear spoken as "the logarithm of x to the base 10." In creating this rendering, AsTeR cannot simply process the symbols in their original sequence. Integrals present a similar challenge, because the listener needs to know the variable of integration as soon as possible. Thus

[displayed equation: the integral from 0 to ∞ of e^(−x) dx]

might be read as "the integral with respect to x, from zero to infinity, of e to the minus x, dx." The LaTeX encoding is

$$\int_0^\infty {e^{-x}}\dx$$

which requires AsTeR to search out the \dx at the end of the expression before it can render the \int at the beginning. To generate renderings for texts like these, AsTeR must break the expression down into its component pieces and then reassemble them in a different order.
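The reordering can be illustrated with a short Python sketch. It is not AsTeR's code: the nested-tuple representation, the `speak` function and the operator names ("log", "integral", "power", "minus") are all assumptions made for illustration. The point is only that once the expression has been taken apart, the pieces can be spoken in whatever order suits the listener.

```python
def speak(expr):
    """Render a parsed expression (nested tuples, operator first) as English."""
    if isinstance(expr, str):          # a leaf: a symbol or a number
        return expr
    op, *args = expr
    if op == "log":                    # hypothetical form: ("log", base, argument)
        base, arg = args
        # The base is written first but spoken last.
        return f"the logarithm of {speak(arg)} to the base {speak(base)}"
    if op == "integral":               # hypothetical form: ("integral", lower, upper, integrand, variable)
        lower, upper, body, var = args
        # The variable of integration is written last but announced first.
        return (f"the integral with respect to {speak(var)}, "
                f"from {speak(lower)} to {speak(upper)}, "
                f"of {speak(body)}, d{speak(var)}")
    if op == "power":
        base, exponent = args
        return f"{speak(base)} to the {speak(exponent)}"
    if op == "minus":
        return f"minus {speak(args[0])}"
    raise ValueError(f"no rendering rule for {op!r}")


log_expr = ("log", "10", "x")
integral = ("integral", "zero", "infinity", ("power", "e", ("minus", "x")), "x")

print(speak(log_expr))
print(speak(integral))
```

Run on the two examples from the text, the sketch produces the renderings quoted above, even though neither phrase follows the order of the written symbols.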

Mathematical expressions have a treelike structure. The equation y = x + 2 can be understood as a tree that has the = operator as its root, with two branches. One of the branches consists of the symbol y, whereas the other branch is a subtree with the operator + as its root and with further branches x and 2. The tree can be represented in prefix notation as (= y (+ x 2)). AsTeR employs a similar notation internally.
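A minimal Python sketch may make the tree concrete. The `Node` class and the `to_prefix` function are names invented here, and nothing about AsTeR's actual internal data structures is implied; the sketch only builds the tree for y = x + 2 and writes it out in prefix notation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """An operator together with its branches (hypothetical structure)."""
    op: str
    args: tuple


def to_prefix(expr):
    """Write a tree in prefix notation: operator first, then each branch."""
    if isinstance(expr, str):          # a leaf such as "y" or "2"
        return expr
    parts = [expr.op] + [to_prefix(arg) for arg in expr.args]
    return "(" + " ".join(parts) + ")"


# The root is "="; one branch is y, the other is the subtree for x + 2.
equation = Node("=", ("y", Node("+", ("x", "2"))))
print(to_prefix(equation))   # (= y (+ x 2))
```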

Parsing is a straightforward matter for mathematical expressions written in a programming language such as FORTRAN or Pascal. But real mathematical writing, including its TeX representation, is highly ambiguous. An expression such as f(x + y) might mean "the product of f and x + y," or it might mean "the function f applied to x + y." Similarly, cos x sin y could be parsed as either (× (cos x) (sin y)) or as (cos (× x (sin y))). Superscripts are another construct that can have multiple meanings: x^{-1} means 1/x, but sin^{-1} refers to the inverse sine function; A^T is probably the transpose of a matrix and D^2 may be a second derivative. AsTeR's recognizer is able to resolve many of these ambiguities; those that remain are left until the rendering phase, when the user can specify how a particular notation is to be interpreted.
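One way to picture the division of labor between recognizer and user is a function that returns every plausible reading of a superscript. The Python sketch below is a guess at the general idea, not a description of AsTeR's heuristics: the rules, the `superscript_readings` function and the `FUNCTION_NAMES` set are all hypothetical. A result with a single entry counts as resolved; a longer list is an ambiguity that would be left for the rendering phase.

```python
# Hypothetical set of names treated as functions rather than variables.
FUNCTION_NAMES = frozenset({"sin", "cos", "tan"})


def superscript_readings(base, exponent, matrices=frozenset()):
    """Return the candidate meanings of base^exponent as prefix-form tuples."""
    if exponent == "-1":
        if base in FUNCTION_NAMES:
            return [("inverse", base)]       # sin^{-1}: the inverse sine
        return [("reciprocal", base)]        # x^{-1}: 1/x
    if exponent == "T":
        if base in matrices:                 # context says base is a matrix
            return [("transpose", base)]
        # Without context, both readings survive for the user to choose.
        return [("transpose", base), ("power", base, "T")]
    return [("power", base, exponent)]


for base, exponent in [("x", "-1"), ("sin", "-1"), ("A", "T")]:
    print(base, exponent, superscript_readings(base, exponent))
```

Passing `matrices=frozenset({"A"})` stands in for contextual knowledge: with it, A^T resolves to the transpose alone.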
