

Speaking of Mathematics

Brian Hayes

Speaking Mathematics

When the recognizer has done its work, the second component of AsTeR takes over to render the parsed expression in sound. It does so by applying rules written in AFL, the audio formatting language. The rules determine not only what words are spoken but also how they are spoken, controlling the pitch and speed of the voice and a variety of other qualities such as breathiness and smoothness. The rules also invoke nonspeech audio cues.

AsTeR's standard rule for rendering fractions reads a simple expression such as a/b as "a over b," but a more complicated instance such as (x + y)/(x - y) is given as "the fraction with numerator x + y and denominator x - y." A few special cases are recognized, so that 1/2 can be rendered as "one-half" rather than "one over two." All of the AFL rules are subject to modification.
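The fraction rule described above can be sketched in a few lines of Python. This is only an illustration of the dispatch logic, not AFL and not AsTeR's actual code: the class names (`Sym`, `BinOp`, `Frac`), the `speak` function, the special-case table, and the choice to voice "+" and "-" as "plus" and "minus" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Sym:
    """A single symbol or number, e.g. a, x, 1 (hypothetical node type)."""
    name: str


@dataclass
class BinOp:
    """A binary operation such as x + y (hypothetical node type)."""
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class Frac:
    """A fraction with a numerator and a denominator (hypothetical node type)."""
    num: "Expr"
    den: "Expr"


Expr = Union[Sym, BinOp, Frac]

# Assumed special cases; the article mentions only 1/2 explicitly.
SPECIAL_FRACTIONS = {("1", "2"): "one-half"}

# Assumed spoken forms for operators.
OP_WORDS = {"+": "plus", "-": "minus"}


def speak(expr: Expr) -> str:
    """Return the words for an expression tree."""
    if isinstance(expr, Sym):
        return expr.name
    if isinstance(expr, BinOp):
        return f"{speak(expr.left)} {OP_WORDS[expr.op]} {speak(expr.right)}"
    if isinstance(expr, Frac):
        return speak_fraction(expr)
    raise TypeError(f"unsupported node: {expr!r}")


def speak_fraction(frac: Frac) -> str:
    """Choose a short or long reading depending on how complex the parts are."""
    simple = isinstance(frac.num, Sym) and isinstance(frac.den, Sym)
    if simple:
        special = SPECIAL_FRACTIONS.get((frac.num.name, frac.den.name))
        if special is not None:
            return special
        return f"{speak(frac.num)} over {speak(frac.den)}"
    return (
        f"the fraction with numerator {speak(frac.num)} "
        f"and denominator {speak(frac.den)}"
    )
```

Because the special cases live in a table and the long and short readings are separate branches, a listener's preferences could be accommodated by editing the rule rather than the parser, which is the spirit of making the rules "subject to modification."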

The rendering of superscripts and subscripts is an area where changes in voice quality provide an intuitive vocal analogue of the visual rendering. Superscripts are read at a higher pitch and subscripts at a lower pitch. Such voice cues can help to resolve ambiguities in an audio rendering. For instance, x^n + 1 is readily distinguished from x^(n + 1), even without an explicit verbal marker of where the exponent ends.
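To see how pitch alone can mark where an exponent ends, consider a small sketch that turns an expression tree into a sequence of (word, pitch level) pairs. The node classes, the `render_pitch` function, and the integer pitch levels are hypothetical; a real speech synthesizer would map such levels onto its own voice parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Atom:
    """A symbol or number (hypothetical node type)."""
    text: str


@dataclass
class Plus:
    """A sum of two terms (hypothetical node type)."""
    left: "Node"
    right: "Node"


@dataclass
class Sup:
    """A base with a superscript (hypothetical node type)."""
    base: "Node"
    exponent: "Node"


@dataclass
class Sub:
    """A base with a subscript (hypothetical node type)."""
    base: "Node"
    index: "Node"


Node = Union[Atom, Plus, Sup, Sub]

PITCH_STEP = 1  # assumed size of one pitch change, in arbitrary units


def render_pitch(node: Node, pitch: int = 0) -> List[Tuple[str, int]]:
    """Flatten a tree into (word, pitch level) pairs.

    Superscripts raise the pitch by one step, subscripts lower it,
    and everything inside them inherits the shifted level.
    """
    if isinstance(node, Atom):
        return [(node.text, pitch)]
    if isinstance(node, Plus):
        return (
            render_pitch(node.left, pitch)
            + [("plus", pitch)]
            + render_pitch(node.right, pitch)
        )
    if isinstance(node, Sup):
        return render_pitch(node.base, pitch) + render_pitch(
            node.exponent, pitch + PITCH_STEP
        )
    if isinstance(node, Sub):
        return render_pitch(node.base, pitch) + render_pitch(
            node.index, pitch - PITCH_STEP
        )
    raise TypeError(f"unsupported node: {node!r}")
```

In this sketch both x^n + 1 and x^(n + 1) produce the same four words, "x n plus 1," but in the first the voice drops back to the base level before "plus," whereas in the second it stays raised through the final "1."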

An even more direct mapping from visual space to auditory space helps the listener to discern the structure of tables and matrices. With stereophonic output, AsTeR can vary the relative loudness of the left and right sound channels while reading the rows of a matrix, so that the voice seems to be moving through the structure.
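One way to picture the stereo effect is to assign each matrix entry a pair of channel gains according to its column. The sketch below assumes a simple linear pan from full left to full right across each row; the article does not say which panning law AsTeR uses, and the function names `pan_gains` and `read_matrix` are invented for the example.

```python
from typing import List, Sequence, Tuple


def pan_gains(column: int, n_columns: int) -> Tuple[float, float]:
    """Return (left gain, right gain) for a column, using an assumed linear pan.

    The first column is fully left, the last fully right, and a
    single-column row is centered.
    """
    if n_columns < 1 or not 0 <= column < n_columns:
        raise ValueError("column out of range")
    if n_columns == 1:
        return (0.5, 0.5)
    position = column / (n_columns - 1)  # 0.0 at far left, 1.0 at far right
    return (1.0 - position, position)


def read_matrix(rows: Sequence[Sequence[str]]) -> List[Tuple[str, float, float]]:
    """List the entries in reading order, each with its channel gains."""
    events = []
    for row in rows:
        for j, entry in enumerate(row):
            left, right = pan_gains(j, len(row))
            events.append((entry, left, right))
    return events
```

Reading a three-column row this way places the first entry entirely in the left channel, the second in the center, and the third entirely in the right, and the sweep starts over at the left with each new row.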

Nonspeech sounds provide a concise and unobtrusive way of conveying certain other textual features. In a bulleted list, a brief tone can announce each new item, rather than repeating the word "bullet." Sounds played continuously in the background while speech continues can serve to emphasize or highlight a passage of text, providing an audio equivalent of italic type and boldface.

The aim of these various devices is to create a true audio notation for mathematics. In written mathematics, succinct notation allows the overall structure of an expression to be taken in at a glance, whereas the same concepts expressed in words would have to be laboriously parsed. AsTeR seeks in a similar way to shift some of the work of listening from the cognitive to the perceptual domain.
