
COMPUTING SCIENCE

The Semicolon Wars

Every programmer knows there is one true programming language. A new one every week.

Brian Hayes

If you want to be a thorough-going world traveler, you need to learn 6,912 ways to say "Where is the toilet, please?" That's the number of languages known to be spoken by the peoples of planet Earth, according to Ethnologue.com.

If you want to be the complete polyglot programmer, you also have quite a challenge ahead of you, learning all the ways to say:

printf("hello, world\n");

(This one is in C.) A catalog maintained by Bill Kinnersley of the University of Kansas lists about 2,500 programming languages. Another survey, compiled by Diarmuid Piggott, puts the total even higher, at more than 8,500. And keep in mind that whereas human languages have had millennia to evolve and diversify, all the computer languages have sprung up in just 50 years. Even by the more-conservative standards of the Kinnersley count, that means we've been inventing one language a week, on average, ever since Fortran.

[Figure: A chronology of selected programming languages...]

For ethnologists, linguistic diversity is a cultural resource to be nurtured and preserved, much like biodiversity. All human languages are valuable; the more the better. That attitude of detached reverence is harder to sustain when it comes to computer languages, which are products of design or engineering rather than evolution. The creators of a new programming language are not just adding variety for its own sake; they are trying to make something demonstrably better. But the very fact that the proliferation of languages goes on and on argues that we still haven't gotten it right. We still don't know the best notation—or even a good-enough notation—for expressing an algorithm or defining a data structure.

There are programmers of my acquaintance who will dispute that last statement. I expect to hear from them. They will argue—zealously, ardently, vehemently—that we have indeed found the right programming language, and for me to claim otherwise is willful ignorance. The one true language may not yet be perfect, they'll concede, but it's built on a sound foundation and solves the main problems, and now we should all work together to refine and improve it. The catch, of course, is that each of these friends will favor a different language. It's Lisp, says one. No, it's Python. It's Ruby. It's Java, C#, Lua, Haskell, Prolog, Curl.

Sadly, linguistic diversity has a dark side. Communities separated by differences of language don't always get along peaceably; the term "Balkanization" comes to mind. And, like weary, war-torn countries, the computing professions have had their share of sectarian strife and schism. As far as I know, the conflicts have never come to actual bloodshed, but harsh words have been exchanged (in many languages).

The Endian Wars

In 1726 Jonathan Swift told of a dispute between the Little-Endians of Lilliput and the Big-Endians of Blefuscu; 41,000 perished in a war fought to decide which end of a boiled egg to crack. This famous tempest in an egg cup was replayed 250 years later by designers of computer hardware and communications protocols. When a block of data is stored or transmitted, either the least-significant bit or the most-significant bit can go first. Which way is better? It hardly matters, although life would be easier if everyone made the same choice. But that's not what has happened, and so quite a lot of hardware and software is needed just to swap ends at boundaries between systems.

This modern echo of Swift's Endian wars was first pointed out by Danny Cohen of the University of Southern California in a brilliant 1980 memo, "On holy wars and a plea for peace." The memo, subsequently published in Computer, was widely read and admired; the plea for peace was ignored.
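The ends problem is easy to reproduce. The text speaks of bits; the form most programmers meet in practice concerns the order of bytes within a word, and that is what this minimal Python sketch shows. The value 0x0A0B0C0D is an arbitrary example chosen for illustration, not something taken from Cohen's memo.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit example value

# The same integer laid out with the most-significant byte first ...
big = value.to_bytes(4, byteorder="big")
# ... and with the least-significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Swapping ends at a system boundary: bytes written by a little-endian
# sender must be read back with the same convention to recover the value.
assert int.from_bytes(little, byteorder="little") == value

# Reading them with the wrong convention silently yields a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0xd0c0b0a
```

Neither layout is better than the other; the trouble arises only when the writer and the reader disagree.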

Another feud—largely forgotten, I think, but never settled by truce or treaty—focused on the semicolon. In Algol and Pascal, program statements have to be separated by semicolons. For example, in x:=0; y:=x+1; z:=2 the semicolons tell the compiler where one statement ends and the next begins. C programs are also peppered with semicolons, but in C they are statement terminators, not separators. What's the difference? C needs a semicolon after the last statement, but Pascal doesn't. This discrepancy was one of the gripes cited by Brian W. Kernighan of AT&T Bell Labs in a 1981 diatribe, "Why Pascal is not my favorite programming language." Although Kernighan's paper was never published, it circulated widely in samizdat, and in retrospect it can be seen as the beginning of the end of Pascal as a serious programming tool.
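The separator-versus-terminator distinction can be sketched by laying out the same three statements both ways. This is Python used purely as a text generator; the helper names are hypothetical, and the output illustrates only where the semicolons fall, not valid Pascal or C.

```python
statements = ["x := 0", "y := x + 1", "z := 2"]

def as_separator(stmts):
    """Pascal style: a semicolon goes between statements, none after the last."""
    return "; ".join(stmts)

def as_terminator(stmts):
    """C style: every statement, including the last, ends with a semicolon."""
    return " ".join(s + ";" for s in stmts)

print(as_separator(statements))   # x := 0; y := x + 1; z := 2
print(as_terminator(statements))  # x := 0; y := x + 1; z := 2;
```

Three statements need two separators but three terminators; the whole quarrel is over that last semicolon.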

Still another perennially contentious issue is how to count. This one brings out the snarling dogmatism in the meekest programmer. Suppose we have a list of three items. Do we number them 1, 2, 3, or should it be 0, 1, 2? Everyone in computerdom knows the answer to that question, and knows it as an eternal truth held with the deepest, visceral conviction. Only one of the alternatives is logically tenable. But which is it? Consider the Java expression Date(2006,1,1); what calendar date do you suppose that specifies? The answer is February 1, 3906. In Java we count months starting with 0, days starting with 1, and years starting with 1900.
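The arithmetic behind that answer can be written out explicitly. The sketch below is Python, not Java, and the function legacy_date is a hypothetical stand-in that merely imitates the three offsets described above: years counted from 1900, months from 0, days from 1.

```python
from datetime import date

def legacy_date(year_offset, month_index, day):
    """Imitate the convention described in the text (hypothetical helper).

    year_offset: years since 1900
    month_index: 0 for January ... 11 for December
    day:         day of the month, counted from 1
    """
    return date(1900 + year_offset, month_index + 1, day)

d = legacy_date(2006, 1, 1)
print(d.isoformat())  # 3906-02-01

# What the caller presumably meant: January 1, 2006.
intended = legacy_date(2006 - 1900, 0, 1)
print(intended.isoformat())  # 2006-01-01
```

Three arguments, three different starting points: the mistake is almost impossible to avoid on a first encounter.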

Even the parts of a program that aren't really part of the program can provoke discord. "Comments" are meant for the human reader and have to be marked in some way so that the computer will ignore them. You might think it would be easy to choose some marker that could be reserved for this purpose in all languages. But a compendium of programming-language syntax compiled by Pascal Rigaux—a marvelous resource, by the way—lists some 39 incompatible ways to designate comments: # in awk, \ in Forth, (*...*) in Pascal, /*...*/ in C, and so on. There's also a running debate over whether comments should be "nestable"—whether it's permissible to have comments inside comments.

Then there's the CamelCase controversy. Most programming languages insist that names of things—variables, procedures, etc.—be single words, without spaces inside them; but runningthewordstogether makes them unreadable. Hence CamelCase, with humps in the middle (also known as BumpyCaps and NerdCaps; but sTuDLy CaPs are something else). To tell the truth, I don't think there's much actual controversy about the use of CamelCase, but the name has occasioned lively and erudite discussions, revisiting old questions about Camelus dromedarius and C. bactrianus, and offering glimpses of such further refinements as sulkingCamelCase (with a droopy head).

Organizing Babel

I mock the pettiness of these squabbles—and I believe some of them deserve mocking—and yet I don't want to give the impression that only cosmetic issues are in dispute, or that programming languages are really all alike under the skin. On the contrary, what's most fascinating about programming languages is how dramatically they differ. I would argue that the distance between C and Lisp, for example, is greater than that between any pair of human languages.

Noam Chomsky asserts that all human languages have the same "deep structure," which may even be hard-wired into the brain. In computer languages, too, certain features seem to be universal. Almost all programming languages are built on the same kind of grammatical scaffold, called a context-free grammar. At the semantic level, almost all programming languages have the same computational power: If you can compute something in one language, you can get the same answer in any other, given enough effort. But this formal equivalence is misleading. Raw computational power is not what people care about in a programming language; the real criterion is how readily you can express your ideas.

In the 1930s the linguists Edward Sapir and Benjamin Lee Whorf argued that what you can think is conditioned by what language you think in. For natural languages, the Sapir-Whorf hypothesis has met with much skepticism, but for computer languages the idea seems more plausible. Different categories of programming languages elicit quite different modes of thinking and problem solving.

[Figure: The imperative and the functional styles...]

Programming languages are usually classified in four families. Imperative languages are built on commands: do this, do that, do the next thing. The commands act on stored data, modifying the overall state of the system. The imperative approach was the default in most early programming languages, including Fortran, COBOL and Algol.

A functional language is modeled on the idea of a mathematical function, such as f(x) = x². The function is a black box that accepts arguments as input and returns values as output. A key point is that the calculation depends only on the arguments and affects only the value; there are no extraneous side effects. This property makes it easier to reason about functional programs, since there's no need to keep track of the state of the entire machine. Functional programming began with Lisp, although most versions of Lisp allow other styles of programming as well. John Backus, the lead developer of Fortran and a contributor to Algol, later became an advocate of functional languages. Several "pure" functional languages have emerged since then, including ML, Miranda and Haskell.
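The contrast between these first two families shows up even in a task as small as summing the squares of a list. Both versions below are written in Python, chosen only for illustration, and the function names are hypothetical.

```python
def sum_squares_imperative(numbers):
    """Imperative style: a sequence of commands that update stored state."""
    total = 0
    for n in numbers:
        total = total + n * n  # modify the running total step by step
    return total

def square(x):
    """A pure function: the result depends only on the argument."""
    return x * x

def sum_squares_functional(numbers):
    """Functional style: compose functions; no variable is ever reassigned."""
    return sum(map(square, numbers))

print(sum_squares_imperative([1, 2, 3]))  # 14
print(sum_squares_functional([1, 2, 3]))  # 14
```

The answers agree, as they must. What differs is what the reader has to keep in mind: the first version asks you to track the changing value of total, the second only to know what square and sum return.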

In object-oriented programming languages the root idea is to bind together imperative commands and the data they act on, forming encapsulated objects. Instead of defining a procedure to manipulate a data structure, one "teaches" the data structure how to carry out operations on itself. Most object-oriented languages also have some notion of inheritance, whereby an object is born already knowing default behaviors. The object-oriented languages trace their heritage back to SIMULA 67, but they began to attract attention only in the 1980s with Smalltalk. In a curious turn of events, object-oriented principles became wildly popular, but the result was not the widespread adoption of Smalltalk; instead, object-oriented features were bolted onto other languages. From C, for example, came C++ and Objective C and eventually C#; Java is also in this family. Object-oriented notions are now so deeply ingrained that they influence almost every new language.

The languages of the fourth category are variously known as logic, relational or declarative languages. What they have in common is the idea of programming not by spelling out step-by-step algorithms but by stating facts or relations. The best-known exemplar of this technique is Prolog, which relies on a method called unification to make deductions from stated facts. Related concepts also turn up in less-exotic areas such as database-query languages and spreadsheets.

These four categories suggest the breadth of the programming-language spectrum, but there are further variations across many other dimensions. At the most superficial level, the various languages simply look different. C is terse, COBOL quite verbose. Lisp is full of parentheses. Perl, said some wag, looks like Snoopy swearing: @&$^^#@!.

Languages can also be distinguished as "low-level" or "high-level." The low-level ones allow more-direct access to aspects of the underlying hardware, such as addresses in memory or input and output devices. High-level languages provide an insulating layer of abstraction.

A generation of languages created in the 1970s emphasized "structured programming"—otherwise known as bondage and discipline. Pascal is in this group: It enforces strict rules about types of data and the flow of control through a program. The reaction against such constraints produced "hacker-friendly" languages, including C.

Languages also differ in their intended audience or area of application. Fortran began as a language for scientific computing, COBOL for business. Quite a few interesting languages were designed for teaching or for children. BASIC, Pascal and Smalltalk are all in this class, and so is Logo. (All of them have had to struggle to be taken seriously as languages for grownups.)

Zealotry

The remarkably wide range of programming languages would seem to offer something for everyone. We could celebrate diversity. We could let a thousand flowers bloom. What actually happens, more often, is that we launch a crusade to convert the infidels—or else exterminate them.

In 1975 Edsger W. Dijkstra, a major figure in the structured-programming movement, wrote a memo titled "How Do We Tell Truths that Might Hurt?" The "truths" were mostly Dijkstra's opinions of programming languages; how he told them was very bluntly. Fortran is "an infantile disorder," PL/I "a fatal disease," APL "a mistake, carried through to perfection." Students exposed to COBOL "are mentally mutilated beyond hope of regeneration," he said. "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense." When the memo was published a few years later, defenders of COBOL and BASIC replied in kind, although none of them were quite able to match Dijkstra's acid rhetoric.

In fairness, I should note that most disputes over programming languages are neither as vicious nor as humorless as the affair of Dijkstra's "truths." Today's missionaries take an upbeat approach, spending more time in promoting their own religion and less in dissing the other person's beliefs. The message is no longer "You'll burn in hell if you write C." It's "Look what a paradise Python offers you!" (I think maybe I liked the old sermons better.)

Much of this proselytizing is done with the best of intentions. When you have found a tool that seems artful and elegant, you want to spread the good news. This is a generous impulse. But there is also self-interest at work. For programming language P to prosper, it must have a community of users—people who write P programs and buy books about P, who teach P to students, who agitate to get P supported on new platforms, who hire P programmers. Every convert to P improves P's chance of survival; if the convert comes from the rival language Q, so much the better.

Quarrels over notation are hardly unique to the world of computing. In mathematics there was the famous impasse between the Leibnizian dx/dt and the Newtonian "x-dot" (known as the war between deity and dotage). Chemists wrangle about how to name molecules. Even chess players have fought over how to record moves. But the situation in computer science is of a different order. Calculus never had 2,500 ways to write a derivative.

Over the years, the cacophony of programming languages has repeatedly been cited as a threat to further progress in computing. The usual response has been—what else?—to propose yet another programming language. "If we could all just get together and agree on one last, greatest language...." In the 1960s this was the ambition of PL/I, the language that Dijkstra called a fatal disease. Later, Ada was to reunify all of computing—by mandate of the U.S. Department of Defense. A decade ago Java was the shining hope, promoted with the slogan "Write once, run anywhere."

A few programming languages—most notably Fortran and Lisp—seem to be all but immortal; the rest are like waves washing ashore and then draining into the sand. Riding the crest of the latest wave are the scripting languages, especially Python and Ruby. Their origins are humble. The idea of scripting began with batch-command languages, used as "glue" to bind together other programs, and with extension languages, meant to be embedded inside programs. But scripting languages have grown up into general-purpose programming languages. They are popular now for writing Internet applications. Python also has a following in scientific computing.

The Internet has brought another encouraging development: a new multilingualism. Merely managing a Web site these days requires fluency in half a dozen programming and data-formatting languages. There's HTML (Hypertext Markup Language) for the basic structure of the pages and CSS (Cascading Style Sheets) for details of presentation, as well as JavaScript for annoyances such as pop-up windows. On the server side, content is likely to be encoded in some form of XML (Extensible Markup Language) and accessed through a database query language such as SQL. All the pieces are held together by a scripting language, which might be PHP, Perl, Python or Ruby. (Of course this situation cries out for yet another language to unify or replace all the others. At least two languages are already contending for this role—Curl and Links.)

Lisping in Numbers

My plea for peace in programmerhood would carry more weight if I could present myself as an impartial arbiter, with no stake in the outcome of the language race. But it's time to confess. I too have a favorite programming language, which I cling to like a child with a bedraggled teddy bear. Don't you dare try to take away my Lisp!

[Figure: Carnage at the breakfast table...]

Without engaging in missionary zealotry of my own, it's hard to explain my fondness for Lisp. I'll just say it's a simple-minded language with one trick that it does very well. Every Lisp expression is a list, evaluated by reading the first element of the list as the name of a function and the remaining elements as the arguments to the function. For example, (/ (+ 3 5) 2) is a program for dividing (+ 3 5) by 2, where (+ 3 5) is a subprogram for adding 3 and 5. The value of the entire expression is 4. The syntax is brutally simple, even primitive, but that's its strength. Lisp evangelists always note that data and programs are represented in the same way, which makes it easy to write programs that manipulate other programs. That's true, but what appeals to me is just the uniformity of the notation. Everything is done the same way, and so there's not much to remember. (One thing I won't mention is the profusion of parentheses (which annoy some people). (What the world needs (I think) is not (a Lisp (with fewer parentheses)) but (an English (with more.))))
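The evaluation rule just described is small enough to sketch in a few dozen lines. What follows is a toy written in Python, not a real Lisp: it assumes integer atoms, knows only four two-argument arithmetic operators, and does no error checking. The names tokenize, parse and evaluate are hypothetical.

```python
import operator

# Only four arithmetic functions are known to this toy evaluator.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def tokenize(text):
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn a token list into nested Python lists, consuming tokens in place."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return expr
    if token in FUNCTIONS:
        return token
    return int(token)  # atoms other than operators are assumed to be integers

def evaluate(expr):
    """First element names the function; the rest are its arguments."""
    if isinstance(expr, int):
        return expr
    function = FUNCTIONS[expr[0]]
    arguments = [evaluate(arg) for arg in expr[1:]]
    return function(*arguments)

print(evaluate(parse(tokenize("(/ (+ 3 5) 2)"))))  # 4.0
```

Python's division yields the floating-point 4.0 where a Lisp would report 4, but the shape of the computation is the same: evaluate the arguments, then apply the function named at the head of the list. Note that the parsed program is itself just a nested list, which is the point the evangelists make about programs and data.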

In the chronology of programming languages, Lisp comes from the very dawn of time. It was conceived nearly 50 years ago by John McCarthy, now of Stanford University. My own acquaintance with the language goes back 25 years. To persist in using such an antique idiom seems peculiar and affected, like speaking in Miltonic verses. There's something stubborn and curmudgeonly about it. It looks like a rebuke to all the effort expended on programming-language design since the 1950s. Do I really mean to suggest that not one of the 2,500 newer languages has been able to improve on Lisp?

No, I don't. And of course the Lisp I speak today is not the language McCarthy introduced 50 years ago. It has been augmented, overhauled, updated, split into multiple dialects, then reassembled in a standard called Common Lisp. (Still, the parts of the language I like best were there at the beginning and have changed little.)

An International Lisp Conference was held at Stanford a year ago. This was a gathering of the faithful, and naturally there was talk about how to bring enlightenment to the rest of the world. It was also an occasion showing that even advocates of the same language are quite capable of arguing among themselves deep into the night.

At the end of the final session, John McCarthy rose to speak. He looked around at his audience and remarked, "If someone set off a bomb in this room, it would wipe out half of the worldwide Lisp community. That might not be a bad thing for Lisp, because it would have to be reinvented." His meaning, as I understood it, was partly that the Common Lisp standard had stifled innovation. But he went on to say that if he could go all the way back to the beginning, there were things he would do differently. Even the maker of the language did not see it as beyond improvement. I found McCarthy's candor refreshing, but I also had the thought: No, no, don't tamper with it. I like it just the way it is.

I do believe there are real differences among programming languages—better ones and worse ones—and I rank Lisp among the better. When you get to the bottom of it, however, I write programs in Lisp for the same reason I write prose in English—not because it's the best language, but because it's the language I know best.

© Brian Hayes

