COMPUTING SCIENCE
Rumours and Errours
Brian Hayes
Typos and Thinkos
Computer programming teaches humility, or at least that's my
experience. In principle, the discrepancy I observed might have
pointed to an error in the published result, but that wasn't my
first hypothesis. I checked my own code, fully expecting to find
some careless mistake—running through a loop one time too few
or too many, failing to update a variable, miscalculating an array
index. Nothing leapt out at me. The problem, I began to suspect, was
not a typo but a thinko.
I did know of one soft spot in the program. The individuals
X and Y were chosen in such a way that they could
both turn out to be the same person, suggesting the strange
spectacle of spreading a rumor to oneself. ("Pssst. Have I
heard about...?") When I went to fix this oddity, I discovered
another bug. A variable named spreader-count was
incremented or decremented on each passage through the loop,
according to the outcome of the encounter; when this variable
reached zero, the program ended. After each spreader-spreader
interaction, I decreased spreader-count by 2—with
potentially disastrous results if X and Y were
identical. This was a serious flaw, which needed to be repaired;
however, the change had no discernible effect on the value of
θ, which remained stuck at 0.285.
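The repair described above can be sketched in a few lines of Python. This is a hedged reconstruction, not the original program. The encounter rules are assumptions (a spreader who meets an ignorant recruits a new spreader, two spreaders who meet both become stiflers, a spreader who meets a stifler becomes a stifler), as is the choice of X from among the current spreaders, and all names are illustrative. The length of the `spreaders` list plays the part of spreader-count. Drawing Y from the other N - 1 people guarantees that X and Y differ, so retiring two spreaders at once is always safe.

```python
import random

IGNORANT, SPREADER, STIFLER = 0, 1, 2

def simulate_rumor(population, rng):
    """Run one rumor until no spreaders remain; return theta,
    the fraction of the population that never heard it."""
    if population < 2:
        raise ValueError("need at least two people")
    state = [IGNORANT] * population
    state[0] = SPREADER
    spreaders = [0]   # len(spreaders) stands in for spreader-count
    where = {0: 0}    # person -> index in spreaders, for O(1) removal

    def retire(person):
        # Swap-remove the person from the spreader list; mark as stifler.
        idx = where.pop(person)
        last = spreaders.pop()
        if last != person:
            spreaders[idx] = last
            where[last] = idx
        state[person] = STIFLER

    while spreaders:
        # Assumption: X is drawn from the current spreaders.
        x = rng.choice(spreaders)
        # Draw Y uniformly from the other N - 1 people, so Y != X.
        y = rng.randrange(population - 1)
        if y >= x:
            y += 1
        if state[y] == IGNORANT:
            state[y] = SPREADER          # the rumor is passed on
            where[y] = len(spreaders)
            spreaders.append(y)
        elif state[y] == SPREADER:
            retire(x)                    # both lose interest: count drops by 2
            retire(y)
        else:
            retire(x)                    # Y is a stifler: only X retires

    return state.count(IGNORANT) / population

if __name__ == "__main__":
    rng = random.Random(2005)            # arbitrary seed, for repeatability
    runs = [simulate_rumor(1_000, rng) for _ in range(100)]
    print(sum(runs) / len(runs))
```

If the assumed rules match those of the original program, the printed average should land close to the figures quoted in the text. Allowing Y to equal X would instead let the spreader-spreader branch try to retire the same person twice.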
I had another thought. Belen and Pearce were careful to state that
their result holds only when the population size tends to infinity.
Perhaps my discrepancy would go away in a larger sample. I tried a
range of populations, with these results:
| Population | θ |
| --- | --- |
| 10 | 0.354 |
| 100 | 0.296 |
| 1,000 | 0.286 |
| 10,000 | 0.285 |
| 100,000 | 0.285 |
The trend was in the right direction—a smaller proportion of
residual ignorants as population increased—but the curve
seemed to flatten out beyond 1,000, and θ looked unlikely ever
to reach 0.203. Even so, it seemed worthwhile to test still larger
populations, but for that I would need a faster program. I wrote a
new and simpler version, dispensing with the array of individuals
and merely keeping track of the number of persons in each of the
three categories. With this strategy I was able to test populations
up to 100 million. The value of θ remained steady at 0.285.
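The count-only strategy can be sketched as follows, again as a hedged reconstruction rather than the program actually used. Because individuals are interchangeable, it is enough to draw which category X's partner belongs to, with probabilities proportional to the numbers of ignorants, other spreaders and stiflers among the remaining N - 1 people. The encounter rules (given in the comments) and all function and variable names are assumptions.

```python
import random

def simulate_counts(population, rng):
    """One run of the rumor using only category counts; returns theta."""
    if population < 2:
        raise ValueError("need at least two people")
    ignorants, spreaders, stiflers = population - 1, 1, 0
    while spreaders > 0:
        # X is a spreader; pick X's partner among the other N - 1 people.
        pick = rng.randrange(population - 1)
        if pick < ignorants:
            # Partner is ignorant: the partner becomes a spreader.
            ignorants -= 1
            spreaders += 1
        elif pick < ignorants + spreaders - 1:
            # Partner is another spreader: both become stiflers.
            spreaders -= 2
            stiflers += 2
        else:
            # Partner is a stifler: X becomes a stifler too.
            spreaders -= 1
            stiflers += 1
    return ignorants / population

def mean_theta(population, runs, seed=0):
    """Average theta over several independent runs."""
    rng = random.Random(seed)
    return sum(simulate_counts(population, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    # Fewer runs for larger populations keeps the sweep quick.
    for n, runs in ((10, 2000), (100, 500), (1_000, 100), (10_000, 20)):
        print(n, round(mean_theta(n, runs), 3))
```

Each run needs a number of loop iterations proportional to the population, so in pure Python a population of 100 million would be slow. A compiled language would be the more practical choice at that scale.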

Looking at the distribution of θ values from single
runs of the program (rather than averages over many runs) suggested
another idea. Most of the results were clustered between
θ=0.25 and θ=0.35, but there were a
few outliers—runs in which 99 percent of the population never
heard the rumor. I could see what must be going on. Suppose on the
very first interaction X spreads the rumor to Y,
and then in the second round the random selection happens to settle
on X and Y again. The rumor dies in infancy,
having reached only two people. Could it be that excluding these
outliers would bring the average value of θ down to 0.203? I
gave it a try; the answer was no.
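The outlier experiment is easy to reproduce in outline. The sketch below carries its own compact, count-based simulator so that it runs on its own. It then compares the mean of θ over all runs with the mean over only those runs that fall below a cutoff. The encounter rules in the comments are assumptions, and the cutoff of 0.9 and the function names are arbitrary illustrative choices, not values from the original experiment.

```python
import random
from statistics import mean

def one_run(population, rng):
    """Return theta for a single run, tracking category counts only.

    Assumed rules: X is a spreader, Y is one of the other N - 1 people.
      Y ignorant -> Y becomes a spreader
      Y spreader -> X and Y both become stiflers
      Y stifler  -> X becomes a stifler
    """
    ignorants, spreaders = population - 1, 1
    while spreaders > 0:
        pick = rng.randrange(population - 1)
        if pick < ignorants:
            ignorants -= 1
            spreaders += 1
        elif pick < ignorants + spreaders - 1:
            spreaders -= 2
        else:
            spreaders -= 1
    return ignorants / population

def compare_means(thetas, cutoff=0.9):
    """Return (mean of all runs, mean of runs below cutoff, runs dropped)."""
    kept = [t for t in thetas if t < cutoff]
    kept_mean = mean(kept) if kept else float("nan")
    return mean(thetas), kept_mean, len(thetas) - len(kept)

if __name__ == "__main__":
    rng = random.Random(42)              # arbitrary seed, for repeatability
    thetas = [one_run(100, rng) for _ in range(5000)]
    overall, trimmed, dropped = compare_means(thetas)
    print(f"all runs: {overall:.3f}  without outliers: {trimmed:.3f}  "
          f"dropped: {dropped}")
```

Dropping runs whose θ lies above the cutoff can only lower the average. But such runs are rare, so the shift is small, which is consistent with the negative answer reported above.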