The Experimental Analysis of Behavior

The 1957 American Scientist article, reproduced in full

B. F. Skinner

Operant Conditioning

Among the conditions which alter rate of responding are some of the consequences of behavior. Operant behavior usually affects the environment and generates stimuli which “feed back” to the organism. Some feedback may have the effects identified by the layman as reward and punishment. Any consequence of behavior which is rewarding or, more technically, reinforcing, increases the probability of further responding. Unfortunately, a consequence which is punishing has a much more complex result [3]. Pecking the key in our experimental space has certain natural consequences. It stimulates the bird tactually and auditorily, and such stimulation may be slightly reinforcing. We study the effect more expediently, however, by arranging an arbitrary consequence which is clearly so. For example, food is reinforcing to a hungry pigeon (for our present purposes we need not inquire why this is so), and we therefore arrange to present food with a special magazine. When a solenoid is energized, a tray containing a mixture of grains is brought into position in the square opening below the key in Figure 1, where the pigeon has access to the grain for, say, four seconds.

We can demonstrate the effect of operant reinforcement simply by connecting the key which the pigeon pecks to the solenoid which operates the food tray. A single presentation of food, following immediately upon a response, increases the rate with which responses to the key are subsequently emitted so long as the pigeon remains hungry. By reinforcing several responses, we may create a high probability of responding. If the magazine is now disconnected, the rate declines to, and may even go below, its original level. These changes are the processes of operant conditioning and extinction, respectively. More interesting phenomena are generated when responses are merely intermittently reinforced. It is characteristic of everyday life that few of the things we do always “pay off.” The dynamic characteristics of our behavior depend upon the actual schedules of reinforcement.

The effects of intermittent reinforcement have been extensively studied in the laboratory [3] [4]. A common sort of intermittency is based on time. Reinforced responses can be spaced, say, ten minutes apart. When one reinforcement is received, a timer is started which opens the reinforcing circuit for ten minutes; the first response after the circuit is closed is reinforced. When an organism is exposed to this schedule of reinforcement for many hours, it develops a characteristic performance which is related in a rather complex way to the schedule. A short sample of such a performance is shown in Figure 2, obtained with a cumulative recorder. The scales and a few representative speeds are shown in the lower right-hand corner. The experimental session begins at a. The first reinforcement will not occur until ten minutes later, and the bird begins at a very low rate of responding. As the 10-minute interval passes, the rate increases, accelerating fairly smoothly to a terminal rate at reinforcement at b. The rate then drops to zero. Except for a slight abortive start at c, it again accelerates to a high terminal value by the end of the second 10-minute interval. A third fairly smooth acceleration is shown at d. (At e the pen instantly resets to the starting position on the paper.) The over-all pattern of performance on a “fixed interval” schedule is a fairly smoothly accelerating scallop in each interval, the acceleration being more rapid the longer the initial pause. Local effects due to separate reinforcements are evident, however, which cannot be discussed here for lack of space [4]. If the intervals between reinforcements are not fixed, the performance shown in Figure 2 cannot develop.
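The timer logic of the fixed-interval schedule described above can be sketched in a few lines of Python. This is only an illustration of the contingency, not a model of the pigeon or of the original relay apparatus. The function name, the use of seconds as the unit, and the choice to start the first interval at the beginning of the session are assumptions made for the example.

```python
def fixed_interval(response_times, interval):
    """Return the times of the responses reinforced on a fixed-interval schedule.

    response_times: times of responses in seconds since the session began.
    interval: required time between reinforcements, in seconds.
    """
    reinforced = []
    timer_start = 0.0  # assumption: the first interval is timed from session start
    for t in sorted(response_times):
        # The circuit is closed once the interval has elapsed; the first
        # response after that is reinforced and restarts the timer.
        if t - timer_start >= interval:
            reinforced.append(t)
            timer_start = t
    return reinforced


# Hypothetical response times (seconds) on a 10-minute schedule.
pecks = [30, 200, 450, 590, 605, 610, 900, 1190, 1215]
print(fixed_interval(pecks, interval=600))  # [605, 1215]
```

Responses made before the interval has elapsed have no programmed consequence, which is why only two of the nine hypothetical responses are reinforced.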

If the length of interval is varied essentially at random, responding occurs at a single rate represented by a constant slope in the cumulative record. Two examples are shown in Figure 3. In the upper curve, a hungry pigeon is reinforced with grain on a variable-interval schedule, where the mean interval between reinforcements is 3 minutes. Reinforcements occur where marked by pips. In the lower curve a hungry chimpanzee, operating a toggle switch, is reinforced on the same schedule with laboratory food. The over-all rate under variable-interval reinforcement is a function of the mean interval, of the level of food-deprivation, and of many other variables. It tends to increase slowly under prolonged exposure to any one set of conditions. The constant rate itself eventually becomes an important condition of the experiment and resists any change to other values. For this reason the straight lines of Figure 3 are not as suitable for baselines as might be supposed.
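A variable-interval contingency can be sketched in the same way by letting the required interval change after each reinforcement. The text does not say how the intervals were drawn, so the exponential distribution used below is an assumption, as are the function names and the steady one-response-per-second input.

```python
import random


def variable_interval(response_times, intervals):
    """Return the times of responses reinforced on a variable-interval schedule.

    intervals: an iterable giving the successive required intervals in seconds.
    """
    intervals = iter(intervals)
    required = next(intervals)
    timer_start = 0.0  # assumption: timing begins at session start
    reinforced = []
    for t in sorted(response_times):
        if t - timer_start >= required:
            reinforced.append(t)
            timer_start = t
            required = next(intervals, None)
            if required is None:  # no further intervals were scheduled
                break
    return reinforced


def random_intervals(mean, rng):
    """Yield intervals with the given mean (exponential draw is an assumption)."""
    while True:
        yield rng.expovariate(1.0 / mean)


# A fixed list of intervals makes the rule easy to follow by hand.
print(variable_interval(range(10, 601, 10), [60, 180, 300]))  # [60, 240, 540]

# Three hours of steady responding on a schedule with a 3-minute mean interval.
rng = random.Random(0)
earned = variable_interval(range(1, 10801), random_intervals(180, rng))
print(len(earned), "reinforcements in three hours")
```

With a mean interval of 3 minutes, roughly 60 reinforcements would be expected in three hours of steady responding; the exact count depends on the random draws.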

Reinforcements may be scheduled with a counter instead of a timer. For example, we may maintain a fixed ratio between responses and reinforcements. In industry this schedule is referred to as piecework or piece-rate pay. Anyone who has seen workers paid on such a schedule is familiar with some features of the performance generated: a high rate is sustained for long periods of time. For this reason, the schedule is attractive to employers, but it is generally recognized that the level of activity generated is potentially dangerous and justified only in seasonal or other periodic employment.

Performances of a pigeon under fixed-ratio reinforcement are shown in Figure 4. In the left-hand record reinforcements occur every 210 responses (at a, b, c, and elsewhere). The over-all rate is high. Most of the pauses occur immediately after reinforcement. At the right is the performance generated when the pigeon pecks the key 900 times for each reinforcement. This unusually high ratio was reached in some experiments in the Harvard Psychological Laboratories by W. H. Morse and R. J. Herrnstein. A short pause after reinforcement is the rule.
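Scheduling with a counter rather than a timer is simple to state in code. The class below is a minimal sketch of the fixed-ratio contingency; the class and method names are invented for the example and are not taken from the original apparatus.

```python
class FixedRatio:
    """Reinforce every `ratio`-th response, counting from the last reinforcement."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0  # the counter restarts after each reinforcement
            return True
        return False


schedule = FixedRatio(210)
reinforced = [n for n in range(1, 901) if schedule.respond()]
print(reinforced)  # [210, 420, 630, 840]
```

Unlike the interval schedules, nothing here depends on time: the faster the responses are emitted, the sooner reinforcement arrives.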

A variable-ratio schedule programmed by a counter corresponds to the variable-interval schedule programmed by a timer. Reinforcement is contingent on a given average number of responses but the numbers are allowed to vary roughly at random. We are all familiar with this schedule because it is the heart of all gambling devices and systems. The confirmed or pathological gambler exemplifies the result: a very high rate of activity is generated by a relatively slight net reinforcement. Where the “cost” of a response can be estimated (in terms, say, of the food required to supply the energy needed, or of the money required to play the gambling device), it may be demonstrated that organisms will operate at a net loss.
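The variable-ratio counterpart replaces the fixed count with a requirement that changes after each reinforcement. The sketch below assumes the requirements are drawn uniformly around the mean, which the text does not specify; the function names are likewise hypothetical.

```python
import random


def variable_ratio(n_responses, requirements):
    """Return the ordinal numbers of the responses reinforced on a variable ratio.

    requirements: an iterable giving the successive numbers of responses required.
    """
    requirements = iter(requirements)
    required = next(requirements)
    count = 0
    reinforced = []
    for i in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reinforced.append(i)
            count = 0
            required = next(requirements, None)
            if required is None:  # no further requirements were scheduled
                break
    return reinforced


def random_requirements(mean, rng):
    """Yield requirements averaging `mean` (uniform draw is an assumption)."""
    while True:
        yield rng.randint(1, 2 * mean - 1)


# A fixed list of requirements, easy to check by hand.
print(variable_ratio(20, [3, 7, 5, 10]))  # [3, 10, 15]

# 10,000 responses on a schedule averaging one reinforcement per 50 responses.
rng = random.Random(0)
print(len(variable_ratio(10_000, random_requirements(50, rng))), "reinforcements")
```

On average about 200 reinforcements would be expected for 10,000 responses at a mean ratio of 50:1, but any single response may be the one that pays off.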

When the food magazine is disconnected after intermittent reinforcement, many more responses continue to occur than after continuous reinforcement. After certain schedules, the rate may decline in a smoothly accelerated extinction curve. After other schedules, when the rate itself enters prominently into the experimental conditions, it may oscillate widely. The potential responding built up by reinforcement may last a long time. The writer has obtained extinction curves six years after prolonged reinforcement on a variable-ratio schedule [5]. Ratio schedules characteristically produce large numbers of responses in extinction. After prolonged exposure to a ratio of 900:1 (Figure 4) the bird was put in the apparatus with the magazine disconnected. During the first 4-1/2 hours, it emitted 73,000 responses.
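The extinction figure just quoted can be converted to an average rate. The short calculation below uses only the two numbers given in the text; it says nothing about how the rate varied within the 4-1/2 hours.

```python
responses = 73_000
hours = 4.5

per_second = responses / (hours * 3600)
per_minute = responses / (hours * 60)
print(f"{per_second:.1f} responses per second, {per_minute:.0f} per minute")
# 4.5 responses per second, 270 per minute
```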

Interval and ratio schedules have different effects for several reasons. When a reinforcement is scheduled by a timer, the probability of reinforcement increases during any pause, and first responses after pauses are especially likely to be reinforced. On ratio schedules responses which are part of short runs are likely to be reinforced. Moreover, when a given schedule of reinforcement has had a first effect, the performance which develops becomes itself an important part of the experimental situation. This performance, in combination with the schedule, arranges certain probable conditions at the moment of reinforcement. Sometimes a schedule produces a performance which maintains just those conditions which perpetuate the performance. Some schedules generate a progressive change. Under still other schedules the combination of schedule and performance yields conditions at reinforcement which generate a different performance, which in turn produces conditions at reinforcement which restore the earlier performance.

Charles B. Ferster and the writer have checked this explanation of the effect of schedules by controlling conditions more precisely at the moment of reinforcement [4]. For example, we guaranteed that all reinforced responses would be preceded by pauses instead of making this condition merely probable under an interval schedule. In a variable interval performance, such as that shown in Figure 3, it is not difficult to find responses which are preceded by, say, 3-second pauses. We can arrange that only such responses will be reinforced without greatly disturbing our schedule. When this is done, the slope of the record immediately drops. On the other hand, we may choose to reinforce responses which occur during short rapid bursts of responding, and we then note an immediate increase in rate.
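The added requirement described above, that a reinforced response must also follow a pause, can be sketched as an interval schedule with one extra condition. This is an illustration under stated assumptions, not a reconstruction of the actual programming circuit: the function name, the handling of the very first response, and the sample times are invented for the example.

```python
def interval_with_pause(response_times, intervals, min_pause):
    """Interval schedule that reinforces only responses preceded by a pause.

    A response is reinforced when the current interval has elapsed AND at
    least `min_pause` seconds have passed since the previous response.
    """
    intervals = iter(intervals)
    required = next(intervals)
    timer_start = 0.0
    previous = None  # assumption: the first response has no measurable pause
    reinforced = []
    for t in sorted(response_times):
        paused = previous is not None and t - previous >= min_pause
        if t - timer_start >= required and paused:
            reinforced.append(t)
            timer_start = t
            required = next(intervals, None)
            if required is None:
                break
        previous = t
    return reinforced


# The responses at 60 s and 61 s come after the interval has elapsed but
# follow the preceding response too closely, so they go unreinforced.
times = [58, 59, 60, 61, 65, 100, 126, 127, 130]
print(interval_with_pause(times, intervals=[60, 60], min_pause=3))  # [65, 126]
```

Reinforcing responses within rapid bursts would invert the test on the pause, requiring it to be shorter than some limit instead of longer.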

If we insist upon a very long pause, we may be able to reinforce every response satisfying these conditions and still maintain a very low rate. The differential reinforcement of low rates was first studied by Douglas Anger in the Harvard Laboratories. Wilson and Keller at Columbia have reported an independent investigation [6]. Recently W. H. Morse and the writer have studied the effect of relatively long enforced pauses. Figure 5 shows the performance obtained in one such experiment. Any response which followed a pause at least 3 minutes in duration was reinforced. Whenever a response was made before 3 minutes had elapsed, the timer was reset and another 3-minute pause required. Under these conditions a very low stable rate of responding obtains. The figure shows a continuous performance (cut into segments for easier reproduction) in a single experimental session of 143 hours, during which time the pigeon received approximately 250 reinforcements. At no time did it pause for more than 15 minutes, and it seldom paused for more than 5 minutes.
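The rule used in this experiment, in which every premature response resets the timer, can be written out as follows. The sketch assumes that the first pause is timed from the start of the session and that a reinforced response also restarts the timer; the function name and sample times are hypothetical.

```python
def low_rate_schedule(response_times, min_pause):
    """Reinforce any response that follows a pause of at least `min_pause` seconds.

    Every response, reinforced or not, resets the timer.
    """
    reinforced = []
    last_response = 0.0  # assumption: the first pause is timed from session start
    for t in sorted(response_times):
        if t - last_response >= min_pause:
            reinforced.append(t)
        last_response = t  # a premature response costs the full pause again
    return reinforced


# Hypothetical response times in seconds, with a required pause of 3 minutes.
print(low_rate_schedule([100, 290, 470, 500, 700], min_pause=180))  # [290, 470, 700]
```

The response at 100 seconds comes too early and goes unreinforced, and it also postpones the earliest possible reinforcement to 280 seconds.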

The situation under this schedule is inherently unstable. Rate of responding increases with the severity of food deprivation and decreases as the bird becomes satiated. Let us assume that at some time during the experiment, say, at a in Figure 5, reinforcements are occurring too infrequently to maintain the bird’s body weight. The bird is operating, so to speak, at a loss. The increasing deprivation then increases the rate of responding and makes it even less likely that the pigeon will wait 3 minutes in order to respond successfully for reinforcement. Nothing but starvation lies ahead in that direction. If, on the other hand, the bird is receiving slightly more reinforcements than necessary to maintain body weight, the level of deprivation will be decreased. This will produce a lower rate of responding, which in turn means that the 3-minute pause is more frequently satisfied and reinforcements still more frequently received. In such a case the result is a fully satiated bird, and the experiment must be brought to a close. This actually happened at b in Figure 5 where reinforcements had become so frequent that the bird was rapidly gaining weight. This inherent instability can be corrected by changing the required pause in terms of the organism’s performance. If the over-all rate of reinforcement begins to drift in either direction, the required pause may be appropriately changed. Thus the experiment in Figure 5 could have been continued if at point c, say, the required interval had been increased to 4 minutes. By an appropriate adjustment of the interval, we have been able to keep a pigeon responding continuously for 1500 hours—that is, 24 hours a day, 7 days a week, for approximately 2 months. Pigeon breeders have said that pigeons never sleep (roosting is merely a precautionary device against blind flying), and the statement seems to be confirmed by experiments of the present sort.
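The correction described above amounts to a simple feedback rule: lengthen the required pause when reinforcements become too frequent and shorten it when they become too rare. The sketch below shows one way such a rule might look. The target band, the one-minute step, and the lower limit are hypothetical values chosen for illustration, not settings reported for the experiment.

```python
def adjust_pause(required_pause, reinforcements_per_hour,
                 target_low, target_high, step=60.0, min_pause=60.0):
    """Return a new required pause (seconds) given the recent reinforcement rate.

    All thresholds and the step size are illustrative assumptions.
    """
    if reinforcements_per_hour > target_high:
        # The bird is drifting toward satiation: demand a longer pause.
        return required_pause + step
    if reinforcements_per_hour < target_low:
        # The bird is operating at a loss: make reinforcement easier to obtain.
        return max(min_pause, required_pause - step)
    return required_pause


# Reinforcements have become too frequent: raise the pause from 3 to 4 minutes.
print(adjust_pause(180.0, reinforcements_per_hour=12, target_low=1, target_high=3))  # 240.0
```

Applied periodically during a session, a rule of this kind keeps the rate of reinforcement within the chosen band, so that the level of deprivation stays roughly constant.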

By differentially reinforcing high rates of responding, pigeons have been made to respond as rapidly as 10 or 15 responses per second. Here technical problems become crucial. It is not difficult to construct a key which will follow rapid responding, but the topography of the behavior itself changes. The excursions of head and beak become very small, and it is doubtful whether any single “response” can be properly compared with a response at a lower rate.

In their study of different kinds of schedules of reinforcement, Ferster and the writer found that it was possible to set up several performances in a single pigeon by bringing each one of them under stimulus control. Several different colored lights were projected on the translucent key and responses were reinforced on several corresponding schedules. Figure 6 shows a typical performance under such a multiple schedule of reinforcement. When the key was red, the pigeon was reinforced on a 6-minute fixed-interval schedule. The usual interval scallops are seen, as at a and b. When the key was green, the pigeon was reinforced upon completing 60 responses (a fixed ratio of 60:1). The usual ratio high rate is shown as at c and d. When the key was yellow, reinforcements followed a variable-interval schedule where a pause of 6 seconds was required. The resulting low steady performance is shown at e, f, and elsewhere. In one experiment we were able to show nine different performances under the control of nine different patterns on the key.
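A multiple schedule can be thought of as a lookup from the stimulus on the key to the contingency currently in force. The sketch below covers only the red and green components and leaves out the yellow one for brevity. It assumes that the timer and counter restart whenever the key color changes, which the text does not state; the class and method names are invented for the example.

```python
class MultipleSchedule:
    """Two-component multiple schedule: red = FI 6 min, green = FR 60."""

    FI_SECONDS = 6 * 60
    FR_COUNT = 60

    def __init__(self):
        self.color = None
        self.timer_start = 0.0
        self.count = 0

    def set_color(self, t, color):
        """Change the key color at time t (seconds)."""
        if color not in ("red", "green"):
            raise ValueError(f"no schedule defined for {color!r}")
        self.color = color
        # Assumption: each component starts afresh when its stimulus appears.
        self.timer_start = t
        self.count = 0

    def respond(self, t):
        """Register a response at time t; return True if it is reinforced."""
        if self.color == "red":
            if t - self.timer_start >= self.FI_SECONDS:
                self.timer_start = t
                return True
            return False
        if self.color == "green":
            self.count += 1
            if self.count >= self.FR_COUNT:
                self.count = 0
                return True
            return False
        raise RuntimeError("no stimulus is present on the key")


key = MultipleSchedule()
key.set_color(0, "red")
print(key.respond(100), key.respond(360))  # False True
key.set_color(500, "green")
print(sum(key.respond(500 + i) for i in range(1, 61)))  # 1
```

The same response, a peck on the same key, thus has different programmed consequences depending on the stimulus present when it is made.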

The experiment may be complicated still further by introducing more than one key and by reinforcing on two or more schedules concurrently. An example of the resulting performances is shown in Figure 7, from some research by Ferster, at the Yerkes Laboratories for Primate Biology at Orange Park, Florida. In Ferster’s experiment, a chimpanzee operates two toggle switches, one with each hand. Responses with the right hand are reinforced on a fixed ratio of approximately 210:1, and the performance recorded from the right toggle switch is shown in the upper part of Figure 7. As usual in many ratio performances, pauses occur after reinforcements. Responses with the left hand are at the same time being reinforced on a variable-interval schedule with a mean interval of 5 minutes, and the performance is shown in the lower part of the figure. There is some interaction between the performances, for reinforcements in the variable-interval record usually correspond to slight pauses in the ratio performance. In general, however, the experiment shows a remarkable independence of two response systems in a single organism.
