chapter six

 

feedback

 

When turning a corner in a car, a driver will adjust how far he turns the wheel based on visual information about the position of the car relative to the road and the effect of the last turn of the wheel. When talking to a group, a good lecturer or teacher will adjust the pace, content, and style of his presentation to fit the group, according to cues, such as facial expressions, that he picks up from the group. Both the driver and the lecturer are altering their behavior based on feedback: information fed back into the system about the effects of an output of the system. Feedback provides people with information about the effects of their behavior on the environment.

A thermostat is a mechanism that utilizes feedback. When the temperature goes below some value, the thermostat turns on the furnace. The furnace then stays on until the thermostat receives feedback that the temperature has reached the desired point. Without such a feedback device you would have to manually turn the furnace on and off. In 1948 Wiener argued that the logic of feedback theory, as it had been developed with machines such as the thermostat, could be applied to other areas, such as biology, neurophysiology, and psychology. Wiener used the term cybernetics for the application of such an approach to machines and animals. Feedback is now a key psychological concept from the level of simple muscle control to complex group interactions.

 

One important type of feedback, called proprioception, comes from the muscles. Simple walking involves a complex set of muscle movements that require feedback about where different muscles are and what they are currently doing. With your eyes shut you should be able to touch your index fingers together, regardless of where your hands start from. This requires proprioceptive feedback. The disease tabes dorsalis blocks proprioceptive feedback from the limbs. A person with this disease may have poor control over voluntary movement of the limbs and might not be able to do the above finger touching with his eyes closed.

 

Human speech utilizes a large number of different feedback mechanisms. During speech there is constant feedback about the spatial position, direction of movement, and velocity of movement of the various structures involved in speech, particularly the tongue (Sussman, 1972). As Sussman points out, “any attempt to explain how the tongue signals the higher brain centers concerning its highly complex positional adjustments during speech activity must necessarily incorporate a rapidly acting, highly discriminative, and comprehensively informative neurosystem.” The complexity of such a feedback system is mentioned by Sussman in the case of a man speaking with a pipe clenched between his teeth. Here the entire muscular movement patterns of the tongue and lips must compensate for non-moving jaws.

 

Hearing your own voice is another source of feedback for normal speech. One way of demonstrating this is with delayed auditory feedback (Yates, 1963). Here the subject hears his voice while talking, but instead of hearing it immediately, he hears it electronically delayed by a fraction of a second. For example, while counting from one to ten you might hear “two” while saying “three.” Talking during delayed auditory feedback is quite difficult. While counting you might repeat the same number several times, and speech is generally slower and contains many more errors. A similar phenomenon occurs with people giving speeches in a large hall. Sometimes a speaker in such a situation will hear his own speech slightly delayed, as from his amplified voice bouncing back to him from a far wall, and this effect may severely impair his speech. Rock musicians will often have small speakers on the stage with them for immediate feedback of their sound. Otherwise their timing might be disrupted from hearing their music delayed as it rebounds off walls. On the other hand, deaf people often show peculiar inflections and intonations in their speech because they lack auditory feedback.

 

Smith (1972) has shown similar disruption in performance with delayed visual feedback. Subjects had to track a moving geometrical figure with a wand. However, they could not directly see their hands or the objects. Rather they saw what they were doing on a television monitor which could delay the visual feedback. As the delay increased (from 17 to 820 msec.), performance decreased. With the intermediate delay times, about 250 msec., the subjects often reported that their arm and hand movements had a peculiarly “rubbery” appearance and feeling.

 

Adams (1968) has summarized some of the ways in which feedback is incorporated with learning as follows. S-R learning theorists conceptualize the learning process as associations formed between stimuli (S) and responses (R), whereas S-S theorists view learning in terms of associations between stimuli, such as stimulus relationships in the environment (see discussion of S-R and S-S theories in Chapter 1). For the S-R theorist, feedback, such as proprioception, is a source of stimuli. New responses can be learned to these feedback stimuli and/or the stimuli may become conditioned reinforcers. This feedback provides the basis for chaining, a sequence of responses in which the occurrence of one response provides part of the cues for the following response. One rat named S. R. Rodent was taught the following chain of behaviors. Rodent first had to climb a spiral staircase, then run across a drawbridge, climb a ladder, get into a cable car and pull himself across a gap, climb another stairway, play a toy piano, run through a tunnel, climb into an elevator and pull a chain to start it, ride to the bottom floor, and then press a bar to receive pellets (Bachrach, 1964). This chain of behaviors was conditioned into the rat by starting at the end (pressing the bar) and moving backward. In the final chain each component of the chain occurs to two sets of stimuli, the stimuli of the apparatus and feedback stimuli from the previous behavior. Thus Rodent’s behavior of climbing the ladder provides feedback stimuli which help lead to the next response of getting into the cable car. The feedback stimuli might also become conditioned reinforcers that then reinforce that member of the chain. Similarly in learning to say a long poem from memory, the feedback from saying one line may facilitate remembering and saying the next line.

 

For S-S theorists, according to Adams, feedback provides information about the proper conditions for the behavior. Feedback stimuli feed into the whole stimulus complex to which the animal makes some responses. These approaches are often cognitive in nature, and one may interpret learning as being primarily perceptual. Feedback cues are often thought of as information about which behaviors are appropriate, rather than stimuli that elicit responses.

 

Adams favors a closed-loop theory of behavior in which “the consequences of a response with sufficient habit strength to occur are fed back and compared with a reference which is the desired value for the system. Any difference between a reference and its response feedback is error, and the detection of errors results in a response sequence that can lead to error nulling.” In other words the animal has a certain goal to attain, and makes responses in that direction. Feedback following the responses provides information about whether the goal has been reached or what responses might now be appropriate toward reaching the goal.

 

Miller, Galanter, and Pribram (1960) proposed a closed-loop analysis of behavior based on TOTE units, as opposed to S-R units. TOTE stands for Test-Operate-Test-Exit. The “test” is an analysis of information, mostly sensory data, about any incongruities between the current state of affairs and some goal. If there is some incongruity, the animal responds, or “operates.” After it operates, the animal again tests. If there is still an incongruity, it operates again, and then tests again. When the test finally shows no incongruity, the animal exits and stops this one type of behavior.

 

 

Figure 6—1 shows a basic TOTE unit related to hammering a nail until it is flush with the surface. During the test the person inspects whether the nail is flush. If the nail is not flush, then he hammers (operate). If the nail is flush, he no longer hammers this nail (exit).
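
The hammering example can also be written as a short loop. The sketch below is one possible rendering of a TOTE unit, not Miller, Galanter, and Pribram's own notation; the `Nail` class, the nail height, and the distance driven per blow are hypothetical values chosen for illustration.

```python
def tote(goal_met, operate, max_operations=1000):
    """Test-Operate-Test-Exit: operate until the test shows no incongruity."""
    operations = 0
    while not goal_met():                 # Test: is there still an incongruity?
        if operations >= max_operations:  # safety stop, an added assumption
            raise RuntimeError("goal not reached")
        operate()                         # Operate, then loop back to Test
        operations += 1
    return operations                     # Exit


class Nail:
    """A nail sticking up some distance (in millimeters) above the surface."""

    def __init__(self, height_mm):
        self.height_mm = height_mm

    def is_flush(self):
        return self.height_mm <= 0

    def hammer(self, drive_mm=4):
        # Each blow drives the nail down, but never below the surface.
        self.height_mm = max(0, self.height_mm - drive_mm)


nail = Nail(height_mm=10)
blows = tote(goal_met=nail.is_flush, operate=nail.hammer)
print(blows)  # number of hammer blows before the test passed
```

Because `tote` takes the test and the operation as arguments, the same loop could serve as one component of a larger plan in which several such units run in sequence.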

 

A TOTE unit is an example of a closed-loop mechanism since the organism tests the outcome of its behavior against a reference until there is no error. Complex human behavior, of course, can seldom be explained in terms of a simple TOTE unit. Rather, Miller, Galanter, and Pribram showed how behavior might be explained in terms of a number of interrelated TOTE units. They called such a set of TOTE units a plan. A plan may involve several TOTE units functioning simultaneously, as well as TOTE units that work sequentially.

 

Another type of feedback theory is ideo-motor theory, originally proposed by William James in 1890 and propounded more recently by Greenwald (1970). According to ideo-motor theory, a response is selected on the basis of its own anticipated sensory feedback. In this theory a perceptual image or idea of an action initiates the performance. William James argued that the mere thought of a movement “awakens in some degree” the actual movement. (Think about swinging a golf club or riding a bicycle and note tendencies in the related muscles toward movement.) The only thing that stops the actual movement is inhibitory influences from other sources, such as other thoughts. Thus the simple thought of an action results in anticipation of its own sensory feedback which in turn helps to determine which behavior will finally occur.

 

SENSITIVITY TRAINING GROUPS

 

Feedback is the key ingredient in sensitivity training groups, also called T-groups (see Aronson, 1972, Chap. 8). In a T-group a number of people get together with a trainer to learn more about how their behavior affects other people. The discussions of the group center on a current analysis of the social dynamics of the group itself. Each member learns how to provide feedback to other members of the group about how he feels and is affected by their specific behaviors. Through such feedback each member can find out what other people really think and feel about different behaviors. Feedback of this nature is useful to some people who do not ordinarily receive it, either because of selective perception on their part or because people are not giving them this feedback. It can be a useful source of information which may or may not provide cues and consequences that will affect later performance.

 

For feedback in a T-group to be most useful it should usually meet the following requirements. It should describe the speaker’s feelings and reactions, not simply make evaluations. It should provide specific examples, rather than generalities. It should not just be dumped on the person, but should be presented to him in a time and a way that is most useful to the receiver.

 

T-groups, of course, include much more than feedback, but feedback is probably the major objective. The research on T-groups is currently grossly inadequate. For example, it would be interesting to analyze T-groups in terms of modeling and reinforcement.

 

The types of skills and behaviors a person learns in interacting with members of a T-group may or may not be useful behaviors in dealing with people outside of the T-group, particularly if there is not a good transition between the T-group and the rest of the world. Too many T-groups merely provide the person with another reference group manipulating his behaviors, while denying any such manipulation. A second problem is that feedback is not always a very powerful change mechanism. The feedback may convince the person that he wishes to act differently (more assertively, for example), but he might not have the desired skills in his behavioral repertoire. Other change procedures (e.g., assertive training) then might be useful.

 

OPERANT CONDITIONING

 

Occasionally the consequence of some response or behavior will make it more or less probable that the response will occur again in similar situations. If a hungry rat presses a bar and receives food, the consequence of the food will usually make it more probable that the rat will press the bar again. If, however, the bar-press yields electric shock to the rat’s feet, the consequence of the shock will make it less probable that the rat will bar-press again.

 

Operant conditioning, also called instrumental conditioning and type R learning, is the study of the effects of contingent (contiguous) events on the behaviors they follow. An event is contingent on a behavior when it occurs if and only if the specified behavior occurred first. In the case of the rat bar-pressing for food, the food appears if and only if the rat presses the bar. If the contingent event makes it more probable that the response will be repeated, then the event is called a reinforcement. If the contingent event makes the behavior less probable, the event is called a punishment. It should be noted that the concept of “contingency” has been defined in various ways by operant conditioners. Some theorists (e.g., Schoenfeld & Farmer, 1970) argue that “contingency” should imply more than simple contiguity, perhaps some form of causal relationship between the distribution of responses and the distribution of contingent events.

 

Figure 6—2 shows the temporal sequence in operant conditioning. In the presence of certain antecedent stimuli the animal makes some response. Contingent on this response is some event. Following this event there may be a change in the probability of the response re-occurring in the presence of the antecedent stimuli. Such contingent events then are clearly a form of feedback, for they inform the animal about the consequences of his behavior.

 

 

The contingent event may increase or come on (positive) following the response, or it may decrease or go off (negative). Also, the contingent event may increase the probability of the response (reinforcement) or decrease the probability (punishment). This yields the following four combinations: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.

 

Positive reinforcement is an event whose increase results in an increase in the probability of the response it is contingent on. The rat increases his probability of bar-pressing because each bar-press increases the amount of food present. A child cries at bedtime if this ensures that his parents will read him a story.

 

Negative reinforcement is an event whose decrease results in an increase in the probability of the response it is contingent on. A rat will increase his probability of bar-pressing if each bar-press decreases the electric shock in the grid floor. A person begins taking a different route home because he learns it decreases the traffic he encounters. Note that a very common error is for people to confuse negative reinforcement with punishment. Remember that negative reinforcement increases response probability, while punishment decreases it.

 

Positive punishment is an event whose increase results in a decrease in the probability of the response it is contingent on. The rat decreases his probability of bar-pressing if each bar-press increases the amount of electric shock in the grid floor. A student stops answering questions in class if his answers are met with derision.

 

Negative punishment is an event whose decrease results in a decrease in the probability of the response it is contingent on. A rat might decrease his probability of bar-pressing if each bar press decreases the supply of something it likes, such as the amount of available food. A child stops yelling if each time he yells his television program is turned off for 10 seconds.
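
The four definitions above reduce to two independent questions: did the contingent event increase or decrease, and did the response become more or less probable? A minimal sketch of that two-by-two classification follows; the enum and function names are hypothetical labels introduced only for this example.

```python
from enum import Enum


class EventChange(Enum):
    INCREASE = "positive"   # the contingent event increases or comes on
    DECREASE = "negative"   # the contingent event decreases or goes off


class Effect(Enum):
    MORE_PROBABLE = "reinforcement"   # the response becomes more probable
    LESS_PROBABLE = "punishment"      # the response becomes less probable


def classify(event_change, effect):
    """Name the type of contingent event from the two observations."""
    return f"{event_change.value} {effect.value}"


# The rat examples from the text:
print(classify(EventChange.INCREASE, Effect.MORE_PROBABLE))  # food increases
print(classify(EventChange.DECREASE, Effect.MORE_PROBABLE))  # shock decreases
print(classify(EventChange.INCREASE, Effect.LESS_PROBABLE))  # shock increases
print(classify(EventChange.DECREASE, Effect.LESS_PROBABLE))  # food decreases
```

Note that "positive" and "negative" describe only the direction of change in the event, never whether the event is pleasant, which is why negative reinforcement and punishment fall in different cells.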

 

 

Figure 6—3 shows the relationships between these four types of contingent events. Note that the onset and offset of the same event may function differently depending on what behaviors they are contingent on. Thus if the onset of a pleasant event can be a positive reinforcement, the offset of the event can usually be a negative punishment. Similarly, if the onset of an aversive event can be a positive punishment, the offset can be a negative reinforcement. These relationships correspond to the diagonals of Figure 6—3. It should also be noted that increasing the probability of a response does not necessarily increase the rate or magnitude of the response when it occurs; it merely increases the probability of it occurring. For example, a person might receive positive reinforcement for talking more slowly. Here the positive reinforcement increases the probability of the behavior of speaking more slowly. Similarly, decreasing the probability of a response does not necessarily decrease its rate or magnitude when it occurs.

 

Operant extinction, like respondent extinction, results from terminating a contingency. In operant extinction we terminate the contingency between the response of the animal and the following event. For example, the rat that had learned to bar-press for food could be put on extinction by ensuring that its bar-presses no longer produced food. The rat’s behavior would be considered extinguished when it no longer pressed the bar at higher than baseline level, the rate at which it was pressing the bar before food was made contingent on bar-pressing. Or a child in a classroom might kneel in his chair to get the teacher’s attention, but stop doing it after the teacher no longer responded to this behavior. An extinguished operant response may show spontaneous recovery, an increase in the probability of the extinguished response following a period of time.

 

In respondent conditioning we speak of the CS as eliciting the CR. The CS forces the animal to make the CR. In operant conditioning the animal is said to emit the response in the presence of the antecedent stimuli. That is, these stimuli do not elicit the response. The rat does not immediately press the bar the instant he is put in the operant chamber. Rather the antecedent stimuli “set the occasion” for the operant response. Some people equate “elicited” with “involuntary” and “emitted” with “voluntary,” but this is not necessarily true. B. F. Skinner, dean of current operant conditioners, would argue that an operant response is as determined and involuntary as any respondent response. For any operant response there is a sequence of stimuli that causes the response to occur, as the CS causes the CR to occur. But the operant stimuli are not as easy to identify as the CS. Nor, for most practical purposes, is it important to be able to identify them. In operant conditioning we often have satisfactory prediction and control merely through manipulation of the contingent events.

 

Occasionally some of the antecedent stimuli, through learning, develop a particularly strong control over the operant behavior. These stimuli are then called discriminative stimuli, and the behavior they control is called a discriminative operant. For example, a rat might be trained that when the left light is on, bar-pressing yields food, whereas when the right light is on, a bar-press produces no food. If the left light is on for only a short time, we will probably find our hungry rat will learn to hurry to the bar when the left light is on and generally avoid the bar when the right light is on. Here we would say that the discriminative stimulus of the left light sets the occasion for the discriminative operant of pressing the bar. The discriminative stimulus is often abbreviated S^D, while the other stimuli, such as the right light, are abbreviated S^Δ (S-delta).

 

An operant conditioner might decide to reinforce a particular behavior that he wishes to occur more often. But what happens if the behavior never occurs in the first place? Or what if the behavior occurs, but only very infrequently? In these situations, it is desirable to find some way to get the behavior to occur so that it can be reinforced. There are many ways to do this, the three most popular of which are known by the terms shaping, modeling, and fading.

 

Shaping, also called successive approximation, consists of reinforcing behaviors that gradually approximate the desired behavior. If you want a rat to press a bar, you don’t wait until it presses the bar to reward it. Rather you shape it to press the bar by first reinforcing it just for being in that half of the apparatus where the bar is. Next it has to be within a certain area of the bar to be reinforced, then it has to touch the bar, then put its paw on the bar, and finally it has to press the bar. In practice, shaping is more fluid and less discrete than this description, but the approach is the same. It is not unusual for a skilled shaper to have a naive rat bar-pressing within 15 minutes of the time the rat is put in the test apparatus.

 

Similarly, if you were working with a chronic catatonic who has not talked in ten years, it would be a poor operant program to wait until he said a sentence to reinforce him. Rather you must gradually shape him to talk. Perhaps you would first reward him just for blowing a little air out of his mouth, and from there slowly move on to getting him to produce simple sounds.

 

As a further illustration, take the case of a secretary who is always 15 minutes late for work. If you decide to reward her with praise on the day when she does come in on time, you might have a long wait. Rather you should use shaping; i.e., reward her closer approximations to being on time.

 

Modeling, as mentioned in the first chapter, is often a quick way of first getting a response to occur for operant conditioning; this method often works much faster than shaping. A rat that watched another rat press a bar may imitate or model the other rat and have a tendency to press the bar itself. With humans it is often easy to demonstrate the behavior we wish and reward the modeling.

 

Fading is keeping the behavior the same while gradually changing the stimuli. Thus fading consists of approximations on the stimulus side, whereas shaping consists of approximations on the response side. As an example of fading, a pigeon might be trained to peck a disc to the stimulus of a blue square. If it is now shown a red circle, the pigeon might not peck the disc, as the red circle is too different from the blue square. But if we keep rewarding the disc-pecking as we gradually change, or fade, the blue square stimulus into a red circle stimulus, we can eventually have the pigeon pecking to the red circle without ever having lost the original disc-pecking.

 

Principles of fading become important when we wish to transfer learning from one situation (e.g., school room, clinic) into a new situation (e.g., home). Here it is often useful to provide some transitions between the different settings.

 

REINFORCEMENT DELAY AND SCHEDULE

 

As a general rule reinforcements are most effective when they occur immediately after the behavior (see Renner, 1964). The time between the response and the reinforcement is called the delay of reinforcement, and short delays usually produce better learning than long delays. As the delay of reinforcement increases, the animal often must find ways of mediating the time. A rat may learn some behavior, such as chewing on the food dish, which mediates the time. Humans often use language, both out loud and internalized as thoughts, as mediating behavior. Also, the presence of conditioned reinforcers during the delay period may help to support mediating behaviors. A problem with long delays of reinforcement is that they generally allow for many intervening events to become associated with either the response or the reinforcement, and such associations may interfere with the response-reinforcement learning. Experiments designed to minimize such interfering associations permit learning with longer delays of reinforcement. For example, rats in a simple two-choice learning task learned the correct response with reinforcement delays of up to 8 minutes if they were removed from the test apparatus during the delay period (Lett, 1973).

 

In practical settings we often find behaviors more affected by sources of immediate reinforcement than by events that are temporally distant. The alcoholic’s drinking behavior (which may involve physiological addiction) is often more affected by the immediate reinforcing effects of drinking (e.g., reduction in anxiety, social approval, good feelings) than by the longer term punishing effects of having been drunk (e.g., hangovers). Similarly, a graduate student who has only to complete his thesis may often find that the more immediate rewards associated with play affect his daily life more powerfully than the long range rewards associated with the completion of his thesis. The long range rewards may be substantially stronger than the immediate rewards, but this is often more than offset by the delay of reinforcement and the person’s experience of working under long delays.

 

Thus, many programs geared toward altering human behavior involve providing reinforcements with short delays and/or taking existing reinforcers with long delays and building in mediating behaviors and conditioned reinforcers. This cutting down on long delays will be seen later when we discuss contingency contracting.

 

We have been discussing reinforcement as if it occurred after every correct response, but this is not necessarily the case. For example, we could arrange to reinforce only every third response. The pattern by which reinforcements are related to responses is called the schedule of reinforcement (see Ferster & Skinner, 1957; Schoenfeld, 1970; Thompson & Grabowski, 1972). There are two general types of schedules: (1) continuous reinforcement (CRF), in which every correct response is reinforced; and (2) intermittent reinforcement, in which only some of the correct responses are reinforced. Generally, original learning is faster with continuous reinforcement, but the number of trials before extinction occurs is larger under intermittent reinforcement. This longer time to extinction with intermittent schedules is called the partial reinforcement effect. There are basically four types of intermittent schedules: fixed ratio, variable ratio, fixed interval, and variable interval.

 

A fixed ratio schedule means that the animal must make a fixed number of responses before being reinforced. Thus a rat on an FR-5 schedule must make 5 bar-presses before being reinforced. By gradually increasing the ratio, an animal can be trained to make an enormous number of responses for a single reinforcement. Fixed ratio schedules correspond to piecework pay. A laborer paid for every 3 items he produces is on an FR-3 schedule.

 

A variable ratio schedule is the same as the fixed ratio except that the number of responses required each time varies around some average. Thus a rat on a VR-9 schedule must press, on the average, about 9 times before being reinforced. However, one time he might press only twice and another time might require 12 presses. VR schedules often result in very long times to extinction. Consider a man playing roulette in Las Vegas and only betting one number each time. He is on a VR-38 schedule since 38 different numbers come up randomly. Thus on the average he wins once for each 38 bets (and is paid only 35 to 1 odds), but he might win twice in a row or go 200 times without a win. However, with no one influencing the wheel, the long term average will be about 1 in 38. A behavior maintained under such a schedule is, of course, difficult to extinguish, and this is one of the variables feeding into gambling fever.
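
The roulette figures can be checked with a little arithmetic. The sketch below assumes, as the example does, a wheel with 38 equally likely numbers and a payout of 35 to 1 on a single number; the variable names are introduced only for this illustration.

```python
from fractions import Fraction

POCKETS = 38   # equally likely numbers on the wheel (from the example)
PAYOUT = 35    # units won per unit bet on a single number (from the example)

p_win = Fraction(1, POCKETS)

# Average net result of a one-unit bet: win 35 with probability 1/38,
# lose the one-unit stake with probability 37/38.
expected_net = p_win * PAYOUT - (1 - p_win) * 1


def p_no_win(bets):
    """Probability of losing every one of `bets` independent spins."""
    return (1 - p_win) ** bets


print(expected_net)              # average gain or loss per unit bet
print(float(p_win ** 2))         # chance of winning twice in a row
print(float(p_no_win(200)))      # chance of 200 bets without a win
```

The expected net result is negative because the payout (35 to 1) is smaller than the true odds against winning (37 to 1), so the reinforcement that maintains the betting is both unpredictable and, on the average, costly.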

 

A fixed interval schedule is one where the animal is reinforced for the first correct response it makes after some period of time has passed. Thus a rat on an FI-1-minute schedule will be reinforced for the first response he makes after one minute has passed. Responses made before this time is up will have no effect.

 

A variable interval schedule is the same as the fixed interval except that the amount of time varies from trial to trial around some average. A rat on a VI-1-minute schedule is reinforced for the first response he makes after some period of time. This period may be different each time, but will average out to about one minute. VI schedules often produce some of the most stable responding, since the animal can’t “figure out” when to respond and when not to respond. VI schedules often produce very long times to extinction. Thus people who want to build in a strong behavior often start their animal or human subject on CRF and gradually phase them onto a VI schedule.

 

These four intermittent schedules can be combined in various ways, such as requiring the animal to respond first on VR-5, then FI-2 (minutes), then VR-5, and so forth. There are also many other schedules, too numerous to be fully discussed here.
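
The four basic intermittent schedules can be sketched as simple rules that decide, for each response, whether it is reinforced. The code below is an illustration only. Its function names are hypothetical, it assumes the variable requirements are drawn from a uniform distribution around the stated average, and it assumes each interval is timed from the previous reinforcement; none of these details comes from the text.

```python
import random


def fixed_ratio(n):
    """FR-n: reinforce every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean, rng):
    """VR-mean: required responses vary from 1 to 2*mean - 1 (average = mean)."""
    count, required = 0, rng.randint(1, 2 * mean - 1)

    def respond(t):
        nonlocal count, required
        count += 1
        if count >= required:
            count, required = 0, rng.randint(1, 2 * mean - 1)
            return True
        return False
    return respond


def fixed_interval(seconds):
    """FI: reinforce the first response once `seconds` have passed."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False   # responses before the time is up have no effect
    return respond


def variable_interval(mean, rng):
    """VI: like FI, but the wait varies from 0 to 2*mean (average = mean)."""
    last, wait = 0.0, rng.uniform(0, 2 * mean)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last, wait = t, rng.uniform(0, 2 * mean)
            return True
        return False
    return respond


# A rat on FR-5 pressing the bar ten times (time is ignored by ratio schedules).
fr5 = fixed_ratio(5)
print([fr5(t) for t in range(10)])

# A rat on FI-1-minute responding at 30, 61, 62, and 121 seconds.
fi60 = fixed_interval(60)
print([fi60(t) for t in (30, 61, 62, 121)])
```

Each schedule takes the time of the response as its argument, so ratio and interval schedules share one interface and could be chained to build combined schedules of the kind mentioned above.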

 

THE NATURE OF REINFORCEMENT

 

The simplest approach to reinforcement is to define it operationally: An event which when contingent on a response increases the probability of the response is a reinforcement. This has a touch of circularity to it in that an event is identified as a reinforcement after it functions as a reinforcement. This circularity can be overcome by showing that the reinforcement is trans-situational. That is, it is possible to demonstrate that the event which functions as a reinforcement in one situation also functions as a reinforcement in quite different situations. (One problem is what constitutes a “different” situation.) At the empirical, operational level there is fairly good consensus about the properties of reinforcement. However, at the theoretical level there is little consensus about the nature of reinforcement.

 

A major theoretical issue is whether reinforcement affects learning or only performance. Theorists who hold that reinforcement affects learning (e.g., Thorndike and Hull) argue that the reinforcing event somehow facilitates the learning process or strengthens the learned association. For example, Landauer (1969) assumes that learning is by contiguity and that reinforcement facilitates the consolidation of the learning. To Landauer, a reinforcement is any event that strengthens learning, such as contingent food or CS-UCS pairings.

 

On the other hand some theorists (e.g., early Tolman) hold that reinforcement affects only performance, and not learning. Such theorists often think of the reinforcement event as being an incentive, an event that the animal is motivated to try to acquire, rather than an event which strengthens learning. Bolles (1972) argues that contingent reinforcement is neither a necessary nor a sufficient condition for operant learning. Bolles’ expectancy theory of learning states the following primary law of learning: “What is learned is that certain events, cues, (S), predict certain other, biologically important events, consequences, (S*). An animal may incidentally show new responses, but what it learns is an expectancy that represents and corresponds to the S-S* contingency.” According to Bolles, when an animal learns a relationship between its behavior (R) and some consequence of this behavior (S*), the animal learns an R-S* expectancy. These two expectancies, S-S* and R-S*, are all that is usually learned in operant conditioning. These expectancies then become “synthesized” so that in the presence of S the animal makes the response R. Thus if “an animal is placed in a situation where there are cues predicting food, and food is made contingent upon some response, the animal will learn first that these cues predict food, and second, that its behavior produces food. If the animal is hungry, then it is likely to make that response.” In Bolles’ theory, operant and respondent conditioning both involve learning S-S* expectancies, and in operant conditioning the subject may also learn an R-S* expectancy.

 

THEORIES OF REINFORCEMENT

 

Let us now turn to a few of the many theories of reinforcement, most of which were proposed by theorists who believed that reinforcement affects learning. Hull (1943) suggested that all basic drives, such as hunger or the sexual drive, feed into one non-specific drive. This nonspecific drive then energizes whatever behaviors the animal makes in the particular stimulus situation. According to Hull, reinforcement is any event which produces a reduction in this non-specific drive. Hull’s theory is thus referred to as a drive-reduction theory of reinforcement.

 

Sheffield (1966a, 1966b), on the other hand, has suggested a drive-induction theory of reinforcement. Sheffield argues that animals learn those responses which arouse motivation. If a rat receives food for turning right in a T-maze, as opposed to turning left, the consummatory response of eating becomes conditioned to the stimuli of the right side as well as to response-produced stimuli of the instrumental behavior. When the rat now approaches the choice point, these stimuli elicit, to some degree, the consummatory response. But since the rat can’t consume the food until he gets to it, the consummatory stimulation without consummation is drive induction, which motivates the rat to make the response (turning right) which in the past preceded the consummatory response. Thus Sheffield’s rat is forced to make the response because of the drive induction. Although originally more general, Sheffield’s theory now is basically only applied to consummatory situations, as opposed, for example, to punishment situations. The consummatory response may also be a central response without overt behaviors.

 

Gibson’s theory of perceptual learning, discussed in Chapter 2, suggests that reduction of uncertainty is the reinforcement for much of perceptual learning (Gibson, 1969, p. 120). The complexity-arousal theories in Chapter 3 also deal with reinforcement effects.

 

Premack (1965) has proposed a theory of reinforcement in which responses reinforce responses. To determine which responses will act as reinforcers we must first measure the independent rates of the different responses. This is done by putting the animal in a situation where it can freely do either of two responses with no contingencies between the responses. From this, Premack predicts that the higher probability response will reinforce the lower probability response if a contingency is established between the two. For example, if a hungry rat is put in a situation where it can eat food or press a bar (where the bar-press does not yield anything), the independent rate of eating food will be higher than the independent rate of pressing the bar. So if the response of eating food is made contingent on the response of pressing the bar, the rate of bar-pressing will increase, being reinforced by the opportunity to eat food.
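
The measurement step and the prediction can be sketched in a few lines of Python. This is only an illustrative sketch, not Premack's own procedure: the function names, the response labels, and the session counts are all hypothetical, and the rule is reduced to a simple comparison of responses per minute.

```python
def independent_rates(counts, session_minutes):
    """Convert free-choice response counts into responses per minute."""
    return {response: n / session_minutes for response, n in counts.items()}


def predicted_reinforcer(rates, response_a, response_b):
    """Return the response predicted to act as the reinforcer.

    Returns None when the two rates are equal, since the rule as stated
    makes no prediction in that case (an assumption of this sketch).
    """
    if rates[response_a] > rates[response_b]:
        return response_a
    if rates[response_b] > rates[response_a]:
        return response_b
    return None


# Hypothetical counts from a 30-minute session with no contingencies in force.
baseline_counts = {"eat": 240, "bar_press": 15}
rates = independent_rates(baseline_counts, session_minutes=30)
reinforcer = predicted_reinforcer(rates, "eat", "bar_press")
print(reinforcer)  # the higher-rate response
```

With these made-up counts, eating has the higher independent rate, so the sketch predicts that access to eating would reinforce bar-pressing once a contingency is arranged.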

 

Premack’s theory has two major strengths. First, it allows us to incorporate into reinforcement theory well-known examples of activities reinforcing activities, such as when the mother tells her son that he must first eat his vegetables (low-probability behavior) before he may go out and play (high-probability behavior). Although there are other explanations for such conditions, they fit so well into Premack’s theory that Premack’s principle of reinforcement is sometimes called “Grandma’s rule.” The idea that the opportunity to engage in some activity is a reinforcement underlies much of contingency contracting, discussed later.

 

The second strength of Premack’s theory is its suggestion of reinforcement relationships that are not as obvious from other theoretical positions. For example, in some situations Premack showed that humans’ pinball playing was reinforced by eating, while in other situations eating was reinforced by pinball playing. Premack was also able to reinforce a rat’s drinking by giving it the opportunity to run in an activity wheel.

 

So far we have discussed only positive reinforcement in Premack’s theory. The same logic applies to negative reinforcement as well, except that now we are talking about the probability of the offset of an activity or response. Altogether, then, Premack’s principle of reinforcement is as follows: If the onset or offset of one response is more probable than the onset or offset of another, the former will reinforce the latter positively if the superiority is for “on” probability, and negatively if it is for the “off” probability.

 

The next set of reinforcement theories are based on possible physiological bases of reinforcement. More specifically they center on observations that electrical stimulation to certain parts of the brain produces strong reinforcing effects.

 

REINFORCING BRAIN STIMULATION

 

In the early 1950’s, Olds and Milner (1954) were doing experiments which involved putting small electrodes into the brains of rats so that they could electrically stimulate specific areas of the brain. Since brain functioning is at least partially electrical in nature, electrically stimulating an area of the brain, and thus forcing that area to be activated, is one way of testing approximately what that area does in natural functioning. In the course of one experiment, while Olds and Milner were aiming their electrodes at one area of the brain (reticular formation), one electrode, by mistake, ended up much further forward in the brain. It was observed that stimulation through this electrode seemed to be “pleasant” to the rat in that the rat would go to specific places on a table or run a maze to receive this stimulation. Thus began the massive research on reinforcing electrical stimulation of the brain (ESB). (Similar effects can be produced by chemical stimulation, but this literature will not be discussed here.)

 

The effects of ESB are usually defined in an operant paradigm. If the animal will make some response, such as pressing a bar, to turn the stimulation on, the ESB is considered positively reinforcing, whereas if he will respond to turn it off, the ESB is negatively reinforcing. The results, however, are often not this simple. In some situations the ESB is reinforcing at first, but becomes aversive if continued (Bower & Miller, 1958). This may be because the electrical current spreads from reward areas into aversive areas or it may be the result of an actual functional change in the stimulated site.

 

By putting electrodes in various parts of the brain, it is possible to map out the “reward” areas of the brain. It appears that most of the brain, particularly the cortex, is motivationally inert, with ESB producing neither positive nor negative reinforcement. The positive reinforcement areas are mostly in subcortical areas and seem to outnumber the subcortical negative reinforcement areas.

 

The reinforcing effect of ESB varies according to the exact placement of the electrode, the species of the animal, the duration and intensity of the stimulation, and a number of other variables. But at its best, reinforcing ESB is one of the most powerful reinforcements that man has discovered. In the extreme, rats will bar-press for reinforcing ESB to the point of physical exhaustion, often not taking out sufficient time to eat or drink. The strength of the reinforcing effect of the ESB is often measured in terms of rate of response, such as how fast the animal will press a bar. But there are problems with the use of response rate as a measure (see Valenstein, 1964). For example, the ESB may also elicit a motor response or seizure which decreases the rate at which the animal is capable of responding. Or the animal might be reinforced for responding at a specific rate, as in micromolar theory (Logan, 1956). Thus, to determine which of two brain areas produces the strongest reinforcing effect, it might be better to give the animal a choice between ESB to the different areas rather than to merely compare the response rates of the different areas.

 

What is the relationship between reinforcing ESB and other more conventional reinforcements? One thing that stands out is that many of the areas of the brain where reinforcing ESB is found are also areas concerned with other sources of reinforcement, such as from eating. For example, the hypothalamus, perhaps the most popular site for reinforcing ESB, is a critical brain structure for the control of a wide range of consummatory behaviors, including eating, drinking, and sex. This has suggested to several theorists, including Olds (1962), that reinforcing ESB stimulates the actual physiological substrates of conventional reinforcements.

 

Miller (1961) showed correlations between drive reduction theories of reinforcement and reinforcing ESB, suggesting that the ESB might be stimulating a reward mechanism usually triggered by drive reduction. For example, it is known that electrical stimulation of a part of the hypothalamus reduces the amount of food that an animal will eat. This, then, might be the area stimulated by the drive reduction from eating. Thus we would expect that ESB in this area would be reinforcing, which Miller showed was true. (Although Miller reported that continued stimulation quickly became aversive.) Miller also defended his position by showing how manipulations of drives often affected how reinforcing the ESB was. Later Grossman (1967, p. 591) summarized these findings, saying, “The available evidence indicates that the rate of self-stimulation at a specific electrode site correlates positively with only one particular drive, suggesting a close functional relation between specific drives and the reward effect.” However, there are many reports of conflicting and confusing results in trying to correlate sites of reinforcing ESB with neural sites related to conventional reinforcements.

 

Others have pointed out a number of apparent differences between reinforcing ESB and more conventional reinforcements. These differences include the following: (a) extinction of a response which had ESB as the reward is often more rapid than extinction of responses based on other rewards; (b) satiation to reinforcing ESB often takes much longer; and (c) it is often difficult to maintain responding under an intermittent schedule of ESB.

 

Deutsch (Deutsch & Howarth, 1963) proposed a theory that accounts for some of these differences. According to Deutsch, in reinforcing ESB the electrical current stimulates both a reinforcement system and a motivation system. Stimulation of the motivation system motivates the animal to make the response which results in reinforcement plus motivation to repeat the response. Hence the effect is self-perpetuating, resulting in less satiation with some ESB than with other rewards. Faster extinction and difficulty in maintaining response with intermittent schedules of ESB thus occur because the motivation is eliminated or greatly reduced. (This contrasts with the hungry rat bar-pressing for food that stays hungry even though the bar-press no longer yields food.)

 

Although there are many clever experiments supporting Deutsch’s theory (e.g., Deutsch & Howarth, 1963; Gallistel, 1966), there are also many that refute it. For example, Cantor (1971) used a situation in which the reinforcing ESB was made predictable by preceding it with a brief warning signal. In this case rats would bar-press for a variety of different intermittent schedules of ESB, including FR-2000 and VI-2 minutes. After reviewing a number of studies critical of Deutsch’s theory, Trowill, Panksepp, and Gandelman (1969) concluded that many of the apparent differences between reinforcing ESB and other rewards are due to the specific conditions of deprivation and training used by researchers such as Deutsch, and that the results do not hold up in more general testing situations. They prefer to conceptualize the motivating effects of ESB in terms of incentives rather than as the stimulation of a motivational energizing system such as Deutsch’s.

 

The issues are, of course, far from resolved. For example, Lenzer (1972) has offered a model which argues again that there are differences between behavior maintained by reinforcing ESB and behavior reinforced by more conventional rewards. According to Lenzer, in CRF situations or where the ESB’s follow each other closely, the controlling stimuli (those stimuli leading to the operant response) are internal stimuli produced by the ESB, whereas in similar situations with conventional rewards, the stimuli produced by the reward do not have a major role in controlling the response. Lenzer assumed that those ESB-produced controlling stimuli decay rapidly with time, yielding Deutsch’s type of results. In other situations, such as widely spaced ESB’s, the subject receiving reinforcing ESB learns to respond to stimuli similar to those controlling the behavior under conventional rewards. So in these situations little difference will be found between the effects of reinforcing ESB and other reinforcements.

 

Glickman and Schiff (1967) noted that there was an overlap between those brain areas mediating positive or negative reinforcement and those areas related to species-typical behaviors, that is, behaviors that occur in almost all members of a species. Since these species-typical behaviors are generally important to the animal, as in survival value, it is useful for them to become linked with a reinforcement mechanism that will maintain them in the animal’s behavior. According to Glickman and Schiff, reinforcement evolved as a mechanism to ensure that some species-typical behaviors occur to appropriate stimuli. Thus reinforcing ESB is the stimulation and facilitation of a neural system underlying species-typical behavior. Aversive effects of ESB are due to the stimulation of areas related to withdrawal behaviors.

 

Consider a domestic cat growling and attacking objects. It may appear to the observer that the cat is experiencing something unpleasant. But, as Glickman and Schiff point out, such behavior may have had survival value in the history of the cat and thus became associated with reinforcement mechanisms. So our growling cat may actually be experiencing pleasure.

 

Reinforcing ESB has also been investigated in humans by a number of investigators, including Heath and his associates (e.g., Bishop et al., 1963; Heath, 1963). Heath uses ESB primarily in a therapeutic setting with mental patients. The “pleasurable” effects of reinforcing ESB can be used to disrupt undesirable behaviors that are incapacitating the subjects. In one case (Moan & Heath, 1972) the investigators took advantage of the fact that stimulation of the septal area of the brain may produce both pleasure and sexual arousal. The patient was a 24-year-old homosexual male who was repeatedly hospitalized for chronic suicidal depression. When shown a stag movie of sexual intercourse he showed no interest. However, after a series of septal stimulations, the subject, while still feeling “high” from the ESB, was again shown the movie, which now caused considerable sexual arousal. With the help of more septal stimulations and a prostitute the experimenters were able to quickly build in the subject heterosexual behavior which lasted well after treatment. Heath’s emphasis, then, has been to use ESB more for eliciting responses and emotional states than for reinforcing specific behaviors, although the two effects are often confounded.

 

Delgado (1969) has developed ESB technology to an impressive stage. Delgado’s subjects (usually monkeys, although humans were used on occasion) are equipped with a unit in their skull that simultaneously records brain activity and stimulates specific areas. This unit can be monitored and controlled via radio communication so that the subject is not restricted in movement by wires coming out of his head. Through such a set-up the observer, which may be a computer, can monitor the subject’s brain activity and stimulate different areas of the brain when specific reactions are desired. By stimulating different areas, the subject can be made sleepy, hungry, aggressive, afraid, or sexually aroused; almost any basic emotion, motivation, or simple physical movement can be elicited. And the stimulation may be used as a reinforcement.

 

People who have received reinforcing ESB say that it is pleasurable; they often describe the sensation in terms of one or more other types of rewarding sensations, such as sexual orgasm or the pleasure experienced from having something good to eat. At present we don’t know just how powerful a reinforcing effect can be produced by ESB in humans. Is there an area or combination of areas in the human brain which when stimulated will produce so powerful a pleasurable sensation that the subject will choose this ESB over all other sensations or activities? We don’t know, but there is no reason to believe that there isn’t. The work done on ESB in humans by researchers such as Heath and Delgado has not really emphasized the reinforcing effects of some ESB; that is, they have not experimented with requiring the subject to make some response in order to receive reinforcing ESB. Such experimentation, however, has its dangers. An example of a misuse of reinforcement would be giving a mental patient a reinforcing ESB every time we recorded some specific aberrant activity in his brain. Although we may have intended the ESB to disrupt the aberrant activity and associated behaviors, we might actually be reinforcing this particular brain activity to occur more often.

 

The possibility of ESB’s having powerful reinforcement effects in man raises a host of philosophical, ethical, and science-fiction issues. Under what conditions would we have the right to apply such a technology to someone else? If we find areas where ESB is pleasurable, should we then give it to everyone? If I had someone work around my house for me in order to receive reinforcing ESB each night and he told you he was doing the work voluntarily because he liked ESB so much, would you object to my coercing work out of him? If the ESB is so rewarding to my worker that he would do anything to receive it, where does the concept of “will” enter in? Or is man so complex that he can never be controlled through such a simple procedure?

 

PUNISHMENT

 

When used by itself the term “punishment” usually refers to positive punishment, a contingent event whose increase results in a decrease in the probability of the response it is contingent on. It is less probable that a child will touch the burner on the stove if he is burned when he first makes the touching response. Although it is easy to define punishment in terms of its effect on behavior, the mechanisms by which it produces these effects are highly debated (Campbell & Church, 1969; Church, 1963; Dunham, 1971; Johnston, 1972; Solomon, 1964). We will consider a few of the possibilities.

 

A punishment probably elicits emotional responses in the subject, such as fear and anxiety. These emotional responses then may become respondently conditioned to the situation in which the punishment occurred. To the extent that these emotions are incompatible with the punished response, the probability of the response may decrease. Or these emotional responses may lead to some other incompatible response which becomes conditioned to the situation.

 

The punishment may elicit some response, other than an emotional response, which becomes respondently conditioned to the situation. Again, to the extent that this response is incompatible with the punished response, there will be a decrease in the probability of the punished response.

 

Since the onset of the aversive stimulus is positive punishment, the offset of the stimulus is negative reinforcement. Thus whatever response the subject is making when the stimulus goes off, such as an escape response, will be reinforced. If this reinforced response is incompatible with the punished response, there will be a decrease in the probability of the punished response. Of course, punishment need not produce just one of the effects mentioned above, but may produce different combinations of the effects in different situations.

 

Dunham (1971) has summarized the effects of punishment due to electric shock into two basic rules: (1) That particular response in the organism’s repertoire which is most frequently associated with shock onset, or which predicts the onset of shock within a shorter time than other responses, will decrease in probability and remain below its operant baseline; (2) That particular response in the organism’s repertoire which is most frequently associated with the absence of shock onset, or which predicts the absence of shock onset for a longer period of time than other responses, will increase in probability and remain above its operant baseline.
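
Dunham's two rules can be illustrated with a small Python sketch. It is a deliberate simplification: it uses only how often each response is paired with shock onset or with its absence, ignores the timing clauses in both rules, and all response names and counts are hypothetical.

```python
def dunham_predictions(shock_onset_counts, no_shock_counts):
    """Apply a frequency-only reading of Dunham's two rules.

    Rule 1: the response most often associated with shock onset is
            predicted to fall below its operant baseline.
    Rule 2: the response most often associated with the absence of shock
            onset is predicted to rise above its operant baseline.
    """
    suppressed = max(shock_onset_counts, key=shock_onset_counts.get)
    elevated = max(no_shock_counts, key=no_shock_counts.get)
    return suppressed, elevated


# Hypothetical tallies for three responses in one animal's repertoire.
shock_onset_counts = {"bar_press": 40, "groom": 3, "rear": 5}
no_shock_counts = {"bar_press": 2, "groom": 30, "rear": 12}

suppressed, elevated = dunham_predictions(shock_onset_counts, no_shock_counts)
print(suppressed, elevated)
```

In this made-up tally, bar-pressing is the response most often followed by shock and grooming the one most often followed by its absence, so the sketch predicts a drop in the former and a rise in the latter.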

 

Premack expanded his response-probability approach to reinforcement to include punishment as well (Terhune & Premack, 1970). That is, in reinforcement, response A will reinforce response B if A is more probable (has a higher independent rate) than B, whereas in punishment, response A will suppress response B if A is less probable than B.
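
The combined rule can be written as a single comparison. The following Python sketch assumes that response A is made contingent on response B and that each response has a measured independent rate; the function name and the example rates are hypothetical, and the equal-rate case is treated as yielding no prediction, which the text does not address.

```python
def premack_effect(rate_a, rate_b):
    """Predict the effect on response B of making response A contingent on it.

    rate_a, rate_b: independent rates of A and B (same units).
    """
    if rate_a > rate_b:
        return "reinforce"   # A is more probable than B
    if rate_a < rate_b:
        return "suppress"    # A is less probable than B
    return "no prediction"   # assumption: equal rates are left undecided


# Hypothetical independent rates, in responses per minute.
print(premack_effect(rate_a=8.0, rate_b=0.5))  # reinforce
print(premack_effect(rate_a=0.5, rate_b=8.0))  # suppress
```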

 

In applied situations the practitioner should generally avoid the use of punishment as a change procedure for reasons such as the following:

 

1. Punishment by itself does not necessarily produce desirable behavior. Punishing a child for impolite behavior does not guarantee that he will then show polite behavior, as the desired behavior may not even be in his repertoire.

 

2. The punishment may condition in fear, anxiety, or other perhaps undesired emotions. A worker may develop a dislike for his job and show little commitment to his work because his supervisor keeps criticizing his mistakes.

 

3. The punished person may develop escape or avoidance behaviors. The author had a case of a boy with a school phobia so severe that the boy would no longer even enter the school building. The primary factor that led to this phobia was that the school emphasized corporal punishment which caused the boy to learn an avoidance response to school.

 

4. Attempted punishment of an escape or avoidance response in some situations increases the strength of the avoidance. The author watched a father at the beach trying to overcome his son’s fear of the water. The father would take his son to the edge of the water and then retreat a short distance. As soon as a medium-sized wave came in, the child became afraid and ran away from the water. The father punished the child’s running away, verbally or physically, which only made the boy more anxious, and made him run from the water faster and sooner.

 

5. Punishment may result in masochism. If the only time that a child really gets much attention from his parents is when they punish him, he may be willing to receive the punishment in order to receive the attention. In such cases the assumed punishment may become a conditioned reinforcement as the result of its pairing with the reinforcement of attention (see Chapter 7).

 

6. The punishing agent may provide a model for aggressive behavior. Children often model or imitate their parents. If they see their parents handle conflict situations by being aggressive, they too will learn to be aggressive.

 

7. The punished person often becomes less flexible or adaptable in his behaviors. On the wards of many mental hospitals there is much that the patient can do and be punished for, but little that he is rewarded for. In such situations the patient’s best “strategy” is to do as little as possible.

 

Because of such possible effects of punishment as these, it is usually better to try to reinforce and shape in the desired behaviors, rather than punish the undesired behaviors. This, of course, is not always practical, as sometimes the behavior is so detrimental (e.g., the child who keeps running into the street or the autistic child who claws up his face) that it is necessary to use punishment to suppress the undesired behavior long enough to build in desired behavior. Also, a number of cases have been reported in which punishment was a useful change procedure (Baer, 1971).

 

If punishment does have many bad effects and is not one of the most effective change procedures, why is it so prevalent in our society? There are, of course, a myriad of reasons, such as moral and legal philosophies (e.g., “an eye for an eye”) and the fact that the punishing agent often uses punishment to release his own anger or uncertainty about how to handle a situation. But a major variable is delay of reinforcement. The immediate effects of punishment are reinforcing to the punisher: the punished behavior is quickly suppressed, and the punisher releases some of his emotions. It is in the more long-range effects that the disadvantages of punishment usually arise, but because behavior is so easily controlled by the short-delay effects, people are reinforced to use punishment.

 

The United States generally puts more emphasis on punishment than on rehabilitation. This is particularly evident in the prison systems, but can be seen at all levels of society. In behavior change situations people tend to think in terms of punishing or stopping undesired behavior, rather than building in desired behavior. The teacher asks “How can I stop the children from running in the halls?” rather than “How can I get the children to walk in the halls?” The manager asks “How can I stop my workers from taking extra time during lunch?” rather than “How can I get the workers to take only one hour for lunch?” Although these differences may sound semantical, they generally lead to significantly different approaches to behavior change. A point that Skinner (see Skinner, 1971) continually makes is the importance to our society of switching from punishment to reinforcement. For, Skinner argues, reinforcement procedures are generally more effective than punishment procedures in changing behavior and maintaining desirable behaviors. Also, behavior control by pleasant consequences seems preferable to control by aversive consequences.

 

The second type of punishment is negative punishment, a contingent event whose decrease results in a decrease in the probability of the response it is contingent on. A mental patient may decrease his delusional talk if every time he talks this way the social worker walks away from him for five minutes. Significantly less research has been done on negative punishment than positive punishment (see Coughlin, 1972). Since negative punishment essentially consists of withdrawing a positive reinforcement, there are many possible explanations for the resulting effects. To a certain extent negative punishment is an operant extinction procedure since behaviors can now occur and not be reinforced, because the reinforcement is withdrawn. The act of removing the source of positive reinforcement may also function as a positive punishment.

 

A common form of negative punishment in schools is a time-out procedure. Here the student to be punished is sent to a room or section of a room in which he just sits for a short time. If the regular classroom is a source of reinforcement for the student, then the time-out procedure will be negative punishment. In an ideal classroom operating on reinforcement principles, time-out may be the most reasonable form of punishment.

 

EXAMPLES OF OPERANT CONDITIONING

 

Operant conditioning has been applied in an amazingly large number of different situations. Here we will mention only a few examples.

 

Verhave (1967) trained pigeons to inspect pills for a drug company. The pigeon would sit in a cage with two rounded discs before it; one was a translucent window, the other opaque. A conveyor belt moved pill capsules by the translucent window. If the pill was acceptable the pigeon pecked the opaque disc; if defective, it pecked the translucent disc. Within a week of training the pigeons were working at 99 per cent accuracy. The pigeons were rewarded with food for making the right discriminations.

 

During the Second World War, Skinner (1960) trained pigeons to fly missiles. The pigeons worked as a homing device in an air-to-ground missile called the Pelican. In the training the pigeons’ behavior was reinforced for pecking the appropriate keys controlling the direction of the missile toward the chosen target. Although Skinner’s project worked quite well, it was not well received by the appropriate government officials, who caused the project to be terminated.

 

Pryor (1969) has shown how to operantly condition “creativity” in porpoises. Her method was to reward the porpoise only for behaviors that had not been rewarded before. Thus, after running through its usual repertoire of behaviors, it had to generate entirely new or creative behaviors. Many of these new behaviors (aerial flips, gliding with tail out of water) had never been observed in a porpoise by the staff at the Sea Life Park.
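
The reward criterion described here amounts to remembering which behaviors have already been rewarded. A minimal Python sketch follows; the behavior labels and the session sequence are hypothetical, and the sketch says nothing about how a trainer judges two behaviors to be the same.

```python
def novelty_rewards(behaviors):
    """Return, for each behavior in order, whether it would be rewarded.

    A behavior is rewarded only if it has not been rewarded earlier
    in the sequence.
    """
    already_rewarded = set()
    outcomes = []
    for behavior in behaviors:
        is_novel = behavior not in already_rewarded
        outcomes.append(is_novel)
        if is_novel:
            already_rewarded.add(behavior)
    return outcomes


# Hypothetical sequence of behaviors emitted during training.
session = ["jump", "jump", "aerial_flip", "jump", "tail_glide"]
outcomes = novelty_rewards(session)
print(outcomes)  # [True, False, True, False, True]
```

Repeating an old behavior earns nothing under this criterion, so the only way to keep earning rewards is to produce something new.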

 

Skinner (MacCorquodale, 1969; Skinner, 1957) has suggested an analysis of speech as essentially a form of verbal behavior whose acquisition and maintenance are due to operant conditioning. For example, a small child’s behavior might be reinforced by fondling for making the operant response “da-da” to the discriminative stimulus of the father (SD = father; SΔ = milkman). It is also easy to imagine how the parents gradually shaped the response “da-da” by reinforcing approximations to this response. Parents’ “ability” to hear “words” in the seemingly random sounds of their child often facilitates verbal shaping. The person gradually acquires a very complex set of verbal behaviors which have been learned because of how useful they are in maximizing reinforcements in the social environments. Critics of Skinner (e.g., Chomsky, 1959) argue that Skinner’s analysis cannot account for all the complexities of language learning and speech. Perhaps there are other variables, such as a predisposition to acquiring certain grammatical styles, that have to be added. But this is a question that does not yet seem to have been adequately resolved, although some critics believe that it has.

 

An operant analysis of speech suggests the possibility that thoughts may, totally or to some degree, be considered covert internalized verbal behaviors that are under the control of operant variables. This has led to a procedure called coverant control in which thoughts are manipulated by operant conditioning (Homme, 1965; Mahoney, 1970). For example, the author had a case of a college student who in certain social situations kept having thoughts about his social inadequacies. The student was convinced that his thoughts were irrational and not well founded, but they kept occurring and bothering him. Through coverant control it was possible to operantly condition other thoughts to occur in place of the undesired thoughts, and in two weeks the problem was gone. This was accomplished by the student’s writing the desired thoughts on small cards, which he then inserted in his cigarette pack. When in the social situation that elicited the undesired responses, he would occasionally read to himself one of the desired responses and reinforce himself, as with a cigarette or by thinking about something particularly pleasant. This was continued until the desired thoughts replaced the undesired thoughts.

 

Much of children’s behavior can be thought of as operant behavior maintained by the reinforcement of attention, as in the following examples. When put to bed little Jeffrey will cry and refuse to sleep until his parents return to his room and read him a story. Although capable of working by himself, Stevie keeps coming up to the teacher’s desk for help. When Susie’s parents are engrossed in adult conversation with some visitors, Susie may do something “cute” to bring the group attention to her. Ideally parents and teachers should use their attention to reinforce desirable behaviors and not to reinforce undesirable ones. There is a natural tendency, however, to do just the opposite. That is, when the child is doing all right (emitting desirable behavior), the parent or teacher relaxes and probably leaves the child alone, whereas when the behavior becomes somewhat troublesome (child emitting undesirable behaviors), the parent or teacher decides that it is now time to attend to what the child is doing.

 

A good operant conditioner learns to ask the question “What is the function of this behavior?” That is, what are the operant contingencies maintaining this behavior? Rather than explaining problem behavior in terms of intra-psychic disturbances or in terms of the historical development of the problem, the operant conditioner looks for the contingencies currently maintaining the problem behaviors and how these contingencies or alternative behaviors might be manipulated. (This is not to suggest, however, that all behavior can be reduced to the operant paradigm.) Manipulation of operant contingencies, particularly with humans, necessarily raises ethical issues about what constitutes “desirable” behaviors and who has the right to alter another person’s behavior, either intentionally or not.

 

Madsen and associates (1968) investigated the effects of rules, ignoring inappropriate behavior, and showing approval for appropriate behavior exhibited by students in an elementary classroom. They concluded that (a) rules alone had little effect on classroom behavior, (b) the combination of ignoring inappropriate behavior and showing approval for appropriate behavior was very effective in achieving better classroom behavior, and (c) approval for desirable behavior is “probably the key to effective classroom management.”

 

Emery Air Freight Corporation had a goal for their customer service department of responding to customer queries within 90 minutes. The employees felt they met this goal about nine times out of ten, but in fact it was only three times out of ten. An operant feedback system was established in which the employees marked off on their sheets whether each call was answered within 90 minutes. The supervisor then gave praise and recognition for improvement in performance. Within one day performance went from 30 per cent to 90 per cent and stayed between 90 and 95 per cent for at least three years (Business Week, Dec. 18, 1971).

 

Sabatasso and Jacobson (1970) worked with a 58-year-old man who had spent five years in a ward for chronic schizophrenics. His diagnosis was “chronic brain syndrome, resulting from brain trauma, with psychotic reaction.” The head injury resulted from being hit over the head with a board during a fight. The subject was considered a mute psychotic, as he had said only one word, “yes,” during his five years in the hospital. Through modeling and reinforcement with praise and candy the subject was gradually shaped to speak. Within ten hours of therapy the subject verbalized 307 words, 56 different words, and several simple sentences. At one point the subject shouted excitedly, “I’m talkin’ to you.”

 

A popular behavior modification procedure in applied operant situations is contingency contracting, a formal agreement about reinforcement contingencies and required behaviors. A parent specifies exactly what behaviors he expects from his children (e.g., being home for dinner by 5:30, maintaining a C average in school) and what reinforcements (e.g., allowance, being permitted to go to a movie) the child will receive contingent on these behaviors. A teacher posts the rules for the classroom (e.g., having specified supplies each day, staying in seat during self-work time) and each student who fulfills this contract may choose one reinforcement from a list (e.g., 10 minutes at the end of the class period to read whatever he wants, permission to leave class 2 minutes early). A person who wants to lose weight gives his favorite records to a friend and then must earn the records back by specified weight loss. A husband and wife undergoing marriage counseling learn to do contingency contracting with each other as a first step toward building give-and-take into their marriage (e.g., the husband agrees to be home by 2 A.M. on his poker night if the wife fixes one of a number of specified dinners at least twice a week).

 

Various forms of contingency contracting have been applied to many different types of behaviors in a wide range of situations. Contingency contracting has many positive points:

 

1. It guarantees the systematic use of operant conditioning.

 

2. All required behaviors should be well specified so that there is no question about exactly what is expected and no argument about whether the behavior occurred or not. Many arguments between parents and their children center on whether or not the child did what he was supposed to do.

 

3. It forces all participants to be consistent. The student in the classroom or the child at home enjoys contingency contracting since he knows that he will receive a specified reward for a specified behavior and that this is independent of the parent’s or teacher’s current mood or whether or not the teacher likes him.

 

4. It provides an easy way to guarantee reinforcement for behaviors that ordinarily are not reinforced or which are reinforced but with too long a delay of reinforcement. One of the author’s graduate students who had trouble motivating himself to work on his thesis (too long a delay of reinforcement for thesis completion) gave the author a number of things that were highly reinforcing to the student (e.g., guitar, records, books, clothes, and things to consume). The student gradually earned these back by completing portions of his thesis within specified time limits.

 

5. Contracts can be individualized to deal with the needs of each person. Classes can be set up for truly individualized instruction. A program in a mental hospital can take into account each patient’s particular needs and problems.

 

A variation of contingency contracting is a token economy, in which the immediate reinforcement is tokens which can later be exchanged for other reinforcements. The tokens, such as poker chips or marks on a chart, are just the medium for exchange. A token system in a mental hospital might involve the patient’s earning tokens for behaviors such as dressing himself, acting in specified ways, and attending vocational rehabilitation programs. These tokens can later be exchanged for rewards such as magazines, an opportunity to see a movie, or a trip to town, with the number of required tokens varying from item to item. The main advantage of tokens is that they can be administered almost anywhere with little delay of reinforcement. If there is a big enough selection of things to buy with the tokens, the tokens should always be reinforcing.
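
The bookkeeping behind a token system can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any actual hospital program: the class name `TokenEconomy`, the token values, and the prices are all assumptions invented for the example. It shows the two steps described above: tokens are given immediately after a specified behavior, and they are exchanged later for rewards whose prices vary from item to item.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEconomy:
    """Toy ledger: tokens are earned for behaviors and spent on rewards."""
    earn_rates: dict   # behavior -> tokens earned (hypothetical values)
    prices: dict       # reward -> tokens required (hypothetical values)
    balance: int = 0
    history: list = field(default_factory=list)

    def record_behavior(self, behavior: str) -> int:
        """Give tokens right after a target behavior; return the new balance."""
        self.balance += self.earn_rates[behavior]
        self.history.append(("earned", behavior))
        return self.balance

    def exchange(self, reward: str) -> bool:
        """Exchange tokens for a reward if the balance covers its price."""
        price = self.prices[reward]
        if price > self.balance:
            return False
        self.balance -= price
        self.history.append(("spent", reward))
        return True


# All behaviors, rewards, and numbers below are made up for illustration.
ward = TokenEconomy(
    earn_rates={"dressing himself": 2, "attending rehabilitation program": 5},
    prices={"magazine": 3, "movie": 6, "trip to town": 20},
)
ward.record_behavior("dressing himself")
ward.record_behavior("attending rehabilitation program")
can_see_movie = ward.exchange("movie")          # 7 tokens cover the price of 6
can_go_to_town = ward.exchange("trip to town")  # 1 token left, not enough
```

Keeping the earning step separate from the exchange step mirrors the point made above: the token can be delivered with almost no delay, while the backup reward can wait.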

 

Token economies have revolutionized mental hospitals (Ayllon & Azrin, 1968), establishing programs that help large numbers of patients without necessarily increasing the staff. Token economies in classrooms (O’Leary & Drabman, 1971) provide settings in which both students and teachers work more effectively and with more enjoyment. Token systems have also been successfully used in homes, prisons, and half-way houses (see Kazdin & Bootzin, 1972).

 

CONDITIONING VISCERAL RESPONSES

 

The somatic nervous system is that set of nerves which controls “voluntary” actions of the skeletal-muscular system, such as moving an arm. The responses of this system are usually conditioned operantly, but many can also be conditioned respondently (e.g., human eyelid response or the patellar reflex). The autonomic nervous system is that set of nerves that controls visceral responses, including circulation, digestion, and activity of glands. Historically this nervous system was considered inferior to, or more primitive than, the somatic nervous system. It appeared to function fairly autonomously, outside of “voluntary” control. Until fairly recently it was almost universally held by learning theorists that the visceral responses of the autonomic nervous system could be conditioned respondently, but not operantly. This suggested that there are at least two different types of learning: operant conditioning affecting the somatic nervous system but not the autonomic nervous system, and respondent conditioning affecting the autonomic nervous system and some of the somatic nervous system. Today there is impressive data that visceral responses can be operantly conditioned (DiCara, 1970; Katkin, 1971; Miller, 1969) as well as brought under voluntary control (see next section on Biofeedback). On the basis of these experiments Miller (1969) has argued that there may be just one type of learning, based on reinforcement.

 

A problem in demonstrating operant conditioning of visceral responses is that any apparent effects may be an artifact of the conditioning of a skeletal response. That is, in trying to operantly condition the visceral response, the experimenter may actually be operantly conditioning a skeletal response which in turn produces changes in the visceral response. To avoid this problem, Miller and his associates (Miller, 1969) gave their rats the drug curare, which produces paralysis of the skeletal muscles. This drug also facilitated the conditioning of the visceral responses, perhaps because it removed some of the variability and distraction from the somatic nervous system.

 

Miller used reinforcing brain stimulation as a reinforcement for conditioning his curarized rats. Miller showed that he could shape the rats’ heart rate either up or down by reinforcing changes in the desired direction. For example, if he wanted heart rate to go up he would wait until the natural fluctuations of the heart rate increased and then reinforce this increase with the reinforcing brain stimulation. Through shaping, larger and larger changes were required and generated. Miller also showed that these heart rate changes could be brought under discriminative control. For example, a rat could be conditioned so that his heart rate would go down when a light and tone came on. Miller and his associates then demonstrated the operant conditioning of a variety of other visceral responses, including intestinal contractions, urine formation by the kidney, and amount of blood flow in the tail. In one experiment they were even able to condition the rat so that more blood would flow into one ear than another. To show that such conditioning effects are not specific to the use of reinforcing brain stimulation, Miller also conditioned heart rate changes, intestinal contractions, and changes in blood pressure where the reward was that the rat avoided shock to his tail.

 

However, visceral responses do seem resistant to relatively simple operant conditioning procedures. It may be that it would be evolutionarily disadvantageous if visceral responses were readily manipulated by operant contingencies. For health and survival, an animal’s visceral responses must stay relatively stable despite drastic changes in environmental contingencies. Otherwise chance reinforcements might produce an animal with high blood pressure and inadequate responses by the kidney. On the other hand, it might also be evolutionarily undesirable if the visceral responses did not respond at all to operant variables. For in extreme situations such as malfunction, disease, or extreme constant environmental changes it may be desirable to have visceral learning.

 

These animal experiments suggest that many human psychosomatic illnesses might be due to operant conditioning of visceral responses. For example, blood flow to specific body organs has been shown to be conditioned operantly, so this could result in specific psychosomatic symptoms related to that organ. Perhaps reinforcers such as a mother’s attention or avoidance of unpleasant situations might be sufficient to shape in psychosomatic illnesses. This is an open area of research.

 

BIOFEEDBACK

 

Feedback has been shown to be a powerful determinant of behavior. However, many response systems, such as visceral responses, provide little or no feedback regarding their functioning. For example, the reader might try to tune in to the activity of his liver or try to feel slight changes in blood pressure. Extreme activity of such systems may be perceived, particularly as they affect other parts of the body, but the normal fluctuations of activity in these systems are usually imperceptible owing to inadequate feedback. This is probably just as well, for if early man had feedback and control of visceral responses, he probably would have messed himself up more than helped himself.

 

Earlier we saw how visceral responses could be manipulated by operant conditioning, a form of feedback. This suggests that if people were provided feedback from systems that they don’t usually receive feedback from, such as those controlling blood pressure, they might be able to learn to control these systems. This leads to the investigations with biofeedback, utilizing mechanical devices that provide knowledge of the activity of a body function for which the person usually has inadequate feedback (see Lang, 1970; Shapiro & Schwartz, 1972).

 

Say, for example, that we wished to teach a person how to lower his blood pressure. We could hook him up to a mechanical device that would measure blood pressure and turn on a green light when the blood pressure went below a specified level. At first this level might not be very low, but it could be gradually lowered, as in shaping. After the subject is hooked up to such a device, he is given the simple instruction to try to get the green light to come on. (He might also be given other instructions, such as how to relax, but this is not necessary.) Although he may not know how he is doing it, after a short time the subject can get the green light to come on “at will.” With a little more training and shaping the subject is soon able to significantly lower his blood pressure when he wishes. Shapiro and his colleagues (1969) have shown how subjects can learn control of blood pressure through such biofeedback procedures. Schwartz (1972) demonstrated biofeedback control of heart rate and blood pressure. In fact, Schwartz’s subjects could control heart rate and blood pressure independently, raising one and simultaneously lowering the other.
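
The logic of the green-light device can be sketched as a short Python function. This is a hypothetical sketch, not the procedure used by Shapiro or Schwartz: the function name, the rule of lowering the criterion after three consecutive successes, the step size, and the readings are all assumptions made for illustration. The light is on whenever the reading is below the current criterion, and the criterion is lowered gradually, as in shaping.

```python
def run_shaping_session(readings, start_criterion, step, successes_needed=3):
    """Return (light_states, final_criterion) for a series of readings.

    The green light is on whenever a reading is below the current criterion.
    After `successes_needed` consecutive successes the criterion is lowered
    by `step`, so larger changes are required as training goes on.
    (This particular rule is an assumption chosen for the example.)
    """
    criterion = start_criterion
    streak = 0
    light_states = []
    for reading in readings:
        light_on = reading < criterion
        light_states.append(light_on)
        streak = streak + 1 if light_on else 0
        if streak == successes_needed:
            criterion -= step
            streak = 0
    return light_states, criterion


# Hypothetical systolic readings (mm Hg), invented purely for illustration.
readings = [132, 129, 128, 127, 129, 126, 124, 123, 122]
lights, final_criterion = run_shaping_session(readings, start_criterion=130, step=3)
```

With these made-up readings the criterion is lowered twice, from 130 to 127 and then to 124, so a reading that would have turned the light on early in the session no longer does so later.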

 

Because of the apparent absence of internal feedback, subjects learning to control response systems such as those involved with blood pressure often have no subjective feeling about what they are doing when they change these responses. They just “know” how to do it, but they don’t feel any different. Some subjects develop superstitious behaviors, such as learning to tense or relax some muscle that is irrelevant to the effect. It remains to be seen whether some subjects will actually learn to respond to very subtle feedback cues that are actually correlated with the response system to be changed.

 

The potential implications of such studies are enormous. Researchers are currently investigating whether people with high blood pressure can learn to keep their blood pressure down by voluntary control. One wonders how many autonomic responses people can learn to control. Will people in the near future be able to learn control of their bodies so that a person might be able to voluntarily quiet an upset stomach or relax by lowering his heart rate? Will a person with a defective gland learn voluntary control over this gland? Will many medical problems fall under the domain of the biofeedback trainer? Under what circumstances is such control over autonomic responses undesirable or dangerous?

 

There are also a host of practical problems. For example, in the animal studies on operant conditioning of visceral responses, Miller found that the animals were much easier to condition while on the drug curare. Miller suggested that this might be because without curare the skeletal responses and autonomic responses elicited by these skeletal responses may interfere with the autonomic responses that the experimenter wishes to condition. This raises the question of how effective biofeedback training of autonomic responses in humans can be without control such as that produced by curare. Another practical problem is how long a person can maintain autonomic control after he is no longer hooked up to a biofeedback device. We know that control lasts for a little while, but we don’t know exactly how long. Perhaps the subject would need occasional booster training sessions with a biofeedback device in a clinic or with a small unit at home.

 

Currently there is ongoing research on the role of biofeedback procedures in the treatment of headaches. One group (Budzynski et al., 1970) is reporting success in treating tension headaches by giving the subject biofeedback about the muscle tension in his head and neck. Learning to relax these muscles through feedback reduces the headaches. Another group (Sargent et al., 1972) has been investigating migraine headaches by combining biofeedback techniques with autogenic training. (Autogenic training is a program to learn simultaneous regulation of mental and somatic functions. Control of somatic responses is accomplished by concentrating on specific phrases such as “My feet feel heavy and relaxed.”) The biofeedback training consists of training the subject to voluntarily increase the blood flow into his hands and thus also increase hand temperature. This training seems to be an effective way of dealing with migraine headaches by decreasing the relative blood flow to the head, although the exact reasons why it works are not clear at the time of this writing.

 

The area of biofeedback training which has attracted the most publicity has been control of specific brain waves. Electrodes on the human skull record a variety of brain waves of different frequencies (EEG). Different ranges of the frequencies have been assigned different names:  delta waves are in the range of 0 to 4 cycles per second; theta designates 4 to 8; alpha, 8 to 13; and beta, more than 13. Although the brain generally emits a complex combination of different waves, the waves are often predominantly of one type which correlates with various aspects of behavior. For example, delta waves are primarily seen during sleep, whereas beta waves are seen when the person is awake and looking at things or actively thinking something through.
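
The frequency ranges just listed can be written as a small lookup function, of the kind a biofeedback device would need in order to decide which band a signal falls in. This is a minimal sketch; the function name is hypothetical, and because the ranges as stated share their end points, the handling of exactly 4 and exactly 8 cycles per second is an assumption (they are assigned to the higher band). A frequency of exactly 13 is treated as alpha, since beta is described as more than 13.

```python
def classify_brain_wave(cycles_per_second: float) -> str:
    """Name the EEG band for a frequency, using the ranges given in the text.

    Assumption: 4 is counted as theta and 8 as alpha, because the stated
    ranges overlap at their end points. 13 is alpha ("more than 13" is beta).
    """
    if cycles_per_second < 0:
        raise ValueError("frequency cannot be negative")
    if cycles_per_second < 4:
        return "delta"
    if cycles_per_second < 8:
        return "theta"
    if cycles_per_second <= 13:
        return "alpha"
    return "beta"
```

For example, `classify_brain_wave(10)` returns `"alpha"` and `classify_brain_wave(2)` returns `"delta"`. A real recording contains a mixture of frequencies, so a device would apply such a rule to the predominant frequency rather than to the raw signal.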

 

Biofeedback devices can let the subject know when his brain waves are primarily within a certain range. One device might be a tone which sounds in proportion to the amount of alpha waves the subject is generating. Through such devices a person can learn to produce specific types of brain waves. The practical applications of such brain wave control have not been adequately researched yet, but the following are some possibilities: People who have trouble relaxing might learn to generate alpha waves as part of a procedure for calming down. Insomniacs might be partially helped by learning to produce delta waves. Epileptics might be able to control their seizures to some degree by generating specific brain patterns. Chapter 8 contains a discussion of some research that is trying to increase creativity with procedures that include learning to generate theta waves. Parapsychologists speculate that it may be possible to train a person to get his mind in that specific state which lends itself best for receiving extrasensory perception.

 

The most popular wave in such experimentation has been the alpha wave, the type of wave a person would probably be generating if he sat back in a chair with his eyes closed, relaxed, and tried not to think about anything specific. This is the wave that people often generate while in meditation. Many machines (most being of inadequate quality) are being sold to people to learn how to generate alpha waves. Societies and occult groups have formed around the idea of alpha wave conditioning. Varied and often preposterous claims are being made for alpha wave conditioning; for example, that it is a short cut to deep meditation states, and that it can produce various forms of extrasensory perception, faster learning, better memory, and better physical and mental health. It is possible that alpha wave conditioning may facilitate a variety of phenomena and may even be a necessary condition for some phenomena, but it is probably not a sufficient condition for most of the phenomena attributed to it. For example, it seems improbable that alpha wave conditioning can produce the “deeper” stages of meditation that come with considerable practice in controlling concentration and training the mind. Lynch and Paskewitz (1971) have suggested that much of “alpha wave conditioning” may not be the actual conditioning of brain waves so much as learning to ignore stimuli and to stop responses which block alpha waves.

 

KNOWLEDGE OF RESULTS

 

A common form of feedback, particularly useful in education, is knowledge of results, feedback about whether the person’s response was correct or not (see Annett, 1969). Knowledge of results (KR) may or may not also contain information about what the correct response is.

 

The effects of KR on behavior have been explained in terms of simple reinforcement: On those trials in which a person was right, he is reinforced by finding out he was right; on those trials in which he was wrong, the wrong response is partially extinguished, or punished, or both. If this explanation is correct, then from our information about delay of reinforcement we would expect the KR to be most effective when given immediately after the behavior. However, this does not seem to be true, particularly when KR includes information about the correct response.

For example, Sassenrath and Yonge (1969) gave college students multiple-choice tests on material they had learned. One group received KR immediately after the test, while another group received KR 24 hours after the test. There was no difference between the groups on a retention test right after the feedback, but on a retention test 5 days later there was a small but significant difference favoring the delayed KR group. Sturges (1972) found a similar superiority for a 24-hour delay KR group, and presented evidence that the difference between the groups was due to factors operating at and/or following the feedback, rather than factors operating during the delay interval. Sturges suggests that with delayed feedback the subjects respond differently to the same feedback. For example, with immediate KR the subjects may be concerned only with that part of the KR which tells them whether or not they were right, whereas with delayed KR the subject reads through more of the KR information and hence learns more.

 

An important, yet unanswered, question for education is: What is the optimal amount of time to delay the KR? The answer probably depends on a number of variables, such as type of subjects used, the nature of the material to be learned, and the nature of the KR. In one study, More (1969) investigated the effects of four different delays of KR (knowledge of how subjects did on a retention test plus what the correct responses to each item were) on the learning of eighth-grade students. In terms of later retention he found that KR was more effective when given either 2.5 hours or one day after the first retention test than if given immediately after or four days after the first test.

 

KR may also have a motivational effect, as when the subject decides to work harder. If a student finds out that his performance on the first exam earned him a C, he may decide to study harder for the second exam. After having all her house plants die, a woman may decide to water the next plants more than once a month. After reviewing the studies done on motivational KR, Locke and associates (1968) concluded that the main effect was on the goals that the subject set for himself. They concluded that motivational KR is most effective when specific goals are set and when the goals set are difficult ones.

 

One powerful application of KR is in the area of programmed instruction, a technology originally investigated by S. L. Pressey in the 1920’s, but which got its main push from Skinner and his colleagues (Holland, 1960; Skinner, 1958). In programmed instruction, the material to be learned (the program) is presented to the student in a series of logical units (frames). The student is required to make a response to each frame, after which he is immediately told the correct answer. By this procedure the student is gradually shaped to the desired terminal behavior. The mechanical device which presents the program is called a teaching machine.

 

In programmed instruction, a student is given a small amount of material to learn and then is asked a question on the material. The student might be asked to respond in one of a number of different ways, such as writing his answer or pushing a button to indicate his choice of answers from a number of alternatives. After he has answered, the student is told the correct response (KR). The student then moves to the next frame. If his answer was wrong, he might be diverted to some other part of the program to review the material he missed.

 

Many theorists interpret the effects of KR in programmed instruction as being reinforcement. However, we have already seen that this is an oversimplification of the effects of KR, for the KR also provides for additional learning and motivational changes.

 

Programmed instruction has a number of desirable characteristics:

 

1. Because he is required to answer questions, the student is more active in his learning than he is in other learning situations, such as listening to a teacher.

 

2. The student receives continual and immediate feedback as he progresses. This procedure catches errors early, before the student can go too far in the wrong direction. Also, to the extent that KR is reinforcing, it provides a short delay of reinforcement.

 

3. The student must usually learn some material before being permitted to continue. This is often critical to later learning that presupposes some previous knowledge.

4. The student can work at his own pace. This allows for individual differences in learning rate and style and lends itself nicely to individualized instruction.

5. The programmer receives feedback about how the student is doing on various parts of the program, and can thus adjust and improve the program accordingly.

 

On the negative side, for some people, such as many college students, the format of existing programmed instruction is too constricting for the type of freewheeling, conceptual, integrative learning they prefer and learn best with. Also, it seems that some types of skills, such as problem solving, are better learned by other teaching procedures. But these criticisms may be applicable only to the types of programs that currently exist, rather than to the logic of programmed instruction itself.

 

There are basically two types of programs: linear and branching. In a linear program all students progress through the same sequence of frames. In a branching program students are routed through different sequences of frames depending on how well they do at specific points in the program. Thus if a student misses a question, he may be routed through a number of different frames that cover the same material in a slightly different way, while students who didn’t miss the question continue on to new material. Or, based on an assessment question, one student may be permitted to skip over a number of frames that cover material that he already knows.
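
The difference between the two types of programs can be made concrete with a small routing table. The sketch below is hypothetical: the frame names, the table layout, and the function `frames_visited` are assumptions made for illustration and do not describe any particular teaching machine. Each frame lists where the student goes after a correct answer and after a wrong one; in a linear program both entries would simply name the same next frame.

```python
# Each frame names the frame to visit after a correct answer and after a
# wrong one. Frame names are hypothetical. In a linear program the two
# routes from every frame would be identical.
BRANCHING_PROGRAM = {
    "frame_1": {"if_correct": "frame_2", "if_wrong": "review_1"},
    "review_1": {"if_correct": "frame_2", "if_wrong": "frame_2"},
    "frame_2": {"if_correct": None, "if_wrong": None},  # end of program
}


def frames_visited(program, first_frame, answers_correct):
    """List the frames a student sees, given whether each answer was right.

    If the list of answers runs out, remaining answers are treated as
    correct (an arbitrary choice made to keep the example short).
    """
    path = []
    frame = first_frame
    outcomes = iter(answers_correct)
    while frame is not None:
        path.append(frame)
        was_correct = next(outcomes, True)
        frame = program[frame]["if_correct" if was_correct else "if_wrong"]
    return path


student_a = frames_visited(BRANCHING_PROGRAM, "frame_1", [True, True])
student_b = frames_visited(BRANCHING_PROGRAM, "frame_1", [False, True, True])
```

Here the student who answers the first frame correctly goes straight on to new material, while the student who misses it is routed through a review frame first.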

 

Programmed instruction can be made even more flexible by the use of computers (Atkinson, 1968). This computer-assisted instruction (CAI) can handle very complex branching programs, almost instantly presenting to the student the next frame he needs based on his performance on the last frame. The computer can also record useful data pertaining to each student, such as areas of particular difficulty that a teacher might wish to attend to. The computer can keep data about the effectiveness of the program, such as which frames the students are making more mistakes on. In CAI the computer can do a host of other things as well, such as presenting visual displays on a screen, turning on slides and movies, and presenting audio messages through headphones to the student.
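
The record keeping described here amounts to tallying errors in two ways, by frame and by student. The following is a minimal sketch using Python's standard `collections.Counter`; the response log, its layout, and the variable names are invented for illustration and are not taken from any actual CAI system.

```python
from collections import Counter

# Hypothetical response log: (student, frame, answered_correctly).
response_log = [
    ("student_a", "frame_1", True),
    ("student_a", "frame_2", False),
    ("student_b", "frame_1", False),
    ("student_b", "frame_2", False),
    ("student_c", "frame_2", True),
]

# Mistakes per frame tell the programmer which frames need revision.
errors_by_frame = Counter(
    frame for _, frame, correct in response_log if not correct
)

# Mistakes per student tell the teacher who may need extra attention.
errors_by_student = Counter(
    student for student, _, correct in response_log if not correct
)

# Frames ordered from most to fewest mistakes.
hardest_frames = [frame for frame, _ in errors_by_frame.most_common()]
```

In this made-up log, `frame_2` draws the most mistakes, so it is the first candidate for rewriting.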

 

Throughout this chapter we have discussed a wide range of effects that feedback can have. Feedback may produce one or more of the following effects:

1. The feedback can be a reinforcement or a punishment.

 

2. The feedback can produce changes in motivation, such as changes in the goals a person sets for himself.

 

3. Feedback may provide informative cues that guide learning and performance, such as discriminative cues.

 

4. Feedback may provide a new learning experience or a rehearsal of previous learning.

 

SUMMARY

 

Feedback is the input to an organism resulting from a response of the organism. It includes both the sensory input from the muscles involved in making the response and information about how the environment was changed as a result of the response. Basic behaviors, such as walking and simple muscular control, require proprioceptive feedback from the muscles. Normal speech depends on auditory feedback (the person’s hearing his own voice) plus feedback from speech structures, including the tongue. Visual feedback is involved in tasks such as driving, writing, and drawing. T-groups involve social feedback in which the participants provide information about the way they perceive and feel about each other.

 

S-R theorists often describe feedback as a source of stimuli to which new responses can be conditioned. An example of this is chaining, a sequence of responses in which each response provides part of the stimulus cues (feedback) for following responses. Chaining is the process by which a rat learns a complex sequence of behaviors and a person learns a poem by rote. For many S-S theorists feedback provides information about which behavior is appropriate. TOTE units are an example of this approach. According to the ideo-motor theory, responses are chosen on the basis of their anticipated feedback. Overall, feedback may be a reinforcement or a punishment; it may produce changes in motivation, provide discriminative cues, or provide a new learning experience or rehearsal of previous learning.

 

Operant conditioning is the study of the effects on responses of the events that are contiguous with them. Perceiving these events, then, is feedback about the effects of the behavior. If the contiguous event makes it more probable that the response will occur again in a similar situation, the event is called a reinforcement. If the event makes the response less probable, the event is a punishment. Terminating the relationship between the contiguous event and the behavior results in extinction. There are many ways to get a response to occur initially so that it can be operantly conditioned, three of the more popular being shaping, modeling, and fading.

 

For optimal learning, the delay of reinforcement (the time between the behavior and the reinforcement) should generally be as short as possible. Continuous reinforcement means that every correct response is reinforced, whereas with intermittent reinforcement only some of the responses are reinforced. Generally original learning is faster under continuous reinforcement than under intermittent, and extinction takes longer with intermittent reinforcement.

 

There is no consensus on exactly how reinforcement functions. For example, does reinforcement affect learning or only performance? Drive reduction theories of reinforcement suggest that animals learn those responses which reduce drives, such as a hunger drive. Drive induction theories stress that animals learn those responses which arouse motivation. The Premack theory is based on the idea that high probability responses can reinforce low probability responses. On the physiological level it has been shown that electrical stimulation of certain brain areas in man and other animals produces reinforcement. These reinforcement areas are often brain areas related to biological needs such as hunger and to species-typical behaviors. Some theorists thus argue that these are the same brain areas that underlie the effects of more conventional forms of reinforcement, such as food and water. However, other theorists have suggested a number of differences between reinforcing brain stimulation and the effects of the other types of reinforcement. These differences may exist because the reinforcing brain stimulation activates both a reward system and a motivation system, or because it results in internal stimuli having more control over the response than would be the case with the conventional reinforcements.

 

Punishment, by definition, reduces the probability of the response that precedes it. In addition, the punishing event elicits many responses, including emotional responses, which may become conditioned to the situation in which the punishment occurred. The offset of the punishment may function as a reinforcement for behaviors such as escape behaviors. Because of effects such as these, most forms of punishment are usually not the most desirable way to change human behavior.

 

Behavior modification often involves the reinforcement of desired behaviors and the extinction and/or punishment of undesired behaviors. One example of this approach is contingency contracting, in which there is a formal agreement among people about the reinforcement contingencies and the required behaviors. Contingency contracting provides for the systematic application of operant conditioning, builds more consistency into people’s behavior, may cut down on the delay of reinforcement, and provides a structure for individualizing behavior modification programs. A token economy is a form of contingency contracting in which the person is rewarded with tokens that can later be exchanged for various reinforcements.
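The bookkeeping behind a token economy can be sketched as a simple ledger. The Python below is a minimal illustration, not a description of any actual program: the class, its method names, and the example behaviors, token amounts, and prices are all hypothetical.

```python
class TokenEconomy:
    """A toy ledger for one person in a token economy (hypothetical design)."""

    def __init__(self, tokens_per_behavior, token_prices):
        # tokens_per_behavior: required behavior -> tokens earned (the contract)
        # token_prices: reinforcement -> tokens needed to obtain it
        self.tokens_per_behavior = dict(tokens_per_behavior)
        self.token_prices = dict(token_prices)
        self.balance = 0

    def record_behavior(self, behavior):
        """Award tokens for a behavior named in the contract; others earn none."""
        earned = self.tokens_per_behavior.get(behavior, 0)
        self.balance += earned
        return earned

    def exchange(self, reinforcement):
        """Trade tokens for a reinforcement; return True if the exchange succeeded."""
        price = self.token_prices[reinforcement]
        if price > self.balance:
            return False  # not enough tokens yet
        self.balance -= price
        return True


if __name__ == "__main__":
    # Invented example contract: 2 tokens per completed assignment,
    # and 3 tokens buy one period of free time.
    economy = TokenEconomy({"completed assignment": 2}, {"free time": 3})
    economy.record_behavior("completed assignment")
    print(economy.exchange("free time"))  # balance is 2, price is 3
    economy.record_behavior("completed assignment")
    print(economy.exchange("free time"))  # balance is 4, price is 3
    print(economy.balance)
```

Writing the contract down as explicit tables mirrors the point made above: the contingencies are agreed on in advance and applied consistently, and the tokens can be delivered immediately even when the final reinforcement comes later.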

 

Recent research has shown that it is possible to condition animals to alter visceral responses such as heart rate, intestinal contractions, urine formation by the kidney, and blood flow to specific body areas. This suggests that the keys to understanding the genesis and treatment of many psychosomatic illnesses may lie within operant conditioning. Related to this research is the work with biofeedback, in which, via mechanical devices, humans are given feedback about the activity of body systems which usually provide little or no feedback. Through biofeedback procedures, people may learn voluntary control over heart rate, blood pressure, specific brain waves, and headaches.

 

A common form of feedback, one which is particularly important in education, is knowledge of results (KR): feedback about whether a person’s response was correct or not. One important question is what the optimal delay of knowledge of results is. For example, for optimal long term learning, how long after a test should a student be given feedback about his performance on the test? Programmed instruction involves giving the student immediate knowledge of results as he actively works his way through a structured program designed to systematically shape his learning behavior.

 

SUGGESTED READINGS

 

Annett, J. Feedback and Human Behavior. Baltimore: Penguin, 1969.

Barber, T. X., DiCara, L. V., Kamiya, J., Miller, N. E., Shapiro, D., & Stoyva, J. (eds.) Biofeedback and Self-Control, 1970. Chicago: Aldine Atherton, 1971.

Honig, W. K. (ed.) Operant Behavior: Areas of Research and Application. New York: Appleton-Century-Crofts, 1966.

McGinnies, E., & Ferster, C. B. (eds.) The Reinforcement of Social Behavior. Boston: Houghton-Mifflin, 1971.

Reynolds, G. S. A Primer of Operant Conditioning. Glenview, Ill.: Scott, Foresman, 1968.

Skinner, B. F. Science and Human Behavior. Toronto: Macmillan, 1953.

Skinner, B. F. Walden Two. Toronto: Macmillan, 1948.

Tapp, J. T. (ed.) Reinforcement and Behavior. New York: Academic Press, 1969.

Whaley, D. L., & Malott, R. W. Elementary Principles of Behavior. New York: Appleton-Century-Crofts, 1971.

Williams, J. L. Operant Learning: Procedures for Changing Behavior. Belmont, Calif.: Wadsworth, 1973.