A thermostat is a
mechanism that utilizes feedback. When the temperature goes below some value,
the thermostat turns on the furnace. The furnace then stays on until the
thermostat receives feedback that the temperature has reached the desired
point. Without such a feedback device you would have to manually turn the furnace
on and off. In 1948 Wiener argued that the logic of feedback theory, as it had
been developed with machines such as the thermostat, could be applied to other
areas, such as biology, neurophysiology, and psychology. Wiener used the term cybernetics for the application of such an approach to machines and animals.
Feedback is now a key psychological concept from the level of simple muscle
control to complex group interactions.
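The thermostat's feedback loop can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, the two temperature settings (18 and 20 degrees), and the heating and cooling rates are assumptions invented for the example.

```python
def furnace_should_run(temperature, furnace_on, turn_on_below=18.0, desired=20.0):
    """Decide the furnace's next on/off state from the fed-back temperature."""
    if temperature < turn_on_below:
        return True        # too cold: turn the furnace on
    if temperature >= desired:
        return False       # desired point reached: turn the furnace off
    return furnace_on      # in between: keep doing whatever it was doing


def simulate(start_temp, steps, heat_gain=0.5, heat_loss=0.3):
    """Run the loop: each step the thermostat reads the temperature and reacts."""
    temperature, furnace_on = start_temp, False
    history = []
    for _ in range(steps):
        furnace_on = furnace_should_run(temperature, furnace_on)
        # Hypothetical room: warms while the furnace runs, cools otherwise.
        temperature += heat_gain if furnace_on else -heat_loss
        history.append((round(temperature, 1), furnace_on))
    return history


for temperature, furnace_on in simulate(17.5, 12):
    print(temperature, "furnace on" if furnace_on else "furnace off")
```

Without the feedback argument `temperature`, the function would have no basis for switching, which corresponds to turning the furnace on and off by hand.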
One important type
of feedback, called proprioception,
comes from the muscles.
Simple walking involves a complex set of muscle movements that require feedback
about where different muscles are and what they are currently doing. With
your eyes shut you should be able to touch your index fingers together, regardless
of where your hands start from. This requires proprioceptive feedback. The
disease tabes dorsalis blocks proprioceptive feedback from the limbs.
A person with this disease may have poor control over voluntary movement of
the limbs and might not be able to do the above finger touching with his eyes
closed.
Human speech utilizes
a large number of different feedback mechanisms. During speech there is constant
feedback about the spatial position, direction of movement, and velocity of
movement of the various structures involved in speech, particularly the tongue
(Sussman, 1972). As Sussman points out, “any attempt to explain how the tongue
signals the higher brain centers concerning its highly complex positional
adjustments during speech activity must necessarily incorporate a rapidly
acting, highly discriminative, and comprehensively informative neurosystem.”
The complexity of such a feedback system is mentioned by Sussman in the case
of a man speaking with a pipe clenched between his teeth. Here the entire
muscular movement patterns of the tongue and lips must compensate for non-moving
jaws.
Hearing your own
voice is another source of feedback for normal speech. One way of demonstrating
this is with delayed auditory
feedback (Yates, 1963).
Here the subject hears his voice while talking, but instead of reaching him
immediately, the sound is electronically delayed a fraction of a second. For example,
while counting from one to ten you might hear “two” while saying “three.”
Talking during delayed auditory feedback is quite difficult. While counting
you might repeat the same number several times, and speech is generally slower,
and contains many more errors. A similar phenomenon occurs with people giving
speeches in a large hall. Sometimes a speaker in such a situation will hear
his own speech slightly delayed, as from his amplified voice bouncing back
to him from a far wall, and this effect may severely impair his speech. Rock
musicians will often have small speakers on the stage with them for immediate
feedback of their sound. Otherwise their timing might be disrupted from hearing
their music delayed as it rebounds off walls. On the other hand deaf people
often show peculiar inflections and intonations in their speech because they
lack auditory feedback.
Smith (1972) has
shown similar disruption in performance with delayed visual feedback. Subjects
had to track a moving geometrical figure with a wand. However, they could
not directly see their hands or the objects. Rather they saw what they were
doing on a television monitor which could delay the visual feedback. As the
delay increased (from 17 to 820 msec.), performance decreased. With the intermediate
delay times, about 250 msec., the subjects often reported that their arm and
hand movements had a peculiarly “rubbery” appearance and feeling.
Adams (1968) has
summarized some of the ways in which feedback is incorporated with learning
as follows. S-R learning theorists conceptualize the learning process as associations
formed between stimuli (S) and responses (R), whereas S-S theorists view learning
in terms of associations between stimuli, such as stimulus relationships in
the environment (see discussion of S-R and S-S theories in Chapter 1). For
the S-R theorist, feedback, such as proprioception, is a source of stimuli.
New responses can be learned to these feedback stimuli and/or the stimuli
may become conditioned reinforcers. This feedback provides the basis for chaining,
a sequence of responses in which the occurrence
of one response provides part of the cues for the following response. One
rat named S. R. Rodent was taught the following chain of behaviors. Rodent
first had to climb a spiral staircase, then run across a drawbridge, climb
a ladder, get into a cable car and pull himself across a gap, climb another
stairway, play a toy piano, run through a tunnel, climb into an elevator and
pull a chain to start it, ride to the bottom floor, and then press a bar to
receive pellets (Bachrach, 1964). This chain of behaviors was conditioned
into the rat by starting at the end (pressing the bar) and moving backward.
In the final chain each component of the chain occurs to two sets of stimuli,
the stimuli of the apparatus and feedback stimuli from the previous behavior.
Thus Rodent’s behavior of climbing the ladder provides feedback stimuli which
help lead to the next response of getting into the cable car. The feedback
stimuli might also become conditioned reinforcers that then reinforce that
member of the chain. Similarly in learning to say a long poem from memory,
the feedback from saying one line may facilitate remembering and saying the
next line.
For S-S theorists,
according to Adams, feedback provides information about the proper conditions
for the behavior. Feedback stimuli feed into the whole stimulus complex to
which the animal makes some responses. These approaches are often cognitive
in nature, and one may interpret learning as being primarily perceptual. Feedback
cues are often thought of as information about which behaviors are appropriate,
rather than stimuli that elicit responses.
Adams favors a closed-loop theory of behavior in which “the consequences of a
response with sufficient habit strength to occur are fed back and compared
with a reference which is the desired value for the system. Any difference
between a reference and its response feedback is error, and the detection
of errors results in a response sequence that can lead to error nulling.”
In other words the animal has a certain goal to attain, and makes responses
in that direction. Feedback following the responses provides information about
whether the goal has been reached or what responses might now be appropriate
toward reaching the goal.
Miller, Galanter,
and Pribram (1960) proposed a closed-loop analysis of behavior based on TOTE
units, as opposed to S-R units. TOTE stands for
Test-Operate-Test-Exit. The “test” is an analysis of information, mostly sensory
data, about any incongruities between the current state of affairs and some
goal. If there is some incongruity, the animal responds, or “operates.” After
it operates, the animal again tests. If there is still an incongruity, it
operates again, and then tests again. When
the test finally shows no incongruity, the animal exits and stops this one
type of behavior.
Figure 6-1 shows a
basic TOTE unit related to hammering a nail until it is flush with the surface.
During the test the person inspects whether the nail is flush. If the nail is
not flush, then he hammers (operate). If the nail is flush, he no longer
hammers this nail (exit).
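The Test-Operate-Test-Exit cycle maps naturally onto a loop, and a minimal Python sketch may make the control flow concrete. The names `run_tote` and `Nail`, the nail height, and the 4 mm driven per strike are hypothetical choices for illustration; Miller, Galanter, and Pribram did not express the unit as code.

```python
def run_tote(test, operate, max_operations=100):
    """Test; while an incongruity remains, operate and test again; then exit."""
    operations = 0
    while test():                        # Test: is there still an incongruity?
        if operations >= max_operations:
            raise RuntimeError("goal not reached")
        operate()                        # Operate
        operations += 1
    return operations                    # Exit: no incongruity left


class Nail:
    """Hypothetical nail whose head starts some millimeters above the surface."""

    def __init__(self, height_mm):
        self.height_mm = height_mm

    def sticks_up(self):
        return self.height_mm > 0        # incongruity: the nail is not flush

    def hammer(self):
        self.height_mm = max(0, self.height_mm - 4)   # assumed 4 mm per strike


nail = Nail(10)
strikes = run_tote(nail.sticks_up, nail.hammer)
print(strikes, nail.height_mm)
```

Because `test` and `operate` are passed in as arguments, the `operate` step of one unit could itself call `run_tote` on another, which is one way to picture TOTE units working together.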
A TOTE unit is an
example of a closed-loop mechanism since the organism tests the outcome of
its behavior against a reference until there is no error. Complex human behavior,
of course, can seldom be explained in terms of a simple TOTE unit. Rather,
Miller, Galanter, and Pribram showed how behavior might be explained in terms
of a number of interrelated TOTE units. They called such a set of TOTE units
a plan. A plan may involve several TOTE units functioning simultaneously, as well
as TOTE units that work sequentially.
Another type of feedback
theory is ideo-motor theory,
originally proposed by
William James in 1890 and propounded more recently by Greenwald (1970). According
to ideo-motor theory, a response is selected on the basis of its own anticipated sensory feedback. In this theory a perceptual image or idea of an action
initiates the performance. William James argued that the mere thought of a
movement “awakens in some degree” the actual movement. (Think about swinging
a golf club or riding a bicycle and note tendencies in the related muscles
toward movement.) The only thing that stops the actual movement is inhibitory
influences from other sources, such as other thoughts. Thus the simple thought
of an action results in anticipation of its own sensory feedback which in
turn helps to determine which behavior will finally occur.
Feedback is the key
ingredient in sensitivity
training groups, also called
T-groups (see Aronson, 1972, Chap. 8). In a T-group a
number of people get together with a trainer to learn more about how their
behavior affects other people. The discussions of the group center on a current
analysis of the social dynamics of the group itself. Each member learns how to
provide feedback to other members of the group about how he feels and is
affected by their specific behaviors. Through such feedback each member can
find out what other people really think and feel about different behaviors.
Feedback of this nature is useful to some people who do not ordinarily receive
it, either because of selective perception on their part or because people are
not giving them this feedback. It can be a useful source of information which
may or may not provide cues and consequences that will affect later
performance.
For feedback in a
T-group to be most useful it should usually meet the following requirements.
It should describe the speaker’s feelings and reactions, not simply make evaluations.
It should provide specific examples, rather than generalities. It should not
just be dumped on the person, but should be presented at a time and in
a way that are most useful to the receiver.
T-groups, of course,
include much more than feedback, but feedback is probably the major objective.
The research on T-groups is currently grossly inadequate. For example, it
would be interesting to analyze T-groups in terms of modeling and reinforcement.
The types of skills
and behaviors a person learns in interacting with members of a T-group may
or may not be useful behaviors in dealing with people outside of the T-group,
particularly if there is not a good transition between the T-group and the
rest of the world. Too many T-groups merely provide the person with another
reference group manipulating his behaviors, while denying any such manipulation.
A second problem is that feedback is not always a very powerful change mechanism.
The feedback may convince the person that he wishes to act differently (more
assertively, for example), but he might not have the desired skills in his
behavioral repertoire. Other change procedures (e.g., assertive training)
then might be useful.
Occasionally the
consequence of some response or behavior will make it more or less probable
that the response will occur again in similar situations. If a hungry rat
presses a bar and receives food, the consequence of the food will usually make
it more probable that the rat will press the bar again. If, however, the
bar-press yields electric shock to the rat’s feet, the consequence of the shock
will make it less probable that the rat will bar-press again.
Operant conditioning,
also called instrumental conditioning and type R learning, is
the study of the effects of contingent
(contiguous) events on the
behaviors they follow. A contingent event is a dependent event: it
occurs if and only if a specified behavior occurred first. In the case of
the rat bar-pressing for food, the food appears if and only if the rat presses
the bar. If the contingent event makes it more probable that the response
will be repeated, then the event is called a reinforcement. If the contingent
event makes the behavior less probable, the event is called a punishment. It should be noted that the concept of “contingency” has been defined
in various ways by operant conditioners. Some theorists (e.g., Schoenfeld
& Farmer, 1970) argue that “contingency” should imply more than simple
contiguity, perhaps some form of causal relationship between the distribution
of responses and the distribution of contingent events.
Figure 6-2 shows
the temporal sequence in operant conditioning. In the presence of certain
antecedent stimuli the animal makes some response. Contingent on this response
is some event. Following this event there may be a change in the probability
of the response re-occurring in the presence of the antecedent stimuli. Such
contingent events then are clearly a form of feedback, for they inform the
animal about the consequences of his behavior.
The contingent event
may increase or come on (positive) following the response, or it may decrease
or go off (negative). Also, the contingent event may increase the probability
of the response (reinforcement) or decrease the probability (punishment). This
yields the following four combinations: positive reinforcement, negative
reinforcement, positive punishment, and negative punishment.
Positive reinforcement
is an event whose increase
results in an increase in the probability of the response it is contingent
on. The rat increases his probability of bar-pressing because each
bar-press increases the amount of food present. A child cries at
bedtime if this ensures that his parents will read him a story.
Negative reinforcement
is an event whose decrease
results in an increase in the probability of the response it is contingent
on. A rat will increase his
probability of bar-pressing if each bar-press decreases
the electric shock in the grid floor. A person
begins taking a different route home because he learns it decreases the traffic
he encounters. Note that a very common error is for people to confuse negative
reinforcement with punishment. Remember that negative reinforcement increases response probability, while punishment decreases it.
Positive punishment
is an event whose increase
results in a decrease in the probability of the response it is contingent
on. The rat decreases his probability of bar-pressing if each bar-press
increases the amount of electric shock in the grid floor.
A student stops answering questions in class if his answers are met with derision.
Negative punishment
is an event whose decrease results in a decrease
in the probability of the response it is contingent on. A rat might decrease his probability of bar-pressing if each bar-press decreases the supply of something he likes, such as the amount of available food.
A child stops yelling if each time he yells his television program is turned
off for 10 seconds.
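The four definitions reduce to two independent questions: did the event increase or decrease, and did the response become more or less probable? A minimal Python sketch of that two-by-two classification follows. The function name and the string labels are assumptions made for illustration, and the example rows paraphrase the cases given above.

```python
def classify_contingent_event(event_change, probability_change):
    """Name a contingent event from two observations.

    event_change: "increase" or "decrease" of the event after the response.
    probability_change: "increase" or "decrease" in response probability.
    """
    sign = {"increase": "positive", "decrease": "negative"}[event_change]
    effect = {"increase": "reinforcement", "decrease": "punishment"}[probability_change]
    return f"{sign} {effect}"


examples = [
    ("food per bar-press", "increase", "increase"),
    ("shock per bar-press", "decrease", "increase"),
    ("derision of answers", "increase", "decrease"),
    ("television after yelling", "decrease", "decrease"),
]
for event, event_change, probability_change in examples:
    print(event, "->", classify_contingent_event(event_change, probability_change))
```

Note that "negative" describes only what happens to the event, never what happens to the response, which is why negative reinforcement and punishment come out as different cells.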
Figure 6-3 shows
the relationships between these four types of contingent events. Note that
the onset and offset of the same event may function differently depending
on what behaviors they are contingent on. Thus if the onset of a pleasant
event can be a positive reinforcement, the offset of the event can usually
be a negative punishment. Similarly, if the onset of an aversive event can
be a positive punishment, the offset can be a negative reinforcement. These
relationships correspond to the diagonals of Figure 6-3. It should also be
noted that increasing the probability
of a response does not necessarily increase
the rate or magnitude of the response
when it occurs; it merely increases the probability of it occurring. For example,
a person might receive positive reinforcement for talking more slowly. Here
the positive reinforcement increases the probability of the behavior of speaking
more slowly. Similarly, decreasing the probability of a response does not
necessarily decrease its rate or magnitude when it occurs.
Operant extinction,
like respondent extinction,
results from terminating a contingency. In operant extinction we terminate
the contingency between the response of the animal and the following event.
For example, the rat that had learned to bar-press for food could be put on
extinction by ensuring that its bar-presses no longer produced food. The rat’s
behavior would be considered extinguished when it no longer pressed the bar
at a rate higher than its baseline level, that is, the rate at which it pressed the bar before
food was made contingent on bar-pressing. Or a child in a classroom might
kneel in his chair to get the teacher’s attention, but stop doing it after
the teacher no longer responded to this behavior. An extinguished operant
response may show spontaneous
recovery, an increase in the
probability of the extinguished response following a period of time.
In respondent conditioning
we speak of the CS as eliciting the CR. The
CS forces the animal to make the CR. In operant conditioning the animal is
said to emit the response in the presence of the antecedent
stimuli. That is, these stimuli do not elicit the response. The rat does not
immediately press the bar the instant he is put in the operant chamber. Rather
the antecedent stimuli “set the occasion” for the operant response. Some people
equate “elicited” with “involuntary” and “emitted” with “voluntary,” but this
is not necessarily true. B. F. Skinner, dean of current operant conditioners,
would argue that an operant response is as determined and involuntary as any
respondent response. For any operant response there is a sequence of stimuli
that causes the response to occur, as the CS causes the CR to occur. But the
operant stimuli are not as easy to identify as the CS. Nor, for most practical
purposes, is it important to be able to identify them. In operant conditioning
we often have satisfactory prediction and control merely through manipulation
of the contingent events.
Occasionally some
of the antecedent stimuli, through learning, develop a particularly strong
control over the operant behavior. These stimuli are then called discriminative stimuli, and the behavior they control is called a discriminative operant. For example, a rat might be trained that when
the left light is on, bar-pressing yields food, whereas when the right light
is on, a bar-press produces no food. If the left light is on for only a short
time, we will probably find our hungry rat will learn to hurry to the bar
when the left light is on and generally avoid the bar when the right light
is on. Here we would say that the discriminative stimulus of the left light
sets the occasion for the discriminative operant of pressing the bar. The
discriminative stimulus is often abbreviated SD while the other stimuli, such
as the right light, are abbreviated SΔ (S-delta).
An operant conditioner
might decide to reinforce a particular behavior that he wishes to occur more
often. But what happens if the behavior never occurs in the first place? Or what
if the behavior occurs, but only very infrequently? In these situations, it
is desirable to find some way to get the behavior to occur for operant conditioning.
There are many ways to do this, the three most popular of which are known
by the terms shaping, modeling, and fading.
Shaping, also called successive approximation, consists of reinforcing behaviors that gradually approximate the desired behavior. If you want a rat to press
a bar, you don’t wait until it presses the bar to reward it. Rather you shape
it to press the bar by first reinforcing it just for being in that half of
the apparatus where the bar is. Next it has to be within a certain area of
the bar to be reinforced, then it has to touch the bar, then put its paw on
the bar, and finally it has to press the bar. In practice, shaping is more
fluid and less discrete than this description, but the approach is the same.
It is not unusual for a skilled shaper to have a naive rat bar-pressing within
15 minutes of the time the rat is put in the test apparatus.
Similarly, if you
were working with a chronic catatonic who has not talked in ten years, it
would be a poor operant program to wait until he said a sentence to reinforce
him. Rather you must gradually shape him to talk. Perhaps you would first
reward him just for blowing a little air out of his mouth, and from there
slowly move on to getting him to produce simple sounds.
As a further illustration,
take the case of a secretary who is always 15 minutes late for work. If you
decide to reward her with praise on the day when she does come in on time,
you might have a long wait. Rather you should use shaping; i.e., reward her
closer approximations to being on time.
Modeling, as
mentioned in the first chapter, is often a quick way of first getting a response
to occur for operant conditioning; this method often works much faster than
shaping. A rat that watched another rat press a bar may imitate or model the
other rat and have a tendency to press the bar itself. With humans it is often
easy to demonstrate the behavior we wish and reward the modeling.
Fading is
keeping the behavior the same while gradually changing the stimuli. Thus fading
consists of approximations on the stimulus side, whereas shaping consists
of approximations on the response side. As an example of fading, a pigeon
might be trained to peck a disc to the stimulus of a blue square. If it is
now shown a red circle, the pigeon might not peck the disc, as the red circle
is too different from the blue square. But if we keep rewarding the disc-pecking
as we gradually change, or fade, the blue square stimulus into a red circle
stimulus, we can eventually have the pigeon pecking to the red circle without
ever having lost the original disc-pecking.
Principles of fading
become important when we wish to transfer learning from one situation (e.g.,
school room, clinic) into a new situation (e.g., home). Here it is often useful
to provide some transitions between the different settings.
As
a general rule reinforcements are most effective when they occur immediately
after the behavior (see Renner, 1964). The time between the response and the
reinforcement is called the delay
of reinforcement, and short
delays usually produce better learning than long delays. As the delay of reinforcement
increases, the animal often must find ways of mediating the time. A rat may
learn some behavior, such as chewing on the food dish, which mediates the
time. Humans often use language, both out loud and internalized as thoughts,
as mediating behavior. Also, the presence of conditioned reinforcers during
the delay period may help to support mediating behaviors. A problem with long
delays of reinforcement is that they generally allow for many intervening events
to become associated with either the response or the reinforcement, and such
associations may interfere with the response-reinforcement learning. Experiments
designed to minimize such interfering associations permit learning with longer
delays of reinforcement. For example, rats in a simple two-choice learning
task learned the correct response with reinforcement delays of up to 8 minutes
if they were removed from the test apparatus during the delay period (Lett,
1973).
In practical settings
we often find behaviors more affected by sources of immediate reinforcement
than by events that are temporally distant. The alcoholic’s drinking behavior
(which may involve physiological addiction) is often more affected by the
immediate reinforcing effects of drinking (e.g., reduction in anxiety, social
approval, good feelings) than by the longer term punishing effects of having
been drunk (e.g., hangovers). Similarly, a graduate student who has only to
complete his thesis may often find that the
more immediate rewards associated with play affect his daily life more powerfully
than the long range rewards associated with the completion of his thesis.
The long range rewards may be substantially stronger than the immediate rewards,
but this is often more than offset by the delay of reinforcement and the person’s
experience of working under long delays.
Thus, many programs
geared toward altering human behavior involve providing reinforcements with
short delays and/or taking existing reinforcers with long delays and building
in mediating behaviors and conditioned reinforcers. This cutting down on long
delays will be seen later when we discuss contingency contracting.
We have been discussing
reinforcement as if it occurred after every correct response, but this is
not necessarily the case. For example, we could arrange to reinforce only
every third response. The pattern by which reinforcements are related to responses
is called the schedule of
reinforcement (see Ferster
& Skinner, 1957; Schoenfeld, 1970; Thompson & Grabowski, 1972). There
are two general types of schedules: (1) continuous reinforcement (CRF), in which every correct response is reinforced; and (2) intermittent reinforcement, in which only some of the correct responses
are reinforced. Generally, original learning is faster with continuous reinforcement,
but the number of trials before extinction occurs is larger under intermittent
reinforcement. This longer time to extinction with intermittent schedules
is called the partial reinforcement effect. There are basically four types of intermittent schedules: fixed ratio,
variable ratio, fixed interval, and variable interval.
A fixed ratio schedule means that the animal must make a fixed number of responses before
being reinforced. Thus a rat on an FR-5 schedule must make 5 bar-presses before
being reinforced. By gradually increasing the ratio, an animal can be trained
to make an enormous number of responses for a single reinforcement. Fixed
ratio schedules correspond to piecework pay. A laborer paid for every 3 items
he produces is on an FR-3 schedule.
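A fixed ratio schedule amounts to a counter that resets at each reinforcement. The Python sketch below illustrates this under the assumption that counting restarts from zero after every reinforced response; the class name `FixedRatio` and its method are hypothetical.

```python
class FixedRatio:
    """Reinforce every `ratio`-th response (FR-5 means every fifth response)."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0          # counting starts over after reinforcement
            return True
        return False


fr5 = FixedRatio(5)
outcomes = [fr5.respond() for _ in range(10)]
print(outcomes)   # only the 5th and 10th bar-presses are reinforced
```

Continuous reinforcement is the special case `FixedRatio(1)`, in which every response is reinforced.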
A variable ratio schedule is the same as the fixed ratio except
that the number of responses required each time varies around some average.
Thus a rat on a VR-9 schedule must press, on the average, about 9 times before
being reinforced. However, one time he might press only twice and another
time might require 12 presses. VR schedules often result in very long times
to extinction. Consider a man playing roulette in Las Vegas and only betting
one number each time. He is on a VR-38 schedule since 38 different numbers
come up randomly. Thus on the average he wins once for each 38 bets (and is
paid only 35 to 1 odds), but he might win twice in a row or go 200 times without
a win. However, with no one influencing the wheel, the long term average will
be about 1 in 38. A behavior maintained under such a schedule is, of course,
difficult to extinguish, and this is one of the variables feeding into gambling
fever.
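The arithmetic behind the roulette example can be checked directly. The sketch below assumes, as in the text, 38 equally likely numbers and a 35 to 1 payoff on a one-unit bet; the variable names are chosen only for illustration.

```python
from fractions import Fraction

p_win = Fraction(1, 38)      # one winning number out of 38
payoff = 35                  # units won on a one-unit bet

# Average net result per bet: win 35 with probability 1/38, lose 1 otherwise.
expected_net = p_win * payoff - (1 - p_win) * 1

# Chance of two wins in a row, and of 200 straight bets without a win.
p_two_in_row = p_win ** 2
p_no_win_200 = float((1 - p_win) ** 200)

print(expected_net)          # -1/19, about 5 cents lost per dollar bet
print(p_two_in_row)          # 1/1444
print(round(p_no_win_200, 4))
```

Under these assumptions both extremes mentioned in the text are rare but possible: a 200-bet losing streak has a probability of roughly half of one percent, and the bettor loses on average about one nineteenth of each bet.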
A fixed interval
schedule is one where the animal is reinforced for the first correct response it
makes after some period of time has passed. Thus a rat on an FI-1-minute schedule
will be reinforced for the first response he makes after one minute has passed.
Responses made before this time is up will have no effect.
A variable interval schedule is the same as the fixed interval except
that the amount of time varies from trial to trial around some average. A
rat on a VI-1-minute schedule is reinforced for the first response he makes
after some period of time. This period may be different each time, but will
average out to about one minute. VI schedules often produce some of the most stable
responding, since the animal can’t “figure out” when to respond and when not
to respond. VI schedules often produce very long times to extinction. Thus
people who want to build in a strong behavior often start their animal or
human subject on CRF and gradually phase them onto a VI schedule.
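The two interval schedules differ only in how the next waiting period is chosen, which a short Python sketch can show. Several details here are assumptions made for illustration: the class names, timing each interval from the previous reinforcement, and drawing variable intervals uniformly between zero and twice the average.

```python
import random


class FixedInterval:
    """Reinforce the first response made once `interval_s` seconds have passed."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.available_at = self._next_interval()   # clock starts at time 0

    def _next_interval(self):
        return self.interval_s

    def respond(self, now_s):
        """Register a response at time `now_s`; return True if reinforced."""
        if now_s < self.available_at:
            return False                             # too early: no effect
        self.available_at = now_s + self._next_interval()
        return True


class VariableInterval(FixedInterval):
    """Same rule, but each interval varies around the average `interval_s`."""

    def __init__(self, mean_interval_s, rng=None):
        self.rng = rng or random.Random()
        super().__init__(mean_interval_s)

    def _next_interval(self):
        # Assumed distribution: uniform on [0, 2 * mean], so the mean is kept.
        return self.rng.uniform(0, 2 * self.interval_s)


fi = FixedInterval(60)                               # FI-1-minute
fi_outcomes = [fi.respond(t) for t in (10, 30, 59, 61, 70, 121)]
print(fi_outcomes)

vi = VariableInterval(60, random.Random(0))          # VI-1-minute
print(vi.respond(120))
```

On the fixed schedule the early responses at 10, 30, and 59 seconds accomplish nothing, whereas on the variable schedule the animal has no way of knowing which responses are too early.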
These four intermittent
schedules can be combined in various ways, such as requiring the animal to
respond first on VR-5, then FI-2 (minutes), then VR-5, and so forth. There
are also many other schedules, too numerous to be fully discussed here.
The simplest
approach to reinforcement is to define it operationally: An event which when
contingent on a response increases the probability of the response is a
reinforcement. This has a touch of circularity to it in that an event is
identified as a reinforcement after it functions as a reinforcement. This
circularity can be overcome by showing that the reinforcement is trans-situational. That is, it is possible to demonstrate that
the event which functions as a reinforcement in one situation also functions as
a reinforcement in quite different situations. (One problem is what constitutes
a “different” situation.) At the empirical, operational level there is fairly
good consensus about the properties
of reinforcement. However,
at the theoretical level there is little consensus about the nature of reinforcement.
A major theoretical
issue is whether reinforcement affects learning or only performance. Theorists
who hold that reinforcement affects learning (e.g., Thorndike and Hull) argue
that the reinforcing event somehow facilitates the learning process or strengthens
the learned association. For example, Landauer (1969) assumes that learning
is by contiguity and that reinforcement facilitates the consolidation of
the learning. To Landauer, a reinforcement is any event that strengthens learning,
such as contingent food or CS-UCS pairings.
On the other hand
some theorists (e.g., early Tolman) hold that reinforcement affects only performance,
and not learning. Such theorists often think of the reinforcement event as
being an incentive, an event that the animal is motivated to try
to acquire, rather than an event which strengthens learning. Bolles (1972)
argues that contingent reinforcement is neither a necessary nor a sufficient
condition for operant learning. Bolles’ expectancy theory of learning states
the following primary law of learning: “What is learned is that certain events,
cues, (S), predict certain other, biologically important events, consequences,
(S*). An animal may incidentally show new responses, but what it learns is
an expectancy that represents and corresponds to the S-S* contingency.” According
to Bolles, when an animal learns a relationship between its behavior (R) and
some consequence of this behavior (S*), the animal learns an R-S* expectancy.
These two expectancies, S-S* and R-S*, are all that is usually learned in operant conditioning. These expectancies
then become “synthesized” so that in the presence of S the animal makes the
response R. Thus if “an animal is placed in a situation where there are cues
predicting food, and food is made contingent upon some response, the animal
will learn first that these cues predict food, and second, that its behavior
produces food. If the animal is hungry, then it is likely to make that response.”
In Bolles’ theory, operant and respondent conditioning both involve learning
S-S* expectancies, and in operant conditioning the subject may also learn
an R-S* expectancy.
Let us now turn to a
few of the many theories of reinforcement, most of which were proposed by
theorists who believed that reinforcement affects learning. Hull (1943)
suggested that all basic drives, such as hunger or the sexual drive, feed into
one non-specific drive. This non-specific drive then energizes whatever
behaviors the animal makes in the particular stimulus situation. According to
Hull, reinforcement is any event which produces a reduction in this
non-specific drive. Hull’s theory is thus referred to as a drive-reduction theory of reinforcement.
Sheffield (1966a,
1966b), on the other hand, has suggested a drive-induction theory
of reinforcement. Sheffield argues that animals learn those responses which
arouse motivation. If a rat receives food for turning right in a T-maze, as
opposed to turning left, the consummatory response of eating becomes conditioned
to the stimuli of the right side as well as to response-produced stimuli of
the instrumental behavior. When the rat now approaches the choice point, these
stimuli elicit, to some degree, the consummatory response. But since the rat
can’t consume the food until he gets to it, the consummatory stimulation without
consummation is drive induction, which motivates the rat to make the response
(turning right) which in the past preceded the consummatory response. Thus
Sheffield’s rat is forced to make the response because of the drive induction.
Although originally more general, Sheffield’s theory now is basically only
applied to consummatory situations, as opposed, for example, to punishment
situations. The consummatory response may also be a central response without
overt behaviors.
Gibson’s theory of
perceptual learning, discussed in Chapter 2, suggests that reduction of uncertainty
is the reinforcement for much of perceptual learning (Gibson, 1969, p. 120).
The complexity-arousal theories in Chapter 3 also deal with reinforcement
effects.
Premack (1965) has
proposed a theory of reinforcement in which responses reinforce responses.
To determine which responses will act as reinforcers we must first measure
the independent rates of the different responses. This is done by putting
the animal in a situation where it can freely do either of two responses with
no contingencies between the responses. From this, Premack predicts that the
higher probability response will reinforce the lower probability response
if a contingency is established between the two. For example, if a hungry
rat is put in a situation where it can eat food or press a bar (where the
bar-press does not yield anything), the independent rate of eating food will
be higher than the independent rate of pressing the bar. So if the response
of eating food is made contingent on the response of pressing the bar, the
rate of bar-pressing will increase, being reinforced by the opportunity to
eat food.
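Premack's prediction can be written out as a small sketch. The function name and the baseline rates below are hypothetical illustrations, not data from Premack's experiments. The code only compares the independent rates of two responses and reports which one would be expected to reinforce the other if a contingency were established between them.

```python
def premack_prediction(independent_rates):
    """Given the independent rates of two responses (measured with no
    contingency between them), return (reinforcer, reinforced) as
    Premack's principle would predict, or None if the rates are equal."""
    if len(independent_rates) != 2:
        raise ValueError("expected exactly two responses")
    (name_a, rate_a), (name_b, rate_b) = independent_rates.items()
    if rate_a == rate_b:
        return None  # neither response is more probable, so no prediction
    if rate_a > rate_b:
        return (name_a, name_b)
    return (name_b, name_a)


# Hypothetical baseline rates (responses per minute) for a hungry rat.
baseline = {"eating": 12.0, "bar_pressing": 0.5}
print(premack_prediction(baseline))  # ('eating', 'bar_pressing')
```

With these assumed rates, the opportunity to eat is predicted to reinforce bar-pressing, as in the example above. Reversing the rates would reverse the prediction.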
Premack’s theory
has two major strengths. First, it allows us to incorporate into reinforcement
theory well-known examples of activities reinforcing activities, such as when
the mother tells her son that he must first eat his vegetables (low-probability
behavior) before he may go out and play (high-probability behavior). Although
there are other explanations for such conditions, they fit so well into Premack’s
theory that Premack’s principle of reinforcement is sometimes called “Grandma’s
rule.” The idea that the opportunity to engage in some activity is a reinforcement
underlies much of contingency contracting, discussed later.
The second strength
of Premack’s theory is its suggestion of reinforcement relationships that
are not as obvious from other theoretical positions. For example, in some
situations Premack showed that humans’ pinball playing was reinforced by eating,
while in other situations eating was reinforced by pinball playing. Premack
was also able to reinforce a rat’s drinking by giving it the opportunity
to run in an activity wheel.
So far we have discussed
only positive reinforcement in Premack’s theory. The same logic applies to
negative reinforcement as well, except that now we are talking about the probability
of the offset of an activity or response. Altogether, then, Premack’s principle
of reinforcement is as follows: If the onset or offset of one response is
more probable than the onset or offset of another, the former will reinforce
the latter positively if the superiority is for “on” probability, and negatively
if it is for the “off” probability.
The next set of
reinforcement theories rests on possible physiological bases of
reinforcement. More specifically they center on observations that electrical
stimulation to certain parts of the brain produces strong reinforcing effects.
In the early 1950’s,
Olds and Milner (1954) were doing experiments which involved putting small
electrodes into the brains of rats so that they could electrically stimulate
specific areas of the brain. Since brain functioning is at least partially
electrical in nature, electrically stimulating an area of the brain, and thus
forcing that area to be activated, is one way of testing approximately what
that area does in natural functioning. In the course of one experiment, while
Olds and Milner were aiming their electrodes at one area of the brain
(reticular formation), one electrode, by mistake, ended up much further forward
in the brain. It was observed that stimulation through this electrode seemed to
be “pleasant” to the rat in that the rat would go to specific places on a table
or run a maze to receive this stimulation. Thus began the massive research on
reinforcing electrical stimulation of the brain (ESB). (Similar effects can be
produced by chemical stimulation, but this literature will not be discussed
here.)
The effects of ESB
are usually defined in an operant paradigm. If the animal will make some response,
such as pressing a bar, to turn the stimulation on, the ESB is considered
positively reinforcing, whereas if he will respond to turn it off, the ESB
is negatively reinforcing. The results, however, are often not this simple.
In some situations the ESB is reinforcing at first, but becomes aversive if
continued (Bower & Miller, 1958). This may be because the electrical current
spreads from reward areas into aversive areas or it may be the result of an
actual functional change in the stimulated site.
By putting electrodes
in various parts of the brain, it is possible to map out the “reward” areas
of the brain. It appears that most of the brain, particularly the cortex,
is motivationally inert, with ESB producing neither positive nor negative
reinforcement. The positive reinforcement areas are mostly in subcortical
areas and seem to outnumber the subcortical negative reinforcement areas.
The reinforcing effect
of ESB varies according to the exact placement of the electrode, the species
of the animal, the duration and intensity of the stimulation, and a number
of other variables. But at its best, reinforcing ESB is one of the most powerful
reinforcements that man has discovered. In the extreme, rats will bar-press
for reinforcing ESB to the point of physical exhaustion, often not taking
out sufficient time to eat or drink. The strength of the reinforcing effect
of the ESB is often measured in terms of rate of response, such as how fast
the animal will press a bar. But there are problems with the use of response
rate as a measure (see Valenstein, 1964). For example, the ESB may also elicit
a motor response or seizure which decreases the rate at
which the animal is capable of responding. Or the animal might be reinforced
for responding at a specific rate, as in micromolar theory (Logan, 1956).
Thus, to determine which of two brain areas produces the strongest reinforcing
effect, it might be better to give the animal a choice between ESB to the
different areas rather than to merely compare the response rates of the different
areas.
What is the relationship between reinforcing ESB and other more conventional
reinforcements? One thing that stands out is that many of the areas of the
brain where reinforcing ESB is found are also areas concerned with other sources
of reinforcement, such as from eating. For example, the hypothalamus — perhaps
the most popular site for reinforcing ESB — is a critical brain structure
for the control of a wide range of consummatory behaviors, including eating,
drinking, and sex. This has suggested to several theorists, including Olds
(1962), that reinforcing ESB stimulates the actual physiological substrates
of conventional reinforcements.
Miller (1961) showed
correlations between drive reduction theories of reinforcement and reinforcing
ESB, suggesting that the ESB might be stimulating a reward mechanism usually
triggered by drive reduction. For example, it is known that electrical stimulation
of a part of the hypothalamus reduces the amount of food that an animal will
eat. This, then, might be the area stimulated by the drive reduction from
eating. Thus we would expect that ESB in this area would be reinforcing, which
Miller showed was true, although he reported that continued stimulation
quickly became aversive. Miller also defended his position by showing how
manipulations of drives often affected how reinforcing the ESB was. Later
Grossman (1967, p. 591) summarized these findings, saying, “The available
evidence indicates that the rate of self-stimulation at a specific electrode
site correlates positively with only one particular drive, suggesting a close
functional relation between specific drives and the reward effect.” However,
there are many reports of conflicting and confusing results in trying to correlate
sites of reinforcing ESB with neural sites related to conventional reinforcements.
Others have pointed
out a number of apparent differences between reinforcing ESB and more conventional
reinforcements. These differences include the following: (a) extinction of
a response which had ESB as the reward is often more rapid than extinction
of responses based on other rewards; (b) satiation to reinforcing ESB often
takes much longer; and (c) it is often difficult to maintain responding under
an intermittent schedule of ESB.
Deutsch (Deutsch & Howarth, 1963) proposed a theory that accounts for some of these differences. According to Deutsch, in reinforcing ESB the electrical current stimulates both a reinforcement system and a motivation system. Stimulation of the motivation system motivates the animal to make the response, which results in reinforcement plus motivation to repeat the response. Hence the effect is self-perpetuating, resulting in less satiation with some ESB than with other rewards. Faster extinction and difficulty in maintaining responding under intermittent schedules of ESB thus occur because, once the ESB stops, the motivation is eliminated or greatly reduced. (This contrasts with the hungry rat bar-pressing for food, which stays hungry even though the bar-press no longer yields food.)
Although there
are many clever experiments supporting Deutsch’s theory (e.g., Deutsch &
Howarth, 1963; Gallistel, 1966), there are also many that refute it. For
example, Cantor (1971) used a situation in which the reinforcing ESB was
made predictable by preceding it with a brief warning signal. In this case
rats would bar-press for a variety of different intermittent schedules of
ESB, including FR-2000 and VI-2 minutes. After reviewing a number of studies
critical of Deutsch’s theory, Trowill, Panksepp, and Gandelman (1969) concluded
that many of the apparent differences between reinforcing ESB and other
rewards are due to the specific conditions of deprivation and training used
by researchers such as Deutsch, and that the results do not hold up in more
general testing situations. They prefer to conceptualize the motivating
effects of ESB in terms of incentives rather than as the stimulation of
a motivational energizing system such as Deutsch’s.
The issues are,
of course, far from resolved. For example, Lenzer (1972) has offered a model
which argues again that there are differences between behavior maintained
by reinforcing ESB and behavior reinforced by more conventional rewards.
According to Lenzer, in CRF situations or where the ESB’s follow each other
closely, the controlling stimuli (those stimuli leading to the operant response)
are internal stimuli produced by the ESB, whereas in similar situations
with conventional rewards, the stimuli produced by the reward do not have
a major role in controlling the response. Lenzer assumed that those ESB-produced
controlling stimuli decay rapidly with time, yielding Deutsch’s type of
results. In other situations, such as widely spaced ESB’s, the subject receiving
reinforcing ESB learns to respond to stimuli similar to those controlling
the behavior under conventional rewards. So in these situations little difference
will be found between the effects of reinforcing ESB and other reinforcements.
Glickman and Schiff
(1967) noted that there was an overlap between those brain areas mediating
positive or negative reinforcement and those areas related to species-typical
behaviors — behaviors that occur in almost all members of
a species. Since these species-typical behaviors are generally important
to the animal, as in survival value, it is useful for them to become linked
with a reinforcement mechanism that will maintain them in the animal’s behavior.
According to Glickman and Schiff, reinforcement evolved as a mechanism to
ensure that some species-typical behaviors occur in response to appropriate stimuli. Thus reinforcing
ESB is the stimulation and facilitation of a neural system underlying species-typical
behavior. Aversive effects of ESB are due to the stimulation of areas related
to withdrawal behaviors.
Consider a domestic
cat growling and attacking objects. It may appear to the observer that the
cat is experiencing something unpleasant. But, as Glickman and Schiff point
out, such behavior may have had survival value in the history of the cat
and thus became associated with reinforcement mechanisms. So our growling
cat may actually be experiencing pleasure.
Reinforcing ESB
has also been investigated in humans by a number of investigators, including
Heath and his associates (e.g., Bishop et al., 1963; Heath, 1963). Heath
uses ESB primarily in a therapeutic setting with mental patients. The “pleasurable”
effects of reinforcing ESB can be used to disrupt undesirable behaviors
that are incapacitating the subjects. In one case (Moan & Heath, 1972)
the investigators took advantage of the fact that stimulation of the septal
area of the brain may produce both pleasure and sexual arousal. The patient
was a 24-year-old homosexual male who was repeatedly hospitalized for chronic
suicidal depression. When shown a stag movie of sexual intercourse he showed
no interest. However, after a series of septal stimulations, the subject,
while still feeling “high” from the ESB, was again shown the movie, which
now caused considerable sexual arousal. With the help of more septal stimulations
and a prostitute the experimenters were able to quickly build in the subject
heterosexual behavior which lasted well after treatment. Heath’s emphasis,
then, has been to use ESB more for eliciting responses and emotional states
than for reinforcing specific behaviors, although the two effects are often
confounded.
Delgado (1969)
has developed ESB technology to an impressive stage. Delgado’s subjects
(usually monkeys, although humans were used on occasion) are equipped with
a unit in their skull that simultaneously records brain activity and stimulates
specific areas. This unit can be monitored and controlled via radio communication
so that the subject is not restricted in movement by wires coming out of
his head. Through such a set-up the observer, which may be a computer, can
monitor the subject’s brain activity and stimulate different areas of the
brain when specific reactions are desired. By stimulating different areas,
the subject can be made sleepy, hungry, aggressive, afraid, or sexually
aroused; almost any basic emotion, motivation, or simple physical movement
can be elicited. And the stimulation may be used as a reinforcement.
People who have
received reinforcing ESB say that it is pleasurable; they often describe
the sensation in terms of one or more other types of rewarding sensations,
such as sexual orgasm or the pleasure experienced from having something
good to eat. At present we don’t know just how powerful a reinforcing effect
can be produced by ESB in humans. Is there an area or combination of areas
in the human brain which when stimulated will produce so powerful a pleasurable
sensation that the subject will choose this ESB over all other sensations
or activities? We don’t know, but there is no reason to believe that there
isn’t. The work done on ESB in humans by researchers such as Heath and Delgado
has not really emphasized the reinforcing effects of some ESB; that is,
they have not experimented with requiring the subject to make some response
in order to receive reinforcing ESB. Such experimentation, however, has
its dangers. An example of a misuse of reinforcement would be giving a mental
patient a reinforcing ESB every time we recorded some specific aberrant
activity in his brain. Although we may have intended the ESB to disrupt
the aberrant activity and associated behaviors, we might actually be reinforcing
this particular brain activity to occur more often.
The possibility
of ESB’s having powerful reinforcement effects in man raises a host of philosophical,
ethical, and science-fiction issues. Under what conditions would we have
the right to apply such a technology to someone else? If we find areas where
ESB is pleasurable, should we then give it to everyone? If I had someone
work around my house for me in order to receive reinforcing ESB each night
and he told you he was doing the work voluntarily because he liked ESB so
much, would you object to my coercing work out of him? If the ESB is so
rewarding to my worker that he would do anything to receive it, where does
the concept of “will” enter in? Or is man so complex that he can never be
controlled through such a simple procedure?
When used by itself
the term “punishment” usually refers to positive punishment —
a contingent event whose increase results in
a decrease in the probability of the response it is contingent on. It is
less probable that a child will touch the burner on the stove if he is burned
when he first makes the touching response. Although it is easy to define
punishment in terms of its effect on behavior, the mechanisms by which it
produces these effects are highly debated (Campbell & Church, 1969;
Church, 1963; Dunham, 1971; Johnston, 1972; Solomon, 1964). We will consider
a few of the possibilities.
A punishment probably
elicits emotional responses in the subject, such as fear and anxiety. These
emotional responses then may become respondently conditioned to the situation
in which the punishment occurred. To the extent that these emotions are
incompatible with the punished response, the probability of the response
may decrease. Or these emotional responses may lead to some other incompatible
response which becomes conditioned to the situation.
The punishment
may elicit some response, other than an emotional response, which becomes
respondently conditioned to the situation. Again, to the extent that this
response is incompatible with the punished response, there will be a decrease
in the probability of the punished response.
Since the onset
of the aversive stimulus is positive punishment, the offset of the stimulus
is negative reinforcement. Thus whatever response the subject is making
when the stimulus goes off, such as an escape response, will be reinforced.
If this reinforced response is incompatible with the punished response,
there will be a decrease in the probability of the punished response. Of
course, punishment need not produce just one of the effects mentioned above,
but may produce different combinations of the effects in different situations.
Dunham (1971) has
summarized the effects of punishment due to electric shock into two basic
rules: (1) That particular response in the organism’s repertoire which is
most frequently associated with shock onset, or which predicts the onset
of shock within a shorter time than other responses, will decrease in probability
and remain below its operant baseline; (2) That particular response in the
organism’s repertoire which is most frequently associated with the absence of shock onset, or which predicts the absence of shock onset for a longer
period of time than other responses, will increase in probability and remain
above its operant baseline.
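Dunham's two rules can be illustrated with a simplified sketch. It covers only the frequency part of each rule and ignores the temporal part (how soon, or for how long, a response predicts shock). Treating "most frequently associated" as the proportion of a response's occurrences followed by shock onset is an assumption made for this illustration. The function name and the counts are hypothetical.

```python
def dunham_predictions(shock_pairings):
    """shock_pairings maps a response name to a tuple
    (times followed by shock onset, total occurrences).
    Returns (response expected to fall below baseline,
             response expected to rise above baseline),
    or None if every response is paired with shock equally often."""
    proportions = {}
    for name, (followed_by_shock, total) in shock_pairings.items():
        if total > 0:
            proportions[name] = followed_by_shock / total
    if len(proportions) < 2:
        raise ValueError("need at least two responses with observations")
    # Rule 1: the response most associated with shock onset decreases.
    suppressed = max(proportions, key=proportions.get)
    # Rule 2: the response most associated with its absence increases.
    elevated = min(proportions, key=proportions.get)
    if proportions[suppressed] == proportions[elevated]:
        return None
    return suppressed, elevated


# Hypothetical observation counts for three responses.
observed = {"bar_pressing": (18, 20), "grooming": (2, 20), "rearing": (9, 20)}
print(dunham_predictions(observed))  # ('bar_pressing', 'grooming')
```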
Premack expanded
his response-probability approach to reinforcement to include punishment
as well (Terhune & Premack, 1970). That is, in reinforcement, response
A will reinforce response B if A is more probable (has a higher independent
rate) than B, whereas in punishment, response A will suppress response B
if A is less probable than B.
In applied situations
the practitioner should generally avoid the use of punishment as a change
procedure for reasons such as the following:
1. Punishment by
itself does not necessarily produce desirable behavior. Punishing a child
for impolite behavior does not guarantee that he will then show polite behavior,
as the desired behavior may not even be in his repertoire.
2. The punishment
may condition in fear, anxiety, or other perhaps undesired emotions. A worker
may develop a dislike for his job and show little commitment to his work
because his supervisor keeps criticizing his mistakes.
3. The punished
person may develop escape or avoidance behaviors. The author had a case
of a boy with a school phobia so severe that the boy would no longer even
enter the school building. The primary factor that led to this phobia was
that the school emphasized corporal punishment which caused the boy to learn
an avoidance response to school.
4. Attempted punishment
of an escape or avoidance response in some situations increases the strength
of the avoidance. The author watched a father at the beach trying to overcome
his son’s fear of the water. The father would take his son to the edge of
the water and then retreat a short distance. As soon as a medium sized wave
came in, the child became afraid and ran away from the water. The father
punished the child’s running away, verbally or physically, which only made
the boy more anxious, and made him run from the water faster and sooner.
5. Punishment may
result in masochism. If the only time that a child really gets much attention
from his parents is when they punish him, he may be willing to receive the
punishment in order to receive the attention. In such cases the assumed
punishment may become a conditioned reinforcement as the result of its pairing
with the reinforcement of attention (see Chapter 7).
6. The punishing
agent may provide a model for aggressive behavior. Children often model
or imitate their parents. If they see their parents handle conflict situations
by being aggressive, they too will learn to be aggressive.
7. The punished person often becomes less flexible
or adaptable in his behaviors. On the wards of many mental hospitals there
is much that the patient can do and be punished for, but little that he
is rewarded for. In such situations the patient’s best “strategy” is to
do as little as possible.
Because of such
possible effects of punishment as these, it is usually better to try to
reinforce and shape in the desired behaviors, rather than punish the undesired
behaviors. This, of course, is not always practical, as sometimes the behavior
is so detrimental (e.g., the child who keeps running into the street or
the autistic child who claws up his face) that it is necessary to use punishment
to suppress the undesired behavior long enough to build in desired behavior.
Also, a number of cases have been reported in which punishment was a useful
change procedure (Baer, 1971).
If punishment does
have many bad effects and is not one of the most effective change procedures,
why is it so prevalent in our society? There are, of course, a myriad of
reasons, such as moral and legal philosophies (e.g., “an eye for an eye”)
and the fact that the punishing agent often uses punishment to release his
own anger or uncertainty about how to handle a situation. But a major variable
is delay of reinforcement. The immediate effects of punishment are reinforcing
to the punisher: the punished behavior is quickly suppressed, and the punisher
releases some of his emotions. It is in the more long range effects that
the disadvantages of punishment usually arise, but because behavior is so
easily controlled by the short delay effects, people are reinforced to use
punishment.
The United States
generally puts more emphasis on punishment than on rehabilitation. This
is particularly evident in the prison systems, but can be seen at all levels
of society. In behavior change situations people tend to think in terms
of punishing or stopping undesired behavior, rather than building in desired
behavior. The teacher asks “How can I stop the children from running in
the halls?” rather than “How can I get the children to walk in the halls?”
The manager asks “How can I stop my workers from taking extra time during
lunch?” rather than “How can I get the workers to take only one hour for
lunch?” Although these differences may sound merely semantic, they generally
lead to significantly different approaches to behavior change. A point that
Skinner (see Skinner, 1971) continually makes is the importance to our society
of switching from punishment to reinforcement. For, Skinner argues, reinforcement
procedures are generally more effective than punishment procedures in changing
behavior and maintaining desirable behaviors. Also, behavior control by
pleasant consequences seems preferable to control by aversive consequences.
The second type
of punishment is negative
punishment, a contingent
event whose decrease results in a decrease in the probability of the response
it is contingent on. A mental patient may decrease his delusional talk if
every time he talks this way the social worker walks away from him for five
minutes. Significantly less research has been done on negative punishment
than positive punishment (see Coughlin, 1972). Since negative punishment
essentially consists of withdrawing a positive reinforcement, there are
many possible explanations for the resulting effects. To a certain extent
negative punishment is an operant extinction procedure since behaviors can
now occur and not be reinforced, because the reinforcement is withdrawn.
The act of removing the source of positive reinforcement may also function
as a positive punishment.
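The definitions of positive and negative punishment differ only in the direction of change of the contingent event. A minimal sketch of the resulting four-way classification follows. It assumes that the definitions of positive and negative reinforcement parallel the punishment definitions given here, and the function name and string labels are hypothetical.

```python
def classify_contingency(event_change, response_change):
    """event_change: 'increase' or 'decrease' in the contingent event.
    response_change: 'increase' or 'decrease' in the probability of the
    response the event is contingent on."""
    table = {
        ("increase", "increase"): "positive reinforcement",
        ("decrease", "increase"): "negative reinforcement",
        ("increase", "decrease"): "positive punishment",
        ("decrease", "decrease"): "negative punishment",
    }
    key = (event_change, response_change)
    if key not in table:
        raise ValueError("each argument must be 'increase' or 'decrease'")
    return table[key]


# The child burned on touching the stove: the burn increases, touching decreases.
print(classify_contingency("increase", "decrease"))  # positive punishment
# The social worker walking away: attention decreases, delusional talk decreases.
print(classify_contingency("decrease", "decrease"))  # negative punishment
```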
A common form of
negative punishment in schools is a time-out procedure. Here
the student to be punished is sent to a room or section of a room in which
he just sits for a short time. If the regular classroom is a source of reinforcement
for the student, then the time-out procedure will be negative punishment.
In an ideal classroom operating on reinforcement principles, time-out may
be the most reasonable form of punishment.
Operant conditioning
has been applied in an amazingly large number of different situations. Here
we will mention only a few examples.
Verhave (1967)
trained pigeons to inspect pills for a drug company. The pigeon would sit
in a cage with two rounded discs before it; one was a translucent window,
the other opaque. A conveyor belt moved pill capsules by the translucent
window. If the pill was acceptable the pigeon pecked the opaque disc; if
defective, it pecked the translucent disc. Within a week of training the
pigeons were working at 99 per cent accuracy. The pigeons were rewarded
with food for making the right discriminations.
During the Second
World War, Skinner (1960) trained pigeons to fly missiles. The pigeons worked
as a homing device in an air-to-ground missile called the Pelican. In the
training the pigeons’ behavior was reinforced for pecking the appropriate
keys controlling the direction of the missile toward the chosen target.
Although Skinner’s project worked quite well, it was not well received by
the appropriate government officials, who caused the project to be terminated.
Pryor (1969) has
shown how to operantly condition “creativity” in porpoises. Her method was
to reward the porpoise only for behaviors that had not been rewarded before.
Thus, after running through its usual repertoire of behaviors, it had to
generate entirely new or creative behaviors. Many of these new behaviors
(aerial flips, gliding with tail out of water) had never been observed in
a porpoise by the staff at the Sea Life Park.
Skinner (MacCorquodale,
1969; Skinner, 1957) has suggested an analysis of speech as essentially
a form of verbal behavior whose acquisition and maintenance is due to operant
conditioning. For example, a small child’s behavior might be reinforced
by fondling for making the operant response “da-da” to the discriminative
stimulus of the father (S^D = father; an S^Δ = milkman). It is also easy
to imagine how the parents gradually shaped the response “da-da” by reinforcing
approximations to this response. Parents’ “ability” to hear “words” in the
seemingly random sounds of their child often facilitates verbal shaping.
The person gradually acquires a very complex set of verbal behaviors which
have been learned because of how useful they are in maximizing reinforcements
in the social environments. Critics of Skinner (e.g., Chomsky, 1959) argue
that Skinner’s analysis cannot account for all the complexities of language
learning and speech. Perhaps there are other variables, such as a predisposition
to acquiring certain grammatical styles, that have to be added. But this
is a question that does not yet seem to have been adequately resolved, although
some critics believe that it has.
An operant analysis
of speech suggests the possibility that thoughts may, totally or to some
degree, be considered covert internalized verbal behaviors that are under
the control of operant variables. This has led to a procedure called coverant control in which thoughts are manipulated by operant
conditioning (Homme, 1965; Mahoney, 1970). For example, the author had a
case of a college student who in certain social situations kept having thoughts
about his social inadequacies. The student was convinced that his thoughts
were irrational and not well founded, but they kept occurring and bothering
him. Through coverant control it was possible to operantly condition other
thoughts to occur in place of the undesired thoughts, and in two weeks the
problem was gone. This was accomplished by the student’s writing the desired
thoughts on small cards, which he then inserted in his cigarette pack. When
in the social situation that elicited the undesired responses, he would
occasionally read to himself one of the desired responses and reinforce
himself, as with a cigarette or by thinking about something particularly
pleasant. This was continued until the desired thoughts replaced the undesired
thoughts.
Much of children’s
behavior can be thought of as operant behavior maintained by the reinforcement
of attention, as in the following examples. When put to bed little Jeffrey
will cry and refuse to sleep until his parents return to his room and read
him a story. Although capable of working by himself, Stevie keeps coming
up to the teacher’s desk for help. When Susie’s parents are engrossed in
adult conversation with some visitors, Susie may do something “cute” to
bring the group attention to her. Ideally parents and teachers should use
their attention to reinforce desirable behaviors and not to reinforce undesirable
ones. There is a natural tendency, however, to do just the opposite. That
is, when the child is doing all right (emitting desirable behavior), the
parent or teacher relaxes and probably leaves the child alone, whereas when
the behavior becomes somewhat troublesome (child emitting undesirable behaviors),
the parent or teacher decides that it is now time to attend to what the
child is doing.
A good operant
conditioner learns to ask the question “What is the function of this behavior?”
That is, what are the operant contingencies maintaining this behavior? Rather
than explaining problem behavior in terms of intra-psychic disturbances
or in terms of the historical development of the problem, the operant conditioner
looks for the contingencies currently maintaining the problem behaviors
and how these contingencies or alternative behaviors might be manipulated.
(This is not to suggest, however, that all behavior can be reduced to the
operant paradigm.) Manipulation of operant contingencies, particularly with
humans, necessarily raises ethical issues about what constitutes “desirable”
behaviors and who has the right to alter another person’s behavior, either
intentionally or not.
Madsen and associates
(1968) investigated the effects of rules, ignoring inappropriate behavior,
and showing approval for appropriate behavior exhibited by students in an
elementary classroom. They concluded that (a) rules alone had little effect
on classroom behavior, (b) the combination of ignoring inappropriate behavior
and showing approval for appropriate behavior was very effective in achieving
better classroom behavior, and (c) approval for desirable behavior is “probably
the key to effective classroom management.”
Emery Air Freight
Corporation had a goal for their customer service department of responding
to customer queries within 90 minutes. The employees felt they met this
goal about nine times out of ten, but in fact it was only three times out
of ten. An operant feedback system was established in which the employees
marked off on their sheets whether each call was answered within 90 minutes.
The supervisor then gave praise and recognition for improvement in performance.
Within one day performance went from 30 per cent to 90 per cent and
stayed between 90 and 95 per cent for at least three years (Business Week,
Dec. 18, 1971).
Sabatasso and Jacobson
(1970) worked with a 58-year-old man who had spent five years in a ward
for chronic schizophrenics. His diagnosis was “chronic brain syndrome, resulting
from brain trauma, with psychotic reaction.” The head injury resulted from
being hit over the head with a board during a fight. The subject was considered
a mute psychotic as he had only said one word, “yes,” during his five years
in the hospital. Through modeling and reinforcement with praise and candy
the subject was gradually shaped to speak. Within ten hours of therapy the
subject verbalized 307 words, 56 different words, and several simple sentences.
At one point the subject shouted excitedly, “I’m talkin’ to you.”
A popular behavior
modification procedure in applied operant situations is contingency contracting, a formal agreement about reinforcement contingencies
and required behaviors. A parent specifies exactly what behaviors he expects
from his children (e.g., being home for dinner by 5:30, maintaining a C
average in school) and what reinforcements (e.g., allowance, being permitted
to go to a movie) the child will receive contingent on these behaviors.
A teacher posts the rules for the classroom (e.g., having specified supplies
each day, staying in seat during self-work time) and each student who fulfills
this contract may choose one reinforcement from a list (e.g., 10 minutes
at the end of the class period to read whatever he wants, permission to
leave class 2 minutes early). A person who wants to lose weight gives his
favorite records to a friend and then must earn the records back by specified
weight loss. A husband and wife undergoing marriage counseling learn to
do contingency contracting with each other as a first step toward building
give-and-take into their marriage (e.g., the husband agrees to be home by
2 A.M. on his poker night if the wife fixes one of a number of specified dinners
at least twice a week).
Various forms of
contingency contracting have been applied to many different types of behaviors
in a wide range of situations. Contingency contracting has many positive
points:
2. All required
behaviors should be well specified so that there is no question about exactly
what is expected and no argument about whether the behavior occurred or not.
Many arguments between parents and their children center on whether or not
the child did what he was supposed to do.
3. It forces all
participants to be consistent. The student in the classroom or the child
at home enjoys contingency contracting since he knows that he will receive
a specified reward for a specified behavior and that this is independent
of the parent’s or teacher’s current mood or whether or not the teacher
likes him.
4. It provides
an easy way to guarantee reinforcement for behaviors that ordinarily are
not reinforced or which are reinforced but with too long a delay of reinforcement.
One of the author’s graduate students who had trouble motivating himself
to work on his thesis (too long a delay of reinforcement for thesis completion)
gave the author a number of things that were highly reinforcing to the student
(e.g., guitar, records, books, clothes, and things to consume). The student
gradually earned these back by completing portions of his thesis within
specified time limits.
5. Contracts can
be individualized to deal with the needs of each person. Classes can be
set up for truly individualized instruction. A program in a mental hospital
can take into account each patient’s particular needs and problems.
A variation of
contingency contracting is a token
economy, in which the
immediate reinforcement is tokens which can later be exchanged for other
reinforcements. The tokens, such as poker chips or marks on a chart, are
just the medium for exchange. A token system in a mental hospital might
involve the patient’s earning tokens for behaviors such as dressing himself,
acting in specified ways, and attending vocational rehabilitation programs.
These tokens can later be exchanged for rewards such as magazines, an opportunity
to see a movie, or a trip to town, with the number of required tokens varying
from item to item. The main advantage of tokens is that they can be administered
almost anywhere with little delay of reinforcement. If there is a big enough
selection of things to buy with the tokens, the tokens should always be
reinforcing.
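The bookkeeping behind such a system can be sketched in a few lines of Python. This is only an illustration of the logic described above, not a description of any actual hospital program: the class name `TokenEconomy`, the behaviors, the token amounts, and the prices are all hypothetical.

```python
class TokenEconomy:
    """Toy ledger: tokens are earned for target behaviors and exchanged later."""

    def __init__(self, earn_rates, prices):
        self.earn_rates = dict(earn_rates)  # behavior -> tokens earned
        self.prices = dict(prices)          # reward -> tokens required
        self.balance = 0

    def record(self, behavior):
        """Credit tokens right after a target behavior; return the tokens earned."""
        earned = self.earn_rates.get(behavior, 0)  # unlisted behaviors earn nothing
        self.balance += earned
        return earned

    def exchange(self, reward):
        """Spend tokens on a reward; return True only if the balance covers the price."""
        price = self.prices[reward]
        if self.balance < price:
            return False
        self.balance -= price
        return True


# Hypothetical values, chosen only for illustration.
ward = TokenEconomy(
    earn_rates={"dressing himself": 2, "attending vocational program": 5},
    prices={"magazine": 3, "movie": 6, "trip to town": 15},
)
ward.record("dressing himself")
ward.record("attending vocational program")
can_see_movie = ward.exchange("movie")          # balance goes from 7 to 1
can_go_to_town = ward.exchange("trip to town")  # refused: not enough tokens
```

Note that the token is credited at the moment of the behavior, while the backup reinforcement comes later; this separation is what keeps the delay of reinforcement short.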
Token economies
have revolutionized mental hospitals (Ayllon & Azrin, 1968), establishing
programs that help large numbers of patients without necessarily increasing
the staff. Token economies in classrooms (O’Leary & Drabman, 1971) provide
settings in which both students and teachers work more effectively and with
more enjoyment. Token systems have also been successfully used in homes,
prisons, and half-way houses (see Kazdin & Bootzin, 1972).
The somatic nervous system is that set of nerves which controls “voluntary”
actions of the skeletal-muscular system, such as moving an arm. The responses
of this system are usually conditioned operantly, but many can also be conditioned
respondently (e.g., human eyelid response or the patellar reflex). The
autonomic nervous system
is that set of nerves
that controls visceral responses, including circulation, digestion, and
activity of glands. Historically this nervous system was considered inferior
to or more primitive than the somatic nervous system. It appeared to function
fairly autonomously, outside of “voluntary” control. Until fairly recently
it was almost universally held by learning theorists that the visceral responses
of the autonomic nervous system could be conditioned respondently, but not
operantly. This suggested that there are at least two different types of
learning: operant conditioning affecting the somatic nervous system but
not the autonomic nervous system, and respondent conditioning affecting
the autonomic nervous system and some
of the somatic nervous system. Today there is
impressive data that visceral responses can be operantly conditioned (DiCara,
1970; Katkin, 1971; Miller, 1969) as well as brought under voluntary control
(see next section on Biofeedback). On the basis of these experiments Miller
(1969) has argued that there may be just one type of learning, based on
reinforcement.
A problem in demonstrating
operant conditioning of visceral responses is that any apparent effects
may be an artifact of the conditioning of a skeletal response. That is,
in trying to operantly condition the visceral response, the experimenter
may actually be operantly conditioning a skeletal response which in turn
produces changes in the visceral response. To avoid this problem, Miller
and his associates (Miller, 1969) gave their rats the drug curare, which
produces paralysis of the skeletal muscles. This drug also facilitated the
conditioning of the visceral responses, perhaps because it removed some
of the variability and distraction from the somatic nervous system.
Miller used reinforcing
brain stimulation as a reinforcement for conditioning his curarized rats.
Miller showed that he could shape the rats’ heart rate either up or down
by reinforcing changes in the desired direction. For example, if he wanted
heart rate to go up he would wait until the natural fluctuations of the
heart rate increased and then reinforce this increase with the reinforcing
brain stimulation. Through shaping, larger and larger changes were required
and generated. Miller also showed that these heart rate changes could be
brought under discriminative control. For example, a rat could be conditioned
so that his heart rate would go down when a light and tone came on. Miller
and his associates then demonstrated the operant conditioning of a variety
of other visceral responses, including intestinal contractions, urine formation
by the kidney, and amount of blood flow in the tail. In one experiment they
were even able to condition the rat so that more blood would flow into one
ear than another. To show that such conditioning effects are not specific
to the use of reinforcing brain stimulation, Miller also conditioned heart
rate changes, intestinal contractions, and changes in blood pressure where
the reward was that the rat avoided shock to his tail.
However, visceral
responses do seem resistant to relatively simple operant conditioning procedures.
It may be that it would be evolutionarily disadvantageous if visceral responses
were readily manipulated by operant contingencies. For health and survival,
an animal’s visceral responses must stay relatively stable despite drastic
changes in environmental contingencies. Otherwise chance reinforcements
might produce an animal with high blood pressure and inadequate responses
by the kidney. On the other hand, it might also be evolutionarily undesirable
if the visceral responses did not respond at all to operant variables. For
in extreme situations such as malfunction, disease, or extreme constant
environmental changes it may be desirable to have visceral learning.
These animal experiments
suggest that many human psychosomatic illnesses might be due to operant
conditioning of visceral responses. For example, blood flow to specific
body organs has been shown to be conditioned operantly, so this could result
in specific psychosomatic symptoms related to that organ. Perhaps reinforcers
such as a mother’s attention or avoidance of unpleasant situations might
be sufficient to shape psychosomatic illnesses. This is an open area
of research.
Feedback has been
shown to be a powerful determinant of behavior. However, many response systems,
such as visceral responses, provide little or no feedback regarding their
functioning. For example, the reader might try to tune in to the activity
of his liver or try to feel slight changes in blood pressure. Extreme activity
of such systems may be perceived, particularly as they affect other parts
of the body, but the normal fluctuations of activity in these systems are
usually imperceptible owing to inadequate feedback. This is probably just
as well, for if early man had feedback and control of visceral responses,
he probably would have messed himself up more than helped himself.
Earlier we saw
how visceral responses could be manipulated by operant conditioning, a form
of feedback. This suggests that if people were provided feedback from systems
that they don’t usually receive feedback from, such as those controlling
blood pressure, they might be able to learn to control these systems. This
leads to the investigations with biofeedback,
utilizing mechanical
devices that provide knowledge of the activity of a body function for which
the person usually has inadequate feedback (see Lang, 1970; Shapiro &
Schwartz, 1972).
Say, for example,
that we wished to teach a person how to lower his blood pressure. We could
hook him up to a mechanical device that would measure blood pressure and
turn on a green light when the blood pressure went below a specified level.
At first this level might not be very low, but it could be gradually lowered,
as in shaping. After the subject is hooked up to such a device, he is given
the simple instruction to try to get the green light to come on. (He might
also be given other instructions, such as how to relax, but this is not
necessary.) Although he may not know how he is doing it, after a short time
the subject can get the green light to come on “at will.” With a little
more training and shaping the subject is soon able to significantly lower
his blood pressure when he wishes. Shapiro and his colleagues (1969) have
shown how subjects can learn control of blood pressure through such biofeedback
procedures. Schwartz (1972) demonstrated biofeedback control of heart rate
and blood pressure. In fact, Schwartz’s subjects could control heart rate
and blood pressure independently, raising one and simultaneously lowering
the other.
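The logic of the green-light device, including the gradual lowering of the criterion, can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the procedure of any study cited here: the function name, the rule of tightening the criterion after three consecutive successes, and the sample readings are all hypothetical.

```python
def run_biofeedback_session(readings, start_criterion, step, streak_needed=3):
    """Simulate the green light for a series of blood pressure readings.

    The light is on whenever a reading falls below the current criterion.
    After `streak_needed` consecutive successes the criterion is lowered by
    `step`, as in shaping. Returns (light_states, final_criterion).
    """
    criterion = start_criterion
    streak = 0
    lights = []
    for reading in readings:
        light_on = reading < criterion
        lights.append(light_on)
        streak = streak + 1 if light_on else 0
        if streak == streak_needed:
            criterion -= step  # require a slightly lower pressure from now on
            streak = 0
    return lights, criterion


# Hypothetical systolic readings, for illustration only.
readings = [142, 139, 138, 137, 139, 134, 133, 132]
lights, final_criterion = run_biofeedback_session(
    readings, start_criterion=140, step=5
)
```

With these readings the criterion is lowered twice, from 140 to 135 and then to 130. The reading of 139 that turned the light on early in the session no longer does so after the first adjustment.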
Because of the
apparent absence of internal feedback, subjects learning to control response
systems such as those involved with blood pressure often have no subjective
feeling about what they are doing when they change these responses. They
just “know” how to do it, but they don’t feel any different. Some subjects
develop superstitious behaviors, such as learning to tense or relax some
muscle that is irrelevant to the effect. It remains to be seen whether some
subjects will actually learn to respond to very subtle feedback cues that
are actually correlated with the response system to be changed.
The potential implications
of such studies are enormous. Researchers are currently investigating whether
people with high blood pressure can learn to keep their blood pressure down
by voluntary control. One wonders how many autonomic responses people can
learn to control. Will people in the near future be able to learn control
of their bodies so that a person might be able to voluntarily quiet an upset
stomach or relax by lowering his heart rate? Will a person with a defective
gland learn voluntary control over this gland? Will many medical problems
fall under the domain of the biofeedback trainer? Under what circumstances
is such control over autonomic responses undesirable or dangerous?
There are also
a host of practical problems. For example, in the animal studies on operant
conditioning of visceral responses, Miller found that the animals were much
easier to condition while on the drug curare. Miller suggested that this
might be because without curare the skeletal responses and autonomic responses
elicited by these skeletal responses may interfere with the autonomic responses
that the experimenter wishes to condition. This raises the question of how
effective biofeedback training of autonomic responses in humans can be without
control such as that produced by curare. Another practical problem is how
long a person can maintain autonomic control after he is no longer hooked
up to a biofeedback device. We know that control lasts for a little while,
but we don’t know exactly how long. Perhaps the subject would need occasional
booster training sessions with a biofeedback device in a clinic or with
a small unit at home.
Currently there
is ongoing research on the role of biofeedback procedures in the treatment
of headaches. One group (Budzynski et al., 1970) is reporting success in
treating tension headaches by giving subjects biofeedback about the
muscle tension in their head and neck. Learning to relax these muscles through
feedback reduces the headaches. Another group (Sargent et al., 1972) has
been investigating migraine headaches by combining biofeedback techniques
with autogenic training. (Autogenic training is a program to learn simultaneous
regulation of mental and somatic functions. Control of somatic responses
is accomplished by concentrating on specific phrases such as “My feet feel
heavy and relaxed.”) The biofeedback training consists of training the subject
to voluntarily increase the blood flow into his hands and thus also increase
hand temperature. This training seems to be an effective way of dealing
with migraine headaches by decreasing the relative blood flow to the head,
although the exact reasons why it works are not clear at the time of this
writing.
The area of biofeedback
training which has attracted the most publicity has been control of specific
brain waves. Electrodes on the human skull record a variety of brain waves
of different frequencies (EEG). Different ranges of the frequencies have
been assigned different names: delta
waves are in the range of 0 to 4 cycles per second; theta designates 4 to
8; alpha, 8 to 13; and beta, more than 13. Although the brain generally
emits a complex combination of different waves, the waves are often predominantly
of one type which correlates with various aspects of behavior. For example,
delta waves are primarily seen during sleep, whereas beta waves are seen
when the person is awake and looking at things or actively thinking something
through.
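The frequency ranges above amount to a simple lookup, sketched here in Python. The text gives shared endpoints (4 and 8 each appear in two ranges), so the sketch assumes that a frequency on one of those boundaries belongs to the higher band, while 13 is counted as alpha because beta is defined as more than 13. The function name `eeg_band` is hypothetical.

```python
def eeg_band(cycles_per_second):
    """Name the brain-wave band for a dominant frequency in cycles per second."""
    if cycles_per_second < 0:
        raise ValueError("frequency cannot be negative")
    if cycles_per_second < 4:
        return "delta"
    if cycles_per_second < 8:
        return "theta"
    if cycles_per_second <= 13:
        return "alpha"
    return "beta"


# A relaxed subject with eyes closed might show a dominant frequency near 10.
band = eeg_band(10)
```

A biofeedback device of the kind discussed next would apply such a classification continuously to the recorded signal and give the subject a signal whenever the target band predominates.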
Biofeedback devices
can let the subject know when his brain waves are primarily within a certain
range. One device might be a tone which sounds in proportion to the amount
of alpha waves the subject is generating. Through such devices a person
can learn to produce specific types of brain waves. The practical applications
of such brain wave control have not been adequately researched yet, but
the following are some possibilities: People who have trouble relaxing might
learn to generate alpha waves as part of a procedure for calming down. Insomniacs
might be partially helped by learning to produce delta waves. Epileptics
might be able to control their seizures to some degree by generating specific
brain patterns. Chapter 8 contains a discussion of some research that is
trying to increase creativity with procedures that include learning to generate
theta waves. Parapsychologists speculate that it may be possible to train
a person to get his mind in that specific state which lends itself best
for receiving extrasensory perception.
The most popular
wave in such experimentation has been the alpha wave — the
type of wave a person would probably be generating if he sat back in a chair
with his eyes closed, relaxed, and tried not to think about anything specific.
This is the wave that people often generate while in meditation. Many machines
(most being of inadequate quality) are being sold to people to learn how
to generate alpha waves. Societies and occult groups have formed around
the idea of alpha wave conditioning. Varied and often preposterous claims
are being made for alpha wave conditioning; for example, that it is a short
cut to deep meditation states, and that it can produce various forms of
extrasensory perception, faster learning, better memory, and better physical
and mental health. It is possible that alpha wave conditioning may facilitate
a variety of phenomena and may even be a necessary condition for some phenomena,
but it is probably not a sufficient condition for most of the phenomena
attributed to it. For example, it seems improbable that alpha wave conditioning
can produce the “deeper” stages of meditation that come with considerable
practice in controlling concentration and training the mind. Lynch and Paskewitz
(1971) have suggested that much of “alpha wave conditioning” may not be
the actual conditioning of brain waves so much as learning to ignore stimuli
and to stop responses which block alpha waves.
A common form of
feedback, particularly useful in education, is knowledge of results, feedback about whether the person’s response
was correct or not (see Annett, 1969). Knowledge of results (KR) may or
may not also contain information about what the correct response is.
The effects of
KR on behavior have been explained in terms of simple reinforcement: On
those trials in which a person was right, he is reinforced by finding out
he was right; on those trials in which he was wrong, the wrong response
is partially extinguished, or punished, or both. If this explanation is
correct, then from our information about delay of reinforcement we would
expect the KR to be most effective when given immediately after the behavior.
However, this does not seem to be true, particularly when KR includes information
about the correct response.
For example, Sassenrath
and Yonge (1969) gave college students multiple-choice tests on material
they had learned. One group received KR immediately after the test, while
another group received KR 24 hours after the test. There was no difference
between the groups on a retention test right after the feedback, but on
a retention test 5 days later there was a small but significant difference
favoring the delayed KR group. Sturges (1972) found a similar superiority
for a 24-hour delay KR group, and presented evidence that the difference
between the groups was due to factors operating at and/or following the
feedback, rather than factors operating during the delay interval. Sturges
suggests that with delayed feedback the subjects respond differently to
the same feedback. For example, with immediate KR the subjects may be concerned
only with that part of the KR which tells them whether or not they were
right, whereas with delayed KR the subject reads through more of the KR
information and hence learns more.
An important, yet
unanswered, question for education is: What is the optimal amount of time
to delay the KR? The answer probably depends on a number of variables, such
as type of subjects used, the nature of the material to be learned, and
the nature of the KR. In one study, More (1969) investigated the effects
of four different delays of KR (knowledge of how subjects did on a retention
test plus what the correct responses to each item were) on the learning
of eighth-grade students. In terms of later retention he found that KR was
more effective when given either 2.5 hours or one day after the first retention
test than if given immediately after or four days after the first test.
KR may also have
a motivational effect, as when the subject decides to work harder. If a
student finds out that his performance on the first exam earned him a C,
he may decide to study harder for the second exam. After having all her
house plants die, a woman may decide to water the next plants more than
once a month. After reviewing the studies done on motivational KR, Locke
and associates (1968) concluded that the main effect was on the goals that
the subject set for himself. They concluded that motivational KR is most
effective when specific goals are set and when the goals set are difficult
ones.
One powerful application
of KR is in the area of programmed instruction, a technology originally investigated by S. L. Pressey in the 1920’s, but
which got its main push from Skinner and his colleagues (Holland, 1960;
Skinner, 1958). In programmed instruction, the material to be learned (the
program) is presented to the student in a series of logical units (frames).
The student is required to make a response to each frame, after which he
is immediately told the correct answer. By this procedure the student is
gradually shaped to the desired terminal behavior. The mechanical device
which presents the program is called a teaching machine.
In programmed instruction,
a student is given a small amount of material to learn and then is asked
a question on the material. The student might be asked to respond in one
of a number of different ways, such as writing his answer or pushing a button
to indicate his choice of answers from a number of alternatives. After he
has answered, the student is told the correct response (KR). The student
then moves to the next frame. If his answer was wrong, he might be diverted
to some other part of the program to review the material he missed.
Many theorists
interpret the effects of KR in programmed instruction as being reinforcement.
However, we have already seen that this is an oversimplification of the
effects of KR, for the KR also provides for additional learning and motivational
changes.
Programmed instruction
has a number of desirable characteristics:
1. Because he is
required to answer questions, the student is more active in his learning
than he is in other learning situations, such as listening to a teacher.
2. The student receives continual and immediate feedback as he progresses. This procedure catches errors early, before the student can go too far in the wrong direction. Also, to the extent that KR is reinforcing, it provides a short delay of reinforcement.
3. The student
must usually learn some material before being permitted to continue. This
is often critical to later learning that presupposes some previous knowledge.
4. The student
can work at his own pace. This allows for individual differences in learning
rate and style and lends itself nicely to individualized instruction.
5. The programmer
receives feedback about how the student is doing on various parts of the
program, and can thus adjust and improve the program accordingly.
On the negative
side, for some people, such as many college students, the format of existing
programmed instruction is too constricting for the type of freewheeling,
conceptual, integrative learning they prefer and learn best with. Also,
it seems that some types of skills, such as problem solving, are better
learned by other teaching procedures. But these criticisms may be applicable
only to the types of programs that currently exist, rather than to the logic
of programmed instruction itself.
There are basically
two types of programs: linear and branching. In a linear program all students progress through the same sequence
of frames. In a branching
program students are
routed through different sequences of frames depending on how well they
do at specific points in the program. Thus if a student misses a question,
he may be routed through a number of different frames that cover the same
material in a slightly different way, while students who didn’t miss the
question continue on to new material. Or, based on an assessment question,
one student may be permitted to skip over a number of frames that cover
material that he already knows.
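The difference between the two types of program can be made concrete with a small Python sketch. It is an illustration only: the frame names, the questions, and the `run_program` function are hypothetical, and real programs are far longer. Each frame names the frame that follows a correct answer and, optionally, a remedial frame that follows a miss. A program whose frames have no remedial entries is a linear program.

```python
# Hypothetical three-frame program; "f1r" reviews the material of "f1".
FRAMES = {
    "f1": {
        "question": "A formal agreement about reinforcement contingencies "
                    "is called contingency ____.",
        "answer": "contracting",
        "next": "f2",
        "remedial": "f1r",
    },
    "f1r": {
        "question": "An agreement that specifies behaviors and their "
                    "reinforcements is a contingency ____.",
        "answer": "contracting",
        "next": "f2",
    },
    "f2": {
        "question": "Tokens that are later exchanged for other "
                    "reinforcements form a token ____.",
        "answer": "economy",
        "next": None,
    },
}


def run_program(frames, start, respond):
    """Route a student through the frames; return the list of frames visited.

    `respond` takes a question and returns the student's answer. A miss sends
    the student to the frame's remedial frame if it has one; otherwise the
    student simply moves on to the next frame.
    """
    path = []
    current = start
    while current is not None:
        frame = frames[current]
        path.append(current)
        correct = respond(frame["question"]) == frame["answer"]
        if correct:
            current = frame["next"]
        else:
            current = frame.get("remedial", frame["next"])
    return path


def prepared_student(question):
    """Answers every frame correctly."""
    return "economy" if "token" in question else "contracting"


def unprepared_student(question):
    """Misses the first frame but answers the review frame and the last frame."""
    if question == FRAMES["f1"]["question"]:
        return ""
    return prepared_student(question)


path_a = run_program(FRAMES, "f1", prepared_student)
path_b = run_program(FRAMES, "f1", unprepared_student)
```

The prepared student passes through frames f1 and f2 only, while the student who misses the first question is routed through the review frame f1r before continuing to f2.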
Programmed instruction
can be made even more flexible by the use of computers (Atkinson, 1968).
This computer-assisted
instruction (CAI) can
handle very complex branching programs, almost instantly presenting to the
student the next frame he needs based on his performance on the last frame.
The computer can also record useful data pertaining to each student, such
as areas of particular difficulty that a teacher might wish to attend to.
The computer can keep data about the effectiveness of the program, such
as which frames the students are making more mistakes on. In CAI the computer
can do a host of other things as well, such as presenting visual displays
on a screen, turning on slides and movies, and presenting audio messages
through headphones to the student.
Throughout this
chapter we have discussed a wide range of effects that feedback can have.
Feedback may produce one or more of the following effects:
1. Feedback may
function as a reinforcement or a punishment.
2. Feedback may
produce changes in motivation.
3. Feedback may
provide informative cues that guide learning and performance, such as discriminative
cues.
4. Feedback may
provide a new learning experience or a rehearsal of previous learning.
Feedback is the input to an organism resulting from a
response of the organism. It includes both the sensory input from the muscles
involved in making the response and information about how the environment
was changed as a result of the response. Basic behaviors, such as walking
and simple muscular control, require proprioception — feedback from the muscles.
Normal speech depends on auditory feedback—the person’s hearing his own
voice—plus feedback from speech structures, including the tongue. Visual
feedback is involved in tasks such as driving, writing, and drawing. T-groups
involve social feedback in which the participants provide information about
the way they perceive and feel about each other.
S-R theorists often
describe feedback as a source of stimuli to which new responses can be conditioned.
An example of this is chaining, a sequence
of responses in which each response provides part of the stimulus cues (feedback)
for following responses. Chaining is the process by which a rat learns a
complex sequence of behaviors and a person learns a poem by rote. For many
S-S theorists feedback provides information about which behavior is appropriate.
TOTE units are an example of this approach. According to the ideo-motor
theory, responses are chosen on the basis of their anticipated feedback.
Overall, feedback may be a reinforcement or a punishment; it may produce
changes in motivation, provide discriminative cues, or provide a new learning
experience or rehearsal of previous learning.
Operant conditioning
is the study of the effects of events that are contiguous on responses.
Perceiving these events, then, is feedback about the effects of the behavior.
If the contiguous event makes it more probable that the response will occur
again in a similar situation, the event is called a reinforcement. If the event makes the response less probable,
the event is a punishment.
Terminating the relationship
between the contiguous event and the behavior results in extinction. There are many ways to originally get a response to occur for operant
conditioning, three of the more popular ways being shaping, modeling, and fading.
For optimal learning,
the delay of reinforcement
— the time between the behavior and the reinforcement—should
generally be as short as possible. Continuous reinforcement means that every correct response is reinforced,
whereas with intermittent
reinforcement only some
of the responses are reinforced. Generally original learning is faster under
continuous reinforcement than under intermittent, and extinction takes longer
with intermittent reinforcement.
There is no consensus
on exactly how reinforcement functions. For example, does reinforcement
affect learning or only performance? Drive reduction theories of reinforcement suggest that animals learn
those responses which reduce drives, such as a hunger drive. Drive induction theories stress that animals learn those responses which
arouse motivation. The Premack
theory is based on the
idea that high probability responses can reinforce low probability responses.
On the physiological level it has been shown that electrical stimulation
of certain brain areas in man and other animals produces reinforcement.
These reinforcement areas are often brain areas related to biological needs
such as hunger and to species-typical behaviors. Some theorists thus argue
that these are the same brain areas that underlie the effects of more conventional
forms of reinforcement, such as food and water. However, other theorists
have suggested a number of differences between reinforcing brain stimulation
and the effects of the other types of reinforcement. These differences may
exist because the reinforcing brain stimulation activates both a reward
system and a motivation system, or because it results in internal stimuli
having more control over the response than would be the case with the conventional
reinforcements.
Punishment, by
definition, reduces the probability of the response that precedes it. In
addition, the punishing event elicits many responses, including emotional
responses, which may become conditioned to the situation in which the punishment
occurred. The offset of the punishment may function as a reinforcement for
behaviors such as escape behaviors. Because of effects such as these, most
forms of punishment are usually not the most desirable way to change human
behavior.
Behavior modification
often involves the reinforcement of desired behaviors and the extinction
and/or punishment of undesired behaviors. One example of this approach is
contingency contracting,
in which there is a formal
agreement among people about the reinforcement contingencies and the required
behaviors. Contingency contracting provides for the systematic application
of operant conditioning, builds more consistency into people’s behavior,
may cut down on the delay of reinforcement, and provides a structure for
individualizing behavior modification programs. A token economy is a form of contingency contracting in which
the person is rewarded with tokens that can later be exchanged for various
reinforcements.
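The bookkeeping behind a token economy can be sketched in a few lines of Python. This is only an illustrative sketch, not a clinical procedure: the class name, the behaviors, the token values, and the prices are all hypothetical. It shows the two contingencies described above: tokens are awarded immediately when a contracted behavior occurs, and tokens are later exchanged for a backup reinforcer.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEconomy:
    """Toy token economy: contracted behaviors earn tokens, tokens buy reinforcers."""
    earn_rates: dict[str, int]   # tokens per target behavior (hypothetical values)
    prices: dict[str, int]       # token cost of each backup reinforcer (hypothetical values)
    balance: int = 0
    log: list[str] = field(default_factory=list)

    def record_behavior(self, behavior: str) -> int:
        """Award tokens at once if the behavior is part of the contract."""
        earned = self.earn_rates.get(behavior, 0)  # uncontracted behaviors earn nothing
        self.balance += earned
        self.log.append(f"{behavior}: +{earned}")
        return earned

    def exchange(self, reinforcer: str) -> bool:
        """Trade tokens for a reinforcer; refuse if the balance is too low."""
        price = self.prices[reinforcer]
        if self.balance < price:
            return False
        self.balance -= price
        self.log.append(f"{reinforcer}: -{price}")
        return True


economy = TokenEconomy(
    earn_rates={"finish homework": 2, "tidy room": 1},
    prices={"extra recess": 3},
)
economy.record_behavior("finish homework")
economy.record_behavior("tidy room")
granted = economy.exchange("extra recess")
print(granted, economy.balance)
```

Because the token is delivered as soon as the behavior is recorded, the delay of reinforcement stays short even when the backup reinforcer itself is not available until later.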
Recent research
has shown that it is possible to condition animals to alter visceral responses
such as heart rate, intestinal contractions, urine formation by the kidney,
and blood flow to specific body areas. This suggests that operant conditioning
may help explain the genesis of many psychosomatic illnesses and guide their
treatment. Related to this research is the work with biofeedback
in which, via mechanical devices, humans are given feedback about the activity
of body systems which usually provide little or no feedback. Through biofeedback procedures, people may learn voluntary control over heart rate, blood
pressure, specific brain waves, and headaches.
A common form of
feedback, one which is particularly important in education, is knowledge of results (KR): feedback about whether a person’s response was correct or not. One important
question is what the optimal delay of knowledge of results is. For example,
for optimal long-term learning, how long after a test should a student be
given feedback about his performance on the test? Programmed instruction involves giving the student immediate knowledge
of results as he actively works his way through a structured program designed
to systematically shape his learning behavior.
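The core loop of programmed instruction, a response followed at once by knowledge of results, might look like the following Python sketch. The function name, the frame format, and the sample frames are assumptions made for illustration; a real program would also branch or repeat frames according to the learner's errors.

```python
def run_program(frames, answers):
    """Step through frames in order, giving knowledge of results after each response.

    frames:  list of (prompt, correct_answer) pairs, ordered from easy to hard
    answers: the learner's responses, one per frame
    """
    feedback = []
    for (prompt, correct), response in zip(frames, answers):
        is_correct = response.strip().lower() == correct.lower()
        # Knowledge of results follows the response with no delay.
        message = "correct" if is_correct else f"incorrect, answer: {correct}"
        feedback.append((prompt, message))
    return feedback


# Hypothetical two-frame program and one learner's responses.
frames = [
    ("A stimulus that raises the probability of a response is a ...", "reinforcer"),
    ("Feedback about whether a response was correct is knowledge of ...", "results"),
]
answers = ["Reinforcer", "rewards"]
results = run_program(frames, answers)
for prompt, message in results:
    print(prompt, "->", message)
```

The delay of knowledge of results is fixed at zero here; studying the optimal delay would mean treating that delay as a variable rather than building it into the loop.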
Annett, J. Feedback and Human Behavior. Baltimore: Penguin, 1969.
Barber, T. X., DiCara, L. V., Kamiya, J., Miller, N. E., Shapiro, D., & Stoyva, J. (eds.) Biofeedback and Self-Control, 1970. Chicago: Aldine Atherton, 1971.
Honig, W. K. (ed.) Operant Behavior: Areas of Research and Application. New York: Appleton-Century-Crofts, 1966.
McGinnies, E., & Ferster, C. B. (eds.) The Reinforcement of Social Behavior. Boston: Houghton-Mifflin, 1971.
Reynolds, G. S. A Primer of Operant Conditioning. Glenview, Ill.: Scott, Foresman, 1968.
Skinner, B. F. Science and Human Behavior. Toronto: Macmillan, 1953.
Skinner, B. F. Walden Two. Toronto: Macmillan, 1948.
Tapp, J. T. (ed.) Reinforcement and Behavior. New York: Academic Press, 1969.
Whaley, D. L., & Malott, R. W. Elementary Principles of Behavior. New York: Appleton-Century-Crofts, 1971.
Williams, J. L. Operant Learning: Procedures for Changing Behavior. Belmont, Calif.: Wadsworth, 1973.