STATISTICS IV: HYPOTHESIS TESTING
 
 

Hypothesis tests are procedures for making rational decisions about the reality of effects.

Rational Decisions

Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information.

A rational decision is characterized by the use of a procedure that ensures that the likelihood, or probability, of success is incorporated into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision.

One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else?). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information.

This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision. Instead, he would have selected the alternative which realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man.
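
Selecting "the alternative which realized the greatest expected benefit" can be sketched in Python for the umbrella decision. The payoff numbers below are invented purely for illustration (they are hypothetical, not from the text); only the rule itself, weighting each payoff by its probability and choosing the largest total, reflects the passage.

```python
# Hypothetical payoffs (invented for illustration, not from the text):
# the benefit of each action under each possible state of the weather.
PAYOFFS = {
    "take umbrella": {"rain": 5, "no rain": -1},    # stay dry, but carry it all day
    "leave umbrella": {"rain": -10, "no rain": 0},  # get soaked, or lose nothing
}


def expected_benefit(action, p_rain):
    """Weight each payoff by the probability of the weather that produces it."""
    payoff = PAYOFFS[action]
    return p_rain * payoff["rain"] + (1 - p_rain) * payoff["no rain"]


def rational_choice(p_rain):
    """Select the action with the greatest expected benefit."""
    return max(PAYOFFS, key=lambda action: expected_benefit(action, p_rain))


for p_rain in (0.40, 0.05):
    print(p_rain, rational_choice(p_rain))
```

With these made-up payoffs, a .40 chance of rain favors taking the umbrella and a .05 chance favors leaving it. Either way a decision is made without knowing the weather, and anyone using the same payoffs and probability would reach the same decision.
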

Effects

When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, and the hypothesis testing procedure is selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with changes in activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs by sex (Male or Female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects.

The effects discussed in the remainder of this text appear as various statistics, including differences between means, contingency tables, and correlation coefficients.



 

General Principles

All hypothesis tests conform to similar principles and proceed with the same sequence of events.

* A model of the world is created in which there are no effects, and the experiment is imagined to be repeated an infinite number of times under that model.
* The results of the experiment are compared with the model of the first step. If, given the model, the results are unlikely, then the model is rejected and the effects are accepted as real. If the results could be explained by the model, the model must be retained. In the latter case no decision can be made about the reality of effects.
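
The two steps above can be sketched in Python for a case the homework raises: deciding whether a coin is fair. Everything concrete here is hypothetical (100 flips, observed counts of 55 and 65 heads, and .05 as the cutoff for "unlikely"). Instead of literally repeating the experiment an infinite number of times, the sketch sums exact binomial probabilities, which give the proportions that infinite repetition would produce.

```python
from math import comb


def two_tailed_p_fair_coin(heads, flips):
    """Probability, if the coin is fair (the model of no effects), of a result
    at least as far from flips / 2 as the observed number of heads.

    Summing exact binomial probabilities gives the long-run proportion that
    infinitely many repetitions of the experiment would produce."""
    expected = flips / 2
    distance = abs(heads - expected)
    extreme_ways = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - expected) >= distance
    )
    return extreme_ways / 2**flips


def decide(heads, flips, alpha=0.05):
    """Reject the model if the observed result is unlikely under it."""
    p = two_tailed_p_fair_coin(heads, flips)
    return ("reject the model" if p < alpha else "retain the model"), p


# Two hypothetical outcomes of flipping a coin 100 times.
for heads in (55, 65):
    decision, p = decide(heads, 100)
    print(f"{heads} heads: p = {p:.4f}, {decision}")
```

Under these assumptions, 55 heads is a reasonably likely result and the model is retained, while 65 heads is unlikely enough that the model, and with it the null hypothesis of a fair coin, is rejected.
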

Hypothesis testing is analogous to the geometrical method of proof by negation (proof by contradiction). That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In hypothesis testing, by contrast, the hypothesis can never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, is accepted.

An analogous situation exists with respect to hypothesis testing in statistics. In hypothesis testing one wishes to show real effects of an experiment. By showing that the experimental results were unlikely, given that there were no effects, one may decide that the effects are, in fact, real. The hypothesis that there were no effects is called the NULL HYPOTHESIS. The symbol H0 is used to abbreviate the Null Hypothesis in statistics. Note that, unlike in geometry, we cannot prove that the effects are real; rather, we may decide that the effects are real.

For example, suppose the following probability model (distribution) described the state of the world if there were no effects, that is, if the null hypothesis were true.

Event A might be considered fairly likely if the above model were correct. As a result, the model would be retained, along with the null hypothesis. Event B, on the other hand, is unlikely given the model. Here the model would be rejected, along with the null hypothesis.


 

The Model

The SAMPLING DISTRIBUTION is a distribution of a sample statistic. It is used as a model of what would happen if

1.) the null hypothesis were true (there really were no effects), and

2.) the experiment were repeated an infinite number of times.
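
A finite simulation can approximate the two conditions above. This minimal sketch assumes a normal population of I.Q. scores with mean 100 and standard deviation 15 (the standard deviation is an assumption, not given in the text) and substitutes 2,000 repetitions for "an infinite number." The spread of the simulated sample means should come out close to sigma / sqrt(N), and it shrinks as N grows.

```python
import random
import statistics

# Assumed population of I.Q. scores under the null hypothesis.
MU = 100      # population mean (from the text)
SIGMA = 15    # population standard deviation (an assumption)


def simulate_sample_means(n, repetitions=2000, seed=1):
    """Repeat the 'no effects' experiment many times, recording each sample mean.

    A finite number of repetitions stands in for the infinite number in the model."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(repetitions)
    ]


for n in (64, 400):
    means = simulate_sample_means(n)
    print(
        f"N = {n}: mean of sample means = {statistics.fmean(means):.2f}, "
        f"SD of sample means = {statistics.stdev(means):.3f}, "
        f"sigma / sqrt(N) = {SIGMA / n ** 0.5:.3f}"
    )
```
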
 

Probability

Probability is a theory of uncertainty. It is a necessary concept because the world according to the scientist is unknowable in its entirety. Nevertheless, predictions and decisions are obviously possible. As such, probability theory is a rational means of dealing with an uncertain world.

Probabilities are numbers associated with events that range from zero to one (0-1). A probability of zero means that the event is impossible. For example, if I were to flip a coin, the probability of a leg is zero, because a coin may land showing a head or a tail, but not a leg. A probability of one, however, means that the event is certain. For example, if I flip a coin, the probability of heads, tails, or an edge is one, because the coin must land in one of these three ways.

In real life, most events have probabilities between these two extremes. For instance, suppose the probability of rain tonight is .40 and the probability of rain tomorrow night is .10. Then rain is more likely tonight than tomorrow night.

The meaning of the term probability depends upon one's philosophical orientation. In the CLASSICAL (relative-frequency, or frequentist) approach, probabilities refer to the relative frequency of an event, given that the experiment were repeated an infinite number of times. For example, the .40 probability of rain tonight means that if the exact conditions of this evening were repeated an infinite number of times, it would rain 40% of the time.
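
No one can repeat an evening an infinite number of times, but a simulation can show the relative-frequency idea at work. This minimal sketch assumes an event with probability .40 and uses Python's `random` module with a fixed seed (the seed and trial counts are arbitrary choices). As the number of simulated evenings grows, the proportion of rainy ones should settle near .40.

```python
import random


def relative_frequency(p_event, trials, seed=0):
    """Proportion of simulated trials in which an event of probability p_event occurs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() < p_event)
    return hits / trials


# More repetitions of "this evening": the proportion of rainy evenings settles near .40.
for trials in (10, 1_000, 100_000):
    print(f"{trials:>7} evenings: it rained {relative_frequency(0.40, trials):.3f} of the time")
```
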

In the SUBJECTIVE approach, however, the term probability refers to a "degree of belief." That is, the individual assigning the number .40 to the probability of rain tonight believes that, on a scale from 0 to 1, the chance of rain is .40. This approach leads to a branch of statistics called "BAYESIAN STATISTICS." While many statisticians take this approach, it is not usually taught at the introductory level. At this point, all the introductory student needs to know is that a person calling themselves a "Bayesian Statistician" is not ignorant of statistics. Most likely, he or she is simply involved in the theory of statistics.

No matter what theoretical position is taken, all probabilities must conform to certain rules. Some of the rules are concerned with how probabilities combine with one another to form new probabilities. For example, when events are independent, that is, when one doesn't affect the other, the probabilities may be multiplied together to find the probability of the joint event. The probability of rain today AND getting a head when flipping a coin is the product of the two individual probabilities.
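
The multiplication rule can be checked by brute force. This sketch assumes, purely for illustration, a sample space of 10 equally likely kinds of weather, 4 of them rainy (so the chance of rain is .40, echoing the earlier rain example). Pairing every weather outcome with every coin outcome builds the joint sample space of two independent events, and `fractions.Fraction` keeps the counting exact. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(event, outcomes):
    """Probability of an event in a sample space of equally likely outcomes."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Hypothetical sample spaces: 10 equally likely kinds of weather, 4 of them rainy.
weather = ["rain"] * 4 + ["dry"] * 6
coin = ["head", "tail"]

p_rain = prob(lambda w: w == "rain", weather)
p_head = prob(lambda c: c == "head", coin)
# Pairing every weather outcome with every coin outcome models independent events.
p_rain_and_head = prob(lambda pair: pair == ("rain", "head"), product(weather, coin))

print(p_rain, p_head, p_rain_and_head)  # 2/5 1/2 1/5
```

Counting the joint outcomes directly gives the same answer as multiplying .40 by .50.
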

A deck of cards illustrates other principles of probability theory. In bridge, poker, rummy, etc., the probability of a heart can be found by dividing thirteen, the number of hearts, by fifty-two, the number of cards, assuming each card is equally likely to be drawn. The probability of a queen is four (the number of queens) divided by fifty-two. The probability of a queen OR a heart is sixteen divided by fifty-two. This figure is computed by adding the probability of a heart to the probability of a queen, and then subtracting the probability of a queen AND a heart, which equals 1/52 (13/52 + 4/52 - 1/52 = 16/52). The subtraction is needed because otherwise the queen of hearts would be counted twice.
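
The card calculation can be verified by listing all 52 cards and counting. The rank and suit labels below are arbitrary choices, and `Fraction` keeps the arithmetic exact, so 16/52 is reported in lowest terms as 4/13.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards


def prob(event):
    """Number of cards satisfying the event divided by the number of cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_queen(card):
    return card[0] == "Q"


p_heart = prob(is_heart)                                         # 13/52
p_queen = prob(is_queen)                                         # 4/52
p_queen_and_heart = prob(lambda c: is_queen(c) and is_heart(c))  # 1/52
p_queen_or_heart = prob(lambda c: is_queen(c) or is_heart(c))    # 16/52

# Addition rule: the queen of hearts must not be counted twice.
assert p_queen_or_heart == p_heart + p_queen - p_queen_and_heart
print(p_queen_or_heart)  # 4/13, i.e. 16/52 in lowest terms
```
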


Testing Hypotheses About Single Means

THE HEAD-START EXPERIMENT

Suppose an educator has a theory which argues that a great deal of learning occurs before children enter grade school or kindergarten. The theory holds that socially disadvantaged children start school intellectually behind other children and are never able to catch up. In order to remedy this situation, he proposes a head-start program, which starts children in a school situation at ages three and four.

A politician reads this theory and feels that it might be true. However, before he is willing to invest the billions of dollars necessary to begin and maintain a head-start program, he demands that the scientist demonstrate that the program really does work. At this point the educator calls for the services of a researcher and statistician.

Because this is a fantasy, the following research design would probably never be used in practice. This design will be used to illustrate the procedure and the logic underlying the hypothesis test.

A random sample of 64 four-year-old children is taken from the population of all four-year-old children. The children in the sample are all enrolled in the head-start program for a year, at the end of which time they are given a standardized intelligence test. The mean I.Q. of the sample is found to be 103.27.

On the basis of this information, the educator wishes to begin a nationwide head-start program. He argues that the average I.Q. in the population is 100 (μ = 100) and that 103.27 is greater than that. Therefore, the head-start program had an effect of about 103.27 - 100, or 3.27, I.Q. points. As a result, the billions of dollars necessary for the program would be well invested.

The statistician, being in this case the devil's advocate, is not ready to act so hastily. He wants to know whether chance could have caused the large mean. Perhaps head start doesn't make a bit of difference, and the mean of 103.27 was obtained simply because the sixty-four children selected for the sample were slightly brighter than average. He argues that this possibility must be ruled out before any action is taken. If it cannot be ruled out completely, he argues that its likelihood must at least be small enough that the possible benefits of making a correct decision outweigh the risk of making a wrong decision.

To determine if chance could have caused the difference, the hypothesis test proceeds as a thought experiment. First, the statistician assumes that there were no effects; in this case, the head-start program didn't work. He then creates a model of what the world would look like if the experiment were performed an infinite number of times under the assumption of no effects. The sampling distribution of the mean is used as this model. The reasoning goes something like this:
 

[Figure: population distribution assuming no effects; sampling distribution assuming no effects and N = 64; results of the experiment]

He or she then compares the results of the actual experiment with those expected from the model, given that there were no effects and the experiment were repeated an infinite number of times. He or she concludes that the model probably could explain the results, because a sample mean of 103.27 is not an unusual outcome in this sampling distribution.

Therefore, because chance could explain the results, the educator was premature in deciding that head-start had a real effect.
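
The statistician's comparison can be sketched as a one-sample z test in Python. The text does not state the population standard deviation or how unlikely a result must be before the model is rejected, so this sketch assumes sigma = 15 I.Q. points, a two-tailed test, and a .05 cutoff; the names `z_test_mean`, `MU_0`, and `SIGMA` are illustrative. With N = 64 the standard error is 15/8 = 1.875, so z = 3.27/1.875, about 1.74, and the two-tailed probability is about .08, which is above .05, so the null hypothesis is retained. The verdict depends on these assumptions; a one-tailed test, for instance, would halve the probability value.

```python
from math import erfc, sqrt

MU_0 = 100.0   # population mean if head start has no effect (from the text)
SIGMA = 15.0   # population standard deviation of I.Q. (an assumption)


def z_test_mean(sample_mean, n, mu_0=MU_0, sigma=SIGMA, alpha=0.05):
    """Locate a sample mean in the sampling distribution assuming no effects.

    Returns z, the two-tailed probability value, and whether to reject H0."""
    standard_error = sigma / sqrt(n)  # SD of the sampling distribution of the mean
    z = (sample_mean - mu_0) / standard_error
    p = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return z, p, p < alpha


z, p, reject = z_test_mean(103.27, 64)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```
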
 

HEAD-START EXPERIMENT REDONE

Suppose that the researcher changed the experiment. Instead of a sample of sixty-four children, the sample was increased to N = 400 four-year-old children. Furthermore, this sample had the same mean (103.27) at the conclusion as had the previous study. The statistician must now change the model to reflect the larger sample size.

[Figure: population distribution assuming no effects; sampling distribution assuming no effects and N = 400; results of the experiment]

The statistician concludes that it is highly unlikely that the model could explain the results. The model of chance is rejected and the reality of effects is accepted. Why? The mean that resulted from the study fell in the tail of the sampling distribution.

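The different conclusions reached in these two experiments may seem contradictory to the student. A little reflection, however, reveals that the second experiment was based on a much larger sample size (400 vs. 64). As such, the researcher is rewarded for doing more careful work and taking a larger sample. The sampling distribution of the mean specifies the nature of the reward: its standard deviation is proportional to one over the square root of N, so with 400 children it is only 8/20, or two-fifths, as wide as with 64, and the same 3.27-point difference lies much farther out in the tail.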
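
To see the reward numerically, the sketch below computes the standard error and z for both sample sizes under the same assumptions as before (sigma = 15 and a two-tailed probability; neither is stated in the text, and `locate_mean` is a hypothetical helper name). Because the standard error is sigma / sqrt(N), moving from 64 to 400 children shrinks it from 1.875 to 0.75, and the same 3.27-point difference moves from about 1.74 to about 4.36 standard errors above 100.

```python
from math import erfc, sqrt

MU_0 = 100.0          # population mean under the null hypothesis (from the text)
SIGMA = 15.0          # population standard deviation (an assumption)
SAMPLE_MEAN = 103.27  # the same result in both experiments (from the text)


def locate_mean(n):
    """Standard error, z, and two-tailed p for the observed mean at sample size n."""
    standard_error = SIGMA / sqrt(n)
    z = (SAMPLE_MEAN - MU_0) / standard_error
    return standard_error, z, erfc(abs(z) / sqrt(2))


for n in (64, 400):
    se, z, p = locate_mean(n)
    print(f"N = {n:3d}: standard error = {se:.3f}, z = {z:.2f}, p = {p:.6f}")
```
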

At this point it should also be pointed out that we are discussing statistical significance: whether or not the results could have occurred by chance. The second question, that of practical significance, occurs only after an affirmative decision about the reality of the effects. The practical significance question is tackled by the politician, who must decide whether the effects are large enough to be worth the money to begin and maintain the program. Even though head-start works, the money may be better spent in programs for the health of the aged or more nuclear submarines. In short, this is a political and practical decision made by people and not statistical procedures.



HOMEWORK

1. What is the purpose of hypothesis testing?

2. Why is the following explanation incorrect?

The probability value is the probability of obtaining a statistic as different from the parameter specified in the null hypothesis as the statistic obtained in the experiment. The probability value is computed assuming that the null hypothesis is true.

3. Why might an experimenter hypothesize something (the null hypothesis) that he or she does not believe is true?

4. State the null hypothesis for:

a. An experiment comparing the mean effectiveness of two methods of psychotherapy.

b. A correlational study on the relationship between exercise and cholesterol.

c. An investigation of whether a particular coin is a fair coin.

d. A study comparing a drug with a placebo on the amount of pain relief. (A one-tailed test was used.)