Bias and Error


A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of that population. Chance may dictate that a disproportionate number of untypical observations will be made. In testing fuses, for example, the sample may contain a higher or lower proportion of faulty fuses than the population as a whole. In practice, it is rarely known when a sample is unrepresentative and should be discarded.
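The fuse example can be made concrete with a small simulation. The sketch below is illustrative only: the population size (10,000 fuses), the 5% faulty rate, the sample size of 100 and the name `sample_faulty_rate` are all assumptions, not figures from any real test. It draws many random samples from the same population and records how far the faulty rate in individual samples strays from the population rate.

```python
import random

def sample_faulty_rate(population, sample_size, rng):
    """Draw a simple random sample without replacement; return its faulty share."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 10,000 fuses, exactly 5% faulty (1 = faulty, 0 = good).
population = [1] * 500 + [0] * 9500
true_rate = sum(population) / len(population)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Repeat the same sampling plan 1,000 times and keep each sample's faulty rate.
rates = [sample_faulty_rate(population, 100, rng) for _ in range(1000)]

print(f"population faulty rate: {true_rate:.3f}")
print(f"lowest sample rate:     {min(rates):.3f}")
print(f"highest sample rate:    {max(rates):.3f}")
```

Every sample here is drawn fairly, yet some overstate and some understate the faulty rate. A researcher who sees only one of these samples cannot tell from the sample alone which kind it is.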

Sampling error

What can make a sample unrepresentative of its population? One of the most frequent causes is sampling error.

Sampling error comprises the differences between the sample and the population that are due solely to the particular units that happen to have been selected.

For example, suppose that 100 American women are measured and all are found to be taller than six feet. It is clear, even without any statistical proof, that this would be a highly unrepresentative sample leading to invalid conclusions. Such a sample is very unlikely, because women this tall are rare and widely scattered through the population, but it can occur. Luckily, this is a very obvious error and can be detected very easily.

The more dangerous error is the less obvious sampling error, against which nature offers very little protection. An example would be a sample in which the average height is overstated by only an inch or two rather than by a foot. It is the unobvious error that is of most concern.

There are two basic causes of sampling error. One is chance: the error that occurs just because of bad luck, which may result in untypical choices. Unusual units do exist in a population, and there is always a possibility that an abnormally large number of them will be chosen. For example, in a recent study in which I was looking at the number of trees, I selected a sample of households randomly but, strangely enough, the two households in the whole population that had the highest numbers of trees (10,018 and 6,345) were both selected, making the sample average higher than it should be. The average with these two extremes removed was 828 trees. The main protection against this kind of error is to use a large enough sample. The second cause of sampling error is sampling bias.
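The protection that a larger sample gives against chance error can be sketched in a few lines. Everything here except the two extreme tree counts is an assumption made for illustration: the population of 1,000 households, the range of ordinary tree counts, the sample sizes of 20 and 200, and the helper name `spread_of_sample_means`. The sketch measures how much the sample average moves from one random sample to the next at each sample size.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 households: 998 invented "ordinary" tree
# counts plus the two extreme values quoted in the text.
population = [rng.randint(0, 1600) for _ in range(998)] + [10018, 6345]
population_mean = statistics.mean(population)

def spread_of_sample_means(sample_size, repeats=2000):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)

spread_small = spread_of_sample_means(20)    # small sample
spread_large = spread_of_sample_means(200)   # ten times larger

print(f"population mean:            {population_mean:.1f}")
print(f"spread of means, n = 20:    {spread_small:.1f}")
print(f"spread of means, n = 200:   {spread_large:.1f}")
```

In a small sample, one extreme household can dominate the average. In a larger sample the same household is diluted by many ordinary ones, so the averages cluster more tightly around the population mean.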

Sampling bias is a tendency to favour the selection of units that have particular characteristics.

Sampling bias is usually the result of a poor sampling plan. The most notable is the bias of non-response, when for some reason some units have no chance of appearing in the sample. For example, take a hypothetical case where a survey was conducted by the SJU Graduate School to find out the level of stress that graduate students were going through. A mail questionnaire was sent to 100 randomly selected graduate students. Only 52 responded, and the results suggested that students were not under stress at that time. In fact, it was the most stressful time of year for all students except those who were writing their theses at their own pace. Apparently, this was the group that had the time to respond. The researcher conducting the study went back to the questionnaires to find out what the problem was and found that all those who had responded were third- and fourth-year PhD students. Bias can be very costly and has to be guarded against as much as possible. In this case, $2,000 had been spent and there were no reliable results. In addition, it cost the researcher his job, since his employer thought that a qualified researcher should have foreseen the problem and planned how to avoid it. A means of selecting the units of analysis must be designed to avoid the more obvious forms of bias.

Another example would be where you would like to know the average income of some community and you decide to use telephone numbers to select a sample of the total population, in a locality where only the rich and middle-class households have telephone lines. You will end up with a high average income, which will lead to the wrong policy decisions.
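The telephone example lends itself to a short simulation. The sketch below is hypothetical throughout: the community of 5,000 households, the class shares (60% poor, 30% middle class, 10% rich), the income ranges and the sample size of 200 are invented for illustration. The only feature taken from the text is that poor households have no telephone line.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical community. Incomes and class shares are invented; only
# middle-class and rich households own a telephone.
households = []
for _ in range(5000):
    u = rng.random()
    if u < 0.60:    # poor, no phone
        households.append({"income": rng.uniform(2_000, 10_000), "phone": False})
    elif u < 0.90:  # middle class, phone
        households.append({"income": rng.uniform(10_000, 40_000), "phone": True})
    else:           # rich, phone
        households.append({"income": rng.uniform(40_000, 150_000), "phone": True})

true_mean = statistics.mean(h["income"] for h in households)

# Biased sampling frame: the telephone list. Unbiased frame: every household.
phone_frame = [h for h in households if h["phone"]]
phone_sample = rng.sample(phone_frame, 200)
full_sample = rng.sample(households, 200)

phone_mean = statistics.mean(h["income"] for h in phone_sample)
full_mean = statistics.mean(h["income"] for h in full_sample)

print(f"true community mean:      {true_mean:,.0f}")
print(f"sample from all homes:    {full_mean:,.0f}")
print(f"sample from phone list:   {phone_mean:,.0f}")
```

Both samples are the same size and both are drawn at random, but the telephone sample is random only within a frame that excludes the poor. Increasing the sample size would not remove this error, which is what distinguishes bias from chance.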

Non-sampling error (measurement error)

The other main cause of unrepresentative samples is non-sampling error. This type of error can occur whether a census or a sample is being used. Like sampling error, non-sampling error may either be produced by participants in the statistical study or be an innocent by-product of the sampling plans and procedures.

A non-sampling error is an error that results solely from the manner in which the observations are made.

The simplest example of non-sampling error is inaccurate measurement due to malfunctioning instruments or poor procedures. For example, consider the observation of human weights. If persons are asked to state their own weights, no two answers will be of equal reliability. The people will have weighed themselves on different scales in various states of poor calibration. An individual's weight fluctuates by several pounds over the course of a day, so the time of weighing will affect the answer. The scale reading will also vary with the person's state of undress. Responses will therefore not be of comparable validity unless all persons are weighed under the same circumstances.
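The point that measurement error survives even a full census can be shown with a rough sketch. All numbers are assumptions chosen for illustration: the 2,000 invented true weights, the assumption that home scales read between 0 and 6 lb heavy, and the size of the daily fluctuation. The function names `self_reported` and `standardised` are likewise hypothetical.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical population of 2,000 adults; true weights (in pounds) are invented.
true_weights = [rng.gauss(160, 25) for _ in range(2000)]
true_mean = statistics.mean(true_weights)

def self_reported(weight):
    """Self-report: each person's scale has its own calibration error
    (ASSUMED to read 0 to 6 lb heavy) plus a few pounds of daily fluctuation."""
    return weight + rng.uniform(0, 6) + rng.gauss(0, 2)

def standardised(weight):
    """Same calibrated scale and conditions for everyone: small random noise only."""
    return weight + rng.gauss(0, 0.5)

# Both are censuses: every person is measured, so there is no sampling error.
census_self = statistics.mean(self_reported(w) for w in true_weights)
census_std = statistics.mean(standardised(w) for w in true_weights)

print(f"true mean weight:          {true_mean:.2f}")
print(f"census, self-reported:     {census_self:.2f}")
print(f"census, standard weighing: {census_std:.2f}")
```

Because nobody is left out, the gap between the self-reported figure and the true mean cannot be sampling error. It comes entirely from how the observations were made, and it disappears only when the measurement procedure is standardised.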

Biased observations due to inaccurate measurement can be innocent but very devastating. A story is told of a French astronomer who once proposed a new theory based on spectroscopic measurements of light emitted by a particular star. When his colleagues discovered that the measuring instrument had been contaminated by cigarette smoke, they rejected his findings.

In surveys of personal characteristics, unintended errors may result from:

- The manner in which the response is elicited
- The social desirability of the persons surveyed
- The purpose of the study
- The personal biases of the interviewer or survey writer

The interviewer effect

No two interviewers are alike, and the same person may provide different answers to different interviewers. The manner in which a question is formulated can also result in inaccurate responses. Individuals tend to provide false answers to particular questions. For example, some people want to appear younger or older for reasons of their own. If you ask such a person their age in years, it is easy for them to lie by overstating or understating it by a year or more. If you ask instead which year they were born, giving a false answer requires a bit of quick arithmetic, so a date of birth is likely to be more accurate.

The respondent effect

Respondents might also give incorrect answers to impress the interviewer. This type of error is the most difficult to prevent because it results from outright deceit on the part of the respondent. An example is what I witnessed in my recent study, in which I was asking farmers how much maize they had harvested the previous year (1995). In most cases, the men tended to lie by giving the recommended expected yield, that is, 25 bags per acre. The responses from the men looked so uniform that I became suspicious. I compared them with the responses of the wives of these men, and their responses were all different. To decide which was right, I tactfully verified with an older son or daughter whenever possible. It is important to acknowledge that certain psychological factors induce incorrect responses, and great care must be taken to design a study that minimizes their effect.

Knowing the study purpose

Knowing why a study is being conducted may create incorrect responses. A classic example is the question: What is your income? If a government agency is asking, a different figure may be provided than the respondent would give on an application for a home mortgage. One way to guard against such bias is to camouflage the study's goals. Another remedy is to make the questions very specific, allowing no room for personal interpretation. For example, "Where are you employed?" could be followed by "What is your salary?" and "Do you have any extra jobs?" A sequence of such questions may produce more accurate information.

Induced bias

Finally, it should be noted that the personal prejudices of either the designer of the study or the data collector may tend to induce bias. In designing a questionnaire, questions may be slanted in such a way that a particular response will be obtained even though it is inaccurate. In an experiment, likewise, an agronomist may apply fertilizer to certain key plots, knowing that they will provide more favourable yields than others. To protect against induced bias, the advice of an individual trained in statistics should be sought in the design, and someone else aware of research pitfalls should serve in an auditing capacity.