Sampling (statistics)

This page provides an introductory overview of statistical sampling. For more detailed information, see Sampling (statistics) on Wikipedia.

Definition of sampling
Key terms: Sampling, sampling frame, target population

Sampling is a term used in statistics. It is the process of choosing a representative sample from a target population and collecting data from that sample in order to understand something about the population as a whole. Here is a simple illustration. You have before you a freshly baked homemade cherry pie (the greater whole) and you are pondering the question, “Does it taste delicious?” It looks delicious, it smells delicious, but is it delicious? You take a bite (sample) of the pie (greater whole of many bites), let your taste buds study it, and then make a generalization, otherwise known as an inference, about the whole pie ... mmm this pie is delicious!

The term used in statistical sampling to describe the greater whole is population. A population could be a group of people or any group of objects you are studying (e.g., rocks containing gold, dog biscuits made by x-brand, or all left-handed one-toothed people in the world). Studying whole populations can be complicated, expensive, and time-consuming, so researchers have developed several different ways to sample whatever it is they are studying. Broadly, these sampling techniques are either probability-based (random sampling) or non-probability-based (non-random sampling).

Let's say after the cherry pie experiment you decide to become a researcher. Before you can determine which sampling method to use, you must first decide what your target population will be. From that you would develop a sampling frame, or list, of the set of people from whom data could be collected. This can be a difficult task at times. Then you would probably want to apply one of the following methods for determining what, or who, will be in the sample.

(Simple) random sampling
This doesn’t mean haphazard. It means every left-handed one-toothed person in the sampling frame (list) has an equal chance of being sampled. Usually you would number the names or items on the list and then randomly pick numbers to make up the sample (or use an equivalent electronic process). This method is great for statistical accuracy, but it can be very difficult to carry out in practice. What if you have to travel to Guam, Brazil, Canada, and Germany to sample your left-handed one-toothed people?
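
The "number the list and randomly pick numbers" step can be sketched in a few lines of Python. This is only an illustration: the frame of 1000 numbered people, the sample size of 100, and the function name simple_random_sample are assumptions made up for the example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each with an equal chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: 1000 numbered people.
srs_frame = [f"person_{i}" for i in range(1, 1001)]
srs_sample = simple_random_sample(srs_frame, 100, seed=42)

print(len(srs_sample))  # number of people selected
print(srs_sample[:5])   # first few selected names
```

Because the draw is made without replacement, nobody can be picked twice.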

Systematic (random) sampling
Here random sampling is given a little structure. You decide you want to sample 100 of the 1000 left-handed one-toothed people in your sampling frame. First, a random starting point is picked, and then the rest of the sample is selected at equal intervals from that starting point. Because 1000 / 100 = 10, the interval is 10. So for our example, you would pick a number from 1 to 10, say #8, and then pick every 10th person from that number: persons 8, 18, 28, 38, and so on up to 998 would make up your sample. This method spreads the sample evenly across the list, as long as nothing quirky comes up as an underlying pattern that lines up with the interval, such as every 10th person having lost their teeth by eating taffy.
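
As a rough sketch of the procedure, the snippet below works out the interval and walks the list in Python. Sampling 100 people from a frame of 1000 implies an interval of 1000 / 100 = 10, so the random start falls between 1 and 10. The numbered frame and the function name systematic_sample are assumptions made up for the example.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then every k-th unit."""
    k = len(frame) // n        # sampling interval, e.g. 1000 // 100 = 10
    rng = random.Random(seed)
    start = rng.randint(1, k)  # 1-based random starting position
    # Convert the 1-based start to a 0-based index, step by k, keep n units.
    return frame[start - 1::k][:n]


# Hypothetical sampling frame: people numbered 1 to 1000.
sys_frame = list(range(1, 1001))
sys_sample = systematic_sample(sys_frame, 100, seed=7)

print(sys_sample[:4])   # the first four selected person numbers
print(len(sys_sample))  # number of people selected
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined.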

Stratified sampling
In this method your sampling frame is divided into non-overlapping groups (strata), and then a sample is drawn from each of those groups.

For example, we could group our sampling frame of left-handed one-toothed people according to geographic region. This could give us more precise and more valuable data if the way they are grouped is relevant to what is being studied. So, we might learn not only how left-handed people lose their teeth but also that how they lost them differs between Europe, South America, Asia, and North America.
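
Here is a minimal sketch of stratified sampling in Python, assuming each person in the frame is recorded together with a region. The region counts, the 10% sampling fraction, and the function name stratified_sample are all made up for the example. Drawing the same fraction from every group (proportional allocation) is just one possible choice.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then randomly sample within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # each unit lands in one group

    sample = {}
    for name, units in strata.items():
        # Same fraction from every stratum, but at least one unit per stratum.
        n = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, n)
    return sample


# Hypothetical frame: (person id, region) pairs with invented group sizes.
region_sizes = {"Europe": 400, "South America": 300,
                "Asia": 200, "North America": 100}
strat_frame = [(f"{region}-{i}", region)
               for region, size in region_sizes.items()
               for i in range(size)]

strat_sample = stratified_sample(strat_frame, lambda unit: unit[1], 0.1, seed=1)
for region, people in strat_sample.items():
    print(region, len(people))
```

Every region is guaranteed to appear in the sample, which is what makes comparisons between regions possible.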

Cluster sampling
First the sampling frame is broken up into groups. Then a sample of groups is randomly picked out of all the groups. Finally, the people in those groups are randomly sampled. This cuts down on travel, time and expense.

For example, you would end up only sampling people from Paris, Buenos Aires, Beijing, and Chicago instead of traveling to 40 different towns and cities in ten different countries. The one drawback is that sampling error can be large unless the selected groups, taken together, are about as varied as the population itself. For instance, if all of your groups end up being in Europe, you would lose valuable information and the sample wouldn’t be representative of your population.
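
The two random steps (pick groups, then pick people within those groups) can be sketched as follows. The 40 numbered cities, the 25 people per city, and the function name cluster_sample are assumptions made up for the example.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick some clusters, then randomly pick people inside them."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters (e.g., cities) to visit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: sample people only within the chosen clusters.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: 40 cities with 25 listed people in each.
city_frame = {
    f"city_{c:02d}": [f"city_{c:02d}-person_{p}" for p in range(25)]
    for c in range(40)
}

clu_sample = cluster_sample(city_frame, n_clusters=4, n_per_cluster=10, seed=3)
for city, people in clu_sample.items():
    print(city, len(people))
```

Only 4 of the 40 cities are ever visited, which is where the savings in travel, time, and expense come from.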

Convenience sampling
This is where you pick your sample according to what is available. This is why college students are studied so much… no, truly, it isn’t because they are so strange! Another example is when you see someone on a street corner stopping whoever happens to walk by to do a survey. Convenience sampling is great because… here it comes… it’s convenient, but it is often difficult to generalize from such a sample to the population at large, because the people who happen to be available may differ from everyone else.

Snowball sampling
In this approach the researcher gains the trust of some people in the target population (e.g., ecstasy users), gathers data, and then asks these people to introduce the researcher to other potential participants. So, the sample gradually snowballs to become larger and larger.
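
The way a sample snowballs can be pictured as following introductions through a network of contacts. The sketch below is only an illustration: the referral network, the single starting participant, the cap of five participants, and the function name snowball_sample are all made up for the example.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Start from trusted seeds and follow introductions wave by wave."""
    sample = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)  # participants who have not yet been asked for referrals
    while queue and len(sample) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(sample) < max_size:
                seen.add(contact)       # never recruit the same person twice
                sample.append(contact)
                queue.append(contact)   # they may introduce others later
    return sample


# Hypothetical referral network: who can introduce the researcher to whom.
referral_network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}

snow_sample = snowball_sample(referral_network, seeds=["A"], max_size=5)
print(snow_sample)
```

Note that who ends up in the sample depends entirely on who the first participants know, which is why this is a non-probability method.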