Introduction to Sampling Techniques | Sampling Method Types & Techniques

Sampling Techniques: Introduction
Sampling
Different types of Sampling techniques
Choosing Between Probability and Non-Probability Samples
Probability Sampling
Non-probability sampling

Contributed by: Sreekanth Tadakaluru
LinkedIn Profile: https://www.linkedin.com/in/sreekanth-tadakaluru-3301649b/

Introduction:

Let’s take the example of COVID-19 vaccine clinical trials. It is very difficult to conduct trials on the entire population because of the time, money, and resources required. In research methodology, sampling helps researchers infer information about a population from the results for a subset of that population, without having to investigate every individual.

Suppose a telecom company plans to build a machine learning model to predict which customers will churn from its network. One way is to collect information on all customers and build the prediction model on that. This requires high computational power and resources. A more practical way is to take a sample (a subset of customers) that represents the population (all customers) and build the machine learning model on it. This saves money and effort.

Sampling:

Sampling is the process of selecting a group of individuals from a population to study them and characterize the population as a whole.

The population includes all members from a specified group, all possible outcomes or measurements that are of interest. The exact population will depend on the scope of the study.

The sample consists of some observations drawn from the population, so it is a subset of the population. The sample is the group of elements that participated in the study.

The sampling frame is the list or other source that identifies the units of the population from which the sample is drawn.

A good sample should satisfy the below conditions-

Representativeness: The sample should be the best representative of the population under study.
Accuracy: Accuracy is defined as the degree to which bias is absent from the sample. An accurate (unbiased) sample is one that exactly represents the population.
Size: A good sample must be adequate in size and reliability.


Different types of Sampling techniques:

There are several different sampling techniques available, and they can be subdivided into two groups-

1. Probability sampling involves random selection, allowing you to make statistical inferences about the whole group.

There are four types of probability sampling techniques

Simple random sampling
Cluster sampling
Systematic sampling
Stratified random sampling

2. Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect initial data. There are four types of Non-probability sampling techniques.

Convenience sampling
Judgmental or purposive sampling
Snowball sampling
Quota sampling

Choosing Between Probability and Non-Probability Samples

The choice between using a probability or a non-probability approach to sampling depends on a variety of factors:

Objectives and scope of the study
Method of data collection
Precision of the results
Availability of a sampling frame and resources required to maintain the frame
Availability of extra information about the members of the population

Probability Sampling

Probability sampling is normally preferred when conducting major studies, especially when a population frame is available, ensuring that we can select and contact each unit in the population. Probability sampling allows us to quantify the standard error of estimates, confidence intervals to be formed and hypotheses to be formally tested.
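As a minimal sketch of what quantifying the standard error looks like in practice, the Python snippet below estimates a population mean from a simple random sample and forms an approximate 95% confidence interval. The population values, the sample size of 100 and the function name `mean_with_ci` are all hypothetical, and the interval relies on the normal approximation (z = 1.96) without a finite population correction.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return (mean, standard_error, (low, high)) for a simple random sample.

    Assumes the normal approximation holds and ignores the finite
    population correction.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(sample)
    se = sd / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)


rng = random.Random(0)
# Hypothetical population of 10,000 values, for illustration only.
population = [rng.gauss(50, 10) for _ in range(10_000)]
sample = rng.sample(population, 100)  # simple random sample without replacement

mean, se, (low, high) = mean_with_ci(sample)
print(f"mean={mean:.2f}  se={se:.2f}  95% CI=({low:.2f}, {high:.2f})")
```

This kind of interval is only justified because every unit had a known chance of selection; the same arithmetic applied to a non-probability sample would not carry the same meaning.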

The main disadvantages are the costs involved in the survey and the bias that can still enter when selecting the sample, for example if the frame is incomplete.

Simple random sampling

In Simple Random Sampling, each observation in the population is given an equal probability of selection, and every possible sample of a given size has the same probability of being selected. One possible method of selecting a simple random sample is to number each unit on the sampling frame sequentially and make the selections by generating numbers from a random number generator.

Simple random sampling can select units either with or without replacement. Sampling with replacement allows a unit to be selected multiple times, whilst sampling without replacement allows a unit to be selected only once. Sampling without replacement is the most commonly used method.

Ex: Suppose a sample of 20 needs to be collected from a population of 100. Assign a unique number to each member of the population and randomly select 20 members with a random number generator. The train and test split in ML problems works the same way.
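The example above can be sketched in Python with the standard library. The function names and the seed are illustrative assumptions; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; each unit can appear at most once."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def simple_random_sample_with_replacement(population, k, seed=None):
    """Draw k units with replacement; the same unit may appear several times."""
    rng = random.Random(seed)
    return rng.choices(population, k=k)


# Number the 100 population members 1..100, then pick 20 of them.
members = list(range(1, 101))
chosen = simple_random_sample(members, 20, seed=42)
print(sorted(chosen))
```

Fixing the seed makes the draw reproducible, which is useful when a sample has to be audited or regenerated later.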


Applications

Train and test split in machine learning problems
Lottery methods

Advantages

Minimum sampling bias as the samples are collected randomly
Selection of samples is simple as random generators are used
The results can be generalized due to representativeness

Disadvantages

A complete list of all potential respondents is needed, which can be costly and time consuming to obtain
Larger sample sizes

Systematic sampling

In systematic random sampling, the researcher first randomly picks the first item from the population and then selects every nth item from the list. The procedure is very easy and can be done manually. The results are representative of the population unless certain characteristics of the population repeat at every nth individual.

Steps in selecting a systematic random sample:

Calculate the sampling interval (the number of observations in the population divided by the number of observations needed for the sample)
Select a random start between 1 and the sampling interval
Repeatedly add the sampling interval to select subsequent units

Ex: Suppose a sample of 20 needs to be collected from a population of 100. Divide the population into 20 groups of (100/20) = 5 members each. Select a random member from the first group, then take every 5th member after it.
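The steps above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the population is an ordered list whose length is at least the sample size; when the population size is not an exact multiple of the sample size, the integer interval leaves a few trailing units that can never be selected.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start in the first interval, then every interval-th unit."""
    interval = len(population) // n  # sampling interval, e.g. 100 // 20 = 5
    if interval < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    # 0-based offset inside the first interval (positions 1..interval in 1-based terms).
    start = rng.randrange(interval)
    return [population[start + i * interval] for i in range(n)]


members = list(range(1, 101))
chosen = systematic_sample(members, 20, seed=7)
print(chosen)
```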

Applications

Quality Control: Systematic sampling is extensively used in manufacturing industries for statistical quality control of their products. Here a sample is obtained by taking an item from the current production stream at regular intervals.
Auditing: When auditing savings accounts, systematic sampling is the most natural way to sample a list of accounts to check compliance with accounting procedures.

Advantages

Cost and time efficient
Spreads the sample more evenly over the population

Disadvantages

The complete population should be known
Sample bias if there are periodic patterns within the dataset

Stratified random sampling

In stratified random sampling, the entire population is divided into multiple non-overlapping, homogeneous groups (strata), and the final members are chosen randomly from the various strata. The groups should be distinct, so that each member belongs to exactly one stratum and every member of a stratum gets an equal opportunity to be selected using simple random sampling.

There are three types of stratified random sampling-

1. Proportionate Stratified Random Sampling

The sample size of each stratum in this technique is proportionate to the population size of the stratum when viewed against the entire population. For example, if you have 3 strata with population sizes of 10, 20 and 30 and the sampling fraction is 0.5, then the random samples from the strata are 5, 10 and 15 respectively.

2. Disproportionate Stratified Random Sampling

The only difference between proportionate and disproportionate stratified random sampling is their sampling fractions. With disproportionate sampling, the different strata have different sampling fractions.

3. Optimal stratified sampling

The sample size taken from each stratum is proportional to the standard deviation of the variable being studied within that stratum, so more variable strata are sampled more heavily.

Ex: A company with 300k employees wants to run an employee satisfaction survey and plans to collect a sample of 1000 employees. The sample should cover all levels of employees and all locations. So create different strata or groups and select the sample from each stratum.
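Proportionate allocation, as in the 10, 20 and 30 strata described earlier, can be sketched in Python as below. The function name, the stratum labels and the member values are illustrative assumptions; each stratum is sampled independently by simple random sampling without replacement.

```python
import random


def proportionate_stratified_sample(strata, fraction, seed=None):
    """Apply the same sampling fraction to every stratum.

    `strata` maps a stratum name to the list of its members.
    Stratum sample sizes are rounded, so the total may differ slightly
    from fraction * population size when the products are not whole numbers.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample


# Three hypothetical strata with 10, 20 and 30 members.
strata = {
    "A": list(range(10)),
    "B": list(range(100, 120)),
    "C": list(range(200, 230)),
}
picked = proportionate_stratified_sample(strata, 0.5, seed=1)
print({name: len(units) for name, units in picked.items()})
# {'A': 5, 'B': 10, 'C': 15}
```

For disproportionate sampling, the single `fraction` would be replaced by one fraction per stratum.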

Advantages

Greater level of representation from all the groups
If there is homogeneity within strata and heterogeneity between strata, the estimates can be more precise than those from a simple random sample of the same size

Disadvantages

Requires the knowledge of strata membership
Might take longer and be more expensive
Complex methodology

Cluster sampling

Cluster sampling divides the population into multiple clusters for research. Researchers then select random groups with a simple random or systematic random sampling technique for data collection and data analysis.

Steps involved in cluster sampling:

Create the clusters from the population data
Use the list of clusters as the sampling frame
Number each cluster
Select the random clusters

After the clusters are selected, either the complete clusters are used for the study or other sampling methods are applied to pick the sample elements from within the clusters.

Ex: A researcher wants to study the academic performance of engineering students at a particular university. The researcher can divide the entire population into engineering colleges (which are the clusters) and randomly pick some clusters for the study.

Types of cluster sampling:

One-stage cluster: In the above example, selecting all the students from the randomly chosen engineering colleges is one-stage cluster sampling
Two-stage cluster: In the same example, picking students from each chosen cluster by random or systematic sampling is two-stage cluster sampling
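The difference between the two types can be sketched in Python using the college example. The college names, student identifiers, cluster counts and function names are all hypothetical; the second stage here uses simple random sampling, though systematic sampling would also fit the description.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly choose clusters, then randomly sample members inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Ten hypothetical colleges (clusters) with 50 students each.
colleges = {
    f"college_{i}": [f"c{i}_student_{j}" for j in range(50)] for i in range(10)
}

one_stage = one_stage_cluster_sample(colleges, 3, seed=0)
two_stage = two_stage_cluster_sample(colleges, 3, 5, seed=0)
print({name: len(s) for name, s in one_stage.items()})
print({name: len(s) for name, s in two_stage.items()})
```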

Advantages

Saves time and money
It is very easy to use from the practical standpoint
Larger sample sizes can be used

Disadvantages

High sampling error
May fail to reflect the diversity in the sampling frame

Non-probability sampling

Non-Probability samples are preferred when accuracy in the results is not important. These are inexpensive, easy to run and no frame is required. If a non-probability sample is carried out carefully, then the bias in the results can be reduced.

The main disadvantage of non-probability sampling is that it is dangerous to make inferences about the whole population.

Convenience sampling

Convenience sampling is the easiest method of sampling: participants are selected based on their availability and willingness to take part in the survey. The results are prone to significant bias as the sample may not be representative of the population.

Applications

Surveys conducted on social networking sites and in offices

Example: Polls conducted on Facebook or YouTube. Only the people who are interested in the poll take part, so the results are prone to significant bias and may not be accurate.

Advantages

It is easy to get the sample
Low cost and participants are readily available

Disadvantages

Can’t generalize the results
Possibility of under or over representation of the population
Significant bias

Quota sampling

This method is mainly used by market researchers. The researchers divide the survey population into mutually exclusive subgroups. These subgroups are selected with respect to certain known features, traits, or interests. Samples from each subgroup are selected by the researcher.

Quota sampling can be divided into two groups-

Controlled quota sampling introduces certain restrictions in order to limit the researcher’s choice of samples
Uncontrolled quota sampling resembles convenience sampling in that the researcher is free to choose the sample group members

Steps involved in Quota Sampling

Divide the population into mutually exclusive subgroups
Identify the proportion of each subgroup in the population
Select the subjects for each subgroup
Ensure the sample is representative of the population

Ex: A painting company wants to research one of its products. The researcher uses quota sampling to pick painters, builders, agents and retail painting shop owners.
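A minimal Python sketch of the painting company example is shown below, assuming respondents are taken in the order they become available until each subgroup's quota is filled (the uncontrolled variant). The quota numbers, respondent records and function name are hypothetical. Note that nothing here is random, which is why no sampling error can be calculated.

```python
def quota_sample(respondents, quotas, group_of):
    """Accept respondents in arrival order until every subgroup quota is filled."""
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample


# Hypothetical respondents in the order the researcher reaches them.
respondents = [
    {"name": "r1", "role": "painter"},
    {"name": "r2", "role": "painter"},
    {"name": "r3", "role": "painter"},
    {"name": "r4", "role": "builder"},
    {"name": "r5", "role": "agent"},
    {"name": "r6", "role": "shop_owner"},
    {"name": "r7", "role": "builder"},
]
quotas = {"painter": 2, "builder": 1, "agent": 1, "shop_owner": 1}

sample = quota_sample(respondents, quotas, group_of=lambda p: p["role"])
print([p["name"] for p in sample])
# ['r1', 'r2', 'r4', 'r5', 'r6']
```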

Advantages

Cost effective
Doesn’t depend on sampling frames
Allows the researchers to sample a subgroup that is of great interest to the study

Disadvantages

Some subgroups may be overrepresented in the sample
Unable to calculate the sampling error
Great potential for researcher bias and the quality of work may suffer due to researcher incompetency and/or lack of experience

Judgement (or Purposive) Sampling

In Judgement (or Purposive) Sampling, a researcher relies on his or her judgment when choosing members of the population to participate in the study. Researchers often believe that they can obtain a representative sample by using sound judgment, which will result in saving time and money.

As the researcher’s knowledge is instrumental in creating the sample in this technique, the results can be accurate when the researcher’s judgment is sound.

Ex: A broadcasting company wants to research one of the TV shows. The researcher has an idea of the target audience and he can choose the members of the population to participate in the study.

Advantages

Cost and time effective sampling method
Allows researchers to approach their target market directly
Almost real-time results

Disadvantages

Vulnerability to errors in judgment by researcher
Low level of reliability and high levels of bias
Inability to generalize research findings

Snowball sampling

This method is commonly used in social sciences when investigating hard-to-reach groups. Existing subjects are asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball. For example, when surveying risk behaviors amongst intravenous drug users, participants may be asked to nominate other users to be interviewed.

This sampling method involves primary data sources nominating other potential primary data sources to be used in the research. So the snowball sampling method is based on referrals from initial subjects to generate additional subjects. Therefore, when applying this sampling method members of the sample group are recruited via chain referral.

There are three patterns of Snowball Sampling-

Linear snowball sampling. Recruit only one subject and the subject provides only one referral
Exponential non-discriminative snowball sampling. Recruit only one subject and the subject provides multiple referrals
Exponential discriminative snowball sampling. Recruit only one subject and the subject provides multiple referrals. But only one subject is picked up from the referrals

Ex: Individuals with rare diseases. If a drug company wants to do research on individuals with a rare disease, it may be difficult to find them. The company can find a few individuals to participate in the study and ask them to refer other individuals from among their contacts.
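Chain referral can be sketched in Python as a walk over a referral map. Everything here is a hypothetical illustration: the `referrals` dictionary stands in for "who each participant can nominate", and the function and parameter names are assumptions. Leaving `max_referrals` as `None` follows every referral (the exponential non-discriminative pattern), while `max_referrals=1` keeps one randomly chosen referral per subject.

```python
import random
from collections import deque


def snowball_sample(referrals, seeds, target_size, max_referrals=None, seed=None):
    """Grow a sample by following referrals from already recruited subjects."""
    rng = random.Random(seed)
    recruited = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        # Contacts of this subject who are not in the sample yet (order kept).
        contacts = list(
            dict.fromkeys(c for c in referrals.get(person, []) if c not in seen)
        )
        if max_referrals is not None and len(contacts) > max_referrals:
            contacts = rng.sample(contacts, max_referrals)
        for contact in contacts:
            if len(recruited) >= target_size:
                break
            seen.add(contact)
            recruited.append(contact)
            queue.append(contact)
    return recruited


# Hypothetical referral map: each participant names the contacts they can refer.
referrals = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4", "p5"],
    "p4": [],
    "p5": ["p6"],
}
print(snowball_sample(referrals, ["p1"], target_size=5))
# ['p1', 'p2', 'p3', 'p4', 'p5']
```

The resulting sample depends entirely on who the first subjects know, which is the source of the bias listed under the disadvantages.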

Advantages

Researchers can reach rare subjects in a particular population
Low-cost and easy to implement
It doesn’t require a recruitment team to recruit the additional subjects

Disadvantages

The sample may not be representative
Sampling bias may occur
Because the sample is likely to be biased, it can be hard to draw conclusions about the larger population with any confidence

Finally,

Reducing sampling error is the major goal of any selection technique.
A sample should be big enough to answer the research question, but not so big that the process of sampling becomes uneconomical.
In general, the larger the sample, the smaller the sampling error and the more reliable the conclusions.
Decide the appropriate sampling method based on the study or use case.
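The link between sample size and sampling error can be made concrete for one common case. Assuming a simple random sample of size n and a population standard deviation σ (symbols introduced here for illustration), the standard error of the sample mean shrinks with the square root of n, so quadrupling the sample size only halves the error:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}_{4n}}{\mathrm{SE}_{n}}
  = \frac{\sigma / \sqrt{4n}}{\sigma / \sqrt{n}}
  = \frac{1}{2}
```

This diminishing return is why a sample can become uneconomical well before it covers the whole population.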

Hope you found this introduction to sampling techniques helpful!
