## Introduction:

Consider COVID-19 vaccine clinical trials. Running a trial on the entire population is impractical because of the time, money, and resources it would take. In research methodology, sampling lets researchers infer information about a population from results on a subset of it, without having to investigate every individual.

Or consider a telecom company planning to build a machine learning model to predict which customers will churn from its network. One way is to collect every customer's information and build the prediction model on all of it, but this requires substantial computational power and resources. A more practical approach is to take a sample (a subset of customers) that represents the population (all customers) and build the model on that. This saves money and effort.

## Sampling:

Sampling is the process of selecting a group of individuals from a population to study them and characterize the population as a whole.

The population includes all members from a specified group, all possible outcomes or measurements that are of interest. The exact population will depend on the scope of the study.

The sample consists of some observations drawn from the population, that is, a subset of the population. It is the group of elements that actually take part in the study.

The sampling frame is the list or other source of information that identifies and locates the units of the population (the universe) from which the sample is drawn.

A good sample should satisfy the following conditions:

1. Representativeness: The sample should represent the population under study as closely as possible.
2. Accuracy: Accuracy is defined as the degree to which bias is absent from the sample. An accurate (unbiased) sample is one that exactly represents the population.
3. Size: A good sample must be large enough to give reliable results.


## Different types of Sampling techniques:

There are several different sampling techniques available, and they can be subdivided into two groups:

1. Probability sampling involves random selection, allowing you to make statistical inferences about the whole group.

There are four types of probability sampling techniques:

• Simple random sampling
• Cluster sampling
• Systematic sampling
• Stratified random sampling

2. Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect initial data. There are four types of non-probability sampling techniques:

• Convenience sampling
• Judgmental or purposive sampling
• Snowball sampling
• Quota sampling

## Choosing Between Probability and Non-Probability Samples

The choice between using a probability or a non-probability approach to sampling depends on a variety of factors:

1. Objectives and scope of the study
2. Method of data collection
3. Precision of the results
4. Availability of a sampling frame and resources required to maintain the frame
5. Availability of extra information about the members of the population

## Probability sampling

Probability sampling is normally preferred when conducting major studies, especially when a population frame is available that lets us select and contact each unit in the population. It allows us to quantify the standard error of estimates, form confidence intervals, and formally test hypotheses.

The main disadvantages are the bias that can arise if the sample is not selected properly and the cost of conducting the survey.
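
As a minimal sketch of what quantifying the standard error and forming a confidence interval can look like, the snippet below estimates a mean from a sample. The function name `mean_confidence_interval` and the measurements are hypothetical, and the interval assumes a simple random sample with a normal approximation (for small samples a t quantile would be more appropriate).

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, standard_error, (lower, upper)) for a sample mean.

    Assumes a simple random sample and a normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical measurements from 8 sampled units
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
mean, se, (low, high) = mean_confidence_interval(sample)
print(f"mean={mean:.2f}, se={se:.3f}, approx. 95% CI=({low:.2f}, {high:.2f})")
```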

## Simple random sampling

In Simple Random Sampling, each observation in the population is given an equal probability of selection, and every possible sample of a given size has the same probability of being selected. One possible method of selecting a simple random sample is to number each unit on the sampling frame sequentially and make the selections by generating numbers from a random number generator.

Simple random sampling can involve the units being selected either with or without replacement. Sampling with replacement allows a unit to be selected multiple times, whilst sampling without replacement allows a unit to be selected only once. Sampling without replacement is the more commonly used method.

Ex: Suppose a sample of 20 needs to be collected from a population of 100. Assign a unique number to each member of the population and randomly select 20 members with a random number generator.
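
The example above can be sketched with Python's standard `random` module. The helper `simple_random_sample` and its `replace` and `seed` parameters are illustrative names, not part of any library; the seed is only there to make the run repeatable.

```python
import random

def simple_random_sample(population, k, replace=False, seed=None):
    """Draw k units from population, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once
        return rng.choices(population, k=k)
    # Without replacement: each unit appears at most once
    return rng.sample(population, k)

population = list(range(1, 101))  # members numbered 1..100
sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```

Passing `replace=True` switches to sampling with replacement, in which case the returned list may contain repeated members.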

#### Applications

1. Train and test split in machine learning problems
2. Lottery methods
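
A train and test split is essentially a simple random sample without replacement: the sampled rows become the test set and the rest become the training set. The sketch below uses only the standard library; `train_test_split_simple` is a hypothetical helper, and in practice libraries such as scikit-learn provide an equivalent `train_test_split` function.

```python
import random

def train_test_split_simple(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)      # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))        # stand-in for 100 labelled records
train, test = train_test_split_simple(rows, test_fraction=0.2, seed=7)
print(len(train), len(test))   # 80 20
```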

#### Advantages

1. Minimal sampling bias, as the samples are collected randomly
2. Selecting the sample is simple, as random number generators are used
3. The results can be generalized because the sample is representative

#### Disadvantages

1. Requires access to every member of the population, which can be costly and time-consuming
2. Larger sample sizes may be needed

## Systematic sampling

In systematic random sampling, the researcher first randomly picks a starting item from the population and then selects every nth item from the list. The procedure is very easy and can be done manually. The results are representative of the population unless some characteristic of the population repeats every nth individual.

#### Steps in selecting a systematic random sample:

1. Calculate the sampling interval (the number of observations in the population divided by the number of observations needed for the sample)
2. Select a random start between 1 and the sampling interval
3. Repeatedly add the sampling interval to select subsequent units

Ex: Suppose a sample of 20 needs to be collected from a population of 100. The sampling interval is 100/20 = 5, so the population is divided into 20 groups of 5 members each. Select a random member from the first group, then take every 5th member after it.
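
The three steps and the example above can be sketched as follows. `systematic_sample` is an illustrative helper, and it assumes the population is held in an ordered list; when the population size is not an exact multiple of the sample size, the interval is rounded down.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start in the first interval, then every k-th unit."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    rng = random.Random(seed)
    interval = len(population) // sample_size   # step 1: sampling interval k
    start = rng.randrange(interval)             # step 2: random start (index 0..k-1)
    # Step 3: repeatedly add the interval to reach subsequent units
    return [population[start + i * interval] for i in range(sample_size)]

population = list(range(1, 101))                # members numbered 1..100
sample = systematic_sample(population, 20, seed=3)
print(sample)
```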

#### Applications

1. Quality control: Systematic sampling is extensively used in manufacturing for statistical quality control of products. A sample is obtained by taking an item from the current production stream at regular intervals.
2. Auditing: When auditing savings accounts, systematic sampling is the most natural way to sample a list of accounts to check compliance with accounting procedures.

#### Advantages

1. Cost and time efficient
2. Spreads the sample more evenly over the population

#### Disadvantages

1. The complete population should be known
2. Sample bias if there are periodic patterns within the dataset

## Stratified random sampling

In stratified random sampling, the entire population is divided into multiple non-overlapping, homogeneous groups (strata), and the final members are chosen randomly from each stratum. The strata must be distinct, so that every member belongs to exactly one stratum and has an equal chance of being selected within it by simple random sampling.

There are three types of stratified random sampling:

1. Proportionate Stratified Random Sampling

The sample size of each stratum in this technique is proportionate to the population size of the stratum when viewed against the entire population. For example, if you have 3 strata with population sizes of 10, 20 and 30 and the sampling fraction is 0.5, then the random samples drawn from the strata are 5, 10 and 15 respectively.
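
The example above can be reproduced with a short sketch. The function `proportionate_stratified_sample` and the stratum labels `A`, `B` and `C` are made up for illustration; each stratum is sampled independently by simple random sampling without replacement.

```python
import random

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction of units from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        n_h = round(len(members) * fraction)   # stratum sample size, as a whole number
        sample[name] = rng.sample(members, n_h)
    return sample

# Three hypothetical strata with 10, 20 and 30 members
strata = {
    "A": [f"A{i}" for i in range(10)],
    "B": [f"B{i}" for i in range(20)],
    "C": [f"C{i}" for i in range(30)],
}
sample = proportionate_stratified_sample(strata, 0.5, seed=1)
print({name: len(units) for name, units in sample.items()})
# {'A': 5, 'B': 10, 'C': 15}
```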

2. Disproportionate Stratified Random Sampling

The only difference between proportionate and disproportionate stratified random sampling is their sampling fractions. With disproportionate sampling, the different strata have different sampling fractions.

3. Optimal stratified sampling

The sample size drawn from each stratum is proportional to the standard deviation of the variable being studied within that stratum, so more variable strata receive larger samples.
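
One common form of this idea is Neyman allocation, where the sample size for a stratum is proportional to the stratum's population size multiplied by its standard deviation. The sketch below assumes that this is the scheme meant here; `neyman_allocation` and the numbers are hypothetical. With equal stratum sizes, as in the example, the allocation is simply proportional to the standard deviations.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_sample):
    """Allocate total_sample across strata in proportion to N_h * S_h."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    # Rounding means the allocations may not sum exactly to total_sample
    return {h: round(total_sample * w / total_weight) for h, w in weights.items()}

sizes = {"A": 100, "B": 100, "C": 100}   # population size of each stratum
sds = {"A": 1.0, "B": 2.0, "C": 3.0}     # assumed standard deviation in each stratum
print(neyman_allocation(sizes, sds, 60))
# {'A': 10, 'B': 20, 'C': 30}
```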

Ex: A company with 300k employees wants to run an employee satisfaction survey and plans to collect a sample of 1000 employees. The sample should contain employees of all levels and from all locations, so the company creates strata (groups) by level and location and selects part of the sample from each stratum.

#### Advantages

1. Greater level of representation from all the groups
2. If there is homogeneity within strata and heterogeneity between strata, the estimates can be as accurate as, or more accurate than, those from a simple random sample of the same size

#### Disadvantages

1. Requires knowledge of strata membership
2. Might take longer and be more expensive
3. Complex methodology

## Cluster sampling

Cluster sampling divides the population into multiple clusters for research. Researchers then randomly select some of these clusters, using simple random or systematic random sampling, for data collection and analysis.

Steps involved in cluster sampling:

1. Create the clusters from the population data
2. Use the list of clusters as the sampling frame
3. Number each cluster
4. Select clusters at random

After selecting the clusters, either the complete clusters are used for the study, or another sampling method is applied to pick the sample elements from within the clusters.

Ex: A researcher wants to study the academic performance of engineering students at a particular university. The researcher can divide the entire population by engineering college (the colleges are the clusters) and randomly pick some clusters for the study.

#### Types of cluster sampling:

1. One-stage cluster: In the example above, selecting all the students from the randomly chosen engineering colleges is one-stage cluster sampling
2. Two-stage cluster: In the same example, picking students from each chosen cluster by random or systematic sampling is two-stage cluster sampling
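
Both variants can be sketched with one function. `cluster_sample`, its `per_cluster` parameter and the college data are hypothetical: leaving `per_cluster` unset keeps every member of the chosen clusters (one-stage), while setting it draws a simple random sample inside each chosen cluster (two-stage).

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose clusters, then take all or some of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random clusters
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)                # one-stage: whole cluster
        else:
            k = min(per_cluster, len(members))
            sample[name] = rng.sample(members, k)       # two-stage: sample within cluster
    return sample

# Five hypothetical colleges with 40 students each
colleges = {f"college_{c}": [f"{c}-student{i}" for i in range(40)] for c in "ABCDE"}
one_stage = cluster_sample(colleges, 2, seed=0)
two_stage = cluster_sample(colleges, 2, per_cluster=5, seed=0)
print({name: len(students) for name, students in one_stage.items()})
print({name: len(students) for name, students in two_stage.items()})
```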

#### Advantages

1. Saves time and money
2. It is very easy to use from a practical standpoint
3. Larger sample sizes can be used

#### Disadvantages

1. High sampling error
2. May fail to reflect the diversity in the sampling frame

## Non-probability sampling

Non-probability samples are preferred when high accuracy in the results is not essential. They are inexpensive and easy to run, and no sampling frame is required. If a non-probability sample is carried out carefully, the bias in the results can be reduced.

The main disadvantage of non-probability sampling is that it is dangerous to make inferences about the whole population from the sample.

## Convenience sampling

Convenience sampling is the easiest method of sampling: participants are selected based on their availability and willingness to take part in the survey. The results are prone to significant bias, as the sample may not be representative of the population.

#### Applications

1. Surveys conducted on social networking sites and in offices

Ex: Polls conducted on Facebook or YouTube. Only the people who are interested in the poll take part, so the results are prone to significant bias and may not be accurate.

#### Advantages

1. It is easy to get the sample
2. Low cost and participants are readily available

#### Disadvantages

1. Can’t generalize the results
2. Possibility of under- or over-representation of the population
3. Significant bias

## Quota sampling

This method is mainly used by market researchers. The researchers divide the survey population into mutually exclusive subgroups. These subgroups are selected with respect to certain known features, traits, or interests. Samples from each subgroup are selected by the researcher.

Quota sampling can be divided into two groups:

1. Controlled quota sampling introduces certain restrictions in order to limit the researcher’s choice of samples.
2. Uncontrolled quota sampling resembles convenience sampling in that the researcher is free to choose the sample group members.

Steps involved in quota sampling:

1. Divide the population into mutually exclusive subgroups
2. Identify the proportion of each subgroup in the population
3. Select the subjects for each subgroup
4. Ensure the sample is representative of the population
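
The steps above can be sketched as follows, assuming respondents are taken in the order they become available (which is what makes the selection non-random). The helpers `quotas_from_proportions` and `quota_sample`, the subgroup names and the proportions are all hypothetical.

```python
def quotas_from_proportions(proportions, total):
    """Turn subgroup proportions into whole-number quotas."""
    return {group: round(share * total) for group, share in proportions.items()}

def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person, group in respondents:
        if remaining.get(group, 0) > 0:
            selected.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break                      # all quotas are full
    return selected

# Assumed subgroup shares in the population and a target sample of 4
quotas = quotas_from_proportions({"painter": 0.5, "builder": 0.25, "agent": 0.25}, 4)
respondents = [
    ("r1", "painter"), ("r2", "painter"), ("r3", "builder"),
    ("r4", "painter"), ("r5", "agent"), ("r6", "builder"), ("r7", "agent"),
]
print(quota_sample(respondents, quotas))
# [('r1', 'painter'), ('r2', 'painter'), ('r3', 'builder'), ('r5', 'agent')]
```

Here `r4` is skipped because the painter quota is already full, and the loop stops at `r5` once every subgroup has reached its quota.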

Ex: A painting company wants to research one of its products. The researcher uses quota sampling to pick painters, builders, agents and retail paint shop owners.

#### Advantages

1. Cost effective
2. Doesn’t depend on sampling frames
3. Allows the researchers to sample a subgroup that is of great interest to the study

#### Disadvantages

1. Some subgroups may be overrepresented in the sample
2. Unable to calculate the sampling error
3. Great potential for researcher bias, and the quality of work may suffer due to researcher incompetence or lack of experience

## Judgement (or Purposive) Sampling

In Judgement (or Purposive) Sampling, a researcher relies on his or her judgment when choosing members of the population to participate in the study. Researchers often believe that they can obtain a representative sample by using sound judgment, which will result in saving time and money.

Because the researcher’s knowledge is instrumental in creating the sample, the results can be highly accurate with a small margin of error, provided that the researcher’s judgment is sound.

Ex: A broadcasting company wants to research one of its TV shows. The researcher has an idea of the target audience and can choose members of the population to participate in the study accordingly.

#### Advantages

1. Cost- and time-effective sampling method
2. Allows researchers to approach their target market directly
3. Almost real-time results

#### Disadvantages

1. Vulnerability to errors in judgment by the researcher
2. Low level of reliability and high levels of bias
3. Inability to generalize research findings

## Snowball sampling

This method is commonly used in social sciences when investigating hard-to-reach groups. Existing subjects are asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball. For example, when surveying risk behaviors amongst intravenous drug users, participants may be asked to nominate other users to be interviewed.

This sampling method involves primary data sources nominating other potential primary data sources to be used in the research. So the snowball sampling method is based on referrals from initial subjects to generate additional subjects. Therefore, when applying this sampling method members of the sample group are recruited via chain referral.

There are three patterns of snowball sampling:

1. Linear snowball sampling: Recruit only one subject, and that subject provides only one referral
2. Exponential non-discriminative snowball sampling: Recruit only one subject, and that subject provides multiple referrals
3. Exponential discriminative snowball sampling: Recruit only one subject, and that subject provides multiple referrals, but only one subject is picked from the referrals
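
Chain referral can be sketched as a walk over a referral network. The function `snowball_sample`, the `referrals` mapping and the names in it are invented for illustration. Setting `referrals_per_subject=1` follows a single referral per subject, which approximates the linear pattern (and the discriminative one, if the researcher always picks the first new referral); leaving it unset follows every referral, as in the exponential non-discriminative pattern.

```python
from collections import deque

def snowball_sample(referrals, seed_subject, max_size, referrals_per_subject=None):
    """Grow a sample by following referrals from one initial subject."""
    sample = [seed_subject]
    seen = {seed_subject}
    queue = deque([seed_subject])
    while queue and len(sample) < max_size:
        subject = queue.popleft()
        # Only people not already in the sample count as new referrals
        new = [p for p in referrals.get(subject, []) if p not in seen]
        if referrals_per_subject is not None:
            new = new[:referrals_per_subject]
        for person in new:
            if len(sample) >= max_size:
                break
            if person not in seen:
                seen.add(person)
                sample.append(person)
                queue.append(person)
    return sample

# Hypothetical referral network: who nominates whom
referrals = {
    "ann": ["bob", "cara"],
    "bob": ["dev", "ann"],
    "cara": ["eli"],
    "dev": [],
    "eli": ["fay"],
}
print(snowball_sample(referrals, "ann", max_size=4, referrals_per_subject=1))
# ['ann', 'bob', 'dev']
print(snowball_sample(referrals, "ann", max_size=6))
# ['ann', 'bob', 'cara', 'dev', 'eli', 'fay']
```

The linear run stops at three subjects because `dev` has nobody to refer, which illustrates how a chain can die out before the target size is reached.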

Ex: Individuals with rare diseases. If a drug company is interested in researching individuals with a rare disease, it may be difficult to find them. The company can find a few individuals to participate in the study and ask them to refer others from among their contacts.

#### Advantages

1. Researchers can reach rare subjects in a particular population
2. Low-cost and easy to implement
3. It doesn’t require a recruitment team to recruit the additional subjects

#### Disadvantages

1. The sample may not be representative
2. Sampling bias may occur
3. Because the sample is likely to be biased, it can be hard to draw conclusions about the larger population with any confidence

Finally,

1. Reducing sampling error is the major goal of any selection technique.
2. A sample should be big enough to answer the research question, but not so big that the process of sampling becomes uneconomical.
3. In general, the larger the sample, the smaller the sampling error and the more reliable the conclusions.
4. Decide the appropriate sampling method based on the study or use case.
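
The relationship between sample size and sampling error in point 3 can be illustrated with a small simulation. This is only a sketch on made-up data: `spread_of_sample_means` is a hypothetical helper that repeatedly draws simple random samples and measures how much the sample means vary, which serves as an empirical stand-in for the sampling error.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions=200, seed=0):
    """Standard deviation of the means of repeated simple random samples."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

# Synthetic population of 10,000 values between 0 and 100
rng = random.Random(123)
population = [rng.uniform(0, 100) for _ in range(10_000)]

for n in (25, 100, 400):
    print(n, round(spread_of_sample_means(population, n), 2))
```

The printed spread should shrink as `n` grows, roughly halving each time the sample size is quadrupled.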

Hope you found this introduction to sampling techniques helpful!
