Write a Python code of Inferential Statistics with a Hypothesis Test in Jupyter Notebook

What is Inferential Statistics?

Inferential statistics is a branch of statistics that allows us to draw conclusions about a population based on a sample of data from that population. While descriptive statistics focuses on summarizing the features of a dataset (like mean, median, and standard deviation), inferential statistics helps to answer broader questions and make predictions.
The two major concepts involved in inferential statistics are:
1. Estimation: Estimating population parameters (such as population mean or proportion) from sample statistics.
2. Hypothesis Testing: Testing an assumption or hypothesis about a population parameter.
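To make the first idea concrete, here is a minimal sketch of estimation: a point estimate of the population mean, its standard error, and a 95% confidence interval based on the t-distribution. The sample values and variable names are made up for illustration, and the interval assumes the data are roughly normally distributed.

```python
import numpy as np
from scipy import stats

# Illustrative sample (made-up values, not real measurements)
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])

n = len(sample)
point_estimate = sample.mean()                    # estimate of the population mean
standard_error = sample.std(ddof=1) / np.sqrt(n)  # estimated std. dev. of the sample mean

# Critical value for a two-sided 95% interval with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low = point_estimate - t_crit * standard_error
ci_high = point_estimate + t_crit * standard_error

print(f"Point estimate: {point_estimate:.2f}")
print(f"Standard error: {standard_error:.3f}")
print(f"95% confidence interval: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval gives a range of plausible values for the population mean rather than a single number, which is the estimation counterpart to the yes/no decision of a hypothesis test.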

Importance of Inferential Statistics:
  • Helps in decision-making when it's impossible or impractical to collect data from the entire population.
  • Aids in understanding relationships and patterns in data that can be generalized to a broader context.

What is Hypothesis Testing?
Hypothesis testing is a fundamental part of inferential statistics. It’s a method for using sample data to decide whether a claim about a population parameter is supported. In hypothesis testing, we start with two competing hypotheses:
  • Null Hypothesis (H₀): A default position that states there is no effect or difference.
  • Alternative Hypothesis (H₁): A statement that contradicts the null hypothesis, indicating an effect or difference exists.
The steps in hypothesis testing typically include:
1. State the Hypotheses: Formulate the null and alternative hypotheses.
2. Select a Significance Level (α): This is the threshold (e.g., 0.05) that the p-value is compared against. It is also the probability of rejecting a true null hypothesis that we are willing to accept.
3. Choose a Test Statistic: This could be a Z-test, t-test, chi-square test, etc., depending on the data.
4. Calculate the p-value: The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
5. Make a Decision: Compare the p-value to α. If the p-value is less than α, reject the null hypothesis in favor of the alternative.
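The meaning of the p-value in step 4 can be checked with a small simulation. The sketch below, which uses made-up sample values and a fixed random seed, draws many samples from a world in which the null hypothesis is true and counts how often the simulated t-statistic is at least as extreme as the observed one. It assumes normally distributed data; names such as `p_simulated` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Illustrative sample (made-up values) and hypothesized mean
sample = np.array([5.3, 4.9, 5.6, 5.1, 5.4, 4.8, 5.5, 5.2, 5.0, 5.7])
mu_0 = 5.0
n = len(sample)

# Observed t-statistic and the analytic two-sided p-value
t_obs, p_analytic = stats.ttest_1samp(sample, mu_0)

# Simulate many samples for which H0 is true (population mean = mu_0).
# For normal data the t-statistic does not depend on the true standard
# deviation, so scale=1.0 is an arbitrary choice.
n_sims = 20_000
null_samples = rng.normal(loc=mu_0, scale=1.0, size=(n_sims, n))
t_null = (null_samples.mean(axis=1) - mu_0) / (
    null_samples.std(axis=1, ddof=1) / np.sqrt(n)
)

# Share of simulated statistics at least as extreme as the observed one
p_simulated = np.mean(np.abs(t_null) >= abs(t_obs))

print(f"Analytic p-value:  {p_analytic:.4f}")
print(f"Simulated p-value: {p_simulated:.4f}")
```

The two numbers should agree closely, which shows that the p-value describes how unusual the observed statistic would be if the null hypothesis were true, not the probability that the null hypothesis itself is true.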

Example of Hypothesis Test:
Let’s consider an example where we want to test whether the average height of a population differs from a known value. We take a sample from the population, calculate the sample mean, and use a hypothesis test to draw an inference about the population mean.

Python Example: Hypothesis Testing in Jupyter Notebook
Here, we'll use Python to perform a one-sample t-test to test whether the mean of a sample is significantly different from a known population mean.

Example Scenario:
Assume we have a sample of test scores, and we want to determine whether the sample mean differs significantly from a hypothesized population mean of 75.

Python Code Example:
# Import necessary libraries
import numpy as np
from scipy import stats
# Sample data: Test scores of 20 students
sample_data = [78, 72, 79, 71, 74, 77, 70, 75, 76, 73, 79, 80, 81, 72, 74, 77, 78, 73, 71, 76]
# Hypothesized population mean
population_mean = 75
# Perform a one-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample_data, population_mean)
# Print the results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")
# Set a significance level
alpha = 0.05
# Make a decision
if p_value < alpha:
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the sample mean and the population mean.")

Explanation:
1. Sample Data: This is a set of 20 test scores.
2. Hypothesized Population Mean: Under the null hypothesis, the population mean is 75.
3. t-test: We use `ttest_1samp` from `scipy.stats` to perform a one-sample t-test.
4. Decision: We reject the null hypothesis if the p-value is less than our significance level (0.05).

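Output:
The t-statistic and p-value are printed, followed by the decision. For this sample the mean is 75.3, just 0.3 above the hypothesized 75, so the t-statistic is small (about 0.41), the p-value is well above 0.05, and the code prints the "Fail to reject the null hypothesis" message.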
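The value returned by `ttest_1samp` can be reproduced by hand from the formula t = (x̄ − μ₀) / (s / √n), with a two-sided p-value taken from the t-distribution with n − 1 degrees of freedom. The sketch below does this for the same 20 test scores; the variable names `t_manual` and `p_manual` are illustrative and not part of the original example.

```python
import numpy as np
from scipy import stats

# Same test scores and hypothesized mean as in the example
sample_data = [78, 72, 79, 71, 74, 77, 70, 75, 76, 73,
               79, 80, 81, 72, 74, 77, 78, 73, 71, 76]
population_mean = 75

n = len(sample_data)
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)   # sample standard deviation
standard_error = sample_std / np.sqrt(n)

# t-statistic: distance of the sample mean from 75, in standard errors
t_manual = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Compare with SciPy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(sample_data, population_mean)

print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should show the same values, which confirms that the library call is doing nothing more than applying the formula above.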

Conclusion
Inferential statistics is crucial for drawing conclusions about a population from a sample. Hypothesis testing is a core tool within inferential statistics that helps us make decisions based on data. In this blog post, we explored how hypothesis testing works and provided an example using Python to perform a one-sample t-test.

Write a Python code of Inferential Statistics with a Hypothesis Test in Jupyter Notebook

Here’s a complete Python example demonstrating inferential statistics with a hypothesis test using a Jupyter Notebook setup. This example uses the one-sample t-test to test whether the mean of a sample is significantly different from a hypothesized population mean. Let’s assume we want to test if the average height of a group of students differs from a known average height of 170 cm.
Hypothesis:
  • Null Hypothesis (H₀): The average height of students = 170 cm.
  • Alternative Hypothesis (H₁): The average height of students ≠ 170 cm.
The code below can be run in a Jupyter Notebook:
# Import necessary libraries
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Generate sample data: Heights of 30 students (in cm)
np.random.seed(0)  # Set seed for reproducibility
sample_heights = np.random.normal(loc=172, scale=5, size=30)  # Mean = 172, StdDev = 5
# Hypothesized population mean
population_mean = 170
# Descriptive statistics of the sample
sample_mean = np.mean(sample_heights)
sample_std_dev = np.std(sample_heights, ddof=1)  # Using Bessel's correction (ddof=1)
sample_size = len(sample_heights)
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Sample Standard Deviation: {sample_std_dev:.2f}")
print(f"Sample Size: {sample_size}")
# Perform a one-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample_heights, population_mean)
# Print the results of the t-test
print(f"\nT-statistic: {t_statistic:.2f}")
print(f"P-value: {p_value:.4f}")
# Set a significance level
alpha = 0.05
# Make a decision based on the p-value
if p_value < alpha:
    print("\nReject the null hypothesis: The average height is significantly different from 170 cm.")
else:
    print("\nFail to reject the null hypothesis: There is no significant difference between the sample mean and 170 cm.")
# Visualize the sample data
plt.figure(figsize=(8, 5))
plt.hist(sample_heights, bins=8, color='skyblue', edgecolor='black')
plt.axvline(population_mean, color='red', linestyle='dashed', linewidth=2, label='Population Mean (170 cm)')
plt.axvline(sample_mean, color='green', linestyle='solid', linewidth=2, label=f'Sample Mean ({sample_mean:.2f} cm)')
plt.xlabel('Height (cm)')
plt.ylabel('Frequency')
plt.title('Distribution of Sample Heights')
plt.legend()
plt.show()

Explanation:
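1. Generate Sample Data: We draw a synthetic dataset of 30 student heights from a normal distribution with a mean of 172 cm and a standard deviation of 5 cm. The sample's own mean and standard deviation will differ somewhat from these values.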
2. Hypothesis Setup: The null hypothesis is that the mean height is 170 cm.
3. Perform t-test: Using the `ttest_1samp` function from `scipy.stats`, we perform a one-sample t-test.
4. Decision Based on p-value: We compare the p-value to the significance level (0.05) and decide whether to reject the null hypothesis.
5. Visualization: We plot a histogram of the sample data to visually compare the sample mean and the hypothesized population mean.

Output:
This code will print:
  • The sample mean, standard deviation, and size.
  • The t-statistic and p-value.
  • A decision on whether to reject the null hypothesis based on the p-value.
Additionally, the histogram will show the distribution of heights with lines marking the population mean and sample mean, providing a visual representation of the test.
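Because the heights are randomly generated, the outcome of the test depends on which sample happens to be drawn. As a rough illustration, the sketch below repeats the same experiment many times with the same settings (true mean 172, standard deviation 5, 30 students) and reports how often the null hypothesis of 170 cm is rejected. It uses NumPy's `default_rng` generator, which produces a different random stream from `np.random.seed(0)`, and names such as `n_repeats` and `rejection_rate` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded generator for repeatability

population_mean = 170                # hypothesized mean under H0
true_mean, true_sd, n = 172, 5, 30   # same generating settings as the example
alpha = 0.05
n_repeats = 2_000                    # number of synthetic samples (arbitrary)

# Each row is one synthetic sample of 30 heights
samples = rng.normal(loc=true_mean, scale=true_sd, size=(n_repeats, n))

# One one-sample t-test per row
t_stats, p_values = stats.ttest_1samp(samples, population_mean, axis=1)

# Share of samples in which H0 (mean = 170 cm) is rejected
rejection_rate = np.mean(p_values < alpha)
print(f"Share of samples that reject H0: {rejection_rate:.2f}")
```

Since the true mean differs from 170 cm by only 2 cm, not every sample of 30 students leads to a rejection. The single seeded sample in the example is one draw among many possible ones.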
