Testing of hypothesis and Prerequisite Knowledge

A hypothesis is a statement or assumption about a population parameter (e.g., mean, proportion) that can be tested using statistical methods. It is the foundation of hypothesis testing, which determines whether there is enough statistical evidence in a sample to infer a conclusion about the entire population.

Karl Popper

A conjecture is a statement that has not yet been proved. According to Karl Popper, it is easier to disprove a conjecture with empirical evidence than to prove it. The conjecture is called the Null Hypothesis and its opposite is called the Alternative Hypothesis.

Attributes

Population: In hypothesis testing, the entire set of subjects about which a conclusion is to be drawn is called the population.
Sampling: Selecting a reduced subset of the population data, called a sample.
- It is widely used by governments, industries, etc. when data on the whole population is too hard to collect, so a sample is collected instead.
$H_{0}$ : Null Hypothesis
$H_{1}$ : Alternative Hypothesis

graph LR
    A(Formulate Hypothesis) --> B(Collect Sample Data)
    B --> C(Analyze Data)
    C --> D{Is there sufficient evidence?}
    D -- Yes --> E(Reject Null Hypothesis)
    D -- No --> F(Fail to Reject Null Hypothesis)
    E --> G(Make an Inference About the Population)
    F --> G

Surveys

Surveys are used to collect data from a population.
The data is collected from a sample of the population.
The data is then analyzed to make a statement about the population.

Parameters & Statistics

By convention, Greek letters denote population measures and Latin letters denote sample measures. Population measures such as the mean ( $μ$ ) and variance ( $σ^{2}$ ) are called parameters. Sample measures such as the mean ( $\bar{x}$ ) and variance ( $s^{2}$ ) are called statistics.

Statistical Hypothesis

Hypothesis
- A new drug significantly reducing blood pressure.
Null Hypothesis
- Definition
  - A definite statement about a population parameter which is tested for possible rejection under the assumption that it is true. It is usually a hypothesis of no difference. It is denoted by $H_{0}$ .
- The new drug does not reduce blood pressure compared to a placebo.
Testing Process
- Researchers would conduct a clinical trial. If the data show a statistically significant decrease in blood pressure in the drug group, the null hypothesis is rejected in favor of the alternative. Otherwise we fail to reject the null hypothesis.
Alternative Hypothesis
- Any hypothesis that is complementary to the null hypothesis is called an alternative hypothesis and is denoted by $H_{1}$ .

Types of errors

Type 1 Error
- Rejecting a true null hypothesis. The probability of making a type 1 error is denoted by $α$ : $P(\text{Rejecting } H_{0} \mid H_{0} \text{ is true}) = α$
  - Example
    - Convicting an innocent person.
    - 100 phones were produced and 10 were sampled for testing. One defective phone was found in the sample, so the lot was rejected even though it was acceptable: a type 1 error.
Type 2 Error
- Accepting a false null hypothesis. The probability of making a type 2 error is denoted by $β$ : $P(\text{Accepting } H_{0} \mid H_{1} \text{ is true}) = β$
  - Example
    - Acquitting a guilty person.
    - 100 phones were produced and 10 were sampled for testing. No defective phones were found in the sample, so the lot was accepted even though it was not acceptable: a type 2 error.
$α$ and $β$ are referred to as Producer’s Risk and Consumer’s Risk respectively.
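The meaning of $α$ as a long-run error rate can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the function name `type1_error_rate`, the sample size, the number of trials and the seed are all hypothetical, and it uses the two-tailed Z statistic that these notes introduce later. It repeatedly samples from a population where $H_{0}$ is actually true and counts how often $H_{0}$ is rejected anyway.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(trials=10_000, n=30, mu0=0.0, sigma=1.0, alpha=0.05, seed=0):
    """Estimate how often a true H0 (mu = mu0) is rejected by a two-tailed Z test."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a type 1 error: H0 is true but was rejected
    return rejections / trials


rate = type1_error_rate()
print(f"Observed type 1 error rate: {rate:.3f}")
```

With enough trials the observed rate should settle close to the chosen $α$.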

Example Problems

Example 1

Claim: the average marks of boys are not the same as the average marks of girls. Let the average marks of boys be $μ_{1}$ and the average marks of girls be $μ_{2}$ .
$H_{0} : μ_{1} = μ_{2}$
$H_{1} : μ_{1} \neq μ_{2}$

Example 2

Claim: the average height of boys is more than the average height of girls. Let the average height of boys be $μ_{1}$ and the average height of girls be $μ_{2}$ .
$H_{0} : μ_{1} = μ_{2}$
$H_{1} : μ_{1} > μ_{2}$

One Tailed & Two Tailed Test

Suppose we draw a sample of size $n_{1}$ with mean $\bar{x}_{1}$ , then a different sample of size $n_{2}$ with mean $\bar{x}_{2}$ , and so on for further samples.

$H_{0}$ : $μ = μ_{0}$
- $H_{1}$ : $μ > μ_{0}$ Right Tailed Test
- $H_{1}$ : $μ < μ_{0}$ Left Tailed Test
Two Tailed $H_{0}$ : $μ = μ_{0}$
- Against $H_{1}$ : $μ \neq μ_{0}$

Level of significance

The level of significance, denoted by $α$ , is the probability of rejecting a true null hypothesis: $P(\text{Rejecting } H_{0} \mid H_{0} \text{ is true}) = α$ .

Once $α$ is fixed, we can look up the critical value $Z_{α}$ in the Z table. A $Z$ value is the number of standard deviations a value lies from the mean, and $Z_{α}$ marks the boundary of the rejection region.

Example

If $α = 0.05$ , the Z table gives $Z_{α} = 1.96$ for a two-tailed test.
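The table lookup can also be reproduced in code. The sketch below assumes Python's standard-library `statistics.NormalDist`. The helper name `z_critical` and its `two_tailed` flag are illustrative choices and not part of the notes.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Positive critical value of the standard normal for a given alpha."""
    # A two-tailed test splits alpha equally between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


z_two = z_critical(0.05)                    # two-tailed
z_one = z_critical(0.05, two_tailed=False)  # one-tailed
print(round(z_two, 2), round(z_one, 3))     # 1.96 1.645
```

The two-tailed value matches the 1.96 read from the Z table. A one-tailed test at the same $α$ uses the smaller value 1.645.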

Confidence Interval

The confidence interval is the range of values within which the true value of the parameter is expected to lie with a certain level of confidence. The confidence level is $1 - α$ , where $α$ is the level of significance. For a population mean with known $σ$ and sample size $n$ , the interval is $(\bar{X} - Z_{α} \frac{σ}{\sqrt{n}}, \bar{X} + Z_{α} \frac{σ}{\sqrt{n}})$ .
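As a rough illustration, the interval for a mean can be computed as follows. This is a sketch that assumes a known population standard deviation, a two-tailed critical value, and the standard error $σ / \sqrt{n}$ . The function name `confidence_interval` is hypothetical, and the input numbers are borrowed from the height question later in these notes.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    margin = z * sigma / sqrt(n)             # critical value times standard error
    return sample_mean - margin, sample_mean + margin


low, high = confidence_interval(sample_mean=160, sigma=10, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (158.04, 161.96)
```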

Tests of significance Problems

Question 1

Test of significance between the population mean and the sample mean. A sample is considered large if its size is greater than 30 and small if its size is less than 30.

Question 1

Sample size $n$ = 100
Standard deviation $σ$ = 10 cm
Sample mean $\bar{X}$ = 160 cm
Mean height $μ$ = 165 cm

$H_{0} : μ = 165$
$H_{1} : μ \neq 165$ ( Two Tailed Test )

$α = 0.05$ , $Z_{α} = 1.96$

Test statistic: we fail to reject $H_{0}$ if $\lvert Z \rvert < Z_{α}$ and reject it otherwise.

$Z = \frac{\bar{X} - μ}{σ / \sqrt{n}} = \frac{160 - 165}{10 / \sqrt{100}} = - 5$

$\lvert Z \rvert = 5 > 1.96 = Z_{α}$

Conclusion: Reject $H_{0}$
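The same calculation can be sketched in Python. The function name `one_sample_z_test` is an illustrative choice, and the standard error is taken as $σ / \sqrt{n}$ , which is what yields the value $-5$ stated in the question.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0 for a large sample."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return z, z_crit, abs(z) > z_crit


z, z_crit, reject = one_sample_z_test(sample_mean=160, mu0=165, sigma=10, n=100)
print(z, round(z_crit, 2), reject)  # -5.0 1.96 True
```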

Question 2

A random sample of 200 measurements from a large population has a mean of 50 and a standard deviation of 10. Test the hypothesis that the population mean is 52 against the alternative hypothesis that the population mean is not 52. Use a level of significance of 0.05.

$H_{0} : μ = 52$
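The remaining steps are not written out in these notes. A possible completion, following the same procedure as the previous question, is sketched below. It assumes that the sample standard deviation of 10 can stand in for $σ$ because the sample is large, and it takes the alternative hypothesis and the two-tailed critical value 1.96 from the question statement and the level of significance.

$$H_{1} : μ \neq 52$$

$$Z = \frac{\bar{X} - μ}{σ / \sqrt{n}} = \frac{50 - 52}{10 / \sqrt{200}} \approx -2.83$$

Since $\lvert Z \rvert \approx 2.83 > 1.96$ , this working would reject $H_{0}$ at the 0.05 level.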

T Test

If the sample size is less than 30, the sample is considered small and the test statistic is calculated using the t-distribution with $n - 1$ degrees of freedom. $T_{α}$ is the t value for the level of significance $α$ and $n - 1$ degrees of freedom.

$$\bar{x} = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n}$$

When $\bar{x}$ is known, only $n - 1$ of the values are free to vary, because the remaining one is determined by the others. Thus the degrees of freedom are $n - 1$ .

$$T = \frac{\bar{X} - μ}{s / \sqrt{n - 1}}$$

Properties of t-distribution

The t-distribution is symmetric about the mean.
The t-distribution has a mean of 0.
The t-distribution is more spread out than the standard normal distribution.

$$T = \frac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}}$$

Problems Small Dataset

Question 1

A machine solves a problem in 1.75 seconds. A new machine is introduced and the time it takes to solve the problem is 1.85 seconds. The standard deviation is 0.1. Test the hypothesis that the new machine is inferior to the old machine. Use a level of significance of 0.05.

$n = 10$ , $\bar{X} = 1.85$ , $s = 0.1$
$H_{0} : μ = 1.75 \to$ Machine is not inferior
$H_{1} : μ \neq 1.75 \to$ Machine is inferior ( Two Tailed Test )
$α = 0.05$ , $df = n - 1 = 9$ , $T_{α} = 2.262$

$$T = \frac{1.85 - 1.75}{0.1 / \sqrt{9}} = 3$$

Since $T = 3 > T_{α} = 2.262$ , we reject $H_{0}$ .
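The arithmetic for this small-sample test can be checked with a short sketch. It assumes the standard error $s / \sqrt{n - 1}$ , which is what gives the value 3 in this question, and it takes the critical value 2.262 directly from the notes instead of computing it. The function name `one_sample_t` is hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic using the standard error s / sqrt(n - 1)."""
    return (sample_mean - mu0) / (s / sqrt(n - 1))


T_CRITICAL = 2.262  # two-tailed value for alpha = 0.05, df = 9, as given in the notes

t = one_sample_t(sample_mean=1.85, mu0=1.75, s=0.1, n=10)
reject = abs(t) > T_CRITICAL
print(round(t, 3), reject)  # 3.0 True
```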

Question 2

A certain injection is administered will it always

$n = 12$ , $\bar{X} = 2.4167$ , $σ = 3.09$

$H_{0} : μ = 0 \to$ There is no significant difference
$H_{1} : μ > 0 \to$ There is a significant difference

TOS of difference between two large sample means

Testing the significance of the difference between two large sample means. We now have two values of $\bar{X}$ , two values of $σ$ and two values of $n$ , and we calculate a $Z$ value from the two samples. Suppose student 1 is asked to collect a sample of college students' marks and student 2 is asked to collect another sample. The standard deviation will remain the same, because student 1 and student 2 are sampling from the same population.

When the samples are large, the test statistic follows the standard normal distribution. When the samples are small, it follows the t-distribution. The assumption is made on the basis of sample size.

Cases:

Case 1: $σ_{1} = σ_{2}$ and known
Case 2: $σ_{1} = σ_{2}$ and unknown
Case 3: $σ_{1} \neq σ_{2}$ and known
Case 4: $σ_{1} \neq σ_{2}$ and unknown

Formulas:

Case 4 Formula:

$$z = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}} \sim N (0, 1)$$

Case 3 Formula:

$$z = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}}$$

Case 1 Formula:

$$z = \frac{\bar{x}_{1} - \bar{x}_{2}}{σ \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$$

Case 2 Formula:

$$z = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{s^{2}}{n_{1}} + \frac{s^{2}}{n_{2}}}}$$

Test of significance of difference between two sample means ( Small Samples )

$t = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{n_{1} s_{1}^{2} + n_{2} s_{2}^{2}}{n_{1} + n_{2} - 2} \left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}$

Questions

Samples of two types of electric bulbs are given:

| | Size | Mean | Standard Deviation |
|---|---|---|---|
| Sample 1 | 8 | 1214 | 36 |
| Sample 2 | 7 | 1036 | |

Questions based on TOS

The average marks scored by 32 boys is 72 with an SD of 8, while that for 36 girls is 70 with an SD of 6. Test at the 1% level of significance (LOS) whether the boys perform better than the girls.
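One way to work this question is sketched below. It assumes the Case 4 situation (unequal, unknown standard deviations estimated by the sample values), a right-tailed alternative $H_{1} : μ_{1} > μ_{2}$ , and a standard error with a square root over the sum of the two variance terms. The function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, s1, s2, n1, n2):
    """Z statistic for the difference between two large-sample means."""
    return (mean1 - mean2) / sqrt(s1**2 / n1 + s2**2 / n2)


# Boys: n = 32, mean = 72, SD = 8. Girls: n = 36, mean = 70, SD = 6.
z = two_sample_z(mean1=72, mean2=70, s1=8, s2=6, n1=32, n2=36)

# Right-tailed test at the 1% level of significance.
z_crit = NormalDist().inv_cdf(1 - 0.01)
reject = z > z_crit
print(round(z, 3), round(z_crit, 3), reject)  # 1.155 2.326 False
```

Under these assumptions the statistic falls short of the critical value, so the data would not support the claim that boys perform better at the 1% level.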

Paired Testing

When there are two different measurements on the same sample, we can use paired testing. Take the example of students sitting a first exam and a second exam. Let $x_{1}$ be the marks in the first exam and $x_{2}$ the marks in the second exam, and let $d = x_{1} - x_{2}$ be the difference between the two exams.

$\bar{d} = \frac{1}{n} \sum (x_{1} - x_{2})$

$s^{2} = var(d) = \frac{1}{n} \sum (d - \bar{d})^{2}$ or, equivalently, $s^{2} = \frac{1}{n} \sum d^{2} - \bar{d}^{2}$

The test statistic is given by $t = \frac{\bar{d}}{s / \sqrt{n - 1}} \sim t(n - 1)$ , that is, the t-distribution with $n - 1$ degrees of freedom.
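A minimal sketch of the paired calculation follows. The marks in `exam1` and `exam2` are made-up numbers used only for illustration, the function name `paired_t` is hypothetical, and the variance of the differences is computed with divisor $n$ so that the standard error is $s / \sqrt{n - 1}$ .

```python
from math import sqrt


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched lists."""
    n = len(first)
    d = [a - b for a, b in zip(first, second)]   # per-subject differences
    d_bar = sum(d) / n                           # mean difference
    s2 = sum((x - d_bar) ** 2 for x in d) / n    # variance of d with divisor n
    t = d_bar / (sqrt(s2) / sqrt(n - 1))         # fails if all differences are equal
    return t, n - 1


# Hypothetical marks for five students in two exams.
exam1 = [60, 65, 70, 58, 75]
exam2 = [62, 70, 69, 63, 80]

t, df = paired_t(exam1, exam2)
print(round(t, 4), df)  # -2.6667 4
```

The resulting statistic would then be compared with the t table value for $n - 1$ degrees of freedom at the chosen level of significance.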

F Test

We now move from comparing means to comparing variances. Proportions cannot be compared for very small samples, so for small samples we compare variances using the F test. The F test is a test of significance of the difference between two small-sample variances. For this we use the F test statistic and the F distribution table. The test is always right tailed, and F is defined only for positive values.

Step 1: Sample size ( the F test is only for small samples )
Step 2: $H_{0} : σ_{1}^{2} = σ_{2}^{2}$
Step 3: $H_{1} : σ_{1}^{2} > σ_{2}^{2}$

The F test statistic is calculated as

$$F = \frac{S_{1}^{2}}{S_{2}^{2}}$$

Where $S_{1}^{2}$ and $S_{2}^{2}$ are the sample variances of the two samples. If the calculated value of F is greater than the critical value of F then we reject the null hypothesis. If the calculated value of F is less than the critical value of F then we fail to reject the null hypothesis.

The larger variance is taken as the numerator and labelled $S_{1}^{2}$ . The degrees of freedom for the F distribution are $n_{1} - 1$ and $n_{2} - 1$ , where $n_{1}$ and $n_{2}$ are the sample sizes of the two samples. A different value of $α$ requires a different F table.

Step 4: Critical value

Example: $n_{1} = 8$ , $n_{2} = 7$ , $S_{1}^{2} = 2.059$ , $S_{2}^{2} = 10.10$

$F = \frac{10.10}{2.059} \approx 4.905$

When the sample variances are computed with divisor $n$ , the estimates used in the ratio are

$\hat{S}_{1}^{2} = \frac{n_{1} S_{1}^{2}}{n_{1} - 1}$ , $\hat{S}_{2}^{2} = \frac{n_{2} S_{2}^{2}}{n_{2} - 1}$

$F = \frac{\hat{S}_{1}^{2}}{\hat{S}_{2}^{2}}$

Step 5: Conclusion

Notice that the larger variance is written in the numerator.

Questions on F Test

A company wants to compare the variability in the productivity of two machines A and B. The company takes a sample of 10 observations from machine A and 12 observations from machine B. The sample variances are 4.5 and 2.5 respectively. Test the hypothesis that the variability in the productivity of the two machines is the same at 5% level of significance.
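The F statistic for this question can be sketched as below. This is an illustration under assumptions: the given sample variances are treated as computed with divisor $n$ and are converted to estimates with divisor $n - 1$ as in the steps above, the larger estimate is placed in the numerator, and the function name `f_statistic` is hypothetical. The critical value is not computed here and would still be read from an F table.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """F ratio with the larger variance estimate on top, plus its degrees of freedom."""
    # Convert divisor-n sample variances to divisor-(n - 1) estimates.
    est_a = n_a * var_a / (n_a - 1)
    est_b = n_b * var_b / (n_b - 1)
    if est_a >= est_b:
        return est_a / est_b, n_a - 1, n_b - 1
    return est_b / est_a, n_b - 1, n_a - 1


# Machine A: 10 observations, variance 4.5. Machine B: 12 observations, variance 2.5.
f, df1, df2 = f_statistic(var_a=4.5, n_a=10, var_b=2.5, n_b=12)
print(round(f, 4), df1, df2)  # 1.8333 9 11
```

The computed F would be compared with the 5% table value for (9, 11) degrees of freedom, and $H_{0}$ is rejected only if F exceeds it.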

References

Information

date: 2025.03.12
time: 14:05

🪴 TJ's Notes 1.0
