
AIOU B.Ed Solved Educational Statistics (8614) ASSIGNMENT No. 2

 


 

 

Q1: What do you know about the following?

a)    An independent sample t-test.

b)    A paired sample t-test

 

Ans: T-Test:

 

A t-test is a useful statistical technique for comparing the mean values of two data sets obtained from two groups. The comparison tells us whether these data sets are different from each other, how significant the difference is, and whether it could have happened by chance. The statistical significance of a t-test indicates whether or not the difference between the means of the two groups most likely reflects a real difference in the population from which the groups were selected.

 

t-tests are used when there are two groups (e.g. male and female) or two sets of data (e.g. before and after), and the researcher wishes to compare the mean score on some continuous variable.

 

 

Types of T-Test

 

 

There are a number of t-tests available, but two main types, the independent sample t-test and the paired sample t-test, are most commonly used. Let us deal with these types in some detail.


i)                Independent sample t-test

 

The independent sample t-test is used when there are two different independent groups of people and the researcher is interested in comparing their scores. In this case the researcher collects information from two different groups of people on only one occasion.

 

ii)                 Paired sample t-test

 

The paired sample t-test is also called repeated measures. It is used when the researcher is interested in comparing changes in the scores of the same group tested on two different occasions.

 

At this level it is necessary to know some general assumptions regarding the use of the t-test. The first assumption concerns the scale of measurement: it is assumed that the dependent variable is measured at interval or ratio scale. The second assumption is that of a simple random sample, i.e. that the data are collected from a representative, randomly selected portion of the total population. The third assumption is that the data, when plotted, result in a normal, bell-shaped distribution curve. The fourth assumption is that the observations that make up the data must be independent of one another; that is, each observation or measurement must not be influenced by any other observation or measurement. The fifth assumption is that a reasonably large sample size is used; a large sample size means that the distribution of results should approach a normal bell-shaped curve. The final assumption is homogeneity of variance: variance will be homogeneous or equal when the standard deviations of the samples are approximately equal.
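To make these ideas concrete, here is a minimal sketch of both tests in Python using SciPy; the score values are hypothetical and only illustrate the mechanics:

```python
# A minimal sketch of both t-tests using SciPy (hypothetical scores).
from scipy import stats

# Independent sample t-test: two different groups, one occasion.
group_a = [72, 85, 78, 90, 66, 81]   # e.g. one group's scores
group_b = [68, 79, 74, 83, 70, 75]   # e.g. the other group's scores
t_ind, p_ind = stats.ttest_ind(group_a, group_b)
print(f"Independent t-test: t = {t_ind:.3f}, p = {p_ind:.3f}")

# Paired sample t-test: the same group tested on two occasions.
before = [60, 72, 55, 81, 65, 70]    # scores before instruction
after = [66, 78, 59, 85, 71, 74]     # scores after instruction
t_rel, p_rel = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t_rel:.3f}, p = {p_rel:.3f}")
```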


Q.2:  Why do we use regression analysis? Write down the types of regression.                        

 

 

Ans: Regression analysis:

 

A correlation quantifies the degree and direction to which two variables are related. It does not fit a line through the data points, it does not have to think about cause and effect, and it does not matter which of the two variables is called dependent and which is called independent. Regression, on the other hand, finds the best line that predicts the dependent variable from the independent variable. The decision of which variable is called dependent and which independent is an important matter in regression, as we will get a different best-fit line if we exchange the two variables, i.e. dependent to independent and independent to dependent. The line that best predicts the independent variable from the dependent variable will not be the same as the line that predicts the dependent variable from the independent variable.

 

Let us start with the simple case of studying the relationship between two variables X and Y, where Y is the dependent variable and X is the independent variable. We are interested in seeing how various values of the independent variable X predict corresponding values of the dependent variable Y. This statistical technique is called regression analysis. We can say that regression analysis is a technique used to model the dependency of one dependent variable upon one independent variable. The Merriam-Webster online dictionary defines regression as a functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of others. According to Gravetter and Wallnau (2002), regression is a statistical technique for finding the best-fitting straight line for a set of data, and the resulting straight line is called the regression line.

 

Types of Regression

Commonly used types of regression are:

 

i)        Linear Regression

 

 

It is the most commonly used type of regression. In this technique the dependent variable is continuous, the independent variable can be continuous or discrete, and the nature of the regression line is linear. Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line).
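As an illustration, a minimal sketch of fitting a best-fit straight line with SciPy's linregress; the X and Y values are hypothetical:

```python
# A minimal sketch of simple linear regression with SciPy (hypothetical data).
from scipy import stats

x = [1, 2, 3, 4, 5, 6]                 # independent variable X
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]   # dependent variable Y

result = stats.linregress(x, y)
print(f"best-fit line: Y = {result.slope:.2f}X + {result.intercept:.2f}")
print(f"r-squared: {result.rvalue**2:.3f}")
```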

 

ii)      Logistic Regression

 

 

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous (binary) variable. Like all regression analyses, logistic regression is a predictive analysis. It is used to describe and explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio level independent variables.
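A minimal sketch using scikit-learn's LogisticRegression; the hours-studied data and pass/fail outcome are hypothetical:

```python
# A minimal sketch of logistic regression with scikit-learn (hypothetical data).
from sklearn.linear_model import LogisticRegression

X = [[2], [4], [5], [6], [8], [10]]   # hours studied (independent variable)
y = [0, 0, 0, 1, 1, 1]                # pass (1) / fail (0): binary outcome

model = LogisticRegression().fit(X, y)
print(model.predict([[7]]))           # predicted class for 7 hours
print(model.predict_proba([[7]]))     # predicted probabilities for each class
```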


iii)    Polynomial Regression

 

 

It is a form of regression analysis in which the relationship between the independent variable X and the dependent variable Y is modeled as an nth degree polynomial in X. This type of regression fits a non-linear relationship between the values of X and the corresponding values of Y.
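A minimal sketch of fitting an nth degree polynomial (here n = 2) with NumPy's polyfit, on hypothetical data:

```python
# A minimal sketch of polynomial regression with NumPy (hypothetical data).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1, 36.4])  # roughly quadratic in x

coeffs = np.polyfit(x, y, deg=2)   # fit a 2nd-degree polynomial
poly = np.poly1d(coeffs)
print(poly)                        # the fitted polynomial
print(poly(7))                     # predicted Y at x = 7
```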

 

iv)  Stepwise Regression

 

 

It is a method of fitting a regression model in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some pre-specified criteria. The general idea behind this procedure is that we build our regression model from a set of predictor variables by entering and removing predictors, in a stepwise manner, until there is no justifiable reason to enter or remove any more.
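One hedged sketch of such an automatic procedure is forward selection by AIC, shown below using statsmodels; the DataFrame, its response column name, and the AIC criterion are illustrative assumptions, not the only possible pre-specified criteria:

```python
# A sketch of forward stepwise selection by AIC (assumes a pandas
# DataFrame `df` whose columns are the predictors plus one response column).
import pandas as pd
import statsmodels.api as sm

def forward_select(df: pd.DataFrame, response: str) -> list:
    remaining = [c for c in df.columns if c != response]
    selected = []
    best_aic = float("inf")
    while remaining:
        # Try adding each remaining predictor; keep the one that lowers AIC most.
        scores = []
        for cand in remaining:
            X = sm.add_constant(df[selected + [cand]])
            aic = sm.OLS(df[response], X).fit().aic
            scores.append((aic, cand))
        aic, cand = min(scores)
        if aic >= best_aic:
            break                  # no candidate improves the model; stop
        best_aic = aic
        selected.append(cand)
        remaining.remove(cand)
    return selected
```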

 

v)  Ridge Regression

 

 

It is a technique for analyzing multiple regression data that suffer from multicollinearity (i.e. the independent variables are highly correlated). When multicollinearity occurs, least squares estimates are unbiased, but their variances are so large that they may be far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
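A minimal sketch with scikit-learn's Ridge on hypothetical, deliberately correlated predictors; alpha is the tuning parameter that controls the degree of bias:

```python
# A minimal sketch of ridge regression with scikit-learn (hypothetical data).
from sklearn.linear_model import Ridge

# Two highly correlated predictors (multicollinearity).
X = [[1, 1.1], [2, 2.1], [3, 2.9], [4, 4.2], [5, 5.1]]
y = [2, 4, 6, 8, 10]

ridge = Ridge(alpha=1.0).fit(X, y)   # alpha controls the degree of bias added
print(ridge.coef_, ridge.intercept_)
```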


vi)  LASSO Regression

 

 

LASSO stands for Least Absolute Shrinkage and Selection Operator. It is a method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. This type of regression uses shrinkage, where data values are shrunk towards a central point, like the mean.
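A minimal sketch with scikit-learn's Lasso on hypothetical data:

```python
# A minimal sketch of LASSO regression with scikit-learn (hypothetical data).
from sklearn.linear_model import Lasso

X = [[1, 0.5], [2, 1.1], [3, 1.4], [4, 2.1], [5, 2.4]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

lasso = Lasso(alpha=0.1).fit(X, y)   # larger alpha shrinks coefficients harder
print(lasso.coef_)                   # some coefficients may be exactly 0
                                     # (this is the variable selection effect)
```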

 

vii)  Elastic Net Regression

 

 

This type of regression is a hybrid of the lasso and ridge regression techniques. It is useful when there are multiple features which are correlated with one another.
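A minimal sketch with scikit-learn's ElasticNet on hypothetical data; l1_ratio mixes the lasso and ridge penalties:

```python
# A minimal sketch of elastic net regression with scikit-learn (hypothetical data).
from sklearn.linear_model import ElasticNet

X = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
y = [2, 4, 6, 8, 10]

# l1_ratio mixes the two penalties: 1.0 is pure lasso, 0.0 is pure ridge.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_, enet.intercept_)
```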

 

 

 

 

 

Q.3 Write a short note on one way ANOVA. Write down the main assumptions underlying one way ANOVA.

 

Ans: Note on one way ANOVA:

 

 

The one way analysis of variance (ANOVA) is an extension of the independent two-sample t-test. It is a statistical technique by which we can test if three or more means are equal. It tests if the value of a single variable differs significantly among three or more levels of a factor.


We can also say that one way ANOVA is a procedure for testing the hypothesis that K population means are equal, where K ≥ 2. It compares the means of the samples or groups in order to make inferences about the population means. Specifically, it tests the null hypothesis:

 

Ho : µ1 = µ2 = µ3 = ... = µk

 

 

where µ = group mean and k = number of groups. If one way ANOVA yields a statistically significant result, we accept the alternate hypothesis (HA), which states that at least two group means are statistically significantly different from each other. Here it should be kept in mind that one way ANOVA cannot tell which specific groups were statistically significantly different from each other; to determine which specific groups differ, the researcher has to use a post hoc test. As there is only one independent variable or factor in one way ANOVA, it is also called single factor ANOVA. The independent variable has nominal levels or a few ordinal levels. Also, there is only one dependent variable, and hypotheses are formulated about the means of the groups on the dependent variable. The dependent variable differentiates individuals on some quantitative dimension.
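To illustrate, a minimal sketch of one way ANOVA in Python using SciPy's f_oneway with three hypothetical groups:

```python
# A minimal sketch of one way ANOVA with SciPy (three hypothetical groups).
from scipy import stats

group1 = [85, 86, 88, 75, 78, 94]
group2 = [91, 92, 93, 85, 87, 84]
group3 = [79, 78, 88, 94, 92, 85]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# A significant result says at least two group means differ; a post hoc
# test is still needed to find out which ones.
```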

 

Assumptions Underlying the One-Way ANOVA:

There are three main assumptions:

 

 

Assumption of Independence

 

 

According to this assumption the observations are random and independent samples from the populations. The null hypothesis actually states that the samples come from populations that have the same mean. The samples must be random and independent if they are to be representative of the populations. The value of one observation is not related to any other observation; in other words, one individual's score should not provide any clue as to how any other individual will score. That is, one event does not depend on another. Violating the assumption of independence has the most serious consequences: if this assumption is violated, one way ANOVA is an inappropriate statistic.

 

Assumption of Normality

 

 

The distributions of the populations from which the samples are selected are normal. This assumption implies that the dependent variable is normally distributed in each of the groups. One way ANOVA is considered a robust test against the assumption of normality and tolerates violations of this assumption. As regards the normality of grouped data, one way ANOVA can tolerate data that are non-normal (skewed or kurtotic distributions) with only a small effect on the Type I error rate. However, platykurtosis can have a profound effect when group sizes are small.

 

Assumptions of Homogeneity of Variance

 

 

The variances of the distributions in the populations are equal. This assumption provides that the distributions in the populations have the same shapes, means, and variances; that is, they are the same populations. In other words, the variances on the dependent variable are equal across the groups.
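This assumption can be checked before running the ANOVA, for example with Levene's test; a minimal sketch using SciPy with hypothetical groups:

```python
# A minimal sketch of checking homogeneity of variance with Levene's test
# (hypothetical groups; SciPy).
from scipy import stats

group1 = [85, 86, 88, 75, 78, 94]
group2 = [91, 92, 93, 85, 87, 84]
group3 = [79, 78, 88, 94, 92, 85]

stat, p = stats.levene(group1, group2, group3)
print(f"Levene's test: W = {stat:.3f}, p = {p:.3f}")
# A non-significant p (e.g. p > .05) is consistent with equal variances.
```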


 

 

Q.4: What do you know about the chi-square (χ2) goodness of fit test? Write down the procedure for the goodness of fit test.

Ans: Chi-square (χ2) goodness of fit test:

 

The chi-square (χ2) goodness of fit test (commonly referred to as the one-sample chi-square) is the most commonly used goodness of fit test. It explores the proportion of cases that fall into the various categories of a single variable, and compares these with hypothesized values. In simple words, it is used to find out whether the observed value of a given phenomenon is significantly different from the expected value. We can also say that it is used to test if sample data fit a distribution from a certain population.

 

The chi-square (χ2) statistic is commonly used for testing relationships between categorical variables. It is intended to test how likely it is that an observed difference is due to chance. In most situations it can be used as a quick test of significance. In this unit you will study this important technique in detail.

 

In other words, we can say that the chi-square goodness of fit test tells us whether the sample data represent the data we expect to find in the actual population; it tells us whether the sample data are consistent with a hypothesized distribution. This is a variation of the more general chi-square test. The setting for this test is a single categorical variable that can have many levels. In the chi-square goodness of fit test the sample data are divided into intervals; then the numbers of points that fall into each interval are compared with the expected numbers of points in each interval. The null hypothesis for the chi-square goodness of fit test is that the data come from the specified distribution; the alternate hypothesis is that the data do not come from the specified distribution. The formula for the chi-square goodness of fit test is:

χ² = Σ [(O − E)² / E]

where O is the observed frequency and E is the expected frequency in each category.
Procedure for Chi-Square (χ2) Goodness of Fit Test

To use the chi-square (χ2) goodness of fit test we first set up the null and alternate hypotheses. The null hypothesis assumes that there is no significant difference between the observed and the expected values; the alternate hypothesis then becomes that there is a significant difference between the observed and the expected values. Next, compute the value of the chi-square goodness of fit statistic using the formula above, i.e. χ² = Σ [(O − E)² / E].

Two potential disadvantages of chi-square are:

a)      The chi-square test can only be used with data that have been put into classes. If there are data that have not been put into classes, it is necessary to make a frequency table or histogram before performing the test.

b)      It requires a sufficient sample size in order for the chi-square approximation to be valid.
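To see the procedure in action, here is a minimal sketch using SciPy's chisquare function; the observed die-roll counts are hypothetical:

```python
# A minimal sketch of the chi-square goodness of fit test with SciPy
# (hypothetical die-roll frequencies).
from scipy.stats import chisquare

observed = [18, 22, 16, 14, 12, 18]   # observed counts in 6 categories
expected = [100 / 6] * 6              # equal expected counts under H0 (fair die)

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```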


When to Use the Chi-Square Goodness of Fit Test?

 

 

The chi-square goodness of fit test is appropriate when the following conditions are met:

 

o    The sampling method is simple random.

o    The variable under study is categorical.

o    The expected value of the number of sample observations in each level of the variable is at least 5.

 

For the chi-square goodness of fit test, the hypotheses take the form:

Ho : The data are consistent with a specified distribution.

Ha : The data are not consistent with a specified distribution.

 

The null hypothesis (H0) specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis (Ha) is that at least one of the specified proportions is not true.

 

A chi-square statistic is one way to measure the relationship between two categorical (non-numerical) variables. The chi-square statistic is a single number that tells us how much difference exists between the observed counts and the counts that one would expect if there were no relationship in the population.


Q.5 What is the chi-square (χ2) independence test? Explain in detail. (20)

 

 

Ans: Chi-square (χ2) independence test:

 

A chi-square (χ2) test of independence is the second important form of chi-square test. It is used to explore the relationship between two categorical variables. Each of these variables can have two or more categories. It determines whether there is a significant relationship between two nominal (categorical) variables. The frequency of one nominal variable is compared across the different values of the second nominal variable.

 

The data can be displayed in an R × C contingency table, where R is the number of rows and C is the number of columns. For example, suppose a researcher wants to examine the relationship between gender (male vs. female) and empathy (high vs. low). The researcher will use the chi-square test of independence. If the null hypothesis is accepted, there is no relationship between gender and empathy. If the null hypothesis is rejected, the conclusion is that there is a relationship between gender and empathy (e.g. females tend to score higher on empathy and males tend to score lower).
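A minimal sketch of this gender-by-empathy example in Python using SciPy's chi2_contingency; the cell counts are hypothetical:

```python
# A minimal sketch of the chi-square test of independence with SciPy,
# using a hypothetical 2x2 gender-by-empathy contingency table.
from scipy.stats import chi2_contingency

#            high empathy  low empathy
table = [[30, 10],    # female
         [15, 25]]    # male

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}, df = {dof}")
```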

 

The chi-square distribution is the distribution of a sum of squared random samples drawn from the standard normal distribution. The degrees of freedom (say k) are equal to the number of samples being summed. For example, if 10 samples are taken from the normal distribution, then the degrees of freedom are df = 10. Chi-square distributions are always right skewed. The greater the degrees of freedom, the more the chi-square distribution looks like a normal distribution.

 

Although the chi-square test of independence, being a non-parametric technique, follows less strict assumptions, there are some general assumptions which should be taken care of:

·         Random Sample - The sample should be selected using a simple random sampling method.

·         Variables - Both variables under study should be categorical.

·         Independent Observations - Each person or case should be counted only once, and none should appear in more than one category or group. The data from one subject should not influence the data from another subject.

·         Expected Frequencies - If the data are displayed in a contingency table, the expected frequency count for each cell of the table should be at least 5.

 

The two chi-square tests are sometimes confused, but they are quite different from each other:

 

a.       The chi-square test for independence compares two sets of data to see if there is a relationship between them.

b.      The chi-square goodness of fit test checks whether one categorical variable fits a hypothesized distribution.
