Premium Essay

Linear Regression

In: Business and Management

Submitted By wlooh
Words 1277
Pages 6
Chapter 4
Multiple Linear Regression

Section 4.1
The Model and Assumptions

Participants will:
• understand the elements of the model
• understand the major assumptions of doing a regression analysis
• learn how to verify the assumptions
• understand a median split


The Model

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$$

or in Matrix Notation

$$Y = X\beta + \varepsilon$$

Y: Dependent Variable, n x 1
X: Independent Variables, n x (p+1)
β: Unknown Parameters, (p+1) x 1
ε: Error, n x 1
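The matrix form of the model can be made concrete with a small numerical sketch. The Python below is an illustration only, not part of the handout: it builds the n x (p+1) design matrix X with a leading column of ones and estimates the (p+1) unknown parameters by solving the normal equations (X'X)b = X'y. The function names and the data are hypothetical, and the data are generated without error so that the estimates can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, y):
    """Return least-squares estimates [b0, b1, ..., bp] for Y = X beta + error."""
    # Design matrix X is n x (p+1): a column of ones for the intercept.
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated without noise from y = 1 + 2*x1 + 3*x2.
predictors = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
beta_hat = fit_ols(predictors, y)
print([round(b, 6) for b in beta_hat])
```

With p = 2 predictors there are three regression parameters to estimate, plus the error variance, which this sketch does not estimate.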


How many unknown parameters are there? Can you name them? How many populations will be sampled? What are conceptual populations?


Major Requirements for Doing a Regression Analysis
• The errors are normally distributed (not Y).
• Constant variance – What is the null hypothesis?
• Linear in the parameters
• Errors are independent.
Some people call these assumptions.

$$E(Y) = X\beta$$


We have observed y = response (change in blood pressure) and x = dosage level of a drug. We assume a linear relationship between E(y) and x. The two graphs are the same, but they have been rotated to give additional views.






 

 

Sketch E(y). Based on the graphs, make comments about the assumptions. Do they appear to be satisfied or violated? How many populations are represented by the graphs? List all of the parameters. Write the model down.


Checking Assumptions
Testing the residuals for normality: PROC CAPABILITY

Testing for constant variance: use the test for heteroscedasticity in PROC REG
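PROC CAPABILITY and PROC REG are SAS procedures. For readers without SAS, the two checks can be sketched in plain Python. This is a stand-in under stated assumptions, not the tests SAS reports: normality is checked with a Jarque-Bera statistic (based on skewness and kurtosis), and constant variance with a Breusch-Pagan-style statistic (n times the R^2 from regressing the squared residuals on x). In both cases the null hypothesis is that the assumption holds. All function names and the data are hypothetical.

```python
import math


def simple_fit(x, y):
    """Least-squares intercept, slope, and R^2 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return b0, b1, r2


def jarque_bera(residuals):
    """H0: residuals are normal. Statistic is chi-square with 2 df under H0."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, 2 df
    return stat, p_value


def constant_variance_lm(x, residuals):
    """H0: constant variance. n * R^2 is chi-square with 1 df under H0."""
    squared = [e * e for e in residuals]
    _, _, r2 = simple_fit(x, squared)
    stat = len(x) * r2
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    return stat, p_value


# Hypothetical data whose error spread grows with x (variance is not constant).
x = list(range(1, 21))
noise = [(-1) ** i * 0.1 * i for i in x]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, noise)]

b0, b1, _ = simple_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
jb_stat, jb_p = jarque_bera(residuals)
lm_stat, lm_p = constant_variance_lm(x, residuals)
print(f"normality: stat={jb_stat:.3f}, p={jb_p:.3f}")
print(f"constant variance: stat={lm_stat:.3f}, p={lm_p:.5f}")
```

A small p-value for the second statistic is evidence against constant variance, which is what the fan-shaped errors in this made-up data are designed to produce.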


[Figure: y versus x1]

[Figure: Residual versus x1 after fitting x1 and x1*x1]




This demonstration illustrates
• the model and parameters
• testing the assumptions
• virtual populations.


Tasks to Do
     

Write down the…...

Similar Documents

Premium Essay

Regression Models

...Regression Models Student Name Grantham University BA/520 – Quantitative Analysis Instructor Name April 6, 2013

Abstract
This paper will refer to regression models and the benefits that variables provide when developing and examining such models. It will also discuss why scatter diagrams are used, describe the simple linear regression model, and refer to multiple regression analysis as well as the potential uses for this type of model.

Regression Models
A regression model is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. Inference based on such models is known as regression analysis. The main purpose of regression analysis is to predict the value of a dependent or response variable based on values of the independent or explanatory variables. According to Render, Stair, and Hanna (2011), there are two reasons for which regression analyses are used: one is to understand the relation between various variables and the second is to predict one variable's value based on the value of the other. Variables provide many advantages when creating models. One of the......

Words: 1282 - Pages: 6

Premium Essay

Linear Regression

...Linear Regression

I would like to know if people who enjoy thrill seeking have tattoos. I believe thrill seeking and tattoos go hand in hand. Most people I know are adventurous, risk takers, and daredevils, and all of them have tattoos. I have a strong feeling that the two will have a strong positive relationship.

X = Tattoos
Y = Thrill Seeking

The scatter plot shows an extremely rough linear pattern, but there is an upward slope.

Line of best fit: y = 0.9148x + 25.505

Analysis:
1. r = .14: little or no correlation.
2. R^2 = 2%: 2% of the variance in thrill seeking is accounted for by tattoos.
3. Slope (m) = 0.9148: for every additional tattoo a person has, we expect an increase of 0.9148 in thrill seeking.

Conclusion: There is little or no correlation between these two variables. It was shocking to see there is no relationship between the two. I truly believed people who are thrill seekers have tattoos.

T-Test Independent 2 Sample

My gym teacher believes that males are stronger than females and that is why males have more tattoos. The scale is determined by the number of tattoos both males and females have. Eighty-four males and one hundred and eleven females responded. The males average 39 (s.d. 1.42) while the females average 38 (s.d. 0.98). At the .10 significance level, test to see if males have more tattoos than females.

Ho: Null Hypothesis Males equal Females
Ha: Null......

Words: 478 - Pages: 2

Premium Essay


...Regression Analysis: Basic Concepts
Allin Cottrell

1 The simple linear model

Suppose we reckon that some variable of interest, y, is ‘driven by’ some other variable x. We then call y the dependent variable and x the independent variable. In addition, suppose that the relationship between y and x is basically linear, but is inexact: besides its determination by x, y has a random component, u, which we call the ‘disturbance’ or ‘error’. Let i index the observations on the data pairs (x, y). The simple linear model formalizes the ideas just stated:

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

The parameters β0 and β1 represent the y-intercept and the slope of the relationship, respectively. In order to work with this model we need to make some assumptions about the behavior of the error term. For now we’ll assume three things:

$E(u_i) = 0$: u has a mean of zero for all i
$E(u_i^2) = \sigma_u^2$: it has the same variance for all i
$E(u_i u_j) = 0,\ i \neq j$: no correlation across observations

We’ll see later how to check whether these assumptions are met, and also what resources we have for dealing with a situation where they’re not met. We have just made a bunch of assumptions about what is ‘really going on’ between y and x, but we’d like to put numbers on the parameters β0 and β1. Well, suppose we’re able to gather a sample of data on x and y. The task of estimation is then to come up with coefficients (numbers that we can calculate from the data, call them $\hat{\beta}_0$ and $\hat{\beta}_1$) which serve as estimates of the unknown......

Words: 1464 - Pages: 6

Free Essay

Linear Regression

...Linear Regression Forecast
Nicolas Scott Gomez
Park University

Introduction
Subjects and Methods
Results
References

Introduction
There is a growing awareness of obesity in more modern nations, which has added importance to efforts to understand the causes and natural history of obesity. To understand it, one must determine what a normal body fat content is and how it changes with age. Recently, four-component models of body composition have been developed that do not rely on major assumptions about constant composition. These models offer the opportunity to determine the relation between age and body composition components such as fat more accurately. A 1999 study pointed to several studies showing that body composition variables such as fat mass vary significantly among ethnic groups.

Subjects and Methods
Fat mass was determined once in a large sample of healthy volunteers by using a four-component model requiring measurement of body volume, total body water, total body bone mineral mass, and body weight. The relation between age and body fat was explored using several different statistical methods. The 1324 volunteers, ages 20-94, were recruited for this study through various means of advertising. These studies were performed between 1986 and 1997 and each potential subject......

Words: 717 - Pages: 3

Premium Essay


...Q1: All the regressions were performed. Output can be made available if needed. See outputs for Q2 in appendix.

Q2: Select the model you are going to keep for each brand and explain WHY. Report the corresponding output in an appendix attached to your report (hence, 1 output per brand)

We use Adjusted R Squared to compare the Linear or Semilog Regression. R^2 is a statistic that will give some information about the goodness of fit of a model. In regression, the Adjusted R^2 coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R^2 of 1 indicates that the regression line perfectly fits the data.

Brand | Linear Regression R^2 | SemiLog Regression R^2 | Model used
Brand 1 | 0.594 | 0.563 | Linear
Brand 2 | 0.758 | 0.588 | Linear
Brand 3 | 0.352 | 0.571 | Semilog
Brand 4 | 0.864 | 0.603 | Linear

In each case we use the model whose R-squared is higher.

Q3: Here we compute the cross-price elasticity. Depending on whether we use linear or semi-log model, ...

Words: 609 - Pages: 3

Premium Essay

Forecasting Gold Prices Using Multiple Linear Regression Method

...Forecasting Gold Prices Using Multiple Linear Regression Method

1Z. Ismail, 2A. Yahya and 1A. Shabri
1Department of Mathematics, Faculty of Science
2Department of Basic Education, Faculty of Education
University Technology Malaysia, 81310 Skudai, Johor, Malaysia

Abstract: Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In a more general term it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: Objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of US dollars, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study on the relationship between a single......

Words: 3920 - Pages: 16

Premium Essay


...relationships between the variables. The relationships can either be negative or positive. This is told by whether the graph increases or decreases.

Benefits and Intrinsic Job Satisfaction
Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.069642247
R Square | 0.004850043
Adjusted R Square | -0.00471871
Standard Error | 0.893876875
Observations | 106

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 0.404991362 | 0.404991 | 0.50686 | 0.478094147
Residual | 104 | 83.09765015 | 0.799016 | |
Total | 105 | 83.50264151 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 5.506191723 | 0.363736853 | 15.13784 | 4.8E-28 | 4.784887893 | 6.2274956 | 4.7848879 | 6.22749555
Benefits | -0.05716561 | 0.080295211 | -0.711943 | 0.47809 | -0.21639402 | 0.1020628 | -0.216394 | 0.10206281

Y = 5.5062 - 0.0572x

Graph

Benefits and Extrinsic Job Satisfaction
Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.161906
R Square | 0.026214
Adjusted R Square | 0.01685
Standard Error | 1.001305
Observations | 106

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 2.806919 | 2.806919 | 2.799606 | 0.097293
Residual | 104 | 104.2717 | 1.002612 | |
Total | 105 | 107.0786 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper......

Words: 653 - Pages: 3

Premium Essay

Linear Regression

...Linear Regression deals with the numerical measures that express the relationship between two variables. Relationships between variables can be strong or weak, and direct or inverse. One example is the amount McDonald’s spends on advertising per month and its total sales in a month. Similarly, the amount of study time one puts toward statistics can be compared with the grades received using the regression method. The formal definition of Regression Analysis is the equation that allows one to estimate the value of one variable based on the value of another. Key objectives in performing a regression analysis include estimating the dependent variable Y based on a selected value of the independent variable X. To explain, Nike could measure how much it spends on celebrity endorsements and the effect this has on sales in a month. In that case, the amount spent on celebrity endorsements would be the independent X variable. Without the X variable, Y would be impossible to estimate. The general form of the regression equation is Y hat = a + bX, where Y hat is the estimated value of the Y variable for a selected X value. a represents the Y-intercept; it is the estimated value of Y when X = 0. Furthermore, b is the slope of the line, or the average change in Y hat for each change of one unit in the independent variable X. Finally, X is any value of the independent variable that is......

Words: 1324 - Pages: 6

Premium Essay


...STATISTICS FOR ENGINEERS (EQT 373)
TUTORIAL CHAPTER 3 – INTRODUCTORY LINEAR REGRESSION

1) Given 5 observations for two variables, x and y.

x | 3 | 12 | 6 | 20 | 14
y | 55 | 40 | 55 | 10 | 15

a. Develop a scatter diagram for these data.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Develop the estimated regression equation by computing the values and.
d. Use the estimated regression equation to predict the value of y when x=10.
e. Compute the coefficient of determination. Comment on the goodness of fit.
f. Compute the sample correlation coefficient (r) and explain the result.

2) The Tenaga Elektik MN Company is studying the relationship between kilowatt-hours (thousands) used and the number of rooms in a private single-family residence. A random sample of 10 homes yielded the following.

Number of rooms | Kilowatt-Hours (thousands)
12 | 9
9 | 7
14 | 10
6 | 5
10 | 8
8 | 6
10 | 8
10 | 10
5 | 4
7 | 7

a. Identify the independent and dependent variable.
b. Compute the coefficient of correlation and explain.
c. Compute the coefficient of determination and explain.
d. Test whether there is a positive correlation between both variables. Use α=0.05.
e. Determine the regression equation (use the least squares method)
f. Determine the value of kilowatt-hours used if number of rooms is 11.
g. Can you use the model in (f.) to predict the kilowatt-hours if number of......
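As a check on parts (c) to (f) of problem 1, the least-squares quantities can be computed directly from the five observations. The sketch below is an illustration in Python, not part of the tutorial. It assumes the first row of the flattened table holds x and the second holds y, since the row labels did not survive in the source.

```python
import math

# Assumption: the first row of the table is x and the second is y.
x = [3, 12, 6, 20, 14]
y = [55, 40, 55, 10, 15]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                   # slope
b0 = mean_y - b1 * mean_x        # intercept
r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
r_squared = r ** 2               # coefficient of determination
prediction_at_10 = b0 + b1 * 10

print(f"y_hat = {b0:.1f} + ({b1:.1f}) x")                 # y_hat = 68.0 + (-3.0) x
print(f"prediction at x = 10: {prediction_at_10:.1f}")    # 38.0
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")              # R^2 = 0.8757, r = -0.9358
```

Under that assumption the fitted line slopes downward, and about 88% of the variation in y is accounted for by x.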

Words: 1184 - Pages: 5

Premium Essay


...MULTIPLE REGRESSION

After completing this chapter, you should be able to:
• understand model building using multiple regression analysis
• apply multiple regression analysis to business decision-making situations
• analyze and interpret the computer output for a multiple regression model
• test the significance of the independent variables in a multiple regression model
• use variable transformations to model nonlinear relationships
• recognize potential problems in multiple regression analysis and take the steps to correct the problems
• incorporate qualitative variables into the regression model by using dummy variables.

Multiple Regression Assumptions
• The errors are normally distributed
• The mean of the errors is zero
• Errors have a constant variance
• The model errors are independent

Model Specification
• Decide what you want to do and select the dependent variable
• Determine the potential independent variables for your model
• Gather sample data (observations) for all variables

The Correlation Matrix
• Correlation between the dependent variable and selected independent variables can be found using Excel: Tools / Data Analysis… / Correlation
• Can check for statistical significance of correlation with a t test

Example
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
Dependent variable: Pie sales (units per......

Words: 1561 - Pages: 7

Premium Essay

Multiple Linear Regression

...In multiple linear regression analysis, R2 is a measure of the ________.
A) homoskedasticity of the predictors
B) misclassification rate
C) percentage of the variance of the dependent variable that is explained by the set of independent (predictor) variables
D) precision of the resulting model when applied to the validation data

2. Categorical variables can be used in a multiple linear regression model _________.
A) by partitioning of the dataset
B) when no multicollinearity among the independent variables is present
C) when the sample size is at least 10 times that of the number of variables
D) through the use of dummy variables

3. In multiple linear regression analysis “multicollinearity” refers to _________.
A) two or more predictors sharing the same linear relationship with the outcome variable
B) a high degree of correlation between the dependent variables
C) the equality of the variance of the dependent throughout its range of values
D) None of the above.

4. In multiple regression analysis, which of the following is an example of a subset selection algorithm?
A) Forward selection
B) Backwards elimination
C) Stepwise regression
D) All of the above

5. _________ is an important property of a good model.
A) Complexity
B) Independence
C) Parsimony
D) None of the above

6. An assumption that applies to the linear multiple regression method is that the distribution of the error term values should be ________.
A)......

Words: 460 - Pages: 2

Premium Essay

Regression Analysis

...Acts 430 Regression Analysis

In this project, we are required to forecast the number of houses sold in the United States by building a regression model in SAS. We first identify the dependent variable, known as HSN1F. We treat three series as independent variables: the 30-year conventional mortgage rate, real imports of goods, and money stock. These can be seen as the factors that affect the market for houses sold in the USA.

Intuitively, we thought the 30-year conventional mortgage rate is a significant factor influencing behavior in the housing market, with a negative relation to the number of houses sold. When the mortgage rate increases, people pay relatively more to buy a house, which leads to a decrease in houses sold. By contrast, a lower interest rate would stimulate the market.

We believe that real imports of goods and services is another factor that causes ups and downs in the housing market. When a country imports a large amount of goods and services, it pays a lot of money to other countries. In other words, people have less money, and house sales decrease. Conversely, lower imports of goods and services indicate an upward tendency in houses sold. This factor also has a negative relationship with the number of houses sold.

Lastly, we have money stock as our third factor. We considered it to have a positive relationship with the number of...

Words: 723 - Pages: 3

Free Essay

Psych 625 Week 5 Learning Team Assignment Linear Regression

...PSYCH 625 Week 5 Learning Team Assignment Linear regression To Buy this Class Copy & paste below link in your Brower Or Visit Our Website Visit : Email Us : ......

Words: 2736 - Pages: 11

Premium Essay

Linear Regression

...Introduction

Simple linear regression is a model with a single regressor x that has a relationship with a response y that is a straight line. This simple linear regression model is

y = β0 + β1x + ε

where the intercept β0 and the slope β1 are unknown constants and ε is a random error component.

Testing Significance of Regression: H0: β1 = 0, H1: β1 ≠ 0

The hypotheses relate to the significance of regression. Failing to reject H0: β1 = 0 implies that there is no linear relationship between x and y. On the other hand, if H0: β1 = 0 is rejected, it implies that x is of value in explaining the variability in y. The following equation is the fundamental analysis-of-variance identity for a regression model:

SST = SSR + SSRes

Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures (such as "variation" among and between groups), used to analyze the differences between group means, developed by R. A. Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation.

The P value, or calculated probability, is the probability, computed assuming the null hypothesis (H0) of a study question is true, of obtaining a result at least as extreme as the one actually observed.

VIF (the variance inflation factor) for each term in the model measures the combined effect of the dependences among the regressors on the variance of the term. Practical experience indicates that if any of...
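For reference, the analysis-of-variance identity quoted in this excerpt can be written out in terms of the observations. The block below is a standard textbook expansion, given here as a sketch rather than as the excerpt's own derivation; the fitted values $\hat{y}_i$, the mean $\bar{y}$, and the statistic $F_0$ are notation the excerpt does not itself define.

```latex
\[
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SS_T}
=
\underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SS_R}
+
\underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SS_{Res}}
\]
\[
F_0 = \frac{SS_R / 1}{SS_{Res} / (n - 2)} = \frac{MS_R}{MS_{Res}}
\]
```

In simple linear regression, a large value of $F_0$ relative to the $F_{1,\,n-2}$ distribution leads to rejecting H0: β1 = 0.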

Words: 483 - Pages: 2

Free Essay

Regression Paper

...Regression Paper
Team
RES/342 Research and Evaluation
Teacher
Date

The Hypothesis
Team C’s hypothesis is that the more years of education a person receives, the more that person can potentially earn in salary. The team will use the process of linear regression analysis to explain how the information is used and conduct a five-step test to see if the hypothesis proves true or false.

Linear Regression Analysis
Team C’s purpose in this research paper is to use a linear regression analysis test to determine if a significant linear relationship exists between an independent variable X (level or years of education) and a dependent variable Y (salaries earned or potentially earned). “It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables,” (Statistically Significant Consulting, 2010, para. 1). Learning Team C will use the salary and education levels from the Wages and Wage Earners Data Set collected through access to the e-source link of University of Phoenix. For this test the dependent variable, Y, will represent the salary of the 100 participants and the independent variable, X, will represent the education of the 100 participants.

How the Information is used
This information will be used in a linear regression test to see if there is enough evidence to reject the null hypothesis that a higher education does not......

Words: 1091 - Pages: 5