title

Multiple Linear Regression using python and sklearn

description

Multiple linear regression is one of the most common forms of linear regression analysis. As a predictive technique, it models the relationship between one continuous dependent variable and two or more independent variables.
References: Kirill Eremenko's projects on linear regression. This video is dedicated to him.
Please subscribe and share the channel
Simple Linear Regression link: https://www.youtube.com/watch?v=E-xp-SjfOSY&t=1598s
Github link: https://github.com/krishnaik06/Multiple-Linear-Regression
You can buy my book, in which I explain in detail how to use machine learning and deep learning in finance with Python.
Packt url : https://prod.packtpub.com/in/big-data-and-business-intelligence/hands-python-finance
Amazon url: https://www.amazon.com/Hands-Python-Finance-implementing-strategies-ebook/dp/B07Q5W7GB1/ref=sr_1_1?keywords=Krish+naik&qid=1554285070&s=gateway&sr=8-1-spell

detail

{'title': 'Multiple Linear Regression using python and sklearn', 'heatmap': [{'end': 598.951, 'start': 555.709, 'weight': 0.823}, {'end': 995.992, 'start': 960.571, 'weight': 0.837}, {'end': 1073.144, 'start': 1000.075, 'weight': 0.707}], 'summary': "Covers understanding multiple linear regression and its practical use in predicting output values based on multiple independent features, demonstrating a startup profit prediction case with '50_startups.csv' and achieving a model accuracy of r squared (0.93) through model training.", 'chapters': [{'end': 429.919, 'segs': [{'end': 185.859, 'src': 'embed', 'start': 161.783, 'weight': 2, 'content': [{'end': 170.529, 'text': 'coefficients. now, when we are discussing about this equation, this intercepts basically indicate that when my size is zero, what will be my base price?', 'start': 161.783, 'duration': 8.746}, {'end': 178.734, 'text': 'so this is the value of my intercept in that particular best fit line, and slope basically indicates that with the unit increase in size,', 'start': 170.529, 'duration': 8.205}, {'end': 181.356, 'text': 'what is the unit increase in the slope?', 'start': 178.734, 'duration': 2.622}, {'end': 185.859, 'text': 'that is what this particular value is actually specified by this particular slope value.', 'start': 181.356, 'duration': 4.503}], 'summary': 'Intercept indicates base price, slope shows unit increase in price per unit size.', 'duration': 24.076, 'max_score': 161.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8161783.jpg'}, {'end': 314.547, 'src': 'embed', 'start': 285.378, 'weight': 0, 'content': [{'end': 289.319, 'text': 'Now our aim in multiple linear regression is that we need to compute beta zero.', 'start': 285.378, 'duration': 3.941}, {'end': 291.6, 'text': 'So beta zero is again the intercept.', 'start': 289.899, 'duration': 1.701}, {'end': 294.132, 'text': 'Again, the intercept.', 'start': 293.331, 'duration': 
0.801}, {'end': 302.018, 'text': 'Beta one, beta two, beta three are the slopes of the coefficient with respect to this independent features.', 'start': 294.772, 'duration': 7.246}, {'end': 309.663, 'text': 'Now that basically indicates that if I increase the value of x one by one unit,', 'start': 303.158, 'duration': 6.505}, {'end': 314.547, 'text': 'then this beta one says that by how much value it will affect in the price.', 'start': 309.663, 'duration': 4.884}], 'summary': 'Compute beta zero, the intercept, and slopes beta one, beta two, and beta three in multiple linear regression to determine their impact on the price.', 'duration': 29.169, 'max_score': 285.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8285378.jpg'}, {'end': 413.948, 'src': 'embed', 'start': 387.067, 'weight': 1, 'content': [{'end': 391.208, 'text': 'You know that NumPy is basically used for creating arrays.', 'start': 387.067, 'duration': 4.141}, {'end': 394.809, 'text': 'Whereas Matplotlib is basically used for visualization.', 'start': 392.289, 'duration': 2.52}, {'end': 397.91, 'text': 'Pandas is basically used for creating the data set.', 'start': 394.909, 'duration': 3.001}, {'end': 400.777, 'text': "First step, I'm going to read my data set.", 'start': 398.875, 'duration': 1.902}, {'end': 406.182, 'text': 'So my data set is in this particular location only in my working space directory that you can see over here.', 'start': 400.797, 'duration': 5.385}, {'end': 410.846, 'text': 'Make sure you also set up the working space directory and put your CSV file there.', 'start': 407.102, 'duration': 3.744}, {'end': 413.948, 'text': "I'm reading this 50starters.csv.", 'start': 412.007, 'duration': 1.941}], 'summary': "Numpy for arrays, matplotlib for visualization, pandas for data set creation. 
reading '50starters.csv'.", 'duration': 26.881, 'max_score': 387.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8387067.jpg'}], 'start': 1.704, 'title': 'Understanding multiple linear regression', 'summary': "Explains the concept of multiple linear regression, where the equation y=β0+β1x1+β2x2+β3x3 is used to predict the output value based on multiple independent features, with a practical use case demonstrated with '50_startups.csv' using numpy, matplotlib, and pandas libraries.", 'chapters': [{'end': 181.356, 'start': 1.704, 'title': 'Multiple linear regression intuition', 'summary': 'Introduces the concept of simple linear regression, compares it with multiple linear regression, and discusses the intuition behind finding the best fit line in simple linear regression based on the size and price of a house.', 'duration': 179.652, 'highlights': ['Simple linear regression is defined by the equation y = beta0 + beta1*x1, where beta0 is the intercept and beta1 is the slope. In simple linear regression, the equation y = beta0 + beta1*x1 represents the best fit line, with beta0 as the intercept and beta1 as the slope.', 'The intercept in simple linear regression indicates the base price when the size is zero, while the slope indicates the unit increase in price with a unit increase in size. The intercept in the simple linear regression equation represents the base price when the size is zero, and the slope indicates the unit increase in price with a unit increase in size.', 'The chapter discusses the features F1 (size) and F2 (price) in the context of simple linear regression. 
The chapter explains the features F1 (size) and F2 (price) in the context of simple linear regression, where F1 is the independent feature and F2 is the dependent feature.']}, {'end': 429.919, 'start': 181.356, 'title': 'Understanding multiple linear regression', 'summary': "Explains the concept of multiple linear regression, where the equation y=β0+β1x1+β2x2+β3x3 is used to predict the output value based on multiple independent features, with a practical use case to be demonstrated with a data set called '50_startups.csv' using numpy, matplotlib, and pandas libraries.", 'duration': 248.563, 'highlights': ['The equation y=β0+β1x1+β2x2+β3x3 is used to predict the output value based on multiple independent features, with β0 as the intercept and β1, β2, β3 as slopes of the coefficient with respect to the independent features.', "The practical use case involves a data set called '50_startups.csv' and the usage of NumPy for creating arrays, Matplotlib for visualization, and Pandas for creating the data set.", "The data set '50_startups.csv' is to be used for demonstrating the practical part of multiple linear regression, with the data set to be read and explained, using NumPy, Matplotlib, and Pandas libraries for the process."]}], 'duration': 428.215, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox81704.jpg', 'highlights': ['The equation y=β0+β1x1+β2x2+β3x3 is used to predict the output value based on multiple independent features, with β0 as the intercept and β1, β2, β3 as slopes of the coefficient with respect to the independent features.', "The practical use case involves a data set called '50_startups.csv' and the usage of NumPy for creating arrays, Matplotlib for visualization, and Pandas for creating the data set.", 'The intercept in simple linear regression indicates the base price when the size is zero, while the slope indicates the unit increase in price with a unit increase in size.']}, {'end': 791.031, 
'segs': [{'end': 458.813, 'src': 'embed', 'start': 430.94, 'weight': 1, 'content': [{'end': 438.562, 'text': 'Here you can see that my data set basically has, so this is the information of 50 startups in various states.', 'start': 430.94, 'duration': 7.622}, {'end': 444.083, 'text': 'Features like R&D spend, administration, marketing spend, state and profit.', 'start': 440.262, 'duration': 3.821}, {'end': 452.987, 'text': 'Now in this, if you know that, just by seeing this particular data, profit is basically my dependent feature.', 'start': 445.839, 'duration': 7.148}, {'end': 458.813, 'text': 'Now what this use case is all about? I have to find out profit based on this feature, on R&D spent.', 'start': 453.007, 'duration': 5.806}], 'summary': 'Dataset contains information on 50 startups with features like r&d spend, administration, marketing spend, state, and profit, aiming to predict profit based on r&d spend.', 'duration': 27.873, 'max_score': 430.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8430940.jpg'}, {'end': 512.928, 'src': 'embed', 'start': 481.482, 'weight': 0, 'content': [{'end': 483.863, 'text': 'I should be able to predict what is the profit value.', 'start': 481.482, 'duration': 2.381}, {'end': 486.287, 'text': 'So this is a kind of regression problem.', 'start': 484.605, 'duration': 1.682}, {'end': 491.771, 'text': 'And just by seeing the data set, it looks like this all are independent features that I have.', 'start': 486.767, 'duration': 5.004}, {'end': 500.059, 'text': 'R&D spend, administration spend, marketing spend, and state are my independent features, whereas profit is my dependent feature.', 'start': 492.112, 'duration': 7.947}, {'end': 506.345, 'text': 'So always, whenever we load a data set, our first work is to divide into dependent and independent features.', 'start': 500.802, 'duration': 5.543}, {'end': 512.928, 'text': "So for that, I've written a code where I'm taking 
data set dot ilock colon comma colon minus one.", 'start': 506.785, 'duration': 6.143}], 'summary': 'Predict profit using regression with independent features: r&d spend, administration spend, marketing spend, and state.', 'duration': 31.446, 'max_score': 481.482, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8481482.jpg'}, {'end': 598.951, 'src': 'heatmap', 'start': 555.709, 'weight': 0.823, 'content': [{'end': 562.233, 'text': 'The next thing is that if I go and see on my data set, I have a state column and this looks like a category column.', 'start': 555.709, 'duration': 6.524}, {'end': 566.156, 'text': "right?. We have three unique categories and every time we'll be having three unique categories.", 'start': 562.233, 'duration': 3.923}, {'end': 568.997, 'text': 'One is New York, one is California, one is Florida.', 'start': 566.636, 'duration': 2.361}, {'end': 572.82, 'text': 'We have to convert this category feature into one hot encoding.', 'start': 569.838, 'duration': 2.982}, {'end': 575.116, 'text': 'We had just two categories.', 'start': 573.735, 'duration': 1.381}, {'end': 578.299, 'text': 'At that time we could convert that into label encoding.', 'start': 575.397, 'duration': 2.902}, {'end': 580.962, 'text': 'Now we have to convert this into one hot encoding.', 'start': 578.9, 'duration': 2.062}, {'end': 588.909, 'text': "In order to convert this into one hot encoding, I'll be using a function inside pandas that is get underscore dummies.", 'start': 581.542, 'duration': 7.367}, {'end': 594.374, 'text': 'Get underscore dummies helps you to create dummy variables with respect to the number of categories that are present.', 'start': 589.129, 'duration': 5.245}, {'end': 598.951, 'text': 'Inside this, I am using the x state value.', 'start': 595.768, 'duration': 3.183}], 'summary': 'Data set has 3 unique state categories, to be converted into one hot encoding using pandas get_dummies function.', 
'duration': 43.242, 'max_score': 555.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8555709.jpg'}, {'end': 605.437, 'src': 'embed', 'start': 581.542, 'weight': 2, 'content': [{'end': 588.909, 'text': "In order to convert this into one hot encoding, I'll be using a function inside pandas that is get underscore dummies.", 'start': 581.542, 'duration': 7.367}, {'end': 594.374, 'text': 'Get underscore dummies helps you to create dummy variables with respect to the number of categories that are present.', 'start': 589.129, 'duration': 5.245}, {'end': 598.951, 'text': 'Inside this, I am using the x state value.', 'start': 595.768, 'duration': 3.183}, {'end': 605.437, 'text': "So basically in my x feature, which is my independent feature, I'm taking the state and I'm converting this into dummy variables.", 'start': 599.051, 'duration': 6.386}], 'summary': 'Using pandas get_dummies function to convert state into dummy variables.', 'duration': 23.895, 'max_score': 581.542, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8581542.jpg'}], 'start': 430.94, 'title': 'Startup profit prediction and one hot encoding in data preprocessing', 'summary': 'Discusses predicting the profit of 50 startups based on independent features and creating a regression model. 
it also explains the process of converting categorical features into one hot encoding using get_dummies function in pandas, addressing the issue of dummy variable trap and resulting in a new data frame with two columns.', 'chapters': [{'end': 562.233, 'start': 430.94, 'title': 'Startup profit prediction', 'summary': 'Discusses the process of predicting the profit of 50 startups based on independent features like r&d spend, administration, marketing spend, and state, aiming to create a regression model.', 'duration': 131.293, 'highlights': ['The data set contains information on 50 startups in various states, with features like R&D spend, administration, marketing spend, state, and profit. Quantifiable data: 50 startups, various states, R&D spend, administration, marketing spend.', 'The goal is to predict the profit value based on the values of R&D spend, administration, marketing spend, and the state, constituting a regression problem. Key point: Predicting profit value. Quantifiable data: R&D spend, administration, marketing spend, state.', 'The process involves dividing the data set into dependent (profit) and independent features (R&D spend, administration spend, marketing spend, state). Key point: Dividing data into dependent and independent features. 
Quantifiable data: Profit, R&D spend, administration spend, marketing spend, state.']}, {'end': 791.031, 'start': 562.233, 'title': 'One hot encoding in data preprocessing', 'summary': 'Explains the process of converting categorical features into one hot encoding using the get_dummies function in pandas, avoiding the dummy variable trap, and concatenating the resulting two dummy columns with the remaining features.', 'duration': 228.798, 'highlights': ['The chapter explains the process of converting categorical features into one hot encoding using the get_dummies function in Pandas, avoiding the dummy variable trap, and concatenating the resulting two dummy columns with the remaining features.', 'The get_dummies function in Pandas creates dummy variables for the three unique categories (New York, California, and Florida); with drop_first=True the first category is dropped, leaving two columns and avoiding the dummy variable trap.', 'The state column is dropped from the dataset after converting it into dummy variables, and the two dummy columns are then concatenated with the remaining features using pd.concat.', 'The process involves converting the categorical feature into one hot encoding using the get_dummies function in Pandas, avoiding the dummy variable trap by setting drop_first=True, and concatenating the two dummy columns with the remaining features using pd.concat.']}], 'duration': 360.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8430940.jpg', 'highlights': ['The goal is to predict the profit value based on the values of R&D spend, administration, marketing spend, and the state, constituting a regression problem. 
Quantifiable data: R&D spend, administration, marketing spend, state.', 'The process involves dividing the data set into dependent (profit) and independent features (R&D spend, administration spend, marketing spend, state). Key point: Dividing data into dependent and independent features. Quantifiable data: Profit, R&D spend, administration spend, marketing spend, state.', 'The chapter explains the process of converting categorical features into one hot encoding using get_dummies function in Pandas, addressing the issue of dummy variable trap and concatenating the dummy variables with the original dataset, resulting in a new data frame with two columns.', 'The get_dummies function in Pandas is used to create dummy variables for the three unique categories (New York, California, and Florida), resulting in three columns representing the presence of each category, with the drop_first=True parameter to avoid the dummy variable trap.']}, {'end': 1189.453, 'segs': [{'end': 894.742, 'src': 'embed', 'start': 862.785, 'weight': 0, 'content': [{'end': 865.426, 'text': 'so this is how my equation gets converted now.', 'start': 862.785, 'duration': 2.641}, {'end': 869.968, 'text': 'obviously, if we have multiple independent features, what we can do?', 'start': 865.426, 'duration': 4.542}, {'end': 870.888, 'text': 'we can.', 'start': 869.968, 'duration': 0.92}, {'end': 873.83, 'text': 'we have to use multiple linear regression.', 'start': 870.888, 'duration': 2.942}, {'end': 877.672, 'text': "now, what i'll do is that i will go and do the train test first of all.", 'start': 873.83, 'duration': 3.842}, {'end': 883.038, 'text': "Now, in train test split, I'm using the library called as train test split.", 'start': 878.857, 'duration': 4.181}, {'end': 888.54, 'text': "In this, I'm taking x, y, my test size is 0.2, my random state is 0.", 'start': 883.318, 'duration': 5.222}, {'end': 894.742, 'text': 'And random state, you can select any so that the train test splits gets 
automatically randomly selected.', 'start': 888.54, 'duration': 6.202}], 'summary': 'Using multiple linear regression, train test split with test size 0.2 and random state 0 for feature conversion.', 'duration': 31.957, 'max_score': 862.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8862785.jpg'}, {'end': 995.992, 'src': 'heatmap', 'start': 960.571, 'weight': 0.837, 'content': [{'end': 964.752, 'text': 'Now the next thing is that I have my y test data, sorry, I have my x test data.', 'start': 960.571, 'duration': 4.181}, {'end': 968.032, 'text': "I'm going to do the prediction for the x test data.", 'start': 965.252, 'duration': 2.78}, {'end': 974.494, 'text': 'This is my y pred, now my y pred is created, right? And this is my y test.', 'start': 969.333, 'duration': 5.161}, {'end': 977.441, 'text': 'my y test is created now.', 'start': 975.68, 'duration': 1.761}, {'end': 982.744, 'text': 'the next thing is that i want to compare whether this y pred and y test are well and good or not.', 'start': 977.441, 'duration': 5.303}, {'end': 984.125, 'text': 'is my accuracy better?', 'start': 982.744, 'duration': 1.381}, {'end': 987.947, 'text': 'is my accuracy good when compared to the real value that is y test?', 'start': 984.125, 'duration': 3.822}, {'end': 995.992, 'text': 'for that, what i will be doing is that i will be implementing a concept which is called as r squared.', 'start': 987.947, 'duration': 8.045}], 'summary': 'Predicted x test data and evaluating accuracy using r squared.', 'duration': 35.421, 'max_score': 960.571, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8960571.jpg'}, {'end': 1083.36, 'src': 'heatmap', 'start': 1000.075, 'weight': 3, 'content': [{'end': 1005.654, 'text': 'okay, R squared formula is basically given by something like this', 'start': 1000.075, 'duration': 5.579}, {'end': 1006.837, 'text': 'You can see 
this.', 'start': 1006.275, 'duration': 0.562}, {'end': 1015.987, 'text': 'R squared is nothing but one minus sum of residual divided by sum of mean.', 'start': 1007.399, 'duration': 8.588}, {'end': 1021.289, 'text': 'now, when i say sum of residual, sum of residual is basically given by this particular equation,', 'start': 1015.987, 'duration': 5.302}, {'end': 1035.286, 'text': 'average of i is equal to 1 to m y y minus y hat y hat is basically my y predicted value, whole square, whereas ss mean is basically given as one by n.', 'start': 1021.289, 'duration': 13.997}, {'end': 1041.111, 'text': 'summation of i is equal to one to n y minus y mean whole square.', 'start': 1035.286, 'duration': 5.825}, {'end': 1048.737, 'text': 'So, obviously, if we, we will always know that SSRES and SSMEAN from this two.', 'start': 1041.83, 'duration': 6.907}, {'end': 1055.102, 'text': 'if we calculate in this particular way, so basically SSMEAN will always be greater than SSRES.', 'start': 1048.737, 'duration': 6.365}, {'end': 1060.326, 'text': 'So SSMEAN will always be greater than SSRES if our model is very, very good.', 'start': 1055.162, 'duration': 5.164}, {'end': 1065.302, 'text': 'So this value that I am doing a division will be a small number.', 'start': 1061.021, 'duration': 4.281}, {'end': 1073.144, 'text': 'So 1 minus small number will be ranging between 0.8 to 0.98 somewhere around this.', 'start': 1066.122, 'duration': 7.022}, {'end': 1078.646, 'text': 'Most of the time we will be getting value greater than 0.8.', 'start': 1074.365, 'duration': 4.281}, {'end': 1083.36, 'text': 'And usually the R square values comes between 0 to 1.', 'start': 1078.646, 'duration': 4.714}], 'summary': 'R-squared formula: 1 - sum of residual/sum of mean, resulting in values between 0.8-0.98, indicating a good model.', 'duration': 28.198, 'max_score': 1000.075, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox81000075.jpg'}, {'end': 
1152.678, 'src': 'embed', 'start': 1122.549, 'weight': 1, 'content': [{'end': 1139.983, 'text': 'so that basically indicates that whatever model we have actually designed over here it is a good model is able to solve this multiple linear regression the perfect example of multiple linear regression.', 'start': 1122.549, 'duration': 17.434}, {'end': 1148.233, 'text': "i have also got a model where i'm using this r squared values or r squared score to measure how my model has done.", 'start': 1139.983, 'duration': 8.25}, {'end': 1151.397, 'text': 'So I hope you like this video guys.', 'start': 1149.616, 'duration': 1.781}, {'end': 1152.678, 'text': 'Please share with your friends.', 'start': 1151.517, 'duration': 1.161}], 'summary': 'A good model for multiple linear regression with high r-squared score.', 'duration': 30.129, 'max_score': 1122.549, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox81122549.jpg'}], 'start': 791.031, 'title': 'Multiple linear regression model training', 'summary': 'Explains training a multiple linear regression model with train-test split (test size 0.2, random state 0) and making predictions. 
it discusses r squared (0.93) for model accuracy.', 'chapters': [{'end': 977.441, 'start': 791.031, 'title': 'Multiple linear regression model training', 'summary': 'Explains the process of training a multiple linear regression model with multiple independent features, using the train-test split with a test size of 0.2 and random state of 0, and fitting the model to make predictions.', 'duration': 186.41, 'highlights': ['The process involves training a multiple linear regression model with multiple independent features The chapter explains the process of training a multiple linear regression model with multiple independent features, using the train-test split with a test size of 0.2 and random state of 0.', "The train-test split is performed using the function called train_test_split In the train-test split, the function 'train_test_split' is used with x, y, a test size of 0.2, and a random state of 0.", 'Fitting the model with X train and Y train to initialize the linear regression model for training The model is initialized and fitted with X train and Y train to create the linear regression model for training.', 'Making predictions by using the trained model on the X test data to obtain Y predictions The trained model is used to make predictions on the X test data to obtain Y predictions.']}, {'end': 1189.453, 'start': 977.441, 'title': 'Calculating r squared for model accuracy', 'summary': 'Discusses the concept of r squared to measure model accuracy, highlighting that a high r squared value (0.93) indicates a good model for multiple linear regression.', 'duration': 212.012, 'highlights': ['The R squared value obtained is 0.93, indicating a very good model for multiple linear regression.', 'R squared values range between 0 and 1, with values nearer to one signifying a very good model.', 'SSMEAN will be much greater than SSRES if the model is very good, so the ratio SSRES / SSMEAN is small and R squared is close to 1.', 'R squared is calculated using the formula: 1 - (sum of residual / sum 
of mean).']}], 'duration': 398.422, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/5rvnlZWzox8/pics/5rvnlZWzox8791031.jpg', 'highlights': ['The process involves training a multiple linear regression model with multiple independent features using the train-test split with a test size of 0.2 and random state of 0.', 'The R squared value obtained is 0.93, indicating a very good model for multiple linear regression.', "In the train-test split, the library 'train_test_split' is used with x, y, a test size of 0.2, and a random state of 0.", 'R squared values range between 0 to 1, with values nearer to one signifying a very good model.']}], 'highlights': ['The equation y=β0+β1x1+β2x2+β3x3 is used to predict the output value based on multiple independent features, with β0 as the intercept and β1, β2, β3 as slopes of the coefficient with respect to the independent features.', "The practical use case involves a data set called '50_startups.csv' and the usage of NumPy for creating arrays, Matplotlib for visualization, and Pandas for creating the data set.", 'The intercept in simple linear regression indicates the base price when the size is zero, while the slope indicates the unit increase in price with a unit increase in size.', 'The goal is to predict the profit value based on the values of R&D spend, administration, marketing spend, and the state, constituting a regression problem. Quantifiable data: R&D spend, administration, marketing spend, state.', 'The process involves dividing the data set into dependent (profit) and independent features (R&D spend, administration spend, marketing spend, state). Key point: Dividing data into dependent and independent features. 
Quantifiable data: Profit, R&D spend, administration spend, marketing spend, state.', 'The chapter explains the process of converting categorical features into one hot encoding using get_dummies function in Pandas, addressing the issue of dummy variable trap and concatenating the dummy variables with the original dataset, resulting in a new data frame with two columns.', 'The get_dummies function in Pandas is used to create dummy variables for the three unique categories (New York, California, and Florida), resulting in three columns representing the presence of each category, with the drop_first=True parameter to avoid the dummy variable trap.', 'The process involves training a multiple linear regression model with multiple independent features using the train-test split with a test size of 0.2 and random state of 0.', 'The R squared value obtained is 0.93, indicating a very good model for multiple linear regression.', "In the train-test split, the library 'train_test_split' is used with x, y, a test size of 0.2, and a random state of 0.", 'R squared values range between 0 to 1, with values nearer to one signifying a very good model.']}
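
The steps summarized above (splitting features from the target, one hot encoding the state column, the train-test split, fitting, and R squared) can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the CSV file is not available here, so a synthetic stand-in is generated, and its column names, value ranges, and coefficients are assumptions made only for illustration. Passing dtype=float to get_dummies is also an addition, used to keep the feature matrix purely numeric.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the startup data set (50 rows).
# Column names, ranges, and the profit formula are invented for illustration.
rng = np.random.default_rng(0)
n = 50
dataset = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 165_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 470_000, n),
    # Cycle through the three states so every category is present.
    "State": np.resize(["New York", "California", "Florida"], n),
})
dataset["Profit"] = (
    50_000
    + 0.8 * dataset["R&D Spend"]
    + 0.03 * dataset["Marketing Spend"]
    + rng.normal(0, 9_000, n)
)

# Independent features: every column except the last; dependent feature: the last.
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# One hot encode the state column. drop_first=True drops one of the three
# categories, leaving two dummy columns (avoids the dummy variable trap).
states = pd.get_dummies(X["State"], drop_first=True, dtype=float)

# Drop the original text column and append the dummy columns.
X = pd.concat([X.drop("State", axis=1), states], axis=1)

# Hold out 20% of the rows for testing, as described in the transcript.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model and predict on the held-out rows.
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# R squared by hand: 1 - SS_res / SS_mean, using sums of squared differences.
ss_res = np.sum((y_test - y_pred) ** 2)
ss_mean = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_mean

# The same score from scikit-learn.
r2_sklearn = r2_score(y_test, y_pred)

print("intercept (beta0):", regressor.intercept_)
print("coefficients (beta1..):", dict(zip(X.columns, regressor.coef_)))
print("R squared:", r2_sklearn)
```

The hand-written score and r2_score should agree up to floating-point error. The 0.93 quoted above comes from the real data set, so the synthetic data here will generally give a different number.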