title

Principal Component Analysis in Python | Basics of Principle Component Analysis Explained | Edureka

description

Post Graduate Diploma in Artificial Intelligence by E&ICT Academy
NIT Warangal: https://www.edureka.co/executive-programs/machine-learning-and-ai
This Edureka session on Principal Component Analysis (PCA) will help you understand the concepts behind dimensionality reduction and how PCA can be used to deal with high dimensional data.
Here's a list of topics that will be covered in this session:
1. Need For Principal Component Analysis
2. What is PCA?
3. Step by step computation of PCA
4. Principal Component Analysis With Python
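
As a quick taste of topic 4, here is a minimal, illustrative sketch of PCA in Python (not code from the video; it assumes scikit-learn and NumPy are installed, and the data matrix is synthetic):

```python
# Minimal PCA sketch: standardize a synthetic 5-feature dataset, then
# reduce it to 2 principal components with scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 samples, 5 features (synthetic)

X_std = StandardScaler().fit_transform(X)  # standardize each feature first
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)       # covariance, eigen-decomposition, projection

print(X_reduced.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)       # variance captured by each component
```

The `explained_variance_ratio_` attribute is a convenient way to check how much information survives the reduction.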
Check out the Entire Machine Learning Playlist: https://bit.ly/2NG9tK4
Do subscribe to our channel and hit the bell icon to never miss an update from us in the future: https://goo.gl/6ohpTV
----------Edureka Python Trainings-----------
Python Programming Certification: http://bit.ly/37rEsnA
Python Certification Training for Data Science: http://bit.ly/2Gj6fux
----------Edureka Masters Program----------
Data Scientist Masters Program: http://bit.ly/2t1snGM
Machine Learning Engineer Masters Program: https://bit.ly/3Hi1sXN
-----------Edureka University Program----------
Post Graduate Diploma in Artificial Intelligence Course offered by E&ICT Academy
NIT Warangal: https://bit.ly/3qdRRdw
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Slideshare: https://www.slideshare.net/EdurekaIN/
-------------------------------------
About the Masters Program
Edureka's Machine Learning Engineer Masters Program makes you proficient in techniques like Supervised Learning, Unsupervised Learning and Natural Language Processing. It includes training on the latest advancements and technical approaches in Artificial Intelligence & Machine Learning such as Deep Learning, Graphical Models and Reinforcement Learning.
The Master's Program covers topics like:
Python Programming
PySpark
HDFS
Spark SQL
Machine Learning Techniques and Artificial Intelligence Types
Tokenization
Named Entity Recognition
Lemmatization
Supervised Algorithms
Unsupervised Algorithms
TensorFlow
Deep Learning
Keras
Neural Networks
Bayesian and Markov Models
Inference
Decision Making
Bandit Algorithms
Bellman Equation
Policy Gradient Methods
----------------------
Prerequisites
There are no prerequisites for enrolment in the Masters Program. However, as a goodwill gesture, Edureka offers a complimentary self-paced SQL Essentials course in your LMS to brush up your SQL skills. This program is designed for aspirants planning to build a career in Machine Learning as well as experienced professionals working in the IT industry.
--------------------------------------
Please write back to us at sales@edureka.in or call us at IND: 9606058406 / US: 18338555775 (toll-free) for more information.

detail

{'title': 'Principal Component Analysis in Python | Basics of Principle Component Analysis Explained | Edureka', 'heatmap': [{'end': 268.486, 'start': 171.613, 'weight': 0.784}, {'end': 461.929, 'start': 432.913, 'weight': 0.968}, {'end': 615.143, 'start': 595.67, 'weight': 0.901}, {'end': 841.469, 'start': 768.823, 'weight': 0.983}, {'end': 912.87, 'start': 887.901, 'weight': 0.722}, {'end': 1147.285, 'start': 996.917, 'weight': 0.807}, {'end': 1549.922, 'start': 1483.832, 'weight': 0.72}, {'end': 1689.984, 'start': 1646.279, 'weight': 0.72}], 'summary': 'Tutorial series explains principal component analysis (pca) and its significance in solving high-dimensional data problems. it covers the need for pca, the curse of dimensionality, dimensionality reduction, covariance analysis, eigenvectors, eigenvalues, and demonstrates reducing dataset dimension from 9000 to 500, resulting in 10x simplification in computation.', 'chapters': [{'end': 298.249, 'segs': [{'end': 37.934, 'src': 'embed', 'start': 11.624, 'weight': 3, 'content': [{'end': 15.806, 'text': 'With the advancements in the field of machine learning and artificial intelligence.', 'start': 11.624, 'duration': 4.182}, {'end': 20.547, 'text': 'It has become very important to understand the fundamentals behind such Technologies.', 'start': 16.146, 'duration': 4.401}, {'end': 26.69, 'text': "Hi all I'm Zuleika from Edureka and I welcome you to this session on principle component analysis.", 'start': 21.188, 'duration': 5.502}, {'end': 34.213, 'text': 'In this session, you will understand the concepts behind dimensionality reduction and how it can be used to deal with high dimensional data.', 'start': 27.47, 'duration': 6.743}, {'end': 37.934, 'text': "Now before we move any further, let's take a look at the agenda for today.", 'start': 34.833, 'duration': 3.101}], 'summary': 'Advancements in machine learning and ai make understanding pca crucial. 
zuleika from edureka discusses dimensionality reduction in high-dimensional data.', 'duration': 26.31, 'max_score': 11.624, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI11624.jpg'}, {'end': 171.613, 'src': 'embed', 'start': 86.903, 'weight': 0, 'content': [{'end': 91.069, 'text': 'but, however, using a large data set has its own pitfalls.', 'start': 86.903, 'duration': 4.166}, {'end': 98.779, 'text': 'The biggest pitfall is the curse of dimensionality now in order to understand what exactly curse of dimensionality means.', 'start': 91.589, 'duration': 7.19}, {'end': 100.782, 'text': "Let's take a look at a small example.", 'start': 99.18, 'duration': 1.602}, {'end': 103.045, 'text': 'Now look at the figure on the screen.', 'start': 101.122, 'duration': 1.923}, {'end': 109.796, 'text': "So consider a line of hundred yards and let's say that you have dropped a coin somewhere on this line.", 'start': 103.974, 'duration': 5.822}, {'end': 118.879, 'text': "Now, it's very easy and it's very convenient for you to find the coin by simply walking on the line and this very line is a single dimensional entity.", 'start': 110.296, 'duration': 8.583}, {'end': 124.323, 'text': "Now in the second image, you can see that there is a square of let's say of side hundred yards.", 'start': 119.6, 'duration': 4.723}, {'end': 127.704, 'text': "Now again, you've dropped a coin somewhere in between.", 'start': 124.923, 'duration': 2.781}, {'end': 134.968, 'text': "now it's quite evident that you're going to take more time to find the coin within that square as compared to the previous scenario,", 'start': 127.704, 'duration': 7.264}, {'end': 137.79, 'text': 'because the square is a two-dimensional entity.', 'start': 134.968, 'duration': 2.822}, {'end': 143.213, 'text': "right now, let's go one step further and let's consider cube of side hundred yards.", 'start': 137.79, 'duration': 5.423}, {'end': 147.655, 'text': "So again, 
let's assume that you dropped a coin somewhere in between this cube.", 'start': 144.013, 'duration': 3.642}, {'end': 155.21, 'text': "So obviously it's going to be more difficult for you to find the coin this time because a cube is a three-dimensional entity.", 'start': 148.354, 'duration': 6.856}, {'end': 163.589, 'text': 'So, as you can observe, the complexity is increasing as the dimension of the whole plane is increasing and, in real life,', 'start': 155.845, 'duration': 7.744}, {'end': 171.613, 'text': "the high dimensional data that we're talking about has thousands of dimensions that make it very, very complex to handle and process right.", 'start': 163.589, 'duration': 8.024}], 'summary': 'High dimensional data presents complexity, with thousands of dimensions making it difficult to handle and process.', 'duration': 84.71, 'max_score': 86.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI86903.jpg'}, {'end': 268.486, 'src': 'heatmap', 'start': 171.613, 'weight': 0.784, 'content': [{'end': 179.777, 'text': 'the high dimensional data can be easily found in use cases like image processing in natural language processing, image translation and so on.', 'start': 171.613, 'duration': 8.164}, {'end': 183.038, 'text': 'So this is exactly what the curse of dimensionality means.', 'start': 180.297, 'duration': 2.741}, {'end': 188.42, 'text': 'So to get rid of this curse, we came up with a process called dimensionality reduction.', 'start': 183.458, 'duration': 4.962}, {'end': 200.105, 'text': 'Now, dimensionality reduction techniques can be used to filter only a limited number of significant features which are needed for training your model or your predictive model or your machine learning model.', 'start': 188.921, 'duration': 11.184}, {'end': 204.607, 'text': 'And this is exactly where principal component analysis comes into the picture.', 'start': 200.645, 'duration': 3.962}, {'end': 209.322, 'text': "So 
now let's move on and discuss what exactly principal component analysis is.", 'start': 205.279, 'duration': 4.043}, {'end': 213.546, 'text': 'So principal component analysis, also known as PCA,', 'start': 209.863, 'duration': 3.683}, {'end': 226.536, 'text': 'is a dimensionality reduction technique that lets you identify correlations and patterns in a data set so that it can be transformed into another data set which has lower dimensions,', 'start': 213.546, 'duration': 12.99}, {'end': 233.542, 'text': "and it also make sure that you're not losing much information when you're transferring your data from high dimension to low dimension.", 'start': 226.536, 'duration': 7.006}, {'end': 241.466, 'text': 'Now the main idea behind PCA is to figure out patterns and correlations among different features in your data set.', 'start': 234.082, 'duration': 7.384}, {'end': 242.946, 'text': 'So why do we do that??', 'start': 242.126, 'duration': 0.82}, {'end': 246.188, 'text': 'Why do we have to find correlations among different features?', 'start': 242.986, 'duration': 3.202}, {'end': 256.278, 'text': "So the reason why we're finding highly correlated features is because highly correlated independent features usually cause an output which is very biased.", 'start': 246.992, 'duration': 9.286}, {'end': 262.582, 'text': "right, and if two features within data set are highly correlated and I'm talking about two independent features or you can say,", 'start': 256.278, 'duration': 6.304}, {'end': 268.486, 'text': 'two predictor features now if two predictor features in your data set are highly correlated,', 'start': 262.582, 'duration': 5.904}], 'summary': 'Dimensionality reduction techniques like pca help identify and transform high-dimensional data into a lower-dimensional dataset to avoid losing much information.', 'duration': 96.873, 'max_score': 171.613, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI171613.jpg'}, 
{'end': 241.466, 'src': 'embed', 'start': 209.863, 'weight': 1, 'content': [{'end': 213.546, 'text': 'So principal component analysis, also known as PCA,', 'start': 209.863, 'duration': 3.683}, {'end': 226.536, 'text': 'is a dimensionality reduction technique that lets you identify correlations and patterns in a data set so that it can be transformed into another data set which has lower dimensions,', 'start': 213.546, 'duration': 12.99}, {'end': 233.542, 'text': "and it also make sure that you're not losing much information when you're transferring your data from high dimension to low dimension.", 'start': 226.536, 'duration': 7.006}, {'end': 241.466, 'text': 'Now the main idea behind PCA is to figure out patterns and correlations among different features in your data set.', 'start': 234.082, 'duration': 7.384}], 'summary': 'Pca is a dimensionality reduction technique that identifies correlations and patterns in a dataset to transform it into a lower-dimensional set without losing much information.', 'duration': 31.603, 'max_score': 209.863, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI209863.jpg'}], 'start': 11.624, 'title': 'Principal component analysis and curse of dimensionality', 'summary': 'Covers the need for principal component analysis to deal with high dimensional data, along with the curse of dimensionality and the concept of pca as a dimensionality reduction technique. 
it emphasizes the increasing complexity with higher dimensions and the need to identify and remove redundant and highly correlated features to prevent bias in the output.', 'chapters': [{'end': 109.796, 'start': 11.624, 'title': 'Understanding principal component analysis', 'summary': 'Covers the need for principal component analysis to deal with high dimensional data in machine learning, followed by the step-by-step computation of pca and a practical demonstration using python.', 'duration': 98.172, 'highlights': ['The biggest pitfall is the curse of dimensionality in machine learning, where having a large and informative dataset can lead to issues due to high dimensionality. The curse of dimensionality can lead to issues in machine learning with large and informative datasets.', 'Understanding the need for principal component analysis in dealing with high dimensional data in machine learning. Explaining the need for principal component analysis to handle high dimensional data in machine learning.', 'The session covers the step-by-step computation of principal component analysis and a practical demonstration of using PCA with Python. Detailed explanation and practical demonstration of the step-by-step computation of principal component analysis using Python.']}, {'end': 298.249, 'start': 110.296, 'title': 'Curse of dimensionality and pca', 'summary': 'Explains the curse of dimensionality and the concept of principal component analysis (pca) as a dimensionality reduction technique, highlighting the increasing complexity with higher dimensions and the need to identify and remove redundant and highly correlated features from high dimensional data to prevent bias in the output.', 'duration': 187.953, 'highlights': ['The complexity of handling high dimensional data increases as the dimension of the whole plane increases, with thousands of dimensions making it very complex to handle and process. 
Handling high dimensional data becomes increasingly complex as the dimensions increase, with thousands of dimensions posing challenges in handling and processing the data.', 'Principal component analysis (PCA) is a dimensionality reduction technique that identifies correlations and patterns in a data set and transforms it into a lower-dimensional data set while preserving as much information as possible. PCA is a technique that identifies correlations and patterns in a data set and transforms it into a lower-dimensional data set while preserving as much information as possible, serving as a method for dimensionality reduction.', 'Identifying highly correlated features is crucial as they can cause bias in the output, and removing redundant and inconsistent features is essential for dimensionality reduction. Identifying highly correlated features is crucial as they can cause bias in the output, and removing redundant and inconsistent features is essential for dimensionality reduction, ensuring that only significant data is retained for predicting the output.']}], 'duration': 286.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI11624.jpg', 'highlights': ['Detailed explanation and practical demonstration of the step-by-step computation of principal component analysis using Python.', 'PCA is a technique that identifies correlations and patterns in a data set and transforms it into a lower-dimensional data set while preserving as much information as possible, serving as a method for dimensionality reduction.', 'Identifying highly correlated features is crucial as they can cause bias in the output, and removing redundant and inconsistent features is essential for dimensionality reduction, ensuring that only significant data is retained for predicting the output.', 'Understanding the need for principal component analysis to handle high dimensional data in machine learning.', 'The curse of dimensionality can lead 
to issues in machine learning with large and informative datasets.', 'Handling high dimensional data becomes increasingly complex as the dimensions increase, with thousands of dimensions posing challenges in handling and processing the data.']}, {'end': 549.846, 'segs': [{'end': 333.845, 'src': 'embed', 'start': 298.509, 'weight': 4, 'content': [{'end': 299.79, 'text': "Let's say in the original data set.", 'start': 298.509, 'duration': 1.281}, {'end': 305.755, 'text': "We have like hundred variables and then you just narrow it down to a couple of let's say 20 variables.", 'start': 299.83, 'duration': 5.925}, {'end': 308.639, 'text': 'This is exactly how dimensionality reduction is done.', 'start': 306.357, 'duration': 2.282}, {'end': 316.326, 'text': "Now one thing you need to keep in mind is when you're performing dimensionality reduction using PCA or any other method for that matter.", 'start': 309.199, 'duration': 7.127}, {'end': 325.113, 'text': 'You have to make sure that you perform this whole process in such a way that the significant data is still retained in your new data set.', 'start': 316.826, 'duration': 8.287}, {'end': 333.845, 'text': "So, basically, you're narrowing down a couple of variables from your original data set to your final data set, the data set that is reduced.", 'start': 325.98, 'duration': 7.865}], 'summary': 'Dimensionality reduction reduces 100 variables to 20, retaining significant data.', 'duration': 35.336, 'max_score': 298.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI298509.jpg'}, {'end': 461.929, 'src': 'heatmap', 'start': 432.913, 'weight': 0.968, 'content': [{'end': 439.075, 'text': 'Next, you will compute the principal components, followed by which will finally reduce the dimension of your data set,', 'start': 432.913, 'duration': 6.162}, {'end': 443.497, 'text': 'depending on all the features that you obtain or the principal components that you obtain.', 
'start': 439.075, 'duration': 4.422}, {'end': 447.198, 'text': "now let's understand each of these steps in a more detailed manner.", 'start': 443.497, 'duration': 3.701}, {'end': 450.759, 'text': 'So what exactly do you mean by standardization of a data set?', 'start': 447.638, 'duration': 3.121}, {'end': 456.203, 'text': "Okay, if you're familiar with data analysis and with data processing,", 'start': 451.579, 'duration': 4.624}, {'end': 461.929, 'text': 'you know that missing out on standardization will probably result in a biased outcome,', 'start': 456.203, 'duration': 5.726}], 'summary': "Compute principal components to reduce data dimension, understanding standardization's importance in data analysis and processing.", 'duration': 29.016, 'max_score': 432.913, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI432913.jpg'}, {'end': 507.673, 'src': 'embed', 'start': 482.879, 'weight': 0, 'content': [{'end': 490.404, 'text': "Okay, these are all let's say ratings of a movie between 0 and 5 and the number of downloads that movie is caught on some website.", 'start': 482.879, 'duration': 7.525}, {'end': 493.987, 'text': "Okay, let's say that value ranges between hundred and five thousand.", 'start': 490.745, 'duration': 3.242}, {'end': 502.928, 'text': "Now, in such a scenario, it's very obvious that the output which is calculated by using these two variables is going to be biased,", 'start': 494.759, 'duration': 8.169}, {'end': 507.673, 'text': 'because the variables with a larger range will have a more obvious impact on the output.', 'start': 502.928, 'duration': 4.745}], 'summary': 'Movie ratings from 0 to 5 and downloads between 100 to 5000 may create biased output due to variable ranges.', 'duration': 24.794, 'max_score': 482.879, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI482879.jpg'}, {'end': 552.488, 'src': 'embed', 'start': 526.671, 
'weight': 2, 'content': [{'end': 532.934, 'text': "That's why standardization or standardizing the data into a comparable range is very important.", 'start': 526.671, 'duration': 6.263}, {'end': 541.06, 'text': 'So what you do is you narrow down ratings and number of downloads into a similar range in decimal points or in single digits.', 'start': 533.534, 'duration': 7.526}, {'end': 543.061, 'text': 'So how do you perform standardization?', 'start': 541.5, 'duration': 1.561}, {'end': 549.846, 'text': 'You basically take the value of your variable, you subtract it with a mean and then you divide it by your standard deviation.', 'start': 543.101, 'duration': 6.745}, {'end': 552.488, 'text': "That's exactly how you perform standardization.", 'start': 550.286, 'duration': 2.202}], 'summary': 'Standardizing data is crucial; narrow ratings and downloads into comparable decimal ranges; perform standardization by subtracting mean and dividing by standard deviation.', 'duration': 25.817, 'max_score': 526.671, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI526671.jpg'}], 'start': 298.509, 'title': 'Dimensionality reduction and pca', 'summary': 'Focuses on the significance of dimensionality reduction in retaining important data while narrowing variables, emphasizing the relevance of pca in solving high dimensional data problems. it explains the step-by-step process of pca, including standardization, computing covariance matrix, eigenvalues, and eigenvectors, underlining the importance of standardization for accuracy.', 'chapters': [{'end': 376.993, 'start': 298.509, 'title': 'Importance of dimensionality reduction', 'summary': 'Discusses the significance of dimensionality reduction, particularly in retaining significant data while narrowing down variables from the original data set to the reduced data set, emphasizing the importance of not losing out on important information. 
it also highlights the relevance of pca in solving complex data-driven problems with high dimensional data sets.', 'duration': 78.484, 'highlights': ['The significance of dimensionality reduction lies in retaining important data while narrowing down variables from the original data set to the reduced data set, emphasizing the importance of not losing out on significant information.', 'PCA is crucial in solving complex data-driven problems involving high dimensional data sets and is implemented in the majority of machine learning algorithms.', 'The process of dimensionality reduction can be achieved through a series of steps, such as PCA, and it is essential in solving complex data-driven problems.']}, {'end': 549.846, 'start': 376.993, 'title': 'Principal component analysis (pca)', 'summary': 'Explains the step-by-step process of principal component analysis (pca), which includes standardization, computing covariance matrix, eigenvalues and eigenvectors, and reducing the dimension of the dataset, emphasizing the importance of standardization for accurate outcomes.', 'duration': 172.853, 'highlights': ['The importance of standardization for accurate outcomes Standardization is crucial for scaling data to ensure all variables and features lie within a similar range, preventing biased outcomes, as demonstrated through the example of movie ratings and number of downloads.', 'Process of Principal Component Analysis (PCA) The step-by-step process of PCA involves standardization, computing covariance matrix, eigenvalues and eigenvectors, and reducing the dimension of the dataset to enable the implementation of machine learning and predictive modeling.', 'Impact of variables on the outcome Explains how variables with larger ranges can have a more significant impact on the output, emphasizing the need to standardize data to ensure fair influence of all features in the dataset.']}], 'duration': 251.337, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI298509.jpg', 'highlights': ['PCA is crucial in solving complex data-driven problems involving high dimensional data sets and is implemented in the majority of machine learning algorithms.', 'The process of dimensionality reduction can be achieved through a series of steps, such as PCA, and it is essential in solving complex data-driven problems.', 'The significance of dimensionality reduction lies in retaining important data while narrowing down variables from the original data set to the reduced data set, emphasizing the importance of not losing out on significant information.', 'The step-by-step process of PCA involves standardization, computing covariance matrix, eigenvalues and eigenvectors, and reducing the dimension of the dataset to enable the implementation of machine learning and predictive modeling.', 'The importance of standardization for accurate outcomes Standardization is crucial for scaling data to ensure all variables and features lie within a similar range, preventing biased outcomes, as demonstrated through the example of movie ratings and number of downloads.', 'Impact of variables on the outcome Explains how variables with larger ranges can have a more significant impact on the output, emphasizing the need to standardize data to ensure fair influence of all features in the dataset.']}, {'end': 966.776, 'segs': [{'end': 621.668, 'src': 'heatmap', 'start': 595.67, 'weight': 0.901, 'content': [{'end': 600.553, 'text': 'So guys covariance matrix is something that we learned in I think I learned it around when I was in 10th standard.', 'start': 595.67, 'duration': 4.883}, {'end': 601.134, 'text': "I'm not sure.", 'start': 600.593, 'duration': 0.541}, {'end': 609.379, 'text': 'So, like I mentioned earlier, PCA helps you to identify the correlation and the dependencies among the features in a data set right,', 'start': 601.454, 'duration': 7.925}, {'end': 
615.143, 'text': "because it's important to understand variables which are heavily correlated or, you know, heavily dependent on each other,", 'start': 609.379, 'duration': 5.764}, {'end': 616.304, 'text': 'so that you can get rid of them.', 'start': 615.143, 'duration': 1.161}, {'end': 621.668, 'text': 'unless these two variables are your predictor variable and your target variable,', 'start': 616.884, 'duration': 4.784}], 'summary': 'Pca helps identify correlations and dependencies among features in a dataset, enabling removal of heavily correlated or dependent variables.', 'duration': 25.998, 'max_score': 595.67, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI595670.jpg'}, {'end': 650.121, 'src': 'embed', 'start': 625.931, 'weight': 1, 'content': [{'end': 632.676, 'text': 'But if the other features or if the other predictor variables in your data set are highly dependent on each other,', 'start': 625.931, 'duration': 6.745}, {'end': 634.758, 'text': 'then your output is going to be biased again.', 'start': 632.676, 'duration': 2.082}, {'end': 641.663, 'text': 'So PCA basically helps you identify this correlation between your different predictor variables or your different features in the data set.', 'start': 635.598, 'duration': 6.065}, {'end': 650.121, 'text': 'So what a covariance Matrix does is a covariance Matrix will represent the correlation between the different variables in a data set.', 'start': 642.213, 'duration': 7.908}], 'summary': 'Pca identifies and represents correlation between predictor variables.', 'duration': 24.19, 'max_score': 625.931, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI625931.jpg'}, {'end': 841.469, 'src': 'heatmap', 'start': 768.823, 'weight': 0.983, 'content': [{'end': 773.926, 'text': "Now, let's move on and look at the next step which is calculating your eigenvectors and eigenvalues.", 'start': 768.823, 
'duration': 5.103}, {'end': 782.268, 'text': 'Now guys eigenvectors and eigenvalues are the mathematical construct that must be computed from your covariance matrix.', 'start': 774.665, 'duration': 7.603}, {'end': 789.071, 'text': 'Now the reason why you need eigenvectors and eigenvalues is so that you can determine the principal components of your data set.', 'start': 782.789, 'duration': 6.282}, {'end': 793.654, 'text': 'Now, at this point, you must be wondering what exactly a principal component is right?', 'start': 789.632, 'duration': 4.022}, {'end': 801.057, 'text': 'So, simply put, principal components are the new set of variables that are obtained from your initial set of variables.', 'start': 794.214, 'duration': 6.843}, {'end': 810.14, 'text': 'So basically they are computed in such a manner that the newly obtained variables are highly significant and independent of each other.', 'start': 801.698, 'duration': 8.442}, {'end': 812.16, 'text': "Now, this is something that's very important.", 'start': 810.48, 'duration': 1.68}, {'end': 818.502, 'text': "So, principal components are basically the new features that you'll obtain after you perform dimensionality reduction,", 'start': 812.461, 'duration': 6.041}, {'end': 822.723, 'text': 'after you basically go through the process of reducing, cutting down your variables.', 'start': 818.502, 'duration': 4.221}, {'end': 829.225, 'text': 'So very important that these principal components have to be highly significant for predicting your output,', 'start': 823.343, 'duration': 5.882}, {'end': 831.766, 'text': 'and they also have to be independent of each other.', 'start': 829.225, 'duration': 2.541}, {'end': 833.647, 'text': 'Now the principal components.', 'start': 832.326, 'duration': 1.321}, {'end': 841.469, 'text': 'what they do is they compress, and they possess most of the useful information that is actually scattered among all the different variables.', 'start': 833.647, 'duration': 7.822}], 'summary': 
'Calculate eigenvectors and eigenvalues to determine principal components for dimensionality reduction.', 'duration': 72.646, 'max_score': 768.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI768823.jpg'}, {'end': 833.647, 'src': 'embed', 'start': 789.632, 'weight': 2, 'content': [{'end': 793.654, 'text': 'Now, at this point, you must be wondering what exactly a principal component is right?', 'start': 789.632, 'duration': 4.022}, {'end': 801.057, 'text': 'So, simply put, principal components are the new set of variables that are obtained from your initial set of variables.', 'start': 794.214, 'duration': 6.843}, {'end': 810.14, 'text': 'So basically they are computed in such a manner that the newly obtained variables are highly significant and independent of each other.', 'start': 801.698, 'duration': 8.442}, {'end': 812.16, 'text': "Now, this is something that's very important.", 'start': 810.48, 'duration': 1.68}, {'end': 818.502, 'text': "So, principal components are basically the new features that you'll obtain after you perform dimensionality reduction,", 'start': 812.461, 'duration': 6.041}, {'end': 822.723, 'text': 'after you basically go through the process of reducing, cutting down your variables.', 'start': 818.502, 'duration': 4.221}, {'end': 829.225, 'text': 'So very important that these principal components have to be highly significant for predicting your output,', 'start': 823.343, 'duration': 5.882}, {'end': 831.766, 'text': 'and they also have to be independent of each other.', 'start': 829.225, 'duration': 2.541}, {'end': 833.647, 'text': 'Now the principal components.', 'start': 832.326, 'duration': 1.321}], 'summary': 'Principal components are new significant, independent variables obtained after dimensionality reduction, crucial for predicting output.', 'duration': 44.015, 'max_score': 789.632, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI789632.jpg'}, {'end': 912.87, 'src': 'heatmap', 'start': 887.901, 'weight': 0.722, 'content': [{'end': 894.284, 'text': "your overall performance will be reduced because you're performing unnecessary computation which, might you know, affect your final output.", 'start': 887.901, 'duration': 6.383}, {'end': 900.207, 'text': "So that's why you need to make sure that your principal components have all the essential data in them.", 'start': 894.825, 'duration': 5.382}, {'end': 905.967, 'text': 'So if your data set has five dimensions, then you have to form five principal components.', 'start': 900.925, 'duration': 5.042}, {'end': 912.87, 'text': "So initially you'll have to start by forming the same number of principal components as the number of dimensions that are there in your data set.", 'start': 906.467, 'duration': 6.403}], 'summary': 'Unnecessary computation reduces performance. use components corresponding to data dimensions.', 'duration': 24.969, 'max_score': 887.901, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI887901.jpg'}, {'end': 982.561, 'src': 'embed', 'start': 950.808, 'weight': 0, 'content': [{'end': 957.151, 'text': 'So you need to compute them in such a way that your first principal component will have the maximum information.', 'start': 950.808, 'duration': 6.343}, {'end': 960.453, 'text': 'The second will have the second maximum information and so on.', 'start': 957.432, 'duration': 3.021}, {'end': 966.776, 'text': 'So, when you do this, you know that if you arrange your data, or if you arrange these values in descending order,', 'start': 961.033, 'duration': 5.743}, {'end': 973.44, 'text': 'you know that the first few principal components are the most important, and you can just take them and use them as your new data variables.', 'start': 966.776, 'duration': 6.664}, {'end': 982.561, 'text': "Now 
where exactly do eigenvectors fall into this whole process? I'm assuming that you have a basic understanding of eigenvectors and eigenvalues.", 'start': 974.21, 'duration': 8.351}], 'summary': 'Compute principal components for maximum information. utilize eigenvectors and eigenvalues.', 'duration': 31.753, 'max_score': 950.808, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI950808.jpg'}], 'start': 550.286, 'title': 'Covariance and principal components analysis', 'summary': 'Discusses the importance of covariance values in identifying relationships between variables, the role of eigenvectors and eigenvalues in determining principal components, and their significance in dimensionality reduction and predictive modeling.', 'chapters': [{'end': 722.646, 'start': 550.286, 'title': 'Standardization and covariance matrix in ml', 'summary': 'Discusses the importance of standardization in machine learning and the role of covariance matrix in identifying correlation and dependencies among features, with a focus on reducing bias and redundant information. 
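The standardization and covariance-matrix steps summarized above can be sketched in a few lines of NumPy; the tiny three-feature array below is an invented stand-in for a real data set, not the video's data:

```python
import numpy as np

# Invented stand-in data: 5 samples, 3 features (a real data set is much larger).
X = np.array([[2.0, 8.0, 1.0],
              [4.0, 6.0, 2.0],
              [6.0, 4.0, 3.0],
              [8.0, 2.0, 4.0],
              [10.0, 0.0, 5.0]])

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of the standardized features (rowvar=False: columns are variables).
cov = np.cov(X_std, rowvar=False)
print(cov.shape)  # (3, 3): one row/column per feature
```

Note the signs match the transcript's point: feature 2 falls as feature 1 rises, so `cov[0, 1]` is negative (inverse proportionality), while `cov[0, 2]` is positive.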
it also highlights the mathematical representation and key takeaways of the covariance matrix.', 'duration': 172.36, 'highlights': ['Performing standardization is essential to level down all variables in one range, enabling their use in predicting the output, and it scales variables across a standard and comparable scale.', 'PCA helps to identify the correlation and dependencies among features in a data set, emphasizing the importance of understanding variables heavily correlated or dependent on each other to avoid bias.', 'A covariance matrix represents the correlation between different variables in a data set and its dimensions are crucial, with each entry representing the covariance of the corresponding variables.']}, {'end': 966.776, 'start': 722.646, 'title': 'Covariance and principal components analysis', 'summary': 'Discusses how covariance values indicate the relationship between variables, with negative values denoting indirect proportionality and positive values denoting direct proportionality. it also explains the importance of eigenvectors and eigenvalues in determining the principal components of a data set, which are new variables obtained from the initial set and play a significant role in dimensionality reduction and predictive modeling.', 'duration': 244.13, 'highlights': ['Covariance value indicates the relationship between variables, with negative values denoting indirect proportionality and positive values denoting direct proportionality. The covariance value signifies the relationship between variables, with negative values indicating indirect proportionality (e.g., if a increases, b decreases) and positive values denoting direct proportionality (e.g., if a increases, b also increases).', 'Explanation of the importance of eigenvectors and eigenvalues in determining the principal components of a data set. 
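The eigenvector/eigenvalue computation described here can be sketched with NumPy's symmetric eigensolver; the 2×2 covariance matrix below is invented purely for illustration of the two-dimensional case:

```python
import numpy as np

# Invented 2x2 covariance matrix for a two-dimensional data set.
cov = np.array([[2.0, 0.8],
                [0.8, 0.6]])

# eigh is the appropriate solver for symmetric matrices such as a covariance
# matrix; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Re-order in descending order so the first eigenvector (largest eigenvalue)
# points along the direction of maximum variance -- that direction is PC1.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)  # largest first
```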
Eigenvectors and eigenvalues are essential mathematical constructs computed from the covariance matrix to determine the principal components of a data set, which are new significant and independent variables obtained from the initial set.', 'Description of the significance of principal components in dimensionality reduction and predictive modeling. Principal components are new variables obtained from the initial set, which are highly significant and independent of each other, playing a crucial role in dimensionality reduction and predictive modeling by compressing and containing most of the useful information from the original variables.']}], 'duration': 416.49, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI550286.jpg', 'highlights': ['Performing standardization scales variables across a standard and comparable scale, essential for predicting the output.', 'PCA helps identify correlation and dependencies among features, emphasizing understanding heavily correlated variables to avoid bias.', 'Covariance matrix represents correlation between variables, with each entry representing the covariance of corresponding variables.', 'Covariance value indicates the relationship between variables, with negative values denoting indirect proportionality and positive values denoting direct proportionality.', 'Eigenvectors and eigenvalues are essential mathematical constructs computed from the covariance matrix to determine principal components.', 'Principal components are new variables obtained from the initial set, highly significant and independent, crucial for dimensionality reduction and predictive modeling.']}, {'end': 1474.628, 'segs': [{'end': 1147.285, 'src': 'heatmap', 'start': 996.917, 'weight': 0.807, 'content': [{'end': 999.16, 'text': "Okay, let's say that you have a two-dimensional data set.", 'start': 996.917, 'duration': 2.243}, {'end': 1004.125, 'text': "For that you'll have to compute two eigenvectors, 
right, and their respective eigenvalues.", 'start': 999.46, 'duration': 4.665}, {'end': 1014.055, 'text': 'Now the idea behind eigenvectors is to use the covariance matrix to understand where in the data there is the most amount of variance.', 'start': 1004.685, 'duration': 9.37}, {'end': 1022.663, 'text': 'because the covariance matrix gives you the variance overall variance among different variables and your overall variance in your data.', 'start': 1014.675, 'duration': 7.988}, {'end': 1028.829, 'text': 'the eigenvalues and the eigenvectors are basically used to understand variance in your data set,', 'start': 1022.663, 'duration': 6.166}, {'end': 1034.214, 'text': 'because more variance in the data will basically denote more information about the data.', 'start': 1028.829, 'duration': 5.385}, {'end': 1042.704, 'text': 'So eigenvectors are basically used to identify where in your data you have the maximum variance, in which direction or in which variable,', 'start': 1035.04, 'duration': 7.664}, {'end': 1049.649, 'text': 'or in you know which way do you have maximum variance in your data set, because variance denotes more information.', 'start': 1042.704, 'duration': 6.945}, {'end': 1056.973, 'text': "right, and that's exactly the point behind principal components, because you need to compute principal components that store the maximum information.", 'start': 1049.649, 'duration': 7.324}, {'end': 1066.241, 'text': "Now maximum information is stored in a place where there is maximum variance, right? 
That's the whole point behind eigenvalues and eigenvectors.", 'start': 1057.553, 'duration': 8.688}, {'end': 1071.746, 'text': 'So eigenvalues on the other hand will simply denote the scalar representative of your eigenvector.', 'start': 1066.922, 'duration': 4.824}, {'end': 1077.892, 'text': 'So basically you need to know that eigenvectors and eigenvalues will compute the principal components of the data set.', 'start': 1072.367, 'duration': 5.525}, {'end': 1082.369, 'text': "Moving on to step number four, here you'll compute the principal components.", 'start': 1078.644, 'duration': 3.725}, {'end': 1089.277, 'text': 'Now, once you finalize your eigenvectors and eigenvalues, all you have to do is you have to order them in the descending order,', 'start': 1082.789, 'duration': 6.488}, {'end': 1093.462, 'text': 'where the eigenvector with the highest eigenvalue is the most significant.', 'start': 1089.277, 'duration': 4.185}, {'end': 1098.728, 'text': 'And the one that is the most significant will form the first principal component.', 'start': 1094.203, 'duration': 4.525}, {'end': 1106.325, 'text': 'So in this manner, what you do is you have a ordered list wherein the first principal component is of the most importance.', 'start': 1099.341, 'duration': 6.984}, {'end': 1108.866, 'text': 'So, when you arrange them in descending order,', 'start': 1106.965, 'duration': 1.901}, {'end': 1115.35, 'text': 'what you can do is you can remove the lesser significant principal components in order to reduce the dimension of your data.', 'start': 1108.866, 'duration': 6.484}, {'end': 1117.571, 'text': "That's as logical as it gets.", 'start': 1115.85, 'duration': 1.721}, {'end': 1121.213, 'text': 'This is the basic logic behind principal component analysis.', 'start': 1118.011, 'duration': 3.202}, {'end': 1126.239, 'text': 'Also remember that in this step, you form a matrix, known as feature matrix,', 'start': 1121.957, 'duration': 4.282}, {'end': 1134.163, 'text': 'which 
basically contains all the significant data variables or all the significant principal components that possess the maximum information about the data.', 'start': 1126.239, 'duration': 7.924}, {'end': 1137.485, 'text': 'This is everything that you need to know about principal components.', 'start': 1134.764, 'duration': 2.721}, {'end': 1139.474, 'text': "Now let's look at the last step.", 'start': 1138.132, 'duration': 1.342}, {'end': 1147.285, 'text': 'So the last step in performing PCA is to basically rearrange the original data with the final principal components,', 'start': 1139.955, 'duration': 7.33}], 'summary': 'Pca involves computing eigenvectors and eigenvalues to identify maximum variance, then forming principal components to reduce data dimensions.', 'duration': 150.368, 'max_score': 996.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI996917.jpg'}, {'end': 1098.728, 'src': 'embed', 'start': 1072.367, 'weight': 7, 'content': [{'end': 1077.892, 'text': 'So basically you need to know that eigenvectors and eigenvalues will compute the principal components of the data set.', 'start': 1072.367, 'duration': 5.525}, {'end': 1082.369, 'text': "Moving on to step number four, here you'll compute the principal components.", 'start': 1078.644, 'duration': 3.725}, {'end': 1089.277, 'text': 'Now, once you finalize your eigenvectors and eigenvalues, all you have to do is you have to order them in the descending order,', 'start': 1082.789, 'duration': 6.488}, {'end': 1093.462, 'text': 'where the eigenvector with the highest eigenvalue is the most significant.', 'start': 1089.277, 'duration': 4.185}, {'end': 1098.728, 'text': 'And the one that is the most significant will form the first principal component.', 'start': 1094.203, 'duration': 4.525}], 'summary': 'Eigenvectors and eigenvalues compute principal components for data analysis.', 'duration': 26.361, 'max_score': 1072.367, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1072367.jpg'}, {'end': 1134.163, 'src': 'embed', 'start': 1108.866, 'weight': 2, 'content': [{'end': 1115.35, 'text': 'what you can do is you can remove the lesser significant principal components in order to reduce the dimension of your data.', 'start': 1108.866, 'duration': 6.484}, {'end': 1117.571, 'text': "That's as logical as it gets.", 'start': 1115.85, 'duration': 1.721}, {'end': 1121.213, 'text': 'This is the basic logic behind principal component analysis.', 'start': 1118.011, 'duration': 3.202}, {'end': 1126.239, 'text': 'Also remember that in this step, you form a matrix, known as feature matrix,', 'start': 1121.957, 'duration': 4.282}, {'end': 1134.163, 'text': 'which basically contains all the significant data variables or all the significant principal components that possess the maximum information about the data.', 'start': 1126.239, 'duration': 7.924}], 'summary': 'Principal component analysis reduces data dimensions by removing lesser significant components to form a feature matrix with maximum information.', 'duration': 25.297, 'max_score': 1108.866, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1108866.jpg'}, {'end': 1298.713, 'src': 'embed', 'start': 1257.081, 'weight': 0, 'content': [{'end': 1261.323, 'text': "we're going to narrow them down to as less movies as possible, right?", 'start': 1257.081, 'duration': 4.242}, {'end': 1263.044, 'text': "So we're going to have lesser features.", 'start': 1261.343, 'duration': 1.701}, {'end': 1264.265, 'text': "That's what we're going to do.", 'start': 1263.425, 'duration': 0.84}, {'end': 1266.126, 'text': "So let's quickly open up PyCharm.", 'start': 1264.525, 'duration': 1.601}, {'end': 1268.407, 'text': "I'll be using PyCharm in order to run this code.", 'start': 1266.186, 'duration': 2.221}, {'end': 1273.91, 'text': "I've already typed out the code and 
I will briefly explain what exactly is happening in the code.", 'start': 1268.728, 'duration': 5.182}, {'end': 1276.472, 'text': "We'll focus more on the output that we're going to get.", 'start': 1274.131, 'duration': 2.341}, {'end': 1283.928, 'text': 'So as always you start by importing your dependencies all the libraries that you need in order to perform this right.', 'start': 1277.125, 'duration': 6.803}, {'end': 1288.669, 'text': 'So we load the data and we store it here in a pandas framework right?', 'start': 1284.008, 'duration': 4.661}, {'end': 1290.89, 'text': 'So our data stored in this path.', 'start': 1288.709, 'duration': 2.181}, {'end': 1298.713, 'text': 'now the data set contains, like I said, contains around 700 users for a thousand nine hundred and thirteen movie ratings.', 'start': 1290.89, 'duration': 7.823}], 'summary': 'Narrowing down movies, using pycharm to process code, dataset has 700 users and 1913 movie ratings.', 'duration': 41.632, 'max_score': 1257.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1257081.jpg'}, {'end': 1372.627, 'src': 'embed', 'start': 1343.304, 'weight': 1, 'content': [{'end': 1345.885, 'text': "you have to make sure that they're converted into numeric format.", 'start': 1343.304, 'duration': 2.581}, {'end': 1351.526, 'text': "Also, what I'm doing is I'm replacing all the null values and I'm converting them to zeros.", 'start': 1346.745, 'duration': 4.781}, {'end': 1355.943, 'text': "So I'm just performing basic very basic data processing over here.", 'start': 1352.262, 'duration': 3.681}, {'end': 1360.164, 'text': 'All right, so guys take note of this function, the StandardScaler function.', 'start': 1355.963, 'duration': 4.201}, {'end': 1365.085, 'text': 'This is basically a function provided by sklearn and it performs standardization.', 'start': 1360.384, 'duration': 4.701}, {'end': 1372.627, 'text': "So like I said in our data set, we don't really need to
perform standardization, but it's always a plus point to you know, perform it anyway.', 'start': 1365.345, 'duration': 7.282}], 'summary': 'Converting null values to zeros and using the StandardScaler function for data processing.', 'duration': 29.323, 'max_score': 1343.304, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1343304.jpg'}, {'end': 1405.827, 'src': 'embed', 'start': 1382.711, 'weight': 3, 'content': [{'end': 1390.437, 'text': 'Now our next step will be to compute your covariance matrix, right? So a covariance matrix is created based on your standardized data set.', 'start': 1382.711, 'duration': 7.726}, {'end': 1396.06, 'text': 'So only your standardized data set will be passed to your covariance matrix based on the standardized data set.', 'start': 1390.817, 'duration': 5.243}, {'end': 1398.742, 'text': "You'll build a covariance matrix right?", 'start': 1396.081, 'duration': 2.661}, {'end': 1405.827, 'text': 'The covariance matrix, like I said, is a representation of the variance between each feature in your original data set, right?', 'start': 1398.822, 'duration': 7.005}], 'summary': 'Compute covariance matrix on standardized data to represent variance between features.', 'duration': 23.116, 'max_score': 1382.711, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1382711.jpg'}, {'end': 1459.078, 'src': 'embed', 'start': 1431.48, 'weight': 4, 'content': [{'end': 1438.685, 'text': 'So basically the negative values like I said will denote that the variance or the relationship between two variables is inversely proportional.', 'start': 1431.48, 'duration': 7.205}, {'end': 1447.89, 'text': "Now once you've computed the covariance matrix, you're going to perform eigen decomposition in order to derive your eigenvalues and your eigenvectors.", 'start': 1439.503, 'duration': 8.387}, {'end': 1453.174, 'text': 'Eigenvectors and eigenvalues are found
as a result of eigen decomposition.', 'start': 1448.31, 'duration': 4.864}, {'end': 1459.078, 'text': 'This is a fancy term just used to derive your eigenvectors and eigenvalues.', 'start': 1454.034, 'duration': 5.044}], 'summary': 'Covariance matrix helps derive eigenvalues and eigenvectors for relationship analysis.', 'duration': 27.598, 'max_score': 1431.48, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1431480.jpg'}], 'start': 966.776, 'title': 'Eigenvectors, eigenvalues, and pca', 'summary': 'Covers the use of eigenvectors and eigenvalues to identify maximum variance in data, leading to the formation of principal components in principal component analysis (pca), a dimensionality reduction technique used in machine learning and deep learning.', 'chapters': [{'end': 1071.746, 'start': 966.776, 'title': 'Understanding eigenvectors and eigenvalues', 'summary': 'Explains how eigenvectors and eigenvalues are used to identify the maximum variance in a dataset, with the goal of storing the maximum information in principal components.', 'duration': 104.97, 'highlights': ['The dimensions in the data determine the number of eigenvectors needed to be calculated, such as having to compute two eigenvectors for a two-dimensional data set.', 'Eigenvectors are used to identify where in the data there is the most amount of variance, as more variance in the data denotes more information about the data.', 'Eigenvalues and eigenvectors are used to understand the variance in the dataset, with the goal of computing principal components that store the maximum information.']}, {'end': 1474.628, 'start': 1072.367, 'title': 'Principal component analysis', 'summary': 'Explains the basic logic behind principal component analysis (pca), its steps, and its significance, emphasizing the computation of eigenvectors, eigenvalues, and the formation of principal components, leading to a dimensionality reduction technique that simplifies 
computations, particularly in machine learning and deep learning.', 'duration': 402.261, 'highlights': ['PCA involves computing eigenvectors and eigenvalues to form principal components, allowing dimensionality reduction. The process of computing eigenvectors and eigenvalues to form principal components allows for dimensionality reduction, simplifying computations, particularly in machine learning and deep learning.', 'Ordering eigenvectors in descending order of eigenvalues to identify the most significant principal component. Ordering eigenvectors in descending order of eigenvalues helps in identifying the most significant principal component, enabling the removal of lesser significant principal components to reduce the data dimension.', 'Formation of a feature matrix containing significant data variables or principal components. In this step, a feature matrix is formed, containing significant data variables or principal components that possess maximum information about the data, facilitating dimensionality reduction.', 'Rearranging the original data with the final principal components to represent the most significant information of the data set. The rearrangement of the original data with the final principal components represents the most significant information of the data set, leading to a reduced data set containing only the most important information.', 'Performing PCA using Python on a high-dimensional movie rating data set to illustrate dimensionality reduction. 
Illustrating the application of principal component analysis in reducing the dimensions of a high-dimensional movie rating data set using Python, aiming to compress the data and reduce the number of features.']}], 'duration': 507.852, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI966776.jpg', 'highlights': ['Eigenvectors are used to identify where in the data there is the most amount of variance, as more variance in the data denotes more information about the data.', 'Ordering eigenvectors in descending order of eigenvalues to identify the most significant principal component. Ordering eigenvectors in descending order of eigenvalues helps in identifying the most significant principal component, enabling the removal of lesser significant principal components to reduce the data dimension.', 'Formation of a feature matrix containing significant data variables or principal components. In this step, a feature matrix is formed, containing significant data variables or principal components that possess maximum information about the data, facilitating dimensionality reduction.', 'Rearranging the original data with the final principal components to represent the most significant information of the data set. The rearrangement of the original data with the final principal components represents the most significant information of the data set, leading to a reduced data set containing only the most important information.', 'Performing PCA using Python on a high-dimensional movie rating data set to illustrate dimensionality reduction. 
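The workflow the video walks through (load the ratings, fill nulls with zeros, standardize, fit PCA) can be sketched as below; the random 700×50 matrix is an invented stand-in for the real movie-rating table, which has far more columns:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Invented stand-in for the movie-rating table: 700 users x 50 movies,
# ratings 0-5 with 0 meaning "not rated" (a real table would be loaded with
# pandas and cleaned via to_numeric / fillna(0) first, as in the video).
rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(700, 50)).astype(float)

# Standardize, then fit PCA with one component per original feature.
X_std = StandardScaler().fit_transform(ratings)
pca = PCA(n_components=50)
scores = pca.fit_transform(X_std)

print(scores.shape)                       # (700, 50)
print(pca.explained_variance_ratio_[:2])  # variance share of PC1, PC2
```

sklearn returns the components already sorted, so `explained_variance_ratio_[0]` always belongs to the most significant principal component.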
Illustrating the application of principal component analysis in reducing the dimensions of a high-dimensional movie rating data set using Python, aiming to compress the data and reduce the number of features.', 'Eigenvalues and eigenvectors are used to understand the variance in the dataset, with the goal of computing principal components that store the maximum information.', 'PCA involves computing eigenvectors and eigenvalues to form principal components, allowing dimensionality reduction. The process of computing eigenvectors and eigenvalues to form principal components allows for dimensionality reduction, simplifying computations, particularly in machine learning and deep learning.', 'The dimensions in the data determine the number of eigenvectors needed to be calculated, such as having to compute two eigenvectors for a two-dimensional data set.']}, {'end': 1739.977, 'segs': [{'end': 1549.922, 'src': 'heatmap', 'start': 1483.832, 'weight': 0.72, 'content': [{'end': 1491.836, 'text': "Hence, your maximum information will be stored wherever there is maximum variance, right? Like I said earlier, that's the logic behind this.", 'start': 1483.832, 'duration': 8.004}, {'end': 1501.201, 'text': "Now after that what we're doing is we're just sorting out our entire eigenvalues in descending order like I said.", 'start': 1495.419, 'duration': 5.782}, {'end': 1507.203, 'text': "This is basically you're sorting it in a list which contains eigenvalues in the descending order.", 'start': 1501.741, 'duration': 5.462}, {'end': 1515.105, 'text': "After this we have your main part which is we're going to import the PCA function which is there in your sklearn library.", 'start': 1508.263, 'duration': 6.842}, {'end': 1523.474, 'text': "So what we'll do here is remember, like I said, that the first principal component will capture the most variance in your original variables.", 'start': 1516.047, 'duration': 7.427}, {'end': 1527.637, 'text': 'right?.
The second component will capture the second highest variance in your data set.', 'start': 1523.474, 'duration': 4.163}, {'end': 1530.24, 'text': "That's the whole logic behind PC1 PC2.", 'start': 1528.078, 'duration': 2.162}, {'end': 1538.707, 'text': "And for example, if you were to plot from a data set that contains two features, so let's just wait till we get the eigenvalues and eigenvectors.", 'start': 1530.66, 'duration': 8.047}, {'end': 1540.869, 'text': 'This is taking a little bit of time to compute.', 'start': 1538.727, 'duration': 2.142}, {'end': 1549.922, 'text': 'So, guys, remember that your eigenvectors with the lowest eigenvalue will describe the least amount of variation that is there in the data set,', 'start': 1541.678, 'duration': 8.244}], 'summary': 'Sorting eigenvalues in descending order, using pca to capture variance in data.', 'duration': 66.09, 'max_score': 1483.832, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1483832.jpg'}, {'end': 1594.973, 'src': 'embed', 'start': 1562.353, 'weight': 0, 'content': [{'end': 1566.496, 'text': 'So after you get the ordered list of eigenvalues, you can look at couple of you know.', 'start': 1562.353, 'duration': 4.143}, {'end': 1576.422, 'text': 'you can say that the first 15 or the first 20 or the first 50 of them are really essential and the rest of them are not really contributing to your significance in the data set.', 'start': 1566.496, 'duration': 9.926}, {'end': 1578.804, 'text': 'so you can get rid of those principal components.', 'start': 1576.422, 'duration': 2.382}, {'end': 1582.046, 'text': "So that's how you'll reduce your dimension in your data set.", 'start': 1579.364, 'duration': 2.682}, {'end': 1588.15, 'text': "So what we're going to do first is we'll just import the PCA function which is there in our SQL on library.", 'start': 1582.786, 'duration': 5.364}, {'end': 1594.973, 'text': 'and to get a better idea of how principal 
components describe the variance in the data,', 'start': 1589.049, 'duration': 5.924}], 'summary': 'Using pca, essential eigenvalues reduce dimensionality, improving dataset significance.', 'duration': 32.62, 'max_score': 1562.353, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1562353.jpg'}, {'end': 1628.218, 'src': 'embed', 'start': 1603.718, 'weight': 3, 'content': [{'end': 1610.083, 'text': 'Now, if you look at the output, it shows that the first two principal components, which is PC1 and PC2,', 'start': 1603.718, 'duration': 6.365}, {'end': 1613.245, 'text': 'they describe approximately 14% of the variance in the data.', 'start': 1610.083, 'duration': 3.162}, {'end': 1619.333, 'text': 'Now to get a better view of how each principal component explains the variance within your data.', 'start': 1614.511, 'duration': 4.822}, {'end': 1621.895, 'text': "We'll create something known as a scree plot.", 'start': 1619.694, 'duration': 2.201}, {'end': 1623.656, 'text': 'Okay, a scree plot.', 'start': 1622.175, 'duration': 1.481}, {'end': 1626.517, 'text': "is this exactly what I've done in this set of code?", 'start': 1623.656, 'duration': 2.861}, {'end': 1628.218, 'text': 'now a scree plot is basically.', 'start': 1626.517, 'duration': 1.701}], 'summary': 'Pc1 and pc2 describe 14% of data variance. 
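A scree plot like the one described here is nothing more than the per-component explained-variance ratios drawn in order, so the underlying values can be computed without plotting at all. A minimal sketch, using invented low-rank data so the first components visibly dominate:

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented data of rank ~3 embedded in 10 features, so a scree plot would
# show three tall bars followed by near-zero ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_  # the scree-plot heights, PC1 first

print(np.round(ratios, 3))
print(np.round(ratios.cumsum(), 3))  # cumulative variance explained
```

The "elbow" where the cumulative curve flattens is where you stop keeping components.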
a scree plot visualizes variance explained.', 'duration': 24.5, 'max_score': 1603.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1603718.jpg'}, {'end': 1709.437, 'src': 'heatmap', 'start': 1646.279, 'weight': 2, 'content': [{'end': 1649.981, 'text': 'Now this is a major reduction from the initial 8, 900 or 9, 000 features that we had.', 'start': 1646.279, 'duration': 3.702}, {'end': 1658.461, 'text': 'Right So what we did is we just narrowed down your 9000 features to approximately 500 features.', 'start': 1652.978, 'duration': 5.483}, {'end': 1666.705, 'text': 'Therefore just the first 500 or 490 eigenvectors should be used to construct the dimensions for the new feature space.', 'start': 1659.041, 'duration': 7.664}, {'end': 1675.709, 'text': 'So your final output, or your final data set, or your new data set, will contain only 480 features when compared to the original data set,', 'start': 1667.185, 'duration': 8.524}, {'end': 1677.209, 'text': 'which contained around 9000 features.', 'start': 1675.709, 'duration': 1.5}, {'end': 1679.952, 'text': 'So I hope all of you get the point.', 'start': 1678.61, 'duration': 1.342}, {'end': 1685.06, 'text': 'This is how we reduce your data dimension from 9000 to just 500.', 'start': 1680.092, 'duration': 4.968}, {'end': 1689.984, 'text': "that's going to reduce the complexity of the computation hundredfolds, right?", 'start': 1685.06, 'duration': 4.924}, {'end': 1692.746, 'text': 'So, guys, this is why I stress so much on PCA,', 'start': 1690.404, 'duration': 2.342}, {'end': 1701.011, 'text': "because PCA is one of the most simplest dimensionality reduction techniques and it's very essential because it makes your job 10 times simpler.", 'start': 1692.746, 'duration': 8.265}, {'end': 1709.437, 'text': "You're just getting rid of unnecessary and irrelevant data so that you can focus on the more important part of your predictive modeling,", 'start': 1701.392, 
'duration': 8.045}], 'summary': 'Reduced data dimension from 9000 to 500, simplifying computation and focusing on important data.', 'duration': 63.158, 'max_score': 1646.279, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1646279.jpg'}], 'start': 1475.289, 'title': 'Pca for dimensionality reduction', 'summary': 'Highlights computing eigenvalues and eigenvectors for pca, emphasizing their importance in capturing maximum variance and demonstrates reducing dataset dimension from 9000 to 500, resulting in 10x simplification in computation.', 'chapters': [{'end': 1561.688, 'start': 1475.289, 'title': 'Computing eigenvalues and eigenvectors for pca', 'summary': 'Emphasizes the importance of computing eigenvalues and eigenvectors to understand and capture maximum variance in data, sorting them in descending order, and utilizing the pca function to capture variance in original variables.', 'duration': 86.399, 'highlights': ['Eigenvalues and eigenvectors help understand maximum variance in data and store maximum information (quantifiable data: maximum variance)', 'Sorting eigenvalues in descending order is crucial for further analysis (quantifiable data: descending order)', 'First principal component captures the most variance in original variables, while the second component captures the second highest variance (quantifiable data: variance captured)', 'Eigenvectors with the lowest eigenvalue describe the least amount of variation in the data set, indicating the possibility of dropping off these variables (quantifiable data: significance in the data set)']}, {'end': 1739.977, 'start': 1562.353, 'title': 'Pca dimensionality reduction', 'summary': "Demonstrates the use of pca to reduce a dataset's dimension from 9000 features to 500, resulting in a 10x simplification in computation, and emphasizes the significance of pca in enhancing predictive modeling.", 'duration': 177.624, 'highlights': ["The dataset's dimension is 
reduced from 9000 features to 500 using PCA, resulting in a 10x simplification in computation. The initial dataset with around 9000 features is reduced to approximately 500 features, simplifying the computation process.", 'PCA explains that the first two principal components describe approximately 14% of the variance in the data. The first two principal components, PC1 and PC2, explain around 14% of the variance in the dataset.', 'Emphasizing the significance of PCA in enhancing predictive modeling and building a more accurate model. PCA is highlighted as a crucial technique for reducing unnecessary and irrelevant data, enabling a focus on the more important aspects of predictive modeling for better accuracy.']}], 'duration': 264.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/n7npKX5zIWI/pics/n7npKX5zIWI1475289.jpg', 'highlights': ['Eigenvalues and eigenvectors help understand maximum variance in data and store maximum information (quantifiable data: maximum variance)', 'Sorting eigenvalues in descending order is crucial for further analysis (quantifiable data: descending order)', "The dataset's dimension is reduced from 9000 features to 500 using PCA, resulting in a 10x simplification in computation. 
The initial dataset with around 9000 features is reduced to approximately 500 features, simplifying the computation process.", 'First principal component captures the most variance in original variables, while the second component captures the second highest variance (quantifiable data: variance captured)']}], 'highlights': ['PCA is crucial in solving complex data-driven problems involving high dimensional data sets and is implemented in the majority of machine learning algorithms.', 'The significance of dimensionality reduction lies in retaining important data while narrowing down variables from the original data set to the reduced data set, emphasizing the importance of not losing out on significant information.', 'The step-by-step process of PCA involves standardization, computing covariance matrix, eigenvalues and eigenvectors, and reducing the dimension of the dataset to enable the implementation of machine learning and predictive modeling.', 'Eigenvectors and eigenvalues are essential mathematical constructs computed from the covariance matrix to determine principal components.', 'Ordering eigenvectors in descending order of eigenvalues to identify the most significant principal component. Ordering eigenvectors in descending order of eigenvalues helps in identifying the most significant principal component, enabling the removal of lesser significant principal components to reduce the data dimension.', 'Formation of a feature matrix containing significant data variables or principal components. In this step, a feature matrix is formed, containing significant data variables or principal components that possess maximum information about the data, facilitating dimensionality reduction.', 'Rearranging the original data with the final principal components to represent the most significant information of the data set. 
The rearrangement of the original data with the final principal components represents the most significant information of the data set, leading to a reduced data set containing only the most important information.', "The dataset's dimension is reduced from 9000 features to 500 using PCA, resulting in a 10x simplification in computation. The initial dataset with around 9000 features is reduced to approximately 500 features, simplifying the computation process."]}
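The chapter summaries above walk through the PCA pipeline the video demonstrates: standardize the data, compute the covariance matrix, find its eigenvalues and eigenvectors, sort the eigenpairs in descending order of eigenvalue, keep the top components as a feature matrix, and project the data onto them. A minimal NumPy sketch of those steps, using a small random toy dataset in place of the video's ~9000-feature dataset (the shapes and variable names here are illustrative assumptions, not the video's code):

```python
import numpy as np

# Toy stand-in for a high-dimensional dataset: 100 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Step 1: standardization (zero mean, unit variance per feature).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix.
# eigh is appropriate because the covariance matrix is symmetric.
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort eigenpairs in descending order of eigenvalue, so the
# first principal component captures the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: form the feature matrix from the top-k eigenvectors and
# project (rearrange) the original data onto the principal components.
k = 2
W = eigvecs[:, :k]          # feature matrix of significant components
X_pca = X_std @ W           # reduced dataset: 100 samples, k features

# Fraction of total variance each principal component explains
# (the video quotes ~14% for PC1 + PC2 on its dataset).
explained_ratio = eigvals / eigvals.sum()
```

With `k = 500` on a 9000-feature dataset, the same projection step yields the dimension reduction described in the summary; scikit-learn's `PCA(n_components=k)` wraps this whole pipeline (minus the standardization) in one estimator.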