title
Data Science Full Course - 12 Hours | Data Science For Beginners | Data Science Tutorial | Edureka

description
🔥 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐂𝐞𝐫𝐭𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐂𝐨𝐮𝐫𝐬𝐞 (Use code: "𝐘𝐎𝐔𝐓𝐔𝐁𝐄𝟐𝟎"): https://www.edureka.co/data-science-python-certification-course This Edureka Data Science Full Course video will help you understand and learn Data Science Algorithms in detail. This Data Science Tutorial is ideal for both beginners and professionals who want to master Data Science Algorithms. Below are the topics covered in this Data Science Full Course tutorial: 00:00:00 Introduction to Data Science Full Course 00:00:45 Agenda of Data Science Full Course 00:02:59 What is Data Science 00:04:31 Data Science Basics 00:07:13 Walmart Use Cases 00:12:05 Who is a Data Scientist 00:13:47 Role of a Data Scientist 00:15:39 Technologies to Learn for a Data Scientist 00:44:16 Data Scientist Roadmap 01:00:55 Data Science Salary 01:09:44 Statistics and Probability 01:40:48 Use Case 01:50:43 Confusion Matrix 02:00:15 Probability 02:15:56 Bayes Theorem 02:22:22 Inferential Statistics 02:31:37 Use Case 02:42:20 What is Machine Learning 02:59:23 What is Regression in Machine Learning 03:08:05 Use Case 03:25:53 Logistic Regression 03:32:46 Use Case 04:14:25 Decision Tree Algorithm 04:15:01 What is Classification 04:19:12 Types of Classification 04:28:28 What is a Decision Tree 04:35:35 Decision Tree Terminologies 04:59:50 Random Forest 05:03:11 Working of Random Forest 05:04:08 Random Sampling with Replacement 05:12:15 Advantages of Random Forest 05:15:37 Hands-on Random Forest 05:27:01 KNN Algorithm 05:29:02 Features of KNN 05:37:17 How the KNN Algorithm Works 05:42:50 Hands-on KNN Algorithm 05:59:30 Naive Bayes Classifier 06:19:50 Support Vector Machine 06:45:16 K-Means Clustering Algorithm 07:05:27 Apriori Algorithm 07:21:14 Hands-on 07:35:08 Reinforcement Learning 07:38:25 Reinforcement Learning - Counter-Strike Example 07:56:36 Q-Learning 07:57:34 Defining a Problem Statement 08:01:37 The Bellman Equation 08:19:25 What is Deep Learning 08:23:24 Why Do We Need Artificial Neurons 08:33:55 What is TensorFlow 08:43:50 TensorFlow Code Basics 08:55:45 What is a Computational Graph 09:18:18 Limitations of a Single-Layer Perceptron 09:25:54 Multi-Layer Perceptron Use Case 09:57:08 Data Scientist Resume 10:02:33 Data Science Interview Questions & Answers 🔴 Subscribe to our channel to get video updates. 
Hit the subscribe button above: https://goo.gl/6ohpTV 🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐎𝐧𝐥𝐢𝐧𝐞 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐚𝐧𝐝 𝐂𝐞𝐫𝐭𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬 🔵 DevOps Online Training: http://bit.ly/3VkBRUT 🌕 AWS Online Training: http://bit.ly/3ADYwDY 🔵 React Online Training: http://bit.ly/3Vc4yDw 🌕 Tableau Online Training: http://bit.ly/3guTe6J 🔵 Power BI Online Training: http://bit.ly/3VntjMY 🌕 Selenium Online Training: http://bit.ly/3EVDtis 🔵 PMP Online Training: http://bit.ly/3XugO44 🌕 Salesforce Online Training: http://bit.ly/3OsAXDH 🔵 Cybersecurity Online Training: http://bit.ly/3tXgw8t 🌕 Java Online Training: http://bit.ly/3tRxghg 🔵 Big Data Online Training: http://bit.ly/3EvUqP5 🌕 RPA Online Training: http://bit.ly/3GFHKYB 🔵 Python Online Training: http://bit.ly/3Oubt8M 🌕 Azure Online Training: http://bit.ly/3i4P85F 🔵 GCP Online Training: http://bit.ly/3VkCzS3 🌕 Microservices Online Training: http://bit.ly/3gxYqqv 🔵 Data Science Online Training: http://bit.ly/3V3nLrc 🌕 CEHv12 Online Training: http://bit.ly/3Vhq8Hj 🔵 Angular Online Training: http://bit.ly/3EYcCTe 🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐑𝐨𝐥𝐞-𝐁𝐚𝐬𝐞𝐝 𝐂𝐨𝐮𝐫𝐬𝐞𝐬 🔵 DevOps Engineer Masters Program: http://bit.ly/3Oud9PC 🌕 Cloud Architect Masters Program: http://bit.ly/3OvueZy 🔵 Data Scientist Masters Program: http://bit.ly/3tUAOiT 🌕 Big Data Architect Masters Program: http://bit.ly/3tTWT0V 🔵 Machine Learning Engineer Masters Program: http://bit.ly/3AEq4c4 🌕 Business Intelligence Masters Program: http://bit.ly/3UZPqJz 🔵 Python Developer Masters Program: http://bit.ly/3EV6kDv 🌕 RPA Developer Masters Program: http://bit.ly/3OteYfP 🔵 Web Development Masters Program: http://bit.ly/3U9R5va 🌕 Computer Science Bootcamp Program : http://bit.ly/3UZxPBy 🔵 Cyber Security Masters Program: http://bit.ly/3U25rNR 🌕 Full Stack Developer Masters Program : http://bit.ly/3tWCE2S 🔵 Automation Testing Engineer Masters Program : http://bit.ly/3AGXg2J 🌕 Python Developer Masters Program : https://bit.ly/3EV6kDv 🔵 Azure Cloud Engineer Masters Program: http://bit.ly/3AEBHzH 🔴 𝐄𝐝𝐮𝐫𝐞𝐤𝐚 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐏𝐫𝐨𝐠𝐫𝐚𝐦𝐬 🌕 Professional Certificate Program in DevOps with Purdue University: https://bit.ly/3Ov52lT 🔵 Advanced Certificate Program in Data Science with E&ICT Academy, IIT Guwahati: http://bit.ly/3V7ffrh 📢📢 𝐓𝐨𝐩 𝟏𝟎 𝐓𝐫𝐞𝐧𝐝𝐢𝐧𝐠 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 𝐭𝐨 𝐋𝐞𝐚𝐫𝐧 𝐢𝐧 2023 𝐒𝐞𝐫𝐢𝐞𝐬 📢📢 ⏩ NEW Top 10 Technologies To Learn In 2023 - https://youtu.be/udD_GQVDt5g Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. Please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free) for more information.
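
The confusion matrix (01:50:43) and KNN (05:27:01, hands-on at 05:42:50) segments listed above are practical walkthroughs. As a rough, illustrative companion sketch rather than the exact notebook used in the video, a K-Nearest Neighbours classifier can be trained and scored with scikit-learn as shown below; the bundled iris dataset, the 70/30 split, and n_neighbors=5 are assumptions made for this example, so the reported accuracy will differ from the figures quoted in the course.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a small labelled dataset and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit KNN; k (n_neighbors) is a hyperparameter that would normally be tuned.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Evaluate with accuracy and a confusion matrix, the metrics the course covers.
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))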

detail
{'title': 'Data Science Full Course - 12 Hours | Data Science For Beginners | Data Science Tutorial | Edureka', 'heatmap': [{'end': 2053.263, 'start': 1226.5, 'weight': 0.905}, {'end': 4503.742, 'start': 4092.195, 'weight': 0.714}, {'end': 6143.434, 'start': 4905.114, 'weight': 0.939}, {'end': 7786.991, 'start': 7363.159, 'weight': 0.766}], 'summary': 'This 12-hour data science course for beginners covers fundamental concepts, applications, and technologies, including python and r libraries, big data technologies, statistical analysis, machine learning algorithms, and deep learning evolution. it also includes practical applications achieving specific accuracies, such as 99% accuracy with knn algorithm and 91.4% accuracy in handwritten digit classification using tensorflow model.', 'chapters': [{'end': 956.7, 'segs': [{'end': 78.489, 'src': 'embed', 'start': 48.039, 'weight': 0, 'content': [{'end': 51.162, 'text': "We'll begin this video by understanding what data science is.", 'start': 48.039, 'duration': 3.123}, {'end': 55.826, 'text': 'This section will cover all the data science fundamentals that you would need to know.', 'start': 51.742, 'duration': 4.084}, {'end': 60.371, 'text': 'Followed by this, we have the topic who is a data scientist?', 'start': 56.447, 'duration': 3.924}, {'end': 67.566, 'text': "Now, once we understand who a data scientist is, we'll then see the roadmap to becoming a data scientist.", 'start': 61.064, 'duration': 6.502}, {'end': 70.787, 'text': "We'll also see some salary statistics after that.", 'start': 67.986, 'duration': 2.801}, {'end': 74.328, 'text': "Next, we'll discuss the data science core concepts.", 'start': 71.207, 'duration': 3.121}, {'end': 78.489, 'text': "In this, we'll begin by seeing the data life cycle.", 'start': 74.928, 'duration': 3.561}], 'summary': 'Introduction to data science, including core concepts and salary statistics.', 'duration': 30.45, 'max_score': 48.039, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s48039.jpg'}, {'end': 260.522, 'src': 'embed', 'start': 234.615, 'weight': 2, 'content': [{'end': 242.679, 'text': 'And then making use of machine learning algorithms, Google Maps sends real traffic updates by way of colored lines on the traffic layers.', 'start': 234.615, 'duration': 8.064}, {'end': 250.684, 'text': 'This helps you find your optimal route and even determine which areas should be avoided due to roadwork or accidents.', 'start': 243.52, 'duration': 7.164}, {'end': 260.522, 'text': "Isn't that amazing? 
As data science continues to evolve, the demand for skilled professionals in this domain is also increasing drastically.", 'start': 251.497, 'duration': 9.025}], 'summary': 'Google maps uses machine learning for real traffic updates, increasing demand for data science professionals.', 'duration': 25.907, 'max_score': 234.615, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s234615.jpg'}, {'end': 460.744, 'src': 'embed', 'start': 431.135, 'weight': 1, 'content': [{'end': 437.141, 'text': "Now, before we get into the details of data science, let's see how Walmart uses data science to grow their business.", 'start': 431.135, 'duration': 6.006}, {'end': 444.35, 'text': "So guys, Walmart is the world's biggest retailer with over 20,000 stores in just 28 countries.", 'start': 437.924, 'duration': 6.426}, {'end': 452.758, 'text': "Okay, now it's currently building the world's biggest private cloud which will be able to process 2.5 petabytes of data every hour.", 'start': 444.85, 'duration': 7.908}, {'end': 460.744, 'text': "Now, the reason behind Walmart's success is how they use the customer data to get useful insights about customers' shopping patterns.", 'start': 453.198, 'duration': 7.546}], 'summary': 'Walmart, with 20,000+ stores, processes 2.5 petabytes of data per hour to gain insights for business growth.', 'duration': 29.609, 'max_score': 431.135, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s431135.jpg'}, {'end': 576.189, 'src': 'embed', 'start': 549.211, 'weight': 3, 'content': [{'end': 554.936, 'text': 'So what Walmart did was they placed all the strawberry pop-tarts as a checkout before a hurricane would occur.', 'start': 549.211, 'duration': 5.725}, {'end': 557.759, 'text': 'So this way they increase sales of their pop-tarts.', 'start': 555.296, 'duration': 2.463}, {'end': 559.52, 'text': 'Now guys, this is an actual thing.', 'start': 558.299, 'duration': 1.221}, {'end': 560.421, 'text': "I'm not making it up.", 'start': 559.58, 'duration': 0.841}, {'end': 562.082, 'text': 'You can look it up on the internet.', 'start': 560.661, 'duration': 1.421}, {'end': 569.829, 'text': 'Not only that, Walmart is analyzing the data generated by social media to find out all the trending products.', 'start': 562.743, 'duration': 7.086}, {'end': 576.189, 'text': "So through social media, you can find out the likes and dislikes of a person, right? So what Walmart did is they're quite smart.", 'start': 570.345, 'duration': 5.844}], 'summary': 'Walmart strategically placed pop-tarts at checkout before a hurricane to boost sales and uses social media data for trend analysis.', 'duration': 26.978, 'max_score': 549.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s549211.jpg'}], 'start': 7.401, 'title': 'Data science and its applications', 'summary': "Covers the fundamentals of data science, including salary statistics, core concepts, and various algorithms. 
it also explores real-world applications of data science in google maps and walmart, along with walmart's data-driven success stories.", 'chapters': [{'end': 144.828, 'start': 7.401, 'title': 'Data science full course', 'summary': 'Discusses the fundamentals of data science, the roadmap to becoming a data scientist, salary statistics, data science core concepts including statistics, probability, machine learning, and deep learning, along with algorithms such as linear regression, logistic regression, decision tree, random forest, knn, naive bayes classifier, support vector machine, k means clustering, a priori, reinforcement learning, and queue learning.', 'duration': 137.427, 'highlights': ['The chapter covers the fundamentals of data science and the roadmap to becoming a data scientist, along with salary statistics.', 'It discusses core concepts such as statistics, probability, and machine learning, and also delves into deep learning.', 'It includes detailed explanations of various algorithms in data science, including linear regression, logistic regression, decision tree, random forest, KNN, naive Bayes classifier, support vector machine, K means clustering, a priori, reinforcement learning, and queue learning.']}, {'end': 532.998, 'start': 145.444, 'title': 'Data science: insights and applications', 'summary': 'Explores the concept of data science, its applications in real-world scenarios such as google maps and walmart, the exponential growth of data generation through iot and social media, and the significance of extracting useful insights from data to drive business growth.', 'duration': 387.554, 'highlights': ["Walmart is building the world's biggest private cloud capable of processing 2.5 petabytes of data every hour to gain useful insights into customers' shopping patterns. Walmart's significant investment in data infrastructure highlights the importance of utilizing customer data to understand shopping patterns, leading to actionable insights.", 'Google Maps uses data science and machine learning algorithms to collect and analyze data from a multitude of reliable sources, providing real-time traffic updates to users. The use of machine learning algorithms in Google Maps exemplifies the practical application of data science in providing real-time solutions based on data analysis.', 'The IoT is estimated to generate over 500 zettabytes of data per year, highlighting the exponential growth of data generation through networked tools and devices communicating via the internet. The staggering volume of data generated by IoT underscores the need for advanced data processing and analysis to derive meaningful insights.', 'Social media platforms contribute significantly to data generation, with activities such as online transactions, streaming music, and video consumption adding to the vast pool of generated data. 
The impact of social media and online activities on data generation emphasizes the diverse sources contributing to the ever-increasing volume of available data for analysis and insights.']}, {'end': 956.7, 'start': 533.318, 'title': "Walmart's data-driven success", 'summary': "Highlights walmart's success in using data analysis to drive sales, such as identifying the association between hurricanes and strawberry pop-tarts, using social media data to introduce trending products, and how data science uncovers findings to make smart business decisions.", 'duration': 423.382, 'highlights': ['Walmart used data analysis to place strawberry pop-tarts at the checkout before hurricanes, increasing their sales. Walmart placed strawberry pop-tarts at the checkout before hurricanes, leading to increased sales.', 'Walmart analyzed social media data to identify trending products, such as introducing cake pops based on Facebook user interest. Walmart used social media data to introduce cake pops, driven by Facebook user interest.', 'Data scientists become detectives when exploring challenging questions, investigating leads, and understanding different data patterns for organizational improvement. Data scientists act as detectives, investigating leads and understanding data patterns for organizational improvement.']}], 'duration': 949.299, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7401.jpg', 'highlights': ['The chapter covers the fundamentals of data science and the roadmap to becoming a data scientist, along with salary statistics.', "Walmart is building the world's biggest private cloud capable of processing 2.5 petabytes of data every hour to gain useful insights into customers' shopping patterns.", 'Google Maps uses data science and machine learning algorithms to collect and analyze data from a multitude of reliable sources, providing real-time traffic updates to users.', 'Walmart used data analysis to place strawberry pop-tarts at the checkout before hurricanes, increasing their sales.']}, {'end': 2198.952, 'segs': [{'end': 1002.543, 'src': 'embed', 'start': 957.241, 'weight': 0, 'content': [{'end': 963.066, 'text': 'Another technology is database, knowledge of SQL and NoSQL and big data.', 'start': 957.241, 'duration': 5.825}, {'end': 966.369, 'text': 'Big data, the knowledge of Hadoop, Spark, any one of them is required.', 'start': 963.266, 'duration': 3.103}, {'end': 969.77, 'text': "So now let's cover these technologies in detail.", 'start': 966.949, 'duration': 2.821}, {'end': 972.791, 'text': 'Now moving on forward to the programming language.', 'start': 970.791, 'duration': 2}, {'end': 980.954, 'text': 'Now why a programming language is really required by a data scientist is to analyze huge volumes of data and also to perform analysis of the data.', 'start': 972.891, 'duration': 8.063}, {'end': 986.096, 'text': 'So programming language really aids in better analysis manipulation of the data.', 'start': 981.235, 'duration': 4.861}, {'end': 992.059, 'text': 'So programming language generally used are Python or Scala or Java, but generally preferred is Python,', 'start': 986.517, 'duration': 5.542}, {'end': 997.561, 'text': "and it's the most popular language which is now being used in the data science community.", 'start': 992.059, 'duration': 5.502}, {'end': 1002.543, 'text': 'r is also there, but on the production side it becomes a little tough, okay.', 'start': 997.561, 'duration': 4.982}], 'summary': 'Data scientists need 
knowledge of sql, nosql, hadoop, spark, and python for data analysis and manipulation.', 'duration': 45.302, 'max_score': 957.241, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s957241.jpg'}, {'end': 1051.299, 'src': 'embed', 'start': 1024.796, 'weight': 1, 'content': [{'end': 1029.26, 'text': 'This means that numerous functions and methods are available in pandas,', 'start': 1024.796, 'duration': 4.464}, {'end': 1034.364, 'text': 'which allows you to perform data analysis and manipulation of the data very quickly and very fast.', 'start': 1029.26, 'duration': 5.104}, {'end': 1038.186, 'text': 'And time series analysis is also possible with the help of pandas.', 'start': 1034.824, 'duration': 3.362}, {'end': 1046.074, 'text': "So the examples, like when you're forecasting, when you're forecasting weather predictions or stock prices predictions or houses predictions,", 'start': 1038.547, 'duration': 7.527}, {'end': 1050.198, 'text': 'then Pandas really help you know in data analysis and manipulation.', 'start': 1046.074, 'duration': 4.124}, {'end': 1051.299, 'text': "So that's an example of that.", 'start': 1050.218, 'duration': 1.081}], 'summary': 'Pandas offers various functions for quick data analysis and manipulation, including time series analysis for forecasting weather, stock prices, and housing predictions.', 'duration': 26.503, 'max_score': 1024.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1024796.jpg'}, {'end': 1143.32, 'src': 'embed', 'start': 1112.662, 'weight': 3, 'content': [{'end': 1116.728, 'text': 'That means that it derives functionality methods from Matplotlib,', 'start': 1112.662, 'duration': 4.066}, {'end': 1123.555, 'text': 'but offers more flexibility in terms of plotting the graphs and better representation in terms of color,', 'start': 1116.728, 'duration': 6.827}, {'end': 1127.116, 'text': 'and aesthetically beautiful plots can be plotted with the help of Seaborn.', 'start': 1123.555, 'duration': 3.561}, {'end': 1131.417, 'text': 'So now we are moving forward to another language which is R.', 'start': 1127.676, 'duration': 3.741}, {'end': 1136.518, 'text': 'So R is also preferred in the data science community and data scientists work with R.', 'start': 1131.417, 'duration': 5.101}, {'end': 1143.32, 'text': 'R has very strong statistical packages as it was developed by statisticians and R is really good for data analysis.', 'start': 1136.518, 'duration': 6.802}], 'summary': 'Seaborn offers flexibility and beautiful plots, while r is preferred for strong statistical packages and data analysis.', 'duration': 30.658, 'max_score': 1112.662, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1112662.jpg'}, {'end': 2053.263, 'src': 'heatmap', 'start': 1226.5, 'weight': 0.905, 'content': [{'end': 1232.366, 'text': 'All right for date and time analysis, and this really helps because it already has functions and methods predefined,', 'start': 1226.5, 'duration': 5.866}, {'end': 1238.491, 'text': 'which understands date and time analysis and manipulation can be done very easily with the help of UBREDATE.', 'start': 1232.366, 'duration': 6.125}, {'end': 1245.038, 'text': 'Now another library that we would cover is ggplot2 for data visualization in R.', 'start': 1239.072, 'duration': 5.966}, {'end': 1248.88, 'text': 'Now, this gg stands for grammar of graphics.', 'start': 1245.558, 'duration': 3.322}, 
{'end': 1252.402, 'text': 'Now, in R, ggplot2 is used to make the graphs and plots.', 'start': 1249.24, 'duration': 3.162}, {'end': 1255.263, 'text': 'And we work with layers in ggplot2.', 'start': 1252.842, 'duration': 2.421}, {'end': 1261.286, 'text': 'So with the help of these layers, we can plot a very complex plot with the help of ggplot2.', 'start': 1255.643, 'duration': 5.643}, {'end': 1267.39, 'text': 'And we can say that ggplot2 has very aesthetically appealing plots.', 'start': 1261.827, 'duration': 5.563}, {'end': 1273.133, 'text': 'And it gives us the flexibility to show our data in various structures and various formats.', 'start': 1267.63, 'duration': 5.503}, {'end': 1276.336, 'text': 'and we can always see what the data is saying.', 'start': 1273.693, 'duration': 2.643}, {'end': 1280.321, 'text': 'so we visualize the data with the help of ggplot2 in R.', 'start': 1276.336, 'duration': 3.985}, {'end': 1286.407, 'text': 'now i want to tell you something really very interesting, as why is data visualization important?', 'start': 1280.321, 'duration': 6.086}, {'end': 1288.91, 'text': 'and now let us just uncover this thing.', 'start': 1286.407, 'duration': 2.503}, {'end': 1295.414, 'text': 'The significance of data visualization can be understood with the help of the image here on the left hand side,', 'start': 1289.631, 'duration': 5.783}, {'end': 1297.536, 'text': "and this is known as Anscombe's quartet.", 'start': 1295.414, 'duration': 2.122}, {'end': 1304.54, 'text': 'Now Anscombe was an English mathematician and he developed this quartet to demonstrate the importance of data visualization.', 'start': 1298.036, 'duration': 6.504}, {'end': 1314.829, 'text': 'Now this comprises of four, as we can see, four data sets, and these data sets have the same or identical simple, descriptive statistics, right?', 'start': 1305.18, 'duration': 9.649}, {'end': 1319.714, 'text': 'But when they were being plotted, we get different graphs of each data set.', 'start': 1315.27, 'duration': 4.444}, {'end': 1327.721, 'text': 'Now this is the importance of visualization that the numbers and figures will appear same but when they are plotted they would be different.', 'start': 1320.334, 'duration': 7.387}, {'end': 1336.389, 'text': 'so, to get the real picture and the meaning of your data, your data needs to be plotted to get the true picture all right.', 'start': 1328.602, 'duration': 7.787}, {'end': 1344.957, 'text': "so it's always been said that numbers will not reveal the true picture but the, but the visualization of that data would definitely reveal it.", 'start': 1336.389, 'duration': 8.568}, {'end': 1350.762, 'text': "so this was the very important Anscombe's quartet, illustrating why we visualize data.", 'start': 1344.957, 'duration': 5.805}, {'end': 1351.983, 'text': 'why is it really important?', 'start': 1350.762, 'duration': 1.221}, {'end': 1357.988, 'text': 'So, moving on further, linked with this is data visualization tools that a data scientist uses.', 'start': 1352.483, 'duration': 5.505}, {'end': 1360.29, 'text': 'And one can learn these tools.', 'start': 1358.428, 'duration': 1.862}, {'end': 1361.491, 'text': 'It could be very handy.', 'start': 1360.53, 'duration': 0.961}, {'end': 1366.856, 'text': 'So a few visualization tools or softwares are Power BI, Tableau.', 'start': 1361.932, 'duration': 4.924}, {'end': 1371.16, 'text': 'Now, these tools really help you to visualize large datasets.', 'start': 1367.356, 'duration': 3.804}, {'end': 1376.741, 'text': 'And these powerful 
softwares, they can connect with multiple data sources in no time.', 'start': 1371.72, 'duration': 5.021}, {'end': 1384.384, 'text': 'You can build dashboards, interactive dashboards, powerful visualization, and even you can go on creating applications with these tools.', 'start': 1377.002, 'duration': 7.382}, {'end': 1390.125, 'text': 'So business intelligence tools knowledge is one of the requirements, but it is not that necessary.', 'start': 1384.924, 'duration': 5.201}, {'end': 1392.506, 'text': "It's easy to learn and it's fun too.", 'start': 1390.265, 'duration': 2.241}, {'end': 1398.728, 'text': 'Okay, now moving on to another concept after programming language, and that is machine learning.', 'start': 1393.326, 'duration': 5.402}, {'end': 1406.071, 'text': 'Now machine learning, this is the ability to understand algorithms, both supervised and unsupervised machine learning algorithms.', 'start': 1399.388, 'duration': 6.683}, {'end': 1411.213, 'text': 'So now we will discuss supervised and unsupervised machine learning algorithms.', 'start': 1406.531, 'duration': 4.682}, {'end': 1412.673, 'text': 'So supervised is something.', 'start': 1411.613, 'duration': 1.06}, {'end': 1414.234, 'text': 'when we work with labeled data,', 'start': 1412.673, 'duration': 1.561}, {'end': 1420.34, 'text': "That means the output is known and unsupervised is when we don't know the output and the algorithm has to make a guess.", 'start': 1414.574, 'duration': 5.766}, {'end': 1425.885, 'text': 'Okay, what is this and which category or which class cluster will the data fall into?', 'start': 1420.42, 'duration': 5.465}, {'end': 1434.593, 'text': 'So, primarily we deal with supervised machine learning and there are two types of supervised machine learning, that is, classification and regression.', 'start': 1426.305, 'duration': 8.288}, {'end': 1440.44, 'text': 'and in unsupervised machine learning we will see dimensionality reduction and then various algorithms like clustering,', 'start': 1434.593, 'duration': 5.847}, {'end': 1445.325, 'text': 'and in clustering there are so many algorithms like hierarchical clustering or k-means and so on.', 'start': 1440.44, 'duration': 4.885}, {'end': 1454.175, 'text': 'So a data scientist has to know about the algorithms, different kinds of algorithms that can be applied to a data to give good results.', 'start': 1445.846, 'duration': 8.329}, {'end': 1458.339, 'text': "That's why it's a little vast, but it is really very interesting.", 'start': 1454.576, 'duration': 3.763}, {'end': 1463.402, 'text': 'So various algorithms like SVM, Naive Bayes, Random Forest, K-Means.', 'start': 1458.659, 'duration': 4.743}, {'end': 1467.805, 'text': 'good command over the supervised and unsupervised machine learning is required by a data scientist.', 'start': 1463.402, 'duration': 4.403}, {'end': 1469.746, 'text': 'Hyperparameter tuning.', 'start': 1468.485, 'duration': 1.261}, {'end': 1475.97, 'text': 'Now this is again important because hyperparameters are nothing but they control the behavior of an algorithm.', 'start': 1469.966, 'duration': 6.004}, {'end': 1483.715, 'text': 'So you can say that if hyperparameter tuning is being done or performed on any algorithm, that is done to improve the results.', 'start': 1476.17, 'duration': 7.545}, {'end': 1486.836, 'text': 'So for model optimization this is required.', 'start': 1484.295, 'duration': 2.541}, {'end': 1494.558, 'text': 'So to improve the predictions by tweaking a few parameters via hyperparameter tuning that is what a data 
scientist also does.', 'start': 1487.316, 'duration': 7.242}, {'end': 1496.638, 'text': 'So to learn machine learning.', 'start': 1494.838, 'duration': 1.8}, {'end': 1500.039, 'text': 'moving on forward now, to learn machine learning in Python.', 'start': 1496.638, 'duration': 3.401}, {'end': 1505.8, 'text': 'what libraries are required is that scikit-learn is the package for machine learning in Python.', 'start': 1500.039, 'duration': 5.761}, {'end': 1511.922, 'text': 'So the scikit-learn is the machine learning library and this supports both supervised and unsupervised learning.', 'start': 1506.321, 'duration': 5.601}, {'end': 1519.989, 'text': 'And it also provides various tools for model fitting, data pre-processing, model selection, model evaluation, and other utilities.', 'start': 1512.302, 'duration': 7.687}, {'end': 1526.534, 'text': "So if you're making any model or you're importing any of the algorithms, scikit-learn package would be used.", 'start': 1520.349, 'duration': 6.185}, {'end': 1529.757, 'text': 'Okay, now moving on further to R.', 'start': 1527.194, 'duration': 2.563}, {'end': 1535.762, 'text': 'So in R, if one wants to build machine learning models, so MLR is the package or caret is the package.', 'start': 1529.757, 'duration': 6.005}, {'end': 1541.086, 'text': 'So here MLR has all the important and useful algorithms to perform machine learning tasks.', 'start': 1536.182, 'duration': 4.904}, {'end': 1544.767, 'text': 'and caret stands for Classification And REgression Training.', 'start': 1541.505, 'duration': 3.262}, {'end': 1550.15, 'text': 'So with caret one can process and streamline the model building and evaluation process.', 'start': 1545.507, 'duration': 4.643}, {'end': 1556.634, 'text': 'Also feature selection could be done and other techniques can also be applied with the help of these libraries.', 'start': 1550.951, 'duration': 5.683}, {'end': 1564.258, 'text': 'These are very strong and powerful libraries which have consolidated packages and you just need to import it in your project and use it.', 'start': 1556.654, 'duration': 7.604}, {'end': 1568.961, 'text': 'But the knowledge of how to use it, where to use it is what the data scientist knows.', 'start': 1564.819, 'duration': 4.142}, {'end': 1575.558, 'text': 'So now moving on further to another technique and technology that is deep learning techniques.', 'start': 1570.475, 'duration': 5.083}, {'end': 1582.342, 'text': 'So now, when we want to work with complex datasets, when the dataset size is huge,', 'start': 1576.198, 'duration': 6.144}, {'end': 1585.583, 'text': 'then traditional machine learning algorithms they are not preferred.', 'start': 1582.342, 'duration': 3.241}, {'end': 1588.025, 'text': 'And generally we move on to deep learning.', 'start': 1586.004, 'duration': 2.021}, {'end': 1591.626, 'text': 'So gradually from machine learning, one moves to deep learning.', 'start': 1588.405, 'duration': 3.221}, {'end': 1592.946, 'text': 'That is a gradual shift.', 'start': 1591.866, 'duration': 1.08}, {'end': 1598.008, 'text': 'So when we talk about deep learning, what comes to our mind is neural networks.', 'start': 1593.367, 'duration': 4.641}, {'end': 1604.51, 'text': "So when you're working with various kinds of layers and neural networks, then deep learning is preferred.", 'start': 1598.128, 'duration': 6.382}, {'end': 1613.233, 'text': 'And neural networks or neurons, they are a set of algorithms which are modeled loosely after the human brain to recognize patterns.', 'start': 1604.63, 
'duration': 8.603}, {'end': 1619.795, 'text': 'So just as to make it the system more intelligent by just understanding the patterns and predicting.', 'start': 1613.773, 'duration': 6.022}, {'end': 1623.735, 'text': 'So deep learning is more advanced than machine learning I can say.', 'start': 1620.215, 'duration': 3.52}, {'end': 1632.938, 'text': 'So a few deep learning algorithms are convolution neural networks, RNN, that is, recurrent neural networks, or LSTMs, which is long,', 'start': 1624.356, 'duration': 8.582}, {'end': 1638.479, 'text': 'short-term memory network, and there are many other deep learning techniques or algorithms.', 'start': 1632.938, 'duration': 5.541}, {'end': 1647.138, 'text': 'So these algorithms they help to solve complex problems like image classification or text to speech or language translation.', 'start': 1638.539, 'duration': 8.599}, {'end': 1656.363, 'text': 'So these are really very powerful algorithms which are deployed to solve real world challenges and how to really make life simple.', 'start': 1647.458, 'duration': 8.905}, {'end': 1660.745, 'text': 'So now moving on forward to natural language processing.', 'start': 1657.143, 'duration': 3.602}, {'end': 1667.43, 'text': 'And what if I say that we all have sometime or the other had experienced with NLP and we know the end result.', 'start': 1661.485, 'duration': 5.945}, {'end': 1672.694, 'text': "How? Remember Siri and Alexa? Yes, I'm talking about natural language processing.", 'start': 1667.57, 'duration': 5.124}, {'end': 1675.476, 'text': 'And this is a subfield of artificial intelligence.', 'start': 1673.054, 'duration': 2.422}, {'end': 1679.399, 'text': 'And this is concerned with interaction between machines and humans.', 'start': 1675.936, 'duration': 3.463}, {'end': 1686.224, 'text': 'So it is used for speech recognition, reading text, voicemail, virtual assistants like Alexa or Siri.', 'start': 1679.799, 'duration': 6.425}, {'end': 1689.107, 'text': 'And these are also the examples of NLP.', 'start': 1686.845, 'duration': 2.262}, {'end': 1695.668, 'text': 'So for Python, the deep learning libraries are MXNetR, DeepNet, DeepR, H2O.', 'start': 1689.966, 'duration': 5.702}, {'end': 1704.432, 'text': 'So talking about MXNetR, it is used for feed-forward neural network, convolution neural network, whereas H2O can also be used for the same.', 'start': 1695.969, 'duration': 8.463}, {'end': 1707.453, 'text': 'But H2O is for deep autoencoders.', 'start': 1704.912, 'duration': 2.541}, {'end': 1713.776, 'text': 'So when we move on to deep learning, there are more sophisticated terms and technologies that needs to be learned.', 'start': 1707.954, 'duration': 5.822}, {'end': 1719.203, 'text': 'and to puff to improve the model performance, one needs to understand the working of algorithm,', 'start': 1714.417, 'duration': 4.786}, {'end': 1724.911, 'text': 'the limitation of these algorithms and how one can really improve upon the model by creating over these models.', 'start': 1719.203, 'duration': 5.708}, {'end': 1729.578, 'text': 'So now moving on forward from machine learning deep learning.', 'start': 1725.452, 'duration': 4.126}, {'end': 1733.16, 'text': 'and another tools and technique, and that is statistics.', 'start': 1729.998, 'duration': 3.162}, {'end': 1735.702, 'text': 'so yet this is yet another important tool.', 'start': 1733.16, 'duration': 2.542}, {'end': 1738.805, 'text': 'now, how does a data scientist use statistics?', 'start': 1735.702, 'duration': 3.103}, {'end': 1741.707, 'text': 'so it 
has been used to identify the right questions.', 'start': 1738.805, 'duration': 2.902}, {'end': 1750.894, 'text': 'so data scientists also use statistics to apply it to the data and ask the right questions, and once they know what the question is,', 'start': 1741.707, 'duration': 9.187}, {'end': 1753.295, 'text': 'they can use statistics to find answers.', 'start': 1750.894, 'duration': 2.401}, {'end': 1761.462, 'text': 'so statistics also help them to understand how reliable their results are and how likely is it that the findings can be changed.', 'start': 1753.295, 'duration': 8.167}, {'end': 1767.327, 'text': 'Now moving on further, the statistics help data scientists to do predictive modeling.', 'start': 1761.882, 'duration': 5.445}, {'end': 1774.554, 'text': 'This means that using the past data to create models and that can be used for predicting the future events.', 'start': 1767.488, 'duration': 7.066}, {'end': 1776.036, 'text': 'That is predictive modeling.', 'start': 1774.854, 'duration': 1.182}, {'end': 1786.998, 'text': 'Now another importance of statistics is that it will help a data scientist to design and interpret experiments to make informed decisions.', 'start': 1776.636, 'duration': 10.362}, {'end': 1789.899, 'text': 'So now let us understand this with the help of an example.', 'start': 1787.458, 'duration': 2.441}, {'end': 1798.381, 'text': 'An observation was made that an advertisement X, suppose an advertisement X has a seven percent higher click-through rate than an advertisement B.', 'start': 1790.619, 'duration': 7.762}, {'end': 1807.913, 'text': 'So a data scientist would determine whether or not this difference is significant enough to warrant any increased attention, focus and investment.', 'start': 1799.241, 'duration': 8.672}, {'end': 1816.902, 'text': 'Now, this statistics would help us to experiment and design the frequency statistics or hypothesis testing and confidence intervals.', 'start': 1808.835, 'duration': 8.067}, {'end': 1825.125, 'text': 'So data scientists would work with these tools like frequency statistics and hypothesis testing to understand whether this is really important or not,', 'start': 1817.282, 'duration': 7.843}, {'end': 1832.567, 'text': 'whether advertisement X requires more attention or advertisement B requires more attention, focus and investment or not.', 'start': 1825.125, 'duration': 7.442}, {'end': 1838.369, 'text': 'So statistics helps to determine these important and crucial decisions.', 'start': 1833.007, 'duration': 5.362}, {'end': 1843.332, 'text': 'All right, so now another use of statistics is to estimate intelligently.', 'start': 1839.471, 'duration': 3.861}, {'end': 1845.993, 'text': 'So data scientists estimate intelligently.', 'start': 1843.773, 'duration': 2.22}, {'end': 1848.134, 'text': 'That means by using Bayesian theorem.', 'start': 1846.033, 'duration': 2.101}, {'end': 1851.275, 'text': 'There are various theorems which help to estimate intelligently,', 'start': 1848.514, 'duration': 2.761}, {'end': 1858.458, 'text': 'because a Bayesian theorem takes the result from the past experiences and observations and then make predictions.', 'start': 1851.275, 'duration': 7.183}, {'end': 1861.199, 'text': 'So this is estimating intelligently.', 'start': 1858.498, 'duration': 2.701}, {'end': 1867.381, 'text': 'And they can also summarize what estimates are and mean with the help of Bayesian theorem.', 'start': 1861.959, 'duration': 5.422}, {'end': 1872.947, 'text': 'So now statistics equip a data scientist to solve 
problems and make data-driven decisions.', 'start': 1868.121, 'duration': 4.826}, {'end': 1878.834, 'text': 'And now moving on forward to most common statistical methods and this is descriptive statistics.', 'start': 1873.548, 'duration': 5.286}, {'end': 1880.837, 'text': 'So we talked about descriptive statistics.', 'start': 1879.115, 'duration': 1.722}, {'end': 1883.54, 'text': 'Now let us understand what is descriptive statistics.', 'start': 1880.897, 'duration': 2.643}, {'end': 1893.257, 'text': 'So descriptive means it is a summary statistics that quantitatively summarizes features and some measures include.', 'start': 1884.451, 'duration': 8.806}, {'end': 1899.682, 'text': 'these are the central tendency measures which are being listed on the right hand side, like mean, median and mode.', 'start': 1893.257, 'duration': 6.425}, {'end': 1911.235, 'text': 'and on the left hand side it is skewness and variability which are measures of standard deviations and variance is nothing but knowledge of like how much is the spread of the data?', 'start': 1900.282, 'duration': 10.953}, {'end': 1919.644, 'text': 'right skewness is where the data is inclined, whether it is right towards the right or left, and determining the normal distribution of the data.', 'start': 1911.235, 'duration': 8.409}, {'end': 1920.986, 'text': 'we understand that.', 'start': 1919.644, 'duration': 1.342}, {'end': 1922.727, 'text': 'are there any outliers in the data?', 'start': 1920.986, 'duration': 1.741}, {'end': 1925.51, 'text': 'what how the data really is spread?', 'start': 1922.727, 'duration': 2.783}, {'end': 1932.237, 'text': 'so these descriptive statistics they help us from, through the numbers that they give, we understand what the data is.', 'start': 1925.51, 'duration': 6.727}, {'end': 1941.984, 'text': 'Now, moving on further to understanding what is inferential statistics, or data scientists must also be able to use inferential statistics,', 'start': 1932.978, 'duration': 9.006}, {'end': 1951.649, 'text': 'because here inferential means we are taking a small sample from the entire data and we are making generalization about the entire data.', 'start': 1941.984, 'duration': 9.665}, {'end': 1960.915, 'text': 'So this is inferential statistics inferencing something from a small data assuming that this is this will also apply to a large data.', 'start': 1952.03, 'duration': 8.885}, {'end': 1966.498, 'text': 'okay. 
so we make inferences from a sample about the population.', 'start': 1961.673, 'duration': 4.825}, {'end': 1970.962, 'text': 'now the methods for this includes hypothesis testing, probability.', 'start': 1966.498, 'duration': 4.464}, {'end': 1974.906, 'text': 'now probability is used to predict the likelihood of an event.', 'start': 1970.962, 'duration': 3.944}, {'end': 1977.509, 'text': 'regression analysis is yet another method.', 'start': 1974.906, 'duration': 2.603}, {'end': 1985.916, 'text': 'so regression, what we do is we model the relationship between variables, like ANOVA, is yet another, which is analysis of variance.', 'start': 1977.509, 'duration': 8.407}, {'end': 1993.098, 'text': 'This is a test to compare the means of different groups and how different they are and based on the results we take decisions.', 'start': 1986.096, 'duration': 7.002}, {'end': 1995.179, 'text': 'Another method is chi-square,', 'start': 1993.718, 'duration': 1.461}, {'end': 2003.241, 'text': 'and this chi-square is used to determine any relationship between variables or compare the result between an expected and an observed values.', 'start': 1995.179, 'duration': 8.062}, {'end': 2010.383, 'text': 'So this is inferential statistics and now we will move on to another technique and that is from statistics to databases.', 'start': 2003.861, 'duration': 6.522}, {'end': 2013.086, 'text': "So let's understand what our database is.", 'start': 2010.984, 'duration': 2.102}, {'end': 2018.453, 'text': 'So now data has to be stored before analyzing or making predictions.', 'start': 2013.707, 'duration': 4.746}, {'end': 2020.535, 'text': 'And to store the data, we have databases.', 'start': 2018.553, 'duration': 1.982}, {'end': 2025.1, 'text': 'Thus, the knowledge of database again becomes important for a data scientist.', 'start': 2020.815, 'duration': 4.285}, {'end': 2032.187, 'text': 'So every time a data scientist would have to retrieve a data He has to look upon a person who could help him retrieve the data.', 'start': 2025.721, 'duration': 6.466}, {'end': 2037.191, 'text': 'So it is better that a data scientist should be equipped and should possess this knowledge,', 'start': 2032.568, 'duration': 4.623}, {'end': 2044.316, 'text': 'so that whenever the need arises to access the data or store the data data, scientists can do it by himself right?', 'start': 2037.191, 'duration': 7.125}, {'end': 2046.898, 'text': 'So that is why database knowledge is required.', 'start': 2044.616, 'duration': 2.282}, {'end': 2053.263, 'text': "So now coming to the types of databases, that is relational database was we'll be talking about.", 'start': 2047.558, 'duration': 5.705}], 'summary': 'Date and time analysis methods are used for visualization in r with ggplot2. data visualization importance is demonstrated with the anscombe quadrant. machine learning types, algorithms, and tools like scikit-learn and mlr are discussed. 
deep learning, natural language processing, statistics, and databases are also covered.', 'duration': 826.763, 'max_score': 1226.5, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1226500.jpg'}, {'end': 1360.29, 'src': 'embed', 'start': 1336.389, 'weight': 4, 'content': [{'end': 1344.957, 'text': "so it's always been said that numbers will not reveal the true picture but the, but the visualization of that data would definitely reveal it.", 'start': 1336.389, 'duration': 8.568}, {'end': 1350.762, 'text': 'so this was a very important enscombe quarter to visualize that rest data visualization.', 'start': 1344.957, 'duration': 5.805}, {'end': 1351.983, 'text': 'why is it really important?', 'start': 1350.762, 'duration': 1.221}, {'end': 1357.988, 'text': 'So, moving on further, linked with this is data visualization tools that a data scientist uses.', 'start': 1352.483, 'duration': 5.505}, {'end': 1360.29, 'text': 'And one can learn these tools.', 'start': 1358.428, 'duration': 1.862}], 'summary': 'Data visualization tools are important for data scientists to reveal the true picture of the data, and they can be learned.', 'duration': 23.901, 'max_score': 1336.389, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1336389.jpg'}, {'end': 1486.836, 'src': 'embed', 'start': 1445.846, 'weight': 5, 'content': [{'end': 1454.175, 'text': 'So a data scientist has to know about the algorithms, different kinds of algorithms that can be applied to a data to give good results.', 'start': 1445.846, 'duration': 8.329}, {'end': 1458.339, 'text': "That's why it's a little vast But it is really very interesting.", 'start': 1454.576, 'duration': 3.763}, {'end': 1463.402, 'text': 'So various algorithms like SVM, NavVise, Random Forest, K-Means.', 'start': 1458.659, 'duration': 4.743}, {'end': 1467.805, 'text': 'good command over the supervised and unsupervised machine learning is required by a data scientist.', 'start': 1463.402, 'duration': 4.403}, {'end': 1469.746, 'text': 'Hyperparameter tuning.', 'start': 1468.485, 'duration': 1.261}, {'end': 1475.97, 'text': 'Now this is again important because hyperparameters are nothing but they control the behavior of an algorithm.', 'start': 1469.966, 'duration': 6.004}, {'end': 1483.715, 'text': 'So you can say that if hyperparameter tuning is being done or performed on any algorithm, that is done to improve the results.', 'start': 1476.17, 'duration': 7.545}, {'end': 1486.836, 'text': 'So for model optimization this is required.', 'start': 1484.295, 'duration': 2.541}], 'summary': 'Data scientist needs knowledge of algorithms, including svm, navvise, random forest, k-means. 
requires command over supervised and unsupervised machine learning, including hyperparameter tuning for model optimization.', 'duration': 40.99, 'max_score': 1445.846, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1445846.jpg'}, {'end': 1535.762, 'src': 'embed', 'start': 1506.321, 'weight': 7, 'content': [{'end': 1511.922, 'text': 'So the scikit-learn is the machine learning library and this supports both supervised and unsupervised learning.', 'start': 1506.321, 'duration': 5.601}, {'end': 1519.989, 'text': 'And it also provides various tools for model fitting, data pre-processing, model selection, model evaluation, and other utilities.', 'start': 1512.302, 'duration': 7.687}, {'end': 1526.534, 'text': "So if you're making any model or you're importing any of the algorithms, scikit-learn package would be used.", 'start': 1520.349, 'duration': 6.185}, {'end': 1529.757, 'text': 'Okay, now moving on further to R.', 'start': 1527.194, 'duration': 2.563}, {'end': 1535.762, 'text': 'So in R, if one wants to build machine learning models, so MLR is the package or carrot is the package.', 'start': 1529.757, 'duration': 6.005}], 'summary': 'Scikit-learn supports supervised and unsupervised learning, provides tools for model fitting, data pre-processing, model selection, model evaluation, and other utilities.', 'duration': 29.441, 'max_score': 1506.321, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1506321.jpg'}, {'end': 1598.008, 'src': 'embed', 'start': 1570.475, 'weight': 8, 'content': [{'end': 1575.558, 'text': 'So now moving on further to another technique and technology that is deep learning techniques.', 'start': 1570.475, 'duration': 5.083}, {'end': 1582.342, 'text': 'So now, when we want to work with complex datasets, when the dataset size is huge,', 'start': 1576.198, 'duration': 6.144}, {'end': 1585.583, 'text': 'then traditional machine learning algorithms they are not preferred.', 'start': 1582.342, 'duration': 3.241}, {'end': 1588.025, 'text': 'And generally we move on to deep learning.', 'start': 1586.004, 'duration': 2.021}, {'end': 1591.626, 'text': 'So gradually from machine learning, one moves to deep learning.', 'start': 1588.405, 'duration': 3.221}, {'end': 1592.946, 'text': 'That is a gradual shift.', 'start': 1591.866, 'duration': 1.08}, {'end': 1598.008, 'text': 'So when we talk about deep learning, what comes to our mind is neural networks.', 'start': 1593.367, 'duration': 4.641}], 'summary': 'Deep learning is preferred for complex datasets with huge sizes, leading to a gradual shift from traditional machine learning to neural networks.', 'duration': 27.533, 'max_score': 1570.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1570475.jpg'}, {'end': 1776.036, 'src': 'embed', 'start': 1729.998, 'weight': 9, 'content': [{'end': 1733.16, 'text': 'and another tools and technique, and that is statistics.', 'start': 1729.998, 'duration': 3.162}, {'end': 1735.702, 'text': 'so yet this is yet another important tool.', 'start': 1733.16, 'duration': 2.542}, {'end': 1738.805, 'text': 'now, how does a data scientist use statistics?', 'start': 1735.702, 'duration': 3.103}, {'end': 1741.707, 'text': 'so it has been used to identify the right questions.', 'start': 1738.805, 'duration': 2.902}, {'end': 1750.894, 'text': 'so data scientists also use statistics to apply it to the data and ask the right questions, and 
once they know what the question is,', 'start': 1741.707, 'duration': 9.187}, {'end': 1753.295, 'text': 'they can use statistics to find answers.', 'start': 1750.894, 'duration': 2.401}, {'end': 1761.462, 'text': 'so statistics also help them to understand how reliable their results are and how likely is it that the findings can be changed.', 'start': 1753.295, 'duration': 8.167}, {'end': 1767.327, 'text': 'Now moving on further, the statistics help data scientists to do predictive modeling.', 'start': 1761.882, 'duration': 5.445}, {'end': 1774.554, 'text': 'This means that using the past data to create models and that can be used for predicting the future events.', 'start': 1767.488, 'duration': 7.066}, {'end': 1776.036, 'text': 'That is predictive modeling.', 'start': 1774.854, 'duration': 1.182}], 'summary': 'Statistics is an important tool for data scientists, used to identify questions, find answers, and do predictive modeling.', 'duration': 46.038, 'max_score': 1729.998, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s1729998.jpg'}, {'end': 2161.177, 'src': 'embed', 'start': 2137.342, 'weight': 11, 'content': [{'end': 2146.227, 'text': 'So, when you are analyzing huge volume of data from social network sites, these non-relational database really help you to understand, analyze,', 'start': 2137.342, 'duration': 8.885}, {'end': 2148.928, 'text': 'store the data and find some relation between them.', 'start': 2146.227, 'duration': 2.701}, {'end': 2154.674, 'text': 'So that is why it is really important to have the knowledge of these databases, NoSQL databases.', 'start': 2149.551, 'duration': 5.123}, {'end': 2161.177, 'text': 'And MongoDB is optimized to store the documents, right? So there are various non-relational databases.', 'start': 2155.214, 'duration': 5.963}], 'summary': 'Non-relational databases like mongodb optimize storage of documents for analyzing social network data.', 'duration': 23.835, 'max_score': 2137.342, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2137342.jpg'}, {'end': 2208.639, 'src': 'embed', 'start': 2178.794, 'weight': 12, 'content': [{'end': 2186.363, 'text': 'what happens here is that we are dealing with large and more complex data, set day by day, and companies are dealing with huge volumes of data,', 'start': 2178.794, 'duration': 7.569}, {'end': 2188.484, 'text': 'especially from new data sources.', 'start': 2186.903, 'duration': 1.581}, {'end': 2194.629, 'text': "So these data sets they're so voluminous that traditional data processing softwares cannot manage them.", 'start': 2188.504, 'duration': 6.125}, {'end': 2198.952, 'text': "That's why the big data technology is really helpful and big data.", 'start': 2195.029, 'duration': 3.923}, {'end': 2208.639, 'text': 'it is a collection of data from diverse data sources and it has certain characteristics like volume, variety, velocity and veracity.', 'start': 2198.952, 'duration': 9.687}], 'summary': 'Big data technology helps manage large, diverse data sets from new sources, which traditional software cannot handle.', 'duration': 29.845, 'max_score': 2178.794, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2178794.jpg'}], 'start': 957.241, 'title': 'Data science technologies, python and r libraries, techniques, and statistics', 'summary': 'Covers essential data science technologies and programming languages such as python, scala, and 
java, emphasizing python and its libraries like pandas, foundational libraries for data analysis and visualization including pandas, numpy, seaborn, matplotlib, and r libraries, essential tools and techniques in data science covering machine learning libraries like scikit-learn and mlr, and the significance of statistics in predictive modeling, experimental design, and problem-solving, along with the importance of databases and big data technology.', 'chapters': [{'end': 1046.074, 'start': 957.241, 'title': 'Data science technologies and programming languages', 'summary': 'Covers the essential technologies and programming languages for data science, including database, big data (hadoop, spark), and programming languages such as python, scala, and java, with a focus on the importance of python and its libraries like pandas for data analysis and manipulation.', 'duration': 88.833, 'highlights': ['Python is the most popular language used in the data science community, preferred for analyzing huge volumes of data and performing data analysis (relevance score: 5)', 'Pandas library in Python is crucial for data analysis and manipulation, providing fast and flexible data structures that enable quick data analysis and manipulation, including time series analysis (relevance score: 4)', 'Knowledge of SQL, NoSQL, Hadoop, and Spark is essential for data scientists (relevance score: 3)', 'Forecasting weather predictions, stock prices predictions, and houses predictions are examples of time series analysis (relevance score: 2)', 'R language is also used but may be challenging on the production side (relevance score: 1)']}, {'end': 1486.836, 'start': 1046.074, 'title': 'Python and r libraries for data analysis and visualization', 'summary': 'Discusses the foundational libraries for data analysis and visualization, emphasizing the significance of pandas, numpy, seaborn, matplotlib, r, dplyr, genitor, lubridate, ggplot2, and the importance of data visualization and machine learning algorithms.', 'duration': 440.762, 'highlights': ['The significance of Pandas, NumPy, Seaborn, Matplotlib, R, dplyr, genitor, Lubridate, ggplot2, and their role in data analysis and visualization. These libraries serve as foundational tools for data analysis and visualization, playing a crucial role in scientific computations and data manipulation.', 'The importance of data visualization and how numbers and figures can appear the same but result in different graphs, emphasizing the need for visualization to understand the true picture of the data. Data visualization is essential to understand the true meaning of data, as numbers and figures may appear similar but can result in different graphs, highlighting the importance of visualization.', 'The significance of machine learning algorithms, including supervised and unsupervised algorithms, and the requirement for a data scientist to have a good command over algorithms like SVM, NavVise, Random Forest, and K-Means. A data scientist needs a good command over algorithms like SVM, NavVise, Random Forest, and K-Means, and an understanding of supervised and unsupervised machine learning for achieving good results in data analysis.', 'The role of hyperparameter tuning in controlling the behavior of machine learning algorithms and its importance in improving results for model optimization. 
Hyperparameter tuning is crucial for improving results and controlling the behavior of machine learning algorithms, playing a significant role in model optimization.']}, {'end': 1753.295, 'start': 1487.316, 'title': 'Data science techniques and tools', 'summary': 'Discusses the essential tools and techniques in data science, covering machine learning libraries like scikit-learn and mlr, the transition from machine learning to deep learning, and the role of statistics in asking and answering questions with data.', 'duration': 265.979, 'highlights': ['The scikit-learn library in Python supports both supervised and unsupervised learning, and provides various tools for model fitting, data pre-processing, model selection, and evaluation. Scikit-learn is a comprehensive machine learning library in Python, offering support for supervised and unsupervised learning, with tools for model fitting, pre-processing, selection, and evaluation.', 'Deep learning techniques, such as neural networks, are preferred for working with complex datasets, and are used to solve problems like image classification, text-to-speech, and language translation. Deep learning techniques, including neural networks, are preferred for complex datasets and problem-solving, such as image classification and language translation.', 'Natural Language Processing (NLP) is a subfield of artificial intelligence used for speech recognition, reading text, and virtual assistants like Siri and Alexa, with libraries like MXNetR and H2O for deep learning. NLP, a subfield of AI, involves tasks like speech recognition and virtual assistants, supported by libraries like MXNetR and H2O for deep learning.', 'Statistics are essential for data scientists to identify and ask the right questions, and to apply statistical methods to find answers within the data. 
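To make the scikit-learn tooling and the hyperparameter tuning mentioned above concrete, here is a small sketch using the library's built-in iris dataset; the model choice and parameter grid are illustrative assumptions, not the course's own demo.

```python
# A small sketch of scikit-learn model fitting, evaluation and hyperparameter
# tuning with GridSearchCV; dataset and parameter grid are only illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hyperparameters control the behaviour of the algorithm; grid search tries
# each combination with cross-validation and keeps the best one.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", accuracy_score(y_test, search.predict(X_test)))
```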
Statistics are crucial for data scientists to identify and ask the right questions, and to use statistical methods to find answers within the data.']}, {'end': 2198.952, 'start': 1753.295, 'title': 'Importance of statistics and databases in data science', 'summary': 'Highlights the significance of statistics in data science, including its role in predictive modeling, experimental design, intelligent estimation, and problem-solving, as well as the importance of databases and big data technology in handling large and complex data sets.', 'duration': 445.657, 'highlights': ['importance_of_statistics Statistics plays a crucial role in data science, aiding in predictive modeling, experimental design, intelligent estimation, and problem-solving, enabling data scientists to make informed decisions and predictions based on past data and observations.', 'types_of_databases The chapter provides insight into the types of databases, including relational and non-relational databases, and highlights their significance in storing and retrieving structured and unstructured data, with examples such as MySQL, MongoDB, and Apache GDUF.', 'significance_of_big_data_technology The significance of big data technology in handling large and complex data sets is emphasized, acknowledging its relevance in analyzing unstructured data and addressing the limitations of traditional data processing software in managing voluminous data.']}], 'duration': 1241.711, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s957241.jpg', 'highlights': ['Python is the most popular language in data science for analyzing huge volumes of data (relevance score: 5)', 'Pandas library provides fast and flexible data structures for quick data analysis (relevance score: 4)', 'Knowledge of SQL, NoSQL, Hadoop, and Spark is essential for data scientists (relevance score: 3)', 'Pandas, NumPy, Seaborn, Matplotlib, and R are foundational for data analysis and visualization', 'Data visualization is essential to understand the true meaning of data (relevance score: 2)', 'A good command over machine learning algorithms like SVM, Random Forest is crucial for data analysis', 'Hyperparameter tuning is crucial for improving results and controlling the behavior of machine learning algorithms', 'Scikit-learn supports both supervised and unsupervised learning in Python', 'Deep learning techniques, such as neural networks, are preferred for working with complex datasets', 'Statistics are essential for data scientists to identify and ask the right questions', 'Statistics play a crucial role in predictive modeling, experimental design, and problem-solving', 'Insight into relational and non-relational databases and their significance in storing and retrieving data', 'The significance of big data technology in handling large and complex data sets']}, {'end': 3996.545, 'segs': [{'end': 2245.831, 'src': 'embed', 'start': 2198.952, 'weight': 0, 'content': [{'end': 2208.639, 'text': 'it is a collection of data from diverse data sources and it has certain characteristics like volume, variety, velocity and veracity.', 'start': 2198.952, 'duration': 9.687}, {'end': 2212.06, 'text': 'So these are certain characteristics of big data.', 'start': 2209.099, 'duration': 2.961}, {'end': 2215.622, 'text': 'Now moving on forward to big data technologies.', 'start': 2212.58, 'duration': 3.042}, {'end': 2216.742, 'text': 'Now why they are needed?', 'start': 2215.662, 'duration': 1.08}, {'end': 2224.945, 'text': 'because to 
manage big data we have to use big data technologies and they are classified into data mining, data analysis,', 'start': 2216.742, 'duration': 8.203}, {'end': 2227.206, 'text': 'data visualization and data storage.', 'start': 2224.945, 'duration': 2.261}, {'end': 2229.427, 'text': 'So these are the big data technologies.', 'start': 2227.646, 'duration': 1.781}, {'end': 2233.888, 'text': 'So all these big data technologies are not required by a data scientist.', 'start': 2229.747, 'duration': 4.141}, {'end': 2241.59, 'text': 'We will particularly deal with data storage which is required and which is really important to know by a data scientist.', 'start': 2234.228, 'duration': 7.362}, {'end': 2244.091, 'text': 'So for data storage we have Hadoop.', 'start': 2242.07, 'duration': 2.021}, {'end': 2245.831, 'text': 'So we will cover first Hadoop.', 'start': 2244.491, 'duration': 1.34}], 'summary': 'Big data has volume, variety, velocity, and veracity. big data technologies include data mining, analysis, visualization, and storage, with hadoop being essential for data storage.', 'duration': 46.879, 'max_score': 2198.952, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2198952.jpg'}, {'end': 2689.576, 'src': 'embed', 'start': 2665.303, 'weight': 2, 'content': [{'end': 2673.207, 'text': "But we're going to take a look at the most compelling ones, which are growing demand, high salary, low competition, and diverse domains.", 'start': 2665.303, 'duration': 7.904}, {'end': 2675.968, 'text': "So let's go over them one by one.", 'start': 2673.867, 'duration': 2.101}, {'end': 2677.749, 'text': 'First up, growing demand.', 'start': 2676.448, 'duration': 1.301}, {'end': 2687.455, 'text': "US Bureau of Labor Statistics estimates that there'll be around 11.5 million data scientist jobs by the year 2026.", 'start': 2678.369, 'duration': 9.086}, {'end': 2689.576, 'text': 'That says a lot about the field itself.', 'start': 2687.455, 'duration': 2.121}], 'summary': '11.5 million data scientist jobs projected by 2026 in response to growing demand.', 'duration': 24.273, 'max_score': 2665.303, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2665303.jpg'}, {'end': 2742.601, 'src': 'embed', 'start': 2716.468, 'weight': 3, 'content': [{'end': 2720.93, 'text': 'In India alone, a data scientist can easily make on an average over 9 lakhs.', 'start': 2716.468, 'duration': 4.462}, {'end': 2724.472, 'text': 'In the United States, that number is 120,000 US dollars.', 'start': 2721.351, 'duration': 3.121}, {'end': 2730.495, 'text': 'Data scientists with experience of more than five years can make 20 to 30 lakhs or even over that.', 'start': 2724.492, 'duration': 6.003}, {'end': 2735.558, 'text': 'And in the United States, they can make over 200 to 500 thousand dollars per annum.', 'start': 2730.855, 'duration': 4.703}, {'end': 2737.739, 'text': 'So those are some attractive figures.', 'start': 2736.098, 'duration': 1.641}, {'end': 2742.601, 'text': "And keep in mind, they don't include bonuses, incentives, benefits and allowances.", 'start': 2737.859, 'duration': 4.742}], 'summary': 'Experienced data scientists can earn over 30 lakhs in india and 500k in the us, excluding bonuses and benefits.', 'duration': 26.133, 'max_score': 2716.468, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2716468.jpg'}, {'end': 2847.861, 'src': 'embed', 'start': 2824.841, 
'weight': 4, 'content': [{'end': 2834.85, 'text': 'So what I mean by that is data scientists are hired to solve problems for a company or to help them make crucial decisions in order to solve any big problem.', 'start': 2824.841, 'duration': 10.009}, {'end': 2836.731, 'text': 'They have to ask the right questions,', 'start': 2835.11, 'duration': 1.621}, {'end': 2847.861, 'text': 'which will lead them to the right data to collect and the processes and models they need to use or build to eventually pull the right answers that will help a business grow or get rid of its problems.', 'start': 2836.731, 'duration': 11.13}], 'summary': 'Data scientists solve problems and make crucial decisions by asking the right questions and using processes and models to pull the right answers for business growth or problem-solving.', 'duration': 23.02, 'max_score': 2824.841, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2824841.jpg'}, {'end': 3094.871, 'src': 'embed', 'start': 3062.017, 'weight': 5, 'content': [{'end': 3064.499, 'text': 'Moving on, the fourth skill is deep learning.', 'start': 3062.017, 'duration': 2.482}, {'end': 3071.786, 'text': 'So it is very useful for an aspiring data scientist to not only know the concepts of data science and machine learning,', 'start': 3064.977, 'duration': 6.809}, {'end': 3077.693, 'text': 'but they should know deep learning so they are able to replicate neural networks to produce predictive models.', 'start': 3071.786, 'duration': 5.907}, {'end': 3083.361, 'text': 'The fifth skill that a data scientist does need to have is data visualization.', 'start': 3078.414, 'duration': 4.947}, {'end': 3094.871, 'text': "They should be able to take all the insights from models and systems that they've built and be able to put that in a form of report where they can explain the insights in the simplest way possible.", 'start': 3083.948, 'duration': 10.923}], 'summary': 'Data scientists need deep learning and data visualization skills to replicate neural networks and present insights effectively.', 'duration': 32.854, 'max_score': 3062.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3062017.jpg'}, {'end': 3295.821, 'src': 'embed', 'start': 3259.812, 'weight': 6, 'content': [{'end': 3261.514, 'text': 'You should know rules of probability.', 'start': 3259.812, 'duration': 1.702}, {'end': 3264.015, 'text': 'dependent and independent events.', 'start': 3261.514, 'duration': 2.501}, {'end': 3271.621, 'text': "implement conditional marginal and joint probability using Bayes' theorem, probability distribution and central limit theorem.", 'start': 3264.015, 'duration': 7.606}, {'end': 3275.111, 'text': 'And in statistics, you should know all the terminologies.', 'start': 3272.27, 'duration': 2.841}, {'end': 3284.416, 'text': 'numerical parameters like mean mode, median sensitivity, entropy, sampling techniques, types of statistics, hypothesis testing,', 'start': 3275.111, 'duration': 9.305}, {'end': 3287.857, 'text': 'data clustering testing, data regression, modeling.', 'start': 3284.416, 'duration': 3.441}, {'end': 3295.821, 'text': 'So the concepts that we just talked about of probability and statistics will form the basis that you will use to process data and gain insights.', 'start': 3288.478, 'duration': 7.343}], 'summary': 'Learn probability rules, statistics terminologies, and concepts for data processing and gaining insights.', 'duration': 36.009, 'max_score': 
3259.812, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3259812.jpg'}, {'end': 3373.085, 'src': 'embed', 'start': 3347.139, 'weight': 7, 'content': [{'end': 3354.5, 'text': "And as you're learning the math, it's very good to start learning the libraries like NumPy, Pandas, Matplotlib.", 'start': 3347.139, 'duration': 7.361}, {'end': 3358.421, 'text': "If you're learning the programming languages R, learn the same concepts.", 'start': 3355.221, 'duration': 3.2}, {'end': 3364.663, 'text': 'And for libraries, it is essential to learn E1071 and POT for R programming.', 'start': 3358.601, 'duration': 6.062}, {'end': 3373.085, 'text': 'Apart from Python or R, you should also know a bit of SQL as it forms the basis to understand any type of databases.', 'start': 3365.243, 'duration': 7.842}], 'summary': 'Learn numpy, pandas, matplotlib for python; e1071, pot for r; also sql for databases.', 'duration': 25.946, 'max_score': 3347.139, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3347139.jpg'}, {'end': 3464.467, 'src': 'embed', 'start': 3437.611, 'weight': 8, 'content': [{'end': 3443.152, 'text': 'Apart from these, you should also learn dimensionality reduction concept, time series analysis.', 'start': 3437.611, 'duration': 5.541}, {'end': 3450.254, 'text': 'And then depending on the kind of situation and problem at hand, you should also learn how to do model selection and boosting.', 'start': 3443.553, 'duration': 6.701}, {'end': 3457.364, 'text': "So at this point, you're getting quite good and deep into data science, And as you learn all of these topics one by one,", 'start': 3450.854, 'duration': 6.51}, {'end': 3464.467, 'text': 'you should do practical examples using Python or R and make sure that you really have a grasp of all these concepts.', 'start': 3457.364, 'duration': 7.103}], 'summary': 'Learn dimensionality reduction, time series analysis, model selection, boosting, and apply with python or r for a deep understanding of data science.', 'duration': 26.856, 'max_score': 3437.611, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3437611.jpg'}, {'end': 3503.723, 'src': 'embed', 'start': 3477.773, 'weight': 9, 'content': [{'end': 3482.175, 'text': 'So in deep learning you should know single layer perception, TensorFlow 2.0,.', 'start': 3477.773, 'duration': 4.402}, {'end': 3483.796, 'text': 'convolutional neural networks.', 'start': 3482.175, 'duration': 1.621}, {'end': 3489.795, 'text': 'regional CNN implement Boltzmann machine and autoencoder.', 'start': 3484.372, 'duration': 5.423}, {'end': 3494.698, 'text': 'generative adversarial network, emotion and gender detection.', 'start': 3489.795, 'duration': 4.903}, {'end': 3497.439, 'text': 'RNN and GRU LSTM.', 'start': 3494.698, 'duration': 2.741}, {'end': 3501.742, 'text': "If you haven't heard about these concepts, then you must be feeling overwhelmed.", 'start': 3498.14, 'duration': 3.602}, {'end': 3503.723, 'text': 'Let me put your mind at ease.', 'start': 3502.442, 'duration': 1.281}], 'summary': 'Deep learning covers single layer perception, tensorflow 2.0, cnn, boltzmann machine, autoencoder, gan, emotion and gender detection, rnn, gru, lstm.', 'duration': 25.95, 'max_score': 3477.773, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3477773.jpg'}, {'end': 3590.389, 'src': 'embed', 'start': 3548.4, 
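Since the roadmap above lists conditional probability and Bayes' theorem among the math prerequisites, a small worked example may help; the figures below are made up purely for illustration (a hypothetical diagnostic test with a 1% base rate).

```latex
% Hypothetical figures: P(D)=0.01 (prior), P(+ \mid D)=0.95, P(+ \mid \neg D)=0.05
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
            = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99}
            \approx 0.16
```

Even with a fairly accurate test, the low prior keeps the posterior modest, which is the kind of reasoning the statistics and probability portion of the course builds on.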
'weight': 10, 'content': [{'end': 3558.447, 'text': 'Spark RDDs, DataFrames and Spark SQL, machine learning using Spark MLlib, understanding Apache Kafka and Apache Flume.', 'start': 3548.4, 'duration': 10.047}, {'end': 3561.93, 'text': 'Apache Spark Streaming processing multiple batches and data sources.', 'start': 3558.447, 'duration': 3.483}, {'end': 3565.215, 'text': "again, don't dive too deep into these topics.", 'start': 3562.634, 'duration': 2.581}, {'end': 3566.696, 'text': 'you should just know the basics.', 'start': 3565.215, 'duration': 1.481}, {'end': 3569.738, 'text': "don't be overwhelmed and move on to the next thing.", 'start': 3566.696, 'duration': 3.042}, {'end': 3575.021, 'text': 'last skill to acquire, which is data visualization, and so you should know tools like tableau.', 'start': 3569.738, 'duration': 5.283}, {'end': 3583.905, 'text': 'so in data visualization, you should learn basic visual analytics, advanced visual analytics, calculations, level of detail, lod expressions,', 'start': 3575.021, 'duration': 8.884}, {'end': 3590.389, 'text': 'geographical visualizations, advanced charts, dashboard and stories, and so obviously you can use different tools,', 'start': 3583.905, 'duration': 6.484}], 'summary': 'Covered spark, kafka, flume, spark streaming, and data visualization tools like tableau.', 'duration': 41.989, 'max_score': 3548.4, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3548400.jpg'}, {'end': 3689.038, 'src': 'embed', 'start': 3655.801, 'weight': 16, 'content': [{'end': 3658.802, 'text': "Moving ahead, let's look at data scientist job trends.", 'start': 3655.801, 'duration': 3.001}, {'end': 3665.364, 'text': 'This is a graph of number of job openings for data scientists per 1 million postings.', 'start': 3659.742, 'duration': 5.622}, {'end': 3675.289, 'text': 'What we can infer from the graph is that data scientist jobs have a growing trend, and this trend is expected to continue in the coming years.', 'start': 3666.303, 'duration': 8.986}, {'end': 3681.793, 'text': 'apart from this, statistics also say that 97% of data scientist jobs are on full-time basis.', 'start': 3675.289, 'duration': 6.504}, {'end': 3689.038, 'text': 'Whereas only 3% are part-time. Also, one of the important things to get a job is mastering the required skills.', 'start': 3681.933, 'duration': 7.105}], 'summary': 'Data scientist job openings show a growing trend, with 97% being full-time positions and 3% part-time.', 'duration': 33.237, 'max_score': 3655.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3655801.jpg'}, {'end': 3754.024, 'src': 'embed', 'start': 3705.625, 'weight': 12, 'content': [{'end': 3710.347, 'text': 'In fact, mastering just three core skills, which are Python, R, and SQL,', 'start': 3705.625, 'duration': 4.722}, {'end': 3715.849, 'text': 'can provide a solid foundation for seven out of ten job openings today as data scientist.', 'start': 3710.347, 'duration': 5.502}, {'end': 3721.877, 'text': "Now, let's look at the average data scientist salary based on different criteria.", 'start': 3717.256, 'duration': 4.621}, {'end': 3729.84, 'text': "First Let's look at the salary based on degree as we can see both in India and the United States.", 'start': 3723.018, 'duration': 6.822}, {'end': 3735.602, 'text': 'There is a clear correlation between degree level and the associated salary.', 'start': 3730.08, 'duration': 5.522}, {'end': 3738.442, 'text': 
'higher the degree, better the salary.', 'start': 3735.602, 'duration': 2.84}, {'end': 3743.184, 'text': 'hence PhD holders earn the highest as data scientists compared to others.', 'start': 3738.442, 'duration': 4.742}, {'end': 3747.799, 'text': 'The next criterion is experience.', 'start': 3744.636, 'duration': 3.163}, {'end': 3751.802, 'text': 'your experience plays a crucial role in deciding your salary.', 'start': 3747.799, 'duration': 4.003}, {'end': 3754.024, 'text': 'be it India or the United States.', 'start': 3751.802, 'duration': 2.222}], 'summary': 'Mastering python and sql can lead to high-demand data scientist jobs with lucrative salaries, especially for phd holders.', 'duration': 48.399, 'max_score': 3705.625, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3705625.jpg'}, {'end': 3883.955, 'src': 'embed', 'start': 3851.351, 'weight': 14, 'content': [{'end': 3855.073, 'text': "Now, let's quickly discuss the future scope for data scientists.", 'start': 3851.351, 'duration': 3.722}, {'end': 3866.167, 'text': 'Data scientists are high in demand in industries like healthcare, transport, e-commerce, and cyber security, along with aviation and airlines.', 'start': 3856.462, 'duration': 9.705}, {'end': 3874.39, 'text': 'The healthcare industry is extensively making use of data scientists to develop a system that can predict health risks.', 'start': 3867.327, 'duration': 7.063}, {'end': 3883.955, 'text': 'They are also being hired for the process of drug discovery to provide insights into optimizing and increasing the success rate of predictions.', 'start': 3875.231, 'duration': 8.724}], 'summary': 'Data scientists are in high demand in various industries like healthcare, transport, e-commerce, cyber security, aviation, and airlines, with a focus on predicting health risks and optimizing drug discovery.', 'duration': 32.604, 'max_score': 3851.351, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3851351.jpg'}, {'end': 4007.665, 'src': 'embed', 'start': 3978.675, 'weight': 15, 'content': [{'end': 3980.436, 'text': 'It starts with a business requirement.', 'start': 3978.675, 'duration': 1.761}, {'end': 3982.457, 'text': 'Next is the data acquisition.', 'start': 3980.776, 'duration': 1.681}, {'end': 3985.699, 'text': "After that, you'll process the data, which is called data processing.", 'start': 3982.877, 'duration': 2.822}, {'end': 3989.581, 'text': 'Then there is data exploration, modeling, and finally deployment.', 'start': 3986.039, 'duration': 3.542}, {'end': 3996.545, 'text': "So guys, before you even start on a data science project, it is important that you understand the problem you're trying to solve.", 'start': 3990.001, 'duration': 6.544}, {'end': 4001.308, 'text': "So, in this stage, you're just going to focus on identifying the central objectives of the project,", 'start': 3996.785, 'duration': 4.523}, {'end': 4005.116, 'text': "And you'll do this by identifying the variables that need to be predicted.", 'start': 4001.669, 'duration': 3.447}, {'end': 4007.665, 'text': 'Next up, we have data acquisition.', 'start': 4005.763, 'duration': 1.902}], 'summary': 'Data science project stages: requirement, acquisition, processing, exploration, modeling, deployment, and problem understanding.', 'duration': 28.99, 'max_score': 3978.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3978675.jpg'}], 'start': 2198.952, 'title': 
'Data science and big data technologies', 'summary': 'Covers the characteristics of big data including volume, variety, velocity, and veracity, the importance of big data technologies such as data mining, analysis, visualization, and storage, essential skills for data science including mathematics, programming, machine learning, and data visualization, and job trends for data scientists with insights into demand, salaries, and future scope.', 'chapters': [{'end': 2245.831, 'start': 2198.952, 'title': 'Understanding big data technologies', 'summary': 'Covers the characteristics of big data including volume, variety, velocity, and veracity, and emphasizes the importance of big data technologies such as data mining, data analysis, data visualization, and data storage, with a focus on hadoop.', 'duration': 46.879, 'highlights': ['The chapter explains the characteristics of big data, including volume, variety, velocity, and veracity, which are essential for understanding the nature of big data.', 'It highlights the significance of big data technologies such as data mining, data analysis, data visualization, and data storage in managing and analyzing big data.', 'The chapter emphasizes the necessity for data scientists to have knowledge about data storage, particularly focusing on Hadoop as an important technology for data storage in big data analytics.']}, {'end': 3172.347, 'start': 2245.891, 'title': 'Data science roadmap & skills', 'summary': 'Provides an overview of key technologies and skills required for a data scientist, including the demand for data scientists, their roles and responsibilities, and the essential technical and soft skills. it also outlines a structured learning process and highlights the growing demand, high salaries, low competition, and diverse domains in the field of data science.', 'duration': 926.456, 'highlights': ['Growing Demand for Data Scientists The US Bureau of Labor Statistics estimates around 11.5 million data scientist jobs by 2026, indicating significant growth in the field due to the increasing importance of data science in decision making across organizations globally.', 'High Salary Potential Experienced data scientists can earn over 20-30 lakhs in India and $200,000 to $500,000 in the United States per annum, with the potential for bonuses and incentives, reflecting the lucrative nature of the field.', 'Low Competition and Diverse Domains Organizations face challenges in finding good data scientists due to the high demand and can operate in diverse domains such as product, manufacturing, energy, retail, marketing, and healthcare, ensuring varied and challenging opportunities for data scientists.', 'Roles and Responsibilities of Data Scientists Data scientists are responsible for asking the right questions, cleaning and prepping data, performing exploratory data analysis, choosing appropriate models and algorithms, checking model accuracy, making reports, and continuously adjusting and retraining models based on stakeholder feedback to guide business decisions.', 'Essential Skills for Data Scientists Data scientists require skills in mathematics, programming, data science and machine learning, deep learning, data visualization, and big data, along with soft skills such as good communication and the ability to simplify complex ideas for stakeholders and clients.']}, {'end': 3628.754, 'start': 3172.968, 'title': 'Essential skills for data science', 'summary': 'Emphasizes the essential skills for data science, highlighting the importance of mathematics, 
programming skills, data science, machine learning, deep learning, big data tools, and data visualization in becoming a proficient data scientist.', 'duration': 455.786, 'highlights': ['Importance of Mathematics for Data Science Understanding the importance of probability and statistics as the basis for processing data, with emphasis on learning topics like probability rules, conditional probability, probability distributions, and statistics terminologies.', 'Programming Skills and Tools Emphasizing the significance of learning Python or R programming languages, understanding their basic syntax, data structures, file operations, functions, object-oriented programming, and essential libraries like NumPy, Pandas, and Matplotlib.', 'Data Science and Machine Learning Fundamentals Stressing the importance of understanding data science concepts, data analysis pipeline, machine learning models, algorithms, dimensionality reduction, time series analysis, model selection, and boosting.', 'Deep Learning Concepts Highlighting the necessity of learning deep learning concepts such as single layer perception, TensorFlow 2.0, convolutional neural networks, generative adversarial networks, RNN, and LSTM after gaining proficiency in machine learning and data science.', 'Big Data Tools and Concepts Emphasizing the importance of acquiring knowledge about big data tools like PySpark, Hadoop, Spark, Apache Kafka, and Apache Flume, while stressing the significance of understanding the basics without being overwhelmed.', 'Data Visualization Skills Stressing the importance of learning data visualization tools like Tableau, and understanding basic and advanced visual analytics, calculations, geographical visualizations, advanced charts, dashboards, and stories.']}, {'end': 3996.545, 'start': 3629.354, 'title': 'Data science roadmap and job trends', 'summary': 'Covers a data science roadmap, job trends for data scientists, in-demand skills, average salary based on degree, experience, location, and companies, future scope in industries, and the data life cycle.', 'duration': 367.191, 'highlights': ['In-Demand Skills for Data Scientists Mastering Python and SQL can provide a solid foundation for seven out of ten job openings today as data scientist.', 'Average Data Scientist Salary Higher degrees correlate with higher salaries, and more experience results in higher pay, particularly in larger cities like Bangalore in India and San Francisco in the United States. 
Companies like Oracle, JP Morgan, Intel, Amazon, IBM, and Accenture offer high salaries for data scientists.', 'Future Scope for Data Scientists Data scientists are in high demand in industries like healthcare, transport, e-commerce, cyber security, aviation, and Airlines, where they play crucial roles in developing predictive health systems, improving business processes and customer service, enhancing user experience, combating fraudulent activities, and optimizing pricing and maintenance systems.', 'Data Life Cycle The data life cycle consists of six steps: business requirement, data acquisition, data processing, data exploration, modeling, and deployment.', 'Job Trends for Data Scientists Data scientist jobs have a growing trend and are expected to continue in the coming years, with 97% of jobs being full-time and job openings correlating with specific in-demand skills.']}], 'duration': 1797.593, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s2198952.jpg', 'highlights': ['The chapter explains the characteristics of big data, including volume, variety, velocity, and veracity, which are essential for understanding the nature of big data.', 'The chapter emphasizes the necessity for data scientists to have knowledge about data storage, particularly focusing on Hadoop as an important technology for data storage in big data analytics.', 'The US Bureau of Labor Statistics estimates around 11.5 million data scientist jobs by 2026, indicating significant growth in the field due to the increasing importance of data science in decision making across organizations globally.', 'Experienced data scientists can earn over 20-30 lakhs in India and $200,000 to $500,000 in the United States per annum, with the potential for bonuses and incentives, reflecting the lucrative nature of the field.', 'Data scientists are responsible for asking the right questions, cleaning and prepping data, performing exploratory data analysis, choosing appropriate models and algorithms, checking model accuracy, making reports, and continuously adjusting and retraining models based on stakeholder feedback to guide business decisions.', 'Data scientists require skills in mathematics, programming, data science and machine learning, deep learning, data visualization, and big data, along with soft skills such as good communication and the ability to simplify complex ideas for stakeholders and clients.', 'Understanding the importance of probability and statistics as the basis for processing data, with emphasis on learning topics like probability rules, conditional probability, probability distributions, and statistics terminologies.', 'Emphasizing the significance of learning Python or R programming languages, understanding their basic syntax, data structures, file operations, functions, object-oriented programming, and essential libraries like NumPy, Pandas, and Matplotlib.', 'Stressing the importance of understanding data science concepts, data analysis pipeline, machine learning models, algorithms, dimensionality reduction, time series analysis, model selection, and boosting.', 'Highlighting the necessity of learning deep learning concepts such as single layer perception, TensorFlow 2.0, convolutional neural networks, generative adversarial networks, RNN, and LSTM after gaining proficiency in machine learning and data science.', 'Emphasizing the importance of acquiring knowledge about big data tools like PySpark, Hadoop, Spark, Apache Kafka, and Apache Flume, while 
stressing the significance of understanding the basics without being overwhelmed.', 'Stressing the importance of learning data visualization tools like Tableau, and understanding basic and advanced visual analytics, calculations, geographical visualizations, advanced charts, dashboards, and stories.', 'Mastering Python and SQL can provide a solid foundation for seven out of ten job openings today as data scientist.', 'Higher degrees correlate with higher salaries, and more experience results in higher pay, particularly in larger cities like Bangalore in India and San Francisco in the United States.', 'Data scientists are in high demand in industries like healthcare, transport, e-commerce, cyber security, aviation, and Airlines, where they play crucial roles in developing predictive health systems, improving business processes and customer service, enhancing user experience, combating fraudulent activities, and optimizing pricing and maintenance systems.', 'The data life cycle consists of six steps: business requirement, data acquisition, data processing, data exploration, modeling, and deployment.', 'Data scientist jobs have a growing trend and are expected to continue in the coming years, with 97% of jobs being full-time and job openings correlating with specific in-demand skills.']}, {'end': 6528.793, 'segs': [{'end': 4068.065, 'src': 'embed', 'start': 4040.348, 'weight': 7, 'content': [{'end': 4047.032, 'text': "So if you find any data set that is cleaned and it's packaged well for you, then you've actually won the lottery,", 'start': 4040.348, 'duration': 6.684}, {'end': 4051.275, 'text': 'because finding the right data takes a lot of time and it takes a lot of effort.', 'start': 4047.032, 'duration': 4.243}, {'end': 4056.819, 'text': 'And one of the major time consuming tasks in the data science process is data cleaning.', 'start': 4051.756, 'duration': 5.063}, {'end': 4059.06, 'text': 'Okay, this requires a lot of time.', 'start': 4057.279, 'duration': 1.781}, {'end': 4060.661, 'text': 'it requires a lot of effort,', 'start': 4059.06, 'duration': 1.601}, {'end': 4068.065, 'text': 'because you have to go through the entire data set to find out any missing values or if there are any inconsistent values or corrupted data,', 'start': 4060.661, 'duration': 7.404}], 'summary': 'Finding a well-cleaned data set saves time and effort in data science.', 'duration': 27.717, 'max_score': 4040.348, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s4040348.jpg'}, {'end': 4503.742, 'src': 'heatmap', 'start': 4092.195, 'weight': 0.714, 'content': [{'end': 4095.756, 'text': 'You can just pull up a random subset of data and plot a histogram.', 'start': 4092.195, 'duration': 3.561}, {'end': 4098.477, 'text': 'You can even create interactive visualizations.', 'start': 4096.036, 'duration': 2.441}, {'end': 4105.747, 'text': 'This is the point where you dive deep into the data and you try to explore the different models that can be applied to your data.', 'start': 4099.142, 'duration': 6.605}, {'end': 4107.707, 'text': 'Next up, we have data modeling.', 'start': 4106.187, 'duration': 1.52}, {'end': 4112.892, 'text': "So after processing the data, what you're going to do is you're going to carry out model training.", 'start': 4108.127, 'duration': 4.765}, {'end': 4118.756, 'text': 'Okay, now model training is basically about finding the model that answers the questions more accurately.', 'start': 4113.292, 'duration': 5.464}, {'end': 4121.678, 'text': 
'So the process of model training involves a lot of steps.', 'start': 4119.196, 'duration': 2.482}, {'end': 4127.022, 'text': "So firstly, you'll start by splitting the input data into the training data set and the testing data set.", 'start': 4122.138, 'duration': 4.884}, {'end': 4131.063, 'text': "Okay, you're going to take the entire data set and you're going to separate it into two parts.", 'start': 4127.421, 'duration': 3.642}, {'end': 4133.564, 'text': 'One is the training and one is the testing data.', 'start': 4131.462, 'duration': 2.102}, {'end': 4136.904, 'text': "After that, you'll build a model by using the training data set.", 'start': 4133.944, 'duration': 2.96}, {'end': 4140.746, 'text': "And once you're done with that, you'll evaluate the training and the test data set.", 'start': 4137.205, 'duration': 3.541}, {'end': 4146.508, 'text': "Now to evaluate the training and testing data set, you'll be using series of machine learning algorithms.", 'start': 4141.206, 'duration': 5.302}, {'end': 4151.049, 'text': "After that, you'll find out the model which is the most suitable for your business requirement.", 'start': 4146.988, 'duration': 4.061}, {'end': 4153.461, 'text': 'So this was mainly data modeling.', 'start': 4151.64, 'duration': 1.821}, {'end': 4159.962, 'text': 'This is where you build a model out of your training data set and then you evaluate this model by using the testing data set.', 'start': 4153.702, 'duration': 6.26}, {'end': 4161.542, 'text': 'Next, we have deployment.', 'start': 4160.343, 'duration': 1.199}, {'end': 4168.104, 'text': 'So guys, the goal of this stage is to deploy a model into a production or maybe a production-like environment.', 'start': 4162.064, 'duration': 6.04}, {'end': 4174.487, 'text': 'So this is basically done for final user acceptance and the users have to validate the performance of the models,', 'start': 4168.484, 'duration': 6.003}, {'end': 4180.408, 'text': 'and if there are any issues with the model or any issues with the algorithm, then they have to be fixed in this stage.', 'start': 4174.487, 'duration': 5.921}, {'end': 4188.937, 'text': "Let's move ahead and take a look at what is data.", 'start': 4186.225, 'duration': 2.712}, {'end': 4191.344, 'text': 'Now this is a quite simple question.', 'start': 4189.602, 'duration': 1.742}, {'end': 4198.77, 'text': "If I ask any of you what is data, you'll see that it's a set of numbers or some sort of documents that I've stored in my computer.", 'start': 4191.444, 'duration': 7.326}, {'end': 4203.774, 'text': 'Now data is actually everything, all right? 
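The processing, exploration and modeling steps just described (checking for missing values, looking at distributions, splitting into training and testing sets, fitting a model and evaluating it) can be sketched in a few lines. This is only a rough sketch under assumed inputs: "customers.csv", its numeric feature columns and the "churned" label are hypothetical placeholders, not from the course.

```python
# Rough sketch of the processing -> exploration -> modeling steps described above.
# "customers.csv" and the "churned" label are hypothetical; features are assumed numeric.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")

# Data processing: find and handle missing values
print(df.isnull().sum())
df = df.fillna(df.median(numeric_only=True))

# Data exploration: quick look at the distributions with histograms
df.hist(figsize=(8, 6))

# Data modeling: split into training and testing sets, fit, then evaluate
X = df.drop(columns=["churned"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```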
Look around you, there is data everywhere.', 'start': 4199.25, 'duration': 4.524}, {'end': 4207.598, 'text': 'Each click on your phone generates more data than you know.', 'start': 4204.355, 'duration': 3.243}, {'end': 4214.243, 'text': 'Now this generated data provides insights for analysis and helps us make better business decisions.', 'start': 4208.078, 'duration': 6.165}, {'end': 4216.906, 'text': 'This is why data is so important.', 'start': 4214.864, 'duration': 2.042}, {'end': 4224.988, 'text': 'To give you a formal definition, data refers to facts and statistics collected together for reference or analysis.', 'start': 4217.546, 'duration': 7.442}, {'end': 4229.849, 'text': 'Alright, this is the definition of data in terms of statistics and probability.', 'start': 4225.688, 'duration': 4.161}, {'end': 4239.131, 'text': 'So as we know, data can be collected, it can be measured and analyzed, it can be visualized by using statistical models and graphs.', 'start': 4230.669, 'duration': 8.462}, {'end': 4243.986, 'text': 'Now data is divided into two major subcategories.', 'start': 4240.302, 'duration': 3.684}, {'end': 4248.592, 'text': 'So first we have qualitative data and quantitative data.', 'start': 4245.128, 'duration': 3.464}, {'end': 4250.614, 'text': 'These are the two different types of data.', 'start': 4248.832, 'duration': 1.782}, {'end': 4258.583, 'text': 'Under qualitative data we have nominal and ordinal data and under quantitative data we have discrete and continuous data.', 'start': 4251.255, 'duration': 7.328}, {'end': 4261.696, 'text': "Now let's focus on qualitative data.", 'start': 4259.415, 'duration': 2.281}, {'end': 4270.12, 'text': "Now this type of data deals with characteristics and descriptors that can't be easily measured but can be observed subjectively.", 'start': 4262.036, 'duration': 8.084}, {'end': 4275.182, 'text': 'Now qualitative data is further divided into nominal and ordinal data.', 'start': 4270.68, 'duration': 4.502}, {'end': 4281.525, 'text': "So nominal data is any sort of data that doesn't have any order or ranking.", 'start': 4275.802, 'duration': 5.723}, {'end': 4284.626, 'text': 'An example of nominal data is gender.', 'start': 4282.405, 'duration': 2.221}, {'end': 4287.183, 'text': 'Now there is no ranking in gender.', 'start': 4285.322, 'duration': 1.861}, {'end': 4290.065, 'text': "There's only male, female, or other.", 'start': 4287.423, 'duration': 2.642}, {'end': 4294.007, 'text': 'There is no one, two, three, four, or any sort of ordering in gender.', 'start': 4290.405, 'duration': 3.602}, {'end': 4296.888, 'text': 'Race is another example of nominal data.', 'start': 4294.607, 'duration': 2.281}, {'end': 4301.751, 'text': 'Now ordinal data is basically an ordered series of information.', 'start': 4297.488, 'duration': 4.263}, {'end': 4304.892, 'text': "Let's say that you went to a restaurant.", 'start': 4303.011, 'duration': 1.881}, {'end': 4308.654, 'text': 'Your information is stored in the form of customer ID.', 'start': 4305.593, 'duration': 3.061}, {'end': 4311.896, 'text': 'So basically you are represented with a customer ID.', 'start': 4309.295, 'duration': 2.601}, {'end': 4317.532, 'text': 'Now you would have rated their service as either good or average.', 'start': 4312.569, 'duration': 4.963}, {'end': 4319.693, 'text': "That's how ordinal data is.", 'start': 4317.912, 'duration': 1.781}, {'end': 4326.636, 'text': "And similarly, they'll have a record of other customers who visit the restaurant along with their ratings.", 
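As a small aside on the qualitative types just described, nominal and ordinal values are commonly represented as categoricals; a minimal sketch with made-up values follows.

```python
# Sketch of nominal vs ordinal data as pandas categoricals (values are made up).
import pandas as pd

# Nominal: categories with no inherent order (e.g. gender)
gender = pd.Categorical(["male", "female", "other", "female"])

# Ordinal: categories with a meaningful order (e.g. service ratings)
rating = pd.Categorical(
    ["good", "average", "good", "poor"],
    categories=["poor", "average", "good"],
    ordered=True,
)

print(gender.ordered)              # False -> nominal
print(rating.ordered)              # True  -> ordinal
print(rating.min(), rating.max())  # ordering makes min/max meaningful
```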
'start': 4320.433, 'duration': 6.203}, {'end': 4333.259, 'text': 'So any data which has some sort of sequence or some sort of order to it is known as ordinal data.', 'start': 4327.697, 'duration': 5.562}, {'end': 4336.321, 'text': 'So guys, this is pretty simple to understand.', 'start': 4334.42, 'duration': 1.901}, {'end': 4339.59, 'text': "Now let's move on and look at quantitative data.", 'start': 4337.108, 'duration': 2.482}, {'end': 4343.593, 'text': 'So quantitative data basically deals with numbers and things.', 'start': 4340.05, 'duration': 3.543}, {'end': 4347.595, 'text': 'You can understand that by the word quantitative itself.', 'start': 4344.373, 'duration': 3.222}, {'end': 4354.6, 'text': 'Quantitative is basically quantity, right? So it deals with numbers, it deals with anything that you can measure objectively.', 'start': 4348.076, 'duration': 6.524}, {'end': 4358.723, 'text': 'So there are two types of quantitative data.', 'start': 4356.201, 'duration': 2.522}, {'end': 4361.125, 'text': 'There is discrete and continuous data.', 'start': 4358.963, 'duration': 2.162}, {'end': 4368.836, 'text': 'Now discrete data is also known as categorical data and it can hold a finite number of possible values.', 'start': 4361.733, 'duration': 7.103}, {'end': 4372.638, 'text': 'Now the number of students in a class is a finite number.', 'start': 4369.497, 'duration': 3.141}, {'end': 4375.799, 'text': "Alright, you can't have infinite number of students in a class.", 'start': 4372.658, 'duration': 3.141}, {'end': 4378.961, 'text': "Let's say in your fifth grade there were 100 students in your class.", 'start': 4376.34, 'duration': 2.621}, {'end': 4383.963, 'text': "Alright, there weren't infinite number but there was a definite finite number of students in your class.", 'start': 4379.241, 'duration': 4.722}, {'end': 4385.704, 'text': "Okay, that's discrete data.", 'start': 4384.283, 'duration': 1.421}, {'end': 4387.785, 'text': 'Next, we have continuous data.', 'start': 4386.304, 'duration': 1.481}, {'end': 4392.989, 'text': 'Now, this type of data can hold infinite number of possible values, okay?', 'start': 4388.185, 'duration': 4.804}, {'end': 4398.052, 'text': 'So when you say weight of a person is an example of continuous data.', 'start': 4393.489, 'duration': 4.563}, {'end': 4405.217, 'text': 'what I mean to say is my weight can be 50 kgs or it can be 50.1 kgs or it can be 50.001 kgs or 50.0001 or 50.023, and so on.', 'start': 4398.052, 'duration': 7.165}, {'end': 4407.238, 'text': 'right?. 
There are infinite number of possible values.', 'start': 4405.217, 'duration': 2.021}, {'end': 4419.806, 'text': 'So this is what I mean by continuous data.', 'start': 4416.704, 'duration': 3.102}, {'end': 4423.827, 'text': 'This is the difference between discrete and continuous data.', 'start': 4419.826, 'duration': 4.001}, {'end': 4428.009, 'text': "And also I'd like to mention a few other things over here.", 'start': 4424.788, 'duration': 3.221}, {'end': 4431.851, 'text': 'Now there are a couple of types of variables as well.', 'start': 4428.59, 'duration': 3.261}, {'end': 4436.333, 'text': 'We have a discrete variable and we have a continuous variable.', 'start': 4432.391, 'duration': 3.942}, {'end': 4440.286, 'text': 'Discrete variable is also known as a categorical variable.', 'start': 4436.784, 'duration': 3.502}, {'end': 4443.529, 'text': 'It can hold values of different categories.', 'start': 4440.987, 'duration': 2.542}, {'end': 4447.831, 'text': "Let's say that you have a variable called message.", 'start': 4444.129, 'duration': 3.702}, {'end': 4451.954, 'text': 'And there are two types of values that this variable can hold.', 'start': 4448.292, 'duration': 3.662}, {'end': 4456.938, 'text': "Let's say that your message can either be a spam message or a non-spam message.", 'start': 4452.795, 'duration': 4.143}, {'end': 4462.021, 'text': "That's when you call a variable as discrete or categorical variable.", 'start': 4457.598, 'duration': 4.423}, {'end': 4466.524, 'text': 'Because it can hold values that represent different categories of data.', 'start': 4462.641, 'duration': 3.883}, {'end': 4472.782, 'text': 'Now continuous variables are basically variables that can store infinite number of values.', 'start': 4467.216, 'duration': 5.566}, {'end': 4476.746, 'text': 'So the weight of a person can be denoted as a continuous variable.', 'start': 4473.342, 'duration': 3.404}, {'end': 4482.732, 'text': "Let's say there is a variable called weight and it can store infinite number of possible values.", 'start': 4477.366, 'duration': 5.366}, {'end': 4484.974, 'text': "That's why we'll call it a continuous variable.", 'start': 4482.932, 'duration': 2.042}, {'end': 4489.239, 'text': 'So, guys, basically, variable is anything that can store a value right?', 'start': 4485.658, 'duration': 3.581}, {'end': 4496.781, 'text': "So if you associate any sort of data with a variable, then it'll become either discrete variable or continuous variable.", 'start': 4489.699, 'duration': 7.082}, {'end': 4499.821, 'text': 'that is also dependent and independent type of variables.', 'start': 4496.781, 'duration': 3.04}, {'end': 4503.742, 'text': "Now we won't discuss all of that in depth because that's pretty understandable.", 'start': 4500.482, 'duration': 3.26}], 'summary': 'Data modeling involves training, testing, and evaluating models; qualitative and quantitative data are distinct categories.', 'duration': 411.547, 'max_score': 4092.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s4092195.jpg'}, {'end': 4151.049, 'src': 'embed', 'start': 4119.196, 'weight': 6, 'content': [{'end': 4121.678, 'text': 'So the process of model training involves a lot of steps.', 'start': 4119.196, 'duration': 2.482}, {'end': 4127.022, 'text': "So firstly, you'll start by splitting the input data into the training data set and the testing data set.", 'start': 4122.138, 'duration': 4.884}, {'end': 4131.063, 'text': "Okay, you're going to take the entire data set and 
you're going to separate it into two parts.", 'start': 4127.421, 'duration': 3.642}, {'end': 4133.564, 'text': 'One is the training and one is the testing data.', 'start': 4131.462, 'duration': 2.102}, {'end': 4136.904, 'text': "After that, you'll build a model by using the training data set.", 'start': 4133.944, 'duration': 2.96}, {'end': 4140.746, 'text': "And once you're done with that, you'll evaluate the training and the test data set.", 'start': 4137.205, 'duration': 3.541}, {'end': 4146.508, 'text': "Now to evaluate the training and testing data set, you'll be using series of machine learning algorithms.", 'start': 4141.206, 'duration': 5.302}, {'end': 4151.049, 'text': "After that, you'll find out the model which is the most suitable for your business requirement.", 'start': 4146.988, 'duration': 4.061}], 'summary': 'Model training involves data splitting, building, and evaluating using machine learning algorithms to find the most suitable model.', 'duration': 31.853, 'max_score': 4119.196, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s4119196.jpg'}, {'end': 4275.182, 'src': 'embed', 'start': 4251.255, 'weight': 9, 'content': [{'end': 4258.583, 'text': 'Under qualitative data we have nominal and ordinal data and under quantitative data we have discrete and continuous data.', 'start': 4251.255, 'duration': 7.328}, {'end': 4261.696, 'text': "Now let's focus on qualitative data.", 'start': 4259.415, 'duration': 2.281}, {'end': 4270.12, 'text': "Now this type of data deals with characteristics and descriptors that can't be easily measured but can be observed subjectively.", 'start': 4262.036, 'duration': 8.084}, {'end': 4275.182, 'text': 'Now qualitative data is further divided into nominal and ordinal data.', 'start': 4270.68, 'duration': 4.502}], 'summary': 'Qualitative data includes nominal and ordinal types.', 'duration': 23.927, 'max_score': 4251.255, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s4251255.jpg'}, {'end': 6143.434, 'src': 'heatmap', 'start': 4896.786, 'weight': 0, 'content': [{'end': 4904.231, 'text': "Now, guys, in this session I'll only be focusing on probability, so let's move on and look at the different types of probability sampling.", 'start': 4896.786, 'duration': 7.445}, {'end': 4915.559, 'text': 'So what is probability sampling? It is a sampling technique in which samples from a large population are chosen by using the theory of probability.', 'start': 4905.114, 'duration': 10.445}, {'end': 4919.381, 'text': 'So there are three types of probability sampling.', 'start': 4916.94, 'duration': 2.441}, {'end': 4921.682, 'text': 'First we have the random sampling.', 'start': 4920.002, 'duration': 1.68}, {'end': 4928.466, 'text': 'Now in this method, each member of the population has an equal chance of being selected in the sample.', 'start': 4922.223, 'duration': 6.243}, {'end': 4937.019, 'text': 'Alright, so each and every individual or each and every object in the population has an equal chance of being a part of the sample.', 'start': 4929.292, 'duration': 7.727}, {'end': 4939.641, 'text': "That's what random sampling is all about.", 'start': 4937.86, 'duration': 1.781}, {'end': 4943.665, 'text': "Okay, you're randomly going to select any individual or any object.", 'start': 4939.981, 'duration': 3.684}, {'end': 4950.84, 'text': 'So this way, each individual has an equal chance of being selected, correct? 
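Before the transcript moves on to systematic and stratified sampling, here is a minimal sketch of the simple random sampling just described, where every member of the population has an equal chance of being selected; the population table is entirely made up.

```python
# Simple random sampling sketch: every row has an equal chance of selection.
# The population DataFrame here is made up for illustration only.
import pandas as pd

population = pd.DataFrame({"id": range(1, 101), "value": range(100, 200)})

# Randomly pick 10 members; random_state makes the draw reproducible
sample = population.sample(n=10, random_state=7)
print(sample)
```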
Next we have systematic sampling.', 'start': 4944.105, 'duration': 6.735}, {'end': 4959.065, 'text': 'Now in systematic sampling, every nth record is chosen from the population to be a part of the sample.', 'start': 4951.4, 'duration': 7.665}, {'end': 4962.947, 'text': "Now refer this image that I've shown over here.", 'start': 4960.325, 'duration': 2.622}, {'end': 4968.29, 'text': 'Out of these six groups, every second group is chosen as a sample.', 'start': 4963.847, 'duration': 4.443}, {'end': 4973.892, 'text': 'Okay, so every second record is chosen here, and this is how systematic sampling works.', 'start': 4968.968, 'duration': 4.924}, {'end': 4979.236, 'text': "Okay, you're randomly selecting the nth record, and you're going to add that to your sample.", 'start': 4974.372, 'duration': 4.864}, {'end': 4981.778, 'text': 'Next, we have stratified sampling.', 'start': 4980.016, 'duration': 1.762}, {'end': 4987.922, 'text': 'Now, in this type of technique, a stratum is used to form samples from a large population.', 'start': 4982.298, 'duration': 5.624}, {'end': 4996.529, 'text': 'So what is a stratum? A stratum is basically a subset of the population that shares at least one common characteristics.', 'start': 4988.423, 'duration': 8.106}, {'end': 5000.534, 'text': "So let's say that your population has a mix of both male and female.", 'start': 4997.173, 'duration': 3.361}, {'end': 5003.135, 'text': 'So you can create two stratums out of this.', 'start': 5001.134, 'duration': 2.001}, {'end': 5007.256, 'text': 'One will have only the male subset and the other will have the female subset.', 'start': 5003.435, 'duration': 3.821}, {'end': 5009.557, 'text': 'This is what stratum is.', 'start': 5007.777, 'duration': 1.78}, {'end': 5015.339, 'text': 'It is basically a subset of the population that shares at least one common characteristics.', 'start': 5009.977, 'duration': 5.362}, {'end': 5018, 'text': 'In our example, it is gender.', 'start': 5016.299, 'duration': 1.701}, {'end': 5026.086, 'text': "So after you've created a stratum, you're going to use random sampling on these stratums and you're going to choose a final sample.", 'start': 5018.661, 'duration': 7.425}, {'end': 5034.951, 'text': 'So random sampling, meaning that all of the individuals in each of the stratum will have an equal chance of being selected in the sample correct?', 'start': 5026.526, 'duration': 8.425}, {'end': 5038.613, 'text': 'So, guys, these were the three different types of sampling techniques.', 'start': 5035.331, 'duration': 3.282}, {'end': 5044.757, 'text': "Now let's move on and look at our next topic which is the different types of statistics.", 'start': 5039.334, 'duration': 5.423}, {'end': 5050.191, 'text': "So after this we'll be looking at the more advanced concepts of statistics.", 'start': 5045.589, 'duration': 4.602}, {'end': 5054.513, 'text': 'So far, we discussed the basics of statistics, which is basically what is statistics,', 'start': 5050.231, 'duration': 4.282}, {'end': 5058.255, 'text': 'the different sampling techniques and the terminologies in statistics.', 'start': 5054.513, 'duration': 3.742}, {'end': 5061.216, 'text': 'Now we look at the different types of statistics.', 'start': 5058.975, 'duration': 2.241}, {'end': 5067.159, 'text': 'So there are two major types of statistics, descriptive statistics and inferential statistics.', 'start': 5061.896, 'duration': 5.263}, {'end': 5072.141, 'text': "In today's session we'll be discussing both of these types of statistics in depth.", 'start': 
5067.859, 'duration': 4.282}, {'end': 5083.871, 'text': "We'll also be looking at a demo which I'll be running in the R language in order to make you understand what exactly descriptive and inferential statistics is.", 'start': 5073.099, 'duration': 10.772}, {'end': 5088.175, 'text': "So guys, we're just gonna look at the basics, so don't worry if you don't have much knowledge.", 'start': 5084.511, 'duration': 3.664}, {'end': 5090.778, 'text': "I'm explaining everything from the basic level.", 'start': 5088.455, 'duration': 2.323}, {'end': 5091.678, 'text': 'All right?', 'start': 5091.378, 'duration': 0.3}, {'end': 5092.619, 'text': 'So, guys,', 'start': 5092.159, 'duration': 0.46}, {'end': 5103.026, 'text': 'descriptive statistics is a method which is used to describe and understand the features of specific data set by giving a short summary of the data,', 'start': 5092.619, 'duration': 10.407}, {'end': 5106.709, 'text': 'okay?. So it is mainly focused upon the characteristics of data.', 'start': 5103.026, 'duration': 3.683}, {'end': 5110.011, 'text': 'It also provides a graphical summary of the data.', 'start': 5107.29, 'duration': 2.721}, {'end': 5118.998, 'text': "Now, in order to make you understand what descriptive statistics is, let's suppose that you want to gift all your classmates a T-shirt.", 'start': 5110.812, 'duration': 8.186}, {'end': 5123.802, 'text': 'So to study the average shirt size of a student in a classroom.', 'start': 5119.56, 'duration': 4.242}, {'end': 5130.705, 'text': 'so if you were to use descriptive statistics to study the average shirt size of students in your classroom,', 'start': 5123.802, 'duration': 6.903}, {'end': 5139.769, 'text': 'then what you would do is you would record the shirt size of all students in the class and then you would find out the maximum minimum and average shirt size of the class.', 'start': 5130.705, 'duration': 9.064}, {'end': 5143.794, 'text': 'Okay, so coming to inferential statistics.', 'start': 5140.473, 'duration': 3.321}, {'end': 5152.015, 'text': 'inferential statistics makes inferences and predictions about a population based on the sample of data taken from the population.', 'start': 5143.794, 'duration': 8.221}, {'end': 5160.317, 'text': 'Okay, so in simple words, it generalizes a large data set and it applies probability to draw a conclusion.', 'start': 5152.495, 'duration': 7.822}, {'end': 5167.138, 'text': 'Okay, so it allows you to infer data parameters based on a statistical model by using sample data.', 'start': 5160.697, 'duration': 6.441}, {'end': 5174.622, 'text': 'So, if we consider the same example of finding the average shirt size of students in a class in inferential statistics,', 'start': 5167.757, 'duration': 6.865}, {'end': 5180.066, 'text': 'you will take a sample set of the class, which is basically a few people from the entire class.', 'start': 5174.622, 'duration': 5.444}, {'end': 5184.91, 'text': 'You already have had grouped the class into large, medium, and small.', 'start': 5181.127, 'duration': 3.783}, {'end': 5191.995, 'text': 'In this method, you basically build a statistical model and expand it for the entire population in the class.', 'start': 5185.79, 'duration': 6.205}, {'end': 5196.777, 'text': 'So guys, that was a brief understanding of descriptive and inferential statistics.', 'start': 5192.595, 'duration': 4.182}, {'end': 5199.859, 'text': "So that's the difference between descriptive and inferential.", 'start': 5197.178, 'duration': 2.681}, {'end': 5204.081, 'text': "Now in the next 
section, we'll go in depth about descriptive statistics.", 'start': 5200.419, 'duration': 3.662}, {'end': 5209.024, 'text': "So let's discuss more about descriptive statistics.", 'start': 5205.162, 'duration': 3.862}, {'end': 5211.612, 'text': 'So, like I mentioned earlier,', 'start': 5210.091, 'duration': 1.521}, {'end': 5222.396, 'text': 'descriptive statistics is a method that is used to describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data.', 'start': 5211.612, 'duration': 10.784}, {'end': 5226.058, 'text': 'There are two important measures in descriptive statistics.', 'start': 5222.957, 'duration': 3.101}, {'end': 5233.381, 'text': 'We have measure of central tendency, which is also known as measure of center, and we have measures of variability.', 'start': 5226.338, 'duration': 7.043}, {'end': 5235.722, 'text': 'This is also known as measures of spread.', 'start': 5233.761, 'duration': 1.961}, {'end': 5239.62, 'text': 'So measures of center include mean, median, and mode.', 'start': 5236.334, 'duration': 3.286}, {'end': 5247.613, 'text': 'Now what is measures of center? Measures of the center are statistical measures that represent the summary of a data set.', 'start': 5240.08, 'duration': 7.533}, {'end': 5252.069, 'text': 'The three main measures of center are mean, median, and mode.', 'start': 5248.548, 'duration': 3.521}, {'end': 5262.213, 'text': 'Coming to measures of variability or measures of spread, we have range, interquartile range, variance, and standard deviation.', 'start': 5252.429, 'duration': 9.784}, {'end': 5268.535, 'text': "So now let's discuss each of these measures in a little more depth, starting with the measures of center.", 'start': 5262.573, 'duration': 5.962}, {'end': 5271.235, 'text': "Now I'm sure all of you know what the mean is.", 'start': 5268.555, 'duration': 2.68}, {'end': 5275.997, 'text': 'Mean is basically the measure of the average of all the values in a sample.', 'start': 5271.515, 'duration': 4.482}, {'end': 5280.472, 'text': "Okay, so it's basically the average of all the values in a sample.", 'start': 5276.79, 'duration': 3.682}, {'end': 5284.474, 'text': 'How do you measure the mean? 
I hope all of you know how the mean is measured.', 'start': 5280.732, 'duration': 3.742}, {'end': 5289.396, 'text': 'If there are 10 numbers and you wanna find the mean of these 10 numbers,', 'start': 5285.094, 'duration': 4.302}, {'end': 5293.578, 'text': 'all you have to do is you have to add up all the 10 numbers and you have to divide it by 10.', 'start': 5289.396, 'duration': 4.182}, {'end': 5298.12, 'text': '10 here represents the number of samples in your data set.', 'start': 5293.578, 'duration': 4.542}, {'end': 5302.262, 'text': "Alright, since we have 10 numbers, we're gonna divide this by 10.", 'start': 5298.48, 'duration': 3.782}, {'end': 5304.863, 'text': 'Alright, this will give us the average or the mean.', 'start': 5302.262, 'duration': 2.601}, {'end': 5310.099, 'text': "So to better understand the measures of central tendency, let's look at an example.", 'start': 5305.655, 'duration': 4.444}, {'end': 5316.524, 'text': 'Now the data set over here is basically the cars data set and it contains a few variables.', 'start': 5310.479, 'duration': 6.045}, {'end': 5318.326, 'text': 'It has something known as cars.', 'start': 5316.965, 'duration': 1.361}, {'end': 5325.372, 'text': 'It has mileage per gallon, cylinder type, displacement, horsepower, and rear axle ratio.', 'start': 5318.866, 'duration': 6.506}, {'end': 5329.356, 'text': 'All right, all of these measures are related to cars, okay?', 'start': 5325.852, 'duration': 3.504}, {'end': 5340.026, 'text': "So what you're gonna do is you're going to use descriptive analysis and you're going to analyze each of the variables in the sample data set for the mean, standard deviation, median,", 'start': 5329.816, 'duration': 10.21}, {'end': 5340.927, 'text': 'mode, and so on.', 'start': 5340.026, 'duration': 0.901}, {'end': 5347.979, 'text': "So let's say that you want to find out the mean or the average horsepower of the cars among the population of cars.", 'start': 5341.494, 'duration': 6.485}, {'end': 5352.783, 'text': "Like I mentioned earlier, what you'll do is you'll check the average of all the values.", 'start': 5348.74, 'duration': 4.043}, {'end': 5360.169, 'text': "So in this case, we'll take the sum of the horsepower of each car and we'll divide that by the total number of cars.", 'start': 5353.344, 'duration': 6.825}, {'end': 5363.352, 'text': "Okay, that's exactly what I've done here in the calculation part.", 'start': 5360.59, 'duration': 2.762}, {'end': 5369.097, 'text': 'So this 110 basically represents the horsepower for the first car.', 'start': 5363.952, 'duration': 5.145}, {'end': 5376.867, 'text': "Similarly, I've just added up all the values of horsepower for each of the cars and I've divided it by eight.", 'start': 5369.958, 'duration': 6.909}, {'end': 5380.812, 'text': 'Now eight is basically the number of cars in our data set.', 'start': 5377.447, 'duration': 3.365}, {'end': 5387.494, 'text': 'So 103.625 is what our mean is, or the average of the horsepower.', 'start': 5382.772, 'duration': 4.722}, {'end': 5391.756, 'text': "Now let's understand what median is with an example.", 'start': 5388.775, 'duration': 2.981}, {'end': 5399.46, 'text': 'So to define median: the median is basically the measure of the central value of the sample set.', 'start': 5392.676, 'duration': 6.784}, {'end': 5402.061, 'text': 'You can say that it is a middle value.', 'start': 5400.2, 'duration': 1.861}, {'end': 5410.076, 'text': 'So, if you want to find out the center value of the mileage per gallon among the population
of cars first,', 'start': 5402.751, 'duration': 7.325}, {'end': 5417.54, 'text': "what we'll do is we'll arrange the MPG values in ascending or descending order and choose a middle value.", 'start': 5410.076, 'duration': 7.464}, {'end': 5423.924, 'text': 'In this case, since we have eight values, we have eight values, which is an even entry.', 'start': 5417.82, 'duration': 6.104}, {'end': 5432.835, 'text': "So whenever you have even number of data points or samples in your data set, then you're going to take the average of the two middle values.", 'start': 5424.773, 'duration': 8.062}, {'end': 5438.936, 'text': 'If we had nine values over here, we can easily figure out the middle value and choose that as a median.', 'start': 5433.335, 'duration': 5.601}, {'end': 5444.237, 'text': "But since there are even number of values, we're going to take the average of the two middle values.", 'start': 5439.296, 'duration': 4.941}, {'end': 5444.497, 'text': 'all right?', 'start': 5444.237, 'duration': 0.26}, {'end': 5452.219, 'text': "So 22.8 and 23 are my two middle values, and I'm taking the mean of those two, and hence I get 22.9,, which is my median.", 'start': 5444.837, 'duration': 7.382}, {'end': 5457.24, 'text': "Lastly, let's look at how mode is calculated.", 'start': 5454.999, 'duration': 2.241}, {'end': 5466.521, 'text': 'So what is mode? The value that is most recurrent in the sample set is known as mode, or basically the value that occurs most often.', 'start': 5457.68, 'duration': 8.841}, {'end': 5468.662, 'text': 'That is known as mode.', 'start': 5467.262, 'duration': 1.4}, {'end': 5474.403, 'text': "So let's say that we want to find out the most common type of cylinder among the population of cars.", 'start': 5469.282, 'duration': 5.121}, {'end': 5478.864, 'text': 'All we have to do is we will check the value which is repeated the most number of times.', 'start': 5475.203, 'duration': 3.661}, {'end': 5482.222, 'text': 'here we can see that the cylinders come in two types.', 'start': 5479.48, 'duration': 2.742}, {'end': 5486.246, 'text': 'we have cylinder of type four and cylinder of type six.', 'start': 5482.222, 'duration': 4.024}, {'end': 5488.187, 'text': 'right. 
so take a look at the data set.', 'start': 5486.246, 'duration': 1.941}, {'end': 5491.89, 'text': 'you can see that the most recurring value is six.', 'start': 5488.187, 'duration': 3.703}, {'end': 5496.674, 'text': 'right, if we count the sixes, we have one, two, three, four and five.', 'start': 5491.89, 'duration': 4.784}, {'end': 5501.278, 'text': 'so we have five sixes, and counting the fours we have one, two, three.', 'start': 5496.674, 'duration': 4.604}, {'end': 5506.329, 'text': 'yeah, we have three four-type cylinders and five six-type cylinders.', 'start': 5501.278, 'duration': 5.051}, {'end': 5512.133, 'text': 'So basically we have three four-type cylinders and we have five six-type cylinders.', 'start': 5506.909, 'duration': 5.224}, {'end': 5517.337, 'text': 'So our mode is going to be six since six is more recurrent than four.', 'start': 5513.074, 'duration': 4.263}, {'end': 5522.324, 'text': 'So guys, those were the measures of the center or the measures of central tendency.', 'start': 5517.981, 'duration': 4.343}, {'end': 5526.147, 'text': "Now let's move on and look at the measures of the spread.", 'start': 5522.924, 'duration': 3.223}, {'end': 5529.329, 'text': 'Now, what is the measure of spread?', 'start': 5526.727, 'duration': 2.602}, {'end': 5538.255, 'text': 'A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population.', 'start': 5529.729, 'duration': 8.526}, {'end': 5542.678, 'text': 'You can think of it as some sort of deviation in the sample.', 'start': 5539.015, 'duration': 3.663}, {'end': 5546.946, 'text': 'So you measure this with the help of the different measures of spread.', 'start': 5543.625, 'duration': 3.321}, {'end': 5551.027, 'text': 'We have range, interquartile range, variance, and standard deviation.', 'start': 5547.346, 'duration': 3.681}, {'end': 5553.788, 'text': 'Now range is pretty self-explanatory.', 'start': 5551.628, 'duration': 2.16}, {'end': 5559.43, 'text': 'It is a measure of how spread apart the values in a data set are.', 'start': 5554.288, 'duration': 5.142}, {'end': 5563.992, 'text': 'The range can be calculated as shown in this formula.', 'start': 5559.99, 'duration': 4.002}, {'end': 5570.476, 'text': "So you're basically going to subtract the minimum value in your data set from the maximum value in your data set.", 'start': 5564.393, 'duration': 6.083}, {'end': 5576.52, 'text': "That's how you calculate the range of the data, all right?
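To tie the measures of center (and the range) together in code, here is a minimal sketch using Python's standard statistics module. The individual numbers are made up, but they are chosen so the headline results match the ones quoted above: mean horsepower 103.625, median MPG 22.9, and mode of cylinders 6.

```python
# Minimal sketch of the measures of center (plus the range) using the standard
# library. The values below are illustrative, not the exact data from the slide;
# they are only chosen so the results match the numbers quoted in the session.
import statistics

horsepower = [110, 110, 93, 96, 90, 110, 110, 110]              # 8 hypothetical cars
mpg        = [22.8, 23.0, 21.0, 22.0, 19.0, 24.0, 25.0, 26.0]
cylinders  = [6, 6, 4, 6, 6, 4, 6, 4]                            # five 6s, three 4s

print("mean horsepower:", statistics.mean(horsepower))   # sum / count -> 103.625
print("median mpg:", statistics.median(mpg))              # average of the two middle values -> 22.9
print("mode of cylinders:", statistics.mode(cylinders))   # most frequent value -> 6
print("range of mpg:", max(mpg) - min(mpg))                # maximum minus minimum
```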
Next we have interquartile range.", 'start': 5570.896, 'duration': 5.624}, {'end': 5582.023, 'text': "So before we discuss interquartile range, let's understand what a quartile is.", 'start': 5577.26, 'duration': 4.763}, {'end': 5589.506, 'text': 'So quartiles basically tell us about the spread of a data set by breaking the data set into different quartiles.', 'start': 5583.042, 'duration': 6.464}, {'end': 5596.63, 'text': 'Just like how the median breaks the data into two parts, the quartile will break it into different quartiles.', 'start': 5590.246, 'duration': 6.384}, {'end': 5602.934, 'text': "So to better understand how quartile and interquartile are calculated, let's look at a small example.", 'start': 5597.231, 'duration': 5.703}, {'end': 5611.24, 'text': 'Now, this data set basically represents the marks of 100 students ordered from the lowest to the highest scores.', 'start': 5603.735, 'duration': 7.505}, {'end': 5614.842, 'text': 'Alright, so the quartiles lie in the following ranges.', 'start': 5611.92, 'duration': 2.922}, {'end': 5620.586, 'text': 'Now, the first quartile, which is also known as Q1, it lies between the 25th and the 26th observation.', 'start': 5614.862, 'duration': 5.724}, {'end': 5626.77, 'text': "All right, so if you look at this, I've highlighted the 25th and the 26th observation.", 'start': 5622.507, 'duration': 4.263}, {'end': 5632.915, 'text': 'So how you can calculate Q1 or first quartile is by taking the average of these two values.', 'start': 5627.19, 'duration': 5.725}, {'end': 5639.499, 'text': "All right, since both the values are 45, when you add them up and divide them by two, you'll still get 45.", 'start': 5633.375, 'duration': 6.124}, {'end': 5644.723, 'text': 'Now the second quartile or Q2 is between the 50th and the 51st observation.', 'start': 5639.499, 'duration': 5.224}, {'end': 5651.619, 'text': "So you're going to take the average of 58 and 59 and you'll get a value of 58.5.", 'start': 5645.345, 'duration': 6.274}, {'end': 5653.043, 'text': 'Now this is my second quarter.', 'start': 5651.619, 'duration': 1.424}, {'end': 5659.527, 'text': 'the third quartile, or q3, is between the 75th and the 76th observation.', 'start': 5653.641, 'duration': 5.886}, {'end': 5669.195, 'text': "here again, you'll take the average of the two values, which is the 75th value and the 76th value, all right, and you'll get a value of 71 all right.", 'start': 5659.527, 'duration': 9.668}, {'end': 5672.679, 'text': 'so, guys, this is exactly how you calculate the different quartiles.', 'start': 5669.195, 'duration': 3.484}, {'end': 5675.661, 'text': "now let's look at what is interquartile range.", 'start': 5672.679, 'duration': 2.982}, {'end': 5683.649, 'text': 'so iqr, or the interquartile range, is a measure of variability based on dividing a data set into quartiles.', 'start': 5675.661, 'duration': 7.988}, {'end': 5689.395, 'text': 'Now the interquartile range is calculated by subtracting the Q1 from Q3.', 'start': 5684.19, 'duration': 5.205}, {'end': 5692.938, 'text': 'So basically Q3 minus Q1 is your IQR.', 'start': 5690.055, 'duration': 2.883}, {'end': 5696.602, 'text': 'So your IQR is your Q3 minus Q1.', 'start': 5693.999, 'duration': 2.603}, {'end': 5699.745, 'text': 'Now this is how each of the quartiles are.', 'start': 5697.583, 'duration': 2.162}, {'end': 5703.579, 'text': 'Each quartile represents a quarter which is 25%.', 'start': 5699.965, 'duration': 3.614}, {'end': 5708.427, 'text': 'Alright, so guys I hope all of you are clear with 
interquartile range and what are quartiles.', 'start': 5703.579, 'duration': 4.848}, {'end': 5711.071, 'text': "Now let's look at variance.", 'start': 5708.907, 'duration': 2.164}, {'end': 5718.317, 'text': 'Now, variance is basically a measure that shows how much a random variable differs from its expected value.', 'start': 5711.753, 'duration': 6.564}, {'end': 5721.219, 'text': "Okay, it's basically the variance in any variable.", 'start': 5718.677, 'duration': 2.542}, {'end': 5724.421, 'text': 'Now, variance can be calculated by using this formula.', 'start': 5721.679, 'duration': 2.742}, {'end': 5728.983, 'text': 'Right here, X basically represents any data point in your data set.', 'start': 5724.441, 'duration': 4.542}, {'end': 5735.267, 'text': 'N is the total number of data points in your data set, and X bar is basically the mean of data points.', 'start': 5729.544, 'duration': 5.723}, {'end': 5738.349, 'text': 'All right, this is how you calculate variance.', 'start': 5735.847, 'duration': 2.502}, {'end': 5742.393, 'text': 'Variance is basically computing the squares of deviations.', 'start': 5738.889, 'duration': 3.504}, {'end': 5744.394, 'text': "Okay, that's why it says s squared there.", 'start': 5742.633, 'duration': 1.761}, {'end': 5747.638, 'text': "Now let's look at what is deviation.", 'start': 5745.075, 'duration': 2.563}, {'end': 5751.301, 'text': 'Deviation is just the difference between each element from the mean.', 'start': 5748.058, 'duration': 3.243}, {'end': 5760.79, 'text': 'Okay, so it can be calculated by using this simple formula where xi basically represents a data point and mu is the mean of the population.', 'start': 5751.761, 'duration': 9.029}, {'end': 5763.833, 'text': 'Alright, this is exactly how you calculate deviation.', 'start': 5760.81, 'duration': 3.023}, {'end': 5773.668, 'text': "Now, population variance and sample variance are very specific to whether you're calculating the variance in your population data set or in your sample data set.", 'start': 5764.466, 'duration': 9.202}, {'end': 5776.889, 'text': "That's the only difference between population and sample variance.", 'start': 5774.028, 'duration': 2.861}, {'end': 5780.929, 'text': 'So the formula for population variance is pretty explanatory.', 'start': 5777.509, 'duration': 3.42}, {'end': 5783.17, 'text': 'So Xi is basically each data point.', 'start': 5781.29, 'duration': 1.88}, {'end': 5785.23, 'text': 'Mu is the mean of the population.', 'start': 5783.33, 'duration': 1.9}, {'end': 5787.951, 'text': 'N is the number of samples in your data set.', 'start': 5785.931, 'duration': 2.02}, {'end': 5791.312, 'text': "Now let's look at sample variance.", 'start': 5789.211, 'duration': 2.101}, {'end': 5796.154, 'text': 'Now sample variance is the average of squared differences from the mean.', 'start': 5791.812, 'duration': 4.342}, {'end': 5799.635, 'text': 'Here xi is any data point or any sample in your data set.', 'start': 5796.374, 'duration': 3.261}, {'end': 5803.037, 'text': 'X bar is the mean of your sample.', 'start': 5800.155, 'duration': 2.882}, {'end': 5804.757, 'text': "It's not the mean of your population.", 'start': 5803.137, 'duration': 1.62}, {'end': 5806.358, 'text': "It's the mean of your sample.", 'start': 5805.237, 'duration': 1.121}, {'end': 5809.639, 'text': 'And if you notice, n here is a smaller n.', 'start': 5806.658, 'duration': 2.981}, {'end': 5811.82, 'text': "It's the number of data points in your sample.", 'start': 5809.639, 'duration': 2.181}, {'end': 5816.181, 
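Going back to the quartile example for a moment: the quartiles above were computed by averaging the two observations that bracket each quartile position (observations 25 and 26 for Q1, 50 and 51 for Q2, 75 and 76 for Q3, out of 100 sorted marks). Here is a minimal sketch of that convention; note it is only one of several quartile conventions (library defaults such as numpy's interpolation can differ slightly), and the sample marks below are made up rather than taken from the slide.

```python
# Minimal sketch of the quartile convention used in the marks example:
# with n sorted observations (1-based positions), Q1 averages positions n/4 and
# n/4 + 1, Q2 averages n/2 and n/2 + 1, Q3 averages 3n/4 and 3n/4 + 1.
# This is one convention among several; library defaults may differ slightly.

def quartiles(sorted_values):
    n = len(sorted_values)

    def avg_at(position):                    # 1-based position of the lower neighbour
        lo = sorted_values[position - 1]     # convert to 0-based indexing
        hi = sorted_values[position]
        return (lo + hi) / 2

    q1 = avg_at(n // 4)
    q2 = avg_at(n // 2)
    q3 = avg_at(3 * n // 4)
    return q1, q2, q3

# Tiny made-up set of 8 sorted marks, just to exercise the function.
marks = [35, 45, 45, 52, 58, 59, 71, 80]
q1, q2, q3 = quartiles(marks)
print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", q3 - q1)   # IQR = Q3 - Q1
```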
'text': 'And this is basically the difference between sample and population variance.', 'start': 5812.74, 'duration': 3.441}, {'end': 5817.362, 'text': 'I hope that is clear.', 'start': 5816.562, 'duration': 0.8}, {'end': 5826.268, 'text': "Coming to standard deviation, it is the measure of dispersion of a set of data from its mean, all right? So it's basically the deviation from your mean.", 'start': 5818.18, 'duration': 8.088}, {'end': 5828.03, 'text': "That's what standard deviation is.", 'start': 5826.468, 'duration': 1.562}, {'end': 5834.276, 'text': "Now to better understand how the measures of spread are calculated, let's look at a small use case.", 'start': 5828.57, 'duration': 5.706}, {'end': 5837.215, 'text': "So let's say Daenerys has 20 dragons.", 'start': 5834.873, 'duration': 2.342}, {'end': 5842.739, 'text': 'They have the numbers nine, two, five, four, and so on as shown on the screen.', 'start': 5837.595, 'duration': 5.144}, {'end': 5845.861, 'text': 'What you have to do is you have to work out the standard deviation.', 'start': 5843.019, 'duration': 2.842}, {'end': 5850.744, 'text': 'All right, in order to calculate the standard deviation, you need to know the mean, right?', 'start': 5846.321, 'duration': 4.423}, {'end': 5854.046, 'text': "So first you're gonna find out the mean of your sample set.", 'start': 5851.224, 'duration': 2.822}, {'end': 5862.112, 'text': 'So how do you calculate the mean? You add all the numbers in your data set and divide it by the total number of samples in your data set.', 'start': 5854.507, 'duration': 7.605}, {'end': 5864.639, 'text': 'So you get a value of seven here.', 'start': 5862.777, 'duration': 1.862}, {'end': 5868.824, 'text': 'Then you calculate the RHS of your standard deviation formula.', 'start': 5865.18, 'duration': 3.644}, {'end': 5873.09, 'text': "So from each data point, you're going to subtract the mean and you're going to square that.", 'start': 5869.365, 'duration': 3.725}, {'end': 5876.814, 'text': "So when you do that, you'll get the following result.", 'start': 5873.891, 'duration': 2.923}, {'end': 5878.836, 'text': "You'll basically get the squared differences: 4, 25, 4, 9, 25, and so on.", 'start': 5877.235, 'duration': 1.601}, {'end': 5886.001, 'text': 'So finally, you will just find the mean of these squared differences, all right?', 'start': 5882.04, 'duration': 3.961}, {'end': 5891.722, 'text': 'So your standard deviation will come up to 2.983 once you take the square root.', 'start': 5886.361, 'duration': 5.361}, {'end': 5893.483, 'text': 'So guys, this is pretty simple.', 'start': 5892.303, 'duration': 1.18}, {'end': 5895.483, 'text': "It's a simple mathematical technique.", 'start': 5893.623, 'duration': 1.86}, {'end': 5902.325, 'text': 'All you have to do is you have to substitute the values in the formula, all right?
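To check the dragon example numerically, here is a minimal sketch. The session only reads out the first few values (nine, two, five, four, ...), so the full list of 20 numbers below is an assumption based on the data set commonly used with this example; it does reproduce the quoted mean of 7 and standard deviation of roughly 2.983.

```python
# Minimal sketch: population variance and standard deviation for the 20-dragon
# example. The full list of 20 values is assumed (the session only reads out
# the first few: 9, 2, 5, 4, ...), so treat it as illustrative.
import math

dragons = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(dragons)                                     # 20
mean = sum(dragons) / n                              # 7.0
squared_devs = [(x - mean) ** 2 for x in dragons]    # 4, 25, 4, 9, 25, ...

population_variance = sum(squared_devs) / n          # divide by N
sample_variance     = sum(squared_devs) / (n - 1)    # divide by n - 1

print("mean:", mean)
print("population variance:", population_variance)            # 8.9
print("population std dev:", math.sqrt(population_variance))  # ~2.983
print("sample variance:", sample_variance)
```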
I hope this was clear to all of you.', 'start': 5895.603, 'duration': 6.722}, {'end': 5908.751, 'text': "Now let's move on and discuss the next topic which is information gain and entropy.", 'start': 5903.889, 'duration': 4.862}, {'end': 5911.452, 'text': 'Now this is one of my favorite topics in statistics.', 'start': 5909.251, 'duration': 2.201}, {'end': 5919.436, 'text': "It's very interesting and this topic is mainly involved in machine learning algorithms like decision trees and random forest.", 'start': 5911.572, 'duration': 7.864}, {'end': 5927.259, 'text': "It's very important for you to know how information gain and entropy really work and why they're so essential in building machine learning models.", 'start': 5920.096, 'duration': 7.163}, {'end': 5931.16, 'text': "We'll focus on the statistic parts of information gain and entropy.", 'start': 5927.959, 'duration': 3.201}, {'end': 5937.823, 'text': "And after that, we'll discuss a use case and see how information gain and entropy is used in decision trees.", 'start': 5931.74, 'duration': 6.083}, {'end': 5943.225, 'text': "So for those of you who don't know what a decision tree is, it is basically a machine learning algorithm.", 'start': 5938.343, 'duration': 4.882}, {'end': 5944.985, 'text': "You don't have to know anything about this.", 'start': 5943.505, 'duration': 1.48}, {'end': 5947.386, 'text': "I'll explain everything in depth, so don't worry.", 'start': 5945.125, 'duration': 2.261}, {'end': 5951.768, 'text': "Now let's look at what exactly entropy and information gain is.", 'start': 5947.846, 'duration': 3.922}, {'end': 5958.338, 'text': 'Now guys, entropy is basically the measure of any sort of uncertainty that is present in the data.', 'start': 5953.095, 'duration': 5.243}, {'end': 5961.84, 'text': 'Alright, so it can be measured by using this formula.', 'start': 5958.838, 'duration': 3.002}, {'end': 5968.664, 'text': 'So here S is the set of all instances in the data set or all the data items in the data set.', 'start': 5962.22, 'duration': 6.444}, {'end': 5971.966, 'text': 'N is the different type of classes in your data set.', 'start': 5969.104, 'duration': 2.862}, {'end': 5974.327, 'text': 'PI is the event probability.', 'start': 5972.546, 'duration': 1.781}, {'end': 5981.869, 'text': "Now, this might seem a little confusing to you all, but when we go through the use case, you'll understand all of these terms even better.", 'start': 5974.847, 'duration': 7.022}, {'end': 5982.149, 'text': 'all right?', 'start': 5981.869, 'duration': 0.28}, {'end': 5985.87, 'text': 'Coming to information gain, as the word suggests,', 'start': 5982.589, 'duration': 3.281}, {'end': 5993.992, 'text': 'information gain indicates how much information a particular feature or a particular variable gives us about the final outcome.', 'start': 5985.87, 'duration': 8.122}, {'end': 5996.773, 'text': 'Okay, it can be measured by using this formula.', 'start': 5994.472, 'duration': 2.301}, {'end': 6000.994, 'text': 'So again here, H is the entropy of the whole data set S.', 'start': 5997.293, 'duration': 3.701}, {'end': 6006.095, 'text': 'Sj is the number of instances with the j value of an attribute A.', 'start': 6000.994, 'duration': 5.101}, {'end': 6008.676, 'text': 'S is the total number of instances in the data set.', 'start': 6006.095, 'duration': 2.581}, {'end': 6012.757, 'text': 'V is the set of distinct values of an attribute A.', 'start': 6008.976, 'duration': 3.781}, {'end': 6016.158, 'text': 'H is the entropy of subset of instances.', 
'start': 6012.757, 'duration': 3.401}, {'end': 6020.499, 'text': 'And H is the entropy of an attribute A.', 'start': 6016.798, 'duration': 3.701}, {'end': 6024.504, 'text': "Even though this seems confusing, I'll clear out the confusion.", 'start': 6021.261, 'duration': 3.243}, {'end': 6033.53, 'text': "Let's discuss a small problem statement where we'll understand how information gain and entropy is used to study the significance of a model.", 'start': 6025.124, 'duration': 8.406}, {'end': 6042.437, 'text': 'So like I said, information gain and entropy are very important statistical measures that let us understand the significance of a predictive model.', 'start': 6034.091, 'duration': 8.346}, {'end': 6046.62, 'text': "To get a more clear understanding, let's look at a use case.", 'start': 6043.377, 'duration': 3.243}, {'end': 6051.221, 'text': "All right, now suppose we're given a problem statement.", 'start': 6048.519, 'duration': 2.702}, {'end': 6058.284, 'text': 'All right, the statement is that you have to predict whether a match can be played or not by studying the weather conditions.', 'start': 6051.601, 'duration': 6.683}, {'end': 6062.187, 'text': 'So the predictor variables here are outlook, humidity, wind.', 'start': 6058.865, 'duration': 3.322}, {'end': 6064.388, 'text': 'Day is also a predictor variable.', 'start': 6062.207, 'duration': 2.181}, {'end': 6066.789, 'text': 'The target variable is basically play.', 'start': 6064.768, 'duration': 2.021}, {'end': 6071.212, 'text': "All right, the target variable is the variable that you're trying to predict.", 'start': 6067.189, 'duration': 4.023}, {'end': 6076.767, 'text': 'okay?. Now, the value of the target variable will decide whether or not a game can be played.', 'start': 6071.212, 'duration': 5.555}, {'end': 6079.528, 'text': "So that's why the play has two values.", 'start': 6077.547, 'duration': 1.981}, {'end': 6080.649, 'text': 'It has no and yes.', 'start': 6079.608, 'duration': 1.041}, {'end': 6085.531, 'text': 'No meaning that the weather conditions are not good and therefore you cannot play the game.', 'start': 6081.209, 'duration': 4.322}, {'end': 6091.253, 'text': 'Yes meaning that the weather conditions are good and suitable for you to play the game.', 'start': 6086.111, 'duration': 5.142}, {'end': 6093.554, 'text': 'So that was a problem statement.', 'start': 6091.973, 'duration': 1.581}, {'end': 6095.775, 'text': 'I hope the problem statement is clear to all of you.', 'start': 6093.654, 'duration': 2.121}, {'end': 6100.497, 'text': 'Now to solve such a problem, we make use of something known as decision trees.', 'start': 6096.415, 'duration': 4.082}, {'end': 6106.673, 'text': 'So guys, think of an inverted tree, and each branch of the tree denotes some decision.', 'start': 6101.169, 'duration': 5.504}, {'end': 6106.953, 'text': 'all right?', 'start': 6106.673, 'duration': 0.28}, {'end': 6109.975, 'text': 'Each branch is known as the branch node,', 'start': 6107.293, 'duration': 2.682}, {'end': 6117.36, 'text': "and at each branch node you're going to take a decision in such a manner that you'll get an outcome at the end of the branch.", 'start': 6109.975, 'duration': 7.385}, {'end': 6117.66, 'text': 'all right?', 'start': 6117.36, 'duration': 0.3}, {'end': 6127.787, 'text': 'Now, this figure here basically shows that out of 14 observations, nine observations result in a yes, meaning that out of 14 days,', 'start': 6118.301, 'duration': 9.486}, {'end': 6130.609, 'text': 'the match can be played only on nine 
days.', 'start': 6127.787, 'duration': 2.822}, {'end': 6130.849, 'text': 'all right?', 'start': 6130.609, 'duration': 0.24}, {'end': 6138.349, 'text': 'So here, if you see on day one, day two, day eight, day nine and day 11, the outlook has been sunny.', 'start': 6131.602, 'duration': 6.747}, {'end': 6143.434, 'text': "So basically we're trying to cluster our data set depending on the outlook.", 'start': 6139.29, 'duration': 4.144}], 'summary': 'Focused on probability sampling and descriptive & inferential statistics, discussing measures of central tendency, variability, and information gain in statistics.', 'duration': 24.896, 'max_score': 4896.786, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s4896786.jpg'}, {'end': 5226.058, 'src': 'embed', 'start': 5197.178, 'weight': 1, 'content': [{'end': 5199.859, 'text': "So that's the difference between descriptive and inferential.", 'start': 5197.178, 'duration': 2.681}, {'end': 5204.081, 'text': "Now in the next section, we'll go in depth about descriptive statistics.", 'start': 5200.419, 'duration': 3.662}, {'end': 5209.024, 'text': "So let's discuss more about descriptive statistics.", 'start': 5205.162, 'duration': 3.862}, {'end': 5211.612, 'text': 'So, like I mentioned earlier,', 'start': 5210.091, 'duration': 1.521}, {'end': 5222.396, 'text': 'descriptive statistics is a method that is used to describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data.', 'start': 5211.612, 'duration': 10.784}, {'end': 5226.058, 'text': 'There are two important measures in descriptive statistics.', 'start': 5222.957, 'duration': 3.101}], 'summary': 'Descriptive statistics describe features of data, including measures, in depth.', 'duration': 28.88, 'max_score': 5197.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s5197178.jpg'}, {'end': 5275.997, 'src': 'embed', 'start': 5248.548, 'weight': 2, 'content': [{'end': 5252.069, 'text': 'The three main measures of center are mean, median, and mode.', 'start': 5248.548, 'duration': 3.521}, {'end': 5262.213, 'text': 'Coming to measures of variability or measures of spread, we have range, interquartile range, variance, and standard deviation.', 'start': 5252.429, 'duration': 9.784}, {'end': 5268.535, 'text': "So now let's discuss each of these measures in a little more depth, starting with the measures of center.", 'start': 5262.573, 'duration': 5.962}, {'end': 5271.235, 'text': "Now I'm sure all of you know what the mean is.", 'start': 5268.555, 'duration': 2.68}, {'end': 5275.997, 'text': 'Mean is basically the measure of the average of all the values in a sample.', 'start': 5271.515, 'duration': 4.482}], 'summary': 'Measures of center: mean, median, mode. 
measures of variability: range, interquartile range, variance, standard deviation.', 'duration': 27.449, 'max_score': 5248.548, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s5248548.jpg'}, {'end': 5696.602, 'src': 'embed', 'start': 5669.195, 'weight': 3, 'content': [{'end': 5672.679, 'text': 'so, guys, this is exactly how you calculate the different quartiles.', 'start': 5669.195, 'duration': 3.484}, {'end': 5675.661, 'text': "now let's look at what is interquartile range.", 'start': 5672.679, 'duration': 2.982}, {'end': 5683.649, 'text': 'so iqr, or the interquartile range, is a measure of variability based on dividing a data set into quartiles.', 'start': 5675.661, 'duration': 7.988}, {'end': 5689.395, 'text': 'Now the interquartile range is calculated by subtracting the Q1 from Q3.', 'start': 5684.19, 'duration': 5.205}, {'end': 5692.938, 'text': 'So basically Q3 minus Q1 is your IQR.', 'start': 5690.055, 'duration': 2.883}, {'end': 5696.602, 'text': 'So your IQR is your Q3 minus Q1.', 'start': 5693.999, 'duration': 2.603}], 'summary': 'Interquartile range (iqr) is calculated by subtracting q1 from q3, representing the measure of variability in a data set.', 'duration': 27.407, 'max_score': 5669.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s5669195.jpg'}, {'end': 6012.757, 'src': 'embed', 'start': 5982.589, 'weight': 5, 'content': [{'end': 5985.87, 'text': 'Coming to information gain, as the word suggests,', 'start': 5982.589, 'duration': 3.281}, {'end': 5993.992, 'text': 'information gain indicates how much information a particular feature or a particular variable gives us about the final outcome.', 'start': 5985.87, 'duration': 8.122}, {'end': 5996.773, 'text': 'Okay, it can be measured by using this formula.', 'start': 5994.472, 'duration': 2.301}, {'end': 6000.994, 'text': 'So again here, H is the entropy of the whole data set S.', 'start': 5997.293, 'duration': 3.701}, {'end': 6006.095, 'text': 'Sj is the number of instances with the j value of an attribute A.', 'start': 6000.994, 'duration': 5.101}, {'end': 6008.676, 'text': 'S is the total number of instances in the data set.', 'start': 6006.095, 'duration': 2.581}, {'end': 6012.757, 'text': 'V is the set of distinct values of an attribute A.', 'start': 6008.976, 'duration': 3.781}], 'summary': "Information gain measures feature's predictiveness, using entropy, instances, and distinct values.", 'duration': 30.168, 'max_score': 5982.589, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s5982589.jpg'}, {'end': 6091.253, 'src': 'embed', 'start': 6058.865, 'weight': 8, 'content': [{'end': 6062.187, 'text': 'So the predictor variables here are outlook, humidity, wind.', 'start': 6058.865, 'duration': 3.322}, {'end': 6064.388, 'text': 'Day is also a predictor variable.', 'start': 6062.207, 'duration': 2.181}, {'end': 6066.789, 'text': 'The target variable is basically play.', 'start': 6064.768, 'duration': 2.021}, {'end': 6071.212, 'text': "All right, the target variable is the variable that you're trying to predict.", 'start': 6067.189, 'duration': 4.023}, {'end': 6076.767, 'text': 'okay?. 
Now, the value of the target variable will decide whether or not a game can be played.', 'start': 6071.212, 'duration': 5.555}, {'end': 6079.528, 'text': "So that's why the play has two values.", 'start': 6077.547, 'duration': 1.981}, {'end': 6080.649, 'text': 'It has no and yes.', 'start': 6079.608, 'duration': 1.041}, {'end': 6085.531, 'text': 'No meaning that the weather conditions are not good and therefore you cannot play the game.', 'start': 6081.209, 'duration': 4.322}, {'end': 6091.253, 'text': 'Yes meaning that the weather conditions are good and suitable for you to play the game.', 'start': 6086.111, 'duration': 5.142}], 'summary': 'Predictor variables include outlook, humidity, wind, and day; target variable is play with values: yes and no.', 'duration': 32.388, 'max_score': 6058.865, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6058865.jpg'}, {'end': 6419.503, 'src': 'embed', 'start': 6390.412, 'weight': 4, 'content': [{'end': 6392.773, 'text': 'You just substitute the values in the formula.', 'start': 6390.412, 'duration': 2.361}, {'end': 6396.394, 'text': "So when you substitute the values in the formula, you'll get a value of 0.940.", 'start': 6392.953, 'duration': 3.441}, {'end': 6403.517, 'text': 'This is the entropy or this is the uncertainty of the data present in our sample.', 'start': 6396.394, 'duration': 7.123}, {'end': 6408.439, 'text': 'Now, in order to ensure that we choose the best variable for the root node.', 'start': 6404.258, 'duration': 4.181}, {'end': 6413.121, 'text': 'let us look at all the possible combinations that you can use on the root node.', 'start': 6408.439, 'duration': 4.682}, {'end': 6415.502, 'text': 'So these are all the possible combinations.', 'start': 6413.641, 'duration': 1.861}, {'end': 6419.503, 'text': 'You can either have outlook, you can have windy, humidity, or temperature.', 'start': 6415.562, 'duration': 3.941}], 'summary': 'Entropy value of 0.940 indicates uncertainty in sample data. evaluating possible combinations for root node variables.', 'duration': 29.091, 'max_score': 6390.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6390412.jpg'}], 'start': 3996.785, 'title': 'Understanding the data science process and statistical analysis', 'summary': 'Provides an overview of the data science process, covering stages such as data acquisition, processing, exploration, modeling, and deployment. it also explains ordinal and quantitative data, sampling techniques, descriptive and inferential statistics, measures of central tendency and variability, as well as cylinders, measures of spread, information gain, and entropy in machine learning. 
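The entropy of 0.940 for the 9-yes/5-no split, and the comparison of information gain across outlook, temperature, humidity, and windy, can be reproduced with a short script. The 14-row table below is the standard play-tennis weather data that this example follows; it is assumed here rather than copied from the slides, so treat the per-attribute values as illustrative. With this table, outlook comes out highest at roughly 0.247, which is why it is chosen for the root node.

```python
# Minimal sketch of entropy and information gain for the weather/play example.
# The 14-row table is the standard "play tennis" data set this example follows;
# it is assumed here (not copied from the slide), so treat it as illustrative.
import math
from collections import Counter

# (outlook, temperature, humidity, windy, play)
rows = [
    ("sunny", "hot", "high", False, "no"),     ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"), ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"), ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"), ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),  ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]
ATTRS = {"outlook": 0, "temperature": 1, "humidity": 2, "windy": 3}

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    """Gain(S, A) = H(S) - sum(|S_j| / |S| * H(S_j)) over the values j of attribute A."""
    base = entropy([r[-1] for r in rows])
    value_counts = Counter(r[attr_index] for r in rows)
    remainder = sum(
        (count / len(rows)) * entropy([r[-1] for r in rows if r[attr_index] == value])
        for value, count in value_counts.items()
    )
    return base - remainder

print("entropy of the full data set:", round(entropy([r[-1] for r in rows]), 3))  # ~0.940
for name, idx in ATTRS.items():
    print(f"information gain({name}):", round(information_gain(rows, idx), 3))
# Outlook should come out highest (~0.247), so it is chosen as the root node.
```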
it emphasizes the importance of data types and statistical techniques in various analyses.', 'chapters': [{'end': 4301.751, 'start': 3996.785, 'title': 'Data science process overview', 'summary': 'Outlines the data science process, covering key stages such as data acquisition, processing, exploration, modeling, and deployment, and also provides insights on the importance and types of data.', 'duration': 304.966, 'highlights': ['Data Acquisition This stage emphasizes the identification of central project objectives and the process of gathering data from various sources to support these objectives.', 'Data Modeling The process of model training involves splitting the input data into training and testing sets, building a model, and evaluating it using machine learning algorithms to find the most suitable model for business requirements.', 'Data Processing Data cleaning is highlighted as a time-consuming task that involves formatting, structuring, and removing unnecessary or inconsistent data from the collected datasets to prepare it for analysis.', 'Data Exploration The stage involves using visualization techniques like histograms and exploring different models to understand patterns and prepare for analysis.', 'Importance and Types of Data The importance of data in making better business decisions is emphasized, along with insights into qualitative and quantitative data, including subcategories of nominal, ordinal, discrete, and continuous data.']}, {'end': 4691.626, 'start': 4303.011, 'title': 'Understanding data: ordinal and quantitative data', 'summary': 'Explains the concept of ordinal data and quantitative data, including discrete and continuous data, and introduces basic terminologies in statistics, emphasizing the significance of population and sample in statistical analysis.', 'duration': 388.615, 'highlights': ['Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation, and presentation, including visualizing and interpreting data using statistical methods. Statistics encompasses data collection, analysis, interpretation, and presentation, serving as a tool to solve complex problems and visualize data.', 'The chapter introduces the concept of ordinal data and its representation through customer ratings, illustrating the concept of data with some sort of sequence or order. Ordinal data is explained through the example of customer ratings at a restaurant, demonstrating data with a sequence or order.', 'The distinction between discrete and continuous data is clearly explained, demonstrating the finite and infinite number of possible values. The chapter explains the difference between discrete and continuous data, highlighting the finite and infinite number of possible values they can hold.', 'The significance of basic terminologies in statistics, namely population and sample, is emphasized, implying their recurring importance throughout statistics courses and problem-solving. The importance of basic terminologies in statistics, specifically population and sample, is highlighted, indicating their significance in statistical analysis.']}, {'end': 5478.864, 'start': 4692.293, 'title': 'Sampling techniques and types of statistics', 'summary': 'Explains the importance of sampling techniques in statistical analysis, emphasizing the role of probability sampling and its three types: random, systematic, and stratified sampling. 
it also provides a clear understanding of descriptive and inferential statistics, emphasizing their differences and applications, and delves into the measures of central tendency and variability in descriptive statistics.', 'duration': 786.571, 'highlights': ['The chapter emphasizes the importance of probability sampling and its three types: random, systematic, and stratified sampling, in inferring statistical knowledge about a population. Probability sampling is highlighted as a crucial method for inferring statistical knowledge about a population, with an emphasis on its three types.', 'The chapter provides a detailed understanding of descriptive and inferential statistics, highlighting their differences and applications in statistical analysis. Clear explanation of the differences and applications of descriptive and inferential statistics is provided, enhancing understanding of statistical analysis.', 'The chapter delves into the measures of central tendency and variability in descriptive statistics, explaining the concepts of mean, median, mode, range, interquartile range, variance, and standard deviation. In-depth explanation of measures of central tendency and variability in descriptive statistics, covering concepts such as mean, median, mode, range, interquartile range, variance, and standard deviation.']}, {'end': 5919.436, 'start': 5479.48, 'title': 'Cylinders and measures of spread', 'summary': 'Covers the types of cylinders, mode calculation, measures of spread including range, interquartile range, variance, and standard deviation, with an example of calculating these measures and explaining population and sample variance, as well as standard deviation. it also briefly mentions information gain and entropy in machine learning.', 'duration': 439.956, 'highlights': ['Explanation of Interquartile Range and quartiles calculation Interquartile range (IQR) is explained as the variability measure based on quartiles, with a detailed example of quartile calculation and IQR formula.', 'Calculation and explanation of standard deviation with a practical example The calculation of standard deviation is demonstrated through a practical example of finding the mean and applying the standard deviation formula to a given dataset of numbers.', 'Definition and calculation of variance, including population and sample variance Variance is defined as a measure of how much a random variable differs from its expected value, with explanations and formulas provided for both population and sample variance calculation.']}, {'end': 6528.793, 'start': 5920.096, 'title': 'Understanding information gain and entropy', 'summary': 'Explains the concepts of information gain and entropy, their importance in machine learning, using a decision tree use case, and calculating information gain for attributes, with a focus on the example of predicting whether a match can be played based on weather conditions.', 'duration': 608.697, 'highlights': ['Entropy measures uncertainty in the data and can be calculated using a specific formula. Entropy is a measure of uncertainty in the data and can be calculated using a formula involving the set of instances, the different classes, and event probability.', 'Information gain quantifies the usefulness of a feature in predicting the final outcome and can be calculated using a specific formula. 
Information gain indicates the amount of information a feature provides about the final outcome and can be calculated using a formula involving the entropy of the whole dataset, the number of instances with a specific attribute value, and the entropy of subsets.', "The use case illustrates how decision trees, information gain, and entropy are used to study the significance of a predictive model, particularly in predicting whether a match can be played based on weather conditions. The use case involves predicting whether a match can be played based on weather conditions, with predictor variables including outlook, humidity, wind, and day, and the target variable being 'play.' Decision trees, information gain, and entropy are used to analyze the significance of the predictive model."]}], 'duration': 2532.008, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s3996785.jpg', 'highlights': ['Probability sampling is crucial for inferring statistical knowledge about a population, emphasizing its three types.', 'Descriptive and inferential statistics are explained, highlighting their differences and applications in statistical analysis.', 'In-depth explanation of measures of central tendency and variability in descriptive statistics, covering concepts such as mean, median, mode, range, interquartile range, variance, and standard deviation.', 'Interquartile range (IQR) is explained as the variability measure based on quartiles, with a detailed example of quartile calculation and IQR formula.', 'Entropy is a measure of uncertainty in the data and can be calculated using a formula involving the set of instances, the different classes, and event probability.', 'Information gain quantifies the usefulness of a feature in predicting the final outcome and can be calculated using a specific formula.', 'The process of model training involves splitting the input data into training and testing sets, building a model, and evaluating it using machine learning algorithms to find the most suitable model for business requirements.', 'Data cleaning is highlighted as a time-consuming task that involves formatting, structuring, and removing unnecessary or inconsistent data from the collected datasets to prepare it for analysis.', "The use case involves predicting whether a match can be played based on weather conditions, with predictor variables including outlook, humidity, wind, and day, and the target variable being 'play.' 
Decision trees, information gain, and entropy are used to analyze the significance of the predictive model.", 'The importance of data in making better business decisions is emphasized, along with insights into qualitative and quantitative data, including subcategories of nominal, ordinal, discrete, and continuous data.']}, {'end': 7844.172, 'segs': [{'end': 6621.933, 'src': 'embed', 'start': 6594.195, 'weight': 0, 'content': [{'end': 6598.979, 'text': "Now when you calculate the information gain for this attribute you'll get a value of 0.029 which is again very less.", 'start': 6594.195, 'duration': 4.784}, {'end': 6607.897, 'text': 'So what you can summarize from here is if we look at the information gain for each of these variable,', 'start': 6601.77, 'duration': 6.127}, {'end': 6610.92, 'text': "we'll see that for Outlook we have the maximum gain.", 'start': 6607.897, 'duration': 3.023}, {'end': 6616.006, 'text': 'All right, we have 0.247, which is the highest information gain value.', 'start': 6611.761, 'duration': 4.245}, {'end': 6621.933, 'text': 'And you must always choose a variable with the highest information gain to split the data at the root node.', 'start': 6616.647, 'duration': 5.286}], 'summary': 'Choosing outlook for splitting yields highest information gain of 0.247.', 'duration': 27.738, 'max_score': 6594.195, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6594195.jpg'}, {'end': 6690.446, 'src': 'embed', 'start': 6663.144, 'weight': 1, 'content': [{'end': 6668.947, 'text': 'Now a confusion matrix is a matrix that is often used to describe the performance of a model.', 'start': 6663.144, 'duration': 5.803}, {'end': 6674.51, 'text': 'And this is specifically used for classification models or a classifier.', 'start': 6669.908, 'duration': 4.602}, {'end': 6685.036, 'text': 'And what it does is it will calculate the accuracy or it will calculate the performance of your classifier by comparing your actual results and your predicted results.', 'start': 6675.131, 'duration': 9.905}, {'end': 6690.446, 'text': 'All right, so this is what it looks like, true positive plus true negative and all of that.', 'start': 6685.928, 'duration': 4.518}], 'summary': 'Confusion matrix evaluates classifier performance for classification models by comparing actual and predicted results.', 'duration': 27.302, 'max_score': 6663.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6663144.jpg'}, {'end': 6979.653, 'src': 'embed', 'start': 6954.952, 'weight': 2, 'content': [{'end': 6961.314, 'text': "So guys, it's always best to perform practical implementations in order to understand the concepts in a better way.", 'start': 6954.952, 'duration': 6.362}, {'end': 6968.576, 'text': "Okay, so here we'll be executing a small demo that'll show you how to calculate the mean median mode, variance,", 'start': 6961.754, 'duration': 6.822}, {'end': 6972.937, 'text': 'standard deviation and how to study the variables by plotting a histogram.', 'start': 6968.576, 'duration': 4.361}, {'end': 6975.933, 'text': "Don't worry if you don't know what a histogram is.", 'start': 6974.032, 'duration': 1.901}, {'end': 6977.533, 'text': "It's basically a frequency plot.", 'start': 6976.013, 'duration': 1.52}, {'end': 6979.653, 'text': "There's no big science behind it.", 'start': 6978.073, 'duration': 1.58}], 'summary': 'Practical demo on calculating mean, median, mode, variance, and standard deviation, as well 
as studying variables with a histogram.', 'duration': 24.701, 'max_score': 6954.952, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6954952.jpg'}, {'end': 7239.706, 'src': 'embed', 'start': 7194.122, 'weight': 3, 'content': [{'end': 7203.205, 'text': 'Therefore, we can say that probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events.', 'start': 7194.122, 'duration': 9.083}, {'end': 7210.348, 'text': "So they're very interconnected fields and probability makes use of statistics and statistics makes use of probability.", 'start': 7203.866, 'duration': 6.482}, {'end': 7212.149, 'text': "They're very interconnected fields.", 'start': 7210.868, 'duration': 1.281}, {'end': 7216.02, 'text': 'So that is the relationship between statistics and probability.', 'start': 7212.879, 'duration': 3.141}, {'end': 7219.201, 'text': "Now let's understand what exactly is probability.", 'start': 7216.5, 'duration': 2.701}, {'end': 7224.722, 'text': 'So probability is the measure of how likely an event will occur.', 'start': 7219.921, 'duration': 4.801}, {'end': 7230.464, 'text': 'To be more precise, it is the ratio of desired outcome to the total outcomes.', 'start': 7225.222, 'duration': 5.242}, {'end': 7234.665, 'text': 'Now the probability of all outcomes always sum up to one.', 'start': 7231.104, 'duration': 3.561}, {'end': 7237.485, 'text': 'Now the probability will always sum up to one.', 'start': 7235.145, 'duration': 2.34}, {'end': 7239.706, 'text': 'Probability cannot go beyond one.', 'start': 7237.805, 'duration': 1.901}], 'summary': 'Probability and statistics are interconnected fields; probability measures likelihood of events with outcomes summing up to one.', 'duration': 45.584, 'max_score': 7194.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7194122.jpg'}, {'end': 7786.991, 'src': 'heatmap', 'start': 7363.159, 'weight': 0.766, 'content': [{'end': 7369.864, 'text': 'So one, two, three, four, five, six phases are there and out of that you need to find the probability of getting a two.', 'start': 7363.159, 'duration': 6.705}, {'end': 7374.188, 'text': 'So all the possible outcomes will basically represent your sample space.', 'start': 7370.125, 'duration': 4.063}, {'end': 7376.992, 'text': 'Okay, so one to six are all your possible outcomes.', 'start': 7374.688, 'duration': 2.304}, {'end': 7378.614, 'text': 'This represents your sample space.', 'start': 7377.072, 'duration': 1.542}, {'end': 7382.199, 'text': 'Now event is one or more outcome of an experiment.', 'start': 7379.014, 'duration': 3.185}, {'end': 7390.391, 'text': 'So in this case, my event is to get a two when I roll a dice, right? So my event is the probability of getting a two when I roll a dice.', 'start': 7382.58, 'duration': 7.811}, {'end': 7401.221, 'text': "So guys, this is basically what random experiment sample space and event really means, all right? 
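Here is a minimal sketch of the dice example: the probability of an event is the number of favourable outcomes divided by the size of the sample space, and a quick simulation (an illustrative addition, not something run in the session) lands close to the exact 1/6.

```python
# Minimal sketch of the dice example: probability as favourable outcomes over
# the total sample space, plus a quick simulation as a sanity check.
import random
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]   # all possible outcomes of one roll
event = {2}                          # the event "rolling a two"

exact = Fraction(len(event), len(sample_space))
print("exact P(rolling a 2):", exact)                 # 1/6

trials = 100_000
hits = sum(random.choice(sample_space) in event for _ in range(trials))
print("simulated P(rolling a 2):", hits / trials)     # close to 0.1667

# The probabilities over the whole sample space always add up to 1.
print(sum(Fraction(1, len(sample_space)) for _ in sample_space))  # 1
```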
Now, let's discuss the different types of events.", 'start': 7391.016, 'duration': 10.205}, {'end': 7403.702, 'text': 'There are two types of events that you should know about.', 'start': 7401.701, 'duration': 2.001}, {'end': 7406.404, 'text': 'There is disjoint and non-disjoint events.', 'start': 7403.722, 'duration': 2.682}, {'end': 7410.846, 'text': 'Disjoint events are events that do not have any common outcome.', 'start': 7406.924, 'duration': 3.922}, {'end': 7420.487, 'text': 'For example, if you draw a single card from a deck of cards, it cannot be a king and a queen, correct? It can either be king or it can be queen.', 'start': 7411.44, 'duration': 9.047}, {'end': 7425.031, 'text': 'Now, non-disjoint events are events that have common outcomes.', 'start': 7421.168, 'duration': 3.863}, {'end': 7430.275, 'text': 'For example, a student can get 100 marks in statistics and 100 marks in probability.', 'start': 7425.591, 'duration': 4.684}, {'end': 7437.004, 'text': 'And also the outcome of a ball delivered can be a no ball and it can be a six.', 'start': 7431.62, 'duration': 5.384}, {'end': 7439.466, 'text': 'So this is what non-disjoint events are.', 'start': 7437.384, 'duration': 2.082}, {'end': 7441.887, 'text': 'These are very simple to understand.', 'start': 7440.346, 'duration': 1.541}, {'end': 7447.371, 'text': "Now let's move on and look at the different types of probability distribution.", 'start': 7442.888, 'duration': 4.483}, {'end': 7451.935, 'text': "I'll be discussing the three main probability distribution functions.", 'start': 7448.532, 'duration': 3.403}, {'end': 7457.639, 'text': "I'll be talking about probability density function, normal distribution, and central limit theorem.", 'start': 7452.415, 'duration': 5.224}, {'end': 7469.933, 'text': 'Probability density function, also known as PDF, is concerned with the relative likelihood for a continuous random variable to take on a given value.', 'start': 7459.326, 'duration': 10.607}, {'end': 7470.233, 'text': 'all right?', 'start': 7469.933, 'duration': 0.3}, {'end': 7476.257, 'text': 'So the PDF gives the probability of a variable that lies between the range A and B.', 'start': 7470.513, 'duration': 5.744}, {'end': 7484.663, 'text': "So basically what you're trying to do is you're going to try and find the probability of a continuous random variable over a specified range.", 'start': 7476.257, 'duration': 8.406}, {'end': 7489.537, 'text': 'Now this graph denotes the PDF of a continuous variable.', 'start': 7485.876, 'duration': 3.661}, {'end': 7492.098, 'text': 'Now this graph is also known as the bell curve.', 'start': 7490.017, 'duration': 2.081}, {'end': 7495.259, 'text': "It's famously called the bell curve because of its shape.", 'start': 7492.518, 'duration': 2.741}, {'end': 7500.34, 'text': 'And there are three important properties that you need to know about a probability density function.', 'start': 7495.799, 'duration': 4.541}, {'end': 7503.881, 'text': 'Now the graph of a PDF will be continuous over a range.', 'start': 7500.78, 'duration': 3.101}, {'end': 7511.773, 'text': "This is because you're finding the probability that a continuous variable lies between the ranges A and B.", 'start': 7504.77, 'duration': 7.003}, {'end': 7519.196, 'text': 'The second property is that the area bounded by the curve of a density function and the x-axis is equal to one.', 'start': 7511.773, 'duration': 7.423}, {'end': 7525.019, 'text': 'Basically, the area below the curve is equal to one because it denotes probability.', 
'start': 7519.736, 'duration': 5.283}, {'end': 7528.46, 'text': 'Again, the probability cannot range more than one.', 'start': 7525.479, 'duration': 2.981}, {'end': 7530.101, 'text': 'It has to be between zero and one.', 'start': 7528.6, 'duration': 1.501}, {'end': 7540.949, 'text': 'Property number three is that the probability that a random variable assumes a value between A and B is equal to the area under the PDF bounded by A and B.', 'start': 7530.661, 'duration': 10.288}, {'end': 7547.004, 'text': 'Now what this means is that the probability value is denoted by the area of the graph.', 'start': 7541.702, 'duration': 5.302}, {'end': 7557.188, 'text': 'So whatever value that you get here, which is basically one, is the probability that a random variable will lie between the range A and B.', 'start': 7547.884, 'duration': 9.304}, {'end': 7560.289, 'text': 'So I hope all of you have understood the probability density function.', 'start': 7557.188, 'duration': 3.101}, {'end': 7568.272, 'text': "It's basically the probability of finding the value of a continuous random variable between the range A and B.", 'start': 7560.709, 'duration': 7.563}, {'end': 7573.995, 'text': "All right, now let's look at our next distribution which is normal distribution.", 'start': 7569.053, 'duration': 4.942}, {'end': 7579.137, 'text': 'Now, normal distribution, which is also known as the Gaussian distribution,', 'start': 7574.535, 'duration': 4.602}, {'end': 7584.458, 'text': 'is a probability distribution that denotes the symmetric property of the mean.', 'start': 7579.137, 'duration': 5.321}, {'end': 7592.521, 'text': 'Meaning that the idea behind this function is that the data near the mean occurs more frequently than the data away from the mean.', 'start': 7584.518, 'duration': 8.003}, {'end': 7597.923, 'text': 'So what it means to say is that the data around the mean represents the entire data set.', 'start': 7593.201, 'duration': 4.722}, {'end': 7603.642, 'text': 'So if you just take a sample of data around the mean, it can represent the entire data set.', 'start': 7598.719, 'duration': 4.923}, {'end': 7610.225, 'text': 'Now similar to the probability density function, the normal distribution appears as a bell curve.', 'start': 7604.022, 'duration': 6.203}, {'end': 7615.007, 'text': 'Now when it comes to normal distribution, there are two important factors.', 'start': 7610.825, 'duration': 4.182}, {'end': 7619.49, 'text': 'We have the mean of the population and the standard deviation.', 'start': 7615.508, 'duration': 3.982}, {'end': 7624.692, 'text': 'So the mean in the graph determines the location of the center of the graph.', 'start': 7620.17, 'duration': 4.522}, {'end': 7627.894, 'text': 'And the standard deviation determines the height of the graph.', 'start': 7625.153, 'duration': 2.741}, {'end': 7632.815, 'text': 'So if the standard deviation is large, the curve is going to look something like this.', 'start': 7629.111, 'duration': 3.704}, {'end': 7634.677, 'text': "It'll be short and wide.", 'start': 7633.415, 'duration': 1.262}, {'end': 7639.301, 'text': 'And if the standard deviation is small, the curve is tall and narrow.', 'start': 7635.097, 'duration': 4.204}, {'end': 7642.285, 'text': 'So this was it about normal distribution.', 'start': 7639.922, 'duration': 2.363}, {'end': 7644.896, 'text': "Now let's look at the central limit theory.", 'start': 7642.913, 'duration': 1.983}, {'end': 7653.667, 'text': 'Now, the central limit theory states that the sampling distribution of the mean of any 
independent random variable will be normal,', 'start': 7645.336, 'duration': 8.331}, {'end': 7656.61, 'text': 'or nearly normal if the sample size is large enough.', 'start': 7653.667, 'duration': 2.943}, {'end': 7658.493, 'text': "Now that's a little confusing.", 'start': 7657.091, 'duration': 1.402}, {'end': 7660.115, 'text': 'Okay, let me break it down for you.', 'start': 7658.813, 'duration': 1.302}, {'end': 7667.017, 'text': 'Now, in simple terms, if we had a large population and we divided it into many samples,', 'start': 7660.675, 'duration': 6.342}, {'end': 7673.959, 'text': 'then the mean of all the samples from the population will be almost equal to the mean of the entire population.', 'start': 7667.017, 'duration': 6.942}, {'end': 7674.74, 'text': 'all right?', 'start': 7673.959, 'duration': 0.781}, {'end': 7678.181, 'text': 'Meaning that each of the sample is normally distributed right?', 'start': 7674.98, 'duration': 3.201}, {'end': 7684.823, 'text': 'So if you compare the mean of each of the sample, it will almost be equal to the mean of the population, right?', 'start': 7678.441, 'duration': 6.382}, {'end': 7690.427, 'text': 'So this graph basically shows a more clear understanding of the central limit theorem.', 'start': 7685.504, 'duration': 4.923}, {'end': 7697.25, 'text': 'You can see each sample here and the mean of each sample is almost along the same line.', 'start': 7690.927, 'duration': 6.323}, {'end': 7701.473, 'text': 'So this is exactly what the central limit theorem states.', 'start': 7697.27, 'duration': 4.203}, {'end': 7707.596, 'text': 'Now the accuracy or the resemblance to the normal distribution depends on two main factors.', 'start': 7702.013, 'duration': 5.583}, {'end': 7710.918, 'text': 'So the first is the number of sample points that you consider.', 'start': 7708.276, 'duration': 2.642}, {'end': 7714.68, 'text': 'And the second is the shape of the underlying population.', 'start': 7711.878, 'duration': 2.802}, {'end': 7721.025, 'text': 'Now, the shape obviously depends on the standard deviation and the mean of a sample correct?', 'start': 7715.28, 'duration': 5.745}, {'end': 7722.106, 'text': 'So, guys,', 'start': 7721.465, 'duration': 0.641}, {'end': 7731.773, 'text': 'central limit theorem basically states that each sample will be normally distributed in such a way that the mean of each sample will coincide with the mean of the actual population.', 'start': 7722.106, 'duration': 9.667}, {'end': 7734.956, 'text': "Alright, in short terms, that's what central limit theorem states.", 'start': 7732.214, 'duration': 2.742}, {'end': 7738.106, 'text': 'And this holds true only for a large data set.', 'start': 7735.784, 'duration': 2.322}, {'end': 7742.91, 'text': 'Mostly for a small data set, there are more deviations when compared to a large data set.', 'start': 7738.547, 'duration': 4.363}, {'end': 7745.212, 'text': "It's because of the scaling factor.", 'start': 7743.291, 'duration': 1.921}, {'end': 7749.276, 'text': 'The smallest deviation in a small data set will change the value very drastically.', 'start': 7745.312, 'duration': 3.964}, {'end': 7753.419, 'text': 'But in a large data set, a small deviation will not matter at all.', 'start': 7749.676, 'duration': 3.743}, {'end': 7758.964, 'text': "Now let's move on and look at our next topic, which is the different types of probability.", 'start': 7754, 'duration': 4.964}, {'end': 7764.826, 'text': 'Now, this is a important topic, because most of your problems can be solved by understanding.', 'start': 
7759.484, 'duration': 5.342}, {'end': 7769.068, 'text': 'which type of probability should I use to solve this problem right?', 'start': 7764.826, 'duration': 4.242}, {'end': 7771.809, 'text': 'So we have three important types of probability.', 'start': 7769.688, 'duration': 2.121}, {'end': 7775.15, 'text': 'We have marginal, joint, and conditional probability.', 'start': 7772.109, 'duration': 3.041}, {'end': 7777.179, 'text': "So let's discuss each of these.", 'start': 7775.757, 'duration': 1.422}, {'end': 7786.991, 'text': 'Now the probability of an event occurring unconditioned on any other event is known as marginal probability or unconditional probability.', 'start': 7777.199, 'duration': 9.792}], 'summary': 'Probability concepts, sample space, events, and types of probability distributions explained with examples and properties.', 'duration': 423.832, 'max_score': 7363.159, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7363159.jpg'}, {'end': 7484.663, 'src': 'embed', 'start': 7452.415, 'weight': 5, 'content': [{'end': 7457.639, 'text': "I'll be talking about probability density function, normal distribution, and central limit theorem.", 'start': 7452.415, 'duration': 5.224}, {'end': 7469.933, 'text': 'Probability density function, also known as PDF, is concerned with the relative likelihood for a continuous random variable to take on a given value.', 'start': 7459.326, 'duration': 10.607}, {'end': 7470.233, 'text': 'all right?', 'start': 7469.933, 'duration': 0.3}, {'end': 7476.257, 'text': 'So the PDF gives the probability of a variable that lies between the range A and B.', 'start': 7470.513, 'duration': 5.744}, {'end': 7484.663, 'text': "So basically what you're trying to do is you're going to try and find the probability of a continuous random variable over a specified range.", 'start': 7476.257, 'duration': 8.406}], 'summary': 'Probability density function gives likelihood for continuous random variable values within a range.', 'duration': 32.248, 'max_score': 7452.415, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7452415.jpg'}, {'end': 7584.458, 'src': 'embed', 'start': 7560.709, 'weight': 6, 'content': [{'end': 7568.272, 'text': "It's basically the probability of finding the value of a continuous random variable between the range A and B.", 'start': 7560.709, 'duration': 7.563}, {'end': 7573.995, 'text': "All right, now let's look at our next distribution which is normal distribution.", 'start': 7569.053, 'duration': 4.942}, {'end': 7579.137, 'text': 'Now, normal distribution, which is also known as the Gaussian distribution,', 'start': 7574.535, 'duration': 4.602}, {'end': 7584.458, 'text': 'is a probability distribution that denotes the symmetric property of the mean.', 'start': 7579.137, 'duration': 5.321}], 'summary': 'The transcript discusses the probability of a continuous random variable between a range a and b, and introduces the normal distribution, also known as the gaussian distribution.', 'duration': 23.749, 'max_score': 7560.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7560709.jpg'}, {'end': 7667.017, 'src': 'embed', 'start': 7645.336, 'weight': 7, 'content': [{'end': 7653.667, 'text': 'Now, the central limit theory states that the sampling distribution of the mean of any independent random variable will be normal,', 'start': 7645.336, 'duration': 8.331}, {'end': 7656.61, 'text': 'or 
nearly normal if the sample size is large enough.', 'start': 7653.667, 'duration': 2.943}, {'end': 7658.493, 'text': "Now that's a little confusing.", 'start': 7657.091, 'duration': 1.402}, {'end': 7660.115, 'text': 'Okay, let me break it down for you.', 'start': 7658.813, 'duration': 1.302}, {'end': 7667.017, 'text': 'Now, in simple terms, if we had a large population and we divided it into many samples,', 'start': 7660.675, 'duration': 6.342}], 'summary': 'Central limit theory: sampling mean distribution becomes normal with large sample size.', 'duration': 21.681, 'max_score': 7645.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7645336.jpg'}, {'end': 7786.991, 'src': 'embed', 'start': 7754, 'weight': 8, 'content': [{'end': 7758.964, 'text': "Now let's move on and look at our next topic, which is the different types of probability.", 'start': 7754, 'duration': 4.964}, {'end': 7764.826, 'text': 'Now, this is a important topic, because most of your problems can be solved by understanding.', 'start': 7759.484, 'duration': 5.342}, {'end': 7769.068, 'text': 'which type of probability should I use to solve this problem right?', 'start': 7764.826, 'duration': 4.242}, {'end': 7771.809, 'text': 'So we have three important types of probability.', 'start': 7769.688, 'duration': 2.121}, {'end': 7775.15, 'text': 'We have marginal, joint, and conditional probability.', 'start': 7772.109, 'duration': 3.041}, {'end': 7777.179, 'text': "So let's discuss each of these.", 'start': 7775.757, 'duration': 1.422}, {'end': 7786.991, 'text': 'Now the probability of an event occurring unconditioned on any other event is known as marginal probability or unconditional probability.', 'start': 7777.199, 'duration': 9.792}], 'summary': 'Probability has three important types: marginal, joint, and conditional probability.', 'duration': 32.991, 'max_score': 7754, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7754000.jpg'}], 'start': 6529.446, 'title': 'Probability, statistics, and decision trees', 'summary': 'Covers decision tree analysis, descriptive statistics in r, and the relationship between probability and statistics. it includes a discussion on information gain in decision tree analysis, confusion matrix, true negative, false positive, false negative, mean, median, mode, variance, standard deviation, probability, pdf, normal distribution, and different types of probability.', 'chapters': [{'end': 6887.006, 'start': 6529.446, 'title': 'Decision tree analysis', 'summary': 'Explains the concept of information gain and the importance of choosing the variable with the highest information gain in decision tree analysis, followed by an introduction to the confusion matrix and its application in evaluating the performance of a classifier.', 'duration': 357.56, 'highlights': ['The information gain for the attribute outlook is 0.247, indicating the highest gain among the variables, guiding the choice of the root node in decision tree analysis. The information gain for the attribute outlook is 0.247, the highest among the variables, guiding the choice of the root node in decision tree analysis.', 'The information gain for the humidity variable is 0.151, considered a decent value but lower than the information gain of the attribute outlook. 
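The decision-tree highlights above quote information gain values of roughly 0.247 for the outlook attribute and 0.151 for humidity. A minimal sketch of how those figures come out of entropy calculations is below; the class counts used are the standard play-tennis splits (9 yes / 5 no overall) and are an assumption for illustration, since the transcript excerpt does not list them.

```python
import math

def entropy(pos, neg):
    """Shannon entropy (log base 2) of a two-class split."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

def information_gain(parent, children):
    """parent and children are (pos, neg) count tuples; children partition the parent."""
    total = sum(p + n for p, n in children)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - weighted

# Assumed play-tennis style counts: 9 yes / 5 no overall.
parent   = (9, 5)
outlook  = [(2, 3), (4, 0), (3, 2)]   # sunny, overcast, rainy
humidity = [(3, 4), (6, 1)]           # high, normal

print(round(information_gain(parent, outlook), 3))   # ~0.247, the highest gain -> root node
print(round(information_gain(parent, humidity), 3))  # ~0.152 (the video rounds this to 0.151)
```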
The information gain for the humidity variable is 0.151, which is lower than the information gain of the attribute outlook, indicating its lesser significance in decision tree analysis.', 'Introduction of confusion matrix in evaluating the performance of a classifier, highlighting its application for calculating accuracy and comparing actual and predicted results. The explanation of the confusion matrix and its application for calculating accuracy and comparing actual and predicted results in evaluating the performance of a classifier.']}, {'end': 7126.911, 'start': 6887.506, 'title': 'Descriptive statistics in r demo', 'summary': 'Discussed the concepts of true negative, false positive, and false negative in a confusion matrix, followed by a demonstration in r on calculating mean, median, mode, variance, and standard deviation, showcasing how basic statistics form the foundation of machine learning and deep learning algorithms.', 'duration': 239.405, 'highlights': ['The chapter discussed the concepts of true negative, false positive, and false negative in a confusion matrix It explained the meanings of true negative, false positive, and false negative in the context of disease prediction, providing a foundation for understanding classification accuracy.', 'Demonstration in R on calculating mean, median, mode, variance, and standard deviation The demonstration showcased the practical application of statistical concepts in R, including calculating mean, median, mode, variance, and standard deviation, aiding in a better understanding of descriptive statistics.', 'Basic statistics form the foundation of machine learning and deep learning algorithms It emphasized the significance of understanding basic statistical concepts like mean, median, mode, and variance, as they serve as the fundamental building blocks for machine learning and deep learning algorithms.']}, {'end': 7584.458, 'start': 7127.831, 'title': 'Probability & statistics relationship', 'summary': 'Discusses the relationship between probability and statistics, explaining probability as the measure of likelihood of an event occurring, and covers terminologies, event types, and probability distribution functions, including pdf and normal distribution.', 'duration': 456.627, 'highlights': ['Probability is the measure of how likely an event will occur, expressed as the ratio of desired outcomes to total outcomes, always summing up to one. Probability is the measure of how likely an event will occur, expressed as the ratio of desired outcomes to total outcomes, always summing up to one.', 'Probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events, where probability makes use of statistics and vice versa. Probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events, where probability makes use of statistics and vice versa.', 'Probability distribution functions include probability density function (PDF) and normal distribution, each with distinct properties and applications. Probability distribution functions include probability density function (PDF) and normal distribution, each with distinct properties and applications.', 'The probability density function (PDF) gives the probability of a continuous random variable lying between a specified range, with a graph denoting a bell curve and properties ensuring the probability ranges between zero and one. 
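Property three of the PDF described above (the probability that the variable falls between A and B equals the area under the curve between A and B) can be checked numerically. A small sketch using SciPy's normal distribution; the mean, standard deviation and range below are illustrative choices, not values from the transcript.

```python
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.0, 1.0   # illustrative mean and standard deviation
a, b = -1.0, 1.0       # the range [A, B]

# Area under the PDF between A and B via numerical integration...
area, _ = quad(lambda x: norm.pdf(x, mu, sigma), a, b)
# ...and the same probability obtained from the CDF difference.
prob = norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)

print(round(area, 4), round(prob, 4))   # both ~0.6827 for one standard deviation around the mean
```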
The probability density function (PDF) gives the probability of a continuous random variable lying between a specified range, with a graph denoting a bell curve and properties ensuring the probability ranges between zero and one.', "Normal distribution, also known as Gaussian distribution, is a symmetric probability distribution that represents the relative likelihood of a variable's values around the mean. Normal distribution, also known as Gaussian distribution, is a symmetric probability distribution that represents the relative likelihood of a variable's values around the mean."]}, {'end': 7844.172, 'start': 7584.518, 'title': 'Normal distribution & probability types', 'summary': 'Discusses the concepts of normal distribution, central limit theorem, and different types of probability, highlighting the mean, standard deviation, and the three types of probability - marginal, joint, and conditional.', 'duration': 259.654, 'highlights': ['Normal distribution is represented by a bell curve with mean determining the center and standard deviation determining the height. The mean in the graph determines the location of the center of the graph, and the standard deviation determines the height of the graph.', 'Central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal if the sample size is large enough. The mean of all the samples from the population will be almost equal to the mean of the entire population if the sample size is large enough.', 'The three types of probability are marginal, joint, and conditional probability, each with specific applications and calculations. Marginal probability is the probability of an event occurring unconditioned on any other event, joint probability measures two events happening at the same time, and conditional probability is the probability of an event occurring given that another event has already occurred.']}], 'duration': 1314.726, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s6529446.jpg', 'highlights': ['The information gain for the attribute outlook is 0.247, the highest among the variables, guiding the choice of the root node in decision tree analysis.', 'Introduction of confusion matrix in evaluating the performance of a classifier, highlighting its application for calculating accuracy and comparing actual and predicted results.', 'Demonstration in R on calculating mean, median, mode, variance, and standard deviation showcased the practical application of statistical concepts, aiding in a better understanding of descriptive statistics.', 'Probability is the measure of how likely an event will occur, expressed as the ratio of desired outcomes to total outcomes, always summing up to one.', 'Probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events, where probability makes use of statistics and vice versa.', 'Probability distribution functions include probability density function (PDF) and normal distribution, each with distinct properties and applications.', "Normal distribution, also known as Gaussian distribution, is a symmetric probability distribution that represents the relative likelihood of a variable's values around the mean.", 'Central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal if the sample size is large enough.', 'The three types of probability are marginal, joint, and conditional 
probability, each with specific applications and calculations.']}, {'end': 10514.399, 'segs': [{'end': 8127.925, 'src': 'embed', 'start': 8101.54, 'weight': 0, 'content': [{'end': 8108.927, 'text': "You're saying that you want to find the probability of a candidate who has a good package given that he's not undergone any training.", 'start': 8101.54, 'duration': 7.387}, {'end': 8111.93, 'text': "The condition is that he's not undergone any training.", 'start': 8109.488, 'duration': 2.442}, {'end': 8119.941, 'text': 'All right, so the number of people who have not undergone training are 60, and out of that, five of them have got a good package.', 'start': 8112.518, 'duration': 7.423}, {'end': 8127.925, 'text': "Right. so that's why this is five by 60 and not five by 105, because here they've clearly mentioned has a good package,", 'start': 8120.442, 'duration': 7.483}], 'summary': 'Probability of good package for untrained candidate: 5/60', 'duration': 26.385, 'max_score': 8101.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s8101540.jpg'}, {'end': 9567.809, 'src': 'embed', 'start': 9529.991, 'weight': 1, 'content': [{'end': 9541.237, 'text': "Okay, we're gonna compare the life expectancy of Ireland and South Africa and we're gonna perform the t-test to check if the comparison follows a null hypothesis or an alternate hypothesis.", 'start': 9529.991, 'duration': 11.246}, {'end': 9545.059, 'text': "Okay, so let's run the code.", 'start': 9541.977, 'duration': 3.082}, {'end': 9561.247, 'text': "Okay, so now we'll apply the t-test and we'll compare the life expectancy of these two places.", 'start': 9555.464, 'duration': 5.783}, {'end': 9563.267, 'text': "All right, let's run this.", 'start': 9562.127, 'duration': 1.14}, {'end': 9567.809, 'text': 'Notice the mean in group Ireland and South Africa.', 'start': 9564.288, 'duration': 3.521}], 'summary': 'Comparing life expectancy of ireland and south africa using t-test.', 'duration': 37.818, 'max_score': 9529.991, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s9529991.jpg'}, {'end': 9991.557, 'src': 'embed', 'start': 9965.116, 'weight': 2, 'content': [{'end': 9969.98, 'text': "So to sum this up, let's look at a few reasons why machine learning is so important.", 'start': 9965.116, 'duration': 4.864}, {'end': 9974.265, 'text': 'So the first reason is obviously increase in data generation.', 'start': 9970.543, 'duration': 3.722}, {'end': 9982.611, 'text': 'So because of excessive production of data, we need a method that can be used to structure, analyze and draw useful insights from data.', 'start': 9974.826, 'duration': 7.785}, {'end': 9985.093, 'text': 'This is where machine learning comes in.', 'start': 9983.492, 'duration': 1.601}, {'end': 9991.557, 'text': 'It uses data to solve problems and find solutions to the most complex tasks faced by organizations.', 'start': 9985.353, 'duration': 6.204}], 'summary': 'Machine learning is crucial due to the exponential increase in data generation, requiring structured analysis and insights for solving complex organizational challenges.', 'duration': 26.441, 'max_score': 9965.116, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s9965116.jpg'}, {'end': 10177.002, 'src': 'embed', 'start': 10153.168, 'weight': 3, 'content': [{'end': 10159.675, 'text': 'Now this figure basically shows how a machine learning algorithm or how the machine 
learning process really works.', 'start': 10153.168, 'duration': 6.507}, {'end': 10164.635, 'text': 'So the machine learning process begins by feeding the machine lots and lots of data.', 'start': 10160.293, 'duration': 4.342}, {'end': 10169.458, 'text': 'By using this data, the machine is trained to detect hidden insights and trends.', 'start': 10165.296, 'duration': 4.162}, {'end': 10177.002, 'text': 'Now, these insights are then used to build a machine learning model by using an algorithm in order to solve a problem.', 'start': 10170.018, 'duration': 6.984}], 'summary': 'Machine learning process: data feeding, hidden insights detection, model building.', 'duration': 23.834, 'max_score': 10153.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10153168.jpg'}, {'end': 10384.577, 'src': 'embed', 'start': 10358.3, 'weight': 4, 'content': [{'end': 10365.287, 'text': "So first of all, let's define the different stages or the different steps involved in the machine learning process.", 'start': 10358.3, 'duration': 6.987}, {'end': 10372.672, 'text': "So a machine learning process always begins with defining the objective or defining the problem that you're trying to solve.", 'start': 10366.129, 'duration': 6.543}, {'end': 10376.033, 'text': 'Next stage is data gathering or data collection.', 'start': 10373.352, 'duration': 2.681}, {'end': 10380.435, 'text': 'Now the data that you need to solve this problem is collected at this stage.', 'start': 10376.693, 'duration': 3.742}, {'end': 10384.577, 'text': 'This is followed by data preparation or data processing.', 'start': 10381.055, 'duration': 3.522}], 'summary': 'Machine learning process involves defining objectives, gathering data, and preparing it.', 'duration': 26.277, 'max_score': 10358.3, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10358300.jpg'}], 'start': 7844.172, 'title': "Probability, bayes' theorem, hypothesis testing & machine learning", 'summary': "Covers types of probability, bayes' theorem, point estimation, confidence intervals, hypothesis testing, and the importance and introduction to machine learning. it includes examples and applications, providing a comprehensive understanding of these concepts and their significance in data analysis and decision-making.", 'chapters': [{'end': 8239.973, 'start': 7844.172, 'title': "Types of probability and bayes' theorem", 'summary': "Discusses joint probability, conditional probability, marginal probability, and bayes' theorem, using a small use case to illustrate the concepts and calculations.", 'duration': 395.801, 'highlights': ["The probability that a candidate has undergone Edureka's training is 45 divided by 105, resulting in a value of approximately 0.42. Calculating the marginal probability of candidates who have undergone Edureka's training.", "30 out of 105 candidates have attended Edureka's training and have a good package, resulting in a joint probability of 30 divided by 105. Calculating the joint probability of candidates with Edureka's training and a good package.", 'The probability of a candidate having a good package, given that they have not undergone any training, is 5 divided by 60, resulting in a probability of around 0.08. 
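The training/package example above (45 of 105 candidates trained, 30 trained with a good package, 5 of the 60 untrained with a good package) is a convenient way to see the three types of probability side by side. A short sketch using only those counts:

```python
# Counts taken from the worked example in the transcript.
total            = 105   # all candidates
trained          = 45    # attended the training
not_trained      = 60    # did not attend the training
trained_and_good = 30    # attended the training AND got a good package
untrained_good   = 5     # no training but still got a good package

# Marginal probability: P(trained), unconditioned on any other event.
p_trained = trained / total                            # 45/105 ~ 0.43

# Joint probability: P(trained AND good package).
p_trained_and_good = trained_and_good / total          # 30/105 ~ 0.29

# Conditional probability: P(good package | not trained).
p_good_given_untrained = untrained_good / not_trained  # 5/60 ~ 0.08

# The other direction, recovered from the joint and the marginal: P(good | trained).
p_good_given_trained = p_trained_and_good / p_trained  # 30/45 ~ 0.67

print(p_trained, p_trained_and_good, p_good_given_untrained, p_good_given_trained)
```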
Determining the conditional probability of candidates having a good package without training.', "Bayes' Theorem is used to show the relation between one conditional probability and its inverse, and it is represented mathematically as the likelihood ratio and the posterior. Explaining the concept and mathematical representation of Bayes' Theorem."]}, {'end': 8922.064, 'start': 8239.973, 'title': 'Bayes theorem and point estimation', 'summary': 'Explains the bayes theorem and point estimation, providing an example to illustrate the application. it then delves into interval estimation, discussing confidence interval and margin of error with examples, and highlighting the methods used for point estimation.', 'duration': 682.091, 'highlights': ['The chapter explains the Bayes theorem and point estimation, providing an example to illustrate the application.', 'Interval estimation is discussed, emphasizing confidence interval and margin of error with examples.', 'The methods used for point estimation are detailed, including method of moments, maximum likelihood, base estimator, and best unbiased estimators.', 'The chapter provides a clear explanation of confidence interval using the example of a survey on cat food purchase, demonstrating the significance of confidence level and interval in estimation.', 'The concept of interval estimation is clarified by comparing point estimation with interval estimation using the example of reaching a theater, highlighting the difference and importance of interval estimation.']}, {'end': 9247.077, 'start': 8922.842, 'title': 'Confidence interval & hypothesis testing', 'summary': 'Explains the concept of margin of error, confidence intervals, and hypothesis testing, covering the calculation of margin of error, estimation of confidence intervals, and the steps involved in constructing a confidence interval, along with an example. additionally, it delves into the significance and steps of hypothesis testing.', 'duration': 324.235, 'highlights': ['The level of confidence, denoted by C, is the probability that the interval estimate contains the population parameter. Explains the concept of level of confidence and its relevance to the interval estimate containing the population parameter.', 'The margin of error is calculated using the formula: ZC * standard deviation / √sample size. Details the formula for calculating the margin of error, providing a clear understanding of its calculation method.', 'Steps involved in constructing a confidence interval include identifying a sample statistic, selecting a confidence level, finding the margin of error, and specifying the confidence interval. Describes the sequential steps for constructing a confidence interval, outlining the key components and process involved.', 'Example: Using a 95% confidence level, the margin of error for the mean price of all textbooks in the bookstore is approximately 8.12. Provides a practical example of calculating the margin of error at a specific confidence level, offering a tangible application of the concept.', 'Hypothesis testing is used to determine whether there is enough evidence in a data sample to infer that a certain condition holds true for an entire population. 
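The margin-of-error formula quoted above (z_c * standard deviation / sqrt(sample size)) is easy to wrap in a helper. In the sketch below the sample mean, standard deviation and sample size are assumptions chosen so the result lands near the 8.12 figure mentioned in the textbook-price highlight; the excerpt does not give the underlying sample statistics.

```python
import math
from scipy.stats import norm

def margin_of_error(std_dev, n, confidence=0.95):
    """Margin of error = z_c * sigma / sqrt(n) at the given confidence level."""
    z_c = norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical value, ~1.96 at 95%
    return z_c * std_dev / math.sqrt(n)

# Illustrative sample statistics (assumptions, not from the transcript).
sample_mean = 74.22
std_dev     = 23.44
n           = 32

e = margin_of_error(std_dev, n)
print(round(e, 2))                                            # ~8.12
print(round(sample_mean - e, 2), round(sample_mean + e, 2))   # the 95% confidence interval
```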
Defines the purpose and objective of hypothesis testing, emphasizing its role in making inferences about the entire population based on sample data.']}, {'end': 9783.535, 'start': 9247.638, 'title': 'Probability and hypothesis testing', 'summary': 'Discusses the probability of john not cheating in a classroom cleaning activity, using hypothesis testing and then demonstrates hypothesis testing using the gapminder dataset in r, showing how the alternate hypothesis is true with a 95% confidence interval.', 'duration': 535.897, 'highlights': ['The probability of John not cheating drops to approximately 42% if he is not picked for three days in a row, and further drops to 3.2% if he is not picked for 12 days in a row. The probability of John not being picked for a day, dropping to 42% after three days and 3.2% after 12 days, indicating a high probability of cheating.', 'If the probability of an event occurring is less than 5%, it indicates bias, proving the alternate hypothesis. If the probability lies below 5%, then the event is biased, proving the alternate hypothesis.', 'The demonstration uses the Gapminder dataset to perform hypothesis testing, comparing the life expectancy of Ireland and South Africa, and concludes that the alternate hypothesis is true with a 95% confidence interval. The demonstration compares the life expectancy of Ireland and South Africa using the Gapminder dataset, concluding that the alternate hypothesis is true with a 95% confidence interval.', 'The P value in the t-test is very small, suggesting statistical significance and disapproving the null hypothesis. The small P value suggests statistical significance and disapproves the null hypothesis.', 'The demonstration concludes with a visualization showing a linear variance in life expectancy for each continent with respect to the GDP per capita. The demonstration concludes with a visualization showing a linear variance in life expectancy for each continent with respect to the GDP per capita.']}, {'end': 10129.758, 'start': 9783.955, 'title': 'Importance of machine learning', 'summary': 'Discusses the importance of machine learning in analyzing and making sense of large volumes of data, and provides examples of its applications in companies such as netflix, facebook, amazon, and google, emphasizing its role in predicting risks, profits, uncovering patterns, improving decision-making, and solving complex problems.', 'duration': 345.803, 'highlights': ["Netflix's recommendation engine is a key application of machine learning, generating most of Netflix's revenue. The recommendation engine studies users' movie viewing patterns to recommend relevant movies, driving most of Netflix's revenue.", "Facebook's auto-tagging feature and Alexa's capabilities are driven by machine learning and natural language processing. Facebook's auto-tagging feature and Alexa's capabilities are based on machine learning and natural language processing, enhancing user experience.", 'Gmail uses machine learning to filter spam messages, demonstrating another practical application of machine learning. Gmail utilizes machine learning to classify emails as spam or non-spam, demonstrating a practical application of machine learning in enhancing user experience.', 'Machine learning is important due to the increase in data generation, improving decision-making, uncovering patterns and trends, and solving complex problems. 
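The 42% and 3.2% figures in the classroom-cleaning example above follow from simple repeated probability. Assuming the usual setup of four students with one picked at random each day, John's chance of not being picked on a single day is 3/4, and the chance of that happening many days in a row shrinks quickly:

```python
# Assumed setup: four students, one picked at random each day,
# so the chance John is NOT picked on any given day is 3/4.
p_not_picked_one_day = 3 / 4

for days in (1, 3, 12):
    p = p_not_picked_one_day ** days
    print(days, round(p, 3))
# 1  -> 0.75
# 3  -> 0.422  (~42%, as quoted above)
# 12 -> 0.032  (~3.2% -- below the 5% cut-off, so the "no cheating" null hypothesis is rejected)
```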
Machine learning is crucial due to the increase in data generation, improved decision-making, uncovering patterns, and solving complex problems.', 'Machine learning provides machines the ability to learn automatically and improve from experience without explicit programming. Machine learning is a subset of artificial intelligence that enables machines to learn automatically and improve from experience without explicit programming.']}, {'end': 10514.399, 'start': 10130.779, 'title': 'Introduction to machine learning', 'summary': 'Introduces the concept of machine learning, explaining the process of feeding data to train a machine learning model, the commonly used terms in machine learning, and the stages involved in the machine learning process, with an example of predicting rain possibility based on weather conditions.', 'duration': 383.62, 'highlights': ['Machine learning process involves feeding a machine with data to train it and build a model using machine learning algorithms, which can then be used to predict outcomes or solve complex problems, with the model being the representation of the entire machine learning process. The machine learning process begins by feeding the machine lots of data to train it, draw useful insights and patterns, and build a model using machine learning algorithms, which can help predict outcomes or solve complex problems.', 'Key terms in machine learning include algorithm (set of rules or statistical techniques used to learn patterns from data), model (representation of the entire machine learning process), predictor variable (feature used to predict output), response variable (output variable to be predicted), and training data (used to build the machine learning model). Key terms in machine learning include algorithm, model, predictor variable, response variable, and training data, which are essential components in the machine learning process.', 'The stages of the machine learning process include defining the objective, data gathering, data preparation, data exploration and analysis, building a machine learning model, model evaluation, and prediction, with an example of predicting rain possibility based on weather conditions. The stages of the machine learning process involve defining the objective, data gathering, data preparation, data exploration and analysis, building a machine learning model, model evaluation, and prediction, exemplified by predicting rain possibility based on weather conditions.']}], 'duration': 2670.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s7844172.jpg', 'highlights': ['The probability of a candidate having a good package, given that they have not undergone any training, is 5 divided by 60, resulting in a probability of around 0.08. Determining the conditional probability of candidates having a good package without training.', 'The demonstration uses the Gapminder dataset to perform hypothesis testing, comparing the life expectancy of Ireland and South Africa, and concludes that the alternate hypothesis is true with a 95% confidence interval. The demonstration compares the life expectancy of Ireland and South Africa using the Gapminder dataset, concluding that the alternate hypothesis is true with a 95% confidence interval.', 'Machine learning is important due to the increase in data generation, improving decision-making, uncovering patterns and trends, and solving complex problems. 
Machine learning is crucial due to the increase in data generation, improved decision-making, uncovering patterns, and solving complex problems.', 'The machine learning process begins by feeding the machine lots of data to train it, draw useful insights and patterns, and build a model using machine learning algorithms, which can help predict outcomes or solve complex problems.', 'The stages of the machine learning process involve defining the objective, data gathering, data preparation, data exploration and analysis, building a machine learning model, model evaluation, and prediction, exemplified by predicting rain possibility based on weather conditions.']}, {'end': 11993.365, 'segs': [{'end': 10588.014, 'src': 'embed', 'start': 10514.78, 'weight': 0, 'content': [{'end': 10518.742, 'text': 'You can just go ahead and download the data sets from websites such as Kaggle.', 'start': 10514.78, 'duration': 3.962}, {'end': 10521.544, 'text': 'Now coming back to the problem at hand.', 'start': 10519.643, 'duration': 1.901}, {'end': 10529.85, 'text': 'the data needed for weather forecasting includes measures such as humidity level, temperature pressure, locality,', 'start': 10521.544, 'duration': 8.306}, {'end': 10532.571, 'text': 'whether or not you live in a hill station, and so on.', 'start': 10529.85, 'duration': 2.721}, {'end': 10536.314, 'text': 'So guys, such data must be collected and stored for analysis.', 'start': 10533.072, 'duration': 3.242}, {'end': 10540.796, 'text': 'Now the next stage in machine learning is preparing your data.', 'start': 10537.254, 'duration': 3.542}, {'end': 10544.997, 'text': 'The data you collected is almost never in the right format.', 'start': 10541.296, 'duration': 3.701}, {'end': 10549.759, 'text': "So basically you'll encounter a lot of inconsistencies in the data set.", 'start': 10545.438, 'duration': 4.321}, {'end': 10554.761, 'text': 'Okay, this includes missing values, redundant variables, duplicate values and so on.', 'start': 10550.2, 'duration': 4.561}, {'end': 10560.984, 'text': 'Removing such values is very important because they might lead to wrongful computations and predictions.', 'start': 10555.402, 'duration': 5.582}, {'end': 10568.665, 'text': "So that's why at this stage you must scan the entire data set for any inconsistencies and you have to fix them at this stage.", 'start': 10561.562, 'duration': 7.103}, {'end': 10572.287, 'text': 'Now the next step is exploratory data analysis.', 'start': 10569.266, 'duration': 3.021}, {'end': 10579.83, 'text': 'Now data analysis is all about diving deep into data and finding all the hidden data mysteries.', 'start': 10573.247, 'duration': 6.583}, {'end': 10582.191, 'text': 'Okay, this is where you become a detective.', 'start': 10580.21, 'duration': 1.981}, {'end': 10588.014, 'text': 'So EDA or exploratory data analysis is like a brainstorming of machine learning.', 'start': 10582.712, 'duration': 5.302}], 'summary': 'Data collection, preparation, and exploratory analysis are crucial in machine learning for weather forecasting.', 'duration': 73.234, 'max_score': 10514.78, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10514780.jpg'}, {'end': 10637.61, 'src': 'embed', 'start': 10610.437, 'weight': 3, 'content': [{'end': 10614.378, 'text': 'Okay, so such correlations have to be understood and mapped at this stage.', 'start': 10610.437, 'duration': 3.941}, {'end': 10619.773, 'text': 'Now this stage is followed by stage number five, which is 
building a machine learning model.', 'start': 10614.899, 'duration': 4.874}, {'end': 10627.288, 'text': 'So all the insights and the patterns that you derive during data exploration are used to build the machine learning model.', 'start': 10620.467, 'duration': 6.821}, {'end': 10633.69, 'text': 'So this stage always begins by splitting the data set into two parts, training data and the testing data.', 'start': 10627.708, 'duration': 5.982}, {'end': 10637.61, 'text': 'So earlier in this session I already told you what training and testing data is.', 'start': 10634.03, 'duration': 3.58}], 'summary': 'Understanding and mapping correlations, followed by building a machine learning model with data split into training and testing sets.', 'duration': 27.173, 'max_score': 10610.437, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10610437.jpg'}, {'end': 10718.026, 'src': 'embed', 'start': 10691.859, 'weight': 8, 'content': [{'end': 10697.864, 'text': 'So after building a model by using the training data set, it is finally time to put the model to a test.', 'start': 10691.859, 'duration': 6.005}, {'end': 10705.451, 'text': 'So the testing data set is used to check the efficiency of the model and how accurately it can predict the outcome.', 'start': 10698.845, 'duration': 6.606}, {'end': 10711.236, 'text': 'So once you calculate the accuracy, any improvements in the model have to be implemented in this stage.', 'start': 10705.791, 'duration': 5.445}, {'end': 10718.026, 'text': 'So methods like parameter tuning and cross validation can be used to improve the performance of the model.', 'start': 10712.297, 'duration': 5.729}], 'summary': 'After building a model with training data, testing data is used to check accuracy and implement improvements.', 'duration': 26.167, 'max_score': 10691.859, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10691859.jpg'}, {'end': 10798.903, 'src': 'embed', 'start': 10759.573, 'weight': 4, 'content': [{'end': 10761.954, 'text': 'Let us understand what regression in machine learning is.', 'start': 10759.573, 'duration': 2.381}, {'end': 10764.256, 'text': 'So what exactly is regression?', 'start': 10762.955, 'duration': 1.301}, {'end': 10772.501, 'text': 'The main goal of regression is the construction of an efficient model to predict the dependent attributes from a bunch of attribute variables.', 'start': 10765.016, 'duration': 7.485}, {'end': 10778.465, 'text': 'a regression problem is where the output variable is either real or a continuous value, like salary, weight, area, Etc.', 'start': 10772.501, 'duration': 5.964}, {'end': 10785.5, 'text': 'We can also define regression as a statistical means that is used in applications like housing, investing, Etc,', 'start': 10779.454, 'duration': 6.046}, {'end': 10789.564, 'text': 'to predict the relationship between a dependent variable and a bunch of independent variables.', 'start': 10785.5, 'duration': 4.064}, {'end': 10798.903, 'text': "For example, let's say, in the finance application or investing, we can actually predict the values of certain stock prices, or you know those values,", 'start': 10790.396, 'duration': 8.507}], 'summary': 'Regression in machine learning predicts dependent attributes from independent variables in real or continuous values for applications like finance and investing.', 'duration': 39.33, 'max_score': 10759.573, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10759573.jpg'}, {'end': 10864.976, 'src': 'embed', 'start': 10837.322, 'weight': 5, 'content': [{'end': 10840.423, 'text': 'So let us take a look at various types of regression techniques that we have.', 'start': 10837.322, 'duration': 3.101}, {'end': 10846.871, 'text': 'We have simple linear regression then we have polynomial regression support vector regression decision D regression.', 'start': 10841.189, 'duration': 5.682}, {'end': 10850.152, 'text': 'We have random forest regression and we have logistic regression as well.', 'start': 10847.271, 'duration': 2.881}, {'end': 10855.433, 'text': "That is also a type of regression that we have but for now we'll be focusing on simple linear regression.", 'start': 10850.512, 'duration': 4.921}, {'end': 10859.974, 'text': "So let's talk about how or what exactly simple linear regression first.", 'start': 10856.093, 'duration': 3.881}, {'end': 10864.976, 'text': 'So one of the most interesting and common regression technique is simple linear regression,', 'start': 10860.434, 'duration': 4.542}], 'summary': 'Various regression techniques discussed include simple linear, polynomial, support vector, decision tree, random forest, and logistic regression, with focus on simple linear regression.', 'duration': 27.654, 'max_score': 10837.322, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10837322.jpg'}, {'end': 10960.637, 'src': 'embed', 'start': 10936.47, 'weight': 6, 'content': [{'end': 10944.356, 'text': 'simple linear regression is a regression technique in which the independent variable has a linear relationship with the dependent variable.', 'start': 10936.47, 'duration': 7.886}, {'end': 10946.838, 'text': 'straight line in the diagram is the best fit line,', 'start': 10944.356, 'duration': 2.482}, {'end': 10954.144, 'text': 'and the main goal of the simple linear regression is to consider the given data points and plot the best fit line to fit the model in the best way possible.', 'start': 10946.838, 'duration': 7.306}, {'end': 10960.637, 'text': 'So if you talk about a real-life analogy to explain linear regression, we can take an example of a car resale value.', 'start': 10954.875, 'duration': 5.762}], 'summary': 'Simple linear regression fits a best-fit line to model data points.', 'duration': 24.167, 'max_score': 10936.47, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10936470.jpg'}], 'start': 10514.78, 'title': 'Machine learning for regression', 'summary': 'Covers machine learning data preparation, exploring weather data, resolving inconsistencies, and performing exploratory data analysis. 
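The preparation and exploration stages described above (fixing missing and duplicate values, then looking for correlations) translate into a few lines of pandas. The tiny weather-style table below is made up purely for illustration; it is not the data set used in the video.

```python
import pandas as pd
import numpy as np

# A tiny, made-up weather-style data set to illustrate the preparation steps.
df = pd.DataFrame({
    "temperature": [31.0, 29.5, np.nan, 33.2, 29.5],
    "humidity":    [0.71, 0.65, 0.80, np.nan, 0.65],
    "pressure":    [1012, 1009, 1003, 998, 1009],
    "rain":        [0, 0, 1, 1, 0],
})

# Data preparation: spot and fix inconsistencies such as missing and duplicate values.
print(df.isnull().sum())                     # missing values per column
df = df.drop_duplicates()                    # remove duplicate rows
df = df.fillna(df.mean(numeric_only=True))   # one simple way to fill remaining gaps

# Exploratory data analysis: summary statistics and correlations between variables.
print(df.describe())
print(df.corr())
```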
it also delves into the machine learning process, regression types, and implementing linear regression in python, achieving a mean squared error of 2004.', 'chapters': [{'end': 10588.014, 'start': 10514.78, 'title': 'Machine learning data preparation', 'summary': 'Discusses the process of downloading and collecting weather data for forecasting, preparing the data by resolving inconsistencies and removing erroneous values, and then performing exploratory data analysis for deeper insights.', 'duration': 73.234, 'highlights': ['Exploratory data analysis involves diving deep into the data to find hidden patterns and mysteries, akin to a brainstorming of machine learning.', 'Preparing the data involves identifying and fixing inconsistencies such as missing values, redundant variables, and duplicate values to ensure accurate computations and predictions.', 'Data collection for weather forecasting includes measures like humidity level, temperature, pressure, locality, and the consideration of living in a hill station.']}, {'end': 11163.396, 'start': 10588.473, 'title': 'Machine learning process & regression', 'summary': 'Discusses the machine learning process, including data exploration, building a machine learning model, model evaluation and optimization, and predictions, followed by an explanation of regression in machine learning, types of regression techniques, and terminologies. key points include understanding correlations in data exploration, building a machine learning model using insights from data exploration, evaluating model efficiency, and types of regression techniques such as simple linear regression and its terminologies.', 'duration': 574.923, 'highlights': ['The chapter discusses the machine learning process, including data exploration, building a machine learning model, model evaluation and optimization, and predictions, followed by an explanation of regression in machine learning, types of regression techniques, and terminologies.', 'Building a machine learning model involves using insights and patterns derived during data exploration, splitting the data set into training and testing data, and choosing the right algorithm based on the problem and data complexity.', 'Model evaluation and optimization include checking the efficiency of the model using the testing data set, implementing improvements through methods like parameter tuning and cross validation, and putting the evaluated and improved model to use for predictions.', 'Regression in machine learning is the construction of an efficient model to predict dependent attributes from attribute variables, with the output variable being real or a continuous value, used in applications like housing and investing.', 'Types of regression techniques include simple linear regression, polynomial regression, support vector machine regression, decision tree regression, and random forest regression, with simple linear regression being a common technique to predict the outcome of a dependent variable based on independent variables with a linear relationship.', 'Terminologies in linear regression include the cost function, providing the best possible values for the intercept and slope to make the best fit line, and the gradient descent, a method of updating values to reduce the mean squared error, used to update the values of the intercept and slope in Python with the sklearn or scikit-learn library.', 'Advantages of linear regression include performing well for linearly separable data, easy implementation and interpretation, efficiency in 
training, handling overfitting using techniques like dimensionality reduction and regularization, and extrapolation beyond a specific data set.']}, {'end': 11471.41, 'start': 11164.117, 'title': 'Linear regression in machine learning', 'summary': 'Covers the advantages and disadvantages of linear regression, use cases including sales forecasting, risk analysis, housing applications, and finance, and steps to implement linear regression using sklearn library for accuracy evaluation.', 'duration': 307.293, 'highlights': ['Linear regression has several disadvantages including assumptions of linearity between dependent and independent variables, proneness to noise, overfitting, sensitivity to outliers, and multicollinearity. assumptions of linearity between dependent and independent variables, proneness to noise, overfitting, sensitivity to outliers, and multicollinearity', 'Linear regression can be used for sales forecasting, risk analysis for disease predictions, housing applications, and finance applications such as predicting stock prices and investment evaluation. sales forecasting, risk analysis for disease predictions, housing applications, and finance applications', 'Steps to implement linear regression using sklearn library include loading the data, exploring the data, slicing the data, training and splitting the data, generating the model, and evaluating the accuracy using the fit and predict method and mean squared error. loading the data, exploring the data, slicing the data, training and splitting the data, generating the model, and evaluating the accuracy using the fit and predict method and mean squared error']}, {'end': 11993.365, 'start': 11472.228, 'title': 'Implementing linear regression in python', 'summary': 'Discusses implementing linear regression in python using a disease dataset, splitting the data for training and testing, generating and fitting the model, calculating accuracy, and visualizing the best fit line, achieving a mean squared error of 2004.', 'duration': 521.137, 'highlights': ['The chapter demonstrates splitting the data for training and testing, using the last 30 data entries for testing and the last 20 entries for training, achieving a mean squared error of 2004.', 'It highlights the process of generating and fitting the linear regression model, using the disease dataset and target variable, and making predictions with disease X test, achieving a mean squared error of 2004.', 'The chapter discusses visualizing the best fit line by plotting a graph using disease X test and y predict, achieving a mean squared error of 2004.', 'The chapter explains the importance of segregating data separately and the limitations of implementing linear regression using all columns of the dataset, and demonstrates the use of a custom dataset for car resale value prediction.']}], 'duration': 1478.585, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s10514780.jpg', 'highlights': ['Exploratory data analysis involves diving deep into the data to find hidden patterns and mysteries, akin to a brainstorming of machine learning.', 'Preparing the data involves identifying and fixing inconsistencies such as missing values, redundant variables, and duplicate values to ensure accurate computations and predictions.', 'Data collection for weather forecasting includes measures like humidity level, temperature, pressure, locality, and the consideration of living in a hill station.', 'Building a machine learning model involves using 
insights and patterns derived during data exploration, splitting the data set into training and testing data, and choosing the right algorithm based on the problem and data complexity.', 'Regression in machine learning is the construction of an efficient model to predict dependent attributes from attribute variables, with the output variable being real or a continuous value, used in applications like housing and investing.', 'Types of regression techniques include simple linear regression, polynomial regression, support vector machine regression, decision tree regression, and random forest regression, with simple linear regression being a common technique to predict the outcome of a dependent variable based on independent variables with a linear relationship.', 'Advantages of linear regression include performing well for linearly separable data, easy implementation and interpretation, efficiency in training, handling overfitting using techniques like dimensionality reduction and regularization, and extrapolation beyond a specific data set.', 'Linear regression can be used for sales forecasting, risk analysis for disease predictions, housing applications, and finance applications such as predicting stock prices and investment evaluation.', 'The chapter demonstrates splitting the data for training and testing, using the last 30 data entries for testing and the last 20 entries for training, achieving a mean squared error of 2004.', 'It highlights the process of generating and fitting the linear regression model, using the disease dataset and target variable, and making predictions with disease X test, achieving a mean squared error of 2004.', 'The chapter discusses visualizing the best fit line by plotting a graph using disease X test and y predict, achieving a mean squared error of 2004.', 'The chapter explains the importance of segregating data separately and the limitations of implementing linear regression using all columns of the dataset, and demonstrates the use of a custom dataset for car resale value prediction.']}, {'end': 13601.529, 'segs': [{'end': 12814.311, 'src': 'embed', 'start': 12788.048, 'weight': 0, 'content': [{'end': 12792.01, 'text': 'So in that case linear regression helps you to predict what will be the temperature tomorrow.', 'start': 12788.048, 'duration': 3.962}, {'end': 12799.016, 'text': "whereas logistic regression will only tell you which is going to rain or not or whether it's cloudy or not which is going to snow or not.", 'start': 12792.45, 'duration': 6.566}, {'end': 12800.718, 'text': 'So these values are discrete.', 'start': 12799.337, 'duration': 1.381}, {'end': 12802.68, 'text': 'Whereas if you apply linear regression,', 'start': 12801.038, 'duration': 1.642}, {'end': 12808.445, 'text': "you'll be predicting things like what is the temperature tomorrow or what is the temperature day after tomorrow, and all those things.", 'start': 12802.68, 'duration': 5.765}, {'end': 12812.009, 'text': 'So these are the slight differences between linear regression and logistic regression.', 'start': 12808.746, 'duration': 3.263}, {'end': 12814.311, 'text': 'So moving ahead we have classification problem.', 'start': 12812.249, 'duration': 2.062}], 'summary': 'Linear regression predicts temperature, logistic regression predicts weather conditions.', 'duration': 26.263, 'max_score': 12788.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s12788048.jpg'}, {'end': 13188.557, 'src': 'embed', 'start': 13165.027, 
'weight': 3, 'content': [{'end': 13172.132, 'text': 'So numpy is a library in Python which basically stands for numerical Python and it is widely used to perform any scientific computation.', 'start': 13165.027, 'duration': 7.105}, {'end': 13174.232, 'text': "Next we'll be importing seaborn.", 'start': 13172.691, 'duration': 1.541}, {'end': 13176.913, 'text': 'So seaborn is a library for statistical plotting.', 'start': 13174.652, 'duration': 2.261}, {'end': 13179.534, 'text': "So I'll say import seaborn as SNS.", 'start': 13177.333, 'duration': 2.201}, {'end': 13181.375, 'text': "I'll also import matplotlib.", 'start': 13179.874, 'duration': 1.501}, {'end': 13184.336, 'text': 'So matplotlib library is again for plotting.', 'start': 13181.995, 'duration': 2.341}, {'end': 13188.557, 'text': "So I'll say import matplotlib.pyplot as PLD.", 'start': 13184.676, 'duration': 3.881}], 'summary': 'Numpy is used for scientific computation in python, seaborn and matplotlib are also imported for statistical and general plotting purposes.', 'duration': 23.53, 'max_score': 13165.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s13165027.jpg'}, {'end': 13516.296, 'src': 'embed', 'start': 13492.151, 'weight': 1, 'content': [{'end': 13498.532, 'text': 'the lowest class or the cheapest class to get into the Titanic, and the people who did survive majorly belong to the higher classes.', 'start': 13492.151, 'duration': 6.381}, {'end': 13502.873, 'text': 'So here one and two has more rise than the passenger who were traveling in the third class.', 'start': 13498.932, 'duration': 3.941}, {'end': 13508.354, 'text': 'So here we have concluded that the passengers who did not survive a majorly of third class or, you can say,', 'start': 13503.373, 'duration': 4.981}, {'end': 13514.416, 'text': 'the lowest class and the passengers who were traveling in first and second class would tend to survive more next.', 'start': 13508.354, 'duration': 6.062}, {'end': 13516.296, 'text': 'Let us plot a graph for the age distribution.', 'start': 13514.436, 'duration': 1.86}], 'summary': 'Most titanic survivors were from higher classes.', 'duration': 24.145, 'max_score': 13492.151, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s13492151.jpg'}], 'start': 11994.705, 'title': 'Analyzing car price and logistic regression', 'summary': 'Explores the relationship between car horsepower and price, including a car of 450 horsepower priced at around $70,000 and a car of 50 horsepower priced at around $10,000. it also introduces logistic regression for predicting outcomes of a binary dependent variable and covers its practical applications in weather prediction, illness determination, and data analysis of the titanic dataset, which comprises 891 passengers.', 'chapters': [{'end': 12172.107, 'start': 11994.705, 'title': 'Relation between horsepower and car price', 'summary': 'Discusses the evident relation between car horsepower and price, with a car of 450 horsepower priced at around $70,000 and a car of 50 horsepower priced at around $10,000. 
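The horsepower-versus-price relationship summarised above (roughly $10,000 at 50 hp up to $70,000 at 450 hp) is a natural fit for simple linear regression: reshape the independent variable into a column, fit, and read off the coefficient and intercept. The data points below are made up to match that rough trend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up horsepower/price pairs in the spirit of the example above.
horsepower = np.array([50, 100, 150, 250, 350, 450])
price      = np.array([10000, 18000, 26000, 40000, 56000, 70000])

# The independent variable has to be reshaped into a column before fitting.
X = horsepower.reshape(-1, 1)

model = LinearRegression()
model.fit(X, price)

print(model.coef_, model.intercept_)      # slope and intercept of the best fit line
print(model.predict(np.array([[200]])))   # predicted resale price for a 200 hp car
```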
it also outlines the process of building a linear regression model to predict car prices based on horsepower.', 'duration': 177.402, 'highlights': ['The relation between car horsepower and price is evident, with a car of 450 horsepower priced at around $70,000 and a car of 50 horsepower priced at around $10,000.', 'The process of building a linear regression model to predict car prices based on horsepower involves reshaping the independent variables and using the fit method to generate the model.', 'The coefficient and intercept for the linear regression model are printed to analyze the relationship between car horsepower and price.']}, {'end': 12697.449, 'start': 12174.308, 'title': 'Logistic regression overview', 'summary': 'Covers the implementation of simple linear regression on a custom data set and then delves into the what and why of logistic regression, emphasizing its use for predicting outcomes of a binary dependent variable and explaining the differences between linear and logistic regression.', 'duration': 523.141, 'highlights': ['The algorithm for logistic regression is most widely used when the dependent variable is in binary format, predicting outcomes of categorical dependent variable with values being discrete, such as 0 or 1, yes or no, true or false, etc.', 'Logistic regression restricts the value of Y to be between 0 and 1, unlike linear regression where the value of Y is in a range, and the main aim is to clip the linear line at 0 and 1, leading to the formulation of the sigmoid function curve.', 'The sigmoid function converts any value from minus infinity to infinity to discrete values of 0 or 1, with the concept of a threshold value indicating the probability of either winning or losing, thereby making the output discrete.', 'In logistic regression, the equation is transformed by dividing it by 1 minus y and taking the logarithmic to get the range between minus infinity to infinity, making the Y value range between 0 to 1, and this transformation is automatically handled in Python by calling the logistic regression function.']}, {'end': 13118.392, 'start': 12697.549, 'title': 'Logistic regression applications & practical implementation', 'summary': 'Discusses the applications of logistic regression, such as weather prediction and illness determination, and the practical implementation through two projects: titanic data analysis and suv car data analysis.', 'duration': 420.843, 'highlights': ['Logistic regression can be used in weather prediction to determine if it is raining, sunny, or cloudy, with discrete values for predictions. It is used to predict weather conditions such as rain, sunshine, and cloudiness with discrete values, distinguishing it from linear regression.', "The practical implementation of logistic regression includes analyzing the Titanic dataset to predict factors contributing to a person's survival using categorical and numerical features. An analysis of the Titanic dataset is performed to predict factors contributing to a person's survival, such as passenger class, gender, age, and other features.", 'Logistic regression is also utilized in determining illnesses by analyzing patient data, including factors like sugar levels, blood pressure, age, and medical history. 
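The sigmoid and threshold behaviour described above can be illustrated with a short NumPy sketch; the sample values are arbitrary and only show how a continuous score becomes a discrete 0/1 outcome.

import numpy as np

def sigmoid(z):
    # Squash any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 5)            # outputs of the linear part of the model
probs = sigmoid(z)                   # probabilities between 0 and 1
labels = (probs >= 0.5).astype(int)  # a 0.5 threshold gives discrete classes

print(np.round(probs, 3))   # [0.002 0.047 0.5   0.953 0.998]
print(labels)               # [0 0 1 1 1]

# The log-odds transform mentioned in the transcript maps a probability
# back to the full real line: log(p / (1 - p)).
p = 0.953
print(np.log(p / (1 - p)))  # roughly 3, the z value that produced it
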
It aids in determining illnesses by analyzing patient data, including factors like sugar levels, blood pressure, age, and medical history to identify the outcome and severity of the illness.', "Practical implementation extends to analyzing SUV car data to understand factors influencing people's interest in purchasing SUVs. The practical implementation involves analyzing SUV car data to understand the factors influencing people's interest in purchasing SUVs and the insights obtained from the analysis."]}, {'end': 13327.879, 'start': 13118.392, 'title': 'Titanic data analysis', 'summary': 'Covers the process of importing libraries and collecting data for titanic data analysis, including key steps like importing pandas, numpy, seaborn, and matplotlib, and calculating the total number of passengers, with a total of 891 passengers in the original dataset.', 'duration': 209.487, 'highlights': ["The total number of passengers in the original dataset is 891. The length function is used to calculate the total length of the 'Titanic data' index, which yields the total number of passengers in the dataset.", 'Importing libraries including pandas, numpy, seaborn, and matplotlib for data analysis. The process involves importing various libraries such as pandas, numpy, seaborn, and matplotlib for data analysis purposes, with brief explanations of their respective uses.', 'The first step involves collecting data and importing all the required libraries. The initial step includes collecting data and importing necessary libraries such as pandas, numpy, seaborn, and matplotlib, essential for the Titanic data analysis.']}, {'end': 13601.529, 'start': 13328.199, 'title': 'Analyzing titanic data', 'summary': 'Discusses analyzing the titanic dataset, including survival rates, gender distribution, passenger class, age distribution, and fare size.', 'duration': 273.33, 'highlights': ['The passengers who did not survive majorly belonged to the third class, while the majority of survivors belong to the higher classes, indicating a class-based survival disparity.', 'The analysis reveals that on average, women were more than three times more likely to survive than men, indicating a gender-based survival disparity.', 'The age distribution analysis indicates a higher proportion of young passengers (0-10 years old) and average-aged passengers onboard the Titanic.', 'The fare size analysis shows that the majority of fares fall within the range of 0 to 100, with a decreasing population as the fare size increases.']}], 'duration': 1606.824, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s11994705.jpg', 'highlights': ['The relation between car horsepower and price is evident, with a car of 450 horsepower priced at around $70,000 and a car of 50 horsepower priced at around $10,000.', 'Logistic regression restricts the value of Y to be between 0 and 1, unlike linear regression where the value of Y is in a range, and the main aim is to clip the linear line at 0 and 1, leading to the formulation of the sigmoid function curve.', 'Logistic regression can be used in weather prediction to determine if it is raining, sunny, or cloudy, with discrete values for predictions.', "The total number of passengers in the original dataset is 891. 
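A minimal sketch of this data-collection step is shown below; "titanic.csv" is a placeholder path for the standard Kaggle Titanic training file (891 rows), and the column names follow that file.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

titanic_data = pd.read_csv("titanic.csv")      # placeholder path to the training file

print(len(titanic_data.index))                 # total number of passengers: 891
print(titanic_data.head())                     # quick look at the first rows

sns.countplot(x="Survived", hue="Sex", data=titanic_data)   # survival split by gender
plt.show()
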
The length function is used to calculate the total length of the 'Titanic data' index, which yields the total number of passengers in the dataset.", 'The passengers who did not survive majorly belonged to the third class, while the majority of survivors belong to the higher classes, indicating a class-based survival disparity.']}, {'end': 15732.736, 'segs': [{'end': 14822.241, 'src': 'embed', 'start': 14797.644, 'weight': 3, 'content': [{'end': 14803.551, 'text': 'So over here if you want to do it manually you have to plus these two numbers, which is 105 plus 63.', 'start': 14797.644, 'duration': 5.907}, {'end': 14809.595, 'text': 'So this comes out to almost 168 and then you have to divide it by the sum of all the four numbers.', 'start': 14803.551, 'duration': 6.044}, {'end': 14813.257, 'text': 'So 105 plus 63 plus 21 plus 25.', 'start': 14809.935, 'duration': 3.322}, {'end': 14815.658, 'text': 'So this gives me a result of 214.', 'start': 14813.257, 'duration': 2.401}, {'end': 14822.241, 'text': "So now if you divide these two number you'll get the same accuracy that is 78% or you can say 0.78.", 'start': 14815.658, 'duration': 6.583}], 'summary': 'Manually calculate 105+63=168, divide by sum of 105+63+21+25=214, resulting in 78% accuracy.', 'duration': 24.597, 'max_score': 14797.644, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s14797644.jpg'}, {'end': 14942.548, 'src': 'embed', 'start': 14913.354, 'weight': 8, 'content': [{'end': 14918.416, 'text': 'So guys I will not be going through all the details of data cleaning and analyzing the part that part.', 'start': 14913.354, 'duration': 5.062}, {'end': 14919.277, 'text': "I'll just leave it on you.", 'start': 14918.477, 'duration': 0.8}, {'end': 14921.718, 'text': 'So just go ahead and practice as much as you can.', 'start': 14919.497, 'duration': 2.221}, {'end': 14923.059, 'text': 'All right.', 'start': 14922.739, 'duration': 0.32}, {'end': 14925.54, 'text': 'So my second project is SUV predictions.', 'start': 14923.379, 'duration': 2.161}, {'end': 14927.981, 'text': 'All right.', 'start': 14927.661, 'duration': 0.32}, {'end': 14930.562, 'text': 'So first of all, I have to import all the libraries.', 'start': 14928.381, 'duration': 2.181}, {'end': 14935.325, 'text': "So I say import numpy as NP and similarly I'll do the rest of it.", 'start': 14930.622, 'duration': 4.703}, {'end': 14942.548, 'text': 'All right.', 'start': 14942.208, 'duration': 0.34}], 'summary': 'Practice data cleaning and analysis for suv predictions project.', 'duration': 29.194, 'max_score': 14913.354, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s14913354.jpg'}, {'end': 15011.138, 'src': 'embed', 'start': 14983.828, 'weight': 1, 'content': [{'end': 14989.291, 'text': 'So here should fetch me all the rows and only the second and third column which is age and estimated salary.', 'start': 14983.828, 'duration': 5.463}, {'end': 14993.393, 'text': 'So these are the factors which will be used to predict the dependent variable that is purchase.', 'start': 14989.671, 'duration': 3.722}, {'end': 14996.034, 'text': 'So here my dependent variable is purchase.', 'start': 14993.953, 'duration': 2.081}, {'end': 14998.83, 'text': 'and independent variable is of age and salary.', 'start': 14996.829, 'duration': 2.001}, {'end': 15001.772, 'text': "So I'll say data set dot I log.", 'start': 14999.171, 'duration': 2.601}, {'end': 15005.314, 'text': "I'll have all the 
rows and I just want fourth column.", 'start': 15002.332, 'duration': 2.982}, {'end': 15007.636, 'text': 'That is my purchase column dot values.', 'start': 15005.354, 'duration': 2.282}, {'end': 15011.138, 'text': "All right, so I've just forgot one one square bracket over here.", 'start': 15008.176, 'duration': 2.962}], 'summary': 'Fetching all rows with age and estimated salary to predict purchase.', 'duration': 27.31, 'max_score': 14983.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s14983828.jpg'}, {'end': 15493.536, 'src': 'embed', 'start': 15463.721, 'weight': 0, 'content': [{'end': 15470.124, 'text': 'okay. whereas if you want to predict something which is the discrete outcome to give you an example,', 'start': 15463.721, 'duration': 6.403}, {'end': 15474.867, 'text': 'you want to predict that whether i will win the match or not, okay,', 'start': 15470.124, 'duration': 4.743}, {'end': 15478.869, 'text': 'i want to predict whether the particular employee will churn out from the company or not.', 'start': 15474.867, 'duration': 4.002}, {'end': 15483.986, 'text': 'You want to predict whether the person will have a cancer or not.', 'start': 15479.761, 'duration': 4.225}, {'end': 15486.112, 'text': "You're getting the point right?", 'start': 15485.151, 'duration': 0.961}, {'end': 15493.536, 'text': 'So if you have, the output which you want to predict is in the form of yes or no, or true or false, right?', 'start': 15486.372, 'duration': 7.164}], 'summary': 'Predict discrete outcomes such as employee churn, cancer, or match results.', 'duration': 29.815, 'max_score': 15463.721, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s15463721.jpg'}, {'end': 15627.314, 'src': 'embed', 'start': 15604.215, 'weight': 9, 'content': [{'end': 15613.202, 'text': 'Now, what is a decision tree? 
What I was telling you is, believe me or not, decision tree is something you use every day in your daily life.', 'start': 15604.215, 'duration': 8.987}, {'end': 15619.247, 'text': 'For example, you take decisions and today also, you took a decision to attend this webinar right?', 'start': 15613.242, 'duration': 6.005}, {'end': 15624.532, 'text': 'But how do you decide a decision based on various further decisions right?', 'start': 15619.788, 'duration': 4.744}, {'end': 15627.314, 'text': 'For example, for today joining the webinar.', 'start': 15624.872, 'duration': 2.442}], 'summary': 'Decision tree is a common tool for making daily decisions, like attending a webinar.', 'duration': 23.099, 'max_score': 15604.215, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s15604215.jpg'}], 'start': 13601.529, 'title': 'Titanic analysis and machine learning', 'summary': 'Covers titanic survival analysis based on gender and passenger class, logistic regression model achieving 78% accuracy, and supervised machine learning overview including decision tree algorithm application.', 'chapters': [{'end': 13638.226, 'start': 13601.529, 'title': 'Titanic survival analysis', 'summary': 'Presents an analysis of survival rates based on gender and passenger class, with a demonstration of the count plot for the number of siblings or spouses aboard the titanic.', 'duration': 36.697, 'highlights': ['The analysis includes survival rates based on gender, revealing that females tend to survive more than males.', 'Passenger class is also analyzed, showing differences in survival rates among first, second, and third class passengers.', 'A count plot is demonstrated for the number of siblings or spouses aboard the Titanic, providing a visual representation of the data.']}, {'end': 14099.89, 'start': 13638.706, 'title': 'Data wrangling in titanic dataset', 'summary': 'Discusses data wrangling on the titanic dataset, including identifying and handling missing values, dropping unnecessary columns, and converting string variables into categorical variables for logistic regression, to ensure a clean and usable dataset for further analysis.', 'duration': 461.184, 'highlights': ['Removing unnecessary columns and handling missing values is crucial for data wrangling, as it directly affects the accuracy of the dataset. Removing unnecessary columns and handling missing values is crucial for data wrangling, as it directly affects the accuracy of the dataset.', "The dataset contains 177 missing values in the 'Age' column, a large number of missing values in the 'Cabin' column, and very few missing values in the 'Embarked' column. The dataset contains 177 missing values in the 'Age' column, a large number of missing values in the 'Cabin' column, and very few missing values in the 'Embarked' column.", "Passengers traveling in first and second class tend to be older compared to those in the third class, as indicated by analyzing the 'Age' column based on passenger class. Passengers traveling in first and second class tend to be older compared to those in the third class, as indicated by analyzing the 'Age' column based on passenger class.", "Dropping the 'Cabin' column due to a large number of missing values and cleaning the dataset to remove all null values ensures a cleaner dataset for analysis. 
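The wrangling steps summarised above (checking null counts, dropping the sparse Cabin column, removing remaining null rows, and dummy-encoding the string columns) can be sketched as follows, again assuming the standard Kaggle column names.

import pandas as pd

titanic_data = pd.read_csv("titanic.csv")          # placeholder path, as before

print(titanic_data.isnull().sum())                 # Age has ~177 nulls, Cabin far more

titanic_data.drop("Cabin", axis=1, inplace=True)   # too many missing values to keep
titanic_data.dropna(inplace=True)                  # drop the remaining null rows

# Convert string/categorical columns into numeric dummy variables for logistic regression.
sex = pd.get_dummies(titanic_data["Sex"], drop_first=True, prefix="Sex")
embark = pd.get_dummies(titanic_data["Embarked"], drop_first=True, prefix="Embarked")
pclass = pd.get_dummies(titanic_data["Pclass"], drop_first=True, prefix="Pclass")

titanic_data = pd.concat([titanic_data, sex, embark, pclass], axis=1)
titanic_data.drop(["Sex", "Embarked", "Pclass", "PassengerId", "Name", "Ticket"],
                  axis=1, inplace=True)            # remove the originals and irrelevant columns
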
Dropping the 'Cabin' column due to a large number of missing values and cleaning the dataset to remove all null values ensures a cleaner dataset for analysis.", 'Converting string variables into categorical variables and dummy variables is necessary for implementing logistic regression and ensuring the dataset is suitable for machine learning analysis. Converting string variables into categorical variables and dummy variables is necessary for implementing logistic regression and ensuring the dataset is suitable for machine learning analysis.']}, {'end': 14459.709, 'start': 14100.87, 'title': 'Data wrangling and cleaning', 'summary': 'Covers data wrangling using pandas, including converting categorical data into numerical format and dropping irrelevant columns, resulting in a cleaned dataset for further analysis and model building.', 'duration': 358.839, 'highlights': ['Using pandas to convert categorical data into numerical format for gender, embarked location, and passenger class. The speaker demonstrates using pandas to convert categorical data such as gender (male/female), embarked location (C, Q, S), and passenger class (1, 2, 3) into numerical format, enabling easier analysis and modeling.', 'Dropping irrelevant columns such as passenger ID, name, and ticket number to clean the dataset. The speaker shows the process of dropping irrelevant columns like passenger ID, name, and ticket number, streamlining the dataset for further analysis and model building.', 'Explaining the process of splitting the dataset into train and test subsets for model training and testing. The speaker explains the importance of splitting the dataset into train and test subsets, followed by building and testing models on the respective subsets, a crucial step in data analysis and model evaluation.']}, {'end': 14878.752, 'start': 14459.709, 'title': 'Logistic regression model', 'summary': 'Covers the process of creating a logistic regression model to predict survival rate using the titanic dataset, achieving an accuracy of 78%, and introduces a new project on suv data analysis.', 'duration': 419.043, 'highlights': ['The process of creating a logistic regression model to predict survival rate using the Titanic dataset The chapter extensively covers the steps involved in creating a logistic regression model to predict survival rate using the Titanic dataset.', 'Achieving an accuracy of 78% in predicting survival rate The accuracy of the logistic regression model in predicting survival rate is highlighted as 78%.', 'Introduction of a new project on SUV data analysis The chapter introduces a new project on SUV data analysis, aimed at predicting the category of people interested in buying the new SUV released by a car company.']}, {'end': 15316.954, 'start': 14878.992, 'title': 'Logistic regression for suv prediction', 'summary': 'Discusses implementing logistic regression to predict whether a person can purchase an suv based on age and salary, achieving an 89% accuracy.', 'duration': 437.962, 'highlights': ['The model achieved an accuracy of 89% in predicting SUV purchases based on age and salary. The accuracy of the logistic regression model in predicting SUV purchases based on age and salary is 89%.', 'Logistic regression was applied to predict SUV purchases using age and salary as independent variables. The logistic regression model was applied to predict SUV purchases using age and salary as independent variables.', 'Data set was divided into training and test subsets in a 75-25 ratio. 
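A short sketch of the model-building step described here, continuing from the wrangled titanic_data frame above; the 70/30 split and the exact feature columns are assumptions, while the roughly 78% accuracy and the 168/214 confusion-matrix arithmetic are the figures quoted in the transcript.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

X = titanic_data.drop("Survived", axis=1)    # independent variables
y = titanic_data["Survived"]                 # dependent variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

# The accuracy can also be read off the confusion matrix by hand; in the
# transcript's run this was (105 + 63) / (105 + 63 + 21 + 25) = 168 / 214, about 0.78.
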
The data set was divided into training and test subsets in a 75-25 ratio for model evaluation.', 'The chapter explains the concept of classification in machine learning with an example of predicting whether Prime Minister Modi will be re-elected. The chapter explains the concept of classification in machine learning with an example of predicting whether Prime Minister Modi will be re-elected.']}, {'end': 15732.736, 'start': 15317.014, 'title': 'Supervised machine learning overview', 'summary': 'Explains supervised machine learning, including its types - regression and classification, with examples and its application in decision tree algorithm, emphasizing on its graphical representation and practical usage.', 'duration': 415.722, 'highlights': ["Supervised machine learning is classified into regression and classification types, where regression predicts continuous outcomes and classification predicts discrete outcomes, such as 'yes' or 'no'.", 'Decision tree is a graphical representation of decision-making process, used in everyday scenarios, and is a fundamental algorithm in supervised machine learning for making choices based on various factors.', 'Supervised machine learning involves providing labeled training data to predict outcomes, making it useful for scenarios like predicting community pricing, employee churn, or disease diagnosis.', "The process of supervised machine learning involves dividing the dataset into different categories or groups by adding a label, making it suitable for predicting 'yes' or 'no' outcomes, fraud detection, or employee turnover."]}], 'duration': 2131.207, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s13601529.jpg', 'highlights': ['Logistic regression model achieves 78% accuracy in predicting survival rate', 'Survival rates analyzed based on gender and passenger class', 'Passenger class differences in survival rates demonstrated', "Data wrangling crucial for accuracy, handling 177 missing 'Age' values", 'Converting string variables into categorical variables necessary for logistic regression', 'Pandas used to convert categorical data into numerical format for analysis', 'Process of splitting dataset into train and test subsets explained', 'Logistic regression model applied to predict SUV purchases with 89% accuracy', 'Supervised machine learning involves providing labeled training data to predict outcomes', 'Decision tree algorithm fundamental in supervised machine learning for decision-making']}, {'end': 18368.388, 'segs': [{'end': 16759.925, 'src': 'embed', 'start': 16726.14, 'weight': 0, 'content': [{'end': 16731.223, 'text': "And I'm explaining you with the help of one simplest example and we will do some maths over it.", 'start': 16726.14, 'duration': 5.083}, {'end': 16740.552, 'text': 'So imagine that there is a data set where I want to find it out whether I will play the match or not.', 'start': 16732.005, 'duration': 8.547}, {'end': 16745.276, 'text': 'Okay, there is a cricket match or there is a football match or whatever it is.', 'start': 16741.873, 'duration': 3.403}, {'end': 16749.641, 'text': 'I want to find it out whether I will play the match or not.', 'start': 16745.759, 'duration': 3.882}, {'end': 16759.925, 'text': 'Now, how do the decision tree look like? 
Decision tree look like this.', 'start': 16750.022, 'duration': 9.903}], 'summary': 'Explaining decision tree using a simple example and applying mathematical concepts to determine match participation.', 'duration': 33.785, 'max_score': 16726.14, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s16726140.jpg'}, {'end': 16968.554, 'src': 'embed', 'start': 16942.688, 'weight': 3, 'content': [{'end': 16948.716, 'text': 'In this case, try imagining you have to match each fruit with its label.', 'start': 16942.688, 'duration': 6.028}, {'end': 16950.309, 'text': 'right now.', 'start': 16949.429, 'duration': 0.88}, {'end': 16953.79, 'text': 'in this case, the impurity cannot be equal to zero.', 'start': 16950.309, 'duration': 3.481}, {'end': 16962.532, 'text': 'the impurity will not be equal to zero in this case, because what would happen here is that you have a chances of misclassification, right?', 'start': 16953.79, 'duration': 8.742}, {'end': 16964.253, 'text': 'this is a very important concept.', 'start': 16962.532, 'duration': 1.721}, {'end': 16968.554, 'text': 'right. when you have perfect thing, the misclassification will not happen.', 'start': 16964.253, 'duration': 4.301}], 'summary': 'Impurity must not be zero to avoid misclassification in fruit labeling.', 'duration': 25.866, 'max_score': 16942.688, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s16942688.jpg'}], 'start': 15733.136, 'title': 'Classification-based algorithms and decision tree concepts', 'summary': 'Discusses the use of previous outcomes for decision-making, introduces classification-based algorithms and decision tree terminologies, emphasizes impurity and entropy concepts, and explains the process of building decision trees and random forests in python.', 'chapters': [{'end': 15778.405, 'start': 15733.136, 'title': 'Classification-based algorithms', 'summary': 'Discusses the use of previous outcomes to make decisions, with examples such as predicting credit card fraud based on factors like salary, job profile, and previous card activity. 
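The fruit-basket impurity idea described above can be made concrete with the Gini impurity formula (1 minus the sum of squared class proportions); the fruit counts below are made up for illustration.

def gini_impurity(counts):
    # counts: number of items per label in one basket/node.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

pure_basket = [10, 0]    # only apples: no chance of mislabelling
mixed_basket = [6, 4]    # apples and oranges mixed together

print(gini_impurity(pure_basket))    # 0.0  -> perfectly pure node
print(gini_impurity(mixed_basket))   # 0.48 -> misclassification is possible
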
it introduces the simplest form of classification-based algorithm and then delves into the concept of random forest.', 'duration': 45.269, 'highlights': ['The chapter discusses using previous outcomes to make decisions, such as predicting credit card fraud based on factors like salary, job profile, and previous card activity.', 'It introduces the simplest form of classification-based algorithm.', 'It delves into the concept of random forest.']}, {'end': 16537.383, 'start': 15778.405, 'title': 'Understanding random forest, naive base, and k nearest neighbor', 'summary': 'Introduces the concept of random forests, naive base algorithm, and k nearest neighbor, highlighting their application in decision making, the base theorem, and customer profiling, through examples and explanations.', 'duration': 758.978, 'highlights': ['Random Forest: Building a forest of decision trees to make powerful decisions based on majority outcomes, also known as bagging methodology.', 'Naive Base Algorithm: Making decisions based on conditional probability using the base theorem, illustrated through an example of diagnosing a disease.', 'K Nearest Neighbor: Grouping similar data points to make decisions based on patterns and similarities, demonstrated through customer profiling based on transaction data.']}, {'end': 16895.968, 'start': 16537.383, 'title': 'Understanding decision tree terminologies', 'summary': 'Explains the terminologies of decision trees, such as root node, branches, parent/child nodes, splitting, and leaf nodes, and emphasizes the importance of using genie index and information gain to decide which feature to use first in building a decision tree.', 'duration': 358.585, 'highlights': ['The chapter emphasizes the importance of using genie index and information gain to decide which feature to use first in building a decision tree. Genie index and information gain are crucial in determining the feature to be used as the root node in a decision tree, aiding in the process of building an effective tree structure.', 'Explains the terminologies of decision trees, such as root node, branches, parent/child nodes, splitting, and leaf nodes. The chapter provides an explanation of key terminologies in decision trees, including root node, branches, parent/child nodes, splitting, and leaf nodes, essential for understanding the structure and functioning of decision trees.', 'Illustrates the process of deciding which feature to use first as the root node in building a decision tree. The chapter illustrates the decision-making process of selecting the initial feature to be used as the root node in constructing a decision tree, highlighting the significance of this crucial step in the tree-building process.']}, {'end': 17377.314, 'start': 16896.028, 'title': 'Understanding impurity and entropy', 'summary': 'Explains the concept of impurity using the example of fruit baskets, and introduces the concept of entropy to measure randomness and decision uncertainty, with a detailed explanation of how to calculate entropy and information gain for decision tree feature selection.', 'duration': 481.286, 'highlights': ['The concept of impurity is illustrated using the example of baskets with multiple fruits and labels, demonstrating the increase in misclassification and impurity when there are multiple labels for different fruits, leading to the introduction of the term entropy. 
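The entropy and information-gain calculation outlined above can be reproduced in a few lines; the 9-yes/5-no split and the Outlook groupings are the classic play-or-not textbook numbers, used here as an assumed example rather than taken from the video.

import math

def entropy(pos, neg):
    # Entropy of a node with `pos` positive and `neg` negative examples.
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

parent = entropy(9, 5)    # whole dataset: 9 "play" vs 5 "no play" -> about 0.940

# Splitting on Outlook: sunny (2 yes, 3 no), overcast (4 yes, 0 no), rainy (3 yes, 2 no).
children = [(2, 3), (4, 0), (3, 2)]
weighted = sum((p + n) / 14 * entropy(p, n) for p, n in children)

information_gain = parent - weighted
print(round(parent, 3), round(weighted, 3), round(information_gain, 3))  # 0.94 0.694 0.247
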
', 'Entropy is defined as the randomness of the sample space, with higher entropy indicating greater uncertainty in decision-making, and lower entropy indicating greater certainty. ', 'The formula for calculating entropy is explained, with the example of determining the entropy based on the probability of an employee leaving or staying in an organization. ', 'The process of calculating entropy for decision tree feature selection is detailed, including the steps to calculate entropy for different features like outlook and determining the information gain to measure the usefulness of a feature in reducing uncertainty. ']}, {'end': 18368.388, 'start': 17377.314, 'title': 'Building decision trees and random forests', 'summary': 'Covers the process of building a decision tree in python, including calculating information gain for feature selection, encoding categorical data, splitting the dataset for training and testing, using decision tree classifier for prediction, and understanding the working of a random forest through random sampling, feature selection, and bootstrap aggregation.', 'duration': 991.074, 'highlights': ['Building a decision tree involves calculating information gain for feature selection, encoding categorical data, and splitting the dataset for training and testing. Information gain is calculated for each feature to determine its importance in predicting outcomes. Categorical data is encoded to numerical values. The dataset is split into training and testing subsets for model evaluation.', 'Using a decision tree classifier for prediction involves fitting the model to the training dataset and evaluating its accuracy through precision, recall, and F1 score. The decision tree classifier is fitted to the training dataset and its accuracy is evaluated using precision, recall, F1 score, and support metrics. The precision metric indicates the accuracy of the predictions.', 'Understanding the working of a random forest includes random sampling, feature selection, and bootstrap aggregation. Random sampling with replacement and feature selection are used to create subsets for constructing decision trees. 
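A minimal sketch of the decision-tree workflow summarised above (encode the categorical columns, split the data, fit the classifier, and report precision/recall/F1); the tiny weather table is an illustrative stand-in for the course's dataset.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast", "sunny", "rainy"],
    "windy":   ["no", "yes", "no", "no", "yes", "yes", "no", "yes"],
    "play":    ["no", "no", "yes", "yes", "no", "yes", "yes", "no"],
})

# Encode every categorical column to integers so the tree can work with it.
encoded = data.apply(LabelEncoder().fit_transform)

X = encoded[["outlook", "windy"]]
y = encoded["play"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy")   # split on information gain
tree.fit(X_train, y_train)

print(classification_report(y_test, tree.predict(X_test)))
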
Bootstrap aggregation, also known as bagging, is used as an ensemble technique in random forests.']}], 'duration': 2635.252, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s15733136.jpg', 'highlights': ['The chapter discusses using previous outcomes to make decisions, such as predicting credit card fraud based on factors like salary, job profile, and previous card activity.', 'Random Forest: Building a forest of decision trees to make powerful decisions based on majority outcomes, also known as bagging methodology.', 'The chapter emphasizes the importance of using genie index and information gain to decide which feature to use first in building a decision tree.', 'The concept of impurity is illustrated using the example of baskets with multiple fruits and labels, demonstrating the increase in misclassification and impurity when there are multiple labels for different fruits, leading to the introduction of the term entropy.', 'Building a decision tree involves calculating information gain for feature selection, encoding categorical data, and splitting the dataset for training and testing.']}, {'end': 19489.288, 'segs': [{'end': 18679.108, 'src': 'embed', 'start': 18649.628, 'weight': 2, 'content': [{'end': 18653.01, 'text': "Okay, so now let's move on forward to splitting methods.", 'start': 18649.628, 'duration': 3.382}, {'end': 18661.396, 'text': 'So what are the splitting methods that we use in random forest? So splitting methods are many like Gini impurity, information gain, or chi-square.', 'start': 18653.651, 'duration': 7.745}, {'end': 18663.697, 'text': "So let's discuss about Gini impurity.", 'start': 18661.696, 'duration': 2.001}, {'end': 18664.458, 'text': 'So Gini.', 'start': 18664.038, 'duration': 0.42}, {'end': 18673.344, 'text': 'impurity is nothing, but it is used to predict the likelihood that a randomly selected example would be incorrectly classified by a specific node.', 'start': 18664.458, 'duration': 8.886}, {'end': 18679.108, 'text': 'And it is called impurity metric because it shows how the model differs from a pure division, right?', 'start': 18673.664, 'duration': 5.444}], 'summary': 'Random forest uses various splitting methods like gini impurity, information gain, or chi-square to predict classification likelihood.', 'duration': 29.48, 'max_score': 18649.628, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s18649628.jpg'}, {'end': 18738.622, 'src': 'embed', 'start': 18710.754, 'weight': 1, 'content': [{'end': 18714.276, 'text': 'so entropy is nothing but it is a measure of uncertainty.', 'start': 18710.754, 'duration': 3.522}, {'end': 18716.958, 'text': "so information gain, let's talk about that first.", 'start': 18714.276, 'duration': 2.682}, {'end': 18719.06, 'text': 'so the features they are selected.', 'start': 18716.958, 'duration': 2.102}, {'end': 18724.664, 'text': 'that provides most of the information about a class right, and this utilizes the entropy concept.', 'start': 18719.06, 'duration': 5.604}, {'end': 18727.447, 'text': "So let's see what is entropy.", 'start': 18725.124, 'duration': 2.323}, {'end': 18731.773, 'text': 'This is a measure of randomness or uncertainty in the data, right?', 'start': 18727.828, 'duration': 3.945}, {'end': 18738.622, 'text': "So now let's move on to understanding the advantages of random forest, and we see here various advantages.", 'start': 18732.414, 'duration': 6.208}], 'summary': 'Entropy measures 
uncertainty; information gain selects features providing most class information; random forest offers multiple advantages.', 'duration': 27.868, 'max_score': 18710.754, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s18710754.jpg'}, {'end': 19036.217, 'src': 'embed', 'start': 18993.506, 'weight': 0, 'content': [{'end': 19000.934, 'text': 'so our task is to specify or to classify these species of penguins and do the respective correct species right.', 'start': 18993.506, 'duration': 7.428}, {'end': 19008.16, 'text': 'so we see the shape of our data and we see that it is like 344 rows and seven columns and we will see the info.', 'start': 19000.934, 'duration': 7.226}, {'end': 19013.943, 'text': 'so we see df.info and this gives us along with the non-null count.', 'start': 19008.16, 'duration': 5.783}, {'end': 19016.304, 'text': 'we also get the data type of the values.', 'start': 19013.943, 'duration': 2.361}, {'end': 19022.047, 'text': 'so we have got species island as the object data type, whereas the bill length, build up,', 'start': 19016.304, 'duration': 5.743}, {'end': 19029.691, 'text': 'triple length and body mass are in floating point or you can say floating data type and the sex is in object data type Right.', 'start': 19022.047, 'duration': 7.644}, {'end': 19036.217, 'text': 'So now moving on forward to calculating how many null values are there with the help of df dot is null dot sum.', 'start': 19029.911, 'duration': 6.306}], 'summary': 'Task: classify penguin species with 344 rows and 7 columns, identifying null values.', 'duration': 42.711, 'max_score': 18993.506, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s18993506.jpg'}, {'end': 19098.13, 'src': 'embed', 'start': 19073.473, 'weight': 7, 'content': [{'end': 19080.638, 'text': 'Now we have seen that we have got some object data type in our data frame and before feeding it into algorithm that is random forest.', 'start': 19073.473, 'duration': 7.165}, {'end': 19086.423, 'text': 'We have to transform the categorical data or the object data type into the numeric.', 'start': 19080.738, 'duration': 5.685}, {'end': 19091.346, 'text': 'So we are using here one hot encoding to convert the categorical data into numeric.', 'start': 19086.843, 'duration': 4.503}, {'end': 19098.13, 'text': "Now, there are various ways in Python which we can do that like we're not encoding or you can also use the mapping function in Python.", 'start': 19091.626, 'duration': 6.504}], 'summary': 'Data frame contains object data type, transformed to numeric using one hot encoding for random forest algorithm.', 'duration': 24.657, 'max_score': 19073.473, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s19073473.jpg'}, {'end': 19311.18, 'src': 'embed', 'start': 19283.246, 'weight': 5, 'content': [{'end': 19287.729, 'text': 'So what we do is we will delete sex and island here, which are just repeating,', 'start': 19283.246, 'duration': 4.483}, {'end': 19291.672, 'text': "because we've got here male and we have also got here dream and Torgerson.", 'start': 19287.729, 'duration': 3.943}, {'end': 19294.734, 'text': 'So we do not require this Island column neither the sex.', 'start': 19291.692, 'duration': 3.042}, {'end': 19301.416, 'text': 'So we just drop it with the help of new data drop and the column names X is one in place equals to true.', 'start': 19295.114, 'duration': 6.302}, {'end': 
19303.737, 'text': "Right And let's see the head of this data frame.", 'start': 19301.617, 'duration': 2.12}, {'end': 19307.079, 'text': 'Head of the data frame gives me five unique values.', 'start': 19304.158, 'duration': 2.921}, {'end': 19311.18, 'text': 'Right And now it is time to create a separate target variable.', 'start': 19307.339, 'duration': 3.841}], 'summary': "Data cleaning included dropping 'sex' and 'island' columns, resulting in a data frame with 5 unique values.", 'duration': 27.934, 'max_score': 19283.246, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s19283246.jpg'}, {'end': 19396.852, 'src': 'embed', 'start': 19369.421, 'weight': 3, 'content': [{'end': 19372.422, 'text': "So we just drop it and let's see our new data frame.", 'start': 19369.421, 'duration': 3.001}, {'end': 19376.583, 'text': "So we see that we don't have any target species here, right? Okay.", 'start': 19372.562, 'duration': 4.021}, {'end': 19382.666, 'text': "So in X, let's store this new data and perform the splitting of the data.", 'start': 19376.883, 'duration': 5.783}, {'end': 19391.41, 'text': 'So what we do is from sklearn.model selection, we will import our train test split and we will split our training data into 70% and 30%.', 'start': 19382.886, 'duration': 8.524}, {'end': 19396.852, 'text': 'So test data becomes 30% and training data is some 70%.', 'start': 19391.41, 'duration': 5.442}], 'summary': 'Data is split into 70% training and 30% test data.', 'duration': 27.431, 'max_score': 19369.421, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s19369421.jpg'}], 'start': 18368.388, 'title': 'Random forest and penguin species classification', 'summary': 'Discusses construction of decision trees, feature selection, ensemble techniques like bootstrap aggregation, majority voting, and penguin species classification involving data cleaning, feature engineering, one-hot encoding, model training with 5 decision trees, and evaluation of accuracy using a confusion matrix.', 'chapters': [{'end': 18664.458, 'start': 18368.388, 'title': 'Random forest and ensemble techniques', 'summary': 'Discusses the construction of decision trees using subsets, feature selection in random forests, ensemble techniques such as bootstrap aggregation and the use of majority voting for predictions.', 'duration': 296.07, 'highlights': ['The feature selection in random forest involves taking the square root of the total number of features for classification problems, and the total number of features divided by 3 for regression problems.', 'Bootstrap aggregation, also known as ensemble techniques, in random forest involves aggregating the results of decision trees and using majority voting for classification and mean for regression to give the final output.', 'The method for feature selection in random forest is explained, where for a classification problem, the default feature selection involves taking the square root of the total number of features, and for a regression problem, the feature selection involves dividing the total number of features by 3.']}, {'end': 18993.506, 'start': 18664.458, 'title': 'Understanding random forest', 'summary': 'Covers the concept of impurity, gini impurity, information gain, advantages, and disadvantages of random forest, emphasizing its low variance, reduced overfitting, and good accuracy, and also discusses its practical demonstration using a penguins dataset.', 'duration': 
329.048, 'highlights': ['Random forest has the advantage of low variance as it combines the result of multiple decision trees, leading to reduced overfitting and improved generalization on unseen data. Random forest has the advantage of low variance as it combines the result of multiple decision trees, leading to reduced overfitting and improved generalization on unseen data.', 'Random forest provides good accuracy and outperforms other classifiers like Naive Bayes, SVM, or KNN. Random forest provides good accuracy and outperforms other classifiers like Naive Bayes, SVM, or KNN.', 'Random forest is suitable for both classification and regression problems, as well as for both categorical and continuous data. Random forest is suitable for both classification and regression problems, as well as for both categorical and continuous data.', 'Random forest is computationally expensive and requires a lot of resources due to the training of multiple decision trees and storing them. Random forest is computationally expensive and requires a lot of resources due to the training of multiple decision trees and storing them.', 'Random forest requires more training time due to the construction of multiple decision trees, especially for large datasets. Random forest requires more training time due to the construction of multiple decision trees, especially for large datasets.']}, {'end': 19489.288, 'start': 18993.506, 'title': 'Penguin species classification', 'summary': "Outlines the process of classifying penguin species, which involves data cleaning, feature engineering, one-hot encoding, target variable creation, data splitting, training a random forest classifier with 5 decision trees and evaluating the model's accuracy using a confusion matrix.", 'duration': 495.782, 'highlights': ['The data consists of 344 rows and 7 columns. The chapter begins by describing the shape of the data, which includes 344 rows and 7 columns, providing a foundational understanding of the dataset.', "11 null values are present in the 'sex' feature. The chapter identifies the presence of 11 null values in the 'sex' feature, which is crucial for data cleaning and preprocessing.", "One-hot encoding is applied to transform categorical data into numeric for the 'sex' and 'island' features. The process of one-hot encoding is detailed, specifically applied to the 'sex' and 'island' features to convert categorical data into numeric values for algorithm input, enhancing the understanding of feature engineering.", 'The data is split into 70% training and 30% test data. The chapter highlights the splitting of the data into 70% training and 30% test data, a critical step in preparing the dataset for model training and evaluation.', 'A random forest classifier with 5 decision trees is trained on the training set. The process of training a random forest classifier with 5 decision trees on the training set is outlined, providing insight into the model training process.', "A confusion matrix is used to evaluate the accuracy of the random forest algorithm. 
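The penguin-classification workflow summarised above can be sketched as follows; loading the 344-row dataset via seaborn is an assumption (the course may read a CSV instead), while the one-hot encoding of sex and island, the 70/30 split, the 5-tree random forest, and the confusion-matrix evaluation follow the steps described.

import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

df = sns.load_dataset("penguins")   # 344 rows, 7 columns
df = df.dropna()                    # remove rows with missing values (e.g. the 11 null 'sex' entries)

# One-hot encode the object columns and keep the numeric measurements as-is.
dummies = pd.get_dummies(df[["sex", "island"]], drop_first=True)
numeric = df[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]]
X = pd.concat([numeric, dummies], axis=1)
y = df["species"]                   # target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

forest = RandomForestClassifier(n_estimators=5, random_state=42)   # 5 decision trees
forest.fit(X_train, y_train)
pred = forest.predict(X_test)

print(confusion_matrix(y_test, pred))
print(accuracy_score(y_test, pred))
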
The chapter concludes with the use of a confusion matrix to assess the accuracy of the trained random forest classifier, offering a method for evaluating the model's performance."]}], 'duration': 1120.9, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s18368388.jpg', 'highlights': ['Random forest provides good accuracy and outperforms other classifiers like Naive Bayes, SVM, or KNN.', 'The feature selection in random forest involves taking the square root of the total number of features for classification problems, and the total number of features divided by 3 for regression problems.', 'Bootstrap aggregation, also known as ensemble techniques, in random forest involves aggregating the results of decision trees and using majority voting for classification and mean for regression to give the final output.', 'Random forest has the advantage of low variance as it combines the result of multiple decision trees, leading to reduced overfitting and improved generalization on unseen data.', 'Random forest is suitable for both classification and regression problems, as well as for both categorical and continuous data.', 'A confusion matrix is used to evaluate the accuracy of the random forest algorithm.', 'The data is split into 70% training and 30% test data.', "One-hot encoding is applied to transform categorical data into numeric for the 'sex' and 'island' features.", 'A random forest classifier with 5 decision trees is trained on the training set.', 'Random forest requires more training time due to the construction of multiple decision trees, especially for large datasets.']}, {'end': 21111.899, 'segs': [{'end': 19654.45, 'src': 'embed', 'start': 19626.25, 'weight': 0, 'content': [{'end': 19628.111, 'text': 'What is KNN algorithm?', 'start': 19626.25, 'duration': 1.861}, {'end': 19638.738, 'text': 'So, as we said, Canine algorithm is K nearest neighbors algorithm and it is an example of supervised learning algorithm where, basically,', 'start': 19628.171, 'duration': 10.567}, {'end': 19647.825, 'text': 'you try to classify a new data point based on the neighbors of that data point, which is basically which data points are closer to it.', 'start': 19638.738, 'duration': 9.087}, {'end': 19654.45, 'text': 'For example here as you can see on one side you have dogs and on the other side you have cats.', 'start': 19648.146, 'duration': 6.304}], 'summary': 'Knn is a supervised learning algorithm that classifies data points based on their neighbors.', 'duration': 28.2, 'max_score': 19626.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s19626250.jpg'}, {'end': 20482.058, 'src': 'embed', 'start': 20412.293, 'weight': 1, 'content': [{'end': 20419.254, 'text': 'K is equal to 1, is the will give you the most flexible sort of demarcating line or function.', 'start': 20412.293, 'duration': 6.961}, {'end': 20425.118, 'text': "Whereas the variability will be the maximum So that that's the sort of the trade-off.", 'start': 20419.974, 'duration': 5.144}, {'end': 20429.339, 'text': "And that's how we actually determine K.", 'start': 20426.538, 'duration': 2.801}, {'end': 20438.702, 'text': 'So we have to get a K value in such a way, based on trial and error, that sort of maximizes our sort of or reduces the bias as well as the variance,', 'start': 20429.339, 'duration': 9.363}, {'end': 20441.583, 'text': "and and that's the kind of optimization we are trying to do.", 'start': 20438.702, 'duration': 
2.881}, {'end': 20449.04, 'text': 'Okay So how do we calculate the distance itself, right? and distance typically can be of many kinds.', 'start': 20442.483, 'duration': 6.557}, {'end': 20454.106, 'text': 'So you know, the example here is of the Euclidean distance, but you can have other kinds of distances,', 'start': 20449.08, 'duration': 5.026}, {'end': 20460.915, 'text': 'like Manhattan distance or Mahalanobis distance, and you can look up references for other kinds of distances.', 'start': 20454.106, 'duration': 6.809}, {'end': 20467.255, 'text': 'Now Euclidean distance is calculated for the point P 1 and P 2 as given here.', 'start': 20461.954, 'duration': 5.301}, {'end': 20476.237, 'text': 'essentially, Euclidean distance is nothing but the you know square root of sum of the x coordinates of these two points, P 1 and P 2,', 'start': 20467.255, 'duration': 8.982}, {'end': 20482.058, 'text': 'and then y coordinate square of of these two points, P 1 and P 2.', 'start': 20476.237, 'duration': 5.821}], 'summary': 'Determining the most flexible demarcating line with k=1 and optimizing distance calculation for euclidean distance.', 'duration': 69.765, 'max_score': 20412.293, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s20412293.jpg'}], 'start': 19489.828, 'title': 'Knn algorithm and random forest', 'summary': 'Discusses the implementation of random forest classifier, achieving 99% accuracy by changing criteria and number of trees, introduces the knn algorithm, its features, working, determining the value of k, calculating distance, and achieving an accuracy of 99 percent with k=21.', 'chapters': [{'end': 19686.547, 'start': 19489.828, 'title': 'Random forest classifier & knn algorithm', 'summary': 'Discusses the implementation of random forest classifier, achieving 99% accuracy by changing criteria and number of trees, and introduces the knn algorithm for supervised learning.', 'duration': 196.719, 'highlights': ['The random forest classifier achieved 99% accuracy by changing the criteria and number of trees, surpassing the initial accuracy of 98%.', "The precision of the random forest classifier's predictions was 96%, with a 100% recall rate and 98% F1 score, indicating high accuracy and effectiveness.", 'The chapter introduces the KNN algorithm as a supervised learning approach based on classifying new data points according to their nearest neighbors, exemplifying the classification process with the analogy of categorizing animals as cats or dogs based on proximity.']}, {'end': 20067.316, 'start': 19687.123, 'title': 'K nearest neighbors algorithm', 'summary': 'Discusses the k nearest neighbors algorithm, a non-parametric supervised learning technique, its features, and when to use non-parametric versus parametric techniques.', 'duration': 380.193, 'highlights': ['KNN is a non-parametric supervised learning algorithm used for predicting class based on the nearest neighbors, providing flexibility and working well with smaller datasets. KNN is a non-parametric supervised learning algorithm used for predicting class based on the nearest neighbors, providing flexibility and working well with smaller datasets.', 'Non-parametric techniques like KNN are suitable when the functional relationship is unknown, for large datasets, and when working with lower dimensional data. 
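The Euclidean and Manhattan distances mentioned above reduce to one line each in NumPy; the two points below are arbitrary illustrative values.

import numpy as np

p1 = np.array([1.0, 2.0])
p2 = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((p1 - p2) ** 2))   # sqrt((x1-x2)^2 + (y1-y2)^2)
manhattan = np.sum(np.abs(p1 - p2))           # |x1-x2| + |y1-y2|

print(euclidean)   # 5.0
print(manhattan)   # 7.0
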
Non-parametric techniques like KNN are suitable when the functional relationship is unknown, for large datasets, and when working with lower dimensional data.', 'Parametric techniques, on the other hand, assume a functional relationship or distribution of the data points, and are used when the relationship between data points is understood or when dealing with high dimensional data. Parametric techniques assume a functional relationship or distribution of the data points and are used when the relationship between data points is understood or when dealing with high dimensional data.']}, {'end': 20795.269, 'start': 20067.716, 'title': 'Understanding knn algorithm', 'summary': 'Explains the knn algorithm, a lazy, non-parametric algorithm used for both classification and regression, based on feature similarity, with a detailed explanation of its working, determining the value of k, calculating distance, use cases, and a hands-on session using breast cancer data.', 'duration': 727.553, 'highlights': ['KNN is a lazy, non-parametric algorithm used for both classification and regression, based on feature similarity, with no training step involved. It is highlighted that KNN is a lazy algorithm with no training step involved, used for both classification and regression, based on feature similarity.', 'Determining the value of K is crucial, as it affects bias and variance, where a higher K value increases bias but reduces variance, and vice versa. The impact of the value of K on bias and variance is explained, where a higher K value increases bias but reduces variance, and vice versa.', 'Calculating distance in KNN can be done using various methods such as Euclidean, Manhattan, or Mahalanobis distance, with the choice of distance being crucial for different applications. Different methods for calculating distance in KNN, such as Euclidean, Manhattan, or Mahalanobis distance, are discussed, emphasizing the importance of choosing the right distance method for different applications.', 'KNN algorithm finds application in various use cases such as book recommendation, classifying satellite images, handwritten digits, and electrocardiograms. The diverse use cases of KNN algorithm including book recommendation, classifying satellite images, handwritten digits, and electrocardiograms are highlighted.', 'The hands-on session involves using the breast cancer data to demonstrate the practical application of KNN algorithm for classification. 
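A hedged sketch of the breast-cancer KNN hands-on summarised here: standardise the features, split the data 70/30, then scan K values to see where the test error settles (the transcript's run settles around K = 21 with roughly 99% accuracy). Loading the data from scikit-learn is an assumption; the course may use a CSV copy instead.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # bring every feature to the same scale
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Try K from 1 to 40 and record the test error for each value.
errors = []
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    errors.append(1 - accuracy_score(y_test, knn.predict(X_test)))

best_k = int(np.argmin(errors)) + 1
print("best K:", best_k, "accuracy:", round(1 - errors[best_k - 1], 3))
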
The hands-on session using breast cancer data to demonstrate the practical application of KNN algorithm for classification is mentioned.']}, {'end': 21111.899, 'start': 20795.269, 'title': 'Standardizing data and k nearest neighbors', 'summary': 'Covers the process of standardizing data to the same scale, dividing data into test and train sets, using the k nearest neighbors algorithm to predict cancerous samples, and determining the best k value through trial and error, achieving an accuracy of 99 percent with k=21.', 'duration': 316.63, 'highlights': ['The process of standardizing data involves bringing all the samples to the same range, typically between -1 and 1 with a mean of 0, to enable comparison of samples, ensuring data consistency and avoiding prediction issues.', 'The division of the entire data set into a 70% train set and a 30% test set using the train_test_split function from the scikit-learn package, ensuring a consistent split for model training and testing.', 'The use of the K nearest neighbors algorithm with K=1 initially, achieving a high accuracy of 94-95 percent in classifying cancerous and non-cancerous samples.', 'Determining the best K value by running the algorithm with K values ranging from 1 to 40 and identifying K=21 as the point where the error is minimized, improving the accuracy to almost 99 percent.']}], 'duration': 1622.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s19489828.jpg', 'highlights': ['The random forest classifier achieved 99% accuracy by changing the criteria and number of trees, surpassing the initial accuracy of 98%.', "The precision of the random forest classifier's predictions was 96%, with a 100% recall rate and 98% F1 score, indicating high accuracy and effectiveness.", 'Determining the best K value by running the algorithm with K values ranging from 1 to 40 and identifying K=21 as the point where the error is minimized, improving the accuracy to almost 99 percent.', 'The use of the K nearest neighbors algorithm with K=1 initially, achieving a high accuracy of 94-95 percent in classifying cancerous and non-cancerous samples.', 'The division of the entire data set into a 70% train set and a 30% test set using the train_test_split function from the scikit-learn package, ensuring a consistent split for model training and testing.', 'The chapter introduces the KNN algorithm as a supervised learning approach based on classifying new data points according to their nearest neighbors, exemplifying the classification process with the analogy of categorizing animals as cats or dogs based on proximity.']}, {'end': 23081.019, 'segs': [{'end': 21169.645, 'src': 'embed', 'start': 21112.344, 'weight': 9, 'content': [{'end': 21116.446, 'text': 'And then here we have just two data points which are misclassified from the test data set.', 'start': 21112.344, 'duration': 4.102}, {'end': 21124.071, 'text': 'So now this was one example of applying KNN on the cancer data set, which is available freely.', 'start': 21117.127, 'duration': 6.944}, {'end': 21131.275, 'text': "and now let's look at another example, which is the iris data set, and again available freely as well.", 'start': 21124.071, 'duration': 7.204}, {'end': 21135.818, 'text': 'So iris is a type of flower and we will see what flower is it.', 'start': 21131.875, 'duration': 3.943}, {'end': 21137.959, 'text': 'So just give me a moment here.', 'start': 21136.098, 'duration': 1.861}, {'end': 21146.218, 'text': "So we again start by importing 
the necessary libraries and then we'll look at what this iris data set is.", 'start': 21137.979, 'duration': 8.239}, {'end': 21156.161, 'text': 'So the iris data set comprises of 50 samples of three species of iris flower, which is iris setosa iris virginica and iris versicolor.', 'start': 21146.338, 'duration': 9.823}, {'end': 21158.742, 'text': 'These are the three types of iris flowers.', 'start': 21156.201, 'duration': 2.541}, {'end': 21169.645, 'text': 'And if you run this basically we will find that Okay, sorry, we did not copy the entire read the code here.', 'start': 21159.202, 'duration': 10.443}], 'summary': 'Knn misclassifies 2 data points in cancer dataset, 50 samples of 3 iris species.', 'duration': 57.301, 'max_score': 21112.344, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21112344.jpg'}, {'end': 21295.665, 'src': 'embed', 'start': 21272.663, 'weight': 8, 'content': [{'end': 21280.949, 'text': "So we are running a pair plot on the data set and in the meantime, I'll just copy another part of the code here.", 'start': 21272.663, 'duration': 8.286}, {'end': 21285.212, 'text': 'Okay So now the plot has come up.', 'start': 21282.951, 'duration': 2.261}, {'end': 21295.665, 'text': 'So, as we can see that the green is basically your set Oza flower and we can see that this pair plot actually just plots the sepal length,', 'start': 21285.272, 'duration': 10.393}], 'summary': 'Pair plot analysis visualizes sepal length for oza flowers.', 'duration': 23.002, 'max_score': 21272.663, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21272663.jpg'}, {'end': 21367.685, 'src': 'embed', 'start': 21340.061, 'weight': 0, 'content': [{'end': 21343.782, 'text': 'So if you plot the sepal length and sepal width we get something distribution like this.', 'start': 21340.061, 'duration': 3.721}, {'end': 21349.564, 'text': 'So essentially we see that the maximum centered around here and then there is a distribution as you can see here.', 'start': 21344.262, 'duration': 5.302}, {'end': 21351.993, 'text': "So there's some kind of a linear relationship here.", 'start': 21350.151, 'duration': 1.842}, {'end': 21361.32, 'text': 'Okay So now we will again do the same standardization of the variables that we had done in the cancer data set case.', 'start': 21353.434, 'duration': 7.886}, {'end': 21367.685, 'text': 'So we are importing the standard scalar function from the scikit-learn pre-processing.', 'start': 21361.38, 'duration': 6.305}], 'summary': 'Plotting sepal length and width shows linear relationship, standardizing variables with sklearn preprocessing', 'duration': 27.624, 'max_score': 21340.061, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21340061.jpg'}, {'end': 21585.758, 'src': 'embed', 'start': 21536.7, 'weight': 6, 'content': [{'end': 21544.605, 'text': "So the accuracy improves 200% So that's the example of how you can choose K.", 'start': 21536.7, 'duration': 7.905}, {'end': 21550.109, 'text': 'So we have covered the hands-on and now we will look at some of the references.', 'start': 21544.605, 'duration': 5.504}, {'end': 21554.672, 'text': 'So some of the references are basically these three textbooks here.', 'start': 21550.189, 'duration': 4.483}, {'end': 21556.467, 'text': 'which I found quite good.', 'start': 21555.367, 'duration': 1.1}, {'end': 21558.728, 'text': 'Some of them are even available online.', 'start': 
21556.728, 'duration': 2}, {'end': 21566.831, 'text': 'So you can refer to them if you want to learn more about KNN and also they are quite good references for some of the other techniques as well.', 'start': 21559.208, 'duration': 7.623}, {'end': 21575.775, 'text': 'So let us now talk about what is naive base.', 'start': 21572.673, 'duration': 3.102}, {'end': 21579.496, 'text': 'Okay, let us understand naive base with an example.', 'start': 21576.555, 'duration': 2.941}, {'end': 21585.758, 'text': 'Here, I just cannot seem to figure out which are the best days to play football with my friend.', 'start': 21580.097, 'duration': 5.661}], 'summary': 'Accuracy improves by 200% with k, references include three textbooks, some available online. explains naive base with an example.', 'duration': 49.058, 'max_score': 21536.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21536700.jpg'}, {'end': 21646.92, 'src': 'embed', 'start': 21615.801, 'weight': 12, 'content': [{'end': 21621.104, 'text': 'so if you look at these combinations, okay, we will look at this using navy bears on.', 'start': 21615.801, 'duration': 5.303}, {'end': 21623.946, 'text': 'how do we decide whether we can play or not?', 'start': 21621.104, 'duration': 2.842}, {'end': 21631.431, 'text': 'so if i have noted down all the days it was good, bad to play football and the combination of weather matrices on that day, that will be perfect.', 'start': 21623.946, 'duration': 7.485}, {'end': 21636.634, 'text': 'right, that is perfect, and we will be able to do navy bias classifiers using that.', 'start': 21631.431, 'duration': 5.203}, {'end': 21646.92, 'text': 'now, Navy bias classifier comes from the Navy bias theorem, and Navy bias theorem is purely and purely based on the assumption of independence.', 'start': 21636.634, 'duration': 10.286}], 'summary': 'Using the navy bias theorem to determine football playing days based on weather matrices.', 'duration': 31.119, 'max_score': 21615.801, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21615801.jpg'}, {'end': 21755.013, 'src': 'embed', 'start': 21729.196, 'weight': 3, 'content': [{'end': 21735.799, 'text': 'so what happens in navy bars is we estimate the posterior probability of every event happening.', 'start': 21729.196, 'duration': 6.603}, {'end': 21742.343, 'text': 'here, we calculate the posterior probability of an event happening.', 'start': 21735.799, 'duration': 6.544}, {'end': 21745.345, 'text': 'okay, so here, if you see, if you look at the sunny conditions,', 'start': 21742.343, 'duration': 3.002}, {'end': 21750.607, 'text': 'what we have done sunny conditions we have this distribution that there is no play happening in summer.', 'start': 21745.345, 'duration': 5.262}, {'end': 21755.013, 'text': 'there is play happening in monsoon, there is play happening in winter, right.', 'start': 21750.607, 'duration': 4.406}], 'summary': 'Estimating posterior probabilities for different weather conditions and play occurrences.', 'duration': 25.817, 'max_score': 21729.196, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21729196.jpg'}, {'end': 22918.067, 'src': 'embed', 'start': 22890.249, 'weight': 5, 'content': [{'end': 22893.691, 'text': 'Okay, now that you have an intuition behind what is support vector machine.', 'start': 22890.249, 'duration': 3.442}, {'end': 22900.382, 'text': "let's understand, as how does this 
svm, that is, support vector machine, would work?", 'start': 22894.981, 'duration': 5.401}, {'end': 22903.023, 'text': 'so in case of support vector machine?', 'start': 22900.382, 'duration': 2.641}, {'end': 22904.763, 'text': 'so here there is one more example.', 'start': 22903.023, 'duration': 1.74}, {'end': 22912.025, 'text': 'i have this set of points which is green, green curve, and i have another set of points which are in red color.', 'start': 22904.763, 'duration': 7.262}, {'end': 22918.067, 'text': 'so these two uh points are belonging to the different different classes.', 'start': 22912.025, 'duration': 6.042}], 'summary': 'Explaining support vector machine and classifying points into different classes.', 'duration': 27.818, 'max_score': 22890.249, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s22890249.jpg'}, {'end': 22968.865, 'src': 'embed', 'start': 22937.999, 'weight': 2, 'content': [{'end': 22943.641, 'text': 'now the support vectors is nothing but the point which is closer to my hyperplane.', 'start': 22937.999, 'duration': 5.642}, {'end': 22947.703, 'text': "now, here in this example that you're seeing the,", 'start': 22943.641, 'duration': 4.062}, {'end': 22961.679, 'text': "this data point and this data point are called as support vectors because these are the data points which are nearest for my hyper plane that I'm just drawn in if I am trying to make use of this SPM model.", 'start': 22947.703, 'duration': 13.976}, {'end': 22968.865, 'text': 'It is going to draw this kind of hyper plane to make sure that it is separating two classes.', 'start': 22961.92, 'duration': 6.945}], 'summary': 'Support vectors are closest data points to the hyperplane, used to separate classes in svm.', 'duration': 30.866, 'max_score': 22937.999, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s22937999.jpg'}], 'start': 21112.344, 'title': 'Classifiers and data analysis', 'summary': 'Covers knn classification on iris and cancer data sets with 96% accuracy, naive bias theorem, navy bias theorem, data preparation and training, naive bias classifier implementation with 80% training accuracy, and support vector machine basics.', 'chapters': [{'end': 21295.665, 'start': 21112.344, 'title': 'Applying knn on iris and cancer data sets', 'summary': 'Demonstrates applying knn on the iris and cancer data sets, with the iris data set comprising 50 samples of three species of iris flower and the pair plot showcasing the classification of flowers into categories.', 'duration': 183.321, 'highlights': ['The iris data set comprises of 50 samples of three species of iris flower, which is iris setosa iris virginica and iris versicolor. The iris data set contains 50 samples of three different species of iris flower: iris setosa, iris virginica, and iris versicolor.', 'The pair plot actually just plots the sepal length. The pair plot visualizes the sepal length of the flowers.', 'The data consists of sepal length, sepal width, petal length, and petal width of each of the species. The data includes the measurements of sepal length, sepal width, petal length, and petal width for each species.', 'We will see whether we can use KNN to actually classify these flowers into the data points, into these categories of flowers. The chapter explores using KNN to classify the flowers into categories based on the data points.', 'The green is basically your set Oza flower. 
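To make the KNN-on-iris walkthrough above concrete, here is a minimal sketch of the same steps (standardize, split, fit KNN, and scan K values for the lowest error), assuming scikit-learn; the variable names, the 30% test split, and the K range are illustrative rather than taken from the video's notebook.

```python
# Minimal sketch of the KNN workflow discussed above (iris data, scikit-learn).
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # sepal/petal length and width, 3 species

# Standardize so every feature sits on the same scale before distance computations
X_scaled = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42)

# Try several K values and keep the one with the lowest test error (the elbow idea from the video)
best_k, best_acc = None, 0.0
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"best K = {best_k}, accuracy = {best_acc:.2%}")
```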
The pair plot visually represents the setosa flower as green.']}, {'end': 21670.615, 'start': 21295.665, 'title': 'Knn classification example', 'summary': 'Demonstrates a knn classification example using the iris dataset, achieving an accuracy of 96% and determining the optimal k value as 3, while also introducing the concept of naive bayes classification.', 'duration': 374.95, 'highlights': ['The accuracy of the KNN classification is around 96%, with only two misclassified points. The KNN classification achieves an accuracy of 96% with just two misclassified points, demonstrating the effectiveness of the model.', 'The optimal K value for the KNN classification is determined as 3, leading to a 200% improvement in accuracy. By analyzing the errors for varying K values from 1 to 40, it is found that the minimum error occurs at K=3, resulting in a 200% improvement in accuracy.', 'Introduction and explanation of the concept of Naive Bayes classification based on the Naive Bayes theorem and the assumption of independence between variables. The concept of Naive Bayes classification is introduced, emphasizing its basis on the assumption of independence between variables, as derived from the Naive Bayes theorem.']}, {'end': 22127.073, 'start': 21670.615, 'title': 'Navy bias theorem and probabilistic classification', 'summary': 'Discusses the navy bias theorem and probabilistic classification, examining the correlation between weather conditions and the probability of play happening, and illustrates the posterior probability calculations for events given specific conditions, ultimately demonstrating a simplified implementation using python.', 'duration': 456.458, 'highlights': ['The chapter discusses the Navy Bias Theorem and Probabilistic Classification The transcript provides insights into the Navy Bias Theorem and Probabilistic Classification, focusing on the correlation between weather conditions and the probability of play happening.', 'Illustrates the posterior probability calculations for events given specific conditions The transcript explains in detail the calculation of posterior probability for play happening given specific weather conditions, highlighting the significance of these calculations in determining the likelihood of play occurring.', 'Simplified implementation using Python The chapter outlines the intention to conduct the same calculations and activities demonstrated in the transcript using a simplified implementation in Python, emphasizing the efficiency of achieving these tasks with minimal lines of code.']}, {'end': 22394.067, 'start': 22128.033, 'title': 'Data preparation and training', 'summary': 'Discusses the process of using basic libraries to read and manipulate a dataset, converting variables into categorical codes, and dividing the dataset into training and testing sets, aiming to use 10 records for training and 4 records for testing.', 'duration': 266.034, 'highlights': ['The dataset contains 14 days with outlook (overcast, rainy, sunny), temperature (hot, cool, mild), humidity, wind, and play. The dataset consists of 14 days with specific weather outlooks, temperature ranges, humidity, wind, and play outcomes.', 'Variables are converted into categorical codes, resulting in two data frames - one with categorical variables and another with 1s and 0s for different conditions and play outcomes. 
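The data-preparation and Naive Bayes steps described above can be sketched in a few lines, assuming pandas and scikit-learn; the abbreviated weather table, the CategoricalNB choice, and the min_categories guard are illustrative assumptions, not the exact code from the video.

```python
# Sketch of the "play or not" Naive Bayes example: categorical weather features
# are encoded as integer codes and fed to a Naive Bayes classifier.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

# Abbreviated, made-up version of the 14-day outlook/temperature/humidity/wind/play table
df = pd.DataFrame({
    "outlook":  ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast", "sunny", "rainy"],
    "temp":     ["hot",   "hot",   "hot",      "mild",  "cool",  "cool",     "mild",  "mild"],
    "humidity": ["high",  "high",  "high",     "high",  "normal","normal",   "normal","high"],
    "windy":    [False,   True,    False,      False,   False,   True,       False,   True],
    "play":     ["no",    "no",    "yes",      "yes",   "yes",   "yes",      "yes",   "no"],
})

# Convert every column to integer category codes (the two-data-frame step in the video)
encoded = df.apply(lambda col: col.astype("category").cat.codes)
X, y = encoded.drop(columns="play"), encoded["play"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# min_categories guards against categories that happen to be absent from a tiny training split
model = CategoricalNB(min_categories=3).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```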
The process involves converting the variables into categorical codes, leading to two data frames with categorical variables and binary representations for conditions and play outcomes.', 'The dataset is divided into training and testing sets, with 10 records allocated for training and 4 records for testing. The dataset is split into training and testing sets, with 10 records designated for training and 4 records for testing purposes.']}, {'end': 22755.335, 'start': 22395.027, 'title': 'Implementing naive bias classifier in python', 'summary': 'Demonstrates implementing a naive bias classifier in python, achieving a 80% training accuracy and 75% testing accuracy, and explains the calculation of probabilistic scores and class probability in detail.', 'duration': 360.308, 'highlights': ['The training accuracy achieved is 80% and the testing accuracy is 75%, demonstrating the effectiveness of the Naive Bias Classifier in Python.', 'The detailed explanation of the probabilistic scores and class probability calculation provides a comprehensive understanding of the Naive Bias algorithm.', 'The step-by-step process of initializing, fitting, predicting, and evaluating the model in just a few lines of code showcases the simplicity and efficiency of implementing the Naive Bias Classifier in Python.']}, {'end': 23081.019, 'start': 22755.335, 'title': 'Support vector machine basics', 'summary': 'Discusses the probability of drawing face cards, introduces support vector machine as a discriminative classifier used for classification tasks, explaining the concept and implementation using hyperplanes and different types of svm kernels.', 'duration': 325.684, 'highlights': ["The probability of drawing a face card, given it's a King, is 1, and the overall probability of drawing a face card is 4/13.", 'The Support Vector Machine is a discriminative classifier that separates examples using hyperplanes and support vectors to create a maximal gap between different categories.', 'The implementation of Support Vector Machine involves drawing a hyperplane to separate classes as much as possible, ensuring it is equidistant from support vectors and exploring different types of SVM kernels such as linear, radial basis function, and polynomial kernels.']}], 'duration': 1968.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s21112344.jpg', 'highlights': ['KNN classification achieves 96% accuracy with 2 misclassified points', "Naive Bayes classification's training accuracy is 80% and testing accuracy is 75%", 'The iris dataset contains 50 samples of 3 different species of iris flower', 'The pair plot visualizes the sepal length of the flowers', 'The dataset consists of sepal length, sepal width, petal length, and petal width for each species', 'The optimal K value for KNN classification is determined as 3, leading to a 200% improvement in accuracy', 'Introduction and explanation of the concept of Naive Bayes classification based on the Naive Bayes theorem and the assumption of independence between variables', 'The chapter discusses the Navy Bias Theorem and Probabilistic Classification', 'The dataset contains 14 days with outlook, temperature, humidity, wind, and play', 'Variables are converted into categorical codes, resulting in two data frames - one with categorical variables and another with 1s and 0s for different conditions and play outcomes', 'The dataset is divided into training and testing sets, with 10 records allocated for training and 4 records for 
testing', 'The Support Vector Machine is a discriminative classifier that separates examples using hyperplanes and support vectors to create a maximal gap between different categories', 'The implementation of Support Vector Machine involves drawing a hyperplane to separate classes as much as possible, ensuring it is equidistant from support vectors and exploring different types of SVM kernels such as linear, radial basis function, and polynomial kernels']}, {'end': 25490.167, 'segs': [{'end': 23248.673, 'src': 'embed', 'start': 23220.725, 'weight': 4, 'content': [{'end': 23228.527, 'text': "So this is the notebook that have already prepared, and I'll give you a walkthrough as we proceed along now, here in my first cell.", 'start': 23220.725, 'duration': 7.802}, {'end': 23234.189, 'text': "I'm importing my numpy library pandas library and along with that for creation of plots.", 'start': 23228.947, 'duration': 5.242}, {'end': 23236.569, 'text': "I'm importing my matplotlib library.", 'start': 23234.469, 'duration': 2.1}, {'end': 23240.85, 'text': 'Now if you are comfortable with seaborn, you can use the seaborn library as well.', 'start': 23236.589, 'duration': 4.261}, {'end': 23248.673, 'text': "So in my example I'm just making use of matplotlib, because we are not interested in creation of visualization,", 'start': 23241.391, 'duration': 7.282}], 'summary': 'Using numpy, pandas, and matplotlib for data analysis and visualization.', 'duration': 27.948, 'max_score': 23220.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s23220725.jpg'}, {'end': 24308.327, 'src': 'embed', 'start': 24229.023, 'weight': 0, 'content': [{'end': 24232.966, 'text': 'polynomial features that have applied on my given linear data.', 'start': 24229.023, 'duration': 3.943}, {'end': 24238.308, 'text': 'So I have increased the degrees by which my model can learn.', 'start': 24233.683, 'duration': 4.625}, {'end': 24247.136, 'text': 'Now instead of straight line, my model is also having the ability to learn this complex representation as well.', 'start': 24240.87, 'duration': 6.266}, {'end': 24253.002, 'text': 'Because I have increased the model complexity by adding my polynomial features.', 'start': 24248.457, 'duration': 4.545}, {'end': 24259.258, 'text': 'and while doing it to make sure that we follow a clear path.', 'start': 24254.915, 'duration': 4.343}, {'end': 24261.8, 'text': 'So I have defined this skill as pipeline.', 'start': 24259.658, 'duration': 2.142}, {'end': 24268.524, 'text': "So if you're new to data science machine learning, I highly recommend you to learn this concept of a skill and pipeline.", 'start': 24262.12, 'duration': 6.404}, {'end': 24273.988, 'text': 'Now this skill and pipeline helps us to combine multiple operations in a single call.', 'start': 24268.824, 'duration': 5.164}, {'end': 24277.35, 'text': 'So here we have created a pipeline.', 'start': 24274.968, 'duration': 2.382}, {'end': 24281.873, 'text': 'This pipeline is going to add some polynomial features for my input data.', 'start': 24277.45, 'duration': 4.423}, {'end': 24290.671, 'text': "and on top of it this going to perform scaling and then I'm going to perform this binomial classification using this SVM.", 'start': 24282.924, 'duration': 7.747}, {'end': 24302.382, 'text': "And finally, I'm performing the fit on my data set see when I perform the fit it takes my input X and it's going to do all these activities.", 'start': 24292.393, 'duration': 9.989}, {'end': 24308.327, 
'text': 'It is going to chain all these activities together and then it is going to perform the fit for my data Y.', 'start': 24302.542, 'duration': 5.785}], 'summary': 'Applied polynomial features to increase model complexity and created a pipeline for data operations.', 'duration': 79.304, 'max_score': 24229.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s24229023.jpg'}, {'end': 24954.65, 'src': 'embed', 'start': 24931.597, 'weight': 2, 'content': [{'end': 24940.741, 'text': "so there are a number of observations which might it's not mandatory which might, can belong to one or more clusters that could happen.", 'start': 24931.597, 'duration': 9.144}, {'end': 24942.762, 'text': 'so the third, we agglomerate the clustering.', 'start': 24940.741, 'duration': 2.021}, {'end': 24946.183, 'text': 'so which, which is the third type of clustering technique that we have.', 'start': 24942.762, 'duration': 3.421}, {'end': 24949.046, 'text': 'so now, what is this agglomerative clustering?', 'start': 24946.183, 'duration': 2.863}, {'end': 24952.028, 'text': 'So this is what we also call it as hierarchical clustering.', 'start': 24949.126, 'duration': 2.902}, {'end': 24954.65, 'text': 'These clustering techniques are built using H clustering.', 'start': 24952.368, 'duration': 2.282}], 'summary': 'Multiple observations can belong to one or more clusters in agglomerative clustering, also known as hierarchical clustering.', 'duration': 23.053, 'max_score': 24931.597, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s24931597.jpg'}], 'start': 23081.019, 'title': 'Svm and clustering techniques', 'summary': "Discusses the use cases and implementation of support vector machines (svm) in various applications such as face detection, text categorization, and bioinformatics, along with the steps for implementation. it also covers the implementation of svm for data exploration, data splitting, and drawing decision boundaries, with a demonstration using the 'iris' and 'make moons' datasets. 
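A minimal sketch of the linear-SVM demonstration summarized here, assuming scikit-learn: two iris features are scaled and a linear SVC is fitted so its support vectors and hyperplane coefficients can be inspected. The binary target (virginica vs. the rest) is an illustrative assumption; the video may use a different pair of classes.

```python
# Sketch of the linear-SVM walkthrough: two iris features, standard scaling,
# then a linear SVC whose hyperplane and support vectors can be examined or plotted.
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, 2:4]                      # petal length and petal width only
y = (iris.target == 2).astype(int)         # binary task: virginica vs. the rest (illustrative)

X_scaled = StandardScaler().fit_transform(X)

svm_clf = SVC(kernel="linear", C=1.0).fit(X_scaled, y)

print("first support vectors:\n", svm_clf.support_vectors_[:5])
print("hyperplane coefficients:", svm_clf.coef_, "intercept:", svm_clf.intercept_)
```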
additionally, the chapter delves into the use of polynomial kernel in svm for polynomial regression, as well as unsupervised clustering techniques including k-means, fuzzy or siemens, agglomerative, division, and mean shift clustering, with a practical example of clustering movie metadata.", 'chapters': [{'end': 23165.005, 'start': 23081.019, 'title': 'Svm use cases and implementation', 'summary': 'Explains the use cases of support vector machines (svm) including face detection, text categorization, and bioinformatics, and highlights the common steps for implementing svm in classification tasks.', 'duration': 83.986, 'highlights': ['Support vector machines (SVM) can be used in face detection, text and hypertext categorization, image classification, bioinformatics, and remote homology detection.', 'SVM is applicable in the task of classification, providing a generalized predictive control.', 'The common steps for implementing SVM include theoretical understanding, practical implementation, and task-specific classification usage.']}, {'end': 24114.23, 'start': 23166.026, 'title': 'Implementing support vector machines', 'summary': "Covers the implementation of support vector machines for data exploration, data splitting, loading the 'iris' dataset, drawing decision boundaries, and scaling data, with a demonstration using 'make moons' dataset, aiming to achieve a better understanding of the svm model.", 'duration': 948.204, 'highlights': ['The chapter covers the implementation of support vector machines for data exploration, data splitting, and model training, aiming to achieve a better understanding of the SVM model. The chapter discusses the process of data exploration, data splitting, and model training for support vector machines.', "The 'iris' dataset is loaded and only two features, petal length and petal width, are extracted for visualization purposes. The 'iris' dataset is loaded, and only the 'petal length' and 'petal width' features are extracted for visualization.", 'The decision boundary is drawn using the SVM model, and support vectors are identified, demonstrating the process of drawing a hyperplane to separate data points. The decision boundary is drawn using the SVM model, and support vectors are identified, illustrating the process of drawing a hyperplane to separate data points.', 'The importance of scaling data is emphasized for achieving a better fit of the SVM model. The chapter emphasizes the importance of scaling data to achieve a better fit of the SVM model.', "A demonstration using the 'Make Moons' dataset showcases the challenge of using linear classifiers and introduces the need for nonlinear classifiers. A demonstration using the 'Make Moons' dataset showcases the challenge of using linear classifiers and introduces the need for nonlinear classifiers."]}, {'end': 24795.509, 'start': 24114.29, 'title': 'Svm model and polynomial regression', 'summary': 'Explains the use of polynomial kernel in svm model to perform polynomial regression, illustrating the process through pipeline and the benefits of increasing model complexity. 
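The polynomial-features pipeline described above might look roughly like this, assuming scikit-learn; the degree, C value, and noise level are illustrative.

```python
# Sketch of the PolynomialFeatures -> scaling -> linear SVM pipeline applied to make_moons.
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),          # add higher-degree terms
    ("scaler", StandardScaler()),                              # bring features to the same scale
    ("svm_clf", LinearSVC(C=10, loss="hinge", max_iter=10_000)),
])

# One fit call chains all three steps, which is the point made in the walkthrough
polynomial_svm_clf.fit(X, y)
print("training accuracy:", polynomial_svm_clf.score(X, y))
```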
it also covers clustering technique, specifically k-means algorithm and its steps, with a practical example.', 'duration': 681.219, 'highlights': ['The chapter explains the use of polynomial kernel in SVM model to perform polynomial regression The chapter demonstrates using a polynomial kernel to perform polynomial regression with the SVM model.', 'Illustrating the process through pipeline and the benefits of increasing model complexity The chapter highlights the use of a pipeline to combine polynomial features, scaling, and classification with SVM, showcasing the benefits of increasing model complexity.', 'It also covers clustering technique, specifically k-means algorithm and its steps, with a practical example The chapter provides an explanation of the k-means clustering algorithm and its steps, including choosing the number of clusters, initializing centroids, assigning clusters, and optimizing convergence, with a practical example.']}, {'end': 25490.167, 'start': 24795.509, 'title': 'Unsupervised clustering techniques', 'summary': 'Explains unsupervised clustering techniques, including k-means, fuzzy or siemens, agglomerative, division, and mean shift clustering, and provides a practical example of clustering movie metadata into five clusters based on director and actor facebook likes.', 'duration': 694.658, 'highlights': ['The chapter explains unsupervised clustering techniques, including K-means, Fuzzy or Siemens, agglomerative, division, and mean shift clustering. The chapter covers various unsupervised clustering techniques such as K-means, Fuzzy or Siemens, agglomerative, division, and mean shift clustering, providing an overview of different clustering algorithms.', 'Provides a practical example of clustering movie metadata into five clusters based on director and actor Facebook likes. A practical example is given where movie metadata is clustered into five clusters based on director and actor Facebook likes, showcasing the application of clustering techniques in real-world data analysis.']}], 'duration': 2409.148, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s23081019.jpg', 'highlights': ['SVM applicable in classification for face detection, text categorization, and bioinformatics', 'Steps for implementing SVM include theoretical understanding, practical implementation, and task-specific usage', 'Importance of scaling data emphasized for achieving a better fit of the SVM model', "Demonstration using 'Make Moons' dataset showcases the challenge of using linear classifiers and introduces the need for nonlinear classifiers", 'Explanation of using polynomial kernel in SVM model to perform polynomial regression', 'Use of a pipeline to combine polynomial features, scaling, and classification with SVM, showcasing the benefits of increasing model complexity', 'Explanation of k-means clustering algorithm and its steps, with a practical example', 'Overview of various unsupervised clustering techniques such as K-means, Fuzzy or Siemens, agglomerative, division, and mean shift clustering', 'Practical example of clustering movie metadata into five clusters based on director and actor Facebook likes']}, {'end': 28212.397, 'segs': [{'end': 25517.791, 'src': 'embed', 'start': 25490.928, 'weight': 2, 'content': [{'end': 25494.089, 'text': 'Like Spielberg might be making a film with Tom Cruise or something like that.', 'start': 25490.928, 'duration': 3.161}, {'end': 25499.857, 'text': 'so. 
likewise, you can clearly see that where, instead of if you do this activity manually,', 'start': 25495.293, 'duration': 4.564}, {'end': 25506.302, 'text': "it might take a little longer time for you to segregate each and every component, but within five minutes we're able to cluster this activity.", 'start': 25499.857, 'duration': 6.445}, {'end': 25507.543, 'text': "so that's what the beauty with algorithms.", 'start': 25506.302, 'duration': 1.241}, {'end': 25509.364, 'text': "you don't need to manually do this activity.", 'start': 25507.543, 'duration': 1.821}, {'end': 25512.627, 'text': 'where you can provide your data automatically based on the properties of the data.', 'start': 25509.364, 'duration': 3.263}, {'end': 25515.929, 'text': 'your algorithm will build this clustering output out of the data.', 'start': 25512.627, 'duration': 3.302}, {'end': 25517.791, 'text': "it's very simple to do so.", 'start': 25515.929, 'duration': 1.862}], 'summary': 'Algorithms can cluster data within five minutes, saving time and effort.', 'duration': 26.863, 'max_score': 25490.928, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s25490928.jpg'}, {'end': 26379.285, 'src': 'embed', 'start': 26352.338, 'weight': 5, 'content': [{'end': 26358.852, 'text': 'So here the confidence is going to be support 135 divided by a support 1 3 That is to buy through it.', 'start': 26352.338, 'duration': 6.514}, {'end': 26360.933, 'text': 'That is 66.6 person is 60 person.', 'start': 26359.052, 'duration': 1.881}, {'end': 26367.857, 'text': 'So here rule 1 is going to be selected and in rule 2 we have 1 5 1 3 5 minus 1 5.', 'start': 26361.453, 'duration': 6.404}, {'end': 26375.762, 'text': 'That means if the people, if someone, is purchasing 1 and 5, then they are going to have a good possibility of going for 3 as well,', 'start': 26367.857, 'duration': 7.905}, {'end': 26379.285, 'text': 'where we are going to see the confidence and for finding the confidence.', 'start': 26375.762, 'duration': 3.523}], 'summary': 'Rule 1 selected with 66.6% confidence; rule 2 implies good possibility of purchasing 1 and 5 for 3.', 'duration': 26.947, 'max_score': 26352.338, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s26352338.jpg'}, {'end': 27445.666, 'src': 'embed', 'start': 27415.895, 'weight': 0, 'content': [{'end': 27417.698, 'text': 'okay, or you can say this is a setback.', 'start': 27415.895, 'duration': 1.803}, {'end': 27424.032, 'text': 'So, just like how we humans learn from our mistakes by trial and error, reinforcement learning is also similar.', 'start': 27418.271, 'duration': 5.761}, {'end': 27429.293, 'text': 'Okay, so we have an agent which is basically the baby and a reward which is the candy over here.', 'start': 27424.232, 'duration': 5.061}, {'end': 27435.755, 'text': 'Okay, and with many hurdles in between the agent is supposed to find the best possible path to reach the reward.', 'start': 27429.553, 'duration': 6.202}, {'end': 27438.735, 'text': 'So guys, I hope you all are clear with the reinforcement learning.', 'start': 27436.215, 'duration': 2.52}, {'end': 27441.276, 'text': "Now, let's look at the reinforcement learning process.", 'start': 27439.156, 'duration': 2.12}, {'end': 27445.666, 'text': 'So generally a reinforcement learning system has two main components.', 'start': 27441.943, 'duration': 3.723}], 'summary': 'Reinforcement learning is akin to trial and error learning, with an agent 
navigating hurdles to reach a reward.', 'duration': 29.771, 'max_score': 27415.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s27415895.jpg'}, {'end': 28172.743, 'src': 'embed', 'start': 28140.056, 'weight': 4, 'content': [{'end': 28144.159, 'text': 'We must always explore the different nodes so that we can find a more optimal policy.', 'start': 28140.056, 'duration': 4.103}, {'end': 28150.644, 'text': "But in this case, obviously a CD has the highest reward and we're going with a CD but generally it's not so simple.", 'start': 28144.5, 'duration': 6.144}, {'end': 28151.865, 'text': 'There are a lot of nodes.', 'start': 28150.684, 'duration': 1.181}, {'end': 28154.547, 'text': "There are hundreds of nodes to traverse and they're like 50 60 policies.", 'start': 28151.925, 'duration': 2.622}, {'end': 28155.688, 'text': 'Okay 50 60 different policies.', 'start': 28154.567, 'duration': 1.121}, {'end': 28164.557, 'text': 'So you make sure you explore through all the policies and then decide on an optimum policy which will give you a maximum reward.', 'start': 28158.137, 'duration': 6.42}, {'end': 28172.743, 'text': "So guys, this is our code and this is executed in Python and I'm assuming that all of you have a good background in Python.", 'start': 28165.777, 'duration': 6.966}], 'summary': 'Exploring nodes to find optimal policy with hundreds of nodes and 50-60 policies.', 'duration': 32.687, 'max_score': 28140.056, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s28140056.jpg'}, {'end': 28220.034, 'src': 'embed', 'start': 28195.807, 'weight': 3, 'content': [{'end': 28206.073, 'text': "Okay, numpy is basically a Python library for adding support for large multi-dimensional arrays and matrices and it's basically for computing mathematical functions.", 'start': 28195.807, 'duration': 10.266}, {'end': 28210.676, 'text': "Okay, so first we're going to import that after that we're going to create the R matrix.", 'start': 28206.453, 'duration': 4.223}, {'end': 28212.397, 'text': 'Okay, so this is the R matrix.', 'start': 28211.016, 'duration': 1.381}, {'end': 28220.034, 'text': "Next we're going to create a Q matrix and it's a six into six matrix, because obviously we have six states starting from zero to five.", 'start': 28212.85, 'duration': 7.184}], 'summary': 'Numpy adds support for large arrays and matrices for computing mathematical functions. create r matrix, then a 6x6 q matrix for six states from 0 to 5.', 'duration': 24.227, 'max_score': 28195.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s28195807.jpg'}], 'start': 25490.928, 'title': 'Algorithms in market analysis and reinforcement learning basics', 'summary': "Covers the application of algorithms in market basket analysis, emphasizing the efficiency and time-saving nature of algorithms, and discusses reinforcement learning basics including reward maximization, markov's decision process, and exploration vs. exploitation. 
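A minimal sketch of the R-matrix / Q-matrix setup mentioned in the Q-learning walkthrough, assuming numpy; the reward values follow the classic six-state "rooms" example and the gamma value is illustrative, so the exact numbers may differ from the ones used in the video.

```python
# Six states (0..5); R holds the immediate rewards: -1 = no edge, 0 = allowed move,
# 100 = move into the goal state. Q starts at zero and is filled in during training.
import numpy as np

R = np.array([
    [-1, -1, -1, -1,  0, -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1, -1],
    [-1,  0,  0, -1,  0, -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
], dtype=float)

Q = np.zeros((6, 6))      # one Q value per (state, action) pair
gamma = 0.8               # discount factor (illustrative)

def update(state, action):
    """One Q-learning update: reward now plus discounted best value of the next state."""
    next_state = action   # in this formulation, taking an action moves to that state
    Q[state, action] = R[state, action] + gamma * Q[next_state].max()

update(state=1, action=5)  # e.g. moving from state 1 straight into the goal state 5
print(Q)
```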
it also showcases the use of algorithms to derive insights from large datasets and the implementation of apriori algorithm in python.", 'chapters': [{'end': 25622.966, 'start': 25490.928, 'title': 'Application of algorithms in market basket analysis', 'summary': 'Discusses the application of algorithms in market basket analysis, showcasing the efficiency and time-saving nature of algorithms in clustering activities and how large retailers leverage market basket analysis to increase revenue by uncovering associations between items and targeting customers with relevant offers.', 'duration': 132.038, 'highlights': ['The efficiency and time-saving nature of algorithms in clustering activities, enabling the clustering of activities within five minutes as opposed to manual segregation.', 'The use of market basket analysis by large retailers to uncover associations between items, leading to right product placement and targeting customers with relevant offers to increase revenue.', 'The goal of any organization being to increase revenue by mining data related to frequently bought items and leveraging market basket analysis as a key technique to achieve this goal.']}, {'end': 26464.718, 'start': 25624.687, 'title': 'Association rules mining', 'summary': 'Discusses association rules mining, market basket analysis, and the a priori algorithm, emphasizing that these algorithms can derive valuable insights from large datasets, like 10,000 items, through support, confidence, and lift measures.', 'duration': 840.031, 'highlights': ['Association rules mining can provide valuable insights for encouraging customer spending, as seen in Market Basket Analysis, where customers buying bread and butter are encouraged to buy eggs through discounts or offers. Customer behavior analysis through Association rules mining, as illustrated in Market Basket Analysis, can lead to increased spending by incentivizing customers to purchase additional items, such as eggs, through discounts or offers.', 'The A Priori algorithm is crucial for deriving insights from large datasets, such as 10,000 items, by using support, confidence, and lift measures to filter out low-frequency items and create association rules. The A Priori algorithm plays a vital role in analyzing large datasets, like 10,000 items, by filtering out low-frequency items and creating association rules based on support, confidence, and lift measures.', 'Support measures the fraction of transactions containing a specific item, confidence indicates how often items occur together, and lift indicates the strength of a rule over random occurrence, guiding the identification of significant association rules. 
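For reference, support, confidence, and lift as defined above can be computed directly on a toy basket list; the five transactions below are made up for illustration.

```python
# Toy computation of support, confidence and lift for the rule {bread, butter} -> {eggs}.
transactions = [
    {"bread", "butter", "eggs"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs", "milk"},
]
n = len(transactions)

def support(items):
    # fraction of baskets containing every item in the set
    return sum(items <= t for t in transactions) / n

antecedent, consequent = {"bread", "butter"}, {"eggs"}
conf = support(antecedent | consequent) / support(antecedent)
lift = conf / support(consequent)

print(f"support    = {support(antecedent | consequent):.2f}")
print(f"confidence = {conf:.2f}")
print(f"lift       = {lift:.2f}")   # > 1 means the rule beats random co-occurrence
```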
Support, confidence, and lift measures guide the identification of significant association rules by quantifying the frequency of item occurrences, the likelihood of items co-occurring, and the strength of a rule over random occurrence.']}, {'end': 27289.818, 'start': 26464.718, 'title': 'Implementing apriori algorithm in python', 'summary': 'Covers the implementation of apriori algorithm in python, including data set import, data cleaning, item consolidation, encoding, and rule generation with specific support, confidence, and lift values.', 'duration': 825.1, 'highlights': ['The chapter covers the implementation of Apriori algorithm in Python The transcript discusses the process of implementing the Apriori algorithm in Python, including steps for creating rules, working with sample datasets, and installing necessary libraries like Pandas and MLX10.', 'Data cleaning is performed, including removing spaces from descriptions and dropping rows without invoice numbers or containing credit transactions The process involves cleaning the data by removing spaces from descriptions, dropping rows with missing invoice numbers, and eliminating credit transactions to ensure data accuracy.', "Item consolidation is achieved by grouping transactions by country and creating a consolidated transaction per row for a specific country The transcript explains the consolidation of items into one transaction per row for a specific country, using the 'group by' function and unstacking the data to achieve the desired format.", 'Encoding is performed to convert positive values to 1 and set anything less than 0 to 0, followed by generating frequent item sets and association rules with specific support, confidence, and lift values The process involves encoding positive values to 1 and transforming anything less than 0 to 0, followed by generating frequent item sets with a minimum support of 7% and creating association rules based on support, confidence, and lift values with specific thresholds.']}, {'end': 27775.497, 'start': 27289.838, 'title': 'Reinforcement learning basics', 'summary': 'Introduces the concept of reinforcement learning, where an agent learns to behave in an environment by performing actions to maximize reward, with examples like a baby learning to walk and a player learning to play counter-strike, and key terms such as agent, environment, action, state, reward, policy, value, and action value.', 'duration': 485.659, 'highlights': ['Reinforcement learning is a part of machine learning where an agent learns to behave in an environment by performing actions to maximize reward. Reinforcement learning involves an agent learning to behave in an environment by performing actions to maximize reward.', 'The chapter provides examples of reinforcement learning, such as a baby learning to walk and a player learning to play Counter-Strike, to illustrate the concept. Examples like a baby learning to walk and a player learning to play Counter-Strike are used to illustrate the concept of reinforcement learning.', 'Key terms related to reinforcement learning, such as agent, environment, action, state, reward, policy, value, and action value, are defined and explained in the chapter. 
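A hedged sketch of the Apriori flow summarized here, assuming mlxtend's apriori and association_rules helpers: a one-hot basket matrix, frequent itemsets at 7% minimum support, then rules filtered by lift. The tiny basket matrix and the lift threshold are illustrative; the video works on a much larger online-retail dataset grouped by country.

```python
# rows = invoices, columns = items, values = True if the item appears on the invoice
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

basket = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 1, 1]],
    columns=["bread", "butter", "eggs", "milk"],
).astype(bool)

frequent_itemsets = apriori(basket, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

print(rules[["antecedents", "consequents", "support", "confidence", "lift"]].head())
```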
The chapter defines and explains key terms related to reinforcement learning, including agent, environment, action, state, reward, policy, value, and action value.']}, {'end': 28212.397, 'start': 27776.175, 'title': "Reinforcement learning: reward maximization and markov's decision process", 'summary': "Explains the concept of reward maximization in reinforcement learning, including the fox and tiger example, discounting of rewards based on gamma value, exploration and exploitation, and markov's decision process, and how to find the shortest path with minimum cost. it also emphasizes the importance of maximizing rewards by choosing the optimum policy and understanding the difference between exploration and exploitation.", 'duration': 436.222, 'highlights': ["The agent's goal is to eat the maximum amount of meat before being eaten by the tiger, by eating the meat that is closer to him rather than the meat which is closer to the tiger, due to the higher risk near the tiger. The agent, represented by the fox, aims to maximize meat consumption while avoiding the risk of being eaten by the tiger, by prioritizing meat closer to him and discounting larger meat chunks near the tiger.", "The concept of discounting of rewards is explained, with the gamma value between 0 and 1 determining the discount value, where smaller gamma leads to larger discounting and the agent's likelihood of exploration. The discounting of rewards is based on the gamma value, with a smaller gamma leading to a larger discount value, indicating the agent's reluctance to explore and consume meat chunks closer to the tiger, while a gamma closer to 1 encourages exploration.", "The difference between exploration and exploitation is illustrated using the fox and tiger example, emphasizing the importance of exploring the entire environment to collect bigger rewards. Exploration involves capturing more information about the environment, while exploitation focuses on using known information to maximize rewards, as demonstrated by the fox's choice to explore the entire environment for bigger rewards.", "The chapter introduces Markov's decision process as a mathematical approach for mapping a solution in reinforcement learning, emphasizing the parameters such as set of actions, set of states, rewards, policy, and value to maximize rewards by choosing the optimum policy. Markov's decision process is a mathematical approach used in reinforcement learning, involving parameters such as actions, states, rewards, policy, and value, with the main goal of maximizing rewards by selecting the best policy.", 'The process of finding the shortest path with minimum cost is explained using the example of traversing nodes A to D, demonstrating the selection of the optimum policy for maximum reward. The explanation of finding the shortest path involves traversing nodes A to D and selecting the optimum policy for maximum reward, emphasizing the importance of exploring different policies to find the most optimal one.', 'The importance of maximizing rewards by choosing the optimum policy and understanding the difference between exploration and exploitation is emphasized, highlighting the need to explore different policies to find the most optimal one for maximum reward. 
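Two of the ideas above, discounting future rewards with gamma and balancing exploration against exploitation, can be shown in a short sketch; the reward list and epsilon value are made up for illustration.

```python
import random

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * r_k: rewards far in the future count for less."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action, otherwise exploit the best one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(discounted_return([1, 1, 1, 10]))    # the big reward three steps away is discounted
print(epsilon_greedy([0.2, 0.8, 0.5]))     # usually picks action 1, occasionally explores
```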
Emphasizing the significance of selecting the best policy to maximize rewards and understanding the distinction between exploration and exploitation, stressing the importance of exploring various policies to find the most optimal one for maximum reward.']}], 'duration': 2721.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s25490928.jpg', 'highlights': ['The A Priori algorithm is crucial for deriving insights from large datasets, such as 10,000 items, by using support, confidence, and lift measures to filter out low-frequency items and create association rules.', 'Support, confidence, and lift measures guide the identification of significant association rules by quantifying the frequency of item occurrences, the likelihood of items co-occurring, and the strength of a rule over random occurrence.', 'The chapter covers the implementation of Apriori algorithm in Python, including steps for creating rules, working with sample datasets, and installing necessary libraries like Pandas and MLX10.', 'Reinforcement learning involves an agent learning to behave in an environment by performing actions to maximize reward.', 'The chapter defines and explains key terms related to reinforcement learning, including agent, environment, action, state, reward, policy, value, and action value.', "Markov's decision process is a mathematical approach used in reinforcement learning, involving parameters such as actions, states, rewards, policy, and value, with the main goal of maximizing rewards by selecting the best policy."]}, {'end': 29949.991, 'segs': [{'end': 28894.512, 'src': 'embed', 'start': 28865.399, 'weight': 3, 'content': [{'end': 28870.763, 'text': 'Now, this is done by associating the topmost priority location with a very high reward than the usual ones.', 'start': 28865.399, 'duration': 5.364}, {'end': 28875.086, 'text': "So let's put 999 in the cell L6 comma L6.", 'start': 28871.363, 'duration': 3.723}, {'end': 28880.049, 'text': 'Now the table of rewards with the higher reward for the topmost location looks something like this.', 'start': 28875.746, 'duration': 4.303}, {'end': 28886.654, 'text': 'We have now formally defined all the vital components for the solution we are aiming for the problem discussed now.', 'start': 28880.87, 'duration': 5.784}, {'end': 28894.512, 'text': 'We will shift gears a bit and study some of the fundamental concepts that prevail in the world of reinforcement learning and Q learning.', 'start': 28887.29, 'duration': 7.222}], 'summary': 'Associating top priority location with 999 reward, defining vital components for the solution, and studying fundamental concepts in reinforcement learning.', 'duration': 29.113, 'max_score': 28865.399, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s28865399.jpg'}, {'end': 29410.375, 'src': 'embed', 'start': 29382.912, 'weight': 0, 'content': [{'end': 29388.617, 'text': 'we essentially mean that there is an 80% chance that the robot will take the upper turn.', 'start': 29382.912, 'duration': 5.705}, {'end': 29400.591, 'text': 'Now, if you put all the required values in our equation, we get V of S is equal to maximum of R of S, comma A plus comma of 0.8 into V of room up,', 'start': 29389.426, 'duration': 11.165}, {'end': 29410.375, 'text': 'plus 0.1 into V of room down, 0.03 into room of V of room left, plus 0.03 into V of room right.', 'start': 29400.591, 'duration': 9.784}], 'summary': 'The robot has an 80% chance of 
taking the upper turn based on the given equation.', 'duration': 27.463, 'max_score': 29382.912, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s29382912.jpg'}, {'end': 29530.191, 'src': 'embed', 'start': 29499.683, 'weight': 4, 'content': [{'end': 29504.907, 'text': 'So earlier we had 0.8 into V of S1, 0.03 into V of S2, 0.1 into V of S3 and so on.', 'start': 29499.683, 'duration': 5.224}, {'end': 29515.747, 'text': 'Now, if you incorporate the idea of assessing the quality of the action for moving to a certain state,', 'start': 29510.245, 'duration': 5.502}, {'end': 29520.588, 'text': 'so the environment with the agent and the quality of the action will look something like this', 'start': 29515.747, 'duration': 4.841}, {'end': 29530.191, 'text': 'So instead of 0.8 V of s1 will have Q of s1 comma a1 will have Q of s2 comma a2 Q of s3.', 'start': 29521.228, 'duration': 8.963}], 'summary': 'Transcript discusses assessing action quality with quantifiable data.', 'duration': 30.508, 'max_score': 29499.683, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s29499683.jpg'}, {'end': 29873.489, 'src': 'embed', 'start': 29843.763, 'weight': 1, 'content': [{'end': 29848.804, 'text': "and let's initialize the parameters, that is, the gamma and alpha parameters.", 'start': 29843.763, 'duration': 5.041}, {'end': 29856.025, 'text': 'so gamma is 0.75, which is the discount factor, whereas alpha is 0.9, which is the learning rate.', 'start': 29848.804, 'duration': 7.221}, {'end': 29860.546, 'text': "now, next, what we're going to do is define the states and map it to numbers.", 'start': 29856.025, 'duration': 4.521}, {'end': 29867.568, 'text': 'so, as i mentioned earlier, l1 is 0 and till n line, we have defined the states in the numerical form.', 'start': 29860.546, 'duration': 7.022}, {'end': 29873.489, 'text': 'now the next step is to define the actions, which is, as mentioned above, represent the transition to the next state.', 'start': 29867.568, 'duration': 5.921}], 'summary': 'Initializing parameters: gamma = 0.75, alpha = 0.9. defining states and actions numerically.', 'duration': 29.726, 'max_score': 29843.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s29843763.jpg'}], 'start': 28212.85, 'title': 'Reinforcement learning in q learning', 'summary': 'Introduces reinforcement learning and q learning, covers the creation and initialization of a q matrix, setting the gamma parameter, defining available actions, choosing actions at random, computing q value, updating the q matrix, and training phase with 10,000 iterations. it also explains the process of setting a current state, finding paths with maximum rewards, and defining vital components of a reinforcement learning solution. additionally, it discusses enabling a robot with memory using the bellman equation, handling stochasticity using the markov decision process, and incorporating stochasticity in the bellman equation. 
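For reference, the equations this summary refers to can be written compactly as below; the stochastic form is stated with generic transition probabilities P(s'|s,a), since the 0.8/0.1/0.03/0.03 figures quoted in the walkthrough do not quite sum to 1 and look like a transcription artifact.

```latex
% Deterministic Bellman equation (value of a state):
V(s) = \max_a \bigl[ R(s,a) + \gamma \, V(s') \bigr]

% Stochastic version for an MDP, with transition probabilities P(s' \mid s, a):
V(s) = \max_a \Bigl[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Bigr]

% Q-learning update with learning rate \alpha (the bracketed term is the temporal difference):
Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \bigr]
```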
it emphasizes key equations in reinforcement learning and q learning, and the implementation with parameters gamma and alpha.', 'chapters': [{'end': 28499.843, 'start': 28212.85, 'title': 'Reinforcement learning q-matrix', 'summary': 'Covers the creation and initialization of a q matrix, setting the gamma parameter, defining available actions, choosing actions at random, computing q value using a formula, updating the q matrix, training phase with 10,000 iterations, and normalizing the q matrix for the testing phase.', 'duration': 286.993, 'highlights': ['Creation and initialization of a Q matrix The Q matrix is created as a 6x6 matrix and initialized to zero, representing 6 states starting from zero to five.', 'Training phase with 10,000 iterations The training phase involves 10,000 iterations to find the best policy for the agent.', 'Choosing actions at random and updating Q matrix The process involves choosing available actions from the current state, computing the next action, and updating the Q matrix accordingly.', 'Normalizing the Q matrix for the testing phase The Q matrix values are normalized by dividing them with the maximum Q value into 100 to make computation easier and understandable.']}, {'end': 28905.874, 'start': 28500.184, 'title': 'Reinforcement learning in q learning', 'summary': 'Introduces reinforcement learning and q learning by explaining the process of setting a current state, finding paths with maximum rewards, and defining the vital components of a reinforcement learning solution for an automobile factory scenario.', 'duration': 405.69, 'highlights': ['The agent is trained to find the shortest route from any given location to another within the automobile factory warehouse. The task involves enabling robots to find the shortest route from any given location to another within the automobile factory warehouse.', 'The Q matrix is calculated to determine the selected path for the agent based on maximum rewards. The Q matrix is calculated to determine the selected path for the agent based on maximum rewards, such as the path 2 3 4 5 for the initial stage set as two.', 'The reward table is constructed to map all possible states and their associated rewards, prioritizing specific locations with high rewards. A reward table is constructed to map all possible states and their associated rewards, prioritizing specific locations with high rewards, such as associating the topmost priority location with a very high reward.']}, {'end': 29287.806, 'start': 28906.474, 'title': 'Robot path planning with bellman equation', 'summary': 'Discusses enabling a robot with the memory using the bellman equation to create value footprints and handling stochasticity using the markov decision process in the context of path planning, with emphasis on key equations in reinforcement learning and q learning.', 'duration': 381.332, 'highlights': ["The Bellman equation is used to create value footprints for a robot's path planning by considering state values, actions, and a discount factor, with the max function aiding in decision-making. The Bellman equation creates value footprints for the robot's path planning by considering state values, actions, and a discount factor. It uses the max function to aid in decision-making.", 'The concept of stochasticity and handling random outcomes in robot path planning is addressed through the Markov decision process, which models decision-making in situations where outcomes are partly random and partly under the control of the decision maker. 
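A minimal sketch of the training loop summarized above (random exploration over the allowed actions, 10,000 iterations, then normalizing Q), assuming numpy; it uses the simple greedy update rather than the alpha-weighted temporal-difference form, and the gamma value is illustrative.

```python
import numpy as np

def train_q(R, gamma=0.8, iterations=10_000):
    """Q-learning over a square reward matrix R (like the 6x6 one sketched earlier)."""
    n_states = R.shape[0]
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iterations):
        state = np.random.randint(n_states)           # start each episode from a random state
        possible = np.where(R[state] >= 0)[0]         # actions allowed from this state
        action = np.random.choice(possible)           # explore at random during training
        next_state = action
        Q[state, action] = R[state, action] + gamma * Q[next_state].max()
    return 100 * Q / Q.max()                          # normalize to the 0..100 range

# Example usage with the reward matrix R defined in the earlier sketch:
# print(train_q(R).round(1))
```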
The Markov decision process addresses the concept of stochasticity and handling random outcomes in robot path planning, providing a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of the decision maker.']}, {'end': 29949.991, 'start': 29288.647, 'title': 'Q learning and the bellman equation', 'summary': 'Discusses the incorporation of stochasticity in the bellman equation, the introduction of living penalty, the transition to q learning, and its implementation with the parameters gamma and alpha. it also covers the mapping of warehouse locations to states and actions, and the creation of the reward table.', 'duration': 661.344, 'highlights': ["Incorporating Stochasticity in the Bellman Equation The equation is modified to introduce randomness into the robot's movement, utilizing probabilities for each turn to quantify the robot's chances, with an example showing the calculation of V of S incorporating stochasticity.", 'Introduction of Living Penalty The concept of rewarding the robot for its actions, known as the living penalty, is discussed, highlighting the importance of assessing the quality of actions and the complexity of modeling sparse rewards in reinforcement learning.', 'Transition to Q Learning The chapter explains the shift to Q learning, focusing on assessing the quality of actions taken to move to a state, incorporating Q values and the temporal difference to capture random changes in the environment.', 'Implementation of Q Learning The process of implementing the Q learning algorithm is detailed, including the mapping of warehouse locations to numerical states, defining actions, creating a reward table, and discussing the inverse mapping of states back to their original locations.']}], 'duration': 1737.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s28212850.jpg', 'highlights': ['Training phase with 10,000 iterations to find the best policy for the agent', 'The Q matrix is calculated to determine the selected path for the agent based on maximum rewards', 'The concept of stochasticity and handling random outcomes in robot path planning is addressed through the Markov decision process', "Incorporating Stochasticity in the Bellman Equation to introduce randomness into the robot's movement", 'Introduction of Living Penalty to assess the quality of actions and the complexity of modeling sparse rewards in reinforcement learning', 'Transition to Q Learning, focusing on assessing the quality of actions taken to move to a state']}, {'end': 30792, 'segs': [{'end': 30145.414, 'src': 'embed', 'start': 30107.936, 'weight': 2, 'content': [{'end': 30114.118, 'text': "we provide an input image of a dog and then it'll automatically learn certain features, even if we don't provide it with those features.", 'start': 30107.936, 'duration': 6.182}, {'end': 30121.68, 'text': 'and after that it will give us the probability, and according to this particular scenario, it says 95% chances of being a dog,', 'start': 30114.878, 'duration': 6.802}, {'end': 30123.88, 'text': '3% chances of some other animal.', 'start': 30121.68, 'duration': 2.2}, {'end': 30125.861, 'text': 'similarly, 2% chances of some other animal.', 'start': 30123.88, 'duration': 1.981}, {'end': 30132.423, 'text': 'So since the highest probability goes with the dog, so the prediction says that the object or the input image is nothing but a dog.', 'start': 30126.421, 'duration': 6.002}, {'end': 
30137.024, 'text': "Fine guys, so we'll move forward and we'll understand how exactly deep learning works.", 'start': 30133.343, 'duration': 3.681}, {'end': 30144.554, 'text': 'So the motivation behind deep learning is nothing but the human brain as we have seen in the previous analogy as well.', 'start': 30138.848, 'duration': 5.706}, {'end': 30145.414, 'text': 'What we are trying to do.', 'start': 30144.594, 'duration': 0.82}], 'summary': 'Deep learning predicts 95% chance of input image being a dog, 3% other animal, and 2% another animal.', 'duration': 37.478, 'max_score': 30107.936, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s30107936.jpg'}, {'end': 30244.819, 'src': 'embed', 'start': 30202.324, 'weight': 0, 'content': [{'end': 30206.485, 'text': 'Then only the cell or this neuron will fire the signals to the next neuron.', 'start': 30202.324, 'duration': 4.161}, {'end': 30213.586, 'text': 'And this is how the neuron or the brain cell works and we take the same process forward and we try to create artificial neurons.', 'start': 30206.905, 'duration': 6.681}, {'end': 30218.187, 'text': 'So let us understand why we need artificial neurons with an example.', 'start': 30215.466, 'duration': 2.721}, {'end': 30225.088, 'text': 'So I have a data set of flowers say and that data set includes sepal length, sepal width, petal length and petal width.', 'start': 30218.647, 'duration': 6.441}, {'end': 30230.09, 'text': 'Now what I want to do I want to classify the type of flower on the basis of this data set.', 'start': 30226.027, 'duration': 4.063}, {'end': 30233.031, 'text': 'Now there are two options either I can do it manually.', 'start': 30230.57, 'duration': 2.461}, {'end': 30240.076, 'text': 'I can look at the flower manually and determine by its color or any means, and I can identify what sort of a flower it is,', 'start': 30233.492, 'duration': 6.584}, {'end': 30241.757, 'text': 'or I can train a machine to do that.', 'start': 30240.076, 'duration': 1.681}, {'end': 30244.819, 'text': 'Now let me tell you the problem with doing this process manually.', 'start': 30242.277, 'duration': 2.542}], 'summary': 'Understanding the need for artificial neurons in classifying flowers based on data sets and the limitations of manual processing.', 'duration': 42.495, 'max_score': 30202.324, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s30202324.jpg'}, {'end': 30428.282, 'src': 'embed', 'start': 30397.444, 'weight': 3, 'content': [{'end': 30399.445, 'text': 'or another class or another type of the flower.', 'start': 30397.444, 'duration': 2.001}, {'end': 30405.834, 'text': 'We can even call perceptron as a single layer binary linear classifier to be more specific.', 'start': 30400.592, 'duration': 5.242}, {'end': 30410.296, 'text': 'Because it is able to classify inputs which are linearly separable.', 'start': 30406.394, 'duration': 3.902}, {'end': 30420.3, 'text': 'And our main task here is to predict to which of the two possible categories a certain data point belongs, based on a set of input variables.', 'start': 30411.176, 'duration': 9.124}, {'end': 30424.222, 'text': "Now there's an algorithm on which it works, so let me explain you that.", 'start': 30421.421, 'duration': 2.801}, {'end': 30428.282, 'text': 'So the first thing we do is we initialize the weights and the threshold.', 'start': 30425.06, 'duration': 3.222}], 'summary': 'A perceptron is a single layer binary linear 
classifier for classifying linearly separable inputs.', 'duration': 30.838, 'max_score': 30397.444, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s30397444.jpg'}, {'end': 30738.458, 'src': 'embed', 'start': 30714.709, 'weight': 4, 'content': [{'end': 30722.241, 'text': 'When I talk about one one, then our output will be two, which is greater than 0.5, so we get an output that is one, and this is how we get the graph.', 'start': 30714.709, 'duration': 7.532}, {'end': 30728.813, 'text': 'And now if you notice with the help of the single layer perceptron, we are able to classify the ones and zeros.', 'start': 30723.43, 'duration': 5.383}, {'end': 30734.776, 'text': 'So this line, anything above this line is actually one and anything below this line, we have zeros.', 'start': 30729.293, 'duration': 5.483}, {'end': 30738.458, 'text': 'So this is how we are able to classify or able to implement OR gate.', 'start': 30735.436, 'duration': 3.022}], 'summary': 'Using single layer perceptron, we can implement or gate to classify ones and zeros.', 'duration': 23.749, 'max_score': 30714.709, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s30714709.jpg'}], 'start': 29950.811, 'title': 'Deep learning evolution and artificial neurons', 'summary': 'Explores the evolution from manual feature extraction to automatic feature determination in deep learning, mimicking human brain functioning, and explains the process of training machines to classify using artificial neurons and perceptron, including activation functions, weight updates, and linear separability. it provides insights into the limitations of manual classification and the implementation of logic gates.', 'chapters': [{'end': 30225.088, 'start': 29950.811, 'title': 'Deep learning: mimicking human brain', 'summary': "Delves into the concept of deep learning, highlighting the evolution from manual feature extraction to the automatic feature determination process in deep learning algorithms, mirroring the human brain's functioning through the creation of artificial neurons.", 'duration': 274.277, 'highlights': ['The evolution from manual feature extraction to automatic feature determination in deep learning algorithms Deep learning eliminates the manual extraction of features by directly feeding input images to the algorithm, which automatically determines important features, leading to efficient classification of objects.', "The concept of mimicking the human brain through deep learning The motivation for deep learning is to mimic the human brain's thinking processes, decision-making, and problem-solving abilities, achieved through the creation of systems that replicate the functioning of the human brain using artificial neurons.", 'Explanation of how neurons work, including dendrites, cell body, and synapse The functioning of neurons involves dendrites receiving signals, passing them to the cell body, performing a Sigma function, and firing signals to the next neuron through the synapse, based on signal thresholds, similar to artificial neurons.']}, {'end': 30792, 'start': 30226.027, 'title': 'Artificial neurons and perceptron', 'summary': 'Explains the limitations of human brain in classifying flowers manually and the process of training a machine to classify flowers using artificial neurons and perceptron, which involves multiplying inputs with corresponding weights, using activation functions, updating weights through a learning 
algorithm, and applying perceptron to classify linearly separable inputs and implement logic gates.', 'duration': 565.973, 'highlights': ['The chapter explains the limitations of human brain in classifying flowers manually and the process of training a machine to classify flowers using artificial neurons and perceptron. It discusses the challenges of manually classifying flowers, the benefits of training a machine to classify flowers, and the role of artificial neurons and perceptron in the classification process.', 'The process of training a machine to classify flowers using artificial neurons and perceptron involves multiplying inputs with corresponding weights, using activation functions, and updating weights through a learning algorithm. It details the steps of multiplying inputs with corresponding weights, using activation functions to determine the firing of neurons, and updating weights through a learning algorithm to reduce the difference between actual and desired outputs.', 'The chapter explains the application of perceptron in classifying linearly separable inputs and implementing logic gates. It illustrates the usage of perceptron to classify linearly separable inputs and implement logic gates such as OR and AND gates, explaining the process and the corresponding activation functions.']}], 'duration': 841.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s29950811.jpg', 'highlights': ['Deep learning eliminates manual feature extraction, leading to efficient classification.', "Deep learning aims to mimic human brain's thinking processes and problem-solving abilities.", 'Neurons function through dendrites, cell body, Sigma function, and synapse, similar to artificial neurons.', 'Training machines to classify flowers involves multiplying inputs with weights and using activation functions.', 'Updating weights through a learning algorithm reduces the difference between actual and desired outputs.', 'Perceptron is applied to classify linearly separable inputs and implement logic gates like OR and AND gates.']}, {'end': 32276.856, 'segs': [{'end': 31087.178, 'src': 'embed', 'start': 31054.408, 'weight': 3, 'content': [{'end': 31058.552, 'text': "So after that, I'm going to define my node one and node two, which are constant nodes.", 'start': 31054.408, 'duration': 4.144}, {'end': 31071.067, 'text': "So for that I'm gonna type in here node one equal to tf.constant, and then I'm going to define the constant value in this.", 'start': 31059.032, 'duration': 12.035}, {'end': 31078.352, 'text': "so it'll be three and it'll be a float value of tf.float of 32 bit.", 'start': 31071.067, 'duration': 7.285}, {'end': 31087.178, 'text': "All right, and now I'm gonna define my second constant node, so I'll type in here node two, tf.constant.", 'start': 31079.493, 'duration': 7.685}], 'summary': 'Defining two constant nodes with values 3 and tf.float32.', 'duration': 32.77, 'max_score': 31054.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s31054408.jpg'}, {'end': 31331.181, 'src': 'embed', 'start': 31303.181, 'weight': 6, 'content': [{'end': 31308.904, 'text': "and then I'm going to define the last constant node and tf.constant.", 'start': 31303.181, 'duration': 5.723}, {'end': 31310.665, 'text': "the value that'll be there is three.", 'start': 31308.904, 'duration': 1.761}, {'end': 31316.769, 'text': 'All right, so we have three constant nodes, and now we are going to perform 
certain operations in them.', 'start': 31311.666, 'duration': 5.103}, {'end': 31323.213, 'text': "For that I'm going to define one node, let it be d, which will be equal to tf.constant.", 'start': 31317.269, 'duration': 5.944}, {'end': 31331.181, 'text': "So the two nodes that we want to multiply, so that'll be a comma b.", 'start': 31324.199, 'duration': 6.982}], 'summary': 'Defining 3 constant nodes and performing multiplication operations in tensorflow.', 'duration': 28, 'max_score': 31303.181, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s31303181.jpg'}, {'end': 31418.326, 'src': 'embed', 'start': 31389.227, 'weight': 7, 'content': [{'end': 31391.209, 'text': "Let's go ahead and run this and see what happens.", 'start': 31389.227, 'duration': 1.982}, {'end': 31398.236, 'text': 'So we have got the value five which is correct because if you notice our presentation as well let me open it for you.', 'start': 31392.57, 'duration': 5.666}, {'end': 31404.075, 'text': 'Over here we also get the value phi similar to our implementation in PyCharm as well.', 'start': 31399.591, 'duration': 4.484}, {'end': 31408.238, 'text': 'So this is how you can actually build a computational graph and run a computational graph.', 'start': 31404.595, 'duration': 3.643}, {'end': 31409.319, 'text': "I've given you an example.", 'start': 31408.258, 'duration': 1.061}, {'end': 31412.602, 'text': 'Now guys, let us move forward because these are all the constant nodes.', 'start': 31409.839, 'duration': 2.763}, {'end': 31416.565, 'text': "What if I want to change the value that is there in the node? So for that we don't use the constant nodes.", 'start': 31412.702, 'duration': 3.863}, {'end': 31418.326, 'text': 'For that we use placeholders and variables.', 'start': 31416.585, 'duration': 1.741}], 'summary': 'Demonstration of building and running a computational graph using constant nodes, placeholders, and variables.', 'duration': 29.099, 'max_score': 31389.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s31389227.jpg'}, {'end': 31465.923, 'src': 'embed', 'start': 31443.955, 'weight': 4, 'content': [{'end': 31452.498, 'text': 'these three lines are bit like a function or a lambda in which we define two input parameters, A and B, and then an operation possible in them.', 'start': 31443.955, 'duration': 8.543}, {'end': 31454.439, 'text': 'So we are actually performing addition.', 'start': 31452.818, 'duration': 1.621}, {'end': 31460.961, 'text': 'So we can evaluate this graph with multiple inputs by using feed underscore dict parameter.', 'start': 31455.199, 'duration': 5.762}, {'end': 31465.923, 'text': 'As you can see we are doing it here so we are actually passing all these values to our placeholders here.', 'start': 31461.481, 'duration': 4.442}], 'summary': 'Defining a function with two parameters a and b, performing addition, and evaluating the graph with multiple inputs.', 'duration': 21.968, 'max_score': 31443.955, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s31443955.jpg'}, {'end': 31786.727, 'src': 'embed', 'start': 31736.112, 'weight': 0, 'content': [{'end': 31741.713, 'text': 'Now, as we have seen earlier as well, in order to initialize constants we call tf.constant, and their value will never change.', 'start': 31736.112, 'duration': 5.601}, {'end': 31745.954, 'text': 'But by contrast, variables are not initialized when you call 
tf.variable.', 'start': 31742.274, 'duration': 3.68}, {'end': 31748.955, 'text': 'So, to initialize all the variables in the TensorFlow program,', 'start': 31746.454, 'duration': 2.501}, {'end': 31753.736, 'text': "what you need to do is you need to explicitly call a special operation and how you're gonna do it.", 'start': 31748.955, 'duration': 4.781}, {'end': 31762.964, 'text': 'just type in here inet equal to tf.global underscore variables.', 'start': 31753.736, 'duration': 9.228}, {'end': 31764.485, 'text': 'underscore initializer.', 'start': 31762.964, 'duration': 1.521}, {'end': 31768.267, 'text': "That's all.", 'start': 31767.907, 'duration': 0.36}, {'end': 31770.368, 'text': "And then we're gonna run a session.", 'start': 31768.927, 'duration': 1.441}, {'end': 31780.232, 'text': 'so you know the process says is equals to tf.session inet.', 'start': 31770.368, 'duration': 9.864}, {'end': 31786.727, 'text': "Now let's print it and before that, what we need to do is we need to provide the x placeholder with some values.", 'start': 31781.826, 'duration': 4.901}], 'summary': 'Constants are initialized with tf.constant, while variables require tf.global_variables_initializer to be explicitly called in tensorflow program.', 'duration': 50.615, 'max_score': 31736.112, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s31736112.jpg'}, {'end': 32164.462, 'src': 'embed', 'start': 32101.778, 'weight': 2, 'content': [{'end': 32110.684, 'text': 'And Y will have values zero comma minus one comma minus two comma minus three.', 'start': 32101.778, 'duration': 8.906}, {'end': 32121.512, 'text': 'Print says dot run W comma P.', 'start': 32112.666, 'duration': 8.846}, {'end': 32125.61, 'text': 'Now let me first go ahead and Comment these lines.', 'start': 32121.512, 'duration': 4.098}, {'end': 32132.773, 'text': "Alright, so I've just made a mistake here.", 'start': 32130.212, 'duration': 2.561}, {'end': 32135.835, 'text': 'This is in uppercase W and yeah.', 'start': 32132.813, 'duration': 3.022}, {'end': 32138.656, 'text': "So now we are good to go and let's run this and see what happens.", 'start': 32136.055, 'duration': 2.601}, {'end': 32143.919, 'text': 'So these are our final model parameters.', 'start': 32141.918, 'duration': 2.001}, {'end': 32154.571, 'text': 'So the value of a W will be around .999969 and the value of our B will be around .9999082.', 'start': 32144.399, 'duration': 10.172}, {'end': 32160.618, 'text': 'So this is how we actually build a model and then we evaluate how good it is and then we try to optimize it in the best way possible.', 'start': 32154.571, 'duration': 6.047}, {'end': 32164.462, 'text': "So I've just given you a general overview of how things works in TensorFlow.", 'start': 32161.178, 'duration': 3.284}], 'summary': 'Model parameters: w=0.999969, b=0.9999082. 
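The segments above build up the standard TensorFlow 1.x workflow: constants, placeholders, variables initialized with tf.global_variables_initializer, a squared-error loss, a gradient-descent optimizer, and a session that trains the tiny linear model whose final W and b are then printed. A minimal end-to-end sketch of that workflow is given below; the transcript only states the targets y = [0, -1, -2, -3], so the input values x = [1, 2, 3, 4], the initial W = 0.3 and b = -0.3, and the 1000 training steps are assumptions borrowed from the standard getting-started example this walkthrough appears to follow.

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in the tutorial

# Trainable parameters (variables) and inputs (placeholders)
W = tf.Variable([0.3], dtype=tf.float32)   # initial value is an assumption
b = tf.Variable([-0.3], dtype=tf.float32)  # initial value is an assumption
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = W * x + b

# Loss: sum of the squared differences between model output and target
loss = tf.reduce_sum(tf.square(linear_model - y))

# Gradient descent optimizer with the 0.01 learning rate mentioned in this section
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

x_train = [1, 2, 3, 4]      # assumed input values
y_train = [0, -1, -2, -3]   # target values stated in the transcript

init = tf.global_variables_initializer()  # variables must be explicitly initialized
with tf.Session() as sess:
    sess.run(init)
    for _ in range(1000):   # number of training steps is an assumption
        sess.run(train, {x: x_train, y: y_train})
    print(sess.run([W, b]))  # with x = [1, 2, 3, 4] the least-squares optimum is W = -1, b = 1
```

tf.Session and tf.placeholder are TensorFlow 1.x constructs; in TensorFlow 2.x the same model would normally be written with eager execution or tf.keras, but the 1.x style matches what the tutorial shows.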
general overview of tensorflow.', 'duration': 62.684, 'max_score': 32101.778, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s32101778.jpg'}, {'end': 32289.646, 'src': 'embed', 'start': 32259.051, 'weight': 5, 'content': [{'end': 32262.172, 'text': 'If both the inputs are true or both the inputs are one, then output is true.', 'start': 32259.051, 'duration': 3.121}, {'end': 32265.093, 'text': 'Oops, I forgot a comma everywhere.', 'start': 32263.252, 'duration': 1.841}, {'end': 32266.613, 'text': 'Let me just go ahead and do that.', 'start': 32265.113, 'duration': 1.5}, {'end': 32269.474, 'text': 'So if both the inputs are true, the output will be true.', 'start': 32267.613, 'duration': 1.861}, {'end': 32271.655, 'text': 'If any of the input is false, the output will be false.', 'start': 32269.834, 'duration': 1.821}, {'end': 32276.856, 'text': "So I'm just gonna type in here false everywhere because there's only one condition in which we have the true output.", 'start': 32271.675, 'duration': 5.181}, {'end': 32279.857, 'text': 'All right, so this is done now.', 'start': 32278.637, 'duration': 1.22}, {'end': 32284.905, 'text': 'Now as we know that TensorFlow works by building a model out of empty tensors.', 'start': 32281.004, 'duration': 3.901}, {'end': 32289.646, 'text': 'Then plugging in known values and evaluating the model like we have done in the previous example.', 'start': 32285.525, 'duration': 4.121}], 'summary': 'If both inputs are true, output is true. tensorflow builds model using empty tensors.', 'duration': 30.595, 'max_score': 32259.051, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s32259051.jpg'}], 'start': 30793.501, 'title': 'Tensorflow basics and model optimization', 'summary': 'Covers understanding the mnist dataset, building and running computational graphs in tensorflow, using placeholders and variables, and optimizing a tensorflow model for better performance, all with practical application and quantifiable data.', 'chapters': [{'end': 30932.665, 'start': 30793.501, 'title': 'Understanding mnist dataset and tensorflow', 'summary': 'Covers the use of mnist dataset, containing 50,000 training images and 10,000 testing images for training and testing a deep learning model using tensorflow, which operates through tensor representation and data flow graph.', 'duration': 139.164, 'highlights': ['MNIST dataset contains 50,000 training images and 10,000 testing images for training and testing the model. The MNIST dataset contains handwritten digits from zero to nine, with 50,000 training images and 10,000 testing images.', 'TensorFlow is used to implement deep learning models, representing data as tensors, which are multi-dimensional arrays. TensorFlow is used to implement deep learning models, representing data as tensors, which are multi-dimensional arrays, or an extension of two-dimensional tables to data with high dimension.', 'TensorFlow operates through a data flow graph, performing matrix manipulation and utilizing tensors for weights, inputs, matrix multiplication, and bias addition. 
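The "data flow graph" just described is easiest to see in the small computational-graph examples from the earlier segments: operations (constants, addition, matrix multiplication, bias addition) are nodes, tensors flow along the edges, and nothing is evaluated until the graph is run inside a session. A minimal TF 1.x sketch follows; the constant values 3.0 and 4.0 follow the "three and four" output mentioned in this section, while the matmul shapes and values are purely illustrative assumptions.

```python
import tensorflow as tf  # TensorFlow 1.x API

# Constant nodes from the computational-graph example in this section
node1 = tf.constant(3.0, tf.float32)
node2 = tf.constant(4.0, tf.float32)
total = tf.add(node1, node2)

# A perceptron-style node: inputs times weights plus bias (shapes are illustrative)
x = tf.constant([[1.0, 2.0]])        # 1x2 input
W = tf.constant([[0.5], [0.5]])      # 2x1 weights
b = tf.constant([[0.1]])             # bias
net_input = tf.matmul(x, W) + b      # matrix multiplication and bias addition

with tf.Session() as sess:
    print(sess.run([node1, node2]))  # -> [3.0, 4.0]
    print(sess.run(total))           # -> 7.0
    print(sess.run(net_input))       # -> [[1.6]]
```

The same matmul-plus-bias pattern is what the MNIST perceptron later uses, just with a 784×10 weight matrix instead of these toy shapes.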
TensorFlow operates through a data flow graph, performing matrix manipulation and utilizing tensors for weights, inputs, matrix multiplication, and bias addition in the perceptron.']}, {'end': 31351.968, 'start': 30933.285, 'title': 'Tensorflow basics: building and running computational graphs', 'summary': 'Explains the basics of building and running computational graphs in tensorflow, including the process of defining constant nodes, creating a computational graph, and running the graph within a session to evaluate the nodes.', 'duration': 418.683, 'highlights': ['The process of building a computational graph in TensorFlow involves defining constant nodes, which can be as simple as addition or as complex as multivariate equations.', 'Running the computational graph within a session is essential to evaluate the nodes and obtain their values, in this case yielding the output of three and four.', 'A practical example demonstrates the creation of a computational graph with three constant nodes, performing operations such as multiplication, addition, and subtraction, followed by executing the graph within a session to obtain the final output.']}, {'end': 31815.502, 'start': 31356.44, 'title': 'Tensorflow computational graph and session', 'summary': 'Covers building and running a computational graph with tensorflow, using placeholders and variables to modify the graph and add trainable parameters, and initializing all the variables in a tensorflow program.', 'duration': 459.062, 'highlights': ['Building and Running Computational Graph The process of building and running a computational graph involves defining a variable, running the session with tf.session, and printing the output, resulting in getting the correct value of five, showcasing the practical implementation of building and running a computational graph.', 'Using Placeholders for Modifying Graph The utilization of placeholders includes defining placeholders for providing values later, performing operations with multiple inputs using feed_dict parameter, and executing the graph with provided values, resulting in getting the output of three and seven through addition, demonstrating the practical usage of placeholders for modifying the graph.', 'Introducing Variables for Trainable Parameters The introduction of variables in TensorFlow involves adding trainable parameters to the graph, initializing variables using tf.globalVariable initializer, and executing the session with initialized variables, showcasing the process of introducing variables for trainable parameters in a TensorFlow program.']}, {'end': 32276.856, 'start': 31817.483, 'title': 'Tensorflow model evaluation and optimization', 'summary': 'Discusses the evaluation and optimization of a tensorflow model, calculating the loss function, implementing gradient descent to minimize the loss, and obtaining the final model parameters with the aim of optimizing the model for better performance.', 'duration': 459.373, 'highlights': ['The loss function is calculated by subtracting the actual output from the desired output, squaring the difference, summing all the square deltas, and defining a single scalar as the loss. Loss is calculated as the sum of the squared differences between actual and desired outputs, resulting in a loss value of 23.66 for a specific set of input values.', "TensorFlow provides optimizers like gradient descent to minimize the loss function by slowly changing each variable, and the learning rate is set at 0.01. 
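As a concrete check of the 23.66 figure quoted here: assuming the initial values W = 0.3 and b = -0.3 and the inputs x = [1, 2, 3, 4] from the standard getting-started example this walkthrough follows (the transcript itself only gives the targets [0, -1, -2, -3]), the model predicts [0, 0.3, 0.6, 0.9]; the differences from the targets are [0, 1.3, 2.6, 3.9], their squares are [0, 1.69, 6.76, 15.21], and the sum of those square deltas is exactly 23.66.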
Gradient descent optimizer is used to minimize the loss function with a learning rate of 0.01, aiming to improve the model's performance.", 'The AND gate is implemented by providing training data consisting of the truth table for the AND gate and adding an extra value of one as bias to all training examples. The training data for the AND gate is provided, specifying the input-output relationships for the AND gate, and incorporating bias by adding an extra value of one to all training examples.']}], 'duration': 1483.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s30793501.jpg', 'highlights': ['MNIST dataset contains 50,000 training images and 10,000 testing images for training and testing the model.', 'TensorFlow operates through a data flow graph, performing matrix manipulation and utilizing tensors for weights, inputs, matrix multiplication, and bias addition.', 'The process of building a computational graph in TensorFlow involves defining constant nodes, which can be as simple as addition or as complex as multivariate equations.', 'Running the computational graph within a session is essential to evaluate the nodes and obtain their values, in this case yielding the output of three and four.', 'Building and Running Computational Graph involves defining a variable, running the session with tf.session, and printing the output, resulting in getting the correct value of five.', 'The utilization of placeholders includes defining placeholders for providing values later, performing operations with multiple inputs using feed_dict parameter, and executing the graph with provided values, resulting in getting the output of three and seven through addition.', 'The loss function is calculated by subtracting the actual output from the desired output, squaring the difference, summing all the square deltas, and defining a single scalar as the loss.', 'TensorFlow provides optimizers like gradient descent to minimize the loss function by slowly changing each variable, and the learning rate is set at 0.01.', 'The AND gate is implemented by providing training data consisting of the truth table for the AND gate and adding an extra value of one as bias to all training examples.']}, {'end': 34119.147, 'segs': [{'end': 33000.851, 'src': 'embed', 'start': 32940.373, 'weight': 2, 'content': [{'end': 32949.536, 'text': 'And similarly Y is also a 2D array where each row is one hot 10 dimensional vector indicating which digit class the corresponding MNIST image belongs to.', 'start': 32940.373, 'duration': 9.163}, {'end': 32954.578, 'text': 'And now next step is to define weight and biases for our model like we have done in the previous example.', 'start': 32950.318, 'duration': 4.26}, {'end': 32960.762, 'text': 'So we could imagine treating these like additional inputs, but TensorFlow has even a better way to handle them.', 'start': 32955.32, 'duration': 5.442}, {'end': 32962.803, 'text': 'and what it is it is nothing but variables.', 'start': 32960.762, 'duration': 2.041}, {'end': 32965.104, 'text': 'So let us go ahead and do that.', 'start': 32963.322, 'duration': 1.782}, {'end': 32969.086, 'text': "I'm gonna type in here WTF dot variable.", 'start': 32965.144, 'duration': 3.942}, {'end': 32979.427, 'text': "tf.zeros I'm going to initialize it to zeros and the shape will be a 784 comma 10.", 'start': 32970.881, 'duration': 8.546}, {'end': 32983.049, 'text': "So it's like 28 cross 28 pixels and a 10 classes.", 'start': 32979.427, 'duration': 3.622}, 
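The segments in this part define the MNIST placeholders, a 784×10 weight matrix and a 10-element bias vector initialized to zeros, a cross-entropy loss averaged over the examples, and training by steepest gradient descent at a 0.5 learning rate that reaches roughly 91.4% test accuracy. A compact TF 1.x sketch of that softmax-regression pipeline follows; the specific cross-entropy helper, the batch size of 100 and the 1000 training steps are assumptions in the spirit of the classic MNIST beginner tutorial rather than details confirmed by the transcript.

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # TF 1.x tutorial helper

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders: 28x28 images flattened to 784 values, and one-hot labels for 10 classes
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Weights and biases initialized to zeros, as described in the surrounding segments
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Softmax-regression model: matrix multiplication plus bias
y = tf.matmul(x, W) + b

# Cross-entropy: compare target output y_ with actual output y, then take the mean
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

# Steepest gradient descent with the 0.5 learning rate mentioned in this section
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Accuracy: fraction of examples where the predicted class matches the label
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):                                  # assumed number of steps
        batch_xs, batch_ys = mnist.train.next_batch(100)   # assumed batch size
        sess.run(train_step, {x: batch_xs, y_: batch_ys})
    print(sess.run(accuracy, {x: mnist.test.images, y_: mnist.test.labels}))
    # A single-layer model like this typically lands around the ~91-92% quoted here
```

That ~91-92% ceiling is what motivates the move to multi-layer networks discussed later in this section.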
{'end': 32985.571, 'text': 'Similarly when I talk about bias.', 'start': 32983.769, 'duration': 1.802}, {'end': 33000.851, 'text': 'so it will be tf.variable, tf.zeros, initialize it to zeros and the shape will be 10.', 'start': 32985.571, 'duration': 15.28}], 'summary': 'Defining weight, biases in tensorflow using 2d arrays for mnist image classification.', 'duration': 60.478, 'max_score': 32940.373, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s32940373.jpg'}, {'end': 33176.968, 'src': 'embed', 'start': 33151.309, 'weight': 4, 'content': [{'end': 33155.911, 'text': 'So labels is equals to y means that this is our target output and this is the actual output.', 'start': 33151.309, 'duration': 4.602}, {'end': 33157.833, 'text': "So it'll be, we'll name it as y underscore.", 'start': 33155.932, 'duration': 1.901}, {'end': 33164.855, 'text': 'And so what exactly is happening? It will calculate the difference between the target output and for the actual output for all the examples.', 'start': 33158.493, 'duration': 6.362}, {'end': 33167.797, 'text': 'Then it is gonna sum all of them and then find out the mean.', 'start': 33165.375, 'duration': 2.422}, {'end': 33171.498, 'text': 'So this is what basically this cross entropy variable will do.', 'start': 33168.396, 'duration': 3.102}, {'end': 33176.968, 'text': 'Now that we have defined our model and training loss function it is straightforward to train using TensorFlow.', 'start': 33172.188, 'duration': 4.78}], 'summary': 'Cross entropy variable calculates mean difference for training loss in tensorflow.', 'duration': 25.659, 'max_score': 33151.309, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s33151309.jpg'}, {'end': 33799.597, 'src': 'embed', 'start': 33771.285, 'weight': 0, 'content': [{'end': 33774.148, 'text': "Then we're gonna calculate the output of H2 as well similarly.", 'start': 33771.285, 'duration': 2.863}, {'end': 33779.672, 'text': 'And we get the output as .596884378.', 'start': 33774.688, 'duration': 4.984}, {'end': 33783.415, 'text': 'Next up, we are going to repeat the process for the output layer neurons as well.', 'start': 33779.672, 'duration': 3.743}, {'end': 33790.45, 'text': 'So for the output layer, the net input will be W5 into out of H1, W6 into out of H2 plus B2, that is the bias.', 'start': 33784.025, 'duration': 6.425}, {'end': 33792.772, 'text': 'So we get the output somewhere like this.', 'start': 33791.17, 'duration': 1.602}, {'end': 33799.597, 'text': 'And then the net output for out O1 will be after the activation function, which will be 0.751.', 'start': 33793.252, 'duration': 6.345}], 'summary': 'Calculating output for h2: 0.597, o1 net output: 0.751', 'duration': 28.312, 'max_score': 33771.285, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s33771285.jpg'}], 'start': 32278.637, 'title': 'Tensorflow model building and application', 'summary': 'Covers tensorflow model building, logic gates implementation, and handwritten digit classification with mnist dataset, achieving an accuracy of 91.4% using steepest gradient descent, and discusses multilayer perceptron, back propagation, and convolutional neural networks for improving efficiency to 99%.', 'chapters': [{'end': 32563.204, 'start': 32278.637, 'title': 'Tensorflow model building', 'summary': 'Explains the process of building a tensorflow model using empty tensors, defining variables 
with random values, and activation functions, calculating output error and mean squared error, and updating the weights through matrix multiplication and addition, ultimately requiring evaluation by a tensorflow session.', 'duration': 284.567, 'highlights': ['The process of building a TensorFlow model using empty tensors and defining variables with random values is explained. TensorFlow works by building a model out of empty tensors and defining a variable with random values using tf.variable and tf.random_normal.', 'Defining an activation function and calculating output error and mean squared error in the TensorFlow model. The chapter covers the definition of a step activation function, calculation of output error, and mean squared error using TensorFlow functions like tf.greater, tf.to_float, tf.multiply, tf.subtract, tf.matmul, and tf.reduce_mean.', 'Updating the weights of the model through matrix multiplication and addition, and the requirement of evaluation by a TensorFlow session. The process of updating the weights through matrix multiplication and addition is explained, along with the necessity of evaluating the model using a TensorFlow session after initializing all variables.']}, {'end': 33167.797, 'start': 32564.218, 'title': 'Implementing logic gates and handwritten digit classification', 'summary': 'Explains implementing logic gates using single layer perceptron and then moves on to discussing handwritten digit classification using tensorflow with mnist dataset, highlighting key concepts and quantifiable data.', 'duration': 603.579, 'highlights': ['Implementing Logic Gates The chapter explains implementing logic gates using single layer perceptron, where the mean squared error was reduced to zero in three epochs, demonstrating the effectiveness of the approach.', 'MNIST Handwritten Digit Classification The chapter discusses the classification of handwritten digits using TensorFlow with the MNIST dataset, which comprises 55,000 training sets and 10,000 test sets, and explains the concept of one-hot encoding.', 'Computation Graph and Regression Model The chapter delves into building the computation graph for the regression model, defining placeholders for input images and target output classes, and initializing variables for weight and biases in the TensorFlow model.']}, {'end': 33503.589, 'start': 33168.396, 'title': 'Tensorflow training and evaluation', 'summary': 'Explains training a model using tensorflow with steepest gradient descent and achieving an accuracy of 91.4% on the mnist dataset, highlighting the limitations of a single layer perceptron.', 'duration': 335.193, 'highlights': ['Achieving 91.4% accuracy on the MNIST dataset The model achieves an accuracy of around 91.4% on the MNIST dataset, indicating the performance of the trained model.', 'Training the model using steepest gradient descent with a learning rate of 0.5 The model is trained using steepest gradient descent with a learning rate of 0.5, demonstrating the optimization process during training.', 'Explaining the limitations of a single layer perceptron The chapter discusses the limitations of a single layer perceptron, providing insights into its drawbacks in handling complex datasets.']}, {'end': 33703.035, 'start': 33504.29, 'title': 'Multilayer perceptron and back propagation', 'summary': 'Discusses the use of multiple neurons in a multilayer perceptron with back propagation to solve the problem of classifying high and low outputs, and explains the back propagation algorithm using an example of lead 
prioritization.', 'duration': 198.745, 'highlights': ['The use of multiple neurons in a multilayer perceptron allows for the separation of high and low outputs with two lines, solving the classification problem (2 neurons can have two lines separating the high and low outputs)', 'A multilayer perceptron has the same structure as a single layer perceptron but with more than one hidden layer, enhancing its capability (multilayer perceptrons have more than one hidden layer)', 'The back propagation algorithm is used to update weights and increase the efficiency of the model, aiding in the learning process (back propagation algorithm updates weights to increase efficiency)', 'Back propagation is a supervised learning algorithm for multi-layer perceptron, involving comparing actual and desired outputs and updating weights accordingly (back propagation is a supervised learning algorithm for multi-layer perceptron)']}, {'end': 34119.147, 'start': 33703.535, 'title': 'Back propagation and convolutional neural networks', 'summary': 'Explains the back propagation process to update weights in a neural network to minimize errors, achieving an efficiency increase from 97% to 99% using multi-layer perceptron and convolutional neural networks.', 'duration': 415.612, 'highlights': ['Explanation of back propagation process The chapter details the process of back propagation to update weights in a neural network in order to minimize errors, with error calculations and weight updating formulas provided.', 'Efficiency increase from 97% to 99% using multi-layer perceptron and convolutional neural networks The chapter illustrates achieving an efficiency increase from 97% to 99% using multi-layer perceptron and convolutional neural networks to classify images, with a detailed explanation of the process involved.']}], 'duration': 1840.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s32278637.jpg', 'highlights': ['Achieving 91.4% accuracy on the MNIST dataset', 'Efficiency increase from 97% to 99% using multi-layer perceptron and convolutional neural networks', 'The back propagation algorithm is used to update weights and increase the efficiency of the model, aiding in the learning process', 'The use of multiple neurons in a multilayer perceptron allows for the separation of high and low outputs with two lines, solving the classification problem (2 neurons can have two lines separating the high and low outputs)', 'The process of building a TensorFlow model using empty tensors and defining variables with random values is explained. 
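The forward-pass numbers quoted above (out of H2 ≈ 0.596884378 and out of O1 ≈ 0.751) match a widely used worked back-propagation example with two inputs, two sigmoid hidden neurons and one bias per layer; the concrete inputs, weights and biases below are taken from that example and are therefore an assumption about the exact values used in the video. A plain-Python sketch of the forward pass:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, weights and biases assumed from the commonly used worked example
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden weights
b1 = 0.35
w5, w6 = 0.40, 0.45                       # hidden -> output neuron O1
b2 = 0.60

# Hidden layer: weighted sum plus bias, then the activation function
net_h1 = w1 * i1 + w2 * i2 + b1
net_h2 = w3 * i1 + w4 * i2 + b1
out_h1 = sigmoid(net_h1)   # ~0.5933
out_h2 = sigmoid(net_h2)   # ~0.596884378, the value quoted in this section

# Output layer: net input is W5*out_h1 + W6*out_h2 + B2, as described above
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
out_o1 = sigmoid(net_o1)   # ~0.751

print(out_h1, out_h2, out_o1)
```

Back-propagation then compares out_o1 with the desired output and pushes the resulting error back through these same weights, which is the weight-update step the highlights above refer to.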
TensorFlow works by building a model out of empty tensors and defining a variable with random values using tf.variable and tf.random_normal.']}, {'end': 36372.546, 'segs': [{'end': 34251.064, 'src': 'embed', 'start': 34220.777, 'weight': 2, 'content': [{'end': 34224.78, 'text': 'Basically, you can be from any background and still take up data science and machine learning.', 'start': 34220.777, 'duration': 4.003}, {'end': 34230.121, 'text': 'Now if you want to get into the depth of the field, then obviously you do require programming languages.', 'start': 34225.436, 'duration': 4.685}, {'end': 34237.389, 'text': "But, however, if you just want to start off and you cannot do programming at all, there are a couple of tools that I'll be discussing today,", 'start': 34230.662, 'duration': 6.727}, {'end': 34241.293, 'text': 'and these tools do not require you to have any prior programming knowledge.', 'start': 34237.389, 'duration': 3.904}, {'end': 34244.937, 'text': 'Now in the further slides, you will see me discuss about these tools.', 'start': 34242.134, 'duration': 2.803}, {'end': 34246.879, 'text': "I'll be mentioning a couple of features to y'all.", 'start': 34244.997, 'duration': 1.882}, {'end': 34251.064, 'text': 'Now, let me answer a couple of questions which are very frequently asked.', 'start': 34247.603, 'duration': 3.461}], 'summary': "Data science and machine learning accessible to all backgrounds; some tools don't need programming.", 'duration': 30.287, 'max_score': 34220.777, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s34220777.jpg'}, {'end': 34665.197, 'src': 'embed', 'start': 34613.003, 'weight': 0, 'content': [{'end': 34616.124, 'text': 'You can perform a lot of data visualization through this tool.', 'start': 34613.003, 'duration': 3.121}, {'end': 34625.306, 'text': 'It also comes with an inbuilt RapidMiner Hadoop that allows you to integrate with Hadoop frameworks for data mining and analysis.', 'start': 34616.764, 'duration': 8.542}, {'end': 34631.132, 'text': 'So having Hadoop support is very important because it helps in big data analytics.', 'start': 34626.127, 'duration': 5.005}, {'end': 34637.938, 'text': 'So you can dump all your data into the Hadoop framework and then perform analytics and all of that through RapidMiner.', 'start': 34631.532, 'duration': 6.406}, {'end': 34647.328, 'text': 'Apart from that, it supports any data format and it also performs top class predictive analytics by expertly cleaning the data.', 'start': 34638.619, 'duration': 8.709}, {'end': 34652.291, 'text': 'So data cleaning and data wrangling is done very easily through RapidMiner.', 'start': 34648.128, 'duration': 4.163}, {'end': 34659.234, 'text': 'It uses programming constructs that automate all the high level tasks such as data modeling.', 'start': 34652.971, 'duration': 6.263}, {'end': 34663.196, 'text': 'So data modeling is also automated on RapidMiner.', 'start': 34659.815, 'duration': 3.381}, {'end': 34665.197, 'text': "You don't have to code a single line.", 'start': 34663.557, 'duration': 1.64}], 'summary': 'Rapidminer enables data visualization, hadoop integration, and top-class predictive analytics without coding.', 'duration': 52.194, 'max_score': 34613.003, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s34613003.jpg'}, {'end': 35217.136, 'src': 'embed', 'start': 35187.388, 'weight': 10, 'content': [{'end': 35191.731, 'text': 'It provides the entire data 
science workflow and it is quite easy to use.', 'start': 35187.388, 'duration': 4.343}, {'end': 35194.752, 'text': 'Moving on to the next tool we have AutoVica.', 'start': 35192.469, 'duration': 2.283}, {'end': 35199.517, 'text': 'Now this is one of my favorite tools when it comes to machine learning and data science.', 'start': 35195.272, 'duration': 4.245}, {'end': 35210.59, 'text': 'It is an open source UI based tool which is ideal for beginners because it provides a very intuitive interface for performing all your data science related tasks.', 'start': 35200.118, 'duration': 10.472}, {'end': 35217.136, 'text': 'So apart from having a simple UI, it supports automated data processing.', 'start': 35211.733, 'duration': 5.403}], 'summary': 'Autovica is an open source ui tool ideal for beginners in data science, supporting automated data processing.', 'duration': 29.748, 'max_score': 35187.388, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s35187388.jpg'}, {'end': 35438.466, 'src': 'embed', 'start': 35409.715, 'weight': 11, 'content': [{'end': 35414.776, 'text': 'It supports Jython scripting, SPSS modulars, and data refinery.', 'start': 35409.715, 'duration': 5.061}, {'end': 35417.217, 'text': 'So it provides support for all these languages.', 'start': 35415.337, 'duration': 1.88}, {'end': 35424.319, 'text': 'Now for coders and data scientists, it offers integration with RStudio, with Scala, with Python, and so on.', 'start': 35417.937, 'duration': 6.382}, {'end': 35427.2, 'text': 'You can see that IBM Watson is a more advanced tool.', 'start': 35424.68, 'duration': 2.52}, {'end': 35432.242, 'text': 'So a lot of data scientists and a lot of coders actually make use of this tool.', 'start': 35427.941, 'duration': 4.301}, {'end': 35438.466, 'text': 'Apart from being automated, it also provides support for the different programming languages like R and Python.', 'start': 35433.122, 'duration': 5.344}], 'summary': 'Ibm watson supports jython, spss, data refinery, rstudio, scala, and python, making it an advanced tool used by many data scientists and coders.', 'duration': 28.751, 'max_score': 35409.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s35409715.jpg'}, {'end': 35489.415, 'src': 'embed', 'start': 35458.149, 'weight': 1, 'content': [{'end': 35460.451, 'text': "So I'd say IBM Watson is a more advanced tool.", 'start': 35458.149, 'duration': 2.302}, {'end': 35464.713, 'text': 'It provides support for all the different languages that come under data science.', 'start': 35460.551, 'duration': 4.162}, {'end': 35467.876, 'text': 'It has really good data analysis support.', 'start': 35464.813, 'duration': 3.063}, {'end': 35471.198, 'text': 'So the EDA that is performed in this tool is really good.', 'start': 35468.276, 'duration': 2.922}, {'end': 35477.322, 'text': 'It helps in extracting the most significant variables, and then it builds a model on that variable.', 'start': 35471.218, 'duration': 6.104}, {'end': 35480.764, 'text': 'So guys, that was a little bit about IBM Watson Studio.', 'start': 35477.962, 'duration': 2.802}, {'end': 35483.586, 'text': "Now let's look at our next tool, which is Tableau.", 'start': 35481.224, 'duration': 2.362}, {'end': 35489.415, 'text': 'Now, Tableau is known as the most popular data visualization tool in the market.', 'start': 35484.392, 'duration': 5.023}], 'summary': 'Ibm watson: advanced tool with support for all languages; tableau: 
popular data visualization tool', 'duration': 31.266, 'max_score': 35458.149, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s35458149.jpg'}, {'end': 35529.995, 'src': 'embed', 'start': 35501.439, 'weight': 14, 'content': [{'end': 35507.944, 'text': 'You can also perform a lot of data analysis also on the tool, but it is one of the best tools for data visualizations.', 'start': 35501.439, 'duration': 6.505}, {'end': 35512.228, 'text': 'A lot of high-end companies, a lot of data-driven companies,', 'start': 35507.964, 'duration': 4.264}, {'end': 35517.672, 'text': 'make use of this tool to see their business growth and to visualize and analyze their data.', 'start': 35512.228, 'duration': 5.444}, {'end': 35520.975, 'text': 'So let me tell you a couple of features of Tableau.', 'start': 35518.553, 'duration': 2.422}, {'end': 35529.995, 'text': 'The Tableau desktop feature, it allows you to create customized reports and dashboards that help you get real-time updates.', 'start': 35521.633, 'duration': 8.362}], 'summary': 'Tableau is a powerful tool for data visualization, used by high-end and data-driven companies for business growth and analysis.', 'duration': 28.556, 'max_score': 35501.439, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s35501439.jpg'}, {'end': 36372.546, 'src': 'embed', 'start': 36352.796, 'weight': 13, 'content': [{'end': 36363.287, 'text': 'but Python is moving itself into a production grade where things can be deployed after the sort of prototype into a production environment and it can phase the customers from day one.', 'start': 36352.796, 'duration': 10.491}, {'end': 36366.67, 'text': 'So that sort of capabilities are coming up with Python.', 'start': 36363.847, 'duration': 2.823}, {'end': 36372.546, 'text': "So let's talk something a bit more specific with the data.", 'start': 36367.884, 'duration': 4.662}], 'summary': 'Python is becoming production-ready, enabling deployment of prototypes to production environment, and serving customers from day one.', 'duration': 19.75, 'max_score': 36352.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s36352796.jpg'}], 'start': 34119.147, 'title': 'Tools for data science & ml', 'summary': 'Covers the accessibility of data science and machine learning to non-programmers, the advantages of tools requiring no programming, and the highlights of top tools such as rapidminer, datarobot, bigml, and mlbase. it also discusses machine learning tools like google cloud automl, autovica, and mlbase, and provides tips for creating a data scientist sample resume.', 'chapters': [{'end': 34507.391, 'start': 34119.147, 'title': 'Data science & ml for non-programmers', 'summary': 'Discusses the accessibility of data science and machine learning to non-programmers, addressing the top three questions about the necessity of programming knowledge, the availability of tools requiring no programming, and the importance of advanced mathematics. the chapter also highlights the advantages of using data science and ml tools for non-programmers.', 'duration': 388.244, 'highlights': ["There were 10,000 test samples with 9876 correct predictions, achieving around 99% accuracy, significantly higher than the single layer perceptron example with 92% accuracy. 
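For reference, 9,876 correct predictions out of 10,000 test samples works out to 98.76% accuracy, which is why it is rounded to roughly 99% here, against the roughly 92% quoted for the single-layer model.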
The model's performance on the test set demonstrated an accuracy of around 99%, a substantial improvement compared to a previous example with 92% accuracy, showcasing the effectiveness of the current approach.", 'Tools developed in the last decade have made AI and ML more accessible, even to individuals with no prior programming experience, such as software testers, product managers, and content creators. Advancements in tools over the last decade have democratized access to AI and ML, enabling individuals from diverse backgrounds, including those without programming experience, to engage in these fields.', 'Tools exist that do not require prior programming knowledge, making AI and ML accessible to beginners without programming skills, although learning programming languages is still recommended for a comprehensive understanding. While certain tools do not mandate programming knowledge, the recommendation for beginners to learn programming languages underscores the importance of grasping the underlying principles, despite the availability of non-programming tools.', 'There are tools in the market that facilitate machine learning for business applications, including predictive modeling and statistical analysis, catering to individuals without programming knowledge. The market offers tools specifically designed for business applications of machine learning, providing functionalities for predictive modeling and statistical analysis, particularly beneficial for individuals without programming knowledge.', 'Advanced mathematics, including probability, statistics, and linear algebra, is essential for an in-depth understanding of machine learning and data science, although lack of advanced math knowledge should not deter learning machine learning. While advanced mathematics is fundamental for a comprehensive grasp of machine learning, the absence of such knowledge should not discourage individuals from pursuing machine learning, emphasizing the significance of understanding mathematical concepts for the field.', 'Data science and ML tools offer advantages to non-programmers, including not requiring programming skills, providing an interactive user interface, and automating processes with minimal human intervention. 
The tools cater to non-programmers by offering an interactive user interface and automated processes, eliminating the need for programming skills and minimizing human intervention, thereby facilitating efficient data processing and model building.']}, {'end': 34972.396, 'start': 34508.254, 'title': 'Top tools for data science and ml', 'summary': 'Discusses the advantages of data science and machine learning tools, and highlights rapidminer, datarobot, bigml, and mlbase as top tools for non-programmers, emphasizing their features and ease of use for data analysis and predictive modeling.', 'duration': 464.142, 'highlights': ['RapidMiner is an all-in-one tool for data science workflow, with a strong and interactive UI, Hadoop integration for big data analytics, support for any data format, top-class predictive analytics, and automated data modeling.', 'DataRobot is an automated machine learning platform that identifies significant features, automates data modeling, tests accuracy of different machine learning models, and incorporates model evaluation methods like parameter tuning, making it a single tool for end-to-end data science and machine learning processes.', 'BigML eases the process of developing machine learning and data science models, offers support for complex machine learning algorithms, provides a simple web interface and APIs, creates visually interactive predictive models, and incorporates bindings and libraries of popular data science languages like Python and Java.', "MLBase is an open source tool used to create large-scale machine learning projects, with components like ML optimizer to automate machine learning pipeline construction, MLI API for developing algorithms and performing feature extractions, and MLlib as Apache Spark's machine learning library, supported by the Spark's community."]}, {'end': 35321.68, 'start': 34972.916, 'title': 'Ml tools overview', 'summary': 'Discusses three machine learning tools: mlbase, google cloud automl, and autovica, highlighting their simple ui, model evaluation, ease of use for non-programmers, and support for beginners, with google cloud automl standing out for its extensive documentation and integration with google cloud services.', 'duration': 348.764, 'highlights': ['Google Cloud AutoML stands out for its extensive documentation and integration with Google Cloud services, making it easy to use for beginners and professionals with minimal experience in machine learning. Extensive documentation, integration with Google Cloud services, easy to use for beginners and professionals with minimal experience', 'MLBase provides a very simple UI for developing machine learning models, making it easy for non-programmers to scale data science modules and effective for large projects. Simple UI, easy for non-programmers, effective for large projects', 'AutoVica is an open source UI based tool ideal for beginners, with a supportive community, extensive data processing, and a wide range of machine learning algorithms. Open source UI, ideal for beginners, supportive community, extensive data processing, wide range of machine learning algorithms']}, {'end': 36017.784, 'start': 35322.602, 'title': 'Top tools for data science & machine learning', 'summary': 'Discusses the key features and applications of ibm watson studio, tableau, trafactor, and knime, emphasizing their contribution to ai and data science, their automated data analysis capabilities, and their support for various programming languages. 
the chapter also provides tips for creating a data scientist sample resume.', 'duration': 695.182, 'highlights': ['IBM Watson Studio provides support for data preparation, exploration, modeling, and multiple data science languages, with an emphasis on automation and quick end-to-end process. IBM Watson Studio offers quick and automated support for data preparation, exploration, and modeling, with the ability to perform the entire end-to-end process in a few minutes. It also supports multiple data science languages and tools such as Python 3 notebooks, Jython scripting, SPSS modulars, and data refinery.', 'Tableau is known for its data visualization capabilities, customized reports, dashboards for real-time updates, and cross-database join functionality, making it a popular choice for high-end companies for data visualization and analysis. Tableau is renowned for its data visualization capabilities, allowing the creation of customized reports and dashboards for real-time updates. It also offers cross-database join functionality and can visualize a significant amount of data to identify correlations and patterns.', 'Trafactor stands out as an enterprise data wrangling platform with the ability to connect to multiple data sources, interactive UI, visual guidance, machine learning workflows, and monitoring capabilities, making it a powerful tool for data preparation and cleaning. Trafactor is an enterprise data wrangling platform that can connect to multiple data sources and provides an interactive UI for understanding and cleaning the data. It also offers visual guidance, machine learning workflows, and monitoring capabilities to ensure data preparation and cleaning.', 'KNIME is an open source data analytics platform enabling end-to-end data science workflow creation without coding, with support for various data sourcing formats, data wrangling, feature extraction, normalization, data modeling, model evaluation, and interactive visualizations, making it a comprehensive tool for data science. KNIME is an open source data analytics platform that allows the creation of end-to-end data science workflows without coding. It supports various data sourcing formats, data wrangling, feature extraction, normalization, data modeling, model evaluation, and interactive visualizations.', 'The resume tips emphasize the importance of showcasing data science projects, proficiency in programming languages such as R and Python, strong understanding of predictive modeling and machine learning algorithms, data mining, cleaning, and modeling, graphical modeling, data visualization, and deep learning using neural networks. The resume tips highlight the significance of showcasing data science projects, proficiency in programming languages such as R and Python, understanding of predictive modeling and machine learning algorithms, data mining, cleaning, and modeling, graphical modeling, data visualization, and deep learning using neural networks.']}, {'end': 36372.546, 'start': 36018.424, 'title': 'Data scientist resume tips', 'summary': 'Provides key points for building a data scientist resume, including stating career objectives, educational qualifications, professional experience, technical and non-technical skills, and essential knowledge in data science, computer science, and mathematics.', 'duration': 354.122, 'highlights': ["Stating career objectives, educational qualifications, professional experience, technical, and non-technical skills are key points for building a data scientist resume. 
It's important to clearly state career objectives, educational qualifications, professional experience, technical, and non-technical skills when building a data scientist resume.", 'Preferred educational backgrounds for data scientists include computer science and statistics. Individuals with backgrounds in computer science or statistics are preferred for data science roles.', 'Python is a highly sought-after programming language for data science, especially with libraries like NumPy and Pandas. Python, with libraries like NumPy and Pandas, has established itself as a robust framework for designing data science solutions.', 'Data science involves applying computer science and mathematical knowledge to real-world business applications to achieve a return on investment. Data science involves applying computer science and mathematical knowledge to real-world business applications to achieve a return on investment.', 'Data scientists are required to have strong communication skills and business acumen for effectively communicating with stakeholders. Data scientists need strong communication skills and business acumen to effectively communicate with stakeholders and handle business processes.']}], 'duration': 2253.399, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s34119147.jpg', 'highlights': ['Tools have made AI and ML accessible to non-programmers, enabling diverse backgrounds.', 'Advanced math is essential for understanding ML, but lack of it should not deter learning.', 'Data science and ML tools offer advantages to non-programmers, including not requiring programming skills.', 'RapidMiner is an all-in-one tool for data science workflow with a strong and interactive UI.', 'DataRobot is an automated ML platform that automates data modeling and incorporates model evaluation methods.', 'BigML eases the process of developing ML and data science models, offering support for complex algorithms.', 'MLBase is an open source tool used to create large-scale machine learning projects.', 'Google Cloud AutoML stands out for its extensive documentation and integration with Google Cloud services.', 'AutoVica is an open source UI based tool ideal for beginners, with a wide range of machine learning algorithms.', 'IBM Watson Studio offers quick and automated support for data preparation, exploration, and modeling.', 'Tableau is known for its data visualization capabilities and customized reports.', 'Trafactor is an enterprise data wrangling platform with interactive UI, visual guidance, and monitoring capabilities.', 'KNIME is an open source data analytics platform enabling end-to-end data science workflow creation without coding.', 'The resume tips emphasize the importance of showcasing data science projects and proficiency in programming languages.', 'Stating career objectives, educational qualifications, professional experience, technical, and non-technical skills are key points for building a data scientist resume.', 'Preferred educational backgrounds for data scientists include computer science and statistics.', 'Python is a highly sought-after programming language for data science, especially with libraries like NumPy and Pandas.', 'Data science involves applying computer science and mathematical knowledge to real-world business applications.', 'Data scientists are required to have strong communication skills and business acumen.']}, {'end': 38103.891, 'segs': [{'end': 36502.251, 'src': 'embed', 'start': 36474.189, 'weight': 4, 'content': 
[{'end': 36480.014, 'text': 'right, but what happens with this is, while we do this 1 million record selection in a randomized way,', 'start': 36474.189, 'duration': 5.825}, {'end': 36486.84, 'text': 'there are chances that you might have certain bias in the analysis, obviously because you are not using the entire population.', 'start': 36480.014, 'duration': 6.826}, {'end': 36494.146, 'text': 'so selection bias is this particular sort of a characteristics while you are doing a sampling on a large population of data.', 'start': 36486.84, 'duration': 7.306}, {'end': 36502.251, 'text': 'A very common example for this is if you want to do an exit poll analysis of a particular election even before the election results are coming up,', 'start': 36494.526, 'duration': 7.725}], 'summary': 'Sampling 1 million records in a randomized way may lead to selection bias in analysis of a large population.', 'duration': 28.062, 'max_score': 36474.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s36474189.jpg'}, {'end': 36639.941, 'src': 'embed', 'start': 36615.174, 'weight': 2, 'content': [{'end': 36623.796, 'text': 'certain data visualizations would need you to not put your attributes as a separate column but rather as the one column which can have the attribute names,', 'start': 36615.174, 'duration': 8.622}, {'end': 36628.198, 'text': 'so which might then go into building your legends right.', 'start': 36623.796, 'duration': 4.402}, {'end': 36633.899, 'text': 'so these kind of techniques are kind of very common formats between, like the long and divide,', 'start': 36628.198, 'duration': 5.701}, {'end': 36639.941, 'text': 'and very frequently will people like deal with both these data formats, depending on what tasks they are doing,', 'start': 36633.899, 'duration': 6.042}], 'summary': 'Data visualizations may require attribute names in one column, using common formats like long and wide.', 'duration': 24.767, 'max_score': 36615.174, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s36615174.jpg'}, {'end': 37233.473, 'src': 'embed', 'start': 37211.468, 'weight': 1, 'content': [{'end': 37221.977, 'text': 'So you have your, in this case your FP, which is the false positive cases, and on the off diagonal element, if you look at this one, which is FN,', 'start': 37211.468, 'duration': 10.509}, {'end': 37227.261, 'text': 'which is your false negative for the cases where you are predicting the customer is not going to buy,', 'start': 37221.977, 'duration': 5.284}, {'end': 37231.332, 'text': 'but in the actual data it says that customer actually has brought the product.', 'start': 37227.83, 'duration': 3.502}, {'end': 37233.473, 'text': 'so in this case the model is wrong.', 'start': 37231.332, 'duration': 2.141}], 'summary': "Model's false positive and false negative cases affect its accuracy.", 'duration': 22.005, 'max_score': 37211.468, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s37211468.jpg'}, {'end': 37397.68, 'src': 'embed', 'start': 37357.121, 'weight': 5, 'content': [{'end': 37362.703, 'text': 'so these words are very common and the idea is depending on the complexity of your model.', 'start': 37357.121, 'duration': 5.582}, {'end': 37370.465, 'text': 'you might see that you want to adapt very sort of exactly to your data points or you might want to do a generalization.', 'start': 37362.703, 'duration': 7.762}, {'end': 37371.766, 
'text': 'so for instance here,', 'start': 37370.465, 'duration': 1.301}, {'end': 37387.814, 'text': 'if I have these red and blue dots here right and if I draw a curve like this which separates the red from the blue and when this separation happens I am building actually a classifier using some sort of modeling technique.', 'start': 37371.766, 'duration': 16.048}, {'end': 37397.68, 'text': 'but now imagine, by drawing a smooth curve like the one which is given in black, you might be over generalizing it right,', 'start': 37387.814, 'duration': 9.866}], 'summary': 'Adapt model complexity to data points for accurate classification.', 'duration': 40.559, 'max_score': 37357.121, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s37357121.jpg'}, {'end': 37507.388, 'src': 'embed', 'start': 37482.929, 'weight': 0, 'content': [{'end': 37491.622, 'text': 'as you would be like very aware of, like things like standard deviation averages, how to interpret median, how to interpret quartiles right,', 'start': 37482.929, 'duration': 8.693}, {'end': 37495.064, 'text': 'the first quartile, second quartile and so on.', 'start': 37491.622, 'duration': 3.442}, {'end': 37497.685, 'text': 'and what do you mean by percentiles?', 'start': 37495.064, 'duration': 2.621}, {'end': 37499.585, 'text': 'right, these are some basic questions.', 'start': 37497.685, 'duration': 1.9}, {'end': 37505.447, 'text': 'a bit more complex in nature might be discussions around sensitivity, overfitting and underfitting.', 'start': 37499.585, 'duration': 5.862}, {'end': 37507.388, 'text': 'these are like statistical ideas.', 'start': 37505.447, 'duration': 1.941}], 'summary': 'Understanding statistical concepts like standard deviation, averages, median, quartiles, and percentiles, as well as discussions on sensitivity, overfitting, and underfitting.', 'duration': 24.459, 'max_score': 37482.929, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s37482929.jpg'}, {'end': 37616.276, 'src': 'embed', 'start': 37588.507, 'weight': 3, 'content': [{'end': 37594.514, 'text': 'so in particular, R has libraries like tm right, the text mining package, python as well.', 'start': 37588.507, 'duration': 6.007}, {'end': 37605.006, 'text': 'we have packages like pandas, packages like the numpy ones right, and also packages like NLTK, which is built only for natural language processing,', 'start': 37594.514, 'duration': 10.492}, {'end': 37609.81, 'text': 'so it can deal with many different sort of text mining approaches or text analytics approaches.', 'start': 37605.006, 'duration': 4.804}, {'end': 37616.276, 'text': 'so in comparison, if you talk about, as I said, the robustness in python is much more than in R, but in terms of features,', 'start': 37609.81, 'duration': 6.466}], 'summary': 'R and python have libraries like tm, pandas, numpy, and nltk for text mining and natural language processing, with python being more robust.', 'duration': 27.769, 'max_score': 37588.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s37588507.jpg'}], 'start': 36372.947, 'title': 'Data analysis techniques and pitfalls', 'summary': 'Delves into selection bias, normal distribution, ab testing, and best practices in data analysis. It emphasizes the challenges of selection bias and the significance of understanding normal distribution. 
additionally, it covers ab testing in web design, the importance of sensitivity in machine learning models, and best practices such as data cleaning, multivariate analysis, sampling techniques, and the role of eigenvalue and eigenvectors in reducing data dimensions.', 'chapters': [{'end': 36537.944, 'start': 36372.947, 'title': 'Selection bias in data analysis', 'summary': 'Discusses the concept of selection bias in data analysis, where a large dataset requires a representative sample and the challenges of using randomized selection to minimize bias, illustrated with an example of an exit poll analysis.', 'duration': 164.997, 'highlights': ['Selection bias occurs in data analysis when a large dataset, such as 1 billion records, requires a representative sample for analysis, leading to the challenge of minimizing bias while using randomized selection. The chapter emphasizes the challenge of ensuring a representative sample from a large dataset of 1 billion records, highlighting the difficulty of minimizing bias while using randomized selection.', 'The chapter explains that randomized selection of a smaller subset, such as 1 million records, from a large dataset is aimed at achieving a true representation of the entire population but may still introduce bias into the analysis. It discusses the intention of using randomized selection to obtain a true representation of the entire population, but also acknowledges the potential introduction of bias into the analysis despite this approach.', "The example of an exit poll analysis is used to illustrate the impact of selection bias, where a non-representative sample may lead to inaccurate conclusions about the entire population's opinion. An example of an exit poll analysis is provided to demonstrate the potential consequences of selection bias, emphasizing the possibility of inaccurate conclusions about the entire population's opinion due to a non-representative sample."]}, {'end': 36870.057, 'start': 36538.384, 'title': 'Dealing with data and normal distribution', 'summary': 'Covers dealing with structured data formats, the significance of normal distribution in data analysis, and the importance of understanding normal distribution in statistical techniques and modeling exercises.', 'duration': 331.673, 'highlights': ['Understanding structured data formats and the benefits of long and wide formats in data visualization Long and wide formats are common structured data formats with benefits in data visualization, particularly in building visualization dashboards.', 'The significance of normal distribution in data analysis and its characteristics in understanding data Normal distribution is commonly used in analyzing data, particularly in understanding characteristics of data such as salary ranges, bell curve distribution, and its significance in data analysis.', 'The importance of normal distribution in statistical techniques and modeling exercises Normal distribution plays a crucial role in statistical techniques and modeling exercises, where its understanding is fundamental in making modeling assumptions and applying certain modeling techniques.']}, {'end': 37631.692, 'start': 36870.057, 'title': 'Ab testing, sensitivity, and overfitting in data analysis', 'summary': 'Covers the concept of ab testing in web design changes, the importance of sensitivity in machine learning models, and the potential issues of overfitting and underfitting in model building.', 'duration': 761.635, 'highlights': ['The chapter explains the concept of AB testing in the 
context of web design changes, highlighting the process of defining a metric to measure the impact of changes and the use of A-B testing framework for risk identification. AB testing is used by companies like LinkedIn to test changes in website design and features, where analysts define metrics to measure the impact of changes and use a framework to identify risks through randomized user groups.', 'The importance of sensitivity in evaluating machine learning models is emphasized, with a focus on controlling true positive and true negative cases to ensure balanced model performance. Sensitivity is crucial for evaluating machine learning models, as it helps control true positive and true negative cases, ensuring balanced model performance and statistical power.', 'The potential issues of overfitting and underfitting in model building are discussed, emphasizing the need for a balanced approach to generalization and pattern recognition in data analysis. The chapter highlights the potential issues of overfitting and underfitting in model building, emphasizing the need for a balanced approach to generalization and pattern recognition in data analysis to avoid skewed predictions.']}, {'end': 38103.891, 'start': 37631.692, 'title': 'Data analysis techniques and best practices', 'summary': 'Emphasizes the importance of data cleaning and understanding, multivariate analysis, sampling techniques, and eigenvalue and eigenvectors in data analysis, with 70-80% of time spent on cleaning and understanding the data, and a powerful idea of eigenvalue and eigenvectors in reducing the dimensions of a large data set, commonly used in principal component analysis (pca).', 'duration': 472.199, 'highlights': ['70-80% of time is spent on data cleaning and understanding in any data analysis task Data cleaning and understanding take up 70-80% of the time in any data analysis task.', 'Importance of eigenvalue and eigenvectors in reducing the dimensions of a large data set, commonly used in principal component analysis (PCA) Eigenvalue and eigenvectors are important in reducing the dimensions of a large data set and are commonly used in principal component analysis (PCA).', 'Emphasis on multivariate analysis to understand complex problems involving multiple factors Multivariate analysis is emphasized to understand complex problems involving multiple factors.', 'Explanation of sampling techniques such as cluster-based and systematic sampling for accurate analysis The transcript explains sampling techniques like cluster-based and systematic sampling for accurate analysis.']}], 'duration': 1730.944, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s36372947.jpg', 'highlights': ['Selection bias challenge: ensuring representative sample from 1 billion records.', 'Randomized selection aims for true representation but may introduce bias.', 'Exit poll analysis example illustrates impact of selection bias on conclusions.', 'Long and wide data formats benefit data visualization and dashboard creation.', 'Significance of normal distribution in analyzing data and statistical techniques.', 'AB testing in web design: defining metrics, using A-B testing framework for risk identification.', 'Importance of sensitivity in evaluating machine learning models for balanced performance.', 'Balanced approach needed to avoid overfitting and underfitting in model building.', 'Data cleaning and understanding consume 70-80% of time in any data analysis task.', 'Eigenvalue and eigenvectors 
crucial in reducing dimensions of large data sets in PCA.', 'Emphasis on multivariate analysis for understanding complex problems with multiple factors.', 'Explanation of cluster-based and systematic sampling techniques for accurate analysis.']}, {'end': 39195.264, 'segs': [{'end': 38298.988, 'src': 'embed', 'start': 38274.824, 'weight': 2, 'content': [{'end': 38282.526, 'text': 'so in simple terms, it is better to not expose the patient with a false positive with a treatment like this chemotherapy, like treatment,', 'start': 38274.824, 'duration': 7.702}, {'end': 38290.909, 'text': "then it is like much better than saying you don't have cancer okay, and a very similar example in some other context might also come up.", 'start': 38282.526, 'duration': 8.383}, {'end': 38293.89, 'text': 'so if you would like to think of some other examples in the same context.', 'start': 38290.909, 'duration': 2.981}, {'end': 38298.988, 'text': 'okay, so where is the other case now, which is the false negative right?', 'start': 38294.545, 'duration': 4.443}], 'summary': 'Avoiding false positives in chemotherapy is crucial for patient well-being, as with false negatives in other contexts.', 'duration': 24.164, 'max_score': 38274.824, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s38274824.jpg'}, {'end': 38940.605, 'src': 'embed', 'start': 38910.528, 'weight': 1, 'content': [{'end': 38916.589, 'text': 'So depending on the presence of a label, we can say either to use supervised or an unsupervised learning.', 'start': 38910.528, 'duration': 6.061}, {'end': 38923.55, 'text': 'And both of these approaches are quite common, and sometimes there are certain algorithms which can have the both ways.', 'start': 38917.129, 'duration': 6.421}, {'end': 38928.611, 'text': 'It can also learn an unsupervised manner and the supervised manner, so depending on how you model the problem.', 'start': 38923.79, 'duration': 4.821}, {'end': 38933.452, 'text': 'The fundamental difference comes from the fact on whether we have the label or not.', 'start': 38929.031, 'duration': 4.421}, {'end': 38940.605, 'text': 'okay. 
so when we talk about the supervised learning algorithms other name for supervise kind of,', 'start': 38934.381, 'duration': 6.224}], 'summary': 'Presence of label determines use of supervised or unsupervised learning.', 'duration': 30.077, 'max_score': 38910.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s38910528.jpg'}, {'end': 39079.214, 'src': 'embed', 'start': 39055.478, 'weight': 0, 'content': [{'end': 39064.089, 'text': 'and these algorithms are best suited for two class problems or a binary problem right where you have either y, yes or no, Quite a common technique,', 'start': 39055.478, 'duration': 8.611}, {'end': 39072.352, 'text': 'as I mentioned, and in all the possible cases wherever you have these binary classes of problems, you might use a logistic regression.', 'start': 39064.089, 'duration': 8.263}, {'end': 39079.214, 'text': 'A political leader winning an election or not, somebody getting a success in an examination or not and, as I mentioned,', 'start': 39072.912, 'duration': 6.302}], 'summary': 'Logistic regression is best suited for binary class problems, such as election wins and examination success.', 'duration': 23.736, 'max_score': 39055.478, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s39055478.jpg'}], 'start': 38105.412, 'title': 'Machine learning fundamentals', 'summary': 'Covers the importance of false positives and false negatives in different scenarios, the process of building a machine learning model, and the concepts of supervised and unsupervised learning in machine learning, providing insights into their relevance in various industries and applications.', 'chapters': [{'end': 38454.687, 'start': 38105.412, 'title': 'Understanding false positives and false negatives', 'summary': 'Discusses the importance of false positives and false negatives in different scenarios, such as in medical diagnoses and criminal convictions, highlighting the potential consequences of each, as well as their relevance in the banking industry.', 'duration': 349.275, 'highlights': ['The importance of false positives in medical diagnoses, such as in the case of predicting cancer and the subsequent decision of administering chemotherapy, where false positives can lead to harmful side effects, is emphasized (e.g., chemotherapy administered to a patient without cancer cells).', 'The significance of false negatives in criminal convictions, where the potential harm caused by letting a criminal go free due to a false negative prediction outweighs the inconvenience of holding a suspect for a longer period, is highlighted (e.g., the risk of a criminal going free from the judicial system).', 'The equal role of false positives and false negatives in the banking industry, particularly in loan approvals, where both types of errors can lead to financial losses, is discussed, emphasizing the impact of each type of error on the business (e.g., losing business due to false negatives and taking a financial risk due to false positives).']}, {'end': 38804.633, 'start': 38454.687, 'title': 'Building machine learning model', 'summary': 'Covers the process of building a machine learning model, including data division into training, validation, and testing sets, the k-fold cross validation approach, and the importance of cross validation in model generalization and performance improvement.', 'duration': 349.946, 'highlights': ['The chapter covers the process of building a machine 
learning model, including data division into training, validation, and testing sets. It explains the need to divide the data set into training, validation, and testing sets for building a machine learning model.', 'It describes the k-fold cross validation approach, where a small portion of the data is kept for validation, and the rest is used for training, with the validation set changing in each fold. The k-fold cross validation approach involves using a training set and a validation set, with a small portion of the data kept for validation and the rest used for training, and the validation set changing in each fold.', 'The chapter emphasizes the importance of cross validation in model generalization and performance improvement. It highlights the significance of cross validation in ensuring the model generalizes well to the data and brings performance improvements, addressing overfitting and underfitting cases.']}, {'end': 39195.264, 'start': 38804.633, 'title': 'Supervised and unsupervised learning in machine learning', 'summary': 'Discusses the concepts of supervised and unsupervised learning in machine learning, highlighting the fundamental differences, types of algorithms, and common use cases, such as classification and recommender systems.', 'duration': 390.631, 'highlights': ['Supervised learning involves labeled input attributes, while unsupervised learning deals with unlabeled input attributes, with common algorithms including support vector machine regression and logistic regression. Supervised learning with labeled input attributes; unsupervised learning with unlabeled input attributes; common algorithms like support vector machine regression and logistic regression.', 'Classification is a type of supervised learning algorithm used for categorizing input attributes into classes, such as identifying fruits or predicting customer behavior in banking sectors. Classification as a type of supervised learning; examples include identifying fruits and predicting customer behavior in banking sectors.', 'Logistic regression, a widely used algorithm in banking and companies like American Express, is best suited for two-class problems and binary classification, such as predicting customer defaulters. Logistic regression widely used in banking and by companies like American Express; best suited for two-class problems and binary classification, such as predicting customer defaulters.', 'Recommender systems, widely used in platforms like Amazon, YouTube, Netflix, and Facebook, provide personalized recommendations for products, videos, movies, and friends, benefiting businesses by increasing sales and user engagement. Recommender systems widely used in platforms like Amazon, YouTube, Netflix, and Facebook; provide personalized recommendations for products, videos, movies, and friends; benefit businesses by increasing sales and user engagement.']}], 'duration': 1089.852, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s38105412.jpg', 'highlights': ['The equal role of false positives and false negatives in the banking industry, particularly in loan approvals, where both types of errors can lead to financial losses, is discussed, emphasizing the impact of each type of error on the business (e.g., losing business due to false negatives and taking a financial risk due to false positives).', 'The chapter emphasizes the importance of cross validation in model generalization and performance improvement. 
It highlights the significance of cross validation in ensuring the model generalizes well to the data and brings performance improvements, addressing overfitting and underfitting cases.', 'Recommender systems, widely used in platforms like Amazon, YouTube, Netflix, and Facebook, provide personalized recommendations for products, videos, movies, and friends, benefiting businesses by increasing sales and user engagement.']}, {'end': 40926.035, 'segs': [{'end': 39358.588, 'src': 'embed', 'start': 39333.338, 'weight': 0, 'content': [{'end': 39340.001, 'text': 'so from the past data I know that given these attributes of the house, what should be the ideal price of a house?', 'start': 39333.338, 'duration': 6.663}, {'end': 39343.583, 'text': "I'll use that as my training data and then build my model for future.", 'start': 39340.001, 'duration': 3.582}, {'end': 39344.223, 'text': 'so now,', 'start': 39343.943, 'duration': 0.28}, {'end': 39351.646, 'text': 'with any such similar pattern in any data which is going to be coming in future for maybe our new property which is built in some XYZ location,', 'start': 39344.223, 'duration': 7.423}, {'end': 39358.588, 'text': 'I can use the model and predict exactly, because these features are somewhere similar in that locality, the prices might be in a particular range.', 'start': 39351.646, 'duration': 6.942}], 'summary': 'Using past data to train a model for predicting future house prices based on similar attributes.', 'duration': 25.25, 'max_score': 39333.338, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s39333338.jpg'}, {'end': 39835.016, 'src': 'embed', 'start': 39805.439, 'weight': 1, 'content': [{'end': 39812.564, 'text': 'And once we have found out that the data is very good, now, after we have removed all the outliers and the missing values and so on,', 'start': 39805.439, 'duration': 7.125}, {'end': 39820.368, 'text': 'you then start to understand certain relationships, like how the given input attributes relate in some way or the other, right.', 'start': 39812.564, 'duration': 7.804}, {'end': 39827.552, 'text': 'so this is a stage where you start to prepare for any further insight building exercise or like model building exercise.', 'start': 39820.368, 'duration': 7.184}, {'end': 39835.016, 'text': "and let's say, if you build the model in this step, the immediate step is to validate it right, whether the model is really good or not,", 'start': 39827.552, 'duration': 7.464}], 'summary': 'Data preparation is crucial for model building and validation.', 'duration': 29.577, 'max_score': 39805.439, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s39805439.jpg'}, {'end': 40136.996, 'src': 'embed', 'start': 40112.54, 'weight': 3, 'content': [{'end': 40118.264, 'text': 'So it is like sometimes in most of the standard literature, probability and statistics comes together.', 'start': 40112.54, 'duration': 5.724}, {'end': 40120.926, 'text': 'It is inseparable anytime.', 'start': 40118.284, 'duration': 2.642}, {'end': 40127.591, 'text': 'So, in the Naive Bayes algorithm, which is one of the machine learning algorithms based on the Bayes Theorem,', 'start': 40121.467, 'duration': 6.124}, {'end': 40136.996, 'text': 'probability ideas are quite a lot used and there are some really niche probability concepts like the probability graph models,', 'start': 40128.226, 'duration': 8.77}], 'summary': 'Probability and statistics are inseparable in the 
Naive Bayes algorithm, which uses Bayes Theorem and niche probability concepts.', 'duration': 24.456, 'max_score': 40112.54, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s40112540.jpg'}], 'start': 39195.968, 'title': 'Recommender systems and probability calculations', 'summary': 'Covers fundamental concepts of recommender systems, including collaborative filtering approaches and handling outliers, as well as probability calculations for various scenarios such as shooting stars and random number generation using dice and coins, highlighting the practical applications of these concepts in data analysis and model building.', 'chapters': [{'end': 39257.928, 'start': 39195.968, 'title': 'Comparing users and items for recommender systems', 'summary': 'Discusses the fundamental idea behind recommender systems, focusing on comparing two users or two items, such as products, and the use cases that arise. It also outlines the famous collaborative filtering approaches for recommender systems.', 'duration': 61.96, 'highlights': ['Recommender systems aim to compare two users or two items, such as products, to determine similarity and make recommendations based on high similarity.', 'Use cases for recommender systems include recommending products on Amazon based on similarity, and suggesting connections between similar users on social media platforms like Facebook.', 'Famous examples of recommender system approaches include user-based and item-based collaborative filtering algorithms, both commonly used in recommender systems.']}, {'end': 39488.343, 'start': 39258.449, 'title': 'Linear regression in machine learning', 'summary': 'Discusses linear regression in machine learning, including its use for predicting values, the optimization of model fit, and the consideration of polynomial regression for non-linear relationships, with a focus on statistical concepts like p-values and hypothesis testing.', 'duration': 229.894, 'highlights': ['Linear regression models in machine learning can regress over given input data to predict a value, like the price of a house in a particular locality based on attributes such as number of bedrooms and area in square feet. Use of linear regression for predicting values such as house prices based on input attributes.', 'The goal is to fit a line passing as closely as possible to all the data points and minimize the error, which is the sum of all the distances, emphasizing the optimization of model fit. Emphasis on fitting a line to minimize error and optimize model fit.', 'Consideration of polynomial regression for non-linear relationships, with a focus on statistical concepts like hypothesis testing, p-values, and confidence intervals. 
Consideration of polynomial regression for non-linear relationships and its statistical underpinnings.']}, {'end': 40175.506, 'start': 39488.343, 'title': 'Recommendation algorithms and handling outliers', 'summary': 'Discusses recommendation algorithms, particularly item-based and user-based collaborative filtering, and the importance of handling outliers in data analysis and model building, including techniques such as imputation and k-means clustering algorithm.', 'duration': 687.163, 'highlights': ['The chapter discusses the two commonly used recommendation algorithms, item-based collaborative filtering (IBCF) and user-based collaborative filtering (UBCF), and their role in recommending suitable items based on user behaviors and ratings, with a specific example of recommending movies to users. ', 'The importance of handling outliers in data analysis and model building is emphasized, with examples of how outliers can mislead models, and techniques such as removing outliers based on mean plus three standard deviations or using percentile-based methods are discussed. ', 'The process of handling missing values in data, including techniques such as imputation by calculating averages within specific segments, and the caution against discarding rows with missing values when data is limited in number, is explained. ', 'The k-means clustering algorithm and the approach of determining the appropriate value of k using the Elbow Curve method, which involves plotting the number of clusters against the within-cluster sum of squares values, is detailed with a graphical representation and explanation. ', 'The association of probability concepts with machine learning algorithms, particularly the use of Bayes Theorem in Naive Bayes algorithms and probability graph models, is mentioned, with an emphasis on the importance of understanding fundamental probability concepts in interviews. ']}, {'end': 40926.035, 'start': 40176.027, 'title': 'Probability calculations and random number generation', 'summary': "Discusses probability calculations for seeing shooting stars and generating random numbers between 1 to 7 using a die, along with the probability of getting two girls in a couple's children and the probability of selecting a head after seeing 10 heads in a row while tossing a coin from a jar of 1000 coins, where one is a double-headed coin.", 'duration': 750.008, 'highlights': ['The probability of seeing a shooting star in a period of one hour is 0.6, and the probability of not seeing a shooting star in a period of one hour is 0.4. The probability of not seeing the shooting star in a period of one hour is calculated as 0.8^4, resulting in 0.4, and the probability of seeing a shooting star in a period of one hour is 1 - 0.4, which equals 0.6.', 'To generate a random number between 1 to 7 using a die, one approach is to roll the die twice, resulting in 36 possible outcomes, and then assigning values to ensure each number between 1 to 7 is equally likely to occur. By rolling the die twice, the number of outcomes increases to 36, and by strategically assigning values to these outcomes, the probabilities of generating numbers between 1 to 7 become equally likely, ensuring a fair random number generation.', 'The probability of a couple having two girl children, given that at least one of the children is a girl, is 1/3, derived from equally likely combinations of having two boys, two girls, or one of each. 
By considering all possible combinations of having two boys, two girls, or one of each, and noting that at least one child is a girl, the probability of having two girl children is calculated as 1/3.', 'The probability of selecting another head after seeing 10 heads in a row while tossing a coin from a jar of 1000 coins, where one is a double-headed coin, is 0.75. After calculating the probabilities of selecting a fair coin and getting 10 heads, and selecting the double-headed coin, the probability of selecting another head is found to be 0.75, slightly higher due to the presence of the double-headed coin.']}], 'duration': 1730.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/xiEC5oFsq2s/pics/xiEC5oFsq2s39195968.jpg', 'highlights': ['Recommender systems compare users or items to determine similarity for making recommendations.', 'Linear regression models predict values based on input data and optimize model fit.', 'Handling outliers and missing values in data analysis is crucial for model building.', 'Probability concepts are associated with machine learning algorithms and practical scenarios.']}], 'highlights': ['The chapter covers the fundamentals of data science and the roadmap to becoming a data scientist, along with salary statistics.', "Walmart is building the world's biggest private cloud capable of processing 2.5 petabytes of data every hour to gain useful insights into customers' shopping patterns.", 'Google Maps uses data science and machine learning algorithms to collect and analyze data from a multitude of reliable sources, providing real-time traffic updates to users.', 'Walmart used data analysis to place strawberry pop-tarts at the checkout before hurricanes, increasing their sales.', 'Python is the most popular language in data science for analyzing huge volumes of data (relevance score: 5)', 'Pandas library provides fast and flexible data structures for quick data analysis (relevance score: 4)', 'Knowledge of SQL, NoSQL, Hadoop, and Spark is essential for data scientists (relevance score: 3)', 'The chapter explains the characteristics of big data, including volume, variety, velocity, and veracity, which are essential for understanding the nature of big data.', 'The chapter emphasizes the necessity for data scientists to have knowledge about data storage, particularly focusing on Hadoop as an important technology for data storage in big data analytics.', 'The US Bureau of Labor Statistics estimates around 11.5 million data scientist jobs by 2026, indicating significant growth in the field due to the increasing importance of data science in decision making across organizations globally.', 'Experienced data scientists can earn over 20-30 lakhs in India and $200,000 to $500,000 in the United States per annum, with the potential for bonuses and incentives, reflecting the lucrative nature of the field.', 'Data scientists are responsible for asking the right questions, cleaning and prepping data, performing exploratory data analysis, choosing appropriate models and algorithms, checking model accuracy, making reports, and continuously adjusting and retraining models based on stakeholder feedback to guide business decisions.', 'Understanding the importance of probability and statistics as the basis for processing data, with emphasis on learning topics like probability rules, conditional probability, probability distributions, and statistics terminologies.', 'Emphasizing the significance of learning Python or R programming languages, understanding 
their basic syntax, data structures, file operations, functions, object-oriented programming, and essential libraries like NumPy, Pandas, and Matplotlib.', 'Stressing the importance of learning data visualization tools like Tableau, and understanding basic and advanced visual analytics, calculations, geographical visualizations, advanced charts, dashboards, and stories.', 'Probability sampling is crucial for inferring statistical knowledge about a population, emphasizing its three types.', 'Descriptive and inferential statistics are explained, highlighting their differences and applications in statistical analysis.', 'The information gain for the attribute outlook is 0.247, the highest among the variables, guiding the choice of the root node in decision tree analysis.', 'Introduction of confusion matrix in evaluating the performance of a classifier, highlighting its application for calculating accuracy and comparing actual and predicted results.', 'Demonstration in R on calculating mean, median, mode, variance, and standard deviation showcased the practical application of statistical concepts, aiding in a better understanding of descriptive statistics.', 'The probability of a candidate having a good package, given that they have not undergone any training, is 5 divided by 60, resulting in a probability of around 0.08. Determining the conditional probability of candidates having a good package without training.', 'Machine learning is important due to the increase in data generation, improving decision-making, uncovering patterns and trends, and solving complex problems. Machine learning is crucial due to the increase in data generation, improved decision-making, uncovering patterns, and solving complex problems.', 'Exploratory data analysis involves diving deep into the data to find hidden patterns and mysteries, akin to a brainstorming of machine learning.', 'Regression in machine learning is the construction of an efficient model to predict dependent attributes from attribute variables, with the output variable being real or a continuous value, used in applications like housing and investing.', 'Random Forest: Building a forest of decision trees to make powerful decisions based on majority outcomes, also known as bagging methodology.', 'The chapter emphasizes the importance of using Gini index and information gain to decide which feature to use first in building a decision tree.', 'Random forest provides good accuracy and outperforms other classifiers like Naive Bayes, SVM, or KNN.', 'The feature selection in random forest involves taking the square root of the total number of features for classification problems, and the total number of features divided by 3 for regression problems.', 'The random forest classifier achieved 99% accuracy by changing the criteria and number of trees, surpassing the initial accuracy of 98%.', 'KNN classification achieves 96% accuracy with 2 misclassified points', "Naive Bayes classification's training accuracy is 80% and testing accuracy is 75%", 'The iris dataset contains 50 samples of each of 3 different species of iris flower', 'The Q matrix is calculated to determine the selected path for the agent based on maximum rewards', 'Deep learning eliminates manual feature extraction, leading to efficient classification.', 'MNIST dataset contains 50,000 training images and 10,000 testing images for training and testing the model.', 'Tools have made AI and ML accessible to non-programmers, enabling people from diverse backgrounds to work with them.', 'The equal role of false positives and false 
negatives in the banking industry, particularly in loan approvals, where both types of errors can lead to financial losses, is discussed, emphasizing the impact of each type of error on the business (e.g., losing business due to false negatives and taking a financial risk due to false positives).', 'Recommender systems, widely used in platforms like Amazon, YouTube, Netflix, and Facebook, provide personalized recommendations for products, videos, movies, and friends, benefiting businesses by increasing sales and user engagement.']}
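The code sketches that follow illustrate, in Python, several of the techniques discussed in the transcript segments above; all data, column names, and numbers in them are invented for illustration and are not taken from the video. First, the selection-bias discussion: drawing a smaller sample from a very large table, once with simple random sampling and once stratified by a hypothetical 'region' column, which is one common way to keep the sample representative.

import pandas as pd

def draw_samples(population: pd.DataFrame, n: int = 1_000_000, strata_col: str = "region"):
    # Simple random sample: every record has the same chance of being picked.
    simple = population.sample(n=n, random_state=42)

    # Stratified sample: sample within each stratum in proportion to its size,
    # so the sample mirrors the population's composition on that column.
    frac = n / len(population)
    stratified = population.groupby(strata_col).sample(frac=frac, random_state=42)
    return simple, stratified

Neither approach removes bias introduced before sampling (for example, an exit poll that only reaches certain voters), which is the point the transcript makes.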
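A small sketch of the long versus wide formats mentioned above, using made-up 'store', 'sales', and 'returns' columns: wide data keeps each attribute in its own column, while long data stacks the attribute names into a single column that visualization tools can use for legends.

import pandas as pd

wide = pd.DataFrame({
    "store": ["A", "B"],
    "sales": [100, 150],
    "returns": [5, 8],
})

# Wide -> long: attribute names move into a single 'metric' column.
long_df = wide.melt(id_vars="store", var_name="metric", value_name="value")

# Long -> wide again.
wide_again = long_df.pivot(index="store", columns="metric", values="value").reset_index()
print(long_df)
print(wide_again)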
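For the confusion-matrix segment (FP, FN, and the related sensitivity discussion), a minimal sketch with invented "will the customer buy?" labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # 1 = customer bought, 0 = did not buy
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model predictions

# For labels [0, 1], scikit-learn returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate: actual buyers the model catches
specificity = tn / (tn + fp)   # true negative rate: non-buyers it correctly rules out
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, fp, fn, tn, round(sensitivity, 2), round(specificity, 2), round(accuracy, 2))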
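The overfitting versus underfitting trade-off (the smooth versus wiggly separating curve described above) can be seen by fitting polynomials of increasing degree to noisy data and comparing train and test error; the degrees and noise level here are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

# Every other point goes to train / test.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

Typically the degree-1 fit has high error everywhere (underfitting), while the high-degree fit has low train error but worse test error (overfitting).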
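For the A/B-testing segment, one hedged way to compare the old and new page designs is a two-proportion z-test on conversion counts from the two randomized groups; the counts below are invented, and the z-test is only one of several reasonable choices of metric comparison.

from math import sqrt
from scipy.stats import norm

conv_a, n_a = 480, 10_000   # control group: conversions / users
conv_b, n_b = 540, 10_000   # variant group: conversions / users

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided test
print(f"lift={p_b - p_a:.4f}  z={z:.2f}  p-value={p_value:.4f}")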
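A sketch of the k-fold cross-validation idea described above (each fold serves once as the validation set while the rest trains the model), using scikit-learn and the bundled iris data purely as a stand-in:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)   # one accuracy score per fold
print("fold accuracies:", scores.round(3), "mean:", round(float(scores.mean()), 3))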
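Logistic regression for a two-class problem, in the spirit of the defaulter versus non-defaulter example above, sketched on synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))
print("P(class = 1) for the first test row:", round(float(clf.predict_proba(X_te[:1])[0, 1]), 3))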
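A toy version of the house-price regression example; the bedroom counts, areas, and prices are invented.

import numpy as np
from sklearn.linear_model import LinearRegression

# columns: [bedrooms, area_sqft]
X = np.array([[2, 850], [3, 1200], [3, 1400], [4, 1800], [5, 2300]])
y = np.array([120_000, 180_000, 200_000, 260_000, 330_000])   # past sale prices

reg = LinearRegression().fit(X, y)
new_house = np.array([[4, 1600]])
print("predicted price:", round(float(reg.predict(new_house)[0])))
print("coefficients:", reg.coef_, "intercept:", round(float(reg.intercept_)))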
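A tiny item-based collaborative-filtering sketch for the recommender-system segments: items are compared by the cosine similarity of their rating columns, and the item closest to one the user already liked would be recommended. The ratings matrix is invented (rows are users, columns are items); real systems use far larger and sparser matrices.

import numpy as np

ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
items = ["Movie A", "Movie B", "Movie C", "Movie D"]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity of every item to "Movie A" (column 0).
sims = [cosine(ratings[:, 0], ratings[:, j]) for j in range(ratings.shape[1])]
ranked = sorted(zip(items, sims), key=lambda t: t[1], reverse=True)
print(ranked)   # after Movie A itself, Movie B comes out as the most similar item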
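For the cleaning steps discussed above (the mean plus three standard deviations outlier rule, percentile capping, and imputing missing values with a per-segment average); 'segment' and 'income' are hypothetical columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B"],
    "income":  [40_000, 42_000, np.nan, 90_000, 1_500_000, np.nan],
})

# Outlier flag: anything beyond mean + 3 standard deviations.
mu, sigma = df["income"].mean(), df["income"].std()
df["is_outlier"] = df["income"] > mu + 3 * sigma

# Alternative: cap at a high percentile instead of dropping rows.
cap = df["income"].quantile(0.99)
df["income_capped"] = df["income"].clip(upper=cap)

# Impute missing values with the average of the record's own segment.
df["income_imputed"] = df.groupby("segment")["income"].transform(lambda s: s.fillna(s.mean()))
print(df)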
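The Elbow Curve method for choosing k in k-means, as described above: fit k-means for several values of k and watch where the within-cluster sum of squares stops dropping sharply. The blob data is synthetic.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # stand-in data with 4 clusters

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  within-cluster sum of squares = {km.inertia_:.1f}")

Plotting k against the inertia values gives the elbow plot; with this synthetic data the bend should appear around k = 4.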
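Eigenvalues and eigenvectors of the covariance matrix are what PCA uses to find the directions of greatest variance; the sketch below compares a manual eigen-decomposition with scikit-learn's PCA on random data as a sanity check.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: the covariance matrix is symmetric
print("eigenvalues (largest first):", np.sort(eigvals)[::-1])

pca = PCA(n_components=5).fit(X)
print("PCA explained variance:    ", pca.explained_variance_)   # should match the eigenvalues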
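Two of the probability questions above can be checked with a few lines: turning one fair die into a uniform number from 1 to 7 by rolling twice and rejecting a single outcome, and the Bayes update for the jar of 1000 coins with one double-headed coin.

import random

def uniform_1_to_7() -> int:
    # Two rolls give 36 equally likely outcomes; keep 35 of them (5 per value 1..7)
    # and re-roll on the one rejected outcome, so all seven values stay equally likely.
    while True:
        outcome = (random.randint(1, 6) - 1) * 6 + (random.randint(1, 6) - 1)   # 0..35
        if outcome < 35:
            return outcome % 7 + 1

# Coin jar: posterior probability the drawn coin is fair after 10 heads, then P(next toss is heads).
p_fair, p_double = 999 / 1000, 1 / 1000
like_fair, like_double = 0.5 ** 10, 1.0
post_fair = (p_fair * like_fair) / (p_fair * like_fair + p_double * like_double)
p_next_head = post_fair * 0.5 + (1 - post_fair) * 1.0
print(round(p_next_head, 3))   # ~0.75, matching the figure quoted above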