title
Machine Learning with Python | Machine Learning Tutorial for Beginners | Machine Learning Tutorial

description
🔥1000+ Free Courses With Free Certificates: https://www.mygreatlearning.com/academy?ambassador_code=GLYT_DES_RnFGwxJwx-0&utm_source=GLYT&utm_campaign=GLYT_DES_RnFGwxJwx-0 🔥Build a successful career in Artificial Intelligence and Machine Learning https://www.mygreatlearning.com/pg-program-artificial-intelligence-course?ambassador_code=GLYT_DES_Middle_SEP22&utm_source=GLYT&utm_campaign=GLYT_DES_Middle_SEP54 🏁 Watch the video and attend this quiz and get a certificate of completion: https://glacad.me/3QpzCxV Machine learning is defined as the “use and development of computer systems that can learn and adapt without following explicit instructions, using algorithms and statistical models to analyze and draw inferences from patterns in data.” Great Learning’s python with Machine learning course brings you a full-fledged course inclusive of all essential topics like introduction to machine learning, machine learning algorithm, reinforcement learning, and many more. This course will enlighten you with machine learning concepts, not just conceptually but also with practicality. Learn Artificial Intelligence from leading experts and attain a Dual Certificate in AI and Machine Learning from world-renowned universities. Take a step towards your professional growth by obtaining expertise in the real-world application of the latest technological tools of AI. Over 500+ Hiring Partners & 8000+ career transitions over varied domains. 
Know More: https://glacad.me/3qSjmt0 Learn more about: ✔Data Frames with Pandas ✔Grouping Data ✔Loops and functions ✔Visualization libraries ✔Descriptive statistics ✔Dispersions and histogram 🏁 Topics Covered: 00:00:00 - Agenda 00:03:58 - Introduction to Python and Anaconda 01:07:05 - Introduction to Pandas and Data Manipulation 04:42:32 - Introduction to NumPy and Numerical Computing 05:10:58 - Data Visualization 06:06:12 - Statistics vs Machine Learning 06:12:44 - Types of Statistics 07:54:39 - Understanding Data 07:58:19 - What is Reinforcement Learning? 08:53:46 - Reinforcement Learning Framework 09:24:58 - Q-Learning 09:51:08 - Case Study on Smart Taxi 🔥Check out our free courses with free certificates: 📌 Machine Learning with Python course: https://glacad.me/3dPgTdf 📌Machine Learning with Python: https://www.mygreatlearning.com/academy/learn-for-free/courses/machine-learning-with-python?ambassador_code=GLYT_DES_Middle_SEP22&utm_source=GLYT&utm_campaign=GLYT_DES_Middle_SEP35 📌Machine Learning with AWS: https://www.mygreatlearning.com/academy/learn-for-free/courses/machine-learning-with-aws?ambassador_code=GLYT_DES_Middle_SEP22&utm_source=GLYT&utm_campaign=GLYT_DES_Middle_SEP36 📌Machine Learning Algorithms: https://www.mygreatlearning.com/academy/learn-for-free/courses/machine-learning-algorithms?ambassador_code=GLYT_DES_Middle_SEP22&utm_source=GLYT&utm_campaign=GLYT_DES_Middle_SEP37 You can check out our other full course videos: Python for Data Science: https://www.youtube.com/watch?v=edvg4eHi_Mw&t=17638s Data Science full Course: https://www.youtube.com/watch?v=u2zsY-2uZiE&t=979s Tableau Training for Beginners: https://www.youtube.com/watch?v=6mBtTNggkUk&t=2s Time Series Analysis: https://www.youtube.com/watch?v=FPM6it4v8MY&t=8433s Probability and Statistics: https://www.youtube.com/watch?v=z9siRCCElls&t=4844s Machine Learning Salary Trends in India: https://www.mygreatlearning.com/blog/machine-learning-salary-in-india/ ⚡ About Great Learning Academy: Visit 
Great Learning Academy to get access to 1000+ free courses with free certificate on Data Science, Data Analytics, Digital Marketing, Artificial Intelligence, Big Data, Cloud, Management, Cybersecurity, Software Development, and many more. These are supplemented with free projects, assignments, datasets, quizzes. You can earn a certificate of completion at the end of the course for free. ⚡ About Great Learning: With more than 5.4 Million+ learners in 170+ countries, Great Learning, a part of the BYJU'S group, is a leading global edtech company for professional and higher education offering industry-relevant programs in the blended, classroom, and purely online modes across technology, data and business domains. These programs are developed in collaboration with the top institutions like Stanford Executive Education, MIT Professional Education, The University of Texas at Austin, NUS, IIT Madras, IIT Bombay & more. SOCIAL MEDIA LINKS: 🔹 For more interesting tutorials, don't forget to subscribe to our channel: https://glacad.me/YTsubscribe 🔹 For more updates on courses and tips follow us on: ✅ Telegram: https://t.me/GreatLearningAcademy ✅ Facebook: https://www.facebook.com/GreatLearningOfficial/ ✅ LinkedIn: https://www.linkedin.com/school/great-learning/mycompany/verification/ ✅ Follow our Blog: https://glacad.me/GL_Blog

detail
{'title': 'Machine Learning with Python | Machine Learning Tutorial for Beginners | Machine Learning Tutorial', 'heatmap': [{'end': 2292.576, 'start': 1521.924, 'weight': 0.734}, {'end': 4585.18, 'start': 4199.675, 'weight': 0.77}, {'end': 6492.923, 'start': 4960.055, 'weight': 0.842}, {'end': 22151.575, 'start': 21769.198, 'weight': 0.746}], 'summary': "Covers python's dominance in machine learning and data analysis, using libraries like pandas, numpy, and matplotlib for processing and visualization. it includes case studies achieving 96-97% accuracy, data manipulation, outliers removal, reinforcement learning applications, and practical implementation of q-learning in smart taxi scenarios, providing insights into real-world machine learning challenges and techniques.", 'chapters': [{'end': 1002.511, 'segs': [{'end': 26.064, 'src': 'embed', 'start': 0.109, 'weight': 0, 'content': [{'end': 4.759, 'text': 'With Python being so easy to learn and having really powerful capabilities,', 'start': 0.109, 'duration': 4.65}, {'end': 8.888, 'text': 'it seems to be the dominant choice for new learners and industry professionals alike.', 'start': 4.759, 'duration': 4.129}, {'end': 13.696, 'text': 'Machine learning and Python are like peanut butter and jelly, an unbeatable match.', 'start': 9.793, 'duration': 3.903}, {'end': 19.459, 'text': "You don't even have to have much experience in the technical industry to learn either of these topics.", 'start': 14.436, 'duration': 5.023}, {'end': 26.064, 'text': 'With these skills in hand, many people go on to work in companies like Netflix, Goldman Sachs, and Deloitte.', 'start': 20.22, 'duration': 5.844}], 'summary': "Python's ease and power make it a dominant choice for new learners and industry professionals; leading to opportunities at companies like netflix, goldman sachs, and deloitte.", 'duration': 25.955, 'max_score': 0.109, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0109.jpg'}, {'end': 69.631, 'src': 'embed', 'start': 42.132, 'weight': 4, 'content': [{'end': 49.318, 'text': 'So we have got three amazing faculty members, experts in their own right to help you get started with this discipline in a comprehensive manner.', 'start': 42.132, 'duration': 7.186}, {'end': 57.645, 'text': 'They are Dr. Abhinanda Sarkar, one of the top 10 data science academicians in India, Dr. Narayana Dharpaneni, an IISc postdoctoral fellow,', 'start': 49.759, 'duration': 7.886}, {'end': 59.687, 'text': 'and Mr. Raghu Raman, a big data expert.', 'start': 57.645, 'duration': 2.042}, {'end': 62.268, 'text': 'Now, before we start off with the session,', 'start': 60.327, 'duration': 1.941}, {'end': 69.631, 'text': "I'd like to inform you that we will be coming up with a series of high quality tutorials on artificial intelligence, data science and so much more.", 'start': 62.268, 'duration': 7.363}], 'summary': 'Three faculty members: dr. sarkar, dr. dharpaneni, and mr. raman, to provide comprehensive guidance in data science. 
series of high-quality tutorials on ai and data science upcoming.', 'duration': 27.499, 'max_score': 42.132, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-042132.jpg'}, {'end': 115.729, 'src': 'embed', 'start': 83.318, 'weight': 6, 'content': [{'end': 85.599, 'text': 'Here are the topics we will cover in this tutorial.', 'start': 83.318, 'duration': 2.281}, {'end': 92.555, 'text': 'First, we will get you to dip your toes in some programming with an introduction to Python and Anaconda.', 'start': 87.11, 'duration': 5.445}, {'end': 100.663, 'text': "Then, we'll cover different libraries in Python, including Pandas and NumPy for data processing and manipulation.", 'start': 93.536, 'duration': 7.127}, {'end': 107.43, 'text': 'After that, we shall see how we can visualize data in Python using libraries like Matplotlib and Seaborn.', 'start': 101.884, 'duration': 5.546}, {'end': 115.729, 'text': 'We will then get into a statistical approach, understanding how statistics is different from machine learning and the different types of statistics.', 'start': 108.387, 'duration': 7.342}], 'summary': 'Introduction to python, pandas, numpy, matplotlib, seaborn, and statistics for data processing and visualization.', 'duration': 32.411, 'max_score': 83.318, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-083318.jpg'}, {'end': 579.416, 'src': 'embed', 'start': 549.77, 'weight': 1, 'content': [{'end': 552.87, 'text': 'So basically when you download and install, you will be getting Python.', 'start': 549.77, 'duration': 3.1}, {'end': 557.811, 'text': 'You will also be getting some of the libraries which are very commonly used with Python.', 'start': 553.27, 'duration': 4.541}, {'end': 563.172, 'text': 'And it is very easy to update and if you want to add any other libraries, you can do it in Anaconda.', 'start': 558.451, 'duration': 4.721}, {'end': 567.453, 'text': 'So 
Anaconda is one of the most famous distributions of Python.', 'start': 563.192, 'duration': 4.261}, {'end': 571.134, 'text': 'We also have another thing called Enthought Canopy.', 'start': 568.633, 'duration': 2.501}, {'end': 572.394, 'text': 'There is something called Canopy.', 'start': 571.214, 'duration': 1.18}, {'end': 573.514, 'text': "That's another distribution.", 'start': 572.434, 'duration': 1.08}, {'end': 579.416, 'text': "Well, Canopy is not very popular like Anaconda, but I have also, they're similar.", 'start': 574.775, 'duration': 4.641}], 'summary': 'Anaconda is a popular distribution of Python, including commonly used libraries, and easy to update and add new libraries.', 'duration': 29.646, 'max_score': 549.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0549770.jpg'}, {'end': 683.166, 'src': 'embed', 'start': 630.655, 'weight': 5, 'content': [{'end': 633.457, 'text': 'And then we will be using Python 3.', 'start': 630.655, 'duration': 2.802}, {'end': 640.481, 'text': "I don't think this requires any explanation, but we have a version called Python 2, which is an old version.", 'start': 633.457, 'duration': 7.024}, {'end': 643.363, 'text': 'Python 2,', 'start': 641.162, 'duration': 2.201}, {'end': 649.065, 'text': 'by 2012 or 2013 was officially removed from the community.', 'start': 643.363, 'duration': 5.702}, {'end': 650.845, 'text': 'Not removed, the support was removed.', 'start': 649.425, 'duration': 1.42}, {'end': 658.008, 'text': 'So Python 2 will no longer exist actually, but in some of the older projects, we still see Python 2, people writing the code.', 'start': 651.386, 'duration': 6.622}, {'end': 661.049, 'text': 'What we are using is Python 3.', 'start': 658.648, 'duration': 2.401}, {'end': 663.289, 'text': 'There is no major difference as such.', 'start': 661.049, 'duration': 2.24}, {'end': 670.572, 'text': 'some small syntax differences and performance improvements in Python 3, 
but everywhere people follow Python 3 these days right?', 'start': 663.289, 'duration': 7.283}, {'end': 674.779, 'text': 'And now, how are we going to learn Python?', 'start': 671.696, 'duration': 3.083}, {'end': 683.166, 'text': 'So one thing you need to understand is that you are going to learn Python in the aspect of data science or machine learning.', 'start': 674.839, 'duration': 8.327}], 'summary': 'Python 2 was officially removed by 2012-2013, and python 3 is widely adopted for data science and machine learning.', 'duration': 52.511, 'max_score': 630.655, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0630655.jpg'}, {'end': 867.298, 'src': 'embed', 'start': 844.178, 'weight': 2, 'content': [{'end': 851.381, 'text': 'So, and then I will be using this NumPy arrays to load that data and then do matching and classification or whatever you want to do.', 'start': 844.178, 'duration': 7.203}, {'end': 858.144, 'text': 'So NumPy arrays and all are very important when it comes to your deep learning image classification, some of those contexts.', 'start': 851.841, 'duration': 6.303}, {'end': 865.175, 'text': 'Maybe not immediately you will not work on NumPy, but a basic knowledge on NumPy will be actually helpful for you in the future.', 'start': 858.726, 'duration': 6.449}, {'end': 867.298, 'text': 'So this is one thing.', 'start': 866.537, 'duration': 0.761}], 'summary': 'Numpy arrays are crucial for deep learning image classification, providing useful knowledge for future work.', 'duration': 23.12, 'max_score': 844.178, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0844178.jpg'}, {'end': 955.394, 'src': 'embed', 'start': 908.282, 'weight': 3, 'content': [{'end': 916.585, 'text': 'but in some of the scientific calculations we need the data in the form of 4 dimensions and then you do additions and subtractions on the data.', 'start': 908.282, 'duration': 
8.303}, {'end': 924.667, 'text': 'So you need a different type of data structure to represent that your normal data structures cannot define 4 dimensions or 5 dimensions.', 'start': 917.005, 'duration': 7.662}, {'end': 926.948, 'text': 'that is when we say we will use numpy.', 'start': 924.667, 'duration': 2.281}, {'end': 927.768, 'text': 'basically right.', 'start': 926.948, 'duration': 0.82}, {'end': 934.726, 'text': 'And in your projects or upcoming sessions, these will be explained further whenever it comes to the point of NumPy.', 'start': 928.584, 'duration': 6.142}, {'end': 940.388, 'text': 'And this pandas, this is what we will be starting today our analysis with.', 'start': 936.007, 'duration': 4.381}, {'end': 941.549, 'text': "It's called pandas.", 'start': 940.628, 'duration': 0.921}, {'end': 947.991, 'text': 'So pandas is a library for sort of like labeled data analysis.', 'start': 942.789, 'duration': 5.202}, {'end': 952.093, 'text': 'So you guys are familiar with Excel, right? 
Excel sheets.', 'start': 948.451, 'duration': 3.642}, {'end': 955.394, 'text': 'So the same thing in Python is pandas actually.', 'start': 952.633, 'duration': 2.761}], 'summary': 'Numpy is used for 4d and 5d data; pandas for labeled data analysis like excel.', 'duration': 47.112, 'max_score': 908.282, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0908282.jpg'}], 'start': 0.109, 'title': 'Python and libraries for machine learning', 'summary': 'Covers the dominance of python in machine learning, introduces python prerequisite for data science and machine learning, and emphasizes the use of numpy and pandas for data processing, including a case study involving processing data from around 10-20 million rows with pandas.', 'chapters': [{'end': 497.054, 'start': 0.109, 'title': 'Python for ml & machine learning introduction', 'summary': 'Introduces the dominance of python in machine learning, the credentials of the faculty members, and the topics covered in the tutorial, highlighting the practical and comprehensive approach to learning python and machine learning.', 'duration': 496.945, 'highlights': ["Python's dominance in machine learning and its practical applications in industry are emphasized as the key choice for new learners and professionals, with a series of tutorials and a practical learning approach provided. Python being the dominant choice for new learners and industry professionals, with a series of tutorials to get started in machine learning and Python.", 'The credentials of the faculty members, including Dr. Abhinanda Sarkar, Dr. Narayana Dharpaneni, and Mr. Raghu Raman, are highlighted, showcasing their expertise and experience in the field of machine learning and big data. The credentials of the faculty members, including Dr. Abhinanda Sarkar, Dr. Narayana Dharpaneni, and Mr. 
Raghu Raman, with their expertise in machine learning and big data.', "The tutorial's coverage of various topics, including Python programming, data processing, visualization, statistics, reinforcement learning, and the credentials of the faculty members, is mentioned, emphasizing the comprehensive approach to learning Python and machine learning. The topics covered in the tutorial, including Python programming, data processing, visualization, statistics, and reinforcement learning, highlighting a comprehensive learning approach."]}, {'end': 745.313, 'start': 497.454, 'title': 'Python prerequisite and anaconda', 'summary': 'Introduces the python prerequisite for data science and machine learning, emphasizing the installation of anaconda, a distribution of python that comes with commonly used libraries and an ide for python, enabling easy installation and updates for necessary tools.', 'duration': 247.859, 'highlights': ['Anaconda is a distribution of Python that includes commonly used libraries and an IDE for Python, making it easy to update and add new libraries. Anaconda comes with Python, commonly used libraries, and an IDE for Python, making it convenient for data science and machine learning purposes.', 'The chapter emphasizes the importance of Python 3 for data science and machine learning, highlighting the usage of specific libraries such as NumPy, pandas, seaborn, and scikit-learn. Python 3 is highlighted as the version for data science and machine learning, with an emphasis on the necessity of libraries like NumPy, pandas, seaborn, and scikit-learn.', 'The chapter mentions the irrelevance of Python 2 for data science and machine learning, specifying the removal of support and the prevalence of Python 3 in current projects. 
Python 2 is deemed irrelevant for data science and machine learning, with the removal of support and the prevalence of Python 3 in current projects.', 'The chapter outlines the specific purpose of learning Python for data science and machine learning, emphasizing the utilization of libraries such as NumPy, pandas, seaborn, and scikit-learn, which are included by default in Anaconda. Learning Python for data science and machine learning is specified, highlighting the use of libraries like NumPy, pandas, seaborn, and scikit-learn, which are included by default in Anaconda.']}, {'end': 1002.511, 'start': 745.553, 'title': 'Numpy and pandas for data processing', 'summary': 'Introduces the use of numpy for representing numerical data in arrays and the importance of numpy and pandas in data processing, including 4-dimensional data representation and handling large datasets, with a case study from mercedes benz involving processing data from around 10-20 million rows with pandas.', 'duration': 256.958, 'highlights': ['NumPy is used for representing numerical data in arrays, including two-dimensional, three-dimensional, and multi-dimensional arrays, and is important for scientific calculations and manipulating data. NumPy is utilized for representing numerical data in various array dimensions, such as two-dimensional, three-dimensional, and multi-dimensional arrays, and is crucial for scientific calculations and data manipulation.', 'NumPy is essential for deep learning image classification and image processing, where data is converted to NumPy arrays for matching, classification, and processing. NumPy is crucial for deep learning image classification and image processing, as it involves converting image data into NumPy arrays for matching, classification, and processing.', 'Pandas is used for labeled data analysis and is beneficial for processing large datasets, as demonstrated by a case study involving Mercedes Benz processing data from around 10-20 million rows. 
Pandas is employed for labeled data analysis and is advantageous for processing large datasets, exemplified by a case study involving Mercedes Benz processing data from around 10-20 million rows.']}], 'duration': 1002.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-0109.jpg', 'highlights': ["Python's dominance in machine learning and its practical applications in industry are emphasized as the key choice for new learners and professionals, with a series of tutorials and a practical learning approach provided.", 'Anaconda is a distribution of Python that includes commonly used libraries and an IDE for Python, making it easy to update and add new libraries.', 'NumPy is essential for deep learning image classification and image processing, as it involves converting image data into NumPy arrays for matching, classification, and processing.', 'Pandas is employed for labeled data analysis and is advantageous for processing large datasets, exemplified by a case study involving Mercedes Benz processing data from around 10-20 million rows.', 'The credentials of the faculty members, including Dr. Abhinanda Sarkar, Dr. Narayana Dharpaneni, and Mr. 
Raghu Raman, with their expertise in machine learning and big data.', 'Python 3 is highlighted as the version for data science and machine learning, with an emphasis on the necessity of libraries like NumPy, pandas, seaborn, and scikit-learn.', 'The topics covered in the tutorial, including Python programming, data processing, visualization, statistics, and reinforcement learning, highlighting a comprehensive learning approach.', 'NumPy is utilized for representing numerical data in various array dimensions, such as two-dimensional, three-dimensional, and multi-dimensional arrays, and is crucial for scientific calculations and data manipulation.', 'Python being the dominant choice for new learners and industry professionals, with a series of tutorials to get started in machine learning and Python.', 'Python 2 is deemed irrelevant for data science and machine learning, with the removal of support and the prevalence of Python 3 in current projects.']}, {'end': 3213.16, 'segs': [{'end': 1052.601, 'src': 'embed', 'start': 1024.069, 'weight': 0, 'content': [{'end': 1027.83, 'text': 'the text data that you get, normal text will be in terabytes.', 'start': 1024.069, 'duration': 3.761}, {'end': 1029.992, 'text': 'So like more than billions of rows.', 'start': 1028.352, 'duration': 1.64}, {'end': 1031.772, 'text': "But we can't help it right?", 'start': 1030.913, 'duration': 0.859}, {'end': 1032.973, 'text': 'Because sensors will keep on.', 'start': 1031.792, 'duration': 1.181}, {'end': 1035.314, 'text': 'you know, every second, or even less than that.', 'start': 1032.973, 'duration': 2.341}, {'end': 1041.136, 'text': "they'll keep on tracking the motion of the car or what it is doing, and they have to collect it and end of the day,", 'start': 1035.314, 'duration': 5.822}, {'end': 1045.617, 'text': 'their problem is like I want to analyze it and then probably make a machine learning model.', 'start': 1041.136, 'duration': 4.481}, {'end': 1052.601, 'text': "So if I'm giving 
you such a file, for example, I'm giving you a text file or a CSV file where I'm saying that there are, you know, 100 billion rows.", 'start': 1045.978, 'duration': 6.623}], 'summary': 'Text data in terabytes, billions of rows, for machine learning model.', 'duration': 28.532, 'max_score': 1024.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-01024069.jpg'}, {'end': 1090.088, 'src': 'embed', 'start': 1059.575, 'weight': 1, 'content': [{'end': 1062.137, 'text': 'Pandas is one of the libraries which helps you to do that.', 'start': 1059.575, 'duration': 2.562}, {'end': 1065.178, 'text': 'There are other libraries also, but Pandas is most common.', 'start': 1062.537, 'duration': 2.641}, {'end': 1072.423, 'text': 'So where you work with the labeled data, you have rows and columns and then you can have selection,', 'start': 1065.719, 'duration': 6.704}, {'end': 1077.426, 'text': 'filtering, indexing, grouping, joins, anything that you normally do on a dataset.', 'start': 1072.423, 'duration': 5.003}, {'end': 1080.228, 'text': "So we'll be spending a lot of time on Pandas assets.", 'start': 1077.746, 'duration': 2.482}, {'end': 1082.069, 'text': 'So this is one place where we need to spend.', 'start': 1080.508, 'duration': 1.561}, {'end': 1086.984, 'text': 'Matplotlib, this is used for visualization.', 'start': 1083.761, 'duration': 3.223}, {'end': 1090.088, 'text': 'Like I want to draw a graph.', 'start': 1088.045, 'duration': 2.043}], 'summary': 'Pandas is commonly used for data manipulation, while Matplotlib is used for visualization.', 'duration': 30.513, 'max_score': 1059.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-01059575.jpg'}, {'end': 2292.576, 'src': 'heatmap', 'start': 1521.924, 'weight': 0.734, 'content': [{'end': 1528.55, 'text': 'So first thing is that you understand the domain because I had no clue about the healthcare domain or 
specific to cancer.', 'start': 1521.924, 'duration': 6.626}, {'end': 1533.575, 'text': 'So then you understand some aspects of the domain, like what is cancer? What causes it?', 'start': 1529.111, 'duration': 4.464}, {'end': 1535.677, 'text': 'What are the factors which influences it?', 'start': 1533.995, 'duration': 1.682}, {'end': 1538.07, 'text': 'Does age actually influences it?', 'start': 1536.488, 'duration': 1.582}, {'end': 1543.294, 'text': 'So this will take roughly around one to two months to understand the domain and the features of our data.', 'start': 1538.61, 'duration': 4.684}, {'end': 1546.658, 'text': 'Then you talk to the hospital and the doctors and try to collect the data.', 'start': 1543.775, 'duration': 2.883}, {'end': 1548.579, 'text': 'So initially you may not get enough data.', 'start': 1546.978, 'duration': 1.601}, {'end': 1555.084, 'text': 'You will just get the patient records, which will have their age, weight and statistics, and then you ask for more data.', 'start': 1548.599, 'duration': 6.485}, {'end': 1561.128, 'text': 'So the more data you get, the more features you can get from the data and that will impact your ML model.', 'start': 1555.445, 'duration': 5.683}, {'end': 1568.693, 'text': 'So actually doing this ML is not a very big deal as you are thinking about, because you have an algorithm to do this right?', 'start': 1561.589, 'duration': 7.104}, {'end': 1571.495, 'text': 'So probably in the morning you would have discussed.', 'start': 1569.054, 'duration': 2.441}, {'end': 1575.858, 'text': 'So for every machine learning problem, we have an algorithm which can solve the problem right?', 'start': 1571.915, 'duration': 3.943}, {'end': 1578.92, 'text': 'So all you need to do is call the algorithm, give the data.', 'start': 1576.198, 'duration': 2.722}, {'end': 1580.021, 'text': 'algorithm will give you the result.', 'start': 1578.92, 'duration': 1.101}, {'end': 1583.583, 'text': "It's probably 10 minutes job, really right?", 
'start': 1580.441, 'duration': 3.142}, {'end': 1585.864, 'text': "So it's not like very complicated.", 'start': 1583.663, 'duration': 2.201}, {'end': 1590.967, 'text': 'but before you call the algorithm, you should have the right data in the right format to feed to the algorithm.', 'start': 1585.864, 'duration': 5.103}, {'end': 1592.928, 'text': "Then only it's gonna give you the expected result.", 'start': 1591.287, 'duration': 1.641}, {'end': 1602.354, 'text': 'So majority of the time the data scientists are spending in understanding the data and seeing if I can collect other varieties of data,', 'start': 1593.428, 'duration': 8.926}, {'end': 1603.874, 'text': 'then what features I can extract.', 'start': 1602.354, 'duration': 1.52}, {'end': 1606.056, 'text': 'So initially we were getting only patient records.', 'start': 1604.135, 'duration': 1.921}, {'end': 1610.231, 'text': 'And then we build a model that was not very powerful.', 'start': 1607.129, 'duration': 3.102}, {'end': 1615.554, 'text': 'It was giving results, but it was not really what the center was expecting.', 'start': 1610.771, 'duration': 4.783}, {'end': 1617.875, 'text': 'And then we had further discussions.', 'start': 1616.374, 'duration': 1.501}, {'end': 1625.66, 'text': 'And then they said, we will also share some more data, which is not directly related to cancer, but some of the things which may affect.', 'start': 1618.155, 'duration': 7.505}, {'end': 1628.941, 'text': 'For example, we started looking at gene patterns and all.', 'start': 1626.02, 'duration': 2.921}, {'end': 1635.025, 'text': "For example, so there's a study which is going on, which proves that it can be hereditary.", 'start': 1629.842, 'duration': 5.183}, {'end': 1642.197, 'text': 'Cancer is not hereditary actually, but what is the effect of that? 
So then that is a whole different question.', 'start': 1636.032, 'duration': 6.165}, {'end': 1649.923, 'text': "So now you need that kind of a data where the person's hereditary details, his genome data, father, mother details and all this.", 'start': 1642.537, 'duration': 7.386}, {'end': 1657.949, 'text': "What is the history of cancer in the family? So then how do you represent that data? So this is majority of the time you're playing with the data.", 'start': 1650.383, 'duration': 7.566}, {'end': 1659.049, 'text': 'That is this EDA part.', 'start': 1657.989, 'duration': 1.06}, {'end': 1666.974, 'text': 'So once your data is finalized and you have an idea, okay, so this is what I want, calling the ML algorithm is not a big deal actually.', 'start': 1659.91, 'duration': 7.064}, {'end': 1668.114, 'text': 'Everything is already written.', 'start': 1667.114, 'duration': 1}, {'end': 1670.515, 'text': 'If I am calling a regression algorithm, it is already written.', 'start': 1668.134, 'duration': 2.381}, {'end': 1671.456, 'text': 'I am not writing anything.', 'start': 1670.555, 'duration': 0.901}, {'end': 1673.277, 'text': 'Just passing the data, I am getting the output.', 'start': 1671.756, 'duration': 1.521}, {'end': 1676.298, 'text': 'But then I validate is this correct what I am doing??', 'start': 1674.077, 'duration': 2.221}, {'end': 1682.161, 'text': 'So if that is not correct, then I need to again rebuild my data and then train my model and all.', 'start': 1676.498, 'duration': 5.663}, {'end': 1685.163, 'text': 'So that is why this EDA part is very important.', 'start': 1682.481, 'duration': 2.682}, {'end': 1693.235, 'text': 'because you have to collect the data and then look at the data and extract certain features from the data and then compare it.', 'start': 1686.089, 'duration': 7.146}, {'end': 1696.417, 'text': 'that is where this EDA becomes very useful.', 'start': 1693.235, 'duration': 3.182}, {'end': 1701.441, 'text': 'These are not the only EDA methods 
there are further methods as well which you will cover later.', 'start': 1696.698, 'duration': 4.743}, {'end': 1705.284, 'text': 'These are some of the basic EDA things that we do usually.', 'start': 1701.822, 'duration': 3.462}, {'end': 1719.994, 'text': 'So that project actually got on hold, but we went up to 96, 97% accuracy levels in that.', 'start': 1712.871, 'duration': 7.123}, {'end': 1724.236, 'text': 'So right now that model is not running, but we were able to achieve.', 'start': 1720.274, 'duration': 3.962}, {'end': 1730.118, 'text': 'I was not actively a contributor in that project, but I was helping them since that was a new project actually.', 'start': 1724.656, 'duration': 5.462}, {'end': 1733.399, 'text': 'But we were able to get a good accuracy level in that actually.', 'start': 1730.618, 'duration': 2.781}, {'end': 1738.081, 'text': 'So it depends on what kind of problem you are solving.', 'start': 1735.18, 'duration': 2.901}, {'end': 1740.502, 'text': 'Each problem is actually different for ML.', 'start': 1738.281, 'duration': 2.221}, {'end': 1743.786, 'text': 'And, like I said.', 'start': 1742.145, 'duration': 1.641}, {'end': 1745.247, 'text': 'so what we will do?', 'start': 1743.786, 'duration': 1.461}, {'end': 1752.87, 'text': 'we will just look at basics of Python a little bit and then we will go to pandas and we will do some hands-on with pandas right?', 'start': 1745.247, 'duration': 7.623}, {'end': 1756.272, 'text': 'And then NumPy and then this visualization stuff right?', 'start': 1752.97, 'duration': 3.302}, {'end': 1761.775, 'text': 'So I think, possibly I should, you know, get started with the hands-on part.', 'start': 1756.732, 'duration': 5.043}, {'end': 1770.719, 'text': 'So if you can look at your laptops, right? 
So today probably what we will do is like, you can do along with me if you want.', 'start': 1763.175, 'duration': 7.544}, {'end': 1776.681, 'text': 'maybe tomorrow I will do it myself, because it also takes time right.', 'start': 1771.639, 'duration': 5.042}, {'end': 1783.925, 'text': 'so maybe, depending on what we need to cover sometimes tomorrow, what I will do, I will just demonstrate and you can practice later also.', 'start': 1776.681, 'duration': 7.244}, {'end': 1785.565, 'text': 'that will not be a challenge.', 'start': 1783.925, 'duration': 1.64}, {'end': 1791.788, 'text': 'so if you have installed Anaconda, right, you can just search for Anaconda.', 'start': 1785.565, 'duration': 6.223}, {'end': 1800.072, 'text': 'this is see, there is something called Anaconda Navigator and you can just open it, just to make sure that things are, you know, working fine.', 'start': 1791.788, 'duration': 8.284}, {'end': 1802.98, 'text': 'Well, Jupyter is in fact not an IDE.', 'start': 1800.779, 'duration': 2.201}, {'end': 1804.24, 'text': 'Spyder is the IDE.', 'start': 1803.24, 'duration': 1}, {'end': 1805.821, 'text': "We'll be using Jupyter for sure.", 'start': 1804.44, 'duration': 1.381}, {'end': 1809.322, 'text': "I mean, we don't use Spyder at all for ML projects actually.", 'start': 1805.941, 'duration': 3.381}, {'end': 1810.582, 'text': "I'll show you.", 'start': 1810.082, 'duration': 0.5}, {'end': 1820.686, 'text': "So once you start your Anaconda and one more thing you can do is that if you're not really very much comfortable programming,", 'start': 1810.702, 'duration': 9.984}, {'end': 1826.608, 'text': "I mean you may be somehow comfortable, but if you feel like you're not really comfortable programming,", 'start': 1820.686, 'duration': 5.922}, {'end': 1829.569, 'text': "you can just take help from somebody who's sitting next to you, right?", 'start': 1826.608, 'duration': 2.961}, {'end': 1833.269, 'text': 'Yeah, so if you open Anaconda, you will see 
these icons', 'start': 1830.205, 'duration': 3.064}, {'end': 1840.057, 'text': 'And in these icons, we are really interested only in this thing called Jupyter Notebook, the second thing that you see.', 'start': 1833.769, 'duration': 6.288}, {'end': 1843.521, 'text': 'There are many things actually, we are not really using any of them.', 'start': 1840.578, 'duration': 2.943}, {'end': 1852.589, 'text': 'For most of the data science projects, Jupyter is the primary, you know, building platform for prototyping and all, at least now,', 'start': 1844.022, 'duration': 8.567}, {'end': 1854.39, 'text': 'when you begin right.', 'start': 1852.589, 'duration': 1.801}, {'end': 1859.452, 'text': "then, once you complete, let's say three months, six months and all probably you can use different tools also.", 'start': 1854.39, 'duration': 5.062}, {'end': 1862.993, 'text': 'but to get started, what you need is this thing called Jupyter.', 'start': 1859.452, 'duration': 3.541}, {'end': 1869.256, 'text': 'so either you can click on the launch button here one thing there is a launch button or even it will be available here.', 'start': 1862.993, 'duration': 6.263}, {'end': 1873.397, 'text': 'if you go to the programs, just type Jupyter somewhere, it will be there.', 'start': 1869.256, 'duration': 4.141}, {'end': 1874.638, 'text': 'yeah, so there is Jupyter Notebook.', 'start': 1873.397, 'duration': 1.241}, {'end': 1880.513, 'text': 'So whichever way, either you click on this launch or you click on this just open this thing called Jupyter.', 'start': 1875.609, 'duration': 4.904}, {'end': 1882.855, 'text': 'It will open in a browser.', 'start': 1881.874, 'duration': 0.981}, {'end': 1883.776, 'text': 'I will tell you what it is.', 'start': 1882.955, 'duration': 0.821}, {'end': 1885.377, 'text': 'It will open in a browser.', 'start': 1883.796, 'duration': 1.581}, {'end': 1892.463, 'text': 'Well, what is this, right? 
So Jupyter is actually an open source project.', 'start': 1885.977, 'duration': 6.486}, {'end': 1895.425, 'text': 'Originally it was called IPython Notebook.', 'start': 1893.083, 'duration': 2.342}, {'end': 1898.007, 'text': 'Later it was renamed as Jupyter Notebook.', 'start': 1896.026, 'duration': 1.981}, {'end': 1902.231, 'text': 'It is a browser based interactive shell for Python.', 'start': 1898.568, 'duration': 3.663}, {'end': 1909.642, 'text': 'Meaning normally, if you are writing a program, what you do is that either you will take a text pad or something,', 'start': 1903.14, 'duration': 6.502}, {'end': 1911.743, 'text': 'or you will open the command prompt and type the program.', 'start': 1909.642, 'duration': 2.101}, {'end': 1915.264, 'text': "So if you're doing something like Java or something, you will use Eclipse or something right?", 'start': 1911.923, 'duration': 3.341}, {'end': 1921.926, 'text': "In case of Python, since it is sort of like a scripting kind of a language, it's very easy to write the code.", 'start': 1916.124, 'duration': 5.802}, {'end': 1927.088, 'text': 'What Jupyter allows you to, is to create something called notebook.', 'start': 1922.506, 'duration': 4.582}, {'end': 1933.008, 'text': 'A notebook is an environment where you can type your code, run the code, see the output.', 'start': 1928.024, 'duration': 4.984}, {'end': 1936.09, 'text': 'And the advantage is that you can share it with others also.', 'start': 1933.568, 'duration': 2.522}, {'end': 1941.815, 'text': 'So just to show you an example, we will pick up some notebooks, but you guys can do this.', 'start': 1937.311, 'duration': 4.504}, {'end': 1945.618, 'text': 'You can say new, there is a new button and there is something called Python 3.', 'start': 1941.855, 'duration': 3.763}, {'end': 1951.202, 'text': 'If you click on this Python 3, ideally in a different tab, this will open.', 'start': 1945.618, 'duration': 5.584}, {'end': 1963.645, 'text': 'So what do you need to do? 
Just go here, say new Python 3 and something like this will open, right? Are you able to do this? Yeah, okay.', 'start': 1952.756, 'duration': 10.889}, {'end': 1965.286, 'text': 'So this is called a notebook.', 'start': 1964.145, 'duration': 1.141}, {'end': 1968.529, 'text': "Now, if you really want to know what this guy is doing, it's very simple.", 'start': 1965.666, 'duration': 2.863}, {'end': 1975.915, 'text': 'For example, I can say a equal to five, b equal to 10, okay.', 'start': 1968.689, 'duration': 7.226}, {'end': 1977.396, 'text': 'And I can simply say print.', 'start': 1976.055, 'duration': 1.341}, {'end': 1979.653, 'text': 'A plus B.', 'start': 1978.712, 'duration': 0.941}, {'end': 1981.414, 'text': 'I think probably you can understand the code.', 'start': 1979.653, 'duration': 1.761}, {'end': 1982.835, 'text': "It's not so difficult.", 'start': 1981.835, 'duration': 1}, {'end': 1985.778, 'text': "So let's say I'm writing a code like this, okay?", 'start': 1983.436, 'duration': 2.342}, {'end': 1991.883, 'text': "So I'm saying there is a variable A, that's five, B is 10, and I just want to print A plus B, right?", 'start': 1985.798, 'duration': 6.085}, {'end': 1993.464, 'text': 'So this is my program, imagine.', 'start': 1992.123, 'duration': 1.341}, {'end': 2004.453, 'text': 'Now, if I want to execute my program, what I can do, either I can click on this run button, there is a button run, or I can just press shift enter.', 'start': 1994.064, 'duration': 10.389}, {'end': 2008.104, 'text': 'Just run the code and show me the output here.', 'start': 2005.942, 'duration': 2.162}, {'end': 2012.366, 'text': 'This is basically what it does, right? 
So it is an interactive notebook.', 'start': 2008.224, 'duration': 4.142}, {'end': 2015.669, 'text': "You can type and click and see what you're typing or what it is going to do.", 'start': 2012.446, 'duration': 3.223}, {'end': 2020.792, 'text': "And for most of your data science or ML projects, you'll be using this tool.", 'start': 2016.169, 'duration': 4.623}, {'end': 2026.396, 'text': "Because in ML it's very important that in every step you see what's happening right?", 'start': 2021.753, 'duration': 4.643}, {'end': 2032.197, 'text': "So you don't write a full code and then run it, but rather you load the data, then see what is the data,", 'start': 2026.854, 'duration': 5.343}, {'end': 2032.918, 'text': 'filter it, see what it is.', 'start': 2032.197, 'duration': 0.721}, {'end': 2036.3, 'text': 'So the most convenient way to work that way is using Jupyter.', 'start': 2033.358, 'duration': 2.942}, {'end': 2046.426, 'text': "So we'll be using Jupyter all the way, right? Yeah, so some of you are having small difficulties in starting IPython and all, it's perfectly fine.", 'start': 2036.32, 'duration': 10.106}, {'end': 2049.928, 'text': "Even if you're not able to do it right now, it's perfectly fine, okay?", 'start': 2046.806, 'duration': 3.122}, {'end': 2053.69, 'text': "So anyway, I'll be sharing whatever I'm doing over the LMS, right?", 'start': 2049.967, 'duration': 3.723}, {'end': 2057.922, 'text': 'Now in the LMS you have a file.', 'start': 2055.02, 'duration': 2.902}, {'end': 2059.244, 'text': 'Can you look in this file?', 'start': 2058.003, 'duration': 1.241}, {'end': 2068.992, 'text': 'If you go to the LMS, let me just can you see these files in LMS?', 'start': 2060.685, 'duration': 8.307}, {'end': 2074.817, 'text': 'There is a Python overview, Python visualization, store sales, Uber driver, et cetera.', 'start': 2069.293, 'duration': 5.524}, {'end': 2081.944, 'text': 'Where in LMS, there will be a zip file called Python files.', 'start': 2077.52, 'duration': 
4.424}, {'end': 2085.458, 'text': 'Do you see a zip file called Python files?', 'start': 2083.833, 'duration': 1.625}, {'end': 2089.007, 'text': 'Can you download it and extract it?', 'start': 2087.443, 'duration': 1.564}, {'end': 2093.237, 'text': 'Inside that you will have a zip file right?', 'start': 2091.493, 'duration': 1.744}, {'end': 2095.792, 'text': 'Can you download it and extract it?', 'start': 2094.411, 'duration': 1.381}, {'end': 2100.354, 'text': 'And so, once you download it, extract it and you should see these files.', 'start': 2096.592, 'duration': 3.762}, {'end': 2104.016, 'text': 'So basically what do you need to do? Once you download them, open your Jupyter.', 'start': 2100.854, 'duration': 3.162}, {'end': 2109.358, 'text': 'So this is my Jupyter, right? And all you need to do is, let me see if it is already there.', 'start': 2104.716, 'duration': 4.642}, {'end': 2110.259, 'text': 'No, it is not there.', 'start': 2109.518, 'duration': 0.741}, {'end': 2113.36, 'text': 'All you need to do is click on this upload button.', 'start': 2111.379, 'duration': 1.981}, {'end': 2120.163, 'text': 'There is an upload button, okay? And then select the file you downloaded.', 'start': 2113.92, 'duration': 6.243}, {'end': 2121.384, 'text': 'Let me show you which one.', 'start': 2120.443, 'duration': 0.941}, {'end': 2125.577, 'text': 'Where is it? Python files.', 'start': 2123.555, 'duration': 2.022}, {'end': 2129.36, 'text': 'And then select this file called Python overview.', 'start': 2126.397, 'duration': 2.963}, {'end': 2131.441, 'text': 'Can you see? And say open.', 'start': 2129.46, 'duration': 1.981}, {'end': 2133.983, 'text': 'And click on upload.', 'start': 2133.022, 'duration': 0.961}, {'end': 2135.324, 'text': "That's it.", 'start': 2135.024, 'duration': 0.3}, {'end': 2144.11, 'text': 'So what do you need to do? 
Click on, you know, this icon called upload.', 'start': 2139.267, 'duration': 4.843}, {'end': 2149.154, 'text': 'Click on this upload and upload this Python overview file.', 'start': 2144.811, 'duration': 4.343}, {'end': 2156.641, 'text': 'Now I have seen some of the classes for some participants when they try to do this, it will not get uploaded.', 'start': 2150.796, 'duration': 5.845}, {'end': 2158.884, 'text': 'And the issue was with the browser.', 'start': 2157.242, 'duration': 1.642}, {'end': 2163.068, 'text': 'So sometimes if you are using Internet Explorer, it will not allow you to upload.', 'start': 2158.944, 'duration': 4.124}, {'end': 2169.454, 'text': "I don't know why for that reason, but and if you have uploaded it successfully, you should see it here like this on my screen.", 'start': 2163.088, 'duration': 6.366}, {'end': 2172.091, 'text': 'somewhere in this home file.', 'start': 2170.39, 'duration': 1.701}, {'end': 2176.734, 'text': 'So if you just click on this file, Python overview, it should open like this.', 'start': 2172.692, 'duration': 4.042}, {'end': 2179.456, 'text': 'You know, you should see this actually right?', 'start': 2176.934, 'duration': 2.522}, {'end': 2182.178, 'text': 'So this is the advantage of a notebook, see?', 'start': 2180.337, 'duration': 1.841}, {'end': 2186.401, 'text': 'You can type your code and you can even type the explanation, see?', 'start': 2182.658, 'duration': 3.743}, {'end': 2189.303, 'text': 'So I have typed something like code structure.', 'start': 2186.821, 'duration': 2.482}, {'end': 2190.203, 'text': 'what are you learning?', 'start': 2189.303, 'duration': 0.9}, {'end': 2191.424, 'text': 'you know all these things.', 'start': 2190.203, 'duration': 1.221}, {'end': 2193.085, 'text': 'And this is not actually code.', 'start': 2191.864, 'duration': 1.221}, {'end': 2195.767, 'text': "This is just the markup, right, that I'm doing.", 'start': 2193.365, 'duration': 2.402}, {'end': 2199.269, 'text': 'So I can even 
type these kinds of explanations and share with people.', 'start': 2196.147, 'duration': 3.122}, {'end': 2199.97, 'text': "So it's very easy.", 'start': 2199.389, 'duration': 0.581}, {'end': 2202.753, 'text': 'Right, and I want you to do one thing.', 'start': 2200.53, 'duration': 2.223}, {'end': 2205.436, 'text': 'Once you open this, go to this Cell menu.', 'start': 2202.753, 'duration': 2.683}, {'end': 2207.418, 'text': 'There is a Cell menu, can you see?', 'start': 2205.436, 'duration': 1.982}, {'end': 2209.261, 'text': 'And there is something called All Output,', 'start': 2207.418, 'duration': 1.843}, {'end': 2217.238, 'text': 'Clear, the last option, because normally, when you create a notebook, it will have some outputs already.', 'start': 2209.261, 'duration': 7.977}, {'end': 2218.419, 'text': 'So I just want you to clear it.', 'start': 2217.258, 'duration': 1.161}, {'end': 2222.481, 'text': 'So just go to this Cell menu, All Output, and say Clear.', 'start': 2219.119, 'duration': 3.362}, {'end': 2226.584, 'text': 'Now you can also insert your own cells.', 'start': 2222.802, 'duration': 3.782}, {'end': 2229.886, 'text': 'So basically how the notebook works, you can see different, different cells.', 'start': 2226.864, 'duration': 3.022}, {'end': 2232.448, 'text': 'Each cell can be executed independently.', 'start': 2230.507, 'duration': 1.941}, {'end': 2237.691, 'text': 'And you can also write your own cells if you want, right? 
And this is a code structure.', 'start': 2233.048, 'duration': 4.643}, {'end': 2242.475, 'text': 'So first we will look at some of the native data types in Python, which are important for you.', 'start': 2237.731, 'duration': 4.744}, {'end': 2249.985, 'text': 'And we will look at pandas and basic data frame attributes and common data manipulation tasks using pandas.', 'start': 2243.195, 'duration': 6.79}, {'end': 2254.913, 'text': 'Then there is loops and functions, visualization, any other miscellaneous topics, right.', 'start': 2250.406, 'duration': 4.507}, {'end': 2260.084, 'text': 'And what I want you to do first is to run this cell.', 'start': 2256.002, 'duration': 4.082}, {'end': 2263.125, 'text': 'Under this basic data types, you have a cell here.', 'start': 2260.224, 'duration': 2.901}, {'end': 2268.127, 'text': 'So either you can press shift enter, or you can click on this run button.', 'start': 2263.685, 'duration': 4.442}, {'end': 2274.49, 'text': 'And how do you know whether it is actually running? You will see this number one here, can you see? This means it has run.', 'start': 2268.968, 'duration': 5.522}, {'end': 2278.807, 'text': 'So that is the only way to identify whether that cell has executed.', 'start': 2276.025, 'duration': 2.782}, {'end': 2284.531, 'text': 'What is that cell exactly? And what is this? So basically I am just doing a small import here.', 'start': 2278.927, 'duration': 5.604}, {'end': 2289.154, 'text': 'So I am saying that from IPython import something called InteractiveShell.', 'start': 2284.851, 'duration': 4.303}, {'end': 2292.576, 'text': 'And InteractiveShell.ast_node_interactivity equal to all.', 'start': 2289.874, 'duration': 2.702}], 'summary': 'Understanding healthcare domain and collecting data took 1-2 months. data scientists spent time extracting features and validating ml models. 
achieved 96-97% accuracy in a cancer-related project.', 'duration': 770.652, 'max_score': 1521.924, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-01521924.jpg'}, {'end': 1738.081, 'src': 'embed', 'start': 1712.871, 'weight': 2, 'content': [{'end': 1719.994, 'text': 'So that project actually got on hold, but we went up to 96, 97% accuracy levels in that.', 'start': 1712.871, 'duration': 7.123}, {'end': 1724.236, 'text': 'So right now that model is not running, but we were able to achieve.', 'start': 1720.274, 'duration': 3.962}, {'end': 1730.118, 'text': 'I was not actively a contributor in that project, but I was helping them since that was a new project actually.', 'start': 1724.656, 'duration': 5.462}, {'end': 1733.399, 'text': 'But we were able to get a good accuracy level in that actually.', 'start': 1730.618, 'duration': 2.781}, {'end': 1738.081, 'text': 'So it depends on what kind of problem you are solving.', 'start': 1735.18, 'duration': 2.901}], 'summary': 'Project achieved 96-97% accuracy levels, though currently not running.', 'duration': 25.21, 'max_score': 1712.871, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-01712871.jpg'}, {'end': 2036.3, 'src': 'embed', 'start': 2008.224, 'weight': 3, 'content': [{'end': 2012.366, 'text': 'This is basically what it does, right? 
So it is an interactive notebook.', 'start': 2008.224, 'duration': 4.142}, {'end': 2015.669, 'text': "You can type and click and see what you're typing or what it is going to do.", 'start': 2012.446, 'duration': 3.223}, {'end': 2020.792, 'text': "And for most of your data science or ML projects, you'll be using this tool.", 'start': 2016.169, 'duration': 4.623}, {'end': 2026.396, 'text': "Because in ML it's very important that in every step you see what's happening right?", 'start': 2021.753, 'duration': 4.643}, {'end': 2032.197, 'text': "So you don't write a full code and then run it, but rather you load the data, then see what is the data,", 'start': 2026.854, 'duration': 5.343}, {'end': 2032.918, 'text': 'filter it, see what it is.', 'start': 2032.197, 'duration': 0.721}, {'end': 2036.3, 'text': 'So the most convenient way to work that way is using Jupyter.', 'start': 2033.358, 'duration': 2.942}], 'summary': 'Jupyter is an interactive notebook crucial for data science and ML projects, allowing users to see real-time data processing and analysis.', 'duration': 28.076, 'max_score': 2008.224, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-02008224.jpg'}, {'end': 2584.853, 'src': 'embed', 'start': 2556.908, 'weight': 4, 'content': [{'end': 2569.864, 'text': "If I say zero to three, what it'll print? 20, 30, 40, right? 
So this is one way of accessing the elements within the list.", 'start': 2556.908, 'duration': 12.956}, {'end': 2577.028, 'text': 'You always use this colon notation and say that which elements or which range you want to actually print.', 'start': 2569.944, 'duration': 7.084}, {'end': 2579.61, 'text': 'And this notation is very common in Python.', 'start': 2577.548, 'duration': 2.062}, {'end': 2584.853, 'text': "This is not only in list, even if you go to data pandas and all, we'll be using this colon.", 'start': 2579.81, 'duration': 5.043}], 'summary': 'Python uses colon notation to access list elements and ranges.', 'duration': 27.945, 'max_score': 2556.908, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-02556908.jpg'}, {'end': 2843.112, 'src': 'embed', 'start': 2810.358, 'weight': 5, 'content': [{'end': 2819.137, 'text': 'So I can say something like dict equal to, And how do you create a dictionary is with a curly braces.', 'start': 2810.358, 'duration': 8.779}, {'end': 2822.738, 'text': 'So whenever you see a curly braces, that is a dictionary.', 'start': 2819.617, 'duration': 3.121}, {'end': 2828.92, 'text': 'And what is a dictionary? 
A dictionary is a collection of key value pairs.', 'start': 2824.318, 'duration': 4.602}, {'end': 2836.481, 'text': 'For example, I can say Raghu and I am a trainer, right?', 'start': 2829.14, 'duration': 7.341}, {'end': 2843.112, 'text': 'you guys right?', 'start': 2840.31, 'duration': 2.802}], 'summary': 'A dictionary is a collection of key-value pairs, created using curly braces.', 'duration': 32.754, 'max_score': 2810.358, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-02810358.jpg'}], 'start': 1002.932, 'title': 'Car sensor data collection, data analysis, ml hands-on, and introduction to jupyter and python basics', 'summary': 'Discusses the collection of sensor data from moving cars, resulting in terabytes of text data. it also covers the importance of libraries like pandas, matplotlib, and seaborn for data analysis and visualization in python. additionally, it covers the accomplishment of achieving 96-97% accuracy in a project and introduces the usage of jupyter for data science and machine learning projects.', 'chapters': [{'end': 1041.136, 'start': 1002.932, 'title': 'Car sensor data collection', 'summary': 'Discusses the collection of sensor data from moving cars, resulting in terabytes of text data with billions of rows, due to constant tracking of car motion and activities.', 'duration': 38.204, 'highlights': ['The sensor data collected from moving cars results in terabytes of text data with billions of rows, due to continuous tracking of car motion and activities.', 'The car sensor data includes tracking of acceleration, brake, pedestrians, and other activities, resulting in a vast amount of collected data.', 'The sensor data collection occurs continuously, with data being collected every second or even less, resulting in a massive amount of collected data over time.']}, {'end': 1705.284, 'start': 1041.136, 'title': 'Data analysis and visualization in python', 'summary': 'Covers the importance of 
libraries like pandas, matplotlib, and seaborn for data analysis and visualization in python, emphasizing the need for statistical plotting and the significance of exploratory data analytics (eda) in machine learning projects.', 'duration': 664.148, 'highlights': ['Explaining the role of Pandas in handling large datasets Pandas is a crucial library for working with labeled data, allowing for selection, filtering, indexing, grouping, and joins on large datasets, such as a text or CSV file with 100 billion rows.', 'Emphasizing the importance of statistical plotting with Seaborn and Matplotlib Seaborn and Matplotlib are essential for creating statistical plots, including graphs, bar charts, pie charts, scatter diagrams, and box plots, enabling the calculation and inclusion of properties like mean, median, and standard deviation.', 'Stressing the significance of exploratory data analytics (EDA) in machine learning projects EDA plays a critical role in understanding the domain, collecting and analyzing data, and extracting features necessary for training machine learning models, with a majority of time spent on understanding and manipulating the data.']}, {'end': 2008.104, 'start': 1712.871, 'title': 'Ml hands-on with python and jupyter', 'summary': 'Covers the accomplishment of achieving 96-97% accuracy in a project, the importance of jupyter notebook in data science, and an introduction to using python and jupyter notebook for creating and running code in an interactive environment.', 'duration': 295.233, 'highlights': ['Achieved 96-97% accuracy in a project The project achieved 96-97% accuracy levels.', 'Importance of Jupyter Notebook in data science Jupyter Notebook is highlighted as the primary platform for prototyping and building for data science projects.', 'Introduction to using Python and Jupyter Notebook An introduction to using Python and Jupyter Notebook for creating and running code in an interactive environment.']}, {'end': 2475.997, 'start': 2008.224, 
'title': 'Introduction to jupyter and python basics', 'summary': 'Introduces the usage of jupyter for data science and machine learning projects, guiding on how to download and upload files, clear cell outputs, execute cells, and understand basic python data types, with a focus on lists, dictionaries, and tuples.', 'duration': 467.773, 'highlights': ['The chapter introduces the usage of Jupyter for data science and machine learning projects Jupyter is emphasized as an essential tool for data science and machine learning projects, allowing users to interactively type, click, and see the code execution, ensuring that in every step, the data can be filtered and visualized, providing a convenient way to work through ML projects.', "Guide on how to download and upload files in Jupyter Step-by-step instructions are provided for downloading and extracting files from the LMS, as well as uploading files to Jupyter using the 'upload' button and the potential issue of file upload failure in certain browsers such as Internet Explorer.", "Instructions on clearing cell outputs and executing cells in Jupyter The process of clearing cell outputs and executing cells in Jupyter is explained, emphasizing the identification of cell execution through the appearance of a number, and the usage of 'hash' for commenting in Python.", 'Explanation of basic Python data types with a focus on lists, dictionaries, and tuples The chapter provides a detailed explanation of basic Python data types including lists, dictionaries, and tuples, highlighting their importance in data manipulation tasks, and demonstrates the creation and printing of a list using Python, while also explaining the default printing feature in IPython notebook.']}, {'end': 2753.132, 'start': 2477.062, 'title': 'Accessing list elements in python', 'summary': 'Explains how to access elements in a list using index positions and colon notation, emphasizing the common use of the colon notation in python and providing examples of 
accessing individual elements and ranges within a list.', 'duration': 276.07, 'highlights': ['The colon notation is commonly used in Python to access elements within a list, allowing for the retrieval of specific elements or ranges using index positions, such as zero to two yielding 20, 30 and zero to three yielding 20, 30, 40.', 'The emphasis on accessing ranges of elements within a list is highlighted, with an explanation that it is rare to pick individual elements from a list and examples given on how to access ranges, like one colon providing 30 and all elements after it, and two colon yielding 40, 50, 60.', 'The chapter also introduces the concept of accessing individual elements within a list and challenges the audience to figure out how to print individual elements like 20 and 50, with subsequent discussion and clarification on the correct syntax and approach for achieving this.', 'The instructor encourages the audience to understand and practice accessing list elements, highlighting that it is a fundamental concept in Python and could be useful when dealing with lists containing a large number of elements.']}, {'end': 3213.16, 'start': 2753.133, 'title': 'List and dictionary in python', 'summary': 'Covers the usage of lists and dictionaries in python, highlighting the creation, accessing elements, and the structure of dictionary key-value pairs, and the ability to contain different data types, which is demonstrated through examples and explanations.', 'duration': 460.027, 'highlights': ['Dictionaries are a collection of key-value pairs, represented using curly braces, and they can contain different data types, accessed using keys, and can be used as a lookup table. Dictionaries are a collection of key-value pairs, represented using curly braces, and they can contain different data types. They can be accessed using keys, and they can be used as a lookup table for retrieving values. 
The example demonstrates the use of keys to access specific values and the ability to mix different data types within a dictionary.', 'Lists can contain different data types, but it is preferable to keep similar types of elements within the list. Lists can contain different data types, but it is preferable to keep similar types of elements within the list. The example illustrates the creation of a list with different data types and the consideration for keeping similar elements within the list.', 'The chapter also introduces the concept of iterating through a dictionary using a for loop to retrieve different values. The chapter introduces the concept of iterating through a dictionary using a for loop to retrieve different values. It emphasizes the use of a for loop to access a range of keys and obtain different values from the dictionary.']}], 'duration': 2210.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-01002932.jpg', 'highlights': ['The sensor data collected from moving cars results in terabytes of text data with billions of rows, due to continuous tracking of car motion and activities.', 'Explaining the role of Pandas in handling large datasets Pandas is a crucial library for working with labeled data, allowing for selection, filtering, indexing, grouping, and joins on large datasets, such as a text or CSV file with 100 billion rows.', 'Achieved 96-97% accuracy in a project The project achieved 96-97% accuracy levels.', 'The chapter introduces the usage of Jupyter for data science and machine learning projects Jupyter is emphasized as an essential tool for data science and machine learning projects, allowing users to interactively type, click, and see the code execution, ensuring that in every step, the data can be filtered and visualized, providing a convenient way to work through ML projects.', 'The colon notation is commonly used in Python to access elements within a list, allowing for the 
retrieval of specific elements or ranges using index positions, such as zero to two yielding 20, 30 and zero to three yielding 20, 30, 40.', 'Dictionaries are a collection of key-value pairs, represented using curly braces, and they can contain different data types, accessed using keys, and can be used as a lookup table. Dictionaries are a collection of key-value pairs, represented using curly braces, and they can contain different data types. They can be accessed using keys, and they can be used as a lookup table for retrieving values.']}, {'end': 6098.591, 'segs': [{'end': 3340.394, 'src': 'embed', 'start': 3308.745, 'weight': 0, 'content': [{'end': 3311.827, 'text': 'So you can always see there is something called append.', 'start': 3308.745, 'duration': 3.082}, {'end': 3318.512, 'text': "So I can actually do a append and let's say seven.", 'start': 3312.007, 'duration': 6.505}, {'end': 3330.912, 'text': 'So what happens here is if I do a my list, Meaning lists are mutable.', 'start': 3320.713, 'duration': 10.199}, {'end': 3333.813, 'text': 'You can add elements, you can even remove elements.', 'start': 3331.233, 'duration': 2.58}, {'end': 3336.994, 'text': 'There is a method called pop and there is a method called remove.', 'start': 3333.873, 'duration': 3.121}, {'end': 3340.394, 'text': 'Using that you can just pull the, remove the elements, you can add elements.', 'start': 3337.194, 'duration': 3.2}], 'summary': 'Lists in python are mutable, allowing addition and removal of elements.', 'duration': 31.649, 'max_score': 3308.745, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-03308745.jpg'}, {'end': 3769.637, 'src': 'embed', 'start': 3743.168, 'weight': 1, 'content': [{'end': 3749.95, 'text': 'The model and all are fine, but if you are doing things like deep learning and all, you need a lot of GPU, graphical processing unit.', 'start': 3743.168, 'duration': 6.782}, {'end': 3753.524, 'text': 'That is a 
challenge because GPUs are very costly.', 'start': 3750.782, 'duration': 2.742}, {'end': 3754.625, 'text': 'You know what is GPU right?', 'start': 3753.564, 'duration': 1.061}, {'end': 3757.487, 'text': "It's graphics card kind of thing, right?", 'start': 3754.645, 'duration': 2.842}, {'end': 3762.091, 'text': "So if you're using a laptop or something, it will not work because you will have what.", 'start': 3758.368, 'duration': 3.723}, {'end': 3764.513, 'text': "two GB or, I don't know, three GB max.", 'start': 3762.091, 'duration': 2.422}, {'end': 3769.637, 'text': 'So one possibility we explored was that you can sign up with the Google Cloud.', 'start': 3765.193, 'duration': 4.444}], 'summary': 'Deep learning requires expensive gpus; google cloud offers a solution.', 'duration': 26.469, 'max_score': 3743.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-03743168.jpg'}, {'end': 4132.225, 'src': 'embed', 'start': 4101.96, 'weight': 2, 'content': [{'end': 4105.603, 'text': 'So basically this is an actual data set which we have collected.', 'start': 4101.96, 'duration': 3.643}, {'end': 4108.005, 'text': 'So this is regarding the Uber trip data.', 'start': 4105.803, 'duration': 2.202}, {'end': 4113.089, 'text': 'So how do you explain this data? So there is a start date and end date.', 'start': 4108.786, 'duration': 4.303}, {'end': 4118.053, 'text': 'So start date is when the trip was started and end date is when it was ended.', 'start': 4113.87, 'duration': 4.183}, {'end': 4120.935, 'text': 'Most of the cases it is same date, same date trips.', 'start': 4118.634, 'duration': 2.301}, {'end': 4123.938, 'text': "Then there is a category that's always business.", 'start': 4121.876, 'duration': 2.062}, {'end': 4127.6, 'text': 'Then there is a starting point of the trip for the Uber driver.', 'start': 4124.479, 'duration': 3.121}, {'end': 4132.225, 'text': 'So these are all cities in US, right? 
New York, Fort Pierce and all.', 'start': 4127.801, 'duration': 4.424}], 'summary': 'Uber trip data analysis of US cities with common start and end dates and business category.', 'duration': 30.265, 'max_score': 4101.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-04101960.jpg'}, {'end': 4585.18, 'src': 'heatmap', 'start': 4199.675, 'weight': 0.77, 'content': [{'end': 4204.237, 'text': 'Can you see? So just click on the upload button, select the Uber drives, click upload.', 'start': 4199.675, 'duration': 4.562}, {'end': 4208.919, 'text': 'It should ideally show you on the screen of the homepage like this.', 'start': 4204.317, 'duration': 4.602}, {'end': 4214.742, 'text': 'And Python has a lot of ways to read different types of files.', 'start': 4210.54, 'duration': 4.202}, {'end': 4218.108, 'text': 'Right now, we are interested in CSV.', 'start': 4216.247, 'duration': 1.861}, {'end': 4221.891, 'text': 'You can also read Excel files and other formats of files.', 'start': 4218.248, 'duration': 3.643}, {'end': 4224.994, 'text': "So for the time being, we'll concentrate on CSV files.", 'start': 4222.351, 'duration': 2.643}, {'end': 4230.198, 'text': 'If you want to read a CSV file, all you need to do is that you can say so.', 'start': 4225.534, 'duration': 4.664}, {'end': 4231.279, 'text': 'what is this line?', 'start': 4230.198, 'duration': 1.081}, {'end': 4234.841, 'text': 'import pandas as pd?', 'start': 4231.279, 'duration': 3.562}, {'end': 4236.463, 'text': 'What do you mean by this as pd?', 'start': 4234.942, 'duration': 1.521}, {'end': 4239.938, 'text': "It's an aliasing, right?", 'start': 4238.857, 'duration': 1.081}, {'end': 4241.058, 'text': "It's an alias.", 'start': 4240.438, 'duration': 0.62}, {'end': 4243.319, 'text': "So the library that I'm importing is pandas.", 'start': 4241.218, 'duration': 2.101}, {'end': 4247.421, 'text': "I'm saying that I want to import pandas as pd.", 'start': 4244.06, 
'duration': 3.361}, {'end': 4249.822, 'text': "So it's just an alias name.", 'start': 4248.362, 'duration': 1.46}, {'end': 4252.023, 'text': 'And so pd will refer to pandas.', 'start': 4250.262, 'duration': 1.761}, {'end': 4260.948, 'text': 'If you want to read the CSV using pandas, you simply say pd.read underscore CSV and just give the name of the file.', 'start': 4252.403, 'duration': 8.545}, {'end': 4267.691, 'text': 'And then if you actually want to see the file, okay, you can simply type df.', 'start': 4261.648, 'duration': 6.043}, {'end': 4272.778, 'text': "Oh, I didn't run this, right? Sorry.", 'start': 4271.038, 'duration': 1.74}, {'end': 4277.219, 'text': "You have to first run this, right? I'll run this, then run this.", 'start': 4273.178, 'duration': 4.041}, {'end': 4281.06, 'text': 'Might take a moment to read it.', 'start': 4279.92, 'duration': 1.14}, {'end': 4290.801, 'text': 'So DF will be your file.', 'start': 4287.661, 'duration': 3.14}, {'end': 4292.562, 'text': 'I mean the variable in which you are reading it.', 'start': 4290.862, 'duration': 1.7}, {'end': 4294.322, 'text': "I don't know why it is taking this much time.", 'start': 4292.702, 'duration': 1.62}, {'end': 4296.082, 'text': 'Usually it should read very fast.', 'start': 4294.642, 'duration': 1.44}, {'end': 4299.043, 'text': 'My PC is actually very slow today.', 'start': 4297.243, 'duration': 1.8}, {'end': 4300.865, 'text': 'for some reason.', 'start': 4300.165, 'duration': 0.7}, {'end': 4303.286, 'text': 'Let me do one thing ok.', 'start': 4302.486, 'duration': 0.8}, {'end': 4324.453, 'text': 'I will just, so I will just say import this, why it is saying, it is still star for me.', 'start': 4303.346, 'duration': 21.107}, {'end': 4340.421, 'text': "Are you guys able to read it? Yes, so can you see this? So the moment you say DF, it's gonna print it.", 'start': 4324.553, 'duration': 15.868}, {'end': 4347.766, 'text': 'Now the question is, what exactly is this thing called DF? 
DF is called a data frame.', 'start': 4340.901, 'duration': 6.865}, {'end': 4351.848, 'text': 'Well, the name can be anything, it can be Raghu.', 'start': 4348.466, 'duration': 3.382}, {'end': 4354.35, 'text': "I'm saying the data type is called a data frame.", 'start': 4352.048, 'duration': 2.302}, {'end': 4364.511, 'text': 'What is a data frame? A data frame is the basic data structure we have inside pandas, which represents your data in the form of rows and columns.', 'start': 4355.438, 'duration': 9.073}, {'end': 4370.52, 'text': 'So this pretty much looks like your Excel spreadsheet, and that data structure is called a data frame.', 'start': 4365.132, 'duration': 5.388}, {'end': 4378.903, 'text': "So in order to create a data frame, what I'm doing, I'm just reading the CSV file using this method and assigning to this variable called DF.", 'start': 4371.221, 'duration': 7.682}, {'end': 4387.184, 'text': 'Now DF is my data frame, right? And if you simply type the name of the data frame, it should print the output like this.', 'start': 4379.403, 'duration': 7.781}, {'end': 4389.245, 'text': 'So you can actually see what is inside this.', 'start': 4387.524, 'duration': 1.721}, {'end': 4397.306, 'text': 'Also in Python, if you want to verify the data type of anything, you can do this.', 'start': 4391.185, 'duration': 6.121}, {'end': 4406.584, 'text': 'You can simply say, I think it should work, type of df, yeah.', 'start': 4398.127, 'duration': 8.457}, {'end': 4413.749, 'text': 'So you can just say type of df, it will say pandas core frame data frame.', 'start': 4407.064, 'duration': 6.685}, {'end': 4415.13, 'text': 'So basically this is a data frame.', 'start': 4413.789, 'duration': 1.341}, {'end': 4417.632, 'text': 'So pandas is actually built on top of NumPy.', 'start': 4415.53, 'duration': 2.102}, {'end': 4420.57, 'text': 'It is built on the NumPy library actually.', 'start': 4418.868, 'duration': 1.702}, {'end': 4422.431, 'text': 'It does not directly use 
NumPy.', 'start': 4421.01, 'duration': 1.421}, {'end': 4426.915, 'text': 'The methods are different, but the core data structures are built on top of NumPy.', 'start': 4422.752, 'duration': 4.163}, {'end': 4428.917, 'text': 'We need to use pandas actually.', 'start': 4427.376, 'duration': 1.541}, {'end': 4432.941, 'text': "NumPy doesn't have any built-in libraries to read Excel files or anything.", 'start': 4428.937, 'duration': 4.004}, {'end': 4435.563, 'text': "Also in NumPy, you don't have this labels.", 'start': 4433.381, 'duration': 2.182}, {'end': 4440.448, 'text': 'This, I print this, df.', 'start': 4436.084, 'duration': 4.364}, {'end': 4444.202, 'text': 'You have this start date, end date, category.', 'start': 4441.598, 'duration': 2.604}, {'end': 4448.167, 'text': 'Those are the column headers, right? That structure is not available in NumPy.', 'start': 4444.322, 'duration': 3.845}, {'end': 4451.392, 'text': 'So it anyway will not be able to read what is in the data.', 'start': 4448.227, 'duration': 3.165}, {'end': 4453.515, 'text': 'So in order to read, I need a Pandas.', 'start': 4451.792, 'duration': 1.723}, {'end': 4458.542, 'text': "So I'm saying that in the data frame, you can actually call it as a column name that is possible.", 'start': 4453.795, 'duration': 4.747}, {'end': 4461.086, 'text': 'But in NumPy, there is no way to do that.', 'start': 4459.124, 'duration': 1.962}, {'end': 4463.088, 'text': "It doesn't support any labeling as such.", 'start': 4461.286, 'duration': 1.802}, {'end': 4469.734, 'text': 'So now the question is that if you simply say read underscore CSV, how it is able to, you know, read it like this.', 'start': 4463.268, 'duration': 6.466}, {'end': 4472.777, 'text': 'For example, this is my column headers in Excel.', 'start': 4470.295, 'duration': 2.482}, {'end': 4474.939, 'text': 'Sorry, CSV that it is reading.', 'start': 4473.598, 'duration': 1.341}, {'end': 4479.384, 'text': 'So by default, the setting is like the first row 
will be considered as your column header.', 'start': 4475.26, 'duration': 4.124}, {'end': 4485.808, 'text': "Now, what if I don't have a column header, right? I get a CSV file, there is no column header.", 'start': 4480.404, 'duration': 5.404}, {'end': 4492.472, 'text': 'So if you Google for this pd.read_csv method, there are some arguments you can pass.', 'start': 4486.908, 'duration': 5.564}, {'end': 4504.46, 'text': 'You can say skip a line, skip the header, add a header or, and one common problem that we have when you do things like this is that,', 'start': 4493.873, 'duration': 10.587}, {'end': 4508.823, 'text': 'even though we are not very stringent about this, we are always looking for data types.', 'start': 4504.46, 'duration': 4.363}, {'end': 4517.586, 'text': "So one common problem that you're going to have is that if I read this, this is fine, I have something called a start date and end date.", 'start': 4510.059, 'duration': 7.527}, {'end': 4521.009, 'text': 'So what is this? 
This is dates.', 'start': 4518.587, 'duration': 2.422}, {'end': 4527.494, 'text': 'So normally you want to do things like I want to subtract one date from another date or I want to compare two dates.', 'start': 4521.289, 'duration': 6.205}, {'end': 4533.439, 'text': 'Now if I want to do something like that, they should be represented in the date data type.', 'start': 4527.915, 'duration': 5.524}, {'end': 4536.142, 'text': 'But the problem is that normally when it reads, this will be string.', 'start': 4533.68, 'duration': 2.462}, {'end': 4538.733, 'text': 'I will show you how to look into that.', 'start': 4537.452, 'duration': 1.281}, {'end': 4543.215, 'text': "But normally when you simply say pd.read_csv, it's gonna assume this is a string.", 'start': 4539.053, 'duration': 4.162}, {'end': 4545.315, 'text': 'Everything is a string, unless it sees some integer.', 'start': 4543.435, 'duration': 1.88}, {'end': 4548.877, 'text': 'This will be an integer or a float, but these are all strings.', 'start': 4545.676, 'duration': 3.201}, {'end': 4556.74, 'text': 'So in this pd.read_csv method, if you read about it in pandas, I will show you, there are some arguments you can pass.', 'start': 4549.297, 'duration': 7.443}, {'end': 4560.142, 'text': 'For example, I can say the third and fourth column are dates.', 'start': 4557.161, 'duration': 2.981}, {'end': 4562.083, 'text': 'Please consider them as date type.', 'start': 4560.522, 'duration': 1.561}, {'end': 4565.525, 'text': 'or skip the first line that is not the header, et cetera.', 'start': 4562.643, 'duration': 2.882}, {'end': 4569.708, 'text': 'But as of now, we are simply reading it because the header is also the same one here.', 'start': 4565.946, 'duration': 3.762}, {'end': 4575.132, 'text': 'Now, I think we will be able to see this if I do a df.', 'start': 4570.789, 'duration': 4.343}, {'end': 4579.876, 'text': 'Look at here.', 'start': 4579.436, 'duration': 0.44}, {'end': 4585.18, 'text': 'You can always do a 
df.dtypes.', 'start': 4581.857, 'duration': 3.323}], 'summary': 'The transcript discusses how to read and manipulate CSV files using pandas in Python, with an emphasis on data types and column headers.', 'duration': 385.505, 'max_score': 4199.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-04199675.jpg'}, {'end': 4769.273, 'src': 'embed', 'start': 4735.284, 'weight': 3, 'content': [{'end': 4737.004, 'text': 'my original file gets altered.', 'start': 4735.284, 'duration': 1.72}, {'end': 4741.085, 'text': "So all these changes what I'm doing, it is not going to affect my original file.", 'start': 4737.524, 'duration': 3.561}, {'end': 4742.746, 'text': 'So you could have downloaded somewhere.', 'start': 4741.485, 'duration': 1.261}, {'end': 4746.647, 'text': "So this will upload it, it's on workspace.", 'start': 4743.286, 'duration': 3.361}, {'end': 4749.307, 'text': 'And if I want, there is a method called df.to_csv.', 'start': 4747.047, 'duration': 2.26}, {'end': 4756.184, 'text': 'So if I have this data frame I created this did some manipulation.', 'start': 4752.421, 'duration': 3.763}, {'end': 4757.605, 'text': 'I want to save it back.', 'start': 4756.184, 'duration': 1.421}, {'end': 4761.207, 'text': 'I can say df.to_csv, back to the csv file.', 'start': 4757.605, 'duration': 3.602}, {'end': 4761.888, 'text': "that's also possible.", 'start': 4761.207, 'duration': 0.681}, {'end': 4769.273, 'text': 'So this point is very important because when you look at the data types, this guy is object, this guy is object.', 'start': 4763.309, 'duration': 5.964}], 'summary': 'The data frame can be manipulated without altering the original file and saved back to CSV with df.to_csv.', 'duration': 33.989, 'max_score': 4735.284, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-04735284.jpg'}, {'end': 5961.902, 'src': 'embed', 'start': 5932.409, 'weight': 4, 'content': [{'end': 5936.451, 
'text': 'So you can mention the format, month, date and all if you want.', 'start': 5932.409, 'duration': 4.042}, {'end': 5944.114, 'text': 'So there is a URL I have given here, HTTPS Python 3 library date time.', 'start': 5936.811, 'duration': 7.303}, {'end': 5952.478, 'text': 'So you can just open this to understand more about the date time functionalities that you want to use.', 'start': 5944.614, 'duration': 7.864}, {'end': 5961.902, 'text': "And also see when you're actually doing ML classes, There, this will be again repeated.", 'start': 5955.459, 'duration': 6.443}], 'summary': 'Url provided for understanding python 3 library date time functionalities.', 'duration': 29.493, 'max_score': 5932.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-05932409.jpg'}], 'start': 3213.2, 'title': 'Python, deep learning, pandas, and date time', 'summary': 'Covers python lists and tuples, differences, and demonstrations; challenges in deep learning model building, cost of gpus, and distributed computing platforms; working with pandas, analyzing uber trip data, and data verification; pandas data manipulation, data frames, and date conversion; working with date time in python, date format conversion, and pandas data frame methods.', 'chapters': [{'end': 3602.226, 'start': 3213.2, 'title': 'Python lists and tuples', 'summary': 'Covers the differences between lists and tuples in python, highlighting their mutability and immutability, along with demonstrations on accessing list elements, adding and removing elements, and installing libraries using pip and conda.', 'duration': 389.026, 'highlights': ['Lists are mutable, allowing addition and removal of elements, demonstrated by accessing elements, using the append method to add elements, and the pop and remove methods to remove elements. 
The chapter demonstrates how lists are mutable and allows addition and removal of elements, such as accessing elements, using the append method to add elements, and the pop and remove methods to remove elements.', 'Tuples are immutable, preventing modification after creation, and lack methods like append, emphasizing their immutability. The chapter explains the immutability of tuples, highlighting that they prevent modification after creation and lack methods like append, emphasizing their immutability.', 'Demonstrates the process of accessing and modifying list elements using Jupyter Notebook, showcasing the tab completion feature and methods available for editing lists. The chapter demonstrates the process of accessing and modifying list elements using Jupyter Notebook, showcasing the tab completion feature and methods available for editing lists.', "Explains the process of installing Python libraries using pip, emphasizing the command 'pip install' and the use of exclamation mark to run it from the command line within Jupyter Notebook. The chapter explains the process of installing Python libraries using pip, emphasizing the command 'pip install' and the use of exclamation mark to run it from the command line within Jupyter Notebook."]}, {'end': 4034.84, 'start': 3602.266, 'title': 'Challenges in deep learning model building', 'summary': 'Discusses the challenges faced in building deep learning models, including the need for high-cost gpus, limited free credits from cloud providers like google, and the potential of using distributed computing platforms like apache spark.', 'duration': 432.574, 'highlights': ['The challenge of obtaining high-cost GPUs for deep learning models, with a need for around 16 gigs of GPU for training, leading to the rapid depletion of free credits from cloud providers like Google. 
The need for high-cost GPUs, around 16 gigs, for deep learning models, causing rapid depletion of free credits from cloud providers like Google, which provide around 21,000 rupees worth of credits.', 'The limited free credits from cloud providers like Google and the high cost of GPU-based services, making deep learning model building a costly endeavor. The limited free credits from cloud providers like Google and the high cost of GPU-based services making deep learning model building a costly endeavor.', 'The potential of using distributed computing platforms like Apache Spark for building and training ML models, with the need to learn about different data types and slight changes in the exploratory data analysis (EDA) process. The potential of using distributed computing platforms like Apache Spark for building and training ML models, with the need to learn about different data types and slight changes in the exploratory data analysis (EDA) process.']}, {'end': 4611.228, 'start': 4036.221, 'title': 'Working with pandas and analyzing uber trip data', 'summary': 'Introduces working with pandas, analyzing uber trip data including start and end dates, locations, miles covered, and purpose of trips, and explains the process of reading and verifying the data using pandas and data frames.', 'duration': 575.007, 'highlights': ['The data set is about Uber trip data with details of start and end dates, locations, miles covered, and purpose of the trips.', 'The data set contains roughly around thousand plus lines of data.', 'The process of reading and verifying the data involves using pandas, creating a data frame, and checking the data types of each column.']}, {'end': 5475.065, 'start': 4611.288, 'title': 'Pandas data manipulation', 'summary': 'Covers key concepts of data manipulation using pandas, including aliasing for library simplification, preventing alterations to original files, converting string to date data types, understanding row indices, creating data frames 
from dictionaries, and using the pd.to_datetime method for date conversion.', 'duration': 863.777, 'highlights': ['Preventing alterations to original files: It is important to prevent changes to original files when performing data manipulation, as altering the original file could lead to loss of original data and affect the integrity of the dataset.', 'Converting string to date data types: The chapter discusses the common requirement of converting string data types to date data types in scenarios such as dealing with sensor data with timestamp columns in Unix timestamp format, emphasizing the significance of this conversion for enabling data manipulation.', "Aliasing for library simplification: The concept of aliasing an import, such as using 'pd' as an alias for 'pandas', is explained as a method for simplifying the syntax and reducing the need to repeatedly type out the full library names, enhancing code readability and efficiency.", 'Understanding row indices: The significance of row indices in Pandas is highlighted, along with the confusion it may cause for individuals accustomed to Excel, and the potential to define custom indices for improved data management.', 'Creating data frames from dictionaries: The ability to create data frames from dictionaries in Pandas is demonstrated, illustrating how keys in the dictionary become column names and values become the corresponding data, showcasing a method for manual data frame creation.']}, {'end': 6098.591, 'start': 5475.065, 'title': 'Working with date time in Python', 'summary': 'Discusses working with date time in Python, including how to convert date formats, handling multiple dates using lists, and using the describe method and value_counts method in pandas data frames.', 'duration': 623.526, 'highlights': ['The chapter discusses working with date time in Python, including how to convert date formats, handling multiple dates using lists, and using the describe method and value_counts method in pandas data frames. 
Working with date time, converting date formats, handling multiple dates using lists, using the describe method and value_counts method in pandas data frames.', 'The built-in datetime module in Python allows for easy conversion of date strings by importing the module and using the strptime function.', "When handling multiple dates, it's advantageous to keep them in a list, as it allows for easy conversion of multiple elements to date time format.", 'The describe method in pandas data frames provides a quick overview of the data set, including count, unique values, mean, standard deviation, and more.', 'The value_counts method in pandas is useful for obtaining the count of each unique value in a specific column, providing insights into the distribution of data. 
Utility of value count method in pandas data frames, obtaining count of unique values.']}], 'duration': 2885.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-03213200.jpg', 'highlights': ['Lists are mutable, allowing addition and removal of elements, demonstrated by accessing elements, using the append method to add elements, and the pop and remove methods to remove elements.', 'The challenge of obtaining high-cost GPUs for deep learning models, with a need for around 16 gigs of GPU for training, leading to the rapid depletion of free credits from cloud providers like Google.', 'The data set is about Uber trip data with details of start and end dates, locations, miles covered, and purpose of the trips.', 'Preventing alterations to original files It is important to prevent changes to original files when performing data manipulation, as altering the original file could lead to loss of original data and affect the integrity of the dataset.', 'The chapter discusses working with date time in Python, including how to convert date formats, handling multiple dates using lists, and using describe method and value count method in pandas data frames.']}, {'end': 9033.122, 'segs': [{'end': 6175.681, 'src': 'embed', 'start': 6147.062, 'weight': 0, 'content': [{'end': 6149.185, 'text': 'Mutating conditionally adding the column.', 'start': 6147.062, 'duration': 2.123}, {'end': 6155.414, 'text': 'I have all the sales revenue, I want to add a new column, the total or the mean of the sales, something like that.', 'start': 6150.006, 'duration': 5.408}, {'end': 6157.608, 'text': 'group by summarize.', 'start': 6156.607, 'duration': 1.001}, {'end': 6160.63, 'text': 'So, for example, the Uber trip data.', 'start': 6158.308, 'duration': 2.322}, {'end': 6168.276, 'text': 'I want to group by the start location and find out all the trips which are having more than 10 miles or something like that, these kinds of analysis.', 'start': 
6160.63, 'duration': 7.646}, {'end': 6175.681, 'text': 'So, if you can get a basic idea about these five tasks majority of your what you say EDA is done right?', 'start': 6168.656, 'duration': 7.025}], 'summary': 'Mutating and summarizing data for sales revenue and uber trip analysis.', 'duration': 28.619, 'max_score': 6147.062, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-06147062.jpg'}, {'end': 7525.059, 'src': 'embed', 'start': 7478.789, 'weight': 1, 'content': [{'end': 7481.37, 'text': 'actually, loc is a selection attribute.', 'start': 7478.789, 'duration': 2.581}, {'end': 7484.892, 'text': 'so when i am calling loc, i just need to mention what are the index?', 'start': 7481.37, 'duration': 3.522}, {'end': 7487.834, 'text': 'it takes only two separate positions.', 'start': 7484.892, 'duration': 2.942}, {'end': 7492.696, 'text': 'one is the row, one is the column.', 'start': 7487.834, 'duration': 4.862}, {'end': 7495.258, 'text': 'yes, internally, so it is not a function.', 'start': 7492.696, 'duration': 2.562}, {'end': 7499.56, 'text': 'if loc was a function or a method, i will see that brackets and options.', 'start': 7495.258, 'duration': 4.302}, {'end': 7500.741, 'text': 'that is not there.', 'start': 7499.56, 'duration': 1.181}, {'end': 7503.102, 'text': 'so whenever you see those brackets, that means it is a method.', 'start': 7500.741, 'duration': 2.361}, {'end': 7504.883, 'text': 'actually, this is not a method.', 'start': 7503.102, 'duration': 1.781}, {'end': 7525.059, 'text': 'So why do not you do this in class assignment?', 'start': 7522.977, 'duration': 2.082}], 'summary': "The 'loc' attribute takes two separate positions, row and column, and is not a method.", 'duration': 46.27, 'max_score': 7478.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-07478789.jpg'}, {'end': 7951.666, 'src': 'embed', 'start': 7919.985, 'weight': 3, 
'content': [{'end': 7928.927, 'text': 'For example, one of the conditions that I want to put is that I want to filter all the trips which are greater than 10 miles.', 'start': 7919.985, 'duration': 8.942}, {'end': 7932.133, 'text': 'So there is a miles column and I need to filter.', 'start': 7929.912, 'duration': 2.221}, {'end': 7941.5, 'text': "But before you run this, don't directly run this, let us see what will happen if I do this only.", 'start': 7932.314, 'duration': 9.186}, {'end': 7951.666, 'text': "So I'm just copying only one part of the code, which says in the data frame, I have a miles column.", 'start': 7945.322, 'duration': 6.344}], 'summary': 'Filter trips greater than 10 miles from data frame.', 'duration': 31.681, 'max_score': 7919.985, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-07919985.jpg'}, {'end': 8494.326, 'src': 'embed', 'start': 8454.63, 'weight': 4, 'content': [{'end': 8464.753, 'text': 'For example, I want to select the start location, find out all the rides that is, you know, starting from Cary and Mooresville.', 'start': 8454.63, 'duration': 10.123}, {'end': 8469.394, 'text': 'So if you want to, you know, look for multiple locations, you can do this.', 'start': 8464.793, 'duration': 4.601}, {'end': 8472.915, 'text': "Or let's say you want Cary and New York.", 'start': 8469.454, 'duration': 3.461}, {'end': 8491.105, 'text': 'If it is having either Cary or New York, it is going to give you the answer.', 'start': 8486.664, 'duration': 4.441}, {'end': 8494.326, 'text': 'So and if I do a, let us save it as something.', 'start': 8491.165, 'duration': 3.161}], 'summary': 'Demonstrating how to select start locations and search for multiple locations in the Uber trip data.', 'duration': 39.696, 'max_score': 8454.63, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-08454630.jpg'}], 'start': 6098.611, 'title': 'Data frame manipulation 
in pandas', 'summary': 'Covers data manipulation tasks in data frames for Uber trip data analysis, selecting and indexing data using iloc and loc in pandas, filtering and selecting data frames, and the use of pandas for data filtering, emphasizing the importance of proper syntax and practical data manipulation and visualization.', 'chapters': [{'end': 6168.276, 'start': 6098.611, 'title': 'Data manipulation in data frames', 'summary': 'Explores common data manipulation tasks in data frames, including selecting, filtering, sorting, mutating, grouping, and summarizing data in the context of Uber trip data analysis.', 'duration': 69.665, 'highlights': ['Group by and summarize: Analyzing Uber trip data by grouping start locations and finding trips with more than 10 miles.', 'Mutating (conditionally adding a column): Adding new columns, such as total sales or mean sales, based on existing data in the data frame.', 'Sorting the data: Performing sorting operations on the data frame to arrange the data in a specific order.', 'Filtering the data: Selecting values greater than or less than a certain threshold from the data frame.', 'Selecting and indexing the data: Common task of selecting one or multiple columns from the data frame.']}, {'end': 7141.409, 'start': 6168.656, 'title': 'Selecting and indexing data in pandas', 'summary': 'Covers selecting and indexing data using iloc and loc in pandas, with iloc relying on integer-position indexing, while loc is preferred for label-based indexing, resulting in easier and more practical data manipulation and visualization.', 'duration': 972.753, 'highlights': ['The importance of understanding the difference between iloc and loc, with iloc relying on integer-position indexing, while loc is preferred for label-based indexing, resulting in easier and more practical data manipulation and visualization.', 'The practical demonstration of using iloc and loc to select 
specific rows and columns from a DataFrame, demonstrating the application of iloc and loc in slicing and selecting data based on numerical indexing and label-based indexing.', 'The need to save the result of selecting data using iloc or loc as a new DataFrame, as the original DataFrame remains unaffected, highlighting the importance of understanding the implications of selecting and indexing data in Pandas.', 'The explanation of the difference between returning a series and a DataFrame when selecting a single column using iloc and loc, along with the solution of passing the selection as a list to ensure the output as a DataFrame instead of a series.']}, {'end': 7799.732, 'start': 7142.769, 'title': 'Using pandas: data frame manipulation', 'summary': 'Discusses the use of the loc and iloc indexers in pandas for data frame manipulation, highlighting the differences between using lists, tuples, and data frame methods, and emphasizing the importance of proper syntax when selecting specific rows and columns.', 'duration': 656.963, 'highlights': ['The loc indexer in Pandas accepts either a list or a tuple as an input parameter, allowing for the selection of specific columns in a data frame.', 'The ideal condition for passing parameters to the data frame is to pass a list for multiple columns and a string for single columns, as using a tuple may result in the data type being treated as a series instead of a data frame.', 'The use of the loc method for selecting specific rows and columns is emphasized, with a suggested assignment to extract the first 5 rows and specific columns from a data frame using either iloc or loc. 
The use of the loc method for selecting specific rows and columns is emphasized, with a suggested assignment to extract the first 5 rows and specific columns from a data frame using either iloc or loc.']}, {'end': 8394.757, 'start': 7799.752, 'title': 'Data frame filtering and selection', 'summary': 'Discusses data frame filtering and selection using iloc and filtering based on conditions and columns, with an emphasis on the importance of correct syntax and understanding the impact of filtering on the entire data frame.', 'duration': 595.005, 'highlights': ['Filtering the data frame based on a condition, such as miles greater than 10, using iloc and creating a new data frame with specific columns, emphasizing the importance of syntax and its impact on the entire data frame. Filtering based on a condition like miles greater than 10, creating a new data frame with specific columns.', 'Explaining the impact of syntax error when applying a condition to specific columns, and clarifying the distinction between applying a condition to all columns or specific columns. Clarifying the distinction between applying a condition to all columns or specific columns and explaining the impact of syntax error.', 'Emphasizing the need for correct syntax when selecting specific columns after applying a condition, and demonstrating the error encountered due to treating the condition as a series object. 
Emphasizing the need for correct syntax when selecting specific columns after applying a condition, demonstrating the error encountered due to treating the condition as a series object.']}, {'end': 9033.122, 'start': 8395.177, 'title': 'Pandas data filtering', 'summary': "Covers the use of pandas for data filtering, including the 'is in' operator and the importance of using brackets for multiple filter conditions, with a focus on demonstrating the application of these concepts in python's pandas library.", 'duration': 637.945, 'highlights': ["Demonstration of using the 'is in' operator for matching multiple conditions, such as selecting rides starting from Cary and Mooresville or Cary and New York. Usage of 'is in' operator for matching multiple conditions.", "Importance of using brackets for applying multiple filter conditions, and the need to use 'and' within brackets for filtering. Importance of using brackets for multiple filter conditions.", "Explanation of the difference between 'iloc' and 'loc' in pandas for filtering, with emphasis on the significance of 'loc' being name-based indexing for rows and columns. 
Difference between 'iloc' and 'loc' in pandas for filtering."]}], 'duration': 2934.511, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-06098611.jpg', 'highlights': ['Group by summarize: Analyzing Uber trip data by grouping start locations and finding trips with more than 10 miles.', 'The importance of understanding the difference between iloc and loc, with iloc selecting by integer position and loc selecting by label; iloc is not deprecated in pandas, though loc is presented as preferable for label-based indexing, resulting in easier and more practical data manipulation and visualization.', 'The loc method in Pandas accepts either a list or a tuple as an input parameter, allowing for the selection of specific columns.', 'Filtering the data frame based on a condition, such as miles greater than 10, using iloc and creating a new data frame with specific columns, emphasizing the importance of syntax and its impact on the entire data frame.', "Demonstration of using the 'is in' operator for matching multiple conditions, such as selecting rides starting from Cary and Mooresville or Cary and New York."]}, {'end': 10244.399, 'segs': [{'end': 9446.194, 'src': 'embed', 'start': 9417.281, 'weight': 0, 'content': [{'end': 9425.567, 'text': 'Right? So that is where you say in df1, I will say df1, okay, dot reset index.', 'start': 9417.281, 'duration': 8.286}, {'end': 9434.104, 'text': 'The problem is if I run this command, it will reset the index, but the original data frame will not be affected.', 'start': 9428.459, 'duration': 5.645}, {'end': 9436.806, 'text': 'Then what is the use? 
Right.', 'start': 9434.804, 'duration': 2.002}, {'end': 9439.308, 'text': 'I want df1 to be my final output.', 'start': 9437.347, 'duration': 1.961}, {'end': 9443.071, 'text': 'But if I say reset index, it will not change anything here.', 'start': 9439.749, 'duration': 3.322}, {'end': 9446.194, 'text': 'That is where you are saying in place equal to true.', 'start': 9443.272, 'duration': 2.922}], 'summary': 'To make df1 the final output, use df1.reset_index(inplace=true).', 'duration': 28.913, 'max_score': 9417.281, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-09417281.jpg'}, {'end': 9524.637, 'src': 'embed', 'start': 9477.453, 'weight': 1, 'content': [{'end': 9480.334, 'text': 'Meaning drop equal to true means it will drop this column.', 'start': 9477.453, 'duration': 2.881}, {'end': 9481.594, 'text': 'This will be a final data.', 'start': 9480.594, 'duration': 1}, {'end': 9485.919, 'text': 'Well it is a bit I mean complicated to understand but that is what is happening.', 'start': 9482.298, 'duration': 3.621}, {'end': 9491.08, 'text': 'You are resetting the index and removing the old index that is what you are doing alright.', 'start': 9486.299, 'duration': 4.781}, {'end': 9496.741, 'text': 'So you will see these things when you work sometimes and sometimes the requirement will be totally different.', 'start': 9491.48, 'duration': 5.261}, {'end': 9499.782, 'text': 'because I do not want to touch the original data frame, then it is fine.', 'start': 9496.741, 'duration': 3.041}, {'end': 9502.943, 'text': 'I save it as another data frame and work on my things right.', 'start': 9499.782, 'duration': 3.161}, {'end': 9511.068, 'text': 'df will anyway remain as it is because after filter this is my new data frame filter will not affect here this will remain there.', 'start': 9504.363, 'duration': 6.705}, {'end': 9519.995, 'text': 'Yeah, there can be many place.', 'start': 9518.854, 'duration': 1.141}, {'end': 
9524.637, 'text': 'I mean so this is where I want to change, but in many cases.', 'start': 9520.195, 'duration': 4.442}], 'summary': 'The process involves dropping a column, resetting the index, and saving as a new data frame to maintain the original data integrity.', 'duration': 47.184, 'max_score': 9477.453, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-09477453.jpg'}, {'end': 9573.355, 'src': 'embed', 'start': 9541.565, 'weight': 3, 'content': [{'end': 9546.528, 'text': 'So I will always say that either read it as a separate data frame, then I can do my activities right?', 'start': 9541.565, 'duration': 4.963}, {'end': 9549.151, 'text': 'That depends on the use case, what we are working on.', 'start': 9547.228, 'duration': 1.923}, {'end': 9556.761, 'text': 'In the point of time, I will just quickly show you some more things.', 'start': 9552.135, 'duration': 4.626}, {'end': 9561.711, 'text': 'All outputs clear.', 'start': 9560.49, 'duration': 1.221}, {'end': 9563.911, 'text': 'So there is a sorting, okay.', 'start': 9562.611, 'duration': 1.3}, {'end': 9573.355, 'text': "I don't want to spend a lot of time on sorting, but basically what you can do is that you can do a sort values by the column.", 'start': 9563.931, 'duration': 9.424}], 'summary': 'Discussing data frame activities and sorting methods.', 'duration': 31.79, 'max_score': 9541.565, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-09541565.jpg'}, {'end': 10062.206, 'src': 'embed', 'start': 10033.627, 'weight': 4, 'content': [{'end': 10039.591, 'text': 'So what if we want to do this in class assignment? 
Create a new column with the following condition.', 'start': 10033.627, 'duration': 5.964}, {'end': 10040.391, 'text': "I'll give you a clue.", 'start': 10039.631, 'duration': 0.76}, {'end': 10045.675, 'text': "I'll give you a clue but think if you can figure it out how to do this.", 'start': 10042.233, 'duration': 3.442}, {'end': 10049.143, 'text': 'I will write a clue for this.', 'start': 10047.623, 'duration': 1.52}, {'end': 10051.104, 'text': 'So the clue is very simple.', 'start': 10049.884, 'duration': 1.22}, {'end': 10053.684, 'text': 'You will use two np.where conditions.', 'start': 10051.524, 'duration': 2.16}, {'end': 10062.206, 'text': 'In the first np.where, you will say if the distance is something, you will mark it as long trip.', 'start': 10054.505, 'duration': 7.701}], 'summary': 'In-class assignment: create a new column using two np.where conditions to mark distance as long trip.', 'duration': 28.579, 'max_score': 10033.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-010033627.jpg'}], 'start': 9033.122, 'title': 'Data frame manipulation in pandas', 'summary': 'Covers resetting dataframe index, filtering and sorting data frames, adding and categorizing columns using pandas and numpy, with emphasis on methods, parameters, and use cases, ensuring clarity and consistency in data manipulation.', 'chapters': [{'end': 9315.618, 'start': 9033.122, 'title': 'Manipulating dataframe index in pandas', 'summary': 'Covers the process of resetting the index of a dataframe in pandas, highlighting the importance of in-place parameter and the potential issues and solutions, ultimately resulting in the removal of the previous index, ensuring clarity and consistency in the data manipulation.', 'duration': 282.496, 'highlights': ["By default, when resetting the index of a DataFrame in Pandas, the original data frame remains unaffected, necessitating the use of the 'in-place' parameter to change the original 
data.", "The 'in-place' parameter, when set to true, enables the reset index to change the original data frame, addressing the need to modify the DataFrame without creating a separate copy.", "The presence of the previous index column can be resolved by using the 'drop' parameter set to true, resulting in the final output without the previous index column.", 'The process of resetting the index and handling the potential issues ensures the clarity and consistency of the data manipulation, providing a comprehensive understanding of DataFrame indexing in Pandas.']}, {'end': 9496.741, 'start': 9317.999, 'title': 'Data frame index resetting', 'summary': "Explains the process of resetting the index of a data frame in python using the 'reset_index' method and the 'inplace' and 'drop' parameters, resulting in the removal of the old index and the creation of a new index.", 'duration': 178.742, 'highlights': ["The 'reset_index' method is used to reset the index of a data frame in Python. The speaker explains the process of using the 'reset_index' method to reset the index of a data frame in Python.", "The use of the 'inplace' parameter in the 'reset_index' method to change the original data frame. The speaker emphasizes the significance of the 'inplace' parameter in the 'reset_index' method, which allows for changing the original data frame.", "The explanation of the 'drop' parameter in the 'reset_index' method to remove the old index column. 
The speaker describes the functionality of the 'drop' parameter in the 'reset_index' method, which enables the removal of the old index column from the data frame."]}, {'end': 9643.064, 'start': 9496.741, 'title': 'Filtering and sorting data frames', 'summary': 'Discusses the importance of saving a filtered data frame as a new data frame to avoid affecting the original one, along with the process of sorting data frames by columns and the default ascending order, emphasizing the use case of multiple purposes for a data frame and demonstrating the sorting process with an example of sorting by multiple columns.', 'duration': 146.323, 'highlights': ['The importance of saving a filtered data frame as a new data frame to avoid affecting the original one It is important to save a filtered data frame as a new data frame to work on without affecting the original, demonstrating a cautious approach to data manipulation.', "Process of sorting data frames by columns and the default ascending order Explains the process of sorting data frames by columns, including the default ascending order, and demonstrates the use of 'sort_values' and 'ascending' parameters for sorting.", 'Use case of multiple purposes for a data frame and demonstration of sorting process with an example of sorting by multiple columns Emphasizes the use case of a data frame serving multiple purposes, and demonstrates sorting by multiple columns, showcasing the ability to sort by one column and then another, providing insight into the concept of sorting by multiple criteria.']}, {'end': 10244.399, 'start': 9643.064, 'title': 'Adding and categorizing columns with numpy', 'summary': "Discusses the process of adding a column using numpy's np.where method and categorizing trips based on distance, with examples and insights into the use and limitations of numpy arrays and methods.", 'duration': 601.335, 'highlights': ['The np.where method in NumPy is used to add a column based on conditions, such as categorizing 
trips by distance into short, medium, and long trips, with examples and insights provided. (Relevance: 5)', "The process of adding a column using NumPy's np.where method is demonstrated, with a focus on categorizing trips based on distance into short, medium, and long trips, allowing for conditional column addition. (Relevance: 4)", 'Insights are provided on the limitations of using NumPy arrays for adding columns, such as the need to pass multiple values for unique additions, and the replication of a single value for all rows, shedding light on its practical use cases. (Relevance: 3)', 'The limitations of using NumPy arrays for adding columns are highlighted, including the need to pass multiple values for unique additions and the replication of a single value for all rows, providing practical insights. (Relevance: 2)', 'The discussion also touches on the limitations of using NumPy arrays for adding columns, highlighting the need to pass multiple values for unique additions and the replication of a single value for all rows, with insights into its practical use cases. 
(Relevance: 1)']}], 'duration': 1211.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-09033122.jpg', 'highlights': ["The 'reset_index' method in Pandas changes the original data frame when 'in-place' parameter is set to true.", "Using the 'drop' parameter in 'reset_index' method removes the previous index column for clarity and consistency.", 'Saving a filtered data frame as a new data frame is important to avoid affecting the original one, demonstrating a cautious approach to data manipulation.', "Sorting data frames by columns and the default ascending order is explained, showcasing the use of 'sort_values' and 'ascending' parameters for sorting.", 'The np.where method in NumPy is used to add a column based on conditions, such as categorizing trips by distance into short, medium, and long trips, with examples and insights provided.']}, {'end': 12422.479, 'segs': [{'end': 10274.276, 'src': 'embed', 'start': 10245.947, 'weight': 0, 'content': [{'end': 10248.428, 'text': "It's fine, right? 
I don't know, I just wrote.", 'start': 10245.947, 'duration': 2.481}, {'end': 10251.65, 'text': 'You guys also check your output is correct then comparing with me.', 'start': 10248.708, 'duration': 2.942}, {'end': 10253.311, 'text': "Don't believe me whatever I'm writing.", 'start': 10251.91, 'duration': 1.401}, {'end': 10258.253, 'text': 'So Wes McKinney is the person who created DataFrames and Pandas, actually.', 'start': 10253.691, 'duration': 4.562}, {'end': 10264.156, 'text': 'And this book is very good if you are looking for some reference to read, actually, and very good book.', 'start': 10258.653, 'duration': 5.503}, {'end': 10266.657, 'text': 'So you can probably get a copy or something.', 'start': 10264.196, 'duration': 2.461}, {'end': 10274.276, 'text': 'and somebody was asking how to learn python right, and I gave couple of things.', 'start': 10268.268, 'duration': 6.008}], 'summary': 'Wes mckinney created dataframes and pandas, recommended a good book for reference, and provided advice on learning python.', 'duration': 28.329, 'max_score': 10245.947, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-010245947.jpg'}, {'end': 10371.083, 'src': 'embed', 'start': 10341.96, 'weight': 1, 'content': [{'end': 10345.162, 'text': 'So that part you may find it a bit difficult to digest.', 'start': 10341.96, 'duration': 3.202}, {'end': 10349.524, 'text': 'How to statically create the data, right? 
Because the real applications are different.', 'start': 10345.782, 'duration': 3.742}, {'end': 10351.745, 'text': 'But you need to spend some time on NumPy.', 'start': 10349.984, 'duration': 1.761}, {'end': 10353.946, 'text': "So probably an hour we'll spend on NumPy.", 'start': 10352.145, 'duration': 1.801}, {'end': 10357.088, 'text': 'And then we will spend some time on visualization.', 'start': 10354.927, 'duration': 2.161}, {'end': 10360.056, 'text': 'Seaborn and Matplotlib.', 'start': 10358.836, 'duration': 1.22}, {'end': 10364.699, 'text': "Visualizations, even if you can understand four or five types, that's enough.", 'start': 10360.617, 'duration': 4.082}, {'end': 10371.083, 'text': "There are a lot of in-depth visualizations, but even if you understand the basic four or five types, that's enough for you.", 'start': 10365.76, 'duration': 5.323}], 'summary': 'Spend 1 hour on NumPy and learn 4-5 visualization types.', 'duration': 29.123, 'max_score': 10341.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-010341960.jpg'}, {'end': 11422.397, 'src': 'embed', 'start': 11388.03, 'weight': 2, 'content': [{'end': 11393.454, 'text': 'So, like in every class, the assignment is more complicated than my explanation usually, right?', 'start': 11388.03, 'duration': 5.424}, {'end': 11396.876, 'text': "But before you try this, let's try to understand the problem right?", 'start': 11393.894, 'duration': 2.982}, {'end': 11398.237, 'text': 'So what do you want to find?', 'start': 11397.256, 'duration': 0.981}, {'end': 11407.883, 'text': 'The most recent and the earliest travel date and mean distance travel for each start city.', 'start': 11399.198, 'duration': 8.685}, {'end': 11409.845, 'text': 'So let us break this down.', 'start': 11408.204, 'duration': 1.641}, {'end': 11422.397, 'text': 'What will be the grouping column first? So you have to say group by start, correct? 
And then, so this is first level.', 'start': 11411.613, 'duration': 10.784}], 'summary': 'Find the earliest and most recent travel dates with mean distance for each start city.', 'duration': 34.367, 'max_score': 11388.03, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-011388030.jpg'}, {'end': 11558.083, 'src': 'embed', 'start': 11528.459, 'weight': 3, 'content': [{'end': 11530.02, 'text': 'So this is how you break down the problem.', 'start': 11528.459, 'duration': 1.561}, {'end': 11532.38, 'text': 'First, you want to find out what you want to group.', 'start': 11530.16, 'duration': 2.22}, {'end': 11533.861, 'text': 'So that is for sure, start.', 'start': 11532.82, 'duration': 1.041}, {'end': 11535.281, 'text': 'I want to from each city.', 'start': 11534.181, 'duration': 1.1}, {'end': 11536.802, 'text': 'I want to find something right?', 'start': 11535.281, 'duration': 1.521}, {'end': 11539.843, 'text': 'Find on what column right?', 'start': 11537.422, 'duration': 2.421}, {'end': 11543.204, 'text': 'So on date column and miles column right?', 'start': 11540.263, 'duration': 2.941}, {'end': 11545.185, 'text': 'On the miles column I want to apply mean.', 'start': 11543.484, 'duration': 1.701}, {'end': 11546.405, 'text': 'On the date column, this.', 'start': 11545.405, 'duration': 1}, {'end': 11547.625, 'text': "But don't try this.", 'start': 11546.505, 'duration': 1.12}, {'end': 11548.586, 'text': "Don't try this.", 'start': 11548.026, 'duration': 0.56}, {'end': 11549.126, 'text': 'There is a trick.', 'start': 11548.606, 'duration': 0.52}, {'end': 11552.227, 'text': 'This may not work if you write in any ways.', 'start': 11550.206, 'duration': 2.021}, {'end': 11554.748, 'text': 'Why? 
I mean, the logic is correct.', 'start': 11552.347, 'duration': 2.401}, {'end': 11558.083, 'text': 'But you need to do one small thing.', 'start': 11555.762, 'duration': 2.321}], 'summary': 'Group data by city, calculate mean miles, and apply date logic.', 'duration': 29.624, 'max_score': 11528.459, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-011528459.jpg'}, {'end': 12408.025, 'src': 'embed', 'start': 12381.966, 'weight': 4, 'content': [{'end': 12386.909, 'text': 'I will give you some assignments on this further than this, so that you can understand more about grouping.', 'start': 12381.966, 'duration': 4.943}, {'end': 12388.01, 'text': 'so this is just an example.', 'start': 12386.909, 'duration': 1.101}, {'end': 12393.794, 'text': 'but some more simple assignments on different columns, and tomorrow you will have a practice session,', 'start': 12388.01, 'duration': 5.784}, {'end': 12398.858, 'text': "so where you will have around 3-4 hours to spend only on these things again, right, so it's not over.", 'start': 12393.794, 'duration': 5.064}, {'end': 12405.803, 'text': 'like after we finish today the class is not over you will still have some more time to brush up and some more assignments and all this.', 'start': 12398.858, 'duration': 6.945}, {'end': 12408.025, 'text': 'I just want to introduce that these kind of things are possible.', 'start': 12405.803, 'duration': 2.222}], 'summary': 'Introduction to more assignments and practice sessions for understanding grouping with around 3-4 hours allotted.', 'duration': 26.059, 'max_score': 12381.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-012381966.jpg'}], 'start': 10245.947, 'title': 'Python data analysis and pandas grouping', 'summary': 'Covers learning python for data analysis, including topics such as pandas, numpy, visualization, data extraction methods, and the process of grouping and 
aggregating data using the pandas library to calculate mean, min, and max values for specific columns, with considerations for different versions of pandas.', 'chapters': [{'end': 10622.615, 'start': 10245.947, 'title': 'Learning python data analysis', 'summary': 'Discusses learning python for data analysis, including the timeline for learning, resources available, and topics to be covered, such as pandas, numpy, visualization, data extraction methods, grouping and applying operations in dataframes, and the concept of split, apply, combine in statistical terms.', 'duration': 376.668, 'highlights': ['Wes McKinney is the creator of DataFrames and Pandas, and the book recommended for reference in learning Python for data analysis. This highlights the importance of the creator of Pandas and DataFrames, as well as recommends a book for reference in learning Python for data analysis.', 'The timeline for learning Python for data analysis is discussed, including the availability of two months for statistics, followed by the start of machine learning in April. This provides a timeline for learning Python for data analysis, indicating two months for statistics and the start of machine learning in April.', 'Planned topics to be covered include Pandas, NumPy, visualization with Seaborn and Matplotlib, and possibly data extraction methods from an API. This discusses the planned topics to be covered, including Pandas, NumPy, visualization using Seaborn and Matplotlib, and potential data extraction methods from an API.', 'The concept of grouping and applying operations in DataFrames is explained, with a focus on the split, apply, combine process in statistical terms. 
This details the explanation of grouping and applying operations in DataFrames, emphasizing the split, apply, combine process in statistical terms.', 'The process of grouping data is demonstrated, including selecting a column to group, applying a function, such as finding the average distance traveled, and the logic of split, apply, combine. This highlights the demonstration of grouping data, involving selecting a column to group, applying a function, and explaining the logic of split, apply, combine.']}, {'end': 11121.894, 'start': 10623.555, 'title': 'Pandas data grouping and aggregation', 'summary': "Demonstrates the process of grouping and aggregating data using the pandas library, highlighting the use of 'group by' and 'agg' functions to calculate mean, min, and max values for specific columns, as well as the ability to group by multiple columns for simultaneous analysis.", 'duration': 498.339, 'highlights': ["The chapter explains the process of grouping data by a specific column using the 'group by' function, and then calculating the average distance traveled using the 'AGG' function with 'mean', showcasing the general method for data aggregation. Use of 'group by' and 'AGG' functions to calculate the average distance traveled.", "It discusses the option of using shorthand notations such as directly using 'mean' instead of 'AGG mean' for quick data analysis, with the example of grouping by 'start' and directly applying the 'mean' function to achieve the same output. Explanation of shorthand notation for quick data analysis.", "The chapter explains the capability of grouping data by multiple columns, illustrating the process of grouping by both 'start' and 'stop' columns and calculating the mean and sum simultaneously using the 'AGG' function. 
Demonstration of grouping by multiple columns and simultaneous calculation of mean and sum."]}, {'end': 11286.34, 'start': 11122.074, 'title': 'Grouping and calculating data in pandas', 'summary': 'Discusses grouping and calculating data in pandas, including the process of reshaping columns and explaining how to specify columns for mean and sum calculation, with considerations for different versions of pandas.', 'duration': 164.266, 'highlights': ['The process involves grouping by starting and stopping, then calculating mean and sum for each group. The speaker explains the process of grouping by starting and stopping, then calculating the mean and sum for each group, providing an insight into the data manipulation process.', 'The importance of reshaping columns to present data in a clear format is emphasized. Emphasis is placed on the importance of reshaping columns to present data in a clear format, highlighting the significance of data visualization and presentation.', 'Considerations for specifying columns for mean and sum calculation are discussed, noting the default behavior for single and multiple columns in different versions of Pandas. 
The discussion delves into considerations for specifying columns for mean and sum calculation, highlighting the default behavior for single and multiple columns in different versions of Pandas, providing a comprehensive understanding of the process.']}, {'end': 11628.126, 'start': 11286.76, 'title': 'Grouping and aggregation in python', 'summary': 'Discusses the process of grouping and aggregation in python, focusing on finding the most recent and earliest travel date and mean distance traveled for each start city, and addresses the potential issues with date formatting and logic.', 'duration': 341.366, 'highlights': ['The chapter discusses the process of grouping and aggregation in Python, focusing on finding the most recent and earliest travel date and mean distance traveled for each start city The main focus of the chapter is on grouping and aggregation in Python, particularly on finding the most recent and earliest travel date and mean distance traveled for each start city, addressing the logic and potential date formatting issues.', 'The process involves breaking down the problem into defining the grouping column (start city), specifying aggregation columns (date and miles), and determining the aggregation functions (mean, min, and max) for the date and miles columns. The process involves breaking down the problem into defining the grouping column (start city), specifying aggregation columns (date and miles), and determining the aggregation functions (mean, min, and max) for the date and miles columns to address the specific data analysis requirements.', 'The chapter also highlights the potential issue with date formatting, emphasizing the need to convert the date column to the appropriate format to facilitate accurate analysis. 
The chapter also emphasizes the potential issue with date formatting, stressing the need to convert the date column to the appropriate format to ensure accurate analysis and address potential errors in the data.']}, {'end': 11927.644, 'start': 11628.406, 'title': 'Data frame date conversion and row removal', 'summary': 'Discusses converting a date column in a data frame to the date format, removing the last row, and addressing potential issues with iloc and group by in pandas.', 'duration': 299.238, 'highlights': ['The first task involves removing the last row from the data frame by using df.iloc[:-1] and assigning it back to df, resulting in the removal of the last row.', "Converting a specified date column to the datetime format using pd.to_datetime(df['Date']) can be hindered by potential issues with iloc, potentially requiring the use of 'loc' instead.", "Potential issues with iloc in recent versions of pandas can be addressed by using 'loc' instead; note that iloc itself has not been removed from pandas.", "The chapter also touches on potential problems in using iloc or 'loc' for group by operations in pandas, specifically related to grouping by the 'Start' column."]}, {'end': 12422.479, 'start': 11928.144, 'title': 'Applying aggregation functions with multiple conditions', 'summary': 'Discusses the challenges of applying aggregation functions with multiple conditions in pandas, emphasizing the use of dictionaries to represent the different operations for each column.', 'duration': 494.335, 'highlights': ['The use of dictionaries is emphasized for representing different operations for each column when applying aggregation functions with multiple conditions in Pandas.', 'The process of grouping by a column and applying aggregation functions is explained, highlighting the use of dictionaries to specify different operations for each column.', 'The instructor plans to provide further assignments and a practice session to help the students understand more about grouping in 
Pandas, indicating an ongoing learning process.']}], 'duration': 2176.532, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-010245947.jpg', 'highlights': ['Wes McKinney is the creator of DataFrames and Pandas, and the book recommended for reference in learning Python for data analysis.', 'Planned topics to be covered include Pandas, NumPy, visualization with Seaborn and Matplotlib, and possibly data extraction methods from an API.', 'The chapter discusses the process of grouping and aggregation in Python, focusing on finding the most recent and earliest travel date and mean distance traveled for each start city.', 'The process involves breaking down the problem into defining the grouping column (start city), specifying aggregation columns (date and miles), and determining the aggregation functions (mean, min, and max) for the date and miles columns.', 'The instructor plans to provide further assignments and a practice session to help the students understand more about grouping in Pandas, indicating an ongoing learning process.']}, {'end': 13949.038, 'segs': [{'end': 12534.708, 'src': 'embed', 'start': 12508.78, 'weight': 1, 'content': [{'end': 12515.842, 'text': 'Yeah, so here also if you look at the start date, it is unique, right? There is no duplicate.', 'start': 12508.78, 'duration': 7.062}, {'end': 12518.863, 'text': 'So, you are grouping by what column? Start.', 'start': 12516.422, 'duration': 2.441}, {'end': 12519.963, 'text': 'So, it will pick up.', 'start': 12519.403, 'duration': 0.56}, {'end': 12522.444, 'text': 'So, in Agnew, there are like 11 trips.', 'start': 12520.123, 'duration': 2.321}, {'end': 12526.285, 'text': 'So, Agnew is repeating 11 times, but you see it only once.', 'start': 12523.584, 'duration': 2.701}, {'end': 12534.708, 'text': 'It grouped, right? 
But what you should also do is that you should do a reset index.', 'start': 12526.945, 'duration': 7.763}], 'summary': 'Data grouped by start date, showing 11 trips for agnew, with no duplicates.', 'duration': 25.928, 'max_score': 12508.78, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-012508780.jpg'}, {'end': 12820.43, 'src': 'embed', 'start': 12793.886, 'weight': 0, 'content': [{'end': 12798.584, 'text': 'So that for selecting it might be important, otherwise you can leave it as 0, 1..', 'start': 12793.886, 'duration': 4.698}, {'end': 12804.248, 'text': 'I will show you an example probably we have a sales data example where the indexing might be very useful.', 'start': 12798.584, 'duration': 5.664}, {'end': 12807.75, 'text': 'There is a sales data example where the indexing might be useful.', 'start': 12805.048, 'duration': 2.702}, {'end': 12809.471, 'text': 'List is similar elements.', 'start': 12808.21, 'duration': 1.261}, {'end': 12812.433, 'text': 'Dictionary is if you have a key and a set of values.', 'start': 12809.751, 'duration': 2.682}, {'end': 12820.43, 'text': "So if I'm using like, if I'm saying that I have three columns and I want to apply min, max on each column.", 'start': 12813.666, 'duration': 6.764}], 'summary': 'Indexing is important for selecting data, useful in sales examples, and applying functions to columns.', 'duration': 26.544, 'max_score': 12793.886, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-012793886.jpg'}, {'end': 13227.257, 'src': 'embed', 'start': 13200.798, 'weight': 2, 'content': [{'end': 13206.179, 'text': 'So properly add an index and then the columns and then only you can remove it.', 'start': 13200.798, 'duration': 5.381}, {'end': 13207.72, 'text': 'So did you guys understand this?', 'start': 13206.52, 'duration': 1.2}, {'end': 13211.121, 'text': 'After you group it.', 'start': 13209.061, 'duration': 2.06}, 
{'end': 13212.642, 'text': 'so this is the grouping result right?', 'start': 13211.121, 'duration': 1.521}, {'end': 13217.204, 'text': 'After the grouping if you look at the columns, these are the columns.', 'start': 13213.202, 'duration': 4.002}, {'end': 13220.552, 'text': 'So how many columns you have? three columns.', 'start': 13217.224, 'duration': 3.328}, {'end': 13227.257, 'text': 'Even though it displays like in this fashion, if you actually look at the columns, it says multi index column.', 'start': 13221.173, 'duration': 6.084}], 'summary': 'Grouped data has three multi-index columns.', 'duration': 26.459, 'max_score': 13200.798, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-013200798.jpg'}, {'end': 13297.578, 'src': 'embed', 'start': 13268.268, 'weight': 3, 'content': [{'end': 13274.61, 'text': 'and I cannot directly change it that is what I am saying that I want to reset the whole index of my columns.', 'start': 13268.268, 'duration': 6.342}, {'end': 13283.893, 'text': 'So when you run reset index, what it will eventually do is that it will add a column for index, starting with 0, 1, 2, 3, and then level your columns,', 'start': 13275.11, 'duration': 8.783}, {'end': 13288.775, 'text': 'and then you will be able to add that my custom column name, whatever I want.', 'start': 13283.893, 'duration': 4.882}, {'end': 13297.578, 'text': 'Which one? 
Let me see, so this is after leveling right, so I will do a.', 'start': 13290.035, 'duration': 7.543}], 'summary': 'Resetting the index will add a column for index, starting with 0, 1, 2, 3, and then level your columns, allowing the addition of custom column names.', 'duration': 29.31, 'max_score': 13268.268, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-013268268.jpg'}], 'start': 12425.438, 'title': 'Python data manipulation', 'summary': 'Covers python dictionary notation for efficient data extraction, data frame grouping, indexing, and column operations in pandas, emphasizing flexibility and efficiency compared to traditional methods.', 'chapters': [{'end': 12472.458, 'start': 12425.438, 'title': 'Python dictionary notation', 'summary': 'Discusses the usage of dictionary notation in python to extract specific data such as average miles and latest trip from a dataset, highlighting the efficiency and flexibility it offers compared to traditional methods.', 'duration': 47.02, 'highlights': ['The chapter discusses the usage of dictionary notation in Python to extract specific data such as average miles and latest trip from a dataset, highlighting the efficiency and flexibility it offers compared to traditional methods.', 'The speaker emphasizes the flexibility of using dictionary notation in Python to extract specific data, illustrating that it allows for more efficient and concise code compared to traditional methods.', 'The transcript includes a question about extracting only the average miles and the latest trip from a dataset using Python, showcasing the practical application of the discussed concept.']}, {'end': 12932.624, 'start': 12473.118, 'title': 'Data frame grouping and indexing', 'summary': 'Discusses grouping data frames, resetting index, and customizing column names, emphasizing the importance of removing nested columns and utilizing indexes for horizontal operations.', 'duration': 459.506, 
'highlights': ["Grouping by 'start' column results in 11 trips for 'Agnew' When grouping by the 'start' column, there are 11 trips for 'Agnew', indicating the frequency of this occurrence.", 'Resetting index to standard format (0, 1, 2...) removes nested columns Resetting the index to the standard format removes nested columns and organizes the data frame, facilitating better data manipulation.', 'Customizing column names to match specific requirements The ability to customize column names allows for tailoring the data frame to specific needs, enhancing its usability and readability.', 'Utilizing indexes for horizontal operations in data frames Indexes can be utilized for horizontal operations in data frames, enabling efficient data selection and manipulation in a horizontal manner.', 'Using lists for applying aggregation functions to multiple columns Lists can be used to apply aggregation functions to multiple columns, providing flexibility in performing mathematical operations on specific sets of columns.']}, {'end': 13267.45, 'start': 12934.665, 'title': 'Pandas functions and column operations', 'summary': 'Discusses the use of lists and dictionaries for applying functions on columns in pandas, emphasizing the need for proper mapping and resetting of index after grouping, with a focus on the mismatch of expected and actual columns in the grouping output.', 'duration': 332.785, 'highlights': ['The need for proper mapping and resetting of index after grouping Emphasizes the importance of using a dictionary for proper mapping of functions on columns and resetting the index after grouping to avoid mismatch of expected and actual columns in the grouping output.', 'Use of lists and dictionaries for applying functions on columns in pandas Explains the use of lists for one-to-one mapping of columns and functions, and the need for dictionaries for mapping multiple functions to multiple columns in pandas.', 'Explanation of mismatch of expected and actual columns in the 
grouping output Provides a detailed explanation of how the grouping output may have nested columns that require resetting the index to properly match the expected columns, ensuring the correct application of functions on the grouped data.']}, {'end': 13949.038, 'start': 13268.268, 'title': 'Resetting index and working with data frames', 'summary': 'Discusses resetting the index in a data frame, adding custom column names, dropping the index, limitations of removing the row index, saving data frames as csv, and working with for loops and list comprehensions in python data manipulation, with an emphasis on efficient code writing.', 'duration': 680.77, 'highlights': ['Resetting the index in a data frame involves adding a column for index, starting with 0, 1, 2, 3, and then leveling the columns to allow the addition of custom column names. This process adds a column for index starting with 0, 1, 2, 3, and levels the columns, enabling the addition of custom column names.', 'Limitation of removing the row index in a data frame, as a data frame must have an index and removing it would result in the start city becoming the index column. It is not possible to completely remove the row index in a data frame, as the start city would become the index column if the row index is removed.', "Saving data frames as CSV files using the data frame's 'to_csv' method writes the index by default, and an argument can be used to remove the index while saving. When saving a data frame with 'df.to_csv', the index is written by default; passing 'index=False' leaves it out of the file.", 'Demonstrating the use of for loops and list comprehensions in Python for efficient data manipulation and code writing. 
The chapter provides examples of using for loops and list comprehensions in Python for efficient data manipulation and code writing.']}], 'duration': 1523.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-012425438.jpg', 'highlights': ['The chapter covers the usage of dictionary notation in Python for efficient data extraction and data frame operations in pandas, emphasizing flexibility and efficiency compared to traditional methods.', "Grouping by 'start' column results in 11 trips for 'Agnew', indicating the frequency of this occurrence.", 'The need for proper mapping and resetting of index after grouping is emphasized to avoid mismatch of expected and actual columns in the grouping output.', 'Resetting the index in a data frame involves adding a column for index, starting with 0, 1, 2, 3, and then leveling the columns to allow the addition of custom column names.']}, {'end': 15412.723, 'segs': [{'end': 14045.551, 'src': 'embed', 'start': 13992.02, 'weight': 0, 'content': [{'end': 14000.562, 'text': 'so remove the one and upload that into your python, like this.', 'start': 13992.02, 'duration': 8.542}, {'end': 14001.562, 'text': 'so here I have uploaded.', 'start': 14000.562, 'duration': 1}, {'end': 14004.343, 'text': 'can you see store sales?', 'start': 14001.562, 'duration': 2.781}, {'end': 14012.705, 'text': 'see. 
so once you upload it, you come back here and you should be able to read it like this.', 'start': 14004.343, 'duration': 8.362}, {'end': 14013.485, 'text': 'so can you read it?', 'start': 14012.705, 'duration': 0.78}, {'end': 14020.295, 'text': 'And this is a very interesting data set, because here we are having a store ID.', 'start': 14014.892, 'duration': 5.403}, {'end': 14023.896, 'text': 'So you have like S1, S2, S3, up to some 100 stores.', 'start': 14020.655, 'duration': 3.241}, {'end': 14026.798, 'text': 'There is a city where each store is located.', 'start': 14024.517, 'duration': 2.281}, {'end': 14032.6, 'text': 'And then there is like months, Jan, Feb, March, et cetera, and sales in thousands.', 'start': 14027.198, 'duration': 5.402}, {'end': 14034.141, 'text': "So there's like $8,000, $20,000, et cetera, et cetera.", 'start': 14032.981, 'duration': 1.16}, {'end': 14043.349, 'text': 'right. so this data is particularly interesting because here there is a horizontal way of working it and there is a vertical way of working it right.', 'start': 14035.942, 'duration': 7.407}, {'end': 14045.551, 'text': 'like you can look at it both ways.', 'start': 14043.349, 'duration': 2.202}], 'summary': 'Transcript discusses uploading store sales data with store ids, cities, months, and sales in thousands.', 'duration': 53.531, 'max_score': 13992.02, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-013992020.jpg'}, {'end': 14163.52, 'src': 'embed', 'start': 14124.469, 'weight': 2, 'content': [{'end': 14132.315, 'text': 'I want to calculate the average sale across the Jan column.', 'start': 14124.469, 'duration': 7.846}, {'end': 14134.256, 'text': 'I want the average.', 'start': 14132.315, 'duration': 1.941}, {'end': 14136.278, 'text': 'so you can say store sales.', 'start': 14134.256, 'duration': 2.022}, {'end': 14153.535, 'text': 'then what you do, you want to group by, I can say, should work right.', 'start': 14136.278, 
'duration': 17.257}, {'end': 14158.757, 'text': "so I'm just interested in calculating the average or the mean for Jan.", 'start': 14153.535, 'duration': 5.222}, {'end': 14161.278, 'text': 'the easiest way to do that will be.', 'start': 14158.757, 'duration': 2.521}, {'end': 14163.52, 'text': 'you can just also, I think, some, some.', 'start': 14161.278, 'duration': 2.242}], 'summary': 'Calculate average sale across jan column for store sales.', 'duration': 39.051, 'max_score': 14124.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-014124469.jpg'}, {'end': 14290.175, 'src': 'embed', 'start': 14246.311, 'weight': 3, 'content': [{'end': 14260.32, 'text': 'So I can simply say, storesales.apply So, apply is a function and you can simply say mean.', 'start': 14246.311, 'duration': 14.009}, {'end': 14268.468, 'text': 'So can you take a guess what happened if I ran this?', 'start': 14265.885, 'duration': 2.583}, {'end': 14283.132, 'text': 'yes, so basically, apply is a special function we use, and within apply you need to say what you need to do.', 'start': 14275.97, 'duration': 7.162}, {'end': 14284.813, 'text': "so here I'm saying mean.", 'start': 14283.132, 'duration': 1.681}, {'end': 14290.175, 'text': 'what it did was that it pick up each month and then calculated the mean.', 'start': 14284.813, 'duration': 5.362}], 'summary': 'Using the apply function, mean was calculated for each month in storesales.', 'duration': 43.864, 'max_score': 14246.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-014246311.jpg'}, {'end': 14798.83, 'src': 'embed', 'start': 14755.583, 'weight': 4, 'content': [{'end': 14757.284, 'text': 'can you do a group by what?', 'start': 14755.583, 'duration': 1.701}, {'end': 14761.227, 'text': 'City. 
I want to do a group by city.', 'start': 14758.045, 'duration': 3.182}, {'end': 14771.636, 'text': 'so I want to group the stores by city and then want to calculate what? Probably average or means sale.', 'start': 14761.227, 'duration': 10.409}, {'end': 14772.456, 'text': 'can you try to do that??', 'start': 14771.636, 'duration': 0.82}, {'end': 14789.024, 'text': 'What I am saying?', 'start': 14788.023, 'duration': 1.001}, {'end': 14798.83, 'text': 'usually, when you apply the apply function, if you have string and integer, it will omit integer, but in some pandas version it will not.', 'start': 14789.024, 'duration': 9.806}], 'summary': 'Group stores by city and calculate average sales.', 'duration': 43.247, 'max_score': 14755.583, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-014755583.jpg'}], 'start': 13949.318, 'title': 'Using pandas for store sales data analysis', 'summary': 'Covers uploading, reading, and analyzing store sales data in python, calculating average sales, handling errors, and utilizing apply and agg functions for group operations and data analysis.', 'chapters': [{'end': 14045.551, 'start': 13949.318, 'title': 'Uploading and reading store sales data', 'summary': 'Involves uploading and reading a store sales dataset in python, containing store ids, cities, months, and sales in thousands, offering a unique perspective on analyzing the data horizontally and vertically.', 'duration': 96.233, 'highlights': ['The data set comprises store IDs, cities, months, and sales in thousands, providing insights into various aspects of store sales.', "The process involves uploading and renaming the file 'store sales' in Python, followed by reading the uploaded data to gain valuable insights for analysis.", 'The dataset contains information on around 100 stores, with sales data spanning across different months, providing a comprehensive overview of store performance.']}, {'end': 14290.175, 'start': 
14045.551, 'title': 'Calculating average sales for store data', 'summary': 'Discusses reading store sales data using pandas, calculating the average sale for the Jan column, and introducing the apply function to calculate the mean for each month in the store sales data.', 'duration': 244.624, 'highlights': ['The apply function in pandas is introduced for calculating the mean of each month in the store sales data, offering a flexible way to perform calculations across multiple columns.', 'The chapter covers the process of calculating the average sale for the Jan column in the store sales data using pandas, emphasizing the use of the mean function to obtain the desired result.', 'The discussion involves importing pandas, reading store sales data, and addressing potential issues with function usage and file naming, highlighting the practical steps in working with the data.']}, {'end': 14658.016, 'start': 14290.175, 'title': 'Handling errors in pandas', 'summary': 'Discusses handling errors in pandas while using the apply function to calculate mean, and provides a technique to exclude string columns by using the select_dtypes method to avoid errors.', 'duration': 367.841, 'highlights': ['The apply function in Pandas can throw an error when encountering string columns while calculating the mean, necessitating the use of the select_dtypes method to include only integer and float data types.', 'The axis parameter in Pandas allows for the calculation of mean horizontally by specifying axis=1 for row-wise calculation, and axis=0 for column-wise calculation.', 'In production, it is common to encounter a data frame with a mixture of string, float, and integer columns, which may lead to errors when applying functions like mean, necessitating the use of techniques like the select_dtypes method to exclude unwanted data types.']}, {'end': 14903.628, 'start': 14665.217, 'title': 'Pandas apply and agg functions', 'summary': 
'Discusses the use of the apply and agg functions in pandas, highlighting their application in group operations, potential issues with data types, and the need for aggregation functions like mean in data analysis.', 'duration': 238.411, 'highlights': ['The apply function in pandas works row wise and column wise, and is typically used to apply functions like mean to entire rows or columns.', 'The agg function in pandas is used for aggregation, requiring the data to be grouped before applying the desired function, such as calculating the average sales by city.', 'Issues may arise with data types when using the apply function, as it may attempt to calculate the mean of a string, resulting in an error and necessitating careful consideration of data types and values.']}, {'end': 15412.723, 'start': 14906.789, 'title': 'Using apply function in pandas', 'summary': 'Explains how to use the apply function in pandas to apply a function to a data frame, with a detailed example of defining a bonus function for store sales data based on a condition of sales value, and the use of np.where for comparison and adding a new column.', 'duration': 505.934, 'highlights': ['The apply function in pandas can be used to apply a function to all columns or rows of a data frame, and is not limited to selected ones.', "A detailed example of defining a bonus function for store sales data is provided, where the condition of sales value is checked using np.where, and a new column 'jan bonus' is added based on the result.", "The process of defining a function in Python, with an example of defining a simple 'add' function and its usage, is explained.", 'The concept of applying the mean using the apply function on a data frame is demonstrated, highlighting that it works on whole rows and columns.', 'The chapter also discusses the limitations of the apply function, stating that it is not used for everything and provides insights into common use cases such as in the context of store data and applying 
a bonus function.']}], 'duration': 1463.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-013949318.jpg', 'highlights': ['The dataset contains information on around 100 stores, with sales data spanning across different months, providing a comprehensive overview of store performance.', "The process involves uploading and renaming the file 'store sales' in Python, followed by reading the uploaded data to gain valuable insights for analysis.", 'Covers uploading, reading, and analyzing store sales data in python, calculating average sales, handling errors, and utilizing apply and agg functions for group operations and data analysis.', 'The apply function in pandas is introduced for calculating the mean of each month in the store sales data, offering a flexible way to perform calculations across multiple columns.', 'The agg function in pandas is used for aggregation, requiring the data to be grouped before applying the desired function, such as calculating the average sales by city.']}, {'end': 16555.056, 'segs': [{'end': 15508.633, 'src': 'embed', 'start': 15444.258, 'weight': 0, 'content': [{'end': 15448.26, 'text': 'And then what you do? Instead of 10, you write sales column.', 'start': 15444.258, 'duration': 4.002}, {'end': 15452.943, 'text': 'So you will say, do you guys understand this? Yeah.', 'start': 15448.28, 'duration': 4.663}, {'end': 15460.968, 'text': "Yeah So here you are passing it as a variable, right? 
Well, it's not required, I'm just saying.", 'start': 15453.544, 'duration': 7.424}, {'end': 15493.523, 'text': 'Sales column is less than 10.', 'start': 15463.049, 'duration': 30.474}, {'end': 15508.633, 'text': 'But what is more interesting will be this, I can say store sales, new df equal to store sales, I can say dot apply.', 'start': 15493.523, 'duration': 15.11}], 'summary': 'Discussion about using a sales column and applying a function to store sales data.', 'duration': 64.375, 'max_score': 15444.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-015444258.jpg'}, {'end': 15707.08, 'src': 'embed', 'start': 15670.265, 'weight': 1, 'content': [{'end': 15675.568, 'text': 'when you do concatenation you can either add them horizontally or vertically.', 'start': 15670.265, 'duration': 5.303}, {'end': 15678.05, 'text': 'horizontal adding means two will be added like this.', 'start': 15675.568, 'duration': 2.482}, {'end': 15679.431, 'text': 'vertical means like this.', 'start': 15678.05, 'duration': 1.381}, {'end': 15685.184, 'text': 'so ideally their rows and columns should be equal, otherwise you will end up having problem.', 'start': 15679.431, 'duration': 5.753}, {'end': 15690.648, 'text': 'So now my requirement is I have two data frames and I just want to add them together right.', 'start': 15685.204, 'duration': 5.444}, {'end': 15693.57, 'text': "So let's see how to do that then I will pick up your questions.", 'start': 15691.049, 'duration': 2.521}, {'end': 15707.08, 'text': 'So I can say something like this pd dot one moment okay concat I think the function is called concat.', 'start': 15699.214, 'duration': 7.866}], 'summary': 'Concatenate data frames horizontally or vertically using the pd.concat function.', 'duration': 36.815, 'max_score': 15670.265, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-015670265.jpg'}, {'end': 16538.685, 'src': 
'embed', 'start': 16511.108, 'weight': 2, 'content': [{'end': 16513.549, 'text': 'so you have some value to experiment with.', 'start': 16511.108, 'duration': 2.441}, {'end': 16515.59, 'text': 'you cannot simply say drop everything right.', 'start': 16513.549, 'duration': 2.041}, {'end': 16518.632, 'text': "so if it is 0 doesn't mean like, so you have to justify that.", 'start': 16515.59, 'duration': 3.042}, {'end': 16521.053, 'text': 'so what we do is that we will replace that NaN value with a mean.', 'start': 16518.632, 'duration': 2.421}, {'end': 16526.503, 'text': 'So 10 year revenue of the company is this, 1 year is missing the mean will take care of it.', 'start': 16521.809, 'duration': 4.694}, {'end': 16535.122, 'text': 'For that companies all years.', 'start': 16532, 'duration': 3.122}, {'end': 16537.024, 'text': 'that column mean I mean one technique.', 'start': 16535.122, 'duration': 1.902}, {'end': 16538.685, 'text': 'I am saying anything you can do.', 'start': 16537.024, 'duration': 1.661}], 'summary': 'Replace missing values with mean to justify analysis. 
utilize mean for 1 year revenue data.', 'duration': 27.577, 'max_score': 16511.108, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-016511108.jpg'}], 'start': 15412.883, 'title': 'Managing data in pandas', 'summary': 'Covers setting variable sales targets, applying functions to filter data, concatenating data frames with potential challenges, and handling nan values by dropping or replacing with 0 or mean, essential in data analytics.', 'chapters': [{'end': 15606.125, 'start': 15412.883, 'title': 'Variable sales targets and applying functions', 'summary': 'Discusses how to set variable sales targets based on store sales, utilizing the apply function to filter data based on specific criteria in a dataframe.', 'duration': 193.242, 'highlights': ['The chapter discusses setting variable sales targets based on store sales, with bonuses given if sales exceed certain thresholds, such as 10 or 20 units, and demonstrates how to adapt these targets using the apply function.', 'Explains the process of using the apply function to filter data based on specific criteria, such as selecting data types or applying functions to every column in a DataFrame.', 'Illustrates the concept of passing variables in the context of setting dynamic sales targets, and provides examples of using the apply function to modify and filter data within a DataFrame.']}, {'end': 16005.518, 'start': 15606.125, 'title': 'Concatenating data frames in pandas', 'summary': 'Discusses the process of concatenating data frames in pandas, emphasizing the importance of carefully aligning rows and columns and providing insights on potential challenges and solutions, with a demonstration of adding two data frames horizontally resulting in a dataset of 15 columns and 200 rows.', 'duration': 399.393, 'highlights': ['The process of concatenating data frames in Pandas is explained, underscoring the significance of ensuring equality in rows and columns to avoid 
potential issues, with the demonstration of adding two data frames horizontally resulting in a dataset of 15 columns and 200 rows.', 'The importance of carefully considering the alignment of rows and columns during concatenation is emphasized, with a cautionary note on potential confusion arising from repetitive column names if not manually adjusted or renamed.', 'A demonstration is provided on adding two data frames horizontally, resulting in a dataset of 15 columns and 200 rows, showcasing the practical application of concatenation in Pandas.', 'Challenges and potential solutions related to column naming conflicts in concatenated data frames are discussed, highlighting the need for manual adjustment or renaming to avoid confusion and ensure clarity of the dataset structure.']}, {'end': 16555.056, 'start': 16012.399, 'title': 'Handling nan values in dataframes', 'summary': 'Discusses the creation of a data frame from a series, handling nan values by dropping or replacing them with 0 or mean, and the importance of these techniques in data analytics.', 'duration': 542.657, 'highlights': ['The importance of handling NaN values in data analytics and the common occurrence of millions of NaN values in data frames, emphasizing the need to change or drop them.', 'Demonstration of dropping NaN values using the dropna method in pandas, with an explanation of how it operates row-wise or column-wise and its impact on the original data frame.', 'Explanation of replacing NaN values with 0 or mean in a data frame, with the example of filling NaN values with the mean annual revenue of companies to ensure meaningful analysis and interpretation.']}], 'duration': 1142.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-015412883.jpg', 'highlights': ['Covers setting variable sales targets based on store sales and adapting targets using the apply function.', 'Explains the process of concatenating data frames in Pandas and the 
importance of ensuring equality in rows and columns.', 'Demonstrates the importance of handling NaN values in data analytics and provides examples of dropping or replacing NaN values.']}, {'end': 18596.426, 'segs': [{'end': 16695.787, 'src': 'embed', 'start': 16670.789, 'weight': 1, 'content': [{'end': 16677.279, 'text': 'So there is a make and model of the car how many tires it has, what is the engine?', 'start': 16670.789, 'duration': 6.49}, {'end': 16682.342, 'text': 'what type that is automobile data, and check the head of the data frame.', 'start': 16677.279, 'duration': 5.063}, {'end': 16683.942, 'text': 'how many rows and columns are there?', 'start': 16682.342, 'duration': 1.6}, {'end': 16685.643, 'text': 'what is the average price of all cars?', 'start': 16683.942, 'duration': 1.701}, {'end': 16687.204, 'text': 'which is the cheapest make?', 'start': 16685.643, 'duration': 1.561}, {'end': 16689.865, 'text': 'how many cars have horsepower greater than this?', 'start': 16687.204, 'duration': 2.661}, {'end': 16693.826, 'text': 'three most commonly found cars, which cars are priced, et cetera, et cetera.', 'start': 16689.865, 'duration': 3.961}, {'end': 16695.787, 'text': 'So this is one sample practice.', 'start': 16693.866, 'duration': 1.921}], 'summary': 'Analyzing automobile data including tire count, engine type, row count, column count, average price, cheapest make, and horsepower distribution.', 'duration': 24.998, 'max_score': 16670.789, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-016670789.jpg'}, {'end': 16754.285, 'src': 'embed', 'start': 16726.385, 'weight': 2, 'content': [{'end': 16728.568, 'text': 'It is a website which shares data to you.', 'start': 16726.385, 'duration': 2.183}, {'end': 16733.511, 'text': 'You get a lot of data and there are a lot of competitions that happen in Kaggle.', 'start': 16729.188, 'duration': 4.323}, {'end': 16738.936, 'text': "So you can sign up for a 
competition, they'll give you the data and then they will ask you to find a solution.", 'start': 16733.872, 'duration': 5.064}, {'end': 16742.639, 'text': 'So for data science and ML, this is the best place to get all the data.', 'start': 16739.517, 'duration': 3.122}, {'end': 16750.264, 'text': 'You can also see the previous competition results which people uploaded and that also can help you get more idea.', 'start': 16742.719, 'duration': 7.545}, {'end': 16754.285, 'text': 'So this automobile data set is actually taken from Kaggle.', 'start': 16750.324, 'duration': 3.961}], 'summary': 'Kaggle offers abundant data, hosts competitions, and aids in finding solutions for data science and ml.', 'duration': 27.9, 'max_score': 16726.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-016726384.jpg'}, {'end': 16814.597, 'src': 'embed', 'start': 16790.697, 'weight': 3, 'content': [{'end': 16798.083, 'text': 'So the basic idea is to create multi-dimensional arrays, right and NumPy support something called array, okay.', 'start': 16790.697, 'duration': 7.386}, {'end': 16802.247, 'text': 'So the array can be one-dimensional, two-dimensional, three-dimensional that is what it is.', 'start': 16798.103, 'duration': 4.144}, {'end': 16807.211, 'text': 'The practical application may not come right now for you.', 'start': 16802.627, 'duration': 4.584}, {'end': 16813.597, 'text': 'you will see NumPy arrays later, when you, you know, start working with deep learning or any other you know.', 'start': 16807.211, 'duration': 6.386}, {'end': 16814.597, 'text': 'projects as such.', 'start': 16813.857, 'duration': 0.74}], 'summary': 'Numpy supports multi-dimensional arrays for deep learning and other projects.', 'duration': 23.9, 'max_score': 16790.697, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-016790697.jpg'}, {'end': 17102.989, 'src': 'embed', 'start': 17071.169, 
'weight': 0, 'content': [{'end': 17072.71, 'text': 'pre-compiled data types.', 'start': 17071.169, 'duration': 1.541}, {'end': 17081.575, 'text': "meaning if you create a NumPy array of, let's say, integers like this, they are already compiled and any operation you write on them,", 'start': 17072.71, 'duration': 8.865}, {'end': 17085.918, 'text': 'like addition or subtraction, they can apply that on all the elements really fast.', 'start': 17081.575, 'duration': 4.343}, {'end': 17092.482, 'text': 'So NumPy operations are 10 to 100 times faster than normal list operations.', 'start': 17086.839, 'duration': 5.643}, {'end': 17102.989, 'text': "You have heard about this Fortran, right? Fortran, what's the language? Fortran is considered to be one of the fastest languages ever created.", 'start': 17093.263, 'duration': 9.726}], 'summary': 'NumPy array operations are 10 to 100 times faster than normal list operations because they use pre-compiled data types.', 'duration': 31.82, 'max_score': 17071.169, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-017071169.jpg'}, {'end': 17613.756, 'src': 'embed', 'start': 17580.201, 'weight': 4, 'content': [{'end': 17589.049, 'text': 'This will be useful in case you know you are looking at some sampling data so if I say 15 it will give me what 0 to 14.', 'start': 17580.201, 'duration': 8.848}, {'end': 17591.952, 'text': 'So this will be a numpy array basically.', 'start': 17589.049, 'duration': 2.903}, {'end': 17598.812, 'text': 'Now where this can be useful is you can also mention this.', 'start': 17592.881, 'duration': 5.931}, {'end': 17606.004, 'text': "So I'm saying five is my starting point, 56 is the ending point and five is the step size.", 'start': 17599.593, 'duration': 6.411}, {'end': 17613.756, 'text': 'So can you see this? 
I did np.arange with five, 56 and five.', 'start': 17608.373, 'duration': 5.383}], 'summary': 'np.arange(5, 56, 5) gives a NumPy array from 5 to 55 in steps of 5; the stop value 56 is excluded.', 'duration': 33.555, 'max_score': 17580.201, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-017580201.jpg'}], 'start': 16555.696, 'title': 'NumPy fundamentals', 'summary': 'Provides an introduction to NumPy, highlighting its performance benefits for numerical operations, covering common operations, array manipulation, and key functions such as arange and linspace, along with examples of array dimensions and reshaping.', 'chapters': [{'end': 16765.93, 'start': 16555.696, 'title': 'Pandas package practice and Kaggle introduction', 'summary': 'Introduces the practice of using the Pandas package for data manipulation and provides an exercise involving automobile data, along with a brief introduction to Kaggle as a platform for accessing data and participating in data science and machine learning competitions.', 'duration': 210.234, 'highlights': ['The chapter introduces a practice exercise involving automobile data, consisting of questions about the data frame, average price, cheapest make, cars with horsepower greater than a certain value, and commonly found cars.', 'The instructor introduces Kaggle as a platform for accessing data and participating in data science and machine learning competitions, highlighting its availability of free data sets and past competition results.', 'The instructor emphasizes the importance of finding and manipulating NaN values in data frames and shares a notebook for practice, along with an explanation of NaN values.', 'The instructor encourages the students to practice using the Pandas package and mentions sharing a notebook for further practice, demonstrating the process of uploading files and accessing automobile data.', 'The instructor mentions the availability of a solution to the practice exercise and hints at providing more similar exercises in the 
future.']}, {'end': 17120.72, 'start': 16765.95, 'title': 'Introduction to numpy', 'summary': 'Explains the basics of numpy, demonstrates the conversion of a list to a numpy array, and highlights the performance benefits of using numpy arrays for numerical operations, which are 10 to 100 times faster than normal list operations.', 'duration': 354.77, 'highlights': ['NumPy arrays offer 10 to 100 times faster numerical operations compared to normal lists, making them highly efficient for computational tasks.', 'Demonstrates the conversion of a list to a NumPy array, showcasing the ability to perform operations like distance by speed and obtain the result as a NumPy n-dimensional array.', 'NumPy arrays are pre-compiled data types, enabling fast application of operations like addition and subtraction on all elements.', 'Explains the concept of NumPy as a special data structure library in Python, designed for creating and manipulating multi-dimensional arrays, which is essential for tasks like deep learning and data analysis.']}, {'end': 17450.375, 'start': 17121.08, 'title': 'Numpy common operations and data types', 'summary': 'Covers common operations in numpy arrays, including concatenation, broadcasting, and data type conversion, highlighting the importance of maintaining consistent dimensions and data types for efficient array manipulation.', 'duration': 329.295, 'highlights': ['If you add two NumPy arrays, their dimensions must be exactly the same, as it is essential for efficient array operations, demonstrated by the error when attempting to add arrays with different dimensions.', 'NumPy arrays allow common operations to be broadcasted to the elements, enhancing efficiency and simplicity of array manipulation.', 'In NumPy arrays, data type consistency is crucial, as mixing different data types can lead to automatic casting, potentially impacting array functionality and performance.']}, {'end': 17836.58, 'start': 17451.055, 'title': 'Numpy functions: a range and 
linspace', 'summary': 'Introduces the NumPy functions arange and linspace, which generate ranges of numbers and equidistant numbers respectively, and explains the difference between the two functions in terms of their outputs and use cases.', 'duration': 385.525, 'highlights': ['The arange function generates a range of numbers, allowing for the specification of start, end, and step size, such as np.arange(5, 56, 5), which generates numbers from 5 to 55 in steps of 5. Demonstrates the usage of the arange function with specific start, end, and step size, providing a clear example of its functionality.', 'The linspace function generates equidistant numbers within a specified range, as demonstrated by np.linspace(30, 100, 5), which produces 5 equidistant numbers from 30 to 100, both endpoints included. Explains the functionality of the linspace function and provides an example of generating equidistant numbers within a specified range.', "The explanation of the formula behind linspace's calculation of equidistant numbers, as seen in the expression (100 - 30) / (5 - 1), which gives a spacing of 17.5, provides insight into the underlying mechanism of the function. Provides an insight into the calculation method used by the linspace function to generate equidistant numbers within a given range."]}, {'end': 18596.426, 'start': 17836.66, 'title': 'Understanding dimensions and reshaping in NumPy', 'summary': 'Discusses the concept of dimensions in NumPy arrays, including one, two, and three-dimensional arrays, and the use of the reshape function to convert arrays into multi-dimensional matrices, with examples demonstrating the creation of two and three-dimensional arrays as well as the application of methods such as filter, np.zeros, np.ones, and np.identity.', 'duration': 759.766, 'highlights': ['The chapter explains the concept of dimensions in NumPy arrays, covering one, two, and three-dimensional arrays, with examples of creating arrays and matrices. 
The transcript elaborates on the concept of dimensions in NumPy arrays, discussing one, two, and three-dimensional arrays, and providing examples of creating arrays and matrices.', 'The use of the reshape function to convert arrays into multi-dimensional matrices is demonstrated through examples of creating two and three-dimensional arrays. The transcript showcases the use of the reshape function to convert arrays into multi-dimensional matrices, with examples illustrating the creation of two and three-dimensional arrays.', 'Examples of applying methods such as filter, np.zeros, np.ones, and np.identity are provided, demonstrating their utility in manipulating and creating NumPy arrays. The transcript presents examples of applying methods such as filter, np.zeros, np.ones, and np.identity, showcasing their utility in manipulating and creating NumPy arrays.']}], 'duration': 2040.73, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-016555696.jpg', 'highlights': ['NumPy arrays offer 10 to 100 times faster numerical operations compared to normal lists, making them highly efficient for computational tasks.', 'The chapter introduces a practice exercise involving automobile data, consisting of questions about the data frame, average price, cheapest make, cars with horsepower greater than a certain value, and commonly found cars.', 'The instructor introduces Kaggle as a platform for accessing data and participating in data science and machine learning competitions, highlighting its availability of free data sets and past competition results.', 'If you add two NumPy arrays, their dimensions must be exactly the same, as it is essential for efficient array operations, demonstrated by the error when attempting to add arrays with different dimensions.', 'A range function generates a range of numbers, allowing for the specification of start, end, and step size, such as np.arange(5, 56, 5), which generates numbers from 5 to 
55 in steps of 5. Demonstrates the usage of a range function with specific start, end, and step size, providing a clear example of its functionality.', 'The chapter explains the concept of dimensions in NumPy arrays, covering one, two, and three-dimensional arrays, with examples of creating arrays and matrices. The transcript elaborates on the concept of dimensions in NumPy arrays, discussing one, two, and three-dimensional arrays, and providing examples of creating arrays and matrices.']}, {'end': 19958.665, 'segs': [{'end': 18631.208, 'src': 'embed', 'start': 18597.711, 'weight': 0, 'content': [{'end': 18600.372, 'text': "Don't ask me, okay, why? I mean, I don't know.", 'start': 18597.711, 'duration': 2.661}, {'end': 18602.073, 'text': "So I've been trying to find out.", 'start': 18600.953, 'duration': 1.12}, {'end': 18607.336, 'text': 'I mean God knows how it is working, but normally an identity matrix should have equal number of rows and columns.', 'start': 18602.073, 'duration': 5.263}, {'end': 18609.037, 'text': 'then only the concept makes sense.', 'start': 18607.336, 'duration': 1.701}, {'end': 18614.539, 'text': "But you can also say I want more columns and then it just adds, I don't know, maybe it works like that.", 'start': 18609.477, 'duration': 5.062}, {'end': 18622.483, 'text': "Now, if you want to print something like, so what is happening here? I'm just filling the diagonals with the numbers, one, three.", 'start': 18615.14, 'duration': 7.343}, {'end': 18631.208, 'text': 'So identity matrix will have ones, right? 
But in certain use cases where you want to you know fill it you can actually fill it like this right.', 'start': 18622.984, 'duration': 8.224}], 'summary': 'Explanation of identity matrix properties and potential variations.', 'duration': 33.497, 'max_score': 18597.711, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-018597711.jpg'}, {'end': 18731.388, 'src': 'embed', 'start': 18700.091, 'weight': 1, 'content': [{'end': 18704.995, 'text': "But these days it's very rare that you use matplotlib directly.", 'start': 18700.091, 'duration': 4.904}, {'end': 18709.023, 'text': 'We are using a library called Seaborn on top of it.', 'start': 18705.922, 'duration': 3.101}, {'end': 18713.524, 'text': 'So Seaborn is the actual library that people use.', 'start': 18710.923, 'duration': 2.601}, {'end': 18717.705, 'text': 'Seaborn is built on top of Matplotlib and this has much more visualizations.', 'start': 18713.984, 'duration': 3.721}, {'end': 18723.626, 'text': "There is also one more visualization library that's very popular.", 'start': 18718.265, 'duration': 5.361}, {'end': 18730.548, 'text': 'Tableau Tableau is BI, business intelligence.', 'start': 18725.667, 'duration': 4.881}, {'end': 18731.388, 'text': 'That is not this.', 'start': 18730.728, 'duration': 0.66}], 'summary': 'Seaborn, built on matplotlib, offers more visualizations than tableau bi.', 'duration': 31.297, 'max_score': 18700.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-018700091.jpg'}, {'end': 18792.095, 'src': 'embed', 'start': 18759.622, 'weight': 2, 'content': [{'end': 18762.283, 'text': 'There is something called univariate and bivariate.', 'start': 18759.622, 'duration': 2.661}, {'end': 18771.422, 'text': 'Univariate means you are looking at a single variable and then you are visualizing the data.', 'start': 18765.018, 'duration': 6.404}, {'end': 18775.845, 'text': 'Bivariate means 
you are looking at two different variables and then you are visualizing the data.', 'start': 18771.502, 'duration': 4.343}, {'end': 18782.829, 'text': "For example, let's take the ages of people in this class.", 'start': 18776.285, 'duration': 6.544}, {'end': 18792.095, 'text': "So let's say I want to plot a graph, considering all the ages of people in this class.", 'start': 18784.47, 'duration': 7.625}], 'summary': 'Univariate focuses on a single variable, while bivariate analyzes two different variables for visualization.', 'duration': 32.473, 'max_score': 18759.622, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-018759622.jpg'}, {'end': 18985.019, 'src': 'embed', 'start': 18956.22, 'weight': 3, 'content': [{'end': 18962.628, 'text': "because you're basically looking at a string column and saying that how many people are from Mumbai, from Chennai, from Bangalore?", 'start': 18956.22, 'duration': 6.408}, {'end': 18968.435, 'text': 'Categorical You will have spaces between this thing, bars in this.', 'start': 18963.369, 'duration': 5.066}, {'end': 18970.378, 'text': "That's why it's called a bar chart, actually.", 'start': 18968.455, 'duration': 1.923}, {'end': 18979.372, 'text': "So in univariate, normally you will draw a histogram or you will draw a bar chart, that's very common.", 'start': 18973.325, 'duration': 6.047}, {'end': 18985.019, 'text': "Now let's talk about bivariate, where you have two variable to compare.", 'start': 18979.953, 'duration': 5.066}], 'summary': 'Analyzing categorical data through bar charts and histograms in univariate and bivariate analysis.', 'duration': 28.799, 'max_score': 18956.22, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-018956220.jpg'}, {'end': 19344.614, 'src': 'embed', 'start': 19315.266, 'weight': 4, 'content': [{'end': 19319.21, 'text': 'For example, you are calculating the average salary of people in this 
class.', 'start': 19315.266, 'duration': 3.944}, {'end': 19320.581, 'text': 'We are all here.', 'start': 19319.96, 'duration': 0.621}, {'end': 19329.549, 'text': "Now let's say I invite Mark Zuckerberg to sit in the class and then if you calculate the average salary all of us will be millionaires.", 'start': 19321.301, 'duration': 8.248}, {'end': 19331.871, 'text': 'That is mean.', 'start': 19331.431, 'duration': 0.44}, {'end': 19335.112, 'text': 'understood what will happen.', 'start': 19333.291, 'duration': 1.821}, {'end': 19336.452, 'text': 'it will impact it.', 'start': 19335.112, 'duration': 1.34}, {'end': 19339.593, 'text': 'so mark zuckerberg has billions of dollar salary.', 'start': 19336.452, 'duration': 3.141}, {'end': 19340.453, 'text': 'you calculate the mean.', 'start': 19339.593, 'duration': 0.86}, {'end': 19341.013, 'text': 'what will happen?', 'start': 19340.453, 'duration': 0.56}, {'end': 19344.614, 'text': 'the average class salary will be 10 million, 100 million.', 'start': 19341.013, 'duration': 3.601}], 'summary': "Calculating average class salary would increase significantly if mark zuckerberg's salary is included.", 'duration': 29.348, 'max_score': 19315.266, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019315266.jpg'}, {'end': 19383.951, 'src': 'embed', 'start': 19358.891, 'weight': 5, 'content': [{'end': 19366.576, 'text': 'so in the box plot, what happens is that it will pick Bangalore okay, and the salaries will be arranged in percentile,', 'start': 19358.891, 'duration': 7.685}, {'end': 19370.319, 'text': 'in a median fashion and will show you the middle value.', 'start': 19366.576, 'duration': 3.743}, {'end': 19377.184, 'text': 'so right now, if I look at, you can see that people from Delhi has more salary, because this is the median of Delhi and this is more than this.', 'start': 19370.319, 'duration': 6.865}, {'end': 19383.951, 'text': 'So by looking at this, you can say that 
people from Delhi has more salary compared to people from Chennai, compared to people from Bangalore.', 'start': 19378.168, 'duration': 5.783}], 'summary': 'Box plot depicts salary comparison, showing delhi has highest median salary.', 'duration': 25.06, 'max_score': 19358.891, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019358891.jpg'}, {'end': 19453.434, 'src': 'embed', 'start': 19424.224, 'weight': 6, 'content': [{'end': 19427.526, 'text': "Because now again you're categorizing what? Male and female.", 'start': 19424.224, 'duration': 3.302}, {'end': 19433.552, 'text': "But that's not a good idea, because if you do three, four variable plotting, plotting will work,", 'start': 19428.226, 'duration': 5.326}, {'end': 19437.756, 'text': 'but you cannot make sense of what is happening right?', 'start': 19433.552, 'duration': 4.204}, {'end': 19439.958, 'text': 'So the visualizations are not very powerful.', 'start': 19437.816, 'duration': 2.142}, {'end': 19443.601, 'text': 'So normally we do univariate and bivariate analysis.', 'start': 19440.538, 'duration': 3.063}, {'end': 19448.11, 'text': 'anything more than that is not typically good idea to do right?', 'start': 19443.601, 'duration': 4.509}, {'end': 19453.434, 'text': 'And that is exactly where your ML and all are going to help, right?', 'start': 19448.931, 'duration': 4.503}], 'summary': 'Visualizations limited to univariate and bivariate analysis, ml offers more insights.', 'duration': 29.21, 'max_score': 19424.224, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019424224.jpg'}, {'end': 19523.822, 'src': 'embed', 'start': 19495.452, 'weight': 7, 'content': [{'end': 19497.753, 'text': 'That is where you say, call a machine learning algorithm.', 'start': 19495.452, 'duration': 2.301}, {'end': 19504.758, 'text': 'Let this guy work on all these features, seven or eight features, and then tell me why 
my sales are declining.', 'start': 19498.334, 'duration': 6.424}, {'end': 19510.763, 'text': 'That is where actually ML works, right? So basically you can say that it is an extension of your visualization.', 'start': 19505.259, 'duration': 5.504}, {'end': 19514.574, 'text': 'not visualization but problem solving.', 'start': 19511.912, 'duration': 2.662}, {'end': 19515.435, 'text': 'basically right?', 'start': 19514.574, 'duration': 0.861}, {'end': 19523.822, 'text': 'So sometimes I handle the intro to ML class and I give a very interesting example, but this is not related to visualization.', 'start': 19516.075, 'duration': 7.747}], 'summary': 'Machine learning algorithm can work on 7-8 features to analyze declining sales, extending problem-solving beyond visualization.', 'duration': 28.37, 'max_score': 19495.452, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019495452.jpg'}, {'end': 19571.218, 'src': 'embed', 'start': 19542.83, 'weight': 8, 'content': [{'end': 19544.951, 'text': 'So first quadrant is called a KK.', 'start': 19542.83, 'duration': 2.121}, {'end': 19554.893, 'text': 'So that means there are things which you know that you know, right? For example, you know Java, for example.', 'start': 19545.351, 'duration': 9.542}, {'end': 19556.014, 'text': "So you're a Java developer.", 'start': 19554.913, 'duration': 1.101}, {'end': 19560.395, 'text': 'So you know that you know Java, right? 
I mean, that is your understanding.', 'start': 19556.554, 'duration': 3.841}, {'end': 19565.156, 'text': 'So first quadrant is where you have the knowledge of something which you already know.', 'start': 19560.415, 'duration': 4.741}, {'end': 19568.417, 'text': 'The second quadrant is KDK.', 'start': 19565.616, 'duration': 2.801}, {'end': 19571.218, 'text': 'You know things you do not know.', 'start': 19569.597, 'duration': 1.621}], 'summary': 'Understanding the four quadrants of knowledge: kk and kdk.', 'duration': 28.388, 'max_score': 19542.83, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019542830.jpg'}, {'end': 19788.696, 'src': 'embed', 'start': 19762.139, 'weight': 9, 'content': [{'end': 19767.563, 'text': 'right, the machine learning is where you have some data and you have no clue.', 'start': 19762.139, 'duration': 5.424}, {'end': 19773.988, 'text': 'I mean you already know something about the data, but you are trying to figure out something which is not possible through any three quadrant.', 'start': 19767.563, 'duration': 6.425}, {'end': 19779.711, 'text': 'You are trying to understand something which nobody can otherwise give you an idea about.', 'start': 19775.448, 'duration': 4.263}, {'end': 19788.696, 'text': "So you are just going to get some data and then you have to get a useful insight from the data and then show that's all you are doing in ML.", 'start': 19779.731, 'duration': 8.965}], 'summary': 'Machine learning extracts useful insights from data.', 'duration': 26.557, 'max_score': 19762.139, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019762139.jpg'}], 'start': 18597.711, 'title': 'Matrix, visualization, and data analysis', 'summary': 'Covers the concept of identity matrix and its applications, visualization in python, types of visualizations, different plot types in data analysis, impact of mean and median, and understanding 
machine learning. it discusses various aspects including requirements, tools such as seaborn, types of visualizations, and the importance of machine learning, providing insights for building machine learning models and trend analysis.', 'chapters': [{'end': 18631.208, 'start': 18597.711, 'title': 'Identity matrix and its applications', 'summary': 'Discusses the concept of an identity matrix and its applications, including the requirement for equal number of rows and columns, the possibility of having more columns, and the use of ones to fill the diagonals for certain use cases.', 'duration': 33.497, 'highlights': ['An identity matrix should have equal number of rows and columns for the concept to make sense.', 'In certain use cases, it is possible to fill the diagonals of an identity matrix with numbers other than ones.', 'It is mentioned that you can also have more columns in an identity matrix, indicating a flexibility in its structure.']}, {'end': 18912.594, 'start': 18632.829, 'title': 'Visualization in python and types of visualizations', 'summary': 'Discusses the importance of visualizations in building machine learning models, the use of seaborn for visualization in python, and explains the concepts of univariate and bivariate visualizations, including the types of visualizations for integers and strings.', 'duration': 279.765, 'highlights': ['Seaborn is the primary library for visualization in Python, built on top of Matplotlib and offering a wider range of visualizations. Seaborn is the main visualization library in Python, providing a broader range of visualizations compared to Matplotlib.', 'Understanding the concept of univariate and bivariate visualizations is crucial for effectively visualizing single and multiple variables in a dataset. 
Knowledge of univariate and bivariate visualizations is essential for effectively representing single and multiple variables in a dataset.', 'Histograms are used for visualizing single univariate data, while pie charts are suitable for categorical data representation. Histograms are employed for visualizing single univariate data, while pie charts are suitable for representing categorical data.']}, {'end': 19313.765, 'start': 18912.594, 'title': 'Types of plots in data analysis', 'summary': 'Covers the difference between a bar chart and a histogram, the use of scatter plots in bivariate analysis, and the introduction of box plots for comparing a string and a numerical variable, with emphasis on the importance of fitting lines to scatter plots for trend analysis.', 'duration': 401.171, 'highlights': ['The difference between a bar chart and a histogram is explained, emphasizing how bars in a bar chart are separate for categorical data and how histograms group data together. Explanation of the difference between bar chart and histogram, emphasizing the separation of bars in a bar chart for categorical data.', 'The use of scatter plots in bivariate analysis is discussed, with emphasis on comparing two numerical variables by plotting them on a 2D graph and the importance of fitting lines to determine the trend of the analysis. Discussion on the use of scatter plots for comparing two numerical variables, emphasizing the importance of fitting lines to determine the trend.', 'Introduction of box plots for comparing a string and numerical variable, highlighting their importance in understanding the mean and median of different categories. 
Introduction of box plots for comparing a string variable and a numerical variable, emphasizing their role in comparing the median and spread of different categories.']}, {'end': 19495.311, 'start': 19315.266, 'title': 'Impact of mean, median, and visualization in data analysis', 'summary': 'Discusses the impact of calculating mean and median on salary data, the use of box plots to visualize salary distribution, and the limitations of using multiple variables in visualizations, emphasizing the importance of machine learning for complex data analysis.', 'duration': 180.045, 'highlights': ["The impact of calculating mean and median on salary data is explained, highlighting the potential distortion in average salary when outliers are present. Mark Zuckerberg's salary example", 'The use of box plots to visualize salary distribution and compare salaries across different locations is demonstrated, illustrating the significance of median values and percentiles. Comparison of salaries from different cities using box plots', 'The discussion on the limitations of using multiple variables in visualizations, emphasizing the challenges of interpreting complex visualizations with more than two variables. The recommendation to use univariate and bivariate analysis over complex multivariate visualizations', 'The importance of machine learning for complex data analysis is highlighted, with an example of analyzing sales data with multiple parameters that cannot be effectively visualized. 
The need to include seven or eight parameters in sales data analysis']}, {'end': 19958.665, 'start': 19495.452, 'title': 'Understanding machine learning and visualization', 'summary': 'Discusses the concept of machine learning as a tool for problem-solving, explaining the four quadrants of knowledge and its application in data visualization and mining, with an emphasis on the importance of machine learning in gaining useful insights from data.', 'duration': 463.213, 'highlights': ['Machine learning as an extension of visualization and problem-solving Machine learning is described as an extension of visualization and problem-solving, where it works on features to identify reasons for sales decline.', 'Explanation of the four quadrants of knowledge and their relevance to learning The concept of the four quadrants of knowledge is explained, illustrating scenarios of known knowledge, unknown knowledge, learned knowledge, and unknowable knowledge, with a focus on how they apply to learning and experience.', 'Importance of machine learning in gaining useful insights from data The importance of machine learning is emphasized in gaining useful insights from data that cannot be derived from traditional analysis or visualization tools, positioning it as a valuable tool for uncovering unique and valuable insights.']}], 'duration': 1360.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-018597711.jpg', 'highlights': ['An identity matrix should have equal number of rows and columns for the concept to make sense.', 'Seaborn is the primary library for visualization in Python, offering a wider range of visualizations.', 'Understanding the concept of univariate and bivariate visualizations is crucial for effectively visualizing single and multiple variables in a dataset.', 'The difference between a bar chart and a histogram is explained, emphasizing how bars in a bar chart are separate for categorical data.', 'The impact of 
calculating mean and median on salary data is explained, highlighting the potential distortion in average salary when outliers are present.', 'The use of box plots to visualize salary distribution and compare salaries across different locations is demonstrated, illustrating the significance of median values and percentiles.', 'The discussion on the limitations of using multiple variables in visualizations, emphasizing the challenges of interpreting complex visualizations with more than two variables.', 'Machine learning as an extension of visualization and problem-solving, working on features to identify reasons for sales decline.', 'Explanation of the four quadrants of knowledge and their relevance to learning, illustrating scenarios of known knowledge, unknown knowledge, learned knowledge, and unknowable knowledge.', 'Importance of machine learning in gaining useful insights from data that cannot be derived from traditional analysis or visualization tools.']}, {'end': 22329.662, 'segs': [{'end': 20023.11, 'src': 'embed', 'start': 19989.813, 'weight': 0, 'content': [{'end': 19991.575, 'text': 'See this has around 81 columns.', 'start': 19989.813, 'duration': 1.762}, {'end': 19993.557, 'text': "So it's very difficult to show the old columns.", 'start': 19991.615, 'duration': 1.942}, {'end': 19999.904, 'text': 'But what it actually has is that this is the housing sales data from US.', 'start': 19994.158, 'duration': 5.746}, {'end': 20007.152, 'text': 'So you will have the square feet, number of bedroom, number of bathroom and all the properties of the house, basically,', 'start': 20000.424, 'duration': 6.728}, {'end': 20008.793, 'text': 'and the sale price and all right.', 'start': 20007.152, 'duration': 1.641}, {'end': 20009.634, 'text': 'So we have different columns.', 'start': 20008.813, 'duration': 0.821}, {'end': 20015.826, 'text': 'You can actually do a head that to understand how see it.', 'start': 20012.024, 'duration': 3.802}, {'end': 20023.11, 'text': 'so 
you have lot frontage, lot area, street lot shape, utilities, pool area.', 'start': 20015.826, 'duration': 7.284}], 'summary': 'The transcript describes housing sales data with 81 columns including square feet, bedrooms, bathrooms, and sale price.', 'duration': 33.297, 'max_score': 19989.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019989813.jpg'}, {'end': 20076.599, 'src': 'embed', 'start': 20046.25, 'weight': 1, 'content': [{'end': 20051.272, 'text': 'So you can actually, are you able to read the data? I mean, load the data? Yes, right? Yeah.', 'start': 20046.25, 'duration': 5.022}, {'end': 20054.473, 'text': "So let's look at a univariate analysis.", 'start': 20051.672, 'duration': 2.801}, {'end': 20062.635, 'text': "So what I'm going to do, I'm going to do a dist plot on the lot area column.", 'start': 20055.173, 'duration': 7.462}, {'end': 20067.218, 'text': 'So lot area is a column which has integer kind of value.', 'start': 20063.737, 'duration': 3.481}, {'end': 20074.939, 'text': 'And normally in cases of your seaborn, when you want to do a histogram, you will say dist plot.', 'start': 20068.078, 'duration': 6.861}, {'end': 20076.599, 'text': 'Dist plot is your histogram.', 'start': 20075.379, 'duration': 1.22}], 'summary': 'Performing univariate analysis on lot area column using dist plot.', 'duration': 30.349, 'max_score': 20046.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-020046250.jpg'}, {'end': 20632.113, 'src': 'embed', 'start': 20605.416, 'weight': 2, 'content': [{'end': 20614.324, 'text': "you know what you say without outliers, because I'm considering only the data till 95 point, and the points are much more you know, aligned together.", 'start': 20605.416, 'duration': 8.908}, {'end': 20624.029, 'text': 'So, by looking at this graph, we can probably say there is a I mean the you know line is actually pointing towards 
upwards.', 'start': 20614.524, 'duration': 9.505}, {'end': 20627.751, 'text': 'so you can say that as the lot area increases, sale price also increases.', 'start': 20624.029, 'duration': 3.722}, {'end': 20632.113, 'text': 'So one way to remove your outliers is to use the quantile.', 'start': 20628.231, 'duration': 3.882}], 'summary': 'Data suggests a positive correlation between lot area and sale price; quantiles are used to remove outliers.', 'duration': 26.697, 'max_score': 20605.416, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-020605416.jpg'}, {'end': 20901.022, 'src': 'embed', 'start': 20872.504, 'weight': 3, 'content': [{'end': 20879.708, 'text': 'So basically what I am doing here is that by running this I am just selecting all the columns where there is an SF.', 'start': 20872.504, 'duration': 7.204}, {'end': 20883.73, 'text': 'So you want to do some analysis based on the square feet.', 'start': 20881.069, 'duration': 2.661}, {'end': 20891.694, 'text': 'Now imagine I want to plot a graph where you know I want to plot multiple graphs.', 'start': 20884.971, 'duration': 6.723}, {'end': 20895.797, 'text': 'So how many columns are there with square feet? 9 columns.', 'start': 20892.575, 'duration': 3.222}, {'end': 20901.022, 'text': 'I have 9 columns where there is a square feet in it.', 'start': 20897.239, 'duration': 3.783}], 'summary': 'Selecting the columns with SF in the name; 9 columns hold square feet data.', 'duration': 28.518, 'max_score': 20872.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-020872504.jpg'}, {'end': 21675.851, 'src': 'embed', 'start': 21666.847, 'weight': 4, 'content': [{'end': 21675.851, 'text': 'And what are we plotting here? You are plotting the material used for exterior painting against the sales price, right? 
So you can see that.', 'start': 21666.847, 'duration': 9.004}, {'end': 21681.093, 'text': 'So what is the inference you can use from this? What do you understand by this plot??', 'start': 21677.191, 'duration': 3.902}, {'end': 21689.081, 'text': 'make some inferences from this.', 'start': 21682.096, 'duration': 6.985}, {'end': 21690.662, 'text': 'Outlier is fine, apart from outlier.', 'start': 21689.081, 'duration': 1.581}, {'end': 21692.564, 'text': 'what this tells you about the house?', 'start': 21690.662, 'duration': 1.902}, {'end': 21698.528, 'text': 'so if somebody is asking so how is the cost involved in this?', 'start': 21692.564, 'duration': 5.964}], 'summary': 'Analyzing material used for exterior painting vs. sales price for house valuation.', 'duration': 31.681, 'max_score': 21666.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-021666847.jpg'}, {'end': 22151.575, 'src': 'heatmap', 'start': 21769.198, 'weight': 0.746, 'content': [{'end': 21773.863, 'text': 'So, you always look at this median value and see how the plot is actually going, that is a box plot.', 'start': 21769.198, 'duration': 4.665}, {'end': 21782.531, 'text': 'This line, now this is a standard how do you say axis representation.', 'start': 21777.947, 'duration': 4.584}, {'end': 21786.604, 'text': 'Oh, this one.', 'start': 21786.044, 'duration': 0.56}, {'end': 21792.508, 'text': 'So this one actually, so if you look at some of them, this actually encounters the outliers also.', 'start': 21787.685, 'duration': 4.823}, {'end': 21796.01, 'text': 'So it just says that this is pointing towards outlier.', 'start': 21793.249, 'duration': 2.761}, {'end': 21799.132, 'text': 'Some of the, in general, there is nothing to be done.', 'start': 21796.35, 'duration': 2.782}, {'end': 21800.633, 'text': 'The size actually matters.', 'start': 21799.432, 'duration': 1.201}, {'end': 21807.537, 'text': 'For example, this data is very big, right? 
So if you look at this line, you can see that the total length is actually big.', 'start': 21800.673, 'duration': 6.864}, {'end': 21811.66, 'text': 'And this one is very small because the amount of data that you have is very small.', 'start': 21808.098, 'duration': 3.562}, {'end': 21821.783, 'text': 'This one which I do not know, I do not think so.', 'start': 21818.623, 'duration': 3.16}, {'end': 21826.924, 'text': 'Now I do not think the length actually matters, the median line actually matters where it is.', 'start': 21823.104, 'duration': 3.82}, {'end': 21829.865, 'text': 'Median line is what actually matters where it is.', 'start': 21827.865, 'duration': 2}, {'end': 21832.326, 'text': 'I do not think length actually matters.', 'start': 21830.585, 'duration': 1.741}, {'end': 21840.067, 'text': 'But general consideration is like if the amount of data you have is more than the length will be more, total line I am saying.', 'start': 21833.166, 'duration': 6.901}, {'end': 21843.048, 'text': 'These are the values.', 'start': 21842.308, 'duration': 0.74}, {'end': 21857.773, 'text': "that falls in that range for example plywood these many values are there you don't get a number but if this box is bigger more values are there.", 'start': 21844.025, 'duration': 13.748}, {'end': 21859.915, 'text': 'Line is just a sort of visualization.', 'start': 21857.813, 'duration': 2.102}, {'end': 21863.817, 'text': 'this is actual values are here only because you look at this guy.', 'start': 21859.915, 'duration': 3.902}, {'end': 21866.679, 'text': 'this is very small, right, and this is bigger.', 'start': 21863.817, 'duration': 2.862}, {'end': 21869.24, 'text': 'so box plot actually starts with two lines actually.', 'start': 21866.679, 'duration': 2.561}, {'end': 21875.731, 'text': 'No, no, no, you should not consider this.', 'start': 21874.11, 'duration': 1.621}, {'end': 21876.691, 'text': 'you should consider this.', 'start': 21875.731, 'duration': 0.96}, {'end': 21879.692, 
'text': 'boxes actually, not these guys.', 'start': 21876.691, 'duration': 3.001}, {'end': 21885.254, 'text': 'and also in the middle, what you see this line, what you see that is a median value, actually right?', 'start': 21879.692, 'duration': 5.562}, {'end': 21891.896, 'text': 'This is the total range.', 'start': 21890.676, 'duration': 1.22}, {'end': 21898.278, 'text': 'This is the 50th percentile.', 'start': 21893.697, 'duration': 4.581}, {'end': 21907.315, 'text': 'Yes, in this range.', 'start': 21906.215, 'duration': 1.1}, {'end': 21910.696, 'text': "Oh, that's what you are asking.", 'start': 21909.656, 'duration': 1.04}, {'end': 21915.138, 'text': 'Yeah, so if you look at the whole graph, this is the minimum, this is the maximum.', 'start': 21910.976, 'duration': 4.162}, {'end': 21918.339, 'text': 'Okay, this is the minimum value.', 'start': 21916.758, 'duration': 1.581}, {'end': 21919.979, 'text': 'This is the maximum value.', 'start': 21918.819, 'duration': 1.16}, {'end': 21921.62, 'text': 'And this is clearly an outlier.', 'start': 21920.299, 'duration': 1.321}, {'end': 21925.841, 'text': 'And this is the, what you say, median value.', 'start': 21922.3, 'duration': 3.541}, {'end': 21929.122, 'text': 'So most of the points are centered over here, it seems.', 'start': 21926.541, 'duration': 2.581}, {'end': 21932.221, 'text': 'Okay, so visualization.', 'start': 21930.58, 'duration': 1.641}, {'end': 21934.242, 'text': 'I think basic visualization we have covered.', 'start': 21932.221, 'duration': 2.021}, {'end': 21940.406, 'text': 'if anything else is pending, that will be discussed in your respective statistics classes, right.', 'start': 21934.242, 'duration': 6.164}, {'end': 21949.691, 'text': 'Because you should know what is the difference between, let us say, a box plot and a normal plot, and how it looks like, what is the use of it.', 'start': 21941.407, 'duration': 8.284}, {'end': 21955.395, 'text': 'apart from that conditions, and all later, when you get the 
actual data, you will be adding and discussing them, okay.', 'start': 21949.691, 'duration': 5.704}, {'end': 21977.902, 'text': 'what you now need to do is you now need to be able to get the data to solve this problem.', 'start': 21972.8, 'duration': 5.102}, {'end': 21988.906, 'text': 'so therefore the statistical way of thinking typically says you formulate a problem and then you get the data to solve that problem.', 'start': 21980.903, 'duration': 8.003}, {'end': 21993.51, 'text': 'the machine learning way of looking at things typically says here is the data.', 'start': 21990.029, 'duration': 3.481}, {'end': 21996.892, 'text': 'tell me what that data is telling you.', 'start': 21994.551, 'duration': 2.341}, {'end': 22000.753, 'text': 'many of my colleagues and i myself have run into this problem and going for interviews, etcetera,', 'start': 21996.892, 'duration': 3.861}, {'end': 22009.437, 'text': "etcetera and so sort of statisticians say that we're not getting jobs out there,", 'start': 22000.753, 'duration': 8.684}, {'end': 22014.879, 'text': "and so i go to people who are hiring and ask, why don't you hire statisticians?", 'start': 22009.437, 'duration': 5.442}, {'end': 22028.594, 'text': "and i reach an interesting conclusion to this entire discussion that sometimes around the way the interviewer who's interviewing the statisticians for a data scientist job asks the question here is my data,", 'start': 22014.879, 'duration': 13.715}, {'end': 22030.595, 'text': 'what can you say?', 'start': 22028.594, 'duration': 2.001}, {'end': 22034.256, 'text': 'and the statistician answers with something like what do you want to know?', 'start': 22030.595, 'duration': 3.661}, {'end': 22037.918, 'text': "and the business guy says but that's why i want to hire you.", 'start': 22034.256, 'duration': 3.662}, {'end': 22043.3, 'text': "and the statistician says but if you don't tell me what you want to know, how do i know what to tell you?", 'start': 22037.918, 
'duration': 5.382}, {'end': 22045.681, 'text': 'and this goes round and round right.', 'start': 22043.3, 'duration': 2.381}, {'end': 22047.322, 'text': "no one's happy about this entire process.", 'start': 22045.681, 'duration': 1.641}, {'end': 22053.413, 'text': "so there's a difference in the way these two communities approach things.", 'start': 22049.571, 'duration': 3.842}, {'end': 22057.915, 'text': 'my job is not to resolve that.', 'start': 22056.614, 'duration': 1.301}, {'end': 22067.279, 'text': "because in the world that you will face you see a lot more of this kind of thinking than you'll see in this thing.", 'start': 22059.916, 'duration': 7.363}, {'end': 22074.763, 'text': 'because in this world the data is cheap and the question is expensive.', 'start': 22070.081, 'duration': 4.682}, {'end': 22078.645, 'text': "and you're paid for asking the right question.", 'start': 22076.904, 'duration': 1.741}, {'end': 22081.698, 'text': 'in this world.', 'start': 22081.058, 'duration': 0.64}, {'end': 22084.019, 'text': 'the question is cheap and the data is expensive.', 'start': 22082.058, 'duration': 1.961}, {'end': 22086.4, 'text': "you're paid for collecting the data.", 'start': 22084.92, 'duration': 1.48}, {'end': 22094.404, 'text': 'so sometimes you will be in a situation where this is going to be important.', 'start': 22090.262, 'duration': 4.142}, {'end': 22100.167, 'text': "for example, let's suppose you're trying to understand who's going to buy my product.", 'start': 22095.264, 'duration': 4.903}, {'end': 22103.188, 'text': "you're asking the question.", 'start': 22102.328, 'duration': 0.86}, {'end': 22104.869, 'text': "let's say that my products aren't selling.", 'start': 22103.208, 'duration': 1.661}, {'end': 22116.184, 'text': "and you want to find out why what will you do? get what data? 
so let's say that you're selling your go.", 'start': 22106.549, 'duration': 9.635}, {'end': 22116.685, 'text': "i don't know.", 'start': 22116.384, 'duration': 0.301}, {'end': 22117.305, 'text': 'what do you want to sell??', 'start': 22116.685, 'duration': 0.62}, {'end': 22121.068, 'text': 'want to sell watches?', 'start': 22117.325, 'duration': 3.743}, {'end': 22125.211, 'text': "say so, let's suppose people aren't buying buying watches anymore, which is a reality, correct?", 'start': 22121.068, 'duration': 4.143}, {'end': 22128.073, 'text': "so you're a watch company who buys watches these.", 'start': 22125.672, 'duration': 2.401}, {'end': 22131.056, 'text': 'the entire business model of a watch is disappearing.', 'start': 22128.073, 'duration': 2.983}, {'end': 22131.676, 'text': 'do you have watches?', 'start': 22131.076, 'duration': 0.6}, {'end': 22134.759, 'text': 'some of you have.', 'start': 22131.936, 'duration': 2.823}, {'end': 22138.442, 'text': 'he has actually a surprising number of you have.', 'start': 22134.759, 'duration': 3.683}, {'end': 22140.023, 'text': 'maybe they do different things these days.', 'start': 22138.442, 'duration': 1.581}, {'end': 22145.069, 'text': "right that that seems like a very, that that's a fitness device is not really a watch at all.", 'start': 22140.023, 'duration': 5.046}, {'end': 22149.853, 'text': 'so something like this was actually with my daughter at lunch today.', 'start': 22147.051, 'duration': 2.802}, {'end': 22151.575, 'text': 'so she got something like this.', 'start': 22150.474, 'duration': 1.101}], 'summary': 'The discussion covers box plots, statistical thinking, and the importance of asking the right questions in data analysis and decision-making.', 'duration': 382.377, 'max_score': 21769.198, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-021769198.jpg'}, {'end': 22009.437, 'src': 'embed', 'start': 21980.903, 'weight': 5, 'content': [{'end': 
21988.906, 'text': 'so therefore the statistical way of thinking typically says you formulate a problem and then you get the data to solve that problem.', 'start': 21980.903, 'duration': 8.003}, {'end': 21993.51, 'text': 'the machine learning way of looking at things typically says here is the data.', 'start': 21990.029, 'duration': 3.481}, {'end': 21996.892, 'text': 'tell me what that data is telling you.', 'start': 21994.551, 'duration': 2.341}, {'end': 22000.753, 'text': 'many of my colleagues and i myself have run into this problem and going for interviews, etcetera,', 'start': 21996.892, 'duration': 3.861}, {'end': 22009.437, 'text': "etcetera and so sort of statisticians say that we're not getting jobs out there,", 'start': 22000.753, 'duration': 8.684}], 'summary': 'Statistical thinking formulates the problem first and then gets the data; machine learning starts from the data.', 'duration': 28.534, 'max_score': 21980.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-021980903.jpg'}, {'end': 22262.616, 'src': 'embed', 'start': 22229.472, 'weight': 6, 'content': [{'end': 22231.793, 'text': "let's consider only data that is within you.", 'start': 22229.472, 'duration': 2.321}, {'end': 22232.953, 'text': "we'll go outside not to worry.", 'start': 22231.913, 'duration': 1.04}, {'end': 22235.354, 'text': "but let's say that i'm looking at my data.", 'start': 22233.834, 'duration': 1.52}, {'end': 22239.475, 'text': 'what data do i want to see and what questions do i want to ask of it?', 'start': 22235.854, 'duration': 3.621}, {'end': 22244.737, 'text': 'so, sales, year by year, times.', 'start': 22241.696, 'duration': 3.041}, {'end': 22254.191, 'text': 'and then what comparisons do i want to do here? With what purpose?', 'start': 22244.737, 'duration': 9.454}, {'end': 22255.932, 'text': 'What question am I asking?', 'start': 22254.531, 'duration': 1.401}, {'end': 22262.616, 'text': 'the data? 
What section of customers are buying my product compared to what?', 'start': 22255.932, 'duration': 6.684}], 'summary': 'Analyze sales data year by year to compare customer segments and purchase trends.', 'duration': 33.144, 'max_score': 22229.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-022229472.jpg'}], 'start': 19958.846, 'title': 'Housing data analysis and outliers removal', 'summary': 'Covers the analysis of a housing dataset with 81 columns, focusing on univariate analysis of the lot area column, identifying outliers and using count plot for exterior coating types. it also explains how to use quantile to identify and remove outliers in bivariate analysis, resulting in a more aligned scatter plot and improved understanding of the relationship between lot area and sale price.', 'chapters': [{'end': 20385.064, 'start': 19958.846, 'title': 'Housing data analysis', 'summary': 'Covers the analysis of a housing dataset with 81 columns, focusing on univariate analysis of the lot area column, identifying outliers and using count plot for exterior coating types.', 'duration': 426.218, 'highlights': ['The dataset contains around 81 columns, including square feet, number of bedrooms, number of bathrooms, and sale price.', 'Univariate analysis is performed on the lot area column using dist plot, revealing the presence of outliers and skewness in the data.', 'The concept of Kernel Density Estimate (KDE) is explained, with the option to visualize the distribution of housing lot areas using KDE.', 'A count plot is used to visualize the frequency of different exterior coating types, with a technique to improve readability by rotating and spreading the x-axis labels.']}, {'end': 20823.43, 'start': 20385.425, 'title': 'Removing outliers using quantile', 'summary': 'Explains how to use quantile to identify and remove outliers in bivariate analysis, resulting in a more aligned scatter plot and improved understanding 
of the relationship between lot area and sale price.', 'duration': 438.005, 'highlights': ['The chapter demonstrates using the quantile function to identify and remove outliers by finding the 0.5, 0.95, and 0.99 percentiles of lot area, resulting in the removal of outliers above the 95th percentile and a more aligned scatter plot.', 'It explains the concept of quantile and percentile, illustrating how to use the values to filter out outliers and make informed decisions on which data points to retain.', 'The speaker emphasizes the improvement in the scatter plot after removing outliers, leading to a clearer understanding of the relationship between lot area and sale price.']}, {'end': 21666.407, 'start': 20824.19, 'title': 'Plotting multiple graphs for square feet data', 'summary': "Explains how to use list comprehension to extract columns with 'square feet' data, and then plot 9 graphs on a single canvas to analyze the impact of various square feet measurements on the sales price of houses.", 'duration': 842.217, 'highlights': ["Using list comprehension to filter columns with 'square feet' data and storing them as SF_columns. The speaker demonstrates the use of list comprehension to filter all columns containing 'SF' and storing them as SF_columns, resulting in 9 columns with square feet data.", 'Plotting 9 graphs on a single canvas to analyze the impact of various square feet measurements on the sales price of houses. The explanation of plotting 9 graphs on a single canvas is crucial as it illustrates the analysis of the impact of different square feet measurements on the sales price of houses, providing a comprehensive visual representation.', "Demonstrating the usage of 'regplot' to showcase the impact of different square feet measurements on sales price. 
The use of 'regplot' to demonstrate the impact of different square feet measurements on sales price provides a quantitative analysis, offering insights into the relationship between square feet measurements and sales price."]}, {'end': 22329.662, 'start': 21666.847, 'title': 'Exterior painting material and sales price analysis', 'summary': 'Discusses the analysis of exterior painting material against sales price, emphasizing the use of box plots to identify outliers and understand the cost implications of different materials, while also highlighting the importance of asking the right questions and structuring data for problem-solving in a statistical and analytical context.', 'duration': 662.815, 'highlights': ['The use of box plots to identify outliers and understand the cost implications of different materials. The discussion emphasizes the use of box plots to visualize the relationship between exterior painting materials and sales price, highlighting the identification of outliers and the cost implications of materials such as vinyl and metal.', 'The importance of asking the right questions and structuring data for problem-solving in a statistical and analytical context. The chapter underlines the significance of asking the right questions and structuring data for statistical problem-solving, emphasizing the difference in approach between statistical and analytical thinking and the value of formulating precise questions for effective data analysis.', 'The emphasis on understanding the implications of sales data and identifying trends to address declining sales. 
The discussion delves into the importance of understanding sales data implications, such as identifying trends and addressing declining sales, highlighting the need to ask pertinent questions and structure data effectively in response to business challenges.']}], 'duration': 2370.816, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-019958846.jpg', 'highlights': ['The dataset contains around 81 columns, including square feet, bedrooms, bathrooms, and sale price.', 'Univariate analysis reveals outliers and skewness in the lot area column using dist plot and KDE visualization.', 'Using quantile function to remove outliers above the 95th percentile results in a more aligned scatter plot.', "List comprehension filters columns with 'square feet' data, plotting 9 graphs to analyze their impact on sales price.", 'Box plots are used to identify outliers and understand cost implications of different exterior coating materials.', 'Emphasizes the importance of asking the right questions and structuring data for statistical problem-solving.', 'Understanding sales data implications, identifying trends, and addressing declining sales are crucial for effective analysis.']}, {'end': 24414.741, 'segs': [{'end': 22391.254, 'src': 'embed', 'start': 22354.747, 'weight': 0, 'content': [{'end': 22358.149, 'text': 'now, what conclusions at the end of this do i want to be able to do?', 'start': 22354.747, 'duration': 3.402}, {'end': 22360.97, 'text': 'how do i need to?', 'start': 22359.95, 'duration': 1.02}, {'end': 22362.751, 'text': 'how do i want to use this information??', 'start': 22360.97, 'duration': 1.781}, {'end': 22365.313, 'text': 'now for this.', 'start': 22364.412, 'duration': 0.901}, {'end': 22371.796, 'text': 'you usually follow something like a three-step process, and you may have seen this, and this covers both these sites,', 'start': 22365.313, 'duration': 6.483}, {'end': 22374.557, 'text': 'and these words should be 
should be familiar to you to some extent.', 'start': 22371.796, 'duration': 2.761}, {'end': 22376.438, 'text': 'the first is called descriptive.', 'start': 22375.098, 'duration': 1.34}, {'end': 22379.84, 'text': 'the second is called predictive.', 'start': 22378.499, 'duration': 1.341}, {'end': 22391.254, 'text': "and the third is called prescriptive. have these words been introduced to you? at least in this call, at least you've read it.", 'start': 22382.793, 'duration': 8.461}], 'summary': 'A three-step process: descriptive, predictive, and prescriptive.', 'duration': 36.507, 'max_score': 22354.747, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-022354747.jpg'}, {'end': 22639.533, 'src': 'embed', 'start': 22616.022, 'weight': 1, 'content': [{'end': 22623.067, 'text': "if you are building an autonomous vehicle, you'll have situations saying the car has to do this, but it also has to follow certain other rules.", 'start': 22616.022, 'duration': 7.045}, {'end': 22631.451, 'text': "for example, if you see someone crossing the road, it should stop, but it shouldn't stop very suddenly,", 'start': 22624.209, 'duration': 7.242}, {'end': 22635.912, 'text': 'because if it stops very suddenly it is going to hurt the car, and it is also probably going to hurt the driver.', 'start': 22631.451, 'duration': 4.461}, {'end': 22639.533, 'text': 'so it needs to stop.', 'start': 22638.032, 'duration': 1.501}], 'summary': 'Autonomous vehicles should stop for pedestrians, but not too suddenly to prevent harm to the car and driver.', 'duration': 23.511, 'max_score': 22616.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-022616022.jpg'}, {'end': 23226.374, 'src': 'embed', 'start': 23197.052, 'weight': 2, 'content': [{'end': 23203.397, 'text': "let's say the doctor says that if your blood sugar is above 140, i'm going to do something if your blood sugar is less than 140.", 
'start': 23197.052, 'duration': 6.345}, {'end': 23204.478, 'text': "i'm not going to do anything.", 'start': 23203.397, 'duration': 1.081}, {'end': 23207.24, 'text': "i don't know whether this is the right number or not, but just let's make it up.", 'start': 23204.538, 'duration': 2.702}, {'end': 23212.585, 'text': 'now the doctor is going to see from you a number.', 'start': 23209.562, 'duration': 3.023}, {'end': 23215.844, 'text': 'it may be a single reading.', 'start': 23214.943, 'duration': 0.901}, {'end': 23216.665, 'text': 'it may be an average.', 'start': 23215.904, 'duration': 0.761}, {'end': 23217.686, 'text': 'it may be a number of things.', 'start': 23216.705, 'duration': 0.981}, {'end': 23226.374, 'text': 'how is the doctor going to translate? what they see from you and compare it to the 140?', 'start': 23218.586, 'duration': 7.788}], 'summary': "Doctor takes action if blood sugar > 140; no action if < 140. how to interpret patient's numbers?", 'duration': 29.322, 'max_score': 23197.052, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-023197052.jpg'}, {'end': 23647.139, 'src': 'embed', 'start': 23619.907, 'weight': 3, 'content': [{'end': 23628.291, 'text': 'it is simply telling you what is there? 
with respect to certain questions that you might possibly ask of it.', 'start': 23619.907, 'duration': 8.384}, {'end': 23633.773, 'text': 'what is the context to the case?', 'start': 23632.313, 'duration': 1.46}, {'end': 23641.477, 'text': "the market research team at a company is assigned the task to identify the profile of the typical customer for each treadmill product offered by the company.", 'start': 23635.194, 'duration': 6.283}, {'end': 23647.139, 'text': 'the market research team decides to investigate whether there are differences across product lines with respect to customer characteristics.', 'start': 23642.435, 'duration': 4.704}], 'summary': "Market research team tasked with identifying the typical customer profile for each of the company's treadmill products.", 'duration': 27.232, 'max_score': 23619.907, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-023619907.jpg'}, {'end': 23889.545, 'src': 'embed', 'start': 23856.038, 'weight': 4, 'content': [{'end': 23860.12, 'text': 'numpy is something that was built more for mathematical problems than anything else.', 'start': 23856.038, 'duration': 4.082}, {'end': 23863.461, 'text': 'so some of the mathematical algorithms that are needed are there.', 'start': 23861.08, 'duration': 2.381}, {'end': 23865.382, 'text': 'there are other stats.', 'start': 23864.642, 'duration': 0.74}, {'end': 23871.185, 'text': "plots in matplotlib or seaborn and many other things that you've seen already.", 'start': 23865.382, 'duration': 5.803}, {'end': 23876.207, 'text': 'python is still figuring out how to arrange these libraries well enough.', 'start': 23871.185, 'duration': 5.022}, {'end': 23881.91, 'text': 'the, shall we say, programming bias sometimes shows through in the libraries.', 'start': 23876.207, 'duration': 5.703}, {'end': 23889.545, 'text': 'so i for one do not remotely know this well enough to know what to import up front but a good session.', 'start': 
23883.243, 'duration': 6.302}], 'summary': 'Numpy is built for mathematical problems, with available algorithms and stats. python is still arranging libraries, showing a programming bias.', 'duration': 33.507, 'max_score': 23856.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-023856038.jpg'}], 'start': 22330.822, 'title': 'Data analysis and business challenges', 'summary': 'Covers sales analysis, challenges in autonomous vehicles, descriptive analytics, fuzzy logic, and data analysis techniques. it emphasizes the need for understanding sales trends, navigating autonomous vehicle challenges, careful analysis in descriptive analytics, and the use of statistical approaches for customer data analysis.', 'chapters': [{'end': 22615.061, 'start': 22330.822, 'title': 'Sales analysis and business optimization', 'summary': 'Discusses the process of descriptive, predictive, and prescriptive analysis for sales data, emphasizing the need to understand sales trends, predict outcomes based on changes, and optimize business decisions to meet various requirements and improve profitability.', 'duration': 284.239, 'highlights': ['The process of descriptive, predictive, and prescriptive analysis for sales data is explained, outlining the importance of understanding sales trends and predicting outcomes based on changes.', 'The discussion emphasizes the need to optimize business decisions to meet various requirements, such as increasing sales while maintaining profitability and managing resources effectively.', 'The example of predicting sales based on changes in product pricing is highlighted, indicating the application of predictive analysis to understand the potential impact of business decisions on sales.', 'The concept of prescriptive analysis is illustrated through the analogy of diagnosing a pre-diabetic condition and issuing a prescription, emphasizing the translation of data into actionable insights to optimize 
outcomes.']}, {'end': 23147.96, 'start': 22616.022, 'title': 'Challenges in autonomous vehicles and descriptive analytics', 'summary': 'Discusses the challenges of an autonomous vehicle navigating complex situations and the complexity of descriptive analytics in understanding biological data, emphasizing the need for precise and careful analysis.', 'duration': 531.938, 'highlights': ['The challenges of an autonomous vehicle in navigating complex situations, such as stopping suddenly without hurting passengers or property, are highlighted, emphasizing the need for careful and precise decision-making.', 'The complexity of descriptive analytics in understanding biological data, especially the variability of blood composition and the challenges in reaching conclusions from random variables, is emphasized, underscoring the necessity for precise and thoughtful analysis.', 'The discussion on the challenges faced by doctors in recommending the right blood tests based on symptoms and the vast amount of information in biological data, stressing the difficulty in making accurate decisions from a wealth of data, is presented.', 'The variability of blood composition and the randomness of bodily fluids, as well as the challenges in reaching conclusions from random variables, are highlighted, emphasizing the complexity and precision required in analyzing biological data.', 'The challenges of sampling blood from different parts of the body and the time-based averaging in analyzing biological data, underlining the complexity and precision needed in understanding biological information, are discussed.']}, {'end': 23532.317, 'start': 23147.98, 'title': 'Understanding descriptive analytics and fuzzy logic', 'summary': 'Discusses the concept of descriptive analytics and fuzzy logic, exploring the use of thresholds, averaging, and uncertainty in medical data interpretation, and the need for descriptive and mathematical instruments.', 'duration': 384.337, 'highlights': ['The importance 
of thresholds in medical data interpretation, with the example of a blood sugar threshold of 140 and the need to compare readings to this threshold.', 'The concept of fuzzy logic as a way to handle uncertainty in data interpretation, using a range instead of a specific threshold to make decisions.', 'The need for descriptive analytics in understanding data variations and the limitations of purely descriptive instruments in solving analytical problems.', 'The introduction of the language of probability in medical data interpretation to express confidence levels and uncertainties in measurements.']}, {'end': 23830.502, 'start': 23534.622, 'title': 'Analyzing treadmill customer data', 'summary': 'Introduces the context for analyzing customer data for treadmill products and explains the statistical approach to understanding customer characteristics, with a focus on data collection and the importance of product-market fit.', 'duration': 295.88, 'highlights': ['The chapter introduces the context for analyzing customer data for treadmill products The chapter outlines the context for analyzing customer data to identify the profile of typical customers for treadmill products and investigate differences across product lines.', 'Explains the statistical approach to understanding customer characteristics The chapter contrasts statistical thinking with machine learning, emphasizing a problem-first, data-next approach that focuses on hypothesis, populations, and sampling, without making predictions or inferences.', 'Emphasizes the importance of product-market fit The chapter discusses the concept of product-market fit, highlighting the importance of matching what customers will buy with what can be made, and the implications for entrepreneurs in physical and software product spaces.']}, {'end': 24414.741, 'start': 23832.784, 'title': 'Data analysis techniques and challenges', 'summary': 'Discusses the challenges and techniques in data analysis, including the use of python 
libraries like pandas and numpy, handling of data types and the need for representative measures in statistics.', 'duration': 581.957, 'highlights': ['The discussion covers the purpose of Python libraries like Pandas and Numpy for data analysis, emphasizing their statistical and mathematical functionalities. Python libraries like Pandas and Numpy are used for statistical and mathematical analysis.', 'The challenges in organizing and importing data in Python are addressed, highlighting the need to handle data types and naming conventions effectively. Challenges in organizing and importing data, including handling of data types and naming conventions.', 'The need for representative measures in statistics is explained using examples like determining a representative age or weight for product design. The importance of representative measures in statistics, illustrated through examples in product design.']}], 'duration': 2083.919, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-022330822.jpg', 'highlights': ['The process of descriptive, predictive, and prescriptive analysis for sales data is explained, outlining the importance of understanding sales trends and predicting outcomes based on changes.', 'The challenges of an autonomous vehicle in navigating complex situations, such as stopping suddenly without hurting passengers or property, are highlighted, emphasizing the need for careful and precise decision-making.', 'The importance of thresholds in medical data interpretation, with the example of a blood sugar threshold of 140 and the need to compare readings to this threshold.', 'The chapter introduces the context for analyzing customer data for treadmill products The chapter outlines the context for analyzing customer data to identify the profile of typical customers for treadmill products and investigate differences across product lines.', 'The discussion covers the purpose of Python libraries like Pandas and 
Numpy for data analysis, emphasizing their statistical and mathematical functionalities. Python libraries like Pandas and Numpy are used for statistical and mathematical analysis.']}, {'end': 25497.611, 'segs': [{'end': 24520.993, 'src': 'embed', 'start': 24458.237, 'weight': 0, 'content': [{'end': 24462.339, 'text': 'the youngest is 18, 25% or a quarter of them are between 18 and 24, a quarter between 24 and 26,', 'start': 24458.237, 'duration': 4.102}, {'end': 24477.133, 'text': 'a quarter between 26 and 33 and a quarter are between 33 and 50.', 'start': 24462.339, 'duration': 14.794}, {'end': 24478.934, 'text': 'this is what is known as a distribution.', 'start': 24477.133, 'duration': 1.801}, {'end': 24482.316, 'text': 'statisticians love distributions.', 'start': 24479.475, 'duration': 2.841}, {'end': 24485.999, 'text': 'they capture the variability in the data and they would do all kinds of things with it.', 'start': 24482.356, 'duration': 3.643}, {'end': 24489.001, 'text': "so i'm going to draw a typical shape of a distribution.", 'start': 24486.659, 'duration': 2.342}, {'end': 24492.183, 'text': 'we will make more sense of it later on.', 'start': 24490.262, 'duration': 1.921}, {'end': 24494.724, 'text': 'this is a theoretical distribution. a distribution,', 'start': 24492.743, 'duration': 1.981}, {'end': 24496.666, 'text': "for example, let's say, has a minimum.", 'start': 24494.764, 'duration': 1.902}, {'end': 24501.249, 'text': 'has a maximum, has a 25% point.', 'start': 24498.907, 'duration': 2.342}, {'end': 24508.906, 'text': 'has a 50% point, has a 75% point.', 'start': 24501.249, 'duration': 7.657}, {'end': 24514.269, 'text': 'in terms of probabilities, there is 25% here.', 'start': 24508.906, 'duration': 5.363}, {'end': 24514.909, 'text': '25% here.', 'start': 24514.269, 'duration': 0.64}, {'end': 24516.631, 'text': '25% here.', 'start': 24514.909, 'duration': 1.722}, {'end': 24520.993, 'text': '25% here, if you want to think 
in terms of pure description.', 'start': 24516.631, 'duration': 4.362}], 'summary': 'Data distribution: 25% aged 18-24, 25% 24-26, 25% 26-33, and 25% 33-50.', 'duration': 62.756, 'max_score': 24458.237, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-024458237.jpg'}, {'end': 24929.518, 'src': 'embed', 'start': 24903.275, 'weight': 2, 'content': [{'end': 24907.698, 'text': "so when the mean is not equal to the median, that's a signal that the left is not equal to the right.", 'start': 24903.275, 'duration': 4.423}, {'end': 24915.555, 'text': 'and when the mean is a little more than the median, it says that there is some data that has been pushed to the right.', 'start': 24909.653, 'duration': 5.902}, {'end': 24923.737, 'text': 'and that should be something that you can guess here because the mean and the median to some extent are what, 24, 26, etc.', 'start': 24917.735, 'duration': 6.002}, {'end': 24926.378, 'text': 'the lowest is 18.', 'start': 24924.457, 'duration': 1.921}, {'end': 24929.518, 'text': "that's about six years, eight years less than that.", 'start': 24926.378, 'duration': 3.14}], 'summary': 'Mean not equal to median signals asymmetry (skew); a mean above the median means data has been pushed to the right. Lowest value: 18.', 'duration': 26.243, 'max_score': 24903.275, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-024903275.jpg'}, {'end': 24984.285, 'src': 'embed', 'start': 24961.336, 'weight': 3, 'content': [{'end': 24968.519, 'text': "yes so therefore one reason that the median often doesn't move is because it is not that sensitive to outliers.", 'start': 24961.336, 'duration': 7.183}, {'end': 24973.101, 'text': "so let's suppose for example, we look at us as us and we ask ourselves.", 'start': 24969.979, 'duration': 3.122}, {'end': 24979.363, 'text': 'what is our mean income or our median income? 
and we have that each of us makes a certain amount of money.', 'start': 24973.521, 'duration': 5.842}, {'end': 24981.944, 'text': 'we can sort that up in sections and put that in now.', 'start': 24979.403, 'duration': 2.541}, {'end': 24984.285, 'text': "let's suppose that mr. mukesh ambani walks into the room.", 'start': 24981.964, 'duration': 2.321}], 'summary': 'The median is less sensitive to outliers, as illustrated by the example of income distribution, with the potential impact of an outlier like Mukesh Ambani.', 'duration': 22.949, 'max_score': 24961.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-024961336.jpg'}, {'end': 25171.039, 'src': 'embed', 'start': 25144.722, 'weight': 4, 'content': [{'end': 25149.83, 'text': 'if mean minus median is positive, it usually corresponds to right skewness.', 'start': 25144.722, 'duration': 5.108}, {'end': 25152.771, 'text': 'negative usually corresponds to left skewness.', 'start': 25149.83, 'duration': 2.941}, {'end': 25159.614, 'text': 'this is a statistical rule, but sometimes it is used as a definition for skewness.', 'start': 25154.432, 'duration': 5.182}, {'end': 25171.039, 'text': 'there are many definitions for skewness. skewed data sometimes causes difficulties in analysis because what happens is the idea of variation changes.', 'start': 25161.555, 'duration': 9.484}], 'summary': 'Mean minus median: positive usually indicates right skew, negative left skew. 
Skewed data can cause analysis difficulties because the idea of variation changes.', 'duration': 26.317, 'max_score': 25144.722, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025144722.jpg'}, {'end': 25333.829, 'src': 'embed', 'start': 25301.731, 'weight': 8, 'content': [{'end': 25303.472, 'text': 'it will give you some ideas as to how that works.', 'start': 25301.731, 'duration': 1.741}, {'end': 25307.614, 'text': "it's a nice book, it is one of the best books that you have in business statistics,", 'start': 25303.712, 'duration': 3.902}, 
{'end': 25310.535, 'text': "but it's not necessarily a book that will tell you how to code things up.", 'start': 25307.614, 'duration': 2.921}, {'end': 25312.656, 'text': 'that is not a deficiency of the book.', 'start': 25311.316, 'duration': 1.34}, {'end': 25314.898, 'text': 'not every book can do things of that sort.', 'start': 25313.117, 'duration': 1.781}, {'end': 25320.3, 'text': 'there are other books around that will tell you how to code things up, but will not explain what you are doing.', 'start': 25315.438, 'duration': 4.862}, {'end': 25322.441, 'text': "it's important to know what you are doing.", 'start': 25320.96, 'duration': 1.481}, {'end': 25324.703, 'text': "it's also important to know why you're doing it.", 'start': 25323.021, 'duration': 1.682}, {'end': 25328.586, 'text': "but books can't be written with often everything in my guess.", 'start': 25326.444, 'duration': 2.142}, {'end': 25333.829, 'text': 'the thinking is here.', 'start': 25333.029, 'duration': 0.8}], 'summary': "The book is great for business statistics, but doesn't teach coding. 
it's important to understand and explain the concepts.", 'duration': 32.098, 'max_score': 25301.731, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025301731.jpg'}, {'end': 25424.63, 'src': 'embed', 'start': 25390.922, 'weight': 7, 'content': [{'end': 25392.302, 'text': 'things are very well organized these days.', 'start': 25390.922, 'duration': 1.38}, {'end': 25400.787, 'text': "there's also the question, and i should give you a very slight warning here, not to discourage you from anything.", 'start': 25395.204, 'duration': 5.583}, {'end': 25406.49, 'text': 'but in the next nine months or thereabouts, the duration of your program,', 'start': 25402.408, 'duration': 4.082}, {'end': 25410.12, 'text': "there's going to be a fair amount of material that will be thrown at you.", 'start': 25407.418, 'duration': 2.702}, {'end': 25417.905, 'text': 'correct? the look and feel will sometimes be like what we would often call, at mit, drinking from a fire hose.', 'start': 25412.422, 'duration': 5.483}, {'end': 25424.63, 'text': 'you can if you want to but you will get very wet.', 'start': 25420.347, 'duration': 4.283}], 'summary': "In the next nine months, expect a lot of material thrown at you, like 'drinking from a fire hose' at MIT.", 'duration': 33.708, 'max_score': 25390.922, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025390922.jpg'}, {'end': 25461.224, 'src': 'embed', 'start': 25436.021, 'weight': 6, 'content': [{'end': 25441.831, 'text': 'but if you try to get into equal depth on every topic that you want to learn, that will take up a lot of your professional time.', 'start': 25436.021, 'duration': 5.81}, {'end': 25447.997, 'text': 'now the reason we do the statistics part first: one,', 'start': 25445.216, 'duration': 2.781}, {'end': 25453.22, 'text': "it's a little easier from a computational perspective, although harder from a 
conceptual perspective.", 'start': 25448.137, 'duration': 5.083}, {'end': 25461.224, 'text': 'so we begin it this way, but hold on to that idea and then, as you keep going, see if this is something that you want to learn more on,', 'start': 25453.26, 'duration': 7.964}], 'summary': 'Focusing on statistics first saves time and offers computational ease.', 'duration': 25.203, 'max_score': 25436.021, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025436021.jpg'}], 'start': 24414.841, 'title': 'Understanding data distribution and outliers', 'summary': 'Covers data distribution using a 5-point summary, differences between mean and median, and the impact of outliers on median, with practical examples and a focus on a dataset revealing a right-skewed distribution. it also emphasizes the importance of understanding statistics through books and the challenges in the program duration.', 'chapters': [{'end': 24954.561, 'start': 24414.841, 'title': 'Data distribution and parameters', 'summary': 'Explains the concept of data distribution using a 5-point summary and discusses the differences between mean and median as parameters, with practical examples and a focus on a dataset with a range of 18 to 50 years, revealing a right-skewed distribution.', 'duration': 539.72, 'highlights': ['The concept of data distribution is explained using a 5-point summary, covering the minimum, 25%, 50%, 75%, and maximum values of the dataset, providing a practical demonstration of how to represent the range of a dataset. ', 'The differences between mean and median as parameters are discussed in detail, providing practical examples of their application and the implications of a dataset with a range of 18 to 50 years, highlighting a right-skewed distribution. 
Age range of dataset: 18 to 50 years', 'The impact of a right-skewed distribution on the mean and median is explained, with emphasis on the practical implications of a dataset where the mean is higher than the median, indicating a push of the data to the right, towards older ages. Age range with a mean higher than the median: 24 to 50 years']}, {'end': 25171.039, 'start': 24961.336, 'title': 'Effect of outliers on median', 'summary': "Discusses the impact of outliers on the median, demonstrating that the median is less sensitive to outliers, as exemplified by the scenario of mr. mukesh ambani's income, and explains the concept of skewness using a statistical rule for measuring skewness.", 'duration': 209.703, 'highlights': ["The median is less sensitive to outliers, as evidenced by the scenario of Mr. Mukesh Ambani's income, where the median remains almost the same despite a significant increase in his income, indicating that it is not heavily influenced by extreme values.", 'The concept of skewness is explained using the statistical rule for measuring skewness, where a positive value of mean minus median indicates right skewness, and a negative value indicates left skewness, with skewed data posing challenges in analysis due to variations.', 'The discussion includes an illustrative example of income distribution, highlighting the stability of the median even in the presence of extreme values, and the implication of skewness in data analysis, emphasizing the impact of outliers on measures of central tendency.']}, {'end': 25497.611, 'start': 25172.25, 'title': 'Utilizing statistics books', 'summary': 'Discusses the importance of understanding statistics through books, emphasizing the need for depth in learning, and the upcoming challenges in the program duration.', 'duration': 325.361, 'highlights': ['The importance of understanding the statistic side of books is emphasized, suggesting depth of learning. 
Emphasizes understanding the statistics side of books, highlighting the need for depth in learning.', 'Upcoming challenges in the program duration are mentioned, likening it to drinking from a fire hose. Warns about the significant amount of material to be covered in the upcoming program duration, likening it to drinking from a fire hose.', 'Recommendation of a well-written book for understanding statistics, with the offer of assistance for explanations. Recommends a well-written book for understanding statistics, offering assistance for explanations.']}], 'duration': 1082.77, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-024414841.jpg', 'highlights': ['The concept of data distribution is explained using a 5-point summary, covering the minimum, 25%, 50%, 75%, and maximum values of the dataset, providing a practical demonstration of how to represent the range of a dataset.', 'The differences between mean and median as parameters are discussed in detail, providing practical examples of their application and the implications of a dataset with a range of 18 to 50 years, highlighting a right-skewed distribution. Age range of dataset: 18 to 50 years', 'The impact of a right-skewed distribution on the mean and median is explained, with emphasis on the practical implications of a dataset where the mean is higher than the median, indicating a push of the data to the right, towards older ages. Age range with a mean higher than the median: 24 to 50 years', "The median is less sensitive to outliers, as evidenced by the scenario of Mr. 
Mukesh Ambani's income, where the median remains almost the same despite a significant increase in his income, indicating that it is not heavily influenced by extreme values.", 'The concept of skewness is explained using the statistical rule for measuring skewness, where a positive value of mean minus median indicates right skewness, and a negative value indicates left skewness, with skewed data posing challenges in analysis due to variations.', 'The discussion includes an illustrative example of income distribution, highlighting the stability of the median even in the presence of extreme values, and the implication of skewness in data analysis, emphasizing the impact of outliers on measures of central tendency.', 'The importance of understanding the statistic side of books is emphasized, suggesting depth of learning. Emphasizes understanding the statistics side of books, highlighting the need for depth in learning.', 'Upcoming challenges in the program duration are mentioned, likening it to drinking from a fire hose. Warns about the significant amount of material to be covered in the upcoming program duration, likening it to drinking from a fire hose.', 'Recommendation of a well-written book for understanding statistics, with the offer of assistance for explanations. Recommends a well-written book for understanding statistics, offering assistance for explanations.']}, {'end': 27182.186, 'segs': [{'end': 25586.229, 'src': 'embed', 'start': 25531.071, 'weight': 0, 'content': [{'end': 25536.036, 'text': 'and what is the formula for a standard deviation? 
std is equal to.', 'start': 25531.071, 'duration': 4.965}, {'end': 25539.12, 'text': 'the square root of.', 'start': 25538.139, 'duration': 0.981}, {'end': 25561.434, 'text': 'little bit of a mess.', 'start': 25560.694, 'duration': 0.74}, {'end': 25567.818, 'text': 'but two steps: step 1, calculate the average.', 'start': 25563.616, 'duration': 4.202}, {'end': 25574.522, 'text': 'step 2, take the distance from the average for every observation.', 'start': 25569.98, 'duration': 4.542}, {'end': 25577.304, 'text': 'ask the question.', 'start': 25576.484, 'duration': 0.82}, {'end': 25580.206, 'text': 'how far is every data point from the middle?', 'start': 25577.764, 'duration': 2.442}, {'end': 25586.229, 'text': 'if it is very far from the middle, say that the deviation is more.', 'start': 25582.587, 'duration': 3.642}], 'summary': 'Standard deviation in two steps: compute the average, then measure how far each observation lies from it; std is the square root of the averaged squared distances.', 'duration': 55.158, 'max_score': 25531.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025531071.jpg'}, {'end': 25982.479, 'src': 'embed', 'start': 25953.617, 'weight': 1, 'content': [{'end': 25955.458, 'text': 'this one is a lot less sensitive to outliers.', 'start': 25953.617, 'duration': 1.841}, {'end': 25963.693, 'text': 'what this one does is, if it is far away, the 22 squares to 484 or something like that, which is a large number.', 'start': 25956.57, 'duration': 7.123}, {'end': 25973.016, 'text': 'so the standard deviation is often driven by very large deviances: the larger the deviance, the more it blows up.', 'start': 25965.073, 'duration': 7.943}, {'end': 25978.116, 'text': 'and so therefore, this is often very criticized.', 'start': 25976.055, 'duration': 2.061}, {'end': 25982.479, 'text': 'if you read, for example, the finance literature, this guy called taleb, nassim taleb,', 'start': 25978.116, 'duration': 
4.363}], 'summary': 'Standard deviation is driven by large deviances, a criticism raised in 
finance literature.', 'duration': 28.862, 'max_score': 25953.617, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025953617.jpg'}, {'end': 26087.512, 'src': 'embed', 'start': 26060.066, 'weight': 2, 'content': [{'end': 26064.528, 'text': 'if i take your blood pressure, how far from your average blood pressure is this reading?', 'start': 26060.066, 'duration': 4.462}, {'end': 26069.691, 'text': "if this is exactly equal, then i don't need to worry about variability.", 'start': 26066.469, 'duration': 3.222}, {'end': 26071.372, 'text': "every time i measure blood pressure, i'll see the same thing.", 'start': 26069.691, 'duration': 1.681}, {'end': 26081.258, 'text': "what is your average bank balance? don't tell me that but but but you know what i mean, right? you have an average bank balance.", 'start': 26074.214, 'duration': 7.044}, {'end': 26087.512, 'text': 'your bank account manager or your bank actually tracks this what your average bank balance is.', 'start': 26082.729, 'duration': 4.783}], 'summary': 'Measuring variability in blood pressure and tracking average bank balance.', 'duration': 27.446, 'max_score': 26060.066, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-026060066.jpg'}, {'end': 26711.826, 'src': 'embed', 'start': 26655.972, 'weight': 3, 'content': [{'end': 26659.394, 'text': 'a histogram is an example how to do that.', 'start': 26655.972, 'duration': 3.422}, {'end': 26662.106, 'text': 'and i tend to agree.', 'start': 26660.444, 'duration': 1.662}, {'end': 26669.393, 'text': 'if you want to test yourself of your understanding of data and your understanding of any programming language and any visualization language,', 'start': 26662.106, 'duration': 7.287}, {'end': 26670.314, 'text': 'code a histogram in it.', 'start': 26669.393, 'duration': 0.921}, {'end': 26673.477, 'text': 'and have fun.', 'start': 26673.096, 'duration': 0.381}, {'end': 
26682.765, 'text': "so it's a nice challenge from many perspectives: the data challenge, the language challenge, the visualization challenge, all of that.", 'start': 26675.919, 'duration': 6.846}, {'end': 26693.435, 'text': 'yes, companies will do that: they want archival data to be of only one form, only one format.', 'start': 26682.785, 'duration': 10.65}, {'end': 26702.722, 'text': "why is that so? because, as i said, when you store data, how do you store it? let's say that you've generated an analysis.", 'start': 26694.898, 'duration': 7.824}, {'end': 26707.364, 'text': "the analysis is done, correct, and you've decided not to destroy the data.", 'start': 26703.502, 'duration': 3.862}, {'end': 26711.826, 'text': "you're going to keep the data in your company's databases or in your own database.", 'start': 26708.845, 'duration': 2.981}], 'summary': 'Coding a histogram is a good test of data, programming-language, and visualization understanding. Companies often want archival data kept in a single format.', 'duration': 55.854, 'max_score': 26655.972, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-026655972.jpg'}], 'start': 25497.611, 'title': 'Understanding data variability', 'summary': 'Introduces standard deviation and variability in data, emphasizing its significance in measuring data spread, discussing the contemporary use of standard deviation, and covering measures of dispersion and data visualization, along with the impact of data format on analytical solutions.', 'chapters': [{'end': 25732.064, 'start': 25497.611, 'title': 'Understanding standard deviation', 'summary': 'Introduces the concept of standard deviation, explaining its formula and significance in measuring the spread of data, with emphasis on the importance of considering both positive and negative deviations.', 'duration': 234.453, 'highlights': ['The standard deviation is a measure of how far a typical observation lies from 
the average, 
calculated using the formula std = sqrt(Σ(x_i - x̄)^2 / (n-1)), where deviation represents the variation from the average and the squaring enables consideration of both positive and negative deviations.', 'In modern machine learning, the mean absolute deviation (MAD) is sometimes used as an alternative measure of variability, where absolute values are taken without squaring, offering a different approach to assessing data spread.', 'The formula for standard deviation involves two key steps: calculating the average and then determining the distance of each data point from the average, with the resulting measure providing insight into the variability of the data set.']}, {'end': 26035.246, 'start': 25732.104, 'title': 'Understanding variability in data', 'summary': 'Discusses the concept of variability in data, emphasizing the difference in ages and the historical debate between mean absolute deviation and standard deviation, ultimately showcasing the contemporary use of standard deviation in data analysis.', 'duration': 303.142, 'highlights': ["The debate between mean absolute deviation and standard deviation was historically influenced by mathematicians such as Gauss and Laplace, with Gauss's method prevailing due to its compatibility with calculus.", "Standard deviation is often criticized for its sensitivity to outliers, as larger deviances can significantly impact the result, a point emphasized by finance literature, particularly by Nassim Taleb in his books 'The Black Swan' and 'Fooled by Randomness'.", 'Contemporary data analysis predominantly utilizes standard deviation, facilitated by the ease of computational implementation, rendering the historical debate regarding calculation methods irrelevant in the modern context.']}, {'end': 26682.765, 'start': 26037.828, 'title': 'Measures of dispersion and data visualization', 'summary': 'Covers the concept of measuring variability through examples of blood pressure and bank balances, explains the calculation of 
interquartile range and range, and demonstrates the use of histograms for visualizing data distribution.', 'duration': 644.937, 'highlights': ['The chapter explains the concept of measuring variability through examples of blood pressure and bank balances. The discussion includes examples of blood pressure and bank balances to illustrate the concept of variability in measurements.', 'It demonstrates the calculation of interquartile range and range as measures of dispersion. The explanation includes the calculation of interquartile range and range as measures of dispersion, emphasizing their significance in analyzing data variability.', 'The chapter also showcases the use of histograms for visualizing data distribution. The text introduces the use of histograms as a visualization tool to represent the distribution of data, highlighting its significance in data analysis and interpretation.']}, {'end': 27182.186, 'start': 26682.785, 'title': 'Archival data format and analytical solutions', 'summary': 'Discusses the importance of storing archival data in a single format to ensure consistency for analytical algorithms, the impact of data format on the efficiency and deployment of analytical models, and the cultural and regulatory factors influencing data storage and analytical decisions in companies.', 'duration': 499.401, 'highlights': ['Companies prefer storing archival data in a single format to maintain consistency for analytical algorithms. Storing data in a single format ensures that analytical algorithms can assume certain things about the data, leading to standardized and predictable outcomes for different algorithms.', 'The format of the data can significantly impact the efficiency and deployment of analytical models in companies. 
Adopting a standardized data format allows for the development of efficient and stable analytical models, reducing the need for rebuilding models with new variables and saving time and resources in model deployment.', 'Cultural and regulatory factors play a significant role in the decisions related to data storage and analytical solutions in companies. The decisions regarding data format and analytical solutions are influenced by cultural considerations within a company, as well as regulatory requirements, such as audits, which demand clear and transparent data storage and decision-making processes.']}], 'duration': 1684.575, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-025497611.jpg', 'highlights': ['The formula for standard deviation involves two key steps: calculating the average and then determining the distance of each data point from the average, with the resulting measure providing insight into the variability of the data set.', 'Contemporary data analysis predominantly utilizes standard deviation, facilitated by the ease of computational implementation, rendering the historical debate regarding calculation methods irrelevant in the modern context.', 'The chapter explains the concept of measuring variability through examples of blood pressure and bank balances. The discussion includes examples of blood pressure and bank balances to illustrate the concept of variability in measurements.', 'The chapter also showcases the use of histograms for visualizing data distribution. The text introduces the use of histograms as a visualization tool to represent the distribution of data, highlighting its significance in data analysis and interpretation.', 'Companies prefer storing archival data in a single format to maintain consistency for analytical algorithms. 
Storing data in a single format ensures that analytical algorithms can assume certain things about the data, leading to standardized and predictable outcomes for different algorithms.']}, {'end': 28407.669, 'segs': [{'end': 27211.139, 'src': 'embed', 'start': 27182.226, 'weight': 4, 'content': [{'end': 27187.769, 'text': "it's also what makes it interesting and it's sort of interesting and exciting.", 'start': 27182.226, 'duration': 5.543}, {'end': 27189.83, 'text': "it's not all bad.", 'start': 27188.409, 'duration': 1.421}, {'end': 27196.513, 'text': 'okay. so the histogram command summaries of what these histograms are, and each gives you a sense of what the distribution is.', 'start': 27190.51, 'duration': 6.003}, {'end': 27203.017, 'text': 'and, as you can see from most of these pictures, most of these variables, when they do have a skew, tend to have a right skew.', 'start': 27196.513, 'duration': 6.504}, {'end': 27205.078, 'text': 'maybe education has a little bit of a left skew.', 'start': 27203.057, 'duration': 2.021}, {'end': 27211.139, 'text': 'maybe education a little bit of a left skew that a few people are educated and most people are here.', 'start': 27207.196, 'duration': 3.943}], 'summary': 'Histograms show right skew for most variables, some left skew for education.', 'duration': 28.913, 'max_score': 27182.226, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027182226.jpg'}, {'end': 27261.651, 'src': 'embed', 'start': 27233.335, 'weight': 3, 'content': [{'end': 27237.657, 'text': 'um people are unsure as to where this box came from.', 'start': 27233.335, 'duration': 4.322}, {'end': 27240.678, 'text': "because there's a sensation called box.", 'start': 27239.338, 'duration': 1.34}, {'end': 27245.902, 'text': "who's used this before but this box came from what it used to be called a box and whisker plot.", 'start': 27242.219, 'duration': 3.683}, {'end': 27246.642, 'text': 'these are the 
whiskers.', 'start': 27245.982, 'duration': 0.66}, {'end': 27249.724, 'text': 'this whisker will go.', 'start': 27248.503, 'duration': 1.221}, {'end': 27251.525, 'text': 'this is the median.', 'start': 27250.244, 'duration': 1.281}, {'end': 27257.148, 'text': 'this is the upper quartile, the top edge of the box.', 'start': 27254.106, 'duration': 3.042}, {'end': 27261.651, 'text': 'the bottom edge of the box is the lower quartile.', 'start': 27258.709, 'duration': 2.942}], 'summary': 'The box and whisker plot represents data distribution with the median, upper and lower quartiles.', 'duration': 28.316, 'max_score': 27233.335, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027233335.jpg'}, {'end': 27696.543, 'src': 'embed', 'start': 27651.799, 'weight': 1, 'content': [{'end': 27657.424, 'text': "they've got three kinds of treadmills and they're trying to understand which.", 'start': 27651.799, 'duration': 5.625}, {'end': 27659.506, 'text': 'who was using what kind of treadmill.', 'start': 27657.424, 'duration': 2.082}, {'end': 27662.749, 'text': 'a business problem is to understand who is using what product.', 'start': 27659.506, 'duration': 3.243}, {'end': 27663.59, 'text': 'this is a cross tab.', 'start': 27662.749, 'duration': 0.841}, {'end': 27668.754, 'text': 'what is this? 
this is something that will be used for categorical variables.', 'start': 27663.93, 'duration': 4.824}, {'end': 27670.916, 'text': 'no box plot will make sense here.', 'start': 27669.535, 'duration': 1.381}, {'end': 27672.678, 'text': 'there are no numbers.', 'start': 27672.137, 'duration': 0.541}, {'end': 27683.375, 'text': 'so now you can ask interesting questions here if you want to and you can think about how to answer it is that for example, you can ask a question.', 'start': 27676.65, 'duration': 6.725}, {'end': 27686.016, 'text': 'is there a difference between the preferences of men and women?', 'start': 27683.695, 'duration': 2.321}, {'end': 27691.78, 'text': 'possibly is there a difference in the products that they prove?', 'start': 27688.018, 'duration': 3.762}, {'end': 27696.543, 'text': 'it that, irrespective of gender, is there a product that that they prefer?', 'start': 27691.78, 'duration': 4.763}], 'summary': 'Analyzing treadmill usage to understand user preferences and trends.', 'duration': 44.744, 'max_score': 27651.799, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027651799.jpg'}, {'end': 27744.619, 'src': 'embed', 'start': 27717.872, 'weight': 0, 'content': [{'end': 27724.076, 'text': "what i'm saying is that if for this if you want to do a little more analysis on it, you now have to reach a conclusion based on it.", 'start': 27717.872, 'duration': 6.204}, {'end': 27734.865, 'text': 'so for example, one conclusion to ask is is that is that do men and women have the same preferences when it comes to the fitness product they use.', 'start': 27725.838, 'duration': 9.027}, {'end': 27744.619, 'text': "now, that's a question to answer that question is enough to look at the data, but just looking at it will not give me the answer.", 'start': 27736.957, 'duration': 7.662}], 'summary': 'Analyzing data to determine if men and women have the same fitness product preferences.', 'duration': 26.747, 
'max_score': 27717.872, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027717872.jpg'}, {'end': 27919.291, 'src': 'embed', 'start': 27865.771, 'weight': 5, 'content': [{'end': 27868.153, 'text': 'but what you really need is a phd in powerpoint engineering.', 'start': 27865.771, 'duration': 2.382}, {'end': 27872.575, 'text': "i mean that's a necessary qualification for success.", 'start': 27870.414, 'duration': 2.161}, {'end': 27875.317, 'text': 'so certain tools have been used.', 'start': 27873.996, 'duration': 1.321}, {'end': 27880.475, 'text': 'so therefore those tools have been implemented in many of these softwares as well.', 'start': 27875.597, 'duration': 4.878}, {'end': 27883.136, 'text': 'this is the pivot table version of the same data set.', 'start': 27880.736, 'duration': 2.4}, {'end': 27887.978, 'text': 'this is the last sort of not last but still this is this is a this is a plot.', 'start': 27884.077, 'duration': 3.901}, {'end': 27892.22, 'text': "let me show you this plot and then we'll end or we'll take a break.", 'start': 27889.098, 'duration': 3.122}, {'end': 27899.582, 'text': 'this is a plot that is a very popular plot because it is a very lazy plot.', 'start': 27893.92, 'duration': 5.662}, {'end': 27905.004, 'text': 'this plot requires extremely little thinking pair plot of a data frame.', 'start': 27901.183, 'duration': 3.821}, {'end': 27908.985, 'text': "right? 
you don't care what the variables are.", 'start': 27906.865, 'duration': 2.12}, {'end': 27912.267, 'text': "you're telling it nothing about the plots.", 'start': 27910.666, 'duration': 1.601}, {'end': 27917.089, 'text': "you're simply saying figure out a way to plot them pair by pair.", 'start': 27913.647, 'duration': 3.442}, {'end': 27919.291, 'text': 'and it does that.', 'start': 27918.75, 'duration': 0.541}], 'summary': 'Implementing tools in software, using pivot tables, and creating a low-effort pair plot for data analysis.', 'duration': 53.52, 'max_score': 27865.771, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027865771.jpg'}, {'end': 28032.9, 'src': 'embed', 'start': 27998.487, 'weight': 7, 'content': [{'end': 28005.432, 'text': "univariate means i'm looking at it variable by variable, one variable at a time. when i'm looking at age,", 'start': 27998.487, 'duration': 6.945}, {'end': 28008.195, 'text': "i'm only looking at age.", 'start': 28006.153, 'duration': 2.042}, {'end': 28019.984, 'text': 'so univariate analysis is just a word: uni as in uniform (same form), unicycle (cycle with one wheel), things like that. univariate.', 'start': 28009.015, 'duration': 10.969}, {'end': 28031.8, 'text': 'another set of data will replicate the same nature of the data.', 'start': 28027.079, 'duration': 4.721}, {'end': 28032.9, 'text': "There'll be a histogram here again.", 'start': 28031.82, 'duration': 1.08}], 'summary': 'Univariate analysis focuses on examining one variable at a time, such as age, to understand its distribution through histograms.', 'duration': 34.413, 'max_score': 27998.487, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027998487.jpg'}], 'start': 27182.226, 'title': 'Data visualization and analysis', 'summary': 'Covers the use of histograms, box plots, and pair plots for summarizing data distribution, understanding treadmill usage 
through categorization and chi-square test, and visualization tools such as bar plots, pivot tables, and pair plots for data analysis.', 'chapters': [{'end': 27651.799, 'start': 27182.226, 'title': 'Understanding box plots and histograms', 'summary': 'Explains the use of histograms and box plots in summarizing data distribution, with examples illustrating right skewness and the identification of outliers, and also provides insights into interpreting box plot components and their significance.', 'duration': 469.573, 'highlights': ['Box plot components and interpretation Explains the components of a box plot, such as the whiskers, median, and quartiles, with a demonstration of interpreting the box plot for data distribution and identification of outliers.', 'Histograms for data distribution visualization Describes the use of histograms for visualizing data distribution, indicating the presence of right skewness in most variables and a left skew in the education variable.', 'Identification and interpretation of outliers Provides a definition of outliers in the context of box plots, emphasizing the use of 1.5 times the interquartile range as a criterion for identifying outliers.', 'Significance of box plot for skewed data Highlights the significance of box plots in representing right-skewed data, demonstrating the relationship between quartiles and median in summarizing skewed data distribution.']}, {'end': 27805.358, 'start': 27651.799, 'title': 'Treadmill usage analysis', 'summary': 'Discusses categorization of treadmill users, posing questions about gender preferences and product usage, and introduces the concept of chi-square test for future analysis.', 'duration': 153.559, 'highlights': ['Introduction of chi-square test for analyzing gender preferences in product usage Introduces the concept of chi-square test for comparing preferences between men and women in fitness product usage, setting the stage for future analysis.', 'Categorization of treadmill users based 
on product preference and gender Discusses the categorization of users based on their preferences and gender, highlighting the potential differences in product usage among men and women.', 'Discussion on understanding user preferences and product usage Explores the business problem of understanding user preferences for different treadmills and products, emphasizing the need for analysis and conclusions based on the data.']}, {'end': 28032.9, 'start': 27805.358, 'title': 'Visualization tools and plot types', 'summary': 'Covers the use of counts, bar plots, pivot tables, and pair plots for data visualization, along with insights on univariate analysis and the convenience of certain plot types.', 'duration': 227.542, 'highlights': ['Pair plot creates a matrix for variables and plots them pair by pair, making it a convenient and lazy plot for quick visualization.', 'Pivot tables are likened to the convenience of pivot tables in Excel, with the emphasis on the usefulness of PowerPoint engineering in corporate success.', 'Univariate analysis is explained as a variable-by-variable analysis, focusing on one variable at a time, often resulting in histograms for visualization.']}, {'end': 28407.669, 'start': 28037.102, 'title': 'Understanding pair plots in data analysis', 'summary': 'Explains the pair plot command, which ignores objects and plots numeric data, with histograms providing distribution and count visualizations, and cautions against unnecessary histogram manipulation.', 'duration': 370.567, 'highlights': ['The pair plot command ignores objects and plots numeric data, resulting in histograms that visualize the distribution and count of the data.', 'Histograms in the pair plot display the count of observations for each age group, serving as an optical device to visualize the shape and count of the data.', "It is advised not to manipulate histograms unless one has significant experience in data analysis, as changing the bin width can alter the histogram's 
shape."]}], 'duration': 1225.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-027182226.jpg', 'highlights': ['Introduction of chi-square test for analyzing gender preferences in product usage', 'Categorization of treadmill users based on product preference and gender', 'Discussion on understanding user preferences and product usage', 'Box plot components and interpretation Explains the components of a box plot, such as the whiskers, median, and quartiles, with a demonstration of interpreting the box plot for data distribution and identification of outliers.', 'Histograms for data distribution visualization Describes the use of histograms for visualizing data distribution, indicating the presence of right skewness in most variables and a left skew in the education variable.', 'Pair plot creates a matrix for variables and plots them pair by pair, making it a convenient and lazy plot for quick visualization.', 'Pivot tables are likened to the convenience of pivot tables in Excel, with the emphasis on the usefulness of PowerPoint engineering in corporate success.', 'Univariate analysis is explained as a variable-by-variable analysis, focusing on one variable at a time, often resulting in histograms for visualization.']}, {'end': 29550.32, 'segs': [{'end': 28493.844, 'src': 'embed', 'start': 28458.493, 'weight': 4, 'content': [{'end': 28461.854, 'text': 'here you go, self rated fitness on a one to five scale.', 'start': 28458.493, 'duration': 3.361}, {'end': 28463.815, 'text': 'one is in poor shape and five is in excellent shape.', 'start': 28461.854, 'duration': 1.961}, {'end': 28465.876, 'text': 'this was the created data.', 'start': 28464.735, 'duration': 1.141}, {'end': 28470.337, 'text': 'so in this data set i now have this variable in it.', 'start': 28466.596, 'duration': 3.741}, {'end': 28478.321, 'text': "these kinds of variables sometimes cause difficulty in the sense that they are some there's 
a word for it.", 'start': 28471.94, 'duration': 6.381}, {'end': 28481.282, 'text': 'these are sometimes called ordinal variables.', 'start': 28478.941, 'duration': 2.341}, {'end': 28484.162, 'text': 'so sometimes data is looked at sort of, you know, numerical.', 'start': 28481.502, 'duration': 2.66}, {'end': 28493.844, 'text': 'and categorical and categorical is sometime called nominal.', 'start': 28488.163, 'duration': 5.681}], 'summary': 'Self-rated fitness data ranging from 1 (poor shape) to 5 (excellent shape) was discussed, with emphasis on its ordinal and categorical nature.', 'duration': 35.351, 'max_score': 28458.493, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-028458493.jpg'}, {'end': 28604.296, 'src': 'embed', 'start': 28574.227, 'weight': 6, 'content': [{'end': 28578.731, 'text': 'so sometimes, when you have data that looks like this, the data,', 'start': 28574.227, 'duration': 4.504}, {'end': 28587.699, 'text': "the python or any database will recognize it as a number because you've entered it as a number but you analyze it as if it is a category.", 'start': 28578.731, 'duration': 8.968}, {'end': 28599.793, 'text': 'so the opposite problem also sometimes exists, in that sometimes you get to see a categorical variable show up as a number,', 'start': 28591.087, 'duration': 8.706}, {'end': 28602.055, 'text': "but you know it's a categorical variable.", 'start': 28599.793, 'duration': 2.262}, {'end': 28604.296, 'text': 'a zip code is an example.', 'start': 28602.055, 'duration': 2.241}], 'summary': 'Data analysis may misinterpret categorical variables as numbers, leading to errors in analysis and interpretation.', 'duration': 30.069, 'max_score': 28574.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-028574227.jpg'}, {'end': 28644.683, 'src': 'embed', 'start': 28621.24, 'weight': 5, 'content': [{'end': 28630.349, 'text': 'the other 
difficulty with zip codes is that they can be many of them which means that as your data set grows the number of zip codes also grows.', 'start': 28621.24, 'duration': 9.109}, {'end': 28639.237, 'text': 'so the number of values that a variable can take grows with the data, and this sometimes causes a difficulty,', 'start': 28631.23, 'duration': 8.007}, {'end': 28644.683, 'text': 'because what happens is that in the statement of the definition of the variable, you now cannot state how many categories there will be present.', 'start': 28639.237, 'duration': 5.446}], 'summary': 'As the dataset grows, the number of zip codes increases, leading to a growing number of categories for a variable.', 'duration': 23.443, 'max_score': 28621.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-028621240.jpg'}, {'end': 29292.961, 'src': 'embed', 'start': 29239.13, 'weight': 0, 'content': [{'end': 29247.856, 'text': 'the amount of applications people are developing and going for patents are dominant in RL, combining RL and DL.', 'start': 29239.13, 'duration': 8.726}, {'end': 29257.624, 'text': 'these two areas are dominant in generating values and also, if you see the takeovers in startups, people focusing on these two, deep learning or RL.', 'start': 29247.856, 'duration': 9.768}, {'end': 29261.809, 'text': 'These two are like say dominant.', 'start': 29259.387, 'duration': 2.422}, {'end': 29264.973, 'text': 'When it comes to startup, yes, I have showcased some value with RL.', 'start': 29261.91, 'duration': 3.063}, {'end': 29268.177, 'text': 'You have higher chance of maybe taking over by Google.', 'start': 29265.394, 'duration': 2.783}, {'end': 29279.359, 'text': 'And recently like say some 9 months back it happened in Bangalore right one not actually team itself was acquired by Google AI team.', 'start': 29270.369, 'duration': 8.99}, {'end': 29287.607, 'text': 'It was not even like say some established start up ok they 
showcased some value in DL and it was taken over by Google.', 'start': 29280.68, 'duration': 6.927}, {'end': 29289.739, 'text': 'and some undisclosed money.', 'start': 29288.519, 'duration': 1.22}, {'end': 29292.961, 'text': 'They did not disclose how much money they have paid.', 'start': 29290.58, 'duration': 2.381}], 'summary': 'RL and DL applications are dominant in generating value and in startup takeovers, e.g., Google acquiring a startup for an undisclosed sum.', 'duration': 53.831, 'max_score': 29239.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-029239130.jpg'}, {'end': 29443.888, 'src': 'embed', 'start': 29409.323, 'weight': 3, 'content': [{'end': 29412.565, 'text': 'monitoring method and monitoring device of deep learning processor.', 'start': 29409.323, 'duration': 3.242}, {'end': 29417.603, 'text': 'And you can actually see exactly what people are doing in this in the past.', 'start': 29415.02, 'duration': 2.583}, {'end': 29424.949, 'text': 'This actually sets up our task like say yes we have to get into this field or to basically ride the tide.', 'start': 29418.464, 'duration': 6.485}, {'end': 29431.195, 'text': 'In traders terminology there is something like say trend is your friend or trend is traders friend.', 'start': 29425.95, 'duration': 5.245}, {'end': 29435.799, 'text': 'If some market is going up or going down that is the best thing where they can make money.', 'start': 29431.996, 'duration': 3.803}, {'end': 29443.888, 'text': 'If market is fluctuating around a point, they actually switch off the system and go for Okay.', 'start': 29437.301, 'duration': 6.587}], 'summary': 'Monitoring method and device for deep learning processor, with insights on market trends and strategies for making money.', 'duration': 34.565, 'max_score': 29409.323, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-029409323.jpg'}], 'start': 28409.61, 'title': 'Handling data 
variables and reinforcement learning applications', 'summary': 'Discusses challenges and considerations of handling ordinal and categorical variables in data analysis, and the dominance of reinforcement learning in patent applications and startup takeovers, exemplified by the exponential growth and acquisition of a startup by google ai team, aiming to equip the audience to program real-world scenarios.', 'chapters': [{'end': 28678.055, 'start': 28409.61, 'title': 'Understanding ordinal and categorical variables', 'summary': 'Discusses the challenges and considerations of handling ordinal and categorical variables in data analysis, including the impact on plot creation, the definition of fitness variable, and the issues with treating categorical data as numerical and vice versa.', 'duration': 268.445, 'highlights': ['The fitness variable has numbers one to five, representing poor shape to excellent shape, making it an ordinal categorical variable. The fitness variable ranges from one to five, representing poor shape to excellent shape, making it an ordinal categorical variable, which impacts data analysis and interpretation.', 'The difficulty in handling zip codes as categorical variables due to their increasing values with the dataset growth and the inability to perform arithmetic operations with them. Handling zip codes as categorical variables poses challenges due to their increasing values with dataset growth and the inability to perform arithmetic operations with them, requiring special solutions.', 'The impact of treating categorical data as numerical and vice versa on plot creation and analysis, with the recommendation to change the data type to match the desired representation in software like Python. 
Treating categorical data as numerical and vice versa affects plot creation and analysis, requiring the data type to match the desired representation in software like Python for accurate visualization and interpretation.']}, {'end': 29168.077, 'start': 28699.053, 'title': 'Introduction to reinforcement learning', 'summary': 'Discusses the challenges and implementations of reinforcement learning, touching on its practical applications, the absence of readily available libraries, and the time taken to develop a framework, followed by an agenda on key concepts and case studies, aiming to equip the audience to program real-world scenarios.', 'duration': 469.024, 'highlights': ['Reinforcement learning challenges and implementation The speaker discusses the challenges and uneasiness associated with implementing reinforcement learning, mentioning the absence of readily available libraries and the need to develop a framework from scratch, taking around 9 months for a single application.', 'Agenda on key concepts and case studies The agenda includes an introduction to reinforcement learning, the business value, differences between types of learning, dominant problems solved using RL, important concepts and components, celebrated algorithms, and case studies such as smart taxi and frozen lake.', 'Practical applications of reinforcement learning The speaker highlights that many top-notch applications in the market are programmed using some form of reinforcement learning, emphasizing its relevance and impact on real-world scenarios.']}, {'end': 29550.32, 'start': 29168.137, 'title': 'Patent trends and startup acquisitions', 'summary': 'Discusses the dominance of reinforcement learning and deep learning in patent applications and startup takeovers, with reinforcement learning showing exponential growth and being dominant in recent years, exemplified by the acquisition of a startup by google ai team. 
The chapter also emphasizes the significance of monitoring patent trends and riding the technological tide for generating value in the field of machine learning.', 'duration': 382.183, 'highlights': ['Reinforcement learning shows exponential growth and dominance in recent patent applications with around 65,000 patents, highlighted by its significance in generating value and startup takeovers. Reinforcement learning exhibits exponential growth with around 65,000 patents, dominant in recent patent applications, and significant in generating value and startup acquisitions.', 'Deep learning, an older technique, has approximately 100,000 patents, showcasing its longstanding relevance in the field of machine learning. Deep learning, with around 100,000 patents, signifies its longstanding relevance in the field of machine learning as an older technique.', 'Startup acquisition by Google AI team exemplifies the value of deep learning and reinforcement learning, with undisclosed monetary transactions. The acquisition of a startup by Google AI team highlights the value of deep learning and reinforcement learning, with undisclosed monetary transactions.', 'The growth in reinforcement learning is dominant compared to deep learning in recent years, reflecting the trend of automation and potential for value generation. The dominant growth of reinforcement learning compared to deep learning in recent years reflects the trend of automation and potential for value generation.', 'The importance of monitoring patent trends and leveraging the technological tide for value generation is emphasized, highlighting the significance of riding the trend in machine learning technologies. 
The chapter emphasizes the significance of monitoring patent trends and riding the technological tide for value generation in the field of machine learning.']}], 'duration': 1140.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-028409610.jpg', 'highlights': ['Reinforcement learning exhibits exponential growth with around 65,000 patents, dominant in recent patent applications, and significant in generating value and startup acquisitions.', 'The acquisition of a startup by Google AI team highlights the value of deep learning and reinforcement learning, with undisclosed monetary transactions.', 'The dominant growth of reinforcement learning compared to deep learning in recent years reflects the trend of automation and potential for value generation.', 'The importance of monitoring patent trends and leveraging the technological tide for value generation is emphasized, highlighting the significance of riding the trend in machine learning technologies.', 'The fitness variable ranges from one to five, representing poor shape to excellent shape, making it an ordinal categorical variable, which impacts data analysis and interpretation.', 'Handling zip codes as categorical variables poses challenges due to their increasing values with dataset growth and the inability to perform arithmetic operations with them, requiring special solutions.', 'Treating categorical data as numerical and vice versa affects plot creation and analysis, requiring the data type to match the desired representation in software like Python for accurate visualization and interpretation.']}, {'end': 32421.799, 'segs': [{'end': 29693.184, 'src': 'embed', 'start': 29666.492, 'weight': 11, 'content': [{'end': 29676.02, 'text': 'That means the problems which actually how to incorporate that latest external factors, those are the problems where RL can be very good.', 'start': 29666.492, 'duration': 9.528}, {'end': 29682.305, 'text': 'Your regular 
supervised and unsupervised learning will not be so effective in those scenarios.', 'start': 29678.402, 'duration': 3.903}, {'end': 29689.3, 'text': 'right so more general than supervised and unsupervised learning ok.', 'start': 29683.755, 'duration': 5.545}, {'end': 29693.184, 'text': 'These problems are, say, reinforcement learning problems.', 'start': 29690.782, 'duration': 2.402}], 'summary': 'Reinforcement learning is effective for incorporating the latest external factors, unlike supervised and unsupervised learning.', 'duration': 26.692, 'max_score': 29666.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-029666492.jpg'}, {'end': 29884.523, 'src': 'embed', 'start': 29852.756, 'weight': 2, 'content': [{'end': 29853.556, 'text': 'It should be penalized.', 'start': 29852.756, 'duration': 0.8}, {'end': 29862.679, 'text': "Make sense? So that's what, from, let's say, a broad point of view, reinforcement learning is.", 'start': 29856.017, 'duration': 6.662}, {'end': 29866.56, 'text': 'To solve this problem, now people actually came up with a lot of algorithms.', 'start': 29863.199, 'duration': 3.361}, {'end': 29875.136, 'text': 'okay. 
so, predominantly, RL is the science of decision making.', 'start': 29869.031, 'duration': 6.105}, {'end': 29880.12, 'text': 'okay, and this is where, like say, people are actually running behind it.', 'start': 29875.136, 'duration': 4.984}, {'end': 29884.523, 'text': 'automated decision making pilot is not sitting in this one.', 'start': 29880.12, 'duration': 4.403}], 'summary': 'Reinforcement learning is the science of decision making with a focus on algorithms and automated decision making.', 'duration': 31.767, 'max_score': 29852.756, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-029852756.jpg'}, {'end': 30202.048, 'src': 'embed', 'start': 30171.757, 'weight': 3, 'content': [{'end': 30179.129, 'text': 'we can actually make all these things as observable entities correct, right.', 'start': 30171.757, 'duration': 7.372}, {'end': 30183.112, 'text': 'ok, now getting into the dog example.', 'start': 30179.129, 'duration': 3.983}, {'end': 30186.075, 'text': 'let us actually try to quantify in the sense.', 'start': 30183.112, 'duration': 2.963}, {'end': 30188.317, 'text': 'let us try to get some sort of logic.', 'start': 30186.075, 'duration': 2.242}, {'end': 30190.699, 'text': 'what exactly is happening?', 'start': 30188.317, 'duration': 2.382}, {'end': 30193.681, 'text': 'how reinforcement learning works in a broad sense?', 'start': 30190.699, 'duration': 2.982}, {'end': 30197.464, 'text': 'ok, say, for example, your dog is the agent.', 'start': 30193.681, 'duration': 3.783}, {'end': 30202.048, 'text': 'ok, that is responding to the or that is exposed to the environment.', 'start': 30197.464, 'duration': 4.584}], 'summary': 'Discussing how to quantify and apply reinforcement learning to a dog example.', 'duration': 30.291, 'max_score': 30171.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-030171757.jpg'}, {'end': 30422.129, 'src': 'embed', 'start': 
30394.547, 'weight': 1, 'content': [{'end': 30402.771, 'text': 'there will be rewards, rewards to the agent, actions by agent, environment because of the those actions and rewards.', 'start': 30394.547, 'duration': 8.224}, {'end': 30409.615, 'text': 'environment changes, environment changes in the sense the states in the environment changes.', 'start': 30402.771, 'duration': 6.844}, {'end': 30416.079, 'text': 'right. so it is a continuous process till you expect the agent to complete the task.', 'start': 30409.615, 'duration': 6.464}, {'end': 30422.129, 'text': 'what it you wanted make sense, right.', 'start': 30416.079, 'duration': 6.05}], 'summary': "Continuous process with rewards for agent's actions and environment changes.", 'duration': 27.582, 'max_score': 30394.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-030394547.jpg'}, {'end': 30482.883, 'src': 'embed', 'start': 30455.056, 'weight': 5, 'content': [{'end': 30460.48, 'text': 'The policy, this is another term which you will be hearing again and again.', 'start': 30455.056, 'duration': 5.424}, {'end': 30463.842, 'text': 'The policy in reinforcement learning is the following.', 'start': 30461.06, 'duration': 2.782}, {'end': 30468.665, 'text': 'The policy is the strategy of choosing an action based on the state.', 'start': 30464.402, 'duration': 4.263}, {'end': 30477.32, 'text': 'Okay, policy is chosen by agent and then, if this policy is optimal policy or the best policy,', 'start': 30470.717, 'duration': 6.603}, {'end': 30482.003, 'text': 'that policy is referred as optimal policy for a given state.', 'start': 30477.32, 'duration': 4.683}, {'end': 30482.883, 'text': 'this is the best action.', 'start': 30482.003, 'duration': 0.88}], 'summary': 'Reinforcement learning policy is the strategy for choosing actions based on states, aiming for the best policy.', 'duration': 27.827, 'max_score': 30455.056, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-030455056.jpg'}, {'end': 30810.509, 'src': 'embed', 'start': 30768.388, 'weight': 6, 'content': [{'end': 30790.487, 'text': 'So once you train your system for the period of time ok for the state one probably walking to next point is the best action right.', 'start': 30768.388, 'duration': 22.099}, {'end': 30797.011, 'text': 'So, over the period of time, you actually start repeating these actions.', 'start': 30791.687, 'duration': 5.324}, {'end': 30802.675, 'text': 'you learn your system and then come up with a list of optimal actions, not list of optimal action.', 'start': 30797.011, 'duration': 5.664}, {'end': 30805.537, 'text': 'each state mapped with one optimal action.', 'start': 30802.675, 'duration': 2.862}, {'end': 30810.509, 'text': 'ok, now the output of reinforcement learning.', 'start': 30807.546, 'duration': 2.963}], 'summary': 'Reinforcement learning trains system to repeat optimal actions for each state.', 'duration': 42.121, 'max_score': 30768.388, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-030768388.jpg'}, {'end': 30927.311, 'src': 'embed', 'start': 30898.898, 'weight': 7, 'content': [{'end': 30913.45, 'text': 'right. so different states but same action, because your this states are very similar to each other, excluding the geographic location right.', 'start': 30898.898, 'duration': 14.552}, {'end': 30916.493, 'text': 'so so in case of action needs to be changed.', 'start': 30913.45, 'duration': 3.043}, {'end': 30919.629, 'text': 'Okay, scenario has changed, environment has changed.', 'start': 30917.128, 'duration': 2.501}, {'end': 30920.809, 'text': 'let us say something has happened.', 'start': 30919.629, 'duration': 1.18}, {'end': 30922.87, 'text': 'Optimal action needs to be changed.', 'start': 30921.429, 'duration': 1.441}, {'end': 30927.311, 'text': 'Exactly Then what happens? 
That was my question.', 'start': 30923.29, 'duration': 4.021}], 'summary': 'Similar states require same action, but changed scenarios demand altered optimal action.', 'duration': 28.413, 'max_score': 30898.898, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-030898898.jpg'}, {'end': 31443.699, 'src': 'embed', 'start': 31389.118, 'weight': 0, 'content': [{'end': 31389.979, 'text': 'you have been living with it.', 'start': 31389.118, 'duration': 0.861}, {'end': 31397.132, 'text': 'correct? right, yeah, ok, now, so far are we fine?', 'start': 31392.23, 'duration': 4.902}, {'end': 31401.294, 'text': 'we are able to get like, say reinforcement learning is nothing but like say,', 'start': 31397.132, 'duration': 4.162}, {'end': 31410.678, 'text': 'looking at the state and taking the action and you are getting reward for that one and over the period of time.', 'start': 31401.294, 'duration': 9.384}, {'end': 31414.139, 'text': 'what are the optimal actions for the given state?', 'start': 31410.678, 'duration': 3.461}, {'end': 31420.542, 'text': 'ok, that is what we have to figure out and output of reinforcement learning is going to be this just this csv table.', 'start': 31414.139, 'duration': 6.403}, {'end': 31428.814, 'text': 'ok, on this missile it will be just the csv table implemented.', 'start': 31421.932, 'duration': 6.882}, {'end': 31437.077, 'text': 'so for that, like say, system to work, there will be video streaming coming in, new state identified, optimal action taken.', 'start': 31428.814, 'duration': 8.263}, {'end': 31439.578, 'text': 'it will be continuous.', 'start': 31437.077, 'duration': 2.501}, {'end': 31443.699, 'text': 'how fast you can react, that is nothing but your accuracy.', 'start': 31439.578, 'duration': 4.121}], 'summary': 'Reinforcement learning aims to determine optimal actions for given states and output as a csv table. 
It involves continuous video streaming and fast reaction for accuracy.', 'duration': 54.581, 'max_score': 31389.118, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-031389118.jpg'}, {'end': 31603.35, 'src': 'embed', 'start': 31569.11, 'weight': 12, 'content': [{'end': 31572.412, 'text': "And that's why self-driving car, you will have a lot of data to crunch.", 'start': 31569.11, 'duration': 3.302}, {'end': 31576.836, 'text': 'Google spent several hundreds of man years to build this one.', 'start': 31573.914, 'duration': 2.922}, {'end': 31579.837, 'text': 'self-driving car.', 'start': 31578.996, 'duration': 0.841}, {'end': 31586.12, 'text': 'Google spent several hundreds of man years to actually simulate all these things and still we have, like, say,', 'start': 31579.837, 'duration': 6.283}, {'end': 31595.486, 'text': 'recently self-driving car hitting human on the road right because that particular scenario was not captured earlier.', 'start': 31586.12, 'duration': 9.366}, {'end': 31603.35, 'text': 'right? 
so you have the frame and also associated attributes.', 'start': 31595.486, 'duration': 7.864}], 'summary': 'Google spent several hundred man years to build self-driving car, but it still faces challenges such as recent accidents.', 'duration': 34.24, 'max_score': 31569.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-031569110.jpg'}, {'end': 31663.575, 'src': 'embed', 'start': 31637.001, 'weight': 10, 'content': [{'end': 31645.2, 'text': 'training will never take place in your like, say, car, okay, the person who sold you that self-driving instrument.', 'start': 31637.001, 'duration': 8.199}, {'end': 31650.224, 'text': 'they have trained with enough scenarios and they have given you.', 'start': 31645.2, 'duration': 5.024}, {'end': 31653.086, 'text': "and if they want to train with the new scenarios, that's what they say.", 'start': 31650.224, 'duration': 2.862}, {'end': 31657.41, 'text': 'we are giving you updates.', 'start': 31653.086, 'duration': 4.324}, {'end': 31658.391, 'text': 'we are giving you updates.', 'start': 31657.41, 'duration': 0.981}, {'end': 31660.533, 'text': 'they are adding more scenarios.', 'start': 31658.391, 'duration': 2.142}, {'end': 31663.575, 'text': 'it will be like, say, very limited data.', 'start': 31660.533, 'duration': 3.042}], 'summary': 'Self-driving instrument training includes updates with new scenarios for more comprehensive data.', 'duration': 26.574, 'max_score': 31637.001, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-031637001.jpg'}, {'end': 31755.576, 'src': 'embed', 'start': 31728.948, 'weight': 14, 'content': [{'end': 31733.63, 'text': 'If I am not able to detect that one, I actually end up with an accident at some time.', 'start': 31728.948, 'duration': 4.682}, {'end': 31739.432, 'text': 'This is a definitely like say useful data to have as an attribute.', 'start': 31734.991, 'duration': 4.441}, {'end': 
31744.274, 'text': 'And we have been saying like say this is what is available.', 'start': 31741.293, 'duration': 2.981}, {'end': 31747.109, 'text': 'You just have to capture and use it.', 'start': 31745.848, 'duration': 1.261}, {'end': 31749.731, 'text': 'Once you have all this one, you are doing action.', 'start': 31747.81, 'duration': 1.921}, {'end': 31751.853, 'text': 'That action is leading to you to the next state.', 'start': 31749.931, 'duration': 1.922}, {'end': 31755.576, 'text': 'That next state again includes frame and attributes.', 'start': 31753.014, 'duration': 2.562}], 'summary': 'Capturing useful data leads to action and next state.', 'duration': 26.628, 'max_score': 31728.948, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-031728948.jpg'}, {'end': 32042.878, 'src': 'embed', 'start': 31971.159, 'weight': 15, 'content': [{'end': 31974.021, 'text': 'But we need to define that for all the scenarios for business.', 'start': 31971.159, 'duration': 2.862}, {'end': 31977.604, 'text': 'For instance, for your problem, stock problem, there will be a default state.', 'start': 31974.041, 'duration': 3.563}, {'end': 31980.426, 'text': 'We have to define all these corner cases should be defined.', 'start': 31977.964, 'duration': 2.462}, {'end': 31988.814, 'text': "okay, because that's why the building like, say, building reinforcement learning system takes lot of time.", 'start': 31981.812, 'duration': 7.002}, {'end': 31991.754, 'text': 'okay, it looks very, very easy to explain.', 'start': 31988.814, 'duration': 2.94}, {'end': 31994.055, 'text': 'but development now you see the complexity.', 'start': 31991.754, 'duration': 2.301}, {'end': 31998.696, 'text': 'okay, we are actually talking about lot of like, say new data points coming in.', 'start': 31994.055, 'duration': 4.641}, {'end': 31999.576, 'text': 'all these things.', 'start': 31998.696, 'duration': 0.88}, {'end': 32004.217, 'text': 'now, coming up 
with a general framework to solve this problem is far from reality.', 'start': 31999.576, 'duration': 4.641}, {'end': 32030.674, 'text': 'Now let me go to the next one.', 'start': 32028.033, 'duration': 2.641}, {'end': 32034.595, 'text': 'This is like say I think almost all of us might have experienced.', 'start': 32031.234, 'duration': 3.361}, {'end': 32042.878, 'text': 'How do we actually teach baby to walk? This is exactly like previous one.', 'start': 32035.355, 'duration': 7.523}], 'summary': 'Defining corner cases for business scenarios is complex and time-consuming, with new data points adding to the challenge.', 'duration': 71.719, 'max_score': 31971.159, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-031971159.jpg'}, {'end': 32273.397, 'src': 'embed', 'start': 32242.687, 'weight': 17, 'content': [{'end': 32245.208, 'text': 'there is limit for simplification online also.', 'start': 32242.687, 'duration': 2.521}, {'end': 32248.73, 'text': 'it is basically for marketing in real term.', 'start': 32245.208, 'duration': 3.522}, {'end': 32253.013, 'text': 'in reality, I may not believe online learning.', 'start': 32248.73, 'duration': 4.283}, {'end': 32259.977, 'text': 'of course there are algorithms for online learning, ok, and reinforcement learning.', 'start': 32253.013, 'duration': 6.964}, {'end': 32266.441, 'text': 'if you actually take it as like, say, optimal actions and then states it is actually taking actions online,', 'start': 32259.977, 'duration': 6.464}, {'end': 32273.397, 'text': 'Maybe you can say my system is automated and taking actions online or in real time.', 'start': 32268.393, 'duration': 5.004}], 'summary': 'Online learning has limits and its effectiveness is viewed with skepticism, though algorithms exist for online learning and reinforcement learning.', 'duration': 30.71, 'max_score': 32242.687, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032242687.jpg'}], 'start': 29552.201, 'title': 'Reinforcement learning in decision making', 'summary': 'Discusses the application of reinforcement learning in decision-making, using examples from various domains. it emphasizes the importance of state changes, actions, rewards, and external factors, and highlights challenges such as inadequate training data and evolving targets.', 'chapters': [{'end': 29884.523, 'start': 29552.201, 'title': 'Reinforcement learning: making automated decisions', 'summary': 'Highlights the application of reinforcement learning in making automated decisions, illustrated by examples from aviation and missile systems, emphasizing the role of external factors, state changes, and rewards in decision-making processes.', 'duration': 332.322, 'highlights': ["Reinforcement learning's application in automated decision-making processes in aviation and missile systems, demonstrating the impact of external factors on state changes and decision outcomes, leading to a few meters of error in target hitting. The application of reinforcement learning in aviation and missile systems, where external factors impact state changes and decision outcomes, resulting in a few meters of error in hitting the target.", "The continuous decision-making process in reinforcement learning, involving the agent's interactions with the environment, state changes, and assessment of actions based on rewards to achieve the target. The continuous decision-making process in reinforcement learning, involving the agent's interactions with the environment, state changes, and assessment of actions based on rewards to achieve the target.", 'The concept of rewards in reinforcement learning, where actions leading closer to the target are rewarded, while actions deviating from the target are penalized to guide decision-making processes. 
The concept of rewards in reinforcement learning, where actions leading closer to the target are rewarded, while actions deviating from the target are penalized to guide decision-making processes.']}, {'end': 30369.403, 'start': 29884.523, 'title': 'Reinforcement learning process and examples', 'summary': 'Discusses the concept of reinforcement learning, using examples of training animals such as dogs, and highlights the process of state changes, actions, and rewards in the learning process.', 'duration': 484.88, 'highlights': ['The process of teaching animals through reinforcement learning is demonstrated using examples of teaching dogs through rewards and punishments based on their actions. Demonstration of reinforcement learning with examples of training dogs, using rewards and punishments based on their actions.', 'The concept of state changes, actions, and rewards is emphasized in the context of reinforcement learning, illustrating the logic behind the learning process. Emphasis on state changes, actions, and rewards in the context of reinforcement learning, illustrating the logic behind the learning process.', 'The discussion includes the quantification of the learning process, explaining how actions and rewards are observable entities that can be quantified. 
Explanation of quantifying the learning process, treating actions and rewards as observable and quantifiable entities.']}, {'end': 31005.994, 'start': 30369.403, 'title': 'Reinforcement learning basics', 'summary': 'Introduces the concept of reinforcement learning, explaining the continuous process of state changes, actions, rewards, and the policy in reinforcement learning, emphasizing the importance of optimal actions for given states and the output of reinforcement learning as a list of states and their optimal actions.', 'duration': 636.591, 'highlights': ['The policy in reinforcement learning is the strategy of choosing an action based on the state, with the optimal policy being the best action for a given state. Understanding the concept of policy in reinforcement learning and the significance of optimal policy for a given state.', 'The output of reinforcement learning is a list of all possible states followed by their optimal actions, resulting in just two columns of states and optimal actions in a CSV file. Highlighting the final output of reinforcement learning and the format in which it is presented, emphasizing the importance of optimal actions for different states.', 'The chapter discusses the need to change the optimal action when the environment or scenario changes, and the consideration of data required to make such decisions in production. Emphasizing the importance of adapting optimal actions when the environment or scenario changes and the practical application of this concept in production.']}, {'end': 31705.913, 'start': 31005.994, 'title': 'Reinforcement learning and optimal actions', 'summary': 'Discusses the concept of reinforcement learning, emphasizing the process of capturing states, taking optimal actions, and accumulating rewards over time, with a focus on long-term rewards and the training process. 
It also highlights the impact of reaction speed on accuracy and the iterative nature of training in self-driving car systems, stressing the need for continuous updates to capture new scenarios.', 'duration': 699.919, 'highlights': ['The process of capturing states, taking optimal actions, and accumulating rewards over time is emphasized, with a focus on long-term rewards and the training process. Capturing states, taking optimal actions, accumulating rewards, focus on long-term rewards and training process.', 'The impact of reaction speed on accuracy is highlighted, with faster reaction times resulting in hitting targets before others realize it. Faster reaction times lead to hitting targets before others realize it.', 'The iterative nature of training in self-driving car systems is emphasized, with the need for continuous updates to capture new scenarios. Iterative training process in self-driving car systems, continuous updates for capturing new scenarios.', 'The importance of capturing states and associated attributes, such as speed and external factors, for each frame in self-driving car systems is discussed. Importance of capturing states and associated attributes for each frame in self-driving car systems.']}, {'end': 31970.398, 'start': 31705.913, 'title': 'Challenges of automated learning', 'summary': 'The chapter discusses the challenges of automated learning, particularly in the context of self-driving cars, highlighting the evolving nature of targets in reinforcement learning, the impact of inadequate training data, and the need for extensive testing environments. It emphasizes the differences between reinforcement learning, unsupervised learning, and supervised learning, and the necessity of capturing and using useful data to prevent accidents.', 'duration': 264.485, 'highlights': ['The evolving nature of targets in reinforcement learning is emphasized, with the example of short-term targets in self-driving cars being evaluated every 100 meters. 
Evolving targets in reinforcement learning, short-term evaluation in self-driving cars', 'The impact of inadequate training data on automated systems is discussed, with the example of self-driving cars not being suitable for Indian roads due to the violation of lane discipline and other scenarios not part of the training data. Impact of inadequate training data on self-driving cars, unsuitability for Indian roads', "The need for extensive testing environments, particularly in the context of self-driving cars, is emphasized to ensure the system's capability to handle unexpected scenarios. Necessity of extensive testing environments for self-driving cars, handling unexpected scenarios", 'Differences between reinforcement learning, unsupervised learning, and supervised learning are highlighted, emphasizing the evolving nature of targets in reinforcement learning compared to fixed targets in supervised learning. Differences between reinforcement learning, unsupervised learning, and supervised learning, evolving targets in reinforcement learning', 'The necessity of capturing and using useful data to prevent accidents in automated systems is emphasized, highlighting the potential consequences of not detecting essential objects and scenarios. Necessity of capturing and using useful data, consequences of not detecting essential objects and scenarios']}, {'end': 32421.799, 'start': 31971.159, 'title': 'Reinforcement learning challenges', 'summary': 'Explains the complexities of reinforcement learning, using examples of teaching a baby to walk and the challenges of online learning, highlighting the need to define corner cases and the continuous cycle of interpreting actions and changing states.', 'duration': 450.64, 'highlights': ['The chapter explains the need to define all corner cases for business scenarios and the complexities of building reinforcement learning systems, which takes a lot of time. 
Defining all corner cases for business scenarios is essential; building reinforcement learning systems is time-consuming.', 'The process of teaching a baby to walk is compared to reinforcement learning, emphasizing the use of rewards and the interpretation of actions to drive behavior. Teaching a baby to walk involves using rewards to drive behavior and interpreting actions to guide learning.', 'The challenges of online learning and automated machine learning are discussed, highlighting the discrepancy between marketing claims and reality. Online learning and automated machine learning face challenges with marketing claims versus reality.', 'The continuous cycle of interpreting actions and changing states in reinforcement learning is emphasized, showing the complexity of the process. Reinforcement learning involves a continuous cycle of interpreting actions and changing states, illustrating its complexity.', 'Different algorithms and strategies are mentioned for efficient problem-solving in reinforcement learning, such as focusing on the next reward or adjusting present rewards in view of the end target. 
Efficient problem-solving in reinforcement learning involves considering different algorithms and strategies, such as focusing on the next reward or adjusting present rewards in view of the end target.']}], 'duration': 2869.598, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-029552201.jpg', 'highlights': ["Reinforcement learning's impact on state changes and decision outcomes in aviation and missile systems", 'Continuous decision-making process in reinforcement learning involving interactions with the environment', 'Concept of rewards guiding decision-making processes in reinforcement learning', 'Demonstration of reinforcement learning through examples of teaching dogs using rewards and punishments', 'Emphasis on state changes, actions, and rewards in the context of reinforcement learning', 'Understanding the concept of policy and the significance of optimal policy in reinforcement learning', 'Final output of reinforcement learning as a list of possible states and their optimal actions', 'Emphasizing the importance of adapting optimal actions when the environment or scenario changes', 'Emphasis on capturing states, taking optimal actions, and accumulating rewards in reinforcement learning', 'Impact of reaction speed on accuracy in reinforcement learning', 'Iterative training process and continuous updates in self-driving car systems', 'Evolving nature of targets and the impact of inadequate training data in reinforcement learning', 'Necessity of extensive testing environments for self-driving cars to handle unexpected scenarios', 'Differences between reinforcement learning, unsupervised learning, and supervised learning', 'Need for capturing and using useful data to prevent accidents in automated systems', 'Complexities of building reinforcement learning systems and defining all corner cases for business scenarios', 'Comparison of teaching a baby to walk to reinforcement learning', 'Challenges of online 
learning and automated machine learning', 'Continuous cycle of interpreting actions and changing states in reinforcement learning', 'Mention of different algorithms and strategies for efficient problem-solving in reinforcement learning']}, {'end': 33827.861, 'segs': [{'end': 32477.874, 'src': 'embed', 'start': 32452.102, 'weight': 0, 'content': [{'end': 32459.906, 'text': 'And RL from optimal control point of view are like say your aerospace applications or communication network applications ok.', 'start': 32452.102, 'duration': 7.804}, {'end': 32464.188, 'text': 'And then you have like say operations research right.', 'start': 32460.266, 'duration': 3.922}, {'end': 32466.369, 'text': 'Operations research it is pretty interesting problem.', 'start': 32464.528, 'duration': 1.841}, {'end': 32474.853, 'text': 'Say, for example, you have this big blood bank in the city and like, say, yes,', 'start': 32466.609, 'duration': 8.244}, {'end': 32477.874, 'text': 'blood supply is there and there is requirement also from different hospitals.', 'start': 32474.853, 'duration': 3.021}], 'summary': 'Rl applied in aerospace, communication networks, and operations research with blood bank supply management.', 'duration': 25.772, 'max_score': 32452.102, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032452102.jpg'}, {'end': 32674.71, 'src': 'embed', 'start': 32649.596, 'weight': 1, 'content': [{'end': 32655.344, 'text': 'he could theoretically prove it state aggregation, how to do in a continuous manner,', 'start': 32649.596, 'duration': 5.748}, {'end': 32661.626, 'text': 'how to actually aggregate this information and put it as a state right.', 'start': 32655.344, 'duration': 6.282}, {'end': 32665.327, 'text': 'So that is nothing but like say, some people call it as vectorization,', 'start': 32662.206, 'duration': 3.121}, {'end': 32671.249, 'text': 'some people call it as clustering and in RL terminology we say state 
aggregation.', 'start': 32665.327, 'duration': 5.922}, {'end': 32673.83, 'text': 'Continuous data.', 'start': 32673.129, 'duration': 0.701}, {'end': 32674.71, 'text': 'you have to put it into.', 'start': 32673.83, 'duration': 0.88}], 'summary': 'Techniques like vectorization, clustering, and state aggregation are used to process continuous data in reinforcement learning.', 'duration': 25.114, 'max_score': 32649.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032649596.jpg'}, {'end': 32785.955, 'src': 'embed', 'start': 32757.177, 'weight': 3, 'content': [{'end': 32768.203, 'text': 'you have your environment in that environment, you have states, rewards, actions right, you actually emit state and reward and this man, agent,', 'start': 32757.177, 'duration': 11.026}, {'end': 32769.225, 'text': 'he actually sends back action.', 'start': 32768.203, 'duration': 1.022}, {'end': 32772.743, 'text': 'because of that, something will change.', 'start': 32771.201, 'duration': 1.542}, {'end': 32775.225, 'text': 'here again, we actually start emitting.', 'start': 32772.743, 'duration': 2.482}, {'end': 32784.534, 'text': 'ok, this is what is exactly happening in your self driving car or any continuous systems.', 'start': 32775.225, 'duration': 9.309}, {'end': 32785.955, 'text': 'right, this is what I wrote here.', 'start': 32784.534, 'duration': 1.421}], 'summary': 'In a continuous environment, states, rewards, and actions are emitted, affecting the self-driving car or other systems.', 'duration': 28.778, 'max_score': 32757.177, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032757177.jpg'}, {'end': 32881.928, 'src': 'embed', 'start': 32850.578, 'weight': 2, 'content': [{'end': 32857.343, 'text': 'the process of reinforcement, learning, observing the environment, deciding how to act using some strategy.', 'start': 32850.578, 'duration': 6.765}, {'end': 32859.586, 
'text': 'observing the environment is nothing but state aggregation.', 'start': 32857.343, 'duration': 2.243}, {'end': 32868.392, 'text': 'Last 10 microseconds information let me put it as my state, right.', 'start': 32861.667, 'duration': 6.725}, {'end': 32873.256, 'text': 'In last 10 microseconds my vehicle traveled at like say 120 kilometer speed.', 'start': 32868.913, 'duration': 4.343}, {'end': 32881.928, 'text': 'right and coordinates are from here to there.', 'start': 32876.906, 'duration': 5.022}], 'summary': 'Reinforcement learning involves observing the environment and making decisions based on state aggregation, with the vehicle traveling at 120 kilometers per hour in the last 10 microseconds.', 'duration': 31.35, 'max_score': 32850.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032850578.jpg'}, {'end': 33088.886, 'src': 'embed', 'start': 33056.718, 'weight': 4, 'content': [{'end': 33065.262, 'text': 'so target is going for like, say, maximum reward, okay.', 'start': 33056.718, 'duration': 8.544}, {'end': 33070.535, 'text': 'so some of the standard examples for the reward okay.', 'start': 33065.262, 'duration': 5.273}, {'end': 33074.598, 'text': 'so this is just like, say, sample list.', 'start': 33070.535, 'duration': 4.063}, {'end': 33081.182, 'text': 'when you are actually building any solution for a given problem, you have to define your reward.', 'start': 33074.598, 'duration': 6.584}, {'end': 33088.886, 'text': 'say, for example, the dog receives a reward from a trainer for doing the activity, what trainer expected?', 'start': 33081.182, 'duration': 7.704}], 'summary': 'Targeting maximum reward in defining and giving examples of rewards for a solution.', 'duration': 32.168, 'max_score': 33056.718, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033056718.jpg'}, {'end': 33349.65, 'src': 'embed', 'start': 33323.072, 'weight': 5, 
'content': [{'end': 33326.595, 'text': 'if you lose one dollar money, negative reward.', 'start': 33323.072, 'duration': 3.523}, {'end': 33332.538, 'text': 'right, and then controlling a power station.', 'start': 33326.595, 'duration': 5.943}, {'end': 33336.782, 'text': 'now you have, like, say, if you are able to manage power efficiently, positive reward.', 'start': 33332.538, 'duration': 4.244}, {'end': 33341.385, 'text': 'any security, like, say, incidents, negative reward.', 'start': 33336.782, 'duration': 4.603}, {'end': 33346.008, 'text': 'just giving an idea like, say, how do we define rewards?', 'start': 33341.385, 'duration': 4.623}, {'end': 33349.65, 'text': 'each problem will have different rewards, different kind of reward definition.', 'start': 33346.008, 'duration': 3.642}], 'summary': 'Defining rewards based on managing power efficiently and addressing security incidents.', 'duration': 26.578, 'max_score': 33323.072, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033323072.jpg'}, {'end': 33432.509, 'src': 'embed', 'start': 33405.741, 'weight': 6, 'content': [{'end': 33411.07, 'text': 'so how systems will understand if that is a I mean negative reward or positive?', 'start': 33405.741, 'duration': 5.329}, {'end': 33411.991, 'text': 'We are defining it.', 'start': 33411.269, 'duration': 0.722}, {'end': 33417.375, 'text': 'We are defining that it is, no, ok, very good.', 'start': 33413.712, 'duration': 3.663}, {'end': 33427.464, 'text': 'Ok, how do we define rewards in that helicopter case? Any guess? 
Sensors, there should be sensors which actually measure the distance.', 'start': 33418.877, 'duration': 8.587}, {'end': 33429.567, 'text': 'Exactly. How close the distance is to the ground.', 'start': 33427.625, 'duration': 1.942}, {'end': 33432.509, 'text': 'Exactly. You actually took the action now, ok.', 'start': 33429.847, 'duration': 2.662}], 'summary': 'Defining rewards for systems using sensors to measure distance in the helicopter case.', 'duration': 26.768, 'max_score': 33405.741, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033405741.jpg'}, {'end': 33579.197, 'src': 'embed', 'start': 33556.52, 'weight': 7, 'content': [{'end': 33566.027, 'text': 'I mean where is the main information in reinforcement learning, the better you make the states and the better actions you make for those states.', 'start': 33556.52, 'duration': 9.507}, {'end': 33568.869, 'text': 'I mean what are the possible actions?', 'start': 33566.027, 'duration': 2.842}, {'end': 33570.69, 'text': 'what are the possible states?', 'start': 33568.869, 'duration': 1.821}, {'end': 33579.197, 'text': 'if you can actually get all possible states for our environment or for our problem now,', 'start': 33570.69, 'duration': 8.507}], 'summary': 'Improving states and actions in reinforcement learning for better performance.', 'duration': 22.677, 'max_score': 33556.52, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033556520.jpg'}], 'start': 32424.278, 'title': 'Reinforcement learning applications', 'summary': 'Discusses diverse applications of reinforcement learning, including optimal control in aerospace and communication networks, operations research in blood bank supply chain, and state aggregation for continuous data processing. 
It provides insights from leading Indian research scientists and emphasizes its role in decision-making processes and handling continuous data.', 'chapters': [{'end': 32649.596, 'start': 32424.278, 'title': 'Reinforcement learning applications', 'summary': 'Discusses the diverse applications of reinforcement learning, including optimal control in aerospace and communication networks, operations research in blood bank supply chain, and state aggregation for continuous data processing, with insights from leading Indian research scientists.', 'duration': 225.318, 'highlights': ['Reinforcement learning applications range from optimal control in aerospace and communication networks to operations research in blood bank supply chain and state aggregation for continuous data processing.', 'The problem of blood bank supply optimization, involving universal donors, acceptors, and specific groups, can be framed as a reinforcement learning problem, addressing supply chain challenges using artificial intelligence.', 'Reinforcement learning also finds applications in economics, utility theory, game theory, medical applications, neuroscience, and deep learning.', 'State aggregation for continuous data processing is a crucial aspect in reinforcement learning, as demonstrated empirically and theoretically by leading Indian research scientist Professor Borkar and his associates.']}, {'end': 32929.815, 'start': 32649.596, 'title': 'Understanding reinforcement learning', 'summary': 'Discusses the concept of state aggregation in reinforcement learning, emphasizing its role in automating decision-making processes and its application in handling continuous data, with a focus on the science of making optimal decisions and the interactions between agents and environments.', 'duration': 280.219, 'highlights': ['State aggregation in reinforcement learning involves consolidating information into states at regular intervals, such as every 100 milliseconds, to facilitate decision-making (e.g., 
vehicle speed, coordinates, fuel consumption) - emphasizing its application in handling continuous data.', 'Reinforcement learning is essential for automating decision-making processes, particularly in the absence of a target variable, and serves as the core of decision-making, enabling precision decision-making to a greater extent.', "The interaction between the agent and the environment in reinforcement learning involves the emission of states and rewards, followed by the agent's action, resulting in changes in the environment, highlighting its relevance in continuous systems and self-driving cars.", 'Observing the environment, deciding how to act using some strategy, and learning from experiences are key components of the reinforcement learning process, emphasizing the importance of state aggregation and the iterative nature of developing an optimal strategy.']}, {'end': 33405.741, 'start': 32935.795, 'title': 'Reinforcement learning: reward and optimization', 'summary': 'Discusses the concept of reward in reinforcement learning, emphasizing the need for defining custom rewards based on examples and scenarios, including the role of rewards in various applications like training animals, controlling systems, and playing games.', 'duration': 469.946, 'highlights': ['The concept of reward in reinforcement learning is essential, as it drives the optimization process towards maximizing the reward, and the need for defining custom rewards becomes apparent as different scenarios and examples demonstrate unique reward definitions. various examples and scenarios', 'The role of rewards is evident in applications such as training animals, controlling systems like auto-driving helicopters, managing investment portfolios, controlling power stations, and enabling humanoid robots to walk, each with its own unique reward definition. 
applications: training animals, controlling systems, managing investment portfolios, controlling power stations, enabling humanoid robots to walk', 'The application of reinforcement learning in playing games, particularly Atari games, highlights the potential for systems to become highly intelligent and increasingly difficult to control, as exemplified by the success of AlphaGo. success of AlphaGo']}, {'end': 33827.861, 'start': 33405.741, 'title': 'Defining rewards in reinforcement learning', 'summary': 'Explains the process of defining rewards in reinforcement learning, the significance of environment perception as state, and the importance of capturing all possible states and actions for training a model, mentioning the association of actions with states in the optimal policy.', 'duration': 422.12, 'highlights': ['The process of defining rewards in reinforcement learning is crucial for training the system, as it involves measuring the distance from the ground using sensors and determining the impact of actions on the altitude, which results in either negative or positive rewards.', "The environment perception, referred to as state, holds significant importance in reinforcement learning, encompassing the vehicle's image and attributes, and capturing all possible states for the environment or problem is essential for effective model training.", 'The association of actions with states in the optimal policy is a fundamental aspect in reinforcement learning, where the optimal action is linked to a given state, and capturing all possible states and actions is essential for training a model effectively.']}], 'duration': 1403.583, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-032424278.jpg', 'highlights': ['Reinforcement learning applications range from optimal control in aerospace and communication networks to operations research in blood bank supply chain and state aggregation for continuous data processing.', 
'State aggregation for continuous data processing is a crucial aspect in reinforcement learning, as demonstrated empirically and theoretically by leading Indian research scientist Professor Borkar and his associates.', 'State aggregation in reinforcement learning involves consolidating information into states at regular intervals, such as every 100 milliseconds, to facilitate decision-making (e.g., vehicle speed, coordinates, fuel consumption) - emphasizing its application in handling continuous data.', "The interaction between the agent and the environment in reinforcement learning involves the emission of states and rewards, followed by the agent's action, resulting in changes in the environment, highlighting its relevance in continuous systems and self-driving cars.", 'The concept of reward in reinforcement learning is essential, as it drives the optimization process towards maximizing the reward, and the need for defining custom rewards becomes apparent as different scenarios and examples demonstrate unique reward definitions.', 'The role of rewards is evident in applications such as training animals, controlling systems like auto-driving helicopters, managing investment portfolios, controlling power stations, and enabling humanoid robots to walk, each with its own unique reward definition.', 'The process of defining rewards in reinforcement learning is crucial for training the system, as it involves measuring the distance from the ground using sensors and determining the impact of actions on the altitude, which results in either negative or positive rewards.', 'The association of actions with states in the optimal policy is a fundamental aspect in reinforcement learning, where the optimal action is linked to a given state, and capturing all possible states and actions is essential for training a model effectively.']}, {'end': 35411.977, 'segs': [{'end': 33909.845, 'src': 'embed', 'start': 33883.239, 'weight': 1, 'content': [{'end': 33887.848, 'text': 'few 
examples through which we can actually develop the intuition.', 'start': 33883.239, 'duration': 4.609}, {'end': 33893.634, 'text': 'and if at all we want to deep dive, we have to write our own frameworks.', 'start': 33887.848, 'duration': 5.786}, {'end': 33896.477, 'text': 'ok, so this is OpenAI Gym.', 'start': 33893.634, 'duration': 2.843}, {'end': 33901.7, 'text': 'you can actually do pip install gym and you can start using it.', 'start': 33896.477, 'duration': 5.223}, {'end': 33905.923, 'text': 'and you see, the documentation is not as good as in scikit-learn.', 'start': 33901.7, 'duration': 4.223}, {'end': 33908.604, 'text': 'scikit-learn has pretty good documentation.', 'start': 33905.923, 'duration': 2.681}, {'end': 33909.845, 'text': 'how do we solve the problem?', 'start': 33908.604, 'duration': 1.241}], 'summary': 'Discussion on using OpenAI Gym for developing intuition and comparing its documentation with scikit-learn.', 'duration': 26.606, 'max_score': 33883.239, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033883239.jpg'}, {'end': 33992.452, 'src': 'embed', 'start': 33959.434, 'weight': 0, 'content': [{'end': 33960.816, 'text': "what is your daddy's name?", 'start': 33959.434, 'duration': 1.382}, {'end': 33969.161, 'text': 'daddy? 
right, there is no model per se, except k equals to 3 or k equals to 10.', 'start': 33960.816, 'duration': 8.345}, {'end': 33979.026, 'text': 'right, distance calculation, exactly, and in Q-learning it is model-free, you just have a .csv file as an output.', 'start': 33969.161, 'duration': 9.865}, {'end': 33980.386, 'text': 'you just have a .csv file.', 'start': 33979.026, 'duration': 1.36}, {'end': 33984.869, 'text': "that's it, ok.", 'start': 33980.386, 'duration': 4.483}, {'end': 33987.59, 'text': 'so it is model-free and off-policy.', 'start': 33984.869, 'duration': 2.721}, {'end': 33992.452, 'text': 'which means it is not required to follow a specific policy while training.', 'start': 33987.59, 'duration': 4.862}], 'summary': 'Q-learning is compared with a distance-based method that has no model beyond k=3 or k=10: it is model-free and off-policy, and its output is just a .csv file.', 'duration': 33.018, 'max_score': 33959.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033959434.jpg'}, {'end': 34068.583, 'src': 'embed', 'start': 34045.011, 'weight': 3, 'content': [{'end': 34052.793, 'text': 'so you actually start repeating this as a you know, like say, information again and again at some like say, at some stage you actually converge.', 'start': 34045.011, 'duration': 7.782}, {'end': 34055.617, 'text': 'that means your Q table will not be this.', 'start': 34052.793, 'duration': 2.824}, {'end': 34060.039, 'text': 'these results, these values will not be getting updated much.', 'start': 34055.617, 'duration': 4.422}, {'end': 34062.38, 'text': 'that is what we call convergence.', 'start': 34060.039, 'duration': 2.341}, {'end': 34065.021, 'text': 'ok, now how do I use it in production,', 'start': 34062.38, 'duration': 2.641}, {'end': 34068.583, 'text': 'or how do I actually say what is the best one for state one?', 'start': 34065.021, 'duration': 3.562}], 'summary': 'Repeating information leads to convergence, reducing updates. 
applies to production and determining best state.', 'duration': 23.572, 'max_score': 34045.011, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034045011.jpg'}, {'end': 34166.975, 'src': 'embed', 'start': 34126.456, 'weight': 4, 'content': [{'end': 34128.337, 'text': 'that will give me my action.', 'start': 34126.456, 'duration': 1.881}, {'end': 34129.917, 'text': 'optimal action.', 'start': 34128.337, 'duration': 1.58}, {'end': 34138.339, 'text': 'arg max is nothing but the place where the maximum occurs, that index will be returned, correct, right.', 'start': 34129.917, 'duration': 8.422}, {'end': 34143.601, 'text': 'so Q-learning basically prepares this table and from that we basically read off the optimal actions.', 'start': 34138.339, 'duration': 5.262}, {'end': 34143.901, 'text': "that's it.", 'start': 34143.601, 'duration': 0.3}, {'end': 34152.73, 'text': 'and this is one of the simplest algorithms and I feel like, say, RL is more natural than any other supervised or unsupervised learning.', 'start': 34146.348, 'duration': 6.382}, {'end': 34160.453, 'text': 'So you think that chatbot is simple, but for some other guy who actually has a lot of fancy for cars, he may think car is very good.', 'start': 34153.19, 'duration': 7.263}, {'end': 34166.975, 'text': 'reason is like, say, like looking at the photo, and identifying is much better than understanding some multiple languages.', 'start': 34160.453, 'duration': 6.522}], 'summary': 'Q-learning builds a table from which the optimal action is read off with arg max; it is a simple algorithm, and the speaker finds RL more natural than supervised or unsupervised learning.', 'duration': 40.519, 'max_score': 34126.456, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034126456.jpg'}, {'end': 34274.31, 'src': 'embed', 'start': 34243.785, 'weight': 2, 'content': [{'end': 34244.506, 'text': 'But this is the base.', 'start': 34243.785, 'duration': 0.721}, {'end': 
34252.633, 'text': 'So at the end, now, when you actually look at it, this problem like, say,', 'start': 34248.45, 'duration': 4.183}, {'end': 34257.977, 'text': 'I think defining reinforcement learning is much easier compared to any other supervised or unsupervised learning.', 'start': 34252.633, 'duration': 5.344}, {'end': 34267.204, 'text': 'It is very easy to actually see what is the problem and it is very easy to actually define what is reward, what is state, what is action.', 'start': 34258.338, 'duration': 8.866}, {'end': 34274.31, 'text': 'Whatever you are saying about the chatbot, maybe you are the right person to actually define it.', 'start': 34269.626, 'duration': 4.684}], 'summary': 'Defining a reinforcement learning problem (reward, state, action) is easier than defining a supervised or unsupervised learning problem.', 'duration': 30.525, 'max_score': 34243.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034243785.jpg'}, {'end': 34329.37, 'src': 'embed', 'start': 34295.003, 'weight': 8, 'content': [{'end': 34300.525, 'text': 'Your Q table starts with all zeros, okay.', 'start': 34295.003, 'duration': 5.522}, {'end': 34302.425, 'text': 'And from there onwards you actually start updating.', 'start': 34300.585, 'duration': 1.84}, {'end': 34308.238, 'text': 'And also if you look at state 3, multiple 0s are there, which action will you take? Random action.', 'start': 34303.735, 'duration': 4.503}, {'end': 34316.101, 'text': 'Among all those 0s, among all those, so like say here you have like 1, 2, 3.', 'start': 34308.518, 'duration': 7.583}, {'end': 34318.484, 'text': 'Among these 3 actions, a random action will be taken.', 'start': 34316.102, 'duration': 2.382}, {'end': 34329.37, 'text': 'When we are starting with training, so this Q table, the size that you said, S into A, do we already know how many states are there? 
Yes, of course.', 'start': 34318.504, 'duration': 10.866}], 'summary': 'Updating the Q table from zeros, taking random actions among tied values, and knowing the number of states in advance.', 'duration': 34.367, 'max_score': 34295.003, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034295003.jpg'}, {'end': 34397.047, 'src': 'embed', 'start': 34344.468, 'weight': 10, 'content': [{'end': 34348.348, 'text': 'There are chances that you may miss because there might be many scenarios.', 'start': 34344.468, 'duration': 3.88}, {'end': 34356.03, 'text': 'Right. While the system is running, if it comes across an unknown state, then it cannot act.', 'start': 34351.209, 'duration': 4.821}, {'end': 34360.531, 'text': 'Basically, that is why you have k-means clustering to help with quantization.', 'start': 34356.43, 'duration': 4.101}, {'end': 34366.339, 'text': 'This latest new state which was not experienced in the history is similar to so and so.', 'start': 34362.178, 'duration': 4.161}, {'end': 34369.16, 'text': 'For instance, here you are regarding when it is higher.', 'start': 34366.519, 'duration': 2.641}, {'end': 34372.14, 'text': 'Let us take the example of a helicopter.', 'start': 34369.88, 'duration': 2.26}, {'end': 34377.501, 'text': 'So when you are going up the altitude high, then your height should increase.', 'start': 34372.22, 'duration': 5.281}, {'end': 34381.782, 'text': 'My reward state will be something based on that action.', 'start': 34377.961, 'duration': 3.821}, {'end': 34386.543, 'text': 'Now you reached a state, after that it has to be reversed.', 'start': 34382.302, 'duration': 4.241}, {'end': 34389.464, 'text': 'So how do you identify that in Q-learning?', 'start': 34386.983, 'duration': 2.481}, {'end': 34392.906, 'text': 'So Q-learning will not identify that.', 'start': 34390.106, 'duration': 2.8}, {'end': 34397.047, 'text': 'okay, let me actually answer that one.', 'start': 34392.906, 'duration': 4.141}], 'summary': 'K-means 
clustering aids in identifying new states for reinforcement learning, with a focus on quantization and action-based reward systems.', 'duration': 52.579, 'max_score': 34344.468, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034344468.jpg'}, {'end': 34601.567, 'src': 'embed', 'start': 34574.717, 'weight': 12, 'content': [{'end': 34580.132, 'text': 'one of those actions will be performing for a given state, But one of those actions will be the right one.', 'start': 34574.717, 'duration': 5.415}, {'end': 34582.794, 'text': 'others will be like, say, deviating.', 'start': 34580.132, 'duration': 2.662}, {'end': 34583.775, 'text': 'others will be wrong actions.', 'start': 34582.794, 'duration': 0.981}, {'end': 34590.041, 'text': 'Make sense right? Is it clear or not? Ok.', 'start': 34585.757, 'duration': 4.284}, {'end': 34593.905, 'text': 'Now, how do we update that Q table?', 'start': 34590.602, 'duration': 3.303}, {'end': 34597.166, 'text': 'okay, that Q table.', 'start': 34595.686, 'duration': 1.48}, {'end': 34601.567, 'text': 'if I actually write it slightly differently, let me put it here the equation.', 'start': 34597.166, 'duration': 4.401}], 'summary': 'Actions in a given state must be right to update q table.', 'duration': 26.85, 'max_score': 34574.717, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034574717.jpg'}, {'end': 34814.755, 'src': 'embed', 'start': 34788.008, 'weight': 7, 'content': [{'end': 34793.87, 'text': 'actually, you will be actually leading to 2 right.', 'start': 34788.008, 'duration': 5.862}, {'end': 34794.51, 'text': 'are you following now?', 'start': 34793.87, 'duration': 0.64}, {'end': 34797.575, 'text': 'So this will be repeated.', 'start': 34796.474, 'duration': 1.101}, {'end': 34802.221, 'text': 'Positive actions, like say, actions leading to positive rewards are getting accumulated.', 'start': 34798.116, 'duration': 
4.105}, {'end': 34807.546, 'text': 'Negative rewards are getting accumulated, but negative side.', 'start': 34804.743, 'duration': 2.803}, {'end': 34814.755, 'text': 'Yes Whatever it might be, like it might be sensors which are detecting what is going on in the environment, right? So it understands the state.', 'start': 34807.707, 'duration': 7.048}], 'summary': 'Positive actions lead to accumulation of positive rewards, while negative actions lead to accumulation of negative rewards.', 'duration': 26.747, 'max_score': 34788.008, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034788008.jpg'}, {'end': 34859.775, 'src': 'embed', 'start': 34832.682, 'weight': 14, 'content': [{'end': 34836.404, 'text': 'like I said, there is no designated way of saying okay, this is how we have to define for this problem.', 'start': 34832.682, 'duration': 3.722}, {'end': 34846.512, 'text': 'How do people do it today? Like, do they do it as a post-process? Like they capture state and action? randomly and then assign rewards? 
Yeah.', 'start': 34836.965, 'duration': 9.547}, {'end': 34851.593, 'text': 'Say for example, let me see this standard problems where you can see quantitative reward.', 'start': 34846.532, 'duration': 5.061}, {'end': 34859.775, 'text': "When I buy a stock, what is the return I'm getting by the end of the day is my reward for that action.", 'start': 34852.854, 'duration': 6.921}], 'summary': 'Discussion on defining problems, capturing state and action, and quantifying rewards in stock trading.', 'duration': 27.093, 'max_score': 34832.682, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034832682.jpg'}, {'end': 34974.613, 'src': 'embed', 'start': 34928.747, 'weight': 15, 'content': [{'end': 34934.989, 'text': 'My reward is reaching the destination, hitting the enemy and that may have like say one million dollars as reward.', 'start': 34928.747, 'duration': 6.242}, {'end': 34937.229, 'text': 'So learning can happen at two stages here.', 'start': 34935.409, 'duration': 1.82}, {'end': 34944.47, 'text': 'One is online or when the thing is actually being trained actively versus offline when the data is being collected.', 'start': 34937.669, 'duration': 6.801}, {'end': 34948.191, 'text': 'Oh, when, when, no, no, no.', 'start': 34944.97, 'duration': 3.221}, {'end': 34950.251, 'text': "Actually that doesn't happen.", 'start': 34948.471, 'duration': 1.78}, {'end': 34954.112, 'text': 'Okay The online, whatever you are saying actually is part of training.', 'start': 34950.271, 'duration': 3.841}, {'end': 34958.848, 'text': 'The online whatever you are referring is actually training.', 'start': 34956.707, 'duration': 2.141}, {'end': 34961.069, 'text': 'So which means you have to repeat it a lot of time.', 'start': 34959.488, 'duration': 1.581}, {'end': 34961.529, 'text': 'Of course.', 'start': 34961.209, 'duration': 0.32}, {'end': 34965.73, 'text': 'That is where like say it takes time to build the system.', 'start': 
34963.129, 'duration': 2.601}, {'end': 34970.452, 'text': 'And the better actually you train the more robust it is.', 'start': 34967.711, 'duration': 2.741}, {'end': 34974.613, 'text': 'Yeah, we can actually give whatever nomenclature is possible to this baby.', 'start': 34970.972, 'duration': 3.641}], 'summary': 'Training online requires repetition for robust system. potential reward: $1 million.', 'duration': 45.866, 'max_score': 34928.747, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-034928747.jpg'}, {'end': 35082.133, 'src': 'embed', 'start': 35048.821, 'weight': 6, 'content': [{'end': 35050.142, 'text': 'And we will be introducing some of them.', 'start': 35048.821, 'duration': 1.321}, {'end': 35052.544, 'text': 'This is what the updation mechanism is.', 'start': 35050.783, 'duration': 1.761}, {'end': 35053.325, 'text': 'Q table.', 'start': 35052.885, 'duration': 0.44}, {'end': 35055.167, 'text': 'Now there are few parameters.', 'start': 35054.046, 'duration': 1.121}, {'end': 35059.191, 'text': 'We have learning rate, discount factor.', 'start': 35055.928, 'duration': 3.263}, {'end': 35068.569, 'text': 'okay, when you actually look at the equation, the max maximum of next state.', 'start': 35060.447, 'duration': 8.122}, {'end': 35071.51, 'text': 'okay for all the possible actions.', 'start': 35068.569, 'duration': 2.941}, {'end': 35079.652, 'text': 'you are becoming greedy right for all the actions in the next state.', 'start': 35071.51, 'duration': 8.142}, {'end': 35082.133, 'text': 'once you perform action on a t on s?', 'start': 35079.652, 'duration': 2.481}], 'summary': 'Introducing updates, q table parameters: learning rate, discount factor, and equation analysis.', 'duration': 33.312, 'max_score': 35048.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035048821.jpg'}, {'end': 35226.119, 'src': 'embed', 'start': 35197.948, 'weight': 17, 
'content': [{'end': 35203.009, 'text': 'say, for example, one reinforcement learning convergence, I mean one model convergence if it is taking like, say,', 'start': 35197.948, 'duration': 5.061}, {'end': 35206.579, 'text': 'one week, then for a hundred parameter combinations,', 'start': 35203.009, 'duration': 3.57}, {'end': 35210.163, 'text': 'now we have to take a cloud subscription and then, like,', 'start': 35206.579, 'duration': 3.584}, {'end': 35220.453, 'text': 'say, in an independent manner you are actually trying all those hundred reinforcement learning models and seeing, like,', 'start': 35210.163, 'duration': 10.29}, {'end': 35226.119, 'text': "how your system is performing and that's how you actually decide the learning parameter.", 'start': 35220.453, 'duration': 5.666}], 'summary': 'If one model takes about a week to converge, trying 100 parameter combinations means running the models independently on a cloud subscription to decide the learning parameters.', 'duration': 28.171, 'max_score': 35197.948, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035197948.jpg'}, {'end': 35363.245, 'src': 'embed', 'start': 35329.168, 'weight': 18, 'content': [{'end': 35334.671, 'text': 'and when you actually look at this Q-learning, oh, it is so simple, right?', 'start': 35329.168, 'duration': 5.503}, {'end': 35336.452, 'text': 'actually, hard work is hidden here.', 'start': 35334.671, 'duration': 1.781}, {'end': 35340.574, 'text': 'states and actions, preparation, state action.', 'start': 35336.452, 'duration': 4.122}, {'end': 35344.43, 'text': 'yeah, states preparation is hidden.', 'start': 35340.574, 'duration': 3.856}, {'end': 35349.915, 'text': 'that is where actually all our blood will be squeezed.', 'start': 35344.43, 'duration': 5.485}, {'end': 35354.598, 'text': 'ok, the IP is state preparation.', 'start': 35349.915, 'duration': 4.683}, {'end': 35358.822, 'text': 'Q-learning is known to everyone now.', 'start': 35354.598, 'duration': 4.224}, {'end': 35360.323, 'text': 'different 
variants of Q learning also.', 'start': 35358.822, 'duration': 1.501}, {'end': 35363.245, 'text': 'people know now.', 'start': 35360.323, 'duration': 2.922}], 'summary': 'Understanding q learning involves states, actions, and hard work, with various variants known to people.', 'duration': 34.077, 'max_score': 35329.168, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035329168.jpg'}], 'start': 33827.861, 'title': 'Q learning in rl', 'summary': 'Covers the q learning framework, its implementation in reinforcement learning, accumulation of rewards, q-table preparation, system training, and q-table update process, highlighting its simplicity and natural approach in solving reinforcement learning problems.', 'chapters': [{'end': 33992.452, 'start': 33827.861, 'title': 'Reinforcement learning: q learning framework', 'summary': 'Discusses the q learning framework for reinforcement learning, emphasizing the scarcity of frameworks available and highlighting the q learning algorithm as a means to solve reinforcement learning problems, with a focus on its model-free nature and ease of implementation.', 'duration': 164.591, 'highlights': ['The Q learning algorithm is highlighted as a proposed solution for reinforcement learning problems, with an emphasis on its model-free nature and ease of implementation through a dot csv file output.', 'The scarcity of frameworks available for reinforcement learning algorithms is mentioned, with a comparison to scikit-learn and the suggestion that open AI gym is a matured framework with examples to develop intuition.', 'The need for defined states, actions, and a method for training the system in reinforcement learning is discussed, emphasizing the goal to solve problems and the challenges posed by the lack of available frameworks.']}, {'end': 34213.725, 'start': 33992.452, 'title': 'Q learning in rl', 'summary': 'Explains q learning in reinforcement learning, highlighting the process 
of accumulating rewards for states, converging to optimal actions, and preparing a table for optimal actions, which is one of the simplest algorithms in rl with a natural approach.', 'duration': 221.273, 'highlights': ['The Q table converges when the values are not updated much, indicating convergence. The Q table converges when the values are not updated much, signifying convergence in the learning process.', 'Identifying optimal actions is based on the maximum value in the Q table for each state. Identifying optimal actions is based on the maximum value in the Q table for each state, guiding the selection of actions.', 'RL is considered more natural than supervised or unsupervised learning, with Q learning being one of the simplest algorithms. Q learning is highlighted as one of the simplest algorithms in reinforcement learning, considered more natural than other learning approaches.', 'The Q table is repeatedly updated to prepare for optimal actions in the learning process. The Q table is repeatedly updated to prepare for optimal actions, demonstrating the iterative nature of the learning process.', 'The process involves accumulating positive rewards for good states and negative rewards for bad states. 
The process involves accumulating positive rewards for good states and negative rewards for bad states, contributing to the learning and decision-making process.']}, {'end': 34574.717, 'start': 34214.225, 'title': 'Understanding q-learning in reinforcement learning', 'summary': 'Delves into the fundamentals of q-learning in reinforcement learning, discussing the process of training the system, the challenges of unknown states, and the actions and rewards associated with different states, with a focus on the example of a helicopter simulation.', 'duration': 360.492, 'highlights': ['The process of training reinforcement learning algorithms starts with a Q table initialized with zeros, and the table is updated over time, demonstrating the iterative nature of learning (e.g. starting with zeros and updating)', 'In reinforcement learning, the knowledge of the states is crucial for training the system, as the system needs to quantify the states and their associated actions to make informed decisions (e.g. discussing the importance of knowing the number of states for effective training)', 'The utilization of k-means clustering is highlighted as a method to address unknown states in reinforcement learning, aiding in the quantization of new states by comparing them to previously experienced states (e.g. using k-means clustering to help in quantization of new states)', 'The example of a helicopter simulation is used to illustrate the association between actions and rewards in different states, emphasizing the importance of identifying the right actions based on the state to achieve favorable outcomes (e.g. 
discussing the actions and rewards associated with different states in a helicopter simulation)']}, {'end': 34851.593, 'start': 34574.717, 'title': 'Reinforcement learning q-table update', 'summary': 'Discusses the process of updating a q table in reinforcement learning, involving actions, rewards, states, and learning rates, with a focus on accumulating positive rewards and defining the reward for a problem.', 'duration': 276.876, 'highlights': ['The Q table is updated based on actions, rewards, and states, with an emphasis on accumulating positive rewards and handling negative rewards.', 'The process involves initializing the Q table with zeros, updating rewards based on learning rates and discount factors, and determining the best rewards for each state and action pair.', 'The transcript explores the challenge of defining rewards for reinforcement learning problems, highlighting the absence of a designated way and the need for defining rewards based on the specific problem.']}, {'end': 35411.977, 'start': 34852.854, 'title': 'Understanding rewards and learning in reinforcement', 'summary': 'Discusses the concept of rewards and learning in reinforcement, emphasizing the importance of defining rewards, training mechanisms, and computational challenges, with the q-learning algorithm as a focal point.', 'duration': 559.123, 'highlights': ['The importance of defining rewards in reinforcement learning is highlighted, with examples such as the reward of reaching a destination or hitting the enemy, which may have quantifiable values such as one million dollars. 
Learning can occur online during active training or offline during data collection.', 'The training process involves repetition to build a robust system, with the more robust training leading to a more robust system, and various learning mechanisms such as superficial learning, unsupervised learning, and ambiguity playing a role.', 'The computational challenges in reinforcement learning, including the time-consuming process of model convergence, the need for cloud subscriptions for efficient computation, and the absence of readily available libraries, are emphasized. The process of grid search and the necessity to write custom wrappers are mentioned as part of the model training.', 'The Q-learning algorithm is detailed, including the nine main steps involved in the learning process, the hidden complexity in state preparation, and the limited availability of frameworks for reinforcement learning, leading to sparse implementation in academic institutes in India.']}], 'duration': 1584.116, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-033827861.jpg', 'highlights': ['Q learning algorithm is a model-free solution for reinforcement learning, emphasizing ease of implementation through a dot csv file output.', 'Open AI gym is a matured framework with examples to develop intuition, addressing the scarcity of available reinforcement learning frameworks.', 'Reinforcement learning requires defined states, actions, and a method for training the system, emphasizing the goal to solve problems and the challenges posed by the lack of available frameworks.', 'Q table converges when the values are not updated much, indicating convergence in the learning process.', 'Identifying optimal actions is based on the maximum value in the Q table for each state, guiding the selection of actions.', 'Q learning is highlighted as one of the simplest algorithms in reinforcement learning, considered more natural than other learning 
approaches.', 'The Q table is repeatedly updated to prepare for optimal actions, demonstrating the iterative nature of the learning process.', 'The process involves accumulating positive rewards for good states and negative rewards for bad states, contributing to the learning and decision-making process.', 'The process of training reinforcement learning algorithms starts with a Q table initialized with zeros, and the table is updated over time, demonstrating the iterative nature of learning.', 'In reinforcement learning, the knowledge of the states is crucial for training the system, as the system needs to quantify the states and their associated actions to make informed decisions.', 'The utilization of k-means clustering is highlighted as a method to address unknown states in reinforcement learning, aiding in the quantization of new states by comparing them to previously experienced states.', 'The example of a helicopter simulation is used to illustrate the association between actions and rewards in different states, emphasizing the importance of identifying the right actions based on the state to achieve favorable outcomes.', 'The Q table is updated based on actions, rewards, and states, with an emphasis on accumulating positive rewards and handling negative rewards.', 'The process involves initializing the Q table with zeros, updating rewards based on learning rates and discount factors, and determining the best rewards for each state and action pair.', 'The transcript explores the challenge of defining rewards for reinforcement learning problems, highlighting the absence of a designated way and the need for defining rewards based on the specific problem.', 'The importance of defining rewards in reinforcement learning is highlighted, with examples such as the reward of reaching a destination or hitting the enemy, which may have quantifiable values such as one million dollars.', 'The training process involves repetition to build a robust system, with the more 
robust training leading to a more robust system, and various learning mechanisms such as superficial learning, unsupervised learning, and ambiguity playing a role.', 'The computational challenges in reinforcement learning, including the time-consuming process of model convergence, the need for cloud subscriptions for efficient computation, and the absence of readily available libraries, are emphasized.', 'The Q-learning algorithm is detailed, including the nine main steps involved in the learning process, the hidden complexity in state preparation, and the limited availability of frameworks for reinforcement learning, leading to sparse implementation in academic institutes in India.']}, {'end': 38189.611, 'segs': [{'end': 35690.417, 'src': 'embed', 'start': 35658.312, 'weight': 0, 'content': [{'end': 35662.992, 'text': 'whenever proper passenger is picked up, reward is 20..', 'start': 35658.312, 'duration': 4.68}, {'end': 35666.573, 'text': 'drop twenty, ok, and we can actually redefine those things.', 'start': 35662.992, 'duration': 3.581}, {'end': 35671.674, 'text': 'when we are coding by default, these values are there.', 'start': 35666.573, 'duration': 5.101}, {'end': 35675.954, 'text': 'so now we are getting the reward, the meaning right.', 'start': 35671.674, 'duration': 4.28}, {'end': 35678.655, 'text': 'so for proper pick up, big reward.', 'start': 35675.954, 'duration': 2.701}, {'end': 35682.375, 'text': 'for proper drop up, big reward.', 'start': 35678.655, 'duration': 3.72}, {'end': 35690.417, 'text': 'roaming around freely, penalty after picking up if you are not reaching the destination in shorter time for every move.', 'start': 35682.375, 'duration': 8.042}], 'summary': 'Proper pick up earns 20 rewards; penalty for delayed drop-off.', 'duration': 32.105, 'max_score': 35658.312, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035658312.jpg'}, {'end': 35749.438, 'src': 'embed', 'start': 35715.616, 
'weight': 1, 'content': [{'end': 35719.078, 'text': 'ok, this is what, like, say, the rectangle I have in front of me.', 'start': 35715.616, 'duration': 3.462}, {'end': 35721.74, 'text': 'let me discretize that rectangle into 5 by 5 square.', 'start': 35719.078, 'duration': 2.662}, {'end': 35730.374, 'text': 'the taxi can be in any of these 5 by 5 square.', 'start': 35723.427, 'duration': 6.947}, {'end': 35737.3, 'text': 'ok, the drop and the pick up locations could be any anywhere in this things.', 'start': 35730.374, 'duration': 6.926}, {'end': 35741.164, 'text': 'ok, there are 4 locations where it could be drop.', 'start': 35737.3, 'duration': 3.864}, {'end': 35749.438, 'text': 'it could be pick up as well, right Now.', 'start': 35741.164, 'duration': 8.274}], 'summary': 'Discretizing a rectangle into 5x5 squares for taxi locations and drop/pick-up locations.', 'duration': 33.822, 'max_score': 35715.616, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035715616.jpg'}, {'end': 37254.813, 'src': 'embed', 'start': 37217.081, 'weight': 3, 'content': [{'end': 37235.204, 'text': 'but in general, for the next step you see when I am doing it took 2126 steps to pick up a passenger and drop him.', 'start': 37217.081, 'duration': 18.123}, {'end': 37239.367, 'text': 'ok, and now let us get into Q, learning way of doing things.', 'start': 37235.204, 'duration': 4.163}, {'end': 37242.85, 'text': 'So I can actually so.', 'start': 37241.229, 'duration': 1.621}, {'end': 37247.134, 'text': 'whenever you are not actually dropping the passenger, reward is not 20.', 'start': 37242.85, 'duration': 4.284}, {'end': 37249.536, 'text': 'when you are picking up the passenger, it is taking reward as 10.', 'start': 37247.134, 'duration': 2.402}, {'end': 37254.813, 'text': 'when you are hitting the wall, your reward is minus 10..', 'start': 37249.536, 'duration': 5.277}], 'summary': 'In the q learning process, it took 2126 steps to pick 
up and drop a passenger, with rewards of 10 for picking up, -10 for hitting the wall, and a reward of 20 only when the passenger is dropped off.', 'duration': 37.732, 'max_score': 37217.081, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-037217081.jpg'}, {'end': 37336.211, 'src': 'embed', 'start': 37310.389, 'weight': 2, 'content': [{'end': 37318.799, 'text': 'so for episode I am saying like say, while reward is not equals to twenty, take the action.', 'start': 37310.389, 'duration': 8.41}, {'end': 37320.54, 'text': 'you are actually this is the state.', 'start': 37318.799, 'duration': 1.741}, {'end': 37324.783, 'text': 'take the action, best action from the table.', 'start': 37320.54, 'duration': 4.243}, {'end': 37330.147, 'text': 'and then you are saying like, say, environment, dot, step of action, best action.', 'start': 37324.783, 'duration': 5.364}, {'end': 37333.169, 'text': 'and for that best action, what was the reward?', 'start': 37330.147, 'duration': 3.022}, {'end': 37336.211, 'text': 'you are repeating it till your reward is twenty.', 'start': 37333.169, 'duration': 3.042}], 'summary': 'Using the Q table, take the best action at each step until the reward reaches 20.', 'duration': 25.822, 'max_score': 37310.389, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-037310389.jpg'}], 'start': 35411.977, 'title': 'Reinforcement learning in smart taxi', 'summary': 'Explores reinforcement learning through examples of smart taxi and walking in a frozen lake, emphasizing practical implementation over theoretical knowledge, with details on defining rewards, penalties, and state aggregation in RL. 
It discusses the taxi grid problem, teaching a cab to pick and drop passengers, implementing reinforcement learning with a Gym environment, and training a Q table with 2000 iterations to enable a taxi to pick up and drop off passengers, exploring various actions and rewards in a taxi environment.', 'chapters': [{'end': 35902.778, 'start': 35411.977, 'title': 'Reinforcement learning in smart taxi', 'summary': 'Discusses essentials of reinforcement learning (RL) through examples of smart taxi and walking in a frozen lake, emphasizing the importance of practical implementation over theoretical knowledge, with details on defining rewards, penalties, and state aggregation in RL.', 'duration': 490.801, 'highlights': ['Defining rewards for proper pick up and drop, and penalizing unnecessary roaming, with a reward of 20 for proper drop and a negative 1 penalty for each wrong move.', "Explaining the environment as a 5 by 5 grid, with 25 possible taxi positions and 4 pick-up/drop locations, and the taxi's movement constraints, highlighting the significance of penalizing wrong activities for future learning.", 'Discussing the complexity of obstacles and multiple lanes in the environment, showcasing the potential for increasing the complexity of RL scenarios.']}, {'end': 36258.044, 'start': 35902.778, 'title': 'Taxi grid problem', 'summary': 'Discusses the taxi grid problem, in which a 5x5 grid with 4 drop or pick up locations results in 500 states, and the complexity can be increased to 2500 states by not fixing the location. 
It also explores the possibility of obstacles in the middle of the road.', 'duration': 355.266, 'highlights': ['The taxi grid problem involves a 5x5 grid with 4 drop or pick up locations, resulting in 500 states.', 'The complexity can be increased to 2500 states by not fixing the location and obtaining all the coordinates without any aggregation.', 'The possibility of obstacles in the middle of the road is explored, adding to the complexity of the problem.']}, {'end': 36820.719, 'start': 36259.042, 'title': 'Reinforcement learning: teaching a cab to pick and drop passengers', 'summary': 'Discusses the simulation of a cab navigating a 5x5 grid to pick and drop passengers, with 500 possible states, using reinforcement learning and updating the Q-table based on reward and penalty.', 'duration': 561.677, 'highlights': ['The simulation involves a 5x5 grid where a cab navigates to pick and drop passengers, with 500 possible states, and updates the Q-table with rewards or penalties for each movement.', "The 5x5 grid scenario results in 500 possible states, representing the cab's location, passenger's location, and whether the passenger is inside the cab, with the Q-table being updated for each movement with rewards or penalties.", 'By applying reinforcement learning in the simulation, the Q-table is updated for each movement of the cab, with the learning rate and designated time steps, to teach the cab to pick and drop passengers at their right locations.']}, {'end': 37423.106, 'start': 36820.719, 'title': 'Reinforcement learning with gym environment', 'summary': 'Discusses implementing a reinforcement learning algorithm using the Gym environment, where various actions and rewards are explored in a taxi environment, aiming to optimize the learning process with Q-learning, reaching the destination in 562 steps, and setting 2000 episodes for training.', 'duration': 602.387, 'highlights': ['Implementing reinforcement learning with gym environment: The transcript discusses the 
implementation of reinforcement learning using the gym environment for exploring various states, actions, and rewards.', 'Exploring various actions and rewards in a taxi environment: The speaker delves into the exploration of different actions and rewards within a taxi environment, including penalizing actions and the concept of reaching the destination with reward values.', 'Optimizing the learning process with Q-learning: The chapter highlights the use of Q-learning to optimize the learning process, aiming to reach the destination in 562 steps and setting 2000 episodes for training.']}, {'end': 38189.611, 'start': 37423.106, 'title': 'Reinforcement learning q table', 'summary': "Discusses the process of training a Q table with 2000 iterations to enable a taxi to pick up and drop off passengers, and the impact of altering the number of episodes on the taxi's performance and learning.", 'duration': 766.505, 'highlights': ['Training Q table with 2000 iterations: The Q table was trained with 2000 iterations to enable the taxi to learn the process of picking up and dropping off passengers.', "Impact of altering the number of episodes on performance: The taxi's performance and learning were impacted when the number of episodes was altered, as it initially performed poorly but improved over time.", 'Challenges in defining states, actions, and rewards: Defining the number of states, possible actions, and rewards based on the states posed challenges, with the complexity varying based on the number of states and actions.', 'Adoption of existing environment for modifications: Data scientists adopt existing environments, such as the taxi environment, and modify them for their specific data, utilizing the predefined states, actions, and rewards as a template.']}], 'duration': 2777.634, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RnFGwxJwx-0/pics/RnFGwxJwx-035411977.jpg', 'highlights': ['Defining rewards for proper pick up and drop, and penalizing 
unnecessary roaming, with a reward of 20 for proper drop and a negative 1 penalty for each wrong move.', 'The taxi grid problem involves a 5x5 grid with 4 drop or pick up locations, resulting in 500 states.', 'Implementing reinforcement learning with gym environment: The transcript discusses the implementation of reinforcement learning using the gym environment for exploring various states, actions, and rewards.', 'Training Q table with 2000 iterations: The Q table was trained with 2000 iterations to enable the taxi to learn the process of picking up and dropping off passengers.']}], 'highlights': ["Python's dominance in machine learning and its practical applications in industry are emphasized, making it the key choice for new learners and professionals, with a series of tutorials and a practical learning approach provided.", "Covers Python's dominance in machine learning and data analysis, using libraries like pandas, NumPy, and matplotlib for processing and visualization.", 'Anaconda is a distribution of Python that includes commonly used libraries and an IDE for Python, making it easy to update and add new libraries.', 'NumPy is essential for deep learning image classification and image processing, as it involves converting image data into NumPy arrays for matching, classification, and processing.', 'Pandas is employed for labeled data analysis and is advantageous for processing large datasets, exemplified by a case study involving Mercedes Benz processing data from around 10-20 million rows.', 'Achieved 96-97% accuracy in a project: The project achieved 96-97% accuracy levels.', 'The chapter introduces the usage of Jupyter for data science and machine learning projects: Jupyter is emphasized as an essential tool for data science and machine learning projects, allowing users to interactively type, click, and see the code execution, ensuring that in every step, the data can be filtered and visualized, providing a convenient way to work through ML projects.', 'Lists are 
mutable, allowing addition and removal of elements, demonstrated by accessing elements, using the append method to add elements, and the pop and remove methods to remove elements.', 'The challenge of obtaining high-cost GPUs for deep learning models, with a need for around 16 GB of GPU memory for training, leading to the rapid depletion of free credits from cloud providers like Google.', 'Group by summarize: Analyzing Uber trip data by grouping start locations and finding trips with more than 10 miles.', 'The importance of understanding the difference between iloc and loc, with iloc relying on integer position-based indexing and loc on label-based indexing; loc is presented as the preferred choice for labeled data, resulting in easier and more practical data manipulation and visualization.', "The 'reset_index' method in Pandas changes the original data frame when the 'inplace' parameter is set to True.", "Setting the 'drop' parameter to True in the 'reset_index' method removes the previous index column for clarity and consistency.", 'Wes McKinney is the creator of pandas, and his book is recommended as a reference for learning Python for data analysis.', 'The chapter discusses the process of grouping and aggregation in Python, focusing on finding the most recent and earliest travel date and mean distance traveled for each start city.', 'The chapter covers the usage of dictionary notation in Python for efficient data extraction and data frame operations in pandas, emphasizing flexibility and efficiency compared to traditional methods.', 'The dataset contains information on around 100 stores, with sales data spanning across different months, providing a comprehensive overview of store performance.', 'Covers setting variable sales targets based on store sales and adapting targets using the apply function.', 'NumPy arrays offer 10 to 100 times faster numerical operations compared to normal lists, making them highly efficient for computational tasks.', 'The formula for standard deviation 
involves two key steps: calculating the average and then determining the distance of each data point from the average, with the resulting measure providing insight into the variability of the data set.', 'Introduction of chi-square test for analyzing gender preferences in product usage', 'Reinforcement learning exhibits exponential growth with around 65,000 patents, dominant in recent patent applications, and significant in generating value and startup acquisitions.', 'Q learning algorithm is a model-free solution for reinforcement learning, emphasizing ease of implementation through a dot csv file output.', 'Defining rewards for proper pick up and drop, and penalizing unnecessary roaming, with a reward of 20 for proper drop and a negative 1 penalty for each wrong move.']}
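
The Q-learning highlights above (a Q table initialised with zeros, updates driven by a learning rate and a discount factor, and the best action read off as the maximum value per state) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the video: it does not use the Gym taxi environment, and instead assumes a hypothetical six-cell corridor. The names `step`, `train`, `greedy_policy` and the values chosen for `alpha`, `gamma` and `epsilon` are illustrative only. The reward of 20 and the per-move penalty of -1 mirror the values mentioned in the summaries, and the 500-state count is reproduced as 25 taxi positions x 5 passenger locations (four stops plus "in the taxi") x 4 destinations.

```python
import random

# State count of the 5x5 taxi grid described in the summaries:
# 25 taxi positions * 5 passenger locations (4 stops + in taxi) * 4 destinations.
TAXI_STATES = 25 * 5 * 4

# Hypothetical toy environment (NOT the Gym taxi environment):
# a corridor of N_STATES cells; start at cell 0, goal is the last cell.
N_STATES = 6
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL_REWARD = 20      # mirrors the big reward for a proper drop-off
STEP_PENALTY = -1     # mirrors the penalty for every extra move


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = GOAL_REWARD if done else STEP_PENALTY
    return next_state, reward, done


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    # Q table initialised with zeros: one row per state, one column per action.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            # Move the old estimate towards reward + discounted best future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per state = column with the maximum Q value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
print("taxi state count:", TAXI_STATES)
print("greedy policy (last entry is the terminal state):", policy)
```

With these assumed settings the learned policy moves right in every non-terminal cell; swapping `step` for a real environment's step function would be the main change needed to reuse the loop elsewhere.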